1
0
mirror of https://github.com/golang/go synced 2024-10-04 08:21:22 -06:00
Commit Graph

496 Commits

Author SHA1 Message Date
Russ Cox
0a8a719ded cmd/5g, cmd/6g, cmd/8g: preserve wide values in large functions
In large functions with many variables, the register optimizer
may give up and choose not to track certain variables at all.
In this case, the "nextinnode" information linking together
all the words from a given variable will be incomplete, and
the result may be that only some of a multiword value is
preserved across a call. That confuses the garbage collector,
so don't do that. Instead, mark those variables as having
their address taken, so that they will be preserved at all
calls. It's overkill, but correct.

Tested by hand using the 6g -S output to see that it does fix
the buggy generated code leading to the issue 7726 failure.

There is no automated test because I managed to break the
compiler while writing a test (see issue 7727). I will check
in a test along with the fix to issue 7727.

Fixes #7726.

LGTM=khr
R=khr, bradfitz, dave
CC=golang-codereviews
https://golang.org/cl/85200043
2014-04-16 13:59:42 -04:00
Keith Randall
bfbb2e827b cmd/6g: nacl: zero odd multiple of widthptr correctly
LGTM=iant
R=remyoudompheng, iant
CC=golang-codereviews
https://golang.org/cl/86270043
2014-04-10 07:59:46 -07:00
Rémy Oudompheng
f400d9aafc cmd/6g: relax constraint on variables that need zeroing.
On amd64p32 pointers are 32-bit-aligned and cannot be assumed to
have an offset multiple of widthreg. Instead check that they are
withptr-aligned.

Also change the threshold for region merging to 2*widthreg
instead of 2*widthptr because performance on amd64 and amd64p32
is expected to be the same.

Fixes #7712.

LGTM=khr
R=rsc, dave, khr, brad, bradfitz
CC=golang-codereviews
https://golang.org/cl/84690044
2014-04-09 21:23:36 +02:00
Rémy Oudompheng
c6a41a3559 cmd/6g, cmd/8g: disable Duff's device on NaCl.
Native Client forbids jumps/calls to arbitrary locations and
enforces a particular alignement, which makes the Duff's device
ineffective.

LGTM=khr
R=rsc, dave, khr
CC=golang-codereviews
https://golang.org/cl/84400043
2014-04-04 08:42:35 +02:00
Keith Randall
47acf16709 cmd/gc: Don't zero more than we need.
Don't merge with the zero range, we may
end up zeroing more than we need.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/83430044
2014-04-02 09:17:42 -07:00
Keith Randall
383963b506 runtime: zero at start of frame more efficiently.
Use Duff's device for zeroing.  Combine adjacent regions.

Update #7680
Update #7624

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/83200045
2014-04-01 19:44:07 -07:00
Russ Cox
daca06f2e3 cmd/gc: shorten more temporary lifetimes
1. In functions with heap-allocated result variables or with
defer statements, the return sequence requires more than
just a single RET instruction. There is an optimization that
arranges for all returns to jump to a single copy of the return
epilogue in this case. Unfortunately, that optimization is
fundamentally incompatible with PC-based liveness information:
it takes PCs at many different points in the function and makes
them all land at one PC, making the combined liveness information
at that target PC a mess. Disable this optimization, so that each
return site gets its own copy of the 'call deferreturn' and the
copying of result variables back from the heap.
This removes quite a few spurious 'ambiguously live' variables.

2. Let orderexpr allocate temporaries that are passed by address
to a function call and then die on return, so that we can arrange
an appropriate VARKILL.

2a. Do this for ... slices.

2b. Do this for closure structs.

2c. Do this for runtime.concatstring, which is the implementation
of large string additions. Change representation of OADDSTR to
an explicit list in typecheck to avoid reconstructing list in both
walk and order.

3. Let orderexpr allocate the temporary variable copies used for
range loops, so that they can be killed when the loop is over.
Similarly, let it allocate the temporary holding the map iterator.

CL 81940043 reduced the number of ambiguously live temps
in the godoc binary from 860 to 711.

This CL reduces the number to 121. Still more to do, but another
good checkpoint.

Update #7345

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83090046
2014-04-01 20:02:54 -04:00
Keith Randall
6c7cbf086c runtime: get rid of most uses of REP for copying/zeroing.
REP MOVSQ and REP STOSQ have a really high startup overhead.
Use a Duff's device to do the repetition instead.

benchmark                 old ns/op     new ns/op     delta
BenchmarkClearFat32       7.20          1.60          -77.78%
BenchmarkCopyFat32        6.88          2.38          -65.41%
BenchmarkClearFat64       7.15          3.20          -55.24%
BenchmarkCopyFat64        6.88          3.44          -50.00%
BenchmarkClearFat128      9.53          5.34          -43.97%
BenchmarkCopyFat128       9.27          5.56          -40.02%
BenchmarkClearFat256      13.8          9.53          -30.94%
BenchmarkCopyFat256       13.5          10.3          -23.70%
BenchmarkClearFat512      22.3          18.0          -19.28%
BenchmarkCopyFat512       22.0          19.7          -10.45%
BenchmarkCopyFat1024      36.5          38.4          +5.21%
BenchmarkClearFat1024     35.1          35.0          -0.28%

TODO: use for stack frame zeroing
TODO: REP prefixes are still used for "reverse" copying when src/dst
regions overlap.  Might be worth fixing.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, r
https://golang.org/cl/81370046
2014-04-01 12:51:02 -07:00
Russ Cox
b700cb4974 cmd/gc: shorten temporary lifetimes when possible
The new channel and map runtime routines take pointers
to values, typically temporaries. Without help, the compiler
cannot tell when those temporaries stop being needed,
because it isn't sure what happened to the pointer.
Arrange to insert explicit VARKILL instructions for these
temporaries so that the liveness analysis can avoid seeing
them as "ambiguously live".

The change is made in order.c, which was already in charge of
introducing temporaries to preserve the order-of-evaluation
guarantees. Now its job has expanded to include introducing
temporaries as needed by runtime routines, and then also
inserting the VARKILL annotations for all these temporaries,
so that their lifetimes can be shortened.

In order to do its job for the map runtime routines, order.c arranges
that all map lookups or map assignments have the form:

        x = m[k]
        x, y = m[k]
        m[k] = x

where x, y, and k are simple variables (often temporaries).
Likewise, receiving from a channel is now always:

        x = <-c

In order to provide the map guarantee, order.c is responsible for
rewriting x op= y into x = x op y, so that m[k] += z becomes

        t = m[k]
        t2 = t + z
        m[k] = t2

While here, fix a few bugs in order.c's traversal: it was failing to
walk into select and switch case bodies, so order of evaluation
guarantees were not preserved in those situations.
Added tests to test/reorder2.go.

Fixes #7671.

In gc/popt's temporary-merging optimization, allow merging
of temporaries with their address taken as long as the liveness
ranges do not intersect. (There is a good chance of that now
that we have VARKILL annotations to limit the liveness range.)

Explicitly killing temporaries cuts the number of ambiguously
live temporaries that must be zeroed in the godoc binary from
860 to 711, or -17%. There is more work to be done, but this
is a good checkpoint.

Update #7345

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/81940043
2014-04-01 13:31:38 -04:00
Russ Cox
6722d45631 cmd/gc: liveness-related bug fixes
1. On entry to a function, only zero the ambiguously live stack variables.
Before, we were zeroing all stack variables containing pointers.
The zeroing is pretty inefficient right now (issue 7624), but there are also
too many stack variables detected as ambiguously live (issue 7345),
and that must be addressed before deciding how to improve the zeroing code.
(Changes in 5g/ggen.c, 6g/ggen.c, 8g/ggen.c, gc/pgen.c)

Fixes #7647.

2. Make the regopt word-based liveness analysis preserve the
whole-variable liveness property expected by the garbage collection
bitmap liveness analysis. That is, if the regopt liveness decides that
one word in a struct needs to be preserved, make sure it preserves
the entire struct. This is particularly important for multiword values
such as strings, slices, and interfaces, in which all the words need
to be present in order to understand the meaning.
(Changes in 5g/reg.c, 6g/reg.c, 8g/reg.c.)

Fixes #7591.

3. Make the regopt word-based liveness analysis treat a variable
as having its address taken - which makes it preserved across
all future calls - whenever n->addrtaken is set, for consistency
with the gc bitmap liveness analysis, even if there is no machine
instruction actually taking the address. In this case n->addrtaken
is incorrect (a nicer way to put it is overconservative), and ideally
there would be no such cases, but they can happen and the two
analyses need to agree.
(Changes in 5g/reg.c, 6g/reg.c, 8g/reg.c; test in bug484.go.)

Fixes crashes found by turning off "zero everything" in step 1.

4. Remove spurious VARDEF annotations. As the comment in
gc/pgen.c explains, the VARDEF must immediately precede
the initialization. It cannot be too early, and it cannot be too late.
In particular, if a function call sits between the VARDEF and the
actual machine instructions doing the initialization, the variable
will be treated as live during that function call even though it is
uninitialized, leading to problems.
(Changes in gc/gen.c; test in live.go.)

Fixes crashes found by turning off "zero everything" in step 1.

5. Do not treat loading the address of a wide value as a signal
that the value must be initialized. Instead depend on the existence
of a VARDEF or the first actual read/write of a word in the value.
If the load is in order to pass the address to a function that does
the actual initialization, treating the load as an implicit VARDEF
causes the same problems as described in step 4.
The alternative is to arrange to zero every such value before
passing it to the real initialization function, but this is a much
easier and more efficient change.
(Changes in gc/plive.c.)

Fixes crashes found by turning off "zero everything" in step 1.

6. Treat wide input parameters with their address taken as
initialized on entry to the function. Otherwise they look
"ambiguously live" and we will try to emit code to zero them.
(Changes in gc/plive.c.)

Fixes crashes found by turning off "zero everything" in step 1.

7. An array of length 0 has no pointers, even if the element type does.
Without this change, the zeroing code complains when asked to
clear a 0-length array.
(Changes in gc/reflect.c.)

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/80160044
2014-03-27 14:05:57 -04:00
Rémy Oudompheng
0285d2b96b cmd/6g, cmd/8g: skip CONVNOP nodes in bgen.
Revision 3ae4607a43ff introduced CONVNOP layers
to fix type checking issues arising from comparisons.
The added complexity made 8g run out of registers
when compiling an equality function in go.net/ipv6.

A similar issue occurred in test/sizeof.go on
amd64p32 with 6g.

Fixes #7405.

LGTM=khr
R=rsc, dave, iant, khr
CC=golang-codereviews
https://golang.org/cl/78100044
2014-03-20 22:22:37 +01:00
Rémy Oudompheng
ceb346685d cmd/6g: make comment more consistent.
LGTM=dave, iant
R=iant, khr, rsc, dave
CC=golang-codereviews
https://golang.org/cl/77960044
2014-03-20 22:21:10 +01:00
Ian Lance Taylor
6ea340fd61 cmd/6g: remove unused stosptr variable
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/78030043
2014-03-19 17:18:25 -07:00
Keith Randall
14664050b9 cmd/6g: do small zeroings with straightline code.
Removes most uses of the REP prefix, which has a high startup cost.

LGTM=iant
R=golang-codereviews, iant, khr
CC=golang-codereviews
https://golang.org/cl/77920043
2014-03-19 15:41:34 -07:00
Rémy Oudompheng
f210fd1fa9 cmd/6g, runtime: alignment fixes for amd64p32.
LGTM=rsc
R=rsc, dave, iant, khr
CC=golang-codereviews
https://golang.org/cl/75820044
2014-03-14 19:37:39 +01:00
Rémy Oudompheng
741244e433 cmd/6g: fix stack zeroing preamble on amd64p32.
It was using a REP STOSQ but putting in CX the number of 32-bit
words to clear.

LGTM=dave
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/75240043
2014-03-13 08:12:38 +01:00
Dave Cheney
e31a1ce109 cmd/gc, cmd/5g, cmd/6g, cmd/8g: introduce linkarchinit and add amd64p32 support
Replaces CL 70000043.

Introduce linkarchinit() from cmd/ld.

For cmd/6g, switch to the amd64p32 linker model if we are building under nacl/amd64p32.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/71330045
2014-03-07 15:33:44 +11:00
Russ Cox
c2dd33a46f cmd/ld: clear unused ctxt before morestack
For non-closure functions, the context register is uninitialized
on entry and will not be used, but morestack saves it and then the
garbage collector treats it as live. This can be a source of memory
leaks if the context register points at otherwise dead memory.
Avoid this by introducing a parallel set of morestack functions
that clear the context register, and use those for the non-closure functions.

I hope this will help with some of the finalizer flakiness, but it probably won't.

Fixes #7244.

LGTM=dvyukov
R=khr, dvyukov
CC=golang-codereviews
https://golang.org/cl/71030044
2014-03-04 13:53:08 -05:00
Ian Lance Taylor
fa6375ea47 cmd/6g, cmd/8g: simplify calls to gvardef
The gvardef function does nothing if n->class == PEXTERN, so
we don't need to test for that before calling it.  This makes
the 6g/8g code more like the 5g code and clarifies that the
cases that do not test for n->class != PEXTERN are not buggy.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/68900044
2014-02-26 07:37:10 -08:00
Dave Cheney
7c8280c9ef all: merge NaCl branch (part 1)
See golang.org/s/go13nacl for design overview.

This CL is the mostly mechanical changes from rsc's Go 1.2 based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support, there are probably two or three more large merges to come.

CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.

The exact change lists included are

15050047: syscall: support for Native Client
15360044: syscall: unzip implementation for Native Client
15370044: syscall: Native Client SRPC implementation
15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
15410048: runtime: support for Native Client
15410049: syscall: file descriptor table for Native Client
15410050: syscall: in-memory file system for Native Client
15440048: all: update +build lines for Native Client port
15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
15570045: os: support for Native Client
15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
15690044: net: support for Native Client
15690048: runtime: support for fake time like on Go Playground
15690051: build: disable various tests on Native Client

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/68150047
2014-02-25 09:47:42 -05:00
Russ Cox
7a7c0ffb47 cmd/gc: correct liveness for fat variables
The VARDEF placement must be before the initialization
but after any final use. If you have something like s = ... using s ...
the rhs must be evaluated, then the VARDEF, then the lhs
assigned.

There is a large comment in pgen.c on gvardef explaining
this in more detail.

This CL also includes Ian's suggestions from earlier CLs,
namely commenting the use of mode in link.h and fixing
the precedence of the ~r check in dcl.c.

This CL enables the check that if liveness analysis decides
a variable is live on entry to the function, that variable must
be a function parameter (not a result, and not a local variable).
If this check fails, it indicates a bug in the liveness analysis or
in the generated code being analyzed.

The race detector generates invalid code for append(x, y...).
The code declares a temporary t and then uses cap(t) before
initializing t. The new liveness check catches this bug and
stops the compiler from writing out the buggy code.
Consequently, this CL disables the race detector tests in
run.bash until the race detector bug can be fixed
(golang.org/issue/7334).

Except for the race detector bug, the liveness analysis check
does not detect any problems (this CL and the previous CLs
fixed all the detected problems).

The net test still fails with GOGC=0 but the rest of the tests
now pass or time out (because GOGC=0 is so slow).

TBR=iant
CC=golang-codereviews
https://golang.org/cl/64170043
2014-02-15 10:58:55 -05:00
Russ Cox
91b1f7cb15 cmd/gc: handle variable initialization by block move in liveness
Any initialization of a variable by a block copy or block zeroing
or by multiple assignments (componentwise copying or zeroing
of a multiword variable) needs to emit a VARDEF. These cases were not.

Fixes #7205.

TBR=iant
CC=golang-codereviews
https://golang.org/cl/63650044
2014-02-13 22:45:16 -05:00
Russ Cox
801e40a0a4 cmd/gc: rename AFATVARDEF to AVARDEF
The "fat" referred to being used for multiword values only.
We're going to use it for non-fat values sometimes too.

No change other than the renaming.

TBR=iant
CC=golang-codereviews
https://golang.org/cl/63650043
2014-02-13 22:17:22 -05:00
Russ Cox
684332f47c cmd/5g: fix regopt bug in copyprop
copyau1 was assuming that it could deduce the type of the
middle register p->reg from the type of the left or right
argument: in CMPF F1, F2, the p->reg==2 must be a D_FREG
because p->from is F1, and in CMP R1, R2, the p->reg==2 must
be a D_REG because p->from is R1.

This heuristic fails for CMP $0, R2, which was causing copyau1
not to recognize p->reg==2 as a reference to R2, which was
keeping it from properly renaming the register use when
substituting registers.

cmd/5c has the right approach: look at the opcode p->as to
decide the kind of register. It is unclear where 5g's copyau1
came from; perhaps it was an attempt to avoid expanding 5c's
a2type to include new instructions used only by 5g.

Copy a2type from cmd/5c, expand to include additional instructions,
and make it crash the compiler if asked about an instruction
it does not understand (avoid silent bugs in the future if new
instructions are added).

Should fix current arm build breakage.

While we're here, fix the print statements dumping the pred and
succ info in the asm listing to pass an int arg to %.4ud
(Prog.pc is a vlong now, due to the liblink merge).

TBR=ken2
CC=golang-codereviews
https://golang.org/cl/62730043
2014-02-13 03:54:55 +00:00
Anthony Martin
2cae0591cd cmd/cc, cmd/gc, cmd/ld: consolidate print format routines
We now use the %A, %D, %P, and %R routines from liblink
across the board.

Fixes #7178.
Fixes #7055.

LGTM=iant
R=golang-codereviews, gobot, rsc, dave, iant, remyoudompheng
CC=golang-codereviews
https://golang.org/cl/49170043
2014-02-12 14:29:11 -05:00
Daniel Morsing
85e4cb2f10 cmd/6g, cmd/8g, cmd/5g: make the undefined instruction have no successors
The UNDEF instruction was listed in the instruction data as having the next instruction in the stream as its successor. This confused the optimizer into adding a load where it wasn't needed, in turn confusing the liveness analysis pass for GC bitmaps into thinking that the variable was live.

Fixes #7229.

LGTM=iant, rsc
R=golang-codereviews, bradfitz, iant, dave, rsc
CC=golang-codereviews
https://golang.org/cl/56910045
2014-02-11 20:25:40 +00:00
Rémy Oudompheng
2b93c4dd06 cmd/6g: faster memmove/memset-like code using unaligned load/stores.
This changes makes sgen and clearfat use unaligned instructions for
the trailing bytes, like the runtime memmove does, resulting in faster
code when manipulating types whose size is not a multiple of 8.

LGTM=khr
R=khr, iant, rsc
CC=golang-codereviews
https://golang.org/cl/51740044
2014-02-07 23:58:21 +01:00
Russ Cox
a81692e265 cmd/gc: add zeroing to enable precise stack accounting
There is more zeroing than I would like right now -
temporaries used for the new map and channel runtime
calls need to be eliminated - but it will do for now.

This CL only has an effect if you are building with

        GOEXPERIMENT=precisestack ./all.bash

(or make.bash). It costs about 5% in the overall time
spent in all.bash. That number will come down before
we make it on by default, but this should be enough for
Keith to try using the precise maps for copying stacks.

amd64 only (and it's not really great generated code).

TBR=khr, iant
CC=golang-codereviews
https://golang.org/cl/56430043
2014-01-23 23:11:04 -05:00
Keith Randall
0ea20bc981 cmd/6g: inline copies of up to 4 words in length.
This change fixes a serious performance regression
with reflect.Value growing to 4 words instead of 3.
The json benchmark was ~50% slower, with this change
it is ~5% slower (and the binary is 0.5% larger).

Longer term, we probably need to rethink our copy
generation.  Using REP is really expensive time-wise.
But inlining the copy grows the binary.

R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/44990043
2013-12-30 12:14:27 -08:00
Russ Cox
4acb70d377 cmd/gc: bypass DATA instruction for data initialized to integer constant
Eventually we will want to bypass DATA for everything,
but the relocations are not standardized well enough across
architectures to make that possible.

This did not help as much as I expected, but it is definitely better.
It shaves maybe 1-2% off all.bash depending on how much you
trust the timings of a single run:

Before: 241.139r 362.702u 112.967s
After:  234.339r 359.623u 111.045s

R=golang-codereviews, gobot, r, iant
CC=golang-codereviews
https://golang.org/cl/44650043
2013-12-20 14:24:39 -05:00
David du Colombier
9607255760 cmd/6g, cmd/gc, cmd/ld: fix Plan 9 amd64 warnings
warning: src/cmd/6g/reg.c:671 format mismatch d VLONG, arg 4
warning: src/cmd/gc/pgen.c:230 set and not used: oldstksize
warning: src/cmd/gc/plive.c:877 format mismatch lx UVLONG, arg 2
warning: src/cmd/gc/walk.c:2878 set and not used: cbv
warning: src/cmd/gc/walk.c:2885 set and not used: hbv
warning: src/cmd/ld/data.c:198 format mismatch s IND FUNC(IND CHAR) INT, arg 2
warning: src/cmd/ld/data.c:230 format mismatch s IND FUNC(IND CHAR) INT, arg 2
warning: src/cmd/ld/dwarf.c:1517 set and not used: pc
warning: src/cmd/ld/elf.c:1507 format mismatch d VLONG, arg 2
warning: src/cmd/ld/ldmacho.c:509 set and not used: dsymtab

R=golang-dev, gobot, rsc
CC=golang-dev
https://golang.org/cl/36740045
2013-12-18 20:20:46 +01:00
Russ Cox
870e821ded cmd/cc, cmd/gc: update compilers, assemblers for liblink changes
- add buffered stdout to all tools and provide to link ctxt.
- avoid extra \n before ! in .6 files written by assemblers
  (makes them match the C compilers).
- use linkwriteobj instead of linkouthist+linkwritefuncs.
- in assemblers and C compilers, record pc explicitly in Prog,
  for use by liblink.
- in C compilers, preserve jump target links.
- in Go compilers (gsubr.c) attach gotype directly to
  corresponding LSym* instead of rederiving from instruction stream.
- in Go compilers, emit just one definition for runtime.zerovalue
  from each compilation.

This CL consists entirely of small adjustments.
The heavy lifting is in CL 39680043.
Each depends on the other.

R=golang-dev, dave, iant
CC=golang-dev
https://golang.org/cl/37030045
2013-12-16 12:51:38 -05:00
Russ Cox
f606c1be80 cmd/5g, cmd/6g, cmd/8g: use liblink
Preparation for golang.org/s/go13linker work.

This CL does not build by itself. It depends on 35740044
and 35790044 and will be submitted at the same time.

R=iant
CC=golang-dev
https://golang.org/cl/34590045
2013-12-08 22:51:55 -05:00
Carl Shapiro
f056daf075 cmd/5g, cmd/5l, cmd/6g, cmd/6l, cmd/8g, cmd/8l, cmd/gc, runtime: generate pointer maps by liveness analysis
This change allows the garbage collector to examine stack
slots that are determined as live and containing a pointer
value by the garbage collector.  This results in a mean
reduction of 65% in the number of stack slots scanned during
an invocation of "GOGC=1 all.bash".

Unfortunately, this does not yet allow garbage collection to
be precise for the stack slots computed as live.  Pointers
confound the determination of what definitions reach a given
instruction.  In general, this problem is not solvable without
runtime cost but some advanced cooperation from the compiler
might mitigate common cases.

R=golang-dev, rsc, cshapiro
CC=golang-dev
https://golang.org/cl/14430048
2013-12-05 17:35:22 -08:00
Russ Cox
aa0439ba65 cmd/gc: eliminate redundant &x.Field nil checks
This eliminates ~75% of the nil checks being emitted,
on all architectures. We can do better, but we need
a bit more general support from the compiler, and
I don't want to do that so close to Go 1.2.
What's here is simple but effective and safe.

A few small code generation cleanups were required
to make the analysis consistent on all systems about
which nil checks are omitted, at least in the test.

Fixes #6019.

R=ken2
CC=golang-dev
https://golang.org/cl/13334052
2013-09-17 16:54:22 -04:00
Dave Cheney
8d5ec52e6c cmd/6c, cmd/6g, cmd/cc: fix undefined behavior warnings
Update #5764

Like Tribbles, the more you kill, the more spring up in their place.

R=rsc
CC=golang-dev
https://golang.org/cl/13324049
2013-09-17 13:43:27 +10:00
Rémy Oudompheng
ff416a3f19 cmd/gc: inline copy in frontend to call memmove directly.
A new node type OSPTR is added to refer to the data pointer of
strings and slices in a simple way during walk(). It will be
useful for future work on simplification of slice arithmetic.

benchmark                  old ns/op    new ns/op    delta
BenchmarkCopy1Byte                 9            8  -13.98%
BenchmarkCopy2Byte                14            8  -40.49%
BenchmarkCopy4Byte                13            8  -35.04%
BenchmarkCopy8Byte                13            8  -37.10%
BenchmarkCopy12Byte               14           12  -15.38%
BenchmarkCopy16Byte               14           12  -17.24%
BenchmarkCopy32Byte               19           14  -27.32%
BenchmarkCopy128Byte              31           26  -15.29%
BenchmarkCopy1024Byte            100           92   -7.50%
BenchmarkCopy1String              10            7  -28.99%
BenchmarkCopy2String              10            7  -28.06%
BenchmarkCopy4String              10            8  -22.69%
BenchmarkCopy8String              10            8  -23.30%
BenchmarkCopy12String             11           11   -5.88%
BenchmarkCopy16String             11           11   -5.08%
BenchmarkCopy32String             15           14   -6.58%
BenchmarkCopy128String            28           25  -10.60%
BenchmarkCopy1024String           95           95   +0.53%

R=golang-dev, bradfitz, cshapiro, dave, daniel.morsing, rsc, khr, khr
CC=golang-dev
https://golang.org/cl/9101048
2013-09-12 00:15:28 +02:00
Russ Cox
6d47de2f40 cmd/5g, cmd/6g, cmd/8g: remove O(n) reset loop in copyprop
Simpler version of CL 13084043.

R=ken2
CC=golang-dev
https://golang.org/cl/13602045
2013-09-11 15:22:11 -04:00
Russ Cox
a0bc379d46 undo CL 13084043 / ef4ee02a5853
There is a cleaner, simpler way.

««« original CL description
cmd/5g, cmd/6g, cmd/8g: faster compilation
Replace linked list walk with memset.
This reduces CPU time taken by 'go install -a std' by ~10%.
Before:
real		user		sys
0m23.561s	0m16.625s	0m5.848s
0m23.766s	0m16.624s	0m5.846s
0m23.742s	0m16.621s	0m5.868s
after:
0m22.714s	0m14.858s	0m6.138s
0m22.644s	0m14.875s	0m6.120s
0m22.604s	0m14.854s	0m6.081s

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/13084043
»»»

TBR=dvyukov
CC=golang-dev
https://golang.org/cl/13352049
2013-09-11 15:14:11 -04:00
Russ Cox
6034406eae build: more "undefined behavior" fixes
Fixes #5764.

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/13441051
2013-09-10 14:54:55 -04:00
Rémy Oudompheng
9c21ce54dd cmd/6g: handle very wide offsets.
Fixes #6036.

R=golang-dev, bradfitz, rsc
CC=golang-dev
https://golang.org/cl/12992043
2013-09-09 20:36:19 +02:00
Russ Cox
af2a3193af cmd/5g, cmd/6g, cmd/8g: simplify for loop in bitmap generation
Lucio De Re reports that the more complex
loop miscompiles on Plan 9.

R=ken2
CC=golang-dev
https://golang.org/cl/13602043
2013-09-06 16:49:11 -04:00
Dmitriy Vyukov
79dca0327e libbio, all cmd: consistently use BGETC/BPUTC instead of Bgetc/Bputc
Also introduce BGET2/4, BPUT2/4 as they are widely used.
Slightly improve BGETC/BPUTC implementation.
This gives ~5% CPU time improvement on go install -a -p1 std.
Before:
real		user		sys
0m23.561s	0m16.625s	0m5.848s
0m23.766s	0m16.624s	0m5.846s
0m23.742s	0m16.621s	0m5.868s
after:
0m22.999s	0m15.841s	0m5.889s
0m22.845s	0m15.808s	0m5.850s
0m22.889s	0m15.832s	0m5.848s

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/12745047
2013-08-30 15:46:12 +04:00
Dmitriy Vyukov
06e686def6 cmd/5g, cmd/6g, cmd/8g: faster compilation
Replace linked list walk with memset.
This reduces CPU time taken by 'go install -a std' by ~10%.
Before:
real		user		sys
0m23.561s	0m16.625s	0m5.848s
0m23.766s	0m16.624s	0m5.846s
0m23.742s	0m16.621s	0m5.868s
after:
0m22.714s	0m14.858s	0m6.138s
0m22.644s	0m14.875s	0m6.120s
0m22.604s	0m14.854s	0m6.081s

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/13084043
2013-08-21 14:20:28 +04:00
Russ Cox
3b4d792606 cmd/gc: separate "has pointers" from "needs zeroing" in stack frame
When the new call site-specific frame bitmaps are available,
we can cut the zeroing to just those values that need it due
to scope escaping.

R=cshapiro, cshapiro
CC=golang-dev
https://golang.org/cl/13045043
2013-08-16 21:45:59 -04:00
Carl Shapiro
d3b04f46b5 cmd/5g, cmd/6g, cmd/8g: update frame zeroing for new bitmap format
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12740046
2013-08-16 01:15:04 -04:00
Russ Cox
999a36f9af cmd/gc: &x panics if x does
See golang.org/s/go12nil.

This CL is about getting all the right checks inserted.
A followup CL will add an optimization pass to
remove redundant checks.

R=ken2
CC=golang-dev
https://golang.org/cl/12970043
2013-08-15 14:38:32 -04:00
Rémy Oudompheng
0d9db86f74 cmd/5g, cmd/6g, cmd/8g: restore occurrences of R replaced by nil in comments.
R=golang-dev, bradfitz, rsc
CC=golang-dev
https://golang.org/cl/12842043
2013-08-14 21:24:48 +02:00
Russ Cox
fa72679f07 cmd/gc: add temporary-merging optimization pass
The compilers assume they can generate temporary variables
as needed to preserve the right semantics or simplify code
generation and the back end will still generate good code.
This turns out not to be true. The back ends will only
track the first 128 variables per function and give up
on the remainder. That needs to be fixed too, in a later CL.

This CL merges temporary variables with equal types and
non-overlapping lifetimes using the greedy algorithm in
Poletto and Sarkar, "Linear Scan Register Allocation",
ACM TOPLAS 1999.

The result can be striking in the right functions.

Top 20 frame size changes in a 6g godoc binary by bytes saved:

5464 1984 (-3480, -63.7%) go/build.(*Context).Import
4456 1824 (-2632, -59.1%) go/printer.(*printer).expr1
2560   80 (-2480, -96.9%) time.nextStdChunk
3496 1608 (-1888, -54.0%) go/printer.(*printer).stmt
1896  272 (-1624, -85.7%) net/http.init
2688 1400 (-1288, -47.9%) fmt.(*pp).printReflectValue
2800 1512 (-1288, -46.0%) main.main
3296 2016 (-1280, -38.8%) crypto/tls.(*Conn).clientHandshake
1664  488 (-1176, -70.7%) time.loadZoneZip
1760  608 (-1152, -65.5%) time.parse
4104 3072 (-1032, -25.1%) runtime/pprof.writeHeap
1680  712 ( -968, -57.6%) go/ast.Walk
2488 1560 ( -928, -37.3%) crypto/x509.parseCertificate
1128  392 ( -736, -65.2%) math/big.nat.divLarge
1528  864 ( -664, -43.5%) go/printer.(*printer).fieldList
1360  712 ( -648, -47.6%) regexp/syntax.(*parser).factor
2104 1528 ( -576, -27.4%) encoding/asn1.parseField
1064  504 ( -560, -52.6%) encoding/xml.(*Decoder).text
 584   48 ( -536, -91.8%) html.init
1400  864 ( -536, -38.3%) go/doc.playExample

In the same godoc build, cuts the number of functions with
too many vars from 83 to 32.

R=ken2
CC=golang-dev
https://golang.org/cl/12829043
2013-08-13 00:09:31 -04:00
Russ Cox
dbf96addfb cmd/gc: move flow graph into portable opt
Now there's only one copy of the flow graph construction
and dominator computation, and different optimizations
can attach different annotations to the instructions.

R=ken2
CC=golang-dev
https://golang.org/cl/12797045
2013-08-12 22:02:10 -04:00