Commit Graph

333 Commits

Author SHA1 Message Date
Russ Cox
f5184d3437 cmd/gc: correct handling of globals, func args, results
Globals, function arguments, and results are special cases in
registerization.

Globals must be flushed aggressively, because nearly any
operation can cause a panic, and the recovery code must see
the latest values. Globals also must be loaded aggressively,
because nearly any store through a pointer might be updating a
global: the compiler cannot see all the "address of"
operations on globals, especially exported globals. To
accomplish this, mark all globals as having their address
taken, which effectively disables registerization.

If a function contains a defer statement, the function results
must be flushed aggressively, because nearly any operation can
cause a panic, and the deferred code may call recover, causing
the original function to return the current values of its
function results. To accomplish this, mark all function
results as having their address taken if the function contains
any defer statements. This causes not just aggressive flushing
but also aggressive loading. The aggressive loading is
overkill but the best we can do in the current code.
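
A minimal runnable sketch of the kind of function involved (div and the surrounding code are illustrative only, not from this CL): the deferred function reads and writes the named results, so their memory copies must be current at every operation that can panic.

        package main

        import "fmt"

        func div(a, b int) (q int, err error) {
                defer func() {
                        if r := recover(); r != nil {
                                // The deferred code reads and writes the named
                                // results, so q and err must be flushed to
                                // memory before any panicking operation.
                                err = fmt.Errorf("recovered: %v", r)
                        }
                }()
                q = a / b // panics when b == 0
                return q, nil
        }

        func main() {
                q, err := div(1, 0)
                fmt.Println(q, err) // 0 recovered: runtime error: integer divide by zero
        }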

Function arguments must be considered live at all safe points
in a function, because garbage collection always preserves
them: they must be up-to-date in order to be preserved
correctly. Accomplish this by marking them live at all call
sites. An earlier attempt at this marked function arguments as
having their address taken, which disabled registerization
completely, making programs slower. This CL's solution allows
registerization while preserving safety. The benchmark speedup
is caused by being able to registerize again (the earlier CL
lost the same amount).

benchmark                old ns/op     new ns/op     delta
BenchmarkEqualPort32     61.4          56.0          -8.79%

benchmark                old MB/s     new MB/s     speedup
BenchmarkEqualPort32     521.56       570.97       1.09x

Fixes #1304. (again)
Fixes #7944. (again)
Fixes #7984.
Fixes #7995.

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, iant, r
https://golang.org/cl/97500044
2014-05-15 15:34:53 -04:00
Russ Cox
26ad5d4ff0 cmd/gc: fix liveness vs regopt mismatch for input variables
The inputs to a function are marked live at all times in the
liveness bitmaps, so that the garbage collector will not free
the things they point at and reuse the pointers, so that the
pointers shown in stack traces are guaranteed not to have
been recycled.

Unfortunately, no one told the register optimizer that the
inputs need to be preserved at all call sites. If a function
is done with a particular input value, the optimizer will stop
preserving it across calls. For single-word values this just
means that the value recorded might be stale. For multi-word
values like slices, the value recorded could be only partially stale:
it can happen that, say, the cap was updated but not the len,
or that the len was updated but not the base pointer.
Either of these possibilities (and others) would make the
garbage collector misinterpret memory, leading to memory
corruption.
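
A hedged sketch of the failure mode (the function names are illustrative only):

        package main

        func use(p *byte) {}

        func f(s []byte) {
                use(&s[0]) // call site: the liveness bitmap says all of s is live
                _ = s      // last use the optimizer sees
                use(nil)   // s must still be intact here; writing back only part
                           // of the slice header would leave the GC with garbage
        }

        func main() { f(make([]byte, 8)) }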

This came up in a real program, in which the garbage collector's
'slice len ≤ slice cap' check caught the inconsistency.

Fixes #7944.

LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews, khr
https://golang.org/cl/100370045
2014-05-12 17:19:02 -04:00
Josh Bleecher Snyder
03c0f3fea9 cmd/gc: alias more variables during register allocation
This is joint work with Daniel Morsing.

In order for the register allocator to alias two variables, they must have the same width, stack offset, and etype. Code generation was altering a variable's etype in a few places. This prevented the variable from being moved to a register, which in turn prevented peephole optimization. This failure to alias was very common, with almost 23,000 instances just running make.bash.

This phenomenon was not visible in the register allocation debug output because the variables that failed to alias had the same name. The debugging-only change to bits.c fixes this by printing the variable number with its name.

This CL fixes the source of all etype mismatches for 6g, all but one case for 8g, and depressingly few cases for 5g. (I believe that extending CL 6819083 to 5g is a prerequisite.) Fixing the remaining cases in 8g and 5g is work for the future.

The etype mismatch fixes are:

* [gc] Slicing changed the type of the base pointer into a uintptr in order to perform arithmetic on it. Instead, support addition directly on pointers.

* [*g] OSPTR was giving type uintptr to slice base pointers; undo that. This arose, for example, while compiling copy(dst, src).

* [8g] 64 bit float conversion was assigning int64 type during codegen, overwriting the existing uint64 type.

Note that some etype mismatches are appropriate, such as a struct with a single field or an array with a single element.

With these fixes, the number of registerizations that occur while running make.bash for 6g increases ~10%. Hello world binary size shrinks ~1.5%. Running all benchmarks in the standard library shows performance improvements ranging from nominal to substantive (>10%); a full comparison using 6g on my laptop is available at https://gist.github.com/josharian/8f9b5beb46667c272064. The microbenchmarks must be taken with a grain of salt; see issue 7920. The few benchmarks that show real regressions are likely due to issue 7920. I manually examined the generated code for the top few regressions and none had any assembly output changes. The few benchmarks that show extraordinary improvements are likely also due to issue 7920.

Performance results from 8g appear similar to 6g.

5g shows no performance improvements. This is not surprising, given the discussion above.

Update #7316

LGTM=rsc
R=rsc, daniel.morsing, bradfitz
CC=dave, golang-codereviews
https://golang.org/cl/91850043
2014-05-12 17:10:36 -04:00
Josh Bleecher Snyder
1848d71445 cmd/gc: don't give credit for NOPs during register allocation
The register allocator decides which variables should be placed into registers by charging for each load/store and crediting for each use, and then selecting an allocation with minimal cost. NOPs will be eliminated, however, so using a variable in a NOP should not generate credit.

Issue 7867 arises from attempted registerization of multi-word variables because they are used in NOPs. By not crediting for that use, they will no longer be considered for registerization.

This fix could theoretically lead to better register allocation, but NOPs are rare relative to other instructions.

Fixes #7867.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/94810044
2014-05-09 09:55:17 -07:00
Keith Randall
51b72d94de runtime: use duff zero and copy to initialize memory
benchmark                 old ns/op     new ns/op     delta
BenchmarkCopyFat512       1307          329           -74.83%
BenchmarkCopyFat256       666           169           -74.62%
BenchmarkCopyFat1024      2617          671           -74.36%
BenchmarkCopyFat128       343           89.0          -74.05%
BenchmarkCopyFat64        182           48.9          -73.13%
BenchmarkCopyFat32        103           28.8          -72.04%
BenchmarkClearFat128      102           46.6          -54.31%
BenchmarkClearFat512      344           167           -51.45%
BenchmarkClearFat64       50.5          26.5          -47.52%
BenchmarkClearFat256      147           87.2          -40.68%
BenchmarkClearFat32       22.7          16.4          -27.75%
BenchmarkClearFat1024     511           662           +29.55%

Fixes #7624

LGTM=rsc
R=golang-codereviews, khr, bradfitz, josharian, dave, rsc
CC=golang-codereviews
https://golang.org/cl/92760044
2014-05-07 13:17:10 -07:00
Russ Cox
0a8a719ded cmd/5g, cmd/6g, cmd/8g: preserve wide values in large functions
In large functions with many variables, the register optimizer
may give up and choose not to track certain variables at all.
In this case, the "nextinnode" information linking together
all the words from a given variable will be incomplete, and
the result may be that only some of a multiword value is
preserved across a call. That confuses the garbage collector,
so don't do that. Instead, mark those variables as having
their address taken, so that they will be preserved at all
calls. It's overkill, but correct.

Tested by hand using the 6g -S output to see that it does fix
the buggy generated code leading to the issue 7726 failure.

There is no automated test because I managed to break the
compiler while writing a test (see issue 7727). I will check
in a test along with the fix to issue 7727.

Fixes #7726.

LGTM=khr
R=khr, bradfitz, dave
CC=golang-codereviews
https://golang.org/cl/85200043
2014-04-16 13:59:42 -04:00
Russ Cox
9c8f11ff96 cmd/5g, cmd/8g: fix build
Botched during CL 83090046.

TBR=khr
CC=golang-codereviews
https://golang.org/cl/83070046
2014-04-01 20:24:53 -04:00
Russ Cox
daca06f2e3 cmd/gc: shorten more temporary lifetimes
1. In functions with heap-allocated result variables or with
defer statements, the return sequence requires more than
just a single RET instruction. There is an optimization that
arranges for all returns to jump to a single copy of the return
epilogue in this case. Unfortunately, that optimization is
fundamentally incompatible with PC-based liveness information:
it takes PCs at many different points in the function and makes
them all land at one PC, making the combined liveness information
at that target PC a mess. Disable this optimization, so that each
return site gets its own copy of the 'call deferreturn' and the
copying of result variables back from the heap.
This removes quite a few spurious 'ambiguously live' variables.

2. Let orderexpr allocate temporaries that are passed by address
to a function call and then die on return, so that we can arrange
an appropriate VARKILL.

2a. Do this for ... slices.

2b. Do this for closure structs.

2c. Do this for runtime.concatstring, which is the implementation
of large string additions. Change representation of OADDSTR to
an explicit list in typecheck to avoid reconstructing list in both
walk and order.

3. Let orderexpr allocate the temporary variable copies used for
range loops, so that they can be killed when the loop is over.
Similarly, let it allocate the temporary holding the map iterator.
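
As a rough sketch (not the compiler's exact rewrite), a slice range loop such as

        for i, v := range s {
                use(i, v)
        }

is evaluated through temporaries along the lines of

        tmp_s := s              // copy allocated by orderexpr
        tmp_len := len(tmp_s)
        for i := 0; i < tmp_len; i++ {
                v := tmp_s[i]
                use(i, v)
        }
        // VARKILL tmp_s, tmp_len: the temporaries die when the loop is over

so the temporaries no longer look ambiguously live for the rest of the function.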

CL 81940043 reduced the number of ambiguously live temps
in the godoc binary from 860 to 711.

This CL reduces the number to 121. Still more to do, but another
good checkpoint.

Update #7345

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83090046
2014-04-01 20:02:54 -04:00
Russ Cox
b700cb4974 cmd/gc: shorten temporary lifetimes when possible
The new channel and map runtime routines take pointers
to values, typically temporaries. Without help, the compiler
cannot tell when those temporaries stop being needed,
because it isn't sure what happened to the pointer.
Arrange to insert explicit VARKILL instructions for these
temporaries so that the liveness analysis can avoid seeing
them as "ambiguously live".

The change is made in order.c, which was already in charge of
introducing temporaries to preserve the order-of-evaluation
guarantees. Now its job has expanded to include introducing
temporaries as needed by runtime routines, and then also
inserting the VARKILL annotations for all these temporaries,
so that their lifetimes can be shortened.

In order to do its job for the map runtime routines, order.c arranges
that all map lookups or map assignments have the form:

        x = m[k]
        x, y = m[k]
        m[k] = x

where x, y, and k are simple variables (often temporaries).
Likewise, receiving from a channel is now always:

        x = <-c

In order to provide the map guarantee, order.c is responsible for
rewriting x op= y into x = x op y, so that m[k] += z becomes

        t = m[k]
        t2 = t + z
        m[k] = t2

While here, fix a few bugs in order.c's traversal: it was failing to
walk into select and switch case bodies, so order of evaluation
guarantees were not preserved in those situations.
Added tests to test/reorder2.go.

Fixes #7671.

In gc/popt's temporary-merging optimization, allow merging
of temporaries with their address taken as long as the liveness
ranges do not intersect. (There is a good chance of that now
that we have VARKILL annotations to limit the liveness range.)

Explicitly killing temporaries cuts the number of ambiguously
live temporaries that must be zeroed in the godoc binary from
860 to 711, or -17%. There is more work to be done, but this
is a good checkpoint.

Update #7345

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/81940043
2014-04-01 13:31:38 -04:00
Russ Cox
6722d45631 cmd/gc: liveness-related bug fixes
1. On entry to a function, only zero the ambiguously live stack variables.
Before, we were zeroing all stack variables containing pointers.
The zeroing is pretty inefficient right now (issue 7624), but there are also
too many stack variables detected as ambiguously live (issue 7345),
and that must be addressed before deciding how to improve the zeroing code.
(Changes in 5g/ggen.c, 6g/ggen.c, 8g/ggen.c, gc/pgen.c)

Fixes #7647.

2. Make the regopt word-based liveness analysis preserve the
whole-variable liveness property expected by the garbage collection
bitmap liveness analysis. That is, if the regopt liveness decides that
one word in a struct needs to be preserved, make sure it preserves
the entire struct. This is particularly important for multiword values
such as strings, slices, and interfaces, in which all the words need
to be present in order to understand the meaning.
(Changes in 5g/reg.c, 6g/reg.c, 8g/reg.c.)
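
For context, a small runnable sketch of the multiword headers in question (the sizes shown assume a 64-bit system):

        package main

        import (
                "fmt"
                "unsafe"
        )

        func main() {
                var s []byte      // slice header: pointer, len, cap
                var i interface{} // interface header: type word, data word
                var t string      // string header: pointer, len
                fmt.Println(unsafe.Sizeof(s), unsafe.Sizeof(i), unsafe.Sizeof(t)) // 24 16 16
        }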

Fixes #7591.

3. Make the regopt word-based liveness analysis treat a variable
as having its address taken - which makes it preserved across
all future calls - whenever n->addrtaken is set, for consistency
with the gc bitmap liveness analysis, even if there is no machine
instruction actually taking the address. In this case n->addrtaken
is incorrect (a nicer way to put it is overconservative), and ideally
there would be no such cases, but they can happen and the two
analyses need to agree.
(Changes in 5g/reg.c, 6g/reg.c, 8g/reg.c; test in bug484.go.)

Fixes crashes found by turning off "zero everything" in step 1.

4. Remove spurious VARDEF annotations. As the comment in
gc/pgen.c explains, the VARDEF must immediately precede
the initialization. It cannot be too early, and it cannot be too late.
In particular, if a function call sits between the VARDEF and the
actual machine instructions doing the initialization, the variable
will be treated as live during that function call even though it is
uninitialized, leading to problems.
(Changes in gc/gen.c; test in live.go.)

Fixes crashes found by turning off "zero everything" in step 1.

5. Do not treat loading the address of a wide value as a signal
that the value must be initialized. Instead depend on the existence
of a VARDEF or the first actual read/write of a word in the value.
If the load is in order to pass the address to a function that does
the actual initialization, treating the load as an implicit VARDEF
causes the same problems as described in step 4.
The alternative is to arrange to zero every such value before
passing it to the real initialization function, but this is a much
easier and more efficient change.
(Changes in gc/plive.c.)

Fixes crashes found by turning off "zero everything" in step 1.

6. Treat wide input parameters with their address taken as
initialized on entry to the function. Otherwise they look
"ambiguously live" and we will try to emit code to zero them.
(Changes in gc/plive.c.)

Fixes crashes found by turning off "zero everything" in step 1.

7. An array of length 0 has no pointers, even if the element type does.
Without this change, the zeroing code complains when asked to
clear a 0-length array.
(Changes in gc/reflect.c.)

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/80160044
2014-03-27 14:05:57 -04:00
Dave Cheney
e31a1ce109 cmd/gc, cmd/5g, cmd/6g, cmd/8g: introduce linkarchinit and add amd64p32 support
Replaces CL 70000043.

Introduce linkarchinit() from cmd/ld.

For cmd/6g, switch to the amd64p32 linker model if we are building under nacl/amd64p32.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/71330045
2014-03-07 15:33:44 +11:00
Russ Cox
c2dd33a46f cmd/ld: clear unused ctxt before morestack
For non-closure functions, the context register is uninitialized
on entry and will not be used, but morestack saves it and then the
garbage collector treats it as live. This can be a source of memory
leaks if the context register points at otherwise dead memory.
Avoid this by introducing a parallel set of morestack functions
that clear the context register, and use those for the non-closure functions.

I hope this will help with some of the finalizer flakiness, but it probably won't.

Fixes #7244.

LGTM=dvyukov
R=khr, dvyukov
CC=golang-codereviews
https://golang.org/cl/71030044
2014-03-04 13:53:08 -05:00
Josh Bleecher Snyder
d97d37188a 5g, 8g: remove dead code
maxstksize is superfluous and appears to be vestigial. 6g does not use it.

c >= 4 cannot occur; c = w % 4.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/68750043
2014-02-25 14:43:53 -08:00
Dave Cheney
7c8280c9ef all: merge NaCl branch (part 1)
See golang.org/s/go13nacl for design overview.

This CL contains the mostly mechanical changes from rsc's Go 1.2-based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support; there are probably two or three more large merges to come.

CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.

The exact change lists included are

15050047: syscall: support for Native Client
15360044: syscall: unzip implementation for Native Client
15370044: syscall: Native Client SRPC implementation
15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
15410048: runtime: support for Native Client
15410049: syscall: file descriptor table for Native Client
15410050: syscall: in-memory file system for Native Client
15440048: all: update +build lines for Native Client port
15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
15570045: os: support for Native Client
15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
15690044: net: support for Native Client
15690048: runtime: support for fake time like on Go Playground
15690051: build: disable various tests on Native Client

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/68150047
2014-02-25 09:47:42 -05:00
Russ Cox
1ca1cbea65 cmd/5g, cmd/8g: zero ambiguously live values on entry
The code here is being restored after its deletion in CL 14430048.

I restored the copy in cmd/6g in CL 56430043 but neglected the
other two.

This is the reason that enabling precisestack only worked on amd64.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/66170043
2014-02-19 17:08:55 -05:00
Russ Cox
7a7c0ffb47 cmd/gc: correct liveness for fat variables
The VARDEF placement must be before the initialization
but after any final use. If you have something like s = ... using s ...
the rhs must be evaluated, then the VARDEF, then the lhs
assigned.
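
A short sketch of the required ordering (grow stands in for any call whose result reinitializes s): for

        s = grow(s)

the generated sequence must be

        1. evaluate grow(s)   (the old value of s is still read, so s stays live)
        2. VARDEF s           (only now is the old value dead)
        3. assign the result to s

Emitting the VARDEF any earlier would mark s dead while its old value is still needed.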

There is a large comment in pgen.c on gvardef explaining
this in more detail.

This CL also includes Ian's suggestions from earlier CLs,
namely commenting the use of mode in link.h and fixing
the precedence of the ~r check in dcl.c.

This CL enables the check that if liveness analysis decides
a variable is live on entry to the function, that variable must
be a function parameter (not a result, and not a local variable).
If this check fails, it indicates a bug in the liveness analysis or
in the generated code being analyzed.

The race detector generates invalid code for append(x, y...).
The code declares a temporary t and then uses cap(t) before
initializing t. The new liveness check catches this bug and
stops the compiler from writing out the buggy code.
Consequently, this CL disables the race detector tests in
run.bash until the race detector bug can be fixed
(golang.org/issue/7334).

Except for the race detector bug, the liveness analysis check
does not detect any problems (this CL and the previous CLs
fixed all the detected problems).

The net test still fails with GOGC=0 but the rest of the tests
now pass or time out (because GOGC=0 is so slow).

TBR=iant
CC=golang-codereviews
https://golang.org/cl/64170043
2014-02-15 10:58:55 -05:00
Russ Cox
91b1f7cb15 cmd/gc: handle variable initialization by block move in liveness
Any initialization of a variable by a block copy or block zeroing
or by multiple assignments (componentwise copying or zeroing
of a multiword variable) needs to emit a VARDEF. These cases were not.

Fixes #7205.

TBR=iant
CC=golang-codereviews
https://golang.org/cl/63650044
2014-02-13 22:45:16 -05:00
Russ Cox
7addda685d cmd/5g, cmd/8g: fix build
The test added in CL 63630043 fails on 5g and 8g because they
were not emitting the VARDEF instruction when clearing a fat
value by clearing the components. 6g had the call in the right place.

Hooray tests.

TBR=iant
CC=golang-codereviews
https://golang.org/cl/63660043
2014-02-13 22:30:35 -05:00
Russ Cox
801e40a0a4 cmd/gc: rename AFATVARDEF to AVARDEF
The "fat" referred to being used for multiword values only.
We're going to use it for non-fat values sometimes too.

No change other than the renaming.

TBR=iant
CC=golang-codereviews
https://golang.org/cl/63650043
2014-02-13 22:17:22 -05:00
Russ Cox
684332f47c cmd/5g: fix regopt bug in copyprop
copyau1 was assuming that it could deduce the type of the
middle register p->reg from the type of the left or right
argument: in CMPF F1, F2, the p->reg==2 must be a D_FREG
because p->from is F1, and in CMP R1, R2, the p->reg==2 must
be a D_REG because p->from is R1.

This heuristic fails for CMP $0, R2, which was causing copyau1
not to recognize p->reg==2 as a reference to R2, which was
keeping it from properly renaming the register use when
substituting registers.

cmd/5c has the right approach: look at the opcode p->as to
decide the kind of register. It is unclear where 5g's copyau1
came from; perhaps it was an attempt to avoid expanding 5c's
a2type to include new instructions used only by 5g.

Copy a2type from cmd/5c, expand to include additional instructions,
and make it crash the compiler if asked about an instruction
it does not understand (avoid silent bugs in the future if new
instructions are added).

Should fix current arm build breakage.

While we're here, fix the print statements dumping the pred and
succ info in the asm listing to pass an int arg to %.4ud
(Prog.pc is a vlong now, due to the liblink merge).

TBR=ken2
CC=golang-codereviews
https://golang.org/cl/62730043
2014-02-13 03:54:55 +00:00
Anthony Martin
27cb59fdad cmd/5g: fix print format in peephole debugging
Fixes #7294.

LGTM=minux.ma, dave, bradfitz
R=golang-codereviews, minux.ma, dave, bradfitz
CC=golang-codereviews
https://golang.org/cl/61370043
2014-02-12 17:03:21 -08:00
Anthony Martin
2cae0591cd cmd/cc, cmd/gc, cmd/ld: consolidate print format routines
We now use the %A, %D, %P, and %R routines from liblink
across the board.

Fixes #7178.
Fixes #7055.

LGTM=iant
R=golang-codereviews, gobot, rsc, dave, iant, remyoudompheng
CC=golang-codereviews
https://golang.org/cl/49170043
2014-02-12 14:29:11 -05:00
Daniel Morsing
85e4cb2f10 cmd/6g, cmd/8g, cmd/5g: make the undefined instruction have no successors
The UNDEF instruction was listed in the instruction data as having the next instruction in the stream as its successor. This confused the optimizer into adding a load where it wasn't needed, in turn confusing the liveness analysis pass for GC bitmaps into thinking that the variable was live.

Fixes #7229.

LGTM=iant, rsc
R=golang-codereviews, bradfitz, iant, dave, rsc
CC=golang-codereviews
https://golang.org/cl/56910045
2014-02-11 20:25:40 +00:00
Russ Cox
4acb70d377 cmd/gc: bypass DATA instruction for data initialized to integer constant
Eventually we will want to bypass DATA for everything,
but the relocations are not standardized well enough across
architectures to make that possible.

This did not help as much as I expected, but it is definitely better.
It shaves maybe 1-2% off all.bash depending on how much you
trust the timings of a single run:

Before: 241.139r 362.702u 112.967s
After:  234.339r 359.623u 111.045s

R=golang-codereviews, gobot, r, iant
CC=golang-codereviews
https://golang.org/cl/44650043
2013-12-20 14:24:39 -05:00
Russ Cox
870e821ded cmd/cc, cmd/gc: update compilers, assemblers for liblink changes
- add buffered stdout to all tools and provide to link ctxt.
- avoid extra \n before ! in .6 files written by assemblers
  (makes them match the C compilers).
- use linkwriteobj instead of linkouthist+linkwritefuncs.
- in assemblers and C compilers, record pc explicitly in Prog,
  for use by liblink.
- in C compilers, preserve jump target links.
- in Go compilers (gsubr.c) attach gotype directly to
  corresponding LSym* instead of rederiving from instruction stream.
- in Go compilers, emit just one definition for runtime.zerovalue
  from each compilation.

This CL consists entirely of small adjustments.
The heavy lifting is in CL 39680043.
Each depends on the other.

R=golang-dev, dave, iant
CC=golang-dev
https://golang.org/cl/37030045
2013-12-16 12:51:38 -05:00
Russ Cox
f606c1be80 cmd/5g, cmd/6g, cmd/8g: use liblink
Preparation for golang.org/s/go13linker work.

This CL does not build by itself. It depends on 35740044
and 35790044 and will be submitted at the same time.

R=iant
CC=golang-dev
https://golang.org/cl/34590045
2013-12-08 22:51:55 -05:00
Carl Shapiro
f056daf075 cmd/5g, cmd/5l, cmd/6g, cmd/6l, cmd/8g, cmd/8l, cmd/gc, runtime: generate pointer maps by liveness analysis
This change allows the garbage collector to examine stack
slots that are determined as live and containing a pointer
value by the garbage collector.  This results in a mean
reduction of 65% in the number of stack slots scanned during
an invocation of "GOGC=1 all.bash".

Unfortunately, this does not yet allow garbage collection to
be precise for the stack slots computed as live.  Pointers
confound the determination of what definitions reach a given
instruction.  In general, this problem is not solvable without
runtime cost but some advanced cooperation from the compiler
might mitigate common cases.

R=golang-dev, rsc, cshapiro
CC=golang-dev
https://golang.org/cl/14430048
2013-12-05 17:35:22 -08:00
Russ Cox
aa0439ba65 cmd/gc: eliminate redundant &x.Field nil checks
This eliminates ~75% of the nil checks being emitted,
on all architectures. We can do better, but we need
a bit more general support from the compiler, and
I don't want to do that so close to Go 1.2.
What's here is simple but effective and safe.
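
A hedged example of the dominant redundant case (the type and function are illustrative):

        package main

        type T struct{ A, B int }

        func addrs(p *T) (*int, *int) {
                a := &p.A // implicit nil check on p emitted here
                b := &p.B // p is already checked on this path, so this check is redundant
                return a, b
        }

        func main() {
                x, y := addrs(&T{A: 1, B: 2})
                println(*x, *y) // 1 2
        }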

A few small code generation cleanups were required
to make the analysis consistent on all systems about
which nil checks are omitted, at least in the test.

Fixes #6019.

R=ken2
CC=golang-dev
https://golang.org/cl/13334052
2013-09-17 16:54:22 -04:00
Rémy Oudompheng
ff416a3f19 cmd/gc: inline copy in frontend to call memmove directly.
A new node type OSPTR is added to refer to the data pointer of
strings and slices in a simple way during walk(). It will be
useful for future work on simplification of slice arithmetic.
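
A runnable sketch of roughly what the inlined copy amounts to for byte slices: clamp to the smaller length, then move the bytes through the data pointers (which is what OSPTR exposes). The memmove below is a plain Go stand-in for the runtime routine, not its real implementation.

        package main

        import (
                "fmt"
                "unsafe"
        )

        // memmove stands in for the runtime routine that the inlined copy calls.
        func memmove(dst, src unsafe.Pointer, n uintptr) {
                d := (*[1 << 20]byte)(dst)
                s := (*[1 << 20]byte)(src)
                copy(d[:n], s[:n])
        }

        func main() {
                src := []byte("hello, world")
                dst := make([]byte, 5)
                n := len(src)
                if len(dst) < n { // copy moves min(len(dst), len(src)) bytes
                        n = len(dst)
                }
                if n > 0 {
                        memmove(unsafe.Pointer(&dst[0]), unsafe.Pointer(&src[0]), uintptr(n))
                }
                fmt.Println(string(dst)) // hello
        }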

benchmark                  old ns/op    new ns/op    delta
BenchmarkCopy1Byte                 9            8  -13.98%
BenchmarkCopy2Byte                14            8  -40.49%
BenchmarkCopy4Byte                13            8  -35.04%
BenchmarkCopy8Byte                13            8  -37.10%
BenchmarkCopy12Byte               14           12  -15.38%
BenchmarkCopy16Byte               14           12  -17.24%
BenchmarkCopy32Byte               19           14  -27.32%
BenchmarkCopy128Byte              31           26  -15.29%
BenchmarkCopy1024Byte            100           92   -7.50%
BenchmarkCopy1String              10            7  -28.99%
BenchmarkCopy2String              10            7  -28.06%
BenchmarkCopy4String              10            8  -22.69%
BenchmarkCopy8String              10            8  -23.30%
BenchmarkCopy12String             11           11   -5.88%
BenchmarkCopy16String             11           11   -5.08%
BenchmarkCopy32String             15           14   -6.58%
BenchmarkCopy128String            28           25  -10.60%
BenchmarkCopy1024String           95           95   +0.53%

R=golang-dev, bradfitz, cshapiro, dave, daniel.morsing, rsc, khr, khr
CC=golang-dev
https://golang.org/cl/9101048
2013-09-12 00:15:28 +02:00
Russ Cox
6d47de2f40 cmd/5g, cmd/6g, cmd/8g: remove O(n) reset loop in copyprop
Simpler version of CL 13084043.

R=ken2
CC=golang-dev
https://golang.org/cl/13602045
2013-09-11 15:22:11 -04:00
Russ Cox
a0bc379d46 undo CL 13084043 / ef4ee02a5853
There is a cleaner, simpler way.

««« original CL description
cmd/5g, cmd/6g, cmd/8g: faster compilation
Replace linked list walk with memset.
This reduces CPU time taken by 'go install -a std' by ~10%.
Before:
real		user		sys
0m23.561s	0m16.625s	0m5.848s
0m23.766s	0m16.624s	0m5.846s
0m23.742s	0m16.621s	0m5.868s
after:
0m22.714s	0m14.858s	0m6.138s
0m22.644s	0m14.875s	0m6.120s
0m22.604s	0m14.854s	0m6.081s

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/13084043
»»»

TBR=dvyukov
CC=golang-dev
https://golang.org/cl/13352049
2013-09-11 15:14:11 -04:00
Russ Cox
af2a3193af cmd/5g, cmd/6g, cmd/8g: simplify for loop in bitmap generation
Lucio De Re reports that the more complex
loop miscompiles on Plan 9.

R=ken2
CC=golang-dev
https://golang.org/cl/13602043
2013-09-06 16:49:11 -04:00
Dmitriy Vyukov
79dca0327e libbio, all cmd: consistently use BGETC/BPUTC instead of Bgetc/Bputc
Also introduce BGET2/4, BPUT2/4 as they are widely used.
Slightly improve BGETC/BPUTC implementation.
This gives ~5% CPU time improvement on go install -a -p1 std.
Before:
real		user		sys
0m23.561s	0m16.625s	0m5.848s
0m23.766s	0m16.624s	0m5.846s
0m23.742s	0m16.621s	0m5.868s
after:
0m22.999s	0m15.841s	0m5.889s
0m22.845s	0m15.808s	0m5.850s
0m22.889s	0m15.832s	0m5.848s

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/12745047
2013-08-30 15:46:12 +04:00
Rémy Oudompheng
4fc7ff497d cmd/5g: avoid clash between R13 and F3 registers.
Fixes #6247.

R=golang-dev, lucio.dere, bradfitz
CC=golang-dev
https://golang.org/cl/13216043
2013-08-27 21:09:16 +02:00
Dmitriy Vyukov
06e686def6 cmd/5g, cmd/6g, cmd/8g: faster compilation
Replace linked list walk with memset.
This reduces CPU time taken by 'go install -a std' by ~10%.
Before:
real		user		sys
0m23.561s	0m16.625s	0m5.848s
0m23.766s	0m16.624s	0m5.846s
0m23.742s	0m16.621s	0m5.868s
after:
0m22.714s	0m14.858s	0m6.138s
0m22.644s	0m14.875s	0m6.120s
0m22.604s	0m14.854s	0m6.081s

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/13084043
2013-08-21 14:20:28 +04:00
Russ Cox
3b4d792606 cmd/gc: separate "has pointers" from "needs zeroing" in stack frame
When the new call site-specific frame bitmaps are available,
we can cut the zeroing to just those values that need it due
to scope escaping.

R=cshapiro, cshapiro
CC=golang-dev
https://golang.org/cl/13045043
2013-08-16 21:45:59 -04:00
Carl Shapiro
d3b04f46b5 cmd/5g, cmd/6g, cmd/8g: update frame zeroing for new bitmap format
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12740046
2013-08-16 01:15:04 -04:00
Russ Cox
999a36f9af cmd/gc: &x panics if x does
See golang.org/s/go12nil.

This CL is about getting all the right checks inserted.
A followup CL will add an optimization pass to
remove redundant checks.
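
A minimal example of the inserted check at work (per golang.org/s/go12nil, &t.F must panic when t is nil rather than produce a bad pointer):

        package main

        import "fmt"

        type T struct{ F int }

        func main() {
                defer func() {
                        fmt.Println("recovered:", recover())
                }()
                var t *T
                _ = &t.F // t is nil, so the inserted check panics here
        }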

R=ken2
CC=golang-dev
https://golang.org/cl/12970043
2013-08-15 14:38:32 -04:00
Rémy Oudompheng
0d9db86f74 cmd/5g, cmd/6g, cmd/8g: restore occurrences of R replaced by nil in comments.
R=golang-dev, bradfitz, rsc
CC=golang-dev
https://golang.org/cl/12842043
2013-08-14 21:24:48 +02:00
Russ Cox
72636eb506 cmd/5g: fix temp-merging on ARM
mkvar was taking care of the "LeftAddr" case,
effectively hiding it from the temp-merging optimization.

Move it into prog.c.

R=ken2
CC=golang-dev
https://golang.org/cl/12884045
2013-08-14 00:34:18 -04:00
Russ Cox
fa72679f07 cmd/gc: add temporary-merging optimization pass
The compilers assume they can generate temporary variables
as needed to preserve the right semantics or simplify code
generation and the back end will still generate good code.
This turns out not to be true. The back ends will only
track the first 128 variables per function and give up
on the remainder. That needs to be fixed too, in a later CL.

This CL merges temporary variables with equal types and
non-overlapping lifetimes using the greedy algorithm in
Poletto and Sarkar, "Linear Scan Register Allocation",
ACM TOPLAS 1999.
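
A hedged sketch of the greedy merge, ignoring the equal-type and equal-width requirement: temporaries are treated as live intervals, and each one reuses the first stack slot whose previous occupant has already died.

        package main

        import "fmt"

        type interval struct{ start, end int } // live range of one temporary

        // mergeTemps assumes temps is sorted by start and returns, for each
        // temporary, the index of the stack slot it shares.
        func mergeTemps(temps []interval) []int {
                var slotEnd []int // end of the interval currently holding each slot
                assign := make([]int, len(temps))
                for i, t := range temps {
                        assign[i] = -1
                        for s, end := range slotEnd {
                                if end <= t.start { // lifetimes do not overlap: reuse slot s
                                        slotEnd[s] = t.end
                                        assign[i] = s
                                        break
                                }
                        }
                        if assign[i] < 0 { // no free slot: allocate a new one
                                slotEnd = append(slotEnd, t.end)
                                assign[i] = len(slotEnd) - 1
                        }
                }
                return assign
        }

        func main() {
                temps := []interval{{0, 3}, {1, 4}, {3, 6}, {5, 7}}
                fmt.Println(mergeTemps(temps)) // [0 1 0 1]: four temps fit in two slots
        }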

The result can be striking in the right functions.

Top 20 frame size changes in a 6g godoc binary by bytes saved:

5464 1984 (-3480, -63.7%) go/build.(*Context).Import
4456 1824 (-2632, -59.1%) go/printer.(*printer).expr1
2560   80 (-2480, -96.9%) time.nextStdChunk
3496 1608 (-1888, -54.0%) go/printer.(*printer).stmt
1896  272 (-1624, -85.7%) net/http.init
2688 1400 (-1288, -47.9%) fmt.(*pp).printReflectValue
2800 1512 (-1288, -46.0%) main.main
3296 2016 (-1280, -38.8%) crypto/tls.(*Conn).clientHandshake
1664  488 (-1176, -70.7%) time.loadZoneZip
1760  608 (-1152, -65.5%) time.parse
4104 3072 (-1032, -25.1%) runtime/pprof.writeHeap
1680  712 ( -968, -57.6%) go/ast.Walk
2488 1560 ( -928, -37.3%) crypto/x509.parseCertificate
1128  392 ( -736, -65.2%) math/big.nat.divLarge
1528  864 ( -664, -43.5%) go/printer.(*printer).fieldList
1360  712 ( -648, -47.6%) regexp/syntax.(*parser).factor
2104 1528 ( -576, -27.4%) encoding/asn1.parseField
1064  504 ( -560, -52.6%) encoding/xml.(*Decoder).text
 584   48 ( -536, -91.8%) html.init
1400  864 ( -536, -38.3%) go/doc.playExample

In the same godoc build, cuts the number of functions with
too many vars from 83 to 32.

R=ken2
CC=golang-dev
https://golang.org/cl/12829043
2013-08-13 00:09:31 -04:00
Russ Cox
dbf96addfb cmd/gc: move flow graph into portable opt
Now there's only one copy of the flow graph construction
and dominator computation, and different optimizations
can attach different annotations to the instructions.

R=ken2
CC=golang-dev
https://golang.org/cl/12797045
2013-08-12 22:02:10 -04:00
Russ Cox
b3b87143f2 cmd/gc: support for "portable" optimization logic
Code in gc/popt.c is compiled as part of 5g, 6g, and 8g,
meaning it can use arch-specific headers but there's
just one copy of the code.

This is the same arrangement we use for the portable
code generation logic in gc/pgen.c.

Move fixjmp and noreturn there to get the ball rolling.

R=ken2
CC=golang-dev
https://golang.org/cl/12789043
2013-08-12 19:14:02 -04:00
Russ Cox
a07218385c cmd/5g: factor out prog information
Like CL 12637051, but for 5g instead of 6g.

R=ken2
CC=golang-dev
https://golang.org/cl/12779043
2013-08-12 13:42:23 -04:00
Russ Cox
7910cd68b5 cmd/gc: zero pointers on entry to function
On entry to a function, zero the results and zero the pointer
section of the local variables.

This is an intermediate step on the way to precise collection
of Go frames.

This can incur a significant (up to 30%) slowdown, but it also ensures
that the garbage collector never looks at a word in a Go frame
and sees a stale pointer value that could cause a space leak.
(C frames and assembly frames are still possibly problematic.)

This CL is required to start making collection of interface values
as precise as collection of pointer values is today.
Since we have to dereference the interface type to understand
whether the value is a pointer, it is critical that the type field be
initialized.
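
A small illustration (not tied to this CL's code) of why the type word matters: the collector can only decide whether an interface's data word is a pointer by consulting its type word, so both frame words must start out zeroed.

        package main

        import "fmt"

        func main() {
                var v interface{} // two frame words: type pointer and data word
                // If the type word held stale garbage, the collector could not
                // tell how to interpret the data word.
                fmt.Println(v == nil) // true: both words read as zero
        }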

A future CL by Carl will make the garbage collection pointer
bitmaps context-sensitive. At that point it will be possible to
remove most of the zeroing. The only values that will still need
zeroing are values whose addresses escape the block scoping
of the function but do not escape to the heap.

benchmark                         old ns/op    new ns/op    delta
BenchmarkBinaryTree17            4420289180   4331060459   -2.02%
BenchmarkFannkuch11              3442469663   3277706251   -4.79%
BenchmarkFmtFprintfEmpty                100          142  +42.00%
BenchmarkFmtFprintfString               262          310  +18.32%
BenchmarkFmtFprintfInt                  213          281  +31.92%
BenchmarkFmtFprintfIntInt               355          431  +21.41%
BenchmarkFmtFprintfPrefixedInt          321          383  +19.31%
BenchmarkFmtFprintfFloat                444          533  +20.05%
BenchmarkFmtManyArgs                   1380         1559  +12.97%
BenchmarkGobDecode                 10240054     11794915  +15.18%
BenchmarkGobEncode                 17350274     19970478  +15.10%
BenchmarkGzip                     455179460    460699139   +1.21%
BenchmarkGunzip                   114271814    119291574   +4.39%
BenchmarkHTTPClientServer             89051        89894   +0.95%
BenchmarkJSONEncode                40486799     52691558  +30.15%
BenchmarkJSONDecode                94193361    112428781  +19.36%
BenchmarkMandelbrot200              4747060      4748043   +0.02%
BenchmarkGoParse                    6363798      6675098   +4.89%
BenchmarkRegexpMatchEasy0_32            129          171  +32.56%
BenchmarkRegexpMatchEasy0_1K            365          395   +8.22%
BenchmarkRegexpMatchEasy1_32            106          152  +43.40%
BenchmarkRegexpMatchEasy1_1K            952         1245  +30.78%
BenchmarkRegexpMatchMedium_32           198          283  +42.93%
BenchmarkRegexpMatchMedium_1K         79006       101097  +27.96%
BenchmarkRegexpMatchHard_32            3478         5115  +47.07%
BenchmarkRegexpMatchHard_1K          110245       163582  +48.38%
BenchmarkRevcomp                  777384355    793270857   +2.04%
BenchmarkTemplate                 136713089    157093609  +14.91%
BenchmarkTimeParse                     1511         1761  +16.55%
BenchmarkTimeFormat                     535          850  +58.88%

benchmark                          old MB/s     new MB/s  speedup
BenchmarkGobDecode                    74.95        65.07    0.87x
BenchmarkGobEncode                    44.24        38.43    0.87x
BenchmarkGzip                         42.63        42.12    0.99x
BenchmarkGunzip                      169.81       162.67    0.96x
BenchmarkJSONEncode                   47.93        36.83    0.77x
BenchmarkJSONDecode                   20.60        17.26    0.84x
BenchmarkGoParse                       9.10         8.68    0.95x
BenchmarkRegexpMatchEasy0_32         247.24       186.31    0.75x
BenchmarkRegexpMatchEasy0_1K        2799.20      2591.93    0.93x
BenchmarkRegexpMatchEasy1_32         299.31       210.44    0.70x
BenchmarkRegexpMatchEasy1_1K        1074.71       822.45    0.77x
BenchmarkRegexpMatchMedium_32          5.04         3.53    0.70x
BenchmarkRegexpMatchMedium_1K         12.96        10.13    0.78x
BenchmarkRegexpMatchHard_32            9.20         6.26    0.68x
BenchmarkRegexpMatchHard_1K            9.29         6.26    0.67x
BenchmarkRevcomp                     326.95       320.40    0.98x
BenchmarkTemplate                     14.19        12.35    0.87x

R=cshapiro
CC=golang-dev
https://golang.org/cl/12616045
2013-08-09 23:10:58 -04:00
Rémy Oudompheng
357f733695 cmd/5c, cmd/5g, cmd/5l: turn MOVB, MOVH into plain moves, optimize short arithmetic.
Pseudo-instructions MOVBS and MOVHS are used to clarify
the semantics of short integers vs. registers:
 * 8-bit and 16-bit values in registers are assumed to always
   be zero-extended or sign-extended depending on their type.
 * MOVB is truncation or move of an already extended value
   between registers.
 * MOVBU enforces zero-extension at the destination (register).
 * MOVBS enforces sign-extension at the destination (register).
And similarly for MOVH/MOVHS/MOVHU.

The linker is adapted to assemble MOVB and MOVH to an ordinary
mov. Also a peephole pass in 5g that aims at eliminating
redundant zero/sign extensions is improved.

encoding/binary:
benchmark                              old ns/op    new ns/op    delta
BenchmarkReadSlice1000Int32s              220387       217185   -1.45%
BenchmarkReadStruct                        12839        12910   +0.55%
BenchmarkReadInts                           5692         5534   -2.78%
BenchmarkWriteInts                          6137         6016   -1.97%
BenchmarkPutUvarint32                        257          241   -6.23%
BenchmarkPutUvarint64                        812          754   -7.14%
benchmark                               old MB/s     new MB/s  speedup
BenchmarkReadSlice1000Int32s               18.15        18.42    1.01x
BenchmarkReadStruct                         5.45         5.42    0.99x
BenchmarkReadInts                           5.27         5.42    1.03x
BenchmarkWriteInts                          4.89         4.99    1.02x
BenchmarkPutUvarint32                      15.56        16.57    1.06x
BenchmarkPutUvarint64                       9.85        10.60    1.08x

crypto/des:
benchmark                              old ns/op    new ns/op    delta
BenchmarkEncrypt                            7002         5169  -26.18%
BenchmarkDecrypt                            7015         5195  -25.94%
benchmark                               old MB/s     new MB/s  speedup
BenchmarkEncrypt                            1.14         1.55    1.36x
BenchmarkDecrypt                            1.14         1.54    1.35x

strconv:
benchmark                              old ns/op    new ns/op    delta
BenchmarkAtof64Decimal                       457          385  -15.75%
BenchmarkAtof64Float                         574          479  -16.55%
BenchmarkAtof64FloatExp                     1035          906  -12.46%
BenchmarkAtof64Big                          1793         1457  -18.74%
BenchmarkAtof64RandomBits                   2267         2066   -8.87%
BenchmarkAtof64RandomFloats                 1416         1194  -15.68%
BenchmarkAtof32Decimal                       451          379  -15.96%
BenchmarkAtof32Float                         547          435  -20.48%
BenchmarkAtof32FloatExp                     1095          986   -9.95%
BenchmarkAtof32Random                       1154         1006  -12.82%
BenchmarkAtoi                               1415         1380   -2.47%
BenchmarkAtoiNeg                            1414         1401   -0.92%
BenchmarkAtoi64                             1744         1671   -4.19%
BenchmarkAtoi64Neg                          1737         1662   -4.32%

Fixes #1837.

R=rsc, dave, bradfitz
CC=golang-dev
https://golang.org/cl/12424043
2013-08-09 06:43:17 +02:00
Rémy Oudompheng
3670337097 cmd/5c, cmd/5g, cmd/5l: introduce MOVBS and MOVHS instructions.
MOVBS and MOVHS are defined as duplicates of MOVB and MOVH,
and perform sign-extending moves.
No change is made to code generation.

Update #1837

R=rsc, bradfitz
CC=golang-dev
https://golang.org/cl/12682043
2013-08-08 23:28:53 +02:00
Dmitriy Vyukov
8679d5f2b5 cmd/gc: record argument size for all indirect function calls
This is required to properly unwind reflect.methodValueCall/makeFuncStub.
Fixes #5954.
Stats for 'go install std':
61849 total INSTCALL
24655 currently have ArgSize metadata
27278 have ArgSize metadata with this change
godoc size before: 11351888, after: 11364288

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12163043
2013-07-31 20:00:33 +04:00
Russ Cox
48769bf546 runtime: use funcdata to supply garbage collection information
This CL introduces a FUNCDATA number for runtime-specific
garbage collection metadata, changes the C and Go compilers
to emit that metadata, and changes the runtime to expect it.

The old pseudo-instructions that carried this information
are gone, as is the linker code to process them.

R=golang-dev, dvyukov, cshapiro
CC=golang-dev
https://golang.org/cl/11406044
2013-07-19 16:04:09 -04:00
Russ Cox
7b3c8b7ac8 cmd/5g, cmd/6g, cmd/8g: insert arg size annotations on runtime calls
If calling a function in package runtime, emit argument size
information around the call in case the call is to a variadic C function.

R=ken2
CC=golang-dev
https://golang.org/cl/11371043
2013-07-16 16:25:10 -04:00