The loop recognizer uses the standard dominance
frontiers but gets confused by dead code, which
has a (not explicitly set) rpo number of 0, meaning it
looks like the head of the function, so it dominates
everything. If the loop recognizer encounters dead
code while tracking backward through the graph
it fails to recognize where it started as a loop, and
then the optimizer does not registerize values loaded
inside that loop. Fix by checking rpo against rpo2r.
Separately, run a quick pass over the generated
code to squash JMPs to JMP instructions, which
are convenient to emit during code generation but
difficult to read when debugging the -S output.
A side effect of this pass is to eliminate dead code,
so the output files may be slightly smaller and the
optimizer may have less work to do.
There is no semantic effect, because the linkers
flatten JMP chains and delete dead instructions
when laying out the final code. Doing it here too
just makes the -S output easier to read and more
like what the final binary will contain.
The "dead code breaks loop finding" bug is thus
fixed twice over. It seemed prudent to fix loopit
separately just in case dead code ever sneaks back
in for one reason or another.
R=ken2
CC=golang-dev
https://golang.org/cl/5190043
My previous CL:
changeset: 9645:ce2e5f44b310
user: Russ Cox <rsc@golang.org>
date: Tue Sep 06 10:24:21 2011 -0400
summary: gc: unify stack frame layout
introduced a bug wherein no variables were
being registerized, making Go programs 2-3x
slower than they had been before.
This CL fixes that bug (along with some others
it was hiding) and adds a test that optimization
makes at least one test case faster.
R=ken2
CC=golang-dev
https://golang.org/cl/5174045
allocparams + tempname + compactframe
all knew about how to place stack variables.
Now only compactframe, renamed to allocauto,
does the work. Until the last minute, each PAUTO
variable is in its own space and has xoffset == 0.
This might break 5g. I get failures in concurrent
code running under qemu and I can't tell whether
it's 5g's fault or qemu's. We'll see what the real
ARM builders say.
R=ken2
CC=golang-dev
https://golang.org/cl/4973057
Does as much as possible in data layout instead
of during the init function.
Handles var x = y; var y = z as a special case too,
because it is so prevalent in package unicode
(var Greek = _Greek; var _Greek = []...).
Introduces InitPlan description of initialized data
so that it can be traversed multiple times (for example,
in the copy handler).
Cuts package unicode's init function size by 8x.
All that remains there is map initialization, which
is on the chopping block too.
Fixes sinit.go test case.
Aggregate DATA instructions at end of object file.
Checkpoint. More to come.
R=ken2
CC=golang-dev
https://golang.org/cl/4969051
8g/cgen.c:
8g/gobj.c
. dropped unnecessary assignments;
8g/gg.h
. added varargckk pragmas;
8g/ggen.c
. dropped duplicate assignment;
8g/gsubr.c
. adjusted format in print statement;
. dropped unnecessary assignment;
. replaced GCC's _builtin_return_address(0) with Plan 9's
getcallerpc(&n) which is defined as a macro in <u.h>;
8g/list.c
. adjusted format in snprint statement;
8g/opt.h
. added varargck pragma (Adr*) that is specific for the invoking
modules;
8g/peep.c
. dropped unnecessary incrementation;
R=rsc
CC=golang-dev
https://golang.org/cl/4974044
#include "go.h" (or "gg.h")
becomes
#include <u.h>
#include <libc.h>
#include "go.h"
so that go.y can #include <stdio.h>
after <u.h> but before "go.h".
This is necessary on Plan 9.
R=ken2
CC=golang-dev
https://golang.org/cl/4971041
Required moving some parts of gc/pgen.c to ?g/ggen.c
on linux tests pass for all 3 architectures, and
frames are actually compacted (diagnostic code for
that has been removed from the CL).
R=rsc
CC=golang-dev
https://golang.org/cl/4571071
After allocparams and walk, remove unused auto variables
and re-layout the remaining in reverse alignment order.
R=rsc
CC=golang-dev
https://golang.org/cl/4568068
Input code like
0000 (x.go:2) TEXT main+0(SB),$36-0
0001 (x.go:3) MOVL $5,i+-8(SP)
0002 (x.go:3) MOVL $0,i+-4(SP)
0003 (x.go:4) MOVL $1,BX
0004 (x.go:4) MOVL i+-8(SP),AX
0005 (x.go:4) MOVL i+-4(SP),DX
0006 (x.go:4) MOVL AX,autotmp_0000+-20(SP)
0007 (x.go:4) MOVL DX,autotmp_0000+-16(SP)
0008 (x.go:4) MOVL autotmp_0000+-20(SP),CX
0009 (x.go:4) CMPL autotmp_0000+-16(SP),$0
0010 (x.go:4) JNE ,13
0011 (x.go:4) CMPL CX,$32
0012 (x.go:4) JCS ,14
0013 (x.go:4) MOVL $0,BX
0014 (x.go:4) SHLL CX,BX
0015 (x.go:4) MOVL BX,x+-12(SP)
0016 (x.go:5) MOVL x+-12(SP),AX
0017 (x.go:5) CDQ ,
0018 (x.go:5) MOVL AX,autotmp_0001+-28(SP)
0019 (x.go:5) MOVL DX,autotmp_0001+-24(SP)
0020 (x.go:5) MOVL autotmp_0001+-28(SP),AX
0021 (x.go:5) MOVL autotmp_0001+-24(SP),DX
0022 (x.go:5) MOVL AX,(SP)
0023 (x.go:5) MOVL DX,4(SP)
0024 (x.go:5) CALL ,runtime.printint+0(SB)
0025 (x.go:5) CALL ,runtime.printnl+0(SB)
0026 (x.go:6) RET ,
is problematic because the liveness range for
autotmp_0000 (0006-0009) is nested completely
inside a span where BX holds a live value (0003-0015).
Because the register allocator only looks at 0006-0009
to see which registers are used, it misses the fact that
BX is unavailable and uses it anyway.
The n->pun = anyregalloc() check in tempname is
a workaround for this bug, but I hit it again because
I did the tempname call before allocating BX, even
though I then used the temporary after storing in BX.
This should fix the real bug, and then we can remove
the workaround in tempname.
The code creates pseudo-variables for each register
and includes that information in the liveness propagation.
Then the regu fields can be populated using that more
complete information. With that approach, BX is marked
as in use on every line in the whole span 0003-0015,
so that the decision about autotmp_0000
(using only 0006-0009) still has all the information
it needs.
This is not specific to the 386, but it only happens in
generated code of the form
load R1
...
load var into R2
...
store R2 back into var
...
use R1
and for the most part the other compilers generate
the loads for a given compiled line before any of
the stores. Even so, this may not be the case everywhere,
so the change is worth making in all three.
R=ken2, ken, ken
CC=golang-dev
https://golang.org/cl/4529106
The code for converting negative floats was
incorrectly loading an FP control word from
the stack without ever having stored it there.
Thanks to Lars Pensjö for reporting this bug.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/4515091
Also, 6g was passing uninitialized
Node &n2 to regalloc, causing non-deterministic
register collisions (but only when both left and
right hand side of comparison had function calls).
Fixes#1728.
R=ken2
CC=golang-dev
https://golang.org/cl/4425070
The ld time was dominated by symbol table processing, so
* increase hash table size
* emit fewer symbols in gc (just 1 per string, 1 per type)
* add read-only lookup to avoid creating spurious symbols
* add linked list to speed whole-table traversals
Breaks dwarf generator (no idea why), so disable dwarf.
Reduces time for 6l to link godoc by 25%.
R=ken2
CC=golang-dev
https://golang.org/cl/4383047
The recent linker changes broke NaCl support
a month ago, and there are no known users of it.
The NaCl code can always be recovered from the
repository history.
R=adg, r
CC=golang-dev
https://golang.org/cl/3671042
No semantic changes here, but working
toward being able to align structs based
on the maximum alignment of the fields
inside instead of having a fixed alignment
for all structs (issue 482).
R=ken2
CC=golang-dev
https://golang.org/cl/3617041
If an %lld argument can be 32 or 64 bits wide, cast to vlong.
If always 32 bits, drop the ll.
Fixes#1336.
R=brainman, rsc
CC=golang-dev
https://golang.org/cl/3580041
Just enough to make mov instructions work,
which in turn is enough to make strconv work
when it avoids any floating point calculations.
That makes a bunch of other packages pass
their tests.
Should suffice until hardware floating point
is available.
Enable package tests that now pass
(some due to earlier fixes).
Looks like there is a new integer math bug
exposed in the fmt and json tests.
R=ken2
CC=golang-dev
https://golang.org/cl/2638041
The Plan 9 tools assume that long is 32 bits.
We converted all instances of long to int32 when
importing the code but missed the print formats.
Because int32 is always int on the compilers we use,
it is never correct to use %lux, %ld, etc. Convert to %ux, %d, etc.
(It matters because on 64-bit gcc, long is 64 bits,
so we were printing 32-bit quantities with 64-bit formats.)
R=ken2
CC=golang-dev
https://golang.org/cl/2491041