Drop expecttaken function in favor of extra argument
to gbranch and bgen. Mark loop condition as likely to
be true, so that loops are generated inline.
The main benefit here is contiguous code when trying
to read the generated assembly. It has only minor effects
on the timing, and they mostly cancel the minor effects
that aligning function entry points had. One exception:
both changes made Fannkuch faster.
Compared to before CL 6244066 (before aligned functions)
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 4222117400 4201958800 -0.48%
BenchmarkFannkuch11 3462631800 3215908600 -7.13%
BenchmarkGobDecode 20887622 20899164 +0.06%
BenchmarkGobEncode 9548772 9439083 -1.15%
BenchmarkGzip 151687 152060 +0.25%
BenchmarkGunzip 8742 8711 -0.35%
BenchmarkJSONEncode 62730560 62686700 -0.07%
BenchmarkJSONDecode 252569180 252368960 -0.08%
BenchmarkMandelbrot200 5267599 5252531 -0.29%
BenchmarkRevcomp25M 980813500 985248400 +0.45%
BenchmarkTemplate 361259100 357414680 -1.06%
Compared to tip (aligned functions):
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 4140739800 4201958800 +1.48%
BenchmarkFannkuch11 3259914400 3215908600 -1.35%
BenchmarkGobDecode 20620222 20899164 +1.35%
BenchmarkGobEncode 9384886 9439083 +0.58%
BenchmarkGzip 150333 152060 +1.15%
BenchmarkGunzip 8741 8711 -0.34%
BenchmarkJSONEncode 65210990 62686700 -3.87%
BenchmarkJSONDecode 249394860 252368960 +1.19%
BenchmarkMandelbrot200 5273394 5252531 -0.40%
BenchmarkRevcomp25M 996013800 985248400 -1.08%
BenchmarkTemplate 360620840 357414680 -0.89%
R=ken2
CC=golang-dev
https://golang.org/cl/6245069
* Eliminate bounds check on known small shifts.
* Rewrite x<<s | x>>(32-s) as a rotate (constant s).
* More aggressive (but still minimal) range analysis.
R=ken, dave, iant
CC=golang-dev
https://golang.org/cl/6209077
Using reg as the flag word was unfortunate, since the
default value is not 0 but NREG (==16), which happens
to be the bit NOPTR now. Clear it.
If I say this will fix the build, it won't.
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/5690072
cc: add #pragma textflag to set it
runtime: mark mheap to go into noptr-bss.
remove special case in garbage collector
Remove the ARM from.flag field created by CL 5687044.
The DUPOK flag was already in p->reg, so keep using that.
Otherwise test/nilptr.go creates a very large binary.
Should fix the arm build.
Diagnosed by minux.ma; replacement for CL 5690044.
R=golang-dev, minux.ma, r
CC=golang-dev
https://golang.org/cl/5686060
Such variables would be put at 0(SP), leading to serious
corruptions at zero initialization.
Fixes#3084.
R=golang-dev, r
CC=golang-dev, remy
https://golang.org/cl/5683052
The alternative is to record enough information that the
trap handler know which registers contain cached globals
and can flush the registers back to their original locations.
That's significantly more work.
This only affects globals that have been written to.
Code that reads from a global should continue to registerize
as well as before.
Fixes#1304.
R=ken2
CC=golang-dev
https://golang.org/cl/5687046
ARM doesn't have the concept of scale, so I renamed the field
Addr.scale to Addr.flag to better reflect its true meaning.
R=rsc
CC=golang-dev
https://golang.org/cl/5687044
The garbage collector can avoid scanning this section, with
reduces collection time as well as the number of false positives.
Helps a little bit with issue 909, but certainly does not solve it.
R=ken2
CC=golang-dev
https://golang.org/cl/5671099
If the values being compared have different concrete types,
then they're clearly unequal without needing to invoke the
actual interface compare routine. This speeds tests for
specific values, like if err == io.EOF, by about 3x.
benchmark old ns/op new ns/op delta
BenchmarkIfaceCmp100 843 287 -65.95%
BenchmarkIfaceCmpNil100 184 182 -1.09%
Fixes#2591.
R=ken2
CC=golang-dev
https://golang.org/cl/5651073
As a convenience to people working on the tools,
leave Makefiles that invoke the go dist tool appropriately.
They are not used during the build.
R=golang-dev, bradfitz, n13m3y3r, gustavo
CC=golang-dev
https://golang.org/cl/5636050
This can happen on Plan 9 if we we're building
with the 32-bit and 64-bit host compilers, one
after the other.
R=rsc
CC=golang-dev
https://golang.org/cl/5599053
Also delete gotest, since it's messy to fix and slated for deletion anyway.
A couple of things outside src can't be tested any more. "go test" will be
fixed and these tests will be re-enabled. They're noisy for now.
Fixes#284.
R=rsc
CC=golang-dev
https://golang.org/cl/5598049
This fixes issue 2444.
A big cleanup of all 31/32bit size boundaries i'll leave for another cl though. (see also issue 1700).
R=rsc
CC=golang-dev
https://golang.org/cl/5484058
To allow these types as map keys, we must fill in
equal and hash functions in their algorithm tables.
Structs or arrays that are "just memory", like [2]int,
can and do continue to use the AMEM algorithm.
Structs or arrays that contain special values like
strings or interface values use generated functions
for both equal and hash.
The runtime helper func runtime.equal(t, x, y) bool handles
the general equality case for x == y and calls out to
the equal implementation in the algorithm table.
For short values (<= 4 struct fields or array elements),
the sequence of elementwise comparisons is inlined
instead of calling runtime.equal.
R=ken, mpimenov
CC=golang-dev
https://golang.org/cl/5451105
The loop recognizer uses the standard dominance
frontiers but gets confused by dead code, which
has a (not explicitly set) rpo number of 0, meaning it
looks like the head of the function, so it dominates
everything. If the loop recognizer encounters dead
code while tracking backward through the graph
it fails to recognize where it started as a loop, and
then the optimizer does not registerize values loaded
inside that loop. Fix by checking rpo against rpo2r.
Separately, run a quick pass over the generated
code to squash JMPs to JMP instructions, which
are convenient to emit during code generation but
difficult to read when debugging the -S output.
A side effect of this pass is to eliminate dead code,
so the output files may be slightly smaller and the
optimizer may have less work to do.
There is no semantic effect, because the linkers
flatten JMP chains and delete dead instructions
when laying out the final code. Doing it here too
just makes the -S output easier to read and more
like what the final binary will contain.
The "dead code breaks loop finding" bug is thus
fixed twice over. It seemed prudent to fix loopit
separately just in case dead code ever sneaks back
in for one reason or another.
R=ken2
CC=golang-dev
https://golang.org/cl/5190043
My previous CL:
changeset: 9645:ce2e5f44b310
user: Russ Cox <rsc@golang.org>
date: Tue Sep 06 10:24:21 2011 -0400
summary: gc: unify stack frame layout
introduced a bug wherein no variables were
being registerized, making Go programs 2-3x
slower than they had been before.
This CL fixes that bug (along with some others
it was hiding) and adds a test that optimization
makes at least one test case faster.
R=ken2
CC=golang-dev
https://golang.org/cl/5174045
allocparams + tempname + compactframe
all knew about how to place stack variables.
Now only compactframe, renamed to allocauto,
does the work. Until the last minute, each PAUTO
variable is in its own space and has xoffset == 0.
This might break 5g. I get failures in concurrent
code running under qemu and I can't tell whether
it's 5g's fault or qemu's. We'll see what the real
ARM builders say.
R=ken2
CC=golang-dev
https://golang.org/cl/4973057
Does as much as possible in data layout instead
of during the init function.
Handles var x = y; var y = z as a special case too,
because it is so prevalent in package unicode
(var Greek = _Greek; var _Greek = []...).
Introduces InitPlan description of initialized data
so that it can be traversed multiple times (for example,
in the copy handler).
Cuts package unicode's init function size by 8x.
All that remains there is map initialization, which
is on the chopping block too.
Fixes sinit.go test case.
Aggregate DATA instructions at end of object file.
Checkpoint. More to come.
R=ken2
CC=golang-dev
https://golang.org/cl/4969051
5g/cgen.c:
. USED(n4) as it is only mentioned in unreachable code later;
. dropped unused assignments;
. commented out unreachable code;
5g/cgen64.c:
5g/ggen.c:
. dropped unused assignments of function return value;
5g/gg.h:
. added varargck pragmas;
5g/peep.c:
. USED(p1) used only in unreacheable code;
. commented out unreachable code;
5g/reg.c:
. dropped unused assignment;
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/4953048
#include "go.h" (or "gg.h")
becomes
#include <u.h>
#include <libc.h>
#include "go.h"
so that go.y can #include <stdio.h>
after <u.h> but before "go.h".
This is necessary on Plan 9.
R=ken2
CC=golang-dev
https://golang.org/cl/4971041
Required moving some parts of gc/pgen.c to ?g/ggen.c
on linux tests pass for all 3 architectures, and
frames are actually compacted (diagnostic code for
that has been removed from the CL).
R=rsc
CC=golang-dev
https://golang.org/cl/4571071
After allocparams and walk, remove unused auto variables
and re-layout the remaining in reverse alignment order.
R=rsc
CC=golang-dev
https://golang.org/cl/4568068
Input code like
0000 (x.go:2) TEXT main+0(SB),$36-0
0001 (x.go:3) MOVL $5,i+-8(SP)
0002 (x.go:3) MOVL $0,i+-4(SP)
0003 (x.go:4) MOVL $1,BX
0004 (x.go:4) MOVL i+-8(SP),AX
0005 (x.go:4) MOVL i+-4(SP),DX
0006 (x.go:4) MOVL AX,autotmp_0000+-20(SP)
0007 (x.go:4) MOVL DX,autotmp_0000+-16(SP)
0008 (x.go:4) MOVL autotmp_0000+-20(SP),CX
0009 (x.go:4) CMPL autotmp_0000+-16(SP),$0
0010 (x.go:4) JNE ,13
0011 (x.go:4) CMPL CX,$32
0012 (x.go:4) JCS ,14
0013 (x.go:4) MOVL $0,BX
0014 (x.go:4) SHLL CX,BX
0015 (x.go:4) MOVL BX,x+-12(SP)
0016 (x.go:5) MOVL x+-12(SP),AX
0017 (x.go:5) CDQ ,
0018 (x.go:5) MOVL AX,autotmp_0001+-28(SP)
0019 (x.go:5) MOVL DX,autotmp_0001+-24(SP)
0020 (x.go:5) MOVL autotmp_0001+-28(SP),AX
0021 (x.go:5) MOVL autotmp_0001+-24(SP),DX
0022 (x.go:5) MOVL AX,(SP)
0023 (x.go:5) MOVL DX,4(SP)
0024 (x.go:5) CALL ,runtime.printint+0(SB)
0025 (x.go:5) CALL ,runtime.printnl+0(SB)
0026 (x.go:6) RET ,
is problematic because the liveness range for
autotmp_0000 (0006-0009) is nested completely
inside a span where BX holds a live value (0003-0015).
Because the register allocator only looks at 0006-0009
to see which registers are used, it misses the fact that
BX is unavailable and uses it anyway.
The n->pun = anyregalloc() check in tempname is
a workaround for this bug, but I hit it again because
I did the tempname call before allocating BX, even
though I then used the temporary after storing in BX.
This should fix the real bug, and then we can remove
the workaround in tempname.
The code creates pseudo-variables for each register
and includes that information in the liveness propagation.
Then the regu fields can be populated using that more
complete information. With that approach, BX is marked
as in use on every line in the whole span 0003-0015,
so that the decision about autotmp_0000
(using only 0006-0009) still has all the information
it needs.
This is not specific to the 386, but it only happens in
generated code of the form
load R1
...
load var into R2
...
store R2 back into var
...
use R1
and for the most part the other compilers generate
the loads for a given compiled line before any of
the stores. Even so, this may not be the case everywhere,
so the change is worth making in all three.
R=ken2, ken, ken
CC=golang-dev
https://golang.org/cl/4529106
Makes all.bash work after echo 4 >/proc/cpu/alignment,
which means kill the process on an unaligned access.
The default behavior on DreamPlug/GuruPlug/SheevaPlug
is to simulate an ARMv3 and just let the unaligned accesses
stop at the word boundary, resulting in all kinds of surprises.
Fixes#1240.
R=ken2
CC=golang-dev
https://golang.org/cl/4551064
Also, 6g was passing uninitialized
Node &n2 to regalloc, causing non-deterministic
register collisions (but only when both left and
right hand side of comparison had function calls).
Fixes#1728.
R=ken2
CC=golang-dev
https://golang.org/cl/4425070
The ld time was dominated by symbol table processing, so
* increase hash table size
* emit fewer symbols in gc (just 1 per string, 1 per type)
* add read-only lookup to avoid creating spurious symbols
* add linked list to speed whole-table traversals
Breaks dwarf generator (no idea why), so disable dwarf.
Reduces time for 6l to link godoc by 25%.
R=ken2
CC=golang-dev
https://golang.org/cl/4383047
same as in issue below, never fixed on ARM
changeset: 5498:3fa1372ca694
user: Ken Thompson <ken@golang.org>
date: Thu May 20 17:31:28 2010 -0700
description:
fix issue 798
cannot allocate an audomatic temp
while real registers are allocated.
there is a chance that the automatic
will be allocated to one of the
allocated registers. the fix is to
not registerize such variables.
R=rsc
CC=golang-dev
https://golang.org/cl/1202042
R=ken2
CC=golang-dev
https://golang.org/cl/4226042
The fault was lucky: when it wasn't faulting it was silently
copying a word from some other block and later putting
that same word back. If some other goroutine had changed
that word of memory in the interim, too bad.
The ARM code was inconsistent about whether the
"argument frame" included the saved LR. Including it made
some things more regular but mostly just caused confusion
in the places where the regularity broke. Now the rule
reflects reality: argp is always a pointer to arguments,
never a saved link register.
Renamed struct fields to make meaning clearer.
Running ARM in QEMU, package time's gotest:
* before: 27/58 failed
* after: 0/50
R=r, r2
CC=golang-dev
https://golang.org/cl/3993041
No semantic changes here, but working
toward being able to align structs based
on the maximum alignment of the fields
inside instead of having a fixed alignment
for all structs (issue 482).
R=ken2
CC=golang-dev
https://golang.org/cl/3617041
If an %lld argument can be 32 or 64 bits wide, cast to vlong.
If always 32 bits, drop the ll.
Fixes#1336.
R=brainman, rsc
CC=golang-dev
https://golang.org/cl/3580041
Just enough to make mov instructions work,
which in turn is enough to make strconv work
when it avoids any floating point calculations.
That makes a bunch of other packages pass
their tests.
Should suffice until hardware floating point
is available.
Enable package tests that now pass
(some due to earlier fixes).
Looks like there is a new integer math bug
exposed in the fmt and json tests.
R=ken2
CC=golang-dev
https://golang.org/cl/2638041
The Plan 9 tools assume that long is 32 bits.
We converted all instances of long to int32 when
importing the code but missed the print formats.
Because int32 is always int on the compilers we use,
it is never correct to use %lux, %ld, etc. Convert to %ux, %d, etc.
(It matters because on 64-bit gcc, long is 64 bits,
so we were printing 32-bit quantities with 64-bit formats.)
R=ken2
CC=golang-dev
https://golang.org/cl/2491041
* Maintain Sym* list for text with individual
prog lists instead of using one huge list and
overloading p->pcond.
* Comment what each file is for.
* Move some output code from span.c to asm.c.
* Move profiling into prof.c, symbol table into symtab.c.
* Move mkfwd to ld/lib.c.
* Throw away dhog dynamic loading code.
* Throw away Alef become.
* Fix printing of WORD instructions in 5l -a.
Goal here is to be able to handle each piece of text or data
as a separate piece, both to make it easier to load the
occasional .o file and also to make it possible to split the
work across multiple threads.
R=ken2, r, ken3
CC=golang-dev
https://golang.org/cl/2335043