Some 64-bit fields were run through 32-bit words, some counts were
not checked for overflow, and relocations must fit in 32 bits.
Tests to follow.
R=golang-dev, dsymonds
CC=golang-dev
https://golang.org/cl/9033043
The type information is (and for years has been) included
as an extra field in the address chunk of an instruction.
Unfortunately, suppose there is a string at a+24(FP) and
we have an instruction reading its length. It will say:
MOVQ x+32(FP), AX
and the type of *that* argument is int (not slice), because
it is the length being read. This confuses the picture seen
by debuggers and now, worse, by the garbage collector.
Instead of attaching the type information to all uses,
emit an explicit list of TYPE instructions with the information.
The TYPE instructions are no-ops whose only role is to
provide an address to attach type information to.
For example, this function:
func f(x, y, z int) (a, b string) {
return
}
now compiles into:
--- prog list "f" ---
0000 (/Users/rsc/x.go:3) TEXT f+0(SB),$0-56
0001 (/Users/rsc/x.go:3) LOCALS ,
0002 (/Users/rsc/x.go:3) TYPE x+0(FP){int},$8
0003 (/Users/rsc/x.go:3) TYPE y+8(FP){int},$8
0004 (/Users/rsc/x.go:3) TYPE z+16(FP){int},$8
0005 (/Users/rsc/x.go:3) TYPE a+24(FP){string},$16
0006 (/Users/rsc/x.go:3) TYPE b+40(FP){string},$16
0007 (/Users/rsc/x.go:3) MOVQ $0,b+40(FP)
0008 (/Users/rsc/x.go:3) MOVQ $0,b+48(FP)
0009 (/Users/rsc/x.go:3) MOVQ $0,a+24(FP)
0010 (/Users/rsc/x.go:3) MOVQ $0,a+32(FP)
0011 (/Users/rsc/x.go:4) RET ,
The { } show the formerly hidden type information.
The { } syntax is used when printing from within the gc compiler.
It is not accepted by the assemblers.
The same type information is now included on global variables:
0055 (/Users/rsc/x.go:15) GLOBL slice+0(SB){[]string},$24(AL*0)
This more accurate type information fixes a bug in the
garbage collector's precise heap collection.
The linker only cares about globals right now, but having the
local information should make things a little nicer for Carl
in the future.
Fixes#4907.
R=ken2
CC=golang-dev
https://golang.org/cl/7395056
Change ARM context register to R7, to get out of the way
of the register allocator during the compilation of the
prologue statements (it wants to use R0 as a temporary).
Step 2 of http://golang.org/s/go11func.
R=ken2
CC=golang-dev
https://golang.org/cl/7369048
cmd/8g/gsubr.c: unreachable code
cmd/8g/reg.c: overspecifed class
cmd/dist/plan9.c: unused parameter
cmd/gc/fmt.c: stkdelta is now a vlong
cmd/gc/racewalk.c: used but not set
R=golang-dev, seed, rsc
CC=golang-dev
https://golang.org/cl/7067052
A new environment variable GO386 is introduced to choose between
code generation targeting 387 or SSE2. No auto-detection is
performed and the setting defaults to 387 to preserve previous
behaviour.
The patch is a reorganization of CL6549052 by rsc.
Fixes#3912.
R=minux.ma, rsc
CC=golang-dev
https://golang.org/cl/6962043
remove zerostack compiler experiment; will do at link time instead
««« original CL description
cmd/gc: add GOEXPERIMENT=zerostack to clear stack on function entry
This is expensive but it might be useful in cases where
people are suffering from false positives during garbage
collection and are willing to trade the CPU time for getting
rid of the false positives.
On the other hand it only eliminates false positives caused
by other function calls, not false positives caused by dead
temporaries stored in the current function call.
The 5g/6g/8g changes were pulled out of the history, from
the last time we needed to do this (to work around a goto bug).
The code in go.h, lex.c, pgen.c is new but tiny.
R=ken2
CC=golang-dev
https://golang.org/cl/6938073
»»»
R=ken2
CC=golang-dev
https://golang.org/cl/7002051
This is expensive but it might be useful in cases where
people are suffering from false positives during garbage
collection and are willing to trade the CPU time for getting
rid of the false positives.
On the other hand it only eliminates false positives caused
by other function calls, not false positives caused by dead
temporaries stored in the current function call.
The 5g/6g/8g changes were pulled out of the history, from
the last time we needed to do this (to work around a goto bug).
The code in go.h, lex.c, pgen.c is new but tiny.
R=ken2
CC=golang-dev
https://golang.org/cl/6938073
5g: Prog went from 128 bytes to 88 bytes
6g: Prog went from 174 bytes to 144 bytes
8g: Prog went from 124 bytes to 92 bytes
There may be a little more that can be squeezed out of Addr, but alignment will be a factor.
All: remove the unused pun field from Addr
R=rsc, minux.ma
CC=golang-dev
https://golang.org/cl/6922048
This allows 5g and 8g to benefit from the rewrite as shifts
or magic multiplies. The 64-bit arithmetic is not handled there,
and left in 6g.
Update #2230.
R=golang-dev, dave, mtj, iant, rsc
CC=golang-dev
https://golang.org/cl/6819123
This is an experiment in static analysis of Go programs
to understand which struct fields a program might use.
It is not part of the Go language specification, it must
be enabled explicitly when building the toolchain,
and it may be removed at any time.
After building the toolchain with GOEXPERIMENT=fieldtrack,
a specific field can be marked for tracking by including
`go:"track"` in the field tag:
package pkg
type T struct {
F int `go:"track"`
G int // untracked
}
To simplify usage, only named struct types can have
tracked fields, and only exported fields can be tracked.
The implementation works by making each function begin
with a sequence of no-op USEFIELD instructions declaring
which tracked fields are accessed by a specific function.
After the linker's dead code elimination removes unused
functions, the fields referred to by the remaining
USEFIELD instructions are the ones reported as used by
the binary.
The -k option to the linker specifies the fully qualified
symbol name (such as my/pkg.list) of a string variable that
should be initialized with the field tracking information
for the program. The field tracking string is a sequence
of lines, each terminated by a \n and describing a single
tracked field referred to by the program. Each line is made
up of one or more tab-separated fields. The first field is
the name of the tracked field, fully qualified, as in
"my/pkg.T.F". Subsequent fields give a shortest path of
reverse references from that field to a global variable or
function, corresponding to one way in which the program
might reach that field.
A common source of false positives in field tracking is
types with large method sets, because a reference to the
type descriptor carries with it references to all methods.
To address this problem, the CL also introduces a comment
annotation
//go:nointerface
that marks an upcoming method declaration as unavailable
for use in satisfying interfaces, both statically and
dynamically. Such a method is also invisible to package
reflect.
Again, all of this is disabled by default. It only turns on
if you have GOEXPERIMENT=fieldtrack set during make.bash.
R=iant, ken
CC=golang-dev
https://golang.org/cl/6749064
There may be further savings if convT2I can avoid the function call
if the cache is good and T is uintptr-shaped, a la convT2E, but that
will be a follow-up CL.
src/pkg/runtime:
benchmark old ns/op new ns/op delta
BenchmarkConvT2ISmall 43 15 -64.01%
BenchmarkConvT2IUintptr 45 14 -67.48%
BenchmarkConvT2ILarge 130 101 -22.31%
test/bench/go1:
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 8588997000 8499058000 -1.05%
BenchmarkFannkuch11 5300392000 5358093000 +1.09%
BenchmarkGobDecode 30295580 31040190 +2.46%
BenchmarkGobEncode 18102070 17675650 -2.36%
BenchmarkGzip 774191400 771591400 -0.34%
BenchmarkGunzip 245915100 247464100 +0.63%
BenchmarkJSONEncode 123577000 121423050 -1.74%
BenchmarkJSONDecode 451969800 596256200 +31.92%
BenchmarkMandelbrot200 10060050 10072880 +0.13%
BenchmarkParse 10989840 11037710 +0.44%
BenchmarkRevcomp 1782666000 1716864000 -3.69%
BenchmarkTemplate 798286600 723234400 -9.40%
R=rsc, bradfitz, go.peter.90, daniel.morsing, dave, uriel
CC=golang-dev
https://golang.org/cl/6337058
Drop expecttaken function in favor of extra argument
to gbranch and bgen. Mark loop condition as likely to
be true, so that loops are generated inline.
The main benefit here is contiguous code when trying
to read the generated assembly. It has only minor effects
on the timing, and they mostly cancel the minor effects
that aligning function entry points had. One exception:
both changes made Fannkuch faster.
Compared to before CL 6244066 (before aligned functions)
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 4222117400 4201958800 -0.48%
BenchmarkFannkuch11 3462631800 3215908600 -7.13%
BenchmarkGobDecode 20887622 20899164 +0.06%
BenchmarkGobEncode 9548772 9439083 -1.15%
BenchmarkGzip 151687 152060 +0.25%
BenchmarkGunzip 8742 8711 -0.35%
BenchmarkJSONEncode 62730560 62686700 -0.07%
BenchmarkJSONDecode 252569180 252368960 -0.08%
BenchmarkMandelbrot200 5267599 5252531 -0.29%
BenchmarkRevcomp25M 980813500 985248400 +0.45%
BenchmarkTemplate 361259100 357414680 -1.06%
Compared to tip (aligned functions):
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 4140739800 4201958800 +1.48%
BenchmarkFannkuch11 3259914400 3215908600 -1.35%
BenchmarkGobDecode 20620222 20899164 +1.35%
BenchmarkGobEncode 9384886 9439083 +0.58%
BenchmarkGzip 150333 152060 +1.15%
BenchmarkGunzip 8741 8711 -0.34%
BenchmarkJSONEncode 65210990 62686700 -3.87%
BenchmarkJSONDecode 249394860 252368960 +1.19%
BenchmarkMandelbrot200 5273394 5252531 -0.40%
BenchmarkRevcomp25M 996013800 985248400 -1.08%
BenchmarkTemplate 360620840 357414680 -0.89%
R=ken2
CC=golang-dev
https://golang.org/cl/6245069
The old code generated for a bounds check was
CMP
JLT ok
CALL panicindex
ok:
...
The new code is (once the linker finishes with it):
CMP
JGE panic
...
panic:
CALL panicindex
which moves the calls out of line, putting more useful
code in each cache line. This matters especially in tight
loops, such as in Fannkuch. The benefit is more modest
elsewhere, but real.
From test/bench/go1, amd64:
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 6096092000 6088808000 -0.12%
BenchmarkFannkuch11 6151404000 4020463000 -34.64%
BenchmarkGobDecode 28990050 28894630 -0.33%
BenchmarkGobEncode 12406310 12136730 -2.17%
BenchmarkGzip 179923 179903 -0.01%
BenchmarkGunzip 11219 11130 -0.79%
BenchmarkJSONEncode 86429350 86515900 +0.10%
BenchmarkJSONDecode 334593800 315728400 -5.64%
BenchmarkRevcomp25M 1219763000 1180767000 -3.20%
BenchmarkTemplate 492947600 483646800 -1.89%
And 386:
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 6354902000 6243000000 -1.76%
BenchmarkFannkuch11 8043769000 7326965000 -8.91%
BenchmarkGobDecode 19010800 18941230 -0.37%
BenchmarkGobEncode 14077500 13792460 -2.02%
BenchmarkGzip 194087 193619 -0.24%
BenchmarkGunzip 12495 12457 -0.30%
BenchmarkJSONEncode 125636400 125451400 -0.15%
BenchmarkJSONDecode 696648600 685032800 -1.67%
BenchmarkRevcomp25M 2058088000 2052545000 -0.27%
BenchmarkTemplate 602140000 589876800 -2.04%
To implement this, two new instruction forms:
JLT target // same as always
JLT $0, target // branch expected not taken
JLT $1, target // branch expected taken
The linker could also emit the prediction prefixes, but it
does not: expected taken branches are reversed so that the
expected case is not taken (as in example above), and
the default expectaton for such a jump is not taken
already.
R=golang-dev, gri, r, dave
CC=golang-dev
https://golang.org/cl/6248049
* Eliminate bounds check on known small shifts.
* Rewrite x<<s | x>>(32-s) as a rotate (constant s).
* More aggressive (but still minimal) range analysis.
R=ken, dave, iant
CC=golang-dev
https://golang.org/cl/6209077
The garbage collector can avoid scanning this section, with
reduces collection time as well as the number of false positives.
Helps a little bit with issue 909, but certainly does not solve it.
R=ken2
CC=golang-dev
https://golang.org/cl/5671099
If the values being compared have different concrete types,
then they're clearly unequal without needing to invoke the
actual interface compare routine. This speeds tests for
specific values, like if err == io.EOF, by about 3x.
benchmark old ns/op new ns/op delta
BenchmarkIfaceCmp100 843 287 -65.95%
BenchmarkIfaceCmpNil100 184 182 -1.09%
Fixes#2591.
R=ken2
CC=golang-dev
https://golang.org/cl/5651073
My previous CL:
changeset: 9645:ce2e5f44b310
user: Russ Cox <rsc@golang.org>
date: Tue Sep 06 10:24:21 2011 -0400
summary: gc: unify stack frame layout
introduced a bug wherein no variables were
being registerized, making Go programs 2-3x
slower than they had been before.
This CL fixes that bug (along with some others
it was hiding) and adds a test that optimization
makes at least one test case faster.
R=ken2
CC=golang-dev
https://golang.org/cl/5174045
allocparams + tempname + compactframe
all knew about how to place stack variables.
Now only compactframe, renamed to allocauto,
does the work. Until the last minute, each PAUTO
variable is in its own space and has xoffset == 0.
This might break 5g. I get failures in concurrent
code running under qemu and I can't tell whether
it's 5g's fault or qemu's. We'll see what the real
ARM builders say.
R=ken2
CC=golang-dev
https://golang.org/cl/4973057
Does as much as possible in data layout instead
of during the init function.
Handles var x = y; var y = z as a special case too,
because it is so prevalent in package unicode
(var Greek = _Greek; var _Greek = []...).
Introduces InitPlan description of initialized data
so that it can be traversed multiple times (for example,
in the copy handler).
Cuts package unicode's init function size by 8x.
All that remains there is map initialization, which
is on the chopping block too.
Fixes sinit.go test case.
Aggregate DATA instructions at end of object file.
Checkpoint. More to come.
R=ken2
CC=golang-dev
https://golang.org/cl/4969051
8g/cgen.c:
8g/gobj.c
. dropped unnecessary assignments;
8g/gg.h
. added varargckk pragmas;
8g/ggen.c
. dropped duplicate assignment;
8g/gsubr.c
. adjusted format in print statement;
. dropped unnecessary assignment;
. replaced GCC's _builtin_return_address(0) with Plan 9's
getcallerpc(&n) which is defined as a macro in <u.h>;
8g/list.c
. adjusted format in snprint statement;
8g/opt.h
. added varargck pragma (Adr*) that is specific for the invoking
modules;
8g/peep.c
. dropped unnecessary incrementation;
R=rsc
CC=golang-dev
https://golang.org/cl/4974044
#include "go.h" (or "gg.h")
becomes
#include <u.h>
#include <libc.h>
#include "go.h"
so that go.y can #include <stdio.h>
after <u.h> but before "go.h".
This is necessary on Plan 9.
R=ken2
CC=golang-dev
https://golang.org/cl/4971041
After allocparams and walk, remove unused auto variables
and re-layout the remaining in reverse alignment order.
R=rsc
CC=golang-dev
https://golang.org/cl/4568068
The code for converting negative floats was
incorrectly loading an FP control word from
the stack without ever having stored it there.
Thanks to Lars Pensjö for reporting this bug.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/4515091
If an %lld argument can be 32 or 64 bits wide, cast to vlong.
If always 32 bits, drop the ll.
Fixes#1336.
R=brainman, rsc
CC=golang-dev
https://golang.org/cl/3580041
The Plan 9 tools assume that long is 32 bits.
We converted all instances of long to int32 when
importing the code but missed the print formats.
Because int32 is always int on the compilers we use,
it is never correct to use %lux, %ld, etc. Convert to %ux, %d, etc.
(It matters because on 64-bit gcc, long is 64 bits,
so we were printing 32-bit quantities with 64-bit formats.)
R=ken2
CC=golang-dev
https://golang.org/cl/2491041
cannot allocate an audomatic temp
while real registers are allocated.
there is a chance that the automatic
will be allocated to one of the
allocated registers. the fix is to
not registerize such variables.
R=rsc
CC=golang-dev
https://golang.org/cl/1202042