The compilers assume they can generate temporary variables
as needed to preserve the right semantics or simplify code
generation and the back end will still generate good code.
This turns out not to be true. The back ends will only
track the first 128 variables per function and give up
on the remainder. That needs to be fixed too, in a later CL.
This CL merges temporary variables with equal types and
non-overlapping lifetimes using the greedy algorithm in
Poletto and Sarkar, "Linear Scan Register Allocation",
ACM TOPLAS 1999.
The result can be striking in the right functions.
Top 20 frame size changes in a 6g godoc binary by bytes saved:
5464 1984 (-3480, -63.7%) go/build.(*Context).Import
4456 1824 (-2632, -59.1%) go/printer.(*printer).expr1
2560 80 (-2480, -96.9%) time.nextStdChunk
3496 1608 (-1888, -54.0%) go/printer.(*printer).stmt
1896 272 (-1624, -85.7%) net/http.init
2688 1400 (-1288, -47.9%) fmt.(*pp).printReflectValue
2800 1512 (-1288, -46.0%) main.main
3296 2016 (-1280, -38.8%) crypto/tls.(*Conn).clientHandshake
1664 488 (-1176, -70.7%) time.loadZoneZip
1760 608 (-1152, -65.5%) time.parse
4104 3072 (-1032, -25.1%) runtime/pprof.writeHeap
1680 712 ( -968, -57.6%) go/ast.Walk
2488 1560 ( -928, -37.3%) crypto/x509.parseCertificate
1128 392 ( -736, -65.2%) math/big.nat.divLarge
1528 864 ( -664, -43.5%) go/printer.(*printer).fieldList
1360 712 ( -648, -47.6%) regexp/syntax.(*parser).factor
2104 1528 ( -576, -27.4%) encoding/asn1.parseField
1064 504 ( -560, -52.6%) encoding/xml.(*Decoder).text
584 48 ( -536, -91.8%) html.init
1400 864 ( -536, -38.3%) go/doc.playExample
In the same godoc build, cuts the number of functions with
too many vars from 83 to 32.
R=ken2
CC=golang-dev
https://golang.org/cl/12829043
Now there's only one copy of the flow graph construction
and dominator computation, and different optimizations
can attach different annotations to the instructions.
R=ken2
CC=golang-dev
https://golang.org/cl/12797045
Code in gc/popt.c is compiled as part of 5g, 6g, and 8g,
meaning it can use arch-specific headers but there's
just one copy of the code.
This is the same arrangement we use for the portable
code generation logic in gc/pgen.c.
Move fixjmp and noreturn there to get the ball rolling.
R=ken2
CC=golang-dev
https://golang.org/cl/12789043
This CL introduces a FUNCDATA number for runtime-specific
garbage collection metadata, changes the C and Go compilers
to emit that metadata, and changes the runtime to expect it.
The old pseudo-instructions that carried this information
are gone, as is the linker code to process them.
R=golang-dev, dvyukov, cshapiro
CC=golang-dev
https://golang.org/cl/11406044
With this change the compiler emits a bitmap for each function
covering its stack frame arguments area. If an argument word
is known to contain a pointer, a bit is set. The garbage
collector reads this information when scanning the stack by
frames and uses it to ignores locations known to not contain a
pointer.
R=golang-dev, bradfitz, daniel.morsing, dvyukov, khr, khr, iant, cshapiro
CC=golang-dev
https://golang.org/cl/9223046
The type information is (and for years has been) included
as an extra field in the address chunk of an instruction.
Unfortunately, suppose there is a string at a+24(FP) and
we have an instruction reading its length. It will say:
MOVQ x+32(FP), AX
and the type of *that* argument is int (not slice), because
it is the length being read. This confuses the picture seen
by debuggers and now, worse, by the garbage collector.
Instead of attaching the type information to all uses,
emit an explicit list of TYPE instructions with the information.
The TYPE instructions are no-ops whose only role is to
provide an address to attach type information to.
For example, this function:
func f(x, y, z int) (a, b string) {
return
}
now compiles into:
--- prog list "f" ---
0000 (/Users/rsc/x.go:3) TEXT f+0(SB),$0-56
0001 (/Users/rsc/x.go:3) LOCALS ,
0002 (/Users/rsc/x.go:3) TYPE x+0(FP){int},$8
0003 (/Users/rsc/x.go:3) TYPE y+8(FP){int},$8
0004 (/Users/rsc/x.go:3) TYPE z+16(FP){int},$8
0005 (/Users/rsc/x.go:3) TYPE a+24(FP){string},$16
0006 (/Users/rsc/x.go:3) TYPE b+40(FP){string},$16
0007 (/Users/rsc/x.go:3) MOVQ $0,b+40(FP)
0008 (/Users/rsc/x.go:3) MOVQ $0,b+48(FP)
0009 (/Users/rsc/x.go:3) MOVQ $0,a+24(FP)
0010 (/Users/rsc/x.go:3) MOVQ $0,a+32(FP)
0011 (/Users/rsc/x.go:4) RET ,
The { } show the formerly hidden type information.
The { } syntax is used when printing from within the gc compiler.
It is not accepted by the assemblers.
The same type information is now included on global variables:
0055 (/Users/rsc/x.go:15) GLOBL slice+0(SB){[]string},$24(AL*0)
This more accurate type information fixes a bug in the
garbage collector's precise heap collection.
The linker only cares about globals right now, but having the
local information should make things a little nicer for Carl
in the future.
Fixes#4907.
R=ken2
CC=golang-dev
https://golang.org/cl/7395056
Change ARM context register to R7, to get out of the way
of the register allocator during the compilation of the
prologue statements (it wants to use R0 as a temporary).
Step 2 of http://golang.org/s/go11func.
R=ken2
CC=golang-dev
https://golang.org/cl/7369048
The peephole optimizer would keep hands off AX and X0 during returns, even though go doesn't return through registers.
R=dave, rsc
CC=golang-dev
https://golang.org/cl/7030046
A new environment variable GO386 is introduced to choose between
code generation targeting 387 or SSE2. No auto-detection is
performed and the setting defaults to 387 to preserve previous
behaviour.
The patch is a reorganization of CL6549052 by rsc.
Fixes#3912.
R=minux.ma, rsc
CC=golang-dev
https://golang.org/cl/6962043
5g: Prog went from 128 bytes to 88 bytes
6g: Prog went from 174 bytes to 144 bytes
8g: Prog went from 124 bytes to 92 bytes
There may be a little more that can be squeezed out of Addr, but alignment will be a factor.
All: remove the unused pun field from Addr
R=rsc, minux.ma
CC=golang-dev
https://golang.org/cl/6922048
* Shift/rotate by constant doesn't have to stop subprop. (also in 8g)
* Remove redundant MOVLQZX instructions.
* An attempt at issuing loads early.
Good for 0.5% on a good day, might not be worth keeping.
Need to understand more about whether the x86
looks ahead to what loads might be coming up.
R=ken2, ken
CC=golang-dev
https://golang.org/cl/6203091
This changes makes constant propagation compare 'from' values using node
pointers rather than symbol names when checking to see whether a set
operation is redundant. When a function is inlined multiple times in a
calling function its arguments will share symbol names even though the values
are different. Prior to this fix the bug409 test would hit a case with 6g
where an LEAQ instruction was incorrectly eliminated from the second inlined
function call. 8g appears to have had the same bug, but the test did not fail
there.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/5646044
8g/cgen.c:
8g/gobj.c
. dropped unnecessary assignments;
8g/gg.h
. added varargckk pragmas;
8g/ggen.c
. dropped duplicate assignment;
8g/gsubr.c
. adjusted format in print statement;
. dropped unnecessary assignment;
. replaced GCC's _builtin_return_address(0) with Plan 9's
getcallerpc(&n) which is defined as a macro in <u.h>;
8g/list.c
. adjusted format in snprint statement;
8g/opt.h
. added varargck pragma (Adr*) that is specific for the invoking
modules;
8g/peep.c
. dropped unnecessary incrementation;
R=rsc
CC=golang-dev
https://golang.org/cl/4974044
#include "go.h" (or "gg.h")
becomes
#include <u.h>
#include <libc.h>
#include "go.h"
so that go.y can #include <stdio.h>
after <u.h> but before "go.h".
This is necessary on Plan 9.
R=ken2
CC=golang-dev
https://golang.org/cl/4971041