The writebarrier pass processes WB ops from beginning to end,
replacing them by other values.
But it also checks whether there are more ops to process
by walking from beginning to end.
This is quadratic, so walk from end to beginning instead.
This speeds up compiling the code in issue 13554:
name old time/op new time/op delta
Pkg 11.9s ± 2% 8.3s ± 3% -29.88% (p=0.000 n=18+17)
Updates #13554
Passes toolstash-check.
Change-Id: I5f8a872ddc4b783540220d89ea2ee188a6d2b2ff
Reviewed-on: https://go-review.googlesource.com/43571
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
SWI only support "SWI $imm", but currently "SWI (Reg)" is also
accepted. This patch fixes it.
And more instruction tests are added to cmd/asm/internal/asm/testdata/arm.s
fixes#20375
Change-Id: Id437d853924a403e41da9b6cbddd20d994b624ff
Reviewed-on: https://go-review.googlesource.com/43552
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Second attempt to fix#14710.
CL 35272 already tried to fix this issue. But CL 35272 assumed
that runtime.epclntab type is STEXT, while it is actually SRODATA.
This CL uses Symbol.Sect.Seg to determine if symbol is part
of Segtext or Segdata.
Fixes#14710
Change-Id: Ic6b6f657555c87a64d2bc36cc4c07ab0591d00c4
Reviewed-on: https://go-review.googlesource.com/42390
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
fuseBlockPlain was accidentally quadratic.
If you had plain blocks b1 -> b2 -> b3 -> b4,
each containing single values v1, v2, v3, and v4 respectively,
fuseBlockPlain would move v1 from b1 to b2 to b3 to b4,
then v2 from b2 to b3 to b4, etc.
There are two obvious fixes.
* Look for runs of blocks in fuseBlockPlain
and handle them in a single go.
* Fuse from end to beginning; any given value in a run
of blocks to fuse then moves only once.
The latter is much simpler, so that's what this CL does.
Somewhat surprisingly, this change does not pass toolstash-check.
The resulting set of blocks is the same,
and the values in them are the same,
but the order of values in them differ,
and that order of values (while arbitrary)
is enough to change the compiler's output.
This may be due to #20178; deadstore is the next pass after fuse.
Adding basic sorting to the beginning of deadstore
is enough to make this CL pass toolstash-check:
for _, b := range f.Blocks {
obj.SortSlice(b.Values, func(i, j int) bool { return b.Values[i].ID < b.Values[j].ID })
}
Happily, this CL appears to result in better code on average,
if only by accident. It cuts 4k off of cmd/go; go1 benchmarks
are noisy as always but don't regress (numbers below).
No impact on the standard compilebench benchmarks.
For the code in #13554, this speeds up compilation dramatically:
name old time/op new time/op delta
Pkg 53.1s ± 2% 12.8s ± 3% -75.92% (p=0.008 n=5+5)
name old user-time/op new user-time/op delta
Pkg 55.0s ± 2% 14.9s ± 3% -73.00% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Pkg 2.04GB ± 0% 2.04GB ± 0% +0.18% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Pkg 6.21M ± 0% 6.21M ± 0% ~ (p=0.222 n=5+5)
name old object-bytes new object-bytes delta
Pkg 28.4M ± 0% 28.4M ± 0% +0.00% (p=0.008 n=5+5)
name old export-bytes new export-bytes delta
Pkg 208 ± 0% 208 ± 0% ~ (all equal)
Updates #13554
go1 benchmarks:
name old time/op new time/op delta
BinaryTree17-8 2.29s ± 2% 2.26s ± 2% -1.43% (p=0.000 n=48+50)
Fannkuch11-8 2.74s ± 2% 2.79s ± 2% +1.63% (p=0.000 n=50+49)
FmtFprintfEmpty-8 36.6ns ± 3% 34.6ns ± 4% -5.29% (p=0.000 n=49+50)
FmtFprintfString-8 58.3ns ± 3% 59.1ns ± 3% +1.35% (p=0.000 n=50+49)
FmtFprintfInt-8 62.4ns ± 2% 63.2ns ± 3% +1.19% (p=0.000 n=49+49)
FmtFprintfIntInt-8 95.1ns ± 2% 96.7ns ± 3% +1.61% (p=0.000 n=49+50)
FmtFprintfPrefixedInt-8 118ns ± 3% 113ns ± 2% -4.00% (p=0.000 n=50+49)
FmtFprintfFloat-8 191ns ± 2% 192ns ± 2% +0.40% (p=0.034 n=50+50)
FmtManyArgs-8 419ns ± 2% 420ns ± 2% ~ (p=0.228 n=49+49)
GobDecode-8 5.26ms ± 3% 5.19ms ± 2% -1.33% (p=0.000 n=50+49)
GobEncode-8 4.12ms ± 2% 4.15ms ± 3% +0.68% (p=0.007 n=49+50)
Gzip-8 198ms ± 2% 197ms ± 2% -0.50% (p=0.018 n=48+48)
Gunzip-8 31.9ms ± 3% 31.8ms ± 3% -0.47% (p=0.024 n=50+50)
HTTPClientServer-8 64.4µs ± 0% 64.0µs ± 0% -0.55% (p=0.000 n=43+46)
JSONEncode-8 10.6ms ± 2% 10.6ms ± 3% ~ (p=0.543 n=49+49)
JSONDecode-8 43.3ms ± 3% 43.1ms ± 2% ~ (p=0.079 n=50+50)
Mandelbrot200-8 3.70ms ± 2% 3.70ms ± 2% ~ (p=0.553 n=47+50)
GoParse-8 2.70ms ± 2% 2.71ms ± 3% ~ (p=0.843 n=49+50)
RegexpMatchEasy0_32-8 70.5ns ± 4% 70.4ns ± 4% ~ (p=0.867 n=48+50)
RegexpMatchEasy0_1K-8 162ns ± 3% 162ns ± 2% ~ (p=0.739 n=48+48)
RegexpMatchEasy1_32-8 66.1ns ± 5% 66.2ns ± 4% ~ (p=0.970 n=50+50)
RegexpMatchEasy1_1K-8 297ns ± 7% 296ns ± 7% ~ (p=0.406 n=50+50)
RegexpMatchMedium_32-8 105ns ± 5% 105ns ± 5% ~ (p=0.702 n=50+50)
RegexpMatchMedium_1K-8 32.3µs ± 4% 32.2µs ± 3% ~ (p=0.614 n=49+49)
RegexpMatchHard_32-8 1.75µs ±18% 1.74µs ±12% ~ (p=0.738 n=50+48)
RegexpMatchHard_1K-8 52.2µs ±14% 51.3µs ±13% ~ (p=0.230 n=50+50)
Revcomp-8 366ms ± 3% 367ms ± 3% ~ (p=0.745 n=49+49)
Template-8 48.5ms ± 4% 48.5ms ± 4% ~ (p=0.824 n=50+48)
TimeParse-8 263ns ± 2% 256ns ± 2% -2.98% (p=0.000 n=48+49)
TimeFormat-8 265ns ± 3% 262ns ± 3% -1.35% (p=0.000 n=48+49)
[Geo mean] 41.1µs 40.9µs -0.48%
Change-Id: Ib35fa15b54282abb39c077d150beee27f610891a
Reviewed-on: https://go-review.googlesource.com/43570
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
When the race detector is enabled,
the compiler randomizes the order in which functions are compiled,
in an attempt to shake out bugs.
But we never re-seed the rand source, so every execution is identical.
Fix that to get more coverage.
Change-Id: If5cdde03ef4f1bab5f45e07f03fb6614945481d7
Reviewed-on: https://go-review.googlesource.com/43572
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Currently, cgo converts integer macros into int64 if it's possible.
As a result, some macros which satisfy
math.MaxInt64 < x <= math.MaxUint64
will lose their original values.
This CL introduces the new probe to check signs,
so we can handle signed ints and unsigned ints separately.
Fixes#20369
Change-Id: I002ba452a82514b3a87440960473676f842cc9ee
Reviewed-on: https://go-review.googlesource.com/43476
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The generated file runtime/internal/sys/zversion.go is deleted by
`go tool cmd dist clean` as part of running clean.bash. Don't treat
a missing file as a reason to stop running the go tool; just treat
is as meaning that runtime/internal/sys is stale.
No test because I don't particularly want to clobber $GOROOT.
Fixes#20385.
Change-Id: I5251a99542cc93c33f627f133d7118df56e18af1
Reviewed-on: https://go-review.googlesource.com/43559
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
LLV and SCV are 64-bit load-linked and store-conditional. They
were used in runtime as #define WORD. Change them to normal
instruction form.
NOOP is hardware no-op. It was written as WORD $0. Make a name
for it for better disassembly output.
Fixes#12561.
Fixes#18238.
Change-Id: I82c667ce756fa83ef37b034b641e8c4366335e83
Reviewed-on: https://go-review.googlesource.com/40297
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Noticed while looking at #20356.
Cuts 160k (1%) off of the cmd/compile binary.
Change-Id: If2397bc6971d6be9be6975048adecb0b5efa6d66
Reviewed-on: https://go-review.googlesource.com/43501
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Previous CL (cmd/internal/objabi: shrink SymType down to a uint8) shrinks
SymType down to a uint8 but forgot making according change in goobj.
Fixes#20296
Also add a test to catch such Goobj format inconsistency bug
Change-Id: Ib43dd7122cfcacf611a643814e95f8c5a924941f
Reviewed-on: https://go-review.googlesource.com/42971
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Attaching positions to SB, SP, initial mem can result in
less-good line-numbering when compiled for debugging.
This "fix" also removes source position from a zero-valued
struct (but not from its fields) and from a zero-length
array constant.
This may be a general problem for constants in entry blocks.
Fixes#20367.
Change-Id: I7e9df3341be2e2f60f127d35bb31e43cdcfce9a1
Reviewed-on: https://go-review.googlesource.com/43531
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Enhance the one-live-memory-at-a-time check to run during many
more phases of the SSA backend. Also make it work in an interblock
fashion.
Change types.IsMemory to return true for tuples containing a memory type.
Fix trim pass to build the merged phi correctly. Doesn't affect
code but allows the check to pass after trim runs.
Switch the AddTuple* ops to take the memory-containing tuple argument second.
Update #20335
Change-Id: I5b03ef3606b75a9e4f765276bb8b183cdc172b43
Reviewed-on: https://go-review.googlesource.com/43495
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Adjust finddebugruntimepath to look for runtime/debug.go file
instead of runtime/runtime.go. This actually finds runtime.GOMAXPROCS
in every Go executable (including windows).
I also included "-Wl,-T,fix_debug_gdb_scripts.ld" parameter to gcc
invocation on windows to work around gcc bug (see #20183 for details).
This CL only fixes windows -buildmode=exe, buildmode=c-archive
is still broken.
Thanks to Egon Elbre and Nick Clifton for investigation.
Fixes#20183Fixes#20218
Change-Id: I5369a4db3913226aef3d9bd6317446856b0a1c34
Reviewed-on: https://go-review.googlesource.com/43331
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Make yellow the last highlight color rather than the first.
Yellow is also the color that Chrome uses to highlight
search results, which can be confusing.
Also, when Night Shift is on on macOS,
yellow highlighting is completely invisible.
I suppose should be sleeping instead.
Also, remove a completed TODO.
Change-Id: I0eb4439272fad9ccb5fe8e2cf409fdd5dc15b26e
Reviewed-on: https://go-review.googlesource.com/43463
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
When compiling concurrently, we walk all functions before compiling
any of them. Walking functions can cause variables to switch from
being non-addrtaken to addrtaken, e.g. to prepare for a runtime call.
Typechecking propagates addrtaken-ness of closure variables to
their outer variables, so that capturevars can decide whether to
pass the variable's value or a pointer to it.
When all functions are compiled immediately, as long as the containing
function is compiled prior to the closure, this propagation has no effect.
When compilation is deferred, though, in rare cases, this results in
a change in the addrtaken-ness of a variable in the outer function,
which in turn changes the compiler's output.
(This is rare because in a great many cases, a temporary has been
introduced, insulating the outer variable from modification.)
But concurrent compilation must generate identical results.
To fix this, track whether capturevars has run.
If it has, there is no need to update outer variables
when closure variables change.
Capturevars always runs before any functions are walked or compiled.
The remainder of the changes in this CL are to support the test.
In particular, -d=compilelater forces the compiler to walk all
functions before compiling any of them, despite being non-concurrent.
This is useful because -live is fundamentally incompatible with
concurrent compilation, but we want -c=1 to have no behavior changes.
Fixes#20250
Change-Id: I89bcb54268a41e8588af1ac8cc37fbef856a90c2
Reviewed-on: https://go-review.googlesource.com/42853
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
On Windows the drive letter is sometime "c:" and sometimes "C:".
Fixes#20336.
Change-Id: I38c86999af9522c51470d60016729d41cfec6b25
Reviewed-on: https://go-review.googlesource.com/43390
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
TestCgoContainsSpace builds a small program which mimics $CC.
Usually, $CC attempts to compile a trivial code to detect its own
supported flags (i.e. "-no-pie", which must be passed on some systems),
however the mimic didn't consider these cases.
This CL solve the issue.
Also, use the same name as $CC, it may solve other potential problems.
Fixes#20324
Change-Id: I7a00ac016a5fd0667540f2a715371f8152edc395
Reviewed-on: https://go-review.googlesource.com/43330
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Tuple ops are weird. They are essentially a pair of ops,
one which consumes a mem and one which generates a mem (the Select1).
The schedule pass didn't handle these quite right.
Fix the scheduler to include both parts of the paired op in
the store chain. That makes sure that loads are correctly ordered
with respect to the first of the pair.
Add a check for the ssacheck builder, that there is only one
live store at a time. I thought we already had such a check, but
apparently not...
Fixes#20335
Change-Id: I59eb3446a329100af38d22820b1ca2190ca46a78
Reviewed-on: https://go-review.googlesource.com/43294
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The writebarrier test has to change.
Now that T23 composite literals are passed to the backend,
they get SSA'd, so writes to their fields are treated separately,
so the relevant part of the first write to t23 is now a dead store.
Preserve the intent of the test by splitting it up into two functions.
Reduces code size a bit:
name old object-bytes new object-bytes delta
Template 386k ± 0% 386k ± 0% ~ (all equal)
Unicode 202k ± 0% 202k ± 0% ~ (all equal)
GoTypes 1.16M ± 0% 1.16M ± 0% ~ (all equal)
Compiler 3.92M ± 0% 3.91M ± 0% -0.19% (p=0.008 n=5+5)
SSA 7.91M ± 0% 7.91M ± 0% ~ (all equal)
Flate 228k ± 0% 228k ± 0% -0.05% (p=0.008 n=5+5)
GoParser 283k ± 0% 283k ± 0% ~ (all equal)
Reflect 952k ± 0% 952k ± 0% -0.06% (p=0.008 n=5+5)
Tar 188k ± 0% 188k ± 0% -0.09% (p=0.008 n=5+5)
XML 406k ± 0% 406k ± 0% -0.02% (p=0.008 n=5+5)
[Geo mean] 649k 648k -0.04%
Fixes#18872
Change-Id: Ifeed0f71f13849732999aa731cc2bf40c0f0e32a
Reviewed-on: https://go-review.googlesource.com/43154
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reuse block head or preceding instruction's line number for
register allocator's spill, fill, copy, rematerialization
instructionsl; and also for phi, and for no-src-pos
instructions. Assembler creates same line number tables
for copy-predecessor-line and for no-src-pos,
but copy-predecessor produces better-looking assembly
language output with -S and with GOSSAFUNC, and does not
require changes to tests of existing assembly language.
Split "copyInto" into two cases, one for register allocation,
one for otherwise. This caused the test score line change
count to increase by one, which may reflect legitimately
useful information preserved. Without any special treatment
for copyInto, the change count increases by 21 more, from
51 to 72 (i.e., quite a lot).
There is a test; using two naive "scores" for line number
churn, the old numbering is 2x or 4x worse.
Fixes#18902.
Change-Id: I0a0a69659d30ee4e5d10116a0dd2b8c5df8457b1
Reviewed-on: https://go-review.googlesource.com/36207
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Fixes#7906
Change-Id: Ibcf9cd670593241921ab3c426ff7357f799ebc3e
Reviewed-on: https://go-review.googlesource.com/43072
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Implements detection of x86 cpu features that
are used in the go standard library.
Changes all standard library packages to use the new cpu package
instead of using runtime internal variables to check x86 cpu features.
Updates: #15403
Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9
Reviewed-on: https://go-review.googlesource.com/41476
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Eliminates stores of values that have just been loaded from the same
location. Handles the common case where there are up to 3 intermediate
stores to non-overlapping struct fields.
For example the loads and stores of x.a, x.b and x.d in the following
function are now removed:
type T struct {
a, b, c, d int
}
func f(x *T) {
y := *x
y.c += 8
*x = y
}
Before this CL (s390x):
TEXT "".f(SB)
MOVD "".x(R15), R5
MOVD (R5), R1
MOVD 8(R5), R2
MOVD 16(R5), R0
MOVD 24(R5), R4
ADD $8, R0, R3
STMG R1, R4, (R5)
RET
After this CL (s390x):
TEXT "".f(SB)
MOVD "".x(R15), R1
MOVD 16(R1), R0
ADD $8, R0, R0
MOVD R0, 16(R1)
RET
In total these rules are triggered ~5091 times during all.bash,
which is broken down as:
Intermediate stores | Triggered
--------------------+----------
0 | 1434
1 | 2508
2 | 888
3 | 261
--------------------+----------
Change-Id: Ia4721ae40146aceec1fdd3e65b0e9283770bfba5
Reviewed-on: https://go-review.googlesource.com/38793
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The files PPC64.rules and rewritePPC64.go were out of sync due to
conflicts between CL 41630 and CL 42145 (i.e. running 'go run *.go'
in the gen directory resulted in unexpected changes).
Change-Id: I1d409656b66afeab6cb9c6df9b3dcab7859caa75
Reviewed-on: https://go-review.googlesource.com/43091
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
This can make life easier for Delve (and other debuggers),
and can help them with bug reports.
Sample producer field (from objdump):
<48> DW_AT_producer : Go cmd/compile devel +8a59dbf41a Mon May 8 16:02:44 2017 -0400
Change-Id: I0605843c959b53a60a25a3b870aa8755bf5d5b13
Reviewed-on: https://go-review.googlesource.com/33588
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This adds math/bits intrinsics for OnesCount, Len, TrailingZeros on
ppc64x.
benchmark old ns/op new ns/op delta
BenchmarkLeadingZeros-16 4.26 1.71 -59.86%
BenchmarkLeadingZeros16-16 3.04 1.83 -39.80%
BenchmarkLeadingZeros32-16 3.31 1.82 -45.02%
BenchmarkLeadingZeros64-16 3.69 1.71 -53.66%
BenchmarkTrailingZeros-16 2.55 1.62 -36.47%
BenchmarkTrailingZeros32-16 2.55 1.77 -30.59%
BenchmarkTrailingZeros64-16 2.78 1.62 -41.73%
BenchmarkOnesCount-16 3.19 0.93 -70.85%
BenchmarkOnesCount32-16 2.55 1.18 -53.73%
BenchmarkOnesCount64-16 3.22 0.93 -71.12%
Update #18616
I also made a change to bits_test.go because when debugging some failures
the output was not quite providing the right argument information.
Change-Id: Ia58d31d1777cf4582a4505f85b11a1202ca07d3e
Reviewed-on: https://go-review.googlesource.com/41630
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Also change type from map[*types.Type]bool to map[*types.Type]struct{}.
This is basically a clean-up.
Change-Id: I167583eff0fa1070a7522647219476033b52b840
Reviewed-on: https://go-review.googlesource.com/41859
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
OpVarXXX Values don't generate instructions,
so there's no reason not to duplicate them,
and duplicating them generates better code
(fewer branches).
This requires changing the start/end accounting
to correctly handle the case in which we have run
of Values beginning with an OpVarXXX, e.g.
OpVarDef, OpZeroWB, OpMoveWB.
In that case, the sequence of values should begin
at the OpZeroWB, not the OpVarDef.
This also lays the groundwork for experimenting
with allowing duplication of some scalar stores.
Shrinks function text sizes a tiny amount:
name old object-bytes new object-bytes delta
Template 381k ± 0% 381k ± 0% -0.01% (p=0.008 n=5+5)
Unicode 203k ± 0% 203k ± 0% -0.04% (p=0.008 n=5+5)
GoTypes 1.17M ± 0% 1.17M ± 0% -0.01% (p=0.008 n=5+5)
SSA 8.24M ± 0% 8.24M ± 0% -0.00% (p=0.008 n=5+5)
Flate 230k ± 0% 230k ± 0% ~ (all equal)
GoParser 286k ± 0% 286k ± 0% ~ (all equal)
Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal)
Tar 189k ± 0% 189k ± 0% ~ (all equal)
XML 415k ± 0% 415k ± 0% -0.01% (p=0.008 n=5+5)
Updates #19838
Change-Id: Ic5ef30855919f1468066eba08ae5c4bd9a01db27
Reviewed-on: https://go-review.googlesource.com/42011
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
In PPC64 ELF files, the st_other field indicates the number of
prologue instructions between the global and local entry points.
We add the instructions in the compiler and assembler if -shared is used.
We were assuming that the instructions were present when building a
c-archive or PIE or doing dynamic linking, on the assumption that those
are the cases where the go tool would be building with -shared.
That assumption fails when using some other tool, such as Bazel,
that does not necessarily use -shared in exactly the same way.
This CL records in the object file whether a symbol was compiled
with -shared (this will be the same for all symbols in a given compilation)
and uses that information when setting the st_other field.
Fixes#20290.
Change-Id: Ib2b77e16aef38824871102e3c244fcf04a86c6ea
Reviewed-on: https://go-review.googlesource.com/43051
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
These collectively fire a few hundred times during make.bash,
mostly rewriting XOR SETNE -> SETEQ.
Fixes#17905.
Change-Id: Ic5eb241ee93ed67099da3de11f59e4df9fab64a3
Reviewed-on: https://go-review.googlesource.com/42491
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
CL 39915 introduced sorting of signats by ShortString
for reproducible builds. But ShortString treats types
byte and uint8 identically; same for rune and uint32.
CL 39915 attempted to compensate for this by only
adding the underlying type (uint8) to signats in addsignat.
This only works for byte and uint8. For e.g. *byte and *uint,
both get added, and their sort order is random,
leading to non-reproducible builds.
One fix would be to add yet another type printing mode
that doesn't eliminate byte and rune, and use it
for sorting signats. But the formatting routines
are complicated enough as it is.
Instead, just sort first by ShortString and then by String.
We can't just use String, because ShortString makes distinctions
that String doesn't. ShortString is really preferred here;
String is serving only as a backstop for handling of bytes and runes.
The long series of types in the test helps increase the odds of
failure, allowing a smaller number of iterations in the test.
On my machine, a full test takes 700ms.
Passes toolstash-check.
Updates #19961Fixes#20272
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.9MB ± 0% +0.12% (p=0.032 n=5+5)
Unicode 28.9MB ± 0% 28.9MB ± 0% ~ (p=0.841 n=5+5)
GoTypes 110MB ± 0% 110MB ± 0% ~ (p=0.841 n=5+5)
Compiler 463MB ± 0% 463MB ± 0% ~ (p=0.056 n=5+5)
SSA 1.11GB ± 0% 1.11GB ± 0% +0.02% (p=0.016 n=5+5)
Flate 24.7MB ± 0% 24.8MB ± 0% +0.14% (p=0.032 n=5+5)
GoParser 31.1MB ± 0% 31.1MB ± 0% ~ (p=0.421 n=5+5)
Reflect 73.9MB ± 0% 73.9MB ± 0% ~ (p=1.000 n=5+5)
Tar 25.8MB ± 0% 25.8MB ± 0% +0.15% (p=0.016 n=5+5)
XML 41.2MB ± 0% 41.2MB ± 0% ~ (p=0.310 n=5+5)
[Geo mean] 72.0MB 72.0MB +0.07%
name old allocs/op new allocs/op delta
Template 384k ± 0% 385k ± 1% ~ (p=0.056 n=5+5)
Unicode 343k ± 0% 344k ± 0% ~ (p=0.548 n=5+5)
GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.421 n=5+5)
Compiler 4.43M ± 0% 4.44M ± 0% +0.26% (p=0.032 n=5+5)
SSA 9.86M ± 0% 9.87M ± 0% +0.10% (p=0.032 n=5+5)
Flate 237k ± 1% 238k ± 0% +0.49% (p=0.032 n=5+5)
GoParser 319k ± 1% 320k ± 1% ~ (p=0.151 n=5+5)
Reflect 957k ± 0% 957k ± 0% ~ (p=1.000 n=5+5)
Tar 251k ± 0% 252k ± 1% +0.49% (p=0.016 n=5+5)
XML 399k ± 0% 401k ± 1% ~ (p=0.310 n=5+5)
[Geo mean] 739k 741k +0.26%
Change-Id: Ic27995a8d374d012b8aca14546b1df9d28d30df7
Reviewed-on: https://go-review.googlesource.com/42955
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
If there were more unused imports than
the maximum default number of errors to report,
the set of reported imports was non-deterministic.
Fix by accumulating and sorting them prior to output.
Fixes#20298
Change-Id: Ib3d5a15fd7dc40009523fcdc1b93ddc62a1b05f2
Reviewed-on: https://go-review.googlesource.com/42954
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
ARM64 assembler backend only accepts loads and stores with small
or aligned offset. The compiler therefore can only fold small or
aligned offsets into loads and stores. For locals and args, their
offsets to SP are not known until very late, and the compiler
makes conservative decision not folding some of them. However,
in most cases, the offset is indeed small or aligned, and can
be folded into load and store (but actually not).
This CL adds support of loads and stores with large and unaligned
offsets. When the offset doesn't fit into the instruction, it
uses two instructions and (for very large offset) the constant
pool. This way, the compiler doesn't need to be conservative,
and can simply fold the offset.
To make it work, the assembler's optab matching rules need to be
changed. Before, MOVD accepts C_UAUTO32K which matches multiple
of 8 between 0 and 32K, and also C_UAUTO16K, which may not be
multiple of 8 and does not fit into MOVD instruction. The
assembler errors in the latter case. This change makes it only
matches multiple of 8 (or offsets within ±256, which also fits
in instruction), and uses the large-or-unaligned-offset rule
for things doesn't fit (without error). Other sized move rules
are changed similarly.
Class C_UAUTO64K and C_UOREG64K are removed, as they are never
used.
In shared library, load/store of global is rewritten to using
GOT and temp register, which conflicts with the use of temp
register for assembling large offset. So the folding is disabled
for globals in shared library mode.
Reduce cmd/go binary size by 2%.
name old time/op new time/op delta
BinaryTree17-8 8.67s ± 0% 8.61s ± 0% -0.60% (p=0.000 n=9+10)
Fannkuch11-8 6.24s ± 0% 6.19s ± 0% -0.83% (p=0.000 n=10+9)
FmtFprintfEmpty-8 116ns ± 0% 116ns ± 0% ~ (all equal)
FmtFprintfString-8 196ns ± 0% 192ns ± 0% -1.89% (p=0.000 n=10+10)
FmtFprintfInt-8 199ns ± 0% 198ns ± 0% -0.35% (p=0.001 n=9+10)
FmtFprintfIntInt-8 294ns ± 0% 293ns ± 0% -0.34% (p=0.000 n=8+8)
FmtFprintfPrefixedInt-8 318ns ± 1% 318ns ± 1% ~ (p=1.000 n=10+10)
FmtFprintfFloat-8 537ns ± 0% 531ns ± 0% -1.17% (p=0.000 n=9+10)
FmtManyArgs-8 1.19µs ± 1% 1.18µs ± 1% -1.41% (p=0.001 n=10+10)
GobDecode-8 17.2ms ± 1% 17.3ms ± 2% ~ (p=0.165 n=10+10)
GobEncode-8 14.7ms ± 1% 14.7ms ± 2% ~ (p=0.631 n=10+10)
Gzip-8 837ms ± 0% 836ms ± 0% -0.14% (p=0.006 n=9+10)
Gunzip-8 141ms ± 0% 139ms ± 0% -1.24% (p=0.000 n=9+10)
HTTPClientServer-8 256µs ± 1% 253µs ± 1% -1.35% (p=0.000 n=10+10)
JSONEncode-8 40.1ms ± 1% 41.3ms ± 1% +3.06% (p=0.000 n=10+9)
JSONDecode-8 157ms ± 1% 156ms ± 1% -0.83% (p=0.001 n=9+8)
Mandelbrot200-8 8.94ms ± 0% 8.94ms ± 0% +0.02% (p=0.000 n=9+9)
GoParse-8 8.69ms ± 0% 8.54ms ± 1% -1.69% (p=0.000 n=8+10)
RegexpMatchEasy0_32-8 227ns ± 1% 228ns ± 1% +0.48% (p=0.016 n=10+9)
RegexpMatchEasy0_1K-8 1.92µs ± 0% 1.63µs ± 0% -15.08% (p=0.000 n=10+9)
RegexpMatchEasy1_32-8 256ns ± 0% 251ns ± 0% -2.19% (p=0.000 n=10+9)
RegexpMatchEasy1_1K-8 2.38µs ± 0% 2.09µs ± 0% -12.49% (p=0.000 n=10+9)
RegexpMatchMedium_32-8 352ns ± 0% 354ns ± 0% +0.39% (p=0.002 n=10+9)
RegexpMatchMedium_1K-8 106µs ± 0% 106µs ± 0% -0.05% (p=0.005 n=10+9)
RegexpMatchHard_32-8 5.92µs ± 0% 5.89µs ± 0% -0.40% (p=0.000 n=9+8)
RegexpMatchHard_1K-8 180µs ± 0% 179µs ± 0% -0.14% (p=0.000 n=10+9)
Revcomp-8 1.20s ± 0% 1.13s ± 0% -6.29% (p=0.000 n=9+8)
Template-8 159ms ± 1% 154ms ± 1% -3.14% (p=0.000 n=9+10)
TimeParse-8 800ns ± 3% 769ns ± 1% -3.91% (p=0.000 n=10+10)
TimeFormat-8 826ns ± 2% 817ns ± 2% -1.04% (p=0.050 n=10+10)
[Geo mean] 145µs 143µs -1.79%
Change-Id: I5fc42087cee9b54ea414f8ef6d6d020b80eb5985
Reviewed-on: https://go-review.googlesource.com/42172
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
It can be disabled by setting the environment variable
GO19CONCURRENTCOMPILATION=0, or with -gcflags=-c=1.
Fixes#15756.
Change-Id: I7acbf16330512b62ee14ecbab1f46b53ec5a67b6
Reviewed-on: https://go-review.googlesource.com/41820
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This permits the user to override the code generation flag when they
know better. This is always a good policy for all flags automatically
inserted by the build system.
Doing this now so that I can write a test for #20290.
Update #20290
Change-Id: I5c6708a277238d571b8d037993a5a59e2a442e98
Reviewed-on: https://go-review.googlesource.com/42952
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>