Static branch predictions (which guide block ordering) are
adjusted based on:
loop/not-loop (favor looping)
abnormal-exit/not (avoid panic)
call/not-call (avoid call)
ret/default (treat returns as rare)
This appears to make no difference in performance of real
code, meaning the compiler itself. The earlier version of
this has been stripped down to help make the cost of this
only-aesthetic-on-Intel phase be as cheap as possible (we
probably want information about inner loops for improving
register allocation, but because register allocation follows
close behind this pass, conceivably the information could be
reused -- so we might do this anyway just to normalize
output).
For a ./make.bash that takes 200 user seconds, about .75
second is reported in likelyadjust (summing nanoseconds
reported with -d=ssa/likelyadjust/time ).
Upstream predictions are respected.
Includes test, limited to build on amd64 only.
Did several iterations on the debugging output to allow
some rough checks on behavior.
Debug=1 logging notes agree/disagree with earlier passes,
allowing analysis like the following:
Run on make.bash:
GO_GCFLAGS=-d=ssa/likelyadjust/debug \
./make.bash >& lkly5.log
grep 'ranch prediction' lkly5.log | wc -l
78242 // 78k predictions
grep 'ranch predi' lkly5.log | egrep -v 'agrees with' | wc -l
29633 // 29k NEW predictions
grep 'disagrees' lkly5.log | wc -l
444 // contradicted 444 times
grep '< exit' lkly5.log | wc -l
10212 // 10k exit predictions
grep '< exit' lkly5.log | egrep 'disagrees' | wc -l
5 // 5 contradicted by previous prediction
grep '< exit' lkly5.log | egrep -v 'agrees' | wc -l
702 // 702-5 redundant with previous prediction
grep '< call' lkly5.log | egrep -v 'agrees' | wc -l
16699 // 16k new call predictions
grep 'stay in loop' lkly5.log | egrep -v 'agrees' | wc -l
3951 // 4k new "remain in loop" predictions
Fixes#11451.
Change-Id: Iafb0504f7030d304ef4b6dc1aba9a5789151a593
Reviewed-on: https://go-review.googlesource.com/19995
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Add a blank line before the "package ssa" lines so the "autogenerated
don't edit" comments don't end up in godoc output.
Change-Id: I82bf90d52d426ce1a8e21483fc8f47b3689259c7
Reviewed-on: https://go-review.googlesource.com/20086
Reviewed-by: Keith Randall <khr@golang.org>
* This is a very basic form of straight line strength reduction.
* Removes one multiplication from a[b].c++; a[b+1].c++
* It increases pressure on the register allocator because
CSE creates more copies of the multiplication sizeof(a[0])*b.
Change-Id: I686a18e9c24cc6f8bdfa925713afed034f7d36d0
Reviewed-on: https://go-review.googlesource.com/20091
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Use only reflect.TypeOf to detect if argument is a string.
The wasString return is only needed in doPrint with the 'v' verb.
This type of string detection is handled correctly by reflect.TypeOf
which is used already in doPrint for identifying a string argument.
Remove now obsolete wasString computations and return values.
Change-Id: Iea2de7ac0f5c536a53eec63f7e679d628f5af8dc
Reviewed-on: https://go-review.googlesource.com/19976
Run-TryBot: Rob Pike <r@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Also, relocate related const and type definitions from go.go.
Change-Id: Ieb9b672da8dd510ca67022b4f7ae49a778a56579
Reviewed-on: https://go-review.googlesource.com/20080
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Keith Randall <khr@golang.org>
Add writeback code to each return location which copies
the final result back to the correct stack location.
Cgo plays tricky games by taking the address of a
in f(a int) (b int) and then using that address to
modify b. So for cgo-generated Go code, disable the
SSAing of output args.
Update #14511
Change-Id: I95cba727d53699d31124eef41db0e03935862be9
Reviewed-on: https://go-review.googlesource.com/19988
Reviewed-by: Todd Neal <todd@tneal.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
In best of 10, linking cmd/go shows a ~10% improvement.
tip: real 0m1.152s user 0m1.005s
this: real 0m1.065s user 0m0.924s
Change-Id: I303a20b94332feaedc1033c453247a0e4c05c843
Reviewed-on: https://go-review.googlesource.com/19978
Reviewed-by: David Crawshaw <crawshaw@golang.org>
It gets rewritten to an xor by the linker also.
Change-Id: Iae35130325d41bd1a09b7e971190cae6f4e17fac
Reviewed-on: https://go-review.googlesource.com/20058
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The object file reader in cmd/link reads the symbol name into a scratch
[]byte, converts it to a string, and then does a substring replacement.
Instead, this CL does the replacement on the []byte into the scratch
space and then creates the final string.
Linking godoc without DWARF, best of ten, shows a ~10% improvement.
tip: real 0m1.099s user 0m1.541s
this: real 0m0.990s user 0m1.280s
This is part of an attempt to make suffixarray string deduping
come out as a wash, but it's not there yet:
cl/19987: real 0m1.335s user 0m1.794s
cl/19987+this: real 0m1.225s user 0m1.540s
Change-Id: Idf061fdfbd7f08aa3a1f5933d3f111fdd1659210
Reviewed-on: https://go-review.googlesource.com/20025
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The Cpos function is used frequently (at least once per symbol) and
it is implemented with the seek syscall. Instead, track current
output offset and use it.
Building the godoc binary with DWARF, best of ten:
tip: real 0m1.287s user 0m1.573s
this: real 0m1.208s user 0m1.555s
Change-Id: I068148695cd6b4d32cd145db25e59e6f6bae6945
Reviewed-on: https://go-review.googlesource.com/20055
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reduces number of memory allocations by 12%:
Before: 1816664
After: 1581591
Small speed improvement.
Change-Id: I61281fb852e8e31851a350e3ae756676705024a4
Reviewed-on: https://go-review.googlesource.com/20027
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Preallocate ~2MB for Lsym map (size calculation from http://play.golang.org/p/9L7F5naXRr).
Reduces best of 10 link time of cmd/go by ~4%.
On cmd/go max resident size unaffected, on println hello world max resident size grows by 4mb from 18mb->22mb. Performance improves in both cases.
tip: real 0m1.283s user 0m1.502s sys 0m0.144s
this: real 0m1.341s user 0m1.598s sys 0m0.136s
Change-Id: I4a95e45fe552f1f64f53e868421b9f45a34f8b96
Reviewed-on: https://go-review.googlesource.com/19979
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
In golang.org/cl/14449 the `getdents' system call got changed to use
_SYS_getdents as a layer of indirection instead of SYS_GETDENTS64 for
compatibility with mips64, but this broke mksyscall.pl, which then
died with with:
syscall_linux.go:840: malformed //sys declaration
Change-Id: Icb61965d8730f6e81f9fb0fa28c7bab635470f09
Reviewed-on: https://go-review.googlesource.com/20051
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This is a AMD64 version of CL19743.
Saves additional 1574 bytes in go binary.
This also speeds up bzip2 by 1-4%
Change-Id: I031ba423663c4e83fdefe44e5296f24143e303da
Reviewed-on: https://go-review.googlesource.com/19939
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Movups is 1 byte smaller than movapd that we currently use.
Change-Id: I22f771f066529352722a28543535ec43497cb9c5
Reviewed-on: https://go-review.googlesource.com/19938
Reviewed-by: David Chase <drchase@google.com>
Named returned values should only be used on public funcs and methods
when it contributes to the documentation.
Named return values should not be used if they're only saving the
programmer a few lines of code inside the body of the function,
especially if that means there's stutter in the documentation or it
was only there so the programmer could use a naked return
statement. (Naked returns should not be used except in very small
functions)
This change is a manual audit & cleanup of public func signatures.
Signatures were not changed if:
* the func was private (wouldn't be in public godoc)
* the documentation referenced it
* the named return value was an interesting name. (i.e. it wasn't
simply stutter, repeating the name of the type)
There should be no changes in behavior. (At least: none intended)
Change-Id: I3472ef49619678fe786e5e0994bdf2d9de76d109
Reviewed-on: https://go-review.googlesource.com/20024
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
Do not replace the sign in front of a number with a space if both
f.space and f.plus are both specified for number formatting.
This was already the case for integers but not for floats
and complex numbers.
Updates: #14543.
Change-Id: I07ddeb505003db84a8a7d2c743dc19fc427a00bd
Reviewed-on: https://go-review.googlesource.com/19974
Run-TryBot: Rob Pike <r@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Besides being more efficient in a large build, this avoids a possible
race when creating the input file.
Change-Id: Ifc2cb055925a76be9c90eac56d84ebd9e14f2bbc
Reviewed-on: https://go-review.googlesource.com/19392
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Avoid targeting a partial register with load;
ensure source of load (writebarrier) is aligned.
Better yet would be "CMPB $1,writebarrier" but that requires
wrestling with flagalloc (mem operand complicates moving
instruction around).
Didn't see a change in time for
benchcmd -n 10 Build go build net/http
Verified that we clean the code up properly:
0x20a8 <main.main+104>: mov 0xc30a2(%rip),%eax
# 0xc5150 <runtime.writeBarrier>
0x20ae <main.main+110>: test %al,%al
Change-Id: Id5fb8c260eaec27bd727cb0ae1476c60343b0986
Reviewed-on: https://go-review.googlesource.com/19998
Reviewed-by: Keith Randall <khr@golang.org>
Exposed data already in sdom to avoid recreating it in prove.
Change-Id: I834c9c03ed8faeaee013e5a1b3f955908f0e0915
Reviewed-on: https://go-review.googlesource.com/19999
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
This is minor cleanup that reduces test output noise.
Change-Id: Ib6db4daf8cb67b7784b2d5b222fa37c7f78a6a04
Reviewed-on: https://go-review.googlesource.com/19997
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
* It does very simple bounds checking elimination. E.g.
removes the second check in for i := range a { a[i]++; a[i++]; }
* Improves on the following redundant expression:
return a6 || (a6 || (a6 || a4)) || (a6 || (a4 || a6 || (false || a6)))
* Linear in the number of block edges.
I patched in CL 12960 that does bounds, nil and constant propagation
to make sure this CL is not just redundant. Size of pkg/tool/linux_amd64/*
(excluding compile which is affected by this change):
With IsInBounds and IsSliceInBounds
-this -12960 92285080
+this -12960 91947416
-this +12960 91978976
+this +12960 91923088
Gain is ~110% of 12960.
Without IsInBounds and IsSliceInBounds (older run)
-this -12960 95515512
+this -12960 95492536
-this +12960 95216920
+this +12960 95204440
Shaves 22k on its own.
* Can we handle IsInBounds better with this? In
for i := range a { a[i]++; } the bounds checking at a[i]
is not eliminated.
Change-Id: I98957427399145fb33693173fd4d5a8d71c7cc20
Reviewed-on: https://go-review.googlesource.com/19710
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This is minor cleanup that makes the tests more readable.
Change-Id: I9f1f98f0f035096c284bdf3501e7520517a3e4d9
Reviewed-on: https://go-review.googlesource.com/19993
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Found this while reading through the code. The benchmark
accidently called the wrong function.
Change-Id: Idb88aa71e7098a4e29e7f5f39e64f8c5f8936a2a
Reviewed-on: https://go-review.googlesource.com/19977
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Add the max arg length to opcodes and use it in zcse. Doesn't affect
speed, but allows better checking in checkFunc and removes the need
to keep a list of zero arg opcodes up to date.
Change-Id: I157c6587154604119720ec6228b767b6e52bb5c7
Reviewed-on: https://go-review.googlesource.com/19994
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Motivation:
* Previously, the size of the compressed data was used for metrics,
rather than the uncompressed size. This causes the library to appear
to perform poorly relative to C or other implementation. Switch it
to use the uncompressed size so that it matches how decompression
benchmarks are usually done (like in compress/flate). This also makes
it easier to compare bzip2 rates to other algorithms since they measure
performance in this way.
* Also, reset the timer after doing initialization work.
Change-Id: I32112c2ee8e7391e658c9cf31039f70a689d9b9d
Reviewed-on: https://go-review.googlesource.com/17611
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The bzip2 block size is a multiple of 100*1000 not 100*1024.
Thus, the bzip2 decoder would incorrectly decode files with larger
block sizes when it should have otherwise failed.
Fortunately, we can correct this in a backwards compatible way since
Go has no implementation of a bzip2 encoder to produce bad blocks :)
To confirm that the C bzip2 utlity chokes on this data:
$ echo "425a683131415926535936dc55330063ffc0006000200020a40830008b00
08b8bb9229c28481b6e2a998" | xxd -r -p | bzip2 -d
bzip2: Data integrity error when decompressing.
Fixes#13941
Change-Id: I2402e8829a8027ef94dd4fac050b200440a3d4e4
Reviewed-on: https://go-review.googlesource.com/20011
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The LZ77 portion of DEFLATE is relatively self-contained. For the
decompression side of things, we extract this logic out for the
following reasons:
* It is easier to test just the LZ77 portion of the logic.
* It reduces the noise in the inflate.go
Also, we adjust the way that callbacks are handled in the inflate.
Instead of using functions to abstract the logical componets of
huffmanBlock(), use goto statements to jump between the necessary
sections. This is faster since it avoids a function call and is
arguably more readable.
benchmark old MB/s new MB/s speedup
BenchmarkDecodeDigitsSpeed1e4-4 53.62 60.11 1.12x
BenchmarkDecodeDigitsSpeed1e5-4 61.90 69.07 1.12x
BenchmarkDecodeDigitsSpeed1e6-4 63.24 70.58 1.12x
BenchmarkDecodeDigitsDefault1e4-4 54.10 59.00 1.09x
BenchmarkDecodeDigitsDefault1e5-4 69.50 74.07 1.07x
BenchmarkDecodeDigitsDefault1e6-4 71.54 75.85 1.06x
BenchmarkDecodeDigitsCompress1e4-4 54.39 58.94 1.08x
BenchmarkDecodeDigitsCompress1e5-4 69.21 73.96 1.07x
BenchmarkDecodeDigitsCompress1e6-4 71.14 75.75 1.06x
BenchmarkDecodeTwainSpeed1e4-4 53.15 58.13 1.09x
BenchmarkDecodeTwainSpeed1e5-4 66.56 72.29 1.09x
BenchmarkDecodeTwainSpeed1e6-4 69.13 75.11 1.09x
BenchmarkDecodeTwainDefault1e4-4 56.00 60.23 1.08x
BenchmarkDecodeTwainDefault1e5-4 77.84 82.27 1.06x
BenchmarkDecodeTwainDefault1e6-4 82.07 86.85 1.06x
BenchmarkDecodeTwainCompress1e4-4 56.13 60.38 1.08x
BenchmarkDecodeTwainCompress1e5-4 78.23 82.62 1.06x
BenchmarkDecodeTwainCompress1e6-4 82.38 86.73 1.05x
Change-Id: I8c6ae0e6bed652dd0570fc113c999977f5e71636
Reviewed-on: https://go-review.googlesource.com/16528
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Moves the implementation of RunBenchmarks to a non-exported function
that returns whether the execution was OK, and uses that to identify
failure in benchmarks.The exported function is kept for compatibility.
Like before, benchmarks will only be executed if tests and examples
pass. The PASS message will not be printed if there was a failure in
a benchmark.
Example output
BenchmarkThatCallsFatal-8 --- FAIL: BenchmarkThatCallsFatal-8
x_test.go:6: called by benchmark
FAIL
exit status 1
FAIL _/.../src/cmd/go/testdata/src/benchfatal 0.009s
Fixes#14307.
Change-Id: I6f3ddadc7da8a250763168cc099ae8b325a79602
Reviewed-on: https://go-review.googlesource.com/19889
Reviewed-by: Ian Lance Taylor <iant@golang.org>