The following performance improvements have been made to the
low-level atomic functions for ppc64le & ppc64:
- For those cases containing a lwarx and stwcx (or other sizes):
sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
The sync is moved before (outside) the lwarx/stwcx loop, and the
sync after is removed, so it becomes:
sync, lwarx, maybe something, stwcx, loop to lwarx, isync
- For the Or8 and And8, the shifting and manipulation of the
address to the word aligned version were removed and the
instructions were changed to use lbarx, stbcx instead of
register shifting, xor, then lwarx, stwcx.
- New instructions LWSYNC, LBAR, STBCC were tested and added.
runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode
instead of the WORD encoding.
Fixes#15469
Ran some of the benchmarks in the runtime and sync directories.
Some results varied from run to run but the trend was improvement
based on best times for base and new:
runtime.test:
BenchmarkChanNonblocking-128 0.88 0.89 +1.14%
BenchmarkChanUncontended-128 569 511 -10.19%
BenchmarkChanContended-128 63110 53231 -15.65%
BenchmarkChanSync-128 691 598 -13.46%
BenchmarkChanSyncWork-128 11355 11649 +2.59%
BenchmarkChanProdCons0-128 2402 2090 -12.99%
BenchmarkChanProdCons10-128 1348 1363 +1.11%
BenchmarkChanProdCons100-128 1002 746 -25.55%
BenchmarkChanProdConsWork0-128 2554 2720 +6.50%
BenchmarkChanProdConsWork10-128 1909 1804 -5.50%
BenchmarkChanProdConsWork100-128 1624 1580 -2.71%
BenchmarkChanCreation-128 237 212 -10.55%
BenchmarkChanSem-128 705 667 -5.39%
BenchmarkChanPopular-128 5081190 4497566 -11.49%
BenchmarkCreateGoroutines-128 532 473 -11.09%
BenchmarkCreateGoroutinesParallel-128 35.0 34.7 -0.86%
BenchmarkCreateGoroutinesCapture-128 4923 4200 -14.69%
sync.test:
BenchmarkUncontendedSemaphore-128 112 94.2 -15.89%
BenchmarkContendedSemaphore-128 133 128 -3.76%
BenchmarkMutexUncontended-128 1.90 1.67 -12.11%
BenchmarkMutex-128 353 310 -12.18%
BenchmarkMutexSlack-128 304 283 -6.91%
BenchmarkMutexWork-128 554 541 -2.35%
BenchmarkMutexWorkSlack-128 567 556 -1.94%
BenchmarkMutexNoSpin-128 275 242 -12.00%
BenchmarkMutexSpin-128 1129 1030 -8.77%
BenchmarkOnce-128 1.08 0.96 -11.11%
BenchmarkPool-128 29.8 27.4 -8.05%
BenchmarkPoolOverflow-128 40564 36583 -9.81%
BenchmarkSemaUncontended-128 3.14 2.63 -16.24%
BenchmarkSemaSyntNonblock-128 1087 1069 -1.66%
BenchmarkSemaSyntBlock-128 897 893 -0.45%
BenchmarkSemaWorkNonblock-128 1034 1028 -0.58%
BenchmarkSemaWorkBlock-128 949 886 -6.64%
Change-Id: I4403fb29d3cd5254b7b1ce87a216bd11b391079e
Reviewed-on: https://go-review.googlesource.com/22549
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
If b has exactly one predecessor, as happens
frequently with static calls, we can make
lookupVarOutgoing generate less garbage.
Instead of generating a value that is just
going to be an OpCopy and then get eliminated,
loop. This can lead to lots of looping.
However, this loop is way cheaper than generating
lots of ssa.Values and then eliminating them.
For a subset of the code in #15537:
Before:
28.31 real 36.17 user 1.68 sys
2282450944 maximum resident set size
After:
9.63 real 11.66 user 0.51 sys
638144512 maximum resident set size
Updates #15537.
Excitingly, it appears that this also helps
regular code:
name old time/op new time/op delta
Template 288ms ± 6% 276ms ± 7% -4.13% (p=0.000 n=21+24)
Unicode 143ms ± 8% 141ms ±10% ~ (p=0.287 n=24+25)
GoTypes 932ms ± 4% 874ms ± 4% -6.20% (p=0.000 n=23+22)
Compiler 4.89s ± 4% 4.58s ± 4% -6.46% (p=0.000 n=22+23)
MakeBash 40.2s ±13% 39.8s ± 9% ~ (p=0.648 n=23+23)
name old user-ns/op new user-ns/op delta
Template 388user-ms ±10% 373user-ms ± 5% -3.80% (p=0.000 n=24+25)
Unicode 203user-ms ± 6% 202user-ms ± 7% ~ (p=0.492 n=22+24)
GoTypes 1.29user-s ± 4% 1.17user-s ± 4% -9.67% (p=0.000 n=25+23)
Compiler 6.86user-s ± 5% 6.28user-s ± 4% -8.49% (p=0.000 n=25+25)
name old alloc/op new alloc/op delta
Template 51.5MB ± 0% 47.6MB ± 0% -7.47% (p=0.000 n=22+25)
Unicode 37.2MB ± 0% 37.1MB ± 0% -0.21% (p=0.000 n=25+25)
GoTypes 166MB ± 0% 138MB ± 0% -16.83% (p=0.000 n=25+25)
Compiler 756MB ± 0% 628MB ± 0% -16.96% (p=0.000 n=25+23)
name old allocs/op new allocs/op delta
Template 450k ± 0% 445k ± 0% -1.02% (p=0.000 n=25+25)
Unicode 356k ± 0% 356k ± 0% ~ (p=0.374 n=24+25)
GoTypes 1.31M ± 0% 1.25M ± 0% -4.18% (p=0.000 n=25+25)
Compiler 5.29M ± 0% 5.02M ± 0% -5.15% (p=0.000 n=25+23)
It also seems to help in other cases in which
phi insertion is a pain point (#14774, #14934).
Change-Id: Ibd05ed7b99d262117ece7bb250dfa8c3d1cc5dd2
Reviewed-on: https://go-review.googlesource.com/22790
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
The legacy mips64 backend doesn't handle large uint->float conversion
correctly. See #15552.
Change-Id: I84ceeaa95cc4e85f09cc46dfb30ab5d151f6b205
Reviewed-on: https://go-review.googlesource.com/22800
Reviewed-by: Minux Ma <minux@golang.org>
Builder is too slow. This test passed on builder machines but took
15+ min.
Change-Id: Ief9d67ea47671a57e954e402751043bc1ce09451
Reviewed-on: https://go-review.googlesource.com/22798
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
I overlooked it when rebasing CL 19803.
Change-Id: Ife9d6bcc6a772715d137af903c64bafac0cdb216
Reviewed-on: https://go-review.googlesource.com/22797
Reviewed-by: Minux Ma <minux@golang.org>
Since tracebackctxt.go uses //export functions, the C functions can't be
externally visible in the C comment. The code was using attributes to
work around that, but that failed on Windows.
Change-Id: If4449fd8209a8998b4f6855ea89e5db1471b2981
Reviewed-on: https://go-review.googlesource.com/22786
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
CLs 22181, 22332 and 22336 intorduced new functionality to be used
in cmd/link (see issue #15345 for details). But we didn't have chance
to use new functionality yet. Unexport newly introduced identifiers,
so we don't have to commit to the API until we actually tried it.
Rename File.COFFSymbols into File._COFFSymbols,
COFFSymbol.FullName into COFFSymbol._FullName,
Section.Relocs into Section._Relocs,
Reloc into _Relocs,
File.StringTable into File._StringTable and
StringTable into _StringTable.
Updates #15345
Change-Id: I770eeb61f855de85e0c175225d5d1c006869b9ec
Reviewed-on: https://go-review.googlesource.com/22720
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Adds the FCFIDU instruction and uses it instead of the FCFID
instruction for unsigned integer to float casts. This change means
that unsigned integers do not have to be cast to signed integers
before being cast to a floating point value. Therefore it is no
longer necessary to insert instructions to detect and fix
values that overflow int64.
The previous code generating the uint64 to int64 cast handled
overflow by truncating the uint64 value. This truncation can
change the result of the rounding performed by the integer to
float cast.
The FCFIDU instruction was added in Power ISA 2.06B.
Fixes#15539.
Change-Id: Ia37a9631293eff91032d4cd9a9bec759d2142437
Reviewed-on: https://go-review.googlesource.com/22772
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
debug/elf is only needed to determine the endianness of the host
machine, which is easy to do without debug/elf.
Fixes#15180.
Change-Id: I21035ed3884871270765a1ca3b812a5d4890a7ee
Reviewed-on: https://go-review.googlesource.com/21662
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
To speed up the ssacheck check builder and make it on by default as a
trybot.
Change-Id: I91a3347491507c84f4878dff744ca426ba3e2e9f
Reviewed-on: https://go-review.googlesource.com/22755
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
external linking is now supported.
Change-Id: I13e90c39dad86e60781adecdbe8e6bc9e522f740
Reviewed-on: https://go-review.googlesource.com/19811
Reviewed-by: Minux Ma <minux@golang.org>
external linking is now supported.
Change-Id: I3f552f5f09391205fced509fe8a5a38297ea8153
Reviewed-on: https://go-review.googlesource.com/19810
Reviewed-by: Minux Ma <minux@golang.org>
Fixes#14544
Change-Id: I58b0b164ebbfeafe4ab32039a063df53e3018a6d
Reviewed-on: https://go-review.googlesource.com/22730
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Sean Lake <odysseus9672@gmail.com>
Similar to the flate Writer pools already used,
this adds pooling for flate Readers.
compress/flate allows re-using of Readers, see
https://codereview.appspot.com/97140043/
In a real-world scenario when reading ~ 500 small files from a ZIP
archive this gives a speedup of 1.5x-2x.
Fixes#14289
Change-Id: I2d98ad983e95ab7d97e06fd0145f619b4f47caa4
Reviewed-on: https://go-review.googlesource.com/19416
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Consider three shared libraries:
libBase.so -- defines a type T
lib2.so -- references type T
lib3.so -- also references type T, and something from lib2
lib2.so will contain a type symbol for T in its symbol table, but no
definition. If, when linking lib3.so the linker reads the symbols from lib2.so
before libBase.so, the linker didn't read the type data and later crashed.
The fix is trivial but the test change is a bit messy because the order the
linker reads the shared libraries in ends up depending on the order of the
import statements in the file so I had to rename one of the test packages so
that gofmt doesn't fix the test by accident...
Fixes#15516
Change-Id: I124b058f782c900a3a54c15ed66a0d91d0cde5ce
Reviewed-on: https://go-review.googlesource.com/22744
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
See CL 22720 for details.
Updates #15345
Change-Id: If93ddbb8137d57da9846b671160b4cebe1992570
Reviewed-on: https://go-review.googlesource.com/22752
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Why not? Because the 386 backend can't handle one of them.
But other than that, it should work.
Change-Id: Iaeb9735f8c3c281136a0734376dec5ddba21be3b
Reviewed-on: https://go-review.googlesource.com/22748
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Change https://golang.org/cl/8945 allowed Go to use its own DNS resolver
instead of libc in a number of cases. The code parses nsswitch.conf and
attempts to resolve things in the same order. Unfortunately, builds with
netgo completely ignore this parsing and always search via
hostLookupFilesDNS.
This commit modifies the logic to allow binaries built with netgo to
parse nsswitch.conf and attempt to resolve using the order specified
there. If the parsing results in hostLookupCGo, it falls back to the
original hostLookupFilesDNS. Tests are also added to ensure that both
the parsing and the fallback work properly.
Fixes#14354
Change-Id: Ib079ad03d7036a4ec57f18352a15ba55d933f261
Reviewed-on: https://go-review.googlesource.com/19523
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
If we collected a cgo traceback when entering the SIGPROF signal
handler, record it as part of the profiling stack trace.
This serves as the promised test for https://golang.org/cl/21055 .
Change-Id: I5f60cd6cea1d9b7c3932211483a6bfab60ed21d2
Reviewed-on: https://go-review.googlesource.com/22650
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
It never makes sense to CSE two ops that generate memory.
We might as well start those ops off in their own partition.
Fixes#15520
Change-Id: I0091ed51640f2c10cd0117f290b034dde7a86721
Reviewed-on: https://go-review.googlesource.com/22741
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
1) Blank parameters cannot be accessed so the package doesn't matter.
Do not export it, and consistently use localpkg when importing a
blank parameter.
2) More accurately replicate fmt.go and parser.go logic when importing
a blank struct field. Blank struct fields get exported without
package qualification.
(This is actually incorrect, even with the old textual export format,
but we will fix that in a separate change. See also issue 15514.)
Fixes#15491.
Change-Id: I7978e8de163eb9965964942aee27f13bf94a7c3c
Reviewed-on: https://go-review.googlesource.com/22714
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
1.7 traces embed symbol info and we now generate symbolized pprof profiles,
so we don't need the binary. Make binary argument optional as 1.5 traces
still need it.
Change-Id: I65eb13e3d20ec765acf85c42d42a8d7aae09854c
Reviewed-on: https://go-review.googlesource.com/22410
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Austin Clements <austin@google.com>
Race runtime also needs local malloc caches and currently uses
a mix of per-OS-thread and per-goroutine caches. This leads to
increased memory consumption. But more importantly cache of
synchronization objects is per-goroutine and we don't always
have goroutine context when feeing memory in GC. As the result
synchronization object descriptors leak (more precisely, they
can be reused if another synchronization object is recreated
at the same address, but it does not always help). For example,
the added BenchmarkSyncLeak has effectively runaway memory
consumption (based on a real long running server).
This change updates race runtime with support for per-P contexts.
BenchmarkSyncLeak now stabilizes at ~1GB memory consumption.
Long term, this will allow us to remove race runtime dependency
on glibc (as malloc is the main cornerstone).
I've also implemented a different scheme to pass P context to
race runtime: scheduler notified race runtime about association
between G and P by calling procwire(g, p)/procunwire(g, p).
But it turned out to be very messy as we have lots of places
where the association changes (e.g. syscalls). So I dropped it
in favor of the current scheme: race runtime asks scheduler
about the current P.
Fixes#14533
Change-Id: Iad10d2f816a44affae1b9fed446b3580eafd8c69
Reviewed-on: https://go-review.googlesource.com/19970
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Runqempty is a critical predicate for scheduler. If runqempty spuriously
returns true, then scheduler can fail to schedule arbitrary number of
runnable goroutines on idle Ps for arbitrary long time. With the addition
of runnext runqempty predicate become broken (can spuriously return true).
Consider that runnext is not nil and the main array is empty. Runqempty
observes that the array is empty, then it is descheduled for some time.
Then queue owner pushes another element to the queue evicting runnext
into the array. Then queue owner pops runnext. Then runqempty resumes
and observes runnext is nil and returns true. But there were no point
in time when the queue was empty.
Fix runqempty predicate to not return true spuriously.
Change-Id: Ifb7d75a699101f3ff753c4ce7c983cf08befd31e
Reviewed-on: https://go-review.googlesource.com/20858
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
GCC, unlike clang, does not provide any way for code being compiled to tell if
-fsanitize-thread was passed. But cgo can look to see if that flag is being
passed and generate different code in that case.
Fixes#14602
Change-Id: I86cb5318c2e35501ae399618c05af461d1252d2d
Reviewed-on: https://go-review.googlesource.com/22688
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The format has been tweaked several times in the latest cycle, so
replace go13ld with go17ld.
Change-Id: I343c49b02b7516fd781bc96ad46640579da68c59
Reviewed-on: https://go-review.googlesource.com/22708
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
I got a complaint that cgo output triggers warnings with
-Wdeclaration-after-statement. I don't think it's worth testing for
this--C has permitted declarations after statements since C99--but it is
easy enough to fix. It may break again; so it goes.
This CL also fixes errno handling to avoid getting confused if the tsan
functions happen to change the global errno variable.
Change-Id: I0ec7c63a6be5653ef44799d134c8d27cb5efa441
Reviewed-on: https://go-review.googlesource.com/22686
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>