If we get a SIGPROF on a non-Go thread, and the program has not called
runtime.SetCgoTraceback so we have no way to collect a stack trace, then
record a profile that is just the PC where the signal occurred. That
will at least point the user to the right area.
Retrieving the PC from the sigctxt in a signal handler on a non-G thread
required marking a number of trivial sigctxt methods as nosplit, and,
for extra safety, nowritebarrierrec.
The test shows that the existing test CgoPprofThread test does not test
the stack trace, just the profile signal. Leaving that for later.
Change-Id: I8f8f3ff09ac099fc9d9df94b5a9d210ffc20c4ab
Reviewed-on: https://go-review.googlesource.com/30252
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
PC passed to racegostart is expected to be a return PC
of the go statement. Race runtime will subtract 1 from the PC
before symbolization. Passing start PC of a function is wrong.
Add sys.PCQuantum to the function start PC.
Update #17190
Change-Id: Ia504c49e79af84ed4ea360c2aea472b370ea8bf5
Reviewed-on: https://go-review.googlesource.com/29712
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
For -buildmode=plugin, this lets the linker drop the main.main symbol
out of the binary while including most of the runtime.
(In the future it should be possible to drop the entire runtime
package from plugins.)
Change-Id: I3e7a024ddf5cc945e3d8b84bf37a0b7cb2a00eb6
Reviewed-on: https://go-review.googlesource.com/27821
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
noescape is now 0 instructions with the SSA backend.
fast atomics are no longer a TODO (at least for amd64).
Change-Id: Ib6e06f7471bef282a47ba236d8ce95404bb60a42
Reviewed-on: https://go-review.googlesource.com/28087
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When compiling with -buildmode=shared, a map[int32]*_type is created for
each extra module mapping duplicate types back to a canonical object.
This is done in the function typelinksinit, which is called before the
init function that sets up the hash functions for the map
implementation. The result is typemap becomes unusable after
runtime initialization.
The fix in this CL is to move algorithm init before typelinksinit in
the runtime setup process. (For 1.8, we may want to turn typemap into
a sorted slice of types and use binary search.)
Manually tested on GOOS=linux with:
GOHOSTARCH=386 GOARCH=386 ./make.bash && \
go install -buildmode=shared std && \
cd ../test && \
go run run.go -linkshared
Fixes#16590
Change-Id: Idc08c50cc70d20028276fbf564509d2cd5405210
Reviewed-on: https://go-review.googlesource.com/25469
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
When a non-Go thread calls into Go, the runtime needs an M to run the Go
code. The runtime keeps a list of extra M's available. When the last
extra M is allocated, the needextram field is set to tell it to allocate
a new extra M as soon as it is running in Go. This ensures that an extra
M will always be available for the next thread.
However, if many threads need an extra M at the same time, this
serializes them all. One thread will get an extra M with the needextram
field set. All the other threads will see that there is no M available
and will go to sleep. The one thread that succeeded will create a new
extra M. One lucky thread will get it. All the other threads will see
that there is no M available and will go to sleep. The effect is
thundering herd, as all the threads looking for an extra M go through
the process one by one. This seems to have a particularly bad effect on
the FreeBSD scheduler for some reason.
With this change, we track the number of threads waiting for an M, and
create all of them as soon as one thread gets through. This still means
that all the threads will fight for the lock to pick up the next M. But
at least each thread that gets the lock will succeed, instead of going
to sleep only to fight again.
This smooths out the performance greatly on FreeBSD, reducing the
average wall time of `testprogcgo CgoCallbackGC` by 74%. On GNU/Linux
the average wall time goes down by 9%.
Fixes#13926Fixes#16396
Change-Id: I6dc42a4156085a7ed4e5334c60b39db8f8ef8fea
Reviewed-on: https://go-review.googlesource.com/25047
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
When the blocked field was first introduced back in
https://golang.org/cl/61250043 the scheduler trace code incorrectly used
m->blocked instead of mp->blocked. That has carried through the
conversion to Go. This CL fixes it.
Change-Id: Id81907b625221895aa5c85b9853f7c185efd8f4b
Reviewed-on: https://go-review.googlesource.com/24571
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This has a minor performance cost, but far less than is being gained by SSA.
As an experiment, enable it during the Go 1.7 beta.
Having frame pointers on by default makes Linux's perf, Intel VTune,
and other profilers much more useful, because it lets them gather a
stack trace efficiently on profiling events.
(It doesn't help us that much, since when we walk the stack we usually
need to look up PC-specific information as well.)
Fixes#15840.
Change-Id: I4efd38412a0de4a9c87b1b6e5d11c301e63f1a2a
Reviewed-on: https://go-review.googlesource.com/23451
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Acquire and release the TSAN synchronization point when calling malloc,
just as we do when calling any other C function. If we don't do this,
TSAN will report false positive errors about races calling malloc and
free.
We used to have a special code path for malloc and free, going through
the runtime functions cmalloc and cfree. The special code path for cfree
was no longer used even before this CL. This CL stops using the special
code path for malloc, because there is no place along that path where we
could conditionally insert the TSAN synchronization. This CL removes
the support for the special code path for both functions.
Instead, cgo now automatically generates the malloc function as though
it were referenced as C.malloc. We need to automatically generate it
even if C.malloc is not called, even if malloc and size_t are not
declared, to support cgo-provided functions like C.CString.
Change-Id: I829854ec0787a80f33fa0a8a0dc2ee1d617830e2
Reviewed-on: https://go-review.googlesource.com/23260
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently scanstack obtains its own gcWork from the P for the duration
of the stack scan and then, if called during mark termination,
disposes the gcWork.
However, this means that the number of workbufs allocated will be at
least the number of stacks scanned during mark termination, which may
be very high (especially during a STW GC). This happens because, in
steady state, each scanstack will obtain a fresh workbuf (either from
the empty list or by allocating it), fill it with the scan results,
and then dispose it to the full list. Nothing is consuming from the
full list during this (and hence nothing is recycling them to the
empty list), so the length of the full list by the time mark
termination starts draining it is at least the number of stacks
scanned.
Fix this by pushing the gcWork acquisition up the stack to either the
gcDrain that calls markroot that calls scanstack (which batches across
many stack scans and is the path taken during STW GC) or to newstack
(which is still a single scanstack call, but this is roughly bounded
by the number of Ps).
This fix reduces the workbuf allocation for the test program from
issue #15319 from 213 MB (roughly 2KB * 1e5 goroutines) to 10 MB.
Fixes#15319.
Note that there's potentially a similar issue in write barriers during
mark 2. Fixing that will be more difficult since there's no broader
non-preemptible context, but it should also be less of a problem since
the full list is being drained during mark 2.
Some overall improvements in the go1 benchmarks, plus the usual noise.
No significant change in the garbage benchmark (time/op or GC memory).
name old time/op new time/op delta
BinaryTree17-12 2.54s ± 1% 2.51s ± 1% -1.09% (p=0.000 n=20+19)
Fannkuch11-12 2.12s ± 0% 2.17s ± 0% +2.18% (p=0.000 n=19+18)
FmtFprintfEmpty-12 45.1ns ± 1% 45.2ns ± 0% ~ (p=0.078 n=19+18)
FmtFprintfString-12 127ns ± 0% 128ns ± 0% +1.08% (p=0.000 n=19+16)
FmtFprintfInt-12 125ns ± 0% 122ns ± 1% -2.71% (p=0.000 n=14+18)
FmtFprintfIntInt-12 196ns ± 0% 190ns ± 1% -2.91% (p=0.000 n=12+20)
FmtFprintfPrefixedInt-12 196ns ± 0% 194ns ± 1% -0.94% (p=0.000 n=13+18)
FmtFprintfFloat-12 253ns ± 1% 251ns ± 1% -0.86% (p=0.000 n=19+20)
FmtManyArgs-12 807ns ± 1% 784ns ± 1% -2.85% (p=0.000 n=20+20)
GobDecode-12 7.13ms ± 1% 7.12ms ± 1% ~ (p=0.351 n=19+20)
GobEncode-12 5.89ms ± 0% 5.95ms ± 0% +0.94% (p=0.000 n=19+19)
Gzip-12 219ms ± 1% 221ms ± 1% +1.35% (p=0.000 n=18+20)
Gunzip-12 37.5ms ± 1% 37.4ms ± 0% ~ (p=0.057 n=20+19)
HTTPClientServer-12 81.4µs ± 4% 81.9µs ± 3% ~ (p=0.118 n=17+18)
JSONEncode-12 15.7ms ± 1% 15.8ms ± 1% +0.73% (p=0.000 n=17+18)
JSONDecode-12 57.9ms ± 1% 57.2ms ± 1% -1.34% (p=0.000 n=19+19)
Mandelbrot200-12 4.12ms ± 1% 4.10ms ± 0% -0.33% (p=0.000 n=19+17)
GoParse-12 3.22ms ± 2% 3.25ms ± 1% +0.72% (p=0.000 n=18+20)
RegexpMatchEasy0_32-12 70.6ns ± 1% 71.1ns ± 2% +0.63% (p=0.005 n=19+20)
RegexpMatchEasy0_1K-12 240ns ± 0% 239ns ± 1% -0.59% (p=0.000 n=19+20)
RegexpMatchEasy1_32-12 71.3ns ± 1% 71.3ns ± 1% ~ (p=0.844 n=17+17)
RegexpMatchEasy1_1K-12 384ns ± 2% 371ns ± 1% -3.45% (p=0.000 n=19+20)
RegexpMatchMedium_32-12 109ns ± 1% 108ns ± 2% -0.48% (p=0.029 n=19+19)
RegexpMatchMedium_1K-12 34.3µs ± 1% 34.5µs ± 2% ~ (p=0.160 n=18+20)
RegexpMatchHard_32-12 1.79µs ± 9% 1.72µs ± 2% -3.83% (p=0.000 n=19+19)
RegexpMatchHard_1K-12 53.3µs ± 4% 51.8µs ± 1% -2.82% (p=0.000 n=19+20)
Revcomp-12 386ms ± 0% 388ms ± 0% +0.72% (p=0.000 n=17+20)
Template-12 62.9ms ± 1% 62.5ms ± 1% -0.57% (p=0.010 n=18+19)
TimeParse-12 325ns ± 0% 331ns ± 0% +1.84% (p=0.000 n=18+19)
TimeFormat-12 338ns ± 0% 343ns ± 0% +1.34% (p=0.000 n=18+20)
[Geo mean] 52.7µs 52.5µs -0.42%
Change-Id: Ib2d34736c4ae2ec329605b0fbc44636038d8d018
Reviewed-on: https://go-review.googlesource.com/23391
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently ready always puts the readied goroutine in runnext. We're
going to have to change this for some uses, so add a flag for whether
or not to use runnext.
For now we always pass true so this is a no-op change.
For #15706.
Change-Id: Iaa66d8355ccfe4bbe347570cc1b1878c70fa25df
Reviewed-on: https://go-review.googlesource.com/23171
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
The test case in #15639 somehow causes an invalid syscall frame. The
failure is obscured because the throw occurs when throwsplit == true,
which causes a "stack split at bad time" error when trying to print the
throw message.
This CL fixes the "stack split at bad time" by using systemstack. No
test because there shouldn't be any way to trigger this error anyhow.
Update #15639.
Change-Id: I4240f3fd01bdc3c112f3ffd1316b68504222d9e1
Reviewed-on: https://go-review.googlesource.com/23153
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
If we collected a cgo traceback when entering the SIGPROF signal
handler, record it as part of the profiling stack trace.
This serves as the promised test for https://golang.org/cl/21055 .
Change-Id: I5f60cd6cea1d9b7c3932211483a6bfab60ed21d2
Reviewed-on: https://go-review.googlesource.com/22650
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Race runtime also needs local malloc caches and currently uses
a mix of per-OS-thread and per-goroutine caches. This leads to
increased memory consumption. But more importantly cache of
synchronization objects is per-goroutine and we don't always
have goroutine context when feeing memory in GC. As the result
synchronization object descriptors leak (more precisely, they
can be reused if another synchronization object is recreated
at the same address, but it does not always help). For example,
the added BenchmarkSyncLeak has effectively runaway memory
consumption (based on a real long running server).
This change updates race runtime with support for per-P contexts.
BenchmarkSyncLeak now stabilizes at ~1GB memory consumption.
Long term, this will allow us to remove race runtime dependency
on glibc (as malloc is the main cornerstone).
I've also implemented a different scheme to pass P context to
race runtime: scheduler notified race runtime about association
between G and P by calling procwire(g, p)/procunwire(g, p).
But it turned out to be very messy as we have lots of places
where the association changes (e.g. syscalls). So I dropped it
in favor of the current scheme: race runtime asks scheduler
about the current P.
Fixes#14533
Change-Id: Iad10d2f816a44affae1b9fed446b3580eafd8c69
Reviewed-on: https://go-review.googlesource.com/19970
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Runqempty is a critical predicate for scheduler. If runqempty spuriously
returns true, then scheduler can fail to schedule arbitrary number of
runnable goroutines on idle Ps for arbitrary long time. With the addition
of runnext runqempty predicate become broken (can spuriously return true).
Consider that runnext is not nil and the main array is empty. Runqempty
observes that the array is empty, then it is descheduled for some time.
Then queue owner pushes another element to the queue evicting runnext
into the array. Then queue owner pops runnext. Then runqempty resumes
and observes runnext is nil and returns true. But there were no point
in time when the queue was empty.
Fix runqempty predicate to not return true spuriously.
Change-Id: Ifb7d75a699101f3ff753c4ce7c983cf08befd31e
Reviewed-on: https://go-review.googlesource.com/20858
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently we clear gcscanvalid in both casgstatus and
casfrom_Gscanstatus if the new status is _Grunning. This is very
important to do in casgstatus. However, this is potentially wrong in
casfrom_Gscanstatus because in this case the caller doesn't own gp and
hence the write is racy. Unlike the other _Gscan statuses, during
_Gscanrunning, the G is still running. This does not indicate that
it's transitioning into a running state. The scan simply hasn't
happened yet, so it's neither valid nor invalid.
Conveniently, this also means clearing gcscanvalid is unnecessary in
this case because the G was already in _Grunning, so we can simply
remove this code. What will happen instead is that the G will be
preempted to scan itself, that scan will set gcscanvalid to true, and
then the G will return to _Grunning via casgstatus, clearing
gcscanvalid.
This fix will become necessary shortly when we start keeping track of
the set of G's with dirty stacks, since it will no longer be
idempotent to simply set gcscanvalid to false.
Change-Id: I688c82e6fbf00d5dbbbff49efa66acb99ee86785
Reviewed-on: https://go-review.googlesource.com/20669
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently all free Gs are added to one list. Split this into two
lists: one for free Gs with cached stacks and one for Gs without
cached stacks.
This lets us preferentially allocate Gs that already have a stack, but
more importantly, it sets us up to free cached G stacks concurrently.
Change-Id: Idbe486f708997e1c9d166662995283f02d1eeb3c
Reviewed-on: https://go-review.googlesource.com/20664
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently tracer uses global sequencer and it introduces
significant slowdown on parallel machines (up to 10x).
Replace the global sequencer with per-goroutine sequencer.
If we assign per-goroutine sequence numbers to only 3 types
of events (start, unblock and syscall exit), it is enough to
restore consistent partial ordering of all events. Even these
events don't need sequence numbers all the time (if goroutine
starts on the same P where it was unblocked, then start does
not need sequence number).
The burden of restoring the order is put on trace parser.
Details of the algorithm are described in the comments.
On http benchmark with GOMAXPROCS=48:
no tracing: 5026 ns/op
tracing: 27803 ns/op (+453%)
with this change: 6369 ns/op (+26%, mostly for traceback)
Also trace size is reduced by ~22%. Average event size before: 4.63
bytes/event, after: 3.62 bytes/event.
Besides running trace tests, I've also tested with manually broken
cputicks (random skew for each event, per-P skew and episodic random skew).
In all cases broken timestamps were detected and no test failures.
Change-Id: I078bde421ccc386a66f6c2051ab207bcd5613efa
Reviewed-on: https://go-review.googlesource.com/21512
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This CL introduces the typeOff type and a lookup method of the same
name that can turn a typeOff offset into an *rtype.
In a typical Go binary (built with buildmode=exe, pie, c-archive, or
c-shared), there is one moduledata and all typeOff values are offsets
relative to firstmoduledata.types. This makes computing the pointer
cheap in typical programs.
With buildmode=shared (and one day, buildmode=plugin) there are
multiple modules whose relative offset is determined at runtime.
We identify a type in the general case by the pair of the original
*rtype that references it and its typeOff value. We determine
the module from the original pointer, and then use the typeOff from
there to compute the final *rtype.
To ensure there is only one *rtype representing each type, the
runtime initializes a typemap for each module, using any identical
type from an earlier module when resolving that offset. This means
that types computed from an offset match the type mapped by the
pointer dynamic relocations.
A series of followup CLs will replace other *rtype values with typeOff
(and name/*string with nameOff).
For types created at runtime by reflect, type offsets are treated as
global IDs and reference into a reflect offset map kept by the runtime.
darwin/amd64:
cmd/go: -57KB (0.6%)
jujud: -557KB (0.8%)
linux/amd64 PIE:
cmd/go: -361KB (3.0%)
jujud: -3.5MB (4.2%)
For #6853.
Change-Id: Icf096fd884a0a0cb9f280f46f7a26c70a9006c96
Reviewed-on: https://go-review.googlesource.com/21285
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
So that all Go processes do not die on startup on a system with >256 CPUs.
I tested this by hacking osinit to set ncpu to 1000.
Updates #15131
Change-Id: I52e061a0de97be41d684dd8b748fa9087d6f1aef
Reviewed-on: https://go-review.googlesource.com/21599
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Usleep(100) in runqgrab negatively affects latency and throughput
of parallel application. We are sleeping instead of doing useful work.
This is effect is particularly visible on windows where minimal
sleep duration is 1-15ms.
Reduce sleep from 100us to 3us and use osyield on windows.
Sync chan send/recv takes ~50ns, so 3us gives us ~50x overshoot.
benchmark old ns/op new ns/op delta
BenchmarkChanSync-12 216 217 +0.46%
BenchmarkChanSyncWork-12 27213 25816 -5.13%
CPU consumption goes up from 106% to 108% in the first case,
and from 107% to 125% in the second case.
Test case from #14790 on windows:
BenchmarkDefaultResolution-8 4583372 29720 -99.35%
Benchmark1ms-8 992056 30701 -96.91%
99-th latency percentile for HTTP request serving is improved by up to 15%
(see http://golang.org/cl/20835 for details).
The following benchmarks are from the change that originally added this sleep
(see https://golang.org/s/go15gomaxprocs):
name old time/op new time/op delta
Chain 22.6µs ± 2% 22.7µs ± 6% ~ (p=0.905 n=9+10)
ChainBuf 22.4µs ± 3% 22.5µs ± 4% ~ (p=0.780 n=9+10)
Chain-2 23.5µs ± 4% 24.9µs ± 1% +5.66% (p=0.000 n=10+9)
ChainBuf-2 23.7µs ± 1% 24.4µs ± 1% +3.31% (p=0.000 n=9+10)
Chain-4 24.2µs ± 2% 25.1µs ± 3% +3.70% (p=0.000 n=9+10)
ChainBuf-4 24.4µs ± 5% 25.0µs ± 2% +2.37% (p=0.023 n=10+10)
Powser 2.37s ± 1% 2.37s ± 1% ~ (p=0.423 n=8+9)
Powser-2 2.48s ± 2% 2.57s ± 2% +3.74% (p=0.000 n=10+9)
Powser-4 2.66s ± 1% 2.75s ± 1% +3.40% (p=0.000 n=10+10)
Sieve 13.3s ± 2% 13.3s ± 2% ~ (p=1.000 n=10+9)
Sieve-2 7.00s ± 2% 7.44s ±16% ~ (p=0.408 n=8+10)
Sieve-4 4.13s ±21% 3.85s ±22% ~ (p=0.113 n=9+9)
Fixes#14790
Change-Id: Ie7c6a1c4f9c8eb2f5d65ab127a3845386d6f8b5d
Reviewed-on: https://go-review.googlesource.com/20835
Reviewed-by: Austin Clements <austin@google.com>
See #14874
This change makes the runtime register all compiler generated itabs
(as obtained from the moduledata) during init.
Change-Id: I9969a0985b99b8bda820a631f7fe4c78f1174cdf
Reviewed-on: https://go-review.googlesource.com/20900
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Michel Lespinasse <walken@google.com>
During random stealing we steal 4*GOMAXPROCS times from random procs.
One would expect that most of the time we check all procs this way,
but due to low quality PRNG we actually miss procs with frightening
probability. Below are modelling experiment results for 1e6 tries:
GOMAXPROCS = 2 : missed 1 procs 7944 times
GOMAXPROCS = 3 : missed 1 procs 101620 times
GOMAXPROCS = 3 : missed 2 procs 3571 times
GOMAXPROCS = 4 : missed 1 procs 63916 times
GOMAXPROCS = 4 : missed 2 procs 61 times
GOMAXPROCS = 4 : missed 3 procs 16 times
GOMAXPROCS = 5 : missed 1 procs 133136 times
GOMAXPROCS = 5 : missed 2 procs 1025 times
GOMAXPROCS = 5 : missed 3 procs 101 times
GOMAXPROCS = 5 : missed 4 procs 15 times
GOMAXPROCS = 8 : missed 1 procs 151765 times
GOMAXPROCS = 8 : missed 2 procs 5057 times
GOMAXPROCS = 8 : missed 3 procs 1726 times
GOMAXPROCS = 8 : missed 4 procs 68 times
GOMAXPROCS = 12 : missed 1 procs 199081 times
GOMAXPROCS = 12 : missed 2 procs 27489 times
GOMAXPROCS = 12 : missed 3 procs 3113 times
GOMAXPROCS = 12 : missed 4 procs 233 times
GOMAXPROCS = 12 : missed 5 procs 9 times
GOMAXPROCS = 16 : missed 1 procs 237477 times
GOMAXPROCS = 16 : missed 2 procs 30037 times
GOMAXPROCS = 16 : missed 3 procs 9466 times
GOMAXPROCS = 16 : missed 4 procs 1334 times
GOMAXPROCS = 16 : missed 5 procs 192 times
GOMAXPROCS = 16 : missed 6 procs 5 times
GOMAXPROCS = 16 : missed 7 procs 1 times
GOMAXPROCS = 16 : missed 8 procs 1 times
A missed proc won't lead to underutilization because we check all procs
again after dropping P. But it can lead to an unpleasant situation
when we miss a proc, drop P, check all procs, discover work, acquire P,
miss the proc again, repeat.
Improve stealing logic to cover all procs.
Also don't enter spinning mode and try to steal when there is nobody around.
Change-Id: Ibb6b122cc7fb836991bad7d0639b77c807aab4c2
Reviewed-on: https://go-review.googlesource.com/20836
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
gopark calls the unlock function after setting the G to _Gwaiting.
This means it's generally unsafe to access the G's stack from the
unlock function because the G may start running on another P. Once we
start shrinking stacks concurrently, a stack shrink could also move
the stack the moment after it enters _Gwaiting and before the unlock
function is called.
Document this restriction and fix the two places where we currently
violate it.
This is unlikely to be a problem in practice for these two places
right now, but they're already skating on thin ice. For example, the
following sequence could in principle cause corruption, deadlock, or a
panic in the select code:
On M1/P1:
1. G1 selects on channels A and B.
2. selectgoImpl calls gopark.
3. gopark puts G1 in _Gwaiting.
4. gopark calls selparkcommit.
5. selparkcommit releases the lock on channel A.
On M2/P2:
6. G2 sends to channel A.
7. The send puts G1 in _Grunnable and puts it on P2's run queue.
8. The scheduler runs, selects G1, puts it in _Grunning, and resumes G1.
9. On G1, the sellock immediately following the gopark gets called.
10. sellock grows and moves the stack.
On M1/P1:
11. selparkcommit continues to scan the lock order for the next
channel to unlock, but it's now reading from a freed (and possibly
reused) stack.
This shouldn't happen in practice because step 10 isn't the first call
to sellock, so the stack should already be big enough. However, once
we start shrinking stacks concurrently, this reasoning won't work any
more.
For #12967.
Change-Id: I3660c5be37e5be9f87433cb8141bdfdf37fadc4c
Reviewed-on: https://go-review.googlesource.com/20038
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Given a G, there's currently no way to find the channel it's blocking
on. We'll need this information to fix a (probably theoretical) bug in
select and to implement concurrent stack shrinking, so record the
channel in the sudog.
For #12967.
Change-Id: If8fb63a140f1d07175818824d08c0ebeec2bdf66
Reviewed-on: https://go-review.googlesource.com/20035
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Move functions testSchedLocalQueueLocal and testSchedLocalQueueSteal
from proc.go to export_test.go, the only site that they are used.
Fixes#14796
Change-Id: I16b6fa4a13835eab33f66a2c2e87a5f5c79b7bd3
Reviewed-on: https://go-review.googlesource.com/20640
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently dropg does not unwire locked g/m.
This is unnecessary distiction between locked and non-locked g/m.
We always restart goroutines with execute which re-wires g/m.
First, this produces false sense that this distinction is necessary.
Second, it can confuse some sanity and cross checks. For example,
if we check that g/m are unwired before we wire them in execute,
the check will fail for locked g/m. I've hit this while doing some
race detector changes, When we deschedule a goroutine and run
scheduler code, m.curg is generally nil, but not for locked ms.
Remove the distinction.
Change-Id: I3b87a28ff343baa1d564aab1f821b582a84dee07
Reviewed-on: https://go-review.googlesource.com/19950
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
_Genqueue and _Gscanenqueue were introduced as part of the GC quiesce
code. The quiesce code was removed by 197aa9e, but these states and
some associated code stuck around. Remove them.
Change-Id: I69df81881602d4a431556513dac2959668d27c20
Reviewed-on: https://go-review.googlesource.com/19638
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Change-Id: I6cb8ac7b59812e82111ab3b0f8303ab8194a5129
Reviewed-on: https://go-review.googlesource.com/19791
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently it's possible for the scheduler to deadlock with the right
confluence of locked Gs, assists, and scheduling of background mark
workers. Broadly, this happens because handoffp is stricter than
findrunnable, and if the only work for a P is GC work, handoffp will
put the P into idle, rather than starting an M to execute that P. One
way this can happen is as follows:
0. There is only one user G, which we'll call G 1. There is more than
one P, but they're all idle except the one running G 1.
1. G 1 locks itself to an M using runtime.LockOSThread.
2. GC starts up and enters mark 1.
3. G 1 performs a GC assist, which completes mark 1 without being
fully satisfied. Completing mark 1 causes all background mark
workers to park. And since the assist isn't fully satisfied, it
parks as well, waiting for a background mark worker to satisfy its
remaining assist debt.
4. The assist park enters the scheduler. Since G 1 is locked to the M,
the scheduler releases the P and calls handoffp to hand the P to
another M.
5. handoffp checks the local and global run queues, which are empty,
and sees that there are idle Ps, so rather than start an M, it puts
the P into idle.
At this point, all of the Gs are waiting and all of the Ps are idle.
In particular, none of the GC workers are running, so no mark work
gets done and the assist on the main G is never satisfied, so the
whole process soft locks up.
Fix this by making handoffp start an M if there is GC work. This
reintroduces a key invariant: that in any situation where findrunnable
would return a G to run on a P, handoffp for that P will start an M to
run work on that P.
Fixes#13645.
Tested by running 2,689 iterations of `go tool dist test -no-rebuild
runtime:cpu124` across 10 linux-amd64-noopt VMs with no failures.
Without this change, the failure rate was somewhere around 1%.
Performance change is negligible.
name old time/op new time/op delta
XBenchGarbage-12 2.48ms ± 2% 2.48ms ± 1% -0.24% (p=0.000 n=92+93)
name old time/op new time/op delta
BinaryTree17-12 2.86s ± 2% 2.87s ± 2% ~ (p=0.667 n=19+20)
Fannkuch11-12 2.52s ± 1% 2.47s ± 1% -2.05% (p=0.000 n=18+20)
FmtFprintfEmpty-12 51.7ns ± 1% 51.5ns ± 3% ~ (p=0.931 n=16+20)
FmtFprintfString-12 170ns ± 1% 168ns ± 1% -0.65% (p=0.000 n=19+19)
FmtFprintfInt-12 160ns ± 0% 160ns ± 0% +0.18% (p=0.033 n=17+19)
FmtFprintfIntInt-12 265ns ± 1% 273ns ± 1% +2.98% (p=0.000 n=17+19)
FmtFprintfPrefixedInt-12 235ns ± 1% 239ns ± 1% +1.99% (p=0.000 n=16+19)
FmtFprintfFloat-12 315ns ± 0% 315ns ± 1% ~ (p=0.250 n=17+19)
FmtManyArgs-12 1.04µs ± 1% 1.05µs ± 0% +0.87% (p=0.000 n=17+19)
GobDecode-12 7.93ms ± 0% 7.85ms ± 1% -1.03% (p=0.000 n=16+18)
GobEncode-12 6.62ms ± 1% 6.58ms ± 1% -0.60% (p=0.000 n=18+19)
Gzip-12 322ms ± 1% 320ms ± 1% -0.46% (p=0.009 n=20+20)
Gunzip-12 42.5ms ± 1% 42.5ms ± 0% ~ (p=0.751 n=19+19)
HTTPClientServer-12 69.7µs ± 1% 70.0µs ± 2% ~ (p=0.056 n=19+19)
JSONEncode-12 16.9ms ± 1% 16.7ms ± 1% -1.13% (p=0.000 n=19+19)
JSONDecode-12 61.5ms ± 1% 61.3ms ± 1% -0.35% (p=0.001 n=20+17)
Mandelbrot200-12 3.94ms ± 0% 3.91ms ± 0% -0.67% (p=0.000 n=20+18)
GoParse-12 3.71ms ± 1% 3.70ms ± 1% ~ (p=0.244 n=17+19)
RegexpMatchEasy0_32-12 101ns ± 1% 102ns ± 2% +0.54% (p=0.037 n=19+20)
RegexpMatchEasy0_1K-12 349ns ± 0% 350ns ± 0% +0.33% (p=0.000 n=17+18)
RegexpMatchEasy1_32-12 84.5ns ± 2% 84.2ns ± 1% -0.43% (p=0.048 n=19+20)
RegexpMatchEasy1_1K-12 510ns ± 1% 513ns ± 2% +0.58% (p=0.002 n=18+20)
RegexpMatchMedium_32-12 132ns ± 1% 134ns ± 1% +0.95% (p=0.000 n=20+20)
RegexpMatchMedium_1K-12 40.1µs ± 1% 39.6µs ± 1% -1.39% (p=0.000 n=20+20)
RegexpMatchHard_32-12 2.08µs ± 0% 2.06µs ± 1% -0.95% (p=0.000 n=18+18)
RegexpMatchHard_1K-12 62.2µs ± 1% 61.9µs ± 1% -0.42% (p=0.001 n=19+20)
Revcomp-12 537ms ± 0% 536ms ± 0% ~ (p=0.076 n=20+20)
Template-12 71.3ms ± 1% 69.3ms ± 1% -2.75% (p=0.000 n=20+20)
TimeParse-12 361ns ± 0% 360ns ± 1% ~ (p=0.056 n=19+19)
TimeFormat-12 353ns ± 0% 352ns ± 0% -0.23% (p=0.000 n=17+18)
[Geo mean] 62.6µs 62.5µs -0.17%
Change-Id: I0fbbbe4d7d99653ba5600ffb4394fa03558bc4e9
Reviewed-on: https://go-review.googlesource.com/19107
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently p.gcBgMarkWorker is a *g. Change it to a guintptr. This
eliminates a write barrier during the subtle mark worker parking dance
(which isn't known to be causing problems, but may).
Change-Id: Ibf12c05ac910820448059e69a68e5b882c993ed8
Reviewed-on: https://go-review.googlesource.com/18970
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Currently mark workers attach to their designated Ps before parking,
either during initialization or after performing a phase transition.
However, in both of these cases, it's possible that the mark worker is
running on a different P than the one it attaches to. This is a
problem, because as soon as the worker attaches to a P, that P's
scheduler can execute the worker. If the worker hasn't yet parked on
the P it's actually running on, this means the worker G will be
running in two places at once. The most visible consequence of this is
that once the first instance of the worker does park, it will clear
g.m and the second instance will crash shortly when it tries to use
g.m.
Fix this by moving the attach to the gopark callback. At this point,
the G is genuinely stopped and the callback is running on the system
stack, so it's safe for another P's scheduler to pick up the worker G.
Fixes#13363. Fixes#13978.
Change-Id: If2f7c4a4174f9511f6227e14a27c56fb842d1cc8
Reviewed-on: https://go-review.googlesource.com/18761
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
This doesn't fix a bug, but may improve performance in programs that
have many concurrent calls from C to Go. The old code made several
system calls between lockextra and unlockextra. That could be happening
while another thread is spinning acquiring lockextra. This changes the
code to not make any system calls while holding the lock.
Change-Id: I50576478e478670c3d6429ad4e1b7d80f98a19d8
Reviewed-on: https://go-review.googlesource.com/18548
Reviewed-by: Russ Cox <rsc@golang.org>
[Repeat of CL 18343 with build fixes.]
Before, NumGoroutine counted system goroutines and Stack (usually) didn't show them,
which was inconsistent and confusing.
To resolve which way they should be consistent, it seems like
package main
import "runtime"
func main() { println(runtime.NumGoroutine()) }
should print 1 regardless of internal runtime details. Make it so.
Fixes#11706.
Change-Id: If26749fec06aa0ff84311f7941b88d140552e81d
Reviewed-on: https://go-review.googlesource.com/18432
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
Fixes#13881.
Change-Id: Idff77db381640184ddd2b65022133bb226168800
Reviewed-on: https://go-review.googlesource.com/18449
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
The previous behaviour of installing the signal handlers in a separate
thread meant that Go initialization raced with non-Go initialization if
the non-Go initialization also wanted to install signal handlers. Make
installing signal handlers synchronous so that the process-wide behavior
is predictable.
Update #9896.
Change-Id: Ice24299877ec46f8518b072a381932d273096a32
Reviewed-on: https://go-review.googlesource.com/18150
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Before, NumGoroutine counted system goroutines and Stack (usually) didn't show them,
which was inconsistent and confusing.
To resolve which way they should be consistent, it seems like
package main
import "runtime"
func main() { println(runtime.NumGoroutine()) }
should print 1 regardless of internal runtime details. Make it so.
Fixes#11706.
Change-Id: I6bfe26a901de517728192cfb26a5568c4ef4fe47
Reviewed-on: https://go-review.googlesource.com/18343
Reviewed-by: Austin Clements <austin@google.com>
f90b48e intended to require the stack barrier lock in all cases of
sigprof that walked the user stack, but got it wrong. In particular,
if sp < gp.stack.lo || gp.stack.hi < sp, tracebackUser would be true,
but we wouldn't acquire the stack lock. If it then turned out that we
were in a cgo call, it would walk the stack without the lock.
In fact, the whole structure of stack locking is sigprof is somewhat
wrong because it assumes the G to lock is gp.m.curg, but all three
gentraceback calls start from potentially different Gs.
To fix this, we lower the gcTryLockStackBarriers calls much closer to
the gentraceback calls. There are now three separate trylock calls,
each clearly associated with a gentraceback and the locked G clearly
matches the G from which the gentraceback starts. This actually brings
the sigprof logic closer to what it originally was before stack
barrier locking.
This depends on "runtime: increase assumed stack size in
externalthreadhandler" because it very slightly increases the stack
used by sigprof; without this other commit, this is enough to blow the
profiler thread's assumed stack size.
Fixes#12528 (hopefully for real this time!).
For the 1.5 branch, though it will require some backporting. On the
1.5 branch, this will *not* require the "runtime: increase assumed
stack size in externalthreadhandler" commit: there's no pcvalue cache,
so the used stack is smaller.
Change-Id: Id2f6446ac276848f6fc158bee550cccd03186b83
Reviewed-on: https://go-review.googlesource.com/18328
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
If a sigprof happens during a cgo call, we traceback from the entry
point of the cgo call. However, if the SP is outside of the G's stack,
we'll then ignore this traceback, even if it was successful, and
overwrite it with just _ExternalCode.
Fix this by accepting any successful traceback, regardless of whether
we got it from a cgo entry point or from regular Go code.
Fixes#13466.
Change-Id: I5da9684361fc5964f44985d74a8cdf02ffefd213
Reviewed-on: https://go-review.googlesource.com/18327
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
We were setting the signal mask of a new m to the signal mask of the m
that created it. That failed when that m happened to be the one created
by ensureSigM, which sets its signal mask to only include the signals
being caught by os/signal.Notify.
Fixes#13164.
Update #9896.
Change-Id: I705c196fe9d11754e10bab9e9b2e7530ecdfa367
Reviewed-on: https://go-review.googlesource.com/18064
Reviewed-by: Russ Cox <rsc@golang.org>