Avoid targeting a partial register with load;
ensure source of load (writebarrier) is aligned.
Better yet would be "CMPB $1,writebarrier", but that requires
wrestling with flagalloc (a memory operand complicates moving the
instruction around).
Didn't see a change in time for
benchcmd -n 10 Build go build net/http
Verified that we clean the code up properly:
0x20a8 <main.main+104>: mov 0xc30a2(%rip),%eax  # 0xc5150 <runtime.writeBarrier>
0x20ae <main.main+110>: test %al,%al
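For reference, the flag being tested is the leading byte of the
runtime's writeBarrier variable. A rough Go model of the layout and of
the check the compiler emits (illustrative only; this is neither the
runtime source nor the generated code):

    package main

    // writeBarrier models the runtime variable of the same name: the
    // enabled flag sits at the start of an 8-byte-aligned word, so the
    // compiler can read it with a full-width aligned load instead of a
    // byte load into a partial register.
    var writeBarrier = struct {
        enabled bool   // checked before every pointer write
        needed  bool
        cgo     bool
        alignme uint64 // forces alignment of the whole variable
    }{}

    func storePointer(dst **int, val *int) {
        if writeBarrier.enabled { // compiles to an aligned load plus a byte test
            // a real compiler would call the write-barrier routine here
        }
        *dst = val
    }

    func main() {
        x, y := new(int), new(int)
        storePointer(&x, y)
    }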
Change-Id: Id5fb8c260eaec27bd727cb0ae1476c60343b0986
Reviewed-on: https://go-review.googlesource.com/19998
Reviewed-by: Keith Randall <khr@golang.org>
Currently most uses of gcWork use the per-P gcWork, but there are two
places that still use a stack-based gcWork. Simplify things by making
these instead use the per-P gcWork.
Change-Id: I712d012cce9dd5757c8541824e9641ac1c2a329c
Reviewed-on: https://go-review.googlesource.com/19636
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently p.gcBgMarkWorker is a *g. Change it to a guintptr. This
eliminates a write barrier during the subtle mark worker parking dance
(which isn't known to be causing problems, but may).
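A minimal sketch of the guintptr pattern, with names mirroring the
runtime's but written as a self-contained toy (not the runtime
source): because the pointer is held as a plain integer, assigning the
field is an ordinary store and never produces a write barrier.

    package main

    import (
        "fmt"
        "unsafe"
    )

    type g struct{ id int64 } // stand-in for the runtime's g

    // guintptr holds a *g as a uintptr, so assigning it is a plain
    // integer store with no write barrier.
    type guintptr uintptr

    func (gp guintptr) ptr() *g   { return (*g)(unsafe.Pointer(gp)) }
    func (gp *guintptr) set(v *g) { *gp = guintptr(unsafe.Pointer(v)) }

    func main() {
        worker := &g{id: 1}
        var p struct{ gcBgMarkWorker guintptr } // stand-in for a P's field
        p.gcBgMarkWorker.set(worker)            // no write barrier on this store
        fmt.Println(p.gcBgMarkWorker.ptr().id)
    }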
Change-Id: Ibf12c05ac910820448059e69a68e5b882c993ed8
Reviewed-on: https://go-review.googlesource.com/18970
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Currently mark workers attach to their designated Ps before parking,
either during initialization or after performing a phase transition.
However, in both of these cases, it's possible that the mark worker is
running on a different P than the one it attaches to. This is a
problem, because as soon as the worker attaches to a P, that P's
scheduler can execute the worker. If the worker hasn't yet parked on
the P it's actually running on, this means the worker G will be
running in two places at once. The most visible consequence of this is
that once the first instance of the worker does park, it will clear
g.m and the second instance will crash shortly when it tries to use
g.m.
Fix this by moving the attach to the gopark callback. At this point,
the G is genuinely stopped and the callback is running on the system
stack, so it's safe for another P's scheduler to pick up the worker G.
Fixes #13363. Fixes #13978.
Change-Id: If2f7c4a4174f9511f6227e14a27c56fb842d1cc8
Reviewed-on: https://go-review.googlesource.com/18761
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
GC assists check gcBlackenEnabled under the assist queue lock to avoid
going to sleep after gcWakeAllAssists has already woken all assists.
However, currently we clear gcBlackenEnabled shortly *after* waking
all assists, which opens a window where this exact race can happen.
Fix this by clearing gcBlackenEnabled before waking blocked assists.
However, it's unlikely this actually matters because the world is
stopped between waking assists and clearing gcBlackenEnabled and there
aren't any obvious allocations during this window, so I don't think an
assist could actually slip into this race window.
Updates #13645.
Change-Id: I7571f059530481dc781d8fd96a1a40aadebecb0d
Reviewed-on: https://go-review.googlesource.com/18682
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Go 1.6 simplified the GC phases. The "synchronize Ps" phase no longer
exists and "root scan" and "mark" phases have been combined.
Update the gctrace line implementation and documentation to remove the
unused phases.
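With the Go 1.6 phases, the documented trace line now has roughly this
shape (placeholder fields as in the runtime package docs):

    gc # @#s #%: #+#+# ms clock, #+#/#/#+# ms cpu, #->#-># MB, # MB goal, # P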
Fixes #13536.
Change-Id: I4fc37a3ce1ae3a99d48c0be2df64cbda3e05dee6
Reviewed-on: https://go-review.googlesource.com/18458
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Currently, runtime/debug.SetGCPercent does not adjust the controller
trigger ratio. As a result, runtime reductions of GOGC don't take full
effect until after one more concurrent cycle has happened, which
adjusts the trigger ratio to account for the new gcpercent.
Fix this by lowering the trigger ratio if necessary in setGCPercent.
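For example (GOGC value chosen arbitrarily), a runtime reduction like
this one now tightens the trigger immediately instead of waiting for
the next concurrent cycle:

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        // Halve GOGC at run time; the trigger ratio is now lowered
        // immediately rather than after one more concurrent cycle.
        old := debug.SetGCPercent(50)
        fmt.Println("GOGC was", old, "now 50")
    }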
Change-Id: I4d23e0c58d91939b86ac60fa5d53ef91d0d89e0c
Reviewed-on: https://go-review.googlesource.com/17813
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Currently we drop worldsema and then print the gctrace. We did this so
that if stderr is a pipe or a blocked terminal, blocking on printing
the gctrace would not block another GC from starting. However, this is
a bit of a fool's errand because a blocked runtime print will block
the whole M/P, so after GOMAXPROCS GC cycles, the whole system will
freeze. Furthermore, now this is much less of an issue because
allocation will block indefinitely if it can't start a GC (whereas it
used to be that allocation could run away). Finally, this allows
another GC cycle to start while the previous cycle is printing the
gctrace, which leads to races on reading various statistics to print
them and the next GC cycle overwriting those statistics.
Fix this by moving the release of worldsema after the gctrace print.
Change-Id: I3d044ea0f77d80f3b4050af6b771e7912258662a
Reviewed-on: https://go-review.googlesource.com/17812
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Currently we reset the sweep stats just after gcMarkTermination starts
the world and releases worldsema. However, background sweeping can
start the moment we start the world and, in fact, pause sweeping can
start the moment we release worldsema (because another GC cycle can
start up), so these need to be cleared before starting the world.
Change-Id: I95701e3de6af76bb3fbf2ee65719985bf57d20b2
Reviewed-on: https://go-review.googlesource.com/17811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Currently, we update memstats.heap_live from mcache.local_cachealloc
whenever we lock the heap (e.g., to obtain a fresh span or to release
an unused span). However, under the right circumstances,
local_cachealloc can accumulate allocations up to the size of
the *entire heap* without flushing them to heap_live. Specifically,
since span allocations from an mcentral don't lock the heap, if a
large number of pages are held in an mcentral and the application
continues to use and free objects of that size class (e.g., the
BinaryTree17 benchmark), local_cachealloc won't be flushed until the
mcentral runs out of spans.
This is a problem because, unlike many of the memory statistics that
are purely informative, heap_live is used to determine when the
garbage collector should start and how hard it should work.
This commit eliminates local_cachealloc, instead atomically updating
heap_live directly. To control contention, we do this only when
obtaining a span from an mcentral. Furthermore, we make heap_live
conservative: allocating a span assumes that all free slots in that
span will be used and accounts for these when the span is
allocated, *before* the objects themselves are. This is important
because 1) this triggers the GC earlier than necessary rather than
potentially too late and 2) this leads to a conservative GC rate
rather than a GC rate that is potentially too low.
Alternatively, we could have flushed local_cachealloc when it passed
some threshold, but this would require determining a threshold and
would cause heap_live to underestimate the true value rather than
overestimate.
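A worked example of the conservative accounting, with illustrative
sizes (one 8 KB span of a 48-byte size class):

    package main

    import "fmt"

    func main() {
        // Illustrative numbers: one span obtained from an mcentral.
        const spanBytes = 8192
        const elemSize = 48
        slots := spanBytes / elemSize
        // heap_live is credited for every free slot at span-acquisition
        // time, before any of those objects are actually allocated.
        fmt.Printf("heap_live += %d bytes (%d slots x %d bytes)\n",
            slots*elemSize, slots, elemSize)
    }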
Fixes #12199.
name old time/op new time/op delta
BinaryTree17-12 2.88s ± 4% 2.88s ± 1% ~ (p=0.470 n=19+19)
Fannkuch11-12 2.48s ± 1% 2.48s ± 1% ~ (p=0.243 n=16+19)
FmtFprintfEmpty-12 50.9ns ± 2% 50.7ns ± 1% ~ (p=0.238 n=15+14)
FmtFprintfString-12 175ns ± 1% 171ns ± 1% -2.48% (p=0.000 n=18+18)
FmtFprintfInt-12 159ns ± 1% 158ns ± 1% -0.78% (p=0.000 n=19+18)
FmtFprintfIntInt-12 270ns ± 1% 265ns ± 2% -1.67% (p=0.000 n=18+18)
FmtFprintfPrefixedInt-12 235ns ± 1% 234ns ± 0% ~ (p=0.362 n=18+19)
FmtFprintfFloat-12 309ns ± 1% 308ns ± 1% -0.41% (p=0.001 n=18+19)
FmtManyArgs-12 1.10µs ± 1% 1.08µs ± 0% -1.96% (p=0.000 n=19+18)
GobDecode-12 7.81ms ± 1% 7.80ms ± 1% ~ (p=0.425 n=18+19)
GobEncode-12 6.53ms ± 1% 6.53ms ± 1% ~ (p=0.817 n=19+19)
Gzip-12 312ms ± 1% 312ms ± 2% ~ (p=0.967 n=19+20)
Gunzip-12 42.0ms ± 1% 41.9ms ± 1% ~ (p=0.172 n=19+19)
HTTPClientServer-12 63.7µs ± 1% 63.8µs ± 1% ~ (p=0.639 n=19+19)
JSONEncode-12 16.4ms ± 1% 16.4ms ± 1% ~ (p=0.954 n=19+19)
JSONDecode-12 58.5ms ± 1% 57.8ms ± 1% -1.27% (p=0.000 n=18+19)
Mandelbrot200-12 3.86ms ± 1% 3.88ms ± 0% +0.44% (p=0.000 n=18+18)
GoParse-12 3.67ms ± 2% 3.66ms ± 1% -0.52% (p=0.001 n=18+19)
RegexpMatchEasy0_32-12 100ns ± 1% 100ns ± 0% ~ (p=0.257 n=19+18)
RegexpMatchEasy0_1K-12 347ns ± 1% 347ns ± 1% ~ (p=0.527 n=18+18)
RegexpMatchEasy1_32-12 83.7ns ± 2% 83.1ns ± 2% ~ (p=0.096 n=18+19)
RegexpMatchEasy1_1K-12 509ns ± 1% 505ns ± 1% -0.75% (p=0.000 n=18+19)
RegexpMatchMedium_32-12 130ns ± 2% 129ns ± 1% ~ (p=0.962 n=20+20)
RegexpMatchMedium_1K-12 39.5µs ± 2% 39.4µs ± 1% ~ (p=0.376 n=20+19)
RegexpMatchHard_32-12 2.04µs ± 0% 2.04µs ± 1% ~ (p=0.195 n=18+17)
RegexpMatchHard_1K-12 61.4µs ± 1% 61.4µs ± 1% ~ (p=0.885 n=19+19)
Revcomp-12 540ms ± 2% 542ms ± 4% ~ (p=0.552 n=19+17)
Template-12 69.6ms ± 1% 71.2ms ± 1% +2.39% (p=0.000 n=20+20)
TimeParse-12 357ns ± 1% 357ns ± 1% ~ (p=0.883 n=18+20)
TimeFormat-12 379ns ± 1% 362ns ± 1% -4.53% (p=0.000 n=18+19)
[Geo mean] 62.0µs 61.8µs -0.44%
name old time/op new time/op delta
XBenchGarbage-12 5.89ms ± 2% 5.81ms ± 2% -1.41% (p=0.000 n=19+18)
Change-Id: I96b31cca6ae77c30693a891cff3fe663fa2447a0
Reviewed-on: https://go-review.googlesource.com/17748
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
This improves the documentation comment on gcMarkDone, replaces a
recursive call with a simple goto, and disables preemption before
stopping the world in accordance with the documentation comment on
stopTheWorldWithSema.
Updates #13363, but, sadly, doesn't fix it.
Change-Id: I6cb2a5836b35685bf82f7b1ce7e48a7625906656
Reviewed-on: https://go-review.googlesource.com/17149
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The tiny alloc cache is maintained in a pointer from non-GC'd memory
(mcache) to heap memory and hence must be handled carefully.
Currently we clear the tiny alloc cache during sweep termination and,
if it is assigned to a non-nil value during concurrent marking, we
depend on a write barrier to keep the new value alive. However, while
the compiler currently always generates this write barrier, we're
treading on thin ice because write barriers may not happen for writes
to non-heap memory (e.g., typedmemmove). Without this lucky write
barrier, the GC may free a current tiny block while it's still
reachable by the tiny allocator, leading to later memory corruption.
Change this code so that, rather than depending on the write barrier,
we simply clear the tiny cache during mark termination when we're
clearing all of the other mcaches. If the current tiny block is
reachable from regular pointers, it will be retained; if it isn't
reachable from regular pointers, it may be freed, but that's okay
because there won't be any pointers in non-GC'd memory to it.
Change-Id: I8230980d8612c35c2997b9705641a1f9f865f879
Reviewed-on: https://go-review.googlesource.com/16962
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
If you set GODEBUG=cgocheck=2 the runtime package will use the write
barrier to detect cases where a Go program writes a Go pointer into
non-Go memory. In conjunction with the existing cgo checks, and the
not-yet-implemented cgo check for exported functions, this should
reliably detect all cases (that do not import the unsafe package) in
which a Go pointer is incorrectly shared with C code. This check is
optional because it turns on the write barrier at all times, which is
known to be expensive.
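A minimal sketch of the kind of bug this mode catches; run it with
GODEBUG=cgocheck=2 and the runtime throws when the Go pointer is
stored into the C allocation:

    package main

    /*
    #include <stdlib.h>
    */
    import "C"

    import "unsafe"

    func main() {
        // C.malloc returns non-Go memory.
        buf := C.malloc(C.size_t(unsafe.Sizeof(uintptr(0))))
        defer C.free(buf)

        // Storing a Go pointer into it is invalid; with cgocheck=2 the
        // write barrier detects the store and aborts the program.
        x := new(int)
        *(**int)(buf) = x
    }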
Update #12416.
Change-Id: I549d8b2956daa76eac853928e9280e615d6365f4
Reviewed-on: https://go-review.googlesource.com/16899
Reviewed-by: Russ Cox <rsc@golang.org>
runtime/internal/sys will hold system-, architecture- and config-
specific constants.
Updates #11647
Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54
Reviewed-on: https://go-review.googlesource.com/16817
Reviewed-by: Russ Cox <rsc@golang.org>
The GC now handles the root marking jobs as part of general marking,
so work.markfor is no longer used.
Change-Id: I6c3b23fed27e4e7ea6430d6ca7ba25ae4d04ed14
Reviewed-on: https://go-review.googlesource.com/16811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This changes "mark worker (idle)" to "GC worker (idle)" so it's more
clear to users that these goroutines are GC-related. It changes "GC
assist" to "GC assist wait" to make it clear that the assist is
blocked.
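In a goroutine dump these states now appear as, for example (goroutine
numbers illustrative):

    goroutine 18 [GC worker (idle)]:
    goroutine 25 [GC assist wait]: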
Change-Id: Iafbc0903c84f9250ff6bee14baac6fcd4ed5ef76
Reviewed-on: https://go-review.googlesource.com/16511
Reviewed-by: Rick Hudson <rlh@golang.org>
We couldn't do this before this point because it must be done before
the next GC cycle starts. Hence, if it delayed the start of the next
cycle, that would widen the window between reaching the heap trigger
of the next cycle and starting the next GC cycle, during which the
mutator could over-allocate. With the decentralized GC, any mutators
that reach the heap trigger will block on the GC starting, so it's
safe to widen the time between starting the world and being able to
start the next GC cycle.
Fixes #11465.
Change-Id: Ic7ea7e9eba5b66fc050299f843a9c9001ad814aa
Reviewed-on: https://go-review.googlesource.com/16394
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change breaks out most of the atomics functions in the runtime
into package runtime/internal/atomic. It adds some basic support
in the toolchain for runtime packages, and also modifies linux/arm
atomics to remove the dependency on the runtime's mutex. The mutexes
have been replaced with spinlocks.
all trybots are happy!
In addition to the trybots, I've tested on the darwin/arm64 builder,
on the darwin/arm builder, and on a ppc64le machine.
Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
Reviewed-on: https://go-review.googlesource.com/14204
Run-TryBot: Michael Matloob <matloob@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
This moves all of the mark 1 to mark 2 transition and mark termination
to the mark done transition function. This means these transitions are
now handled on the goroutine that detected mark completion. This also
means that the GC coordinator and the background completion barriers
are no longer used and various workarounds to yield to the coordinator
are no longer necessary. These will be removed in follow-up commits.
One consequence of this is that mark workers now need to be
preemptible when performing the mark done transition. This allows them
to stop the world and to perform the final clean-up steps of GC after
restarting the world. They are only made preemptible while performing
this transition, so if the worker that findRunnableGCWorker would
schedule isn't available, we didn't want to schedule it anyway.
Fixes #11970.
Change-Id: I9203a2d6287eeff62d589ec02ad9cb1e29ddb837
Reviewed-on: https://go-review.googlesource.com/16391
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently gcMarkDone takes basically no time, so it's okay to account
the worker time after calling it. However, gcMarkDone is about to take
potentially *much* longer because it may perform all of mark
termination. Prepare for this by swapping the order so we account the
time before calling gcMarkDone.
Change-Id: I90c7df68192acfc4fd02a7254dae739dda4e2fcb
Reviewed-on: https://go-review.googlesource.com/16390
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently the code for completion of mark 1/mark 2 is duplicated in
background workers and assists. Factor this into a single function
that will serve as the transition function for concurrent mark.
Change-Id: I4d9f697a15da0d349db3b34d56f3a220dd41d41b
Reviewed-on: https://go-review.googlesource.com/16359
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently, findRunnableGCWorker will perform mark completion if there
is no remaining work and no running workers. This used to be necessary
to resolve a race in the transition from mark 1 to mark 2 where we
would enter mark 2 with no mark work (and no dedicated workers), so no
workers would run, so no worker would signal mark completion.
However, we're about to make mark completion also perform the entire
follow-on process, which includes mark termination. We really don't
want to do that in the scheduler if it happens to detect completion.
Conveniently, this hack is no longer necessary because we always
enqueue root scanning work at the beginning of both mark 1 and mark 2,
so a mark worker will always run. Hence, we can simply eliminate it.
Change-Id: I3fc8f27c8da632f0fb732c9f6425e1f457f5652e
Reviewed-on: https://go-review.googlesource.com/16358
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This moves all of GC initialization, sweep termination, and the
transition to concurrent marking into the off->mark transition
function. This means it's now handled on the goroutine that detected
the state exit condition.
As a result, malloc no longer needs to Gosched() at the beginning of
the GC cycle to prevent over-allocation while the GC is starting up
because it will now *help* the GC to start up. The Gosched hack is
still necessary during GC shutdown (this is easy to test by enabling
gctrace and hitting Ctrl-S to block the gctrace output).
At this point, the GC coordinator still handles later phases. This
requires a small tweak to how we start the GC coordinator. Currently,
starting the GC coordinator is best-effort and may fail if the
coordinator is about to park from the previous cycle but hasn't yet.
We fix this by replacing the park/ready used to wake up the coordinator
with a semaphore. This is temporary since the coordinator will be
going away in a few commits.
Updates #11970.
Change-Id: I2c6a11c91e72dfbc59c2d8e7c66146dee9a444fe
Reviewed-on: https://go-review.googlesource.com/16357
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This moves concurrent sweep termination from the coordinator to the
off->mark transition. This allows it to be performed by all Gs
attempting to start the GC.
Updates #11970.
Change-Id: I24428e8599a759398c2ef7ec996ba755a448f947
Reviewed-on: https://go-review.googlesource.com/16356
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This begins the conversion of the centralized GC coordinator to a
decentralized state machine by introducing the internal API that
triggers the first state transition from _GCoff to _GCmark (or
_GCmarktermination).
This change introduces the transition lock, the off->mark transition
condition (which is very similar to shouldtriggergc()), and the
general structure of a state transition. Since we're doing this
conversion in stages, it then falls back to the GC coordinator to
actually execute the cycle. We'll start moving logic out of the GC
coordinator and into transition functions next.
This fixes a minor bug in gcstoptheworld debug mode where passing the
heap trigger once could trigger multiple STW GCs.
Updates #11970.
Change-Id: I964087dd190a639eb5766398f8e1bbf8b352902f
Reviewed-on: https://go-review.googlesource.com/16355
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
For historical reasons we currently do a lot of the concurrent mark
setup on the system stack. In fact, at this point the one and only
thing that needs to happen on the system stack is the start-the-world.
Clean up this code by lifting everything other than the
start-the-world off the system stack.
The diff for this change looks large, but the only code change is to
narrow the systemstack call. Everything else is re-indentation.
Change-Id: I1e03b8afc759fad726f2397b05a17d183c2713ce
Reviewed-on: https://go-review.googlesource.com/16354
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
We're about to split func gc across several functions, so lift the
local variables it uses for tracking statistics and state across the
cycle into the global "work" variable.
Change-Id: Ie955f2f1758c7f5a5543ea1f3f33b222bc4b1d37
Reviewed-on: https://go-review.googlesource.com/16353
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently dedicated mark workers participate in the getfull barrier
during concurrent mark. However, the getfull barrier wasn't designed
for concurrent work and this causes no end of headaches.
In the concurrent setting, participants come and go. This makes mark
completion susceptible to live-lock: since dedicated workers are only
periodically polling for completion, it's possible for the program to
be in some transient worker each time one of the dedicated workers
wakes up to check if it can exit the getfull barrier. It also
complicates reasoning about the system because dedicated workers
participate directly in the getfull barrier, but transient workers
must instead use trygetfull because they have exit conditions that
aren't captured by getfull (e.g., fractional workers exit when
preempted). The complexity of implementing these exit conditions
contributed to #11677. Furthermore, the getfull barrier is inefficient
because we could be running user code instead of spinning on a P. In
effect, we're dedicating 25% of the CPU to marking even if that means
we have to spin to make that 25%. It also causes issues on Windows
because we can't actually sleep for 100µs (#8687).
Fix this by making dedicated workers no longer participate in the
getfull barrier. Instead, dedicated workers simply return to the
scheduler when they fail to get more work, regardless of what other
workers are doing, and the scheduler only starts new dedicated workers
if there's work available. Everything that needs to be handled by this
barrier is already handled by detection of mark completion.
This makes the system much more symmetric because all workers and
assists now use trygetfull during concurrent mark. It also loosens the
25% CPU target so that we can give some of that 25% back to user code
if there isn't enough work to keep the mark worker busy. And it
eliminates the problematic 100µs sleep on Windows during concurrent
mark (though not during mark termination).
The downside of this is that if we hit a bottleneck in the heap graph
that then expands back out, the system may shut down dedicated workers
and take a while to start them back up. We'll address this in the next
commit.
Updates #12041 and #8687.
No effect on the go1 benchmarks. This slows down the garbage benchmark
by 9%, but we'll more than make it up in the next commit.
name old time/op new time/op delta
XBenchGarbage-12 5.80ms ± 2% 6.32ms ± 4% +9.03% (p=0.000 n=20+20)
Change-Id: I65100a9ba005a8b5cf97940798918672ea9dd09b
Reviewed-on: https://go-review.googlesource.com/16297
Reviewed-by: Rick Hudson <rlh@golang.org>
GC assists must block until the assist can be satisfied (either
through stealing credit or doing work) or the GC cycle ends.
Currently, this is implemented as a retry loop with a 100 µs delay.
This obviously isn't ideal, as it wastes CPU and delays mutator
execution. It also has the somewhat peculiar downside that sleeping a
G requires allocation, and this requires working around recursive
allocation.
Replace this timed delay with a proper scheduling queue. When an
assist can't be satisfied immediately, it adds the allocating G to a
queue and parks it. Any time background scan credit is flushed, it
consults this queue, directly satisfies the debt of queued assists,
and wakes up satisfied assists before flushing any remaining credit to
the background credit pool.
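A small self-contained model of that flush path (types and names
invented for illustration; not the runtime code):

    package main

    import "fmt"

    // assist models a parked assist G with outstanding debt, expressed
    // here in credit units.
    type assist struct{ debt int64 }

    // flushCredit pays queued assists in FIFO order out of freshly
    // flushed scan credit and only sends the remainder to the global
    // background credit pool.
    func flushCredit(credit int64, queue []*assist, bgCredit *int64) []*assist {
        for len(queue) > 0 && credit > 0 {
            a := queue[0]
            pay := credit
            if pay > a.debt {
                pay = a.debt
            }
            a.debt -= pay
            credit -= pay
            if a.debt == 0 {
                queue = queue[1:] // fully satisfied: this is where the G would be readied
            }
        }
        *bgCredit += credit
        return queue
    }

    func main() {
        var background int64
        queue := []*assist{{debt: 100}, {debt: 300}}
        queue = flushCredit(250, queue, &background)
        fmt.Println("still parked:", len(queue), "background credit:", background)
    }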
No effect on the go1 benchmarks. Slightly speeds up the garbage
benchmark.
name old time/op new time/op delta
XBenchGarbage-12 5.81ms ± 1% 5.72ms ± 4% -1.65% (p=0.011 n=20+20)
Updates #12041.
Change-Id: I8ee3b6274dd097b12b10a8030796a958a4b0e7b7
Reviewed-on: https://go-review.googlesource.com/15890
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently the concurrent root scan is performed in its entirety by the
GC coordinator before entering concurrent mark (which enables GC
workers). This scan is done sequentially, which can prolong the scan
phase, delay the mark phase, and means that the scan phase does not
obey the 25% CPU goal. Furthermore, there's no need to complete the
root scan before starting marking (in fact, we already allow GC
assists to happen during the scan phase), so this acts as an
unnecessary barrier between root scanning and marking.
This change shifts the root scan work out of the GC coordinator and
into the GC workers. The coordinator simply sets up the scan state and
enqueues the right number of root scan jobs. The GC workers then drain
the root scan jobs prior to draining heap scan jobs.
This parallelizes the root scan process, makes it obey the 25% CPU
goal, and effectively eliminates root scanning as an isolated phase,
allowing the system to smoothly transition from root scanning to heap
marking. This also eliminates a major non-STW responsibility of the GC
coordinator, which will make it easier to switch to a decentralized
state machine. Finally, it puts us in a good position to perform root
scanning in assists as well, which will help satisfy assists at the
beginning of the GC cycle.
This is mostly straightforward. One tricky aspect is that we have to
deal with preemption deadlock, where two non-preemptible goroutines
are trying to preempt each other to perform a stack scan. Given the
context where this happens, the only instance of this is two
background workers trying to scan each other. We avoid this by simply
not scanning the stacks of background workers during the concurrent
phase; this is safe because we'll scan them during mark termination
(and their stacks are *very* small and should not contain any new
pointers).
This change also switches the root marking during mark termination to
use the same gcDrain-based code path as concurrent mark. This
shouldn't affect performance because STW root marking was already
parallel and tasks switched to heap marking immediately when no more
root marking tasks were available. However, it simplifies the code and
unifies these code paths.
This has negligible effect on the go1 benchmarks. It slightly slows
down the garbage benchmark, possibly by making GC run slightly more
frequently.
name old time/op new time/op delta
XBenchGarbage-12 5.10ms ± 1% 5.24ms ± 1% +2.87% (p=0.000 n=18+18)
name old time/op new time/op delta
BinaryTree17-12 3.25s ± 3% 3.20s ± 5% -1.57% (p=0.013 n=20+20)
Fannkuch11-12 2.45s ± 1% 2.46s ± 1% +0.38% (p=0.019 n=20+18)
FmtFprintfEmpty-12 49.7ns ± 3% 49.9ns ± 4% ~ (p=0.851 n=19+20)
FmtFprintfString-12 170ns ± 2% 170ns ± 1% ~ (p=0.775 n=20+19)
FmtFprintfInt-12 161ns ± 1% 160ns ± 1% -0.78% (p=0.000 n=19+18)
FmtFprintfIntInt-12 267ns ± 1% 270ns ± 1% +1.04% (p=0.000 n=19+19)
FmtFprintfPrefixedInt-12 238ns ± 2% 238ns ± 1% ~ (p=0.133 n=18+19)
FmtFprintfFloat-12 311ns ± 1% 310ns ± 2% -0.35% (p=0.023 n=20+19)
FmtManyArgs-12 1.08µs ± 1% 1.06µs ± 1% -2.31% (p=0.000 n=20+20)
GobDecode-12 8.65ms ± 1% 8.63ms ± 1% ~ (p=0.377 n=18+20)
GobEncode-12 6.49ms ± 1% 6.52ms ± 1% +0.37% (p=0.015 n=20+20)
Gzip-12 319ms ± 3% 318ms ± 1% ~ (p=0.975 n=19+17)
Gunzip-12 41.9ms ± 1% 42.1ms ± 2% +0.65% (p=0.004 n=19+20)
HTTPClientServer-12 61.7µs ± 1% 62.6µs ± 1% +1.40% (p=0.000 n=18+20)
JSONEncode-12 16.8ms ± 1% 16.9ms ± 1% ~ (p=0.239 n=20+18)
JSONDecode-12 58.4ms ± 1% 60.7ms ± 1% +3.85% (p=0.000 n=19+20)
Mandelbrot200-12 3.86ms ± 0% 3.86ms ± 1% ~ (p=0.092 n=18+19)
GoParse-12 3.75ms ± 2% 3.75ms ± 2% ~ (p=0.708 n=19+20)
RegexpMatchEasy0_32-12 100ns ± 1% 100ns ± 2% +0.60% (p=0.010 n=17+20)
RegexpMatchEasy0_1K-12 341ns ± 1% 342ns ± 2% ~ (p=0.203 n=20+19)
RegexpMatchEasy1_32-12 82.5ns ± 2% 83.2ns ± 2% +0.83% (p=0.007 n=19+19)
RegexpMatchEasy1_1K-12 495ns ± 1% 495ns ± 2% ~ (p=0.970 n=19+18)
RegexpMatchMedium_32-12 130ns ± 2% 130ns ± 2% +0.59% (p=0.039 n=19+20)
RegexpMatchMedium_1K-12 39.2µs ± 1% 39.3µs ± 1% ~ (p=0.214 n=18+18)
RegexpMatchHard_32-12 2.03µs ± 2% 2.02µs ± 1% ~ (p=0.166 n=18+19)
RegexpMatchHard_1K-12 61.0µs ± 1% 60.9µs ± 1% ~ (p=0.169 n=20+18)
Revcomp-12 533ms ± 1% 535ms ± 1% ~ (p=0.071 n=19+17)
Template-12 68.1ms ± 2% 73.0ms ± 1% +7.26% (p=0.000 n=19+20)
TimeParse-12 355ns ± 2% 356ns ± 2% ~ (p=0.530 n=19+20)
TimeFormat-12 357ns ± 2% 347ns ± 1% -2.59% (p=0.000 n=20+19)
[Geo mean] 62.1µs 62.3µs +0.31%
name old speed new speed delta
GobDecode-12 88.7MB/s ± 1% 88.9MB/s ± 1% ~ (p=0.377 n=18+20)
GobEncode-12 118MB/s ± 1% 118MB/s ± 1% -0.37% (p=0.015 n=20+20)
Gzip-12 60.9MB/s ± 3% 60.9MB/s ± 1% ~ (p=0.944 n=19+17)
Gunzip-12 464MB/s ± 1% 461MB/s ± 2% -0.64% (p=0.004 n=19+20)
JSONEncode-12 115MB/s ± 1% 115MB/s ± 1% ~ (p=0.236 n=20+18)
JSONDecode-12 33.2MB/s ± 1% 32.0MB/s ± 1% -3.71% (p=0.000 n=19+20)
GoParse-12 15.5MB/s ± 2% 15.5MB/s ± 2% ~ (p=0.702 n=19+20)
RegexpMatchEasy0_32-12 320MB/s ± 1% 318MB/s ± 2% ~ (p=0.094 n=18+20)
RegexpMatchEasy0_1K-12 3.00GB/s ± 1% 2.99GB/s ± 1% ~ (p=0.194 n=20+19)
RegexpMatchEasy1_32-12 388MB/s ± 2% 385MB/s ± 2% -0.83% (p=0.008 n=19+19)
RegexpMatchEasy1_1K-12 2.07GB/s ± 1% 2.07GB/s ± 1% ~ (p=0.964 n=19+18)
RegexpMatchMedium_32-12 7.68MB/s ± 1% 7.64MB/s ± 2% -0.57% (p=0.020 n=19+20)
RegexpMatchMedium_1K-12 26.1MB/s ± 1% 26.1MB/s ± 1% ~ (p=0.211 n=18+18)
RegexpMatchHard_32-12 15.8MB/s ± 1% 15.8MB/s ± 1% ~ (p=0.180 n=18+19)
RegexpMatchHard_1K-12 16.8MB/s ± 1% 16.8MB/s ± 2% ~ (p=0.236 n=20+19)
Revcomp-12 477MB/s ± 1% 475MB/s ± 1% ~ (p=0.071 n=19+17)
Template-12 28.5MB/s ± 2% 26.6MB/s ± 1% -6.77% (p=0.000 n=19+20)
[Geo mean] 100MB/s 99.0MB/s -0.82%
Change-Id: I875bf6ceb306d1ee2f470cabf88aa6ede27c47a0
Reviewed-on: https://go-review.googlesource.com/16059
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
We already have gcMarkWorkAvailable, but the check for GC mark work is
open-coded in several places. Generalize gcMarkWorkAvailable slightly
and replace these open-coded checks with calls to gcMarkWorkAvailable.
In addition to cleaning up the code, this puts us in a better position
to make this check slightly more complicated.
Change-Id: I1b29883300ecd82a1bf6be193e9b4ee96582a860
Reviewed-on: https://go-review.googlesource.com/16058
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The ragged barrier after entering the concurrent mark phase is
vestigial. This used to be the point where we enabled write barriers,
so it was necessary to synchronize all Ps to ensure write barriers
were enabled before any marking occurred. However, we've long since
switched to enabling write barriers during the concurrent scan phase,
so the start-the-world at the beginning of the concurrent scan phase
ensures that all Ps have enabled the write barrier.
Hence, we can eliminate the old "install write barrier" phase.
Fixes #11971.
Change-Id: I8cdcb84b5525cef19927d51ea11ba0a4db991ea8
Reviewed-on: https://go-review.googlesource.com/16044
Reviewed-by: Rick Hudson <rlh@golang.org>
These functions are always called together and perform logically
related state resets, so combine them into just gcResetMarkState.
Fixes #11427.
Change-Id: I06c17ef65f66186494887a767b3993126955b5fe
Reviewed-on: https://go-review.googlesource.com/16041
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently gcResetGState is called by func gcscan_m for concurrent GC
and directly by func gc for STW GC. Simplify this by consolidating
these two calls into one call by func gc above where it splits for
concurrent and STW GC.
As a consequence, gcResetGState and gcResetMarkState are always called
together, so the next commit will consolidate these.
Change-Id: Ib62d404c7b32b28f7d3080d26ecf3966cbc4aca0
Reviewed-on: https://go-review.googlesource.com/16040
Reviewed-by: Rick Hudson <rlh@golang.org>
This work queue is no longer used (there are many reads of
work.partial, but the only write is in putpartial, which is never
called).
Fixes #11922.
Change-Id: I08b76c0c02a0867a9cdcb94783e1f7629d44249a
Reviewed-on: https://go-review.googlesource.com/15892
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, when the mutator allocates, the runtime first allocates the
memory and then, if that G has done "enough" allocation, the runtime
checks whether the G has assist debt to pay off and, if so, pays it
off. This approach leads to under-assisting, where a G can allocate a
large region (or many small regions) before paying for it, or can even
exit with outstanding debt.
This commit flips this around so that a G always acquires enough
credit for an allocation before it can perform that allocation. We
continue to amortize the cost of assists by requiring that they
over-assist when triggered to build up credit for many allocations.
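A toy model of the new ordering (constants and helper names invented
for illustration): debt is settled, with some over-assist, before the
allocation proceeds.

    package main

    import "fmt"

    func main() {
        const overAssist = 65536 // extra credit built up per assist (illustrative)
        var credit int64         // the G's allocation credit, in bytes

        allocate := func(size int64) {
            if credit < size {
                // Assist first: do enough work to cover this allocation
                // plus a surplus for the allocations that follow.
                earned := size - credit + overAssist
                credit += earned
            }
            credit -= size
        }

        allocate(1024)
        allocate(2048) // paid entirely out of the earlier surplus
        fmt.Println("remaining credit:", credit)
    }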
Fixes #11967.
Change-Id: Idac9f11133b328535667674d837be72c23ebd899
Reviewed-on: https://go-review.googlesource.com/15409
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Currently we track the per-G GC assist balance as two monotonically
increasing values: the bytes allocated by the G this cycle (gcalloc)
and the scan work performed by the G this cycle (gcscanwork). The
assist balance is hence assistRatio*gcalloc - gcscanwork.
This works, but has two important downsides:
1) It requires floating-point math to figure out if a G is in debt or
not. This makes it inappropriate to check for assist debt in the
hot path of mallocgc, so we only do this when a G allocates a new
span. As a result, Gs can operate "in the red", leading to
under-assist and extended GC cycle length.
2) Revising the assist ratio during a GC cycle can lead to an "assist
burst". If you think of plotting the scan work performed versus
heap size, the assist ratio controls the slope of this line.
However, in the current system, the target line always passes
through 0 at the heap size that triggered GC, so if the runtime
increases the assist ratio, there has to be a potentially large
assist to jump from the current amount of scan work up to the new
target scan work for the current heap size.
This commit replaces this approach with directly tracking the GC
assist balance in terms of allocation credit bytes. Allocating N bytes
simply decreases this by N and assisting raises it by the amount of
scan work performed divided by the assist ratio (to get back to
bytes).
This will make it cheap to figure out if a G is in debt, which will
let us efficiently check if an assist is necessary *before* performing
an allocation and hence keep Gs "in the black".
This also fixes assist bursts because the assist ratio is now in terms
of *remaining* work, rather than work from the beginning of the GC
cycle. Hence, the plot of scan work versus heap size becomes
continuous: we can revise the slope, but this slope always starts from
where we are right now, rather than where we were at the beginning of
the cycle.
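A sketch of the accounting in the new scheme (ratio and numbers
invented for illustration): the balance lives in bytes, allocation
subtracts from it directly, and scan work converts back to bytes
through the assist ratio.

    package main

    import "fmt"

    func main() {
        var gcAssistBytes int64       // per-G balance, in bytes of allocation credit
        const assistWorkPerByte = 0.5 // scan work owed per byte allocated (illustrative)

        // Allocating N bytes is a plain integer subtraction, cheap
        // enough to check in the malloc hot path.
        alloc := func(n int64) { gcAssistBytes -= n }

        // Performing scan work buys back work/assistWorkPerByte bytes.
        scan := func(work int64) {
            gcAssistBytes += int64(float64(work) / assistWorkPerByte)
        }

        alloc(4096)
        scan(1024) // 1024 units of scan work repay 2048 bytes
        fmt.Println("balance:", gcAssistBytes, "bytes") // still 2048 bytes in debt
    }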
Change-Id: Ia821c5f07f8a433e8da7f195b52adfedd58bdf2c
Reviewed-on: https://go-review.googlesource.com/15408
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently we ensure a minimum heap distance of 1MB when computing the
assist ratio. Rather than enforcing this minimum on the heap distance,
it makes more sense to enforce that the heap goal itself is at least
1MB over the live heap size at the beginning of GC. Currently the two
approaches are semantically equivalent, but this will let us switch to
basing the assist ratio on current heap distance rather than the
initial heap distance, since we can't enforce this minimum on the
current heap distance (the GC may never finish because the goal posts
will always be 1MB away).
Change-Id: I0027b1c26a41a0152b01e5b67bdb1140d43ee903
Reviewed-on: https://go-review.googlesource.com/15604
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, gcController.scanWork is updated as lazily as possible
since it is only read at the end of the GC cycle. We're about to read
it during the GC cycle to improve the assist ratio revisions, so
modify gcDrain* to regularly flush to gcController.scanWork in much
the same way as we regularly flush to gcController.bgScanCredit.
One consequence of this is that it's difficult to keep gcw.scanWork
monotonic, so we give up on that and simply return the amount of scan
work done by gcDrainN rather than calculating it in the caller.
Change-Id: I7b50acdc39602f843eed0b5c6d2dacd7e762b81d
Reviewed-on: https://go-review.googlesource.com/15407
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently callers of gcDrain control whether it flushes scan work
credit to gcController.bgScanCredit by passing a value other than -1
for the flush threshold. Shortly we're going to make this always flush
scan work to gcController.scanWork and optionally also flush scan work
to gcController.bgScanCredit. This will be much easier if the flush
threshold is simply a constant (which it is in practice) and callers
merely control whether or not the flush includes the background
credit. Hence, replace the flush threshold argument with a flag.
Change-Id: Ia27db17de8a3f1e462a5d7137d4b5dc72f99a04e
Reviewed-on: https://go-review.googlesource.com/15406
Reviewed-by: Rick Hudson <rlh@golang.org>
These functions were nearly identical. Consolidate them by adding a
flags argument. In addition to cleaning up this code, this makes
further changes that affect both functions easier.
Change-Id: I6ec5c947603bbbd3ff4040113b2fbc240e99745f
Reviewed-on: https://go-review.googlesource.com/15405
Reviewed-by: Rick Hudson <rlh@golang.org>