1
0
mirror of https://github.com/golang/go synced 2024-09-29 16:24:28 -06:00

runtime: tweak bgsweep "low-priority" heuristic

Currently bgsweep attempts to be a low-priority background goroutine
that runs mainly when the application is mostly idle. To avoid
complicating the scheduler further, it achieves this with a simple
heuristic: call Gosched after each span swept. While this is somewhat
inefficient as there's scheduling overhead on each iteration, it's
mostly fine because it tends to just come out of idle time anyway. In a
busy system, the call to Gosched quickly puts bgsweep at the back of
scheduler queues.

However, what's problematic about this heuristic is the number of
tracing events it produces. Average span sweeping latencies have been
measured as low as 30 ns, so every 30 ns in the sweep phase, with
available idle time, there would be a few trace events emitted. This
could result in an overwhelming number, making traces much larger than
they need to be. It also pollutes other observability tools, like the
scheduling latencies runtime metric, because bgsweep stays runnable the
whole time.

This change fixes these problems with two modifications to the
heursitic:

1. Check if there are any idle Ps before yielding. If there are, don't
   yield.
2. Sweep at least 10 spans before trying to yield.

(1) is doing most of the work here. This change assumes that the
presence of idle Ps means that there is available CPU time, so bgsweep
is already making use of idle time and there's no reason it should stop.
This will have the biggest impact on the aforementioned issues.

(2) is a mitigation for the case where GOMAXPROCS=1, because we won't
ever observe a zero idle P count. It does mean that bgsweep is a little
bit higher priority than before because it yields its time less often,
so it could interfere with goroutine scheduling latencies more. However,
by sweeping 10 spans before volunteering time, we directly reduce trace
event production by 90% in all cases. The impact on scheduling latencies
should be fairly minimal, as sweeping a span is already so fast, that
sweeping 10 is unlikely to make a dent in any meaningful end-to-end
latency. In fact, it may even improve application latencies overall by
freeing up spans and sweep work from goroutines allocating memory. It
may be worth considering pushing this number higher in the future.

Another reason to do (2) is to reduce contention on npidle, which will
be checked as part of (1), but this is a fairly minor concern. The main
reason is to capture the GOMAXPROCS=1 case.

Fixes #54767.

Change-Id: I4361400f17197b8ab84c01f56203f20575b29fc6
Reviewed-on: https://go-review.googlesource.com/c/go/+/429615
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This commit is contained in:
Michael Anthony Knyszek 2022-09-08 20:59:02 +00:00 committed by Gopher Robot
parent 63ceff95fa
commit 8451529e9a
2 changed files with 36 additions and 2 deletions

View File

@ -279,12 +279,34 @@ func bgsweep(c chan int) {
goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceEvGoBlock, 1)
for {
// bgsweep attempts to be a "low priority" goroutine by intentionally
// yielding time. It's OK if it doesn't run, because goroutines allocating
// memory will sweep and ensure that all spans are swept before the next
// GC cycle. We really only want to run when we're idle.
//
// However, calling Gosched after each span swept produces a tremendous
// amount of tracing events, sometimes up to 50% of events in a trace. It's
// also inefficient to call into the scheduler so much because sweeping a
// single span is in general a very fast operation, taking as little as 30 ns
// on modern hardware. (See #54767.)
//
// As a result, bgsweep sweeps in batches, and only calls into the scheduler
// at the end of every batch. Furthermore, it only yields its time if there
// isn't spare idle time available on other cores. If there's available idle
// time, helping to sweep can reduce allocation latencies by getting ahead of
// the proportional sweeper and having spans ready to go for allocation.
const sweepBatchSize = 10
nSwept := 0
for sweepone() != ^uintptr(0) {
sweep.nbgsweep++
Gosched()
nSwept++
if nSwept%sweepBatchSize == 0 {
goschedIfBusy()
}
}
for freeSomeWbufs(true) {
Gosched()
// N.B. freeSomeWbufs is already batched internally.
goschedIfBusy()
}
lock(&sweep.lock)
if !isSweepDone() {

View File

@ -326,6 +326,18 @@ func goschedguarded() {
mcall(goschedguarded_m)
}
// goschedIfBusy yields the processor like gosched, but only does so if
// there are no idle Ps or if we're on the only P and there's nothing in
// the run queue. In both cases, there is freely available idle time.
//
//go:nosplit
func goschedIfBusy() {
if sched.npidle.Load() > 0 {
return
}
mcall(gosched_m)
}
// Puts the current goroutine into a waiting state and calls unlockf on the
// system stack.
//