There were two issues.
1. Delayed EvGoSysExit could have been emitted during TraceStart,
while it had not yet emitted EvGoInSyscall.
2. Delayed EvGoSysExit could have been emitted during next tracing session.
Fixes#10476Fixes#11262
Change-Id: Iab68eb31cf38eb6eb6eee427f49c5ca0865a8c64
Reviewed-on: https://go-review.googlesource.com/9132
Reviewed-by: Russ Cox <rsc@golang.org>
This fixes a hang during runtime.TestTraceStress.
It also fixes double-scan of stacks, which leads to
stack barrier installation failures.
Both of these have shown up as flaky failures on the dashboard.
Fixes#10941.
Change-Id: Ia2a5991ce2c9f43ba06ae1c7032f7c898dc990e0
Reviewed-on: https://go-review.googlesource.com/11089
Reviewed-by: Austin Clements <austin@google.com>
These were found by grepping the comments from the go code and feeding
the output to aspell.
Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395
Reviewed-on: https://go-review.googlesource.com/10941
Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The stack barrier code will need a bookkeeping structure to keep track
of the overwritten return PCs. This commit introduces and allocates
this structure, but does not yet use the structure.
We don't want to allocate space for this structure during garbage
collection, so this commit allocates it along with the allocation of
the corresponding stack. However, we can't do a regular allocation in
newstack because mallocgc may itself grow the stack (which would lead
to a recursive allocation). Hence, this commit makes the bookkeeping
structure part of the stack allocation itself by stealing the
necessary space from the top of the stack allocation. Since the size
of this bookkeeping structure is logarithmic in the size of the stack,
this has minimal impact on stack behavior.
Change-Id: Ia14408be06aafa9ca4867f4e70bddb3fe0e96665
Reviewed-on: https://go-review.googlesource.com/10313
Reviewed-by: Russ Cox <rsc@golang.org>
Currently the runtime assumes that the allocation for the stack is
exactly [stack.lo, stack.hi). We're about to steal a small part of
this allocation for per-stack GC metadata. To prepare for this, this
commit adds a field to the G for the allocated size of the stack.
With this change, stack.lo and stack.hi continue to act as the true
bounds on the stack, but are no longer also used as the bounds on the
stack allocation.
(I also tried this the other way around, where stack.lo and stack.hi
remained the allocation bounds and I introduced a new top of stack.
However, there are far more places that assume stack.hi is the true
top of the stack than there are places that assume it's the top of the
allocation.)
Change-Id: Ifa9d956753be53d286d09cbc73d47fb34a18c0c6
Reviewed-on: https://go-review.googlesource.com/10312
Reviewed-by: Russ Cox <rsc@golang.org>
Ian proposed an improved way of handling signals masks in Go, motivated
by a problem where the Android java runtime expects certain signals to
be blocked for all JVM threads. Discussion here
https://groups.google.com/forum/#!topic/golang-dev/_TSCkQHJt6g
Ian's text is used in the following:
A Go program always needs to have the synchronous signals enabled.
These are the signals for which _SigPanic is set in sigtable, namely
SIGSEGV, SIGBUS, SIGFPE.
A Go program that uses the os/signal package, and calls signal.Notify,
needs to have at least one thread which is not blocking that signal,
but it doesn't matter much which one.
Unix programs do not change signal mask across execve. They inherit
signal masks across fork. The shell uses this fact to some extent;
for example, the job control signals (SIGTTIN, SIGTTOU, SIGTSTP) are
blocked for commands run due to backquote quoting or $().
Our current position on signal masks was not thought out. We wandered
into step by step, e.g., http://golang.org/cl/7323067 .
This CL does the following:
Introduce a new platform hook, msigsave, that saves the signal mask of
the current thread to m.sigsave.
Call msigsave from needm and newm.
In minit grab set up the signal mask from m.sigsave and unblock the
essential synchronous signals, and SIGILL, SIGTRAP, SIGPROF, SIGSTKFLT
(for systems that have it).
In unminit, restore the signal mask from m.sigsave.
The first time that os/signal.Notify is called, start a new thread whose
only purpose is to update its signal mask to make sure signals for
signal.Notify are unblocked on at least one thread.
The effect on Go programs will be that if they are invoked with some
non-synchronous signals blocked, those signals will normally be
ignored. Previously, those signals would mostly be ignored. A change
in behaviour will occur for programs started with any of these signals
blocked, if they receive the signal: SIGHUP, SIGINT, SIGQUIT, SIGABRT,
SIGTERM. Previously those signals would always cause a crash (unless
using the os/signal package); with this change, they will be ignored
if the program is started with the signal blocked (and does not use
the os/signal package).
./all.bash completes successfully on linux/amd64.
OpenBSD is missing the implementation.
Change-Id: I188098ba7eb85eae4c14861269cc466f2aa40e8c
Reviewed-on: https://go-review.googlesource.com/10173
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently, forEachP reuses the stopwait and stopnote fields from
stopTheWorld to track how many Ps have not responded to the safe-point
request and to sleep until all Ps have responded.
It was assumed this was safe because both stopTheWorld and forEachP
must occur under the worlsema and hence stopwait and stopnote cannot
be used for both purposes simultaneously and callers could always
determine the appropriate use based on sched.gcwaiting (which is only
set by stopTheWorld). However, this is not the case, since it's
possible for there to be a window between when an M observes that
gcwaiting is set and when it checks stopwait during which stopwait
could have changed meanings. When this happens, the M decrements
stopwait and may wakeup stopnote, but does not otherwise participate
in the forEachP protocol. As a result, stopwait is decremented too
many times, so it may reach zero before all Ps have run the safe-point
function, causing forEachP to wake up early. It will then either
observe that some P has not run the safe-point function and panic with
"P did not run fn", or the remaining P (or Ps) will run the safe-point
function before it wakes up and it will observe that stopwait is
negative and panic with "not stopped".
Fix this problem by giving forEachP its own safePointWait and
safePointNote fields.
One known sequence of events that can cause this race is as
follows. It involves three actors:
G1 is running on M1 on P1. P1 has an empty run queue.
G2/M2 is in a blocked syscall and has lost its P. (The details of this
don't matter, it just needs to be in a position where it needs to grab
an idle P.)
GC just started on G3/M3/P3. (These aren't very involved, they just
have to be separate from the other G's, M's, and P's.)
1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1.
Now G1/M1 begins to enter a syscall:
2. G1/M1 invokes reentersyscall, which sets the P1's status to
_Psyscall.
3. G1/M1's reentersyscall observes gcwaiting != 0 and calls
entersyscall_gcwait.
4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock.
Back on GC:
5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and
returns.
6. GC does stuff and then calls startTheWorld().
7. startTheWorld() calls procresize(), which sets P1's status to
_Pidle and puts P1 on the idle list.
Now G2/M2 returns from its syscall and takes over P1:
8. G2/M2 returns from its blocked syscall and gets P1 from the idle
list.
9. G2/M2 acquires P1, which sets P1's status to _Prunning.
10. G2/M2 starts a new syscall and invokes reentersyscall, which sets
P1's status to _Psyscall.
Back on G1/M1:
11. G1/M1 finally acquires sched.lock in entersyscall_gcwait.
At this point, G1/M1 still thinks it's running on P1. P1's status is
_Psyscall, which is consistent with what G1/M1 is doing, but it's
_Psyscall because *G2/M2* put it in to _Psyscall, not G1/M1. This is
basically an ABA race on P1's status.
Because forEachP currently shares stopwait with stopTheWorld. G1/M1's
entersyscall_gcwait observes the non-zero stopwait set by forEachP,
but mistakes it for a stopTheWorld. It cas's P1's status from
_Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement
stopwait one more time than forEachP was expecting.
Fixes#10618. (See the issue for details on why the above race is safe
when forEachP is not involved.)
Prior to this commit, the command
stress ./runtime.test -test.run TestFutexsleep\|TestGoroutineProfile
would reliably fail after a few hundred runs. With this commit, it
ran for over 2 million runs and never crashed.
Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30
Reviewed-on: https://go-review.googlesource.com/10157
Reviewed-by: Russ Cox <rsc@golang.org>
Currently, runqsteal steals Gs from another P into an intermediate
buffer and then copies those Gs into the current P's run queue. This
intermediate buffer itself was moved from the stack to the P in commit
c4fe503 to eliminate the cost of zeroing it on every steal.
This commit follows up c4fe503 by stealing directly into the current
P's run queue, which eliminates the copy and the need for the
intermediate buffer. The update to the tail pointer is only committed
once the entire steal operation has succeeded, so the semantics of
stealing do not change.
Change-Id: Icdd7a0eb82668980bf42c0154b51eef6419fdd51
Reviewed-on: https://go-review.googlesource.com/9998
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
One important use case is a pipeline computation that pass values
from one Goroutine to the next and then exits or is placed in a
wait state. If GOMAXPROCS > 1 a Goroutine running on P1 will enable
another Goroutine and then immediately make P1 available to execute
it. We need to prevent other Ps from stealing the G that P1 is about
to execute. Otherwise the Gs can thrash between Ps causing unneeded
synchronization and slowing down throughput.
Fix this by changing the stealing logic so that when a P attempts to
steal the only G on some other P's run queue, it will pause
momentarily to allow the victim P to schedule the G.
As part of optimizing stealing we also use a per P victim queue
move stolen gs. This eliminates the zeroing of a stack local victim
queue which turned out to be expensive.
This CL is a necessary but not sufficient prerequisite to changing
the default value of GOMAXPROCS to something > 1 which is another
CL/discussion.
For highly serialized programs, such as GoroutineRing below this can
make a large difference. For larger and more parallel programs such
as the x/benchmarks there is no noticeable detriment.
~/work/code/src/rsc.io/benchstat/benchstat old.txt new.txt
name old mean new mean delta
GoroutineRing 30.2µs × (0.98,1.01) 30.1µs × (0.97,1.04) ~ (p=0.941)
GoroutineRing-2 113µs × (0.91,1.07) 30µs × (0.98,1.03) -73.17% (p=0.004)
GoroutineRing-4 144µs × (0.98,1.02) 32µs × (0.98,1.01) -77.69% (p=0.000)
GoroutineRingBuf 32.7µs × (0.97,1.03) 32.5µs × (0.97,1.02) ~ (p=0.795)
GoroutineRingBuf-2 120µs × (0.92,1.08) 33µs × (1.00,1.00) -72.48% (p=0.004)
GoroutineRingBuf-4 138µs × (0.92,1.06) 33µs × (1.00,1.00) -76.21% (p=0.003)
The bench benchmarks show little impact.
old new
garbage 7032879 7011696
httpold 25509 25301
splayold 1022073 1019499
jsonold 28230624 28081433
Change-Id: I228c48fed8d85c9bbef16a7edc53ab7898506f50
Reviewed-on: https://go-review.googlesource.com/9872
Reviewed-by: Austin Clements <austin@google.com>
Since we now have stack information for code running on the
systemstack, we can traceback over it. To make cpu profiles useful,
add a case in gentraceback to jump over systemstack switches.
Fixes#10609.
Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7
Reviewed-on: https://go-review.googlesource.com/9506
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Makes searching in source code easier.
Change-Id: Ie2e85934d23920ac0bc01d28168bcfbbdc465580
Reviewed-on: https://go-review.googlesource.com/9774
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Minux Ma <minux@golang.org>
Currently, we use a full stop-the-world around enabling write
barriers. This is to ensure that all Gs have enabled write barriers
before any blackening occurs (either in gcBgMarkWorker() or in
gcAssistAlloc()).
However, there's no need to bring the whole world to a synchronous
stop to ensure this. This change replaces the STW with a ragged
barrier that ensures each P has individually observed that write
barriers should be enabled before GC performs any blackening.
Change-Id: If2f129a6a55bd8bdd4308067af2b739f3fb41955
Reviewed-on: https://go-review.googlesource.com/8207
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This adds forEachP, which performs a general-purpose ragged global
barrier. forEachP takes a callback and invokes it for every P at a GC
safe point.
Ps that are idle or in a syscall are considered to be at a continuous
safe point. forEachP ensures that these Ps do not change state by
forcing all syscall Ps into idle and holding the sched.lock.
To ensure that Ps do not enter syscall or idle without running the
safe-point function, this adds checks for a pending callback every
place there is currently a gcwaiting check.
We'll use forEachP to replace the STW around enabling the write
barrier and to replace the current asynchronous per-M wbuf cache with
a cooperatively managed per-P gcWork cache.
Change-Id: Ie944f8ce1fead7c79bf271d2f42fcd61a41bb3cc
Reviewed-on: https://go-review.googlesource.com/8206
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, each M has a cache of the most recently used *workbuf. This
is used primarily by the write barrier so it doesn't have to access
the global workbuf lists on every write barrier. It's also used by
stack scanning because it's convenient.
This cache is important for write barrier performance, but this
particular approach has several downsides. It's faster than no cache,
but far from optimal (as the benchmarks below show). It's complex:
access to the cache is sprinkled through most of the workbuf list
operations and it requires special care to transform into and back out
of the gcWork cache that's actually used for scanning and marking. It
requires atomic exchanges to take ownership of the cached workbuf and
to return it to the M's cache even though it's almost always used by
only the current M. Since it's per-M, flushing these caches is O(# of
Ms), which may be high. And it has some significant subtleties: for
example, in general the cache shouldn't be used after the
harvestwbufs() in mark termination because it could hide work from
mark termination, but stack scanning can happen after this and *will*
use the cache (but it turns out this is okay because it will always be
followed by a getfull(), which drains the cache).
This change replaces this cache with a per-P gcWork object. This
gcWork cache can be used directly by scanning and marking (as long as
preemption is disabled, which is a general requirement of gcWork).
Since it's per-P, it doesn't require synchronization, which simplifies
things and means the only atomic operations in the write barrier are
occasionally fetching new work buffers and setting a mark bit if the
object isn't already marked. This cache can be flushed in O(# of Ps),
which is generally small. It follows a simple flushing rule: the cache
can be used during any phase, but during mark termination it must be
flushed before allowing preemption. This also makes the dispose during
mutator assist no longer necessary, which eliminates the vast majority
of gcWork dispose calls and reduces contention on the global workbuf
lists. And it's a lot faster on some benchmarks:
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 11963668673 11206112763 -6.33%
BenchmarkFannkuch11 2643217136 2649182499 +0.23%
BenchmarkFmtFprintfEmpty 70.4 70.2 -0.28%
BenchmarkFmtFprintfString 364 307 -15.66%
BenchmarkFmtFprintfInt 317 282 -11.04%
BenchmarkFmtFprintfIntInt 512 483 -5.66%
BenchmarkFmtFprintfPrefixedInt 404 380 -5.94%
BenchmarkFmtFprintfFloat 521 479 -8.06%
BenchmarkFmtManyArgs 2164 1894 -12.48%
BenchmarkGobDecode 30366146 22429593 -26.14%
BenchmarkGobEncode 29867472 26663152 -10.73%
BenchmarkGzip 391236616 396779490 +1.42%
BenchmarkGunzip 96639491 96297024 -0.35%
BenchmarkHTTPClientServer 100110 70763 -29.31%
BenchmarkJSONEncode 51866051 52511382 +1.24%
BenchmarkJSONDecode 103813138 86094963 -17.07%
BenchmarkMandelbrot200 4121834 4120886 -0.02%
BenchmarkGoParse 16472789 5879949 -64.31%
BenchmarkRegexpMatchEasy0_32 140 140 +0.00%
BenchmarkRegexpMatchEasy0_1K 394 394 +0.00%
BenchmarkRegexpMatchEasy1_32 120 120 +0.00%
BenchmarkRegexpMatchEasy1_1K 621 614 -1.13%
BenchmarkRegexpMatchMedium_32 209 202 -3.35%
BenchmarkRegexpMatchMedium_1K 54889 55175 +0.52%
BenchmarkRegexpMatchHard_32 2682 2675 -0.26%
BenchmarkRegexpMatchHard_1K 79383 79524 +0.18%
BenchmarkRevcomp 584116718 584595320 +0.08%
BenchmarkTemplate 125400565 109620196 -12.58%
BenchmarkTimeParse 386 387 +0.26%
BenchmarkTimeFormat 580 447 -22.93%
(Best out of 10 runs. The delta of averages is similar.)
This also puts us in a good position to flush these caches when
nearing the end of concurrent marking, which will let us increase the
size of the work buffers while still controlling mark termination
pause time.
Change-Id: I2dd94c8517a19297a98ec280203cccaa58792522
Reviewed-on: https://go-review.googlesource.com/9178
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Currently, when the runtime ready()s a G, it adds it to the end of the
current P's run queue and continues running. If there are many other
things in the run queue, this can result in a significant delay before
the ready()d G actually runs and can hurt fairness when other Gs in
the run queue are CPU hogs. For example, if there are three Gs sharing
a P, one of which is a CPU hog that never voluntarily gives up the P
and the other two of which are doing small amounts of work and
communicating back and forth on an unbuffered channel, the two
communicating Gs will get very little CPU time.
Change this so that when G1 ready()s G2 and then blocks, the scheduler
immediately hands off the remainder of G1's time slice to G2. In the
above example, the two communicating Gs will now act as a unit and
together get half of the CPU time, while the CPU hog gets the other
half of the CPU time.
This fixes the problem demonstrated by the ping-pong benchmark added
in the previous commit:
benchmark old ns/op new ns/op delta
BenchmarkPingPongHog 684287 825 -99.88%
On the x/benchmarks suite, this change improves the performance of
garbage by ~6% (for GOMAXPROCS=1 and 4), and json by 28% and 36% for
GOMAXPROCS=1 and 4. It has negligible effect on heap size.
This has no effect on the go1 benchmark suite since those benchmarks
are mostly single-threaded.
Change-Id: I858a08eaa78f702ea98a5fac99d28a4ac91d339f
Reviewed-on: https://go-review.googlesource.com/9289
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Currently, in accordance with the GC pacing proposal, we schedule
background marking with a goal of achieving 25% utilization *total*
between mutator assists and background marking. This is stricter than
was set out in the Go 1.5 proposal, which suggests that the garbage
collector can use 25% just for itself and anything the mutator does to
help out is on top of that. It also has several technical
drawbacks. Because mutator assist time is constantly changing and we
can't have instantaneous information on background marking time, it
effectively requires hitting a moving target based on out-of-date
information. This works out in the long run, but works poorly for
short GC cycles and on short time scales. Also, this requires
time-multiplexing all Ps between the mutator and background GC since
the goal utilization of background GC constantly fluctuates. This
results in a complicated scheduling algorithm, poor affinity, and
extra overheads from context switching.
This change modifies the way we schedule and run background marking so
that background marking always consumes 25% of GOMAXPROCS and mutator
assist is in addition to this. This enables a much more robust
scheduling algorithm where we pre-determine the number of Ps we should
dedicate to background marking as well as the utilization goal for a
single floating "remainder" mark worker.
Change-Id: I187fa4c03ab6fe78012a84d95975167299eb9168
Reviewed-on: https://go-review.googlesource.com/9013
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, the concurrent mark phase is performed by the main GC
goroutine. Prior to the previous commit enabling preemption, this
caused marking to always consume 1/GOMAXPROCS of the available CPU
time. If GOMAXPROCS=1, this meant background GC would consume 100% of
the CPU (effectively a STW). If GOMAXPROCS>4, background GC would use
less than the goal of 25%. If GOMAXPROCS=4, background GC would use
the goal 25%, but if the mutator wasn't using the remaining 75%,
background marking wouldn't take advantage of the idle time. Enabling
preemption in the previous commit made GC miss CPU targets in
completely different ways, but set us up to bring everything back in
line.
This change replaces the fixed GC goroutine with per-P background mark
goroutines. Once started, these goroutines don't go in the standard
run queues; instead, they are scheduled specially such that the time
spent in mutator assists and the background mark goroutines totals 25%
of the CPU time available to the program. Furthermore, this lets
background marking take advantage of idle Ps, which significantly
boosts GC performance for applications that under-utilize the CPU.
This requires also changing how time is reported for gctrace, so this
change splits the concurrent mark CPU time into assist/background/idle
scanning.
This also requires increasing the size of the StackRecord slice used
in a GoroutineProfile test.
Change-Id: I0936ff907d2cee6cb687a208f2df47e8988e3157
Reviewed-on: https://go-review.googlesource.com/8850
Reviewed-by: Rick Hudson <rlh@golang.org>
This time is tracked per P and periodically flushed to the global
controller state. This will be used to compute mutator assist
utilization in order to schedule background GC work.
Change-Id: Ib94f90903d426a02cf488bf0e2ef67a068eb3eec
Reviewed-on: https://go-review.googlesource.com/8837
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, mutator allocation periodically assists the garbage
collector by performing a small, fixed amount of scanning work.
However, to control heap growth, mutators need to perform scanning
work *proportional* to their allocation rate.
This change implements proportional mutator assists. This uses the
scan work estimate computed by the garbage collector at the beginning
of each cycle to compute how much scan work must be performed per
allocation byte to complete the estimated scan work by the time the
heap reaches the goal size. When allocation triggers an assist, it
uses this ratio and the amount allocated since the last assist to
compute the assist work, then attempts to steal as much of this work
as possible from the background collector's credit, and then performs
any remaining scan work itself.
Change-Id: I98b2078147a60d01d6228b99afd414ef857e4fba
Reviewed-on: https://go-review.googlesource.com/8836
Reviewed-by: Rick Hudson <rlh@golang.org>
This CL revises CL 7504 to use explicitly uintptr types for the
struct fields that are going to be updated sometimes without
write barriers. The result is that the fields are now updated *always*
without write barriers.
This approach has two important properties:
1) Now the GC never looks at the field, so if the missing reference
could cause a problem, it will do so all the time, not just when the
write barrier is missed at just the right moment.
2) Now a write barrier never happens for the field, avoiding the
(correct) detection of inconsistent write barriers when GODEBUG=wbshadow=1.
Change-Id: Iebd3962c727c0046495cc08914a8dc0808460e0e
Reviewed-on: https://go-review.googlesource.com/9019
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
readyExecute passes a closure to mcall that captures an argument to
readyExecute. Since mcall is marked noescape, this closure lives on
the stack of the calling goroutine. However, the closure puts the
calling goroutine on the run queue (and switches to a new
goroutine). If the calling goroutine gets scheduled before the mcall
returns, this stack-allocated closure will become invalid while it's
still executing. One consequence of this we've observed is that the
captured gp variable can get overwritten before the call to
execute(gp), causing execute(gp) to segfault.
Fix this by passing the currently captured gp variable through a field
in the calling goroutine's g struct so that the func is no longer a
closure.
To prevent problems like this in the future, this change also removes
the go:noescape annotation from mcall. Due to a compiler bug, this
will currently cause a func closure passed to mcall to be implicitly
allocated rather than refusing the implicit allocation. However, this
is okay because there are no other closures passed to mcall right now
and the compiler bug will be fixed shortly.
Fixes#10428.
Change-Id: I49b48b85de5643323b89e9eaa4df63854e968c32
Reviewed-on: https://go-review.googlesource.com/8866
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
This memory is untyped and can't be used anymore.
The next version of SWIG won't need it.
Change-Id: I592b287c5f5186975ee09a9b28d8efe3b57134e7
Reviewed-on: https://go-review.googlesource.com/8956
Reviewed-by: Ian Lance Taylor <iant@golang.org>
By removing type slice, renaming type sliceStruct to type slice and
whacking until it compiles.
Has a pleasing net reduction of conversions.
Fixes#10188
Change-Id: I77202b8df637185b632fd7875a1fdd8d52c7a83c
Reviewed-on: https://go-review.googlesource.com/8770
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
According to Go execution modes, a Go program compiled with
-buildmode=c-archive has a main function, but it is ignored on run.
This gives the runtime the information it needs not to run the main.
I have this working with pending linker changes on darwin/amd64.
Change-Id: I49bd7d65aa619ec847c464a872afa5deea7d4d30
Reviewed-on: https://go-review.googlesource.com/8701
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This is Part 2 of the change, see Part 1 here: in https://go-review.googlesource.com/#/c/7692/
Suggested by iant@, we use the library initialization entry point to:
- create a new OS thread and run the "regular" runtime init stack on
that thread
- return immediately from the main (i.e., loader) thread
- at the first CGO invocation, we wait for the runtime initialization
to complete.
The above mechanism is implemented only on linux_amd64. Next step is to
support it on linux_arm. Other platforms don't yet support shared library
compiling/linking, but we intend to use the same strategy there as well.
Change-Id: Ib2c81b1b83bee837134084b75a3beecfb8de6bf4
Reviewed-on: https://go-review.googlesource.com/8094
Run-TryBot: Srdjan Petrovic <spetrovic@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This tracks both total CPU time used by GC and the total time
available to all Ps since the beginning of the program and uses this
to derive a cumulative CPU usage percent for the gctrace line.
Change-Id: Ica85372b8dd45f7621909b325d5ac713a9b0d015
Reviewed-on: https://go-review.googlesource.com/8350
Reviewed-by: Russ Cox <rsc@golang.org>
Also invert it, which means it no longer needs to cross the cgo
package boundary.
Change-Id: I393cd073bda02b591a55d6bc6b8bb94970ea71cd
Reviewed-on: https://go-review.googlesource.com/8082
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
To reduce lock contention in this mode, makes persistent allocation state per-P,
which means at most 64 kB overhead x $GOMAXPROCS, which should be
completely tolerable.
Change-Id: I34ca95e77d7e67130e30822e5a4aff6772b1a1c5
Reviewed-on: https://go-review.googlesource.com/7740
Reviewed-by: Rick Hudson <rlh@golang.org>
The value in question is really a bit pattern
(a pointer with extra bits thrown in),
so treat it as a uintptr instead, avoiding the
generation of a write barrier when there
might not be a p.
Also add the obligatory //go:nowritebarrier.
Change-Id: I4ea097945dd7093a140f4740bcadca3ce7191971
Reviewed-on: https://go-review.googlesource.com/7667
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
The GC assumes that there will be no asynchronous write barriers when
the world is stopped. This keeps the synchronization between write
barriers and the GC simple. However, currently, there are a few places
in runtime code where this assumption does not hold.
The GC stops the world by collecting all Ps, which stops all user Go
code, but small parts of the runtime can run without a P. For example,
the code that releases a P must still deschedule its G onto a runnable
queue before stopping. Similarly, when a G returns from a long-running
syscall, it must run code to reacquire a P.
Currently, this code can contain write barriers. This can lead to the
GC collecting reachable objects if something like the following
sequence of events happens:
1. GC stops the world by collecting all Ps.
2. G #1 returns from a syscall (for example), tries to install a
pointer to object X, and calls greyobject on X.
3. greyobject on G #1 marks X, but does not yet add it to a write
buffer. At this point, X is effectively black, not grey, even though
it may point to white objects.
4. GC reaches X through some other path and calls greyobject on X, but
greyobject does nothing because X is already marked.
5. GC completes.
6. greyobject on G #1 adds X to a work buffer, but it's too late.
7. Objects that were reachable only through X are incorrectly collected.
To fix this, we check the invariant that no asynchronous write
barriers happen when the world is stopped by checking that write
barriers always have a P, and modify all currently known sources of
these writes to disable the write barrier. In all modified cases this
is safe because the object in question will always be reachable via
some other path.
Some of the trace code was turned off, in particular the
code that traces returning from a syscall. The GC assumes
that as far as the heap is concerned the thread is stopped
when it is in a syscall. Upon returning the trace code
must not do any heap writes for the same reasons discussed
above.
Fixes#10098Fixes#9953Fixes#9951Fixes#9884
May relate to #9610#9771
Change-Id: Ic2e70b7caffa053e56156838eb8d89503e3c0c8a
Reviewed-on: https://go-review.googlesource.com/7504
Reviewed-by: Austin Clements <austin@google.com>
Everything has moved to Go, but comments still refer to .c/.h files.
Fix all of those up, at least for these three directories.
Fixes#10138
Change-Id: Ie5efe89b247841e0b3f82aac5256b2c606ef67dc
Reviewed-on: https://go-review.googlesource.com/7431
Reviewed-by: Russ Cox <rsc@golang.org>
Stip uninteresting bottom and top frames from trace stacks.
This makes both binary and json trace files smaller,
and also makes stacks shorter and more readable in the viewer.
Change-Id: Ib9c80ccc280504f0e235f867f53f1d2652c41583
Reviewed-on: https://go-review.googlesource.com/5523
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
The unbounded list-based defer pool can grow infinitely.
This can happen if a goroutine routinely allocates a defer;
then blocks on one P; and then unblocked, scheduled and
frees the defer on another P.
The scenario was reported on golang-nuts list.
We've been here several times. Any unbounded local caches
are bad and grow to infinite size. This change introduces
central defer pool; local pools become fixed-size
with the only purpose of amortizing accesses to the
central pool.
Freedefer now executes on system stack to not consume
nosplit stack space.
Change-Id: I1a27695838409259d1586a0adfa9f92bccf7ceba
Reviewed-on: https://go-review.googlesource.com/3967
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
The unbounded list-based sudog cache can grow infinitely.
This can happen if a goroutine is routinely blocked on one P
and then unblocked and scheduled on another P.
The scenario was reported on golang-nuts list.
We've been here several times. Any unbounded local caches
are bad and grow to infinite size. This change introduces
central sudog cache; local caches become fixed-size
with the only purpose of amortizing accesses to the
central cache.
The change required to move sudog cache from mcache to P,
because mcache is not scanned by GC.
Change-Id: I3bb7b14710354c026dcba28b3d3c8936a8db4e90
Reviewed-on: https://go-review.googlesource.com/3742
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
See the following issue for context:
https://github.com/golang/go/issues/9729#issuecomment-74648287
In short, RDTSC can produce skewed results without preceding LFENCE/MFENCE.
Information on this matter is very scrappy in the internet.
But this is what linux kernel does (see rdtsc_barrier).
It also fixes the test program on my machine.
Update #9729
Change-Id: I3c1ffbf129fdfdd388bd5b7911b392b319248e68
Reviewed-on: https://go-review.googlesource.com/5033
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Add local workbufs to the m struct in order to reduce contention.
Add consistency checks for workbuf ownership.
Chain workbufs through call change to avoid swapping them
to and from the m struct.
Adjust the size of the workbuf so that the mutators can
more frequently pass modifications to the GC thus shifting
some work from the STW mark termination phase to the concurrent
mark phase.
Change-Id: I557b53af34ad9972265e0ed9f5996e52d548563d
Reviewed-on: https://go-review.googlesource.com/3972
Reviewed-by: Austin Clements <austin@google.com>
Fixes#9791
g.issystem flag setup races with other code wherever we set it.
Even if we set both in parent goroutine and in the system goroutine,
it is still possible that some other goroutine crashes
before the flag is set. We could pass issystem flag to newproc1,
but we start all goroutines with go nowadays.
Instead look at g.startpc to distinguish system goroutines (similar to topofstack).
Change-Id: Ia3467968dee27fa07d9fecedd4c2b00928f26645
Reviewed-on: https://go-review.googlesource.com/4113
Reviewed-by: Keith Randall <khr@golang.org>
The unbounded list-based defer pool can grow infinitely.
This can happen if a goroutine routinely allocates a defer;
then blocks on one P; and then unblocked, scheduled and
frees the defer on another P.
The scenario was reported on golang-nuts list.
We've been here several times. Any unbounded local caches
are bad and grow to infinite size. This change introduces
central defer pool; local pools become fixed-size
with the only purpose of amortizing accesses to the
central pool.
Change-Id: Iadcfb113ccecf912e1b64afc07926f0de9de2248
Reviewed-on: https://go-review.googlesource.com/3741
Reviewed-by: Keith Randall <khr@golang.org>
This adds a "framepointer" GOEXPERIMENT that that makes the amd64
toolchain maintain base pointer chains in the same way that gcc
-fno-omit-frame-pointer does. Go doesn't use these saved base
pointers, but this does enable external tools like Linux perf and
VTune to unwind Go stacks when collecting system-wide profiles.
This requires support in the compilers to not clobber BP, support in
liblink for generating the BP-saving function prologue and unwinding
epilogue, and support in the runtime to save BPs across preemption, to
skip saved BPs during stack unwinding and, and to adjust saved BPs
during stack moving.
As with other GOEXPERIMENTs, everything from the toolchain to the
runtime must be compiled with this experiment enabled. To do this,
run make.bash (or all.bash) with GOEXPERIMENT=framepointer.
Change-Id: I4024853beefb9539949e5ca381adfdd9cfada544
Reviewed-on: https://go-review.googlesource.com/2992
Reviewed-by: Russ Cox <rsc@golang.org>
m.gcing has become overloaded to mean "don't preempt this g" in
general. Once the garbage collector is preemptible, the one thing it
*won't* mean is that we're in the garbage collector.
So, rename gcing to "preemptoff" and make it a string giving a reason
that preemption is disabled. gcing was never set to anything but 0 or
1, so we don't have to worry about there being a stack of reasons.
Change-Id: I4337c29e8e942e7aa4f106fc29597e1b5de4ef46
Reviewed-on: https://go-review.googlesource.com/3660
Reviewed-by: Russ Cox <rsc@golang.org>
This cleanup was slated for after the conversion of the runtime to Go.
Also improve type and function documentation.
Change-Id: I55a16b09e00cf701f246deb69e7ce7e3e04b26e7
Reviewed-on: https://go-review.googlesource.com/3393
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
During a concurrent GC stacks are scanned in
an initial scan phase informing the GC of all
pointers on the stack. The GC only needs to rescan
the stack if it potentially changes which can only
happen if the goroutine runs.
This CL tracks whether the Goroutine has run
since it was last scanned and thus may have changed
its stack. If necessary the stack is rescanned.
Change-Id: I5fb1c4338d42e3f61ab56c9beb63b7b2da25f4f1
Reviewed-on: https://go-review.googlesource.com/3275
Reviewed-by: Russ Cox <rsc@golang.org>
The equal algorithm used to take the size
equal(p, q *T, size uintptr) bool
With this change, it does not
equal(p, q *T) bool
Similarly for the hash algorithm.
The size is rarely used, as most equal functions know the size
of the thing they are comparing. For instance f32equal already
knows its inputs are 4 bytes in size.
For cases where the size is not known, we allocate a closure
(one for each size needed) that points to an assembly stub that
reads the size out of the closure and calls generic code that
has a size argument.
Reduces the size of the go binary by 0.07%. Performance impact
is not measurable.
Change-Id: I6e00adf3dde7ad2974adbcff0ee91e86d2194fec
Reviewed-on: https://go-review.googlesource.com/2392
Reviewed-by: Russ Cox <rsc@golang.org>
The ones at the end of M and G are just used to compute
their size for use in assembly. Generate the size explicitly.
The one at the end of itab is variable-sized, and at least one.
The ones at the end of interfacetype and uncommontype are not
needed, as the preceding slice references them (the slice was
originally added for use by reflect?).
The one at the end of stackmap is already accessed correctly,
and the runtime never allocates one.
Update #9401
Change-Id: Ia75e3aaee38425f038c506868a17105bd64c712f
Reviewed-on: https://go-review.googlesource.com/2420
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
The Gobuf.g goroutine pointer is almost always updated by assembly code.
In one of the few places it is updated by Go code - func save - it must be
treated as a uintptr to avoid a write barrier being emitted at a bad time.
Instead of figuring out how to emit the write barriers missing in the
assembly manipulation, change the type of the field to uintptr, so that
it does not require write barriers at all.
Goroutine structs are published in the allg list and never freed.
That will keep the goroutine structs from being collected.
There is never a time that Gobuf.g's contain the only references
to a goroutine: the publishing of the goroutine in allg comes first.
Goroutine pointers are also kept in non-GC-visible places like TLS,
so I can't see them ever moving. If we did want to start moving data
in the GC, we'd need to allocate the goroutine structs from an
alternate arena. This CL doesn't make that problem any worse.
Found with GODEBUG=wbshadow=1 mode.
Eventually that will run automatically, but right now
it still detects other missing write barriers.
Change-Id: I85f91312ec3e0ef69ead0fff1a560b0cfb095e1a
Reviewed-on: https://go-review.googlesource.com/2065
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>