Currently the code checks only the consistency (that the functab is
correctly sorted by PC, etc.) of the moduledata object that the
runtime itself belongs to. Change it to check all moduledata objects.
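A rough sketch of the intended shape, with simplified stand-in types
(the real moduledata has many more fields):

type moduledata struct {
	ftab []uintptr // function entry PCs, expected to be sorted
	next *moduledata
}

var firstmoduledata moduledata

// moduledataverify walks every module in the list, not just the one
// the runtime itself is linked into.
func moduledataverify() {
	for datap := &firstmoduledata; datap != nil; datap = datap.next {
		moduledataverify1(datap)
	}
}

// moduledataverify1 checks one module, e.g. that its functab is
// sorted by PC.
func moduledataverify1(datap *moduledata) {
	for i := 1; i < len(datap.ftab); i++ {
		if datap.ftab[i-1] > datap.ftab[i] {
			panic("functab not sorted by PC")
		}
	}
}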
Change-Id: I544a44c5de7445fff87d3cdb4840ff04c5e2bf75
Reviewed-on: https://go-review.googlesource.com/9773
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Makes searching in source code easier.
Change-Id: Ie2e85934d23920ac0bc01d28168bcfbbdc465580
Reviewed-on: https://go-review.googlesource.com/9774
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Minux Ma <minux@golang.org>
Currently, the GC uses a moving average of recent scan work ratios to
estimate the total scan work required by this cycle. This is in turn
used to compute how much scan work should be done by mutators when
they allocate in order to perform all expected scan work by the time
the allocated heap reaches the heap goal.
However, our current scan work estimate can be arbitrarily wrong if
the heap topography changes significantly from one cycle to the
next. For example, in the go1 benchmarks, at the beginning of each
benchmark, the heap is dominated by a 256MB no-scan object, so the GC
learns that the scan density of the heap is very low. In benchmarks
that then rapidly allocate pointer-dense objects, by the time of the
next GC cycle, our estimate of the scan work can be too low by a large
factor. This in turn lets the mutator allocate faster than the GC can
collect, allowing it to get arbitrarily far ahead of the scan work
estimate, which leads to very long GC cycles with very little mutator
assist that can overshoot the heap goal by large margins. This is
particularly easy to demonstrate with BinaryTree17:
$ GODEBUG=gctrace=1 ./go1.test -test.bench BinaryTree17
gc #1 @0.017s 2%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 4->262->262 MB, 4 MB goal, 1 P
gc #2 @0.026s 3%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 262->262->262 MB, 524 MB goal, 1 P
testing: warning: no tests to run
PASS
BenchmarkBinaryTree17 gc #3 @1.906s 0%: 0+0+0+0+7 ms clock, 0+0+0+0/0/0+7 ms cpu, 325->325->287 MB, 325 MB goal, 1 P (forced)
gc #4 @12.203s 20%: 0+0+0+10067+10 ms clock, 0+0+0+0/2523/852+10 ms cpu, 430->2092->1950 MB, 574 MB goal, 1 P
1 9150447353 ns/op
Change this estimate to instead use the *current* scannable heap
size. This has the advantage of being based solely on the current
state of the heap, not on past densities or reachable heap sizes, so
it isn't susceptible to falling behind during these sorts of phase
changes. This is strictly an over-estimate, but it's better to
over-estimate and get more assist than necessary than it is to
under-estimate and potentially spiral out of control. Experiments with
scaling this estimate back showed no obvious benefit for mutator
utilization, heap size, or assist time.
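A sketch of the revised estimate (names are illustrative, not the
controller's actual fields):

// expectedScanWork is now simply the scannable heap at the point the
// estimate is made; it deliberately over-estimates, since objects that
// die during the cycle are still counted.
func expectedScanWork(heapScan int64) int64 {
	return heapScan
}

// Mutator assists are then sized so the expected work finishes by the
// time the allocated heap reaches the goal.
func assistRatio(scanWorkExpected, heapGoal, heapLive int64) float64 {
	heapDistance := heapGoal - heapLive
	if heapDistance <= 0 {
		heapDistance = 1 // already at (or past) the goal: assist hard
	}
	return float64(scanWorkExpected) / float64(heapDistance)
}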
This new estimate has little effect for most benchmarks, including
most go1 benchmarks, x/benchmarks, and the 6g benchmark. It has a huge
effect for benchmarks that triggered the bad pacer behavior:
name old mean new mean delta
BinaryTree17 10.0s × (1.00,1.00) 3.5s × (0.98,1.01) -64.93% (p=0.000)
Fannkuch11 2.74s × (1.00,1.01) 2.65s × (1.00,1.00) -3.52% (p=0.000)
FmtFprintfEmpty 56.4ns × (0.99,1.00) 57.8ns × (1.00,1.01) +2.43% (p=0.000)
FmtFprintfString 187ns × (0.99,1.00) 185ns × (0.99,1.01) -1.19% (p=0.010)
FmtFprintfInt 184ns × (1.00,1.00) 183ns × (1.00,1.00) (no variance)
FmtFprintfIntInt 321ns × (1.00,1.00) 315ns × (1.00,1.00) -1.80% (p=0.000)
FmtFprintfPrefixedInt 266ns × (1.00,1.00) 263ns × (1.00,1.00) -1.22% (p=0.000)
FmtFprintfFloat 353ns × (1.00,1.00) 353ns × (1.00,1.00) -0.13% (p=0.035)
FmtManyArgs 1.21µs × (1.00,1.00) 1.19µs × (1.00,1.00) -1.33% (p=0.000)
GobDecode 9.69ms × (1.00,1.00) 9.59ms × (1.00,1.00) -1.07% (p=0.000)
GobEncode 7.89ms × (0.99,1.01) 7.74ms × (1.00,1.00) -1.92% (p=0.000)
Gzip 391ms × (1.00,1.00) 392ms × (1.00,1.00) ~ (p=0.522)
Gunzip 97.1ms × (1.00,1.00) 97.0ms × (1.00,1.00) -0.10% (p=0.000)
HTTPClientServer 55.7µs × (0.99,1.01) 56.7µs × (0.99,1.01) +1.81% (p=0.001)
JSONEncode 19.1ms × (1.00,1.00) 19.0ms × (1.00,1.00) -0.85% (p=0.000)
JSONDecode 66.8ms × (1.00,1.00) 66.9ms × (1.00,1.00) ~ (p=0.288)
Mandelbrot200 4.13ms × (1.00,1.00) 4.12ms × (1.00,1.00) -0.08% (p=0.000)
GoParse 3.97ms × (1.00,1.01) 4.01ms × (1.00,1.00) +0.99% (p=0.000)
RegexpMatchEasy0_32 114ns × (1.00,1.00) 115ns × (0.99,1.00) ~ (p=0.070)
RegexpMatchEasy0_1K 376ns × (1.00,1.00) 376ns × (1.00,1.00) ~ (p=0.900)
RegexpMatchEasy1_32 94.9ns × (1.00,1.00) 96.3ns × (1.00,1.01) +1.53% (p=0.001)
RegexpMatchEasy1_1K 568ns × (1.00,1.00) 567ns × (1.00,1.00) -0.22% (p=0.001)
RegexpMatchMedium_32 159ns × (1.00,1.00) 159ns × (1.00,1.00) ~ (p=0.178)
RegexpMatchMedium_1K 46.4µs × (1.00,1.00) 46.6µs × (1.00,1.00) +0.29% (p=0.000)
RegexpMatchHard_32 2.37µs × (1.00,1.00) 2.37µs × (1.00,1.00) ~ (p=0.722)
RegexpMatchHard_1K 71.1µs × (1.00,1.00) 71.2µs × (1.00,1.00) ~ (p=0.229)
Revcomp 565ms × (1.00,1.00) 562ms × (1.00,1.00) -0.52% (p=0.000)
Template 81.0ms × (1.00,1.00) 80.2ms × (1.00,1.00) -0.97% (p=0.000)
TimeParse 380ns × (1.00,1.00) 380ns × (1.00,1.00) ~ (p=0.148)
TimeFormat 405ns × (0.99,1.00) 385ns × (0.99,1.00) -5.00% (p=0.000)
Change-Id: I11274158bf3affaf62662e02de7af12d5fb789e4
Reviewed-on: https://go-review.googlesource.com/9696
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
This tracks the number of scannable bytes in the allocated heap. That
is, bytes that the garbage collector must scan before reaching the
last pointer field in each object.
This will be used to compute a more robust estimate of the GC scan
work.
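A sketch of the accounting, with an illustrative helper rather than
the runtime's actual allocation path:

type gcStats struct{ heapScan uint64 }

// noteAlloc grows the scannable-byte count by the size of the object's
// pointer-containing prefix; pointer-free objects contribute nothing.
func noteAlloc(stats *gcStats, ptrdata uintptr, noscan bool) {
	if noscan {
		return
	}
	stats.heapScan += uint64(ptrdata)
}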
Change-Id: I1eecd45ef9cdd65b69d2afb5db5da885c80086bb
Reviewed-on: https://go-review.googlesource.com/9695
Reviewed-by: Russ Cox <rsc@golang.org>
The garbage collector predicts how much "scan work" must be done in a
cycle to determine how much work should be done by mutators when they
allocate. Most code doesn't care what units the scan work is in: it
simply knows that a certain amount of scan work has to be done in the
cycle. Currently, the GC uses the number of pointer slots scanned as
the scan work on the theory that this is the bulk of the time spent in
the garbage collector and hence reflects real CPU resource usage.
However, this metric is difficult to estimate at the beginning of a
cycle.
Switch to counting the total number of bytes scanned, including both
pointer and scalar slots. This is still less than the total marked
heap since it omits no-scan objects and no-scan tails of objects. This
metric may not reflect absolute performance as well as the count of
scanned pointer slots (though it still takes time to scan scalar
fields), but it will be much easier to estimate robustly, which is
more important.
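A sketch of the new accounting (simplified types):

const slotBytes = 8 // word size on a 64-bit system

type gcWork struct{ scanWork int64 }

// scanSlots credits scan work in bytes for every slot visited, scalar
// or pointer; no-scan objects and no-scan tails never reach the
// scanner, so they still contribute nothing.
func scanSlots(slots []uintptr, isPointer func(i int) bool, gcw *gcWork) {
	for i := range slots {
		if isPointer(i) {
			// ...enqueue the pointed-to object for marking (omitted)...
		}
	}
	gcw.scanWork += int64(len(slots)) * slotBytes
}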
Change-Id: Ie3a5eeeb0384a1ca566f61b2f11e9ff3a75ca121
Reviewed-on: https://go-review.googlesource.com/9694
Reviewed-by: Russ Cox <rsc@golang.org>
Currently, we only flush the per-P gcWork caches in gcMark, at the
beginning of mark termination. This is necessary to ensure that no
work is held up in these caches.
However, this flush happens after we update the GC controller state,
which depends on statistics about marked heap size and scan work that
are only updated by this flush. Hence, the controller is missing the
bulk of heap marking and scan work. This bug was introduced in commit
1b4025f, which introduced the per-P gcWork caches.
Fix this by flushing these caches before we update the GC controller
state. We continue to flush them at the beginning of mark termination
as well to be robust in case any write barriers happened between the
previous flush and entering mark termination, but this should be a
no-op.
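A sketch of the flush, with simplified types:

type gcWorkCache struct{ scanWork, bytesMarked int64 }
type pp struct{ gcw gcWorkCache }
type controllerState struct{ scanWork, bytesMarked int64 }

// flushGCWorkCaches drains every P's cached totals into the global
// statistics so the controller's end-of-cycle computation sees all of
// the marking and scan work.
func flushGCWorkCaches(ps []*pp, global *controllerState) {
	for _, p := range ps {
		global.scanWork += p.gcw.scanWork
		global.bytesMarked += p.gcw.bytesMarked
		p.gcw = gcWorkCache{}
	}
}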
Change-Id: I8f0f91024df967ebf0c616d1c4f0c339c304ebaa
Reviewed-on: https://go-review.googlesource.com/9646
Reviewed-by: Russ Cox <rsc@golang.org>
During development some tracing routines were added that are not
needed in the release. These included GCstarttimes, GCendtimes, and
GCprinttimes.
Fixes #10462
Change-Id: I0788e6409d61038571a5ae0cbbab793102df0a65
Reviewed-on: https://go-review.googlesource.com/9689
Reviewed-by: Austin Clements <austin@google.com>
Before CL 8214 (use .plt instead of .got on Solaris) Solaris used a
dynamic linking scheme that didn't permit lazy binding. To speed program
startup, Go binaries only used it for a small number of symbols required
by the runtime. Other symbols were resolved on demand on first use, and
were cached for subsequent use. This required some moderately complex
code in the syscall package.
CL 8214 changed the way dynamic linking is implemented, and lazy
binding is now supported. Since all symbols are now resolved lazily by
the dynamic loader, there is no need for the complex code in the
syscall package that did the same. This CL makes Go programs link directly
with the necessary shared libraries and deletes the lazy-loading code
implemented in Go.
Change-Id: Ifd7275db72de61b70647242e7056dd303b1aee9e
Reviewed-on: https://go-review.googlesource.com/9184
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The linker always uses .plt for externals, so libcFunc is now an actual
external symbol instead of a pointer to one.
Fixes most of the breakage introduced in the previous CL.
Change-Id: I64b8c96f93127f2d13b5289b024677fd3ea7dbea
Reviewed-on: https://go-review.googlesource.com/8215
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
I forgot there is already a ptrSize constant.
Rename field to avoid some confusion.
Change-Id: I098fdcc8afc947d6c02c41c6e6de24624cc1c8ff
Reviewed-on: https://go-review.googlesource.com/9700
Reviewed-by: Austin Clements <austin@google.com>
Freezetheworld still has stuff to do when gomaxprocs=1.
In particular, signals can come in on other Ms (like the GC M, say)
and the single user M is still running.
Fixes #10546
Change-Id: I2f07f17d1c81e93cf905df2cb087112d436ca7e7
Reviewed-on: https://go-review.googlesource.com/9551
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
When emulating ARM FSQRT instruction, the sqrt function itself
should not use any floating point arithmetics, otherwise it will
clobber the user software FP registers.
Fortunately, the sqrt function only uses floating point instructions
to test for corner cases, so it's easy to make that function do all
of its work using integer arithmetic only. I've verified that after
this change, runtime.stepflt and runtime.sqrt don't contain any calls
to _sfloat. (Perhaps we should add //go:nosfloat to make
the compiler enforce this?)
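An illustrative sketch of the corner-case tests done on the raw IEEE
754 bit pattern, using integer compares only:

func sqrtSpecialCase(ix uint64) (handled bool, result uint64) {
	exp := (ix >> 52) & 0x7ff
	frac := ix & (1<<52 - 1)
	switch {
	case exp == 0x7ff && frac != 0: // NaN: sqrt(NaN) = NaN
		return true, ix
	case exp == 0x7ff && ix>>63 == 0: // +Inf: sqrt(+Inf) = +Inf
		return true, ix
	case exp == 0 && frac == 0: // ±0: sqrt(±0) = ±0
		return true, ix
	case ix>>63 != 0: // x < 0 (including -Inf): sqrt(x) = NaN
		return true, 0x7ff8000000000000
	}
	return false, 0 // ordinary positive input: do the integer sqrt
}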
Fixes #10641.
Change-Id: Ida4742c49000fae4fea4649f28afde630ce4c576
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9570
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Keith Randall <khr@golang.org>
This adds a field to the runtime type structure that records the size
of the prefix of objects of that type containing pointers. Any data
after this offset is scalar data.
This is necessary for shrinking the type bitmaps to 1 bit and will
help the garbage collector efficiently estimate the amount of heap
that needs to be scanned.
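Roughly, the idea (the real field lives in the runtime's type
descriptor, alongside many other fields):

type _typeSketch struct {
	size    uintptr // total size of a value of this type
	ptrdata uintptr // bytes in the prefix that can contain pointers
}

// For a struct whose only pointer is its first field, ptrdata covers
// just that field, so the scalar tail never needs to be scanned:
type example struct {
	p   *int      // inside the ptrdata prefix
	buf [128]byte // scalar tail, skipped by the scanner
}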
Change-Id: I1318d79e6360dca0ac980245016c562e61f52ff5
Reviewed-on: https://go-review.googlesource.com/9691
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
shouldtriggergc is slightly expensive due to the call overhead
and the use of an atomic. This CL reduces the number of times
one checks whether a GC should be done from once per allocation
to once per span allocation. Since shouldtriggergc is an
important abstraction, simply hand-inlining it, along with its
atomic instruction, would lose the abstraction.
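A sketch of where the check moves to (illustrative types; not the
runtime's actual span allocator):

type heapState struct {
	heapLive  uint64 // bytes allocated so far this cycle
	gcTrigger uint64 // heap size at which a GC should start
}

// The hot per-object allocation path no longer calls shouldtriggergc;
// the much colder span-allocation path does instead.
func allocSpan(h *heapState) {
	// ...carve a new span out of the heap (omitted)...
	if shouldtriggergc(h) {
		// ...kick off a GC cycle (omitted)...
	}
}

// shouldtriggergc stays a single named abstraction; in the runtime it
// is an atomic load plus a comparison.
func shouldtriggergc(h *heapState) bool {
	return h.heapLive >= h.gcTrigger
}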
Change-Id: Ia3210655b4b3d433f77064a21ecb54e4d9d435f7
Reviewed-on: https://go-review.googlesource.com/9403
Reviewed-by: Austin Clements <austin@google.com>
This adds a detailed debug dump of the state of the GC controller and
a GODEBUG flag to enable it.
Change-Id: I562fed7981691a84ddf0f9e6fcd9f089f497ac13
Reviewed-on: https://go-review.googlesource.com/9640
Reviewed-by: Russ Cox <rsc@golang.org>
(1) Count pointer-free objects found during scanning roots
as marked bytes, by not zeroing the mark total after scanning roots.
(2) Don't count the bytes for the roots themselves, by not adding
them to the mark total in scanblock (the zeroing removed in (1)
was aimed at that add but hit more than that).
Combined, (1) and (2) fix the calculation of the marked heap size.
This makes the GC trigger much less often in the Go 1 benchmarks,
which have a global []byte pointing at 256 MB of data.
That 256 MB allocation was not being included in the heap size
in the current code, but was included in Go 1.4.
This is the source of much of the relative slowdown in that directory.
(3) Count the bytes for the roots as scanned work, by not zeroing
the scan total after scanning roots. There is no strict justification
for this, and it probably doesn't matter much either way,
but it was always combined with another buggy zeroing
(removed in (1)), so guilty by association.
Austin noticed this.
name old mean new mean delta
BenchmarkBinaryTree17 13.1s × (0.97,1.03) 5.9s × (0.97,1.05) -55.19% (p=0.000)
BenchmarkFannkuch11 4.35s × (0.99,1.01) 4.37s × (1.00,1.01) +0.47% (p=0.032)
BenchmarkFmtFprintfEmpty 84.6ns × (0.95,1.14) 85.7ns × (0.94,1.05) ~ (p=0.521)
BenchmarkFmtFprintfString 320ns × (0.95,1.06) 283ns × (0.99,1.02) -11.48% (p=0.000)
BenchmarkFmtFprintfInt 311ns × (0.98,1.03) 288ns × (0.99,1.02) -7.26% (p=0.000)
BenchmarkFmtFprintfIntInt 554ns × (0.96,1.05) 478ns × (0.99,1.02) -13.70% (p=0.000)
BenchmarkFmtFprintfPrefixedInt 434ns × (0.96,1.06) 393ns × (0.98,1.04) -9.60% (p=0.000)
BenchmarkFmtFprintfFloat 620ns × (0.99,1.03) 584ns × (0.99,1.01) -5.73% (p=0.000)
BenchmarkFmtManyArgs 2.19µs × (0.98,1.03) 1.94µs × (0.99,1.01) -11.62% (p=0.000)
BenchmarkGobDecode 21.2ms × (0.97,1.06) 15.2ms × (0.99,1.01) -28.17% (p=0.000)
BenchmarkGobEncode 18.1ms × (0.94,1.06) 11.8ms × (0.99,1.01) -35.00% (p=0.000)
BenchmarkGzip 650ms × (0.98,1.01) 649ms × (0.99,1.02) ~ (p=0.802)
BenchmarkGunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.438)
BenchmarkHTTPClientServer 110µs × (0.98,1.04) 101µs × (0.98,1.02) -8.79% (p=0.000)
BenchmarkJSONEncode 40.3ms × (0.97,1.03) 31.8ms × (0.98,1.03) -20.92% (p=0.000)
BenchmarkJSONDecode 119ms × (0.97,1.02) 108ms × (0.99,1.02) -9.15% (p=0.000)
BenchmarkMandelbrot200 6.03ms × (1.00,1.01) 6.03ms × (0.99,1.01) ~ (p=0.750)
BenchmarkGoParse 8.58ms × (0.89,1.10) 6.80ms × (1.00,1.00) -20.71% (p=0.000)
BenchmarkRegexpMatchEasy0_32 162ns × (1.00,1.01) 162ns × (0.99,1.02) ~ (p=0.131)
BenchmarkRegexpMatchEasy0_1K 540ns × (0.99,1.02) 559ns × (0.99,1.02) +3.58% (p=0.000)
BenchmarkRegexpMatchEasy1_32 139ns × (0.98,1.04) 139ns × (1.00,1.00) ~ (p=0.466)
BenchmarkRegexpMatchEasy1_1K 889ns × (0.99,1.01) 885ns × (0.99,1.01) -0.50% (p=0.022)
BenchmarkRegexpMatchMedium_32 252ns × (0.99,1.02) 252ns × (0.99,1.01) ~ (p=0.469)
BenchmarkRegexpMatchMedium_1K 72.9µs × (0.99,1.01) 73.6µs × (0.99,1.03) ~ (p=0.168)
BenchmarkRegexpMatchHard_32 3.87µs × (1.00,1.01) 3.86µs × (1.00,1.00) ~ (p=0.055)
BenchmarkRegexpMatchHard_1K 118µs × (0.99,1.01) 117µs × (0.99,1.00) ~ (p=0.133)
BenchmarkRevcomp 995ms × (0.94,1.10) 949ms × (0.99,1.01) -4.64% (p=0.000)
BenchmarkTemplate 141ms × (0.97,1.02) 127ms × (0.99,1.01) -10.00% (p=0.000)
BenchmarkTimeParse 641ns × (0.99,1.01) 623ns × (0.99,1.01) -2.79% (p=0.000)
BenchmarkTimeFormat 729ns × (0.98,1.03) 679ns × (0.99,1.00) -6.93% (p=0.000)
Change-Id: I839bd7356630d18377989a0748763414e15ed057
Reviewed-on: https://go-review.googlesource.com/9602
Reviewed-by: Austin Clements <austin@google.com>
This reverts commit c26fc88d56.
This broke pprof. See the comments on CL 9491.
Change-Id: Ic99ce026e86040c050a9bf0ea3024a1a42274ad1
Reviewed-on: https://go-review.googlesource.com/9565
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Kind of a hack, but makes the non-optimized builds pass.
Fixes #10079
Change-Id: I26f41c546867f8f3f16d953dc043e784768f2aff
Reviewed-on: https://go-review.googlesource.com/9552
Reviewed-by: Russ Cox <rsc@golang.org>
This includes the following information in the per-function summary:
outK = paramJ encoded in outK bits for paramJ
outK = *paramJ encoded in outK bits for paramJ
heap = paramJ EscHeap
heap = *paramJ EscContentEscapes
Note that (currently) if the address of a parameter is taken and
returned, a heap allocation necessarily occurred to contain that
reference; since the heap can never refer to the stack, the
parameter and everything downstream from it escapes to the heap.
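Illustrative examples of the flows these summaries distinguish (the
comments approximate the encoding, not the exact -m output):

type T struct{ x int }

var sink *T

func ident(p *T) *T      { return p }  // out0 = param0
func load(p **T) *T      { return *p } // out0 = *param0
func leak(p *T)          { sink = p }  // heap = param0 (EscHeap)
func leakContents(p **T) { sink = *p } // heap = *param0 (EscContentEscapes)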
The per-function summary information now has a tuneable number of bits
(2 is probably noticeably better than 1, 3 is likely overkill, but it
is now easy to check and the -m debugging output includes information
that allows you to figure out if more would be better.)
A new test was added to check pointer flow through struct-typed and
*struct-typed parameters and returns; some of these are sensitive to
the number of summary bits, and ought to yield better results with a
more competent escape analysis algorithm. Another new test checks
(some) correctness with array parameters, results, and operations.
The old analysis inferred a piece of plan9 runtime was non-escaping by
counteracting overconservative analysis with buggy analysis; with the
bug fixed, the result was too conservative (and it's not easy to fix
in this framework) so the source code was tweaked to get the desired
result. A test was added against the discovered bug.
The escape analysis was further improved by splitting the "level"
into 3 parts: one tracks the conventional "level", and the other two
compute the highest-level-suffix-from-copy, which is used to model
the cancelling effect of indirection applied to
address-of.
With the improved escape analysis enabled, it was necessary to
modify one of the runtime tests because it now attempts to allocate
too much on the (small, fixed-size) G0 (system) stack and this
failed the test.
Compiling src/std after touching src/runtime/*.go with -m logging
turned on shows 420 fewer heap allocation sites (10538 vs 10968).
Profiling allocations in src/html/template with
for i in {1..5} ;
do go tool 6g -memprofile=mastx.${i}.prof -memprofilerate=1 *.go;
go tool pprof -alloc_objects -text mastx.${i}.prof ;
done
showed a 15% reduction in allocations performed by the compiler.
Update #3753
Update #4720
Fixes #10466
Change-Id: I0fd97d5f5ac527b45f49e2218d158a6e89951432
Reviewed-on: https://go-review.googlesource.com/8202
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
App Store policy requires that programs not reference the exc_server
symbol. (Some public forum threads show that Unity ran into this
several years ago and it is a hard policy rule.) While some research
suggests that I could write my own version of exc_server, the
expedient course is to disable the exception handler by default.
Go programs only need it when running under lldb, which is primarily
used by tests. So enable the exception handler in cmd/dist when we
are running the tests.
Fixes #10646
Change-Id: I853905254894b5367edb8abd381d45585a78ee8b
Reviewed-on: https://go-review.googlesource.com/9549
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
gcDumpObject is used to print the source and destination objects when
checkmark finds a missing mark. However, gcDumpObject currently assumes
the given pointer will point to a heap object. This is not true of the
source object during root marking and may not even be true of the
destination object in the limited situations where the heap points
back in to the stack.
If the pointer isn't a heap object, gcDumpObject will attempt an
out-of-bounds access to h_spans. This will cause a panicslice, which
will attempt to construct a useful panic message. This will cause a
string allocation, which will lead mallocgc to panic because the GC is
in mark termination (checkmark only happens during mark termination).
Fix this by checking that the pointer points into the heap arena
before attempting to use it as an arena pointer.
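A sketch of the guard (illustrative signature):

func inHeapArena(obj, arenaStart, arenaUsed uintptr) bool {
	return obj >= arenaStart && obj < arenaUsed
}

// gcDumpObject-style use: bail out before computing an h_spans index,
// and avoid anything that could allocate during mark termination.
func dumpObject(obj, arenaStart, arenaUsed uintptr) {
	if !inHeapArena(obj, arenaStart, arenaUsed) {
		println("object is outside the heap arena")
		return
	}
	// ...compute the span index and dump the object as before...
}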
Change-Id: I09da600c380d4773f1f8f38e45b82cb229ea6382
Reviewed-on: https://go-review.googlesource.com/9498
Reviewed-by: Rick Hudson <rlh@golang.org>
Sequence of operations:
- Go code does a systemstack call
- during the systemstack call, receive a signal
- signal requests a traceback of all goroutines
The original G is still marked as _Grunning, so the traceback code
refuses to print its stack.
Fix by allowing traceback of Gs whose caller is on the same M as G is.
G can't be modifying its stack if that is the case.
Fixes #10546
Change-Id: I2bcea48c0197fbf78ab6fa080027cd80181083ad
Reviewed-on: https://go-review.googlesource.com/9435
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The problem is not actually specific to android/arm. Linux/ARM's
runtime.clone sets the stack pointer to child_stk-4 before calling
the fn. Then, when fn returns, it tries to write to 4(R13) to
provide the argument for runtime.exit, which is just beyond the allocated
child stack, and thus it will corrupt the heap randomly or trigger
segfault if that memory happens to be unmapped.
While we're here, shorten the test polling interval to 0.1s to
speed up the test (it was only checking at 1s interval, which means
the test takes at least 1s).
Fixes #10548.
Change-Id: I57cd63232022b113b6cd61e987b0684ebcce930a
Reviewed-on: https://go-review.googlesource.com/9457
Reviewed-by: David Crawshaw <crawshaw@golang.org>
The heap statistics were only written if asked for a profile with debug > 0,
but that also prints a stack trace for each profile line, which is comparatively
much noisier. The statistics are short enough and separate enough
(they only appear at the end) and useful enough that we can print them
always.
This means that people using -test.memprofile in tests will get a memory
profile with statistics included now. Pprof won't care, but if people care to
look, the numbers will be there.
This avoids the need for hacks like using -memprofilerate=1 to find
the number of allocations.
Change-Id: I10a4f593403d0315aad11b37c6e554b734caa73f
Reviewed-on: https://go-review.googlesource.com/9491
Reviewed-by: David Chase <drchase@google.com>
There's no need to call/ret to the body implementation.
It can write the result to the right place. Just jump to
it and have it return to our caller.
Old:
call body implementation
compute result
put result in a register
return
write register to result location
return
New:
load address of result location into a register
jump to body implementation
compute result
write result to passed-in address
return
It's a bit tricky on 386 because there is no free register
with which to pass the result location. Free up a register
by keeping around blen-alen instead of both alen and blen.
Change-Id: If2cf0682a5bf1cc592bdda7c126ed4eee8944fba
Reviewed-on: https://go-review.googlesource.com/9202
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Gdb is not able to backtrace our non-standard stack frames on RISC
architectures without a frame pointer.
Change-Id: Id62a566ce2d743602ded2da22ff77b9ae34bc5ae
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9456
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
With a 128KB stack reservation on 32-bit Windows, the maximum number
of threads is ~9000.
The original 65535-byte stack commit is causing problems on Windows
XP, where it makes the stack reservation 1MB despite the fact that
the runtime specified 128KB.
While we're here, also fix the extra spaces in the "unable to create
more OS thread" error message: println inserts a space between each
argument.
See #9457 for more information.
Change-Id: I3a82f7d9717d3d55211b6eb1c34b00b0eaad83ed
Reviewed-on: https://go-review.googlesource.com/2237
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Minux Ma <minux@golang.org>
To avoid confusion with the runtime concept of copying stack.
Change-Id: I33442377b71012c2482c2d0ddd561492c71e70d0
Reviewed-on: https://go-review.googlesource.com/8639
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Russ Cox <rsc@golang.org>
Technically you must initialize static pthread_mutex_t and
pthread_cond_t variables with the appropriate INITIALIZER macro. In
practice the default initializers are zero anyhow, but it's still good
code hygiene.
Change-Id: I517304b16c2c7943b3880855c1b47a9a506b4bdf
Reviewed-on: https://go-review.googlesource.com/9433
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Some race tests were sensitive to the goroutine scheduling order.
When this changed in commit e870f06, these tests started to fail.
Fix TestRaceHeapParam by ensuring that the racing goroutine has
run before the test exits. Fix TestRaceRWMutexMultipleReaders by
adding a third reader to ensure that two readers wind up on the
same side of the writer (and race with each other) regardless of
the schedule. Fix TestRaceRange by ensuring that the racing
goroutine runs before the main goroutine exits the loop it races
with.
Change-Id: Iaf002f8730ea42227feaf2f3c51b9a1e57ccffdd
Reviewed-on: https://go-review.googlesource.com/9402
Reviewed-by: Russ Cox <rsc@golang.org>
This makes the OS X firewall box pop up.
It is not run during all.bash, so this hasn't been noticed before.
Change-Id: I78feb4fd3e1d3c983ae3419085048831c04de3da
Reviewed-on: https://go-review.googlesource.com/9401
Reviewed-by: Austin Clements <austin@google.com>
ReadMemStats accounts for stacks slightly differently than the runtime
does internally. Internally, only stacks allocated by newosproc0 are
accounted in memstats.stacks_sys and other stacks are accounted in
heap_sys. readmemstats_m shuffles the statistics so all stacks are
accounted in StackSys rather than HeapSys.
However, currently, readmemstats_m assumes StackSys will be zero when
it does this shuffle. This was true until commit 6ad33be. If it isn't
(e.g., if something called newosproc0), StackSys+HeapSys will be
different before and after this shuffle, and the Sys sum that was
computed earlier will no longer agree with the sum of its components.
Fix this by making the shuffle in readmemstats_m not assume that
StackSys is zero.
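A sketch of the fix, on a struct mirroring the relevant MemStats
fields (the real code works on the runtime's internal memstats):

type memStatsSketch struct {
	StackSys   uint64
	HeapSys    uint64
	stackInuse uint64 // stack memory internally accounted in heap_sys
}

// Fold the heap-accounted stack memory into StackSys instead of
// overwriting it, so Sys still equals the sum of its components even
// when StackSys starts out non-zero (e.g. after newosproc0).
func shuffleStacks(stats *memStatsSketch) {
	stats.StackSys += stats.stackInuse
	stats.HeapSys -= stats.stackInuse
}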
Fixes #10585.
Change-Id: If13991c8de68bd7b85e1b613d3f12b4fd6fd5813
Reviewed-on: https://go-review.googlesource.com/9366
Reviewed-by: Russ Cox <rsc@golang.org>
I introduced this build failure in golang.org/cl/9302 but failed to
notice due to the other failures on the dashboard.
Change-Id: I84bf00f664ba572c1ca722e0136d8a2cf21613ca
Reviewed-on: https://go-review.googlesource.com/9363
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Currently TestRaceCrawl fails to call wg.Done for every wg.Add if the
depth ever reaches 0. This causes the test to deadlock. Under the race
detector, this deadlock is not detected, so the test eventually times
out.
This only recently became a problem. Prior to commit e870f06 the depth
would never reach 0 because the strict round-robin goroutine schedule
ensured that all of the URLs were already "seen" by depth 2. Now that
the runtime prefers scheduling the most recently started goroutine,
the test is able to reach depth 0 and trigger this deadlock.
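One way to make the wg.Add/wg.Done pairing unconditional is to defer
the Done (an illustrative pattern, not necessarily the test's actual
fix):

package race_test

import "sync"

func crawl(url string, depth int, wg *sync.WaitGroup) {
	defer wg.Done() // runs even on the depth <= 0 early return
	if depth <= 0 {
		return
	}
	for _, link := range links(url) {
		wg.Add(1)
		go crawl(link, depth-1, wg)
	}
}

func links(url string) []string { return nil } // stub for the sketch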
Change-Id: I5176302a89614a344c84d587073b364833af6590
Reviewed-on: https://go-review.googlesource.com/9344
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
The master goroutine was returning before
the child goroutine had done its final i < b.N
(the one that fails and causes it to exit the loop)
and then the benchmark harness was updating
b.N, causing a read+write race on b.N.
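An illustrative shape of the fix: the benchmark waits for the child
goroutine's final i < b.N check before returning, so the harness can't
update b.N concurrently with that read.

package race_test

import "testing"

func BenchmarkPingPong(b *testing.B) {
	ch := make(chan int)
	done := make(chan struct{})
	go func() {
		for i := 0; i < b.N; i++ {
			ch <- i
		}
		close(done) // only after the last i < b.N check
	}()
	for i := 0; i < b.N; i++ {
		<-ch
	}
	<-done // don't return (and let b.N change) until the child is done
}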
Change-Id: I2504270a0de30544736f6c32161337a25b505c3e
Reviewed-on: https://go-review.googlesource.com/9368
Reviewed-by: Austin Clements <austin@google.com>
This is a follow-up to CL 9269, as suggested
by dvyukov.
There is probably even more that can be done
to speed up this shuffle. It will matter more
once CL 7570 (fine-grained locking in select)
is in and can be revisited then, with benchmarks.
Change-Id: Ic13a27d11cedd1e1f007951214b3bb56b1644f02
Reviewed-on: https://go-review.googlesource.com/9393
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
This avoids confusion with the main findrunnable in the scheduler.
Change-Id: I8cf40657557a8610a2fe5a2f74598518256ca7f0
Reviewed-on: https://go-review.googlesource.com/9305
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, we use a full stop-the-world around enabling write
barriers. This is to ensure that all Gs have enabled write barriers
before any blackening occurs (either in gcBgMarkWorker() or in
gcAssistAlloc()).
However, there's no need to bring the whole world to a synchronous
stop to ensure this. This change replaces the STW with a ragged
barrier that ensures each P has individually observed that write
barriers should be enabled before GC performs any blackening.
Change-Id: If2f129a6a55bd8bdd4308067af2b739f3fb41955
Reviewed-on: https://go-review.googlesource.com/8207
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This adds forEachP, which performs a general-purpose ragged global
barrier. forEachP takes a callback and invokes it for every P at a GC
safe point.
Ps that are idle or in a syscall are considered to be at a continuous
safe point. forEachP ensures that these Ps do not change state by
forcing all syscall Ps into idle and holding the sched.lock.
To ensure that Ps do not enter syscall or idle without running the
safe-point function, this adds checks for a pending callback every
place there is currently a gcwaiting check.
We'll use forEachP to replace the STW around enabling the write
barrier and to replace the current asynchronous per-M wbuf cache with
a cooperatively managed per-P gcWork cache.
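A sketch of the contract with simplified types (forEachP is internal
to the runtime, not an exported API):

type pSketch struct{ wbEnabled bool }

// forEachP runs fn for every P at a GC safe point and returns only
// once all Ps have done so; idle and syscall Ps are handled on their
// behalf under sched.lock. Implementation (the ragged barrier) omitted.
func forEachP(fn func(*pSketch)) {}

// The intended use described above: replace the STW around enabling
// the write barrier, so each P observes the change at its own safe
// point.
func enableWriteBarriers() {
	forEachP(func(p *pSketch) { p.wbEnabled = true })
}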
Change-Id: Ie944f8ce1fead7c79bf271d2f42fcd61a41bb3cc
Reviewed-on: https://go-review.googlesource.com/8206
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This fixes a bug where the runtime ready()s a goroutine while setting
up a new M that's initially marked as spinning, causing the scheduler
to later panic when it finds work in the run queue of a P associated
with a spinning M. Specifically, the sequence of events that can lead
to this is:
1) sysmon calls handoffp to hand off a P stolen from a syscall.
2) handoffp sees no pending work on the P, so it calls startm with
spinning set.
3) startm calls newm, which in turn calls allocm to allocate a new M.
4) allocm "borrows" the P we're handing off in order to do allocation
and performs this allocation.
5) This allocation may assist the garbage collector, and this assist
may detect the end of concurrent mark and ready() the main GC
goroutine to signal this.
6) This ready()ing puts the GC goroutine on the run queue of the
borrowed P.
7) newm starts the OS thread, which runs mstart and subsequently
mstart1, which marks the M spinning because startm was called with
spinning set.
8) mstart1 enters the scheduler, which panics because there's work on
the run queue, but the M is marked spinning.
To fix this, before marking the M spinning in step 7, add a check to
see if work has been added to the P's run queue. If this is the case,
undo the spinning instead.
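A sketch of the added check (illustrative; in the runtime this sits
where the new M finishes starting up):

type m struct{ spinning bool }
type p struct{ runq []uintptr }

func runqempty(pp *p) bool { return len(pp.runq) == 0 }

// If work appeared on the P while the M was being created (step 6),
// give up the spinning claim before entering the scheduler, instead of
// panicking on a spinning M with local work.
func undoSpinningIfWork(mp *m, pp *p, nmspinning *int32) {
	if mp.spinning && !runqempty(pp) {
		mp.spinning = false
		*nmspinning -= 1
	}
}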
Fixes #10573.
Change-Id: I4670495ae00582144a55ce88c45ae71de597cfa5
Reviewed-on: https://go-review.googlesource.com/9332
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
This adds a check that we never put a P on the idle list when it has
work on its local run queue.
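A sketch of the check with simplified types:

type idleP struct{ runq []uintptr }

// pidleput refuses to idle a P that still has local work.
func pidleput(pp *idleP, idle *[]*idleP) {
	if len(pp.runq) != 0 {
		panic("pidleput: P has non-empty run queue")
	}
	*idle = append(*idle, pp)
}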
Change-Id: Ifcfab750de60c335148a7f513d4eef17be03b6a7
Reviewed-on: https://go-review.googlesource.com/9324
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>