mirror of https://github.com/golang/go synced 2024-11-20 07:14:40 -07:00
1066 Commits

Author SHA1 Message Date
Michael Hudson-Doyle
fa896733b5 runtime: check consistency of all module data objects
Current code just checks the consistency (that the functab is correctly
sorted by PC, etc) of the moduledata object that the runtime belongs to.
Change to check all of them.
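
As a rough illustration of the change, the sketch below walks a linked list of module records and checks each function table for PC ordering. The moduleData and functabEntry types, their fields, and the function names are simplified, hypothetical stand-ins, not the runtime's actual definitions.

```go
package main

import "fmt"

// functabEntry and moduleData are hypothetical, simplified stand-ins for
// the runtime's structures; only the fields needed for the check exist.
type functabEntry struct {
	entry uintptr // starting PC of a function
}

type moduleData struct {
	name    string
	functab []functabEntry
	next    *moduleData // next module in the list, nil at the end
}

// verifyModule reports an error if the function table is not sorted by PC.
func verifyModule(m *moduleData) error {
	for i := 1; i < len(m.functab); i++ {
		if m.functab[i-1].entry > m.functab[i].entry {
			return fmt.Errorf("%s: functab entry %d out of order", m.name, i)
		}
	}
	return nil
}

// verifyAllModules checks every module in the list, not just the first.
func verifyAllModules(first *moduleData) error {
	for m := first; m != nil; m = m.next {
		if err := verifyModule(m); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	plugin := &moduleData{name: "plugin", functab: []functabEntry{{0x2000}, {0x1000}}}
	rt := &moduleData{name: "runtime", functab: []functabEntry{{0x100}, {0x200}}, next: plugin}

	fmt.Println("first module only:", verifyModule(rt))
	fmt.Println("all modules:", verifyAllModules(rt))
}
```

Checking only the first module misses the unsorted table in the second one, which is the gap this change closes.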

Change-Id: I544a44c5de7445fff87d3cdb4840ff04c5e2bf75
Reviewed-on: https://go-review.googlesource.com/9773
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-05-07 15:06:08 +00:00
Alex Brainman
a52dc9fcbd runtime: fix comments that mention g status values
Makes searching in source code easier.

Change-Id: Ie2e85934d23920ac0bc01d28168bcfbbdc465580
Reviewed-on: https://go-review.googlesource.com/9774
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-07 00:00:38 +00:00
Austin Clements
17db6e0420 runtime: use heap scan size as estimate of GC scan work
Currently, the GC uses a moving average of recent scan work ratios to
estimate the total scan work required by this cycle. This is in turn
used to compute how much scan work should be done by mutators when
they allocate in order to perform all expected scan work by the time
the allocated heap reaches the heap goal.

However, our current scan work estimate can be arbitrarily wrong if
the heap topography changes significantly from one cycle to the
next. For example, in the go1 benchmarks, at the beginning of each
benchmark, the heap is dominated by a 256MB no-scan object, so the GC
learns that the scan density of the heap is very low. In benchmarks
that then rapidly allocate pointer-dense objects, by the time of the
next GC cycle, our estimate of the scan work can be too low by a large
factor. This in turn lets the mutator allocate faster than the GC can
collect, allowing it to get arbitrarily far ahead of the scan work
estimate, which leads to very long GC cycles with very little mutator
assist that can overshoot the heap goal by large margins. This is
particularly easy to demonstrate with BinaryTree17:

$ GODEBUG=gctrace=1 ./go1.test -test.bench BinaryTree17
gc #1 @0.017s 2%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 4->262->262 MB, 4 MB goal, 1 P
gc #2 @0.026s 3%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 262->262->262 MB, 524 MB goal, 1 P
testing: warning: no tests to run
PASS
BenchmarkBinaryTree17	gc #3 @1.906s 0%: 0+0+0+0+7 ms clock, 0+0+0+0/0/0+7 ms cpu, 325->325->287 MB, 325 MB goal, 1 P (forced)
gc #4 @12.203s 20%: 0+0+0+10067+10 ms clock, 0+0+0+0/2523/852+10 ms cpu, 430->2092->1950 MB, 574 MB goal, 1 P
       1       9150447353 ns/op

Change this estimate to instead use the *current* scannable heap
size. This has the advantage of being based solely on the current
state of the heap, not on past densities or reachable heap sizes, so
it isn't susceptible to falling behind during these sorts of phase
changes. This is strictly an over-estimate, but it's better to
over-estimate and get more assist than necessary than it is to
under-estimate and potentially spiral out of control. Experiments with
scaling this estimate back showed no obvious benefit for mutator
utilization, heap size, or assist time.
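
The effect of the two estimates on mutator assist can be sketched as follows. This is a simplified model under stated assumptions, not the runtime's pacer: assistRatio and all the numbers in main are hypothetical, and the old estimate is modeled as a remembered scan density multiplied by the live heap.

```go
package main

import "fmt"

// assistRatio returns the units of scan work a mutator should perform per
// allocated byte so that expectedScanWork is finished by the time the heap
// grows from heapLive to heapGoal. Hypothetical, simplified model.
func assistRatio(expectedScanWork, heapLive, heapGoal uint64) float64 {
	headroom := uint64(1) // avoid dividing by zero once the goal is reached
	if heapGoal > heapLive {
		headroom = heapGoal - heapLive
	}
	return float64(expectedScanWork) / float64(headroom)
}

func main() {
	// Made-up numbers for a heap that was recently dominated by a large
	// no-scan object and has since filled with pointer-dense objects.
	heapLive := uint64(430 << 20)
	heapGoal := uint64(574 << 20)
	oldDensity := 0.01             // scan density learned in earlier cycles
	scannable := uint64(170 << 20) // current scannable heap size

	oldEstimate := uint64(oldDensity * float64(heapLive))
	newEstimate := scannable

	fmt.Printf("old estimate: %d bytes, assist ratio %.3f\n",
		oldEstimate, assistRatio(oldEstimate, heapLive, heapGoal))
	fmt.Printf("new estimate: %d bytes, assist ratio %.3f\n",
		newEstimate, assistRatio(newEstimate, heapLive, heapGoal))
}
```

With a stale density, the assist ratio is far too small, so mutators allocate without doing much scan work; basing the estimate on the current scannable heap removes that dependence on history.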

This new estimate has little effect for most benchmarks, including
most go1 benchmarks, x/benchmarks, and the 6g benchmark. It has a huge
effect for benchmarks that triggered the bad pacer behavior:

name                   old mean              new mean              delta
BinaryTree17            10.0s × (1.00,1.00)    3.5s × (0.98,1.01)  -64.93% (p=0.000)
Fannkuch11              2.74s × (1.00,1.01)   2.65s × (1.00,1.00)   -3.52% (p=0.000)
FmtFprintfEmpty        56.4ns × (0.99,1.00)  57.8ns × (1.00,1.01)   +2.43% (p=0.000)
FmtFprintfString        187ns × (0.99,1.00)   185ns × (0.99,1.01)   -1.19% (p=0.010)
FmtFprintfInt           184ns × (1.00,1.00)   183ns × (1.00,1.00)  (no variance)
FmtFprintfIntInt        321ns × (1.00,1.00)   315ns × (1.00,1.00)   -1.80% (p=0.000)
FmtFprintfPrefixedInt   266ns × (1.00,1.00)   263ns × (1.00,1.00)   -1.22% (p=0.000)
FmtFprintfFloat         353ns × (1.00,1.00)   353ns × (1.00,1.00)   -0.13% (p=0.035)
FmtManyArgs            1.21µs × (1.00,1.00)  1.19µs × (1.00,1.00)   -1.33% (p=0.000)
GobDecode              9.69ms × (1.00,1.00)  9.59ms × (1.00,1.00)   -1.07% (p=0.000)
GobEncode              7.89ms × (0.99,1.01)  7.74ms × (1.00,1.00)   -1.92% (p=0.000)
Gzip                    391ms × (1.00,1.00)   392ms × (1.00,1.00)     ~    (p=0.522)
Gunzip                 97.1ms × (1.00,1.00)  97.0ms × (1.00,1.00)   -0.10% (p=0.000)
HTTPClientServer       55.7µs × (0.99,1.01)  56.7µs × (0.99,1.01)   +1.81% (p=0.001)
JSONEncode             19.1ms × (1.00,1.00)  19.0ms × (1.00,1.00)   -0.85% (p=0.000)
JSONDecode             66.8ms × (1.00,1.00)  66.9ms × (1.00,1.00)     ~    (p=0.288)
Mandelbrot200          4.13ms × (1.00,1.00)  4.12ms × (1.00,1.00)   -0.08% (p=0.000)
GoParse                3.97ms × (1.00,1.01)  4.01ms × (1.00,1.00)   +0.99% (p=0.000)
RegexpMatchEasy0_32     114ns × (1.00,1.00)   115ns × (0.99,1.00)     ~    (p=0.070)
RegexpMatchEasy0_1K     376ns × (1.00,1.00)   376ns × (1.00,1.00)     ~    (p=0.900)
RegexpMatchEasy1_32    94.9ns × (1.00,1.00)  96.3ns × (1.00,1.01)   +1.53% (p=0.001)
RegexpMatchEasy1_1K     568ns × (1.00,1.00)   567ns × (1.00,1.00)   -0.22% (p=0.001)
RegexpMatchMedium_32    159ns × (1.00,1.00)   159ns × (1.00,1.00)     ~    (p=0.178)
RegexpMatchMedium_1K   46.4µs × (1.00,1.00)  46.6µs × (1.00,1.00)   +0.29% (p=0.000)
RegexpMatchHard_32     2.37µs × (1.00,1.00)  2.37µs × (1.00,1.00)     ~    (p=0.722)
RegexpMatchHard_1K     71.1µs × (1.00,1.00)  71.2µs × (1.00,1.00)     ~    (p=0.229)
Revcomp                 565ms × (1.00,1.00)   562ms × (1.00,1.00)   -0.52% (p=0.000)
Template               81.0ms × (1.00,1.00)  80.2ms × (1.00,1.00)   -0.97% (p=0.000)
TimeParse               380ns × (1.00,1.00)   380ns × (1.00,1.00)     ~    (p=0.148)
TimeFormat              405ns × (0.99,1.00)   385ns × (0.99,1.00)   -5.00% (p=0.000)

Change-Id: I11274158bf3affaf62662e02de7af12d5fb789e4
Reviewed-on: https://go-review.googlesource.com/9696
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-06 19:40:38 +00:00
Austin Clements
3be3cbd548 runtime: track "scannable" bytes of heap
This tracks the number of scannable bytes in the allocated heap. That
is, bytes that the garbage collector must scan before reaching the
last pointer field in each object.

This will be used to compute a more robust estimate of the GC scan
work.
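
A minimal sketch of this kind of accounting, assuming each allocation reports its size and the length of its pointer-containing prefix. The heapStats type and its field names are hypothetical and do not mirror the runtime's counters.

```go
package main

import "fmt"

// heapStats models two heap-wide counters. Names are hypothetical.
type heapStats struct {
	live uint64 // total bytes allocated
	scan uint64 // bytes up to the last pointer field of each object
}

// alloc records an object of the given size whose last pointer field ends
// at offset ptrdata. Pointer-free objects pass ptrdata == 0.
func (h *heapStats) alloc(size, ptrdata uint64) {
	h.live += size
	h.scan += ptrdata
}

func main() {
	var h heapStats
	h.alloc(256<<20, 0) // a large pointer-free buffer adds nothing to scan
	for i := 0; i < 1000; i++ {
		h.alloc(32, 16) // pointers only in the first 16 bytes
	}
	fmt.Println("live bytes:", h.live, "scannable bytes:", h.scan)
}
```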

Change-Id: I1eecd45ef9cdd65b69d2afb5db5da885c80086bb
Reviewed-on: https://go-review.googlesource.com/9695
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:33 +00:00
Austin Clements
53c53984e7 runtime: include scalar slots in GC scan work metric
The garbage collector predicts how much "scan work" must be done in a
cycle to determine how much work should be done by mutators when they
allocate. Most code doesn't care what units the scan work is in: it
simply knows that a certain amount of scan work has to be done in the
cycle. Currently, the GC uses the number of pointer slots scanned as
the scan work on the theory that this is the bulk of the time spent in
the garbage collector and hence reflects real CPU resource usage.
However, this metric is difficult to estimate at the beginning of a
cycle.

Switch to counting the total number of bytes scanned, including both
pointer and scalar slots. This is still less than the total marked
heap since it omits no-scan objects and no-scan tails of objects. This
metric may not reflect absolute performance as well as the count of
scanned pointer slots (though it still takes time to scan scalar
fields), but it will be much easier to estimate robustly, which is
more important.
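
To make the difference between the two metrics concrete, the sketch below computes both for a single object. The layout representation (one bool per word) and the 8-byte word size are assumptions made for illustration only.

```go
package main

import "fmt"

const wordSize = 8 // assumed 64-bit words

// scanWork returns both metrics for one object whose layout is given as
// one bool per word (true means the word is a pointer slot).
// pointerSlots is the old metric; bytesScanned is the new one and covers
// every word, scalar or pointer, up to and including the last pointer.
func scanWork(words []bool) (pointerSlots, bytesScanned int) {
	last := -1
	for i, isPtr := range words {
		if isPtr {
			pointerSlots++
			last = i
		}
	}
	bytesScanned = (last + 1) * wordSize
	return pointerSlots, bytesScanned
}

func main() {
	mixed := []bool{true, false, false, true, false, false}
	noScan := []bool{false, false, false, false}

	p, b := scanWork(mixed)
	fmt.Println("mixed object: pointer slots", p, "bytes scanned", b)
	p, b = scanWork(noScan)
	fmt.Println("no-scan object: pointer slots", p, "bytes scanned", b)
}
```

The no-scan tail of the mixed object and the whole no-scan object contribute nothing to either metric, which is why the byte count stays below the total marked heap.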

Change-Id: Ie3a5eeeb0384a1ca566f61b2f11e9ff3a75ca121
Reviewed-on: https://go-review.googlesource.com/9694
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:27 +00:00
Austin Clements
c4931a8433 runtime: dispose gcWork caches before updating controller state
Currently, we only flush the per-P gcWork caches in gcMark, at the
beginning of mark termination. This is necessary to ensure that no
work is held up in these caches.

However, this flush happens after we update the GC controller state,
which depends on statistics about marked heap size and scan work that
are only updated by this flush. Hence, the controller is missing the
bulk of heap marking and scan work. This bug was introduced in commit
1b4025f, which introduced the per-P gcWork caches.

Fix this by flushing these caches before we update the GC controller
state. We continue to flush them at the beginning of mark termination
as well to be robust in case any write barriers happened between the
previous flush and entering mark termination, but this should be a
no-op.
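
The ordering problem can be sketched with a toy model in which per-P caches hold statistics until they are disposed into a global total. The gcWork and gcStats types and the endCycle function below are hypothetical simplifications, not the runtime's code.

```go
package main

import "fmt"

// gcWork models a per-P cache that accumulates statistics locally.
type gcWork struct{ bytesMarked, scanWork int64 }

// gcStats models the global totals the controller reads.
type gcStats struct{ bytesMarked, scanWork int64 }

// dispose flushes the cached statistics into the global totals.
func (w *gcWork) dispose(g *gcStats) {
	g.bytesMarked += w.bytesMarked
	g.scanWork += w.scanWork
	*w = gcWork{}
}

// endCycle returns the statistics the controller would observe.
// With flushFirst == false it models the bug: the controller reads the
// totals before the caches are flushed.
func endCycle(g *gcStats, caches []gcWork, flushFirst bool) gcStats {
	if flushFirst {
		for i := range caches {
			caches[i].dispose(g)
		}
	}
	seen := *g // the controller state is updated from the totals here

	// Flush at the start of mark termination; a no-op if already flushed.
	for i := range caches {
		caches[i].dispose(g)
	}
	return seen
}

func main() {
	buggy := endCycle(&gcStats{}, []gcWork{{100, 10}, {200, 20}}, false)
	fixed := endCycle(&gcStats{}, []gcWork{{100, 10}, {200, 20}}, true)
	fmt.Printf("controller sees without early flush: %+v\n", buggy)
	fmt.Printf("controller sees with early flush:    %+v\n", fixed)
}
```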

Change-Id: I8f0f91024df967ebf0c616d1c4f0c339c304ebaa
Reviewed-on: https://go-review.googlesource.com/9646
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:22 +00:00
Rick Hudson
1845314560 runtime: remove unused GC timers
During development some tracing routines were added that are not
needed in the release. These included GCstarttimes, GCendtimes, and
GCprinttimes.
Fixes #10462

Change-Id: I0788e6409d61038571a5ae0cbbab793102df0a65
Reviewed-on: https://go-review.googlesource.com/9689
Reviewed-by: Austin Clements <austin@google.com>
2015-05-06 12:53:08 +00:00
Aram Hăvărneanu
fe5ef5c9d7 runtime, syscall: link Solaris binaries directly instead of using dlopen/dlsym
Before CL 8214 (use .plt instead of .got on Solaris) Solaris used a
dynamic linking scheme that didn't permit lazy binding. To speed program
startup, Go binaries only used it for a small number of symbols required
by the runtime. Other symbols were resolved on demand on first use, and
were cached for subsequent use. This required some moderately complex
code in the syscall package.

CL 8214 changed the way dynamic linking is implemented, and now lazy
binding is supported. As now all symbols are resolved lazily by the
dynamic loader, there is no need for the complex code in the syscall
package that did the same. This CL makes Go programs link directly
with the necessary shared libraries and deletes the lazy-loading code
implemented in Go.

Change-Id: Ifd7275db72de61b70647242e7056dd303b1aee9e
Reviewed-on: https://go-review.googlesource.com/9184
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-06 11:38:50 +00:00
Aram Hăvărneanu
121489cbfd runtime/cgo: add cgo support for solaris/amd64
Change-Id: Ic9744c7716cdd53f27c6e5874230963e5fff0333
Reviewed-on: https://go-review.googlesource.com/8260
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-06 11:37:28 +00:00
Aram Hăvărneanu
c94f1f791b runtime: always load address of libcFunc on Solaris
The linker always uses .plt for externals, so libcFunc is now an actual
external symbol instead of a pointer to one.

Fixes most of the breakage introduced in previous CL.

Change-Id: I64b8c96f93127f2d13b5289b024677fd3ea7dbea
Reviewed-on: https://go-review.googlesource.com/8215
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-06 11:36:57 +00:00
Russ Cox
ceefebd795 runtime: rename ptrsize to ptrdata
I forgot there is already a ptrSize constant.
Rename field to avoid some confusion.

Change-Id: I098fdcc8afc947d6c02c41c6e6de24624cc1c8ff
Reviewed-on: https://go-review.googlesource.com/9700
Reviewed-by: Austin Clements <austin@google.com>
2015-05-05 19:27:47 +00:00
Keith Randall
5a828cfcde runtime: let freezetheworld work even when gomaxprocs=1
Freezetheworld still has stuff to do when gomaxprocs=1.
In particular, signals can come in on other Ms (like the GC M, say)
and the single user M is still running.

Fixes #10546

Change-Id: I2f07f17d1c81e93cf905df2cb087112d436ca7e7
Reviewed-on: https://go-review.googlesource.com/9551
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-05-05 15:11:10 +00:00
Shenghou Ma
102436e800 runtime: fix software FP regs corruption when emulating SQRT on ARM
When emulating the ARM FSQRT instruction, the sqrt function itself
should not use any floating-point arithmetic; otherwise it will
clobber the user's software FP registers.

Fortunately, the sqrt function only uses floating-point instructions
to test for corner cases, so it's easy to make that function do
all its work using pure integer arithmetic. I've verified that
after this change, runtime.stepflt and runtime.sqrt don't contain
any calls to _sfloat. (Perhaps we should add //go:nosfloat to make
the compiler enforce this?)
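
As an illustration of testing corner cases without floating-point instructions, the sketch below classifies an IEEE 754 double from its raw bit pattern using only integer operations. Which corner cases the runtime's sqrt actually tests is an assumption here, and the helper names are hypothetical; math.Float64bits is used only in main to produce inputs.

```go
package main

import (
	"fmt"
	"math"
)

// Bit fields of an IEEE 754 double-precision value.
const (
	signBit  = 1 << 63
	expMask  = 0x7FF << 52
	fracMask = 1<<52 - 1
)

// Each predicate inspects the raw bits with integer operations only.
func isNaNBits(b uint64) bool  { return b&expMask == expMask && b&fracMask != 0 }
func isInfBits(b uint64) bool  { return b&expMask == expMask && b&fracMask == 0 }
func isZeroBits(b uint64) bool { return b&^signBit == 0 }
func isNegBits(b uint64) bool  { return b&signBit != 0 }

func main() {
	for _, f := range []float64{4, -1, 0, math.Inf(1), math.NaN()} {
		b := math.Float64bits(f)
		fmt.Printf("%v: nan=%v inf=%v zero=%v neg=%v\n",
			f, isNaNBits(b), isInfBits(b), isZeroBits(b), isNegBits(b))
	}
}
```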

Fixes #10641.

Change-Id: Ida4742c49000fae4fea4649f28afde630ce4c576
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9570
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Keith Randall <khr@golang.org>
2015-05-05 07:32:58 +00:00
Austin Clements
98a9d36837 runtime: add pointer size to type structure
This adds a field to the runtime type structure that records the size
of the prefix of objects of that type containing pointers. Any data
after this offset is scalar data.

This is necessary for shrinking the type bitmaps to 1 bit and will
help the garbage collector efficiently estimate the amount of heap
that needs to be scanned.

Change-Id: I1318d79e6360dca0ac980245016c562e61f52ff5
Reviewed-on: https://go-review.googlesource.com/9691
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-05-04 20:17:48 +00:00
Rick Hudson
b86e71f5aa runtime: Reduce calls to shouldtriggergc
shouldtriggergc is slightly expensive due to the call overhead
and the use of an atomic. This CL reduces how often we check
whether a GC should be started, from once at each allocation
to once when a span is allocated. Since shouldtriggergc is an
important abstraction, simply hand-inlining it along with its
atomic instruction would lose the abstraction.
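
A toy model of the idea: the trigger check stays behind a function, but it is consulted only when the allocator needs a fresh span. The heap and allocator types are hypothetical, and the atomic load is omitted for brevity.

```go
package main

import "fmt"

// heap is a hypothetical model of the state the trigger check reads.
type heap struct {
	live    uint64
	trigger uint64
	checks  int // how many times the trigger check ran
}

// shouldTriggerGC keeps the check behind one function, preserving the
// abstraction. The real check uses an atomic load, omitted here.
func (h *heap) shouldTriggerGC() bool {
	h.checks++
	return h.live >= h.trigger
}

// allocator hands out fixed-size slots from spans of perSpan slots.
type allocator struct {
	h        *heap
	perSpan  int
	spanFree int
}

// alloc performs the trigger check only when a new span is needed and
// reports whether a GC should start.
func (a *allocator) alloc(size uint64) bool {
	triggered := false
	if a.spanFree == 0 {
		a.spanFree = a.perSpan
		triggered = a.h.shouldTriggerGC()
	}
	a.spanFree--
	a.h.live += size
	return triggered
}

func main() {
	h := &heap{trigger: 1 << 20}
	a := &allocator{h: h, perSpan: 64}
	for i := 0; i < 1000; i++ {
		a.alloc(16)
	}
	fmt.Println("allocations: 1000, trigger checks:", h.checks)
}
```

The trade-off is that the trigger may be noticed up to one span's worth of allocation late.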

Change-Id: Ia3210655b4b3d433f77064a21ecb54e4d9d435f7
Reviewed-on: https://go-review.googlesource.com/9403
Reviewed-by: Austin Clements <austin@google.com>
2015-05-04 17:38:58 +00:00
Alex Brainman
031c3bc9ae runtime: fix stackDebug comment
Change-Id: Ia9191bd7ecdf7bd5ee7d69ae23aa71760f379aa8
Reviewed-on: https://go-review.googlesource.com/9590
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-02 02:39:50 +00:00
Austin Clements
dc870d5f4b runtime: detailed debug output of controller state
This adds a detailed debug dump of the state of the GC controller and
a GODEBUG flag to enable it.

Change-Id: I562fed7981691a84ddf0f9e6fcd9f089f497ac13
Reviewed-on: https://go-review.googlesource.com/9640
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 19:39:43 +00:00
Russ Cox
4fffc50c26 runtime: correct accounting of scan work and bytes marked
(1) Count pointer-free objects found during scanning roots
as marked bytes, by not zeroing the mark total after scanning roots.

(2) Don't count the bytes for the roots themselves, by not adding
them to the mark total in scanblock (the zeroing removed by (1)
was aimed at that add but was hitting more).

Combined, (1) and (2) fix the calculation of the marked heap size.
This makes the GC trigger much less often in the Go 1 benchmarks,
which have a global []byte pointing at 256 MB of data.
That 256 MB allocation was not being included in the heap size
in the current code, but was included in Go 1.4.
This is the source of much of the relative slowdown in that directory.

(3) Count the bytes for the roots as scanned work, by not zeroing
the scan total after scanning roots. There is no strict justification
for this, and it probably doesn't matter much either way,
but it was always combined with another buggy zeroing
(removed in (1)), so guilty by association.

Austin noticed this.

name                                    old mean                new mean        delta
BenchmarkBinaryTree17              13.1s × (0.97,1.03)      5.9s × (0.97,1.05)  -55.19% (p=0.000)
BenchmarkFannkuch11                4.35s × (0.99,1.01)     4.37s × (1.00,1.01)  +0.47% (p=0.032)
BenchmarkFmtFprintfEmpty          84.6ns × (0.95,1.14)    85.7ns × (0.94,1.05)  ~ (p=0.521)
BenchmarkFmtFprintfString          320ns × (0.95,1.06)     283ns × (0.99,1.02)  -11.48% (p=0.000)
BenchmarkFmtFprintfInt             311ns × (0.98,1.03)     288ns × (0.99,1.02)  -7.26% (p=0.000)
BenchmarkFmtFprintfIntInt          554ns × (0.96,1.05)     478ns × (0.99,1.02)  -13.70% (p=0.000)
BenchmarkFmtFprintfPrefixedInt     434ns × (0.96,1.06)     393ns × (0.98,1.04)  -9.60% (p=0.000)
BenchmarkFmtFprintfFloat           620ns × (0.99,1.03)     584ns × (0.99,1.01)  -5.73% (p=0.000)
BenchmarkFmtManyArgs              2.19µs × (0.98,1.03)    1.94µs × (0.99,1.01)  -11.62% (p=0.000)
BenchmarkGobDecode                21.2ms × (0.97,1.06)    15.2ms × (0.99,1.01)  -28.17% (p=0.000)
BenchmarkGobEncode                18.1ms × (0.94,1.06)    11.8ms × (0.99,1.01)  -35.00% (p=0.000)
BenchmarkGzip                      650ms × (0.98,1.01)     649ms × (0.99,1.02)  ~ (p=0.802)
BenchmarkGunzip                    143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.438)
BenchmarkHTTPClientServer          110µs × (0.98,1.04)     101µs × (0.98,1.02)  -8.79% (p=0.000)
BenchmarkJSONEncode               40.3ms × (0.97,1.03)    31.8ms × (0.98,1.03)  -20.92% (p=0.000)
BenchmarkJSONDecode                119ms × (0.97,1.02)     108ms × (0.99,1.02)  -9.15% (p=0.000)
BenchmarkMandelbrot200            6.03ms × (1.00,1.01)    6.03ms × (0.99,1.01)  ~ (p=0.750)
BenchmarkGoParse                  8.58ms × (0.89,1.10)    6.80ms × (1.00,1.00)  -20.71% (p=0.000)
BenchmarkRegexpMatchEasy0_32       162ns × (1.00,1.01)     162ns × (0.99,1.02)  ~ (p=0.131)
BenchmarkRegexpMatchEasy0_1K       540ns × (0.99,1.02)     559ns × (0.99,1.02)  +3.58% (p=0.000)
BenchmarkRegexpMatchEasy1_32       139ns × (0.98,1.04)     139ns × (1.00,1.00)  ~ (p=0.466)
BenchmarkRegexpMatchEasy1_1K       889ns × (0.99,1.01)     885ns × (0.99,1.01)  -0.50% (p=0.022)
BenchmarkRegexpMatchMedium_32      252ns × (0.99,1.02)     252ns × (0.99,1.01)  ~ (p=0.469)
BenchmarkRegexpMatchMedium_1K     72.9µs × (0.99,1.01)    73.6µs × (0.99,1.03)  ~ (p=0.168)
BenchmarkRegexpMatchHard_32       3.87µs × (1.00,1.01)    3.86µs × (1.00,1.00)  ~ (p=0.055)
BenchmarkRegexpMatchHard_1K        118µs × (0.99,1.01)     117µs × (0.99,1.00)  ~ (p=0.133)
BenchmarkRevcomp                   995ms × (0.94,1.10)     949ms × (0.99,1.01)  -4.64% (p=0.000)
BenchmarkTemplate                  141ms × (0.97,1.02)     127ms × (0.99,1.01)  -10.00% (p=0.000)
BenchmarkTimeParse                 641ns × (0.99,1.01)     623ns × (0.99,1.01)  -2.79% (p=0.000)
BenchmarkTimeFormat                729ns × (0.98,1.03)     679ns × (0.99,1.00)  -6.93% (p=0.000)

Change-Id: I839bd7356630d18377989a0748763414e15ed057
Reviewed-on: https://go-review.googlesource.com/9602
Reviewed-by: Austin Clements <austin@google.com>
2015-05-01 19:31:00 +00:00
Russ Cox
4d0f3a1c95 cmd/internal/gc, runtime: use 1-bit bitmap for stack frames, data, bss
The bitmaps were 2 bits per pointer because we needed to distinguish
scalar, pointer, multiword, and we used the leftover value to distinguish
uninitialized from scalar, even though the garbage collector (GC) didn't care.

Now that there are no multiword structures from the GC's point of view,
cut the bitmaps down to 1 bit per pointer, recording just live pointer vs not.
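
The size difference between the two encodings can be sketched as follows. The numeric values of the 2-bit codes and the packing order are assumptions chosen for illustration, not the compiler's actual encoding.

```go
package main

import "fmt"

// Per-word kinds in the old 2-bit scheme. The numeric values are
// hypothetical; only the distinction between kinds matters here.
const (
	bitsDead    = iota // uninitialized
	bitsScalar         // initialized non-pointer
	bitsPointer        // live pointer
)

// pack2Bit stores two bits per word, four words per byte.
func pack2Bit(kinds []uint8) []byte {
	out := make([]byte, (len(kinds)+3)/4)
	for i, k := range kinds {
		out[i/4] |= (k & 3) << (uint(i) % 4 * 2)
	}
	return out
}

// pack1Bit stores one bit per word, eight words per byte:
// 1 for a live pointer, 0 for everything else.
func pack1Bit(kinds []uint8) []byte {
	out := make([]byte, (len(kinds)+7)/8)
	for i, k := range kinds {
		if k == bitsPointer {
			out[i/8] |= 1 << (uint(i) % 8)
		}
	}
	return out
}

func main() {
	frame := []uint8{
		bitsPointer, bitsScalar, bitsDead, bitsPointer,
		bitsPointer, bitsScalar, bitsScalar, bitsScalar,
		bitsPointer,
	}
	fmt.Println("2-bit bytes:", len(pack2Bit(frame)))
	fmt.Println("1-bit bytes:", len(pack1Bit(frame)))
}
```

The 1-bit form drops the scalar versus uninitialized distinction, which the collector did not use.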

The GC assumes the same layout for stack frames and for the maps
describing the global data and bss sections, so change them all in one CL.

The code still refers to 4-bit heap bitmaps and 2-bit "type bitmaps", since
the 2-bit representation lives (at least for now) in some of the reflect data.

Because these stack frame bitmaps are stored directly in the rodata in
the binary, this CL reduces the size of the 6g binary by about 1.1%.

The performance change is basically a wash, but this CL uses less memory,
produces smaller binaries, and enables other bitmap reductions.

name                                      old mean                new mean        delta
BenchmarkBinaryTree17                13.2s × (0.97,1.03)     13.0s × (0.99,1.01)  -0.93% (p=0.005)
BenchmarkBinaryTree17-2              9.69s × (0.96,1.05)     9.51s × (0.96,1.03)  -1.86% (p=0.001)
BenchmarkBinaryTree17-4              10.1s × (0.97,1.05)     10.0s × (0.96,1.05)  ~ (p=0.141)
BenchmarkFannkuch11                  4.35s × (0.99,1.01)     4.43s × (0.98,1.04)  +1.75% (p=0.001)
BenchmarkFannkuch11-2                4.31s × (0.99,1.03)     4.32s × (1.00,1.00)  ~ (p=0.095)
BenchmarkFannkuch11-4                4.32s × (0.99,1.02)     4.38s × (0.98,1.04)  +1.38% (p=0.008)
BenchmarkFmtFprintfEmpty            83.5ns × (0.97,1.10)    87.3ns × (0.92,1.11)  +4.55% (p=0.014)
BenchmarkFmtFprintfEmpty-2          81.8ns × (0.98,1.04)    82.5ns × (0.97,1.08)  ~ (p=0.364)
BenchmarkFmtFprintfEmpty-4          80.9ns × (0.99,1.01)    82.6ns × (0.97,1.08)  +2.12% (p=0.010)
BenchmarkFmtFprintfString            320ns × (0.95,1.04)     322ns × (0.97,1.05)  ~ (p=0.368)
BenchmarkFmtFprintfString-2          303ns × (0.97,1.04)     304ns × (0.97,1.04)  ~ (p=0.484)
BenchmarkFmtFprintfString-4          305ns × (0.97,1.05)     306ns × (0.98,1.05)  ~ (p=0.543)
BenchmarkFmtFprintfInt               311ns × (0.98,1.03)     319ns × (0.97,1.03)  +2.63% (p=0.000)
BenchmarkFmtFprintfInt-2             297ns × (0.98,1.04)     301ns × (0.97,1.04)  +1.19% (p=0.023)
BenchmarkFmtFprintfInt-4             302ns × (0.98,1.02)     304ns × (0.97,1.03)  ~ (p=0.126)
BenchmarkFmtFprintfIntInt            554ns × (0.96,1.05)     554ns × (0.97,1.03)  ~ (p=0.975)
BenchmarkFmtFprintfIntInt-2          520ns × (0.98,1.03)     517ns × (0.98,1.02)  ~ (p=0.153)
BenchmarkFmtFprintfIntInt-4          524ns × (0.98,1.02)     525ns × (0.98,1.03)  ~ (p=0.597)
BenchmarkFmtFprintfPrefixedInt       433ns × (0.97,1.06)     434ns × (0.97,1.06)  ~ (p=0.804)
BenchmarkFmtFprintfPrefixedInt-2     413ns × (0.98,1.04)     413ns × (0.98,1.03)  ~ (p=0.881)
BenchmarkFmtFprintfPrefixedInt-4     420ns × (0.97,1.03)     421ns × (0.97,1.03)  ~ (p=0.561)
BenchmarkFmtFprintfFloat             620ns × (0.99,1.03)     636ns × (0.97,1.03)  +2.57% (p=0.000)
BenchmarkFmtFprintfFloat-2           601ns × (0.98,1.02)     617ns × (0.98,1.03)  +2.58% (p=0.000)
BenchmarkFmtFprintfFloat-4           613ns × (0.98,1.03)     626ns × (0.98,1.02)  +2.15% (p=0.000)
BenchmarkFmtManyArgs                2.19µs × (0.96,1.04)    2.23µs × (0.97,1.02)  +1.65% (p=0.000)
BenchmarkFmtManyArgs-2              2.08µs × (0.98,1.03)    2.10µs × (0.99,1.02)  +0.79% (p=0.019)
BenchmarkFmtManyArgs-4              2.10µs × (0.98,1.02)    2.13µs × (0.98,1.02)  +1.72% (p=0.000)
BenchmarkGobDecode                  21.3ms × (0.97,1.05)    21.1ms × (0.97,1.04)  -1.36% (p=0.025)
BenchmarkGobDecode-2                20.0ms × (0.97,1.03)    19.2ms × (0.97,1.03)  -4.00% (p=0.000)
BenchmarkGobDecode-4                19.5ms × (0.99,1.02)    19.0ms × (0.99,1.01)  -2.39% (p=0.000)
BenchmarkGobEncode                  18.3ms × (0.95,1.07)    18.1ms × (0.96,1.08)  ~ (p=0.305)
BenchmarkGobEncode-2                16.8ms × (0.97,1.02)    16.4ms × (0.98,1.02)  -2.79% (p=0.000)
BenchmarkGobEncode-4                15.4ms × (0.98,1.02)    15.4ms × (0.98,1.02)  ~ (p=0.465)
BenchmarkGzip                        650ms × (0.98,1.03)     655ms × (0.97,1.04)  ~ (p=0.075)
BenchmarkGzip-2                      652ms × (0.98,1.03)     655ms × (0.98,1.02)  ~ (p=0.337)
BenchmarkGzip-4                      656ms × (0.98,1.04)     653ms × (0.98,1.03)  ~ (p=0.291)
BenchmarkGunzip                      143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.507)
BenchmarkGunzip-2                    143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.313)
BenchmarkGunzip-4                    143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.312)
BenchmarkHTTPClientServer            110µs × (0.98,1.03)     109µs × (0.99,1.02)  -1.40% (p=0.000)
BenchmarkHTTPClientServer-2          154µs × (0.90,1.08)     149µs × (0.90,1.08)  -3.43% (p=0.007)
BenchmarkHTTPClientServer-4          138µs × (0.97,1.04)     138µs × (0.96,1.04)  ~ (p=0.670)
BenchmarkJSONEncode                 40.2ms × (0.98,1.02)    40.2ms × (0.98,1.05)  ~ (p=0.828)
BenchmarkJSONEncode-2               35.1ms × (0.99,1.02)    35.2ms × (0.98,1.03)  ~ (p=0.392)
BenchmarkJSONEncode-4               35.3ms × (0.98,1.03)    35.3ms × (0.98,1.02)  ~ (p=0.813)
BenchmarkJSONDecode                  119ms × (0.97,1.02)     117ms × (0.98,1.02)  -1.80% (p=0.000)
BenchmarkJSONDecode-2                115ms × (0.99,1.02)     114ms × (0.98,1.02)  -1.18% (p=0.000)
BenchmarkJSONDecode-4                116ms × (0.98,1.02)     114ms × (0.98,1.02)  -1.43% (p=0.000)
BenchmarkMandelbrot200              6.03ms × (1.00,1.01)    6.03ms × (1.00,1.01)  ~ (p=0.985)
BenchmarkMandelbrot200-2            6.03ms × (1.00,1.01)    6.02ms × (1.00,1.01)  ~ (p=0.320)
BenchmarkMandelbrot200-4            6.03ms × (1.00,1.01)    6.03ms × (1.00,1.01)  ~ (p=0.799)
BenchmarkGoParse                    8.63ms × (0.89,1.10)    8.58ms × (0.93,1.09)  ~ (p=0.667)
BenchmarkGoParse-2                  8.20ms × (0.97,1.04)    8.37ms × (0.97,1.04)  +1.96% (p=0.001)
BenchmarkGoParse-4                  8.00ms × (0.98,1.02)    8.14ms × (0.99,1.02)  +1.75% (p=0.000)
BenchmarkRegexpMatchEasy0_32         162ns × (1.00,1.01)     164ns × (0.98,1.04)  +1.35% (p=0.011)
BenchmarkRegexpMatchEasy0_32-2       161ns × (1.00,1.01)     161ns × (1.00,1.00)  ~ (p=0.185)
BenchmarkRegexpMatchEasy0_32-4       161ns × (1.00,1.00)     161ns × (1.00,1.00)  -0.19% (p=0.001)
BenchmarkRegexpMatchEasy0_1K         540ns × (0.99,1.02)     566ns × (0.98,1.04)  +4.98% (p=0.000)
BenchmarkRegexpMatchEasy0_1K-2       540ns × (0.99,1.01)     557ns × (0.99,1.01)  +3.21% (p=0.000)
BenchmarkRegexpMatchEasy0_1K-4       541ns × (0.99,1.01)     559ns × (0.99,1.01)  +3.26% (p=0.000)
BenchmarkRegexpMatchEasy1_32         139ns × (0.98,1.04)     139ns × (0.99,1.03)  ~ (p=0.979)
BenchmarkRegexpMatchEasy1_32-2       139ns × (0.99,1.04)     139ns × (0.99,1.02)  ~ (p=0.777)
BenchmarkRegexpMatchEasy1_32-4       139ns × (0.98,1.04)     139ns × (0.99,1.04)  ~ (p=0.771)
BenchmarkRegexpMatchEasy1_1K         890ns × (0.99,1.03)     885ns × (1.00,1.01)  -0.50% (p=0.004)
BenchmarkRegexpMatchEasy1_1K-2       888ns × (0.99,1.01)     885ns × (0.99,1.01)  -0.37% (p=0.004)
BenchmarkRegexpMatchEasy1_1K-4       890ns × (0.99,1.02)     884ns × (1.00,1.00)  -0.70% (p=0.000)
BenchmarkRegexpMatchMedium_32        252ns × (0.99,1.01)     251ns × (0.99,1.01)  ~ (p=0.081)
BenchmarkRegexpMatchMedium_32-2      254ns × (0.99,1.04)     252ns × (0.99,1.01)  -0.78% (p=0.027)
BenchmarkRegexpMatchMedium_32-4      253ns × (0.99,1.04)     252ns × (0.99,1.01)  -0.70% (p=0.022)
BenchmarkRegexpMatchMedium_1K       72.9µs × (0.99,1.01)    72.7µs × (1.00,1.00)  ~ (p=0.064)
BenchmarkRegexpMatchMedium_1K-2     74.1µs × (0.98,1.05)    72.9µs × (1.00,1.01)  -1.61% (p=0.001)
BenchmarkRegexpMatchMedium_1K-4     73.6µs × (0.99,1.05)    72.8µs × (1.00,1.00)  -1.13% (p=0.007)
BenchmarkRegexpMatchHard_32         3.88µs × (0.99,1.03)    3.92µs × (0.98,1.05)  ~ (p=0.143)
BenchmarkRegexpMatchHard_32-2       3.89µs × (0.99,1.03)    3.93µs × (0.98,1.09)  ~ (p=0.278)
BenchmarkRegexpMatchHard_32-4       3.90µs × (0.99,1.05)    3.93µs × (0.98,1.05)  ~ (p=0.252)
BenchmarkRegexpMatchHard_1K          118µs × (0.99,1.01)     117µs × (0.99,1.02)  -0.54% (p=0.003)
BenchmarkRegexpMatchHard_1K-2        118µs × (0.99,1.01)     118µs × (0.99,1.03)  ~ (p=0.581)
BenchmarkRegexpMatchHard_1K-4        118µs × (0.99,1.02)     117µs × (0.99,1.01)  -0.54% (p=0.002)
BenchmarkRevcomp                     991ms × (0.95,1.10)     989ms × (0.94,1.08)  ~ (p=0.879)
BenchmarkRevcomp-2                   978ms × (0.95,1.11)     962ms × (0.96,1.08)  ~ (p=0.257)
BenchmarkRevcomp-4                   979ms × (0.96,1.07)     974ms × (0.96,1.11)  ~ (p=0.678)
BenchmarkTemplate                    141ms × (0.99,1.02)     145ms × (0.99,1.02)  +2.75% (p=0.000)
BenchmarkTemplate-2                  135ms × (0.98,1.02)     138ms × (0.99,1.02)  +2.34% (p=0.000)
BenchmarkTemplate-4                  136ms × (0.98,1.02)     140ms × (0.99,1.02)  +2.71% (p=0.000)
BenchmarkTimeParse                   640ns × (0.99,1.01)     622ns × (0.99,1.01)  -2.88% (p=0.000)
BenchmarkTimeParse-2                 640ns × (0.99,1.01)     622ns × (1.00,1.00)  -2.81% (p=0.000)
BenchmarkTimeParse-4                 640ns × (1.00,1.01)     622ns × (0.99,1.01)  -2.82% (p=0.000)
BenchmarkTimeFormat                  730ns × (0.98,1.02)     731ns × (0.98,1.03)  ~ (p=0.767)
BenchmarkTimeFormat-2                709ns × (0.99,1.02)     707ns × (0.99,1.02)  ~ (p=0.347)
BenchmarkTimeFormat-4                717ns × (0.98,1.01)     718ns × (0.98,1.02)  ~ (p=0.793)

Change-Id: Ie779c47e912bf80eb918bafa13638bd8dfd6c2d9
Reviewed-on: https://go-review.googlesource.com/9406
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-01 18:44:36 +00:00
Josh Bleecher Snyder
7bebccb972 Revert "runtime/pprof: write heap statistics to heap profile always"
This reverts commit c26fc88d56.

This broke pprof. See the comments at 9491.

Change-Id: Ic99ce026e86040c050a9bf0ea3024a1a42274ad1
Reviewed-on: https://go-review.googlesource.com/9565
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2015-05-01 15:56:20 +00:00
Keith Randall
a55b131393 cmd/dist, runtime: Make stack guard larger for non-optimized builds
Kind of a hack, but makes the non-optimized builds pass.

Fixes #10079

Change-Id: I26f41c546867f8f3f16d953dc043e784768f2aff
Reviewed-on: https://go-review.googlesource.com/9552
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 15:41:55 +00:00
David Chase
7fbb1b36c3 cmd/internal/gc: improve flow of input params to output params
This includes the following information in the per-function summary:

outK = paramJ   encoded in outK bits for paramJ
outK = *paramJ  encoded in outK bits for paramJ
heap = paramJ   EscHeap
heap = *paramJ  EscContentEscapes

Note that (currently) if the address of a parameter is taken and
returned, necessarily a heap allocation occurred to contain that
reference, and the heap can never refer to stack, therefore the
parameter and everything downstream from it escapes to the heap.
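
The four summary cases listed above correspond to source patterns like the ones below. This is an illustrative sketch with hypothetical function names; building it with the compiler's -m flag prints escape decisions, whose exact wording varies by compiler version.

```go
package main

import "fmt"

var sink *int // a global, standing in for "the heap"

// identity: the result is the parameter itself (outK = paramJ).
func identity(p *int) *int { return p }

// load: the result is what the parameter points to (outK = *paramJ).
func load(pp **int) *int { return *pp }

// leak: the parameter itself is stored in a global (heap = paramJ).
func leak(p *int) { sink = p }

// leakContent: what the parameter points to is stored in a global
// (heap = *paramJ).
func leakContent(pp **int) { sink = *pp }

func main() {
	x, y := 1, 2
	p := &y

	fmt.Println(*identity(&x), *load(&p))

	leak(&x)
	fmt.Println(*sink)
	leakContent(&p)
	fmt.Println(*sink)
}
```

Distinguishing the parameter from its contents lets a caller of load or leakContent keep the outer pointer on the stack even though the inner pointer flows onward.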

The per-function summary information now has a tuneable number of bits
(2 is probably noticeably better than 1, 3 is likely overkill, but it
is now easy to check and the -m debugging output includes information
that allows you to figure out if more would be better.)

A new test was added to check pointer flow through struct-typed and
*struct-typed parameters and returns; some of these are sensitive to
the number of summary bits, and ought to yield better results with a
more competent escape analysis algorithm.  Another new test checks
(some) correctness with array parameters, results, and operations.

The old analysis inferred a piece of plan9 runtime was non-escaping by
counteracting overconservative analysis with buggy analysis; with the
bug fixed, the result was too conservative (and it's not easy to fix
in this framework) so the source code was tweaked to get the desired
result.  A test was added against the discovered bug.

The escape analysis was further improved by splitting the "level" into
3 parts, one tracking the conventional "level" and the other two
computing the highest-level-suffix-from-copy, which is used to
generally model the cancelling effect of indirection applied to
address-of.

With the improved escape analysis enabled, it was necessary to
modify one of the runtime tests because it now attempts to allocate
too much on the (small, fixed-size) G0 (system) stack and this
failed the test.

Compiling src/std after touching src/runtime/*.go with -m logging
turned on shows 420 fewer heap allocation sites (10538 vs 10968).

Profiling allocations in src/html/template with
for i in {1..5} ;
  do go tool 6g -memprofile=mastx.${i}.prof  -memprofilerate=1 *.go;
  go tool pprof -alloc_objects -text  mastx.${i}.prof ;
done

showed a 15% reduction in allocations performed by the compiler.

Update #3753
Update #4720
Fixes #10466

Change-Id: I0fd97d5f5ac527b45f49e2218d158a6e89951432
Reviewed-on: https://go-review.googlesource.com/8202
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 13:47:20 +00:00
David Crawshaw
4044adedf7 runtime/cgo, cmd/dist: turn off exc_bad_access handler by default
App Store policy requires that programs not reference the exc_server
symbol. (Some public forum threads show that Unity ran into this
several years ago and it is a hard policy rule.) While some research
suggests that I could write my own version of exc_server, the
expedient course is to disable the exception handler by default.

Go programs only need it when running under lldb, which is primarily
used by tests. So enable the exception handler in cmd/dist when we
are running the tests.

Fixes #10646

Change-Id: I853905254894b5367edb8abd381d45585a78ee8b
Reviewed-on: https://go-review.googlesource.com/9549
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-01 13:19:39 +00:00
Shenghou Ma
5f69e739d3 runtime: adjust traceTickDiv for non-x86 architectures
Fixes #10554.
Fixes #10623.

Change-Id: I90fbaa34e3d55c8758178f8d2e7fa41ff1194a1b
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9247
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-05-01 07:25:49 +00:00
Russ Cox
79a990b845 runtime: schedule GC work more aggressively
Schedule the work as early as possible, while still respecting the
utilization percentage on average. The old code tried never to
go above the utilization percentage. The new code is willing
to go above the utilization percentage by one time slice
(but of course after doing that it must wait until the percentage
drops back down to the target before it gets another time slice).

The effect is that for concurrent GCs that can run in a small number
of time slices, the time during which write barriers are enabled is
reduced by one mutator + GC time slice round (possibly 30 ms per GC).

This only affects the fractional GC processor (the remainder of GOMAXPROCS/4),
so it matters most in GOMAXPROCS=1, a bit in GOMAXPROCS=2, and not at
all in GOMAXPROCS=4.
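
The arithmetic behind that last paragraph can be sketched as follows. This assumes the GC's CPU target is one quarter of GOMAXPROCS, which is what "the remainder of GOMAXPROCS/4" suggests; the function name gcShare is hypothetical and this is not the runtime's code.

```go
package main

import "fmt"

// gcShare splits an assumed 25% GC CPU target across procs processors
// into whole dedicated workers plus a fractional remainder. Only the
// fractional remainder is affected by the scheduling change described above.
func gcShare(procs int) (dedicated int, fractional float64) {
	total := float64(procs) / 4
	dedicated = int(total)
	fractional = total - float64(dedicated)
	return dedicated, fractional
}

func main() {
	for _, procs := range []int{1, 2, 4} {
		d, f := gcShare(procs)
		fmt.Printf("GOMAXPROCS=%d dedicated=%d fractional=%.2f\n", procs, d, f)
	}
}
```

Under this model the fractional share is 0.25 at GOMAXPROCS=1, 0.50 at GOMAXPROCS=2, and 0 at GOMAXPROCS=4, which matches where the benchmark deltas below appear.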

GOMAXPROCS=1
name                                      old mean                new mean        delta
BenchmarkBinaryTree17                12.4s × (0.98,1.03)     13.5s × (0.97,1.04)  +8.84% (p=0.000)
BenchmarkFannkuch11                  4.38s × (1.00,1.01)     4.38s × (1.00,1.01)  ~ (p=0.343)
BenchmarkFmtFprintfEmpty            88.9ns × (0.97,1.10)    90.1ns × (0.93,1.14)  ~ (p=0.224)
BenchmarkFmtFprintfString            356ns × (0.94,1.05)     321ns × (0.94,1.12)  -9.77% (p=0.000)
BenchmarkFmtFprintfInt               344ns × (0.98,1.03)     325ns × (0.96,1.03)  -5.46% (p=0.000)
BenchmarkFmtFprintfIntInt            622ns × (0.97,1.03)     571ns × (0.95,1.05)  -8.09% (p=0.000)
BenchmarkFmtFprintfPrefixedInt       462ns × (0.96,1.04)     431ns × (0.95,1.05)  -6.81% (p=0.000)
BenchmarkFmtFprintfFloat             653ns × (0.98,1.03)     621ns × (0.99,1.03)  -4.90% (p=0.000)
BenchmarkFmtManyArgs                2.32µs × (0.97,1.03)    2.19µs × (0.98,1.02)  -5.43% (p=0.000)
BenchmarkGobDecode                  27.0ms × (0.96,1.04)    20.0ms × (0.97,1.04)  -26.06% (p=0.000)
BenchmarkGobEncode                  26.6ms × (0.99,1.01)    17.8ms × (0.95,1.05)  -33.19% (p=0.000)
BenchmarkGzip                        659ms × (0.98,1.03)     650ms × (0.99,1.01)  -1.34% (p=0.000)
BenchmarkGunzip                      145ms × (0.98,1.04)     143ms × (1.00,1.01)  -1.47% (p=0.000)
BenchmarkHTTPClientServer            111µs × (0.97,1.04)     110µs × (0.96,1.03)  -1.30% (p=0.000)
BenchmarkJSONEncode                 52.0ms × (0.97,1.03)    40.8ms × (0.97,1.03)  -21.47% (p=0.000)
BenchmarkJSONDecode                  127ms × (0.98,1.04)     120ms × (0.98,1.02)  -5.55% (p=0.000)
BenchmarkMandelbrot200              6.04ms × (0.99,1.04)    6.02ms × (1.00,1.01)  ~ (p=0.176)
BenchmarkGoParse                    8.62ms × (0.96,1.08)    8.55ms × (0.93,1.09)  ~ (p=0.302)
BenchmarkRegexpMatchEasy0_32         164ns × (0.98,1.05)     165ns × (0.98,1.07)  ~ (p=0.293)
BenchmarkRegexpMatchEasy0_1K         546ns × (0.98,1.06)     547ns × (0.97,1.07)  ~ (p=0.741)
BenchmarkRegexpMatchEasy1_32         142ns × (0.97,1.09)     141ns × (0.97,1.05)  ~ (p=0.231)
BenchmarkRegexpMatchEasy1_1K         904ns × (0.97,1.07)     900ns × (0.98,1.04)  ~ (p=0.294)
BenchmarkRegexpMatchMedium_32        256ns × (0.98,1.06)     256ns × (0.97,1.04)  ~ (p=0.530)
BenchmarkRegexpMatchMedium_1K       74.2µs × (0.98,1.05)    73.8µs × (0.98,1.04)  ~ (p=0.334)
BenchmarkRegexpMatchHard_32         3.94µs × (0.98,1.07)    3.92µs × (0.98,1.05)  ~ (p=0.356)
BenchmarkRegexpMatchHard_1K          119µs × (0.98,1.07)     119µs × (0.98,1.06)  ~ (p=0.467)
BenchmarkRevcomp                     978ms × (0.96,1.09)     984ms × (0.95,1.07)  ~ (p=0.448)
BenchmarkTemplate                    151ms × (0.96,1.03)     142ms × (0.95,1.04)  -5.55% (p=0.000)
BenchmarkTimeParse                   628ns × (0.99,1.01)     628ns × (0.99,1.01)  ~ (p=0.855)
BenchmarkTimeFormat                  729ns × (0.98,1.06)     734ns × (0.97,1.05)  ~ (p=0.149)

GOMAXPROCS=2
name                                      old mean                new mean        delta
BenchmarkBinaryTree17-2              9.80s × (0.97,1.03)     9.85s × (0.99,1.02)  ~ (p=0.444)
BenchmarkFannkuch11-2                4.35s × (0.99,1.01)     4.40s × (0.98,1.05)  ~ (p=0.099)
BenchmarkFmtFprintfEmpty-2          86.7ns × (0.97,1.05)    85.9ns × (0.98,1.04)  ~ (p=0.409)
BenchmarkFmtFprintfString-2          297ns × (0.98,1.01)     297ns × (0.99,1.01)  ~ (p=0.743)
BenchmarkFmtFprintfInt-2             309ns × (0.98,1.02)     310ns × (0.99,1.01)  ~ (p=0.464)
BenchmarkFmtFprintfIntInt-2          525ns × (0.97,1.05)     518ns × (0.99,1.01)  ~ (p=0.151)
BenchmarkFmtFprintfPrefixedInt-2     408ns × (0.98,1.02)     408ns × (0.98,1.03)  ~ (p=0.797)
BenchmarkFmtFprintfFloat-2           603ns × (0.99,1.01)     604ns × (0.98,1.02)  ~ (p=0.588)
BenchmarkFmtManyArgs-2              2.07µs × (0.98,1.02)    2.05µs × (0.99,1.01)  ~ (p=0.091)
BenchmarkGobDecode-2                19.1ms × (0.97,1.01)    19.3ms × (0.97,1.04)  ~ (p=0.195)
BenchmarkGobEncode-2                16.2ms × (0.97,1.03)    16.4ms × (0.99,1.01)  ~ (p=0.069)
BenchmarkGzip-2                      652ms × (0.99,1.01)     651ms × (0.99,1.01)  ~ (p=0.705)
BenchmarkGunzip-2                    143ms × (1.00,1.01)     143ms × (1.00,1.00)  ~ (p=0.665)
BenchmarkHTTPClientServer-2          149µs × (0.92,1.11)     149µs × (0.91,1.08)  ~ (p=0.862)
BenchmarkJSONEncode-2               34.6ms × (0.98,1.02)    37.2ms × (0.99,1.01)  +7.56% (p=0.000)
BenchmarkJSONDecode-2                117ms × (0.99,1.01)     117ms × (0.99,1.01)  ~ (p=0.858)
BenchmarkMandelbrot200-2            6.10ms × (0.99,1.03)    6.03ms × (1.00,1.00)  ~ (p=0.083)
BenchmarkGoParse-2                  8.25ms × (0.98,1.01)    8.21ms × (0.99,1.02)  ~ (p=0.307)
BenchmarkRegexpMatchEasy0_32-2       162ns × (0.99,1.02)     162ns × (0.99,1.01)  ~ (p=0.857)
BenchmarkRegexpMatchEasy0_1K-2       541ns × (0.99,1.01)     540ns × (1.00,1.00)  ~ (p=0.530)
BenchmarkRegexpMatchEasy1_32-2       138ns × (1.00,1.00)     141ns × (0.98,1.04)  +1.88% (p=0.038)
BenchmarkRegexpMatchEasy1_1K-2       887ns × (0.99,1.01)     894ns × (0.99,1.01)  ~ (p=0.087)
BenchmarkRegexpMatchMedium_32-2      252ns × (0.99,1.01)     252ns × (0.99,1.01)  ~ (p=0.954)
BenchmarkRegexpMatchMedium_1K-2     73.4µs × (0.99,1.02)    72.8µs × (1.00,1.01)  -0.87% (p=0.029)
BenchmarkRegexpMatchHard_32-2       3.95µs × (0.97,1.05)    3.87µs × (1.00,1.01)  -2.11% (p=0.035)
BenchmarkRegexpMatchHard_1K-2        117µs × (0.99,1.01)     117µs × (0.99,1.01)  ~ (p=0.669)
BenchmarkRevcomp-2                   980ms × (0.95,1.03)     993ms × (0.94,1.09)  ~ (p=0.527)
BenchmarkTemplate-2                  136ms × (0.98,1.01)     135ms × (0.99,1.01)  ~ (p=0.200)
BenchmarkTimeParse-2                 630ns × (1.00,1.01)     630ns × (1.00,1.00)  ~ (p=0.634)
BenchmarkTimeFormat-2                705ns × (0.99,1.01)     710ns × (0.98,1.02)  ~ (p=0.174)

GOMAXPROCS=4
name                                      old mean                new mean        delta
BenchmarkBinaryTree17-4              9.87s × (0.96,1.04)     9.75s × (0.96,1.03)  ~ (p=0.178)
BenchmarkFannkuch11-4                4.35s × (1.00,1.01)     4.40s × (0.99,1.04)  ~ (p=0.071)
BenchmarkFmtFprintfEmpty-4          85.8ns × (0.98,1.06)    85.6ns × (0.98,1.04)  ~ (p=0.858)
BenchmarkFmtFprintfString-4          306ns × (0.99,1.03)     304ns × (0.97,1.02)  ~ (p=0.470)
BenchmarkFmtFprintfInt-4             317ns × (0.98,1.01)     315ns × (0.98,1.02)  -0.92% (p=0.044)
BenchmarkFmtFprintfIntInt-4          527ns × (0.99,1.01)     525ns × (0.98,1.01)  ~ (p=0.164)
BenchmarkFmtFprintfPrefixedInt-4     421ns × (0.98,1.03)     417ns × (0.99,1.02)  ~ (p=0.092)
BenchmarkFmtFprintfFloat-4           623ns × (0.98,1.02)     618ns × (0.98,1.03)  ~ (p=0.172)
BenchmarkFmtManyArgs-4              2.09µs × (0.98,1.02)    2.09µs × (0.98,1.02)  ~ (p=0.679)
BenchmarkGobDecode-4                18.6ms × (0.99,1.01)    18.6ms × (0.98,1.03)  ~ (p=0.595)
BenchmarkGobEncode-4                15.0ms × (0.98,1.02)    15.1ms × (0.99,1.01)  ~ (p=0.301)
BenchmarkGzip-4                      659ms × (0.98,1.04)     660ms × (0.97,1.02)  ~ (p=0.724)
BenchmarkGunzip-4                    145ms × (0.98,1.04)     144ms × (0.99,1.04)  ~ (p=0.671)
BenchmarkHTTPClientServer-4          139µs × (0.97,1.02)     138µs × (0.99,1.02)  ~ (p=0.392)
BenchmarkJSONEncode-4               35.0ms × (0.99,1.02)    35.1ms × (0.98,1.02)  ~ (p=0.777)
BenchmarkJSONDecode-4                119ms × (0.98,1.01)     118ms × (0.98,1.02)  ~ (p=0.710)
BenchmarkMandelbrot200-4            6.02ms × (1.00,1.00)    6.02ms × (1.00,1.00)  ~ (p=0.289)
BenchmarkGoParse-4                  7.96ms × (0.99,1.01)    7.96ms × (0.99,1.01)  ~ (p=0.884)
BenchmarkRegexpMatchEasy0_32-4       164ns × (0.98,1.04)     166ns × (0.97,1.04)  ~ (p=0.221)
BenchmarkRegexpMatchEasy0_1K-4       540ns × (0.99,1.01)     552ns × (0.97,1.04)  +2.10% (p=0.018)
BenchmarkRegexpMatchEasy1_32-4       140ns × (0.99,1.04)     142ns × (0.97,1.04)  ~ (p=0.226)
BenchmarkRegexpMatchEasy1_1K-4       896ns × (0.99,1.03)     907ns × (0.97,1.04)  ~ (p=0.155)
BenchmarkRegexpMatchMedium_32-4      255ns × (0.99,1.04)     255ns × (0.98,1.04)  ~ (p=0.904)
BenchmarkRegexpMatchMedium_1K-4     73.4µs × (0.99,1.04)    73.8µs × (0.98,1.04)  ~ (p=0.560)
BenchmarkRegexpMatchHard_32-4       3.93µs × (0.98,1.04)    3.95µs × (0.98,1.04)  ~ (p=0.571)
BenchmarkRegexpMatchHard_1K-4        117µs × (1.00,1.01)     119µs × (0.98,1.04)  +1.48% (p=0.048)
BenchmarkRevcomp-4                   990ms × (0.94,1.08)     989ms × (0.94,1.10)  ~ (p=0.957)
BenchmarkTemplate-4                  137ms × (0.98,1.02)     137ms × (0.99,1.01)  ~ (p=0.996)
BenchmarkTimeParse-4                 629ns × (1.00,1.00)     629ns × (0.99,1.01)  ~ (p=0.924)
BenchmarkTimeFormat-4                710ns × (0.99,1.01)     716ns × (0.98,1.02)  +0.84% (p=0.033)

Change-Id: I43a04e0f6ad5e3ba9847dddf12e13222561f9cf4
Reviewed-on: https://go-review.googlesource.com/9543
Reviewed-by: Austin Clements <austin@google.com>
2015-04-30 15:50:12 +00:00
Austin Clements
3ca20218c1 runtime: fix gcDumpObject on non-heap pointers
gcDumpObject is used to print the source and destination objects when
checkmark finds a missing mark. However, gcDumpObject currently assumes
the given pointer will point to a heap object. This is not true of the
source object during root marking and may not even be true of the
destination object in the limited situations where the heap points
back in to the stack.

If the pointer isn't a heap object, gcDumpObject will attempt an
out-of-bounds access to h_spans. This will cause a panicslice, which
will attempt to construct a useful panic message. This will cause a
string allocation, which will lead mallocgc to panic because the GC is
in mark termination (checkmark only happens during mark termination).

Fix this by checking that the pointer points into the heap arena
before attempting to use it as an arena pointer.
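
A minimal sketch of that kind of guard, assuming a contiguous arena described by a start and a used address. The arena type, its field names, and the 8KB page shift are all hypothetical and chosen only for illustration; this is not the runtime's code.

```go
package main

import "fmt"

// arena is a hypothetical description of a contiguous heap range [start, used).
type arena struct{ start, used uintptr }

// contains reports whether p lies inside the arena.
func (a arena) contains(p uintptr) bool { return p >= a.start && p < a.used }

// spanIndex returns the index into a per-page table for p, or false if p
// is not a heap pointer. Checking first avoids the out-of-bounds index.
func (a arena) spanIndex(p uintptr, pageShift uint) (int, bool) {
	if !a.contains(p) {
		return 0, false
	}
	return int((p - a.start) >> pageShift), true
}

func main() {
	a := arena{start: 0x10000000, used: 0x10010000}
	fmt.Println(a.spanIndex(0x10004000, 13)) // prints "2 true"
	fmt.Println(a.spanIndex(0x2000, 13))     // prints "0 false"
}
```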

Change-Id: I09da600c380d4773f1f8f38e45b82cb229ea6382
Reviewed-on: https://go-review.googlesource.com/9498
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-30 14:53:51 +00:00
Keith Randall
4b78c9575d runtime: print stack of G during a signal
Sequence of operations:
- Go code does a systemstack call
- during the systemstack call, receive a signal
- signal requests a traceback of all goroutines

The original G is still marked as _Grunning, so the traceback code
refuses to print its stack.

Fix by allowing traceback of Gs whose caller is on the same M as G is.
G can't be modifying its stack if that is the case.

Fixes #10546

Change-Id: I2bcea48c0197fbf78ab6fa080027cd80181083ad
Reviewed-on: https://go-review.googlesource.com/9435
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-29 19:25:10 +00:00
Shenghou Ma
4d1ab2d8d1 runtime: re-enable TestNewProc0 on android/arm and fix heap corruption
The problem is not actually specific to android/arm. Linux/ARM's
runtime.clone sets the stack pointer to child_stk-4 before calling
fn. When fn returns, it tries to write to 4(R13) to provide the
argument for runtime.exit, which is just beyond the allocated
child stack, so it randomly corrupts the heap or triggers a
segfault if that memory happens to be unmapped.

While we're here, shorten the test polling interval to 0.1s to
speed up the test (it was only checking at 1s intervals, which means
the test took at least 1s).

Fixes #10548.

Change-Id: I57cd63232022b113b6cd61e987b0684ebcce930a
Reviewed-on: https://go-review.googlesource.com/9457
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-29 19:18:07 +00:00
Russ Cox
c26fc88d56 runtime/pprof: write heap statistics to heap profile always
The heap statistics were only written if asked for a profile with debug > 0,
but that also prints a stack trace for each profile line, which is comparatively
much noisier. The statistics are short enough and separate enough
(they only appear at the end) and useful enough that we can print them
always.

This means that people using -test.memprofile in tests will get a memory
profile with statistics included now. Pprof won't care, but if people care to
look, the numbers will be there.

This avoids the need for hacks like using -memprofilerate=1 to find
the number of allocations.

Change-Id: I10a4f593403d0315aad11b37c6e554b734caa73f
Reviewed-on: https://go-review.googlesource.com/9491
Reviewed-by: David Chase <drchase@google.com>
2015-04-29 18:07:43 +00:00
Keith Randall
c526f3ac10 runtime: tail call into memeq/cmp body implementations
There's no need to call/ret to the body implementation.
It can write the result to the right place.  Just jump to
it and have it return to our caller.

Old:
  call body implementation
  compute result
  put result in a register
  return
  write register to result location
  return

New:
  load address of result location into a register
  jump to body implementation
  compute result
  write result to passed-in address
  return

It's a bit tricky on 386 because there is no free register
with which to pass the result location.  Free up a register
by keeping around blen-alen instead of both alen and blen.
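
To see why the single value blen-alen is enough, here is a hedged Go sketch of a three-way byte comparison that uses only len(a) and the difference. It illustrates the idea only; the real implementations are in assembly, and the function name compare is hypothetical.

```go
package main

import "fmt"

// compare returns -1, 0, or +1. After computing diff it never looks at
// len(b) again: the common length is len(a) adjusted by diff, and the
// tie-break on length is just the sign of diff.
func compare(a, b []byte) int {
	diff := len(b) - len(a)
	n := len(a)
	if diff < 0 {
		n += diff // b is shorter, so the common length is len(a)+diff
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			if a[i] < b[i] {
				return -1
			}
			return 1
		}
	}
	switch {
	case diff > 0:
		return -1 // a is a proper prefix of b
	case diff < 0:
		return 1 // b is a proper prefix of a
	}
	return 0
}

func main() {
	fmt.Println(
		compare([]byte("abc"), []byte("abd")),
		compare([]byte("ab"), []byte("abc")),
		compare([]byte("abc"), []byte("ab")),
		compare([]byte("go"), []byte("go")),
	) // prints "-1 -1 1 0"
}
```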

Change-Id: If2cf0682a5bf1cc592bdda7c126ed4eee8944fba
Reviewed-on: https://go-review.googlesource.com/9202
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-04-29 04:46:25 +00:00
Shenghou Ma
7e49c8193c runtime: skip gdb goroutine backtrace test on non-x86
Gdb is not able to backtrace our non-standard stack frames on RISC
architectures without a frame pointer.

Change-Id: Id62a566ce2d743602ded2da22ff77b9ae34bc5ae
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9456
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-29 04:44:38 +00:00
Shenghou Ma
da11a9dda3 cmd/internal/ld, runtime: unify stack reservation in PE header and runtime
With a 128KB stack reservation on 32-bit Windows, the maximum number
of threads is ~9000.

The original 65535-byte stack commit causes problems on Windows
XP, where it makes the stack reservation 1MB despite the fact
that the runtime specified 128KB.

While we're here, also fix the extra spacing in the unable to
create more OS thread error message: println inserts a space
between each argument.

See #9457 for more information.

Change-Id: I3a82f7d9717d3d55211b6eb1c34b00b0eaad83ed
Reviewed-on: https://go-review.googlesource.com/2237
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Minux Ma <minux@golang.org>
2015-04-29 03:27:10 +00:00
Shenghou Ma
e7dd28891e cmd/internal/gc, cmd/[56789]g: rename stackcopy to blockcopy
To avoid confusion with the runtime concept of copying stack.

Change-Id: I33442377b71012c2482c2d0ddd561492c71e70d0
Reviewed-on: https://go-review.googlesource.com/8639
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-29 00:28:01 +00:00
Ian Lance Taylor
0c62c93a09 runtime/cgo: use PTHREAD_{MUTEX,COND}_INITIALIZER
Technically you must initialize static pthread_mutex_t and
pthread_cond_t variables with the appropriate INITIALIZER macro.  In
practice the default initializers are zero anyhow, but it's still good
code hygiene.

Change-Id: I517304b16c2c7943b3880855c1b47a9a506b4bdf
Reviewed-on: https://go-review.googlesource.com/9433
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-28 22:27:26 +00:00
Austin Clements
63caec5dee runtime: eliminate one heapBitsForObject from scanobject
scanobject with ptrmask!=nil is only ever called with the base
pointer of a heap object. Currently, scanobject calls
heapBitsForObject, which goes to a great deal of trouble to check
that the pointer points into the heap and to find the base of the
object it points to, both of which are completely unnecessary in
this case.

Replace this call to heapBitsForObject with much simpler logic to
fetch the span and compute the heap bits.

Benchmark results with five runs:

name                                    old mean                new mean        delta
BenchmarkBinaryTree17              9.21s × (0.95,1.02)     8.55s × (0.91,1.03)  -7.16% (p=0.022)
BenchmarkFannkuch11                2.65s × (1.00,1.00)     2.62s × (1.00,1.00)  -1.10% (p=0.000)
BenchmarkFmtFprintfEmpty          73.2ns × (0.99,1.01)    71.7ns × (1.00,1.01)  -1.99% (p=0.004)
BenchmarkFmtFprintfString          302ns × (0.99,1.00)     292ns × (0.98,1.02)  -3.31% (p=0.020)
BenchmarkFmtFprintfInt             281ns × (0.98,1.01)     279ns × (0.96,1.02)  ~ (p=0.596)
BenchmarkFmtFprintfIntInt          482ns × (0.98,1.01)     488ns × (0.95,1.02)  ~ (p=0.419)
BenchmarkFmtFprintfPrefixedInt     382ns × (0.99,1.01)     365ns × (0.96,1.02)  -4.35% (p=0.015)
BenchmarkFmtFprintfFloat           475ns × (0.99,1.01)     472ns × (1.00,1.00)  ~ (p=0.108)
BenchmarkFmtManyArgs              1.89µs × (1.00,1.01)    1.90µs × (0.94,1.02)  ~ (p=0.883)
BenchmarkGobDecode                22.4ms × (0.99,1.01)    21.9ms × (0.92,1.04)  ~ (p=0.332)
BenchmarkGobEncode                24.7ms × (0.98,1.02)    23.9ms × (0.87,1.07)  ~ (p=0.407)
BenchmarkGzip                      397ms × (0.99,1.01)     398ms × (0.99,1.01)  ~ (p=0.718)
BenchmarkGunzip                   96.7ms × (1.00,1.00)    96.9ms × (1.00,1.00)  ~ (p=0.230)
BenchmarkHTTPClientServer         71.5µs × (0.98,1.01)    68.5µs × (0.92,1.06)  ~ (p=0.243)
BenchmarkJSONEncode               46.1ms × (0.98,1.01)    44.9ms × (0.98,1.03)  -2.51% (p=0.040)
BenchmarkJSONDecode               86.1ms × (0.99,1.01)    86.5ms × (0.99,1.01)  ~ (p=0.343)
BenchmarkMandelbrot200            4.12ms × (1.00,1.00)    4.13ms × (1.00,1.00)  +0.23% (p=0.000)
BenchmarkGoParse                  5.89ms × (0.96,1.03)    5.82ms × (0.96,1.04)  ~ (p=0.522)
BenchmarkRegexpMatchEasy0_32       141ns × (0.99,1.01)     142ns × (1.00,1.00)  ~ (p=0.178)
BenchmarkRegexpMatchEasy0_1K       408ns × (1.00,1.00)     392ns × (0.99,1.00)  -3.83% (p=0.000)
BenchmarkRegexpMatchEasy1_32       122ns × (1.00,1.00)     122ns × (1.00,1.00)  ~ (p=0.178)
BenchmarkRegexpMatchEasy1_1K       626ns × (1.00,1.01)     624ns × (0.99,1.00)  ~ (p=0.122)
BenchmarkRegexpMatchMedium_32      202ns × (0.99,1.00)     205ns × (0.99,1.01)  +1.58% (p=0.001)
BenchmarkRegexpMatchMedium_1K     54.4µs × (1.00,1.00)    55.5µs × (1.00,1.00)  +1.86% (p=0.000)
BenchmarkRegexpMatchHard_32       2.68µs × (1.00,1.00)    2.71µs × (1.00,1.00)  +0.97% (p=0.002)
BenchmarkRegexpMatchHard_1K       79.8µs × (1.00,1.01)    80.5µs × (1.00,1.01)  +0.94% (p=0.003)
BenchmarkRevcomp                   590ms × (0.99,1.01)     585ms × (1.00,1.00)  ~ (p=0.066)
BenchmarkTemplate                  111ms × (0.97,1.02)     112ms × (0.99,1.01)  ~ (p=0.201)
BenchmarkTimeParse                 392ns × (1.00,1.00)     385ns × (1.00,1.00)  -1.69% (p=0.000)
BenchmarkTimeFormat                449ns × (0.98,1.01)     448ns × (0.99,1.01)  ~ (p=0.550)

Change-Id: Ie7c3830c481d96c9043e7bf26853c6c1d05dc9f4
Reviewed-on: https://go-review.googlesource.com/9364
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-28 15:22:20 +00:00
Russ Cox
32d6fbcb4f runtime: replace needwb() with writeBarrierEnabled
Reduce the write barrier check to a single load and compare
so that it can be inlined into write barrier use sites.
Makes the standard write barrier a little faster too.
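
The shape of such a check can be sketched in Go as below. Only the flag name writeBarrierEnabled comes from the commit; the node type, slowBarrier, and storePointer are hypothetical stand-ins, and a real barrier would do GC bookkeeping where this one only counts calls.

```go
package main

import "fmt"

type node struct{ next *node }

// writeBarrierEnabled mirrors the flag named in the commit title.
var writeBarrierEnabled bool

// barrierCalls counts how often the (hypothetical) slow path runs.
var barrierCalls int

// slowBarrier stands in for the out-of-line barrier.
func slowBarrier(dst **node, src *node) {
	barrierCalls++
	*dst = src
}

// storePointer is small enough to inline: one load of the flag, one compare.
func storePointer(dst **node, src *node) {
	if writeBarrierEnabled {
		slowBarrier(dst, src)
		return
	}
	*dst = src
}

func main() {
	a, b := &node{}, &node{}
	storePointer(&a.next, b) // flag off: plain store
	writeBarrierEnabled = true
	storePointer(&b.next, a) // flag on: goes through the barrier
	fmt.Println(barrierCalls, a.next == b, b.next == a) // prints "1 true true"
}
```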

name                                       old                     new          delta
BenchmarkBinaryTree17              17.9s × (0.99,1.01)     17.9s × (1.00,1.01)  ~
BenchmarkFannkuch11                4.35s × (1.00,1.00)     4.43s × (1.00,1.00)  +1.81%
BenchmarkFmtFprintfEmpty           120ns × (0.93,1.06)     110ns × (1.00,1.06)  -7.92%
BenchmarkFmtFprintfString          479ns × (0.99,1.00)     487ns × (0.99,1.00)  +1.67%
BenchmarkFmtFprintfInt             452ns × (0.99,1.02)     450ns × (0.99,1.00)  ~
BenchmarkFmtFprintfIntInt          766ns × (0.99,1.01)     762ns × (1.00,1.00)  ~
BenchmarkFmtFprintfPrefixedInt     576ns × (0.98,1.01)     584ns × (0.99,1.01)  ~
BenchmarkFmtFprintfFloat           730ns × (1.00,1.01)     738ns × (1.00,1.00)  +1.16%
BenchmarkFmtManyArgs              2.84µs × (0.99,1.00)    2.80µs × (1.00,1.01)  -1.22%
BenchmarkGobDecode                39.3ms × (0.98,1.01)    39.0ms × (0.99,1.00)  ~
BenchmarkGobEncode                39.5ms × (0.99,1.01)    37.8ms × (0.98,1.01)  -4.33%
BenchmarkGzip                      663ms × (1.00,1.01)     661ms × (0.99,1.01)  ~
BenchmarkGunzip                    143ms × (1.00,1.00)     142ms × (1.00,1.00)  ~
BenchmarkHTTPClientServer          132µs × (0.99,1.01)     132µs × (0.99,1.01)  ~
BenchmarkJSONEncode               57.4ms × (0.99,1.01)    56.3ms × (0.99,1.01)  -1.96%
BenchmarkJSONDecode                139ms × (0.99,1.00)     138ms × (0.99,1.01)  ~
BenchmarkMandelbrot200            6.03ms × (1.00,1.00)    6.01ms × (1.00,1.00)  ~
BenchmarkGoParse                  10.3ms × (0.89,1.14)    10.2ms × (0.87,1.05)  ~
BenchmarkRegexpMatchEasy0_32       209ns × (1.00,1.00)     208ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy0_1K       591ns × (0.99,1.00)     588ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy1_32       184ns × (0.99,1.02)     182ns × (0.99,1.01)  ~
BenchmarkRegexpMatchEasy1_1K      1.01µs × (1.00,1.00)    0.99µs × (1.00,1.01)  -2.33%
BenchmarkRegexpMatchMedium_32      330ns × (1.00,1.00)     323ns × (1.00,1.01)  -2.12%
BenchmarkRegexpMatchMedium_1K     92.6µs × (1.00,1.00)    89.9µs × (1.00,1.00)  -2.92%
BenchmarkRegexpMatchHard_32       4.80µs × (0.95,1.00)    4.72µs × (0.95,1.01)  ~
BenchmarkRegexpMatchHard_1K        136µs × (1.00,1.00)     133µs × (1.00,1.01)  -1.86%
BenchmarkRevcomp                   900ms × (0.99,1.04)     900ms × (1.00,1.05)  ~
BenchmarkTemplate                  172ms × (1.00,1.00)     168ms × (0.99,1.01)  -2.07%
BenchmarkTimeParse                 637ns × (1.00,1.00)     637ns × (1.00,1.00)  ~
BenchmarkTimeFormat                744ns × (1.00,1.01)     738ns × (1.00,1.00)  -0.67%

Change-Id: I4ecc925805da1f5ee264377f1f7574f54ee575e7
Reviewed-on: https://go-review.googlesource.com/9321
Reviewed-by: Austin Clements <austin@google.com>
2015-04-28 01:37:53 +00:00
Russ Cox
2050f57141 runtime: change unused argument in fat write barriers from pointer to scalar
The argument is unused, only present for alignment of the
following argument. The compiler today always passes a zero
but I'd rather not write anything there during the call sequence,
so mark it as a scalar so the garbage collector won't look at it.

As expected, no significant performance change.

name                                       old                     new          delta
BenchmarkBinaryTree17              17.9s × (0.99,1.00)     17.9s × (0.99,1.01)  ~
BenchmarkFannkuch11                4.35s × (1.00,1.00)     4.35s × (1.00,1.00)  ~
BenchmarkFmtFprintfEmpty           120ns × (0.94,1.05)     120ns × (0.93,1.06)  ~
BenchmarkFmtFprintfString          477ns × (1.00,1.00)     479ns × (0.99,1.00)  ~
BenchmarkFmtFprintfInt             450ns × (0.99,1.01)     452ns × (0.99,1.02)  ~
BenchmarkFmtFprintfIntInt          765ns × (0.99,1.01)     766ns × (0.99,1.01)  ~
BenchmarkFmtFprintfPrefixedInt     569ns × (0.99,1.01)     576ns × (0.98,1.01)  ~
BenchmarkFmtFprintfFloat           728ns × (1.00,1.00)     730ns × (1.00,1.01)  ~
BenchmarkFmtManyArgs              2.82µs × (0.99,1.01)    2.84µs × (0.99,1.00)  ~
BenchmarkGobDecode                39.1ms × (0.99,1.01)    39.3ms × (0.98,1.01)  ~
BenchmarkGobEncode                39.4ms × (0.99,1.01)    39.5ms × (0.99,1.01)  ~
BenchmarkGzip                      661ms × (0.99,1.01)     663ms × (1.00,1.01)  ~
BenchmarkGunzip                    143ms × (1.00,1.00)     143ms × (1.00,1.00)  ~
BenchmarkHTTPClientServer          133µs × (0.99,1.01)     132µs × (0.99,1.01)  ~
BenchmarkJSONEncode               57.3ms × (0.99,1.04)    57.4ms × (0.99,1.01)  ~
BenchmarkJSONDecode                139ms × (0.99,1.00)     139ms × (0.99,1.00)  ~
BenchmarkMandelbrot200            6.02ms × (1.00,1.00)    6.03ms × (1.00,1.00)  ~
BenchmarkGoParse                  9.72ms × (0.92,1.11)   10.31ms × (0.89,1.14)  ~
BenchmarkRegexpMatchEasy0_32       209ns × (1.00,1.01)     209ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy0_1K       592ns × (0.99,1.00)     591ns × (0.99,1.00)  ~
BenchmarkRegexpMatchEasy1_32       183ns × (0.98,1.01)     184ns × (0.99,1.02)  ~
BenchmarkRegexpMatchEasy1_1K      1.01µs × (1.00,1.01)    1.01µs × (1.00,1.00)  ~
BenchmarkRegexpMatchMedium_32      330ns × (1.00,1.00)     330ns × (1.00,1.00)  ~
BenchmarkRegexpMatchMedium_1K     92.4µs × (1.00,1.00)    92.6µs × (1.00,1.00)  ~
BenchmarkRegexpMatchHard_32       4.77µs × (0.95,1.01)    4.80µs × (0.95,1.00)  ~
BenchmarkRegexpMatchHard_1K        136µs × (1.00,1.00)     136µs × (1.00,1.00)  ~
BenchmarkRevcomp                   906ms × (0.99,1.05)     900ms × (0.99,1.04)  ~
BenchmarkTemplate                  171ms × (0.99,1.01)     172ms × (1.00,1.00)  ~
BenchmarkTimeParse                 638ns × (1.00,1.00)     637ns × (1.00,1.00)  ~
BenchmarkTimeFormat                745ns × (0.99,1.02)     744ns × (1.00,1.01)  ~

Change-Id: I0aeac5dc7adfd75e2223e3aabfedc7818d339f9b
Reviewed-on: https://go-review.googlesource.com/9320
Reviewed-by: Austin Clements <austin@google.com>
2015-04-28 01:37:45 +00:00
Austin Clements
02ba71e547 runtime/race: fix failing tests
Some race tests were sensitive to the goroutine scheduling order.
When this changed in commit e870f06, these tests started to fail.

Fix TestRaceHeapParam by ensuring that the racing goroutine has
run before the test exits. Fix TestRaceRWMutexMultipleReaders by
adding a third reader to ensure that two readers wind up on the
same side of the writer (and race with each other) regardless of
the schedule. Fix TestRaceRange by ensuring that the racing
goroutine runs before the main goroutine exits the loop it races
with.

Change-Id: Iaf002f8730ea42227feaf2f3c51b9a1e57ccffdd
Reviewed-on: https://go-review.googlesource.com/9402
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-27 23:12:00 +00:00
Russ Cox
f774e6a1f8 runtime/race: stop listening to external network addresses
This makes the OS X firewall box pop up.
Not run during all.bash so hasn't been noticed before.

Change-Id: I78feb4fd3e1d3c983ae3419085048831c04de3da
Reviewed-on: https://go-review.googlesource.com/9401
Reviewed-by: Austin Clements <austin@google.com>
2015-04-27 23:11:45 +00:00
Austin Clements
7c7cd69591 runtime: fix stack use accounting
ReadMemStats accounts for stacks slightly differently than the runtime
does internally. Internally, only stacks allocated by newosproc0 are
accounted in memstats.stacks_sys and other stacks are accounted in
heap_sys. readmemstats_m shuffles the statistics so all stacks are
accounted in StackSys rather than HeapSys.

However, currently, readmemstats_m assumes StackSys will be zero when
it does this shuffle. This was true until commit 6ad33be. If it isn't
(e.g., if something called newosproc0), StackSys+HeapSys will be
different before and after this shuffle, and the Sys sum that was
computed earlier will no longer agree with the sum of its components.

Fix this by making the shuffle in readmemstats_m not assume that
StackSys is zero.
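
A simplified model of the accounting problem, with a hypothetical memStats struct that keeps only the fields discussed above. It assumes the old shuffle overwrote StackSys and the fixed one adds to it; treat that as an illustration of the invariant, not as the runtime's code.

```go
package main

import "fmt"

// memStats is a hypothetical, cut-down stand-in for the runtime's statistics.
type memStats struct {
	HeapSys, StackSys, StackInuse uint64
}

// sys is the part of the Sys total contributed by these two components.
func (m memStats) sys() uint64 { return m.HeapSys + m.StackSys }

// shuffleAssumingZero models the old behavior: it overwrites StackSys,
// losing anything already accounted there.
func shuffleAssumingZero(m memStats) memStats {
	m.StackSys = m.StackInuse
	m.HeapSys -= m.StackInuse
	return m
}

// shuffle moves heap-allocated stack memory from HeapSys to StackSys
// without assuming StackSys started at zero, so the sum is preserved.
func shuffle(m memStats) memStats {
	m.StackSys += m.StackInuse
	m.HeapSys -= m.StackInuse
	return m
}

func main() {
	before := memStats{HeapSys: 1000, StackSys: 64, StackInuse: 200}
	fmt.Println(before.sys(), shuffle(before).sys(), shuffleAssumingZero(before).sys())
	// prints "1064 1064 1000"
}
```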

Fixes #10585.

Change-Id: If13991c8de68bd7b85e1b613d3f12b4fd6fd5813
Reviewed-on: https://go-review.googlesource.com/9366
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-27 23:09:39 +00:00
David Crawshaw
d707a6e0e2 runtime: remove unnecessary noescape to fix netbsd
I introduced this build failure in golang.org/cl/9302 but failed to
notice due to the other failures on the dashboard.

Change-Id: I84bf00f664ba572c1ca722e0136d8a2cf21613ca
Reviewed-on: https://go-review.googlesource.com/9363
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-27 23:04:38 +00:00
Austin Clements
23ce80efeb runtime/race: fix benchmark deadlock
Currently TestRaceCrawl fails to call wg.Done for every wg.Add if the
depth ever reaches 0. This causes the test to deadlock. Under the race
detector, this deadlock is not detected, so the test eventually times
out.

This only recently became a problem. Prior to commit e870f06 the depth
would never reach 0 because the strict round-robin goroutine schedule
ensured that all of the URLs were already "seen" by depth 2. Now that
the runtime prefers scheduling the most recently started goroutine,
the test is able to reach depth 0 and trigger this deadlock.
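
The pattern that avoids this kind of deadlock can be sketched as below: defer wg.Done at the top of the goroutine so that every return path, including the depth 0 one, balances its wg.Add. The crawler type, the link graph, and crawlAll are hypothetical; this is not the test's code.

```go
package main

import (
	"fmt"
	"sync"
)

// crawler is a hypothetical concurrent crawler over an in-memory link graph.
type crawler struct {
	mu    sync.Mutex
	seen  map[string]bool
	wg    sync.WaitGroup
	links map[string][]string
}

func (c *crawler) visit(url string, depth int) {
	defer c.wg.Done() // runs on every return path, including depth == 0
	if depth == 0 {
		return
	}
	c.mu.Lock()
	if c.seen[url] {
		c.mu.Unlock()
		return
	}
	c.seen[url] = true
	c.mu.Unlock()
	for _, next := range c.links[url] {
		c.wg.Add(1) // paired with the deferred Done in the child
		go c.visit(next, depth-1)
	}
}

// crawlAll returns how many distinct URLs were visited.
func crawlAll(links map[string][]string, root string, depth int) int {
	c := &crawler{seen: map[string]bool{}, links: links}
	c.wg.Add(1)
	go c.visit(root, depth)
	c.wg.Wait()
	return len(c.seen)
}

func main() {
	links := map[string][]string{"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
	fmt.Println(crawlAll(links, "a", 2)) // prints "3"
}
```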

Change-Id: I5176302a89614a344c84d587073b364833af6590
Reviewed-on: https://go-review.googlesource.com/9344
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-27 20:54:34 +00:00
Russ Cox
42da270024 runtime: fix race in BenchmarkPingPongHog
The master goroutine was returning before
the child goroutine had done its final i < b.N
(the one that fails and causes it to exit the loop)
and then the benchmark harness was updating
b.N, causing a read+write race on b.N.

Change-Id: I2504270a0de30544736f6c32161337a25b505c3e
Reviewed-on: https://go-review.googlesource.com/9368
Reviewed-by: Austin Clements <austin@google.com>
2015-04-27 20:10:11 +00:00
Austin Clements
33e0f3d853 runtime: fix some out of date comments and typos
Change-Id: I061057414c722c5a0f03c709528afc8554114db6
Reviewed-on: https://go-review.googlesource.com/9367
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 20:08:38 +00:00
Josh Bleecher Snyder
9a0fd97ff3 runtime: remove a modulus calculation from pollorder
This is a follow-up to CL 9269, as suggested
by dvyukov.

There is probably even more that can be done
to speed up this shuffle. It will matter more
once CL 7570 (fine-grained locking in select)
is in and can be revisited then, with benchmarks.

Change-Id: Ic13a27d11cedd1e1f007951214b3bb56b1644f02
Reviewed-on: https://go-review.googlesource.com/9393
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-27 19:36:37 +00:00
Austin Clements
1b01910c06 runtime: rename gcController.findRunnable to findRunnableGCWorker
This avoids confusion with the main findrunnable in the scheduler.

Change-Id: I8cf40657557a8610a2fe5a2f74598518256ca7f0
Reviewed-on: https://go-review.googlesource.com/9305
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:42 +00:00
Austin Clements
bb6320535d runtime: replace STW for enabling write barriers with ragged barrier
Currently, we use a full stop-the-world around enabling write
barriers. This is to ensure that all Gs have enabled write barriers
before any blackening occurs (either in gcBgMarkWorker() or in
gcAssistAlloc()).

However, there's no need to bring the whole world to a synchronous
stop to ensure this. This change replaces the STW with a ragged
barrier that ensures each P has individually observed that write
barriers should be enabled before GC performs any blackening.

Change-Id: If2f129a6a55bd8bdd4308067af2b739f3fb41955
Reviewed-on: https://go-review.googlesource.com/8207
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:37 +00:00
Austin Clements
57afa76471 runtime: add ragged global barrier function
This adds forEachP, which performs a general-purpose ragged global
barrier. forEachP takes a callback and invokes it for every P at a GC
safe point.

Ps that are idle or in a syscall are considered to be at a continuous
safe point. forEachP ensures that these Ps do not change state by
forcing all syscall Ps into idle and holding the sched.lock.

To ensure that Ps do not enter syscall or idle without running the
safe-point function, this adds checks for a pending callback every
place there is currently a gcwaiting check.

We'll use forEachP to replace the STW around enabling the write
barrier and to replace the current asynchronous per-M wbuf cache with
a cooperatively managed per-P gcWork cache.
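
The idea of a ragged barrier can be illustrated at user level. The sketch below is an analogy under stated assumptions, not the runtime's forEachP: worker, safePoint, forEachWorker, and runBarrier are hypothetical names, workers stand in for Ps, and the handling of idle or syscall Ps is left out. Each worker runs the callback at its own next safe point, and no worker is ever stopped to wait for the others.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// worker is a hypothetical stand-in for a P.
type worker struct {
	id       int
	pending  chan func(*worker) // capacity 1: callback awaiting a safe point
	observed bool
}

// safePoint runs the pending callback, if there is one, and returns
// immediately otherwise.
func (w *worker) safePoint() {
	select {
	case fn := <-w.pending:
		fn(w)
	default:
	}
}

// run does units of work, checking for a pending callback between them.
func (w *worker) run(stop <-chan struct{}, exited *sync.WaitGroup) {
	defer exited.Done()
	for {
		select {
		case <-stop:
			return
		default:
		}
		runtime.Gosched() // stand-in for a unit of real work
		w.safePoint()
	}
}

// forEachWorker returns once every worker has run fn at a safe point.
// Workers reach the barrier at different times and keep running
// afterwards; only the caller blocks.
func forEachWorker(ws []*worker, fn func(*worker)) {
	var acked sync.WaitGroup
	acked.Add(len(ws))
	for _, w := range ws {
		w.pending <- func(w *worker) {
			fn(w)
			acked.Done()
		}
	}
	acked.Wait()
}

// runBarrier starts n workers, runs one barrier, and reports how many
// workers observed it.
func runBarrier(n int) int {
	stop := make(chan struct{})
	var exited sync.WaitGroup
	ws := make([]*worker, n)
	for i := range ws {
		ws[i] = &worker{id: i, pending: make(chan func(*worker), 1)}
		exited.Add(1)
		go ws[i].run(stop, &exited)
	}
	forEachWorker(ws, func(w *worker) { w.observed = true })
	close(stop)
	exited.Wait()
	count := 0
	for _, w := range ws {
		if w.observed {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(runBarrier(4))
}
```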

Change-Id: Ie944f8ce1fead7c79bf271d2f42fcd61a41bb3cc
Reviewed-on: https://go-review.googlesource.com/8206
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:33 +00:00
Austin Clements
b0b1a66052 runtime: reset spinning in mspinning if work was ready()ed
This fixes a bug where the runtime ready()s a goroutine while setting
up a new M that's initially marked as spinning, causing the scheduler
to later panic when it finds work in the run queue of a P associated
with a spinning M. Specifically, the sequence of events that can lead
to this is:

1) sysmon calls handoffp to hand off a P stolen from a syscall.

2) handoffp sees no pending work on the P, so it calls startm with
   spinning set.

3) startm calls newm, which in turn calls allocm to allocate a new M.

4) allocm "borrows" the P we're handing off in order to do allocation
   and performs this allocation.

5) This allocation may assist the garbage collector, and this assist
   may detect the end of concurrent mark and ready() the main GC
   goroutine to signal this.

6) This ready()ing puts the GC goroutine on the run queue of the
   borrowed P.

7) newm starts the OS thread, which runs mstart and subsequently
   mstart1, which marks the M spinning because startm was called with
   spinning set.

8) mstart1 enters the scheduler, which panics because there's work on
   the run queue, but the M is marked spinning.

To fix this, before marking the M spinning in step 7, add a check to
see if work has been added to the P's run queue. If this is the case,
undo the spinning instead.
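
A toy model of the check described in the fix, for illustration only. The p, m, and sched types here are simplified, hypothetical stand-ins for the runtime's structures, and the model assumes the spinning count is incremented before the M starts, so that "undoing" the spinning means decrementing it again.

```go
package main

import "fmt"

// p, m, and sched are simplified, hypothetical models of runtime types.
type p struct{ runq []int }

type m struct {
	nextp    *p
	spinning bool
}

type sched struct{ nmspinning int }

// mspinning marks mp as spinning unless work arrived on its P while
// the M was being set up, in which case the earlier increment of
// nmspinning is undone instead.
func mspinning(s *sched, mp *m) {
	if len(mp.nextp.runq) > 0 {
		s.nmspinning--
		if s.nmspinning < 0 {
			panic("mspinning: nmspinning underflowed")
		}
		return
	}
	mp.spinning = true
}

// startSpinningM models steps 2 through 7: the spinning M is counted
// up front, and a goroutine may be readied onto the borrowed P before
// the M gets to run mspinning.
func startSpinningM(s *sched, pp *p, readyDuringSetup bool) *m {
	s.nmspinning++
	mp := &m{nextp: pp}
	if readyDuringSetup {
		pp.runq = append(pp.runq, 1) // G readied while the M was allocated
	}
	mspinning(s, mp)
	return mp
}

func main() {
	s := &sched{}
	quiet := startSpinningM(s, &p{}, false)
	fmt.Println(quiet.spinning, s.nmspinning)
	busy := startSpinningM(s, &p{}, true)
	fmt.Println(busy.spinning, s.nmspinning)
}
```

With the check in place, the second M enters the scheduler as a non-spinning M that already has work, which is a state the scheduler accepts.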

Fixes #10573.

Change-Id: I4670495ae00582144a55ce88c45ae71de597cfa5
Reviewed-on: https://go-review.googlesource.com/9332
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-27 12:49:54 +00:00
Austin Clements
2a46f55b35 runtime: panic when idling a P with runnable Gs
This adds a check that we never put a P on the idle list when it has
work on its local run queue.
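
The invariant can be sketched as a guard in front of the idle list. The types and function names below are hypothetical models rather than the runtime's code, and an ordinary panic stands in for whatever failure mechanism the runtime uses.

```go
package main

import "fmt"

// p is a simplified, hypothetical model of a P with a local run queue.
type p struct {
	id   int
	runq []int
}

// pidleput appends pp to the idle list, refusing to idle a P that
// still has runnable work queued.
func pidleput(idle []*p, pp *p) []*p {
	if len(pp.runq) > 0 {
		panic("idling a P with a non-empty run queue")
	}
	return append(idle, pp)
}

// tryIdle calls pidleput and reports whether the check fired.
func tryIdle(idle []*p, pp *p) (out []*p, panicked bool) {
	defer func() {
		if recover() != nil {
			out, panicked = idle, true
		}
	}()
	return pidleput(idle, pp), false
}

func main() {
	idle, panicked := tryIdle(nil, &p{id: 0})
	fmt.Println(len(idle), panicked)
	idle, panicked = tryIdle(idle, &p{id: 1, runq: []int{42}})
	fmt.Println(len(idle), panicked)
}
```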

Change-Id: Ifcfab750de60c335148a7f513d4eef17be03b6a7
Reviewed-on: https://go-review.googlesource.com/9324
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-27 12:49:49 +00:00