mirror of https://github.com/golang/go synced 2024-10-03 07:21:21 -06:00
Commit Graph

1221 Commits

Author SHA1 Message Date
Daniel Morsing
db6f88a84b runtime: enable profiling on g0
Since we now have stack information for code running on the
systemstack, we can traceback over it. To make cpu profiles useful,
add a case in gentraceback to jump over systemstack switches.

Fixes #10609.

Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7
Reviewed-on: https://go-review.googlesource.com/9506
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-05-11 08:44:30 +00:00
Shenghou Ma
fd392ee52b cmd/internal/ld: generate correct .debug_frames on RISC architectures
With this patch, gdb seems to be able to correctly backtrace Go
processes on at least linux/{arm,arm64,ppc64}.

Change-Id: Ic40a2a70e71a19c4a92e4655710f38a807b67e9a
Reviewed-on: https://go-review.googlesource.com/9822
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-08 00:34:27 +00:00
Russ Cox
0211d7d7b0 runtime: turn off checkmark by default
Change-Id: Ic8cb8b1ed8715d6d5a53ec3cac385c0e93883514
Reviewed-on: https://go-review.googlesource.com/9825
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-07 21:08:42 +00:00
Russ Cox
9626561030 runtime: fix gccheckmark mode and enable by default
It was testing the mark bits on what roots pointed at,
but not the remainder of the live heap, because in
CL 2991 I accidentally inverted this check during
refactoring.

The next CL will turn it back off by default again,
but I want one run on the builders with the full
checkmark checks.

Change-Id: Ic166458cea25c0a56e5387fc527cb166ff2e5ada
Reviewed-on: https://go-review.googlesource.com/9824
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-07 21:08:29 +00:00
Rick Hudson
b6e178ed7e runtime: set heap minimum default based on GOGC
Currently the heap minimum is set to 4MB, which prevents using
GOGC=0 to collect at every allocation. This adjusts the heap minimum
to 4MB*GOGC/100, re-enabling collection at every allocation.
Fixes #10681
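
As a rough sketch of the new sizing rule (the helper and constant
names here are illustrative, not the runtime's actual code):

  // heapMinimum scales the old fixed 4MB floor by GOGC/100, so
  // GOGC=0 gives a zero floor and a collection at every allocation,
  // while the default GOGC=100 keeps the old 4MB behavior.
  func heapMinimum(gogc int64) int64 {
      const base = 4 << 20 // the old fixed 4MB minimum
      return base * gogc / 100
  }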

Change-Id: I912d027dac4b14ae535597e8beefa9ac3fb8ad94
Reviewed-on: https://go-review.googlesource.com/9814
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-07 21:05:58 +00:00
Michael Hudson-Doyle
fa896733b5 runtime: check consistency of all module data objects
Current code just checks the consistency (that the functab is
correctly sorted by PC, etc.) of the moduledata object that the
runtime belongs to. Change it to check all of them.

Change-Id: I544a44c5de7445fff87d3cdb4840ff04c5e2bf75
Reviewed-on: https://go-review.googlesource.com/9773
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-05-07 15:06:08 +00:00
Alex Brainman
a52dc9fcbd runtime: fix comments that mention g status values
Makes searching in source code easier.

Change-Id: Ie2e85934d23920ac0bc01d28168bcfbbdc465580
Reviewed-on: https://go-review.googlesource.com/9774
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-07 00:00:38 +00:00
Austin Clements
17db6e0420 runtime: use heap scan size as estimate of GC scan work
Currently, the GC uses a moving average of recent scan work ratios to
estimate the total scan work required by this cycle. This is in turn
used to compute how much scan work should be done by mutators when
they allocate in order to perform all expected scan work by the time
the allocated heap reaches the heap goal.

However, our current scan work estimate can be arbitrarily wrong if
the heap topography changes significantly from one cycle to the
next. For example, in the go1 benchmarks, at the beginning of each
benchmark, the heap is dominated by a 256MB no-scan object, so the GC
learns that the scan density of the heap is very low. In benchmarks
that then rapidly allocate pointer-dense objects, by the time of the
next GC cycle, our estimate of the scan work can be too low by a large
factor. This in turn lets the mutator allocate faster than the GC can
collect, allowing it to get arbitrarily far ahead of the scan work
estimate, which leads to very long GC cycles with very little mutator
assist that can overshoot the heap goal by large margins. This is
particularly easy to demonstrate with BinaryTree17:

$ GODEBUG=gctrace=1 ./go1.test -test.bench BinaryTree17
gc #1 @0.017s 2%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 4->262->262 MB, 4 MB goal, 1 P
gc #2 @0.026s 3%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 262->262->262 MB, 524 MB goal, 1 P
testing: warning: no tests to run
PASS
BenchmarkBinaryTree17	gc #3 @1.906s 0%: 0+0+0+0+7 ms clock, 0+0+0+0/0/0+7 ms cpu, 325->325->287 MB, 325 MB goal, 1 P (forced)
gc #4 @12.203s 20%: 0+0+0+10067+10 ms clock, 0+0+0+0/2523/852+10 ms cpu, 430->2092->1950 MB, 574 MB goal, 1 P
       1       9150447353 ns/op

Change this estimate to instead use the *current* scannable heap
size. This has the advantage of being based solely on the current
state of the heap, not on past densities or reachable heap sizes, so
it isn't susceptible to falling behind during these sorts of phase
changes. This is strictly an over-estimate, but it's better to
over-estimate and get more assist than necessary than it is to
under-estimate and potentially spiral out of control. Experiments with
scaling this estimate back showed no obvious benefit for mutator
utilization, heap size, or assist time.
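
A minimal sketch of the two estimators, under assumed names (not the
controller's actual code):

  // Old: extrapolate scan work from a moving average of past scan
  // density applied to the expected heap size.
  func estimateScanWorkOld(avgScanDensity float64, expectedHeapBytes int64) int64 {
      return int64(avgScanDensity * float64(expectedHeapBytes))
  }

  // New: take the current scannable heap directly. This deliberately
  // over-estimates (reachable <= allocated), erring toward extra
  // mutator assist rather than falling behind the allocator.
  func estimateScanWorkNew(scannableHeapBytes int64) int64 {
      return scannableHeapBytes
  }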

This new estimate has little effect for most benchmarks, including
most go1 benchmarks, x/benchmarks, and the 6g benchmark. It has a huge
effect for benchmarks that triggered the bad pacer behavior:

name                   old mean              new mean              delta
BinaryTree17            10.0s × (1.00,1.00)    3.5s × (0.98,1.01)  -64.93% (p=0.000)
Fannkuch11              2.74s × (1.00,1.01)   2.65s × (1.00,1.00)   -3.52% (p=0.000)
FmtFprintfEmpty        56.4ns × (0.99,1.00)  57.8ns × (1.00,1.01)   +2.43% (p=0.000)
FmtFprintfString        187ns × (0.99,1.00)   185ns × (0.99,1.01)   -1.19% (p=0.010)
FmtFprintfInt           184ns × (1.00,1.00)   183ns × (1.00,1.00)  (no variance)
FmtFprintfIntInt        321ns × (1.00,1.00)   315ns × (1.00,1.00)   -1.80% (p=0.000)
FmtFprintfPrefixedInt   266ns × (1.00,1.00)   263ns × (1.00,1.00)   -1.22% (p=0.000)
FmtFprintfFloat         353ns × (1.00,1.00)   353ns × (1.00,1.00)   -0.13% (p=0.035)
FmtManyArgs            1.21µs × (1.00,1.00)  1.19µs × (1.00,1.00)   -1.33% (p=0.000)
GobDecode              9.69ms × (1.00,1.00)  9.59ms × (1.00,1.00)   -1.07% (p=0.000)
GobEncode              7.89ms × (0.99,1.01)  7.74ms × (1.00,1.00)   -1.92% (p=0.000)
Gzip                    391ms × (1.00,1.00)   392ms × (1.00,1.00)     ~    (p=0.522)
Gunzip                 97.1ms × (1.00,1.00)  97.0ms × (1.00,1.00)   -0.10% (p=0.000)
HTTPClientServer       55.7µs × (0.99,1.01)  56.7µs × (0.99,1.01)   +1.81% (p=0.001)
JSONEncode             19.1ms × (1.00,1.00)  19.0ms × (1.00,1.00)   -0.85% (p=0.000)
JSONDecode             66.8ms × (1.00,1.00)  66.9ms × (1.00,1.00)     ~    (p=0.288)
Mandelbrot200          4.13ms × (1.00,1.00)  4.12ms × (1.00,1.00)   -0.08% (p=0.000)
GoParse                3.97ms × (1.00,1.01)  4.01ms × (1.00,1.00)   +0.99% (p=0.000)
RegexpMatchEasy0_32     114ns × (1.00,1.00)   115ns × (0.99,1.00)     ~    (p=0.070)
RegexpMatchEasy0_1K     376ns × (1.00,1.00)   376ns × (1.00,1.00)     ~    (p=0.900)
RegexpMatchEasy1_32    94.9ns × (1.00,1.00)  96.3ns × (1.00,1.01)   +1.53% (p=0.001)
RegexpMatchEasy1_1K     568ns × (1.00,1.00)   567ns × (1.00,1.00)   -0.22% (p=0.001)
RegexpMatchMedium_32    159ns × (1.00,1.00)   159ns × (1.00,1.00)     ~    (p=0.178)
RegexpMatchMedium_1K   46.4µs × (1.00,1.00)  46.6µs × (1.00,1.00)   +0.29% (p=0.000)
RegexpMatchHard_32     2.37µs × (1.00,1.00)  2.37µs × (1.00,1.00)     ~    (p=0.722)
RegexpMatchHard_1K     71.1µs × (1.00,1.00)  71.2µs × (1.00,1.00)     ~    (p=0.229)
Revcomp                 565ms × (1.00,1.00)   562ms × (1.00,1.00)   -0.52% (p=0.000)
Template               81.0ms × (1.00,1.00)  80.2ms × (1.00,1.00)   -0.97% (p=0.000)
TimeParse               380ns × (1.00,1.00)   380ns × (1.00,1.00)     ~    (p=0.148)
TimeFormat              405ns × (0.99,1.00)   385ns × (0.99,1.00)   -5.00% (p=0.000)

Change-Id: I11274158bf3affaf62662e02de7af12d5fb789e4
Reviewed-on: https://go-review.googlesource.com/9696
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-06 19:40:38 +00:00
Austin Clements
3be3cbd548 runtime: track "scannable" bytes of heap
This tracks the number of scannable bytes in the allocated heap. That
is, bytes that the garbage collector must scan before reaching the
last pointer field in each object.

This will be used to compute a more robust estimate of the GC scan
work.

Change-Id: I1eecd45ef9cdd65b69d2afb5db5da885c80086bb
Reviewed-on: https://go-review.googlesource.com/9695
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:33 +00:00
Austin Clements
53c53984e7 runtime: include scalar slots in GC scan work metric
The garbage collector predicts how much "scan work" must be done in a
cycle to determine how much work should be done by mutators when they
allocate. Most code doesn't care what units the scan work is in: it
simply knows that a certain amount of scan work has to be done in the
cycle. Currently, the GC uses the number of pointer slots scanned as
the scan work on the theory that this is the bulk of the time spent in
the garbage collector and hence reflects real CPU resource usage.
However, this metric is difficult to estimate at the beginning of a
cycle.

Switch to counting the total number of bytes scanned, including both
pointer and scalar slots. This is still less than the total marked
heap since it omits no-scan objects and no-scan tails of objects. This
metric may not reflect absolute performance as well as the count of
scanned pointer slots (though it still takes time to scan scalar
fields), but it will be much easier to estimate robustly, which is
more important.
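
A worked example of the unit change, assuming a 64-bit layout (the
type is illustrative):

  type obj struct {
      p *int     // pointer slot (8 bytes)
      a [6]int64 // 48 scalar bytes between the pointers
      q *int     // last pointer slot
      b [8]int64 // 64-byte no-scan tail
  }

  // Old metric: 2 (two pointer slots scanned).
  // New metric: 8 + 48 + 8 = 64 bytes, everything through the last
  // pointer field; the no-scan tail b is still excluded.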

Change-Id: Ie3a5eeeb0384a1ca566f61b2f11e9ff3a75ca121
Reviewed-on: https://go-review.googlesource.com/9694
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:27 +00:00
Austin Clements
c4931a8433 runtime: dispose gcWork caches before updating controller state
Currently, we only flush the per-P gcWork caches in gcMark, at the
beginning of mark termination. This is necessary to ensure that no
work is held up in these caches.

However, this flush happens after we update the GC controller state,
which depends on statistics about marked heap size and scan work that
are only updated by this flush. Hence, the controller is missing the
bulk of heap marking and scan work. This bug was introduced in
commit 1b4025f, which added the per-P gcWork caches.

Fix this by flushing these caches before we update the GC controller
state. We continue to flush them at the beginning of mark termination
as well to be robust in case any write barriers happened between the
previous flush and entering mark termination, but this should be a
no-op.
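
A toy model of why the ordering matters; the names are illustrative,
not the runtime's:

  package main

  import "fmt"

  // Per-P caches must be flushed into the global totals before the
  // controller reads them, or it sees almost none of the cycle's work.
  type gcWorkCache struct{ bytesMarked int64 }

  func main() {
      caches := []*gcWorkCache{{1 << 20}, {3 << 20}}
      var totalMarked int64
      for _, c := range caches { // flush first...
          totalMarked += c.bytesMarked
          c.bytesMarked = 0
      }
      // ...then let the controller observe the totals.
      fmt.Printf("controller sees %d MB marked\n", totalMarked>>20) // 4
  }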

Change-Id: I8f0f91024df967ebf0c616d1c4f0c339c304ebaa
Reviewed-on: https://go-review.googlesource.com/9646
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:22 +00:00
Rick Hudson
1845314560 runtime: remove unused GC timers
During development some tracing routines were added that are not
needed in the release. These included GCstarttimes, GCendtimes, and
GCprinttimes.
Fixes #10462

Change-Id: I0788e6409d61038571a5ae0cbbab793102df0a65
Reviewed-on: https://go-review.googlesource.com/9689
Reviewed-by: Austin Clements <austin@google.com>
2015-05-06 12:53:08 +00:00
Aram Hăvărneanu
fe5ef5c9d7 runtime, syscall: link Solaris binaries directly instead of using dlopen/dlsym
Before CL 8214 (use .plt instead of .got on Solaris) Solaris used a
dynamic linking scheme that didn't permit lazy binding. To speed program
startup, Go binaries only used it for a small number of symbols required
by the runtime. Other symbols were resolved on demand on first use, and
were cached for subsequent use. This required some moderately complex
code in the syscall package.

CL 8214 changed the way dynamic linking is implemented, and lazy
binding is now supported. Since all symbols are now resolved lazily
by the dynamic loader, there is no need for the complex code in the
syscall package that did the same. This CL makes Go programs link
directly
with the necessary shared libraries and deletes the lazy-loading code
implemented in Go.

Change-Id: Ifd7275db72de61b70647242e7056dd303b1aee9e
Reviewed-on: https://go-review.googlesource.com/9184
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-06 11:38:50 +00:00
Aram Hăvărneanu
121489cbfd runtime/cgo: add cgo support for solaris/amd64
Change-Id: Ic9744c7716cdd53f27c6e5874230963e5fff0333
Reviewed-on: https://go-review.googlesource.com/8260
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-06 11:37:28 +00:00
Aram Hăvărneanu
c94f1f791b runtime: always load address of libcFunc on Solaris
The linker always uses .plt for externals, so libcFunc is now an actual
external symbol instead of a pointer to one.

Fixes most of the breakage introduced in previous CL.

Change-Id: I64b8c96f93127f2d13b5289b024677fd3ea7dbea
Reviewed-on: https://go-review.googlesource.com/8215
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-06 11:36:57 +00:00
Russ Cox
ceefebd795 runtime: rename ptrsize to ptrdata
I forgot there is already a ptrSize constant.
Rename field to avoid some confusion.

Change-Id: I098fdcc8afc947d6c02c41c6e6de24624cc1c8ff
Reviewed-on: https://go-review.googlesource.com/9700
Reviewed-by: Austin Clements <austin@google.com>
2015-05-05 19:27:47 +00:00
Keith Randall
5a828cfcde runtime: let freezetheworld work even when gomaxprocs=1
Freezetheworld still has stuff to do when gomaxprocs=1.
In particular, signals can come in on other Ms (like the GC M, say)
and the single user M is still running.

Fixes #10546

Change-Id: I2f07f17d1c81e93cf905df2cb087112d436ca7e7
Reviewed-on: https://go-review.googlesource.com/9551
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-05-05 15:11:10 +00:00
Shenghou Ma
102436e800 runtime: fix software FP regs corruption when emulating SQRT on ARM
When emulating the ARM FSQRT instruction, the sqrt function itself
should not use any floating point arithmetic, otherwise it will
clobber the user's software FP registers.

Fortunately, the sqrt function only uses floating point instructions
to test for corner cases, so it's easy to make that function does
all it job using pure integer arithmetic only. I've verified that
after this change, runtime.stepflt and runtime.sqrt doesn't contain
any call to _sfloat. (Perhaps we should add //go:nosfloat to make
the compiler enforce this?)
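
As a hedged illustration of corner-case testing in pure integer
arithmetic (this helper is an assumption, not the code in this CL):
an IEEE-754 double is NaN or ±Inf exactly when its exponent bits are
all ones, which can be checked on the raw bits alone:

  // isSpecial reports whether the float64 with raw bits b is NaN or
  // ±Inf, using integer operations only, so no software FP registers
  // are touched.
  func isSpecial(b uint64) bool {
      return (b>>52)&0x7FF == 0x7FF // all-ones exponent
  }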

Fixes #10641.

Change-Id: Ida4742c49000fae4fea4649f28afde630ce4c576
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9570
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Keith Randall <khr@golang.org>
2015-05-05 07:32:58 +00:00
Austin Clements
98a9d36837 runtime: add pointer size to type structure
This adds a field to the runtime type structure that records the size
of the prefix of objects of that type containing pointers. Any data
after this offset is scalar data.

This is necessary for shrinking the type bitmaps to 1 bit and will
help the garbage collector efficiently estimate the amount of heap
that needs to be scanned.
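
For example (an illustrative layout, not runtime code), on a 64-bit
system:

  type example struct {
      p *byte    // offset 0: pointer
      q *byte    // offset 8: last pointer field
      n [4]int64 // offsets 16-47: scalar tail
  }

  // The pointer-prefix size of example is 16 bytes: the collector
  // never needs to scan past offset 16, because everything after it
  // is scalar data.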

Change-Id: I1318d79e6360dca0ac980245016c562e61f52ff5
Reviewed-on: https://go-review.googlesource.com/9691
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-05-04 20:17:48 +00:00
Rick Hudson
b86e71f5aa runtime: Reduce calls to shouldtriggergc
shouldtriggergc is slightly expensive due to the call overhead
and the use of an atomic. This CL reduces the number of times
one checks whether a GC should be done from once at each allocation
to once when a span is allocated. Since shouldtriggergc is an
important abstraction, simply hand-inlining it along with its
atomic instruction would lose the abstraction.

Change-Id: Ia3210655b4b3d433f77064a21ecb54e4d9d435f7
Reviewed-on: https://go-review.googlesource.com/9403
Reviewed-by: Austin Clements <austin@google.com>
2015-05-04 17:38:58 +00:00
Alex Brainman
031c3bc9ae runtime: fix stackDebug comment
Change-Id: Ia9191bd7ecdf7bd5ee7d69ae23aa71760f379aa8
Reviewed-on: https://go-review.googlesource.com/9590
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-02 02:39:50 +00:00
Austin Clements
dc870d5f4b runtime: detailed debug output of controller state
This adds a detailed debug dump of the state of the GC controller and
a GODEBUG flag to enable it.

Change-Id: I562fed7981691a84ddf0f9e6fcd9f089f497ac13
Reviewed-on: https://go-review.googlesource.com/9640
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 19:39:43 +00:00
Russ Cox
4fffc50c26 runtime: correct accounting of scan work and bytes marked
(1) Count pointer-free objects found during scanning roots
as marked bytes, by not zeroing the mark total after scanning roots.

(2) Don't count the bytes for the roots themselves, by not adding
them to the mark total in scanblock (the zeroing removed by (1)
was aimed at that add but hit more).

Combined, (1) and (2) fix the calculation of the marked heap size.
This makes the GC trigger much less often in the Go 1 benchmarks,
which have a global []byte pointing at 256 MB of data.
That 256 MB allocation was not being included in the heap size
in the current code, but was included in Go 1.4.
This is the source of much of the relative slowdown in that directory.

(3) Count the bytes for the roots as scanned work, by not zeroing
the scan total after scanning roots. There is no strict justification
for this, and it probably doesn't matter much either way,
but it was always combined with another buggy zeroing
(removed in (1)), so guilty by association.

Austin noticed this.

name                                    old mean                new mean        delta
BenchmarkBinaryTree17              13.1s × (0.97,1.03)      5.9s × (0.97,1.05)  -55.19% (p=0.000)
BenchmarkFannkuch11                4.35s × (0.99,1.01)     4.37s × (1.00,1.01)  +0.47% (p=0.032)
BenchmarkFmtFprintfEmpty          84.6ns × (0.95,1.14)    85.7ns × (0.94,1.05)  ~ (p=0.521)
BenchmarkFmtFprintfString          320ns × (0.95,1.06)     283ns × (0.99,1.02)  -11.48% (p=0.000)
BenchmarkFmtFprintfInt             311ns × (0.98,1.03)     288ns × (0.99,1.02)  -7.26% (p=0.000)
BenchmarkFmtFprintfIntInt          554ns × (0.96,1.05)     478ns × (0.99,1.02)  -13.70% (p=0.000)
BenchmarkFmtFprintfPrefixedInt     434ns × (0.96,1.06)     393ns × (0.98,1.04)  -9.60% (p=0.000)
BenchmarkFmtFprintfFloat           620ns × (0.99,1.03)     584ns × (0.99,1.01)  -5.73% (p=0.000)
BenchmarkFmtManyArgs              2.19µs × (0.98,1.03)    1.94µs × (0.99,1.01)  -11.62% (p=0.000)
BenchmarkGobDecode                21.2ms × (0.97,1.06)    15.2ms × (0.99,1.01)  -28.17% (p=0.000)
BenchmarkGobEncode                18.1ms × (0.94,1.06)    11.8ms × (0.99,1.01)  -35.00% (p=0.000)
BenchmarkGzip                      650ms × (0.98,1.01)     649ms × (0.99,1.02)  ~ (p=0.802)
BenchmarkGunzip                    143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.438)
BenchmarkHTTPClientServer          110µs × (0.98,1.04)     101µs × (0.98,1.02)  -8.79% (p=0.000)
BenchmarkJSONEncode               40.3ms × (0.97,1.03)    31.8ms × (0.98,1.03)  -20.92% (p=0.000)
BenchmarkJSONDecode                119ms × (0.97,1.02)     108ms × (0.99,1.02)  -9.15% (p=0.000)
BenchmarkMandelbrot200            6.03ms × (1.00,1.01)    6.03ms × (0.99,1.01)  ~ (p=0.750)
BenchmarkGoParse                  8.58ms × (0.89,1.10)    6.80ms × (1.00,1.00)  -20.71% (p=0.000)
BenchmarkRegexpMatchEasy0_32       162ns × (1.00,1.01)     162ns × (0.99,1.02)  ~ (p=0.131)
BenchmarkRegexpMatchEasy0_1K       540ns × (0.99,1.02)     559ns × (0.99,1.02)  +3.58% (p=0.000)
BenchmarkRegexpMatchEasy1_32       139ns × (0.98,1.04)     139ns × (1.00,1.00)  ~ (p=0.466)
BenchmarkRegexpMatchEasy1_1K       889ns × (0.99,1.01)     885ns × (0.99,1.01)  -0.50% (p=0.022)
BenchmarkRegexpMatchMedium_32      252ns × (0.99,1.02)     252ns × (0.99,1.01)  ~ (p=0.469)
BenchmarkRegexpMatchMedium_1K     72.9µs × (0.99,1.01)    73.6µs × (0.99,1.03)  ~ (p=0.168)
BenchmarkRegexpMatchHard_32       3.87µs × (1.00,1.01)    3.86µs × (1.00,1.00)  ~ (p=0.055)
BenchmarkRegexpMatchHard_1K        118µs × (0.99,1.01)     117µs × (0.99,1.00)  ~ (p=0.133)
BenchmarkRevcomp                   995ms × (0.94,1.10)     949ms × (0.99,1.01)  -4.64% (p=0.000)
BenchmarkTemplate                  141ms × (0.97,1.02)     127ms × (0.99,1.01)  -10.00% (p=0.000)
BenchmarkTimeParse                 641ns × (0.99,1.01)     623ns × (0.99,1.01)  -2.79% (p=0.000)
BenchmarkTimeFormat                729ns × (0.98,1.03)     679ns × (0.99,1.00)  -6.93% (p=0.000)

Change-Id: I839bd7356630d18377989a0748763414e15ed057
Reviewed-on: https://go-review.googlesource.com/9602
Reviewed-by: Austin Clements <austin@google.com>
2015-05-01 19:31:00 +00:00
Russ Cox
4d0f3a1c95 cmd/internal/gc, runtime: use 1-bit bitmap for stack frames, data, bss
The bitmaps were 2 bits per pointer because we needed to distinguish
scalar, pointer, multiword, and we used the leftover value to distinguish
uninitialized from scalar, even though the garbage collector (GC) didn't care.

Now that there are no multiword structures from the GC's point of view,
cut the bitmaps down to 1 bit per pointer, recording just live pointer vs not.

The GC assumes the same layout for stack frames and for the maps
describing the global data and bss sections, so change them all in one CL.

The code still refers to 4-bit heap bitmaps and 2-bit "type bitmaps", since
the 2-bit representation lives (at least for now) in some of the reflect data.

Because these stack frame bitmaps are stored directly in the rodata in
the binary, this CL reduces the size of the 6g binary by about 1.1%.

Performance change is basically a wash, but using less memory,
and smaller binaries, and enables other bitmap reductions.
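
A quick arithmetic sketch of the savings (the helper is
illustrative):

  // bitmapBytes returns the size of a frame bitmap covering the
  // given number of pointer-sized words at the given encoding width.
  func bitmapBytes(words, bitsPerWord int) int {
      return (words*bitsPerWord + 7) / 8
  }

  // A 64-word frame: bitmapBytes(64, 2) == 16 bytes in the old 2-bit
  // encoding, bitmapBytes(64, 1) == 8 in the new one, halving the
  // rodata spent per frame.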

name                                      old mean                new mean        delta
BenchmarkBinaryTree17                13.2s × (0.97,1.03)     13.0s × (0.99,1.01)  -0.93% (p=0.005)
BenchmarkBinaryTree17-2              9.69s × (0.96,1.05)     9.51s × (0.96,1.03)  -1.86% (p=0.001)
BenchmarkBinaryTree17-4              10.1s × (0.97,1.05)     10.0s × (0.96,1.05)  ~ (p=0.141)
BenchmarkFannkuch11                  4.35s × (0.99,1.01)     4.43s × (0.98,1.04)  +1.75% (p=0.001)
BenchmarkFannkuch11-2                4.31s × (0.99,1.03)     4.32s × (1.00,1.00)  ~ (p=0.095)
BenchmarkFannkuch11-4                4.32s × (0.99,1.02)     4.38s × (0.98,1.04)  +1.38% (p=0.008)
BenchmarkFmtFprintfEmpty            83.5ns × (0.97,1.10)    87.3ns × (0.92,1.11)  +4.55% (p=0.014)
BenchmarkFmtFprintfEmpty-2          81.8ns × (0.98,1.04)    82.5ns × (0.97,1.08)  ~ (p=0.364)
BenchmarkFmtFprintfEmpty-4          80.9ns × (0.99,1.01)    82.6ns × (0.97,1.08)  +2.12% (p=0.010)
BenchmarkFmtFprintfString            320ns × (0.95,1.04)     322ns × (0.97,1.05)  ~ (p=0.368)
BenchmarkFmtFprintfString-2          303ns × (0.97,1.04)     304ns × (0.97,1.04)  ~ (p=0.484)
BenchmarkFmtFprintfString-4          305ns × (0.97,1.05)     306ns × (0.98,1.05)  ~ (p=0.543)
BenchmarkFmtFprintfInt               311ns × (0.98,1.03)     319ns × (0.97,1.03)  +2.63% (p=0.000)
BenchmarkFmtFprintfInt-2             297ns × (0.98,1.04)     301ns × (0.97,1.04)  +1.19% (p=0.023)
BenchmarkFmtFprintfInt-4             302ns × (0.98,1.02)     304ns × (0.97,1.03)  ~ (p=0.126)
BenchmarkFmtFprintfIntInt            554ns × (0.96,1.05)     554ns × (0.97,1.03)  ~ (p=0.975)
BenchmarkFmtFprintfIntInt-2          520ns × (0.98,1.03)     517ns × (0.98,1.02)  ~ (p=0.153)
BenchmarkFmtFprintfIntInt-4          524ns × (0.98,1.02)     525ns × (0.98,1.03)  ~ (p=0.597)
BenchmarkFmtFprintfPrefixedInt       433ns × (0.97,1.06)     434ns × (0.97,1.06)  ~ (p=0.804)
BenchmarkFmtFprintfPrefixedInt-2     413ns × (0.98,1.04)     413ns × (0.98,1.03)  ~ (p=0.881)
BenchmarkFmtFprintfPrefixedInt-4     420ns × (0.97,1.03)     421ns × (0.97,1.03)  ~ (p=0.561)
BenchmarkFmtFprintfFloat             620ns × (0.99,1.03)     636ns × (0.97,1.03)  +2.57% (p=0.000)
BenchmarkFmtFprintfFloat-2           601ns × (0.98,1.02)     617ns × (0.98,1.03)  +2.58% (p=0.000)
BenchmarkFmtFprintfFloat-4           613ns × (0.98,1.03)     626ns × (0.98,1.02)  +2.15% (p=0.000)
BenchmarkFmtManyArgs                2.19µs × (0.96,1.04)    2.23µs × (0.97,1.02)  +1.65% (p=0.000)
BenchmarkFmtManyArgs-2              2.08µs × (0.98,1.03)    2.10µs × (0.99,1.02)  +0.79% (p=0.019)
BenchmarkFmtManyArgs-4              2.10µs × (0.98,1.02)    2.13µs × (0.98,1.02)  +1.72% (p=0.000)
BenchmarkGobDecode                  21.3ms × (0.97,1.05)    21.1ms × (0.97,1.04)  -1.36% (p=0.025)
BenchmarkGobDecode-2                20.0ms × (0.97,1.03)    19.2ms × (0.97,1.03)  -4.00% (p=0.000)
BenchmarkGobDecode-4                19.5ms × (0.99,1.02)    19.0ms × (0.99,1.01)  -2.39% (p=0.000)
BenchmarkGobEncode                  18.3ms × (0.95,1.07)    18.1ms × (0.96,1.08)  ~ (p=0.305)
BenchmarkGobEncode-2                16.8ms × (0.97,1.02)    16.4ms × (0.98,1.02)  -2.79% (p=0.000)
BenchmarkGobEncode-4                15.4ms × (0.98,1.02)    15.4ms × (0.98,1.02)  ~ (p=0.465)
BenchmarkGzip                        650ms × (0.98,1.03)     655ms × (0.97,1.04)  ~ (p=0.075)
BenchmarkGzip-2                      652ms × (0.98,1.03)     655ms × (0.98,1.02)  ~ (p=0.337)
BenchmarkGzip-4                      656ms × (0.98,1.04)     653ms × (0.98,1.03)  ~ (p=0.291)
BenchmarkGunzip                      143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.507)
BenchmarkGunzip-2                    143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.313)
BenchmarkGunzip-4                    143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.312)
BenchmarkHTTPClientServer            110µs × (0.98,1.03)     109µs × (0.99,1.02)  -1.40% (p=0.000)
BenchmarkHTTPClientServer-2          154µs × (0.90,1.08)     149µs × (0.90,1.08)  -3.43% (p=0.007)
BenchmarkHTTPClientServer-4          138µs × (0.97,1.04)     138µs × (0.96,1.04)  ~ (p=0.670)
BenchmarkJSONEncode                 40.2ms × (0.98,1.02)    40.2ms × (0.98,1.05)  ~ (p=0.828)
BenchmarkJSONEncode-2               35.1ms × (0.99,1.02)    35.2ms × (0.98,1.03)  ~ (p=0.392)
BenchmarkJSONEncode-4               35.3ms × (0.98,1.03)    35.3ms × (0.98,1.02)  ~ (p=0.813)
BenchmarkJSONDecode                  119ms × (0.97,1.02)     117ms × (0.98,1.02)  -1.80% (p=0.000)
BenchmarkJSONDecode-2                115ms × (0.99,1.02)     114ms × (0.98,1.02)  -1.18% (p=0.000)
BenchmarkJSONDecode-4                116ms × (0.98,1.02)     114ms × (0.98,1.02)  -1.43% (p=0.000)
BenchmarkMandelbrot200              6.03ms × (1.00,1.01)    6.03ms × (1.00,1.01)  ~ (p=0.985)
BenchmarkMandelbrot200-2            6.03ms × (1.00,1.01)    6.02ms × (1.00,1.01)  ~ (p=0.320)
BenchmarkMandelbrot200-4            6.03ms × (1.00,1.01)    6.03ms × (1.00,1.01)  ~ (p=0.799)
BenchmarkGoParse                    8.63ms × (0.89,1.10)    8.58ms × (0.93,1.09)  ~ (p=0.667)
BenchmarkGoParse-2                  8.20ms × (0.97,1.04)    8.37ms × (0.97,1.04)  +1.96% (p=0.001)
BenchmarkGoParse-4                  8.00ms × (0.98,1.02)    8.14ms × (0.99,1.02)  +1.75% (p=0.000)
BenchmarkRegexpMatchEasy0_32         162ns × (1.00,1.01)     164ns × (0.98,1.04)  +1.35% (p=0.011)
BenchmarkRegexpMatchEasy0_32-2       161ns × (1.00,1.01)     161ns × (1.00,1.00)  ~ (p=0.185)
BenchmarkRegexpMatchEasy0_32-4       161ns × (1.00,1.00)     161ns × (1.00,1.00)  -0.19% (p=0.001)
BenchmarkRegexpMatchEasy0_1K         540ns × (0.99,1.02)     566ns × (0.98,1.04)  +4.98% (p=0.000)
BenchmarkRegexpMatchEasy0_1K-2       540ns × (0.99,1.01)     557ns × (0.99,1.01)  +3.21% (p=0.000)
BenchmarkRegexpMatchEasy0_1K-4       541ns × (0.99,1.01)     559ns × (0.99,1.01)  +3.26% (p=0.000)
BenchmarkRegexpMatchEasy1_32         139ns × (0.98,1.04)     139ns × (0.99,1.03)  ~ (p=0.979)
BenchmarkRegexpMatchEasy1_32-2       139ns × (0.99,1.04)     139ns × (0.99,1.02)  ~ (p=0.777)
BenchmarkRegexpMatchEasy1_32-4       139ns × (0.98,1.04)     139ns × (0.99,1.04)  ~ (p=0.771)
BenchmarkRegexpMatchEasy1_1K         890ns × (0.99,1.03)     885ns × (1.00,1.01)  -0.50% (p=0.004)
BenchmarkRegexpMatchEasy1_1K-2       888ns × (0.99,1.01)     885ns × (0.99,1.01)  -0.37% (p=0.004)
BenchmarkRegexpMatchEasy1_1K-4       890ns × (0.99,1.02)     884ns × (1.00,1.00)  -0.70% (p=0.000)
BenchmarkRegexpMatchMedium_32        252ns × (0.99,1.01)     251ns × (0.99,1.01)  ~ (p=0.081)
BenchmarkRegexpMatchMedium_32-2      254ns × (0.99,1.04)     252ns × (0.99,1.01)  -0.78% (p=0.027)
BenchmarkRegexpMatchMedium_32-4      253ns × (0.99,1.04)     252ns × (0.99,1.01)  -0.70% (p=0.022)
BenchmarkRegexpMatchMedium_1K       72.9µs × (0.99,1.01)    72.7µs × (1.00,1.00)  ~ (p=0.064)
BenchmarkRegexpMatchMedium_1K-2     74.1µs × (0.98,1.05)    72.9µs × (1.00,1.01)  -1.61% (p=0.001)
BenchmarkRegexpMatchMedium_1K-4     73.6µs × (0.99,1.05)    72.8µs × (1.00,1.00)  -1.13% (p=0.007)
BenchmarkRegexpMatchHard_32         3.88µs × (0.99,1.03)    3.92µs × (0.98,1.05)  ~ (p=0.143)
BenchmarkRegexpMatchHard_32-2       3.89µs × (0.99,1.03)    3.93µs × (0.98,1.09)  ~ (p=0.278)
BenchmarkRegexpMatchHard_32-4       3.90µs × (0.99,1.05)    3.93µs × (0.98,1.05)  ~ (p=0.252)
BenchmarkRegexpMatchHard_1K          118µs × (0.99,1.01)     117µs × (0.99,1.02)  -0.54% (p=0.003)
BenchmarkRegexpMatchHard_1K-2        118µs × (0.99,1.01)     118µs × (0.99,1.03)  ~ (p=0.581)
BenchmarkRegexpMatchHard_1K-4        118µs × (0.99,1.02)     117µs × (0.99,1.01)  -0.54% (p=0.002)
BenchmarkRevcomp                     991ms × (0.95,1.10)     989ms × (0.94,1.08)  ~ (p=0.879)
BenchmarkRevcomp-2                   978ms × (0.95,1.11)     962ms × (0.96,1.08)  ~ (p=0.257)
BenchmarkRevcomp-4                   979ms × (0.96,1.07)     974ms × (0.96,1.11)  ~ (p=0.678)
BenchmarkTemplate                    141ms × (0.99,1.02)     145ms × (0.99,1.02)  +2.75% (p=0.000)
BenchmarkTemplate-2                  135ms × (0.98,1.02)     138ms × (0.99,1.02)  +2.34% (p=0.000)
BenchmarkTemplate-4                  136ms × (0.98,1.02)     140ms × (0.99,1.02)  +2.71% (p=0.000)
BenchmarkTimeParse                   640ns × (0.99,1.01)     622ns × (0.99,1.01)  -2.88% (p=0.000)
BenchmarkTimeParse-2                 640ns × (0.99,1.01)     622ns × (1.00,1.00)  -2.81% (p=0.000)
BenchmarkTimeParse-4                 640ns × (1.00,1.01)     622ns × (0.99,1.01)  -2.82% (p=0.000)
BenchmarkTimeFormat                  730ns × (0.98,1.02)     731ns × (0.98,1.03)  ~ (p=0.767)
BenchmarkTimeFormat-2                709ns × (0.99,1.02)     707ns × (0.99,1.02)  ~ (p=0.347)
BenchmarkTimeFormat-4                717ns × (0.98,1.01)     718ns × (0.98,1.02)  ~ (p=0.793)

Change-Id: Ie779c47e912bf80eb918bafa13638bd8dfd6c2d9
Reviewed-on: https://go-review.googlesource.com/9406
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-01 18:44:36 +00:00
Josh Bleecher Snyder
7bebccb972 Revert "runtime/pprof: write heap statistics to heap profile always"
This reverts commit c26fc88d56.

This broke pprof. See the comments on CL 9491.

Change-Id: Ic99ce026e86040c050a9bf0ea3024a1a42274ad1
Reviewed-on: https://go-review.googlesource.com/9565
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2015-05-01 15:56:20 +00:00
Keith Randall
a55b131393 cmd/dist, runtime: Make stack guard larger for non-optimized builds
Kind of a hack, but makes the non-optimized builds pass.

Fixes #10079

Change-Id: I26f41c546867f8f3f16d953dc043e784768f2aff
Reviewed-on: https://go-review.googlesource.com/9552
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 15:41:55 +00:00
David Chase
7fbb1b36c3 cmd/internal/gc: improve flow of input params to output params
This includes the following information in the per-function summary:

outK = paramJ   encoded in outK bits for paramJ
outK = *paramJ  encoded in outK bits for paramJ
heap = paramJ   EscHeap
heap = *paramJ  EscContentEscapes

Note that (currently) if the address of a parameter is taken and
returned, necessarily a heap allocation occurred to contain that
reference, and the heap can never refer to stack, therefore the
parameter and everything downstream from it escapes to the heap.

The per-function summary information now has a tuneable number of bits
(2 is probably noticeably better than 1, 3 is likely overkill, but it
is now easy to check and the -m debugging output includes information
that allows you to figure out if more would be better.)

A new test was added to check pointer flow through struct-typed and
*struct-typed parameters and returns; some of these are sensitive to
the number of summary bits, and ought to yield better results with a
more competent escape analysis algorithm.  Another new test checks
(some) correctness with array parameters, results, and operations.

The old analysis inferred that a piece of the plan9 runtime was
non-escaping by counteracting overconservative analysis with buggy
analysis; with the bug fixed, the result was too conservative (and
it's not easy to fix in this framework), so the source code was
tweaked to get the desired result. A test was added against the
discovered bug.

The escape analysis was further improved by splitting the "level" into
3 parts, one tracking the conventional "level" and the other two
computing the highest-level-suffix-from-copy, which is used to
generally model the cancelling effect of indirection applied to
address-of.

With the improved escape analysis enabled, it was necessary to
modify one of the runtime tests because it now attempts to allocate
too much on the (small, fixed-size) G0 (system) stack, which failed
the test.

Compiling src/std after touching src/runtime/*.go with -m logging
turned on shows 420 fewer heap allocation sites (10538 vs 10968).

Profiling allocations in src/html/template with
for i in {1..5}; do
  go tool 6g -memprofile=mastx.${i}.prof -memprofilerate=1 *.go
  go tool pprof -alloc_objects -text mastx.${i}.prof
done

showed a 15% reduction in allocations performed by the compiler.

Update #3753
Update #4720
Fixes #10466

Change-Id: I0fd97d5f5ac527b45f49e2218d158a6e89951432
Reviewed-on: https://go-review.googlesource.com/8202
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 13:47:20 +00:00
David Crawshaw
4044adedf7 runtime/cgo, cmd/dist: turn off exc_bad_access handler by default
App Store policy requires that programs not reference the exc_server
symbol. (Some public forum threads show that Unity ran into this
several years ago and it is a hard policy rule.) While some research
suggests that I could write my own version of exc_server, the
expedient course is to disable the exception handler by default.

Go programs only need it when running under lldb, which is primarily
used by tests. So enable the exception handler in cmd/dist when we
are running the tests.

Fixes #10646

Change-Id: I853905254894b5367edb8abd381d45585a78ee8b
Reviewed-on: https://go-review.googlesource.com/9549
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-01 13:19:39 +00:00
Shenghou Ma
5f69e739d3 runtime: adjust traceTickDiv for non-x86 architectures
Fixes #10554.
Fixes #10623.

Change-Id: I90fbaa34e3d55c8758178f8d2e7fa41ff1194a1b
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9247
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-05-01 07:25:49 +00:00
Russ Cox
79a990b845 runtime: schedule GC work more aggressively
Schedule the work as early as possible, while still respecting the
utilization percentage on average. The old code tried never to
go above the utilization percentage. The new code is willing
to go above the utilization percentage by one time slice
(but of course after doing that it must wait until the percentage
drops back down to the target before it gets another time slice).

The effect is that for concurrent GCs that can run in a small number
of time slices, the time during which write barriers are enabled is
reduced by one mutator + GC time slice round (possibly 30 ms per GC).

This only affects the fractional GC processor (the remainder of GOMAXPROCS/4),
so it matters most in GOMAXPROCS=1, a bit in GOMAXPROCS=2, and not at
all in GOMAXPROCS=4.
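
A hedged sketch of where the dedicated and fractional processors come
from, assuming the 25% utilization target described above:

  // gcProcs splits the 25% GC CPU target into whole dedicated
  // processors plus a fractional remainder (illustrative, not the
  // scheduler's actual code).
  func gcProcs(gomaxprocs int) (dedicated int, fractional float64) {
      target := float64(gomaxprocs) / 4
      dedicated = int(target)
      fractional = target - float64(dedicated)
      return
  }

  // gcProcs(1) = (0, 0.25), gcProcs(2) = (0, 0.50), gcProcs(4) = (1, 0):
  // only the fractional part is affected by this change, which is why
  // it matters most at GOMAXPROCS=1 and not at all at GOMAXPROCS=4.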

GOMAXPROCS=1
name                                      old mean                new mean        delta
BenchmarkBinaryTree17                12.4s × (0.98,1.03)     13.5s × (0.97,1.04)  +8.84% (p=0.000)
BenchmarkFannkuch11                  4.38s × (1.00,1.01)     4.38s × (1.00,1.01)  ~ (p=0.343)
BenchmarkFmtFprintfEmpty            88.9ns × (0.97,1.10)    90.1ns × (0.93,1.14)  ~ (p=0.224)
BenchmarkFmtFprintfString            356ns × (0.94,1.05)     321ns × (0.94,1.12)  -9.77% (p=0.000)
BenchmarkFmtFprintfInt               344ns × (0.98,1.03)     325ns × (0.96,1.03)  -5.46% (p=0.000)
BenchmarkFmtFprintfIntInt            622ns × (0.97,1.03)     571ns × (0.95,1.05)  -8.09% (p=0.000)
BenchmarkFmtFprintfPrefixedInt       462ns × (0.96,1.04)     431ns × (0.95,1.05)  -6.81% (p=0.000)
BenchmarkFmtFprintfFloat             653ns × (0.98,1.03)     621ns × (0.99,1.03)  -4.90% (p=0.000)
BenchmarkFmtManyArgs                2.32µs × (0.97,1.03)    2.19µs × (0.98,1.02)  -5.43% (p=0.000)
BenchmarkGobDecode                  27.0ms × (0.96,1.04)    20.0ms × (0.97,1.04)  -26.06% (p=0.000)
BenchmarkGobEncode                  26.6ms × (0.99,1.01)    17.8ms × (0.95,1.05)  -33.19% (p=0.000)
BenchmarkGzip                        659ms × (0.98,1.03)     650ms × (0.99,1.01)  -1.34% (p=0.000)
BenchmarkGunzip                      145ms × (0.98,1.04)     143ms × (1.00,1.01)  -1.47% (p=0.000)
BenchmarkHTTPClientServer            111µs × (0.97,1.04)     110µs × (0.96,1.03)  -1.30% (p=0.000)
BenchmarkJSONEncode                 52.0ms × (0.97,1.03)    40.8ms × (0.97,1.03)  -21.47% (p=0.000)
BenchmarkJSONDecode                  127ms × (0.98,1.04)     120ms × (0.98,1.02)  -5.55% (p=0.000)
BenchmarkMandelbrot200              6.04ms × (0.99,1.04)    6.02ms × (1.00,1.01)  ~ (p=0.176)
BenchmarkGoParse                    8.62ms × (0.96,1.08)    8.55ms × (0.93,1.09)  ~ (p=0.302)
BenchmarkRegexpMatchEasy0_32         164ns × (0.98,1.05)     165ns × (0.98,1.07)  ~ (p=0.293)
BenchmarkRegexpMatchEasy0_1K         546ns × (0.98,1.06)     547ns × (0.97,1.07)  ~ (p=0.741)
BenchmarkRegexpMatchEasy1_32         142ns × (0.97,1.09)     141ns × (0.97,1.05)  ~ (p=0.231)
BenchmarkRegexpMatchEasy1_1K         904ns × (0.97,1.07)     900ns × (0.98,1.04)  ~ (p=0.294)
BenchmarkRegexpMatchMedium_32        256ns × (0.98,1.06)     256ns × (0.97,1.04)  ~ (p=0.530)
BenchmarkRegexpMatchMedium_1K       74.2µs × (0.98,1.05)    73.8µs × (0.98,1.04)  ~ (p=0.334)
BenchmarkRegexpMatchHard_32         3.94µs × (0.98,1.07)    3.92µs × (0.98,1.05)  ~ (p=0.356)
BenchmarkRegexpMatchHard_1K          119µs × (0.98,1.07)     119µs × (0.98,1.06)  ~ (p=0.467)
BenchmarkRevcomp                     978ms × (0.96,1.09)     984ms × (0.95,1.07)  ~ (p=0.448)
BenchmarkTemplate                    151ms × (0.96,1.03)     142ms × (0.95,1.04)  -5.55% (p=0.000)
BenchmarkTimeParse                   628ns × (0.99,1.01)     628ns × (0.99,1.01)  ~ (p=0.855)
BenchmarkTimeFormat                  729ns × (0.98,1.06)     734ns × (0.97,1.05)  ~ (p=0.149)

GOMAXPROCS=2
name                                      old mean                new mean        delta
BenchmarkBinaryTree17-2              9.80s × (0.97,1.03)     9.85s × (0.99,1.02)  ~ (p=0.444)
BenchmarkFannkuch11-2                4.35s × (0.99,1.01)     4.40s × (0.98,1.05)  ~ (p=0.099)
BenchmarkFmtFprintfEmpty-2          86.7ns × (0.97,1.05)    85.9ns × (0.98,1.04)  ~ (p=0.409)
BenchmarkFmtFprintfString-2          297ns × (0.98,1.01)     297ns × (0.99,1.01)  ~ (p=0.743)
BenchmarkFmtFprintfInt-2             309ns × (0.98,1.02)     310ns × (0.99,1.01)  ~ (p=0.464)
BenchmarkFmtFprintfIntInt-2          525ns × (0.97,1.05)     518ns × (0.99,1.01)  ~ (p=0.151)
BenchmarkFmtFprintfPrefixedInt-2     408ns × (0.98,1.02)     408ns × (0.98,1.03)  ~ (p=0.797)
BenchmarkFmtFprintfFloat-2           603ns × (0.99,1.01)     604ns × (0.98,1.02)  ~ (p=0.588)
BenchmarkFmtManyArgs-2              2.07µs × (0.98,1.02)    2.05µs × (0.99,1.01)  ~ (p=0.091)
BenchmarkGobDecode-2                19.1ms × (0.97,1.01)    19.3ms × (0.97,1.04)  ~ (p=0.195)
BenchmarkGobEncode-2                16.2ms × (0.97,1.03)    16.4ms × (0.99,1.01)  ~ (p=0.069)
BenchmarkGzip-2                      652ms × (0.99,1.01)     651ms × (0.99,1.01)  ~ (p=0.705)
BenchmarkGunzip-2                    143ms × (1.00,1.01)     143ms × (1.00,1.00)  ~ (p=0.665)
BenchmarkHTTPClientServer-2          149µs × (0.92,1.11)     149µs × (0.91,1.08)  ~ (p=0.862)
BenchmarkJSONEncode-2               34.6ms × (0.98,1.02)    37.2ms × (0.99,1.01)  +7.56% (p=0.000)
BenchmarkJSONDecode-2                117ms × (0.99,1.01)     117ms × (0.99,1.01)  ~ (p=0.858)
BenchmarkMandelbrot200-2            6.10ms × (0.99,1.03)    6.03ms × (1.00,1.00)  ~ (p=0.083)
BenchmarkGoParse-2                  8.25ms × (0.98,1.01)    8.21ms × (0.99,1.02)  ~ (p=0.307)
BenchmarkRegexpMatchEasy0_32-2       162ns × (0.99,1.02)     162ns × (0.99,1.01)  ~ (p=0.857)
BenchmarkRegexpMatchEasy0_1K-2       541ns × (0.99,1.01)     540ns × (1.00,1.00)  ~ (p=0.530)
BenchmarkRegexpMatchEasy1_32-2       138ns × (1.00,1.00)     141ns × (0.98,1.04)  +1.88% (p=0.038)
BenchmarkRegexpMatchEasy1_1K-2       887ns × (0.99,1.01)     894ns × (0.99,1.01)  ~ (p=0.087)
BenchmarkRegexpMatchMedium_32-2      252ns × (0.99,1.01)     252ns × (0.99,1.01)  ~ (p=0.954)
BenchmarkRegexpMatchMedium_1K-2     73.4µs × (0.99,1.02)    72.8µs × (1.00,1.01)  -0.87% (p=0.029)
BenchmarkRegexpMatchHard_32-2       3.95µs × (0.97,1.05)    3.87µs × (1.00,1.01)  -2.11% (p=0.035)
BenchmarkRegexpMatchHard_1K-2        117µs × (0.99,1.01)     117µs × (0.99,1.01)  ~ (p=0.669)
BenchmarkRevcomp-2                   980ms × (0.95,1.03)     993ms × (0.94,1.09)  ~ (p=0.527)
BenchmarkTemplate-2                  136ms × (0.98,1.01)     135ms × (0.99,1.01)  ~ (p=0.200)
BenchmarkTimeParse-2                 630ns × (1.00,1.01)     630ns × (1.00,1.00)  ~ (p=0.634)
BenchmarkTimeFormat-2                705ns × (0.99,1.01)     710ns × (0.98,1.02)  ~ (p=0.174)

GOMAXPROCS=4
BenchmarkBinaryTree17-4              9.87s × (0.96,1.04)     9.75s × (0.96,1.03)  ~ (p=0.178)
BenchmarkFannkuch11-4                4.35s × (1.00,1.01)     4.40s × (0.99,1.04)  ~ (p=0.071)
BenchmarkFmtFprintfEmpty-4          85.8ns × (0.98,1.06)    85.6ns × (0.98,1.04)  ~ (p=0.858)
BenchmarkFmtFprintfString-4          306ns × (0.99,1.03)     304ns × (0.97,1.02)  ~ (p=0.470)
BenchmarkFmtFprintfInt-4             317ns × (0.98,1.01)     315ns × (0.98,1.02)  -0.92% (p=0.044)
BenchmarkFmtFprintfIntInt-4          527ns × (0.99,1.01)     525ns × (0.98,1.01)  ~ (p=0.164)
BenchmarkFmtFprintfPrefixedInt-4     421ns × (0.98,1.03)     417ns × (0.99,1.02)  ~ (p=0.092)
BenchmarkFmtFprintfFloat-4           623ns × (0.98,1.02)     618ns × (0.98,1.03)  ~ (p=0.172)
BenchmarkFmtManyArgs-4              2.09µs × (0.98,1.02)    2.09µs × (0.98,1.02)  ~ (p=0.679)
BenchmarkGobDecode-4                18.6ms × (0.99,1.01)    18.6ms × (0.98,1.03)  ~ (p=0.595)
BenchmarkGobEncode-4                15.0ms × (0.98,1.02)    15.1ms × (0.99,1.01)  ~ (p=0.301)
BenchmarkGzip-4                      659ms × (0.98,1.04)     660ms × (0.97,1.02)  ~ (p=0.724)
BenchmarkGunzip-4                    145ms × (0.98,1.04)     144ms × (0.99,1.04)  ~ (p=0.671)
BenchmarkHTTPClientServer-4          139µs × (0.97,1.02)     138µs × (0.99,1.02)  ~ (p=0.392)
BenchmarkJSONEncode-4               35.0ms × (0.99,1.02)    35.1ms × (0.98,1.02)  ~ (p=0.777)
BenchmarkJSONDecode-4                119ms × (0.98,1.01)     118ms × (0.98,1.02)  ~ (p=0.710)
BenchmarkMandelbrot200-4            6.02ms × (1.00,1.00)    6.02ms × (1.00,1.00)  ~ (p=0.289)
BenchmarkGoParse-4                  7.96ms × (0.99,1.01)    7.96ms × (0.99,1.01)  ~ (p=0.884)
BenchmarkRegexpMatchEasy0_32-4       164ns × (0.98,1.04)     166ns × (0.97,1.04)  ~ (p=0.221)
BenchmarkRegexpMatchEasy0_1K-4       540ns × (0.99,1.01)     552ns × (0.97,1.04)  +2.10% (p=0.018)
BenchmarkRegexpMatchEasy1_32-4       140ns × (0.99,1.04)     142ns × (0.97,1.04)  ~ (p=0.226)
BenchmarkRegexpMatchEasy1_1K-4       896ns × (0.99,1.03)     907ns × (0.97,1.04)  ~ (p=0.155)
BenchmarkRegexpMatchMedium_32-4      255ns × (0.99,1.04)     255ns × (0.98,1.04)  ~ (p=0.904)
BenchmarkRegexpMatchMedium_1K-4     73.4µs × (0.99,1.04)    73.8µs × (0.98,1.04)  ~ (p=0.560)
BenchmarkRegexpMatchHard_32-4       3.93µs × (0.98,1.04)    3.95µs × (0.98,1.04)  ~ (p=0.571)
BenchmarkRegexpMatchHard_1K-4        117µs × (1.00,1.01)     119µs × (0.98,1.04)  +1.48% (p=0.048)
BenchmarkRevcomp-4                   990ms × (0.94,1.08)     989ms × (0.94,1.10)  ~ (p=0.957)
BenchmarkTemplate-4                  137ms × (0.98,1.02)     137ms × (0.99,1.01)  ~ (p=0.996)
BenchmarkTimeParse-4                 629ns × (1.00,1.00)     629ns × (0.99,1.01)  ~ (p=0.924)
BenchmarkTimeFormat-4                710ns × (0.99,1.01)     716ns × (0.98,1.02)  +0.84% (p=0.033)

Change-Id: I43a04e0f6ad5e3ba9847dddf12e13222561f9cf4
Reviewed-on: https://go-review.googlesource.com/9543
Reviewed-by: Austin Clements <austin@google.com>
2015-04-30 15:50:12 +00:00
Austin Clements
3ca20218c1 runtime: fix gcDumpObject on non-heap pointers
gcDumpObject is used to print the source and destination objects when
checkmark find a missing mark. However, gcDumpObject currently assumes
the given pointer will point to a heap object. This is not true of the
source object during root marking and may not even be true of the
destination object in the limited situations where the heap points
back in to the stack.

If the pointer isn't a heap object, gcDumpObject will attempt an
out-of-bounds access to h_spans. This will cause a panicslice, which
will attempt to construct a useful panic message. This will cause a
string allocation, which will lead mallocgc to panic because the GC is
in mark termination (checkmark only happens during mark termination).

Fix this by checking that the pointer points into the heap arena
before attempting to use it as an arena pointer.
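
A minimal sketch of the guard, with assumed bounds names:

  // inHeapArena reports whether p falls inside the heap arena
  // [arenaStart, arenaUsed); dump code should bail out before
  // indexing the span table when this returns false.
  func inHeapArena(p, arenaStart, arenaUsed uintptr) bool {
      return arenaStart <= p && p < arenaUsed
  }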

Change-Id: I09da600c380d4773f1f8f38e45b82cb229ea6382
Reviewed-on: https://go-review.googlesource.com/9498
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-30 14:53:51 +00:00
Keith Randall
4b78c9575d runtime: print stack of G during a signal
Sequence of operations:
- Go code does a systemstack call
- during the systemstack call, receive a signal
- signal requests a traceback of all goroutines

The original G is still marked as _Grunning, so the traceback code
refuses to print its stack.

Fix by allowing traceback of a G whose caller is on the same M as
the G itself; the G can't be modifying its stack if that is the case.

Fixes #10546

Change-Id: I2bcea48c0197fbf78ab6fa080027cd80181083ad
Reviewed-on: https://go-review.googlesource.com/9435
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-29 19:25:10 +00:00
Shenghou Ma
4d1ab2d8d1 runtime: re-enable TestNewProc0 on android/arm and fix heap corruption
The problem is not actually specific to android/arm. Linux/ARM's
runtime.clone sets the stack pointer to child_stk-4 before calling
the fn. Then, when fn returns, it tries to write to 4(R13) to
provide the argument for runtime.exit, which is just beyond the
allocated child stack, and thus it will randomly corrupt the heap or
trigger a segfault if that memory happens to be unmapped.

While we're here, shorten the test polling interval to 0.1s to
speed up the test (it was only checking at a 1s interval, which
means the test took at least 1s).

Fixes #10548.

Change-Id: I57cd63232022b113b6cd61e987b0684ebcce930a
Reviewed-on: https://go-review.googlesource.com/9457
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-29 19:18:07 +00:00
Russ Cox
c26fc88d56 runtime/pprof: write heap statistics to heap profile always
The heap statistics were only written if asked for a profile with debug > 0,
but that also prints a stack trace for each profile line, which is comparatively
much noisier. The statistics are short enough and separate enough
(they only appear at the end) and useful enough that we can print them
always.

This means that people using -test.memprofile in tests will get a memory
profile with statistics included now. Pprof won't care, but if people care to
look, the numbers will be there.

This avoids the need for hacks like using -memprofilerate=1 to find
the number of allocations.

Change-Id: I10a4f593403d0315aad11b37c6e554b734caa73f
Reviewed-on: https://go-review.googlesource.com/9491
Reviewed-by: David Chase <drchase@google.com>
2015-04-29 18:07:43 +00:00
Keith Randall
c526f3ac10 runtime: tail call into memeq/cmp body implementations
There's no need to call/ret to the body implementation.
It can write the result to the right place.  Just jump to
it and have it return to our caller.

Old:
  call body implementation
  compute result
  put result in a register
  return
  write register to result location
  return

New:
  load address of result location into a register
  jump to body implementation
  compute result
  write result to passed-in address
  return

It's a bit tricky on 386 because there is no free register
with which to pass the result location.  Free up a register
by keeping around blen-alen instead of both alen and blen.

Change-Id: If2cf0682a5bf1cc592bdda7c126ed4eee8944fba
Reviewed-on: https://go-review.googlesource.com/9202
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-04-29 04:46:25 +00:00
Shenghou Ma
7e49c8193c runtime: skip gdb goroutine backtrace test on non-x86
Gdb is not able to backtrace our non-standard stack frames on RISC
architectures without a frame pointer.

Change-Id: Id62a566ce2d743602ded2da22ff77b9ae34bc5ae
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9456
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-29 04:44:38 +00:00
Shenghou Ma
da11a9dda3 cmd/internal/ld, runtime: unify stack reservation in PE header and runtime
With a 128KB stack reservation, on 32-bit Windows the maximum
number of threads is ~9000.

The original 65535-byte stack commit is causing problems on Windows
XP, where it makes the stack reservation 1MB despite the fact
that the runtime specified 128KB.

While we're here, also fix the extra spacing in the "unable to
create more OS thread" error message: println inserts a space
between each argument.

See #9457 for more information.

Change-Id: I3a82f7d9717d3d55211b6eb1c34b00b0eaad83ed
Reviewed-on: https://go-review.googlesource.com/2237
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Minux Ma <minux@golang.org>
2015-04-29 03:27:10 +00:00
Shenghou Ma
e7dd28891e cmd/internal/gc, cmd/[56789]g: rename stackcopy to blockcopy
To avoid confusion with the runtime concept of copying stack.

Change-Id: I33442377b71012c2482c2d0ddd561492c71e70d0
Reviewed-on: https://go-review.googlesource.com/8639
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-29 00:28:01 +00:00
Ian Lance Taylor
0c62c93a09 runtime/cgo: use PTHREAD_{MUTEX,COND}_INITIALIZER
Technically you must initialize static pthread_mutex_t and
pthread_cond_t variables with the appropriate INITIALIZER macro.  In
practice the default initializers are zero anyhow, but it's still good
code hygiene.

Change-Id: I517304b16c2c7943b3880855c1b47a9a506b4bdf
Reviewed-on: https://go-review.googlesource.com/9433
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-28 22:27:26 +00:00
Austin Clements
63caec5dee runtime: eliminate one heapBitsForObject from scanobject
scanobject with ptrmask!=nil is only ever called with the base
pointer of a heap object. Currently, scanobject calls
heapBitsForObject, which goes to a great deal of trouble to check
that the pointer points into the heap and to find the base of the
object it points to, both of which are completely unnecessary in
this case.

Replace this call to heapBitsForObject with much simpler logic to
fetch the span and compute the heap bits.

Benchmark results with five runs:

name                                    old mean                new mean        delta
BenchmarkBinaryTree17              9.21s × (0.95,1.02)     8.55s × (0.91,1.03)  -7.16% (p=0.022)
BenchmarkFannkuch11                2.65s × (1.00,1.00)     2.62s × (1.00,1.00)  -1.10% (p=0.000)
BenchmarkFmtFprintfEmpty          73.2ns × (0.99,1.01)    71.7ns × (1.00,1.01)  -1.99% (p=0.004)
BenchmarkFmtFprintfString          302ns × (0.99,1.00)     292ns × (0.98,1.02)  -3.31% (p=0.020)
BenchmarkFmtFprintfInt             281ns × (0.98,1.01)     279ns × (0.96,1.02)  ~ (p=0.596)
BenchmarkFmtFprintfIntInt          482ns × (0.98,1.01)     488ns × (0.95,1.02)  ~ (p=0.419)
BenchmarkFmtFprintfPrefixedInt     382ns × (0.99,1.01)     365ns × (0.96,1.02)  -4.35% (p=0.015)
BenchmarkFmtFprintfFloat           475ns × (0.99,1.01)     472ns × (1.00,1.00)  ~ (p=0.108)
BenchmarkFmtManyArgs              1.89µs × (1.00,1.01)    1.90µs × (0.94,1.02)  ~ (p=0.883)
BenchmarkGobDecode                22.4ms × (0.99,1.01)    21.9ms × (0.92,1.04)  ~ (p=0.332)
BenchmarkGobEncode                24.7ms × (0.98,1.02)    23.9ms × (0.87,1.07)  ~ (p=0.407)
BenchmarkGzip                      397ms × (0.99,1.01)     398ms × (0.99,1.01)  ~ (p=0.718)
BenchmarkGunzip                   96.7ms × (1.00,1.00)    96.9ms × (1.00,1.00)  ~ (p=0.230)
BenchmarkHTTPClientServer         71.5µs × (0.98,1.01)    68.5µs × (0.92,1.06)  ~ (p=0.243)
BenchmarkJSONEncode               46.1ms × (0.98,1.01)    44.9ms × (0.98,1.03)  -2.51% (p=0.040)
BenchmarkJSONDecode               86.1ms × (0.99,1.01)    86.5ms × (0.99,1.01)  ~ (p=0.343)
BenchmarkMandelbrot200            4.12ms × (1.00,1.00)    4.13ms × (1.00,1.00)  +0.23% (p=0.000)
BenchmarkGoParse                  5.89ms × (0.96,1.03)    5.82ms × (0.96,1.04)  ~ (p=0.522)
BenchmarkRegexpMatchEasy0_32       141ns × (0.99,1.01)     142ns × (1.00,1.00)  ~ (p=0.178)
BenchmarkRegexpMatchEasy0_1K       408ns × (1.00,1.00)     392ns × (0.99,1.00)  -3.83% (p=0.000)
BenchmarkRegexpMatchEasy1_32       122ns × (1.00,1.00)     122ns × (1.00,1.00)  ~ (p=0.178)
BenchmarkRegexpMatchEasy1_1K       626ns × (1.00,1.01)     624ns × (0.99,1.00)  ~ (p=0.122)
BenchmarkRegexpMatchMedium_32      202ns × (0.99,1.00)     205ns × (0.99,1.01)  +1.58% (p=0.001)
BenchmarkRegexpMatchMedium_1K     54.4µs × (1.00,1.00)    55.5µs × (1.00,1.00)  +1.86% (p=0.000)
BenchmarkRegexpMatchHard_32       2.68µs × (1.00,1.00)    2.71µs × (1.00,1.00)  +0.97% (p=0.002)
BenchmarkRegexpMatchHard_1K       79.8µs × (1.00,1.01)    80.5µs × (1.00,1.01)  +0.94% (p=0.003)
BenchmarkRevcomp                   590ms × (0.99,1.01)     585ms × (1.00,1.00)  ~ (p=0.066)
BenchmarkTemplate                  111ms × (0.97,1.02)     112ms × (0.99,1.01)  ~ (p=0.201)
BenchmarkTimeParse                 392ns × (1.00,1.00)     385ns × (1.00,1.00)  -1.69% (p=0.000)
BenchmarkTimeFormat                449ns × (0.98,1.01)     448ns × (0.99,1.01)  ~ (p=0.550)

Change-Id: Ie7c3830c481d96c9043e7bf26853c6c1d05dc9f4
Reviewed-on: https://go-review.googlesource.com/9364
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-28 15:22:20 +00:00
Russ Cox
32d6fbcb4f runtime: replace needwb() with writeBarrierEnabled
Reduce the write barrier check to a single load and compare
so that it can be inlined into write barrier use sites.
Makes the standard write barrier a little faster too.
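
A hedged sketch of the shape this takes at a use site (all names here
are illustrative, not the compiler's actual output):

  var writeBarrierEnabled bool // the single flag loaded at each use site

  // writePointer shows the shape of a compiled pointer write: one
  // load and compare on the fast path, an out-of-line call otherwise.
  func writePointer(dst **int, src *int) {
      if writeBarrierEnabled {
          writeBarrierSlow(dst, src) // hypothetical slow path
      } else {
          *dst = src
      }
  }

  func writeBarrierSlow(dst **int, src *int) {
      *dst = src // plus GC bookkeeping in the real runtime
  }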

name                                       old                     new          delta
BenchmarkBinaryTree17              17.9s × (0.99,1.01)     17.9s × (1.00,1.01)  ~
BenchmarkFannkuch11                4.35s × (1.00,1.00)     4.43s × (1.00,1.00)  +1.81%
BenchmarkFmtFprintfEmpty           120ns × (0.93,1.06)     110ns × (1.00,1.06)  -7.92%
BenchmarkFmtFprintfString          479ns × (0.99,1.00)     487ns × (0.99,1.00)  +1.67%
BenchmarkFmtFprintfInt             452ns × (0.99,1.02)     450ns × (0.99,1.00)  ~
BenchmarkFmtFprintfIntInt          766ns × (0.99,1.01)     762ns × (1.00,1.00)  ~
BenchmarkFmtFprintfPrefixedInt     576ns × (0.98,1.01)     584ns × (0.99,1.01)  ~
BenchmarkFmtFprintfFloat           730ns × (1.00,1.01)     738ns × (1.00,1.00)  +1.16%
BenchmarkFmtManyArgs              2.84µs × (0.99,1.00)    2.80µs × (1.00,1.01)  -1.22%
BenchmarkGobDecode                39.3ms × (0.98,1.01)    39.0ms × (0.99,1.00)  ~
BenchmarkGobEncode                39.5ms × (0.99,1.01)    37.8ms × (0.98,1.01)  -4.33%
BenchmarkGzip                      663ms × (1.00,1.01)     661ms × (0.99,1.01)  ~
BenchmarkGunzip                    143ms × (1.00,1.00)     142ms × (1.00,1.00)  ~
BenchmarkHTTPClientServer          132µs × (0.99,1.01)     132µs × (0.99,1.01)  ~
BenchmarkJSONEncode               57.4ms × (0.99,1.01)    56.3ms × (0.99,1.01)  -1.96%
BenchmarkJSONDecode                139ms × (0.99,1.00)     138ms × (0.99,1.01)  ~
BenchmarkMandelbrot200            6.03ms × (1.00,1.00)    6.01ms × (1.00,1.00)  ~
BenchmarkGoParse                  10.3ms × (0.89,1.14)    10.2ms × (0.87,1.05)  ~
BenchmarkRegexpMatchEasy0_32       209ns × (1.00,1.00)     208ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy0_1K       591ns × (0.99,1.00)     588ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy1_32       184ns × (0.99,1.02)     182ns × (0.99,1.01)  ~
BenchmarkRegexpMatchEasy1_1K      1.01µs × (1.00,1.00)    0.99µs × (1.00,1.01)  -2.33%
BenchmarkRegexpMatchMedium_32      330ns × (1.00,1.00)     323ns × (1.00,1.01)  -2.12%
BenchmarkRegexpMatchMedium_1K     92.6µs × (1.00,1.00)    89.9µs × (1.00,1.00)  -2.92%
BenchmarkRegexpMatchHard_32       4.80µs × (0.95,1.00)    4.72µs × (0.95,1.01)  ~
BenchmarkRegexpMatchHard_1K        136µs × (1.00,1.00)     133µs × (1.00,1.01)  -1.86%
BenchmarkRevcomp                   900ms × (0.99,1.04)     900ms × (1.00,1.05)  ~
BenchmarkTemplate                  172ms × (1.00,1.00)     168ms × (0.99,1.01)  -2.07%
BenchmarkTimeParse                 637ns × (1.00,1.00)     637ns × (1.00,1.00)  ~
BenchmarkTimeFormat                744ns × (1.00,1.01)     738ns × (1.00,1.00)  -0.67%

Change-Id: I4ecc925805da1f5ee264377f1f7574f54ee575e7
Reviewed-on: https://go-review.googlesource.com/9321
Reviewed-by: Austin Clements <austin@google.com>
2015-04-28 01:37:53 +00:00
Russ Cox
2050f57141 runtime: change unused argument in fat write barriers from pointer to scalar
The argument is unused, only present for alignment of the
following argument. The compiler today always passes a zero
but I'd rather not write anything there during the call sequence,
so mark it as a scalar so the garbage collector won't look at it.

As expected, no significant performance change.

name                                       old                     new          delta
BenchmarkBinaryTree17              17.9s × (0.99,1.00)     17.9s × (0.99,1.01)  ~
BenchmarkFannkuch11                4.35s × (1.00,1.00)     4.35s × (1.00,1.00)  ~
BenchmarkFmtFprintfEmpty           120ns × (0.94,1.05)     120ns × (0.93,1.06)  ~
BenchmarkFmtFprintfString          477ns × (1.00,1.00)     479ns × (0.99,1.00)  ~
BenchmarkFmtFprintfInt             450ns × (0.99,1.01)     452ns × (0.99,1.02)  ~
BenchmarkFmtFprintfIntInt          765ns × (0.99,1.01)     766ns × (0.99,1.01)  ~
BenchmarkFmtFprintfPrefixedInt     569ns × (0.99,1.01)     576ns × (0.98,1.01)  ~
BenchmarkFmtFprintfFloat           728ns × (1.00,1.00)     730ns × (1.00,1.01)  ~
BenchmarkFmtManyArgs              2.82µs × (0.99,1.01)    2.84µs × (0.99,1.00)  ~
BenchmarkGobDecode                39.1ms × (0.99,1.01)    39.3ms × (0.98,1.01)  ~
BenchmarkGobEncode                39.4ms × (0.99,1.01)    39.5ms × (0.99,1.01)  ~
BenchmarkGzip                      661ms × (0.99,1.01)     663ms × (1.00,1.01)  ~
BenchmarkGunzip                    143ms × (1.00,1.00)     143ms × (1.00,1.00)  ~
BenchmarkHTTPClientServer          133µs × (0.99,1.01)     132µs × (0.99,1.01)  ~
BenchmarkJSONEncode               57.3ms × (0.99,1.04)    57.4ms × (0.99,1.01)  ~
BenchmarkJSONDecode                139ms × (0.99,1.00)     139ms × (0.99,1.00)  ~
BenchmarkMandelbrot200            6.02ms × (1.00,1.00)    6.03ms × (1.00,1.00)  ~
BenchmarkGoParse                  9.72ms × (0.92,1.11)   10.31ms × (0.89,1.14)  ~
BenchmarkRegexpMatchEasy0_32       209ns × (1.00,1.01)     209ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy0_1K       592ns × (0.99,1.00)     591ns × (0.99,1.00)  ~
BenchmarkRegexpMatchEasy1_32       183ns × (0.98,1.01)     184ns × (0.99,1.02)  ~
BenchmarkRegexpMatchEasy1_1K      1.01µs × (1.00,1.01)    1.01µs × (1.00,1.00)  ~
BenchmarkRegexpMatchMedium_32      330ns × (1.00,1.00)     330ns × (1.00,1.00)  ~
BenchmarkRegexpMatchMedium_1K     92.4µs × (1.00,1.00)    92.6µs × (1.00,1.00)  ~
BenchmarkRegexpMatchHard_32       4.77µs × (0.95,1.01)    4.80µs × (0.95,1.00)  ~
BenchmarkRegexpMatchHard_1K        136µs × (1.00,1.00)     136µs × (1.00,1.00)  ~
BenchmarkRevcomp                   906ms × (0.99,1.05)     900ms × (0.99,1.04)  ~
BenchmarkTemplate                  171ms × (0.99,1.01)     172ms × (1.00,1.00)  ~
BenchmarkTimeParse                 638ns × (1.00,1.00)     637ns × (1.00,1.00)  ~
BenchmarkTimeFormat                745ns × (0.99,1.02)     744ns × (1.00,1.01)  ~

Change-Id: I0aeac5dc7adfd75e2223e3aabfedc7818d339f9b
Reviewed-on: https://go-review.googlesource.com/9320
Reviewed-by: Austin Clements <austin@google.com>
2015-04-28 01:37:45 +00:00
Austin Clements
02ba71e547 runtime/race: fix failing tests
Some race tests were sensitive to the goroutine scheduling order.
When this changed in commit e870f06, these tests started to fail.

Fix TestRaceHeapParam by ensuring that the racing goroutine has
run before the test exits. Fix TestRaceRWMutexMultipleReaders by
adding a third reader to ensure that two readers wind up on the
same side of the writer (and race with each other) regardless of
the schedule. Fix TestRaceRange by ensuring that the racing
goroutine runs before the main goroutine exits the loop it races
with.

Change-Id: Iaf002f8730ea42227feaf2f3c51b9a1e57ccffdd
Reviewed-on: https://go-review.googlesource.com/9402
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-27 23:12:00 +00:00
Russ Cox
f774e6a1f8 runtime/race: stop listening to external network addresses
This makes the OS X firewall box pop up.
Not run during all.bash so hasn't been noticed before.

Change-Id: I78feb4fd3e1d3c983ae3419085048831c04de3da
Reviewed-on: https://go-review.googlesource.com/9401
Reviewed-by: Austin Clements <austin@google.com>
2015-04-27 23:11:45 +00:00
Austin Clements
7c7cd69591 runtime: fix stack use accounting
ReadMemStats accounts for stacks slightly differently than the runtime
does internally. Internally, only stacks allocated by newosproc0 are
accounted in memstats.stacks_sys and other stacks are accounted in
heap_sys. readmemstats_m shuffles the statistics so all stacks are
accounted in StackSys rather than HeapSys.

However, currently, readmemstats_m assumes StackSys will be zero when
it does this shuffle. This was true until commit 6ad33be. If it isn't
(e.g., if something called newosproc0), StackSys+HeapSys will be
different before and after this shuffle, and the Sys sum that was
computed earlier will no longer agree with the sum of its components.

Fix this by making the shuffle in readmemstats_m not assume that
StackSys is zero.

Fixes #10585.

Change-Id: If13991c8de68bd7b85e1b613d3f12b4fd6fd5813
Reviewed-on: https://go-review.googlesource.com/9366
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-27 23:09:39 +00:00
David Crawshaw
d707a6e0e2 runtime: remove unnecessary noescape to fix netbsd
I introduced this build failure in golang.org/cl/9302 but failed to
notice due to the other failures on the dashboard.

Change-Id: I84bf00f664ba572c1ca722e0136d8a2cf21613ca
Reviewed-on: https://go-review.googlesource.com/9363
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-27 23:04:38 +00:00
Austin Clements
23ce80efeb runtime/race: fix benchmark deadlock
Currently TestRaceCrawl fails to call wg.Done for every wg.Add if the
depth ever reaches 0. This causes the test to deadlock. Under the race
detector, this deadlock is not detected, so the test eventually times
out.

This only recently became a problem. Prior to commit e870f06 the depth
would never reach 0 because the strict round-robin goroutine schedule
ensured that all of the URLs were already "seen" by depth 2. Now that
the runtime prefers scheduling the most recently started goroutine,
the test is able to reach depth 0 and trigger this deadlock.

Change-Id: I5176302a89614a344c84d587073b364833af6590
Reviewed-on: https://go-review.googlesource.com/9344
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-27 20:54:34 +00:00
Russ Cox
42da270024 runtime: fix race in BenchmarkPingPongHog
The master goroutine was returning before
the child goroutine had done its final i < b.N
(the one that fails and causes it to exit the loop)
and then the benchmark harness was updating
b.N, causing a read+write race on b.N.

Change-Id: I2504270a0de30544736f6c32161337a25b505c3e
Reviewed-on: https://go-review.googlesource.com/9368
Reviewed-by: Austin Clements <austin@google.com>
2015-04-27 20:10:11 +00:00
Austin Clements
33e0f3d853 runtime: fix some out of date comments and typos
Change-Id: I061057414c722c5a0f03c709528afc8554114db6
Reviewed-on: https://go-review.googlesource.com/9367
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 20:08:38 +00:00
Josh Bleecher Snyder
9a0fd97ff3 runtime: remove a modulus calculation from pollorder
This is a follow-up to CL 9269, as suggested
by dvyukov.

There is probably even more that can be done
to speed up this shuffle. It will matter more
once CL 7570 (fine-grained locking in select)
is in and can be revisited then, with benchmarks.

Change-Id: Ic13a27d11cedd1e1f007951214b3bb56b1644f02
Reviewed-on: https://go-review.googlesource.com/9393
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-27 19:36:37 +00:00
Austin Clements
1b01910c06 runtime: rename gcController.findRunnable to findRunnableGCWorker
This avoids confusion with the main findrunnable in the scheduler.

Change-Id: I8cf40657557a8610a2fe5a2f74598518256ca7f0
Reviewed-on: https://go-review.googlesource.com/9305
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:42 +00:00
Austin Clements
bb6320535d runtime: replace STW for enabling write barriers with ragged barrier
Currently, we use a full stop-the-world around enabling write
barriers. This is to ensure that all Gs have enabled write barriers
before any blackening occurs (either in gcBgMarkWorker() or in
gcAssistAlloc()).

However, there's no need to bring the whole world to a synchronous
stop to ensure this. This change replaces the STW with a ragged
barrier that ensures each P has individually observed that write
barriers should be enabled before GC performs any blackening.

Change-Id: If2f129a6a55bd8bdd4308067af2b739f3fb41955
Reviewed-on: https://go-review.googlesource.com/8207
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:37 +00:00
Austin Clements
57afa76471 runtime: add ragged global barrier function
This adds forEachP, which performs a general-purpose ragged global
barrier. forEachP takes a callback and invokes it for every P at a GC
safe point.

Ps that are idle or in a syscall are considered to be at a continuous
safe point. forEachP ensures that these Ps do not change state by
forcing all syscall Ps into idle and holding the sched.lock.

To ensure that Ps do not enter syscall or idle without running the
safe-point function, this adds checks for a pending callback every
place there is currently a gcwaiting check.

We'll use forEachP to replace the STW around enabling the write
barrier and to replace the current asynchronous per-M wbuf cache with
a cooperatively managed per-P gcWork cache.
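
A toy sketch of the barrier's contract, with stand-in types; the real
version must force syscall Ps idle, hold sched.lock, and wait for each
running P to reach a safe point rather than simply looping:

    type P struct{ id int }

    var allP []*P

    // forEachP invokes fn for every P at a point where that P is not
    // mutating; idle and syscall Ps count as continuously safe.
    func forEachP(fn func(*P)) {
        for _, p := range allP {
            // The real version synchronizes with p here.
            fn(p)
        }
    }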

Change-Id: Ie944f8ce1fead7c79bf271d2f42fcd61a41bb3cc
Reviewed-on: https://go-review.googlesource.com/8206
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:33 +00:00
Austin Clements
b0b1a66052 runtime: reset spinning in mspinning if work was ready()ed
This fixes a bug where the runtime ready()s a goroutine while setting
up a new M that's initially marked as spinning, causing the scheduler
to later panic when it finds work in the run queue of a P associated
with a spinning M. Specifically, the sequence of events that can lead
to this is:

1) sysmon calls handoffp to hand off a P stolen from a syscall.

2) handoffp sees no pending work on the P, so it calls startm with
   spinning set.

3) startm calls newm, which in turn calls allocm to allocate a new M.

4) allocm "borrows" the P we're handing off in order to do allocation
   and performs this allocation.

5) This allocation may assist the garbage collector, and this assist
   may detect the end of concurrent mark and ready() the main GC
   goroutine to signal this.

6) This ready()ing puts the GC goroutine on the run queue of the
   borrowed P.

7) newm starts the OS thread, which runs mstart and subsequently
   mstart1, which marks the M spinning because startm was called with
   spinning set.

8) mstart1 enters the scheduler, which panics because there's work on
   the run queue, but the M is marked spinning.

To fix this, before marking the M spinning in step 7, add a check to
see if work has been added to the P's run queue. If so, undo the
spinning instead.
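
A sketch of that check, with stand-in types for M and P (the real code
also fixes up the scheduler's spinning-M bookkeeping):

    type P struct{ runqhead, runqtail uint32 }

    type M struct {
        spinning bool
        p        *P
    }

    // Work may have been ready()d onto our P between the handoff and
    // this thread entering the scheduler; if so, we are not spinning.
    func fixSpinning(m *M) {
        if m.spinning && m.p.runqhead != m.p.runqtail {
            m.spinning = false
        }
    }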

Fixes #10573.

Change-Id: I4670495ae00582144a55ce88c45ae71de597cfa5
Reviewed-on: https://go-review.googlesource.com/9332
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-27 12:49:54 +00:00
Austin Clements
2a46f55b35 runtime: panic when idling a P with runnable Gs
This adds a check that we never put a P on the idle list when it has
work on its local run queue.

Change-Id: Ifcfab750de60c335148a7f513d4eef17be03b6a7
Reviewed-on: https://go-review.googlesource.com/9324
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-27 12:49:49 +00:00
Josh Bleecher Snyder
fd5540e7e5 runtime: tighten select permutation generation
This is the optimization made to math/rand in CL 21030043.
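
That optimization builds the permutation in a single pass (the
"inside-out" Fisher-Yates shuffle) instead of initializing the slice
and then shuffling it. A sketch, using math/rand in place of the
runtime's own random source:

    import "math/rand"

    // permutation returns a uniform random permutation of [0, n).
    // order[0] is implicitly 0 before the loop runs.
    func permutation(n int) []int {
        order := make([]int, n)
        for i := 1; i < n; i++ {
            j := rand.Intn(i + 1)
            order[i] = order[j]
            order[j] = i
        }
        return order
    }

The single pass is valid because after each iteration order[0..i] is a
uniform permutation of {0, ..., i}.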

Change-Id: I231b24fa77cac1fe74ba887db76313b5efaab3e8
Reviewed-on: https://go-review.googlesource.com/9269
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-27 02:36:24 +00:00
David Crawshaw
a5b693b431 runtime: signal forwarding for darwin/amd64
Follows the linux signal forwarding semantics from
http://golang.org/cl/8712, sharing the implementation of sigfwdgo.
Forwarding for 386, arm, and arm64 will follow.

Change-Id: I6bf30d563d19da39b6aec6900c7fe12d82ed4f62
Reviewed-on: https://go-review.googlesource.com/9302
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-26 13:46:13 +00:00
Rick Hudson
ada8cdb9f6 runtime: Fix bug due to elided return.
A previous change to mbitmap.go dropped a return on a
path that seems not to be exercised. This was a mistake that
this CL fixes.

Change-Id: I715ee4ef08f5bf8d9f53cee84e8fb31a237e2d43
Reviewed-on: https://go-review.googlesource.com/9295
Reviewed-by: Austin Clements <austin@google.com>
2015-04-24 21:52:30 +00:00
Austin Clements
1b4025f4bd runtime: replace per-M workbuf cache with per-P gcWork cache
Currently, each M has a cache of the most recently used *workbuf. This
is used primarily by the write barrier so it doesn't have to access
the global workbuf lists on every write barrier. It's also used by
stack scanning because it's convenient.

This cache is important for write barrier performance, but this
particular approach has several downsides. It's faster than no cache,
but far from optimal (as the benchmarks below show). It's complex:
access to the cache is sprinkled through most of the workbuf list
operations and it requires special care to transform into and back out
of the gcWork cache that's actually used for scanning and marking. It
requires atomic exchanges to take ownership of the cached workbuf and
to return it to the M's cache even though it's almost always used by
only the current M. Since it's per-M, flushing these caches is O(# of
Ms), which may be high. And it has some significant subtleties: for
example, in general the cache shouldn't be used after the
harvestwbufs() in mark termination because it could hide work from
mark termination, but stack scanning can happen after this and *will*
use the cache (but it turns out this is okay because it will always be
followed by a getfull(), which drains the cache).

This change replaces this cache with a per-P gcWork object. This
gcWork cache can be used directly by scanning and marking (as long as
preemption is disabled, which is a general requirement of gcWork).
Since it's per-P, it doesn't require synchronization, which simplifies
things and means the only atomic operations in the write barrier are
occasionally fetching new work buffers and setting a mark bit if the
object isn't already marked. This cache can be flushed in O(# of Ps),
which is generally small. It follows a simple flushing rule: the cache
can be used during any phase, but during mark termination it must be
flushed before allowing preemption. This also makes the dispose during
mutator assist no longer necessary, which eliminates the vast majority
of gcWork dispose calls and reduces contention on the global workbuf
lists. And it's a lot faster on some benchmarks:

benchmark                          old ns/op       new ns/op       delta
BenchmarkBinaryTree17              11963668673     11206112763     -6.33%
BenchmarkFannkuch11                2643217136      2649182499      +0.23%
BenchmarkFmtFprintfEmpty           70.4            70.2            -0.28%
BenchmarkFmtFprintfString          364             307             -15.66%
BenchmarkFmtFprintfInt             317             282             -11.04%
BenchmarkFmtFprintfIntInt          512             483             -5.66%
BenchmarkFmtFprintfPrefixedInt     404             380             -5.94%
BenchmarkFmtFprintfFloat           521             479             -8.06%
BenchmarkFmtManyArgs               2164            1894            -12.48%
BenchmarkGobDecode                 30366146        22429593        -26.14%
BenchmarkGobEncode                 29867472        26663152        -10.73%
BenchmarkGzip                      391236616       396779490       +1.42%
BenchmarkGunzip                    96639491        96297024        -0.35%
BenchmarkHTTPClientServer          100110          70763           -29.31%
BenchmarkJSONEncode                51866051        52511382        +1.24%
BenchmarkJSONDecode                103813138       86094963        -17.07%
BenchmarkMandelbrot200             4121834         4120886         -0.02%
BenchmarkGoParse                   16472789        5879949         -64.31%
BenchmarkRegexpMatchEasy0_32       140             140             +0.00%
BenchmarkRegexpMatchEasy0_1K       394             394             +0.00%
BenchmarkRegexpMatchEasy1_32       120             120             +0.00%
BenchmarkRegexpMatchEasy1_1K       621             614             -1.13%
BenchmarkRegexpMatchMedium_32      209             202             -3.35%
BenchmarkRegexpMatchMedium_1K      54889           55175           +0.52%
BenchmarkRegexpMatchHard_32        2682            2675            -0.26%
BenchmarkRegexpMatchHard_1K        79383           79524           +0.18%
BenchmarkRevcomp                   584116718       584595320       +0.08%
BenchmarkTemplate                  125400565       109620196       -12.58%
BenchmarkTimeParse                 386             387             +0.26%
BenchmarkTimeFormat                580             447             -22.93%

(Best out of 10 runs. The delta of averages is similar.)

This also puts us in a good position to flush these caches when
nearing the end of concurrent marking, which will let us increase the
size of the work buffers while still controlling mark termination
pause time.

Change-Id: I2dd94c8517a19297a98ec280203cccaa58792522
Reviewed-on: https://go-review.googlesource.com/9178
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 20:10:14 +00:00
Austin Clements
d1cae6358c runtime: fix check for pending GC work
When findRunnable considers running a fractional mark worker, it first
checks if there's any work to be done; if there isn't there's no point
in running the worker because it will just reschedule immediately.
However, currently findRunnable just checks work.full and
work.partial, whereas getfull can *also* draw work from m.currentwbuf.
As a result, findRunnable may not start a worker even though there
actually is work.

This problem manifests itself in occasional failures of the
test/init1.go test. This test is unusual because it performs a large
amount of allocation without executing any write barriers, which means
there's nothing to force the pointers in currentwbuf out to the
work.partial/full lists where findRunnable can see them.

This change fixes this problem by making findRunnable also check for a
currentwbuf. This aligns findRunnable with trygetfull's notion of
whether or not there's work.

Change-Id: Ic76d22b7b5d040bc4f58a6b5975e9217650e66c4
Reviewed-on: https://go-review.googlesource.com/9299
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 20:10:10 +00:00
Austin Clements
26eac917dc runtime: start dedicated mark workers even if there's no work
Currently, findRunnable only considers running a mark worker if
there's work in the work queue. In principle, this can delay the start
of the desired number of dedicated mark workers if there's no work
pending. This is unlikely to occur in practice, since there should be
work queued from the scan phase, but if it were to come up, a CPU hog
mutator could slow down or delay garbage collection.

This check makes sense for fractional mark workers, since they'll just
return to the scheduler immediately if there's no work, but we want
the scheduler to start all of the dedicated mark workers promptly,
even if there's currently no queued work. Hence, this change moves the
pending work check after the check for starting a dedicated worker.

Change-Id: I52b851cc9e41f508a0955b3f905ca80f109ea101
Reviewed-on: https://go-review.googlesource.com/9298
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-24 20:10:05 +00:00
Austin Clements
711a164267 runtime: fix some out-of-date comments
bgMarkCount no longer exists.

Change-Id: I3aa406fdccfca659814da311229afbae55af8304
Reviewed-on: https://go-review.googlesource.com/9297
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-24 20:10:01 +00:00
Srdjan Petrovic
6ad33be2d9 runtime: implement xadduintptr and update system mstats using it
The motivation is that sysAlloc/Free() currently aren't safe to be
called without a valid G, because arm's xadd64() uses locks that require
a valid G.

The solution here was proposed by Dmitry Vyukov: use xadduintptr()
instead of xadd64(), until arm can support xadd64 on all of its
architectures (not a trivial task for arm).

Change-Id: I250252079357ea2e4360e1235958b1c22051498f
Reviewed-on: https://go-review.googlesource.com/9002
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-24 16:53:26 +00:00
Austin Clements
0e6a6c510f runtime: simplify process for starting GC goroutine
Currently, when allocation reaches the GC trigger, the runtime uses
readyExecute to start the GC goroutine immediately rather than wait
for the scheduler to get around to the GC goroutine while the mutator
continues to grow the heap.

Now that the scheduler runs the most recently readied goroutine when a
goroutine yields its time slice, this rigmarole is no longer
necessary. The runtime can simply ready the GC goroutine and yield
from the readying goroutine.

Change-Id: I3b4ebadd2a72a923b1389f7598f82973dd5c8710
Reviewed-on: https://go-review.googlesource.com/9292
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-24 15:13:05 +00:00
Austin Clements
ce502b063c runtime: use park/ready to wake up GC at end of concurrent mark
Currently, the main GC goroutine sleeps on a note during concurrent
mark and the first background mark worker or assist to finish marking
use wakes up that note to let the main goroutine proceed into mark
termination. Unfortunately, the latency of this wakeup can be quite
high, since the GC goroutine will typically have lost its P while in
the futex sleep, meaning it will be placed on the global run queue and
will wait there until some P is kind enough to pick it up. This delay
gives the mutator more time to allocate and create floating garbage,
growing the heap unnecessarily. Worse, it's likely that background
marking has stopped at this point (unless GOMAXPROCS>4), so anything
that's allocated and published to the heap during this window will
have to be scanned during mark termination while the world is stopped.

This change replaces the note sleep/wakeup with a gopark/ready
scheme. This keeps the wakeup inside the Go scheduler and lets the
garbage collector take advantage of the new scheduler semantics that
run the ready()d goroutine immediately when the ready()ing goroutine
sleeps.

For the json benchmark from x/benchmarks with GOMAXPROCS=4, this
reduces the delay in waking up the GC goroutine and entering mark
termination once concurrent marking is done from ~100ms to typically
<100µs.

Change-Id: Ib11f8b581b8914f2d68e0094f121e49bac3bb384
Reviewed-on: https://go-review.googlesource.com/9291
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:13:01 +00:00
Austin Clements
4e32718d3e runtime: use timer for GC control revise rather than timeout
Currently, we use a note sleep with a timeout in a loop in func gc to
periodically revise the GC control variables. Replace this with a
fully blocking note sleep and use a periodic timer to trigger the
revise instead. This is a step toward replacing the note sleep in func
gc.

Change-Id: I2d562f6b9b2e5f0c28e9a54227e2c0f8a2603f63
Reviewed-on: https://go-review.googlesource.com/9290
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:56 +00:00
Austin Clements
e870f06c3f runtime: yield time slice to most recently readied G
Currently, when the runtime ready()s a G, it adds it to the end of the
current P's run queue and continues running. If there are many other
things in the run queue, this can result in a significant delay before
the ready()d G actually runs and can hurt fairness when other Gs in
the run queue are CPU hogs. For example, if there are three Gs sharing
a P, one of which is a CPU hog that never voluntarily gives up the P
and the other two of which are doing small amounts of work and
communicating back and forth on an unbuffered channel, the two
communicating Gs will get very little CPU time.

Change this so that when G1 ready()s G2 and then blocks, the scheduler
immediately hands off the remainder of G1's time slice to G2. In the
above example, the two communicating Gs will now act as a unit and
together get half of the CPU time, while the CPU hog gets the other
half of the CPU time.

This fixes the problem demonstrated by the ping-pong benchmark added
in the previous commit:

benchmark                old ns/op     new ns/op     delta
BenchmarkPingPongHog     684287        825           -99.88%

On the x/benchmarks suite, this change improves the performance of
garbage by ~6% (for GOMAXPROCS=1 and 4), and json by 28% and 36% for
GOMAXPROCS=1 and 4. It has negligible effect on heap size.

This has no effect on the go1 benchmark suite since those benchmarks
are mostly single-threaded.

Change-Id: I858a08eaa78f702ea98a5fac99d28a4ac91d339f
Reviewed-on: https://go-review.googlesource.com/9289
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:52 +00:00
Austin Clements
da0e37fa8d runtime: benchmark for ping-pong in the presence of a CPU hog
This benchmark demonstrates a current problem with the scheduler where
a set of frequently communicating goroutines get very little CPU time
in the presence of another goroutine that hogs that CPU, even if one
of those communicating goroutines is always runnable.

Currently it takes about 0.5 milliseconds to switch between
ping-ponging goroutines in the presence of a CPU hog:

BenchmarkPingPongHog	    2000	    684287 ns/op
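
A simplified sketch of such a benchmark (the committed version differs
in detail):

    import "testing"

    func BenchmarkPingPongHog(b *testing.B) {
        done := make(chan bool)
        go func() { // CPU hog: never blocks, never yields voluntarily
            for {
                select {
                case <-done:
                    return
                default:
                }
            }
        }()
        ping, pong := make(chan bool), make(chan bool)
        go func() {
            for j := 0; j < b.N; j++ {
                <-ping
                pong <- true
            }
        }()
        for i := 0; i < b.N; i++ {
            ping <- true
            <-pong
        }
        close(done)
    }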

Change-Id: I278848c84f778de32344921ae8a4a8056e4898b0
Reviewed-on: https://go-review.googlesource.com/9288
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:47 +00:00
Austin Clements
e5e52f4f2c runtime: factor checking if P run queue is empty
There are a variety of places where we check if a P's run queue is
empty. This test is about to get slightly more complicated, so factor
it out into a new function, runqempty. This function is inlinable, so
this has no effect on performance.
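
A sketch of the factored check, assuming ring-buffer head/tail indices
like the scheduler's (stand-in type):

    type P struct{ runqhead, runqtail uint32 }

    // runqempty reports whether p has nothing in its local run queue.
    func runqempty(p *P) bool {
        return p.runqhead == p.runqtail
    }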

Change-Id: If4a0b01ffbd004937de90d8d686f6ded4aad2c6b
Reviewed-on: https://go-review.googlesource.com/9287
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:42 +00:00
Srdjan Petrovic
5c8fbc6f1e runtime: signal forwarding
Forward signals to signal handlers installed before Go installs its own,
under certain circumstances.  In particular, as iant@ suggests, signals are
forwarded iff:
   (1) a non-SIG_DFL signal handler existed before Go, and
   (2) signal is synchronous (i.e., one of SIGSEGV, SIGBUS, SIGFPE), and
   	(3a) signal occurred on a non-Go thread, or
   	(3b) signal occurred on a Go thread but in CGo code.

Supported only on Linux, for now.
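
Read as a predicate, the conditions above look like this (a sketch;
the real check lives in the runtime's signal handling code):

    import "syscall"

    func shouldForward(hadHandler bool, sig syscall.Signal, onGoThread, inCgo bool) bool {
        synchronous := sig == syscall.SIGSEGV || sig == syscall.SIGBUS || sig == syscall.SIGFPE
        return hadHandler && synchronous && (!onGoThread || inCgo)
    }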

Change-Id: I403219ee47b26cf65da819fb86cf1ec04d3e25f5
Reviewed-on: https://go-review.googlesource.com/8712
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-24 05:19:39 +00:00
Srdjan Petrovic
1f65c9c141 runtime: deflake TestNewOSProc0, fix _rt0_amd64_linux_lib stack alignment
This addresses iant's comments from CL 9164.

Change-Id: I7b5b282f61b11aab587402c2d302697e76666376
Reviewed-on: https://go-review.googlesource.com/9222
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-23 23:09:03 +00:00
Austin Clements
ed09e0e2bf runtime: fix underflow in next_gc calculation
Currently, it's possible for the next_gc calculation to underflow.
Since next_gc is unsigned, this wraps around and effectively disables
GC for the rest of the program's execution. Besides being obviously
wrong, this is causing test failures on 32-bit because some tests are
running out of heap.

This underflow happens for two reasons, both having to do with how we
estimate the reachable heap size at the end of the GC cycle.

One reason is that this calculation depends on the value of heap_live
at the beginning of the GC cycle, but we currently only record that
value during a concurrent GC and not during a forced STW GC. Fix this
by moving the recorded value from gcController to work and recording
it on a common code path.

The other reason is that we use the amount of allocation during the GC
cycle as an approximation of the amount of floating garbage and
subtract it from the marked heap to estimate the reachable heap.
However, since this is only an approximation, it's possible for the
amount of allocation during the cycle to be *larger* than the marked
heap size (since the runtime allocates white and it's possible for
these allocations to never be made reachable from the heap). Currently
this causes wrap-around in our estimate of the reachable heap size,
which in turn causes wrap-around in next_gc. Fix this by bottoming out
the reachable heap estimate at 0, in which case we just fall back to
triggering GC at heapminimum (which is okay since this only happens on
small heaps).
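
A sketch of the clamped estimate (names are illustrative):

    func estimateReachable(marked, allocatedDuringCycle uint64) uint64 {
        if allocatedDuringCycle >= marked {
            // The approximation has broken down; bottom out at 0
            // and let the heapminimum floor set the next trigger.
            return 0
        }
        return marked - allocatedDuringCycle
    }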

Fixes #10555, fixes #10556, and fixes #10559.

Change-Id: Iad07b529c03772356fede2ae557732f13ebfdb63
Reviewed-on: https://go-review.googlesource.com/9286
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-23 20:52:54 +00:00
Rick Hudson
77f56af0bc runtime: Improve scanning performance
To achieve a 2% improvement in the garbage benchmark this CL removes
an unneeded assert and avoids one hbits.next() call per object
being scanned.

Change-Id: Ibd542d01e9c23eace42228886f9edc488354df0d
Reviewed-on: https://go-review.googlesource.com/9244
Reviewed-by: Austin Clements <austin@google.com>
2015-04-23 20:27:46 +00:00
Hyang-Ah Hana Kim
aef54d40ac runtime: disable TestNewOSProc0 on android/arm.
newosproc0 does not work on android/arm.
See issue #10548.

Change-Id: Ieaf6f5d0b77cddf5bf0b6c89fd12b1c1b8723f9b
Reviewed-on: https://go-review.googlesource.com/9293
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-23 19:08:33 +00:00
Shenghou Ma
edc53e1f14 runtime: fix build after CL 9164 on Linux
There is an assumption that the function executed in the child thread
created by runtime.clone should not return. And different systems
enforce that differently: some exit that thread, some exit the
whole process.

The test TestNewOSProc0 introduced in CL 9161 breaks that assumption,
so we need to adjust the code to only exit the thread should the
called function return.

Change-Id: Id631cb2f02ec6fbd765508377a79f3f96c6a2ed6
Reviewed-on: https://go-review.googlesource.com/9246
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-04-22 23:21:25 +00:00
Austin Clements
4655aadd00 runtime: use reachable heap estimate to set trigger/goal
Currently, we set the heap goal for the next GC cycle using the size
of the marked heap at the end of the current cycle. This can lead to a
bad feedback loop if the mutator is rapidly allocating and releasing
pointers that can significantly bloat heap size.

If the GC were STW, the marked heap size would be exactly the
reachable heap size (call it stwLive). However, in concurrent GC,
marked=stwLive+floatLive, where floatLive is the amount of "floating
garbage": objects that were reachable at some point during the cycle
and were marked, but which are no longer reachable by the end of the
cycle. If the GC cycle is short, then the mutator doesn't have much
time to create floating garbage, so marked≈stwLive. However, if the GC
cycle is long and the mutator is allocating and creating floating
garbage very rapidly, then it's possible that marked≫stwLive. Since
the runtime currently sets the heap goal based on marked, this will
cause it to set a high heap goal. This means that 1) the next GC cycle
will take longer because of the larger heap and 2) the assist ratio
will be low because of the large distance between the trigger and the
goal. The combination of these lets the mutator produce even more
floating garbage in the next cycle, which further exacerbates the
problem.

For example, on the garbage benchmark with GOMAXPROCS=1, this causes
the heap to grow to ~500MB and the garbage collector to retain upwards
of ~300MB of heap, while the true reachable heap size is ~32MB. This,
in turn, causes the GC cycle to take upwards of ~3 seconds.

Fix this bad feedback loop by estimating the true reachable heap size
(stwLive) and using this rather than the marked heap size
(stwLive+floatLive) as the basis for the GC trigger and heap goal.
This breaks the bad feedback loop and causes the mutator to assist
more, which decreases the rate at which it can create floating
garbage. On the same garbage benchmark, this reduces the maximum heap
size to ~73MB, the retained heap to ~40MB, and the duration of the GC
cycle to ~200ms.

Change-Id: I7712244c94240743b266f9eb720c03802799cdd1
Reviewed-on: https://go-review.googlesource.com/9177
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:28:42 +00:00
Austin Clements
1ccc577b8a runtime: include heap goal in gctrace line
This may or may not be useful to the end user, but it's incredibly
useful for us to understand the behavior of the pacer. Currently this
is fairly easy (though not trivial) to derive from the other heap
stats we print, but we're about to change how we compute the goal,
which will make it much harder to derive.

Change-Id: I796ef233d470c01f606bd9929820c01ece1f585a
Reviewed-on: https://go-review.googlesource.com/9176
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:07:44 +00:00
Austin Clements
1f39beb01a runtime: avoid divide-by-zero in GC trigger controller
The trigger controller computes GC CPU utilization by dividing by the
wall-clock time that's passed since concurrent mark began. Since this
delta is nanoseconds it's borderline impossible for it to be zero, but
if it is zero we'll currently divide by zero. Be robust to this
possibility by ignoring the utilization in the error term if no time
has elapsed.

Change-Id: I93dfc9e84735682af3e637f6538d1e7602634f09
Reviewed-on: https://go-review.googlesource.com/9175
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:07:36 +00:00
Srdjan Petrovic
ca9128f18f runtime: merge clone0 and clone
We initially added clone0 to handle the case when G or M don't exist, but
it turns out that we could have just modified clone.  (It also helps that
the function we're invoking in clone0 no longer needs arguments.)

As a side-effect, newosproc0 is now supported on all linux archs.

Change-Id: Ie603af75d8f164310fc16446052d83743961f3ca
Reviewed-on: https://go-review.googlesource.com/9164
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-22 16:28:57 +00:00
Shenghou Ma
87054c4704 runtime: fix more vet reported issues
Change-Id: Ie8dfdb592ee0bfc736d08c92c3d8413a37b6ac03
Reviewed-on: https://go-review.googlesource.com/9241
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-22 02:50:48 +00:00
Keith Randall
3a56aa0d3e runtime: check error codes for arm64 system calls
Unlike linux arm32, linux arm64 does not set the condition codes to indicate
whether a system call failed or not.  We must check if the return value
is in the error code range (the same as amd64 does).
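
A sketch of the check in Go (the actual fix is in assembly; returning
-errno in [-4095, -1] is the standard Linux convention):

    // sysResult splits a raw syscall result into (value, errno).
    func sysResult(r int64) (val, errno int64) {
        if r < 0 && r >= -4095 {
            return -1, -r
        }
        return r, 0
    }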

Fixes runtime.TestBadOpen test.

Change-Id: I97a8b0a17b5f002a3215c535efa91d199cee3309
Reviewed-on: https://go-review.googlesource.com/9220
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-22 02:30:22 +00:00
Josh Bleecher Snyder
a76099f0d9 runtime: fix arm64 asm vet issues
Several naming changes and a real issue in asmcgocall_errno.

Change-Id: Ieb0a328a168819fe233d74e0397358384d7e71b3
Reviewed-on: https://go-review.googlesource.com/9212
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-22 02:30:11 +00:00
Austin Clements
170fb10089 runtime: assist harder if GC exceeds the estimated marked heap
Currently, the GC controller computes the mutator assist ratio at the
beginning of the cycle by estimating that the marked heap size this
cycle will be the same as it was the previous cycle. It then uses that
assist ratio for the rest of the cycle. However, this means that if
the mutator is quickly growing its reachable heap, the heap size is
likely to exceed the heap goal and currently there's no additional
pressure on mutator assists when this happens. For example, 6g (with
GOMAXPROCS=1) frequently exceeds the goal heap size by ~25% because of
this.

This change makes GC revise its work estimate and the resulting assist
ratio every 10ms during the concurrent mark. Instead of
unconditionally using the marked heap size from the last cycle as an
estimate for this cycle, it takes the minimum of the previously marked
heap and the currently marked heap. As a result, as the cycle
approaches or exceeds its heap goal, this will increase the assist
ratio to put more pressure on the mutator assist to bring the cycle to
an end. For 6g, this causes the GC to always finish within 5% and
often within 1% of its heap goal.

Change-Id: I4333b92ad0878c704964be42c655c38a862b4224
Reviewed-on: https://go-review.googlesource.com/9070
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-21 15:35:55 +00:00
Austin Clements
e0c3d85f08 runtime: fix background marking at 25% utilization
Currently, in accordance with the GC pacing proposal, we schedule
background marking with a goal of achieving 25% utilization *total*
between mutator assists and background marking. This is stricter than
was set out in the Go 1.5 proposal, which suggests that the garbage
collector can use 25% just for itself and anything the mutator does to
help out is on top of that. It also has several technical
drawbacks. Because mutator assist time is constantly changing and we
can't have instantaneous information on background marking time, it
effectively requires hitting a moving target based on out-of-date
information. This works out in the long run, but works poorly for
short GC cycles and on short time scales. Also, this requires
time-multiplexing all Ps between the mutator and background GC since
the goal utilization of background GC constantly fluctuates. This
results in a complicated scheduling algorithm, poor affinity, and
extra overheads from context switching.

This change modifies the way we schedule and run background marking so
that background marking always consumes 25% of GOMAXPROCS and mutator
assist is in addition to this. This enables a much more robust
scheduling algorithm where we pre-determine the number of Ps we should
dedicate to background marking as well as the utilization goal for a
single floating "remainder" mark worker.
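
A sketch of how the split falls out (names and rounding are
illustrative):

    func markWorkerPlan(gomaxprocs int) (dedicated int, fractionalUtil float64) {
        const goalUtilization = 0.25
        target := goalUtilization * float64(gomaxprocs)
        dedicated = int(target)                      // whole Ps that always mark
        fractionalUtil = target - float64(dedicated) // one part-time worker
        return
    }

For example, GOMAXPROCS=6 gives one dedicated worker plus a single
worker that runs about half the time.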

Change-Id: I187fa4c03ab6fe78012a84d95975167299eb9168
Reviewed-on: https://go-review.googlesource.com/9013
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:50 +00:00
Austin Clements
24a7252e25 runtime: finish sweeping before concurrent GC starts
Currently, the concurrent sweep follows a 1:1 rule: when allocation
needs a span, it sweeps a span (likewise, when a large allocation
needs N pages, it sweeps until it frees N pages). This rule worked
well for the STW collector (especially when GOGC==100) because it did
no more sweeping than necessary to keep the heap from growing, would
generally finish sweeping just before GC, and ensured good temporal
locality between sweeping a page and allocating from it.

It doesn't work well with concurrent GC. Since concurrent GC requires
starting GC earlier (sometimes much earlier), the sweep often won't be
done when GC starts. Unfortunately, the first thing GC has to do is
finish the sweep. In the meantime, the mutator can continue
allocating, pushing the heap size even closer to the goal size. This
worked okay with the 7/8ths trigger, but it gets into a vicious cycle
with the GC trigger controller: if the mutator is allocating quickly
and driving the trigger lower, more and more sweep work will be left
to GC; this both causes GC to take longer (allowing the mutator to
allocate more during GC) and delays the start of the concurrent mark
phase, which throws off the GC controller's statistics and generally
causes it to push the trigger even lower.

As an example of a particularly bad case, the garbage benchmark with
GOMAXPROCS=4 and -benchmem 512 (MB) spends the first 0.4-0.8 seconds
of each GC cycle sweeping, during which the heap grows by between
109MB and 252MB.

To fix this, this change replaces the 1:1 sweep rule with a
proportional sweep rule. At the end of GC, GC knows exactly how much
heap allocation will occur before the next concurrent GC as well as
how many span pages must be swept. This change computes this "sweep
ratio" and when the mallocgc asks for a span, the mcentral sweeps
enough spans to bring the swept span count into ratio with the
allocated byte count.
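
In rough pseudo-Go, the pacing arithmetic looks like this (names are
illustrative):

    // sweepPagesOwed reports how many span pages should be swept right
    // now: pages owed by allocation progress minus pages already swept.
    func sweepPagesOwed(sweepPagesPerByte float64, bytesAllocated, pagesSwept int64) int64 {
        return int64(sweepPagesPerByte*float64(bytesAllocated)) - pagesSwept
    }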

On the benchmark from above, this entirely eliminates sweeping at the
beginning of GC, which reduces the time between startGC readying the
GC goroutine and GC stopping the world for sweep termination to ~100µs
during which the heap grows at most 134KB.

Change-Id: I35422d6bba0c2310d48bb1f8f30a72d29e98c1af
Reviewed-on: https://go-review.googlesource.com/8921
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:46 +00:00
Austin Clements
91c80ce6c7 runtime: make mcache.local_cachealloc a uintptr
This field used to decrease with sweeps (and potentially go
negative). Now it is always zero or positive, so change it to a
uintptr so it meshes better with other memory stats.

Change-Id: I6a50a956ddc6077eeaf92011c51743cb69540a3c
Reviewed-on: https://go-review.googlesource.com/8899
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:41 +00:00
Austin Clements
a0452a6821 runtime: proportional response GC trigger controller
Currently, concurrent GC triggers at a fixed 7/8*GOGC heap growth. For
mutators that allocate slowly, this means GC will trigger too early
and run too often, wasting CPU time on GC. For mutators that allocate
quickly, this means GC will trigger too late, causing the program to
exceed the GOGC heap growth goal and/or to exceed CPU goals because of
a high mutator assist ratio.

This change adds a feedback control loop to dynamically adjust the GC
trigger from cycle to cycle. By monitoring the heap growth and GC CPU
utilization from cycle to cycle, this adjusts the Go garbage collector
to target the GOGC heap growth goal and the 25% CPU utilization goal.
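
A toy proportional step of this flavor (the gain, clamps, and error
term here are illustrative, not the controller's actual ones):

    // adjustTrigger nudges the trigger ratio by a fraction of the
    // observed error between the goals and the last cycle's behavior.
    func adjustTrigger(triggerRatio, errorTerm float64) float64 {
        const gain = 0.5
        triggerRatio += gain * errorTerm
        if triggerRatio < 0.05 {
            triggerRatio = 0.05
        }
        if triggerRatio > 0.95 {
            triggerRatio = 0.95
        }
        return triggerRatio
    }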

Change-Id: Ic82eef288c1fa122f73b69fe604d32cbb219e293
Reviewed-on: https://go-review.googlesource.com/8851
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:37 +00:00
Austin Clements
8d03acce54 runtime: multi-threaded, utilization-scheduled background mark
Currently, the concurrent mark phase is performed by the main GC
goroutine. Prior to the previous commit enabling preemption, this
caused marking to always consume 1/GOMAXPROCS of the available CPU
time. If GOMAXPROCS=1, this meant background GC would consume 100% of
the CPU (effectively a STW). If GOMAXPROCS>4, background GC would use
less than the goal of 25%. If GOMAXPROCS=4, background GC would use
the goal 25%, but if the mutator wasn't using the remaining 75%,
background marking wouldn't take advantage of the idle time. Enabling
preemption in the previous commit made GC miss CPU targets in
completely different ways, but set us up to bring everything back in
line.

This change replaces the fixed GC goroutine with per-P background mark
goroutines. Once started, these goroutines don't go in the standard
run queues; instead, they are scheduled specially such that the time
spent in mutator assists and the background mark goroutines totals 25%
of the CPU time available to the program. Furthermore, this lets
background marking take advantage of idle Ps, which significantly
boosts GC performance for applications that under-utilize the CPU.

This requires also changing how time is reported for gctrace, so this
change splits the concurrent mark CPU time into assist/background/idle
scanning.

This also requires increasing the size of the StackRecord slice used
in a GoroutineProfile test.

Change-Id: I0936ff907d2cee6cb687a208f2df47e8988e3157
Reviewed-on: https://go-review.googlesource.com/8850
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:32 +00:00
Austin Clements
af060c3086 runtime: generally allow preemption during concurrent GC phases
Currently, the entire GC process runs with g.m.preemptoff set. In the
concurrent phases, the parts that actually need preemption disabled
are run on a system stack and there's no overall need to stay on the
same M or P during the concurrent phases. Hence, move the setting of
g.m.preemptoff to when we start mark termination, at which point we
really do need preemption disabled.

This dramatically changes the scheduling behavior of the concurrent
mark phase. Currently, since this is non-preemptible, concurrent mark
gets one dedicated P (so 1/GOMAXPROCS utilization). With this change,
the GC goroutine is scheduled like any other goroutine during
concurrent mark, so it gets 1/<runnable goroutines> utilization.

You might think it's not even necessary to set g.m.preemptoff at that
point since the world is stopped, but stackalloc/stackfree use this as
a signal that the per-P pools are not safe to access without
synchronization.

Change-Id: I08aebe8179a7d304650fb8449ff36262b3771099
Reviewed-on: https://go-review.googlesource.com/8839
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:27 +00:00
Austin Clements
100da60979 runtime: track time spent in mutator assists
This time is tracked per P and periodically flushed to the global
controller state. This will be used to compute mutator assist
utilization in order to schedule background GC work.

Change-Id: Ib94f90903d426a02cf488bf0e2ef67a068eb3eec
Reviewed-on: https://go-review.googlesource.com/8837
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:22 +00:00
Austin Clements
4b2fde945a runtime: proportional mutator assist
Currently, mutator allocation periodically assists the garbage
collector by performing a small, fixed amount of scanning work.
However, to control heap growth, mutators need to perform scanning
work *proportional* to their allocation rate.

This change implements proportional mutator assists. This uses the
scan work estimate computed by the garbage collector at the beginning
of each cycle to compute how much scan work must be performed per
allocation byte to complete the estimated scan work by the time the
heap reaches the goal size. When allocation triggers an assist, it
uses this ratio and the amount allocated since the last assist to
compute the assist work, then attempts to steal as much of this work
as possible from the background collector's credit, and then performs
any remaining scan work itself.
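
A sketch of the per-assist computation, with the steal and scan
helpers passed in as stand-ins for the runtime's internals:

    // assist pays the allocation's scan-work debt: steal from the
    // background collector's credit first, then scan the remainder.
    func assist(assistRatio float64, bytesAllocated int64,
        steal func(int64) int64, scan func(int64)) {
        debt := int64(assistRatio * float64(bytesAllocated))
        stolen := steal(debt)
        if rest := debt - stolen; rest > 0 {
            scan(rest)
        }
    }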

Change-Id: I98b2078147a60d01d6228b99afd414ef857e4fba
Reviewed-on: https://go-review.googlesource.com/8836
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:18 +00:00
Austin Clements
028f972847 runtime: make gcDrainN in terms of scan work
Currently, the "n" in gcDrainN is in terms of objects to scan. This is
used by gchelpwork to perform a limited amount of work on allocation,
but is a pretty arbitrary way to bound this amount of work since the
number of objects has little relation to how long they take to scan.

Modify gcDrainN to perform a fixed amount of scan work instead. For
now, gchelpwork still performs a fairly arbitrary amount of scan work,
but at least this is much more closely related to how long the work
will take. Shortly, we'll use this to precisely control the scan work
performed by mutator assists during allocation to achieve the heap
size goal.

Change-Id: I3cd07fe0516304298a0af188d0ccdf621d4651cc
Reviewed-on: https://go-review.googlesource.com/8835
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:14 +00:00
Austin Clements
8e24283a28 runtime: track background scan work credit
This tracks scan work done by background GC in a global pool. Mutator
assists will draw on this credit to avoid doing work when background
GC is staying ahead.

Unlike the other GC controller tracking variables, this will be both
written and read throughout the cycle. Hence, we can't arbitrarily
delay updates like we can for scan work and bytes marked. However, we
still want to minimize contention, so this global credit pool is
allowed some error from the "true" amount of credit. Background GC
accumulates credit locally up to a limit and only then flushes to the
global pool. Similarly, mutator assists will draw from the credit pool
in batches.
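
A sketch of the batched flushing (the batch size is illustrative):

    import "sync/atomic"

    var bgScanCredit int64 // global credit pool, accessed atomically

    // flushCredit accumulates scan-work credit locally and publishes
    // it to the global pool only in batches, limiting contention.
    func flushCredit(local *int64, workDone int64) {
        const flushLimit = 4096
        *local += workDone
        if *local >= flushLimit {
            atomic.AddInt64(&bgScanCredit, *local)
            *local = 0
        }
    }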

Change-Id: I1aa4fc604b63bf53d1ee2a967694dffdfc3e255e
Reviewed-on: https://go-review.googlesource.com/8834
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:09 +00:00
Austin Clements
4e9fc0df48 runtime: implement GC scan work estimator
This implements tracking the scan work ratio of a GC cycle and using
this to estimate the scan work that will be required by the next GC
cycle. Currently this estimate is unused; it will be used to drive
mutator assists.

Change-Id: I8685b59d89cf1d83eddfc9b30d84da4e3a7f4b72
Reviewed-on: https://go-review.googlesource.com/8833
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:04 +00:00
Austin Clements
571ebae6ef runtime: track scan work performed during concurrent mark
This tracks the amount of scan work in terms of scanned pointers
during the concurrent mark phase. We'll use this information to
estimate scan work for the next cycle.

Currently this accumulates the work counter in gcWork, and dispose
atomically flushes it into a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.

Change-Id: Iac0364c466ee35fab781dbbbe7970a5f3c4e1fc1
Reviewed-on: https://go-review.googlesource.com/8832
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:00 +00:00
Austin Clements
fb9fd2bdd7 runtime: atomic ops for int64
These currently use portable implementations in terms of their uint64
counterparts.

Change-Id: Icba5f7134cfcf9d0429edabcdd73091d97e5e905
Reviewed-on: https://go-review.googlesource.com/8831
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:34:54 +00:00
Sebastien Binet
918fdae348 reflect: implement ArrayOf
This change exposes reflect.ArrayOf to create new reflect.Type array
types at runtime, when given a reflect.Type element.

- reflect: implement ArrayOf
- reflect: tests for ArrayOf
- runtime: document that typeAlg is used by reflect and must be kept
  synchronized

Fixes #5996.
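
For example:

    package main

    import (
        "fmt"
        "reflect"
    )

    func main() {
        t := reflect.ArrayOf(4, reflect.TypeOf(0)) // the type [4]int
        v := reflect.New(t).Elem()
        v.Index(0).SetInt(42)
        fmt.Println(v.Type(), v.Interface()) // [4]int [42 0 0 0]
    }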

Change-Id: I5d07213364ca915c25612deea390507c19461758
Reviewed-on: https://go-review.googlesource.com/4111
Reviewed-by: Keith Randall <khr@golang.org>
2015-04-21 15:21:09 +00:00
Matthew Dempsky
c0fa9e3f6f runtime/pprof: disable flaky TestTraceFutileWakeup on linux/ppc64le
Update #10512.

Change-Id: Ifdc59c3a5d8aba420b34ae4e37b3c2315dd7c783
Reviewed-on: https://go-review.googlesource.com/9162
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-21 10:01:53 +00:00
Rick Hudson
899a4ad47e runtime: Speed up heapBitsForObject
Optimized heapBitsForObject by special casing
objects whose size is a power of two. When a
span holding such objects is initialized, I
added a mask that, when ANDed with an interior pointer,
results in the base of the object. For the garbage
benchmark this resulted in CPU_CLK_UNHALTED in
heapBitsForObject going from 7.7% down to 5.9%
of the total, INST_RETIRED went from 12.2 -> 8.7.
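
A sketch of the masking trick (assuming objects are aligned to their
power-of-two size):

    // objBase recovers an object's base address from an interior
    // pointer when elemSize is a power of two.
    func objBase(interior, elemSize uintptr) uintptr {
        return interior &^ (elemSize - 1)
    }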

Here are the benchmarks that changed by at least plus or minus 1%.

benchmark                          old ns/op      new ns/op      delta
BenchmarkFmtFprintfString          249            221            -11.24%
BenchmarkFmtFprintfInt             247            223            -9.72%
BenchmarkFmtFprintfEmpty           76.5           69.6           -9.02%
BenchmarkBinaryTree17              4106631412     3744550160     -8.82%
BenchmarkFmtFprintfFloat           424            399            -5.90%
BenchmarkGoParse                   4484421        4242115        -5.40%
BenchmarkGobEncode                 8803668        8449107        -4.03%
BenchmarkFmtManyArgs               1494           1436           -3.88%
BenchmarkGobDecode                 10431051       10032606       -3.82%
BenchmarkFannkuch11                2591306713     2517400464     -2.85%
BenchmarkTimeParse                 361            371            +2.77%
BenchmarkJSONDecode                70620492       68830357       -2.53%
BenchmarkRegexpMatchMedium_1K      54693          53343          -2.47%
BenchmarkTemplate                  90008879       91929940       +2.13%
BenchmarkTimeFormat                380            387            +1.84%
BenchmarkRegexpMatchEasy1_32       111            113            +1.80%
BenchmarkJSONEncode                21359159       21007583       -1.65%
BenchmarkRegexpMatchEasy1_1K       603            613            +1.66%
BenchmarkRegexpMatchEasy0_32       127            129            +1.57%
BenchmarkFmtFprintfIntInt          399            393            -1.50%
BenchmarkRegexpMatchEasy0_1K       373            378            +1.34%

Change-Id: I78e297161026f8b5cc7507c965fd3e486f81ed29
Reviewed-on: https://go-review.googlesource.com/8980
Reviewed-by: Austin Clements <austin@google.com>
2015-04-20 21:39:06 +00:00
Russ Cox
181e26b9fa runtime: replace func-based write barrier skipping with type-based
This CL revises CL 7504 to use explicit uintptr types for the
struct fields that are going to be updated sometimes without
write barriers. The result is that the fields are now updated *always*
without write barriers.

This approach has two important properties:

1) Now the GC never looks at the field, so if the missing reference
could cause a problem, it will do so all the time, not just when the
write barrier is missed at just the right moment.

2) Now a write barrier never happens for the field, avoiding the
(correct) detection of inconsistent write barriers when GODEBUG=wbshadow=1.

Change-Id: Iebd3962c727c0046495cc08914a8dc0808460e0e
Reviewed-on: https://go-review.googlesource.com/9019
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-20 20:20:09 +00:00
Ian Lance Taylor
357a013060 runtime: save registers in linux/{386,amd64} lib entry point
The callee-saved registers must be saved because for the c-shared case
this code is invoked from C code in the system library, and that code
expects the registers to be saved.  The tests were passing because in
the normal case the code calls a cgo function that naturally saves
callee-saved registers anyhow.  However, it fails when the code takes
the non-cgo path.

Change-Id: I9c1f5e884f5a72db9614478049b1863641c8b2b9
Reviewed-on: https://go-review.googlesource.com/9114
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-20 18:09:41 +00:00
Ian Lance Taylor
725aa3451a runtime: no deadlock error if buildmode=c-archive or c-shared
Change-Id: I4ee6dac32bd3759aabdfdc92b235282785fbcca9
Reviewed-on: https://go-review.googlesource.com/9083
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-20 17:31:44 +00:00
Ian Lance Taylor
9c1868d06d runtime: add -buildmode=c-archive/c-shared support for linux/386
Change-Id: I87147ca6bb53e3121cc4245449c519509f107638
Reviewed-on: https://go-review.googlesource.com/9009
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-17 19:31:37 +00:00
Russ Cox
8e5346571c runtime: leave gccheckmark testing off by default
It's not helping anymore, and it's fooling people who try to
understand performance (like me).

Change-Id: I133a644acae0ddf1bfa17c654cdc01e2089da963
Reviewed-on: https://go-review.googlesource.com/9018
Reviewed-by: Austin Clements <austin@google.com>
2015-04-17 19:29:04 +00:00
Austin Clements
c1c667542c runtime: fix dangling pointer in readyExecute
readyExecute passes a closure to mcall that captures an argument to
readyExecute. Since mcall is marked noescape, this closure lives on
the stack of the calling goroutine. However, the closure puts the
calling goroutine on the run queue (and switches to a new
goroutine). If the calling goroutine gets scheduled before the mcall
returns, this stack-allocated closure will become invalid while it's
still executing. One consequence of this that we've observed is that the
captured gp variable can get overwritten before the call to
execute(gp), causing execute(gp) to segfault.

Fix this by passing the currently captured gp variable through a field
in the calling goroutine's g struct so that the func is no longer a
closure.

To prevent problems like this in the future, this change also removes
the go:noescape annotation from mcall. Due to a compiler bug, this
will currently cause a func closure passed to mcall to be implicitly
allocated rather than refusing the implicit allocation. However, this
is okay because there are no other closures passed to mcall right now
and the compiler bug will be fixed shortly.

Fixes #10428.

Change-Id: I49b48b85de5643323b89e9eaa4df63854e968c32
Reviewed-on: https://go-review.googlesource.com/8866
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-17 17:59:14 +00:00
Dave Cheney
7ae9d06880 runtime/pprof: disable TestTraceStressStartStop
Updates #10476

Change-Id: Ic4414f669104905c6004835be5cf0fa873553ea6
Reviewed-on: https://go-review.googlesource.com/8962
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-17 14:54:25 +00:00
David Crawshaw
c8aba85e4a runtime: export main.main for android
Previously we started the Go runtime from a JNI function call, which
eventually called the program's main function. Now the runtime is
initialized by an ELF initialization function as a c-shared library,
and the program's main function is not called. So now we export main
so it can be called from JNI.

This is necessary for all-Go apps because unlike a normal shared
library, the program loading the library is not written by or known
to the programmer. As far as they are concerned, the .so is
everything. In fact the same code is compiled for iOS as a normal Go
program.

Change-Id: I61c6a92243240ed229342362231b1bfc7ca526ba
Reviewed-on: https://go-review.googlesource.com/9015
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-04-17 12:11:04 +00:00
David Crawshaw
5da1c254d5 runtime: do not run main when buildmode=c-shared
Change-Id: Ie7f85873978adf3fd5c739176f501ca219592824
Reviewed-on: https://go-review.googlesource.com/9011
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-17 11:31:01 +00:00
Russ Cox
6a2b0c0b6d runtime: delete cgo_allocate
This memory is untyped and can't be used anymore.
The next version of SWIG won't need it.

Change-Id: I592b287c5f5186975ee09a9b28d8efe3b57134e7
Reviewed-on: https://go-review.googlesource.com/8956
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-17 01:30:47 +00:00
David Crawshaw
5b72b8c7a3 runtime: aeshash stubs for arm64
For some reason the absence of an implementation does not stop arm64
binaries from being built. However, it comes up with -buildmode=c-archive.

Change-Id: Ic0db5fd8fb4fe8252b5aa320818df0c7aec3db8f
Reviewed-on: https://go-review.googlesource.com/8989
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-16 19:49:31 +00:00
David Crawshaw
e8b7133e9b runtime: darwin/arm64 c-archive entry point
Change-Id: Ib227aa3e14d01a0ab1ad9e53d107858e045d1c42
Reviewed-on: https://go-review.googlesource.com/8984
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-16 18:56:54 +00:00
David Crawshaw
9cde36be54 runtime/cgo: enable arm64 EXC_BAD_ACCESS handler
Change-Id: I8e912ff9327a4163b63b8c628aa3546e86ddcc02
Reviewed-on: https://go-review.googlesource.com/8983
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-04-16 18:00:57 +00:00
Shenghou Ma
4a71b91d29 runtime: darwin/arm64 support
Change-Id: I3b3f80791a1db4c2b7318f81a115972cd2237f03
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/8782
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-16 13:01:19 +00:00
Shenghou Ma
828de09f8b runtime/cgo: darwin/arm64 support
Fixes #10116.

Change-Id: I3b3f80791a1db4c2b7318f81a115972cd2237f05
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/8784
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-16 12:50:49 +00:00
Michael Hudson-Doyle
f616af23e0 cmd/6l: call runtime.addmoduledata from .init_array
Change-Id: I09e84161d106960a69972f5fc845a1e40c28e58f
Reviewed-on: https://go-review.googlesource.com/8331
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-15 23:54:20 +00:00
Josh Bleecher Snyder
7e0c11c32f cmd/6g, runtime: improve duffzero throughput
It is faster to execute

	MOVQ AX,(DI)
	MOVQ AX,8(DI)
	MOVQ AX,16(DI)
	MOVQ AX,24(DI)
	ADDQ $32,DI

than

	STOSQ
	STOSQ
	STOSQ
	STOSQ

However, in order to be able to jump into
the middle of a block of MOVQs, the call
site needs to pre-adjust DI.

If we're clearing a small area, the cost
of that DI pre-adjustment isn't repaid.

This CL switches the DUFFZERO implementation
to use a hybrid strategy, in which small
clears use STOSQ as before, but large clears
use mostly MOVQ/ADDQ blocks.

benchmark                 old ns/op     new ns/op     delta
BenchmarkClearFat8        0.55          0.55          +0.00%
BenchmarkClearFat12       0.82          0.83          +1.22%
BenchmarkClearFat16       0.55          0.55          +0.00%
BenchmarkClearFat24       0.82          0.82          +0.00%
BenchmarkClearFat32       2.20          1.94          -11.82%
BenchmarkClearFat40       1.92          1.66          -13.54%
BenchmarkClearFat48       2.21          1.93          -12.67%
BenchmarkClearFat56       3.03          2.20          -27.39%
BenchmarkClearFat64       3.26          2.48          -23.93%
BenchmarkClearFat72       3.57          2.76          -22.69%
BenchmarkClearFat80       3.83          3.05          -20.37%
BenchmarkClearFat88       4.14          3.30          -20.29%
BenchmarkClearFat128      5.54          4.69          -15.34%
BenchmarkClearFat256      9.95          9.09          -8.64%
BenchmarkClearFat512      18.7          17.9          -4.28%
BenchmarkClearFat1024     36.2          35.4          -2.21%

Change-Id: Ic786406d9b3cab68d5a231688f9e66fcd1bd7103
Reviewed-on: https://go-review.googlesource.com/2585
Reviewed-by: Keith Randall <khr@golang.org>
2015-04-15 19:17:07 +00:00
Michael Hudson-Doyle
ab4df700b8 runtime: merge slice and sliceStruct
By removing type slice, renaming type sliceStruct to type slice and
whacking until it compiles.

Has a pleasing net reduction of conversions.

Fixes #10188

Change-Id: I77202b8df637185b632fd7875a1fdd8d52c7a83c
Reviewed-on: https://go-review.googlesource.com/8770
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-15 16:59:49 +00:00
Dave Cheney
e629cd0f88 runtime: mark all runtime.cputicks implementations NOSPLIT
Fixes #10450

runtime.cputicks is called from runtime.exitsyscall and must not
split the stack. cputicks is implemented in several ways and the
NOSPLIT annotation was missing from a few of these.

Change-Id: I5cbbb4e5888c5d298fe2fef240782d0e49f59af8
Reviewed-on: https://go-review.googlesource.com/8939
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
2015-04-15 09:22:15 +00:00
Alex Brainman
9402e49450 runtime: really pass return value to Windows in externalthreadhandler
When Windows calls externalthreadhandler it expects to receive
return value in AX. We don't set AX anywhere. Change that.
Store ctrlhandler1 and profileloop1 return values into AX before
returning from externalthreadhandler.

Fixes #10215.

Change-Id: Ied04542cc3ebe7d4a26660e970f9f78098143591
Reviewed-on: https://go-review.googlesource.com/8901
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-15 05:03:42 +00:00
Austin Clements
a23a341e10 runtime: make time slice a const
A G will be preempted if it runs for 10ms without blocking. Currently
this constant is hard-coded in retake. Move it to a global const.
We'll use the time slice length in scheduling background GC.
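
The hoisted constant presumably looks like this (a sketch; the name is
illustrative):

	// forcePreemptNS is the time slice a G gets before it is preempted.
	const forcePreemptNS = 10 * 1000 * 1000 // 10ms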

Change-Id: I79a979948af2fad3afe5df9d4af4062f166554b7
Reviewed-on: https://go-review.googlesource.com/8838
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-14 22:06:32 +00:00
Austin Clements
69001e404e runtime: fix freed page accounting in mHeap_ReclaimList
mHeap_ReclaimList is asked to reclaim at least npages pages, but it
counts the number of spans reclaimed, not the number of pages
reclaimed. The number of pages reclaimed is at least the number of
spans reclaimed, so this is not strictly wrong, but it is forcing more
reclamation than was intended by the caller, which delays large
allocations.

Fix this by increasing the count by the number of pages in the swept
span, rather than just increasing it by 1.
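
In sketch form (simplified types, not the runtime's actual loop):

	type span struct{ npages uintptr }

	// reclaim sweeps spans from the list until at least npages pages
	// have been reclaimed.
	func reclaim(spans []span, npages uintptr) {
		var reclaimed uintptr
		for _, s := range spans {
			if reclaimed >= npages {
				break
			}
			reclaimed += s.npages // was: reclaimed++, one per span
		}
	}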

Fixes #9048.

Change-Id: I5ae364a9837a6012e68fcd431bba000340cfd50c
Reviewed-on: https://go-review.googlesource.com/8920
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-14 20:55:14 +00:00
Austin Clements
bedb6f8aef runtime: remove unnecessary traceNextGC
Commit d7e0ad4 removed the next_gc manipulation from mSpan_Sweep, but
left in the traceNextGC() for recording the updated next_gc
value. Remove this now unnecessary call.

Change-Id: I28e0de071661199be9810d7bdcc81ce50b5a58ae
Reviewed-on: https://go-review.googlesource.com/8894
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-14 20:54:23 +00:00
David Crawshaw
3b22ffc07e runtime: make cgocallback wait on package init
With the new buildmodes c-archive and c-shared, it is possible for a
cgo call to come in early in the lifecycle of a Go program. Calls
before the runtime has been initialized are caught by
_cgo_wait_runtime_init_done. However, a call can come in after the
runtime has initialized, but before the program's package init
functions have finished running.

To avoid this, cgocallback checks m.ncgo to see if we are on a thread
running Go. If not, we may be on a foreign thread, and it blocks until
main_init is complete.
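
A hedged sketch of the gate (the names and the exact signalling
mechanism here are illustrative):

	var mainInitDone = make(chan struct{}) // closed once main_init returns

	func cgocallbackGate(onGoThread bool) {
		if !onGoThread {
			<-mainInitDone // foreign thread: wait for package init
		}
		// ... proceed with the callback ...
	}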

Change-Id: I7a9f137fa2a40c322a0b93764261f9aa17fcf5b8
Reviewed-on: https://go-review.googlesource.com/8897
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
2015-04-14 13:39:02 +00:00
David Crawshaw
cea272de30 runtime: rename close to closefd
Avoids shadowing the builtin channel close function.

Change-Id: I7a729b0937c8248fe27222be61318a88db995eee
Reviewed-on: https://go-review.googlesource.com/8898
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
2015-04-14 12:31:29 +00:00
Srdjan Petrovic
d1eee2cebf runtime: shared library init support for android/arm.
Follows http://golang.org/cl/8454, a similar CL for arm architectures.
This CL involves android-specific changes, namely, synthesizing
argv/auxv, as android doesn't provide those to the init functions.

This code is based on crawshaw@'s android code in golang.org/x/mobile.

Change-Id: I32364efbb2662e80270a99bd7dfb1d0421b5417d
Reviewed-on: https://go-review.googlesource.com/8457
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-13 21:53:15 +00:00
Srdjan Petrovic
93644c9118 runtime: shared library runtime init for arm
Adds the runtime initialization flow for arm akin to amd64.
In particular, we use the library initialization entry point to:
    - create a new OS thread and run the "regular" runtime init stack on
      that thread
    - return immediately from the main (i.e., loader) thread
    - at the first CGO invocation, we wait for the runtime initialization
      to complete.

Verified to work on a Raspberry Pi and an Android phone.

Change-Id: I32f39228ae30a03ce9569287f234b305790fecf6
Reviewed-on: https://go-review.googlesource.com/8455
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Srdjan Petrovic <spetrovic@google.com>
2015-04-13 18:58:18 +00:00
Srdjan Petrovic
a888fcf7a7 runtime: remove runtime wait/notify from ppc64x architectures.
Related to issue #10410

For some reason, any non-trivial code in _cgo_wait_runtime_init_done
(even fprintf()) will crash that call.

If anybody has any guess why this is happening, please let me know!

For now, I'm clearing the functions for ppc64, as it's currently not used.

Change-Id: I1b11383aaf4f9f9a16f1fd6606842cfeedc9f0b3
Reviewed-on: https://go-review.googlesource.com/8766
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Srdjan Petrovic <spetrovic@google.com>
2015-04-13 17:21:04 +00:00
David Crawshaw
989f0ee80a runtime/cgo: EXC_BAD_ACCESS handler for arm64
Change-Id: Ia9ff9c0d381fad43fc5d3e5972dd6e66503733a5
Reviewed-on: https://go-review.googlesource.com/8815
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-13 12:08:37 +00:00
David Crawshaw
0a81d31b66 runtime/pprof: skip fork test on darwin/arm64
Just like darwin/arm.

Change-Id: Ic75927bd6457d37cda7dd8279fd9b4cd52edc1d1
Reviewed-on: https://go-review.googlesource.com/8813
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-13 11:58:03 +00:00
David Crawshaw
7db8835a50 runtime/debug: disable arm64 test for issue 9993
Like other arm64 platforms, darwin/arm64 has a different physical
page size to logical page size so it is running into issue 9993. I
hope it can be fixed for Go 1.5, but for now it is demonstrating the
same bug as the other skipped os+arch combinations.

Change-Id: Iedaf9afe56d6954bb4391b6e843d81742a75a00c
Reviewed-on: https://go-review.googlesource.com/8814
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-13 11:57:12 +00:00
David Crawshaw
d6d423b99b runtime: skip fork test on darwin/arm64
Just like darwin/arm.

Change-Id: Ie4998d24b2d891a9f6c8047ec40cd3fdf80622cd
Reviewed-on: https://go-review.googlesource.com/8812
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-13 11:52:05 +00:00
Alex Brainman
d1af6bed84 runtime: move all exception related code into signal_windows.go
Change-Id: I9654a5c85bd9b3ae9c7a9eddaef1ec752f42bd1b
Reviewed-on: https://go-review.googlesource.com/8840
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-13 07:04:21 +00:00
David Crawshaw
6e3a6c4d38 runtime: library entry point for darwin/arm
Tested by using -buildmode=c-archive to generate an archive, adding it
to an Xcode project, and calling a Go function from an iOS app. (I'm
still investigating proper buildmode tests for all.bash.)

Change-Id: I7890df15246df8e90ad27837b8d64ba2cde409fe
Reviewed-on: https://go-review.googlesource.com/8719
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-12 12:49:49 +00:00
Michael Hudson-Doyle
e1366f94ee reflect, runtime: check equality, not identity, for method names
When dynamically linking Go code, it is no longer safe to assume that
strings that end up in method names are identical if they are equal.

The performance impact seems to be noise:

benchmark                    old ns/op     new ns/op     delta
BenchmarkAssertI2E2          13.3          13.1          -1.50%
BenchmarkAssertE2I           23.5          23.2          -1.28%
BenchmarkAssertE2E2Blank     0.83          0.82          -1.20%
BenchmarkConvT2ISmall        60.7          60.1          -0.99%
BenchmarkAssertI2T           10.2          10.1          -0.98%
BenchmarkAssertE2T           10.2          10.3          +0.98%
BenchmarkConvT2ESmall        56.7          57.2          +0.88%
BenchmarkConvT2ILarge        59.4          58.9          -0.84%
BenchmarkConvI2E             13.0          12.9          -0.77%
BenchmarkAssertI2E           13.4          13.3          -0.75%
BenchmarkConvT2IUintptr      57.9          58.3          +0.69%
BenchmarkConvT2ELarge        55.9          55.6          -0.54%
BenchmarkAssertI2I           23.8          23.7          -0.42%
BenchmarkConvT2EUintptr      55.4          55.5          +0.18%
BenchmarkAssertE2E           6.12          6.11          -0.16%
BenchmarkAssertE2E2          14.4          14.4          +0.00%
BenchmarkAssertE2T2          10.0          10.0          +0.00%
BenchmarkAssertE2T2Blank     0.83          0.83          +0.00%
BenchmarkAssertE2TLarge      10.7          10.7          +0.00%
BenchmarkAssertI2E2Blank     0.83          0.83          +0.00%
BenchmarkConvI2I             23.4          23.4          +0.00%

Change-Id: I0b3dfc314215a4d4e09eec6b42c1e3ebce33eb56
Reviewed-on: https://go-review.googlesource.com/8239
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-11 17:35:44 +00:00
Derek Buitenhuis
53840ad6f1 runtime: Fix GDB integration with Python 2
A similar fix was applied in 545686857b
but another instance of 'pc' was missed.

Also adds a test for the goroutine gdb command.

It currently uses goroutine 2 for the test, since goroutine 1 has
its stack pointer set to 0 for some reason.

Change-Id: I53ca22be6952f03a862edbdebd9b5c292e0853ae
Reviewed-on: https://go-review.googlesource.com/8729
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-10 22:17:59 +00:00
Austin Clements
4b956ae317 runtime: start concurrent GC promptly when we reach its trigger
Currently, when allocation reaches the concurrent GC trigger size, we
start the concurrent collector by ready'ing its G. This simply puts it
on the end of the P's run queue, which means we may not actually start
GC for some time as the current G continues to run and then the P
drains other Gs already on its run queue. Since the mutator can
continue to allocate, the heap can potentially be much larger than we
intended by the time GC actually starts. Furthermore, how much larger
is difficult to predict since it depends on the scheduler.

Fix this by preempting the current G and switching directly to the
concurrent GC G as soon as we reach the trigger heap size.

On the garbage benchmark from the benchmarks subrepo with
GOMAXPROCS=4, this reduces the time from triggering the GC to the
beginning of sweep termination by 10 to 30 milliseconds, which reduces
allocation after the trigger by up to 10MB (a large fraction of the
64MB live heap the benchmark tries to maintain).

One other known source of delay before we "really" start GC is the
sweep finalization performed before sweep termination. This has
similar negative effects on heap size and predictability, but is an
orthogonal problem. This change adds a TODO for this.

Change-Id: I8bae98cb43685c1bf353ff55868e4647e3743c47
Reviewed-on: https://go-review.googlesource.com/8513
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-10 18:22:52 +00:00
Austin Clements
6afb5fa48f runtime: remove GoSched/GoStart trace events around GC
These were appropriate for STW GC, since it interrupted the allocating
Goroutine, but don't apply to concurrent GC, which runs on its own
Goroutine. Forced GC is still STW, but it makes sense to attribute the
GC to the goroutine that called runtime.GC().

Change-Id: If12418ca66dc7e53b8b16025af4e03adb5d9577e
Reviewed-on: https://go-review.googlesource.com/8715
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-10 18:21:52 +00:00
Austin Clements
7c37249639 runtime: make test for freezetheworld more precise
exitsyscallfast checks for freezetheworld, but does so only by
checking if stopwait is positive. This can also happen during
stoptheworld, which is harmless, but confusing. Shortly, it will be
important that we get to the p.status cas even if stopwait is set.

Hence, make this test more specific so it only triggers with
freezetheworld and not other uses of stopwait.

Change-Id: Ibb722cd8360c3ed5a9654482519e3ceb87a8274d
Reviewed-on: https://go-review.googlesource.com/8205
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-10 18:02:55 +00:00
Dmitry Vyukov
089d363a91 runtime: fix tracing of syscall exit
Fix tracing of syscall exit after:
https://go-review.googlesource.com/#/c/7504/

Change-Id: Idcde2aa826d2b9a05d0a90a80242b6bfa78846ab
Reviewed-on: https://go-review.googlesource.com/8728
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-10 17:39:06 +00:00
Michael Hudson-Doyle
a1f57598cc runtime, cmd/internal/ld: rename themoduledata to firstmoduledata
'themoduledata' doesn't really make sense now we support multiple moduledata
objects.

Change-Id: I8263045d8f62a42cb523502b37289b0fba054f62
Reviewed-on: https://go-review.googlesource.com/8521
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-10 05:11:49 +00:00
Michael Hudson-Doyle
fae4a128cb runtime, reflect: support multiple moduledata objects
This changes all the places that consult themoduledata to consult a
linked list of moduledata objects, as will be necessary for
-linkshared to work.

Obviously, as there is as yet no way of adding moduledata objects to
this list, all this change achieves right now is wasting a few
instructions here and there.

Change-Id: I397af7f60d0849b76aaccedf72238fe664867051
Reviewed-on: https://go-review.googlesource.com/8231
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-10 04:51:42 +00:00
Josh Bleecher Snyder
969f10140c runtime: fix arm64 build
Broken by CL 8541.

Change-Id: Ie2e89a22b91748e82f7bc4723660a24ed4135687
Reviewed-on: https://go-review.googlesource.com/8734
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-10 02:29:01 +00:00
Austin Clements
cb10ff1ef9 runtime: report next_gc for initial heap size in gctrace
Currently, the initial heap size reported in the gctrace line is the
heap_live right before sweep termination. However, we triggered GC
when heap_live reached next_gc, and there may have been significant
allocation between that point and the beginning of sweep
termination. Ideally these would be essentially the same, but
currently there's scheduler delay when readying the GC goroutine as
well as delay from background sweep finalization.

We should fix this delay, but in the mean time, to give the user a
better idea of how much the heap grew during the whole of garbage
collection, report the trigger rather than what the heap size happened
to be after the garbage collector finished rolling out of bed. This
will also be more useful for heap growth plots.

Change-Id: I08476b9fbcfb2de90592405e9c9f434dfb9eb1f8
Reviewed-on: https://go-review.googlesource.com/8512
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-09 22:18:06 +00:00
David Crawshaw
d1b1eee280 runtime: add isarchive, set by the linker
According to Go execution modes, a Go program compiled with
-buildmode=c-archive has a main function, but it is ignored at run time.
This gives the runtime the information it needs not to run the main.

I have this working with pending linker changes on darwin/amd64.

Change-Id: I49bd7d65aa619ec847c464a872afa5deea7d4d30
Reviewed-on: https://go-review.googlesource.com/8701
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-09 20:02:02 +00:00
Dave Cheney
ee349b5d77 runtime: add arm64 runtime.cmpstring and bytes.Compare
Add arm64 assembly implementation of runtime.cmpstring and bytes.Compare.

benchmark                                old ns/op     new ns/op     delta
BenchmarkCompareBytesEqual               98.0          27.5          -71.94%
BenchmarkCompareBytesToNil               9.38          10.0          +6.61%
BenchmarkCompareBytesEmpty               13.3          10.0          -24.81%
BenchmarkCompareBytesIdentical           98.0          27.5          -71.94%
BenchmarkCompareBytesSameLength          43.3          16.3          -62.36%
BenchmarkCompareBytesDifferentLength     43.4          16.3          -62.44%
BenchmarkCompareBytesBigUnaligned        6979680       1360979       -80.50%
BenchmarkCompareBytesBig                 6915995       1381979       -80.02%
BenchmarkCompareBytesBigIdentical        6781440       1327304       -80.43%

benchmark                             old MB/s     new MB/s     speedup
BenchmarkCompareBytesBigUnaligned     150.23       770.46       5.13x
BenchmarkCompareBytesBig              151.62       758.76       5.00x
BenchmarkCompareBytesBigIdentical     154.63       790.01       5.11x

* note, the machine we are benchmarking on has some issues. What is clear is
that, compared to a few days ago, the old MB/s value has increased from ~115 to 150.
I'm less certain about the new MB/s number, which used to be close to 1Gb/s.

Change-Id: I4f31b2c7a06296e13912aacc958525632cb0450d
Reviewed-on: https://go-review.googlesource.com/8541
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-09 14:49:31 +00:00
Alex Brainman
6e774faed7 runtime: make windows exception handler code arch independent
Mainly it is a simple copy. But I had to change the amd64
lastcontinuehandler return value from uint32 to int32.
I don't remember how it came to be uint32, but the new
int32 matches the Windows documentation (LONG) better.
I don't think it matters one way or the other.

Change-Id: I6935224a2470ad6301e27590f2baa86c13bbe8d5
Reviewed-on: https://go-review.googlesource.com/8686
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-09 09:55:38 +00:00
Alex Brainman
414444d416 runtime: do not calculate asmstdcall address every time we make syscall
Change-Id: If3c8c9035e12d41647ae4982883f6a979313ea9d
Reviewed-on: https://go-review.googlesource.com/8682
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-09 04:26:44 +00:00
David Crawshaw
c844bf4cfc runtime: fix darwin/386, darwin/arm builds
In cl/8652 I broke darwin/arm and darwin/386 because I removed the *g
parameter, which they both expect and use. This CL adjusts both ports
to look for g0 in m, just as darwin/amd64 does.

Tested on darwin{386,arm,amd64}.

Change-Id: Ia56f3d97e126b40d8bbd2e8f677b008e4a1badad
Reviewed-on: https://go-review.googlesource.com/8666
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-09 01:36:21 +00:00
Alex Brainman
e0d9342da7 runtime: use (*context) ip, setip, sp and setsp everywhere on windows
Also move dumpregs into defs_windows_*.go.

Change-Id: Ic077d7dbb133c7b812856e758d696d6fed557afd
Reviewed-on: https://go-review.googlesource.com/4650
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-09 00:57:28 +00:00
David Crawshaw
b0a85f5d93 runtime: darwin/amd64 library entry point
This is a practice run for darwin/arm.

Similar to the linux/amd64 shared library entry point. With several
pending linker changes I am successfully using this to implement
-buildmode=c-archive on darwin/amd64 with external linking.

The same entry point can be reused to implement -buildmode=c-shared
on darwin/amd64, however that will require further ld changes to
remove all text relocations.

One extra runtime change will follow this. According to the Go
execution modes document, -buildmode=c-archive should ignore the Go
main function. Right now it is being executed (and the process exits
if it doesn't block). I'm still searching for the right way to do
this.

Change-Id: Id97901ddd4d46970996f222bd79731dabff66a3d
Reviewed-on: https://go-review.googlesource.com/8652
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-08 21:53:52 +00:00
Michael Hudson-Doyle
3a84e3305b runtime, cmd/internal/ld: initialize themoduledata slices directly
This CL is quite conservative in some ways.  It continues to define
symbols that have no real purpose (e.g. epclntab).  These could be
deleted if there is no concern that external tools might look for them.

It would also now be possible to make some changes to the pcln data but
I get the impression that would definitely require some thought and
discussion.

Change-Id: Ib33cde07e4ec38ecc1d6c319a10138c9347933a3
Reviewed-on: https://go-review.googlesource.com/7616
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-08 16:20:57 +00:00
Michael Matloob
a173357cd5 runtime: fix return type for bsdthread_register in comments
The return type for bsdthread_register is int32. See
runtime/os_darwin.go.

This change also rewrites declaration comments for go functions to
use go syntax and fixes vet errors in sys_darwin_amd64.s.

Change-Id: I7482105f7562929e0ede30099efac9e76babd8a3
Reviewed-on: https://go-review.googlesource.com/3260
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-08 14:13:53 +00:00
Shenghou Ma
d0b62d8bfa runtime: linux/arm64 cgo support
Change-Id: I309e3df7608b9eef9339196fdc50dedf5f9439f3
Reviewed-on: https://go-review.googlesource.com/8450
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
2015-04-08 09:08:27 +00:00
Shenghou Ma
0accc80fbb runtime/cgo: linux/arm64 cgo support
Change-Id: I309e3df7608b9eef9339196fdc50dedf5f9439f2
Reviewed-on: https://go-review.googlesource.com/8439
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
2015-04-08 09:08:12 +00:00
Russ Cox
92c826b1b2 cmd/internal/gc: inline runtime.getg
This more closely restores what the old C runtime did.
(In C, g was an 'extern register' with the same effective
implementation as in this CL.)

On a late 2012 MacBookPro10,2, best of 5 old vs best of 5 new:

benchmark                          old ns/op      new ns/op      delta
BenchmarkBinaryTree17              4981312777     4463426605     -10.40%
BenchmarkFannkuch11                3046495712     3006819428     -1.30%
BenchmarkFmtFprintfEmpty           89.3           79.8           -10.64%
BenchmarkFmtFprintfString          284            262            -7.75%
BenchmarkFmtFprintfInt             282            262            -7.09%
BenchmarkFmtFprintfIntInt          480            448            -6.67%
BenchmarkFmtFprintfPrefixedInt     382            358            -6.28%
BenchmarkFmtFprintfFloat           529            486            -8.13%
BenchmarkFmtManyArgs               1849           1773           -4.11%
BenchmarkGobDecode                 12835963       11794385       -8.11%
BenchmarkGobEncode                 10527170       10288422       -2.27%
BenchmarkGzip                      436109569      438422516      +0.53%
BenchmarkGunzip                    110121663      109843648      -0.25%
BenchmarkHTTPClientServer          81930          85446          +4.29%
BenchmarkJSONEncode                24638574       24280603       -1.45%
BenchmarkJSONDecode                93022423       85753546       -7.81%
BenchmarkMandelbrot200             4703899        4735407        +0.67%
BenchmarkGoParse                   5319853        5086843        -4.38%
BenchmarkRegexpMatchEasy0_32       151            151            +0.00%
BenchmarkRegexpMatchEasy0_1K       452            453            +0.22%
BenchmarkRegexpMatchEasy1_32       131            132            +0.76%
BenchmarkRegexpMatchEasy1_1K       761            722            -5.12%
BenchmarkRegexpMatchMedium_32      228            224            -1.75%
BenchmarkRegexpMatchMedium_1K      63751          64296          +0.85%
BenchmarkRegexpMatchHard_32        3188           3238           +1.57%
BenchmarkRegexpMatchHard_1K        95396          96756          +1.43%
BenchmarkRevcomp                   661587262      687107364      +3.86%
BenchmarkTemplate                  108312598      104008540      -3.97%
BenchmarkTimeParse                 453            459            +1.32%
BenchmarkTimeFormat                475            441            -7.16%

The garbage benchmark from the benchmarks subrepo gets 2.6% faster as well.

Change-Id: I320aeda332db81012688b26ffab23f6581c59cfa
Reviewed-on: https://go-review.googlesource.com/8460
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-04-07 14:26:47 +00:00
David Crawshaw
ede863c673 runtime: add _rt0_arm_android_lib
At the moment this function does nothing; runtime initialization is
still done in android.c:init_go_runtime.

Fixes #10358

Change-Id: I1d762383ba61efcbcf0bbc7c77895f5c1dbf8968
Reviewed-on: https://go-review.googlesource.com/8510
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-04-06 22:54:52 +00:00
Austin Clements
8c3fc088fb runtime: report marked heap size in gctrace
When the gctrace GODEBUG option is enabled, it will now report three
heap sizes: the heap size at the beginning of the GC cycle, the heap
size at the end of the GC cycle before sweeping, and marked heap size,
which is the amount of heap that will be retained until the next GC
cycle.

Change-Id: Ie13f8a6d5c609bc9cc47c7555960ab55b37b5f1c
Reviewed-on: https://go-review.googlesource.com/8430
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-06 21:28:23 +00:00
Austin Clements
6d12b1780e runtime: make next_gc be heap size to trigger GC at
In the STW collector, next_gc was both the heap size to trigger GC at
as well as the goal heap size.

Early in the concurrent collector's development, next_gc was the goal
heap size, but was also used as the heap size to trigger GC at. This
meant we always overshot the goal because of allocation during
concurrent GC.

Currently, next_gc is still the goal heap size, but we trigger
concurrent GC at 7/8*GOGC heap growth. This complicates
shouldtriggergc, but was necessary because of the incremental
maintenance of next_gc.

Now we simply compute next_gc for the next cycle during mark
termination. Hence, it's now easy to take the simpler route and
redefine next_gc as the heap size at which the next GC triggers. We
can directly compute this with the 7/8 backoff during mark termination
and shouldtriggergc can simply test if the live heap size has grown
over the next_gc trigger.

This will also simplify later changes once we start setting next_gc in
more sophisticated ways.
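
In sketch form, the mark-termination computation described above
(variable names are illustrative):

	// next_gc becomes the trigger itself: the marked heap plus 7/8 of
	// the GOGC-proportional growth.
	func nextGCTrigger(marked, gogc uint64) uint64 {
		return marked + marked*gogc/100*7/8
	}

	// shouldtriggergc then reduces to: heap_live >= next_gc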

Change-Id: I872be4ae06b4f7a0d7f7967360a054bd36b90eea
Reviewed-on: https://go-review.googlesource.com/8420
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-06 21:28:18 +00:00
Austin Clements
d7e0ad4b82 runtime: introduce heap_live; replace use of heap_alloc in GC
Currently there are two main consumers of memstats.heap_alloc:
updatememstats (aka ReadMemStats) and shouldtriggergc.

updatememstats recomputes heap_alloc from the ground up, so we don't
need to keep heap_alloc up to date for it. shouldtriggergc wants to
know how many bytes were marked by the previous GC plus how many bytes
have been allocated since then, but this *isn't* what heap_alloc
tracks. heap_alloc also includes objects that are not marked and
haven't yet been swept.

Introduce a new memstat called heap_live that actually tracks what
shouldtriggergc wants to know and stop keeping heap_alloc up to date.

Unlike heap_alloc, heap_live follows a simple sawtooth that drops
during each mark termination and increases monotonically between GCs.
heap_alloc, on the other hand, has much more complicated behavior: it
may drop during sweep termination, slowly decreases from background
sweeping between GCs, is roughly unaffected by allocation as long as
there are unswept spans (because we sweep and allocate at the same
rate), and may go up after background sweeping is done depending on
the GC trigger.

heap_live simplifies computing next_gc and using it to figure out when
to trigger garbage collection. Currently, we guess next_gc at the end
of a cycle and update it as we sweep and get a better idea of how much
heap was marked. Now, since we're directly tracking how much heap is
marked, we can directly compute next_gc.

This also corrects bugs that could cause us to trigger GC early.
Currently, in any case where sweep termination actually finds spans to
sweep, heap_alloc is an overestimation of live heap, so we'll trigger
GC too early. heap_live, on the other hand, is unaffected by sweeping.
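
The sawtooth described above, as a sketch (illustrative names):

	var heapLive uint64 // bytes of heap the runtime considers live

	func noteAlloc(n uint64) { heapLive += n } // monotonic rise between GCs

	func noteMarkTermination(marked uint64) {
		heapLive = marked // drop to the marked size at mark termination
	}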

Change-Id: I1f96807b6ed60d4156e8173a8e68745ffc742388
Reviewed-on: https://go-review.googlesource.com/8389
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-06 21:28:13 +00:00
Austin Clements
50a66562a0 runtime: track heap bytes marked by GC
This tracks the number of heap bytes marked by a GC cycle. We'll use
this information to precisely trigger the next GC cycle.

Currently this aggregates the work counter in gcWork and dispose
atomically aggregates this into a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.
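
A sketch of the aggregation (simplified; names are illustrative):

	import "sync/atomic"

	var workBytesMarked uint64 // global total for the current cycle

	type gcWork struct {
		bytesMarked uint64 // local counter, no synchronization needed
	}

	func (w *gcWork) dispose() {
		atomic.AddUint64(&workBytesMarked, w.bytesMarked)
		w.bytesMarked = 0
	}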

Change-Id: I1bc377cb2e802ef61c2968602b63146d52e7f5db
Reviewed-on: https://go-review.googlesource.com/8388
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-06 21:28:07 +00:00
Ian Lance Taylor
32dbe07621 runtime: fix arm, arm64, ppc64 builds (I hope)
I guess we need more builders.

Change-Id: I309e3df7608b9eef9339196fdc50dedf5f9422e4
Reviewed-on: https://go-review.googlesource.com/8434
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-03 05:18:31 +00:00
Srdjan Petrovic
e8694c8196 runtime: initialize shared library at library-load time
This is Part 2 of the change; see Part 1 at https://go-review.googlesource.com/#/c/7692/

Suggested by iant@, we use the library initialization entry point to:
    - create a new OS thread and run the "regular" runtime init stack on
      that thread
    - return immediately from the main (i.e., loader) thread
    - at the first CGO invocation, we wait for the runtime initialization
      to complete.

The above mechanism is implemented only on linux_amd64.  Next step is to
support it on linux_arm.  Other platforms don't yet support shared library
compiling/linking, but we intend to use the same strategy there as well.

Change-Id: Ib2c81b1b83bee837134084b75a3beecfb8de6bf4
Reviewed-on: https://go-review.googlesource.com/8094
Run-TryBot: Srdjan Petrovic <spetrovic@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-03 01:24:51 +00:00
Austin Clements
f244a1471d runtime: add cumulative GC CPU % to gctrace line
This tracks both total CPU time used by GC and the total time
available to all Ps since the beginning of the program and uses this
to derive a cumulative CPU usage percent for the gctrace line.
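
The derived figure is, in sketch form (illustrative names):

	// Cumulative GC CPU percentage since process start: GC CPU time
	// divided by the total CPU time available to all Ps.
	func gcCPUPercent(gcCPUNanos, elapsedNanos int64, gomaxprocs int) float64 {
		return 100 * float64(gcCPUNanos) /
			(float64(elapsedNanos) * float64(gomaxprocs))
	}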

Change-Id: Ica85372b8dd45f7621909b325d5ac713a9b0d015
Reviewed-on: https://go-review.googlesource.com/8350
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-02 23:37:13 +00:00
Austin Clements
24ee948269 runtime: update gctrace line for new garbage collector
GODEBUG=gctrace=1 turns on a per-GC cycle trace line. The current line
is left over from the STW garbage collector and includes a lot of
information that is no longer meaningful for the concurrent GC and
doesn't include a lot of information that is important.

Replace this line with a new line designed for the new garbage
collector.

This new line is focused more on helping the user understand the
impact of the garbage collector on their program and less on telling
us, the runtime developers, everything that's happening inside
GC. It's designed to fit in 80 columns and intentionally omit some
potentially useful things that were in the old line. We might want a
"verbose" mode that adds information for us.

We'll be able to further simplify the line once we eliminate the STW
around enabling the write barrier. Then we'll have just one STW phase,
one concurrent phase, and one more STW phase, so we'll be able to
reduce the number of times from five to three.

Change-Id: Icc30939fe4576fb4491b4eac811649395727aa2a
Reviewed-on: https://go-review.googlesource.com/8208
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-02 23:37:06 +00:00
Austin Clements
822a24b602 runtime: remove checkgc code from hashmap
Currently hashmap is riddled with code that attempts to force a GC on
the next allocation if checkgc is set. This no longer works as
originally intended with the concurrent collector, and is apparently
no longer used anyway.

Remove checkgc.

Change-Id: Ia6c17c405fa8821dc2e6af28d506c1133ab1ca0c
Reviewed-on: https://go-review.googlesource.com/8355
Reviewed-by: Keith Randall <khr@golang.org>
2015-04-02 15:28:56 +00:00
Austin Clements
6134caf1f9 runtime: improve MemStats comments
This tries to clarify that Alloc and HeapAlloc are tied to how much
freeing has been done by the sweeper.

Change-Id: Id8320074bd75de791f39ec01bac99afe28052d02
Reviewed-on: https://go-review.googlesource.com/8354
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-02 15:28:50 +00:00
Josh Bleecher Snyder
ad3600945a runtime: auto-generate duff routines
This makes it easier to experiment with alternative implementations.

While we're here, update the comments.

No functional changes. Passes toolstash -cmp.

Change-Id: I428535754908f0fdd7cc36c214ddb6e1e60f376e
Reviewed-on: https://go-review.googlesource.com/8310
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-02 02:37:59 +00:00
Michael Hudson-Doyle
67426a8a9e runtime, cmd/internal/ld: change runtime to use a single linker symbol
In preparation for being able to run a go program that has code
in several objects, this changes from having several linker
symbols used by the runtime into having one linker symbol that
points at a structure containing the needed data.  Multiple
object support will construct a linked list of such structures.
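
Schematically (fields abridged and hypothetical), the single symbol
points at something like:

	// one linker-provided structure instead of many loose symbols
	type moduledata struct {
		pclntable []byte
		// ... function tables, type links, section bounds, etc. ...
		next *moduledata // for future multiple-object support
	}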

A follow up will initialize the slices in the themoduledata
structure directly from the linker but I was aiming for a minimal
diff for now.

Change-Id: I613cce35309801cf265a1d5ae5aaca8d689c5cbf
Reviewed-on: https://go-review.googlesource.com/7441
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-31 22:45:07 +00:00
Austin Clements
a2f3d73fee runtime: improve comment about non-preemption during GC work
Currently, gcDrainN is documented saying that it must be run on the
system stack. In fact, the problem and solution here are somewhat
subtler. First, it doesn't have to happen on the system stack, it just
has to be non-stoppable (that is, non-preemptible). Second, this isn't
specific to gcDrainN (though gcDrainN is perhaps the most surprising
instance); it's general to anything that uses the gcWork structure.

Move the comment to gcWork and generalize it.

Change-Id: I5277b5abb070e47f8d783bc15a310b379c6adc22
Reviewed-on: https://go-review.googlesource.com/8247
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-31 01:05:38 +00:00
Austin Clements
a4374c1de1 runtime: fix another out of date comment in GC
gcDrain used to be passed a *workbuf to start draining from, but now
it takes a gcWork, which hides whether or not there's an initial
workbuf. Update the comment to match this.

Change-Id: I976b58e5bfebc451cfd4fa75e770113067b5cc07
Reviewed-on: https://go-review.googlesource.com/8246
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-31 01:05:31 +00:00
Lee Packham
c45751e8a5 runtime: allow pointers to strings to be printed
Being able to print pointers to strings means one will be able to output
the result of things like the flag library and other components that use
string pointers.

While here, adjusted the tests for gdb to test original string pretty
printing as well as pointers to them. It was doing it via the map before
but for completeness this ensures it's tested as a unit.

Change-Id: I4926547ae4fa6c85ef74301e7d96d49ba4a7b0c6
Reviewed-on: https://go-review.googlesource.com/8217
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-30 23:59:24 +00:00
Michael Hudson-Doyle
f78dc1dac1 runtime: rename ·main·f to ·mainPC to avoid duplicate symbol
runtime·main·f is normalized by the linker to runtime.main.f, as is
the compiler-generated symbol runtime.main·f.  Change the former to
runtime·mainPC instead.

Fixes issue #9934

Change-Id: I656a6fa6422d45385fa2cc55bd036c6affa1abfe
Reviewed-on: https://go-review.googlesource.com/8234
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-30 18:52:14 +00:00
David Chase
2270133981 cmd/gc: allocate backing storage for non-escaping interfaces on stack
Extend escape analysis to convT2E and convT2I. If the interface value
does not escape, supply the runtime with a stack buffer for the object copy.
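
For example (ordinary user code; the optimization only shows up in
allocation counts):

	// v is boxed into an interface but never escapes this function, so
	// the copy made by convT2E can live in a stack buffer rather than
	// being heap-allocated.
	func isPositive() bool {
		var v int = 42
		var i interface{} = v // convT2E with a non-escaping result
		n, _ := i.(int)
		return n > 0
	}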

This is a straight port from .c to .go of Dmitry's patch

Change-Id: Ic315dd50d144d94dd3324227099c116be5ca70b6
Reviewed-on: https://go-review.googlesource.com/8201
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-03-30 16:11:22 +00:00
Austin Clements
9e6f7aac28 runtime: make "write barriers are not allowed" comments more precise
Currently, various functions are marked with the comment

  // May run without a P, so write barriers are not allowed.

However, "running without a P" is ambiguous. We intended these to mean
that m.p may be nil (which is the condition checked by the write
barrier). The comment could also be taken to mean that a
stop-the-world may happen, which is not the case for these functions
because they run in situations where there is in fact a function on
the stack holding a P locally, it just isn't in m.p.

Change these comments to state precisely what we mean, that m.p may be
nil.

Change-Id: I4a4a1d26aebd455e5067540e13b9f96a7482146c
Reviewed-on: https://go-review.googlesource.com/8209
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-30 15:13:53 +00:00
Daniel Theophanes
77f4571f71 runtime: do not use AddVectoredContinueHandler on Windows XP/2003.
When the Windows Error Reporting dialog is disabled on amd64
Windows XP or 2003, the continue handler does not fire. Newer
versions work correctly regardless of WER.

Fixes #10162

Change-Id: I84ea36ee188b34d1421a8db6231223cf61b4111b
Reviewed-on: https://go-review.googlesource.com/8165
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2015-03-30 03:37:55 +00:00
Dmitry Vyukov
ca98dd773a runtime/pprof: fix data race in test
rp.Close happened concurrently with rp.Read. Order them.

Fixes #10280

Change-Id: I7b083bcc336d15396c4e42fc4654ba34fad4a4cc
Reviewed-on: https://go-review.googlesource.com/8211
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-03-29 12:24:16 +00:00
Dmitry Vyukov
c61d86af72 os: give race detector chance to override Exit(0)
Racy tests do not fail currently; they do os.Exit(0).
So if you run go test without -v, you won't even notice.
This was probably introduced with testing.TestMain.

Racy programs do not have the right to finish successfully.

Change-Id: Id133d7424f03d90d438bc3478528683dd02b8846
Reviewed-on: https://go-review.googlesource.com/4371
Reviewed-by: Russ Cox <rsc@golang.org>
2015-03-28 12:42:37 +00:00
Srdjan Petrovic
8da54a4eec cmd: linker changes for shared library initialization
Suggested by iant@, this change:
  - looks for a symbol _rt0_<GOARCH>_<GOOS>_lib,
  - if the symbol is present, adds a new entry into the .init_array ELF
    section that points to the symbol.

The end-effect is that the symbol _rt0_<GOARCH>_<GOOS>_lib will be
invoked as soon as the (ELF) shared library is loaded, which will in turn
initialize the runtime. (To be implemented.)

Change-Id: I99911a180215a6df18f8a18483d12b9b497b48f4
Reviewed-on: https://go-review.googlesource.com/7692
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-27 22:52:10 +00:00
Hyang-Ah Hana Kim
39bc78845b runtime/pprof: fix TestCPUProfileWithFork for GOOS=android.
1) A large allocation in this test caused a crash. This was not
detected by the builder because the builder runs tests with -test.short.

2) The command "go" for forking doesn't exist in some platforms
including android. This change uses the test binary itself which
is guaranteed to exist.

This change also adds logging of the total samples collected in
TestCPUProfileMultithreaded test that is flaky in android-arm
builder.

Change-Id: I225c6b7877d811edef8b25e7eb00559450640c42
Reviewed-on: https://go-review.googlesource.com/8131
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-03-27 18:07:06 +00:00
Austin Clements
392336f94e runtime: disallow write barriers in handoffp and callees
handoffp by definition runs without a P, so it's not allowed to have
write barriers. It doesn't have any right now, but mark it
nowritebarrier to disallow any creeping in in the future. handoffp in
turns calls startm, newm, and newosproc, all of which are "below Go"
and make sense to run without a P, so disallow write barriers in these
as well.

For most functions, we've done this because they may race with
stoptheworld() and hence must not have write barriers. For these
functions, it's a little different: the world can't stop while we're
in handoffp, so this race isn't present. But we implement this
restriction with a somewhat broader rule that you can't have a write
barrier without a P. We like this rule because it's simple and means
that our write barriers can depend on there being a P, even though
this rule is actually a little broader than necessary. Hence, even
though there's no danger of the race in these functions, we want to
adhere to the broader rule.
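
The annotation itself is a compiler-checked pragma, honored only when
building the runtime (sketch):

	//go:nowritebarrier
	func example() {
		// any pointer write here that would need a write barrier is
		// rejected at compile time
	}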

Change-Id: Ie22319c30eea37d703eb52f5c7ca5da872030b88
Reviewed-on: https://go-review.googlesource.com/8130
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-26 20:38:59 +00:00
Shenghou Ma
400f58a010 runtime: don't trigger write barrier in newosproc for nacl
This should fix the intermittent "calling write barrier with mp.p == nil"
failures on the nacl/386 builder.

Change-Id: I34aef5ca75ccd2939e6a6ad3f5dacec64903074e
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/7973
Reviewed-by: Austin Clements <austin@google.com>
2015-03-26 19:58:14 +00:00
Austin Clements
ec2c7e6659 runtime: use uintXX instead of *byte for si_addr on Darwin
Currently, Darwin's siginfo type uses *byte for the si_addr
field. This results in unwanted write barriers in set_sigaddr. It's
also pointless since it never points to anything real and the get/set
methods return/take uintXX and cast it from/to the pointer.

All other arches use a uint type for this field. Change Darwin to
match. This simplifies the get/set methods and eliminates the unwanted
write barriers.

Change-Id: Ifdb5646d35e1f2f6808b87a3d59745ec9718add1
Reviewed-on: https://go-review.googlesource.com/8086
Reviewed-by: Austin Clements <austin@google.com>
2015-03-26 16:20:32 +00:00
Austin Clements
9b0ea6aa27 runtime: remove write barrier on G in sighandler
sighandler may run during a stop-the-world without a P, so it's not
allowed to have write barriers. Fix the G write to disable the write
barrier (this is safe because the G is reachable from allgs) and mark
the function nowritebarrier.

Change-Id: I907f05d3829e24eeb15fa4d020598af36710e87e
Reviewed-on: https://go-review.googlesource.com/8020
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-26 15:26:29 +00:00
David Crawshaw
e9d9d0befc runtime, runtime/cgo: make needextram a bool
Also invert it, which means it no longer needs to cross the cgo
package boundary.

Change-Id: I393cd073bda02b591a55d6bc6b8bb94970ea71cd
Reviewed-on: https://go-review.googlesource.com/8082
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-03-26 11:12:25 +00:00
Dave Cheney
e2543ef62c runtime: add runtime.cmpstring and bytes.Compare
Update #10007

Implement runtime.cmpstring and bytes.Compare in asm for arm.

benchmark                                old ns/op     new ns/op     delta
BenchmarkCompareBytesEqual               254           91.4          -64.02%
BenchmarkCompareBytesToNil               41.5          37.6          -9.40%
BenchmarkCompareBytesEmpty               40.7          37.6          -7.62%
BenchmarkCompareBytesIdentical           255           96.3          -62.24%
BenchmarkCompareBytesSameLength          125           60.9          -51.28%
BenchmarkCompareBytesDifferentLength     133           60.9          -54.21%
BenchmarkCompareBytesBigUnaligned        17985879      5669706       -68.48%
BenchmarkCompareBytesBig                 17097634      4926798       -71.18%
BenchmarkCompareBytesBigIdentical        16861941      4389206       -73.97%

benchmark                             old MB/s     new MB/s     speedup
BenchmarkCompareBytesBigUnaligned     58.30        184.95       3.17x
BenchmarkCompareBytesBig              61.33        212.83       3.47x
BenchmarkCompareBytesBigIdentical     62.19        238.90       3.84x

This is a collaboration between Josh Bleecher Snyder and myself.

Change-Id: Ib3944b8c410d0e12135c2ba9459bfe131df48edd
Reviewed-on: https://go-review.googlesource.com/8010
Reviewed-by: Keith Randall <khr@golang.org>
2015-03-25 22:46:39 +00:00
Alex Brainman
2420926a8a runtime: remove obsolete comment
We do not use SEH to handle Windows exceptions anymore.

Change-Id: I0ac807a0fed7a5b4c745454246764c524460472b
Reviewed-on: https://go-review.googlesource.com/8071
Reviewed-by: Minux Ma <minux@golang.org>
2015-03-25 02:55:56 +00:00
Shenghou Ma
003dccfac4 runtime, syscall: use the new get_random_bytes syscall for NaCl
The SecureRandom named service was removed in
https://codereview.chromium.org/550523002. And the new syscall
was introduced in https://codereview.chromium.org/537543003.

Accepting this will remove the support for older version of
sel_ldr. I've confirmed that both pepper_40 and current
pepper_canary have this syscall.

After this change, we need sel_ldr from pepper_39 or above to
work.

Fixes #9261

Change-Id: I096973593aa302ade61f259a3a71ebc7c1a57913
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/1755
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-03-25 02:07:09 +00:00
Aram Hăvărneanu
41f9c430f3 runtime, syscall: fix Solaris exec tests
Also fixes a long-existing problem in the fork/exec path.

Change-Id: Idec40b1cee0cfb1625fe107db3eafdc0d71798f2
Reviewed-on: https://go-review.googlesource.com/8030
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-03-24 19:51:21 +00:00
David Crawshaw
b8caed823b runtime: initialize extra M for cgo during mstart
Previously the extra M needed for cgo callbacks was created on the
first callback. This works for cgo; however, the cgocallback
mechanism is also borrowed by badsignal, which can run before any
cgo calls are made.

Now we initialize the extra M at runtime startup before any signal
handlers are registered, so badsignal cannot be called until the
extra M is ready.

Updates #10207.

Change-Id: Iddda2c80db6dc52d8b60e2b269670fbaa704c7b3
Reviewed-on: https://go-review.googlesource.com/7978
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
2015-03-24 19:39:46 +00:00
Rick Hudson
546a54bb2e runtime: Remove write barrier on g
There are calls to stdcall when the GC thinks the world is stopped,
and stdcall writes a *g for the CPU profiler. This produces a write
barrier, but the GC is not prepared to deal with write barriers when
it thinks the world is stopped. Since the g is on allg, it does not
need a write barrier to keep it alive, so eliminate the write barrier.

Change-Id: I937633409a66553d7d292d87d7d58caba1fad0b6
Reviewed-on: https://go-review.googlesource.com/7979
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Rick Hudson <rlh@golang.org>
2015-03-24 16:42:39 +00:00
Alex Brainman
9b69196958 runtime: add TestCgoDLLImports
The test is a simple reproduction of issue 9356.

Update #8948.
Update #9356.

Change-Id: Ia77bc36d12ed0c3c4a8b1214cade8be181c9ad55
Reviewed-on: https://go-review.googlesource.com/7618
Reviewed-by: Minux Ma <minux@golang.org>
2015-03-24 05:39:28 +00:00
Shenghou Ma
b6ed943bef runtime: use _main instead of main on windows/386
windows/386 also wants an underscore prefix for external names.
This CL is in preparation for external linking support.

Change-Id: I2d2ea233f976aab3f356f9b508cdd246d5013e2d
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/7282
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2015-03-24 03:23:03 +00:00
Shenghou Ma
6112e6e404 cmd/internal/ld, runtime: record argument size for cgo_dynimport stdcall syscalls
When linking externally, we must link against the implib provided by
mingw, so we must use properly decorated names for stdcalls.

Because the feature is only used in the runtime, I've designed a new decoration
scheme so that we can use the same decorated name for both 386 and amd64.

A stdcall function named FooEx from bar16.dll which takes 3 parameters will be
imported like this:
	//go:cgo_import_dynamic runtime._FooEx FooEx%3 "bar16.dll"
Depending on the size of uintptr, the linker will later transform it to _FooEx@12
or _FooEx@24.
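
A sketch of the transformation, using a hypothetical decorate helper
rather than the linker's actual code: the %n suffix carries the
parameter count, and the byte count is n times the pointer size:

	package main

	import "fmt"

	// decorate is a hypothetical illustration of the scheme
	// described above, not the linker's real code.
	func decorate(name string, nparams, ptrSize int) string {
		return fmt.Sprintf("_%s@%d", name, nparams*ptrSize)
	}

	func main() {
		fmt.Println(decorate("FooEx", 3, 4)) // _FooEx@12 on 386
		fmt.Println(decorate("FooEx", 3, 8)) // _FooEx@24 on amd64
	}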

This is in preparation for the next CL that adds external linking
support for windows/386.

Change-Id: I2d2ea233f976aab3f356f9b508cdd246d5013e2c
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/7163
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-24 03:22:26 +00:00
Michael MacInnis
f7befa43a3 syscall: Add Foreground and Pgid to SysProcAttr
On Unix, when placing a child in a new process group, allow that group
to become the foreground process group. Also, allow a child process to
join a specific process group.

When setting the foreground process group, Ctty is used as the file
descriptor of the controlling terminal. Ctty has been added to the BSD
and Solaris SysProcAttr structures and the handling of Setctty changed
to match Linux.
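
A hedged sketch of how the new fields might be used on Linux; the
/dev/tty path and the choice of command are illustrative only:

	package main

	import (
		"os"
		"os/exec"
		"syscall"
	)

	func main() {
		// Open the controlling terminal (illustrative path).
		tty, err := os.OpenFile("/dev/tty", os.O_RDWR, 0)
		if err != nil {
			panic(err)
		}
		defer tty.Close()

		cmd := exec.Command("ps")
		cmd.Stdin, cmd.Stdout, cmd.Stderr = tty, tty, tty
		cmd.SysProcAttr = &syscall.SysProcAttr{
			Setpgid:    true,          // new process group for the child
			Foreground: true,          // make that group the foreground group
			Ctty:       int(tty.Fd()), // fd of the controlling terminal
			// Pgid could instead name an existing group to join.
		}
		if err := cmd.Run(); err != nil {
			panic(err)
		}
	}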

Change-Id: I18d169a6c5ab8a6a90708c4ff52eb4aded50bc8c
Reviewed-on: https://go-review.googlesource.com/5130
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-23 15:35:53 +00:00
Joel Sing
4f35ad6088 runtime: fix return values for open/read/write/close on openbsd/arm
Change-Id: I5b057d16eed1b364e608ff0fd74de323da6492bc
Reviewed-on: https://go-review.googlesource.com/7679
Reviewed-by: Minux Ma <minux@golang.org>
2015-03-21 03:52:42 +00:00
Dave Cheney
98485f5ad4 runtime: fix linux/amd64p32 build
Implement runtime.atomicand8 for amd64p32, which was overlooked
in CL 7861.

Change-Id: Ic7eccddc6fd6c4682cac1761294893928f5428a2
Reviewed-on: https://go-review.googlesource.com/7920
Reviewed-by: Minux Ma <minux@golang.org>
2015-03-21 02:59:43 +00:00
Russ Cox
4224d81fae cmd/internal/gc: inline x := y.(*T) and x, ok := y.(*T)
These can be implemented with just a compare and a move instruction.
Do so, avoiding the overhead of a call into the runtime.

These assertions are a significant cost in Go code that uses interface{}
as a safe alternative to C's void* (or unsafe.Pointer), such as the
current version of the Go compiler.

*T here includes pointer to T but also any Go type represented as
a single pointer (chan, func, map). It does not include [1]*T or struct{*int}.
That requires more work in other parts of the compiler; there is a TODO.
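
The two forms in question; under this CL each compiles to a compare
and a move instead of a runtime call:

	package main

	import "fmt"

	type T struct{ n int }

	func main() {
		var y interface{} = &T{n: 1}

		x := y.(*T) // panics if y does not hold a *T
		fmt.Println(x.n)

		x2, ok := y.(*T) // ok reports whether the assertion held
		fmt.Println(x2.n, ok)
	}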

Change-Id: I7ff681c20d2c3eb6ad11dd7b3a37b1f3dda23965
Reviewed-on: https://go-review.googlesource.com/7862
Reviewed-by: Rob Pike <r@golang.org>
2015-03-20 20:05:37 +00:00
Austin Clements
653426f08f runtime: exit getfull barrier if there are partial workbufs
Currently, we only exit the getfull barrier if there is work on the
full list, even though the exit path will take work from either the
full or partial list. Change this to exit the barrier if there is work
on either the full or partial lists.

I believe it's currently safe to check only the full list, since
during mark termination there is no reason to put a workbuf on a
partial list. However, checking both is more robust.

Change-Id: Icf095b0945c7cad326a87ff2f1dc49b7699df373
Reviewed-on: https://go-review.googlesource.com/7840
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-20 14:05:11 +00:00
Austin Clements
06de3f52a7 runtime: document subtlety around entering mark termination
The barrier in gcDrain does not account for concurrent gcDrainNs
happening in gchelpwork, so it can actually return while there is
still work being done. It turns out this is okay, but for subtle
reasons involving gcDrainN always being run on the system
stack. Document these reasons.

Change-Id: Ib07b3753cc4e2b54533ab3081a359cbd1c3c08fb
Reviewed-on: https://go-review.googlesource.com/7736
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-20 14:05:05 +00:00
Russ Cox
4d2b3a0b5f runtime: fix arm build
Make mask uint32, and move down one line to match atomic_arm64.go.

Change-Id: I4867de494bc4076b7c2b3bf4fd74aa984e3ea0c8
Reviewed-on: https://go-review.googlesource.com/7854
Reviewed-by: Russ Cox <rsc@golang.org>
2015-03-20 05:00:46 +00:00