mirror of https://github.com/golang/go synced 2024-10-05 21:21:21 -06:00
Commit Graph

1709 Commits

Author SHA1 Message Date
Keith Randall
00c638d243 runtime: on map update, don't overwrite key if we don't need to.
Keep track of which types of keys need an update and which don't.

Strings need an update because the new key might pin a smaller backing store.
Floats need an update because it might be +0/-0.
Interfaces need an update because they may contain strings or floats.

Fixes #11088

Change-Id: I9ade53c1dfb3c1a2870d68d07201bc8128e9f217
Reviewed-on: https://go-review.googlesource.com/10843
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-09 21:06:49 +00:00
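To see why float keys are in the "needs update" set: +0.0 and -0.0 compare equal but are distinct bit patterns, so an update through a -0.0 key must still rewrite the stored key. A small user-level demonstration (ordinary code, not runtime internals):

    package main

    import (
        "fmt"
        "math"
    )

    func main() {
        m := map[float64]int{}
        negZero := math.Copysign(0, -1)
        m[0.0] = 1     // stored key is +0.0
        m[negZero] = 2 // +0.0 == -0.0: same entry, but the key must be rewritten
        for k := range m {
            fmt.Println(k, math.Signbit(k)) // the surviving key is -0.0
        }
    }

String keys are analogous: rewriting the key on update lets a new, smaller string replace one that pins a large backing array.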
Austin Clements
e9089e4ab6 runtime: add high-level description of how stack barriers work
Change-Id: I6affe75b5fa9dbf513c16200bff4fd7aa5f3a985
Reviewed-on: https://go-review.googlesource.com/14051
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-09 01:18:56 +00:00
Austin Clements
92e68c89f6 runtime: move stack barrier code to its own file
Currently the stack barrier code is mixed in with the mark and scan
code. Move all of the stack barrier related functions and variables to
a new dedicated source file. There are no code modifications.

Change-Id: I604603045465ef8573b9f88915d28ab6b5910903
Reviewed-on: https://go-review.googlesource.com/14050
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-09 01:18:50 +00:00
Michael Hudson-Doyle
0388d4303f runtime: remove unused FUNCDATA_DeadValueMaps
Change-Id: Iccb0221bd9aef062d20798b952eaa09d9e60b902
Reviewed-on: https://go-review.googlesource.com/14345
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-07 21:02:11 +00:00
Michael Hudson-Doyle
31322996fd runtime: add stub sigreturn on arm
When building a shared library, all functions that are declared must actually
be defined.

Change-Id: I1488690cecfb66e62d9fdb3b8d257a4dc31d202a
Reviewed-on: https://go-review.googlesource.com/14187
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-09-07 07:49:09 +00:00
Michael Hudson-Doyle
40af15f28e runtime: teach softfloat interpreter about "add r11, pc, r11"
This is generated during fp code when -shared is active.

Change-Id: Ia1092299b9c3b63ff771ca4842158b42c34bd008
Reviewed-on: https://go-review.googlesource.com/14286
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-09-04 06:43:35 +00:00
Michael Hudson-Doyle
9e6ba37b86 cmd/internal/obj: some platform independent bits of proper toolchain support for thread local storage
Also simplifies some silliness around making the .tbss section wrt internal
vs external linking. The "make TLS make sense" project has quite a few more
steps to go.

Issue #11270

Change-Id: Ia4fa135cb22d916728ead95bdbc0ebc1ae06f05c
Reviewed-on: https://go-review.googlesource.com/13990
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-03 14:06:07 +00:00
Michael Hudson-Doyle
9f0baca505 runtime: fixes for arm64 shared libraries
Building for shared libraries requires that all functions that are declared
have an implementation and vice versa so make that so on arm64.

It would be nicer to not require the stub sigreturn (it will never be called)
but that seems a bit awkward.

Change-Id: I3cec81697161b452af81fa35939f748bd1acf7fd
Reviewed-on: https://go-review.googlesource.com/13995
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-09-03 01:07:40 +00:00
Keith Randall
a088f1b76c runtime: soften up hash checks a bit
The hash tests generate occasional failures; quiet them some more.

In particular we can get 1 collision when the expected number is
.001 or so. That shouldn't be a dealbreaker.

Fixes #12311

Change-Id: I784e91b5d21f4f1f166dc51bde2d1cd3a7a3bfea
Reviewed-on: https://go-review.googlesource.com/13902
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-08-31 19:38:24 +00:00
Shenghou Ma
32d3b96e8b runtime: implement cmpstring and bytes.Compare in assembly for ppc64
Change-Id: I15bf55aa5ac3588c05f0a253f583c52bab209892
Reviewed-on: https://go-review.googlesource.com/14041
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-08-31 18:41:58 +00:00
Austin Clements
77e528293b runtime: check that stack barrier unwind is in sync
Currently the stack barrier stub blindly unwinds the next stack
barrier from the G's stack barrier array without checking that it's
the right stack barrier. If through some bug the stack barrier array
position gets out of sync with where we actually are on the stack,
this could return to the wrong PC, which would lead to difficult to
debug crashes. To address this, this commit adds a check to the amd64
stack barrier stub that it's unwinding the correct stack barrier.

Updates #12238.

Change-Id: If824d95191d07e2512dc5dba0d9978cfd9f54e02
Reviewed-on: https://go-review.googlesource.com/13948
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-30 16:07:02 +00:00
Austin Clements
3bfc9df21a runtime: add GODEBUG for stack barriers at every frame
Currently enabling the debugging mode where stack barriers are
installed at every frame requires recompiling the runtime. However,
this is potentially useful for field debugging and for runtime tests,
so make this mode a GODEBUG.

Updates #12238.

Change-Id: I6fb128f598b19568ae723a612e099c0ed96917f5
Reviewed-on: https://go-review.googlesource.com/13947
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-30 16:06:55 +00:00
Austin Clements
e2bb03f175 runtime: don't install a stack barrier in cgocallback_gofunc's frame
Currently the runtime can install stack barriers in any frame.
However, the frame of cgocallback_gofunc is special: it's the one
function that switches from a regular G stack to the system stack on
return. Hence, the return PC slot in its frame on the G stack is
actually used to save getg().sched.pc (so tracebacks appear to unwind
to the last Go function running on that G), and not as an actual
return PC for cgocallback_gofunc.

Because of this, if we install a stack barrier in cgocallback_gofunc's
return PC slot, when cgocallback_gofunc does return, it will move the
stack barrier stub PC in to getg().sched.pc and switch back to the
system stack. The rest of the runtime doesn't know how to deal with a
stack barrier stub in sched.pc: nothing knows how to match it up with
the G's stack barrier array and, when the runtime removes stack
barriers, it doesn't know to undo the one in sched.pc. Hence, if the C
code later returns back in to Go code, it will attempt to return
through the stack barrier saved in sched.pc, which may no longer have
correct unwinding information.

Fix this by blacklisting cgocallback_gofunc's frame so the runtime
won't install a stack barrier in its return PC slot.

Fixes #12238.

Change-Id: I46aa2155df2fd050dd50de3434b62987dc4947b8
Reviewed-on: https://go-review.googlesource.com/13944
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-30 16:06:47 +00:00
Keith Randall
805e56ef47 runtime: short-circuit bytes.Compare if src and dst are the same slice
Should only matter on ppc64 and ppc64le.

Fixes #11336

Change-Id: Id4b0ac28b573648e1aa98e87bf010f00d006b146
Reviewed-on: https://go-review.googlesource.com/13901
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-08-29 02:43:57 +00:00
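A Go sketch of the short-circuit (the real change is in the runtime's assembly cmpbody; names here are illustrative):

    // compareShortCircuit reports 0 (and true) when a and b share the same
    // backing array, offset, and length: the contents must be equal, so
    // there is no need to read them.
    func compareShortCircuit(a, b []byte) (int, bool) {
        if len(a) == len(b) && (len(a) == 0 || &a[0] == &b[0]) {
            return 0, true
        }
        return 0, false // fall through to the byte-by-byte comparison
    }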
Russ Cox
9c04d00214 runtime: check explicitly for short unwinding of stacks
Right now we only find out about short unwinds implicitly, when
stack barriers or defers happen to be in place. This change makes
sure we always find out about short unwinds.

Change-Id: Ibdde1ba9c79eb792660dcb7aa6f186e4e4d559b3
Reviewed-on: https://go-review.googlesource.com/13966
Reviewed-by: Austin Clements <austin@google.com>
2015-08-28 16:05:59 +00:00
Tim Cooijmans
34db31d5f5 src/runtime: Add missing defs for android/386.
Change-Id: I63bf6d2fdf41b49ff8783052d5d6c53b20e2f050
Reviewed-on: https://go-review.googlesource.com/13760
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-08-27 15:14:41 +00:00
Michael Hudson-Doyle
d497eeb005 runtime: remove unused xchgp/xchgp1
I noticed that they were unimplemented on arm64, and then that they
were in fact not used at all.

Change-Id: Iee579feda2a5e374fa571bcc8c89e4ef607d50f6
Reviewed-on: https://go-review.googlesource.com/13951
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-27 00:28:35 +00:00
Uttam C Pawar
32add8d7c8 bytes: improve Compare function on amd64 for large byte arrays
This patch contains only a loop unrolling change for sizes > 63B.

Following are the performance numbers for various sizes on a
Haswell-based system: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz.

benchcmp go.head.8.25.15.txt go.head.8.25.15.opt.txt
benchmark                       old ns/op     new ns/op     delta
BenchmarkBytesCompare1-4        5.37          5.37          +0.00%
BenchmarkBytesCompare2-4        5.37          5.38          +0.19%
BenchmarkBytesCompare4-4        5.37          5.37          +0.00%
BenchmarkBytesCompare8-4        4.42          4.38          -0.90%
BenchmarkBytesCompare16-4       4.27          4.45          +4.22%
BenchmarkBytesCompare32-4       5.30          5.36          +1.13%
BenchmarkBytesCompare64-4       6.93          6.78          -2.16%
BenchmarkBytesCompare128-4      10.3          9.50          -7.77%
BenchmarkBytesCompare256-4      17.1          13.8          -19.30%
BenchmarkBytesCompare512-4      31.3          22.1          -29.39%
BenchmarkBytesCompare1024-4     62.5          39.0          -37.60%
BenchmarkBytesCompare2048-4     112           73.2          -34.64%

Change-Id: I4eeb1c22732fd62cbac97ba757b0d29f648d4ef1
Reviewed-on: https://go-review.googlesource.com/11871
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-26 03:52:20 +00:00
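The unrolling itself is amd64 assembly; a rough Go equivalent of the word-at-a-time idea (a sketch, not the actual implementation) is:

    import "encoding/binary"

    // compare8 assumes len(a) == len(b) and walks both slices a word at a
    // time. Loading big-endian makes uint64 comparison agree with
    // lexicographic byte order.
    func compare8(a, b []byte) int {
        n := len(a) &^ 7 // largest multiple of 8 that fits
        for i := 0; i < n; i += 8 {
            x := binary.BigEndian.Uint64(a[i:])
            y := binary.BigEndian.Uint64(b[i:])
            if x != y {
                if x < y {
                    return -1
                }
                return 1
            }
        }
        for i := n; i < len(a); i++ { // tail, one byte at a time
            if a[i] != b[i] {
                if a[i] < b[i] {
                    return -1
                }
                return 1
            }
        }
        return 0
    }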
Todd Neal
a94e906c41 runtime: remove always false comparison in sigsend
s is a uint32 and can never be negative. Its max value is already tested
against sig.wanted, whose size is derived from _NSIG.  This also
matches the test in signal_enable.

Fixes #11282

Change-Id: I8eec9c7df8eb8682433616462fe51b264c092475
Reviewed-on: https://go-review.googlesource.com/13940
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-08-26 01:02:55 +00:00
Michael Hudson-Doyle
af78482d6b cmd/compile, cmd/link, reflect, runtime: remove type.zero field
No longer used after previous hashmap change.

Change-Id: I558470f872281e84a78406132df4e391d077b833
Reviewed-on: https://go-review.googlesource.com/13785
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-26 00:28:17 +00:00
Michael Hudson-Doyle
38519e69d0 cmd/compile, runtime: stop returning t.zero on hashmap miss
Previously t.zero always pointed to runtime.zerovalue. Change the hashmap code
to always return a runtime pointer directly, and change that pointer to point
to a larger buffer if one is needed.

(It might be better to only copy from the pointer returned by the mapaccess
functions when the value type is small enough and have the compiler insert
explicit zeroing for larger value types, but I tried and failed to do this).

This removes all uses of the zero field of the type data; the field itself can
be removed in a separate change.

Fixes #11491

Change-Id: I5b81752ff4067d74a5a281c41e88f151bae0171e
Reviewed-on: https://go-review.googlesource.com/13784
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-08-26 00:03:21 +00:00
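A sketch of the scheme, with a hypothetical growZerobuf helper (the real runtime grows its buffer under a lock when a larger value type is first seen):

    import "unsafe"

    // zerobuf backs the pointer returned on a map miss, so mapaccess can
    // always return a valid pointer for the caller to copy from.
    var zerobuf = make([]byte, 1024)

    func mapaccessMiss(valueSize uintptr) unsafe.Pointer {
        if int(valueSize) > len(zerobuf) {
            growZerobuf(valueSize) // hypothetical: reallocate under a lock
        }
        return unsafe.Pointer(&zerobuf[0])
    }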
Austin Clements
05a3b1fce5 cmd/compile: fix uninitialized memory in compare of interface value
A comparison of the form l == r where l is an interface and r is
concrete performs a type assertion on l to convert it to r's type.
However, the compiler fails to zero the temporary where the result of
the type assertion is written, so if the type is a pointer type and a
stack scan occurs while in the type assertion, it may see an invalid
pointer on the stack.

Fix this by zeroing the temporary. This is equivalent to the fix for
type switches from c4092ac.

Fixes #12253.

Change-Id: Iaf205d456b856c056b317b4e888ce892f0c555b9
Reviewed-on: https://go-review.googlesource.com/13872
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-25 14:37:08 +00:00
Dave Cheney
686d44d9e0 runtime: check pointer equality in arm64 cmpbody
Updates #11336

Follow the lead of amd64 by doing a pointer equality check
before comparing string/byte contents on arm64.

BenchmarkCompareBytesEqual-8               25.8           26.3           +1.94%
BenchmarkCompareBytesToNil-8               9.59           9.59           +0.00%
BenchmarkCompareBytesEmpty-8               9.59           9.17           -4.38%
BenchmarkCompareBytesIdentical-8           26.3           9.17           -65.13%
BenchmarkCompareBytesSameLength-8          16.3           16.3           +0.00%
BenchmarkCompareBytesDifferentLength-8     16.3           16.3           +0.00%
BenchmarkCompareBytesBigUnaligned-8        1132038        1131409        -0.06%
BenchmarkCompareBytesBig-8                 1126758        1128470        +0.15%
BenchmarkCompareBytesBigIdentical-8        1084366        9.17           -100.00%

Change-Id: Id7125c31957eff1ddb78897d4511bd50e79af3f7
Reviewed-on: https://go-review.googlesource.com/13885
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-25 03:29:47 +00:00
Todd Neal
3efe36d4c4 runtime: fix nmspinning comparison
nmspinning has a value range of [0, 2^31-1].  Update the comment to
indicate this and fix the comparison so it's not always false.

Fixes #11280

Change-Id: Iedaf0654dcba5e2c800645f26b26a1a781ea1991
Reviewed-on: https://go-review.googlesource.com/13877
Reviewed-by: Minux Ma <minux@golang.org>
2015-08-25 02:44:11 +00:00
Shenghou Ma
24be0997a2 runtime: add a missing hex conversion
gobuf.g is a guintptr, so without hex(), it will be printed as
a decimal, which is not very helpful and inconsistent with how
other pointers are printed.

Change-Id: I7c0432e9709e90a5c3b3e22ce799551a6242d017
Reviewed-on: https://go-review.googlesource.com/13879
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-25 01:37:54 +00:00
Dave Cheney
1135b9d671 runtime: check pointer equality in arm cmpbody
Updates #11336

Follow the lead of amd64 by doing a pointer equality check
before comparing string/byte contents on arm.

BenchmarkCompareBytesEqual-4               208             211             +1.44%
BenchmarkCompareBytesToNil-4               83.6            81.8            -2.15%
BenchmarkCompareBytesEmpty-4               80.2            75.2            -6.23%
BenchmarkCompareBytesIdentical-4           208             75.2            -63.85%
BenchmarkCompareBytesSameLength-4          126             128             +1.59%
BenchmarkCompareBytesDifferentLength-4     128             130             +1.56%
BenchmarkCompareBytesBigUnaligned-4        14192804        14060971        -0.93%
BenchmarkCompareBytesBig-4                 12277313        12128193        -1.21%
BenchmarkCompareBytesBigIdentical-4        9385046         78.5            -100.00%

Change-Id: I5b24620018688c5fe04b6ff6743a24c4ce225788
Reviewed-on: https://go-review.googlesource.com/13881
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-24 21:18:33 +00:00
Hyang-Ah (Hana) Kim
db5eb2a2c3 runtime/cgo: remove __stack_chk_fail_local
I cannot find where it's being used.

This addresses a duplicate symbol issue encountered in golang/go#9327.

Change-Id: I8efda45a006ad3e19423748210c78bd5831215e0
Reviewed-on: https://go-review.googlesource.com/13615
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-21 15:56:36 +00:00
Shawn Walker-Salas
d9e3d16796 runtime, syscall: remove unused bits from Solaris implementation
CL 9184 changed the runtime and syscall packages to link Solaris binaries
directly instead of using dlopen/dlsym but did not remove the unused (and
now broken) references to dlopen, dlclose, and dlsym.

Fixes #11923

Change-Id: I36345ce5e7b371bd601b7d48af000f4ccacd62c0
Reviewed-on: https://go-review.googlesource.com/13410
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
2015-08-21 11:39:24 +00:00
Russ Cox
3ae17043f7 runtime: make sure heapBitsBulkBarrier cannot be preempted
Changes the torture test in #12068 from failing about 1/10 times
to not failing in almost 2,000 runs.

This was only happening in -race mode because functions are
bigger in -race mode, so a few of the helpers for heapBitsBulkBarrier
were not being inlined, and they were not marked nosplit,
so (only in -race mode) the write barrier was being preempted by GC,
causing missed pointer updates.

Filed issue #12069 for diagnosis of any other similar errors.

Fixes #12068.

Change-Id: Ic174d9b050ba278b18b08ab0d85a73c33bd5b175
Reviewed-on: https://go-review.googlesource.com/13364
Reviewed-by: Austin Clements <austin@google.com>
2015-08-07 17:55:26 +00:00
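The fix marks those helpers //go:nosplit, which removes the stack-growth check from the prologue and with it the preemption point at function entry. An illustrative example (not the actual helper):

    // With go:nosplit the compiler emits no stack-growth check in the
    // prologue, so the GC cannot preempt this function at a morestack
    // call and interrupt a write-barrier helper mid-update.
    //
    //go:nosplit
    func storeSlot(dst, src *uintptr) {
        *dst = *src
    }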
Russ Cox
4a19081358 runtime: run on GOARM=5 and GOARM=6 uniprocessor freebsd/arm systems
Also, crash early on non-Linux SMP ARM systems when GOARM < 7;
without the proper synchronization, SMP cannot work.

Linux is okay because we call kernel-provided routines for
synchronization and barriers, and the kernel takes care of
providing the right routines for the current system.
On non-Linux systems we are left to fend for ourselves.

It is possible to use different synchronization on GOARM=6,
but it's too late to do that in the Go 1.5 cycle.
We don't believe there are any non-Linux SMP GOARM=6 systems anyway.

Fixes #12067.

Change-Id: I771a556e47893ed540ec2cd33d23c06720157ea3
Reviewed-on: https://go-review.googlesource.com/13363
Reviewed-by: Austin Clements <austin@google.com>
2015-08-07 17:39:07 +00:00
Austin Clements
ad731887a7 runtime: call goexit1 instead of goexit
Currently, runtime.Goexit() calls goexit()—the goroutine exit stub—to
terminate the goroutine. This *mostly* works, but can cause a
"leftover stack barriers" panic if the following happens:

1. Goroutine A has a reasonably large stack.

2. The garbage collector scan phase runs and installs stack barriers
   in A's stack. The top-most stack barrier happens to fall at address X.

3. Goroutine A unwinds the stack far enough to be a candidate for
   stack shrinking, but not past X.

4. Goroutine A calls runtime.Goexit(), which calls goexit(), which
   calls goexit1().

5. The garbage collector enters mark termination.

6. Goroutine A is preempted right at the prologue of goexit1() and
   performs a stack shrink, which calls gentraceback.

gentraceback stops as soon as it sees goexit on the stack, which is
only two frames up at this point, even though there may really be many
frames above it. More to the point, the stack barrier at X is above
the goexit frame, so gentraceback never sees that stack barrier. At
the end of gentraceback, it checks that it saw all of the stack
barriers and panics because it didn't see the one at X.

The fix is simple: call goexit1, which actually implements the process
of exiting a goroutine, rather than goexit, the exit stub.

To make sure this doesn't happen again in the future, we also add an
argument to the stub prototype of goexit so you really, really have to
want to call it in order to call it. We were able to reliably
reproduce the above sequence with a fair amount of awful code inserted
at the right places in the runtime, but chose to change the goexit
prototype to ensure this wouldn't happen again rather than pollute the
runtime with ugly testing code.

Change-Id: Ifb6fb53087e09a252baddadc36eebf954468f2a8
Reviewed-on: https://go-review.googlesource.com/13323
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-06 20:21:05 +00:00
Russ Cox
26baed6af7 runtime: fix race that dropped GoSysExit events from trace
This makes TestTraceStressStartStop much less flaky.
Running under stress, it changes the failure rate from
above 1/100 to under 1/50000. That very unlikely
failure happens when an unexpected GoSysExit is
written. Not sure how that happens yet, but it is much
less important.

Fixes #11953.

Change-Id: I034671936334b4f3ab733614ef239aa121d20247
Reviewed-on: https://go-review.googlesource.com/13321
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-08-06 19:29:09 +00:00
Austin Clements
d57f037302 runtime: don't recheck heap trigger for periodic GC
88e945f introduced a non-speculative double check of the heap trigger
before actually starting a concurrent GC. This was necessary to fix a
race for heap-triggered GC, but broke sysmon-triggered periodic GC,
since the heap check will of course fail for periodically triggered
GC.

Fix this by telling startGC whether or not this GC was triggered by
heap size or a timer and only doing the heap size double check for GCs
triggered by heap size.

Fixes #12026.

Change-Id: I7c3f6ec364545c36d619f2b4b3bf3b758e3bcbd6
Reviewed-on: https://go-review.googlesource.com/13168
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-05 17:28:56 +00:00
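In sketch form (illustrative helper names, not the runtime's actual code):

    func startGC(forceTrigger bool) {
        // Two goroutines can race past the speculative trigger check, so a
        // heap-triggered GC re-checks before committing. A periodic
        // (sysmon-triggered) GC skips the re-check, since the heap is
        // legitimately below the trigger.
        if !forceTrigger && heapLive() < gcTrigger() { // illustrative helpers
            return // lost the race; a GC cycle already ran
        }
        beginCycle() // illustrative
    }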
Russ Cox
2a60d77059 runtime: align stack pointer during initcgo call on arm
This is what is causing freebsd/arm to crash mysteriously when using cgo.
The bug was introduced in golang.org/cl/4030, which moved this code out
of rt0_go and into its own function. The ARM ABI says that calls must
be made with the stack pointer at an 8-byte boundary, but only FreeBSD
seems to crash when this is violated.

Fixes #10119.

Change-Id: Ibdbe76b2c7b80943ab66b8abbb38b47acb70b1e5
Reviewed-on: https://go-review.googlesource.com/13161
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-08-05 05:31:34 +00:00
Austin Clements
be39a42920 runtime: fix typos in comments
Change-Id: I66f7937b22bb6e05c3f2f0f2a057151020ad9699
Reviewed-on: https://go-review.googlesource.com/13049
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:56 +00:00
Austin Clements
e3870aa6f3 runtime: fix assist utilization computation
When commit 510fd13 enabled assists during the scan phase, it failed
to also update the code in the GC controller that computed the assist
CPU utilization and adjusted the trigger based on it. Fix that code so
it uses the start of the scan phase as the wall-clock time when
assists were enabled rather than the start of the mark phase.

Change-Id: I05013734b4448c3e2c730dc7b0b5ee28c86ed8cf
Reviewed-on: https://go-review.googlesource.com/13048
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:53 +00:00
Austin Clements
1fb01a88f9 runtime: revise assist ratio aggressively
At the start of a GC cycle, the garbage collector computes the assist
ratio based on the total scannable heap size. This was intended to be
conservative; after all, this assumes the entire heap may be reachable
and hence needs to be scanned. But it only assumes that the *current*
entire heap may be reachable. It fails to account for heap allocated
during the GC cycle. If the trigger ratio is very low (near zero), and
most of the heap is reachable when GC starts (which is likely if the
trigger ratio is near zero), then it's possible for the mutator to
create new, reachable heap fast enough that the assists won't keep up
based on the assist ratio computed at the beginning of the cycle. As a
result, the heap can grow beyond the heap goal (by hundreds of megs in
stress tests like in issue #11911).

We already have some vestigial logic for dealing with situations like
this; it just doesn't run often enough. Currently, every 10 ms during
the GC cycle, the GC revises the assist ratio. This was put in before
we switched to a conservative assist ratio (when we really were using
estimates of scannable heap), and it turns out to be exactly what we
need now. However, every 10 ms is far too infrequent for a rapidly
allocating mutator.

This commit reuses this logic, but replaces the 10 ms timer with
revising the assist ratio every time the heap is locked, which
coincides precisely with when the statistics used to compute the
assist ratio are updated.

Fixes #11911.

Change-Id: I377b231ab064946228378fa10422a46d1b50f4c5
Reviewed-on: https://go-review.googlesource.com/13047
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:48 +00:00
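A simplified sketch of the revision itself (illustrative names; the real logic lives in the GC controller):

    // revise recomputes the assist ratio from live statistics. Calling it
    // whenever the heap is locked keeps the ratio in step with allocation
    // during the cycle.
    func revise(scanWorkExpected, scanWorkDone, heapGoal, heapLive int64) float64 {
        workLeft := scanWorkExpected - scanWorkDone
        if workLeft < 1 {
            workLeft = 1 // keep requesting some assist work until the cycle ends
        }
        heapLeft := heapGoal - heapLive
        if heapLeft <= 0 {
            heapLeft = 1 // past the goal: demand maximal assists
        }
        // Scan work a mutator must perform per byte it allocates from here on.
        return float64(workLeft) / float64(heapLeft)
    }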
Austin Clements
f9dc3382ad runtime: when gcpacertrace > 0, print information about assist ratio
This was useful in debugging the mutator assist behavior for #11911,
and it fits with the other gcpacertrace output.

Change-Id: I1e25590bb4098223a160de796578bd11086309c7
Reviewed-on: https://go-review.googlesource.com/13046
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:46 +00:00
Austin Clements
fc9ca85f4c runtime: make sweep proportional to spans bytes allocated
Proportional concurrent sweep is currently based on a ratio of spans
to be swept per bytes of object allocation. However, proportional
sweeping is performed during span allocation, not object allocation,
in order to minimize contention and overhead. Since objects are
allocated from spans after those spans are allocated, the system tends
to operate in debt, which means when the next GC cycle starts, there
is often sweep debt remaining, so GC has to finish the sweep, which
delays the start of the cycle and delays enabling mutator assists.

For example, it's quite likely that many Ps will simultaneously refill
their span caches immediately after a GC cycle (because GC flushes the
span caches), but at this point, there has been very little object
allocation since the end of GC, so very little sweeping is done. The
Ps then allocate objects from these cached spans, which drives up the
bytes of object allocation, but since these allocations are coming
from cached spans, nothing considers whether more sweeping has to
happen. If the sweep ratio is high enough (which can happen if the
next GC trigger is very close to the retained heap size), this can
easily represent a sweep debt of thousands of pages.

Fix this by making proportional sweep proportional to the number of
bytes of spans allocated, rather than the number of bytes of objects
allocated. Prior to allocating a span, both the small object path and
the large object path ensure credit for allocating that span, so the
system operates in the black, rather than in the red.

Combined with the previous commit, this should eliminate all sweeping
from GC start up. On the stress test in issue #11911, this reduces the
time spent sweeping during GC (and delaying start up) by several
orders of magnitude:

                mean    99%ile     max
    pre fix      1 ms    11 ms   144 ms
    post fix   270 ns   735 ns   916 ns

Updates #11911.

Change-Id: I89223712883954c9d6ec2a7a51ecb97172097df3
Reviewed-on: https://go-review.googlesource.com/13044
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:44 +00:00
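A simplified sketch of span-proportional sweep credit (names are illustrative, loosely following the runtime's):

    // sweepForSpan is called before each span allocation, so sweep credit
    // is earned up front and the system stays "in the black".
    func sweepForSpan(spanBytes uint64, spanBytesAlloc *uint64,
        pagesSwept *uint64, pagesPerByte float64, sweepOne func() bool) {
        *spanBytesAlloc += spanBytes
        target := pagesPerByte * float64(*spanBytesAlloc)
        for float64(*pagesSwept) < target {
            if !sweepOne() {
                return // nothing left to sweep this cycle
            }
            *pagesSwept++
        }
    }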
Austin Clements
e30c6d64ba runtime: always give concurrent sweep some heap distance
Currently it's possible for the next_gc heap size trigger computed for
the next GC cycle to be less than the current allocated heap size.
This means the next cycle will start immediately, which means there's
no time to perform the concurrent sweep between GC cycles. This places
responsibility for finishing the sweep on GC itself, which delays GC
start-up and hence delays mutator assist.

Fix this by ensuring that next_gc is always at least a little higher
than the allocated heap size, so we won't trigger the next cycle
instantly.

Updates #11911.

Change-Id: I74f0b887bf187518d5fedffc7989817cbcf30592
Reviewed-on: https://go-review.googlesource.com/13043
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:41 +00:00
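The fix boils down to flooring the trigger, along these lines (a sketch; the slack constant is illustrative):

    // clampTrigger keeps the next trigger a little above the current heap
    // so the next cycle cannot start instantly and concurrent sweep gets
    // room to run.
    func clampTrigger(nextGC, heapLive uint64) uint64 {
        const slack = 1 << 20 // 1 MB; the real constant is the runtime's choice
        if nextGC < heapLive+slack {
            return heapLive + slack
        }
        return nextGC
    }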
Austin Clements
fb5230af8a runtime: assist the GC during GC startup and shutdown
Currently there are two sensitive periods during which a mutator can
allocate past the heap goal but mutator assists can't be enabled: 1)
at the beginning of GC between when the heap first passes the heap
trigger and sweep termination and 2) at the end of GC between mark
termination and when the background GC goroutine parks. During these
periods there's no back-pressure or safety net, so a rapidly
allocating mutator can allocate past the heap goal. This is
exacerbated if there are many goroutines because the GC coordinator is
scheduled as any other goroutine, so if it gets preempted during one
of these periods, it may stay preempted for a long period (10s or 100s
of milliseconds).

Normally the mutator does scan work to create back-pressure against
allocation, but there is no scan work during these periods. Hence, as
a fallback, if a mutator would assist but can't yet, simply yield the
CPU. This delays the mutator somewhat, but more importantly gives more
CPU time to the GC coordinator for it to complete the transition.

This is obviously a workaround. Issue #11970 suggests a far better but
far more invasive way to fix this.

Updates #11911. (This very nearly fixes the issue, but about once
every 15 minutes I get a GC cycle where the assists are enabled but
don't do enough work.)

Change-Id: I9768b79e3778abd3e06d306596c3bd77f65bf3f1
Reviewed-on: https://go-review.googlesource.com/13026
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-08-04 18:54:38 +00:00
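In sketch form, with illustrative parameters:

    import "runtime"

    // maybeYieldToGC is the fallback: the mutator cannot assist yet, so
    // give up the CPU to the GC coordinator instead of allocating further
    // past the goal.
    func maybeYieldToGC(assistsEnabled bool, heapLive, heapGoal uint64) {
        if !assistsEnabled && heapLive > heapGoal {
            runtime.Gosched()
        }
    }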
Austin Clements
88e945fd23 runtime: recheck GC trigger before actually starting GC
Currently allocation checks the GC trigger speculatively during
allocation and then triggers the GC without rechecking. As a result,
it's possible for G 1 and G 2 to detect the trigger simultaneously,
both enter startGC, G 1 actually starts GC while G 2 gets preempted
until after the whole GC cycle, then G 2 immediately starts another GC
cycle even though the heap is now well under the trigger.

Fix this by re-checking the GC trigger non-speculatively just before
actually kicking off a new GC cycle.

This contributes to #11911 because when this happens, we definitely
don't finish the background sweep before starting the next GC cycle,
which can significantly delay the start of concurrent scan.

Change-Id: I560ab79ba5684ba435084410a9765d28f5745976
Reviewed-on: https://go-review.googlesource.com/13025
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-08-04 18:54:32 +00:00
Mikio Hara
5e15e28e0e runtime: skip TestCgoCallbackGC on dragonfly
Updates #11990.

Change-Id: I6c58923a1b5a3805acfb6e333e3c9e87f4edf4ba
Reviewed-on: https://go-review.googlesource.com/13050
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-03 04:41:48 +00:00
Russ Cox
c5dff7282e cmd/compile, runtime: fix placement of map bucket overflow pointer on nacl
On most systems, a pointer is the worst case alignment, so adding
a pointer field at the end of a struct guarantees there will be no
padding added after that field (to satisfy overall struct alignment
due to some more-aligned field also present).

In the runtime, the map implementation needs a quick way to
get to the overflow pointer, which is last in the bucket struct,
so it uses size - sizeof(pointer) as the offset.

NaCl/amd64p32 is the exception, as always.
The worst case alignment is 64 bits but pointers are 32 bits.
There's a long history that is not worth going into, but when
we moved the overflow pointer to the end of the struct,
we didn't get the padding computation right.
The compiler computed the regular struct size and then
on amd64p32 added another 32-bit field.
And the runtime assumed it could step back two 32-bit fields
(one 64-bit register size) to get to the overflow pointer.
But in fact if the struct needed 64-bit alignment, the computation
of the regular struct size would have added a 32-bit pad already,
and then the code unconditionally added a second 32-bit pad.
This placed the overflow pointer three words from the end, not two.
The last two were padding, and since the runtime was consistent
about using the second-to-last word as the overflow pointer,
no harm done in the sense of overwriting useful memory.
But writing the overflow pointer to a non-pointer word of memory
means that the GC can't see the overflow blocks, so it will
collect them prematurely. Then bad things happen.

Correct all this in a few steps:

1. Add an explicit check at the end of the bucket layout in the
compiler that the overflow field is last in the struct, never
followed by padding.

2. When padding is needed on nacl (not always, just when needed),
insert it before the overflow pointer, to preserve the "last in the struct"
property.

3. Let the compiler have the final word on the width of the struct,
by inserting an explicit padding field instead of overwriting the
results of the width computation it does.

4. For the same reason (tell the truth to the compiler), set the type
of the overflow field when we're trying to pretend it's not a pointer
(in this case the runtime maintains a list of the overflow blocks
elsewhere).

5. Make the runtime use "last in the struct" as its location algorithm.

This fixes TestTraceStress on nacl/amd64p32.
The 'bad map state' and 'invalid free list' failures no longer occur.

Fixes #11838.

Change-Id: If918887f8f252d988db0a35159944d2b36512f92
Reviewed-on: https://go-review.googlesource.com/12971
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-07-31 18:49:32 +00:00
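Step 5's "last in the struct" rule makes the lookup simple. A sketch of the access (illustrative signature; the runtime's version takes its own bucket and map types):

    import "unsafe"

    // overflow loads a bucket's overflow pointer from the last
    // pointer-sized word of the bucket, which the layout rule above
    // guarantees is where it lives, wherever padding ended up.
    func overflow(bucket unsafe.Pointer, bucketSize uintptr) unsafe.Pointer {
        const ptrSize = unsafe.Sizeof(uintptr(0))
        return *(*unsafe.Pointer)(unsafe.Pointer(uintptr(bucket) + bucketSize - ptrSize))
    }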
Russ Cox
108ec5f75a runtime: fix systemstack tracebacks on nacl/arm
For #11956.

Change-Id: Ic9b57cafa197953cc7f435941e44d42b60b3ddf0
Reviewed-on: https://go-review.googlesource.com/13011
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-07-31 04:35:38 +00:00
Russ Cox
abdc77a288 runtime: avoid reference to stale stack after GC shrinkstack
Dangling pointer error. Unlikely to trigger in practice, but still.
Found by running GODEBUG=efence=1 GOGC=1 trace.test.

Change-Id: Ice474dedcf62dd33ab77526287a023ba3b166db9
Reviewed-on: https://go-review.googlesource.com/12991
Reviewed-by: Austin Clements <austin@google.com>
2015-07-31 02:18:42 +00:00
Russ Cox
4bd8040d47 runtime, sync/atomic: add memory barriers in arm cas routines
This only triggers on ARMv7+.
If there are important SMP ARMv6 machines we can reconsider.

Makes TestLFStress tests pass and sync/atomic tests not time out
on Apple iPad Mini 3.

Fixes #7977.
Fixes #10189.

Change-Id: Ie424dea3765176a377d39746be9aa8265d11bec4
Reviewed-on: https://go-review.googlesource.com/12950
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-30 20:11:11 +00:00
Russ Cox
e0c180c44f runtime/cgo: fix darwin/amd64 signal handling setup
Was not allocating space for the frame above sigpanic,
nor was it pushing the LR into the right place.
Because traceback past sigpanic only needs the
LR for faulting leaves, this was not noticed too much.
But it did break the sync/atomic nil deref tests.

Change-Id: Icba53fffa193423aab744c37f21ee893ce2ee3ac
Reviewed-on: https://go-review.googlesource.com/12926
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-30 19:18:45 +00:00
Russ Cox
b2dfacf35e runtime: change arm software div/mod call sequence not to modify stack
Instead of pushing the denominator argument on the stack,
the denominator is now passed in m.

This fixes a variety of bugs related to trying to take stack traces
backwards from the middle of the software div/mod routines.
Some of those bugs have been kludged around in the past,
but others have not. Instead of trying to patch up after breaking
the stack, this CL stops breaking the stack.

This is an update of https://golang.org/cl/19810043,
which was rolled back in https://golang.org/cl/20350043.

The problem in the original CL was that there were divisions
at bad times, when m was not available. These were divisions
by constant denominators, either in C code or in assembly.
The Go compiler knows how to generate division by multiplication
for constant denominators, but the C compiler did not.
There is no longer any C code, so that's taken care of.
There was one problematic DIV in runtime.usleep (assembly)
but https://golang.org/cl/12898 took care of that one.
So now this approach is safe.

Reject DIV/MOD in NOSPLIT functions to keep them from
coming back.

Fixes #6681.
Fixes #6699.
Fixes #10486.

Change-Id: I09a13c76ad08ba75b3bd5d46a3eb78e66a84ab38
Reviewed-on: https://go-review.googlesource.com/12899
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-30 16:14:05 +00:00
Russ Cox
c9d2c7f0d2 runtime: replace divide with multiply in runtime.usleep on arm
We want to adjust the DIV calling convention to use m,
and usleep can be called without an m, so switch to a
multiplication by the reciprocal (and test).

Step toward a fix for #6699 and #10486.

Change-Id: Iccf76a18432d835e48ec64a2fa34a0e4d6d4b955
Reviewed-on: https://go-review.googlesource.com/12898
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-30 15:48:29 +00:00
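The replacement is the standard divide-by-constant-via-reciprocal trick. A Go rendering of the arithmetic (the magic constant is ceil(2^50/10^6); the quotient is exact for every uint32 input, and the asm uses the same idea):

    // divmod1e6 computes usec/1e6 and usec%1e6 without a hardware divide:
    // multiply by 0x431BDE83 and shift right 50. The approximation error
    // is small enough that the quotient is exact for all uint32 inputs.
    func divmod1e6(usec uint32) (q, r uint32) {
        q = uint32(uint64(usec) * 0x431BDE83 >> 50)
        r = usec - q*1000000
        return
    }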
David Crawshaw
b7205b92c0 runtime/trace: test requires 'go tool addr2line'
For the android/arm builder.

Change-Id: Iad4881689223cd6479870da9541524a8cc458cce
Reviewed-on: https://go-review.googlesource.com/12859
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
2015-07-30 05:57:37 +00:00
Russ Cox
c4092ac398 cmd/compile: fix uninitialized memory during type switch assertE2I2
Fixes arm64 builder crash.

The bug is possible on all architectures; you just have to get lucky
and hit a preemption or a stack growth on entry to assertE2I2.
The test stacks the deck.

Change-Id: I8419da909b06249b1ad15830cbb64e386b6aa5f6
Reviewed-on: https://go-review.googlesource.com/12890
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-30 05:21:56 +00:00
Russ Cox
bfac8623d5 runtime: enable TestEmptySlice
It says to disable until #7564 is fixed. It was fixed in April 2014.

Change-Id: I9bebfe96802bafdd2d1a0a47591df346d91b000c
Reviewed-on: https://go-review.googlesource.com/12858
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-30 04:47:16 +00:00
Russ Cox
d3ffc975f3 runtime: set invalidptr=1 by default, as documented
Also make invalidptr control the recently added GC pointer check,
as documented.

Change-Id: Iccfdf49480219d12be8b33b8f03d8312d8ceabed
Reviewed-on: https://go-review.googlesource.com/12857
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-29 23:50:20 +00:00
Russ Cox
bd5ca22232 runtime/trace: remove existing Skips
The skips added in CL 12579, based on incorrect time stamps,
should be sufficient to identify and exclude all the time-related
flakiness on these systems.

If there is other flakiness, we want to find out.

For #10512.

Change-Id: I5b588ac1585b2e9d1d18143520d2d51686b563e3
Reviewed-on: https://go-review.googlesource.com/12746
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 22:32:23 +00:00
Russ Cox
80c98fa901 runtime/trace: record event sequence numbers explicitly
Nearly all the flaky failures we've seen in trace tests have been
due to the use of time stamps to determine relative event ordering.
This is tricky for many reasons, including:
 - different cores might not have exactly synchronized clocks
 - VMs are worse than real hardware
 - non-x86 chips have different timer resolution than x86 chips
 - on fast systems two events can end up with the same time stamp

Stop trying to make time reliable. It's clearly not going to be for Go 1.5.
Instead, record an explicit event sequence number for ordering.
Using our own counter solves all of the above problems.

The trace still contains time stamps, of course. The sequence number
is just used for ordering.

Should alleviate #10554 somewhat. Then tickDiv can be chosen to
be a useful time unit instead of having to be exact for ordering.

Separating ordering and time stamps lets the trace parser diagnose
systems where the time stamp order and actual order do not match
for one reason or another. This CL adds that check to the end of
trace.Parse, after all other sequence order-based checking.
If that error is found, we skip the test instead of failing it.
Putting the check in trace.Parse means that cmd/trace will pick
up the same check, refusing to display a trace where the time stamps
do not match actual ordering.

Using net/http's BenchmarkClientServerParallel4 on various CPU counts,
not tracing vs tracing:

name                      old time/op    new time/op    delta
ClientServerParallel4       50.4µs ± 4%    80.2µs ± 4%  +59.06%        (p=0.000 n=10+10)
ClientServerParallel4-2     33.1µs ± 7%    57.8µs ± 5%  +74.53%        (p=0.000 n=10+10)
ClientServerParallel4-4     18.5µs ± 4%    32.6µs ± 3%  +75.77%        (p=0.000 n=10+10)
ClientServerParallel4-6     12.9µs ± 5%    24.4µs ± 2%  +89.33%        (p=0.000 n=10+10)
ClientServerParallel4-8     11.4µs ± 6%    21.0µs ± 3%  +83.40%        (p=0.000 n=10+10)
ClientServerParallel4-12    14.4µs ± 4%    23.8µs ± 4%  +65.67%        (p=0.000 n=10+10)

Fixes #10512.

Change-Id: I173eecf8191e86feefd728a5aad25bf1bc094b12
Reviewed-on: https://go-review.googlesource.com/12579
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 22:32:14 +00:00
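Conceptually, the ordering token is just a shared atomic counter (a sketch, not the tracer's actual event encoding):

    import "sync/atomic"

    var traceSeq uint64 // global sequence number for trace events

    // nextSeq issues a strictly increasing ordering token. Unlike a
    // timestamp, it is immune to unsynchronized cores, coarse clocks, and
    // two events landing on the same tick.
    func nextSeq() uint64 {
        return atomic.AddUint64(&traceSeq, 1)
    }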
Russ Cox
fde392623a runtime: ignore arguments in cgocallback_gofunc frame
Otherwise the GC may see uninitialized memory there,
which might be old pointers that are retained, or it might
trigger the invalid pointer check.

Fixes #11907.

Change-Id: I67e306384a68468eef45da1a8eb5c9df216a77c0
Reviewed-on: https://go-review.googlesource.com/12852
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 22:30:46 +00:00
Russ Cox
f6dfe16798 runtime: fix darwin/amd64 assembly frame sizes
Change-Id: I2f0ecdc02ce275feadf07e402b54f988513e9b49
Reviewed-on: https://go-review.googlesource.com/12855
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-29 22:26:02 +00:00
Russ Cox
4addec3aaa runtime: reenable bad pointer check in GC
The last time we tried this, linux/arm64 broke.
The series of CLs leading to this one fixes that problem.
Let's try again.

Fixes #9880.

Change-Id: I67bc1d959175ec972d4dcbe4aa6f153790f74251
Reviewed-on: https://go-review.googlesource.com/12849
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 21:37:55 +00:00
Russ Cox
421220571d runtime, reflect: use correctly aligned stack frame sizes on arm64
arm64 requires either no stack frame or a frame with a size that is 8 mod 16
(adding the saved LR will make it 16-aligned).

The cmd/internal/obj/arm64 has been silently aligning frames, but it led to
a terrible bug when the compiler and obj disagreed on the frame size,
and it's just generally confusing, so we're going to make misaligned frames
an error instead of something that is silently changed.

This CL prepares by updating assembly files.
Note that the changes in this CL are already being done silently by
cmd/internal/obj/arm64, so there is no semantic effect here,
just a clarity effect.

For #9880.

Change-Id: Ibd6928dc5fdcd896c2bacd0291bf26b364591e28
Reviewed-on: https://go-review.googlesource.com/12845
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 21:35:35 +00:00
Austin Clements
23e4744c07 runtime: report GC CPU utilization in MemStats
This adds a GCCPUFraction field to MemStats that reports the
cumulative fraction of the program's execution time spent in the
garbage collector. This is equivalent to the utilization percent shown
in the gctrace output and makes this available programmatically.

This does make one small effect on the gctrace output: we now report
the duration of mark termination up to just before the final
start-the-world, rather than up to just after. However, unlike
stop-the-world, I don't believe there's any way that start-the-world
can block, so it should take negligible time.

While there are many statistics one might want to expose via MemStats,
this is one of the few that will undoubtedly remain meaningful
regardless of future changes to the memory system.

The diff for this change is larger than the actual change. Mostly it
lifts the code for computing the GC CPU utilization out of the
debug.gctrace path.

Updates #10323.

Change-Id: I0f7dc3fdcafe95e8d1233ceb79de606b48acd989
Reviewed-on: https://go-review.googlesource.com/12844
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-29 20:23:34 +00:00
Austin Clements
4b71660c5b runtime: always capture GC phase transition times
Currently we only capture GC phase transition times if
debug.gctrace>0, but we're about to compute GC CPU utilization
regardless of whether debug.gctrace is set, so we need these
regardless of debug.gctrace.

Change-Id: If3acf16505a43d416e9a99753206f03287180660
Reviewed-on: https://go-review.googlesource.com/12843
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-07-29 20:23:25 +00:00
Austin Clements
87f97c73d3 runtime: avoid race between SIGPROF traceback and stack barriers
The following sequence of events can lead to the runtime attempting an
out-of-bounds access on a stack barrier slice:

1. A SIGPROF comes in on a thread while the G on that thread is in
   _Gsyscall. The sigprof handler calls gentraceback, which saves a
   local copy of the G's stkbar slice. Currently the G has no stack
   barriers, so this slice is empty.

2. On another thread, the GC concurrently scans the stack of the
   goroutine being profiled (it considers it stopped because it's in
   _Gsyscall) and installs stack barriers.

3. Back on the sigprof thread, gentraceback comes across a stack
   barrier in the stack and attempts to look it up in its (zero
   length) copy of G's old stkbar slice, which causes an out-of-bounds
   access.

This commit fixes this by adding a simple cas spin to synchronize the
SIGPROF handler with stack barrier insertion.

In general I would prefer that this synchronization be done through
the G status, since that's how stack scans are otherwise synchronized,
but adding a new lock is a much smaller change and G statuses are full
of subtlety.

Fixes #11863.

Change-Id: Ie89614a6238bb9c6a5b1190499b0b48ec759eaf7
Reviewed-on: https://go-review.googlesource.com/12748
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-29 19:31:46 +00:00
Rick Hudson
e95bc5fef7 runtime: force mutator to give work buffer to GC
The scheduler, work buffer's dispose, and write barriers
can conspire to hide a pointer from the GC's concurrent
mark phase. If this pointer is the only path to a large
amount of marking, the STW mark termination phase may take
a lot of time.

Consider the following:
1) dispose places a work buffer on the partial queue
2) the GC is busy so it does not immediately remove and
   process the work buffer
3) the scheduler runs a mutator whose write barrier dequeues the
   work buffer from the partial queue so the GC won't see it
This repeats until the GC reaches the mark termination
phase where the GC finally discovers the pointer along
with a lot of work to do.

This CL fixes the problem by having the mutator
dispose of the buffer to the full queue instead of
the partial queue. Since the write barrier never asks for full
buffers the conspiracy described above is not possible.

Updates #11694.

Change-Id: I2ce832f9657a7570f800e8ce4459cd9e304ef43b
Reviewed-on: https://go-review.googlesource.com/12840
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 18:56:11 +00:00
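A sketch of the fix, assuming gcWork, workbuf, and putfull as in the runtime:

    type gcWork struct {
        wbuf *workbuf
    }

    // dispose returns the mutator's local buffer to the GC. Using the full
    // queue matters: write barriers refill only from the partial queue, so
    // work on the full queue cannot be dequeued and hidden again.
    func (w *gcWork) dispose() {
        if w.wbuf != nil {
            putfull(w.wbuf) // previously: putpartial(w.wbuf)
            w.wbuf = nil
        }
    }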
Dmitry Vyukov
0c22a74e85 runtime: fix out-of-bounds in stack debugging
Currently stackDebug=4 crashes as:

panic: runtime error: index out of range
fatal error: panic on system stack
runtime stack:
runtime.throw(0x607470, 0x15)
	src/runtime/panic.go:527 +0x96
runtime.gopanic(0x5ada00, 0xc82000a1d0)
	src/runtime/panic.go:354 +0xb9
runtime.panicindex()
	src/runtime/panic.go:12 +0x49
runtime.adjustpointers(0xc820065ac8, 0x7ffe58b56100, 0x7ffe58b56318, 0x0)
	src/runtime/stack1.go:428 +0x5fb
runtime.adjustframe(0x7ffe58b56200, 0x7ffe58b56318, 0x1)
	src/runtime/stack1.go:542 +0x780
runtime.gentraceback(0x487760, 0xc820065ac0, 0x0, 0xc820001080, 0x0, 0x0, 0x7fffffff, 0x6341b8, 0x7ffe58b56318, 0x0, ...)
	src/runtime/traceback.go:336 +0xa7e
runtime.copystack(0xc820001080, 0x1000)
	src/runtime/stack1.go:616 +0x3b1
runtime.newstack()
	src/runtime/stack1.go:801 +0xdde

Change-Id: If2d60960231480a9dbe545d87385fe650d6db808
Reviewed-on: https://go-review.googlesource.com/12763
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-28 20:11:19 +00:00
Russ Cox
7a63ab1a65 runtime: use 64k page rounding on arm64
Fixes #11886.

Change-Id: I9392fd2ef5951173ae275b3ab42db4f8bd2e1d7a
Reviewed-on: https://go-review.googlesource.com/12747
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-28 19:59:00 +00:00
David du Colombier
68117a91ae runtime: fix x86 stack trace for call to heap memory on Plan 9
Russ Cox fixed this issue for other systems
in CL 12026, but the Plan 9 part was forgotten.

Fixes #11656.

Change-Id: I91c033687987ba43d13ad8f42e3fe4c7a78e6075
Reviewed-on: https://go-review.googlesource.com/12762
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-28 19:01:41 +00:00
Ian Lance Taylor
0229317d76 runtime: don't define libc_getpid in os3_solaris.go
The function is already defined between syscall_solaris.go and
syscall2_solaris.go.

Change-Id: I034baf7c8531566bebfdbc5a4061352cbcc31449
Reviewed-on: https://go-review.googlesource.com/12773
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-28 14:07:17 +00:00
Ian Lance Taylor
deaf0333df runtime: fix definitions of getpid and kill on Solaris
A further attempt to fix raiseproc on Solaris.

Change-Id: I8d8000d6ccd0cd9f029ebe1f211b76ecee230cd0
Reviewed-on: https://go-review.googlesource.com/12771
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-28 06:21:08 +00:00
Ian Lance Taylor
d7223c6cc1 runtime: correct implementation of raiseproc on Solaris
I forgot that the libc raise function only sends the signal to the
current thread.  We need to actually use kill and getpid here, as we
do on other systems.

Change-Id: Iac34af822c93468bf68cab8879db3ee20891caaf
Reviewed-on: https://go-review.googlesource.com/12704
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-28 05:41:27 +00:00
David Crawshaw
249894ab6c runtime/cgo: remove TMPDIR logic for iOS
Seems like the simplest solution for 1.5. All the parts of the test
suite I can run on my current device (for which my exception handler
fix no longer works, apparently) pass without this code. I'll move it
into x/mobile/app.

Fixes #11884

Change-Id: I2da40c8c7b48a4c6970c4d709dd7c148a22e8727
Reviewed-on: https://go-review.googlesource.com/12721
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-27 21:28:31 +00:00
Austin Clements
c1f7a56fc0 runtime: close window that hides GC work from concurrent mark
Currently we enter mark 2 by first flushing all existing gcWork caches
and then setting gcBlackenPromptly, which disables further gcWork
caching. However, if a worker or assist pulls a work buffer in to its
gcWork cache after that cache has been flushed but before caching is
disabled, that work may remain in that cache until mark termination.
If that work represents a heap bottleneck (e.g., a single pointer that
is the only way to reach a large amount of the heap), this can force
mark termination to do a large amount of work, resulting in a long
STW.

Fix this by reversing the order of these steps: first disable caching,
then flush all existing caches.

Rick Hudson <rlh> did the hard work of tracking this down. This CL
combined with CL 12672 and CL 12646 distills the critical parts of his
fix from CL 12539.

Fixes #11694.

Change-Id: Ib10d0a21e3f6170a80727d0286f9990df049fed2
Reviewed-on: https://go-review.googlesource.com/12688
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-07-27 20:00:25 +00:00
Austin Clements
510fd1350d runtime: enable GC assists ASAP
Currently the GC coordinator enables GC assists at the same time it
enables background mark workers, after the concurrent scan phase is
done. However, this means a rapidly allocating mutator has the entire
scan phase during which to allocate beyond the heap trigger and
potentially beyond the heap goal with no back-pressure from assists.
This prevents the feedback system that's supposed to keep the heap
size under the heap goal from doing its job.

Fix this by enabling mutator assists during the scan phase. This is
safe because the write barrier is already enabled and globally
acknowledged at this point.

There's still a very small window between when the heap size reaches
the heap trigger and when the GC coordinator is able to stop the world
during which the mutator can allocate unabated. This allows *very*
rapidly allocating mutators like TestTraceStress to still occasionally
exceed the heap goal by a small amount (~20 MB at most for
TestTraceStress). However, this seems like a corner case.

Fixes #11677.

Change-Id: I0f80d949ec82341cd31ca1604a626efb7295a819
Reviewed-on: https://go-review.googlesource.com/12674
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:05 +00:00
Austin Clements
f5e67e53e7 runtime: allow GC drain whenever write barrier is enabled
Currently we hand-code a set of phases when draining is allowed.
However, this set of phases is conservative. The critical invariant is
simply that the write barrier must be enabled if we're draining.

Shortly we're going to enable mutator assists during the scan phase,
which means we may drain during the scan phase. In preparation, this
commit generalizes these assertions to check the fundamental condition
that the write barrier is enabled, rather than checking that we're in
any particular phase.

Change-Id: I0e1bec1ca823d4a697a0831ec4c50f5dd3f2a893
Reviewed-on: https://go-review.googlesource.com/12673
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:04 +00:00
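The generalized assertion reduces to the single invariant (sketch; the runtime would throw rather than panic):

    // checkDrain asserts the one condition that makes draining safe.
    func checkDrain(writeBarrierEnabled bool) {
        if !writeBarrierEnabled {
            panic("gcDrain while write barrier is off")
        }
    }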
Austin Clements
64a32ffeee runtime: don't start workers between mark 1 & 2
Currently we clear both the mark 1 and mark 2 signals at the beginning
of concurrent mark. If either if these is clear, it acts as a signal
to the scheduler that it should start background workers. However,
this means that in the interim *between* mark 1 and mark 2, the
scheduler basically loops starting up new workers only to have them
return with nothing to do. In addition to harming performance and
delaying mutator work, this approach has a race where workers started
for mark 1 can mistakenly signal mark 2, causing it to complete
prematurely. This approach also interferes with starting assists
earlier to fix #11677.

Fix this by initially setting both mark 1 and mark 2 to "signaled".
The scheduler will not start background mark workers, though assists
can still run. When we're ready to enter mark 1, we clear the mark 1
signal and wait for it. Then, when we're ready to enter mark 2, we
clear the mark 2 signal and wait for it.

This structure also lets us deal cleanly with the situation where all
work is drained *prior* to the mark 2 wait, meaning that there may be
no workers to signal completion. Currently we deal with this using a
racy (and possibly incorrect) check for work in the coordinator itself
to skip the mark 2 wait if there's no work. This change makes the
coordinator unconditionally wait for mark completion and makes the
scheduler itself signal completion by slightly extending the logic it
already has to determine that there's no work and hence no use in
starting a new worker.

This is a prerequisite to fixing the remaining component of #11677,
which will require enabling assists during the scan phase. However, we
don't want to enable background workers until the mark phase because
they will compete with the scan. This change lets us use bgMark1 and
bgMark2 to indicate when it's okay to start background workers
independent of assists.

This is also a prerequisite to fixing #11694. It significantly reduces
the occurrence of long mark termination pauses in #11694 (from 64 out
of 1000 to 2 out of 1000 in one experiment).

Coincidentally, this also reduces the final heap size (and hence run
time) of TestTraceStress from ~100 MB and ~1.9 seconds to ~14 MB and
~0.4 seconds because it significantly shortens concurrent mark
duration.

Rick Hudson <rlh> did the hard work of tracking this down.

Change-Id: I12ea9ee2db9a0ae9d3a90dde4944a75fcf408f4c
Reviewed-on: https://go-review.googlesource.com/12672
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:02 +00:00
Austin Clements
8f34b25318 runtime: retry GC assist until debt is paid off
Currently, there are three ways to satisfy a GC assist: 1) the mutator
steals credit from background GC, 2) the mutator actually does GC
work, and 3) there is no more work available. 3 was never really
intended as a way to satisfy an assist, and it causes problems: there
are periods when it's expected that the GC won't have any work, such
as when transitioning from mark 1 to mark 2 and from mark 2 to mark
termination. During these periods, there's no back-pressure on rapidly
allocating mutators, which lets them race ahead of the heap goal.

For example, test/init1.go and the runtime/trace test both have small
reachable heaps and contain loops that rapidly allocate large garbage
byte slices. This bug lets these tests exceed the heap goal by several
orders of magnitude.

Fix this by forcing the assist (and hence the allocation) to block
until it can satisfy its debt via either 1 or 2, or the GC cycle
terminates.
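
In rough sketch form, the assist now loops instead of returning early
(all names here are assumptions, not the actual assist code):

    for debt > 0 && !gcCycleDone() {
            debt -= stealBackgroundCredit(debt) // way 1
            if debt <= 0 {
                    break
            }
            debt -= doMarkWork(debt) // way 2
            if debt > 0 && noWorkAvailable() {
                    block() // wait rather than race past the heap goal
            }
    }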

This fixes one of the causes of #11677. It's still possible to overshoot
the GC heap goal, but with this change the overshoot is almost exactly
by the amount of allocation that happens during the concurrent scan
phase, between when the heap passes the GC trigger and when the GC
enables assists.

Change-Id: I5ef4edcb0d2e13a1e432e66e8245f2bd9f8995be
Reviewed-on: https://go-review.googlesource.com/12671
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:01 +00:00
Austin Clements
500c88d40d runtime: yield to GC coordinator after assist completion
Currently it's possible for the GC assist to signal completion of the
mark phase, which puts the GC coordinator goroutine on the current P's
run queue, and then return to mutator code that delays until the next
forced preemption before actually yielding control to the GC
coordinator, dragging out completion of the mark phase. This delay can
be further exacerbated if the mutator makes other goroutines runnable
before yielding control, since this will push the GC coordinator on
the back of the P's run queue.

To fix this, this commit adds a Gosched to the assist if it completed
the mark phase. This immediately and directly yields control to the GC
coordinator. This already happens implicitly in the background mark
workers because they park immediately after completing the mark.

This is one of the reasons completion of the mark phase is being
dragged out and allowing the mutator to allocate without assisting,
leading to the large heap goal overshoot in issue #11677. This is also
a prerequisite to making the assist block when it can't pay off its
debt.

Change-Id: I586adfbecb3ca042a37966752c1dc757f5c7fc78
Reviewed-on: https://go-review.googlesource.com/12670
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:00 +00:00
Austin Clements
4f188c2d1c runtime: disallow GC assists in non-preemptible contexts
Currently it's possible to perform GC work on a system stack or when
locks are held if there's an allocation that triggers an assist. This
is generally a bad idea because of the fragility of these contexts,
and it's incompatible with two changes we're about to make: one is to
yield after signaling mark completion (which we can't do from a
non-preemptible context) and the other is to make assists block if
there's no other way for them to pay off the assist debt.

This commit simply skips the assist if it's called from a
non-preemptible context. The allocation will still count toward the
assist debt, so it will be paid off by a later assist. There should be
little allocation from non-preemptible contexts, so this shouldn't
harm the overall assist mechanism.
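
A sketch of the guard (the g and m fields named here are assumptions
about how the runtime marks non-preemptible contexts):

    gp := getg()
    if gp == gp.m.g0 || gp.m.locks > 0 || gp.m.preemptoff != "" {
            // Non-preemptible context: skip the assist. The allocation
            // still counts toward the debt, so a later assist pays it off.
            return
    }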

Change-Id: I7bf0e6c73e659fe6b52f27437abf39d76b245c79
Reviewed-on: https://go-review.googlesource.com/12649
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:59 +00:00
Austin Clements
dff9108d98 runtime: make notetsleep_internal nowritebarrier
When notetsleep_internal is called from notetsleepg, notetsleepg has
just given up the P, so write barriers are not allowed in
notetsleep_internal.

Change-Id: I1b214fa388b1ea05b8ce2dcfe1c0074c0a3c8870
Reviewed-on: https://go-review.googlesource.com/12647
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:58 +00:00
Austin Clements
cf225a1748 runtime: fix mark 2 completion in fractional/idle workers
Currently fractional and idle mark workers dispose of their gcWork
cache during mark 2 after incrementing work.nwait and after checking
whether there are any workers or any work available. This creates a
window for two races:

1) If the only remaining work is in this worker's gcWork cache, it
   will see that there are no more workers and no more work on the
   global lists (since it has not yet flushed its own cache) and
   prematurely signal mark 2 completion.

2) After this worker has incremented work.nwait but before it has
   flushed its cache, another worker may observe that there are no
   more workers and no more work and prematurely signal mark 2
   completion.

We can fix both of these by simply moving the cache flush above the
increment of nwait and the test of the completion condition.

This is probably contributing to #11694, though this alone is not
enough to fix it.

Change-Id: Idcf9656e5c460c5ea0d23c19c6c51e951f7716c3
Reviewed-on: https://go-review.googlesource.com/12646
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:56 +00:00
Austin Clements
b8526a8380 runtime: steal the correct amount of GC assist credit
GC assists are supposed to steal at most the amount of background GC
credit available so that background GC credit doesn't go negative.
However, they are instead stealing the *total* amount of their debt
but only claiming up to the amount of credit that was available. This
results in draining the background GC credit pool too quickly, which
results in unnecessary assist work.

The fix is trivial: steal the amount of work we meant to steal (which
is already computed).
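
In sketch form (variable names assumed):

    stolen := debt
    if stolen > bgCredit {
            stolen = bgCredit // never take more than is available
    }
    bgCredit -= stolen // previously this subtracted all of debt,
    debt -= stolen     // letting the credit pool go negative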

Change-Id: I837fe60ed515ba91c6baf363248069734a7895ef
Reviewed-on: https://go-review.googlesource.com/12643
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:54 +00:00
Austin Clements
4c9464525e runtime: document gctrace format
Fixes #10348.

Change-Id: I3eea9738e3f6fdc1998d04a601dc9b556dd2db72
Reviewed-on: https://go-review.googlesource.com/12453
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 17:45:34 +00:00
Austin Clements
7eeeae2a5c runtime: always report starting heap size in gctrace
Currently the gctrace output reports the trigger heap size, rather
than the actual heap size at the beginning of GC. Often these are the
same, or at least very close. However, it's possible for the heap to
already have exceeded this trigger when we first check the trigger and
start GC; in this case, this output is very misleading. We've
encountered this confusion a few times when debugging and this
behavior is difficult to document succinctly.

Change the gctrace output to report the actual heap size when GC
starts, rather than the trigger.

Change-Id: I246b3ccae4c4c7ea44c012e70d24a46878d7601f
Reviewed-on: https://go-review.googlesource.com/12452
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 17:45:28 +00:00
Austin Clements
cc6ed285e5 runtime: remove # from gctrace line
Whenever someone pastes gctrace output into GitHub, it helpfully turns
the GC cycle number into a link to some unrelated issue. Prevent this
by removing the pound before the cycle number. The fact that this is a
cycle number is probably more obvious at a glance than most of the
other numbers.

Change-Id: Ifa5fc7fe6c715eac50e639f25bc36c81a132ffea
Reviewed-on: https://go-review.googlesource.com/12413
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 17:45:22 +00:00
Ian Lance Taylor
f0876a1a94 runtime: log all thread stack traces during GODEBUG=crash on Unix
This extends https://golang.org/cl/2811, which only applied to Darwin
and GNU/Linux, to all Unix systems.

Fixes #9591.

Change-Id: Iec3fb438564ba2924b15b447c0480f87c0bfd009
Reviewed-on: https://go-review.googlesource.com/12661
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 16:58:53 +00:00
Russ Cox
6b8762104a runtime/pprof: document content of heap profile
Fixes #11343.

Change-Id: I46efc24b687b9d060ad864fbb238c74544348e38
Reviewed-on: https://go-review.googlesource.com/12556
Reviewed-by: Rob Pike <r@golang.org>
2015-07-27 16:30:27 +00:00
Russ Cox
f6fb549d22 runtime/cgo: move TMPDIR magic out of os
It's not clear this really belongs anywhere at all,
but this is a better place for it than package os.
This way package os can avoid importing "C".

Fixes #10455.

Change-Id: Ibe321a93bf26f478951c3a067d75e22f3d967eb7
Reviewed-on: https://go-review.googlesource.com/12574
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-07-27 16:05:42 +00:00
Michael Hudson-Doyle
2b0ddb6c23 runtime: pass a smaller buffer to sched_getaffinity on ARM
The system stack is only around 8kb on ARM so one can't put an 8kb buffer on
the stack. More than 1024 ARM cores seems sufficiently unlikely for the
foreseeable future.
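
A sketch of the smaller mask (the constant name and exact signature
are assumptions; sched_getaffinity fills in one bit per CPU):

    const maxCPUs = 1024
    var buf [maxCPUs / 8]byte // 128 bytes instead of 8 kB
    r := sched_getaffinity(0, unsafe.Sizeof(buf), &buf[0])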

Fixes #11853

Change-Id: I7cb27c1250a6153f86e269c172054e9dfc218c72
Reviewed-on: https://go-review.googlesource.com/12622
Reviewed-by: Austin Clements <austin@google.com>
2015-07-27 01:04:10 +00:00
Ian Lance Taylor
eb248c4df2 runtime: require gdb version 7.9 for gdb test
Issue 11214 reports problems with older versions of gdb.  It does work
with gdb 7.9 on my Ubuntu Trusty system, so take that as the minimum
required version.

Fixes #11214.

Change-Id: I61b732895506575be7af595f81fc1bcf696f58c2
Reviewed-on: https://go-review.googlesource.com/12626
Reviewed-by: Austin Clements <austin@google.com>
2015-07-24 17:15:44 +00:00
Ian Lance Taylor
d9ee9a0f6e runtime: fix runtime·raise for dragonfly amd64
Fixes #11847.

Change-Id: I21736a4c6f6fb2f61aec1396ce2c965e3e329e92
Reviewed-on: https://go-review.googlesource.com/12621
Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>
2015-07-24 05:16:19 +00:00
Russ Cox
74ec5bf2d8 runtime: make pcln table check not trigger next to foreign code
Foreign code can be arbitrarily aligned,
so the function before it can have
arbitrarily much padding.
We can't call pcvalue on values in the padding.

Fixes #11653.

Change-Id: I7d57f813ae5a2409d1520fcc909af3eeef2da131
Reviewed-on: https://go-review.googlesource.com/12550
Reviewed-by: Rob Pike <r@golang.org>
2015-07-23 14:14:22 +00:00
Russ Cox
7334cb3a6f runtime/trace: fix TestTraceSymbolize networking
We use 127.0.0.1 instead of localhost in Go networking tests.
The reporter of #11774 has localhost defined to be 120.192.83.162,
for reasons unknown.

Also, if TestTraceSymbolize calls Fatalf (for example because Listen
fails) then we need to stop the trace for future tests to work.
See failure log in #11774.

Fixes #11774.

Change-Id: Iceddb03a72d31e967acd2d559ecb78051f9c14b7
Reviewed-on: https://go-review.googlesource.com/12521
Reviewed-by: Rob Pike <r@golang.org>
2015-07-23 05:37:15 +00:00
Russ Cox
77d38d9cbe runtime: handle linux CPU masks up to 64k CPUs
Fixes #11823.

Change-Id: Ic949ccb9657478f8ca34fdf1a6fe88f57db69f24
Reviewed-on: https://go-review.googlesource.com/12535
Reviewed-by: Austin Clements <austin@google.com>
2015-07-22 20:53:01 +00:00
Russ Cox
75d779566b runtime/cgo: make compatible with race detector
Some routines run without an m or g and cannot invoke the
race detector runtime. They must be opaque to the runtime.
That used to be true because they were written in C.
Now that they are written in Go, disable the race detector
annotations for those functions explicitly.

Add test.

Fixes #10874.

Change-Id: Ia8cc28d51e7051528f9f9594b75634e6bb66a785
Reviewed-on: https://go-review.googlesource.com/12534
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-22 20:28:47 +00:00
Russ Cox
3b26e8b29a runtime/pprof: ignore too few samples on Windows test
Fixes #10842.

Change-Id: I7de98f3073a47911863a252b7a74d8fdaa48c86f
Reviewed-on: https://go-review.googlesource.com/12529
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-22 20:26:37 +00:00
Ian Lance Taylor
872b168fe3 runtime: if we don't handle a signal on a non-Go thread, raise it
In the past badsignal would crash the program.  In
https://golang.org/cl/10757044 badsignal was changed to call sigsend,
to fix issue #3250.  The effect of this was that when a non-Go thread
received a signal, and os/signal.Notify was not being used to check
for occurrences of the signal, the signal was ignored.

This changes the code so that if os/signal.Notify is not being used,
then the signal handler is reset to what it was, and the signal is
raised again.  This lets non-Go threads handle the signal as they
wish.  In particular, it means that a segmentation violation in a
non-Go thread will ordinarily crash the process, as it should.
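
Conceptually (helper names assumed, not the actual code):

    if !sigsend(sig) {
            // No os/signal.Notify listener: restore the previous
            // handler and re-raise so the signal takes its normal
            // course in the foreign code.
            restoreOriginalHandler(sig)
            raise(sig)
    }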

Fixes #10139.
Update #11794.

Change-Id: I2109444aaada9d963ad03b1d071ec667760515e5
Reviewed-on: https://go-review.googlesource.com/12503
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2015-07-22 20:26:29 +00:00
Russ Cox
4a4eba9f37 runtime: disable TestGoroutineParallelism on uniprocessor
It's a bad test and it's worst on uniprocessors.

Fixes #11143.

Change-Id: I0164231ada294788d7eec251a2fc33e02a26c13b
Reviewed-on: https://go-review.googlesource.com/12522
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-22 18:53:12 +00:00
Austin Clements
58f3a82950 runtime: fix comments referring to trace functions in runtime/pprof
ae1ea2a moved trace-related functions from runtime/pprof to
runtime/trace, but missed a doc comment and a code comment. Update
these to reflect the move.

Change-Id: I6e1e8861e5ede465c08a2e3f80b976145a8b32d8
Reviewed-on: https://go-review.googlesource.com/12525
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-07-22 18:33:38 +00:00
Dmitry Vyukov
ae1ea2aa94 runtime/trace: add new package
Move tracing functions from runtime/pprof to the new runtime/trace package.

Fixes #9710

Change-Id: I718bcb2ae3e5959d9f72cab5e6708289e5c8ebd5
Reviewed-on: https://go-review.googlesource.com/12511
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-22 15:47:16 +00:00
Michael Hudson-Doyle
1125cd4997 cmd/compile: define func value symbols at declaration
This is mostly Russ's https://golang.org/cl/12145 but with some extra fixes to
account for the fact that function declarations without implementations now
break shared libraries, and including my test case.

Fixes #11480.

Change-Id: Iabdc2934a0378e5025e4e7affadb535eaef2c8f1
Reviewed-on: https://go-review.googlesource.com/12340
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-20 00:50:46 +00:00
Austin Clements
1942e3814b runtime: clarify runtime.GC blocking behavior
The runtime.GC documentation was rewritten in df2809f to make it clear
that it blocks until GC is complete, but the re-rewrite in ed9a4c9 and
e28a679 lost this property when clarifying that it may also block the
entire program and not just the caller.

Try to arrive at wording that conveys both of these properties.

Change-Id: I1e255322aa28a21a548556ecf2a44d8d8ac524ef
Reviewed-on: https://go-review.googlesource.com/12392
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-19 15:10:06 +00:00
Ian Lance Taylor
692054e76e runtime: check for findmoduledatap returning nil
The findmoduledatap function will not return nil in ordinary use, but
check for nil to try to avoid crashing when we are already crashing.

Update #11783.

Change-Id: If7b1adb51efab13b4c1a37b6f3c9ad22641a0b56
Reviewed-on: https://go-review.googlesource.com/12391
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-18 21:26:59 +00:00
Alex Brainman
4a0d9587f2 runtime: skip TestReturnAfterStackGrowInCallback if gcc is not found
Fixes #11754

Change-Id: Ifa423ca6eea46d1500278db290498724a9559d14
Reviewed-on: https://go-review.googlesource.com/12347
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-18 01:29:09 +00:00
Rob Pike
e28a679216 runtime: make the GC message less committal.
We shouldn't guarantee this behavior, but suggest it's possible.

Change-Id: I4c2afb48b99be4d91537306d3337171a13c9990a
Reviewed-on: https://go-review.googlesource.com/12346
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-18 00:28:50 +00:00
Rob Pike
ed9a4c91c2 runtime: document that GC blocks the whole program
No code changes. Just make it clear that runtime.GC is not concurrent.

Change-Id: I00a99ebd26402817c665c9a128978cef19f037be
Reviewed-on: https://go-review.googlesource.com/12345
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-17 22:40:21 +00:00
Austin Clements
e33d6b3d4d runtime: remove out-of-date comment
An out-of-date comment snuck in to cc8f544. Remove it.

Change-Id: I5bc7c17e737d1cabe57b88de06d7579c60ca28ff
Reviewed-on: https://go-review.googlesource.com/12328
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2015-07-17 16:52:32 +00:00
Austin Clements
cc8f544198 runtime: don't free large spans until heapBitsSweepSpan returns
This fixes a race between 1) sweeping and freeing an unmarked large
span and 2) reusing that span and allocating from it. This race arises
because mSpan_Sweep returns spans for large objects to the heap
*before* heapBitsSweepSpan clears the mark bit on the object in the
span.

Specifically, the following sequence of events can lead to an
incorrectly zeroed bitmap byte, which causes the garbage collector to
not trace any pointers in that object (the pointer bits for the first
four words are cleared, and the scan bits are also cleared, so it
looks like a no-scan object).

1) P0 calls mSpan_Sweep on a large span S0 with an unmarked object on it.

2) mSpan_Sweep calls heapBitsSweepSpan, which invokes the callback for
   the one (unmarked) object on the span.

3) The callback calls mHeap_Free, which makes span S0 available for
   allocation, but this is too early.

4) P1 grabs this S0 from the heap to use for allocation.

5) P1 allocates an object on this span and writes that object's type
   bits to the bitmap.

6) P0 returns from the callback to heapBitsSweepSpan.
   heapBitsSweepSpan clears the byte containing the mark, even though
   this span is now owned by P1 and this byte contains important
   bitmap information.

This fixes this problem by simply delaying the mHeap_Free until after
the heapBitsSweepSpan. I think the overall logic of mSpan_Sweep could
be simplified now, but this seems like the minimal change.

Fixes #11617.

Change-Id: I6b1382c7e7cc35f81984467c0772fe9848b7522a
Reviewed-on: https://go-review.googlesource.com/12320
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-17 03:34:11 +00:00
Russ Cox
a93e5b4ff9 Revert "runtime: diagnose invalid pointers during GC"
Broke arm64. Update #9880.

This reverts commit 38d9b2a3a9.

Change-Id: I35fa21005af2183828a9d8b195ebcfbe45ec5138
Reviewed-on: https://go-review.googlesource.com/12247
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-16 01:49:58 +00:00
Austin Clements
e42413cecc runtime: fix saved PC/SP after safe-point function in syscall
Running a safe-point function on syscall entry uses systemstack() and
hence clobbers g.sched.pc and g.sched.sp. Fix this by re-saving them
after the systemstack, just like in the other uses of systemstack in
reentersyscall.
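
The pattern, sketched (save is the helper that records g.sched.pc/sp;
details assumed):

    save(pc, sp)    // record the syscall PC/SP
    systemstack(fn) // clobbers g.sched.pc and g.sched.sp
    save(pc, sp)    // re-save, as the other systemstack uses do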

Change-Id: I47868a53eba24d81919fda56ef6bbcf72f1f922e
Reviewed-on: https://go-review.googlesource.com/12125
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-15 21:09:16 +00:00
Austin Clements
edfc979725 runtime: run safe-point function before entering _Psyscall
Currently, we run a P's safe-point function immediately after entering
_Psyscall state. This is unsafe, since as soon as we put the P in
_Psyscall, we no longer control the P and another M may claim it.
We'll still run the safe-point function only once (because doing so
races on an atomic), but the P may no longer be at a safe-point when
we do so.

In particular, this means that the use of forEachP to dispose all P's
gcw caches is unsafe. A P may enter a syscall, run the safe-point
function, and dispose the P's gcw cache concurrently with another M
claiming the P and attempting to use its gcw cache. If this happens,
we may empty the gcw's workbuf after putting it on
work.{full,partial}, or add pointers to it after putting it in
work.empty. This will cause an assertion failure when we later pop the
workbuf from the list and its object count is inconsistent with the
list we got it from.

Fix this by running the safe-point function just before putting the P
in _Psyscall.
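
The ordering change, in sketch form (names assumed):

    if p.runSafePointFn != 0 {
            systemstack(runSafePointFn) // while this M still owns the P
    }
    atomicstore(&p.status, _Psyscall) // only now may another M claim it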

Related to #11640. This probably fixes this issue, but while I'm able
to show that we can enter a bad safe-point state as a result of this,
I can't reproduce that specific failure.

Change-Id: I6989c8ca7ef2a4a941ae1931e9a0748cbbb59434
Reviewed-on: https://go-review.googlesource.com/12124
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-15 21:09:07 +00:00
Matthew Dempsky
64e53337af runtime: fix go:nowritebarrier annotation on gcmarkwb_m
Change-Id: I945d46d3bb63f1992bce0d0b1e89e75cac9bbd54
Reviewed-on: https://go-review.googlesource.com/12271
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-07-15 21:06:13 +00:00
Russ Cox
38d9b2a3a9 runtime: diagnose invalid pointers during GC
For #9880. Let's see what breaks.

Change-Id: Ic8b99a604e60177a448af5f7173595feed607875
Reviewed-on: https://go-review.googlesource.com/10818
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2015-07-15 05:42:06 +00:00
Russ Cox
3290e9c145 runtime: fix build on non-x86 machines
Fixes #11656 (again).

Change-Id: I170ff10bfbdb0f34e57c11de42b6ee5291837813
Reviewed-on: https://go-review.googlesource.com/12142
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-14 04:42:12 +00:00
Austin Clements
777ab5ce1a runtime: fix MemStats.{PauseNS,PauseEnd,PauseTotalNS,LastGC}
These memstats are currently being computed by gcMark, which was
appropriate in Go 1.4, but gcMark is now just one part of a bigger
picture. In particular, it can't account for the sweep termination
pause time, it can't account for all of the mark termination pause
time, and the reported "pause end" and "last GC" times will be
slightly earlier than they really are.

Lift computing of these statistics into func gc, which has the
appropriate visibility into the process to compute them correctly.

Fixes one of the issues in #10323. This does not add new statistics
appropriate to the concurrent collector; it simply fixes existing
statistics that are being misreported.

Change-Id: I670cb16594a8641f6b27acf4472db15b6e8e086e
Reviewed-on: https://go-review.googlesource.com/11794
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-13 23:32:59 +00:00
Austin Clements
ad60cd8b92 runtime: report MemStats.PauseEnd in UNIX time
Currently we report MemStats.PauseEnd in nanoseconds, but with no
particular 0 time. On Linux, the 0 time is when the host started. On
Darwin, it's the UNIX epoch. This is also inconsistent with the other
absolute time in MemStats, LastGC, which is always reported in
nanoseconds since 1970.

Fix PauseEnd so it's always reported in nanoseconds since 1970, like
LastGC.

Fixes one of the issues raised in #10323.

Change-Id: Ie2fe3169d45113992363a03b764f4e6c47e5c6a8
Reviewed-on: https://go-review.googlesource.com/11801
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-13 23:32:02 +00:00
Russ Cox
0bcdffeea6 runtime: fix x86 stack trace for call to heap memory
Fixes #11656.

Change-Id: Ib81d583e4b004e67dc9d2f898fd798112434e7a9
Reviewed-on: https://go-review.googlesource.com/12026
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
2015-07-13 19:42:35 +00:00
Russ Cox
683311175c runtime: fix race in TestChanSendBarrier
Fixes race detector build.

Change-Id: I8bdc78d57487580e6b5b8c415df4653a1ba69e37
Reviewed-on: https://go-review.googlesource.com/12087
Reviewed-by: Austin Clements <austin@google.com>
2015-07-13 19:42:20 +00:00
Russ Cox
8c3533c89b runtime: add memory barrier for sync send in select
Missed select case when adding the barrier last time.
All the more reason to refactor this code in Go 1.6.

Fixes #11643.

Change-Id: Ib0d19d6e0939296c0a3e06dda5e9b76f813bbc7e
Reviewed-on: https://go-review.googlesource.com/12086
Reviewed-by: Austin Clements <austin@google.com>
2015-07-13 19:10:22 +00:00
Brad Fitzpatrick
2ae77376f7 all: link to https instead of http
The one in misc/makerelease/makerelease.go is particularly bad and
probably warrants rotating our keys.

I didn't update old weekly notes, and reverted some changes involving
test code for now, since we're late in the Go 1.5 freeze. Otherwise,
the rest are all auto-generated changes, and all manually reviewed.

Change-Id: Ia2753576ab5d64826a167d259f48a2f50508792d
Reviewed-on: https://go-review.googlesource.com/12048
Reviewed-by: Rob Pike <r@golang.org>
2015-07-11 14:36:33 +00:00
Elias Naur
b3a8b0574a runtime: abort on fatal errors and panics in c-shared and c-archive modes
The default behaviour for fatal errors and runtime panics is to dump
the goroutine stack traces and exit with code 2. However, when the process is
owned by foreign code, it is surprising and inappropriate to suddenly exit
the whole process, even on fatal errors. Instead, re-use the crash behaviour
from GOTRACEBACK=crash and abort.

The motivating use case is issue #11382, where an Android crash reporter
is confused by an exiting process, but I believe the aborting behaviour
is appropriate for all cases where Go does not own the process.

The change is simple and contained and will enable reliable crash reporting
for Android apps in Go 1.5, but I'll leave it to others to judge whether it
is too late for Go 1.5.

Fixes #11382

Change-Id: I477328e1092f483591c99da1fbb8bc4411911785
Reviewed-on: https://go-review.googlesource.com/12032
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-07-11 11:39:05 +00:00
Alex Brainman
d5004ee69e runtime: use AddVectoredContinueHandler on Windows XP amd64
Recent change (CL 10370) unexpectedly broke TestRaiseException on
Windows XP amd64. I still do not know why. But reverting old
CL 8165 fixes the problem.

This effectively makes Windows XP amd64 use AddVectoredContinueHandler
instead of SetUnhandledExceptionFilter for exception handling. That is
what we do for all recent Windows versions too.

Fixes #11481

Change-Id: If2e8037711f05bf97e3c69f5a8d86af67c58f6fc
Reviewed-on: https://go-review.googlesource.com/11888
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Daniel Theophanes <kardianos@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-11 07:02:57 +00:00
Ian Lance Taylor
6a90b1d621 runtime, cmd/go: fix tests to work when GOROOT_FINAL is set
When GOROOT_FINAL is set when running all.bash, the tests are run
before the files are copied to GOROOT_FINAL.  The tests are run with
GOROOT set, so most work fine.  This fixes two cases that do not.

In cmd/go/go_test.go we were explicitly removing GOROOT from the
environment, causing tests that did not themselves explicitly set
GOROOT to fail.  There was no need to explicitly remove GOROOT, so
don't do it.  If people choose to run "go test cmd/go" with a bad
GOROOT, that is their own lookout.

In the runtime GDB test, the linker has told gdb to find the support
script in GOROOT_FINAL, which will fail.  Check for that case, and
skip the test when we see it.

Fixes #11652.

Change-Id: I4d3a32311e3973c30fd8a79551aaeab6789d0451
Reviewed-on: https://go-review.googlesource.com/12021
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-10 21:29:37 +00:00
Ian Lance Taylor
2de67e9974 runtime: clarify that NumCPU returns only available CPUs
Update #11609.

Change-Id: Ie363facf13f5e62f1af4a8bdc42a18fb36e16ebf
Reviewed-on: https://go-review.googlesource.com/12022
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-10 21:28:49 +00:00
Austin Clements
4b2774f5ea runtime: make sysmon-triggered GC concurrent
sysmon triggers a GC if there has been no GC for two minutes.
Currently, this is a STW GC. There is no reason for this to be STW, so
make it concurrent.

Fixes #10261.

Change-Id: I92f3ac37272d5c2a31480ff1fa897ebad08775a9
Reviewed-on: https://go-review.googlesource.com/11955
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-09 05:53:21 +00:00
David Chase
7929a0ddfa cmd/compile: initialize line number properly for temporaries
The expansion of structure, array, slice, and map literals
does not use the right line number in its introduced assignments
to temporaries, which leads to incorrect line number attribution
for expressions in those literals.

Inlining also incorrectly replaced the line numbers of args to
inlined functions.

This was revealed in CL 9721 because a now-avoided temporary
assignment introduced the correct line number.
I.e. before CL 9721
  "tmp_wrongline := expr"
was transformed to
  "tmp_rightline := expr; tmp_wrongline := tmp_rightline"

Also includes a repair to CL 10334 involving line numbers
where a spurious -1 remained (should have been 0, now is 0).

Fixes #11400.

Change-Id: I3a4687efe463977fa1e2c996606f4d91aaf22722
Reviewed-on: https://go-review.googlesource.com/11730
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Sameer Ajmani <sameer@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-07 21:30:59 +00:00
Russ Cox
2028077899 runtime: randomize scheduling in -race mode
Basic randomization of goroutine scheduling for -race mode.
It is probably possible to do much better (there's a paper linked
in the issue that I haven't read, for example), but this suffices
to introduce at least some unpredictability into the scheduling order.
The goal here is to have _something_ for Go 1.5, so that we don't
start hitting more of these scheduling order-dependent bugs
if we change the scheduler order again in Go 1.6.

For #11372.

Change-Id: Idf1154123fbd5b7a1ee4d339e93f97635cc2bacb
Reviewed-on: https://go-review.googlesource.com/11795
Reviewed-by: Austin Clements <austin@google.com>
2015-07-07 21:27:38 +00:00
Russ Cox
3b6e86f48a cmd/compile: fix race detector handling of OBLOCK nodes
Fixes #7561 correctly.
Fixes #9137.

Change-Id: I7f27e199d7101b785a7645f789e8fe41a405a86f
Reviewed-on: https://go-review.googlesource.com/11713
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-06-30 19:25:18 +00:00
Russ Cox
8b99bb7b8c runtime: fix broken arm builds
Change-Id: I08de33aacb3fc932722286d69b1dd70ffe787c89
Reviewed-on: https://go-review.googlesource.com/11697
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 17:33:23 +00:00
Russ Cox
434e0bc0a0 cmd/link: record missing pcdata tables correctly
The old code was recording the current table output offset,
so the table from the next function would be used instead of
the runtime realizing that there was no table at all.

Add debug constant in runtime to check this for every function
at startup. It's too expensive to do that by default, but we can
do the last five functions. The end of the table is usually where
the C symbols end up, so that's where the problems typically are.

Fixes #10747.
Fixes #11396.

Change-Id: I13592e78017969fc22979fa902e19e1b151d41b1
Reviewed-on: https://go-review.googlesource.com/11657
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
2015-06-29 16:07:14 +00:00
Austin Clements
1b917484a8 runtime: reset mark state before checkmark and gctrace=2 mark
Currently we fail to reset the live heap accounting state before the
checkmark mark and before the gctrace=2 extra mark. As a result, if
either are enabled, at the end of GC it thinks there are 0 bytes of
live heap, which causes the GC controller to initiate a new GC
immediately, regardless of the true heap size.

Fix this by factoring this state reset into a function and calling it
before all three possible marks.

This function should be merged with gcResetGState, but doing so
requires some additional cleanup, so it will wait for after the
freeze. Filed #11427 for this cleanup.

Fixes #10492.

Change-Id: Ibe46348916fc8368fac6f086e142815c970a6f4d
Reviewed-on: https://go-review.googlesource.com/11561
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 15:58:29 +00:00
Austin Clements
d57056ba26 runtime: don't free stack spans during GC
Memory for stacks is manually managed by the runtime and, currently
(with one exception) we free stack spans immediately when the last
stack on a span is freed. However, the garbage collector assumes that
spans can never transition from non-free to free during scan or mark.
This disagreement makes it possible for the garbage collector to mark
uninitialized objects and is blocking us from re-enabling the bad
pointer test in the garbage collector (issue #9880).

For example, the following sequence will result in marking an
uninitialized object:

1. scanobject loads a pointer slot out of the object it's scanning.
   This happens to be one of the special pointers from the heap into a
   stack. Call the pointer p and suppose it points into X's stack.

2. X, running on another thread, grows its stack and frees its old
   stack.

3. The old stack happens to be large or was the last stack in its
   span, so X frees this span, setting it to state _MSpanFree.

4. The span gets reused as a heap span.

5. scanobject calls heapBitsForObject, which loads the span containing
   p, which is now in state _MSpanInUse, but doesn't necessarily have
   an object at p. The not-object at p gets marked, and at this point
   all sorts of things can go wrong.

We already have a partial solution to this. When shrinking a stack, we
put the old stack on a queue to be freed at the end of garbage
collection. This was done to address exactly this problem, but wasn't
a complete solution.

This commit generalizes this solution to both shrinking and growing
stacks. For stacks that fit in the stack pool, we simply don't free
the span, even if its reference count reaches zero. It's fine to reuse
the span for other stacks, and this enables that. At the end of GC, we
sweep for cached stack spans with a zero reference count and free
them. For larger stacks, we simply queue the stack span to be freed at
the end of GC. Ideally, we would reuse these large stack spans the way
we can small stack spans, but that's a more invasive change that will
have to wait until after the freeze.

Fixes #11267.

Change-Id: Ib7f2c5da4845cc0268e8dc098b08465116972a71
Reviewed-on: https://go-review.googlesource.com/11502
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 15:33:40 +00:00
Austin Clements
f73b2fca84 runtime: remove unused _GCsweep state
We don't use this state. _GCoff means we're sweeping in the
background. This makes it clear in the next commit that _GCoff and
only _GCoff means sweeping.

Change-Id: I416324a829ba0be3794a6cf3cf1655114cb6e47c
Reviewed-on: https://go-review.googlesource.com/11501
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 15:33:31 +00:00
Austin Clements
840965f8d7 runtime: always clear stack barriers on G exit
Currently the runtime fails to clear a G's stack barriers in gfput if
the G's stack allocation is _FixedStack bytes. This causes the runtime
to panic if the following sequence of events happens:

1) The runtime installs stack barriers on a G.

2) The G exits by calling runtime.Goexit. Since this does not
   necessarily return through the stack barriers installed on the G,
   there may still be untriggered stack barriers left on the G's stack,
   recorded in g.stkbar.

3) The runtime calls gfput to add the exiting G to the free pool. If
   the G's stack allocation is _FixedStack bytes, we fail to clear
   g.stkbar.

4) A new G starts and allocates the G that was just added to the free
   pool.

5) The new G begins to execute and overwrites the stack slots that had
   stack barriers in them.

6) The garbage collector enters mark termination, attempts to remove
   stack barriers from the new G, and finds that they've been
   overwritten.

Fix this by clearing the stack barriers in gfput in the case where it
reuses the stack.
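
A sketch of the fix (stkbar and stkbarPos are the stack barrier
bookkeeping fields; the exact code is assumed):

    if stksize == _FixedStack {
            // Reusing the stack: drop any recorded stack barriers too.
            gp.stkbar = gp.stkbar[:0]
            gp.stkbarPos = 0
    }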

Fixes #11256.

Change-Id: I377c44258900e6bcc2d4b3451845814a8eeb2bcf
Reviewed-on: https://go-review.googlesource.com/11461
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 15:02:30 +00:00
Alex Brainman
85d4d46f3c runtime: store syscall parameters in m not on stack
The stack can move during a callback, so the libcall struct cannot be
stored on the stack. asmstdcall updates return values and errno in the
libcall struct parameter, but these could be at a different location
when the callback returns.
Store these in m, so they are not affected by GC.

Fixes #10406

Change-Id: Id01c9d2b4b44530494e6d9e9e1c875261ce477cd
Reviewed-on: https://go-review.googlesource.com/10370
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 02:45:45 +00:00
Austin Clements
d231cb8249 runtime: repeat bitmap for slice of GCprog n-1 times, not n times
Currently, to write out the bitmap of a slice of a type with a GCprog,
we construct a new GCprog that executes the underlying type's GCprog
to write out the bitmap once and then repeats those bits n more times.
This results in n+1 repetitions of the bitmap, which is one more
repetition than it should be. This corrupts the bitmap of the heap
following the slice and may write past the mapped bitmap memory and
segfault.

Fix this by repeating the bitmap only n-1 more times.
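
In pseudocode (emit and repeat are placeholders for the GCprog
operations):

    // For a slice of n elements whose element type has a GCprog:
    emit(elemProg)        // lays down the element bitmap once
    repeat(elemBits, n-1) // was n, yielding n+1 copies total and
                          // overrunning the bitmap that follows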

Fixes #11430.

Change-Id: Ic24854363bffc5a755b66f257339f9309ada3aa5
Reviewed-on: https://go-review.googlesource.com/11570
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-26 21:52:51 +00:00
Dmitry Vyukov
77132c810d runtime/race: enable tests that now pass
These tests pass after cl/11417.

Change-Id: Id98088c52e564208ce432e9717eddd672c42c66d
Reviewed-on: https://go-review.googlesource.com/11551
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-26 18:54:11 +00:00
Shenghou Ma
21a4c93166 runtime: slightly clean up softfloat code
Removes the remains of the old C-based stepflt implementation.
Also removes goto usage.

Change-Id: Ida4742c49000fae4fea4649f28afde630ce4c577
Reviewed-on: https://go-review.googlesource.com/9600
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-26 17:51:22 +00:00
Russ Cox
32fddadd98 runtime: reduce slice growth during append to 2x
The new inlined code for append assumed that it could pass the
desired new cap to growslice, not the number of new elements.
But growslice still interpreted the argument as the number of new elements,
making it always grow by >2x (more precisely, 2x+1 rounded up
to the next malloc block size). At the time, I had intended to change
the other callers to use the new cap as well, but it's too late for that.
Instead, introduce growslice_n for the old callers and keep growslice
for the inlined (common case) caller.
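
A sketch of the split (signatures assumed from the description above):

    // growslice_n keeps the old contract for the remaining callers:
    // n is the number of new elements, not the desired new cap.
    func growslice_n(t *slicetype, old slice, n int) slice {
            return growslice(t, old, old.cap+n)
    }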

Fixes #11403.

Filed #11419 to merge them.

Change-Id: I1338b1e5b352f3be4e43641f44b652ef7195251b
Reviewed-on: https://go-review.googlesource.com/11541
Reviewed-by: Austin Clements <austin@google.com>
2015-06-26 17:49:33 +00:00
Dmitry Vyukov
cd0a8ed48a cmd/compile: add instrumentation of OKEY
Instrument operands of OKEY.
Also instrument OSLICESTR. Previously it was not needed
because of preceding bounds checks (which were instrumented).
But the preceding bounds checks have disappeared.

Change-Id: I3b0de213e23cbcf5b8ef800abeded5eeeb3f8287
Reviewed-on: https://go-review.googlesource.com/11417
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-26 15:54:03 +00:00
Aaron Jacobs
8628688304 Fix several out of date references to 4g/5g/6g/8g/9g.
Change-Id: Ifb8e4e13c7778a7c0113190051415e096f5db94f
Reviewed-on: https://go-review.googlesource.com/11390
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-06-26 03:38:21 +00:00
Dmitry Vyukov
055e1a3ae7 runtime/race: fix test driver
At some point it silently stopped recognizing test output.
Meanwhile two tests degraded...

Change-Id: I90a0325fc9aaa16c3ef16b9c4c642581da2bb10c
Reviewed-on: https://go-review.googlesource.com/11416
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-25 11:36:07 +00:00
Russ Cox
a9e536442e runtime: set m.procid always on Linux
For debuggers and other program inspectors.

Fixes #9914.

Change-Id: I670728cea28c045e6eaba1808c550ee2f34d16ff
Reviewed-on: https://go-review.googlesource.com/11341
Reviewed-by: Austin Clements <austin@google.com>
2015-06-24 21:50:39 +00:00
Dmitry Vyukov
77082481d4 runtime/race: make test more robust
The test is flaky on builders lately. I don't see any issues other than
usage of very small sleeps. So increase the sleeps. Also take the opportunity
to refactor the code.
On my machine this change significantly reduces failure rate with GOMAXPROCS=2.
I can't reproduce the failure with GOMAXPROCS=1.

Fixes #10726

Change-Id: Iea6f10cf3ce1be5c112a2375d51c13687a8ab4c9
Reviewed-on: https://go-review.googlesource.com/9803
Reviewed-by: Austin Clements <austin@google.com>
2015-06-24 17:53:25 +00:00
Austin Clements
a8ae93fd26 runtime: fix heap bitmap repeating with large scalar tails
When heapBitsSetType repeats a source bitmap with a scalar tail
(typ.ptrdata < typ.size), it lays out the tail upon reaching the end
of the source bitmap by simply increasing the number of bits claimed
to be in the incoming bit buffer. This causes later iterations to read
the appropriate number of zeros out of the bit buffer before starting
on the next repeat of the source bitmap.

Currently, however, later iterations of the loop continue to read bits
from the source bitmap *regardless of the number of bits currently in
the bit buffer*. The bit buffer can only hold 32 or 64 bits, so if the
scalar tail is large and the padding bits exceed the size of the bit
buffer, the read from the source bitmap on the next iteration will
shift the incoming bits into oblivion when it attempts to put them in
the bit buffer. When the buffer does eventually shift down to where
these bits were supposed to be, it will contain zeros. As a result,
words that should be marked as pointers on later repetitions are
marked as scalars, so the garbage collector does not trace them. If
this is the only reference to an object, it will be incorrectly freed.

Fix this by adding logic to drain the bit buffer down if it is large
instead of reading more bits from the source bitmap.

Fixes #11286.

Change-Id: I964432c4b9f1cec334fc8c3da0ff16460203feb6
Reviewed-on: https://go-review.googlesource.com/11360
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-23 18:37:17 +00:00
Austin Clements
eabdd05892 runtime: document memory ordering for h_spans
h_spans can be accessed concurrently without synchronization from
other threads, which means it needs the appropriate memory barriers on
weakly ordered machines. It happens to already have the necessary
memory barriers because all accesses to h_spans are currently
protected by the heap lock and the unlocks happen in exactly the
places where release barriers are needed, but it's easy to imagine
that this could change in the future. Document the fact that we're
depending on the barrier implied by the unlock.

Related to issue #9984.

Change-Id: I1bc3c95cd73361b041c8c95cd4bb92daf8c1f94a
Reviewed-on: https://go-review.googlesource.com/11361
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-06-23 18:28:46 +00:00
Rick Hudson
1ab9176e54 runtime: remove race and increase precision in pointer validation.
This CL removes the single and racy use of mheap.arena_end outside
of the bookkeeping done in mHeap_init and mHeap_Alloc.
There should be no way for heapBitsForSpan to see a pointer to
an invalid span. This CL makes the check for this more precise by
checking that the pointer is between mheap_.arena_start and
mheap_.arena_used instead of mheap_.arena_end.
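
In sketch form (field names as in the message above):

    if p < mheap_.arena_start || p >= mheap_.arena_used {
            throw("heapBitsForSpan: pointer outside in-use heap")
    }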

Change-Id: I1200b54353ee1eda002d92645fd8d26048600ceb
Reviewed-on: https://go-review.googlesource.com/11342
Reviewed-by: Austin Clements <austin@google.com>
2015-06-22 20:37:23 +00:00
Austin Clements
9a3112bcae runtime: one more Map{Bits,Spans} before arena_used update
In order to avoid a race with a concurrent write barrier or garbage
collector thread, any update to arena_used must be preceded by mapping
the corresponding heap bitmap and spans array memory. Otherwise, the
concurrent access may observe that a pointer falls within the heap
arena, but then attempt to access unmapped memory to look up its span
or heap bits.

Commit d57c889 fixed all of the places where we updated arena_used
immediately before mapping the heap bitmap and spans, but it missed
the one place where we update arena_used and depend on later code to
update it again and map the bitmap and spans. This creates a window
where the original race can still happen. This commit fixes this by
mapping the heap bitmap and spans before this arena_used update as
well. This code path is only taken when expanding the heap reservation
on 32-bit over a hole in the address space, so these extra mmap calls
should have negligible impact.
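
The required ordering, sketched (the mHeap_MapBits/mHeap_MapSpans
names follow the runtime of this era, but treat the code as an
assumption):

    mHeap_MapBits(h)       // map the bitmap for the new range first
    mHeap_MapSpans(h)      // then the spans array
    h.arena_used = newUsed // publish only after both are mapped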

Fixes #10212, #11324.

Change-Id: Id67795e6c7563eb551873bc401e5cc997aaa2bd8
Reviewed-on: https://go-review.googlesource.com/11340
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-06-22 18:54:38 +00:00
Austin Clements
2a331ca8bb runtime: document relaxed access to arena_used
The unsynchronized accesses to mheap_.arena_used in the concurrent
part of the garbage collector look like a problem waiting to happen.
In fact, they are safe, but the reason is somewhat subtle and
undocumented. This commit documents this reasoning.

Related to issue #9984.

Change-Id: Icdbf2329c1aa11dbe2396a71eb5fc2a85bd4afd5
Reviewed-on: https://go-review.googlesource.com/11254
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-06-22 18:37:20 +00:00
Austin Clements
f5d494bbdf runtime: ensure GC sees type-safe memory on weak machines
Currently it's possible for the garbage collector to observe
uninitialized memory or stale heap bitmap bits on weakly ordered
architectures such as ARM and PPC. On such architectures, the stores
that zero newly allocated memory and initialize its heap bitmap may
move after a store in user code that makes the allocated object
observable by the garbage collector.

To fix this, add a "publication barrier" (also known as an "export
barrier") before returning from mallocgc. This is a store/store
barrier that ensures any write done by user code that makes the
returned object observable to the garbage collector will be ordered
after the initialization performed by mallocgc. No barrier is
necessary on the reading side because of the data dependency between
loading the pointer and loading the contents of the object.
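
A sketch of the tail of mallocgc (publicationBarrier is the runtime's
store/store barrier; the surrounding details are assumed):

    // ... zero the object and write its heap bitmap ...
    publicationBarrier()
    // Any later store that publishes x is now ordered after the
    // initialization stores above, even on ARM and PPC.
    return x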

Fixes one of the issues raised in #9984.

Change-Id: Ia3d96ad9c5fc7f4d342f5e05ec0ceae700cd17c8
Reviewed-on: https://go-review.googlesource.com/11083
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Martin Capitanio <capnm9@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-19 15:29:50 +00:00
Alex Brainman
9d968cb47b runtime: rename cgocall_errno and asmcgocall_errno into cgocall and asmcgocall
Change-Id: I5917bea8bb35b0e725dcc56a68f3a70137cfc180
Reviewed-on: https://go-review.googlesource.com/9387
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-19 01:47:11 +00:00
Rick Hudson
90a19961f2 runtime: reduce latency by aggressively ending mark phase
Some latency regressions have crept into our system over the past few
weeks. This CL fixes those by having the mark phase more aggressively
blacken objects so that the mark termination phase, a STW phase, has less
work to do. Three approaches were taken when the mark phase believes
it has no more work to do, i.e., all the work buffers are empty.
If things have gone well the mark phase is correct and there is
in fact little or no work. In that case the following items will
take very little time. If the mark phase is wrong this CL will
ferret that work out and give the mark phase a chance to deal with
it concurrently before mark termination begins.

When the mark phase first appears to be out of work, it does three things:
1) It switches from allocating white to allocating black to reduce the
number of unmarked objects reachable only from stacks.
2) It flushes and disables per-P GC work caches so all work must be in
globally visible work buffers.
3) It rescans the global roots---the BSS and data segments---so there
are fewer objects to blacken during mark termination. We do not rescan
stacks at this point, though that could be done in a later CL.
After these steps, it again drains the global work buffers.

On a lightly loaded machine the garbage benchmark has reduced the
number of GC cycles with latency > 10 ms from 83 out of 4083 cycles
down to 2 out of 3995 cycles. Maximum latency was reduced from
60+ msecs down to 20 ms.

Change-Id: I152285b48a7e56c5083a02e8e4485dd39c990492
Reviewed-on: https://go-review.googlesource.com/10590
Reviewed-by: Austin Clements <austin@google.com>
2015-06-18 21:38:46 +00:00
Shenghou Ma
3925a7c5db all: switch to the new deprecation convention
While we're at it, move some misplaced comment blocks around.

Change-Id: I1847d7f1ca1dbb8e5de737203c4ed6c66e112508
Reviewed-on: https://go-review.googlesource.com/10188
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-18 19:16:23 +00:00
Dmitry Vyukov
e72f5f67a1 runtime: fix tracing of syscallexit
There were two issues.
1. Delayed EvGoSysExit could have been emitted during TraceStart,
while it had not yet emitted EvGoInSyscall.
2. Delayed EvGoSysExit could have been emitted during next tracing session.

Fixes #10476
Fixes #11262

Change-Id: Iab68eb31cf38eb6eb6eee427f49c5ca0865a8c64
Reviewed-on: https://go-review.googlesource.com/9132
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-18 13:59:55 +00:00
Alex Brainman
2858b73843 runtime: remove cgocall and asmcgocall
In preparation for the rename of cgocall_errno into cgocall and
asmcgocall_errno into asmcgocall in the following CL.
rsc requested CL 9387 to be split into two parts. This is the first part.

Change-Id: I7434f0e4b44dd37017540695834bfcb1eebf0b2f
Reviewed-on: https://go-review.googlesource.com/11166
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-18 04:42:53 +00:00
Russ Cox
cfa3eda587 runtime: fix race in scanvalid assertion
Change-Id: I389b2e10fe667eaa55f87b71b1e004994694d4a3
Reviewed-on: https://go-review.googlesource.com/11173
Reviewed-by: Austin Clements <austin@google.com>
2015-06-17 20:12:37 +00:00
Russ Cox
3c60e6e8cf runtime: fix races in stack scan
This fixes a hang during runtime.TestTraceStress.
It also fixes double-scan of stacks, which leads to
stack barrier installation failures.

Both of these have shown up as flaky failures on the dashboard.

Fixes #10941.

Change-Id: Ia2a5991ce2c9f43ba06ae1c7032f7c898dc990e0
Reviewed-on: https://go-review.googlesource.com/11089
Reviewed-by: Austin Clements <austin@google.com>
2015-06-17 17:56:26 +00:00
Russ Cox
08e25fc1ba cmd/compile: introduce //go:systemstack annotation
//go:systemstack means that the function must run on the system stack.

Add one use in runtime as a demonstration.
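
Usage looks like this (the function below is a made-up example):

    //go:systemstack
    func mustRunOnSystemStack() {
            // The toolchain arranges a check that this only ever
            // executes on the system stack.
    }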

Fixes #9174.

Change-Id: I8d4a509cb313541426157da703f1c022e964ace4
Reviewed-on: https://go-review.googlesource.com/10840
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2015-06-17 14:23:00 +00:00
Yongjian Xu
e3dc59f33d runtime: fix typos in os_linux_arm.go
Change-Id: I750900e0aed9ec528fea3f442c35196773e3ba5e
Reviewed-on: https://go-review.googlesource.com/11163
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-17 08:51:59 +00:00
Austin Clements
7387121ddb runtime: account for stack guard when shrinking the stack
Currently, when shrinkstack computes whether the halved stack
allocation will have enough room for the stack, it accounts for the
stack space that's actively in use but fails to leave extra room for
the stack guard space. As a result, *if* the minimum stack size is
small enough or the guard large enough, it may shrink the stack and
leave less than enough room to run nosplit functions. If the next
function called after the stack shrink is a nosplit function, it may
overflow the stack without noticing and overwrite non-stack memory.

We don't think this is happening under normal conditions right now.
The minimum stack allocation is 2K and the guard is 640 bytes. The
"worst case" stack shrink is from 4K (4048 bytes after stack barrier
array reservation) to 2K (2016 bytes after stack barrier array
reservation), which means the largest "used" size that will qualify
for shrinking is 4048/4 - 8 = 1004 bytes. After copying, that leaves
2016 - 1004 = 1012 bytes of available stack, which is significantly
more than the guard space.

If we were to reduce the minimum stack size to 1K or raise the guard
space above 1012 bytes, the logic in shrinkstack would no longer leave
enough space.

It's also possible to trigger this problem by setting
firstStackBarrierOffset to 0, which puts stack barriers in a debug
mode that steals away *half* of the stack for the stack barrier array
reservation. Then, the largest "used" size that qualifies for
shrinking is (4096/2)/4 - 8 = 504 bytes. After copying, that leaves
(2048/2) - 504 = 520 bytes of available stack; much less than the
required guard space. This causes failures like those in issue #11027
because func gc() shrinks its own stack and then immediately calls
casgstatus (a nosplit function), which overflows the stack and
overwrites a free list pointer in the neighboring span. However, since
this seems to require the special debug mode, we don't think it's
responsible for issue #11027.

To forestall all of these subtle issues, this commit modifies
shrinkstack to correctly account for the guard space when considering
whether to halve the stack allocation.
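
In sketch form, the shrink test becomes (names assumed, with
_StackLimit standing in for the guard space):

    avail := gp.stack.hi - gp.stack.lo
    // Count the guard space as "used" so the halved stack still
    // leaves room to run nosplit functions.
    if used := gp.stack.hi - gp.sched.sp + _StackLimit; used >= avail/4 {
            return // don't shrink
    }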

Change-Id: I7312584addc63b5bfe55cc384a1012f6181f1b9d
Reviewed-on: https://go-review.googlesource.com/10714
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-16 21:17:53 +00:00
Austin Clements
5250279eb9 runtime: detect and print corrupted free lists
Issues #10240, #10541, #10941, #11023, #11027 and possibly others are
indicating memory corruption in the runtime. One of the easiest places
to both get corruption and detect it is in the allocator's free lists
since they appear throughout memory and follow strict invariants. This
commit adds a check when sweeping a span that its free list is sane
and, if not, it prints the corrupted free list and panics. Hopefully
this will help us collect more information on these failures.

Change-Id: I6d417bcaeedf654943a5e068bd76b58bb02d4a64
Reviewed-on: https://go-review.googlesource.com/10713
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-06-16 21:17:47 +00:00
Russ Cox
142e434006 runtime: implement GOTRACEBACK=crash for linux/386
Change-Id: I401ce8d612160a4f4ee617bddca6827fa544763a
Reviewed-on: https://go-review.googlesource.com/11087
Reviewed-by: Austin Clements <austin@google.com>
2015-06-16 20:47:47 +00:00
Russ Cox
7bc3e58806 all: extract "can I exec?" check from tests into internal/testenv
Change-Id: I7b54be9d8b50b39e01c6be21f310ae9a10404e9d
Reviewed-on: https://go-review.googlesource.com/10753
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-16 18:07:36 +00:00
Russ Cox
43aac4f9e7 runtime: raise maxmem to 512 GB
A workaround for #10460.

Change-Id: I607a556561d509db6de047892f886fb565513895
Reviewed-on: https://go-review.googlesource.com/10819
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 18:31:25 +00:00
Russ Cox
2c2770c3d4 cmd/cgo: make sure pointers passed to C escape to heap
Fixes #10303.

Change-Id: Ia68d3566ba3ebeea6e18e388446bd9b8c431e156
Reviewed-on: https://go-review.googlesource.com/10814
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-15 17:39:53 +00:00
Russ Cox
a3b9797baa runtime: gofmt
Change-Id: I539bdc438f694610a7cd373f7e1451171737cfb3
Reviewed-on: https://go-review.googlesource.com/11084
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-15 17:36:34 +00:00
Russ Cox
d5b40b6ac2 runtime: add GODEBUG gcshrinkstackoff, gcstackbarrieroff, and gcstoptheworld variables
While we're here, update the documentation and delete variables with no effect.

Change-Id: I4df0d266dff880df61b488ed547c2870205862f0
Reviewed-on: https://go-review.googlesource.com/10790
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 17:31:04 +00:00
Russ Cox
80ec711755 runtime: use type-based write barrier for remote stack write during chansend
A send on an unbuffered channel to a blocked receiver is the only
case in the runtime where one goroutine writes directly to the stack
of another. The garbage collector assumes that if a goroutine is
blocked, its stack contains no new pointers since the last time it ran.
The send on an unbuffered channel violates this, so it needs an
explicit write barrier. It has an explicit write barrier, but not one that
can handle a write to another stack. Use one that can (based on type bitmap
instead of heap bitmap).

To make this work, raise the limit for type bitmaps so that they are
used for all types up to 64 kB in size (256 bytes of bitmap).
(The runtime already imposes a limit of 64 kB for a channel element size.)
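In ordinary Go terms, the hand-off at issue looks like this toy program (the
barrier work itself happens inside the runtime's chansend; nothing below is
runtime code):

    package main

    import "fmt"

    func main() {
        c := make(chan *int) // unbuffered: a send hands the value straight to a receiver
        done := make(chan struct{})
        go func() {
            // While this goroutine is parked on the receive, the sending
            // goroutine's runtime code writes into this frame on its behalf.
            p := <-c
            fmt.Println(*p)
            close(done)
        }()
        v := 42
        c <- &v // a pointer written directly into the blocked receiver's stack
        <-done
    }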

I have been unable to reproduce this problem in a simple test program.

Could help #11035.

Change-Id: I06ad994032d8cff3438c9b3eaa8d853915128af5
Reviewed-on: https://go-review.googlesource.com/10815
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 16:50:30 +00:00
Russ Cox
d57c889ae8 runtime: wait to update arena_used until after mapping bitmap
This avoids a race with gcmarkwb_m that was leading to faults.

Fixes #10212.

Change-Id: I6fcf8d09f2692227063ce29152cb57366ea22487
Reviewed-on: https://go-review.googlesource.com/10816
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-06-11 18:15:21 +00:00
Ainar Garipov
7f9f70e5b6 all: fix misprints in comments
These were found by grepping the comments from the go code and feeding
the output to aspell.

Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395
Reviewed-on: https://go-review.googlesource.com/10941
Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-11 14:18:57 +00:00
Yongjian Xu
93e57a22d5 runtime: correct a drifted comment referencing m->locked.
Change-Id: Ida4b98aa63e57594fa6fa0b8178106bac9b3cd19
Reviewed-on: https://go-review.googlesource.com/10837
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-10 06:15:20 +00:00
Russ Cox
433c0bc769 runtime: avoid fault in heapBitsBulkBarrier
Change-Id: I0512e461de1f25cb2a1cb7f23e7a77d00700667c
Reviewed-on: https://go-review.googlesource.com/10803
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-08 20:24:00 +00:00
Austin Clements
b0532a96a8 runtime: fix write-barrier-enabled phase list in gcmarkwb_m
Commit 1303957 was supposed to enable write barriers during the
concurrent scan phase, but it only enabled *calls* to the write
barrier during this phase. It failed to update the redundant list of
write-barrier-enabled phases in gcmarkwb_m, so it still wasn't greying
objects during the scan phase.

This commit fixes this by replacing the redundant list of phases in
gcmarkwb_m with simply checking writeBarrierEnabled. This is almost
certainly redundant with checks already done in callers, but the last
time we tried to remove these redundant checks everything got much
slower, so I'm leaving it alone for now.
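The shape of the simplification, as a sketch (the flag name comes from the
commit; the surrounding code is schematic, not the runtime's):

    package p

    // writeBarrierEnabled is the single source of truth; the GC sets it on
    // entering any phase that needs barriers.
    var writeBarrierEnabled bool

    func shade(ptr uintptr) { /* grey the object ptr refers to */ }

    // gcmarkwb sketches the barrier after the change: one flag check in
    // place of a re-listed set of phases that had drifted out of sync.
    func gcmarkwb(slot *uintptr, ptr uintptr) {
        *slot = ptr
        if writeBarrierEnabled && ptr != 0 {
            shade(ptr)
        }
    }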

Fixes #11105.

Change-Id: I00230a3cb80a008e749553a8ae901b409097e4be
Reviewed-on: https://go-review.googlesource.com/10801
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-08 05:13:15 +00:00
Austin Clements
306f8f11ad runtime: unwind stack barriers when writing above the current frame
Stack barriers assume that writes through pointers to frames above the
current frame will get write barriers, and hence these frames do not
need to be re-scanned to pick up these changes. For normal writes,
this is true. However, there are places in the runtime that use
typedmemmove to potentially write through pointers to higher frames
(such as mapassign1). Currently, typedmemmove does not execute write
barriers if the destination is on the stack. If there's a stack
barrier between the current frame and the frame being modified with
typedmemmove, and the stack barrier is not otherwise hit, it's
possible that the garbage collector will never see the updated pointer
and incorrectly reclaim the object.

Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove
and its variants) detect when the destination is in the stack and
unwind stack barriers up to the point, forcing mark termination to
later rescan the affected frame and collect these pointers.
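Schematically, the hazard is a typed bulk copy through a pointer into a
caller's frame; this toy shows the write pattern only (the barrier handling
happens inside the runtime):

    package main

    import "fmt"

    // pair contains pointers, so copying one is the kind of typed copy that
    // goes through typedmemmove inside the runtime.
    type pair struct{ a, b *int }

    // fill writes through dst. When dst points into a caller's frame, the
    // copy modifies a frame above fill's own without the caller executing
    // any store of its own.
    func fill(dst *pair) {
        x, y := 1, 2
        *dst = pair{&x, &y}
    }

    func main() {
        var p pair // lives in main's frame, above fill's
        fill(&p)
        fmt.Println(*p.a, *p.b) // 1 2
    }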

Fixes #11084. Might be related to #10240, #10541, #10941, #11023,
#11027 and possibly others.

Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd
Reviewed-on: https://go-review.googlesource.com/10791
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:57:47 +00:00
Austin Clements
1303957dbf runtime: enable write barriers during concurrent scan
Currently, write barriers are only enabled after completion of the
concurrent scan phase, as we enter the concurrent mark phase. However,
stack barriers are installed during the scan phase and assume that
write barriers will track changes to frames above the stack
barriers. Since write barriers aren't enabled until after stack
barriers are installed, we may miss modifications to the stack that
happen after installing the stack barriers and before enabling write
barriers.

Fix this by enabling write barriers during the scan phase.

This commit intentionally makes the minimal change to do this (there's
only one line of code change; the rest are comment changes). At the
very least, we should consider eliminating the ragged barrier that's
intended to synchronize the enabling of write barriers, but now just
wastes time. I've included a large comment about extensions and
alternative designs.

Change-Id: Ib20fede794e4fcb91ddf36f99bd97344d7f96421
Reviewed-on: https://go-review.googlesource.com/10795
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:55:33 +00:00
Austin Clements
6f6403eddf runtime: fix checkmarks to rescan stacks
Currently checkmarks mode fails to rescan stacks because it sees the
leftover state bits indicating that the stacks haven't changed since
the last scan. As a result, it won't detect lost marks caused by
failing to scan stacks correctly during regular garbage collection.

Fix this by marking all stacks dirty before performing the checkmark
phase.

Change-Id: I1f06882bb8b20257120a4b8e7f95bb3ffc263895
Reviewed-on: https://go-review.googlesource.com/10794
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:55:12 +00:00
Austin Clements
2774b37306 all: use RET instead of RETURN on ppc64
All of the architectures except ppc64 have only "RET" for the return
mnemonic. ppc64 used to have only "RETURN", but commit cf06ea6
introduced RET as a synonym for RETURN to make ppc64 consistent with
the other architectures. However, that commit was never followed up to
make the code itself consistent by eliminating uses of RETURN.

This commit replaces all uses of RETURN in the ppc64 assembly with
RET.

This was done with
    sed -i 's/\<RETURN\>/RET/' **/*_ppc64x.s
plus one manual change to syscall/asm.s.

Change-Id: I3f6c8d2be157df8841d48de988ee43f3e3087995
Reviewed-on: https://go-review.googlesource.com/10672
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-06 00:07:23 +00:00
Alan Donovan
232331f0c7 runtime: add blank assignment to defeat "declared but not used" error from go/types
gc should ideally consider this an error too; see golang/go#8560.

Change-Id: Ieee71c4ecaff493d7f83e15ba8c8a04ee90a4cf1
Reviewed-on: https://go-review.googlesource.com/10757
Reviewed-by: Robert Griesemer <gri@golang.org>
2015-06-05 18:05:16 +00:00
Austin Clements
7529314ed3 runtime: use correct SP when installing stack barriers
Currently the stack barriers are installed at the next frame boundary
after gp.sched.sp + 1024*2^n for n=0,1,2,... However, when a G is in a
system call, we set gp.sched.sp to 0, which causes stack barriers to
be installed at *every* frame. This easily overflows the slice we've
reserved for storing the stack barrier information, and causes a
"slice bounds out of range" panic in gcInstallStackBarrier.

Fix this by using gp.syscallsp instead of gp.sched.sp if it's
non-zero. This is the same logic that gentraceback uses to determine
the current SP.

Fixes #11049.

Change-Id: Ie40eeee5bec59b7c1aa715a7c17aa63b1f1cf4e8
Reviewed-on: https://go-review.googlesource.com/10755
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-05 15:53:07 +00:00
Russ Cox
3ffcbb633e runtime: default GOMAXPROCS to NumCPU(), not 1
See golang.org/s/go15gomaxprocs for details.

Change-Id: I8de5df34fa01d31d78f0194ec78a2474c281243c
Reviewed-on: https://go-review.googlesource.com/10668
Reviewed-by: Rob Pike <r@golang.org>
2015-06-05 04:38:04 +00:00
Josh Bleecher Snyder
5353cde080 runtime, cmd/internal/obj/arm: improve arm function prologue
When stack growth is not needed, as it usually is not,
execute only a single conditional branch
rather than three conditional instructions.
This adds 4 bytes to every function,
but might speed up execution in the common case.

Sample disassembly for

func f() {
	_ = [128]byte{}
}

Before:

TEXT main.f(SB) x.go
	x.go:3	0x2000	e59a1008	MOVW 0x8(R10), R1
	x.go:3	0x2004	e59fb028	MOVW 0x28(R15), R11
	x.go:3	0x2008	e08d200b	ADD R11, R13, R2
	x.go:3	0x200c	e1520001	CMP R1, R2
	x.go:3	0x2010	91a0300e	MOVW.LS R14, R3
	x.go:3	0x2014	9b0118a9	BL.LS runtime.morestack_noctxt(SB)
	x.go:3	0x2018	9afffff8	B.LS main.f(SB)
	x.go:3	0x201c	e52de084	MOVW.W R14, -0x84(R13)
	x.go:4	0x2020	e28d1004	ADD $4, R13, R1
	x.go:4	0x2024	e3a00000	MOVW $0, R0
	x.go:4	0x2028	eb012255	BL 0x4a984
	x.go:5	0x202c	e49df084	RET #132
	x.go:5	0x2030	eafffffe	B 0x2030
	x.go:5	0x2034	ffffff7c	?

After:

TEXT main.f(SB) x.go
	x.go:3	0x2000	e59a1008	MOVW 0x8(R10), R1
	x.go:3	0x2004	e59fb02c	MOVW 0x2c(R15), R11
	x.go:3	0x2008	e08d200b	ADD R11, R13, R2
	x.go:3	0x200c	e1520001	CMP R1, R2
	x.go:3	0x2010	9a000004	B.LS 0x2028
	x.go:3	0x2014	e52de084	MOVW.W R14, -0x84(R13)
	x.go:4	0x2018	e28d1004	ADD $4, R13, R1
	x.go:4	0x201c	e3a00000	MOVW $0, R0
	x.go:4	0x2020	eb0124dc	BL 0x4b398
	x.go:5	0x2024	e49df084	RET #132
	x.go:5	0x2028	e1a0300e	MOVW R14, R3
	x.go:5	0x202c	eb011b0d	BL runtime.morestack_noctxt(SB)
	x.go:5	0x2030	eafffff2	B main.f(SB)
	x.go:5	0x2034	eafffffe	B 0x2034
	x.go:5	0x2038	ffffff7c	?

Updates #10587.

package sort benchmarks on an iPhone 6:

name            old time/op  new time/op  delta
SortString1K     569µs ± 0%   565µs ± 1%  -0.75%  (p=0.000 n=23+24)
StableString1K   872µs ± 1%   870µs ± 1%  -0.16%  (p=0.009 n=23+24)
SortInt1K        317µs ± 2%   316µs ± 2%    ~     (p=0.410 n=26+26)
StableInt1K      343µs ± 1%   339µs ± 1%  -1.07%  (p=0.000 n=22+23)
SortInt64K      30.0ms ± 1%  30.0ms ± 1%    ~     (p=0.091 n=25+24)
StableInt64K    30.2ms ± 0%  30.0ms ± 0%  -0.69%  (p=0.000 n=22+22)
Sort1e2          147µs ± 1%   146µs ± 0%  -0.48%  (p=0.000 n=25+24)
Stable1e2        290µs ± 1%   286µs ± 1%  -1.30%  (p=0.000 n=23+24)
Sort1e4         29.5ms ± 2%  29.7ms ± 1%  +0.71%  (p=0.000 n=23+23)
Stable1e4       88.7ms ± 4%  88.6ms ± 8%  -0.07%  (p=0.022 n=26+26)
Sort1e6          4.81s ± 7%   4.83s ± 7%    ~     (p=0.192 n=26+26)
Stable1e6        18.3s ± 1%   18.1s ± 1%  -0.76%  (p=0.000 n=25+23)
SearchWrappers   318ns ± 1%   344ns ± 1%  +8.14%  (p=0.000 n=23+26)

package sort benchmarks on a first generation rpi:

name            old time/op  new time/op  delta
SearchWrappers  4.13µs ± 0%  3.95µs ± 0%   -4.42%  (p=0.000 n=15+13)
SortString1K    5.81ms ± 1%  5.82ms ± 2%     ~     (p=0.400 n=14+15)
StableString1K  9.69ms ± 1%  9.73ms ± 0%     ~     (p=0.121 n=15+11)
SortInt1K       3.30ms ± 2%  3.66ms ±19%  +10.82%  (p=0.000 n=15+14)
StableInt1K     5.97ms ±15%  4.17ms ± 8%  -30.05%  (p=0.000 n=15+15)
SortInt64K       319ms ± 1%   295ms ± 1%   -7.65%  (p=0.000 n=15+15)
StableInt64K     343ms ± 0%   332ms ± 0%   -3.26%  (p=0.000 n=12+13)
Sort1e2         3.36ms ± 2%  3.22ms ± 4%   -4.10%  (p=0.000 n=15+15)
Stable1e2       6.74ms ± 1%  6.43ms ± 2%   -4.67%  (p=0.000 n=15+15)
Sort1e4          247ms ± 1%   247ms ± 1%     ~     (p=0.331 n=15+14)
Stable1e4        864ms ± 0%   820ms ± 0%   -5.15%  (p=0.000 n=14+15)
Sort1e6          41.2s ± 0%   41.2s ± 0%   +0.15%  (p=0.000 n=13+14)
Stable1e6         192s ± 0%    182s ± 0%   -5.07%  (p=0.000 n=14+14)

Change-Id: I8a9db77e1d4ea1956575895893bc9d04bd81204b
Reviewed-on: https://go-review.googlesource.com/10497
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-04 16:35:12 +00:00
Brad Fitzpatrick
03410f6758 runtime: fix TestFixedGOROOT to properly restore the GOROOT env var after test
Otherwise subsequent tests won't see any modified GOROOT.

With this CL I can move my GOROOT, set GOROOT to the new location, and
the runtime tests pass. Previously the crash_tests would instead look
for the GOROOT baked into the binary, instead of the env var:

--- FAIL: TestGcSys (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestGCFairness (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestGdbPython (0.07s)
        runtime-gdb_test.go:64: building source exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestLargeStringConcat (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go

Update #10029
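The general save/restore pattern the fix applies looks like this sketch (the
test name is invented; the real test uses its own helpers):

    package runtime_test

    import (
        "os"
        "testing"
    )

    // TestUsesGOROOT puts back whatever GOROOT the process started with, so
    // tests that run later observe the user's setting, not ours.
    func TestUsesGOROOT(t *testing.T) {
        orig, had := os.LookupEnv("GOROOT")
        defer func() {
            if had {
                os.Setenv("GOROOT", orig)
            } else {
                os.Unsetenv("GOROOT")
            }
        }()
        os.Setenv("GOROOT", "/tmp/goroot-for-test") // test-local value
        // ... exercise code that consults GOROOT ...
    }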

Change-Id: If91be0f04d3acdcf39a9e773a4e7905a446bc477
Reviewed-on: https://go-review.googlesource.com/10685
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-03 23:33:48 +00:00
Austin Clements
10083d8007 runtime: print start of GC cycle in gctrace, rather than end
Currently the GODEBUG=gctrace=1 trace line includes "@n.nnns" to
indicate the time that the GC cycle ended relative to the time the
program started. This was meant to be consistent with the utilization
as of the end of the cycle, which is printed next on the trace line,
but it winds up just being confusing and unexpected.

Change the trace line to include the time that the GC cycle started
relative to the time the program started.

Change-Id: I7d64580cd696eb17540716d3e8a74a9d6ae50650
Reviewed-on: https://go-review.googlesource.com/10634
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-03 02:17:43 +00:00
Austin Clements
faa7a7e8ae runtime: implement GC stack barriers
This commit implements stack barriers to minimize the amount of
stack re-scanning that must be done during mark termination.

Currently the GC scans stacks of active goroutines twice during every
GC cycle: once at the beginning during root discovery and once at the
end during mark termination. The second scan happens while the world
is stopped and guarantees that we've seen all of the roots (since
there are no write barriers on writes to local stack
variables). However, this means pause time is proportional to stack
size. In particularly recursive programs, this can drive pause time up
past our 10ms goal (e.g., it takes about 150ms to scan a 50MB heap).

Re-scanning the entire stack is rarely necessary, especially for large
stacks, because usually most of the frames on the stack were not
active between the first and second scans and hence any changes to
these frames (via non-escaping pointers passed down the stack) were
tracked by write barriers.

To efficiently track how far a stack has been unwound since the first
scan (and, hence, how much needs to be re-scanned), this commit
introduces stack barriers. During the first scan, at exponentially
spaced points in each stack, the scan overwrites return PCs with the
PC of the stack barrier function. When "returned" to, the stack
barrier function records how far the stack has unwound and jumps to
the original return PC for that point in the stack. Then the second
scan only needs to proceed as far as the lowest barrier that hasn't
been hit.

For deeply recursive programs, this substantially reduces mark
termination time (and hence pause time). For the goscheme example
linked in issue #10898, prior to this change, mark termination times
were typically between 100 and 500ms; with this change, mark
termination times are typically between 10 and 20ms. As a result of
the reduced stack scanning work, this reduces overall execution time
of the goscheme example by 20%.
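A toy simulation of the bookkeeping (the real barriers splice into return PCs
in machine code; here a slice of fake frames stands in for the stack and all
values are invented):

    package main

    import "fmt"

    const barrierPC = -1 // stands in for the PC of the stack barrier stub

    // stkbar records one installed barrier: which frame's return PC was
    // overwritten, and the original PC to "return" to later.
    type stkbar struct {
        frame   int
        savedPC int
    }

    func main() {
        // Toy stack: retPCs[i] is frame i's return PC; frame 0 is innermost.
        retPCs := []int{101, 102, 103, 104, 105, 106, 107, 108}

        // First scan: install barriers at exponentially spaced frames.
        var bars []stkbar
        for i := 1; i < len(retPCs); i *= 2 {
            bars = append(bars, stkbar{i, retPCs[i]})
            retPCs[i] = barrierPC
        }

        // Unwind three frames. "Returning" through a barrier records how
        // far the stack has unwound and restores the original return PC.
        hit := 0
        for i := 0; i < 3; i++ {
            if retPCs[i] == barrierPC {
                retPCs[i] = bars[hit].savedPC
                hit++
            }
        }

        // Mark termination need only rescan up to the lowest barrier that
        // was not hit, instead of the whole stack.
        fmt.Println("rescan frames 0 through", bars[hit].frame) // frame 4
    }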

Fixes #10898.

The effect of this on programs that are not deeply recursive is
minimal:

name                   old time/op    new time/op    delta
BinaryTree17              3.16s ± 2%     3.26s ± 1%  +3.31%  (p=0.000 n=19+19)
Fannkuch11                2.42s ± 1%     2.48s ± 1%  +2.24%  (p=0.000 n=17+19)
FmtFprintfEmpty          50.0ns ± 3%    49.8ns ± 1%    ~     (p=0.534 n=20+19)
FmtFprintfString          173ns ± 0%     175ns ± 0%  +1.49%  (p=0.000 n=16+19)
FmtFprintfInt             170ns ± 1%     175ns ± 1%  +2.97%  (p=0.000 n=20+19)
FmtFprintfIntInt          288ns ± 0%     295ns ± 0%  +2.73%  (p=0.000 n=16+19)
FmtFprintfPrefixedInt     242ns ± 1%     252ns ± 1%  +4.13%  (p=0.000 n=18+18)
FmtFprintfFloat           324ns ± 0%     323ns ± 0%  -0.36%  (p=0.000 n=20+19)
FmtManyArgs              1.14µs ± 0%    1.12µs ± 1%  -1.01%  (p=0.000 n=18+19)
GobDecode                8.88ms ± 1%    8.87ms ± 0%    ~     (p=0.480 n=19+18)
GobEncode                6.80ms ± 1%    6.85ms ± 0%  +0.82%  (p=0.000 n=20+18)
Gzip                      363ms ± 1%     363ms ± 1%    ~     (p=0.077 n=18+20)
Gunzip                   90.6ms ± 0%    90.0ms ± 1%  -0.71%  (p=0.000 n=17+18)
HTTPClientServer         51.5µs ± 1%    50.8µs ± 1%  -1.32%  (p=0.000 n=18+18)
JSONEncode               17.0ms ± 0%    17.1ms ± 0%  +0.40%  (p=0.000 n=18+17)
JSONDecode               61.8ms ± 0%    63.8ms ± 1%  +3.11%  (p=0.000 n=18+17)
Mandelbrot200            3.84ms ± 0%    3.84ms ± 1%    ~     (p=0.583 n=19+19)
GoParse                  3.71ms ± 1%    3.72ms ± 1%    ~     (p=0.159 n=18+19)
RegexpMatchEasy0_32       100ns ± 0%     100ns ± 1%  -0.19%  (p=0.033 n=17+19)
RegexpMatchEasy0_1K       342ns ± 1%     331ns ± 0%  -3.41%  (p=0.000 n=19+19)
RegexpMatchEasy1_32      82.5ns ± 0%    81.7ns ± 0%  -0.98%  (p=0.000 n=18+18)
RegexpMatchEasy1_1K       505ns ± 0%     494ns ± 1%  -2.16%  (p=0.000 n=18+18)
RegexpMatchMedium_32      137ns ± 1%     137ns ± 1%  -0.24%  (p=0.048 n=20+18)
RegexpMatchMedium_1K     41.6µs ± 0%    41.3µs ± 1%  -0.57%  (p=0.004 n=18+20)
RegexpMatchHard_32       2.11µs ± 0%    2.11µs ± 1%  +0.20%  (p=0.037 n=17+19)
RegexpMatchHard_1K       63.9µs ± 2%    63.3µs ± 0%  -0.99%  (p=0.000 n=20+17)
Revcomp                   560ms ± 1%     522ms ± 0%  -6.87%  (p=0.000 n=18+16)
Template                 75.0ms ± 0%    75.1ms ± 1%  +0.18%  (p=0.013 n=18+19)
TimeParse                 358ns ± 1%     364ns ± 0%  +1.74%  (p=0.000 n=20+15)
TimeFormat                360ns ± 0%     372ns ± 0%  +3.55%  (p=0.000 n=20+18)

Change-Id: If8a9bfae6c128d15a4f405e02bcfa50129df82a2
Reviewed-on: https://go-review.googlesource.com/10314
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-06-02 20:00:57 +00:00
Austin Clements
724f8298a8 runtime: avoid double-scanning of stacks
Currently there's a race between stopg scanning another G's stack and
the G reaching a preemption point and scanning its own stack. When
this race occurs, the G's stack is scanned twice. Currently this is
okay, so this race is benign.

However, we will shortly be adding stack barriers during the first
stack scan, so scanning will no longer be idempotent. To prepare for
this, this change ensures that each stack is scanned only once during
each GC phase by checking the flag that indicates that the stack has
been scanned in this phase before scanning the stack.

Change-Id: Id9f4d5e2e5b839bc3f200ec1723a4a12dd677ab4
Reviewed-on: https://go-review.googlesource.com/10458
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-06-02 19:59:05 +00:00
Austin Clements
3f6e69aca5 runtime: steal space for stack barrier tracking from stack
The stack barrier code will need a bookkeeping structure to keep track
of the overwritten return PCs. This commit introduces and allocates
this structure, but does not yet use the structure.

We don't want to allocate space for this structure during garbage
collection, so this commit allocates it along with the allocation of
the corresponding stack. However, we can't do a regular allocation in
newstack because mallocgc may itself grow the stack (which would lead
to a recursive allocation). Hence, this commit makes the bookkeeping
structure part of the stack allocation itself by stealing the
necessary space from the top of the stack allocation. Since the size
of this bookkeeping structure is logarithmic in the size of the stack,
this has minimal impact on stack behavior.
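A sketch of why the reservation is logarithmic: one record per power-of-two
spaced barrier (all constants here are illustrative, not the runtime's sizing):

    package main

    import "fmt"

    // stealForStackBarriers returns how many barrier records a stack of size
    // bytes needs, and how many bytes to carve off the top for them, assuming
    // barriers at firstOff, 2*firstOff, 4*firstOff, ...
    func stealForStackBarriers(size, firstOff, recordBytes uintptr) (n, steal uintptr) {
        for off := firstOff; off < size; off *= 2 {
            n++
        }
        return n, n * recordBytes
    }

    func main() {
        for _, sz := range []uintptr{4 << 10, 64 << 10, 1 << 20} {
            n, steal := stealForStackBarriers(sz, 1024, 16)
            fmt.Printf("%7d-byte stack: %d records, %d bytes stolen\n", sz, n, steal)
        }
    }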

Change-Id: Ia14408be06aafa9ca4867f4e70bddb3fe0e96665
Reviewed-on: https://go-review.googlesource.com/10313
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:57 +00:00
Austin Clements
e610c25df0 runtime: decouple stack bounds and stack allocation size
Currently the runtime assumes that the allocation for the stack is
exactly [stack.lo, stack.hi). We're about to steal a small part of
this allocation for per-stack GC metadata. To prepare for this, this
commit adds a field to the G for the allocated size of the stack.
With this change, stack.lo and stack.hi continue to act as the true
bounds on the stack, but are no longer also used as the bounds on the
stack allocation.

(I also tried this the other way around, where stack.lo and stack.hi
remained the allocation bounds and I introduced a new top of stack.
However, there are far more places that assume stack.hi is the true
top of the stack than there are places that assume it's the top of the
allocation.)

Change-Id: Ifa9d956753be53d286d09cbc73d47fb34a18c0c6
Reviewed-on: https://go-review.googlesource.com/10312
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:50 +00:00
Austin Clements
c02b8911d8 runtime: clean up signalstack API
Currently signalstack takes a lower limit and a length and all calls
hard-code the passed length. Change the API to take a *stack and
compute the lower limit and length from the passed stack.

This will make it easier for the runtime to steal some space from the
top of the stack since it eliminates the hard-coded stack sizes.

Change-Id: I7d2a9f45894b221f4e521628c2165530bbc57d53
Reviewed-on: https://go-review.googlesource.com/10311
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:42 +00:00
Austin Clements
cc6a7fce53 runtime: increase precision of gctrace times
Currently we truncate gctrace clock and CPU times to millisecond
precision. As a result, many phases are typically printed as 0, which
is fine for user consumption, but makes gathering statistics and
reports over GC traces difficult.

In 1.4, the gctrace line printed times in microseconds. This was
better for statistics, but not as easy for users to read or interpret,
and it generally made the trace lines longer.

This change strikes a balance between these extremes by printing
milliseconds, but including the decimal part to two significant
figures down to microsecond precision. This remains easy to read and
interpret, but includes more precision when it's useful.

For example, where the code currently prints,

gc #29 @1.629s 0%: 0+2+0+12+0 ms clock, 0+2+0+0/12/0+0 ms cpu, 4->4->2 MB, 4 MB goal, 1 P

this prints,

gc #29 @1.629s 0%: 0.005+2.1+0+12+0.29 ms clock, 0.005+2.1+0+0/12/0+0.29 ms cpu, 4->4->2 MB, 4 MB goal, 1 P
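One way to express the formatting rule, as a sketch (the runtime's printer is
different and avoids floating point):

    package main

    import (
        "fmt"
        "time"
    )

    // fmtMs renders a duration as milliseconds with roughly two significant
    // figures down to microsecond precision.
    func fmtMs(d time.Duration) string {
        ms := float64(d) / float64(time.Millisecond)
        switch {
        case ms >= 10:
            return fmt.Sprintf("%.0f", ms)
        case ms >= 1:
            return fmt.Sprintf("%.1f", ms)
        case ms >= 0.1:
            return fmt.Sprintf("%.2f", ms)
        default:
            return fmt.Sprintf("%.3f", ms)
        }
    }

    func main() {
        for _, d := range []time.Duration{
            5 * time.Microsecond, 2100 * time.Microsecond,
            12 * time.Millisecond, 290 * time.Microsecond,
        } {
            fmt.Println(fmtMs(d)) // 0.005, 2.1, 12, 0.29
        }
    }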

Fixes #10970.

Change-Id: I249624779433927cd8b0947b986df9060c289075
Reviewed-on: https://go-review.googlesource.com/10554
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 18:31:36 +00:00
Mikio Hara
1fa0a8cec5 runtime: fix data race in BenchmarkChanPopular
Fixes #11014.

Change-Id: I9a18dacd10564d3eaa1fea4d77f1a48e08e79f53
Reviewed-on: https://go-review.googlesource.com/10563
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-02 11:16:01 +00:00
Austin Clements
df2809f04e runtime: document that runtime.GC() blocks until GC is complete
runtime.GC() is intentionally very weakly specified. However, it is so
weakly specified that it's difficult to know that it's being used
correctly for its one intended use case: to ensure garbage collection
has run in a test that is garbage-sensitive. In particular, it is
unclear whether it is synchronous or asynchronous. In the old STW
collector this was essentially self-evident; short of queuing up a
garbage collection to run later, it had to be synchronous. However,
with the concurrent collector, there's evidence that people are
inferring that it may be asynchronous (e.g., issue #10986), as this is
both unclear in the documentation and possible in the implementation.

In fact, runtime.GC() runs a fully synchronous STW collection. We
probably don't want to commit to this exact behavior. But we can
commit to the essential property that tests rely on: that runtime.GC()
does not return until the GC has finished.
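The guarantee in usage form; this example relies only on the documented
property that the call blocks until collection completes:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var before, after runtime.MemStats
        runtime.ReadMemStats(&before)
        runtime.GC() // does not return until the collection has finished
        runtime.ReadMemStats(&after)
        fmt.Println("cycles completed during GC():", after.NumGC-before.NumGC) // >= 1
    }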

Change-Id: Ifc3045a505e1898ecdbe32c1f7e80e2e9ffacb5b
Reviewed-on: https://go-review.googlesource.com/10488
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-06-01 14:51:12 +00:00
Austin Clements
f2c3957ed8 runtime: disable GC around TestGoroutineParallelism
TestGoroutineParallelism can deadlock if the GC runs during the
test. Currently it tries to prevent this by forcing a GC before the
test, but this is best effort and fails completely if GOGC is very low
for testing.

This change replaces this best-effort fix with simply setting GOGC to
off for the duration of the test.
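The standard idiom for this: runtime/debug.SetGCPercent returns the previous
setting, so one line disables collection and a deferred call restores it (the
test name below is invented):

    package runtime_test

    import (
        "runtime/debug"
        "testing"
    )

    func TestNoGCDuringBody(t *testing.T) {
        // -1 disables collection; the inner call returns the old percentage,
        // which the deferred outer call restores when the test finishes.
        defer debug.SetGCPercent(debug.SetGCPercent(-1))
        // ... body that must not be interrupted by a collection ...
    }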

Change-Id: I8229310833f241b149ebcd32845870c1cb14e9f8
Reviewed-on: https://go-review.googlesource.com/10454
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-28 17:40:19 +00:00
Austin Clements
4a1957d0aa runtime: use stripped test environment for TestGdbPython
Most runtime tests that invoke the compiler to build a sub-test binary
do so with a special environment constructed by testEnv that strips
out environment variables that should apply to the test but not to the
build.

Fix TestGdbPython to use this test environment when invoking go build,
like other tests do.

Change-Id: Iafdf89d4765c587cbebc427a5d61cb8a7e71b326
Reviewed-on: https://go-review.googlesource.com/10455
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-28 17:39:08 +00:00
Elias Naur
8017ace496 runtime: don't always block all signals on OpenBSD
Implement the changes from CL 10173 on OpenBSD.

Change-Id: I2db1cd8141fd392a34753a1b8113e2e0401173b9
Reviewed-on: https://go-review.googlesource.com/10342
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-23 17:42:43 +00:00
Elias Naur
84cfba17c2 runtime: don't always unblock all signals
Ian proposed an improved way of handling signals masks in Go, motivated
by a problem where the Android java runtime expects certain signals to
be blocked for all JVM threads. Discussion here

https://groups.google.com/forum/#!topic/golang-dev/_TSCkQHJt6g

Ian's text is used in the following:

A Go program always needs to have the synchronous signals enabled.
These are the signals for which _SigPanic is set in sigtable, namely
SIGSEGV, SIGBUS, SIGFPE.

A Go program that uses the os/signal package, and calls signal.Notify,
needs to have at least one thread which is not blocking that signal,
but it doesn't matter much which one.

Unix programs do not change signal mask across execve.  They inherit
signal masks across fork.  The shell uses this fact to some extent;
for example, the job control signals (SIGTTIN, SIGTTOU, SIGTSTP) are
blocked for commands run due to backquote quoting or $().

Our current position on signal masks was not thought out. We wandered
into it step by step, e.g., http://golang.org/cl/7323067 .

This CL does the following:

Introduce a new platform hook, msigsave, that saves the signal mask of
the current thread to m.sigsave.

Call msigsave from needm and newm.

In minit, set up the signal mask from m.sigsave and unblock the
essential synchronous signals, and SIGILL, SIGTRAP, SIGPROF, SIGSTKFLT
(for systems that have it).

In unminit, restore the signal mask from m.sigsave.

The first time that os/signal.Notify is called, start a new thread whose
only purpose is to update its signal mask to make sure signals for
signal.Notify are unblocked on at least one thread.

The effect on Go programs will be that if they are invoked with some
non-synchronous signals blocked, those signals will normally be
ignored.  Previously, those signals would mostly be ignored.  A change
in behaviour will occur for programs started with any of these signals
blocked, if they receive the signal: SIGHUP, SIGINT, SIGQUIT, SIGABRT,
SIGTERM.  Previously those signals would always cause a crash (unless
using the os/signal package); with this change, they will be ignored
if the program is started with the signal blocked (and does not use
the os/signal package).

./all.bash completes successfully on linux/amd64.

OpenBSD is missing the implementation.

Change-Id: I188098ba7eb85eae4c14861269cc466f2aa40e8c
Reviewed-on: https://go-review.googlesource.com/10173
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-22 20:24:08 +00:00
Russ Cox
001438bdfe runtime: fix callwritebarrier
Given a call frame F of size N where the return values start at offset R,
callwritebarrier was instructing heapBitsBulkBarrier to scan the block
of memory [F+R, F+R+N). It should only scan [F+R, F+N). The extra N-R
bytes scanned might lead into the next allocated block in memory.
Because the scan was consulting the heap bitmap for type information,
scanning into the next block normally "just worked" in the sense of
not crashing.

Scanning the extra N-R bytes of memory is a problem mainly because
it causes the GC to consider pointers that might otherwise not be
considered, leading it to retain objects that should actually be freed.
This is very difficult to detect.

Luckily, juju turned up a case where the heap bitmap and the memory
were out of sync for the block immediately after the call frame, so that
heapBitsBulkBarrier saw an obvious non-pointer where it expected a
pointer, causing a loud crash.

Why is there a non-pointer in memory that the heap bitmap records as
a pointer? That is more difficult to answer. At least one way that it
could happen is that allocations containing no pointers at all do not
update the heap bitmap. So if heapBitsBulkBarrier walked out of the
current object and into a no-pointer object and consulted those bitmap
bits, it would be misled. This doesn't happen in general because all
the paths to heapBitsBulkBarrier first check for the no-pointer case.
This may or may not be what happened, but it's the only scenario
I've been able to construct.

I tried for quite a while to write a simple test for this and could not.
It does fix the juju crash, and it is clearly an improvement over the
old code.

Fixes #10844.

Change-Id: I53982c93ef23ef93155c4086bbd95a4c4fdaac9a
Reviewed-on: https://go-review.googlesource.com/10317
Reviewed-by: Austin Clements <austin@google.com>
2015-05-21 19:14:03 +00:00
Austin Clements
a5c3bbe0b4 runtime: eliminate write barrier from adjustpointers
Currently adjustpointers invokes a write barrier for every stack slot
it updates. This is safe---the write barrier always does nothing
because the new value is never a heap pointer---but it's unnecessary
overhead in performance and complexity.

Fix this by rewriting adjustpointers to work with *uintptrs instead of
*unsafe.Pointers. As an added bonus, this makes the code cleaner.

name                   old mean              new mean              delta
BinaryTree17            3.35s × (0.98,1.01)   3.33s × (0.99,1.02)    ~    (p=0.095 n=20+19)
Fannkuch11              2.49s × (1.00,1.01)   2.52s × (0.99,1.01)  +1.23% (p=0.000 n=19+20)
FmtFprintfEmpty        52.2ns × (0.99,1.02)  52.2ns × (0.99,1.02)    ~    (p=0.766 n=19+19)
FmtFprintfString        181ns × (0.99,1.02)   179ns × (0.99,1.01)  -1.06% (p=0.000 n=20+19)
FmtFprintfInt           177ns × (0.99,1.01)   173ns × (0.99,1.02)  -2.26% (p=0.000 n=17+20)
FmtFprintfIntInt        300ns × (0.99,1.01)   302ns × (0.99,1.01)  +0.76% (p=0.000 n=19+20)
FmtFprintfPrefixedInt   253ns × (0.99,1.02)   256ns × (0.99,1.01)  +0.96% (p=0.000 n=20+19)
FmtFprintfFloat         334ns × (0.99,1.02)   334ns × (1.00,1.01)    ~    (p=0.243 n=20+19)
FmtManyArgs            1.16µs × (0.99,1.01)  1.17µs × (0.99,1.02)  +0.88% (p=0.000 n=20+20)
GobDecode              9.16ms × (0.99,1.02)  9.18ms × (1.00,1.00)  +0.21% (p=0.048 n=20+17)
GobEncode              7.03ms × (0.99,1.01)  7.05ms × (0.99,1.01)    ~    (p=0.091 n=19+19)
Gzip                    374ms × (0.99,1.01)   372ms × (0.99,1.02)  -0.50% (p=0.008 n=18+20)
Gunzip                 92.9ms × (0.99,1.01)  92.5ms × (1.00,1.01)  -0.47% (p=0.002 n=19+19)
HTTPClientServer       53.1µs × (0.98,1.01)  52.5µs × (0.99,1.01)  -0.98% (p=0.000 n=20+19)
JSONEncode             17.4ms × (0.99,1.02)  17.5ms × (0.99,1.01)    ~    (p=0.061 n=19+20)
JSONDecode             66.0ms × (0.99,1.02)  64.7ms × (0.99,1.01)  -1.87% (p=0.000 n=20+20)
Mandelbrot200          3.94ms × (1.00,1.01)  3.95ms × (1.00,1.01)    ~    (p=0.799 n=18+19)
GoParse                3.89ms × (0.99,1.02)  3.86ms × (0.99,1.01)  -0.70% (p=0.016 n=20+19)
RegexpMatchEasy0_32     102ns × (0.99,1.02)   102ns × (1.00,1.01)    ~    (p=0.557 n=20+18)
RegexpMatchEasy0_1K     353ns × (0.99,1.02)   341ns × (0.99,1.01)  -3.38% (p=0.000 n=20+20)
RegexpMatchEasy1_32    85.0ns × (0.99,1.02)  85.0ns × (0.99,1.01)    ~    (p=0.851 n=19+20)
RegexpMatchEasy1_1K     521ns × (0.99,1.02)   506ns × (1.00,1.01)  -2.85% (p=0.000 n=20+18)
RegexpMatchMedium_32    142ns × (0.99,1.02)   141ns × (1.00,1.01)  -1.17% (p=0.000 n=20+19)
RegexpMatchMedium_1K   42.8µs × (0.99,1.01)  42.3µs × (0.99,1.01)  -1.07% (p=0.000 n=20+19)
RegexpMatchHard_32     2.17µs × (0.99,1.01)  2.16µs × (1.00,1.01)  -0.51% (p=0.042 n=20+18)
RegexpMatchHard_1K     65.6µs × (0.99,1.01)  64.8µs × (1.00,1.00)  -1.21% (p=0.000 n=20+17)
Revcomp                 581ms × (0.99,1.04)   536ms × (1.00,1.01)  -7.71% (p=0.000 n=20+18)
Template               77.2ms × (0.99,1.01)  76.8ms × (0.99,1.01)    ~    (p=0.426 n=20+18)
TimeParse               369ns × (0.99,1.02)   371ns × (1.00,1.01)    ~    (p=0.117 n=20+19)
TimeFormat              371ns × (0.99,1.02)   391ns × (0.99,1.01)  +5.33% (p=0.000 n=20+19)

Change-Id: I5b952ba577ac4365c8c87db837c5804a1e30b7be
Reviewed-on: https://go-review.googlesource.com/10293
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-21 18:35:49 +00:00
Rick Hudson
5b66e5d0d8 runtime: turn work buffer tracing off by default
During development we ran with monitoring code turned
on by default. This CL turns the work buffer monitoring
off. Performance change on most go1 benchmarks is small
or insignificant.

name                   old mean              new mean              delta
BinaryTree17            3.35s × (0.99,1.01)   3.35s × (0.99,1.01)    ~    (p=0.841 n=5+5)
Fannkuch11              2.59s × (1.00,1.01)   2.55s × (1.00,1.00)  -1.65% (p=0.008 n=5+5)
FmtFprintfEmpty        52.5ns × (0.99,1.02)  53.2ns × (0.98,1.01)    ~    (p=0.063 n=5+5)
FmtFprintfString        181ns × (1.00,1.00)   180ns × (1.00,1.00)  -0.55% (p=0.029 n=4+4)
FmtFprintfInt           176ns × (1.00,1.01)   174ns × (1.00,1.00)  -0.91% (p=0.000 n=5+4)
FmtFprintfIntInt        298ns × (1.00,1.00)   299ns × (1.00,1.00)    ~    (p=0.143 n=4+4)
FmtFprintfPrefixedInt   250ns × (1.00,1.01)   246ns × (1.00,1.00)  -1.68% (p=0.000 n=5+4)
FmtFprintfFloat         340ns × (1.00,1.00)   340ns × (1.00,1.01)    ~    (p=0.643 n=5+5)
FmtManyArgs            1.16µs × (1.00,1.00)  1.15µs × (1.00,1.00)  -0.47% (p=0.016 n=5+5)
GobDecode              9.22ms × (1.00,1.00)  9.23ms × (1.00,1.00)    ~    (p=0.841 n=5+5)
GobEncode              7.00ms × (1.00,1.01)  7.09ms × (0.99,1.01)  +1.26% (p=0.016 n=5+5)
Gzip                    387ms × (1.00,1.00)   389ms × (0.99,1.02)    ~    (p=1.000 n=5+5)
Gunzip                 97.8ms × (1.00,1.00)  98.3ms × (1.00,1.00)  +0.51% (p=0.016 n=5+4)
HTTPClientServer       52.6µs × (1.00,1.01)  52.7µs × (1.00,1.01)    ~    (p=1.000 n=5+5)
JSONEncode             18.0ms × (0.99,1.02)  17.9ms × (1.00,1.00)    ~    (p=0.310 n=5+5)
JSONDecode             64.8ms × (0.99,1.02)  63.6ms × (1.00,1.00)  -1.94% (p=0.008 n=5+5)
Mandelbrot200          4.05ms × (1.00,1.00)  4.05ms × (1.00,1.00)    ~    (p=0.421 n=5+5)
GoParse                3.86ms × (1.00,1.01)  3.84ms × (0.99,1.01)    ~    (p=0.421 n=5+5)
RegexpMatchEasy0_32     101ns × (1.00,1.00)   102ns × (0.99,1.02)    ~    (p=0.238 n=4+5)
RegexpMatchEasy0_1K     346ns × (1.00,1.01)   345ns × (1.00,1.00)    ~    (p=0.333 n=5+4)
RegexpMatchEasy1_32    87.3ns × (0.99,1.02)  87.4ns × (1.00,1.00)    ~    (p=0.190 n=5+4)
RegexpMatchEasy1_1K     520ns × (1.00,1.00)   520ns × (1.00,1.01)    ~    (p=1.000 n=4+5)
RegexpMatchMedium_32    143ns × (1.00,1.00)   142ns × (1.00,1.00)  -0.70% (p=0.029 n=4+4)
RegexpMatchMedium_1K   43.2µs × (1.00,1.01)  43.2µs × (1.00,1.00)    ~    (p=0.841 n=5+5)
RegexpMatchHard_32     2.24µs × (1.00,1.01)  2.23µs × (1.00,1.01)  -0.63% (p=0.048 n=5+5)
RegexpMatchHard_1K     68.7µs × (1.00,1.00)  68.3µs × (1.00,1.00)  -0.56% (p=0.008 n=5+5)
Revcomp                 577ms × (1.00,1.01)   579ms × (1.00,1.00)    ~    (p=0.151 n=5+5)
Template               74.9ms × (1.00,1.00)  76.5ms × (1.00,1.00)  +2.11% (p=0.008 n=5+5)
TimeParse               359ns × (1.00,1.00)   362ns × (1.00,1.00)  +0.72% (p=0.008 n=5+5)
TimeFormat              369ns × (1.00,1.00)   371ns × (1.00,1.01)    ~    (p=0.071 n=5+5)

Change-Id: I4206a3f77a3d1450966b7a62ea7597aec44cb72f
Reviewed-on: https://go-review.googlesource.com/10294
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-21 16:09:24 +00:00
Austin Clements
719efc70eb runtime: make runtime.callers walk calling G, not g0
Currently runtime.callers invokes gentraceback with the pc and sp of
the G it is called from, but always passes g0 even if it was called
from a regular g. Right now this has no ill effects because
runtime.callers does not use either callback argument or the
_TraceJumpStack flag, but it makes the code fragile and will break
some upcoming changes.

Fix this by lifting the getg() call outside of the systemstack in
runtime.callers.

Change-Id: I4e1e927961c0e0cd4dcf28693be47df7bae9e122
Reviewed-on: https://go-review.googlesource.com/10292
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-21 16:06:37 +00:00
Rick Hudson
197aa9e64d runtime: remove unused quiesce code
This is dead code. If you want to quiesce the system, the preferred
way is to use forEachP(func(*p){}).

Change-Id: Ic7677a5dd55e3639b99e78ddeb2c71dd1dd091fa
Reviewed-on: https://go-review.googlesource.com/10267
Reviewed-by: Austin Clements <austin@google.com>
2015-05-20 17:56:44 +00:00
Rick Hudson
913db7685e runtime: run background mark helpers only if work is available
Prior to this CL whenever the GC marking was enabled and
a P was looking for work we supplied a G to help
the GC do its marking tasks. Once this G finished all
the marking available it would release the P to find another
available G. In the case where there was no work, the P would drop
into findrunnable, which would execute the mark helper G, which would
immediately return, and the P would drop into findrunnable again,
repeating the process. Since the P was always given a G to run, it
never blocked. This CL first checks whether the GC mark helper G has
available work; if not, the P immediately falls through to its
blocking logic.

Fixes #10901

Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5
Reviewed-on: https://go-review.googlesource.com/10189
Reviewed-by: Austin Clements <austin@google.com>
2015-05-19 15:57:50 +00:00
Austin Clements
f4d51eb2f5 runtime: minor clean up to heapminimum
Currently setGCPercent sets heapminimum to heapminimum*GOGC/100. The
real intent is to set heapminimum to a scaled multiple of a fixed
default heap minimum, not to scale heapminimum based on its current
value. This turns out to be okay because setGCPercent is only called
once and heapminimum is initially set to this default heap minimum.
However, the code as written is confusing, especially since
setGCPercent is otherwise written so it could be called again to
change GOGC. Fix this by introducing a defaultHeapMinimum constant and
using this instead of the current value of heapminimum to compute the
scaled heap minimum.

As part of this, this commit improves the documentation on
heapminimum.
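The shape of the fix, as a sketch (the constant's value is illustrative and
the negative-percentage handling is schematic):

    package p

    // defaultHeapMinimum is fixed, so setGCPercent no longer rescales its
    // own previous output.
    const defaultHeapMinimum = 4 << 20

    var (
        gcpercent   int32  = 100
        heapminimum uint64 = defaultHeapMinimum
    )

    // setGCPercent scales the fixed default rather than the current
    // heapminimum, making repeated calls idempotent.
    func setGCPercent(in int32) (out int32) {
        out = gcpercent
        if in < 0 {
            in = -1
        }
        gcpercent = in
        if in < 0 {
            heapminimum = 0
        } else {
            heapminimum = defaultHeapMinimum * uint64(in) / 100
        }
        return out
    }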

Change-Id: I4eb82c73dc2eb44a6e5a17c780a747a2e73d7493
Reviewed-on: https://go-review.googlesource.com/10181
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-19 15:30:34 +00:00
Russ Cox
8903b3db0e runtime: add fast check for self-loop pointer in scanobject
Addresses a problem reported on the mailing list.

This will come up mainly in programs with custom allocators that batch allocations,
but it still helps in our programs, which mainly do not have such allocations.
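The fast path amounts to a single unsigned comparison, sketched here with
invented names (the runtime's variables differ):

    package p

    // selfLoop reports whether candidate pointer p lands back inside the
    // object [base, base+size) currently being scanned. Because the
    // subtraction is unsigned, p < base wraps around to a huge value that
    // fails the < size test, so one comparison covers both bounds.
    func selfLoop(p, base, size uintptr) bool {
        return p-base < size
    }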

name                   old mean              new mean              delta
BinaryTree17            5.95s × (0.97,1.03)   5.93s × (0.97,1.04)    ~    (p=0.613)
Fannkuch11              4.46s × (0.98,1.04)   4.33s × (0.99,1.01)  -2.93% (p=0.000)
FmtFprintfEmpty        86.6ns × (0.98,1.03)  86.8ns × (0.98,1.02)    ~    (p=0.523)
FmtFprintfString        290ns × (0.98,1.05)   287ns × (0.98,1.03)    ~    (p=0.061)
FmtFprintfInt           271ns × (0.98,1.04)   286ns × (0.99,1.01)  +5.54% (p=0.000)
FmtFprintfIntInt        495ns × (0.98,1.04)   489ns × (0.99,1.01)  -1.24% (p=0.015)
FmtFprintfPrefixedInt   391ns × (0.99,1.02)   407ns × (0.99,1.01)  +4.00% (p=0.000)
FmtFprintfFloat         578ns × (0.99,1.01)   559ns × (0.99,1.01)  -3.35% (p=0.000)
FmtManyArgs            1.96µs × (0.98,1.05)  1.94µs × (0.99,1.01)  -1.33% (p=0.030)
GobDecode              15.9ms × (0.97,1.05)  15.7ms × (0.99,1.01)  -1.35% (p=0.044)
GobEncode              11.4ms × (0.97,1.05)  11.3ms × (0.98,1.03)    ~    (p=0.141)
Gzip                    658ms × (0.98,1.05)   648ms × (0.99,1.01)  -1.59% (p=0.009)
Gunzip                  144ms × (0.99,1.03)   144ms × (0.99,1.01)    ~    (p=0.867)
HTTPClientServer       92.1µs × (0.97,1.05)  90.3µs × (0.99,1.01)  -1.89% (p=0.005)
JSONEncode             31.0ms × (0.96,1.07)  30.2ms × (0.98,1.03)  -2.66% (p=0.001)
JSONDecode              110ms × (0.97,1.04)   107ms × (0.99,1.01)  -2.59% (p=0.000)
Mandelbrot200          6.15ms × (0.98,1.04)  6.07ms × (0.99,1.02)  -1.32% (p=0.045)
GoParse                6.79ms × (0.97,1.04)  6.74ms × (0.97,1.04)    ~    (p=0.242)
RegexpMatchEasy0_32     158ns × (0.98,1.05)   155ns × (0.99,1.01)  -1.64% (p=0.010)
RegexpMatchEasy0_1K     548ns × (0.97,1.04)   540ns × (0.99,1.01)  -1.34% (p=0.042)
RegexpMatchEasy1_32     133ns × (0.97,1.04)   132ns × (0.97,1.05)    ~    (p=0.466)
RegexpMatchEasy1_1K     899ns × (0.96,1.05)   878ns × (0.99,1.01)  -2.32% (p=0.002)
RegexpMatchMedium_32    250ns × (0.96,1.03)   243ns × (0.99,1.01)  -2.90% (p=0.000)
RegexpMatchMedium_1K   73.4µs × (0.98,1.04)  73.0µs × (0.98,1.04)    ~    (p=0.411)
RegexpMatchHard_32     3.87µs × (0.97,1.07)  3.84µs × (0.98,1.04)    ~    (p=0.273)
RegexpMatchHard_1K      120µs × (0.97,1.08)   117µs × (0.99,1.01)  -2.06% (p=0.010)
Revcomp                 940ms × (0.96,1.07)   924ms × (0.97,1.07)    ~    (p=0.071)
Template                128ms × (0.96,1.05)   128ms × (0.99,1.01)    ~    (p=0.502)
TimeParse               632ns × (0.96,1.07)   616ns × (0.99,1.01)  -2.58% (p=0.001)
TimeFormat              671ns × (0.97,1.06)   657ns × (0.99,1.02)  -2.10% (p=0.002)

In contrast to the one in test/bench/go1 (above), the binarytree program on the
shootout site uses more goroutines, batches allocations, and sets GOMAXPROCS
to runtime.NumCPU()*2.

Using that version, before vs after:

name          old mean             new mean             delta
BinaryTree20  18.6s × (0.96,1.05)  11.3s × (0.98,1.02)  -39.46% (p=0.000)

And Go 1.4 vs after:

name          old mean             new mean             delta
BinaryTree20  13.0s × (0.97,1.02)  11.3s × (0.98,1.02)  -13.21% (p=0.000)

There is still a scheduling problem - the raw run times are hiding the fact that
this chews up 2x the CPU - but we'll take care of that separately.

Change-Id: I3f5da879b24ae73a0d06745381ffb88c3744948b
Reviewed-on: https://go-review.googlesource.com/10220
Reviewed-by: Austin Clements <austin@google.com>
2015-05-19 15:29:40 +00:00
Josh Bleecher Snyder
79986e24e0 runtime/pprof: write heap statistics to heap profile always
This is a duplicate of CL 9491.
That CL broke the build due to pprof shortcomings
and was reverted in CL 9565.

CL 9623 fixed pprof, so this can go in again.

Fixes #10659.

Change-Id: If470fc90b3db2ade1d161b4417abd2f5c6c330b8
Reviewed-on: https://go-review.googlesource.com/10212
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2015-05-18 20:02:21 +00:00
Austin Clements
f0dd002895 runtime: use separate count and note for forEachP
Currently, forEachP reuses the stopwait and stopnote fields from
stopTheWorld to track how many Ps have not responded to the safe-point
request and to sleep until all Ps have responded.

It was assumed this was safe because both stopTheWorld and forEachP
must occur under the worldsema and hence stopwait and stopnote cannot
be used for both purposes simultaneously and callers could always
determine the appropriate use based on sched.gcwaiting (which is only
set by stopTheWorld). However, this is not the case, since it's
possible for there to be a window between when an M observes that
gcwaiting is set and when it checks stopwait during which stopwait
could have changed meanings. When this happens, the M decrements
stopwait and may wakeup stopnote, but does not otherwise participate
in the forEachP protocol. As a result, stopwait is decremented too
many times, so it may reach zero before all Ps have run the safe-point
function, causing forEachP to wake up early. It will then either
observe that some P has not run the safe-point function and panic with
"P did not run fn", or the remaining P (or Ps) will run the safe-point
function before it wakes up and it will observe that stopwait is
negative and panic with "not stopped".

Fix this problem by giving forEachP its own safePointWait and
safePointNote fields.

One known sequence of events that can cause this race is as
follows. It involves three actors:

G1 is running on M1 on P1. P1 has an empty run queue.

G2/M2 is in a blocked syscall and has lost its P. (The details of this
don't matter, it just needs to be in a position where it needs to grab
an idle P.)

GC just started on G3/M3/P3. (These aren't very involved, they just
have to be separate from the other G's, M's, and P's.)

1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1.

Now G1/M1 begins to enter a syscall:

2. G1/M1 invokes reentersyscall, which sets the P1's status to
   _Psyscall.

3. G1/M1's reentersyscall observes gcwaiting != 0 and calls
   entersyscall_gcwait.

4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock.

Back on GC:

5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and
   returns.

6. GC does stuff and then calls startTheWorld().

7. startTheWorld() calls procresize(), which sets P1's status to
   _Pidle and puts P1 on the idle list.

Now G2/M2 returns from its syscall and takes over P1:

8. G2/M2 returns from its blocked syscall and gets P1 from the idle
   list.

9. G2/M2 acquires P1, which sets P1's status to _Prunning.

10. G2/M2 starts a new syscall and invokes reentersyscall, which sets
    P1's status to _Psyscall.

Back on G1/M1:

11. G1/M1 finally acquires sched.lock in entersyscall_gcwait.

At this point, G1/M1 still thinks it's running on P1. P1's status is
_Psyscall, which is consistent with what G1/M1 is doing, but it's
_Psyscall because *G2/M2* put it in to _Psyscall, not G1/M1. This is
basically an ABA race on P1's status.

Because forEachP currently shares stopwait with stopTheWorld, G1/M1's
entersyscall_gcwait observes the non-zero stopwait set by forEachP,
but mistakes it for a stopTheWorld. It cas's P1's status from
_Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement
stopwait one more time than forEachP was expecting.

Fixes #10618. (See the issue for details on why the above race is safe
when forEachP is not involved.)

Prior to this commit, the command
  stress ./runtime.test -test.run TestFutexsleep\|TestGoroutineProfile
would reliably fail after a few hundred runs. With this commit, it
ran for over 2 million runs and never crashed.

Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30
Reviewed-on: https://go-review.googlesource.com/10157
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:47 +00:00
Austin Clements
277acca286 runtime: hold worldsema while starting the world
Currently, startTheWorld releases worldsema before starting the
world. Since startTheWorld can change gomaxprocs after allowing Ps to
run, this means that gomaxprocs can change while another P holds
worldsema.

Unfortunately, the garbage collector and forEachP assume that holding
worldsema protects against changes in gomaxprocs (which it *almost*
does). In particular, this is causing somewhat frequent "P did not run
fn" crashes in forEachP in the runtime tests because gomaxprocs is
changing between the several loops that forEachP does over all the Ps.

Fix this by only releasing worldsema after the world is started.

This relates to issue #10618. forEachP still fails under stress
testing, but much less frequently.

Change-Id: I085d627b70cca9ebe9af28fe73b9872f1bb224ff
Reviewed-on: https://go-review.googlesource.com/10156
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:37 +00:00
Austin Clements
9c44a41dd5 runtime: disallow preemption during startTheWorld
Currently, startTheWorld clears preemptoff for the current M before
starting the world. A few callers increment m.locks around
startTheWorld, presumably to prevent preemption any time during
starting the world. This is almost certainly pointless (none of the
other callers do this), but there's no harm in making startTheWorld
keep preemption disabled until it's all done, which definitely lets us
drop these m.locks manipulations.

Change-Id: I8a93658abd0c72276c9bafa3d2c7848a65b4691a
Reviewed-on: https://go-review.googlesource.com/10155
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:31 +00:00
Austin Clements
a1da255aa0 runtime: factor stoptheworld/starttheworld pattern
There are several steps to stopping and starting the world and
currently they're open-coded in several places. The garbage collector
is the only thing that needs to stop and start the world in a
non-trivial pattern. Replace all other uses with calls to higher-level
functions that implement the entire pattern necessary to stop and
start the world.

This is a pure refactoring and should not change any code semantics.
In the following commits, we'll make changes that are easier to do
with this abstraction in place.

This commit renames the old starttheworld to startTheWorldWithSema.
This is a slight misnomer right now because the callers release
worldsema just before calling this. However, a later commit will swap
these and I don't want to think of another name in the mean time.

Change-Id: I5dc97f87b44fb98963c49c777d7053653974c911
Reviewed-on: https://go-review.googlesource.com/10154
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:25 +00:00
Austin Clements
5f7060afd2 runtime: don't start GC if preemptoff is set
In order to avoid deadlocks, startGC avoids kicking off GC if locks
are held by the calling M. However, it currently fails to check
preemptoff, which is the other way to disable preemption.

Fix this by adding a check for preemptoff.

Change-Id: Ie1083166e5ba4af5c9d6c5a42efdfaaef41ca997
Reviewed-on: https://go-review.googlesource.com/10153
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:18 +00:00
Alex Brainman
e544bee1dd runtime: correct exception stack trace output
It is misleading when the stack trace says:

signal arrived during cgo execution

but we are not in a cgo call.

Change-Id: I627e2f2bdc7755074677f77f21befc070a101914
Reviewed-on: https://go-review.googlesource.com/9190
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 03:09:45 +00:00
Austin Clements
a0fc306023 runtime: eliminate runqvictims and a copy from runqsteal
Currently, runqsteal steals Gs from another P into an intermediate
buffer and then copies those Gs into the current P's run queue. This
intermediate buffer itself was moved from the stack to the P in commit
c4fe503 to eliminate the cost of zeroing it on every steal.

This commit follows up c4fe503 by stealing directly into the current
P's run queue, which eliminates the copy and the need for the
intermediate buffer. The update to the tail pointer is only committed
once the entire steal operation has succeeded, so the semantics of
stealing do not change.
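A sketch of the single-commit idea on a toy ring buffer (the real runq holds G
pointers and steals from the victim atomically; everything here is invented
for illustration):

    package p

    import "sync/atomic"

    // ring is a toy fixed-size run queue; uint64 values stand in for Gs.
    type ring struct {
        head, tail uint32
        buf        [256]uint64
    }

    // stealInto writes the stolen batch into the buffer first and only then
    // publishes it by advancing tail once, so a concurrent reader never
    // observes a partially copied batch. Capacity checking is omitted.
    func (r *ring) stealInto(batch []uint64) {
        t := atomic.LoadUint32(&r.tail)
        for i, g := range batch {
            r.buf[(t+uint32(i))%uint32(len(r.buf))] = g
        }
        atomic.StoreUint32(&r.tail, t+uint32(len(batch)))
    }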

Change-Id: Icdd7a0eb82668980bf42c0154b51eef6419fdd51
Reviewed-on: https://go-review.googlesource.com/9998
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-17 01:08:42 +00:00
Russ Cox
512f75e8df runtime: replace GC programs with simpler encoding, faster decoder
Small types record the location of pointers in their memory layout
by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries,
and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using
a bitmap for a large type containing arrays does not make sense:
if someone refers to the type [1<<28]*byte in a program in such
a way that the type information makes it into the binary, it would be
a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB
(for 1-bit entries) bitmap full of 1s into the binary or even to keep
one in memory during the execution of the program.

For large types containing arrays, it is much more compact to describe
the locations of pointers using a notation that can express repetition
than to lay out a bitmap of pointers. Go 1.4 included such a notation,
called ``GC programs'' but it was complex, required recursion during
decoding, and was generally slow. Dmitriy measured the execution of
these programs writing directly to the heap bitmap as being 7x slower
than copying from a preunrolled 4-bit mask (and frankly that code was
not terribly fast either). For some tests, unrollgcprog1 was seen costing
as much as 3x more than the rest of malloc combined.

This CL introduces a different form for the GC programs. They use a
simple Lempel-Ziv-style encoding of the 1-bit pointer information,
in which the only operations are (1) emit the following n bits
and (2) repeat the last n bits c more times. This encoding can be
generated directly from the Go type information (using repetition
only for arrays or large runs of non-pointer data) and it can be decoded
very efficiently. In particular the decoding requires little state and
no recursion, so that the entire decoding can run without any memory
accesses other than the reads of the encoding and the writes of the
decoded form to the heap bitmap. For recursive types like arrays of
arrays of arrays, the inner instructions are only executed once, not
n times, so that large repetitions run at full speed. (In contrast, large
repetitions in the old programs repeated the individual bit-level layout
of the inner data over and over.) The result is as much as 25x faster
decoding compared to the old form.
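A toy decoder for the two operations described above. The instruction encoding
here is invented (the real programs are packed into a byte stream), but the
expansion follows the same emit/repeat semantics:

    package main

    import "fmt"

    // instr is one toy GC-program instruction: either emit `bits` literally,
    // or repeat the last n output bits c more times.
    type instr struct {
        repeat bool
        n, c   int
        bits   []byte // used only when repeat is false
    }

    // decode expands a program into a 1-bit-per-word pointer bitmap. It
    // assumes n never exceeds the number of bits already emitted.
    func decode(prog []instr) []byte {
        var out []byte
        for _, in := range prog {
            if !in.repeat {
                out = append(out, in.bits...)
                continue
            }
            // Append the trailing n-bit window once per repetition; the
            // source window precedes the write position, so self-append
            // is safe.
            start := len(out) - in.n
            for i := 0; i < in.c; i++ {
                out = append(out, out[start:start+in.n]...)
            }
        }
        return out
    }

    func main() {
        // "pointer, non-pointer", then repeat that 2-bit pattern 3 more times.
        prog := []instr{
            {bits: []byte{1, 0}},
            {repeat: true, n: 2, c: 3},
        }
        fmt.Println(decode(prog)) // [1 0 1 0 1 0 1 0]
    }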

Because the old decoder was so slow, Go 1.4 had three (or so) cases
for how to set the heap bitmap bits for an allocation of a given type:

(1) If the type had an even number of words up to 32 words, then
the 4-bit pointer mask for the type fit in no more than 16 bytes;
store the 4-bit pointer mask directly in the binary and copy from it.

(1b) If the type had an odd number of words up to 15 words, then
the 4-bit pointer mask for the type, doubled to end on a byte boundary,
fit in no more than 16 bytes; store that doubled mask directly in the
binary and copy from it.

(2) If the type had an even number of words up to 128 words,
or an odd number of words up to 63 words (again due to doubling),
then the 4-bit pointer mask would fit in a 64-byte unrolled mask.
Store a GC program in the binary, but leave space in the BSS for
the unrolled mask. Execute the GC program to construct the mask the
first time it is needed, and thereafter copy from the mask.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
(This is the case that was 7x slower than the other two.)

Because the new pointer masks store 1-bit entries instead of 4-bit
entries and because using the decoder no longer carries a significant
overhead, after this CL (that is, for Go 1.5) there are only two cases:

(1) If the type is 128 words or less (no condition about odd or even),
store the 1-bit pointer mask directly in the binary and use it to
initialize the heap bitmap during malloc. (Implemented in CL 9702.)

(2) There is no case 2 anymore.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.

Executing the GC program directly into the heap bitmap (case (3) above)
was disabled for the Go 1.5 dev cycle, both to avoid needing to use
GC programs for typedmemmove and to avoid updating that code as
the heap bitmap format changed. Typedmemmove no longer uses this
type information; as of CL 9886 it uses the heap bitmap directly.
Now that the heap bitmap format is stable, we reintroduce GC programs
and their space savings.
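
The resulting dispatch is a simple two-way split; a sketch with invented
names (the real logic lives in heapBitsSetType and differs in detail):

    package main

    // maxPtrmaskWords is the cutoff discussed above; the struct and helper
    // names here are assumptions, not the runtime's definitions.
    const maxPtrmaskWords = 128

    type typeInfo struct {
        words   int
        ptrmask []byte // 1-bit-per-word mask, stored when words <= cutoff
        gcprog  []int  // GC program, stored otherwise
    }

    func setHeapBits(t *typeInfo, bitmap []byte) {
        if t.words <= maxPtrmaskWords {
            copy(bitmap, t.ptrmask) // case (1): copy the stored mask directly
            return
        }
        decode(t.gcprog, bitmap) // case (3): run the program into the bitmap
    }

    func decode(prog []int, dst []byte) { /* as in the decoder sketch above */ }

    func main() {}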

Benchmarks for heapBitsSetType, before this CL vs this CL:

name                    old mean               new mean              delta
SetTypePtr              7.59ns × (0.99,1.02)   5.16ns × (1.00,1.00)  -32.05% (p=0.000)
SetTypePtr8             21.0ns × (0.98,1.05)   21.4ns × (1.00,1.00)     ~    (p=0.179)
SetTypePtr16            24.1ns × (0.99,1.01)   24.6ns × (1.00,1.00)   +2.41% (p=0.001)
SetTypePtr32            31.2ns × (0.99,1.01)   32.4ns × (0.99,1.02)   +3.72% (p=0.001)
SetTypePtr64            45.2ns × (1.00,1.00)   47.2ns × (1.00,1.00)   +4.42% (p=0.000)
SetTypePtr126           75.8ns × (0.99,1.01)   79.1ns × (1.00,1.00)   +4.25% (p=0.000)
SetTypePtr128           74.3ns × (0.99,1.01)   77.6ns × (1.00,1.01)   +4.55% (p=0.000)
SetTypePtrSlice          726ns × (1.00,1.01)    712ns × (1.00,1.00)   -1.95% (p=0.001)
SetTypeNode1            20.0ns × (0.99,1.01)   20.7ns × (1.00,1.00)   +3.71% (p=0.000)
SetTypeNode1Slice        112ns × (1.00,1.00)    113ns × (0.99,1.00)     ~    (p=0.070)
SetTypeNode8            23.9ns × (1.00,1.00)   24.7ns × (1.00,1.01)   +3.18% (p=0.000)
SetTypeNode8Slice        294ns × (0.99,1.02)    287ns × (0.99,1.01)   -2.38% (p=0.015)
SetTypeNode64           52.8ns × (0.99,1.03)   51.8ns × (0.99,1.01)     ~    (p=0.069)
SetTypeNode64Slice      1.13µs × (0.99,1.05)   1.14µs × (0.99,1.00)     ~    (p=0.767)
SetTypeNode64Dead       36.0ns × (1.00,1.01)   32.5ns × (0.99,1.00)   -9.67% (p=0.000)
SetTypeNode64DeadSlice  1.43µs × (0.99,1.01)   1.40µs × (1.00,1.00)   -2.39% (p=0.001)
SetTypeNode124          75.7ns × (1.00,1.01)   79.0ns × (1.00,1.00)   +4.44% (p=0.000)
SetTypeNode124Slice     1.94µs × (1.00,1.01)   2.04µs × (0.99,1.01)   +4.98% (p=0.000)
SetTypeNode126          75.4ns × (1.00,1.01)   77.7ns × (0.99,1.01)   +3.11% (p=0.000)
SetTypeNode126Slice     1.95µs × (0.99,1.01)   2.03µs × (1.00,1.00)   +3.74% (p=0.000)
SetTypeNode128          85.4ns × (0.99,1.01)  122.0ns × (1.00,1.00)  +42.89% (p=0.000)
SetTypeNode128Slice     2.20µs × (1.00,1.01)   2.36µs × (0.98,1.02)   +7.48% (p=0.001)
SetTypeNode130          83.3ns × (1.00,1.00)  123.0ns × (1.00,1.00)  +47.61% (p=0.000)
SetTypeNode130Slice     2.30µs × (0.99,1.01)   2.40µs × (0.98,1.01)   +4.37% (p=0.000)
SetTypeNode1024          498ns × (1.00,1.00)    537ns × (1.00,1.00)   +7.96% (p=0.000)
SetTypeNode1024Slice    15.5µs × (0.99,1.01)   17.8µs × (1.00,1.00)  +15.27% (p=0.000)

The above compares always using a cached pointer mask (and the
corresponding waste of memory) against using the programs directly.
Some slowdown is expected, in exchange for having a better general algorithm.
The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024,
along with the slice variants of those.
It is possible that the cutoff of 128 words (bits) should be raised
in a followup CL, but even with this low cutoff the GC programs are
faster than Go 1.4's "fast path" non-GC program case.

Benchmarks for heapBitsSetType, Go 1.4 vs this CL:

name                    old mean              new mean              delta
SetTypePtr              6.89ns × (1.00,1.00)  5.17ns × (1.00,1.00)  -25.02% (p=0.000)
SetTypePtr8             25.8ns × (0.97,1.05)  21.5ns × (1.00,1.00)  -16.70% (p=0.000)
SetTypePtr16            39.8ns × (0.97,1.02)  24.7ns × (0.99,1.01)  -37.81% (p=0.000)
SetTypePtr32            68.8ns × (0.98,1.01)  32.2ns × (1.00,1.01)  -53.18% (p=0.000)
SetTypePtr64             130ns × (1.00,1.00)    47ns × (1.00,1.00)  -63.67% (p=0.000)
SetTypePtr126            241ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.25% (p=0.000)
SetTypePtr128           2.07µs × (1.00,1.00)  0.08µs × (1.00,1.00)  -96.27% (p=0.000)
SetTypePtrSlice         1.05µs × (0.99,1.01)  0.72µs × (0.99,1.02)  -31.70% (p=0.000)
SetTypeNode1            16.0ns × (0.99,1.01)  20.8ns × (0.99,1.03)  +29.91% (p=0.000)
SetTypeNode1Slice        184ns × (0.99,1.01)   112ns × (0.99,1.01)  -39.26% (p=0.000)
SetTypeNode8            29.5ns × (0.97,1.02)  24.6ns × (1.00,1.00)  -16.50% (p=0.000)
SetTypeNode8Slice        624ns × (0.98,1.02)   285ns × (1.00,1.00)  -54.31% (p=0.000)
SetTypeNode64            135ns × (0.96,1.08)    52ns × (0.99,1.02)  -61.32% (p=0.000)
SetTypeNode64Slice      3.83µs × (1.00,1.00)  1.14µs × (0.99,1.01)  -70.16% (p=0.000)
SetTypeNode64Dead        134ns × (0.99,1.01)    32ns × (1.00,1.01)  -75.74% (p=0.000)
SetTypeNode64DeadSlice  3.83µs × (0.99,1.00)  1.40µs × (1.00,1.01)  -63.42% (p=0.000)
SetTypeNode124           240ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.05% (p=0.000)
SetTypeNode124Slice     7.27µs × (1.00,1.00)  2.04µs × (1.00,1.00)  -71.95% (p=0.000)
SetTypeNode126          2.06µs × (0.99,1.01)  0.08µs × (0.99,1.01)  -96.23% (p=0.000)
SetTypeNode126Slice     64.4µs × (1.00,1.00)   2.0µs × (1.00,1.00)  -96.85% (p=0.000)
SetTypeNode128          2.09µs × (1.00,1.01)  0.12µs × (1.00,1.00)  -94.15% (p=0.000)
SetTypeNode128Slice     65.4µs × (1.00,1.00)   2.4µs × (0.99,1.03)  -96.39% (p=0.000)
SetTypeNode130          2.11µs × (1.00,1.00)  0.12µs × (1.00,1.00)  -94.18% (p=0.000)
SetTypeNode130Slice     66.3µs × (1.00,1.00)   2.4µs × (0.97,1.08)  -96.34% (p=0.000)
SetTypeNode1024         16.0µs × (1.00,1.01)   0.5µs × (1.00,1.00)  -96.65% (p=0.000)
SetTypeNode1024Slice     512µs × (1.00,1.00)    18µs × (0.98,1.04)  -96.45% (p=0.000)

SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation.
Both Go 1.4 and this CL are using pointer bitmaps for this case,
so that's an overall 3x speedup for using pointer bitmaps.

SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation.
Both Go 1.4 and this CL are running the GC program for this case,
so that's an overall 17x speedup when using GC programs (and
I've seen >20x on other systems).

Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against
this CL's SetTypeNode128 (GC program), the slow path in the
code in this CL is 2x faster than the fast path in Go 1.4.

The Go 1 benchmarks are basically unaffected compared to just before this CL.

Go 1 benchmarks, before this CL vs this CL:

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.97,1.04)   5.91s × (0.96,1.04)    ~    (p=0.306)
Fannkuch11              4.38s × (1.00,1.00)   4.37s × (1.00,1.01)  -0.22% (p=0.006)
FmtFprintfEmpty        90.7ns × (0.97,1.10)  89.3ns × (0.96,1.09)    ~    (p=0.280)
FmtFprintfString        282ns × (0.98,1.04)   287ns × (0.98,1.07)  +1.72% (p=0.039)
FmtFprintfInt           269ns × (0.99,1.03)   282ns × (0.97,1.04)  +4.87% (p=0.000)
FmtFprintfIntInt        478ns × (0.99,1.02)   481ns × (0.99,1.02)  +0.61% (p=0.048)
FmtFprintfPrefixedInt   399ns × (0.98,1.03)   400ns × (0.98,1.05)    ~    (p=0.533)
FmtFprintfFloat         563ns × (0.99,1.01)   570ns × (1.00,1.01)  +1.37% (p=0.000)
FmtManyArgs            1.89µs × (0.99,1.01)  1.92µs × (0.99,1.02)  +1.88% (p=0.000)
GobDecode              15.2ms × (0.99,1.01)  15.2ms × (0.98,1.05)    ~    (p=0.609)
GobEncode              11.6ms × (0.98,1.03)  11.9ms × (0.98,1.04)  +2.17% (p=0.000)
Gzip                    648ms × (0.99,1.01)   648ms × (1.00,1.01)    ~    (p=0.835)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.169)
HTTPClientServer       90.5µs × (0.98,1.03)  91.5µs × (0.98,1.04)  +1.04% (p=0.045)
JSONEncode             31.5ms × (0.98,1.03)  31.4ms × (0.98,1.03)    ~    (p=0.549)
JSONDecode              111ms × (0.99,1.01)   107ms × (0.99,1.01)  -3.21% (p=0.000)
Mandelbrot200          6.01ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.878)
GoParse                6.54ms × (0.99,1.02)  6.61ms × (0.99,1.03)  +1.08% (p=0.004)
RegexpMatchEasy0_32     160ns × (1.00,1.01)   161ns × (1.00,1.00)  +0.40% (p=0.000)
RegexpMatchEasy0_1K     560ns × (0.99,1.01)   559ns × (0.99,1.01)    ~    (p=0.088)
RegexpMatchEasy1_32     138ns × (0.99,1.01)   138ns × (1.00,1.00)    ~    (p=0.380)
RegexpMatchEasy1_1K     877ns × (1.00,1.00)   878ns × (1.00,1.00)    ~    (p=0.157)
RegexpMatchMedium_32    251ns × (0.99,1.00)   251ns × (1.00,1.01)  +0.28% (p=0.021)
RegexpMatchMedium_1K   72.6µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.539)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.378)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.067)
Revcomp                 904ms × (0.99,1.02)   904ms × (0.99,1.01)    ~    (p=0.943)
Template                125ms × (0.99,1.02)   127ms × (0.99,1.01)  +1.79% (p=0.000)
TimeParse               627ns × (0.99,1.01)   622ns × (0.99,1.01)  -0.88% (p=0.000)
TimeFormat              655ns × (0.99,1.02)   655ns × (0.99,1.02)    ~    (p=0.976)

For the record, Go 1 benchmarks, Go 1.4 vs this CL:

name                   old mean              new mean              delta
BinaryTree17            4.61s × (0.97,1.05)   5.91s × (0.98,1.03)  +28.35% (p=0.000)
Fannkuch11              4.40s × (0.99,1.03)   4.41s × (0.99,1.01)     ~    (p=0.212)
FmtFprintfEmpty         102ns × (0.99,1.01)    84ns × (0.99,1.02)  -18.38% (p=0.000)
FmtFprintfString        302ns × (0.98,1.01)   303ns × (0.99,1.02)     ~    (p=0.203)
FmtFprintfInt           313ns × (0.97,1.05)   270ns × (0.99,1.01)  -13.69% (p=0.000)
FmtFprintfIntInt        524ns × (0.98,1.02)   477ns × (0.99,1.00)   -8.87% (p=0.000)
FmtFprintfPrefixedInt   424ns × (0.98,1.02)   386ns × (0.99,1.01)   -8.96% (p=0.000)
FmtFprintfFloat         652ns × (0.98,1.02)   594ns × (0.97,1.05)   -8.97% (p=0.000)
FmtManyArgs            2.13µs × (0.99,1.02)  1.94µs × (0.99,1.01)   -8.92% (p=0.000)
GobDecode              17.1ms × (0.99,1.02)  14.9ms × (0.98,1.03)  -13.07% (p=0.000)
GobEncode              13.5ms × (0.98,1.03)  11.5ms × (0.98,1.03)  -15.25% (p=0.000)
Gzip                    656ms × (0.99,1.02)   647ms × (0.99,1.01)   -1.29% (p=0.000)
Gunzip                  143ms × (0.99,1.02)   144ms × (0.99,1.01)     ~    (p=0.204)
HTTPClientServer       88.2µs × (0.98,1.02)  90.8µs × (0.98,1.01)   +2.93% (p=0.000)
JSONEncode             32.2ms × (0.98,1.02)  30.9ms × (0.97,1.04)   -4.06% (p=0.001)
JSONDecode              121ms × (0.98,1.02)   110ms × (0.98,1.05)   -8.95% (p=0.000)
Mandelbrot200          6.06ms × (0.99,1.01)  6.11ms × (0.98,1.04)     ~    (p=0.184)
GoParse                6.76ms × (0.97,1.04)  6.58ms × (0.98,1.05)   -2.63% (p=0.003)
RegexpMatchEasy0_32     195ns × (1.00,1.01)   155ns × (0.99,1.01)  -20.43% (p=0.000)
RegexpMatchEasy0_1K     479ns × (0.98,1.03)   535ns × (0.99,1.02)  +11.59% (p=0.000)
RegexpMatchEasy1_32     169ns × (0.99,1.02)   131ns × (0.99,1.03)  -22.44% (p=0.000)
RegexpMatchEasy1_1K    1.53µs × (0.99,1.01)  0.87µs × (0.99,1.02)  -43.07% (p=0.000)
RegexpMatchMedium_32    334ns × (0.99,1.01)   242ns × (0.99,1.01)  -27.53% (p=0.000)
RegexpMatchMedium_1K    125µs × (1.00,1.01)    72µs × (0.99,1.03)  -42.53% (p=0.000)
RegexpMatchHard_32     6.03µs × (0.99,1.01)  3.79µs × (0.99,1.01)  -37.12% (p=0.000)
RegexpMatchHard_1K      189µs × (0.99,1.02)   115µs × (0.99,1.01)  -39.20% (p=0.000)
Revcomp                 935ms × (0.96,1.03)   926ms × (0.98,1.02)     ~    (p=0.083)
Template                146ms × (0.97,1.05)   119ms × (0.99,1.01)  -18.37% (p=0.000)
TimeParse               660ns × (0.99,1.01)   624ns × (0.99,1.02)   -5.43% (p=0.000)
TimeFormat              670ns × (0.98,1.02)   710ns × (1.00,1.01)   +5.97% (p=0.000)

This CL is a bit larger than I would like, but the compiler, linker, runtime,
and package reflect all need to be in sync about the format of these programs,
so there is no easy way to split this into independent changes (at least
while keeping the build working at each change).

Fixes #9625.
Fixes #10524.

Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a
Reviewed-on: https://go-review.googlesource.com/9888
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
2015-05-16 00:38:17 +00:00
Russ Cox
d820d5f3ab runtime: make mapzero not crash on arm
Change-Id: I40e8a4a2e62253233b66f6a2e61e222437292c31
Reviewed-on: https://go-review.googlesource.com/10151
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-15 20:14:41 +00:00
Russ Cox
c3c047a6a3 runtime: test and fix heap bitmap for 1-pointer allocation on 32-bit system
Change-Id: Ic064fe7c6bd3304dcc8c3f7b3b5393870b5387c2
Reviewed-on: https://go-review.googlesource.com/10119
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-15 18:47:00 +00:00
Russ Cox
7e26a2d9a8 runtime: allocate map element zero values for reflect-created types on demand
Preallocating them in reflect means that
(1) if you say _ = PtrTo(ArrayOf(1000000000, reflect.TypeOf(byte(0)))), you just allocated 1GB of data
(2) if you say it again, that's *another* GB of data.

The only use of t.zero in the runtime is for map elements.
Delay the allocation until the creation of a map with that element type,
and share the zeros.

The one downside of the shared zero is that it's not garbage collected,
but it's also never written, so the OS should be able to handle it fairly
efficiently.
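
A minimal sketch of the scheme with invented names (the runtime version
also takes a lock and allocates the region outside the GC'd heap):

    package main

    import "unsafe"

    // zerobuf is the shared zero region; it grows but is never written.
    var zerobuf struct {
        p    unsafe.Pointer
        size uintptr
    }

    // mapzero returns zeroed memory of at least size bytes, reallocating
    // the shared region only when a map with a larger element type is
    // created.
    func mapzero(size uintptr) unsafe.Pointer {
        if size <= zerobuf.size {
            return zerobuf.p // share the existing zeros
        }
        buf := make([]byte, size)
        zerobuf.p = unsafe.Pointer(&buf[0])
        zerobuf.size = size
        return zerobuf.p
    }

    func main() { _ = mapzero(1 << 20) }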

Change-Id: I56b098a091abf3ac0945de28ebef9a6c08e76614
Reviewed-on: https://go-review.googlesource.com/10111
Reviewed-by: Keith Randall <khr@golang.org>
2015-05-15 13:56:40 +00:00
Russ Cox
65c4d7beab runtime: optimize heapBitsBulkBarrier a tiny amount
This may be mostly noise but:

name                   old mean              new mean              delta
BinaryTree17            6.03s × (0.98,1.02)   5.98s × (0.97,1.03)    ~    (p=0.306)
Fannkuch11              4.42s × (0.99,1.01)   4.34s × (0.99,1.02)  -1.83% (p=0.000)
FmtFprintfEmpty        84.7ns × (0.99,1.01)  84.4ns × (1.00,1.00)    ~    (p=0.138)
FmtFprintfString        289ns × (0.98,1.02)   289ns × (1.00,1.01)    ~    (p=0.509)
FmtFprintfInt           280ns × (0.97,1.03)   272ns × (0.98,1.03)  -2.64% (p=0.003)
FmtFprintfIntInt        484ns × (0.98,1.02)   482ns × (0.98,1.03)    ~    (p=0.606)
FmtFprintfPrefixedInt   397ns × (0.98,1.03)   393ns × (0.99,1.02)    ~    (p=0.064)
FmtFprintfFloat         573ns × (0.99,1.01)   569ns × (0.99,1.01)  -0.69% (p=0.023)
FmtManyArgs            1.89µs × (0.99,1.02)  1.91µs × (0.98,1.02)    ~    (p=0.219)
GobDecode              15.4ms × (0.99,1.02)  15.1ms × (0.99,1.01)  -2.05% (p=0.000)
GobEncode              12.0ms × (0.97,1.04)  11.9ms × (0.97,1.03)    ~    (p=0.458)
Gzip                    652ms × (0.99,1.01)   653ms × (0.99,1.01)    ~    (p=0.743)
Gunzip                  144ms × (0.99,1.01)   143ms × (0.99,1.01)    ~    (p=0.134)
HTTPClientServer       91.6µs × (0.99,1.01)  91.8µs × (0.99,1.03)    ~    (p=0.678)
JSONEncode             31.9ms × (1.00,1.00)  32.0ms × (0.99,1.01)    ~    (p=0.334)
JSONDecode              110ms × (0.99,1.01)   110ms × (0.99,1.01)    ~    (p=0.315)
Mandelbrot200          6.04ms × (0.99,1.01)  6.04ms × (1.00,1.01)    ~    (p=0.596)
GoParse                6.72ms × (0.98,1.03)  6.74ms × (0.99,1.03)    ~    (p=0.577)
RegexpMatchEasy0_32     161ns × (0.99,1.01)   160ns × (1.00,1.00)  -0.83% (p=0.002)
RegexpMatchEasy0_1K     542ns × (0.99,1.02)   541ns × (0.99,1.01)    ~    (p=0.396)
RegexpMatchEasy1_32     140ns × (0.98,1.01)   137ns × (1.00,1.00)  -2.12% (p=0.000)
RegexpMatchEasy1_1K     892ns × (0.99,1.01)   891ns × (1.00,1.01)    ~    (p=0.631)
RegexpMatchMedium_32    255ns × (0.99,1.01)   253ns × (0.99,1.01)  -0.76% (p=0.008)
RegexpMatchMedium_1K   73.1µs × (1.00,1.01)  72.9µs × (1.00,1.00)    ~    (p=0.229)
RegexpMatchHard_32     3.86µs × (1.00,1.01)  3.85µs × (1.00,1.00)    ~    (p=0.341)
RegexpMatchHard_1K      117µs × (1.00,1.01)   117µs × (0.99,1.00)    ~    (p=0.955)
Revcomp                 954ms × (0.97,1.03)   955ms × (0.98,1.02)    ~    (p=0.894)
Template                133ms × (0.97,1.05)   129ms × (0.99,1.02)  -2.50% (p=0.014)
TimeParse               629ns × (0.99,1.01)   626ns × (0.99,1.01)    ~    (p=0.106)
TimeFormat              663ns × (0.99,1.01)   660ns × (0.99,1.02)    ~    (p=0.231)

Change-Id: I580e03ed01b0629cb5eae4c4637618f20127f924
Reviewed-on: https://go-review.googlesource.com/9994
Reviewed-by: Austin Clements <austin@google.com>
2015-05-15 13:52:00 +00:00
Russ Cox
497970f421 runtime: use memmove during slice append
The effect of this CL:

name                   old mean              new mean              delta
BinaryTree17            5.97s × (0.96,1.04)   5.95s × (0.98,1.02)    ~    (p=0.697)
Fannkuch11              4.39s × (1.00,1.01)   4.41s × (1.00,1.01)  +0.52% (p=0.015)
FmtFprintfEmpty        90.8ns × (0.97,1.05)  89.4ns × (0.94,1.13)    ~    (p=0.571)
FmtFprintfString        305ns × (0.99,1.01)   292ns × (0.98,1.05)  -4.35% (p=0.000)
FmtFprintfInt           278ns × (0.96,1.03)   279ns × (0.98,1.04)    ~    (p=0.741)
FmtFprintfIntInt        489ns × (0.99,1.02)   482ns × (0.98,1.03)  -1.43% (p=0.024)
FmtFprintfPrefixedInt   402ns × (0.98,1.02)   395ns × (0.98,1.03)  -1.67% (p=0.014)
FmtFprintfFloat         578ns × (1.00,1.00)   569ns × (0.99,1.01)  -1.48% (p=0.000)
FmtManyArgs            1.88µs × (0.99,1.01)  1.88µs × (1.00,1.01)    ~    (p=0.055)
GobDecode              15.3ms × (0.99,1.01)  15.2ms × (1.00,1.01)  -0.61% (p=0.007)
GobEncode              11.8ms × (0.98,1.05)  11.6ms × (0.99,1.01)    ~    (p=0.075)
Gzip                    647ms × (0.99,1.01)   647ms × (1.00,1.00)    ~    (p=0.790)
Gunzip                  143ms × (1.00,1.00)   142ms × (1.00,1.00)    ~    (p=0.370)
HTTPClientServer       91.2µs × (0.99,1.01)  91.7µs × (0.99,1.02)    ~    (p=0.233)
JSONEncode             31.5ms × (0.98,1.01)  31.8ms × (0.99,1.02)  +1.09% (p=0.015)
JSONDecode              110ms × (0.99,1.01)   110ms × (0.99,1.02)    ~    (p=0.577)
Mandelbrot200          6.00ms × (1.00,1.00)  6.02ms × (1.00,1.00)  +0.24% (p=0.001)
GoParse                6.68ms × (0.98,1.02)  6.61ms × (0.99,1.01)  -1.10% (p=0.027)
RegexpMatchEasy0_32     162ns × (1.00,1.00)   161ns × (1.00,1.01)  -0.66% (p=0.001)
RegexpMatchEasy0_1K     539ns × (1.00,1.00)   539ns × (0.99,1.01)    ~    (p=0.509)
RegexpMatchEasy1_32     140ns × (0.99,1.02)   139ns × (0.99,1.02)    ~    (p=0.163)
RegexpMatchEasy1_1K     886ns × (1.00,1.00)   887ns × (1.00,1.00)    ~    (p=0.408)
RegexpMatchMedium_32    252ns × (1.00,1.00)   255ns × (0.99,1.01)  +1.01% (p=0.000)
RegexpMatchMedium_1K   72.6µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.176)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.403)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.351)
Revcomp                 926ms × (0.99,1.01)   925ms × (0.99,1.01)    ~    (p=0.541)
Template                126ms × (0.99,1.02)   130ms × (0.99,1.01)  +3.42% (p=0.000)
TimeParse               632ns × (0.99,1.01)   626ns × (1.00,1.00)  -0.88% (p=0.000)
TimeFormat              658ns × (0.99,1.01)   662ns × (0.99,1.02)    ~    (p=0.111)

The effect of this CL combined with CL 9886:

name                   old mean              new mean              delta
BinaryTree17            5.90s × (0.98,1.03)   5.95s × (0.98,1.02)    ~    (p=0.175)
Fannkuch11              4.34s × (1.00,1.00)   4.41s × (1.00,1.01)  +1.69% (p=0.000)
FmtFprintfEmpty        87.3ns × (0.97,1.17)  89.4ns × (0.94,1.13)    ~    (p=0.499)
FmtFprintfString        288ns × (0.98,1.04)   292ns × (0.98,1.05)    ~    (p=0.292)
FmtFprintfInt           290ns × (0.98,1.05)   279ns × (0.98,1.04)  -3.76% (p=0.001)
FmtFprintfIntInt        493ns × (0.98,1.04)   482ns × (0.98,1.03)  -2.27% (p=0.017)
FmtFprintfPrefixedInt   399ns × (0.98,1.02)   395ns × (0.98,1.03)    ~    (p=0.159)
FmtFprintfFloat         569ns × (1.00,1.00)   569ns × (0.99,1.01)    ~    (p=0.847)
FmtManyArgs            1.90µs × (0.99,1.03)  1.88µs × (1.00,1.01)  -1.14% (p=0.009)
GobDecode              15.2ms × (1.00,1.01)  15.2ms × (1.00,1.01)    ~    (p=0.170)
GobEncode              11.8ms × (0.99,1.02)  11.6ms × (0.99,1.01)  -1.47% (p=0.003)
Gzip                    649ms × (0.99,1.00)   647ms × (1.00,1.00)    ~    (p=0.200)
Gunzip                  144ms × (0.99,1.01)   142ms × (1.00,1.00)  -1.04% (p=0.000)
HTTPClientServer       91.1µs × (0.98,1.03)  91.7µs × (0.99,1.02)    ~    (p=0.345)
JSONEncode             31.5ms × (0.99,1.01)  31.8ms × (0.99,1.02)  +0.98% (p=0.021)
JSONDecode              110ms × (1.00,1.01)   110ms × (0.99,1.02)    ~    (p=0.259)
Mandelbrot200          6.02ms × (1.00,1.01)  6.02ms × (1.00,1.00)    ~    (p=0.500)
GoParse                6.68ms × (1.00,1.01)  6.61ms × (0.99,1.01)  -1.17% (p=0.001)
RegexpMatchEasy0_32     161ns × (1.00,1.00)   161ns × (1.00,1.01)  -0.39% (p=0.033)
RegexpMatchEasy0_1K     539ns × (1.00,1.00)   539ns × (0.99,1.01)    ~    (p=0.445)
RegexpMatchEasy1_32     138ns × (1.00,1.01)   139ns × (0.99,1.02)    ~    (p=0.281)
RegexpMatchEasy1_1K     887ns × (1.00,1.01)   887ns × (1.00,1.00)    ~    (p=0.610)
RegexpMatchMedium_32    251ns × (1.00,1.02)   255ns × (0.99,1.01)  +1.42% (p=0.000)
RegexpMatchMedium_1K   72.7µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.097)
RegexpMatchHard_32     3.85µs × (1.00,1.00)  3.84µs × (1.00,1.00)  -0.31% (p=0.000)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.704)
Revcomp                 923ms × (0.98,1.02)   925ms × (0.99,1.01)    ~    (p=0.574)
Template                126ms × (0.98,1.03)   130ms × (0.99,1.01)  +3.28% (p=0.000)
TimeParse               631ns × (0.99,1.02)   626ns × (1.00,1.00)    ~    (p=0.053)
TimeFormat              660ns × (0.99,1.01)   662ns × (0.99,1.02)    ~    (p=0.398)

Change-Id: I59c03d329fe7bc178a31477c6f1f01062b881041
Reviewed-on: https://go-review.googlesource.com/9993
Reviewed-by: Austin Clements <austin@google.com>
2015-05-15 13:51:49 +00:00
Russ Cox
30aacd4ce2 runtime: add Node128, Node130 benchmarks
Change-Id: I815a7ceeea48cc652b3c8568967665af39b02834
Reviewed-on: https://go-review.googlesource.com/10045
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-14 20:21:34 +00:00
Russ Cox
ecfe42cab0 runtime: keep pointer bits set always in 1-word spans
It's dumb to clear them in initSpan, set them in heapBitsSetType,
clear them in heapBitsSweepSpan, set them again in heapBitsSetType,
clear them again in heapBitsSweepSpan, and so on.

Set them in initSpan and be done with it (until the span is reused
for objects of a different size).

This avoids an atomic operation in a common case (one-word allocation).
Suggested by rlh.
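
A toy model of the idea (the real initSpan deals in heapBits, not a
plain byte slice):

    package main

    // For spans of one-word objects every object word is a pointer word,
    // so the pointer bits can be set once at span initialization instead
    // of on every allocation and sweep.
    const bitPointer = 1 << 0

    func initSpan(bitmap []byte, oneWordObjects bool) {
        if !oneWordObjects {
            return // larger sizes still set bits per allocation
        }
        for i := range bitmap {
            bitmap[i] |= bitPointer // set once; no atomic at malloc time
        }
    }

    func main() {
        bm := make([]byte, 4)
        initSpan(bm, true)
    }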

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.97,1.03)   5.93s × (0.98,1.04)              ~    (p=0.056)
Fannkuch11              4.34s × (1.00,1.01)   4.41s × (1.00,1.00)            +1.42% (p=0.000)
FmtFprintfEmpty        86.1ns × (0.98,1.03)  88.9ns × (0.95,1.14)              ~    (p=0.066)
FmtFprintfString        292ns × (0.97,1.04)   284ns × (0.98,1.03)            -2.64% (p=0.000)
FmtFprintfInt           271ns × (0.98,1.06)   274ns × (0.98,1.05)              ~    (p=0.148)
FmtFprintfIntInt        478ns × (0.98,1.05)   487ns × (0.98,1.03)            +1.85% (p=0.004)
FmtFprintfPrefixedInt   397ns × (0.98,1.05)   394ns × (0.98,1.02)              ~    (p=0.184)
FmtFprintfFloat         553ns × (0.99,1.02)   543ns × (0.99,1.01)            -1.71% (p=0.000)
FmtManyArgs            1.90µs × (0.98,1.05)  1.88µs × (0.99,1.01)            -0.97% (p=0.037)
GobDecode              15.1ms × (0.99,1.01)  15.3ms × (0.99,1.01)            +0.78% (p=0.001)
GobEncode              11.7ms × (0.98,1.05)  11.6ms × (0.99,1.02)            -1.39% (p=0.009)
Gzip                    646ms × (1.00,1.01)   647ms × (1.00,1.01)              ~    (p=0.120)
Gunzip                  142ms × (1.00,1.00)   142ms × (1.00,1.00)              ~    (p=0.068)
HTTPClientServer       89.7µs × (0.99,1.01)  90.1µs × (0.98,1.03)              ~    (p=0.224)
JSONEncode             31.3ms × (0.99,1.01)  31.2ms × (0.99,1.02)              ~    (p=0.149)
JSONDecode              113ms × (0.99,1.01)   111ms × (0.99,1.01)            -1.25% (p=0.000)
Mandelbrot200          6.01ms × (1.00,1.00)  6.01ms × (1.00,1.00)            +0.09% (p=0.015)
GoParse                6.63ms × (0.98,1.03)  6.55ms × (0.99,1.02)            -1.10% (p=0.006)
RegexpMatchEasy0_32     161ns × (1.00,1.00)   161ns × (1.00,1.00)  (sample has zero variance)
RegexpMatchEasy0_1K     539ns × (0.99,1.01)   563ns × (0.99,1.01)            +4.51% (p=0.000)
RegexpMatchEasy1_32     140ns × (0.99,1.01)   141ns × (0.99,1.01)            +1.34% (p=0.000)
RegexpMatchEasy1_1K     886ns × (1.00,1.01)   888ns × (1.00,1.00)            +0.20% (p=0.003)
RegexpMatchMedium_32    252ns × (1.00,1.02)   255ns × (0.99,1.01)            +1.32% (p=0.000)
RegexpMatchMedium_1K   72.7µs × (1.00,1.00)  72.6µs × (1.00,1.00)              ~    (p=0.296)
RegexpMatchHard_32     3.84µs × (1.00,1.01)  3.84µs × (1.00,1.00)              ~    (p=0.339)
RegexpMatchHard_1K      117µs × (1.00,1.01)   117µs × (1.00,1.00)            -0.28% (p=0.022)
Revcomp                 914ms × (0.99,1.01)   909ms × (0.99,1.01)            -0.49% (p=0.031)
Template                128ms × (0.99,1.01)   127ms × (0.99,1.01)            -1.10% (p=0.000)
TimeParse               628ns × (0.99,1.01)   639ns × (0.99,1.01)            +1.69% (p=0.000)
TimeFormat              660ns × (0.99,1.01)   662ns × (0.99,1.02)              ~    (p=0.287)

Change-Id: I3127b0ab89708267c74aa7d0eae1db1a1bcdfda5
Reviewed-on: https://go-review.googlesource.com/9884
Reviewed-by: Austin Clements <austin@google.com>
2015-05-14 15:58:54 +00:00
Russ Cox
94934f843e runtime: rewrite addb/subtractb to be simpler to compile; introduce add1, subtract1
This reduces the depth of the inlining at a particular call site.
The inliner introduces many temporary variables, and the compiler can do
a better job with fewer. Being verbose in the bodies of these helper functions
seems like a reasonable tradeoff: the uses are still just as readable, and
they run faster in some important cases.
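
The flavor of the change (paraphrased; see the CL for the exact bodies):
each helper spells out its pointer arithmetic instead of delegating to
another helper, so there is less for the inliner to expand at each call
site.

    package main

    import "unsafe"

    // addb returns the byte pointer p+n.
    func addb(p *byte, n uintptr) *byte {
        return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + n))
    }

    // add1 returns the byte pointer p+1. Writing it out, rather than
    // calling addb(p, 1), keeps the inlined code at call sites small.
    func add1(p *byte) *byte {
        return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + 1))
    }

    func main() {
        b := [2]byte{10, 20}
        println(*add1(&b[0])) // 20
    }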

Change-Id: I5323976ed3704d0acd18fb31176cfbf5ba23a89c
Reviewed-on: https://go-review.googlesource.com/9883
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-14 15:55:42 +00:00
Russ Cox
5b3739357a runtime: skip atomics in heapBitsSetType when GC is not running
Suggested by Rick during code review of this code,
but separated out for easier diagnosis in case it causes
problems (and also easier rollback).
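
The idea, sketched with assumed names (the real code keys off the GC
phase and uses the runtime's own atomics):

    package main

    import "sync/atomic"

    var gcRunning int32 // stand-in for the runtime's GC-phase check

    // setBits uses a plain store when the collector is off, since nothing
    // else touches the bitmap then, and falls back to an atomic
    // read-modify-write only while GC runs concurrently.
    func setBits(bitp *uint32, bits uint32) {
        if atomic.LoadInt32(&gcRunning) == 0 {
            *bitp |= bits // GC off: no contention, no atomic needed
            return
        }
        for {
            old := atomic.LoadUint32(bitp)
            if atomic.CompareAndSwapUint32(bitp, old, old|bits) {
                return
            }
        }
    }

    func main() {
        var b uint32
        setBits(&b, 1)
    }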

name                    old mean              new mean              delta
SetTypePtr              13.9ns × (0.98,1.05)   6.2ns × (0.99,1.01)  -55.18% (p=0.000)
SetTypePtr8             15.5ns × (0.95,1.10)  15.5ns × (0.99,1.05)     ~    (p=0.952)
SetTypePtr16            17.8ns × (0.99,1.05)  18.0ns × (1.00,1.00)     ~    (p=0.157)
SetTypePtr32            25.2ns × (0.99,1.01)  24.3ns × (0.99,1.01)   -3.86% (p=0.000)
SetTypePtr64            42.2ns × (0.93,1.13)  40.8ns × (0.99,1.01)     ~    (p=0.239)
SetTypePtr126           67.3ns × (1.00,1.00)  67.5ns × (0.99,1.02)     ~    (p=0.365)
SetTypePtr128           67.6ns × (1.00,1.01)  70.1ns × (0.97,1.10)     ~    (p=0.063)
SetTypePtrSlice          575ns × (0.98,1.06)   543ns × (0.95,1.17)   -5.54% (p=0.034)
SetTypeNode1            12.4ns × (0.98,1.09)  12.8ns × (0.99,1.01)   +3.40% (p=0.021)
SetTypeNode1Slice       97.1ns × (0.97,1.09)  89.5ns × (1.00,1.00)   -7.78% (p=0.000)
SetTypeNode8            29.8ns × (1.00,1.01)  17.7ns × (1.00,1.01)  -40.74% (p=0.000)
SetTypeNode8Slice        204ns × (0.99,1.04)   190ns × (0.97,1.06)   -6.96% (p=0.000)
SetTypeNode64           42.8ns × (0.99,1.01)  44.0ns × (0.95,1.12)     ~    (p=0.163)
SetTypeNode64Slice      1.00µs × (0.95,1.09)  0.98µs × (0.96,1.08)     ~    (p=0.356)
SetTypeNode64Dead       12.2ns × (0.99,1.04)  12.7ns × (1.00,1.01)   +4.34% (p=0.000)
SetTypeNode64DeadSlice  1.14µs × (0.94,1.11)  0.99µs × (0.99,1.03)  -13.74% (p=0.000)
SetTypeNode124          67.9ns × (0.99,1.03)  70.4ns × (0.95,1.15)     ~    (p=0.115)
SetTypeNode124Slice     1.76µs × (0.99,1.04)  1.88µs × (0.91,1.23)     ~    (p=0.096)
SetTypeNode126          67.7ns × (1.00,1.01)  68.2ns × (0.99,1.02)   +0.72% (p=0.014)
SetTypeNode126Slice     1.76µs × (1.00,1.01)  1.87µs × (0.93,1.15)   +6.15% (p=0.035)
SetTypeNode1024          462ns × (0.96,1.10)   451ns × (0.99,1.05)     ~    (p=0.224)
SetTypeNode1024Slice    14.4µs × (0.95,1.15)  14.2µs × (0.97,1.19)     ~    (p=0.676)

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.98,1.04)   5.87s × (0.98,1.03)    ~    (p=0.993)
Fannkuch11              4.39s × (0.99,1.01)   4.34s × (1.00,1.01)  -1.22% (p=0.000)
FmtFprintfEmpty        90.6ns × (0.97,1.06)  89.4ns × (0.97,1.03)    ~    (p=0.070)
FmtFprintfString        305ns × (0.98,1.02)   296ns × (0.99,1.02)  -2.94% (p=0.000)
FmtFprintfInt           276ns × (0.97,1.04)   270ns × (0.98,1.03)  -2.17% (p=0.001)
FmtFprintfIntInt        490ns × (0.97,1.05)   473ns × (0.99,1.02)  -3.59% (p=0.000)
FmtFprintfPrefixedInt   402ns × (0.99,1.02)   397ns × (0.99,1.01)  -1.15% (p=0.000)
FmtFprintfFloat         577ns × (0.99,1.01)   549ns × (0.99,1.01)  -4.78% (p=0.000)
FmtManyArgs            1.89µs × (0.99,1.02)  1.87µs × (0.99,1.01)  -1.43% (p=0.000)
GobDecode              15.2ms × (0.99,1.01)  14.7ms × (0.99,1.02)  -3.55% (p=0.000)
GobEncode              11.7ms × (0.98,1.04)  11.5ms × (0.99,1.02)  -1.63% (p=0.002)
Gzip                    647ms × (0.99,1.01)   647ms × (1.00,1.01)    ~    (p=0.486)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.00)    ~    (p=0.234)
HTTPClientServer       90.7µs × (0.99,1.01)  90.4µs × (0.98,1.04)    ~    (p=0.331)
JSONEncode             31.9ms × (0.97,1.06)  31.6ms × (0.98,1.02)    ~    (p=0.206)
JSONDecode              110ms × (0.99,1.01)   112ms × (0.99,1.02)  +1.48% (p=0.000)
Mandelbrot200          6.00ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.058)
GoParse                6.63ms × (0.98,1.03)  6.61ms × (0.98,1.02)    ~    (p=0.353)
RegexpMatchEasy0_32     162ns × (0.99,1.01)   161ns × (1.00,1.00)  -0.33% (p=0.004)
RegexpMatchEasy0_1K     539ns × (0.99,1.01)   540ns × (0.99,1.02)    ~    (p=0.222)
RegexpMatchEasy1_32     139ns × (0.99,1.01)   140ns × (0.97,1.03)    ~    (p=0.054)
RegexpMatchEasy1_1K     886ns × (1.00,1.00)   887ns × (1.00,1.00)  +0.18% (p=0.001)
RegexpMatchMedium_32    252ns × (1.00,1.01)   252ns × (1.00,1.00)  +0.21% (p=0.010)
RegexpMatchMedium_1K   72.7µs × (1.00,1.01)  72.6µs × (1.00,1.00)    ~    (p=0.060)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.065)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)  -0.27% (p=0.000)
Revcomp                 916ms × (0.98,1.04)   909ms × (0.99,1.01)    ~    (p=0.054)
Template                126ms × (0.99,1.01)   128ms × (0.99,1.02)  +1.43% (p=0.000)
TimeParse               632ns × (0.99,1.01)   625ns × (1.00,1.01)  -1.05% (p=0.000)
TimeFormat              655ns × (0.99,1.02)   669ns × (0.99,1.02)  +2.01% (p=0.000)

Change-Id: I9477b7c9489c6fa98e860c190ce06cd73c53c6a1
Reviewed-on: https://go-review.googlesource.com/9829
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-14 15:54:53 +00:00
Brad Fitzpatrick
6f2c0f1585 runtime: add check for malloc in a signal handler
Change-Id: Ic8ebbe81eb788626c01bfab238d54236e6e5ef2b
Reviewed-on: https://go-review.googlesource.com/9964
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-13 20:36:19 +00:00
Rick Hudson
c4fe503119 runtime: reduce thrashing of gs between ps
One important use case is a pipeline computation that passes values
from one Goroutine to the next and then exits or is placed in a
wait state. If GOMAXPROCS > 1, a Goroutine running on P1 will enable
another Goroutine and then immediately make P1 available to execute
it. We need to prevent other Ps from stealing the G that P1 is about
to execute. Otherwise the Gs can thrash between Ps causing unneeded
synchronization and slowing down throughput.

Fix this by changing the stealing logic so that when a P attempts to
steal the only G on some other P's run queue, it will pause
momentarily to allow the victim P to schedule the G.
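
As a toy model (the types, spin count, and sleep are stand-ins for the
scheduler's real mechanics):

    package main

    import "time"

    type g struct{ id int }

    type p struct {
        runq    []*g
        running bool
    }

    // stealFrom pauses momentarily when the victim is running and has only
    // one queued G, giving the victim a chance to schedule it itself.
    func stealFrom(victim *p) *g {
        for spins := 0; spins < 4 && victim.running && len(victim.runq) == 1; spins++ {
            time.Sleep(time.Microsecond) // toy stand-in for a brief spin
        }
        if len(victim.runq) == 0 {
            return nil // the victim ran its G; nothing to steal
        }
        gp := victim.runq[0]
        victim.runq = victim.runq[1:]
        return gp
    }

    func main() {
        v := &p{runq: []*g{{id: 1}}}
        println(stealFrom(v).id) // 1
    }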

As part of optimizing stealing we also use a per-P victim queue to
move stolen Gs. This eliminates the zeroing of a stack-local victim
queue which turned out to be expensive.

This CL is a necessary but not sufficient prerequisite to changing
the default value of GOMAXPROCS to something > 1 which is another
CL/discussion.

For highly serialized programs, such as GoroutineRing below, this can
make a large difference. For larger and more parallel programs such
as the x/benchmarks there is no noticeable detriment.

~/work/code/src/rsc.io/benchstat/benchstat old.txt new.txt
name                old mean              new mean              delta
GoroutineRing       30.2µs × (0.98,1.01)  30.1µs × (0.97,1.04)     ~    (p=0.941)
GoroutineRing-2      113µs × (0.91,1.07)    30µs × (0.98,1.03)  -73.17% (p=0.004)
GoroutineRing-4      144µs × (0.98,1.02)    32µs × (0.98,1.01)  -77.69% (p=0.000)
GoroutineRingBuf    32.7µs × (0.97,1.03)  32.5µs × (0.97,1.02)     ~    (p=0.795)
GoroutineRingBuf-2   120µs × (0.92,1.08)    33µs × (1.00,1.00)  -72.48% (p=0.004)
GoroutineRingBuf-4   138µs × (0.92,1.06)    33µs × (1.00,1.00)  -76.21% (p=0.003)

The bench benchmarks show little impact.
name          old       new
garbage   7032879   7011696
httpold     25509     25301
splayold  1022073   1019499
jsonold  28230624  28081433

Change-Id: I228c48fed8d85c9bbef16a7edc53ab7898506f50
Reviewed-on: https://go-review.googlesource.com/9872
Reviewed-by: Austin Clements <austin@google.com>
2015-05-13 12:55:24 +00:00
Austin Clements
350fd548b3 runtime: don't run runq tests on the system stack
Running these tests on the system stack is problematic because they
allocate Ps, which are large enough to overflow the system stack if
they are stack-allocated. It used to be necessary to run these tests
on the system stack because they were written in C, but since this is
no longer the case, we can fix this problem by simply not running the
tests on the system stack.

This also means we no longer need the hack in one of these tests that
forces the allocated Ps to escape to the heap, so eliminate that as
well.

Change-Id: I9064f5f8fd7f7b446ff39a22a70b172cfcb2dc57
Reviewed-on: https://go-review.googlesource.com/9923
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-12 19:58:08 +00:00
David du Colombier
7de86a1b1c runtime: terminate exit status buffer on Plan 9
The status buffer built by the exit function
was not NUL-terminated.

Fixes #10789.

Change-Id: I2d34ac50a19d138176c4b47393497ba7070d5b61
Reviewed-on: https://go-review.googlesource.com/9953
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-12 16:35:58 +00:00
David du Colombier
f85a05581e runtime: fix signal handling on Plan 9
Once a note is added to the signal queue, the pointer passed to the
signal handler may no longer be valid. Instead of passing the
pointer to the note string, we copy the value of the note string
into a static array in the signal queue.
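
The shape of the fix, sketched with assumed sizes and names (errMax
stands in for Plan 9's ERRMAX):

    package main

    const errMax = 128 // Plan 9 ERRMAX: maximum note/error string length

    // noteQueue owns static storage for queued notes, so no queued entry
    // depends on a pointer that may become invalid later.
    type noteQueue struct {
        notes [8][errMax]byte
        lens  [8]int
        n     int
    }

    func (q *noteQueue) push(note []byte) bool {
        if q.n == len(q.notes) {
            return false // queue full; drop
        }
        q.lens[q.n] = copy(q.notes[q.n][:], note) // copy the value, not the pointer
        q.n++
        return true
    }

    func main() {
        var q noteQueue
        q.push([]byte("sys: write on closed pipe"))
    }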

Fixes #10784.

Change-Id: Iddd6837b58a14dfaa16b069308ae28a7b8e0965b
Reviewed-on: https://go-review.googlesource.com/9950
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-12 16:35:46 +00:00
Michael Hudson-Doyle
77fc03f4cd cmd/internal/ld, runtime: abort on shared library ABI mismatch
This:

1) Defines the ABI hash of a package (as the SHA1 of the __.PKGDEF)
2) Defines the ABI hash of a shared library (sort the packages by import
   path, concatenate the hashes of the packages and SHA1 that)
3) When building a shared library, compute the above value and define a
   global symbol that points to a Go string that has the hash as its value.
4) When linking against a shared library, read the abi hash from the
   library and put both the value seen at link time and a reference
   to the global symbol into the moduledata.
5) During runtime initialization, check that the hash seen at link time
   still matches the hash the global symbol points to.
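
Steps (1) and (2) amount to roughly the following; the pkg struct is
invented for the sketch:

    package main

    import (
        "crypto/sha1"
        "fmt"
        "sort"
    )

    type pkg struct {
        importPath string
        pkgdef     []byte // contents of the package's __.PKGDEF
    }

    // abiHash hashes each package's __.PKGDEF, then hashes the
    // per-package hashes in import-path order.
    func abiHash(pkgs []pkg) []byte {
        sort.Slice(pkgs, func(i, j int) bool {
            return pkgs[i].importPath < pkgs[j].importPath
        })
        h := sha1.New()
        for _, p := range pkgs {
            ph := sha1.Sum(p.pkgdef) // step (1): one package's ABI hash
            h.Write(ph[:])           // step (2): concatenate and hash
        }
        return h.Sum(nil)
    }

    func main() {
        fmt.Printf("%x\n", abiHash([]pkg{{"fmt", nil}, {"errors", nil}}))
    }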

Change-Id: Iaa54c783790e6dde3057a2feadc35473d49614a5
Reviewed-on: https://go-review.googlesource.com/8773
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
2015-05-12 01:30:40 +00:00
Michael Hudson-Doyle
be0cb9224b runtime: fix addmoduledata to follow the platform ABI
addmoduledata is called from a .init_array function and needs to follow the
platform ABI. It contains accesses to global data which are rewritten to use
R15 by the assembler, and as R15 is callee-save we need to save it.

Change-Id: I03893efb1576aed4f102f2465421f256f3bb0f30
Reviewed-on: https://go-review.googlesource.com/9941
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-12 00:50:32 +00:00
Russ Cox
4212a3c3d9 runtime: use heap bitmap for typedmemmove
The current implementation of typedmemmove walks the ptrmask
in the type to find out where pointers are. This led to turning off
GC programs for the Go 1.5 dev cycle, so that there would always
be a ptrmask. Instead of also interpreting the GC programs,
interpret the heap bitmap, which we know must be available and
up to date. (There is no point to write barriers when writing outside
the heap.)
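
In outline, with the per-word pointer lookup abstracted into a callback
(the runtime reads the heap bitmap itself; names here are invented):

    package main

    import "fmt"

    // typedmemmoveSketch copies src to dst word by word, consulting
    // pointer information per word rather than a type's ptrmask or
    // GC program.
    func typedmemmoveSketch(dst, src []uintptr, isPointer func(word int) bool) {
        for i := range dst {
            if isPointer(i) {
                copyPointer(&dst[i], src[i]) // pointer word: barrier applies
            } else {
                dst[i] = src[i] // scalar word: plain copy
            }
        }
    }

    func copyPointer(dst *uintptr, val uintptr) { *dst = val } // barrier elided

    func main() {
        dst, src := make([]uintptr, 4), []uintptr{1, 2, 3, 4}
        typedmemmoveSketch(dst, src, func(i int) bool { return i == 0 })
        fmt.Println(dst) // [1 2 3 4]
    }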

This CL is only about correctness. The next CL will optimize the code.

Change-Id: Id1305c7c071fd2734ab96634b0e1c745b23fa793
Reviewed-on: https://go-review.googlesource.com/9886
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 16:38:21 +00:00
Russ Cox
266a842f55 runtime: zero entire bitmap for object, even past dead marker
We want typedmemmove to use the heap bitmap to determine
where pointers are, instead of reinterpreting the type information.
The heap bitmap is simpler to access.

In general, typedmemmove will need to be able to look up the bits
for any word and find valid pointer information, so fill even after the
dead marker. Not filling after the dead marker was an optimization
I introduced only a few days ago, when reintroducing the dead marker
code. At the time I said it probably wouldn't last, and it didn't.

Change-Id: I6ba01bff17ddee1ff429f454abe29867ec60606e
Reviewed-on: https://go-review.googlesource.com/9885
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 16:37:46 +00:00
Russ Cox
e375ca2a25 runtime: reorder bits in heap bitmap bytes
The runtime deals with 1-bit pointer bitmaps and 2-bit heap bitmaps
that have entries for both pointers and mark bits.

Each byte in a 1-bit pointer bitmap looks like pppppppp (all pointer bits).
Each byte in a 2-bit heap bitmap looks like mpmpmpmp (mark, pointer, ...).
This means that when converting from 1-bit to 2-bit, as we do
during malloc, we have to pick up 4 bits in pppp form and use
shifts to create the mpmpmpmp form.

This CL changes the 2-bit heap bitmap form to mmmmpppp,
so that 4 bits picked up in 1-bit form can be used directly in
the low bits of the heap bitmap byte, without expansion.
This simplifies the code, and it also happens to be faster.
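
Concretely, under one plausible reading of the interleaving (the exact
bit order is an assumption here):

    package main

    import "fmt"

    func main() {
        pppp := byte(0b1011) // 4 bits from a 1-bit pointer bitmap

        // old mpmpmpmp layout: each pointer bit must be shifted into place
        oldByte := (pppp&1)<<0 | (pppp>>1&1)<<2 | (pppp>>2&1)<<4 | (pppp>>3&1)<<6

        // new mmmmpppp layout: the 4 bits drop straight into the low
        // nibble; mark bits occupy the high nibble separately
        newByte := pppp

        fmt.Printf("old %08b  new %08b\n", oldByte, newByte)
    }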

name                    old mean              new mean              delta
SetTypePtr              14.0ns × (0.98,1.09)  14.0ns × (0.98,1.08)     ~    (p=0.966)
SetTypePtr8             16.5ns × (0.99,1.05)  15.3ns × (0.96,1.16)   -6.86% (p=0.012)
SetTypePtr16            21.3ns × (0.98,1.05)  18.8ns × (0.94,1.14)  -11.49% (p=0.000)
SetTypePtr32            34.6ns × (0.93,1.22)  27.7ns × (0.91,1.26)  -20.08% (p=0.001)
SetTypePtr64            55.7ns × (0.97,1.11)  41.6ns × (0.98,1.04)  -25.30% (p=0.000)
SetTypePtr126           98.0ns × (1.00,1.00)  67.7ns × (0.99,1.05)  -30.88% (p=0.000)
SetTypePtr128           98.6ns × (1.00,1.01)  68.6ns × (0.99,1.03)  -30.44% (p=0.000)
SetTypePtrSlice          781ns × (0.99,1.01)   571ns × (0.99,1.04)  -26.93% (p=0.000)
SetTypeNode1            13.1ns × (0.99,1.01)  12.1ns × (0.99,1.01)   -7.45% (p=0.000)
SetTypeNode1Slice        113ns × (0.99,1.01)    94ns × (1.00,1.00)  -16.35% (p=0.000)
SetTypeNode8            32.7ns × (1.00,1.00)  29.8ns × (0.99,1.01)   -8.97% (p=0.000)
SetTypeNode8Slice        266ns × (1.00,1.00)   204ns × (1.00,1.00)  -23.40% (p=0.000)
SetTypeNode64           58.0ns × (0.98,1.08)  42.8ns × (1.00,1.01)  -26.24% (p=0.000)
SetTypeNode64Slice      1.55µs × (0.99,1.02)  0.96µs × (1.00,1.00)  -37.84% (p=0.000)
SetTypeNode64Dead       13.1ns × (0.99,1.01)  12.1ns × (1.00,1.00)   -7.33% (p=0.000)
SetTypeNode64DeadSlice  1.52µs × (1.00,1.01)  1.08µs × (1.00,1.01)  -28.95% (p=0.000)
SetTypeNode124          97.9ns × (1.00,1.00)  67.1ns × (1.00,1.01)  -31.49% (p=0.000)
SetTypeNode124Slice     2.87µs × (0.99,1.02)  1.75µs × (1.00,1.01)  -39.15% (p=0.000)
SetTypeNode126          98.4ns × (1.00,1.01)  68.1ns × (1.00,1.01)  -30.79% (p=0.000)
SetTypeNode126Slice     2.91µs × (0.99,1.01)  1.77µs × (0.99,1.01)  -39.09% (p=0.000)
SetTypeNode1024          732ns × (1.00,1.00)   511ns × (0.87,1.42)  -30.14% (p=0.000)
SetTypeNode1024Slice    23.1µs × (1.00,1.00)  13.9µs × (0.99,1.02)  -39.83% (p=0.000)

Change-Id: I12e3b850a4e6fa6c8146b8635ff728f3ef658819
Reviewed-on: https://go-review.googlesource.com/9828
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 16:37:36 +00:00
Russ Cox
363fd1dd1b runtime: move a few atomic fields up
Moving them up makes them properly aligned on 32-bit systems.
There are some odd fields above them right now
(like fixalloc and mutex maybe).
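
The constraint at work, illustrated (this mirrors the alignment rule
documented for sync/atomic; the struct is invented):

    package main

    import "sync/atomic"

    // On 32-bit systems only the first word of an allocated struct is
    // guaranteed to be 8-byte aligned, so 64-bit atomically-accessed
    // fields go first and smaller fields follow.
    type stats struct {
        bytesAllocated uint64 // accessed atomically; keep at offset 0
        spans          uint32
        flags          uint32
    }

    func main() {
        s := new(stats)
        atomic.AddUint64(&s.bytesAllocated, 1) // would fault on 386/arm if misaligned
    }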

Change-Id: I57851a5bbb2e7cc339712f004f99bb6c0cce0ca5
Reviewed-on: https://go-review.googlesource.com/9889
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 16:08:57 +00:00
Russ Cox
8f037fa1ab runtime: fix TestLFStack on 386
The new(uint64) was being moved to the stack, which may not be 8-byte aligned.

Change-Id: Iad070964202001b52029494d43e299fed980f939
Reviewed-on: https://go-review.googlesource.com/9787
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-05-11 15:21:54 +00:00
Russ Cox
1635ab7dfe runtime: remove wbshadow mode
The write barrier shadow heap was very useful for
developing the write barriers initially, but it's no longer used;
it's clunky and drags the rest of the implementation down.

The gccheckmark mode will find bugs due to missed barriers
when they result in missed marks; wbshadow mode found the
missed barriers more aggressively, but it required an entire
separate copy of the heap. The gccheckmark mode requires
no extra memory, making it more useful in practice.

Compared to previous CL:
name                   old mean              new mean              delta
BinaryTree17            5.91s × (0.96,1.06)   5.72s × (0.97,1.03)  -3.12% (p=0.000)
Fannkuch11              4.32s × (1.00,1.00)   4.36s × (1.00,1.00)  +0.91% (p=0.000)
FmtFprintfEmpty        89.0ns × (0.93,1.10)  86.6ns × (0.96,1.11)    ~    (p=0.077)
FmtFprintfString        298ns × (0.98,1.06)   283ns × (0.99,1.04)  -4.90% (p=0.000)
FmtFprintfInt           286ns × (0.98,1.03)   283ns × (0.98,1.04)  -1.09% (p=0.032)
FmtFprintfIntInt        498ns × (0.97,1.06)   480ns × (0.99,1.02)  -3.65% (p=0.000)
FmtFprintfPrefixedInt   408ns × (0.98,1.02)   396ns × (0.99,1.01)  -3.00% (p=0.000)
FmtFprintfFloat         587ns × (0.98,1.01)   562ns × (0.99,1.01)  -4.34% (p=0.000)
FmtManyArgs            1.94µs × (0.99,1.02)  1.89µs × (0.99,1.01)  -2.85% (p=0.000)
GobDecode              15.8ms × (0.98,1.03)  15.7ms × (0.99,1.02)    ~    (p=0.251)
GobEncode              12.0ms × (0.96,1.09)  11.8ms × (0.98,1.03)  -1.87% (p=0.024)
Gzip                    648ms × (0.99,1.01)   647ms × (0.99,1.01)    ~    (p=0.688)
Gunzip                  143ms × (1.00,1.01)   143ms × (1.00,1.01)    ~    (p=0.203)
HTTPClientServer       90.3µs × (0.98,1.01)  89.1µs × (0.99,1.02)  -1.30% (p=0.000)
JSONEncode             31.6ms × (0.99,1.01)  31.7ms × (0.98,1.02)    ~    (p=0.219)
JSONDecode              107ms × (1.00,1.01)   111ms × (0.99,1.01)  +3.58% (p=0.000)
Mandelbrot200          6.03ms × (1.00,1.01)  6.01ms × (1.00,1.00)    ~    (p=0.077)
GoParse                6.53ms × (0.99,1.03)  6.54ms × (0.99,1.02)    ~    (p=0.585)
RegexpMatchEasy0_32     161ns × (1.00,1.01)   161ns × (0.98,1.05)    ~    (p=0.948)
RegexpMatchEasy0_1K     541ns × (0.99,1.01)   559ns × (0.98,1.01)  +3.32% (p=0.000)
RegexpMatchEasy1_32     138ns × (1.00,1.00)   137ns × (0.99,1.01)  -0.55% (p=0.001)
RegexpMatchEasy1_1K     887ns × (0.99,1.01)   878ns × (0.99,1.01)  -0.98% (p=0.000)
RegexpMatchMedium_32    253ns × (0.99,1.01)   252ns × (0.99,1.01)  -0.39% (p=0.001)
RegexpMatchMedium_1K   72.8µs × (1.00,1.00)  72.7µs × (1.00,1.00)    ~    (p=0.485)
RegexpMatchHard_32     3.85µs × (1.00,1.01)  3.85µs × (1.00,1.01)    ~    (p=0.283)
RegexpMatchHard_1K      117µs × (1.00,1.01)   117µs × (1.00,1.00)    ~    (p=0.175)
Revcomp                 922ms × (0.97,1.08)   903ms × (0.98,1.05)  -2.15% (p=0.021)
Template                126ms × (0.99,1.01)   126ms × (0.99,1.01)    ~    (p=0.943)
TimeParse               628ns × (0.99,1.01)   634ns × (0.99,1.01)  +0.92% (p=0.000)
TimeFormat              668ns × (0.99,1.01)   698ns × (0.98,1.03)  +4.53% (p=0.000)

It's nice that the microbenchmarks are the ones helped the most,
because those were the ones hurt the most by the conversion from
4-bit to 2-bit heap bitmaps. This CL brings the overall effect of that
process to (compared to CL 9706 patch set 1):

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.94,1.09)   5.72s × (0.97,1.03)  -2.57% (p=0.011)
Fannkuch11              4.32s × (1.00,1.00)   4.36s × (1.00,1.00)  +0.87% (p=0.000)
FmtFprintfEmpty        89.1ns × (0.95,1.16)  86.6ns × (0.96,1.11)    ~    (p=0.090)
FmtFprintfString        283ns × (0.98,1.02)   283ns × (0.99,1.04)    ~    (p=0.681)
FmtFprintfInt           284ns × (0.98,1.04)   283ns × (0.98,1.04)    ~    (p=0.620)
FmtFprintfIntInt        486ns × (0.98,1.03)   480ns × (0.99,1.02)  -1.27% (p=0.002)
FmtFprintfPrefixedInt   400ns × (0.99,1.02)   396ns × (0.99,1.01)  -0.84% (p=0.001)
FmtFprintfFloat         566ns × (0.99,1.01)   562ns × (0.99,1.01)  -0.80% (p=0.000)
FmtManyArgs            1.91µs × (0.99,1.02)  1.89µs × (0.99,1.01)  -1.10% (p=0.000)
GobDecode              15.5ms × (0.98,1.05)  15.7ms × (0.99,1.02)  +1.55% (p=0.005)
GobEncode              11.9ms × (0.97,1.03)  11.8ms × (0.98,1.03)  -0.97% (p=0.048)
Gzip                    648ms × (0.99,1.01)   647ms × (0.99,1.01)    ~    (p=0.627)
Gunzip                  143ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.482)
HTTPClientServer       89.2µs × (0.99,1.02)  89.1µs × (0.99,1.02)    ~    (p=0.740)
JSONEncode             32.3ms × (0.97,1.06)  31.7ms × (0.98,1.02)  -1.95% (p=0.002)
JSONDecode              106ms × (0.99,1.01)   111ms × (0.99,1.01)  +4.22% (p=0.000)
Mandelbrot200          6.02ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.417)
GoParse                6.57ms × (0.97,1.06)  6.54ms × (0.99,1.02)    ~    (p=0.404)
RegexpMatchEasy0_32     162ns × (1.00,1.00)   161ns × (0.98,1.05)    ~    (p=0.088)
RegexpMatchEasy0_1K     561ns × (0.99,1.02)   559ns × (0.98,1.01)  -0.47% (p=0.034)
RegexpMatchEasy1_32     145ns × (0.95,1.04)   137ns × (0.99,1.01)  -5.56% (p=0.000)
RegexpMatchEasy1_1K     864ns × (0.99,1.04)   878ns × (0.99,1.01)  +1.57% (p=0.000)
RegexpMatchMedium_32    255ns × (0.99,1.04)   252ns × (0.99,1.01)  -1.43% (p=0.001)
RegexpMatchMedium_1K   73.9µs × (0.98,1.04)  72.7µs × (1.00,1.00)  -1.55% (p=0.004)
RegexpMatchHard_32     3.92µs × (0.98,1.04)  3.85µs × (1.00,1.01)  -1.80% (p=0.003)
RegexpMatchHard_1K      120µs × (0.98,1.04)   117µs × (1.00,1.00)  -2.13% (p=0.001)
Revcomp                 936ms × (0.95,1.08)   903ms × (0.98,1.05)  -3.58% (p=0.002)
Template                130ms × (0.98,1.04)   126ms × (0.99,1.01)  -2.98% (p=0.000)
TimeParse               638ns × (0.98,1.05)   634ns × (0.99,1.01)    ~    (p=0.198)
TimeFormat              674ns × (0.99,1.01)   698ns × (0.98,1.03)  +3.69% (p=0.000)

Change-Id: Ia0e9b50b1d75a3c0c7556184cd966305574fe07c
Reviewed-on: https://go-review.googlesource.com/9706
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-11 14:55:11 +00:00
Russ Cox
54af9a3ba5 runtime: reintroduce ``dead'' space during GC scan
Reintroduce an optimization discarded during the initial conversion
from 4-bit heap bitmaps to 2-bit heap bitmaps: when we reach the
place in the bitmap where there are no more pointers, mark that position
for the GC so that it can avoid scanning past that place.
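
In effect the scan loop gains an early exit; a sketch with assumed bit
encodings:

    package main

    const (
        ptrSize    = 8
        bitsDead   = 0 // assumed encoding for the ``dead'' marker
        bitPointer = 1 // assumed encoding for a pointer word
        bitScalar  = 2 // assumed encoding for a non-pointer word
    )

    // scanObject visits each word's bits and stops at the dead marker.
    func scanObject(bits []byte, scanPtr func(off int)) {
        for i, b := range bits {
            if b == bitsDead {
                break // no pointers past this point: skip the rest
            }
            if b&bitPointer != 0 {
                scanPtr(i * ptrSize)
            }
        }
    }

    func main() {
        // pointer, scalar, pointer, dead, (never scanned)
        scanObject([]byte{bitPointer, bitScalar, bitPointer, bitsDead, bitPointer},
            func(off int) { println("pointer at offset", off) })
    }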

During heapBitsSetType we can also avoid initializing heap bitmap
beyond that location, which gives a bit of a win compared to Go 1.4.
This particular optimization (not initializing the heap bitmap) may not last:
we might change typedmemmove to use the heap bitmap, in which
case it would all need to be initialized. The early stop in the GC scan
will stay no matter what.

Compared to Go 1.4 (github.com/rsc/go, branch go14bench):
name                    old mean              new mean              delta
SetTypeNode64           80.7ns × (1.00,1.01)  57.4ns × (1.00,1.01)  -28.83% (p=0.000)
SetTypeNode64Dead       80.5ns × (1.00,1.01)  13.1ns × (0.99,1.02)  -83.77% (p=0.000)
SetTypeNode64Slice      2.16µs × (1.00,1.01)  1.54µs × (1.00,1.01)  -28.75% (p=0.000)
SetTypeNode64DeadSlice  2.16µs × (1.00,1.01)  1.52µs × (1.00,1.00)  -29.74% (p=0.000)

Compared to previous CL:
name                    old mean              new mean              delta
SetTypeNode64           56.7ns × (1.00,1.00)  57.4ns × (1.00,1.01)   +1.19% (p=0.000)
SetTypeNode64Dead       57.2ns × (1.00,1.00)  13.1ns × (0.99,1.02)  -77.15% (p=0.000)
SetTypeNode64Slice      1.56µs × (1.00,1.01)  1.54µs × (1.00,1.01)   -0.89% (p=0.000)
SetTypeNode64DeadSlice  1.55µs × (1.00,1.01)  1.52µs × (1.00,1.00)   -2.23% (p=0.000)

This is the last CL in the sequence converting from the 4-bit heap
to the 2-bit heap, with all the same optimizations reenabled.
Compared to before that process began (compared to CL 9701 patch set 1):

name                    old mean              new mean              delta
BinaryTree17             5.87s × (0.94,1.09)   5.91s × (0.96,1.06)    ~    (p=0.578)
Fannkuch11               4.32s × (1.00,1.00)   4.32s × (1.00,1.00)    ~    (p=0.474)
FmtFprintfEmpty         89.1ns × (0.95,1.16)  89.0ns × (0.93,1.10)    ~    (p=0.942)
FmtFprintfString         283ns × (0.98,1.02)   298ns × (0.98,1.06)  +5.33% (p=0.000)
FmtFprintfInt            284ns × (0.98,1.04)   286ns × (0.98,1.03)    ~    (p=0.208)
FmtFprintfIntInt         486ns × (0.98,1.03)   498ns × (0.97,1.06)  +2.48% (p=0.000)
FmtFprintfPrefixedInt    400ns × (0.99,1.02)   408ns × (0.98,1.02)  +2.23% (p=0.000)
FmtFprintfFloat          566ns × (0.99,1.01)   587ns × (0.98,1.01)  +3.69% (p=0.000)
FmtManyArgs             1.91µs × (0.99,1.02)  1.94µs × (0.99,1.02)  +1.81% (p=0.000)
GobDecode               15.5ms × (0.98,1.05)  15.8ms × (0.98,1.03)  +1.94% (p=0.002)
GobEncode               11.9ms × (0.97,1.03)  12.0ms × (0.96,1.09)    ~    (p=0.263)
Gzip                     648ms × (0.99,1.01)   648ms × (0.99,1.01)    ~    (p=0.992)
Gunzip                   143ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.585)
HTTPClientServer        89.2µs × (0.99,1.02)  90.3µs × (0.98,1.01)  +1.24% (p=0.000)
JSONEncode              32.3ms × (0.97,1.06)  31.6ms × (0.99,1.01)  -2.29% (p=0.000)
JSONDecode               106ms × (0.99,1.01)   107ms × (1.00,1.01)  +0.62% (p=0.000)
Mandelbrot200           6.02ms × (1.00,1.00)  6.03ms × (1.00,1.01)    ~    (p=0.250)
GoParse                 6.57ms × (0.97,1.06)  6.53ms × (0.99,1.03)    ~    (p=0.243)
RegexpMatchEasy0_32      162ns × (1.00,1.00)   161ns × (1.00,1.01)  -0.80% (p=0.000)
RegexpMatchEasy0_1K      561ns × (0.99,1.02)   541ns × (0.99,1.01)  -3.67% (p=0.000)
RegexpMatchEasy1_32      145ns × (0.95,1.04)   138ns × (1.00,1.00)  -5.04% (p=0.000)
RegexpMatchEasy1_1K      864ns × (0.99,1.04)   887ns × (0.99,1.01)  +2.57% (p=0.000)
RegexpMatchMedium_32     255ns × (0.99,1.04)   253ns × (0.99,1.01)  -1.05% (p=0.012)
RegexpMatchMedium_1K    73.9µs × (0.98,1.04)  72.8µs × (1.00,1.00)  -1.51% (p=0.005)
RegexpMatchHard_32      3.92µs × (0.98,1.04)  3.85µs × (1.00,1.01)  -1.88% (p=0.002)
RegexpMatchHard_1K       120µs × (0.98,1.04)   117µs × (1.00,1.01)  -2.02% (p=0.001)
Revcomp                  936ms × (0.95,1.08)   922ms × (0.97,1.08)    ~    (p=0.234)
Template                 130ms × (0.98,1.04)   126ms × (0.99,1.01)  -2.99% (p=0.000)
TimeParse                638ns × (0.98,1.05)   628ns × (0.99,1.01)  -1.54% (p=0.004)
TimeFormat               674ns × (0.99,1.01)   668ns × (0.99,1.01)  -0.80% (p=0.001)

The slowdown of the first few benchmarks seems to be due to the new
atomic operations for certain small size allocations. But the larger
benchmarks mostly improve, probably due to the decreased memory
pressure from having half as much heap bitmap.

CL 9706, which removes the (never used anymore) wbshadow mode,
gets back what is lost in the early microbenchmarks.

Change-Id: I37423a209e8ec2a2e92538b45cac5422a6acd32d
Reviewed-on: https://go-review.googlesource.com/9705
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-11 14:51:40 +00:00
Russ Cox
feb8a3b616 runtime: optimize heapBitsSetType
For the conversion of the heap bitmap from 4-bit to 2-bit fields,
I replaced heapBitsSetType with the dumbest thing that could possibly work:
two atomic operations (atomicand8+atomicor8) per 2-bit field.

This CL replaces that code with a proper implementation that
avoids the atomics whenever possible. Benchmarks vs base CL
(before the conversion to 2-bit heap bitmap) and vs Go 1.4 below.

Compared to Go 1.4, SetTypePtr (a 1-pointer allocation)
is 10ns slower because a race against the concurrent GC requires the
use of an atomicor8 that used to be an ordinary write. This slowdown
was present even in the base CL.

Compared to both Go 1.4 and base, SetTypeNode8 (a 10-word allocation)
is 10ns slower because it too needs a new atomic, because with the
denser representation, the byte on the end of the allocation is now shared
with the object next to it; this was not true with the 4-bit representation.

Excluding these two (fundamental) slowdowns due to the use of atomics,
the new code is noticeably faster than both Go 1.4 and the base CL.

The next CL will reintroduce the ``typeDead'' optimization.

Stats are from 5 runs on a MacBookPro10,2 (late 2012 Core i5).

Compared to base CL (** = new atomic)
name                  old mean              new mean              delta
SetTypePtr            14.1ns × (0.99,1.02)  14.7ns × (0.93,1.10)     ~    (p=0.175)
SetTypePtr8           18.4ns × (1.00,1.01)  18.6ns × (0.81,1.21)     ~    (p=0.866)
SetTypePtr16          28.7ns × (1.00,1.00)  22.4ns × (0.90,1.27)  -21.88% (p=0.015)
SetTypePtr32          52.3ns × (1.00,1.00)  33.8ns × (0.93,1.24)  -35.37% (p=0.001)
SetTypePtr64          79.2ns × (1.00,1.00)  55.1ns × (1.00,1.01)  -30.43% (p=0.000)
SetTypePtr126          118ns × (1.00,1.00)   100ns × (1.00,1.00)  -15.97% (p=0.000)
SetTypePtr128          130ns × (0.92,1.19)    98ns × (1.00,1.00)  -24.36% (p=0.008)
SetTypePtrSlice        726ns × (0.96,1.08)   760ns × (1.00,1.00)     ~    (p=0.152)
SetTypeNode1          14.1ns × (0.94,1.15)  12.0ns × (1.00,1.01)  -14.60% (p=0.020)
SetTypeNode1Slice      135ns × (0.96,1.07)    88ns × (1.00,1.00)  -34.53% (p=0.000)
SetTypeNode8          20.9ns × (1.00,1.01)  32.6ns × (1.00,1.00)  +55.37% (p=0.000) **
SetTypeNode8Slice      414ns × (0.99,1.02)   244ns × (1.00,1.00)  -41.09% (p=0.000)
SetTypeNode64         80.0ns × (1.00,1.00)  57.4ns × (1.00,1.00)  -28.23% (p=0.000)
SetTypeNode64Slice    2.15µs × (1.00,1.01)  1.56µs × (1.00,1.00)  -27.43% (p=0.000)
SetTypeNode124         119ns × (0.99,1.00)   100ns × (1.00,1.00)  -16.11% (p=0.000)
SetTypeNode124Slice   3.40µs × (1.00,1.00)  2.93µs × (1.00,1.00)  -13.80% (p=0.000)
SetTypeNode126         120ns × (1.00,1.01)    98ns × (1.00,1.00)  -18.19% (p=0.000)
SetTypeNode126Slice   3.53µs × (0.98,1.08)  3.02µs × (1.00,1.00)  -14.49% (p=0.002)
SetTypeNode1024        726ns × (0.97,1.09)   740ns × (1.00,1.00)     ~    (p=0.451)
SetTypeNode1024Slice  24.9µs × (0.89,1.37)  23.1µs × (1.00,1.00)     ~    (p=0.476)

Compared to Go 1.4 (** = new atomic)
name                  old mean               new mean              delta
SetTypePtr            5.71ns × (0.89,1.19)  14.68ns × (0.93,1.10)  +157.24% (p=0.000) **
SetTypePtr8           19.3ns × (0.96,1.10)   18.6ns × (0.81,1.21)      ~    (p=0.638)
SetTypePtr16          30.7ns × (0.99,1.03)   22.4ns × (0.90,1.27)   -26.88% (p=0.005)
SetTypePtr32          51.5ns × (1.00,1.00)   33.8ns × (0.93,1.24)   -34.40% (p=0.001)
SetTypePtr64          83.6ns × (0.94,1.12)   55.1ns × (1.00,1.01)   -34.12% (p=0.001)
SetTypePtr126          137ns × (0.87,1.26)    100ns × (1.00,1.00)   -27.10% (p=0.028)
SetTypePtrSlice        865ns × (0.80,1.23)    760ns × (1.00,1.00)      ~    (p=0.243)
SetTypeNode1          15.2ns × (0.88,1.12)   12.0ns × (1.00,1.01)   -20.89% (p=0.014)
SetTypeNode1Slice      156ns × (0.93,1.16)     88ns × (1.00,1.00)   -43.57% (p=0.001)
SetTypeNode8          23.8ns × (0.90,1.18)   32.6ns × (1.00,1.00)   +36.76% (p=0.003) **
SetTypeNode8Slice      502ns × (0.92,1.10)    244ns × (1.00,1.00)   -51.46% (p=0.000)
SetTypeNode64         85.6ns × (0.94,1.11)   57.4ns × (1.00,1.00)   -32.89% (p=0.001)
SetTypeNode64Slice    2.36µs × (0.91,1.14)   1.56µs × (1.00,1.00)   -33.96% (p=0.002)
SetTypeNode124         130ns × (0.91,1.12)    100ns × (1.00,1.00)   -23.49% (p=0.004)
SetTypeNode124Slice   3.81µs × (0.90,1.22)   2.93µs × (1.00,1.00)   -23.09% (p=0.025)

There are fewer benchmarks vs Go 1.4 because unrolling directly
into the heap bitmap is not yet implemented, so those would not
be meaningful comparisons.

These benchmarks were not present in Go 1.4 as distributed.
The backport to Go 1.4 is in github.com/rsc/go's go14bench branch,
commit 71d5ee5.

Change-Id: I95ed05a22bf484b0fc9efad549279e766c98d2b6
Reviewed-on: https://go-review.googlesource.com/9704
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-11 14:51:20 +00:00
Russ Cox
0234dfd493 runtime: use 2-bit heap bitmap (in place of 4-bit)
Previous CLs changed the representation of the non-heap type bitmaps
to be 1-bit bitmaps (pointer or not). Before this CL, the heap bitmap
stored a 2-bit type for each word and a mark bit and checkmark bit
for the first word of the object. (There used to be additional per-word bits.)

Reduce heap bitmap to 2-bit, with 1 dedicated to pointer or not,
and the other used for mark, checkmark, and "keep scanning forward
to find pointers in this object." See comments for details.

This CL replaces heapBitsSetType with very slow but obviously correct code.
A followup CL will optimize it. (Spoiler: the new code is faster than Go 1.4 was.)
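
As a rough sketch of the new layout (hypothetical names; the real
encoding packs the bits for several words into each bitmap byte):

package main

// Two bits describe each heap word: the low bit records pointer vs
// scalar; the high bit is the mark (and checkmark) bit on an
// object's first word and means "keep scanning forward, more
// pointers ahead" on subsequent words.
const (
	bitPointer uint8 = 1 << 0
	bitMarked  uint8 = 1 << 1
)

func main() {
	w := bitPointer | bitMarked
	println(w&bitPointer != 0, w&bitMarked != 0) // true true
}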

Change-Id: I999577a133f3cfecacebdec9cdc3573c235c7fb9
Reviewed-on: https://go-review.googlesource.com/9703
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 14:43:45 +00:00
Russ Cox
6d8a147bef runtime: use 1-bit pointer bitmaps in type representation
The type information in reflect.Type and the GC programs is now
1 bit per word, down from 2 bits.

The in-memory unrolled type bitmap representation is now
1 bit per word, down from 4 bits.

The conversion from the unrolled (now 1-bit) bitmap to the
heap bitmap (still 4-bit) is not optimized. A followup CL will
work on that, after the heap bitmap has been converted to 2-bit.

The typeDead optimization, in which a special value denotes
that there are no more pointers anywhere in the object, is lost
in this CL. A followup CL will bring it back in the final form of
heapBitsSetType.
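
For illustration (not the CL's code), reading a 1-bit-per-word
pointer bitmap is a single bit test per word:

package main

import "fmt"

// pointerWords reports which of the first n words hold pointers
// according to a 1-bit-per-word bitmap, least significant bit first.
func pointerWords(bitmap []byte, n int) []int {
	var idx []int
	for i := 0; i < n; i++ {
		if bitmap[i/8]>>(uint(i)%8)&1 != 0 {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	fmt.Println(pointerWords([]byte{0x05}, 8)) // [0 2]
}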

Change-Id: If61e67950c16a293b0b516a6fd9a1c755b6d5549
Reviewed-on: https://go-review.googlesource.com/9702
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 14:43:33 +00:00
Russ Cox
7d9e16abc6 runtime: add benchmark of heapBitsSetType
There was an old benchmark that measured this indirectly
via allocation, but I don't understand how to factor out the
allocation cost when interpreting the numbers.

Replace with a benchmark that only calls heapBitsSetType,
that does not allocate. This was not possible when the
benchmark was first written, because heapBitsSetType had
not been factored out of mallocgc.

Change-Id: I30f0f02362efab3465a50769398be859832e6640
Reviewed-on: https://go-review.googlesource.com/9701
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 14:40:27 +00:00
Daniel Morsing
db6f88a84b runtime: enable profiling on g0
Since we now have stack information for code running on the
systemstack, we can traceback over it. To make cpu profiles useful,
add a case in gentraceback to jump over systemstack switches.

Fixes #10609.

Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7
Reviewed-on: https://go-review.googlesource.com/9506
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-05-11 08:44:30 +00:00
Shenghou Ma
fd392ee52b cmd/internal/ld: generate correct .debug_frames on RISC architectures
With this patch, gdb seems to be able to correctly backtrace Go
processes on at least linux/{arm,arm64,ppc64}.

Change-Id: Ic40a2a70e71a19c4a92e4655710f38a807b67e9a
Reviewed-on: https://go-review.googlesource.com/9822
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-08 00:34:27 +00:00
Russ Cox
0211d7d7b0 runtime: turn off checkmark by default
Change-Id: Ic8cb8b1ed8715d6d5a53ec3cac385c0e93883514
Reviewed-on: https://go-review.googlesource.com/9825
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-07 21:08:42 +00:00
Russ Cox
9626561030 runtime: fix gccheckmark mode and enable by default
It was testing the mark bits on what roots pointed at,
but not the remainder of the live heap, because in
CL 2991 I accidentally inverted this check during
refactoring.

The next CL will turn it back off by default again,
but I want one run on the builders with the full
checkmark checks.

Change-Id: Ic166458cea25c0a56e5387fc527cb166ff2e5ada
Reviewed-on: https://go-review.googlesource.com/9824
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-07 21:08:29 +00:00
Rick Hudson
b6e178ed7e runtime: set heap minimum default based on GOGC
Currently the heap minimum is set to 4MB, which prevents us from
collecting at every allocation by setting GOGC=0. This adjusts the
heap minimum to 4MB*GOGC/100, thus re-enabling collection at every
allocation.
Fixes #10681
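
A minimal sketch of the new default (hypothetical names):

package main

import "fmt"

// heapMinimum scales the 4MB base trigger by GOGC, so GOGC=0 yields
// a zero minimum and a collection at every allocation.
func heapMinimum(gogc int) uint64 {
	const base = 4 << 20 // 4MB
	return uint64(base) * uint64(gogc) / 100
}

func main() {
	fmt.Println(heapMinimum(100)) // 4194304: the old fixed default
	fmt.Println(heapMinimum(0))   // 0: collect at every allocation
}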

Change-Id: I912d027dac4b14ae535597e8beefa9ac3fb8ad94
Reviewed-on: https://go-review.googlesource.com/9814
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-07 21:05:58 +00:00
Michael Hudson-Doyle
fa896733b5 runtime: check consistency of all module data objects
The current code only checks the consistency (that the functab is
correctly sorted by PC, etc.) of the moduledata object that the
runtime belongs to. Change it to check all of them.
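
The shape of the change, on a toy module list (illustration only;
the runtime's list is firstmoduledata and its next links):

package main

type module struct {
	name string
	next *module
}

func verify1(m *module) { println("verify", m.name) }

// verifyAll walks every module in the list, not just the first.
func verifyAll(first *module) {
	for m := first; m != nil; m = m.next {
		verify1(m)
	}
}

func main() {
	lib := &module{name: "shared-lib"}
	verifyAll(&module{name: "main", next: lib})
}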

Change-Id: I544a44c5de7445fff87d3cdb4840ff04c5e2bf75
Reviewed-on: https://go-review.googlesource.com/9773
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-05-07 15:06:08 +00:00
Alex Brainman
a52dc9fcbd runtime: fix comments that mention g status values
Makes searching in source code easier.

Change-Id: Ie2e85934d23920ac0bc01d28168bcfbbdc465580
Reviewed-on: https://go-review.googlesource.com/9774
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-07 00:00:38 +00:00
Austin Clements
17db6e0420 runtime: use heap scan size as estimate of GC scan work
Currently, the GC uses a moving average of recent scan work ratios to
estimate the total scan work required by this cycle. This is in turn
used to compute how much scan work should be done by mutators when
they allocate in order to perform all expected scan work by the time
the allocated heap reaches the heap goal.

However, our current scan work estimate can be arbitrarily wrong if
the heap topography changes significantly from one cycle to the
next. For example, in the go1 benchmarks, at the beginning of each
benchmark, the heap is dominated by a 256MB no-scan object, so the GC
learns that the scan density of the heap is very low. In benchmarks
that then rapidly allocate pointer-dense objects, by the time of the
next GC cycle, our estimate of the scan work can be too low by a large
factor. This in turn lets the mutator allocate faster than the GC can
collect, allowing it to get arbitrarily far ahead of the scan work
estimate, which leads to very long GC cycles with very little mutator
assist that can overshoot the heap goal by large margins. This is
particularly easy to demonstrate with BinaryTree17:

$ GODEBUG=gctrace=1 ./go1.test -test.bench BinaryTree17
gc #1 @0.017s 2%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 4->262->262 MB, 4 MB goal, 1 P
gc #2 @0.026s 3%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 262->262->262 MB, 524 MB goal, 1 P
testing: warning: no tests to run
PASS
BenchmarkBinaryTree17	gc #3 @1.906s 0%: 0+0+0+0+7 ms clock, 0+0+0+0/0/0+7 ms cpu, 325->325->287 MB, 325 MB goal, 1 P (forced)
gc #4 @12.203s 20%: 0+0+0+10067+10 ms clock, 0+0+0+0/2523/852+10 ms cpu, 430->2092->1950 MB, 574 MB goal, 1 P
       1       9150447353 ns/op

Change this estimate to instead use the *current* scannable heap
size. This has the advantage of being based solely on the current
state of the heap, not on past densities or reachable heap sizes, so
it isn't susceptible to falling behind during these sorts of phase
changes. This is strictly an over-estimate, but it's better to
over-estimate and get more assist than necessary than it is to
under-estimate and potentially spiral out of control. Experiments with
scaling this estimate back showed no obvious benefit for mutator
utilization, heap size, or assist time.
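
In sketch form (hypothetical names), the pacer's prediction changes
from a density-based extrapolation to the current scannable heap:

package main

import "fmt"

// Old (sketch): heap size times a moving average of past scan
// densities. A 256MB no-scan object teaches it a tiny density, so it
// badly under-predicts once pointer-dense data arrives.
func expectedScanWorkOld(heapSize uint64, avgDensity float64) int64 {
	return int64(float64(heapSize) * avgDensity)
}

// New (sketch): the current count of scannable bytes. A strict
// over-estimate, but it depends only on the heap's present state.
func expectedScanWorkNew(heapScan uint64) int64 {
	return int64(heapScan)
}

func main() {
	fmt.Println(expectedScanWorkOld(300<<20, 0.01)) // ~3MB: far too low
	fmt.Println(expectedScanWorkNew(200 << 20))     // 200MB to scan
}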

This new estimate has little effect for most benchmarks, including
most go1 benchmarks, x/benchmarks, and the 6g benchmark. It has a huge
effect for benchmarks that triggered the bad pacer behavior:

name                   old mean              new mean              delta
BinaryTree17            10.0s × (1.00,1.00)    3.5s × (0.98,1.01)  -64.93% (p=0.000)
Fannkuch11              2.74s × (1.00,1.01)   2.65s × (1.00,1.00)   -3.52% (p=0.000)
FmtFprintfEmpty        56.4ns × (0.99,1.00)  57.8ns × (1.00,1.01)   +2.43% (p=0.000)
FmtFprintfString        187ns × (0.99,1.00)   185ns × (0.99,1.01)   -1.19% (p=0.010)
FmtFprintfInt           184ns × (1.00,1.00)   183ns × (1.00,1.00)  (no variance)
FmtFprintfIntInt        321ns × (1.00,1.00)   315ns × (1.00,1.00)   -1.80% (p=0.000)
FmtFprintfPrefixedInt   266ns × (1.00,1.00)   263ns × (1.00,1.00)   -1.22% (p=0.000)
FmtFprintfFloat         353ns × (1.00,1.00)   353ns × (1.00,1.00)   -0.13% (p=0.035)
FmtManyArgs            1.21µs × (1.00,1.00)  1.19µs × (1.00,1.00)   -1.33% (p=0.000)
GobDecode              9.69ms × (1.00,1.00)  9.59ms × (1.00,1.00)   -1.07% (p=0.000)
GobEncode              7.89ms × (0.99,1.01)  7.74ms × (1.00,1.00)   -1.92% (p=0.000)
Gzip                    391ms × (1.00,1.00)   392ms × (1.00,1.00)     ~    (p=0.522)
Gunzip                 97.1ms × (1.00,1.00)  97.0ms × (1.00,1.00)   -0.10% (p=0.000)
HTTPClientServer       55.7µs × (0.99,1.01)  56.7µs × (0.99,1.01)   +1.81% (p=0.001)
JSONEncode             19.1ms × (1.00,1.00)  19.0ms × (1.00,1.00)   -0.85% (p=0.000)
JSONDecode             66.8ms × (1.00,1.00)  66.9ms × (1.00,1.00)     ~    (p=0.288)
Mandelbrot200          4.13ms × (1.00,1.00)  4.12ms × (1.00,1.00)   -0.08% (p=0.000)
GoParse                3.97ms × (1.00,1.01)  4.01ms × (1.00,1.00)   +0.99% (p=0.000)
RegexpMatchEasy0_32     114ns × (1.00,1.00)   115ns × (0.99,1.00)     ~    (p=0.070)
RegexpMatchEasy0_1K     376ns × (1.00,1.00)   376ns × (1.00,1.00)     ~    (p=0.900)
RegexpMatchEasy1_32    94.9ns × (1.00,1.00)  96.3ns × (1.00,1.01)   +1.53% (p=0.001)
RegexpMatchEasy1_1K     568ns × (1.00,1.00)   567ns × (1.00,1.00)   -0.22% (p=0.001)
RegexpMatchMedium_32    159ns × (1.00,1.00)   159ns × (1.00,1.00)     ~    (p=0.178)
RegexpMatchMedium_1K   46.4µs × (1.00,1.00)  46.6µs × (1.00,1.00)   +0.29% (p=0.000)
RegexpMatchHard_32     2.37µs × (1.00,1.00)  2.37µs × (1.00,1.00)     ~    (p=0.722)
RegexpMatchHard_1K     71.1µs × (1.00,1.00)  71.2µs × (1.00,1.00)     ~    (p=0.229)
Revcomp                 565ms × (1.00,1.00)   562ms × (1.00,1.00)   -0.52% (p=0.000)
Template               81.0ms × (1.00,1.00)  80.2ms × (1.00,1.00)   -0.97% (p=0.000)
TimeParse               380ns × (1.00,1.00)   380ns × (1.00,1.00)     ~    (p=0.148)
TimeFormat              405ns × (0.99,1.00)   385ns × (0.99,1.00)   -5.00% (p=0.000)

Change-Id: I11274158bf3affaf62662e02de7af12d5fb789e4
Reviewed-on: https://go-review.googlesource.com/9696
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-06 19:40:38 +00:00
Austin Clements
3be3cbd548 runtime: track "scannable" bytes of heap
This tracks the number of scannable bytes in the allocated heap. That
is, bytes that the garbage collector must scan before reaching the
last pointer field in each object.

This will be used to compute a more robust estimate of the GC scan
work.
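
Conceptually (hypothetical names), allocation bumps a global count by
the pointer-bearing prefix of each new object:

package main

type typeInfo struct {
	size    uintptr
	ptrdata uintptr // bytes up to and including the last pointer field
}

var heapScan uintptr // scannable bytes in the allocated heap (sketch)

// noteAlloc accounts the bytes the GC will have to scan; pointer-free
// objects contribute nothing, since the GC never scans them.
func noteAlloc(t *typeInfo) {
	heapScan += t.ptrdata
}

func main() {
	noteAlloc(&typeInfo{size: 32, ptrdata: 16})
	noteAlloc(&typeInfo{size: 1 << 20}) // big no-scan buffer
	println(heapScan)                   // 16
}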

Change-Id: I1eecd45ef9cdd65b69d2afb5db5da885c80086bb
Reviewed-on: https://go-review.googlesource.com/9695
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:33 +00:00
Austin Clements
53c53984e7 runtime: include scalar slots in GC scan work metric
The garbage collector predicts how much "scan work" must be done in a
cycle to determine how much work should be done by mutators when they
allocate. Most code doesn't care what units the scan work is in: it
simply knows that a certain amount of scan work has to be done in the
cycle. Currently, the GC uses the number of pointer slots scanned as
the scan work on the theory that this is the bulk of the time spent in
the garbage collector and hence reflects real CPU resource usage.
However, this metric is difficult to estimate at the beginning of a
cycle.

Switch to counting the total number of bytes scanned, including both
pointer and scalar slots. This is still less than the total marked
heap since it omits no-scan objects and no-scan tails of objects. This
metric may not reflect absolute performance as well as the count of
scanned pointer slots (though it still takes time to scan scalar
fields), but it will be much easier to estimate robustly, which is
more important.

Change-Id: Ie3a5eeeb0384a1ca566f61b2f11e9ff3a75ca121
Reviewed-on: https://go-review.googlesource.com/9694
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:27 +00:00
Austin Clements
c4931a8433 runtime: dispose gcWork caches before updating controller state
Currently, we only flush the per-P gcWork caches in gcMark, at the
beginning of mark termination. This is necessary to ensure that no
work is held up in these caches.

However, this flush happens after we update the GC controller state,
which depends on statistics about marked heap size and scan work that
are only updated by this flush. Hence, the controller is missing the
bulk of heap marking and scan work. This bug was introduced in commit
1b4025f, which introduced the per-P gcWork caches.

Fix this by flushing these caches before we update the GC controller
state. We continue to flush them at the beginning of mark termination
as well to be robust in case any write barriers happened between the
previous flush and entering mark termination, but this should be a
no-op.
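
The fix is purely an ordering constraint, modeled here on toy
statistics (hypothetical names):

package main

import "fmt"

type stats struct{ marked, scanWork int64 }

var global stats
var perP = []stats{{10, 7}, {4, 3}} // work still cached on each P

// flush drains the per-P gcWork caches into the global totals.
func flush() {
	for i := range perP {
		global.marked += perP[i].marked
		global.scanWork += perP[i].scanWork
		perP[i] = stats{}
	}
}

// endCycle retunes the pacer from the totals; calling it before
// flush, as the old code effectively did, misses the cached work.
func endCycle() { fmt.Println("pacing on", global.marked, global.scanWork) }

func main() {
	flush()    // new order: flush first...
	endCycle() // ...then update controller state
}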

Change-Id: I8f0f91024df967ebf0c616d1c4f0c339c304ebaa
Reviewed-on: https://go-review.googlesource.com/9646
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:22 +00:00
Rick Hudson
1845314560 runtime: remove unused GC timers
During development some tracing routines were added that are not
needed in the release. These included GCstarttimes, GCendtimes, and
GCprinttimes.
Fixes #10462

Change-Id: I0788e6409d61038571a5ae0cbbab793102df0a65
Reviewed-on: https://go-review.googlesource.com/9689
Reviewed-by: Austin Clements <austin@google.com>
2015-05-06 12:53:08 +00:00
Aram Hăvărneanu
fe5ef5c9d7 runtime, syscall: link Solaris binaries directly instead of using dlopen/dlsym
Before CL 8214 (use .plt instead of .got on Solaris) Solaris used a
dynamic linking scheme that didn't permit lazy binding. To speed program
startup, Go binaries only used it for a small number of symbols required
by the runtime. Other symbols were resolved on demand on first use, and
were cached for subsequent use. This required some moderately complex
code in the syscall package.

CL 8214 changed the way dynamic linking is implemented, and lazy
binding is now supported. Since all symbols are now resolved lazily
by the dynamic loader, there is no need for the complex code in the
syscall package that did the same. This CL makes Go programs link directly
with the necessary shared libraries and deletes the lazy-loading code
implemented in Go.

Change-Id: Ifd7275db72de61b70647242e7056dd303b1aee9e
Reviewed-on: https://go-review.googlesource.com/9184
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-06 11:38:50 +00:00
Aram Hăvărneanu
121489cbfd runtime/cgo: add cgo support for solaris/amd64
Change-Id: Ic9744c7716cdd53f27c6e5874230963e5fff0333
Reviewed-on: https://go-review.googlesource.com/8260
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-06 11:37:28 +00:00
Aram Hăvărneanu
c94f1f791b runtime: always load address of libcFunc on Solaris
The linker always uses .plt for externals, so libcFunc is now an actual
external symbol instead of a pointer to one.

Fixes most of the breakage introduced in the previous CL.

Change-Id: I64b8c96f93127f2d13b5289b024677fd3ea7dbea
Reviewed-on: https://go-review.googlesource.com/8215
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-06 11:36:57 +00:00
Russ Cox
ceefebd795 runtime: rename ptrsize to ptrdata
I forgot there is already a ptrSize constant.
Rename field to avoid some confusion.

Change-Id: I098fdcc8afc947d6c02c41c6e6de24624cc1c8ff
Reviewed-on: https://go-review.googlesource.com/9700
Reviewed-by: Austin Clements <austin@google.com>
2015-05-05 19:27:47 +00:00
Keith Randall
5a828cfcde runtime: let freezetheworld work even when gomaxprocs=1
Freezetheworld still has stuff to do when gomaxprocs=1.
In particular, signals can come in on other Ms (like the GC M, say)
and the single user M is still running.

Fixes #10546

Change-Id: I2f07f17d1c81e93cf905df2cb087112d436ca7e7
Reviewed-on: https://go-review.googlesource.com/9551
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-05-05 15:11:10 +00:00
Shenghou Ma
102436e800 runtime: fix software FP regs corruption when emulating SQRT on ARM
When emulating the ARM FSQRT instruction, the sqrt function itself
must not use any floating point arithmetic; otherwise it will
clobber the user's software FP registers.

Fortunately, the sqrt function only uses floating point instructions
to test for corner cases, so it's easy to make that function do all
its work using pure integer arithmetic. I've verified that after
this change, runtime.stepflt and runtime.sqrt don't contain any call
to _sfloat. (Perhaps we should add //go:nosfloat to make the compiler
enforce this?)

Fixes #10641.
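
For illustration, the corner-case tests can be done on the raw bit
pattern with no floating point instructions at all (a sketch; the
real runtime.sqrt does the full computation in integer arithmetic
too):

package main

import (
	"fmt"
	"math"
)

// isSpecial reports whether sqrt(x) is a corner case, using only
// integer operations on the IEEE 754 bit pattern.
func isSpecial(x float64) bool {
	b := math.Float64bits(x)
	exp := int(b>>52) & 0x7FF
	frac := b & (1<<52 - 1)
	return exp == 0x7FF || // NaN or ±Inf
		(exp == 0 && frac == 0) || // ±0
		b>>63 == 1 // negative: sqrt is NaN
}

func main() {
	fmt.Println(isSpecial(math.Inf(1)), isSpecial(0), isSpecial(-4), isSpecial(2))
}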

Change-Id: Ida4742c49000fae4fea4649f28afde630ce4c576
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9570
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Keith Randall <khr@golang.org>
2015-05-05 07:32:58 +00:00
Austin Clements
98a9d36837 runtime: add pointer size to type structure
This adds a field to the runtime type structure that records the size
of the prefix of objects of that type containing pointers. Any data
after this offset is scalar data.

This is necessary for shrinking the type bitmaps to 1 bit and will
help the garbage collector efficiently estimate the amount of heap
that needs to be scanned.
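
For a concrete type, the new field records the pointer-bearing
prefix (illustration, 64-bit layout):

package main

type T struct {
	a *int     // offset 0: pointer
	b int64    // offset 8
	c *int     // offset 16: pointer
	d [4]int64 // offsets 24-55: no pointers
}

// T's size is 56 bytes, but its pointer-bearing prefix ends at byte
// 24 (just past c), so the new field would record 24; the GC need
// never look at the trailing 32 bytes.

func main() {}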

Change-Id: I1318d79e6360dca0ac980245016c562e61f52ff5
Reviewed-on: https://go-review.googlesource.com/9691
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-05-04 20:17:48 +00:00
Rick Hudson
b86e71f5aa runtime: Reduce calls to shouldtriggergc
shouldtriggergc is slightly expensive due to the call overhead
and the use of an atomic. This CL reduces the number of times
one checks whether a GC should be done from once at each allocation
to once when a span is allocated. Since shouldtriggergc is an
important abstraction, simply hand-inlining it (along with its
atomic instruction) would lose the abstraction.

Change-Id: Ia3210655b4b3d433f77064a21ecb54e4d9d435f7
Reviewed-on: https://go-review.googlesource.com/9403
Reviewed-by: Austin Clements <austin@google.com>
2015-05-04 17:38:58 +00:00
Alex Brainman
031c3bc9ae runtime: fix stackDebug comment
Change-Id: Ia9191bd7ecdf7bd5ee7d69ae23aa71760f379aa8
Reviewed-on: https://go-review.googlesource.com/9590
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-02 02:39:50 +00:00
Austin Clements
dc870d5f4b runtime: detailed debug output of controller state
This adds a detailed debug dump of the state of the GC controller and
a GODEBUG flag to enable it.

Change-Id: I562fed7981691a84ddf0f9e6fcd9f089f497ac13
Reviewed-on: https://go-review.googlesource.com/9640
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 19:39:43 +00:00
Russ Cox
4fffc50c26 runtime: correct accounting of scan work and bytes marked
(1) Count pointer-free objects found during scanning roots
as marked bytes, by not zeroing the mark total after scanning roots.

(2) Don't count the bytes for the roots themselves, by not adding
them to the mark total in scanblock (the zeroing removed in (1)
was aimed at that add but hit more).

Combined, (1) and (2) fix the calculation of the marked heap size.
This makes the GC trigger much less often in the Go 1 benchmarks,
which have a global []byte pointing at 256 MB of data.
That 256 MB allocation was not being included in the heap size
in the current code, but was included in Go 1.4.
This is the source of much of the relative slowdown in that directory.

(3) Count the bytes for the roots as scanned work, by not zeroing
the scan total after scanning roots. There is no strict justification
for this, and it probably doesn't matter much either way,
but it was always combined with another buggy zeroing
(removed in (1)), so guilty by association.

Austin noticed this.

name                                    old mean                new mean        delta
BenchmarkBinaryTree17              13.1s × (0.97,1.03)      5.9s × (0.97,1.05)  -55.19% (p=0.000)
BenchmarkFannkuch11                4.35s × (0.99,1.01)     4.37s × (1.00,1.01)  +0.47% (p=0.032)
BenchmarkFmtFprintfEmpty          84.6ns × (0.95,1.14)    85.7ns × (0.94,1.05)  ~ (p=0.521)
BenchmarkFmtFprintfString          320ns × (0.95,1.06)     283ns × (0.99,1.02)  -11.48% (p=0.000)
BenchmarkFmtFprintfInt             311ns × (0.98,1.03)     288ns × (0.99,1.02)  -7.26% (p=0.000)
BenchmarkFmtFprintfIntInt          554ns × (0.96,1.05)     478ns × (0.99,1.02)  -13.70% (p=0.000)
BenchmarkFmtFprintfPrefixedInt     434ns × (0.96,1.06)     393ns × (0.98,1.04)  -9.60% (p=0.000)
BenchmarkFmtFprintfFloat           620ns × (0.99,1.03)     584ns × (0.99,1.01)  -5.73% (p=0.000)
BenchmarkFmtManyArgs              2.19µs × (0.98,1.03)    1.94µs × (0.99,1.01)  -11.62% (p=0.000)
BenchmarkGobDecode                21.2ms × (0.97,1.06)    15.2ms × (0.99,1.01)  -28.17% (p=0.000)
BenchmarkGobEncode                18.1ms × (0.94,1.06)    11.8ms × (0.99,1.01)  -35.00% (p=0.000)
BenchmarkGzip                      650ms × (0.98,1.01)     649ms × (0.99,1.02)  ~ (p=0.802)
BenchmarkGunzip                    143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.438)
BenchmarkHTTPClientServer          110µs × (0.98,1.04)     101µs × (0.98,1.02)  -8.79% (p=0.000)
BenchmarkJSONEncode               40.3ms × (0.97,1.03)    31.8ms × (0.98,1.03)  -20.92% (p=0.000)
BenchmarkJSONDecode                119ms × (0.97,1.02)     108ms × (0.99,1.02)  -9.15% (p=0.000)
BenchmarkMandelbrot200            6.03ms × (1.00,1.01)    6.03ms × (0.99,1.01)  ~ (p=0.750)
BenchmarkGoParse                  8.58ms × (0.89,1.10)    6.80ms × (1.00,1.00)  -20.71% (p=0.000)
BenchmarkRegexpMatchEasy0_32       162ns × (1.00,1.01)     162ns × (0.99,1.02)  ~ (p=0.131)
BenchmarkRegexpMatchEasy0_1K       540ns × (0.99,1.02)     559ns × (0.99,1.02)  +3.58% (p=0.000)
BenchmarkRegexpMatchEasy1_32       139ns × (0.98,1.04)     139ns × (1.00,1.00)  ~ (p=0.466)
BenchmarkRegexpMatchEasy1_1K       889ns × (0.99,1.01)     885ns × (0.99,1.01)  -0.50% (p=0.022)
BenchmarkRegexpMatchMedium_32      252ns × (0.99,1.02)     252ns × (0.99,1.01)  ~ (p=0.469)
BenchmarkRegexpMatchMedium_1K     72.9µs × (0.99,1.01)    73.6µs × (0.99,1.03)  ~ (p=0.168)
BenchmarkRegexpMatchHard_32       3.87µs × (1.00,1.01)    3.86µs × (1.00,1.00)  ~ (p=0.055)
BenchmarkRegexpMatchHard_1K        118µs × (0.99,1.01)     117µs × (0.99,1.00)  ~ (p=0.133)
BenchmarkRevcomp                   995ms × (0.94,1.10)     949ms × (0.99,1.01)  -4.64% (p=0.000)
BenchmarkTemplate                  141ms × (0.97,1.02)     127ms × (0.99,1.01)  -10.00% (p=0.000)
BenchmarkTimeParse                 641ns × (0.99,1.01)     623ns × (0.99,1.01)  -2.79% (p=0.000)
BenchmarkTimeFormat                729ns × (0.98,1.03)     679ns × (0.99,1.00)  -6.93% (p=0.000)

Change-Id: I839bd7356630d18377989a0748763414e15ed057
Reviewed-on: https://go-review.googlesource.com/9602
Reviewed-by: Austin Clements <austin@google.com>
2015-05-01 19:31:00 +00:00
Russ Cox
4d0f3a1c95 cmd/internal/gc, runtime: use 1-bit bitmap for stack frames, data, bss
The bitmaps were 2 bits per pointer because we needed to distinguish
scalar, pointer, multiword, and we used the leftover value to distinguish
uninitialized from scalar, even though the garbage collector (GC) didn't care.

Now that there are no multiword structures from the GC's point of view,
cut the bitmaps down to 1 bit per pointer, recording just live pointer vs not.

The GC assumes the same layout for stack frames and for the maps
describing the global data and bss sections, so change them all in one CL.

The code still refers to 4-bit heap bitmaps and 2-bit "type bitmaps", since
the 2-bit representation lives (at least for now) in some of the reflect data.

Because these stack frame bitmaps are stored directly in the rodata in
the binary, this CL reduces the size of the 6g binary by about 1.1%.

Performance change is basically a wash, but using less memory,
and smaller binaries, and enables other bitmap reductions.

name                                      old mean                new mean        delta
BenchmarkBinaryTree17                13.2s × (0.97,1.03)     13.0s × (0.99,1.01)  -0.93% (p=0.005)
BenchmarkBinaryTree17-2              9.69s × (0.96,1.05)     9.51s × (0.96,1.03)  -1.86% (p=0.001)
BenchmarkBinaryTree17-4              10.1s × (0.97,1.05)     10.0s × (0.96,1.05)  ~ (p=0.141)
BenchmarkFannkuch11                  4.35s × (0.99,1.01)     4.43s × (0.98,1.04)  +1.75% (p=0.001)
BenchmarkFannkuch11-2                4.31s × (0.99,1.03)     4.32s × (1.00,1.00)  ~ (p=0.095)
BenchmarkFannkuch11-4                4.32s × (0.99,1.02)     4.38s × (0.98,1.04)  +1.38% (p=0.008)
BenchmarkFmtFprintfEmpty            83.5ns × (0.97,1.10)    87.3ns × (0.92,1.11)  +4.55% (p=0.014)
BenchmarkFmtFprintfEmpty-2          81.8ns × (0.98,1.04)    82.5ns × (0.97,1.08)  ~ (p=0.364)
BenchmarkFmtFprintfEmpty-4          80.9ns × (0.99,1.01)    82.6ns × (0.97,1.08)  +2.12% (p=0.010)
BenchmarkFmtFprintfString            320ns × (0.95,1.04)     322ns × (0.97,1.05)  ~ (p=0.368)
BenchmarkFmtFprintfString-2          303ns × (0.97,1.04)     304ns × (0.97,1.04)  ~ (p=0.484)
BenchmarkFmtFprintfString-4          305ns × (0.97,1.05)     306ns × (0.98,1.05)  ~ (p=0.543)
BenchmarkFmtFprintfInt               311ns × (0.98,1.03)     319ns × (0.97,1.03)  +2.63% (p=0.000)
BenchmarkFmtFprintfInt-2             297ns × (0.98,1.04)     301ns × (0.97,1.04)  +1.19% (p=0.023)
BenchmarkFmtFprintfInt-4             302ns × (0.98,1.02)     304ns × (0.97,1.03)  ~ (p=0.126)
BenchmarkFmtFprintfIntInt            554ns × (0.96,1.05)     554ns × (0.97,1.03)  ~ (p=0.975)
BenchmarkFmtFprintfIntInt-2          520ns × (0.98,1.03)     517ns × (0.98,1.02)  ~ (p=0.153)
BenchmarkFmtFprintfIntInt-4          524ns × (0.98,1.02)     525ns × (0.98,1.03)  ~ (p=0.597)
BenchmarkFmtFprintfPrefixedInt       433ns × (0.97,1.06)     434ns × (0.97,1.06)  ~ (p=0.804)
BenchmarkFmtFprintfPrefixedInt-2     413ns × (0.98,1.04)     413ns × (0.98,1.03)  ~ (p=0.881)
BenchmarkFmtFprintfPrefixedInt-4     420ns × (0.97,1.03)     421ns × (0.97,1.03)  ~ (p=0.561)
BenchmarkFmtFprintfFloat             620ns × (0.99,1.03)     636ns × (0.97,1.03)  +2.57% (p=0.000)
BenchmarkFmtFprintfFloat-2           601ns × (0.98,1.02)     617ns × (0.98,1.03)  +2.58% (p=0.000)
BenchmarkFmtFprintfFloat-4           613ns × (0.98,1.03)     626ns × (0.98,1.02)  +2.15% (p=0.000)
BenchmarkFmtManyArgs                2.19µs × (0.96,1.04)    2.23µs × (0.97,1.02)  +1.65% (p=0.000)
BenchmarkFmtManyArgs-2              2.08µs × (0.98,1.03)    2.10µs × (0.99,1.02)  +0.79% (p=0.019)
BenchmarkFmtManyArgs-4              2.10µs × (0.98,1.02)    2.13µs × (0.98,1.02)  +1.72% (p=0.000)
BenchmarkGobDecode                  21.3ms × (0.97,1.05)    21.1ms × (0.97,1.04)  -1.36% (p=0.025)
BenchmarkGobDecode-2                20.0ms × (0.97,1.03)    19.2ms × (0.97,1.03)  -4.00% (p=0.000)
BenchmarkGobDecode-4                19.5ms × (0.99,1.02)    19.0ms × (0.99,1.01)  -2.39% (p=0.000)
BenchmarkGobEncode                  18.3ms × (0.95,1.07)    18.1ms × (0.96,1.08)  ~ (p=0.305)
BenchmarkGobEncode-2                16.8ms × (0.97,1.02)    16.4ms × (0.98,1.02)  -2.79% (p=0.000)
BenchmarkGobEncode-4                15.4ms × (0.98,1.02)    15.4ms × (0.98,1.02)  ~ (p=0.465)
BenchmarkGzip                        650ms × (0.98,1.03)     655ms × (0.97,1.04)  ~ (p=0.075)
BenchmarkGzip-2                      652ms × (0.98,1.03)     655ms × (0.98,1.02)  ~ (p=0.337)
BenchmarkGzip-4                      656ms × (0.98,1.04)     653ms × (0.98,1.03)  ~ (p=0.291)
BenchmarkGunzip                      143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.507)
BenchmarkGunzip-2                    143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.313)
BenchmarkGunzip-4                    143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.312)
BenchmarkHTTPClientServer            110µs × (0.98,1.03)     109µs × (0.99,1.02)  -1.40% (p=0.000)
BenchmarkHTTPClientServer-2          154µs × (0.90,1.08)     149µs × (0.90,1.08)  -3.43% (p=0.007)
BenchmarkHTTPClientServer-4          138µs × (0.97,1.04)     138µs × (0.96,1.04)  ~ (p=0.670)
BenchmarkJSONEncode                 40.2ms × (0.98,1.02)    40.2ms × (0.98,1.05)  ~ (p=0.828)
BenchmarkJSONEncode-2               35.1ms × (0.99,1.02)    35.2ms × (0.98,1.03)  ~ (p=0.392)
BenchmarkJSONEncode-4               35.3ms × (0.98,1.03)    35.3ms × (0.98,1.02)  ~ (p=0.813)
BenchmarkJSONDecode                  119ms × (0.97,1.02)     117ms × (0.98,1.02)  -1.80% (p=0.000)
BenchmarkJSONDecode-2                115ms × (0.99,1.02)     114ms × (0.98,1.02)  -1.18% (p=0.000)
BenchmarkJSONDecode-4                116ms × (0.98,1.02)     114ms × (0.98,1.02)  -1.43% (p=0.000)
BenchmarkMandelbrot200              6.03ms × (1.00,1.01)    6.03ms × (1.00,1.01)  ~ (p=0.985)
BenchmarkMandelbrot200-2            6.03ms × (1.00,1.01)    6.02ms × (1.00,1.01)  ~ (p=0.320)
BenchmarkMandelbrot200-4            6.03ms × (1.00,1.01)    6.03ms × (1.00,1.01)  ~ (p=0.799)
BenchmarkGoParse                    8.63ms × (0.89,1.10)    8.58ms × (0.93,1.09)  ~ (p=0.667)
BenchmarkGoParse-2                  8.20ms × (0.97,1.04)    8.37ms × (0.97,1.04)  +1.96% (p=0.001)
BenchmarkGoParse-4                  8.00ms × (0.98,1.02)    8.14ms × (0.99,1.02)  +1.75% (p=0.000)
BenchmarkRegexpMatchEasy0_32         162ns × (1.00,1.01)     164ns × (0.98,1.04)  +1.35% (p=0.011)
BenchmarkRegexpMatchEasy0_32-2       161ns × (1.00,1.01)     161ns × (1.00,1.00)  ~ (p=0.185)
BenchmarkRegexpMatchEasy0_32-4       161ns × (1.00,1.00)     161ns × (1.00,1.00)  -0.19% (p=0.001)
BenchmarkRegexpMatchEasy0_1K         540ns × (0.99,1.02)     566ns × (0.98,1.04)  +4.98% (p=0.000)
BenchmarkRegexpMatchEasy0_1K-2       540ns × (0.99,1.01)     557ns × (0.99,1.01)  +3.21% (p=0.000)
BenchmarkRegexpMatchEasy0_1K-4       541ns × (0.99,1.01)     559ns × (0.99,1.01)  +3.26% (p=0.000)
BenchmarkRegexpMatchEasy1_32         139ns × (0.98,1.04)     139ns × (0.99,1.03)  ~ (p=0.979)
BenchmarkRegexpMatchEasy1_32-2       139ns × (0.99,1.04)     139ns × (0.99,1.02)  ~ (p=0.777)
BenchmarkRegexpMatchEasy1_32-4       139ns × (0.98,1.04)     139ns × (0.99,1.04)  ~ (p=0.771)
BenchmarkRegexpMatchEasy1_1K         890ns × (0.99,1.03)     885ns × (1.00,1.01)  -0.50% (p=0.004)
BenchmarkRegexpMatchEasy1_1K-2       888ns × (0.99,1.01)     885ns × (0.99,1.01)  -0.37% (p=0.004)
BenchmarkRegexpMatchEasy1_1K-4       890ns × (0.99,1.02)     884ns × (1.00,1.00)  -0.70% (p=0.000)
BenchmarkRegexpMatchMedium_32        252ns × (0.99,1.01)     251ns × (0.99,1.01)  ~ (p=0.081)
BenchmarkRegexpMatchMedium_32-2      254ns × (0.99,1.04)     252ns × (0.99,1.01)  -0.78% (p=0.027)
BenchmarkRegexpMatchMedium_32-4      253ns × (0.99,1.04)     252ns × (0.99,1.01)  -0.70% (p=0.022)
BenchmarkRegexpMatchMedium_1K       72.9µs × (0.99,1.01)    72.7µs × (1.00,1.00)  ~ (p=0.064)
BenchmarkRegexpMatchMedium_1K-2     74.1µs × (0.98,1.05)    72.9µs × (1.00,1.01)  -1.61% (p=0.001)
BenchmarkRegexpMatchMedium_1K-4     73.6µs × (0.99,1.05)    72.8µs × (1.00,1.00)  -1.13% (p=0.007)
BenchmarkRegexpMatchHard_32         3.88µs × (0.99,1.03)    3.92µs × (0.98,1.05)  ~ (p=0.143)
BenchmarkRegexpMatchHard_32-2       3.89µs × (0.99,1.03)    3.93µs × (0.98,1.09)  ~ (p=0.278)
BenchmarkRegexpMatchHard_32-4       3.90µs × (0.99,1.05)    3.93µs × (0.98,1.05)  ~ (p=0.252)
BenchmarkRegexpMatchHard_1K          118µs × (0.99,1.01)     117µs × (0.99,1.02)  -0.54% (p=0.003)
BenchmarkRegexpMatchHard_1K-2        118µs × (0.99,1.01)     118µs × (0.99,1.03)  ~ (p=0.581)
BenchmarkRegexpMatchHard_1K-4        118µs × (0.99,1.02)     117µs × (0.99,1.01)  -0.54% (p=0.002)
BenchmarkRevcomp                     991ms × (0.95,1.10)     989ms × (0.94,1.08)  ~ (p=0.879)
BenchmarkRevcomp-2                   978ms × (0.95,1.11)     962ms × (0.96,1.08)  ~ (p=0.257)
BenchmarkRevcomp-4                   979ms × (0.96,1.07)     974ms × (0.96,1.11)  ~ (p=0.678)
BenchmarkTemplate                    141ms × (0.99,1.02)     145ms × (0.99,1.02)  +2.75% (p=0.000)
BenchmarkTemplate-2                  135ms × (0.98,1.02)     138ms × (0.99,1.02)  +2.34% (p=0.000)
BenchmarkTemplate-4                  136ms × (0.98,1.02)     140ms × (0.99,1.02)  +2.71% (p=0.000)
BenchmarkTimeParse                   640ns × (0.99,1.01)     622ns × (0.99,1.01)  -2.88% (p=0.000)
BenchmarkTimeParse-2                 640ns × (0.99,1.01)     622ns × (1.00,1.00)  -2.81% (p=0.000)
BenchmarkTimeParse-4                 640ns × (1.00,1.01)     622ns × (0.99,1.01)  -2.82% (p=0.000)
BenchmarkTimeFormat                  730ns × (0.98,1.02)     731ns × (0.98,1.03)  ~ (p=0.767)
BenchmarkTimeFormat-2                709ns × (0.99,1.02)     707ns × (0.99,1.02)  ~ (p=0.347)
BenchmarkTimeFormat-4                717ns × (0.98,1.01)     718ns × (0.98,1.02)  ~ (p=0.793)

Change-Id: Ie779c47e912bf80eb918bafa13638bd8dfd6c2d9
Reviewed-on: https://go-review.googlesource.com/9406
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-01 18:44:36 +00:00
Josh Bleecher Snyder
7bebccb972 Revert "runtime/pprof: write heap statistics to heap profile always"
This reverts commit c26fc88d56.

This broke pprof. See the comments at 9491.

Change-Id: Ic99ce026e86040c050a9bf0ea3024a1a42274ad1
Reviewed-on: https://go-review.googlesource.com/9565
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2015-05-01 15:56:20 +00:00
Keith Randall
a55b131393 cmd/dist, runtime: Make stack guard larger for non-optimized builds
Kind of a hack, but makes the non-optimized builds pass.

Fixes #10079

Change-Id: I26f41c546867f8f3f16d953dc043e784768f2aff
Reviewed-on: https://go-review.googlesource.com/9552
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 15:41:55 +00:00
David Chase
7fbb1b36c3 cmd/internal/gc: improve flow of input params to output params
This includes the following information in the per-function summary:

outK = paramJ   encoded in outK bits for paramJ
outK = *paramJ  encoded in outK bits for paramJ
heap = paramJ   EscHeap
heap = *paramJ  EscContentEscapes

Note that (currently) if the address of a parameter is taken and
returned, necessarily a heap allocation occurred to contain that
reference, and the heap can never refer to stack, therefore the
parameter and everything downstream from it escapes to the heap.
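
For illustration, Go functions exhibiting each of the flows listed
above (building with -gcflags=-m prints the compiler's view of them;
the output format varies by version):

package main

var sink interface{}

func ret(p *int) *int     { return p }  // out0 = param0
func load(p **int) *int   { return *p } // out0 = *param0
func leak(p *int)         { sink = p }  // heap = param0
func leakContent(p **int) { sink = *p } // heap = *param0

func main() {}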

The per-function summary information now has a tunable number of bits
(2 is probably noticeably better than 1, and 3 is likely overkill,
but it is now easy to check, and the -m debugging output includes
information that allows you to figure out whether more would be
better).

A new test was added to check pointer flow through struct-typed and
*struct-typed parameters and returns; some of these are sensitive to
the number of summary bits, and ought to yield better results with a
more competent escape analysis algorithm.  Another new test checks
(some) correctness with array parameters, results, and operations.

The old analysis inferred that a piece of the plan9 runtime was
non-escaping by counteracting over-conservative analysis with buggy
analysis; with the bug fixed, the result was too conservative (and
it's not easy to fix in this framework), so the source code was
tweaked to get the desired result. A test was added against the
discovered bug.

The escape analysis was further improved splitting the "level" into
3 parts, one tracking the conventional "level" and the other two
computing the highest-level-suffix-from-copy, which is used to
generally model the cancelling effect of indirection applied to
address-of.

With the improved escape analysis enabled, it was necessary to
modify one of the runtime tests because it now attempts to allocate
too much on the (small, fixed-size) G0 (system) stack and this
failed the test.

Compiling src/std after touching src/runtime/*.go with -m logging
turned on shows 420 fewer heap allocation sites (10538 vs 10958).

Profiling allocations in src/html/template with
for i in {1..5} ;
  do go tool 6g -memprofile=mastx.${i}.prof  -memprofilerate=1 *.go;
  go tool pprof -alloc_objects -text  mastx.${i}.prof ;
done

showed a 15% reduction in allocations performed by the compiler.

Update #3753
Update #4720
Fixes #10466

Change-Id: I0fd97d5f5ac527b45f49e2218d158a6e89951432
Reviewed-on: https://go-review.googlesource.com/8202
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 13:47:20 +00:00
David Crawshaw
4044adedf7 runtime/cgo, cmd/dist: turn off exc_bad_access handler by default
App Store policy requires that programs not reference the exc_server
symbol. (Some public forum threads show that Unity ran into this
several years ago and it is a hard policy rule.) While some research
suggests that I could write my own version of exc_server, the
expedient course is to disable the exception handler by default.

Go programs only need it when running under lldb, which is primarily
used by tests. So enable the exception handler in cmd/dist when we
are running the tests.

Fixes #10646

Change-Id: I853905254894b5367edb8abd381d45585a78ee8b
Reviewed-on: https://go-review.googlesource.com/9549
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-01 13:19:39 +00:00
Shenghou Ma
5f69e739d3 runtime: adjust traceTickDiv for non-x86 architectures
Fixes #10554.
Fixes #10623.

Change-Id: I90fbaa34e3d55c8758178f8d2e7fa41ff1194a1b
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9247
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-05-01 07:25:49 +00:00
Russ Cox
79a990b845 runtime: schedule GC work more aggressively
Schedule the work as early as possible, while still respecting the
utilization percentage on average. The old code tried never to
go above the utilization percentage. The new code is willing
to go above the utilization percentage by one time slice
(but of course after doing that it must wait until the percentage
drops back down to the target before it gets another time slice).

The effect is that for concurrent GCs that can run in a small number
of time slices, the time during which write barriers are enabled is
reduced by one mutator + GC time slice round (possibly 30 ms per GC).

This only affects the fractional GC processor (the remainder of GOMAXPROCS/4),
so it matters most in GOMAXPROCS=1, a bit in GOMAXPROCS=2, and not at
all in GOMAXPROCS=4.

GOMAXPROCS=1
name                                      old mean                new mean        delta
BenchmarkBinaryTree17                12.4s × (0.98,1.03)     13.5s × (0.97,1.04)  +8.84% (p=0.000)
BenchmarkFannkuch11                  4.38s × (1.00,1.01)     4.38s × (1.00,1.01)  ~ (p=0.343)
BenchmarkFmtFprintfEmpty            88.9ns × (0.97,1.10)    90.1ns × (0.93,1.14)  ~ (p=0.224)
BenchmarkFmtFprintfString            356ns × (0.94,1.05)     321ns × (0.94,1.12)  -9.77% (p=0.000)
BenchmarkFmtFprintfInt               344ns × (0.98,1.03)     325ns × (0.96,1.03)  -5.46% (p=0.000)
BenchmarkFmtFprintfIntInt            622ns × (0.97,1.03)     571ns × (0.95,1.05)  -8.09% (p=0.000)
BenchmarkFmtFprintfPrefixedInt       462ns × (0.96,1.04)     431ns × (0.95,1.05)  -6.81% (p=0.000)
BenchmarkFmtFprintfFloat             653ns × (0.98,1.03)     621ns × (0.99,1.03)  -4.90% (p=0.000)
BenchmarkFmtManyArgs                2.32µs × (0.97,1.03)    2.19µs × (0.98,1.02)  -5.43% (p=0.000)
BenchmarkGobDecode                  27.0ms × (0.96,1.04)    20.0ms × (0.97,1.04)  -26.06% (p=0.000)
BenchmarkGobEncode                  26.6ms × (0.99,1.01)    17.8ms × (0.95,1.05)  -33.19% (p=0.000)
BenchmarkGzip                        659ms × (0.98,1.03)     650ms × (0.99,1.01)  -1.34% (p=0.000)
BenchmarkGunzip                      145ms × (0.98,1.04)     143ms × (1.00,1.01)  -1.47% (p=0.000)
BenchmarkHTTPClientServer            111µs × (0.97,1.04)     110µs × (0.96,1.03)  -1.30% (p=0.000)
BenchmarkJSONEncode                 52.0ms × (0.97,1.03)    40.8ms × (0.97,1.03)  -21.47% (p=0.000)
BenchmarkJSONDecode                  127ms × (0.98,1.04)     120ms × (0.98,1.02)  -5.55% (p=0.000)
BenchmarkMandelbrot200              6.04ms × (0.99,1.04)    6.02ms × (1.00,1.01)  ~ (p=0.176)
BenchmarkGoParse                    8.62ms × (0.96,1.08)    8.55ms × (0.93,1.09)  ~ (p=0.302)
BenchmarkRegexpMatchEasy0_32         164ns × (0.98,1.05)     165ns × (0.98,1.07)  ~ (p=0.293)
BenchmarkRegexpMatchEasy0_1K         546ns × (0.98,1.06)     547ns × (0.97,1.07)  ~ (p=0.741)
BenchmarkRegexpMatchEasy1_32         142ns × (0.97,1.09)     141ns × (0.97,1.05)  ~ (p=0.231)
BenchmarkRegexpMatchEasy1_1K         904ns × (0.97,1.07)     900ns × (0.98,1.04)  ~ (p=0.294)
BenchmarkRegexpMatchMedium_32        256ns × (0.98,1.06)     256ns × (0.97,1.04)  ~ (p=0.530)
BenchmarkRegexpMatchMedium_1K       74.2µs × (0.98,1.05)    73.8µs × (0.98,1.04)  ~ (p=0.334)
BenchmarkRegexpMatchHard_32         3.94µs × (0.98,1.07)    3.92µs × (0.98,1.05)  ~ (p=0.356)
BenchmarkRegexpMatchHard_1K          119µs × (0.98,1.07)     119µs × (0.98,1.06)  ~ (p=0.467)
BenchmarkRevcomp                     978ms × (0.96,1.09)     984ms × (0.95,1.07)  ~ (p=0.448)
BenchmarkTemplate                    151ms × (0.96,1.03)     142ms × (0.95,1.04)  -5.55% (p=0.000)
BenchmarkTimeParse                   628ns × (0.99,1.01)     628ns × (0.99,1.01)  ~ (p=0.855)
BenchmarkTimeFormat                  729ns × (0.98,1.06)     734ns × (0.97,1.05)  ~ (p=0.149)

GOMAXPROCS=2
name                                      old mean                new mean        delta
BenchmarkBinaryTree17-2              9.80s × (0.97,1.03)     9.85s × (0.99,1.02)  ~ (p=0.444)
BenchmarkFannkuch11-2                4.35s × (0.99,1.01)     4.40s × (0.98,1.05)  ~ (p=0.099)
BenchmarkFmtFprintfEmpty-2          86.7ns × (0.97,1.05)    85.9ns × (0.98,1.04)  ~ (p=0.409)
BenchmarkFmtFprintfString-2          297ns × (0.98,1.01)     297ns × (0.99,1.01)  ~ (p=0.743)
BenchmarkFmtFprintfInt-2             309ns × (0.98,1.02)     310ns × (0.99,1.01)  ~ (p=0.464)
BenchmarkFmtFprintfIntInt-2          525ns × (0.97,1.05)     518ns × (0.99,1.01)  ~ (p=0.151)
BenchmarkFmtFprintfPrefixedInt-2     408ns × (0.98,1.02)     408ns × (0.98,1.03)  ~ (p=0.797)
BenchmarkFmtFprintfFloat-2           603ns × (0.99,1.01)     604ns × (0.98,1.02)  ~ (p=0.588)
BenchmarkFmtManyArgs-2              2.07µs × (0.98,1.02)    2.05µs × (0.99,1.01)  ~ (p=0.091)
BenchmarkGobDecode-2                19.1ms × (0.97,1.01)    19.3ms × (0.97,1.04)  ~ (p=0.195)
BenchmarkGobEncode-2                16.2ms × (0.97,1.03)    16.4ms × (0.99,1.01)  ~ (p=0.069)
BenchmarkGzip-2                      652ms × (0.99,1.01)     651ms × (0.99,1.01)  ~ (p=0.705)
BenchmarkGunzip-2                    143ms × (1.00,1.01)     143ms × (1.00,1.00)  ~ (p=0.665)
BenchmarkHTTPClientServer-2          149µs × (0.92,1.11)     149µs × (0.91,1.08)  ~ (p=0.862)
BenchmarkJSONEncode-2               34.6ms × (0.98,1.02)    37.2ms × (0.99,1.01)  +7.56% (p=0.000)
BenchmarkJSONDecode-2                117ms × (0.99,1.01)     117ms × (0.99,1.01)  ~ (p=0.858)
BenchmarkMandelbrot200-2            6.10ms × (0.99,1.03)    6.03ms × (1.00,1.00)  ~ (p=0.083)
BenchmarkGoParse-2                  8.25ms × (0.98,1.01)    8.21ms × (0.99,1.02)  ~ (p=0.307)
BenchmarkRegexpMatchEasy0_32-2       162ns × (0.99,1.02)     162ns × (0.99,1.01)  ~ (p=0.857)
BenchmarkRegexpMatchEasy0_1K-2       541ns × (0.99,1.01)     540ns × (1.00,1.00)  ~ (p=0.530)
BenchmarkRegexpMatchEasy1_32-2       138ns × (1.00,1.00)     141ns × (0.98,1.04)  +1.88% (p=0.038)
BenchmarkRegexpMatchEasy1_1K-2       887ns × (0.99,1.01)     894ns × (0.99,1.01)  ~ (p=0.087)
BenchmarkRegexpMatchMedium_32-2      252ns × (0.99,1.01)     252ns × (0.99,1.01)  ~ (p=0.954)
BenchmarkRegexpMatchMedium_1K-2     73.4µs × (0.99,1.02)    72.8µs × (1.00,1.01)  -0.87% (p=0.029)
BenchmarkRegexpMatchHard_32-2       3.95µs × (0.97,1.05)    3.87µs × (1.00,1.01)  -2.11% (p=0.035)
BenchmarkRegexpMatchHard_1K-2        117µs × (0.99,1.01)     117µs × (0.99,1.01)  ~ (p=0.669)
BenchmarkRevcomp-2                   980ms × (0.95,1.03)     993ms × (0.94,1.09)  ~ (p=0.527)
BenchmarkTemplate-2                  136ms × (0.98,1.01)     135ms × (0.99,1.01)  ~ (p=0.200)
BenchmarkTimeParse-2                 630ns × (1.00,1.01)     630ns × (1.00,1.00)  ~ (p=0.634)
BenchmarkTimeFormat-2                705ns × (0.99,1.01)     710ns × (0.98,1.02)  ~ (p=0.174)

GOMAXPROCS=4
BenchmarkBinaryTree17-4              9.87s × (0.96,1.04)     9.75s × (0.96,1.03)  ~ (p=0.178)
BenchmarkFannkuch11-4                4.35s × (1.00,1.01)     4.40s × (0.99,1.04)  ~ (p=0.071)
BenchmarkFmtFprintfEmpty-4          85.8ns × (0.98,1.06)    85.6ns × (0.98,1.04)  ~ (p=0.858)
BenchmarkFmtFprintfString-4          306ns × (0.99,1.03)     304ns × (0.97,1.02)  ~ (p=0.470)
BenchmarkFmtFprintfInt-4             317ns × (0.98,1.01)     315ns × (0.98,1.02)  -0.92% (p=0.044)
BenchmarkFmtFprintfIntInt-4          527ns × (0.99,1.01)     525ns × (0.98,1.01)  ~ (p=0.164)
BenchmarkFmtFprintfPrefixedInt-4     421ns × (0.98,1.03)     417ns × (0.99,1.02)  ~ (p=0.092)
BenchmarkFmtFprintfFloat-4           623ns × (0.98,1.02)     618ns × (0.98,1.03)  ~ (p=0.172)
BenchmarkFmtManyArgs-4              2.09µs × (0.98,1.02)    2.09µs × (0.98,1.02)  ~ (p=0.679)
BenchmarkGobDecode-4                18.6ms × (0.99,1.01)    18.6ms × (0.98,1.03)  ~ (p=0.595)
BenchmarkGobEncode-4                15.0ms × (0.98,1.02)    15.1ms × (0.99,1.01)  ~ (p=0.301)
BenchmarkGzip-4                      659ms × (0.98,1.04)     660ms × (0.97,1.02)  ~ (p=0.724)
BenchmarkGunzip-4                    145ms × (0.98,1.04)     144ms × (0.99,1.04)  ~ (p=0.671)
BenchmarkHTTPClientServer-4          139µs × (0.97,1.02)     138µs × (0.99,1.02)  ~ (p=0.392)
BenchmarkJSONEncode-4               35.0ms × (0.99,1.02)    35.1ms × (0.98,1.02)  ~ (p=0.777)
BenchmarkJSONDecode-4                119ms × (0.98,1.01)     118ms × (0.98,1.02)  ~ (p=0.710)
BenchmarkMandelbrot200-4            6.02ms × (1.00,1.00)    6.02ms × (1.00,1.00)  ~ (p=0.289)
BenchmarkGoParse-4                  7.96ms × (0.99,1.01)    7.96ms × (0.99,1.01)  ~ (p=0.884)
BenchmarkRegexpMatchEasy0_32-4       164ns × (0.98,1.04)     166ns × (0.97,1.04)  ~ (p=0.221)
BenchmarkRegexpMatchEasy0_1K-4       540ns × (0.99,1.01)     552ns × (0.97,1.04)  +2.10% (p=0.018)
BenchmarkRegexpMatchEasy1_32-4       140ns × (0.99,1.04)     142ns × (0.97,1.04)  ~ (p=0.226)
BenchmarkRegexpMatchEasy1_1K-4       896ns × (0.99,1.03)     907ns × (0.97,1.04)  ~ (p=0.155)
BenchmarkRegexpMatchMedium_32-4      255ns × (0.99,1.04)     255ns × (0.98,1.04)  ~ (p=0.904)
BenchmarkRegexpMatchMedium_1K-4     73.4µs × (0.99,1.04)    73.8µs × (0.98,1.04)  ~ (p=0.560)
BenchmarkRegexpMatchHard_32-4       3.93µs × (0.98,1.04)    3.95µs × (0.98,1.04)  ~ (p=0.571)
BenchmarkRegexpMatchHard_1K-4        117µs × (1.00,1.01)     119µs × (0.98,1.04)  +1.48% (p=0.048)
BenchmarkRevcomp-4                   990ms × (0.94,1.08)     989ms × (0.94,1.10)  ~ (p=0.957)
BenchmarkTemplate-4                  137ms × (0.98,1.02)     137ms × (0.99,1.01)  ~ (p=0.996)
BenchmarkTimeParse-4                 629ns × (1.00,1.00)     629ns × (0.99,1.01)  ~ (p=0.924)
BenchmarkTimeFormat-4                710ns × (0.99,1.01)     716ns × (0.98,1.02)  +0.84% (p=0.033)

Change-Id: I43a04e0f6ad5e3ba9847dddf12e13222561f9cf4
Reviewed-on: https://go-review.googlesource.com/9543
Reviewed-by: Austin Clements <austin@google.com>
2015-04-30 15:50:12 +00:00
Austin Clements
3ca20218c1 runtime: fix gcDumpObject on non-heap pointers
gcDumpObject is used to print the source and destination objects when
checkmark finds a missing mark. However, gcDumpObject currently assumes
the given pointer will point to a heap object. This is not true of the
source object during root marking and may not even be true of the
destination object in the limited situations where the heap points
back in to the stack.

If the pointer isn't a heap object, gcDumpObject will attempt an
out-of-bounds access to h_spans. This will cause a panicslice, which
will attempt to construct a useful panic message. This will cause a
string allocation, which will lead mallocgc to panic because the GC is
in mark termination (checkmark only happens during mark termination).

Fix this by checking that the pointer points into the heap arena
before attempting to use it as an arena pointer.
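
The guard, modeled on toy arena bounds (illustration; the real check
uses the mheap arena fields):

package main

var arenaStart, arenaUsed uintptr = 0x1000, 0x9000 // hypothetical bounds

// dumpObject refuses to index the span table with a pointer that is
// outside the heap arena; the old code fell through and indexed out
// of bounds.
func dumpObject(obj uintptr) {
	if obj < arenaStart || obj >= arenaUsed {
		println("not a heap object:", obj)
		return
	}
	println("dumping object at", obj)
}

func main() {
	dumpObject(0x2000) // in the arena
	dumpObject(0x0500) // e.g. a stack address seen during root marking
}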

Change-Id: I09da600c380d4773f1f8f38e45b82cb229ea6382
Reviewed-on: https://go-review.googlesource.com/9498
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-30 14:53:51 +00:00
Keith Randall
4b78c9575d runtime: print stack of G during a signal
Sequence of operations:
- Go code does a systemstack call
- during the systemstack call, receive a signal
- signal requests a traceback of all goroutines

The original G is still marked as _Grunning, so the traceback code
refuses to print its stack.

Fix by allowing traceback of a G whose caller is on the same M as
the G itself; the G can't be modifying its stack in that case.

Fixes #10546

Change-Id: I2bcea48c0197fbf78ab6fa080027cd80181083ad
Reviewed-on: https://go-review.googlesource.com/9435
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-29 19:25:10 +00:00
Shenghou Ma
4d1ab2d8d1 runtime: re-enable TestNewProc0 on android/arm and fix heap corruption
The problem is not actually specific to android/arm. Linux/ARM's
runtime.clone sets the stack pointer to child_stk-4 before calling
the fn. Then, when fn returns, it tries to write to 4(R13) to provide
the argument for runtime.exit, which is just beyond the allocated
child stack, so it will randomly corrupt the heap or trigger a
segfault if that memory happens to be unmapped.

While we're here, shorten the test polling interval to 0.1s to
speed up the test (it was only checking at a 1s interval, which
means the test took at least 1s).

Fixes #10548.

Change-Id: I57cd63232022b113b6cd61e987b0684ebcce930a
Reviewed-on: https://go-review.googlesource.com/9457
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-29 19:18:07 +00:00
Russ Cox
c26fc88d56 runtime/pprof: write heap statistics to heap profile always
The heap statistics were only written if asked for a profile with debug > 0,
but that also prints a stack trace for each profile line, which is comparatively
much noisier. The statistics are short enough and separate enough
(they only appear at the end) and useful enough that we can print them
always.

This means that people using -test.memprofile in tests will get a memory
profile with statistics included now. Pprof won't care, but if people care to
look, the numbers will be there.

This avoids the need for hacks like using -memprofilerate=1 to find
the number of allocations.

Change-Id: I10a4f593403d0315aad11b37c6e554b734caa73f
Reviewed-on: https://go-review.googlesource.com/9491
Reviewed-by: David Chase <drchase@google.com>
2015-04-29 18:07:43 +00:00
Keith Randall
c526f3ac10 runtime: tail call into memeq/cmp body implementations
There's no need to call/ret to the body implementation.
It can write the result to the right place.  Just jump to
it and have it return to our caller.

Old:
  call body implementation
  compute result
  put result in a register
  return
  write register to result location
  return

New:
  load address of result location into a register
  jump to body implementation
  compute result
  write result to passed-in address
  return

It's a bit tricky on 386 because there is no free register
with which to pass the result location.  Free up a register
by keeping around blen-alen instead of both alen and blen.

Change-Id: If2cf0682a5bf1cc592bdda7c126ed4eee8944fba
Reviewed-on: https://go-review.googlesource.com/9202
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-04-29 04:46:25 +00:00
Shenghou Ma
7e49c8193c runtime: skip gdb goroutine backtrace test on non-x86
Gdb is not able to backtrace our non-standard stack frames on RISC
architectures without a frame pointer.

Change-Id: Id62a566ce2d743602ded2da22ff77b9ae34bc5ae
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9456
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-29 04:44:38 +00:00
Shenghou Ma
da11a9dda3 cmd/internal/ld, runtime: unify stack reservation in PE header and runtime
With a 128KB stack reservation, the maximum number of threads on
32-bit Windows is ~9000.

The original 65535-byte stack commit causes problems on Windows XP,
where it makes the stack reservation 1MB despite the fact that the
runtime specified 128KB.

While we're here, also fix the extra spacing in the "unable to
create more OS thread" error message: println inserts a space
between each argument.

See #9457 for more information.

Change-Id: I3a82f7d9717d3d55211b6eb1c34b00b0eaad83ed
Reviewed-on: https://go-review.googlesource.com/2237
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Minux Ma <minux@golang.org>
2015-04-29 03:27:10 +00:00
Shenghou Ma
e7dd28891e cmd/internal/gc, cmd/[56789]g: rename stackcopy to blockcopy
To avoid confusion with the runtime concept of copying stack.

Change-Id: I33442377b71012c2482c2d0ddd561492c71e70d0
Reviewed-on: https://go-review.googlesource.com/8639
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-29 00:28:01 +00:00
Ian Lance Taylor
0c62c93a09 runtime/cgo: use PTHREAD_{MUTEX,COND}_INITIALIZER
Technically you must initialize static pthread_mutex_t and
pthread_cond_t variables with the appropriate INITIALIZER macro.  In
practice the default initializers are zero anyhow, but it's still good
code hygiene.

Change-Id: I517304b16c2c7943b3880855c1b47a9a506b4bdf
Reviewed-on: https://go-review.googlesource.com/9433
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-28 22:27:26 +00:00
Austin Clements
63caec5dee runtime: eliminate one heapBitsForObject from scanobject
scanobject with ptrmask!=nil is only ever called with the base
pointer of a heap object. Currently, scanobject calls
heapBitsForObject, which goes to a great deal of trouble to check
that the pointer points into the heap and to find the base of the
object it points to, both of which are completely unnecessary in
this case.

Replace this call to heapBitsForObject with much simpler logic to
fetch the span and compute the heap bits.

Benchmark results with five runs:

name                                    old mean                new mean        delta
BenchmarkBinaryTree17              9.21s × (0.95,1.02)     8.55s × (0.91,1.03)  -7.16% (p=0.022)
BenchmarkFannkuch11                2.65s × (1.00,1.00)     2.62s × (1.00,1.00)  -1.10% (p=0.000)
BenchmarkFmtFprintfEmpty          73.2ns × (0.99,1.01)    71.7ns × (1.00,1.01)  -1.99% (p=0.004)
BenchmarkFmtFprintfString          302ns × (0.99,1.00)     292ns × (0.98,1.02)  -3.31% (p=0.020)
BenchmarkFmtFprintfInt             281ns × (0.98,1.01)     279ns × (0.96,1.02)  ~ (p=0.596)
BenchmarkFmtFprintfIntInt          482ns × (0.98,1.01)     488ns × (0.95,1.02)  ~ (p=0.419)
BenchmarkFmtFprintfPrefixedInt     382ns × (0.99,1.01)     365ns × (0.96,1.02)  -4.35% (p=0.015)
BenchmarkFmtFprintfFloat           475ns × (0.99,1.01)     472ns × (1.00,1.00)  ~ (p=0.108)
BenchmarkFmtManyArgs              1.89µs × (1.00,1.01)    1.90µs × (0.94,1.02)  ~ (p=0.883)
BenchmarkGobDecode                22.4ms × (0.99,1.01)    21.9ms × (0.92,1.04)  ~ (p=0.332)
BenchmarkGobEncode                24.7ms × (0.98,1.02)    23.9ms × (0.87,1.07)  ~ (p=0.407)
BenchmarkGzip                      397ms × (0.99,1.01)     398ms × (0.99,1.01)  ~ (p=0.718)
BenchmarkGunzip                   96.7ms × (1.00,1.00)    96.9ms × (1.00,1.00)  ~ (p=0.230)
BenchmarkHTTPClientServer         71.5µs × (0.98,1.01)    68.5µs × (0.92,1.06)  ~ (p=0.243)
BenchmarkJSONEncode               46.1ms × (0.98,1.01)    44.9ms × (0.98,1.03)  -2.51% (p=0.040)
BenchmarkJSONDecode               86.1ms × (0.99,1.01)    86.5ms × (0.99,1.01)  ~ (p=0.343)
BenchmarkMandelbrot200            4.12ms × (1.00,1.00)    4.13ms × (1.00,1.00)  +0.23% (p=0.000)
BenchmarkGoParse                  5.89ms × (0.96,1.03)    5.82ms × (0.96,1.04)  ~ (p=0.522)
BenchmarkRegexpMatchEasy0_32       141ns × (0.99,1.01)     142ns × (1.00,1.00)  ~ (p=0.178)
BenchmarkRegexpMatchEasy0_1K       408ns × (1.00,1.00)     392ns × (0.99,1.00)  -3.83% (p=0.000)
BenchmarkRegexpMatchEasy1_32       122ns × (1.00,1.00)     122ns × (1.00,1.00)  ~ (p=0.178)
BenchmarkRegexpMatchEasy1_1K       626ns × (1.00,1.01)     624ns × (0.99,1.00)  ~ (p=0.122)
BenchmarkRegexpMatchMedium_32      202ns × (0.99,1.00)     205ns × (0.99,1.01)  +1.58% (p=0.001)
BenchmarkRegexpMatchMedium_1K     54.4µs × (1.00,1.00)    55.5µs × (1.00,1.00)  +1.86% (p=0.000)
BenchmarkRegexpMatchHard_32       2.68µs × (1.00,1.00)    2.71µs × (1.00,1.00)  +0.97% (p=0.002)
BenchmarkRegexpMatchHard_1K       79.8µs × (1.00,1.01)    80.5µs × (1.00,1.01)  +0.94% (p=0.003)
BenchmarkRevcomp                   590ms × (0.99,1.01)     585ms × (1.00,1.00)  ~ (p=0.066)
BenchmarkTemplate                  111ms × (0.97,1.02)     112ms × (0.99,1.01)  ~ (p=0.201)
BenchmarkTimeParse                 392ns × (1.00,1.00)     385ns × (1.00,1.00)  -1.69% (p=0.000)
BenchmarkTimeFormat                449ns × (0.98,1.01)     448ns × (0.99,1.01)  ~ (p=0.550)

Change-Id: Ie7c3830c481d96c9043e7bf26853c6c1d05dc9f4
Reviewed-on: https://go-review.googlesource.com/9364
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-28 15:22:20 +00:00
Russ Cox
32d6fbcb4f runtime: replace needwb() with writeBarrierEnabled
Reduce the write barrier check to a single load and compare
so that it can be inlined into write barrier use sites.
Makes the standard write barrier a little faster too.
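
For illustration, a sketch of what the check looks like at a
write-barrier site (writebarrierptr is the runtime's pointer barrier
of this era; dst and src are hypothetical):

  if writeBarrierEnabled { // a single load and compare, so it inlines
      writebarrierptr(dst, src)
  } else {
      *dst = src
  }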

name                                       old                     new          delta
BenchmarkBinaryTree17              17.9s × (0.99,1.01)     17.9s × (1.00,1.01)  ~
BenchmarkFannkuch11                4.35s × (1.00,1.00)     4.43s × (1.00,1.00)  +1.81%
BenchmarkFmtFprintfEmpty           120ns × (0.93,1.06)     110ns × (1.00,1.06)  -7.92%
BenchmarkFmtFprintfString          479ns × (0.99,1.00)     487ns × (0.99,1.00)  +1.67%
BenchmarkFmtFprintfInt             452ns × (0.99,1.02)     450ns × (0.99,1.00)  ~
BenchmarkFmtFprintfIntInt          766ns × (0.99,1.01)     762ns × (1.00,1.00)  ~
BenchmarkFmtFprintfPrefixedInt     576ns × (0.98,1.01)     584ns × (0.99,1.01)  ~
BenchmarkFmtFprintfFloat           730ns × (1.00,1.01)     738ns × (1.00,1.00)  +1.16%
BenchmarkFmtManyArgs              2.84µs × (0.99,1.00)    2.80µs × (1.00,1.01)  -1.22%
BenchmarkGobDecode                39.3ms × (0.98,1.01)    39.0ms × (0.99,1.00)  ~
BenchmarkGobEncode                39.5ms × (0.99,1.01)    37.8ms × (0.98,1.01)  -4.33%
BenchmarkGzip                      663ms × (1.00,1.01)     661ms × (0.99,1.01)  ~
BenchmarkGunzip                    143ms × (1.00,1.00)     142ms × (1.00,1.00)  ~
BenchmarkHTTPClientServer          132µs × (0.99,1.01)     132µs × (0.99,1.01)  ~
BenchmarkJSONEncode               57.4ms × (0.99,1.01)    56.3ms × (0.99,1.01)  -1.96%
BenchmarkJSONDecode                139ms × (0.99,1.00)     138ms × (0.99,1.01)  ~
BenchmarkMandelbrot200            6.03ms × (1.00,1.00)    6.01ms × (1.00,1.00)  ~
BenchmarkGoParse                  10.3ms × (0.89,1.14)    10.2ms × (0.87,1.05)  ~
BenchmarkRegexpMatchEasy0_32       209ns × (1.00,1.00)     208ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy0_1K       591ns × (0.99,1.00)     588ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy1_32       184ns × (0.99,1.02)     182ns × (0.99,1.01)  ~
BenchmarkRegexpMatchEasy1_1K      1.01µs × (1.00,1.00)    0.99µs × (1.00,1.01)  -2.33%
BenchmarkRegexpMatchMedium_32      330ns × (1.00,1.00)     323ns × (1.00,1.01)  -2.12%
BenchmarkRegexpMatchMedium_1K     92.6µs × (1.00,1.00)    89.9µs × (1.00,1.00)  -2.92%
BenchmarkRegexpMatchHard_32       4.80µs × (0.95,1.00)    4.72µs × (0.95,1.01)  ~
BenchmarkRegexpMatchHard_1K        136µs × (1.00,1.00)     133µs × (1.00,1.01)  -1.86%
BenchmarkRevcomp                   900ms × (0.99,1.04)     900ms × (1.00,1.05)  ~
BenchmarkTemplate                  172ms × (1.00,1.00)     168ms × (0.99,1.01)  -2.07%
BenchmarkTimeParse                 637ns × (1.00,1.00)     637ns × (1.00,1.00)  ~
BenchmarkTimeFormat                744ns × (1.00,1.01)     738ns × (1.00,1.00)  -0.67%

Change-Id: I4ecc925805da1f5ee264377f1f7574f54ee575e7
Reviewed-on: https://go-review.googlesource.com/9321
Reviewed-by: Austin Clements <austin@google.com>
2015-04-28 01:37:53 +00:00
Russ Cox
2050f57141 runtime: change unused argument in fat write barriers from pointer to scalar
The argument is unused, only present for alignment of the
following argument. The compiler today always passes a zero
but I'd rather not write anything there during the call sequence,
so mark it as a scalar so the garbage collector won't look at it.

As expected, no significant performance change.

name                                       old                     new          delta
BenchmarkBinaryTree17              17.9s × (0.99,1.00)     17.9s × (0.99,1.01)  ~
BenchmarkFannkuch11                4.35s × (1.00,1.00)     4.35s × (1.00,1.00)  ~
BenchmarkFmtFprintfEmpty           120ns × (0.94,1.05)     120ns × (0.93,1.06)  ~
BenchmarkFmtFprintfString          477ns × (1.00,1.00)     479ns × (0.99,1.00)  ~
BenchmarkFmtFprintfInt             450ns × (0.99,1.01)     452ns × (0.99,1.02)  ~
BenchmarkFmtFprintfIntInt          765ns × (0.99,1.01)     766ns × (0.99,1.01)  ~
BenchmarkFmtFprintfPrefixedInt     569ns × (0.99,1.01)     576ns × (0.98,1.01)  ~
BenchmarkFmtFprintfFloat           728ns × (1.00,1.00)     730ns × (1.00,1.01)  ~
BenchmarkFmtManyArgs              2.82µs × (0.99,1.01)    2.84µs × (0.99,1.00)  ~
BenchmarkGobDecode                39.1ms × (0.99,1.01)    39.3ms × (0.98,1.01)  ~
BenchmarkGobEncode                39.4ms × (0.99,1.01)    39.5ms × (0.99,1.01)  ~
BenchmarkGzip                      661ms × (0.99,1.01)     663ms × (1.00,1.01)  ~
BenchmarkGunzip                    143ms × (1.00,1.00)     143ms × (1.00,1.00)  ~
BenchmarkHTTPClientServer          133µs × (0.99,1.01)     132µs × (0.99,1.01)  ~
BenchmarkJSONEncode               57.3ms × (0.99,1.04)    57.4ms × (0.99,1.01)  ~
BenchmarkJSONDecode                139ms × (0.99,1.00)     139ms × (0.99,1.00)  ~
BenchmarkMandelbrot200            6.02ms × (1.00,1.00)    6.03ms × (1.00,1.00)  ~
BenchmarkGoParse                  9.72ms × (0.92,1.11)   10.31ms × (0.89,1.14)  ~
BenchmarkRegexpMatchEasy0_32       209ns × (1.00,1.01)     209ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy0_1K       592ns × (0.99,1.00)     591ns × (0.99,1.00)  ~
BenchmarkRegexpMatchEasy1_32       183ns × (0.98,1.01)     184ns × (0.99,1.02)  ~
BenchmarkRegexpMatchEasy1_1K      1.01µs × (1.00,1.01)    1.01µs × (1.00,1.00)  ~
BenchmarkRegexpMatchMedium_32      330ns × (1.00,1.00)     330ns × (1.00,1.00)  ~
BenchmarkRegexpMatchMedium_1K     92.4µs × (1.00,1.00)    92.6µs × (1.00,1.00)  ~
BenchmarkRegexpMatchHard_32       4.77µs × (0.95,1.01)    4.80µs × (0.95,1.00)  ~
BenchmarkRegexpMatchHard_1K        136µs × (1.00,1.00)     136µs × (1.00,1.00)  ~
BenchmarkRevcomp                   906ms × (0.99,1.05)     900ms × (0.99,1.04)  ~
BenchmarkTemplate                  171ms × (0.99,1.01)     172ms × (1.00,1.00)  ~
BenchmarkTimeParse                 638ns × (1.00,1.00)     637ns × (1.00,1.00)  ~
BenchmarkTimeFormat                745ns × (0.99,1.02)     744ns × (1.00,1.01)  ~

Change-Id: I0aeac5dc7adfd75e2223e3aabfedc7818d339f9b
Reviewed-on: https://go-review.googlesource.com/9320
Reviewed-by: Austin Clements <austin@google.com>
2015-04-28 01:37:45 +00:00
Austin Clements
02ba71e547 runtime/race: fix failing tests
Some race tests were sensitive to the goroutine scheduling order.
When this changed in commit e870f06, these tests started to fail.

Fix TestRaceHeapParam by ensuring that the racing goroutine has
run before the test exits. Fix TestRaceRWMutexMultipleReaders by
adding a third reader to ensure that two readers wind up on the
same side of the writer (and race with each other) regardless of
the schedule. Fix TestRaceRange by ensuring that the racing
goroutine runs before the main goroutine exits the loop it races
with.
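
A minimal sketch of the synchronization pattern behind these fixes
(not the actual test code; racyVar and the test name are
hypothetical):

  package race_test

  import "testing"

  var racyVar int

  // The final receive guarantees the racing goroutine has run
  // before the test returns, so the race detector observes both
  // conflicting accesses.
  func TestRaceSketch(t *testing.T) {
      done := make(chan bool)
      go func() {
          racyVar = 1 // racing write #1
          done <- true
      }()
      racyVar = 2 // racing write #2
      <-done
  }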

Change-Id: Iaf002f8730ea42227feaf2f3c51b9a1e57ccffdd
Reviewed-on: https://go-review.googlesource.com/9402
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-27 23:12:00 +00:00
Russ Cox
f774e6a1f8 runtime/race: stop listening to external network addresses
This makes the OS X firewall box pop up.
Not run during all.bash so hasn't been noticed before.

Change-Id: I78feb4fd3e1d3c983ae3419085048831c04de3da
Reviewed-on: https://go-review.googlesource.com/9401
Reviewed-by: Austin Clements <austin@google.com>
2015-04-27 23:11:45 +00:00
Austin Clements
7c7cd69591 runtime: fix stack use accounting
ReadMemStats accounts for stacks slightly differently than the runtime
does internally. Internally, only stacks allocated by newosproc0 are
accounted in memstats.stacks_sys and other stacks are accounted in
heap_sys. readmemstats_m shuffles the statistics so all stacks are
accounted in StackSys rather than HeapSys.

However, currently, readmemstats_m assumes StackSys will be zero when
it does this shuffle. This was true until commit 6ad33be. If it isn't
(e.g., if something called newosproc0), StackSys+HeapSys will be
different before and after this shuffle, and the Sys sum that was
computed earlier will no longer agree with the sum of its components.

Fix this by making the shuffle in readmemstats_m not assume that
StackSys is zero.

Fixes #10585.

Change-Id: If13991c8de68bd7b85e1b613d3f12b4fd6fd5813
Reviewed-on: https://go-review.googlesource.com/9366
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-27 23:09:39 +00:00
David Crawshaw
d707a6e0e2 runtime: remove unnecessary noescape to fix netbsd
I introduced this build failure in golang.org/cl/9302 but failed to
notice due to the other failures on the dashboard.

Change-Id: I84bf00f664ba572c1ca722e0136d8a2cf21613ca
Reviewed-on: https://go-review.googlesource.com/9363
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-27 23:04:38 +00:00
Austin Clements
23ce80efeb runtime/race: fix benchmark deadlock
Currently TestRaceCrawl fails to call wg.Done for every wg.Add if the
depth ever reaches 0. This causes the test to deadlock. Under the race
detector, this deadlock is not detected, so the test eventually times
out.

This only recently became a problem. Prior to commit e870f06 the depth
would never reach 0 because the strict round-robin goroutine schedule
ensured that all of the URLs were already "seen" by depth 2. Now that
the runtime prefers scheduling the most recently started goroutine,
the test is able to reach depth 0 and trigger this deadlock.

Change-Id: I5176302a89614a344c84d587073b364833af6590
Reviewed-on: https://go-review.googlesource.com/9344
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-27 20:54:34 +00:00
Russ Cox
42da270024 runtime: fix race in BenchmarkPingPongHog
The master goroutine was returning before
the child goroutine had done its final i < b.N
(the one that fails and causes it to exit the loop)
and then the benchmark harness was updating
b.N, causing a read+write race on b.N.
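
A minimal sketch of the corrected shape (names are illustrative): the
parent waits for the child's final i < b.N check before returning, so
the harness cannot update b.N while the child still reads it.

  package sched_test

  import "testing"

  func BenchmarkSketch(b *testing.B) {
      done := make(chan bool)
      go func() {
          for i := 0; i < b.N; i++ {
              // child's half of the work
          }
          done <- true
      }()
      for i := 0; i < b.N; i++ {
          // parent's half of the work
      }
      <-done // don't return, and let b.N change, until the child exits its loop
  }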

Change-Id: I2504270a0de30544736f6c32161337a25b505c3e
Reviewed-on: https://go-review.googlesource.com/9368
Reviewed-by: Austin Clements <austin@google.com>
2015-04-27 20:10:11 +00:00
Austin Clements
33e0f3d853 runtime: fix some out of date comments and typos
Change-Id: I061057414c722c5a0f03c709528afc8554114db6
Reviewed-on: https://go-review.googlesource.com/9367
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 20:08:38 +00:00
Josh Bleecher Snyder
9a0fd97ff3 runtime: remove a modulus calculation from pollorder
This is a follow-up to CL 9269, as suggested
by dvyukov.

There is probably even more that can be done
to speed up this shuffle. It will matter more
once CL 7570 (fine-grained locking in select)
is in and can be revisited then, with benchmarks.
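
A standard way to avoid the modulus is the multiply-shift reduction
below; it is shown only to illustrate the kind of change involved, not
necessarily the exact approach this CL took.

  // boundedRand maps a uniform 32-bit value r into [0, n) using
  // the high half of a 64-bit product instead of a division.
  func boundedRand(r, n uint32) uint32 {
      return uint32(uint64(r) * uint64(n) >> 32)
  }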

Change-Id: Ic13a27d11cedd1e1f007951214b3bb56b1644f02
Reviewed-on: https://go-review.googlesource.com/9393
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-27 19:36:37 +00:00
Austin Clements
1b01910c06 runtime: rename gcController.findRunnable to findRunnableGCWorker
This avoids confusion with the main findrunnable in the scheduler.

Change-Id: I8cf40657557a8610a2fe5a2f74598518256ca7f0
Reviewed-on: https://go-review.googlesource.com/9305
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:42 +00:00
Austin Clements
bb6320535d runtime: replace STW for enabling write barriers with ragged barrier
Currently, we use a full stop-the-world around enabling write
barriers. This is to ensure that all Gs have enabled write barriers
before any blackening occurs (either in gcBgMarkWorker() or in
gcAssistAlloc()).

However, there's no need to bring the whole world to a synchronous
stop to ensure this. This change replaces the STW with a ragged
barrier that ensures each P has individually observed that write
barriers should be enabled before GC performs any blackening.

Change-Id: If2f129a6a55bd8bdd4308067af2b739f3fb41955
Reviewed-on: https://go-review.googlesource.com/8207
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:37 +00:00
Austin Clements
57afa76471 runtime: add ragged global barrier function
This adds forEachP, which performs a general-purpose ragged global
barrier. forEachP takes a callback and invokes it for every P at a GC
safe point.

Ps that are idle or in a syscall are considered to be at a continuous
safe point. forEachP ensures that these Ps do not change state by
forcing all syscall Ps into idle and holding the sched.lock.

To ensure that Ps do not enter syscall or idle without running the
safe-point function, this adds checks for a pending callback every
place there is currently a gcwaiting check.

We'll use forEachP to replace the STW around enabling the write
barrier and to replace the current asynchronous per-M wbuf cache with
a cooperatively managed per-P gcWork cache.
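
A sketch of a call site (runtime-internal; flushPCache and the
callback body are hypothetical):

  // forEachP invokes the callback on every P at a GC-safe point,
  // treating idle and syscall Ps as continuously safe.
  forEachP(func(_p_ *p) {
      flushPCache(_p_) // e.g., flush some per-P cache
  })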

Change-Id: Ie944f8ce1fead7c79bf271d2f42fcd61a41bb3cc
Reviewed-on: https://go-review.googlesource.com/8206
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:33 +00:00
Austin Clements
b0b1a66052 runtime: reset spinning in mspinning if work was ready()ed
This fixes a bug where the runtime ready()s a goroutine while setting
up a new M that's initially marked as spinning, causing the scheduler
to later panic when it finds work in the run queue of a P associated
with a spinning M. Specifically, the sequence of events that can lead
to this is:

1) sysmon calls handoffp to hand off a P stolen from a syscall.

2) handoffp sees no pending work on the P, so it calls startm with
   spinning set.

3) startm calls newm, which in turn calls allocm to allocate a new M.

4) allocm "borrows" the P we're handing off in order to do allocation
   and performs this allocation.

5) This allocation may assist the garbage collector, and this assist
   may detect the end of concurrent mark and ready() the main GC
   goroutine to signal this.

6) This ready()ing puts the GC goroutine on the run queue of the
   borrowed P.

7) newm starts the OS thread, which runs mstart and subsequently
   mstart1, which marks the M spinning because startm was called with
   spinning set.

8) mstart1 enters the scheduler, which panics because there's work on
   the run queue, but the M is marked spinning.

To fix this, before marking the M spinning in step 7, add a check to
see if work has been added to the P's run queue. If this is the case,
undo the spinning instead.

Fixes #10573.

Change-Id: I4670495ae00582144a55ce88c45ae71de597cfa5
Reviewed-on: https://go-review.googlesource.com/9332
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-27 12:49:54 +00:00
Austin Clements
2a46f55b35 runtime: panic when idling a P with runnable Gs
This adds a check that we never put a P on the idle list when it has
work on its local run queue.

Change-Id: Ifcfab750de60c335148a7f513d4eef17be03b6a7
Reviewed-on: https://go-review.googlesource.com/9324
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-27 12:49:49 +00:00
Josh Bleecher Snyder
fd5540e7e5 runtime: tighten select permutation generation
This is the optimization made to math/rand in CL 21030043.
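
The optimization builds the permutation "inside-out" rather than
initializing it and then shuffling. A sketch, assuming a bounded
random helper randn (an illustrative name):

  // perm returns a uniformly random permutation of 0..n-1. Each
  // new index i lands in a random slot j <= i and the value
  // displaced from j moves to slot i, so no initialization pass
  // or swap is needed.
  func perm(n int, randn func(bound int) int) []int {
      m := make([]int, n)
      for i := 0; i < n; i++ {
          j := randn(i + 1) // uniform in [0, i]
          m[i] = m[j]
          m[j] = i
      }
      return m
  }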

Change-Id: I231b24fa77cac1fe74ba887db76313b5efaab3e8
Reviewed-on: https://go-review.googlesource.com/9269
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-27 02:36:24 +00:00
David Crawshaw
a5b693b431 runtime: signal forwarding for darwin/amd64
Follows the linux signal forwarding semantics from
http://golang.org/cl/8712, sharing the implementation of sigfwdgo.
Forwarding for 386, arm, and arm64 will follow.

Change-Id: I6bf30d563d19da39b6aec6900c7fe12d82ed4f62
Reviewed-on: https://go-review.googlesource.com/9302
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-26 13:46:13 +00:00
Rick Hudson
ada8cdb9f6 runtime: Fix bug due to elided return.
A previous change to mbitmap.go dropped a return on a
path that seems not to be exercised. This was a mistake that
this CL fixes.

Change-Id: I715ee4ef08f5bf8d9f53cee84e8fb31a237e2d43
Reviewed-on: https://go-review.googlesource.com/9295
Reviewed-by: Austin Clements <austin@google.com>
2015-04-24 21:52:30 +00:00
Austin Clements
1b4025f4bd runtime: replace per-M workbuf cache with per-P gcWork cache
Currently, each M has a cache of the most recently used *workbuf. This
is used primarily by the write barrier so it doesn't have to access
the global workbuf lists on every write barrier. It's also used by
stack scanning because it's convenient.

This cache is important for write barrier performance, but this
particular approach has several downsides. It's faster than no cache,
but far from optimal (as the benchmarks below show). It's complex:
access to the cache is sprinkled through most of the workbuf list
operations and it requires special care to transform into and back out
of the gcWork cache that's actually used for scanning and marking. It
requires atomic exchanges to take ownership of the cached workbuf and
to return it to the M's cache even though it's almost always used by
only the current M. Since it's per-M, flushing these caches is O(# of
Ms), which may be high. And it has some significant subtleties: for
example, in general the cache shouldn't be used after the
harvestwbufs() in mark termination because it could hide work from
mark termination, but stack scanning can happen after this and *will*
use the cache (but it turns out this is okay because it will always be
followed by a getfull(), which drains the cache).

This change replaces this cache with a per-P gcWork object. This
gcWork cache can be used directly by scanning and marking (as long as
preemption is disabled, which is a general requirement of gcWork).
Since it's per-P, it doesn't require synchronization, which simplifies
things and means the only atomic operations in the write barrier are
occasionally fetching new work buffers and setting a mark bit if the
object isn't already marked. This cache can be flushed in O(# of Ps),
which is generally small. It follows a simple flushing rule: the cache
can be used during any phase, but during mark termination it must be
flushed before allowing preemption. This also makes the dispose during
mutator assist no longer necessary, which eliminates the vast majority
of gcWork dispose calls and reduces contention on the global workbuf
lists. And it's a lot faster on some benchmarks:

benchmark                          old ns/op       new ns/op       delta
BenchmarkBinaryTree17              11963668673     11206112763     -6.33%
BenchmarkFannkuch11                2643217136      2649182499      +0.23%
BenchmarkFmtFprintfEmpty           70.4            70.2            -0.28%
BenchmarkFmtFprintfString          364             307             -15.66%
BenchmarkFmtFprintfInt             317             282             -11.04%
BenchmarkFmtFprintfIntInt          512             483             -5.66%
BenchmarkFmtFprintfPrefixedInt     404             380             -5.94%
BenchmarkFmtFprintfFloat           521             479             -8.06%
BenchmarkFmtManyArgs               2164            1894            -12.48%
BenchmarkGobDecode                 30366146        22429593        -26.14%
BenchmarkGobEncode                 29867472        26663152        -10.73%
BenchmarkGzip                      391236616       396779490       +1.42%
BenchmarkGunzip                    96639491        96297024        -0.35%
BenchmarkHTTPClientServer          100110          70763           -29.31%
BenchmarkJSONEncode                51866051        52511382        +1.24%
BenchmarkJSONDecode                103813138       86094963        -17.07%
BenchmarkMandelbrot200             4121834         4120886         -0.02%
BenchmarkGoParse                   16472789        5879949         -64.31%
BenchmarkRegexpMatchEasy0_32       140             140             +0.00%
BenchmarkRegexpMatchEasy0_1K       394             394             +0.00%
BenchmarkRegexpMatchEasy1_32       120             120             +0.00%
BenchmarkRegexpMatchEasy1_1K       621             614             -1.13%
BenchmarkRegexpMatchMedium_32      209             202             -3.35%
BenchmarkRegexpMatchMedium_1K      54889           55175           +0.52%
BenchmarkRegexpMatchHard_32        2682            2675            -0.26%
BenchmarkRegexpMatchHard_1K        79383           79524           +0.18%
BenchmarkRevcomp                   584116718       584595320       +0.08%
BenchmarkTemplate                  125400565       109620196       -12.58%
BenchmarkTimeParse                 386             387             +0.26%
BenchmarkTimeFormat                580             447             -22.93%

(Best out of 10 runs. The delta of averages is similar.)

This also puts us in a good position to flush these caches when
nearing the end of concurrent marking, which will let us increase the
size of the work buffers while still controlling mark termination
pause time.
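
A rough sketch of the access pattern this enables inside the runtime
(the field path follows the commit's description and may not match
the code exactly):

  mp := acquirem()   // disable preemption: the P cannot change under us
  gcw := &mp.p.gcw   // per-P cache, so no locks or atomics needed
  gcw.put(obj)       // e.g., queue a pointer for marking
  releasem(mp)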

Change-Id: I2dd94c8517a19297a98ec280203cccaa58792522
Reviewed-on: https://go-review.googlesource.com/9178
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 20:10:14 +00:00
Austin Clements
d1cae6358c runtime: fix check for pending GC work
When findRunnable considers running a fractional mark worker, it first
checks if there's any work to be done; if there isn't there's no point
in running the worker because it will just reschedule immediately.
However, currently findRunnable just checks work.full and
work.partial, whereas getfull can *also* draw work from m.currentwbuf.
As a result, findRunnable may not start a worker even though there
actually is work.

This problem manifests itself in occasional failures of the
test/init1.go test. This test is unusual because it performs a large
amount of allocation without executing any write barriers, which means
there's nothing to force the pointers in currentwbuf out to the
work.partial/full lists where findRunnable can see them.

This change fixes this problem by making findRunnable also check for a
currentwbuf. This aligns findRunnable with trygetfull's notion of
whether or not there's work.

Change-Id: Ic76d22b7b5d040bc4f58a6b5975e9217650e66c4
Reviewed-on: https://go-review.googlesource.com/9299
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 20:10:10 +00:00
Austin Clements
26eac917dc runtime: start dedicated mark workers even if there's no work
Currently, findRunnable only considers running a mark worker if
there's work in the work queue. In principle, this can delay the start
of the desired number of dedicated mark workers if there's no work
pending. This is unlikely to occur in practice, since there should be
work queued from the scan phase, but if it were to come up, a CPU hog
mutator could slow down or delay garbage collection.

This check makes sense for fractional mark workers, since they'll just
return to the scheduler immediately if there's no work, but we want
the scheduler to start all of the dedicated mark workers promptly,
even if there's currently no queued work. Hence, this change moves the
pending work check after the check for starting a dedicated worker.

Change-Id: I52b851cc9e41f508a0955b3f905ca80f109ea101
Reviewed-on: https://go-review.googlesource.com/9298
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-24 20:10:05 +00:00
Austin Clements
711a164267 runtime: fix some out-of-date comments
bgMarkCount no longer exists.

Change-Id: I3aa406fdccfca659814da311229afbae55af8304
Reviewed-on: https://go-review.googlesource.com/9297
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-24 20:10:01 +00:00
Srdjan Petrovic
6ad33be2d9 runtime: implement xadduintptr and update system mstats using it
The motivation is that sysAlloc/Free() currently aren't safe to be
called without a valid G, because arm's xadd64() uses locks that require
a valid G.

The solution here was proposed by Dmitry Vyukov: use xadduintptr()
instead of xadd64(), until arm can support xadd64 on all of its
architectures (not a trivial task for arm).

Change-Id: I250252079357ea2e4360e1235958b1c22051498f
Reviewed-on: https://go-review.googlesource.com/9002
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-24 16:53:26 +00:00
Austin Clements
0e6a6c510f runtime: simplify process for starting GC goroutine
Currently, when allocation reaches the GC trigger, the runtime uses
readyExecute to start the GC goroutine immediately rather than wait
for the scheduler to get around to the GC goroutine while the mutator
continues to grow the heap.

Now that the scheduler runs the most recently readied goroutine when a
goroutine yields its time slice, this rigmarole is no longer
necessary. The runtime can simply ready the GC goroutine and yield
from the readying goroutine.

Change-Id: I3b4ebadd2a72a923b1389f7598f82973dd5c8710
Reviewed-on: https://go-review.googlesource.com/9292
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-24 15:13:05 +00:00
Austin Clements
ce502b063c runtime: use park/ready to wake up GC at end of concurrent mark
Currently, the main GC goroutine sleeps on a note during concurrent
mark and the first background mark worker or assist to finish marking
use wakes up that note to let the main goroutine proceed into mark
termination. Unfortunately, the latency of this wakeup can be quite
high, since the GC goroutine will typically have lost its P while in
the futex sleep, meaning it will be placed on the global run queue and
will wait there until some P is kind enough to pick it up. This delay
gives the mutator more time to allocate and create floating garbage,
growing the heap unnecessarily. Worse, it's likely that background
marking has stopped at this point (unless GOMAXPROCS>4), so anything
that's allocated and published to the heap during this window will
have to be scanned during mark termination while the world is stopped.

This change replaces the note sleep/wakeup with a gopark/ready
scheme. This keeps the wakeup inside the Go scheduler and lets the
garbage collector take advantage of the new scheduler semantics that
run the ready()d goroutine immediately when the ready()ing goroutine
sleeps.

For the json benchmark from x/benchmarks with GOMAXPROCS=4, this
reduces the delay in waking up the GC goroutine and entering mark
termination once concurrent marking is done from ~100ms to typically
<100µs.

Change-Id: Ib11f8b581b8914f2d68e0094f121e49bac3bb384
Reviewed-on: https://go-review.googlesource.com/9291
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:13:01 +00:00
Austin Clements
4e32718d3e runtime: use timer for GC control revise rather than timeout
Currently, we use a note sleep with a timeout in a loop in func gc to
periodically revise the GC control variables. Replace this with a
fully blocking note sleep and use a periodic timer to trigger the
revise instead. This is a step toward replacing the note sleep in func
gc.

Change-Id: I2d562f6b9b2e5f0c28e9a54227e2c0f8a2603f63
Reviewed-on: https://go-review.googlesource.com/9290
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:56 +00:00
Austin Clements
e870f06c3f runtime: yield time slice to most recently readied G
Currently, when the runtime ready()s a G, it adds it to the end of the
current P's run queue and continues running. If there are many other
things in the run queue, this can result in a significant delay before
the ready()d G actually runs and can hurt fairness when other Gs in
the run queue are CPU hogs. For example, if there are three Gs sharing
a P, one of which is a CPU hog that never voluntarily gives up the P
and the other two of which are doing small amounts of work and
communicating back and forth on an unbuffered channel, the two
communicating Gs will get very little CPU time.

Change this so that when G1 ready()s G2 and then blocks, the scheduler
immediately hands off the remainder of G1's time slice to G2. In the
above example, the two communicating Gs will now act as a unit and
together get half of the CPU time, while the CPU hog gets the other
half of the CPU time.

This fixes the problem demonstrated by the ping-pong benchmark added
in the previous commit:

benchmark                old ns/op     new ns/op     delta
BenchmarkPingPongHog     684287        825           -99.88%

On the x/benchmarks suite, this change improves the performance of
garbage by ~6% (for GOMAXPROCS=1 and 4), and json by 28% and 36% for
GOMAXPROCS=1 and 4. It has negligible effect on heap size.

This has no effect on the go1 benchmark suite since those benchmarks
are mostly single-threaded.

Change-Id: I858a08eaa78f702ea98a5fac99d28a4ac91d339f
Reviewed-on: https://go-review.googlesource.com/9289
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:52 +00:00
Austin Clements
da0e37fa8d runtime: benchmark for ping-pong in the presence of a CPU hog
This benchmark demonstrates a current problem with the scheduler where
a set of frequently communicating goroutines get very little CPU time
in the presence of another goroutine that hogs that CPU, even if one
of those communicating goroutines is always runnable.

Currently it takes about 0.5 milliseconds to switch between
ping-ponging goroutines in the presence of a CPU hog:

BenchmarkPingPongHog	    2000	    684287 ns/op
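
A sketch of what such a benchmark looks like (not the actual source;
the structure and names are illustrative):

  package sched_test

  import (
      "runtime"
      "testing"
  )

  func BenchmarkPingPongHogSketch(b *testing.B) {
      defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
      stop := make(chan bool)
      go func() { // CPU hog: never voluntarily yields the P
          for {
              select {
              case <-stop:
                  return
              default:
              }
          }
      }()
      ping, pong := make(chan bool), make(chan bool)
      go func() {
          for i := 0; i < b.N; i++ {
              <-ping
              pong <- true
          }
      }()
      for i := 0; i < b.N; i++ {
          ping <- true
          <-pong
      }
      close(stop)
  }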

Change-Id: I278848c84f778de32344921ae8a4a8056e4898b0
Reviewed-on: https://go-review.googlesource.com/9288
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:47 +00:00
Austin Clements
e5e52f4f2c runtime: factor checking if P run queue is empty
There are a variety of places where we check if a P's run queue is
empty. This test is about to get slightly more complicated, so factor
it out into a new function, runqempty. This function is inlinable, so
this has no effect on performance.
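
At this point the factored check is just a head/tail comparison on
the P's circular run queue; a sketch of the shape (runtime-internal):

  // runqempty reports whether _p_ has no Gs on its local run queue.
  func runqempty(_p_ *p) bool {
      return _p_.runqhead == _p_.runqtail
  }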

Change-Id: If4a0b01ffbd004937de90d8d686f6ded4aad2c6b
Reviewed-on: https://go-review.googlesource.com/9287
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:42 +00:00
Srdjan Petrovic
5c8fbc6f1e runtime: signal forwarding
Forward signals to signal handlers installed before Go installs its own,
under certain circumstances.  In particular, as iant@ suggests, signals are
forwarded iff:
   (1) a non-SIG_DFL signal handler existed before Go, and
   (2) signal is synchronous (i.e., one of SIGSEGV, SIGBUS, SIGFPE), and
       (3a) signal occurred on a non-Go thread, or
       (3b) signal occurred on a Go thread but in CGo code.

Supported only on Linux, for now.

Change-Id: I403219ee47b26cf65da819fb86cf1ec04d3e25f5
Reviewed-on: https://go-review.googlesource.com/8712
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-24 05:19:39 +00:00
Srdjan Petrovic
1f65c9c141 runtime: deflake TestNewOSProc0, fix _rt0_amd64_linux_lib stack alignment
This addresses iant's comments from CL 9164.

Change-Id: I7b5b282f61b11aab587402c2d302697e76666376
Reviewed-on: https://go-review.googlesource.com/9222
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-23 23:09:03 +00:00
Austin Clements
ed09e0e2bf runtime: fix underflow in next_gc calculation
Currently, it's possible for the next_gc calculation to underflow.
Since next_gc is unsigned, this wraps around and effectively disables
GC for the rest of the program's execution. Besides being obviously
wrong, this is causing test failures on 32-bit because some tests are
running out of heap.

This underflow happens for two reasons, both having to do with how we
estimate the reachable heap size at the end of the GC cycle.

One reason is that this calculation depends on the value of heap_live
at the beginning of the GC cycle, but we currently only record that
value during a concurrent GC and not during a forced STW GC. Fix this
by moving the recorded value from gcController to work and recording
it on a common code path.

The other reason is that we use the amount of allocation during the GC
cycle as an approximation of the amount of floating garbage and
subtract it from the marked heap to estimate the reachable heap.
However, since this is only an approximation, it's possible for the
amount of allocation during the cycle to be *larger* than the marked
heap size (since the runtime allocates white and it's possible for
these allocations to never be made reachable from the heap). Currently
this causes wrap-around in our estimate of the reachable heap size,
which in turn causes wrap-around in next_gc. Fix this by bottoming out
the reachable heap estimate at 0, in which case we just fall back to
triggering GC at heapminimum (which is okay since this only happens on
small heaps).
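
A sketch of the clamped estimate (names illustrative): allocation
during the cycle approximates floating garbage, so it is subtracted
from the marked heap, bottoming out at zero instead of wrapping.

  // estimateReachable avoids unsigned underflow when the cycle's
  // allocation exceeds the marked heap size.
  func estimateReachable(heapMarked, allocedDuringCycle uint64) uint64 {
      if allocedDuringCycle >= heapMarked {
          return 0 // caller falls back to triggering at heapminimum
      }
      return heapMarked - allocedDuringCycle
  }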

Fixes #10555, fixes #10556, and fixes #10559.

Change-Id: Iad07b529c03772356fede2ae557732f13ebfdb63
Reviewed-on: https://go-review.googlesource.com/9286
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-23 20:52:54 +00:00
Rick Hudson
77f56af0bc runtime: Improve scanning performance
To achieve a 2% improvement in the garbage benchmark this CL removes
an unneeded assert and avoids one hbits.next() call per object
being scanned.

Change-Id: Ibd542d01e9c23eace42228886f9edc488354df0d
Reviewed-on: https://go-review.googlesource.com/9244
Reviewed-by: Austin Clements <austin@google.com>
2015-04-23 20:27:46 +00:00
Hyang-Ah Hana Kim
aef54d40ac runtime: disable TestNewOSProc0 on android/arm.
newosproc0 does not work on android/arm.
See issue #10548.

Change-Id: Ieaf6f5d0b77cddf5bf0b6c89fd12b1c1b8723f9b
Reviewed-on: https://go-review.googlesource.com/9293
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-23 19:08:33 +00:00
Shenghou Ma
edc53e1f14 runtime: fix build after CL 9164 on Linux
There is an assumption that the function executed in a child thread
created by runtime.clone should not return. And different systems
enforce that differently: some exit that thread, some exit the
whole process.

The test TestNewOSProc0 introduced in CL 9161 breaks that assumption,
so we need to adjust the code to only exit the thread should the
called function return.

Change-Id: Id631cb2f02ec6fbd765508377a79f3f96c6a2ed6
Reviewed-on: https://go-review.googlesource.com/9246
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-04-22 23:21:25 +00:00
Austin Clements
4655aadd00 runtime: use reachable heap estimate to set trigger/goal
Currently, we set the heap goal for the next GC cycle using the size
of the marked heap at the end of the current cycle. This can lead to a
bad feedback loop if the mutator is rapidly allocating and releasing
pointers that can significantly bloat heap size.

If the GC were STW, the marked heap size would be exactly the
reachable heap size (call it stwLive). However, in concurrent GC,
marked=stwLive+floatLive, where floatLive is the amount of "floating
garbage": objects that were reachable at some point during the cycle
and were marked, but which are no longer reachable by the end of the
cycle. If the GC cycle is short, then the mutator doesn't have much
time to create floating garbage, so marked≈stwLive. However, if the GC
cycle is long and the mutator is allocating and creating floating
garbage very rapidly, then it's possible that marked≫stwLive. Since
the runtime currently sets the heap goal based on marked, this will
cause it to set a high heap goal. This means that 1) the next GC cycle
will take longer because of the larger heap and 2) the assist ratio
will be low because of the large distance between the trigger and the
goal. The combination of these lets the mutator produce even more
floating garbage in the next cycle, which further exacerbates the
problem.

For example, on the garbage benchmark with GOMAXPROCS=1, this causes
the heap to grow to ~500MB and the garbage collector to retain upwards
of ~300MB of heap, while the true reachable heap size is ~32MB. This,
in turn, causes the GC cycle to take upwards of ~3 seconds.

Fix this bad feedback loop by estimating the true reachable heap size
(stwLive) and using this rather than the marked heap size
(stwLive+floatLive) as the basis for the GC trigger and heap goal.
This breaks the bad feedback loop and causes the mutator to assist
more, which decreases the rate at which it can create floating
garbage. On the same garbage benchmark, this reduces the maximum heap
size to ~73MB, the retained heap to ~40MB, and the duration of the GC
cycle to ~200ms.

Change-Id: I7712244c94240743b266f9eb720c03802799cdd1
Reviewed-on: https://go-review.googlesource.com/9177
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:28:42 +00:00
Austin Clements
1ccc577b8a runtime: include heap goal in gctrace line
This may or may not be useful to the end user, but it's incredibly
useful for us to understand the behavior of the pacer. Currently this
is fairly easy (though not trivial) to derive from the other heap
stats we print, but we're about to change how we compute the goal,
which will make it much harder to derive.

Change-Id: I796ef233d470c01f606bd9929820c01ece1f585a
Reviewed-on: https://go-review.googlesource.com/9176
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:07:44 +00:00
Austin Clements
1f39beb01a runtime: avoid divide-by-zero in GC trigger controller
The trigger controller computes GC CPU utilization by dividing by the
wall-clock time that's passed since concurrent mark began. Since this
delta is nanoseconds it's borderline impossible for it to be zero, but
if it is zero we'll currently divide by zero. Be robust to this
possibility by ignoring the utilization in the error term if no time
has elapsed.
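
A sketch of the guard (names illustrative):

  // gcUtilization reports the GC CPU fraction, ignoring the term
  // entirely when no wall-clock time has elapsed.
  func gcUtilization(gcCPUNanos, elapsedNanos int64) (float64, bool) {
      if elapsedNanos <= 0 {
          return 0, false // skip this error term in the controller
      }
      return float64(gcCPUNanos) / float64(elapsedNanos), true
  }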

Change-Id: I93dfc9e84735682af3e637f6538d1e7602634f09
Reviewed-on: https://go-review.googlesource.com/9175
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:07:36 +00:00
Srdjan Petrovic
ca9128f18f runtime: merge clone0 and clone
We initially added clone0 to handle the case when G or M don't exist, but
it turns out that we could have just modified clone.  (It also helps that
the function we're invoking in clone0 no longer needs arguments.)

As a side-effect, newosproc0 is now supported on all linux archs.

Change-Id: Ie603af75d8f164310fc16446052d83743961f3ca
Reviewed-on: https://go-review.googlesource.com/9164
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-22 16:28:57 +00:00
Shenghou Ma
87054c4704 runtime: fix more vet reported issues
Change-Id: Ie8dfdb592ee0bfc736d08c92c3d8413a37b6ac03
Reviewed-on: https://go-review.googlesource.com/9241
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-22 02:50:48 +00:00
Keith Randall
3a56aa0d3e runtime: check error codes for arm64 system calls
Unlike linux arm32, linux arm64 does not set the condition codes to indicate
whether a system call failed or not.  We must check if the return value
is in the error code range (the same as amd64 does).
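
On Linux a failing system call returns -errno, so the check is a
range test on the raw return value; a sketch (the helper name is
illustrative):

  // syscallFailed reports whether a raw Linux syscall return
  // value encodes an error: values in [-4095, -1] are -errno.
  func syscallFailed(ret int64) bool {
      return ret < 0 && ret >= -4095
  }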

Fixes runtime.TestBadOpen test.

Change-Id: I97a8b0a17b5f002a3215c535efa91d199cee3309
Reviewed-on: https://go-review.googlesource.com/9220
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-22 02:30:22 +00:00
Josh Bleecher Snyder
a76099f0d9 runtime: fix arm64 asm vet issues
Several naming changes and a real issue in asmcgocall_errno.

Change-Id: Ieb0a328a168819fe233d74e0397358384d7e71b3
Reviewed-on: https://go-review.googlesource.com/9212
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-22 02:30:11 +00:00
Austin Clements
170fb10089 runtime: assist harder if GC exceeds the estimated marked heap
Currently, the GC controller computes the mutator assist ratio at the
beginning of the cycle by estimating that the marked heap size this
cycle will be the same as it was the previous cycle. It then uses that
assist ratio for the rest of the cycle. However, this means that if
the mutator is quickly growing its reachable heap, the heap size is
likely to exceed the heap goal and currently there's no additional
pressure on mutator assists when this happens. For example, 6g (with
GOMAXPROCS=1) frequently exceeds the goal heap size by ~25% because of
this.

This change makes GC revise its work estimate and the resulting assist
ratio every 10ms during the concurrent mark. Instead of
unconditionally using the marked heap size from the last cycle as an
estimate for this cycle, it takes the minimum of the previously marked
heap and the currently marked heap. As a result, as the cycle
approaches or exceeds its heap goal, this will increase the assist
ratio to put more pressure on the mutator assist to bring the cycle to
an end. For 6g, this causes the GC to always finish within 5% and
often within 1% of its heap goal.

Change-Id: I4333b92ad0878c704964be42c655c38a862b4224
Reviewed-on: https://go-review.googlesource.com/9070
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-21 15:35:55 +00:00
Austin Clements
e0c3d85f08 runtime: fix background marking at 25% utilization
Currently, in accordance with the GC pacing proposal, we schedule
background marking with a goal of achieving 25% utilization *total*
between mutator assists and background marking. This is stricter than
was set out in the Go 1.5 proposal, which suggests that the garbage
collector can use 25% just for itself and anything the mutator does to
help out is on top of that. It also has several technical
drawbacks. Because mutator assist time is constantly changing and we
can't have instantaneous information on background marking time, it
effectively requires hitting a moving target based on out-of-date
information. This works out in the long run, but works poorly for
short GC cycles and on short time scales. Also, this requires
time-multiplexing all Ps between the mutator and background GC since
the goal utilization of background GC constantly fluctuates. This
results in a complicated scheduling algorithm, poor affinity, and
extra overheads from context switching.

This change modifies the way we schedule and run background marking so
that background marking always consumes 25% of GOMAXPROCS and mutator
assist is in addition to this. This enables a much more robust
scheduling algorithm where we pre-determine the number of Ps we should
dedicate to background marking as well as the utilization goal for a
single floating "remainder" mark worker.
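
A sketch of the split (names illustrative): the 25% goal is divided
into whole dedicated workers plus one fractional "remainder" worker.

  // markWorkerPlan returns how many Ps run dedicated mark workers
  // and the utilization target for the single fractional worker.
  func markWorkerPlan(gomaxprocs int) (dedicated int, fractionalUtil float64) {
      target := float64(gomaxprocs) * 0.25
      dedicated = int(target)
      fractionalUtil = target - float64(dedicated)
      return
  }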

Change-Id: I187fa4c03ab6fe78012a84d95975167299eb9168
Reviewed-on: https://go-review.googlesource.com/9013
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:50 +00:00
Austin Clements
24a7252e25 runtime: finish sweeping before concurrent GC starts
Currently, the concurrent sweep follows a 1:1 rule: when allocation
needs a span, it sweeps a span (likewise, when a large allocation
needs N pages, it sweeps until it frees N pages). This rule worked
well for the STW collector (especially when GOGC==100) because it did
no more sweeping than necessary to keep the heap from growing, would
generally finish sweeping just before GC, and ensured good temporal
locality between sweeping a page and allocating from it.

It doesn't work well with concurrent GC. Since concurrent GC requires
starting GC earlier (sometimes much earlier), the sweep often won't be
done when GC starts. Unfortunately, the first thing GC has to do is
finish the sweep. In the mean time, the mutator can continue
allocating, pushing the heap size even closer to the goal size. This
worked okay with the 7/8ths trigger, but it gets into a vicious cycle
with the GC trigger controller: if the mutator is allocating quickly
and driving the trigger lower, more and more sweep work will be left
to GC; this both causes GC to take longer (allowing the mutator to
allocate more during GC) and delays the start of the concurrent mark
phase, which throws off the GC controller's statistics and generally
causes it to push the trigger even lower.

As an example of a particularly bad case, the garbage benchmark with
GOMAXPROCS=4 and -benchmem 512 (MB) spends the first 0.4-0.8 seconds
of each GC cycle sweeping, during which the heap grows by between
109MB and 252MB.

To fix this, this change replaces the 1:1 sweep rule with a
proportional sweep rule. At the end of GC, GC knows exactly how much
heap allocation will occur before the next concurrent GC as well as
how many span pages must be swept. This change computes this "sweep
ratio" and when the mallocgc asks for a span, the mcentral sweeps
enough spans to bring the swept span count into ratio with the
allocated byte count.

On the benchmark from above, this entirely eliminates sweeping at the
beginning of GC, which reduces the time between startGC readying the
GC goroutine and GC stopping the world for sweep termination to ~100µs
during which the heap grows at most 134KB.
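
A sketch of the proportional rule (names illustrative): given the
pages left to sweep and the allocation budget remaining before the
next GC, each allocated byte "owes" its share of sweeping.

  // sweepPagesOwed returns how many span pages should have been
  // swept by the time bytesAllocated bytes have been allocated.
  func sweepPagesOwed(pagesToSweep, heapDistance, bytesAllocated uint64) uint64 {
      ratio := float64(pagesToSweep) / float64(heapDistance)
      return uint64(ratio * float64(bytesAllocated))
  }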

Change-Id: I35422d6bba0c2310d48bb1f8f30a72d29e98c1af
Reviewed-on: https://go-review.googlesource.com/8921
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:46 +00:00
Austin Clements
91c80ce6c7 runtime: make mcache.local_cachealloc a uintptr
This field used to decrease with sweeps (and potentially go
negative). Now it is always zero or positive, so change it to a
uintptr so it meshes better with other memory stats.

Change-Id: I6a50a956ddc6077eeaf92011c51743cb69540a3c
Reviewed-on: https://go-review.googlesource.com/8899
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:41 +00:00
Austin Clements
a0452a6821 runtime: proportional response GC trigger controller
Currently, concurrent GC triggers at a fixed 7/8*GOGC heap growth. For
mutators that allocate slowly, this means GC will trigger too early
and run too often, wasting CPU time on GC. For mutators that allocate
quickly, this means GC will trigger too late, causing the program to
exceed the GOGC heap growth goal and/or to exceed CPU goals because of
a high mutator assist ratio.

This change adds a feedback control loop to dynamically adjust the GC
trigger from cycle to cycle. By monitoring the heap growth and GC CPU
utilization from cycle to cycle, this adjusts the Go garbage collector
to target the GOGC heap growth goal and the 25% CPU utilization goal.
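
A heavily simplified sketch of the feedback idea (the gain and error
term here are illustrative, not the controller's actual formula):

  // reviseTrigger nudges the trigger ratio by a fraction of the
  // observed error between the growth goal and what happened, so
  // overshooting the goal moves the next trigger earlier.
  func reviseTrigger(trigger, goalGrowth, actualGrowth float64) float64 {
      const gain = 0.5 // proportional gain (illustrative)
      return trigger + gain*(goalGrowth-actualGrowth)
  }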

Change-Id: Ic82eef288c1fa122f73b69fe604d32cbb219e293
Reviewed-on: https://go-review.googlesource.com/8851
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:37 +00:00
Austin Clements
8d03acce54 runtime: multi-threaded, utilization-scheduled background mark
Currently, the concurrent mark phase is performed by the main GC
goroutine. Prior to the previous commit enabling preemption, this
caused marking to always consume 1/GOMAXPROCS of the available CPU
time. If GOMAXPROCS=1, this meant background GC would consume 100% of
the CPU (effectively a STW). If GOMAXPROCS>4, background GC would use
less than the goal of 25%. If GOMAXPROCS=4, background GC would use
the goal 25%, but if the mutator wasn't using the remaining 75%,
background marking wouldn't take advantage of the idle time. Enabling
preemption in the previous commit made GC miss CPU targets in
completely different ways, but set us up to bring everything back in
line.

This change replaces the fixed GC goroutine with per-P background mark
goroutines. Once started, these goroutines don't go in the standard
run queues; instead, they are scheduled specially such that the time
spent in mutator assists and the background mark goroutines totals 25%
of the CPU time available to the program. Furthermore, this lets
background marking take advantage of idle Ps, which significantly
boosts GC performance for applications that under-utilize the CPU.

This requires also changing how time is reported for gctrace, so this
change splits the concurrent mark CPU time into assist/background/idle
scanning.

This also requires increasing the size of the StackRecord slice used
in a GoroutineProfile test.

Change-Id: I0936ff907d2cee6cb687a208f2df47e8988e3157
Reviewed-on: https://go-review.googlesource.com/8850
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:32 +00:00
Austin Clements
af060c3086 runtime: generally allow preemption during concurrent GC phases
Currently, the entire GC process runs with g.m.preemptoff set. In the
concurrent phases, the parts that actually need preemption disabled
are run on a system stack and there's no overall need to stay on the
same M or P during the concurrent phases. Hence, move the setting of
g.m.preemptoff to when we start mark termination, at which point we
really do need preemption disabled.

This dramatically changes the scheduling behavior of the concurrent
mark phase. Currently, since this is non-preemptible, concurrent mark
gets one dedicated P (so 1/GOMAXPROCS utilization). With this change,
the GC goroutine is scheduled like any other goroutine during
concurrent mark, so it gets 1/<runnable goroutines> utilization.

You might think it's not even necessary to set g.m.preemptoff at that
point since the world is stopped, but stackalloc/stackfree use this as
a signal that the per-P pools are not safe to access without
synchronization.

Change-Id: I08aebe8179a7d304650fb8449ff36262b3771099
Reviewed-on: https://go-review.googlesource.com/8839
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:27 +00:00
Austin Clements
100da60979 runtime: track time spent in mutator assists
This time is tracked per P and periodically flushed to the global
controller state. This will be used to compute mutator assist
utilization in order to schedule background GC work.

Change-Id: Ib94f90903d426a02cf488bf0e2ef67a068eb3eec
Reviewed-on: https://go-review.googlesource.com/8837
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:22 +00:00
Austin Clements
4b2fde945a runtime: proportional mutator assist
Currently, mutator allocation periodically assists the garbage
collector by performing a small, fixed amount of scanning work.
However, to control heap growth, mutators need to perform scanning
work *proportional* to their allocation rate.

This change implements proportional mutator assists. This uses the
scan work estimate computed by the garbage collector at the beginning
of each cycle to compute how much scan work must be performed per
allocation byte to complete the estimated scan work by the time the
heap reaches the goal size. When allocation triggers an assist, it
uses this ratio and the amount allocated since the last assist to
compute the assist work, then attempts to steal as much of this work
as possible from the background collector's credit, and then performs
any remaining scan work itself.
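
A sketch of the proportionality (names illustrative): the ratio is
the scan work still expected divided by the heap headroom left before
the goal, so each allocated byte carries its share of the remaining
work.

  // assistWork returns the scan work a mutator owes for
  // allocating n bytes, given the cycle's current estimates.
  func assistWork(scanWorkExpected, heapGoal, heapLive, n uint64) uint64 {
      headroom := int64(heapGoal) - int64(heapLive)
      if headroom < 1 {
          headroom = 1 // at or past the goal: assist maximally
      }
      ratio := float64(scanWorkExpected) / float64(headroom)
      return uint64(ratio * float64(n))
  }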

Change-Id: I98b2078147a60d01d6228b99afd414ef857e4fba
Reviewed-on: https://go-review.googlesource.com/8836
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:18 +00:00
Austin Clements
028f972847 runtime: make gcDrainN in terms of scan work
Currently, the "n" in gcDrainN is in terms of objects to scan. This is
used by gchelpwork to perform a limited amount of work on allocation,
but is a pretty arbitrary way to bound this amount of work since the
number of objects has little relation to how long they take to scan.

Modify gcDrainN to perform a fixed amount of scan work instead. For
now, gchelpwork still performs a fairly arbitrary amount of scan work,
but at least this is much more closely related to how long the work
will take. Shortly, we'll use this to precisely control the scan work
performed by mutator assists during allocation to achieve the heap
size goal.

Change-Id: I3cd07fe0516304298a0af188d0ccdf621d4651cc
Reviewed-on: https://go-review.googlesource.com/8835
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:14 +00:00
Austin Clements
8e24283a28 runtime: track background scan work credit
This tracks scan work done by background GC in a global pool. Mutator
assists will draw on this credit to avoid doing work when background
GC is staying ahead.

Unlike the other GC controller tracking variables, this will be both
written and read throughout the cycle. Hence, we can't arbitrarily
delay updates like we can for scan work and bytes marked. However, we
still want to minimize contention, so this global credit pool is
allowed some error from the "true" amount of credit. Background GC
accumulates credit locally up to a limit and only then flushes to the
global pool. Similarly, mutator assists will draw from the credit pool
in batches.
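
A sketch of the batching (the names and batch size are illustrative):

  // flushScanCredit publishes locally accumulated scan work to
  // the global pool only once a batch has built up, trading a
  // bounded amount of "invisible" credit for less contention.
  func flushScanCredit(local, global *int64) {
      const creditBatch = 1 << 10 // flush threshold (illustrative)
      if *local >= creditBatch {
          atomic.AddInt64(global, *local) // import "sync/atomic"
          *local = 0
      }
  }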

Change-Id: I1aa4fc604b63bf53d1ee2a967694dffdfc3e255e
Reviewed-on: https://go-review.googlesource.com/8834
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:09 +00:00
Austin Clements
4e9fc0df48 runtime: implement GC scan work estimator
This implements tracking the scan work ratio of a GC cycle and using
this to estimate the scan work that will be required by the next GC
cycle. Currently this estimate is unused; it will be used to drive
mutator assists.

Change-Id: I8685b59d89cf1d83eddfc9b30d84da4e3a7f4b72
Reviewed-on: https://go-review.googlesource.com/8833
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:04 +00:00
Austin Clements
571ebae6ef runtime: track scan work performed during concurrent mark
This tracks the amount of scan work in terms of scanned pointers
during the concurrent mark phase. We'll use this information to
estimate scan work for the next cycle.

Currently this aggregates the work counter in gcWork and dispose
atomically aggregates this into a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.

Change-Id: Iac0364c466ee35fab781dbbbe7970a5f3c4e1fc1
Reviewed-on: https://go-review.googlesource.com/8832
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:00 +00:00
Austin Clements
fb9fd2bdd7 runtime: atomic ops for int64
These currently use portable implementations in terms of their uint64
counterparts.
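
The portable pattern is a cast through unsafe.Pointer onto the existing
uint64 primitive; a sketch (xadd64 stands for the uint64 op):

	import "unsafe"

	// int64 atomic add expressed via the uint64 primitive.
	func xaddint64(ptr *int64, delta int64) int64 {
		return int64(xadd64((*uint64)(unsafe.Pointer(ptr)), uint64(delta)))
	}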

Change-Id: Icba5f7134cfcf9d0429edabcdd73091d97e5e905
Reviewed-on: https://go-review.googlesource.com/8831
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:34:54 +00:00
Sebastien Binet
918fdae348 reflect: implement ArrayOf
This change exposes reflect.ArrayOf to create new reflect.Type array
types at runtime, when given a reflect.Type element.

- reflect: implement ArrayOf
- reflect: tests for ArrayOf
- runtime: document that typeAlg is used by reflect and must be kept
  in sync

Fixes #5996.
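
Example use of the new API:

	package main

	import (
		"fmt"
		"reflect"
	)

	func main() {
		// Build the type [4]int at runtime from its element type.
		t := reflect.ArrayOf(4, reflect.TypeOf(0))
		fmt.Println(t) // [4]int

		v := reflect.New(t).Elem() // an addressable [4]int value
		v.Index(2).SetInt(7)
		fmt.Println(v.Interface()) // [0 0 7 0]
	}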

Change-Id: I5d07213364ca915c25612deea390507c19461758
Reviewed-on: https://go-review.googlesource.com/4111
Reviewed-by: Keith Randall <khr@golang.org>
2015-04-21 15:21:09 +00:00
Matthew Dempsky
c0fa9e3f6f runtime/pprof: disable flaky TestTraceFutileWakeup on linux/ppc64le
Update #10512.

Change-Id: Ifdc59c3a5d8aba420b34ae4e37b3c2315dd7c783
Reviewed-on: https://go-review.googlesource.com/9162
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-21 10:01:53 +00:00
Rick Hudson
899a4ad47e runtime: Speed up heapBitsForObject
Optimized heapBitsForObject by special-casing objects whose size is
a power of two. When a span holding such objects is initialized, I
added a mask that, when ANDed with an interior pointer, yields the
base of the containing object. For the garbage benchmark this took
CPU_CLK_UNHALTED in heapBitsForObject from 7.7% down to 5.9% of the
total, and INST_RETIRED from 12.2% down to 8.7%.
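
A sketch of the fast path next to the general path (field names
assumed):

	// Fast path: for elemsize == 1<<k, span layout guarantees that
	// clearing the low k bits of an interior pointer yields the
	// object base. baseMask would be precomputed at span init.
	func objectBaseFast(p, baseMask uintptr) uintptr {
		return p & baseMask
	}

	// General path: a division by the element size.
	func objectBaseSlow(p, spanStart, elemsize uintptr) uintptr {
		return spanStart + (p-spanStart)/elemsize*elemsize
	}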

Here are the benchmarks that changed by at least 1%.

benchmark                          old ns/op      new ns/op      delta
BenchmarkFmtFprintfString          249            221            -11.24%
BenchmarkFmtFprintfInt             247            223            -9.72%
BenchmarkFmtFprintfEmpty           76.5           69.6           -9.02%
BenchmarkBinaryTree17              4106631412     3744550160     -8.82%
BenchmarkFmtFprintfFloat           424            399            -5.90%
BenchmarkGoParse                   4484421        4242115        -5.40%
BenchmarkGobEncode                 8803668        8449107        -4.03%
BenchmarkFmtManyArgs               1494           1436           -3.88%
BenchmarkGobDecode                 10431051       10032606       -3.82%
BenchmarkFannkuch11                2591306713     2517400464     -2.85%
BenchmarkTimeParse                 361            371            +2.77%
BenchmarkJSONDecode                70620492       68830357       -2.53%
BenchmarkRegexpMatchMedium_1K      54693          53343          -2.47%
BenchmarkTemplate                  90008879       91929940       +2.13%
BenchmarkTimeFormat                380            387            +1.84%
BenchmarkRegexpMatchEasy1_32       111            113            +1.80%
BenchmarkJSONEncode                21359159       21007583       -1.65%
BenchmarkRegexpMatchEasy1_1K       603            613            +1.66%
BenchmarkRegexpMatchEasy0_32       127            129            +1.57%
BenchmarkFmtFprintfIntInt          399            393            -1.50%
BenchmarkRegexpMatchEasy0_1K       373            378            +1.34%

Change-Id: I78e297161026f8b5cc7507c965fd3e486f81ed29
Reviewed-on: https://go-review.googlesource.com/8980
Reviewed-by: Austin Clements <austin@google.com>
2015-04-20 21:39:06 +00:00
Russ Cox
181e26b9fa runtime: replace func-based write barrier skipping with type-based
This CL revises CL 7504 to use explicitly uintptr types for the
struct fields that are going to be updated sometimes without
write barriers. The result is that the fields are now updated *always*
without write barriers.

This approach has two important properties:

1) Now the GC never looks at the field, so if the missing reference
could cause a problem, it will do so all the time, not just when the
write barrier is missed at just the right moment.

2) Now a write barrier never happens for the field, avoiding the
(correct) detection of inconsistent write barriers when GODEBUG=wbshadow=1.

Change-Id: Iebd3962c727c0046495cc08914a8dc0808460e0e
Reviewed-on: https://go-review.googlesource.com/9019
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-20 20:20:09 +00:00
Ian Lance Taylor
357a013060 runtime: save registers in linux/{386,amd64} lib entry point
The callee-saved registers must be saved because for the c-shared case
this code is invoked from C code in the system library, and that code
expects the registers to be saved.  The tests were passing because in
the normal case the code calls a cgo function that naturally saves
callee-saved registers anyhow.  However, it fails when the code takes
the non-cgo path.

Change-Id: I9c1f5e884f5a72db9614478049b1863641c8b2b9
Reviewed-on: https://go-review.googlesource.com/9114
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-20 18:09:41 +00:00
Ian Lance Taylor
725aa3451a runtime: no deadlock error if buildmode=c-archive or c-shared
Change-Id: I4ee6dac32bd3759aabdfdc92b235282785fbcca9
Reviewed-on: https://go-review.googlesource.com/9083
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-20 17:31:44 +00:00
Ian Lance Taylor
9c1868d06d runtime: add -buildmode=c-archive/c-shared support for linux/386
Change-Id: I87147ca6bb53e3121cc4245449c519509f107638
Reviewed-on: https://go-review.googlesource.com/9009
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-17 19:31:37 +00:00
Russ Cox
8e5346571c runtime: leave gccheckmark testing off by default
It's not helping anymore, and it's fooling people who try to
understand performance (like me).

Change-Id: I133a644acae0ddf1bfa17c654cdc01e2089da963
Reviewed-on: https://go-review.googlesource.com/9018
Reviewed-by: Austin Clements <austin@google.com>
2015-04-17 19:29:04 +00:00
Austin Clements
c1c667542c runtime: fix dangling pointer in readyExecute
readyExecute passes a closure to mcall that captures an argument to
readyExecute. Since mcall is marked noescape, this closure lives on
the stack of the calling goroutine. However, the closure puts the
calling goroutine on the run queue (and switches to a new
goroutine). If the calling goroutine gets scheduled before the mcall
returns, this stack-allocated closure will become invalid while it's
still executing. One consequence we've observed is that the
captured gp variable can get overwritten before the call to
execute(gp), causing execute(gp) to segfault.

Fix this by passing the currently captured gp variable through a field
in the calling goroutine's g struct so that the func is no longer a
closure.
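
A sketch of the shape of the fix (the readyg field and helper names are
assumptions for illustration):

	// Stash gp in the calling G so mcall gets a plain function
	// and nothing it needs lives on the soon-stale stack.
	func readyExecute(gp *g) {
		getg().readyg = gp
		mcall(readyExecuteM)
	}

	func readyExecuteM(callingG *g) {
		gp := callingG.readyg
		callingG.readyg = nil
		ready(callingG) // put the caller on the run queue
		execute(gp)     // switch to the target goroutine
	}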

To prevent problems like this in the future, this change also removes
the go:noescape annotation from mcall. Due to a compiler bug, this
will currently cause a func closure passed to mcall to be implicitly
allocated rather than refusing the implicit allocation. However, this
is okay because there are no other closures passed to mcall right now
and the compiler bug will be fixed shortly.

Fixes #10428.

Change-Id: I49b48b85de5643323b89e9eaa4df63854e968c32
Reviewed-on: https://go-review.googlesource.com/8866
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-17 17:59:14 +00:00
Dave Cheney
7ae9d06880 runtime/pprof: disable TestTraceStressStartStop
Updates #10476

Change-Id: Ic4414f669104905c6004835be5cf0fa873553ea6
Reviewed-on: https://go-review.googlesource.com/8962
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-17 14:54:25 +00:00
David Crawshaw
c8aba85e4a runtime: export main.main for android
Previously we started the Go runtime from a JNI function call, which
eventually called the program's main function. Now the runtime is
initialized by an ELF initialization function as a c-shared library,
and the program's main function is not called. So now we export main
so it can be called from JNI.

This is necessary for all-Go apps because unlike a normal shared
library, the program loading the library is not written by or known
to the programmer. As far as they are concerned, the .so is
everything. In fact the same code is compiled for iOS as a normal Go
program.

Change-Id: I61c6a92243240ed229342362231b1bfc7ca526ba
Reviewed-on: https://go-review.googlesource.com/9015
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-04-17 12:11:04 +00:00
David Crawshaw
5da1c254d5 runtime: do not run main when buildmode=c-shared
Change-Id: Ie7f85873978adf3fd5c739176f501ca219592824
Reviewed-on: https://go-review.googlesource.com/9011
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-17 11:31:01 +00:00
Russ Cox
6a2b0c0b6d runtime: delete cgo_allocate
This memory is untyped and can't be used anymore.
The next version of SWIG won't need it.

Change-Id: I592b287c5f5186975ee09a9b28d8efe3b57134e7
Reviewed-on: https://go-review.googlesource.com/8956
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-17 01:30:47 +00:00
David Crawshaw
5b72b8c7a3 runtime: aeshash stubs for arm64
For some reason the absence of an implementation does not stop arm64
binaries from being built. However, it does surface with
-buildmode=c-archive.

Change-Id: Ic0db5fd8fb4fe8252b5aa320818df0c7aec3db8f
Reviewed-on: https://go-review.googlesource.com/8989
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-16 19:49:31 +00:00
David Crawshaw
e8b7133e9b runtime: darwin/arm64 c-archive entry point
Change-Id: Ib227aa3e14d01a0ab1ad9e53d107858e045d1c42
Reviewed-on: https://go-review.googlesource.com/8984
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-16 18:56:54 +00:00
David Crawshaw
9cde36be54 runtime/cgo: enable arm64 EXC_BAD_ACCESS handler
Change-Id: I8e912ff9327a4163b63b8c628aa3546e86ddcc02
Reviewed-on: https://go-review.googlesource.com/8983
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-04-16 18:00:57 +00:00
Shenghou Ma
4a71b91d29 runtime: darwin/arm64 support
Change-Id: I3b3f80791a1db4c2b7318f81a115972cd2237f03
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/8782
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-16 13:01:19 +00:00
Shenghou Ma
828de09f8b runtime/cgo: darwin/arm64 support
Fixes #10116.

Change-Id: I3b3f80791a1db4c2b7318f81a115972cd2237f05
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/8784
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-16 12:50:49 +00:00
Michael Hudson-Doyle
f616af23e0 cmd/6l: call runtime.addmoduledata from .init_array
Change-Id: I09e84161d106960a69972f5fc845a1e40c28e58f
Reviewed-on: https://go-review.googlesource.com/8331
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-15 23:54:20 +00:00
Josh Bleecher Snyder
7e0c11c32f cmd/6g, runtime: improve duffzero throughput
It is faster to execute

	MOVQ AX,(DI)
	MOVQ AX,8(DI)
	MOVQ AX,16(DI)
	MOVQ AX,24(DI)
	ADDQ $32,DI

than

	STOSQ
	STOSQ
	STOSQ
	STOSQ

However, in order to be able to jump into
the middle of a block of MOVQs, the call
site needs to pre-adjust DI.

If we're clearing a small area, the cost
of that DI pre-adjustment isn't repaid.

This CL switches the DUFFZERO implementation
to use a hybrid strategy, in which small
clears use STOSQ as before, but large clears
use mostly MOVQ/ADDQ blocks.

benchmark                 old ns/op     new ns/op     delta
BenchmarkClearFat8        0.55          0.55          +0.00%
BenchmarkClearFat12       0.82          0.83          +1.22%
BenchmarkClearFat16       0.55          0.55          +0.00%
BenchmarkClearFat24       0.82          0.82          +0.00%
BenchmarkClearFat32       2.20          1.94          -11.82%
BenchmarkClearFat40       1.92          1.66          -13.54%
BenchmarkClearFat48       2.21          1.93          -12.67%
BenchmarkClearFat56       3.03          2.20          -27.39%
BenchmarkClearFat64       3.26          2.48          -23.93%
BenchmarkClearFat72       3.57          2.76          -22.69%
BenchmarkClearFat80       3.83          3.05          -20.37%
BenchmarkClearFat88       4.14          3.30          -20.29%
BenchmarkClearFat128      5.54          4.69          -15.34%
BenchmarkClearFat256      9.95          9.09          -8.64%
BenchmarkClearFat512      18.7          17.9          -4.28%
BenchmarkClearFat1024     36.2          35.4          -2.21%

Change-Id: Ic786406d9b3cab68d5a231688f9e66fcd1bd7103
Reviewed-on: https://go-review.googlesource.com/2585
Reviewed-by: Keith Randall <khr@golang.org>
2015-04-15 19:17:07 +00:00
Michael Hudson-Doyle
ab4df700b8 runtime: merge slice and sliceStruct
By removing type slice, renaming type sliceStruct to type slice and
whacking until it compiles.

Has a pleasing net reduction of conversions.

Fixes #10188

Change-Id: I77202b8df637185b632fd7875a1fdd8d52c7a83c
Reviewed-on: https://go-review.googlesource.com/8770
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-15 16:59:49 +00:00
Dave Cheney
e629cd0f88 runtime: mark all runtime.cputicks implementations NOSPLIT
Fixes #10450

runtime.cputicks is called from runtime.exitsyscall and must not
split the stack. cputicks is implemented in several ways and the
NOSPLIT annotation was missing from a few of these.

Change-Id: I5cbbb4e5888c5d298fe2fef240782d0e49f59af8
Reviewed-on: https://go-review.googlesource.com/8939
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
2015-04-15 09:22:15 +00:00
Alex Brainman
9402e49450 runtime: really pass return value to Windows in externalthreadhandler
When Windows calls externalthreadhandler it expects to receive the
return value in AX. We don't set AX anywhere. Change that.
Store ctrlhandler1 and profileloop1 return values into AX before
returning from externalthreadhandler.

Fixes #10215.

Change-Id: Ied04542cc3ebe7d4a26660e970f9f78098143591
Reviewed-on: https://go-review.googlesource.com/8901
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-15 05:03:42 +00:00
Austin Clements
a23a341e10 runtime: make time slice a const
A G will be preempted if it runs for 10ms without blocking. Currently
this constant is hard-coded in retake. Move it to a global const.
We'll use the time slice length in scheduling background GC.
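
In code form, the CL introduces a constant along these lines:

	// A goroutine is preempted if it runs for 10ms without
	// blocking.
	const forcePreemptNS = 10 * 1000 * 1000 // 10ms

	// retake then tests (sketch):
	//	if pd.schedwhen+forcePreemptNS <= now {
	//		preemptone(p)
	//	}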

Change-Id: I79a979948af2fad3afe5df9d4af4062f166554b7
Reviewed-on: https://go-review.googlesource.com/8838
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-14 22:06:32 +00:00
Austin Clements
69001e404e runtime: fix freed page accounting in mHeap_ReclaimList
mHeap_ReclaimList is asked to reclaim at least npages pages, but it
counts the number of spans reclaimed, not the number of pages
reclaimed. The number of spans reclaimed is strictly larger than the
number of pages, so this is not strictly wrong, but it is forcing more
reclamation than was intended by the caller, which delays large
allocations.

Fix this by increasing the count by the number of pages in the swept
span, rather than just increasing it by 1.
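
A sketch of the corrected accounting (helper names assumed):

	// Reclaim at least npages pages by sweeping spans.
	func reclaimList(npages uintptr) {
		var n uintptr
		for n < npages {
			s := nextSweepableSpan()
			if s == nil {
				break
			}
			if sweep(s) {
				// Was n++, which under-counted progress
				// toward the page goal and forced extra
				// reclamation.
				n += s.npages
			}
		}
	}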

Fixes #9048.

Change-Id: I5ae364a9837a6012e68fcd431bba000340cfd50c
Reviewed-on: https://go-review.googlesource.com/8920
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-14 20:55:14 +00:00
Austin Clements
bedb6f8aef runtime: remove unnecessary traceNextGC
Commit d7e0ad4 removed the next_gc manipulation from mSpan_Sweep, but
left in the traceNextGC() for recording the updated next_gc
value. Remove this now unnecessary call.

Change-Id: I28e0de071661199be9810d7bdcc81ce50b5a58ae
Reviewed-on: https://go-review.googlesource.com/8894
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-14 20:54:23 +00:00
David Crawshaw
3b22ffc07e runtime: make cgocallback wait on package init
With the new buildmodes c-archive and c-shared, it is possible for a
cgo call to come in early in the lifecycle of a Go program. Calls
before the runtime has been initialized are caught by
_cgo_wait_runtime_init_done. However a call can come in after the
runtime has initialized, but before the program's package init
functions have finished running.

To avoid this, cgocallback checks m.ncgo to see if we are on a thread
running Go. If not, we may be on a foreign thread, and cgocallback
blocks until main_init is complete.
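
A sketch of the check on callback entry (the init-done signal is shown
as a channel; names assumed):

	var main_init_done chan bool // closed once main_init finishes

	func cgocallbackg() {
		gp := getg()
		if gp.m.ncgo == 0 {
			// A foreign thread calling in before package
			// init has finished; block until it completes.
			<-main_init_done
		}
		// ... run the Go callback ...
	}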

Change-Id: I7a9f137fa2a40c322a0b93764261f9aa17fcf5b8
Reviewed-on: https://go-review.googlesource.com/8897
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
2015-04-14 13:39:02 +00:00
David Crawshaw
cea272de30 runtime: rename close to closefd
Avoids shadowing the builtin channel close function.

Change-Id: I7a729b0937c8248fe27222be61318a88db995eee
Reviewed-on: https://go-review.googlesource.com/8898
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
2015-04-14 12:31:29 +00:00
Srdjan Petrovic
d1eee2cebf runtime: shared library init support for android/arm.
Follows http://golang.org/cl/8454, a similar CL for arm architectures.
This CL involves android-specific changes, namely, synthesizing
argv/auxv, as android doesn't provide those to the init functions.

This code is based on crawshaw@ android code in golang.org/x/mobile.

Change-Id: I32364efbb2662e80270a99bd7dfb1d0421b5417d
Reviewed-on: https://go-review.googlesource.com/8457
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-13 21:53:15 +00:00
Srdjan Petrovic
93644c9118 runtime: shared library runtime init for arm
Adds the runtime initialization flow for arm akin to amd64.
In particular, we use the library initialization entry point to:
    - create a new OS thread and run the "regular" runtime init stack on
      that thread
    - return immediately from the main (i.e., loader) thread
    - at the first CGO invocation, we wait for the runtime initialization
      to complete.

Verified to work on a Raspberry Pi and an Android phone.

Change-Id: I32f39228ae30a03ce9569287f234b305790fecf6
Reviewed-on: https://go-review.googlesource.com/8455
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Srdjan Petrovic <spetrovic@google.com>
2015-04-13 18:58:18 +00:00
Srdjan Petrovic
a888fcf7a7 runtime: remove runtime wait/notify from ppc64x architectures.
Related to issue #10410

For some reason, any non-trivial code in _cgo_wait_runtime_init_done
(even fprintf()) will crash that call.

If anybody has any guess why this is happening, please let me know!

For now, I'm clearing the functions for ppc64, as they are currently
unused.

Change-Id: I1b11383aaf4f9f9a16f1fd6606842cfeedc9f0b3
Reviewed-on: https://go-review.googlesource.com/8766
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Srdjan Petrovic <spetrovic@google.com>
2015-04-13 17:21:04 +00:00
David Crawshaw
989f0ee80a runtime/cgo: EXC_BAD_ACCESS handler for arm64
Change-Id: Ia9ff9c0d381fad43fc5d3e5972dd6e66503733a5
Reviewed-on: https://go-review.googlesource.com/8815
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-13 12:08:37 +00:00
David Crawshaw
0a81d31b66 runtime/pprof: skip fork test on darwin/arm64
Just like darwin/arm.

Change-Id: Ic75927bd6457d37cda7dd8279fd9b4cd52edc1d1
Reviewed-on: https://go-review.googlesource.com/8813
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-13 11:58:03 +00:00
David Crawshaw
7db8835a50 runtime/debug: disable arm64 test for issue 9993
Like other arm64 platforms, darwin/arm64 has a different physical
page size to logical page size so it is running into issue 9993. I
hope it can be fixed for Go 1.5, but for now it is demonstrating the
same bug as the other skipped os+arch combinations.

Change-Id: Iedaf9afe56d6954bb4391b6e843d81742a75a00c
Reviewed-on: https://go-review.googlesource.com/8814
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-13 11:57:12 +00:00
David Crawshaw
d6d423b99b runtime: skip fork test on darwin/arm64
Just like darwin/arm.

Change-Id: Ie4998d24b2d891a9f6c8047ec40cd3fdf80622cd
Reviewed-on: https://go-review.googlesource.com/8812
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-13 11:52:05 +00:00
Alex Brainman
d1af6bed84 runtime: move all exception related code into signal_windows.go
Change-Id: I9654a5c85bd9b3ae9c7a9eddaef1ec752f42bd1b
Reviewed-on: https://go-review.googlesource.com/8840
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-13 07:04:21 +00:00
David Crawshaw
6e3a6c4d38 runtime: library entry point for darwin/arm
Tested by using -buildmode=c-archive to generate an archive, add it
to an Xcode project and calling a Go function from an iOS app. (I'm
still investigating proper buildmode tests for all.bash.)

Change-Id: I7890df15246df8e90ad27837b8d64ba2cde409fe
Reviewed-on: https://go-review.googlesource.com/8719
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-12 12:49:49 +00:00
Michael Hudson-Doyle
e1366f94ee reflect, runtime: check equality, not identity, for method names
When dynamically linking Go code, it is no longer safe to assume that
strings that end up in method names are identical if they are equal.

The performance impact seems to be noise:

benchmark                    old ns/op     new ns/op     delta
BenchmarkAssertI2E2          13.3          13.1          -1.50%
BenchmarkAssertE2I           23.5          23.2          -1.28%
BenchmarkAssertE2E2Blank     0.83          0.82          -1.20%
BenchmarkConvT2ISmall        60.7          60.1          -0.99%
BenchmarkAssertI2T           10.2          10.1          -0.98%
BenchmarkAssertE2T           10.2          10.3          +0.98%
BenchmarkConvT2ESmall        56.7          57.2          +0.88%
BenchmarkConvT2ILarge        59.4          58.9          -0.84%
BenchmarkConvI2E             13.0          12.9          -0.77%
BenchmarkAssertI2E           13.4          13.3          -0.75%
BenchmarkConvT2IUintptr      57.9          58.3          +0.69%
BenchmarkConvT2ELarge        55.9          55.6          -0.54%
BenchmarkAssertI2I           23.8          23.7          -0.42%
BenchmarkConvT2EUintptr      55.4          55.5          +0.18%
BenchmarkAssertE2E           6.12          6.11          -0.16%
BenchmarkAssertE2E2          14.4          14.4          +0.00%
BenchmarkAssertE2T2          10.0          10.0          +0.00%
BenchmarkAssertE2T2Blank     0.83          0.83          +0.00%
BenchmarkAssertE2TLarge      10.7          10.7          +0.00%
BenchmarkAssertI2E2Blank     0.83          0.83          +0.00%
BenchmarkConvI2I             23.4          23.4          +0.00%

Change-Id: I0b3dfc314215a4d4e09eec6b42c1e3ebce33eb56
Reviewed-on: https://go-review.googlesource.com/8239
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-11 17:35:44 +00:00
Derek Buitenhuis
53840ad6f1 runtime: Fix GDB integration with Python 2
A similar fix was applied in 545686857b
but another instance of 'pc' was missed.

Also adds a test for the goroutine gdb command.

It currently uses goroutine 2 for the test, since goroutine 1 has
its stack pointer set to 0 for some reason.

Change-Id: I53ca22be6952f03a862edbdebd9b5c292e0853ae
Reviewed-on: https://go-review.googlesource.com/8729
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-10 22:17:59 +00:00
Austin Clements
4b956ae317 runtime: start concurrent GC promptly when we reach its trigger
Currently, when allocation reaches the concurrent GC trigger size, we
start the concurrent collector by ready'ing its G. This simply puts it
on the end of the P's run queue, which means we may not actually start
GC for some time as the current G continues to run and then the P
drains other Gs already on its run queue. Since the mutator can
continue to allocate, the heap can potentially be much larger than we
intended by the time GC actually starts. Furthermore, how much larger
is difficult to predict since it depends on the scheduler.

Fix this by preempting the current G and switching directly to the
concurrent GC G as soon as we reach the trigger heap size.

On the garbage benchmark from the benchmarks subrepo with
GOMAXPROCS=4, this reduces the time from triggering the GC to the
beginning of sweep termination by 10 to 30 milliseconds, which reduces
allocation after the trigger by up to 10MB (a large fraction of the
64MB live heap the benchmark tries to maintain).

One other known source of delay before we "really" start GC is the
sweep finalization performed before sweep termination. This has
similar negative effects on heap size and predictability, but is an
orthogonal problem. This change adds a TODO for this.

Change-Id: I8bae98cb43685c1bf353ff55868e4647e3743c47
Reviewed-on: https://go-review.googlesource.com/8513
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-10 18:22:52 +00:00
Austin Clements
6afb5fa48f runtime: remove GoSched/GoStart trace events around GC
These were appropriate for STW GC, since it interrupted the allocating
Goroutine, but don't apply to concurrent GC, which runs on its own
Goroutine. Forced GC is still STW, but it makes sense to attribute the
GC to the goroutine that called runtime.GC().

Change-Id: If12418ca66dc7e53b8b16025af4e03adb5d9577e
Reviewed-on: https://go-review.googlesource.com/8715
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-10 18:21:52 +00:00
Austin Clements
7c37249639 runtime: make test for freezetheworld more precise
exitsyscallfast checks for freezetheworld, but does so only by
checking if stopwait is positive. This can also happen during
stoptheworld, which is harmless, but confusing. Shortly, it will be
important that we get to the p.status cas even if stopwait is set.

Hence, make this test more specific so it only triggers with
freezetheworld and not other uses of stopwait.

Change-Id: Ibb722cd8360c3ed5a9654482519e3ceb87a8274d
Reviewed-on: https://go-review.googlesource.com/8205
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-10 18:02:55 +00:00
Dmitry Vyukov
089d363a91 runtime: fix tracing of syscall exit
Fix tracing of syscall exit after:
https://go-review.googlesource.com/#/c/7504/

Change-Id: Idcde2aa826d2b9a05d0a90a80242b6bfa78846ab
Reviewed-on: https://go-review.googlesource.com/8728
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-10 17:39:06 +00:00
Michael Hudson-Doyle
a1f57598cc runtime, cmd/internal/ld: rename themoduledata to firstmoduledata
'themoduledata' doesn't really make sense now that we support multiple
moduledata objects.

Change-Id: I8263045d8f62a42cb523502b37289b0fba054f62
Reviewed-on: https://go-review.googlesource.com/8521
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-10 05:11:49 +00:00
Michael Hudson-Doyle
fae4a128cb runtime, reflect: support multiple moduledata objects
This changes all the places that consult themoduledata to consult a
linked list of moduledata objects, as will be necessary for
-linkshared to work.

Obviously, as there is as yet no way of adding moduledata objects to
this list, all this change achieves right now is wasting a few
instructions here and there.

Change-Id: I397af7f60d0849b76aaccedf72238fe664867051
Reviewed-on: https://go-review.googlesource.com/8231
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-10 04:51:42 +00:00
Josh Bleecher Snyder
969f10140c runtime: fix arm64 build
Broken by CL 8541.

Change-Id: Ie2e89a22b91748e82f7bc4723660a24ed4135687
Reviewed-on: https://go-review.googlesource.com/8734
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-10 02:29:01 +00:00
Austin Clements
cb10ff1ef9 runtime: report next_gc for initial heap size in gctrace
Currently, the initial heap size reported in the gctrace line is the
heap_live right before sweep termination. However, we triggered GC
when heap_live reached next_gc, and there may have been significant
allocation between that point and the beginning of sweep
termination. Ideally these would be essentially the same, but
currently there's scheduler delay when readying the GC goroutine as
well as delay from background sweep finalization.

We should fix this delay, but in the mean time, to give the user a
better idea of how much the heap grew during the whole of garbage
collection, report the trigger rather than what the heap size happened
to be after the garbage collector finished rolling out of bed. This
will also be more useful for heap growth plots.

Change-Id: I08476b9fbcfb2de90592405e9c9f434dfb9eb1f8
Reviewed-on: https://go-review.googlesource.com/8512
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-09 22:18:06 +00:00
David Crawshaw
d1b1eee280 runtime: add isarchive, set by the linker
According to Go execution modes, a Go program compiled with
-buildmode=c-archive has a main function, but it is ignored on run.
This gives the runtime the information it needs not to run the main.

I have this working with pending linker changes on darwin/amd64.

Change-Id: I49bd7d65aa619ec847c464a872afa5deea7d4d30
Reviewed-on: https://go-review.googlesource.com/8701
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-09 20:02:02 +00:00
Dave Cheney
ee349b5d77 runtime: add arm64 runtime.cmpstring and bytes.Compare
Add arm64 assembly implementation of runtime.cmpstring and bytes.Compare.

benchmark                                old ns/op     new ns/op     delta
BenchmarkCompareBytesEqual               98.0          27.5          -71.94%
BenchmarkCompareBytesToNil               9.38          10.0          +6.61%
BenchmarkCompareBytesEmpty               13.3          10.0          -24.81%
BenchmarkCompareBytesIdentical           98.0          27.5          -71.94%
BenchmarkCompareBytesSameLength          43.3          16.3          -62.36%
BenchmarkCompareBytesDifferentLength     43.4          16.3          -62.44%
BenchmarkCompareBytesBigUnaligned        6979680       1360979       -80.50%
BenchmarkCompareBytesBig                 6915995       1381979       -80.02%
BenchmarkCompareBytesBigIdentical        6781440       1327304       -80.43%

benchmark                             old MB/s     new MB/s     speedup
BenchmarkCompareBytesBigUnaligned     150.23       770.46       5.13x
BenchmarkCompareBytesBig              151.62       758.76       5.00x
BenchmarkCompareBytesBigIdentical     154.63       790.01       5.11x

* Note: the machine we are benchmarking on has some issues. What is
clear is that, compared to a few days ago, the old MB/s value has
increased from ~115 to 150. I'm less certain about the new MB/s
number, which used to be close to 1 GB/s.

Change-Id: I4f31b2c7a06296e13912aacc958525632cb0450d
Reviewed-on: https://go-review.googlesource.com/8541
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-09 14:49:31 +00:00
Alex Brainman
6e774faed7 runtime: make windows exception handler code arch independent
Mainly this is a simple copy. But I had to change the amd64
lastcontinuehandler return value from uint32 to int32.
I don't remember how it came to be uint32, but the new
int32 matches the Windows documentation (LONG) better.
I don't think it matters one way or the other.

Change-Id: I6935224a2470ad6301e27590f2baa86c13bbe8d5
Reviewed-on: https://go-review.googlesource.com/8686
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-09 09:55:38 +00:00
Alex Brainman
414444d416 runtime: do not calculate asmstdcall address every time we make syscall
Change-Id: If3c8c9035e12d41647ae4982883f6a979313ea9d
Reviewed-on: https://go-review.googlesource.com/8682
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-09 04:26:44 +00:00
David Crawshaw
c844bf4cfc runtime: fix darwin/386, darwin/arm builds
In cl/8652 I broke darwin/arm and darwin/386 because I removed the *g
parameter, which they both expect and use. This CL adjusts both ports
to look for g0 in m, just as darwin/amd64 does.

Tested on darwin{386,arm,amd64}.

Change-Id: Ia56f3d97e126b40d8bbd2e8f677b008e4a1badad
Reviewed-on: https://go-review.googlesource.com/8666
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-09 01:36:21 +00:00
Alex Brainman
e0d9342da7 runtime: use (*context) ip, setip, sp and setsp everywhere on windows
Also move dumpregs into defs_windows_*.go.

Change-Id: Ic077d7dbb133c7b812856e758d696d6fed557afd
Reviewed-on: https://go-review.googlesource.com/4650
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-09 00:57:28 +00:00
David Crawshaw
b0a85f5d93 runtime: darwin/amd64 library entry point
This is a practice run for darwin/arm.

Similar to the linux/amd64 shared library entry point. With several
pending linker changes I am successfully using this to implement
-buildmode=c-archive on darwin/amd64 with external linking.

The same entry point can be reused to implement -buildmode=c-shared
on darwin/amd64, however that will require further ld changes to
remove all text relocations.

One extra runtime change will follow this. According to the Go
execution modes document, -buildmode=c-archive should ignore the Go
main function. Right now it is being executed (and the process exits
if it doesn't block). I'm still searching for the right way to do
this.

Change-Id: Id97901ddd4d46970996f222bd79731dabff66a3d
Reviewed-on: https://go-review.googlesource.com/8652
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-08 21:53:52 +00:00
Michael Hudson-Doyle
3a84e3305b runtime, cmd/internal/ld: initialize themoduledata slices directly
This CL is quite conservative in some ways.  It continues to define
symbols that have no real purpose (e.g. epclntab).  These could be
deleted if there is no concern that external tools might look for them.

It would also now be possible to make some changes to the pcln data but
I get the impression that would definitely require some thought and
discussion.

Change-Id: Ib33cde07e4ec38ecc1d6c319a10138c9347933a3
Reviewed-on: https://go-review.googlesource.com/7616
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-08 16:20:57 +00:00
Michael Matloob
a173357cd5 runtime: fix return type for bsdthread_register in comments
The return type for bsdthread_register is int32. See
runtime/os_darwin.go.

This change also rewrites declaration comments for go functions to
use go syntax and fixes vet errors in sys_darwin_amd64.s.

Change-Id: I7482105f7562929e0ede30099efac9e76babd8a3
Reviewed-on: https://go-review.googlesource.com/3260
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-08 14:13:53 +00:00
Shenghou Ma
d0b62d8bfa runtime: linux/arm64 cgo support
Change-Id: I309e3df7608b9eef9339196fdc50dedf5f9439f3
Reviewed-on: https://go-review.googlesource.com/8450
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
2015-04-08 09:08:27 +00:00
Shenghou Ma
0accc80fbb runtime/cgo: linux/arm64 cgo support
Change-Id: I309e3df7608b9eef9339196fdc50dedf5f9439f2
Reviewed-on: https://go-review.googlesource.com/8439
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
2015-04-08 09:08:12 +00:00
Russ Cox
92c826b1b2 cmd/internal/gc: inline runtime.getg
This more closely restores what the old C runtime did.
(In C, g was an 'extern register' with the same effective
implementation as in this CL.)

On a late 2012 MacBookPro10,2, best of 5 old vs best of 5 new:

benchmark                          old ns/op      new ns/op      delta
BenchmarkBinaryTree17              4981312777     4463426605     -10.40%
BenchmarkFannkuch11                3046495712     3006819428     -1.30%
BenchmarkFmtFprintfEmpty           89.3           79.8           -10.64%
BenchmarkFmtFprintfString          284            262            -7.75%
BenchmarkFmtFprintfInt             282            262            -7.09%
BenchmarkFmtFprintfIntInt          480            448            -6.67%
BenchmarkFmtFprintfPrefixedInt     382            358            -6.28%
BenchmarkFmtFprintfFloat           529            486            -8.13%
BenchmarkFmtManyArgs               1849           1773           -4.11%
BenchmarkGobDecode                 12835963       11794385       -8.11%
BenchmarkGobEncode                 10527170       10288422       -2.27%
BenchmarkGzip                      436109569      438422516      +0.53%
BenchmarkGunzip                    110121663      109843648      -0.25%
BenchmarkHTTPClientServer          81930          85446          +4.29%
BenchmarkJSONEncode                24638574       24280603       -1.45%
BenchmarkJSONDecode                93022423       85753546       -7.81%
BenchmarkMandelbrot200             4703899        4735407        +0.67%
BenchmarkGoParse                   5319853        5086843        -4.38%
BenchmarkRegexpMatchEasy0_32       151            151            +0.00%
BenchmarkRegexpMatchEasy0_1K       452            453            +0.22%
BenchmarkRegexpMatchEasy1_32       131            132            +0.76%
BenchmarkRegexpMatchEasy1_1K       761            722            -5.12%
BenchmarkRegexpMatchMedium_32      228            224            -1.75%
BenchmarkRegexpMatchMedium_1K      63751          64296          +0.85%
BenchmarkRegexpMatchHard_32        3188           3238           +1.57%
BenchmarkRegexpMatchHard_1K        95396          96756          +1.43%
BenchmarkRevcomp                   661587262      687107364      +3.86%
BenchmarkTemplate                  108312598      104008540      -3.97%
BenchmarkTimeParse                 453            459            +1.32%
BenchmarkTimeFormat                475            441            -7.16%

The garbage benchmark from the benchmarks subrepo gets 2.6% faster as well.

Change-Id: I320aeda332db81012688b26ffab23f6581c59cfa
Reviewed-on: https://go-review.googlesource.com/8460
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-04-07 14:26:47 +00:00
David Crawshaw
ede863c673 runtime: add _rt0_arm_android_lib
At the moment this function does nothing, runtime initialization is
still done in android.c:init_go_runtime.

Fixes #10358

Change-Id: I1d762383ba61efcbcf0bbc7c77895f5c1dbf8968
Reviewed-on: https://go-review.googlesource.com/8510
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-04-06 22:54:52 +00:00
Austin Clements
8c3fc088fb runtime: report marked heap size in gctrace
When the gctrace GODEBUG option is enabled, it will now report three
heap sizes: the heap size at the beginning of the GC cycle, the heap
size at the end of the GC cycle before sweeping, and marked heap size,
which is the amount of heap that will be retained until the next GC
cycle.

Change-Id: Ie13f8a6d5c609bc9cc47c7555960ab55b37b5f1c
Reviewed-on: https://go-review.googlesource.com/8430
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-06 21:28:23 +00:00
Austin Clements
6d12b1780e runtime: make next_gc be heap size to trigger GC at
In the STW collector, next_gc was both the heap size to trigger GC at
as well as the goal heap size.

Early in the concurrent collector's development, next_gc was the goal
heap size, but was also used as the heap size to trigger GC at. This
meant we always overshot the goal because of allocation during
concurrent GC.

Currently, next_gc is still the goal heap size, but we trigger
concurrent GC at 7/8*GOGC heap growth. This complicates
shouldtriggergc, but was necessary because of the incremental
maintenance of next_gc.

Now we simply compute next_gc for the next cycle during mark
termination. Hence, it's now easy to take the simpler route and
redefine next_gc as the heap size at which the next GC triggers. We
can directly compute this with the 7/8 backoff during mark termination
and shouldtriggergc can simply test if the live heap size has grown
over the next_gc trigger.

This will also simplify later changes once we start setting next_gc in
more sophisticated ways.
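
A sketch of the simplified computation (names follow the message; the
exact expressions in the CL may differ):

	// At mark termination, next_gc becomes the trigger itself:
	// 7/8 of the GOGC growth past the marked heap.
	func computeTrigger(heapMarked, gcpercent uint64) uint64 {
		goalGrowth := heapMarked * gcpercent / 100
		return heapMarked + goalGrowth*7/8
	}

	// shouldtriggergc then reduces to a comparison.
	func shouldTriggerGC(heapLive, nextGC uint64) bool {
		return heapLive >= nextGC
	}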

Change-Id: I872be4ae06b4f7a0d7f7967360a054bd36b90eea
Reviewed-on: https://go-review.googlesource.com/8420
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-06 21:28:18 +00:00
Austin Clements
d7e0ad4b82 runtime: introduce heap_live; replace use of heap_alloc in GC
Currently there are two main consumers of memstats.heap_alloc:
updatememstats (aka ReadMemStats) and shouldtriggergc.

updatememstats recomputes heap_alloc from the ground up, so we don't
need to keep heap_alloc up to date for it. shouldtriggergc wants to
know how many bytes were marked by the previous GC plus how many bytes
have been allocated since then, but this *isn't* what heap_alloc
tracks. heap_alloc also includes objects that are not marked and
haven't yet been swept.

Introduce a new memstat called heap_live that actually tracks what
shouldtriggergc wants to know and stop keeping heap_alloc up to date.

Unlike heap_alloc, heap_live follows a simple sawtooth that drops
during each mark termination and increases monotonically between GCs.
heap_alloc, on the other hand, has much more complicated behavior: it
may drop during sweep termination, slowly decreases from background
sweeping between GCs, is roughly unaffected by allocation as long as
there are unswept spans (because we sweep and allocate at the same
rate), and may go up after background sweeping is done depending on
the GC trigger.

heap_live simplifies computing next_gc and using it to figure out when
to trigger garbage collection. Currently, we guess next_gc at the end
of a cycle and update it as we sweep and get a better idea of how much
heap was marked. Now, since we're directly tracking how much heap is
marked, we can directly compute next_gc.

This also corrects bugs that could cause us to trigger GC early.
Currently, in any case where sweep termination actually finds spans to
sweep, heap_alloc is an overestimation of live heap, so we'll trigger
GC too early. heap_live, on the other hand, is unaffected by sweeping.
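
The sawtooth reduces to two updates (a sketch of the runtime-internal
fields):

	var heap_live, heapMarked uint64 // sketch of runtime fields

	// Allocation only ever moves heap_live up...
	func noteAlloc(size uintptr) { heap_live += uint64(size) }

	// ...and mark termination drops it back to the marked heap,
	// unaffected by how sweeping later proceeds.
	func noteMarkTermination() { heap_live = heapMarked }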

Change-Id: I1f96807b6ed60d4156e8173a8e68745ffc742388
Reviewed-on: https://go-review.googlesource.com/8389
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-06 21:28:13 +00:00
Austin Clements
50a66562a0 runtime: track heap bytes marked by GC
This tracks the number of heap bytes marked by a GC cycle. We'll use
this information to precisely trigger the next GC cycle.

Currently this work is counted in each gcWork, and dispose atomically
aggregates it into a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.

Change-Id: I1bc377cb2e802ef61c2968602b63146d52e7f5db
Reviewed-on: https://go-review.googlesource.com/8388
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-06 21:28:07 +00:00
Ian Lance Taylor
32dbe07621 runtime: fix arm, arm64, ppc64 builds (I hope)
I guess we need more builders.

Change-Id: I309e3df7608b9eef9339196fdc50dedf5f9422e4
Reviewed-on: https://go-review.googlesource.com/8434
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-03 05:18:31 +00:00
Srdjan Petrovic
e8694c8196 runtime: initialize shared library at library-load time
This is Part 2 of the change, see Part 1 here: in https://go-review.googlesource.com/#/c/7692/

Suggested by iant@, we use the library initialization entry point to:
    - create a new OS thread and run the "regular" runtime init stack on
      that thread
    - return immediately from the main (i.e., loader) thread
    - at the first CGO invocation, we wait for the runtime initialization
      to complete.

The above mechanism is implemented only on linux_amd64.  Next step is to
support it on linux_arm.  Other platforms don't yet support shared library
compiling/linking, but we intend to use the same strategy there as well.

Change-Id: Ib2c81b1b83bee837134084b75a3beecfb8de6bf4
Reviewed-on: https://go-review.googlesource.com/8094
Run-TryBot: Srdjan Petrovic <spetrovic@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-03 01:24:51 +00:00
Austin Clements
f244a1471d runtime: add cumulative GC CPU % to gctrace line
This tracks both total CPU time used by GC and the total time
available to all Ps since the beginning of the program and uses this
to derive a cumulative CPU usage percent for the gctrace line.
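
The percentage reduces to a simple ratio; a sketch with assumed names:

	// Total CPU available is wall-clock time times GOMAXPROCS.
	func gcCPUPercent(gcCPUNs, elapsedNs int64, gomaxprocs int32) float64 {
		total := float64(elapsedNs) * float64(gomaxprocs)
		return 100 * float64(gcCPUNs) / total
	}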

Change-Id: Ica85372b8dd45f7621909b325d5ac713a9b0d015
Reviewed-on: https://go-review.googlesource.com/8350
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-02 23:37:13 +00:00
Austin Clements
24ee948269 runtime: update gctrace line for new garbage collector
GODEBUG=gctrace=1 turns on a per-GC cycle trace line. The current line
is left over from the STW garbage collector and includes a lot of
information that is no longer meaningful for the concurrent GC and
doesn't include a lot of information that is important.

Replace this line with a new line designed for the new garbage
collector.

This new line is focused more on helping the user understand the
impact of the garbage collector on their program and less on telling
us, the runtime developers, everything that's happening inside
GC. It's designed to fit in 80 columns and intentionally omit some
potentially useful things that were in the old line. We might want a
"verbose" mode that adds information for us.

We'll be able to further simplify the line once we eliminate the STW
around enabling the write barrier. Then we'll have just one STW phase,
one concurrent phase, and one more STW phase, so we'll be able to
reduce the number of times from five to three.

Change-Id: Icc30939fe4576fb4491b4eac811649395727aa2a
Reviewed-on: https://go-review.googlesource.com/8208
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-02 23:37:06 +00:00
Austin Clements
822a24b602 runtime: remove checkgc code from hashmap
Currently hashmap is riddled with code that attempts to force a GC on
the next allocation if checkgc is set. This no longer works as
originally intended with the concurrent collector, and is apparently
no longer used anyway.

Remove checkgc.

Change-Id: Ia6c17c405fa8821dc2e6af28d506c1133ab1ca0c
Reviewed-on: https://go-review.googlesource.com/8355
Reviewed-by: Keith Randall <khr@golang.org>
2015-04-02 15:28:56 +00:00
Austin Clements
6134caf1f9 runtime: improve MemStats comments
This tries to clarify that Alloc and HeapAlloc are tied to how much
freeing has been done by the sweeper.

Change-Id: Id8320074bd75de791f39ec01bac99afe28052d02
Reviewed-on: https://go-review.googlesource.com/8354
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-02 15:28:50 +00:00
Josh Bleecher Snyder
ad3600945a runtime: auto-generate duff routines
This makes it easier to experiment with alternative implementations.

While we're here, update the comments.

No functional changes. Passes toolstash -cmp.

Change-Id: I428535754908f0fdd7cc36c214ddb6e1e60f376e
Reviewed-on: https://go-review.googlesource.com/8310
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-02 02:37:59 +00:00
Michael Hudson-Doyle
67426a8a9e runtime, cmd/internal/ld: change runtime to use a single linker symbol
In preparation for being able to run a go program that has code
in several objects, this changes from having several linker
symbols used by the runtime into having one linker symbol that
points at a structure containing the needed data.  Multiple
object support will construct a linked list of such structures.

A follow up will initialize the slices in the themoduledata
structure directly from the linker but I was aiming for a minimal
diff for now.

Change-Id: I613cce35309801cf265a1d5ae5aaca8d689c5cbf
Reviewed-on: https://go-review.googlesource.com/7441
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-31 22:45:07 +00:00
Austin Clements
a2f3d73fee runtime: improve comment about non-preemption during GC work
Currently, gcDrainN is documented as saying that it must be run on
the system stack. In fact, the problem and solution here are somewhat
subtler. First, it doesn't have to happen on the system stack; it just
has to be non-stoppable (that is, non-preemptible). Second, this isn't
specific to gcDrainN (though gcDrainN is perhaps the most surprising
instance); it's general to anything that uses the gcWork structure.

Move the comment to gcWork and generalize it.

Change-Id: I5277b5abb070e47f8d783bc15a310b379c6adc22
Reviewed-on: https://go-review.googlesource.com/8247
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-31 01:05:38 +00:00
Austin Clements
a4374c1de1 runtime: fix another out of date comment in GC
gcDrain used to be passed a *workbuf to start draining from, but now
it takes a gcWork, which hides whether or not there's an initial
workbuf. Update the comment to match this.

Change-Id: I976b58e5bfebc451cfd4fa75e770113067b5cc07
Reviewed-on: https://go-review.googlesource.com/8246
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-31 01:05:31 +00:00
Lee Packham
c45751e8a5 runtime: allow pointers to strings to be printed
Being able to print pointers to strings means one will be able to
output the result of things like the flag library and other
components that use string pointers.

While here, adjusted the gdb tests to cover plain string pretty
printing as well as pointers to strings. This was exercised via the
map before, but for completeness this ensures it's tested as a unit.

Change-Id: I4926547ae4fa6c85ef74301e7d96d49ba4a7b0c6
Reviewed-on: https://go-review.googlesource.com/8217
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-30 23:59:24 +00:00
Michael Hudson-Doyle
f78dc1dac1 runtime: rename ·main·f to ·mainPC to avoid duplicate symbol
runtime·main·f is normalized by the linker to runtime.main.f, as is
the compiler-generated symbol runtime.main·f.  Change the former to
runtime·mainPC instead.

Fixes #9934

Change-Id: I656a6fa6422d45385fa2cc55bd036c6affa1abfe
Reviewed-on: https://go-review.googlesource.com/8234
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-30 18:52:14 +00:00
David Chase
2270133981 cmd/gc: allocate backing storage for non-escaping interfaces on stack
Extend escape analysis to convT2E and convT2I. If the interface value
does not escape, supply the runtime with a stack buffer for the object
copy.

This is a straight port from .c to .go of Dmitry's patch

Change-Id: Ic315dd50d144d94dd3324227099c116be5ca70b6
Reviewed-on: https://go-review.googlesource.com/8201
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-03-30 16:11:22 +00:00
Austin Clements
9e6f7aac28 runtime: make "write barriers are not allowed" comments more precise
Currently, various functions are marked with the comment

  // May run without a P, so write barriers are not allowed.

However, "running without a P" is ambiguous. We intended these to mean
that m.p may be nil (which is the condition checked by the write
barrier). The comment could also be taken to mean that a
stop-the-world may happen, which is not the case for these functions
because they run in situations where there is in fact a function on
the stack holding a P locally, it just isn't in m.p.

Change these comments to state precisely what we mean, that m.p may be
nil.

Change-Id: I4a4a1d26aebd455e5067540e13b9f96a7482146c
Reviewed-on: https://go-review.googlesource.com/8209
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-30 15:13:53 +00:00
Daniel Theophanes
77f4571f71 runtime: do not use AddVectoredContinueHandler on Windows XP/2003.
When Windows Error Reporting dialog is disabled on amd64
Windows XP or 2003, the continue handler does not fire. Newer
versions work correctly regardless of WER.

Fixes #10162

Change-Id: I84ea36ee188b34d1421a8db6231223cf61b4111b
Reviewed-on: https://go-review.googlesource.com/8165
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2015-03-30 03:37:55 +00:00
Dmitry Vyukov
ca98dd773a runtime/pprof: fix data race in test
rp.Close happened concurrently with rp.Read. Order them.

Fixes #10280

Change-Id: I7b083bcc336d15396c4e42fc4654ba34fad4a4cc
Reviewed-on: https://go-review.googlesource.com/8211
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-03-29 12:24:16 +00:00
Dmitry Vyukov
c61d86af72 os: give race detector chance to override Exit(0)
Racy tests do not currently fail; they just call os.Exit(0).
So if you run go test without -v, you won't even notice.
This was probably introduced with testing.TestMain.

Racy programs do not have the right to finish successfully.

Change-Id: Id133d7424f03d90d438bc3478528683dd02b8846
Reviewed-on: https://go-review.googlesource.com/4371
Reviewed-by: Russ Cox <rsc@golang.org>
2015-03-28 12:42:37 +00:00
Srdjan Petrovic
8da54a4eec cmd: linker changes for shared library initialization
Suggested by iant@, this change:
  - looks for a symbol _rt0_<GOARCH>_<GOOS>_lib,
  - if the symbol is present, adds a new entry into the .init_array ELF
    section that points to the symbol.

The end-effect is that the symbol _rt0_<GOARCH>_<GOOS>_lib will be
invoked as soon as the (ELF) shared library is loaded, which will in turn
initialize the runtime. (To be implemented.)

Change-Id: I99911a180215a6df18f8a18483d12b9b497b48f4
Reviewed-on: https://go-review.googlesource.com/7692
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-27 22:52:10 +00:00
Hyang-Ah Hana Kim
39bc78845b runtime/pprof: fix TestCPUProfileWithFork for GOOS=android.
1) A large allocation in this test caused a crash. This was not
detected by the builder because it runs tests with -test.short.

2) The command "go" for forking doesn't exist in some platforms
including android. This change uses the test binary itself which
is guaranteed to exist.

This change also adds logging of the total samples collected in
TestCPUProfileMultithreaded test that is flaky in android-arm
builder.

Change-Id: I225c6b7877d811edef8b25e7eb00559450640c42
Reviewed-on: https://go-review.googlesource.com/8131
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-03-27 18:07:06 +00:00
Austin Clements
392336f94e runtime: disallow write barriers in handoffp and callees
handoffp by definition runs without a P, so it's not allowed to have
write barriers. It doesn't have any right now, but mark it
nowritebarrier to keep any from creeping in later. handoffp in
turns calls startm, newm, and newosproc, all of which are "below Go"
and make sense to run without a P, so disallow write barriers in these
as well.

For most functions, we've done this because they may race with
stoptheworld() and hence must not have write barriers. For these
functions, it's a little different: the world can't stop while we're
in handoffp, so this race isn't present. But we implement this
restriction with a somewhat broader rule that you can't have a write
barrier without a P. We like this rule because it's simple and means
that our write barriers can depend on there being a P, even though
this rule is actually a little broader than necessary. Hence, even
though there's no danger of the race in these functions, we want to
adhere to the broader rule.

Change-Id: Ie22319c30eea37d703eb52f5c7ca5da872030b88
Reviewed-on: https://go-review.googlesource.com/8130
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-26 20:38:59 +00:00
Shenghou Ma
400f58a010 runtime: don't trigger write barrier in newosproc for nacl
This should fix the intermittent "calling write barrier with mp.p == nil"
failures on the nacl/386 builder.

Change-Id: I34aef5ca75ccd2939e6a6ad3f5dacec64903074e
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/7973
Reviewed-by: Austin Clements <austin@google.com>
2015-03-26 19:58:14 +00:00
Austin Clements
ec2c7e6659 runtime: use uintXX instead of *byte for si_addr on Darwin
Currently, Darwin's siginfo type uses *byte for the si_addr
field. This results in unwanted write barriers in set_sigaddr. It's
also pointless since it never points to anything real and the get/set
methods return/take uintXX and cast it from/to the pointer.

All other arches use a uint type for this field. Change Darwin to
match. This simplifies the get/set methods and eliminates the unwanted
write barriers.
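
A simplified sketch of the shape of the change (the field layout is
illustrative, not Darwin's full siginfo):

    package main

    type siginfo struct {
        // si_addr *byte // old: pointer-typed, so stores emitted write barriers
        si_addr uint64 // new: plain integer, no write barrier on stores
    }

    func (s *siginfo) sigaddr() uint64     { return s.si_addr }
    func (s *siginfo) setSigaddr(v uint64) { s.si_addr = v }

    func main() {
        var si siginfo
        si.setSigaddr(0x1000)
        _ = si.sigaddr()
    }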

Change-Id: Ifdb5646d35e1f2f6808b87a3d59745ec9718add1
Reviewed-on: https://go-review.googlesource.com/8086
Reviewed-by: Austin Clements <austin@google.com>
2015-03-26 16:20:32 +00:00
Austin Clements
9b0ea6aa27 runtime: remove write barrier on G in sighandler
sighandler may run during a stop-the-world without a P, so it's not
allowed to have write barriers. Fix the G write to disable the write
barrier (this is safe because the G is reachable from allgs) and mark
the function nowritebarrier.

Change-Id: I907f05d3829e24eeb15fa4d020598af36710e87e
Reviewed-on: https://go-review.googlesource.com/8020
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-26 15:26:29 +00:00
David Crawshaw
e9d9d0befc runtime, runtime/cgo: make needextram a bool
Also invert it, which means it no longer needs to cross the cgo
package boundary.

Change-Id: I393cd073bda02b591a55d6bc6b8bb94970ea71cd
Reviewed-on: https://go-review.googlesource.com/8082
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-03-26 11:12:25 +00:00
Dave Cheney
e2543ef62c runtime: add runtime.cmpstring and bytes.Compare
Update #10007

Implement runtime.cmpstring and bytes.Compare in asm for arm.

benchmark                                old ns/op     new ns/op     delta
BenchmarkCompareBytesEqual               254           91.4          -64.02%
BenchmarkCompareBytesToNil               41.5          37.6          -9.40%
BenchmarkCompareBytesEmpty               40.7          37.6          -7.62%
BenchmarkCompareBytesIdentical           255           96.3          -62.24%
BenchmarkCompareBytesSameLength          125           60.9          -51.28%
BenchmarkCompareBytesDifferentLength     133           60.9          -54.21%
BenchmarkCompareBytesBigUnaligned        17985879      5669706       -68.48%
BenchmarkCompareBytesBig                 17097634      4926798       -71.18%
BenchmarkCompareBytesBigIdentical        16861941      4389206       -73.97%

benchmark                             old MB/s     new MB/s     speedup
BenchmarkCompareBytesBigUnaligned     58.30        184.95       3.17x
BenchmarkCompareBytesBig              61.33        212.83       3.47x
BenchmarkCompareBytesBigIdentical     62.19        238.90       3.84x

This is a collaboration between Josh Bleecher Snyder and myself.
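
For reference, the semantics being sped up here are just those of the
standard bytes.Compare (unchanged by this CL):

    package main

    import (
        "bytes"
        "fmt"
    )

    func main() {
        fmt.Println(bytes.Compare([]byte("abc"), []byte("abd"))) // -1
        fmt.Println(bytes.Compare([]byte("abc"), []byte("abc"))) // 0
        fmt.Println(bytes.Compare(nil, []byte{}))                // 0: nil and empty are equal
    }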

Change-Id: Ib3944b8c410d0e12135c2ba9459bfe131df48edd
Reviewed-on: https://go-review.googlesource.com/8010
Reviewed-by: Keith Randall <khr@golang.org>
2015-03-25 22:46:39 +00:00
Alex Brainman
2420926a8a runtime: remove obsolete comment
We no longer use SEH to handle Windows exceptions.

Change-Id: I0ac807a0fed7a5b4c745454246764c524460472b
Reviewed-on: https://go-review.googlesource.com/8071
Reviewed-by: Minux Ma <minux@golang.org>
2015-03-25 02:55:56 +00:00
Shenghou Ma
003dccfac4 runtime, syscall: use the new get_random_bytes syscall for NaCl
The SecureRandom named service was removed in
https://codereview.chromium.org/550523002. And the new syscall
was introduced in https://codereview.chromium.org/537543003.

Accepting this will remove support for older versions of
sel_ldr. I've confirmed that both pepper_40 and current
pepper_canary have this syscall.

After this change, we need sel_ldr from pepper_39 or above to
work.

Fixes #9261

Change-Id: I096973593aa302ade61f259a3a71ebc7c1a57913
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/1755
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-03-25 02:07:09 +00:00
Aram Hăvărneanu
41f9c430f3 runtime, syscall: fix Solaris exec tests
Also fixes a long-standing problem in the fork/exec path.

Change-Id: Idec40b1cee0cfb1625fe107db3eafdc0d71798f2
Reviewed-on: https://go-review.googlesource.com/8030
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-03-24 19:51:21 +00:00
David Crawshaw
b8caed823b runtime: initialize extra M for cgo during mstart
Previously the extra m needed for cgo callbacks was created on the
first callback. This works for cgo; however, the cgocallback mechanism
is also borrowed by badsignal, which can run before any cgo calls are
made.

Now we initialize the extra M at runtime startup before any signal
handlers are registered, so badsignal cannot be called until the
extra M is ready.

Updates #10207.

Change-Id: Iddda2c80db6dc52d8b60e2b269670fbaa704c7b3
Reviewed-on: https://go-review.googlesource.com/7978
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
2015-03-24 19:39:46 +00:00
Rick Hudson
546a54bb2e runtime: Remove write barrier on g
There are calls to stdcall when the GC thinks the world is stopped,
and stdcall writes a *g for the CPU profiler. This produces a write
barrier, but the GC is not prepared to deal with write barriers when
it thinks the world is stopped. Since the g is on allg, it does not
need a write barrier to keep it alive, so eliminate the write barrier.

Change-Id: I937633409a66553d7d292d87d7d58caba1fad0b6
Reviewed-on: https://go-review.googlesource.com/7979
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Rick Hudson <rlh@golang.org>
2015-03-24 16:42:39 +00:00
Alex Brainman
9b69196958 runtime: add TestCgoDLLImports
The test is a simple reproduction of issue 9356.

Update #8948.
Update #9356.

Change-Id: Ia77bc36d12ed0c3c4a8b1214cade8be181c9ad55
Reviewed-on: https://go-review.googlesource.com/7618
Reviewed-by: Minux Ma <minux@golang.org>
2015-03-24 05:39:28 +00:00
Shenghou Ma
b6ed943bef runtime: use _main instead of main on windows/386
windows/386 also wants an underscore prefix for external names.
This CL is in preparation for external linking support.

Change-Id: I2d2ea233f976aab3f356f9b508cdd246d5013e2d
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/7282
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2015-03-24 03:23:03 +00:00
Shenghou Ma
6112e6e404 cmd/internal/ld, runtime: record argument size for cgo_dynimport stdcall syscalls
When linking externally, we must link against the implib provided by
mingw, so we must use properly decorated names for stdcalls.

Because the feature is only used in the runtime, I've designed a new decoration
scheme so that we can use the same decorated name for both 386 and amd64.

A stdcall function named FooEx from bar16.dll which takes 3 parameters will be
imported like this:
	//go:cgo_import_dynamic runtime._FooEx FooEx%3 "bar16.dll"
Depending on the size of uintptr, the linker will later transform it to _FooEx@12
or _FooEx@24.

This is in preparation for the next CL, which adds external linking
support for windows/386.
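
A hypothetical helper (not part of the toolchain) that mirrors the
transformation described above:

    package main

    import "fmt"

    // decorate applies the rule described above: a stdcall taking n
    // parameters becomes _Name@(n * sizeof(uintptr)).
    func decorate(name string, params, ptrSize int) string {
        return fmt.Sprintf("_%s@%d", name, params*ptrSize)
    }

    func main() {
        fmt.Println(decorate("FooEx", 3, 4)) // _FooEx@12 on 386
        fmt.Println(decorate("FooEx", 3, 8)) // _FooEx@24 on amd64
    }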

Change-Id: I2d2ea233f976aab3f356f9b508cdd246d5013e2c
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/7163
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-24 03:22:26 +00:00
Michael MacInnis
f7befa43a3 syscall: Add Foreground and Pgid to SysProcAttr
On Unix, when placing a child in a new process group, allow that group
to become the foreground process group. Also, allow a child process to
join a specific process group.

When setting the foreground process group, Ctty is used as the file
descriptor of the controlling terminal. Ctty has been added to the BSD
and Solaris SysProcAttr structures and the handling of Setctty changed
to match Linux.
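
A sketch of how the new fields combine (Unix only; the command and
error handling are illustrative):

    package main

    import (
        "os"
        "os/exec"
        "syscall"
    )

    func main() {
        tty, err := os.Open("/dev/tty")
        if err != nil {
            panic(err)
        }
        cmd := exec.Command("vi", "notes.txt")
        cmd.Stdin, cmd.Stdout, cmd.Stderr = tty, tty, tty
        cmd.SysProcAttr = &syscall.SysProcAttr{
            Setpgid:    true,          // place the child in a new process group
            Foreground: true,          // make that group the foreground group
            Ctty:       int(tty.Fd()), // controlling terminal used for Foreground
        }
        _ = cmd.Run()
    }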

Change-Id: I18d169a6c5ab8a6a90708c4ff52eb4aded50bc8c
Reviewed-on: https://go-review.googlesource.com/5130
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-23 15:35:53 +00:00
Joel Sing
4f35ad6088 runtime: fix return values for open/read/write/close on openbsd/arm
Change-Id: I5b057d16eed1b364e608ff0fd74de323da6492bc
Reviewed-on: https://go-review.googlesource.com/7679
Reviewed-by: Minux Ma <minux@golang.org>
2015-03-21 03:52:42 +00:00
Dave Cheney
98485f5ad4 runtime: fix linux/amd64p32 build
Implement runtime.atomicand8 for amd64p32, which was overlooked
in CL 7861.

Change-Id: Ic7eccddc6fd6c4682cac1761294893928f5428a2
Reviewed-on: https://go-review.googlesource.com/7920
Reviewed-by: Minux Ma <minux@golang.org>
2015-03-21 02:59:43 +00:00
Russ Cox
4224d81fae cmd/internal/gc: inline x := y.(*T) and x, ok := y.(*T)
These can be implemented with just a compare and a move instruction.
Do so, avoiding the overhead of a call into the runtime.

These assertions are a significant cost in Go code that uses interface{}
as a safe alternative to C's void* (or unsafe.Pointer), such as the
current version of the Go compiler.

*T here includes pointer to T but also any Go type represented as
a single pointer (chan, func, map). It does not include [1]*T or struct{*int}.
That requires more work in other parts of the compiler; there is a TODO.
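
For example, the following assertion now compiles to a compare and a
move rather than a runtime call (the type here is illustrative):

    package main

    import "fmt"

    type T struct{ n int }

    func main() {
        var i interface{} = &T{n: 1}
        // *T is pointer-shaped, so this is handled inline.
        if x, ok := i.(*T); ok {
            fmt.Println(x.n)
        }
    }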

Change-Id: I7ff681c20d2c3eb6ad11dd7b3a37b1f3dda23965
Reviewed-on: https://go-review.googlesource.com/7862
Reviewed-by: Rob Pike <r@golang.org>
2015-03-20 20:05:37 +00:00
Austin Clements
653426f08f runtime: exit getfull barrier if there are partial workbufs
Currently, we only exit the getfull barrier if there is work on the
full list, even though the exit path will take work from either the
full or partial list. Change this to exit the barrier if there is work
on either the full or partial lists.

I believe it's currently safe to check only the full list, since
during mark termination there is no reason to put a workbuf on a
partial list. However, checking both is more robust.

Change-Id: Icf095b0945c7cad326a87ff2f1dc49b7699df373
Reviewed-on: https://go-review.googlesource.com/7840
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-20 14:05:11 +00:00
Austin Clements
06de3f52a7 runtime: document subtlety around entering mark termination
The barrier in gcDrain does not account for concurrent gcDrainNs
happening in gchelpwork, so it can actually return while there is
still work being done. It turns out this is okay, but for subtle
reasons involving gcDrainN always being run on the system
stack. Document these reasons.

Change-Id: Ib07b3753cc4e2b54533ab3081a359cbd1c3c08fb
Reviewed-on: https://go-review.googlesource.com/7736
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-20 14:05:05 +00:00
Russ Cox
4d2b3a0b5f runtime: fix arm build
Make mask uint32, and move down one line to match atomic_arm64.go.

Change-Id: I4867de494bc4076b7c2b3bf4fd74aa984e3ea0c8
Reviewed-on: https://go-review.googlesource.com/7854
Reviewed-by: Russ Cox <rsc@golang.org>
2015-03-20 05:00:46 +00:00
Russ Cox
631d6a33bf runtime: implement atomicand8 atomically
We're skating on thin ice, and things are finally starting to melt around here.
(I want to avoid the debugging session that will happen when someone
uses atomicand8 expecting it to be atomic with respect to other operations.)
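
A portable sketch of one way to get an atomic 8-bit AND: a CAS loop on
the containing 32-bit word. The runtime's actual fix is in per-arch
code; this is only a model:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // atomicAnd8 ANDs v into the byte at bit offset shift within *addr,
    // retrying until no other writer intervenes.
    func atomicAnd8(addr *uint32, shift uint, v uint8) {
        for {
            old := atomic.LoadUint32(addr)
            updated := old &^ (uint32(^v) << shift) // clear bits that are 0 in v
            if atomic.CompareAndSwapUint32(addr, old, updated) {
                return
            }
        }
    }

    func main() {
        x := uint32(0xffffffff)
        atomicAnd8(&x, 0, 0x0f)
        fmt.Printf("%#x\n", x) // 0xffffff0f
    }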

Change-Id: I254f1582be4eb1f2d7fbba05335a91c6bf0c7f02
Reviewed-on: https://go-review.googlesource.com/7861
Reviewed-by: Minux Ma <minux@golang.org>
2015-03-20 04:45:29 +00:00
Russ Cox
564eab891a runtime: add GODEBUG=sbrk=1 to bypass memory allocator (and GC)
To reduce lock contention in this mode, this change makes persistent
allocation state per-P, which means at most 64 kB of overhead x
$GOMAXPROCS, which should be completely tolerable.
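
Usage sketch (the binary name is hypothetical):

    GODEBUG=sbrk=1 ./myprog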

Change-Id: I34ca95e77d7e67130e30822e5a4aff6772b1a1c5
Reviewed-on: https://go-review.googlesource.com/7740
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-20 00:02:30 +00:00
Josh Bleecher Snyder
25e793d7ea cmd/internal/gc, runtime: speed up some cases of _, ok := i.(T)
Some type assertions of the form _, ok := i.(T) allow efficient inlining.
Such type assertions commonly show up in type switches.
For example, with this optimization, using 6g, the length of
encoding/binary's intDataSize function shrinks from 2224 to 1728 bytes (-22%).

benchmark                    old ns/op     new ns/op     delta
BenchmarkAssertI2E2Blank     4.67          0.82          -82.44%
BenchmarkAssertE2T2Blank     4.38          0.83          -81.05%
BenchmarkAssertE2E2Blank     3.88          0.83          -78.61%
BenchmarkAssertE2E2          14.2          14.4          +1.41%
BenchmarkAssertE2T2          10.3          10.4          +0.97%
BenchmarkAssertI2E2          13.4          13.3          -0.75%
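
A small example of the form that now inlines (the function and values
are illustrative):

    package main

    import "fmt"

    func describe(i interface{}) string {
        // Blank, comma-ok assertion: eligible for inlining after this CL.
        if _, ok := i.(fmt.Stringer); ok {
            return "has a String method"
        }
        return "plain value"
    }

    func main() {
        fmt.Println(describe(42)) // plain value
    }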

Change-Id: Ie9798c3e85432bb8e0f2c723afc376e233639df7
Reviewed-on: https://go-review.googlesource.com/7697
Reviewed-by: Keith Randall <khr@golang.org>
2015-03-19 16:20:32 +00:00
Austin Clements
cadd4f81a8 runtime: combine gcWorkProducer into gcWork
The distinction between gcWorkProducer and gcWork (producer and
consumer) is not serving us as originally intended, so merge these
into just gcWork.

The original intent was to replace the currentwbuf cache with a
gcWorkProducer. However, with gchelpwork (aka mutator assists),
mutators can both produce and consume work, so it will make more sense
to cache a whole gcWork.

Change-Id: I6e633e96db7cb23a64fbadbfc4607e3ad32bcfb3
Reviewed-on: https://go-review.googlesource.com/7733
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-19 15:55:21 +00:00
Austin Clements
77fcf36a5e runtime: don't use cached wbuf in markroot
Currently markroot fetches the wbuf to fill from the per-M wbuf
cache. The wbuf cache is primarily meant for the write barrier because
it produces very little work on each call. There's little point to
using the cache in markroot, since each call to markroot is likely to
produce a large amount of work (so the slight win on getting it from
the cache instead of from the central wbuf lists doesn't matter), and
markroot does not dispose the wbuf back to the cache (so most markroot
calls won't get anything from the wbuf cache anyway).

Instead, just get the wbuf from the central wbuf lists like other work
producers. This will simplify later changes.

Change-Id: I07a18a4335a41e266a6d70aa3a0911a40babce23
Reviewed-on: https://go-review.googlesource.com/7732
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-19 15:55:16 +00:00
Austin Clements
fa2f9c2c09 runtime: run concurrent mark phase on regular stack
Currently, the GC's concurrent mark phase runs on the system
stack. There's no need to do this, and running it this way ties up the
entire M and P running the GC by preventing the scheduler from
preempting the GC even during concurrent mark.

Fix this by running concurrent mark on the regular G stack. It's still
non-preemptible because we also set preemptoff around the whole GC
process, but this moves us closer to making it preemptible.

Change-Id: Ia9f1245e299b8c5c513a4b1e3ef13eaa35ac5e73
Reviewed-on: https://go-review.googlesource.com/7730
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-19 15:55:12 +00:00
Austin Clements
bef356b282 runtime: improve comment in concurrent GC
"Sync" is not very informative. What's being synchronized and with
whom? Update this comment to explain what we're really doing: enabling
write barriers.

Change-Id: I4f0cbb8771988c7ba4606d566b77c26c64165f0f
Reviewed-on: https://go-review.googlesource.com/7700
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-19 15:55:06 +00:00
Austin Clements
d21cef1f8f runtime: remove pointless harvestwbufs
Currently we call harvestwbufs the moment we enter the mark phase, even
before starting the world again. Since cached wbufs are only filled
when we're in mark or mark termination, they should all be empty at
this point, making the harvest pointless. Remove the harvest.

We should, but currently do not, harvest at the end of the mark phase
when we're running out of work to do.

Change-Id: I5f4ba874f14dd915b8dfbc4ee5bb526eecc2c0b4
Reviewed-on: https://go-review.googlesource.com/7669
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-19 15:55:00 +00:00
Austin Clements
a681c3029d runtime: remove out of date comment
Change-Id: I0ad1a81a235c7c067fea2093bbeac4e06a233c10
Reviewed-on: https://go-review.googlesource.com/7661
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-19 15:54:01 +00:00
Josh Bleecher Snyder
b90638e1de runtime: delete old .h files
Change-Id: I5a49f56518adf7d64ba8610b51ea1621ad888fc4
Reviewed-on: https://go-review.googlesource.com/7771
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-18 23:52:24 +00:00
Josh Bleecher Snyder
6ffed3020c runtime: fix minor typo
Change-Id: I79b7ed8f7e78e9d35b5e30ef70b98db64bc68a7b
Reviewed-on: https://go-review.googlesource.com/7720
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-18 15:14:37 +00:00
Josh Bleecher Snyder
2adc4e8927 all: use "reports whether" in place of "returns true if(f)"
Comment changes only.
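
For example, in the style this change standardizes on (the function
here is illustrative):

    package example

    // isEven reports whether n is even.
    // (Previously such a comment might read "returns true if n is even".)
    func isEven(n int) bool {
        return n%2 == 0
    }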

Change-Id: I56848814564c4aa0988b451df18bebdfc88d6d94
Reviewed-on: https://go-review.googlesource.com/7721
Reviewed-by: Rob Pike <r@golang.org>
2015-03-18 15:14:06 +00:00