mirror of https://github.com/golang/go synced 2024-10-03 00:21:22 -06:00
Commit Graph

1749 Commits

Author SHA1 Message Date
Michael Matloob
432cb66f16 runtime: break out system-specific constants into package sys
runtime/internal/sys will hold system-, architecture- and config-
specific constants.

Updates #11647

Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54
Reviewed-on: https://go-review.googlesource.com/16817
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-12 17:04:45 +00:00
Matthew Dempsky
3073797c37 runtime: fix vet warning about println
lfstack.go:19: println call ends with newline

Change-Id: I2a903eef80a5300e9014999c2f0bc5d40ed5c735
Reviewed-on: https://go-review.googlesource.com/16836
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-11-12 05:19:58 +00:00
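
The rule vet enforces here is that println already appends a newline, so a message ending in \n prints a blank line. A minimal illustration (hypothetical message text, not the actual line from lfstack.go):

    package main

    func main() {
        // Flagged by vet: println appends its own newline, so the
        // explicit \n produces a blank line.
        println("stack is empty\n")

        // Fixed form:
        println("stack is empty")
    }
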
Matthew Dempsky
58bc561d1a runtime: fix vet warning about +build rule
cgo_ppc64x.go:7: +build comment must appear before package clause and be followed by a blank line

Change-Id: Ib6dedddae70cc75dc3f137eb37ea338a64f8b595
Reviewed-on: https://go-review.googlesource.com/16835
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Andrew Gerrand <adg@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-12 05:13:47 +00:00
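
For reference, the layout the vet check wants: the +build comment comes before the package clause and is followed by a blank line. A sketch of the top of such a file (the tags are an assumption based on the cgo_ppc64x.go file name):

    // +build ppc64 ppc64le

    package runtime
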
Yao Zhang
fa61945cf2 runtime/debug: skip TestFreeOSMemory for mips64{,le}
Change-Id: I419f3b8bf1bddffd4a775b0cd7b98f0239fe19cb
Reviewed-on: https://go-review.googlesource.com/14458
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-12 04:51:42 +00:00
Yao Zhang
846a9adf05 runtime: restructured signal_linux.go, added signal table for mips64.
Linux/mips64 uses a different signal table. To avoid copying code, the
signal table is factored out of signal_linux.go into
sigtab_linux_generic.go, and a mips64-specific version is added.

Change-Id: I842d7a7467c330bf772855fde01aecc77a42316b
Reviewed-on: https://go-review.googlesource.com/14993
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-12 04:49:06 +00:00
Yao Zhang
624f84536d runtime: renamed os2_linux.go to os2_linux_generic.go, added mips64 support
Linux/mips64 has a different sigset type and some different constants.
os2_linux.go is renamed to os2_linux_generic.go and is not used on mips64.
The corresponding file os2_linux_mips64x.go is added.

Change-Id: Ief83845a2779f7fe048d236d3c7da52b627ab533
Reviewed-on: https://go-review.googlesource.com/14992
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 04:48:43 +00:00
Yao Zhang
e0053f8b1c runtime: restructured os1_linux.go, added mips64 support
Linux/mips64 uses a different type of sigset. To deal with it, the related
functions in os1_linux.go are refactored into os1_linux_generic.go
(used for non-mips64 architectures) and os1_linux_mips64x.go (used only
on mips64{,le}), to avoid copying code.

Change-Id: I5cadfccd86bfc4b30bf97e12607c3c614903ea4c
Reviewed-on: https://go-review.googlesource.com/14991
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-12 04:48:23 +00:00
Yao Zhang
c1037aad4d runtime: added mips64{,le} build tags and GOARCH cases
Change-Id: I381c03d957a0dccae5f655f02e92760e5c0e9629
Reviewed-on: https://go-review.googlesource.com/14929
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 04:47:42 +00:00
Yao Zhang
15b51d6ae6 runtime: updated automatically generated zgoarch_*.go
files for unsupported architectures are deleted, as keeping them would
require changing cmd/dist to recognize their names as build tags (that
probably needs a separate CL).

Change-Id: Ifd164b014867d39b4924d1b859fb84317dce4ab0
Reviewed-on: https://go-review.googlesource.com/14928
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 04:47:29 +00:00
Yao Zhang
a36dda7880 runtime: added go files for linux/mips64{,le} support
Change-Id: I14b537922b97d4bce9e0523d98a822da906348f1
Reviewed-on: https://go-review.googlesource.com/14447
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 04:47:15 +00:00
Yao Zhang
980b00f55b runtime: added go files for mips64 architecture support
Change-Id: Ia496470e48b3c5d39fb9fef99fac356dfb73a949
Reviewed-on: https://go-review.googlesource.com/14927
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 04:46:50 +00:00
Yao Zhang
b2b8559987 runtime/internal/atomic: added mips64 support.
Change-Id: I2eaf0658771a0ff788429e2f503d116531166315
Reviewed-on: https://go-review.googlesource.com/16834
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 04:46:35 +00:00
Yao Zhang
424738e43e runtime: added assembly part of linux/mips64{,le} support
Change-Id: I9e94027ef66c88007107de2b2b75c3d7cf1352af
Reviewed-on: https://go-review.googlesource.com/14467
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 04:46:17 +00:00
Matthew Dempsky
a9bebd91c9 runtime: update comment that was missed in CL 6584
Change-Id: Ie5f70af7e673bb2c691a45c28db2c017e6cddd4f
Reviewed-on: https://go-review.googlesource.com/16833
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 03:38:04 +00:00
Matthew Dempsky
c17c42e8a5 runtime: rewrite lots of foo_Bar(f, ...) into f.bar(...)
Applies to types fixAlloc, mCache, mCentral, mHeap, mSpan, and
mSpanList.

Two special cases:

1. mHeap_Scavenge() previously didn't take an *mheap parameter, so it
was specially handled in this CL.

2. mHeap_Free() would have collided with mheap's "free" field, so it's
been renamed to (*mheap).freeSpan to parallel its underlying
(*mheap).freeSpanLocked method.

Change-Id: I325938554cca432c166fe9d9d689af2bbd68de4b
Reviewed-on: https://go-review.googlesource.com/16221
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-12 00:34:58 +00:00
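
The shape of the rewrite, using mSpanList as an example (stub types and bodies, not the runtime's real definitions):

    package main

    type mspan struct{ next *mspan }

    type mSpanList struct{ first *mspan }

    // Before the CL: a C-style function taking the receiver as an argument.
    func mSpanList_Insert(list *mSpanList, span *mspan) {
        span.next = list.first
        list.first = span
    }

    // After the CL: the same operation as a method, called as list.insert(span).
    func (list *mSpanList) insert(span *mspan) {
        span.next = list.first
        list.first = span
    }

    func main() {
        var list mSpanList
        list.insert(&mspan{})
    }
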
Michael Hudson-Doyle
58db5fc94d runtime: run TestCgoExternalThreadSIGPROF on ppc64le
It was disabled because of the lack of external linking.

Change-Id: Iccb4a4ef8c57d048d53deabe4e0f4e6b9dccce33
Reviewed-on: https://go-review.googlesource.com/16797
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-12 00:30:04 +00:00
Hyang-Ah Hana Kim
b2259dcef0 runtime: add syscalls needed for android/386 logging
Update golang/go#9327.

Change-Id: I27ef973190d9ae652411caf3739414b5d46ca7d2
Reviewed-on: https://go-review.googlesource.com/16679
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-11-11 21:59:53 +00:00
Hyang-Ah Hana Kim
05c4c6e2f4 cmd,runtime: TLS setup for android/386
Same ugly hack as https://go-review.googlesource.com/15991.

Update golang/go#9327.

Change-Id: I58284e83268a15de95eabc833c3e01bf1e3faa2e
Reviewed-on: https://go-review.googlesource.com/16678
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-11-11 21:59:24 +00:00
Austin Clements
d727312cbf runtime: remove unused marking parfor
The GC now handles the root marking jobs as part of general marking,
so work.markfor is no longer used.

Change-Id: I6c3b23fed27e4e7ea6430d6ca7ba25ae4d04ed14
Reviewed-on: https://go-review.googlesource.com/16811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-11-11 18:31:33 +00:00
Austin Clements
f32f2954fb runtime: never allocate new M when jumping time forward
When we're jumping time forward, it means everyone is asleep, so there
should always be an M available. Furthermore, this causes both
allocation and write barriers in contexts that may be running without
a P (such as in sysmon).

Hence, replace this allocation with a throw.

Updates #10600.

Change-Id: I2cee70d5db828d0044082878995949edb25dda5f
Reviewed-on: https://go-review.googlesource.com/16815
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-11 17:37:42 +00:00
Austin Clements
f5c42cf88e runtime: replace traceBuf slice with index
Currently traceBuf keeps track of where it is in the trace buffer by
also maintaining a slice that points in to this buffer with an initial
length of 0 and a cap of the length of the array. All writes to this
buffer are done by appending to the slice (as long as the bounds
checks are right, it will never overflow and the append won't allocate
a new slice).

Each of these appends generates a write barrier. As long as we never
overflow the buffer, this write barrier won't fire, but this wreaks
havoc with eliminating write barriers from the tracing code. If we
were to overflow the buffer, this would both allocate and invoke a
write barrier, both things that are dicey at best to do in many of the
contexts tracing happens. It also wastes space in the traceBuf and
leads to more complex code and more complex generated code.

Replace this slice trick with keeping track of a simple array
position.

Updates #10600.

Change-Id: I0a63eecec1992e195449f414ed47653f66318d0e
Reviewed-on: https://go-review.googlesource.com/16814
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-11 17:37:31 +00:00
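
A schematic of the two bookkeeping strategies (simplified types; the real traceBuf carries more state):

    package main

    // Old scheme: a slice into the fixed array. Each append assigns a new
    // slice header to b.buf, and that pointer write carries a write barrier.
    type traceBufOld struct {
        arr [64]byte
        buf []byte // len = bytes written, cap = len(arr)
    }

    func (b *traceBufOld) byte(v byte) { b.buf = append(b.buf, v) }

    // New scheme: a plain integer position. Writing an int involves no
    // pointers, so no write barrier can fire on the hot path.
    type traceBufNew struct {
        arr [64]byte
        pos int
    }

    func (b *traceBufNew) byte(v byte) { b.arr[b.pos] = v; b.pos++ }

    func main() {
        old := &traceBufOld{}
        old.buf = old.arr[:0]
        old.byte(1)

        nb := &traceBufNew{}
        nb.byte(1)
    }
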
Austin Clements
2be1ed80c5 runtime: eliminate traceStack write barriers
This replaces *traceStack with traceStackPtr, much like the preceding
commit.

Updates #10600.

Change-Id: Ifadc35eb37a405ae877f9740151fb31a0ca1d08f
Reviewed-on: https://go-review.googlesource.com/16813
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-11 17:37:26 +00:00
Austin Clements
03227bb55e runtime: eliminate traceBuf write barriers
The tracing code is currently called from contexts such as sysmon and
the scheduler where write barriers are not allowed. Unfortunately,
while the common paths through the tracing code do not have write
barriers, many of the less common paths dealing with buffer overflow
and recycling do.

This change replaces all *traceBufs with traceBufPtrs. In the style of
guintptr, etc., the GC does not trace traceBufPtrs and write barriers
do not apply when these pointers are written. Since traceBufs are
allocated from non-GC'd memory and manually managed, this is always
safe.

Updates #10600.

Change-Id: I52b992d36d1b634ebd855c8cde27947ec14f59ba
Reviewed-on: https://go-review.googlesource.com/16812
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-11 17:37:18 +00:00
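
The pattern, in the style of guintptr (a sketch; note that the stand-in allocation in main would not actually be safe, since real traceBufs come from non-GC'd memory):

    package main

    import "unsafe"

    type traceBuf struct{ pos int }

    // traceBufPtr is a *traceBuf in uintptr clothing. The GC does not trace
    // it and no write barrier fires when it is assigned. That is safe only
    // because real traceBufs live in manually managed, non-GC'd memory.
    type traceBufPtr uintptr

    func (tp traceBufPtr) ptr() *traceBuf   { return (*traceBuf)(unsafe.Pointer(tp)) }
    func (tp *traceBufPtr) set(b *traceBuf) { *tp = traceBufPtr(unsafe.Pointer(b)) }

    func main() {
        b := &traceBuf{} // stand-in only: a real buffer would come from sysAlloc
        var tp traceBufPtr
        tp.set(b)
        _ = tp.ptr()
    }
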
Austin Clements
7d1d642956 runtime: fix use of xadd64
Commit 7407d8e was rebased over the switch to runtime/internal/atomic
and introduced a call to xadd64, which no longer exists. Fix that
call.

Change-Id: I99c93469794c16504ae4a8ffe3066ac382c66a3a
Reviewed-on: https://go-review.googlesource.com/16816
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-11 15:26:24 +00:00
Austin Clements
7407d8e582 runtime: fix over-aggressive proportional sweep
Currently, sweeping is performed before allocating a span by charging
for the entire size of the span requested, rather than the number of
bytes actually available for allocation from the returned span. That
is, if the returned span is 8K, but already has 6K in use, the mutator
is charged for 8K of heap allocation even though it can only allocate
2K more from the span. As a result, proportional sweep is
over-aggressive and tends to finish much earlier than it needs to.
This effect is further amplified by fragmented heaps.

Fix this by reimbursing the mutator for the used space in a span once
it has allocated that span. We still have to charge up-front for the
worst-case because we don't know which span the mutator will get, but
at least we can correct the over-charge once it has a span, which will
go toward later span allocations.

This has negligible effect on the throughput of the go1 benchmarks and
the garbage benchmark.

Fixes #12040.

Change-Id: I0e23e7a4ccf126cca000fed5067b20017028dd6b
Reviewed-on: https://go-review.googlesource.com/16515
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-11 15:21:32 +00:00
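
A back-of-envelope restatement of the accounting change, using the 8K/6K example from the message:

    package main

    import "fmt"

    func main() {
        const spanBytes = 8 << 10 // span returned by the allocator
        const usedBytes = 6 << 10 // bytes already in use in that span

        // Before: the mutator's sweep debt is charged for the whole span.
        chargedBefore := spanBytes

        // After: the used bytes are credited back once the span is known,
        // so the net charge matches the 2K actually available.
        chargedAfter := spanBytes - usedBytes

        fmt.Println(chargedBefore, chargedAfter) // 8192 2048
    }
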
Ian Lance Taylor
880a689124 runtime: don't call msanread when running on the system stack
The runtime is not instrumented, but the calls to msanread in the
runtime can sometimes refer to the system stack.  An example is the call
to copy in stkbucket in mprof.go.  Depending on what C code has done,
the system stack may appear uninitialized to msan.

Change-Id: Ic21705b9ac504ae5cf7601a59189302f072e7db1
Reviewed-on: https://go-review.googlesource.com/16660
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-11-11 06:04:04 +00:00
Ian Lance Taylor
8f3f2ccac0 runtime: mark cgo callback results as written for msan
This is a fix for the -msan option when using cgo callbacks.  A cgo
callback works by writing out C code that puts a struct on the stack and
passes the address of that struct into Go.  The result parameters are
fields of the struct.  The Go code will write to the result parameters,
but the Go code thinks it is just writing into the Go stack, and
therefore won't call msanwrite.  This CL adds a call to msanwrite in the
cgo callback code so that the C code knows the results were written.

Change-Id: I80438dbd4561502bdee97fad3f02893a06880ee1
Reviewed-on: https://go-review.googlesource.com/16611
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-11-11 05:58:19 +00:00
Austin Clements
f84420c20d runtime: clean up park messages
This changes "mark worker (idle)" to "GC worker (idle)" so it's more
clear to users that these goroutines are GC-related. It changes "GC
assist" to "GC assist wait" to make it clear that the assist is
blocked.

Change-Id: Iafbc0903c84f9250ff6bee14baac6fcd4ed5ef76
Reviewed-on: https://go-review.googlesource.com/16511
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-11 01:04:39 +00:00
Austin Clements
56ad88b1ff runtime: free stack spans outside STW
We couldn't do this before this point because it must be done before
the next GC cycle starts. Hence, if it delayed the start of the next
cycle, that would widen the window between reaching the heap trigger
of the next cycle and starting the next GC cycle, during which the
mutator could over-allocate. With the decentralized GC, any mutators
that reach the heap trigger will block on the GC starting, so it's
safe to widen the time between starting the world and being able to
start the next GC cycle.

Fixes #11465.

Change-Id: Ic7ea7e9eba5b66fc050299f843a9c9001ad814aa
Reviewed-on: https://go-review.googlesource.com/16394
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-11 01:04:33 +00:00
Ian Lance Taylor
9dcc58c3d1 cmd/cgo, runtime: add checks for passing pointers from Go to C
This implements part of the proposal in issue 12416 by adding dynamic
checks for passing pointers from Go to C.  This code is intended to be
on at all times.  It does not try to catch every case.  It does not
implement checks on calling Go functions from C.

The new cgo checks may be disabled using GODEBUG=cgocheck=0.

Update #12416.

Change-Id: I48de130e7e2e83fb99a1e176b2c856be38a4d3c8
Reviewed-on: https://go-review.googlesource.com/16003
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 22:22:10 +00:00
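
An example of the kind of call the dynamic checks reject: a Go pointer passed to C must not point at memory that itself contains Go pointers. (Sketch; the C function is a placeholder.)

    package main

    /*
    static void use(void *p) { (void)p; }
    */
    import "C"

    import "unsafe"

    type node struct {
        next *node // a Go pointer inside the memory being passed
    }

    func main() {
        n := &node{next: &node{}}
        // With the checks on (the default), this call panics at run time:
        // "cgo argument has Go pointer to Go pointer".
        // GODEBUG=cgocheck=0 disables the check.
        C.use(unsafe.Pointer(n))
    }
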
Michael Matloob
67faca7d9c runtime: break atomics out into package runtime/internal/atomic
This change breaks out most of the atomics functions in the runtime
into package runtime/internal/atomic. It adds some basic support
in the toolchain for runtime packages, and also modifies linux/arm
atomics to remove the dependency on the runtime's mutex. The mutexes
have been replaced with spinlocks.

All trybots are happy!
In addition to the trybots, I've tested on the darwin/arm64 builder,
on the darwin/arm builder, and on a ppc64le machine.

Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
Reviewed-on: https://go-review.googlesource.com/14204
Run-TryBot: Michael Matloob <matloob@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 17:38:04 +00:00
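
A user-level analogue of the mutex-to-spinlock replacement described above, built on sync/atomic (the runtime's actual linux/arm code is different):

    package main

    import (
        "runtime"
        "sync/atomic"
    )

    type spinlock struct{ v uint32 }

    func (l *spinlock) lock() {
        for !atomic.CompareAndSwapUint32(&l.v, 0, 1) {
            runtime.Gosched() // back off instead of burning the CPU
        }
    }

    func (l *spinlock) unlock() { atomic.StoreUint32(&l.v, 0) }

    func main() {
        var l spinlock
        l.lock()
        l.unlock()
    }
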
Michael Hudson-Doyle
4e3deae96d cmd/link, runtime: arm64 implementation of addmoduledata
Change-Id: I62fb5b20d7caa51b77560a4bfb74a39f17089805
Reviewed-on: https://go-review.googlesource.com/13999
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 01:24:25 +00:00
Keith Randall
e410a527b2 runtime: simplify chan ops, take 2
This change is the same as CL #9345, which was reverted,
except for a small bug fix.

The only change is to the body of sendDirect and its callsite.
Also added a test.

The problem was during a channel send operation.  The target
of the send was a sleeping goroutine waiting to receive.  We
basically do:
1) Read the destination pointer out of the sudog structure
2) Copy the value we're sending to that destination pointer
Unfortunately, the previous change had a goroutine suspend
point between 1 & 2 (the call to sendDirect).  At that point
the destination goroutine's stack could be copied (shrunk).
The pointer we read in step 1 is no longer valid for step 2.

Fixed by not allowing any suspension points between 1 & 2.
I suspect the old code worked correctly basically by accident.

Fixes #13169

The original 9345:

This change removes the retry mechanism we use for buffered channels.
Instead, any sender waking up a receiver or vice versa completes the
full protocol with its counterpart.  This means the counterpart does
not need to relock the channel when it wakes up.  (Currently
buffered channels need to relock on wakeup.)

For sends on a channel with waiting receivers, this change replaces
two copies (sender->queue, queue->receiver) with one (sender->receiver).
For receives on channels with a waiting sender, two copies are still required.

This change unifies to a large degree the algorithm for buffered
and unbuffered channels, simplifying the overall implementation.

Fixes #11506

Change-Id: I57dfa3fc219cffa4d48301ee15fe5479299efa09
Reviewed-on: https://go-review.googlesource.com/16740
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-08 23:20:25 +00:00
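
A sketch of the fixed shape of the direct-send path (illustrative types, not the code in runtime/chan.go): the destination pointer is read from the sudog and used with no suspension point in between.

    package main

    import "unsafe"

    type sudog struct{ elem unsafe.Pointer }

    func sendDirectSketch(sg *sudog, src unsafe.Pointer, size uintptr) {
        dst := sg.elem // 1) read the destination out of the sudog
        // The buggy version had a goroutine suspension point here; the
        // receiver's stack could be shrunk, leaving dst stale for step 2.
        memmove(dst, src, size) // 2) copy the value being sent
    }

    // memmove stands in for the runtime's memmove.
    func memmove(dst, src unsafe.Pointer, size uintptr) {
        copy(unsafe.Slice((*byte)(dst), size), unsafe.Slice((*byte)(src), size))
    }

    func main() {
        var a, b int64 = 0, 42
        sg := &sudog{elem: unsafe.Pointer(&a)}
        sendDirectSketch(sg, unsafe.Pointer(&b), unsafe.Sizeof(b))
        _ = a // a == 42
    }
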
Michael Hudson-Doyle
1b4d28f8cf cmd/link, runtime: arm implementation of addmoduledata
Change-Id: I3975e10c2445e23c2798a7203a877ff2de3427c7
Reviewed-on: https://go-review.googlesource.com/14189
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-08 21:46:17 +00:00
Ian Lance Taylor
e884334b55 runtime: use pthread_sigmask, not sigprocmask, on Darwin ARM/ARM64
Other systems use pthread_sigmask.  It was a mistake to use sigprocmask
here.

Change-Id: Ie045aa3f09cf035fcf807b7543b96fa5b847958a
Reviewed-on: https://go-review.googlesource.com/16720
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-07 15:48:58 +00:00
Keith Randall
4b7d5f0b94 runtime: memmove/memclr pointers atomically
Make sure that we're moving or zeroing pointers atomically.
Anything that is a multiple of pointer size and at least
pointer aligned might have pointers in it.  All the code looks
ok except for the 1-pointer-sized moves.

Fixes #13160
Update #12552

Change-Id: Ib97d9b918fa9f4cc5c56c67ed90255b7fdfb7b45
Reviewed-on: https://go-review.googlesource.com/16668
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-07 02:42:12 +00:00
Ilya Tocar
321a40721b runtime: optimize indexbytebody on amd64
Use AVX2 to compare 32 bytes per iteration.
Results (haswell):

name                    old time/op    new time/op     delta
IndexByte32-6             15.5ns ± 0%     14.7ns ± 5%   -4.87%        (p=0.000 n=16+20)
IndexByte4K-6              360ns ± 0%      183ns ± 0%  -49.17%        (p=0.000 n=19+20)
IndexByte4M-6              384µs ± 0%      256µs ± 1%  -33.41%        (p=0.000 n=20+20)
IndexByte64M-6            6.20ms ± 0%     4.18ms ± 1%  -32.52%        (p=0.000 n=19+20)
IndexBytePortable32-6     73.4ns ± 5%     75.8ns ± 3%   +3.35%        (p=0.000 n=20+19)
IndexBytePortable4K-6     5.15µs ± 0%     5.15µs ± 0%     ~     (all samples are equal)
IndexBytePortable4M-6     5.26ms ± 0%     5.25ms ± 0%   -0.12%        (p=0.000 n=20+18)
IndexBytePortable64M-6    84.1ms ± 0%     84.1ms ± 0%   -0.08%        (p=0.012 n=18+20)
Index32-6                  352ns ± 0%      352ns ± 0%     ~     (all samples are equal)
Index4K-6                 53.8µs ± 0%     53.8µs ± 0%   -0.03%        (p=0.000 n=16+18)
Index4M-6                 55.4ms ± 0%     55.4ms ± 0%     ~           (p=0.149 n=20+19)
Index64M-6                 886ms ± 0%      886ms ± 0%     ~           (p=0.108 n=20+20)
IndexEasy32-6             80.3ns ± 0%     80.1ns ± 0%   -0.21%        (p=0.000 n=20+20)
IndexEasy4K-6              426ns ± 0%      215ns ± 0%  -49.53%        (p=0.000 n=20+20)
IndexEasy4M-6              388µs ± 0%      262µs ± 1%  -32.42%        (p=0.000 n=18+20)
IndexEasy64M-6            6.20ms ± 0%     4.19ms ± 1%  -32.47%        (p=0.000 n=18+20)

name                    old speed      new speed       delta
IndexByte32-6           2.06GB/s ± 1%   2.17GB/s ± 5%   +5.19%        (p=0.000 n=18+20)
IndexByte4K-6           11.4GB/s ± 0%   22.3GB/s ± 0%  +96.45%        (p=0.000 n=17+20)
IndexByte4M-6           10.9GB/s ± 0%   16.4GB/s ± 1%  +50.17%        (p=0.000 n=20+20)
IndexByte64M-6          10.8GB/s ± 0%   16.0GB/s ± 1%  +48.19%        (p=0.000 n=19+20)
IndexBytePortable32-6    436MB/s ± 5%    422MB/s ± 3%   -3.27%        (p=0.000 n=20+19)
IndexBytePortable4K-6    795MB/s ± 0%    795MB/s ± 0%     ~           (p=0.940 n=17+18)
IndexBytePortable4M-6    798MB/s ± 0%    799MB/s ± 0%   +0.12%        (p=0.000 n=20+18)
IndexBytePortable64M-6   798MB/s ± 0%    798MB/s ± 0%   +0.08%        (p=0.011 n=18+20)
Index32-6               90.9MB/s ± 0%   90.9MB/s ± 0%   -0.00%        (p=0.025 n=20+20)
Index4K-6               76.1MB/s ± 0%   76.1MB/s ± 0%   +0.03%        (p=0.000 n=14+15)
Index4M-6               75.7MB/s ± 0%   75.7MB/s ± 0%     ~           (p=0.076 n=20+19)
Index64M-6              75.7MB/s ± 0%   75.7MB/s ± 0%     ~           (p=0.456 n=20+17)
IndexEasy32-6            399MB/s ± 0%    399MB/s ± 0%   +0.20%        (p=0.000 n=20+19)
IndexEasy4K-6           9.60GB/s ± 0%  19.02GB/s ± 0%  +98.19%        (p=0.000 n=20+20)
IndexEasy4M-6           10.8GB/s ± 0%   16.0GB/s ± 1%  +47.98%        (p=0.000 n=18+20)
IndexEasy64M-6          10.8GB/s ± 0%   16.0GB/s ± 1%  +48.08%        (p=0.000 n=18+20)

Change-Id: I46075921dde9f3580a89544c0b3a2d8c9181ebc4
Reviewed-on: https://go-review.googlesource.com/16484
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Klaus Post <klauspost@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-06 15:16:28 +00:00
Keith Randall
e9f90ba246 Revert "runtime: simplify buffered channels."
Revert for now until #13169 is understood.

This reverts commit 8e496f1d69.

Change-Id: Ib3eb2588824ef47a2b6eb9e377a24e5c817fcc81
Reviewed-on: https://go-review.googlesource.com/16716
Reviewed-by: Keith Randall <khr@golang.org>
2015-11-06 08:30:35 +00:00
Austin Clements
d5ba582166 runtime: remove background GC goroutine and mark barriers
These are now unused.

Updates #11970.

Change-Id: I43e5c4e5bcda9581bacc63364f96bb4855ab779f
Reviewed-on: https://go-review.googlesource.com/16393
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:24:05 +00:00
Austin Clements
bbf2da00fc runtime: remove GC start up/shutdown workaround in mallocgc
Currently mallocgc detects if the GC is in a state where it can't
assist, but also can't allocate uncontrolled and yields to help out
the GC. This was a workaround for periods when we were trying to
schedule the GC coordinator. It is no longer necessary because there
is no GC coordinator and malloc can always assist with any GC
transitions that are necessary.

Updates #11970.

Change-Id: I4f7beb7013e85e50ae99a3a8b0bb708ba49cbcd4
Reviewed-on: https://go-review.googlesource.com/16392
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:24:01 +00:00
Austin Clements
c99d7f7f85 runtime: decentralize mark done and mark termination
This moves all of the mark 1 to mark 2 transition and mark termination
to the mark done transition function. This means these transitions are
now handled on the goroutine that detected mark completion. This also
means that the GC coordinator and the background completion barriers
are no longer used and various workarounds to yield to the coordinator
are no longer necessary. These will be removed in follow-up commits.

One consequence of this is that mark workers now need to be
preemptible when performing the mark done transition. This allows them
to stop the world and to perform the final clean-up steps of GC after
restarting the world. They are only made preemptible while performing
this transition, so if the worker that findRunnableGCWorker would schedule
isn't available, we didn't want to schedule it anyway.

Fixes #11970.

Change-Id: I9203a2d6287eeff62d589ec02ad9cb1e29ddb837
Reviewed-on: https://go-review.googlesource.com/16391
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:54 +00:00
Austin Clements
d986bf2741 runtime: account mark worker time before gcMarkDone
Currently gcMarkDone takes basically no time, so it's okay to account
the worker time after calling it. However, gcMarkDone is about to take
potentially *much* longer because it may perform all of mark
termination. Prepare for this by swapping the order so we account the
time before calling gcMarkDone.

Change-Id: I90c7df68192acfc4fd02a7254dae739dda4e2fcb
Reviewed-on: https://go-review.googlesource.com/16390
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:49 +00:00
Austin Clements
171204b561 runtime: factor mark done transition
Currently the code for completion of mark 1/mark 2 is duplicated in
background workers and assists. Factor this in to a single function
that will serve as the transition function for concurrent mark.

Change-Id: I4d9f697a15da0d349db3b34d56f3a220dd41d41b
Reviewed-on: https://go-review.googlesource.com/16359
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:42 +00:00
Austin Clements
12e23f05ff runtime: eliminate mark completion in scheduler
Currently, findRunnableGCWorker will perform mark completion if there
is no remaining work and no running workers. This used to be necessary
to resolve a race in the transition from mark 1 to mark 2 where we
would enter mark 2 with no mark work (and no dedicated workers), so no
workers would run, so no worker would signal mark completion.

However, we're about to make mark completion also perform the entire
follow-on process, which includes mark termination. We really don't
want to do that in the scheduler if it happens to detect completion.

Conveniently, this hack is no longer necessary because we always
enqueue root scanning work at the beginning of both mark 1 and mark 2,
so a mark worker will always run. Hence, we can simply eliminate it.

Change-Id: I3fc8f27c8da632f0fb732c9f6425e1f457f5652e
Reviewed-on: https://go-review.googlesource.com/16358
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:38 +00:00
Austin Clements
20f276e237 runtime: don't start idle mark workers when barriers are cleared
Currently, we don't start dedicated or fractional mark workers unless
the mark 1 or mark 2 barriers have been cleared. One intended
consequence of this is that no background workers run between the
forEachP that disposes all gcWork caches and the beginning of mark 2.

However, we (unintentionally) did not apply this restriction to idle
mark workers. As a result, these can start in the interim between mark
1 completion and mark 2 starting. This explains why it was necessary
to reset the root marking jobs using carefully ordered atomic writes
when setting up mark 2. It also means that, even though we definitely
enqueue work before starting mark 2, it may be drained by the time we
reset the mark 2 barrier. If this happens, currently the only thing
preventing the runtime from deadlocking is that the scheduler itself
also checks for mark completion and will signal mark 2 completion.
Were it not for the odd behavior of idle workers, this check in the
scheduler would not be necessary.

Clean all of this up and prepare to remove this check in the scheduler
by applying the same restriction to starting idle mark workers.

Change-Id: Ic1b479e1591bd7773dc27b320ca399a215603b5a
Reviewed-on: https://go-review.googlesource.com/16631
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:33 +00:00
Austin Clements
a51905fa04 runtime: decentralize sweep termination and mark transition
This moves all of GC initialization, sweep termination, and the
transition to concurrent marking into the off->mark transition
function. This means it's now handled on the goroutine that detected
the state exit condition.

As a result, malloc no longer needs to Gosched() at the beginning of
the GC cycle to prevent over-allocation while the GC is starting up
because it will now *help* the GC to start up. The Gosched hack is
still necessary during GC shutdown (this is easy to test by enabling
gctrace and hitting Ctrl-S to block the gctrace output).

At this point, the GC coordinator still handles later phases. This
requires a small tweak to how we start the GC coordinator. Currently,
starting the GC coordinator is best-effort and may fail if the
coordinator is about to park from the previous cycle but hasn't yet.
We fix this by replacing the park/ready to wake up the coordinator
with a semaphore. This is temporary since the coordinator will be
going away in a few commits.

Updates #11970.

Change-Id: I2c6a11c91e72dfbc59c2d8e7c66146dee9a444fe
Reviewed-on: https://go-review.googlesource.com/16357
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:27 +00:00
Austin Clements
9630c47e8c runtime: decentralize concurrent sweep termination
This moves concurrent sweep termination from the coordinator to the
off->mark transition. This allows it to be performed by all Gs
attempting to start the GC.

Updates #11970.

Change-Id: I24428e8599a759398c2ef7ec996ba755a448f947
Reviewed-on: https://go-review.googlesource.com/16356
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:22 +00:00
Austin Clements
f54bcedce1 runtime: beginning of decentralized off->mark transition
This begins the conversion of the centralized GC coordinator to a
decentralized state machine by introducing the internal API that
triggers the first state transition from _GCoff to _GCmark (or
_GCmarktermination).

This change introduces the transition lock, the off->mark transition
condition (which is very similar to shouldtriggergc()), and the
general structure of a state transition. Since we're doing this
conversion in stages, it then falls back to the GC coordinator to
actually execute the cycle. We'll start moving logic out of the GC
coordinator and in to transition functions next.

This fixes a minor bug in gcstoptheworld debug mode where passing the
heap trigger once could trigger multiple STW GCs.

Updates #11970.

Change-Id: I964087dd190a639eb5766398f8e1bbf8b352902f
Reviewed-on: https://go-review.googlesource.com/16355
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-11-05 21:23:17 +00:00
Austin Clements
3842596284 runtime: move concurrent mark setup off system stack
For historical reasons we currently do a lot of the concurrent mark
setup on the system stack. In fact, at this point the one and only
thing that needs to happen on the system stack is the start-the-world.

Clean up this code by lifting everything other than the
start-the-world off the system stack.

The diff for this change looks large, but the only code change is to
narrow the systemstack call. Everything else is re-indentation.

Change-Id: I1e03b8afc759fad726f2397b05a17d183c2713ce
Reviewed-on: https://go-review.googlesource.com/16354
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:11 +00:00
Austin Clements
1959621584 runtime: lift state variables from func gc to var work
We're about to split func gc across several functions, so lift the
local variables it uses for tracking statistics and state across the
cycle into the global "work" variable.

Change-Id: Ie955f2f1758c7f5a5543ea1f3f33b222bc4b1d37
Reviewed-on: https://go-review.googlesource.com/16353
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:06 +00:00
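
The shape of the refactoring (field names are illustrative, not the actual members of the runtime's work variable):

    package main

    import "fmt"

    // Before: these lived as locals in func gc and died with it. Lifting
    // them into a package-level work variable lets the transition
    // functions that gc is being split into share them across the cycle.
    var work struct {
        tSweepTerm, tMark, tMarkTerm int64  // phase timestamps
        heap0, heap1                 uint64 // heap sizes at cycle boundaries
    }

    func sweepTermination() { work.heap0 = 100 }
    func markTermination()  { work.heap1 = 80 }

    func main() {
        sweepTermination()
        markTermination()
        fmt.Println(work.heap0, work.heap1)
    }
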
Austin Clements
1698018955 runtime: note a minor issue with GODEBUG=gcstoptheworld
Change-Id: I91cda8d88b0852cd0f868d33c594206bcca0c386
Reviewed-on: https://go-review.googlesource.com/16352
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-05 21:22:59 +00:00
Ilya Tocar
967564be7e runtime: optimize string comparison on amd64
Use AVX2 if possible.
Results below (haswell):

name                            old time/op    new time/op     delta
CompareStringEqual-6              8.77ns ± 0%     8.63ns ± 1%   -1.58%        (p=0.000 n=20+19)
CompareStringIdentical-6          5.02ns ± 0%     5.02ns ± 0%     ~     (all samples are equal)
CompareStringSameLength-6         7.51ns ± 0%     7.51ns ± 0%     ~     (all samples are equal)
CompareStringDifferentLength-6    1.56ns ± 0%     1.56ns ± 0%     ~     (all samples are equal)
CompareStringBigUnaligned-6        124µs ± 1%      105µs ± 5%  -14.99%        (p=0.000 n=20+18)
CompareStringBig-6                 112µs ± 1%      103µs ± 0%   -7.87%        (p=0.000 n=20+17)

name                            old speed      new speed       delta
CompareStringBigUnaligned-6     8.48GB/s ± 1%   9.98GB/s ± 5%  +17.67%        (p=0.000 n=20+18)
CompareStringBig-6              9.37GB/s ± 1%  10.17GB/s ± 0%   +8.54%        (p=0.000 n=20+17)

Change-Id: I1c949626dd2aaf9f633e3c888a9df71c82eed7e1
Reviewed-on: https://go-review.googlesource.com/16481
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Klaus Post <klauspost@gmail.com>
2015-11-05 15:42:33 +00:00
Keith Randall
8e496f1d69 runtime: simplify buffered channels.
This change removes the retry mechanism we use for buffered channels.
Instead, any sender waking up a receiver or vice versa completes the
full protocol with its counterpart.  This means the counterpart does
not need to relock the channel when it wakes up.  (Currently
buffered channels need to relock on wakeup.)

For sends on a channel with waiting receivers, this change replaces
two copies (sender->queue, queue->receiver) with one (sender->receiver).
For receives on channels with a waiting sender, two copies are still required.

This change unifies to a large degree the algorithm for buffered
and unbuffered channels, simplifying the overall implementation.

Fixes #11506

benchmark                        old ns/op     new ns/op     delta
BenchmarkChanProdCons10          125           110           -12.00%
BenchmarkChanProdCons0           303           284           -6.27%
BenchmarkChanProdCons100         75.5          71.3          -5.56%
BenchmarkChanContended           6452          6125          -5.07%
BenchmarkChanNonblocking         11.5          11.0          -4.35%
BenchmarkChanCreation            149           143           -4.03%
BenchmarkChanSem                 63.6          61.6          -3.14%
BenchmarkChanUncontended         6390          6212          -2.79%
BenchmarkChanSync                282           276           -2.13%
BenchmarkChanProdConsWork10      516           506           -1.94%
BenchmarkChanProdConsWork0       696           685           -1.58%
BenchmarkChanProdConsWork100     470           469           -0.21%
BenchmarkChanPopular             660427        660012        -0.06%

Change-Id: I164113a56432fbc7cace0786e49c5a6e6a708ea4
Reviewed-on: https://go-review.googlesource.com/9345
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-05 15:41:05 +00:00
Austin Clements
dcd9e5bc0f runtime: make putfull start mark workers
Currently we depend on the good graces and timing of the scheduler to
get opportunities to start dedicated mark workers. In the worst case,
it may take 10ms to get dedicated mark workers going at the beginning
of mark 1 and mark 2 or after the amount of available work has dropped
and gone back up.

Instead of waiting for the regular preemption logic to get around to
us, make putfull enlist a random P if we're not already running enough
dedicated workers. This should improve performance stability of the
garbage collector and is likely to improve the overall performance
somewhat.

No overall effect on the go1 benchmarks. It speeds up the garbage
benchmark by 12%, which more than counters the performance loss from
the previous commit.

name              old time/op  new time/op  delta
XBenchGarbage-12  6.32ms ± 4%  5.58ms ± 2%  -11.68%  (p=0.000 n=20+16)

name                      old time/op    new time/op    delta
BinaryTree17-12              3.18s ± 5%     3.12s ± 4%  -1.83%  (p=0.021 n=20+20)
Fannkuch11-12                2.50s ± 2%     2.46s ± 2%  -1.57%  (p=0.000 n=18+19)
FmtFprintfEmpty-12          50.8ns ± 3%    50.4ns ± 3%    ~     (p=0.184 n=20+20)
FmtFprintfString-12          167ns ± 2%     171ns ± 1%  +2.46%  (p=0.000 n=20+19)
FmtFprintfInt-12             161ns ± 2%     163ns ± 2%  +1.81%  (p=0.000 n=20+20)
FmtFprintfIntInt-12          269ns ± 1%     266ns ± 1%  -0.81%  (p=0.002 n=19+20)
FmtFprintfPrefixedInt-12     237ns ± 2%     231ns ± 2%  -2.86%  (p=0.000 n=20+20)
FmtFprintfFloat-12           313ns ± 2%     313ns ± 1%    ~     (p=0.681 n=20+20)
FmtManyArgs-12              1.05µs ± 2%    1.03µs ± 1%  -2.26%  (p=0.000 n=20+20)
GobDecode-12                8.66ms ± 1%    8.67ms ± 1%    ~     (p=0.380 n=19+20)
GobEncode-12                6.56ms ± 1%    6.56ms ± 2%    ~     (p=0.607 n=19+20)
Gzip-12                      317ms ± 1%     314ms ± 2%  -1.10%  (p=0.000 n=20+19)
Gunzip-12                   42.1ms ± 1%    42.2ms ± 1%  +0.27%  (p=0.044 n=20+19)
HTTPClientServer-12         62.7µs ± 1%    62.0µs ± 1%  -1.04%  (p=0.000 n=19+18)
JSONEncode-12               16.7ms ± 1%    16.8ms ± 2%  +0.59%  (p=0.021 n=20+20)
JSONDecode-12               58.2ms ± 1%    61.4ms ± 2%  +5.43%  (p=0.000 n=18+19)
Mandelbrot200-12            3.84ms ± 1%    3.87ms ± 2%  +0.79%  (p=0.008 n=18+20)
GoParse-12                  3.86ms ± 2%    3.76ms ± 2%  -2.60%  (p=0.000 n=20+20)
RegexpMatchEasy0_32-12       100ns ± 2%     100ns ± 1%  -0.68%  (p=0.005 n=18+15)
RegexpMatchEasy0_1K-12       332ns ± 1%     342ns ± 1%  +3.16%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12      82.9ns ± 3%    83.0ns ± 2%    ~     (p=0.906 n=19+20)
RegexpMatchEasy1_1K-12       487ns ± 1%     494ns ± 1%  +1.50%  (p=0.000 n=17+20)
RegexpMatchMedium_32-12      131ns ± 2%     130ns ± 1%    ~     (p=0.686 n=19+20)
RegexpMatchMedium_1K-12     39.6µs ± 1%    39.2µs ± 1%  -1.09%  (p=0.000 n=18+19)
RegexpMatchHard_32-12       2.04µs ± 1%    2.04µs ± 2%    ~     (p=0.804 n=20+20)
RegexpMatchHard_1K-12       61.7µs ± 2%    61.3µs ± 2%    ~     (p=0.052 n=18+20)
Revcomp-12                   529ms ± 2%     533ms ± 1%  +0.83%  (p=0.003 n=20+19)
Template-12                 70.7ms ± 2%    71.0ms ± 2%    ~     (p=0.065 n=20+19)
TimeParse-12                 351ns ± 2%     355ns ± 1%  +1.25%  (p=0.000 n=19+20)
TimeFormat-12                362ns ± 2%     373ns ± 1%  +2.83%  (p=0.000 n=18+20)
[Geo mean]                  62.2µs         62.3µs       +0.13%

name                      old speed      new speed      delta
GobDecode-12              88.6MB/s ± 1%  88.5MB/s ± 1%    ~     (p=0.392 n=19+20)
GobEncode-12               117MB/s ± 1%   117MB/s ± 1%    ~     (p=0.622 n=19+20)
Gzip-12                   61.1MB/s ± 1%  61.8MB/s ± 2%  +1.11%  (p=0.000 n=20+19)
Gunzip-12                  461MB/s ± 1%   460MB/s ± 1%  -0.27%  (p=0.044 n=20+19)
JSONEncode-12              116MB/s ± 1%   115MB/s ± 2%  -0.58%  (p=0.022 n=20+20)
JSONDecode-12             33.3MB/s ± 1%  31.6MB/s ± 2%  -5.15%  (p=0.000 n=18+19)
GoParse-12                15.0MB/s ± 2%  15.4MB/s ± 2%  +2.66%  (p=0.000 n=20+20)
RegexpMatchEasy0_32-12     317MB/s ± 2%   319MB/s ± 2%    ~     (p=0.052 n=20+20)
RegexpMatchEasy0_1K-12    3.08GB/s ± 1%  2.99GB/s ± 1%  -3.07%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12     386MB/s ± 3%   386MB/s ± 2%    ~     (p=0.939 n=19+20)
RegexpMatchEasy1_1K-12    2.10GB/s ± 1%  2.07GB/s ± 1%  -1.46%  (p=0.000 n=17+20)
RegexpMatchMedium_32-12   7.62MB/s ± 2%  7.64MB/s ± 1%    ~     (p=0.702 n=19+20)
RegexpMatchMedium_1K-12   25.9MB/s ± 1%  26.1MB/s ± 2%  +0.99%  (p=0.000 n=18+20)
RegexpMatchHard_32-12     15.7MB/s ± 1%  15.7MB/s ± 2%    ~     (p=0.723 n=20+20)
RegexpMatchHard_1K-12     16.6MB/s ± 2%  16.7MB/s ± 2%    ~     (p=0.052 n=18+20)
Revcomp-12                 481MB/s ± 2%   477MB/s ± 1%  -0.83%  (p=0.003 n=20+19)
Template-12               27.5MB/s ± 2%  27.3MB/s ± 2%    ~     (p=0.062 n=20+19)
[Geo mean]                99.4MB/s       99.1MB/s       -0.35%

Change-Id: I914d8cadded5a230509d118164a4c201601afc06
Reviewed-on: https://go-review.googlesource.com/16298
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-04 20:15:51 +00:00
Austin Clements
62ba520b23 runtime: eliminate getfull barrier from concurrent mark
Currently dedicated mark workers participate in the getfull barrier
during concurrent mark. However, the getfull barrier wasn't designed
for concurrent work and this causes no end of headaches.

In the concurrent setting, participants come and go. This makes mark
completion susceptible to live-lock: since dedicated workers are only
periodically polling for completion, it's possible for some transient
worker to be active each time one of the dedicated workers wakes up to
check whether it can exit the getfull barrier. It also
complicates reasoning about the system because dedicated workers
participate directly in the getfull barrier, but transient workers
must instead use trygetfull because they have exit conditions that
aren't captured by getfull (e.g., fractional workers exit when
preempted). The complexity of implementing these exit conditions
contributed to #11677. Furthermore, the getfull barrier is inefficient
because we could be running user code instead of spinning on a P. In
effect, we're dedicating 25% of the CPU to marking even if that means
we have to spin to make that 25%. It also causes issues on Windows
because we can't actually sleep for 100µs (#8687).

Fix this by making dedicated workers no longer participate in the
getfull barrier. Instead, dedicated workers simply return to the
scheduler when they fail to get more work, regardless of what other
workers are doing, and the scheduler only starts new dedicated workers
if there's work available. Everything that needs to be handled by this
barrier is already handled by detection of mark completion.

This makes the system much more symmetric because all workers and
assists now use trygetfull during concurrent mark. It also loosens the
25% CPU target so that we can give some of that 25% back to user code
if there isn't enough work to keep the mark worker busy. And it
eliminates the problematic 100µs sleep on Windows during concurrent
mark (though not during mark termination).

The downside of this is that if we hit a bottleneck in the heap graph
that then expands back out, the system may shut down dedicated workers
and take a while to start them back up. We'll address this in the next
commit.

Updates #12041 and #8687.

No effect on the go1 benchmarks. This slows down the garbage benchmark
by 9%, but we'll more than make it up in the next commit.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.80ms ± 2%  6.32ms ± 4%  +9.03%  (p=0.000 n=20+20)

Change-Id: I65100a9ba005a8b5cf97940798918672ea9dd09b
Reviewed-on: https://go-review.googlesource.com/16297
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-04 20:15:39 +00:00
Austin Clements
3a765430c1 cmd/compile: add go:nowritebarrierrec annotation
This introduces a recursive variant of the go:nowritebarrier
annotation that prohibits write barriers not only in the annotated
function, but in all functions it calls, recursively. The error
message gives the shortest call stack from the annotated function to
the function containing the prohibited write barrier, including the
names of the functions and the line numbers of the calls.

To demonstrate the annotation, we apply it to gcmarkwb_m, the write
barrier itself.

This is a new annotation rather than a modification of the existing
go:nowritebarrier annotation because, for better or worse, there are
many go:nowritebarrier functions that do call functions with write
barriers. In most of these cases this is benign because the annotation
was conservative, but it prohibits simply coopting the existing
annotation.

Change-Id: I225ca483c8f699e8436373ed96349e80ca2c2479
Reviewed-on: https://go-review.googlesource.com/16554
Reviewed-by: Keith Randall <khr@golang.org>
2015-11-04 14:42:04 +00:00
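
The shape of the annotation as applied to the write barrier (body elided; the pragma is only honored when compiling the runtime):

    // Applied to the write barrier itself, as in the CL. Any function
    // reachable from gcmarkwb_m that contains a write barrier becomes a
    // compile-time error, reported with the shortest call chain to it.
    //
    //go:nowritebarrierrec
    func gcmarkwb_m(slot *uintptr, ptr uintptr) {
        // ... write barrier body; nothing called from here, directly or
        // indirectly, may itself contain a write barrier ...
    }
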
Dmitry Vyukov
ee0305e036 runtime: remove dead code
runtime.free is long gone.

Change-Id: I058f69e6481b8fa008e1951c29724731a8a3d081
Reviewed-on: https://go-review.googlesource.com/16593
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2015-11-03 19:20:21 +00:00
Austin Clements
b6c0934a9b runtime: cache two workbufs to reduce contention
Currently the gcWork abstraction caches a single work buffer. As a
result, if a worker is putting and getting pointers right at the
boundary of a work buffer, it can flap between work buffers and
(potentially significantly) increase contention on the global work
buffer lists.

This change modifies gcWork to instead cache two work buffers and
switch off between them. This introduces one buffer's worth of
hysteresis and eliminates the above performance worst case by
amortizing the cost of getting or putting a work buffer over at least
one buffer's worth of work.

In practice, it's difficult to trigger this worst case with reasonably
large work buffers. On the garbage benchmark, this reduces the max
writes/sec to the global work list from 32K to 25K and the median from
6K to 5K. However, if a workload were to trigger this worst case
behavior, it could significantly drive up this contention.

This has negligible effects on the go1 benchmarks and slightly speeds
up the garbage benchmark.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.90ms ± 3%  5.83ms ± 4%  -1.18%  (p=0.011 n=18+18)

name                      old time/op    new time/op    delta
BinaryTree17-12              3.22s ± 4%     3.17s ± 3%  -1.57%  (p=0.009 n=19+20)
Fannkuch11-12                2.44s ± 1%     2.53s ± 4%  +3.78%  (p=0.000 n=18+19)
FmtFprintfEmpty-12          50.2ns ± 2%    50.5ns ± 5%    ~     (p=0.631 n=19+20)
FmtFprintfString-12          167ns ± 1%     166ns ± 1%    ~     (p=0.141 n=20+20)
FmtFprintfInt-12             162ns ± 1%     159ns ± 1%  -1.80%  (p=0.000 n=20+20)
FmtFprintfIntInt-12          277ns ± 2%     263ns ± 1%  -4.78%  (p=0.000 n=20+18)
FmtFprintfPrefixedInt-12     240ns ± 1%     232ns ± 2%  -3.25%  (p=0.000 n=20+20)
FmtFprintfFloat-12           311ns ± 1%     315ns ± 2%  +1.17%  (p=0.000 n=20+20)
FmtManyArgs-12              1.05µs ± 2%    1.03µs ± 2%  -1.72%  (p=0.000 n=20+20)
GobDecode-12                8.65ms ± 1%    8.71ms ± 2%  +0.68%  (p=0.001 n=19+20)
GobEncode-12                6.51ms ± 1%    6.54ms ± 1%  +0.42%  (p=0.047 n=20+19)
Gzip-12                      318ms ± 2%     315ms ± 2%  -1.20%  (p=0.000 n=19+19)
Gunzip-12                   42.2ms ± 2%    42.1ms ± 1%    ~     (p=0.667 n=20+19)
HTTPClientServer-12         62.5µs ± 1%    62.4µs ± 1%    ~     (p=0.110 n=20+18)
JSONEncode-12               16.8ms ± 1%    16.8ms ± 2%    ~     (p=0.569 n=19+20)
JSONDecode-12               60.8ms ± 2%    59.8ms ± 1%  -1.69%  (p=0.000 n=19+19)
Mandelbrot200-12            3.87ms ± 1%    3.85ms ± 0%  -0.61%  (p=0.001 n=20+17)
GoParse-12                  3.76ms ± 2%    3.76ms ± 1%    ~     (p=0.698 n=20+20)
RegexpMatchEasy0_32-12       100ns ± 2%     101ns ± 2%    ~     (p=0.065 n=19+20)
RegexpMatchEasy0_1K-12       342ns ± 2%     333ns ± 1%  -2.82%  (p=0.000 n=20+19)
RegexpMatchEasy1_32-12      83.3ns ± 2%    83.2ns ± 2%    ~     (p=0.692 n=20+19)
RegexpMatchEasy1_1K-12       498ns ± 2%     490ns ± 1%  -1.52%  (p=0.000 n=18+20)
RegexpMatchMedium_32-12      131ns ± 2%     131ns ± 2%    ~     (p=0.464 n=20+18)
RegexpMatchMedium_1K-12     39.3µs ± 2%    39.6µs ± 1%  +0.77%  (p=0.000 n=18+19)
RegexpMatchHard_32-12       2.04µs ± 2%    2.06µs ± 1%  +0.69%  (p=0.009 n=19+20)
RegexpMatchHard_1K-12       61.4µs ± 2%    62.1µs ± 1%  +1.21%  (p=0.000 n=19+20)
Revcomp-12                   534ms ± 1%     529ms ± 1%  -0.97%  (p=0.000 n=19+16)
Template-12                 70.4ms ± 2%    70.0ms ± 1%    ~     (p=0.070 n=19+19)
TimeParse-12                 359ns ± 3%     344ns ± 1%  -4.15%  (p=0.000 n=19+19)
TimeFormat-12                357ns ± 1%     361ns ± 2%  +1.05%  (p=0.002 n=20+20)
[Geo mean]                  62.4µs         62.0µs       -0.56%

name                      old speed      new speed      delta
GobDecode-12              88.7MB/s ± 1%  88.1MB/s ± 2%  -0.68%  (p=0.001 n=19+20)
GobEncode-12               118MB/s ± 1%   117MB/s ± 1%  -0.42%  (p=0.046 n=20+19)
Gzip-12                   60.9MB/s ± 2%  61.7MB/s ± 2%  +1.21%  (p=0.000 n=19+19)
Gunzip-12                  460MB/s ± 2%   461MB/s ± 1%    ~     (p=0.661 n=20+19)
JSONEncode-12              116MB/s ± 1%   115MB/s ± 2%    ~     (p=0.555 n=19+20)
JSONDecode-12             31.9MB/s ± 2%  32.5MB/s ± 1%  +1.72%  (p=0.000 n=19+19)
GoParse-12                15.4MB/s ± 2%  15.4MB/s ± 1%    ~     (p=0.653 n=20+20)
RegexpMatchEasy0_32-12     317MB/s ± 2%   315MB/s ± 2%    ~     (p=0.141 n=19+20)
RegexpMatchEasy0_1K-12    2.99GB/s ± 2%  3.07GB/s ± 1%  +2.86%  (p=0.000 n=20+19)
RegexpMatchEasy1_32-12     384MB/s ± 2%   385MB/s ± 2%    ~     (p=0.672 n=20+19)
RegexpMatchEasy1_1K-12    2.06GB/s ± 2%  2.09GB/s ± 1%  +1.54%  (p=0.000 n=18+20)
RegexpMatchMedium_32-12   7.62MB/s ± 2%  7.63MB/s ± 2%    ~     (p=0.800 n=20+18)
RegexpMatchMedium_1K-12   26.0MB/s ± 1%  25.8MB/s ± 1%  -0.77%  (p=0.000 n=18+19)
RegexpMatchHard_32-12     15.7MB/s ± 2%  15.6MB/s ± 1%  -0.69%  (p=0.010 n=19+20)
RegexpMatchHard_1K-12     16.7MB/s ± 2%  16.5MB/s ± 1%  -1.19%  (p=0.000 n=19+20)
Revcomp-12                 476MB/s ± 1%   481MB/s ± 1%  +0.97%  (p=0.000 n=19+16)
Template-12               27.6MB/s ± 2%  27.7MB/s ± 1%    ~     (p=0.071 n=19+19)
[Geo mean]                99.1MB/s       99.3MB/s       +0.27%

Change-Id: I68bcbf74ccb716cd5e844a554f67b679135105e6
Reviewed-on: https://go-review.googlesource.com/16042
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-03 19:12:10 +00:00
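
A generic sketch of the double-buffering hysteresis (not the runtime's gcWork, whose buffers are fixed-size and live on global full/empty lists):

    package main

    // twoBufCache keeps a primary and a spare buffer and swaps them rather
    // than immediately returning an emptied (or filled) buffer to shared
    // lists. Flapping at a buffer boundary then costs a cheap swap, and a
    // trip to the global lists is amortized over a full buffer of work.
    type twoBufCache struct {
        wbuf1, wbuf2 []uintptr
    }

    func (w *twoBufCache) put(p uintptr) {
        if len(w.wbuf1) == cap(w.wbuf1) {
            w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1 // hysteresis: try the spare first
            if len(w.wbuf1) == cap(w.wbuf1) {
                putFullBuf(w.wbuf1) // both full: only now touch the global lists
                w.wbuf1 = getEmptyBuf()
            }
        }
        w.wbuf1 = append(w.wbuf1, p)
    }

    func (w *twoBufCache) tryGet() (uintptr, bool) {
        if len(w.wbuf1) == 0 {
            w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
            if len(w.wbuf1) == 0 {
                return 0, false // both empty: caller falls back to global work
            }
        }
        p := w.wbuf1[len(w.wbuf1)-1]
        w.wbuf1 = w.wbuf1[:len(w.wbuf1)-1]
        return p, true
    }

    // Stand-ins for the global full/empty list operations.
    func putFullBuf(buf []uintptr) { _ = buf }
    func getEmptyBuf() []uintptr   { return make([]uintptr, 0, 504) }

    func main() {
        w := &twoBufCache{wbuf1: getEmptyBuf(), wbuf2: getEmptyBuf()}
        w.put(1)
        if v, ok := w.tryGet(); ok {
            _ = v
        }
    }
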
Dmitry Vyukov
bf606094ee runtime: fix finalization and profiling of tiny allocations
Handling of special records for tiny allocations has two problems:
1. Once we queue a finalizer we mark the object. As a result any
   subsequent finalizers for the same object will not be queued
   during this GC cycle. If we have 16 finalizers set up (the worst case),
   finalization will take 16 GC cycles. This is what caused the misbehavior
   of tinyfin.go. The actual flakiness was caused by the fact that fing
   is asynchronous and doesn't always run before the check.
2. If a tiny block has both finalizer and profile specials,
   it is possible that we both queue the finalizer, preserving the object
   live, and free the profile record. As a result the heap profile can be skewed.

Fix both issues by analyzing all special records for a single object at once.

Also, make tinyfin test stricter and remove reliance on real time.

Also, add a test for problem 2. Currently the heap profile misses about
half of the live memory.

Fixes #13100

Change-Id: I9ae4dc1c44893724138a4565ca5cae29f2e97544
Reviewed-on: https://go-review.googlesource.com/16591
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
2015-11-03 18:57:18 +00:00
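
The worst case in point 1 comes from the 16-byte tiny allocation block: up to 16 one-byte objects can share a block, each with its own finalizer. A minimal shape of that scenario (the retry loop is needed because, as the message notes, fing runs asynchronously):

    package main

    import (
        "fmt"
        "runtime"
        "sync/atomic"
    )

    func main() {
        var done int32
        for i := 0; i < 16; i++ {
            p := new(byte) // 1-byte noscan objects can share one tiny block
            runtime.SetFinalizer(p, func(*byte) { atomic.AddInt32(&done, 1) })
        }
        // Before the fix, only one finalizer per tiny block could be queued
        // per GC cycle, so draining all 16 could take 16 cycles.
        for i := 0; i < 20 && atomic.LoadInt32(&done) < 16; i++ {
            runtime.GC()
        }
        fmt.Println("finalized:", atomic.LoadInt32(&done))
    }
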
Keith Randall
02f4d0a130 [dev.ssa] cmd/compile: start arguments as spilled
Declare a function's arguments as having already been
spilled so their use just requires a restore.

Allow spill locations to be portions of larger objects on the stack.
This is required to load portions of compound input arguments.

Rename the memory input to InputMem.  Use Arg for the
pre-spilled argument values.

Change-Id: I8fe2a03ffbba1022d98bfae2052b376b96d32dda
Reviewed-on: https://go-review.googlesource.com/16536
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-11-03 17:29:40 +00:00
Ilya Tocar
95333aea53 strings: add asm version of Index() for short strings on amd64
Currently we have a special case for 1-byte strings; this extends it to
strings shorter than 32 bytes on amd64.
Results (broadwell):

name                 old time/op  new time/op  delta
IndexRune-4          57.4ns ± 0%  57.5ns ± 0%   +0.10%        (p=0.000 n=20+19)
IndexRuneFastPath-4  20.4ns ± 0%  20.4ns ± 0%     ~     (all samples are equal)
Index-4              21.0ns ± 0%  21.8ns ± 0%   +3.81%        (p=0.000 n=20+20)
LastIndex-4          7.07ns ± 1%  6.98ns ± 0%   -1.21%        (p=0.000 n=20+16)
IndexByte-4          18.3ns ± 0%  18.3ns ± 0%     ~     (all samples are equal)
IndexHard1-4         1.46ms ± 0%  0.39ms ± 0%  -73.06%        (p=0.000 n=16+16)
IndexHard2-4         1.46ms ± 0%  0.30ms ± 0%  -79.55%        (p=0.000 n=18+18)
IndexHard3-4         1.46ms ± 0%  0.66ms ± 0%  -54.68%        (p=0.000 n=19+19)
LastIndexHard1-4     1.46ms ± 0%  1.46ms ± 0%   -0.01%        (p=0.036 n=18+20)
LastIndexHard2-4     1.46ms ± 0%  1.46ms ± 0%     ~           (p=0.588 n=19+19)
LastIndexHard3-4     1.46ms ± 0%  1.46ms ± 0%     ~           (p=0.283 n=17+20)
IndexTorture-4       11.1µs ± 0%  11.1µs ± 0%   +0.01%        (p=0.000 n=18+17)

Change-Id: I892781549f558f698be4e41f9f568e3d0611efb5
Reviewed-on: https://go-review.googlesource.com/16430
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
2015-11-03 16:04:28 +00:00
Austin Clements
1870572180 runtime: enlarge GC work buffer size
Currently the GC work buffers are only 256 bytes and hence can record
only 24 64-bit pointers. They were reduced from 4K in commits db7fd1c
and a15818f as a way to minimize the amount of work the per-P workbuf
caches could "hide" from the mark phase and carry into the mark
termination phase. However, this approach wasn't very robust and we
later added a "mark 2" phase to address this problem head-on.

Because of mark 2, there's now no benefit to having very small work
buffers. But there are plenty of downsides: small work buffers
increase contention on the work lists, increase the frequency and
hence net overhead of acquiring and releasing work buffers, and
somewhat increase memory overhead of the GC.

This commit expands work buffers back to 4K (504 64-bit pointers).
This reduces the rate of writes to work.full in the garbage benchmark
from a peak of ~780,000 writes/sec to a peak of ~32,000 writes/sec.

This has negligible effect on the go1 benchmarks. It slightly slows
down the garbage benchmark.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.37ms ± 5%  5.60ms ± 2%  +4.37%  (p=0.000 n=20+20)

Change-Id: Ic9cc28e7a125d23d9faf4f5e690fb8aa9bcdfb28
Reviewed-on: https://go-review.googlesource.com/15893
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-03 15:53:38 +00:00
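
The 504-pointer figure follows from the buffer layout; a quick check (the 64-byte header size is an assumption implied by the arithmetic):

    package main

    import "fmt"

    func main() {
        const workbufSize = 4096 // bytes, after this CL
        const hdrSize = 64       // assumed header size implied by the math
        const ptrSize = 8        // 64-bit targets

        fmt.Println((workbufSize - hdrSize) / ptrSize) // 504
    }
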
Austin Clements
456528304d runtime: make assists preemptible
Currently, assists are non-preemptible, which means a heavily
assisting G can block other Gs from running. At the beginning of a GC
cycle, it can also delay scang, which will spin until the assist is
done. Since scanning is currently done sequentially, this can
seriously extend the length of the scan phase.

Fix this by making assists preemptible. Since the assist holds work
buffers and runs on the system stack, this must be done cooperatively:
we make gcDrainN return on preemption, and make the assist return from
the system stack and voluntarily Gosched.

This is prerequisite to enlarging the work buffers. Without this
change, the delays and spinning in scang increase significantly.

This has no effect on the go1 benchmarks.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.72ms ± 4%  5.37ms ± 5%  -6.11%  (p=0.000 n=20+20)

Change-Id: I829e732a0f23b126da633516a1a9ec1a508fdbf1
Reviewed-on: https://go-review.googlesource.com/15894
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-03 15:53:31 +00:00
Austin Clements
15aa6bbd5a runtime: replace assist sleep loop with park/ready
GC assists must block until the assist can be satisfied (either
through stealing credit or doing work) or the GC cycle ends.
Currently, this is implemented as a retry loop with a 100 µs delay.
This obviously isn't ideal, as it wastes CPU and delays mutator
execution. It also has the somewhat peculiar downside that sleeping a
G requires allocation, and this requires working around recursive
allocation.

Replace this timed delay with a proper scheduling queue. When an
assist can't be satisfied immediately, it adds the allocating G to a
queue and parks it. Any time background scan credit is flushed, it
consults this queue, directly satisfies the debt of queued assists,
and wakes up satisfied assists before flushing any remaining credit to
the background credit pool.
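
As a rough standalone sketch of that ordering (queued assists are paid
before leftover credit goes to the pool; the channel stands in for
park/ready and all names here are invented):

    package main

    import "fmt"

    // waiter models a G parked on the assist queue with outstanding debt.
    type waiter struct {
        debt int
        done chan struct{} // closed when the debt is satisfied ("ready")
    }

    // flushCredit pays parked assists in order before banking leftover
    // credit, mirroring the description above. The runtime parks and
    // readies goroutines directly; the channel is just a stand-in.
    func flushCredit(queue []*waiter, credit int) ([]*waiter, int) {
        for len(queue) > 0 && credit >= queue[0].debt {
            w := queue[0]
            credit -= w.debt
            close(w.done) // wake the satisfied assist
            queue = queue[1:]
        }
        return queue, credit // leftover goes to the background pool
    }

    func main() {
        q := []*waiter{{debt: 3, done: make(chan struct{})}}
        q, credit := flushCredit(q, 5)
        fmt.Println(len(q), credit) // 0 2
    }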

No effect on the go1 benchmarks. Slightly speeds up the garbage
benchmark.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.81ms ± 1%  5.72ms ± 4%  -1.65%  (p=0.011 n=20+20)

Updates #12041.

Change-Id: I8ee3b6274dd097b12b10a8030796a958a4b0e7b7
Reviewed-on: https://go-review.googlesource.com/15890
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-03 15:53:25 +00:00
Austin Clements
0ca4488cc1 runtime: change p.runq from []*g to []guintptr
This eliminates many write barriers in the scheduler code that are
unnecessary and will interfere with upcoming changes where the garbage
collector will have to invoke run queue functions in contexts that
must not have write barriers.
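
A minimal sketch of the guintptr idea, safe only under the runtime's
invariant that Gs are never moved or freed while referenced this way:

    package main

    import (
        "fmt"
        "unsafe"
    )

    type g struct{ id int }

    // guintptr holds a *g as a plain word, so storing one into a run
    // queue is an ordinary write with no write barrier. This is only
    // safe because the runtime guarantees Gs are not moved or freed
    // out from under these references.
    type guintptr uintptr

    func (gp guintptr) ptr() *g   { return (*g)(unsafe.Pointer(gp)) }
    func (gp *guintptr) set(p *g) { *gp = guintptr(unsafe.Pointer(p)) }

    func main() {
        var slot guintptr
        gp := &g{id: 1}
        slot.set(gp) // plain word write, no barrier
        fmt.Println(slot.ptr().id) // 1
    }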

Change-Id: I702d0ac99cfd00ffff406e7362917db6a43e7e55
Reviewed-on: https://go-review.googlesource.com/16556
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-11-03 15:53:18 +00:00
Todd Neal
e3e0122ae2 test: use go:noinline consistently
Replace various implementations of inlining prevention with
"go:noinline"

Change-Id: Iac90895c3a62d6f4b7a6c72e11e165d15a0abfa4
Reviewed-on: https://go-review.googlesource.com/16510
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-03 02:01:34 +00:00
Ilya Tocar
0e23ca41d9 bytes: speed up Compare() on amd64
Use AVX2 if available.
Results (haswell):

name                           old time/op    new time/op     delta
BytesCompare1-6                  11.4ns ± 0%     11.4ns ± 0%     ~     (all samples are equal)
BytesCompare2-6                  11.4ns ± 0%     11.4ns ± 0%     ~     (all samples are equal)
BytesCompare4-6                  11.4ns ± 0%     11.4ns ± 0%     ~     (all samples are equal)
BytesCompare8-6                  9.29ns ± 2%     8.76ns ± 0%   -5.72%        (p=0.000 n=16+17)
BytesCompare16-6                 9.29ns ± 2%     9.20ns ± 0%   -1.02%        (p=0.000 n=20+16)
BytesCompare32-6                 11.4ns ± 1%     11.4ns ± 0%     ~           (p=0.191 n=20+20)
BytesCompare64-6                 14.4ns ± 0%     13.1ns ± 0%   -8.68%        (p=0.000 n=20+20)
BytesCompare128-6                20.2ns ± 0%     18.5ns ± 0%   -8.27%        (p=0.000 n=16+20)
BytesCompare256-6                29.3ns ± 0%     24.5ns ± 0%  -16.38%        (p=0.000 n=16+16)
BytesCompare512-6                46.8ns ± 0%     37.1ns ± 0%  -20.78%        (p=0.000 n=18+16)
BytesCompare1024-6               82.9ns ± 0%     62.3ns ± 0%  -24.86%        (p=0.000 n=20+14)
BytesCompare2048-6                155ns ± 0%      112ns ± 0%  -27.74%        (p=0.000 n=20+20)
CompareBytesEqual-6              10.1ns ± 1%     10.0ns ± 1%     ~           (p=0.527 n=20+20)
CompareBytesToNil-6              10.0ns ± 2%      9.4ns ± 0%   -6.57%        (p=0.000 n=20+17)
CompareBytesEmpty-6              8.76ns ± 0%     8.76ns ± 0%     ~     (all samples are equal)
CompareBytesIdentical-6          8.76ns ± 0%     8.76ns ± 0%     ~     (all samples are equal)
CompareBytesSameLength-6         10.6ns ± 1%     10.6ns ± 1%     ~           (p=0.240 n=20+20)
CompareBytesDifferentLength-6    10.6ns ± 0%     10.6ns ± 1%     ~           (p=1.000 n=20+20)
CompareBytesBigUnaligned-6        132µs ± 1%      105µs ± 1%  -20.61%        (p=0.000 n=20+18)
CompareBytesBig-6                 125µs ± 1%      105µs ± 1%  -16.31%        (p=0.000 n=20+20)
CompareBytesBigIdentical-6       8.13ns ± 0%     8.13ns ± 0%     ~     (all samples are equal)

name                           old speed      new speed       delta
CompareBytesBigUnaligned-6     7.94GB/s ± 1%  10.01GB/s ± 1%  +25.96%        (p=0.000 n=20+18)
CompareBytesBig-6              8.38GB/s ± 1%  10.01GB/s ± 1%  +19.48%        (p=0.000 n=20+20)
CompareBytesBigIdentical-6      129TB/s ± 0%    129TB/s ± 0%   +0.01%        (p=0.003 n=17+19)

Change-Id: I820f31bab4582dd4204b146bb077c0d2f24cd8f5
Reviewed-on: https://go-review.googlesource.com/16434
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Klaus Post <klauspost@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2015-11-02 18:39:38 +00:00
Michael Hudson-Doyle
35d71d6727 cmd/go, runtime: define GOBUILDMODE_shared rather than shared when dynamically linking
To avoid collisions with what existing code may already be doing.

Change-Id: Ice639440aafc0724714c25333d90a49954372230
Reviewed-on: https://go-review.googlesource.com/16503
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-01 19:52:33 +00:00
Austin Clements
fbf273250f runtime: perform mark 2 root re-scanning in GC workers
This moves another root scanning task out of the GC coordinator and
parallelizes it on the GC workers.

This has negligible effect on the go1 benchmarks and the garbage
benchmark.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.24ms ± 1%  5.26ms ± 1%  +0.30%  (p=0.007 n=18+17)

name                      old time/op    new time/op    delta
BinaryTree17-12              3.20s ± 5%     3.21s ± 5%    ~     (p=0.264 n=20+18)
Fannkuch11-12                2.46s ± 1%     2.54s ± 2%  +3.09%  (p=0.000 n=18+20)
FmtFprintfEmpty-12          49.9ns ± 4%    50.0ns ± 5%    ~     (p=0.356 n=20+20)
FmtFprintfString-12          170ns ± 1%     170ns ± 2%    ~     (p=0.815 n=19+20)
FmtFprintfInt-12             160ns ± 1%     159ns ± 1%  -0.63%  (p=0.003 n=18+19)
FmtFprintfIntInt-12          270ns ± 1%     267ns ± 1%  -1.00%  (p=0.000 n=19+18)
FmtFprintfPrefixedInt-12     238ns ± 1%     232ns ± 1%  -2.28%  (p=0.000 n=19+19)
FmtFprintfFloat-12           310ns ± 2%     313ns ± 2%  +0.93%  (p=0.000 n=19+19)
FmtManyArgs-12              1.06µs ± 1%    1.04µs ± 1%  -1.93%  (p=0.000 n=20+19)
GobDecode-12                8.63ms ± 1%    8.70ms ± 1%  +0.81%  (p=0.001 n=20+19)
GobEncode-12                6.52ms ± 1%    6.56ms ± 1%  +0.66%  (p=0.000 n=20+19)
Gzip-12                      318ms ± 1%     319ms ± 1%    ~     (p=0.405 n=17+18)
Gunzip-12                   42.1ms ± 2%    42.0ms ± 1%    ~     (p=0.771 n=20+19)
HTTPClientServer-12         62.6µs ± 1%    62.9µs ± 1%  +0.41%  (p=0.038 n=20+20)
JSONEncode-12               16.9ms ± 1%    16.9ms ± 1%    ~     (p=0.077 n=18+20)
JSONDecode-12               60.7ms ± 1%    62.3ms ± 1%  +2.73%  (p=0.000 n=20+20)
Mandelbrot200-12            3.86ms ± 1%    3.85ms ± 1%    ~     (p=0.084 n=19+20)
GoParse-12                  3.75ms ± 2%    3.73ms ± 1%    ~     (p=0.107 n=20+19)
RegexpMatchEasy0_32-12       100ns ± 2%     101ns ± 2%  +0.97%  (p=0.001 n=20+19)
RegexpMatchEasy0_1K-12       342ns ± 2%     332ns ± 2%  -2.86%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12      83.2ns ± 2%    82.8ns ± 2%    ~     (p=0.108 n=19+20)
RegexpMatchEasy1_1K-12       495ns ± 2%     490ns ± 2%  -1.04%  (p=0.000 n=18+19)
RegexpMatchMedium_32-12      130ns ± 2%     131ns ± 2%    ~     (p=0.291 n=20+20)
RegexpMatchMedium_1K-12     39.3µs ± 1%    39.9µs ± 1%  +1.54%  (p=0.000 n=18+20)
RegexpMatchHard_32-12       2.02µs ± 1%    2.05µs ± 2%  +1.19%  (p=0.000 n=19+19)
RegexpMatchHard_1K-12       60.9µs ± 1%    61.5µs ± 1%  +0.99%  (p=0.000 n=18+18)
Revcomp-12                   535ms ± 1%     531ms ± 1%  -0.82%  (p=0.000 n=17+17)
Template-12                 73.0ms ± 1%    74.1ms ± 1%  +1.47%  (p=0.000 n=20+20)
TimeParse-12                 356ns ± 2%     348ns ± 1%  -2.30%  (p=0.000 n=20+20)
TimeFormat-12                347ns ± 1%     353ns ± 1%  +1.68%  (p=0.000 n=19+20)
[Geo mean]                  62.3µs         62.4µs       +0.12%

name                      old speed      new speed      delta
GobDecode-12              88.9MB/s ± 1%  88.2MB/s ± 1%  -0.81%  (p=0.001 n=20+19)
GobEncode-12               118MB/s ± 1%   117MB/s ± 1%  -0.66%  (p=0.000 n=20+19)
Gzip-12                   60.9MB/s ± 1%  60.8MB/s ± 1%    ~     (p=0.409 n=17+18)
Gunzip-12                  461MB/s ± 2%   462MB/s ± 1%    ~     (p=0.765 n=20+19)
JSONEncode-12              115MB/s ± 1%   115MB/s ± 1%    ~     (p=0.078 n=18+20)
JSONDecode-12             32.0MB/s ± 1%  31.1MB/s ± 1%  -2.65%  (p=0.000 n=20+20)
GoParse-12                15.5MB/s ± 2%  15.5MB/s ± 1%    ~     (p=0.111 n=20+19)
RegexpMatchEasy0_32-12     318MB/s ± 2%   314MB/s ± 2%  -1.27%  (p=0.000 n=20+19)
RegexpMatchEasy0_1K-12    2.99GB/s ± 1%  3.08GB/s ± 2%  +2.94%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12     385MB/s ± 2%   386MB/s ± 2%    ~     (p=0.105 n=19+20)
RegexpMatchEasy1_1K-12    2.07GB/s ± 1%  2.09GB/s ± 2%  +1.06%  (p=0.000 n=18+19)
RegexpMatchMedium_32-12   7.64MB/s ± 2%  7.61MB/s ± 1%    ~     (p=0.179 n=20+20)
RegexpMatchMedium_1K-12   26.1MB/s ± 1%  25.7MB/s ± 1%  -1.52%  (p=0.000 n=18+20)
RegexpMatchHard_32-12     15.8MB/s ± 1%  15.6MB/s ± 2%  -1.18%  (p=0.000 n=19+19)
RegexpMatchHard_1K-12     16.8MB/s ± 2%  16.6MB/s ± 1%  -0.90%  (p=0.000 n=19+18)
Revcomp-12                 475MB/s ± 1%   479MB/s ± 1%  +0.83%  (p=0.000 n=17+17)
Template-12               26.6MB/s ± 1%  26.2MB/s ± 1%  -1.45%  (p=0.000 n=20+20)
[Geo mean]                99.0MB/s       98.7MB/s       -0.32%

Change-Id: I6ea44d7a59aaa6851c64695277ab65645ff9d32e
Reviewed-on: https://go-review.googlesource.com/16070
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-10-30 22:46:39 +00:00
Austin Clements
82d14d77da runtime: perform concurrent scan in GC workers
Currently the concurrent root scan is performed in its entirety by the
GC coordinator before entering concurrent mark (which enables GC
workers). This scan is done sequentially, which can prolong the scan
phase, delay the mark phase, and means that the scan phase does not
obey the 25% CPU goal. Furthermore, there's no need to complete the
root scan before starting marking (in fact, we already allow GC
assists to happen during the scan phase), so this acts as an
unnecessary barrier between root scanning and marking.

This change shifts the root scan work out of the GC coordinator and
into the GC workers. The coordinator simply sets up the scan state and
enqueues the right number of root scan jobs. The GC workers then drain
the root scan jobs prior to draining heap scan jobs.
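
The drain order can be pictured with a standalone sketch (slices stand
in for the job lists; this is not the runtime's actual scheduler):

    package main

    import "fmt"

    // A worker keeps taking root-scan jobs until none remain, and only
    // then takes heap-scan work.
    func main() {
        roots := []string{"data", "bss", "stack g1"}
        heap := []string{"obj 1", "obj 2"}

        for len(roots) > 0 || len(heap) > 0 {
            if len(roots) > 0 { // roots drain first
                fmt.Println("scan root:", roots[0])
                roots = roots[1:]
                continue
            }
            fmt.Println("scan heap:", heap[0])
            heap = heap[1:]
        }
    }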

This parallelizes the root scan process, makes it obey the 25% CPU
goal, and effectively eliminates root scanning as an isolated phase,
allowing the system to smoothly transition from root scanning to heap
marking. This also eliminates a major non-STW responsibility of the GC
coordinator, which will make it easier to switch to a decentralized
state machine. Finally, it puts us in a good position to perform root
scanning in assists as well, which will help satisfy assists at the
beginning of the GC cycle.

This is mostly straightforward. One tricky aspect is that we have to
deal with preemption deadlock, where two non-preemptible goroutines
are trying to preempt each other to perform a stack scan. Given the
context where this happens, the only instance of this is two
background workers trying to scan each other. We avoid this by simply
not scanning the stacks of background workers during the concurrent
phase; this is safe because we'll scan them during mark termination
(and their stacks are *very* small and should not contain any new
pointers).

This change also switches the root marking during mark termination to
use the same gcDrain-based code path as concurrent mark. This
shouldn't affect performance because STW root marking was already
parallel and tasks switched to heap marking immediately when no more
root marking tasks were available. However, it simplifies the code and
unifies these code paths.

This has negligible effect on the go1 benchmarks. It slightly slows
down the garbage benchmark, possibly by making GC run slightly more
frequently.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.10ms ± 1%  5.24ms ± 1%  +2.87%  (p=0.000 n=18+18)

name                      old time/op    new time/op    delta
BinaryTree17-12              3.25s ± 3%     3.20s ± 5%  -1.57%  (p=0.013 n=20+20)
Fannkuch11-12                2.45s ± 1%     2.46s ± 1%  +0.38%  (p=0.019 n=20+18)
FmtFprintfEmpty-12          49.7ns ± 3%    49.9ns ± 4%    ~     (p=0.851 n=19+20)
FmtFprintfString-12          170ns ± 2%     170ns ± 1%    ~     (p=0.775 n=20+19)
FmtFprintfInt-12             161ns ± 1%     160ns ± 1%  -0.78%  (p=0.000 n=19+18)
FmtFprintfIntInt-12          267ns ± 1%     270ns ± 1%  +1.04%  (p=0.000 n=19+19)
FmtFprintfPrefixedInt-12     238ns ± 2%     238ns ± 1%    ~     (p=0.133 n=18+19)
FmtFprintfFloat-12           311ns ± 1%     310ns ± 2%  -0.35%  (p=0.023 n=20+19)
FmtManyArgs-12              1.08µs ± 1%    1.06µs ± 1%  -2.31%  (p=0.000 n=20+20)
GobDecode-12                8.65ms ± 1%    8.63ms ± 1%    ~     (p=0.377 n=18+20)
GobEncode-12                6.49ms ± 1%    6.52ms ± 1%  +0.37%  (p=0.015 n=20+20)
Gzip-12                      319ms ± 3%     318ms ± 1%    ~     (p=0.975 n=19+17)
Gunzip-12                   41.9ms ± 1%    42.1ms ± 2%  +0.65%  (p=0.004 n=19+20)
HTTPClientServer-12         61.7µs ± 1%    62.6µs ± 1%  +1.40%  (p=0.000 n=18+20)
JSONEncode-12               16.8ms ± 1%    16.9ms ± 1%    ~     (p=0.239 n=20+18)
JSONDecode-12               58.4ms ± 1%    60.7ms ± 1%  +3.85%  (p=0.000 n=19+20)
Mandelbrot200-12            3.86ms ± 0%    3.86ms ± 1%    ~     (p=0.092 n=18+19)
GoParse-12                  3.75ms ± 2%    3.75ms ± 2%    ~     (p=0.708 n=19+20)
RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 2%  +0.60%  (p=0.010 n=17+20)
RegexpMatchEasy0_1K-12       341ns ± 1%     342ns ± 2%    ~     (p=0.203 n=20+19)
RegexpMatchEasy1_32-12      82.5ns ± 2%    83.2ns ± 2%  +0.83%  (p=0.007 n=19+19)
RegexpMatchEasy1_1K-12       495ns ± 1%     495ns ± 2%    ~     (p=0.970 n=19+18)
RegexpMatchMedium_32-12      130ns ± 2%     130ns ± 2%  +0.59%  (p=0.039 n=19+20)
RegexpMatchMedium_1K-12     39.2µs ± 1%    39.3µs ± 1%    ~     (p=0.214 n=18+18)
RegexpMatchHard_32-12       2.03µs ± 2%    2.02µs ± 1%    ~     (p=0.166 n=18+19)
RegexpMatchHard_1K-12       61.0µs ± 1%    60.9µs ± 1%    ~     (p=0.169 n=20+18)
Revcomp-12                   533ms ± 1%     535ms ± 1%    ~     (p=0.071 n=19+17)
Template-12                 68.1ms ± 2%    73.0ms ± 1%  +7.26%  (p=0.000 n=19+20)
TimeParse-12                 355ns ± 2%     356ns ± 2%    ~     (p=0.530 n=19+20)
TimeFormat-12                357ns ± 2%     347ns ± 1%  -2.59%  (p=0.000 n=20+19)
[Geo mean]                  62.1µs         62.3µs       +0.31%

name                      old speed      new speed      delta
GobDecode-12              88.7MB/s ± 1%  88.9MB/s ± 1%    ~     (p=0.377 n=18+20)
GobEncode-12               118MB/s ± 1%   118MB/s ± 1%  -0.37%  (p=0.015 n=20+20)
Gzip-12                   60.9MB/s ± 3%  60.9MB/s ± 1%    ~     (p=0.944 n=19+17)
Gunzip-12                  464MB/s ± 1%   461MB/s ± 2%  -0.64%  (p=0.004 n=19+20)
JSONEncode-12              115MB/s ± 1%   115MB/s ± 1%    ~     (p=0.236 n=20+18)
JSONDecode-12             33.2MB/s ± 1%  32.0MB/s ± 1%  -3.71%  (p=0.000 n=19+20)
GoParse-12                15.5MB/s ± 2%  15.5MB/s ± 2%    ~     (p=0.702 n=19+20)
RegexpMatchEasy0_32-12     320MB/s ± 1%   318MB/s ± 2%    ~     (p=0.094 n=18+20)
RegexpMatchEasy0_1K-12    3.00GB/s ± 1%  2.99GB/s ± 1%    ~     (p=0.194 n=20+19)
RegexpMatchEasy1_32-12     388MB/s ± 2%   385MB/s ± 2%  -0.83%  (p=0.008 n=19+19)
RegexpMatchEasy1_1K-12    2.07GB/s ± 1%  2.07GB/s ± 1%    ~     (p=0.964 n=19+18)
RegexpMatchMedium_32-12   7.68MB/s ± 1%  7.64MB/s ± 2%  -0.57%  (p=0.020 n=19+20)
RegexpMatchMedium_1K-12   26.1MB/s ± 1%  26.1MB/s ± 1%    ~     (p=0.211 n=18+18)
RegexpMatchHard_32-12     15.8MB/s ± 1%  15.8MB/s ± 1%    ~     (p=0.180 n=18+19)
RegexpMatchHard_1K-12     16.8MB/s ± 1%  16.8MB/s ± 2%    ~     (p=0.236 n=20+19)
Revcomp-12                 477MB/s ± 1%   475MB/s ± 1%    ~     (p=0.071 n=19+17)
Template-12               28.5MB/s ± 2%  26.6MB/s ± 1%  -6.77%  (p=0.000 n=19+20)
[Geo mean]                 100MB/s       99.0MB/s       -0.82%

Change-Id: I875bf6ceb306d1ee2f470cabf88aa6ede27c47a0
Reviewed-on: https://go-review.googlesource.com/16059
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-30 22:46:31 +00:00
Austin Clements
4cca1cc05e runtime: consolidate "out of GC work" checks
We already have gcMarkWorkAvailable, but the check for GC mark work is
open-coded in several places. Generalize gcMarkWorkAvailable slightly
and replace these open-coded checks with calls to gcMarkWorkAvailable.

In addition to cleaning up the code, this puts us in a better position
to make this check slightly more complicated.
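
A sketch of the shape of the consolidated check (field and type names
are simplified stand-ins for the runtime's):

    package main

    import "fmt"

    type gcWork struct{ n int }

    func (w *gcWork) empty() bool { return w.n == 0 }

    type pp struct{ gcw gcWork }

    var work struct {
        full         uint64 // non-zero if full work buffers exist
        markrootNext uint32
        markrootJobs uint32
    }

    // markWorkAvailable reports whether p (or any worker, if p is nil)
    // has mark work to do: local cached work, global full buffers, or
    // unclaimed root jobs. A sketch of the consolidated check.
    func markWorkAvailable(p *pp) bool {
        if p != nil && !p.gcw.empty() {
            return true
        }
        return work.full != 0 || work.markrootNext < work.markrootJobs
    }

    func main() {
        work.markrootJobs = 1
        fmt.Println(markWorkAvailable(nil)) // true: a root job remains
    }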

Change-Id: I1b29883300ecd82a1bf6be193e9b4ee96582a860
Reviewed-on: https://go-review.googlesource.com/16058
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-30 22:46:22 +00:00
Russ Cox
bf1de1b141 runtime: introduce GOTRACEBACK=single, now the default
Abandon (but still support) the old numbering system.

GOTRACEBACK=none is old 0
GOTRACEBACK=single is the new behavior
GOTRACEBACK=all is old 1
GOTRACEBACK=system is old 2
GOTRACEBACK=crash is unchanged

See doc comment change in runtime1.go for details.

Filed #13107 to decide whether to change the default back to GOTRACEBACK=all for the Go 1.6 release.
If you run into programs where printing only the current goroutine omits
needed information, please add details in a comment on that issue.

Fixes #12366.

Change-Id: I82ca8b99b5d86dceb3f7102d38d2659d45dbe0db
Reviewed-on: https://go-review.googlesource.com/16512
Reviewed-by: Austin Clements <austin@google.com>
2015-10-30 18:43:44 +00:00
Michael Hudson-Doyle
c9b8cab16c cmd/internal/obj, cmd/link, runtime: handle TLS more like a platform linker on ppc64
On ppc64x, the thread pointer, held in R13, points 0x7000 bytes past where
thread-local storage begins (presumably to maximize the amount of storage that
can be accessed with a 16-bit signed displacement). The relocations used to
indicate thread-local storage to the platform linker account for this, so to be
able to support external linking we need to change things so the linker applies
this offset instead of the runtime assembly.

Change-Id: I2556c249ab2d802cae62c44b2b4c5b44787d7059
Reviewed-on: https://go-review.googlesource.com/14233
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-10-29 22:24:29 +00:00
Michael Hudson-Doyle
8537ff8a39 runtime/cgo: export _cgo_reginit on ppc64x
This is needed to make external linking work.

Change-Id: I4cf7edb4ea318849cab92a697952f8745eed40c4
Reviewed-on: https://go-review.googlesource.com/14237
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-29 00:38:43 +00:00
Ian Lance Taylor
f6fd086d5e runtime: add missing word in comment
Change-Id: Iffe27445e35ec071cf0920a05c81b8a97a3ed712
Reviewed-on: https://go-review.googlesource.com/16431
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-28 23:09:44 +00:00
David Crawshaw
73ff7cb1ed runtime: c-shared entrypoint for linux/arm64
Change-Id: I7dab124842f5209097a8d5a802fcbdde650654fa
Reviewed-on: https://go-review.googlesource.com/16395
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-28 21:21:33 +00:00
Hyang-Ah Hana Kim
dfc8649854 runtime, cmd: TLS setup for android/amd64.
The Android linker does not handle TLS for us, so we set up the TLS
slot for g ourselves, as we already do for darwin/386 and darwin/amd64.
This is disgusting and fragile. We will eventually fix this ugly hack
by taking advantage of the recent TLS IE model implementation:
instead of referencing a GOT entry, make the code sequence look into
the TLS variable that holds the offset.

The TLS slot for g in android/amd64 assumes a fixed offset from %fs.
See runtime/cgo/gcc_android_amd64.c for details.

For golang/go#10743

Change-Id: I1a3fc207946c665515f79026a56ea19134ede2dd
Reviewed-on: https://go-review.googlesource.com/15991
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-10-28 20:54:28 +00:00
Michael Hudson-Doyle
72180c3b82 cmd/internal/obj, cmd/link, runtime: native-ish support for tls on arm64
Fixes #10560

Change-Id: Iedffd9c236c4fbb386c3afc52c5a1457f96ef122
Reviewed-on: https://go-review.googlesource.com/13991
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-10-28 19:51:05 +00:00
David du Colombier
31430bda09 runtime: don't use FP when calling nextSample in the Plan 9 sighandler
In the Go signal handler on Plan 9, when a signal with
the _SigThrow flag is received, we call startpanic before
printing the stack trace.

The startpanic function calls systemstack which calls
startpanic_m. In the startpanic_m function, we call
allocmcache to allocate _g_.m.mcache. The problem is
that allocmcache calls nextSample, which does a floating
point operation to return a sampling point for heap profiling.

However, Plan 9 doesn't support floating point in the
signal handler.

This change adds a new function, nextSampleNoFP, called only from
the Plan 9 signal handler; it is similar to nextSample but avoids
floating point.
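
One integer-only way to pick a sample point is a uniform draw with the
right mean, as in this sketch (an illustration of the idea, not
necessarily the runtime's exact code or distribution):

    package main

    import (
        "fmt"
        "math/rand"
    )

    // nextSampleNoFP picks a heap-profile sample point with integer
    // math only: a uniform draw in [0, 2*rate) has mean rate, a crude
    // stand-in for nextSample's exponential draw.
    func nextSampleNoFP(rate int32) int32 {
        if rate <= 0 {
            return 0
        }
        if rate > 1<<30 {
            rate = 1 << 30 // clamp to avoid overflow in 2*rate
        }
        return rand.Int31n(2 * rate)
    }

    func main() {
        fmt.Println(nextSampleNoFP(512 * 1024))
    }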

Change-Id: Iaa30437aa0f7c8c84d40afbab7567ad3bd5ea2de
Reviewed-on: https://go-review.googlesource.com/16307
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-28 05:45:24 +00:00
Michael Hudson-Doyle
bc3f14fd2a runtime: invoke vsyscall helper via TCB when dynamic linking on linux/386
The dynamic linker on linux/386 stores the address of the vsyscall helper at a
fixed offset from the %gs register for easy access from PIC code.

Change-Id: I635305cfecceef2289985d62e676e16810ed6b94
Reviewed-on: https://go-review.googlesource.com/16346
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-28 01:36:25 +00:00
Matthew Dempsky
4ff231bca1 runtime: eliminate some unnecessary uintptr conversions
arena_{start,used,end} are already uintptr, so no need to convert them
to uintptr, much less to convert them to unsafe.Pointer and then to
uintptr.  No binary change to pkg/linux_amd64/runtime.a.

Change-Id: Ia4232ed2a724c44fde7eba403c5fe8e6dccaa879
Reviewed-on: https://go-review.googlesource.com/16339
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-27 02:53:04 +00:00
David du Colombier
d093bf489b runtime: handle abort note on Plan 9
Implement an abort note on Plan 9, as an
equivalent of the SIGABRT signal on other
operating systems.

Updates #11975.

Change-Id: I010c9b10f2fbd2471aacd1d073368d975a2f0592
Reviewed-on: https://go-review.googlesource.com/16300
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-26 22:12:30 +00:00
Matthew Dempsky
d18167fefe runtime: fix tiny allocator
When a new tiny block is allocated because we're allocating an object
that won't fit into the current block, mallocgc saves the new block if
it has more space leftover than the old block.  However, the logic for
this was subtly broken in golang.org/cl/2814, resulting in never
saving (or consequently reusing) a tiny block.
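
The corrected comparison, as a standalone sketch: after placing an
object of the given size, the new block has maxTinySize-size bytes
free versus maxTinySize-tinyoffset in the cached block, so the new
block wins exactly when size < tinyoffset (illustrative; not the
mallocgc code itself):

    package main

    import "fmt"

    const maxTinySize = 16

    // keepNewBlock reports whether the freshly allocated tiny block
    // should replace the cached one, i.e. whether it has more space
    // left over. This is the comparison the broken logic never took.
    func keepNewBlock(size, tinyoffset uintptr) bool {
        return size < tinyoffset
    }

    func main() {
        // New block has 16-4=12 bytes free vs the old block's 16-12=4.
        fmt.Println(keepNewBlock(4, 12)) // true: keep the new block
    }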

Change-Id: Ib5f6769451fb82877ddeefe75dfe79ed4a04fd40
Reviewed-on: https://go-review.googlesource.com/16330
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-26 21:14:15 +00:00
Austin Clements
d3df04cd8c runtime: partition data and BSS root marking
Currently data and BSS root marking are each a single markroot job.
This makes them difficult to load balance, which can draw out mark
termination time if they are large.

Fix this by splitting both into 256K chunks. While we're putting in
the infrastructure for dynamic roots, we also replace the fixed
sharding of the span roots with sharding into fixed sizes. In
addition to helping balance root marking, this also paves the way to
parallelizing concurrent scan and to letting assists help with root
marking.
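
A sketch of the partitioning arithmetic (the 256K block size is from
the text; the helper name is illustrative):

    package main

    import "fmt"

    const rootBlockBytes = 256 << 10 // 256K chunks, per the text

    // nBlocks is how many markroot jobs an n-byte root region becomes;
    // each job scans one chunk, so large regions load-balance.
    func nBlocks(n uintptr) int {
        return int((n + rootBlockBytes - 1) / rootBlockBytes)
    }

    func main() {
        fmt.Println(nBlocks(1 << 20)) // a 1MB data segment -> 4 jobs
    }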

Updates #10345. This fixes the data and BSS aspects of that bug; it
does not partition scanning of large heap objects.

This has negligible effect on either the go1 benchmarks or the garbage
benchmark:

name              old time/op  new time/op  delta
XBenchGarbage-12  4.90ms ± 1%  4.91ms ± 2%   ~     (p=0.058 n=17+16)

name                      old time/op    new time/op    delta
BinaryTree17-12              3.11s ± 4%     3.12s ± 4%    ~     (p=0.512 n=20+20)
Fannkuch11-12                2.53s ± 2%     2.47s ± 2%  -2.28%  (p=0.000 n=20+18)
FmtFprintfEmpty-12          49.1ns ± 1%    50.0ns ± 4%  +1.68%  (p=0.008 n=18+20)
FmtFprintfString-12          170ns ± 0%     172ns ± 1%  +1.05%  (p=0.000 n=14+19)
FmtFprintfInt-12             174ns ± 1%     162ns ± 1%  -6.81%  (p=0.000 n=18+17)
FmtFprintfIntInt-12          284ns ± 1%     277ns ± 1%  -2.42%  (p=0.000 n=20+19)
FmtFprintfPrefixedInt-12     252ns ± 1%     244ns ± 1%  -2.84%  (p=0.000 n=18+20)
FmtFprintfFloat-12           317ns ± 0%     311ns ± 0%  -1.95%  (p=0.000 n=19+18)
FmtManyArgs-12              1.08µs ± 1%    1.11µs ± 1%  +3.43%  (p=0.000 n=18+19)
GobDecode-12                8.56ms ± 1%    8.61ms ± 1%  +0.50%  (p=0.020 n=20+20)
GobEncode-12                6.58ms ± 1%    6.57ms ± 1%    ~     (p=0.792 n=20+19)
Gzip-12                      317ms ± 3%     317ms ± 2%    ~     (p=0.840 n=19+19)
Gunzip-12                   41.6ms ± 0%    41.6ms ± 0%  +0.07%  (p=0.027 n=18+15)
HTTPClientServer-12         62.2µs ± 1%    62.3µs ± 1%    ~     (p=0.283 n=19+20)
JSONEncode-12               16.5ms ± 2%    16.5ms ± 1%    ~     (p=0.857 n=20+19)
JSONDecode-12               58.5ms ± 1%    61.3ms ± 1%  +4.67%  (p=0.000 n=18+17)
Mandelbrot200-12            3.84ms ± 0%    3.84ms ± 0%    ~     (p=0.259 n=17+17)
GoParse-12                  3.70ms ± 2%    3.74ms ± 2%  +0.96%  (p=0.009 n=19+20)
RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 0%  +0.31%  (p=0.040 n=19+15)
RegexpMatchEasy0_1K-12       340ns ± 1%     340ns ± 1%    ~     (p=0.411 n=17+19)
RegexpMatchEasy1_32-12      82.7ns ± 2%    82.3ns ± 1%    ~     (p=0.456 n=20+19)
RegexpMatchEasy1_1K-12       498ns ± 2%     495ns ± 0%    ~     (p=0.108 n=19+17)
RegexpMatchMedium_32-12      130ns ± 1%     130ns ± 2%    ~     (p=0.405 n=18+19)
RegexpMatchMedium_1K-12     39.4µs ± 2%    39.1µs ± 1%  -0.64%  (p=0.002 n=20+19)
RegexpMatchHard_32-12       2.03µs ± 2%    2.02µs ± 0%    ~     (p=0.561 n=20+17)
RegexpMatchHard_1K-12       61.1µs ± 2%    60.8µs ± 1%    ~     (p=0.615 n=19+18)
Revcomp-12                   532ms ± 2%     531ms ± 1%    ~     (p=0.470 n=19+19)
Template-12                 68.5ms ± 1%    69.1ms ± 1%  +0.87%  (p=0.000 n=17+17)
TimeParse-12                 344ns ± 2%     344ns ± 1%  +0.25%  (p=0.032 n=19+18)
TimeFormat-12                347ns ± 1%     362ns ± 1%  +4.27%  (p=0.000 n=17+19)
[Geo mean]                  62.3µs         62.3µs       -0.04%

name                      old speed      new speed      delta
GobDecode-12              89.6MB/s ± 1%  89.2MB/s ± 1%  -0.50%  (p=0.019 n=20+20)
GobEncode-12               117MB/s ± 1%   117MB/s ± 1%    ~     (p=0.797 n=20+19)
Gzip-12                   61.3MB/s ± 3%  61.2MB/s ± 2%    ~     (p=0.834 n=19+19)
Gunzip-12                  467MB/s ± 0%   466MB/s ± 0%  -0.07%  (p=0.027 n=18+15)
JSONEncode-12              117MB/s ± 2%   117MB/s ± 1%    ~     (p=0.851 n=20+19)
JSONDecode-12             33.2MB/s ± 1%  31.7MB/s ± 1%  -4.47%  (p=0.000 n=18+17)
GoParse-12                15.6MB/s ± 2%  15.5MB/s ± 2%  -0.95%  (p=0.008 n=19+20)
RegexpMatchEasy0_32-12     321MB/s ± 2%   320MB/s ± 1%  -0.57%  (p=0.002 n=17+17)
RegexpMatchEasy0_1K-12    3.01GB/s ± 1%  3.01GB/s ± 1%    ~     (p=0.132 n=17+18)
RegexpMatchEasy1_32-12     387MB/s ± 2%   389MB/s ± 1%    ~     (p=0.423 n=20+19)
RegexpMatchEasy1_1K-12    2.05GB/s ± 2%  2.06GB/s ± 0%    ~     (p=0.129 n=19+17)
RegexpMatchMedium_32-12   7.64MB/s ± 1%  7.66MB/s ± 1%    ~     (p=0.258 n=18+19)
RegexpMatchMedium_1K-12   26.0MB/s ± 2%  26.2MB/s ± 1%  +0.64%  (p=0.002 n=20+19)
RegexpMatchHard_32-12     15.7MB/s ± 2%  15.8MB/s ± 1%    ~     (p=0.510 n=20+17)
RegexpMatchHard_1K-12     16.8MB/s ± 2%  16.8MB/s ± 1%    ~     (p=0.603 n=19+18)
Revcomp-12                 477MB/s ± 2%   479MB/s ± 1%    ~     (p=0.470 n=19+19)
Template-12               28.3MB/s ± 1%  28.1MB/s ± 1%  -0.85%  (p=0.000 n=17+17)
[Geo mean]                 100MB/s        100MB/s       -0.26%

Change-Id: Ib0bfe0145675ce88c5a8791752f7486ac98805b4
Reviewed-on: https://go-review.googlesource.com/16043
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-26 15:42:44 +00:00
David Crawshaw
21f35b33c2 runtime: use a 64kb system stack on arm
I went looking for an arm system whose stacks are by default smaller
than 64KB. In fact the smallest common linux target I could find was
Android, which like iOS uses 1MB stacks.

Fixes #11873

Change-Id: Ieeb66ad095b3da18d47ba21360ea75152a4107c6
Reviewed-on: https://go-review.googlesource.com/14602
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-10-26 15:10:34 +00:00
Caleb Spare
fb7178e7cc runtime: copy sqrt normalization bugfix from math
This copies the change from CL 16158 (applied as
22d4c8bf13).

Updates #13013

Change-Id: Id7d02e63d92806f06a4e064a91b2fb6574fe385f
Reviewed-on: https://go-review.googlesource.com/16291
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-23 23:43:47 +00:00
David Chase
e99dd52066 [dev.ssa] cmd/compile: enhance SSA filtering, add OpConvert
Modified GOSSA{HASH,PKG} environment variable filters to
make it easier to make/run with all SSA for testing.
Disable attempts at SSA for architectures that are not
amd64 (avoid spurious errors/unimplementeds).

Removed easy out for unimplemented features.

Add convert op for proper liveness in presence of uintptr
to/from unsafe.Pointer conversions.

Tweaked stack sizes to get a pass on windows;
1024 instead of 768 was observed to pass at least once.

Change-Id: Ida3800afcda67d529e3b1cf48ca4a3f0fa48b2c5
Reviewed-on: https://go-review.googlesource.com/16201
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
2015-10-23 19:32:57 +00:00
Matthew Dempsky
8ee0fd8623 runtime: replace is{plan9,solaris,windows} with GOOS tests
Change-Id: I27589395f547c5837dc7536a0ab5bc7cc23a4ff6
Reviewed-on: https://go-review.googlesource.com/10872
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-23 18:11:17 +00:00
Alex Brainman
6410e67a1e runtime: account for cpu affinity in windows NumCPU
Fixes #11671

Change-Id: Ide1f8d92637dad2a2faed391329f9b6001789b76
Reviewed-on: https://go-review.googlesource.com/14742
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-23 07:54:42 +00:00
Austin Clements
beedb1ec33 runtime: add pcvalue cache to improve stack scan speed
The cost of scanning large stacks is currently dominated by the time
spent looking up and decoding the pcvalue table. However, large stacks
are usually large not because they contain calls to many different
functions, but because they contain many calls to the same, small set
of recursive functions. Hence, walking large stacks tends to make the
same pcvalue queries many times.

Based on this observation, this commit adds a small, very simple, and
fast cache in front of pcvalue lookup. We thread this cache down from
operations that make many pcvalue calls, such as gentraceback, stack
scanning, and stack adjusting.

This simple cache works well because it has minimal overhead when it's
not effective. I also tried a hashed direct-map cache, CLOCK-based
replacement, round-robin replacement, and round-robin with lookups
disabled until there had been at least 16 probes, but none of these
approaches had obvious wins over the random replacement policy in this
commit.
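
A standalone sketch of such a cache (size and field names are
assumptions, not the runtime's exact layout):

    package main

    import (
        "fmt"
        "math/rand"
    )

    type cacheEnt struct {
        targetpc uintptr
        off      int32 // which pcvalue table
        val      int32
    }

    // pcvalueCache is a tiny cache with random replacement: lookups
    // are a short linear scan, so a miss costs almost nothing.
    type pcvalueCache struct {
        entries [16]cacheEnt
    }

    func (c *pcvalueCache) lookup(pc uintptr, off int32) (int32, bool) {
        for i := range c.entries {
            if e := &c.entries[i]; e.targetpc == pc && e.off == off {
                return e.val, true
            }
        }
        return 0, false
    }

    func (c *pcvalueCache) store(pc uintptr, off, val int32) {
        // Random replacement, per the text above.
        c.entries[rand.Intn(len(c.entries))] = cacheEnt{pc, off, val}
    }

    func main() {
        var c pcvalueCache
        c.store(0x1234, 0, 42)
        fmt.Println(c.lookup(0x1234, 0)) // 42 true
    }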

This nearly doubles the overall performance of the deep stack test
program from issue #10898:

name        old time/op  new time/op  delta
Issue10898   16.5s ±12%    9.2s ±12%  -44.37%  (p=0.008 n=5+5)

It's a very slight win on the garbage benchmark:

name              old time/op  new time/op  delta
XBenchGarbage-12  4.92ms ± 1%  4.89ms ± 1%  -0.75%  (p=0.000 n=18+19)

It's a wash (but doesn't harm performance) on the go1 benchmarks,
which don't have particularly deep stacks:

name                      old time/op    new time/op    delta
BinaryTree17-12              3.11s ± 2%     3.20s ± 3%  +2.83%  (p=0.000 n=17+20)
Fannkuch11-12                2.51s ± 1%     2.51s ± 1%  -0.22%  (p=0.034 n=19+18)
FmtFprintfEmpty-12          50.8ns ± 3%    50.6ns ± 2%    ~     (p=0.793 n=20+20)
FmtFprintfString-12          174ns ± 0%     174ns ± 1%  +0.17%  (p=0.048 n=15+20)
FmtFprintfInt-12             177ns ± 0%     165ns ± 1%  -6.99%  (p=0.000 n=17+19)
FmtFprintfIntInt-12          283ns ± 1%     284ns ± 0%  +0.22%  (p=0.000 n=18+15)
FmtFprintfPrefixedInt-12     243ns ± 1%     244ns ± 1%  +0.40%  (p=0.000 n=20+19)
FmtFprintfFloat-12           318ns ± 0%     319ns ± 0%  +0.27%  (p=0.001 n=19+20)
FmtManyArgs-12              1.12µs ± 0%    1.14µs ± 0%  +1.74%  (p=0.000 n=19+20)
GobDecode-12                8.69ms ± 0%    8.73ms ± 1%  +0.46%  (p=0.000 n=18+18)
GobEncode-12                6.64ms ± 1%    6.61ms ± 1%  -0.46%  (p=0.000 n=20+20)
Gzip-12                      323ms ± 2%     319ms ± 1%  -1.11%  (p=0.000 n=20+20)
Gunzip-12                   42.8ms ± 0%    42.9ms ± 0%    ~     (p=0.158 n=18+20)
HTTPClientServer-12         63.3µs ± 1%    63.1µs ± 1%  -0.35%  (p=0.011 n=20+20)
JSONEncode-12               16.9ms ± 1%    17.3ms ± 1%  +2.84%  (p=0.000 n=19+20)
JSONDecode-12               59.7ms ± 0%    58.5ms ± 0%  -2.05%  (p=0.000 n=19+17)
Mandelbrot200-12            3.92ms ± 0%    3.91ms ± 0%  -0.16%  (p=0.003 n=19+19)
GoParse-12                  3.79ms ± 2%    3.75ms ± 2%  -0.91%  (p=0.005 n=20+20)
RegexpMatchEasy0_32-12       102ns ± 1%     101ns ± 1%  -0.80%  (p=0.001 n=14+20)
RegexpMatchEasy0_1K-12       337ns ± 1%     346ns ± 1%  +2.90%  (p=0.000 n=20+19)
RegexpMatchEasy1_32-12      84.4ns ± 2%    84.3ns ± 2%    ~     (p=0.743 n=20+20)
RegexpMatchEasy1_1K-12       502ns ± 1%     505ns ± 0%  +0.64%  (p=0.000 n=20+20)
RegexpMatchMedium_32-12      133ns ± 1%     132ns ± 1%  -0.85%  (p=0.000 n=20+19)
RegexpMatchMedium_1K-12     40.1µs ± 1%    39.8µs ± 1%  -0.77%  (p=0.000 n=18+18)
RegexpMatchHard_32-12       2.08µs ± 1%    2.07µs ± 1%  -0.55%  (p=0.001 n=18+19)
RegexpMatchHard_1K-12       62.4µs ± 1%    62.0µs ± 1%  -0.74%  (p=0.000 n=19+19)
Revcomp-12                   545ms ± 2%     545ms ± 3%    ~     (p=0.771 n=19+20)
Template-12                 73.7ms ± 1%    72.0ms ± 0%  -2.33%  (p=0.000 n=20+18)
TimeParse-12                 358ns ± 1%     351ns ± 1%  -2.07%  (p=0.000 n=20+20)
TimeFormat-12                369ns ± 1%     356ns ± 0%  -3.53%  (p=0.000 n=20+18)
[Geo mean]                  63.5µs         63.2µs       -0.41%

name                      old speed      new speed      delta
GobDecode-12              88.3MB/s ± 0%  87.9MB/s ± 0%  -0.43%  (p=0.000 n=18+17)
GobEncode-12               116MB/s ± 1%   116MB/s ± 1%  +0.47%  (p=0.000 n=20+20)
Gzip-12                   60.2MB/s ± 2%  60.8MB/s ± 1%  +1.13%  (p=0.000 n=20+20)
Gunzip-12                  453MB/s ± 0%   453MB/s ± 0%    ~     (p=0.160 n=18+20)
JSONEncode-12              115MB/s ± 1%   112MB/s ± 1%  -2.76%  (p=0.000 n=19+20)
JSONDecode-12             32.5MB/s ± 0%  33.2MB/s ± 0%  +2.09%  (p=0.000 n=19+17)
GoParse-12                15.3MB/s ± 2%  15.4MB/s ± 2%  +0.92%  (p=0.004 n=20+20)
RegexpMatchEasy0_32-12     311MB/s ± 1%   314MB/s ± 1%  +0.78%  (p=0.000 n=15+19)
RegexpMatchEasy0_1K-12    3.04GB/s ± 1%  2.95GB/s ± 1%  -2.90%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12     379MB/s ± 2%   380MB/s ± 2%    ~     (p=0.779 n=20+20)
RegexpMatchEasy1_1K-12    2.04GB/s ± 1%  2.02GB/s ± 0%  -0.62%  (p=0.000 n=20+20)
RegexpMatchMedium_32-12   7.46MB/s ± 1%  7.53MB/s ± 1%  +0.86%  (p=0.000 n=20+19)
RegexpMatchMedium_1K-12   25.5MB/s ± 1%  25.7MB/s ± 1%  +0.78%  (p=0.000 n=18+18)
RegexpMatchHard_32-12     15.4MB/s ± 1%  15.5MB/s ± 1%  +0.62%  (p=0.000 n=19+19)
RegexpMatchHard_1K-12     16.4MB/s ± 1%  16.5MB/s ± 1%  +0.82%  (p=0.000 n=20+19)
Revcomp-12                 466MB/s ± 2%   466MB/s ± 3%    ~     (p=0.765 n=19+20)
Template-12               26.3MB/s ± 1%  27.0MB/s ± 0%  +2.38%  (p=0.000 n=20+18)
[Geo mean]                97.8MB/s       98.0MB/s       +0.23%

Change-Id: I281044ae0b24990ba46487cacbc1069493274bc4
Reviewed-on: https://go-review.googlesource.com/13614
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-22 17:48:13 +00:00
Matthew Dempsky
1652a2c316 runtime: add mSpanList type to represent lists of mspans
This CL introduces a new mSpanList type to replace the empty mspan
variables that were previously used as list heads.

To be type safe, the previous circular linked list data structure is
now a tail queue instead.  One complication of this is
mSpanList_Remove needs to know the list a span is being removed from,
but this appears to be computable in all circumstances.

As a temporary sanity check, mSpanList_Insert and mSpanList_InsertBack
record the list that an mspan has been inserted into so that
mSpanList_Remove can verify that the correct list was specified.

Whereas mspan is 112 bytes on amd64, mSpanList is only 16 bytes.  This
shrinks the size of mheap from 50216 bytes to 12584 bytes.
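
A sketch of the structure (fields pared down; the real mspan is far
larger, per the 112-byte figure above):

    package main

    import "fmt"

    type mspan struct {
        next, prev *mspan // tail-queue links
        npages     uintptr
    }

    // mSpanList is the 16-byte tail-queue head (two pointers on amd64)
    // that replaces a dummy mspan as the head of each list.
    type mSpanList struct {
        first, last *mspan
    }

    func (l *mSpanList) insertBack(s *mspan) {
        s.prev = l.last
        if l.last != nil {
            l.last.next = s
        } else {
            l.first = s
        }
        l.last = s
    }

    func main() {
        var l mSpanList
        l.insertBack(&mspan{npages: 1})
        l.insertBack(&mspan{npages: 2})
        fmt.Println(l.first.npages, l.last.npages) // 1 2
    }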

Change-Id: I8146364753dbc3b4ab120afbb9c7b8740653c216
Reviewed-on: https://go-review.googlesource.com/15906
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2015-10-22 17:12:06 +00:00
Aaron Jacobs
151f4ec95d runtime: remove unused printpc and printbyte functions
Change-Id: I40e338f6b445ca72055fc9bac0f09f0dca904e3a
Reviewed-on: https://go-review.googlesource.com/16191
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-22 15:02:44 +00:00
Matthew Dempsky
5a68eb9f25 runtime: prune some dead variables
Change-Id: I7a1c3079b433c4e30d72fb7d59f9594e0d5efe47
Reviewed-on: https://go-review.googlesource.com/16178
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-10-22 03:56:19 +00:00
Matthew Dempsky
29330c118d runtime: change fixalloc's chunk field to unsafe.Pointer
It's never used as a *byte anyway, so might as well just make it an
unsafe.Pointer instead.

Change-Id: I68ee418781ab2fc574eeac0498f2515b5561b7a8
Reviewed-on: https://go-review.googlesource.com/16175
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-22 01:14:23 +00:00
Shenghou Ma
1948aef6e3 runtime: fix typos
Change-Id: Iffc25fc80452baf090bf8ef15ab798cfaa120b8e
Reviewed-on: https://go-review.googlesource.com/16154
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-22 00:40:48 +00:00
Matthew Dempsky
58e3ae2fae runtime: split plan9 and solaris's m fields into new embedded mOS type
Reduces the size of m by ~8% on linux/amd64 (1040 bytes -> 960 bytes).

There are also windows-specific fields, but they're currently
referenced in OS-independent source files (but only when
GOOS=="windows").

Change-Id: I13e1471ff585ccced1271f74209f8ed6df14c202
Reviewed-on: https://go-review.googlesource.com/16173
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-22 00:04:52 +00:00
Matthew Dempsky
7df8ba136c runtime: replace unsafe pointer arithmetic with array indexing
Change-Id: I313819abebd4cda4a6c30fd0fd6f44cb1d09161f
Reviewed-on: https://go-review.googlesource.com/16167
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-21 23:22:20 +00:00
Matthew Dempsky
84afa1be76 runtime: make iface/eface handling more type safe
Change compiler-invoked interface functions to directly take
iface/eface parameters instead of fInterface/interface{} to avoid
needing to always convert.

For the handful of functions that legitimately need to take an
interface{} parameter, add efaceOf to type-safely convert *interface{}
to *eface.
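
A standalone sketch of the helper, assuming the usual two-word layout
of an empty interface:

    package main

    import (
        "fmt"
        "unsafe"
    )

    // eface mirrors the two-word runtime layout of interface{}:
    // a type pointer and a data pointer.
    type eface struct {
        _type unsafe.Pointer
        data  unsafe.Pointer
    }

    // efaceOf is the one well-typed place where *interface{} becomes
    // *eface, instead of scattering raw unsafe casts.
    func efaceOf(ep *interface{}) *eface {
        return (*eface)(unsafe.Pointer(ep))
    }

    func main() {
        x := interface{}("hi")
        fmt.Println(efaceOf(&x).data != nil) // true
    }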

Change-Id: I8928761a12fd3c771394f36adf93d3006a9fcf39
Reviewed-on: https://go-review.googlesource.com/16166
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-21 23:08:22 +00:00
Ian Lance Taylor
73f329f472 runtime, syscall: add calls to msan functions
Add explicit memory sanitizer instrumentation to the runtime and syscall
packages.  The compiler does not instrument the runtime package.  It
does instrument the syscall package, but we need to add a couple of
cases that it can't see.

Change-Id: I2d66073f713fe67e33a6720460d2bb8f72f31394
Reviewed-on: https://go-review.googlesource.com/16164
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-10-21 19:17:46 +00:00
Matthew Dempsky
c279250946 runtime: change functype's in and out fields to []*_type
Allows removing a few gratuitous unsafe.Pointer conversions and
parallels the type of reflect.funcType's in and out fields ([]*rtype).

Change-Id: Ie5ca230a94407301a854dfd8782a3180d5054bc4
Reviewed-on: https://go-review.googlesource.com/16163
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-21 18:37:45 +00:00
Ian Lance Taylor
5174df9087 runtime, runtime/msan: add msan runtime support
These are the runtime support functions for letting Go code interoperate
with the C/C++ memory sanitizer.  Calls to msanread/msanwrite are now
inserted by the compiler with the -msan option.  Calls to
msanmalloc/msanfree will be from other runtime functions in a subsequent
CL.

Change-Id: I64fb061b38cc6519153face242eccd291c07d1f2
Reviewed-on: https://go-review.googlesource.com/16162
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-21 17:50:39 +00:00
Austin Clements
a42f668654 runtime: eliminate unused _GCstw phase
Change-Id: Ie94cd17e1975fdaaa418fa6a7b2d3b164fedc135
Reviewed-on: https://go-review.googlesource.com/16057
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-21 16:26:34 +00:00
Austin Clements
28f458ce5b runtime: eliminate unnecessary ragged barrier
The ragged barrier after entering the concurrent mark phase is
vestigial. This used to be the point where we enabled write barriers,
so it was necessary to synchronize all Ps to ensure write barriers
were enabled before any marking occurred. However, we've long since
switched to enabling write barriers during the concurrent scan phase,
so the start-the-world at the beginning of the concurrent scan phase
ensures that all Ps have enabled the write barrier.

Hence, we can eliminate the old "install write barrier" phase.

Fixes #11971.

Change-Id: I8cdcb84b5525cef19927d51ea11ba0a4db991ea8
Reviewed-on: https://go-review.googlesource.com/16044
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-21 16:26:25 +00:00
Matthew Dempsky
d4a7ea1b71 runtime: add stringStructOf helper function
Instead of open-coding conversions from *string to unsafe.Pointer then
to *stringStruct, add a helper function to add some type safety.
Bonus: This caught two **string values being converted to
*stringStruct in heapdump.go.

While here, get rid of the redundant _string type, but add in a
stringStructDWARF type used for generating DWARF debug info.
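
A standalone sketch of the helper and the bug class it catches (a
**string argument no longer type-checks):

    package main

    import (
        "fmt"
        "unsafe"
    )

    // stringStruct assumes the usual runtime layout of a string
    // header: a data pointer and a length.
    type stringStruct struct {
        str unsafe.Pointer
        len int
    }

    // stringStructOf takes *string, so passing a **string (the kind of
    // mistake found in heapdump.go) is rejected by the compiler.
    func stringStructOf(sp *string) *stringStruct {
        return (*stringStruct)(unsafe.Pointer(sp))
    }

    func main() {
        s := "hello"
        fmt.Println(stringStructOf(&s).len) // 5
    }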

Change-Id: I8882f8cca66ac45190270f82019a5d85db023bd2
Reviewed-on: https://go-review.googlesource.com/16131
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-20 23:13:27 +00:00
Aaron Jacobs
ef986fa3fc runtime: change odd 'print1_write' file names
The '1' part is left over from the C conversion, but no longer makes
sense given that print1.go no longer exists.

Change-Id: Iec171251370d740f234afdbd6fb1a4009fde6696
Reviewed-on: https://go-review.googlesource.com/16036
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-20 23:03:06 +00:00
Hyang-Ah Hana Kim
30ee5919bd runtime: add syscalls needed for android/amd64 logging.
access, connect, socket.

In Android-L, logging is done by writing the log messages to the logd
process through a unix domain socket.

Also, changed the arg types of those syscall stubs to match linux
programming APIs.

For golang/go#10743

Change-Id: I66368a03316e253561e9e76aadd180c2cd2e48f3
Reviewed-on: https://go-review.googlesource.com/15993
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-10-20 16:56:58 +00:00
Aaron Jacobs
3bc0601742 runtime: rename _func.frame to make it clear it's deprecated and unused.
When I saw that it was labelled "legacy", I went looking for users of it
to see how it was still used. But there aren't any. Save the next person
the trouble.

Change-Id: I921dd6c57b60331c9816542272555153ac133c02
Reviewed-on: https://go-review.googlesource.com/16035
Reviewed-by: Dave Cheney <dave@cheney.net>
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-20 03:16:09 +00:00
Michael Hudson-Doyle
c5856cfdb6 runtime: tweaks to allow -buildmode=shared to work
Building Go shared libraries requires that all functions that have declarations
without bodies have implementations and vice versa, so remove the
implementation of call16 and add a stub implementation of sigreturn.

Change-Id: I4d5a30c8637a5da7991054e151a536611d5bea46
Reviewed-on: https://go-review.googlesource.com/15966
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-19 21:23:36 +00:00
Keith Randall
7c4fbb650c [dev.ssa] Merge remote-tracking branch 'origin/master' into mergebranch
The only major fixup is that duffzero changed from
8-byte writes to 16-byte writes.

Change-Id: I1762b74ce67a8e4b81c11568027cdb3572f7f87c
2015-10-19 14:00:03 -07:00
Austin Clements
3cd56b4dca runtime: combine gcResetGState and gcResetMarkState
These functions are always called together and perform logically
related state resets, so combine them into just gcResetMarkState.

Fixes #11427.

Change-Id: I06c17ef65f66186494887a767b3993126955b5fe
Reviewed-on: https://go-review.googlesource.com/16041
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-19 18:38:07 +00:00
Austin Clements
b0d5e5c500 runtime: consolidate gcResetGState calls
Currently gcResetGState is called by func gcscan_m for concurrent GC
and directly by func gc for STW GC. Simplify this by consolidating
these two calls into one call by func gc above where it splits for
concurrent and STW GC.

As a consequence, gcResetGState and gcResetMarkState are always called
together, so the next commit will consolidate these.

Change-Id: Ib62d404c7b32b28f7d3080d26ecf3966cbc4aca0
Reviewed-on: https://go-review.googlesource.com/16040
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-19 18:38:00 +00:00
Austin Clements
feb92a8e8c runtime: remove work.partial queue
This work queue is no longer used (there are many reads of
work.partial, but the only write is in putpartial, which is never
called).

Fixes #11922.

Change-Id: I08b76c0c02a0867a9cdcb94783e1f7629d44249a
Reviewed-on: https://go-review.googlesource.com/15892
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-19 18:37:54 +00:00
Aaron Jacobs
5d88323fa6 runtime: remove a redundant nil pointer check.
It appears this was made possible by commit 89f185f; before that, g was
not dereferenced above.

Change-Id: I70bc571d924b36351392fd4c13d681e938cfb573
Reviewed-on: https://go-review.googlesource.com/16033
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-10-19 09:58:15 +00:00
Nodir Turakulov
386fa03609 runtime: merge proc1.go -> proc.go
from proc1.go to proc.go:
* prepend header comment explaining "Goroutine scheduler"
* insert m0 and g0 var defs after the comment
* append the rest

Updates #12952

Change-Id: I35ee9ae3287675cde0c1b6aeaca0a460393f2354
Reviewed-on: https://go-review.googlesource.com/16024
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-19 01:11:00 +00:00
Nodir Turakulov
243757576d runtime: merge race1.go -> race.go
* append contents of race1.go to race.go
* delete "Implementation of the race detector API." comment
  from race1.go

Updates #12952

Change-Id: Ibdd9c4dc79a63c3bef69eade9525578063c86c1c
Reviewed-on: https://go-review.googlesource.com/16023
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-18 23:48:22 +00:00
Michael Hudson-Doyle
6deb3c0619 runtime, runtime/cgo: conform to PIC register use rules in ppc64 asm
PIC code on ppc64le uses R2 as a TOC pointer and when calling a function
through a function pointer must ensure the function pointer is in R12.  These
rules are easy enough to follow unconditionally in our assembly, so do that.

Change-Id: Icfc4e47ae5dfbe15f581cbdd785cdeed6e40bc32
Reviewed-on: https://go-review.googlesource.com/15526
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-18 23:36:39 +00:00
Michael Hudson-Doyle
b8f8969fbd reflect, runtime, runtime/cgo: use ppc64 asm constant for fixed frame size
Shared libraries on ppc64le will require a larger minimum stack frame (because
the ABI mandates that the TOC pointer is available at 24(R1)). Part 3 of that
is using a #define in the ppc64 assembly to refer to the size of the fixed
part of the stack (finding all these took me about a week!).

Change-Id: I50f22fe1c47af1ec59da1bd7ea8f84a4750df9b7
Reviewed-on: https://go-review.googlesource.com/15525
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-18 23:15:26 +00:00
Michael Hudson-Doyle
a4855812e2 runtime: add a constant for the smallest possible stack frame
Shared libraries on ppc64le will require a larger minimum stack frame (because
the ABI mandates that the TOC pointer is available at 24(R1)). So to prepare
for this, make a constant for the fixed part of a stack and use that where
necessary.

Change-Id: I447949f4d725003bb82e7d2cf7991c1bca5aa887
Reviewed-on: https://go-review.googlesource.com/15523
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-18 22:14:00 +00:00
Michael Hudson-Doyle
45c06b27a4 cmd/internal/obj, runtime: add NOFRAME flag to suppress stack frame set up on ppc64x
Replace the confusing game where a frame size of $-8 would suppress the
implicit setting up of a stack frame with a nice explicit flag.

The code to set up the function prologue is still a little confusing but better
than it was.

Change-Id: I1d49278ff42c6bc734ebfb079998b32bc53f8d9a
Reviewed-on: https://go-review.googlesource.com/15670
Reviewed-by: Minux Ma <minux@golang.org>
2015-10-18 22:13:30 +00:00
David Chase
57670ad8b2 [dev.ssa] cmd/compile: fill remaining SSA gaps
Changed racewalk/race detector to use FP in a more
sensible way.

Relaxed checks for CONVNOP when race detecting.

Modified tighten to ensure that GetClosurePtr cannot float
out of entry block (turns out this cannot be relaxed, DX is
sometimes stomped by other code accompanying race detection).

Added case for addr(CONVNOP)

Modified addr to take "bounded" flag to suppress nilchecks
where it is set (usually, by race detector).

Cannot leave unimplemented-complainer enabled because it
turns out we are optimistically running SSA on every platform.

Change-Id: Ife021654ee4065b3ffac62326d09b4b317b9f2e0
Reviewed-on: https://go-review.googlesource.com/15710
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-18 13:30:54 +00:00
Nodir Turakulov
db2e73faeb runtime: merge stack{1,2}.go -> stack.go
* rename stack1.go -> stack.go
* prepend contents of stack2.go to stack.go

Updates #12952

Change-Id: I60d409af37162a5a7596c678dfebc2cea89564ff
Reviewed-on: https://go-review.googlesource.com/16008
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-17 20:52:22 +00:00
Matthew Dempsky
4562784bae runtime: remove some unnecessary unsafe code in mfixalloc
Change-Id: Ie9ea4af4315a4d0eb69d0569726bb3eca2b397af
Reviewed-on: https://go-review.googlesource.com/16005
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-17 00:26:26 +00:00
Nodir Turakulov
9358f7fa61 runtime: merge panic1.go into panic.go
A TODO to merge is removed from panic1.go.
The rest is appended to panic.go.

Updates #12952

Change-Id: Ied4382a455abc20bc2938e34d031802e6b4baf8b
Reviewed-on: https://go-review.googlesource.com/15905
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-16 15:51:49 +00:00
Nodir Turakulov
d72d299f3e runtime: rename print1.go -> print.go
It seems that it was called print1.go mistakenly: print.go was deleted
in the same commit:
https://go.googlesource.com/go/+/597b266eafe7d63e9be8da1c1b4813bd2998a11c

Updates #12952

Change-Id: I371e59d6cebc8824857df3f3ee89101147dfffc0
Reviewed-on: https://go-review.googlesource.com/15950
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-16 15:51:30 +00:00
Nodir Turakulov
881b0e7880 runtime: merge string1.go into string.go
string1.go contents are appended to string.go as is

Updates #12952

Change-Id: I30083ba7fdd362d4421e964a494c76ca865bedc2
Reviewed-on: https://go-review.googlesource.com/15951
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-16 15:46:02 +00:00
Michael Hudson-Doyle
42c7929c04 runtime, runtime/debug: access unexported runtime functions with //go:linkname, not assembly stubs
Change-Id: I88f80f5914d6e4c179f3d28aa59fc29b7ef0cc66
Reviewed-on: https://go-review.googlesource.com/15960
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-16 09:14:25 +00:00
Michael Hudson-Doyle
0b8d583320 runtime, os/signal: use //go:linkname instead of assembly stubs to get access to runtime functions
os/signal depends on a few unexported runtime functions. This removes the
assembly stubs it used to get access to these in favour of using
//go:linkname in runtime to make the functions accessible to os/signal.

This is motivated by ppc64le shared libraries, where you cannot BR to a symbol
defined in a shared library (only BL), but it seems like an improvement anyway.

Change-Id: I09361203ce38070bd3f132f6dc5ac212f2dc6f58
Reviewed-on: https://go-review.googlesource.com/15871
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-10-16 07:11:04 +00:00
Matthew Dempsky
4c2465d47d runtime: use unsafe.Pointer(x) instead of (unsafe.Pointer)(x)
This isn't C anymore.  No binary change to pkg/linux_amd64/runtime.a.

Change-Id: I24d66b0f5ac888f432b874aac684b1395e7c8345
Reviewed-on: https://go-review.googlesource.com/15903
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-15 21:48:37 +00:00
Raul Silvera
1d765b77a0 runtime: Reduce testing for fastlog2 implementation
The current fastlog2 testing checks all 64M values in the domain of
interest, which is too much for platforms with no native floating point.

Reduce testing under testing.Short() to speed up builds for those platforms.

Related to #12620

Change-Id: Ie5dcd408724ba91c3b3fcf9ba0dddedb34706cd1
Reviewed-on: https://go-review.googlesource.com/15830
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Joel Sing <jsing@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-14 04:54:33 +00:00
Ian Lance Taylor
2961cab965 runtime: remove _Kind constants
The duplication of _Kind and kind constants is a legacy of the
conversion from C.

Change-Id: I368b35a41f215cf91ac4b09dac59699edb414a0e
Reviewed-on: https://go-review.googlesource.com/15800
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-13 00:15:36 +00:00
Austin Clements
65aa2da617 runtime: assist before allocating
Currently, when the mutator allocates, the runtime first allocates the
memory and then, if that G has done "enough" allocation, the runtime
checks whether the G has assist debt to pay off and, if so, pays it
off. This approach leads to under-assisting, where a G can allocate a
large region (or many small regions) before paying for it, or can even
exit with outstanding debt.

This commit flips this around so that a G always acquires enough
credit for an allocation before it can perform that allocation. We
continue to amortize the cost of assists by requiring that they
over-assist when triggered to build up credit for many allocations.

Fixes #11967.

Change-Id: Idac9f11133b328535667674d837be72c23ebd899
Reviewed-on: https://go-review.googlesource.com/15409
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-10-09 19:39:03 +00:00
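
A minimal sketch of the flipped ordering described above. All names here
(gSketch, assistCredit, gcAssistSketch, overAssistBytes) are illustrative,
not the runtime's real identifiers:

    // Pay for an allocation before performing it, so a G can never run
    // "in the red" or exit with outstanding debt.
    type gSketch struct {
        assistCredit int64 // bytes of allocation already paid for
    }

    const overAssistBytes = 64 << 10 // build extra credit to amortize assists

    // gcAssistSketch stands in for doing enough scan work to earn
    // `need` bytes of allocation credit.
    func gcAssistSketch(g *gSketch, need int64) { g.assistCredit += need }

    func mallocSketch(g *gSketch, size int64) {
        if g.assistCredit < size {
            // Over-assist: earn more credit than this allocation needs
            // so that most future allocations skip the assist entirely.
            gcAssistSketch(g, size-g.assistCredit+overAssistBytes)
        }
        g.assistCredit -= size
        // ... the actual allocation happens here ...
    }
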
Austin Clements
89c341c5e9 runtime: directly track GC assist balance
Currently we track the per-G GC assist balance as two monotonically
increasing values: the bytes allocated by the G this cycle (gcalloc)
and the scan work performed by the G this cycle (gcscanwork). The
assist balance is hence assistRatio*gcalloc - gcscanwork.

This works, but has two important downsides:

1) It requires floating-point math to figure out if a G is in debt or
   not. This makes it inappropriate to check for assist debt in the
   hot path of mallocgc, so we only do this when a G allocates a new
   span. As a result, Gs can operate "in the red", leading to
   under-assist and extended GC cycle length.

2) Revising the assist ratio during a GC cycle can lead to an "assist
   burst". If you think of plotting the scan work performed versus
   heap size, the assist ratio controls the slope of this line.
   However, in the current system, the target line always passes
   through 0 at the heap size that triggered GC, so if the runtime
   increases the assist ratio, there has to be a potentially large
   assist to jump from the current amount of scan work up to the new
   target scan work for the current heap size.

This commit replaces this approach with directly tracking the GC
assist balance in terms of allocation credit bytes. Allocating N bytes
simply decreases this by N and assisting raises it by the amount of
scan work performed divided by the assist ratio (to get back to
bytes).

This will make it cheap to figure out if a G is in debt, which will
let us efficiently check if an assist is necessary *before* performing
an allocation and hence keep Gs "in the black".

This also fixes assist bursts because the assist ratio is now in terms
of *remaining* work, rather than work from the beginning of the GC
cycle. Hence, the plot of scan work versus heap size becomes
continuous: we can revise the slope, but this slope always starts from
where we are right now, rather than where we were at the beginning of
the cycle.

Change-Id: Ia821c5f07f8a433e8da7f195b52adfedd58bdf2c
Reviewed-on: https://go-review.googlesource.com/15408
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:38:52 +00:00
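
In rough terms, the bookkeeping change looks like the sketch below (field
and method names are illustrative):

    // Old scheme: debt derived on demand with floating-point math:
    //   debt = assistRatio*float64(gcalloc) - float64(gcscanwork)
    //
    // New scheme: one signed counter; negative means "in debt".
    type gAssistSketch struct {
        gcAssistBytes int64
    }

    // Allocating N bytes spends credit directly...
    func (g *gAssistSketch) alloc(n int64) { g.gcAssistBytes -= n }

    // ...and assisting converts scan work back into bytes of credit
    // using the current, revisable ratio.
    func (g *gAssistSketch) assist(scanWork int64, bytesPerScanWork float64) {
        g.gcAssistBytes += int64(float64(scanWork) * bytesPerScanWork)
    }

Because the debt check becomes a plain integer comparison, it is cheap
enough for the hot path of mallocgc.
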
Austin Clements
9e77c89868 runtime: ensure minimum heap distance via heap goal
Currently we ensure a minimum heap distance of 1MB when computing the
assist ratio. Rather than enforcing this minimum on the heap distance,
it makes more sense to enforce that the heap goal itself is at least
1MB over the live heap size at the beginning of GC. Currently the two
approaches are semantically equivalent, but this will let us switch to
basing the assist ratio on current heap distance rather than the
initial heap distance, since we can't enforce this minimum on the
current heap distance (the GC may never finish because the goal posts
will always be 1MB away).

Change-Id: I0027b1c26a41a0152b01e5b67bdb1140d43ee903
Reviewed-on: https://go-review.googlesource.com/15604
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:38:39 +00:00
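
A sketch of the clamp, with illustrative names:

    const minHeapDistance = 1 << 20 // 1MB

    // heapGoalSketch enforces the minimum on the goal itself, so the
    // goal posts stay fixed while the remaining distance shrinks to 0.
    func heapGoalSketch(liveHeap, computedGoal uint64) uint64 {
        if computedGoal < liveHeap+minHeapDistance {
            return liveHeap + minHeapDistance
        }
        return computedGoal
    }
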
Austin Clements
8e8219deb5 runtime: update gcController.scanWork regularly
Currently, gcController.scanWork is updated as lazily as possible
since it is only read at the end of the GC cycle. We're about to read
it during the GC cycle to improve the assist ratio revisions, so
modify gcDrain* to regularly flush to gcController.scanWork in much
the same way as we regularly flush to gcController.bgScanCredit.

One consequence of this is that it's difficult to keep gcw.scanWork
monotonic, so we give up on that and simply return the amount of scan
work done by gcDrainN rather than calculating it in the caller.

Change-Id: I7b50acdc39602f843eed0b5c6d2dacd7e762b81d
Reviewed-on: https://go-review.googlesource.com/15407
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:38:29 +00:00
Austin Clements
c18b163c15 runtime: control background scan credit flushing with flag
Currently callers of gcDrain control whether it flushes scan work
credit to gcController.bgScanCredit by passing a value other than -1
for the flush threshold. Shortly we're going to make this always flush
scan work to gcController.scanWork and optionally also flush scan work
to gcController.bgScanCredit. This will be much easier if the flush
threshold is simply a constant (which it is in practice) and callers
merely control whether or not the flush includes the background
credit. Hence, replace the flush threshold argument with a flag.

Change-Id: Ia27db17de8a3f1e462a5d7137d4b5dc72f99a04e
Reviewed-on: https://go-review.googlesource.com/15406
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:38:16 +00:00
Austin Clements
9b3cdaf0a3 runtime: consolidate gcDrain and gcDrainUntilPreempt
These functions were nearly identical. Consolidate them by adding a
flags argument. In addition to cleaning up this code, this makes
further changes that affect both functions easier.

Change-Id: I6ec5c947603bbbd3ff4040113b2fbc240e99745f
Reviewed-on: https://go-review.googlesource.com/15405
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:38:03 +00:00
Austin Clements
39ed682206 runtime: explain why continuous assist revising is necessary
Change-Id: I950af8d80433b3ae8a1da0aa7a8d2d0b295dd313
Reviewed-on: https://go-review.googlesource.com/15404
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:37:53 +00:00
Austin Clements
3271250ec4 runtime: fix comment for gcAssistAlloc
Change-Id: I312e56e95d8ef8ae036d16444ab1e2df1285845d
Reviewed-on: https://go-review.googlesource.com/15403
Reviewed-by: Russ Cox <rsc@golang.org>
2015-10-09 19:37:41 +00:00
Austin Clements
3e57b17dc3 runtime: fix comment for assistRatio
The comment for assistRatio claimed it to be the reciprocal of what it
actually is.

Change-Id: If7f9bb853d75d0097facff3aa6704b224d9108b8
Reviewed-on: https://go-review.googlesource.com/15402
Reviewed-by: Russ Cox <rsc@golang.org>
2015-10-09 19:37:23 +00:00
Nodir Turakulov
3be4d59820 runtime: remove redundant type cast
(*T)(unsafe.Pointer(&t)) === &t
for t of type T

Change-Id: I43c1aa436747dfa0bf4cb0d615da1647633f9536
Reviewed-on: https://go-review.googlesource.com/15656
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-09 18:48:36 +00:00
Keith Randall
91059de095 runtime: make aeshash more DOS-proof
Improve the aeshash implementation to make it harder to engineer collisions.

1) Scramble the seed before xoring with the input string.  This
   makes it harder to cancel known portions of the seed (like the size)
   because it mixes the per-table seed into those other parts.

2) Use table-dependent seeds for all stripes when hashing >16 byte strings.

For small strings this change uses 4 aesenc ops instead of 3, so it
is somewhat slower.  The first two can run in parallel, though, so
it isn't 33% slower.

benchmark                            old ns/op     new ns/op     delta
BenchmarkHash64-12                   10.2          11.2          +9.80%
BenchmarkHash16-12                   5.71          6.13          +7.36%
BenchmarkHash5-12                    6.64          7.01          +5.57%
BenchmarkHashBytesSpeed-12           30.3          31.9          +5.28%
BenchmarkHash65536-12                2785          2882          +3.48%
BenchmarkHash1024-12                 53.6          55.4          +3.36%
BenchmarkHashStringArraySpeed-12     54.9          56.5          +2.91%
BenchmarkHashStringSpeed-12          18.7          19.2          +2.67%
BenchmarkHashInt32Speed-12           14.8          15.1          +2.03%
BenchmarkHashInt64Speed-12           14.5          14.5          +0.00%

Change-Id: I59ea124b5cb92b1c7e8584008257347f9049996c
Reviewed-on: https://go-review.googlesource.com/14124
Reviewed-by: jcd . <jcd@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-08 16:43:03 +00:00
Michael Hudson-Doyle
168a51b3a1 runtime: adjust the arm64 memmove and memclr to operate by word as much as they can
Not only is this an obvious optimization:

benchmark                           old MB/s     new MB/s     speedup
BenchmarkMemmove1-4                 35.35        29.65        0.84x
BenchmarkMemmove2-4                 63.78        52.53        0.82x
BenchmarkMemmove3-4                 89.72        73.96        0.82x
BenchmarkMemmove4-4                 109.94       95.73        0.87x
BenchmarkMemmove5-4                 127.60       112.80       0.88x
BenchmarkMemmove6-4                 143.59       126.67       0.88x
BenchmarkMemmove7-4                 157.90       138.92       0.88x
BenchmarkMemmove8-4                 167.18       231.81       1.39x
BenchmarkMemmove9-4                 175.23       252.07       1.44x
BenchmarkMemmove10-4                165.68       261.10       1.58x
BenchmarkMemmove11-4                174.43       263.31       1.51x
BenchmarkMemmove12-4                180.76       267.56       1.48x
BenchmarkMemmove13-4                189.06       284.93       1.51x
BenchmarkMemmove14-4                186.31       284.72       1.53x
BenchmarkMemmove15-4                195.75       281.62       1.44x
BenchmarkMemmove16-4                202.96       439.23       2.16x
BenchmarkMemmove32-4                264.77       775.77       2.93x
BenchmarkMemmove64-4                306.81       1209.64      3.94x
BenchmarkMemmove128-4               357.03       1515.41      4.24x
BenchmarkMemmove256-4               380.77       2066.01      5.43x
BenchmarkMemmove512-4               385.05       2556.45      6.64x
BenchmarkMemmove1024-4              381.23       2804.10      7.36x
BenchmarkMemmove2048-4              379.06       2814.83      7.43x
BenchmarkMemmove4096-4              387.43       3064.96      7.91x
BenchmarkMemmoveUnaligned1-4        28.91        25.40        0.88x
BenchmarkMemmoveUnaligned2-4        56.13        47.56        0.85x
BenchmarkMemmoveUnaligned3-4        74.32        69.31        0.93x
BenchmarkMemmoveUnaligned4-4        97.02        83.58        0.86x
BenchmarkMemmoveUnaligned5-4        110.17       103.62       0.94x
BenchmarkMemmoveUnaligned6-4        124.95       113.26       0.91x
BenchmarkMemmoveUnaligned7-4        142.37       130.82       0.92x
BenchmarkMemmoveUnaligned8-4        151.20       205.64       1.36x
BenchmarkMemmoveUnaligned9-4        166.97       215.42       1.29x
BenchmarkMemmoveUnaligned10-4       148.49       221.22       1.49x
BenchmarkMemmoveUnaligned11-4       159.47       239.57       1.50x
BenchmarkMemmoveUnaligned12-4       163.52       247.32       1.51x
BenchmarkMemmoveUnaligned13-4       167.55       256.54       1.53x
BenchmarkMemmoveUnaligned14-4       175.12       251.03       1.43x
BenchmarkMemmoveUnaligned15-4       192.10       267.13       1.39x
BenchmarkMemmoveUnaligned16-4       190.76       378.87       1.99x
BenchmarkMemmoveUnaligned32-4       259.02       562.98       2.17x
BenchmarkMemmoveUnaligned64-4       317.72       842.44       2.65x
BenchmarkMemmoveUnaligned128-4      355.43       1274.49      3.59x
BenchmarkMemmoveUnaligned256-4      378.17       1815.74      4.80x
BenchmarkMemmoveUnaligned512-4      362.15       2180.81      6.02x
BenchmarkMemmoveUnaligned1024-4     376.07       2453.58      6.52x
BenchmarkMemmoveUnaligned2048-4     381.66       2568.32      6.73x
BenchmarkMemmoveUnaligned4096-4     398.51       2669.36      6.70x
BenchmarkMemclr5-4                  113.83       107.93       0.95x
BenchmarkMemclr16-4                 223.84       389.63       1.74x
BenchmarkMemclr64-4                 421.99       1209.58      2.87x
BenchmarkMemclr256-4                525.94       2411.58      4.59x
BenchmarkMemclr4096-4               581.66       4372.20      7.52x
BenchmarkMemclr65536-4              565.84       4747.48      8.39x
BenchmarkGoMemclr5-4                194.63       160.31       0.82x
BenchmarkGoMemclr16-4               295.30       630.07       2.13x
BenchmarkGoMemclr64-4               480.24       1884.03      3.92x
BenchmarkGoMemclr256-4              540.23       2926.49      5.42x

but it turns out that it's necessary to avoid the GC seeing partially written
pointers.

It's of course possible to be more sophisticated (using ldp/stp to move 16
bytes at a time in the core loop and unrolling the tail copying loops being
the obvious ideas) but I wanted something simple and (reasonably) obviously
correct.

Fixes #12552

Change-Id: Iaeaf8a812cd06f4747ba2f792de1ded738890735
Reviewed-on: https://go-review.googlesource.com/14813
Reviewed-by: Austin Clements <austin@google.com>
2015-10-08 07:49:35 +00:00
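
The GC constraint is the key point: a pointer-sized slot must be updated
with a single aligned store. A toy illustration in Go (not the assembly,
which also handles unaligned heads and byte tails):

    // copyPointerWords copies 8-byte words with one store each, so a
    // concurrent GC scan can never observe a half-written pointer, as
    // it could if the same slot were copied one byte at a time.
    func copyPointerWords(dst, src []uint64) {
        for i := range src {
            dst[i] = src[i]
        }
    }
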
Michael Hudson-Doyle
a5cb76243a cmd/internal/obj, cmd/link, runtime: lots of TLS cleanup
It's particularly nice to get rid of the android special cases in the linker.

Change-Id: I516363af7ce8a6b2f196fe49cb8887ac787a6dad
Reviewed-on: https://go-review.googlesource.com/14197
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-08 00:21:30 +00:00
Raul Silvera
27ee719fb3 pprof: improve sampling for heap profiling
The current heap sampling introduces some bias that interferes
with unsampling, producing unexpected heap profiles.
The solution is to use a Poisson process to generate the
sampling points, using the formulas described at
https://en.wikipedia.org/wiki/Poisson_process

This fixes #12620

Change-Id: If2400809ed3c41de504dd6cff06be14e476ff96c
Reviewed-on: https://go-review.googlesource.com/14590
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-05 08:15:09 +00:00
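
A minimal sketch of the idea: draw sampling gaps from an exponential
distribution (the inter-arrival law of a Poisson process) instead of
using a fixed stride. The function name and mean parameter are
illustrative:

    package main

    import (
        "fmt"
        "math"
        "math/rand"
    )

    // nextSampleGap returns how many bytes to allocate before taking
    // the next heap profile sample, exponentially distributed with the
    // given mean. Memoryless gaps make every byte equally likely to be
    // sampled, which is what lets profiles be unsampled without bias.
    func nextSampleGap(meanBytes float64) int64 {
        u := rand.Float64() // uniform in [0, 1)
        return int64(-math.Log(1-u) * meanBytes)
    }

    func main() {
        fmt.Println(nextSampleGap(512 * 1024))
    }
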
Austin Clements
9f6df6c940 runtime: use 4 byte writes in amd64p32 memmove/memclr
Currently, amd64p32's memmove and memclr use 8 byte writes as much as
possible and 1 byte writes for the tail of the object. However, if an
object ends with a 4 byte pointer at an 8 byte aligned offset, this
may copy/zero the pointer field one byte at a time, allowing the
garbage collector to observe a partially copied pointer.

Fix this by using 4 byte writes instead of 8 byte writes.

Updates #12552.

Change-Id: I13324fd05756fb25ae57e812e836f0a975b5595c
Reviewed-on: https://go-review.googlesource.com/15370
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-02 22:49:15 +00:00
Austin Clements
44078a3228 runtime: adjust huge page flags only on huge page granularity
This fixes an issue where the runtime panics with "out of memory" or
"cannot allocate memory" even though there's ample memory by reducing
the number of memory mappings created by the memory allocator.

Commit 7e1b61c worked around issue #8832 where Linux's transparent
huge page support could dramatically increase the RSS of a Go process
by setting the MADV_NOHUGEPAGE flag on any regions of pages released
to the OS with MADV_DONTNEED. This had the side effect of also
increasing the number of VMAs (memory mappings) in a Go address space
because a separate VMA is needed for every region of the virtual
address space with different flags. Unfortunately, by default, Linux
limits the number of VMAs in an address space to 65530, and a large
heap can quickly reach this limit when the runtime starts scavenging
memory.

This commit dramatically reduces the number of VMAs. It does this
primarily by only adjusting the huge page flag at huge page
granularity. With this change, on amd64, even a pessimal heap that
alternates between MADV_NOHUGEPAGE and MADV_HUGEPAGE must grow to 128GB
before hitting the VMA limit. Because of this rounding to huge page
granularity, this change is also careful to leave large used and
unused regions huge page-enabled.

This change reduces the maximum number of VMAs during the runtime
benchmarks with GODEBUG=scavenge=1 from 692 to 49.

Fixes #12233.

Change-Id: Ic397776d042f20d53783a1cacf122e2e2db00584
Reviewed-on: https://go-review.googlesource.com/15191
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-02 20:20:43 +00:00
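
A sketch of the rounding, assuming a 2MB huge page size (the helper is
illustrative):

    const hugePageSize = 2 << 20

    // roundToHugePages shrinks [start, end) to the whole huge pages it
    // covers, so flag changes never split a VMA mid-huge-page. The
    // result may be empty if the range spans no complete huge page.
    func roundToHugePages(start, end uintptr) (uintptr, uintptr) {
        start = (start + hugePageSize - 1) &^ (hugePageSize - 1) // round up
        end &^= hugePageSize - 1                                 // round down
        if start > end {
            return 0, 0
        }
        return start, end
    }
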
Austin Clements
9a31d38f65 runtime: remove sweep wait loop in finishsweep_m
In general, finishsweep_m must block until any spans that are
concurrently being swept have been swept. It accomplishes this by
looping over all spans, which, as in the previous commit, takes
~1ms/heap GB. Unfortunately, we do this during the STW sweep
termination phase, so multi-gigabyte heaps can push our STW time past
10ms.

However, there's no need to do this wait if the world is stopped
because, in effect, stopping the world already had to wait for
anything that was sweeping (and if it didn't, the wait in
finishsweep_m would deadlock). Hence, we can simply skip this loop if
the world is stopped, such as during sweep termination. In fact,
currently all calls to finishsweep_m are STW, but this hasn't always
been the case and may not be the case in the future, so we keep the
logic around.

For 24GB heaps, this reduces max pause time by 75% relative to tip and
by 90% relative to Go 1.5. Notably, all pauses are now well under
10ms. Here are the results for the garbage benchmark:

               ------------- max pause ------------
Heap   Procs   after change   before change   1.5.1
24GB     12        3.8ms          16ms         37ms
24GB      4        3.7ms          16ms         37ms
 4GB      4        3.7ms           3ms        6.9ms

In the 4GB/4P case, it seems the "before change" run got lucky: the
max went up, but the 99%ile pause time went down from 3ms to 2.04ms.

Change-Id: Ica22189559f231d408ef2815019c9dbb5f38bf31
Reviewed-on: https://go-review.googlesource.com/15071
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-02 19:56:01 +00:00
Austin Clements
dac220b0a9 runtime: remove in-use page count loop from STW
In order to compute the sweep ratio, the runtime needs to know how
many pages belong to spans in state _MSpanInUse. Currently it finds
this out by looping over all spans during mark termination. However,
this takes ~1ms/heap GB, so multi-gigabyte heaps can quickly push our
STW time past 10ms.

Replace the loop with an actively maintained count of in-use pages.

For multi-gigabyte heaps, this reduces max mark termination pause time
by 75%–90% relative to tip and by 85%–95% relative to Go 1.5.1. This
shifts the longest pause time for large heaps to the sweep termination
phase, so it only slightly decreases max pause time, though it roughly
halves mean pause time. Here are the results for the garbage
benchmark:

               ---- max mark termination pause ----
Heap   Procs   after change   before change   1.5.1
24GB     12        1.9ms          18ms         37ms
24GB      4        3.7ms          18ms         37ms
 4GB      4        920µs         3.8ms        6.9ms

Fixes #11484.

Change-Id: Ia2d28bb8a1e4f1c3b8ebf79fb203f12b9bf114ac
Reviewed-on: https://go-review.googlesource.com/15070
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-02 19:55:55 +00:00
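
The replacement is ordinary counter maintenance at span state
transitions, roughly (names are illustrative):

    // pagesInUse counts pages in _MSpanInUse spans. Updating it when a
    // span changes state turns an O(heap) mark-termination loop into
    // two O(1) operations; the sweep ratio then reads it directly.
    var pagesInUse uint64

    func spanEnteredInUse(npages uint64) { pagesInUse += npages }
    func spanLeftInUse(npages uint64)    { pagesInUse -= npages }
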
Austin Clements
608c1b0d56 runtime: scan objects with finalizers concurrently
This reduces pause time by ~25% relative to tip and by ~50% relative
to Go 1.5.1.

Currently one of the steps of STW mark termination is to loop (in
parallel) over all spans to find objects with finalizers in order to
mark all objects reachable from these objects and to treat the
finalizer special as a root. Unfortunately, even if there are no
finalizers at all, this loop takes roughly 1 ms/heap GB/core, so
multi-gigabyte heaps can quickly push our STW time past 10ms.

Fix this by moving this scan from mark termination to concurrent scan,
where it can run in parallel with mutators. The loop itself could also
be optimized, but this cost is small compared to concurrent marking.

Making this scan concurrent introduces two complications:

1) The scan currently walks the specials list of each span without
locking it, which is safe only with the world stopped. We fix this by
speculatively checking if a span has any specials (the vast majority
won't) and then locking the specials list only if there are specials
to check.

2) An object can have a finalizer set after concurrent scan, in which
case it won't have been marked appropriately by concurrent scan. If
the finalizer is a closure and is only reachable from the special, it
could be swept before it is run. Likewise, if the object is not marked
yet when the finalizer is set and then becomes unreachable before it
is marked, other objects reachable only from it may be swept before
the finalizer function is run. We fix this issue by making
addfinalizer ensure the same marking invariants as markroot does.

For multi-gigabyte heaps, this reduces max pause time by 20%–30%
relative to tip (depending on GOMAXPROCS) and by ~50% relative to Go
1.5.1 (where this loop was neither concurrent nor parallel). Here are
the results for the garbage benchmark:

               ---------------- max pause ----------------
Heap   Procs   Concurrent scan   STW parallel scan   1.5.1
24GB     12         18ms              23ms            37ms
24GB      4         18ms              25ms            37ms
 4GB      4         3.8ms            4.9ms           6.9ms

In all cases, 95%ile pause time is similar to the max pause time. This
also improves mean STW time by 10%–30%.

Fixes #11485.

Change-Id: I9359d8c3d120a51d23d924b52bf853a1299b1dfd
Reviewed-on: https://go-review.googlesource.com/14982
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-02 19:55:48 +00:00
Austin Clements
fbd2660af3 runtime: introduce gcMode type for GC modes
Currently, the GC modes constants are untyped and functions pass them
around as ints. Clean this up by introducing a proper type for these
constants.

Change-Id: Ibc022447bdfa203644921fbb548312d7e2272e8d
Reviewed-on: https://go-review.googlesource.com/14981
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-02 19:55:41 +00:00
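
The pattern, sketched below; the constant names reflect the modes the
runtime exposes but are shown only as an illustration:

    type gcMode int

    const (
        gcBackgroundMode gcMode = iota // concurrent GC and sweep
        gcForceMode                    // stop-the-world GC now, concurrent sweep
        gcForceBlockMode               // stop-the-world GC now and STW sweep
    )

A typed parameter (func gc(mode gcMode)) documents call sites and lets
the compiler reject stray ints.
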
Austin Clements
1b84bb8c7c runtime: fix out-of-date comment on gcWork usage
Change-Id: I3c21ffa80a5c14911e07238b1f64bec686ed7b72
Reviewed-on: https://go-review.googlesource.com/14980
Reviewed-by: Minux Ma <minux@golang.org>
2015-10-02 19:55:34 +00:00
David Crawshaw
47ccf96a95 runtime: darwin/386 entrypoint for c-archive
Change-Id: Ic22597b5e2824cffe9598cb9b506af3426c285fd
Reviewed-on: https://go-review.googlesource.com/12412
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-02 11:45:52 +00:00
Michael Hudson-Doyle
2c911143fd runtime: adjust the ppc64x memmove and memclr to copy by word as much as it can
Issue #12552 can happen on ppc64 too, although much less frequently in my
testing. I'm fairly sure this fixes it (2 out of 200 runs of oracle.test failed
without this change and 0 of 200 failed with it). It's also a lot faster for
large moves/clears:

name           old speed      new speed       delta
Memmove1-6      157MB/s ± 9%    144MB/s ± 0%    -8.20%         (p=0.004 n=10+9)
Memmove2-6      281MB/s ± 1%    249MB/s ± 1%   -11.53%        (p=0.000 n=10+10)
Memmove3-6      376MB/s ± 1%    328MB/s ± 1%   -12.64%        (p=0.000 n=10+10)
Memmove4-6      475MB/s ± 4%    345MB/s ± 1%   -27.28%         (p=0.000 n=10+8)
Memmove5-6      540MB/s ± 1%    393MB/s ± 0%   -27.21%        (p=0.000 n=10+10)
Memmove6-6      609MB/s ± 0%    423MB/s ± 0%   -30.56%         (p=0.000 n=9+10)
Memmove7-6      659MB/s ± 0%    468MB/s ± 0%   -28.99%         (p=0.000 n=8+10)
Memmove8-6      705MB/s ± 0%   1295MB/s ± 1%   +83.73%          (p=0.000 n=9+9)
Memmove9-6      740MB/s ± 1%   1241MB/s ± 1%   +67.61%         (p=0.000 n=10+8)
Memmove10-6     780MB/s ± 0%   1162MB/s ± 1%   +48.95%         (p=0.000 n=10+9)
Memmove11-6     811MB/s ± 0%   1180MB/s ± 0%   +45.58%          (p=0.000 n=8+9)
Memmove12-6     820MB/s ± 1%   1073MB/s ± 1%   +30.83%         (p=0.000 n=10+9)
Memmove13-6     849MB/s ± 0%   1068MB/s ± 1%   +25.87%        (p=0.000 n=10+10)
Memmove14-6     877MB/s ± 0%    911MB/s ± 0%    +3.83%        (p=0.000 n=10+10)
Memmove15-6     893MB/s ± 0%    922MB/s ± 0%    +3.25%         (p=0.000 n=10+9)
Memmove16-6     897MB/s ± 1%   2418MB/s ± 1%  +169.67%         (p=0.000 n=10+9)
Memmove32-6     908MB/s ± 0%   3927MB/s ± 2%  +332.64%         (p=0.000 n=10+8)
Memmove64-6    1.11GB/s ± 0%   5.59GB/s ± 0%  +404.64%          (p=0.000 n=9+9)
Memmove128-6   1.25GB/s ± 0%   6.71GB/s ± 2%  +437.49%         (p=0.000 n=9+10)
Memmove256-6   1.33GB/s ± 0%   7.25GB/s ± 1%  +445.06%        (p=0.000 n=10+10)
Memmove512-6   1.38GB/s ± 0%   8.87GB/s ± 0%  +544.43%        (p=0.000 n=10+10)
Memmove1024-6  1.40GB/s ± 0%  10.00GB/s ± 0%  +613.80%        (p=0.000 n=10+10)
Memmove2048-6  1.41GB/s ± 0%  10.65GB/s ± 0%  +652.95%         (p=0.000 n=9+10)
Memmove4096-6  1.42GB/s ± 0%  11.01GB/s ± 0%  +675.37%         (p=0.000 n=8+10)
Memclr5-6       269MB/s ± 1%    264MB/s ± 0%    -1.80%        (p=0.000 n=10+10)
Memclr16-6      600MB/s ± 0%    887MB/s ± 1%   +47.83%        (p=0.000 n=10+10)
Memclr64-6     1.06GB/s ± 0%   2.91GB/s ± 1%  +174.58%         (p=0.000 n=8+10)
Memclr256-6    1.32GB/s ± 0%   6.58GB/s ± 0%  +399.86%         (p=0.000 n=9+10)
Memclr4096-6   1.42GB/s ± 0%  10.90GB/s ± 0%  +668.03%         (p=0.000 n=8+10)
Memclr65536-6  1.43GB/s ± 0%  11.37GB/s ± 0%  +697.83%          (p=0.000 n=9+8)
GoMemclr5-6     359MB/s ± 0%    360MB/s ± 0%    +0.46%        (p=0.000 n=10+10)
GoMemclr16-6    750MB/s ± 0%   1264MB/s ± 1%   +68.45%        (p=0.000 n=10+10)
GoMemclr64-6   1.17GB/s ± 0%   3.78GB/s ± 1%  +223.58%         (p=0.000 n=10+9)
GoMemclr256-6  1.35GB/s ± 0%   7.47GB/s ± 0%  +452.44%        (p=0.000 n=10+10)

Update #12552

Change-Id: I7192e9deb9684a843aed37f58a16a4e29970e893
Reviewed-on: https://go-review.googlesource.com/14840
Reviewed-by: Minux Ma <minux@golang.org>
2015-10-02 07:50:52 +00:00
Mikio Hara
9fb79380f0 runtime: drop sigfwd from signal forwarding unsupported platforms
This change splits signal_unix.go into signal_unix.go and
signal2_unix.go and removes the fake sigfwd symbol from platforms
that do not support signal forwarding, for clarity.

Change-Id: I205eab5cf1930fda8a68659b35cfa9f3a0e67ca6
Reviewed-on: https://go-review.googlesource.com/12062
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-02 01:07:44 +00:00
Joel Sing
db70c019d7 runtime/trace: reduce memory usage for trace stress tests on openbsd/arm
Reduce allocation to avoid running out of memory on the openbsd/arm builder,
until issue #12032 is resolved.

Update issue #12032

Change-Id: Ibd513829ffdbd0db6cd86a0a5409934336131156
Reviewed-on: https://go-review.googlesource.com/15242
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-10-01 18:00:55 +00:00
Joel Sing
1d5251f707 runtime: handle sysReserve failure in mHeap_SysAlloc
sysReserve will return nil on failure - correctly handle this case and return
nil to the caller. Currently, a failure will result in h.arena_end being set
to psize, h.arena_used being set to zero, and fun times ensue.

On the openbsd/arm builder this has resulted in:

  runtime: address space conflict: map(0x0) = 0x40946000
  fatal error: runtime: address space conflict

It should be reporting out of memory instead.

Change-Id: Iba828d5ee48ee1946de75eba409e0cfb04f089d4
Reviewed-on: https://go-review.googlesource.com/15056
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-10-01 14:40:02 +00:00
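
A simplified sketch of the fix; signatures are illustrative, with 0
standing in for a nil pointer:

    // sysAllocSketch propagates a failed reservation instead of
    // treating it as a valid address and corrupting the arena bounds.
    func sysAllocSketch(reserve func(uintptr) uintptr, psize uintptr) uintptr {
        p := reserve(psize)
        if p == 0 {
            return 0 // out of memory: let the caller report it properly
        }
        // ... only now update arena_end / arena_used ...
        return p
    }
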
Jeremy Schlatter
59bacb285c runtime: update comment to match function name
Change-Id: I8f22434ade576cc7e3e6d9f357bba12c1296e3d1
Reviewed-on: https://go-review.googlesource.com/15250
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-01 13:12:50 +00:00
Ian Lance Taylor
0c1f0549b8 runtime, runtime/cgo: support using msan on cgo code
The memory sanitizer (msan) is a nice compiler feature that can
dynamically check for memory errors in C code.  It's not useful for Go
code, since Go is memory safe.  But it is useful to be able to use the
memory sanitizer on C code that is linked into a Go program via cgo.
Without this change it does not work, as msan considers memory passed
from Go to C as uninitialized.

To make this work, change the runtime to call the C mmap function when
using cgo.  When using msan the mmap call will be intercepted and marked
as returning initialized memory.

Work around what appears to be an msan bug by calling malloc before we
call mmap.

Change-Id: I8ab7286d7595ae84782f68a98bef6d3688b946f9
Reviewed-on: https://go-review.googlesource.com/15170
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-09-30 22:17:55 +00:00
Austin Clements
e01be84149 runtime: test that periodic GC works
We've broken periodic GC a few times without noticing because there's
no test for it, partly because you have to wait two minutes to see if
it happens. This exposes control of the periodic GC timeout to runtime
tests and adds a test that cranks it down to zero and sleeps for a bit
to make sure periodic GCs happen.

Change-Id: I3ec44e967e99f4eda752f85c329eebd18b87709e
Reviewed-on: https://go-review.googlesource.com/13169
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-09-30 19:24:07 +00:00
Shenghou Ma
604fbab3f1 runtime: fix incomplete sentence in comment
Fixes #12709.

Change-Id: If5a2536458fcd26d6f003dde1bfc02f86b09fa94
Reviewed-on: https://go-review.googlesource.com/14793
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-09-23 17:05:39 +00:00
Alex Brainman
d02a4c1d60 runtime: test that timeBeginPeriod succeeds
Change-Id: I5183f767dadb6d24a34d2460d02e97ddbaab129a
Reviewed-on: https://go-review.googlesource.com/12546
Run-TryBot: Minux Ma <minux@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-23 09:01:08 +00:00
Austin Clements
b307910b6e runtime: fix offset in invalidptr panic message
Change-Id: I00e1eebbf5e1a01c8fad5ca5324aa8eec1e4d731
Reviewed-on: https://go-review.googlesource.com/14792
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-22 16:55:17 +00:00
Ilya Tocar
5cf281a9b7 runtime: optimize duffcopy on amd64
Use movups to copy 16 bytes at a time.
Results (haswell):

name            old time/op  new time/op  delta
CopyFat8-48     0.62ns ± 3%  0.63ns ± 3%     ~     (p=0.535 n=20+20)
CopyFat12-48    0.92ns ± 2%  0.93ns ± 3%     ~     (p=0.594 n=17+18)
CopyFat16-48    1.23ns ± 2%  1.23ns ± 2%     ~     (p=0.839 n=20+19)
CopyFat24-48    1.85ns ± 2%  1.84ns ± 0%   -0.48%  (p=0.014 n=19+20)
CopyFat32-48    2.45ns ± 0%  2.45ns ± 1%     ~     (p=1.000 n=16+16)
CopyFat64-48    3.30ns ± 2%  2.14ns ± 1%  -35.00%  (p=0.000 n=20+18)
CopyFat128-48   6.05ns ± 0%  3.98ns ± 0%  -34.22%  (p=0.000 n=18+17)
CopyFat256-48   11.9ns ± 3%   7.7ns ± 0%  -35.87%  (p=0.000 n=20+17)
CopyFat512-48   23.0ns ± 2%  15.1ns ± 2%  -34.52%  (p=0.000 n=20+18)
CopyFat1024-48  44.8ns ± 1%  29.8ns ± 2%  -33.48%  (p=0.000 n=17+19)

Change-Id: I8a78773c656d400726a020894461e00c59f896bf
Reviewed-on: https://go-review.googlesource.com/14836
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-22 15:02:37 +00:00
Keith Randall
d29e92be52 [dev.ssa] cmd/compile: Use varkill only for non-SSAable vars
For variables which get SSA'd, SSA keeps track of all the def/kill.
It is only for on-stack variables that we need them.

This reduces stack frame sizes significantly because often the
only use of a variable was a varkill, and without that last use
the variable doesn't get allocated in the frame at all.

Fixes #12602

Change-Id: I3f00a768aa5ddd8d7772f375b25f846086a3e689
Reviewed-on: https://go-review.googlesource.com/14758
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-20 07:10:05 +00:00
Dmitry Vyukov
9172a1b573 runtime: race instrument read of convT2E/I arg
Sometimes this read is instrumented by the compiler when it creates
a temp to take its address, but sometimes it is not (e.g., for global
vars the compiler takes the address of the global directly).

Instrument convT2E/I similarly to chansend and mapaccess.

Fixes #12664

Change-Id: Ia7807f15d735483996426c5f3aed60a33b279579
Reviewed-on: https://go-review.googlesource.com/14752
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-19 10:26:36 +00:00
Austin Clements
c742ff6adc runtime: remove flaky TestInvalidptrCrash to fix build
This test fails on arm64 and some amd64 OSs and fails on Linux/amd64
if you remove the first runtime.GC(), which should be unnecessary, and
run it in all.bash (but not if you run it in isolation). I don't
understand any of these failures, so for now just remove this test.

TBR=rlh

Change-Id: Ibed00671126000ed7dc5b5d4af1f86fe4a1e30e1
Reviewed-on: https://go-review.googlesource.com/14767
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-19 01:43:00 +00:00
Austin Clements
97b64d88eb runtime: avoid debug prints of huge objects
Currently when the GC prints an object for debugging (e.g., for a
failed invalidptr or checkmark check), it dumps the entire object. To
avoid inundating the user with output for really large objects, limit
this to printing just the first 128 words (which are most likely to be
useful in identifying the type of an object) and the 32 words around
the problematic field.

Change-Id: Id94a5c9d8162f8bd9b2a63bf0b1bfb0adde83c68
Reviewed-on: https://go-review.googlesource.com/14764
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-18 22:23:18 +00:00
Austin Clements
b7c55ba496 runtime: improve invalid pointer error message
By default, the runtime panics if it detects a pointer to an
unallocated span. At this point, this usually catches bad uses of
unsafe or cgo in user code (though it could also catch runtime bugs).
Unfortunately, the rather cryptic error misleads users, offers users
little help with debugging their own problem, and offers the Go
developers little help with root-causing.

Improve the error message in various ways. First, the wording is
improved to make it clearer what condition was detected and to suggest
that this may be the result of incorrect use of unsafe or cgo. Second,
we add a dump of the object containing the bad pointer so that there's
at least some hope of figuring out why a bad pointer was stored in the
Go heap.

Change-Id: I57b91b12bc3cb04476399d7706679e096ce594b9
Reviewed-on: https://go-review.googlesource.com/14763
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-18 22:23:11 +00:00
Shawn Walker-Salas
001a75a74c runtime/trace: fix tracing of blocking system calls
The placement and invocation of traceGoSysCall when using
entersyscallblock() instead of entersyscall() differs enough that the
TestTraceSymbolize test can fail on some platforms.

This change moves the invocation of traceGoSysCall for entersyscall() so
that the same number of "frames to skip" are present in the trace as when
entersyscallblock() is used, ensuring system call traces remain identical
regardless of internal implementation choices.

Fixes golang/go#12056

Change-Id: I8361e91aa3708f5053f98263dfe9feb8c5d1d969
Reviewed-on: https://go-review.googlesource.com/13861
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-09-17 09:06:20 +00:00
Alex Brainman
3d1f8c2379 runtime: print errno and byte count before crashing in mem_windows.go
As per iant's suggestion during the issue #12587 crash investigation.

Also adjust an incorrect throw message in sysUsed while we are here.

Change-Id: Ice07904fdd6e0980308cb445965a696d26a1b92e
Reviewed-on: https://go-review.googlesource.com/14633
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-17 07:06:42 +00:00
David Crawshaw
9337dc9b5e runtime/debug: more explicit Stack docs
Change-Id: I81a7f22be827519b5290b4acbcba357680cad3c4
Reviewed-on: https://go-review.googlesource.com/14605
Reviewed-by: Rob Pike <r@golang.org>
2015-09-16 22:25:11 +00:00
Ilya Tocar
2421c6e3df runtime: optimize duffzero for amd64.
Use MOVUPS to zero 16 bytes at a time.

results (haswell):

name             old time/op  new time/op  delta
ClearFat8-48     0.62ns ± 2%  0.62ns ± 1%     ~     (p=0.085 n=20+15)
ClearFat12-48    0.93ns ± 2%  0.93ns ± 2%     ~     (p=0.757 n=19+19)
ClearFat16-48    1.23ns ± 1%  1.23ns ± 1%     ~     (p=0.896 n=19+17)
ClearFat24-48    1.85ns ± 2%  1.84ns ± 0%   -0.51%  (p=0.023 n=20+15)
ClearFat32-48    2.45ns ± 0%  2.46ns ± 2%     ~     (p=0.053 n=17+18)
ClearFat40-48    1.99ns ± 0%  0.92ns ± 2%  -53.54%  (p=0.000 n=19+20)
ClearFat48-48    2.15ns ± 1%  0.92ns ± 2%  -56.93%  (p=0.000 n=19+20)
ClearFat56-48    2.46ns ± 1%  1.23ns ± 0%  -49.98%  (p=0.000 n=19+14)
ClearFat64-48    2.76ns ± 0%  2.14ns ± 1%  -22.21%  (p=0.000 n=17+17)
ClearFat128-48   5.21ns ± 0%  3.99ns ± 0%  -23.46%  (p=0.000 n=17+19)
ClearFat256-48   10.3ns ± 4%   7.7ns ± 0%  -25.37%  (p=0.000 n=20+17)
ClearFat512-48   20.2ns ± 4%  15.0ns ± 1%  -25.58%  (p=0.000 n=20+17)
ClearFat1024-48  39.7ns ± 2%  29.7ns ± 0%  -25.05%  (p=0.000 n=19+19)

Change-Id: I200401eec971b2dd2450c0651c51e378bd982405
Reviewed-on: https://go-review.googlesource.com/14408
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-16 16:07:44 +00:00
David Crawshaw
2d697b2401 runtime/debug: implement Stack using runtime.Stack
Fixes #12363

Change-Id: I1a025ab6a1cbd5a58f5c2bce5416788387495428
Reviewed-on: https://go-review.googlesource.com/14604
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-16 11:36:21 +00:00
David Crawshaw
fb30270037 runtime: preserve R11 in darwin/arm entrypoint
The _rt0_arm_darwin_lib entrypoint has to conform to the darwin ARMv7
calling convention, which requires functions to preserve the value of
R11. Go uses R11 as the liblink REGTMP register, so save it manually.

Also avoid using R4, which is also callee-save.

Fixes #12590

Change-Id: I9c3b374e330f81ff8fc9c01fa20505a33ddcf39a
Reviewed-on: https://go-review.googlesource.com/14603
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-09-16 11:23:32 +00:00
Ian Lance Taylor
b6d115a583 runtime: on unexpected netpoll error, throw instead of looping
The current code prints an error message and then tries to carry on.
This is not helpful for Go users: they see a message that means
nothing and that they can do nothing about.  In the only known case of
this message, in issue 11498, the best guess is that the netpoll code
went into an infinite loop.  Instead of doing that, crash the program.

Fixes #11498.

Change-Id: Idda3456c5b708f0df6a6b56c5bb4e796bbc39d7c
Reviewed-on: https://go-review.googlesource.com/12047
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-09-15 17:56:56 +00:00
Keith Randall
731bdc5115 runtime: fix aeshash of empty string
Aeshash currently computes the hash of the empty string as
hash("", seed) = seed.  This is bad because the hash of a compound
object with empty strings in it doesn't include information about
where those empty strings were.  For instance [2]string{"", "foo"}
and [2]string{"foo", ""} might get the same hash.

Fix this by returning a scrambled seed instead of the seed itself.
With this fix, we can remove the scrambling done by the generated
array hash routines.

The test also rejects hash("", seed) = 0, if we ever thought
it would be a good idea to try that.

The fallback hash is already OK in this regard.

Change-Id: Iaedbaa5be8d6a246dc7e9383d795000e0f562037
Reviewed-on: https://go-review.googlesource.com/14129
Reviewed-by: jcd . <jcd@golang.org>
2015-09-15 17:51:23 +00:00
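
A toy demonstration of the failure mode, with a deliberately bad
stand-in hash (badHash is hypothetical, not aeshash):

    package main

    import "fmt"

    // badHash returns the seed unchanged for "", like the old aeshash.
    func badHash(s string, seed uint64) uint64 {
        h := seed
        for i := 0; i < len(s); i++ {
            h = h*131 + uint64(s[i])
        }
        return h
    }

    // hashPair hashes a pair by chaining, as compound hashes do.
    func hashPair(a, b string, seed uint64) uint64 {
        return badHash(b, badHash(a, seed))
    }

    func main() {
        // Both print the same value: the position of "" is invisible.
        fmt.Println(hashPair("", "foo", 42))
        fmt.Println(hashPair("foo", "", 42))
    }

Returning a scrambled seed for "" makes the empty string's position feed
into the final hash and breaks this symmetry.
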
Alex Brainman
d7c12042bf runtime: provide room for first 4 syscall parameters in windows usleep2
Windows amd64 requires all syscall callers to provide room for the first
4 parameters on the stack. We do that for all our syscalls, except inside
of usleep2. In https://codereview.appspot.com/7563043#msg3 rsc says:

"We don't need the stack alignment and first 4 parameters on amd64
because it's just a system call, not an ordinary function call."

He seems to be wrong on both counts. But alignment is already fixed.
Fix parameter space now too.

Fixes #12444

Change-Id: I66a2a18d2f2c3846e3aa556cc3acc8ec6240bea0
Reviewed-on: https://go-review.googlesource.com/14282
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-15 01:12:32 +00:00
Ian Lance Taylor
ffd7d31787 runtime: unblock special glibc signals on each thread
Glibc uses some special signals for special thread operations.  These
signals will be used in programs that use cgo and invoke certain glibc
functions, such as setgid.  In order for this to work, these signals
need to not be masked by any thread.  Before this change, they were
being masked by programs that used os/signal.Notify, because it
carefully masks all non-thread-specific signals in all threads so that a
dedicated thread will collect and report those signals (see ensureSigM
in signal1_unix.go).

This change adds the two glibc special signals to the set of signals
that are unmasked in each thread.

Fixes #12498.

Change-Id: I797d71a099a2169c186f024185d44a2e1972d4ad
Reviewed-on: https://go-review.googlesource.com/14297
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-09-14 21:59:54 +00:00
Austin Clements
4ac4085f8e runtime: minor clarifications of markroot
This puts the _Root* indexes in a more friendly order and tweaks
markrootSpans to use a for-range loop instead of its own indexing.

Change-Id: I2c18d55c9a673ea396b6424d51ef4997a1a74825
Reviewed-on: https://go-review.googlesource.com/14548
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-14 19:37:44 +00:00
Austin Clements
a1cad70a2f runtime: remove unused g.readyg field
Commit 0e6a6c5 removed readyExecute a long time ago, but left behind
the g.readyg field that was used by readyExecute. Remove this now
unused field.

Change-Id: I41b87ad2b427974d256ec7a7f6d4bdc2ce8a13bb
Reviewed-on: https://go-review.googlesource.com/13111
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-14 18:40:22 +00:00
Austin Clements
70462f90ec runtime: simplify mSpan_Sweep
This is a cleanup following cc8f544, which was a minimal change to fix
issue #11617. This consolidates the two places in mSpan_Sweep that
update sweepgen. Previously this was necessary because sweepgen must
be updated before freeing the span, but we freed large spans early.
Now we free large spans later, so there's no need to duplicate the
sweepgen update. This also means large spans can take advantage of the
sweepgen sanity checking performed for other spans.

Change-Id: I23b79dbd9ec81d08575cd307cdc0fa6b20831768
Reviewed-on: https://go-review.googlesource.com/12451
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-14 18:29:58 +00:00
Austin Clements
572f08a064 runtime: split marking of span roots into 128 subtasks
Marking of span roots can represent a significant fraction of the time
spent in mark termination. Simply traversing the span list takes about
1ms per GB of heap and if there are a large number of finalizers (for
example, for network connections), it may take much longer.

Improve the situation by splitting the span scan into 128 subtasks
that can be executed in parallel and load balanced by the markroots
parallel for. This lets the GC balance this job across the Ps.

A better solution is to do this during concurrent mark, or to improve
it algorithmically, but this is a simple change with a lot of bang for
the buck.

This was suggested by Rhys Hiltner.

Updates #11485.

Change-Id: I8b281adf0ba827064e154a1b6cc32d4d8031c03c
Reviewed-on: https://go-review.googlesource.com/13112
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-14 18:15:40 +00:00
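
A sketch of the sharding (names are illustrative):

    const rootSpansJobs = 128

    // spanShard returns the half-open range of span indexes owned by
    // root job i, so the parallel markroot machinery can load-balance
    // the 128 jobs across Ps.
    func spanShard(i, nspans int) (lo, hi int) {
        shard := (nspans + rootSpansJobs - 1) / rootSpansJobs
        lo, hi = i*shard, i*shard+shard
        if lo > nspans {
            lo = nspans
        }
        if hi > nspans {
            hi = nspans
        }
        return lo, hi
    }
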
Austin Clements
739f133837 runtime: fix hashing of trace stacks
The call to hash the trace stack reversed the "seed" and "size"
arguments to memhash and, hence, always called memhash with a 0 size,
which dutifully returned a hash value that depended only on the number
of PCs in the stack and not their values. As a result, all stacks were
put into a very small subset of the 8,192 buckets.

Fix this by passing these arguments in the correct order.

Change-Id: I67cd29312f5615c7ffa23e205008dd72c6b8af62
Reviewed-on: https://go-review.googlesource.com/13613
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-09-14 18:14:14 +00:00
Keith Randall
e3869a6b65 [dev.ssa] cmd/compile/internal/ssa: implement write barriers
For now, we only use typedmemmove.  This can be optimized
in future CLs.

Also add a feature to help with binary searching bad compilations.
Together with GOSSAPKG, GOSSAHASH specifies the last few binary digits
of the hash of function names that should be compiled.  So
GOSSAHASH=0110 means compile only those functions whose last 4 bits
of hash are 0110.  By adding digits to the front we can binary search
for the function whose SSA-generated code is causing a test to fail.

Change-Id: I5a8b6b70c6f034f59e5753965234cd42ea36d524
Reviewed-on: https://go-review.googlesource.com/14530
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-14 16:25:40 +00:00
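
A sketch of the matching rule; the hash shown is a stand-in (FNV-1a),
and the real implementation may differ in detail:

    // gossaMatch reports whether the hash of a function name ends in
    // the binary digits given by GOSSAHASH. Prepending a digit halves
    // the selected set, enabling binary search for the function whose
    // SSA-generated code breaks a test.
    func gossaMatch(gossahash, fnName string) bool {
        if gossahash == "" {
            return true // no filter: SSA-compile everything
        }
        h := uint32(2166136261)
        for i := 0; i < len(fnName); i++ {
            h = (h ^ uint32(fnName[i])) * 16777619 // FNV-1a step
        }
        for i := len(gossahash) - 1; i >= 0; i-- {
            if gossahash[i] != "01"[h&1] {
                return false
            }
            h >>= 1
        }
        return true
    }
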
Dave Cheney
4f48507d90 runtime: reduce pthread stack size in TestCgoCallbackGC
Fixes #11959

This test runs 100 concurrent callbacks from C to Go, consuming 100
operating system threads, which at 8MB apiece (the default on linux/arm)
would reserve over 800MB of address space. This would frequently
cause the test to fail on platforms with ~1GB of RAM, such as the
Raspberry Pi.

This change reduces the thread stack allocation to 256KB, a number picked
at random but, at 1/32nd the previous size, one that should allow the test
to pass successfully on all platforms.

Change-Id: I8b8bbab30ea7b2972b3269a6ff91e6fe5bc717af
Reviewed-on: https://go-review.googlesource.com/13731
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Capitanio <capnm9@gmail.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-09-13 23:46:55 +00:00
Shenghou Ma
0b5bcf53ee runtime/cgo: explicitly link msvcrt on windows
It's because the runtime links to ntdll, and ntdll exports a couple of
incompatible libc functions. We must link to msvcrt first and
then try ntdll.

Fixes #12030.

Change-Id: I0105417bada108da55f5ae4482c2423ac7a92957
Reviewed-on: https://go-review.googlesource.com/14472
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-09-12 08:34:52 +00:00
Rob Pike
67ddae87b9 all: use one 'l' when cancelling everywhere except Solaris
Fixes #11626.

Change-Id: I1b70c0844473c3b57a53d7cca747ea5cdc68d232
Reviewed-on: https://go-review.googlesource.com/14526
Run-TryBot: Rob Pike <r@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-11 18:31:51 +00:00
Didier Spezia
4f33436004 runtime,internal/trace: map/slice literals janitoring
Simplify slice/map literal expressions.
Caught with gofmt -d -s, fixed with gofmt -w -s.
Checked that the result can still be compiled with Go 1.4.

Change-Id: I06bce110bb5f46ee2f45113681294475aa6968bc
Reviewed-on: https://go-review.googlesource.com/13839
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-09-11 14:03:43 +00:00
Michael Hudson-Doyle
b0344e9fd5 cmd/internal/obj, cmd/link, runtime: a saner model for TLS on arm
This leaves lots of cruft behind; will delete that soon.

Change-Id: I12d6b6192f89bcdd89b2b0873774bd3458373b8a
Reviewed-on: https://go-review.googlesource.com/14196
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-10 19:49:13 +00:00
Shenghou Ma
1cf05ee612 runtime: move arch1_$GOARCH.go into arch_$GOARCH.go
Update #12563.

Change-Id: Id87f8e53586accd662575c31961c39787268df7a
Reviewed-on: https://go-review.googlesource.com/14471
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-10 04:52:24 +00:00
Keith Randall
00c638d243 runtime: on map update, don't overwrite key if we don't need to.
Keep track of which types of keys need an update and which don't.

Strings need an update because the new key might pin a smaller backing store.
Floats need an update because the new key might be +0 vs. -0.
Interfaces need an update because they may contain strings or floats.

Fixes #11088

Change-Id: I9ade53c1dfb3c1a2870d68d07201bc8128e9f217
Reviewed-on: https://go-review.googlesource.com/10843
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-09 21:06:49 +00:00
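
A small illustration of why float keys are in the "needs update" set:
+0 and -0 compare equal but have distinct bit patterns, so the stored
key must track the most recent insert:

    package main

    import (
        "fmt"
        "math"
    )

    func main() {
        negZero := math.Copysign(0, -1)
        m := map[float64]string{}
        m[negZero] = "a"
        m[0.0] = "b" // same key by ==, different bit pattern
        for k := range m {
            // The stored key must have been updated to +0; skipping the
            // key write (as is now done for e.g. integer keys) would
            // leave a stale -0 behind.
            fmt.Println(math.Signbit(k), m[k]) // false b
        }
    }
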
Austin Clements
e9089e4ab6 runtime: add high-level description of how stack barriers work
Change-Id: I6affe75b5fa9dbf513c16200bff4fd7aa5f3a985
Reviewed-on: https://go-review.googlesource.com/14051
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-09 01:18:56 +00:00
Austin Clements
92e68c89f6 runtime: move stack barrier code to its own file
Currently the stack barrier code is mixed in with the mark and scan
code. Move all of the stack barrier related functions and variables to
a new dedicated source file. There are no code modifications.

Change-Id: I604603045465ef8573b9f88915d28ab6b5910903
Reviewed-on: https://go-review.googlesource.com/14050
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-09 01:18:50 +00:00
Michael Hudson-Doyle
0388d4303f runtime: remove unused FUNCDATA_DeadValueMaps
Change-Id: Iccb0221bd9aef062d20798b952eaa09d9e60b902
Reviewed-on: https://go-review.googlesource.com/14345
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-07 21:02:11 +00:00
Michael Hudson-Doyle
31322996fd runtime: add stub sigreturn on arm
When building a shared library, all functions that are declared must actually
be defined.

Change-Id: I1488690cecfb66e62d9fdb3b8d257a4dc31d202a
Reviewed-on: https://go-review.googlesource.com/14187
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-09-07 07:49:09 +00:00
Michael Hudson-Doyle
40af15f28e runtime: teach softfloat interpreter about "add r11, pc, r11"
This instruction is generated in floating-point code when -shared is active.

Change-Id: Ia1092299b9c3b63ff771ca4842158b42c34bd008
Reviewed-on: https://go-review.googlesource.com/14286
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-09-04 06:43:35 +00:00
Michael Hudson-Doyle
9e6ba37b86 cmd/internal/obj: some platform independent bits of proper toolchain support for thread local storage
Also simplifies some silliness around making the .tbss section wrt internal
vs external linking. The "make TLS make sense" project has quite a few more
steps to go.

Issue #11270

Change-Id: Ia4fa135cb22d916728ead95bdbc0ebc1ae06f05c
Reviewed-on: https://go-review.googlesource.com/13990
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-03 14:06:07 +00:00
Michael Hudson-Doyle
9f0baca505 runtime: fixes for arm64 shared libraries
Building for shared libraries requires that all functions that are declared
have an implementation, and vice versa, so make that so on arm64.

It would be nicer to not require the stub sigreturn (it will never be called)
but that seems a bit awkward.

Change-Id: I3cec81697161b452af81fa35939f748bd1acf7fd
Reviewed-on: https://go-review.googlesource.com/13995
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-09-03 01:07:40 +00:00
Keith Randall
a088f1b76c runtime: soften up hash checks a bit
The hash tests generate occasional failures; quiet them some more.

In particular we can get 1 collision when the expected number is
.001 or so. That shouldn't be a dealbreaker.

Fixes #12311

Change-Id: I784e91b5d21f4f1f166dc51bde2d1cd3a7a3bfea
Reviewed-on: https://go-review.googlesource.com/13902
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-08-31 19:38:24 +00:00
Shenghou Ma
32d3b96e8b runtime: implement cmpstring and bytes.Compare in assembly for ppc64
Change-Id: I15bf55aa5ac3588c05f0a253f583c52bab209892
Reviewed-on: https://go-review.googlesource.com/14041
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-08-31 18:41:58 +00:00
Austin Clements
77e528293b runtime: check that stack barrier unwind is in sync
Currently the stack barrier stub blindly unwinds the next stack
barrier from the G's stack barrier array without checking that it's
the right stack barrier. If through some bug the stack barrier array
position gets out of sync with where we actually are on the stack,
this could return to the wrong PC, which would lead to difficult to
debug crashes. To address this, this commit adds a check to the amd64
stack barrier stub that it's unwinding the correct stack barrier.

Updates #12238.

Change-Id: If824d95191d07e2512dc5dba0d9978cfd9f54e02
Reviewed-on: https://go-review.googlesource.com/13948
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-30 16:07:02 +00:00
Austin Clements
3bfc9df21a runtime: add GODEBUG for stack barriers at every frame
Currently enabling the debugging mode where stack barriers are
installed at every frame requires recompiling the runtime. However,
this is potentially useful for field debugging and for runtime tests,
so make this mode a GODEBUG.

Updates #12238.

Change-Id: I6fb128f598b19568ae723a612e099c0ed96917f5
Reviewed-on: https://go-review.googlesource.com/13947
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-30 16:06:55 +00:00
Austin Clements
e2bb03f175 runtime: don't install a stack barrier in cgocallback_gofunc's frame
Currently the runtime can install stack barriers in any frame.
However, the frame of cgocallback_gofunc is special: it's the one
function that switches from a regular G stack to the system stack on
return. Hence, the return PC slot in its frame on the G stack is
actually used to save getg().sched.pc (so tracebacks appear to unwind
to the last Go function running on that G), and not as an actual
return PC for cgocallback_gofunc.

Because of this, if we install a stack barrier in cgocallback_gofunc's
return PC slot, when cgocallback_gofunc does return, it will move the
stack barrier stub PC in to getg().sched.pc and switch back to the
system stack. The rest of the runtime doesn't know how to deal with a
stack barrier stub in sched.pc: nothing knows how to match it up with
the G's stack barrier array and, when the runtime removes stack
barriers, it doesn't know to undo the one in sched.pc. Hence, if the C
code later returns back in to Go code, it will attempt to return
through the stack barrier saved in sched.pc, which may no longer have
correct unwinding information.

Fix this by blacklisting cgocallback_gofunc's frame so the runtime
won't install a stack barrier in its return PC slot.

Fixes #12238.

Change-Id: I46aa2155df2fd050dd50de3434b62987dc4947b8
Reviewed-on: https://go-review.googlesource.com/13944
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-30 16:06:47 +00:00
Keith Randall
805e56ef47 runtime: short-circuit bytes.Compare if src and dst are the same slice
Should only matter on ppc64 and ppc64le.

Fixes #11336

Change-Id: Id4b0ac28b573648e1aa98e87bf010f00d006b146
Reviewed-on: https://go-review.googlesource.com/13901
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-08-29 02:43:57 +00:00
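
A hedged Go sketch of the short-circuit; the actual check lives in per-architecture assembly, so this shows only the shape of the idea.

	package sketch

	// compare returns -1, 0, or +1 like bytes.Compare. Two slices with the
	// same base pointer and the same length are equal without inspection.
	func compare(a, b []byte) int {
		if len(a) == len(b) && (len(a) == 0 || &a[0] == &b[0]) {
			return 0 // src and dst are the same memory
		}
		for i := 0; i < len(a) && i < len(b); i++ {
			if a[i] != b[i] {
				if a[i] < b[i] {
					return -1
				}
				return +1
			}
		}
		switch {
		case len(a) < len(b):
			return -1
		case len(a) > len(b):
			return +1
		}
		return 0
	}
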
Russ Cox
9c04d00214 runtime: check explicitly for short unwinding of stacks
Right now we only find out implicitly whether stack barriers or
defers are in place. This change makes sure we always find out about
short unwinds.

Change-Id: Ibdde1ba9c79eb792660dcb7aa6f186e4e4d559b3
Reviewed-on: https://go-review.googlesource.com/13966
Reviewed-by: Austin Clements <austin@google.com>
2015-08-28 16:05:59 +00:00
Tim Cooijmans
34db31d5f5 src/runtime: Add missing defs for android/386.
Change-Id: I63bf6d2fdf41b49ff8783052d5d6c53b20e2f050
Reviewed-on: https://go-review.googlesource.com/13760
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-08-27 15:14:41 +00:00
Michael Hudson-Doyle
d497eeb005 runtime: remove unused xchgp/xchgp1
I noticed that they were unimplemented on arm64, and then realized
that they were in fact not used at all.

Change-Id: Iee579feda2a5e374fa571bcc8c89e4ef607d50f6
Reviewed-on: https://go-review.googlesource.com/13951
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-27 00:28:35 +00:00
Uttam C Pawar
32add8d7c8 bytes: improve Compare function on amd64 for large byte arrays
This patch contains only a loop-unrolling change for sizes > 63B.

Following are the performance numbers for various sizes on a
Haswell-based system: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz.

benchcmp go.head.8.25.15.txt go.head.8.25.15.opt.txt
benchmark                       old ns/op     new ns/op     delta
BenchmarkBytesCompare1-4        5.37          5.37          +0.00%
BenchmarkBytesCompare2-4        5.37          5.38          +0.19%
BenchmarkBytesCompare4-4        5.37          5.37          +0.00%
BenchmarkBytesCompare8-4        4.42          4.38          -0.90%
BenchmarkBytesCompare16-4       4.27          4.45          +4.22%
BenchmarkBytesCompare32-4       5.30          5.36          +1.13%
BenchmarkBytesCompare64-4       6.93          6.78          -2.16%
BenchmarkBytesCompare128-4      10.3          9.50          -7.77%
BenchmarkBytesCompare256-4      17.1          13.8          -19.30%
BenchmarkBytesCompare512-4      31.3          22.1          -29.39%
BenchmarkBytesCompare1024-4     62.5          39.0          -37.60%
BenchmarkBytesCompare2048-4     112           73.2          -34.64%

Change-Id: I4eeb1c22732fd62cbac97ba757b0d29f648d4ef1
Reviewed-on: https://go-review.googlesource.com/11871
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-26 03:52:20 +00:00
Todd Neal
a94e906c41 runtime: remove always false comparison in sigsend
s is a uint32 and can never be negative. Its maximum value is already
tested against sig.wanted, whose size is derived from _NSIG.  This also
matches the test in signal_enable.

Fixes #11282

Change-Id: I8eec9c7df8eb8682433616462fe51b264c092475
Reviewed-on: https://go-review.googlesource.com/13940
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-08-26 01:02:55 +00:00
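
A tiny illustration of the class of bug being removed (assumed shape, not the actual sigsend code):

	package sketch

	// For an unsigned value the guard below can never fire; vet reports
	// such comparisons.
	func bogusGuard(s uint32) bool {
		return s < 0 // always false: a uint32 cannot be negative
	}
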
Michael Hudson-Doyle
af78482d6b cmd/compile, cmd/link, reflect, runtime: remove type.zero field
No longer used after previous hashmap change.

Change-Id: I558470f872281e84a78406132df4e391d077b833
Reviewed-on: https://go-review.googlesource.com/13785
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-26 00:28:17 +00:00
Michael Hudson-Doyle
38519e69d0 cmd/compile, runtime: stop returning t.zero on hashmap miss
Previously t.zero always pointed to runtime.zerovalue. Change the hashmap code
to always return a runtime pointer directly, and change that pointer to point
to a larger buffer if one is needed.

(It might be better to only copy from the pointer returned by the mapaccess
functions when the value type is small enough and have the compiler insert
explicit zeroing for larger value types, but I tried and failed to do this).

This removes all uses of the zero field of the type data; the field itself can
be removed in a separate change.

Fixes #11491

Change-Id: I5b81752ff4067d74a5a281c41e88f151bae0171e
Reviewed-on: https://go-review.googlesource.com/13784
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-08-26 00:03:21 +00:00
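
The observable behavior, as a runnable demo. The zero buffer itself is runtime-internal; the comment about where the copy comes from summarizes the change above rather than any public API.

	package main

	import "fmt"

	func main() {
		m := map[string][64]int{}
		// A miss copies the value from a runtime-owned zeroed buffer,
		// grown as needed, rather than from the old per-type t.zero.
		v := m["missing"]
		fmt.Println(v == [64]int{}) // true: a miss always yields a zero value
	}
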
Austin Clements
05a3b1fce5 cmd/compile: fix uninitialized memory in compare of interface value
A comparison of the form l == r where l is an interface and r is
concrete performs a type assertion on l to convert it to r's type.
However, the compiler fails to zero the temporary where the result of
the type assertion is written, so if the type is a pointer type and a
stack scan occurs while in the type assertion, it may see an invalid
pointer on the stack.

Fix this by zeroing the temporary. This is equivalent to the fix for
type switches from c4092ac.

Fixes #12253.

Change-Id: Iaf205d456b856c056b317b4e888ce892f0c555b9
Reviewed-on: https://go-review.googlesource.com/13872
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-25 14:37:08 +00:00
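
A runnable illustration of the construct involved; the zeroing itself happens in compiler-generated code and is not visible here.

	package main

	import "fmt"

	func main() {
		var l interface{} = 42
		r := 42
		direct := l == r // compiled roughly as the two lines below
		v, ok := l.(int) // hidden type assertion; its temporary must be zeroed
		expanded := ok && v == r
		fmt.Println(direct, expanded) // true true
	}
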
Dave Cheney
686d44d9e0 runtime: check pointer equality in arm64 cmpbody
Updates #11336

Follow the lead of amd64 by doing a pointer equality check
before comparing string/byte contents on arm64.

BenchmarkCompareBytesEqual-8               25.8           26.3           +1.94%
BenchmarkCompareBytesToNil-8               9.59           9.59           +0.00%
BenchmarkCompareBytesEmpty-8               9.59           9.17           -4.38%
BenchmarkCompareBytesIdentical-8           26.3           9.17           -65.13%
BenchmarkCompareBytesSameLength-8          16.3           16.3           +0.00%
BenchmarkCompareBytesDifferentLength-8     16.3           16.3           +0.00%
BenchmarkCompareBytesBigUnaligned-8        1132038        1131409        -0.06%
BenchmarkCompareBytesBig-8                 1126758        1128470        +0.15%
BenchmarkCompareBytesBigIdentical-8        1084366        9.17           -100.00%

Change-Id: Id7125c31957eff1ddb78897d4511bd50e79af3f7
Reviewed-on: https://go-review.googlesource.com/13885
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-25 03:29:47 +00:00
Todd Neal
3efe36d4c4 runtime: fix nmspinning comparison
nmspinning has a value range of [0, 2^31-1].  Update the comment to
indicate this and fix the comparison so it's not always false.

Fixes #11280

Change-Id: Iedaf0654dcba5e2c800645f26b26a1a781ea1991
Reviewed-on: https://go-review.googlesource.com/13877
Reviewed-by: Minux Ma <minux@golang.org>
2015-08-25 02:44:11 +00:00
Shenghou Ma
24be0997a2 runtime: add a missing hex conversion
gobuf.g is a guintptr, so without hex(), it will be printed as
a decimal, which is not very helpful and inconsistent with how
other pointers are printed.

Change-Id: I7c0432e9709e90a5c3b3e22ce799551a6242d017
Reviewed-on: https://go-review.googlesource.com/13879
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-25 01:37:54 +00:00
Dave Cheney
1135b9d671 runtime: check pointer equality in arm cmpbody
Updates #11336

Follow the lead of amd64 by doing a pointer equality check
before comparing string/byte contents on arm.

BenchmarkCompareBytesEqual-4               208             211             +1.44%
BenchmarkCompareBytesToNil-4               83.6            81.8            -2.15%
BenchmarkCompareBytesEmpty-4               80.2            75.2            -6.23%
BenchmarkCompareBytesIdentical-4           208             75.2            -63.85%
BenchmarkCompareBytesSameLength-4          126             128             +1.59%
BenchmarkCompareBytesDifferentLength-4     128             130             +1.56%
BenchmarkCompareBytesBigUnaligned-4        14192804        14060971        -0.93%
BenchmarkCompareBytesBig-4                 12277313        12128193        -1.21%
BenchmarkCompareBytesBigIdentical-4        9385046         78.5            -100.00%

Change-Id: I5b24620018688c5fe04b6ff6743a24c4ce225788
Reviewed-on: https://go-review.googlesource.com/13881
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-24 21:18:33 +00:00
Hyang-Ah (Hana) Kim
db5eb2a2c3 runtime/cgo: remove __stack_chk_fail_local
I cannot find where it's being used.

This addresses a duplicate symbol issue encountered in golang/go#9327.

Change-Id: I8efda45a006ad3e19423748210c78bd5831215e0
Reviewed-on: https://go-review.googlesource.com/13615
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-21 15:56:36 +00:00
Shawn Walker-Salas
d9e3d16796 runtime, syscall: remove unused bits from Solaris implementation
CL 9184 changed the runtime and syscall packages to link Solaris binaries
directly instead of using dlopen/dlsym but did not remove the unused (and
now broken) references to dlopen, dlclose, and dlsym.

Fixes #11923

Change-Id: I36345ce5e7b371bd601b7d48af000f4ccacd62c0
Reviewed-on: https://go-review.googlesource.com/13410
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
2015-08-21 11:39:24 +00:00
Russ Cox
3ae17043f7 runtime: make sure heapBitsBulkBarrier cannot be preempted
Changes the torture test in #12068 from failing about 1/10 times
to not failing in almost 2,000 runs.

This was only happening in -race mode because functions are
bigger in -race mode, so a few of the helpers for heapBitsBulkBarrier
were not being inlined, and they were not marked nosplit,
so (only in -race mode) the write barrier was being preempted by GC,
causing missed pointer updates.

Filed issue #12069 for diagnosis of any other similar errors.

Fixes #12068.

Change-Id: Ic174d9b050ba278b18b08ab0d85a73c33bd5b175
Reviewed-on: https://go-review.googlesource.com/13364
Reviewed-by: Austin Clements <austin@google.com>
2015-08-07 17:55:26 +00:00
Russ Cox
4a19081358 runtime: run on GOARM=5 and GOARM=6 uniprocessor freebsd/arm systems
Also, crash early on non-Linux SMP ARM systems when GOARM < 7;
without the proper synchronization, SMP cannot work.

Linux is okay because we call kernel-provided routines for
synchronization and barriers, and the kernel takes care of
providing the right routines for the current system.
On non-Linux systems we are left to fend for ourselves.

It is possible to use different synchronization on GOARM=6,
but it's too late to do that in the Go 1.5 cycle.
We don't believe there are any non-Linux SMP GOARM=6 systems anyway.

Fixes #12067.

Change-Id: I771a556e47893ed540ec2cd33d23c06720157ea3
Reviewed-on: https://go-review.googlesource.com/13363
Reviewed-by: Austin Clements <austin@google.com>
2015-08-07 17:39:07 +00:00
Austin Clements
ad731887a7 runtime: call goexit1 instead of goexit
Currently, runtime.Goexit() calls goexit()—the goroutine exit stub—to
terminate the goroutine. This *mostly* works, but can cause a
"leftover stack barriers" panic if the following happens:

1. Goroutine A has a reasonably large stack.

2. The garbage collector scan phase runs and installs stack barriers
   in A's stack. The top-most stack barrier happens to fall at address X.

3. Goroutine A unwinds the stack far enough to be a candidate for
   stack shrinking, but not past X.

4. Goroutine A calls runtime.Goexit(), which calls goexit(), which
   calls goexit1().

5. The garbage collector enters mark termination.

6. Goroutine A is preempted right at the prologue of goexit1() and
   performs a stack shrink, which calls gentraceback.

gentraceback stops as soon as it sees goexit on the stack, which is
only two frames up at this point, even though there may really be many
frames above it. More to the point, the stack barrier at X is above
the goexit frame, so gentraceback never sees that stack barrier. At
the end of gentraceback, it checks that it saw all of the stack
barriers and panics because it didn't see the one at X.

The fix is simple: call goexit1, which actually implements the process
of exiting a goroutine, rather than goexit, the exit stub.

To make sure this doesn't happen again in the future, we also add an
argument to the stub prototype of goexit so you really, really have to
want to call it in order to call it. We were able to reliably
reproduce the above sequence with a fair amount of awful code inserted
at the right places in the runtime, but chose to change the goexit
prototype to ensure this wouldn't happen again rather than pollute the
runtime with ugly testing code.

Change-Id: Ifb6fb53087e09a252baddadc36eebf954468f2a8
Reviewed-on: https://go-review.googlesource.com/13323
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-06 20:21:05 +00:00
Russ Cox
26baed6af7 runtime: fix race that dropped GoSysExit events from trace
This makes TestTraceStressStartStop much less flaky.
Running under stress, it changes the failure rate from
above 1/100 to under 1/50000. That very unlikely
failure happens when an unexpected GoSysExit is
written. Not sure how that happens yet, but it is much
less important.

Fixes #11953.

Change-Id: I034671936334b4f3ab733614ef239aa121d20247
Reviewed-on: https://go-review.googlesource.com/13321
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-08-06 19:29:09 +00:00
Austin Clements
d57f037302 runtime: don't recheck heap trigger for periodic GC
88e945f introduced a non-speculative double check of the heap trigger
before actually starting a concurrent GC. This was necessary to fix a
race for heap-triggered GC, but broke sysmon-triggered periodic GC,
since the heap check will of course fail for periodically triggered
GC.

Fix this by telling startGC whether or not this GC was triggered by
heap size or a timer and only doing the heap size double check for GCs
triggered by heap size.

Fixes #12026.

Change-Id: I7c3f6ec364545c36d619f2b4b3bf3b758e3bcbd6
Reviewed-on: https://go-review.googlesource.com/13168
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-05 17:28:56 +00:00
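
A hedged sketch of the shape of the fix; the names and signature are assumptions, not the actual runtime API.

	package sketch

	var heapLive, nextGC uint64 // stand-ins for the runtime's pacing state

	// startGC rechecks the heap trigger only for heap-triggered cycles;
	// timer-triggered (periodic) GC would always fail that check.
	func startGC(forceTrigger bool) bool {
		if !forceTrigger && heapLive < nextGC {
			return false // lost the race: another goroutine already ran GC
		}
		return true // proceed with the cycle
	}
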
Russ Cox
2a60d77059 runtime: align stack pointer during initcgo call on arm
This is what is causing freebsd/arm to crash mysteriously when using cgo.
The bug was introduced in golang.org/cl/4030, which moved this code out
of rt0_go and into its own function. The ARM ABI says that calls must
be made with the stack pointer at an 8-byte boundary, but only FreeBSD
seems to crash when this is violated.

Fixes #10119.

Change-Id: Ibdbe76b2c7b80943ab66b8abbb38b47acb70b1e5
Reviewed-on: https://go-review.googlesource.com/13161
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-08-05 05:31:34 +00:00
Austin Clements
be39a42920 runtime: fix typos in comments
Change-Id: I66f7937b22bb6e05c3f2f0f2a057151020ad9699
Reviewed-on: https://go-review.googlesource.com/13049
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:56 +00:00
Austin Clements
e3870aa6f3 runtime: fix assist utilization computation
When commit 510fd13 enabled assists during the scan phase, it failed
to also update the code in the GC controller that computed the assist
CPU utilization and adjusted the trigger based on it. Fix that code so
it uses the start of the scan phase as the wall-clock time when
assists were enabled rather than the start of the mark phase.

Change-Id: I05013734b4448c3e2c730dc7b0b5ee28c86ed8cf
Reviewed-on: https://go-review.googlesource.com/13048
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:53 +00:00
Austin Clements
1fb01a88f9 runtime: revise assist ratio aggressively
At the start of a GC cycle, the garbage collector computes the assist
ratio based on the total scannable heap size. This was intended to be
conservative; after all, this assumes the entire heap may be reachable
and hence needs to be scanned. But it only assumes that the *current*
entire heap may be reachable. It fails to account for heap allocated
during the GC cycle. If the trigger ratio is very low (near zero), and
most of the heap is reachable when GC starts (which is likely if the
trigger ratio is near zero), then it's possible for the mutator to
create new, reachable heap fast enough that the assists won't keep up
based on the assist ratio computed at the beginning of the cycle. As a
result, the heap can grow beyond the heap goal (by hundreds of megs in
stress tests like in issue #11911).

We already have some vestigial logic for dealing with situations like
this; it just doesn't run often enough. Currently, every 10 ms during
the GC cycle, the GC revises the assist ratio. This was put in before
we switched to a conservative assist ratio (when we really were using
estimates of scannable heap), and it turns out to be exactly what we
need now. However, every 10 ms is far too infrequent for a rapidly
allocating mutator.

This commit reuses this logic, but replaces the 10 ms timer with
revising the assist ratio every time the heap is locked, which
coincides precisely with when the statistics used to compute the
assist ratio are updated.

Fixes #11911.

Change-Id: I377b231ab064946228378fa10422a46d1b50f4c5
Reviewed-on: https://go-review.googlesource.com/13047
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:48 +00:00
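
A hedged sketch of the revised computation, with invented names: the assist ratio is the scan work still expected divided by the heap headroom left before the goal, so it rises as allocation outpaces scanning.

	package sketch

	func assistRatio(scanWorkRemaining, heapGoal, heapLive int64) float64 {
		headroom := heapGoal - heapLive
		if headroom <= 0 {
			headroom = 1 // at or past the goal: assist as hard as possible
		}
		return float64(scanWorkRemaining) / float64(headroom)
	}
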
Austin Clements
f9dc3382ad runtime: when gcpacertrace > 0, print information about assist ratio
This was useful in debugging the mutator assist behavior for #11911,
and it fits with the other gcpacertrace output.

Change-Id: I1e25590bb4098223a160de796578bd11086309c7
Reviewed-on: https://go-review.googlesource.com/13046
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:46 +00:00
Austin Clements
fc9ca85f4c runtime: make sweep proportional to spans bytes allocated
Proportional concurrent sweep is currently based on a ratio of spans
to be swept per bytes of object allocation. However, proportional
sweeping is performed during span allocation, not object allocation,
in order to minimize contention and overhead. Since objects are
allocated from spans after those spans are allocated, the system tends
to operate in debt, which means when the next GC cycle starts, there
is often sweep debt remaining, so GC has to finish the sweep, which
delays the start of the cycle and delays enabling mutator assists.

For example, it's quite likely that many Ps will simultaneously refill
their span caches immediately after a GC cycle (because GC flushes the
span caches), but at this point, there has been very little object
allocation since the end of GC, so very little sweeping is done. The
Ps then allocate objects from these cached spans, which drives up the
bytes of object allocation, but since these allocations are coming
from cached spans, nothing considers whether more sweeping has to
happen. If the sweep ratio is high enough (which can happen if the
next GC trigger is very close to the retained heap size), this can
easily represent a sweep debt of thousands of pages.

Fix this by making proportional sweep proportional to the number of
bytes of spans allocated, rather than the number of bytes of objects
allocated. Prior to allocating a span, both the small object path and
the large object path ensure credit for allocating that span, so the
system operates in the black, rather than in the red.

Combined with the previous commit, this should eliminate all sweeping
from GC start up. On the stress test in issue #11911, this reduces the
time spent sweeping during GC (and delaying start up) by several
orders of magnitude:

                mean    99%ile     max
    pre fix      1 ms    11 ms   144 ms
    post fix   270 ns   735 ns   916 ns

Updates #11911.

Change-Id: I89223712883954c9d6ec2a7a51ecb97172097df3
Reviewed-on: https://go-review.googlesource.com/13044
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:44 +00:00
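
A hedged sketch of span-proportional sweep credit, with illustrative names: before a span is handed out, enough pages are swept that sweeping keeps pace with span allocation, so the system stays in the black.

	package sketch

	var (
		sweepPagesPerByte float64 // ratio fixed at the end of each GC cycle
		pagesSwept        int64
		spanBytesAlloc    int64
	)

	// deductSweepCredit is called before allocating a span of spanBytes.
	// sweepOne sweeps one page and reports whether any work remained.
	func deductSweepCredit(spanBytes int64, sweepOne func() bool) {
		spanBytesAlloc += spanBytes
		target := int64(sweepPagesPerByte * float64(spanBytesAlloc))
		for pagesSwept < target && sweepOne() {
			pagesSwept++
		}
	}
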
Austin Clements
e30c6d64ba runtime: always give concurrent sweep some heap distance
Currently it's possible for the next_gc heap size trigger computed for
the next GC cycle to be less than the current allocated heap size.
This means the next cycle will start immediately, which means there's
no time to perform the concurrent sweep between GC cycles. This places
responsibility for finishing the sweep on GC itself, which delays GC
start-up and hence delays mutator assist.

Fix this by ensuring that next_gc is always at least a little higher
than the allocated heap size, so we won't trigger the next cycle
instantly.

Updates #11911.

Change-Id: I74f0b887bf187518d5fedffc7989817cbcf30592
Reviewed-on: https://go-review.googlesource.com/13043
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:41 +00:00
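
A hedged sketch of the guard; the 1 MB floor is an assumption for illustration, not the value chosen by the change above.

	package sketch

	// nextTrigger keeps the next GC trigger strictly above the live heap
	// so the concurrent sweep always has room to run between cycles.
	func nextTrigger(computed, heapLive uint64) uint64 {
		const minDistance = 1 << 20 // illustrative floor
		if computed < heapLive+minDistance {
			computed = heapLive + minDistance
		}
		return computed
	}
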
Austin Clements
fb5230af8a runtime: assist the GC during GC startup and shutdown
Currently there are two sensitive periods during which a mutator can
allocate past the heap goal but mutator assists can't be enabled: 1)
at the beginning of GC between when the heap first passes the heap
trigger and sweep termination and 2) at the end of GC between mark
termination and when the background GC goroutine parks. During these
periods there's no back-pressure or safety net, so a rapidly
allocating mutator can allocate past the heap goal. This is
exacerbated if there are many goroutines because the GC coordinator is
scheduled as any other goroutine, so if it gets preempted during one
of these periods, it may stay preempted for a long period (10s or 100s
of milliseconds).

Normally the mutator does scan work to create back-pressure against
allocation, but there is no scan work during these periods. Hence, as
a fall back, if a mutator would assist but can't yet, simply yield the
CPU. This delays the mutator somewhat, but more importantly gives more
CPU time to the GC coordinator for it to complete the transition.

This is obviously a workaround. Issue #11970 suggests a far better but
far more invasive way to fix this.

Updates #11911. (This very nearly fixes the issue, but about once
every 15 minutes I get a GC cycle where the assists are enabled but
don't do enough work.)

Change-Id: I9768b79e3778abd3e06d306596c3bd77f65bf3f1
Reviewed-on: https://go-review.googlesource.com/13026
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-08-04 18:54:38 +00:00
Austin Clements
88e945fd23 runtime: recheck GC trigger before actually starting GC
Currently allocation checks the GC trigger speculatively during
allocation and then triggers the GC without rechecking. As a result,
it's possible for G 1 and G 2 to detect the trigger simultaneously,
both enter startGC, G 1 actually starts GC while G 2 gets preempted
until after the whole GC cycle, then G 2 immediately starts another GC
cycle even though the heap is now well under the trigger.

Fix this by re-checking the GC trigger non-speculatively just before
actually kicking off a new GC cycle.

This contributes to #11911 because when this happens, we definitely
don't finish the background sweep before starting the next GC cycle,
which can significantly delay the start of concurrent scan.

Change-Id: I560ab79ba5684ba435084410a9765d28f5745976
Reviewed-on: https://go-review.googlesource.com/13025
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-08-04 18:54:32 +00:00
Mikio Hara
5e15e28e0e runtime: skip TestCgoCallbackGC on dragonfly
Updates #11990.

Change-Id: I6c58923a1b5a3805acfb6e333e3c9e87f4edf4ba
Reviewed-on: https://go-review.googlesource.com/13050
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-03 04:41:48 +00:00
Russ Cox
c5dff7282e cmd/compile, runtime: fix placement of map bucket overflow pointer on nacl
On most systems, a pointer is the worst case alignment, so adding
a pointer field at the end of a struct guarantees there will be no
padding added after that field (to satisfy overall struct alignment
due to some more-aligned field also present).

In the runtime, the map implementation needs a quick way to
get to the overflow pointer, which is last in the bucket struct,
so it uses size - sizeof(pointer) as the offset.

NaCl/amd64p32 is the exception, as always.
The worst case alignment is 64 bits but pointers are 32 bits.
There's a long history that is not worth going into, but when
we moved the overflow pointer to the end of the struct,
we didn't get the padding computation right.
The compiler computed the regular struct size and then
on amd64p32 added another 32-bit field.
And the runtime assumed it could step back two 32-bit fields
(one 64-bit register size) to get to the overflow pointer.
But in fact if the struct needed 64-bit alignment, the computation
of the regular struct size would have added a 32-bit pad already,
and then the code unconditionally added a second 32-bit pad.
This placed the overflow pointer three words from the end, not two.
The last two were padding, and since the runtime was consistent
about using the second-to-last word as the overflow pointer,
no harm done in the sense of overwriting useful memory.
But writing the overflow pointer to a non-pointer word of memory
means that the GC can't see the overflow blocks, so it will
collect them prematurely. Then bad things happen.

Correct all this in a few steps:

1. Add an explicit check at the end of the bucket layout in the
compiler that the overflow field is last in the struct, never
followed by padding.

2. When padding is needed on nacl (not always, just when needed),
insert it before the overflow pointer, to preserve the "last in the struct"
property.

3. Let the compiler have the final word on the width of the struct,
by inserting an explicit padding field instead of overwriting the
results of the width computation it does.

4. For the same reason (tell the truth to the compiler), set the type
of the overflow field when we're trying to pretend it's not a pointer
(in this case the runtime maintains a list of the overflow blocks
elsewhere).

5. Make the runtime use "last in the struct" as its location algorithm.

This fixes TestTraceStress on nacl/amd64p32.
The 'bad map state' and 'invalid free list' failures no longer occur.

Fixes #11838.

Change-Id: If918887f8f252d988db0a35159944d2b36512f92
Reviewed-on: https://go-review.googlesource.com/12971
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-07-31 18:49:32 +00:00
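
A hedged sketch of step 5, the "last in the struct" location rule. The bmap type here is illustrative; real buckets are laid out by the compiler.

	package sketch

	import "unsafe"

	type bmap struct {
		tophash [8]uint8
		// keys and values, sized per map type, would be here
		overflow *bmap // must be last, with no padding after it
	}

	// overflowPtr finds the overflow pointer from the bucket size alone:
	// it is always the last pointer-sized word of the bucket.
	func overflowPtr(b unsafe.Pointer, bucketSize uintptr) **bmap {
		return (**bmap)(unsafe.Pointer(uintptr(b) + bucketSize - unsafe.Sizeof(uintptr(0))))
	}
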
Russ Cox
108ec5f75a runtime: fix systemstack tracebacks on nacl/arm
For #11956.

Change-Id: Ic9b57cafa197953cc7f435941e44d42b60b3ddf0
Reviewed-on: https://go-review.googlesource.com/13011
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-07-31 04:35:38 +00:00
Russ Cox
abdc77a288 runtime: avoid reference to stale stack after GC shrinkstack
Dangling pointer error. Unlikely to trigger in practice, but still.
Found by running GODEBUG=efence=1 GOGC=1 trace.test.

Change-Id: Ice474dedcf62dd33ab77526287a023ba3b166db9
Reviewed-on: https://go-review.googlesource.com/12991
Reviewed-by: Austin Clements <austin@google.com>
2015-07-31 02:18:42 +00:00
Russ Cox
4bd8040d47 runtime, sync/atomic: add memory barriers in arm cas routines
This only triggers on ARMv7+.
If there are important SMP ARMv6 machines we can reconsider.

Makes TestLFStress tests pass and sync/atomic tests not time out
on Apple iPad Mini 3.

Fixes #7977.
Fixes #10189.

Change-Id: Ie424dea3765176a377d39746be9aa8265d11bec4
Reviewed-on: https://go-review.googlesource.com/12950
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-30 20:11:11 +00:00
Russ Cox
e0c180c44f runtime/cgo: fix darwin/amd64 signal handling setup
Was not allocating space for the frame above sigpanic,
nor was it pushing the LR into the right place.
Because traceback past sigpanic only needs the
LR for faulting leaves, this was not noticed too much.
But it did break the sync/atomic nil deref tests.

Change-Id: Icba53fffa193423aab744c37f21ee893ce2ee3ac
Reviewed-on: https://go-review.googlesource.com/12926
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-30 19:18:45 +00:00
Russ Cox
b2dfacf35e runtime: change arm software div/mod call sequence not to modify stack
Instead of pushing the denominator argument on the stack,
the denominator is now passed in m.

This fixes a variety of bugs related to trying to take stack traces
backwards from the middle of the software div/mod routines.
Some of those bugs have been kludged around in the past,
but others have not. Instead of trying to patch up after breaking
the stack, this CL stops breaking the stack.

This is an update of https://golang.org/cl/19810043,
which was rolled back in https://golang.org/cl/20350043.

The problem in the original CL was that there were divisions
at bad times, when m was not available. These were divisions
by constant denominators, either in C code or in assembly.
The Go compiler knows how to generate division by multiplication
for constant denominators, but the C compiler did not.
There is no longer any C code, so that's taken care of.
There was one problematic DIV in runtime.usleep (assembly)
but https://golang.org/cl/12898 took care of that one.
So now this approach is safe.

Reject DIV/MOD in NOSPLIT functions to keep them from
coming back.

Fixes #6681.
Fixes #6699.
Fixes #10486.

Change-Id: I09a13c76ad08ba75b3bd5d46a3eb78e66a84ab38
Reviewed-on: https://go-review.googlesource.com/12899
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-30 16:14:05 +00:00
Russ Cox
c9d2c7f0d2 runtime: replace divide with multiply in runtime.usleep on arm
We want to adjust the DIV calling convention to use m,
and usleep can be called without an m, so switch to a
multiplication by the reciprocal (and test).

Step toward a fix for #6699 and #10486.

Change-Id: Iccf76a18432d835e48ec64a2fa34a0e4d6d4b955
Reviewed-on: https://go-review.googlesource.com/12898
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-30 15:48:29 +00:00
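
A hedged Go sketch of the technique; the runtime does this in ARM assembly, and the constant below is one standard reciprocal for dividing a 32-bit value by 10^6.

	package sketch

	// usplit returns usec/1e6 and usec%1e6 without a divide instruction:
	// 1125899907 = ceil(2^50 / 1e6), so the multiply-and-shift is exact
	// for every uint32 input.
	func usplit(usec uint32) (sec, rem uint32) {
		sec = uint32(uint64(usec) * 1125899907 >> 50)
		rem = usec - sec*1000000
		return
	}
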
David Crawshaw
b7205b92c0 runtime/trace: test requires 'go tool addr2line'
For the android/arm builder.

Change-Id: Iad4881689223cd6479870da9541524a8cc458cce
Reviewed-on: https://go-review.googlesource.com/12859
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
2015-07-30 05:57:37 +00:00
Russ Cox
c4092ac398 cmd/compile: fix uninitialized memory during type switch assertE2I2
Fixes arm64 builder crash.

The bug is possible on all architectures; you just have to get lucky
and hit a preemption or a stack growth on entry to assertE2I2.
The test stacks the deck.

Change-Id: I8419da909b06249b1ad15830cbb64e386b6aa5f6
Reviewed-on: https://go-review.googlesource.com/12890
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-30 05:21:56 +00:00
Russ Cox
bfac8623d5 runtime: enable TestEmptySlice
It says to disable until #7564 is fixed. It was fixed in April 2014.

Change-Id: I9bebfe96802bafdd2d1a0a47591df346d91b000c
Reviewed-on: https://go-review.googlesource.com/12858
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-30 04:47:16 +00:00
Russ Cox
d3ffc975f3 runtime: set invalidptr=1 by default, as documented
Also make invalidptr control the recently added GC pointer check,
as documented.

Change-Id: Iccfdf49480219d12be8b33b8f03d8312d8ceabed
Reviewed-on: https://go-review.googlesource.com/12857
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-29 23:50:20 +00:00
Russ Cox
bd5ca22232 runtime/trace: remove existing Skips
The skips added in CL 12579, based on incorrect time stamps,
should be sufficient to identify and exclude all the time-related
flakiness on these systems.

If there is other flakiness, we want to find out.

For #10512.

Change-Id: I5b588ac1585b2e9d1d18143520d2d51686b563e3
Reviewed-on: https://go-review.googlesource.com/12746
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 22:32:23 +00:00
Russ Cox
80c98fa901 runtime/trace: record event sequence numbers explicitly
Nearly all the flaky failures we've seen in trace tests have been
due to the use of time stamps to determine relative event ordering.
This is tricky for many reasons, including:
 - different cores might not have exactly synchronized clocks
 - VMs are worse than real hardware
 - non-x86 chips have different timer resolution than x86 chips
 - on fast systems two events can end up with the same time stamp

Stop trying to make time reliable. It's clearly not going to be for Go 1.5.
Instead, record an explicit event sequence number for ordering.
Using our own counter solves all of the above problems.

The trace still contains time stamps, of course. The sequence number
is just used for ordering.

Should alleviate #10554 somewhat. Then tickDiv can be chosen to
be a useful time unit instead of having to be exact for ordering.

Separating ordering and time stamps lets the trace parser diagnose
systems where the time stamp order and actual order do not match
for one reason or another. This CL adds that check to the end of
trace.Parse, after all other sequence order-based checking.
If that error is found, we skip the test instead of failing it.
Putting the check in trace.Parse means that cmd/trace will pick
up the same check, refusing to display a trace where the time stamps
do not match actual ordering.

Using net/http's BenchmarkClientServerParallel4 on various CPU counts,
not tracing vs tracing:

name                      old time/op    new time/op    delta
ClientServerParallel4       50.4µs ± 4%    80.2µs ± 4%  +59.06%        (p=0.000 n=10+10)
ClientServerParallel4-2     33.1µs ± 7%    57.8µs ± 5%  +74.53%        (p=0.000 n=10+10)
ClientServerParallel4-4     18.5µs ± 4%    32.6µs ± 3%  +75.77%        (p=0.000 n=10+10)
ClientServerParallel4-6     12.9µs ± 5%    24.4µs ± 2%  +89.33%        (p=0.000 n=10+10)
ClientServerParallel4-8     11.4µs ± 6%    21.0µs ± 3%  +83.40%        (p=0.000 n=10+10)
ClientServerParallel4-12    14.4µs ± 4%    23.8µs ± 4%  +65.67%        (p=0.000 n=10+10)

Fixes #10512.

Change-Id: I173eecf8191e86feefd728a5aad25bf1bc094b12
Reviewed-on: https://go-review.googlesource.com/12579
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 22:32:14 +00:00
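
A hedged sketch of explicit sequencing, with invented types: a global counter gives every event a total order that no clock can scramble.

	package sketch

	import "sync/atomic"

	var traceSeq uint64

	type traceEvent struct {
		seq uint64 // authoritative ordering
		ts  int64  // still recorded, but informational only
	}

	func newEvent(ts int64) traceEvent {
		return traceEvent{seq: atomic.AddUint64(&traceSeq, 1), ts: ts}
	}
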
Russ Cox
fde392623a runtime: ignore arguments in cgocallback_gofunc frame
Otherwise the GC may see uninitialized memory there,
which might be old pointers that are retained, or it might
trigger the invalid pointer check.

Fixes #11907.

Change-Id: I67e306384a68468eef45da1a8eb5c9df216a77c0
Reviewed-on: https://go-review.googlesource.com/12852
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 22:30:46 +00:00
Russ Cox
f6dfe16798 runtime: fix darwin/amd64 assembly frame sizes
Change-Id: I2f0ecdc02ce275feadf07e402b54f988513e9b49
Reviewed-on: https://go-review.googlesource.com/12855
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-29 22:26:02 +00:00
Russ Cox
4addec3aaa runtime: reenable bad pointer check in GC
The last time we tried this, linux/arm64 broke.
The series of CLs leading to this one fixes that problem.
Let's try again.

Fixes #9880.

Change-Id: I67bc1d959175ec972d4dcbe4aa6f153790f74251
Reviewed-on: https://go-review.googlesource.com/12849
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 21:37:55 +00:00
Russ Cox
421220571d runtime, reflect: use correctly aligned stack frame sizes on arm64
arm64 requires either no stack frame or a frame with a size that is 8 mod 16
(adding the saved LR will make it 16-aligned).

The cmd/internal/obj/arm64 package has been silently aligning frames, but it led to
a terrible bug when the compiler and obj disagreed on the frame size,
and it's just generally confusing, so we're going to make misaligned frames
an error instead of something that is silently changed.

This CL prepares by updating assembly files.
Note that the changes in this CL are already being done silently by
cmd/internal/obj/arm64, so there is no semantic effect here,
just a clarity effect.

For #9880.

Change-Id: Ibd6928dc5fdcd896c2bacd0291bf26b364591e28
Reviewed-on: https://go-review.googlesource.com/12845
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 21:35:35 +00:00
Austin Clements
23e4744c07 runtime: report GC CPU utilization in MemStats
This adds a GCCPUFraction field to MemStats that reports the
cumulative fraction of the program's execution time spent in the
garbage collector. This is equivalent to the utilization percent shown
in the gctrace output and makes this available programmatically.

This does make one small effect on the gctrace output: we now report
the duration of mark termination up to just before the final
start-the-world, rather than up to just after. However, unlike
stop-the-world, I don't believe there's any way that start-the-world
can block, so it should take negligible time.

While there are many statistics one might want to expose via MemStats,
this is one of the few that will undoubtedly remain meaningful
regardless of future changes to the memory system.

The diff for this change is larger than the actual change. Mostly it
lifts the code for computing the GC CPU utilization out of the
debug.gctrace path.

Updates #10323.

Change-Id: I0f7dc3fdcafe95e8d1233ceb79de606b48acd989
Reviewed-on: https://go-review.googlesource.com/12844
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-29 20:23:34 +00:00
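
Since the field is part of the public MemStats API described above, a short usage example:

	package main

	import (
		"fmt"
		"runtime"
	)

	func main() {
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		// Cumulative fraction of this program's CPU time spent in GC,
		// the same utilization number gctrace prints.
		fmt.Printf("GC CPU fraction: %.4f\n", ms.GCCPUFraction)
	}
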
Austin Clements
4b71660c5b runtime: always capture GC phase transition times
Currently we only capture GC phase transition times if
debug.gctrace>0, but we're about to compute GC CPU utilization
regardless of whether debug.gctrace is set, so we need these
regardless of debug.gctrace.

Change-Id: If3acf16505a43d416e9a99753206f03287180660
Reviewed-on: https://go-review.googlesource.com/12843
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-07-29 20:23:25 +00:00
Austin Clements
87f97c73d3 runtime: avoid race between SIGPROF traceback and stack barriers
The following sequence of events can lead to the runtime attempting an
out-of-bounds access on a stack barrier slice:

1. A SIGPROF comes in on a thread while the G on that thread is in
   _Gsyscall. The sigprof handler calls gentraceback, which saves a
   local copy of the G's stkbar slice. Currently the G has no stack
   barriers, so this slice is empty.

2. On another thread, the GC concurrently scans the stack of the
   goroutine being profiled (it considers it stopped because it's in
   _Gsyscall) and installs stack barriers.

3. Back on the sigprof thread, gentraceback comes across a stack
   barrier in the stack and attempts to look it up in its (zero
   length) copy of G's old stkbar slice, which causes an out-of-bounds
   access.

This commit fixes this by adding a simple cas spin to synchronize the
SIGPROF handler with stack barrier insertion.

In general I would prefer that this synchronization be done through
the G status, since that's how stack scans are otherwise synchronized,
but adding a new lock is a much smaller change and G statuses are full
of subtlety.

Fixes #11863.

Change-Id: Ie89614a6238bb9c6a5b1190499b0b48ec759eaf7
Reviewed-on: https://go-review.googlesource.com/12748
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-29 19:31:46 +00:00
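
A hedged sketch of the added synchronization; the lock word and osyield are stand-ins for runtime internals.

	package sketch

	import "sync/atomic"

	// Both the SIGPROF traceback and stack-barrier installation take this
	// lock before touching the g's stkbar slice.
	func stackLock(l *uint32, osyield func()) {
		for !atomic.CompareAndSwapUint32(l, 0, 1) {
			osyield() // brief spin; the holder finishes quickly
		}
	}

	func stackUnlock(l *uint32) {
		atomic.StoreUint32(l, 0)
	}
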
Rick Hudson
e95bc5fef7 runtime: force mutator to give work buffer to GC
The scheduler, the work buffer's dispose, and write barriers
can conspire to hide a pointer from the GC's concurrent
mark phase. If this pointer is the only path to a large
amount of marking, the STW mark termination phase may take
a lot of time.

Consider the following:
1) dispose places a work buffer on the partial queue
2) the GC is busy so it does not immediately remove and
   process the work buffer
3) the scheduler runs a mutator whose write barrier dequeues the
   work buffer from the partial queue so the GC won't see it
This repeats until the GC reaches the mark termination
phase where the GC finally discovers the pointer along
with a lot of work to do.

This CL fixes the problem by having the mutator
dispose of the buffer to the full queue instead of
the partial queue. Since the write barrier never asks for full
buffers, the conspiracy described above is not possible.

Updates #11694.

Change-Id: I2ce832f9657a7570f800e8ce4459cd9e304ef43b
Reviewed-on: https://go-review.googlesource.com/12840
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 18:56:11 +00:00
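
A hedged sketch of the one-line idea, with invented queue types: dispose to the queue that write barriers never pull from.

	package sketch

	type workbuf struct{ obj []uintptr }

	var fullQueue, partialQueue []*workbuf // stand-ins for lock-free lists

	func dispose(b *workbuf) {
		// Previously: partialQueue = append(partialQueue, b), which let a
		// mutator's write barrier dequeue the buffer before the GC saw it.
		fullQueue = append(fullQueue, b)
	}
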
Dmitry Vyukov
0c22a74e85 runtime: fix out-of-bounds in stack debugging
Currently stackDebug=4 crashes as:

panic: runtime error: index out of range
fatal error: panic on system stack
runtime stack:
runtime.throw(0x607470, 0x15)
	src/runtime/panic.go:527 +0x96
runtime.gopanic(0x5ada00, 0xc82000a1d0)
	src/runtime/panic.go:354 +0xb9
runtime.panicindex()
	src/runtime/panic.go:12 +0x49
runtime.adjustpointers(0xc820065ac8, 0x7ffe58b56100, 0x7ffe58b56318, 0x0)
	src/runtime/stack1.go:428 +0x5fb
runtime.adjustframe(0x7ffe58b56200, 0x7ffe58b56318, 0x1)
	src/runtime/stack1.go:542 +0x780
runtime.gentraceback(0x487760, 0xc820065ac0, 0x0, 0xc820001080, 0x0, 0x0, 0x7fffffff, 0x6341b8, 0x7ffe58b56318, 0x0, ...)
	src/runtime/traceback.go:336 +0xa7e
runtime.copystack(0xc820001080, 0x1000)
	src/runtime/stack1.go:616 +0x3b1
runtime.newstack()
	src/runtime/stack1.go:801 +0xdde

Change-Id: If2d60960231480a9dbe545d87385fe650d6db808
Reviewed-on: https://go-review.googlesource.com/12763
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-28 20:11:19 +00:00
Russ Cox
7a63ab1a65 runtime: use 64k page rounding on arm64
Fixes #11886.

Change-Id: I9392fd2ef5951173ae275b3ab42db4f8bd2e1d7a
Reviewed-on: https://go-review.googlesource.com/12747
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-28 19:59:00 +00:00
David du Colombier
68117a91ae runtime: fix x86 stack trace for call to heap memory on Plan 9
Russ Cox fixed this issue for other systems
in CL 12026, but the Plan 9 part was forgotten.

Fixes #11656.

Change-Id: I91c033687987ba43d13ad8f42e3fe4c7a78e6075
Reviewed-on: https://go-review.googlesource.com/12762
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-28 19:01:41 +00:00
Ian Lance Taylor
0229317d76 runtime: don't define libc_getpid in os3_solaris.go
The function is already defined between syscall_solaris.go and
syscall2_solaris.go.

Change-Id: I034baf7c8531566bebfdbc5a4061352cbcc31449
Reviewed-on: https://go-review.googlesource.com/12773
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-28 14:07:17 +00:00
Ian Lance Taylor
deaf0333df runtime: fix definitions of getpid and kill on Solaris
A further attempt to fix raiseproc on Solaris.

Change-Id: I8d8000d6ccd0cd9f029ebe1f211b76ecee230cd0
Reviewed-on: https://go-review.googlesource.com/12771
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-28 06:21:08 +00:00
Ian Lance Taylor
d7223c6cc1 runtime: correct implementation of raiseproc on Solaris
I forgot that the libc raise function only sends the signal to the
current thread.  We need to actually use kill and getpid here, as we
do on other systems.

Change-Id: Iac34af822c93468bf68cab8879db3ee20891caaf
Reviewed-on: https://go-review.googlesource.com/12704
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-28 05:41:27 +00:00
David Crawshaw
249894ab6c runtime/cgo: remove TMPDIR logic for iOS
Seems like the simplest solution for 1.5. All the parts of the test
suite I can run on my current device (for which my exception handler
fix no longer works, apparently) pass without this code. I'll move it
into x/mobile/app.

Fixes #11884

Change-Id: I2da40c8c7b48a4c6970c4d709dd7c148a22e8727
Reviewed-on: https://go-review.googlesource.com/12721
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-27 21:28:31 +00:00
Austin Clements
c1f7a56fc0 runtime: close window that hides GC work from concurrent mark
Currently we enter mark 2 by first flushing all existing gcWork caches
and then setting gcBlackenPromptly, which disables further gcWork
caching. However, if a worker or assist pulls a work buffer in to its
gcWork cache after that cache has been flushed but before caching is
disabled, that work may remain in that cache until mark termination.
If that work represents a heap bottleneck (e.g., a single pointer that
is the only way to reach a large amount of the heap), this can force
mark termination to do a large amount of work, resulting in a long
STW.

Fix this by reversing the order of these steps: first disable caching,
then flush all existing caches.

Rick Hudson <rlh> did the hard work of tracking this down. This CL
combined with CL 12672 and CL 12646 distills the critical parts of his
fix from CL 12539.

Fixes #11694.

Change-Id: Ib10d0a21e3f6170a80727d0286f9990df049fed2
Reviewed-on: https://go-review.googlesource.com/12688
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-07-27 20:00:25 +00:00
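
A hedged sketch of the reordering. gcBlackenPromptly is the flag named above; the flush helper is invented.

	package sketch

	var gcBlackenPromptly bool

	func flushAllGCWorkCaches() { /* flush each P's cached work buffer */ }

	// enterMark2 disables caching before flushing, closing the window in
	// which a worker could pull a buffer into an already-flushed cache.
	func enterMark2() {
		gcBlackenPromptly = true // 1. no new caching from here on
		flushAllGCWorkCaches()   // 2. then flush whatever is cached
	}
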
Austin Clements
510fd1350d runtime: enable GC assists ASAP
Currently the GC coordinator enables GC assists at the same time it
enables background mark workers, after the concurrent scan phase is
done. However, this means a rapidly allocating mutator has the entire
scan phase during which to allocate beyond the heap trigger and
potentially beyond the heap goal with no back-pressure from assists.
This prevents the feedback system that's supposed to keep the heap
size under the heap goal from doing its job.

Fix this by enabling mutator assists during the scan phase. This is
safe because the write barrier is already enabled and globally
acknowledged at this point.

There's still a very small window between when the heap size reaches
the heap trigger and when the GC coordinator is able to stop the world
during which the mutator can allocate unabated. This allows *very*
rapidly allocating mutators like TestTraceStress to still occasionally
exceed the heap goal by a small amount (~20 MB at most for
TestTraceStress). However, this seems like a corner case.

Fixes #11677.

Change-Id: I0f80d949ec82341cd31ca1604a626efb7295a819
Reviewed-on: https://go-review.googlesource.com/12674
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:05 +00:00
Austin Clements
f5e67e53e7 runtime: allow GC drain whenever write barrier is enabled
Currently we hand-code a set of phases when draining is allowed.
However, this set of phases is conservative. The critical invariant is
simply that the write barrier must be enabled if we're draining.

Shortly we're going to enable mutator assists during the scan phase,
which means we may drain during the scan phase. In preparation, this
commit generalizes these assertions to check the fundamental condition
that the write barrier is enabled, rather than checking that we're in
any particular phase.

Change-Id: I0e1bec1ca823d4a697a0831ec4c50f5dd3f2a893
Reviewed-on: https://go-review.googlesource.com/12673
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:04 +00:00
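
A hedged sketch of the generalized assertion; the flag is a stand-in for the runtime's own, and the rest is schematic.

	package sketch

	var writeBarrierEnabled bool // stand-in for the runtime's flag

	func gcDrain() {
		// The fundamental invariant, checked directly instead of
		// enumerating the phases in which draining happens to occur:
		if !writeBarrierEnabled {
			panic("gcDrain while write barrier is off")
		}
		// ... drain and scan work buffers ...
	}
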
Austin Clements
64a32ffeee runtime: don't start workers between mark 1 & 2
Currently we clear both the mark 1 and mark 2 signals at the beginning
of concurrent mark. If either of these is clear, it acts as a signal
to the scheduler that it should start background workers. However,
this means that in the interim *between* mark 1 and mark 2, the
scheduler basically loops starting up new workers only to have them
return with nothing to do. In addition to harming performance and
delaying mutator work, this approach has a race where workers started
for mark 1 can mistakenly signal mark 2, causing it to complete
prematurely. This approach also interferes with starting assists
earlier to fix #11677.

Fix this by initially setting both mark 1 and mark 2 to "signaled".
The scheduler will not start background mark workers, though assists
can still run. When we're ready to enter mark 1, we clear the mark 1
signal and wait for it. Then, when we're ready to enter mark 2, we
clear the mark 2 signal and wait for it.

This structure also lets us deal cleanly with the situation where all
work is drained *prior* to the mark 2 wait, meaning that there may be
no workers to signal completion. Currently we deal with this using a
racy (and possibly incorrect) check for work in the coordinator itself
to skip the mark 2 wait if there's no work. This change makes the
coordinator unconditionally wait for mark completion and makes the
scheduler itself signal completion by slightly extending the logic it
already has to determine that there's no work and hence no use in
starting a new worker.

This is a prerequisite to fixing the remaining component of #11677,
which will require enabling assists during the scan phase. However, we
don't want to enable background workers until the mark phase because
they will compete with the scan. This change lets us use bgMark1 and
bgMark2 to indicate when it's okay to start background workers
independent of assists.

This is also a prerequisite to fixing #11694. It significantly reduces
the occurrence of long mark termination pauses in #11694 (from 64 out
of 1000 to 2 out of 1000 in one experiment).

Coincidentally, this also reduces the final heap size (and hence run
time) of TestTraceStress from ~100 MB and ~1.9 seconds to ~14 MB and
~0.4 seconds because it significantly shortens concurrent mark
duration.

Rick Hudson <rlh> did the hard work of tracking this down.

Change-Id: I12ea9ee2db9a0ae9d3a90dde4944a75fcf408f4c
Reviewed-on: https://go-review.googlesource.com/12672
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:02 +00:00
Austin Clements
8f34b25318 runtime: retry GC assist until debt is paid off
Currently, there are three ways to satisfy a GC assist: 1) the mutator
steals credit from background GC, 2) the mutator actually does GC
work, and 3) there is no more work available. 3 was never really
intended as a way to satisfy an assist, and it causes problems: there
are periods when it's expected that the GC won't have any work, such
as when transitioning from mark 1 to mark 2 and from mark 2 to mark
termination. During these periods, there's no back-pressure on rapidly
allocating mutators, which lets them race ahead of the heap goal.

For example, test/init1.go and the runtime/trace test both have small
reachable heaps and contain loops that rapidly allocate large garbage
byte slices. This bug lets these tests exceed the heap goal by several
orders of magnitude.

Fix this by forcing the assist (and hence the allocation) to block
until it can satisfy its debt via either 1 or 2, or the GC cycle
terminates.

This fixes one of the causes of #11677. It's still possible to overshoot
the GC heap goal, but with this change the overshoot is almost exactly
by the amount of allocation that happens during the concurrent scan
phase, between when the heap passes the GC trigger and when the GC
enables assists.

Change-Id: I5ef4edcb0d2e13a1e432e66e8245f2bd9f8995be
Reviewed-on: https://go-review.googlesource.com/12671
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:01 +00:00
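
A hedged sketch of the retry loop; every name here is invented, and the real assist also interacts with the scheduler.

	package sketch

	// gcAssist keeps trying to pay the debt by stealing background credit
	// or doing scan work, blocking rather than returning when neither is
	// currently possible.
	func gcAssist(debt int64, steal, scan func(int64) int64, park func(), cycleDone func() bool) {
		for debt > 0 && !cycleDone() {
			debt -= steal(debt) // 1. take from background GC credit
			if debt <= 0 {
				return
			}
			if n := scan(debt); n > 0 { // 2. do GC work ourselves
				debt -= n
				continue
			}
			park() // 3. no work available: block until work appears or GC ends
		}
	}
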
Austin Clements
500c88d40d runtime: yield to GC coordinator after assist completion
Currently it's possible for the GC assist to signal completion of the
mark phase, which puts the GC coordinator goroutine on the current P's
run queue, and then return to mutator code that delays until the next
forced preemption before actually yielding control to the GC
coordinator, dragging out completion of the mark phase. This delay can
be further exacerbated if the mutator makes other goroutines runnable
before yielding control, since this will push the GC coordinator on
the back of the P's run queue.

To fix this, this adds a Gosched to the assist if it completed the
mark phase. This immediately and directly yields control to the GC
coordinator. This already happens implicitly in the background mark
workers because they park immediately after completing the mark.

This is one of the reasons completion of the mark phase is being
dragged out and allowing the mutator to allocate without assisting,
leading to the large heap goal overshoot in issue #11677. This is also
a prerequisite to making the assist block when it can't pay off its
debt.

Change-Id: I586adfbecb3ca042a37966752c1dc757f5c7fc78
Reviewed-on: https://go-review.googlesource.com/12670
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:00 +00:00
Austin Clements
4f188c2d1c runtime: disallow GC assists in non-preemptible contexts
Currently it's possible to perform GC work on a system stack or when
locks are held if there's an allocation that triggers an assist. This
is generally a bad idea because of the fragility of these contexts,
and it's incompatible with two changes we're about to make: one is to
yield after signaling mark completion (which we can't do from a
non-preemptible context) and the other is to make assists block if
there's no other way for them to pay off the assist debt.

This commit simply skips the assist if it's called from a
non-preemptible context. The allocation will still count toward the
assist debt, so it will be paid off by a later assist. There should be
little allocation from non-preemptible contexts, so this shouldn't
harm the overall assist mechanism.

Change-Id: I7bf0e6c73e659fe6b52f27437abf39d76b245c79
Reviewed-on: https://go-review.googlesource.com/12649
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:59 +00:00
Austin Clements
dff9108d98 runtime: make notetsleep_internal nowritebarrier
When notetsleep_internal is called from notetsleepg, notetsleepg has
just given up the P, so write barriers are not allowed in
notetsleep_internal.

Change-Id: I1b214fa388b1ea05b8ce2dcfe1c0074c0a3c8870
Reviewed-on: https://go-review.googlesource.com/12647
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:58 +00:00
Austin Clements
cf225a1748 runtime: fix mark 2 completion in fractional/idle workers
Currently fractional and idle mark workers dispose of their gcWork
cache during mark 2 after incrementing work.nwait and after checking
whether there are any workers or any work available. This creates a
window for two races:

1) If the only remaining work is in this worker's gcWork cache, it
   will see that there are no more workers and no more work on the
   global lists (since it has not yet flushed its own cache) and
   prematurely signal mark 2 completion.

2) After this worker has incremented work.nwait but before it has
   flushed its cache, another worker may observe that there are no
   more workers and no more work and prematurely signal mark 2
   completion.

We can fix both of these by simply moving the cache flush above the
increment of nwait and the test of the completion condition.

This is probably contributing to #11694, though this alone is not
enough to fix it.

Change-Id: Idcf9656e5c460c5ea0d23c19c6c51e951f7716c3
Reviewed-on: https://go-review.googlesource.com/12646
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:56 +00:00
Austin Clements
b8526a8380 runtime: steal the correct amount of GC assist credit
GC assists are supposed to steal at most the amount of background GC
credit available so that background GC credit doesn't go negative.
However, they are instead stealing the *total* amount of their debt
but only claiming up to the amount of credit that was available. This
results in draining the background GC credit pool too quickly, which
results in unnecessary assist work.

The fix is trivial: steal the amount of work we meant to steal (which
is already computed).
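
A simplified sketch of the accounting, with a plain atomic counter
standing in for the background credit pool (the real code differs):

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    var bgCredit int64 = 100 // stand-in for the background GC credit pool

    // stealCredit takes at most debt units from the pool and returns
    // how much was stolen. The bug was subtracting the entire debt
    // from the pool while only crediting min(debt, available).
    func stealCredit(debt int64) int64 {
        steal := debt
        if avail := atomic.LoadInt64(&bgCredit); steal > avail {
            steal = avail
        }
        atomic.AddInt64(&bgCredit, -steal) // remove what we steal, not the debt
        return steal
    }

    func main() {
        fmt.Println("stole:", stealCredit(250), "pool now:", atomic.LoadInt64(&bgCredit))
    }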

Change-Id: I837fe60ed515ba91c6baf363248069734a7895ef
Reviewed-on: https://go-review.googlesource.com/12643
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:54 +00:00
Austin Clements
4c9464525e runtime: document gctrace format
Fixes #10348.

Change-Id: I3eea9738e3f6fdc1998d04a601dc9b556dd2db72
Reviewed-on: https://go-review.googlesource.com/12453
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 17:45:34 +00:00
Austin Clements
7eeeae2a5c runtime: always report starting heap size in gctrace
Currently the gctrace output reports the trigger heap size, rather
than the actual heap size at the beginning of GC. Often these are the
same, or at least very close. However, it's possible for the heap to
already have exceeded this trigger when we first check the trigger and
start GC; in this case, this output is very misleading. We've
encountered this confusion a few times when debugging, and this
behavior is difficult to document succinctly.

Change the gctrace output to report the actual heap size when GC
starts, rather than the trigger.

Change-Id: I246b3ccae4c4c7ea44c012e70d24a46878d7601f
Reviewed-on: https://go-review.googlesource.com/12452
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 17:45:28 +00:00
Austin Clements
cc6ed285e5 runtime: remove # from gctrace line
Whenever someone pastes gctrace output into GitHub, it helpfully turns
the GC cycle number into a link to some unrelated issue. Prevent this
by removing the pound before the cycle number. The fact that this is a
cycle number is probably more obvious at a glance than most of the
other numbers.

Change-Id: Ifa5fc7fe6c715eac50e639f25bc36c81a132ffea
Reviewed-on: https://go-review.googlesource.com/12413
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 17:45:22 +00:00
Ian Lance Taylor
f0876a1a94 runtime: log all thread stack traces during GODEBUG=crash on Unix
This extends https://golang.org/cl/2811, which only applied to Darwin
and GNU/Linux, to all Unix systems.

Fixes #9591.

Change-Id: Iec3fb438564ba2924b15b447c0480f87c0bfd009
Reviewed-on: https://go-review.googlesource.com/12661
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 16:58:53 +00:00
Russ Cox
6b8762104a runtime/pprof: document content of heap profile
Fixes #11343.

Change-Id: I46efc24b687b9d060ad864fbb238c74544348e38
Reviewed-on: https://go-review.googlesource.com/12556
Reviewed-by: Rob Pike <r@golang.org>
2015-07-27 16:30:27 +00:00
Russ Cox
f6fb549d22 runtime/cgo: move TMPDIR magic out of os
It's not clear this really belongs anywhere at all,
but this is a better place for it than package os.
This way package os can avoid importing "C".

Fixes #10455.

Change-Id: Ibe321a93bf26f478951c3a067d75e22f3d967eb7
Reviewed-on: https://go-review.googlesource.com/12574
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-07-27 16:05:42 +00:00
Michael Hudson-Doyle
2b0ddb6c23 runtime: pass a smaller buffer to sched_getaffinity on ARM
The system stack is only around 8 KB on ARM, so one can't put an 8 KB
buffer on the stack. More than 1024 ARM cores seems sufficiently
unlikely for the foreseeable future.
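
A Linux-only sketch of the call with the smaller mask (the sizes are
illustrative, not the runtime's exact constants):

    package main

    import (
        "fmt"
        "syscall"
        "unsafe"
    )

    func main() {
        // A mask for 1024 CPUs needs only 1024/8 = 128 bytes, which
        // fits comfortably on a small stack, unlike an 8 KB buffer.
        var mask [1024 / 8]byte
        n, _, errno := syscall.RawSyscall(syscall.SYS_SCHED_GETAFFINITY,
            0, uintptr(len(mask)), uintptr(unsafe.Pointer(&mask[0])))
        if errno != 0 {
            fmt.Println("sched_getaffinity failed:", errno)
            return
        }
        fmt.Println("kernel wrote", n, "bytes of CPU mask")
    }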

Fixes #11853

Change-Id: I7cb27c1250a6153f86e269c172054e9dfc218c72
Reviewed-on: https://go-review.googlesource.com/12622
Reviewed-by: Austin Clements <austin@google.com>
2015-07-27 01:04:10 +00:00
Ian Lance Taylor
eb248c4df2 runtime: require gdb version 7.9 for gdb test
Issue 11214 reports problems with older versions of gdb.  It does work
with gdb 7.9 on my Ubuntu Trusty system, so take that as the minimum
required version.

Fixes #11214.

Change-Id: I61b732895506575be7af595f81fc1bcf696f58c2
Reviewed-on: https://go-review.googlesource.com/12626
Reviewed-by: Austin Clements <austin@google.com>
2015-07-24 17:15:44 +00:00
Ian Lance Taylor
d9ee9a0f6e runtime: fix runtime·raise for dragonfly amd64
Fixes #11847.

Change-Id: I21736a4c6f6fb2f61aec1396ce2c965e3e329e92
Reviewed-on: https://go-review.googlesource.com/12621
Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>
2015-07-24 05:16:19 +00:00
Russ Cox
74ec5bf2d8 runtime: make pcln table check not trigger next to foreign code
Foreign code can be arbitrarily aligned,
so the function before it can have
arbitrarily much padding.
We can't call pcvalue on values in the padding.
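
A tiny sketch of the guard, with a hypothetical function-table entry:

    package main

    import "fmt"

    // funcInfo is a hypothetical stand-in for a pcln table entry.
    type funcInfo struct{ entry, end uintptr }

    // pcValueSafe reports whether pc lies inside the function proper,
    // rather than in the alignment padding before arbitrarily aligned
    // foreign code, where the pc-value tables say nothing.
    func pcValueSafe(fn funcInfo, pc uintptr) bool {
        return pc >= fn.entry && pc < fn.end
    }

    func main() {
        fn := funcInfo{entry: 0x1000, end: 0x1040}
        fmt.Println(pcValueSafe(fn, 0x1048)) // padding byte: false
    }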

Fixes #11653.

Change-Id: I7d57f813ae5a2409d1520fcc909af3eeef2da131
Reviewed-on: https://go-review.googlesource.com/12550
Reviewed-by: Rob Pike <r@golang.org>
2015-07-23 14:14:22 +00:00
Russ Cox
7334cb3a6f runtime/trace: fix TestTraceSymbolize networking
We use 127.0.0.1 instead of localhost in Go networking tests.
The reporter of #11774 has localhost defined to be 120.192.83.162,
for reasons unknown.

Also, if TestTraceSymbolize calls Fatalf (for example because Listen
fails), then we need to stop the trace for future tests to work.
See failure log in #11774.
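
A sketch of the resulting test pattern (simplified; the discard
writer and test name are inventions, not the real test):

    package example

    import (
        "net"
        "runtime/trace"
        "testing"
    )

    type discard struct{}

    func (discard) Write(p []byte) (int, error) { return len(p), nil }

    // Listen on 127.0.0.1 rather than "localhost" (which may resolve
    // to anything), and stop the trace before Fatalf so that later
    // tests can start their own trace.
    func TestListenWhileTracing(t *testing.T) {
        if err := trace.Start(discard{}); err != nil {
            t.Fatal(err)
        }
        ln, err := net.Listen("tcp", "127.0.0.1:0")
        if err != nil {
            trace.Stop() // don't leave tracing running for future tests
            t.Fatalf("listen: %v", err)
        }
        ln.Close()
        trace.Stop()
    }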

Fixes #11774.

Change-Id: Iceddb03a72d31e967acd2d559ecb78051f9c14b7
Reviewed-on: https://go-review.googlesource.com/12521
Reviewed-by: Rob Pike <r@golang.org>
2015-07-23 05:37:15 +00:00
Russ Cox
77d38d9cbe runtime: handle linux CPU masks up to 64k CPUs
Fixes #11823.

Change-Id: Ic949ccb9657478f8ca34fdf1a6fe88f57db69f24
Reviewed-on: https://go-review.googlesource.com/12535
Reviewed-by: Austin Clements <austin@google.com>
2015-07-22 20:53:01 +00:00
Russ Cox
75d779566b runtime/cgo: make compatible with race detector
Some routines run without an m or g and cannot invoke the
race detector runtime. They must be opaque to the runtime.
That used to be true because they were written in C.
Now that they are written in Go, disable the race detector
annotations for those functions explicitly.
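
A sketch of the mechanism (//go:norace is the real compiler
directive; the function is a hypothetical stand-in):

    package example

    // threadStartHook stands in for a routine that can run on a
    // foreign thread without an m or g. The directive below tells the
    // compiler not to emit race detector instrumentation for it.
    //
    //go:norace
    func threadStartHook() {
        // must not do anything that calls into the race runtime
    }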

Add test.

Fixes #10874.

Change-Id: Ia8cc28d51e7051528f9f9594b75634e6bb66a785
Reviewed-on: https://go-review.googlesource.com/12534
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-22 20:28:47 +00:00
Russ Cox
3b26e8b29a runtime/pprof: ignore too few samples on Windows test
Fixes #10842.

Change-Id: I7de98f3073a47911863a252b7a74d8fdaa48c86f
Reviewed-on: https://go-review.googlesource.com/12529
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-22 20:26:37 +00:00
Ian Lance Taylor
872b168fe3 runtime: if we don't handle a signal on a non-Go thread, raise it
In the past badsignal would crash the program.  In
https://golang.org/cl/10757044 badsignal was changed to call sigsend,
to fix issue #3250.  The effect of this was that when a non-Go thread
received a signal, and os/signal.Notify was not being used to check
for occurrences of the signal, the signal was ignored.

This changes the code so that if os/signal.Notify is not being used,
then the signal handler is reset to what it was, and the signal is
raised again.  This lets non-Go threads handle the signal as they
wish.  In particular, it means that a segmentation violation in a
non-Go thread will ordinarily crash the process, as it should.
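
A conceptual, self-contained sketch; every name here is a
hypothetical stand-in, not the runtime's real signal code:

    package main

    import "fmt"

    var notifyWanted = map[int]bool{} // signals os/signal.Notify asked for

    func sigsend(sig int)                { fmt.Println("queued for os/signal:", sig) }
    func restorePreviousHandler(sig int) { fmt.Println("restored old handler for:", sig) }
    func raise(sig int)                  { fmt.Println("re-raised:", sig) }

    // badSignal handles a signal that arrived on a non-Go thread.
    func badSignal(sig int) {
        if notifyWanted[sig] {
            sigsend(sig) // Go code asked to see it: deliver as before
            return
        }
        restorePreviousHandler(sig) // otherwise step aside...
        raise(sig)                  // ...and let the signal act natively
    }

    func main() {
        badSignal(11) // e.g. a segfault on a non-Go thread
    }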

Fixes #10139.
Update #11794.

Change-Id: I2109444aaada9d963ad03b1d071ec667760515e5
Reviewed-on: https://go-review.googlesource.com/12503
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2015-07-22 20:26:29 +00:00
Russ Cox
4a4eba9f37 runtime: disable TestGoroutineParallelism on uniprocessor
It's a bad test, and it's at its worst on uniprocessors.

Fixes #11143.

Change-Id: I0164231ada294788d7eec251a2fc33e02a26c13b
Reviewed-on: https://go-review.googlesource.com/12522
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-22 18:53:12 +00:00
Austin Clements
58f3a82950 runtime: fix comments referring to trace functions in runtime/pprof
ae1ea2a moved trace-related functions from runtime/pprof to
runtime/trace, but missed a doc comment and a code comment. Update
these to reflect the move.

Change-Id: I6e1e8861e5ede465c08a2e3f80b976145a8b32d8
Reviewed-on: https://go-review.googlesource.com/12525
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-07-22 18:33:38 +00:00
Dmitry Vyukov
ae1ea2aa94 runtime/trace: add new package
Move tracing functions from runtime/pprof to the new runtime/trace package.

Fixes #9710

Change-Id: I718bcb2ae3e5959d9f72cab5e6708289e5c8ebd5
Reviewed-on: https://go-review.googlesource.com/12511
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-22 15:47:16 +00:00
Michael Hudson-Doyle
1125cd4997 cmd/compile: define func value symbols at declaration
This is mostly Russ's https://golang.org/cl/12145, with some extra
fixes to account for the fact that function declarations without
implementations now break shared libraries, and with my test case
included.

Fixes #11480.

Change-Id: Iabdc2934a0378e5025e4e7affadb535eaef2c8f1
Reviewed-on: https://go-review.googlesource.com/12340
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-20 00:50:46 +00:00
Austin Clements
1942e3814b runtime: clarify runtime.GC blocking behavior
The runtime.GC documentation was rewritten in df2809f to make it clear
that it blocks until GC is complete, but the re-rewrite in ed9a4c9 and
e28a679 lost this property when clarifying that it may also block the
entire program and not just the caller.

Try to arrive at wording that conveys both of these properties.

Change-Id: I1e255322aa28a21a548556ecf2a44d8d8ac524ef
Reviewed-on: https://go-review.googlesource.com/12392
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-19 15:10:06 +00:00
Ian Lance Taylor
692054e76e runtime: check for findmoduledatap returning nil
The findmoduledatap function will not return nil in ordinary use, but
check for nil to try to avoid crashing when we are already crashing.

Update #11783.

Change-Id: If7b1adb51efab13b4c1a37b6f3c9ad22641a0b56
Reviewed-on: https://go-review.googlesource.com/12391
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-18 21:26:59 +00:00
Alex Brainman
4a0d9587f2 runtime: skip TestReturnAfterStackGrowInCallback if gcc is not found
Fixes #11754

Change-Id: Ifa423ca6eea46d1500278db290498724a9559d14
Reviewed-on: https://go-review.googlesource.com/12347
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-18 01:29:09 +00:00
Rob Pike
e28a679216 runtime: make the GC message less committal.
We shouldn't guarantee this behavior, but suggest it's possible.

Change-Id: I4c2afb48b99be4d91537306d3337171a13c9990a
Reviewed-on: https://go-review.googlesource.com/12346
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-18 00:28:50 +00:00
Rob Pike
ed9a4c91c2 runtime: document that GC blocks the whole program
No code changes. Just make it clear that runtime.GC is not concurrent.

Change-Id: I00a99ebd26402817c665c9a128978cef19f037be
Reviewed-on: https://go-review.googlesource.com/12345
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-17 22:40:21 +00:00
Austin Clements
e33d6b3d4d runtime: remove out-of-date comment
An out-of-date comment snuck in to cc8f544. Remove it.

Change-Id: I5bc7c17e737d1cabe57b88de06d7579c60ca28ff
Reviewed-on: https://go-review.googlesource.com/12328
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2015-07-17 16:52:32 +00:00
Austin Clements
cc8f544198 runtime: don't free large spans until heapBitsSweepSpan returns
This fixes a race between 1) sweeping and freeing an unmarked large
span and 2) reusing that span and allocating from it. This race arises
because mSpan_Sweep returns spans for large objects to the heap
*before* heapBitsSweepSpan clears the mark bit on the object in the
span.

Specifically, the following sequence of events can lead to an
incorrectly zeroed bitmap byte, which causes the garbage collector to
not trace any pointers in that object (the pointer bits for the first
four words are cleared, and the scan bits are also cleared, so it
looks like a no-scan object).

1) P0 calls mSpan_Sweep on a large span S0 with an unmarked object on it.

2) mSpan_Sweep calls heapBitsSweepSpan, which invokes the callback for
   the one (unmarked) object on the span.

3) The callback calls mHeap_Free, which makes span S0 available for
   allocation, but this is too early.

4) P1 grabs this S0 from the heap to use for allocation.

5) P1 allocates an object on this span and writes that object's type
   bits to the bitmap.

6) P0 returns from the callback to heapBitsSweepSpan.
   heapBitsSweepSpan clears the byte containing the mark, even though
   this span is now owned by P1 and this byte contains important
   bitmap information.

This fixes this problem by simply delaying the mHeap_Free until after
the heapBitsSweepSpan. I think the overall logic of mSpan_Sweep could
be simplified now, but this seems like the minimal change.
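
A simplified sketch of the ordering, with hypothetical stand-ins for
the runtime types:

    package main

    import "fmt"

    type span struct{ marks []byte }
    type heap struct{ free []*span }

    func sweepBitmap(s *span) {
        for i := range s.marks {
            s.marks[i] = 0 // clear mark/scan bits
        }
    }

    // sweepLargeSpan publishes the span for reuse only *after* its
    // bitmap is swept; freeing first lets another P allocate on the
    // span and then have its freshly written bitmap bytes zeroed by
    // the sweeper.
    func sweepLargeSpan(h *heap, s *span) {
        sweepBitmap(s)             // 1. finish touching the bitmap
        h.free = append(h.free, s) // 2. only now hand the span back
    }

    func main() {
        h := &heap{}
        sweepLargeSpan(h, &span{marks: make([]byte, 8)})
        fmt.Println("spans available for reuse:", len(h.free))
    }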

Fixes #11617.

Change-Id: I6b1382c7e7cc35f81984467c0772fe9848b7522a
Reviewed-on: https://go-review.googlesource.com/12320
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-17 03:34:11 +00:00
Russ Cox
a93e5b4ff9 Revert "runtime: diagnose invalid pointers during GC"
Broke arm64. Update #9880.

This reverts commit 38d9b2a3a9.

Change-Id: I35fa21005af2183828a9d8b195ebcfbe45ec5138
Reviewed-on: https://go-review.googlesource.com/12247
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-16 01:49:58 +00:00
Austin Clements
e42413cecc runtime: fix saved PC/SP after safe-point function in syscall
Running a safe-point function on syscall entry uses systemstack() and
hence clobbers g.sched.pc and g.sched.sp. Fix this by re-saving them
after the systemstack, just like in the other uses of systemstack in
reentersyscall.
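
A self-contained sketch of the save/clobber/re-save pattern (all
names are hypothetical simplifications of the runtime's):

    package main

    import "fmt"

    type sched struct{ pc, sp uintptr }
    type g struct{ sched sched }

    func save(gp *g, pc, sp uintptr) { gp.sched = sched{pc, sp} }

    // systemstack runs fn on another stack and, like the real one,
    // overwrites gp.sched as a side effect.
    func systemstack(gp *g, fn func()) {
        gp.sched = sched{} // clobbered
        fn()
    }

    func main() {
        gp := &g{}
        pc, sp := uintptr(0x1234), uintptr(0x8000)
        save(gp, pc, sp)           // save on syscall entry
        systemstack(gp, func() {}) // e.g. running a safe-point function
        save(gp, pc, sp)           // the fix: re-save after systemstack
        fmt.Printf("sched restored: %+v\n", gp.sched)
    }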

Change-Id: I47868a53eba24d81919fda56ef6bbcf72f1f922e
Reviewed-on: https://go-review.googlesource.com/12125
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-15 21:09:16 +00:00
Austin Clements
edfc979725 runtime: run safe-point function before entering _Psyscall
Currently, we run a P's safe-point function immediately after entering
_Psyscall state. This is unsafe, since as soon as we put the P in
_Psyscall, we no longer control the P and another M may claim it.
We'll still run the safe-point function only once (because doing so
races on an atomic), but the P may no longer be at a safe point when
we do so.

In particular, this means that the use of forEachP to dispose all P's
gcw caches is unsafe. A P may enter a syscall, run the safe-point
function, and dispose the P's gcw cache concurrently with another M
claiming the P and attempting to use its gcw cache. If this happens,
we may empty the gcw's workbuf after putting it on
work.{full,partial}, or add pointers to it after putting it in
work.empty. This will cause an assertion failure when we later pop the
workbuf from the list and its object count is inconsistent with the
list we got it from.

Fix this by running the safe-point function just before putting the P
in _Psyscall.

Related to #11640. This probably fixes that issue, but while I'm able
to show that we can enter a bad safe-point state as a result of this,
I can't reproduce that specific failure.
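
A self-contained sketch of the corrected ordering (the P states and
fields are hypothetical simplifications):

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    const (
        pRunning uint32 = iota // simplified stand-ins for P states
        pSyscall
    )

    type p struct {
        status      uint32
        safePointFn func(*p)
    }

    // enterSyscall runs the safe-point function *before* publishing
    // _Psyscall: once the status changes, another M may claim this P,
    // so it would no longer be at a safe point.
    func enterSyscall(pp *p) {
        if fn := pp.safePointFn; fn != nil {
            fn(pp) // still the exclusive owner here
            pp.safePointFn = nil
        }
        atomic.StoreUint32(&pp.status, pSyscall) // now others may take it
    }

    func main() {
        pp := &p{safePointFn: func(*p) { fmt.Println("ran with P owned") }}
        enterSyscall(pp)
        fmt.Println("in syscall:", atomic.LoadUint32(&pp.status) == pSyscall)
    }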

Change-Id: I6989c8ca7ef2a4a941ae1931e9a0748cbbb59434
Reviewed-on: https://go-review.googlesource.com/12124
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-15 21:09:07 +00:00