1
0
mirror of https://github.com/golang/go synced 2024-10-03 05:11:21 -06:00
Commit Graph

1533 Commits

Author SHA1 Message Date
Yao Zhang
980b00f55b runtime: added go files for mips64 architecture support
Change-Id: Ia496470e48b3c5d39fb9fef99fac356dfb73a949
Reviewed-on: https://go-review.googlesource.com/14927
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 04:46:50 +00:00
Yao Zhang
b2b8559987 runtime/internal/atomic: added mips64 support.
Change-Id: I2eaf0658771a0ff788429e2f503d116531166315
Reviewed-on: https://go-review.googlesource.com/16834
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 04:46:35 +00:00
Yao Zhang
424738e43e runtime: added assembly part of linux/mips64{,le} support
Change-Id: I9e94027ef66c88007107de2b2b75c3d7cf1352af
Reviewed-on: https://go-review.googlesource.com/14467
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 04:46:17 +00:00
Matthew Dempsky
a9bebd91c9 runtime: update comment that was missed in CL 6584
Change-Id: Ie5f70af7e673bb2c691a45c28db2c017e6cddd4f
Reviewed-on: https://go-review.googlesource.com/16833
Reviewed-by: Minux Ma <minux@golang.org>
2015-11-12 03:38:04 +00:00
Matthew Dempsky
c17c42e8a5 runtime: rewrite lots of foo_Bar(f, ...) into f.bar(...)
Applies to types fixAlloc, mCache, mCentral, mHeap, mSpan, and
mSpanList.

Two special cases:

1. mHeap_Scavenge() previously didn't take an *mheap parameter, so it
was specially handled in this CL.

2. mHeap_Free() would have collided with mheap's "free" field, so it's
been renamed to (*mheap).freeSpan to parallel its underlying
(*mheap).freeSpanLocked method.

Change-Id: I325938554cca432c166fe9d9d689af2bbd68de4b
Reviewed-on: https://go-review.googlesource.com/16221
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-12 00:34:58 +00:00
Michael Hudson-Doyle
58db5fc94d runtime: run TestCgoExternalThreadSIGPROF on ppc64le
It was disabled because of the lack of external linking.

Change-Id: Iccb4a4ef8c57d048d53deabe4e0f4e6b9dccce33
Reviewed-on: https://go-review.googlesource.com/16797
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-12 00:30:04 +00:00
Hyang-Ah Hana Kim
b2259dcef0 runtime: add syscalls needed for android/386 logging
Update golang/go#9327.

Change-Id: I27ef973190d9ae652411caf3739414b5d46ca7d2
Reviewed-on: https://go-review.googlesource.com/16679
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-11-11 21:59:53 +00:00
Hyang-Ah Hana Kim
05c4c6e2f4 cmd,runtime: TLS setup for android/386
Same ugly hack as https://go-review.googlesource.com/15991.

Update golang/go#9327.

Change-Id: I58284e83268a15de95eabc833c3e01bf1e3faa2e
Reviewed-on: https://go-review.googlesource.com/16678
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-11-11 21:59:24 +00:00
Austin Clements
d727312cbf runtime: remove unused marking parfor
The GC now handles the root marking jobs as part of general marking,
so work.markfor is no longer used.

Change-Id: I6c3b23fed27e4e7ea6430d6ca7ba25ae4d04ed14
Reviewed-on: https://go-review.googlesource.com/16811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-11-11 18:31:33 +00:00
Austin Clements
f32f2954fb runtime: never allocate new M when jumping time forward
When we're jumping time forward, it means everyone is asleep, so there
should always be an M available. Furthermore, this causes both
allocation and write barriers in contexts that may be running without
a P (such as in sysmon).

Hence, replace this allocation with a throw.

Updates #10600.

Change-Id: I2cee70d5db828d0044082878995949edb25dda5f
Reviewed-on: https://go-review.googlesource.com/16815
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-11 17:37:42 +00:00
Austin Clements
f5c42cf88e runtime: replace traceBuf slice with index
Currently traceBuf keeps track of where it is in the trace buffer by
also maintaining a slice that points in to this buffer with an initial
length of 0 and a cap of the length of the array. All writes to this
buffer are done by appending to the slice (as long as the bounds
checks are right, it will never overflow and the append won't allocate
a new slice).

Each of these appends generates a write barrier. As long as we never
overflow the buffer, this write barrier won't fire, but this wreaks
havoc with eliminating write barriers from the tracing code. If we
were to overflow the buffer, this would both allocate and invoke a
write barrier, both things that are dicey at best to do in many of the
contexts tracing happens. It also wastes space in the traceBuf and
leads to more complex code and more complex generated code.

Replace this slice trick with keeping track of a simple array
position.

Updates #10600.

Change-Id: I0a63eecec1992e195449f414ed47653f66318d0e
Reviewed-on: https://go-review.googlesource.com/16814
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-11 17:37:31 +00:00
Austin Clements
2be1ed80c5 runtime: eliminate traceStack write barriers
This replaces *traceStack with traceStackPtr, much like the preceding
commit.

Updates #10600.

Change-Id: Ifadc35eb37a405ae877f9740151fb31a0ca1d08f
Reviewed-on: https://go-review.googlesource.com/16813
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-11 17:37:26 +00:00
Austin Clements
03227bb55e runtime: eliminate traceBuf write barriers
The tracing code is currently called from contexts such as sysmon and
the scheduler where write barriers are not allowed. Unfortunately,
while the common paths through the tracing code do not have write
barriers, many of the less common paths dealing with buffer overflow
and recycling do.

This change replaces all *traceBufs with traceBufPtrs. In the style of
guintptr, etc., the GC does not trace traceBufPtrs and write barriers
do not apply when these pointers are written. Since traceBufs are
allocated from non-GC'd memory and manually managed, this is always
safe.

Updates #10600.

Change-Id: I52b992d36d1b634ebd855c8cde27947ec14f59ba
Reviewed-on: https://go-review.googlesource.com/16812
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-11 17:37:18 +00:00
Austin Clements
7d1d642956 runtime: fix use of xadd64
Commit 7407d8e was rebased over the switch to runtime/internal/atomic
and introduced a call to xadd64, which no longer exists. Fix that
call.

Change-Id: I99c93469794c16504ae4a8ffe3066ac382c66a3a
Reviewed-on: https://go-review.googlesource.com/16816
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-11 15:26:24 +00:00
Austin Clements
7407d8e582 runtime: fix over-aggressive proportional sweep
Currently, sweeping is performed before allocating a span by charging
for the entire size of the span requested, rather than the number of
bytes actually available for allocation from the returned span. That
is, if the returned span is 8K, but already has 6K in use, the mutator
is charged for 8K of heap allocation even though it can only allocate
2K more from the span. As a result, proportional sweep is
over-aggressive and tends to finish much earlier than it needs to.
This effect is more amplified by fragmented heaps.

Fix this by reimbursing the mutator for the used space in a span once
it has allocated that span. We still have to charge up-front for the
worst-case because we don't know which span the mutator will get, but
at least we can correct the over-charge once it has a span, which will
go toward later span allocations.

This has negligible effect on the throughput of the go1 benchmarks and
the garbage benchmark.

Fixes #12040.

Change-Id: I0e23e7a4ccf126cca000fed5067b20017028dd6b
Reviewed-on: https://go-review.googlesource.com/16515
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-11 15:21:32 +00:00
Ian Lance Taylor
880a689124 runtime: don't call msanread when running on the system stack
The runtime is not instrumented, but the calls to msanread in the
runtime can sometimes refer to the system stack.  An example is the call
to copy in stkbucket in mprof.go.  Depending on what C code has done,
the system stack may appear uninitialized to msan.

Change-Id: Ic21705b9ac504ae5cf7601a59189302f072e7db1
Reviewed-on: https://go-review.googlesource.com/16660
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-11-11 06:04:04 +00:00
Ian Lance Taylor
8f3f2ccac0 runtime: mark cgo callback results as written for msan
This is a fix for the -msan option when using cgo callbacks.  A cgo
callback works by writing out C code that puts a struct on the stack and
passes the address of that struct into Go.  The result parameters are
fields of the struct.  The Go code will write to the result parameters,
but the Go code thinks it is just writing into the Go stack, and
therefore won't call msanwrite.  This CL adds a call to msanwrite in the
cgo callback code so that the C knows that results were written.

Change-Id: I80438dbd4561502bdee97fad3f02893a06880ee1
Reviewed-on: https://go-review.googlesource.com/16611
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-11-11 05:58:19 +00:00
Austin Clements
f84420c20d runtime: clean up park messages
This changes "mark worker (idle)" to "GC worker (idle)" so it's more
clear to users that these goroutines are GC-related. It changes "GC
assist" to "GC assist wait" to make it clear that the assist is
blocked.

Change-Id: Iafbc0903c84f9250ff6bee14baac6fcd4ed5ef76
Reviewed-on: https://go-review.googlesource.com/16511
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-11 01:04:39 +00:00
Austin Clements
56ad88b1ff runtime: free stack spans outside STW
We couldn't do this before this point because it must be done before
the next GC cycle starts. Hence, if it delayed the start of the next
cycle, that would widen the window between reaching the heap trigger
of the next cycle and starting the next GC cycle, during which the
mutator could over-allocate. With the decentralized GC, any mutators
that reach the heap trigger will block on the GC starting, so it's
safe to widen the time between starting the world and being able to
start the next GC cycle.

Fixes #11465.

Change-Id: Ic7ea7e9eba5b66fc050299f843a9c9001ad814aa
Reviewed-on: https://go-review.googlesource.com/16394
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-11 01:04:33 +00:00
Ian Lance Taylor
9dcc58c3d1 cmd/cgo, runtime: add checks for passing pointers from Go to C
This implements part of the proposal in issue 12416 by adding dynamic
checks for passing pointers from Go to C.  This code is intended to be
on at all times.  It does not try to catch every case.  It does not
implement checks on calling Go functions from C.

The new cgo checks may be disabled using GODEBUG=cgocheck=0.

Update #12416.

Change-Id: I48de130e7e2e83fb99a1e176b2c856be38a4d3c8
Reviewed-on: https://go-review.googlesource.com/16003
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 22:22:10 +00:00
Michael Matloob
67faca7d9c runtime: break atomics out into package runtime/internal/atomic
This change breaks out most of the atomics functions in the runtime
into package runtime/internal/atomic. It adds some basic support
in the toolchain for runtime packages, and also modifies linux/arm
atomics to remove the dependency on the runtime's mutex. The mutexes
have been replaced with spinlocks.

all trybots are happy!
In addition to the trybots, I've tested on the darwin/arm64 builder,
on the darwin/arm builder, and on a ppc64le machine.

Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
Reviewed-on: https://go-review.googlesource.com/14204
Run-TryBot: Michael Matloob <matloob@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 17:38:04 +00:00
Michael Hudson-Doyle
4e3deae96d cmd/link, runtime: arm64 implementation of addmoduledata
Change-Id: I62fb5b20d7caa51b77560a4bfb74a39f17089805
Reviewed-on: https://go-review.googlesource.com/13999
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 01:24:25 +00:00
Keith Randall
e410a527b2 runtime: simplify chan ops, take 2
This change is the same as CL #9345 which was reverted,
except for a small bug fix.

The only change is to the body of sendDirect and its callsite.
Also added a test.

The problem was during a channel send operation.  The target
of the send was a sleeping goroutine waiting to receive.  We
basically do:
1) Read the destination pointer out of the sudog structure
2) Copy the value we're sending to that destination pointer
Unfortunately, the previous change had a goroutine suspend
point between 1 & 2 (the call to sendDirect).  At that point
the destination goroutine's stack could be copied (shrunk).
The pointer we read in step 1 is no longer valid for step 2.

Fixed by not allowing any suspension points between 1 & 2.
I suspect the old code worked correctly basically by accident.

Fixes #13169

The original 9345:

This change removes the retry mechanism we use for buffered channels.
Instead, any sender waking up a receiver or vice versa completes the
full protocol with its counterpart.  This means the counterpart does
not need to relock the channel when it wakes up.  (Currently
buffered channels need to relock on wakeup.)

For sends on a channel with waiting receivers, this change replaces
two copies (sender->queue, queue->receiver) with one (sender->receiver).
For receives on channels with a waiting sender, two copies are still required.

This change unifies to a large degree the algorithm for buffered
and unbuffered channels, simplifying the overall implementation.

Fixes #11506

Change-Id: I57dfa3fc219cffa4d48301ee15fe5479299efa09
Reviewed-on: https://go-review.googlesource.com/16740
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-08 23:20:25 +00:00
Michael Hudson-Doyle
1b4d28f8cf cmd/link, runtime: arm implementation of addmoduledata
Change-Id: I3975e10c2445e23c2798a7203a877ff2de3427c7
Reviewed-on: https://go-review.googlesource.com/14189
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-08 21:46:17 +00:00
Ian Lance Taylor
e884334b55 runtime: use pthread_sigmask, not sigprocmask, on Darwin ARM/ARM64
Other systems use pthread_sigmask.  It was a mistake to use sigprocmask
here.

Change-Id: Ie045aa3f09cf035fcf807b7543b96fa5b847958a
Reviewed-on: https://go-review.googlesource.com/16720
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-07 15:48:58 +00:00
Keith Randall
4b7d5f0b94 runtime: memmove/memclr pointers atomically
Make sure that we're moving or zeroing pointers atomically.
Anything that is a multiple of pointer size and at least
pointer aligned might have pointers in it.  All the code looks
ok except for the 1-pointer-sized moves.

Fixes #13160
Update #12552

Change-Id: Ib97d9b918fa9f4cc5c56c67ed90255b7fdfb7b45
Reviewed-on: https://go-review.googlesource.com/16668
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-07 02:42:12 +00:00
Ilya Tocar
321a40721b runtime: optimize indexbytebody on amd64
Use avx2 to compare 32 bytes per iteration.
Results (haswell):

name                    old time/op    new time/op     delta
IndexByte32-6             15.5ns ± 0%     14.7ns ± 5%   -4.87%        (p=0.000 n=16+20)
IndexByte4K-6              360ns ± 0%      183ns ± 0%  -49.17%        (p=0.000 n=19+20)
IndexByte4M-6              384µs ± 0%      256µs ± 1%  -33.41%        (p=0.000 n=20+20)
IndexByte64M-6            6.20ms ± 0%     4.18ms ± 1%  -32.52%        (p=0.000 n=19+20)
IndexBytePortable32-6     73.4ns ± 5%     75.8ns ± 3%   +3.35%        (p=0.000 n=20+19)
IndexBytePortable4K-6     5.15µs ± 0%     5.15µs ± 0%     ~     (all samples are equal)
IndexBytePortable4M-6     5.26ms ± 0%     5.25ms ± 0%   -0.12%        (p=0.000 n=20+18)
IndexBytePortable64M-6    84.1ms ± 0%     84.1ms ± 0%   -0.08%        (p=0.012 n=18+20)
Index32-6                  352ns ± 0%      352ns ± 0%     ~     (all samples are equal)
Index4K-6                 53.8µs ± 0%     53.8µs ± 0%   -0.03%        (p=0.000 n=16+18)
Index4M-6                 55.4ms ± 0%     55.4ms ± 0%     ~           (p=0.149 n=20+19)
Index64M-6                 886ms ± 0%      886ms ± 0%     ~           (p=0.108 n=20+20)
IndexEasy32-6             80.3ns ± 0%     80.1ns ± 0%   -0.21%        (p=0.000 n=20+20)
IndexEasy4K-6              426ns ± 0%      215ns ± 0%  -49.53%        (p=0.000 n=20+20)
IndexEasy4M-6              388µs ± 0%      262µs ± 1%  -32.42%        (p=0.000 n=18+20)
IndexEasy64M-6            6.20ms ± 0%     4.19ms ± 1%  -32.47%        (p=0.000 n=18+20)

name                    old speed      new speed       delta
IndexByte32-6           2.06GB/s ± 1%   2.17GB/s ± 5%   +5.19%        (p=0.000 n=18+20)
IndexByte4K-6           11.4GB/s ± 0%   22.3GB/s ± 0%  +96.45%        (p=0.000 n=17+20)
IndexByte4M-6           10.9GB/s ± 0%   16.4GB/s ± 1%  +50.17%        (p=0.000 n=20+20)
IndexByte64M-6          10.8GB/s ± 0%   16.0GB/s ± 1%  +48.19%        (p=0.000 n=19+20)
IndexBytePortable32-6    436MB/s ± 5%    422MB/s ± 3%   -3.27%        (p=0.000 n=20+19)
IndexBytePortable4K-6    795MB/s ± 0%    795MB/s ± 0%     ~           (p=0.940 n=17+18)
IndexBytePortable4M-6    798MB/s ± 0%    799MB/s ± 0%   +0.12%        (p=0.000 n=20+18)
IndexBytePortable64M-6   798MB/s ± 0%    798MB/s ± 0%   +0.08%        (p=0.011 n=18+20)
Index32-6               90.9MB/s ± 0%   90.9MB/s ± 0%   -0.00%        (p=0.025 n=20+20)
Index4K-6               76.1MB/s ± 0%   76.1MB/s ± 0%   +0.03%        (p=0.000 n=14+15)
Index4M-6               75.7MB/s ± 0%   75.7MB/s ± 0%     ~           (p=0.076 n=20+19)
Index64M-6              75.7MB/s ± 0%   75.7MB/s ± 0%     ~           (p=0.456 n=20+17)
IndexEasy32-6            399MB/s ± 0%    399MB/s ± 0%   +0.20%        (p=0.000 n=20+19)
IndexEasy4K-6           9.60GB/s ± 0%  19.02GB/s ± 0%  +98.19%        (p=0.000 n=20+20)
IndexEasy4M-6           10.8GB/s ± 0%   16.0GB/s ± 1%  +47.98%        (p=0.000 n=18+20)
IndexEasy64M-6          10.8GB/s ± 0%   16.0GB/s ± 1%  +48.08%        (p=0.000 n=18+20)

Change-Id: I46075921dde9f3580a89544c0b3a2d8c9181ebc4
Reviewed-on: https://go-review.googlesource.com/16484
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Klaus Post <klauspost@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-06 15:16:28 +00:00
Keith Randall
e9f90ba246 Revert "runtime: simplify buffered channels."
Revert for now until #13169 is understood.

This reverts commit 8e496f1d69.

Change-Id: Ib3eb2588824ef47a2b6eb9e377a24e5c817fcc81
Reviewed-on: https://go-review.googlesource.com/16716
Reviewed-by: Keith Randall <khr@golang.org>
2015-11-06 08:30:35 +00:00
Austin Clements
d5ba582166 runtime: remove background GC goroutine and mark barriers
These are now unused.

Updates #11970.

Change-Id: I43e5c4e5bcda9581bacc63364f96bb4855ab779f
Reviewed-on: https://go-review.googlesource.com/16393
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:24:05 +00:00
Austin Clements
bbf2da00fc runtime: remove GC start up/shutdown workaround in mallocgc
Currently mallocgc detects if the GC is in a state where it can't
assist, but also can't allocate uncontrolled and yields to help out
the GC. This was a workaround for periods when we were trying to
schedule the GC coordinator. It is no longer necessary because there
is no GC coordinator and malloc can always assist with any GC
transitions that are necessary.

Updates #11970.

Change-Id: I4f7beb7013e85e50ae99a3a8b0bb708ba49cbcd4
Reviewed-on: https://go-review.googlesource.com/16392
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:24:01 +00:00
Austin Clements
c99d7f7f85 runtime: decentralize mark done and mark termination
This moves all of the mark 1 to mark 2 transition and mark termination
to the mark done transition function. This means these transitions are
now handled on the goroutine that detected mark completion. This also
means that the GC coordinator and the background completion barriers
are no longer used and various workarounds to yield to the coordinator
are no longer necessary. These will be removed in follow-up commits.

One consequence of this is that mark workers now need to be
preemptible when performing the mark done transition. This allows them
to stop the world and to perform the final clean-up steps of GC after
restarting the world. They are only made preemptible while performing
this transition, so if the worker findRunnableGCWorker would schedule
isn't available, we didn't want to schedule it anyway.

Fixes #11970.

Change-Id: I9203a2d6287eeff62d589ec02ad9cb1e29ddb837
Reviewed-on: https://go-review.googlesource.com/16391
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:54 +00:00
Austin Clements
d986bf2741 runtime: account mark worker time before gcMarkDone
Currently gcMarkDone takes basically no time, so it's okay to account
the worker time after calling it. However, gcMarkDone is about to take
potentially *much* longer because it may perform all of mark
termination. Prepare for this by swapping the order so we account the
time before calling gcMarkDone.

Change-Id: I90c7df68192acfc4fd02a7254dae739dda4e2fcb
Reviewed-on: https://go-review.googlesource.com/16390
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:49 +00:00
Austin Clements
171204b561 runtime: factor mark done transition
Currently the code for completion of mark 1/mark 2 is duplicated in
background workers and assists. Factor this in to a single function
that will serve as the transition function for concurrent mark.

Change-Id: I4d9f697a15da0d349db3b34d56f3a220dd41d41b
Reviewed-on: https://go-review.googlesource.com/16359
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:42 +00:00
Austin Clements
12e23f05ff runtime: eliminate mark completion in scheduler
Currently, findRunnableGCWorker will perform mark completion if there
is no remaining work and no running workers. This used to be necessary
to resolve a race in the transition from mark 1 to mark 2 where we
would enter mark 2 with no mark work (and no dedicated workers), so no
workers would run, so no worker would signal mark completion.

However, we're about to make mark completion also perform the entire
follow-on process, which includes mark termination. We really don't
want to do that in the scheduler if it happens to detect completion.

Conveniently, this hack is no longer necessary because we always
enqueue root scanning work at the beginning of both mark 1 and mark 2,
so a mark worker will always run. Hence, we can simply eliminate it.

Change-Id: I3fc8f27c8da632f0fb732c9f6425e1f457f5652e
Reviewed-on: https://go-review.googlesource.com/16358
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:38 +00:00
Austin Clements
20f276e237 runtime: don't start idle mark workers when barriers are cleared
Currently, we don't start dedicated or fractional mark workers unless
the mark 1 or mark 2 barriers have been cleared. One intended
consequence of this is that no background workers run between the
forEachP that disposes all gcWork caches and the beginning of mark 2.

However, we (unintentionally) did not apply this restriction to idle
mark workers. As a result, these can start in the interim between mark
1 completion and mark 2 starting. This explains why it was necessary
to reset the root marking jobs using carefully ordered atomic writes
when setting up mark 2. It also means that, even though we definitely
enqueue work before starting mark 2, it may be drained by the time we
reset the mark 2 barrier. If this happens, currently the only thing
preventing the runtime from deadlocking is that the scheduler itself
also checks for mark completion and will signal mark 2 completion.
Were it not for the odd behavior of idle workers, this check in the
scheduler would not be necessary.

Clean all of this up and prepare to remove this check in the scheduler
by applying the same restriction to starting idle mark workers.

Change-Id: Ic1b479e1591bd7773dc27b320ca399a215603b5a
Reviewed-on: https://go-review.googlesource.com/16631
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:33 +00:00
Austin Clements
a51905fa04 runtime: decentralize sweep termination and mark transition
This moves all of GC initialization, sweep termination, and the
transition to concurrent marking in to the off->mark transition
function. This means it's now handled on the goroutine that detected
the state exit condition.

As a result, malloc no longer needs to Gosched() at the beginning of
the GC cycle to prevent over-allocation while the GC is starting up
because it will now *help* the GC to start up. The Gosched hack is
still necessary during GC shutdown (this is easy to test by enabling
gctrace and hitting Ctrl-S to block the gctrace output).

At this point, the GC coordinator still handles later phases. This
requires a small tweak to how we start the GC coordinator. Currently,
starting the GC coordinator is best-effort and may fail if the
coordinator is about to park from the previous cycle but hasn't yet.
We fix this by replacing the park/ready to wake up the coordinator
with a semaphore. This is temporary since the coordinator will be
going away in a few commits.

Updates #11970.

Change-Id: I2c6a11c91e72dfbc59c2d8e7c66146dee9a444fe
Reviewed-on: https://go-review.googlesource.com/16357
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:27 +00:00
Austin Clements
9630c47e8c runtime: decentralize concurrent sweep termination
This moves concurrent sweep termination from the coordinator to the
off->mark transition. This allows it to be performed by all Gs
attempting to start the GC.

Updates #11970.

Change-Id: I24428e8599a759398c2ef7ec996ba755a448f947
Reviewed-on: https://go-review.googlesource.com/16356
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:22 +00:00
Austin Clements
f54bcedce1 runtime: beginning of decentralized off->mark transition
This begins the conversion of the centralized GC coordinator to a
decentralized state machine by introducing the internal API that
triggers the first state transition from _GCoff to _GCmark (or
_GCmarktermination).

This change introduces the transition lock, the off->mark transition
condition (which is very similar to shouldtriggergc()), and the
general structure of a state transition. Since we're doing this
conversion in stages, it then falls back to the GC coordinator to
actually execute the cycle. We'll start moving logic out of the GC
coordinator and in to transition functions next.

This fixes a minor bug in gcstoptheworld debug mode where passing the
heap trigger once could trigger multiple STW GCs.

Updates #11970.

Change-Id: I964087dd190a639eb5766398f8e1bbf8b352902f
Reviewed-on: https://go-review.googlesource.com/16355
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-11-05 21:23:17 +00:00
Austin Clements
3842596284 runtime: move concurrent mark setup off system stack
For historical reasons we currently do a lot of the concurrent mark
setup on the system stack. In fact, at this point the one and only
thing that needs to happen on the system stack is the start-the-world.

Clean up this code by lifting everything other than the
start-the-world off the system stack.

The diff for this change looks large, but the only code change is to
narrow the systemstack call. Everything else is re-indentation.

Change-Id: I1e03b8afc759fad726f2397b05a17d183c2713ce
Reviewed-on: https://go-review.googlesource.com/16354
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:11 +00:00
Austin Clements
1959621584 runtime: lift state variables from func gc to var work
We're about to split func gc across several functions, so lift the
local variables it uses for tracking statistics and state across the
cycle into the global "work" variable.

Change-Id: Ie955f2f1758c7f5a5543ea1f3f33b222bc4b1d37
Reviewed-on: https://go-review.googlesource.com/16353
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:06 +00:00
Austin Clements
1698018955 runtime: note a minor issue with GODEUG=gcstoptheworld
Change-Id: I91cda8d88b0852cd0f868d33c594206bcca0c386
Reviewed-on: https://go-review.googlesource.com/16352
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-05 21:22:59 +00:00
Ilya Tocar
967564be7e runtime: optimize string comparison on amd64
Use AVX2 if possible.
Results below (haswell):

name                            old time/op    new time/op     delta
CompareStringEqual-6              8.77ns ± 0%     8.63ns ± 1%   -1.58%        (p=0.000 n=20+19)
CompareStringIdentical-6          5.02ns ± 0%     5.02ns ± 0%     ~     (all samples are equal)
CompareStringSameLength-6         7.51ns ± 0%     7.51ns ± 0%     ~     (all samples are equal)
CompareStringDifferentLength-6    1.56ns ± 0%     1.56ns ± 0%     ~     (all samples are equal)
CompareStringBigUnaligned-6        124µs ± 1%      105µs ± 5%  -14.99%        (p=0.000 n=20+18)
CompareStringBig-6                 112µs ± 1%      103µs ± 0%   -7.87%        (p=0.000 n=20+17)

name                            old speed      new speed       delta
CompareStringBigUnaligned-6     8.48GB/s ± 1%   9.98GB/s ± 5%  +17.67%        (p=0.000 n=20+18)
CompareStringBig-6              9.37GB/s ± 1%  10.17GB/s ± 0%   +8.54%        (p=0.000 n=20+17)

Change-Id: I1c949626dd2aaf9f633e3c888a9df71c82eed7e1
Reviewed-on: https://go-review.googlesource.com/16481
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Klaus Post <klauspost@gmail.com>
2015-11-05 15:42:33 +00:00
Keith Randall
8e496f1d69 runtime: simplify buffered channels.
This change removes the retry mechanism we use for buffered channels.
Instead, any sender waking up a receiver or vice versa completes the
full protocol with its counterpart.  This means the counterpart does
not need to relock the channel when it wakes up.  (Currently
buffered channels need to relock on wakeup.)

For sends on a channel with waiting receivers, this change replaces
two copies (sender->queue, queue->receiver) with one (sender->receiver).
For receives on channels with a waiting sender, two copies are still required.

This change unifies to a large degree the algorithm for buffered
and unbuffered channels, simplifying the overall implementation.

Fixes #11506

benchmark                        old ns/op     new ns/op     delta
BenchmarkChanProdCons10          125           110           -12.00%
BenchmarkChanProdCons0           303           284           -6.27%
BenchmarkChanProdCons100         75.5          71.3          -5.56%
BenchmarkChanContended           6452          6125          -5.07%
BenchmarkChanNonblocking         11.5          11.0          -4.35%
BenchmarkChanCreation            149           143           -4.03%
BenchmarkChanSem                 63.6          61.6          -3.14%
BenchmarkChanUncontended         6390          6212          -2.79%
BenchmarkChanSync                282           276           -2.13%
BenchmarkChanProdConsWork10      516           506           -1.94%
BenchmarkChanProdConsWork0       696           685           -1.58%
BenchmarkChanProdConsWork100     470           469           -0.21%
BenchmarkChanPopular             660427        660012        -0.06%

Change-Id: I164113a56432fbc7cace0786e49c5a6e6a708ea4
Reviewed-on: https://go-review.googlesource.com/9345
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-05 15:41:05 +00:00
Austin Clements
dcd9e5bc0f runtime: make putfull start mark workers
Currently we depend on the good graces and timing of the scheduler to
get opportunities to start dedicated mark workers. In the worst case,
it may take 10ms to get dedicated mark workers going at the beginning
of mark 1 and mark 2 or after the amount of available work has dropped
and gone back up.

Instead of waiting for the regular preemption logic to get around to
us, make putfull enlist a random P if we're not already running enough
dedicated workers. This should improve performance stability of the
garbage collector and is likely to improve the overall performance
somewhat.

No overall effect on the go1 benchmarks. It speeds up the garbage
benchmark by 12%, which more than counters the performance loss from
the previous commit.

name              old time/op  new time/op  delta
XBenchGarbage-12  6.32ms ± 4%  5.58ms ± 2%  -11.68%  (p=0.000 n=20+16)

name                      old time/op    new time/op    delta
BinaryTree17-12              3.18s ± 5%     3.12s ± 4%  -1.83%  (p=0.021 n=20+20)
Fannkuch11-12                2.50s ± 2%     2.46s ± 2%  -1.57%  (p=0.000 n=18+19)
FmtFprintfEmpty-12          50.8ns ± 3%    50.4ns ± 3%    ~     (p=0.184 n=20+20)
FmtFprintfString-12          167ns ± 2%     171ns ± 1%  +2.46%  (p=0.000 n=20+19)
FmtFprintfInt-12             161ns ± 2%     163ns ± 2%  +1.81%  (p=0.000 n=20+20)
FmtFprintfIntInt-12          269ns ± 1%     266ns ± 1%  -0.81%  (p=0.002 n=19+20)
FmtFprintfPrefixedInt-12     237ns ± 2%     231ns ± 2%  -2.86%  (p=0.000 n=20+20)
FmtFprintfFloat-12           313ns ± 2%     313ns ± 1%    ~     (p=0.681 n=20+20)
FmtManyArgs-12              1.05µs ± 2%    1.03µs ± 1%  -2.26%  (p=0.000 n=20+20)
GobDecode-12                8.66ms ± 1%    8.67ms ± 1%    ~     (p=0.380 n=19+20)
GobEncode-12                6.56ms ± 1%    6.56ms ± 2%    ~     (p=0.607 n=19+20)
Gzip-12                      317ms ± 1%     314ms ± 2%  -1.10%  (p=0.000 n=20+19)
Gunzip-12                   42.1ms ± 1%    42.2ms ± 1%  +0.27%  (p=0.044 n=20+19)
HTTPClientServer-12         62.7µs ± 1%    62.0µs ± 1%  -1.04%  (p=0.000 n=19+18)
JSONEncode-12               16.7ms ± 1%    16.8ms ± 2%  +0.59%  (p=0.021 n=20+20)
JSONDecode-12               58.2ms ± 1%    61.4ms ± 2%  +5.43%  (p=0.000 n=18+19)
Mandelbrot200-12            3.84ms ± 1%    3.87ms ± 2%  +0.79%  (p=0.008 n=18+20)
GoParse-12                  3.86ms ± 2%    3.76ms ± 2%  -2.60%  (p=0.000 n=20+20)
RegexpMatchEasy0_32-12       100ns ± 2%     100ns ± 1%  -0.68%  (p=0.005 n=18+15)
RegexpMatchEasy0_1K-12       332ns ± 1%     342ns ± 1%  +3.16%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12      82.9ns ± 3%    83.0ns ± 2%    ~     (p=0.906 n=19+20)
RegexpMatchEasy1_1K-12       487ns ± 1%     494ns ± 1%  +1.50%  (p=0.000 n=17+20)
RegexpMatchMedium_32-12      131ns ± 2%     130ns ± 1%    ~     (p=0.686 n=19+20)
RegexpMatchMedium_1K-12     39.6µs ± 1%    39.2µs ± 1%  -1.09%  (p=0.000 n=18+19)
RegexpMatchHard_32-12       2.04µs ± 1%    2.04µs ± 2%    ~     (p=0.804 n=20+20)
RegexpMatchHard_1K-12       61.7µs ± 2%    61.3µs ± 2%    ~     (p=0.052 n=18+20)
Revcomp-12                   529ms ± 2%     533ms ± 1%  +0.83%  (p=0.003 n=20+19)
Template-12                 70.7ms ± 2%    71.0ms ± 2%    ~     (p=0.065 n=20+19)
TimeParse-12                 351ns ± 2%     355ns ± 1%  +1.25%  (p=0.000 n=19+20)
TimeFormat-12                362ns ± 2%     373ns ± 1%  +2.83%  (p=0.000 n=18+20)
[Geo mean]                  62.2µs         62.3µs       +0.13%

name                      old speed      new speed      delta
GobDecode-12              88.6MB/s ± 1%  88.5MB/s ± 1%    ~     (p=0.392 n=19+20)
GobEncode-12               117MB/s ± 1%   117MB/s ± 1%    ~     (p=0.622 n=19+20)
Gzip-12                   61.1MB/s ± 1%  61.8MB/s ± 2%  +1.11%  (p=0.000 n=20+19)
Gunzip-12                  461MB/s ± 1%   460MB/s ± 1%  -0.27%  (p=0.044 n=20+19)
JSONEncode-12              116MB/s ± 1%   115MB/s ± 2%  -0.58%  (p=0.022 n=20+20)
JSONDecode-12             33.3MB/s ± 1%  31.6MB/s ± 2%  -5.15%  (p=0.000 n=18+19)
GoParse-12                15.0MB/s ± 2%  15.4MB/s ± 2%  +2.66%  (p=0.000 n=20+20)
RegexpMatchEasy0_32-12     317MB/s ± 2%   319MB/s ± 2%    ~     (p=0.052 n=20+20)
RegexpMatchEasy0_1K-12    3.08GB/s ± 1%  2.99GB/s ± 1%  -3.07%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12     386MB/s ± 3%   386MB/s ± 2%    ~     (p=0.939 n=19+20)
RegexpMatchEasy1_1K-12    2.10GB/s ± 1%  2.07GB/s ± 1%  -1.46%  (p=0.000 n=17+20)
RegexpMatchMedium_32-12   7.62MB/s ± 2%  7.64MB/s ± 1%    ~     (p=0.702 n=19+20)
RegexpMatchMedium_1K-12   25.9MB/s ± 1%  26.1MB/s ± 2%  +0.99%  (p=0.000 n=18+20)
RegexpMatchHard_32-12     15.7MB/s ± 1%  15.7MB/s ± 2%    ~     (p=0.723 n=20+20)
RegexpMatchHard_1K-12     16.6MB/s ± 2%  16.7MB/s ± 2%    ~     (p=0.052 n=18+20)
Revcomp-12                 481MB/s ± 2%   477MB/s ± 1%  -0.83%  (p=0.003 n=20+19)
Template-12               27.5MB/s ± 2%  27.3MB/s ± 2%    ~     (p=0.062 n=20+19)
[Geo mean]                99.4MB/s       99.1MB/s       -0.35%

Change-Id: I914d8cadded5a230509d118164a4c201601afc06
Reviewed-on: https://go-review.googlesource.com/16298
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-04 20:15:51 +00:00
Austin Clements
62ba520b23 runtime: eliminate getfull barrier from concurrent mark
Currently dedicated mark workers participate in the getfull barrier
during concurrent mark. However, the getfull barrier wasn't designed
for concurrent work and this causes no end of headaches.

In the concurrent setting, participants come and go. This makes mark
completion susceptible to live-lock: since dedicated workers are only
periodically polling for completion, it's possible for the program to
be in some transient worker each time one of the dedicated workers
wakes up to check if it can exit the getfull barrier. It also
complicates reasoning about the system because dedicated workers
participate directly in the getfull barrier, but transient workers
must instead use trygetfull because they have exit conditions that
aren't captured by getfull (e.g., fractional workers exit when
preempted). The complexity of implementing these exit conditions
contributed to #11677. Furthermore, the getfull barrier is inefficient
because we could be running user code instead of spinning on a P. In
effect, we're dedicating 25% of the CPU to marking even if that means
we have to spin to make that 25%. It also causes issues on Windows
because we can't actually sleep for 100µs (#8687).

Fix this by making dedicated workers no longer participate in the
getfull barrier. Instead, dedicated workers simply return to the
scheduler when they fail to get more work, regardless of what others
workers are doing, and the scheduler only starts new dedicated workers
if there's work available. Everything that needs to be handled by this
barrier is already handled by detection of mark completion.

This makes the system much more symmetric because all workers and
assists now use trygetfull during concurrent mark. It also loosens the
25% CPU target so that we can give some of that 25% back to user code
if there isn't enough work to keep the mark worker busy. And it
eliminates the problematic 100µs sleep on Windows during concurrent
mark (though not during mark termination).

The downside of this is that if we hit a bottleneck in the heap graph
that then expands back out, the system may shut down dedicated workers
and take a while to start them back up. We'll address this in the next
commit.

Updates #12041 and #8687.

No effect on the go1 benchmarks. This slows down the garbage benchmark
by 9%, but we'll more than make it up in the next commit.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.80ms ± 2%  6.32ms ± 4%  +9.03%  (p=0.000 n=20+20)

Change-Id: I65100a9ba005a8b5cf97940798918672ea9dd09b
Reviewed-on: https://go-review.googlesource.com/16297
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-04 20:15:39 +00:00
Austin Clements
3a765430c1 cmd/compile: add go:nowritebarrierrec annotation
This introduces a recursive variant of the go:nowritebarrier
annotation that prohibits write barriers not only in the annotated
function, but in all functions it calls, recursively. The error
message gives the shortest call stack from the annotated function to
the function containing the prohibited write barrier, including the
names of the functions and the line numbers of the calls.

To demonstrate the annotation, we apply it to gcmarkwb_m, the write
barrier itself.

This is a new annotation rather than a modification of the existing
go:nowritebarrier annotation because, for better or worse, there are
many go:nowritebarrier functions that do call functions with write
barriers. In most of these cases this is benign because the annotation
was conservative, but it prohibits simply coopting the existing
annotation.

Change-Id: I225ca483c8f699e8436373ed96349e80ca2c2479
Reviewed-on: https://go-review.googlesource.com/16554
Reviewed-by: Keith Randall <khr@golang.org>
2015-11-04 14:42:04 +00:00
Dmitry Vyukov
ee0305e036 runtime: remove dead code
runtime.free has long gone.

Change-Id: I058f69e6481b8fa008e1951c29724731a8a3d081
Reviewed-on: https://go-review.googlesource.com/16593
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2015-11-03 19:20:21 +00:00
Austin Clements
b6c0934a9b runtime: cache two workbufs to reduce contention
Currently the gcWork abstraction caches a single work buffer. As a
result, if a worker is putting and getting pointers right at the
boundary of a work buffer, it can flap between work buffers and
(potentially significantly) increase contention on the global work
buffer lists.

This change modifies gcWork to instead cache two work buffers and
switch off between them. This introduces one buffers' worth of
hysteresis and eliminates the above performance worst case by
amortizing the cost of getting or putting a work buffer over at least
one buffers' worth of work.

In practice, it's difficult to trigger this worst case with reasonably
large work buffers. On the garbage benchmark, this reduces the max
writes/sec to the global work list from 32K to 25K and the median from
6K to 5K. However, if a workload were to trigger this worst case
behavior, it could significantly drive up this contention.

This has negligible effects on the go1 benchmarks and slightly speeds
up the garbage benchmark.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.90ms ± 3%  5.83ms ± 4%  -1.18%  (p=0.011 n=18+18)

name                      old time/op    new time/op    delta
BinaryTree17-12              3.22s ± 4%     3.17s ± 3%  -1.57%  (p=0.009 n=19+20)
Fannkuch11-12                2.44s ± 1%     2.53s ± 4%  +3.78%  (p=0.000 n=18+19)
FmtFprintfEmpty-12          50.2ns ± 2%    50.5ns ± 5%    ~     (p=0.631 n=19+20)
FmtFprintfString-12          167ns ± 1%     166ns ± 1%    ~     (p=0.141 n=20+20)
FmtFprintfInt-12             162ns ± 1%     159ns ± 1%  -1.80%  (p=0.000 n=20+20)
FmtFprintfIntInt-12          277ns ± 2%     263ns ± 1%  -4.78%  (p=0.000 n=20+18)
FmtFprintfPrefixedInt-12     240ns ± 1%     232ns ± 2%  -3.25%  (p=0.000 n=20+20)
FmtFprintfFloat-12           311ns ± 1%     315ns ± 2%  +1.17%  (p=0.000 n=20+20)
FmtManyArgs-12              1.05µs ± 2%    1.03µs ± 2%  -1.72%  (p=0.000 n=20+20)
GobDecode-12                8.65ms ± 1%    8.71ms ± 2%  +0.68%  (p=0.001 n=19+20)
GobEncode-12                6.51ms ± 1%    6.54ms ± 1%  +0.42%  (p=0.047 n=20+19)
Gzip-12                      318ms ± 2%     315ms ± 2%  -1.20%  (p=0.000 n=19+19)
Gunzip-12                   42.2ms ± 2%    42.1ms ± 1%    ~     (p=0.667 n=20+19)
HTTPClientServer-12         62.5µs ± 1%    62.4µs ± 1%    ~     (p=0.110 n=20+18)
JSONEncode-12               16.8ms ± 1%    16.8ms ± 2%    ~     (p=0.569 n=19+20)
JSONDecode-12               60.8ms ± 2%    59.8ms ± 1%  -1.69%  (p=0.000 n=19+19)
Mandelbrot200-12            3.87ms ± 1%    3.85ms ± 0%  -0.61%  (p=0.001 n=20+17)
GoParse-12                  3.76ms ± 2%    3.76ms ± 1%    ~     (p=0.698 n=20+20)
RegexpMatchEasy0_32-12       100ns ± 2%     101ns ± 2%    ~     (p=0.065 n=19+20)
RegexpMatchEasy0_1K-12       342ns ± 2%     333ns ± 1%  -2.82%  (p=0.000 n=20+19)
RegexpMatchEasy1_32-12      83.3ns ± 2%    83.2ns ± 2%    ~     (p=0.692 n=20+19)
RegexpMatchEasy1_1K-12       498ns ± 2%     490ns ± 1%  -1.52%  (p=0.000 n=18+20)
RegexpMatchMedium_32-12      131ns ± 2%     131ns ± 2%    ~     (p=0.464 n=20+18)
RegexpMatchMedium_1K-12     39.3µs ± 2%    39.6µs ± 1%  +0.77%  (p=0.000 n=18+19)
RegexpMatchHard_32-12       2.04µs ± 2%    2.06µs ± 1%  +0.69%  (p=0.009 n=19+20)
RegexpMatchHard_1K-12       61.4µs ± 2%    62.1µs ± 1%  +1.21%  (p=0.000 n=19+20)
Revcomp-12                   534ms ± 1%     529ms ± 1%  -0.97%  (p=0.000 n=19+16)
Template-12                 70.4ms ± 2%    70.0ms ± 1%    ~     (p=0.070 n=19+19)
TimeParse-12                 359ns ± 3%     344ns ± 1%  -4.15%  (p=0.000 n=19+19)
TimeFormat-12                357ns ± 1%     361ns ± 2%  +1.05%  (p=0.002 n=20+20)
[Geo mean]                  62.4µs         62.0µs       -0.56%

name                      old speed      new speed      delta
GobDecode-12              88.7MB/s ± 1%  88.1MB/s ± 2%  -0.68%  (p=0.001 n=19+20)
GobEncode-12               118MB/s ± 1%   117MB/s ± 1%  -0.42%  (p=0.046 n=20+19)
Gzip-12                   60.9MB/s ± 2%  61.7MB/s ± 2%  +1.21%  (p=0.000 n=19+19)
Gunzip-12                  460MB/s ± 2%   461MB/s ± 1%    ~     (p=0.661 n=20+19)
JSONEncode-12              116MB/s ± 1%   115MB/s ± 2%    ~     (p=0.555 n=19+20)
JSONDecode-12             31.9MB/s ± 2%  32.5MB/s ± 1%  +1.72%  (p=0.000 n=19+19)
GoParse-12                15.4MB/s ± 2%  15.4MB/s ± 1%    ~     (p=0.653 n=20+20)
RegexpMatchEasy0_32-12     317MB/s ± 2%   315MB/s ± 2%    ~     (p=0.141 n=19+20)
RegexpMatchEasy0_1K-12    2.99GB/s ± 2%  3.07GB/s ± 1%  +2.86%  (p=0.000 n=20+19)
RegexpMatchEasy1_32-12     384MB/s ± 2%   385MB/s ± 2%    ~     (p=0.672 n=20+19)
RegexpMatchEasy1_1K-12    2.06GB/s ± 2%  2.09GB/s ± 1%  +1.54%  (p=0.000 n=18+20)
RegexpMatchMedium_32-12   7.62MB/s ± 2%  7.63MB/s ± 2%    ~     (p=0.800 n=20+18)
RegexpMatchMedium_1K-12   26.0MB/s ± 1%  25.8MB/s ± 1%  -0.77%  (p=0.000 n=18+19)
RegexpMatchHard_32-12     15.7MB/s ± 2%  15.6MB/s ± 1%  -0.69%  (p=0.010 n=19+20)
RegexpMatchHard_1K-12     16.7MB/s ± 2%  16.5MB/s ± 1%  -1.19%  (p=0.000 n=19+20)
Revcomp-12                 476MB/s ± 1%   481MB/s ± 1%  +0.97%  (p=0.000 n=19+16)
Template-12               27.6MB/s ± 2%  27.7MB/s ± 1%    ~     (p=0.071 n=19+19)
[Geo mean]                99.1MB/s       99.3MB/s       +0.27%

Change-Id: I68bcbf74ccb716cd5e844a554f67b679135105e6
Reviewed-on: https://go-review.googlesource.com/16042
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-03 19:12:10 +00:00
Dmitry Vyukov
bf606094ee runtime: fix finalization and profiling of tiny allocations
Handling of special records for tiny allocations has two problems:
1. Once we queue a finalizer we mark the object. As the result any
   subsequent finalizers for the same object will not be queued
   during this GC cycle. If we have 16 finalizers setup (the worst case),
   finalization will take 16 GC cycles. This is what caused misbehave
   of tinyfin.go. The actual flakiness was caused by the fact that fing
   is asynchronous and don't always run before the check.
2. If a tiny block has both finalizer and profile specials,
   it is possible that we both queue finalizer, preserve the object live
   and free the profile record. As the result heap profile can be skewed.

Fix both issues by analyzing all special records for a single object at once.

Also, make tinyfin test stricter and remove reliance on real time.

Also, add a test for the problem 2. Currently heap profile missed about
a half of live memory.

Fixes #13100

Change-Id: I9ae4dc1c44893724138a4565ca5cae29f2e97544
Reviewed-on: https://go-review.googlesource.com/16591
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
2015-11-03 18:57:18 +00:00
Ilya Tocar
95333aea53 strings: add asm version of Index() for short strings on amd64
Currently we have special case for 1-byte strings,
This extends this to strings shorter than 32 bytes on amd64.
Results (broadwell):

name                 old time/op  new time/op  delta
IndexRune-4          57.4ns ± 0%  57.5ns ± 0%   +0.10%        (p=0.000 n=20+19)
IndexRuneFastPath-4  20.4ns ± 0%  20.4ns ± 0%     ~     (all samples are equal)
Index-4              21.0ns ± 0%  21.8ns ± 0%   +3.81%        (p=0.000 n=20+20)
LastIndex-4          7.07ns ± 1%  6.98ns ± 0%   -1.21%        (p=0.000 n=20+16)
IndexByte-4          18.3ns ± 0%  18.3ns ± 0%     ~     (all samples are equal)
IndexHard1-4         1.46ms ± 0%  0.39ms ± 0%  -73.06%        (p=0.000 n=16+16)
IndexHard2-4         1.46ms ± 0%  0.30ms ± 0%  -79.55%        (p=0.000 n=18+18)
IndexHard3-4         1.46ms ± 0%  0.66ms ± 0%  -54.68%        (p=0.000 n=19+19)
LastIndexHard1-4     1.46ms ± 0%  1.46ms ± 0%   -0.01%        (p=0.036 n=18+20)
LastIndexHard2-4     1.46ms ± 0%  1.46ms ± 0%     ~           (p=0.588 n=19+19)
LastIndexHard3-4     1.46ms ± 0%  1.46ms ± 0%     ~           (p=0.283 n=17+20)
IndexTorture-4       11.1µs ± 0%  11.1µs ± 0%   +0.01%        (p=0.000 n=18+17)

Change-Id: I892781549f558f698be4e41f9f568e3d0611efb5
Reviewed-on: https://go-review.googlesource.com/16430
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
2015-11-03 16:04:28 +00:00
Austin Clements
1870572180 runtime: enlarge GC work buffer size
Currently the GC work buffers are only 256 bytes and hence can record
only 24 64-bit pointer. They were reduced from 4K in commits db7fd1c
and a15818f as a way to minimize the amount of work the per-P workbuf
caches could "hide" from the mark phase and carry in to the mark
termination phase. However, this approach wasn't very robust and we
later added a "mark 2" phase to address this problem head-on.

Because of mark 2, there's now no benefit to having very small work
buffers. But there are plenty of downsides: small work buffers
increase contention on the work lists, increase the frequency and
hence net overhead of acquiring and releasing work buffers, and
somewhat increase memory overhead of the GC.

This commit expands work buffers back to 4K (504 64-bit pointers).
This reduces the rate of writes to work.full in the garbage benchmark
from a peak of ~780,000 writes/sec to a peak of ~32,000 writes/sec.

This has negligible effect on the go1 benchmarks. It slightly slows
down the garbage benchmark.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.37ms ± 5%  5.60ms ± 2%  +4.37%  (p=0.000 n=20+20)

Change-Id: Ic9cc28e7a125d23d9faf4f5e690fb8aa9bcdfb28
Reviewed-on: https://go-review.googlesource.com/15893
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-03 15:53:38 +00:00
Austin Clements
456528304d runtime: make assists preemptible
Currently, assists are non-preemptible, which means a heavily
assisting G can block other Gs from running. At the beginning of a GC
cycle, it can also delay scang, which will spin until the assist is
done. Since scanning is currently done sequentially, this can
seriously extend the length of the scan phase.

Fix this by making assists preemptible. Since the assist holds work
buffers and runs on the system stack, this must be done cooperatively:
we make gcDrainN return on preemption, and make the assist return from
the system stack and voluntarily Gosched.

This is prerequisite to enlarging the work buffers. Without this
change, the delays and spinning in scang increase significantly.

This has no effect on the go1 benchmarks.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.72ms ± 4%  5.37ms ± 5%  -6.11%  (p=0.000 n=20+20)

Change-Id: I829e732a0f23b126da633516a1a9ec1a508fdbf1
Reviewed-on: https://go-review.googlesource.com/15894
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-03 15:53:31 +00:00
Austin Clements
15aa6bbd5a runtime: replace assist sleep loop with park/ready
GC assists must block until the assist can be satisfied (either
through stealing credit or doing work) or the GC cycle ends.
Currently, this is implemented as a retry loop with a 100 µs delay.
This obviously isn't ideal, as it wastes CPU and delays mutator
execution. It also has the somewhat peculiar downside that sleeping a
G requires allocation, and this requires working around recursive
allocation.

Replace this timed delay with a proper scheduling queue. When an
assist can't be satisfied immediately, it adds the allocating G to a
queue and parks it. Any time background scan credit is flushed, it
consults this queue, directly satisfies the debt of queued assists,
and wakes up satisfied assists before flushing any remaining credit to
the background credit pool.

No effect on the go1 benchmarks. Slightly speeds up the garbage
benchmark.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.81ms ± 1%  5.72ms ± 4%  -1.65%  (p=0.011 n=20+20)

Updates #12041.

Change-Id: I8ee3b6274dd097b12b10a8030796a958a4b0e7b7
Reviewed-on: https://go-review.googlesource.com/15890
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-03 15:53:25 +00:00
Austin Clements
0ca4488cc1 runtime: change p.runq from []*g to []guintptr
This eliminates many write barriers in the scheduler code that are
unnecessary and will interfere with upcoming changes where the garbage
collector will have to invoke run queue functions in contexts that
must not have write barriers.

Change-Id: I702d0ac99cfd00ffff406e7362917db6a43e7e55
Reviewed-on: https://go-review.googlesource.com/16556
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-11-03 15:53:18 +00:00
Todd Neal
e3e0122ae2 test: use go:noinline consistently
Replace various implementations of inlining prevention with
"go:noinline"

Change-Id: Iac90895c3a62d6f4b7a6c72e11e165d15a0abfa4
Reviewed-on: https://go-review.googlesource.com/16510
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-03 02:01:34 +00:00
Ilya Tocar
0e23ca41d9 bytes: speed up Compare() on amd64
Use AVX2 if available.
Results (haswell), below:

name                           old time/op    new time/op     delta
BytesCompare1-6                  11.4ns ± 0%     11.4ns ± 0%     ~     (all samples are equal)
BytesCompare2-6                  11.4ns ± 0%     11.4ns ± 0%     ~     (all samples are equal)
BytesCompare4-6                  11.4ns ± 0%     11.4ns ± 0%     ~     (all samples are equal)
BytesCompare8-6                  9.29ns ± 2%     8.76ns ± 0%   -5.72%        (p=0.000 n=16+17)
BytesCompare16-6                 9.29ns ± 2%     9.20ns ± 0%   -1.02%        (p=0.000 n=20+16)
BytesCompare32-6                 11.4ns ± 1%     11.4ns ± 0%     ~           (p=0.191 n=20+20)
BytesCompare64-6                 14.4ns ± 0%     13.1ns ± 0%   -8.68%        (p=0.000 n=20+20)
BytesCompare128-6                20.2ns ± 0%     18.5ns ± 0%   -8.27%        (p=0.000 n=16+20)
BytesCompare256-6                29.3ns ± 0%     24.5ns ± 0%  -16.38%        (p=0.000 n=16+16)
BytesCompare512-6                46.8ns ± 0%     37.1ns ± 0%  -20.78%        (p=0.000 n=18+16)
BytesCompare1024-6               82.9ns ± 0%     62.3ns ± 0%  -24.86%        (p=0.000 n=20+14)
BytesCompare2048-6                155ns ± 0%      112ns ± 0%  -27.74%        (p=0.000 n=20+20)
CompareBytesEqual-6              10.1ns ± 1%     10.0ns ± 1%     ~           (p=0.527 n=20+20)
CompareBytesToNil-6              10.0ns ± 2%      9.4ns ± 0%   -6.57%        (p=0.000 n=20+17)
CompareBytesEmpty-6              8.76ns ± 0%     8.76ns ± 0%     ~     (all samples are equal)
CompareBytesIdentical-6          8.76ns ± 0%     8.76ns ± 0%     ~     (all samples are equal)
CompareBytesSameLength-6         10.6ns ± 1%     10.6ns ± 1%     ~           (p=0.240 n=20+20)
CompareBytesDifferentLength-6    10.6ns ± 0%     10.6ns ± 1%     ~           (p=1.000 n=20+20)
CompareBytesBigUnaligned-6        132±s ± 1%      105±s ± 1%  -20.61%        (p=0.000 n=20+18)
CompareBytesBig-6                 125±s ± 1%      105±s ± 1%  -16.31%        (p=0.000 n=20+20)
CompareBytesBigIdentical-6       8.13ns ± 0%     8.13ns ± 0%     ~     (all samples are equal)

name                           old speed      new speed       delta
CompareBytesBigUnaligned-6     7.94GB/s ± 1%  10.01GB/s ± 1%  +25.96%        (p=0.000 n=20+18)
CompareBytesBig-6              8.38GB/s ± 1%  10.01GB/s ± 1%  +19.48%        (p=0.000 n=20+20)
CompareBytesBigIdentical-6      129TB/s ± 0%    129TB/s ± 0%   +0.01%        (p=0.003 n=17+19)

Change-Id: I820f31bab4582dd4204b146bb077c0d2f24cd8f5
Reviewed-on: https://go-review.googlesource.com/16434
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Klaus Post <klauspost@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2015-11-02 18:39:38 +00:00
Michael Hudson-Doyle
35d71d6727 cmd/go, runtime: define GOBUILDMODE_shared rather than shared when dynamically linking
To avoid collisions with what existing code may already be doing.

Change-Id: Ice639440aafc0724714c25333d90a49954372230
Reviewed-on: https://go-review.googlesource.com/16503
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-01 19:52:33 +00:00
Austin Clements
fbf273250f runtime: perform mark 2 root re-scanning in GC workers
This moves another root scanning task out of the GC coordinator and
parallelizes it on the GC workers.

This has negligible effect on the go1 benchmarks and the garbage
benchmark.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.24ms ± 1%  5.26ms ± 1%  +0.30%  (p=0.007 n=18+17)

name                      old time/op    new time/op    delta
BinaryTree17-12              3.20s ± 5%     3.21s ± 5%    ~     (p=0.264 n=20+18)
Fannkuch11-12                2.46s ± 1%     2.54s ± 2%  +3.09%  (p=0.000 n=18+20)
FmtFprintfEmpty-12          49.9ns ± 4%    50.0ns ± 5%    ~     (p=0.356 n=20+20)
FmtFprintfString-12          170ns ± 1%     170ns ± 2%    ~     (p=0.815 n=19+20)
FmtFprintfInt-12             160ns ± 1%     159ns ± 1%  -0.63%  (p=0.003 n=18+19)
FmtFprintfIntInt-12          270ns ± 1%     267ns ± 1%  -1.00%  (p=0.000 n=19+18)
FmtFprintfPrefixedInt-12     238ns ± 1%     232ns ± 1%  -2.28%  (p=0.000 n=19+19)
FmtFprintfFloat-12           310ns ± 2%     313ns ± 2%  +0.93%  (p=0.000 n=19+19)
FmtManyArgs-12              1.06µs ± 1%    1.04µs ± 1%  -1.93%  (p=0.000 n=20+19)
GobDecode-12                8.63ms ± 1%    8.70ms ± 1%  +0.81%  (p=0.001 n=20+19)
GobEncode-12                6.52ms ± 1%    6.56ms ± 1%  +0.66%  (p=0.000 n=20+19)
Gzip-12                      318ms ± 1%     319ms ± 1%    ~     (p=0.405 n=17+18)
Gunzip-12                   42.1ms ± 2%    42.0ms ± 1%    ~     (p=0.771 n=20+19)
HTTPClientServer-12         62.6µs ± 1%    62.9µs ± 1%  +0.41%  (p=0.038 n=20+20)
JSONEncode-12               16.9ms ± 1%    16.9ms ± 1%    ~     (p=0.077 n=18+20)
JSONDecode-12               60.7ms ± 1%    62.3ms ± 1%  +2.73%  (p=0.000 n=20+20)
Mandelbrot200-12            3.86ms ± 1%    3.85ms ± 1%    ~     (p=0.084 n=19+20)
GoParse-12                  3.75ms ± 2%    3.73ms ± 1%    ~     (p=0.107 n=20+19)
RegexpMatchEasy0_32-12       100ns ± 2%     101ns ± 2%  +0.97%  (p=0.001 n=20+19)
RegexpMatchEasy0_1K-12       342ns ± 2%     332ns ± 2%  -2.86%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12      83.2ns ± 2%    82.8ns ± 2%    ~     (p=0.108 n=19+20)
RegexpMatchEasy1_1K-12       495ns ± 2%     490ns ± 2%  -1.04%  (p=0.000 n=18+19)
RegexpMatchMedium_32-12      130ns ± 2%     131ns ± 2%    ~     (p=0.291 n=20+20)
RegexpMatchMedium_1K-12     39.3µs ± 1%    39.9µs ± 1%  +1.54%  (p=0.000 n=18+20)
RegexpMatchHard_32-12       2.02µs ± 1%    2.05µs ± 2%  +1.19%  (p=0.000 n=19+19)
RegexpMatchHard_1K-12       60.9µs ± 1%    61.5µs ± 1%  +0.99%  (p=0.000 n=18+18)
Revcomp-12                   535ms ± 1%     531ms ± 1%  -0.82%  (p=0.000 n=17+17)
Template-12                 73.0ms ± 1%    74.1ms ± 1%  +1.47%  (p=0.000 n=20+20)
TimeParse-12                 356ns ± 2%     348ns ± 1%  -2.30%  (p=0.000 n=20+20)
TimeFormat-12                347ns ± 1%     353ns ± 1%  +1.68%  (p=0.000 n=19+20)
[Geo mean]                  62.3µs         62.4µs       +0.12%

name                      old speed      new speed      delta
GobDecode-12              88.9MB/s ± 1%  88.2MB/s ± 1%  -0.81%  (p=0.001 n=20+19)
GobEncode-12               118MB/s ± 1%   117MB/s ± 1%  -0.66%  (p=0.000 n=20+19)
Gzip-12                   60.9MB/s ± 1%  60.8MB/s ± 1%    ~     (p=0.409 n=17+18)
Gunzip-12                  461MB/s ± 2%   462MB/s ± 1%    ~     (p=0.765 n=20+19)
JSONEncode-12              115MB/s ± 1%   115MB/s ± 1%    ~     (p=0.078 n=18+20)
JSONDecode-12             32.0MB/s ± 1%  31.1MB/s ± 1%  -2.65%  (p=0.000 n=20+20)
GoParse-12                15.5MB/s ± 2%  15.5MB/s ± 1%    ~     (p=0.111 n=20+19)
RegexpMatchEasy0_32-12     318MB/s ± 2%   314MB/s ± 2%  -1.27%  (p=0.000 n=20+19)
RegexpMatchEasy0_1K-12    2.99GB/s ± 1%  3.08GB/s ± 2%  +2.94%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12     385MB/s ± 2%   386MB/s ± 2%    ~     (p=0.105 n=19+20)
RegexpMatchEasy1_1K-12    2.07GB/s ± 1%  2.09GB/s ± 2%  +1.06%  (p=0.000 n=18+19)
RegexpMatchMedium_32-12   7.64MB/s ± 2%  7.61MB/s ± 1%    ~     (p=0.179 n=20+20)
RegexpMatchMedium_1K-12   26.1MB/s ± 1%  25.7MB/s ± 1%  -1.52%  (p=0.000 n=18+20)
RegexpMatchHard_32-12     15.8MB/s ± 1%  15.6MB/s ± 2%  -1.18%  (p=0.000 n=19+19)
RegexpMatchHard_1K-12     16.8MB/s ± 2%  16.6MB/s ± 1%  -0.90%  (p=0.000 n=19+18)
Revcomp-12                 475MB/s ± 1%   479MB/s ± 1%  +0.83%  (p=0.000 n=17+17)
Template-12               26.6MB/s ± 1%  26.2MB/s ± 1%  -1.45%  (p=0.000 n=20+20)
[Geo mean]                99.0MB/s       98.7MB/s       -0.32%

Change-Id: I6ea44d7a59aaa6851c64695277ab65645ff9d32e
Reviewed-on: https://go-review.googlesource.com/16070
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-10-30 22:46:39 +00:00
Austin Clements
82d14d77da runtime: perform concurrent scan in GC workers
Currently the concurrent root scan is performed in its entirety by the
GC coordinator before entering concurrent mark (which enables GC
workers). This scan is done sequentially, which can prolong the scan
phase, delay the mark phase, and means that the scan phase does not
obey the 25% CPU goal. Furthermore, there's no need to complete the
root scan before starting marking (in fact, we already allow GC
assists to happen during the scan phase), so this acts as an
unnecessary barrier between root scanning and marking.

This change shifts the root scan work out of the GC coordinator and in
to the GC workers. The coordinator simply sets up the scan state and
enqueues the right number of root scan jobs. The GC workers then drain
the root scan jobs prior to draining heap scan jobs.

This parallelizes the root scan process, makes it obey the 25% CPU
goal, and effectively eliminates root scanning as an isolated phase,
allowing the system to smoothly transition from root scanning to heap
marking. This also eliminates a major non-STW responsibility of the GC
coordinator, which will make it easier to switch to a decentralized
state machine. Finally, it puts us in a good position to perform root
scanning in assists as well, which will help satisfy assists at the
beginning of the GC cycle.

This is mostly straightforward. One tricky aspect is that we have to
deal with preemption deadlock: where two non-preemptible gorountines
are trying to preempt each other to perform a stack scan. Given the
context where this happens, the only instance of this is two
background workers trying to scan each other. We avoid this by simply
not scanning the stacks of background workers during the concurrent
phase; this is safe because we'll scan them during mark termination
(and their stacks are *very* small and should not contain any new
pointers).

This change also switches the root marking during mark termination to
use the same gcDrain-based code path as concurrent mark. This
shouldn't affect performance because STW root marking was already
parallel and tasks switched to heap marking immediately when no more
root marking tasks were available. However, it simplifies the code and
unifies these code paths.

This has negligible effect on the go1 benchmarks. It slightly slows
down the garbage benchmark, possibly by making GC run slightly more
frequently.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.10ms ± 1%  5.24ms ± 1%  +2.87%  (p=0.000 n=18+18)

name                      old time/op    new time/op    delta
BinaryTree17-12              3.25s ± 3%     3.20s ± 5%  -1.57%  (p=0.013 n=20+20)
Fannkuch11-12                2.45s ± 1%     2.46s ± 1%  +0.38%  (p=0.019 n=20+18)
FmtFprintfEmpty-12          49.7ns ± 3%    49.9ns ± 4%    ~     (p=0.851 n=19+20)
FmtFprintfString-12          170ns ± 2%     170ns ± 1%    ~     (p=0.775 n=20+19)
FmtFprintfInt-12             161ns ± 1%     160ns ± 1%  -0.78%  (p=0.000 n=19+18)
FmtFprintfIntInt-12          267ns ± 1%     270ns ± 1%  +1.04%  (p=0.000 n=19+19)
FmtFprintfPrefixedInt-12     238ns ± 2%     238ns ± 1%    ~     (p=0.133 n=18+19)
FmtFprintfFloat-12           311ns ± 1%     310ns ± 2%  -0.35%  (p=0.023 n=20+19)
FmtManyArgs-12              1.08µs ± 1%    1.06µs ± 1%  -2.31%  (p=0.000 n=20+20)
GobDecode-12                8.65ms ± 1%    8.63ms ± 1%    ~     (p=0.377 n=18+20)
GobEncode-12                6.49ms ± 1%    6.52ms ± 1%  +0.37%  (p=0.015 n=20+20)
Gzip-12                      319ms ± 3%     318ms ± 1%    ~     (p=0.975 n=19+17)
Gunzip-12                   41.9ms ± 1%    42.1ms ± 2%  +0.65%  (p=0.004 n=19+20)
HTTPClientServer-12         61.7µs ± 1%    62.6µs ± 1%  +1.40%  (p=0.000 n=18+20)
JSONEncode-12               16.8ms ± 1%    16.9ms ± 1%    ~     (p=0.239 n=20+18)
JSONDecode-12               58.4ms ± 1%    60.7ms ± 1%  +3.85%  (p=0.000 n=19+20)
Mandelbrot200-12            3.86ms ± 0%    3.86ms ± 1%    ~     (p=0.092 n=18+19)
GoParse-12                  3.75ms ± 2%    3.75ms ± 2%    ~     (p=0.708 n=19+20)
RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 2%  +0.60%  (p=0.010 n=17+20)
RegexpMatchEasy0_1K-12       341ns ± 1%     342ns ± 2%    ~     (p=0.203 n=20+19)
RegexpMatchEasy1_32-12      82.5ns ± 2%    83.2ns ± 2%  +0.83%  (p=0.007 n=19+19)
RegexpMatchEasy1_1K-12       495ns ± 1%     495ns ± 2%    ~     (p=0.970 n=19+18)
RegexpMatchMedium_32-12      130ns ± 2%     130ns ± 2%  +0.59%  (p=0.039 n=19+20)
RegexpMatchMedium_1K-12     39.2µs ± 1%    39.3µs ± 1%    ~     (p=0.214 n=18+18)
RegexpMatchHard_32-12       2.03µs ± 2%    2.02µs ± 1%    ~     (p=0.166 n=18+19)
RegexpMatchHard_1K-12       61.0µs ± 1%    60.9µs ± 1%    ~     (p=0.169 n=20+18)
Revcomp-12                   533ms ± 1%     535ms ± 1%    ~     (p=0.071 n=19+17)
Template-12                 68.1ms ± 2%    73.0ms ± 1%  +7.26%  (p=0.000 n=19+20)
TimeParse-12                 355ns ± 2%     356ns ± 2%    ~     (p=0.530 n=19+20)
TimeFormat-12                357ns ± 2%     347ns ± 1%  -2.59%  (p=0.000 n=20+19)
[Geo mean]                  62.1µs         62.3µs       +0.31%

name                      old speed      new speed      delta
GobDecode-12              88.7MB/s ± 1%  88.9MB/s ± 1%    ~     (p=0.377 n=18+20)
GobEncode-12               118MB/s ± 1%   118MB/s ± 1%  -0.37%  (p=0.015 n=20+20)
Gzip-12                   60.9MB/s ± 3%  60.9MB/s ± 1%    ~     (p=0.944 n=19+17)
Gunzip-12                  464MB/s ± 1%   461MB/s ± 2%  -0.64%  (p=0.004 n=19+20)
JSONEncode-12              115MB/s ± 1%   115MB/s ± 1%    ~     (p=0.236 n=20+18)
JSONDecode-12             33.2MB/s ± 1%  32.0MB/s ± 1%  -3.71%  (p=0.000 n=19+20)
GoParse-12                15.5MB/s ± 2%  15.5MB/s ± 2%    ~     (p=0.702 n=19+20)
RegexpMatchEasy0_32-12     320MB/s ± 1%   318MB/s ± 2%    ~     (p=0.094 n=18+20)
RegexpMatchEasy0_1K-12    3.00GB/s ± 1%  2.99GB/s ± 1%    ~     (p=0.194 n=20+19)
RegexpMatchEasy1_32-12     388MB/s ± 2%   385MB/s ± 2%  -0.83%  (p=0.008 n=19+19)
RegexpMatchEasy1_1K-12    2.07GB/s ± 1%  2.07GB/s ± 1%    ~     (p=0.964 n=19+18)
RegexpMatchMedium_32-12   7.68MB/s ± 1%  7.64MB/s ± 2%  -0.57%  (p=0.020 n=19+20)
RegexpMatchMedium_1K-12   26.1MB/s ± 1%  26.1MB/s ± 1%    ~     (p=0.211 n=18+18)
RegexpMatchHard_32-12     15.8MB/s ± 1%  15.8MB/s ± 1%    ~     (p=0.180 n=18+19)
RegexpMatchHard_1K-12     16.8MB/s ± 1%  16.8MB/s ± 2%    ~     (p=0.236 n=20+19)
Revcomp-12                 477MB/s ± 1%   475MB/s ± 1%    ~     (p=0.071 n=19+17)
Template-12               28.5MB/s ± 2%  26.6MB/s ± 1%  -6.77%  (p=0.000 n=19+20)
[Geo mean]                 100MB/s       99.0MB/s       -0.82%

Change-Id: I875bf6ceb306d1ee2f470cabf88aa6ede27c47a0
Reviewed-on: https://go-review.googlesource.com/16059
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-30 22:46:31 +00:00
Austin Clements
4cca1cc05e runtime: consolidate "out of GC work" checks
We already have gcMarkWorkAvailable, but the check for GC mark work is
open-coded in several places. Generalize gcMarkWorkAvailable slightly
and replace these open-coded checks with calls to gcMarkWorkAvailable.

In addition to cleaning up the code, this puts us in a better position
to make this check slightly more complicated.

Change-Id: I1b29883300ecd82a1bf6be193e9b4ee96582a860
Reviewed-on: https://go-review.googlesource.com/16058
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-30 22:46:22 +00:00
Russ Cox
bf1de1b141 runtime: introduce GOTRACEBACK=single, now the default
Abandon (but still support) the old numbering system.

GOTRACEBACK=none is old 0
GOTRACEBACK=single is the new behavior
GOTRACEBACK=all is old 1
GOTRACEBACK=system is old 2
GOTRACEBACK=crash is unchanged

See doc comment change in runtime1.go for details.

Filed #13107 to decide whether to change default back to GOTRACEBACK=all for Go 1.6 release.
If you run into programs where printing only the current goroutine omits
needed information, please add details in a comment on that issue.

Fixes #12366.

Change-Id: I82ca8b99b5d86dceb3f7102d38d2659d45dbe0db
Reviewed-on: https://go-review.googlesource.com/16512
Reviewed-by: Austin Clements <austin@google.com>
2015-10-30 18:43:44 +00:00
Michael Hudson-Doyle
c9b8cab16c cmd/internal/obj, cmd/link, runtime: handle TLS more like a platform linker on ppc64
On ppc64x, the thread pointer, held in R13, points 0x7000 bytes past where
thread-local storage begins (presumably to maximize the amount of storage that
can be accessed with a 16-bit signed displacement). The relocations used to
indicate thread-local storage to the platform linker account for this, so to be
able to support external linking we need to change things so the linker applies
this offset instead of the runtime assembly.

Change-Id: I2556c249ab2d802cae62c44b2b4c5b44787d7059
Reviewed-on: https://go-review.googlesource.com/14233
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-10-29 22:24:29 +00:00
Michael Hudson-Doyle
8537ff8a39 runtime/cgo: export _cgo_reginit on ppc64x
This is needed to make external linking work.

Change-Id: I4cf7edb4ea318849cab92a697952f8745eed40c4
Reviewed-on: https://go-review.googlesource.com/14237
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-29 00:38:43 +00:00
Ian Lance Taylor
f6fd086d5e runtime: add missing word in comment
Change-Id: Iffe27445e35ec071cf0920a05c81b8a97a3ed712
Reviewed-on: https://go-review.googlesource.com/16431
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-28 23:09:44 +00:00
David Crawshaw
73ff7cb1ed runtime: c-shared entrypoint for linux/arm64
Change-Id: I7dab124842f5209097a8d5a802fcbdde650654fa
Reviewed-on: https://go-review.googlesource.com/16395
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-28 21:21:33 +00:00
Hyang-Ah Hana Kim
dfc8649854 runtime, cmd: TLS setup for android/amd64.
Android linker does not handle TLS for us. We set up the TLS slot
for g, as darwin/386,amd64 handle instead. This is disgusting and
fragile. We will eventually fix this ugly hack by taking advantage
of the recent TLS IE model implementation. (Instead of referencing
an GOT entry, make the code sequence look into the TLS variable that
holds the offset.)

The TLS slot for g in android/amd64 assumes a fixed offset from %fs.
See runtime/cgo/gcc_android_amd64.c for details.

For golang/go#10743

Change-Id: I1a3fc207946c665515f79026a56ea19134ede2dd
Reviewed-on: https://go-review.googlesource.com/15991
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-10-28 20:54:28 +00:00
Michael Hudson-Doyle
72180c3b82 cmd/internal/obj, cmd/link, runtime: native-ish support for tls on arm64
Fixes #10560

Change-Id: Iedffd9c236c4fbb386c3afc52c5a1457f96ef122
Reviewed-on: https://go-review.googlesource.com/13991
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-10-28 19:51:05 +00:00
David du Colombier
31430bda09 runtime: don't use FP when calling nextSample in the Plan 9 sighandler
In the Go signal handler on Plan 9, when a signal with
the _SigThrow flag is received, we call startpanic before
printing the stack trace.

The startpanic function calls systemstack which calls
startpanic_m. In the startpanic_m function, we call
allocmcache to allocate _g_.m.mcache. The problem is
that allocmcache calls nextSample, which does a floating
point operation to return a sampling point for heap profiling.

However, Plan 9 doesn't support floating point in the
signal handler.

This change adds a new function nextSampleNoFP, only
called when in the Plan 9 signal handler, which is
similar to nextSample, but avoids floating point.

Change-Id: Iaa30437aa0f7c8c84d40afbab7567ad3bd5ea2de
Reviewed-on: https://go-review.googlesource.com/16307
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-28 05:45:24 +00:00
Michael Hudson-Doyle
bc3f14fd2a runtime: invoke vsyscall helper via TCB when dynamic linking on linux/386
The dynamic linker on linux/386 stores the address of the vsyscall helper at a
fixed offset from the %gs register on linux/386 for easy access from PIC code.

Change-Id: I635305cfecceef2289985d62e676e16810ed6b94
Reviewed-on: https://go-review.googlesource.com/16346
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-28 01:36:25 +00:00
Matthew Dempsky
4ff231bca1 runtime: eliminate some unnecessary uintptr conversions
arena_{start,used,end} are already uintptr, so no need to convert them
to uintptr, much less to convert them to unsafe.Pointer and then to
uintptr.  No binary change to pkg/linux_amd64/runtime.a.

Change-Id: Ia4232ed2a724c44fde7eba403c5fe8e6dccaa879
Reviewed-on: https://go-review.googlesource.com/16339
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-27 02:53:04 +00:00
David du Colombier
d093bf489b runtime: handle abort note on Plan 9
Implement an abort note on Plan 9, as an
equivalent of the SIGABRT signal on other
operating systems.

Updates #11975.

Change-Id: I010c9b10f2fbd2471aacd1d073368d975a2f0592
Reviewed-on: https://go-review.googlesource.com/16300
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-26 22:12:30 +00:00
Matthew Dempsky
d18167fefe runtime: fix tiny allocator
When a new tiny block is allocated because we're allocating an object
that won't fit into the current block, mallocgc saves the new block if
it has more space leftover than the old block.  However, the logic for
this was subtly broken in golang.org/cl/2814, resulting in never
saving (or consequently reusing) a tiny block.

Change-Id: Ib5f6769451fb82877ddeefe75dfe79ed4a04fd40
Reviewed-on: https://go-review.googlesource.com/16330
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-26 21:14:15 +00:00
Austin Clements
d3df04cd8c runtime: partition data and BSS root marking
Currently data and BSS root marking are each a single markroot job.
This makes them difficult to load balance, which can draw out mark
termination time if they are large.

Fix this by splitting both in to 256K chunks. While we're putting in
the infrastructure for dynamic roots, we also replace the fixed
sharding of the span roots with sharding in to fixed sizes. In
addition to helping balance root marking, this also paves the way to
parallelizing concurrent scan and to letting assists help with root
marking.

Updates #10345. This fixes the data and BSS aspects of that bug; it
does not partition scanning of large heap objects.

This has negligible effect on either the go1 benchmarks or the garbage
benchmark:

name              old time/op  new time/op  delta
XBenchGarbage-12  4.90ms ± 1%  4.91ms ± 2%   ~     (p=0.058 n=17+16)

name                      old time/op    new time/op    delta
BinaryTree17-12              3.11s ± 4%     3.12s ± 4%    ~     (p=0.512 n=20+20)
Fannkuch11-12                2.53s ± 2%     2.47s ± 2%  -2.28%  (p=0.000 n=20+18)
FmtFprintfEmpty-12          49.1ns ± 1%    50.0ns ± 4%  +1.68%  (p=0.008 n=18+20)
FmtFprintfString-12          170ns ± 0%     172ns ± 1%  +1.05%  (p=0.000 n=14+19)
FmtFprintfInt-12             174ns ± 1%     162ns ± 1%  -6.81%  (p=0.000 n=18+17)
FmtFprintfIntInt-12          284ns ± 1%     277ns ± 1%  -2.42%  (p=0.000 n=20+19)
FmtFprintfPrefixedInt-12     252ns ± 1%     244ns ± 1%  -2.84%  (p=0.000 n=18+20)
FmtFprintfFloat-12           317ns ± 0%     311ns ± 0%  -1.95%  (p=0.000 n=19+18)
FmtManyArgs-12              1.08µs ± 1%    1.11µs ± 1%  +3.43%  (p=0.000 n=18+19)
GobDecode-12                8.56ms ± 1%    8.61ms ± 1%  +0.50%  (p=0.020 n=20+20)
GobEncode-12                6.58ms ± 1%    6.57ms ± 1%    ~     (p=0.792 n=20+19)
Gzip-12                      317ms ± 3%     317ms ± 2%    ~     (p=0.840 n=19+19)
Gunzip-12                   41.6ms ± 0%    41.6ms ± 0%  +0.07%  (p=0.027 n=18+15)
HTTPClientServer-12         62.2µs ± 1%    62.3µs ± 1%    ~     (p=0.283 n=19+20)
JSONEncode-12               16.5ms ± 2%    16.5ms ± 1%    ~     (p=0.857 n=20+19)
JSONDecode-12               58.5ms ± 1%    61.3ms ± 1%  +4.67%  (p=0.000 n=18+17)
Mandelbrot200-12            3.84ms ± 0%    3.84ms ± 0%    ~     (p=0.259 n=17+17)
GoParse-12                  3.70ms ± 2%    3.74ms ± 2%  +0.96%  (p=0.009 n=19+20)
RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 0%  +0.31%  (p=0.040 n=19+15)
RegexpMatchEasy0_1K-12       340ns ± 1%     340ns ± 1%    ~     (p=0.411 n=17+19)
RegexpMatchEasy1_32-12      82.7ns ± 2%    82.3ns ± 1%    ~     (p=0.456 n=20+19)
RegexpMatchEasy1_1K-12       498ns ± 2%     495ns ± 0%    ~     (p=0.108 n=19+17)
RegexpMatchMedium_32-12      130ns ± 1%     130ns ± 2%    ~     (p=0.405 n=18+19)
RegexpMatchMedium_1K-12     39.4µs ± 2%    39.1µs ± 1%  -0.64%  (p=0.002 n=20+19)
RegexpMatchHard_32-12       2.03µs ± 2%    2.02µs ± 0%    ~     (p=0.561 n=20+17)
RegexpMatchHard_1K-12       61.1µs ± 2%    60.8µs ± 1%    ~     (p=0.615 n=19+18)
Revcomp-12                   532ms ± 2%     531ms ± 1%    ~     (p=0.470 n=19+19)
Template-12                 68.5ms ± 1%    69.1ms ± 1%  +0.87%  (p=0.000 n=17+17)
TimeParse-12                 344ns ± 2%     344ns ± 1%  +0.25%  (p=0.032 n=19+18)
TimeFormat-12                347ns ± 1%     362ns ± 1%  +4.27%  (p=0.000 n=17+19)
[Geo mean]                  62.3µs         62.3µs       -0.04%

name                      old speed      new speed      delta
GobDecode-12              89.6MB/s ± 1%  89.2MB/s ± 1%  -0.50%  (p=0.019 n=20+20)
GobEncode-12               117MB/s ± 1%   117MB/s ± 1%    ~     (p=0.797 n=20+19)
Gzip-12                   61.3MB/s ± 3%  61.2MB/s ± 2%    ~     (p=0.834 n=19+19)
Gunzip-12                  467MB/s ± 0%   466MB/s ± 0%  -0.07%  (p=0.027 n=18+15)
JSONEncode-12              117MB/s ± 2%   117MB/s ± 1%    ~     (p=0.851 n=20+19)
JSONDecode-12             33.2MB/s ± 1%  31.7MB/s ± 1%  -4.47%  (p=0.000 n=18+17)
GoParse-12                15.6MB/s ± 2%  15.5MB/s ± 2%  -0.95%  (p=0.008 n=19+20)
RegexpMatchEasy0_32-12     321MB/s ± 2%   320MB/s ± 1%  -0.57%  (p=0.002 n=17+17)
RegexpMatchEasy0_1K-12    3.01GB/s ± 1%  3.01GB/s ± 1%    ~     (p=0.132 n=17+18)
RegexpMatchEasy1_32-12     387MB/s ± 2%   389MB/s ± 1%    ~     (p=0.423 n=20+19)
RegexpMatchEasy1_1K-12    2.05GB/s ± 2%  2.06GB/s ± 0%    ~     (p=0.129 n=19+17)
RegexpMatchMedium_32-12   7.64MB/s ± 1%  7.66MB/s ± 1%    ~     (p=0.258 n=18+19)
RegexpMatchMedium_1K-12   26.0MB/s ± 2%  26.2MB/s ± 1%  +0.64%  (p=0.002 n=20+19)
RegexpMatchHard_32-12     15.7MB/s ± 2%  15.8MB/s ± 1%    ~     (p=0.510 n=20+17)
RegexpMatchHard_1K-12     16.8MB/s ± 2%  16.8MB/s ± 1%    ~     (p=0.603 n=19+18)
Revcomp-12                 477MB/s ± 2%   479MB/s ± 1%    ~     (p=0.470 n=19+19)
Template-12               28.3MB/s ± 1%  28.1MB/s ± 1%  -0.85%  (p=0.000 n=17+17)
[Geo mean]                 100MB/s        100MB/s       -0.26%

Change-Id: Ib0bfe0145675ce88c5a8791752f7486ac98805b4
Reviewed-on: https://go-review.googlesource.com/16043
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-26 15:42:44 +00:00
David Crawshaw
21f35b33c2 runtime: use a 64kb system stack on arm
I went looking for an arm system whose stacks are by default smaller
than 64KB. In fact the smallest common linux target I could find was
Android, which like iOS uses 1MB stacks.

Fixes #11873

Change-Id: Ieeb66ad095b3da18d47ba21360ea75152a4107c6
Reviewed-on: https://go-review.googlesource.com/14602
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-10-26 15:10:34 +00:00
Caleb Spare
fb7178e7cc runtime: copy sqrt normalization bugfix from math
This copies the change from CL 16158 (applied as
22d4c8bf13).

Updates #13013

Change-Id: Id7d02e63d92806f06a4e064a91b2fb6574fe385f
Reviewed-on: https://go-review.googlesource.com/16291
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-23 23:43:47 +00:00
Matthew Dempsky
8ee0fd8623 runtime: replace is{plan9,solaris,windows} with GOOS tests
Change-Id: I27589395f547c5837dc7536a0ab5bc7cc23a4ff6
Reviewed-on: https://go-review.googlesource.com/10872
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-23 18:11:17 +00:00
Alex Brainman
6410e67a1e runtime: account for cpu affinity in windows NumCPU
Fixes #11671

Change-Id: Ide1f8d92637dad2a2faed391329f9b6001789b76
Reviewed-on: https://go-review.googlesource.com/14742
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-23 07:54:42 +00:00
Austin Clements
beedb1ec33 runtime: add pcvalue cache to improve stack scan speed
The cost of scanning large stacks is currently dominated by the time
spent looking up and decoding the pcvalue table. However, large stacks
are usually large not because they contain calls to many different
functions, but because they contain many calls to the same, small set
of recursive functions. Hence, walking large stacks tends to make the
same pcvalue queries many times.

Based on this observation, this commit adds a small, very simple, and
fast cache in front of pcvalue lookup. We thread this cache down from
operations that make many pcvalue calls, such as gentraceback, stack
scanning, and stack adjusting.

This simple cache works well because it has minimal overhead when it's
not effective. I also tried a hashed direct-map cache, CLOCK-based
replacement, round-robin replacement, and round-robin with lookups
disabled until there had been at least 16 probes, but none of these
approaches had obvious wins over the random replacement policy in this
commit.

This nearly doubles the overall performance of the deep stack test
program from issue #10898:

name        old time/op  new time/op  delta
Issue10898   16.5s ±12%    9.2s ±12%  -44.37%  (p=0.008 n=5+5)

It's a very slight win on the garbage benchmark:

name              old time/op  new time/op  delta
XBenchGarbage-12  4.92ms ± 1%  4.89ms ± 1%  -0.75%  (p=0.000 n=18+19)

It's a wash (but doesn't harm performance) on the go1 benchmarks,
which don't have particularly deep stacks:

name                      old time/op    new time/op    delta
BinaryTree17-12              3.11s ± 2%     3.20s ± 3%  +2.83%  (p=0.000 n=17+20)
Fannkuch11-12                2.51s ± 1%     2.51s ± 1%  -0.22%  (p=0.034 n=19+18)
FmtFprintfEmpty-12          50.8ns ± 3%    50.6ns ± 2%    ~     (p=0.793 n=20+20)
FmtFprintfString-12          174ns ± 0%     174ns ± 1%  +0.17%  (p=0.048 n=15+20)
FmtFprintfInt-12             177ns ± 0%     165ns ± 1%  -6.99%  (p=0.000 n=17+19)
FmtFprintfIntInt-12          283ns ± 1%     284ns ± 0%  +0.22%  (p=0.000 n=18+15)
FmtFprintfPrefixedInt-12     243ns ± 1%     244ns ± 1%  +0.40%  (p=0.000 n=20+19)
FmtFprintfFloat-12           318ns ± 0%     319ns ± 0%  +0.27%  (p=0.001 n=19+20)
FmtManyArgs-12              1.12µs ± 0%    1.14µs ± 0%  +1.74%  (p=0.000 n=19+20)
GobDecode-12                8.69ms ± 0%    8.73ms ± 1%  +0.46%  (p=0.000 n=18+18)
GobEncode-12                6.64ms ± 1%    6.61ms ± 1%  -0.46%  (p=0.000 n=20+20)
Gzip-12                      323ms ± 2%     319ms ± 1%  -1.11%  (p=0.000 n=20+20)
Gunzip-12                   42.8ms ± 0%    42.9ms ± 0%    ~     (p=0.158 n=18+20)
HTTPClientServer-12         63.3µs ± 1%    63.1µs ± 1%  -0.35%  (p=0.011 n=20+20)
JSONEncode-12               16.9ms ± 1%    17.3ms ± 1%  +2.84%  (p=0.000 n=19+20)
JSONDecode-12               59.7ms ± 0%    58.5ms ± 0%  -2.05%  (p=0.000 n=19+17)
Mandelbrot200-12            3.92ms ± 0%    3.91ms ± 0%  -0.16%  (p=0.003 n=19+19)
GoParse-12                  3.79ms ± 2%    3.75ms ± 2%  -0.91%  (p=0.005 n=20+20)
RegexpMatchEasy0_32-12       102ns ± 1%     101ns ± 1%  -0.80%  (p=0.001 n=14+20)
RegexpMatchEasy0_1K-12       337ns ± 1%     346ns ± 1%  +2.90%  (p=0.000 n=20+19)
RegexpMatchEasy1_32-12      84.4ns ± 2%    84.3ns ± 2%    ~     (p=0.743 n=20+20)
RegexpMatchEasy1_1K-12       502ns ± 1%     505ns ± 0%  +0.64%  (p=0.000 n=20+20)
RegexpMatchMedium_32-12      133ns ± 1%     132ns ± 1%  -0.85%  (p=0.000 n=20+19)
RegexpMatchMedium_1K-12     40.1µs ± 1%    39.8µs ± 1%  -0.77%  (p=0.000 n=18+18)
RegexpMatchHard_32-12       2.08µs ± 1%    2.07µs ± 1%  -0.55%  (p=0.001 n=18+19)
RegexpMatchHard_1K-12       62.4µs ± 1%    62.0µs ± 1%  -0.74%  (p=0.000 n=19+19)
Revcomp-12                   545ms ± 2%     545ms ± 3%    ~     (p=0.771 n=19+20)
Template-12                 73.7ms ± 1%    72.0ms ± 0%  -2.33%  (p=0.000 n=20+18)
TimeParse-12                 358ns ± 1%     351ns ± 1%  -2.07%  (p=0.000 n=20+20)
TimeFormat-12                369ns ± 1%     356ns ± 0%  -3.53%  (p=0.000 n=20+18)
[Geo mean]                  63.5µs         63.2µs       -0.41%

name                      old speed      new speed      delta
GobDecode-12              88.3MB/s ± 0%  87.9MB/s ± 0%  -0.43%  (p=0.000 n=18+17)
GobEncode-12               116MB/s ± 1%   116MB/s ± 1%  +0.47%  (p=0.000 n=20+20)
Gzip-12                   60.2MB/s ± 2%  60.8MB/s ± 1%  +1.13%  (p=0.000 n=20+20)
Gunzip-12                  453MB/s ± 0%   453MB/s ± 0%    ~     (p=0.160 n=18+20)
JSONEncode-12              115MB/s ± 1%   112MB/s ± 1%  -2.76%  (p=0.000 n=19+20)
JSONDecode-12             32.5MB/s ± 0%  33.2MB/s ± 0%  +2.09%  (p=0.000 n=19+17)
GoParse-12                15.3MB/s ± 2%  15.4MB/s ± 2%  +0.92%  (p=0.004 n=20+20)
RegexpMatchEasy0_32-12     311MB/s ± 1%   314MB/s ± 1%  +0.78%  (p=0.000 n=15+19)
RegexpMatchEasy0_1K-12    3.04GB/s ± 1%  2.95GB/s ± 1%  -2.90%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12     379MB/s ± 2%   380MB/s ± 2%    ~     (p=0.779 n=20+20)
RegexpMatchEasy1_1K-12    2.04GB/s ± 1%  2.02GB/s ± 0%  -0.62%  (p=0.000 n=20+20)
RegexpMatchMedium_32-12   7.46MB/s ± 1%  7.53MB/s ± 1%  +0.86%  (p=0.000 n=20+19)
RegexpMatchMedium_1K-12   25.5MB/s ± 1%  25.7MB/s ± 1%  +0.78%  (p=0.000 n=18+18)
RegexpMatchHard_32-12     15.4MB/s ± 1%  15.5MB/s ± 1%  +0.62%  (p=0.000 n=19+19)
RegexpMatchHard_1K-12     16.4MB/s ± 1%  16.5MB/s ± 1%  +0.82%  (p=0.000 n=20+19)
Revcomp-12                 466MB/s ± 2%   466MB/s ± 3%    ~     (p=0.765 n=19+20)
Template-12               26.3MB/s ± 1%  27.0MB/s ± 0%  +2.38%  (p=0.000 n=20+18)
[Geo mean]                97.8MB/s       98.0MB/s       +0.23%

Change-Id: I281044ae0b24990ba46487cacbc1069493274bc4
Reviewed-on: https://go-review.googlesource.com/13614
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-22 17:48:13 +00:00
Matthew Dempsky
1652a2c316 runtime: add mSpanList type to represent lists of mspans
This CL introduces a new mSpanList type to replace the empty mspan
variables that were previously used as list heads.

To be type safe, the previous circular linked list data structure is
now a tail queue instead.  One complication of this is
mSpanList_Remove needs to know the list a span is being removed from,
but this appears to be computable in all circumstances.

As a temporary sanity check, mSpanList_Insert and mSpanList_InsertBack
record the list that an mspan has been inserted into so that
mSpanList_Remove can verify that the correct list was specified.

Whereas mspan is 112 bytes on amd64, mSpanList is only 16 bytes.  This
shrinks the size of mheap from 50216 bytes to 12584 bytes.

Change-Id: I8146364753dbc3b4ab120afbb9c7b8740653c216
Reviewed-on: https://go-review.googlesource.com/15906
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2015-10-22 17:12:06 +00:00
Aaron Jacobs
151f4ec95d runtime: remove unused printpc and printbyte functions
Change-Id: I40e338f6b445ca72055fc9bac0f09f0dca904e3a
Reviewed-on: https://go-review.googlesource.com/16191
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-22 15:02:44 +00:00
Matthew Dempsky
5a68eb9f25 runtime: prune some dead variables
Change-Id: I7a1c3079b433c4e30d72fb7d59f9594e0d5efe47
Reviewed-on: https://go-review.googlesource.com/16178
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-10-22 03:56:19 +00:00
Matthew Dempsky
29330c118d runtime: change fixalloc's chunk field to unsafe.Pointer
It's never used as a *byte anyway, so might as well just make it an
unsafe.Pointer instead.

Change-Id: I68ee418781ab2fc574eeac0498f2515b5561b7a8
Reviewed-on: https://go-review.googlesource.com/16175
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-22 01:14:23 +00:00
Shenghou Ma
1948aef6e3 runtime: fix typos
Change-Id: Iffc25fc80452baf090bf8ef15ab798cfaa120b8e
Reviewed-on: https://go-review.googlesource.com/16154
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-22 00:40:48 +00:00
Matthew Dempsky
58e3ae2fae runtime: split plan9 and solaris's m fields into new embedded mOS type
Reduces the size of m by ~8% on linux/amd64 (1040 bytes -> 960 bytes).

There are also windows-specific fields, but they're currently
referenced in OS-independent source files (but only when
GOOS=="windows").

Change-Id: I13e1471ff585ccced1271f74209f8ed6df14c202
Reviewed-on: https://go-review.googlesource.com/16173
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-22 00:04:52 +00:00
Matthew Dempsky
7df8ba136c runtime: replace unsafe pointer arithmetic with array indexing
Change-Id: I313819abebd4cda4a6c30fd0fd6f44cb1d09161f
Reviewed-on: https://go-review.googlesource.com/16167
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-21 23:22:20 +00:00
Matthew Dempsky
84afa1be76 runtime: make iface/eface handling more type safe
Change compiler-invoked interface functions to directly take
iface/eface parameters instead of fInterface/interface{} to avoid
needing to always convert.

For the handful of functions that legitimately need to take an
interface{} parameter, add efaceOf to type-safely convert *interface{}
to *eface.

Change-Id: I8928761a12fd3c771394f36adf93d3006a9fcf39
Reviewed-on: https://go-review.googlesource.com/16166
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-21 23:08:22 +00:00
Ian Lance Taylor
73f329f472 runtime, syscall: add calls to msan functions
Add explicit memory sanitizer instrumentation to the runtime and syscall
packages.  The compiler does not instrument the runtime package.  It
does instrument the syscall package, but we need to add a couple of
cases that it can't see.

Change-Id: I2d66073f713fe67e33a6720460d2bb8f72f31394
Reviewed-on: https://go-review.googlesource.com/16164
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-10-21 19:17:46 +00:00
Matthew Dempsky
c279250946 runtime: change functype's in and out fields to []*_type
Allows removing a few gratuitous unsafe.Pointer conversions and
parallels the type of reflect.funcType's in and out fields ([]*rtype).

Change-Id: Ie5ca230a94407301a854dfd8782a3180d5054bc4
Reviewed-on: https://go-review.googlesource.com/16163
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-21 18:37:45 +00:00
Ian Lance Taylor
5174df9087 runtime, runtime/msan: add msan runtime support
These are the runtime support functions for letting Go code interoperate
with the C/C++ memory sanitizer.  Calls to msanread/msanwrite are now
inserted by the compiler with the -msan option.  Calls to
msanmalloc/msanfree will be from other runtime functions in a subsequent
CL.

Change-Id: I64fb061b38cc6519153face242eccd291c07d1f2
Reviewed-on: https://go-review.googlesource.com/16162
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-21 17:50:39 +00:00
Austin Clements
a42f668654 runtime: eliminate unused _GCstw phase
Change-Id: Ie94cd17e1975fdaaa418fa6a7b2d3b164fedc135
Reviewed-on: https://go-review.googlesource.com/16057
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-21 16:26:34 +00:00
Austin Clements
28f458ce5b runtime: eliminate unnecessary ragged barrier
The ragged barrier after entering the concurrent mark phase is
vestigial. This used to be the point where we enabled write barriers,
so it was necessary to synchronize all Ps to ensure write barriers
were enabled before any marking occurred. However, we've long since
switched to enabling write barriers during the concurrent scan phase,
so the start-the-world at the beginning of the concurrent scan phase
ensures that all Ps have enabled the write barrier.

Hence, we can eliminate the old "install write barrier" phase.

Fixes #11971.

Change-Id: I8cdcb84b5525cef19927d51ea11ba0a4db991ea8
Reviewed-on: https://go-review.googlesource.com/16044
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-21 16:26:25 +00:00
Matthew Dempsky
d4a7ea1b71 runtime: add stringStructOf helper function
Instead of open-coding conversions from *string to unsafe.Pointer then
to *stringStruct, add a helper function to add some type safety.
Bonus: This caught two **string values being converted to
*stringStruct in heapdump.go.

While here, get rid of the redundant _string type, but add in a
stringStructDWARF type used for generating DWARF debug info.

Change-Id: I8882f8cca66ac45190270f82019a5d85db023bd2
Reviewed-on: https://go-review.googlesource.com/16131
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-20 23:13:27 +00:00
Aaron Jacobs
ef986fa3fc runtime: change odd 'print1_write' file names
The '1' part is left over from the C conversion, but no longer makes
sense given that print1.go no longer exists.

Change-Id: Iec171251370d740f234afdbd6fb1a4009fde6696
Reviewed-on: https://go-review.googlesource.com/16036
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-20 23:03:06 +00:00
Hyang-Ah Hana Kim
30ee5919bd runtime: add syscalls needed for android/amd64 logging.
access, connect, socket.

In Android-L, logging is done by writing the log messages to the logd
process through a unix domain socket.

Also, changed the arg types of those syscall stubs to match linux
programming APIs.

For golang/go#10743

Change-Id: I66368a03316e253561e9e76aadd180c2cd2e48f3
Reviewed-on: https://go-review.googlesource.com/15993
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-10-20 16:56:58 +00:00
Aaron Jacobs
3bc0601742 runtime: rename _func.frame to make it clear it's deprecated and unused.
When I saw that it was labelled "legacy", I went looking for users of it
to see how it was still used. But there aren't any. Save the next person
the trouble.

Change-Id: I921dd6c57b60331c9816542272555153ac133c02
Reviewed-on: https://go-review.googlesource.com/16035
Reviewed-by: Dave Cheney <dave@cheney.net>
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-20 03:16:09 +00:00
Michael Hudson-Doyle
c5856cfdb6 runtime: tweaks to allow -buildmode=shared to work
Building Go shared libraries requires that all functions that have declarations
without bodies have implementations and vice versa, so remove the
implementation of call16 and add a stub implementation of sigreturn.

Change-Id: I4d5a30c8637a5da7991054e151a536611d5bea46
Reviewed-on: https://go-review.googlesource.com/15966
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-19 21:23:36 +00:00
Austin Clements
3cd56b4dca runtime: combine gcResetGState and gcResetMarkState
These functions are always called together and perform logically
related state resets, so combine them in to just gcResetMarkState.

Fixes #11427.

Change-Id: I06c17ef65f66186494887a767b3993126955b5fe
Reviewed-on: https://go-review.googlesource.com/16041
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-19 18:38:07 +00:00
Austin Clements
b0d5e5c500 runtime: consolidate gcResetGState calls
Currently gcResetGState is called by func gcscan_m for concurrent GC
and directly by func gc for STW GC. Simplify this by consolidating
these two calls in to one call by func gc above where it splits for
concurrent and STW GC.

As a consequence, gcResetGState and gcResetMarkState are always called
together, so the next commit will consolidate these.

Change-Id: Ib62d404c7b32b28f7d3080d26ecf3966cbc4aca0
Reviewed-on: https://go-review.googlesource.com/16040
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-19 18:38:00 +00:00
Austin Clements
feb92a8e8c runtime: remove work.partial queue
This work queue is no longer used (there are many reads of
work.partial, but the only write is in putpartial, which is never
called).

Fixes #11922.

Change-Id: I08b76c0c02a0867a9cdcb94783e1f7629d44249a
Reviewed-on: https://go-review.googlesource.com/15892
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-19 18:37:54 +00:00
Aaron Jacobs
5d88323fa6 runtime: remove a redundant nil pointer check.
It appears this was made possible by commit 89f185f; before that, g was
not dereferenced above.

Change-Id: I70bc571d924b36351392fd4c13d681e938cfb573
Reviewed-on: https://go-review.googlesource.com/16033
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-10-19 09:58:15 +00:00