mirror of https://github.com/golang/go synced 2024-11-19 20:24:41 -07:00
Commit Graph

2502 Commits

Author SHA1 Message Date
Austin Clements
1bd39e79db runtime: fix SP adjustment on amd64p32
On amd64p32, rt0_go attempts to reserve 128 bytes of scratch space on
the stack, but due to a register mixup this ends up being a no-op. Fix
this so we actually reserve the stack space.

Change-Id: I04dbfbeb44f3109528c8ec74e1136bc00d7e1faa
Reviewed-on: https://go-review.googlesource.com/32331
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-28 21:39:17 +00:00
Austin Clements
bd640c882a runtime: disable stack rescanning by default
With the hybrid barrier in place, we can now disable stack rescanning
by default. This commit adds a "gcrescanstacks" GODEBUG variable that
is off by default but can be set to re-enable STW stack rescanning.
The plan is to leave this off but available in Go 1.8 for debugging
and as a fallback.
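
A usage sketch of the knob described above; godebugValue below is an
illustrative stand-in, not the runtime's actual GODEBUG parser.

package main

import (
	"fmt"
	"os"
	"strings"
)

// godebugValue extracts one key from a GODEBUG-style comma-separated
// list of key=value pairs.
func godebugValue(key string) string {
	for _, kv := range strings.Split(os.Getenv("GODEBUG"), ",") {
		if p := strings.SplitN(kv, "=", 2); len(p) == 2 && p[0] == key {
			return p[1]
		}
	}
	return ""
}

func main() {
	// Run as: GODEBUG=gcrescanstacks=1 ./prog
	fmt.Println("gcrescanstacks =", godebugValue("gcrescanstacks"))
}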

With this change, worst-case mark termination time at GOMAXPROCS=12
*not* including time spent stopping the world (which is still
unbounded) is reliably under 100 µs, with a 95%ile around 50 µs in
every benchmark I tried (the go1 benchmarks, the x/benchmarks garbage
benchmark, and the gcbench activegs and rpc benchmarks). Including
time spent stopping the world usually adds about 20 µs to total STW
time at GOMAXPROCS=12, but I've seen it add around 150 µs in these
benchmarks when a goroutine takes time to reach a safe point (see
issue #10958) or when stopping the world races with goroutine
switches. At GOMAXPROCS=1, where this isn't an issue, worst case STW
is typically 30 µs.

The go-gcbench activegs benchmark is designed to stress large numbers
of dirty stacks. This commit reduces 95%ile STW time for 500k dirty
stacks by nearly three orders of magnitude, from 150ms to 195µs.

This has little effect on the throughput of the go1 benchmarks or the
x/benchmarks benchmarks.

name         old time/op  new time/op  delta
XGarbage-12  2.31ms ± 0%  2.32ms ± 1%  +0.28%  (p=0.001 n=17+16)
XJSON-12     12.4ms ± 0%  12.4ms ± 0%  +0.41%  (p=0.000 n=18+18)
XHTTP-12     11.8µs ± 0%  11.8µs ± 1%    ~     (p=0.492 n=20+18)

It reduces the tail latency of the x/benchmarks HTTP benchmark:

name      old p50-time  new p50-time  delta
XHTTP-12    489µs ± 0%    491µs ± 1%  +0.54%  (p=0.000 n=20+18)

name      old p95-time  new p95-time  delta
XHTTP-12    957µs ± 1%    960µs ± 1%  +0.28%  (p=0.002 n=20+17)

name      old p99-time  new p99-time  delta
XHTTP-12   1.76ms ± 1%   1.64ms ± 1%  -7.20%  (p=0.000 n=20+18)

Comparing to the beginning of the hybrid barrier implementation
("runtime: parallelize STW mcache flushing") shows that the hybrid
barrier trades a small performance impact for much better STW latency,
as expected. The magnitude of the performance impact is generally
small:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.37s ± 1%     2.42s ± 1%  +2.04%  (p=0.000 n=19+18)
Fannkuch11-12                2.84s ± 0%     2.72s ± 0%  -4.00%  (p=0.000 n=19+19)
FmtFprintfEmpty-12          44.2ns ± 1%    45.2ns ± 1%  +2.20%  (p=0.000 n=17+19)
FmtFprintfString-12          130ns ± 1%     134ns ± 0%  +2.94%  (p=0.000 n=18+16)
FmtFprintfInt-12             114ns ± 1%     117ns ± 0%  +3.01%  (p=0.000 n=19+15)
FmtFprintfIntInt-12          176ns ± 1%     182ns ± 0%  +3.17%  (p=0.000 n=20+15)
FmtFprintfPrefixedInt-12     186ns ± 1%     187ns ± 1%  +1.04%  (p=0.000 n=20+19)
FmtFprintfFloat-12           251ns ± 1%     250ns ± 1%  -0.74%  (p=0.000 n=17+18)
FmtManyArgs-12               746ns ± 1%     761ns ± 0%  +2.08%  (p=0.000 n=19+20)
GobDecode-12                6.57ms ± 1%    6.65ms ± 1%  +1.11%  (p=0.000 n=19+20)
GobEncode-12                5.59ms ± 1%    5.65ms ± 0%  +1.08%  (p=0.000 n=17+17)
Gzip-12                      223ms ± 1%     223ms ± 1%  -0.31%  (p=0.006 n=20+20)
Gunzip-12                   38.0ms ± 0%    37.9ms ± 1%  -0.25%  (p=0.009 n=19+20)
HTTPClientServer-12         77.5µs ± 1%    78.9µs ± 2%  +1.89%  (p=0.000 n=20+20)
JSONEncode-12               14.7ms ± 1%    14.9ms ± 0%  +0.75%  (p=0.000 n=20+20)
JSONDecode-12               53.0ms ± 1%    55.9ms ± 1%  +5.54%  (p=0.000 n=19+19)
Mandelbrot200-12            3.81ms ± 0%    3.81ms ± 1%  +0.20%  (p=0.023 n=17+19)
GoParse-12                  3.17ms ± 1%    3.18ms ± 1%    ~     (p=0.057 n=20+19)
RegexpMatchEasy0_32-12      71.7ns ± 1%    70.4ns ± 1%  -1.77%  (p=0.000 n=19+20)
RegexpMatchEasy0_1K-12       946ns ± 0%     946ns ± 0%    ~     (p=0.405 n=18+18)
RegexpMatchEasy1_32-12      67.2ns ± 2%    67.3ns ± 2%    ~     (p=0.732 n=20+20)
RegexpMatchEasy1_1K-12       374ns ± 1%     378ns ± 1%  +1.14%  (p=0.000 n=18+19)
RegexpMatchMedium_32-12      107ns ± 1%     107ns ± 1%    ~     (p=0.259 n=18+20)
RegexpMatchMedium_1K-12     34.2µs ± 1%    34.5µs ± 1%  +1.03%  (p=0.000 n=18+18)
RegexpMatchHard_32-12       1.77µs ± 1%    1.79µs ± 1%  +0.73%  (p=0.000 n=19+18)
RegexpMatchHard_1K-12       53.6µs ± 1%    54.2µs ± 1%  +1.10%  (p=0.000 n=19+19)
Template-12                 61.5ms ± 1%    63.9ms ± 0%  +3.96%  (p=0.000 n=18+18)
TimeParse-12                 303ns ± 1%     300ns ± 1%  -1.08%  (p=0.000 n=19+20)
TimeFormat-12                318ns ± 1%     320ns ± 0%  +0.79%  (p=0.000 n=19+19)
Revcomp-12 (*)               509ms ± 3%     504ms ± 0%    ~     (p=0.967 n=7+12)
[Geo mean]                  54.3µs         54.8µs       +0.88%

(*) Revcomp is highly non-linear, so I only took samples with 2
iterations.

name         old time/op  new time/op  delta
XGarbage-12  2.25ms ± 0%  2.32ms ± 1%  +2.74%  (p=0.000 n=16+16)
XJSON-12     11.6ms ± 0%  12.4ms ± 0%  +6.81%  (p=0.000 n=18+18)
XHTTP-12     11.6µs ± 1%  11.8µs ± 1%  +1.62%  (p=0.000 n=17+18)

Updates #17503.

Updates #17099, since you can't have a rescan list bug if there's no
rescan list. I'm not marking it as fixed, since gcrescanstacks can
still be set to re-enable the rescan lists.

Change-Id: I6e926b4c2dbd4cd56721869d4f817bdbb330b851
Reviewed-on: https://go-review.googlesource.com/31766
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 21:24:13 +00:00
Austin Clements
5380b22991 runtime: implement unconditional hybrid barrier
This implements the unconditional version of the hybrid deletion write
barrier, which always shades both the old and new pointer. It's
unconditional for now because barriers on channel operations require
checking both the source and destination stacks and we don't have a
way to funnel this information into the write barrier at the moment.
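
A toy, self-contained model of the unconditional barrier described:
shade both the old slot value and the new value before the write.
shade and the map-based mark state are stand-ins, not runtime code.

package main

import "fmt"

var greyed = map[*int]bool{} // stand-in for the GC's mark state

func shade(p *int) {
	if p != nil {
		greyed[p] = true
	}
}

// writePointer models the unconditional hybrid barrier: shade the old
// value and the new value, then perform the actual pointer write.
func writePointer(slot **int, val *int) {
	shade(*slot)
	shade(val)
	*slot = val
}

func main() {
	a, b := new(int), new(int)
	slot := &a
	writePointer(slot, b)
	fmt.Println(len(greyed)) // 2: both old and new pointers were shaded
}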

As part of this change, we modify the typed memclr operations
introduced earlier to invoke the write barrier.

This has basically no overall effect on benchmark performance. This is
good, since it indicates that neither the extra shade nor the new bulk
clear barriers have much effect. It also has little effect on latency.
This is expected, since we haven't yet modified mark termination to
take advantage of the hybrid barrier.

Updates #17503.

Change-Id: Iebedf84af2f0e857bd5d3a2d525f760b5cf7224b
Reviewed-on: https://go-review.googlesource.com/31765
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 21:24:02 +00:00
Austin Clements
ee3d20129a runtime: avoid getfull() barrier most of the time
With the hybrid barrier, unless we're doing a STW GC or hit a very
rare race (~once per all.bash) that can start mark termination before
all of the work is drained, we don't need to drain the work queue at
all. Even draining an empty work queue is rather expensive since we
have to enter the getfull() barrier, so it's worth avoiding this.

Conveniently, it's quite easy to detect whether or not we actually
need the getfull() barrier: since the world is stopped when we enter
mark termination, everything must have flushed its work to the work
queue, so we can just check the queue. If the queue is empty and we
haven't queued up any jobs that may create more work (which should
always be the case with the hybrid barrier), we can simply have all GC
workers perform non-blocking drains.
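
The check reduces to something like the sketch below; every name here
is illustrative, since the real code inspects the runtime's global
work buffers directly.

func drainNoBlock() {}  // returns as soon as the work runs dry
func drainBlocking() {} // may park in the getfull() barrier

func markTerminationDrain(queuedWork, pendingRootJobs int) {
	if queuedWork == 0 && pendingRootJobs == 0 {
		drainNoBlock() // the common case with the hybrid barrier
	} else {
		drainBlocking() // rare: work remains, drain it fully
	}
}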

Also conveniently, this solution is quite safe. If we do somehow screw
something up and there's work on the work queue, some worker will
still process it, it just may not happen in parallel.

This is not the "right" solution, but it's simple, expedient,
low-risk, and maintains compatibility with debug.gcrescanstacks. When
we remove the gcrescanstacks fallback in Go 1.9, we should also fix
the race that starts mark termination early, and then we can eliminate
work draining from mark termination.

Updates #17503.

Change-Id: I7b3cd5de6a248ab29d78c2b42aed8b7443641361
Reviewed-on: https://go-review.googlesource.com/32186
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 21:23:52 +00:00
Austin Clements
d8256824ac runtime: remove unnecessary step from bulkBarrierPreWrite
Currently bulkBarrierPreWrite calls writebarrierptr_prewrite, but this
means that we check writeBarrier.needed twice and perform cgo checks
twice.

Change bulkBarrierPreWrite to call writebarrierptr_prewrite1 to skip
over these duplicate checks.

This may speed up bulkBarrierPreWrite slightly, but mostly this will
save us from running out of nosplit stack space on ppc64x in the near
future.

Updates #17503.

Change-Id: I1cea1a2207e884ab1a279c6a5e378dcdc048b63e
Reviewed-on: https://go-review.googlesource.com/31890
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 21:23:42 +00:00
Austin Clements
70c107c68d runtime: add deletion barriers on gobuf.ctxt
gobuf.ctxt is set to nil from many places in assembly code and these
assignments require write barriers with the hybrid barrier.

Conveniently, in most of these places ctxt should already be nil, in
which case we don't need the barrier. This commit changes these places
to assert that ctxt is already nil.

gogo is more complicated, since ctxt may not already be nil. For gogo,
we manually perform the write barrier if ctxt is not nil.
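
Sketched in Go (the actual sites are in assembly), the two patterns
look roughly like this; shade stands in for the runtime's
pointer-shading operation, and exactly which values the barrier shades
is glossed over.

// Most sites: assert ctxt is already nil, so no barrier is needed.
func setCtxtNil(ctxt **byte) {
	if *ctxt != nil {
		panic("ctxt should already be nil")
	}
}

// gogo: ctxt may be non-nil, so invoke the barrier by hand first.
func gogoSetCtxt(ctxt **byte, v *byte, shade func(*byte)) {
	if *ctxt != nil {
		shade(*ctxt)
	}
	*ctxt = v
}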

Updates #17503.

Change-Id: I9d75e27c75a1b7f8b715ad112fc5d45ffa856d30
Reviewed-on: https://go-review.googlesource.com/31764
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-28 20:48:02 +00:00
Austin Clements
8f81dfe8b4 runtime: perform write barrier before pointer write
Currently, we perform write barriers after performing pointer writes.
At the moment, it simply doesn't matter what order this happens in, as
long as they appear atomic to GC. But both the hybrid barrier and ROC
are going to require a pre-write write barrier.

For the hybrid barrier, this is important because the barrier needs to
observe both the current value of the slot and the value that will be
written to it. (Alternatively, the caller could do the write and pass
in the old value, but it seems easier and more useful to just swap the
order of the barrier and the write.)

For ROC, this is necessary because, if the pointer write is going to
make the pointer reachable to some goroutine that it currently is not
visible to, the garbage collector must take some special action before
that pointer becomes more broadly visible.

This commit swaps pointer writes around so that the write barrier
occurs before the pointer write.
before the pointer write.

The main subtlety here is bulk memory writes. Currently, these copy to
the destination first and then use the pointer bitmap of the
destination to find the copied pointers and invoke the write barrier.
This is necessary because the source may not have a pointer bitmap. To
handle these, we pass both the source and the destination to the bulk
memory barrier, which uses the pointer bitmap of the destination, but
reads the pointer values from the source.
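
The shape of that bulk barrier, as a toy over pointer slices; the
real code walks raw memory words using the heap bitmap, and shade is
a stand-in for the GC's shade operation.

func bulkBarrierPreWrite(dst, src []*int, ptrBitmap []bool, shade func(*int)) {
	for i, isPtr := range ptrBitmap {
		if isPtr {
			shade(dst[i]) // old pointer in the destination
			shade(src[i]) // new pointer read from the source
		}
	}
	copy(dst, src) // the bulk write happens only after the barrier
}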

Updates #17503.

Change-Id: I78ecc0c5c94ee81c29019c305b3d232069294a55
Reviewed-on: https://go-review.googlesource.com/31763
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 20:47:52 +00:00
Cherry Zhang
9d1efba28d cmd/link: put text at address 0x1000000 on darwin/amd64
Apparently on macOS Sierra LLDB thinks /usr/lib/dyld is mapped
at address 0, even if Go code starts at 0x1000, and it looks up
addresses from dyld, which shadow Go symbols. Move the Go binary to a
higher address to avoid the clash.

Fixes #17463. Re-enable TestLldbPython.

Change-Id: I89ca6f3ee48aa6da9862bfa0c2da91477cc93255
Reviewed-on: https://go-review.googlesource.com/32185
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Quentin Smith <quentin@golang.org>
2016-10-28 20:17:53 +00:00
Austin Clements
c3163d23f0 runtime: eliminate write barriers from save
As with dropg, save is writing a nil pointer that will generate a write
barrier with the hybrid barrier. However, in this case, ctxt always
should already be nil, so replace the write with an assertion that
this is the case.

At this point, we're ready to disable the write barrier elision
optimizations that interfere with the hybrid barrier.

Updates #17503.

Change-Id: I83208e65aa33403d442401f355b2e013ab9a50e9
Reviewed-on: https://go-review.googlesource.com/31571
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 20:05:49 +00:00
Austin Clements
8044b77a57 runtime: eliminate write barriers from dropg
Currently this contains no write barriers because it's writing nil
pointers, but with the hybrid barrier, even these will produce write
barriers. However, since these are *gs and *ms, they don't need write
barriers, so we can simply eliminate them.

Updates #17503.

Change-Id: Ib188a60492c5cfb352814bf9b2bcb2941fb7d6c0
Reviewed-on: https://go-review.googlesource.com/31570
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 20:05:39 +00:00
Austin Clements
85c22bc3a5 runtime: mark tiny blocks at GC start
The hybrid barrier requires allocate-black, but there's one case where
we don't currently allocate black: the tiny allocator. If we allocate
a *new* tiny alloc block during GC, it will be allocated black, but if
we allocated the current block before GC, it won't be black, and the
further allocations from it won't mark it, which means we may free a
reachable tiny block during sweeping.

Fix this by passing over all mcaches at the beginning of mark, while
the world is still stopped, and greying their tiny blocks.
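
A stand-in sketch of that pass; mcacheSketch and grey are illustrative
names, and the real code walks allp with the world stopped.

type mcacheSketch struct{ tiny *byte } // current tiny-alloc block, if any

func greyTinyBlocks(caches []*mcacheSketch, grey func(*byte)) {
	for _, c := range caches {
		if c != nil && c.tiny != nil {
			grey(c.tiny) // keep the pre-GC tiny block from being freed
		}
	}
}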

Updates #17503.

Change-Id: I04d4df7cc2f553f8f7b1e4cb0b52e2946588111a
Reviewed-on: https://go-review.googlesource.com/31456
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 20:05:28 +00:00
Austin Clements
ee785f03a2 runtime: shade stack-to-stack copy when starting a goroutine
The hybrid barrier requires barriers on stack-to-stack copies if
either stack is grey. There are only two instances of this in the
runtime: channel sends and starting a goroutine. Channel sends already
use typedmemmove and hence have the necessary barriers. This commit
adds barriers for the stack-to-stack copy when starting a goroutine.

Updates #17503.

Change-Id: Ibb55e08127ca4d021ac54be61cb96732efa5df5b
Reviewed-on: https://go-review.googlesource.com/31455
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 20:05:18 +00:00
Michael Matloob
0c0960a234 runtime/pprof/internal/profile: add copyright notice to profile_memmap.go
Change-Id: Ia511b0aadc87eb53e084d14cdb90ba4be958a43e
Reviewed-on: https://go-review.googlesource.com/32259
Reviewed-by: Austin Clements <austin@google.com>
2016-10-28 20:03:40 +00:00
Michael Matloob
b33030a727 runtime/pprof: write profiles in protobuf format.
Original Change by Daria Kolistratova <daria.kolistratova@intel.com>

Added functions with the suffix "proto", adapting code from the pprof
tool, to translate profiles to protobuf. This was done because the
profile proto is more extensible than the legacy pprof format and is
pprof's preferred profile format. A large part was taken from the
https://github.com/google/pprof tool. Tested by hand by comparing the
result with the pprof tool's own translation; the profiles are
identical.
Fixes #16093

Change-Id: I2751345b09a66ee2b6aa64be76cba4cd1c326aa6
Reviewed-on: https://go-review.googlesource.com/32257
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
2016-10-28 19:52:13 +00:00
Russ Cox
2a7272b422 runtime/trace: deflake TestTraceSymbolize
Waiting 2ms for all the kicked-off goroutines to run and block
seems a little optimistic. No harm done by waiting for 200ms instead.

Fixes #17238.

Change-Id: I827532ea2f5f1f3ed04179f8957dd2c563946ed0
Reviewed-on: https://go-review.googlesource.com/32103
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-28 19:32:37 +00:00
Austin Clements
88518e7dd6 runtime: zero-initialize LR on new stacks
Currently we initialize LR on a new stack by writing nil to it. But
this is an initializing write since the newly allocated stack is not
zeroed, so this is unsafe with the hybrid barrier. Change this to a
uintptr write to avoid a bad write barrier.

Updates #17503.

Change-Id: I062ac352e35df7da4644c1f2a5aaab87049d1f60
Reviewed-on: https://go-review.googlesource.com/32093
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 19:14:03 +00:00
Austin Clements
d3836aba31 runtime: ensure finalizers are zero-initialized before reuse
We reuse finalizers in finblocks, which are allocated off-heap. This
means they have to be zero-initialized before becoming visible to the
garbage collector. We actually already do this by clearing the
finalizer before returning it to the pool, but we're not careful to
enforce correct memory ordering. Fix this by manipulating the
finalizer count atomically so these writes synchronize properly with
the garbage collector.

Updates #17503.

Change-Id: I7797d31df3c656c9fe654bc6da287f66a9e2037d
Reviewed-on: https://go-review.googlesource.com/31454
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 19:13:54 +00:00
Austin Clements
db56a63547 runtime: avoid write barriers to uninitialized finalizer frame memory
runfinq allocates a stack frame on the heap for constructing the
finalizer function calls and reuses it for each call. However, because
the type of this frame is constantly shifting, it tells mallocgc there
are no pointers in it and it acts essentially like uninitialized
memory between uses. But runfinq uses pointer writes with write
barriers to "initialize" this memory, which is not going to be safe
with the hybrid barrier, since the hybrid barrier may see a stale
pointer left in the "uninitialized" frame.

Fix this by zero-initializing the argument values in the frame before
writing the argument pointers.

Updates #17503.

Change-Id: I951c0a2be427eb9082a32d65c4410e6fdef041be
Reviewed-on: https://go-review.googlesource.com/31453
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 19:13:44 +00:00
Austin Clements
c5e7006540 runtime: document rules about unmanaged memory
Updates #17503.

Change-Id: I109d8742358ae983fdff3f3dbb7136973e81f4c3
Reviewed-on: https://go-review.googlesource.com/31452
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 19:13:33 +00:00
Austin Clements
14f3284ddb Revert "runtime/pprof: write profiles in protobuf format."
This reverts commit 7d14401bcb.

Reason for revert: Doesn't build.

Change-Id: I766179ab9225109d9232f783326e4d3843254980
Reviewed-on: https://go-review.googlesource.com/32256
Reviewed-by: Russ Cox <rsc@golang.org>
2016-10-28 18:41:16 +00:00
Austin Clements
87e48c5afd runtime, cmd/compile: rename memclr -> memclrNoHeapPointers
Since barrier-less memclr is only safe in very narrow circumstances,
this commit renames memclr to avoid accidentally calling memclr on
typed memory. This can cause subtle, non-deterministic bugs, so it's
worth some effort to prevent. In the near term, this will also prevent
bugs creeping in from any concurrent CLs that add calls to memclr; if
this happens, whichever patch hits master second will fail to compile.

This also adds the other new memclr variants to the compiler's
builtin.go to minimize the churn on that binary blob. We'll use these
in future commits.

Updates #17503.

Change-Id: I00eead049f5bd35ca107ea525966831f3d1ed9ca
Reviewed-on: https://go-review.googlesource.com/31369
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:20:33 +00:00
Austin Clements
ae3bb4a537 runtime: make fixalloc zero allocations on reuse
Currently fixalloc does not zero memory it reuses. This is dangerous
with the hybrid barrier if the type may contain heap pointers, since
it may cause us to observe a dead heap pointer on reuse. It's also
error-prone since it's the only allocator that doesn't zero on
allocation (mallocgc of course zeroes, but so do persistentalloc and
sysAlloc). It's also largely pointless: for mcache, the caller
immediately memclrs the allocation; and the two specials types are
tiny so there's no real cost to zeroing them.

Change fixalloc to zero allocations by default.
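
A minimal sketch of zero-on-reuse, using byte slices as a stand-in
for fixalloc's raw off-heap memory.

type fixAlloc struct {
	free [][]byte
	size int
	zero bool // false only for mspan, per the discussion below
}

func (f *fixAlloc) alloc() []byte {
	if n := len(f.free); n > 0 {
		b := f.free[n-1]
		f.free = f.free[:n-1]
		if f.zero {
			for i := range b {
				b[i] = 0 // reused memory must not expose stale pointers
			}
		}
		return b
	}
	return make([]byte, f.size) // fresh memory is already zeroed
}

func (f *fixAlloc) release(b []byte) { f.free = append(f.free, b) }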

The only type we don't zero by default is mspan. This actually
requires that the span's sweepgen survive across freeing and
reallocating a span. If we were to zero it, the following race would
be possible:

1. The current sweepgen is 2. Span s is on the unswept list.

2. Direct sweeping sweeps span s, finds it's all free, and releases s
   to the fixalloc.

3. Thread 1 allocates s from fixalloc. Suppose this zeros s, including
   s.sweepgen.

4. Thread 1 calls s.init, which sets s.state to _MSpanDead.

5. On thread 2, background sweeping comes across span s in allspans
   and cas's s.sweepgen from 0 (sg-2) to 1 (sg-1). Now it thinks it
   owns it for sweeping.

6. Thread 1 continues initializing s. Everything breaks.

I would like to fix this because it's obviously confusing, but it's a
subtle enough problem that I'm leaving it alone for now. The solution
may be to skip sweepgen 0, but then we have to think about wrap-around
much more carefully.

Updates #17503.

Change-Id: Ie08691feed3abbb06a31381b94beb0a2e36a0613
Reviewed-on: https://go-review.googlesource.com/31368
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:20:23 +00:00
Austin Clements
f4dcc9b29b runtime: make _MSpanDead be the zero state
Currently the zero value of an mspan is in the "in use" state. This
seems like a bad idea in general. But it's going to wreak havoc when
we make fixalloc zero allocations: even "freed" mspan objects are
still on the allspans list and still get looked at by the garbage
collector. Hence, if we leave the mspan states the way they are,
allocating a span that reuses old memory will temporarily pass that
span (which is visible to GC!) through the "in use" state, which can
cause "unswept span" panics.

Fix all of this by making the zero state "dead".

Updates #17503.

Change-Id: I77c7ac06e297af4b9e6258bc091c37abe102acc3
Reviewed-on: https://go-review.googlesource.com/31367
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:20:13 +00:00
Austin Clements
aa581f5157 runtime: use typedmemclr for typed memory
The hybrid barrier requires distinguishing typed and untyped memory
even when zeroing because the *current* contents of the memory matters
even when overwriting.

This commit introduces runtime.typedmemclr and runtime.memclrHasPointers
as typed memory clearing functions parallel to runtime.typedmemmove.
Currently these simply call memclr, but with the hybrid barrier we'll
need to shade any pointers we're overwriting. These will provide us
with the necessary hooks to do so.

Updates #17503.

Change-Id: I74478619f8907825898092aaa204d6e4690f27e6
Reviewed-on: https://go-review.googlesource.com/31366
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:20:04 +00:00
Austin Clements
a475a38a3d runtime: parallelize STW mcache flushing
Currently all mcaches are flushed in a single STW root job. This takes
about 5 µs per P, but since it's done sequentially it adds about
5*GOMAXPROCS µs to the STW.

Fix this by parallelizing the job. Since there are exactly GOMAXPROCS
mcaches to flush, this parallelizes quite nicely and brings the STW
latency cost down to a constant 5 µs (assuming GOMAXPROCS actually
reflects the number of CPUs).
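
The effect, sketched with goroutines for brevity; the real change runs
the flushes as STW root jobs claimed by GC workers, and flushMCache is
an illustrative stand-in for the per-P flush work.

package main

import (
	"runtime"
	"sync"
)

func flushMCache(p int) {} // stand-in for flushing P p's mcache

func main() {
	var wg sync.WaitGroup
	for p := 0; p < runtime.GOMAXPROCS(0); p++ {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			flushMCache(p) // one mcache per P, flushed concurrently
		}(p)
	}
	wg.Wait() // latency ~ one flush, not GOMAXPROCS sequential flushes
}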

Updates #17503.

Change-Id: Ibefeb1c2229975d5137c6e67fac3b6c92103742d
Reviewed-on: https://go-review.googlesource.com/32033
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:19:53 +00:00
unknown
7d14401bcb runtime/pprof: write profiles in protobuf format.
Added functions with the suffix "proto", adapting code from the pprof
tool, to translate profiles to protobuf. This was done because the
profile proto is more extensible than the legacy pprof format and is
pprof's preferred profile format. A large part was taken from the
https://github.com/google/pprof tool. Tested by hand by comparing the
result with the pprof tool's own translation; the profiles are
identical.
Fixes #16093
Change-Id: I5acdb2809cab0d16ed4694fdaa7b8ddfd68df11e
Reviewed-on: https://go-review.googlesource.com/30556
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2016-10-28 18:08:27 +00:00
Austin Clements
d70b0fe6c4 runtime: fix preemption of root marking jobs
The current logic in gcDrain conflates non-blocking with preemptible
draining for root jobs. As a result, if you do a non-blocking (but
*not* preemptible) drain, like dedicated workers do, the root job
drain will stop if preempted and fall through to heap marking jobs,
which won't stop until it fails to get a heap marking job.

This commit fixes the condition on root marking jobs so they only stop
when preempted if the drain is preemptible.

Coincidentally, this also fixes a nil pointer dereference if we call
gcDrain with gcDrainNoBlock and without a user G, since it tries to
get the preempt flag from the nil user G. This combination never
happens right now, but will in the future.

Change-Id: Ia910ec20a9b46237f7926969144a33b1b4a7b2f9
Reviewed-on: https://go-review.googlesource.com/32291
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-10-28 18:07:30 +00:00
Austin Clements
e14b021977 runtime/trace, internal/trace: script to collect canned traces
This adds support to the runtime/trace test for saving traces
collected by its tests to disk and a script in internal/trace that
uses this to collect canned traces for the trace test suite. This can
be used to add to the test suite when we introduce a new trace format
version.

Change-Id: Id9ac1ff312235bf02f982fdfff8a827f54035758
Reviewed-on: https://go-review.googlesource.com/32290
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-28 17:46:49 +00:00
Russ Cox
54f691d69d runtime: skip TestMemmoveOverflow if mmap of needed page fails
Fixes #16731.

Change-Id: I6d393357973d008ab7cf5fb264acb7d38c9354eb
Reviewed-on: https://go-review.googlesource.com/32104
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-28 17:10:39 +00:00
Austin Clements
6da83c6fc0 runtime, cmd/trace: track goroutines blocked on GC assists
Currently when a goroutine blocks on a GC assist, it emits a generic
EvGoBlock event. Since assist blocking events and, in particular, the
length of the blocked assist queue, are important for diagnosing GC
behavior, this commit adds a new EvGoBlockGC event for blocking on a
GC assist. The trace viewer uses this event to report a "waiting on
GC" count in the "Goroutines" row. This makes sense because, unlike
other blocked goroutines, these goroutines do have work to do, so
being blocked on a GC assist is quite similar to being in the
"runnable" state, which we also report in the trace viewer.

Change-Id: Ic21a326992606b121ea3d3d00110d8d1fdc7a5ef
Reviewed-on: https://go-review.googlesource.com/30704
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-28 14:29:47 +00:00
Austin Clements
6834839427 runtime, cmd/trace: annotate different mark worker types
Currently mark workers are shown in the trace as regular goroutines
labeled "runtime.gcBgMarkWorker". That's somewhat unhelpful to an end
user because of the opaque label and particularly unhelpful to runtime
developers because it doesn't distinguish the different types of mark
workers.

Fix this by introducing a variant of the GoStart event called
GoStartLabel that lets the runtime indicate a label for a goroutine
execution span and using this to label mark worker executions as "GC
(<mode>)" in the trace viewer.

Since this bumps the trace version to 1.8, we also add test data for
1.7 traces.

Change-Id: Id7b9c0536508430c661ffb9e40e436f3901ca121
Reviewed-on: https://go-review.googlesource.com/30702
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-28 14:29:40 +00:00
David Crawshaw
9c02c75639 runtime: pass windows float syscall args via XMM
Based on the calling convention documented in:

	https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx

and long-used in golang.org/x/mobile/gl via some fixup asm:

	https://go.googlesource.com/mobile/+/master/gl/work_windows_amd64.s

Fixes #6510

Change-Id: I97e81baaa2872bcd732b1308915eb66f1ba2168f
Reviewed-on: https://go-review.googlesource.com/32173
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Minux Ma <minux@golang.org>
2016-10-28 13:13:08 +00:00
Austin Clements
dd500193d3 runtime: fix preemption of fractional and idle mark workers
Currently, gcDrain looks for the preemption flag at getg().preempt.
However, commit d6625ca moved mark worker draining to the system
stack, which means getg() returns the g0, which never has the preempt
flag set, so idle and fractional workers don't get preempted after
10ms and just run until they run out of work. As a result, if there's
enough idle time, GC becomes effectively STW.

Fix this by looking for the preemption flag on getg().m.curg, which
will always be the user G (where the preempt flag is set), regardless
of whether gcDrain is running on the user or the g0 stack.

Change-Id: Ib554cf49a705b86ccc3d08940bc869f868c50dd2
Reviewed-on: https://go-review.googlesource.com/32251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 13:09:48 +00:00
Peter Weinberger
ca922b6d36 runtime: Profile goroutines holding contended mutexes.
runtime.SetMutexProfileFraction(n int) will capture 1/n-th of stack
traces of goroutines holding contended mutexes if n > 0. In runtime/pprof,
pprof.Lookup("mutex").WriteTo writes the accumulated
stack traces to w (in essentially the same format that blocking
profiling uses).
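
A usage sketch based on the API named above:

package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

func main() {
	runtime.SetMutexProfileFraction(5) // sample ~1/5 of contention events

	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				mu.Lock()
				mu.Unlock()
			}
		}()
	}
	wg.Wait()

	// Write the accumulated stack traces, like the blocking profile.
	pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
}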

Change-Id: Ie0b54fa4226853d99aa42c14cb529ae586a8335a
Reviewed-on: https://go-review.googlesource.com/29650
Reviewed-by: Austin Clements <austin@google.com>
2016-10-28 11:47:16 +00:00
Martin Möhrmann
b679665a18 cmd/compile: move stringtoslicebytetmp to the backend
- removes the runtime function stringtoslicebytetmp
- removes the generation of calls to stringtoslicebytetmp from the frontend
- adds handling of OSTRARRAYBYTETMP in the backend

This reduces binary sizes and avoids function call overhead.

Change-Id: Ib9988d48549cee663b685b4897a483f94727b940
Reviewed-on: https://go-review.googlesource.com/32158
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-28 07:58:47 +00:00
Alex Brainman
f595848e9a runtime/cgo: do not link threads lib by default on windows
I do not know why it is included. All tests pass without it.

Change-Id: I839076ee131816dfd177570a902c69fe8fba5022
Reviewed-on: https://go-review.googlesource.com/32144
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-28 04:36:31 +00:00
Michael Hudson-Doyle
8b07ec20f7 cmd/compile, runtime: make the go.itab.* symbols module-local
Otherwise, the way the ELF dynamic linker works means that you can end up with
the same itab being passed to additab twice, leading to the itab linked list
having a cycle in it. Add a test to additab in runtime to catch this when it
happens, not at some arbitrary and surprising time later.

Fixes #17594

Change-Id: I6c82edcc9ac88ac188d1185370242dc92f46b1ad
Reviewed-on: https://go-review.googlesource.com/32131
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-27 19:13:35 +00:00
Russ Cox
5594074dcd runtime: use clock_gettime(CLOCK_REALTIME) for nanosecond-precision time.now on arm64, mips64x
Assembly copied from the clock_gettime(CLOCK_MONOTONIC)
call in runtime.nanotime in these files and then modified to use
CLOCK_REALTIME.

Also comment system call numbers in a few other files.

Fixes #11222.

Change-Id: Ie132086de7386f865908183aac2713f90fc73e0d
Reviewed-on: https://go-review.googlesource.com/32177
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-27 17:53:13 +00:00
Russ Cox
09692182fa runtime: print sigcode on signal crash
For #17496.

Change-Id: I671a59581c54d17bc272767eeb7b2742b54eca38
Reviewed-on: https://go-review.googlesource.com/32183
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-27 17:46:01 +00:00
Than McIntosh
d80e8de54e cmd/compile: avoid truncating fieldname var locations
Don't include package path when creating LSyms for auto and param
variables during Prog generation, and update the DWARF emit routine
accordingly (remove the code that chops off package path from names in
DWARF var location expressions). Implementation suggested by mdempsky@.

The intent of this change is to have saner location expressions in cases
where the variable corresponds to a structure field. For example, the
SSA compiler's "decompose" phase can take a slice value and break it
apart into three scalar variables corresponding to the fields (slice "X"
gets split into "X.len", "X.cap", "X.ptr"). In such cases we want the
name in the location expression to omit the package path but preserve
the original variable name (e.g. "X").

Fixes #16338

Change-Id: Ibc444e7f3454b70fc500a33f0397e669d127daa1
Reviewed-on: https://go-review.googlesource.com/31819
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-10-26 21:14:46 +00:00
Austin Clements
d6625caf53 runtime: scan mark worker stacks like normal
Currently, markroot delays scanning mark worker stacks until mark
termination by putting the mark worker G directly on the rescan list
when it encounters one during the mark phase. Without this, since mark
workers are non-preemptible, two mark workers that attempt to scan
each other's stacks can deadlock.

However, this is annoyingly asymmetric and causes some real problems.
First, markroot does not own the G at that point, so it's not
technically safe to add it to the rescan list. I haven't been able to
find a specific problem this could cause, but I suspect it's the root
cause of issue #17099. Second, this will interfere with the hybrid
barrier, since there is no stack rescanning during mark termination
with the hybrid barrier.

This commit switches to a different approach. We move the mark
worker's call to gcDrain to the system stack and set the mark worker's
status to _Gwaiting for the duration of the drain to indicate that
it's preemptible. This lets another mark worker scan its G stack while
the drain is running on the system stack. We don't return to the G
stack until we can switch back to _Grunning, which ensures we don't
race with a stack scan. This lets us eliminate the special case for
mark worker stack scans and scan them just like any other goroutine.
The only subtlety to this approach is that we have to disable stack
shrinking for mark workers; they could be referring to captured
variables from the G stack, so it's not safe to move their stacks.

Updates #17099 and #17503.

Change-Id: Ia5213949ec470af63e24dfce01df357c12adbbea
Reviewed-on: https://go-review.googlesource.com/31820
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-26 18:13:16 +00:00
Austin Clements
3193c71c5b runtime: fix bad pointer with 0 stack barriers
Currently, if the number of stack barriers for a stack is 0, we'll
create a zero-length slice that points just past the end of the stack
allocation. This bad pointer causes GC panics.

Fix this by creating a nil slice if the stack barrier count is 0.

In practice, the only way this can happen is if
GODEBUG=gcstackbarrieroff=1 is set because even the minimum size stack
reserves space for two stack barriers.

Change-Id: I3527c9a504c445b64b81170ee285a28594e7983d
Reviewed-on: https://go-review.googlesource.com/31762
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-26 15:46:25 +00:00
Austin Clements
d1cc83472d runtime: debug code to panic when marking a free object
This adds debug code enabled in gccheckmark mode that panics if we
attempt to mark an unallocated object. This is a common issue with the
hybrid barrier when we're manipulating uninitialized memory that
contains stale pointers. This also tends to catch bugs that will lead
to "sweep increased allocation count" crashes closer to the source of
the bug.

Change-Id: I443ead3eac6f316a46f50b106078b524cac317f4
Reviewed-on: https://go-review.googlesource.com/31761
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-26 15:46:00 +00:00
Austin Clements
79561a84ce runtime: simplify reflectcall write barriers
Currently reflectcall has a subtle dance with write barriers where the
assembly code copies the result values from the stack to the in-heap
argument frame without write barriers and then calls into the runtime
after the fact to invoke the necessary write barriers.

For the hybrid barrier (and for ROC), we need to switch to a
*pre*-write write barrier, which is very difficult to do with the
current setup. We could tie ourselves in knots of subtle reasoning
about why it's okay in this particular case to have a post-write write
barrier, but this commit instead takes a different approach. Rather
than making things more complex, this simplifies reflection calls so
that the argument copy is done in Go using normal bulk write barriers.

The one difficulty with this approach is that calling into Go requires
putting arguments on the stack, but the call* functions "donate" their
entire stack frame to the called function. We can get away with this
now because the copy avoids using the stack and has copied the results
out before we clobber the stack frame to call into the write barrier.
The solution in this CL is to call another function, passing arguments
in registers instead of on the stack, and let that other function
reserve more stack space and setup the arguments for the runtime.

This approach seemed to work out the best. I also tried making the
call* functions reserve 32 extra bytes of frame for the write barrier
arguments and adjust SP up by 32 bytes around the call. However, even
with the necessary changes to the assembler to correct the spdelta
table, the runtime was still having trouble with the frame layout (and
the changes to the assembler caused many other things that do strange
things with the SP to fail to assemble). The approach I took doesn't
require any funny business with the SP.

Updates #17503.

Change-Id: Ie2bb0084b24d6cff38b5afb218b9e0534ad2119e
Reviewed-on: https://go-review.googlesource.com/31655
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-26 15:44:44 +00:00
Alexander Morozov
2481481ff7 runtime: fix comments in time.go
Change-Id: I5c501f598f41241e6d7b21d98a126827a3c3ad9a
Reviewed-on: https://go-review.googlesource.com/32018
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-26 03:51:33 +00:00
Keith Randall
c78d072c8e Revert "Revert "cmd/compile: inline convI2E""
This reverts commit 7dd9c385f6.

Reason for revert: Reverting the revert, which will re-enable the
convI2E optimization. We originally reverted the convI2E optimization
because it was making the builder fail, but the underlying cause was
later determined to be unrelated.

Original CL: https://go-review.googlesource.com/31260
Revert CL: https://go-review.googlesource.com/31310
Real bug: https://go-review.googlesource.com/c/25159
Real fix: https://go-review.googlesource.com/c/31316

Change-Id: I17237bb577a23a7675a5caab970ccda71a4124f2
Reviewed-on: https://go-review.googlesource.com/32023
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-10-25 23:51:34 +00:00
Austin Clements
575b1dda4e runtime: eliminate allspans snapshot
Now that sweeping and span marking use the sweep list, there's no need
for the work.spans snapshot of the allspans list. This change
eliminates the few remaining uses of it, which are either dead code or
can use allspans directly, and removes work.spans and its support
functions.

Change-Id: Id5388b42b1e68e8baee853d8eafb8bb4ff95bb43
Reviewed-on: https://go-review.googlesource.com/30537
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:33:02 +00:00
Austin Clements
c95a8e458f runtime: make markrootSpans time proportional to in-use spans
Currently markrootSpans iterates over all spans ever allocated to find
the in-use spans. Since we now have a list of in-use spans, change it
to iterate over that instead.

This, combined with the previous change, fixes #9265. Before these two
changes, blowing up the heap to 8GB and then shrinking it to a 0MB
live set caused the small-heap portion of the test to run 60x slower
than without the initial blowup. With these two changes, the time is
indistinguishable.

No significant effect on other benchmarks.

Change-Id: I4a27e533efecfb5d18cba3a87c0181a81d0ddc1e
Reviewed-on: https://go-review.googlesource.com/30536
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:59 +00:00
Austin Clements
f9497a6747 runtime: make sweep time proportional to in-use spans
Currently sweeping walks the list of all spans, which means the work
in sweeping is proportional to the maximum number of spans ever used.
If the heap was once large but is now small, this causes an
amortization failure: on a small heap, GCs happen frequently, but a
full sweep still has to happen in each GC cycle, which means we spent
a lot of time in sweeping.

Fix this by creating a separate list consisting of just the in-use
spans to be swept, so sweeping is proportional to the number of in-use
spans (which is proportional to the live heap). Specifically, we
create two lists: a list of unswept in-use spans and a list of swept
in-use spans. At the start of the sweep cycle, the swept list becomes
the unswept list and the new swept list is empty. Allocating a new
in-use span adds it to the swept list. Sweeping moves spans from the
unswept list to the swept list.

This fixes the amortization problem because a shrinking heap moves
spans off the unswept list without adding them to the swept list,
reducing the time required by the next sweep cycle.
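
A toy model of the two-list scheme; the runtime's actual
representation links spans through the mspan itself rather than using
slices.

type span struct{ swept bool }

type sweepLists struct {
	unswept, swept []*span
}

// startCycle: the swept list becomes the unswept list and the new
// swept list starts empty.
func (l *sweepLists) startCycle() {
	l.unswept, l.swept = l.swept, nil
}

// sweepOne moves one in-use span from unswept to swept, so total sweep
// work is proportional to the number of in-use spans.
func (l *sweepLists) sweepOne() bool {
	n := len(l.unswept)
	if n == 0 {
		return false
	}
	s := l.unswept[n-1]
	l.unswept = l.unswept[:n-1]
	s.swept = true
	l.swept = append(l.swept, s)
	return true
}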

Updates #9265. This fix eliminates almost all of the time spent in
sweepone; however, markrootSpans has essentially the same bug, so now
the test program from this issue spends all of its time in
markrootSpans.

No significant effect on other benchmarks.

Change-Id: Ib382e82790aad907da1c127e62b3ab45d7a4ac1e
Reviewed-on: https://go-review.googlesource.com/30535
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:57 +00:00
Austin Clements
45baff61e3 runtime: expand comment on work.spans
Change-Id: I4b8a6f5d9bc5aba16026d17f99f3512dacde8d2d
Reviewed-on: https://go-review.googlesource.com/30534
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:54 +00:00
Austin Clements
5915ce6674 runtime: use len(h.spans) to indicate mapped region
Currently we set the len and cap of h.spans to the full reserved
region of the address space and track the actual mapped region
separately in h.spans_mapped. Since we have both the len and cap at
our disposal, change things so len(h.spans) tracks how much of the
spans array is mapped and eliminate h.spans_mapped. This simplifies
mheap and means we'll get nice "index out of bounds" exceptions if we
do try to go off the end of the spans rather than a SIGSEGV.
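
The idiom, sketched with made-up sizes:

func sketch() {
	const reservedEntries = 1 << 20 // illustrative figure only

	// cap covers the whole reserved region; len covers only the
	// mapped prefix, so a wild index panics instead of faulting.
	spans := make([]*byte, 0, reservedEntries)
	spans = spans[:4096] // grown as more of the region is mapped
	_ = spans
	// spans[5000] would now panic: index out of range.
}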

Change-Id: I8ed9a1a9a844d90e9fd2e269add4704623dbdfe6
Reviewed-on: https://go-review.googlesource.com/30533
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:51 +00:00
Austin Clements
6b0f668044 runtime: consolidate h_spans and mheap_.spans
Like h_allspans and mheap_.allspans, these were two ways of referring
to the spans array from when the runtime was split between C and Go.
Clean this up by making mheap_.spans a slice and eliminating h_spans.

Change-Id: I3aa7038d53c3a4252050aa33e468c48dfed0b70e
Reviewed-on: https://go-review.googlesource.com/30532
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:48 +00:00
Austin Clements
66e849b168 runtime: eliminate mheap.nspan and use range loops
This was necessary in the C days when allspans was an mspan**, but now
that allspans is a Go slice, this is redundant with len(allspans) and
we can use range loops over allspans.

Change-Id: Ie1dc39611e574e29a896e01690582933f4c5be7e
Reviewed-on: https://go-review.googlesource.com/30531
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:45 +00:00
Austin Clements
4d6207790b runtime: consolidate h_allspans and mheap_.allspans
These are two ways to refer to the allspans array that hark back to
when the runtime was split between C and Go. Clean this up by making
mheap_.allspans a slice and eliminating h_allspans.

Change-Id: Ic9360d040cf3eb590b5dfbab0b82e8ace8525610
Reviewed-on: https://go-review.googlesource.com/30530
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:42 +00:00
Josh Bleecher Snyder
d8f7e4fadb runtime, syscall: appease vet
No functional changes.

Change-Id: I0842b2560f4296abfc453410fdd79514132cab83
Reviewed-on: https://go-review.googlesource.com/31935
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Rob Pike <r@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-25 15:11:54 +00:00
Michael Munday
3ef07c412f cmd, runtime: remove s390x 3 operand immediate logical ops
These are emulated by the assembler and we don't need them.

Change-Id: I2b07c5315a5b642fdb5e50b468453260ae121164
Reviewed-on: https://go-review.googlesource.com/31758
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-25 12:36:06 +00:00
Russ Cox
71cf409dbd runtime: accept timeout from non-timeout semaphore wait on OS X
Looking at the kernel sources, I don't see how this is possible.
But obviously it is. Just try again.

Fixes #17161.

Change-Id: Iea7d53f7cf75944792d2f75a0d07129831c7bcdb
Reviewed-on: https://go-review.googlesource.com/31823
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-25 02:51:04 +00:00
Russ Cox
39690beb58 runtime: fix invariant comment in chan.go
Change-Id: Ic6317f186d0ee68ab1f2d15be9a966a152f61bfb
Reviewed-on: https://go-review.googlesource.com/31610
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-24 15:59:28 +00:00
Russ Cox
8419c85eaa runtime, cmd/link: fix netbsd/arm EABI support
Fixes reported by oshimaya (see #13806).

Fixes #13806.

Change-Id: I9b659ab918a34bc5f7c58f3d7f59058115b7f776
Reviewed-on: https://go-review.googlesource.com/31651
Reviewed-by: Minux Ma <minux@golang.org>
2016-10-24 15:23:13 +00:00
Alex Brainman
9ac60181e2 runtime/cgo: do not link math lib by default on windows
Makes windows same as others.

Change-Id: Ib4651e06d0bd37473ac345d36c91f39aa8f5e662
Reviewed-on: https://go-review.googlesource.com/31791
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-24 08:09:57 +00:00
Austin Clements
3cbfcaa4ba runtime: make mspan.isFree do what's on the tin
Currently mspan.isFree technically returns whether the object was not
allocated *during this cycle*. Fix it so it actually returns whether
or not the object is allocated so the method is more generally useful
(especially for debugging).

It has one caller, which is carefully written to be insensitive to
this distinction, but this lets us simplify this caller.

Change-Id: I9d79cf784a56015e434961733093c1d8d03fc091
Reviewed-on: https://go-review.googlesource.com/30145
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-24 02:33:39 +00:00
Austin Clements
bf9c71cb43 runtime: make morestack less subtle
morestack writes the context pointer to gobuf.ctxt, but since
morestack is written in assembly (and has to be very careful with
state), it does *not* invoke the requisite write barrier for this
write. Instead, we patch this up later, in newstack, where we invoke
an explicit write barrier for ctxt.

This already requires some subtle reasoning, and it's going to get a
lot hairier with the hybrid barrier.

Fix this by simplifying the whole mechanism. Instead of writing
gobuf.ctxt in morestack, just pass the value of the context register
to newstack and let it write it to gobuf.ctxt. This is a normal Go
pointer write, so it gets the normal Go write barrier. No subtle
reasoning required.

Updates #17503.

Change-Id: Ia6bf8459bfefc6828f53682ade32c02412e4db63
Reviewed-on: https://go-review.googlesource.com/31550
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-24 02:23:16 +00:00
Josh Bleecher Snyder
448e1db103 runtime: skip TestLldbPython
The test is broken on macOS Sierra.

Updates #17463.

Change-Id: Ifbb2379c640b9353a01bc55a5cb26dfaad9b4bdc
Reviewed-on: https://go-review.googlesource.com/31725
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-22 19:17:15 +00:00
Austin Clements
a73d68e75e runtime: fix call* signatures and deferArgs with siz=0
This commit fixes two bizarrely related bugs:

1. The signatures for the call* functions were wrong, indicating that
they had only two pointer arguments instead of three. We didn't notice
because the call* functions are defined by a macro expansion, which go
vet doesn't see.

2. deferArgs on a defer object with a zero-sized frame returned a
pointer just past the end of the allocated object, which is illegal in
Go (and can cause the "sweep increased allocation count" crashes).

In a fascinating twist, these two bugs canceled each other out, which
is why I'm fixing them together. The pointer returned by deferArgs is
used in only two ways: as an argument to memmove and as an argument to
reflectcall. memmove is NOSPLIT, so the argument was unobservable.
reflectcall immediately tail calls one of the call* functions, which
are not NOSPLIT, but the deferArgs pointer just happened to be the
third argument that was accidentally marked as a scalar. Hence, when
the garbage collector scanned the stack, it didn't see the bad
pointer as a pointer.

I believe this was all ultimately benign. In principle, stack growth
during the reflectcall could fail to update the args pointer, but it
never points to the stack, so it never needs to be updated. Also in
principle, the garbage collector could fail to mark the args object
because of the incorrect call* signatures, but in all calls to
reflectcall (including the ones spelled "call" in the reflect package)
the args object is kept live by the calling stack.

Change-Id: Ic932c79d5f4382be23118fdd9dba9688e9169e28
Reviewed-on: https://go-review.googlesource.com/31654
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-21 16:01:32 +00:00
Austin Clements
c242517866 runtime: replace *g with guintptr in trace
trace's reader *g is going to cause write barriers in unfortunate
places, so replace it with a guintptr.

Change-Id: Ie8fb13bb89a78238f9d2a77ec77da703e96df8af
Reviewed-on: https://go-review.googlesource.com/31469
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-21 16:00:20 +00:00
Alberto Donizetti
10560afb54 runtime/debug: avoid overflow in SetMaxThreads
Fixes #16076

Change-Id: I91fa87b642592ee4604537dd8c3197cd61ec8b31
Reviewed-on: https://go-review.googlesource.com/31516
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-20 21:08:18 +00:00
Michael Munday
f1ad4863aa runtime: get s390x vector facility availability from AT_HWCAP
This is a more robust method for obtaining the availability of vx.
Since this variable may be checked frequently I've also now
padded it so that it will be in its own cache line.

I've kept the other check (in hash/crc32) the same for now until
I can figure out the best way to update it.

Updates #15403.

Change-Id: I74eed651afc6f6a9c5fa3b88fa6a2b0c9ecf5875
Reviewed-on: https://go-review.googlesource.com/31149
Reviewed-by: Austin Clements <austin@google.com>
2016-10-19 21:50:13 +00:00
Austin Clements
2be3ab4415 runtime: keep gcMarkRootCheck happy with spare Gs
oneNewExtraM creates a spare M and G for use with cgo callbacks. The G
doesn't run right away, but goes directly into syscall status. For the
garbage collector, it's marked as "scan valid" and not on the rescan
list, but I forgot to also mark it as "scan done". As a result,
gcMarkRootCheck thinks that the goroutine hasn't been scanned and
panics.

This only affects GODEBUG=gccheckmark=1 mode, since we otherwise skip
the gcMarkRootCheck.

Fixes #17473.

Change-Id: I94f5671c42eb44bd5ea7dc68fbf85f0c19e2e52c
Reviewed-on: https://go-review.googlesource.com/31139
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-19 21:36:53 +00:00
Austin Clements
7bc42a145a runtime: don't reserve space for stack barriers if they're off
Change-Id: I79ebccdaefc434c47b77bd545cc3c50723c18b61
Reviewed-on: https://go-review.googlesource.com/31135
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-19 21:36:37 +00:00
Austin Clements
5b7497f327 runtime: update heap profile stats after world is started
Updating the heap profile stats is one of the most expensive parts of
mark termination other than stack rescanning, but there's really no
need to do this with the world stopped. Move it to right after we've
started the world back up. This creates a *very* small window where
allocations from the next cycle can slip into the profile, but the
exact point where mark termination happens is so non-deterministic
already that a slight reordering here is unimportant.

Change-Id: I2f76f22c70329923ad6a594a2c26869f0736d34e
Reviewed-on: https://go-review.googlesource.com/31363
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-19 21:36:24 +00:00
Austin Clements
9429aab999 runtime: remove gcWork flushes in mark termination
The only reason these flushes are still necessary at all is that
gcmarknewobject doesn't flush its gcWork stats like it's supposed to.
By changing gcmarknewobject to follow the standard protocol, the
flushes become completely unnecessary because mark 2 ensures caches
are flushed (and stay flushed) before we ever enter mark termination.

In the garbage benchmark, this takes roughly 50 µs, which is
surprisingly long for doing nothing. We still double-check after
draining that they are in fact empty.

Change-Id: Ia1c7cf98a53f72baa513792eb33eca6a0b4a7128
Reviewed-on: https://go-review.googlesource.com/31134
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-19 21:36:09 +00:00
Ian Lance Taylor
a16954b8a7 cmd/cgo: always use a function literal for pointer checking
The pointer checking code needs to know the exact type of the parameter
expected by the C function, so that it can use a type assertion to
convert the empty interface returned by cgoCheckPointer to the correct
type. Previously this was done by using a type conversion, but that
meant that the code accepted arguments that were convertible to the
parameter type, rather than arguments that were assignable as in a
normal function call. In other words, some code that should not have
passed type checking was accepted.
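
The difference is easy to see in plain Go (an analogy, not
cgo-generated code): two defined types with the same underlying type
are convertible but not assignable.

    type Celsius float64
    type Fahrenheit float64

    func want(c Celsius) {}

    func f(t Fahrenheit) {
        _ = Celsius(t) // conversion: compiles
        // want(t)     // ordinary call: compile error, t is not assignable
    }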

This CL changes cgo to always use a function literal for pointer
checking. Now the argument is passed to the function literal, which has
the correct argument type, so type checking is performed just as for a
function call as it should be.

Since we now always use a function literal, simplify the checking code
to run as a statement by itself. It now no longer needs to return a
value, and we no longer need a type assertion.

This does have the cost of introducing another function call into any
call to a C function that requires pointer checking, but the cost of the
additional call should be minimal compared to the cost of pointer
checking.

Fixes #16591.

Change-Id: I220165564cf69db9fd5f746532d7f977a5b2c989
Reviewed-on: https://go-review.googlesource.com/31233
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-10-19 21:20:50 +00:00
Russ Cox
40d81cf061 sync: throw, not panic, for unlock of unlocked mutex
The panic leaves the lock in an unusable state.
Trying to panic with a usable state makes the lock significantly
less efficient and scalable (see early CL patch sets and discussion).

Instead, use runtime.throw, which will crash the program directly.

In general throw is reserved for when the runtime detects truly
serious, unrecoverable problems. This problem is certainly serious,
and, without a significant performance hit, is unrecoverable.
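
Illustration (not part of this CL): after this change the error can
no longer be intercepted with recover.

    var mu sync.Mutex

    func f() {
        defer func() {
            recover() // never helps: throw is not a panic
        }()
        mu.Unlock() // fatal error: sync: unlock of unlocked mutex
    }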

Fixes #13879.

Change-Id: I41920d9e2317270c6f909957d195bd8b68177f8d
Reviewed-on: https://go-review.googlesource.com/31359
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-19 17:46:27 +00:00
Matthew Dempsky
7dd9c385f6 Revert "cmd/compile: inline convI2E"
This reverts commit 395d36a67d.

Appears to be responsible for builder failures.

Change-Id: Ic6c6307f662767e529060b88704a9f074785d99e
Reviewed-on: https://go-review.googlesource.com/31310
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-10-17 20:37:19 +00:00
Lynn Boger
2190f771d8 bytes: fix typo in ppc64le asm for Compare
Correcting a line in asm_ppc64x.s in the cmpbodyLE function
that originally used R14 but was accidentally changed to R4.

Fixes #17488

Change-Id: Id4ca6fb2e0cd81251557a0627e17b5e734c39e01
Reviewed-on: https://go-review.googlesource.com/31266
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
2016-10-17 20:01:17 +00:00
Austin Clements
984753b665 runtime: fix GC assist retry path
GC assists retry if preempted or if they fail to park. However, on the
retry path they currently use stale statistics. In particular, the
retry can use "debtBytes", but debtBytes isn't updated when the debt
changes (since other than retries it is only used once). Also, though
less of a problem, if the assist ratio has changed while the
assist was blocked, the retry will still use the old assist ratio.

Fix all of this by simply making the retry jump back to where we
compute these statistics, rather than just after.

Change-Id: I2ed8b4f0fc9f008ff060aa926f4334b662ac7d3f
Reviewed-on: https://go-review.googlesource.com/30701
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-17 19:09:37 +00:00
Austin Clements
81c431a537 runtime: abstract out assist queue management
This puts all of the assist queue-related code together and makes it
easier to modify how the assist queue works.

Change-Id: Id54e06702bdd5a5dd3fef2ce2c14cd7ca215303c
Reviewed-on: https://go-review.googlesource.com/30700
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-17 19:09:30 +00:00
Keith Randall
395d36a67d cmd/compile: inline convI2E
It's pretty simple.  For:
  e = (interface{})(i)
Do:
  tmp = i.itab
  if tmp != nil {
    tmp = tmp.typ_ // load type from itab
  }
  e = eface{tmp, i.data}

It is smaller and faster than calling the runtime.

Change-Id: I0ad27f62f4ec0b6cd53bc8530e4da0eae3e67a6c
Reviewed-on: https://go-review.googlesource.com/31260
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-10-17 19:09:07 +00:00
Austin Clements
d5bd797ee5 runtime: fix getArgInfo for deferred reflection calls
getArgInfo for reflect.makeFuncStub and reflect.methodValueCall is
necessarily special. These have dynamically determined argument maps
that are stored in their context (that is, their *funcval). These
functions are written to store this context at 0(SP) when called, and
getArgInfo retrieves it from there.

This technique works if getArgInfo is passed an active call frame for
one of these functions. However, getArgInfo is also used in
tracebackdefers, where the "call" is not a true call with an active
stack frame, but a deferred call. In this situation, getArgInfo
currently crashes because tracebackdefers passes a frame with sp set
to 0. However, the entire approach used by getArgInfo is flawed in
this situation because the wrapper has not actually executed, and
hence hasn't saved this metadata to any stack frame.
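
A minimal sketch of the failing shape (illustrative; any traceback of
the defer before it runs walks this path):

    fn := reflect.MakeFunc(
        reflect.TypeOf(func() {}),
        func([]reflect.Value) []reflect.Value { return nil },
    )
    defer fn.Interface().(func())() // a deferred reflect.makeFuncStub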

In the defer case, we know the *funcval from the _defer itself, so we
can fix this by teaching getArgInfo to use the *funcval context
directly when it's available, and otherwise get it from the active call
frame.

While we're here, this commit simplifies getArgInfo a bit by making it
play more nicely with the type system. Rather than decoding the
*reflect.methodValue that is the wrapper's context as a *[2]uintptr,
just write out a copy of the reflect.methodValue type in the runtime.

Fixes #16331. Fixes #17471.

Change-Id: I81db4d985179b4a81c68c490cceeccbfc675456a
Reviewed-on: https://go-review.googlesource.com/31138
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-17 18:57:01 +00:00
Austin Clements
687d9d5d78 runtime: print a message on bad morestack
If morestack runs on the g0 or gsignal stack, it currently performs
some abort operation that typically produces a signal (e.g., it does
an INT $3 on x86). This is useful if you're running in a debugger, but
if you're not, the runtime tries to trap this signal, which is likely
to send the program into a deeper spiral of collapse and lead to very
confusing diagnostic output.

Help out people trying to debug without a debugger by making morestack
print an informative message before blowing up.

Change-Id: I2814c64509b137bfe20a00091d8551d18c2c4749
Reviewed-on: https://go-review.googlesource.com/31133
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-17 18:56:09 +00:00
Lynn Boger
1e28dce80a bytes: improve performance for bytes.Compare on ppc64x
This improves the performance of bytes.Compare by rewriting
the cmpbody function in runtime/asm_ppc64x.s.  The previous code
had a simple loop which loaded a pair of bytes and compared them,
which is inefficient for long buffers.  The updated function checks
for 8 or 32 byte chunks and then loads and compares double words where
possible.

Because the bytes.Compare result indicates greater or less than,
the doubleword loads must take endianness into account, using a
byte reversed load in the little endian case.
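
In Go terms (illustrative; the actual change is in assembly):
lexicographic order of 8 bytes equals numeric order of their
big-endian interpretation, so a little-endian load must be
byte-reversed first.

    func cmp8(x, y []byte) int {
        a := binary.BigEndian.Uint64(x) // byte-reversed load on LE
        b := binary.BigEndian.Uint64(y)
        switch {
        case a < b:
            return -1
        case a > b:
            return +1
        }
        return 0
    }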

Fixes #17433

benchmark                                   old ns/op     new ns/op     delta
BenchmarkBytesCompare/8-16                  13.6          7.16          -47.35%
BenchmarkBytesCompare/16-16                 25.7          7.83          -69.53%
BenchmarkBytesCompare/32-16                 38.1          7.78          -79.58%
BenchmarkBytesCompare/64-16                 63.0          10.6          -83.17%
BenchmarkBytesCompare/128-16                112           13.0          -88.39%
BenchmarkBytesCompare/256-16                211           28.1          -86.68%
BenchmarkBytesCompare/512-16                410           38.6          -90.59%
BenchmarkBytesCompare/1024-16               807           60.2          -92.54%
BenchmarkBytesCompare/2048-16               1601          103           -93.57%

Change-Id: I121acc74fcd27c430797647b8d682eb0607c63eb
Reviewed-on: https://go-review.googlesource.com/30949
Reviewed-by: David Chase <drchase@google.com>
2016-10-17 18:46:20 +00:00
Martin Möhrmann
d295174030 runtime: speed up non-ASCII rune decoding
Copies utf8 constants and EncodeRune implementation from unicode/utf8.

Adds a new decoderune implementation that is used by the compiler
in code generated for ranging over strings. It does not handle
ASCII runes since these are handled directly before calls to decoderune.
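
The split is roughly equivalent to this library-level loop (a sketch;
process stands in for the range loop body):

    for i := 0; i < len(s); {
        if c := s[i]; c < utf8.RuneSelf {
            process(i, rune(c)) // ASCII fast path, fully inlined
            i++
            continue
        }
        r, size := utf8.DecodeRuneInString(s[i:]) // decoderune's job
        process(i, r)
        i += size
    }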

The DecodeRuneInString implementation from unicode/utf8 is not used
since it uses a lookup table that would increase pressure on CPU caches.

Adds more tests that check decoding of valid and invalid utf8 sequences.

name                              old time/op  new time/op  delta
RuneIterate/range2/ASCII-4        7.45ns ± 2%  7.45ns ± 1%     ~     (p=0.634 n=16+16)
RuneIterate/range2/Japanese-4     53.5ns ± 1%  49.2ns ± 2%   -8.03%  (p=0.000 n=20+20)
RuneIterate/range2/MixedLength-4  46.3ns ± 1%  41.0ns ± 2%  -11.57%  (p=0.000 n=20+20)

new:
"".decoderune t=1 size=423 args=0x28 locals=0x0
old:
"".charntorune t=1 size=666 args=0x28 locals=0x0

Change-Id: I1df1fdb385bb9ea5e5e71b8818ea2bf5ce62de52
Reviewed-on: https://go-review.googlesource.com/28490
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-17 11:25:22 +00:00
Austin Clements
9897e40811 runtime: use more go:nowritebarrierrec in proc.go
Currently we use go:nowritebarrier in many places in proc.go.
go:notinheap and go:yeswritebarrierrec now let us use
go:nowritebarrierrec (the recursive form of the go:nowritebarrier
pragma) more liberally. Do so in proc.go.

Change-Id: Ia7fcbc12ce6c51cb24730bf835fb7634ad53462f
Reviewed-on: https://go-review.googlesource.com/30942
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-15 17:58:23 +00:00
Austin Clements
1bc6be6423 runtime: mark several types go:notinheap
This covers basically all sysAlloc'd, persistentalloc'd, and
fixalloc'd types.

Change-Id: I0487c887c2a0ade5e33d4c4c12d837e97468e66b
Reviewed-on: https://go-review.googlesource.com/30941
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-15 17:58:20 +00:00
Austin Clements
991a85c889 runtime: make mSpanList more go:notinheap-friendly
Currently mspan links to its previous mspan using a **mspan field that
points to the previous span's next field. This simplifies some of the
list manipulation code, but is going to make it very hard to convince
the compiler that mspan list manipulations don't need write barriers.

Fix this by using a more traditional ("boring") linked list that uses
a simple *mspan pointer to the previous mspan. This complicates some
of the list manipulation slightly, but it will let us eliminate all
write barriers from the mspan list manipulation code by marking mspan
go:notinheap.
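
A sketch of the two layouts (type names are placeholders, not the
runtime's):

    type spanOld struct {
        next *spanOld
        prev **spanOld // points at the previous span's next field
    }

    type spanNew struct {
        next *spanNew
        prev *spanNew // plain pointer to the previous span
    }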

Change-Id: I0d0b212db5f20002435d2a0ed2efc8aa0364b905
Reviewed-on: https://go-review.googlesource.com/30940
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-15 17:58:17 +00:00
Austin Clements
77527a316b cmd/compile: add go:notinheap type pragma
This adds a //go:notinheap pragma for declarations of types that must
not be heap allocated. We ensure these rules by disallowing new(T),
make([]T), append([]T), or implicit allocation of T, by disallowing
conversions to notinheap types, and by propagating notinheap to any
struct or array that contains notinheap elements.
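
Usage looks like this (a runtime-internal sketch; the type is made
up):

    //go:notinheap
    type treapNode struct {
        // Writes to these fields compile to plain stores with no
        // write barrier: a treapNode is never a heap object.
        left  *treapNode
        right *treapNode
    }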

The utility of this pragma is that we can eliminate write barriers for
writes to pointers to go:notinheap types, since the write barrier is
guaranteed to be a no-op. This will let us mark several scheduler and
memory allocator structures as go:notinheap, which will let us
disallow write barriers in the scheduler and memory allocator much
more thoroughly and also eliminate some problematic hybrid write
barriers.

This also makes go:nowritebarrierrec and go:yeswritebarrierrec much
more powerful. Currently we use go:nowritebarrier all over the place,
but it's almost never what you actually want: when write barriers are
illegal, they're typically illegal for a whole dynamic scope. Partly
this is because go:nowritebarrier has been around longer, but it's
also because go:nowritebarrierrec couldn't be used in situations that
had no-op write barriers or where some nested scope did allow write
barriers. go:notinheap eliminates many no-op write barriers and
go:yeswritebarrierrec makes it possible to opt back in to write
barriers, so these two changes will let us use go:nowritebarrierrec
far more liberally.

This updates #13386, which is about controlling pointers from non-GC'd
memory to GC'd memory. That would require some additional pragma (or
pragmas), but could build on this pragma.

Change-Id: I6314f8f4181535dd166887c9ec239977b54940bd
Reviewed-on: https://go-review.googlesource.com/30939
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-10-15 17:58:14 +00:00
Austin Clements
a9e6cebde2 cmd/compile, runtime: add go:yeswritebarrierrec pragma
This pragma cancels the effect of go:nowritebarrierrec. This is useful
in the scheduler because there are places where we enter a function
without a valid P (and hence cannot have write barriers), but then
obtain a P. This allows us to annotate the function with
go:nowritebarrierrec and split out the part after we've obtained a P
into a go:yeswritebarrierrec function.
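
The intended pattern (a sketch with made-up function names):

    //go:nowritebarrierrec
    func beforeP() {
        // No P is held here, so write barriers must not be emitted.
        afterP() // called once a P has been acquired
    }

    //go:yeswritebarrierrec
    func afterP() {
        // Write barriers are allowed again in this dynamic scope.
    }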

Change-Id: Ic8ce4b6d3c074a1ecd8280ad90eaf39f0ffbcc2a
Reviewed-on: https://go-review.googlesource.com/30938
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-15 17:58:11 +00:00
Keith Randall
442de98c14 cmd/compile,runtime: redo how map assignments work
To compile:
  m[k] = v
instead of:
  mapassign(maptype, m, &k, &v)
do:
  *mapassign(maptype, m, &k) = v

mapassign returns a pointer to the value slot in the map.  It is just
like mapaccess except that it will allocate a new slot if k is not
already present in the map.

This makes map accesses faster but potentially larger (codewise).

It is faster because the write into the map is done when the compiler
knows the concrete type, so it can be done with a few store
instructions instead of calling typedmemmove.  We also potentially
avoid stack temporaries to hold v.

The code can be larger when the map has pointers in its value type,
since there is a write barrier call in addition to the mapassign call.
That makes the code at the callsite a bit bigger (go binary is 0.3%
bigger).

This CL is in preparation for doing operations like m[k] += v with
only a single runtime call.  That will roughly double the speed of
such operations.
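
Concretely, m[k] += v could then become (a sketch):

  p := mapassign(maptype, m, &k) // one lookup, allocating if needed
  *p += v                        // store through the returned pointer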

Update #17133
Update #5147

Change-Id: Ia435f032090a2ed905dac9234e693972fe8c2dc5
Reviewed-on: https://go-review.googlesource.com/30815
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-12 20:41:23 +00:00
Emmanuel Odeke
898ca6ba0a runtime: update mkduff legacy comments
Update comments for duffzero and duffcopy
which referred to legacy locations:
+ cmd/?g/cgen.go
+ cmd/?g/ggen.go

Remnants of the old days when we had 5g, 6g etc.

Those locations have since moved to:
+ cmd/compile/internal/<arch>/cgen.go
+ cmd/compile/internal/<arch>/ggen.go

Change-Id: Ie2ea668559d52d42b747260ea69a6d5b3d70e859
Reviewed-on: https://go-review.googlesource.com/29073
Reviewed-by: Russ Cox <rsc@golang.org>
2016-10-12 14:51:50 +00:00
Joshua Boelter
1a3b739b26 runtime: check for errors returned by windows sema calls
Add checks for failure of CreateEvent, SetEvent or
WaitForSingleObject. Any failures are considered fatal and
will throw() after printing an informative message.

Updates #16646

Change-Id: I3bacf9001d2abfa8667cc3aff163ff2de1c99915
Reviewed-on: https://go-review.googlesource.com/26655
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-12 13:39:43 +00:00
Gyu-Ho Lee
456b7f5a97 runtime/pprof: preallocate slice in pprof.go
To prevent slice growth when appending.

Change-Id: I2cdb9b09bc33f63188b19573c8b9a77601e63801
Reviewed-on: https://go-review.googlesource.com/23783
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-10-12 06:33:34 +00:00
Gleb Stepanov
460d112f6c runtime: fix typo in comments
Fix a typo in the word "synchronization" in comments.

Change-Id: I453b4e799301e758799c93df1e32f5244ca2fb84
Reviewed-on: https://go-review.googlesource.com/25174
Reviewed-by: Russ Cox <rsc@golang.org>
2016-10-12 02:48:31 +00:00
Michael Munday
8728df645c runtime: remove canBackTrace variable from TestGdbPython
The canBackTrace variable is true for all of the architectures
Go supports and this is likely to remain the case as new
architectures are added.

Change-Id: I73900c018eb4b2e5c02fccd8d3e89853b2ba9d90
Reviewed-on: https://go-review.googlesource.com/22423
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-12 02:16:11 +00:00
Matthew Dempsky
303b69feb7 cmd/compile, runtime: stop padding stackmaps to 4 bytes
Shrinks cmd/go's text segment by 0.9%.

   text	   data	    bss	    dec	    hex	filename
6447148	 231643	 146328	6825119	 68249f	go.before
6387404	 231643	 146328	6765375	 673b3f	go.after

Change-Id: I431e8482dbb11a7c1c77f2196cada43d5dad2981
Reviewed-on: https://go-review.googlesource.com/30817
Reviewed-by: Austin Clements <austin@google.com>
2016-10-11 20:23:30 +00:00
Cherry Zhang
7c431cb7f9 cmd/link: insert trampolines for too-far jumps on ARM
The ARM direct CALL/JMP instruction has a 24-bit offset, which can
only encode jumps within +/-32MB. When the target is too far, the top
bits get truncated and the program jumps wild.

This CL detects too-far jumps and automatically inserts trampolines,
currently only for internal linking on ARM.

It is necessary to make the following changes to the linker:
- Resolve direct jump relocs when assigning addresses to functions.
  This allows trampoline insertion without moving code that has
  already been laid down.
- Lay down packages in dependency order, so that when resolving an
  inter-package direct jump reloc, the target address is already
  known. Intra-package jumps are assumed never too far.
- A linker flag -debugtramp is added for debugging trampolines:
    "-debugtramp=1 -v" prints trampoline debug message
    "-debugtramp=2"    forces all inter-package jump to use
                       trampolines (currently ARM only)
    "-debugtramp=2 -v" does both
- Some data structures are changed for bookkeeping.

On ARM, pseudo DIV/DIVU/MOD/MODU instructions now clobber R8
(unfortunate). In the standard library there is no ARM assembly
code that uses these instructions, and the compiler no longer emits
them (CL 29390).

all.bash passes with -debugtramp=2, except a disassembly test (this
is unavoidable as we changed the instruction).

TBD: debug info of trampolines?

Fixes #17028.

Change-Id: Idcce347ea7e0af77c4079041a160b2f6e114b474
Reviewed-on: https://go-review.googlesource.com/29397
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-11 13:35:33 +00:00
Ian Lance Taylor
d03e8b226c runtime: record current PC for SIGPROF on non-Go thread
If we get a SIGPROF on a non-Go thread, and the program has not called
runtime.SetCgoTraceback so we have no way to collect a stack trace, then
record a profile that is just the PC where the signal occurred. That
will at least point the user to the right area.

Retrieving the PC from the sigctxt in a signal handler on a non-Go thread
required marking a number of trivial sigctxt methods as nosplit, and,
for extra safety, nowritebarrierrec.

The test shows that the existing CgoPprofThread test does not test
the stack trace, just the profile signal. Leaving that for later.

Change-Id: I8f8f3ff09ac099fc9d9df94b5a9d210ffc20c4ab
Reviewed-on: https://go-review.googlesource.com/30252
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-11 12:56:15 +00:00
Alex Brainman
dd307da10c runtime/cgo: do not explicitly link msvcrt.dll
CL 14472 solved issue #12030 by explicitly linking msvcrt.dll
to every cgo executable we build. This CL achieves the same
by manually loading msvcrt.dll during startup.

Updates #12030

Change-Id: I5d9cd925ef65cc34c5d4031c750f0f97794529b2
Reviewed-on: https://go-review.googlesource.com/30737
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-10-11 05:45:06 +00:00
Austin Clements
94589054d3 cmd/trace: label mark termination spans as such
Currently these are labeled "MARK", which was accurate in the STW
collector, but these really indicate mark termination now, since
marking happens for the full duration of the concurrent GC. Re-label
them as "MARK TERMINATION" to clarify this.

Change-Id: Ie98bd961195acde49598b4fa3f9e7d90d757c0a6
Reviewed-on: https://go-review.googlesource.com/30018
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-07 18:33:23 +00:00
Austin Clements
fa9b57bb1d runtime: make next_gc ^0 when GC is disabled
When GC is disabled, we set gcpercent to -1. However, we still use
gcpercent to compute several values, such as next_gc and gc_trigger.
These calculations are meaningless when gcpercent is -1 and result in
meaningless values. This is okay in a sense because we also never use
these values if gcpercent is -1, but they're confusing when exposed to
the user, for example via MemStats or the execution trace. It's
particularly unfortunate in the execution trace because it attempts to
plot the underflowed value of next_gc, which scales all useful
information in the heap row into oblivion.

Fix this by making next_gc ^0 when gcpercent < 0. This has the
advantage of being true in a way: next_gc is effectively infinite when
gcpercent < 0. We can also detect this special value when updating the
execution trace and report next_gc as 0 so it doesn't blow up the
display of the heap line.
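
A sketch of both sides of the change (simplified):

    // When updating the GC policy:
    if gcpercent < 0 {
        memstats.next_gc = ^uint64(0) // effectively infinite
    }

    // When emitting the trace's heap goal:
    nextGC := memstats.next_gc
    if nextGC == ^uint64(0) {
        nextGC = 0 // keep the heap row readable
    }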

Change-Id: I4f366e4451f8892a4908da7b2b6086bdc67ca9a9
Reviewed-on: https://go-review.googlesource.com/30016
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-07 18:32:51 +00:00
Ian Lance Taylor
15937ccb89 runtime: fix sigset type for ppc64 big-endian GNU/Linux
On 64-bit big-endian GNU/Linux machines we need to treat sigset as a
single uint64, not as a pair of uint32 values. This fix was already made
for s390x, but not for ppc64 (which is big-endian--the little endian
version is known as ppc64le). So copy os_linux_s390x.go to
os_linux_be64.go, and use build constraints as needed.

Fixes #17361

Change-Id: Ia0eb18221a8f5056bf17675fcfeb010407a13fb0
Reviewed-on: https://go-review.googlesource.com/30602
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-06 22:24:40 +00:00
Brad Fitzpatrick
4103fedf19 runtime: skip gdb tests on linux/ppc64 for now
Updates #17366

Change-Id: Ia4bd3c74c48b85f186586184a7c2b66d3b80fc9c
Reviewed-on: https://go-review.googlesource.com/30596
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-10-06 19:52:09 +00:00
Denis Nagorny
d7507e9d11 runtime: improve memmove for amd64
Use AVX, if available, on 4th-generation Intel(TM) Core(TM) processors.

    (collected on E5 2609v3 @1.9GHz)
    name                        old speed      new speed       delta
    Memmove/1-6                  158MB/s ± 0%    172MB/s ± 0%    +9.09% (p=0.000 n=16+16)
    Memmove/2-6                  316MB/s ± 0%    345MB/s ± 0%    +9.09% (p=0.000 n=18+16)
    Memmove/3-6                  517MB/s ± 0%    517MB/s ± 0%      ~ (p=0.445 n=16+16)
    Memmove/4-6                  687MB/s ± 1%    690MB/s ± 0%    +0.35% (p=0.000 n=20+17)
    Memmove/5-6                  729MB/s ± 0%    729MB/s ± 0%    +0.01% (p=0.000 n=16+18)
    Memmove/6-6                  875MB/s ± 0%    875MB/s ± 0%    +0.01% (p=0.000 n=18+18)
    Memmove/7-6                 1.02GB/s ± 0%   1.02GB/s ± 1%      ~ (p=0.139 n=19+20)
    Memmove/8-6                 1.26GB/s ± 0%   1.26GB/s ± 0%    +0.00% (p=0.000 n=18+18)
    Memmove/9-6                 1.42GB/s ± 0%   1.42GB/s ± 0%    +0.00% (p=0.000 n=17+18)
    Memmove/10-6                1.58GB/s ± 0%   1.58GB/s ± 0%    +0.00% (p=0.000 n=19+19)
    Memmove/11-6                1.74GB/s ± 0%   1.74GB/s ± 0%    +0.00% (p=0.001 n=18+17)
    Memmove/12-6                1.90GB/s ± 0%   1.90GB/s ± 0%    +0.00% (p=0.000 n=19+19)
    Memmove/13-6                2.05GB/s ± 0%   2.05GB/s ± 0%    +0.00% (p=0.000 n=18+19)
    Memmove/14-6                2.21GB/s ± 0%   2.21GB/s ± 0%    +0.00% (p=0.000 n=16+20)
    Memmove/15-6                2.37GB/s ± 0%   2.37GB/s ± 0%    +0.00% (p=0.004 n=19+20)
    Memmove/16-6                2.53GB/s ± 0%   2.53GB/s ± 0%    +0.00% (p=0.000 n=16+16)
    Memmove/32-6                4.67GB/s ± 0%   4.67GB/s ± 0%    +0.00% (p=0.000 n=17+17)
    Memmove/64-6                8.67GB/s ± 0%   8.64GB/s ± 0%    -0.33% (p=0.000 n=18+17)
    Memmove/128-6               12.6GB/s ± 0%   11.6GB/s ± 0%    -8.05% (p=0.000 n=16+19)
    Memmove/256-6               16.3GB/s ± 0%   16.6GB/s ± 0%    +1.66% (p=0.000 n=20+18)
    Memmove/512-6               21.5GB/s ± 0%   24.4GB/s ± 0%   +13.35% (p=0.000 n=18+17)
    Memmove/1024-6              24.7GB/s ± 0%   33.7GB/s ± 0%   +36.12% (p=0.000 n=18+18)
    Memmove/2048-6              27.3GB/s ± 0%   43.3GB/s ± 0%   +58.77% (p=0.000 n=19+17)
    Memmove/4096-6              37.5GB/s ± 0%   50.5GB/s ± 0%   +34.56% (p=0.000 n=19+19)
    MemmoveUnalignedDst/1-6      135MB/s ± 0%    146MB/s ± 0%    +7.69% (p=0.000 n=16+14)
    MemmoveUnalignedDst/2-6      271MB/s ± 0%    292MB/s ± 0%    +7.69% (p=0.000 n=18+18)
    MemmoveUnalignedDst/3-6      438MB/s ± 0%    438MB/s ± 0%      ~ (p=0.352 n=16+19)
    MemmoveUnalignedDst/4-6      584MB/s ± 0%    584MB/s ± 0%      ~ (p=0.876 n=17+17)
    MemmoveUnalignedDst/5-6      631MB/s ± 1%    632MB/s ± 0%    +0.25% (p=0.000 n=20+17)
    MemmoveUnalignedDst/6-6      759MB/s ± 0%    759MB/s ± 0%    +0.00% (p=0.000 n=19+16)
    MemmoveUnalignedDst/7-6      885MB/s ± 0%    883MB/s ± 1%      ~ (p=0.647 n=18+20)
    MemmoveUnalignedDst/8-6     1.08GB/s ± 0%   1.08GB/s ± 0%    +0.00% (p=0.035 n=19+18)
    MemmoveUnalignedDst/9-6     1.22GB/s ± 0%   1.22GB/s ± 0%      ~ (p=0.251 n=18+17)
    MemmoveUnalignedDst/10-6    1.35GB/s ± 0%   1.35GB/s ± 0%      ~ (p=0.327 n=17+18)
    MemmoveUnalignedDst/11-6    1.49GB/s ± 0%   1.49GB/s ± 0%      ~ (p=0.531 n=18+19)
    MemmoveUnalignedDst/12-6    1.63GB/s ± 0%   1.63GB/s ± 0%      ~ (p=0.886 n=19+18)
    MemmoveUnalignedDst/13-6    1.76GB/s ± 0%   1.76GB/s ± 1%    -0.24% (p=0.006 n=18+20)
    MemmoveUnalignedDst/14-6    1.90GB/s ± 0%   1.90GB/s ± 0%      ~ (p=0.818 n=20+19)
    MemmoveUnalignedDst/15-6    2.03GB/s ± 0%   2.03GB/s ± 0%      ~ (p=0.294 n=17+16)
    MemmoveUnalignedDst/16-6    2.17GB/s ± 0%   2.17GB/s ± 0%      ~ (p=0.602 n=16+18)
    MemmoveUnalignedDst/32-6    4.05GB/s ± 0%   4.05GB/s ± 0%    +0.00% (p=0.010 n=18+17)
    MemmoveUnalignedDst/64-6    7.59GB/s ± 0%   7.59GB/s ± 0%    +0.00% (p=0.022 n=18+16)
    MemmoveUnalignedDst/128-6   11.1GB/s ± 0%   11.4GB/s ± 0%    +2.79% (p=0.000 n=18+17)
    MemmoveUnalignedDst/256-6   16.4GB/s ± 0%   16.7GB/s ± 0%    +1.59% (p=0.000 n=20+17)
    MemmoveUnalignedDst/512-6   15.7GB/s ± 0%   21.3GB/s ± 0%   +35.87% (p=0.000 n=18+20)
    MemmoveUnalignedDst/1024-6  16.0GB/s ±20%   31.5GB/s ± 0%   +96.93% (p=0.000 n=20+14)
    MemmoveUnalignedDst/2048-6  19.6GB/s ± 0%   42.1GB/s ± 0%  +115.16% (p=0.000 n=17+18)
    MemmoveUnalignedDst/4096-6  6.41GB/s ± 0%  33.18GB/s ± 0%  +417.56% (p=0.000 n=17+18)
    MemmoveUnalignedSrc/1-6      171MB/s ± 0%    166MB/s ± 0%    -3.33% (p=0.000 n=19+16)
    MemmoveUnalignedSrc/2-6      343MB/s ± 0%    342MB/s ± 1%    -0.41% (p=0.000 n=17+20)
    MemmoveUnalignedSrc/3-6      508MB/s ± 0%    493MB/s ± 1%    -2.90% (p=0.000 n=17+17)
    MemmoveUnalignedSrc/4-6      677MB/s ± 0%    660MB/s ± 2%    -2.55% (p=0.000 n=17+20)
    MemmoveUnalignedSrc/5-6      790MB/s ± 0%    790MB/s ± 0%      ~ (p=0.139 n=17+17)
    MemmoveUnalignedSrc/6-6      948MB/s ± 0%    946MB/s ± 1%      ~ (p=0.330 n=17+19)
    MemmoveUnalignedSrc/7-6     1.11GB/s ± 0%   1.11GB/s ± 0%    -0.05% (p=0.026 n=17+17)
    MemmoveUnalignedSrc/8-6     1.38GB/s ± 0%   1.38GB/s ± 0%      ~ (p=0.091 n=18+16)
    MemmoveUnalignedSrc/9-6     1.42GB/s ± 0%   1.40GB/s ± 1%    -1.04% (p=0.000 n=19+20)
    MemmoveUnalignedSrc/10-6    1.58GB/s ± 0%   1.56GB/s ± 1%    -1.15% (p=0.000 n=18+19)
    MemmoveUnalignedSrc/11-6    1.73GB/s ± 0%   1.71GB/s ± 1%    -1.30% (p=0.000 n=20+20)
    MemmoveUnalignedSrc/12-6    1.89GB/s ± 0%   1.87GB/s ± 1%    -1.18% (p=0.000 n=17+20)
    MemmoveUnalignedSrc/13-6    2.05GB/s ± 0%   2.02GB/s ± 1%    -1.18% (p=0.000 n=17+20)
    MemmoveUnalignedSrc/14-6    2.21GB/s ± 0%   2.18GB/s ± 1%    -1.14% (p=0.000 n=17+20)
    MemmoveUnalignedSrc/15-6    2.36GB/s ± 0%   2.34GB/s ± 1%    -1.04% (p=0.000 n=17+20)
    MemmoveUnalignedSrc/16-6    2.52GB/s ± 0%   2.49GB/s ± 1%    -1.26% (p=0.000 n=19+20)
    MemmoveUnalignedSrc/32-6    4.82GB/s ± 0%   4.61GB/s ± 0%    -4.40% (p=0.000 n=19+20)
    MemmoveUnalignedSrc/64-6    5.03GB/s ± 4%   7.97GB/s ± 0%   +58.55% (p=0.000 n=20+16)
    MemmoveUnalignedSrc/128-6   11.1GB/s ± 0%   11.2GB/s ± 0%    +0.52% (p=0.000 n=17+18)
    MemmoveUnalignedSrc/256-6   16.5GB/s ± 0%   16.4GB/s ± 0%    -0.10% (p=0.000 n=20+18)
    MemmoveUnalignedSrc/512-6   21.0GB/s ± 0%   22.1GB/s ± 0%    +5.48% (p=0.000 n=14+17)
    MemmoveUnalignedSrc/1024-6  24.9GB/s ± 0%   31.9GB/s ± 0%   +28.20% (p=0.000 n=19+20)
    MemmoveUnalignedSrc/2048-6  23.3GB/s ± 0%   33.8GB/s ± 0%   +45.22% (p=0.000 n=17+19)
    MemmoveUnalignedSrc/4096-6  37.3GB/s ± 0%   42.7GB/s ± 0%   +14.30% (p=0.000 n=17+17)

Change-Id: Id66aa3e499ccfb117cb99d623ef326b50d057b64
Reviewed-on: https://go-review.googlesource.com/29590
Run-TryBot: Denis Nagorny <denis.nagorny@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-06 10:21:58 +00:00
Ian Lance Taylor
e5421e21ef runtime: add threadprof tag for test that starts busy thread
The CgoExternalThreadSIGPROF test starts a thread at constructor time
that does a busy loop. That can throw off some other tests. So only
build that code if testprogcgo is built with the tag threadprof, and
adjust the tests that use that code to pass that build tag.

This revealed that the CgoPprofThread test was not testing what it
should have, as it never actually started the cpuHog thread. It was
passing because of the busy loop thread. Fix it to start the thread as
intended.

Change-Id: I087a9e4fc734a86be16a287456441afac5676beb
Reviewed-on: https://go-review.googlesource.com/30362
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-06 01:23:09 +00:00
Lynn Boger
3107c91e2d runtime: memclr perf improvements on ppc64x
This updates runtime/memclr_ppc64x.s to improve performance
by unrolling loops for larger clears.

Fixes #17348

benchmark                    old MB/s     new MB/s     speedup
BenchmarkMemclr/5-80         199.71       406.63       2.04x
BenchmarkMemclr/16-80        693.66       1817.41      2.62x
BenchmarkMemclr/64-80        2309.35      5793.34      2.51x
BenchmarkMemclr/256-80       5428.18      14765.81     2.72x
BenchmarkMemclr/4096-80      8611.65      27191.94     3.16x
BenchmarkMemclr/65536-80     8736.69      28604.23     3.27x
BenchmarkMemclr/1M-80        9304.94      27600.09     2.97x
BenchmarkMemclr/4M-80        8705.66      27589.64     3.17x
BenchmarkMemclr/8M-80        8575.74      23631.04     2.76x
BenchmarkMemclr/16M-80       8443.10      19240.68     2.28x
BenchmarkMemclr/64M-80       8390.40      9493.04      1.13x
BenchmarkGoMemclr/5-80       263.05       630.37       2.40x
BenchmarkGoMemclr/16-80      904.33       1148.49      1.27x
BenchmarkGoMemclr/64-80      2830.20      8756.70      3.09x
BenchmarkGoMemclr/256-80     6064.59      20299.46     3.35x

Change-Id: Ic76c9183c8b4129ba3df512ca8b0fe6bd424e088
Reviewed-on: https://go-review.googlesource.com/30373
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
2016-10-05 22:18:14 +00:00
Cherry Zhang
4c9a372946 runtime, cmd/internal/obj: get rid of rewindmorestack
In the function prologue, we emit a jump to the beginning of
the function immediately after calling morestack. And in the
runtime stack growing code, it decodes and emulates that jump.
This emulation was necessary before we had per-PC SP deltas,
since the traceback code assumed that the frame size was fixed
for the whole function, except on the first instruction where
it was 0. Since we now have per-PC SP deltas and PCDATA, we
can correctly record that the frame size is 0. This makes the
emulation unnecessary.

This may be helpful for a registerized calling convention, where
there may be unspills of arguments after calling morestack. It
also simplifies the runtime.

Change-Id: I7ebee31eaee81795445b33f521ab6a79624c4ceb
Reviewed-on: https://go-review.googlesource.com/30138
Reviewed-by: David Chase <drchase@google.com>
2016-10-05 18:19:46 +00:00
Michael Munday
f15f1ff46f runtime/testdata/testprogcgo: add explicit return value to signalThread
Should fix the clang builder.

Change-Id: I3ee34581b6a7ec902420de72a8a08a2426997782
Reviewed-on: https://go-review.googlesource.com/30363
Run-TryBot: Michael Munday <munday@ca.ibm.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-05 15:36:00 +00:00
Ian Lance Taylor
6c13a1db2e runtime: don't call cgocallback from signal handler
Calling cgocallback from a signal handler can fail when using the race
detector. Calling cgocallback will lead to a call to newextram which
will call oneNewExtraM which will call racegostart. The racegostart
function will set up some race detector data structures, and doing that
will sometimes call the C memory allocator. If we are running the signal
handler from a signal that interrupted the C memory allocator, we will
crash or hang.

Instead, change the signal handler code to call needm and dropm. The
needm function will grab allocated m and g structures and initialize the
g to use the current stack--the signal stack. That is all we need to
safely call code that allocates memory and checks whether it needs to
split the stack. This may temporarily leave us with no m available to
run a cgo callback, but that is OK in this case since the code we call
will quickly either crash or call dropm to return the m.

Implementing this required changing some of the setSignalstackSP
functions to avoid a write barrier. These functions never need a write
barrier but in some cases generated one anyhow because on some systems
the ss_sp field is a pointer.

Change-Id: I3893f47c3a66278f85eab7f94c1ab11d4f3be133
Reviewed-on: https://go-review.googlesource.com/30218
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-05 13:21:49 +00:00
Ian Lance Taylor
7faf702396 runtime: avoid endless loop if printing the panic value panics
Change-Id: I56de359a5ccdc0a10925cd372fa86534353c6ca0
Reviewed-on: https://go-review.googlesource.com/30358
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-05 13:13:27 +00:00
Carl Mastrangelo
c1e267cc73 runtime: make append only clear uncopied memory
Also add a benchmark that shows off the new behavior.  The
existing benchmarks reuse the same slice, and thus don't ever have
to clear memory.  Running the Append|Grow benchmarks in runtime:

name                              old time/op  new time/op  delta
AppendSliceLarge/1024Bytes-12      265ns ± 1%   265ns ± 3%     ~     (p=0.524 n=17+20)
AppendSliceLarge/4096Bytes-12      807ns ± 3%   772ns ± 1%   -4.38%  (p=0.000 n=20+20)
AppendSliceLarge/16384Bytes-12    3.20µs ± 4%  2.82µs ± 4%  -11.93%  (p=0.000 n=19+20)
AppendSliceLarge/65536Bytes-12    13.0µs ± 4%  11.0µs ± 3%  -15.22%  (p=0.000 n=20+20)
AppendSliceLarge/262144Bytes-12   62.7µs ± 1%  51.6µs ± 1%  -17.67%  (p=0.000 n=19+20)
AppendSliceLarge/1048576Bytes-12   337µs ± 3%   289µs ± 3%  -14.36%  (p=0.000 n=20+20)
GrowSliceBytes-12                 31.2ns ± 4%  31.4ns ±11%     ~     (p=0.308 n=19+18)
GrowSliceInts-12                  53.4ns ±14%  45.0ns ± 6%  -15.74%  (p=0.000 n=20+19)
GrowSlicePtr-12                   87.0ns ± 3%  83.3ns ± 3%   -4.26%  (p=0.000 n=18+17)
GrowSliceStruct24Bytes-12         88.9ns ± 5%  77.8ns ± 2%  -12.45%  (p=0.000 n=20+19)
Append-12                         17.2ns ± 1%  17.3ns ± 2%     ~     (p=0.464 n=18+17)
AppendGrowByte-12                 2.28ms ± 1%  1.92ms ± 2%  -15.65%  (p=0.000 n=20+18)
AppendGrowString-12                255ms ± 3%   253ms ± 4%     ~     (p=0.065 n=19+19)
AppendSlice/1Bytes-12             3.13ns ± 0%  3.11ns ± 1%   -0.65%  (p=0.000 n=17+18)
AppendSlice/4Bytes-12             3.02ns ± 2%  3.11ns ± 1%   +3.27%  (p=0.000 n=18+17)
AppendSlice/7Bytes-12             4.14ns ± 3%  4.13ns ± 2%     ~     (p=0.380 n=19+18)
AppendSlice/8Bytes-12             3.74ns ± 3%  3.68ns ± 1%   -1.76%  (p=0.000 n=19+18)
AppendSlice/15Bytes-12            4.03ns ± 2%  4.04ns ± 2%     ~     (p=0.261 n=19+20)
AppendSlice/16Bytes-12            4.03ns ± 2%  4.03ns ± 0%     ~     (p=0.062 n=18+17)
AppendSlice/32Bytes-12            3.23ns ± 4%  3.43ns ± 1%   +6.10%  (p=0.000 n=17+18)
AppendStr/1Bytes-12               3.51ns ± 1%  3.52ns ± 1%     ~     (p=0.321 n=18+19)
AppendStr/4Bytes-12               3.46ns ± 1%  3.46ns ± 1%     ~     (p=0.977 n=18+20)
AppendStr/8Bytes-12               3.18ns ± 1%  3.19ns ± 1%     ~     (p=0.650 n=16+17)
AppendStr/16Bytes-12              6.08ns ±27%  5.52ns ± 3%   -9.16%  (p=0.002 n=18+19)
AppendStr/32Bytes-12              3.71ns ± 1%  3.53ns ± 1%   -4.73%  (p=0.000 n=20+19)
AppendSpecialCase-12              17.7ns ± 1%  17.8ns ± 3%   +0.86%  (p=0.045 n=17+18)
AppendInPlace/NoGrow/Byte-12       375ns ± 1%   376ns ± 1%   +0.35%  (p=0.021 n=20+18)
AppendInPlace/NoGrow/1Ptr-12      1.01µs ± 1%  1.10µs ± 1%   +9.28%  (p=0.000 n=18+20)
AppendInPlace/NoGrow/2Ptr-12      1.85µs ± 2%  1.71µs ± 1%   -7.51%  (p=0.000 n=19+18)
AppendInPlace/NoGrow/3Ptr-12      2.57µs ± 2%  2.44µs ± 1%   -5.08%  (p=0.000 n=19+19)
AppendInPlace/NoGrow/4Ptr-12      3.52µs ± 2%  3.35µs ± 2%   -4.70%  (p=0.000 n=20+19)
AppendInPlace/Grow/Byte-12         212ns ± 1%   217ns ± 8%   +2.57%  (p=0.000 n=20+20)
AppendInPlace/Grow/1Ptr-12         214ns ± 2%   217ns ± 3%   +1.23%  (p=0.001 n=18+19)
AppendInPlace/Grow/2Ptr-12         298ns ± 2%   300ns ± 2%   +0.55%  (p=0.038 n=19+20)
AppendInPlace/Grow/3Ptr-12         367ns ± 2%   366ns ± 2%     ~     (p=0.452 n=20+18)
AppendInPlace/Grow/4Ptr-12         416ns ± 2%   411ns ± 2%   -1.18%  (p=0.000 n=20+19)
StackGrowth-12                    43.4ns ± 1%  43.4ns ± 0%     ~     (p=1.000 n=16+16)
StackGrowthDeep-12                11.4µs ± 4%  10.3µs ± 4%   -9.65%  (p=0.000 n=20+19)
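
The idea in pseudocode (simplified; the real logic is in growslice
and the helpers are the runtime's internal ones):

    p := mallocgc(newcap*size, et, false) // needzero=false: not pre-cleared
    memmove(p, old.ptr, old.len*size)     // copied region needs no clearing
    memclr(add(p, old.len*size), (newcap-old.len)*size) // clear only the tail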

Change-Id: I69a8afbd942c787c591d95b9d9439bd6db4d1e49
Reviewed-on: https://go-review.googlesource.com/30192
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-04 22:40:20 +00:00
Nick Craig-Wood
6b795e77df runtime: correct function name in throw message
Change-Id: I8fd271066925734c3f7196f64db04f27c4ce27cb
Reviewed-on: https://go-review.googlesource.com/30274
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-04 13:48:07 +00:00
Brad Fitzpatrick
ad26bb5e30 all: use sort.Slice where applicable
I avoided anywhere in the compiler or things which might be used by
the compiler in the future, since they need to build with Go 1.4.

I also avoided anywhere where there was no benefit to changing it.

I probably missed some.

Updates #16721

Change-Id: Ib3c895ff475c6dec2d4322393faaf8cb6a6d4956
Reviewed-on: https://go-review.googlesource.com/30250
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
2016-10-04 05:10:56 +00:00
Austin Clements
7aab88a31e runtime: fix missing space in error message
Change-Id: I422708d50c3c727246e7991039877660ca034dc9
Reviewed-on: https://go-review.googlesource.com/30144
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-03 22:00:13 +00:00
Austin Clements
bf776a988b runtime: document bmap.tophash
In particular, it wasn't obvious that some values are special (unless
you also found those special values), so document that it isn't
necessarily a hash value.

Change-Id: Iff292822b44408239e26cd882dc07be6df2c1d38
Reviewed-on: https://go-review.googlesource.com/30143
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-03 22:00:06 +00:00
Austin Clements
38f1df66ff runtime: make gcDumpObject useful on stack frames
gcDumpObject is often used on a stack pointer (for example, when
checkmark finds an unmarked object on the stack), but since stack
spans don't have an elemsize, it doesn't print any of the memory from
the frame. Make it at least slightly more useful by printing
everything between obj and obj+off (inclusive). While we're here, also
print out the span state.

Change-Id: I51be064ea8791b4a365865bfdc7afa7b5aaecfbd
Reviewed-on: https://go-review.googlesource.com/30142
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-03 21:59:54 +00:00
Austin Clements
6879dbde4e runtime: introduce a type for span states
Currently span states are untyped constants and the field is just a
uint8. Make this more type-safe by introducing a type for the span
state.

Change-Id: I369bf59fe6e8234475f4921611424fceb7d0a6de
Reviewed-on: https://go-review.googlesource.com/30141
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-03 21:59:45 +00:00
Austin Clements
99339dd445 runtime: weaken claim about SetFinalizer panicking
Currently the SetFinalizer documentation makes a strong claim that
SetFinalizer will panic if the pointer is not to an object allocated
by calling new, to a composite literal, or to a local variable. This
is not true. For example, it doesn't panic when passed the address of
a package-level variable. Nor can we practically make it true. For
example, we can't distinguish between passing a pointer to a composite
literal and passing a pointer to its first field.
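
For example (illustrative), this runs without panicking even though
the old wording says it must panic:

    var global int

    func main() {
        runtime.SetFinalizer(&global, func(*int) {}) // no panic
    }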

Hence, weaken the guarantee to say that it "may" panic.

Updates #17311. (Might fix it, depending on what we want to do with
package-level variables.)

Change-Id: I1c68ea9d0a5bbd3dd1b7ce329d92b0f05e2e0877
Reviewed-on: https://go-review.googlesource.com/30137
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-03 16:12:48 +00:00
Mike Appleby
360f2e43b7 runtime: sleep on CLOCK_MONOTONIC in futexsleep1 on freebsd
In FreeBSD 10.0, the _umtx_op syscall was changed to allow sleeping on
any supported clock, but the default clock was switched from a monotonic
clock to CLOCK_REALTIME.

Prior to 10.0, the __umtx_op_wait* functions ignored the fourth argument
to _umtx_op (uaddr1), expected the fifth argument (uaddr2) to be a
struct timespec pointer, and used a monotonic clock (nanouptime(9)) for
timeout calculations.

Since 10.0, if callers want a clock other than CLOCK_REALTIME, they must
call _umtx_op with uaddr1 set to a value greater than sizeof(struct
timespec), and with uaddr2 as pointer to a struct _umtx_time, rather
than a timespec. Callers can set the _clockid field of the struct
_umtx_time to request the clock they want.
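
The Go-side shape of the new call (a sketch; the struct mirrors
FreeBSD's struct _umtx_time and the timespec here is a placeholder):

    type timespec struct{ sec, nsec int64 }

    type umtxTime struct {
        timeout timespec
        flags   uint32
        clockid uint32 // e.g. CLOCK_MONOTONIC for the old behavior
    }

    // _umtx_op(addr, _UMTX_OP_WAIT_UINT_PRIVATE, val,
    //          uaddr1 = unsafe.Sizeof(ut), uaddr2 = &ut)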

The relevant FreeBSD commit:
    https://svnweb.freebsd.org/base?view=revision&revision=232144

Fixes #17168

Change-Id: I3dd7b32b683622b8d7b4a6a8f9eb56401bed6bdf
Reviewed-on: https://go-review.googlesource.com/30154
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-01 01:25:21 +00:00
David Crawshaw
441502154f runtime: remove defer from standard cgo call
The endcgo function call is currently deferred in case a cgo
callback into Go panics and unwinds through cgocall. Typical cgo
calls do not have callbacks into Go, and even fewer panic, so we
pay the cost of this defer for no typical benefit.

Amazingly, there is another defer on the cgocallback path also used
to cleanup in case the Go code called by cgo panics. This CL folds
the first defer into the second, to reduce the cost of typical cgo
calls.

This reduces the overhead for a no-op cgo call significantly:

	name       old time/op  new time/op  delta
	CgoNoop-8  93.5ns ± 0%  51.1ns ± 1%  -45.34%  (p=0.016 n=4+5)

The total effect between Go 1.7 and 1.8 is even greater, as CL 29656
reduced the cost of defer recently. Hopefully a future Go release
will drop the cost of defer to nothing, making this optimization
unnecessary. But until then, this is nice.

Change-Id: Id1a5648f687a87001d95bec6842e4054bd20ee4f
Reviewed-on: https://go-review.googlesource.com/30080
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-30 13:31:07 +00:00
Matthew Dempsky
9828b7c468 runtime, syscall: use FP instead of SP for parameters
Consistently access function parameters using the FP pseudo-register
instead of SP (e.g., x+0(FP) instead of x+4(SP) or x+8(SP), depending
on register size). Two reasons: 1) doc/asm says the SP pseudo-register
should use negative offsets in the range [-framesize, 0), and 2)
cmd/vet only validates parameter offsets when indexed from the FP
pseudo-register.

No binary changes to the compiled object files for any of the affected
package/OS/arch combinations.

Change-Id: I0efc6079bc7519fcea588c114ec6a39b245d68b0
Reviewed-on: https://go-review.googlesource.com/30085
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-30 05:40:43 +00:00
Mikio Hara
a0d330f1dd runtime, runtime/cgo: revert CL 18814; don't drop signal stack in new thread on dragonfly
This change reverts CL 18814, which is a workaround for older DragonFly
BSD kernels, and fixes #13945 and #13947 in a more general way, the
same as on other platforms except NetBSD.

This is a followup to CL 29491.

Updates #16329.

Change-Id: I771670bc672c827f2b3dbc7fd7417c49897cb991
Reviewed-on: https://go-review.googlesource.com/29971
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-28 14:45:06 +00:00
Ian Lance Taylor
eb268cb321 runtime: minor simplifications to signal code
Change setsig, setsigstack, getsig, raise, raiseproc to take uint32 for the
signal number parameter, as that is the type mostly used for signal
numbers.  Same for dieFromSignal, sigInstallGoHandler, raisebadsignal.

Remove setsig restart parameter, as it is always either true or
irrelevant.

Don't check the handler in setsigstack, as the only caller does that
anyhow.

Don't bother to convert the handler from sigtramp to sighandler in
getsig, as it will never be called when the handler is sigtramp or
sighandler.

Don't check the return value from rt_sigaction in the GNU/Linux version
of setsigstack; no other setsigstack checks it, and it never fails.

Change-Id: I6bbd677e048a77eddf974dd3d017bc3c560fbd48
Reviewed-on: https://go-review.googlesource.com/29953
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-28 13:12:47 +00:00
Alex Brainman
db82cf4e50 runtime: use RtlGenRandom instead of CryptGenRandom
This change replaces the use of CryptGenRandom with RtlGenRandom in
Windows to generate cryptographically random numbers during process
startup. RtlGenRandom uses the same RNG as CryptGenRandom, but it has many
fewer DLL dependencies and so does not affect process startup time as
much.
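
RtlGenRandom is exported from advapi32.dll under the name
SystemFunction036; an illustrative (non-runtime) way to call it:

    rtlGenRandom := syscall.NewLazyDLL("advapi32.dll").NewProc("SystemFunction036")
    buf := make([]byte, 64)
    r, _, _ := rtlGenRandom.Call(
        uintptr(unsafe.Pointer(&buf[0])),
        uintptr(len(buf)),
    )
    if r == 0 {
        panic("RtlGenRandom failed")
    }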

This makes running a simple Go program on my computers faster.

Windows XP:
benchmark                      old ns/op     new ns/op     delta
BenchmarkRunningGoProgram-2     47408573      10784148      -77.25%

Windows 7 (VM):
benchmark                    old ns/op     new ns/op     delta
BenchmarkRunningGoProgram     16260390      12792150      -21.33%

Windows 7:
benchmark                      old ns/op     new ns/op     delta
BenchmarkRunningGoProgram-2     13600778      10050574      -26.10%

Fixes #15589

Change-Id: I2816239a2056e3d4a6dcd86a6fa2bb619c6008fe
Reviewed-on: https://go-review.googlesource.com/29700
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-28 05:21:53 +00:00
Ian Lance Taylor
fdc167164e runtime: remove sigmask type, use sigset instead
The OS-independent sigmask type was not pulling its weight. Replace it
with the OS-dependent sigset type. This requires adding an OS-specific
sigaddset function, but permits removing the OS-specific sigmaskToSigset
function.

Change-Id: I43307b512b0264ec291baadaea902f05ce212305
Reviewed-on: https://go-review.googlesource.com/29950
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-27 21:33:44 +00:00
Ian Lance Taylor
097a581dc0 runtime: simplify signalstack by dropping nil as argument
Change the two calls to signalstack(nil) to inline the code
instead (it's two lines).

Change-Id: Ie92a05494f924f279e40ac159f1b677fda18f281
Reviewed-on: https://go-review.googlesource.com/29854
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-27 17:08:29 +00:00
Elias Naur
60482d8a8b runtime: relax SetFinalizer documentation to allow &local
The SetFinalizer documentation states that

"The argument obj must be a pointer to an object allocated by calling
new or by taking the address of a composite literal."

which precludes pointers to local variables. According to a comment
on #6591, this case is expected to work. This CL updates the documentation
for SetFinalizer accordingly.

Fixes #6591

Change-Id: Id861b3436bc1c9521361ea2d51c1ce74a121c1af
Reviewed-on: https://go-review.googlesource.com/29592
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-09-27 16:38:47 +00:00
Michael Munday
3436f0776f reflect, runtime: optimize Value.Call on s390x and add benchmark
Use an MVC loop to copy arguments in runtime.call* rather than copying
bytes individually.

I've added the benchmark CallArgCopy to test the speed of Value.Call
for various argument sizes.
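
The benchmark has roughly this shape (a sketch, not the CL's exact
code):

    func BenchmarkCallArgCopy(b *testing.B) {
        f := reflect.ValueOf(func(a [1024]byte) {})
        args := []reflect.Value{reflect.ValueOf([1024]byte{})}
        b.SetBytes(1024)
        for i := 0; i < b.N; i++ {
            f.Call(args)
        }
    }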

name                    old speed      new speed       delta
CallArgCopy/size=128     439MB/s ± 1%    582MB/s ± 1%   +32.41%  (p=0.000 n=10+10)
CallArgCopy/size=256     695MB/s ± 1%   1172MB/s ± 1%   +68.67%  (p=0.000 n=10+10)
CallArgCopy/size=1024    573MB/s ± 8%   4175MB/s ± 2%  +628.11%  (p=0.000 n=10+10)
CallArgCopy/size=4096   1.46GB/s ± 2%  10.19GB/s ± 1%  +600.52%  (p=0.000 n=10+10)
CallArgCopy/size=65536  1.51GB/s ± 0%  12.30GB/s ± 1%  +716.30%   (p=0.000 n=9+10)

Change-Id: I87dae4809330e7964f6cb4a9e40e5b3254dd519d
Reviewed-on: https://go-review.googlesource.com/28096
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bill O'Farrell <billotosyr@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-27 16:00:19 +00:00
Cherry Zhang
9d4b40f55d runtime, cmd/compile: implement and use DUFFCOPY on ARM64
Change-Id: I8984eac30e5df78d4b94f19412135d3cc36969f8
Reviewed-on: https://go-review.googlesource.com/29910
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-27 15:07:31 +00:00
Austin Clements
f8b2314c56 runtime: optimize defer code
This optimizes deferproc and deferreturn in various ways.

The most important optimization is that it more carefully arranges to
prevent preemption or stack growth. Currently we do this by switching
to the system stack on every deferproc and every deferreturn. While we
need to be on the system stack for the slow path of allocating and
freeing defers, in the common case we can fit in the nosplit stack.
Hence, this change pushes the system stack switch down into the slow
paths and makes everything now exposed to the user stack nosplit. This
also eliminates the need for various acquirem/releasem pairs, since we
are now preventing preemption by preventing stack split checks.

As another smaller optimization, we special case the common cases of
zero-sized and pointer-sized defer frames to respectively skip the
copy and perform the copy in line instead of calling memmove.
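
A sketch of the copy specialization (simplified from the description
above; names follow the runtime's but the exact code may differ):

    switch siz {
    case 0:
        // Nothing to copy.
    case sys.PtrSize:
        *(*uintptr)(deferArgs(d)) = *(*uintptr)(unsafe.Pointer(argp))
    default:
        memmove(deferArgs(d), unsafe.Pointer(argp), uintptr(siz))
    }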

This speeds up the runtime defer benchmark by 42%:

name           old time/op  new time/op  delta
Defer-4        75.1ns ± 1%  43.3ns ± 1%  -42.31%   (p=0.000 n=8+10)

In reality, this speeds up defer by about 2.2X. The two benchmarks
below compare a Lock/defer Unlock pair (DeferLock) with a Lock/Unlock
pair (NoDeferLock). NoDeferLock establishes a baseline cost, so these
two benchmarks together show that this change reduces the overhead of
defer from 61.4ns to 27.9ns.

name           old time/op  new time/op  delta
DeferLock-4    77.4ns ± 1%  43.9ns ± 1%  -43.31%  (p=0.000 n=10+10)
NoDeferLock-4  16.0ns ± 0%  15.9ns ± 0%   -0.39%    (p=0.000 n=9+8)

This also shaves 34ns off cgo calls:

name       old time/op  new time/op  delta
CgoNoop-4   122ns ± 1%  88.3ns ± 1%  -27.72%  (p=0.000 n=8+9)

Updates #14939, #16051.

Change-Id: I2baa0dea378b7e4efebbee8fca919a97d5e15f38
Reviewed-on: https://go-review.googlesource.com/29656
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-26 22:01:35 +00:00
Austin Clements
d211c2d377 runtime: implement getcallersp in Go
This makes it possible to inline getcallersp. getcallersp is on the
hot path of defers, so this slightly speeds up defer:

name           old time/op  new time/op  delta
Defer-4        78.3ns ± 2%  75.1ns ± 1%  -4.00%   (p=0.000 n=9+8)
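
The Go version is tiny (a sketch consistent with the change; the
exact form is an assumption):

    //go:nosplit
    func getcallersp(argp unsafe.Pointer) uintptr {
        return uintptr(argp) - sys.MinFrameSize
    }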

Updates #14939.

Change-Id: Icc1cc4cd2f0a81fc4c8344432d0b2e783accacdd
Reviewed-on: https://go-review.googlesource.com/29655
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-26 22:01:32 +00:00
Austin Clements
aaf4099a5c runtime: update malloc.go documentation
The big documentation comment at the top of malloc.go has gotten
woefully out of date. Update it.

Change-Id: Ibdb1bdcfdd707a6dc9db79d0633a36a28882301b
Reviewed-on: https://go-review.googlesource.com/29731
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-26 22:00:53 +00:00
Austin Clements
f67c9de656 runtime: document MemStats
This documents all fields in MemStats and more clearly documents where
mstats differs from MemStats.

Fixes #15849.

Change-Id: Ie09374bcdb3a5fdd2d25fe4bba836aaae92cb1dd
Reviewed-on: https://go-review.googlesource.com/28972
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2016-09-26 22:00:50 +00:00
Austin Clements
2098e5d39a runtime: eliminate memstats.heap_reachable
We used to compute an estimate of the reachable heap size that was
different from the marked heap size. This ultimately caused more
problems than it solved, so we pulled it out, but memstats still has
both heap_reachable and heap_marked, and there are some leftover TODOs
about the problems with this estimate.

Clean this up by eliminating heap_reachable in favor of heap_marked
and deleting the stale TODOs.

Change-Id: I713bc20a7c90683d2b43ff63c0b21a440269cc4d
Reviewed-on: https://go-review.googlesource.com/29271
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-26 22:00:47 +00:00
Austin Clements
ec9c84c884 runtime: disentangle next_gc from GC trigger
Back in Go 1.4, memstats.next_gc was both the heap size at which GC
would trigger, and the size GC kept the heap under. When we switched
to concurrent GC in Go 1.5, we got somewhat confused and made this
variable the trigger heap size, while gcController.heapGoal became the
goal heap size.

memstats.next_gc is exposed to the user via MemStats.NextGC, while
gcController.heapGoal is not. This is unfortunate because 1) the heap
goal is far more useful for diagnostics, and 2) the trigger heap size
is just part of the GC trigger heuristic, which means it wouldn't be
useful to an application even if it tried to use it.

We never noticed this mess because MemStats.NextGC is practically
undocumented. Now that we're trying to document MemStats, it became
clear that this field had diverged from its original usefulness.

Clean up this mess by shuffling things back around so that next_gc is
the goal heap size and the new (unexposed) memstats.gc_trigger field
is the trigger heap size. This eliminates gcController.heapGoal.

Updates #15849.

Change-Id: I2cbbd43b1d78bdf613cb43f53488bd63913189b7
Reviewed-on: https://go-review.googlesource.com/29270
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-26 22:00:44 +00:00
Ian Lance Taylor
e2e11f02a4 runtime: unify Unix implementations of unminit
Change-Id: I2cbb13eb85876ad05a52cbd498a9b86e7a28899c
Reviewed-on: https://go-review.googlesource.com/29772
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-26 19:33:26 +00:00
Ian Lance Taylor
ac24388e5e runtime: merge setting new signal mask in minit
All the variants that set the new signal mask in minit do the same
thing, so merge them. This requires an OS-specific sigdelset function;
the function already exists for linux, and is now added for the other OSes.
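
A sketch of the helper's shape, following the existing Linux version
(with sigset as an array of 32-bit words):

    func sigdelset(mask *sigset, i int) {
        (*mask)[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
    }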

Change-Id: Ie96f6f02e2cf09c43005085985a078bd9581f670
Reviewed-on: https://go-review.googlesource.com/29771
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-26 18:29:32 +00:00
Ian Lance Taylor
c2735039f3 runtime: unify sigtrampgo
Combine the various versions of sigtrampgo into a single function in
signal_unix.go. This requires defining a fixsigcode method on sigctxt
for all operating systems; it only does something on Darwin. This also
requires changing the darwin/amd64 signal handler to call sigreturn
itself, rather than relying on sigtrampgo to call sigreturn for it. We
can then drop the Darwin sigreturn function, as it is no longer used.

Change-Id: I5a0b9d2d2c141957e151b41e694efeb20e4b4b9a
Reviewed-on: https://go-review.googlesource.com/29761
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-26 17:22:42 +00:00
Ian Lance Taylor
d15295c679 runtime: unify handling of alternate signal stack
Change all Unix systems to use stackt for the alternate signal
stack (some were using sigaltstackt). Add OS-specific setSignalstackSP
function to handle different types for ss_sp field, and unify all
OS-specific signalstack functions into one. Unify handling of alternate
signal stack in OS-specific minit and sigtrampgo functions via new
functions minitSignalstack and setGsignalStack.

Change-Id: Idc316dc69b1dd725717acdf61a1cd8b9f33ed174
Reviewed-on: https://go-review.googlesource.com/29757
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-26 04:07:31 +00:00
Ian Lance Taylor
f05cd4cde5 runtime: simplify conditions testing g.paniconfault
Implement a suggestion from Ralph Corderoy's comment on CL 29754.

Change-Id: I22bbede211ddcb8a057f16b4f47d335a156cc8d2
Reviewed-on: https://go-review.googlesource.com/29756
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-25 20:29:50 +00:00
Ian Lance Taylor
343bec53c7 runtime: merge sigpanic_unix.go into signal_unix.go
Change-Id: Iba541045b4878405834c637095627631b6559a35
Reviewed-on: https://go-review.googlesource.com/29754
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-25 19:27:05 +00:00
Dmitry Vyukov
38765eba73 runtime/race: don't crash on invalid PCs
Currently raceSymbolizeCode uses funcline, which is an internal runtime
function that crashes on incorrect PCs. Use FileLine instead; it is
public and does not crash on invalid data.

Note: FileLine returns "?" as the file on failure. That string is not
NUL-terminated, so we additionally need to check what FileLine returns.
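
For illustration, a self-contained use of the public API together with
the extra check (a hypothetical example, not the CL's code):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        pc, _, _, _ := runtime.Caller(0)
        f := runtime.FuncForPC(pc)
        file, line := f.FileLine(pc)
        // FileLine reports the file as "?" when it cannot resolve
        // the PC, so callers must handle that case explicitly.
        if file == "?" {
            fmt.Println("unknown location")
            return
        }
        fmt.Printf("%s:%d\n", file, line)
    }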

Fixes #17190

Change-Id: Ic6fbd4f0e68ddd52e9b2dd25e625b50adcb69a98
Reviewed-on: https://go-review.googlesource.com/29714
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-25 12:22:04 +00:00
Dmitry Vyukov
c14050646f runtime: fix newextram PC passed to race detector
The PC passed to racegostart is expected to be the return PC
of the go statement. The race runtime subtracts 1 from the PC
before symbolization, so passing the start PC of a function is wrong.
Add sys.PCQuantum to the function start PC.
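
A sketch of the resulting call (following the description above; the
exact call site may differ):

    gp.racectx = racegostart(funcPC(newextram) + sys.PCQuantum)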

Update #17190

Change-Id: Ia504c49e79af84ed4ea360c2aea472b370ea8bf5
Reviewed-on: https://go-review.googlesource.com/29712
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-25 12:15:40 +00:00
Ian Lance Taylor
159a90b93a runtime: merge Unix sighandler functions
Replace all the Unix sighandler functions with a single instance.
Push the relatively small amount of processor-specific code into five
methods on sigctxt: sigpc, sigsp, siglr, fault, preparePanic.
(Some processors already had a fault method.)
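
The shared handler is then written against this per-arch method set
(an interface-style sketch; the bodies are OS/arch-specific):

    sigpc() uintptr
    sigsp() uintptr
    siglr() uintptr
    fault() uintptr
    preparePanic(sig uint32, gp *g)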

Change-Id: Ib459412ff8f7e0f5ad06bfd43eb827c8b196fc32
Reviewed-on: https://go-review.googlesource.com/29752
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-09-25 03:55:33 +00:00
Keith Randall
60074b0fd3 runtime: remove TestCollisions from -short
It takes a bit too long to run all the time.

Fixes #17217
Update #17104

Change-Id: I4802190ea16ee0f436a7f95b093ea0f995f5b11d
Reviewed-on: https://go-review.googlesource.com/29751
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-24 03:10:13 +00:00
Ian Lance Taylor
ab552aa3b6 runtime: unify some signal handling functions
Unify the OS-specific versions of msigsave, msigrestore, sigblock,
updatesigmask, and unblocksig into single versions in signal_unix.go.
To do this, make sigprocmask work the same way on all systems, which
required adding a definition of sigprocmask for linux and openbsd.
Also add a single OS-specific function sigmaskToSigset.

Change-Id: I7cbf75131dddb57eeefe648ef845b0791404f785
Reviewed-on: https://go-review.googlesource.com/29689
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-09-24 01:39:48 +00:00
David Crawshaw
ed915ad421 runtime: use sched_yield instead of pthread_yield
Attempt to fix the linux-amd64-clang builder, which broke
with CL 29472.

It turns out pthread_yield is a non-portable Linux function, and
requires #define _GNU_SOURCE before #include <pthread.h>.
GCC doesn't complain about this, but Clang does:

	./raceprof.go:44:3: warning: implicit declaration of function 'pthread_yield' is invalid in C99 [-Wimplicit-function-declaration]

(Though the error, while explicable, certainly could be clearer.)

There is a portable POSIX equivalent, sched_yield, so this
CL uses it instead.

Change-Id: I58ca7a3f73a2b3697712fdb02e72a8027c391169
Reviewed-on: https://go-review.googlesource.com/29675
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-23 04:32:38 +00:00
David Crawshaw
b4c9829c22 runtime: check plugin-loaded moduledata addresses
Inspired by difficulties with plugin support on darwin.

Change-Id: I2cef8410837946454e75d00e94e46791f03f2267
Reviewed-on: https://go-review.googlesource.com/29391
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-23 02:16:17 +00:00
Ian Lance Taylor
9d8522fdc7 cmd/compile: don't instrument copy and append in runtime
Instrumenting copy and append for the race detector changes them to call
different functions. In the runtime package the alternate functions are
not marked as nosplit. This caused a crash in the SIGPROF handler when
invoked on a non-Go thread in a program built with the race detector. In
some cases the handler can call copy; the race detector changed that to
a call to a non-nosplit function, which tried to check the stack
guard and crashed because it was running on a non-Go thread. The
SIGPROF handler is written carefully to avoid such problems, but hidden
function calls are difficult to avoid.

Fix this by changing the compiler to not instrument copy and append when
compiling the runtime package. Change the runtime package to add
explicit race checks for the only code I could find where copy is used
to write to user data (append is never used).

Change-Id: I11078a66c0aaa459a7d2b827b49f4147922050af
Reviewed-on: https://go-review.googlesource.com/29472
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-09-22 23:37:17 +00:00
Ian Lance Taylor
9e028b70ea runtime: merge signal[12]_unix.go into signal_unix.go
Requires adding a sigfwd function for Solaris, as previously
signal2_unix.go was not built for Solaris.

Change-Id: Iea3ff0ddfa15af573813eb075bead532b324a3fc
Reviewed-on: https://go-review.googlesource.com/29550
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-21 23:04:34 +00:00
Mikio Hara
5db80c30e6 runtime: revert CL 18835; don't install new signal stack unconditionally on dragonfly
This change reverts CL 18835, which was a workaround for older DragonFly
BSD kernels, and fixes #14051, #14052 and #14067 in a more general way,
the same as on other platforms except NetBSD.

This change also bumps the minimum required version of DragonFly BSD
kernel to 4.4.4.

Fixes #16329.

Change-Id: I0b44b6afa675f5ed9523914226bd9ec4809ba5ae
Reviewed-on: https://go-review.googlesource.com/29491
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-21 22:18:06 +00:00
Lynn Boger
b4efd09d18 cmd/link: split large elf text sections on ppc64x
Some applications built with Go on ppc64x with external linking
can fail to link with relocation truncation errors if the generated
ELF text section is larger than 2^26 bytes and that section contains
a call instruction (bl) to a function beyond the range addressable
by the instruction's 24-bit field.

This solution consists of generating multiple text sections where
each is small enough to allow the GNU linker to resolve the calls
by generating long branch code where needed.  Other changes were added
to handle differences in processing when multiple text sections exist.

Some adjustments were required to the computation of a method's address
when using the method offset table when there are multiple text sections.

The number of possible section headers was increased to allow for up
to 128 text sections.  A test case was also added.

Fixes #15823.

Change-Id: If8117b0e0afb058cbc072258425a35aef2363c92
Reviewed-on: https://go-review.googlesource.com/27790
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-21 20:23:49 +00:00
Austin Clements
c03925edd3 runtime: remove unnecessary atomics from heapBitSetType
These used to be necessary when racing with updates to the mark bit,
but since the mark bit is no longer in the bitmap and the checkmark is
only updated with the world stopped, we can now always use regular
writes to update the type information in the heap bitmap.

Somewhat surprisingly, this has basically no overall performance
effect beyond the usual noise, but it does clean up the code.

Change-Id: I3933d0b4c0bc1c9bcf6313613515c0b496212105
Reviewed-on: https://go-review.googlesource.com/29277
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-21 15:08:16 +00:00
Austin Clements
ab59235729 runtime: consistency check for G rescan position
Issue #17099 shows a failure that indicates we rescanned a stack twice
concurrently during mark termination, which suggests that the rescan
list became inconsistent. Add a simple check when we dequeue something
from the rescan list that it claims to be at the index where we found
it.

Change-Id: I6a267da4154a2e7b7d430cb4056e6bae978eaf62
Reviewed-on: https://go-review.googlesource.com/29280
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-20 18:37:32 +00:00
Austin Clements
39ce6eb9ec runtime: report GCSys and OtherSys in heap profile
The comment block at the end of the heap profile includes *almost*
everything from MemStats. Add the missing fields. These are useful for
debugging RSS that has gone to GC-internal data structures.

Change-Id: I0ee8a918d49629e28fd8fd2bf6861c4529461c24
Reviewed-on: https://go-review.googlesource.com/29276
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-20 18:37:29 +00:00
Cherry Zhang
38cd79889e cmd/compile: simplify div/mod on ARM
On ARM, DIV, DIVU, MOD, MODU are pseudo-instructions that make
runtime calls to _div/_udiv/_mod/_umod, which themselves are wrappers
around udiv. The udiv function does the real work.

Instead of generating these pseudo-instructions, call udiv
directly. This removes one layer of wrappers (which had an awkward
way of passing arguments), and also allows combining DIV and MOD
when both results are needed.

Change-Id: I118afc3986db3a1daabb5c1e6e57430888c91817
Reviewed-on: https://go-review.googlesource.com/29390
Reviewed-by: David Chase <drchase@google.com>
2016-09-20 13:40:48 +00:00
Keith Randall
6129f37367 cmd/compile: inline convT2{I,E} when result doesn't escape
No point in calling a function when we can build the interface
using a known type (or itab) and the address of a local.

Get rid of the third arg (preallocated stack space) to convT2{I,E}.

Makes the go binary smaller by 0.2%.

benchmark                   old ns/op     new ns/op     delta
BenchmarkEfaceInteger-8     16.7          10.1          -39.52%

Update #17118
Update #15375

Change-Id: I9724a1f802bfa1e3957bf1856b55558278e198a2
Reviewed-on: https://go-review.googlesource.com/29373
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-09-19 02:37:08 +00:00
David Crawshaw
0cbb12f0bb plugin: new package for loading plugins
Includes a linux implementation.

Change-Id: Iacc2ed7da760ae9deebc928adf2b334b043b07ec
Reviewed-on: https://go-review.googlesource.com/27823
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-16 17:54:40 +00:00
David Crawshaw
8607bed744 runtime: avoid dependence on main symbol
For -buildmode=plugin, this lets the linker drop the main.main symbol
out of the binary while including most of the runtime.

(In the future it should be possible to drop the entire runtime
package from plugins.)

Change-Id: I3e7a024ddf5cc945e3d8b84bf37a0b7cb2a00eb6
Reviewed-on: https://go-review.googlesource.com/27821
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-16 14:49:27 +00:00
David Crawshaw
eced6754c2 cmd/link: -buildmode=plugin support for linux
This CL contains several linker changes to support creating plugins.

It collects the exported plugin symbols provided by the compiler and
includes them in the moduledata.

It treats a binary as being dynamically linked if it imports the plugin
package. This lets the dynamic linker de-duplicate symbols.

Change-Id: I099b6f38dda26306eba5c41dbe7862f5a5918d95
Reviewed-on: https://go-review.googlesource.com/27820
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-16 14:49:13 +00:00
Matthew Dempsky
27eebbabc2 cmd/compile, runtime: remove throwreturn
Change-Id: If8d27cf1cd8d650ed0ba332448d3174d80b6b0ca
Reviewed-on: https://go-review.googlesource.com/29217
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-15 12:13:34 +00:00
Martin Möhrmann
150de948ee cmd/compile: intrinsify slicebytetostringtmp when not instrumenting
When not instrumenting:
- Intrinsify uses of slicebytetostringtmp within the runtime package
  in the SSA backend.
- Pass OARRAYBYTESTRTMP nodes to the compiler backends for lowering
  instead of generating calls to slicebytetostringtmp.

name                    old time/op  new time/op  delta
ConcatStringAndBytes-4  27.9ns ± 2%  24.7ns ± 2%  -11.52%  (p=0.000 n=43+43)

Fixes #17044

Change-Id: I51ce9c3b93284ce526edd0234f094e98580faf2d
Reviewed-on: https://go-review.googlesource.com/29017
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-14 21:58:14 +00:00
Josh Bleecher Snyder
9980b70cb4 runtime: limit the number of map overflow buckets
Consider repeatedly adding many items to a map
and then deleting them all, as in #16070. The map
itself doesn't need to grow above the high-water
mark of the number of items. However, due to random
collisions, the map can accumulate overflow
buckets.

Prior to this CL, those overflow buckets were
never removed, which led to a slow memory leak.

The problem with removing overflow buckets is
iterators. The obvious approach is to repack
keys and values and eliminate unused overflow
buckets. However, keys, values, and overflow
buckets cannot be manipulated without disrupting
iterators.

This CL takes a different approach, which is to
reuse the existing map growth mechanism,
which is well established, well tested, and
safe in the presence of iterators.
When a map has accumulated enough overflow buckets
we trigger map growth, but grow into a map of the
same size as before. The old overflow buckets will
be left behind for garbage collection.

For the code in #16070, instead of climbing
(very slowly) forever, memory usage now cycles
between 264MB and 483MB every 15 minutes or so.

To avoid increasing the size of maps,
the overflow bucket counter is only 16 bits.
For large maps, the counter is incremented
stochastically.
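
A sketch of the growth trigger, assuming an hmap with a 16-bit
noverflow counter and B as the log2 of the number of buckets:

    // "Too many" is roughly as many overflow buckets as regular
    // buckets; past that, grow into a map of the same size.
    func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
        if B < 16 {
            return noverflow >= uint16(1)<<B
        }
        // For larger maps the counter is incremented stochastically,
        // so compare against a capped threshold.
        return noverflow >= 1<<15
    }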

Fixes #16070

Change-Id: If551d77613ec6836907efca58bda3deee304297e
Reviewed-on: https://go-review.googlesource.com/25049
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-13 17:53:32 +00:00
Cherry Zhang
38d35e714a cmd/compile, runtime/internal/atomic: intrinsify And8, Or8 on ARM64
Also add assembly implementation, in case intrinsics is disabled.

Change-Id: Iff0a8a8ce326651bd29f6c403f5ec08dd3629993
Reviewed-on: https://go-review.googlesource.com/28979
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-13 02:09:15 +00:00
Keith Randall
d00a3cead8 runtime: make gdb test resilient to line numbering
Don't break on a line number; instead break on the actual call.
This makes the test more robust to line numbering changes in the backend.

A CL (28950) changed the generated code line numbering slightly.  A MOVW
$0, R0 instruction at the start of the function changed to line
10 (because several constant zero instructions got CSEd, and one gets
picked arbitrarily).  That's too fragile for a test.

Change-Id: I5d6a8ef0603de7d727585004142780a527e70496
Reviewed-on: https://go-review.googlesource.com/29085
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-12 23:13:12 +00:00
Michael Munday
f1515a01fd runtime, math/big: allow R0 on s390x to contain values other than 0
The new SSA backend for s390x can use R0 as a general purpose register.
This change modifies assembly code to either avoid using R0 entirely
or explicitly set R0 to 0.

R0 can still be safely used as 0 in address calculations.

Change-Id: I3efa723e9ef322a91a408bd8c31768d7858526c8
Reviewed-on: https://go-review.googlesource.com/28976
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-12 18:06:01 +00:00
Michael Munday
d38d59ffb5 runtime: fix SIGILL in checkvectorfacility on s390x
STFLE does not necessarily write to all the double-words that are
requested. It is therefore necessary to clear the target memory
before calling STFLE in order to ensure that the facility list does
not contain false positives.
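
A sketch of the fix in Go-like pseudocode (the real change is in
s390x assembly; stfle here is a hypothetical wrapper):

    // Zero the result buffer first: STFLE may store fewer
    // double-words than the buffer has room for.
    for i := range facilityList {
        facilityList[i] = 0
    }
    stfle(&facilityList)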

Fixes #17032.

Change-Id: I7bec9ade7103e747b72f08562fe57e6f091bd89f
Reviewed-on: https://go-review.googlesource.com/28850
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-09 00:34:22 +00:00
Cherry Zhang
14ff7cc94c runtime/cgo: fix callback on big-endian MIPS64
Use MOVW, instead of MOVV, to pass an int32 arg. There is also no
need to restore the argument registers.

Fix big-endian MIPS64 build.

Change-Id: Ib43c71075c988153e5e5c5c6e7297b3fee28652a
Reviewed-on: https://go-review.googlesource.com/28830
Reviewed-by: Minux Ma <minux@golang.org>
2016-09-09 00:30:28 +00:00
Josh Bleecher Snyder
07bcc16547 runtime: simplify getargp
Change-Id: I9ed62e8a6d8b9204c18748efd7845adabf3460b9
Reviewed-on: https://go-review.googlesource.com/28775
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-08 17:10:22 +00:00
Martin Möhrmann
252093f120 runtime: remove maxstring
Before this CL the runtime prevented the print function from printing
overlong strings when the length of the string was determined to be
corrupted. Corruption was detected by comparing the string size against
the limit stored in maxstring.

However, maxstring was not updated everywhere Go strings were created,
e.g. for string constants at compile time. As a result, the check for
maximum string length prevented the printing of some valid strings.

The protection maxstring provided did not warrant the bookkeeping
and global synchronization needed to keep maxstring updated to the
correct limit everywhere.

Fixes #16999

Change-Id: I62cc2f4362f333f75b77f199ce1a71aac0ff7aeb
Reviewed-on: https://go-review.googlesource.com/28813
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-08 15:57:01 +00:00
Ilya Tocar
0cff219c12 strings: use AVX2 for Index if available
IndexHard4-4      1.50ms ± 2%  0.71ms ± 0%  -52.36%  (p=0.000 n=20+19)

This also fixes a bug that caused a string of length 16 to use
two 8-byte comparisons instead of one 16-byte comparison, and adds
a test for cases where partial_match fails.

Change-Id: I1ee8fc4e068bb36c95c45de78f067c822c0d9df0
Reviewed-on: https://go-review.googlesource.com/22551
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-07 10:43:13 +00:00
Austin Clements
8259cf3c72 runtime/debug: enable TestFreeOSMemory on all arches
TestFreeOSMemory was disabled on many arches because of issue #9993.
Since that's been fixed, enable the test everywhere.

Change-Id: I298c38c3e04128d9c8a1f558980939d5699bea03
Reviewed-on: https://go-review.googlesource.com/27403
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-09-06 21:05:58 +00:00
Austin Clements
1b9499b069 syscall: make Getpagesize return page size from runtime
syscall.Getpagesize currently returns hard-coded page sizes on all
architectures (some of which are probably always wrong, and some of
which are definitely not always right). The runtime now has this
information, queried from the OS during runtime init, so make
syscall.Getpagesize return the page size that the runtime knows.
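
A minimal check of the new behavior (an illustrative program, not part
of the CL):

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        // Reports the page size the runtime queried from the OS at
        // init, rather than a hard-coded per-architecture value.
        fmt.Println(syscall.Getpagesize())
    }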

Updates #10180.

Change-Id: I4daa6fbc61a2193eb8fa9e7878960971205ac346
Reviewed-on: https://go-review.googlesource.com/25051
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-06 21:05:55 +00:00
Austin Clements
6dda7b2f5f runtime: don't hard-code physical page size
Now that the runtime fetches the true physical page size from the OS,
make the physical page size used by heap growth a variable instead of
a constant. This isn't used in any performance-critical paths, so it
shouldn't be an issue.

sys.PhysPageSize is also renamed to sys.DefaultPhysPageSize to make it
clear that it's not necessarily the true page size. There are no uses
of this constant any more, but we'll keep it around for now.

Updates #12480 and #10180.

Change-Id: I6c23b9df860db309c38c8287a703c53817754f03
Reviewed-on: https://go-review.googlesource.com/25022
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-06 21:05:53 +00:00
Austin Clements
276a52de55 runtime: fetch physical page size from the OS
Currently the physical page size assumed by the runtime is hard-coded.
On Linux the runtime at least fetches the OS page size during init and
sanity checks against the hard-coded value, but they may still differ.
On other OSes we wouldn't even notice.

Add support on all OSes to fetch the actual OS physical page size
during runtime init and lift the sanity check of PhysPageSize from the
Linux init code to general malloc init. Currently this is the only use
of the retrieved page size, but we'll add more shortly.

Updates #12480 and #10180.

Change-Id: I065f2834bc97c71d3208edc17fd990ec9058b6da
Reviewed-on: https://go-review.googlesource.com/25050
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-06 21:05:50 +00:00
Austin Clements
d7de8b6d23 runtime: assume 64kB physical pages on ARM
Currently we assume the physical page size on ARM is 4kB. While this
is usually true, the architecture also supports 16kB and 64kB physical
pages, and Linux (and possibly other OSes) can be configured to use
these larger page sizes.

With Go 1.6, such a configuration could potentially run, but generally
resulted in memory corruption or random panics. With current master,
this configuration will cause the runtime to panic during init on
Linux when it checks the true physical page size (and will still cause
corruption or panics on other OSes).

However, the assumed physical page size only has to be a multiple of
the true physical page size, the scavenger can now deal with large
physical page sizes, and the rest of the runtime can deal with a
larger assumed physical page size than the true size. Hence, there's
little disadvantage to conservatively setting the assumed physical
page size to 64kB on ARM.

This may result in some extra memory use, since we can only return
memory at multiples of the assumed physical page size. However, it is
a simple change that should make Go run on systems configured for
larger page sizes. The following commits will make the runtime query
the actual physical page size from the OS, but this is a simple step
there.

Updates #12480.

Change-Id: I851829595bc9e0c76235c847a7b5f62ad82b5302
Reviewed-on: https://go-review.googlesource.com/25021
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-09-06 21:05:47 +00:00
Austin Clements
cf4f1d07a1 runtime: bound scanobject to ~100 µs
Currently the time spent in scanobject is proportional to the size of
the object being scanned. Since scanobject is non-preemptible, large
objects can cause significant goroutine (and even whole application)
delays through several means:

1. If a GC assist picks up a large object, the allocating goroutine is
   blocked for the whole scan, even if that scan well exceeds that
   goroutine's debt.

2. Since the scheduler does not run on the P performing a large object
   scan, goroutines in that P's run queue do not run unless they are
   stolen by another P (which can take some time). If there are a few
   large objects, all of the Ps may get tied up so the scheduler
   doesn't run anywhere.

3. Even if a large object is scanned by a background worker and other
   Ps are still running the scheduler, the large object scan doesn't
   flush background credit until the whole scan is done. This can
   easily cause all allocations to block in assists, waiting for
   credit, causing an effective STW.

Fix this by splitting large objects into 128 KB "oblets" and scanning
at most one oblet at a time. Since we can scan 1–2 MB/ms, this equates
to bounding scanobject at roughly 100 µs. This improves assist
behavior both because assists can no longer get "unlucky" and be stuck
scanning a large object, and because it causes the background worker
to flush credit and unblock assists more frequently when scanning
large objects. This also improves GC parallelism if the heap consists
primarily of a small number of very large objects by letting multiple
workers scan a single large object in parallel.
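
A sketch of the split inside scanobject, with illustrative names
objStart and objSize for the object's bounds:

    const maxObletBytes = 128 << 10 // 128 KB
    if n > maxObletBytes {
        if b == objStart {
            // First visit: queue the rest of the object as oblets.
            for oblet := b + maxObletBytes; oblet < objStart+objSize; oblet += maxObletBytes {
                gcw.put(oblet)
            }
        }
        n = maxObletBytes // scan at most one oblet now
    }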

Fixes #10345. Fixes #16293.

This substantially improves goroutine latency in the benchmark from
issue #16293, which exercises several forms of very large objects:

name                 old max-latency    new max-latency    delta
SliceNoPointer-12           154µs ± 1%        155µs ±  2%     ~     (p=0.087 n=13+12)
SlicePointer-12             314ms ± 1%       5.94ms ±138%  -98.11%  (p=0.000 n=19+20)
SliceLivePointer-12        1148ms ± 0%       4.72ms ±167%  -99.59%  (p=0.000 n=19+20)
MapNoPointer-12           72509µs ± 1%        408µs ±325%  -99.44%  (p=0.000 n=19+18)
ChanPointer-12              313ms ± 0%       4.74ms ±140%  -98.49%  (p=0.000 n=18+20)
ChanLivePointer-12         1147ms ± 0%       3.30ms ±149%  -99.71%  (p=0.000 n=19+20)

name                 old P99.9-latency  new P99.9-latency  delta
SliceNoPointer-12           113µs ±25%         107µs ±12%     ~     (p=0.153 n=20+18)
SlicePointer-12          309450µs ± 0%         133µs ±23%  -99.96%  (p=0.000 n=20+20)
SliceLivePointer-12         961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)
MapNoPointer-12            448µs ±288%         119µs ±18%  -73.34%  (p=0.000 n=18+20)
ChanPointer-12           309450µs ± 0%         134µs ±23%  -99.96%  (p=0.000 n=20+19)
ChanLivePointer-12          961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)

This has negligible effect on all metrics from the garbage, JSON, and
HTTP x/benchmarks.

It shows slight improvement on some of the go1 benchmarks,
particularly Revcomp, which uses some multi-megabyte buffers:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.46s ± 1%     2.47s ± 1%  +0.32%  (p=0.012 n=20+20)
Fannkuch11-12                2.82s ± 0%     2.81s ± 0%  -0.61%  (p=0.000 n=17+20)
FmtFprintfEmpty-12          50.8ns ± 5%    50.5ns ± 2%    ~     (p=0.197 n=17+19)
FmtFprintfString-12          131ns ± 1%     132ns ± 0%  +0.57%  (p=0.000 n=20+16)
FmtFprintfInt-12             117ns ± 0%     116ns ± 0%  -0.47%  (p=0.000 n=15+20)
FmtFprintfIntInt-12          180ns ± 0%     179ns ± 1%  -0.78%  (p=0.000 n=16+20)
FmtFprintfPrefixedInt-12     186ns ± 1%     185ns ± 1%  -0.55%  (p=0.000 n=19+20)
FmtFprintfFloat-12           263ns ± 1%     271ns ± 0%  +2.84%  (p=0.000 n=18+20)
FmtManyArgs-12               741ns ± 1%     742ns ± 1%    ~     (p=0.190 n=19+19)
GobDecode-12                7.44ms ± 0%    7.35ms ± 1%  -1.21%  (p=0.000 n=20+20)
GobEncode-12                6.22ms ± 1%    6.21ms ± 1%    ~     (p=0.336 n=20+19)
Gzip-12                      220ms ± 1%     219ms ± 1%    ~     (p=0.130 n=19+19)
Gunzip-12                   37.9ms ± 0%    37.9ms ± 1%    ~     (p=1.000 n=20+19)
HTTPClientServer-12         82.5µs ± 3%    82.6µs ± 3%    ~     (p=0.776 n=20+19)
JSONEncode-12               16.4ms ± 1%    16.5ms ± 2%  +0.49%  (p=0.003 n=18+19)
JSONDecode-12               53.7ms ± 1%    54.1ms ± 1%  +0.71%  (p=0.000 n=19+18)
Mandelbrot200-12            4.19ms ± 1%    4.20ms ± 1%    ~     (p=0.452 n=19+19)
GoParse-12                  3.38ms ± 1%    3.37ms ± 1%    ~     (p=0.123 n=19+19)
RegexpMatchEasy0_32-12      72.1ns ± 1%    71.8ns ± 1%    ~     (p=0.397 n=19+17)
RegexpMatchEasy0_1K-12       242ns ± 0%     242ns ± 0%    ~     (p=0.168 n=17+20)
RegexpMatchEasy1_32-12      72.1ns ± 1%    72.1ns ± 1%    ~     (p=0.538 n=18+19)
RegexpMatchEasy1_1K-12       385ns ± 1%     384ns ± 1%    ~     (p=0.388 n=20+20)
RegexpMatchMedium_32-12      112ns ± 1%     112ns ± 3%    ~     (p=0.539 n=20+20)
RegexpMatchMedium_1K-12     34.4µs ± 2%    34.4µs ± 2%    ~     (p=0.628 n=18+18)
RegexpMatchHard_32-12       1.80µs ± 1%    1.80µs ± 1%    ~     (p=0.522 n=18+19)
RegexpMatchHard_1K-12       54.0µs ± 1%    54.1µs ± 1%    ~     (p=0.647 n=20+19)
Revcomp-12                   387ms ± 1%     369ms ± 5%  -4.89%  (p=0.000 n=17+19)
Template-12                 62.3ms ± 1%    62.0ms ± 0%  -0.48%  (p=0.002 n=20+17)
TimeParse-12                 314ns ± 1%     314ns ± 0%    ~     (p=1.011 n=20+13)
TimeFormat-12                358ns ± 0%     354ns ± 0%  -1.12%  (p=0.000 n=17+20)
[Geo mean]                  53.5µs         53.3µs       -0.23%

Change-Id: I2a0a179d1d6bf7875dd054b7693dd12d2a340132
Reviewed-on: https://go-review.googlesource.com/23540
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-06 19:27:33 +00:00
Austin Clements
b275e55d86 runtime: clean up more traces of the old mark bit
Commit 59877bf renamed bitMarked to bitScan, since the bitmap is no
longer used for marking. However, several other references to the old
name were strewn about in comments and in some other constant names. Fix
these up, too.

Change-Id: I4183d28c6b01977f1d75a99ad55b150f2211772d
Reviewed-on: https://go-review.googlesource.com/28450
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-06 19:26:08 +00:00
Erik Staab
66121ce8a9 runtime: remove redundant expression from SetFinalizer
The previous if condition already checks the same expression and doesn't
have side effects.

Change-Id: Ieaf30a786572b608d0a883052b45fd3f04bc6147
Reviewed-on: https://go-review.googlesource.com/28475
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-06 00:52:26 +00:00
Josh Bleecher Snyder
0e7e43688d runtime: remove a load and shift from scanobject
hbits.morePointers and hbits.isPointer both
do a load and a shift. Do it only once.
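
A sketch of the change in scanobject's inner loop (bit names as in
the runtime's heap bitmap):

    // Load the heap bitmap byte once and test both bits, instead of
    // calling hbits.morePointers() and hbits.isPointer() separately.
    bits := hbits.bits()
    if bits&bitScan == 0 {
        break // no more pointers in this object
    }
    if bits&bitPointer == 0 {
        continue // this word is not a pointer
    }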

Benchmarks using compilebench (because it is
the benchmark I have the most tooling around),
on a quiet machine.

name       old time/op      new time/op      delta
Template        291ms ±14%       290ms ±15%    ~          (p=0.702 n=100+99)
Unicode         143ms ± 9%       142ms ± 9%    ~           (p=0.126 n=99+98)
GoTypes         934ms ± 4%       933ms ± 4%    ~         (p=0.937 n=100+100)
Compiler        4.92s ± 2%       4.90s ± 1%  -0.28%        (p=0.003 n=98+98)

name       old user-ns/op   new user-ns/op   delta
Template   360user-ms ± 5%  355user-ms ± 4%  -1.37%        (p=0.000 n=97+96)
Unicode    178user-ms ± 6%  176user-ms ± 6%  -1.24%        (p=0.001 n=96+99)
GoTypes    1.22user-s ± 5%  1.21user-s ± 5%  -0.94%      (p=0.000 n=100+100)
Compiler   6.50user-s ± 2%  6.44user-s ± 3%  -0.94%        (p=0.000 n=96+98)

On amd64, before:

"".scanobject t=1 size=581 args=0x10 locals=0x78

After:

"".scanobject t=1 size=540 args=0x10 locals=0x78


Change-Id: I420ac3704549d484a5d85e19fea82c85da389514
Reviewed-on: https://go-review.googlesource.com/22712
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-02 19:17:43 +00:00
Dmitry Vyukov
cd285f1c6f runtime: fix global buffer reset in StopTrace
We reset the global buffer only if its pos != 0.
We ought to reset it always, but queue it only if pos != 0.
This is a latent bug. Currently it does not fire because
whenever we create a global buffer, we increment pos.

Change-Id: I01e28ae88ce9a5412497c524391b8b7cb443ffd9
Reviewed-on: https://go-review.googlesource.com/25574
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-09-02 19:14:11 +00:00
Gleb Stepanov
59877bfaaf runtime: rename variable
Rename the variable to bitScan, following the
TODO comment.

Change-Id: I81dd8cc1ca28c0dc9308a654ad65cdf5b2fd2ce3
Reviewed-on: https://go-review.googlesource.com/25175
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-09-02 17:28:41 +00:00
Austin Clements
3df926d52a runtime: improve message when a bad pointer is found on the stack
Currently this message says "invalid stack pointer", which could be
interpreted as the value of SP being invalid. Change it to "invalid
pointer found on stack" to emphasize that it's a pointer on the stack
that's invalid.

Updates #16948.

Change-Id: I753624f8cc7e08cf13d3ea5d9c790cc4af9fa372
Reviewed-on: https://go-review.googlesource.com/28430
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-09-02 17:04:37 +00:00
Ilya Tocar
44f1854c9d bytes: Use the same algorithm as strings for Index
name                     old time/op    new time/op      delta
IndexByte32-48             9.05ns ± 7%      9.59ns ±11%     +5.93%  (p=0.001 n=19+20)
IndexByte4K-48              118ns ± 4%       122ns ± 8%     +3.52%  (p=0.002 n=19+19)
IndexByte4M-48              172µs ±13%       188µs ±12%     +9.49%  (p=0.000 n=20+20)
IndexByte64M-48            8.00ms ±14%      8.05ms ±23%       ~     (p=0.799 n=20+20)
IndexBytePortable32-48     41.7ns ±15%      42.5ns ±12%       ~     (p=0.372 n=20+20)
IndexBytePortable4K-48     3.08µs ±16%      3.26µs ±10%     +5.77%  (p=0.018 n=20+20)
IndexBytePortable4M-48     3.12ms ±17%      3.20ms ±10%       ~     (p=0.157 n=20+20)
IndexBytePortable64M-48    54.0ms ±14%      55.3ms ±14%       ~     (p=0.640 n=20+20)
Index32-48                  230ns ±12%        46ns ± 6%    -79.87%  (p=0.000 n=20+19)
Index4K-48                 43.2µs ± 9%       3.2µs ±12%    -92.58%  (p=0.000 n=19+20)
Index4M-48                 44.4ms ± 7%       3.3ms ±13%    -92.59%  (p=0.000 n=19+20)
Index64M-48                 714ms ±10%        56ms ± 8%    -92.22%  (p=0.000 n=19+19)
IndexEasy32-48             52.7ns ±10%      31.0ns ±11%    -41.21%  (p=0.000 n=20+20)
IndexEasy4K-48              139ns ± 5%      1598ns ± 6%  +1046.37%  (p=0.000 n=19+19)
IndexEasy4M-48              179µs ± 8%      1674µs ±10%   +834.31%  (p=0.000 n=19+20)
IndexEasy64M-48            8.56ms ±10%     27.82ms ±16%   +225.14%  (p=0.000 n=19+20)

name                     old speed      new speed        delta
IndexByte32-48           3.52GB/s ± 7%    3.35GB/s ±11%     -4.99%  (p=0.001 n=20+20)
IndexByte4K-48           34.5GB/s ± 7%    33.2GB/s ±10%     -3.67%  (p=0.002 n=20+20)
IndexByte4M-48           24.6GB/s ±14%    22.4GB/s ±14%     -8.73%  (p=0.000 n=20+20)
IndexByte64M-48          8.42GB/s ±16%    8.42GB/s ±19%       ~     (p=0.799 n=20+20)
IndexBytePortable32-48    770MB/s ±13%     756MB/s ±11%       ~     (p=0.383 n=20+20)
IndexBytePortable4K-48   1.34GB/s ±14%    1.26GB/s ±10%     -5.76%  (p=0.018 n=20+20)
IndexBytePortable4M-48   1.35GB/s ±15%    1.31GB/s ±11%       ~     (p=0.157 n=20+20)
IndexBytePortable64M-48  1.25GB/s ±16%    1.22GB/s ±13%       ~     (p=0.640 n=20+20)
Index32-48                138MB/s ± 8%     687MB/s ± 8%   +398.57%  (p=0.000 n=19+20)
Index4K-48               94.9MB/s ± 9%  1280.5MB/s ±11%  +1249.11%  (p=0.000 n=19+20)
Index4M-48               94.6MB/s ± 7%  1278.5MB/s ±12%  +1250.99%  (p=0.000 n=19+20)
Index64M-48              94.2MB/s ±10%  1210.9MB/s ± 8%  +1185.04%  (p=0.000 n=19+19)
IndexEasy32-48            608MB/s ±10%    1035MB/s ±10%    +70.15%  (p=0.000 n=20+20)
IndexEasy4K-48           29.3GB/s ± 6%     2.6GB/s ± 6%    -91.24%  (p=0.000 n=19+19)
IndexEasy4M-48           23.3GB/s ±10%     2.5GB/s ± 9%    -89.23%  (p=0.000 n=20+20)
IndexEasy64M-48          7.86GB/s ±11%    2.42GB/s ±14%    -69.18%  (p=0.000 n=19+20)

Change-Id: Ia191f0a6ca80e113397d9ed98d25f195768b65bc
Reviewed-on: https://go-review.googlesource.com/22550
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-01 18:05:50 +00:00
Joe Tsai
6fb4b15f98 Revert "runtime: improve memmove for amd64"
This reverts commit 3607c5f4f1.

This was causing failures on amd64 machines without AVX.

Fixes #16939

Change-Id: I70080fbb4e7ae791857334f2bffd847d08cb25fa
Reviewed-on: https://go-review.googlesource.com/28274
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-31 21:07:35 +00:00
Kevin Burke
ffa2bd27a4 runtime: fix typo
Change-Id: I47e3cfa8b49e3d0b55c91387df31488b37038a8f
Reviewed-on: https://go-review.googlesource.com/28225
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-31 16:24:45 +00:00
Denis Nagorny
3607c5f4f1 runtime: improve memmove for amd64
Use AVX if available, as on 4th generation Intel(TM) Core(TM) processors.

(collected on E5 2609v3 @1.9GHz)
name                        old speed      new speed       delta
Memmove/1-6                  158MB/s ± 0%    172MB/s ± 0%    +9.09% (p=0.000 n=16+16)
Memmove/2-6                  316MB/s ± 0%    345MB/s ± 0%    +9.09% (p=0.000 n=18+16)
Memmove/3-6                  517MB/s ± 0%    517MB/s ± 0%      ~ (p=0.445 n=16+16)
Memmove/4-6                  687MB/s ± 1%    690MB/s ± 0%    +0.35% (p=0.000 n=20+17)
Memmove/5-6                  729MB/s ± 0%    729MB/s ± 0%    +0.01% (p=0.000 n=16+18)
Memmove/6-6                  875MB/s ± 0%    875MB/s ± 0%    +0.01% (p=0.000 n=18+18)
Memmove/7-6                 1.02GB/s ± 0%   1.02GB/s ± 1%      ~ (p=0.139 n=19+20)
Memmove/8-6                 1.26GB/s ± 0%   1.26GB/s ± 0%    +0.00% (p=0.000 n=18+18)
Memmove/9-6                 1.42GB/s ± 0%   1.42GB/s ± 0%    +0.00% (p=0.000 n=17+18)
Memmove/10-6                1.58GB/s ± 0%   1.58GB/s ± 0%    +0.00% (p=0.000 n=19+19)
Memmove/11-6                1.74GB/s ± 0%   1.74GB/s ± 0%    +0.00% (p=0.001 n=18+17)
Memmove/12-6                1.90GB/s ± 0%   1.90GB/s ± 0%    +0.00% (p=0.000 n=19+19)
Memmove/13-6                2.05GB/s ± 0%   2.05GB/s ± 0%    +0.00% (p=0.000 n=18+19)
Memmove/14-6                2.21GB/s ± 0%   2.21GB/s ± 0%    +0.00% (p=0.000 n=16+20)
Memmove/15-6                2.37GB/s ± 0%   2.37GB/s ± 0%    +0.00% (p=0.004 n=19+20)
Memmove/16-6                2.53GB/s ± 0%   2.53GB/s ± 0%    +0.00% (p=0.000 n=16+16)
Memmove/32-6                4.67GB/s ± 0%   4.67GB/s ± 0%    +0.00% (p=0.000 n=17+17)
Memmove/64-6                8.67GB/s ± 0%   8.64GB/s ± 0%    -0.33% (p=0.000 n=18+17)
Memmove/128-6               12.6GB/s ± 0%   11.6GB/s ± 0%    -8.05% (p=0.000 n=16+19)
Memmove/256-6               16.3GB/s ± 0%   16.6GB/s ± 0%    +1.66% (p=0.000 n=20+18)
Memmove/512-6               21.5GB/s ± 0%   24.4GB/s ± 0%   +13.35% (p=0.000 n=18+17)
Memmove/1024-6              24.7GB/s ± 0%   33.7GB/s ± 0%   +36.12% (p=0.000 n=18+18)
Memmove/2048-6              27.3GB/s ± 0%   43.3GB/s ± 0%   +58.77% (p=0.000 n=19+17)
Memmove/4096-6              37.5GB/s ± 0%   50.5GB/s ± 0%   +34.56% (p=0.000 n=19+19)
MemmoveUnalignedDst/1-6      135MB/s ± 0%    146MB/s ± 0%    +7.69% (p=0.000 n=16+14)
MemmoveUnalignedDst/2-6      271MB/s ± 0%    292MB/s ± 0%    +7.69% (p=0.000 n=18+18)
MemmoveUnalignedDst/3-6      438MB/s ± 0%    438MB/s ± 0%      ~ (p=0.352 n=16+19)
MemmoveUnalignedDst/4-6      584MB/s ± 0%    584MB/s ± 0%      ~ (p=0.876 n=17+17)
MemmoveUnalignedDst/5-6      631MB/s ± 1%    632MB/s ± 0%    +0.25% (p=0.000 n=20+17)
MemmoveUnalignedDst/6-6      759MB/s ± 0%    759MB/s ± 0%    +0.00% (p=0.000 n=19+16)
MemmoveUnalignedDst/7-6      885MB/s ± 0%    883MB/s ± 1%      ~ (p=0.647 n=18+20)
MemmoveUnalignedDst/8-6     1.08GB/s ± 0%   1.08GB/s ± 0%    +0.00% (p=0.035 n=19+18)
MemmoveUnalignedDst/9-6     1.22GB/s ± 0%   1.22GB/s ± 0%      ~ (p=0.251 n=18+17)
MemmoveUnalignedDst/10-6    1.35GB/s ± 0%   1.35GB/s ± 0%      ~ (p=0.327 n=17+18)
MemmoveUnalignedDst/11-6    1.49GB/s ± 0%   1.49GB/s ± 0%      ~ (p=0.531 n=18+19)
MemmoveUnalignedDst/12-6    1.63GB/s ± 0%   1.63GB/s ± 0%      ~ (p=0.886 n=19+18)
MemmoveUnalignedDst/13-6    1.76GB/s ± 0%   1.76GB/s ± 1%    -0.24% (p=0.006 n=18+20)
MemmoveUnalignedDst/14-6    1.90GB/s ± 0%   1.90GB/s ± 0%      ~ (p=0.818 n=20+19)
MemmoveUnalignedDst/15-6    2.03GB/s ± 0%   2.03GB/s ± 0%      ~ (p=0.294 n=17+16)
MemmoveUnalignedDst/16-6    2.17GB/s ± 0%   2.17GB/s ± 0%      ~ (p=0.602 n=16+18)
MemmoveUnalignedDst/32-6    4.05GB/s ± 0%   4.05GB/s ± 0%    +0.00% (p=0.010 n=18+17)
MemmoveUnalignedDst/64-6    7.59GB/s ± 0%   7.59GB/s ± 0%    +0.00% (p=0.022 n=18+16)
MemmoveUnalignedDst/128-6   11.1GB/s ± 0%   11.4GB/s ± 0%    +2.79% (p=0.000 n=18+17)
MemmoveUnalignedDst/256-6   16.4GB/s ± 0%   16.7GB/s ± 0%    +1.59% (p=0.000 n=20+17)
MemmoveUnalignedDst/512-6   15.7GB/s ± 0%   21.3GB/s ± 0%   +35.87% (p=0.000 n=18+20)
MemmoveUnalignedDst/1024-6  16.0GB/s ±20%   31.5GB/s ± 0%   +96.93% (p=0.000 n=20+14)
MemmoveUnalignedDst/2048-6  19.6GB/s ± 0%   42.1GB/s ± 0%  +115.16% (p=0.000 n=17+18)
MemmoveUnalignedDst/4096-6  6.41GB/s ± 0%  33.18GB/s ± 0%  +417.56% (p=0.000 n=17+18)
MemmoveUnalignedSrc/1-6      171MB/s ± 0%    166MB/s ± 0%    -3.33% (p=0.000 n=19+16)
MemmoveUnalignedSrc/2-6      343MB/s ± 0%    342MB/s ± 1%    -0.41% (p=0.000 n=17+20)
MemmoveUnalignedSrc/3-6      508MB/s ± 0%    493MB/s ± 1%    -2.90% (p=0.000 n=17+17)
MemmoveUnalignedSrc/4-6      677MB/s ± 0%    660MB/s ± 2%    -2.55% (p=0.000 n=17+20)
MemmoveUnalignedSrc/5-6      790MB/s ± 0%    790MB/s ± 0%      ~ (p=0.139 n=17+17)
MemmoveUnalignedSrc/6-6      948MB/s ± 0%    946MB/s ± 1%      ~ (p=0.330 n=17+19)
MemmoveUnalignedSrc/7-6     1.11GB/s ± 0%   1.11GB/s ± 0%    -0.05% (p=0.026 n=17+17)
MemmoveUnalignedSrc/8-6     1.38GB/s ± 0%   1.38GB/s ± 0%      ~ (p=0.091 n=18+16)
MemmoveUnalignedSrc/9-6     1.42GB/s ± 0%   1.40GB/s ± 1%    -1.04% (p=0.000 n=19+20)
MemmoveUnalignedSrc/10-6    1.58GB/s ± 0%   1.56GB/s ± 1%    -1.15% (p=0.000 n=18+19)
MemmoveUnalignedSrc/11-6    1.73GB/s ± 0%   1.71GB/s ± 1%    -1.30% (p=0.000 n=20+20)
MemmoveUnalignedSrc/12-6    1.89GB/s ± 0%   1.87GB/s ± 1%    -1.18% (p=0.000 n=17+20)
MemmoveUnalignedSrc/13-6    2.05GB/s ± 0%   2.02GB/s ± 1%    -1.18% (p=0.000 n=17+20)
MemmoveUnalignedSrc/14-6    2.21GB/s ± 0%   2.18GB/s ± 1%    -1.14% (p=0.000 n=17+20)
MemmoveUnalignedSrc/15-6    2.36GB/s ± 0%   2.34GB/s ± 1%    -1.04% (p=0.000 n=17+20)
MemmoveUnalignedSrc/16-6    2.52GB/s ± 0%   2.49GB/s ± 1%    -1.26% (p=0.000 n=19+20)
MemmoveUnalignedSrc/32-6    4.82GB/s ± 0%   4.61GB/s ± 0%    -4.40% (p=0.000 n=19+20)
MemmoveUnalignedSrc/64-6    5.03GB/s ± 4%   7.97GB/s ± 0%   +58.55% (p=0.000 n=20+16)
MemmoveUnalignedSrc/128-6   11.1GB/s ± 0%   11.2GB/s ± 0%    +0.52% (p=0.000 n=17+18)
MemmoveUnalignedSrc/256-6   16.5GB/s ± 0%   16.4GB/s ± 0%    -0.10% (p=0.000 n=20+18)
MemmoveUnalignedSrc/512-6   21.0GB/s ± 0%   22.1GB/s ± 0%    +5.48% (p=0.000 n=14+17)
MemmoveUnalignedSrc/1024-6  24.9GB/s ± 0%   31.9GB/s ± 0%   +28.20% (p=0.000 n=19+20)
MemmoveUnalignedSrc/2048-6  23.3GB/s ± 0%   33.8GB/s ± 0%   +45.22% (p=0.000 n=17+19)
MemmoveUnalignedSrc/4096-6  37.3GB/s ± 0%   42.7GB/s ± 0%   +14.30% (p=0.000 n=17+17)

Change-Id: Iab488d93a293cdf573ab5cd89b95a818bbb5d531
Reviewed-on: https://go-review.googlesource.com/22515
Run-TryBot: Denis Nagorny <denis.nagorny@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-31 16:03:30 +00:00
Josh Bleecher Snyder
2b74de3ed9 runtime: rename fastrand1 to fastrand
Change-Id: I37706ff0a3486827c5b072c95ad890ea87ede847
Reviewed-on: https://go-review.googlesource.com/28210
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-30 23:59:21 +00:00
Cherry Zhang
f9dafc742d cmd/compile, runtime, etc: get rid of constant FP registers
On ARM64, MIPS64, and PPC64, some floating point registers were
reserved for constants 0, 1, 2, 0.5, etc. This CL removes them.

On ARM64, they are never used. On MIPS64 and PPC64, the only use
case is a multiplication-by-2 in the old backend of the compiler,
which is replaced with an addition.

Change-Id: I737cbf43283756e3408964fc88c567a938c57036
Reviewed-on: https://go-review.googlesource.com/28095
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-30 23:16:17 +00:00
Cherry Zhang
b2e0e9688a cmd/compile: remove Zero and NilCheck for newobject
Recognize runtime.newobject and don't Zero or NilCheck it.

Fixes #15914 (?)
Updates #15390.

TBD: add test

Change-Id: Ia3bfa5c2ddbe2c27c92d9f68534a713b5ce95934
Reviewed-on: https://go-review.googlesource.com/27930
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-30 23:10:43 +00:00
Keith Randall
842b05832f all: use testing.GoToolPath instead of "go"
This change makes sure that tests are run with the correct
version of the go tool.  The correct version is the one that
we invoked with "go test", not the one that is first in our path.

Fixes #16577

Change-Id: If22c8f8c3ec9e7c35d094362873819f2fbb8559b
Reviewed-on: https://go-review.googlesource.com/28089
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-30 22:49:11 +00:00
Martin Möhrmann
0dae9dfb08 cmd/compile: improve string iteration performance
Generate a for loop for ranging over strings that only needs to call
the runtime function charntorune for non-ASCII characters.

This provides faster iteration over ASCII characters and slightly
faster iteration for other characters.

The runtime function charntorune is changed to take an index at which
to start decoding, and to return the index after the last byte belonging
to the decoded rune.

All call sites of charntorune in the runtime are replaced by a for loop
that will be transformed by the compiler instead of calling the charntorune
function directly.
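
The transformed loop corresponds roughly to this source form (shown
with the public unicode/utf8 package for illustration; the compiler
emits a call to the runtime decoder instead):

    for i := 0; i < len(s); {
        r, size := rune(s[i]), 1
        if r >= utf8.RuneSelf {
            // Non-ASCII: fall back to full rune decoding.
            r, size = utf8.DecodeRuneInString(s[i:])
        }
        // The loop body sees index i and rune r, as with range.
        _ = r
        i += size
    }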

go binary size decreases by 80 bytes.
godoc binary size increases by around 4 kilobytes.

runtime:

name                           old time/op  new time/op  delta
RuneIterate/range/ASCII-4      43.7ns ± 3%  10.3ns ± 4%  -76.33%  (p=0.000 n=44+45)
RuneIterate/range/Japanese-4   72.5ns ± 2%  62.8ns ± 2%  -13.41%  (p=0.000 n=49+50)
RuneIterate/range1/ASCII-4     43.5ns ± 2%  10.4ns ± 3%  -76.18%  (p=0.000 n=50+50)
RuneIterate/range1/Japanese-4  72.5ns ± 2%  62.9ns ± 2%  -13.26%  (p=0.000 n=50+49)
RuneIterate/range2/ASCII-4     43.5ns ± 3%  10.3ns ± 2%  -76.22%  (p=0.000 n=48+47)
RuneIterate/range2/Japanese-4  72.4ns ± 2%  62.7ns ± 2%  -13.47%  (p=0.000 n=50+50)

strings:

name                 old time/op    new time/op    delta
IndexRune-4            64.7ns ± 5%    22.4ns ± 3%  -65.43%  (p=0.000 n=25+21)
MapNoChanges-4          269ns ± 2%     157ns ± 2%  -41.46%  (p=0.000 n=23+24)
Fields-4               23.0ms ± 2%    19.7ms ± 2%  -14.35%  (p=0.000 n=25+25)
FieldsFunc-4           23.1ms ± 2%    19.6ms ± 2%  -14.94%  (p=0.000 n=25+24)

name                 old speed      new speed      delta
Fields-4             45.6MB/s ± 2%  53.2MB/s ± 2%  +16.87%  (p=0.000 n=24+25)
FieldsFunc-4         45.5MB/s ± 2%  53.5MB/s ± 2%  +17.57%  (p=0.000 n=25+24)

Updates #13162

Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701
Reviewed-on: https://go-review.googlesource.com/27853
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-30 18:17:20 +00:00
Keith Randall
0d7a2241cb runtime: update a few comments
noescape is now 0 instructions with the SSA backend.
Fast atomics are no longer a TODO (at least for amd64).

Change-Id: Ib6e06f7471bef282a47ba236d8ce95404bb60a42
Reviewed-on: https://go-review.googlesource.com/28087
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-30 18:16:28 +00:00
Carlos Eduardo Seo
aaa6b53524 runtime: insufficient padding in the p structure
The current padding in the 'p' struct is hardcoded at 64 bytes. It should be the
cache line size. On ppc64x, the current value is only okay because sys.CacheLineSize
is incorrectly 64 bytes there. This change fixes that by making the padding equal to the
cache line size. It also corrects the cache line size for ppc64/ppc64le to 128 bytes.
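
A sketch of the change in the runtime's p struct:

    type p struct {
        // ...
        pad [sys.CacheLineSize]byte // previously pad [64]byte
    }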

Fixes #16477

Change-Id: Ib7ec5195685116eb11ba312a064f41920373d4a3
Reviewed-on: https://go-review.googlesource.com/25370
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-29 23:22:51 +00:00
Martin Möhrmann
e6f9f39ce5 cmd/compile: generate makeslice calls with int arguments
Where possible, generate calls to the runtime's makeslice with int
arguments at compile time instead of makeslice with int64 arguments.

This eliminates converting arguments for calls to makeslice with
int64 arguments for platforms where int64 values do not fit into
arguments of type int.
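
A sketch of the two runtime entry points this implies (signatures
simplified; makeslice64 is assumed to convert and delegate):

    func makeslice(et *_type, len, cap int) slice
    func makeslice64(et *_type, len64, cap64 int64) slice

When the operands already have type int, the compiler can call
makeslice directly and skip the int64 conversions.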

The godoc 386 binary shrinks by approximately 12 kilobytes.

amd64:
name         old time/op  new time/op  delta
MakeSlice-2  29.8ns ± 1%  29.8ns ± 1%   ~     (p=1.000 n=24+24)

386:
name         old time/op  new time/op  delta
MakeSlice-2  52.3ns ± 0%  45.9ns ± 0%  -12.17%  (p=0.000 n=25+22)

Fixes #15357

Change-Id: Icb8701bb63c5a83877d26c8a4b78e782ba76de7c
Reviewed-on: https://go-review.googlesource.com/27851
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-08-29 18:25:33 +00:00
Emmanuel Odeke
7c04633e0c all: fix obsolete inferno-os links
Fixes #16911.

Fix obsolete inferno-os links, since code.google.com shut down.
This CL points to the right files by replacing
http://code.google.com/p/inferno-os/source/browse
with
https://bitbucket.org/inferno-os/inferno-os/src/default

To implement the change I wrote and ran this script in the root:
$ grep -Rn 'http://code.google.com/p/inferno-os/source/browse' * \
| cut -d":" -f1 | while read F;do perl -pi -e \
's/http:\/\/code.google.com\/p\/inferno-os\/source\/browse/https:\/\/bitbucket.org\/inferno-os\/inferno-os\/src\/default/g'
$F;done

I excluded any cmd/vendor changes from the commit.

Change-Id: Iaaf828ac8f6fc949019fd01832989d00b29b6749
Reviewed-on: https://go-review.googlesource.com/27994
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-29 04:54:42 +00:00
David Crawshaw
bee4206764 runtime: have typelinksinit work forwards
For reasons I have forgotten, typelinksinit processed modules backwards.
(I suspect this was an attempt to process types in the executing
binary first.)

It does not appear to be necessary, and it is not the order we want
when a module can be loaded at an arbitrary point during a program's
execution as a plugin. So reverse the order.

While here, make it safe to call typelinksinit multiple times.

Change-Id: Ie10587c55c8e5efa0542981efb6eb3c12dd59e8c
Reviewed-on: https://go-review.googlesource.com/27822
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-08-26 21:22:30 +00:00
Keith Randall
320ddcf834 cmd/compile: inline atomics from runtime/internal/atomic on amd64
Inline atomic reads and writes on amd64.  There's no reason
to pay the overhead of a call for these.

To keep atomic loads from being reordered, we make them
return a <value,memory> tuple.

Change the meaning of resultInArg0 for tuple-generating ops
to mean the first part of the result tuple, not the second.
This means we can always put the store part of the tuple last,
matching how arguments are laid out.  This requires reordering
the outputs of add32carry and sub32carry and their descendants
in various architectures.
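
Sketch of what this means at runtime-internal call sites (semantics only;
the exact generated code may differ):

    // v := atomic.Load64(&x)    // runtime/internal/atomic
    // atomic.Store64(&x, v+1)
    //
    // On amd64 the load now compiles to a plain MOVQ and the store to an
    // XCHGQ; the load's <value,memory> tuple result is what keeps it from
    // being reordered by the SSA backend.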

benchmark                    old ns/op     new ns/op     delta
BenchmarkAtomicLoad64-8      2.09          0.26          -87.56%
BenchmarkAtomicStore64-8     7.54          5.72          -24.14%

TBD (in a different CL): Cas, Or8, ...

Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
Reviewed-on: https://go-review.googlesource.com/27641
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-25 20:09:04 +00:00
Josh Bleecher Snyder
71ab9fa312 all: fix assembly vet issues
Add missing function prototypes.
Fix function prototypes.
Use FP references instead of SP references.
Fix variable names.
Update comments.
Clean up whitespace. (Not for vet.)

All fairly minor fixes to make vet happy.

Updates #11041

Change-Id: Ifab2cdf235ff61cdc226ab1d84b8467b5ac9446c
Reviewed-on: https://go-review.googlesource.com/27713
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-25 18:52:31 +00:00
Ian Lance Taylor
f29ec7d74a runtime: remove unused type sigtabtt
The type sigtabtt was introduced by an automated tool in
https://golang.org/cl/167550043. It was the Go version of the C type
SigTab. However, when the C code using SigTab was converted to Go in
https://golang.org/cl/168500044 it was rewritten to use a different Go
type, sigTabT, rather than sigtabtt (the difference being that sigTabT
uses string where sigtabtt uses *int8 from the C type char*). So this is
just a dreg from the conversion that was never actually used.

Change-Id: I2ec6eb4b25613bf5e5ad1dbba1f4b5ff20f80f55
Reviewed-on: https://go-review.googlesource.com/27691
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-25 03:51:24 +00:00
Michael Munday
61d5daea0a runtime: use clock_gettime for time.now() on s390x
This should improve the precision of time.now() from microseconds
to nanoseconds.

Also, modify runtime.nanotime to keep it consistent with cleanup
done to time.now.

Updates #11222 for s390x.

Change-Id: I27864115ea1fee7299360d9003cd3a8355f624d3
Reviewed-on: https://go-review.googlesource.com/27710
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-25 02:05:31 +00:00
Keith Randall
3e270ab80b cmd/compile: clean up ctz ops
Now that we have ops that can return 2 results, have BSF return a result
and flags.  We can then get rid of the redundant comparison and use CMOV
instead of CMOVconst ops.

Get rid of a bunch of the ops we don't use.  Ctz{8,16}, plus all the Clzs,
and CMOVNEs.  I don't think we'll ever use them, and they would be easy
to add back if needed.

Change-Id: I8858a1d017903474ea7e4002fc76a6a86e7bd487
Reviewed-on: https://go-review.googlesource.com/27630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-23 23:45:12 +00:00
Ian Lance Taylor
d00890b5f3 runtime: add msan calls before calling traceback functions
Tell msan that the arguments to the traceback functions are initialized,
in case the traceback functions are compiled with -fsanitize=memory.

Change-Id: I3ab0816604906c6cd7086245e6ae2e7fa62fe354
Reviewed-on: https://go-review.googlesource.com/24856
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-23 16:31:16 +00:00
Ian Lance Taylor
fe251d2581 runtime: remove unused function in test
Change-Id: I43f14cdd9eb4a1d5471fc88c1b4759ceb2c674cf
Reviewed-on: https://go-review.googlesource.com/24817
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-23 14:09:52 +00:00
Ian Lance Taylor
2d85e87f08 runtime/cgo: add tsan acquire/release around setenv/unsetenv
Change-Id: Iabb25e97714d070c31c657559a97a3bfc979da18
Reviewed-on: https://go-review.googlesource.com/25403
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-23 14:07:58 +00:00
Ian Lance Taylor
dc9755c2a2 runtime: add missing race and msan checks to reflect functions
Add missing race and msan checks to reflect.typedmemmove and
reflect.typedslicecopy. Missing these checks caused the race detector
to miss races and caused msan to issue false positive errors.

Fixes #16281.

Change-Id: I500b5f92bd68dc99dd5d6f297827fd5d2609e88b
Reviewed-on: https://go-review.googlesource.com/24760
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-08-23 13:12:15 +00:00
Carlos Eduardo Seo
0df5ab7e65 runtime: use clock_gettime to get current time on ppc64x
Fetch the current time in nanoseconds, not microseconds, by using
clock_gettime rather than gettimeofday.

Updates #11222

Change-Id: I1c2c1b88f80ae82002518359436e19099061c6fb
Reviewed-on: https://go-review.googlesource.com/26790
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
2016-08-23 05:37:05 +00:00
Josh Bleecher Snyder
e2103adb6c crypto/*, runtime: nacl asm fixes
Found by vet.

Updates #11041

Change-Id: I5217b3e20c6af435d7500d6bb487b9895efe6605
Reviewed-on: https://go-review.googlesource.com/27493
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-08-22 19:50:41 +00:00
Josh Bleecher Snyder
5abfc97e84 runtime: use correct MOV for plan9 brk_ ret value
Updates #11041

Change-Id: I78f8d48f00cfbb451e37c868cc472ef06ea0fd95
Reviewed-on: https://go-review.googlesource.com/27491
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-22 19:49:08 +00:00
Josh Bleecher Snyder
e80376ca6b runtime: ignore closeonexec ret val on openbsd/arm
Fixes #16641
Updates #11041

Change-Id: I087208a486f535d74135591b2c9a73168cf80e1a
Reviewed-on: https://go-review.googlesource.com/27490
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-22 19:40:09 +00:00
Dmitry Vyukov
747a158ef3 runtime: speed up StartTrace with lots of blocked goroutines
In StartTrace we emit EvGoCreate for all existing goroutines.
This includes stack unwind to obtain current stack.
Real Go programs can contain hundreds of thousands of blocked goroutines.
For such programs StartTrace can take up to a second (a few µs per goroutine).

Obtain current stack ID once and use it for all EvGoCreate events.
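
A hypothetical sketch of the pattern (names are illustrative, not the
runtime's): hoist the loop-invariant unwind out of the per-goroutine loop.

    func emitGoCreateEvents(gs []uint64, unwind func() uint64, emit func(g, stk uint64)) {
            stkID := unwind() // one stack unwind instead of one per goroutine
            for _, g := range gs {
                    emit(g, stkID)
            }
    }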

This speeds up StartTrace with 10K blocked goroutines from 20ms to 4ms
(the win for StartTrace called from the net/http/pprof handler will be
bigger, as the stack is deeper).

Change-Id: I9e5ff9468331a840f8fdcdd56c5018c2cfde61fc
Reviewed-on: https://go-review.googlesource.com/25573
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2016-08-22 17:40:10 +00:00
Josh Bleecher Snyder
7c5f33b173 runtime: cull dead code
They are unused, and vet wants them to have
a function prototype.

Updates #11041

Change-Id: Idedc96ddd3c3cf1b1d2ab6d98796367eab29f032
Reviewed-on: https://go-review.googlesource.com/27492
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-22 16:41:34 +00:00
Josh Bleecher Snyder
4af1148079 cmd/vet: improve asmdecl parameter handling
The asmdecl check had hand-rolled code that
calculated the size and offset of parameters
based only on the AST.
It included a list of known named types.

This CL changes asmdecl to use go/types instead.
This allows us to easily handle named types.
It also adds support for structs, arrays,
and complex parameters.

It improves the default names given to unnamed
parameters. Previously, all anonymous arguments were
called "unnamed", and the first anonymous return
argument was called "ret".
Anonymous arguments are now called arg, arg1, arg2,
etc., depending on the index in the argument list.
Return arguments are ret, ret1, ret2.

This CL also fixes a bug in the printing of
composite data type sizes.

Updates #11041

Change-Id: I1085116a26fe6199480b680eff659eb9ab31769b
Reviewed-on: https://go-review.googlesource.com/27150
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2016-08-22 15:42:06 +00:00
Josh Bleecher Snyder
880c967ccd runtime: minor string/rune optimizations
Eliminate a spill in concatstrings.
Provide bounds elim hints in runetochar.
No significant benchmark movement.

Before:
"".runetochar t=1 size=412 args=0x28 locals=0x0
"".concatstrings t=1 size=736 args=0x30 locals=0x98

After:
"".runetochar t=1 size=337 args=0x28 locals=0x0
"".concatstrings t=1 size=711 args=0x30 locals=0x90

Change-Id: Icce646976cb20a223163b7e72a54761193ac17e3
Reviewed-on: https://go-review.googlesource.com/27460
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-22 15:19:31 +00:00
Michael Munday
fa897643a1 runtime: remove unnecessary calls to memclr
Go will have already cleared the structs (the original C wouldn't
have).

Change-Id: I4a5a0cfd73953181affc158d188aae2ce281bb33
Reviewed-on: https://go-review.googlesource.com/27435
Run-TryBot: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-20 18:00:09 +00:00
Dmitry Vyukov
14e5951166 runtime: increase malloc size classes
When we calculate class sizes, in some cases we discard considerable
amounts of memory without an apparent reason. For example, we choose
size 8448 with 6 objects in 7 pages. But we could just as well use object
size 9472, which is also 6 objects in 7 pages but +1024 bytes (+12.12%).

Increase class sizes to the max value that leads to the same
page count/number of objects. Full list of affected size classes:

class 36: pages: 2 size: 1664->1792 +128 (7.69%)
class 39: pages: 1 size: 2560->2688 +128 (5.0%)
class 40: pages: 3 size: 2816->3072 +256 (9.09%)
class 41: pages: 2 size: 3072->3200 +128 (4.16%)
class 42: pages: 3 size: 3328->3456 +128 (3.84%)
class 44: pages: 3 size: 4608->4864 +256 (5.55%)
class 47: pages: 4 size: 6400->6528 +128 (2.0%)
class 48: pages: 5 size: 6656->6784 +128 (1.92%)
class 51: pages: 7 size: 8448->9472 +1024 (12.12%)
class 52: pages: 6 size: 8704->9728 +1024 (11.76%)
class 53: pages: 5 size: 9472->10240 +768 (8.10%)
class 54: pages: 4 size: 10496->10880 +384 (3.65%)
class 57: pages: 7 size: 14080->14336 +256 (1.81%)
class 59: pages: 9 size: 16640->18432 +1792 (10.76%)
class 60: pages: 7 size: 17664->19072 +1408 (7.97%)
class 62: pages: 8 size: 21248->21760 +512 (2.40%)
class 64: pages: 10 size: 24832->27264 +2432 (9.79%)
class 65: pages: 7 size: 28416->28672 +256 (0.90%)
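
For example, the arithmetic for class 51 above (a hypothetical helper;
8KB pages, and the 128-byte size alignment is an assumption made for
illustration):

    func maxClassSize(pages, objects, align int) int {
            avail := pages * 8192 / objects // bytes available per object
            return avail &^ (align - 1)     // round size down to alignment
    }

    // maxClassSize(7, 6, 128) == 9472: still 6 objects in 7 pages, so
    // class 51 can grow from 8448 to 9472 for free.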

name                      old time/op    new time/op    delta
BinaryTree17-12              2.59s ± 5%     2.52s ± 4%    ~     (p=0.132 n=6+6)
Fannkuch11-12                2.13s ± 3%     2.17s ± 3%    ~     (p=0.180 n=6+6)
FmtFprintfEmpty-12          47.0ns ± 3%    46.6ns ± 1%    ~     (p=0.355 n=6+5)
FmtFprintfString-12          131ns ± 0%     131ns ± 1%    ~     (p=0.476 n=4+6)
FmtFprintfInt-12             121ns ± 6%     122ns ± 2%    ~     (p=0.511 n=6+6)
FmtFprintfIntInt-12          182ns ± 2%     186ns ± 1%  +2.20%  (p=0.015 n=6+6)
FmtFprintfPrefixedInt-12     184ns ± 5%     181ns ± 2%    ~     (p=0.645 n=6+6)
FmtFprintfFloat-12           272ns ± 7%     265ns ± 1%    ~     (p=1.000 n=6+5)
FmtManyArgs-12               783ns ± 2%     802ns ± 2%  +2.38%  (p=0.017 n=6+6)
GobDecode-12                7.04ms ± 4%    7.00ms ± 2%    ~     (p=1.000 n=6+6)
GobEncode-12                6.36ms ± 6%    6.17ms ± 6%    ~     (p=0.240 n=6+6)
Gzip-12                      242ms ±14%     233ms ± 7%    ~     (p=0.310 n=6+6)
Gunzip-12                   36.6ms ±22%    36.0ms ± 9%    ~     (p=0.841 n=5+5)
HTTPClientServer-12         93.1µs ±29%    88.0µs ±32%    ~     (p=0.240 n=6+6)
JSONEncode-12               27.1ms ±39%    26.2ms ±35%    ~     (p=0.589 n=6+6)
JSONDecode-12               71.7ms ±36%    71.5ms ±36%    ~     (p=0.937 n=6+6)
Mandelbrot200-12            4.78ms ±10%    4.70ms ±16%    ~     (p=0.394 n=6+6)
GoParse-12                  4.86ms ±34%    4.95ms ±36%    ~     (p=1.000 n=6+6)
RegexpMatchEasy0_32-12       110ns ±37%     110ns ±36%    ~     (p=0.660 n=6+6)
RegexpMatchEasy0_1K-12       240ns ±38%     234ns ±47%    ~     (p=0.554 n=6+6)
RegexpMatchEasy1_32-12      77.2ns ± 2%    77.2ns ±10%    ~     (p=0.699 n=6+6)
RegexpMatchEasy1_1K-12       337ns ± 5%     331ns ± 4%    ~     (p=0.552 n=6+6)
RegexpMatchMedium_32-12      125ns ±13%     132ns ±26%    ~     (p=0.561 n=6+6)
RegexpMatchMedium_1K-12     35.9µs ± 3%    36.1µs ± 5%    ~     (p=0.818 n=6+6)
RegexpMatchHard_32-12       1.81µs ± 4%    1.82µs ± 5%    ~     (p=0.452 n=5+5)
RegexpMatchHard_1K-12       52.4µs ± 2%    54.4µs ± 3%  +3.84%  (p=0.002 n=6+6)
Revcomp-12                   401ms ± 2%     390ms ± 1%  -2.82%  (p=0.002 n=6+6)
Template-12                 54.5ms ± 3%    54.6ms ± 1%    ~     (p=0.589 n=6+6)
TimeParse-12                 294ns ± 1%     298ns ± 2%    ~     (p=0.160 n=6+6)
TimeFormat-12                323ns ± 4%     318ns ± 5%    ~     (p=0.297 n=6+6)

name                      old speed      new speed      delta
GobDecode-12               109MB/s ± 4%   110MB/s ± 2%    ~     (p=1.000 n=6+6)
GobEncode-12               121MB/s ± 6%   125MB/s ± 6%    ~     (p=0.240 n=6+6)
Gzip-12                   80.4MB/s ±12%  83.3MB/s ± 7%    ~     (p=0.310 n=6+6)
Gunzip-12                  495MB/s ±41%   541MB/s ± 9%    ~     (p=0.931 n=6+5)
JSONEncode-12             80.7MB/s ±39%  82.8MB/s ±34%    ~     (p=0.589 n=6+6)
JSONDecode-12             30.4MB/s ±40%  31.0MB/s ±37%    ~     (p=0.937 n=6+6)
GoParse-12                13.2MB/s ±33%  13.2MB/s ±35%    ~     (p=1.000 n=6+6)
RegexpMatchEasy0_32-12     321MB/s ±34%   326MB/s ±34%    ~     (p=0.699 n=6+6)
RegexpMatchEasy0_1K-12    4.49GB/s ±31%  4.74GB/s ±37%    ~     (p=0.589 n=6+6)
RegexpMatchEasy1_32-12     414MB/s ± 2%   415MB/s ± 9%    ~     (p=0.699 n=6+6)
RegexpMatchEasy1_1K-12    3.03GB/s ± 5%  3.09GB/s ± 4%    ~     (p=0.699 n=6+6)
RegexpMatchMedium_32-12   7.99MB/s ±12%  7.68MB/s ±22%    ~     (p=0.589 n=6+6)
RegexpMatchMedium_1K-12   28.5MB/s ± 3%  28.4MB/s ± 5%    ~     (p=0.818 n=6+6)
RegexpMatchHard_32-12     17.7MB/s ± 4%  17.0MB/s ±15%    ~     (p=0.351 n=5+6)
RegexpMatchHard_1K-12     19.6MB/s ± 2%  18.8MB/s ± 3%  -3.67%  (p=0.002 n=6+6)
Revcomp-12                 634MB/s ± 2%   653MB/s ± 1%  +2.89%  (p=0.002 n=6+6)
Template-12               35.6MB/s ± 3%  35.5MB/s ± 1%    ~     (p=0.615 n=6+6)

Change-Id: I465a47f74227f316e3abea231444f48c7a30ef85
Reviewed-on: https://go-review.googlesource.com/24493
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-08-19 21:24:28 +00:00
Austin Clements
3de7dbb191 runtime: fix check for vacuous page boundary rounding again
The previous fix for this, commit 336dad2a, had everything right in
the commit message, but reversed the test in the code. Fix the test in
the code.

This reversal effectively disabled the scavenger on large page systems
*except* in the rare cases where this code was originally wrong, which
is why it didn't obviously show up in testing.

Fixes #16644. Again. :(

Change-Id: I27cce4aea13de217197db4b628f17860f27ce83e
Reviewed-on: https://go-review.googlesource.com/27402
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-19 20:16:43 +00:00
Austin Clements
244efebe7f runtime: fix out of date comments
The transition from mark 1 to mark 2 no longer enqueues new root
marking jobs, but some of the comments still refer to this. Fix these
comments.

Change-Id: I3f98628dba32c5afe30495ab495da42b32291e9e
Reviewed-on: https://go-review.googlesource.com/24965
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-08-19 18:15:54 +00:00
Josh Bleecher Snyder
604efe1281 runtime: disable TestCgoCallbackGC on FreeBSD
The trybot flakes are a nuisance.

Updates #16396

Change-Id: I8202adb554391676ba82bca44d784c6a81bf2085
Reviewed-on: https://go-review.googlesource.com/27313
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-18 17:13:39 +00:00
David Chase
5b9ff11c3d cmd/compile: ppc64le working, not optimized enough
This time with the cherry-pick from the proper patch of
the old CL.

Stack size increased.
Corrected NaN-comparison glitches.
Marked g register as clobbered by calls.
Fixed shared libraries.

live_ssa.go still disabled because of differences.
Presumably turning on more optimization will fix
both the stack size and the live_ssa.go glitches.

Enhanced debugging output for shared libs test.

Rebased onto master.

Updates #16010.

Change-Id: I40864faf1ef32c118fb141b7ef8e854498e6b2c4
Reviewed-on: https://go-review.googlesource.com/27159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-18 16:34:47 +00:00
Jaana Burcu Dogan
c2322b7ea6 runtime: fix the absolute URL to pprof tools
Change-Id: I82eaf5c14a5b8b9ec088409f946adf7b5fd5dbe3
Reviewed-on: https://go-review.googlesource.com/27311
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-17 23:22:53 +00:00
Austin Clements
336dad2a07 runtime: fix check for vacuous page boundary rounding
sysUnused (e.g., madvise MADV_FREE) is only sensible to call on
physical page boundaries, so scavengelist rounds the bounds of the
region being released inward to the nearest physical page boundaries.
However, if the region is smaller than a physical page and neither the
start nor the end falls on a boundary, then rounding the start up to a page
boundary and the end down to a page boundary will result in end < start.
Currently, we only give up on the region if start == end, so if we
encounter end < start, we'll call madvise with a negative length and
the madvise will fail.

Issue #16644 gives a concrete example of this:

    start = 0x1285ac000
    end   = 0x1285ae000 (one 8K page)

This leads to the rounded values

    start = 0x1285b0000
    end   = 0x1285a0000

which leads to len = -65536.

Fix this by giving up on the region if end <= start, not just if
end == start.
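
A sketch of the fixed guard (simplified; the helper name is hypothetical
and stands in for the runtime's internals):

    func scavengeRange(start, end, physPageSize uintptr) {
            start = (start + physPageSize - 1) &^ (physPageSize - 1) // round up
            end &^= physPageSize - 1                                 // round down
            if end <= start { // was ==; end < start means a vacuous region
                    return
            }
            // sysUnused(unsafe.Pointer(start), end-start)
    }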

Fixes #16644.

Change-Id: I8300db492dbadc82ac1ad878318b36bcb7c39524
Reviewed-on: https://go-review.googlesource.com/27230
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-17 14:04:16 +00:00
Keith Randall
e492d9f018 runtime: fix map iterator concurrent map check
We should check whether there is a concurrent writer at the
start of every mapiternext, not just in mapaccessK (which is
only called during certain map growth situations).
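
The shape of the check (a sketch, simplified from the runtime's map code):

    if h.flags&hashWriting != 0 {
            throw("concurrent map iteration and map write")
    }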

Tests turned off by default because they are inherently flaky.

Fixes #16278

Change-Id: I8b72cab1b8c59d1923bec6fa3eabc932e4e91542
Reviewed-on: https://go-review.googlesource.com/24749
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-08-16 21:52:44 +00:00
Josh Bleecher Snyder
562d06fc23 cmd/compile: inline _, ok = i.(T)
We already inlined

_, ok = e.(T)
_, ok = i.(E)
_, ok = e.(E)

The only ok-only variants not inlined are now

_, ok = i.(I)
_, ok = e.(I)

These call getitab, so are non-trivial.
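
The distinction, as plain Go:

    func example(i interface{}) (bool, bool) {
            _, okConcrete := i.(int)    // _, ok = i.(T): inlined compare
            _, okInterface := i.(error) // _, ok = i.(I): still calls getitab
            return okConcrete, okInterface
    }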

Change-Id: Ie45fd8933ee179a679b92ce925079b94cff0ee12
Reviewed-on: https://go-review.googlesource.com/26658
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-16 15:24:33 +00:00
Josh Bleecher Snyder
6f74c0774c runtime: move printing of extra newline
No functional changes, makes vet happy.

Updates #11041

Change-Id: I59f3aba46d19b86d605508978652d76a1fe7ac7b
Reviewed-on: https://go-review.googlesource.com/27125
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-16 14:37:17 +00:00
Keith Randall
88c8b7c7f9 Merge remote-tracking branch 'origin/dev.ssa' into merge
Merging from dev.ssa back into master.

Contains complete SSA backends for arm, arm64, 386, amd64p32.
Work in progress for PPC64.

Change-Id: Ifd7075e3ec6f88f776e29f8c7fd55830328897fd
2016-08-15 17:07:16 -07:00
Keith Randall
c069bc4996 [dev.ssa] cmd/compile: implement GO386=387
Last part of the 386 SSA port.

Modify the x86 backend to simulate SSE registers and
instructions with 387 registers and instructions.
The simulation isn't terribly performant, but it works,
and the old implementation wasn't very performant either.
Leaving it to people who care about 387 to optimize if they want.

Turn on SSA backend for 386 by default.

Fixes #16358

Change-Id: I678fb59132620b2c47e993c1c10c4c21135f70c0
Reviewed-on: https://go-review.googlesource.com/25271
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-10 17:41:01 +00:00
Shenghou Ma
26015b9563 runtime: make stack 16-byte aligned for external code in _rt0_amd64_linux_lib
Fixes #16618.

Change-Id: Iffada12e8672bbdbcf2e787782c497e2c45701b1
Reviewed-on: https://go-review.googlesource.com/25550
Run-TryBot: Minux Ma <minux@golang.org>
Reviewed-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-05 23:56:07 +00:00
Shenghou Ma
9fde86b012 runtime, syscall: fix kernel gettimeofday ABI change on iOS 10
Fixes #16570 on iOS.

Thanks Daniel Burhans for reporting the bug and testing the fix.

Change-Id: I43ae7b78c8f85a131ed3d93ea59da9f32a02cd8f
Reviewed-on: https://go-review.googlesource.com/25481
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-05 20:47:34 +00:00
Keith Randall
01dbfb81a0 [dev.ssa] Merge commit 'f135c326402aaa757aa96aad283a91873d4ae124' into mergebranch
Pick up shared library fix in dev.ssa.

Change-Id: I5bdd0e9e0f1d6f7c14b518343ee323ed9a894b9c
2016-08-04 10:52:24 -07:00
David Crawshaw
f135c32640 runtime: initialize hash algs before typemap
When compiling with -buildmode=shared, a map[int32]*_type is created for
each extra module mapping duplicate types back to a canonical object.
This is done in the function typelinksinit, which is called before the
init function that sets up the hash functions for the map
implementation. The result is that typemap becomes unusable after
runtime initialization.

The fix in this CL is to move algorithm init before typelinksinit in
the runtime setup process. (For 1.8, we may want to turn typemap into
a sorted slice of types and use binary search.)
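
Sketch of the corrected ordering in runtime.schedinit (simplified):

    alginit()       // maps must not be used before this call
    typelinksinit() // uses maps when buildmode=shared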

Manually tested on GOOS=linux with:

	GOHOSTARCH=386 GOARCH=386 ./make.bash && \
		go install -buildmode=shared std && \
		cd ../test && \
		go run run.go -linkshared

Fixes #16590

Change-Id: Idc08c50cc70d20028276fbf564509d2cd5405210
Reviewed-on: https://go-review.googlesource.com/25469
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-04 17:39:05 +00:00
Keith Randall
d2286ea284 [dev.ssa] Merge remote-tracking branch 'origin/master' into mergebranch
Semi-regular merge from tip into dev.ssa.

Change-Id: Iadb60e594ef65a99c0e1404b14205fa67c32a9e9
2016-08-04 10:08:20 -07:00
Brad Fitzpatrick
2da5633eb9 runtime: fix nanotime for macOS Sierra, again.
macOS Sierra beta4 changed the kernel interface for getting time.
DX now optionally points to an address for additional info.
Set it to zero to avoid corrupting memory.

Fixes #16570

Change-Id: I9f537e552682045325cdbb68b7d0b4ddafade14a
Reviewed-on: https://go-review.googlesource.com/25400
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Quentin Smith <quentin@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-02 20:17:50 +00:00
Rhys Hiltner
ccca9c9cc0 runtime: reduce GC assist extra credit
Mutator goroutines that allocate memory during the concurrent mark
phase are required to spend some time assisting the garbage
collector. The magnitude of this mandatory assistance is proportional
to the goroutine's allocation debt and subject to the assistance
ratio as calculated by the pacer.

When assisting the garbage collector, a mutator goroutine will go
beyond paying off its allocation debt. It will build up extra credit
to amortize the overhead of the assist.

In fast-allocating applications with high assist ratios, building up
this credit can take the affected goroutine's entire time slice.
Reduce the penalty on each goroutine being selected to assist the GC
in two ways, to spread the responsibility more evenly.

First, do a consistent amount of extra scan work without regard for
the pacer's assistance ratio. Second, reduce the magnitude of the
extra scan work so it can be completed within a few hundred
microseconds.
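
Concretely, the over-assist becomes a fixed constant rather than a
ratio-scaled amount (the value shown is as I understand the CL; treat it
as illustrative):

    const gcOverAssistWork = 64 << 10 // ~64KB of extra scan work per assist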

Commentary on gcOverAssistWork is by Austin Clements, originally in
https://golang.org/cl/24704

Updates #14812
Fixes #16432

Change-Id: I436f899e778c20daa314f3e9f0e2a1bbd53b43e1
Reviewed-on: https://go-review.googlesource.com/25155
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Chris Broadfoot <cbro@golang.org>
2016-07-27 18:56:04 +00:00
Austin Clements
b11fff3886 runtime/pprof: document use of pprof package
Currently the pprof package gives almost no guidance for how to use it
and, despite the standard boilerplate used to create CPU and memory
profiles, this boilerplate appears nowhere in the pprof documentation.

Update the pprof package documentation to give the standard
boilerplate in a form people can copy, paste, and tweak. This
boilerplate is based on rsc's 2011 blog post on profiling Go programs
at https://blog.golang.org/profiling-go-programs, which is where I
always go when I need to copy-paste the boilerplate.
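
That boilerplate has roughly this shape (one way to write it):

    package main

    import (
            "flag"
            "log"
            "os"
            "runtime/pprof"
    )

    var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")

    func main() {
            flag.Parse()
            if *cpuprofile != "" {
                    f, err := os.Create(*cpuprofile)
                    if err != nil {
                            log.Fatal(err)
                    }
                    pprof.StartCPUProfile(f)
                    defer pprof.StopCPUProfile()
            }
            // ... the program's actual work ...
    }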

Change-Id: I74021e494ea4dcc6b56d6fb5e59829ad4bb7b0be
Reviewed-on: https://go-review.googlesource.com/25182
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-07-26 22:16:55 +00:00
Keith Randall
df2f813bd2 [dev.ssa] cmd/compile: 386 port now works
GOARCH=386 SSATEST=1 ./all.bash passes

Caveat: still needs changes to test/ files to use *_ssa.go versions.  I
won't check those changes in with this CL because the builders will
complain as they don't have SSATEST=1.

Mostly minor fixes.

Implement float <-> uint32 in assembly.  It seems the simplest option
for now.

GO386=387 does not work.  That's why I can't make SSA the default for
386 yet.

Change-Id: Ic4d4402104d32bcfb1fd612f5bb6539f9acb8ae0
Reviewed-on: https://go-review.googlesource.com/25119
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-07-21 20:41:18 +00:00
Ian Lance Taylor
ff227b8a56 runtime: add explicit INT $3 at end of Darwin amd64 sigtramp
The omission of this instruction could confuse the traceback code if a
SIGPROF occurred during a signal handler.  The traceback code would
trace up to sigtramp, but would then get confused because it would see a
PC address that did not appear to be in the function.

Fixes #16453.

Change-Id: I2b3d53e0b272fb01d9c2cb8add22bad879d3eebc
Reviewed-on: https://go-review.googlesource.com/25104
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-07-21 01:04:22 +00:00
Austin Clements
f407ca9288 runtime: support smaller physical pages than PhysPageSize
Most operations need an upper bound on the physical page size, which
is what sys.PhysPageSize is for (this is checked at runtime init on
Linux). However, a few operations need a *lower* bound on the physical
page size. Introduce a "minPhysPageSize" constant to act as this lower
bound and use it where it makes sense:

1) In addrspace_free, we have to query each page in the given range.
   Currently we increment by the upper bound on the physical page
   size, which means we may skip over pages if the true size is
   smaller. Worse, we currently pass a result buffer that only has
   enough room for one page. If there are actually multiple pages in
   the range passed to mincore, the kernel will overflow this buffer.
   Fix these problems by incrementing by the lower-bound on the
   physical page size and by passing "1" for the length, which the
   kernel will round up to the true physical page size (see the sketch
   after this list).

2) In the write barrier, the bad pointer check tests for pointers to
   the first physical page, which are presumably small integers
   masquerading as pointers. However, if physical pages are smaller
   than we think, we may have legitimate pointers below
   sys.PhysPageSize. Hence, use minPhysPageSize for this test since
   pointers should never fall below that.
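
A sketch of the fixed probing loop from (1) (simplified, with error
handling elided; names as in the Linux runtime code of the time):

    for off := uintptr(0); off < n; off += minPhysPageSize {
            // Pass a length of 1: the kernel rounds it up to the true
            // physical page size, so the one-entry result buffer cannot
            // overflow and no page in the range is skipped.
            errval := mincore(unsafe.Pointer(uintptr(v)+off), 1, &addrspace_vec[0])
            if errval != -_ENOMEM {
                    return false // some page in the range appears mapped
            }
    }
    return true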

In particular, this applies to ARM64 and MIPS. The runtime is
configured to use 64kB pages on ARM64, but by default Linux uses 4kB
pages. Similarly, the runtime assumes 16kB pages on MIPS, but both 4kB
and 16kB kernel configurations are common. This also applies to ARM on
systems where the runtime is recompiled to deal with a larger page
size. It is also a step toward making the runtime use only a
dynamically-queried page size.

Change-Id: I1fdfd18f6e7cbca170cc100354b9faa22fde8a69
Reviewed-on: https://go-review.googlesource.com/25020
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2016-07-20 18:28:43 +00:00
Cherry Zhang
7b9873b9b9 [dev.ssa] cmd/internal/obj, etc.: add and use NEGF, NEGD instructions on ARM
Updates #15365.

Change-Id: I372a5617c2c7d91de545cac0464809b96711b63a
Reviewed-on: https://go-review.googlesource.com/24646
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2016-07-20 18:15:37 +00:00
Dmitry Vyukov
d73ca5f4d8 runtime/race: fix memory leak
The leak was reported internally on a server canary that runs for days.
After a day the server consumes 5.6GB; after 6 days, 12.2GB.
The leak is exposed by the added benchmark.
The leak is fixed upstream in:
http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/tsan/rtl/tsan_rtl_thread.cc?view=diff&r1=276102&r2=276103&pathrev=276103

Fixes #16441

Change-Id: I9d4f0adef48ca6cf2cd781b9a6990ad4661ba49b
Reviewed-on: https://go-review.googlesource.com/25091
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
2016-07-20 14:17:44 +00:00
Ian Lance Taylor
50048a4e8e runtime: add as many extra M's as needed
When a non-Go thread calls into Go, the runtime needs an M to run the Go
code. The runtime keeps a list of extra M's available. When the last
extra M is allocated, the needextram field is set to tell it to allocate
a new extra M as soon as it is running in Go. This ensures that an extra
M will always be available for the next thread.

However, if many threads need an extra M at the same time, this
serializes them all. One thread will get an extra M with the needextram
field set. All the other threads will see that there is no M available
and will go to sleep. The one thread that succeeded will create a new
extra M. One lucky thread will get it. All the other threads will see
that there is no M available and will go to sleep. The effect is
thundering herd, as all the threads looking for an extra M go through
the process one by one. This seems to have a particularly bad effect on
the FreeBSD scheduler for some reason.

With this change, we track the number of threads waiting for an M, and
create all of them as soon as one thread gets through. This still means
that all the threads will fight for the lock to pick up the next M. But
at least each thread that gets the lock will succeed, instead of going
to sleep only to fight again.
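
Sketch of the new shape of newextram (simplified; see the CL for the
full logic):

    c := atomic.Xchg(&extraMWaiters, 0) // claim all recorded waiters
    for i := uint32(0); i < c; i++ {
            oneNewExtraM() // create one extra M per waiting thread
    }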

This smooths out the performance greatly on FreeBSD, reducing the
average wall time of `testprogcgo CgoCallbackGC` by 74%.  On GNU/Linux
the average wall time goes down by 9%.

Fixes #13926
Fixes #16396

Change-Id: I6dc42a4156085a7ed4e5334c60b39db8f8ef8fea
Reviewed-on: https://go-review.googlesource.com/25047
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-07-20 13:31:55 +00:00
Cherry Zhang
7d70f84f54 [dev.ssa] cmd/compile: add floating point optimizations in SSA for ARM
Add some simplification rules for floating point ops.

cmd/internal/obj/arm supports instructions that compare FP register
to 0, but runtime softfloat simulator does not. This CL adds these
instructions to softfloat simulator as well.

Updates #15365.

Change-Id: I29405b2bfcb4c8cf106cb7a1a811409fec91b170
Reviewed-on: https://go-review.googlesource.com/24790
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-16 03:13:22 +00:00
Josh Bleecher Snyder
4054769a31 runtime/internal/atomic: fix assembly arg sizes
Change-Id: I80ccf40cd3930aff908ee64f6dcbe5f5255198d3
Reviewed-on: https://go-review.googlesource.com/24914
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-07-14 16:35:37 +00:00
Ian Lance Taylor
29ed5da5f2 runtime/pprof: don't print extraneous 0 after goexit
This fixes erroneous handling of the more result parameter of
runtime.Frames.Next.

Fixes #16349.

Change-Id: I4f1c0263dafbb883294b31dbb8922b9d3e650200
Reviewed-on: https://go-review.googlesource.com/24911
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-07-13 21:18:19 +00:00
Keith Randall
efefd11725 [dev.ssa] Merge remote-tracking branch 'origin/master' into mergebranch
Semi-regular merge of tip into dev.ssa.

Change-Id: I855817c4746237792a2dab6eaf471087a3646be4
2016-07-13 11:12:44 -07:00
Ian Lance Taylor
b30814bbd6 runtime: add ctxt parameter to cgocallback called from Go
The cgocallback function picked up a ctxt parameter in CL 22508.
That CL updated the assembler implementation, but there are a few
mentions in Go code that were not updated. This CL fixes that.

Fixes #16326

Change-Id: I5f68e23565c6a0b11057aff476d13990bff54a66
Reviewed-on: https://go-review.googlesource.com/24848
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-07-12 16:39:00 +00:00
Ian Lance Taylor
12f2b4ff0e runtime: fix case in KeepAlive comment
Fixes #16299.

Change-Id: I76f541c7f11edb625df566f2f1035147b8bcd9dd
Reviewed-on: https://go-review.googlesource.com/24830
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-07-08 16:50:26 +00:00
Ian Lance Taylor
fad2bbdc6a runtime: fix nanotime for macOS Sierra
In the beta version of the macOS Sierra (10.12) release, the
gettimeofday system call changed on x86. Previously it always returned
the time in the AX/DX registers. Now, if AX is returned as 0, it means
that the system call has stored the values into the memory pointed to by
the first argument, just as the libc gettimeofday function does. The
libc function handles both cases, and we need to do so as well.
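
In pseudocode (the actual fix is in sys_darwin_amd64.s assembly; the
names here are illustrative):

    // sec, usec := rawGettimeofday(&tv) // results normally in AX/DX
    // if sec == 0 {
    //         // Sierra: the kernel stored the result through the
    //         // pointer argument instead, as libc's gettimeofday expects.
    //         sec, usec = tv.tv_sec, tv.tv_usec
    // }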

Fixes #16272.

Change-Id: Ibe5ad50a2c5b125e92b5a4e787db4b5179f6b723
Reviewed-on: https://go-review.googlesource.com/24812
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-07-08 03:17:18 +00:00
Ian Lance Taylor
84bb9e62f0 runtime: handle selects with duplicate channels in shrinkstack
The shrinkstack code locks all the channels a goroutine is waiting for,
but didn't handle the case of the same channel appearing in the list
multiple times. This led to a deadlock. The channels are sorted so it's
easy to avoid locking the same channel twice.
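
The de-duplication over the sorted wait list looks roughly like this
(simplified sketch):

    var lastc *hchan
    for sg := gp.waiting; sg != nil; sg = sg.waitlink {
            if sg.c != lastc {
                    lock(&sg.c.lock) // sorted order: duplicates are adjacent
            }
            lastc = sg.c
    }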

Fixes #16286.

Change-Id: Ie514805d0532f61c942e85af5b7b8ac405e2ff65
Reviewed-on: https://go-review.googlesource.com/24815
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-07-08 02:05:40 +00:00
Austin Clements
9c8809f82a runtime/internal/sys: implement Ctz and Bswap in assembly for 386
Ctz is a hot-spot in the Go 1.7 memory manager. In SSA it's
implemented as an intrinsic that compiles to a few instructions, but
on the old backend (all architectures other than amd64), it's
implemented as a fairly complex Go function. As a result, switching to
bitmap-based allocation was a significant hit to allocation-heavy
workloads like BinaryTree17 on non-SSA platforms.
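
For reference, the semantics being implemented, as a naive pure-Go loop
(the real fallback is table-based; this just pins down the contract):

    func ctz64(x uint64) int {
            if x == 0 {
                    return 64 // all 64 bits are zero
            }
            n := 0
            for x&1 == 0 {
                    x >>= 1
                    n++
            }
            return n
    }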

For unknown reasons, this hit 386 particularly hard. We can regain a
lot of the lost performance by implementing Ctz in assembly on the
386. This isn't as good as an intrinsic, since it still generates a
function call and prevents useful inlining, but it's much better than
the pure Go implementation:

name                      old time/op    new time/op    delta
BinaryTree17-12              3.59s ± 1%     3.06s ± 1%  -14.74%  (p=0.000 n=19+20)
Fannkuch11-12                3.72s ± 1%     3.64s ± 1%   -2.09%  (p=0.000 n=17+19)
FmtFprintfEmpty-12          52.3ns ± 3%    52.3ns ± 3%     ~     (p=0.829 n=20+19)
FmtFprintfString-12          156ns ± 1%     148ns ± 3%   -5.20%  (p=0.000 n=18+19)
FmtFprintfInt-12             137ns ± 1%     136ns ± 1%   -0.56%  (p=0.000 n=19+13)
FmtFprintfIntInt-12          227ns ± 2%     225ns ± 2%   -0.93%  (p=0.000 n=19+17)
FmtFprintfPrefixedInt-12     210ns ± 1%     208ns ± 1%   -0.91%  (p=0.000 n=19+17)
FmtFprintfFloat-12           375ns ± 1%     371ns ± 1%   -1.06%  (p=0.000 n=19+18)
FmtManyArgs-12               995ns ± 2%     978ns ± 1%   -1.63%  (p=0.000 n=17+17)
GobDecode-12                9.33ms ± 1%    9.19ms ± 0%   -1.59%  (p=0.000 n=20+17)
GobEncode-12                7.73ms ± 1%    7.73ms ± 1%     ~     (p=0.771 n=19+20)
Gzip-12                      375ms ± 1%     374ms ± 1%     ~     (p=0.141 n=20+18)
Gunzip-12                   61.8ms ± 1%    61.8ms ± 1%     ~     (p=0.602 n=20+20)
HTTPClientServer-12         87.7µs ± 2%    86.9µs ± 3%   -0.87%  (p=0.024 n=19+20)
JSONEncode-12               20.2ms ± 1%    20.4ms ± 0%   +0.53%  (p=0.000 n=18+19)
JSONDecode-12               65.3ms ± 0%    65.4ms ± 1%     ~     (p=0.385 n=16+19)
Mandelbrot200-12            4.11ms ± 1%    4.12ms ± 0%   +0.29%  (p=0.020 n=19+19)
GoParse-12                  3.75ms ± 1%    3.61ms ± 2%   -3.90%  (p=0.000 n=20+20)
RegexpMatchEasy0_32-12       104ns ± 0%     103ns ± 0%   -0.96%  (p=0.000 n=13+16)
RegexpMatchEasy0_1K-12       805ns ± 1%     803ns ± 1%     ~     (p=0.189 n=18+18)
RegexpMatchEasy1_32-12       111ns ± 0%     111ns ± 3%     ~     (p=1.000 n=14+19)
RegexpMatchEasy1_1K-12      1.00µs ± 1%    1.00µs ± 1%   +0.50%  (p=0.003 n=19+19)
RegexpMatchMedium_32-12      133ns ± 2%     133ns ± 2%     ~     (p=0.218 n=20+20)
RegexpMatchMedium_1K-12     41.2µs ± 1%    42.2µs ± 1%   +2.52%  (p=0.000 n=18+16)
RegexpMatchHard_32-12       2.35µs ± 1%    2.38µs ± 1%   +1.53%  (p=0.000 n=18+18)
RegexpMatchHard_1K-12       70.9µs ± 2%    72.0µs ± 1%   +1.42%  (p=0.000 n=19+17)
Revcomp-12                   1.06s ± 0%     1.05s ± 0%   -1.36%  (p=0.000 n=20+18)
Template-12                 86.2ms ± 1%    84.6ms ± 0%   -1.89%  (p=0.000 n=20+18)
TimeParse-12                 425ns ± 2%     428ns ± 1%   +0.77%  (p=0.000 n=18+19)
TimeFormat-12                517ns ± 1%     519ns ± 1%   +0.43%  (p=0.001 n=20+19)
[Geo mean]                  74.3µs         73.5µs        -1.05%

Prior to this commit, BinaryTree17-12 on 386 was 33% slower than at
the go1.6 tag. With this commit, it's 13% slower.

On arm and arm64, BinaryTree17-12 is only ~5% slower than it was at
go1.6. It may be worth implementing Ctz for them as well.

I consider this change low risk, since the functions it replaces are
simple, very well specified, and well tested.

For #16117.

Change-Id: Ic39d851d5aca91330134596effd2dab9689ba066
Reviewed-on: https://go-review.googlesource.com/24640
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-06-30 19:35:44 +00:00
Dmitry Vyukov
bb337372fb runtime: fix race atomic operations on external memory
The assembly is broken: it does `MOVQ g(R12), R14` expecting that
R12 contains the TLS address, but it does not do get_tls(R12) first.
This magically works on linux: `MOVQ g(R12), R14` is compiled to
`mov %fs:0xfffffffffffffff8,%r14` which does not use R12.
But it crashes on windows.

Add explicit `get_tls(R12)`.

Fixes #16206

Change-Id: Ic1f21a6fef2473bcf9147de6646929781c9c1e98
Reviewed-on: https://go-review.googlesource.com/24590
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-06-29 15:30:54 +00:00
Ian Lance Taylor
25a609556a runtime: correct printing of blocked field in scheduler trace
When the blocked field was first introduced back in
https://golang.org/cl/61250043 the scheduler trace code incorrectly used
m->blocked instead of mp->blocked.  That has carried through the
conversion to Go.  This CL fixes it.

Change-Id: Id81907b625221895aa5c85b9853f7c185efd8f4b
Reviewed-on: https://go-review.googlesource.com/24571
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-06-29 01:38:39 +00:00