1
0
mirror of https://github.com/golang/go synced 2024-10-02 08:28:36 -06:00
Commit Graph

2654 Commits

Author SHA1 Message Date
Austin Clements
0c0c94a9dc runtime/pprof: fix period information
The period recorded in CPU profiles is in nanoseconds, but was being
computed incorrectly as hz * 1000. As a result, many absolute times
displayed by pprof were incorrect.

Fix this by computing the period correctly.

Change-Id: I6fadd6d8ad3e57f31e8cc7a25a24fcaec510d8d4
Reviewed-on: https://go-review.googlesource.com/40995
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-04-20 19:35:08 +00:00
Josh Bleecher Snyder
01b1a34aac cmd/compile: rework handling of udiv on ARM
Instead of populating the aux symbol
of CALLudiv during rewrite rules,
populate it during genssa.

This simplifies the rewrite rules.
It also removes all remaining calls
to ctxt.Lookup from any rewrite rules.
This is a first step towards removing
ctxt from ssa.Cache entirely,
and also a first step towards converting
the obj.LSym.Version field into a boolean.
It should also speed up compilation.

Also, move func udiv into package runtime.
That's where it is anyway,
and it lets udiv look and act like the rest of
the runtime support functions.

Change-Id: I41462a632c14fdc41f61b08049ec13cd80a87bfe
Reviewed-on: https://go-review.googlesource.com/41191
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-20 16:27:38 +00:00
Daniel Martí
ff7994ac10 all: remove redundant returns
Returns at the end of func bodies where the funcs have no return values
are pointless.

Change-Id: I0da5ea78671503e41a9f56dd770df8c919310ce5
Reviewed-on: https://go-review.googlesource.com/41093
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-19 20:03:51 +00:00
Austin Clements
22000f5407 runtime: record swept and reclaimed bytes in sweep trace
This extends the GCSweepDone event with counts of swept and reclaimed
bytes. These are useful for understanding the duration and
effectiveness of sweep events.

Change-Id: I3c97a4f0f3aad3adbd188adb264859775f54e2df
Reviewed-on: https://go-review.googlesource.com/40811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2017-04-19 18:31:14 +00:00
Austin Clements
79c56addb6 runtime: make sweep trace events encompass entire sweep loop
Currently, each individual span sweep emits a span to the trace. But
sweeps are generally done in loops until some condition is satisfied,
so this tracing is lower-level than anyone really wants any hides the
fact that no other work is being accomplished between adjacent sweep
events. This is also high overhead: enabling tracing significantly
impacts sweep latency.

Replace this with instead tracing around the sweep loops used for
allocation. This is slightly tricky because sweep loops don't
generally know if any sweeping will happen in them. Hence, we make the
tracing lazy by recording in the P that we would like to start tracing
the sweep *if* one happens, and then only closing the sweep event if
we started it.

This does mean we don't get tracing on every sweep path, which are
legion. However, we get much more informative tracing on the paths
that block allocation, which are the paths that matter.

Change-Id: I73e14fbb250acb0c9d92e3648bddaa5e7d7e271c
Reviewed-on: https://go-review.googlesource.com/40810
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-19 18:31:11 +00:00
Michael Munday
fb28f5ba3a runtime: avoid restricting GOARCH values in documentation
Changes the text to match GOOS which appends 'and so on' at the
end to avoid restricting the set of possible values.

Change-Id: I54bcde71334202cf701662cdc2582c974ba8bf53
Reviewed-on: https://go-review.googlesource.com/41074
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-19 18:19:08 +00:00
Josh Bleecher Snyder
94e44a9c8e runtime: preallocate some overflow buckets
When allocating a non-small array of buckets for a map,
also preallocate some overflow buckets.

The estimate of the number of overflow buckets
is based on a simulation of putting mid=(low+high)/2 elements
into a map, where low is the minimum number of elements
needed to reach this value of b (according to overLoadFactor),
and high is the maximum number of elements possible
to put in this value of b (according to overLoadFactor).
This estimate is surprisingly reliable and accurate.

The number of overflow buckets needed is quadratic,
for a fixed value of b.
Using this mid estimate means that we will overallocate a few
too many overflow buckets when the actual number of elements is near low,
and underallocate significantly too few overflow buckets
when the actual number of elements is near high.

The mechanism introduced in this CL can be re-used for
other overflow bucket optimizations.

For example, given an initial size hint,
we could estimate quite precisely the number of overflow buckets.
This is #19931.

We could also change from "non-nil means end-of-list"
to "pointer-to-hmap.buckets means end-of-list",
and then create a linked list of reusable overflow buckets
when they are freed by map growth.
That is #19992.

We could also use a similar mechanism to do bulk allocation
of overflow buckets.
All these uses can co-exist with only the one additional pointer
in mapextra, given a little care.

name                  old time/op    new time/op    delta
MapPopulate/1-8         60.1ns ± 2%    60.3ns ± 2%     ~     (p=0.278 n=19+20)
MapPopulate/10-8         577ns ± 1%     578ns ± 1%     ~     (p=0.140 n=20+20)
MapPopulate/100-8       8.06µs ± 1%    8.19µs ± 1%   +1.67%  (p=0.000 n=20+20)
MapPopulate/1000-8       104µs ± 1%     104µs ± 1%     ~     (p=0.317 n=20+20)
MapPopulate/10000-8      891µs ± 1%     888µs ± 1%     ~     (p=0.101 n=19+20)
MapPopulate/100000-8    8.61ms ± 1%    8.58ms ± 0%   -0.34%  (p=0.009 n=20+17)

name                  old alloc/op   new alloc/op   delta
MapPopulate/1-8          0.00B          0.00B          ~     (all equal)
MapPopulate/10-8          179B ± 0%      179B ± 0%     ~     (all equal)
MapPopulate/100-8       3.33kB ± 0%    3.38kB ± 0%   +1.48%  (p=0.000 n=20+16)
MapPopulate/1000-8      55.5kB ± 0%    53.4kB ± 0%   -3.84%  (p=0.000 n=19+20)
MapPopulate/10000-8      432kB ± 0%     428kB ± 0%   -1.06%  (p=0.000 n=19+20)
MapPopulate/100000-8    3.65MB ± 0%    3.62MB ± 0%   -0.70%  (p=0.000 n=20+20)

name                  old allocs/op  new allocs/op  delta
MapPopulate/1-8           0.00           0.00          ~     (all equal)
MapPopulate/10-8          1.00 ± 0%      1.00 ± 0%     ~     (all equal)
MapPopulate/100-8         18.0 ± 0%      17.0 ± 0%   -5.56%  (p=0.000 n=20+20)
MapPopulate/1000-8        96.0 ± 0%      72.6 ± 1%  -24.38%  (p=0.000 n=20+20)
MapPopulate/10000-8        625 ± 0%       319 ± 0%  -48.86%  (p=0.000 n=20+20)
MapPopulate/100000-8     6.23k ± 0%     4.00k ± 0%  -35.79%  (p=0.000 n=20+20)

Change-Id: I01f41cb1374bdb99ccedbc00d04fb9ae43daa204
Reviewed-on: https://go-review.googlesource.com/40979
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-19 13:47:28 +00:00
Josh Bleecher Snyder
2abd91e265 runtime: add a map growth benchmark
Updates #19931
Updates #19992

Change-Id: Ib2d4e6b9b89a49caa443310d896dce8d6db06050
Reviewed-on: https://go-review.googlesource.com/40978
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-19 13:47:21 +00:00
Josh Bleecher Snyder
17d497feaa runtime: add bmap.setoverflow
bmap already has a overflow (getter) method.
Add a setoverflow (setter) method, for readability.

Updates #19931
Updates #19992

Change-Id: I00b3d94037c0d75508a7ebd51085c5c3857fb764
Reviewed-on: https://go-review.googlesource.com/40977
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-19 13:43:01 +00:00
Josh Bleecher Snyder
a41b1d5052 runtime: convert hmap.overflow into hmap.extra
Any change to how we allocate overflow buckets
will require some extra hmap storage,
but we don't want hmap to grow,
particular as small maps usually don't need overflow buckets.

This CL converts the existing hmap overflow field,
which is usually used for pointer-free maps,
into a generic extra field.

This extra field can be used to hold data that is optional.
If it is valuable enough to do have special
handling of overflow buckets, which are medium-sized,
it is valuable enough to pay an extra alloc and two extra words for.

Adding fields to extra would entail adding overhead to pointer-free maps;
any mapextra fields added would need to be weighed against that.
This CL is just rearrangement, though.

Updates #19931
Updates #19992

Change-Id: If8537a206905b9d4dc6cd9d886184ece671b3f80
Reviewed-on: https://go-review.googlesource.com/40976
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-19 13:42:04 +00:00
Josh Bleecher Snyder
619af17205 runtime: refactor hmap setoverflow into newoverflow
This simplifies the code, as well as providing
a single place to modify to change the
allocation of new overflow buckets.

Updates #19931
Updates #19992

Change-Id: I77070619f5c8fe449bbc35278278bca5eda780f2
Reviewed-on: https://go-review.googlesource.com/40975
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-19 13:41:44 +00:00
David Lazar
17137fae2e runtime: fix TestCaller with -l=4
Only the noinline pragma on testCallerFoo is needed to pass the test,
but the second pragma makes the test robust to future changes to the
inliner.

Change-Id: I80b384380c598f52e0382f53b59bb47ff196363d
Reviewed-on: https://go-review.googlesource.com/40877
Run-TryBot: David Lazar <lazard@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-18 19:56:48 +00:00
David Lazar
7821be5951 runtime: make example independent of inlining
Otherwise, with -l=4, runtime.Callers gets inlined and the example
prints too many frames. Now the example passes with -l=4.

Change-Id: I9e420af9371724ac3ec89efafd76a658cf82bb4a
Reviewed-on: https://go-review.googlesource.com/40876
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-04-18 19:56:36 +00:00
David Lazar
0ea120a70c runtime: skip logical frames in runtime.Caller
This rewrites runtime.Caller in terms of stackExpander, which already
handles inlined frames and partially skipped frames. This also has the
effect of making runtime.Caller understand cgo frames if there is a cgo
symbolizer.

Updates #19348.

Change-Id: Icdf4df921aab5aa394d4d92e3becc4dd169c9a6e
Reviewed-on: https://go-review.googlesource.com/40270
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-04-18 19:56:30 +00:00
Austin Clements
38521004ed runtime: make internal CallersFrames-equivalent that doesn't escape PC slice
The Frames API forces the PC slice to escape to the heap because it
stores it in the Frames object. However, we'd like to use this API for
call stack expansion internally in the runtime in places where it
would be very good to avoid heap allocation.

This commit makes this possible by pulling the bulk of the Frames
implementation into an internal frameExpander API. The key difference
between these APIs is that the frameExpander does not hold the PC
slice; instead, the caller is responsible for threading the PC slice
through the frameExpander API calls. This makes it possible to keep
the PC slice on the stack. The Frames API then becomes a thin shim
around the frameExpander that keeps the PC slice in the Frames object.

Change-Id: If6b2d0b9132a2a905a0cf5deced9feddce76fc0e
Reviewed-on: https://go-review.googlesource.com/40610
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
2017-04-17 22:46:18 +00:00
Lynn Boger
7d4cca07d2 cmd/asm: detect invalid DS form offsets for ppc64x
While debugging a recent regression it was discovered that
the assembler for ppc64x was not always generating the correct
instruction for DS form loads and stores.  When an instruction
is DS form then the offset must be a multiple of 4, and if it
isn't then bits outside the offset field were being incorrectly
set resulting in unexpected and incorrect instructions.

This change adds a check to determine when the opcode is DS form
and then verifies that the offset is a multiple of 4 before
generating the instruction, otherwise logs an error.

This also changes a few asm files that were using unaligned offsets
for DS form loads and stores.  In the runtime package these were
instructions intended to cause a crash so using aligned or unaligned
offsets doesn't change that behavior.

Change-Id: Ie3a7e1e65dcc9933b54de7a46a054da8459cb56f
Reviewed-on: https://go-review.googlesource.com/40476
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2017-04-17 21:24:35 +00:00
David Lazar
3249cb0ab4 runtime/trace: iterate over frames instead of PCs
Now the runtime/trace tests pass with -l=4.

This also gets rid of the frames cache for multiple reasons:

1) The frames cache was used to avoid repeated calls to funcname and
funcline. Now these calls happen inside the CallersFrames iterator.

2) Maintaining a frames cache is harder: map[uintptr]traceFrame
doesn't work since each PC can map to multiple traceFrames.

3) It's not clear that the cache is important.

Change-Id: I2914ac0b3ba08e39b60149d99a98f9f532b35bbb
Reviewed-on: https://go-review.googlesource.com/40591
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-04-14 12:21:21 +00:00
David Lazar
a7276742e6 runtime/trace: better output when test fails
Change-Id: I108d15eb4cd25904bb76de4ed7548c039c69d1a3
Reviewed-on: https://go-review.googlesource.com/40590
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-04-14 12:21:02 +00:00
Austin Clements
051809e352 runtime: free workbufs during sweeping
This extends the sweeper to free workbufs back to the heap between GC
cycles, allowing this memory to be reused for GC'd allocations or
eventually returned to the OS.

This helps for applications that have high peak heap usage relative to
their regular heap usage (for example, a high-memory initialization
phase). Workbuf memory is roughly proportional to heap size and since
we currently never free workbufs, it's proportional to *peak* heap
size. By freeing workbufs, we can release and reuse this memory for
other purposes when the heap shrinks.

This is somewhat complicated because this costs ~1–2 µs per workbuf
span, so for large heaps it's too expensive to just do synchronously
after mark termination between starting the world and dropping the
worldsema. Hence, we do it asynchronously in the sweeper. This adds a
list of "free" workbuf spans that can be returned to the heap. GC
moves all workbuf spans to this list after mark termination and the
background sweeper drains this list back to the heap. If the sweeper
doesn't finish, that's fine, since getempty can directly reuse any
remaining spans to allocate more workbufs.

Performance impact is negligible. On the x/benchmarks, this reduces
GC-bytes-from-system by 6–11%.

Fixes #19325.

Change-Id: Icb92da2196f0c39ee984faf92d52f29fd9ded7a8
Reviewed-on: https://go-review.googlesource.com/38582
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:47 +00:00
Austin Clements
9cc883a466 runtime: allocate GC workbufs from manually-managed spans
Currently the runtime allocates workbufs from persistent memory, which
means they can never be freed.

Switch to allocating them from manually-managed heap spans. This
doesn't free them yet, but it puts us in a position to do so.

For #19325.

Change-Id: I94b2512a2f2bbbb456cd9347761b9412e80d2da9
Reviewed-on: https://go-review.googlesource.com/38581
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:44 +00:00
Austin Clements
42c1214762 runtime: eliminate write barriers from alloc/mark bitmaps
This introduces a new type, *gcBits, to use for alloc/mark bitmap
allocations instead of *uint8. This type is marked go:notinheap, so
uses of it correctly eliminate write barriers. Since we now have a
type, this also extracts some common operations to methods both for
convenience and to avoid (*uint8) casts at most use sites.

For #19325.

Change-Id: Id51f734fb2e96b8b7715caa348c8dcd4aef0696a
Reviewed-on: https://go-review.googlesource.com/38580
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:42 +00:00
Austin Clements
9d1b2f888e runtime: rename gcBits -> gcBitsArena
This clarifies that the gcBits type is actually an arena of gcBits and
will let us introduce a new gcBits type representing a single
mark/alloc bitmap allocated from the arena.

For #19325.

Change-Id: Idedf76d202d9174a17c61bcca9d5539e042e2445
Reviewed-on: https://go-review.googlesource.com/38579
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:40 +00:00
Austin Clements
dc0f0ab70f runtime: don't count manually-managed spans from heap_{inuse,sys}
Currently, manually-managed spans are included in memstats.heap_inuse
and memstats.heap_sys, but when we export these stats to the user, we
subtract out how much has been allocated for stack spans from both.
This works for now because stacks are the only manually-managed spans
we have.

However, we're about to use manually-managed spans for more things
that don't necessarily have obvious stats we can use to adjust the
user-presented numbers. Prepare for this by changing the accounting so
manually-managed spans don't count toward heap_inuse or heap_sys. This
makes these fields align with the fields presented to the user and
means we don't have to track more statistics just so we can adjust
these statistics.

For #19325.

Change-Id: I5cb35527fd65587ff23339276ba2c3969e2ad98f
Reviewed-on: https://go-review.googlesource.com/38577
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:38 +00:00
Austin Clements
407c56ae9f runtime: generalize {alloc,free}Stack to {alloc,free}Manual
We're going to start using manually-managed spans for GC workbufs, so
rename the allocate/free methods and pass in a pointer to the stats to
use instead of using the stack stats directly.

For #19325.

Change-Id: I37df0147ae5a8e1f3cb37d59c8e57a1fcc6f2980
Reviewed-on: https://go-review.googlesource.com/38576
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:35 +00:00
Austin Clements
ab9db51e1c runtime: rename mspan.stackfreelist -> manualFreeList
We're going to use this free list for other types of manually-managed
memory in the heap.

For #19325.

Change-Id: Ib7e682295133eabfddf3a84f44db43d937bfdd9c
Reviewed-on: https://go-review.googlesource.com/38575
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:33 +00:00
Austin Clements
8fbaa4f70b runtime: rename _MSpanStack -> _MSpanManual
We're about to generalize _MSpanStack to be used for other forms of
in-heap manual memory management in the runtime. This is an automated
rename of _MSpanStack to _MSpanManual plus some comment fix-ups.

For #19325.

Change-Id: I1e20a57bb3b87a0d324382f92a3e294ffc767395
Reviewed-on: https://go-review.googlesource.com/38574
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:30 +00:00
Wei Xiao
ab636b899c hash/crc32: optimize arm64 crc32 implementation
ARMv8 defines crc32 instruction.

Comparing to the original crc32 calculation, this patch makes use of
crc32 instructions to do crc32 calculation instead of the multiple
lookup table algorithms.

ARMv8 provides IEEE and Castagnoli polynomials for crc32 calculation
so that the perfomance of these two types of crc32 get significant
improved.

name                                        old time/op   new time/op    delta
CRC32/poly=IEEE/size=15/align=0-32            117ns ± 0%      38ns ± 0%   -67.44%
CRC32/poly=IEEE/size=15/align=1-32            117ns ± 0%      38ns ± 0%   -67.52%
CRC32/poly=IEEE/size=40/align=0-32            129ns ± 0%      41ns ± 0%   -68.37%
CRC32/poly=IEEE/size=40/align=1-32            129ns ± 0%      41ns ± 0%   -68.29%
CRC32/poly=IEEE/size=512/align=0-32           828ns ± 0%     246ns ± 0%   -70.29%
CRC32/poly=IEEE/size=512/align=1-32           828ns ± 0%     132ns ± 0%   -84.06%
CRC32/poly=IEEE/size=1kB/align=0-32          1.58µs ± 0%    0.46µs ± 0%   -70.98%
CRC32/poly=IEEE/size=1kB/align=1-32          1.58µs ± 0%    0.46µs ± 0%   -70.92%
CRC32/poly=IEEE/size=4kB/align=0-32          6.06µs ± 0%    1.74µs ± 0%   -71.27%
CRC32/poly=IEEE/size=4kB/align=1-32          6.10µs ± 0%    1.74µs ± 0%   -71.44%
CRC32/poly=IEEE/size=32kB/align=0-32         48.3µs ± 0%    13.7µs ± 0%   -71.61%
CRC32/poly=IEEE/size=32kB/align=1-32         48.3µs ± 0%    13.7µs ± 0%   -71.60%
CRC32/poly=Castagnoli/size=15/align=0-32      116ns ± 0%      38ns ± 0%   -67.07%
CRC32/poly=Castagnoli/size=15/align=1-32      116ns ± 0%      38ns ± 0%   -66.90%
CRC32/poly=Castagnoli/size=40/align=0-32      127ns ± 0%      40ns ± 0%   -68.11%
CRC32/poly=Castagnoli/size=40/align=1-32      127ns ± 0%      40ns ± 0%   -68.11%
CRC32/poly=Castagnoli/size=512/align=0-32     828ns ± 0%     132ns ± 0%   -84.06%
CRC32/poly=Castagnoli/size=512/align=1-32     827ns ± 0%     132ns ± 0%   -84.04%
CRC32/poly=Castagnoli/size=1kB/align=0-32    1.59µs ± 0%    0.22µs ± 0%   -85.89%
CRC32/poly=Castagnoli/size=1kB/align=1-32    1.58µs ± 0%    0.22µs ± 0%   -85.79%
CRC32/poly=Castagnoli/size=4kB/align=0-32    6.14µs ± 0%    0.77µs ± 0%   -87.40%
CRC32/poly=Castagnoli/size=4kB/align=1-32    6.06µs ± 0%    0.77µs ± 0%   -87.25%
CRC32/poly=Castagnoli/size=32kB/align=0-32   48.3µs ± 0%     5.9µs ± 0%   -87.71%
CRC32/poly=Castagnoli/size=32kB/align=1-32   48.4µs ± 0%     6.0µs ± 0%   -87.69%
CRC32/poly=Koopman/size=15/align=0-32         104ns ± 0%     104ns ± 0%    +0.00%
CRC32/poly=Koopman/size=15/align=1-32         104ns ± 0%     104ns ± 0%    +0.00%
CRC32/poly=Koopman/size=40/align=0-32         235ns ± 0%     235ns ± 0%    +0.00%
CRC32/poly=Koopman/size=40/align=1-32         235ns ± 0%     235ns ± 0%    +0.00%
CRC32/poly=Koopman/size=512/align=0-32       2.71µs ± 0%    2.71µs ± 0%    -0.07%
CRC32/poly=Koopman/size=512/align=1-32       2.71µs ± 0%    2.71µs ± 0%    -0.04%
CRC32/poly=Koopman/size=1kB/align=0-32       5.40µs ± 0%    5.39µs ± 0%    -0.06%
CRC32/poly=Koopman/size=1kB/align=1-32       5.40µs ± 0%    5.40µs ± 0%    +0.02%
CRC32/poly=Koopman/size=4kB/align=0-32       21.5µs ± 0%    21.5µs ± 0%    -0.16%
CRC32/poly=Koopman/size=4kB/align=1-32       21.5µs ± 0%    21.5µs ± 0%    -0.05%
CRC32/poly=Koopman/size=32kB/align=0-32       172µs ± 0%     172µs ± 0%    -0.07%
CRC32/poly=Koopman/size=32kB/align=1-32       172µs ± 0%     172µs ± 0%    -0.01%

name                                        old speed     new speed      delta
CRC32/poly=IEEE/size=15/align=0-32          128MB/s ± 0%   394MB/s ± 0%  +207.95%
CRC32/poly=IEEE/size=15/align=1-32          128MB/s ± 0%   394MB/s ± 0%  +208.09%
CRC32/poly=IEEE/size=40/align=0-32          310MB/s ± 0%   979MB/s ± 0%  +216.07%
CRC32/poly=IEEE/size=40/align=1-32          310MB/s ± 0%   979MB/s ± 0%  +216.16%
CRC32/poly=IEEE/size=512/align=0-32         618MB/s ± 0%  2074MB/s ± 0%  +235.72%
CRC32/poly=IEEE/size=512/align=1-32         618MB/s ± 0%  3852MB/s ± 0%  +523.55%
CRC32/poly=IEEE/size=1kB/align=0-32         646MB/s ± 0%  2225MB/s ± 0%  +244.57%
CRC32/poly=IEEE/size=1kB/align=1-32         647MB/s ± 0%  2225MB/s ± 0%  +243.87%
CRC32/poly=IEEE/size=4kB/align=0-32         676MB/s ± 0%  2352MB/s ± 0%  +248.02%
CRC32/poly=IEEE/size=4kB/align=1-32         672MB/s ± 0%  2352MB/s ± 0%  +250.15%
CRC32/poly=IEEE/size=32kB/align=0-32        678MB/s ± 0%  2387MB/s ± 0%  +252.17%
CRC32/poly=IEEE/size=32kB/align=1-32        678MB/s ± 0%  2388MB/s ± 0%  +252.11%
CRC32/poly=Castagnoli/size=15/align=0-32    129MB/s ± 0%   393MB/s ± 0%  +205.51%
CRC32/poly=Castagnoli/size=15/align=1-32    129MB/s ± 0%   390MB/s ± 0%  +203.41%
CRC32/poly=Castagnoli/size=40/align=0-32    314MB/s ± 0%   988MB/s ± 0%  +215.04%
CRC32/poly=Castagnoli/size=40/align=1-32    314MB/s ± 0%   987MB/s ± 0%  +214.68%
CRC32/poly=Castagnoli/size=512/align=0-32   618MB/s ± 0%  3860MB/s ± 0%  +524.32%
CRC32/poly=Castagnoli/size=512/align=1-32   619MB/s ± 0%  3859MB/s ± 0%  +523.66%
CRC32/poly=Castagnoli/size=1kB/align=0-32   645MB/s ± 0%  4568MB/s ± 0%  +608.56%
CRC32/poly=Castagnoli/size=1kB/align=1-32   650MB/s ± 0%  4567MB/s ± 0%  +602.94%
CRC32/poly=Castagnoli/size=4kB/align=0-32   667MB/s ± 0%  5297MB/s ± 0%  +693.81%
CRC32/poly=Castagnoli/size=4kB/align=1-32   676MB/s ± 0%  5297MB/s ± 0%  +684.00%
CRC32/poly=Castagnoli/size=32kB/align=0-32  678MB/s ± 0%  5519MB/s ± 0%  +713.83%
CRC32/poly=Castagnoli/size=32kB/align=1-32  677MB/s ± 0%  5497MB/s ± 0%  +712.04%
CRC32/poly=Koopman/size=15/align=0-32       143MB/s ± 0%   144MB/s ± 0%    +0.27%
CRC32/poly=Koopman/size=15/align=1-32       143MB/s ± 0%   144MB/s ± 0%    +0.33%
CRC32/poly=Koopman/size=40/align=0-32       169MB/s ± 0%   170MB/s ± 0%    +0.12%
CRC32/poly=Koopman/size=40/align=1-32       170MB/s ± 0%   170MB/s ± 0%    +0.08%
CRC32/poly=Koopman/size=512/align=0-32      189MB/s ± 0%   189MB/s ± 0%    +0.07%
CRC32/poly=Koopman/size=512/align=1-32      189MB/s ± 0%   189MB/s ± 0%    +0.04%
CRC32/poly=Koopman/size=1kB/align=0-32      190MB/s ± 0%   190MB/s ± 0%    +0.05%
CRC32/poly=Koopman/size=1kB/align=1-32      190MB/s ± 0%   190MB/s ± 0%    -0.01%
CRC32/poly=Koopman/size=4kB/align=0-32      190MB/s ± 0%   190MB/s ± 0%    +0.15%
CRC32/poly=Koopman/size=4kB/align=1-32      190MB/s ± 0%   191MB/s ± 0%    +0.05%
CRC32/poly=Koopman/size=32kB/align=0-32     191MB/s ± 0%   191MB/s ± 0%    +0.06%
CRC32/poly=Koopman/size=32kB/align=1-32     191MB/s ± 0%   191MB/s ± 0%    +0.02%

Also fix a bug of arm64 assembler

The optimization is mainly contributed by Fangming.Fang <fangming.fang@arm.com>

Change-Id: I900678c2e445d7e8ad9e2a9ab3305d649230905f
Reviewed-on: https://go-review.googlesource.com/40074
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-13 12:44:10 +00:00
Austin Clements
7f32d41e5d runtime: expand inlining iteratively in CallersFrames
Currently CallersFrames expands each PC to a slice of Frames and then
iteratively returns those Frames. However, this makes it very
difficult to avoid heap allocation: either the Frames slice will be
heap allocated, or, if it uses internal scratch space for small slices
(as it currently does), the Frames object itself has to be heap
allocated.

Fix this, at least in the common case, by expanding each PC
iteratively. We introduce a new pcExpander type that's responsible for
expanding a single PC. This maintains state from one Frame to the next
in the same PC. Frames then becomes a wrapper around this responsible
for feeding it the next PC when the pcExpander runs out of frames for
the current PC.

This makes it possible to stack-allocate a Frames object, which will
make it possible to use this API for PC expansion from within the
runtime itself.

Change-Id: I993463945ab574557cf1d6bedbe79ce7e9cbbdcd
Reviewed-on: https://go-review.googlesource.com/40434
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
2017-04-12 19:39:37 +00:00
Todd Neal
e49627d355 plugin: properly handle recursively defined types
Prevent a crash if the same type in two plugins had a recursive
definition, either by referring to a pointer to itself or a map existing
with the type as a value type (which creates a recursive definition
through the overflow bucket type).

Fixes #19258

Change-Id: Iac1cbda4c5b6e8edd5e6859a4d5da3bad539a9c6
Reviewed-on: https://go-review.googlesource.com/40292
Run-TryBot: Todd Neal <todd@tneal.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-04-12 12:46:07 +00:00
Joel Sing
092405a9af runtime/cgo: actually remove gcc_libinit_openbsd.c
This was unintentionally emptied rather than removed in 9417c022.

Change-Id: Ie6fdcf7ef55e58f12e2a2750ab448aa2d9f94d15
Reviewed-on: https://go-review.googlesource.com/40413
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-12 06:56:34 +00:00
Joel Sing
9417c022c6 cmd/link,runtime/cgo: enable PT_TLS generation on OpenBSD
OpenBSD 6.0 and later have support for PT_TLS in ld.so(1). Now that OpenBSD
6.1 has been released, OpenBSD 5.9 is no longer officially supported and Go
can start generating PT_TLS for OpenBSD cgo binaries. This also allows us
to remove the workarounds in the OpenBSD cgo runtime.

This change also removes the environ and progname exports - these are now
provided directly by ld.so(1) itself.

Fixes #19932

Change-Id: I42e75ef9feb5dcd4696add5233497e3cbc48ad52
Reviewed-on: https://go-review.googlesource.com/40331
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-11 16:33:16 +00:00
Ben Shi
69261ecad6 runtime: use hardware divider to improve performance
The hardware divider is an optional component of ARMv7. This patch
detects whether it is available in runtime and use it or not.

1. The hardware divider is detected at startup and a flag is set/clear
   according to a perticular bit of runtime.hwcap.
2. Each call of runtime.udiv will check this flag and decide if
   use the hardware division instruction.

A rough test shows the performance improves 40-50% for ARMv7. And
the compatibility of ARMv5/v6 is not broken.

fixes #19118

Change-Id: Ic586bc9659ebc169553ca2004d2bdb721df823ac
Reviewed-on: https://go-review.googlesource.com/37496
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-11 12:25:55 +00:00
Austin Clements
6c6f455f88 runtime: consolidate changes to arena_used
Changing mheap_.arena_used requires several steps that are currently
repeated multiple times in mheap_.sysAlloc. Consolidate these into a
single function.

In the future, this will also make it easier to add other auxiliary VM
structures.

Change-Id: Ie68837d2612e1f4ba4904acb1b6b832b15431d56
Reviewed-on: https://go-review.googlesource.com/40151
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-11 01:35:47 +00:00
Caleb Spare
221541ec8c testing: consider a test failed after race errors
Fixes #19851.

Change-Id: I5ee9533406542be7d5418df154f6134139e75892
Reviewed-on: https://go-review.googlesource.com/39890
Run-TryBot: Caleb Spare <cespare@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-04-10 14:36:02 +00:00
Austin Clements
7e1832d06c runtime: say where the compiler knows about var writeBarrier
The runtime.writeBarrier variable tries to be helpful by telling you
that the compiler also knows about this variable, which you could
probably guess, but doesn't say how the compiler knows about it. In
fact, the compiler has a complete copy in builtin/runtime.go that
needs to be kept in sync. Say so.

Change-Id: Ia7fb0c591cb6f9b8230decce01008b417dfcec89
Reviewed-on: https://go-review.googlesource.com/40150
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-09 23:05:24 +00:00
Todd Neal
0d33dc3105 runtime: improve output of panic(x) where x is numeric
Fixes #19658

Change-Id: I41e46073b75c7674e2ed9d6a90ece367ce92166b
Reviewed-on: https://go-review.googlesource.com/39650
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-09 22:40:33 +00:00
Cherry Zhang
0020b8a257 runtime: prevent TLS fetching instructions from being assembled on NaCl/ARM
They are dead code already, but the verifier is still not happy.
Don't assemble them at all.

Looks like it has been like that for long. I don't know why it
was ok. Maybe the verifier is now more picky?

Fixes #19884.

Change-Id: Ib806fb73ca469789dec56f52d484cf8baf7a245c
Reviewed-on: https://go-review.googlesource.com/40111
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Dave Cheney <dave@cheney.net>
2017-04-08 22:51:18 +00:00
Josselin Costanzi
d206af1e6c strings: optimize Count for amd64
Move optimized Count implementation from bytes to runtime. Use in
both bytes and strings packages.
Add CountByte benchmark to strings.

Strings benchmarks:
name                       old time/op    new time/op    delta
CountHard1-4                 226µs ± 1%      226µs ± 2%      ~     (p=0.247 n=10+10)
CountHard2-4                 316µs ± 1%      315µs ± 0%      ~     (p=0.133 n=9+10)
CountHard3-4                 919µs ± 1%      920µs ± 1%      ~     (p=0.968 n=10+9)
CountTorture-4              15.4µs ± 1%     15.7µs ± 1%    +2.47%  (p=0.000 n=10+9)
CountTortureOverlapping-4   9.60ms ± 0%     9.65ms ± 1%      ~     (p=0.247 n=10+10)
CountByte/10-4              26.3ns ± 1%     10.9ns ± 1%   -58.71%  (p=0.000 n=9+9)
CountByte/32-4              42.7ns ± 0%     14.2ns ± 0%   -66.64%  (p=0.000 n=10+10)
CountByte/4096-4            3.07µs ± 0%     0.31µs ± 2%   -89.99%  (p=0.000 n=9+10)
CountByte/4194304-4         3.48ms ± 1%     0.34ms ± 1%   -90.09%  (p=0.000 n=10+9)
CountByte/67108864-4        55.6ms ± 1%      7.0ms ± 0%   -87.49%  (p=0.000 n=9+8)

name                      old speed      new speed       delta
CountByte/10-4             380MB/s ± 1%    919MB/s ± 1%  +142.21%  (p=0.000 n=9+9)
CountByte/32-4             750MB/s ± 0%   2247MB/s ± 0%  +199.62%  (p=0.000 n=10+10)
CountByte/4096-4          1.33GB/s ± 0%  13.32GB/s ± 2%  +898.13%  (p=0.000 n=9+10)
CountByte/4194304-4       1.21GB/s ± 1%  12.17GB/s ± 1%  +908.87%  (p=0.000 n=10+9)
CountByte/67108864-4      1.21GB/s ± 1%   9.65GB/s ± 0%  +699.29%  (p=0.000 n=9+8)

Fixes #19411

Change-Id: I8d2d409f0fa6df6d03b60790aa86e540b4a4e3b0
Reviewed-on: https://go-review.googlesource.com/38693
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-07 14:25:13 +00:00
Daniel Martí
4e7724b2db runtime: remove unused parameter from bestFitTreap
This code was added recently, and it doesn't seem like the parameter
will be useful in the near future.

Change-Id: I5d64dadb6820c159b588262ab90df2461b5fdf04
Reviewed-on: https://go-review.googlesource.com/39692
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-06 17:20:43 +00:00
Austin Clements
9741f0275c runtime: initialize more fields of stack spans
Stack spans don't internally use many of the fields of the mspan,
which means things like the size class and element size get left over
from whatever last used the mspan. This can lead to confusing crashes
and debugging.

Zero these fields or initialize them to something reasonable. This
also lets us simplify some code that currently has to distinguish
between heap and stack spans.

Change-Id: I9bd114e76c147bb32de497045b932f8bf1988bbf
Reviewed-on: https://go-review.googlesource.com/38573
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-05 19:17:41 +00:00
Austin Clements
ecb7b63820 runtime: fix gcpacertrace printing of sweep ratio
Commit 44ed88a5a7 moved printing of the "sweep done" gcpacertrace
message so that it is printed when the final sweeper finishes.
However, by this point some other thread has often already observed
that there are no more spans to sweep and zeroed sweepPagesPerByte.

Avoid printing a 0 sweep ratio in the trace when this race happens by
getting the value of the sweep ratio upon entry to sweepone and
printing that.

Change-Id: Iac0c48ae899e12f193267cdfb012c921f8b71c85
Reviewed-on: https://go-review.googlesource.com/39492
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-05 18:24:52 +00:00
Filip Gruszczyński
6c5a819a5e reflect: add MakeMapWithSize for creating maps with size hint
Providing size hint when creating a map allows avoiding re-allocating
underlying data structure if we know how many elements are going to
be inserted. This can be used for example during decoding maps in
gob.

Fixes #19599

Change-Id: I108035fec29391215d2261a73eaed1310b46bab1
Reviewed-on: https://go-review.googlesource.com/38335
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-04 20:01:43 +00:00
Eric Lagergren
094498c9a1 all: fix minor misspellings
Change-Id: I1f1cfb161640eb8756fb1a283892d06b30b7a8fa
Reviewed-on: https://go-review.googlesource.com/39356
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-03 23:19:07 +00:00
Josh Bleecher Snyder
654c977b26 runtime/race: print output when TestRace parsing fails
Change-Id: I986f0c106e059455874692f5bfe2b5af25cf470e
Reviewed-on: https://go-review.googlesource.com/39090
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-31 17:07:29 +00:00
Austin Clements
9ffbdabdb0 runtime: make runtime.GC() trigger a concurrent GC
Currently runtime.GC() triggers a STW GC. For common uses in tests and
benchmarks, it doesn't matter whether it's STW or concurrent, but for
uses in servers for things like collecting heap profiles and
controlling memory footprint, this pause can be a bit problem for
latency.

This changes runtime.GC() to trigger a concurrent GC. In order to
remain as close as possible to its current meaning, we define it to
always perform a full mark/sweep GC cycle before returning (even if
that means it has to finish up a cycle we're in the middle of first)
and to publish the heap profile as of the triggered mark termination.
While it must perform a full cycle, simultaneous runtime.GC() calls
can be consolidated into a single full cycle.

Fixes #18216.

Change-Id: I9088cc5deef4ab6bcf0245ed1982a852a01c44b5
Reviewed-on: https://go-review.googlesource.com/37520
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:21 +00:00
Austin Clements
44ed88a5a7 runtime: track the number of active sweepone calls
sweepone returns ^uintptr(0) when there are no more spans to *start*
sweeping, but there may be spans being swept concurrently at the time
and there's currently no efficient way to tell when the sweeper is
done sweeping all the spans.

We'll need this for concurrent runtime.GC(), so add a count of the
number of active sweepone calls to make it possible to block until
sweeping is truly done.

This is also useful for more accurately printing the gcpacertrace,
since that should be printed after all of the sweeping stats are in
(currently we can print it slightly too early).

For #18216.

Change-Id: I06e6240c9e7b40aca6fd7b788bb6962107c10a0f
Reviewed-on: https://go-review.googlesource.com/37716
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:18 +00:00
Austin Clements
2919132e1b runtime: don't adjust GC trigger on forced GC
Forced GCs don't provide good information about how to adjust the GC
trigger. Currently we avoid adjusting the trigger on forced GC because
forced GC is also STW and we don't adjust the trigger on STW GC.
However, this will become a problem when forced GC is concurrent.

Fix this by skipping trigger adjustment if the GC was user-forced.

For #18216.

Change-Id: I03dfdad12ecd3cfeca4573140a0768abb29aac5e
Reviewed-on: https://go-review.googlesource.com/38951
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:16 +00:00
Austin Clements
29fdbcfea3 runtime: track forced GCs independent of gcMode
Currently gcMode != gcBackgroundMode implies this was a user-forced GC
cycle. This is no longer going to be true when we make runtime.GC()
trigger a concurrent GC, so replace this with an explicit
work.userForced bit.

For #18216.

Change-Id: If7d71bbca78b5f0b35641b070f9d457f5c9a52bd
Reviewed-on: https://go-review.googlesource.com/37519
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:13 +00:00
Austin Clements
786eb5b754 runtime: make debug.FreeOSMemory call runtime.GC()
Currently freeOSMemory calls gcStart directly, but we really just want
it to behave like runtime.GC() and then perform a scavenge, so make it
call runtime.GC() rather than gcStart.

For #18216.

Change-Id: I548ec007afc788e87d383532a443a10d92105937
Reviewed-on: https://go-review.googlesource.com/37518
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:10 +00:00
Austin Clements
3d58498fdb runtime: simplify forced GC triggering
Now that the gcMode is no longer involved in the GC trigger condition,
we can simplify the triggering of forced GCs. By making the trigger
condition for forced GCs true even if gcphase is not _GCoff, we don't
need any special case path in gcStart to ensure that forced GCs don't
get consolidated.

Change-Id: I6067a13d76e40ff2eef8fade6fc14adb0cb58ee5
Reviewed-on: https://go-review.googlesource.com/37517
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:08 +00:00
Austin Clements
29be3f1999 runtime: generalize GC trigger
Currently the GC triggering condition is an awkward combination of the
gcMode (whether or not it's gcBackgroundMode) and a boolean
"forceTrigger" flag.

Replace this with a new gcTrigger type that represents the range of
transition predicates we need. This has several advantages:

1. We can remove the awkward logic that affects the trigger behavior
   based on the gcMode. Now gcMode purely controls whether to run a
   STW GC or not and the gcTrigger controls whether this is a forced
   GC that cannot be consolidated with other GC cycles.

2. We can lift the time-based triggering logic in sysmon to just
   another type of GC trigger and move the logic to the trigger test.

3. This sets us up to have a cycle count-based trigger, which we'll
   use to make runtime.GC trigger concurrent GC with the desired
   consolidation properties.

For #18216.

Change-Id: If9cd49349579a548800f5022ae47b8128004bbfc
Reviewed-on: https://go-review.googlesource.com/37516
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:06 +00:00
Austin Clements
640cd3b322 runtime: check transition condition before triggering periodic GC
Currently sysmon triggers periodic GC if GC is not currently running
and it's been long enough since the last GC. This misses some
important conditions; for example, whether GC is enabled at all by
GOGC. As a result, if GOGC is off, once we pass the timeout for
periodic GC, sysmon will attempt to trigger a GC every 10ms. This GC
will be a no-op because gcStart will check all of the appropriate
conditions and do nothing, but it still goes through the motions of
waking the forcegc goroutine and printing a gctrace line.

Fix this by making sysmon call gcShouldStart to check *all* of the
appropriate transition conditions before attempting to trigger a
periodic GC.

Fixes #19247.

Change-Id: Icee5521ce175e8419f934723849853d53773af31
Reviewed-on: https://go-review.googlesource.com/37515
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:03 +00:00
Austin Clements
1be3e76e76 runtime: simplify heap profile flushing
Currently the heap profile is flushed by *either* gcSweep in STW mode
or by gcMarkTermination in concurrent mode. Simplify this by making
gcMarkTermination always flush the heap profile and by making gcSweep
do one extra flush (instead of two) in STW mode.

Change-Id: I62147afb2a128e1f3d92ef4bb8144c8a345f53c4
Reviewed-on: https://go-review.googlesource.com/37715
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:01 +00:00
Austin Clements
eee85fc5a1 runtime: snapshot heap profile during mark termination
Currently we snapshot the heap profile just *after* mark termination
starts the world because it's a relatively expensive operation.
However, this means any alloc or free events that happen between
starting the world and snapshotting the heap profile can be accounted
to the wrong cycle. In the worst case, a free can be accounted to the
cycle before the alloc; if the heap is small, this can result
temporarily in a negative "in use" count in the profile.

Fix this without making STW more expensive by using a global heap
profile cycle counter. This lets us split up the operation into a two
parts: 1) a super-cheap snapshot operation that simply increments the
global cycle counter during STW, and 2) a more expensive cleanup
operation we can do after starting the world that frees up a slot in
all buckets for use by the next heap profile cycle.

Fixes #19311.

Change-Id: I6bdafabf111c48b3d26fe2d91267f7bef0bd4270
Reviewed-on: https://go-review.googlesource.com/37714
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:14:56 +00:00
Austin Clements
3ebe7d7d11 runtime: pull heap profile cycle into a type
Currently memRecord has the same set of four fields repeated three
times. Pull these into a type and use this type three times. This
cleans up and simplifies the code a bit and will make it easier to
switch to a globally tracked heap profile cycle for #19311.

Change-Id: I414d15673feaa406a8366b48784437c642997cf2
Reviewed-on: https://go-review.googlesource.com/37713
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:14:54 +00:00
Austin Clements
673a8fdfe6 runtime: diagram flow of stats through heap profile
Every time I modify heap profiling, I find myself redrawing this
diagram, so add it to the comments. This shows how allocations and
frees are accounted, how we arrive at consistent profile snapshots,
and when those snapshots are published to the user.

Change-Id: I106aba1200af3c773b46e24e5f50205e808e2c69
Reviewed-on: https://go-review.googlesource.com/37514
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 00:46:18 +00:00
Austin Clements
ef1829d1de runtime: improve TestMemStats checks
Now that we have a nice predicate system, improve the tests performed
by TestMemStats. We add some more non-zero checks (now that we force a
GC, things like NumGC must be non-zero), checks for trivial boolean
fields, and a few more range checks.

Change-Id: I6da46d33fa0ce5738407ee57d587825479413171
Reviewed-on: https://go-review.googlesource.com/37513
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 00:46:16 +00:00
Austin Clements
bda74b0e4a runtime: make TestMemStats failure messages useful
Currently most TestMemStats failures dump the whole MemStats object if
anything is amiss without telling you what is amiss, or even which
field is wrong. This makes it hard to figure out what the actual
problem is.

Replace this with a reflection walk over MemStats and a map of
predicates to check. If one fails, we can construct a detailed and
descriptive error message. The predicates are a direct translation of
the current tests.

Change-Id: I5a7cafb8e6a1eeab653d2e18bb74e2245eaa5444
Reviewed-on: https://go-review.googlesource.com/37512
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 00:46:14 +00:00
Caleb Spare
592037f381 runtime: fix for implementation notes appearing in godoc
Change-Id: I31cfae1e98313b68e3bc8f49079491d2725a662b
Reviewed-on: https://go-review.googlesource.com/38850
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-29 22:32:57 +00:00
David Lazar
7bf0adc6ad runtime: include inlined calls in result of CallersFrames
Change-Id: If1a3396175f2afa607d56efd1444181334a9ae3e
Reviewed-on: https://go-review.googlesource.com/37862
Reviewed-by: Austin Clements <austin@google.com>
2017-03-29 17:27:38 +00:00
David Lazar
ee97216a17 runtime: handle inlined calls in runtime.Callers
The `skip` argument passed to runtime.Caller and runtime.Callers should
be interpreted as the number of logical calls to skip (rather than the
number of physical stack frames to skip). This changes runtime.Callers
to skip inlined calls in addition to physical stack frames.

The result value of runtime.Callers is a slice of program counters
([]uintptr) representing physical stack frames. If the `skip` parameter
to runtime.Callers skips part-way into a physical frame, there is no
convenient way to encode that in the resulting slice. To avoid changing
the API in an incompatible way, our solution is to store the number of
skipped logical calls of the first frame in the _second_ uintptr
returned by runtime.Callers. Since this number is a small integer, we
encode it as a valid PC value into a small symbol called:

    runtime.skipPleaseUseCallersFrames

For example, if f() calls g(), g() calls `runtime.Callers(2, pcs)`, and
g() is inlined into f, then the frame for f will be partially skipped,
resulting in the following slice:

    pcs = []uintptr{pc_in_f, runtime.skipPleaseUseCallersFrames+1, ...}

We store the skip PC in pcs[1] instead of pcs[0] so that `pcs[i:]` will
truncate the captured stack trace rather than grow it for all i.

Updates #19348.

Change-Id: I1c56f89ac48c29e6f52a5d085567c6d77d499cf1
Reviewed-on: https://go-review.googlesource.com/37854
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-29 17:22:08 +00:00
Rick Hudson
6e9ec14186 runtime: redo insert/remove of large spans
Currently for spans with up to 1 MBytes (128 pages) we
maintain an array indexed by the number of pages in the
span. This is efficient both in terms of space as well
as time to insert or remove a span of a particular size.

Unfortunately for spans larger than 1 MByte we currently
place them on a separate linked list. This results in
O(n) behavior. Now that we are seeing heaps approaching
100 GBytes n is large enough to be noticed in real programs.

This change replaces the linked list now used with a balanced
binary tree structure called a treap. A treap is a
probabilistically balanced tree offering O(logN) behavior for
inserting and removing spans.

To verify that this approach will work we start with noting
that only spans with sizes > 1MByte will be put into the treap.
This means that to support 1 TByte a treap will need at most
1 million nodes and can ideally be held in a treap with a
depth of 20. Experiments with adding and removing randomly
sized spans from the treap seem to result in treaps with
depths of about twice the ideal or 40. A petabyte would
require a tree of only twice again that depth again so this
algorithm should last well into the future.

Fixes #19393

Go1 benchmarks indicate this is basically an overall wash.
Tue Mar 28 21:29:21 EDT 2017
name                     old time/op    new time/op    delta
BinaryTree17-4              2.42s ± 1%     2.42s ± 1%    ~     (p=0.980 n=21+21)
Fannkuch11-4                3.00s ± 1%     3.18s ± 4%  +6.10%  (p=0.000 n=22+24)
FmtFprintfEmpty-4          40.5ns ± 1%    40.3ns ± 3%    ~     (p=0.692 n=22+25)
FmtFprintfString-4         65.9ns ± 3%    64.6ns ± 1%  -1.98%  (p=0.000 n=24+23)
FmtFprintfInt-4            69.6ns ± 1%    68.0ns ± 7%  -2.30%  (p=0.001 n=21+22)
FmtFprintfIntInt-4          102ns ± 2%      99ns ± 1%  -3.07%  (p=0.000 n=23+23)
FmtFprintfPrefixedInt-4     126ns ± 0%     125ns ± 0%  -0.79%  (p=0.000 n=19+17)
FmtFprintfFloat-4           206ns ± 2%     205ns ± 1%    ~     (p=0.671 n=23+21)
FmtManyArgs-4               441ns ± 1%     445ns ± 1%  +0.88%  (p=0.000 n=22+23)
GobDecode-4                5.73ms ± 1%    5.86ms ± 1%  +2.37%  (p=0.000 n=23+22)
GobEncode-4                4.51ms ± 1%    4.89ms ± 1%  +8.32%  (p=0.000 n=22+22)
Gzip-4                      197ms ± 0%     202ms ± 1%  +2.75%  (p=0.000 n=23+24)
Gunzip-4                   32.9ms ± 8%    32.7ms ± 2%    ~     (p=0.466 n=23+24)
HTTPClientServer-4         57.3µs ± 1%    56.7µs ± 1%  -0.94%  (p=0.000 n=21+22)
JSONEncode-4               13.8ms ± 1%    13.9ms ± 2%  +1.14%  (p=0.000 n=22+23)
JSONDecode-4               47.4ms ± 1%    48.1ms ± 1%  +1.49%  (p=0.000 n=23+23)
Mandelbrot200-4            3.92ms ± 0%    3.92ms ± 1%  +0.21%  (p=0.000 n=22+22)
GoParse-4                  2.89ms ± 1%    2.87ms ± 1%  -0.68%  (p=0.000 n=21+22)
RegexpMatchEasy0_32-4      73.6ns ± 1%    72.0ns ± 2%  -2.15%  (p=0.000 n=21+22)
RegexpMatchEasy0_1K-4       173ns ± 1%     173ns ± 1%    ~     (p=0.847 n=22+24)
RegexpMatchEasy1_32-4      71.9ns ± 1%    69.8ns ± 1%  -2.99%  (p=0.000 n=23+20)
RegexpMatchEasy1_1K-4       314ns ± 1%     308ns ± 1%  -1.91%  (p=0.000 n=22+23)
RegexpMatchMedium_32-4      106ns ± 0%     105ns ± 1%  -0.58%  (p=0.000 n=19+21)
RegexpMatchMedium_1K-4     34.3µs ± 1%    34.3µs ± 1%    ~     (p=0.871 n=23+22)
RegexpMatchHard_32-4       1.67µs ± 1%    1.67µs ± 7%    ~     (p=0.224 n=22+23)
RegexpMatchHard_1K-4       51.5µs ± 1%    50.4µs ± 1%  -1.99%  (p=0.000 n=22+23)
Revcomp-4                   383ms ± 1%     415ms ± 0%  +8.51%  (p=0.000 n=22+22)
Template-4                 51.5ms ± 1%    51.5ms ± 1%    ~     (p=0.555 n=20+23)
TimeParse-4                 279ns ± 2%     277ns ± 1%  -0.95%  (p=0.000 n=24+22)
TimeFormat-4                294ns ± 1%     296ns ± 1%  +0.58%  (p=0.003 n=24+23)
[Geo mean]                 43.7µs         43.8µs       +0.32%

name                     old speed      new speed      delta
GobDecode-4               134MB/s ± 1%   131MB/s ± 1%  -2.32%  (p=0.000 n=23+22)
GobEncode-4               170MB/s ± 1%   157MB/s ± 1%  -7.68%  (p=0.000 n=22+22)
Gzip-4                   98.7MB/s ± 0%  96.1MB/s ± 1%  -2.68%  (p=0.000 n=23+24)
Gunzip-4                  590MB/s ± 7%   593MB/s ± 2%    ~     (p=0.466 n=23+24)
JSONEncode-4              141MB/s ± 1%   139MB/s ± 2%  -1.13%  (p=0.000 n=22+23)
JSONDecode-4             40.9MB/s ± 1%  40.3MB/s ± 0%  -1.47%  (p=0.000 n=23+23)
GoParse-4                20.1MB/s ± 1%  20.2MB/s ± 1%  +0.69%  (p=0.000 n=21+22)
RegexpMatchEasy0_32-4     435MB/s ± 1%   444MB/s ± 2%  +2.21%  (p=0.000 n=21+22)
RegexpMatchEasy0_1K-4    5.89GB/s ± 1%  5.89GB/s ± 1%    ~     (p=0.439 n=22+24)
RegexpMatchEasy1_32-4     445MB/s ± 1%   459MB/s ± 1%  +3.06%  (p=0.000 n=23+20)
RegexpMatchEasy1_1K-4    3.26GB/s ± 1%  3.32GB/s ± 1%  +1.97%  (p=0.000 n=22+23)
RegexpMatchMedium_32-4   9.40MB/s ± 1%  9.44MB/s ± 1%  +0.43%  (p=0.000 n=23+21)
RegexpMatchMedium_1K-4   29.8MB/s ± 1%  29.8MB/s ± 1%    ~     (p=0.826 n=23+22)
RegexpMatchHard_32-4     19.1MB/s ± 1%  19.1MB/s ± 7%    ~     (p=0.233 n=22+23)
RegexpMatchHard_1K-4     19.9MB/s ± 1%  20.3MB/s ± 1%  +2.03%  (p=0.000 n=22+23)
Revcomp-4                 664MB/s ± 1%   612MB/s ± 0%  -7.85%  (p=0.000 n=22+22)
Template-4               37.6MB/s ± 1%  37.7MB/s ± 1%    ~     (p=0.558 n=20+23)
[Geo mean]                134MB/s        133MB/s       -0.76%
Tue Mar 28 22:16:54 EDT 2017

Change-Id: I4a4f5c2b53d3fb85ef76c98522d3ed5cf8ae5b7e
Reviewed-on: https://go-review.googlesource.com/38732
Reviewed-by: Russ Cox <rsc@golang.org>
2017-03-29 14:18:24 +00:00
Elias Naur
2b4274d667 runtime/cgo: CFRelease result from CFBundleCopyResourceURL
The result from CFBundleCopyResourceURL is owned by the caller. This
CL adds the necessary CFRelease to release it after use.

Fixes #19722

Change-Id: I7afe22ef241d21922a7f5cef6498017e6269a5c3
Reviewed-on: https://go-review.googlesource.com/38639
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2017-03-27 18:12:17 +00:00
Austin Clements
4234d1decd runtime: improve systemstack-on-Go stack message
We reused the old C stack check mechanism for the implementation of
//go:systemstack, so when we execute a //go:systemstack function on a
user stack, the system fails by calling morestackc. However,
morestackc's message still talks about "executing C code".

Fix morestackc's message to reflect its modern usage.

Change-Id: I7e70e7980eab761c0520f675d3ce89486496030f
Reviewed-on: https://go-review.googlesource.com/38572
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-27 14:53:12 +00:00
Elias Naur
0476c7a7b5 runtime/cgo: raise the thread-local storage slot search limit on Android
On Android, the thread local offset is found by looping through memory
starting at the TLS base address. The search is limited to
PTHREAD_KEYS_MAX, but issue 19472 made it clear that in some cases, the
slot is located further from the TLS base.

The limit is merely a sanity check in case our assumptions about the
thread-local storage layout are wrong, so this CL raises it to 384, which
is enough for the test case in issue 19472.

Fixes #19472

Change-Id: I89d1db3e9739d3a7fff5548ae487a7483c0a278a
Reviewed-on: https://go-review.googlesource.com/38636
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-27 08:56:08 +00:00
Elias Naur
aa4c2ca316 runtime/pprof: fix proto tests on NetBSD
The proto_test tests are failing on NetBSD:

https://build.golang.org/log/a3a577144ac48c6ef8e384ce6a700ad30549fb78

the failures seem similar to previous failures on Android:

https://build.golang.org/log/b5786e0cd6d5941dc37b6a50be5172f6b99e22f0

The Android failures where fixed by CL 37896. This CL is an attempt
to fix the NetBSD failures with a similar fix.

Change-Id: I3834afa5b32303ca226e6a31f0f321f66fef9a3f
Reviewed-on: https://go-review.googlesource.com/38637
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-27 08:55:14 +00:00
Keith Randall
e67d881bc3 cmd/compile: simplify efaceeq and ifaceeq
Clean up code that does interface equality. Avoid doing checks
in efaceeq/ifaceeq that we already did before calling those routines.

No noticeable performance changes for existing benchmarks.

name            old time/op  new time/op  delta
EfaceCmpDiff-8   604ns ± 1%   553ns ± 1%  -8.41%  (p=0.000 n=9+10)

Fixes #18618

Change-Id: I3bd46db82b96494873045bc3300c56400bc582eb
Reviewed-on: https://go-review.googlesource.com/38606
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2017-03-24 23:03:09 +00:00
Cherry Zhang
3a1ce1085a runtime: access _cgo_yield indirectly
The darwin linker for ARM does not allow PC-relative relocation
of external symbol in text section. Work around it by accessing
it indirectly: putting its address in a global variable (which is
not external), and accessing through that variable.

Fixes #19684.

Change-Id: I41361bbb281b5dbdda0d100ae49d32c69ed85a81
Reviewed-on: https://go-review.googlesource.com/38596
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
2017-03-24 15:37:56 +00:00
Carlos Eduardo Seo
189053aee2 runtime/internal/atomic: Remove unnecessary checks for GOARCH_ppc64
Starting in go1.9, the minimum processor requirement for ppc64 is POWER8. This
means the checks for GOARCH_ppc64 in asm_ppc64x.s can be removed, since we can
assume LBAR and STBCCC instructions (both from ISA 2.06) will always be
available.

Updates #19074

Change-Id: Ib4418169cd9fc6f871a5ab126b28ee58a2f349e2
Reviewed-on: https://go-review.googlesource.com/38406
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-22 18:14:41 +00:00
Josselin Costanzi
01cd22c687 bytes: add optimized countByte for amd64
Use SSE/AVX2 when counting a single byte.
Inspired from runtime indexbyte implementation.

Benchmark against previous implementation, where
1 byte in every 8 is the one we are looking for:

* On a machine without AVX2
name               old time/op   new time/op     delta
CountSingle/10-4    61.8ns ±10%     15.6ns ±11%    -74.83%  (p=0.000 n=10+10)
CountSingle/32-4     100ns ± 4%       17ns ±10%    -82.54%  (p=0.000 n=10+9)
CountSingle/4K-4    9.66µs ± 3%     0.37µs ± 6%    -96.21%  (p=0.000 n=10+10)
CountSingle/4M-4    11.0ms ± 6%      0.4ms ± 4%    -96.04%  (p=0.000 n=10+10)
CountSingle/64M-4    194ms ± 8%        8ms ± 2%    -95.64%  (p=0.000 n=10+10)

name               old speed     new speed       delta
CountSingle/10-4   162MB/s ±10%    645MB/s ±10%   +297.00%  (p=0.000 n=10+10)
CountSingle/32-4   321MB/s ± 5%   1844MB/s ± 9%   +474.79%  (p=0.000 n=10+9)
CountSingle/4K-4   424MB/s ± 3%  11169MB/s ± 6%  +2533.10%  (p=0.000 n=10+10)
CountSingle/4M-4   381MB/s ± 7%   9609MB/s ± 4%  +2421.88%  (p=0.000 n=10+10)
CountSingle/64M-4  346MB/s ± 7%   7924MB/s ± 2%  +2188.78%  (p=0.000 n=10+10)

* On a machine with AVX2
name               old time/op   new time/op     delta
CountSingle/10-8    37.1ns ± 3%      8.2ns ± 1%    -77.80%  (p=0.000 n=10+10)
CountSingle/32-8    66.1ns ± 3%      9.8ns ± 2%    -85.23%  (p=0.000 n=10+10)
CountSingle/4K-8    7.36µs ± 3%     0.11µs ± 1%    -98.54%  (p=0.000 n=10+10)
CountSingle/4M-8    7.46ms ± 2%     0.15ms ± 2%    -97.95%  (p=0.000 n=10+9)
CountSingle/64M-8    124ms ± 2%        6ms ± 4%    -95.09%  (p=0.000 n=10+10)

name               old speed     new speed       delta
CountSingle/10-8   269MB/s ± 3%   1213MB/s ± 1%   +350.32%  (p=0.000 n=10+10)
CountSingle/32-8   484MB/s ± 4%   3277MB/s ± 2%   +576.66%  (p=0.000 n=10+10)
CountSingle/4K-8   556MB/s ± 3%  37933MB/s ± 1%  +6718.36%  (p=0.000 n=10+10)
CountSingle/4M-8   562MB/s ± 2%  27444MB/s ± 3%  +4783.43%  (p=0.000 n=10+9)
CountSingle/64M-8  543MB/s ± 2%  11054MB/s ± 3%  +1935.81%  (p=0.000 n=10+10)

Fixes #19411

Change-Id: Ieaf20b1fabccabe767c55c66e242e86f3617f883
Reviewed-on: https://go-review.googlesource.com/38258
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-21 20:25:17 +00:00
Daniel Martí
2e29eb57db runtime: remove unused *chantype parameters
The chanrecv funcs don't use it at all. The chansend ones do, but the
element type is now part of the hchan struct, which is already a
parameter.

hchan can be nil in chansend when sending to a nil channel, so when
instrumenting we must copy to the stack to be able to read the channel
type.

name             old time/op  new time/op  delta
ChanUncontended  6.42µs ± 1%  6.22µs ± 0%  -3.06%  (p=0.000 n=19+18)

Initially found by github.com/mvdan/unparam.

Fixes #19591.

Change-Id: I3a5e8a0082e8445cc3f0074695e3593fd9c88412
Reviewed-on: https://go-review.googlesource.com/38351
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-21 17:10:16 +00:00
Vladimir Stefanovic
24dc8c6cb5 cmd/compile,runtime: fix atomic And8 for mipsle
Removing stray xori that came from big endian copy/paste.
Adding atomicand8 check to runtime.check() that would have revealed
this error.
Might fix #19396.

Change-Id: If8d6f25d3e205496163541eb112548aa66df9c2a
Reviewed-on: https://go-review.googlesource.com/38257
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-21 16:03:12 +00:00
Hugues Bruant
5d6b7fcaa1 runtime: add mapdelete_fast*
Add benchmarks for map delete with int32/int64/string key

Benchmark results on darwin/amd64

name                 old time/op  new time/op  delta
MapDelete/Int32/1-8   151ns ± 8%    99ns ± 3%  -34.39%  (p=0.008 n=5+5)
MapDelete/Int32/2-8   128ns ± 2%   111ns ±15%  -13.40%  (p=0.040 n=5+5)
MapDelete/Int32/4-8   128ns ± 5%   114ns ± 2%  -10.82%  (p=0.008 n=5+5)
MapDelete/Int64/1-8   144ns ± 0%   104ns ± 3%  -27.53%  (p=0.016 n=4+5)
MapDelete/Int64/2-8   153ns ± 1%   126ns ± 3%  -17.17%  (p=0.008 n=5+5)
MapDelete/Int64/4-8   178ns ± 3%   136ns ± 2%  -23.60%  (p=0.008 n=5+5)
MapDelete/Str/1-8     187ns ± 3%   171ns ± 3%   -8.54%  (p=0.008 n=5+5)
MapDelete/Str/2-8     221ns ± 3%   206ns ± 4%   -7.18%  (p=0.016 n=5+4)
MapDelete/Str/4-8     256ns ± 5%   232ns ± 2%   -9.36%  (p=0.016 n=4+5)

name                     old time/op    new time/op    delta
BinaryTree17-8              2.78s ± 7%     2.70s ± 1%    ~     (p=0.151 n=5+5)
Fannkuch11-8                3.21s ± 2%     3.19s ± 1%    ~     (p=0.310 n=5+5)
FmtFprintfEmpty-8          49.1ns ± 3%    50.2ns ± 2%    ~     (p=0.095 n=5+5)
FmtFprintfString-8         78.6ns ± 4%    80.2ns ± 5%    ~     (p=0.460 n=5+5)
FmtFprintfInt-8            79.7ns ± 1%    81.0ns ± 3%    ~     (p=0.103 n=5+5)
FmtFprintfIntInt-8          117ns ± 2%     119ns ± 0%    ~     (p=0.079 n=5+4)
FmtFprintfPrefixedInt-8     153ns ± 1%     146ns ± 3%  -4.19%  (p=0.024 n=5+5)
FmtFprintfFloat-8           239ns ± 1%     237ns ± 1%    ~     (p=0.246 n=5+5)
FmtManyArgs-8               506ns ± 2%     509ns ± 2%    ~     (p=0.238 n=5+5)
GobDecode-8                7.06ms ± 4%    6.86ms ± 1%    ~     (p=0.222 n=5+5)
GobEncode-8                6.01ms ± 5%    5.87ms ± 2%    ~     (p=0.222 n=5+5)
Gzip-8                      246ms ± 4%     236ms ± 1%  -4.12%  (p=0.008 n=5+5)
Gunzip-8                   37.7ms ± 4%    37.3ms ± 1%    ~     (p=0.841 n=5+5)
HTTPClientServer-8         64.9µs ± 1%    64.4µs ± 0%  -0.80%  (p=0.032 n=5+4)
JSONEncode-8               16.0ms ± 2%    16.2ms ±11%    ~     (p=0.548 n=5+5)
JSONDecode-8               53.2ms ± 2%    53.1ms ± 4%    ~     (p=1.000 n=5+5)
Mandelbrot200-8            4.33ms ± 2%    4.32ms ± 2%    ~     (p=0.841 n=5+5)
GoParse-8                  3.24ms ± 2%    3.27ms ± 4%    ~     (p=0.690 n=5+5)
RegexpMatchEasy0_32-8      86.2ns ± 1%    85.2ns ± 3%    ~     (p=0.286 n=5+5)
RegexpMatchEasy0_1K-8       198ns ± 2%     199ns ± 1%    ~     (p=0.310 n=5+5)
RegexpMatchEasy1_32-8      82.6ns ± 2%    81.8ns ± 1%    ~     (p=0.294 n=5+5)
RegexpMatchEasy1_1K-8       359ns ± 2%     354ns ± 1%  -1.39%  (p=0.048 n=5+5)
RegexpMatchMedium_32-8      123ns ± 2%     123ns ± 1%    ~     (p=0.905 n=5+5)
RegexpMatchMedium_1K-8     38.2µs ± 2%    38.6µs ± 8%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32-8       1.92µs ± 2%    1.91µs ± 5%    ~     (p=0.460 n=5+5)
RegexpMatchHard_1K-8       57.6µs ± 1%    57.0µs ± 2%    ~     (p=0.310 n=5+5)
Revcomp-8                   483ms ± 7%     441ms ± 1%  -8.79%  (p=0.016 n=5+4)
Template-8                 58.0ms ± 1%    58.2ms ± 7%    ~     (p=0.310 n=5+5)
TimeParse-8                 324ns ± 6%     312ns ± 2%    ~     (p=0.087 n=5+5)
TimeFormat-8                330ns ± 1%     329ns ± 1%    ~     (p=0.968 n=5+5)

name                     old speed      new speed      delta
GobDecode-8               109MB/s ± 4%   112MB/s ± 1%    ~     (p=0.222 n=5+5)
GobEncode-8               128MB/s ± 5%   131MB/s ± 2%    ~     (p=0.222 n=5+5)
Gzip-8                   78.9MB/s ± 4%  82.3MB/s ± 1%  +4.25%  (p=0.008 n=5+5)
Gunzip-8                  514MB/s ± 4%   521MB/s ± 1%    ~     (p=0.841 n=5+5)
JSONEncode-8              121MB/s ± 2%   120MB/s ±10%    ~     (p=0.548 n=5+5)
JSONDecode-8             36.5MB/s ± 2%  36.6MB/s ± 4%    ~     (p=1.000 n=5+5)
GoParse-8                17.9MB/s ± 2%  17.7MB/s ± 4%    ~     (p=0.730 n=5+5)
RegexpMatchEasy0_32-8     371MB/s ± 1%   375MB/s ± 3%    ~     (p=0.310 n=5+5)
RegexpMatchEasy0_1K-8    5.15GB/s ± 1%  5.13GB/s ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchEasy1_32-8     387MB/s ± 2%   391MB/s ± 1%    ~     (p=0.310 n=5+5)
RegexpMatchEasy1_1K-8    2.85GB/s ± 2%  2.89GB/s ± 1%    ~     (p=0.056 n=5+5)
RegexpMatchMedium_32-8   8.07MB/s ± 2%  8.06MB/s ± 1%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8   26.8MB/s ± 2%  26.6MB/s ± 7%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32-8     16.7MB/s ± 2%  16.7MB/s ± 5%    ~     (p=0.421 n=5+5)
RegexpMatchHard_1K-8     17.8MB/s ± 1%  18.0MB/s ± 2%    ~     (p=0.310 n=5+5)
Revcomp-8                 527MB/s ± 6%   577MB/s ± 1%  +9.44%  (p=0.016 n=5+4)
Template-8               33.5MB/s ± 1%  33.4MB/s ± 7%    ~     (p=0.310 n=5+5)

Updates #19495

Change-Id: Ib9ece1690813d9b4788455db43d30891e2138df5
Reviewed-on: https://go-review.googlesource.com/38172
Reviewed-by: Hugues Bruant <hugues.bruant@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-21 06:07:24 +00:00
Alex Brainman
f2b79cadfd runtime: import os package in BenchmarkRunningGoProgram
I would like to use BenchmarkRunningGoProgram to measure
changes for issue #15588. So the program in the benchmark
should import "os" package.

It is also reasonable that basic Go program includes
"os" package.

For #15588.

Change-Id: Ida6712eab22c2e79fbe91b6fdd492eaf31756852
Reviewed-on: https://go-review.googlesource.com/37914
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-21 05:59:45 +00:00
Ian Lance Taylor
5dc14af682 runtime: clear signal stack on main thread
This is a workaround for a FreeBSD kernel bug. It can be removed when
we are confident that all people are using the fixed kernel. See #15658.

Updates #15658.

Change-Id: I0ecdccb77ddd0c270bdeac4d3a5c8abaf0449075
Reviewed-on: https://go-review.googlesource.com/38325
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-20 22:59:26 +00:00
Austin Clements
df6025bc0d runtime: disallow malloc or panic in scavenge
Mallocs and panics in the scavenge path are particularly nasty because
they're likely to silently self-deadlock on the mheap.lock. Avoid
sinking lots of time into debugging these issues in the future by
turning these into immediate throws.

Change-Id: Ib36fdda33bc90b21c32432b03561630c1f3c69bc
Reviewed-on: https://go-review.googlesource.com/38293
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-19 22:42:28 +00:00
Austin Clements
13ae271d5d runtime: introduce a type for lfstacks
The lfstack API is still a C-style API: lfstacks all have unhelpful
type uint64 and the APIs are package-level functions. Make the code
more readable and Go-style by creating an lfstack type with methods
for push, pop, and empty.

Change-Id: I64685fa3be0e82ae2d1a782a452a50974440a827
Reviewed-on: https://go-review.googlesource.com/38290
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-19 22:42:24 +00:00
Daniel Martí
77b09b8b8d runtime: remove unused g parameter
Found by github.com/mvdan/unparam.

Change-Id: I20145440ff1bcd27fcf15a740354c52f313e536c
Reviewed-on: https://go-review.googlesource.com/37894
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-16 14:03:45 +00:00
Carlos Eduardo Seo
d60166d5ee runtime: improve IndexByte for ppc64x
This change adds a better implementation of IndexByte for ppc64x.

Improvement for bytes·IndexByte:

benchmark                             old ns/op     new ns/op     delta
BenchmarkIndexByte/10-16              12.5          8.48          -32.16%
BenchmarkIndexByte/32-16              34.4          9.85          -71.37%
BenchmarkIndexByte/4K-16              3089          217           -92.98%
BenchmarkIndexByte/4M-16              3154810       207051        -93.44%
BenchmarkIndexByte/64M-16             50564811      5579093       -88.97%

benchmark                             old MB/s     new MB/s     speedup
BenchmarkIndexByte/10-16              800.41       1179.64      1.47x
BenchmarkIndexByte/32-16              930.60       3249.10      3.49x
BenchmarkIndexByte/4K-16              1325.71      18832.53     14.21x
BenchmarkIndexByte/4M-16              1329.49      20257.29     15.24x
BenchmarkIndexByte/64M-16             1327.19      12028.63     9.06x

Improvement for strings·IndexByte:

benchmark                             old ns/op     new ns/op     delta
BenchmarkIndexByte-16                 25.9          7.69          -70.31%

Fixes #19030

Change-Id: Ifb82bbb3d643ec44b98eaa2d08a07f47e5c2fd11
Reviewed-on: https://go-review.googlesource.com/37670
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-03-16 13:54:20 +00:00
Keith Randall
d5dc490519 cmd/compile: intrinsics for math/bits.TrailingZerosX
Implement math/bits.TrailingZerosX using intrinsics.

Generally reorganize the intrinsic spec a bit.
The instrinsics data structure is now built at init time.
This will make doing the other functions in math/bits easier.

Update sys.CtzX to return int instead of uint{64,32} so it
matches math/bits.TrailingZerosX.

Improve the intrinsics a bit for amd64.  We don't need the CMOV
for <64 bit versions.

Update #18616

Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39
Reviewed-on: https://go-review.googlesource.com/38155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 02:44:16 +00:00
Martin Möhrmann
16200c7333 runtime: make complex division c99 compatible
- changes tests to check that the real and imaginary part of the go complex
  division result is equal to the result gcc produces for c99
- changes complex division code to satisfy new complex division test
- adds float functions isNan, isFinite, isInf, abs and copysign
  in the runtime package

Fixes #14644.

name                   old time/op  new time/op  delta
Complex128DivNormal-4  21.8ns ± 6%  13.9ns ± 6%  -36.37%  (p=0.000 n=20+20)
Complex128DivNisNaN-4  14.1ns ± 1%  15.0ns ± 1%   +5.86%  (p=0.000 n=20+19)
Complex128DivDisNaN-4  12.5ns ± 1%  16.7ns ± 1%  +33.79%  (p=0.000 n=19+20)
Complex128DivNisInf-4  10.1ns ± 1%  13.0ns ± 1%  +28.25%  (p=0.000 n=20+19)
Complex128DivDisInf-4  11.0ns ± 1%  20.9ns ± 1%  +90.69%  (p=0.000 n=16+19)
ComplexAlgMap-4        86.7ns ± 1%  86.8ns ± 2%     ~     (p=0.804 n=20+20)

Change-Id: I261f3b4a81f6cc858bc7ff48f6fd1b39c300abf0
Reviewed-on: https://go-review.googlesource.com/37441
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-15 22:45:17 +00:00
Austin Clements
4b8f41daa6 runtime: print user stack on other threads during GOTRACBEACK=crash
Currently, when printing tracebacks of other threads during
GOTRACEBACK=crash, if the thread is on the system stack we print only
the header for the user goroutine and fail to print its stack. This
happens because we passed the g0 to traceback instead of curg. The g0
never has anything set in its gobuf, so traceback doesn't print
anything.

Fix this by passing _g_.m.curg to traceback instead of the g0.

Fixes #19494.

Change-Id: Idfabf94d6a725e9cdf94a3923dead6455ef3b217
Reviewed-on: https://go-review.googlesource.com/38012
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-15 22:16:12 +00:00
Austin Clements
f2e87158f0 runtime: make GOTRACEBACK=crash crash promptly in cgo binaries
GOTRACEBACK=crash works by bouncing a SIGQUIT around the process
sched.mcount times. However, sched.mcount includes the extra Ms
allocated by oneNewExtraM for cgo callbacks. Hence, if there are any
extra Ms that don't have real OS threads, we'll try to send SIGQUIT
more times than there are threads to catch it. Since nothing will
catch these extra signals, we'll fall back to blocking for five
seconds before aborting the process.

Avoid this five second delay by subtracting out the number of extra Ms
when sending SIGQUITs.

Of course, in a cgo binary, it's still possible for the SIGQUIT to go
to a cgo thread and cause some other failure mode. This does not fix
that.

Change-Id: I4fbf3c52dd721812796c4c1dcb2ab4cb7026d965
Reviewed-on: https://go-review.googlesource.com/38182
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-15 22:16:10 +00:00
Hugues Bruant
ec091b6af2 runtime: add mapassign_fast*
Add benchmarks for map assignment with int32/int64/string key

Benchmark results on darwin/amd64

name                  old time/op  new time/op  delta
MapAssignInt32_255-8  24.7ns ± 3%  17.4ns ± 2%  -29.75%  (p=0.000 n=10+10)
MapAssignInt32_64k-8  45.5ns ± 4%  37.6ns ± 4%  -17.18%  (p=0.000 n=10+10)
MapAssignInt64_255-8  26.0ns ± 3%  17.9ns ± 4%  -31.03%  (p=0.000 n=10+10)
MapAssignInt64_64k-8  46.9ns ± 5%  38.7ns ± 2%  -17.53%  (p=0.000 n=9+10)
MapAssignStr_255-8    47.8ns ± 3%  24.8ns ± 4%  -48.01%  (p=0.000 n=10+10)
MapAssignStr_64k-8    83.0ns ± 3%  51.9ns ± 3%  -37.45%  (p=0.000 n=10+9)

name                     old time/op    new time/op    delta
BinaryTree17-8              3.11s ±19%     2.78s ± 3%    ~     (p=0.095 n=5+5)
Fannkuch11-8                3.26s ± 1%     3.21s ± 2%    ~     (p=0.056 n=5+5)
FmtFprintfEmpty-8          50.3ns ± 1%    50.8ns ± 2%    ~     (p=0.246 n=5+5)
FmtFprintfString-8         82.7ns ± 4%    80.1ns ± 5%    ~     (p=0.238 n=5+5)
FmtFprintfInt-8            82.6ns ± 2%    81.9ns ± 3%    ~     (p=0.508 n=5+5)
FmtFprintfIntInt-8          124ns ± 4%     121ns ± 3%    ~     (p=0.111 n=5+5)
FmtFprintfPrefixedInt-8     158ns ± 6%     160ns ± 2%    ~     (p=0.341 n=5+5)
FmtFprintfFloat-8           249ns ± 2%     245ns ± 2%    ~     (p=0.095 n=5+5)
FmtManyArgs-8               513ns ± 2%     519ns ± 3%    ~     (p=0.151 n=5+5)
GobDecode-8                7.48ms ±12%    7.11ms ± 2%    ~     (p=0.222 n=5+5)
GobEncode-8                6.25ms ± 1%    6.03ms ± 2%  -3.56%  (p=0.008 n=5+5)
Gzip-8                      252ms ± 4%     252ms ± 4%    ~     (p=1.000 n=5+5)
Gunzip-8                   38.4ms ± 3%    38.6ms ± 2%    ~     (p=0.690 n=5+5)
HTTPClientServer-8         76.9µs ±41%    66.4µs ± 6%    ~     (p=0.310 n=5+5)
JSONEncode-8               16.5ms ± 3%    16.7ms ± 3%    ~     (p=0.421 n=5+5)
JSONDecode-8               54.6ms ± 1%    54.3ms ± 2%    ~     (p=0.548 n=5+5)
Mandelbrot200-8            4.45ms ± 3%    4.47ms ± 1%    ~     (p=0.841 n=5+5)
GoParse-8                  3.43ms ± 1%    3.32ms ± 2%  -3.28%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8      88.2ns ± 3%    89.4ns ± 2%    ~     (p=0.333 n=5+5)
RegexpMatchEasy0_1K-8       205ns ± 1%     206ns ± 1%    ~     (p=0.905 n=5+5)
RegexpMatchEasy1_32-8      85.1ns ± 1%    85.5ns ± 5%    ~     (p=0.690 n=5+5)
RegexpMatchEasy1_1K-8       365ns ± 1%     371ns ± 9%    ~     (p=1.000 n=5+5)
RegexpMatchMedium_32-8      129ns ± 2%     128ns ± 3%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8     39.8µs ± 0%    39.7µs ± 4%    ~     (p=0.730 n=4+5)
RegexpMatchHard_32-8       1.99µs ± 3%    2.05µs ±16%    ~     (p=0.794 n=5+5)
RegexpMatchHard_1K-8       59.3µs ± 1%    60.3µs ± 7%    ~     (p=1.000 n=5+5)
Revcomp-8                   1.36s ±63%     0.52s ± 5%    ~     (p=0.095 n=5+5)
Template-8                 62.6ms ±14%    60.5ms ± 5%    ~     (p=0.690 n=5+5)
TimeParse-8                 330ns ± 2%     324ns ± 2%    ~     (p=0.087 n=5+5)
TimeFormat-8                350ns ± 3%     340ns ± 1%  -2.86%  (p=0.008 n=5+5)

name                     old speed      new speed      delta
GobDecode-8               103MB/s ±11%   108MB/s ± 2%    ~     (p=0.222 n=5+5)
GobEncode-8               123MB/s ± 1%   127MB/s ± 2%  +3.71%  (p=0.008 n=5+5)
Gzip-8                   77.1MB/s ± 4%  76.9MB/s ± 3%    ~     (p=1.000 n=5+5)
Gunzip-8                  505MB/s ± 3%   503MB/s ± 2%    ~     (p=0.690 n=5+5)
JSONEncode-8              118MB/s ± 3%   116MB/s ± 3%    ~     (p=0.421 n=5+5)
JSONDecode-8             35.5MB/s ± 1%  35.8MB/s ± 2%    ~     (p=0.397 n=5+5)
GoParse-8                16.9MB/s ± 1%  17.4MB/s ± 2%  +3.45%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8     363MB/s ± 3%   358MB/s ± 2%    ~     (p=0.421 n=5+5)
RegexpMatchEasy0_1K-8    4.98GB/s ± 1%  4.97GB/s ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchEasy1_32-8     376MB/s ± 1%   375MB/s ± 5%    ~     (p=0.690 n=5+5)
RegexpMatchEasy1_1K-8    2.80GB/s ± 1%  2.76GB/s ± 9%    ~     (p=0.841 n=5+5)
RegexpMatchMedium_32-8   7.73MB/s ± 1%  7.76MB/s ± 3%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8   25.8MB/s ± 0%  25.8MB/s ± 4%    ~     (p=0.651 n=4+5)
RegexpMatchHard_32-8     16.1MB/s ± 3%  15.7MB/s ±14%    ~     (p=0.794 n=5+5)
RegexpMatchHard_1K-8     17.3MB/s ± 1%  17.0MB/s ± 7%    ~     (p=0.984 n=5+5)
Revcomp-8                 273MB/s ±83%   488MB/s ± 5%    ~     (p=0.095 n=5+5)
Template-8               31.1MB/s ±13%  32.1MB/s ± 5%    ~     (p=0.690 n=5+5)

Updates #19495

Change-Id: I116e9a2a4594769318b22d736464de8a98499909
Reviewed-on: https://go-review.googlesource.com/38091
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-13 23:43:16 +00:00
Dave Cheney
7ee43faf78 runtime: remove sizeToClass
CL 32219 added precomputed sizeclass tables.

Remove the unused sizeToClass method which was previously only
called from initSizes.

Change-Id: I907bf9ed78430ecfaabbec7fca77ef2375010081
Reviewed-on: https://go-review.googlesource.com/38113
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-13 01:55:44 +00:00
David NewHamlet
e19f184b8f runtime: use cpuset_getaffinity for runtime.NumCPU() on FreeBSD
In FreeBSD when run Go proc under a given sub-list of
processors(e.g. 'cpuset -l 0 ./a.out' in multi-core system),
runtime.NumCPU() still return all physical CPUs from sysctl
hw.ncpu instead of account from sub-list.

Fix by use syscall cpuset_getaffinity to account the number of sub-list.

Fixes #15206

Change-Id: If87c4b620e870486efa100685db5debbf1210a5b
Reviewed-on: https://go-review.googlesource.com/29341
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2017-03-10 22:06:24 +00:00
Daniel Martí
da0d23e5cd runtime: remove unused ratep parameter
Found by github.com/mvdan/unparam.

Change-Id: Iabcdfec2ae42c735aa23210b7183080d750682ca
Reviewed-on: https://go-review.googlesource.com/38030
Reviewed-by: Peter Weinberger <pjw@google.com>
Run-TryBot: Peter Weinberger <pjw@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-10 20:46:58 +00:00
Bryan C. Mills
4210930a28 runtime/cgo: return correct sa_flags
A typo in the previous revision ("act" instead of "oldact") caused us
to return the sa_flags from the new (or zeroed) sigaction rather than
the old one.

In the presence of a signal handler registered before
runtime.libpreinit, this caused setsigstack to erroneously zero out
important sa_flags (such as SA_SIGINFO) in its attempt to re-register
the existing handler with SA_ONSTACK.

Change-Id: I3cd5152a38ec0d44ae611f183bc1651d65b8a115
Reviewed-on: https://go-review.googlesource.com/37852
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-09 18:53:35 +00:00
Bryan C. Mills
e57350f4c0 runtime: fix _cgo_yield usage with sysmon and on BSD
There are a few problems from change 35494, discovered during testing
of change 37852.

1. I was confused about the usage of n.key in the sema variant, so we
   were looping on the wrong condition. The error was not caught by
   the TryBots (presumably due to missing TSAN coverage in the BSD and
   darwin builders?).

2. The sysmon goroutine sometimes skips notetsleep entirely, using
   direct usleep syscalls instead. In that case, we were not calling
   _cgo_yield, leading to missed signals under TSAN.

3. Some notetsleep calls have long finite timeouts. They should be
   broken up into smaller chunks with a yield at the end of each
   chunk.

updates #18717

Change-Id: I91175af5dea3857deebc686f51a8a40f9d690bcc
Reviewed-on: https://go-review.googlesource.com/37867
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-09 18:36:49 +00:00
Josh Bleecher Snyder
23be728950 runtime: optimize slicebytestostring
Inline rawstringtmp and simplify.
Use memmove instead of copy.

name                     old time/op  new time/op  delta
SliceByteToString/1-8    19.4ns ± 2%  14.1ns ± 1%  -27.04%  (p=0.000 n=20+17)
SliceByteToString/2-8    20.8ns ± 2%  15.5ns ± 2%  -25.46%  (p=0.000 n=20+20)
SliceByteToString/4-8    20.7ns ± 1%  14.9ns ± 1%  -28.30%  (p=0.000 n=20+20)
SliceByteToString/8-8    23.2ns ± 1%  17.1ns ± 1%  -26.22%  (p=0.000 n=19+19)
SliceByteToString/16-8   29.4ns ± 1%  23.6ns ± 1%  -19.76%  (p=0.000 n=17+20)
SliceByteToString/32-8   31.4ns ± 1%  26.0ns ± 1%  -17.11%  (p=0.000 n=16+19)
SliceByteToString/64-8   36.1ns ± 0%  30.0ns ± 0%  -16.96%  (p=0.000 n=16+16)
SliceByteToString/128-8  46.9ns ± 0%  38.9ns ± 0%  -17.15%  (p=0.000 n=17+19)

Change-Id: I422e688830e4a9bd21897d1f74964625b735f436
Reviewed-on: https://go-review.googlesource.com/37791
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-08 22:05:52 +00:00
Bryan C. Mills
29edf0f9fe runtime: poll libc to deliver signals under TSAN
fixes #18717

Change-Id: I7244463d2e7489e0b0fe3b74c4b782e71210beb2
Reviewed-on: https://go-review.googlesource.com/35494
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-08 18:58:30 +00:00
David Chase
d71f36b5aa cmd/compile: check loop rescheduling with stack bound, not counter
After benchmarking with a compiler modified to have better
spill location, it became clear that this method of checking
was actually faster on (at least) two different architectures
(ppc64 and amd64) and it also provides more timely interruption
of loops.

This change adds a modified FOR loop node "FORUNTIL" that
checks after executing the loop body instead of before (i.e.,
always at least once).  This ensures that a pointer past the
end of a slice or array is not made visible to the garbage
collector.

Without the rescheduling checks inserted, the restructured
loop from this  change apparently provides a 1% geomean
improvement on PPC64 running the go1 benchmarks; the
improvement on AMD64 is only 0.12%.

Inserting the rescheduling check exposed some peculiar bug
with the ssa test code for s390x; this was updated based on
initial code actually generated for GOARCH=s390x to use
appropriate OpArg, OpAddr, and OpVarDef.

NaCl is disabled in testing.

Change-Id: Ieafaa9a61d2a583ad00968110ef3e7a441abca50
Reviewed-on: https://go-review.googlesource.com/36206
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 18:52:12 +00:00
Russ Cox
c797256a8f runtime/pprof: add GNU build IDs to Mappings recorded from /proc/self/maps
This helps systems that maintain an external database mapping
build ID to symbol information for the given binary, especially
in the case where /proc/self/maps lists many different files
(for example, many shared libraries).

Avoid importing debug/elf to avoid dragging in that whole
package (and its dependencies like debug/dwarf) into the
build of every program that generates a profile.

Fixes #19431.

Change-Id: I6d4362a79fe23e4f1726dffb0661d20bb57f766f
Reviewed-on: https://go-review.googlesource.com/37855
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-08 01:09:18 +00:00
Austin Clements
d50f892abc runtime: join selectgo and selectgoImpl
Currently selectgo is just a wrapper around selectgoImpl. This keeps
the hard-coded frame skip counts for tracing the same between the
channel implementation and the select implementation.

However, this is fragile and confusing, so pass a skip parameter to
send and recv, join selectgo and selectgoImpl into one function, and
use decrease all of the skips in selectgo by one.

Change-Id: I11b8cbb7d805b55f5dc6ab4875ac7dde79412ff2
Reviewed-on: https://go-review.googlesource.com/37860
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-07 21:19:38 +00:00
Austin Clements
b992c2649e runtime: print SP/FP on bad pointer crashes
If the bad pointer is on a stack, this makes it possible to find the
frame containing the bad pointer.

Change-Id: Ieda44e054aa9ebf22d15d184457c7610b056dded
Reviewed-on: https://go-review.googlesource.com/37858
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-07 20:46:54 +00:00
Austin Clements
caa7dacfd2 runtime: honor GOTRACEBACK=crash even if _g_.m.traceback != 0
Change-Id: I6de1ef8f67bde044b8706c01e98400e266e1f8f0
Reviewed-on: https://go-review.googlesource.com/37857
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-07 20:46:52 +00:00
Matthew Dempsky
c310c688ff cmd/compile, runtime: simplify multiway select implementation
This commit reworks multiway select statements to use normal control
flow primitives instead of the previous setjmp/longjmp-like behavior.
This simplifies liveness analysis and should prevent issues around
"returns twice" function calls within SSA passes.

test/live.go is updated because liveness analysis's CFG is more
representative of actual control flow. The case bodies are the only
real successors of the selectgo call, but previously the selectsend,
selectrecv, etc. calls were included in the successors list too.

Updates #19331.

Change-Id: I7f879b103a4b85e62fc36a270d812f54c0aa3e83
Reviewed-on: https://go-review.googlesource.com/37661
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-07 20:14:17 +00:00
Daniel Martí
5ed952368e runtime/pprof: actually use tag parameter
It's only ever called with the value it was using, but the code was
counterintuitive. Use the parameter instead, like the other funcs near
it.

Found by github.com/mvdan/unparam.

Change-Id: I45855e11d749380b9b2a28e6dd1d5dedf119a19b
Reviewed-on: https://go-review.googlesource.com/37893
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-07 20:01:05 +00:00
Elias Naur
b91b694b37 runtime/pprof: fix the protobuf tests on Android
Change-Id: I5f85a7980b9a18d3641c4ee8b0992671a8421bb0
Reviewed-on: https://go-review.googlesource.com/37896
Run-TryBot: Elias Naur <elias.naur@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-07 19:18:04 +00:00
Austin Clements
e4f73769bc runtime: strongly encourage CallersFrames with the result of Callers
For historical reasons, it's still commonplace to iterate over the
slice returned by runtime.Callers and call FuncForPC on each PC. This
is broken in gccgo and somewhat broken in gc and will become more
broken in gc with mid-stack inlining.

In Go 1.7, we introduced runtime.CallersFrames to deal with these
problems, but didn't strongly direct people toward using it. Reword
the documentation on runtime.Callers to more strongly encourage people
to use CallersFrames and explicitly discourage them from iterating
over the PCs or using FuncForPC on the results.

Fixes #19426.

Change-Id: Id0d14cb51a0e9521c8fdde9612610f2c2b9383c4
Reviewed-on: https://go-review.googlesource.com/37726
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-06 20:52:00 +00:00