1
0
mirror of https://github.com/golang/go synced 2024-11-19 20:44:40 -07:00
Commit Graph

2601 Commits

Author SHA1 Message Date
Austin Clements
eee85fc5a1 runtime: snapshot heap profile during mark termination
Currently we snapshot the heap profile just *after* mark termination
starts the world because it's a relatively expensive operation.
However, this means any alloc or free events that happen between
starting the world and snapshotting the heap profile can be accounted
to the wrong cycle. In the worst case, a free can be accounted to the
cycle before the alloc; if the heap is small, this can result
temporarily in a negative "in use" count in the profile.

Fix this without making STW more expensive by using a global heap
profile cycle counter. This lets us split up the operation into a two
parts: 1) a super-cheap snapshot operation that simply increments the
global cycle counter during STW, and 2) a more expensive cleanup
operation we can do after starting the world that frees up a slot in
all buckets for use by the next heap profile cycle.

Fixes #19311.

Change-Id: I6bdafabf111c48b3d26fe2d91267f7bef0bd4270
Reviewed-on: https://go-review.googlesource.com/37714
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:14:56 +00:00
Austin Clements
3ebe7d7d11 runtime: pull heap profile cycle into a type
Currently memRecord has the same set of four fields repeated three
times. Pull these into a type and use this type three times. This
cleans up and simplifies the code a bit and will make it easier to
switch to a globally tracked heap profile cycle for #19311.

Change-Id: I414d15673feaa406a8366b48784437c642997cf2
Reviewed-on: https://go-review.googlesource.com/37713
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:14:54 +00:00
Austin Clements
673a8fdfe6 runtime: diagram flow of stats through heap profile
Every time I modify heap profiling, I find myself redrawing this
diagram, so add it to the comments. This shows how allocations and
frees are accounted, how we arrive at consistent profile snapshots,
and when those snapshots are published to the user.

Change-Id: I106aba1200af3c773b46e24e5f50205e808e2c69
Reviewed-on: https://go-review.googlesource.com/37514
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 00:46:18 +00:00
Austin Clements
ef1829d1de runtime: improve TestMemStats checks
Now that we have a nice predicate system, improve the tests performed
by TestMemStats. We add some more non-zero checks (now that we force a
GC, things like NumGC must be non-zero), checks for trivial boolean
fields, and a few more range checks.

Change-Id: I6da46d33fa0ce5738407ee57d587825479413171
Reviewed-on: https://go-review.googlesource.com/37513
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 00:46:16 +00:00
Austin Clements
bda74b0e4a runtime: make TestMemStats failure messages useful
Currently most TestMemStats failures dump the whole MemStats object if
anything is amiss without telling you what is amiss, or even which
field is wrong. This makes it hard to figure out what the actual
problem is.

Replace this with a reflection walk over MemStats and a map of
predicates to check. If one fails, we can construct a detailed and
descriptive error message. The predicates are a direct translation of
the current tests.

Change-Id: I5a7cafb8e6a1eeab653d2e18bb74e2245eaa5444
Reviewed-on: https://go-review.googlesource.com/37512
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 00:46:14 +00:00
Caleb Spare
592037f381 runtime: fix for implementation notes appearing in godoc
Change-Id: I31cfae1e98313b68e3bc8f49079491d2725a662b
Reviewed-on: https://go-review.googlesource.com/38850
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-29 22:32:57 +00:00
David Lazar
7bf0adc6ad runtime: include inlined calls in result of CallersFrames
Change-Id: If1a3396175f2afa607d56efd1444181334a9ae3e
Reviewed-on: https://go-review.googlesource.com/37862
Reviewed-by: Austin Clements <austin@google.com>
2017-03-29 17:27:38 +00:00
David Lazar
ee97216a17 runtime: handle inlined calls in runtime.Callers
The `skip` argument passed to runtime.Caller and runtime.Callers should
be interpreted as the number of logical calls to skip (rather than the
number of physical stack frames to skip). This changes runtime.Callers
to skip inlined calls in addition to physical stack frames.

The result value of runtime.Callers is a slice of program counters
([]uintptr) representing physical stack frames. If the `skip` parameter
to runtime.Callers skips part-way into a physical frame, there is no
convenient way to encode that in the resulting slice. To avoid changing
the API in an incompatible way, our solution is to store the number of
skipped logical calls of the first frame in the _second_ uintptr
returned by runtime.Callers. Since this number is a small integer, we
encode it as a valid PC value into a small symbol called:

    runtime.skipPleaseUseCallersFrames

For example, if f() calls g(), g() calls `runtime.Callers(2, pcs)`, and
g() is inlined into f, then the frame for f will be partially skipped,
resulting in the following slice:

    pcs = []uintptr{pc_in_f, runtime.skipPleaseUseCallersFrames+1, ...}

We store the skip PC in pcs[1] instead of pcs[0] so that `pcs[i:]` will
truncate the captured stack trace rather than grow it for all i.

Updates #19348.

Change-Id: I1c56f89ac48c29e6f52a5d085567c6d77d499cf1
Reviewed-on: https://go-review.googlesource.com/37854
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-29 17:22:08 +00:00
Rick Hudson
6e9ec14186 runtime: redo insert/remove of large spans
Currently for spans with up to 1 MBytes (128 pages) we
maintain an array indexed by the number of pages in the
span. This is efficient both in terms of space as well
as time to insert or remove a span of a particular size.

Unfortunately for spans larger than 1 MByte we currently
place them on a separate linked list. This results in
O(n) behavior. Now that we are seeing heaps approaching
100 GBytes n is large enough to be noticed in real programs.

This change replaces the linked list now used with a balanced
binary tree structure called a treap. A treap is a
probabilistically balanced tree offering O(logN) behavior for
inserting and removing spans.

To verify that this approach will work we start with noting
that only spans with sizes > 1MByte will be put into the treap.
This means that to support 1 TByte a treap will need at most
1 million nodes and can ideally be held in a treap with a
depth of 20. Experiments with adding and removing randomly
sized spans from the treap seem to result in treaps with
depths of about twice the ideal or 40. A petabyte would
require a tree of only twice again that depth again so this
algorithm should last well into the future.

Fixes #19393

Go1 benchmarks indicate this is basically an overall wash.
Tue Mar 28 21:29:21 EDT 2017
name                     old time/op    new time/op    delta
BinaryTree17-4              2.42s ± 1%     2.42s ± 1%    ~     (p=0.980 n=21+21)
Fannkuch11-4                3.00s ± 1%     3.18s ± 4%  +6.10%  (p=0.000 n=22+24)
FmtFprintfEmpty-4          40.5ns ± 1%    40.3ns ± 3%    ~     (p=0.692 n=22+25)
FmtFprintfString-4         65.9ns ± 3%    64.6ns ± 1%  -1.98%  (p=0.000 n=24+23)
FmtFprintfInt-4            69.6ns ± 1%    68.0ns ± 7%  -2.30%  (p=0.001 n=21+22)
FmtFprintfIntInt-4          102ns ± 2%      99ns ± 1%  -3.07%  (p=0.000 n=23+23)
FmtFprintfPrefixedInt-4     126ns ± 0%     125ns ± 0%  -0.79%  (p=0.000 n=19+17)
FmtFprintfFloat-4           206ns ± 2%     205ns ± 1%    ~     (p=0.671 n=23+21)
FmtManyArgs-4               441ns ± 1%     445ns ± 1%  +0.88%  (p=0.000 n=22+23)
GobDecode-4                5.73ms ± 1%    5.86ms ± 1%  +2.37%  (p=0.000 n=23+22)
GobEncode-4                4.51ms ± 1%    4.89ms ± 1%  +8.32%  (p=0.000 n=22+22)
Gzip-4                      197ms ± 0%     202ms ± 1%  +2.75%  (p=0.000 n=23+24)
Gunzip-4                   32.9ms ± 8%    32.7ms ± 2%    ~     (p=0.466 n=23+24)
HTTPClientServer-4         57.3µs ± 1%    56.7µs ± 1%  -0.94%  (p=0.000 n=21+22)
JSONEncode-4               13.8ms ± 1%    13.9ms ± 2%  +1.14%  (p=0.000 n=22+23)
JSONDecode-4               47.4ms ± 1%    48.1ms ± 1%  +1.49%  (p=0.000 n=23+23)
Mandelbrot200-4            3.92ms ± 0%    3.92ms ± 1%  +0.21%  (p=0.000 n=22+22)
GoParse-4                  2.89ms ± 1%    2.87ms ± 1%  -0.68%  (p=0.000 n=21+22)
RegexpMatchEasy0_32-4      73.6ns ± 1%    72.0ns ± 2%  -2.15%  (p=0.000 n=21+22)
RegexpMatchEasy0_1K-4       173ns ± 1%     173ns ± 1%    ~     (p=0.847 n=22+24)
RegexpMatchEasy1_32-4      71.9ns ± 1%    69.8ns ± 1%  -2.99%  (p=0.000 n=23+20)
RegexpMatchEasy1_1K-4       314ns ± 1%     308ns ± 1%  -1.91%  (p=0.000 n=22+23)
RegexpMatchMedium_32-4      106ns ± 0%     105ns ± 1%  -0.58%  (p=0.000 n=19+21)
RegexpMatchMedium_1K-4     34.3µs ± 1%    34.3µs ± 1%    ~     (p=0.871 n=23+22)
RegexpMatchHard_32-4       1.67µs ± 1%    1.67µs ± 7%    ~     (p=0.224 n=22+23)
RegexpMatchHard_1K-4       51.5µs ± 1%    50.4µs ± 1%  -1.99%  (p=0.000 n=22+23)
Revcomp-4                   383ms ± 1%     415ms ± 0%  +8.51%  (p=0.000 n=22+22)
Template-4                 51.5ms ± 1%    51.5ms ± 1%    ~     (p=0.555 n=20+23)
TimeParse-4                 279ns ± 2%     277ns ± 1%  -0.95%  (p=0.000 n=24+22)
TimeFormat-4                294ns ± 1%     296ns ± 1%  +0.58%  (p=0.003 n=24+23)
[Geo mean]                 43.7µs         43.8µs       +0.32%

name                     old speed      new speed      delta
GobDecode-4               134MB/s ± 1%   131MB/s ± 1%  -2.32%  (p=0.000 n=23+22)
GobEncode-4               170MB/s ± 1%   157MB/s ± 1%  -7.68%  (p=0.000 n=22+22)
Gzip-4                   98.7MB/s ± 0%  96.1MB/s ± 1%  -2.68%  (p=0.000 n=23+24)
Gunzip-4                  590MB/s ± 7%   593MB/s ± 2%    ~     (p=0.466 n=23+24)
JSONEncode-4              141MB/s ± 1%   139MB/s ± 2%  -1.13%  (p=0.000 n=22+23)
JSONDecode-4             40.9MB/s ± 1%  40.3MB/s ± 0%  -1.47%  (p=0.000 n=23+23)
GoParse-4                20.1MB/s ± 1%  20.2MB/s ± 1%  +0.69%  (p=0.000 n=21+22)
RegexpMatchEasy0_32-4     435MB/s ± 1%   444MB/s ± 2%  +2.21%  (p=0.000 n=21+22)
RegexpMatchEasy0_1K-4    5.89GB/s ± 1%  5.89GB/s ± 1%    ~     (p=0.439 n=22+24)
RegexpMatchEasy1_32-4     445MB/s ± 1%   459MB/s ± 1%  +3.06%  (p=0.000 n=23+20)
RegexpMatchEasy1_1K-4    3.26GB/s ± 1%  3.32GB/s ± 1%  +1.97%  (p=0.000 n=22+23)
RegexpMatchMedium_32-4   9.40MB/s ± 1%  9.44MB/s ± 1%  +0.43%  (p=0.000 n=23+21)
RegexpMatchMedium_1K-4   29.8MB/s ± 1%  29.8MB/s ± 1%    ~     (p=0.826 n=23+22)
RegexpMatchHard_32-4     19.1MB/s ± 1%  19.1MB/s ± 7%    ~     (p=0.233 n=22+23)
RegexpMatchHard_1K-4     19.9MB/s ± 1%  20.3MB/s ± 1%  +2.03%  (p=0.000 n=22+23)
Revcomp-4                 664MB/s ± 1%   612MB/s ± 0%  -7.85%  (p=0.000 n=22+22)
Template-4               37.6MB/s ± 1%  37.7MB/s ± 1%    ~     (p=0.558 n=20+23)
[Geo mean]                134MB/s        133MB/s       -0.76%
Tue Mar 28 22:16:54 EDT 2017

Change-Id: I4a4f5c2b53d3fb85ef76c98522d3ed5cf8ae5b7e
Reviewed-on: https://go-review.googlesource.com/38732
Reviewed-by: Russ Cox <rsc@golang.org>
2017-03-29 14:18:24 +00:00
Elias Naur
2b4274d667 runtime/cgo: CFRelease result from CFBundleCopyResourceURL
The result from CFBundleCopyResourceURL is owned by the caller. This
CL adds the necessary CFRelease to release it after use.

Fixes #19722

Change-Id: I7afe22ef241d21922a7f5cef6498017e6269a5c3
Reviewed-on: https://go-review.googlesource.com/38639
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2017-03-27 18:12:17 +00:00
Austin Clements
4234d1decd runtime: improve systemstack-on-Go stack message
We reused the old C stack check mechanism for the implementation of
//go:systemstack, so when we execute a //go:systemstack function on a
user stack, the system fails by calling morestackc. However,
morestackc's message still talks about "executing C code".

Fix morestackc's message to reflect its modern usage.

Change-Id: I7e70e7980eab761c0520f675d3ce89486496030f
Reviewed-on: https://go-review.googlesource.com/38572
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-27 14:53:12 +00:00
Elias Naur
0476c7a7b5 runtime/cgo: raise the thread-local storage slot search limit on Android
On Android, the thread local offset is found by looping through memory
starting at the TLS base address. The search is limited to
PTHREAD_KEYS_MAX, but issue 19472 made it clear that in some cases, the
slot is located further from the TLS base.

The limit is merely a sanity check in case our assumptions about the
thread-local storage layout are wrong, so this CL raises it to 384, which
is enough for the test case in issue 19472.

Fixes #19472

Change-Id: I89d1db3e9739d3a7fff5548ae487a7483c0a278a
Reviewed-on: https://go-review.googlesource.com/38636
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-27 08:56:08 +00:00
Elias Naur
aa4c2ca316 runtime/pprof: fix proto tests on NetBSD
The proto_test tests are failing on NetBSD:

https://build.golang.org/log/a3a577144ac48c6ef8e384ce6a700ad30549fb78

the failures seem similar to previous failures on Android:

https://build.golang.org/log/b5786e0cd6d5941dc37b6a50be5172f6b99e22f0

The Android failures where fixed by CL 37896. This CL is an attempt
to fix the NetBSD failures with a similar fix.

Change-Id: I3834afa5b32303ca226e6a31f0f321f66fef9a3f
Reviewed-on: https://go-review.googlesource.com/38637
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-27 08:55:14 +00:00
Keith Randall
e67d881bc3 cmd/compile: simplify efaceeq and ifaceeq
Clean up code that does interface equality. Avoid doing checks
in efaceeq/ifaceeq that we already did before calling those routines.

No noticeable performance changes for existing benchmarks.

name            old time/op  new time/op  delta
EfaceCmpDiff-8   604ns ± 1%   553ns ± 1%  -8.41%  (p=0.000 n=9+10)

Fixes #18618

Change-Id: I3bd46db82b96494873045bc3300c56400bc582eb
Reviewed-on: https://go-review.googlesource.com/38606
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2017-03-24 23:03:09 +00:00
Cherry Zhang
3a1ce1085a runtime: access _cgo_yield indirectly
The darwin linker for ARM does not allow PC-relative relocation
of external symbol in text section. Work around it by accessing
it indirectly: putting its address in a global variable (which is
not external), and accessing through that variable.

Fixes #19684.

Change-Id: I41361bbb281b5dbdda0d100ae49d32c69ed85a81
Reviewed-on: https://go-review.googlesource.com/38596
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
2017-03-24 15:37:56 +00:00
Carlos Eduardo Seo
189053aee2 runtime/internal/atomic: Remove unnecessary checks for GOARCH_ppc64
Starting in go1.9, the minimum processor requirement for ppc64 is POWER8. This
means the checks for GOARCH_ppc64 in asm_ppc64x.s can be removed, since we can
assume LBAR and STBCCC instructions (both from ISA 2.06) will always be
available.

Updates #19074

Change-Id: Ib4418169cd9fc6f871a5ab126b28ee58a2f349e2
Reviewed-on: https://go-review.googlesource.com/38406
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-22 18:14:41 +00:00
Josselin Costanzi
01cd22c687 bytes: add optimized countByte for amd64
Use SSE/AVX2 when counting a single byte.
Inspired from runtime indexbyte implementation.

Benchmark against previous implementation, where
1 byte in every 8 is the one we are looking for:

* On a machine without AVX2
name               old time/op   new time/op     delta
CountSingle/10-4    61.8ns ±10%     15.6ns ±11%    -74.83%  (p=0.000 n=10+10)
CountSingle/32-4     100ns ± 4%       17ns ±10%    -82.54%  (p=0.000 n=10+9)
CountSingle/4K-4    9.66µs ± 3%     0.37µs ± 6%    -96.21%  (p=0.000 n=10+10)
CountSingle/4M-4    11.0ms ± 6%      0.4ms ± 4%    -96.04%  (p=0.000 n=10+10)
CountSingle/64M-4    194ms ± 8%        8ms ± 2%    -95.64%  (p=0.000 n=10+10)

name               old speed     new speed       delta
CountSingle/10-4   162MB/s ±10%    645MB/s ±10%   +297.00%  (p=0.000 n=10+10)
CountSingle/32-4   321MB/s ± 5%   1844MB/s ± 9%   +474.79%  (p=0.000 n=10+9)
CountSingle/4K-4   424MB/s ± 3%  11169MB/s ± 6%  +2533.10%  (p=0.000 n=10+10)
CountSingle/4M-4   381MB/s ± 7%   9609MB/s ± 4%  +2421.88%  (p=0.000 n=10+10)
CountSingle/64M-4  346MB/s ± 7%   7924MB/s ± 2%  +2188.78%  (p=0.000 n=10+10)

* On a machine with AVX2
name               old time/op   new time/op     delta
CountSingle/10-8    37.1ns ± 3%      8.2ns ± 1%    -77.80%  (p=0.000 n=10+10)
CountSingle/32-8    66.1ns ± 3%      9.8ns ± 2%    -85.23%  (p=0.000 n=10+10)
CountSingle/4K-8    7.36µs ± 3%     0.11µs ± 1%    -98.54%  (p=0.000 n=10+10)
CountSingle/4M-8    7.46ms ± 2%     0.15ms ± 2%    -97.95%  (p=0.000 n=10+9)
CountSingle/64M-8    124ms ± 2%        6ms ± 4%    -95.09%  (p=0.000 n=10+10)

name               old speed     new speed       delta
CountSingle/10-8   269MB/s ± 3%   1213MB/s ± 1%   +350.32%  (p=0.000 n=10+10)
CountSingle/32-8   484MB/s ± 4%   3277MB/s ± 2%   +576.66%  (p=0.000 n=10+10)
CountSingle/4K-8   556MB/s ± 3%  37933MB/s ± 1%  +6718.36%  (p=0.000 n=10+10)
CountSingle/4M-8   562MB/s ± 2%  27444MB/s ± 3%  +4783.43%  (p=0.000 n=10+9)
CountSingle/64M-8  543MB/s ± 2%  11054MB/s ± 3%  +1935.81%  (p=0.000 n=10+10)

Fixes #19411

Change-Id: Ieaf20b1fabccabe767c55c66e242e86f3617f883
Reviewed-on: https://go-review.googlesource.com/38258
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-21 20:25:17 +00:00
Daniel Martí
2e29eb57db runtime: remove unused *chantype parameters
The chanrecv funcs don't use it at all. The chansend ones do, but the
element type is now part of the hchan struct, which is already a
parameter.

hchan can be nil in chansend when sending to a nil channel, so when
instrumenting we must copy to the stack to be able to read the channel
type.

name             old time/op  new time/op  delta
ChanUncontended  6.42µs ± 1%  6.22µs ± 0%  -3.06%  (p=0.000 n=19+18)

Initially found by github.com/mvdan/unparam.

Fixes #19591.

Change-Id: I3a5e8a0082e8445cc3f0074695e3593fd9c88412
Reviewed-on: https://go-review.googlesource.com/38351
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-21 17:10:16 +00:00
Vladimir Stefanovic
24dc8c6cb5 cmd/compile,runtime: fix atomic And8 for mipsle
Removing stray xori that came from big endian copy/paste.
Adding atomicand8 check to runtime.check() that would have revealed
this error.
Might fix #19396.

Change-Id: If8d6f25d3e205496163541eb112548aa66df9c2a
Reviewed-on: https://go-review.googlesource.com/38257
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-21 16:03:12 +00:00
Hugues Bruant
5d6b7fcaa1 runtime: add mapdelete_fast*
Add benchmarks for map delete with int32/int64/string key

Benchmark results on darwin/amd64

name                 old time/op  new time/op  delta
MapDelete/Int32/1-8   151ns ± 8%    99ns ± 3%  -34.39%  (p=0.008 n=5+5)
MapDelete/Int32/2-8   128ns ± 2%   111ns ±15%  -13.40%  (p=0.040 n=5+5)
MapDelete/Int32/4-8   128ns ± 5%   114ns ± 2%  -10.82%  (p=0.008 n=5+5)
MapDelete/Int64/1-8   144ns ± 0%   104ns ± 3%  -27.53%  (p=0.016 n=4+5)
MapDelete/Int64/2-8   153ns ± 1%   126ns ± 3%  -17.17%  (p=0.008 n=5+5)
MapDelete/Int64/4-8   178ns ± 3%   136ns ± 2%  -23.60%  (p=0.008 n=5+5)
MapDelete/Str/1-8     187ns ± 3%   171ns ± 3%   -8.54%  (p=0.008 n=5+5)
MapDelete/Str/2-8     221ns ± 3%   206ns ± 4%   -7.18%  (p=0.016 n=5+4)
MapDelete/Str/4-8     256ns ± 5%   232ns ± 2%   -9.36%  (p=0.016 n=4+5)

name                     old time/op    new time/op    delta
BinaryTree17-8              2.78s ± 7%     2.70s ± 1%    ~     (p=0.151 n=5+5)
Fannkuch11-8                3.21s ± 2%     3.19s ± 1%    ~     (p=0.310 n=5+5)
FmtFprintfEmpty-8          49.1ns ± 3%    50.2ns ± 2%    ~     (p=0.095 n=5+5)
FmtFprintfString-8         78.6ns ± 4%    80.2ns ± 5%    ~     (p=0.460 n=5+5)
FmtFprintfInt-8            79.7ns ± 1%    81.0ns ± 3%    ~     (p=0.103 n=5+5)
FmtFprintfIntInt-8          117ns ± 2%     119ns ± 0%    ~     (p=0.079 n=5+4)
FmtFprintfPrefixedInt-8     153ns ± 1%     146ns ± 3%  -4.19%  (p=0.024 n=5+5)
FmtFprintfFloat-8           239ns ± 1%     237ns ± 1%    ~     (p=0.246 n=5+5)
FmtManyArgs-8               506ns ± 2%     509ns ± 2%    ~     (p=0.238 n=5+5)
GobDecode-8                7.06ms ± 4%    6.86ms ± 1%    ~     (p=0.222 n=5+5)
GobEncode-8                6.01ms ± 5%    5.87ms ± 2%    ~     (p=0.222 n=5+5)
Gzip-8                      246ms ± 4%     236ms ± 1%  -4.12%  (p=0.008 n=5+5)
Gunzip-8                   37.7ms ± 4%    37.3ms ± 1%    ~     (p=0.841 n=5+5)
HTTPClientServer-8         64.9µs ± 1%    64.4µs ± 0%  -0.80%  (p=0.032 n=5+4)
JSONEncode-8               16.0ms ± 2%    16.2ms ±11%    ~     (p=0.548 n=5+5)
JSONDecode-8               53.2ms ± 2%    53.1ms ± 4%    ~     (p=1.000 n=5+5)
Mandelbrot200-8            4.33ms ± 2%    4.32ms ± 2%    ~     (p=0.841 n=5+5)
GoParse-8                  3.24ms ± 2%    3.27ms ± 4%    ~     (p=0.690 n=5+5)
RegexpMatchEasy0_32-8      86.2ns ± 1%    85.2ns ± 3%    ~     (p=0.286 n=5+5)
RegexpMatchEasy0_1K-8       198ns ± 2%     199ns ± 1%    ~     (p=0.310 n=5+5)
RegexpMatchEasy1_32-8      82.6ns ± 2%    81.8ns ± 1%    ~     (p=0.294 n=5+5)
RegexpMatchEasy1_1K-8       359ns ± 2%     354ns ± 1%  -1.39%  (p=0.048 n=5+5)
RegexpMatchMedium_32-8      123ns ± 2%     123ns ± 1%    ~     (p=0.905 n=5+5)
RegexpMatchMedium_1K-8     38.2µs ± 2%    38.6µs ± 8%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32-8       1.92µs ± 2%    1.91µs ± 5%    ~     (p=0.460 n=5+5)
RegexpMatchHard_1K-8       57.6µs ± 1%    57.0µs ± 2%    ~     (p=0.310 n=5+5)
Revcomp-8                   483ms ± 7%     441ms ± 1%  -8.79%  (p=0.016 n=5+4)
Template-8                 58.0ms ± 1%    58.2ms ± 7%    ~     (p=0.310 n=5+5)
TimeParse-8                 324ns ± 6%     312ns ± 2%    ~     (p=0.087 n=5+5)
TimeFormat-8                330ns ± 1%     329ns ± 1%    ~     (p=0.968 n=5+5)

name                     old speed      new speed      delta
GobDecode-8               109MB/s ± 4%   112MB/s ± 1%    ~     (p=0.222 n=5+5)
GobEncode-8               128MB/s ± 5%   131MB/s ± 2%    ~     (p=0.222 n=5+5)
Gzip-8                   78.9MB/s ± 4%  82.3MB/s ± 1%  +4.25%  (p=0.008 n=5+5)
Gunzip-8                  514MB/s ± 4%   521MB/s ± 1%    ~     (p=0.841 n=5+5)
JSONEncode-8              121MB/s ± 2%   120MB/s ±10%    ~     (p=0.548 n=5+5)
JSONDecode-8             36.5MB/s ± 2%  36.6MB/s ± 4%    ~     (p=1.000 n=5+5)
GoParse-8                17.9MB/s ± 2%  17.7MB/s ± 4%    ~     (p=0.730 n=5+5)
RegexpMatchEasy0_32-8     371MB/s ± 1%   375MB/s ± 3%    ~     (p=0.310 n=5+5)
RegexpMatchEasy0_1K-8    5.15GB/s ± 1%  5.13GB/s ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchEasy1_32-8     387MB/s ± 2%   391MB/s ± 1%    ~     (p=0.310 n=5+5)
RegexpMatchEasy1_1K-8    2.85GB/s ± 2%  2.89GB/s ± 1%    ~     (p=0.056 n=5+5)
RegexpMatchMedium_32-8   8.07MB/s ± 2%  8.06MB/s ± 1%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8   26.8MB/s ± 2%  26.6MB/s ± 7%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32-8     16.7MB/s ± 2%  16.7MB/s ± 5%    ~     (p=0.421 n=5+5)
RegexpMatchHard_1K-8     17.8MB/s ± 1%  18.0MB/s ± 2%    ~     (p=0.310 n=5+5)
Revcomp-8                 527MB/s ± 6%   577MB/s ± 1%  +9.44%  (p=0.016 n=5+4)
Template-8               33.5MB/s ± 1%  33.4MB/s ± 7%    ~     (p=0.310 n=5+5)

Updates #19495

Change-Id: Ib9ece1690813d9b4788455db43d30891e2138df5
Reviewed-on: https://go-review.googlesource.com/38172
Reviewed-by: Hugues Bruant <hugues.bruant@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-21 06:07:24 +00:00
Alex Brainman
f2b79cadfd runtime: import os package in BenchmarkRunningGoProgram
I would like to use BenchmarkRunningGoProgram to measure
changes for issue #15588. So the program in the benchmark
should import "os" package.

It is also reasonable that basic Go program includes
"os" package.

For #15588.

Change-Id: Ida6712eab22c2e79fbe91b6fdd492eaf31756852
Reviewed-on: https://go-review.googlesource.com/37914
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-21 05:59:45 +00:00
Ian Lance Taylor
5dc14af682 runtime: clear signal stack on main thread
This is a workaround for a FreeBSD kernel bug. It can be removed when
we are confident that all people are using the fixed kernel. See #15658.

Updates #15658.

Change-Id: I0ecdccb77ddd0c270bdeac4d3a5c8abaf0449075
Reviewed-on: https://go-review.googlesource.com/38325
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-20 22:59:26 +00:00
Austin Clements
df6025bc0d runtime: disallow malloc or panic in scavenge
Mallocs and panics in the scavenge path are particularly nasty because
they're likely to silently self-deadlock on the mheap.lock. Avoid
sinking lots of time into debugging these issues in the future by
turning these into immediate throws.

Change-Id: Ib36fdda33bc90b21c32432b03561630c1f3c69bc
Reviewed-on: https://go-review.googlesource.com/38293
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-19 22:42:28 +00:00
Austin Clements
13ae271d5d runtime: introduce a type for lfstacks
The lfstack API is still a C-style API: lfstacks all have unhelpful
type uint64 and the APIs are package-level functions. Make the code
more readable and Go-style by creating an lfstack type with methods
for push, pop, and empty.

Change-Id: I64685fa3be0e82ae2d1a782a452a50974440a827
Reviewed-on: https://go-review.googlesource.com/38290
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-19 22:42:24 +00:00
Daniel Martí
77b09b8b8d runtime: remove unused g parameter
Found by github.com/mvdan/unparam.

Change-Id: I20145440ff1bcd27fcf15a740354c52f313e536c
Reviewed-on: https://go-review.googlesource.com/37894
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-16 14:03:45 +00:00
Carlos Eduardo Seo
d60166d5ee runtime: improve IndexByte for ppc64x
This change adds a better implementation of IndexByte for ppc64x.

Improvement for bytes·IndexByte:

benchmark                             old ns/op     new ns/op     delta
BenchmarkIndexByte/10-16              12.5          8.48          -32.16%
BenchmarkIndexByte/32-16              34.4          9.85          -71.37%
BenchmarkIndexByte/4K-16              3089          217           -92.98%
BenchmarkIndexByte/4M-16              3154810       207051        -93.44%
BenchmarkIndexByte/64M-16             50564811      5579093       -88.97%

benchmark                             old MB/s     new MB/s     speedup
BenchmarkIndexByte/10-16              800.41       1179.64      1.47x
BenchmarkIndexByte/32-16              930.60       3249.10      3.49x
BenchmarkIndexByte/4K-16              1325.71      18832.53     14.21x
BenchmarkIndexByte/4M-16              1329.49      20257.29     15.24x
BenchmarkIndexByte/64M-16             1327.19      12028.63     9.06x

Improvement for strings·IndexByte:

benchmark                             old ns/op     new ns/op     delta
BenchmarkIndexByte-16                 25.9          7.69          -70.31%

Fixes #19030

Change-Id: Ifb82bbb3d643ec44b98eaa2d08a07f47e5c2fd11
Reviewed-on: https://go-review.googlesource.com/37670
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-03-16 13:54:20 +00:00
Keith Randall
d5dc490519 cmd/compile: intrinsics for math/bits.TrailingZerosX
Implement math/bits.TrailingZerosX using intrinsics.

Generally reorganize the intrinsic spec a bit.
The instrinsics data structure is now built at init time.
This will make doing the other functions in math/bits easier.

Update sys.CtzX to return int instead of uint{64,32} so it
matches math/bits.TrailingZerosX.

Improve the intrinsics a bit for amd64.  We don't need the CMOV
for <64 bit versions.

Update #18616

Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39
Reviewed-on: https://go-review.googlesource.com/38155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 02:44:16 +00:00
Martin Möhrmann
16200c7333 runtime: make complex division c99 compatible
- changes tests to check that the real and imaginary part of the go complex
  division result is equal to the result gcc produces for c99
- changes complex division code to satisfy new complex division test
- adds float functions isNan, isFinite, isInf, abs and copysign
  in the runtime package

Fixes #14644.

name                   old time/op  new time/op  delta
Complex128DivNormal-4  21.8ns ± 6%  13.9ns ± 6%  -36.37%  (p=0.000 n=20+20)
Complex128DivNisNaN-4  14.1ns ± 1%  15.0ns ± 1%   +5.86%  (p=0.000 n=20+19)
Complex128DivDisNaN-4  12.5ns ± 1%  16.7ns ± 1%  +33.79%  (p=0.000 n=19+20)
Complex128DivNisInf-4  10.1ns ± 1%  13.0ns ± 1%  +28.25%  (p=0.000 n=20+19)
Complex128DivDisInf-4  11.0ns ± 1%  20.9ns ± 1%  +90.69%  (p=0.000 n=16+19)
ComplexAlgMap-4        86.7ns ± 1%  86.8ns ± 2%     ~     (p=0.804 n=20+20)

Change-Id: I261f3b4a81f6cc858bc7ff48f6fd1b39c300abf0
Reviewed-on: https://go-review.googlesource.com/37441
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-15 22:45:17 +00:00
Austin Clements
4b8f41daa6 runtime: print user stack on other threads during GOTRACBEACK=crash
Currently, when printing tracebacks of other threads during
GOTRACEBACK=crash, if the thread is on the system stack we print only
the header for the user goroutine and fail to print its stack. This
happens because we passed the g0 to traceback instead of curg. The g0
never has anything set in its gobuf, so traceback doesn't print
anything.

Fix this by passing _g_.m.curg to traceback instead of the g0.

Fixes #19494.

Change-Id: Idfabf94d6a725e9cdf94a3923dead6455ef3b217
Reviewed-on: https://go-review.googlesource.com/38012
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-15 22:16:12 +00:00
Austin Clements
f2e87158f0 runtime: make GOTRACEBACK=crash crash promptly in cgo binaries
GOTRACEBACK=crash works by bouncing a SIGQUIT around the process
sched.mcount times. However, sched.mcount includes the extra Ms
allocated by oneNewExtraM for cgo callbacks. Hence, if there are any
extra Ms that don't have real OS threads, we'll try to send SIGQUIT
more times than there are threads to catch it. Since nothing will
catch these extra signals, we'll fall back to blocking for five
seconds before aborting the process.

Avoid this five second delay by subtracting out the number of extra Ms
when sending SIGQUITs.

Of course, in a cgo binary, it's still possible for the SIGQUIT to go
to a cgo thread and cause some other failure mode. This does not fix
that.

Change-Id: I4fbf3c52dd721812796c4c1dcb2ab4cb7026d965
Reviewed-on: https://go-review.googlesource.com/38182
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-15 22:16:10 +00:00
Hugues Bruant
ec091b6af2 runtime: add mapassign_fast*
Add benchmarks for map assignment with int32/int64/string key

Benchmark results on darwin/amd64

name                  old time/op  new time/op  delta
MapAssignInt32_255-8  24.7ns ± 3%  17.4ns ± 2%  -29.75%  (p=0.000 n=10+10)
MapAssignInt32_64k-8  45.5ns ± 4%  37.6ns ± 4%  -17.18%  (p=0.000 n=10+10)
MapAssignInt64_255-8  26.0ns ± 3%  17.9ns ± 4%  -31.03%  (p=0.000 n=10+10)
MapAssignInt64_64k-8  46.9ns ± 5%  38.7ns ± 2%  -17.53%  (p=0.000 n=9+10)
MapAssignStr_255-8    47.8ns ± 3%  24.8ns ± 4%  -48.01%  (p=0.000 n=10+10)
MapAssignStr_64k-8    83.0ns ± 3%  51.9ns ± 3%  -37.45%  (p=0.000 n=10+9)

name                     old time/op    new time/op    delta
BinaryTree17-8              3.11s ±19%     2.78s ± 3%    ~     (p=0.095 n=5+5)
Fannkuch11-8                3.26s ± 1%     3.21s ± 2%    ~     (p=0.056 n=5+5)
FmtFprintfEmpty-8          50.3ns ± 1%    50.8ns ± 2%    ~     (p=0.246 n=5+5)
FmtFprintfString-8         82.7ns ± 4%    80.1ns ± 5%    ~     (p=0.238 n=5+5)
FmtFprintfInt-8            82.6ns ± 2%    81.9ns ± 3%    ~     (p=0.508 n=5+5)
FmtFprintfIntInt-8          124ns ± 4%     121ns ± 3%    ~     (p=0.111 n=5+5)
FmtFprintfPrefixedInt-8     158ns ± 6%     160ns ± 2%    ~     (p=0.341 n=5+5)
FmtFprintfFloat-8           249ns ± 2%     245ns ± 2%    ~     (p=0.095 n=5+5)
FmtManyArgs-8               513ns ± 2%     519ns ± 3%    ~     (p=0.151 n=5+5)
GobDecode-8                7.48ms ±12%    7.11ms ± 2%    ~     (p=0.222 n=5+5)
GobEncode-8                6.25ms ± 1%    6.03ms ± 2%  -3.56%  (p=0.008 n=5+5)
Gzip-8                      252ms ± 4%     252ms ± 4%    ~     (p=1.000 n=5+5)
Gunzip-8                   38.4ms ± 3%    38.6ms ± 2%    ~     (p=0.690 n=5+5)
HTTPClientServer-8         76.9µs ±41%    66.4µs ± 6%    ~     (p=0.310 n=5+5)
JSONEncode-8               16.5ms ± 3%    16.7ms ± 3%    ~     (p=0.421 n=5+5)
JSONDecode-8               54.6ms ± 1%    54.3ms ± 2%    ~     (p=0.548 n=5+5)
Mandelbrot200-8            4.45ms ± 3%    4.47ms ± 1%    ~     (p=0.841 n=5+5)
GoParse-8                  3.43ms ± 1%    3.32ms ± 2%  -3.28%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8      88.2ns ± 3%    89.4ns ± 2%    ~     (p=0.333 n=5+5)
RegexpMatchEasy0_1K-8       205ns ± 1%     206ns ± 1%    ~     (p=0.905 n=5+5)
RegexpMatchEasy1_32-8      85.1ns ± 1%    85.5ns ± 5%    ~     (p=0.690 n=5+5)
RegexpMatchEasy1_1K-8       365ns ± 1%     371ns ± 9%    ~     (p=1.000 n=5+5)
RegexpMatchMedium_32-8      129ns ± 2%     128ns ± 3%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8     39.8µs ± 0%    39.7µs ± 4%    ~     (p=0.730 n=4+5)
RegexpMatchHard_32-8       1.99µs ± 3%    2.05µs ±16%    ~     (p=0.794 n=5+5)
RegexpMatchHard_1K-8       59.3µs ± 1%    60.3µs ± 7%    ~     (p=1.000 n=5+5)
Revcomp-8                   1.36s ±63%     0.52s ± 5%    ~     (p=0.095 n=5+5)
Template-8                 62.6ms ±14%    60.5ms ± 5%    ~     (p=0.690 n=5+5)
TimeParse-8                 330ns ± 2%     324ns ± 2%    ~     (p=0.087 n=5+5)
TimeFormat-8                350ns ± 3%     340ns ± 1%  -2.86%  (p=0.008 n=5+5)

name                     old speed      new speed      delta
GobDecode-8               103MB/s ±11%   108MB/s ± 2%    ~     (p=0.222 n=5+5)
GobEncode-8               123MB/s ± 1%   127MB/s ± 2%  +3.71%  (p=0.008 n=5+5)
Gzip-8                   77.1MB/s ± 4%  76.9MB/s ± 3%    ~     (p=1.000 n=5+5)
Gunzip-8                  505MB/s ± 3%   503MB/s ± 2%    ~     (p=0.690 n=5+5)
JSONEncode-8              118MB/s ± 3%   116MB/s ± 3%    ~     (p=0.421 n=5+5)
JSONDecode-8             35.5MB/s ± 1%  35.8MB/s ± 2%    ~     (p=0.397 n=5+5)
GoParse-8                16.9MB/s ± 1%  17.4MB/s ± 2%  +3.45%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8     363MB/s ± 3%   358MB/s ± 2%    ~     (p=0.421 n=5+5)
RegexpMatchEasy0_1K-8    4.98GB/s ± 1%  4.97GB/s ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchEasy1_32-8     376MB/s ± 1%   375MB/s ± 5%    ~     (p=0.690 n=5+5)
RegexpMatchEasy1_1K-8    2.80GB/s ± 1%  2.76GB/s ± 9%    ~     (p=0.841 n=5+5)
RegexpMatchMedium_32-8   7.73MB/s ± 1%  7.76MB/s ± 3%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8   25.8MB/s ± 0%  25.8MB/s ± 4%    ~     (p=0.651 n=4+5)
RegexpMatchHard_32-8     16.1MB/s ± 3%  15.7MB/s ±14%    ~     (p=0.794 n=5+5)
RegexpMatchHard_1K-8     17.3MB/s ± 1%  17.0MB/s ± 7%    ~     (p=0.984 n=5+5)
Revcomp-8                 273MB/s ±83%   488MB/s ± 5%    ~     (p=0.095 n=5+5)
Template-8               31.1MB/s ±13%  32.1MB/s ± 5%    ~     (p=0.690 n=5+5)

Updates #19495

Change-Id: I116e9a2a4594769318b22d736464de8a98499909
Reviewed-on: https://go-review.googlesource.com/38091
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-13 23:43:16 +00:00
Dave Cheney
7ee43faf78 runtime: remove sizeToClass
CL 32219 added precomputed sizeclass tables.

Remove the unused sizeToClass method which was previously only
called from initSizes.

Change-Id: I907bf9ed78430ecfaabbec7fca77ef2375010081
Reviewed-on: https://go-review.googlesource.com/38113
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-13 01:55:44 +00:00
David NewHamlet
e19f184b8f runtime: use cpuset_getaffinity for runtime.NumCPU() on FreeBSD
In FreeBSD when run Go proc under a given sub-list of
processors(e.g. 'cpuset -l 0 ./a.out' in multi-core system),
runtime.NumCPU() still return all physical CPUs from sysctl
hw.ncpu instead of account from sub-list.

Fix by use syscall cpuset_getaffinity to account the number of sub-list.

Fixes #15206

Change-Id: If87c4b620e870486efa100685db5debbf1210a5b
Reviewed-on: https://go-review.googlesource.com/29341
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2017-03-10 22:06:24 +00:00
Daniel Martí
da0d23e5cd runtime: remove unused ratep parameter
Found by github.com/mvdan/unparam.

Change-Id: Iabcdfec2ae42c735aa23210b7183080d750682ca
Reviewed-on: https://go-review.googlesource.com/38030
Reviewed-by: Peter Weinberger <pjw@google.com>
Run-TryBot: Peter Weinberger <pjw@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-10 20:46:58 +00:00
Bryan C. Mills
4210930a28 runtime/cgo: return correct sa_flags
A typo in the previous revision ("act" instead of "oldact") caused us
to return the sa_flags from the new (or zeroed) sigaction rather than
the old one.

In the presence of a signal handler registered before
runtime.libpreinit, this caused setsigstack to erroneously zero out
important sa_flags (such as SA_SIGINFO) in its attempt to re-register
the existing handler with SA_ONSTACK.

Change-Id: I3cd5152a38ec0d44ae611f183bc1651d65b8a115
Reviewed-on: https://go-review.googlesource.com/37852
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-09 18:53:35 +00:00
Bryan C. Mills
e57350f4c0 runtime: fix _cgo_yield usage with sysmon and on BSD
There are a few problems from change 35494, discovered during testing
of change 37852.

1. I was confused about the usage of n.key in the sema variant, so we
   were looping on the wrong condition. The error was not caught by
   the TryBots (presumably due to missing TSAN coverage in the BSD and
   darwin builders?).

2. The sysmon goroutine sometimes skips notetsleep entirely, using
   direct usleep syscalls instead. In that case, we were not calling
   _cgo_yield, leading to missed signals under TSAN.

3. Some notetsleep calls have long finite timeouts. They should be
   broken up into smaller chunks with a yield at the end of each
   chunk.

updates #18717

Change-Id: I91175af5dea3857deebc686f51a8a40f9d690bcc
Reviewed-on: https://go-review.googlesource.com/37867
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-09 18:36:49 +00:00
Josh Bleecher Snyder
23be728950 runtime: optimize slicebytestostring
Inline rawstringtmp and simplify.
Use memmove instead of copy.

name                     old time/op  new time/op  delta
SliceByteToString/1-8    19.4ns ± 2%  14.1ns ± 1%  -27.04%  (p=0.000 n=20+17)
SliceByteToString/2-8    20.8ns ± 2%  15.5ns ± 2%  -25.46%  (p=0.000 n=20+20)
SliceByteToString/4-8    20.7ns ± 1%  14.9ns ± 1%  -28.30%  (p=0.000 n=20+20)
SliceByteToString/8-8    23.2ns ± 1%  17.1ns ± 1%  -26.22%  (p=0.000 n=19+19)
SliceByteToString/16-8   29.4ns ± 1%  23.6ns ± 1%  -19.76%  (p=0.000 n=17+20)
SliceByteToString/32-8   31.4ns ± 1%  26.0ns ± 1%  -17.11%  (p=0.000 n=16+19)
SliceByteToString/64-8   36.1ns ± 0%  30.0ns ± 0%  -16.96%  (p=0.000 n=16+16)
SliceByteToString/128-8  46.9ns ± 0%  38.9ns ± 0%  -17.15%  (p=0.000 n=17+19)

Change-Id: I422e688830e4a9bd21897d1f74964625b735f436
Reviewed-on: https://go-review.googlesource.com/37791
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-08 22:05:52 +00:00
Bryan C. Mills
29edf0f9fe runtime: poll libc to deliver signals under TSAN
fixes #18717

Change-Id: I7244463d2e7489e0b0fe3b74c4b782e71210beb2
Reviewed-on: https://go-review.googlesource.com/35494
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-08 18:58:30 +00:00
David Chase
d71f36b5aa cmd/compile: check loop rescheduling with stack bound, not counter
After benchmarking with a compiler modified to have better
spill location, it became clear that this method of checking
was actually faster on (at least) two different architectures
(ppc64 and amd64) and it also provides more timely interruption
of loops.

This change adds a modified FOR loop node "FORUNTIL" that
checks after executing the loop body instead of before (i.e.,
always at least once).  This ensures that a pointer past the
end of a slice or array is not made visible to the garbage
collector.

Without the rescheduling checks inserted, the restructured
loop from this  change apparently provides a 1% geomean
improvement on PPC64 running the go1 benchmarks; the
improvement on AMD64 is only 0.12%.

Inserting the rescheduling check exposed some peculiar bug
with the ssa test code for s390x; this was updated based on
initial code actually generated for GOARCH=s390x to use
appropriate OpArg, OpAddr, and OpVarDef.

NaCl is disabled in testing.

Change-Id: Ieafaa9a61d2a583ad00968110ef3e7a441abca50
Reviewed-on: https://go-review.googlesource.com/36206
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 18:52:12 +00:00
Russ Cox
c797256a8f runtime/pprof: add GNU build IDs to Mappings recorded from /proc/self/maps
This helps systems that maintain an external database mapping
build ID to symbol information for the given binary, especially
in the case where /proc/self/maps lists many different files
(for example, many shared libraries).

Avoid importing debug/elf to avoid dragging in that whole
package (and its dependencies like debug/dwarf) into the
build of every program that generates a profile.

Fixes #19431.

Change-Id: I6d4362a79fe23e4f1726dffb0661d20bb57f766f
Reviewed-on: https://go-review.googlesource.com/37855
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-08 01:09:18 +00:00
Austin Clements
d50f892abc runtime: join selectgo and selectgoImpl
Currently selectgo is just a wrapper around selectgoImpl. This keeps
the hard-coded frame skip counts for tracing the same between the
channel implementation and the select implementation.

However, this is fragile and confusing, so pass a skip parameter to
send and recv, join selectgo and selectgoImpl into one function, and
use decrease all of the skips in selectgo by one.

Change-Id: I11b8cbb7d805b55f5dc6ab4875ac7dde79412ff2
Reviewed-on: https://go-review.googlesource.com/37860
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-07 21:19:38 +00:00
Austin Clements
b992c2649e runtime: print SP/FP on bad pointer crashes
If the bad pointer is on a stack, this makes it possible to find the
frame containing the bad pointer.

Change-Id: Ieda44e054aa9ebf22d15d184457c7610b056dded
Reviewed-on: https://go-review.googlesource.com/37858
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-07 20:46:54 +00:00
Austin Clements
caa7dacfd2 runtime: honor GOTRACEBACK=crash even if _g_.m.traceback != 0
Change-Id: I6de1ef8f67bde044b8706c01e98400e266e1f8f0
Reviewed-on: https://go-review.googlesource.com/37857
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-07 20:46:52 +00:00
Matthew Dempsky
c310c688ff cmd/compile, runtime: simplify multiway select implementation
This commit reworks multiway select statements to use normal control
flow primitives instead of the previous setjmp/longjmp-like behavior.
This simplifies liveness analysis and should prevent issues around
"returns twice" function calls within SSA passes.

test/live.go is updated because liveness analysis's CFG is more
representative of actual control flow. The case bodies are the only
real successors of the selectgo call, but previously the selectsend,
selectrecv, etc. calls were included in the successors list too.

Updates #19331.

Change-Id: I7f879b103a4b85e62fc36a270d812f54c0aa3e83
Reviewed-on: https://go-review.googlesource.com/37661
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-07 20:14:17 +00:00
Daniel Martí
5ed952368e runtime/pprof: actually use tag parameter
It's only ever called with the value it was using, but the code was
counterintuitive. Use the parameter instead, like the other funcs near
it.

Found by github.com/mvdan/unparam.

Change-Id: I45855e11d749380b9b2a28e6dd1d5dedf119a19b
Reviewed-on: https://go-review.googlesource.com/37893
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-07 20:01:05 +00:00
Elias Naur
b91b694b37 runtime/pprof: fix the protobuf tests on Android
Change-Id: I5f85a7980b9a18d3641c4ee8b0992671a8421bb0
Reviewed-on: https://go-review.googlesource.com/37896
Run-TryBot: Elias Naur <elias.naur@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-07 19:18:04 +00:00
Austin Clements
e4f73769bc runtime: strongly encourage CallersFrames with the result of Callers
For historical reasons, it's still commonplace to iterate over the
slice returned by runtime.Callers and call FuncForPC on each PC. This
is broken in gccgo and somewhat broken in gc and will become more
broken in gc with mid-stack inlining.

In Go 1.7, we introduced runtime.CallersFrames to deal with these
problems, but didn't strongly direct people toward using it. Reword
the documentation on runtime.Callers to more strongly encourage people
to use CallersFrames and explicitly discourage them from iterating
over the PCs or using FuncForPC on the results.

Fixes #19426.

Change-Id: Id0d14cb51a0e9521c8fdde9612610f2c2b9383c4
Reviewed-on: https://go-review.googlesource.com/37726
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-06 20:52:00 +00:00
Austin Clements
0efc8b2188 runtime: avoid repeated findmoduledatap calls
Currently almost every function that deals with a *_func has to first
look up the *moduledata for the module containing the function's entry
point. This means we almost always do at least two identical module
lookups whenever we deal with a *_func (one to get the *_func and
another to get something from its module data) and sometimes several
more.

Fix this by making findfunc return a new funcInfo type that embeds
*_func, but also includes the *moduledata, and making all of the
functions that currently take a *_func instead take a funcInfo and use
the already-found *moduledata.

This transformation is trivial for the most part, since the *_func
type is usually inferred. The annoying part is that we can no longer
use nil to indicate failure, so this introduces a funcInfo.valid()
method and replaces nil checks with calls to valid.

Change-Id: I9b8075ef1c31185c1943596d96dec45c7ab5100f
Reviewed-on: https://go-review.googlesource.com/37331
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2017-03-06 19:17:24 +00:00
Austin Clements
2ef88f7fcf runtime: lock-free fast path for mark bits allocation
Currently we acquire a global lock for every newMarkBits call. This is
unfortunate since every span sweep operation calls newMarkBits.

However, most allocations are simply linear allocations from the
current arena. Take advantage of this to add a lock-free fast path for
allocating from the current arena. With this change, the global lock
only protects the lists of arenas, not the free offset in the current
arena.

Change-Id: I6cf6182af8492c8bfc21276114c77275fe3d7826
Reviewed-on: https://go-review.googlesource.com/34595
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-06 18:40:26 +00:00
Austin Clements
6c4a8d195b runtime: don't hold global gcBitsArenas lock over allocation
Currently, newArena holds the gcBitsArenas lock across allocating
memory from the OS for a new gcBits arena. This is a global lock and
allocating physical memory can be expensive, so this has the potential
to cause high lock contention, especially since every single span
sweep operation calls newArena (via newMarkBits).

Improve the situation by temporarily dropping the lock across
allocation. This means the caller now has to revalidate its
assumptions after the lock is dropped, so this also factors out that
code path and reinvokes it after the lock is acquired.

Change-Id: I1113200a954ab4aad16b5071512583cfac744bdc
Reviewed-on: https://go-review.googlesource.com/34594
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-06 18:40:23 +00:00
Eitan Adler
789c5255a4 all: remove the the duplicate words
Change-Id: I6343c162e27e2e492547c96f1fc504909b1c03c0
Reviewed-on: https://go-review.googlesource.com/37793
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-06 04:39:12 +00:00
Josh Bleecher Snyder
d4451362c0 runtime: add slicebytetostring benchmark
Change-Id: I666d2c6ea8d0b54a71260809d1a2573b122865b2
Reviewed-on: https://go-review.googlesource.com/37790
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-05 05:14:08 +00:00
Austin Clements
4a7cf960c3 runtime: make ReadMemStats STW for < 25µs
Currently ReadMemStats stops the world for ~1.7 ms/GB of heap because
it collects statistics from every single span. For large heaps, this
can be quite costly. This is particularly unfortunate because many
production infrastructures call this function regularly to collect and
report statistics.

Fix this by tracking the necessary cumulative statistics in the
mcaches. ReadMemStats still has to stop the world to stabilize these
statistics, but there are only O(GOMAXPROCS) mcaches to collect
statistics from, so this pause is only 25µs even at GOMAXPROCS=100.

Fixes #13613.

Change-Id: I3c0a4e14833f4760dab675efc1916e73b4c0032a
Reviewed-on: https://go-review.googlesource.com/34937
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-04 02:56:37 +00:00
Austin Clements
3399fd254d runtime: remove unused gcstats
The gcstats structure is no longer consumed by anything and no longer
tracks statistics that are particularly relevant to the concurrent
garbage collector. Remove it. (Having statistics is probably a good
idea, but these aren't the stats we need these days and we don't have
a way to get them out of the runtime.)

In preparation for #13613.

Change-Id: Ib63e2f9067850668f9dcbfd4ed89aab4a6622c3f
Reviewed-on: https://go-review.googlesource.com/34936
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-04 02:56:35 +00:00
Elias Naur
7523baed09 misc/ios,cmd/go, runtime/cgo: fix iOS test harness (again)
The iOS test harness was recently changed in response to lldb bugs
to replace breakpoints with the SIGUSR2 signal (CL 34926), and to
pass the current directory in the test binary arguments (CL 35152).
Both the signal sending and working directory setup is done from
the go test driver.

However, the new method doesn't work with tests where a C program is
the test driver instead of go test: the current working directory
will not be changed and SIGUSR2 is not raised.

Instead of copying that logic into any C test program, rework the
test harness (again) to move the setup logic to the early runtime
cgo setup code. That way, the harness will run even in the library
build modes.

Then, use the app Info.plist file to pass the working
directory, removing the need to alter the arguments after running.

Finally, use the SIGINT signal instead of SIGUSR2 to avoid
manipulating the signal masks or handlers.

Fixes the testcarchive tests on iOS.

With this CL, both darwin/arm and darwin/arm64 passes all.bash.

This CL replaces CL 34926, CL 35152 as well as the fixup CL
35123 and CL 35255. They are reverted in CLs earlier in the
relation chain.

Change-Id: I8485c7db1404fbd8daa261efd1ea89e905121a3e
Reviewed-on: https://go-review.googlesource.com/36090
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-03-04 01:43:13 +00:00
David Lazar
781fd3998e runtime: use inlining tables to generate accurate tracebacks
The code in https://play.golang.org/p/aYQPrTtzoK now produces the
following stack trace:

goroutine 1 [running]:
main.(*point).negate(...)
	/tmp/go/main.go:8
main.main()
	/tmp/go/main.go:14 +0x23

Previously the stack trace missed the inlined call:

goroutine 1 [running]:
main.main()
	/tmp/go/main.go:14 +0x23

Fixes #10152.
Updates #19348.

Change-Id: Ib43c67012f53da0ef1a1e69bcafb65b57d9cecb2
Reviewed-on: https://go-review.googlesource.com/37233
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-03 21:29:34 +00:00
David Lazar
699175a11a cmd/compile,link: generate PC-value tables with inlining information
In order to generate accurate tracebacks, the runtime needs to know the
inlined call stack for a given PC. This creates two tables per function
for this purpose. The first table is the inlining tree (stored in the
function's funcdata), which has a node containing the file, line, and
function name for every inlined call. The second table is a PC-value
table that maps each PC to a node in the inlining tree (or -1 if the PC
is not the result of inlining).

To give the appearance that inlining hasn't happened, the runtime also
needs the original source position information of inlined AST nodes.
Previously the compiler plastered over the line numbers of inlined AST
nodes with the line number of the call. This meant that the PC-line
table mapped each PC to line number of the outermost call in its inlined
call stack, with no way to access the innermost line number.

Now the compiler retains line numbers of inlined AST nodes and writes
the innermost source position information to the PC-line and PC-file
tables. Some tools and tests expect to see outermost line numbers, so we
provide the OutermostLine function for displaying line info.

To keep track of the inlined call stack for an AST node, we extend the
src.PosBase type with an index into a global inlining tree. Every time
the compiler inlines a call, it creates a node in the global inlining
tree for the call, and writes its index to the PosBase of every inlined
AST node. The parent of this node is the inlining tree index of the
call. -1 signifies no parent.

For each function, the compiler creates a local inlining tree and a
PC-value table mapping each PC to an index in the local tree.  These are
written to an object file, which is read by the linker.  The linker
re-encodes these tables compactly by deduplicating function names and
file names.

This change increases the size of binaries by 4-5%. For example, this is
how the go1 benchmark binary is impacted by this change:

section             old bytes   new bytes   delta
.text               3.49M ± 0%  3.49M ± 0%   +0.06%
.rodata             1.12M ± 0%  1.21M ± 0%   +8.21%
.gopclntab          1.50M ± 0%  1.68M ± 0%  +11.89%
.debug_line          338k ± 0%   435k ± 0%  +28.78%
Total               9.21M ± 0%  9.58M ± 0%   +4.01%

Updates #19348.

Change-Id: Ic4f180c3b516018138236b0c35e0218270d957d3
Reviewed-on: https://go-review.googlesource.com/37231
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-03 21:29:30 +00:00
Austin Clements
77f64c50db runtime: clarify access to mheap_.busy
There are two accesses to mheap_.busy that are guarded by checks
against len(mheap_.free). This works because both lists are (and must
be) the same length, but it makes the code less clear. Change these to
use len(mheap_.busy) so the access more clearly parallels the check.

Fixes #18944.

Change-Id: I9bacbd3663988df351ed4396ae9018bc71018311
Reviewed-on: https://go-review.googlesource.com/36354
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-03 17:02:18 +00:00
Austin Clements
b50b728587 runtime: simplify sweep allocation counting
Currently sweep counts the number of allocated objects, computes the
number of free objects from that, then re-computes the number of
allocated objects from that. Simplify and clean this up by skipping
these intermediate steps.

Change-Id: I3ed98e371eb54bbcab7c8530466c4ab5fde35f0a
Reviewed-on: https://go-review.googlesource.com/34935
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-03 17:02:16 +00:00
Austin Clements
f1ba75f8c5 runtime: don't rescan finalizers queue during mark termination
Currently we scan the finalizers queue both during concurrent mark and
during mark termination. This costs roughly 20ns per queued finalizer
and about 1ns per unused finalizer queue slot (allocated queue length
never decreases), which can drive up STW time if there are many
finalizers.

However, we only add finalizers to this queue during sweeping, which
means that the second scan will never find anything new. Hence, we can
fix this by simply not scanning the finalizers queue during mark
termination. This brings the STW time under the 100µs goal even with
1,000,000 queued finalizers.

Fixes #18869.

Change-Id: I4ce5620c66fb7f13ebeb39ca313ce57047d1d0fb
Reviewed-on: https://go-review.googlesource.com/36013
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-03 17:02:14 +00:00
Austin Clements
98da2d1f91 runtime: remove wbufptr
Since workbuf is now marked go:notinheap, the write barrier-preventing
wrapper type wbufptr is no longer necessary. Remove it.

Change-Id: I3e5b5803a1547d65de1c1a9c22458a38e08549b7
Reviewed-on: https://go-review.googlesource.com/35971
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-03 17:02:12 +00:00
Josh Bleecher Snyder
9b15c13dc5 runtime/pprof: fix data race between Profile.Add and Profile.WriteTo
p.m is accessed in WriteTo without holding p.mu.
Move the access inside the critical section.

The race detector catches this bug using this program:


package main

import (
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	p := pprof.NewProfile("ABC")
	go func() {
		p.WriteTo(os.Stdout, 1)
		time.Sleep(time.Second)
	}()
	p.Add("abc", 0)
	time.Sleep(time.Second)
}


$ go run -race x.go 
==================
WARNING: DATA RACE
Write at 0x00c42007c240 by main goroutine:
  runtime.mapassign()
      /Users/josh/go/tip/src/runtime/hashmap.go:485 +0x0
  runtime/pprof.(*Profile).Add()
      /Users/josh/go/tip/src/runtime/pprof/pprof.go:281 +0x255
  main.main()
      /Users/josh/go/tip/src/p.go:15 +0x9d

Previous read at 0x00c42007c240 by goroutine 6:
  runtime/pprof.(*Profile).WriteTo()
      /Users/josh/go/tip/src/runtime/pprof/pprof.go:314 +0xc5
  main.main.func1()
      /Users/josh/go/tip/src/x.go:12 +0x69

Goroutine 6 (running) created at:
  main.main()
      /Users/josh/go/tip/src/x.go:11 +0x6e
==================
ABC profile: total 1
1 @ 0x110ccb4 0x111aeee 0x1055053 0x107f031

Found 1 data race(s)
exit status 66


(Exit status 66?)

Change-Id: I49d884dc3af9cce2209057a3448fe6bf50653523
Reviewed-on: https://go-review.googlesource.com/37730
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-02 23:30:07 +00:00
Josh Bleecher Snyder
04fc887761 runtime: delay marking maps as writing until after first alg call
Fixes #19359

Change-Id: I196b47cf0471915b6dc63785e8542aa1876ff695
Reviewed-on: https://go-review.googlesource.com/37665
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-02 17:38:30 +00:00
Lynn Boger
e54bc92a2c runtime, cmd/go: roll back stale message, test detail
Some debugging code was recently added to:
1) provide more detail for the stale reason when it is
determined that a package is stale
2) provide file and package time and date information when
it is determined that runtime.a is stale

This backs out those those debugging messages.

Fixes #19116

Change-Id: I8dd0cbe29324820275b481d8bbb78ff2c5fbc362
Reviewed-on: https://go-review.googlesource.com/37382
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-01 18:50:27 +00:00
Josh Bleecher Snyder
064e44f218 runtime: evacuate old map buckets more consistently
During map growth, buckets are evacuated in two ways.
When a value is altered, its containing bucket is evacuated.
Also, an evacuation mark is maintained and advanced every time.
Prior to this CL, the evacuation mark was always incremented,
even if the next bucket to be evacuated had already been evacuated.
This CL changes evacuation mark advancement to skip previously
evacuated buckets. This has the effect of making map evacuation both
more aggressive and more consistent.

Aggressive map evacuation is good. While the map is growing,
map accesses must check two buckets, which may be far apart in memory.
Map growth also delays garbage collection.
And if map evacuation is not aggressive enough, there is a risk that
a populate-once read-many map may be stuck permanently in map growth.
This CL does not eliminate that possibility, but it shrinks the window.

There is minimal impact on map benchmarks:

name                         old time/op    new time/op    delta
MapPop100-8                    12.4µs ±11%    12.4µs ± 7%    ~     (p=0.798 n=15+15)
MapPop1000-8                    240µs ± 8%     235µs ± 8%    ~     (p=0.217 n=15+14)
MapPop10000-8                  4.49ms ±10%    4.51ms ±15%    ~     (p=1.000 n=15+13)
MegMap-8                       11.9ns ± 2%    11.8ns ± 0%  -1.01%  (p=0.000 n=15+11)
MegOneMap-8                    9.30ns ± 1%    9.29ns ± 1%    ~     (p=0.955 n=14+14)
MegEqMap-8                     31.9µs ± 5%    31.9µs ± 3%    ~     (p=0.935 n=15+15)
MegEmptyMap-8                  2.41ns ± 2%    2.41ns ± 0%    ~     (p=0.594 n=12+14)
SmallStrMap-8                  12.8ns ± 1%    12.7ns ± 1%    ~     (p=0.569 n=14+13)
MapStringKeysEight_16-8        13.6ns ± 1%    13.7ns ± 2%    ~     (p=0.100 n=13+15)
MapStringKeysEight_32-8        12.1ns ± 1%    12.1ns ± 2%    ~     (p=0.340 n=15+15)
MapStringKeysEight_64-8        12.1ns ± 1%    12.1ns ± 2%    ~     (p=0.582 n=15+14)
MapStringKeysEight_1M-8        12.0ns ± 1%    12.1ns ± 1%    ~     (p=0.267 n=15+14)
IntMap-8                       7.96ns ± 1%    7.97ns ± 2%    ~     (p=0.991 n=15+13)
RepeatedLookupStrMapKey32-8    15.8ns ± 2%    15.8ns ± 1%    ~     (p=0.393 n=15+14)
RepeatedLookupStrMapKey1M-8    35.3µs ± 2%    35.3µs ± 1%    ~     (p=0.815 n=15+15)
NewEmptyMap-8                  36.0ns ± 4%    36.4ns ± 7%    ~     (p=0.270 n=15+15)
NewSmallMap-8                  85.5ns ± 1%    85.6ns ± 1%    ~     (p=0.674 n=14+15)
MapIter-8                      89.9ns ± 6%    90.8ns ± 6%    ~     (p=0.467 n=15+15)
MapIterEmpty-8                 10.0ns ±22%    10.0ns ±25%    ~     (p=0.846 n=15+15)
SameLengthMap-8                4.18ns ± 1%    4.17ns ± 1%    ~     (p=0.653 n=15+14)
BigKeyMap-8                    20.2ns ± 1%    20.1ns ± 1%  -0.82%  (p=0.002 n=15+15)
BigValMap-8                    22.5ns ± 8%    22.3ns ± 6%    ~     (p=0.615 n=15+15)
SmallKeyMap-8                  15.3ns ± 1%    15.3ns ± 1%    ~     (p=0.754 n=15+14)
ComplexAlgMap-8                58.4ns ± 1%    58.7ns ± 1%  +0.52%  (p=0.000 n=14+15)

There is a tiny but detectable difference in the compiler:

name       old time/op      new time/op      delta
Template        218ms ± 5%       219ms ± 4%    ~     (p=0.094 n=98+98)
Unicode        93.6ms ± 5%      93.6ms ± 4%    ~     (p=0.910 n=94+95)
GoTypes         596ms ± 5%       598ms ± 6%    ~     (p=0.533 n=98+100)
Compiler        2.72s ± 3%       2.72s ± 4%    ~     (p=0.238 n=100+99)
SSA             4.11s ± 3%       4.11s ± 3%    ~     (p=0.864 n=99+98)
Flate           129ms ± 6%       129ms ± 4%    ~     (p=0.522 n=98+96)
GoParser        151ms ± 4%       151ms ± 4%  -0.48%  (p=0.017 n=96+96)
Reflect         379ms ± 3%       376ms ± 4%  -0.57%  (p=0.011 n=99+99)
Tar             112ms ± 5%       112ms ± 6%    ~     (p=0.688 n=93+95)
XML             214ms ± 4%       214ms ± 5%    ~     (p=0.968 n=100+99)
StdCmd          16.2s ± 2%       16.2s ± 2%  -0.26%  (p=0.048 n=99+99)

name       old user-ns/op   new user-ns/op   delta
Template   252user-ms ± 4%  250user-ms ± 4%  -0.63%  (p=0.020 n=98+97)
Unicode    113user-ms ± 7%  114user-ms ± 5%    ~     (p=0.057 n=97+94)
GoTypes    776user-ms ± 5%  777user-ms ± 5%    ~     (p=0.375 n=97+96)
Compiler   3.61user-s ± 3%  3.60user-s ± 3%    ~     (p=0.445 n=98+93)
SSA        5.84user-s ± 6%  5.85user-s ± 5%    ~     (p=0.542 n=100+95)
Flate      154user-ms ± 5%  154user-ms ± 5%    ~     (p=0.699 n=99+99)
GoParser   184user-ms ± 6%  183user-ms ± 4%    ~     (p=0.557 n=98+95)
Reflect    461user-ms ± 5%  462user-ms ± 4%    ~     (p=0.853 n=97+99)
Tar        130user-ms ± 5%  129user-ms ± 6%    ~     (p=0.567 n=93+100)
XML        257user-ms ± 6%  258user-ms ± 6%    ~     (p=0.205 n=99+100)

Change-Id: Id92dd54a152904069aac415e6aaaab5c67f5f476
Reviewed-on: https://go-review.googlesource.com/37011
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-28 20:42:50 +00:00
Josh Bleecher Snyder
504bc3ed24 cmd/compile, runtime: specialize convT2x, don't alloc for zero vals
Prior to this CL, all runtime conversions
from a concrete value to an interface went
through one of two runtime calls: convT2E or convT2I.
However, in practice, basic types are very common.
Specializing convT2x for those basic types allows
for a more efficient implementation for those types.
For basic scalars and strings, allocation and copying
can use the same methods as normal code.
For pointer-free types, allocation can occur without
zeroing, and copying can take place without GC calls.
For slices, copying is cheaper and simpler.

This CL adds twelve runtime routines:

convT2E16, convT2I16
convT2E32, convT2I32
convT2E64, convT2I64
convT2Estring, convT2Istring
convT2Eslice, convT2Islice
convT2Enoptr, convT2Inoptr

While compiling make.bash, 93% of all convT2x calls
are now to one of these specialized convT2x call.

Within specialized convT2x routines, it is cheap to check
for a zero value, in a way that it is not in general.
When we detect a zero value there, we return a pointer
to zeroVal, rather than allocating.

name                         old time/op  new time/op  delta
ConvT2Ezero/zero/16-8        17.9ns ± 2%   3.0ns ± 3%  -83.20%  (p=0.000 n=56+56)
ConvT2Ezero/zero/32-8        17.8ns ± 2%   3.0ns ± 3%  -83.15%  (p=0.000 n=59+60)
ConvT2Ezero/zero/64-8        20.1ns ± 1%   3.0ns ± 2%  -84.98%  (p=0.000 n=57+57)
ConvT2Ezero/zero/str-8       32.6ns ± 1%   3.0ns ± 4%  -90.70%  (p=0.000 n=59+60)
ConvT2Ezero/zero/slice-8     36.7ns ± 2%   3.0ns ± 2%  -91.78%  (p=0.000 n=59+59)
ConvT2Ezero/zero/big-8       91.9ns ± 2%  85.9ns ± 2%   -6.52%  (p=0.000 n=57+57)
ConvT2Ezero/nonzero/16-8     17.7ns ± 2%  12.7ns ± 3%  -28.38%  (p=0.000 n=55+60)
ConvT2Ezero/nonzero/32-8     17.8ns ± 1%  12.7ns ± 1%  -28.44%  (p=0.000 n=54+57)
ConvT2Ezero/nonzero/64-8     20.0ns ± 1%  15.0ns ± 1%  -24.90%  (p=0.000 n=56+58)
ConvT2Ezero/nonzero/str-8    32.6ns ± 1%  25.7ns ± 1%  -21.17%  (p=0.000 n=58+55)
ConvT2Ezero/nonzero/slice-8  36.8ns ± 2%  30.4ns ± 1%  -17.32%  (p=0.000 n=60+52)
ConvT2Ezero/nonzero/big-8    92.1ns ± 2%  85.9ns ± 2%   -6.70%  (p=0.000 n=57+59)

Benchmarks on a real program (the compiler):

name       old time/op      new time/op      delta
Template        227ms ± 5%       221ms ± 2%  -2.48%  (p=0.000 n=30+26)
Unicode         102ms ± 5%       100ms ± 3%  -1.30%  (p=0.009 n=30+26)
GoTypes         656ms ± 5%       659ms ± 4%    ~     (p=0.208 n=30+30)
Compiler        2.82s ± 2%       2.82s ± 1%    ~     (p=0.614 n=29+27)
Flate           128ms ± 2%       128ms ± 5%    ~     (p=0.783 n=27+28)
GoParser        158ms ± 3%       158ms ± 3%    ~     (p=0.261 n=28+30)
Reflect         408ms ± 7%       401ms ± 3%    ~     (p=0.075 n=30+30)
Tar             123ms ± 6%       121ms ± 8%    ~     (p=0.287 n=29+30)
XML             220ms ± 2%       220ms ± 4%    ~     (p=0.805 n=29+29)

name       old user-ns/op   new user-ns/op   delta
Template   281user-ms ± 4%  279user-ms ± 3%  -0.87%  (p=0.044 n=28+28)
Unicode    142user-ms ± 4%  141user-ms ± 3%  -1.04%  (p=0.015 n=30+27)
GoTypes    884user-ms ± 3%  886user-ms ± 2%    ~     (p=0.532 n=30+30)
Compiler   3.94user-s ± 3%  3.92user-s ± 1%    ~     (p=0.185 n=30+28)
Flate      165user-ms ± 2%  165user-ms ± 4%    ~     (p=0.780 n=27+29)
GoParser   209user-ms ± 2%  208user-ms ± 3%    ~     (p=0.453 n=28+30)
Reflect    533user-ms ± 6%  526user-ms ± 3%    ~     (p=0.057 n=30+30)
Tar        156user-ms ± 6%  154user-ms ± 6%    ~     (p=0.133 n=29+30)
XML        288user-ms ± 4%  288user-ms ± 4%    ~     (p=0.633 n=30+30)

name       old alloc/op     new alloc/op     delta
Template       41.0MB ± 0%      40.9MB ± 0%  -0.11%  (p=0.000 n=29+29)
Unicode        32.6MB ± 0%      32.6MB ± 0%    ~     (p=0.572 n=29+30)
GoTypes         122MB ± 0%       122MB ± 0%  -0.10%  (p=0.000 n=30+30)
Compiler        482MB ± 0%       481MB ± 0%  -0.07%  (p=0.000 n=30+29)
Flate          26.6MB ± 0%      26.6MB ± 0%    ~     (p=0.096 n=30+30)
GoParser       32.7MB ± 0%      32.6MB ± 0%  -0.06%  (p=0.011 n=28+28)
Reflect        84.2MB ± 0%      84.1MB ± 0%  -0.17%  (p=0.000 n=29+30)
Tar            27.7MB ± 0%      27.7MB ± 0%  -0.05%  (p=0.032 n=27+28)
XML            44.7MB ± 0%      44.7MB ± 0%    ~     (p=0.131 n=28+30)

name       old allocs/op    new allocs/op    delta
Template         373k ± 1%        370k ± 1%  -0.76%  (p=0.000 n=30+30)
Unicode          325k ± 1%        325k ± 1%    ~     (p=0.383 n=29+30)
GoTypes         1.16M ± 0%       1.15M ± 0%  -0.75%  (p=0.000 n=29+30)
Compiler        4.15M ± 0%       4.13M ± 0%  -0.59%  (p=0.000 n=30+29)
Flate            238k ± 1%        237k ± 1%  -0.62%  (p=0.000 n=30+30)
GoParser         304k ± 1%        302k ± 1%  -0.64%  (p=0.000 n=30+28)
Reflect         1.00M ± 0%       0.99M ± 0%  -1.10%  (p=0.000 n=29+30)
Tar              245k ± 1%        244k ± 1%  -0.59%  (p=0.000 n=27+29)
XML              391k ± 1%        389k ± 1%  -0.59%  (p=0.000 n=29+30)

Change-Id: Id7f456d690567c2b0a96b0d6d64de8784b6e305f
Reviewed-on: https://go-review.googlesource.com/36476
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-28 19:23:33 +00:00
Austin Clements
bab191042b cmd/internal/obj, runtime: update funcdata comments
The comments in cmd/internal/obj/funcdata.go are identical to the
comments in runtime/funcdata.h, but the majority of the definitions
they refer to don't apply to Go sources and have been stripped out of
funcdata.go.

Remove these stale comments from funcdata.go and clean up the
references to other copies of the PCDATA and FUNCDATA indexes.

Change-Id: I5d6e49a6e586cc9aecd7c3ce1567679f2a605884
Reviewed-on: https://go-review.googlesource.com/37330
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-27 22:29:28 +00:00
Dmitry Vyukov
ba6e5776fd runtime: remove unused RaceSemacquire declaration
These functions are not defined and are not used.

Fixes #19290

Change-Id: I2978147220af83cf319f7439f076c131870fb9ee
Reviewed-on: https://go-review.googlesource.com/37448
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 20:15:51 +00:00
Josh Bleecher Snyder
c7894924c7 runtime/pprof: handle empty stack traces in Profile.Add
If the caller passes a large number to Profile.Add,
the list of pcs is empty, which results in junk
(a nil pc) being recorded. Check for that explicitly,
and replace such stack traces with a lostProfileEvent.

Fixes #18836.

Change-Id: I99c96aa67dd5525cd239ea96452e6e8fcb25ce02
Reviewed-on: https://go-review.googlesource.com/36891
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-27 17:11:07 +00:00
Russ Cox
8c24e52247 runtime: check that pprof accepts but doesn't need executable
The profiles are self-contained now.
Check that they work by themselves in the tests that invoke pprof,
but also keep checking that the old command lines work.

Change-Id: I24c74b5456f0b50473883c3640625c6612f72309
Reviewed-on: https://go-review.googlesource.com/37166
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-02-24 20:46:37 +00:00
Russ Cox
0b8c983ece runtime/pprof/internal/profile: move internal/pprof/profile here
Nothing needs internal/pprof anymore except the runtime/pprof tests.
Move the package here to prevent new dependencies.

Change-Id: Ia119af91cc2b980e0fa03a15f46f69d7f71d2926
Reviewed-on: https://go-review.googlesource.com/37165
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-02-24 20:45:21 +00:00
Russ Cox
cbab65fdfa runtime/pprof: add streaming protobuf encoder
The existing code builds a full profile in memory.
Then it translates that profile into a data structure (in memory).
Then it marshals that data structure into a protocol buffer (in memory).
Then it gzips that marshaled form into the underlying writer.
So there are three copies of the full profile data in memory
at the same time before we're done. This is obviously dumb.

This CL implements a fully streaming conversion from
the original in-memory profile to the underlying writer.
There is now only one copy of the profile in memory.

For the non-CPU profiles, this is optimal, since we have to
have a full copy in memory to start with.

For the CPU profiles, we could still try to bound the profile
size stored in memory and stream fragments out during
the actual profiling, as Go 1.7 did (with a simpler format),
but so far that hasn't been necessary.

Change-Id: Ic36141021857791bf0cd1fce84178fb5e744b989
Reviewed-on: https://go-review.googlesource.com/37164
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-02-24 20:15:56 +00:00
Russ Cox
1564817d8c runtime/pprof: use more efficient hash table for staging profile
The old hash table was a place holder that allocates memory
during every lookup for key generation, even for keys that hit
in the the table.

Change-Id: I4f601bbfd349f0be76d6259a8989c9c17ccfac21
Reviewed-on: https://go-review.googlesource.com/37163
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-02-24 17:05:37 +00:00
Russ Cox
1a680a902a runtime/pprof: use new profile buffers for CPU profiling
This doesn't change the functionality of the current code,
but it sets us up for exporting the profiling labels into the profile.

The old code had a hash table of profile samples maintained
during the signal handler, with evictions going into a log.
The new code just logs every sample directly, leaving the
hash-based deduplication to an ordinary goroutine.

The new code also avoids storing the entire profile in two
forms in memory, an unfortunate regression introduced
when binary profile support was added. After this CL the
entire profile is only stored once in memory. We'd still like
to get back down to storing it zero times (streaming it to
the underlying io.Writer).

Change-Id: I0893a1788267c564aa1af17970d47377b2a43457
Reviewed-on: https://go-review.googlesource.com/36712
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-02-24 17:01:47 +00:00
Russ Cox
a1261b8b0a runtime: do not allocate on every time.Sleep
It's common for some goroutines to loop calling time.Sleep.
Allocate once per goroutine, not every time.
This comes up in runtime/pprof's background reader.

Change-Id: I89d17dc7379dca266d2c9cd3aefc2382f5bdbade
Reviewed-on: https://go-review.googlesource.com/37162
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-02-24 15:34:01 +00:00
Russ Cox
b788fd80e6 runtime: new profile buffer implementation supporting label pointers
The existing CPU profiling buffer is a slice of uintptr, but we want to
start including profiling label data in the profiles, and those labels need
to be pointers in order to let them describe rich information.

This CL implements a new profBuf type that holds both a slice of uint64
for data and a slice of unsafe.Pointer for profiling labels (aka tags).
Making the runtime use these buffers will happen in followup CLs.

Change-Id: I9ff16b532d8edaf4ce0cbba1098229a561834efc
Reviewed-on: https://go-review.googlesource.com/36713
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-02-23 19:47:23 +00:00
Lynn Boger
ea48c9d232 runtime: more detail for crash_test.go
This updates the testcase to display the timestamps for the
runtime.a, it dependent packages atomic.a and sys.a, and
source files.

Change-Id: Id2901b4e8aa8eb9775c4f404ac01cc07b394ba91
Reviewed-on: https://go-review.googlesource.com/37332
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-22 16:34:14 +00:00
Josh Bleecher Snyder
4208fcdcd4 runtime: use standard linux/mipsx clone variable names
Change-Id: I62118e197190af1d11a89921d5769101ce6d2257
Reviewed-on: https://go-review.googlesource.com/37306
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-21 18:42:38 +00:00
Josh Bleecher Snyder
b6e0d4647f runtime: update assembly var names after monotonic time changes
Change-Id: I721045120a4df41462c02252e2e5e8529ae2d694
Reviewed-on: https://go-review.googlesource.com/37303
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-21 18:42:05 +00:00
Brad Fitzpatrick
a37f9d8a17 runtime/pprof: mark TestMutexProfile as flaky for now
Flaky tests hurt productivity. Disable for now.

Updates #19139

Change-Id: I2e3040bdf0e53597a1c4f925b788e3268ea284c1
Reviewed-on: https://go-review.googlesource.com/37291
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Peter Weinberger <pjw@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-20 20:17:16 +00:00
Dmitry Vyukov
0556e26273 sync: make Mutex more fair
Add new starvation mode for Mutex.
In starvation mode ownership is directly handed off from
unlocking goroutine to the next waiter. New arriving goroutines
don't compete for ownership.
Unfair wait time is now limited to 1ms.
Also fix a long standing bug that goroutines were requeued
at the tail of the wait queue. That lead to even more unfair
acquisition times with multiple waiters.

Performance of normal mode is not considerably affected.

Fixes #13086

On the provided in the issue lockskew program:

done in 1.207853ms
done in 1.177451ms
done in 1.184168ms
done in 1.198633ms
done in 1.185797ms
done in 1.182502ms
done in 1.316485ms
done in 1.211611ms
done in 1.182418ms

name                    old time/op  new time/op   delta
MutexUncontended-48     0.65ns ± 0%   0.65ns ± 1%     ~           (p=0.087 n=10+10)
Mutex-48                 112ns ± 1%    114ns ± 1%   +1.69%        (p=0.000 n=10+10)
MutexSlack-48            113ns ± 0%     87ns ± 1%  -22.65%         (p=0.000 n=8+10)
MutexWork-48             149ns ± 0%    145ns ± 0%   -2.48%         (p=0.000 n=9+10)
MutexWorkSlack-48        149ns ± 0%    122ns ± 3%  -18.26%         (p=0.000 n=6+10)
MutexNoSpin-48           103ns ± 4%    105ns ± 3%     ~           (p=0.089 n=10+10)
MutexSpin-48             490ns ± 4%    515ns ± 6%   +5.08%        (p=0.006 n=10+10)
Cond32-48               13.4µs ± 6%   13.1µs ± 5%   -2.75%        (p=0.023 n=10+10)
RWMutexWrite100-48      53.2ns ± 3%   41.2ns ± 3%  -22.57%        (p=0.000 n=10+10)
RWMutexWrite10-48       45.9ns ± 2%   43.9ns ± 2%   -4.38%        (p=0.000 n=10+10)
RWMutexWorkWrite100-48   122ns ± 2%    134ns ± 1%   +9.92%        (p=0.000 n=10+10)
RWMutexWorkWrite10-48    206ns ± 1%    188ns ± 1%   -8.52%         (p=0.000 n=8+10)
Cond32-24               12.1µs ± 3%   12.4µs ± 3%   +1.98%         (p=0.043 n=10+9)
MutexUncontended-24     0.74ns ± 1%   0.75ns ± 1%     ~           (p=0.650 n=10+10)
Mutex-24                 122ns ± 2%    124ns ± 1%   +1.31%        (p=0.007 n=10+10)
MutexSlack-24           96.9ns ± 2%  102.8ns ± 2%   +6.11%        (p=0.000 n=10+10)
MutexWork-24             146ns ± 1%    135ns ± 2%   -7.70%         (p=0.000 n=10+9)
MutexWorkSlack-24        135ns ± 1%    128ns ± 2%   -5.01%         (p=0.000 n=10+9)
MutexNoSpin-24           114ns ± 3%    110ns ± 4%   -3.84%        (p=0.000 n=10+10)
MutexSpin-24             482ns ± 4%    475ns ± 8%     ~           (p=0.286 n=10+10)
RWMutexWrite100-24      43.0ns ± 3%   43.1ns ± 2%     ~           (p=0.956 n=10+10)
RWMutexWrite10-24       43.4ns ± 1%   43.2ns ± 1%     ~            (p=0.085 n=10+9)
RWMutexWorkWrite100-24   130ns ± 3%    131ns ± 3%     ~           (p=0.747 n=10+10)
RWMutexWorkWrite10-24    191ns ± 1%    192ns ± 1%     ~           (p=0.210 n=10+10)
Cond32-12               11.5µs ± 2%   11.7µs ± 2%   +1.98%        (p=0.002 n=10+10)
MutexUncontended-12     1.48ns ± 0%   1.50ns ± 1%   +1.08%        (p=0.004 n=10+10)
Mutex-12                 141ns ± 1%    143ns ± 1%   +1.63%        (p=0.000 n=10+10)
MutexSlack-12            121ns ± 0%    119ns ± 0%   -1.65%          (p=0.001 n=8+9)
MutexWork-12             141ns ± 2%    150ns ± 3%   +6.36%         (p=0.000 n=9+10)
MutexWorkSlack-12        131ns ± 0%    138ns ± 0%   +5.73%         (p=0.000 n=9+10)
MutexNoSpin-12          87.0ns ± 1%   83.7ns ± 1%   -3.80%        (p=0.000 n=10+10)
MutexSpin-12             364ns ± 1%    377ns ± 1%   +3.77%        (p=0.000 n=10+10)
RWMutexWrite100-12      42.8ns ± 1%   43.9ns ± 1%   +2.41%         (p=0.000 n=8+10)
RWMutexWrite10-12       39.8ns ± 4%   39.3ns ± 1%     ~            (p=0.433 n=10+9)
RWMutexWorkWrite100-12   131ns ± 1%    131ns ± 0%     ~            (p=0.591 n=10+9)
RWMutexWorkWrite10-12    173ns ± 1%    174ns ± 0%     ~            (p=0.059 n=10+8)
Cond32-6                10.9µs ± 2%   10.9µs ± 2%     ~           (p=0.739 n=10+10)
MutexUncontended-6      2.97ns ± 0%   2.97ns ± 0%     ~     (all samples are equal)
Mutex-6                  122ns ± 6%    122ns ± 2%     ~           (p=0.668 n=10+10)
MutexSlack-6             149ns ± 3%    142ns ± 3%   -4.63%        (p=0.000 n=10+10)
MutexWork-6              136ns ± 3%    140ns ± 5%     ~           (p=0.077 n=10+10)
MutexWorkSlack-6         152ns ± 0%    138ns ± 2%   -9.21%         (p=0.000 n=6+10)
MutexNoSpin-6            150ns ± 1%    152ns ± 0%   +1.50%         (p=0.000 n=8+10)
MutexSpin-6              726ns ± 0%    730ns ± 1%     ~           (p=0.069 n=10+10)
RWMutexWrite100-6       40.6ns ± 1%   40.9ns ± 1%   +0.91%         (p=0.001 n=8+10)
RWMutexWrite10-6        37.1ns ± 0%   37.0ns ± 1%     ~            (p=0.386 n=9+10)
RWMutexWorkWrite100-6    133ns ± 1%    134ns ± 1%   +1.01%         (p=0.005 n=9+10)
RWMutexWorkWrite10-6     152ns ± 0%    152ns ± 0%     ~     (all samples are equal)
Cond32-2                7.86µs ± 2%   7.95µs ± 2%   +1.10%        (p=0.023 n=10+10)
MutexUncontended-2      8.10ns ± 0%   9.11ns ± 4%  +12.44%         (p=0.000 n=9+10)
Mutex-2                 32.9ns ± 9%   38.4ns ± 6%  +16.58%        (p=0.000 n=10+10)
MutexSlack-2            93.4ns ± 1%   98.5ns ± 2%   +5.39%         (p=0.000 n=10+9)
MutexWork-2             40.8ns ± 3%   43.8ns ± 7%   +7.38%         (p=0.000 n=10+9)
MutexWorkSlack-2        98.6ns ± 5%  108.2ns ± 2%   +9.80%         (p=0.000 n=10+8)
MutexNoSpin-2            399ns ± 1%    398ns ± 2%     ~             (p=0.463 n=8+9)
MutexSpin-2             1.99µs ± 3%   1.97µs ± 1%   -0.81%          (p=0.003 n=9+8)
RWMutexWrite100-2       37.6ns ± 5%   46.0ns ± 4%  +22.17%         (p=0.000 n=10+8)
RWMutexWrite10-2        50.1ns ± 6%   36.8ns ±12%  -26.46%         (p=0.000 n=9+10)
RWMutexWorkWrite100-2    136ns ± 0%    134ns ± 2%   -1.80%          (p=0.001 n=7+9)
RWMutexWorkWrite10-2     140ns ± 1%    138ns ± 1%   -1.50%        (p=0.000 n=10+10)
Cond32                  5.93µs ± 1%   5.91µs ± 0%     ~            (p=0.411 n=9+10)
MutexUncontended        15.9ns ± 0%   15.8ns ± 0%   -0.63%          (p=0.000 n=8+8)
Mutex                   15.9ns ± 0%   15.8ns ± 0%   -0.44%        (p=0.003 n=10+10)
MutexSlack              26.9ns ± 3%   26.7ns ± 2%     ~           (p=0.084 n=10+10)
MutexWork               47.8ns ± 0%   47.9ns ± 0%   +0.21%          (p=0.014 n=9+8)
MutexWorkSlack          54.9ns ± 3%   54.5ns ± 3%     ~           (p=0.254 n=10+10)
MutexNoSpin              786ns ± 2%    765ns ± 1%   -2.66%        (p=0.000 n=10+10)
MutexSpin               3.87µs ± 1%   3.83µs ± 0%   -0.85%          (p=0.005 n=9+8)
RWMutexWrite100         21.2ns ± 2%   21.0ns ± 1%   -0.88%         (p=0.018 n=10+9)
RWMutexWrite10          22.6ns ± 1%   22.6ns ± 0%     ~             (p=0.471 n=9+9)
RWMutexWorkWrite100      132ns ± 0%    132ns ± 0%     ~     (all samples are equal)
RWMutexWorkWrite10       124ns ± 0%    123ns ± 0%     ~           (p=0.656 n=10+10)

Change-Id: I66412a3a0980df1233ad7a5a0cd9723b4274528b
Reviewed-on: https://go-review.googlesource.com/34310
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-17 17:24:59 +00:00
Russ Cox
990124da2a runtime: use balanced tree for addr lookup in semaphore implementation
CL 36792 fixed #17953, a linear scan caused by n goroutines piling into
two different locks that hashed to the same bucket in the semaphore table.
In that CL, n goroutines contending for 2 unfortunately chosen locks
went from O(n²) to O(n).

This CL fixes a different linear scan, when n goroutines are contending for
n/2 different locks that all hash to the same bucket in the semaphore table.
In this CL, n goroutines contending for n/2 unfortunately chosen locks
goes from O(n²) to O(n log n). This case is much less likely, but any linear
scan eventually hurts, so we might as well fix it while the problem is fresh
in our minds.

The new test in this CL checks for both linear scans.

The effect of this CL on the sync benchmarks is negligible
(but it fixes the new test).

name                      old time/op    new time/op    delta
Cond1-48                     576ns ±10%     575ns ±13%     ~     (p=0.679 n=71+71)
Cond2-48                    1.59µs ± 8%    1.61µs ± 9%     ~     (p=0.107 n=73+69)
Cond4-48                    4.56µs ± 7%    4.55µs ± 7%     ~     (p=0.670 n=74+72)
Cond8-48                    9.87µs ± 9%    9.90µs ± 7%     ~     (p=0.507 n=69+73)
Cond16-48                   20.4µs ± 7%    20.4µs ±10%     ~     (p=0.588 n=69+71)
Cond32-48                   45.4µs ±10%    45.4µs ±14%     ~     (p=0.944 n=73+73)
UncontendedSemaphore-48     19.7ns ±12%    19.7ns ± 8%     ~     (p=0.589 n=65+63)
ContendedSemaphore-48       55.4ns ±26%    54.9ns ±32%     ~     (p=0.441 n=75+75)
MutexUncontended-48         0.63ns ± 0%    0.63ns ± 0%     ~     (all equal)
Mutex-48                     210ns ± 6%     213ns ±10%   +1.30%  (p=0.035 n=70+74)
MutexSlack-48                210ns ± 7%     211ns ± 9%     ~     (p=0.184 n=71+72)
MutexWork-48                 299ns ± 5%     300ns ± 5%     ~     (p=0.678 n=73+75)
MutexWorkSlack-48            302ns ± 6%     300ns ± 5%     ~     (p=0.149 n=74+72)
MutexNoSpin-48               135ns ± 6%     135ns ±10%     ~     (p=0.788 n=67+75)
MutexSpin-48                 693ns ± 5%     689ns ± 6%     ~     (p=0.092 n=65+74)
Once-48                     0.22ns ±25%    0.22ns ±24%     ~     (p=0.882 n=74+73)
Pool-48                     5.88ns ±36%    5.79ns ±24%     ~     (p=0.655 n=69+69)
PoolOverflow-48             4.79µs ±18%    4.87µs ±20%     ~     (p=0.233 n=75+75)
SemaUncontended-48          0.80ns ± 1%    0.82ns ± 8%   +2.46%  (p=0.000 n=60+74)
SemaSyntNonblock-48          103ns ± 4%     102ns ± 5%   -1.11%  (p=0.003 n=75+75)
SemaSyntBlock-48             104ns ± 4%     104ns ± 5%     ~     (p=0.231 n=71+75)
SemaWorkNonblock-48          128ns ± 4%     129ns ± 6%   +1.51%  (p=0.000 n=63+75)
SemaWorkBlock-48             129ns ± 8%     130ns ± 7%     ~     (p=0.072 n=75+74)
RWMutexUncontended-48       2.35ns ± 1%    2.35ns ± 0%     ~     (p=0.144 n=70+55)
RWMutexWrite100-48           139ns ±18%     141ns ±21%     ~     (p=0.071 n=75+73)
RWMutexWrite10-48            145ns ± 9%     145ns ± 8%     ~     (p=0.553 n=75+75)
RWMutexWorkWrite100-48       297ns ±13%     297ns ±15%     ~     (p=0.519 n=75+74)
RWMutexWorkWrite10-48        588ns ± 7%     585ns ± 5%     ~     (p=0.173 n=73+70)
WaitGroupUncontended-48     0.87ns ± 0%    0.87ns ± 0%     ~     (all equal)
WaitGroupAddDone-48         63.2ns ± 4%    62.7ns ± 4%   -0.82%  (p=0.027 n=72+75)
WaitGroupAddDoneWork-48      109ns ± 5%     109ns ± 4%     ~     (p=0.233 n=75+75)
WaitGroupWait-48            0.17ns ± 0%    0.16ns ±16%   -8.55%  (p=0.000 n=56+75)
WaitGroupWaitWork-48        1.78ns ± 1%    2.08ns ± 5%  +16.92%  (p=0.000 n=74+70)
WaitGroupActuallyWait-48    52.0ns ± 3%    50.6ns ± 5%   -2.70%  (p=0.000 n=71+69)

https://perf.golang.org/search?q=upload:20170215.1

Change-Id: Ia29a8bd006c089e401ec4297c3038cca656bcd0a
Reviewed-on: https://go-review.googlesource.com/37103
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-16 17:52:15 +00:00
Russ Cox
58d762176a runtime: run mutexevent profiling without holding semaRoot lock
Suggested by Dmitry in CL 36792 review.
Clearly safe since there are many different semaRoots
that could all have profiled sudogs calling mutexevent.

Change-Id: I45eed47a5be3e513b2dad63b60afcd94800e16d1
Reviewed-on: https://go-review.googlesource.com/37104
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2017-02-16 17:16:41 +00:00
Russ Cox
1f77db94f8 runtime: do not call wakep from enlistWorker, to avoid possible deadlock
We have seen one instance of a production job suddenly spinning to
100% CPU and becoming unresponsive. In that one instance, a SIGQUIT
was sent after 328 minutes of spinning, and the stacks showed a single
goroutine in "IO wait (scan)" state.

Looking for things that might get stuck if a goroutine got stuck in
scanning a stack, we found that injectglist does:

	lock(&sched.lock)
	var n int
	for n = 0; glist != nil; n++ {
		gp := glist
		glist = gp.schedlink.ptr()
		casgstatus(gp, _Gwaiting, _Grunnable)
		globrunqput(gp)
	}
	unlock(&sched.lock)

and that casgstatus spins on gp.atomicstatus until the _Gscan bit goes
away. Essentially, this code locks sched.lock and then while holding
sched.lock, waits to lock gp.atomicstatus.

The code that is doing the scan is:

	if castogscanstatus(gp, s, s|_Gscan) {
		if !gp.gcscandone {
			scanstack(gp, gcw)
			gp.gcscandone = true
		}
		restartg(gp)
		break loop
	}

More analysis showed that scanstack can, in a rare case, end up
calling back into code that acquires sched.lock. For example:

	runtime.scanstack at proc.go:866
	calls runtime.gentraceback at mgcmark.go:842
	calls runtime.scanstack$1 at traceback.go:378
	calls runtime.scanframeworker at mgcmark.go:819
	calls runtime.scanblock at mgcmark.go:904
	calls runtime.greyobject at mgcmark.go:1221
	calls (*runtime.gcWork).put at mgcmark.go:1412
	calls (*runtime.gcControllerState).enlistWorker at mgcwork.go:127
	calls runtime.wakep at mgc.go:632
	calls runtime.startm at proc.go:1779
	acquires runtime.sched.lock at proc.go:1675

This path was found with an automated deadlock-detecting tool.
There are many such paths but they all go through enlistWorker -> wakep.

The evidence strongly suggests that one of these paths is what caused
the deadlock we observed. We're running those jobs with
GOTRACEBACK=crash now to try to get more information if it happens
again.

Further refinement and analysis shows that if we drop the wakep call
from enlistWorker, the remaining few deadlock cycles found by the tool
are all false positives caused by not understanding the effect of calls
to func variables.

The enlistWorker -> wakep call was intended only as a performance
optimization, it rarely executes, and if it does execute at just the
wrong time it can (and plausibly did) cause the deadlock we saw.

Comment it out, to avoid the potential deadlock.

Fixes #19112.
Unfixes #14179.

Change-Id: I6f7e10b890b991c11e79fab7aeefaf70b5d5a07b
Reviewed-on: https://go-review.googlesource.com/37093
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-02-15 21:22:36 +00:00
Hana Kim
8833af3f4b runtime/pprof: print newly added fields of runtime.MemStats
in heap profile with debug mode

Change-Id: I3a80d03a4aa556614626067a8fd698b3b00f4290
Reviewed-on: https://go-review.googlesource.com/36962
Reviewed-by: Austin Clements <austin@google.com>
2017-02-15 21:14:37 +00:00
Ian Lance Taylor
c05b06a12d os: use poller for file I/O
This changes the os package to use the runtime poller for file I/O
where possible. When a system call blocks on a pollable descriptor,
the goroutine will be blocked on the poller but the thread will be
released to run other goroutines. When using a non-pollable
descriptor, the os package will continue to use thread-blocking system
calls as before.

For example, on GNU/Linux, the runtime poller uses epoll. epoll does
not support ordinary disk files, so they will continue to use blocking
I/O as before. The poller will be used for pipes.

Since this means that the poller is used for many more programs, this
modifies the runtime to only block waiting for the poller if there is
some goroutine that is waiting on the poller. Otherwise, there is no
point, as the poller will never make any goroutine ready. This
preserves the runtime's current simple deadlock detection.

This seems to crash FreeBSD systems, so it is disabled on FreeBSD.
This is issue 19093.

Using the poller on Windows requires opening the file with
FILE_FLAG_OVERLAPPED. We should only do that if we can remove that
flag if the program calls the Fd method. This is issue 19098.

Update #6817.
Update #7903.
Update #15021.
Update #18507.
Update #19093.
Update #19098.

Change-Id: Ia5197dcefa7c6fbcca97d19a6f8621b2abcbb1fe
Reviewed-on: https://go-review.googlesource.com/36800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-15 19:31:55 +00:00
Austin Clements
0993b2fd06 runtime: remove g.stackAlloc
Since we're no longer stealing space for the stack barrier array from
the stack allocation, the stack allocation is simply
g.stack.hi-g.stack.lo.

Updates #17503.

Change-Id: Id9b450ae12c3df9ec59cfc4365481a0a16b7c601
Reviewed-on: https://go-review.googlesource.com/36621
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-02-14 15:52:56 +00:00
Austin Clements
d089a6c718 runtime: remove stack barriers
Now that we don't rescan stacks, stack barriers are unnecessary. This
removes all of the code and structures supporting them as well as
tests that were specifically for stack barriers.

Updates #17503.

Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c
Reviewed-on: https://go-review.googlesource.com/36620
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-14 15:52:54 +00:00
Austin Clements
c5ebcd2c8a runtime: remove rescan list
With the hybrid barrier, rescanning stacks is no longer necessary so
the rescan list is no longer necessary. Remove it.

This leaves the gcrescanstacks GODEBUG variable, since it's useful for
debugging, but changes it to simply walk all of the Gs to rescan
stacks rather than using the rescan list.

We could also remove g.gcscanvalid, which is effectively a distributed
rescan list. However, it's still useful for gcrescanstacks mode and it
adds little complexity, so we'll leave it in.

Fixes #17099.
Updates #17503.

Change-Id: I776d43f0729567335ef1bfd145b75c74de2cc7a9
Reviewed-on: https://go-review.googlesource.com/36619
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-02-14 15:52:51 +00:00
Austin Clements
7aeb915d6b runtime: remove unused debug.wbshadow
The wbshadow implementation was removed a year and a half ago in
1635ab7dfe, but the GODEBUG setting remained. Remove the GODEBUG
setting since it doesn't do anything.

Change-Id: I19cde324a79472aff60acb5cc9f7d4aa86c0c0ed
Reviewed-on: https://go-review.googlesource.com/36618
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-02-14 15:52:49 +00:00
Josh Bleecher Snyder
ef30a1c8aa runtime: fix some assembly offset names
For vet. There are more. This is a start.

Change-Id: Ibbbb2b20b5db60ee3fac4a1b5913d18fab01f6b9
Reviewed-on: https://go-review.googlesource.com/36939
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-14 02:09:48 +00:00
Josh Bleecher Snyder
cc2a52adef all: use keyed composite literals
Makes vet happy.

Change-Id: I7250f283c96e82b9796c5672a0a143ba7568fa63
Reviewed-on: https://go-review.googlesource.com/36937
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-14 02:09:14 +00:00
Josh Bleecher Snyder
46a75870ad runtime: speed up fastrand() % n
This occurs a fair amount in the runtime for non-power-of-two n.
Use an alternative, faster formulation.

name           old time/op  new time/op  delta
Fastrandn/2-8  4.45ns ± 2%  2.09ns ± 3%  -53.12%  (p=0.000 n=14+14)
Fastrandn/3-8  4.78ns ±11%  2.06ns ± 2%  -56.94%  (p=0.000 n=15+15)
Fastrandn/4-8  4.76ns ± 9%  1.99ns ± 3%  -58.28%  (p=0.000 n=15+13)
Fastrandn/5-8  4.96ns ±13%  2.03ns ± 6%  -59.14%  (p=0.000 n=15+15)

name                    old time/op  new time/op  delta
SelectUncontended-8     33.7ns ± 2%  33.9ns ± 2%  +0.70%  (p=0.000 n=49+50)
SelectSyncContended-8   1.68µs ± 4%  1.65µs ± 4%  -1.54%  (p=0.000 n=50+45)
SelectAsyncContended-8   282ns ± 1%   277ns ± 1%  -1.50%  (p=0.000 n=48+43)
SelectNonblock-8        5.31ns ± 1%  5.32ns ± 1%    ~     (p=0.275 n=45+44)
SelectProdCons-8         585ns ± 3%   577ns ± 2%  -1.35%  (p=0.000 n=50+50)
GoroutineSelect-8       1.59ms ± 2%  1.59ms ± 1%    ~     (p=0.084 n=49+48)

Updates #16213

Change-Id: Ib555a4d7da2042a25c3976f76a436b536487d5b7
Reviewed-on: https://go-review.googlesource.com/36932
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-14 00:01:22 +00:00
Ian Lance Taylor
62237c2c8e runtime: if runtime is stale while testing, show StaleReason
Update #19062.

Change-Id: I7397b573389145b56e73d2150ce0fc9aa75b3caa
Reviewed-on: https://go-review.googlesource.com/36934
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-13 23:46:12 +00:00
Sokolov Yura
663226d8e1 runtime: make fastrand to generate 32bit values
Extend period of fastrand from (1<<31)-1 to (1<<32)-1 by
choosing other polynom and reacting on high bit before shift.

Polynomial is taken at https://users.ece.cmu.edu/~koopman/lfsr/index.html
from 32.dat.gz . It is referred as F7711115 cause this list of
polynomials is for LFSR with shift to right (and fastrand uses shift to
left). (old polynomial is referred in 31.dat.gz as 7BB88888).

There were couple of places with conversation of fastrand to int, which
leads to negative values on 32bit platforms. They are fixed.

Change-Id: Ibee518a3f9103e0aea220ada494b3aec77babb72
Reviewed-on: https://go-review.googlesource.com/36875
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-13 20:22:02 +00:00
Ian Lance Taylor
40c27ed5bc runtime: if runtime is stale while testing, show cmd/go output
Update #19062.

Change-Id: If6a4c4f8d12e148b162256f13a8ee423f6e30637
Reviewed-on: https://go-review.googlesource.com/36918
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-13 20:00:05 +00:00
Ian Lance Taylor
3792db5183 net: refactor poller into new internal/poll package
This will make it possible to use the poller with the os package.

This is a lot of code movement but the behavior is intended to be
unchanged.

Update #6817.
Update #7903.
Update #15021.
Update #18507.

Change-Id: I1413685928017c32df5654ded73a2643820977ae
Reviewed-on: https://go-review.googlesource.com/36799
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-13 18:36:28 +00:00
Keith Randall
5a75d6a08e cmd/compile: optimize non-empty-interface type conversions
When doing i.(T) for non-empty-interface i and concrete type T,
there's no need to read the type out of the itab. Just compare the
itab to the itab we expect for that interface/type pair.

Also optimize type switches by putting the type hash of the
concrete type in the itab. That way we don't need to load the
type pointer out of the itab.

Update #18492

Change-Id: I49e280a21e5687e771db5b8a56b685291ac168ce
Reviewed-on: https://go-review.googlesource.com/34810
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2017-02-13 18:16:31 +00:00
Josh Bleecher Snyder
8da91a6297 runtime: add Frames example
Based on sample code from iant.

Fixes #18788.

Change-Id: I6bb33ed05af2538fbde42ddcac629280ef7c00a6
Reviewed-on: https://go-review.googlesource.com/36892
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-13 06:10:35 +00:00
Russ Cox
45c6f59e1f runtime: use two-level list for semaphore address search in semaRoot
If there are many goroutines contending for two different locks
and both locks hash to the same semaRoot, the scans to find the
goroutines for a particular lock can end up being O(n), making
n lock acquisitions quadratic.

As long as only one actively-used lock hashes to each semaRoot
there's no problem, since the list operations in that case are O(1).
But when the second actively-used lock hits the same semaRoot,
then scans for entries with for a given lock have to scan over the
entries for the other lock.

Fix this problem by changing the semaRoot to hold only one sudog
per unique address. In the running example, this drops the length of
that list from O(n) to 2. Then attach other goroutines waiting on the
same address to a separate list headed by the sudog in the semaRoot list.
Those "same address list" operations are still O(1), so now the
example from above works much better.

There is still an assumption here that in real programs you don't have
many many goroutines queueing up on many many distinct addresses.
If we end up with that problem, we can replace the top-level list with
a treap.

Fixes #17953.

Change-Id: I78c5b1a5053845275ab31686038aa4f6db5720b2
Reviewed-on: https://go-review.googlesource.com/36792
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-12 15:54:16 +00:00