mirror of https://github.com/golang/go synced 2024-10-02 10:28:34 -06:00
Commit Graph

96 Commits

Author SHA1 Message Date
Austin Clements
1a2cf91f5e runtime: split gfree list into with-stacks and without-stacks
Currently all free Gs are added to one list. Split this into two
lists: one for free Gs with cached stacks and one for Gs without
cached stacks.

This lets us preferentially allocate Gs that already have a stack, but
more importantly, it sets us up to free cached G stacks concurrently.
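
As a sketch of the resulting structure (names invented here, not taken
from the CL):

type g struct {
	stack     []byte // nil once the cached stack has been freed
	schedlink *g
}

var gfreeStack, gfreeNoStack *g // two free lists instead of one

func gfput(gp *g) {
	if gp.stack != nil {
		gp.schedlink = gfreeStack
		gfreeStack = gp
	} else {
		gp.schedlink = gfreeNoStack
		gfreeNoStack = gp
	}
}

func gfget() *g {
	// Preferentially reuse a G that already has a stack.
	if gp := gfreeStack; gp != nil {
		gfreeStack = gp.schedlink
		return gp
	}
	if gp := gfreeNoStack; gp != nil {
		gfreeNoStack = gp.schedlink
		gp.stack = make([]byte, 2048) // stands in for stackalloc
		return gp
	}
	return nil
}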

Change-Id: Idbe486f708997e1c9d166662995283f02d1eeb3c
Reviewed-on: https://go-review.googlesource.com/20664
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:39:51 +00:00
Dmitry Vyukov
a3703618ea runtime: use per-goroutine sequence numbers in tracer
Currently the tracer uses a global sequencer, which introduces
significant slowdown on parallel machines (up to 10x).
Replace the global sequencer with a per-goroutine sequencer.

If we assign per-goroutine sequence numbers to only 3 types
of events (start, unblock and syscall exit), that is enough to
restore a consistent partial ordering of all events. Even these
events don't need sequence numbers all the time (if a goroutine
starts on the same P where it was unblocked, then the start does
not need a sequence number).
The burden of restoring the order is put on the trace parser.
Details of the algorithm are described in the comments.
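
A minimal model of the parser-side ordering (types invented for
illustration): an event carrying a sequence number is held back until
its per-goroutine predecessor has been emitted.

type event struct {
	g   uint64 // goroutine ID
	seq uint64 // per-goroutine sequence number
}

// order drains pending events, per goroutine strictly by seq.
// pending maps goroutine ID to its events, sorted by seq.
func order(pending map[uint64][]event) []event {
	next := make(map[uint64]uint64) // next expected seq per goroutine
	var out []event
	for progress := true; progress; {
		progress = false
		for g, evs := range pending {
			for len(evs) > 0 && evs[0].seq == next[g] {
				out = append(out, evs[0])
				evs = evs[1:]
				next[g]++
				progress = true
			}
			pending[g] = evs
		}
	}
	return out
}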

On http benchmark with GOMAXPROCS=48:
no tracing: 5026 ns/op
tracing: 27803 ns/op (+453%)
with this change: 6369 ns/op (+26%, mostly for traceback)

Also trace size is reduced by ~22%. Average event size before: 4.63
bytes/event, after: 3.62 bytes/event.

Besides running the trace tests, I've also tested with manually broken
cputicks (random skew for each event, per-P skew and episodic random skew).
In all cases the broken timestamps were detected and there were no test
failures.

Change-Id: I078bde421ccc386a66f6c2051ab207bcd5613efa
Reviewed-on: https://go-review.googlesource.com/21512
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-23 15:57:05 +00:00
David Crawshaw
7d469179e6 cmd/compile, etc: store method tables as offsets
This CL introduces the typeOff type and a lookup method of the same
name that can turn a typeOff offset into an *rtype.

In a typical Go binary (built with buildmode=exe, pie, c-archive, or
c-shared), there is one moduledata and all typeOff values are offsets
relative to firstmoduledata.types. This makes computing the pointer
cheap in typical programs.

With buildmode=shared (and one day, buildmode=plugin) there are
multiple modules whose relative offset is determined at runtime.
We identify a type in the general case by the pair of the original
*rtype that references it and its typeOff value. We determine
the module from the original pointer, and then use the typeOff from
there to compute the final *rtype.

To ensure there is only one *rtype representing each type, the
runtime initializes a typemap for each module, using any identical
type from an earlier module when resolving that offset. This means
that types computed from an offset match the type mapped by the
pointer dynamic relocations.

A series of followup CLs will replace other *rtype values with typeOff
(and name/*string with nameOff).

For types created at runtime by reflect, type offsets are treated as
global IDs and reference into a reflect offset map kept by the runtime.
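
A hedged sketch of the resolution path described above (types pared
down to essentials; the real rtype and moduledata carry much more, and
nil-module handling is omitted):

import "unsafe"

type rtype struct{ /* ... */ }

type moduledata struct {
	types, etypes uintptr          // bounds of this module's type data
	typemap       map[int32]*rtype // canonical *rtype per offset
	next          *moduledata
}

var firstmoduledata moduledata

// typeOff resolves an offset relative to the module containing the
// referencing type t.
func (t *rtype) typeOff(off int32) *rtype {
	base := uintptr(unsafe.Pointer(t))
	var md *moduledata
	for m := &firstmoduledata; m != nil; m = m.next {
		if base >= m.types && base < m.etypes {
			md = m
			break
		}
	}
	if r := md.typemap[off]; r != nil {
		return r // deduplicated against an earlier module
	}
	return (*rtype)(unsafe.Pointer(md.types + uintptr(off)))
}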

darwin/amd64:
	cmd/go:  -57KB (0.6%)
	jujud:  -557KB (0.8%)

linux/amd64 PIE:
	cmd/go: -361KB (3.0%)
	jujud:  -3.5MB (4.2%)

For #6853.

Change-Id: Icf096fd884a0a0cb9f280f46f7a26c70a9006c96
Reviewed-on: https://go-review.googlesource.com/21285
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-13 13:03:11 +00:00
Emmanuel Odeke
e4f1d9cf2e runtime: make execution error panic values implement the Error interface
Make execution panics implement Error, as
mandated by https://golang.org/ref/spec#Run_time_panics,
instead of panicking with strings.
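
A small runnable illustration of the user-visible effect:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	defer func() {
		// Execution panics now satisfy runtime.Error (which
		// embeds error), not just a bare string.
		if err, ok := recover().(runtime.Error); ok {
			fmt.Println("caught:", err.Error())
		}
	}()
	var m map[string]int
	m["boom"] = 1 // assignment to entry in nil map
}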

Fixes #14965

Change-Id: I7827f898b9b9c08af541db922cc24fa0800ff18a
Reviewed-on: https://go-review.googlesource.com/21214
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-10 01:16:30 +00:00
Michael Hudson-Doyle
31cf1c1779 runtime: clamp OS-reported number of processors to _MaxGomaxprocs
So that Go processes do not all die on startup on a system with >256 CPUs.

I tested this by hacking osinit to set ncpu to 1000.
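
The clamp itself is small; approximately (placement in schedinit
assumed):

procs := int(ncpu)
if procs > _MaxGomaxprocs {
	procs = _MaxGomaxprocs
}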

Updates #15131

Change-Id: I52e061a0de97be41d684dd8b748fa9087d6f1aef
Reviewed-on: https://go-review.googlesource.com/21599
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-07 00:11:25 +00:00
Dmitry Vyukov
475d113b53 runtime: don't burn CPU unnecessarily
Two GC-related functions, scang and casgstatus, wait in an active spin loop.
Active spinning is never a good idea in user-space. Once we have waited
several times longer than the expected wait time, something unexpected is
happening (e.g. the thread we are waiting for is descheduled or handling
a page fault) and we need to yield to the OS scheduler. Moreover, the
expected wait time is
very high for these functions: scang wait time can be tens of milliseconds,
casgstatus can be hundreds of microseconds. It does not make sense to spin
even for that time.

go install -a std profile on a 4-core machine shows that 11% of time is spent
in the active spin in scang:

  6.12%    compile  compile                [.] runtime.scang
  3.27%    compile  compile                [.] runtime.readgstatus
  1.72%    compile  compile                [.] runtime/internal/atomic.Load

The active spin also increases tail latency in the case of the slightest
oversubscription: GC goroutines spend whole quantum in the loop instead of
executing user code.

Here is scang wait time histogram during go install -a std:

13707.0000 - 1815442.7667 [   118]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎...
1815442.7667 - 3617178.5333 [     9]: ∎∎∎∎∎∎∎∎∎
3617178.5333 - 5418914.3000 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
5418914.3000 - 7220650.0667 [     5]: ∎∎∎∎∎
7220650.0667 - 9022385.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
9022385.8333 - 10824121.6000 [    13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
10824121.6000 - 12625857.3667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
12625857.3667 - 14427593.1333 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
14427593.1333 - 16229328.9000 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
16229328.9000 - 18031064.6667 [    32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
18031064.6667 - 19832800.4333 [    28]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
19832800.4333 - 21634536.2000 [     6]: ∎∎∎∎∎∎
21634536.2000 - 23436271.9667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
23436271.9667 - 25238007.7333 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
25238007.7333 - 27039743.5000 [    27]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
27039743.5000 - 28841479.2667 [    20]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
28841479.2667 - 30643215.0333 [    10]: ∎∎∎∎∎∎∎∎∎∎
30643215.0333 - 32444950.8000 [     7]: ∎∎∎∎∎∎∎
32444950.8000 - 34246686.5667 [     4]: ∎∎∎∎
34246686.5667 - 36048422.3333 [     4]: ∎∎∎∎
36048422.3333 - 37850158.1000 [     1]: ∎
37850158.1000 - 39651893.8667 [     5]: ∎∎∎∎∎
39651893.8667 - 41453629.6333 [     2]: ∎∎
41453629.6333 - 43255365.4000 [     2]: ∎∎
43255365.4000 - 45057101.1667 [     2]: ∎∎
45057101.1667 - 46858836.9333 [     1]: ∎
46858836.9333 - 48660572.7000 [     2]: ∎∎
48660572.7000 - 50462308.4667 [     3]: ∎∎∎
50462308.4667 - 52264044.2333 [     2]: ∎∎
52264044.2333 - 54065780.0000 [     2]: ∎∎

and the zoomed-in first part:

13707.0000 - 19916.7667 [     2]: ∎∎
19916.7667 - 26126.5333 [     2]: ∎∎
26126.5333 - 32336.3000 [     9]: ∎∎∎∎∎∎∎∎∎
32336.3000 - 38546.0667 [     8]: ∎∎∎∎∎∎∎∎
38546.0667 - 44755.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
44755.8333 - 50965.6000 [    10]: ∎∎∎∎∎∎∎∎∎∎
50965.6000 - 57175.3667 [     5]: ∎∎∎∎∎
57175.3667 - 63385.1333 [     6]: ∎∎∎∎∎∎
63385.1333 - 69594.9000 [     5]: ∎∎∎∎∎
69594.9000 - 75804.6667 [     6]: ∎∎∎∎∎∎
75804.6667 - 82014.4333 [     6]: ∎∎∎∎∎∎
82014.4333 - 88224.2000 [     4]: ∎∎∎∎
88224.2000 - 94433.9667 [     1]: ∎
94433.9667 - 100643.7333 [     1]: ∎
100643.7333 - 106853.5000 [     2]: ∎∎
106853.5000 - 113063.2667 [     0]:
113063.2667 - 119273.0333 [     2]: ∎∎
119273.0333 - 125482.8000 [     2]: ∎∎
125482.8000 - 131692.5667 [     1]: ∎
131692.5667 - 137902.3333 [     1]: ∎
137902.3333 - 144112.1000 [     0]:
144112.1000 - 150321.8667 [     2]: ∎∎
150321.8667 - 156531.6333 [     1]: ∎
156531.6333 - 162741.4000 [     1]: ∎
162741.4000 - 168951.1667 [     0]:
168951.1667 - 175160.9333 [     0]:
175160.9333 - 181370.7000 [     1]: ∎
181370.7000 - 187580.4667 [     1]: ∎
187580.4667 - 193790.2333 [     2]: ∎∎
193790.2333 - 200000.0000 [     0]:

Here is casgstatus wait time histogram:

  631.0000 -  5276.6333 [     3]: ∎∎∎
 5276.6333 -  9922.2667 [     5]: ∎∎∎∎∎
 9922.2667 - 14567.9000 [     2]: ∎∎
14567.9000 - 19213.5333 [     6]: ∎∎∎∎∎∎
19213.5333 - 23859.1667 [     5]: ∎∎∎∎∎
23859.1667 - 28504.8000 [     6]: ∎∎∎∎∎∎
28504.8000 - 33150.4333 [     6]: ∎∎∎∎∎∎
33150.4333 - 37796.0667 [     2]: ∎∎
37796.0667 - 42441.7000 [     1]: ∎
42441.7000 - 47087.3333 [     3]: ∎∎∎
47087.3333 - 51732.9667 [     0]:
51732.9667 - 56378.6000 [     1]: ∎
56378.6000 - 61024.2333 [     0]:
61024.2333 - 65669.8667 [     0]:
65669.8667 - 70315.5000 [     0]:
70315.5000 - 74961.1333 [     1]: ∎
74961.1333 - 79606.7667 [     0]:
79606.7667 - 84252.4000 [     0]:
84252.4000 - 88898.0333 [     0]:
88898.0333 - 93543.6667 [     0]:
93543.6667 - 98189.3000 [     0]:
98189.3000 - 102834.9333 [     0]:
102834.9333 - 107480.5667 [     1]: ∎
107480.5667 - 112126.2000 [     0]:
112126.2000 - 116771.8333 [     0]:
116771.8333 - 121417.4667 [     0]:
121417.4667 - 126063.1000 [     0]:
126063.1000 - 130708.7333 [     0]:
130708.7333 - 135354.3667 [     0]:
135354.3667 - 140000.0000 [     1]: ∎

Ideally we would eliminate the waiting by switching to an async
state machine for GC, but for now just yield to the OS scheduler
after a reasonable wait time.
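
A sketch of the adopted wait pattern (procyield and osyield are the
runtime's spin and OS-yield primitives; done stands in for the actual
status check):

const yieldDelay = 10 * 1000 // ~10us for scang; casgstatus uses ~5us

nextYield := nanotime() + yieldDelay
for !done() {
	if nanotime() < nextYield {
		procyield(10) // brief active spin
	} else {
		osyield() // yield to the OS scheduler
		nextYield = nanotime() + yieldDelay/2
	}
}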

To choose yielding parameters I've measured
golang.org/x/benchmarks/http tail latencies with different yield
delays and oversubscription levels.

With no oversubscription (to the degree possible):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50   1.41ms ±15%  1.41ms ± 5%    ~     (p=0.611 n=13+12)
Latency-95   5.21ms ± 2%  5.15ms ± 2%  -1.15%  (p=0.012 n=13+13)
Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.54%  (p=0.002 n=13+13)
Latency-999  10.7ms ± 9%  10.2ms ±10%  -5.46%  (p=0.004 n=12+13)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50   1.41ms ±15%  1.41ms ± 8%    ~     (p=0.511 n=13+13)
Latency-95   5.21ms ± 2%  5.14ms ± 2%  -1.23%  (p=0.006 n=13+13)
Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.94%  (p=0.000 n=13+13)
Latency-999  10.7ms ± 9%  10.1ms ± 8%  -6.14%  (p=0.000 n=12+13)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50   1.41ms ±15%  1.45ms ± 6%    ~     (p=0.724 n=13+13)
Latency-95   5.21ms ± 2%  5.18ms ± 1%    ~     (p=0.287 n=13+13)
Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.64%  (p=0.002 n=13+13)
Latency-999  10.7ms ± 9%  10.0ms ± 5%  -6.72%  (p=0.000 n=12+13)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50   1.41ms ±15%  1.51ms ± 7%  +6.57%  (p=0.002 n=13+13)
Latency-95   5.21ms ± 2%  5.21ms ± 2%    ~     (p=0.960 n=13+13)
Latency-99   7.16ms ± 2%  7.06ms ± 2%  -1.50%  (p=0.012 n=13+13)
Latency-999  10.7ms ± 9%  10.0ms ± 6%  -6.49%  (p=0.000 n=12+13)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50   1.41ms ±15%  1.53ms ± 6%  +8.48%  (p=0.000 n=13+12)
Latency-95   5.21ms ± 2%  5.23ms ± 2%    ~     (p=0.287 n=13+13)
Latency-99   7.16ms ± 2%  7.08ms ± 2%  -1.21%  (p=0.004 n=13+13)
Latency-999  10.7ms ± 9%   9.9ms ± 3%  -7.99%  (p=0.000 n=12+12)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50   1.41ms ±15%  1.47ms ± 5%    ~     (p=0.072 n=13+13)
Latency-95   5.21ms ± 2%  5.17ms ± 2%    ~     (p=0.091 n=13+13)
Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.99%  (p=0.000 n=13+13)
Latency-999  10.7ms ± 9%   9.9ms ± 5%  -7.86%  (p=0.000 n=12+13)

With slight oversubscription (another instance of http benchmark
was running in background with reduced GOMAXPROCS):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50    840µs ± 3%   804µs ± 3%  -4.37%  (p=0.000 n=15+18)
Latency-95   6.52ms ± 4%  6.03ms ± 4%  -7.51%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.0ms ± 4%  -7.33%  (p=0.000 n=18+14)
Latency-999  18.0ms ± 9%  16.8ms ± 7%  -6.84%  (p=0.000 n=18+18)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50    840µs ± 3%   809µs ± 3%  -3.71%  (p=0.000 n=15+17)
Latency-95   6.52ms ± 4%  6.11ms ± 4%  -6.29%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%   9.9ms ± 6%  -7.55%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.5ms ±11%  -8.49%  (p=0.000 n=18+18)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50    840µs ± 3%   823µs ± 5%  -2.06%  (p=0.002 n=15+18)
Latency-95   6.52ms ± 4%  6.32ms ± 3%  -3.05%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 4%  -5.22%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.7ms ±10%  -7.09%  (p=0.000 n=18+18)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50    840µs ± 3%   836µs ± 5%    ~     (p=0.442 n=15+18)
Latency-95   6.52ms ± 4%  6.39ms ± 3%  -2.00%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 6%  -5.15%  (p=0.000 n=18+17)
Latency-999  18.0ms ± 9%  16.6ms ± 8%  -7.48%  (p=0.000 n=18+18)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50    840µs ± 3%   836µs ± 6%    ~     (p=0.401 n=15+18)
Latency-95   6.52ms ± 4%  6.40ms ± 4%  -1.79%  (p=0.010 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 5%  -4.95%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.5ms ±14%  -8.17%  (p=0.000 n=18+18)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50    840µs ± 3%   828µs ± 2%  -1.49%  (p=0.001 n=15+17)
Latency-95   6.52ms ± 4%  6.38ms ± 4%  -2.04%  (p=0.001 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 4%  -4.77%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.9ms ± 9%  -6.23%  (p=0.000 n=18+18)

With significant oversubscription (background http benchmark
was running with full GOMAXPROCS):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50   1.32ms ±12%  1.30ms ±13%    ~     (p=0.454 n=14+14)
Latency-95   16.3ms ±10%  15.3ms ± 7%  -6.29%  (p=0.001 n=14+14)
Latency-99   29.4ms ±10%  27.9ms ± 5%  -5.04%  (p=0.001 n=14+12)
Latency-999  49.9ms ±19%  45.9ms ± 5%  -8.00%  (p=0.008 n=14+13)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50   1.32ms ±12%  1.29ms ± 9%    ~     (p=0.227 n=14+14)
Latency-95   16.3ms ±10%  15.4ms ± 5%  -5.27%  (p=0.002 n=14+14)
Latency-99   29.4ms ±10%  27.9ms ± 6%  -5.16%  (p=0.001 n=14+14)
Latency-999  49.9ms ±19%  46.8ms ± 8%  -6.21%  (p=0.050 n=14+14)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50   1.32ms ±12%  1.35ms ± 9%     ~     (p=0.401 n=14+14)
Latency-95   16.3ms ±10%  15.0ms ± 4%   -7.67%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.4ms ± 5%   -6.98%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.7ms ± 5%  -10.56%  (p=0.000 n=14+11)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50   1.32ms ±12%  1.36ms ±10%     ~     (p=0.246 n=14+14)
Latency-95   16.3ms ±10%  14.9ms ± 5%   -8.31%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.4ms ± 7%   -6.70%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.9ms ±15%  -10.13%  (p=0.003 n=14+14)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50   1.32ms ±12%  1.41ms ± 9%  +6.37%  (p=0.008 n=14+13)
Latency-95   16.3ms ±10%  15.1ms ± 8%  -7.45%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.5ms ±12%  -6.67%  (p=0.002 n=14+14)
Latency-999  49.9ms ±19%  45.9ms ±16%  -8.06%  (p=0.019 n=14+14)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50   1.32ms ±12%  1.42ms ±10%   +7.21%  (p=0.003 n=14+14)
Latency-95   16.3ms ±10%  15.0ms ± 7%   -7.59%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.3ms ± 8%   -7.20%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.8ms ± 8%  -10.21%  (p=0.001 n=14+13)

All numbers are on 8 cores and with GOGC=10 (http benchmark has
tiny heap, few goroutines and low allocation rate, so by default
GC barely affects tail latency).

10us/5us yield delays seem to provide a reasonable compromise
and give a 5-10% tail latency reduction. That's what is used in this change.

go install -a std results on 4 core machine:

name      old time/op  new time/op  delta
Time       8.39s ± 2%   7.94s ± 2%  -5.34%  (p=0.000 n=47+49)
UserTime   24.6s ± 2%   22.9s ± 2%  -6.76%  (p=0.000 n=49+49)
SysTime    1.77s ± 9%   1.89s ±11%  +7.00%  (p=0.000 n=49+49)
CpuLoad    315ns ± 2%   313ns ± 1%  -0.59%  (p=0.000 n=49+48) # %CPU
MaxRSS    97.1ms ± 4%  97.5ms ± 9%    ~     (p=0.838 n=46+49) # bytes

Update #14396
Update #14189

Change-Id: I3f4109bf8f7fd79b39c466576690a778232055a2
Reviewed-on: https://go-review.googlesource.com/21503
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-04-05 15:52:03 +00:00
Dmitry Vyukov
3b246fa863 runtime: sleep less when we can do work
Usleep(100) in runqgrab negatively affects latency and throughput
of parallel applications. We are sleeping instead of doing useful work.
This effect is particularly visible on windows, where the minimal
sleep duration is 1-15ms.

Reduce sleep from 100us to 3us and use osyield on windows.
Sync chan send/recv takes ~50ns, so 3us gives us ~50x overshoot.
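
The retry path in runqgrab becomes, approximately:

if GOOS != "windows" {
	usleep(3)
} else {
	// System timer granularity on windows is 1-15ms, far too
	// coarse for this, so just yield.
	osyield()
}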

benchmark                    old ns/op     new ns/op     delta
BenchmarkChanSync-12         216           217           +0.46%
BenchmarkChanSyncWork-12     27213         25816         -5.13%

CPU consumption goes up from 106% to 108% in the first case,
and from 107% to 125% in the second case.

Test case from #14790 on windows:

BenchmarkDefaultResolution-8  4583372   29720    -99.35%
Benchmark1ms-8                992056    30701    -96.91%

99-th latency percentile for HTTP request serving is improved by up to 15%
(see http://golang.org/cl/20835 for details).

The following benchmarks are from the change that originally added this sleep
(see https://golang.org/s/go15gomaxprocs):

name        old time/op  new time/op  delta
Chain       22.6µs ± 2%  22.7µs ± 6%    ~      (p=0.905 n=9+10)
ChainBuf    22.4µs ± 3%  22.5µs ± 4%    ~      (p=0.780 n=9+10)
Chain-2     23.5µs ± 4%  24.9µs ± 1%  +5.66%   (p=0.000 n=10+9)
ChainBuf-2  23.7µs ± 1%  24.4µs ± 1%  +3.31%   (p=0.000 n=9+10)
Chain-4     24.2µs ± 2%  25.1µs ± 3%  +3.70%   (p=0.000 n=9+10)
ChainBuf-4  24.4µs ± 5%  25.0µs ± 2%  +2.37%  (p=0.023 n=10+10)
Powser       2.37s ± 1%   2.37s ± 1%    ~       (p=0.423 n=8+9)
Powser-2     2.48s ± 2%   2.57s ± 2%  +3.74%   (p=0.000 n=10+9)
Powser-4     2.66s ± 1%   2.75s ± 1%  +3.40%  (p=0.000 n=10+10)
Sieve        13.3s ± 2%   13.3s ± 2%    ~      (p=1.000 n=10+9)
Sieve-2      7.00s ± 2%   7.44s ±16%    ~      (p=0.408 n=8+10)
Sieve-4      4.13s ±21%   3.85s ±22%    ~       (p=0.113 n=9+9)

Fixes #14790

Change-Id: Ie7c6a1c4f9c8eb2f5d65ab127a3845386d6f8b5d
Reviewed-on: https://go-review.googlesource.com/20835
Reviewed-by: Austin Clements <austin@google.com>
2016-04-05 15:32:06 +00:00
Ian Lance Taylor
59fc42b230 runtime: allocate mp.cgocallers earlier
Fixes #15061.

Change-Id: I71f69f398d1c5f3a884bbd044786f1a5600d0fae
Reviewed-on: https://go-review.googlesource.com/21398
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-01 22:23:13 +00:00
Michel Lespinasse
7043d2bb5e runtime: insert itabs into hash table during init
See #14874

This change makes the runtime register all compiler generated itabs
(as obtained from the moduledata) during init.

Change-Id: I9969a0985b99b8bda820a631f7fe4c78f1174cdf
Reviewed-on: https://go-review.googlesource.com/20900
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Michel Lespinasse <walken@google.com>
2016-03-29 02:14:49 +00:00
Dmitry Vyukov
ea0386f85f runtime: improve randomized stealing logic
During random stealing we steal 4*GOMAXPROCS times from random procs.
One would expect that most of the time we check all procs this way,
but due to the low quality of the PRNG we actually miss procs with
frightening probability. Below are modelling experiment results for 1e6
tries:

GOMAXPROCS = 2 : missed 1 procs 7944 times

GOMAXPROCS = 3 : missed 1 procs 101620 times
GOMAXPROCS = 3 : missed 2 procs 3571 times

GOMAXPROCS = 4 : missed 1 procs 63916 times
GOMAXPROCS = 4 : missed 2 procs 61 times
GOMAXPROCS = 4 : missed 3 procs 16 times

GOMAXPROCS = 5 : missed 1 procs 133136 times
GOMAXPROCS = 5 : missed 2 procs 1025 times
GOMAXPROCS = 5 : missed 3 procs 101 times
GOMAXPROCS = 5 : missed 4 procs 15 times

GOMAXPROCS = 8 : missed 1 procs 151765 times
GOMAXPROCS = 8 : missed 2 procs 5057 times
GOMAXPROCS = 8 : missed 3 procs 1726 times
GOMAXPROCS = 8 : missed 4 procs 68 times

GOMAXPROCS = 12 : missed 1 procs 199081 times
GOMAXPROCS = 12 : missed 2 procs 27489 times
GOMAXPROCS = 12 : missed 3 procs 3113 times
GOMAXPROCS = 12 : missed 4 procs 233 times
GOMAXPROCS = 12 : missed 5 procs 9 times

GOMAXPROCS = 16 : missed 1 procs 237477 times
GOMAXPROCS = 16 : missed 2 procs 30037 times
GOMAXPROCS = 16 : missed 3 procs 9466 times
GOMAXPROCS = 16 : missed 4 procs 1334 times
GOMAXPROCS = 16 : missed 5 procs 192 times
GOMAXPROCS = 16 : missed 6 procs 5 times
GOMAXPROCS = 16 : missed 7 procs 1 times
GOMAXPROCS = 16 : missed 8 procs 1 times

A missed proc won't lead to underutilization because we check all procs
again after dropping the P. But it can lead to an unpleasant situation
where we miss a proc, drop the P, check all procs, discover work,
acquire a P, miss the proc again, and repeat.

Improve stealing logic to cover all procs.
Also don't enter spinning mode and try to steal when there is nobody around.
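
A self-contained model of the coverage guarantee (the real CL
precomputes coprime strides per GOMAXPROCS; names here are invented):
starting anywhere and stepping by an increment coprime to n visits
every proc exactly once.

// visitAll returns a random-looking order over procs 0..n-1 that
// provably covers all of them: gcd(inc, n) == 1 makes the walk a
// permutation.
func visitAll(n, seed int) []int {
	inc := 1 + seed%n
	for gcd(inc, n) != 1 {
		inc++
	}
	pos := seed % n
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, pos)
		pos = (pos + inc) % n
	}
	return out
}

func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}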

Change-Id: Ibb6b122cc7fb836991bad7d0639b77c807aab4c2
Reviewed-on: https://go-review.googlesource.com/20836
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2016-03-25 11:00:48 +00:00
Austin Clements
8fb182d020 runtime: never pass stack pointers to gopark
gopark calls the unlock function after setting the G to _Gwaiting.
This means it's generally unsafe to access the G's stack from the
unlock function because the G may start running on another P. Once we
start shrinking stacks concurrently, a stack shrink could also move
the stack the moment after it enters _Gwaiting and before the unlock
function is called.

Document this restriction and fix the two places where we currently
violate it.

This is unlikely to be a problem in practice for these two places
right now, but they're already skating on thin ice. For example, the
following sequence could in principle cause corruption, deadlock, or a
panic in the select code:

On M1/P1:
1. G1 selects on channels A and B.
2. selectgoImpl calls gopark.
3. gopark puts G1 in _Gwaiting.
4. gopark calls selparkcommit.
5. selparkcommit releases the lock on channel A.

On M2/P2:
6. G2 sends to channel A.
7. The send puts G1 in _Grunnable and puts it on P2's run queue.
8. The scheduler runs, selects G1, puts it in _Grunning, and resumes G1.
9. On G1, the sellock immediately following the gopark gets called.
10. sellock grows and moves the stack.

On M1/P1:
11. selparkcommit continues to scan the lock order for the next
channel to unlock, but it's now reading from a freed (and possibly
reused) stack.

This shouldn't happen in practice because step 10 isn't the first call
to sellock, so the stack should already be big enough. However, once
we start shrinking stacks concurrently, this reasoning won't work any
more.
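
The shape of the fix, sketched (simplified; the real selparkcommit
also avoids unlocking the same channel twice, and the c field comes
from the sudog change below): the unlock callback reads only
heap-allocated sudogs, never the parked G's stack.

// unlockf runs after gp is already in _Gwaiting, so gp may resume
// on another P at any moment; only heap state is safe to touch.
func selparkcommit(gp *g, _ unsafe.Pointer) bool {
	for sg := gp.waiting; sg != nil; sg = sg.waitlink {
		unlock(&sg.c.lock) // channel recorded in the sudog
	}
	return true
}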

For #12967.

Change-Id: I3660c5be37e5be9f87433cb8141bdfdf37fadc4c
Reviewed-on: https://go-review.googlesource.com/20038
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:13:10 +00:00
Austin Clements
e4a95b6343 runtime: record channel in sudog
Given a G, there's currently no way to find the channel it's blocking
on. We'll need this information to fix a (probably theoretical) bug in
select and to implement concurrent stack shrinking, so record the
channel in the sudog.
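
Sketched, the addition is one field (surrounding sudog fields
abridged):

type sudog struct {
	g        *g
	next     *sudog
	prev     *sudog
	elem     unsafe.Pointer // data element
	waitlink *sudog         // g.waiting list
	c        *hchan         // channel this sudog is blocking on (new)
}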

For #12967.

Change-Id: If8fb63a140f1d07175818824d08c0ebeec2bdf66
Reviewed-on: https://go-review.googlesource.com/20035
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:13:02 +00:00
Emmanuel Odeke
6dfcc336c5 runtime: move testSchedLocalQueue* to export_test
Move the functions testSchedLocalQueueLocal and testSchedLocalQueueSteal
from proc.go to export_test.go, the only place where they are used.

Fixes #14796

Change-Id: I16b6fa4a13835eab33f66a2c2e87a5f5c79b7bd3
Reviewed-on: https://go-review.googlesource.com/20640
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-13 00:34:58 +00:00
Brad Fitzpatrick
5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedent.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
Dmitry Vyukov
bdc14698f8 runtime: unwire g/m in dropg always
Currently dropg does not unwire locked g/m.
This is an unnecessary distinction between locked and non-locked g/m.
We always restart goroutines with execute, which re-wires g/m.

First, this produces a false sense that the distinction is necessary.
Second, it can confuse some sanity and cross checks. For example,
if we check that g/m are unwired before we wire them in execute,
the check will fail for locked g/m. I've hit this while doing some
race detector changes: when we deschedule a goroutine and run
scheduler code, m.curg is generally nil, but not for locked ms.

Remove the distinction.

Change-Id: I3b87a28ff343baa1d564aab1f821b582a84dee07
Reviewed-on: https://go-review.googlesource.com/19950
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-26 15:45:45 +00:00
Austin Clements
cbe849fc38 runtime: eliminate unused _Genqueue state
_Genqueue and _Gscanenqueue were introduced as part of the GC quiesce
code. The quiesce code was removed by 197aa9e, but these states and
some associated code stuck around. Remove them.

Change-Id: I69df81881602d4a431556513dac2959668d27c20
Reviewed-on: https://go-review.googlesource.com/19638
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:32 +00:00
Martin Möhrmann
fdd0179bb1 all: fix typos and spelling
Change-Id: Icd06d99c42b8299fd931c7da821e1f418684d913
Reviewed-on: https://go-review.googlesource.com/19829
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-24 18:42:29 +00:00
Shenghou Ma
d70c04cf08 runtime: fix missing word in comment
Change-Id: I6cb8ac7b59812e82111ab3b0f8303ab8194a5129
Reviewed-on: https://go-review.googlesource.com/19791
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-21 22:40:25 +00:00
Austin Clements
f309bf3eef runtime: start an M when handing off a P when there's GC work
Currently it's possible for the scheduler to deadlock with the right
confluence of locked Gs, assists, and scheduling of background mark
workers. Broadly, this happens because handoffp is stricter than
findrunnable, and if the only work for a P is GC work, handoffp will
put the P into idle, rather than starting an M to execute that P. One
way this can happen is as follows:

0. There is only one user G, which we'll call G 1. There is more than
   one P, but they're all idle except the one running G 1.

1. G 1 locks itself to an M using runtime.LockOSThread.

2. GC starts up and enters mark 1.

3. G 1 performs a GC assist, which completes mark 1 without being
   fully satisfied. Completing mark 1 causes all background mark
   workers to park. And since the assist isn't fully satisfied, it
   parks as well, waiting for a background mark worker to satisfy its
   remaining assist debt.

4. The assist park enters the scheduler. Since G 1 is locked to the M,
   the scheduler releases the P and calls handoffp to hand the P to
   another M.

5. handoffp checks the local and global run queues, which are empty,
   and sees that there are idle Ps, so rather than start an M, it puts
   the P into idle.

At this point, all of the Gs are waiting and all of the Ps are idle.
In particular, none of the GC workers are running, so no mark work
gets done and the assist on the main G is never satisfied, so the
whole process soft locks up.

Fix this by making handoffp start an M if there is GC work. This
reintroduces a key invariant: that in any situation where findrunnable
would return a G to run on a P, handoffp for that P will start an M to
run work on that P.
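
The fix is a short check in handoffp before the P is put into idle;
approximately:

// If this P has GC mark work available, start an M to run it
// rather than idling the P.
if gcBlackenEnabled != 0 && gcMarkWorkAvailable(_p_) {
	startm(_p_, false)
	return
}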

Fixes #13645.

Tested by running 2,689 iterations of `go tool dist test -no-rebuild
runtime:cpu124` across 10 linux-amd64-noopt VMs with no failures.
Without this change, the failure rate was somewhere around 1%.

Performance change is negligible.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.48ms ± 2%  2.48ms ± 1%  -0.24%  (p=0.000 n=92+93)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.86s ± 2%     2.87s ± 2%    ~     (p=0.667 n=19+20)
Fannkuch11-12                2.52s ± 1%     2.47s ± 1%  -2.05%  (p=0.000 n=18+20)
FmtFprintfEmpty-12          51.7ns ± 1%    51.5ns ± 3%    ~     (p=0.931 n=16+20)
FmtFprintfString-12          170ns ± 1%     168ns ± 1%  -0.65%  (p=0.000 n=19+19)
FmtFprintfInt-12             160ns ± 0%     160ns ± 0%  +0.18%  (p=0.033 n=17+19)
FmtFprintfIntInt-12          265ns ± 1%     273ns ± 1%  +2.98%  (p=0.000 n=17+19)
FmtFprintfPrefixedInt-12     235ns ± 1%     239ns ± 1%  +1.99%  (p=0.000 n=16+19)
FmtFprintfFloat-12           315ns ± 0%     315ns ± 1%    ~     (p=0.250 n=17+19)
FmtManyArgs-12              1.04µs ± 1%    1.05µs ± 0%  +0.87%  (p=0.000 n=17+19)
GobDecode-12                7.93ms ± 0%    7.85ms ± 1%  -1.03%  (p=0.000 n=16+18)
GobEncode-12                6.62ms ± 1%    6.58ms ± 1%  -0.60%  (p=0.000 n=18+19)
Gzip-12                      322ms ± 1%     320ms ± 1%  -0.46%  (p=0.009 n=20+20)
Gunzip-12                   42.5ms ± 1%    42.5ms ± 0%    ~     (p=0.751 n=19+19)
HTTPClientServer-12         69.7µs ± 1%    70.0µs ± 2%    ~     (p=0.056 n=19+19)
JSONEncode-12               16.9ms ± 1%    16.7ms ± 1%  -1.13%  (p=0.000 n=19+19)
JSONDecode-12               61.5ms ± 1%    61.3ms ± 1%  -0.35%  (p=0.001 n=20+17)
Mandelbrot200-12            3.94ms ± 0%    3.91ms ± 0%  -0.67%  (p=0.000 n=20+18)
GoParse-12                  3.71ms ± 1%    3.70ms ± 1%    ~     (p=0.244 n=17+19)
RegexpMatchEasy0_32-12       101ns ± 1%     102ns ± 2%  +0.54%  (p=0.037 n=19+20)
RegexpMatchEasy0_1K-12       349ns ± 0%     350ns ± 0%  +0.33%  (p=0.000 n=17+18)
RegexpMatchEasy1_32-12      84.5ns ± 2%    84.2ns ± 1%  -0.43%  (p=0.048 n=19+20)
RegexpMatchEasy1_1K-12       510ns ± 1%     513ns ± 2%  +0.58%  (p=0.002 n=18+20)
RegexpMatchMedium_32-12      132ns ± 1%     134ns ± 1%  +0.95%  (p=0.000 n=20+20)
RegexpMatchMedium_1K-12     40.1µs ± 1%    39.6µs ± 1%  -1.39%  (p=0.000 n=20+20)
RegexpMatchHard_32-12       2.08µs ± 0%    2.06µs ± 1%  -0.95%  (p=0.000 n=18+18)
RegexpMatchHard_1K-12       62.2µs ± 1%    61.9µs ± 1%  -0.42%  (p=0.001 n=19+20)
Revcomp-12                   537ms ± 0%     536ms ± 0%    ~     (p=0.076 n=20+20)
Template-12                 71.3ms ± 1%    69.3ms ± 1%  -2.75%  (p=0.000 n=20+20)
TimeParse-12                 361ns ± 0%     360ns ± 1%    ~     (p=0.056 n=19+19)
TimeFormat-12                353ns ± 0%     352ns ± 0%  -0.23%  (p=0.000 n=17+18)
[Geo mean]                  62.6µs         62.5µs       -0.17%

Change-Id: I0fbbbe4d7d99653ba5600ffb4394fa03558bc4e9
Reviewed-on: https://go-review.googlesource.com/19107
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-02 02:11:28 +00:00
Austin Clements
09940b92a0 runtime: make p.gcBgMarkWorker a guintptr
Currently p.gcBgMarkWorker is a *g. Change it to a guintptr. This
eliminates a write barrier during the subtle mark worker parking dance
(which isn't known to be causing problems, but might be).
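
For reference, the guintptr pattern (abridged from the runtime's own
definitions): a *g stored as a bare word, so stores compile to plain
writes with no write barrier.

type guintptr uintptr

func (gp guintptr) ptr() *g   { return (*g)(unsafe.Pointer(gp)) }
func (gp *guintptr) set(g *g) { *gp = guintptr(unsafe.Pointer(g)) }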

Change-Id: Ibf12c05ac910820448059e69a68e5b882c993ed8
Reviewed-on: https://go-review.googlesource.com/18970
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-27 02:23:09 +00:00
Austin Clements
eb3b1830b0 runtime: attach mark workers to P after they park
Currently mark workers attach to their designated Ps before parking,
either during initialization or after performing a phase transition.
However, in both of these cases, it's possible that the mark worker is
running on a different P than the one it attaches to. This is a
problem, because as soon as the worker attaches to a P, that P's
scheduler can execute the worker. If the worker hasn't yet parked on
the P it's actually running on, this means the worker G will be
running in two places at once. The most visible consequence of this is
that once the first instance of the worker does park, it will clear
g.m and the second instance will crash shortly when it tries to use
g.m.

Fix this by moving the attach to the gopark callback. At this point,
the G is genuinely stopped and the callback is running on the system
stack, so it's safe for another P's scheduler to pick up the worker G.

Fixes #13363. Fixes #13978.

Change-Id: If2f7c4a4174f9511f6227e14a27c56fb842d1cc8
Reviewed-on: https://go-review.googlesource.com/18761
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2016-01-27 02:13:53 +00:00
Ian Lance Taylor
efd93a412e runtime: minimize time between lockextra/unlockextra
This doesn't fix a bug, but may improve performance in programs that
have many concurrent calls from C to Go.  The old code made several
system calls between lockextra and unlockextra.  That could be happening
while another thread is spinning to acquire lockextra.  This changes the
code to not make any system calls while holding the lock.

Change-Id: I50576478e478670c3d6429ad4e1b7d80f98a19d8
Reviewed-on: https://go-review.googlesource.com/18548
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-14 05:55:43 +00:00
Russ Cox
fac8202c3f runtime: make NumGoroutine and Stack agree not to include system goroutines
[Repeat of CL 18343 with build fixes.]

Before, NumGoroutine counted system goroutines and Stack (usually) didn't show them,
which was inconsistent and confusing.

To resolve which way they should be consistent, it seems like

	package main
	import "runtime"
	func main() { println(runtime.NumGoroutine()) }

should print 1 regardless of internal runtime details. Make it so.

Fixes #11706.

Change-Id: If26749fec06aa0ff84311f7941b88d140552e81d
Reviewed-on: https://go-review.googlesource.com/18432
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
2016-01-13 01:46:01 +00:00
Ian Lance Taylor
c02aa463db runtime: fix arm/arm64/ppc64/mips64 to dropm when necessary
Fixes #13881.

Change-Id: Idff77db381640184ddd2b65022133bb226168800
Reviewed-on: https://go-review.googlesource.com/18449
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2016-01-11 18:46:54 +00:00
Ian Lance Taylor
21b4f234c7 runtime: for c-archive/c-shared, install signal handlers synchronously
The previous behaviour of installing the signal handlers in a separate
thread meant that Go initialization raced with non-Go initialization if
the non-Go initialization also wanted to install signal handlers.  Make
installing signal handlers synchronous so that the process-wide behavior
is predictable.

Update #9896.

Change-Id: Ice24299877ec46f8518b072a381932d273096a32
Reviewed-on: https://go-review.googlesource.com/18150
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-01-09 00:58:38 +00:00
Russ Cox
6da608206c Revert "runtime: make NumGoroutine and Stack agree not to include system goroutines"
This reverts commit c5bafc8281.

Change-Id: Ie7030c978c6263b9e996d5aa0e490086796df26d
Reviewed-on: https://go-review.googlesource.com/18431
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-08 15:31:15 +00:00
Russ Cox
c5bafc8281 runtime: make NumGoroutine and Stack agree not to include system goroutines
Before, NumGoroutine counted system goroutines and Stack (usually) didn't show them,
which was inconsistent and confusing.

To resolve which way they should be consistent, it seems like

	package main
	import "runtime"
	func main() { println(runtime.NumGoroutine()) }

should print 1 regardless of internal runtime details. Make it so.

Fixes #11706.

Change-Id: I6bfe26a901de517728192cfb26a5568c4ef4fe47
Reviewed-on: https://go-review.googlesource.com/18343
Reviewed-by: Austin Clements <austin@google.com>
2016-01-08 15:25:00 +00:00
Austin Clements
3f22adecc7 runtime: fix sigprof stack barrier locking
f90b48e intended to require the stack barrier lock in all cases of
sigprof that walked the user stack, but got it wrong. In particular,
if sp < gp.stack.lo || gp.stack.hi < sp, tracebackUser would be true,
but we wouldn't acquire the stack lock. If it then turned out that we
were in a cgo call, it would walk the stack without the lock.

In fact, the whole structure of stack locking in sigprof is somewhat
wrong because it assumes the G to lock is gp.m.curg, but all three
gentraceback calls start from potentially different Gs.

To fix this, we lower the gcTryLockStackBarriers calls much closer to
the gentraceback calls. There are now three separate trylock calls,
each clearly associated with a gentraceback and the locked G clearly
matches the G from which the gentraceback starts. This actually brings
the sigprof logic closer to what it originally was before stack
barrier locking.

This depends on "runtime: increase assumed stack size in
externalthreadhandler" because it very slightly increases the stack
used by sigprof; without this other commit, this is enough to blow the
profiler thread's assumed stack size.

Fixes #12528 (hopefully for real this time!).

For the 1.5 branch, though it will require some backporting. On the
1.5 branch, this will *not* require the "runtime: increase assumed
stack size in externalthreadhandler" commit: there's no pcvalue cache,
so the used stack is smaller.

Change-Id: Id2f6446ac276848f6fc158bee550cccd03186b83
Reviewed-on: https://go-review.googlesource.com/18328
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-07 19:40:38 +00:00
Austin Clements
b50b24837d runtime: don't ignore success of cgo profiling tracebacks
If a sigprof happens during a cgo call, we traceback from the entry
point of the cgo call. However, if the SP is outside of the G's stack,
we'll then ignore this traceback, even if it was successful, and
overwrite it with just _ExternalCode.

Fix this by accepting any successful traceback, regardless of whether
we got it from a cgo entry point or from regular Go code.

Fixes #13466.

Change-Id: I5da9684361fc5964f44985d74a8cdf02ffefd213
Reviewed-on: https://go-review.googlesource.com/18327
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-07 19:40:26 +00:00
Ian Lance Taylor
70c9a8187a runtime: set new m signal mask to program startup mask
We were setting the signal mask of a new m to the signal mask of the m
that created it.  That failed when that m happened to be the one created
by ensureSigM, which sets its signal mask to only include the signals
being caught by os/signal.Notify.
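
A sketch of the approach (variable name and placement assumed, not
taken from the CL): capture the mask once at startup and apply it to
every new m, instead of inheriting whatever the creating thread has.

var initSigmask sigset // signal mask at program startup

func schedinit() {
	msigsave(getg().m) // record the startup mask into m.sigmask
	initSigmask = getg().m.sigmask
	// ...
}

func newm(fn func(), _p_ *p) {
	mp := allocm(_p_, fn)
	mp.sigmask = initSigmask // not the creator's current mask
	// ...
}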

Fixes #13164.
Update #9896.

Change-Id: I705c196fe9d11754e10bab9e9b2e7530ecdfa367
Reviewed-on: https://go-review.googlesource.com/18064
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-06 23:14:46 +00:00
Ian Lance Taylor
934e055f41 runtime: call msanwrite on object passed to runtime/cgo
Avoids an msan error when runtime/cgo is explicitly rebuilt with
-fsanitize=memory.

Fixes #13815.

Change-Id: I70308034011fb308b63585bcd40b0d1e62ec93ef
Reviewed-on: https://go-review.googlesource.com/18263
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-06 04:04:42 +00:00
Austin Clements
f90b48e0d3 runtime: require the stack barrier lock to traceback cgo and libcalls
Currently, if sigprof determines that the G is in user code (not cgo
or libcall code), it will only traceback the G stack if it can acquire
the stack barrier lock. However, it has no such restriction if the G
is in cgo or libcall code. Because cgo calls count as syscalls, stack
scanning and stack barrier installation can occur during a cgo call,
which means sigprof could attempt to traceback a G in a cgo call while
scanstack is installing stack barriers in that G's stack. As a result,
the following sequence of events can cause the sigprof traceback to
panic with "missed stack barrier":

1. M1: G1 performs a Cgo call (which, on Windows, is any system call,
   which could explain why this is easier to reproduce on Windows).

2. M1: The Cgo call puts G1 into _Gsyscall state.

3. M2: GC starts a scan of G1's stack. It puts G1 in to _Gscansyscall
   and acquires the stack barrier lock.

4. M3: A profiling signal comes in. On Windows this is a global
   (though I don't think this matters), so the runtime stops M1 and
   calls sigprof for G1.

5. M3: sigprof fails to acquire the stack barrier lock (because the
   GC's stack scan holds it).

6. M3: sigprof observes that G1 is in a Cgo call, so it calls
   gentraceback on G1 with its Cgo transition point.

7. M3: gentraceback on G1 grabs the currently empty g.stkbar slice.

8. M2: GC finishes scanning G1's stack and installing stack barriers.

9. M3: gentraceback encounters one of the just-installed stack
   barriers and panics.

This commit fixes this by only allowing cgo tracebacks if sigprof can
acquire the stack barrier lock, just like in the regular user
traceback case.

For good measure, we put the same constraint on libcall tracebacks.
This case is probably already safe because, unlike cgo calls, libcalls
leave the G in _Grunning and prevent reaching a safe point, so
scanstack cannot run during a libcall. However, this also means that
sigprof will always acquire the stack barrier lock without contention,
so there's no cost to adding this constraint to libcall tracebacks.

Fixes #12528. For 1.5.3 (will require some backporting).

Change-Id: Ia5a4b8e3d66b23b02ffcd54c6315c81055c0cec2
Reviewed-on: https://go-review.googlesource.com/18023
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-18 17:08:39 +00:00
Shenghou Ma
86f1944e86 cmd/dist, runtime: make runtime version available as runtime.buildVersion
So that there is a uniformed way to retrieve Go version from a Go
binary, starting from Go 1.4 (see https://golang.org/cl/117040043)

Updates #13507.

Change-Id: Iaa2b14fca2d8c4d883d3824e2efc82b3e6fe2624
Reviewed-on: https://go-review.googlesource.com/17459
Reviewed-by: Keith Randall <khr@golang.org>
2015-12-16 05:42:40 +00:00
Austin Clements
01baf13ba5 runtime: only trigger forced GC if GC is not running
Currently, sysmon triggers a forced GC solely based on
memstats.last_gc. However, memstats.last_gc isn't updated until mark
termination, so once sysmon starts triggering forced GC, it will keep
triggering them until GC finishes. The first of these actually starts
a GC; the remainder, up to the last, print "GC forced" but gcStart
returns immediately because gcphase != _GCoff; then the last may start
another GC if the previous GC finishes (and sets last_gc) between
sysmon triggering it and gcStart checking the GC phase.

Fix this by expanding the condition for starting a forced GC to also
require that no GC is currently running. This, combined with the way
forcegchelper blocks until the GC cycle is started, ensures sysmon
only starts one GC when the time exceeds the forced GC threshold.
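
The widened condition in sysmon, approximately:

lastgc := int64(atomic.Load64(&memstats.last_gc))
if gcphase == _GCoff && lastgc != 0 &&
	unixnow-lastgc > forcegcperiod && atomic.Load(&forcegc.idle) != 0 {
	// wake forcegchelper to start a forced GC
}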

Fixes #13458.

Change-Id: Ie6cf841927f6085136be3f45259956cd5cf10d23
Reviewed-on: https://go-review.googlesource.com/17819
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-15 20:13:19 +00:00
Austin Clements
50d8d4e834 runtime: simplify sigprof traceback interlocking
The addition of stack barrier locking to copystack subsumes the
partial fix from commit bbd1a1c for SIGPROF during copystack. With the
stack barrier locking, this commit simplifies the rule in sigprof to:
the user stack can be traced only if sigprof can acquire the stack
barrier lock.

Updates #12932, #13362.

Change-Id: I1c1f80015053d0ac7761e9e0c7437c2aba26663f
Reviewed-on: https://go-review.googlesource.com/17192
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-15 20:12:17 +00:00
Dmitry Vyukov
fb6f8a96f2 runtime: remove unnecessary wakeups of worker threads
Currently we wake up new worker threads whenever we pass
through the scheduler with nmspinning==0. This leads to
lots of unnecessary thread wake ups.
Instead let only spinning threads wake up new spinning threads.

For the following program:

package main
import "runtime"
func main() {
	for i := 0; i < 1e7; i++ {
		runtime.Gosched()
	}
}

Before:
$ time ./test
real	0m4.278s
user	0m7.634s
sys	0m1.423s

$ strace -c ./test
% time     seconds  usecs/call     calls    errors syscall
 99.93    9.314936           3   2685009     17536 futex

After:
$ time ./test
real	0m1.200s
user	0m1.181s
sys	0m0.024s

$ strace -c ./test
% time     seconds  usecs/call     calls    errors syscall
  3.11    0.000049          25         2           futex

Fixes #13527

Change-Id: Ia1f5bf8a896dcc25d8b04beb1f4317aa9ff16f74
Reviewed-on: https://go-review.googlesource.com/17540
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-12-11 11:31:12 +00:00
Rahul Chaudhry
f939ee13ae runtime: fix GODEBUG=schedtrace=X delay handling.
debug.schedtrace is an int32. Convert it to int64 before
multiplying by the constant 1000000. Otherwise, schedtrace
values greater than 2147 result in int32 overflow, causing
incorrect delays between traces.
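
The bug in miniature (runnable):

package main

import "fmt"

func main() {
	var schedtrace int32 = 2148 // ms between traces, from GODEBUG
	bad := int64(schedtrace * 1000000)  // multiplies in int32: overflows
	good := int64(schedtrace) * 1000000 // widens first: correct
	fmt.Println(bad, good) // -2146967296 2148000000
}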

Change-Id: I064e8d7b432c1e892a705ee1f31a2e8cdd2c3ea3
Reviewed-on: https://go-review.googlesource.com/17712
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2015-12-11 03:59:20 +00:00
Austin Clements
d2c81ad847 runtime: recursively disallow write barriers in sysmon
sysmon runs without a P. This means it can't interact with the garbage
collector, so write barriers are not allowed in anything that sysmon does.

Fixes #10600.

Change-Id: I9de1283900dadee4f72e2ebfc8787123e382ae88
Reviewed-on: https://go-review.googlesource.com/17006
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-19 21:17:25 +00:00
Austin Clements
402e37d4a9 cmd/compile: special case nowritebarrierrec for allocm
allocm is a very unusual function: it is specifically designed to
allocate in contexts where m.p is nil by temporarily taking over a P.
Since allocm is used in many contexts where it would make sense to use
nowritebarrierrec, this commit teaches the nowritebarrierrec analysis
to stop at allocm.

Updates #10600.

Change-Id: I8499629461d4fe25712d861720dfe438df7ada9b
Reviewed-on: https://go-review.googlesource.com/17005
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-19 21:17:19 +00:00
Austin Clements
9c9d74aba7 runtime: prevent sigprof during all stack barrier ops
A sigprof during stack barrier insertion or removal can crash if it
detects an inconsistency between the stkbar array and the stack
itself. Currently we protect against this when scanning another G's
stack using stackLock, but we don't protect against it when unwinding
stack barriers for a recover or a memmove to the stack.

This commit cleans up and improves the stack locking code. It
abstracts out the lock and unlock operations. It uses the lock
consistently everywhere we perform stack operations, and pushes the
lock/unlock down closer to where the stack barrier operations happen
to make it more obvious what it's protecting. Finally, it modifies
sigprof so that instead of spinning until it acquires the lock, it
simply doesn't perform a traceback if it can't acquire it. This is
necessary to prevent self-deadlock.
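
Sketched with the names used above (arguments abridged):

// in sigprof: skip the traceback entirely rather than spin.
if gcTryLockStackBarriers(gp) {
	n = gentraceback(pc, sp, lr, gp, 0, &stk[0], len(stk), nil, nil, _TraceTrap)
	gcUnlockStackBarriers(gp)
}
// if the lock was unavailable, this signal records no stack samples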

Updates #11863, which introduced stackLock to fix some of these
issues, but didn't go far enough.

Updates #12528.

Change-Id: I9d1fa88ae3744d31ba91500c96c6988ce1a3a349
Reviewed-on: https://go-review.googlesource.com/17036
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-19 16:35:38 +00:00
Russ Cox
f8e6418637 runtime: fix bad signal stack when using cgo-created threads and async signals
Cgo-created threads transition between having associated Go g's and m's and not.
A signal arriving during the transition could think it was safe and appropriate to
run Go signal handlers when it was in fact not.
Avoid the race by masking all signals during the transition.

Fixes #12277.

Change-Id: Ie9711bc1d098391d58362492197a7e0f5b497d14
Reviewed-on: https://go-review.googlesource.com/16915
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-18 18:05:22 +00:00
Michael Hudson-Doyle
368d548417 cmd/compile, cmd/link, runtime: on ppc64x, maintain the TOC pointer in R2 when compiling PIC
The PowerPC ISA does not have a PC-relative load instruction, which poses
obvious challenges when generating position-independent code. The way the ELFv2
ABI addresses this is to specify that r2 points to a per "module" (shared
library or executable) TOC pointer. Maintaining this pointer requires
cooperation between codegen and the system linker:

 * Non-leaf functions leave space on the stack at r1+24 to save the TOC pointer.
 * A call to a function that *might* have to go via a PLT stub must be followed
   by a nop instruction that the system linker can replace with "ld r1, 24(r1)"
   to restore the TOC pointer (only when dynamically linking Go code).
 * When calling a function via a function pointer, the address of the function
   must be in r12, and the first couple of instructions (the "global entry
   point") of the called function use this to derive the address of the TOC
   for the module it is in.
 * When calling a function that is implemented in the same module, the system
   linker adjusts the call to skip over the instructions mentioned above (the
   "local entry point"), assuming that r2 is already correctly set.

So this changeset adds the global entry point instructions, sets the metadata so
the system linker knows where the local entry point is, inserts code to save the
TOC pointer at 24(r1), adds a nop after any call not known to be local and copes
with the odd non-local code transfer in the runtime (e.g. the stuff around
jmpdefer). It does not actually compile PIC yet.

Change-Id: I7522e22bdfd2f891745a900c60254fe9e372c854
Reviewed-on: https://go-review.googlesource.com/15967
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-12 23:18:58 +00:00
Austin Clements
bbd1a1c706 runtime: make SIGPROF skip stacks that are being copied
sigprof tracebacks the stack across systemstack switches to make
profile tracebacks more complete. However, it does this even if the
user stack is currently being copied, which means it may be in an
inconsistent state that will cause the traceback to panic.

One specific way this can happen is during stack shrinking. Some
goroutine blocks for STW, then enters gchelper, which then assists
with root marking. If that root marking happens to pick the original
goroutine and its stack needs to be shrunk, it will begin to copy that
stack. During this copy, the stack is generally inconsistent and, in
particular, the actual locations of the stack barriers and their
recorded locations are temporarily out of sync. If a SIGPROF happens
during this inconsistency, it will walk the stack all the way back to
the blocked goroutine and panic when it fails to unwind the stack
barriers.

Fix this by disallowing jumping to the user stack during SIGPROF if
that user stack is in the process of being copied.

Fixes #12932.

Change-Id: I9ef694c2c01e3653e292ce22612418dd3daff1b4
Reviewed-on: https://go-review.googlesource.com/16819
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-12 17:37:04 +00:00
Michael Matloob
432cb66f16 runtime: break out system-specific constants into package sys
runtime/internal/sys will hold system-, architecture- and config-
specific constants.

Updates #11647

Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54
Reviewed-on: https://go-review.googlesource.com/16817
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-12 17:04:45 +00:00
Matthew Dempsky
c17c42e8a5 runtime: rewrite lots of foo_Bar(f, ...) into f.bar(...)
Applies to types fixAlloc, mCache, mCentral, mHeap, mSpan, and
mSpanList.

Two special cases:

1. mHeap_Scavenge() previously didn't take an *mheap parameter, so it
was specially handled in this CL.

2. mHeap_Free() would have collided with mheap's "free" field, so it's
been renamed to (*mheap).freeSpan to parallel its underlying
(*mheap).freeSpanLocked method.
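
Illustrative before/after (arguments abridged from the period's API;
the freeSpan case is the one named above):

// before:
s := mHeap_Alloc(h, npage, sizeclass, large, needzero)
mHeap_Free(h, s, acct)

// after:
s := h.alloc(npage, sizeclass, large, needzero)
h.freeSpan(s, acct)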

Change-Id: I325938554cca432c166fe9d9d689af2bbd68de4b
Reviewed-on: https://go-review.googlesource.com/16221
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-12 00:34:58 +00:00
Austin Clements
f32f2954fb runtime: never allocate new M when jumping time forward
When we're jumping time forward, it means everyone is asleep, so there
should always be an M available. Furthermore, this causes both
allocation and write barriers in contexts that may be running without
a P (such as in sysmon).

Hence, replace this allocation with a throw.

Updates #10600.

Change-Id: I2cee70d5db828d0044082878995949edb25dda5f
Reviewed-on: https://go-review.googlesource.com/16815
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-11 17:37:42 +00:00
Michael Matloob
67faca7d9c runtime: break atomics out into package runtime/internal/atomic
This change breaks out most of the atomics functions in the runtime
into package runtime/internal/atomic. It adds some basic support
in the toolchain for runtime packages, and also modifies linux/arm
atomics to remove the dependency on the runtime's mutex. The mutexes
have been replaced with spinlocks.

all trybots are happy!
In addition to the trybots, I've tested on the darwin/arm64 builder,
on the darwin/arm builder, and on a ppc64le machine.

Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
Reviewed-on: https://go-review.googlesource.com/14204
Run-TryBot: Michael Matloob <matloob@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 17:38:04 +00:00
Austin Clements
d5ba582166 runtime: remove background GC goroutine and mark barriers
These are now unused.

Updates #11970.

Change-Id: I43e5c4e5bcda9581bacc63364f96bb4855ab779f
Reviewed-on: https://go-review.googlesource.com/16393
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:24:05 +00:00
Austin Clements
c99d7f7f85 runtime: decentralize mark done and mark termination
This moves all of the mark 1 to mark 2 transition and mark termination
to the mark done transition function. This means these transitions are
now handled on the goroutine that detected mark completion. This also
means that the GC coordinator and the background completion barriers
are no longer used and various workarounds to yield to the coordinator
are no longer necessary. These will be removed in follow-up commits.

One consequence of this is that mark workers now need to be
preemptible when performing the mark done transition. This allows them
to stop the world and to perform the final clean-up steps of GC after
restarting the world. They are only made preemptible while performing
this transition, so if the worker findRunnableGCWorker would schedule
isn't available, we didn't want to schedule it anyway.

Fixes #11970.

Change-Id: I9203a2d6287eeff62d589ec02ad9cb1e29ddb837
Reviewed-on: https://go-review.googlesource.com/16391
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:54 +00:00
Austin Clements
20f276e237 runtime: don't start idle mark workers when barriers are cleared
Currently, we don't start dedicated or fractional mark workers unless
the mark 1 or mark 2 barriers have been cleared. One intended
consequence of this is that no background workers run between the
forEachP that disposes all gcWork caches and the beginning of mark 2.

However, we (unintentionally) did not apply this restriction to idle
mark workers. As a result, these can start in the interim between mark
1 completion and mark 2 starting. This explains why it was necessary
to reset the root marking jobs using carefully ordered atomic writes
when setting up mark 2. It also means that, even though we definitely
enqueue work before starting mark 2, it may be drained by the time we
reset the mark 2 barrier. If this happens, currently the only thing
preventing the runtime from deadlocking is that the scheduler itself
also checks for mark completion and will signal mark 2 completion.
Were it not for the odd behavior of idle workers, this check in the
scheduler would not be necessary.

Clean all of this up and prepare to remove this check in the scheduler
by applying the same restriction to starting idle mark workers.

Change-Id: Ic1b479e1591bd7773dc27b320ca399a215603b5a
Reviewed-on: https://go-review.googlesource.com/16631
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:33 +00:00