mirror of https://github.com/golang/go synced 2024-10-02 10:28:34 -06:00
Commit Graph

96 Commits

Author SHA1 Message Date
Austin Clements
1a2cf91f5e runtime: split gfree list into with-stacks and without-stacks
Currently all free Gs are added to one list. Split this into two
lists: one for free Gs with cached stacks and one for Gs without
cached stacks.

This lets us preferentially allocate Gs that already have a stack, but
more importantly, it sets us up to free cached G stacks concurrently.
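
As a sketch of the resulting structure (names invented here, not taken
from the CL):

type g struct {
	stack     []byte // nil once the cached stack has been freed
	schedlink *g
}

var gfreeStack, gfreeNoStack *g // two free lists instead of one

func gfput(gp *g) {
	if gp.stack != nil {
		gp.schedlink = gfreeStack
		gfreeStack = gp
	} else {
		gp.schedlink = gfreeNoStack
		gfreeNoStack = gp
	}
}

func gfget() *g {
	// Preferentially reuse a G that already has a stack.
	if gp := gfreeStack; gp != nil {
		gfreeStack = gp.schedlink
		return gp
	}
	if gp := gfreeNoStack; gp != nil {
		gfreeNoStack = gp.schedlink
		gp.stack = make([]byte, 2048) // stands in for stackalloc
		return gp
	}
	return nil
}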

Change-Id: Idbe486f708997e1c9d166662995283f02d1eeb3c
Reviewed-on: https://go-review.googlesource.com/20664
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:39:51 +00:00
Dmitry Vyukov
a3703618ea runtime: use per-goroutine sequence numbers in tracer
Currently the tracer uses a global sequencer, which introduces
significant slowdown on parallel machines (up to 10x).
Replace the global sequencer with a per-goroutine sequencer.

If we assign per-goroutine sequence numbers to only 3 types
of events (start, unblock and syscall exit), that is enough to
restore a consistent partial ordering of all events. Even these
events don't need sequence numbers all the time (if a goroutine
starts on the same P where it was unblocked, then the start does
not need a sequence number).
The burden of restoring the order is put on the trace parser.
Details of the algorithm are described in the comments.
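
A minimal model of the parser-side ordering (types invented for
illustration): an event carrying a sequence number is held back until
its per-goroutine predecessor has been emitted.

type event struct {
	g   uint64 // goroutine ID
	seq uint64 // per-goroutine sequence number
}

// order drains pending events, per goroutine strictly by seq.
// pending maps goroutine ID to its events, sorted by seq.
func order(pending map[uint64][]event) []event {
	next := make(map[uint64]uint64) // next expected seq per goroutine
	var out []event
	for progress := true; progress; {
		progress = false
		for g, evs := range pending {
			for len(evs) > 0 && evs[0].seq == next[g] {
				out = append(out, evs[0])
				evs = evs[1:]
				next[g]++
				progress = true
			}
			pending[g] = evs
		}
	}
	return out
}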

On http benchmark with GOMAXPROCS=48:
no tracing: 5026 ns/op
tracing: 27803 ns/op (+453%)
with this change: 6369 ns/op (+26%, mostly for traceback)

Also trace size is reduced by ~22%. Average event size before: 4.63
bytes/event, after: 3.62 bytes/event.

Besides running the trace tests, I've also tested with manually broken
cputicks (random skew for each event, per-P skew and episodic random skew).
In all cases the broken timestamps were detected and there were no test
failures.

Change-Id: I078bde421ccc386a66f6c2051ab207bcd5613efa
Reviewed-on: https://go-review.googlesource.com/21512
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-23 15:57:05 +00:00
David Crawshaw
7d469179e6 cmd/compile, etc: store method tables as offsets
This CL introduces the typeOff type and a lookup method of the same
name that can turn a typeOff offset into an *rtype.

In a typical Go binary (built with buildmode=exe, pie, c-archive, or
c-shared), there is one moduledata and all typeOff values are offsets
relative to firstmoduledata.types. This makes computing the pointer
cheap in typical programs.

With buildmode=shared (and one day, buildmode=plugin) there are
multiple modules whose relative offset is determined at runtime.
We identify a type in the general case by the pair of the original
*rtype that references it and its typeOff value. We determine
the module from the original pointer, and then use the typeOff from
there to compute the final *rtype.

To ensure there is only one *rtype representing each type, the
runtime initializes a typemap for each module, using any identical
type from an earlier module when resolving that offset. This means
that types computed from an offset match the type mapped by the
pointer dynamic relocations.

A series of followup CLs will replace other *rtype values with typeOff
(and name/*string with nameOff).

For types created at runtime by reflect, type offsets are treated as
global IDs and reference into a reflect offset map kept by the runtime.
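
A hedged sketch of the resolution path described above (types pared
down to essentials; the real rtype and moduledata carry much more, and
nil-module handling is omitted):

import "unsafe"

type rtype struct{ /* ... */ }

type moduledata struct {
	types, etypes uintptr          // bounds of this module's type data
	typemap       map[int32]*rtype // canonical *rtype per offset
	next          *moduledata
}

var firstmoduledata moduledata

// typeOff resolves an offset relative to the module containing the
// referencing type t.
func (t *rtype) typeOff(off int32) *rtype {
	base := uintptr(unsafe.Pointer(t))
	var md *moduledata
	for m := &firstmoduledata; m != nil; m = m.next {
		if base >= m.types && base < m.etypes {
			md = m
			break
		}
	}
	if r := md.typemap[off]; r != nil {
		return r // deduplicated against an earlier module
	}
	return (*rtype)(unsafe.Pointer(md.types + uintptr(off)))
}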

darwin/amd64:
	cmd/go:  -57KB (0.6%)
	jujud:  -557KB (0.8%)

linux/amd64 PIE:
	cmd/go: -361KB (3.0%)
	jujud:  -3.5MB (4.2%)

For #6853.

Change-Id: Icf096fd884a0a0cb9f280f46f7a26c70a9006c96
Reviewed-on: https://go-review.googlesource.com/21285
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-13 13:03:11 +00:00
Emmanuel Odeke
e4f1d9cf2e runtime: make execution error panic values implement the Error interface
Make execution panics implement Error, as
mandated by https://golang.org/ref/spec#Run_time_panics,
instead of panicking with strings.
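
A small runnable illustration of the user-visible effect:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	defer func() {
		// Execution panics now satisfy runtime.Error (which
		// embeds error), not just a bare string.
		if err, ok := recover().(runtime.Error); ok {
			fmt.Println("caught:", err.Error())
		}
	}()
	var m map[string]int
	m["boom"] = 1 // assignment to entry in nil map
}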

Fixes #14965

Change-Id: I7827f898b9b9c08af541db922cc24fa0800ff18a
Reviewed-on: https://go-review.googlesource.com/21214
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-10 01:16:30 +00:00
Michael Hudson-Doyle
31cf1c1779 runtime: clamp OS-reported number of processors to _MaxGomaxprocs
So that Go processes do not all die on startup on a system with >256 CPUs.

I tested this by hacking osinit to set ncpu to 1000.
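
The clamp itself is small; approximately (placement in schedinit
assumed):

procs := int(ncpu)
if procs > _MaxGomaxprocs {
	procs = _MaxGomaxprocs
}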

Updates #15131

Change-Id: I52e061a0de97be41d684dd8b748fa9087d6f1aef
Reviewed-on: https://go-review.googlesource.com/21599
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-07 00:11:25 +00:00
Dmitry Vyukov
475d113b53 runtime: don't burn CPU unnecessarily
Two GC-related functions, scang and casgstatus, wait in an active spin loop.
Active spinning is never a good idea in user-space. Once we have waited
several times longer than the expected wait time, something unexpected is
happening (e.g. the thread we are waiting for is descheduled or handling
a page fault) and we need to yield to the OS scheduler. Moreover, the
expected wait time is
very high for these functions: scang wait time can be tens of milliseconds,
casgstatus can be hundreds of microseconds. It does not make sense to spin
even for that time.

go install -a std profile on a 4-core machine shows that 11% of time is spent
in the active spin in scang:

  6.12%    compile  compile                [.] runtime.scang
  3.27%    compile  compile                [.] runtime.readgstatus
  1.72%    compile  compile                [.] runtime/internal/atomic.Load

The active spin also increases tail latency in the case of the slightest
oversubscription: GC goroutines spend whole quantum in the loop instead of
executing user code.

Here is scang wait time histogram during go install -a std:

13707.0000 - 1815442.7667 [   118]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎...
1815442.7667 - 3617178.5333 [     9]: ∎∎∎∎∎∎∎∎∎
3617178.5333 - 5418914.3000 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
5418914.3000 - 7220650.0667 [     5]: ∎∎∎∎∎
7220650.0667 - 9022385.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
9022385.8333 - 10824121.6000 [    13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
10824121.6000 - 12625857.3667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
12625857.3667 - 14427593.1333 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
14427593.1333 - 16229328.9000 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
16229328.9000 - 18031064.6667 [    32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
18031064.6667 - 19832800.4333 [    28]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
19832800.4333 - 21634536.2000 [     6]: ∎∎∎∎∎∎
21634536.2000 - 23436271.9667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
23436271.9667 - 25238007.7333 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
25238007.7333 - 27039743.5000 [    27]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
27039743.5000 - 28841479.2667 [    20]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
28841479.2667 - 30643215.0333 [    10]: ∎∎∎∎∎∎∎∎∎∎
30643215.0333 - 32444950.8000 [     7]: ∎∎∎∎∎∎∎
32444950.8000 - 34246686.5667 [     4]: ∎∎∎∎
34246686.5667 - 36048422.3333 [     4]: ∎∎∎∎
36048422.3333 - 37850158.1000 [     1]: ∎
37850158.1000 - 39651893.8667 [     5]: ∎∎∎∎∎
39651893.8667 - 41453629.6333 [     2]: ∎∎
41453629.6333 - 43255365.4000 [     2]: ∎∎
43255365.4000 - 45057101.1667 [     2]: ∎∎
45057101.1667 - 46858836.9333 [     1]: ∎
46858836.9333 - 48660572.7000 [     2]: ∎∎
48660572.7000 - 50462308.4667 [     3]: ∎∎∎
50462308.4667 - 52264044.2333 [     2]: ∎∎
52264044.2333 - 54065780.0000 [     2]: ∎∎

and the zoomed-in first part:

13707.0000 - 19916.7667 [     2]: ∎∎
19916.7667 - 26126.5333 [     2]: ∎∎
26126.5333 - 32336.3000 [     9]: ∎∎∎∎∎∎∎∎∎
32336.3000 - 38546.0667 [     8]: ∎∎∎∎∎∎∎∎
38546.0667 - 44755.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
44755.8333 - 50965.6000 [    10]: ∎∎∎∎∎∎∎∎∎∎
50965.6000 - 57175.3667 [     5]: ∎∎∎∎∎
57175.3667 - 63385.1333 [     6]: ∎∎∎∎∎∎
63385.1333 - 69594.9000 [     5]: ∎∎∎∎∎
69594.9000 - 75804.6667 [     6]: ∎∎∎∎∎∎
75804.6667 - 82014.4333 [     6]: ∎∎∎∎∎∎
82014.4333 - 88224.2000 [     4]: ∎∎∎∎
88224.2000 - 94433.9667 [     1]: ∎
94433.9667 - 100643.7333 [     1]: ∎
100643.7333 - 106853.5000 [     2]: ∎∎
106853.5000 - 113063.2667 [     0]:
113063.2667 - 119273.0333 [     2]: ∎∎
119273.0333 - 125482.8000 [     2]: ∎∎
125482.8000 - 131692.5667 [     1]: ∎
131692.5667 - 137902.3333 [     1]: ∎
137902.3333 - 144112.1000 [     0]:
144112.1000 - 150321.8667 [     2]: ∎∎
150321.8667 - 156531.6333 [     1]: ∎
156531.6333 - 162741.4000 [     1]: ∎
162741.4000 - 168951.1667 [     0]:
168951.1667 - 175160.9333 [     0]:
175160.9333 - 181370.7000 [     1]: ∎
181370.7000 - 187580.4667 [     1]: ∎
187580.4667 - 193790.2333 [     2]: ∎∎
193790.2333 - 200000.0000 [     0]:

Here is casgstatus wait time histogram:

  631.0000 -  5276.6333 [     3]: ∎∎∎
 5276.6333 -  9922.2667 [     5]: ∎∎∎∎∎
 9922.2667 - 14567.9000 [     2]: ∎∎
14567.9000 - 19213.5333 [     6]: ∎∎∎∎∎∎
19213.5333 - 23859.1667 [     5]: ∎∎∎∎∎
23859.1667 - 28504.8000 [     6]: ∎∎∎∎∎∎
28504.8000 - 33150.4333 [     6]: ∎∎∎∎∎∎
33150.4333 - 37796.0667 [     2]: ∎∎
37796.0667 - 42441.7000 [     1]: ∎
42441.7000 - 47087.3333 [     3]: ∎∎∎
47087.3333 - 51732.9667 [     0]:
51732.9667 - 56378.6000 [     1]: ∎
56378.6000 - 61024.2333 [     0]:
61024.2333 - 65669.8667 [     0]:
65669.8667 - 70315.5000 [     0]:
70315.5000 - 74961.1333 [     1]: ∎
74961.1333 - 79606.7667 [     0]:
79606.7667 - 84252.4000 [     0]:
84252.4000 - 88898.0333 [     0]:
88898.0333 - 93543.6667 [     0]:
93543.6667 - 98189.3000 [     0]:
98189.3000 - 102834.9333 [     0]:
102834.9333 - 107480.5667 [     1]: ∎
107480.5667 - 112126.2000 [     0]:
112126.2000 - 116771.8333 [     0]:
116771.8333 - 121417.4667 [     0]:
121417.4667 - 126063.1000 [     0]:
126063.1000 - 130708.7333 [     0]:
130708.7333 - 135354.3667 [     0]:
135354.3667 - 140000.0000 [     1]: ∎

Ideally we would eliminate the waiting by switching to an async
state machine for GC, but for now just yield to the OS scheduler
after a reasonable wait time.
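
A sketch of the adopted wait pattern (procyield and osyield are the
runtime's spin and OS-yield primitives; done stands in for the actual
status check):

const yieldDelay = 10 * 1000 // ~10us for scang; casgstatus uses ~5us

nextYield := nanotime() + yieldDelay
for !done() {
	if nanotime() < nextYield {
		procyield(10) // brief active spin
	} else {
		osyield() // yield to the OS scheduler
		nextYield = nanotime() + yieldDelay/2
	}
}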

To choose yielding parameters I've measured
golang.org/x/benchmarks/http tail latencies with different yield
delays and oversubscription levels.

With no oversubscription (to the degree possible):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50   1.41ms ±15%  1.41ms ± 5%    ~     (p=0.611 n=13+12)
Latency-95   5.21ms ± 2%  5.15ms ± 2%  -1.15%  (p=0.012 n=13+13)
Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.54%  (p=0.002 n=13+13)
Latency-999  10.7ms ± 9%  10.2ms ±10%  -5.46%  (p=0.004 n=12+13)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50   1.41ms ±15%  1.41ms ± 8%    ~     (p=0.511 n=13+13)
Latency-95   5.21ms ± 2%  5.14ms ± 2%  -1.23%  (p=0.006 n=13+13)
Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.94%  (p=0.000 n=13+13)
Latency-999  10.7ms ± 9%  10.1ms ± 8%  -6.14%  (p=0.000 n=12+13)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50   1.41ms ±15%  1.45ms ± 6%    ~     (p=0.724 n=13+13)
Latency-95   5.21ms ± 2%  5.18ms ± 1%    ~     (p=0.287 n=13+13)
Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.64%  (p=0.002 n=13+13)
Latency-999  10.7ms ± 9%  10.0ms ± 5%  -6.72%  (p=0.000 n=12+13)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50   1.41ms ±15%  1.51ms ± 7%  +6.57%  (p=0.002 n=13+13)
Latency-95   5.21ms ± 2%  5.21ms ± 2%    ~     (p=0.960 n=13+13)
Latency-99   7.16ms ± 2%  7.06ms ± 2%  -1.50%  (p=0.012 n=13+13)
Latency-999  10.7ms ± 9%  10.0ms ± 6%  -6.49%  (p=0.000 n=12+13)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50   1.41ms ±15%  1.53ms ± 6%  +8.48%  (p=0.000 n=13+12)
Latency-95   5.21ms ± 2%  5.23ms ± 2%    ~     (p=0.287 n=13+13)
Latency-99   7.16ms ± 2%  7.08ms ± 2%  -1.21%  (p=0.004 n=13+13)
Latency-999  10.7ms ± 9%   9.9ms ± 3%  -7.99%  (p=0.000 n=12+12)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50   1.41ms ±15%  1.47ms ± 5%    ~     (p=0.072 n=13+13)
Latency-95   5.21ms ± 2%  5.17ms ± 2%    ~     (p=0.091 n=13+13)
Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.99%  (p=0.000 n=13+13)
Latency-999  10.7ms ± 9%   9.9ms ± 5%  -7.86%  (p=0.000 n=12+13)

With slight oversubscription (another instance of http benchmark
was running in background with reduced GOMAXPROCS):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50    840µs ± 3%   804µs ± 3%  -4.37%  (p=0.000 n=15+18)
Latency-95   6.52ms ± 4%  6.03ms ± 4%  -7.51%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.0ms ± 4%  -7.33%  (p=0.000 n=18+14)
Latency-999  18.0ms ± 9%  16.8ms ± 7%  -6.84%  (p=0.000 n=18+18)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50    840µs ± 3%   809µs ± 3%  -3.71%  (p=0.000 n=15+17)
Latency-95   6.52ms ± 4%  6.11ms ± 4%  -6.29%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%   9.9ms ± 6%  -7.55%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.5ms ±11%  -8.49%  (p=0.000 n=18+18)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50    840µs ± 3%   823µs ± 5%  -2.06%  (p=0.002 n=15+18)
Latency-95   6.52ms ± 4%  6.32ms ± 3%  -3.05%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 4%  -5.22%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.7ms ±10%  -7.09%  (p=0.000 n=18+18)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50    840µs ± 3%   836µs ± 5%    ~     (p=0.442 n=15+18)
Latency-95   6.52ms ± 4%  6.39ms ± 3%  -2.00%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 6%  -5.15%  (p=0.000 n=18+17)
Latency-999  18.0ms ± 9%  16.6ms ± 8%  -7.48%  (p=0.000 n=18+18)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50    840µs ± 3%   836µs ± 6%    ~     (p=0.401 n=15+18)
Latency-95   6.52ms ± 4%  6.40ms ± 4%  -1.79%  (p=0.010 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 5%  -4.95%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.5ms ±14%  -8.17%  (p=0.000 n=18+18)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50    840µs ± 3%   828µs ± 2%  -1.49%  (p=0.001 n=15+17)
Latency-95   6.52ms ± 4%  6.38ms ± 4%  -2.04%  (p=0.001 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 4%  -4.77%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.9ms ± 9%  -6.23%  (p=0.000 n=18+18)

With significant oversubscription (background http benchmark
was running with full GOMAXPROCS):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50   1.32ms ±12%  1.30ms ±13%    ~     (p=0.454 n=14+14)
Latency-95   16.3ms ±10%  15.3ms ± 7%  -6.29%  (p=0.001 n=14+14)
Latency-99   29.4ms ±10%  27.9ms ± 5%  -5.04%  (p=0.001 n=14+12)
Latency-999  49.9ms ±19%  45.9ms ± 5%  -8.00%  (p=0.008 n=14+13)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50   1.32ms ±12%  1.29ms ± 9%    ~     (p=0.227 n=14+14)
Latency-95   16.3ms ±10%  15.4ms ± 5%  -5.27%  (p=0.002 n=14+14)
Latency-99   29.4ms ±10%  27.9ms ± 6%  -5.16%  (p=0.001 n=14+14)
Latency-999  49.9ms ±19%  46.8ms ± 8%  -6.21%  (p=0.050 n=14+14)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50   1.32ms ±12%  1.35ms ± 9%     ~     (p=0.401 n=14+14)
Latency-95   16.3ms ±10%  15.0ms ± 4%   -7.67%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.4ms ± 5%   -6.98%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.7ms ± 5%  -10.56%  (p=0.000 n=14+11)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50   1.32ms ±12%  1.36ms ±10%     ~     (p=0.246 n=14+14)
Latency-95   16.3ms ±10%  14.9ms ± 5%   -8.31%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.4ms ± 7%   -6.70%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.9ms ±15%  -10.13%  (p=0.003 n=14+14)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50   1.32ms ±12%  1.41ms ± 9%  +6.37%  (p=0.008 n=14+13)
Latency-95   16.3ms ±10%  15.1ms ± 8%  -7.45%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.5ms ±12%  -6.67%  (p=0.002 n=14+14)
Latency-999  49.9ms ±19%  45.9ms ±16%  -8.06%  (p=0.019 n=14+14)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50   1.32ms ±12%  1.42ms ±10%   +7.21%  (p=0.003 n=14+14)
Latency-95   16.3ms ±10%  15.0ms ± 7%   -7.59%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.3ms ± 8%   -7.20%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.8ms ± 8%  -10.21%  (p=0.001 n=14+13)

All numbers are on 8 cores and with GOGC=10 (http benchmark has
tiny heap, few goroutines and low allocation rate, so by default
GC barely affects tail latency).

10us/5us yield delays seem to provide a reasonable compromise
and give a 5-10% tail latency reduction. That's what is used in this change.

go install -a std results on 4 core machine:

name      old time/op  new time/op  delta
Time       8.39s ± 2%   7.94s ± 2%  -5.34%  (p=0.000 n=47+49)
UserTime   24.6s ± 2%   22.9s ± 2%  -6.76%  (p=0.000 n=49+49)
SysTime    1.77s ± 9%   1.89s ±11%  +7.00%  (p=0.000 n=49+49)
CpuLoad    315ns ± 2%   313ns ± 1%  -0.59%  (p=0.000 n=49+48) # %CPU
MaxRSS    97.1ms ± 4%  97.5ms ± 9%    ~     (p=0.838 n=46+49) # bytes

Update #14396
Update #14189

Change-Id: I3f4109bf8f7fd79b39c466576690a778232055a2
Reviewed-on: https://go-review.googlesource.com/21503
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-04-05 15:52:03 +00:00
Dmitry Vyukov
3b246fa863 runtime: sleep less when we can do work
Usleep(100) in runqgrab negatively affects latency and throughput
of parallel applications. We are sleeping instead of doing useful work.
This effect is particularly visible on windows, where the minimal
sleep duration is 1-15ms.

Reduce sleep from 100us to 3us and use osyield on windows.
Sync chan send/recv takes ~50ns, so 3us gives us ~50x overshoot.
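
The retry path in runqgrab becomes, approximately:

if GOOS != "windows" {
	usleep(3)
} else {
	// System timer granularity on windows is 1-15ms, far too
	// coarse for this, so just yield.
	osyield()
}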

benchmark                    old ns/op     new ns/op     delta
BenchmarkChanSync-12         216           217           +0.46%
BenchmarkChanSyncWork-12     27213         25816         -5.13%

CPU consumption goes up from 106% to 108% in the first case,
and from 107% to 125% in the second case.

Test case from #14790 on windows:

BenchmarkDefaultResolution-8  4583372   29720    -99.35%
Benchmark1ms-8                992056    30701    -96.91%

99-th latency percentile for HTTP request serving is improved by up to 15%
(see http://golang.org/cl/20835 for details).

The following benchmarks are from the change that originally added this sleep
(see https://golang.org/s/go15gomaxprocs):

name        old time/op  new time/op  delta
Chain       22.6µs ± 2%  22.7µs ± 6%    ~      (p=0.905 n=9+10)
ChainBuf    22.4µs ± 3%  22.5µs ± 4%    ~      (p=0.780 n=9+10)
Chain-2     23.5µs ± 4%  24.9µs ± 1%  +5.66%   (p=0.000 n=10+9)
ChainBuf-2  23.7µs ± 1%  24.4µs ± 1%  +3.31%   (p=0.000 n=9+10)
Chain-4     24.2µs ± 2%  25.1µs ± 3%  +3.70%   (p=0.000 n=9+10)
ChainBuf-4  24.4µs ± 5%  25.0µs ± 2%  +2.37%  (p=0.023 n=10+10)
Powser       2.37s ± 1%   2.37s ± 1%    ~       (p=0.423 n=8+9)
Powser-2     2.48s ± 2%   2.57s ± 2%  +3.74%   (p=0.000 n=10+9)
Powser-4     2.66s ± 1%   2.75s ± 1%  +3.40%  (p=0.000 n=10+10)
Sieve        13.3s ± 2%   13.3s ± 2%    ~      (p=1.000 n=10+9)
Sieve-2      7.00s ± 2%   7.44s ±16%    ~      (p=0.408 n=8+10)
Sieve-4      4.13s ±21%   3.85s ±22%    ~       (p=0.113 n=9+9)

Fixes #14790

Change-Id: Ie7c6a1c4f9c8eb2f5d65ab127a3845386d6f8b5d
Reviewed-on: https://go-review.googlesource.com/20835
Reviewed-by: Austin Clements <austin@google.com>
2016-04-05 15:32:06 +00:00
Ian Lance Taylor
59fc42b230 runtime: allocate mp.cgocallers earlier
Fixes #15061.

Change-Id: I71f69f398d1c5f3a884bbd044786f1a5600d0fae
Reviewed-on: https://go-review.googlesource.com/21398
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-01 22:23:13 +00:00
Michel Lespinasse
7043d2bb5e runtime: insert itabs into hash table during init
See #14874

This change makes the runtime register all compiler generated itabs
(as obtained from the moduledata) during init.

Change-Id: I9969a0985b99b8bda820a631f7fe4c78f1174cdf
Reviewed-on: https://go-review.googlesource.com/20900
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Michel Lespinasse <walken@google.com>
2016-03-29 02:14:49 +00:00
Dmitry Vyukov
ea0386f85f runtime: improve randomized stealing logic
During random stealing we steal 4*GOMAXPROCS times from random procs.
One would expect that most of the time we check all procs this way,
but due to the low quality of the PRNG we actually miss procs with
frightening probability. Below are modelling experiment results for 1e6
tries:

GOMAXPROCS = 2 : missed 1 procs 7944 times

GOMAXPROCS = 3 : missed 1 procs 101620 times
GOMAXPROCS = 3 : missed 2 procs 3571 times

GOMAXPROCS = 4 : missed 1 procs 63916 times
GOMAXPROCS = 4 : missed 2 procs 61 times
GOMAXPROCS = 4 : missed 3 procs 16 times

GOMAXPROCS = 5 : missed 1 procs 133136 times
GOMAXPROCS = 5 : missed 2 procs 1025 times
GOMAXPROCS = 5 : missed 3 procs 101 times
GOMAXPROCS = 5 : missed 4 procs 15 times

GOMAXPROCS = 8 : missed 1 procs 151765 times
GOMAXPROCS = 8 : missed 2 procs 5057 times
GOMAXPROCS = 8 : missed 3 procs 1726 times
GOMAXPROCS = 8 : missed 4 procs 68 times

GOMAXPROCS = 12 : missed 1 procs 199081 times
GOMAXPROCS = 12 : missed 2 procs 27489 times
GOMAXPROCS = 12 : missed 3 procs 3113 times
GOMAXPROCS = 12 : missed 4 procs 233 times
GOMAXPROCS = 12 : missed 5 procs 9 times

GOMAXPROCS = 16 : missed 1 procs 237477 times
GOMAXPROCS = 16 : missed 2 procs 30037 times
GOMAXPROCS = 16 : missed 3 procs 9466 times
GOMAXPROCS = 16 : missed 4 procs 1334 times
GOMAXPROCS = 16 : missed 5 procs 192 times
GOMAXPROCS = 16 : missed 6 procs 5 times
GOMAXPROCS = 16 : missed 7 procs 1 times
GOMAXPROCS = 16 : missed 8 procs 1 times

A missed proc won't lead to underutilization because we check all procs
again after dropping the P. But it can lead to an unpleasant situation
where we miss a proc, drop the P, check all procs, discover work,
acquire a P, miss the proc again, and repeat.

Improve stealing logic to cover all procs.
Also don't enter spinning mode and try to steal when there is nobody around.
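
A self-contained model of the coverage guarantee (the real CL
precomputes coprime strides per GOMAXPROCS; names here are invented):
starting anywhere and stepping by an increment coprime to n visits
every proc exactly once.

// visitAll returns a random-looking order over procs 0..n-1 that
// provably covers all of them: gcd(inc, n) == 1 makes the walk a
// permutation.
func visitAll(n, seed int) []int {
	inc := 1 + seed%n
	for gcd(inc, n) != 1 {
		inc++
	}
	pos := seed % n
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, pos)
		pos = (pos + inc) % n
	}
	return out
}

func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}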

Change-Id: Ibb6b122cc7fb836991bad7d0639b77c807aab4c2
Reviewed-on: https://go-review.googlesource.com/20836
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2016-03-25 11:00:48 +00:00
Austin Clements
8fb182d020 runtime: never pass stack pointers to gopark
gopark calls the unlock function after setting the G to _Gwaiting.
This means it's generally unsafe to access the G's stack from the
unlock function because the G may start running on another P. Once we
start shrinking stacks concurrently, a stack shrink could also move
the stack the moment after it enters _Gwaiting and before the unlock
function is called.

Document this restriction and fix the two places where we currently
violate it.

This is unlikely to be a problem in practice for these two places
right now, but they're already skating on thin ice. For example, the
following sequence could in principle cause corruption, deadlock, or a
panic in the select code:

On M1/P1:
1. G1 selects on channels A and B.
2. selectgoImpl calls gopark.
3. gopark puts G1 in _Gwaiting.
4. gopark calls selparkcommit.
5. selparkcommit releases the lock on channel A.

On M2/P2:
6. G2 sends to channel A.
7. The send puts G1 in _Grunnable and puts it on P2's run queue.
8. The scheduler runs, selects G1, puts it in _Grunning, and resumes G1.
9. On G1, the sellock immediately following the gopark gets called.
10. sellock grows and moves the stack.

On M1/P1:
11. selparkcommit continues to scan the lock order for the next
channel to unlock, but it's now reading from a freed (and possibly
reused) stack.

This shouldn't happen in practice because step 10 isn't the first call
to sellock, so the stack should already be big enough. However, once
we start shrinking stacks concurrently, this reasoning won't work any
more.
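
The shape of the fix, sketched (simplified; the real selparkcommit
also avoids unlocking the same channel twice, and the c field comes
from the sudog change below): the unlock callback reads only
heap-allocated sudogs, never the parked G's stack.

// unlockf runs after gp is already in _Gwaiting, so gp may resume
// on another P at any moment; only heap state is safe to touch.
func selparkcommit(gp *g, _ unsafe.Pointer) bool {
	for sg := gp.waiting; sg != nil; sg = sg.waitlink {
		unlock(&sg.c.lock) // channel recorded in the sudog
	}
	return true
}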

For #12967.

Change-Id: I3660c5be37e5be9f87433cb8141bdfdf37fadc4c
Reviewed-on: https://go-review.googlesource.com/20038
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:13:10 +00:00
Austin Clements
e4a95b6343 runtime: record channel in sudog
Given a G, there's currently no way to find the channel it's blocking
on. We'll need this information to fix a (probably theoretical) bug in
select and to implement concurrent stack shrinking, so record the
channel in the sudog.
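
Sketched, the addition is one field (surrounding sudog fields
abridged):

type sudog struct {
	g        *g
	next     *sudog
	prev     *sudog
	elem     unsafe.Pointer // data element
	waitlink *sudog         // g.waiting list
	c        *hchan         // channel this sudog is blocking on (new)
}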

For #12967.

Change-Id: If8fb63a140f1d07175818824d08c0ebeec2bdf66
Reviewed-on: https://go-review.googlesource.com/20035
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:13:02 +00:00
Emmanuel Odeke
6dfcc336c5 runtime: move testSchedLocalQueue* to export_test
Move the functions testSchedLocalQueueLocal and testSchedLocalQueueSteal
from proc.go to export_test.go, the only place where they are used.

Fixes #14796

Change-Id: I16b6fa4a13835eab33f66a2c2e87a5f5c79b7bd3
Reviewed-on: https://go-review.googlesource.com/20640
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-13 00:34:58 +00:00
Brad Fitzpatrick
5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedent.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
Dmitry Vyukov
bdc14698f8 runtime: unwire g/m in dropg always
Currently dropg does not unwire locked g/m.
This is an unnecessary distinction between locked and non-locked g/m.
We always restart goroutines with execute, which re-wires g/m.

First, this produces a false sense that the distinction is necessary.
Second, it can confuse some sanity and cross checks. For example,
if we check that g/m are unwired before we wire them in execute,
the check will fail for locked g/m. I've hit this while doing some
race detector changes: when we deschedule a goroutine and run
scheduler code, m.curg is generally nil, but not for locked ms.

Remove the distinction.

Change-Id: I3b87a28ff343baa1d564aab1f821b582a84dee07
Reviewed-on: https://go-review.googlesource.com/19950
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-26 15:45:45 +00:00
Austin Clements
cbe849fc38 runtime: eliminate unused _Genqueue state
_Genqueue and _Gscanenqueue were introduced as part of the GC quiesce
code. The quiesce code was removed by 197aa9e, but these states and
some associated code stuck around. Remove them.

Change-Id: I69df81881602d4a431556513dac2959668d27c20
Reviewed-on: https://go-review.googlesource.com/19638
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:32 +00:00
Martin Möhrmann
fdd0179bb1 all: fix typos and spelling
Change-Id: Icd06d99c42b8299fd931c7da821e1f418684d913
Reviewed-on: https://go-review.googlesource.com/19829
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-24 18:42:29 +00:00
Shenghou Ma
d70c04cf08 runtime: fix missing word in comment
Change-Id: I6cb8ac7b59812e82111ab3b0f8303ab8194a5129
Reviewed-on: https://go-review.googlesource.com/19791
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-21 22:40:25 +00:00
Austin Clements
f309bf3eef runtime: start an M when handing off a P when there's GC work
Currently it's possible for the scheduler to deadlock with the right
confluence of locked Gs, assists, and scheduling of background mark
workers. Broadly, this happens because handoffp is stricter than
findrunnable, and if the only work for a P is GC work, handoffp will
put the P into idle, rather than starting an M to execute that P. One
way this can happen is as follows:

0. There is only one user G, which we'll call G 1. There is more than
   one P, but they're all idle except the one running G 1.

1. G 1 locks itself to an M using runtime.LockOSThread.

2. GC starts up and enters mark 1.

3. G 1 performs a GC assist, which completes mark 1 without being
   fully satisfied. Completing mark 1 causes all background mark
   workers to park. And since the assist isn't fully satisfied, it
   parks as well, waiting for a background mark worker to satisfy its
   remaining assist debt.

4. The assist park enters the scheduler. Since G 1 is locked to the M,
   the scheduler releases the P and calls handoffp to hand the P to
   another M.

5. handoffp checks the local and global run queues, which are empty,
   and sees that there are idle Ps, so rather than start an M, it puts
   the P into idle.

At this point, all of the Gs are waiting and all of the Ps are idle.
In particular, none of the GC workers are running, so no mark work
gets done and the assist on the main G is never satisfied, so the
whole process soft locks up.

Fix this by making handoffp start an M if there is GC work. This
reintroduces a key invariant: that in any situation where findrunnable
would return a G to run on a P, handoffp for that P will start an M to
run work on that P.
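
The fix is a short check in handoffp before the P is put into idle;
approximately:

// If this P has GC mark work available, start an M to run it
// rather than idling the P.
if gcBlackenEnabled != 0 && gcMarkWorkAvailable(_p_) {
	startm(_p_, false)
	return
}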

Fixes #13645.

Tested by running 2,689 iterations of `go tool dist test -no-rebuild
runtime:cpu124` across 10 linux-amd64-noopt VMs with no failures.
Without this change, the failure rate was somewhere around 1%.

Performance change is negligible.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.48ms ± 2%  2.48ms ± 1%  -0.24%  (p=0.000 n=92+93)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.86s ± 2%     2.87s ± 2%    ~     (p=0.667 n=19+20)
Fannkuch11-12                2.52s ± 1%     2.47s ± 1%  -2.05%  (p=0.000 n=18+20)
FmtFprintfEmpty-12          51.7ns ± 1%    51.5ns ± 3%    ~     (p=0.931 n=16+20)
FmtFprintfString-12          170ns ± 1%     168ns ± 1%  -0.65%  (p=0.000 n=19+19)
FmtFprintfInt-12             160ns ± 0%     160ns ± 0%  +0.18%  (p=0.033 n=17+19)
FmtFprintfIntInt-12          265ns ± 1%     273ns ± 1%  +2.98%  (p=0.000 n=17+19)
FmtFprintfPrefixedInt-12     235ns ± 1%     239ns ± 1%  +1.99%  (p=0.000 n=16+19)
FmtFprintfFloat-12           315ns ± 0%     315ns ± 1%    ~     (p=0.250 n=17+19)
FmtManyArgs-12              1.04µs ± 1%    1.05µs ± 0%  +0.87%  (p=0.000 n=17+19)
GobDecode-12                7.93ms ± 0%    7.85ms ± 1%  -1.03%  (p=0.000 n=16+18)
GobEncode-12                6.62ms ± 1%    6.58ms ± 1%  -0.60%  (p=0.000 n=18+19)
Gzip-12                      322ms ± 1%     320ms ± 1%  -0.46%  (p=0.009 n=20+20)
Gunzip-12                   42.5ms ± 1%    42.5ms ± 0%    ~     (p=0.751 n=19+19)
HTTPClientServer-12         69.7µs ± 1%    70.0µs ± 2%    ~     (p=0.056 n=19+19)
JSONEncode-12               16.9ms ± 1%    16.7ms ± 1%  -1.13%  (p=0.000 n=19+19)
JSONDecode-12               61.5ms ± 1%    61.3ms ± 1%  -0.35%  (p=0.001 n=20+17)
Mandelbrot200-12            3.94ms ± 0%    3.91ms ± 0%  -0.67%  (p=0.000 n=20+18)
GoParse-12                  3.71ms ± 1%    3.70ms ± 1%    ~     (p=0.244 n=17+19)
RegexpMatchEasy0_32-12       101ns ± 1%     102ns ± 2%  +0.54%  (p=0.037 n=19+20)
RegexpMatchEasy0_1K-12       349ns ± 0%     350ns ± 0%  +0.33%  (p=0.000 n=17+18)
RegexpMatchEasy1_32-12      84.5ns ± 2%    84.2ns ± 1%  -0.43%  (p=0.048 n=19+20)
RegexpMatchEasy1_1K-12       510ns ± 1%     513ns ± 2%  +0.58%  (p=0.002 n=18+20)
RegexpMatchMedium_32-12      132ns ± 1%     134ns ± 1%  +0.95%  (p=0.000 n=20+20)
RegexpMatchMedium_1K-12     40.1µs ± 1%    39.6µs ± 1%  -1.39%  (p=0.000 n=20+20)
RegexpMatchHard_32-12       2.08µs ± 0%    2.06µs ± 1%  -0.95%  (p=0.000 n=18+18)
RegexpMatchHard_1K-12       62.2µs ± 1%    61.9µs ± 1%  -0.42%  (p=0.001 n=19+20)
Revcomp-12                   537ms ± 0%     536ms ± 0%    ~     (p=0.076 n=20+20)
Template-12                 71.3ms ± 1%    69.3ms ± 1%  -2.75%  (p=0.000 n=20+20)
TimeParse-12                 361ns ± 0%     360ns ± 1%    ~     (p=0.056 n=19+19)
TimeFormat-12                353ns ± 0%     352ns ± 0%  -0.23%  (p=0.000 n=17+18)
[Geo mean]                  62.6µs         62.5µs       -0.17%

Change-Id: I0fbbbe4d7d99653ba5600ffb4394fa03558bc4e9
Reviewed-on: https://go-review.googlesource.com/19107
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-02 02:11:28 +00:00
Austin Clements
09940b92a0 runtime: make p.gcBgMarkWorker a guintptr
Currently p.gcBgMarkWorker is a *g. Change it to a guintptr. This
eliminates a write barrier during the subtle mark worker parking dance
(which isn't known to be causing problems, but might be).
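
For reference, the guintptr pattern (abridged from the runtime's own
definitions): a *g stored as a bare word, so stores compile to plain
writes with no write barrier.

type guintptr uintptr

func (gp guintptr) ptr() *g   { return (*g)(unsafe.Pointer(gp)) }
func (gp *guintptr) set(g *g) { *gp = guintptr(unsafe.Pointer(g)) }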

Change-Id: Ibf12c05ac910820448059e69a68e5b882c993ed8
Reviewed-on: https://go-review.googlesource.com/18970
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-27 02:23:09 +00:00
Austin Clements
eb3b1830b0 runtime: attach mark workers to P after they park
Currently mark workers attach to their designated Ps before parking,
either during initialization or after performing a phase transition.
However, in both of these cases, it's possible that the mark worker is
running on a different P than the one it attaches to. This is a
problem, because as soon as the worker attaches to a P, that P's
scheduler can execute the worker. If the worker hasn't yet parked on
the P it's actually running on, this means the worker G will be
running in two places at once. The most visible consequence of this is
that once the first instance of the worker does park, it will clear
g.m and the second instance will crash shortly when it tries to use
g.m.

Fix this by moving the attach to the gopark callback. At this point,
the G is genuinely stopped and the callback is running on the system
stack, so it's safe for another P's scheduler to pick up the worker G.

Fixes #13363. Fixes #13978.

Change-Id: If2f7c4a4174f9511f6227e14a27c56fb842d1cc8
Reviewed-on: https://go-review.googlesource.com/18761
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2016-01-27 02:13:53 +00:00
Ian Lance Taylor
efd93a412e runtime: minimize time between lockextra/unlockextra
This doesn't fix a bug, but may improve performance in programs that
have many concurrent calls from C to Go.  The old code made several
system calls between lockextra and unlockextra.  That could be happening
while another thread is spinning to acquire lockextra.  This changes the
code to not make any system calls while holding the lock.

Change-Id: I50576478e478670c3d6429ad4e1b7d80f98a19d8
Reviewed-on: https://go-review.googlesource.com/18548
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-14 05:55:43 +00:00
Russ Cox
fac8202c3f runtime: make NumGoroutine and Stack agree not to include system goroutines
[Repeat of CL 18343 with build fixes.]

Before, NumGoroutine counted system goroutines and Stack (usually) didn't show them,
which was inconsistent and confusing.

To resolve which way they should be consistent, it seems like

	package main
	import "runtime"
	func main() { println(runtime.NumGoroutine()) }

should print 1 regardless of internal runtime details. Make it so.

Fixes #11706.

Change-Id: If26749fec06aa0ff84311f7941b88d140552e81d
Reviewed-on: https://go-review.googlesource.com/18432
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
2016-01-13 01:46:01 +00:00
Ian Lance Taylor
c02aa463db runtime: fix arm/arm64/ppc64/mips64 to dropm when necessary
Fixes #13881.

Change-Id: Idff77db381640184ddd2b65022133bb226168800
Reviewed-on: https://go-review.googlesource.com/18449
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2016-01-11 18:46:54 +00:00
Ian Lance Taylor
21b4f234c7 runtime: for c-archive/c-shared, install signal handlers synchronously
The previous behaviour of installing the signal handlers in a separate
thread meant that Go initialization raced with non-Go initialization if
the non-Go initialization also wanted to install signal handlers.  Make
installing signal handlers synchronous so that the process-wide behavior
is predictable.

Update #9896.

Change-Id: Ice24299877ec46f8518b072a381932d273096a32
Reviewed-on: https://go-review.googlesource.com/18150
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-01-09 00:58:38 +00:00
Russ Cox
6da608206c Revert "runtime: make NumGoroutine and Stack agree not to include system goroutines"
This reverts commit c5bafc8281.

Change-Id: Ie7030c978c6263b9e996d5aa0e490086796df26d
Reviewed-on: https://go-review.googlesource.com/18431
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-08 15:31:15 +00:00
Russ Cox
c5bafc8281 runtime: make NumGoroutine and Stack agree not to include system goroutines
Before, NumGoroutine counted system goroutines and Stack (usually) didn't show them,
which was inconsistent and confusing.

To resolve which way they should be consistent, it seems like

	package main
	import "runtime"
	func main() { println(runtime.NumGoroutine()) }

should print 1 regardless of internal runtime details. Make it so.

Fixes #11706.

Change-Id: I6bfe26a901de517728192cfb26a5568c4ef4fe47
Reviewed-on: https://go-review.googlesource.com/18343
Reviewed-by: Austin Clements <austin@google.com>
2016-01-08 15:25:00 +00:00
Austin Clements
3f22adecc7 runtime: fix sigprof stack barrier locking
f90b48e intended to require the stack barrier lock in all cases of
sigprof that walked the user stack, but got it wrong. In particular,
if sp < gp.stack.lo || gp.stack.hi < sp, tracebackUser would be true,
but we wouldn't acquire the stack lock. If it then turned out that we
were in a cgo call, it would walk the stack without the lock.

In fact, the whole structure of stack locking in sigprof is somewhat
wrong because it assumes the G to lock is gp.m.curg, but all three
gentraceback calls start from potentially different Gs.

To fix this, we lower the gcTryLockStackBarriers calls much closer to
the gentraceback calls. There are now three separate trylock calls,
each clearly associated with a gentraceback and the locked G clearly
matches the G from which the gentraceback starts. This actually brings
the sigprof logic closer to what it originally was before stack
barrier locking.

This depends on "runtime: increase assumed stack size in
externalthreadhandler" because it very slightly increases the stack
used by sigprof; without this other commit, this is enough to blow the
profiler thread's assumed stack size.

Fixes #12528 (hopefully for real this time!).

For the 1.5 branch, though it will require some backporting. On the
1.5 branch, this will *not* require the "runtime: increase assumed
stack size in externalthreadhandler" commit: there's no pcvalue cache,
so the used stack is smaller.

Change-Id: Id2f6446ac276848f6fc158bee550cccd03186b83
Reviewed-on: https://go-review.googlesource.com/18328
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-07 19:40:38 +00:00
Austin Clements
b50b24837d runtime: don't ignore success of cgo profiling tracebacks
If a sigprof happens during a cgo call, we traceback from the entry
point of the cgo call. However, if the SP is outside of the G's stack,
we'll then ignore this traceback, even if it was successful, and
overwrite it with just _ExternalCode.

Fix this by accepting any successful traceback, regardless of whether
we got it from a cgo entry point or from regular Go code.

Fixes #13466.

Change-Id: I5da9684361fc5964f44985d74a8cdf02ffefd213
Reviewed-on: https://go-review.googlesource.com/18327
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-07 19:40:26 +00:00
Ian Lance Taylor
70c9a8187a runtime: set new m signal mask to program startup mask
We were setting the signal mask of a new m to the signal mask of the m
that created it.  That failed when that m happened to be the one created
by ensureSigM, which sets its signal mask to only include the signals
being caught by os/signal.Notify.
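
A sketch of the approach (variable name and placement assumed, not
taken from the CL): capture the mask once at startup and apply it to
every new m, instead of inheriting whatever the creating thread has.

var initSigmask sigset // signal mask at program startup

func schedinit() {
	msigsave(getg().m) // record the startup mask into m.sigmask
	initSigmask = getg().m.sigmask
	// ...
}

func newm(fn func(), _p_ *p) {
	mp := allocm(_p_, fn)
	mp.sigmask = initSigmask // not the creator's current mask
	// ...
}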

Fixes #13164.
Update #9896.

Change-Id: I705c196fe9d11754e10bab9e9b2e7530ecdfa367
Reviewed-on: https://go-review.googlesource.com/18064
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-06 23:14:46 +00:00
Ian Lance Taylor
934e055f41 runtime: call msanwrite on object passed to runtime/cgo
Avoids an msan error when runtime/cgo is explicitly rebuilt with
-fsanitize=memory.

Fixes #13815.

Change-Id: I70308034011fb308b63585bcd40b0d1e62ec93ef
Reviewed-on: https://go-review.googlesource.com/18263
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-06 04:04:42 +00:00
Austin Clements
f90b48e0d3 runtime: require the stack barrier lock to traceback cgo and libcalls
Currently, if sigprof determines that the G is in user code (not cgo
or libcall code), it will only traceback the G stack if it can acquire
the stack barrier lock. However, it has no such restriction if the G
is in cgo or libcall code. Because cgo calls count as syscalls, stack
scanning and stack barrier installation can occur during a cgo call,
which means sigprof could attempt to traceback a G in a cgo call while
scanstack is installing stack barriers in that G's stack. As a result,
the following sequence of events can cause the sigprof traceback to
panic with "missed stack barrier":

1. M1: G1 performs a Cgo call (which, on Windows, is any system call,
   which could explain why this is easier to reproduce on Windows).

2. M1: The Cgo call puts G1 into _Gsyscall state.

3. M2: GC starts a scan of G1's stack. It puts G1 in to _Gscansyscall
   and acquires the stack barrier lock.

4. M3: A profiling signal comes in. On Windows this is a global
   (though I don't think this matters), so the runtime stops M1 and
   calls sigprof for G1.

5. M3: sigprof fails to acquire the stack barrier lock (because the
   GC's stack scan holds it).

6. M3: sigprof observes that G1 is in a Cgo call, so it calls
   gentraceback on G1 with its Cgo transition point.

7. M3: gentraceback on G1 grabs the currently empty g.stkbar slice.

8. M2: GC finishes scanning G1's stack and installing stack barriers.

9. M3: gentraceback encounters one of the just-installed stack
   barriers and panics.

This commit fixes this by only allowing cgo tracebacks if sigprof can
acquire the stack barrier lock, just like in the regular user
traceback case.

For good measure, we put the same constraint on libcall tracebacks.
This case is probably already safe because, unlike cgo calls, libcalls
leave the G in _Grunning and prevent reaching a safe point, so
scanstack cannot run during a libcall. However, this also means that
sigprof will always acquire the stack barrier lock without contention,
so there's no cost to adding this constraint to libcall tracebacks.

Fixes #12528. For 1.5.3 (will require some backporting).

Change-Id: Ia5a4b8e3d66b23b02ffcd54c6315c81055c0cec2
Reviewed-on: https://go-review.googlesource.com/18023
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-18 17:08:39 +00:00
Shenghou Ma
86f1944e86 cmd/dist, runtime: make runtime version available as runtime.buildVersion
So that there is a uniformed way to retrieve Go version from a Go
binary, starting from Go 1.4 (see https://golang.org/cl/117040043)

Updates #13507.

Change-Id: Iaa2b14fca2d8c4d883d3824e2efc82b3e6fe2624
Reviewed-on: https://go-review.googlesource.com/17459
Reviewed-by: Keith Randall <khr@golang.org>
2015-12-16 05:42:40 +00:00
Austin Clements
01baf13ba5 runtime: only trigger forced GC if GC is not running
Currently, sysmon triggers a forced GC solely based on
memstats.last_gc. However, memstats.last_gc isn't updated until mark
termination, so once sysmon starts triggering forced GC, it will keep
triggering them until GC finishes. The first of these actually starts
a GC; the remainder, up to the last, print "GC forced" but gcStart
returns immediately because gcphase != _GCoff; then the last may start
another GC if the previous GC finishes (and sets last_gc) between
sysmon triggering it and gcStart checking the GC phase.

Fix this by expanding the condition for starting a forced GC to also
require that no GC is currently running. This, combined with the way
forcegchelper blocks until the GC cycle is started, ensures sysmon
only starts one GC when the time exceeds the forced GC threshold.
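
The widened condition in sysmon, approximately:

lastgc := int64(atomic.Load64(&memstats.last_gc))
if gcphase == _GCoff && lastgc != 0 &&
	unixnow-lastgc > forcegcperiod && atomic.Load(&forcegc.idle) != 0 {
	// wake forcegchelper to start a forced GC
}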

Fixes #13458.

Change-Id: Ie6cf841927f6085136be3f45259956cd5cf10d23
Reviewed-on: https://go-review.googlesource.com/17819
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-15 20:13:19 +00:00
Austin Clements
50d8d4e834 runtime: simplify sigprof traceback interlocking
The addition of stack barrier locking to copystack subsumes the
partial fix from commit bbd1a1c for SIGPROF during copystack. With the
stack barrier locking, this commit simplifies the rule in sigprof to:
the user stack can be traced only if sigprof can acquire the stack
barrier lock.

Updates #12932, #13362.

Change-Id: I1c1f80015053d0ac7761e9e0c7437c2aba26663f
Reviewed-on: https://go-review.googlesource.com/17192
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-15 20:12:17 +00:00
Dmitry Vyukov
fb6f8a96f2 runtime: remove unnecessary wakeups of worker threads
Currently we wake up new worker threads whenever we pass
through the scheduler with nmspinning==0. This leads to
lots of unnecessary thread wake ups.
Instead let only spinning threads wake up new spinning threads.

For the following program:

package main
import "runtime"
func main() {
	for i := 0; i < 1e7; i++ {
		runtime.Gosched()
	}
}

Before:
$ time ./test
real	0m4.278s
user	0m7.634s
sys	0m1.423s

$ strace -c ./test
% time     seconds  usecs/call     calls    errors syscall
 99.93    9.314936           3   2685009     17536 futex

After:
$ time ./test
real	0m1.200s
user	0m1.181s
sys	0m0.024s

$ strace -c ./test
% time     seconds  usecs/call     calls    errors syscall
  3.11    0.000049          25         2           futex

Fixes #13527

Change-Id: Ia1f5bf8a896dcc25d8b04beb1f4317aa9ff16f74
Reviewed-on: https://go-review.googlesource.com/17540
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-12-11 11:31:12 +00:00
Rahul Chaudhry
f939ee13ae runtime: fix GODEBUG=schedtrace=X delay handling.
debug.schedtrace is an int32. Convert it to int64 before
multiplying by the constant 1000000. Otherwise, schedtrace
values greater than 2147 result in int32 overflow, causing
incorrect delays between traces.
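
The bug in miniature (runnable):

package main

import "fmt"

func main() {
	var schedtrace int32 = 2148 // ms between traces, from GODEBUG
	bad := int64(schedtrace * 1000000)  // multiplies in int32: overflows
	good := int64(schedtrace) * 1000000 // widens first: correct
	fmt.Println(bad, good) // -2146967296 2148000000
}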

Change-Id: I064e8d7b432c1e892a705ee1f31a2e8cdd2c3ea3
Reviewed-on: https://go-review.googlesource.com/17712
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2015-12-11 03:59:20 +00:00
Austin Clements
d2c81ad847 runtime: recursively disallow write barriers in sysmon
sysmon runs without a P. This means it can't interact with the garbage
collector, so write barriers are not allowed in anything that sysmon does.

Fixes #10600.

Change-Id: I9de1283900dadee4f72e2ebfc8787123e382ae88
Reviewed-on: https://go-review.googlesource.com/17006
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-19 21:17:25 +00:00
Austin Clements
402e37d4a9 cmd/compile: special case nowritebarrierrec for allocm
allocm is a very unusual function: it is specifically designed to
allocate in contexts where m.p is nil by temporarily taking over a P.
Since allocm is used in many contexts where it would make sense to use
nowritebarrierrec, this commit teaches the nowritebarrierrec analysis
to stop at allocm.

Updates #10600.

Change-Id: I8499629461d4fe25712d861720dfe438df7ada9b
Reviewed-on: https://go-review.googlesource.com/17005
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-19 21:17:19 +00:00
Austin Clements
9c9d74aba7 runtime: prevent sigprof during all stack barrier ops
A sigprof during stack barrier insertion or removal can crash if it
detects an inconsistency between the stkbar array and the stack
itself. Currently we protect against this when scanning another G's
stack using stackLock, but we don't protect against it when unwinding
stack barriers for a recover or a memmove to the stack.

This commit cleans up and improves the stack locking code. It
abstracts out the lock and unlock operations. It uses the lock
consistently everywhere we perform stack operations, and pushes the
lock/unlock down closer to where the stack barrier operations happen
to make it more obvious what it's protecting. Finally, it modifies
sigprof so that instead of spinning until it acquires the lock, it
simply doesn't perform a traceback if it can't acquire it. This is
necessary to prevent self-deadlock.
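
Sketched with the names used above (arguments abridged):

// in sigprof: skip the traceback entirely rather than spin.
if gcTryLockStackBarriers(gp) {
	n = gentraceback(pc, sp, lr, gp, 0, &stk[0], len(stk), nil, nil, _TraceTrap)
	gcUnlockStackBarriers(gp)
}
// if the lock was unavailable, this signal records no stack samples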

Updates #11863, which introduced stackLock to fix some of these
issues, but didn't go far enough.

Updates #12528.

Change-Id: I9d1fa88ae3744d31ba91500c96c6988ce1a3a349
Reviewed-on: https://go-review.googlesource.com/17036
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-19 16:35:38 +00:00
Russ Cox
f8e6418637 runtime: fix bad signal stack when using cgo-created threads and async signals
Cgo-created threads transition between having associated Go g's and m's and not.
A signal arriving during the transition could think it was safe and appropriate to
run Go signal handlers when it was in fact not.
Avoid the race by masking all signals during the transition.

Fixes #12277.

Change-Id: Ie9711bc1d098391d58362492197a7e0f5b497d14
Reviewed-on: https://go-review.googlesource.com/16915
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-18 18:05:22 +00:00
Michael Hudson-Doyle
368d548417 cmd/compile, cmd/link, runtime: on ppc64x, maintain the TOC pointer in R2 when compiling PIC
The PowerPC ISA does not have a PC-relative load instruction, which poses
obvious challenges when generating position-independent code. The way the ELFv2
ABI addresses this is to specify that r2 points to a per "module" (shared
library or executable) TOC pointer. Maintaining this pointer requires
cooperation between codegen and the system linker:

 * Non-leaf functions leave space on the stack at r1+24 to save the TOC pointer.
 * A call to a function that *might* have to go via a PLT stub must be followed
   by a nop instruction that the system linker can replace with "ld r1, 24(r1)"
   to restore the TOC pointer (only when dynamically linking Go code).
 * When calling a function via a function pointer, the address of the function
   must be in r12, and the first couple of instructions (the "global entry
   point") of the called function use this to derive the address of the TOC
   for the module it is in.
 * When calling a function that is implemented in the same module, the system
   linker adjusts the call to skip over the instructions mentioned above (the
   "local entry point"), assuming that r2 is already correctly set.

So this changeset adds the global entry point instructions, sets the metadata so
the system linker knows where the local entry point is, inserts code to save the
TOC pointer at 24(r1), adds a nop after any call not known to be local and copes
with the odd non-local code transfer in the runtime (e.g. the stuff around
jmpdefer). It does not actually compile PIC yet.

Change-Id: I7522e22bdfd2f891745a900c60254fe9e372c854
Reviewed-on: https://go-review.googlesource.com/15967
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-12 23:18:58 +00:00
Austin Clements
bbd1a1c706 runtime: make SIGPROF skip stacks that are being copied
sigprof tracebacks the stack across systemstack switches to make
profile tracebacks more complete. However, it does this even if the
user stack is currently being copied, which means it may be in an
inconsistent state that will cause the traceback to panic.

One specific way this can happen is during stack shrinking. Some
goroutine blocks for STW, then enters gchelper, which then assists
with root marking. If that root marking happens to pick the original
goroutine and its stack needs to be shrunk, it will begin to copy that
stack. During this copy, the stack is generally inconsistent and, in
particular, the actual locations of the stack barriers and their
recorded locations are temporarily out of sync. If a SIGPROF happens
during this inconsistency, it will walk the stack all the way back to
the blocked goroutine and panic when it fails to unwind the stack
barriers.

Fix this by disallowing jumping to the user stack during SIGPROF if
that user stack is in the process of being copied.

Fixes #12932.

Change-Id: I9ef694c2c01e3653e292ce22612418dd3daff1b4
Reviewed-on: https://go-review.googlesource.com/16819
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-12 17:37:04 +00:00
Michael Matloob
432cb66f16 runtime: break out system-specific constants into package sys
runtime/internal/sys will hold system-, architecture- and config-
specific constants.

Updates #11647

Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54
Reviewed-on: https://go-review.googlesource.com/16817
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-12 17:04:45 +00:00
Matthew Dempsky
c17c42e8a5 runtime: rewrite lots of foo_Bar(f, ...) into f.bar(...)
Applies to types fixAlloc, mCache, mCentral, mHeap, mSpan, and
mSpanList.

Two special cases:

1. mHeap_Scavenge() previously didn't take an *mheap parameter, so it
was specially handled in this CL.

2. mHeap_Free() would have collided with mheap's "free" field, so it's
been renamed to (*mheap).freeSpan to parallel its underlying
(*mheap).freeSpanLocked method.
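
Illustrative before/after (arguments abridged from the period's API;
the freeSpan case is the one named above):

// before:
s := mHeap_Alloc(h, npage, sizeclass, large, needzero)
mHeap_Free(h, s, acct)

// after:
s := h.alloc(npage, sizeclass, large, needzero)
h.freeSpan(s, acct)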

Change-Id: I325938554cca432c166fe9d9d689af2bbd68de4b
Reviewed-on: https://go-review.googlesource.com/16221
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-12 00:34:58 +00:00
Austin Clements
f32f2954fb runtime: never allocate new M when jumping time forward
When we're jumping time forward, it means everyone is asleep, so there
should always be an M available. Furthermore, this causes both
allocation and write barriers in contexts that may be running without
a P (such as in sysmon).

Hence, replace this allocation with a throw.

Updates #10600.

Change-Id: I2cee70d5db828d0044082878995949edb25dda5f
Reviewed-on: https://go-review.googlesource.com/16815
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-11 17:37:42 +00:00
Michael Matloob
67faca7d9c runtime: break atomics out into package runtime/internal/atomic
This change breaks out most of the atomics functions in the runtime
into package runtime/internal/atomic. It adds some basic support
in the toolchain for runtime packages, and also modifies linux/arm
atomics to remove the dependency on the runtime's mutex. The mutexes
have been replaced with spinlocks.

all trybots are happy!
In addition to the trybots, I've tested on the darwin/arm64 builder,
on the darwin/arm builder, and on a ppc64le machine.

Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
Reviewed-on: https://go-review.googlesource.com/14204
Run-TryBot: Michael Matloob <matloob@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 17:38:04 +00:00
Austin Clements
d5ba582166 runtime: remove background GC goroutine and mark barriers
These are now unused.

Updates #11970.

Change-Id: I43e5c4e5bcda9581bacc63364f96bb4855ab779f
Reviewed-on: https://go-review.googlesource.com/16393
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:24:05 +00:00
Austin Clements
c99d7f7f85 runtime: decentralize mark done and mark termination
This moves all of the mark 1 to mark 2 transition and mark termination
to the mark done transition function. This means these transitions are
now handled on the goroutine that detected mark completion. This also
means that the GC coordinator and the background completion barriers
are no longer used and various workarounds to yield to the coordinator
are no longer necessary. These will be removed in follow-up commits.

One consequence of this is that mark workers now need to be
preemptible when performing the mark done transition. This allows them
to stop the world and to perform the final clean-up steps of GC after
restarting the world. They are only made preemptible while performing
this transition, so if the worker findRunnableGCWorker would schedule
isn't available, we didn't want to schedule it anyway.

Fixes #11970.

Change-Id: I9203a2d6287eeff62d589ec02ad9cb1e29ddb837
Reviewed-on: https://go-review.googlesource.com/16391
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:54 +00:00
Austin Clements
20f276e237 runtime: don't start idle mark workers when barriers are cleared
Currently, we don't start dedicated or fractional mark workers unless
the mark 1 or mark 2 barriers have been cleared. One intended
consequence of this is that no background workers run between the
forEachP that disposes all gcWork caches and the beginning of mark 2.

However, we (unintentionally) did not apply this restriction to idle
mark workers. As a result, these can start in the interim between mark
1 completion and mark 2 starting. This explains why it was necessary
to reset the root marking jobs using carefully ordered atomic writes
when setting up mark 2. It also means that, even though we definitely
enqueue work before starting mark 2, it may be drained by the time we
reset the mark 2 barrier. If this happens, currently the only thing
preventing the runtime from deadlocking is that the scheduler itself
also checks for mark completion and will signal mark 2 completion.
Were it not for the odd behavior of idle workers, this check in the
scheduler would not be necessary.

Clean all of this up and prepare to remove this check in the scheduler
by applying the same restriction to starting idle mark workers.

Change-Id: Ic1b479e1591bd7773dc27b320ca399a215603b5a
Reviewed-on: https://go-review.googlesource.com/16631
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:33 +00:00