qbit/go

mirror of https://github.com/golang/go synced 2024-10-02 06:18:32 -06:00

Author	SHA1	Message	Date
Austin Clements	b1d94c118f	runtime: validate lfnode addresses Change-Id: Ic8c506289caaf6218494e5150d10002e0232feaa Reviewed-on: https://go-review.googlesource.com/85876 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-15 21:12:11 +00:00
Austin Clements	e9079a69f3	runtime: buffered write barrier implementation This implements runtime support for buffered write barriers on amd64. The buffered write barrier has a fast path that simply enqueues pointers in a per-P buffer. Unlike the current write barrier, this fast path is not a normal Go call and does not require the compiler to spill general-purpose registers or put arguments on the stack. When the buffer fills up, the write barrier takes the slow path, which spills all general purpose registers and flushes the buffer. We don't allow safe-points or stack splits while this frame is active, so it doesn't matter that we have no type information for the spilled registers in this frame. One minor complication is cgocheck=2 mode, which uses the write barrier to detect Go pointers being written to non-Go memory. We obviously can't buffer this, so instead we set the buffer to its minimum size, forcing the write barrier into the slow path on every call. For this specific case, we pass additional information as arguments to the flush function. This also requires enabling the cgo write barrier slightly later during runtime initialization, after Ps (and the per-P write barrier buffers) have been initialized. The code in this CL is not yet active. The next CL will modify the compiler to generate calls to the new write barrier. This reduces the average cost of the write barrier by roughly a factor of 4, which will pay for the cost of having it enabled more of the time after we make the GC pacer less aggressive. (Benchmarks will be in the next CL.) Updates #14951. Updates #22460. Change-Id: I396b5b0e2c5e5c4acfd761a3235fd15abadc6cb1 Reviewed-on: https://go-review.googlesource.com/73711 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-10-30 18:12:44 +00:00
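The fast-path/slow-path split described in the buffered write barrier commit can be illustrated with a small sketch. This is not the runtime's code: the `wbBuf` type, its fields, and the `limit` parameter are hypothetical names chosen for illustration, and the real fast path is assembly that avoids a normal Go call. The sketch only shows the control flow: enqueue on every call, flush when the buffer fills, and set the limit to its minimum to force the slow path every time (the approach the message describes for cgocheck=2).

```go
package main

import "fmt"

// wbBuf is a hypothetical stand-in for a per-P write barrier buffer.
// Names and layout are illustrative assumptions, not the runtime's.
type wbBuf struct {
	buf     []uintptr // pointers recorded since the last flush
	limit   int       // flush threshold; 1 forces the slow path on every call
	flushed []uintptr // stands in for the GC work the slow path would enqueue
	flushes int       // number of slow-path flushes taken
}

func newWBBuf(limit int) *wbBuf {
	return &wbBuf{buf: make([]uintptr, 0, limit), limit: limit}
}

// put is the fast path: record the pointer and fall into the slow path
// only when the buffer has filled up.
func (b *wbBuf) put(p uintptr) {
	b.buf = append(b.buf, p)
	if len(b.buf) >= b.limit {
		b.flush()
	}
}

// flush is the slow path: hand every buffered pointer onward and reset
// the buffer so the fast path can continue.
func (b *wbBuf) flush() {
	b.flushed = append(b.flushed, b.buf...)
	b.buf = b.buf[:0]
	b.flushes++
}

func main() {
	b := newWBBuf(4)
	for p := uintptr(1); p <= 10; p++ {
		b.put(p)
	}
	// Ten writes with a limit of four take the slow path twice.
	fmt.Println(b.flushes, len(b.buf), len(b.flushed))
}
```

Most calls pay only for the append, which is the source of the amortized saving the message refers to.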
Austin Clements	249b5cc945	runtime: mark gcWork methods nowritebarrierrec Currently most of these are marked go:nowritebarrier as a hint, but it's actually important that these not invoke write barriers recursively. The danger is that some gcWork method would invoke the write barrier while the gcWork is in an inconsistent state and that the write barrier would in turn invoke some other gcWork method, which would crash or permanently corrupt the gcWork. Simply marking the write barrier itself as go:nowritebarrierrec isn't sufficient to prevent this if the write barrier doesn't use the outer method. Thankfully, this doesn't cause any build failures, so we were getting this right. :) For #22460. Change-Id: I35a7292a584200eb35a49507cd3fe359ba2206f6 Reviewed-on: https://go-review.googlesource.com/72554 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-10-29 17:56:12 +00:00
Austin Clements	051809e352	runtime: free workbufs during sweeping This extends the sweeper to free workbufs back to the heap between GC cycles, allowing this memory to be reused for GC'd allocations or eventually returned to the OS. This helps for applications that have high peak heap usage relative to their regular heap usage (for example, a high-memory initialization phase). Workbuf memory is roughly proportional to heap size and since we currently never free workbufs, it's proportional to peak heap size. By freeing workbufs, we can release and reuse this memory for other purposes when the heap shrinks. This is somewhat complicated because this costs ~1–2 µs per workbuf span, so for large heaps it's too expensive to just do synchronously after mark termination between starting the world and dropping the worldsema. Hence, we do it asynchronously in the sweeper. This adds a list of "free" workbuf spans that can be returned to the heap. GC moves all workbuf spans to this list after mark termination and the background sweeper drains this list back to the heap. If the sweeper doesn't finish, that's fine, since getempty can directly reuse any remaining spans to allocate more workbufs. Performance impact is negligible. On the x/benchmarks, this reduces GC-bytes-from-system by 6–11%. Fixes #19325. Change-Id: Icb92da2196f0c39ee984faf92d52f29fd9ded7a8 Reviewed-on: https://go-review.googlesource.com/38582 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-04-13 18:20:47 +00:00
Austin Clements	9cc883a466	runtime: allocate GC workbufs from manually-managed spans Currently the runtime allocates workbufs from persistent memory, which means they can never be freed. Switch to allocating them from manually-managed heap spans. This doesn't free them yet, but it puts us in a position to do so. For #19325. Change-Id: I94b2512a2f2bbbb456cd9347761b9412e80d2da9 Reviewed-on: https://go-review.googlesource.com/38581 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-04-13 18:20:44 +00:00
Austin Clements	13ae271d5d	runtime: introduce a type for lfstacks The lfstack API is still a C-style API: lfstacks all have unhelpful type uint64 and the APIs are package-level functions. Make the code more readable and Go-style by creating an lfstack type with methods for push, pop, and empty. Change-Id: I64685fa3be0e82ae2d1a782a452a50974440a827 Reviewed-on: https://go-review.googlesource.com/38290 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-03-19 22:42:24 +00:00
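As a rough illustration of the API shape this commit describes (a type with push, pop, and empty methods instead of package-level functions on a bare uint64), here is a minimal lock-free stack. It is a sketch under stated assumptions: the `node` type is hypothetical, and the head is a `sync/atomic` typed pointer (Go 1.19 or later) rather than the packed uint64 the runtime uses, so it relies on the garbage collector for memory safety.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// node is a hypothetical stack element used only for this sketch.
type node struct {
	next  *node
	value int
}

// lfstack is a lock-free (Treiber-style) stack with methods,
// mirroring the method set named in the commit message.
type lfstack struct {
	head atomic.Pointer[node]
}

// push links n in front of the current head, retrying on contention.
func (s *lfstack) push(n *node) {
	for {
		old := s.head.Load()
		n.next = old
		if s.head.CompareAndSwap(old, n) {
			return
		}
	}
}

// pop removes and returns the top node, or nil if the stack is empty.
func (s *lfstack) pop() *node {
	for {
		old := s.head.Load()
		if old == nil {
			return nil
		}
		if s.head.CompareAndSwap(old, old.next) {
			return old
		}
	}
}

// empty reports whether the stack currently has no nodes.
func (s *lfstack) empty() bool {
	return s.head.Load() == nil
}

func main() {
	var s lfstack
	for i := 1; i <= 3; i++ {
		s.push(&node{value: i})
	}
	for n := s.pop(); n != nil; n = s.pop() {
		fmt.Println(n.value) // last pushed comes out first
	}
	fmt.Println(s.empty())
}
```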
Austin Clements	3399fd254d	runtime: remove unused gcstats The gcstats structure is no longer consumed by anything and no longer tracks statistics that are particularly relevant to the concurrent garbage collector. Remove it. (Having statistics is probably a good idea, but these aren't the stats we need these days and we don't have a way to get them out of the runtime.) In preparation for #13613. Change-Id: Ib63e2f9067850668f9dcbfd4ed89aab4a6622c3f Reviewed-on: https://go-review.googlesource.com/34936 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-03-04 02:56:35 +00:00
Austin Clements	98da2d1f91	runtime: remove wbufptr Since workbuf is now marked go:notinheap, the write barrier-preventing wrapper type wbufptr is no longer necessary. Remove it. Change-Id: I3e5b5803a1547d65de1c1a9c22458a38e08549b7 Reviewed-on: https://go-review.googlesource.com/35971 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-03-03 17:02:12 +00:00
Austin Clements	0bae74e8c9	runtime: wake idle Ps when enqueuing GC work If the scheduler has no user work and there's no GC work visible, it puts the P to sleep (or blocks on the network). However, if we later enqueue more GC work, there's currently nothing that specifically wakes up the scheduler to let it start an idle GC worker. As a result, we can underutilize the CPU during GC if Ps have been put to sleep. Fix this by making GC wake idle Ps when work buffers are put on the full list. We already have a hook to do this, since we use this to preempt a random P if we need more dedicated workers. We expand this hook to instead wake an idle P if there is one. The logic we use for this is identical to the logic used to wake an idle P when we ready a goroutine. To make this really sound, we also fix the scheduler to re-check the idle GC worker condition after releasing its P. This closes a race where 1) the scheduler checks for idle work and finds none, 2) new work is enqueued but there are no idle Ps so none are woken, and 3) the scheduler releases its P. There is one subtlety here. Currently we call enlistWorker directly from putfull, but the gcWork is in an inconsistent state in the places that call putfull. This isn't a problem right now because nothing that enlistWorker does touches the gcWork, but with the added call to wakep, it's possible to get a recursive call into the gcWork (specifically, while write barriers are disallowed, this can do an allocation, which can dispose a gcWork, which can put a workbuf). To handle this, we lift the enlistWorker calls up a layer and delay them until the gcWork is in a consistent state. Fixes #14179. Change-Id: Ia2467a52e54c9688c3c1752e1fc00f5b37bbfeeb Reviewed-on: https://go-review.googlesource.com/32434 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2016-11-20 22:44:22 +00:00
Austin Clements	1bc6be6423	runtime: mark several types go:notinheap This covers basically all sysAlloc'd, persistentalloc'd, and fixalloc'd types. Change-Id: I0487c887c2a0ade5e33d4c4c12d837e97468e66b Reviewed-on: https://go-review.googlesource.com/30941 Reviewed-by: Rick Hudson <rlh@golang.org>	2016-10-15 17:58:20 +00:00
Austin Clements	cf4f1d07a1	runtime: bound scanobject to ~100 µs Currently the time spent in scanobject is proportional to the size of the object being scanned. Since scanobject is non-preemptible, large objects can cause significant goroutine (and even whole application) delays through several means:
1. If a GC assist picks up a large object, the allocating goroutine is blocked for the whole scan, even if that scan well exceeds that goroutine's debt.
2. Since the scheduler does not run on the P performing a large object scan, goroutines in that P's run queue do not run unless they are stolen by another P (which can take some time). If there are a few large objects, all of the Ps may get tied up so the scheduler doesn't run anywhere.
3. Even if a large object is scanned by a background worker and other Ps are still running the scheduler, the large object scan doesn't flush background credit until the whole scan is done. This can easily cause all allocations to block in assists, waiting for credit, causing an effective STW.
Fix this by splitting large objects into 128 KB "oblets" and scanning at most one oblet at a time. Since we can scan 1–2 MB/ms, this equates to bounding scanobject at roughly 100 µs. This improves assist behavior both because assists can no longer get "unlucky" and be stuck scanning a large object, and because it causes the background worker to flush credit and unblock assists more frequently when scanning large objects. This also improves GC parallelism if the heap consists primarily of a small number of very large objects by letting multiple workers scan a large object in parallel. Fixes #10345. Fixes #16293. 
This substantially improves goroutine latency in the benchmark from issue #16293, which exercises several forms of very large objects:
name                 old max-latency    new max-latency    delta
SliceNoPointer-12    154µs ± 1%         155µs ± 2%         ~        (p=0.087 n=13+12)
SlicePointer-12      314ms ± 1%         5.94ms ±138%       -98.11%  (p=0.000 n=19+20)
SliceLivePointer-12  1148ms ± 0%        4.72ms ±167%       -99.59%  (p=0.000 n=19+20)
MapNoPointer-12      72509µs ± 1%       408µs ±325%        -99.44%  (p=0.000 n=19+18)
ChanPointer-12       313ms ± 0%         4.74ms ±140%       -98.49%  (p=0.000 n=18+20)
ChanLivePointer-12   1147ms ± 0%        3.30ms ±149%       -99.71%  (p=0.000 n=19+20)
name                 old P99.9-latency  new P99.9-latency  delta
SliceNoPointer-12    113µs ±25%         107µs ±12%         ~        (p=0.153 n=20+18)
SlicePointer-12      309450µs ± 0%      133µs ±23%         -99.96%  (p=0.000 n=20+20)
SliceLivePointer-12  961ms ± 0%         1.35ms ±27%        -99.86%  (p=0.000 n=20+20)
MapNoPointer-12      448µs ±288%        119µs ±18%         -73.34%  (p=0.000 n=18+20)
ChanPointer-12       309450µs ± 0%      134µs ±23%         -99.96%  (p=0.000 n=20+19)
ChanLivePointer-12   961ms ± 0%         1.35ms ±27%        -99.86%  (p=0.000 n=20+20)
This has negligible effect on all metrics from the garbage, JSON, and HTTP x/benchmarks. 
It shows slight improvement on some of the go1 benchmarks, particularly Revcomp, which uses some multi-megabyte buffers:
name                      old time/op  new time/op  delta
BinaryTree17-12           2.46s ± 1%   2.47s ± 1%   +0.32%  (p=0.012 n=20+20)
Fannkuch11-12             2.82s ± 0%   2.81s ± 0%   -0.61%  (p=0.000 n=17+20)
FmtFprintfEmpty-12        50.8ns ± 5%  50.5ns ± 2%  ~       (p=0.197 n=17+19)
FmtFprintfString-12       131ns ± 1%   132ns ± 0%   +0.57%  (p=0.000 n=20+16)
FmtFprintfInt-12          117ns ± 0%   116ns ± 0%   -0.47%  (p=0.000 n=15+20)
FmtFprintfIntInt-12       180ns ± 0%   179ns ± 1%   -0.78%  (p=0.000 n=16+20)
FmtFprintfPrefixedInt-12  186ns ± 1%   185ns ± 1%   -0.55%  (p=0.000 n=19+20)
FmtFprintfFloat-12        263ns ± 1%   271ns ± 0%   +2.84%  (p=0.000 n=18+20)
FmtManyArgs-12            741ns ± 1%   742ns ± 1%   ~       (p=0.190 n=19+19)
GobDecode-12              7.44ms ± 0%  7.35ms ± 1%  -1.21%  (p=0.000 n=20+20)
GobEncode-12              6.22ms ± 1%  6.21ms ± 1%  ~       (p=0.336 n=20+19)
Gzip-12                   220ms ± 1%   219ms ± 1%   ~       (p=0.130 n=19+19)
Gunzip-12                 37.9ms ± 0%  37.9ms ± 1%  ~       (p=1.000 n=20+19)
HTTPClientServer-12       82.5µs ± 3%  82.6µs ± 3%  ~       (p=0.776 n=20+19)
JSONEncode-12             16.4ms ± 1%  16.5ms ± 2%  +0.49%  (p=0.003 n=18+19)
JSONDecode-12             53.7ms ± 1%  54.1ms ± 1%  +0.71%  (p=0.000 n=19+18)
Mandelbrot200-12          4.19ms ± 1%  4.20ms ± 1%  ~       (p=0.452 n=19+19)
GoParse-12                3.38ms ± 1%  3.37ms ± 1%  ~       (p=0.123 n=19+19)
RegexpMatchEasy0_32-12    72.1ns ± 1%  71.8ns ± 1%  ~       (p=0.397 n=19+17)
RegexpMatchEasy0_1K-12    242ns ± 0%   242ns ± 0%   ~       (p=0.168 n=17+20)
RegexpMatchEasy1_32-12    72.1ns ± 1%  72.1ns ± 1%  ~       (p=0.538 n=18+19)
RegexpMatchEasy1_1K-12    385ns ± 1%   384ns ± 1%   ~       (p=0.388 n=20+20)
RegexpMatchMedium_32-12   112ns ± 1%   112ns ± 3%   ~       (p=0.539 n=20+20)
RegexpMatchMedium_1K-12   34.4µs ± 2%  34.4µs ± 2%  ~       (p=0.628 n=18+18)
RegexpMatchHard_32-12     1.80µs ± 1%  1.80µs ± 1%  ~       (p=0.522 n=18+19)
RegexpMatchHard_1K-12     54.0µs ± 1%  54.1µs ± 1%  ~       (p=0.647 n=20+19)
Revcomp-12                387ms ± 1%   369ms ± 5%   -4.89%  (p=0.000 n=17+19)
Template-12               62.3ms ± 1%  62.0ms ± 0%  -0.48%  (p=0.002 n=20+17)
TimeParse-12              314ns ± 1%   314ns ± 0%   ~       (p=1.011 n=20+13)
TimeFormat-12             358ns ± 0%   354ns ± 0%   -1.12%  (p=0.000 n=17+20)
[Geo mean]                53.5µs       53.3µs       -0.23%
Change-Id: I2a0a179d1d6bf7875dd054b7693dd12d2a340132 Reviewed-on: https://go-review.googlesource.com/23540 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2016-09-06 19:27:33 +00:00
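The oblet idea from the scanobject commit can be sketched as follows. The 128 KB size comes from the commit message; the `oblet` type and `splitOblets` function are hypothetical names for illustration and do not reflect how the runtime actually queues oblets. At the quoted scan rate of 1–2 MB/ms, one 128 KB oblet takes about 62.5–125 µs, which matches the "roughly 100 µs" bound.

```go
package main

import "fmt"

// maxObletBytes is the oblet size stated in the commit message.
const maxObletBytes = 128 << 10

// oblet is a hypothetical [start, end) byte range inside one object.
type oblet struct{ start, end uintptr }

// splitOblets divides an object of the given size into ranges of at
// most maxObletBytes, so each range can be scanned as a separate job.
func splitOblets(size uintptr) []oblet {
	var out []oblet
	for off := uintptr(0); off < size; off += maxObletBytes {
		end := off + maxObletBytes
		if end > size {
			end = size
		}
		out = append(out, oblet{off, end})
	}
	return out
}

func main() {
	// A 300 KB object splits into two full oblets and one 44 KB tail.
	for _, o := range splitOblets(300 << 10) {
		fmt.Printf("[%d, %d) %d bytes\n", o.start, o.end, o.end-o.start)
	}
}
```

Because each range is an independent unit of work, different workers could pick up different oblets of the same object, which is the parallelism benefit the message mentions.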
Rick Hudson	1354b32cd7	[dev.garbage] runtime: add gc work buffer tryGet and put fast paths The complexity of the GC work buffers put and tryGet prevented them from being inlined. This CL simplifies the fast path thus enabling inlining. If the fast path does not succeed the previous put and tryGet functions are called. Change-Id: I6da6495d0dadf42bd0377c110b502274cc01acf5 Reviewed-on: https://go-review.googlesource.com/20704 Reviewed-by: Austin Clements <austin@google.com>	2016-04-27 21:55:02 +00:00
Brad Fitzpatrick	5fea2ccc77	all: single space after period. The tree's pretty inconsistent about single space vs double space after a period in documentation. Make it consistently a single space, per earlier decisions. This means contributors won't be confused by misleading precedent. This CL doesn't use go/doc to parse. It only addresses // comments. It was generated with: $ perl -i -npe 's,^(\s// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s//(.+\.) +([A-Z])') $ go test go/doc -update Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7 Reviewed-on: https://go-review.googlesource.com/20022 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dave Day <djd@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-03-02 00:13:47 +00:00
Austin Clements	3b3d58e119	runtime: remove workbuf logging Early in Go 1.5 we had bugs with ownership of workbufs, so we added a system for tracing their ownership to help debug these issues. However, this system has both CPU and space overhead even when disabled, it clutters up the workbuf API, the higher level gcWork abstraction makes it very difficult to mess up the ownership of workbufs in practice, and the tracing hasn't been enabled or needed since `5b66e5d` nine months ago. Hence, remove it. Benchmarks show the usual noise from changes at this level, but little overall movement. name old time/op new time/op delta XBenchGarbage-12 2.48ms ± 1% 2.47ms ± 0% -0.68% (p=0.000 n=21+21) name old time/op new time/op delta BinaryTree17-12 2.98s ± 7% 2.98s ± 6% ~ (p=0.799 n=20+20) Fannkuch11-12 2.61s ± 3% 2.55s ± 5% -2.55% (p=0.003 n=20+20) FmtFprintfEmpty-12 52.8ns ± 6% 53.6ns ± 6% ~ (p=0.228 n=20+20) FmtFprintfString-12 177ns ± 4% 177ns ± 4% ~ (p=0.280 n=20+20) FmtFprintfInt-12 162ns ± 5% 162ns ± 3% ~ (p=0.347 n=20+20) FmtFprintfIntInt-12 277ns ± 7% 273ns ± 4% -1.62% (p=0.005 n=20+20) FmtFprintfPrefixedInt-12 237ns ± 4% 242ns ± 4% +2.13% (p=0.005 n=20+20) FmtFprintfFloat-12 315ns ± 4% 312ns ± 4% -0.97% (p=0.001 n=20+20) FmtManyArgs-12 1.11µs ± 3% 1.15µs ± 4% +3.41% (p=0.004 n=20+20) GobDecode-12 8.50ms ± 7% 8.53ms ± 7% ~ (p=0.429 n=20+20) GobEncode-12 6.86ms ± 9% 6.93ms ± 7% +0.93% (p=0.030 n=20+20) Gzip-12 326ms ± 4% 329ms ± 4% +0.98% (p=0.020 n=20+20) Gunzip-12 43.3ms ± 3% 43.8ms ± 9% +1.25% (p=0.003 n=20+20) HTTPClientServer-12 72.0µs ± 3% 71.5µs ± 3% ~ (p=0.053 n=20+20) JSONEncode-12 17.0ms ± 6% 17.3ms ± 7% +1.32% (p=0.006 n=20+20) JSONDecode-12 64.2ms ± 4% 63.5ms ± 3% -1.05% (p=0.005 n=20+20) Mandelbrot200-12 4.00ms ± 3% 3.99ms ± 3% ~ (p=0.121 n=20+20) GoParse-12 3.74ms ± 5% 3.75ms ± 9% ~ (p=0.383 n=20+20) RegexpMatchEasy0_32-12 104ns ± 4% 104ns ± 6% ~ (p=0.392 n=20+20) RegexpMatchEasy0_1K-12 358ns ± 3% 361ns ± 4% +0.95% (p=0.003 n=20+20) RegexpMatchEasy1_32-12 
86.3ns ± 5% 86.1ns ± 6% ~ (p=0.614 n=20+20) RegexpMatchEasy1_1K-12 523ns ± 4% 518ns ± 3% -1.14% (p=0.008 n=20+20) RegexpMatchMedium_32-12 137ns ± 3% 134ns ± 4% -1.90% (p=0.005 n=20+20) RegexpMatchMedium_1K-12 41.0µs ± 3% 40.6µs ± 4% -1.11% (p=0.004 n=20+20) RegexpMatchHard_32-12 2.13µs ± 4% 2.11µs ± 5% -1.31% (p=0.014 n=20+20) RegexpMatchHard_1K-12 64.1µs ± 3% 63.2µs ± 5% -1.38% (p=0.005 n=20+20) Revcomp-12 555ms ±10% 548ms ± 7% -1.17% (p=0.011 n=20+20) Template-12 84.2ms ± 5% 88.2ms ± 4% +4.73% (p=0.000 n=20+20) TimeParse-12 365ns ± 4% 371ns ± 5% +1.77% (p=0.002 n=20+20) TimeFormat-12 361ns ± 4% 365ns ± 3% +1.08% (p=0.002 n=20+20) [Geo mean] 64.7µs 64.8µs +0.19% Change-Id: Ib043a7a0d18b588b298873d60913d44cd19f3b44 Reviewed-on: https://go-review.googlesource.com/19887 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2016-02-26 15:14:32 +00:00
Austin Clements	0168c2676f	runtime: use only per-P gcWork Currently most uses of gcWork use the per-P gcWork, but there are two places that still use a stack-based gcWork. Simplify things by making these instead use the per-P gcWork. Change-Id: I712d012cce9dd5757c8541824e9641ac1c2a329c Reviewed-on: https://go-review.googlesource.com/19636 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-02-25 23:37:27 +00:00
Austin Clements	98130b39f5	runtime: remove noescape hacks from gcWork When gcWork was first introduced, the compiler's escape analysis wasn't good enough to detect that the method receiver didn't escape, so we had to hack around this. Now that the compiler can figure this out for itself, remove these hacks. Change-Id: I9f73fab721e272410b8b6905b564e7abc03c0dfe Reviewed-on: https://go-review.googlesource.com/19634 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-02-25 23:37:22 +00:00
Martin Möhrmann	fdd0179bb1	all: fix typos and spelling Change-Id: Icd06d99c42b8299fd931c7da821e1f418684d913 Reviewed-on: https://go-review.googlesource.com/19829 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-02-24 18:42:29 +00:00
Michael Matloob	432cb66f16	runtime: break out system-specific constants into package sys runtime/internal/sys will hold system-, architecture- and config- specific constants. Updates #11647 Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54 Reviewed-on: https://go-review.googlesource.com/16817 Reviewed-by: Russ Cox <rsc@golang.org>	2015-11-12 17:04:45 +00:00
Michael Matloob	67faca7d9c	runtime: break atomics out into package runtime/internal/atomic This change breaks out most of the atomics functions in the runtime into package runtime/internal/atomic. It adds some basic support in the toolchain for runtime packages, and also modifies linux/arm atomics to remove the dependency on the runtime's mutex. The mutexes have been replaced with spinlocks. all trybots are happy! In addition to the trybots, I've tested on the darwin/arm64 builder, on the darwin/arm builder, and on a ppc64le machine. Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f Reviewed-on: https://go-review.googlesource.com/14204 Run-TryBot: Michael Matloob <matloob@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-11-10 17:38:04 +00:00
Austin Clements	dcd9e5bc0f	runtime: make putfull start mark workers Currently we depend on the good graces and timing of the scheduler to get opportunities to start dedicated mark workers. In the worst case, it may take 10ms to get dedicated mark workers going at the beginning of mark 1 and mark 2 or after the amount of available work has dropped and gone back up. Instead of waiting for the regular preemption logic to get around to us, make putfull enlist a random P if we're not already running enough dedicated workers. This should improve performance stability of the garbage collector and is likely to improve the overall performance somewhat. No overall effect on the go1 benchmarks. It speeds up the garbage benchmark by 12%, which more than counters the performance loss from the previous commit. name old time/op new time/op delta XBenchGarbage-12 6.32ms ± 4% 5.58ms ± 2% -11.68% (p=0.000 n=20+16) name old time/op new time/op delta BinaryTree17-12 3.18s ± 5% 3.12s ± 4% -1.83% (p=0.021 n=20+20) Fannkuch11-12 2.50s ± 2% 2.46s ± 2% -1.57% (p=0.000 n=18+19) FmtFprintfEmpty-12 50.8ns ± 3% 50.4ns ± 3% ~ (p=0.184 n=20+20) FmtFprintfString-12 167ns ± 2% 171ns ± 1% +2.46% (p=0.000 n=20+19) FmtFprintfInt-12 161ns ± 2% 163ns ± 2% +1.81% (p=0.000 n=20+20) FmtFprintfIntInt-12 269ns ± 1% 266ns ± 1% -0.81% (p=0.002 n=19+20) FmtFprintfPrefixedInt-12 237ns ± 2% 231ns ± 2% -2.86% (p=0.000 n=20+20) FmtFprintfFloat-12 313ns ± 2% 313ns ± 1% ~ (p=0.681 n=20+20) FmtManyArgs-12 1.05µs ± 2% 1.03µs ± 1% -2.26% (p=0.000 n=20+20) GobDecode-12 8.66ms ± 1% 8.67ms ± 1% ~ (p=0.380 n=19+20) GobEncode-12 6.56ms ± 1% 6.56ms ± 2% ~ (p=0.607 n=19+20) Gzip-12 317ms ± 1% 314ms ± 2% -1.10% (p=0.000 n=20+19) Gunzip-12 42.1ms ± 1% 42.2ms ± 1% +0.27% (p=0.044 n=20+19) HTTPClientServer-12 62.7µs ± 1% 62.0µs ± 1% -1.04% (p=0.000 n=19+18) JSONEncode-12 16.7ms ± 1% 16.8ms ± 2% +0.59% (p=0.021 n=20+20) JSONDecode-12 58.2ms ± 1% 61.4ms ± 2% +5.43% (p=0.000 n=18+19) Mandelbrot200-12 3.84ms ± 1% 3.87ms ± 2% +0.79% 
(p=0.008 n=18+20) GoParse-12 3.86ms ± 2% 3.76ms ± 2% -2.60% (p=0.000 n=20+20) RegexpMatchEasy0_32-12 100ns ± 2% 100ns ± 1% -0.68% (p=0.005 n=18+15) RegexpMatchEasy0_1K-12 332ns ± 1% 342ns ± 1% +3.16% (p=0.000 n=19+19) RegexpMatchEasy1_32-12 82.9ns ± 3% 83.0ns ± 2% ~ (p=0.906 n=19+20) RegexpMatchEasy1_1K-12 487ns ± 1% 494ns ± 1% +1.50% (p=0.000 n=17+20) RegexpMatchMedium_32-12 131ns ± 2% 130ns ± 1% ~ (p=0.686 n=19+20) RegexpMatchMedium_1K-12 39.6µs ± 1% 39.2µs ± 1% -1.09% (p=0.000 n=18+19) RegexpMatchHard_32-12 2.04µs ± 1% 2.04µs ± 2% ~ (p=0.804 n=20+20) RegexpMatchHard_1K-12 61.7µs ± 2% 61.3µs ± 2% ~ (p=0.052 n=18+20) Revcomp-12 529ms ± 2% 533ms ± 1% +0.83% (p=0.003 n=20+19) Template-12 70.7ms ± 2% 71.0ms ± 2% ~ (p=0.065 n=20+19) TimeParse-12 351ns ± 2% 355ns ± 1% +1.25% (p=0.000 n=19+20) TimeFormat-12 362ns ± 2% 373ns ± 1% +2.83% (p=0.000 n=18+20) [Geo mean] 62.2µs 62.3µs +0.13% name old speed new speed delta GobDecode-12 88.6MB/s ± 1% 88.5MB/s ± 1% ~ (p=0.392 n=19+20) GobEncode-12 117MB/s ± 1% 117MB/s ± 1% ~ (p=0.622 n=19+20) Gzip-12 61.1MB/s ± 1% 61.8MB/s ± 2% +1.11% (p=0.000 n=20+19) Gunzip-12 461MB/s ± 1% 460MB/s ± 1% -0.27% (p=0.044 n=20+19) JSONEncode-12 116MB/s ± 1% 115MB/s ± 2% -0.58% (p=0.022 n=20+20) JSONDecode-12 33.3MB/s ± 1% 31.6MB/s ± 2% -5.15% (p=0.000 n=18+19) GoParse-12 15.0MB/s ± 2% 15.4MB/s ± 2% +2.66% (p=0.000 n=20+20) RegexpMatchEasy0_32-12 317MB/s ± 2% 319MB/s ± 2% ~ (p=0.052 n=20+20) RegexpMatchEasy0_1K-12 3.08GB/s ± 1% 2.99GB/s ± 1% -3.07% (p=0.000 n=19+19) RegexpMatchEasy1_32-12 386MB/s ± 3% 386MB/s ± 2% ~ (p=0.939 n=19+20) RegexpMatchEasy1_1K-12 2.10GB/s ± 1% 2.07GB/s ± 1% -1.46% (p=0.000 n=17+20) RegexpMatchMedium_32-12 7.62MB/s ± 2% 7.64MB/s ± 1% ~ (p=0.702 n=19+20) RegexpMatchMedium_1K-12 25.9MB/s ± 1% 26.1MB/s ± 2% +0.99% (p=0.000 n=18+20) RegexpMatchHard_32-12 15.7MB/s ± 1% 15.7MB/s ± 2% ~ (p=0.723 n=20+20) RegexpMatchHard_1K-12 16.6MB/s ± 2% 16.7MB/s ± 2% ~ (p=0.052 n=18+20) Revcomp-12 481MB/s ± 2% 477MB/s ± 1% -0.83% (p=0.003 
n=20+19) Template-12 27.5MB/s ± 2% 27.3MB/s ± 2% ~ (p=0.062 n=20+19) [Geo mean] 99.4MB/s 99.1MB/s -0.35% Change-Id: I914d8cadded5a230509d118164a4c201601afc06 Reviewed-on: https://go-review.googlesource.com/16298 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-11-04 20:15:51 +00:00
Austin Clements	b6c0934a9b	runtime: cache two workbufs to reduce contention Currently the gcWork abstraction caches a single work buffer. As a result, if a worker is putting and getting pointers right at the boundary of a work buffer, it can flap between work buffers and (potentially significantly) increase contention on the global work buffer lists. This change modifies gcWork to instead cache two work buffers and switch off between them. This introduces one buffer's worth of hysteresis and eliminates the above performance worst case by amortizing the cost of getting or putting a work buffer over at least one buffer's worth of work. In practice, it's difficult to trigger this worst case with reasonably large work buffers. On the garbage benchmark, this reduces the max writes/sec to the global work list from 32K to 25K and the median from 6K to 5K. However, if a workload were to trigger this worst case behavior, it could significantly drive up this contention. This has negligible effects on the go1 benchmarks and slightly speeds up the garbage benchmark. 
name old time/op new time/op delta XBenchGarbage-12 5.90ms ± 3% 5.83ms ± 4% -1.18% (p=0.011 n=18+18) name old time/op new time/op delta BinaryTree17-12 3.22s ± 4% 3.17s ± 3% -1.57% (p=0.009 n=19+20) Fannkuch11-12 2.44s ± 1% 2.53s ± 4% +3.78% (p=0.000 n=18+19) FmtFprintfEmpty-12 50.2ns ± 2% 50.5ns ± 5% ~ (p=0.631 n=19+20) FmtFprintfString-12 167ns ± 1% 166ns ± 1% ~ (p=0.141 n=20+20) FmtFprintfInt-12 162ns ± 1% 159ns ± 1% -1.80% (p=0.000 n=20+20) FmtFprintfIntInt-12 277ns ± 2% 263ns ± 1% -4.78% (p=0.000 n=20+18) FmtFprintfPrefixedInt-12 240ns ± 1% 232ns ± 2% -3.25% (p=0.000 n=20+20) FmtFprintfFloat-12 311ns ± 1% 315ns ± 2% +1.17% (p=0.000 n=20+20) FmtManyArgs-12 1.05µs ± 2% 1.03µs ± 2% -1.72% (p=0.000 n=20+20) GobDecode-12 8.65ms ± 1% 8.71ms ± 2% +0.68% (p=0.001 n=19+20) GobEncode-12 6.51ms ± 1% 6.54ms ± 1% +0.42% (p=0.047 n=20+19) Gzip-12 318ms ± 2% 315ms ± 2% -1.20% (p=0.000 n=19+19) Gunzip-12 42.2ms ± 2% 42.1ms ± 1% ~ (p=0.667 n=20+19) HTTPClientServer-12 62.5µs ± 1% 62.4µs ± 1% ~ (p=0.110 n=20+18) JSONEncode-12 16.8ms ± 1% 16.8ms ± 2% ~ (p=0.569 n=19+20) JSONDecode-12 60.8ms ± 2% 59.8ms ± 1% -1.69% (p=0.000 n=19+19) Mandelbrot200-12 3.87ms ± 1% 3.85ms ± 0% -0.61% (p=0.001 n=20+17) GoParse-12 3.76ms ± 2% 3.76ms ± 1% ~ (p=0.698 n=20+20) RegexpMatchEasy0_32-12 100ns ± 2% 101ns ± 2% ~ (p=0.065 n=19+20) RegexpMatchEasy0_1K-12 342ns ± 2% 333ns ± 1% -2.82% (p=0.000 n=20+19) RegexpMatchEasy1_32-12 83.3ns ± 2% 83.2ns ± 2% ~ (p=0.692 n=20+19) RegexpMatchEasy1_1K-12 498ns ± 2% 490ns ± 1% -1.52% (p=0.000 n=18+20) RegexpMatchMedium_32-12 131ns ± 2% 131ns ± 2% ~ (p=0.464 n=20+18) RegexpMatchMedium_1K-12 39.3µs ± 2% 39.6µs ± 1% +0.77% (p=0.000 n=18+19) RegexpMatchHard_32-12 2.04µs ± 2% 2.06µs ± 1% +0.69% (p=0.009 n=19+20) RegexpMatchHard_1K-12 61.4µs ± 2% 62.1µs ± 1% +1.21% (p=0.000 n=19+20) Revcomp-12 534ms ± 1% 529ms ± 1% -0.97% (p=0.000 n=19+16) Template-12 70.4ms ± 2% 70.0ms ± 1% ~ (p=0.070 n=19+19) TimeParse-12 359ns ± 3% 344ns ± 1% -4.15% (p=0.000 n=19+19) TimeFormat-12 
357ns ± 1% 361ns ± 2% +1.05% (p=0.002 n=20+20) [Geo mean] 62.4µs 62.0µs -0.56% name old speed new speed delta GobDecode-12 88.7MB/s ± 1% 88.1MB/s ± 2% -0.68% (p=0.001 n=19+20) GobEncode-12 118MB/s ± 1% 117MB/s ± 1% -0.42% (p=0.046 n=20+19) Gzip-12 60.9MB/s ± 2% 61.7MB/s ± 2% +1.21% (p=0.000 n=19+19) Gunzip-12 460MB/s ± 2% 461MB/s ± 1% ~ (p=0.661 n=20+19) JSONEncode-12 116MB/s ± 1% 115MB/s ± 2% ~ (p=0.555 n=19+20) JSONDecode-12 31.9MB/s ± 2% 32.5MB/s ± 1% +1.72% (p=0.000 n=19+19) GoParse-12 15.4MB/s ± 2% 15.4MB/s ± 1% ~ (p=0.653 n=20+20) RegexpMatchEasy0_32-12 317MB/s ± 2% 315MB/s ± 2% ~ (p=0.141 n=19+20) RegexpMatchEasy0_1K-12 2.99GB/s ± 2% 3.07GB/s ± 1% +2.86% (p=0.000 n=20+19) RegexpMatchEasy1_32-12 384MB/s ± 2% 385MB/s ± 2% ~ (p=0.672 n=20+19) RegexpMatchEasy1_1K-12 2.06GB/s ± 2% 2.09GB/s ± 1% +1.54% (p=0.000 n=18+20) RegexpMatchMedium_32-12 7.62MB/s ± 2% 7.63MB/s ± 2% ~ (p=0.800 n=20+18) RegexpMatchMedium_1K-12 26.0MB/s ± 1% 25.8MB/s ± 1% -0.77% (p=0.000 n=18+19) RegexpMatchHard_32-12 15.7MB/s ± 2% 15.6MB/s ± 1% -0.69% (p=0.010 n=19+20) RegexpMatchHard_1K-12 16.7MB/s ± 2% 16.5MB/s ± 1% -1.19% (p=0.000 n=19+20) Revcomp-12 476MB/s ± 1% 481MB/s ± 1% +0.97% (p=0.000 n=19+16) Template-12 27.6MB/s ± 2% 27.7MB/s ± 1% ~ (p=0.071 n=19+19) [Geo mean] 99.1MB/s 99.3MB/s +0.27% Change-Id: I68bcbf74ccb716cd5e844a554f67b679135105e6 Reviewed-on: https://go-review.googlesource.com/16042 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-11-03 19:12:10 +00:00
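The hysteresis described in the two-workbuf commit can be sketched as below. The field names `wbuf1` and `wbuf2`, the tiny `bufCap`, and the slice-backed `global` list with an access counter are all assumptions made for illustration; the runtime's buffers are fixed-size and its global lists are lock-free. The point of the sketch is that a worker alternating put and get right at a buffer boundary swaps between its two cached buffers instead of touching the global list.

```go
package main

import "fmt"

const bufCap = 4 // deliberately tiny so the boundary is easy to reach

type workbuf struct{ objs []uintptr }

// global stands in for the shared full-buffer list; accesses counts
// how often a worker had to touch it.
type global struct {
	full     []*workbuf
	accesses int
}

func (g *global) putFull(b *workbuf) {
	g.accesses++
	g.full = append(g.full, b)
}

func (g *global) tryGetFull() *workbuf {
	g.accesses++
	if len(g.full) == 0 {
		return nil
	}
	b := g.full[len(g.full)-1]
	g.full = g.full[:len(g.full)-1]
	return b
}

// gcWork caches two buffers: wbuf1 is the active one, wbuf2 the spare.
type gcWork struct {
	wbuf1, wbuf2 *workbuf
	g            *global
}

func newGCWork(g *global) *gcWork {
	return &gcWork{wbuf1: &workbuf{}, wbuf2: &workbuf{}, g: g}
}

// put swaps to the spare when the active buffer is full, and only goes
// to the global list when both cached buffers are full.
func (w *gcWork) put(p uintptr) {
	if len(w.wbuf1.objs) == bufCap {
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		if len(w.wbuf1.objs) == bufCap {
			w.g.putFull(w.wbuf1)
			w.wbuf1 = &workbuf{}
		}
	}
	w.wbuf1.objs = append(w.wbuf1.objs, p)
}

// tryGet swaps to the spare when the active buffer is empty, and only
// goes to the global list when both cached buffers are empty.
func (w *gcWork) tryGet() (uintptr, bool) {
	if len(w.wbuf1.objs) == 0 {
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		if len(w.wbuf1.objs) == 0 {
			b := w.g.tryGetFull()
			if b == nil {
				return 0, false
			}
			w.wbuf1 = b
		}
	}
	n := len(w.wbuf1.objs) - 1
	p := w.wbuf1.objs[n]
	w.wbuf1.objs = w.wbuf1.objs[:n]
	return p, true
}

func main() {
	g := &global{}
	w := newGCWork(g)
	for p := uintptr(1); p <= bufCap; p++ {
		w.put(p) // fill the active buffer exactly to its boundary
	}
	for i := 0; i < 100; i++ {
		w.put(99)
		w.tryGet()
	}
	fmt.Println("global accesses:", g.accesses)
}
```

With a single cached buffer, every put at the boundary in that loop would push a full buffer to the global list and the following get would have to fetch one back.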
Austin Clements	1870572180	runtime: enlarge GC work buffer size Currently the GC work buffers are only 256 bytes and hence can record only 24 64-bit pointers. They were reduced from 4K in commits `db7fd1c` and `a15818f` as a way to minimize the amount of work the per-P workbuf caches could "hide" from the mark phase and carry into the mark termination phase. However, this approach wasn't very robust and we later added a "mark 2" phase to address this problem head-on. Because of mark 2, there's now no benefit to having very small work buffers. But there are plenty of downsides: small work buffers increase contention on the work lists, increase the frequency and hence net overhead of acquiring and releasing work buffers, and somewhat increase memory overhead of the GC. This commit expands work buffers back to 4K (504 64-bit pointers). This reduces the rate of writes to work.full in the garbage benchmark from a peak of ~780,000 writes/sec to a peak of ~32,000 writes/sec. This has negligible effect on the go1 benchmarks. It slightly slows down the garbage benchmark.
name              old time/op  new time/op  delta
XBenchGarbage-12  5.37ms ± 5%  5.60ms ± 2%  +4.37%  (p=0.000 n=20+20)
Change-Id: Ic9cc28e7a125d23d9faf4f5e690fb8aa9bcdfb28 Reviewed-on: https://go-review.googlesource.com/15893 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-11-03 15:53:38 +00:00
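The buffer sizes quoted in this commit can be checked with a little arithmetic. Both pairs of figures (256 bytes holding 24 pointers, 4K holding 504 pointers) leave the same number of bytes unaccounted for, which suggests a fixed per-buffer overhead; treating that remainder as a header is an inference from the numbers, not something the message states. The helper names below are hypothetical.

```go
package main

import "fmt"

const ptrSize = 8 // bytes per 64-bit pointer

// overheadBytes returns the bytes of a buffer not used for pointer slots.
func overheadBytes(bufBytes, nPtrs int) int {
	return bufBytes - nPtrs*ptrSize
}

// capacity returns how many pointers fit after the given overhead.
func capacity(bufBytes, overhead int) int {
	return (bufBytes - overhead) / ptrSize
}

func main() {
	small := overheadBytes(256, 24)
	large := overheadBytes(4096, 504)
	fmt.Println("overhead:", small, large)

	// Capacity grows 21x, in the same range as the ~24x drop in
	// peak writes/sec to work.full quoted in the message.
	fmt.Println("capacity ratio:", capacity(4096, large)/capacity(256, small))
	fmt.Printf("write-rate ratio: %.1f\n", 780000.0/32000.0)
}
```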
Austin Clements	82d14d77da	runtime: perform concurrent scan in GC workers Currently the concurrent root scan is performed in its entirety by the GC coordinator before entering concurrent mark (which enables GC workers). This scan is done sequentially, which can prolong the scan phase and delay the mark phase, and means that the scan phase does not obey the 25% CPU goal. Furthermore, there's no need to complete the root scan before starting marking (in fact, we already allow GC assists to happen during the scan phase), so this acts as an unnecessary barrier between root scanning and marking. This change shifts the root scan work out of the GC coordinator and into the GC workers. The coordinator simply sets up the scan state and enqueues the right number of root scan jobs. The GC workers then drain the root scan jobs prior to draining heap scan jobs. This parallelizes the root scan process, makes it obey the 25% CPU goal, and effectively eliminates root scanning as an isolated phase, allowing the system to smoothly transition from root scanning to heap marking. This also eliminates a major non-STW responsibility of the GC coordinator, which will make it easier to switch to a decentralized state machine. Finally, it puts us in a good position to perform root scanning in assists as well, which will help satisfy assists at the beginning of the GC cycle. This is mostly straightforward. One tricky aspect is that we have to deal with preemption deadlock: where two non-preemptible goroutines are trying to preempt each other to perform a stack scan. Given the context where this happens, the only instance of this is two background workers trying to scan each other. We avoid this by simply not scanning the stacks of background workers during the concurrent phase; this is safe because we'll scan them during mark termination (and their stacks are very small and should not contain any new pointers). 
This change also switches the root marking during mark termination to use the same gcDrain-based code path as concurrent mark. This shouldn't affect performance because STW root marking was already parallel and tasks switched to heap marking immediately when no more root marking tasks were available. However, it simplifies the code and unifies these code paths. This has negligible effect on the go1 benchmarks. It slightly slows down the garbage benchmark, possibly by making GC run slightly more frequently. name old time/op new time/op delta XBenchGarbage-12 5.10ms ± 1% 5.24ms ± 1% +2.87% (p=0.000 n=18+18) name old time/op new time/op delta BinaryTree17-12 3.25s ± 3% 3.20s ± 5% -1.57% (p=0.013 n=20+20) Fannkuch11-12 2.45s ± 1% 2.46s ± 1% +0.38% (p=0.019 n=20+18) FmtFprintfEmpty-12 49.7ns ± 3% 49.9ns ± 4% ~ (p=0.851 n=19+20) FmtFprintfString-12 170ns ± 2% 170ns ± 1% ~ (p=0.775 n=20+19) FmtFprintfInt-12 161ns ± 1% 160ns ± 1% -0.78% (p=0.000 n=19+18) FmtFprintfIntInt-12 267ns ± 1% 270ns ± 1% +1.04% (p=0.000 n=19+19) FmtFprintfPrefixedInt-12 238ns ± 2% 238ns ± 1% ~ (p=0.133 n=18+19) FmtFprintfFloat-12 311ns ± 1% 310ns ± 2% -0.35% (p=0.023 n=20+19) FmtManyArgs-12 1.08µs ± 1% 1.06µs ± 1% -2.31% (p=0.000 n=20+20) GobDecode-12 8.65ms ± 1% 8.63ms ± 1% ~ (p=0.377 n=18+20) GobEncode-12 6.49ms ± 1% 6.52ms ± 1% +0.37% (p=0.015 n=20+20) Gzip-12 319ms ± 3% 318ms ± 1% ~ (p=0.975 n=19+17) Gunzip-12 41.9ms ± 1% 42.1ms ± 2% +0.65% (p=0.004 n=19+20) HTTPClientServer-12 61.7µs ± 1% 62.6µs ± 1% +1.40% (p=0.000 n=18+20) JSONEncode-12 16.8ms ± 1% 16.9ms ± 1% ~ (p=0.239 n=20+18) JSONDecode-12 58.4ms ± 1% 60.7ms ± 1% +3.85% (p=0.000 n=19+20) Mandelbrot200-12 3.86ms ± 0% 3.86ms ± 1% ~ (p=0.092 n=18+19) GoParse-12 3.75ms ± 2% 3.75ms ± 2% ~ (p=0.708 n=19+20) RegexpMatchEasy0_32-12 100ns ± 1% 100ns ± 2% +0.60% (p=0.010 n=17+20) RegexpMatchEasy0_1K-12 341ns ± 1% 342ns ± 2% ~ (p=0.203 n=20+19) RegexpMatchEasy1_32-12 82.5ns ± 2% 83.2ns ± 2% +0.83% (p=0.007 n=19+19) RegexpMatchEasy1_1K-12 495ns ± 1% 
495ns ± 2% ~ (p=0.970 n=19+18) RegexpMatchMedium_32-12 130ns ± 2% 130ns ± 2% +0.59% (p=0.039 n=19+20) RegexpMatchMedium_1K-12 39.2µs ± 1% 39.3µs ± 1% ~ (p=0.214 n=18+18) RegexpMatchHard_32-12 2.03µs ± 2% 2.02µs ± 1% ~ (p=0.166 n=18+19) RegexpMatchHard_1K-12 61.0µs ± 1% 60.9µs ± 1% ~ (p=0.169 n=20+18) Revcomp-12 533ms ± 1% 535ms ± 1% ~ (p=0.071 n=19+17) Template-12 68.1ms ± 2% 73.0ms ± 1% +7.26% (p=0.000 n=19+20) TimeParse-12 355ns ± 2% 356ns ± 2% ~ (p=0.530 n=19+20) TimeFormat-12 357ns ± 2% 347ns ± 1% -2.59% (p=0.000 n=20+19) [Geo mean] 62.1µs 62.3µs +0.31% name old speed new speed delta GobDecode-12 88.7MB/s ± 1% 88.9MB/s ± 1% ~ (p=0.377 n=18+20) GobEncode-12 118MB/s ± 1% 118MB/s ± 1% -0.37% (p=0.015 n=20+20) Gzip-12 60.9MB/s ± 3% 60.9MB/s ± 1% ~ (p=0.944 n=19+17) Gunzip-12 464MB/s ± 1% 461MB/s ± 2% -0.64% (p=0.004 n=19+20) JSONEncode-12 115MB/s ± 1% 115MB/s ± 1% ~ (p=0.236 n=20+18) JSONDecode-12 33.2MB/s ± 1% 32.0MB/s ± 1% -3.71% (p=0.000 n=19+20) GoParse-12 15.5MB/s ± 2% 15.5MB/s ± 2% ~ (p=0.702 n=19+20) RegexpMatchEasy0_32-12 320MB/s ± 1% 318MB/s ± 2% ~ (p=0.094 n=18+20) RegexpMatchEasy0_1K-12 3.00GB/s ± 1% 2.99GB/s ± 1% ~ (p=0.194 n=20+19) RegexpMatchEasy1_32-12 388MB/s ± 2% 385MB/s ± 2% -0.83% (p=0.008 n=19+19) RegexpMatchEasy1_1K-12 2.07GB/s ± 1% 2.07GB/s ± 1% ~ (p=0.964 n=19+18) RegexpMatchMedium_32-12 7.68MB/s ± 1% 7.64MB/s ± 2% -0.57% (p=0.020 n=19+20) RegexpMatchMedium_1K-12 26.1MB/s ± 1% 26.1MB/s ± 1% ~ (p=0.211 n=18+18) RegexpMatchHard_32-12 15.8MB/s ± 1% 15.8MB/s ± 1% ~ (p=0.180 n=18+19) RegexpMatchHard_1K-12 16.8MB/s ± 1% 16.8MB/s ± 2% ~ (p=0.236 n=20+19) Revcomp-12 477MB/s ± 1% 475MB/s ± 1% ~ (p=0.071 n=19+17) Template-12 28.5MB/s ± 2% 26.6MB/s ± 1% -6.77% (p=0.000 n=19+20) [Geo mean] 100MB/s 99.0MB/s -0.82% Change-Id: I875bf6ceb306d1ee2f470cabf88aa6ede27c47a0 Reviewed-on: https://go-review.googlesource.com/16059 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	
2015-10-30 22:46:31 +00:00
Austin Clements	feb92a8e8c	runtime: remove work.partial queue This work queue is no longer used (there are many reads of work.partial, but the only write is in putpartial, which is never called). Fixes #11922. Change-Id: I08b76c0c02a0867a9cdcb94783e1f7629d44249a Reviewed-on: https://go-review.googlesource.com/15892 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-10-19 18:37:54 +00:00
Austin Clements	8e8219deb5	runtime: update gcController.scanWork regularly Currently, gcController.scanWork is updated as lazily as possible since it is only read at the end of the GC cycle. We're about to read it during the GC cycle to improve the assist ratio revisions, so modify gcDrain* to regularly flush to gcController.scanWork in much the same way as we regularly flush to gcController.bgScanCredit. One consequence of this is that it's difficult to keep gcw.scanWork monotonic, so we give up on that and simply return the amount of scan work done by gcDrainN rather than calculating it in the caller. Change-Id: I7b50acdc39602f843eed0b5c6d2dacd7e762b81d Reviewed-on: https://go-review.googlesource.com/15407 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-10-09 19:38:29 +00:00
Austin Clements	1b84bb8c7c	runtime: fix out-of-date comment on gcWork usage Change-Id: I3c21ffa80a5c14911e07238b1f64bec686ed7b72 Reviewed-on: https://go-review.googlesource.com/14980 Reviewed-by: Minux Ma <minux@golang.org>	2015-10-02 19:55:34 +00:00
Rick Hudson	e95bc5fef7	runtime: force mutator to give work buffer to GC The scheduler, work buffer's dispose, and write barriers can conspire to hide a pointer from the GC's concurrent mark phase. If this pointer is the only path to a large amount of marking, the STW mark termination phase may take a lot of time. Consider the following: 1) dispose places a work buffer on the partial queue 2) the GC is busy so it does not immediately remove and process the work buffer 3) the scheduler runs a mutator whose write barrier dequeues the work buffer from the partial queue so the GC won't see it This repeats until the GC reaches the mark termination phase where the GC finally discovers the pointer along with a lot of work to do. This CL fixes the problem by having the mutator dispose of the buffer to the full queue instead of the partial queue. Since the write barrier never asks for full buffers the conspiracy described above is not possible. Updates #11694. Change-Id: I2ce832f9657a7570f800e8ce4459cd9e304ef43b Reviewed-on: https://go-review.googlesource.com/12840 Reviewed-by: Austin Clements <austin@google.com>	2015-07-29 18:56:11 +00:00
Rick Hudson	90a19961f2	runtime: reduce latency by aggressively ending mark phase Some latency regressions have crept into our system over the past few weeks. This CL fixes those by having the mark phase more aggressively blacken objects so that the mark termination phase, a STW phase, has less work to do. Three approaches were taken when the mark phase believes it has no more work to do, i.e., all the work buffers are empty. If things have gone well the mark phase is correct and there is in fact little or no work. In that case the following items will take very little time. If the mark phase is wrong this CL will ferret that work out and give the mark phase a chance to deal with it concurrently before mark termination begins. When the mark phase first appears to be out of work, it does three things: 1) It switches from allocating white to allocating black to reduce the number of unmarked objects reachable only from stacks. 2) It flushes and disables per-P GC work caches so all work must be in globally visible work buffers. 3) It rescans the global roots---the BSS and data segments---so there are fewer objects to blacken during mark termination. We do not rescan stacks at this point, though that could be done in a later CL. After these steps, it again drains the global work buffers. On a lightly loaded machine the garbage benchmark has reduced the number of GC cycles with latency > 10 ms from 83 out of 4083 cycles down to 2 out of 3995 cycles. Maximum latency was reduced from 60+ ms down to 20 ms. Change-Id: I152285b48a7e56c5083a02e8e4485dd39c990492 Reviewed-on: https://go-review.googlesource.com/10590 Reviewed-by: Austin Clements <austin@google.com>	2015-06-18 21:38:46 +00:00
Ainar Garipov	7f9f70e5b6	all: fix misprints in comments These were found by grepping the comments from the go code and feeding the output to aspell. Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395 Reviewed-on: https://go-review.googlesource.com/10941 Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-11 14:18:57 +00:00
Rick Hudson	5b66e5d0d8	runtime: turn work buffer tracing off by default During development we ran with monitoring code turned on by default. This CL turns the work buffer monitoring off. Performance change on most go1 benchmarks is small or insignificant. name old mean new mean delta BinaryTree17 3.35s × (0.99,1.01) 3.35s × (0.99,1.01) ~ (p=0.841 n=5+5) Fannkuch11 2.59s × (1.00,1.01) 2.55s × (1.00,1.00) -1.65% (p=0.008 n=5+5) FmtFprintfEmpty 52.5ns × (0.99,1.02) 53.2ns × (0.98,1.01) ~ (p=0.063 n=5+5) FmtFprintfString 181ns × (1.00,1.00) 180ns × (1.00,1.00) -0.55% (p=0.029 n=4+4) FmtFprintfInt 176ns × (1.00,1.01) 174ns × (1.00,1.00) -0.91% (p=0.000 n=5+4) FmtFprintfIntInt 298ns × (1.00,1.00) 299ns × (1.00,1.00) ~ (p=0.143 n=4+4) FmtFprintfPrefixedInt 250ns × (1.00,1.01) 246ns × (1.00,1.00) -1.68% (p=0.000 n=5+4) FmtFprintfFloat 340ns × (1.00,1.00) 340ns × (1.00,1.01) ~ (p=0.643 n=5+5) FmtManyArgs 1.16µs × (1.00,1.00) 1.15µs × (1.00,1.00) -0.47% (p=0.016 n=5+5) GobDecode 9.22ms × (1.00,1.00) 9.23ms × (1.00,1.00) ~ (p=0.841 n=5+5) GobEncode 7.00ms × (1.00,1.01) 7.09ms × (0.99,1.01) +1.26% (p=0.016 n=5+5) Gzip 387ms × (1.00,1.00) 389ms × (0.99,1.02) ~ (p=1.000 n=5+5) Gunzip 97.8ms × (1.00,1.00) 98.3ms × (1.00,1.00) +0.51% (p=0.016 n=5+4) HTTPClientServer 52.6µs × (1.00,1.01) 52.7µs × (1.00,1.01) ~ (p=1.000 n=5+5) JSONEncode 18.0ms × (0.99,1.02) 17.9ms × (1.00,1.00) ~ (p=0.310 n=5+5) JSONDecode 64.8ms × (0.99,1.02) 63.6ms × (1.00,1.00) -1.94% (p=0.008 n=5+5) Mandelbrot200 4.05ms × (1.00,1.00) 4.05ms × (1.00,1.00) ~ (p=0.421 n=5+5) GoParse 3.86ms × (1.00,1.01) 3.84ms × (0.99,1.01) ~ (p=0.421 n=5+5) RegexpMatchEasy0_32 101ns × (1.00,1.00) 102ns × (0.99,1.02) ~ (p=0.238 n=4+5) RegexpMatchEasy0_1K 346ns × (1.00,1.01) 345ns × (1.00,1.00) ~ (p=0.333 n=5+4) RegexpMatchEasy1_32 87.3ns × (0.99,1.02) 87.4ns × (1.00,1.00) ~ (p=0.190 n=5+4) RegexpMatchEasy1_1K 520ns × (1.00,1.00) 520ns × (1.00,1.01) ~ (p=1.000 n=4+5) RegexpMatchMedium_32 143ns × (1.00,1.00) 142ns × (1.00,1.00) 
-0.70% (p=0.029 n=4+4) RegexpMatchMedium_1K 43.2µs × (1.00,1.01) 43.2µs × (1.00,1.00) ~ (p=0.841 n=5+5) RegexpMatchHard_32 2.24µs × (1.00,1.01) 2.23µs × (1.00,1.01) -0.63% (p=0.048 n=5+5) RegexpMatchHard_1K 68.7µs × (1.00,1.00) 68.3µs × (1.00,1.00) -0.56% (p=0.008 n=5+5) Revcomp 577ms × (1.00,1.01) 579ms × (1.00,1.00) ~ (p=0.151 n=5+5) Template 74.9ms × (1.00,1.00) 76.5ms × (1.00,1.00) +2.11% (p=0.008 n=5+5) TimeParse 359ns × (1.00,1.00) 362ns × (1.00,1.00) +0.72% (p=0.008 n=5+5) TimeFormat 369ns × (1.00,1.00) 371ns × (1.00,1.01) ~ (p=0.071 n=5+5) Change-Id: I4206a3f77a3d1450966b7a62ea7597aec44cb72f Reviewed-on: https://go-review.googlesource.com/10294 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-05-21 16:09:24 +00:00
Rick Hudson	913db7685e	runtime: run background mark helpers only if work is available Prior to this CL whenever the GC marking was enabled and a P was looking for work we supplied a G to help the GC do its marking tasks. Once this G finished all the marking available it would release the P to find another available G. In the case where there was no work the P would drop into findrunnable which would execute the mark helper G which would immediately return and the P would drop into findrunnable again repeating the process. Since the P was always given a G to run it never blocks. This CL first checks if the GC mark helper G has available work and if not the P immediately falls through to its blocking logic. Fixes #10901 Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5 Reviewed-on: https://go-review.googlesource.com/10189 Reviewed-by: Austin Clements <austin@google.com>	2015-05-19 15:57:50 +00:00
Austin Clements	63caec5dee	runtime: eliminate one heapBitsForObject from scanobject scanobject with ptrmask!=nil is only ever called with the base pointer of a heap object. Currently, scanobject calls heapBitsForObject, which goes to a great deal of trouble to check that the pointer points into the heap and to find the base of the object it points to, both of which are completely unnecessary in this case. Replace this call to heapBitsForObject with much simpler logic to fetch the span and compute the heap bits. Benchmark results with five runs: name old mean new mean delta BenchmarkBinaryTree17 9.21s × (0.95,1.02) 8.55s × (0.91,1.03) -7.16% (p=0.022) BenchmarkFannkuch11 2.65s × (1.00,1.00) 2.62s × (1.00,1.00) -1.10% (p=0.000) BenchmarkFmtFprintfEmpty 73.2ns × (0.99,1.01) 71.7ns × (1.00,1.01) -1.99% (p=0.004) BenchmarkFmtFprintfString 302ns × (0.99,1.00) 292ns × (0.98,1.02) -3.31% (p=0.020) BenchmarkFmtFprintfInt 281ns × (0.98,1.01) 279ns × (0.96,1.02) ~ (p=0.596) BenchmarkFmtFprintfIntInt 482ns × (0.98,1.01) 488ns × (0.95,1.02) ~ (p=0.419) BenchmarkFmtFprintfPrefixedInt 382ns × (0.99,1.01) 365ns × (0.96,1.02) -4.35% (p=0.015) BenchmarkFmtFprintfFloat 475ns × (0.99,1.01) 472ns × (1.00,1.00) ~ (p=0.108) BenchmarkFmtManyArgs 1.89µs × (1.00,1.01) 1.90µs × (0.94,1.02) ~ (p=0.883) BenchmarkGobDecode 22.4ms × (0.99,1.01) 21.9ms × (0.92,1.04) ~ (p=0.332) BenchmarkGobEncode 24.7ms × (0.98,1.02) 23.9ms × (0.87,1.07) ~ (p=0.407) BenchmarkGzip 397ms × (0.99,1.01) 398ms × (0.99,1.01) ~ (p=0.718) BenchmarkGunzip 96.7ms × (1.00,1.00) 96.9ms × (1.00,1.00) ~ (p=0.230) BenchmarkHTTPClientServer 71.5µs × (0.98,1.01) 68.5µs × (0.92,1.06) ~ (p=0.243) BenchmarkJSONEncode 46.1ms × (0.98,1.01) 44.9ms × (0.98,1.03) -2.51% (p=0.040) BenchmarkJSONDecode 86.1ms × (0.99,1.01) 86.5ms × (0.99,1.01) ~ (p=0.343) BenchmarkMandelbrot200 4.12ms × (1.00,1.00) 4.13ms × (1.00,1.00) +0.23% (p=0.000) BenchmarkGoParse 5.89ms × (0.96,1.03) 5.82ms × (0.96,1.04) ~ (p=0.522) BenchmarkRegexpMatchEasy0_32 141ns 
× (0.99,1.01) 142ns × (1.00,1.00) ~ (p=0.178) BenchmarkRegexpMatchEasy0_1K 408ns × (1.00,1.00) 392ns × (0.99,1.00) -3.83% (p=0.000) BenchmarkRegexpMatchEasy1_32 122ns × (1.00,1.00) 122ns × (1.00,1.00) ~ (p=0.178) BenchmarkRegexpMatchEasy1_1K 626ns × (1.00,1.01) 624ns × (0.99,1.00) ~ (p=0.122) BenchmarkRegexpMatchMedium_32 202ns × (0.99,1.00) 205ns × (0.99,1.01) +1.58% (p=0.001) BenchmarkRegexpMatchMedium_1K 54.4µs × (1.00,1.00) 55.5µs × (1.00,1.00) +1.86% (p=0.000) BenchmarkRegexpMatchHard_32 2.68µs × (1.00,1.00) 2.71µs × (1.00,1.00) +0.97% (p=0.002) BenchmarkRegexpMatchHard_1K 79.8µs × (1.00,1.01) 80.5µs × (1.00,1.01) +0.94% (p=0.003) BenchmarkRevcomp 590ms × (0.99,1.01) 585ms × (1.00,1.00) ~ (p=0.066) BenchmarkTemplate 111ms × (0.97,1.02) 112ms × (0.99,1.01) ~ (p=0.201) BenchmarkTimeParse 392ns × (1.00,1.00) 385ns × (1.00,1.00) -1.69% (p=0.000) BenchmarkTimeFormat 449ns × (0.98,1.01) 448ns × (0.99,1.01) ~ (p=0.550) Change-Id: Ie7c3830c481d96c9043e7bf26853c6c1d05dc9f4 Reviewed-on: https://go-review.googlesource.com/9364 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-28 15:22:20 +00:00
Austin Clements	1b4025f4bd	runtime: replace per-M workbuf cache with per-P gcWork cache Currently, each M has a cache of the most recently used workbuf. This is used primarily by the write barrier so it doesn't have to access the global workbuf lists on every write barrier. It's also used by stack scanning because it's convenient. This cache is important for write barrier performance, but this particular approach has several downsides. It's faster than no cache, but far from optimal (as the benchmarks below show). It's complex: access to the cache is sprinkled through most of the workbuf list operations and it requires special care to transform into and back out of the gcWork cache that's actually used for scanning and marking. It requires atomic exchanges to take ownership of the cached workbuf and to return it to the M's cache even though it's almost always used by only the current M. Since it's per-M, flushing these caches is O(# of Ms), which may be high. And it has some significant subtleties: for example, in general the cache shouldn't be used after the harvestwbufs() in mark termination because it could hide work from mark termination, but stack scanning can happen after this and will use the cache (but it turns out this is okay because it will always be followed by a getfull(), which drains the cache). This change replaces this cache with a per-P gcWork object. This gcWork cache can be used directly by scanning and marking (as long as preemption is disabled, which is a general requirement of gcWork). Since it's per-P, it doesn't require synchronization, which simplifies things and means the only atomic operations in the write barrier are occasionally fetching new work buffers and setting a mark bit if the object isn't already marked. This cache can be flushed in O(# of Ps), which is generally small. It follows a simple flushing rule: the cache can be used during any phase, but during mark termination it must be flushed before allowing preemption. 
This also makes the dispose during mutator assist no longer necessary, which eliminates the vast majority of gcWork dispose calls and reduces contention on the global workbuf lists. And it's a lot faster on some benchmarks: benchmark old ns/op new ns/op delta BenchmarkBinaryTree17 11963668673 11206112763 -6.33% BenchmarkFannkuch11 2643217136 2649182499 +0.23% BenchmarkFmtFprintfEmpty 70.4 70.2 -0.28% BenchmarkFmtFprintfString 364 307 -15.66% BenchmarkFmtFprintfInt 317 282 -11.04% BenchmarkFmtFprintfIntInt 512 483 -5.66% BenchmarkFmtFprintfPrefixedInt 404 380 -5.94% BenchmarkFmtFprintfFloat 521 479 -8.06% BenchmarkFmtManyArgs 2164 1894 -12.48% BenchmarkGobDecode 30366146 22429593 -26.14% BenchmarkGobEncode 29867472 26663152 -10.73% BenchmarkGzip 391236616 396779490 +1.42% BenchmarkGunzip 96639491 96297024 -0.35% BenchmarkHTTPClientServer 100110 70763 -29.31% BenchmarkJSONEncode 51866051 52511382 +1.24% BenchmarkJSONDecode 103813138 86094963 -17.07% BenchmarkMandelbrot200 4121834 4120886 -0.02% BenchmarkGoParse 16472789 5879949 -64.31% BenchmarkRegexpMatchEasy0_32 140 140 +0.00% BenchmarkRegexpMatchEasy0_1K 394 394 +0.00% BenchmarkRegexpMatchEasy1_32 120 120 +0.00% BenchmarkRegexpMatchEasy1_1K 621 614 -1.13% BenchmarkRegexpMatchMedium_32 209 202 -3.35% BenchmarkRegexpMatchMedium_1K 54889 55175 +0.52% BenchmarkRegexpMatchHard_32 2682 2675 -0.26% BenchmarkRegexpMatchHard_1K 79383 79524 +0.18% BenchmarkRevcomp 584116718 584595320 +0.08% BenchmarkTemplate 125400565 109620196 -12.58% BenchmarkTimeParse 386 387 +0.26% BenchmarkTimeFormat 580 447 -22.93% (Best out of 10 runs. The delta of averages is similar.) This also puts us in a good position to flush these caches when nearing the end of concurrent marking, which will let us increase the size of the work buffers while still controlling mark termination pause time. 
Change-Id: I2dd94c8517a19297a98ec280203cccaa58792522 Reviewed-on: https://go-review.googlesource.com/9178 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-24 20:10:14 +00:00
Austin Clements	571ebae6ef	runtime: track scan work performed during concurrent mark This tracks the amount of scan work in terms of scanned pointers during the concurrent mark phase. We'll use this information to estimate scan work for the next cycle. Currently this aggregates the work counter in gcWork and dispose atomically aggregates this into a global work counter. dispose happens relatively infrequently, so the contention on the global counter should be low. If this turns out to be an issue, we can reduce the number of disposes, and if it's still a problem, we can switch to per-P counters. Change-Id: Iac0364c466ee35fab781dbbbe7970a5f3c4e1fc1 Reviewed-on: https://go-review.googlesource.com/8832 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-21 15:35:00 +00:00
Austin Clements	50a66562a0	runtime: track heap bytes marked by GC This tracks the number of heap bytes marked by a GC cycle. We'll use this information to precisely trigger the next GC cycle. Currently this aggregates the work counter in gcWork and dispose atomically aggregates this into a global work counter. dispose happens relatively infrequently, so the contention on the global counter should be low. If this turns out to be an issue, we can reduce the number of disposes, and if it's still a problem, we can switch to per-P counters. Change-Id: I1bc377cb2e802ef61c2968602b63146d52e7f5db Reviewed-on: https://go-review.googlesource.com/8388 Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-06 21:28:07 +00:00
Austin Clements	a2f3d73fee	runtime: improve comment about non-preemption during GC work Currently, gcDrainN is documented saying that it must be run on the system stack. In fact, the problem and solution here are somewhat subtler. First, it doesn't have to happen on the system stack, it just has to be non-stoppable (that is, non-preemptible). Second, this isn't specific to gcDrainN (though gcDrainN is perhaps the most surprising instance); it's general to anything that uses the gcWork structure. Move the comment to gcWork and generalize it. Change-Id: I5277b5abb070e47f8d783bc15a310b379c6adc22 Reviewed-on: https://go-review.googlesource.com/8247 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-03-31 01:05:38 +00:00
Austin Clements	653426f08f	runtime: exit getfull barrier if there are partial workbufs Currently, we only exit the getfull barrier if there is work on the full list, even though the exit path will take work from either the full or partial list. Change this to exit the barrier if there is work on either the full or partial lists. I believe it's currently safe to check only the full list, since during mark termination there is no reason to put a workbuf on a partial list. However, checking both is more robust. Change-Id: Icf095b0945c7cad326a87ff2f1dc49b7699df373 Reviewed-on: https://go-review.googlesource.com/7840 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-03-20 14:05:11 +00:00
Austin Clements	cadd4f81a8	runtime: combine gcWorkProducer into gcWork The distinction between gcWorkProducer and gcWork (producer and consumer) is not serving us as originally intended, so merge these into just gcWork. The original intent was to replace the currentwbuf cache with a gcWorkProducer. However, with gchelpwork (aka mutator assists), mutators can both produce and consume work, so it will make more sense to cache a whole gcWork. Change-Id: I6e633e96db7cb23a64fbadbfc4607e3ad32bcfb3 Reviewed-on: https://go-review.googlesource.com/7733 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-03-19 15:55:21 +00:00
Austin Clements	c25c371098	runtime: use more natural types in struct workbuf Until recently, struct workbuf had only lfnode and uintptr fields before the obj array to make it convenient to compute the size of the obj array. It slowly grew more fields until this became inconvenient enough that it was restructured to make the size computation easy. Now the size computation doesn't care what the field types are, so switch to more natural types. Change-Id: I966140ba7ebb4aeb41d5c66d9d2a3bdc17dd4bcf Reviewed-on: https://go-review.googlesource.com/5262 Reviewed-by: Russ Cox <rsc@golang.org>	2015-02-19 17:00:30 +00:00
Austin Clements	b30d19de59	runtime: introduce higher-level GC work abstraction This introduces a producer/consumer abstraction for GC work pointers that internally handles the details of filling, draining, and shuffling work buffers. In addition to simplifying the GC code, this should make it easy for us to change how we use work buffers, including cleaning up how we use the work.partial queue, reintroducing a FIFO lookahead cache, adding prefetching, and using dual buffers to avoid flapping. This commit doesn't change any existing code. The following commit will switch the garbage collector from explicit workbuf manipulation to gcWork. Change-Id: Ifbfe5fff45bf0362d6d7c3cecb061f0c9874077d Reviewed-on: https://go-review.googlesource.com/5231 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-02-19 16:59:26 +00:00
Austin Clements	1b205857a4	runtime: drop unused workbufhdr.id field Change-Id: If7729b3c7df6dc7fcd41f293e2ef2472c769fe8b Reviewed-on: https://go-review.googlesource.com/5261 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-02-19 15:53:23 +00:00
Austin Clements	8ed95a942c	runtime: rename gcwork.go to mgcwork.go All of the other memory-related source files start with "m". Keep up the tradition. Change-Id: Idd88fdbf2a1453374fa12109b949b1c4d149a4f8 Reviewed-on: https://go-review.googlesource.com/4853 Reviewed-by: Minux Ma <minux@golang.org>	2015-02-17 18:42:41 +00:00
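
The buffer capacities quoted in commit 1870572180 above (24 pointers in 256 bytes, 504 pointers in 4K) can be checked with a little arithmetic. The sketch below is illustrative only: the 64-byte header size is an assumption inferred from those two figures, not a value stated in the log, and `workbufCapacity` is a hypothetical helper, not a runtime function.

```go
package main

import "fmt"

// ptrSize is the size in bytes of a 64-bit pointer.
const ptrSize = 8

// assumedHeaderBytes is an ASSUMED per-buffer header size. It is inferred
// from the two capacities quoted in the commit message and is not taken
// from the runtime source.
const assumedHeaderBytes = 64

// workbufCapacity (hypothetical helper) returns how many pointers fit in a
// work buffer of bufBytes once headerBytes are reserved for bookkeeping.
func workbufCapacity(bufBytes, headerBytes int) int {
	return (bufBytes - headerBytes) / ptrSize
}

func main() {
	for _, size := range []int{256, 4096} {
		fmt.Printf("%4d-byte buffer: %d pointers\n",
			size, workbufCapacity(size, assumedHeaderBytes))
	}

	// Ratio of the peak work.full write rates reported for the garbage
	// benchmark before and after the change.
	fmt.Printf("peak write rate ratio: %.1f\n", 780000.0/32000.0)
}
```

Under that assumption the capacity grows by a factor of 21 (504/24), which is in the same range as the roughly 24-fold drop in peak writes to work.full that the commit reports.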

42 Commits