qbit/go - go - Tape:neT

qbit/go

mirror of https://github.com/golang/go synced 2024-11-19 15:44:44 -07:00

Author	SHA1	Message	Date
Austin Clements	b6c0934a9b	runtime: cache two workbufs to reduce contention Currently the gcWork abstraction caches a single work buffer. As a result, if a worker is putting and getting pointers right at the boundary of a work buffer, it can flap between work buffers and (potentially significantly) increase contention on the global work buffer lists. This change modifies gcWork to instead cache two work buffers and switch off between them. This introduces one buffers' worth of hysteresis and eliminates the above performance worst case by amortizing the cost of getting or putting a work buffer over at least one buffers' worth of work. In practice, it's difficult to trigger this worst case with reasonably large work buffers. On the garbage benchmark, this reduces the max writes/sec to the global work list from 32K to 25K and the median from 6K to 5K. However, if a workload were to trigger this worst case behavior, it could significantly drive up this contention. This has negligible effects on the go1 benchmarks and slightly speeds up the garbage benchmark. name old time/op new time/op delta XBenchGarbage-12 5.90ms ± 3% 5.83ms ± 4% -1.18% (p=0.011 n=18+18) name old time/op new time/op delta BinaryTree17-12 3.22s ± 4% 3.17s ± 3% -1.57% (p=0.009 n=19+20) Fannkuch11-12 2.44s ± 1% 2.53s ± 4% +3.78% (p=0.000 n=18+19) FmtFprintfEmpty-12 50.2ns ± 2% 50.5ns ± 5% ~ (p=0.631 n=19+20) FmtFprintfString-12 167ns ± 1% 166ns ± 1% ~ (p=0.141 n=20+20) FmtFprintfInt-12 162ns ± 1% 159ns ± 1% -1.80% (p=0.000 n=20+20) FmtFprintfIntInt-12 277ns ± 2% 263ns ± 1% -4.78% (p=0.000 n=20+18) FmtFprintfPrefixedInt-12 240ns ± 1% 232ns ± 2% -3.25% (p=0.000 n=20+20) FmtFprintfFloat-12 311ns ± 1% 315ns ± 2% +1.17% (p=0.000 n=20+20) FmtManyArgs-12 1.05µs ± 2% 1.03µs ± 2% -1.72% (p=0.000 n=20+20) GobDecode-12 8.65ms ± 1% 8.71ms ± 2% +0.68% (p=0.001 n=19+20) GobEncode-12 6.51ms ± 1% 6.54ms ± 1% +0.42% (p=0.047 n=20+19) Gzip-12 318ms ± 2% 315ms ± 2% -1.20% (p=0.000 n=19+19) Gunzip-12 42.2ms ± 2% 42.1ms ± 1% ~ (p=0.667 n=20+19) HTTPClientServer-12 62.5µs ± 1% 62.4µs ± 1% ~ (p=0.110 n=20+18) JSONEncode-12 16.8ms ± 1% 16.8ms ± 2% ~ (p=0.569 n=19+20) JSONDecode-12 60.8ms ± 2% 59.8ms ± 1% -1.69% (p=0.000 n=19+19) Mandelbrot200-12 3.87ms ± 1% 3.85ms ± 0% -0.61% (p=0.001 n=20+17) GoParse-12 3.76ms ± 2% 3.76ms ± 1% ~ (p=0.698 n=20+20) RegexpMatchEasy0_32-12 100ns ± 2% 101ns ± 2% ~ (p=0.065 n=19+20) RegexpMatchEasy0_1K-12 342ns ± 2% 333ns ± 1% -2.82% (p=0.000 n=20+19) RegexpMatchEasy1_32-12 83.3ns ± 2% 83.2ns ± 2% ~ (p=0.692 n=20+19) RegexpMatchEasy1_1K-12 498ns ± 2% 490ns ± 1% -1.52% (p=0.000 n=18+20) RegexpMatchMedium_32-12 131ns ± 2% 131ns ± 2% ~ (p=0.464 n=20+18) RegexpMatchMedium_1K-12 39.3µs ± 2% 39.6µs ± 1% +0.77% (p=0.000 n=18+19) RegexpMatchHard_32-12 2.04µs ± 2% 2.06µs ± 1% +0.69% (p=0.009 n=19+20) RegexpMatchHard_1K-12 61.4µs ± 2% 62.1µs ± 1% +1.21% (p=0.000 n=19+20) Revcomp-12 534ms ± 1% 529ms ± 1% -0.97% (p=0.000 n=19+16) Template-12 70.4ms ± 2% 70.0ms ± 1% ~ (p=0.070 n=19+19) TimeParse-12 359ns ± 3% 344ns ± 1% -4.15% (p=0.000 n=19+19) TimeFormat-12 357ns ± 1% 361ns ± 2% +1.05% (p=0.002 n=20+20) [Geo mean] 62.4µs 62.0µs -0.56% name old speed new speed delta GobDecode-12 88.7MB/s ± 1% 88.1MB/s ± 2% -0.68% (p=0.001 n=19+20) GobEncode-12 118MB/s ± 1% 117MB/s ± 1% -0.42% (p=0.046 n=20+19) Gzip-12 60.9MB/s ± 2% 61.7MB/s ± 2% +1.21% (p=0.000 n=19+19) Gunzip-12 460MB/s ± 2% 461MB/s ± 1% ~ (p=0.661 n=20+19) JSONEncode-12 116MB/s ± 1% 115MB/s ± 2% ~ (p=0.555 n=19+20) JSONDecode-12 31.9MB/s ± 2% 32.5MB/s ± 1% +1.72% (p=0.000 n=19+19) GoParse-12 15.4MB/s ± 2% 15.4MB/s ± 1% ~ (p=0.653 n=20+20) RegexpMatchEasy0_32-12 317MB/s ± 2% 315MB/s ± 2% ~ (p=0.141 n=19+20) RegexpMatchEasy0_1K-12 2.99GB/s ± 2% 3.07GB/s ± 1% +2.86% (p=0.000 n=20+19) RegexpMatchEasy1_32-12 384MB/s ± 2% 385MB/s ± 2% ~ (p=0.672 n=20+19) RegexpMatchEasy1_1K-12 2.06GB/s ± 2% 2.09GB/s ± 1% +1.54% (p=0.000 n=18+20) RegexpMatchMedium_32-12 7.62MB/s ± 2% 7.63MB/s ± 2% ~ (p=0.800 n=20+18) RegexpMatchMedium_1K-12 26.0MB/s ± 1% 25.8MB/s ± 1% -0.77% (p=0.000 n=18+19) RegexpMatchHard_32-12 15.7MB/s ± 2% 15.6MB/s ± 1% -0.69% (p=0.010 n=19+20) RegexpMatchHard_1K-12 16.7MB/s ± 2% 16.5MB/s ± 1% -1.19% (p=0.000 n=19+20) Revcomp-12 476MB/s ± 1% 481MB/s ± 1% +0.97% (p=0.000 n=19+16) Template-12 27.6MB/s ± 2% 27.7MB/s ± 1% ~ (p=0.071 n=19+19) [Geo mean] 99.1MB/s 99.3MB/s +0.27% Change-Id: I68bcbf74ccb716cd5e844a554f67b679135105e6 Reviewed-on: https://go-review.googlesource.com/16042 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-11-03 19:12:10 +00:00
Austin Clements	1870572180	runtime: enlarge GC work buffer size Currently the GC work buffers are only 256 bytes and hence can record only 24 64-bit pointer. They were reduced from 4K in commits `db7fd1c` and `a15818f` as a way to minimize the amount of work the per-P workbuf caches could "hide" from the mark phase and carry in to the mark termination phase. However, this approach wasn't very robust and we later added a "mark 2" phase to address this problem head-on. Because of mark 2, there's now no benefit to having very small work buffers. But there are plenty of downsides: small work buffers increase contention on the work lists, increase the frequency and hence net overhead of acquiring and releasing work buffers, and somewhat increase memory overhead of the GC. This commit expands work buffers back to 4K (504 64-bit pointers). This reduces the rate of writes to work.full in the garbage benchmark from a peak of ~780,000 writes/sec to a peak of ~32,000 writes/sec. This has negligible effect on the go1 benchmarks. It slightly slows down the garbage benchmark. name old time/op new time/op delta XBenchGarbage-12 5.37ms ± 5% 5.60ms ± 2% +4.37% (p=0.000 n=20+20) Change-Id: Ic9cc28e7a125d23d9faf4f5e690fb8aa9bcdfb28 Reviewed-on: https://go-review.googlesource.com/15893 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-11-03 15:53:38 +00:00
Austin Clements	82d14d77da	runtime: perform concurrent scan in GC workers Currently the concurrent root scan is performed in its entirety by the GC coordinator before entering concurrent mark (which enables GC workers). This scan is done sequentially, which can prolong the scan phase, delay the mark phase, and means that the scan phase does not obey the 25% CPU goal. Furthermore, there's no need to complete the root scan before starting marking (in fact, we already allow GC assists to happen during the scan phase), so this acts as an unnecessary barrier between root scanning and marking. This change shifts the root scan work out of the GC coordinator and in to the GC workers. The coordinator simply sets up the scan state and enqueues the right number of root scan jobs. The GC workers then drain the root scan jobs prior to draining heap scan jobs. This parallelizes the root scan process, makes it obey the 25% CPU goal, and effectively eliminates root scanning as an isolated phase, allowing the system to smoothly transition from root scanning to heap marking. This also eliminates a major non-STW responsibility of the GC coordinator, which will make it easier to switch to a decentralized state machine. Finally, it puts us in a good position to perform root scanning in assists as well, which will help satisfy assists at the beginning of the GC cycle. This is mostly straightforward. One tricky aspect is that we have to deal with preemption deadlock: where two non-preemptible gorountines are trying to preempt each other to perform a stack scan. Given the context where this happens, the only instance of this is two background workers trying to scan each other. We avoid this by simply not scanning the stacks of background workers during the concurrent phase; this is safe because we'll scan them during mark termination (and their stacks are very small and should not contain any new pointers). This change also switches the root marking during mark termination to use the same gcDrain-based code path as concurrent mark. This shouldn't affect performance because STW root marking was already parallel and tasks switched to heap marking immediately when no more root marking tasks were available. However, it simplifies the code and unifies these code paths. This has negligible effect on the go1 benchmarks. It slightly slows down the garbage benchmark, possibly by making GC run slightly more frequently. name old time/op new time/op delta XBenchGarbage-12 5.10ms ± 1% 5.24ms ± 1% +2.87% (p=0.000 n=18+18) name old time/op new time/op delta BinaryTree17-12 3.25s ± 3% 3.20s ± 5% -1.57% (p=0.013 n=20+20) Fannkuch11-12 2.45s ± 1% 2.46s ± 1% +0.38% (p=0.019 n=20+18) FmtFprintfEmpty-12 49.7ns ± 3% 49.9ns ± 4% ~ (p=0.851 n=19+20) FmtFprintfString-12 170ns ± 2% 170ns ± 1% ~ (p=0.775 n=20+19) FmtFprintfInt-12 161ns ± 1% 160ns ± 1% -0.78% (p=0.000 n=19+18) FmtFprintfIntInt-12 267ns ± 1% 270ns ± 1% +1.04% (p=0.000 n=19+19) FmtFprintfPrefixedInt-12 238ns ± 2% 238ns ± 1% ~ (p=0.133 n=18+19) FmtFprintfFloat-12 311ns ± 1% 310ns ± 2% -0.35% (p=0.023 n=20+19) FmtManyArgs-12 1.08µs ± 1% 1.06µs ± 1% -2.31% (p=0.000 n=20+20) GobDecode-12 8.65ms ± 1% 8.63ms ± 1% ~ (p=0.377 n=18+20) GobEncode-12 6.49ms ± 1% 6.52ms ± 1% +0.37% (p=0.015 n=20+20) Gzip-12 319ms ± 3% 318ms ± 1% ~ (p=0.975 n=19+17) Gunzip-12 41.9ms ± 1% 42.1ms ± 2% +0.65% (p=0.004 n=19+20) HTTPClientServer-12 61.7µs ± 1% 62.6µs ± 1% +1.40% (p=0.000 n=18+20) JSONEncode-12 16.8ms ± 1% 16.9ms ± 1% ~ (p=0.239 n=20+18) JSONDecode-12 58.4ms ± 1% 60.7ms ± 1% +3.85% (p=0.000 n=19+20) Mandelbrot200-12 3.86ms ± 0% 3.86ms ± 1% ~ (p=0.092 n=18+19) GoParse-12 3.75ms ± 2% 3.75ms ± 2% ~ (p=0.708 n=19+20) RegexpMatchEasy0_32-12 100ns ± 1% 100ns ± 2% +0.60% (p=0.010 n=17+20) RegexpMatchEasy0_1K-12 341ns ± 1% 342ns ± 2% ~ (p=0.203 n=20+19) RegexpMatchEasy1_32-12 82.5ns ± 2% 83.2ns ± 2% +0.83% (p=0.007 n=19+19) RegexpMatchEasy1_1K-12 495ns ± 1% 495ns ± 2% ~ (p=0.970 n=19+18) RegexpMatchMedium_32-12 130ns ± 2% 130ns ± 2% +0.59% (p=0.039 n=19+20) RegexpMatchMedium_1K-12 39.2µs ± 1% 39.3µs ± 1% ~ (p=0.214 n=18+18) RegexpMatchHard_32-12 2.03µs ± 2% 2.02µs ± 1% ~ (p=0.166 n=18+19) RegexpMatchHard_1K-12 61.0µs ± 1% 60.9µs ± 1% ~ (p=0.169 n=20+18) Revcomp-12 533ms ± 1% 535ms ± 1% ~ (p=0.071 n=19+17) Template-12 68.1ms ± 2% 73.0ms ± 1% +7.26% (p=0.000 n=19+20) TimeParse-12 355ns ± 2% 356ns ± 2% ~ (p=0.530 n=19+20) TimeFormat-12 357ns ± 2% 347ns ± 1% -2.59% (p=0.000 n=20+19) [Geo mean] 62.1µs 62.3µs +0.31% name old speed new speed delta GobDecode-12 88.7MB/s ± 1% 88.9MB/s ± 1% ~ (p=0.377 n=18+20) GobEncode-12 118MB/s ± 1% 118MB/s ± 1% -0.37% (p=0.015 n=20+20) Gzip-12 60.9MB/s ± 3% 60.9MB/s ± 1% ~ (p=0.944 n=19+17) Gunzip-12 464MB/s ± 1% 461MB/s ± 2% -0.64% (p=0.004 n=19+20) JSONEncode-12 115MB/s ± 1% 115MB/s ± 1% ~ (p=0.236 n=20+18) JSONDecode-12 33.2MB/s ± 1% 32.0MB/s ± 1% -3.71% (p=0.000 n=19+20) GoParse-12 15.5MB/s ± 2% 15.5MB/s ± 2% ~ (p=0.702 n=19+20) RegexpMatchEasy0_32-12 320MB/s ± 1% 318MB/s ± 2% ~ (p=0.094 n=18+20) RegexpMatchEasy0_1K-12 3.00GB/s ± 1% 2.99GB/s ± 1% ~ (p=0.194 n=20+19) RegexpMatchEasy1_32-12 388MB/s ± 2% 385MB/s ± 2% -0.83% (p=0.008 n=19+19) RegexpMatchEasy1_1K-12 2.07GB/s ± 1% 2.07GB/s ± 1% ~ (p=0.964 n=19+18) RegexpMatchMedium_32-12 7.68MB/s ± 1% 7.64MB/s ± 2% -0.57% (p=0.020 n=19+20) RegexpMatchMedium_1K-12 26.1MB/s ± 1% 26.1MB/s ± 1% ~ (p=0.211 n=18+18) RegexpMatchHard_32-12 15.8MB/s ± 1% 15.8MB/s ± 1% ~ (p=0.180 n=18+19) RegexpMatchHard_1K-12 16.8MB/s ± 1% 16.8MB/s ± 2% ~ (p=0.236 n=20+19) Revcomp-12 477MB/s ± 1% 475MB/s ± 1% ~ (p=0.071 n=19+17) Template-12 28.5MB/s ± 2% 26.6MB/s ± 1% -6.77% (p=0.000 n=19+20) [Geo mean] 100MB/s 99.0MB/s -0.82% Change-Id: I875bf6ceb306d1ee2f470cabf88aa6ede27c47a0 Reviewed-on: https://go-review.googlesource.com/16059 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-30 22:46:31 +00:00
Austin Clements	feb92a8e8c	runtime: remove work.partial queue This work queue is no longer used (there are many reads of work.partial, but the only write is in putpartial, which is never called). Fixes #11922. Change-Id: I08b76c0c02a0867a9cdcb94783e1f7629d44249a Reviewed-on: https://go-review.googlesource.com/15892 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-10-19 18:37:54 +00:00
Austin Clements	8e8219deb5	runtime: update gcController.scanWork regularly Currently, gcController.scanWork is updated as lazily as possible since it is only read at the end of the GC cycle. We're about to read it during the GC cycle to improve the assist ratio revisions, so modify gcDrain* to regularly flush to gcController.scanWork in much the same way as we regularly flush to gcController.bgScanCredit. One consequence of this is that it's difficult to keep gcw.scanWork monotonic, so we give up on that and simply return the amount of scan work done by gcDrainN rather than calculating it in the caller. Change-Id: I7b50acdc39602f843eed0b5c6d2dacd7e762b81d Reviewed-on: https://go-review.googlesource.com/15407 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-10-09 19:38:29 +00:00
Austin Clements	1b84bb8c7c	runtime: fix out-of-date comment on gcWork usage Change-Id: I3c21ffa80a5c14911e07238b1f64bec686ed7b72 Reviewed-on: https://go-review.googlesource.com/14980 Reviewed-by: Minux Ma <minux@golang.org>	2015-10-02 19:55:34 +00:00
Rick Hudson	e95bc5fef7	runtime: force mutator to give work buffer to GC The scheduler, work buffer's dispose, and write barriers can conspire to hide the a pointer from the GC's concurent mark phase. If this pointer is the only path to a large amount of marking the STW mark termination phase may take a lot of time. Consider the following: 1) dispose places a work buffer on the partial queue 2) the GC is busy so it does not immediately remove and process the work buffer 3) the scheduler runs a mutator whose write barrier dequeues the work buffer from the partial queue so the GC won't see it This repeats until the GC reaches the mark termination phase where the GC finally discovers the pointer along with a lot of work to do. This CL fixes the problem by having the mutator dispose of the buffer to the full queue instead of the partial queue. Since the write buffer never asks for full buffers the conspiracy described above is not possible. Updates #11694. Change-Id: I2ce832f9657a7570f800e8ce4459cd9e304ef43b Reviewed-on: https://go-review.googlesource.com/12840 Reviewed-by: Austin Clements <austin@google.com>	2015-07-29 18:56:11 +00:00
Rick Hudson	90a19961f2	runtime: reduce latency by aggressively ending mark phase Some latency regressions have crept into our system over the past few weeks. This CL fixes those by having the mark phase more aggressively blacken objects so that the mark termination phase, a STW phase, has less work to do. Three approaches were taken when the mark phase believes it has no more work to do, ie all the work buffers are empty. If things have gone well the mark phase is correct and there is in fact little or no work. In that case the following items will take very little time. If the mark phase is wrong this CL will ferret that work out and give the mark phase a chance to deal with it concurrently before mark termination begins. When the mark phase first appears to be out of work, it does three things: 1) It switches from allocating white to allocating black to reduce the number of unmarked objects reachable only from stacks. 2) It flushes and disables per-P GC work caches so all work must be in globally visible work buffers. 3) It rescans the global roots---the BSS and data segments---so there are fewer objects to blacken during mark termination. We do not rescan stacks at this point, though that could be done in a later CL. After these steps, it again drains the global work buffers. On a lightly loaded machine the garbage benchmark has reduced the number of GC cycles with latency > 10 ms from 83 out of 4083 cycles down to 2 out of 3995 cycles. Maximum latency was reduced from 60+ msecs down to 20 ms. Change-Id: I152285b48a7e56c5083a02e8e4485dd39c990492 Reviewed-on: https://go-review.googlesource.com/10590 Reviewed-by: Austin Clements <austin@google.com>	2015-06-18 21:38:46 +00:00
Ainar Garipov	7f9f70e5b6	all: fix misprints in comments These were found by grepping the comments from the go code and feeding the output to aspell. Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395 Reviewed-on: https://go-review.googlesource.com/10941 Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-11 14:18:57 +00:00
Rick Hudson	5b66e5d0d8	runtime: turn work buffer tracing off by default During development we ran with monitoring code turned on by default. This CL turns the work buffer monitoring off. Performance change on most go1 benchmarks is small or insignificant. name old mean new mean delta BinaryTree17 3.35s × (0.99,1.01) 3.35s × (0.99,1.01) ~ (p=0.841 n=5+5) Fannkuch11 2.59s × (1.00,1.01) 2.55s × (1.00,1.00) -1.65% (p=0.008 n=5+5) FmtFprintfEmpty 52.5ns × (0.99,1.02) 53.2ns × (0.98,1.01) ~ (p=0.063 n=5+5) FmtFprintfString 181ns × (1.00,1.00) 180ns × (1.00,1.00) -0.55% (p=0.029 n=4+4) FmtFprintfInt 176ns × (1.00,1.01) 174ns × (1.00,1.00) -0.91% (p=0.000 n=5+4) FmtFprintfIntInt 298ns × (1.00,1.00) 299ns × (1.00,1.00) ~ (p=0.143 n=4+4) FmtFprintfPrefixedInt 250ns × (1.00,1.01) 246ns × (1.00,1.00) -1.68% (p=0.000 n=5+4) FmtFprintfFloat 340ns × (1.00,1.00) 340ns × (1.00,1.01) ~ (p=0.643 n=5+5) FmtManyArgs 1.16µs × (1.00,1.00) 1.15µs × (1.00,1.00) -0.47% (p=0.016 n=5+5) GobDecode 9.22ms × (1.00,1.00) 9.23ms × (1.00,1.00) ~ (p=0.841 n=5+5) GobEncode 7.00ms × (1.00,1.01) 7.09ms × (0.99,1.01) +1.26% (p=0.016 n=5+5) Gzip 387ms × (1.00,1.00) 389ms × (0.99,1.02) ~ (p=1.000 n=5+5) Gunzip 97.8ms × (1.00,1.00) 98.3ms × (1.00,1.00) +0.51% (p=0.016 n=5+4) HTTPClientServer 52.6µs × (1.00,1.01) 52.7µs × (1.00,1.01) ~ (p=1.000 n=5+5) JSONEncode 18.0ms × (0.99,1.02) 17.9ms × (1.00,1.00) ~ (p=0.310 n=5+5) JSONDecode 64.8ms × (0.99,1.02) 63.6ms × (1.00,1.00) -1.94% (p=0.008 n=5+5) Mandelbrot200 4.05ms × (1.00,1.00) 4.05ms × (1.00,1.00) ~ (p=0.421 n=5+5) GoParse 3.86ms × (1.00,1.01) 3.84ms × (0.99,1.01) ~ (p=0.421 n=5+5) RegexpMatchEasy0_32 101ns × (1.00,1.00) 102ns × (0.99,1.02) ~ (p=0.238 n=4+5) RegexpMatchEasy0_1K 346ns × (1.00,1.01) 345ns × (1.00,1.00) ~ (p=0.333 n=5+4) RegexpMatchEasy1_32 87.3ns × (0.99,1.02) 87.4ns × (1.00,1.00) ~ (p=0.190 n=5+4) RegexpMatchEasy1_1K 520ns × (1.00,1.00) 520ns × (1.00,1.01) ~ (p=1.000 n=4+5) RegexpMatchMedium_32 143ns × (1.00,1.00) 142ns × (1.00,1.00) -0.70% (p=0.029 n=4+4) RegexpMatchMedium_1K 43.2µs × (1.00,1.01) 43.2µs × (1.00,1.00) ~ (p=0.841 n=5+5) RegexpMatchHard_32 2.24µs × (1.00,1.01) 2.23µs × (1.00,1.01) -0.63% (p=0.048 n=5+5) RegexpMatchHard_1K 68.7µs × (1.00,1.00) 68.3µs × (1.00,1.00) -0.56% (p=0.008 n=5+5) Revcomp 577ms × (1.00,1.01) 579ms × (1.00,1.00) ~ (p=0.151 n=5+5) Template 74.9ms × (1.00,1.00) 76.5ms × (1.00,1.00) +2.11% (p=0.008 n=5+5) TimeParse 359ns × (1.00,1.00) 362ns × (1.00,1.00) +0.72% (p=0.008 n=5+5) TimeFormat 369ns × (1.00,1.00) 371ns × (1.00,1.01) ~ (p=0.071 n=5+5) Change-Id: I4206a3f77a3d1450966b7a62ea7597aec44cb72f Reviewed-on: https://go-review.googlesource.com/10294 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-05-21 16:09:24 +00:00
Rick Hudson	913db7685e	runtime: run background mark helpers only if work is available Prior to this CL whenever the GC marking was enabled and a P was looking for work we supplied a G to help the GC do its marking tasks. Once this G finished all the marking available it would release the P to find another available G. In the case where there was no work the P would drop into findrunnable which would execute the mark helper G which would immediately return and the P would drop into findrunnable again repeating the process. Since the P was always given a G to run it never blocks. This CL first checks if the GC mark helper G has available work and if not the P immediately falls through to its blocking logic. Fixes #10901 Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5 Reviewed-on: https://go-review.googlesource.com/10189 Reviewed-by: Austin Clements <austin@google.com>	2015-05-19 15:57:50 +00:00
Austin Clements	63caec5dee	runtime: eliminate one heapBitsForObject from scanobject scanobject with ptrmask!=nil is only ever called with the base pointer of a heap object. Currently, scanobject calls heapBitsForObject, which goes to a great deal of trouble to check that the pointer points into the heap and to find the base of the object it points to, both of which are completely unnecessary in this case. Replace this call to heapBitsForObject with much simpler logic to fetch the span and compute the heap bits. Benchmark results with five runs: name old mean new mean delta BenchmarkBinaryTree17 9.21s × (0.95,1.02) 8.55s × (0.91,1.03) -7.16% (p=0.022) BenchmarkFannkuch11 2.65s × (1.00,1.00) 2.62s × (1.00,1.00) -1.10% (p=0.000) BenchmarkFmtFprintfEmpty 73.2ns × (0.99,1.01) 71.7ns × (1.00,1.01) -1.99% (p=0.004) BenchmarkFmtFprintfString 302ns × (0.99,1.00) 292ns × (0.98,1.02) -3.31% (p=0.020) BenchmarkFmtFprintfInt 281ns × (0.98,1.01) 279ns × (0.96,1.02) ~ (p=0.596) BenchmarkFmtFprintfIntInt 482ns × (0.98,1.01) 488ns × (0.95,1.02) ~ (p=0.419) BenchmarkFmtFprintfPrefixedInt 382ns × (0.99,1.01) 365ns × (0.96,1.02) -4.35% (p=0.015) BenchmarkFmtFprintfFloat 475ns × (0.99,1.01) 472ns × (1.00,1.00) ~ (p=0.108) BenchmarkFmtManyArgs 1.89µs × (1.00,1.01) 1.90µs × (0.94,1.02) ~ (p=0.883) BenchmarkGobDecode 22.4ms × (0.99,1.01) 21.9ms × (0.92,1.04) ~ (p=0.332) BenchmarkGobEncode 24.7ms × (0.98,1.02) 23.9ms × (0.87,1.07) ~ (p=0.407) BenchmarkGzip 397ms × (0.99,1.01) 398ms × (0.99,1.01) ~ (p=0.718) BenchmarkGunzip 96.7ms × (1.00,1.00) 96.9ms × (1.00,1.00) ~ (p=0.230) BenchmarkHTTPClientServer 71.5µs × (0.98,1.01) 68.5µs × (0.92,1.06) ~ (p=0.243) BenchmarkJSONEncode 46.1ms × (0.98,1.01) 44.9ms × (0.98,1.03) -2.51% (p=0.040) BenchmarkJSONDecode 86.1ms × (0.99,1.01) 86.5ms × (0.99,1.01) ~ (p=0.343) BenchmarkMandelbrot200 4.12ms × (1.00,1.00) 4.13ms × (1.00,1.00) +0.23% (p=0.000) BenchmarkGoParse 5.89ms × (0.96,1.03) 5.82ms × (0.96,1.04) ~ (p=0.522) BenchmarkRegexpMatchEasy0_32 141ns × (0.99,1.01) 142ns × (1.00,1.00) ~ (p=0.178) BenchmarkRegexpMatchEasy0_1K 408ns × (1.00,1.00) 392ns × (0.99,1.00) -3.83% (p=0.000) BenchmarkRegexpMatchEasy1_32 122ns × (1.00,1.00) 122ns × (1.00,1.00) ~ (p=0.178) BenchmarkRegexpMatchEasy1_1K 626ns × (1.00,1.01) 624ns × (0.99,1.00) ~ (p=0.122) BenchmarkRegexpMatchMedium_32 202ns × (0.99,1.00) 205ns × (0.99,1.01) +1.58% (p=0.001) BenchmarkRegexpMatchMedium_1K 54.4µs × (1.00,1.00) 55.5µs × (1.00,1.00) +1.86% (p=0.000) BenchmarkRegexpMatchHard_32 2.68µs × (1.00,1.00) 2.71µs × (1.00,1.00) +0.97% (p=0.002) BenchmarkRegexpMatchHard_1K 79.8µs × (1.00,1.01) 80.5µs × (1.00,1.01) +0.94% (p=0.003) BenchmarkRevcomp 590ms × (0.99,1.01) 585ms × (1.00,1.00) ~ (p=0.066) BenchmarkTemplate 111ms × (0.97,1.02) 112ms × (0.99,1.01) ~ (p=0.201) BenchmarkTimeParse 392ns × (1.00,1.00) 385ns × (1.00,1.00) -1.69% (p=0.000) BenchmarkTimeFormat 449ns × (0.98,1.01) 448ns × (0.99,1.01) ~ (p=0.550) Change-Id: Ie7c3830c481d96c9043e7bf26853c6c1d05dc9f4 Reviewed-on: https://go-review.googlesource.com/9364 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-28 15:22:20 +00:00
Austin Clements	1b4025f4bd	runtime: replace per-M workbuf cache with per-P gcWork cache Currently, each M has a cache of the most recently used workbuf. This is used primarily by the write barrier so it doesn't have to access the global workbuf lists on every write barrier. It's also used by stack scanning because it's convenient. This cache is important for write barrier performance, but this particular approach has several downsides. It's faster than no cache, but far from optimal (as the benchmarks below show). It's complex: access to the cache is sprinkled through most of the workbuf list operations and it requires special care to transform into and back out of the gcWork cache that's actually used for scanning and marking. It requires atomic exchanges to take ownership of the cached workbuf and to return it to the M's cache even though it's almost always used by only the current M. Since it's per-M, flushing these caches is O(# of Ms), which may be high. And it has some significant subtleties: for example, in general the cache shouldn't be used after the harvestwbufs() in mark termination because it could hide work from mark termination, but stack scanning can happen after this and will* use the cache (but it turns out this is okay because it will always be followed by a getfull(), which drains the cache). This change replaces this cache with a per-P gcWork object. This gcWork cache can be used directly by scanning and marking (as long as preemption is disabled, which is a general requirement of gcWork). Since it's per-P, it doesn't require synchronization, which simplifies things and means the only atomic operations in the write barrier are occasionally fetching new work buffers and setting a mark bit if the object isn't already marked. This cache can be flushed in O(# of Ps), which is generally small. It follows a simple flushing rule: the cache can be used during any phase, but during mark termination it must be flushed before allowing preemption. This also makes the dispose during mutator assist no longer necessary, which eliminates the vast majority of gcWork dispose calls and reduces contention on the global workbuf lists. And it's a lot faster on some benchmarks: benchmark old ns/op new ns/op delta BenchmarkBinaryTree17 11963668673 11206112763 -6.33% BenchmarkFannkuch11 2643217136 2649182499 +0.23% BenchmarkFmtFprintfEmpty 70.4 70.2 -0.28% BenchmarkFmtFprintfString 364 307 -15.66% BenchmarkFmtFprintfInt 317 282 -11.04% BenchmarkFmtFprintfIntInt 512 483 -5.66% BenchmarkFmtFprintfPrefixedInt 404 380 -5.94% BenchmarkFmtFprintfFloat 521 479 -8.06% BenchmarkFmtManyArgs 2164 1894 -12.48% BenchmarkGobDecode 30366146 22429593 -26.14% BenchmarkGobEncode 29867472 26663152 -10.73% BenchmarkGzip 391236616 396779490 +1.42% BenchmarkGunzip 96639491 96297024 -0.35% BenchmarkHTTPClientServer 100110 70763 -29.31% BenchmarkJSONEncode 51866051 52511382 +1.24% BenchmarkJSONDecode 103813138 86094963 -17.07% BenchmarkMandelbrot200 4121834 4120886 -0.02% BenchmarkGoParse 16472789 5879949 -64.31% BenchmarkRegexpMatchEasy0_32 140 140 +0.00% BenchmarkRegexpMatchEasy0_1K 394 394 +0.00% BenchmarkRegexpMatchEasy1_32 120 120 +0.00% BenchmarkRegexpMatchEasy1_1K 621 614 -1.13% BenchmarkRegexpMatchMedium_32 209 202 -3.35% BenchmarkRegexpMatchMedium_1K 54889 55175 +0.52% BenchmarkRegexpMatchHard_32 2682 2675 -0.26% BenchmarkRegexpMatchHard_1K 79383 79524 +0.18% BenchmarkRevcomp 584116718 584595320 +0.08% BenchmarkTemplate 125400565 109620196 -12.58% BenchmarkTimeParse 386 387 +0.26% BenchmarkTimeFormat 580 447 -22.93% (Best out of 10 runs. The delta of averages is similar.) This also puts us in a good position to flush these caches when nearing the end of concurrent marking, which will let us increase the size of the work buffers while still controlling mark termination pause time. Change-Id: I2dd94c8517a19297a98ec280203cccaa58792522 Reviewed-on: https://go-review.googlesource.com/9178 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-24 20:10:14 +00:00
Austin Clements	571ebae6ef	runtime: track scan work performed during concurrent mark This tracks the amount of scan work in terms of scanned pointers during the concurrent mark phase. We'll use this information to estimate scan work for the next cycle. Currently this aggregates the work counter in gcWork and dispose atomically aggregates this into a global work counter. dispose happens relatively infrequently, so the contention on the global counter should be low. If this turns out to be an issue, we can reduce the number of disposes, and if it's still a problem, we can switch to per-P counters. Change-Id: Iac0364c466ee35fab781dbbbe7970a5f3c4e1fc1 Reviewed-on: https://go-review.googlesource.com/8832 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-21 15:35:00 +00:00
Austin Clements	50a66562a0	runtime: track heap bytes marked by GC This tracks the number of heap bytes marked by a GC cycle. We'll use this information to precisely trigger the next GC cycle. Currently this aggregates the work counter in gcWork and dispose atomically aggregates this into a global work counter. dispose happens relatively infrequently, so the contention on the global counter should be low. If this turns out to be an issue, we can reduce the number of disposes, and if it's still a problem, we can switch to per-P counters. Change-Id: I1bc377cb2e802ef61c2968602b63146d52e7f5db Reviewed-on: https://go-review.googlesource.com/8388 Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-06 21:28:07 +00:00
Austin Clements	a2f3d73fee	runtime: improve comment about non-preemption during GC work Currently, gcDrainN is documented saying that it must be run on the system stack. In fact, the problem and solution here are somewhat subtler. First, it doesn't have to happen on the system stack, it just has to be non-stoppable (that is, non-preemptible). Second, this isn't specific to gcDrainN (though gcDrainN is perhaps the most surprising instance); it's general to anything that uses the gcWork structure. Move the comment to gcWork and generalize it. Change-Id: I5277b5abb070e47f8d783bc15a310b379c6adc22 Reviewed-on: https://go-review.googlesource.com/8247 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-03-31 01:05:38 +00:00
Austin Clements	653426f08f	runtime: exit getfull barrier if there are partial workbufs Currently, we only exit the getfull barrier if there is work on the full list, even though the exit path will take work from either the full or partial list. Change this to exit the barrier if there is work on either the full or partial lists. I believe it's currently safe to check only the full list, since during mark termination there is no reason to put a workbuf on a partial list. However, checking both is more robust. Change-Id: Icf095b0945c7cad326a87ff2f1dc49b7699df373 Reviewed-on: https://go-review.googlesource.com/7840 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-03-20 14:05:11 +00:00
Austin Clements	cadd4f81a8	runtime: combine gcWorkProducer into gcWork The distinction between gcWorkProducer and gcWork (producer and consumer) is not serving us as originally intended, so merge these into just gcWork. The original intent was to replace the currentwbuf cache with a gcWorkProducer. However, with gchelpwork (aka mutator assists), mutators can both produce and consume work, so it will make more sense to cache a whole gcWork. Change-Id: I6e633e96db7cb23a64fbadbfc4607e3ad32bcfb3 Reviewed-on: https://go-review.googlesource.com/7733 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-03-19 15:55:21 +00:00
Austin Clements	c25c371098	runtime: use more natural types in struct workbuf Until recently, struct workbuf had only lfnode and uintptr fields before the obj array to make it convenient to compute the size of the obj array. It slowly grew more fields until this became inconvenient enough that it was restructured to make the size computation easy. Now the size computation doesn't care what the field types are, so switch to more natural types. Change-Id: I966140ba7ebb4aeb41d5c66d9d2a3bdc17dd4bcf Reviewed-on: https://go-review.googlesource.com/5262 Reviewed-by: Russ Cox <rsc@golang.org>	2015-02-19 17:00:30 +00:00
Austin Clements	b30d19de59	runtime: introduce higher-level GC work abstraction This introduces a producer/consumer abstraction for GC work pointers that internally handles the details of filling, draining, and shuffling work buffers. In addition to simplifying the GC code, this should make it easy for us to change how we use work buffers, including cleaning up how we use the work.partial queue, reintroducing a FIFO lookahead cache, adding prefetching, and using dual buffers to avoid flapping. This commit doesn't change any existing code. The following commit will switch the garbage collector from explicit workbuf manipulation to gcWork. Change-Id: Ifbfe5fff45bf0362d6d7c3cecb061f0c9874077d Reviewed-on: https://go-review.googlesource.com/5231 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-02-19 16:59:26 +00:00
Austin Clements	1b205857a4	runtime: drop unused workbufhdr.id field Change-Id: If7729b3c7df6dc7fcd41f293e2ef2472c769fe8b Reviewed-on: https://go-review.googlesource.com/5261 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-02-19 15:53:23 +00:00
Austin Clements	8ed95a942c	runtime: rename gcwork.go to mgcwork.go All of the other memory-related source files start with "m". Keep up the tradition. Change-Id: Idd88fdbf2a1453374fa12109b949b1c4d149a4f8 Reviewed-on: https://go-review.googlesource.com/4853 Reviewed-by: Minux Ma <minux@golang.org>	2015-02-17 18:42:41 +00:00

22 Commits