qbit/go - go - Tape:neT

qbit/go

mirror of https://github.com/golang/go synced 2024-11-14 09:20:28 -07:00

Author	SHA1	Message	Date
Austin Clements	8e8219deb5	runtime: update gcController.scanWork regularly Currently, gcController.scanWork is updated as lazily as possible since it is only read at the end of the GC cycle. We're about to read it during the GC cycle to improve the assist ratio revisions, so modify gcDrain* to regularly flush to gcController.scanWork in much the same way as we regularly flush to gcController.bgScanCredit. One consequence of this is that it's difficult to keep gcw.scanWork monotonic, so we give up on that and simply return the amount of scan work done by gcDrainN rather than calculating it in the caller. Change-Id: I7b50acdc39602f843eed0b5c6d2dacd7e762b81d Reviewed-on: https://go-review.googlesource.com/15407 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-10-09 19:38:29 +00:00
Austin Clements	c18b163c15	runtime: control background scan credit flushing with flag Currently callers of gcDrain control whether it flushes scan work credit to gcController.bgScanCredit by passing a value other than -1 for the flush threshold. Shortly we're going to make this always flush scan work to gcController.scanWork and optionally also flush scan work to gcController.bgScanCredit. This will be much easier if the flush threshold is simply a constant (which it is in practice) and callers merely control whether or not the flush includes the background credit. Hence, replace the flush threshold argument with a flag. Change-Id: Ia27db17de8a3f1e462a5d7137d4b5dc72f99a04e Reviewed-on: https://go-review.googlesource.com/15406 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-10-09 19:38:16 +00:00
Austin Clements	9b3cdaf0a3	runtime: consolidate gcDrain and gcDrainUntilPreempt These functions were nearly identical. Consolidate them by adding a flags argument. In addition to cleaning up this code, this makes further changes that affect both functions easier. Change-Id: I6ec5c947603bbbd3ff4040113b2fbc240e99745f Reviewed-on: https://go-review.googlesource.com/15405 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-10-09 19:38:03 +00:00
Austin Clements	39ed682206	runtime: explain why continuous assist revising is necessary Change-Id: I950af8d80433b3ae8a1da0aa7a8d2d0b295dd313 Reviewed-on: https://go-review.googlesource.com/15404 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-10-09 19:37:53 +00:00
Austin Clements	3e57b17dc3	runtime: fix comment for assistRatio The comment for assistRatio claimed it to be the reciprocal of what it actually is. Change-Id: If7f9bb853d75d0097facff3aa6704b224d9108b8 Reviewed-on: https://go-review.googlesource.com/15402 Reviewed-by: Russ Cox <rsc@golang.org>	2015-10-09 19:37:23 +00:00
Austin Clements	9a31d38f65	runtime: remove sweep wait loop in finishsweep_m In general, finishsweep_m must block until any spans that are concurrently being swept have been swept. It accomplishes this by looping over all spans, which, as in the previous commit, takes ~1ms/heap GB. Unfortunately, we do this during the STW sweep termination phase, so multi-gigabyte heaps can push our STW time past 10ms. However, there's no need to do this wait if the world is stopped because, in effect, stopping the world already had to wait for anything that was sweeping (and if it didn't, the wait in finishsweep_m would deadlock). Hence, we can simply skip this loop if the world is stopped, such as during sweep termination. In fact, currently all calls to finishsweep_m are STW, but this hasn't always been the case and may not be the case in the future, so we keep the logic around. For 24GB heaps, this reduces max pause time by 75% relative to tip and by 90% relative to Go 1.5. Notably, all pauses are now well under 10ms. Here are the results for the garbage benchmark: ------------- max pause ------------ Heap Procs after change before change 1.5.1 24GB 12 3.8ms 16ms 37ms 24GB 4 3.7ms 16ms 37ms 4GB 4 3.7ms 3ms 6.9ms In the 4GB/4P case, it seems the "before change" run got lucky: the max went up, but the 99%ile pause time went down from 3ms to 2.04ms. Change-Id: Ica22189559f231d408ef2815019c9dbb5f38bf31 Reviewed-on: https://go-review.googlesource.com/15071 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-02 19:56:01 +00:00
Austin Clements	dac220b0a9	runtime: remove in-use page count loop from STW In order to compute the sweep ratio, the runtime needs to know how many pages belong to spans in state _MSpanInUse. Currently it finds this out by looping over all spans during mark termination. However, this takes ~1ms/heap GB, so multi-gigabyte heaps can quickly push our STW time past 10ms. Replace the loop with an actively maintained count of in-use pages. For multi-gigabyte heaps, this reduces max mark termination pause time by 75%–90% relative to tip and by 85%–95% relative to Go 1.5.1. This shifts the longest pause time for large heaps to the sweep termination phase, so it only slightly decreases max pause time, though it roughly halves mean pause time. Here are the results for the garbage benchmark: ---- max mark termination pause ---- Heap Procs after change before change 1.5.1 24GB 12 1.9ms 18ms 37ms 24GB 4 3.7ms 18ms 37ms 4GB 4 920µs 3.8ms 6.9ms Fixes #11484. Change-Id: Ia2d28bb8a1e4f1c3b8ebf79fb203f12b9bf114ac Reviewed-on: https://go-review.googlesource.com/15070 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-02 19:55:55 +00:00
Austin Clements	608c1b0d56	runtime: scan objects with finalizers concurrently This reduces pause time by ~25% relative to tip and by ~50% relative to Go 1.5.1. Currently one of the steps of STW mark termination is to loop (in parallel) over all spans to find objects with finalizers in order to mark all objects reachable from these objects and to treat the finalizer special as a root. Unfortunately, even if there are no finalizers at all, this loop takes roughly 1 ms/heap GB/core, so multi-gigabyte heaps can quickly push our STW time past 10ms. Fix this by moving this scan from mark termination to concurrent scan, where it can run in parallel with mutators. The loop itself could also be optimized, but this cost is small compared to concurrent marking. Making this scan concurrent introduces two complications: 1) The scan currently walks the specials list of each span without locking it, which is safe only with the world stopped. We fix this by speculatively checking if a span has any specials (the vast majority won't) and then locking the specials list only if there are specials to check. 2) An object can have a finalizer set after concurrent scan, in which case it won't have been marked appropriately by concurrent scan. If the finalizer is a closure and is only reachable from the special, it could be swept before it is run. Likewise, if the object is not marked yet when the finalizer is set and then becomes unreachable before it is marked, other objects reachable only from it may be swept before the finalizer function is run. We fix this issue by making addfinalizer ensure the same marking invariants as markroot does. For multi-gigabyte heaps, this reduces max pause time by 20%–30% relative to tip (depending on GOMAXPROCS) and by ~50% relative to Go 1.5.1 (where this loop was neither concurrent nor parallel). Here are the results for the garbage benchmark: ---------------- max pause ---------------- Heap Procs Concurrent scan STW parallel scan 1.5.1 24GB 12 18ms 23ms 37ms 24GB 4 18ms 25ms 37ms 4GB 4 3.8ms 4.9ms 6.9ms In all cases, 95%ile pause time is similar to the max pause time. This also improves mean STW time by 10%–30%. Fixes #11485. Change-Id: I9359d8c3d120a51d23d924b52bf853a1299b1dfd Reviewed-on: https://go-review.googlesource.com/14982 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-02 19:55:48 +00:00
Austin Clements	fbd2660af3	runtime: introduce gcMode type for GC modes Currently, the GC modes constants are untyped and functions pass them around as ints. Clean this up by introducing a proper type for these constant. Change-Id: Ibc022447bdfa203644921fbb548312d7e2272e8d Reviewed-on: https://go-review.googlesource.com/14981 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-02 19:55:41 +00:00
Austin Clements	4ac4085f8e	runtime: minor clarifications of markroot This puts the _Root* indexes in a more friendly order and tweaks markrootSpans to use a for-range loop instead of its own indexing. Change-Id: I2c18d55c9a673ea396b6424d51ef4997a1a74825 Reviewed-on: https://go-review.googlesource.com/14548 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-09-14 19:37:44 +00:00
Austin Clements	572f08a064	runtime: split marking of span roots into 128 subtasks Marking of span roots can represent a significant fraction of the time spent in mark termination. Simply traversing the span list takes about 1ms per GB of heap and if there are a large number of finalizers (for example, for network connections), it may take much longer. Improve the situation by splitting the span scan into 128 subtasks that can be executed in parallel and load balanced by the markroots parallel for. This lets the GC balance this job across the Ps. A better solution is to do this during concurrent mark, or to improve it algorithmically, but this is a simple change with a lot of bang for the buck. This was suggested by Rhys Hiltner. Updates #11485. Change-Id: I8b281adf0ba827064e154a1b6cc32d4d8031c03c Reviewed-on: https://go-review.googlesource.com/13112 Reviewed-by: Keith Randall <khr@golang.org>	2015-09-14 18:15:40 +00:00
Austin Clements	92e68c89f6	runtime: move stack barrier code to its own file Currently the stack barrier code is mixed in with the mark and scan code. Move all of the stack barrier related functions and variables to a new dedicated source file. There are no code modifications. Change-Id: I604603045465ef8573b9f88915d28ab6b5910903 Reviewed-on: https://go-review.googlesource.com/14050 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-09-09 01:18:50 +00:00
Austin Clements	3bfc9df21a	runtime: add GODEBUG for stack barriers at every frame Currently enabling the debugging mode where stack barriers are installed at every frame requires recompiling the runtime. However, this is potentially useful for field debugging and for runtime tests, so make this mode a GODEBUG. Updates #12238. Change-Id: I6fb128f598b19568ae723a612e099c0ed96917f5 Reviewed-on: https://go-review.googlesource.com/13947 Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-30 16:06:55 +00:00
Austin Clements	d57f037302	runtime: don't recheck heap trigger for periodic GC `88e945f` introduced a non-speculative double check of the heap trigger before actually starting a concurrent GC. This was necessary to fix a race for heap-triggered GC, but broke sysmon-triggered periodic GC, since the heap check will of course fail for periodically triggered GC. Fix this by telling startGC whether or not this GC was triggered by heap size or a timer and only doing the heap size double check for GCs triggered by heap size. Fixes #12026. Change-Id: I7c3f6ec364545c36d619f2b4b3bf3b758e3bcbd6 Reviewed-on: https://go-review.googlesource.com/13168 Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-05 17:28:56 +00:00
Austin Clements	be39a42920	runtime: fix typos in comments Change-Id: I66f7937b22bb6e05c3f2f0f2a057151020ad9699 Reviewed-on: https://go-review.googlesource.com/13049 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-04 18:54:56 +00:00
Austin Clements	e3870aa6f3	runtime: fix assist utilization computation When commit `510fd13` enabled assists during the scan phase, it failed to also update the code in the GC controller that computed the assist CPU utilization and adjusted the trigger based on it. Fix that code so it uses the start of the scan phase as the wall-clock time when assists were enabled rather than the start of the mark phase. Change-Id: I05013734b4448c3e2c730dc7b0b5ee28c86ed8cf Reviewed-on: https://go-review.googlesource.com/13048 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-04 18:54:53 +00:00
Austin Clements	1fb01a88f9	runtime: revise assist ratio aggressively At the start of a GC cycle, the garbage collector computes the assist ratio based on the total scannable heap size. This was intended to be conservative; after all, this assumes the entire heap may be reachable and hence needs to be scanned. But it only assumes that the current entire heap may be reachable. It fails to account for heap allocated during the GC cycle. If the trigger ratio is very low (near zero), and most of the heap is reachable when GC starts (which is likely if the trigger ratio is near zero), then it's possible for the mutator to create new, reachable heap fast enough that the assists won't keep up based on the assist ratio computed at the beginning of the cycle. As a result, the heap can grow beyond the heap goal (by hundreds of megs in stress tests like in issue #11911). We already have some vestigial logic for dealing with situations like this; it just doesn't run often enough. Currently, every 10 ms during the GC cycle, the GC revises the assist ratio. This was put in before we switched to a conservative assist ratio (when we really were using estimates of scannable heap), and it turns out to be exactly what we need now. However, every 10 ms is far too infrequent for a rapidly allocating mutator. This commit reuses this logic, but replaces the 10 ms timer with revising the assist ratio every time the heap is locked, which coincides precisely with when the statistics used to compute the assist ratio are updated. Fixes #11911. Change-Id: I377b231ab064946228378fa10422a46d1b50f4c5 Reviewed-on: https://go-review.googlesource.com/13047 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-04 18:54:48 +00:00
Austin Clements	f9dc3382ad	runtime: when gcpacertrace > 0, print information about assist ratio This was useful in debugging the mutator assist behavior for #11911, and it fits with the other gcpacertrace output. Change-Id: I1e25590bb4098223a160de796578bd11086309c7 Reviewed-on: https://go-review.googlesource.com/13046 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-04 18:54:46 +00:00
Austin Clements	fc9ca85f4c	runtime: make sweep proportional to spans bytes allocated Proportional concurrent sweep is currently based on a ratio of spans to be swept per bytes of object allocation. However, proportional sweeping is performed during span allocation, not object allocation, in order to minimize contention and overhead. Since objects are allocated from spans after those spans are allocated, the system tends to operate in debt, which means when the next GC cycle starts, there is often sweep debt remaining, so GC has to finish the sweep, which delays the start of the cycle and delays enabling mutator assists. For example, it's quite likely that many Ps will simultaneously refill their span caches immediately after a GC cycle (because GC flushes the span caches), but at this point, there has been very little object allocation since the end of GC, so very little sweeping is done. The Ps then allocate objects from these cached spans, which drives up the bytes of object allocation, but since these allocations are coming from cached spans, nothing considers whether more sweeping has to happen. If the sweep ratio is high enough (which can happen if the next GC trigger is very close to the retained heap size), this can easily represent a sweep debt of thousands of pages. Fix this by making proportional sweep proportional to the number of bytes of spans allocated, rather than the number of bytes of objects allocated. Prior to allocating a span, both the small object path and the large object path ensure credit for allocating that span, so the system operates in the black, rather than in the red. Combined with the previous commit, this should eliminate all sweeping from GC start up. On the stress test in issue #11911, this reduces the time spent sweeping during GC (and delaying start up) by several orders of magnitude: mean 99%ile max pre fix 1 ms 11 ms 144 ms post fix 270 ns 735 ns 916 ns Updates #11911. Change-Id: I89223712883954c9d6ec2a7a51ecb97172097df3 Reviewed-on: https://go-review.googlesource.com/13044 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-04 18:54:44 +00:00
Austin Clements	e30c6d64ba	runtime: always give concurrent sweep some heap distance Currently it's possible for the next_gc heap size trigger computed for the next GC cycle to be less than the current allocated heap size. This means the next cycle will start immediately, which means there's no time to perform the concurrent sweep between GC cycles. This places responsibility for finishing the sweep on GC itself, which delays GC start-up and hence delays mutator assist. Fix this by ensuring that next_gc is always at least a little higher than the allocated heap size, so we won't trigger the next cycle instantly. Updates #11911. Change-Id: I74f0b887bf187518d5fedffc7989817cbcf30592 Reviewed-on: https://go-review.googlesource.com/13043 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-04 18:54:41 +00:00
Austin Clements	88e945fd23	runtime: recheck GC trigger before actually starting GC Currently allocation checks the GC trigger speculatively during allocation and then triggers the GC without rechecking. As a result, it's possible for G 1 and G 2 to detect the trigger simultaneously, both enter startGC, G 1 actually starts GC while G 2 gets preempted until after the whole GC cycle, then G 2 immediately starts another GC cycle even though the heap is now well under the trigger. Fix this by re-checking the GC trigger non-speculatively just before actually kicking off a new GC cycle. This contributes to #11911 because when this happens, we definitely don't finish the background sweep before starting the next GC cycle, which can significantly delay the start of concurrent scan. Change-Id: I560ab79ba5684ba435084410a9765d28f5745976 Reviewed-on: https://go-review.googlesource.com/13025 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-08-04 18:54:32 +00:00
Russ Cox	abdc77a288	runtime: avoid reference to stale stack after GC shrinkstack Dangling pointer error. Unlikely to trigger in practice, but still. Found by running GODEBUG=efence=1 GOGC=1 trace.test. Change-Id: Ice474dedcf62dd33ab77526287a023ba3b166db9 Reviewed-on: https://go-review.googlesource.com/12991 Reviewed-by: Austin Clements <austin@google.com>	2015-07-31 02:18:42 +00:00
Austin Clements	23e4744c07	runtime: report GC CPU utilization in MemStats This adds a GCCPUFraction field to MemStats that reports the cumulative fraction of the program's execution time spent in the garbage collector. This is equivalent to the utilization percent shown in the gctrace output and makes this available programmatically. This does make one small effect on the gctrace output: we now report the duration of mark termination up to just before the final start-the-world, rather than up to just after. However, unlike stop-the-world, I don't believe there's any way that start-the-world can block, so it should take negligible time. While there are many statistics one might want to expose via MemStats, this is one of the few that will undoubtedly remain meaningful regardless of future changes to the memory system. The diff for this change is larger than the actual change. Mostly it lifts the code for computing the GC CPU utilization out of the debug.gctrace path. Updates #10323. Change-Id: I0f7dc3fdcafe95e8d1233ceb79de606b48acd989 Reviewed-on: https://go-review.googlesource.com/12844 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-29 20:23:34 +00:00
Austin Clements	4b71660c5b	runtime: always capture GC phase transition times Currently we only capture GC phase transition times if debug.gctrace>0, but we're about to compute GC CPU utilization regardless of whether debug.gctrace is set, so we need these regardless of debug.gctrace. Change-Id: If3acf16505a43d416e9a99753206f03287180660 Reviewed-on: https://go-review.googlesource.com/12843 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-07-29 20:23:25 +00:00
Rick Hudson	e95bc5fef7	runtime: force mutator to give work buffer to GC The scheduler, work buffer's dispose, and write barriers can conspire to hide the a pointer from the GC's concurent mark phase. If this pointer is the only path to a large amount of marking the STW mark termination phase may take a lot of time. Consider the following: 1) dispose places a work buffer on the partial queue 2) the GC is busy so it does not immediately remove and process the work buffer 3) the scheduler runs a mutator whose write barrier dequeues the work buffer from the partial queue so the GC won't see it This repeats until the GC reaches the mark termination phase where the GC finally discovers the pointer along with a lot of work to do. This CL fixes the problem by having the mutator dispose of the buffer to the full queue instead of the partial queue. Since the write buffer never asks for full buffers the conspiracy described above is not possible. Updates #11694. Change-Id: I2ce832f9657a7570f800e8ce4459cd9e304ef43b Reviewed-on: https://go-review.googlesource.com/12840 Reviewed-by: Austin Clements <austin@google.com>	2015-07-29 18:56:11 +00:00
Austin Clements	c1f7a56fc0	runtime: close window that hides GC work from concurrent mark Currently we enter mark 2 by first flushing all existing gcWork caches and then setting gcBlackenPromptly, which disables further gcWork caching. However, if a worker or assist pulls a work buffer in to its gcWork cache after that cache has been flushed but before caching is disabled, that work may remain in that cache until mark termination. If that work represents a heap bottleneck (e.g., a single pointer that is the only way to reach a large amount of the heap), this can force mark termination to do a large amount of work, resulting in a long STW. Fix this by reversing the order of these steps: first disable caching, then flush all existing caches. Rick Hudson <rlh> did the hard work of tracking this down. This CL combined with CL 12672 and CL 12646 distills the critical parts of his fix from CL 12539. Fixes #11694. Change-Id: Ib10d0a21e3f6170a80727d0286f9990df049fed2 Reviewed-on: https://go-review.googlesource.com/12688 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-07-27 20:00:25 +00:00
Austin Clements	510fd1350d	runtime: enable GC assists ASAP Currently the GC coordinator enables GC assists at the same time it enables background mark workers, after the concurrent scan phase is done. However, this means a rapidly allocating mutator has the entire scan phase during which to allocate beyond the heap trigger and potentially beyond the heap goal with no back-pressure from assists. This prevents the feedback system that's supposed to keep the heap size under the heap goal from doing its job. Fix this by enabling mutator assists during the scan phase. This is safe because the write barrier is already enabled and globally acknowledged at this point. There's still a very small window between when the heap size reaches the heap trigger and when the GC coordinator is able to stop the world during which the mutator can allocate unabated. This allows very rapidly allocator mutators like TestTraceStress to still occasionally exceed the heap goal by a small amount (~20 MB at most for TestTraceStress). However, this seems like a corner case. Fixes #11677. Change-Id: I0f80d949ec82341cd31ca1604a626efb7295a819 Reviewed-on: https://go-review.googlesource.com/12674 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-27 19:59:05 +00:00
Austin Clements	64a32ffeee	runtime: don't start workers between mark 1 & 2 Currently we clear both the mark 1 and mark 2 signals at the beginning of concurrent mark. If either if these is clear, it acts as a signal to the scheduler that it should start background workers. However, this means that in the interim between mark 1 and mark 2, the scheduler basically loops starting up new workers only to have them return with nothing to do. In addition to harming performance and delaying mutator work, this approach has a race where workers started for mark 1 can mistakenly signal mark 2, causing it to complete prematurely. This approach also interferes with starting assists earlier to fix #11677. Fix this by initially setting both mark 1 and mark 2 to "signaled". The scheduler will not start background mark workers, though assists can still run. When we're ready to enter mark 1, we clear the mark 1 signal and wait for it. Then, when we're ready to enter mark 2, we clear the mark 2 signal and wait for it. This structure also lets us deal cleanly with the situation where all work is drained prior to the mark 2 wait, meaning that there may be no workers to signal completion. Currently we deal with this using a racy (and possibly incorrect) check for work in the coordinator itself to skip the mark 2 wait if there's no work. This change makes the coordinator unconditionally wait for mark completion and makes the scheduler itself signal completion by slightly extending the logic it already has to determine that there's no work and hence no use in starting a new worker. This is a prerequisite to fixing the remaining component of #11677, which will require enabling assists during the scan phase. However, we don't want to enable background workers until the mark phase because they will compete with the scan. This change lets us use bgMark1 and bgMark2 to indicate when it's okay to start background workers independent of assists. This is also a prerequisite to fixing #11694. It significantly reduces the occurrence of long mark termination pauses in #11694 (from 64 out of 1000 to 2 out of 1000 in one experiment). Coincidentally, this also reduces the final heap size (and hence run time) of TestTraceStress from ~100 MB and ~1.9 seconds to ~14 MB and ~0.4 seconds because it significantly shortens concurrent mark duration. Rick Hudson <rlh> did the hard work of tracking this down. Change-Id: I12ea9ee2db9a0ae9d3a90dde4944a75fcf408f4c Reviewed-on: https://go-review.googlesource.com/12672 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-27 19:59:02 +00:00
Austin Clements	500c88d40d	runtime: yield to GC coordinator after assist completion Currently it's possible for the GC assist to signal completion of the mark phase, which puts the GC coordinator goroutine on the current P's run queue, and then return to mutator code that delays until the next forced preemption before actually yielding control to the GC coordinator, dragging out completion of the mark phase. This delay can be further exacerbated if the mutator makes other goroutines runnable before yielding control, since this will push the GC coordinator on the back of the P's run queue. To fix this, this adds a Gosched to the assist if it completed the mark phase. This immediately and directly yields control to the GC coordinator. This already happens implicitly in the background mark workers because they park immediately after completing the mark. This is one of the reasons completion of the mark phase is being dragged out and allowing the mutator to allocate without assisting, leading to the large heap goal overshoot in issue #11677. This is also a prerequisite to making the assist block when it can't pay off its debt. Change-Id: I586adfbecb3ca042a37966752c1dc757f5c7fc78 Reviewed-on: https://go-review.googlesource.com/12670 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-27 19:59:00 +00:00
Austin Clements	cf225a1748	runtime: fix mark 2 completion in fractional/idle workers Currently fractional and idle mark workers dispose of their gcWork cache during mark 2 after incrementing work.nwait and after checking whether there are any workers or any work available. This creates a window for two races: 1) If the only remaining work is in this worker's gcWork cache, it will see that there are no more workers and no more work on the global lists (since it has not yet flushed its own cache) and prematurely signal mark 2 completion. 2) After this worker has incremented work.nwait but before it has flushed its cache, another worker may observe that there are no more workers and no more work and prematurely signal mark 2 completion. We can fix both of these by simply moving the cache flush above the increment of nwait and the test of the completion condition. This is probably contributing to #11694, though this alone is not enough to fix it. Change-Id: Idcf9656e5c460c5ea0d23c19c6c51e951f7716c3 Reviewed-on: https://go-review.googlesource.com/12646 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-27 19:58:56 +00:00
Austin Clements	7eeeae2a5c	runtime: always report starting heap size in gctrace Currently the gctrace output reports the trigger heap size, rather than the actual heap size at the beginning of GC. Often these are the same, or at least very close. However, it's possible for the heap to already have exceeded this trigger when we first check the trigger and start GC; in this case, this output is very misleading. We've encountered this confusion a few times when debugging and this behavior is difficult to document succinctly. Change the gctrace output to report the actual heap size when GC starts, rather than the trigger. Change-Id: I246b3ccae4c4c7ea44c012e70d24a46878d7601f Reviewed-on: https://go-review.googlesource.com/12452 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-27 17:45:28 +00:00
Austin Clements	cc6ed285e5	runtime: remove # from gctrace line Whenever someone pastes gctrace output into GitHub, it helpfully turns the GC cycle number into a link to some unrelated issue. Prevent this by removing the pound before the cycle number. The fact that this is a cycle number is probably more obvious at a glance than most of the other numbers. Change-Id: Ifa5fc7fe6c715eac50e639f25bc36c81a132ffea Reviewed-on: https://go-review.googlesource.com/12413 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-27 17:45:22 +00:00
Austin Clements	1942e3814b	runtime: clarify runtime.GC blocking behavior The runtime.GC documentation was rewritten in `df2809f` to make it clear that it blocks until GC is complete, but the re-rewrite in `ed9a4c9` and `e28a679` lost this property when clarifying that it may also block the entire program and not just the caller. Try to arrive at wording that conveys both of these properties. Change-Id: I1e255322aa28a21a548556ecf2a44d8d8ac524ef Reviewed-on: https://go-review.googlesource.com/12392 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Rob Pike <r@golang.org>	2015-07-19 15:10:06 +00:00
Rob Pike	e28a679216	runtime: make the GC message less committal. We shouldn't guarantee this behavior, but suggest it's possible. Change-Id: I4c2afb48b99be4d91537306d3337171a13c9990a Reviewed-on: https://go-review.googlesource.com/12346 Reviewed-by: David Crawshaw <crawshaw@golang.org>	2015-07-18 00:28:50 +00:00
Rob Pike	ed9a4c91c2	runtime: document that GC blocks the whole program No code changes. Just make it clear that runtime.GC is not concurrent. Change-Id: I00a99ebd26402817c665c9a128978cef19f037be Reviewed-on: https://go-review.googlesource.com/12345 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-07-17 22:40:21 +00:00
Austin Clements	777ab5ce1a	runtime: fix MemStats.{PauseNS,PauseEnd,PauseTotalNS,LastGC} These memstats are currently being computed by gcMark, which was appropriate in Go 1.4, but gcMark is now just one part of a bigger picture. In particular, it can't account for the sweep termination pause time, it can't account for all of the mark termination pause time, and the reported "pause end" and "last GC" times will be slightly earlier than they really are. Lift computing of these statistics into func gc, which has the appropriate visibility into the process to compute them correctly. Fixes one of the issues in #10323. This does not add new statistics appropriate to the concurrent collector; it simply fixes existing statistics that are being misreported. Change-Id: I670cb16594a8641f6b27acf4472db15b6e8e086e Reviewed-on: https://go-review.googlesource.com/11794 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-13 23:32:59 +00:00
Austin Clements	ad60cd8b92	runtime: report MemStats.PauseEnd in UNIX time Currently we report MemStats.PauseEnd in nanoseconds, but with no particular 0 time. On Linux, the 0 time is when the host started. On Darwin, it's the UNIX epoch. This is also inconsistent with the other absolute time in MemStats, LastGC, which is always reported in nanoseconds since 1970. Fix PauseEnd so it's always reported in nanoseconds since 1970, like LastGC. Fixes one of the issues raised in #10323. Change-Id: Ie2fe3169d45113992363a03b764f4e6c47e5c6a8 Reviewed-on: https://go-review.googlesource.com/11801 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-13 23:32:02 +00:00
Brad Fitzpatrick	2ae77376f7	all: link to https instead of http The one in misc/makerelease/makerelease.go is particularly bad and probably warrants rotating our keys. I didn't update old weekly notes, and reverted some changes involving test code for now, since we're late in the Go 1.5 freeze. Otherwise, the rest are all auto-generated changes, and all manually reviewed. Change-Id: Ia2753576ab5d64826a167d259f48a2f50508792d Reviewed-on: https://go-review.googlesource.com/12048 Reviewed-by: Rob Pike <r@golang.org>	2015-07-11 14:36:33 +00:00
Austin Clements	1b917484a8	runtime: reset mark state before checkmark and gctrace=2 mark Currently we fail to reset the live heap accounting state before the checkmark mark and before the gctrace=2 extra mark. As a result, if either are enabled, at the end of GC it thinks there are 0 bytes of live heap, which causes the GC controller to initiate a new GC immediately, regardless of the true heap size. Fix this by factoring this state reset into a function and calling it before all three possible marks. This function should be merged with gcResetGState, but doing so requires some additional cleanup, so it will wait for after the freeze. Filed #11427 for this cleanup. Fixes #10492. Change-Id: Ibe46348916fc8368fac6f086e142815c970a6f4d Reviewed-on: https://go-review.googlesource.com/11561 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-29 15:58:29 +00:00
Austin Clements	d57056ba26	runtime: don't free stack spans during GC Memory for stacks is manually managed by the runtime and, currently (with one exception) we free stack spans immediately when the last stack on a span is freed. However, the garbage collector assumes that spans can never transition from non-free to free during scan or mark. This disagreement makes it possible for the garbage collector to mark uninitialized objects and is blocking us from re-enabling the bad pointer test in the garbage collector (issue #9880). For example, the following sequence will result in marking an uninitialized object: 1. scanobject loads a pointer slot out of the object it's scanning. This happens to be one of the special pointers from the heap into a stack. Call the pointer p and suppose it points into X's stack. 2. X, running on another thread, grows its stack and frees its old stack. 3. The old stack happens to be large or was the last stack in its span, so X frees this span, setting it to state _MSpanFree. 4. The span gets reused as a heap span. 5. scanobject calls heapBitsForObject, which loads the span containing p, which is now in state _MSpanInUse, but doesn't necessarily have an object at p. The not-object at p gets marked, and at this point all sorts of things can go wrong. We already have a partial solution to this. When shrinking a stack, we put the old stack on a queue to be freed at the end of garbage collection. This was done to address exactly this problem, but wasn't a complete solution. This commit generalizes this solution to both shrinking and growing stacks. For stacks that fit in the stack pool, we simply don't free the span, even if its reference count reaches zero. It's fine to reuse the span for other stacks, and this enables that. At the end of GC, we sweep for cached stack spans with a zero reference count and free them. For larger stacks, we simply queue the stack span to be freed at the end of GC. Ideally, we would reuse these large stack spans the way we can small stack spans, but that's a more invasive change that will have to wait until after the freeze. Fixes #11267. Change-Id: Ib7f2c5da4845cc0268e8dc098b08465116972a71 Reviewed-on: https://go-review.googlesource.com/11502 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-29 15:33:40 +00:00
Austin Clements	f73b2fca84	runtime: remove unused _GCsweep state We don't use this state. _GCoff means we're sweeping in the background. This makes it clear in the next commit that _GCoff and only _GCoff means sweeping. Change-Id: I416324a829ba0be3794a6cf3cf1655114cb6e47c Reviewed-on: https://go-review.googlesource.com/11501 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-29 15:33:31 +00:00
Rick Hudson	90a19961f2	runtime: reduce latency by aggressively ending mark phase Some latency regressions have crept into our system over the past few weeks. This CL fixes those by having the mark phase more aggressively blacken objects so that the mark termination phase, a STW phase, has less work to do. Three approaches were taken when the mark phase believes it has no more work to do, ie all the work buffers are empty. If things have gone well the mark phase is correct and there is in fact little or no work. In that case the following items will take very little time. If the mark phase is wrong this CL will ferret that work out and give the mark phase a chance to deal with it concurrently before mark termination begins. When the mark phase first appears to be out of work, it does three things: 1) It switches from allocating white to allocating black to reduce the number of unmarked objects reachable only from stacks. 2) It flushes and disables per-P GC work caches so all work must be in globally visible work buffers. 3) It rescans the global roots---the BSS and data segments---so there are fewer objects to blacken during mark termination. We do not rescan stacks at this point, though that could be done in a later CL. After these steps, it again drains the global work buffers. On a lightly loaded machine the garbage benchmark has reduced the number of GC cycles with latency > 10 ms from 83 out of 4083 cycles down to 2 out of 3995 cycles. Maximum latency was reduced from 60+ msecs down to 20 ms. Change-Id: I152285b48a7e56c5083a02e8e4485dd39c990492 Reviewed-on: https://go-review.googlesource.com/10590 Reviewed-by: Austin Clements <austin@google.com>	2015-06-18 21:38:46 +00:00
Russ Cox	3c60e6e8cf	runtime: fix races in stack scan This fixes a hang during runtime.TestTraceStress. It also fixes double-scan of stacks, which leads to stack barrier installation failures. Both of these have shown up as flaky failures on the dashboard. Fixes #10941. Change-Id: Ia2a5991ce2c9f43ba06ae1c7032f7c898dc990e0 Reviewed-on: https://go-review.googlesource.com/11089 Reviewed-by: Austin Clements <austin@google.com>	2015-06-17 17:56:26 +00:00
Russ Cox	d5b40b6ac2	runtime: add GODEBUG gcshrinkstackoff, gcstackbarrieroff, and gcstoptheworld variables While we're here, update the documentation and delete variables with no effect. Change-Id: I4df0d266dff880df61b488ed547c2870205862f0 Reviewed-on: https://go-review.googlesource.com/10790 Reviewed-by: Austin Clements <austin@google.com>	2015-06-15 17:31:04 +00:00
Ainar Garipov	7f9f70e5b6	all: fix misprints in comments These were found by grepping the comments from the go code and feeding the output to aspell. Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395 Reviewed-on: https://go-review.googlesource.com/10941 Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-11 14:18:57 +00:00
Austin Clements	1303957dbf	runtime: enable write barriers during concurrent scan Currently, write barriers are only enabled after completion of the concurrent scan phase, as we enter the concurrent mark phase. However, stack barriers are installed during the scan phase and assume that write barriers will track changes to frames above the stack barriers. Since write barriers aren't enabled until after stack barriers are installed, we may miss modifications to the stack that happen after installing the stack barriers and before enabling write barriers. Fix this by enabling write barriers during the scan phase. This commit intentionally makes the minimal change to do this (there's only one line of code change; the rest are comment changes). At the very least, we should consider eliminating the ragged barrier that's intended to synchronize the enabling of write barriers, but now just wastes time. I've included a large comment about extensions and alternative designs. Change-Id: Ib20fede794e4fcb91ddf36f99bd97344d7f96421 Reviewed-on: https://go-review.googlesource.com/10795 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-07 17:55:33 +00:00
Austin Clements	6f6403eddf	runtime: fix checkmarks to rescan stacks Currently checkmarks mode fails to rescan stacks because it sees the leftover state bits indicating that the stacks haven't changed since the last scan. As a result, it won't detect lost marks caused by failing to scan stacks correctly during regular garbage collection. Fix this by marking all stacks dirty before performing the checkmark phase. Change-Id: I1f06882bb8b20257120a4b8e7f95bb3ffc263895 Reviewed-on: https://go-review.googlesource.com/10794 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-07 17:55:12 +00:00
Austin Clements	10083d8007	runtime: print start of GC cycle in gctrace, rather than end Currently the GODEBUG=gctrace=1 trace line includes "@n.nnns" to indicate the time that the GC cycle ended relative to the time the program started. This was meant to be consistent with the utilization as of the end of the cycle, which is printed next on the trace line, but it winds up just being confusing and unexpected. Change the trace line to include the time that the GC cycle started relative to the time the program started. Change-Id: I7d64580cd696eb17540716d3e8a74a9d6ae50650 Reviewed-on: https://go-review.googlesource.com/10634 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-03 02:17:43 +00:00
Austin Clements	3f6e69aca5	runtime: steal space for stack barrier tracking from stack The stack barrier code will need a bookkeeping structure to keep track of the overwritten return PCs. This commit introduces and allocates this structure, but does not yet use the structure. We don't want to allocate space for this structure during garbage collection, so this commit allocates it along with the allocation of the corresponding stack. However, we can't do a regular allocation in newstack because mallocgc may itself grow the stack (which would lead to a recursive allocation). Hence, this commit makes the bookkeeping structure part of the stack allocation itself by stealing the necessary space from the top of the stack allocation. Since the size of this bookkeeping structure is logarithmic in the size of the stack, this has minimal impact on stack behavior. Change-Id: Ia14408be06aafa9ca4867f4e70bddb3fe0e96665 Reviewed-on: https://go-review.googlesource.com/10313 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-02 19:57:57 +00:00
Austin Clements	cc6a7fce53	runtime: increase precision of gctrace times Currently we truncate gctrace clock and CPU times to millisecond precision. As a result, many phases are typically printed as 0, which is fine for user consumption, but makes gathering statistics and reports over GC traces difficult. In 1.4, the gctrace line printed times in microseconds. This was better for statistics, but not as easy for users to read or interpret, and it generally made the trace lines longer. This change strikes a balance between these extremes by printing milliseconds, but including the decimal part to two significant figures down to microsecond precision. This remains easy to read and interpret, but includes more precision when it's useful. For example, where the code currently prints, gc #29 @1.629s 0%: 0+2+0+12+0 ms clock, 0+2+0+0/12/0+0 ms cpu, 4->4->2 MB, 4 MB goal, 1 P this prints, gc #29 @1.629s 0%: 0.005+2.1+0+12+0.29 ms clock, 0.005+2.1+0+0/12/0+0.29 ms cpu, 4->4->2 MB, 4 MB goal, 1 P Fixes #10970. Change-Id: I249624779433927cd8b0947b986df9060c289075 Reviewed-on: https://go-review.googlesource.com/10554 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-02 18:31:36 +00:00

1 2 3 4

164 Commits