mirror of https://github.com/golang/go synced 2024-11-19 16:44:43 -07:00
Commit Graph

283 Commits

Author SHA1 Message Date
Austin Clements
3b5637ff2b runtime: doubly fix "double wakeup" panic
runtime.gchelper depends on the non-atomic load of work.ndone
happening strictly before the atomic add of work.nwait. Until very
recently (commit 978af9c2db, fixing #20334), the compiler reordered
these operations. This created a race since work.ndone can change as
soon as work.nwait is equal to work.ndone. If that happened, more than
one gchelper could attempt to wake up the work.alldone note, causing a
"double wakeup" panic.

This was fixed in the compiler, but to make this code less subtle,
make the load of work.ndone atomic. This clearly forces the order of
these operations, ensuring the race doesn't happen.
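
For illustration, a minimal standalone sketch of the same ordering
requirement using sync/atomic (the names ndone, nwait, and done below
are illustrative, not the runtime's actual fields):

	package main

	import (
		"fmt"
		"sync"
		"sync/atomic"
	)

	func main() {
		const helpers = 4
		var ndone uint32 = helpers // fixed before the helpers run
		var nwait uint32
		var wg sync.WaitGroup
		done := make(chan struct{}) // closing this twice would be a "double wakeup"

		for i := 0; i < helpers; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				// The atomic load cannot be reordered after the
				// atomic add, so each helper compares against a
				// value read before it announced itself as done.
				n := atomic.LoadUint32(&ndone)
				if atomic.AddUint32(&nwait, 1) == n {
					close(done) // exactly one helper takes this branch
				}
			}()
		}
		<-done
		wg.Wait()
		fmt.Println("woke exactly once")
	}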

Fixes #19305 (though really 978af9c2db fixed it).

Change-Id: Ieb1a84e1e5044c33ac612c8a5ab6297e7db4c57d
Reviewed-on: https://go-review.googlesource.com/43311
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-12 15:33:09 +00:00
Austin Clements
227fff2ea4 runtime/debug: don't trigger a GC on SetGCPercent
Currently SetGCPercent forces a GC in order to recompute GC pacing.
Since we can now recompute pacing on the fly using gcSetTriggerRatio,
change SetGCPercent (really runtime.setGCPercent) to go through
gcSetTriggerRatio and not trigger a GC.
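
For reference, a hedged sketch of the public API this backs;
SetGCPercent still returns the previous setting, and with this change
adjusting it should no longer force a collection:

	package main

	import (
		"fmt"
		"runtime/debug"
	)

	func main() {
		// Lower the GC target to 50% heap growth; the old value is returned.
		old := debug.SetGCPercent(50)
		fmt.Println("previous GOGC:", old)

		// Restore the previous setting. Pacing is recomputed in place
		// rather than by triggering a collection.
		debug.SetGCPercent(old)
	}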

Fixes #19076.

Change-Id: Ib30d7ab1bb3b55219535b9f238108f3d45a1b522
Reviewed-on: https://go-review.googlesource.com/39835
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:42:02 +00:00
Austin Clements
1c4f3c5ea0 runtime: make gcSetTriggerRatio work at any time
This changes gcSetTriggerRatio so it can be called even during
concurrent mark or sweep. In this case, it will adjust the pacing of
the current phase, accounting for progress that has already been made.

To make this work for concurrent sweep, this introduces a "basis" for
the pagesSwept count, much like the basis we just introduced for
heap_live. This lets gcSetTriggerRatio shift the basis to the current
heap_live and pagesSwept and compute a slope from there to completion.
This avoids creating a discontinuity where, if the ratio has
increased, there has to be a flurry of sweep activity to catch up.
Instead, this creates a continuous, piecewise-linear function as
adjustments are made.

For #19076.

Change-Id: Ibcd76aeeb81ff4814b00be7cbd3530b73bbdbba9
Reviewed-on: https://go-review.googlesource.com/39833
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:59 +00:00
Austin Clements
a5eb3dceaf runtime: drive proportional sweep directly off heap_live
Currently, proportional sweep maintains its own count of how many
bytes have been allocated since the beginning of the sweep cycle so it
can compute how many pages need to be swept for a given allocation.

However, this requires a somewhat complex reimbursement scheme since
proportional sweep must be done before a span is allocated, but we
don't know how many bytes to charge until we've allocated a span. This
means that the allocated byte count used by proportional sweep can go
up and down, which has led to underflow bugs in the past (#18043) and
is going to interfere with adjusting sweep pacing on the fly (for #19076).

This approach also means we're maintaining a statistic that is very
closely related to heap_live, but has a different 0 value. This is
particularly confusing because the sweep ratio is computed based on
heap_live, so you have to understand that these two statistics are
very closely related.

Replace all of this and compute the sweep debt directly from the
current value of heap_live. To make this work, we simply save the
value of heap_live when the sweep ratio is computed to use as a
"basis" for later computing the sweep debt.

This eliminates the need for reimbursement as well as the code for
maintaining the sweeper's version of the live heap size.
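
Roughly, the basis idea looks like the following sketch (the names are
illustrative, not the runtime's actual fields):

	package main

	import "fmt"

	// sweepPacing records heap_live at the point the sweep ratio was
	// computed; the sweep debt is then a linear function of how far the
	// live heap has grown past that basis.
	type sweepPacing struct {
		heapLiveBasis uint64  // heap_live when the ratio was computed
		pagesPerByte  float64 // pages that must be swept per byte allocated
	}

	// pagesOwed reports how many pages should already have been swept
	// once the live heap reaches heapLive, given pagesSwept so far.
	func (p *sweepPacing) pagesOwed(heapLive, pagesSwept uint64) int64 {
		target := int64(float64(heapLive-p.heapLiveBasis) * p.pagesPerByte)
		return target - int64(pagesSwept)
	}

	func main() {
		p := &sweepPacing{heapLiveBasis: 64 << 20, pagesPerByte: 0.001}
		// 8 MB allocated past the basis at 0.001 pages/byte, 100 pages
		// already swept: 8288 more pages are owed.
		fmt.Println(p.pagesOwed(72<<20, 100))
	}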

For #19076.

Coincidentally fixes #18043, since this eliminates sweep reimbursement
entirely.

Change-Id: I1f931ddd6e90c901a3972c7506874c899251dc2a
Reviewed-on: https://go-review.googlesource.com/39832
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:57 +00:00
Austin Clements
ee175afac2 runtime: consolidate all trigger-derived computations
Currently, the computations that derive controls from the GC trigger
are spread across several parts of the mark termination code.
Consolidate computing the absolute trigger, the heap goal, and sweep
pacing into a single function called at the end of mark termination.

Unlike the code being consolidated, this has to be more careful about
negative gcpercent. Many of the consolidated code paths simply didn't
execute if GC was off.

This is a step toward being able to change the GC trigger ratio in the
middle of concurrent sweeping and marking. For this commit, we try to
stick close to the original structure of the code that's being
consolidated, so it doesn't yet support mid-cycle adjustments.

For #19076.

Change-Id: Ic5335be04b96ad20e70d53d67913a86bd6b31456
Reviewed-on: https://go-review.googlesource.com/39831
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:55 +00:00
Austin Clements
49a412a5b7 runtime: rationalize triggerRatio
gcController.triggerRatio is the only field in gcController that
persists across cycles. As global mutable state, the places where it is
written and read are spread out, making it difficult to see that
updates and downstream calculations are done correctly.

Improve this situation by doing two things:

1) Move triggerRatio to memstats so it lives with the other
trigger-related fields and makes gcController entirely transient
state.

2) Commit the new trigger ratio during mark termination when we
compute other next-cycle controls, including the absolute trigger.
This forces us to explicitly thread the new trigger ratio from
gcController.endCycle to mark termination, so we're not just pulling
it out of global state.

Change-Id: I6669932f8039a8c0ef46a3f2a8c537db72e578aa
Reviewed-on: https://go-review.googlesource.com/39830
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:53 +00:00
Austin Clements
9d36163c0b runtime: consistently use atomic loads for heap_live
heap_live is updated atomically without locking, so we should also use
atomic loads to read it. Fix the reads of heap_live that happen
outside of STW to be atomic.

Change-Id: Idca9451c348168c2a792a9499af349833a3c333f
Reviewed-on: https://go-review.googlesource.com/41371
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:51 +00:00
Austin Clements
051809e352 runtime: free workbufs during sweeping
This extends the sweeper to free workbufs back to the heap between GC
cycles, allowing this memory to be reused for GC'd allocations or
eventually returned to the OS.

This helps for applications that have high peak heap usage relative to
their regular heap usage (for example, a high-memory initialization
phase). Workbuf memory is roughly proportional to heap size and since
we currently never free workbufs, it's proportional to *peak* heap
size. By freeing workbufs, we can release and reuse this memory for
other purposes when the heap shrinks.

This is somewhat complicated because this costs ~1–2 µs per workbuf
span, so for large heaps it's too expensive to just do synchronously
after mark termination between starting the world and dropping the
worldsema. Hence, we do it asynchronously in the sweeper. This adds a
list of "free" workbuf spans that can be returned to the heap. GC
moves all workbuf spans to this list after mark termination and the
background sweeper drains this list back to the heap. If the sweeper
doesn't finish, that's fine, since getempty can directly reuse any
remaining spans to allocate more workbufs.

Performance impact is negligible. On the x/benchmarks, this reduces
GC-bytes-from-system by 6–11%.

Fixes #19325.

Change-Id: Icb92da2196f0c39ee984faf92d52f29fd9ded7a8
Reviewed-on: https://go-review.googlesource.com/38582
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:47 +00:00
Austin Clements
9cc883a466 runtime: allocate GC workbufs from manually-managed spans
Currently the runtime allocates workbufs from persistent memory, which
means they can never be freed.

Switch to allocating them from manually-managed heap spans. This
doesn't free them yet, but it puts us in a position to do so.

For #19325.

Change-Id: I94b2512a2f2bbbb456cd9347761b9412e80d2da9
Reviewed-on: https://go-review.googlesource.com/38581
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:44 +00:00
Austin Clements
7e1832d06c runtime: say where the compiler knows about var writeBarrier
The runtime.writeBarrier variable tries to be helpful by telling you
that the compiler also knows about this variable, which you could
probably guess, but doesn't say how the compiler knows about it. In
fact, the compiler has a complete copy in builtin/runtime.go that
needs to be kept in sync. Say so.

Change-Id: Ia7fb0c591cb6f9b8230decce01008b417dfcec89
Reviewed-on: https://go-review.googlesource.com/40150
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-09 23:05:24 +00:00
Austin Clements
9ffbdabdb0 runtime: make runtime.GC() trigger a concurrent GC
Currently runtime.GC() triggers a STW GC. For common uses in tests and
benchmarks, it doesn't matter whether it's STW or concurrent, but for
uses in servers for things like collecting heap profiles and
controlling memory footprint, this pause can be a problem for
latency.

This changes runtime.GC() to trigger a concurrent GC. In order to
remain as close as possible to its current meaning, we define it to
always perform a full mark/sweep GC cycle before returning (even if
that means it has to finish up a cycle we're in the middle of first)
and to publish the heap profile as of the triggered mark termination.
While it must perform a full cycle, simultaneous runtime.GC() calls
can be consolidated into a single full cycle.
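
A hedged sketch of the pattern this enables in long-running programs:
force a full cycle, then write the heap profile that the cycle just
published (error handling kept minimal):

	package main

	import (
		"os"
		"runtime"
		"runtime/pprof"
	)

	func main() {
		// Force a full (now concurrent) GC cycle. runtime.GC blocks
		// until the cycle's mark termination has published the heap
		// profile, but it no longer stops the world for the whole
		// collection.
		runtime.GC()

		// Write the heap profile that the forced cycle published.
		if err := pprof.WriteHeapProfile(os.Stdout); err != nil {
			panic(err)
		}
	}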

Fixes #18216.

Change-Id: I9088cc5deef4ab6bcf0245ed1982a852a01c44b5
Reviewed-on: https://go-review.googlesource.com/37520
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:21 +00:00
Austin Clements
2919132e1b runtime: don't adjust GC trigger on forced GC
Forced GCs don't provide good information about how to adjust the GC
trigger. Currently we avoid adjusting the trigger on forced GC because
forced GC is also STW and we don't adjust the trigger on STW GC.
However, this will become a problem when forced GC is concurrent.

Fix this by skipping trigger adjustment if the GC was user-forced.

For #18216.

Change-Id: I03dfdad12ecd3cfeca4573140a0768abb29aac5e
Reviewed-on: https://go-review.googlesource.com/38951
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:16 +00:00
Austin Clements
29fdbcfea3 runtime: track forced GCs independent of gcMode
Currently gcMode != gcBackgroundMode implies this was a user-forced GC
cycle. This is no longer going to be true when we make runtime.GC()
trigger a concurrent GC, so replace this with an explicit
work.userForced bit.

For #18216.

Change-Id: If7d71bbca78b5f0b35641b070f9d457f5c9a52bd
Reviewed-on: https://go-review.googlesource.com/37519
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:13 +00:00
Austin Clements
3d58498fdb runtime: simplify forced GC triggering
Now that the gcMode is no longer involved in the GC trigger condition,
we can simplify the triggering of forced GCs. By making the trigger
condition for forced GCs true even if gcphase is not _GCoff, we don't
need any special case path in gcStart to ensure that forced GCs don't
get consolidated.

Change-Id: I6067a13d76e40ff2eef8fade6fc14adb0cb58ee5
Reviewed-on: https://go-review.googlesource.com/37517
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:08 +00:00
Austin Clements
29be3f1999 runtime: generalize GC trigger
Currently the GC triggering condition is an awkward combination of the
gcMode (whether or not it's gcBackgroundMode) and a boolean
"forceTrigger" flag.

Replace this with a new gcTrigger type that represents the range of
transition predicates we need. This has several advantages:

1. We can remove the awkward logic that affects the trigger behavior
   based on the gcMode. Now gcMode purely controls whether to run a
   STW GC or not and the gcTrigger controls whether this is a forced
   GC that cannot be consolidated with other GC cycles.

2. We can lift the time-based triggering logic in sysmon to just
   another type of GC trigger and move the logic to the trigger test.

3. This sets us up to have a cycle count-based trigger, which we'll
   use to make runtime.GC trigger concurrent GC with the desired
   consolidation properties.
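
A loose sketch of the shape such a trigger type could take (names,
fields, and thresholds below are illustrative, not the actual runtime
definitions):

	package main

	import "fmt"

	// A trigger-style predicate: a small value describing *why* a GC
	// should start, with a test method evaluated wherever a GC might
	// begin.
	type triggerKind int

	const (
		triggerHeap  triggerKind = iota // heap has reached the trigger size
		triggerTime                     // too long since the last GC (sysmon)
		triggerCycle                    // runtime.GC wants cycle n to have run
	)

	type trigger struct {
		kind triggerKind
		now  int64  // for triggerTime
		n    uint32 // for triggerCycle
	}

	func (t trigger) test(heapLive, triggerBytes uint64, lastGC int64, cyclesDone uint32) bool {
		switch t.kind {
		case triggerHeap:
			return heapLive >= triggerBytes
		case triggerTime:
			const forcePeriod = 2 * 60 * 1e9 // nanoseconds
			return lastGC != 0 && t.now-lastGC > forcePeriod
		case triggerCycle:
			return int32(t.n-cyclesDone) > 0 // requested cycle not yet done
		}
		return false
	}

	func main() {
		t := trigger{kind: triggerHeap}
		fmt.Println(t.test(5<<20, 4<<20, 0, 0)) // true: past the trigger
	}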

For #18216.

Change-Id: If9cd49349579a548800f5022ae47b8128004bbfc
Reviewed-on: https://go-review.googlesource.com/37516
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:06 +00:00
Austin Clements
1be3e76e76 runtime: simplify heap profile flushing
Currently the heap profile is flushed by *either* gcSweep in STW mode
or by gcMarkTermination in concurrent mode. Simplify this by making
gcMarkTermination always flush the heap profile and by making gcSweep
do one extra flush (instead of two) in STW mode.

Change-Id: I62147afb2a128e1f3d92ef4bb8144c8a345f53c4
Reviewed-on: https://go-review.googlesource.com/37715
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:01 +00:00
Austin Clements
eee85fc5a1 runtime: snapshot heap profile during mark termination
Currently we snapshot the heap profile just *after* mark termination
starts the world because it's a relatively expensive operation.
However, this means any alloc or free events that happen between
starting the world and snapshotting the heap profile can be accounted
to the wrong cycle. In the worst case, a free can be accounted to the
cycle before the alloc; if the heap is small, this can result
temporarily in a negative "in use" count in the profile.

Fix this without making STW more expensive by using a global heap
profile cycle counter. This lets us split the operation into two
parts: 1) a super-cheap snapshot operation that simply increments the
global cycle counter during STW, and 2) a more expensive cleanup
operation we can do after starting the world that frees up a slot in
all buckets for use by the next heap profile cycle.

Fixes #19311.

Change-Id: I6bdafabf111c48b3d26fe2d91267f7bef0bd4270
Reviewed-on: https://go-review.googlesource.com/37714
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:14:56 +00:00
Austin Clements
13ae271d5d runtime: introduce a type for lfstacks
The lfstack API is still a C-style API: lfstacks all have unhelpful
type uint64 and the APIs are package-level functions. Make the code
more readable and Go-style by creating an lfstack type with methods
for push, pop, and empty.

Change-Id: I64685fa3be0e82ae2d1a782a452a50974440a827
Reviewed-on: https://go-review.googlesource.com/38290
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-19 22:42:24 +00:00
Dmitry Vyukov
0556e26273 sync: make Mutex more fair
Add new starvation mode for Mutex.
In starvation mode ownership is directly handed off from
unlocking goroutine to the next waiter. New arriving goroutines
don't compete for ownership.
Unfair wait time is now limited to 1ms.
Also fix a long-standing bug that goroutines were requeued
at the tail of the wait queue. That led to even more unfair
acquisition times with multiple waiters.

Performance of normal mode is not considerably affected.

Fixes #13086

On the lockskew program provided in the issue:

done in 1.207853ms
done in 1.177451ms
done in 1.184168ms
done in 1.198633ms
done in 1.185797ms
done in 1.182502ms
done in 1.316485ms
done in 1.211611ms
done in 1.182418ms

name                    old time/op  new time/op   delta
MutexUncontended-48     0.65ns ± 0%   0.65ns ± 1%     ~           (p=0.087 n=10+10)
Mutex-48                 112ns ± 1%    114ns ± 1%   +1.69%        (p=0.000 n=10+10)
MutexSlack-48            113ns ± 0%     87ns ± 1%  -22.65%         (p=0.000 n=8+10)
MutexWork-48             149ns ± 0%    145ns ± 0%   -2.48%         (p=0.000 n=9+10)
MutexWorkSlack-48        149ns ± 0%    122ns ± 3%  -18.26%         (p=0.000 n=6+10)
MutexNoSpin-48           103ns ± 4%    105ns ± 3%     ~           (p=0.089 n=10+10)
MutexSpin-48             490ns ± 4%    515ns ± 6%   +5.08%        (p=0.006 n=10+10)
Cond32-48               13.4µs ± 6%   13.1µs ± 5%   -2.75%        (p=0.023 n=10+10)
RWMutexWrite100-48      53.2ns ± 3%   41.2ns ± 3%  -22.57%        (p=0.000 n=10+10)
RWMutexWrite10-48       45.9ns ± 2%   43.9ns ± 2%   -4.38%        (p=0.000 n=10+10)
RWMutexWorkWrite100-48   122ns ± 2%    134ns ± 1%   +9.92%        (p=0.000 n=10+10)
RWMutexWorkWrite10-48    206ns ± 1%    188ns ± 1%   -8.52%         (p=0.000 n=8+10)
Cond32-24               12.1µs ± 3%   12.4µs ± 3%   +1.98%         (p=0.043 n=10+9)
MutexUncontended-24     0.74ns ± 1%   0.75ns ± 1%     ~           (p=0.650 n=10+10)
Mutex-24                 122ns ± 2%    124ns ± 1%   +1.31%        (p=0.007 n=10+10)
MutexSlack-24           96.9ns ± 2%  102.8ns ± 2%   +6.11%        (p=0.000 n=10+10)
MutexWork-24             146ns ± 1%    135ns ± 2%   -7.70%         (p=0.000 n=10+9)
MutexWorkSlack-24        135ns ± 1%    128ns ± 2%   -5.01%         (p=0.000 n=10+9)
MutexNoSpin-24           114ns ± 3%    110ns ± 4%   -3.84%        (p=0.000 n=10+10)
MutexSpin-24             482ns ± 4%    475ns ± 8%     ~           (p=0.286 n=10+10)
RWMutexWrite100-24      43.0ns ± 3%   43.1ns ± 2%     ~           (p=0.956 n=10+10)
RWMutexWrite10-24       43.4ns ± 1%   43.2ns ± 1%     ~            (p=0.085 n=10+9)
RWMutexWorkWrite100-24   130ns ± 3%    131ns ± 3%     ~           (p=0.747 n=10+10)
RWMutexWorkWrite10-24    191ns ± 1%    192ns ± 1%     ~           (p=0.210 n=10+10)
Cond32-12               11.5µs ± 2%   11.7µs ± 2%   +1.98%        (p=0.002 n=10+10)
MutexUncontended-12     1.48ns ± 0%   1.50ns ± 1%   +1.08%        (p=0.004 n=10+10)
Mutex-12                 141ns ± 1%    143ns ± 1%   +1.63%        (p=0.000 n=10+10)
MutexSlack-12            121ns ± 0%    119ns ± 0%   -1.65%          (p=0.001 n=8+9)
MutexWork-12             141ns ± 2%    150ns ± 3%   +6.36%         (p=0.000 n=9+10)
MutexWorkSlack-12        131ns ± 0%    138ns ± 0%   +5.73%         (p=0.000 n=9+10)
MutexNoSpin-12          87.0ns ± 1%   83.7ns ± 1%   -3.80%        (p=0.000 n=10+10)
MutexSpin-12             364ns ± 1%    377ns ± 1%   +3.77%        (p=0.000 n=10+10)
RWMutexWrite100-12      42.8ns ± 1%   43.9ns ± 1%   +2.41%         (p=0.000 n=8+10)
RWMutexWrite10-12       39.8ns ± 4%   39.3ns ± 1%     ~            (p=0.433 n=10+9)
RWMutexWorkWrite100-12   131ns ± 1%    131ns ± 0%     ~            (p=0.591 n=10+9)
RWMutexWorkWrite10-12    173ns ± 1%    174ns ± 0%     ~            (p=0.059 n=10+8)
Cond32-6                10.9µs ± 2%   10.9µs ± 2%     ~           (p=0.739 n=10+10)
MutexUncontended-6      2.97ns ± 0%   2.97ns ± 0%     ~     (all samples are equal)
Mutex-6                  122ns ± 6%    122ns ± 2%     ~           (p=0.668 n=10+10)
MutexSlack-6             149ns ± 3%    142ns ± 3%   -4.63%        (p=0.000 n=10+10)
MutexWork-6              136ns ± 3%    140ns ± 5%     ~           (p=0.077 n=10+10)
MutexWorkSlack-6         152ns ± 0%    138ns ± 2%   -9.21%         (p=0.000 n=6+10)
MutexNoSpin-6            150ns ± 1%    152ns ± 0%   +1.50%         (p=0.000 n=8+10)
MutexSpin-6              726ns ± 0%    730ns ± 1%     ~           (p=0.069 n=10+10)
RWMutexWrite100-6       40.6ns ± 1%   40.9ns ± 1%   +0.91%         (p=0.001 n=8+10)
RWMutexWrite10-6        37.1ns ± 0%   37.0ns ± 1%     ~            (p=0.386 n=9+10)
RWMutexWorkWrite100-6    133ns ± 1%    134ns ± 1%   +1.01%         (p=0.005 n=9+10)
RWMutexWorkWrite10-6     152ns ± 0%    152ns ± 0%     ~     (all samples are equal)
Cond32-2                7.86µs ± 2%   7.95µs ± 2%   +1.10%        (p=0.023 n=10+10)
MutexUncontended-2      8.10ns ± 0%   9.11ns ± 4%  +12.44%         (p=0.000 n=9+10)
Mutex-2                 32.9ns ± 9%   38.4ns ± 6%  +16.58%        (p=0.000 n=10+10)
MutexSlack-2            93.4ns ± 1%   98.5ns ± 2%   +5.39%         (p=0.000 n=10+9)
MutexWork-2             40.8ns ± 3%   43.8ns ± 7%   +7.38%         (p=0.000 n=10+9)
MutexWorkSlack-2        98.6ns ± 5%  108.2ns ± 2%   +9.80%         (p=0.000 n=10+8)
MutexNoSpin-2            399ns ± 1%    398ns ± 2%     ~             (p=0.463 n=8+9)
MutexSpin-2             1.99µs ± 3%   1.97µs ± 1%   -0.81%          (p=0.003 n=9+8)
RWMutexWrite100-2       37.6ns ± 5%   46.0ns ± 4%  +22.17%         (p=0.000 n=10+8)
RWMutexWrite10-2        50.1ns ± 6%   36.8ns ±12%  -26.46%         (p=0.000 n=9+10)
RWMutexWorkWrite100-2    136ns ± 0%    134ns ± 2%   -1.80%          (p=0.001 n=7+9)
RWMutexWorkWrite10-2     140ns ± 1%    138ns ± 1%   -1.50%        (p=0.000 n=10+10)
Cond32                  5.93µs ± 1%   5.91µs ± 0%     ~            (p=0.411 n=9+10)
MutexUncontended        15.9ns ± 0%   15.8ns ± 0%   -0.63%          (p=0.000 n=8+8)
Mutex                   15.9ns ± 0%   15.8ns ± 0%   -0.44%        (p=0.003 n=10+10)
MutexSlack              26.9ns ± 3%   26.7ns ± 2%     ~           (p=0.084 n=10+10)
MutexWork               47.8ns ± 0%   47.9ns ± 0%   +0.21%          (p=0.014 n=9+8)
MutexWorkSlack          54.9ns ± 3%   54.5ns ± 3%     ~           (p=0.254 n=10+10)
MutexNoSpin              786ns ± 2%    765ns ± 1%   -2.66%        (p=0.000 n=10+10)
MutexSpin               3.87µs ± 1%   3.83µs ± 0%   -0.85%          (p=0.005 n=9+8)
RWMutexWrite100         21.2ns ± 2%   21.0ns ± 1%   -0.88%         (p=0.018 n=10+9)
RWMutexWrite10          22.6ns ± 1%   22.6ns ± 0%     ~             (p=0.471 n=9+9)
RWMutexWorkWrite100      132ns ± 0%    132ns ± 0%     ~     (all samples are equal)
RWMutexWorkWrite10       124ns ± 0%    123ns ± 0%     ~           (p=0.656 n=10+10)

Change-Id: I66412a3a0980df1233ad7a5a0cd9723b4274528b
Reviewed-on: https://go-review.googlesource.com/34310
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-17 17:24:59 +00:00
Russ Cox
1f77db94f8 runtime: do not call wakep from enlistWorker, to avoid possible deadlock
We have seen one instance of a production job suddenly spinning to
100% CPU and becoming unresponsive. In that one instance, a SIGQUIT
was sent after 328 minutes of spinning, and the stacks showed a single
goroutine in "IO wait (scan)" state.

Looking for things that might block if a goroutine got stuck scanning
a stack, we found that injectglist does:

	lock(&sched.lock)
	var n int
	for n = 0; glist != nil; n++ {
		gp := glist
		glist = gp.schedlink.ptr()
		casgstatus(gp, _Gwaiting, _Grunnable)
		globrunqput(gp)
	}
	unlock(&sched.lock)

and that casgstatus spins on gp.atomicstatus until the _Gscan bit goes
away. Essentially, this code locks sched.lock and then while holding
sched.lock, waits to lock gp.atomicstatus.

The code that is doing the scan is:

	if castogscanstatus(gp, s, s|_Gscan) {
		if !gp.gcscandone {
			scanstack(gp, gcw)
			gp.gcscandone = true
		}
		restartg(gp)
		break loop
	}

More analysis showed that scanstack can, in a rare case, end up
calling back into code that acquires sched.lock. For example:

	runtime.scanstack at proc.go:866
	calls runtime.gentraceback at mgcmark.go:842
	calls runtime.scanstack$1 at traceback.go:378
	calls runtime.scanframeworker at mgcmark.go:819
	calls runtime.scanblock at mgcmark.go:904
	calls runtime.greyobject at mgcmark.go:1221
	calls (*runtime.gcWork).put at mgcmark.go:1412
	calls (*runtime.gcControllerState).enlistWorker at mgcwork.go:127
	calls runtime.wakep at mgc.go:632
	calls runtime.startm at proc.go:1779
	acquires runtime.sched.lock at proc.go:1675

This path was found with an automated deadlock-detecting tool.
There are many such paths but they all go through enlistWorker -> wakep.

The evidence strongly suggests that one of these paths is what caused
the deadlock we observed. We're running those jobs with
GOTRACEBACK=crash now to try to get more information if it happens
again.

Further refinement and analysis shows that if we drop the wakep call
from enlistWorker, the remaining few deadlock cycles found by the tool
are all false positives caused by not understanding the effect of calls
to func variables.

The enlistWorker -> wakep call was intended only as a performance
optimization; it rarely executes, and if it does execute at just the
wrong time it can (and plausibly did) cause the deadlock we saw.

Comment it out, to avoid the potential deadlock.

Fixes #19112.
Unfixes #14179.

Change-Id: I6f7e10b890b991c11e79fab7aeefaf70b5d5a07b
Reviewed-on: https://go-review.googlesource.com/37093
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-02-15 21:22:36 +00:00
Austin Clements
d089a6c718 runtime: remove stack barriers
Now that we don't rescan stacks, stack barriers are unnecessary. This
removes all of the code and structures supporting them as well as
tests that were specifically for stack barriers.

Updates #17503.

Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c
Reviewed-on: https://go-review.googlesource.com/36620
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-14 15:52:54 +00:00
Austin Clements
c5ebcd2c8a runtime: remove rescan list
With the hybrid barrier, stacks no longer need to be rescanned, so the
rescan list is no longer necessary. Remove it.

This leaves the gcrescanstacks GODEBUG variable, since it's useful for
debugging, but changes it to simply walk all of the Gs to rescan
stacks rather than using the rescan list.

We could also remove g.gcscanvalid, which is effectively a distributed
rescan list. However, it's still useful for gcrescanstacks mode and it
adds little complexity, so we'll leave it in.

Fixes #17099.
Updates #17503.

Change-Id: I776d43f0729567335ef1bfd145b75c74de2cc7a9
Reviewed-on: https://go-review.googlesource.com/36619
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-02-14 15:52:51 +00:00
Josh Bleecher Snyder
46a75870ad runtime: speed up fastrand() % n
This occurs a fair amount in the runtime for non-power-of-two n.
Use an alternative, faster formulation.
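
A standalone sketch of the kind of multiply-and-shift reduction commonly
used for this (the runtime's actual helper may differ in details): it
maps a uniform 32-bit value into [0, n) without a division.

	package main

	import (
		"fmt"
		"math/rand"
	)

	// reduce maps a uniform 32-bit value x into [0, n) using a 64-bit
	// multiply and a shift instead of a modulo. The result is very
	// slightly biased for n that don't divide 2^32, which is acceptable
	// for scheduler-style uses.
	func reduce(x, n uint32) uint32 {
		return uint32(uint64(x) * uint64(n) >> 32)
	}

	func main() {
		for i := 0; i < 5; i++ {
			x := rand.Uint32()
			fmt.Println(reduce(x, 3), x%3) // both land in [0, 3)
		}
	}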

name           old time/op  new time/op  delta
Fastrandn/2-8  4.45ns ± 2%  2.09ns ± 3%  -53.12%  (p=0.000 n=14+14)
Fastrandn/3-8  4.78ns ±11%  2.06ns ± 2%  -56.94%  (p=0.000 n=15+15)
Fastrandn/4-8  4.76ns ± 9%  1.99ns ± 3%  -58.28%  (p=0.000 n=15+13)
Fastrandn/5-8  4.96ns ±13%  2.03ns ± 6%  -59.14%  (p=0.000 n=15+15)

name                    old time/op  new time/op  delta
SelectUncontended-8     33.7ns ± 2%  33.9ns ± 2%  +0.70%  (p=0.000 n=49+50)
SelectSyncContended-8   1.68µs ± 4%  1.65µs ± 4%  -1.54%  (p=0.000 n=50+45)
SelectAsyncContended-8   282ns ± 1%   277ns ± 1%  -1.50%  (p=0.000 n=48+43)
SelectNonblock-8        5.31ns ± 1%  5.32ns ± 1%    ~     (p=0.275 n=45+44)
SelectProdCons-8         585ns ± 3%   577ns ± 2%  -1.35%  (p=0.000 n=50+50)
GoroutineSelect-8       1.59ms ± 2%  1.59ms ± 1%    ~     (p=0.084 n=49+48)

Updates #16213

Change-Id: Ib555a4d7da2042a25c3976f76a436b536487d5b7
Reviewed-on: https://go-review.googlesource.com/36932
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-14 00:01:22 +00:00
Russ Cox
e4371fb179 time: optimize Now on darwin, windows
Fetch both monotonic and wall time together when possible.
Avoids skew and is cheaper.

Also shave a few ns off in conversion in package time.

Compared to current implementation (after monotonic changes):

name   old time/op  new time/op  delta
Now    19.6ns ± 1%   9.7ns ± 1%  -50.63%  (p=0.000 n=41+49) darwin/amd64
Now    23.5ns ± 4%  10.6ns ± 5%  -54.61%  (p=0.000 n=30+28) windows/amd64
Now    54.5ns ± 5%  29.8ns ± 9%  -45.40%  (p=0.000 n=27+29) windows/386

More importantly, compared to Go 1.8:

name   old time/op  new time/op  delta
Now     9.5ns ± 1%   9.7ns ± 1%   +1.94%  (p=0.000 n=41+49) darwin/amd64
Now    12.9ns ± 5%  10.6ns ± 5%  -17.73%  (p=0.000 n=30+28) windows/amd64
Now    15.3ns ± 5%  29.8ns ± 9%  +94.36%  (p=0.000 n=30+29) windows/386

This brings time.Now back in line with Go 1.8 on darwin/amd64 and windows/amd64.

It's not obvious why windows/386 is still noticeably worse than Go 1.8,
but it's better than before this CL. The windows/386 speed is not too
important; the changes just keep the two architectures similar.

Change-Id: If69b94970c8a1a57910a371ee91e0d4e82e46c5d
Reviewed-on: https://go-review.googlesource.com/36428
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-09 14:45:16 +00:00
Austin Clements
c1730ae424 runtime: force workers out before checking mark roots
Currently we check that all roots are marked as soon as gcMarkDone
decides to transition from mark 1 to mark 2. However, issue #16083
indicates that there may be a race where we try to complete mark 1
while a worker is still scanning a stack, causing the root mark check
to fail.

We don't yet understand this race, but as a simple mitigation, move
the root check to after gcMarkDone performs a ragged barrier, which
will force any remaining workers to finish their current job.

Updates #16083. This may "fix" it, but it would be better to
understand and fix the underlying race.

Change-Id: I1af9ce67bd87ade7bc2a067295d79c28cd11abd2
Reviewed-on: https://go-review.googlesource.com/35353
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-01-18 15:40:33 +00:00
Austin Clements
618c291544 runtime: update big mgc.go comment
The comment describing the overall GC algorithm at the top of mgc.go
has gotten woefully out-of-date (and was possibly never
correct/complete). Update it to reflect the current workings of the
GC and the set of phases that we now divide it into.

Change-Id: I02143c0ebefe9d4cd7753349dab8045f0973bf95
Reviewed-on: https://go-review.googlesource.com/34711
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-01-06 18:22:35 +00:00
Austin Clements
01c6a19e04 runtime: add number of forced GCs to MemStats
This adds a counter for the number of times the application forced a
GC by, e.g., calling runtime.GC(). This is useful for detecting
applications that are overusing/abusing runtime.GC() or
debug.FreeOSMemory().
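
A usage sketch of the counter as surfaced in MemStats:

	package main

	import (
		"fmt"
		"runtime"
	)

	func main() {
		runtime.GC() // a forced collection
		runtime.GC() // and another

		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		// NumForcedGC counts application-forced cycles; NumGC counts all.
		fmt.Printf("forced GCs: %d of %d total\n", ms.NumForcedGC, ms.NumGC)
	}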

Fixes #18217.

Change-Id: I990ab7a313c1b3b7a50a3d44535c460d7c54f47d
Reviewed-on: https://go-review.googlesource.com/34067
Reviewed-by: Russ Cox <rsc@golang.org>
2016-12-07 20:59:16 +00:00
Cherry Zhang
bbe96f5673 runtime: make work.bytesMarked 8-byte aligned
Make atomic access on 32-bit architectures happy.
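
For context, the rule this runs into (per sync/atomic's documentation)
is that on 32-bit platforms 64-bit words accessed atomically must be
8-byte aligned, and only the first word of an allocated struct is
guaranteed that alignment; the usual fix is to put such fields first.
An illustrative sketch:

	package main

	import (
		"fmt"
		"sync/atomic"
		"unsafe"
	)

	// counters keeps its atomically-updated uint64 first so that it is
	// 8-byte aligned even on 32-bit platforms.
	type counters struct {
		bytesMarked uint64 // updated with atomic.AddUint64
		flags       uint32
	}

	func main() {
		c := new(counters)
		atomic.AddUint64(&c.bytesMarked, 4096)
		fmt.Println(atomic.LoadUint64(&c.bytesMarked),
			unsafe.Offsetof(c.bytesMarked)) // offset 0 => aligned
	}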

Updates #17786.

Change-Id: I42de63ff1381af42124dc51befc887160f71797d
Reviewed-on: https://go-review.googlesource.com/33235
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2016-11-21 20:25:17 +00:00
Austin Clements
0bae74e8c9 runtime: wake idle Ps when enqueuing GC work
If the scheduler has no user work and there's no GC work visible, it
puts the P to sleep (or blocks on the network). However, if we later
enqueue more GC work, there's currently nothing that specifically
wakes up the scheduler to let it start an idle GC worker. As a result,
we can underutilize the CPU during GC if Ps have been put to sleep.

Fix this by making GC wake idle Ps when work buffers are put on the
full list. We already have a hook to do this, since we use this to
preempt a random P if we need more dedicated workers. We expand this
hook to instead wake an idle P if there is one. The logic we use for
this is identical to the logic used to wake an idle P when we ready a
goroutine.

To make this really sound, we also fix the scheduler to re-check the
idle GC worker condition after releasing its P. This closes a race
where 1) the scheduler checks for idle work and finds none, 2) new
work is enqueued but there are no idle Ps so none are woken, and 3)
the scheduler releases its P.

There is one subtlety here. Currently we call enlistWorker directly
from putfull, but the gcWork is in an inconsistent state in the places
that call putfull. This isn't a problem right now because nothing that
enlistWorker does touches the gcWork, but with the added call to
wakep, it's possible to get a recursive call into the gcWork
(specifically, while write barriers are disallowed, this can do an
allocation, which can dispose a gcWork, which can put a workbuf). To
handle this, we lift the enlistWorker calls up a layer and delay them
until the gcWork is in a consistent state.

Fixes #14179.

Change-Id: Ia2467a52e54c9688c3c1752e1fc00f5b37bbfeeb
Reviewed-on: https://go-review.googlesource.com/32434
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-11-20 22:44:22 +00:00
Austin Clements
49ea9207b6 runtime: exit idle worker if there's higher-priority work
Idle GC workers trigger whenever there's a GC running and the
scheduler doesn't find any other work. However, they currently run for
a full scheduler quantum (~10ms) once started.

This is really bad for event-driven applications, where work may come
in on the network hundreds of times during that window. In the
go-gcbench rpc benchmark, this is bad enough to often cause effective
STWs where all Ps are in the idle worker. When this happens, we don't
even poll the network any more (except for the background 10ms poll in
sysmon), so we don't even know there's more work to do.

Fix this by making idle workers check with the scheduler roughly every
100 µs to see if there's any higher-priority work the P should be
doing. This check includes polling the network for incoming work.

Fixes #16528.

Change-Id: I6f62ebf6d36a92368da9891bafbbfd609b9bd003
Reviewed-on: https://go-review.googlesource.com/32433
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-11-20 22:44:17 +00:00
David Crawshaw
54ec7b072e runtime: access modules via a slice
The introduction of -buildmode=plugin means modules can be added to a
Go program while it is running. This means there is a window while the
program is running when a module is on the moduledata linked list but
has not yet been initialized to the satisfaction of other parts of the
runtime, notably the GC.

This CL adds a new way of accessing modules: an activeModules function.
It returns a slice of modules that is built in the background and
atomically swapped in. The parts of the runtime that need to wait on
module initialization can use this slice instead of the linked list.
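
The underlying publish pattern, sketched with atomic.Value rather than
the runtime's internals (names are illustrative): build the new slice
off to the side, then swap it in atomically so readers only ever see
fully initialized modules.

	package main

	import (
		"fmt"
		"sync/atomic"
	)

	type module struct{ name string } // stand-in for moduledata

	// active holds a []*module snapshot; readers never see a partially
	// built slice. (Writers would additionally serialize with a lock;
	// only the readers are lock-free here.)
	var active atomic.Value

	func activeModules() []*module {
		m, _ := active.Load().([]*module)
		return m
	}

	// addModule fully initializes mod, then rebuilds and atomically
	// swaps in a new slice.
	func addModule(mod *module) {
		next := append(append([]*module(nil), activeModules()...), mod)
		active.Store(next)
	}

	func main() {
		addModule(&module{name: "main"})
		addModule(&module{name: "plugin1"})
		for _, m := range activeModules() {
			fmt.Println(m.name)
		}
	}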

Fixes #17455

Change-Id: I04790fd07e40c7295beb47cea202eb439206d33d
Reviewed-on: https://go-review.googlesource.com/32357
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-11-01 16:04:12 +00:00
Martin Möhrmann
d7b34d5f29 runtime: improve atoi implementation
- Adds overflow checks
- Adds parsing of negative integers
- Adds boolean return value to signal parsing errors
- Adds atoi32 for parsing of integers that fit in an int32
- Adds tests

Handling of errors to provide error messages
at the call sites is left to future CLs.
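
A rough sketch of the described shape (simplified; not the runtime's
exact code):

	package main

	import "fmt"

	const maxInt = int(^uint(0) >> 1)

	// atoi parses a decimal integer, returning ok=false on empty input,
	// non-digit characters, or overflow. Negative numbers are accepted
	// (the most negative int is treated as overflow in this sketch).
	func atoi(s string) (int, bool) {
		neg := false
		if len(s) > 0 && s[0] == '-' {
			neg = true
			s = s[1:]
		}
		if len(s) == 0 {
			return 0, false
		}
		n := 0
		for i := 0; i < len(s); i++ {
			c := s[i]
			if c < '0' || c > '9' {
				return 0, false
			}
			d := int(c - '0')
			if n > (maxInt-d)/10 { // n*10+d would overflow
				return 0, false
			}
			n = n*10 + d
		}
		if neg {
			n = -n
		}
		return n, true
	}

	func main() {
		fmt.Println(atoi("-42"))                  // -42 true
		fmt.Println(atoi("99999999999999999999")) // 0 false (overflow)
	}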

Updates #17718

Change-Id: I3cacd0ab1230b9efc5404c68edae7304d39bcbc0
Reviewed-on: https://go-review.googlesource.com/32390
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-11-01 14:04:39 +00:00
Austin Clements
bd640c882a runtime: disable stack rescanning by default
With the hybrid barrier in place, we can now disable stack rescanning
by default. This commit adds a "gcrescanstacks" GODEBUG variable that
is off by default but can be set to re-enable STW stack rescanning.
The plan is to leave this off but available in Go 1.8 for debugging
and as a fallback.

With this change, worst-case mark termination time at GOMAXPROCS=12
*not* including time spent stopping the world (which is still
unbounded) is reliably under 100 µs, with a 95%ile around 50 µs in
every benchmark I tried (the go1 benchmarks, the x/benchmarks garbage
benchmark, and the gcbench activegs and rpc benchmarks). Including
time spent stopping the world usually adds about 20 µs to total STW
time at GOMAXPROCS=12, but I've seen it add around 150 µs in these
benchmarks when a goroutine takes time to reach a safe point (see
issue #10958) or when stopping the world races with goroutine
switches. At GOMAXPROCS=1, where this isn't an issue, worst case STW
is typically 30 µs.

The go-gcbench activegs benchmark is designed to stress large numbers
of dirty stacks. This commit reduces 95%ile STW time for 500k dirty
stacks by nearly three orders of magnitude, from 150ms to 195µs.

This has little effect on the throughput of the go1 benchmarks or the
x/benchmarks benchmarks.

name         old time/op  new time/op  delta
XGarbage-12  2.31ms ± 0%  2.32ms ± 1%  +0.28%  (p=0.001 n=17+16)
XJSON-12     12.4ms ± 0%  12.4ms ± 0%  +0.41%  (p=0.000 n=18+18)
XHTTP-12     11.8µs ± 0%  11.8µs ± 1%    ~     (p=0.492 n=20+18)

It reduces the tail latency of the x/benchmarks HTTP benchmark:

name      old p50-time  new p50-time  delta
XHTTP-12    489µs ± 0%    491µs ± 1%  +0.54%  (p=0.000 n=20+18)

name      old p95-time  new p95-time  delta
XHTTP-12    957µs ± 1%    960µs ± 1%  +0.28%  (p=0.002 n=20+17)

name      old p99-time  new p99-time  delta
XHTTP-12   1.76ms ± 1%   1.64ms ± 1%  -7.20%  (p=0.000 n=20+18)

Comparing to the beginning of the hybrid barrier implementation
("runtime: parallelize STW mcache flushing") shows that the hybrid
barrier trades a small performance impact for much better STW latency,
as expected. The magnitude of the performance impact is generally
small:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.37s ± 1%     2.42s ± 1%  +2.04%  (p=0.000 n=19+18)
Fannkuch11-12                2.84s ± 0%     2.72s ± 0%  -4.00%  (p=0.000 n=19+19)
FmtFprintfEmpty-12          44.2ns ± 1%    45.2ns ± 1%  +2.20%  (p=0.000 n=17+19)
FmtFprintfString-12          130ns ± 1%     134ns ± 0%  +2.94%  (p=0.000 n=18+16)
FmtFprintfInt-12             114ns ± 1%     117ns ± 0%  +3.01%  (p=0.000 n=19+15)
FmtFprintfIntInt-12          176ns ± 1%     182ns ± 0%  +3.17%  (p=0.000 n=20+15)
FmtFprintfPrefixedInt-12     186ns ± 1%     187ns ± 1%  +1.04%  (p=0.000 n=20+19)
FmtFprintfFloat-12           251ns ± 1%     250ns ± 1%  -0.74%  (p=0.000 n=17+18)
FmtManyArgs-12               746ns ± 1%     761ns ± 0%  +2.08%  (p=0.000 n=19+20)
GobDecode-12                6.57ms ± 1%    6.65ms ± 1%  +1.11%  (p=0.000 n=19+20)
GobEncode-12                5.59ms ± 1%    5.65ms ± 0%  +1.08%  (p=0.000 n=17+17)
Gzip-12                      223ms ± 1%     223ms ± 1%  -0.31%  (p=0.006 n=20+20)
Gunzip-12                   38.0ms ± 0%    37.9ms ± 1%  -0.25%  (p=0.009 n=19+20)
HTTPClientServer-12         77.5µs ± 1%    78.9µs ± 2%  +1.89%  (p=0.000 n=20+20)
JSONEncode-12               14.7ms ± 1%    14.9ms ± 0%  +0.75%  (p=0.000 n=20+20)
JSONDecode-12               53.0ms ± 1%    55.9ms ± 1%  +5.54%  (p=0.000 n=19+19)
Mandelbrot200-12            3.81ms ± 0%    3.81ms ± 1%  +0.20%  (p=0.023 n=17+19)
GoParse-12                  3.17ms ± 1%    3.18ms ± 1%    ~     (p=0.057 n=20+19)
RegexpMatchEasy0_32-12      71.7ns ± 1%    70.4ns ± 1%  -1.77%  (p=0.000 n=19+20)
RegexpMatchEasy0_1K-12       946ns ± 0%     946ns ± 0%    ~     (p=0.405 n=18+18)
RegexpMatchEasy1_32-12      67.2ns ± 2%    67.3ns ± 2%    ~     (p=0.732 n=20+20)
RegexpMatchEasy1_1K-12       374ns ± 1%     378ns ± 1%  +1.14%  (p=0.000 n=18+19)
RegexpMatchMedium_32-12      107ns ± 1%     107ns ± 1%    ~     (p=0.259 n=18+20)
RegexpMatchMedium_1K-12     34.2µs ± 1%    34.5µs ± 1%  +1.03%  (p=0.000 n=18+18)
RegexpMatchHard_32-12       1.77µs ± 1%    1.79µs ± 1%  +0.73%  (p=0.000 n=19+18)
RegexpMatchHard_1K-12       53.6µs ± 1%    54.2µs ± 1%  +1.10%  (p=0.000 n=19+19)
Template-12                 61.5ms ± 1%    63.9ms ± 0%  +3.96%  (p=0.000 n=18+18)
TimeParse-12                 303ns ± 1%     300ns ± 1%  -1.08%  (p=0.000 n=19+20)
TimeFormat-12                318ns ± 1%     320ns ± 0%  +0.79%  (p=0.000 n=19+19)
Revcomp-12 (*)               509ms ± 3%     504ms ± 0%    ~     (p=0.967 n=7+12)
[Geo mean]                  54.3µs         54.8µs       +0.88%

(*) Revcomp is highly non-linear, so I only took samples with 2
iterations.

name         old time/op  new time/op  delta
XGarbage-12  2.25ms ± 0%  2.32ms ± 1%  +2.74%  (p=0.000 n=16+16)
XJSON-12     11.6ms ± 0%  12.4ms ± 0%  +6.81%  (p=0.000 n=18+18)
XHTTP-12     11.6µs ± 1%  11.8µs ± 1%  +1.62%  (p=0.000 n=17+18)

Updates #17503.

Updates #17099, since you can't have a rescan list bug if there's no
rescan list. I'm not marking it as fixed, since gcrescanstacks can
still be set to re-enable the rescan lists.

Change-Id: I6e926b4c2dbd4cd56721869d4f817bdbb330b851
Reviewed-on: https://go-review.googlesource.com/31766
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 21:24:13 +00:00
Austin Clements
ee3d20129a runtime: avoid getfull() barrier most of the time
With the hybrid barrier, unless we're doing a STW GC or hit a very
rare race (~once per all.bash) that can start mark termination before
all of the work is drained, we don't need to drain the work queue at
all. Even draining an empty work queue is rather expensive since we
have to enter the getfull() barrier, so it's worth avoiding this.

Conveniently, it's quite easy to detect whether or not we actually
need the getfull() barrier: since the world is stopped when we enter
mark termination, everything must have flushed its work to the work
queue, so we can just check the queue. If the queue is empty and we
haven't queued up any jobs that may create more work (which should
always be the case with the hybrid barrier), we can simply have all GC
workers perform non-blocking drains.

Also conveniently, this solution is quite safe. If we do somehow screw
something up and there's work on the work queue, some worker will
still process it, it just may not happen in parallel.

This is not the "right" solution, but it's simple, expedient,
low-risk, and maintains compatibility with debug.gcrescanstacks. When
we remove the gcrescanstacks fallback in Go 1.9, we should also fix
the race that starts mark termination early, and then we can eliminate
work draining from mark termination.

Updates #17503.

Change-Id: I7b3cd5de6a248ab29d78c2b42aed8b7443641361
Reviewed-on: https://go-review.googlesource.com/32186
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 21:23:52 +00:00
Austin Clements
85c22bc3a5 runtime: mark tiny blocks at GC start
The hybrid barrier requires allocate-black, but there's one case where
we don't currently allocate black: the tiny allocator. If we allocate
a *new* tiny alloc block during GC, it will be allocated black, but if
we allocated the current block before GC, it won't be black, and the
further allocations from it won't mark it, which means we may free a
reachable tiny block during sweeping.

Fix this by passing over all mcaches at the beginning of mark, while
the world is still stopped, and greying their tiny blocks.

Updates #17503.

Change-Id: I04d4df7cc2f553f8f7b1e4cb0b52e2946588111a
Reviewed-on: https://go-review.googlesource.com/31456
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 20:05:28 +00:00
Austin Clements
a475a38a3d runtime: parallelize STW mcache flushing
Currently all mcaches are flushed in a single STW root job. This takes
about 5 µs per P, but since it's done sequentially it adds about
5*GOMAXPROCS µs to the STW.

Fix this by parallelizing the job. Since there are exactly GOMAXPROCS
mcaches to flush, this parallelizes quite nicely and brings the STW
latency cost down to a constant 5 µs (assuming GOMAXPROCS actually
reflects the number of CPUs).

Updates #17503.

Change-Id: Ibefeb1c2229975d5137c6e67fac3b6c92103742d
Reviewed-on: https://go-review.googlesource.com/32033
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:19:53 +00:00
Austin Clements
6834839427 runtime, cmd/trace: annotate different mark worker types
Currently mark workers are shown in the trace as regular goroutines
labeled "runtime.gcBgMarkWorker". That's somewhat unhelpful to an end
user because of the opaque label and particularly unhelpful to runtime
developers because it doesn't distinguish the different types of mark
workers.

Fix this by introducing a variant of the GoStart event called
GoStartLabel that lets the runtime indicate a label for a goroutine
execution span and using this to label mark worker executions as "GC
(<mode>)" in the trace viewer.

Since this bumps the trace version to 1.8, we also add test data for
1.7 traces.

Change-Id: Id7b9c0536508430c661ffb9e40e436f3901ca121
Reviewed-on: https://go-review.googlesource.com/30702
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-28 14:29:40 +00:00
Peter Weinberger
ca922b6d36 runtime: Profile goroutines holding contended mutexes.
runtime.SetMutexProfileFraction(n int) will capture 1/n-th of stack
traces of goroutines holding contended mutexes if n > 0. From runtime/pprof,
pprof.Lookup("mutex").WriteTo writes the accumulated
stack traces to w (in essentially the same format that blocking
profiling uses).
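
A small usage sketch (the output below is assumed to be the usual pprof
text form selected by debug=1):

	package main

	import (
		"os"
		"runtime"
		"runtime/pprof"
		"sync"
	)

	func main() {
		// Sample roughly 1 in 5 contention events (0 disables, 1 samples all).
		runtime.SetMutexProfileFraction(5)
		defer runtime.SetMutexProfileFraction(0)

		// Generate some contention for the profiler to record.
		var mu sync.Mutex
		var wg sync.WaitGroup
		for i := 0; i < 8; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for j := 0; j < 1000; j++ {
					mu.Lock()
					mu.Unlock()
				}
			}()
		}
		wg.Wait()

		// Dump the accumulated contention stacks.
		pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
	}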

Change-Id: Ie0b54fa4226853d99aa42c14cb529ae586a8335a
Reviewed-on: https://go-review.googlesource.com/29650
Reviewed-by: Austin Clements <austin@google.com>
2016-10-28 11:47:16 +00:00
Austin Clements
d6625caf53 runtime: scan mark worker stacks like normal
Currently, markroot delays scanning mark worker stacks until mark
termination by putting the mark worker G directly on the rescan list
when it encounters one during the mark phase. Without this, since mark
workers are non-preemptible, two mark workers that attempt to scan
each other's stacks can deadlock.

However, this is annoyingly asymmetric and causes some real problems.
First, markroot does not own the G at that point, so it's not
technically safe to add it to the rescan list. I haven't been able to
find a specific problem this could cause, but I suspect it's the root
cause of issue #17099. Second, this will interfere with the hybrid
barrier, since there is no stack rescanning during mark termination
with the hybrid barrier.

This commit switches to a different approach. We move the mark
worker's call to gcDrain to the system stack and set the mark worker's
status to _Gwaiting for the duration of the drain to indicate that
it's preemptible. This lets another mark worker scan its G stack while
the drain is running on the system stack. We don't return to the G
stack until we can switch back to _Grunning, which ensures we don't
race with a stack scan. This lets us eliminate the special case for
mark worker stack scans and scan them just like any other goroutine.
The only subtlety to this approach is that we have to disable stack
shrinking for mark workers; they could be referring to captured
variables from the G stack, so it's not safe to move their stacks.

Updates #17099 and #17503.

Change-Id: Ia5213949ec470af63e24dfce01df357c12adbbea
Reviewed-on: https://go-review.googlesource.com/31820
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-26 18:13:16 +00:00
Austin Clements
575b1dda4e runtime: eliminate allspans snapshot
Now that sweeping and span marking use the sweep list, there's no need
for the work.spans snapshot of the allspans list. This change
eliminates the few remaining uses of it, which are either dead code or
can use allspans directly, and removes work.spans and its support
functions.

Change-Id: Id5388b42b1e68e8baee853d8eafb8bb4ff95bb43
Reviewed-on: https://go-review.googlesource.com/30537
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:33:02 +00:00
Austin Clements
f9497a6747 runtime: make sweep time proportional to in-use spans
Currently sweeping walks the list of all spans, which means the work
in sweeping is proportional to the maximum number of spans ever used.
If the heap was once large but is now small, this causes an
amortization failure: on a small heap, GCs happen frequently, but a
full sweep still has to happen in each GC cycle, which means we spent
a lot of time in sweeping.

Fix this by creating a separate list consisting of just the in-use
spans to be swept, so sweeping is proportional to the number of in-use
spans (which is proportional to the live heap). Specifically, we
create two lists: a list of unswept in-use spans and a list of swept
in-use spans. At the start of the sweep cycle, the swept list becomes
the unswept list and the new swept list is empty. Allocating a new
in-use span adds it to the swept list. Sweeping moves spans from the
unswept list to the swept list.

This fixes the amortization problem because a shrinking heap moves
spans off the unswept list without adding them to the swept list,
reducing the time required by the next sweep cycle.
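
The list mechanics, as a rough sketch with illustrative types (the
runtime tracks mspans, not ints):

	package main

	import "fmt"

	// sweeper keeps in-use spans on one of two lists: not yet swept this
	// cycle, or already swept (including spans allocated this cycle).
	type sweeper struct {
		unswept []int
		swept   []int
	}

	// startCycle begins a new sweep cycle: the spans swept last cycle
	// become this cycle's unswept work, and the swept list starts empty.
	func (s *sweeper) startCycle() {
		s.unswept, s.swept = s.swept, s.unswept[:0]
	}

	// allocSpan records a newly allocated in-use span; it is born swept.
	func (s *sweeper) allocSpan(span int) { s.swept = append(s.swept, span) }

	// sweepOne sweeps one span, moving it from unswept to swept.
	func (s *sweeper) sweepOne() bool {
		if len(s.unswept) == 0 {
			return false
		}
		span := s.unswept[len(s.unswept)-1]
		s.unswept = s.unswept[:len(s.unswept)-1]
		s.swept = append(s.swept, span)
		return true
	}

	func main() {
		var s sweeper
		s.allocSpan(1)
		s.allocSpan(2)
		s.startCycle()
		for s.sweepOne() {
		}
		fmt.Println(len(s.unswept), len(s.swept)) // 0 2
	}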

Updates #9265. This fix eliminates almost all of the time spent in
sweepone; however, markrootSpans has essentially the same bug, so now
the test program from this issue spends all of its time in
markrootSpans.

No significant effect on other benchmarks.

Change-Id: Ib382e82790aad907da1c127e62b3ab45d7a4ac1e
Reviewed-on: https://go-review.googlesource.com/30535
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:57 +00:00
Austin Clements
45baff61e3 runtime: expand comment on work.spans
Change-Id: I4b8a6f5d9bc5aba16026d17f99f3512dacde8d2d
Reviewed-on: https://go-review.googlesource.com/30534
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:54 +00:00
Austin Clements
4d6207790b runtime: consolidate h_allspans and mheap_.allspans
These are two ways to refer to the allspans array that hark back to
when the runtime was split between C and Go. Clean this up by making
mheap_.allspans a slice and eliminating h_allspans.

Change-Id: Ic9360d040cf3eb590b5dfbab0b82e8ace8525610
Reviewed-on: https://go-review.googlesource.com/30530
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:42 +00:00
Austin Clements
5b7497f327 runtime: update heap profile stats after world is started
Updating the heap profile stats is one of the most expensive parts of
mark termination other than stack rescanning, but there's really no
need to do this with the world stopped. Move it to right after we've
started the world back up. This creates a *very* small window where
allocations from the next cycle can slip into the profile, but the
exact point where mark termination happens is so non-deterministic
already that a slight reordering here is unimportant.

Change-Id: I2f76f22c70329923ad6a594a2c26869f0736d34e
Reviewed-on: https://go-review.googlesource.com/31363
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-19 21:36:24 +00:00
Austin Clements
9429aab999 runtime: remove gcWork flushes in mark termination
The only reason these flushes are still necessary at all is that
gcmarknewobject doesn't flush its gcWork stats like it's supposed to.
By changing gcmarknewobject to follow the standard protocol, the
flushes become completely unnecessary because mark 2 ensures caches
are flushed (and stay flushed) before we ever enter mark termination.

In the garbage benchmark, this takes roughly 50 µs, which is
surprisingly long for doing nothing. We still double-check after
draining that they are in fact empty.

Change-Id: Ia1c7cf98a53f72baa513792eb33eca6a0b4a7128
Reviewed-on: https://go-review.googlesource.com/31134
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-19 21:36:09 +00:00
Gleb Stepanov
460d112f6c runtime: fix typo in comments
Fix typo in word synchronization in comments.

Change-Id: I453b4e799301e758799c93df1e32f5244ca2fb84
Reviewed-on: https://go-review.googlesource.com/25174
Reviewed-by: Russ Cox <rsc@golang.org>
2016-10-12 02:48:31 +00:00
Austin Clements
fa9b57bb1d runtime: make next_gc ^0 when GC is disabled
When GC is disabled, we set gcpercent to -1. However, we still use
gcpercent to compute several values, such as next_gc and gc_trigger.
These calculations are meaningless when gcpercent is -1 and result in
meaningless values. This is okay in a sense because we also never use
these values if gcpercent is -1, but they're confusing when exposed to
the user, for example via MemStats or the execution trace. It's
particularly unfortunate in the execution trace because it attempts to
plot the underflowed value of next_gc, which scales all useful
information in the heap row into oblivion.

Fix this by making next_gc ^0 when gcpercent < 0. This has the
advantage of being true in a way: next_gc is effectively infinite when
gcpercent < 0. We can also detect this special value when updating the
execution trace and report next_gc as 0 so it doesn't blow up the
display of the heap line.

Change-Id: I4f366e4451f8892a4908da7b2b6086bdc67ca9a9
Reviewed-on: https://go-review.googlesource.com/30016
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-07 18:32:51 +00:00
Austin Clements
2098e5d39a runtime: eliminate memstats.heap_reachable
We used to compute an estimate of the reachable heap size that was
different from the marked heap size. This ultimately caused more
problems than it solved, so we pulled it out, but memstats still has
both heap_reachable and heap_marked, and there are some leftover TODOs
about the problems with this estimate.

Clean this up by eliminating heap_reachable in favor of heap_marked
and deleting the stale TODOs.

Change-Id: I713bc20a7c90683d2b43ff63c0b21a440269cc4d
Reviewed-on: https://go-review.googlesource.com/29271
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-26 22:00:47 +00:00
Austin Clements
ec9c84c884 runtime: disentangle next_gc from GC trigger
Back in Go 1.4, memstats.next_gc was both the heap size at which GC
would trigger, and the size GC kept the heap under. When we switched
to concurrent GC in Go 1.5, we got somewhat confused and made this
variable the trigger heap size, while gcController.heapGoal became the
goal heap size.

memstats.next_gc is exposed to the user via MemStats.NextGC, while
gcController.heapGoal is not. This is unfortunate because 1) the heap
goal is far more useful for diagnostics, and 2) the trigger heap size
is just part of the GC trigger heuristic, which means it wouldn't be
useful to an application even if it tried to use it.

We never noticed this mess because MemStats.NextGC is practically
undocumented. Now that we're trying to document MemStats, it became
clear that this field had diverged from its original usefulness.

Clean up this mess by shuffling things back around so that next_gc is
the goal heap size and the new (unexposed) memstats.gc_trigger field
is the trigger heap size. This eliminates gcController.heapGoal.

Updates #15849.

Change-Id: I2cbbd43b1d78bdf613cb43f53488bd63913189b7
Reviewed-on: https://go-review.googlesource.com/29270
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-26 22:00:44 +00:00
Austin Clements
cf4f1d07a1 runtime: bound scanobject to ~100 µs
Currently the time spent in scanobject is proportional to the size of
the object being scanned. Since scanobject is non-preemptible, large
objects can cause significant goroutine (and even whole application)
delays through several means:

1. If a GC assist picks up a large object, the allocating goroutine is
   blocked for the whole scan, even if that scan well exceeds that
   goroutine's debt.

2. Since the scheduler does not run on the P performing a large object
   scan, goroutines in that P's run queue do not run unless they are
   stolen by another P (which can take some time). If there are a few
   large objects, all of the Ps may get tied up so the scheduler
   doesn't run anywhere.

3. Even if a large object is scanned by a background worker and other
   Ps are still running the scheduler, the large object scan doesn't
   flush background credit until the whole scan is done. This can
   easily cause all allocations to block in assists, waiting for
   credit, causing an effective STW.

Fix this by splitting large objects into 128 KB "oblets" and scanning
at most one oblet at a time. Since we can scan 1–2 MB/ms, this equates
to bounding scanobject at roughly 100 µs. This improves assist
behavior both because assists can no longer get "unlucky" and be stuck
scanning a large object, and because it causes the background worker
to flush credit and unblock assists more frequently when scanning
large objects. This also improves GC parallelism if the heap consists
primarily of a small number of very large objects by letting multiple
workers scan a large object in parallel.

Fixes #10345. Fixes #16293.

This substantially improves goroutine latency in the benchmark from
issue #16293, which exercises several forms of very large objects:

name                 old max-latency    new max-latency    delta
SliceNoPointer-12           154µs ± 1%        155µs ±  2%     ~     (p=0.087 n=13+12)
SlicePointer-12             314ms ± 1%       5.94ms ±138%  -98.11%  (p=0.000 n=19+20)
SliceLivePointer-12        1148ms ± 0%       4.72ms ±167%  -99.59%  (p=0.000 n=19+20)
MapNoPointer-12           72509µs ± 1%        408µs ±325%  -99.44%  (p=0.000 n=19+18)
ChanPointer-12              313ms ± 0%       4.74ms ±140%  -98.49%  (p=0.000 n=18+20)
ChanLivePointer-12         1147ms ± 0%       3.30ms ±149%  -99.71%  (p=0.000 n=19+20)

name                 old P99.9-latency  new P99.9-latency  delta
SliceNoPointer-12           113µs ±25%         107µs ±12%     ~     (p=0.153 n=20+18)
SlicePointer-12          309450µs ± 0%         133µs ±23%  -99.96%  (p=0.000 n=20+20)
SliceLivePointer-12         961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)
MapNoPointer-12            448µs ±288%         119µs ±18%  -73.34%  (p=0.000 n=18+20)
ChanPointer-12           309450µs ± 0%         134µs ±23%  -99.96%  (p=0.000 n=20+19)
ChanLivePointer-12          961ms ± 0%        1.35ms ±27%  -99.86%  (p=0.000 n=20+20)

This has negligible effect on all metrics from the garbage, JSON, and
HTTP x/benchmarks.

It shows slight improvement on some of the go1 benchmarks,
particularly Revcomp, which uses some multi-megabyte buffers:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.46s ± 1%     2.47s ± 1%  +0.32%  (p=0.012 n=20+20)
Fannkuch11-12                2.82s ± 0%     2.81s ± 0%  -0.61%  (p=0.000 n=17+20)
FmtFprintfEmpty-12          50.8ns ± 5%    50.5ns ± 2%    ~     (p=0.197 n=17+19)
FmtFprintfString-12          131ns ± 1%     132ns ± 0%  +0.57%  (p=0.000 n=20+16)
FmtFprintfInt-12             117ns ± 0%     116ns ± 0%  -0.47%  (p=0.000 n=15+20)
FmtFprintfIntInt-12          180ns ± 0%     179ns ± 1%  -0.78%  (p=0.000 n=16+20)
FmtFprintfPrefixedInt-12     186ns ± 1%     185ns ± 1%  -0.55%  (p=0.000 n=19+20)
FmtFprintfFloat-12           263ns ± 1%     271ns ± 0%  +2.84%  (p=0.000 n=18+20)
FmtManyArgs-12               741ns ± 1%     742ns ± 1%    ~     (p=0.190 n=19+19)
GobDecode-12                7.44ms ± 0%    7.35ms ± 1%  -1.21%  (p=0.000 n=20+20)
GobEncode-12                6.22ms ± 1%    6.21ms ± 1%    ~     (p=0.336 n=20+19)
Gzip-12                      220ms ± 1%     219ms ± 1%    ~     (p=0.130 n=19+19)
Gunzip-12                   37.9ms ± 0%    37.9ms ± 1%    ~     (p=1.000 n=20+19)
HTTPClientServer-12         82.5µs ± 3%    82.6µs ± 3%    ~     (p=0.776 n=20+19)
JSONEncode-12               16.4ms ± 1%    16.5ms ± 2%  +0.49%  (p=0.003 n=18+19)
JSONDecode-12               53.7ms ± 1%    54.1ms ± 1%  +0.71%  (p=0.000 n=19+18)
Mandelbrot200-12            4.19ms ± 1%    4.20ms ± 1%    ~     (p=0.452 n=19+19)
GoParse-12                  3.38ms ± 1%    3.37ms ± 1%    ~     (p=0.123 n=19+19)
RegexpMatchEasy0_32-12      72.1ns ± 1%    71.8ns ± 1%    ~     (p=0.397 n=19+17)
RegexpMatchEasy0_1K-12       242ns ± 0%     242ns ± 0%    ~     (p=0.168 n=17+20)
RegexpMatchEasy1_32-12      72.1ns ± 1%    72.1ns ± 1%    ~     (p=0.538 n=18+19)
RegexpMatchEasy1_1K-12       385ns ± 1%     384ns ± 1%    ~     (p=0.388 n=20+20)
RegexpMatchMedium_32-12      112ns ± 1%     112ns ± 3%    ~     (p=0.539 n=20+20)
RegexpMatchMedium_1K-12     34.4µs ± 2%    34.4µs ± 2%    ~     (p=0.628 n=18+18)
RegexpMatchHard_32-12       1.80µs ± 1%    1.80µs ± 1%    ~     (p=0.522 n=18+19)
RegexpMatchHard_1K-12       54.0µs ± 1%    54.1µs ± 1%    ~     (p=0.647 n=20+19)
Revcomp-12                   387ms ± 1%     369ms ± 5%  -4.89%  (p=0.000 n=17+19)
Template-12                 62.3ms ± 1%    62.0ms ± 0%  -0.48%  (p=0.002 n=20+17)
TimeParse-12                 314ns ± 1%     314ns ± 0%    ~     (p=1.011 n=20+13)
TimeFormat-12                358ns ± 0%     354ns ± 0%  -1.12%  (p=0.000 n=17+20)
[Geo mean]                  53.5µs         53.3µs       -0.23%

Change-Id: I2a0a179d1d6bf7875dd054b7693dd12d2a340132
Reviewed-on: https://go-review.googlesource.com/23540
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-06 19:27:33 +00:00