mirror of https://github.com/golang/go synced 2024-11-20 08:14:41 -07:00
Commit Graph

211 Commits

Author SHA1 Message Date
Austin Clements
f4d51eb2f5 runtime: minor clean up to heapminimum
Currently setGCPercent sets heapminimum to heapminimum*GOGC/100. The
real intent is to set heapminimum to a scaled multiple of a fixed
default heap minimum, not to scale heapminimum based on its current
value. This turns out to be okay because setGCPercent is only called
once and heapminimum is initially set to this default heap minimum.
However, the code as written is confusing, especially since
setGCPercent is otherwise written so it could be called again to
change GOGC. Fix this by introducing a defaultHeapMinimum constant and
using this instead of the current value of heapminimum to compute the
scaled heap minimum.

As part of this, this commit improves the documentation on
heapminimum.
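
As a rough sketch of the fixed computation (illustrative, not the
runtime's actual code; only the defaultHeapMinimum and heapminimum
names echo the real ones):

package main

import "fmt"

// defaultHeapMinimum is the fixed default that GOGC scales.
const defaultHeapMinimum = 4 << 20 // 4 MB

var heapminimum uint64 = defaultHeapMinimum

func setGCPercent(in int32) {
	if in < 0 {
		in = 100 // sketch only; the runtime handles GOGC=off separately
	}
	// Scale the fixed default, not the current heapminimum, so a
	// second call does not compound the scaling.
	heapminimum = defaultHeapMinimum * uint64(in) / 100
}

func main() {
	setGCPercent(200)
	setGCPercent(200)
	fmt.Println(heapminimum>>20, "MB") // 8 MB both times, not 16
}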

Change-Id: I4eb82c73dc2eb44a6e5a17c780a747a2e73d7493
Reviewed-on: https://go-review.googlesource.com/10181
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-19 15:30:34 +00:00
Austin Clements
277acca286 runtime: hold worldsema while starting the world
Currently, startTheWorld releases worldsema before starting the
world. Since startTheWorld can change gomaxprocs after allowing Ps to
run, this means that gomaxprocs can change while another P holds
worldsema.

Unfortunately, the garbage collector and forEachP assume that holding
worldsema protects against changes in gomaxprocs (which it *almost*
does). In particular, this is causing somewhat frequent "P did not run
fn" crashes in forEachP in the runtime tests because gomaxprocs is
changing between the several loops that forEachP does over all the Ps.

Fix this by only releasing worldsema after the world is started.

This relates to issue #10618. forEachP still fails under stress
testing, but much less frequently.
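
A minimal sketch of the reordering, using a sync.Mutex as a stand-in
for worldsema (restartWorld is a placeholder for the actual
start-the-world steps):

package main

import "sync"

var worldsema sync.Mutex // stand-in for the runtime's worldsema

// Old ordering (buggy): release first, then restart the world, so a
// new worldsema holder could observe gomaxprocs mid-change.
func startTheWorldOld(restartWorld func()) {
	worldsema.Unlock()
	restartWorld()
}

// New ordering: restart the world, then release, so holding worldsema
// really does protect against gomaxprocs changes.
func startTheWorldNew(restartWorld func()) {
	restartWorld()
	worldsema.Unlock()
}

func main() {
	worldsema.Lock() // acquired by the matching stop-the-world
	startTheWorldNew(func() { /* restart Ps, maybe set gomaxprocs */ })
}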

Change-Id: I085d627b70cca9ebe9af28fe73b9872f1bb224ff
Reviewed-on: https://go-review.googlesource.com/10156
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:37 +00:00
Austin Clements
a1da255aa0 runtime: factor stoptheworld/starttheworld pattern
There are several steps to stopping and starting the world and
currently they're open-coded in several places. The garbage collector
is the only thing that needs to stop and start the world in a
non-trivial pattern. Replace all other uses with calls to higher-level
functions that implement the entire pattern necessary to stop and
start the world.

This is a pure refactoring and should not change any code semantics.
In the following commits, we'll make changes that are easier to do
with this abstraction in place.

This commit renames the old starttheworld to startTheWorldWithSema.
This is a slight misnomer right now because the callers release
worldsema just before calling this. However, a later commit will swap
these, and I don't want to think of another name in the meantime.
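
A sketch of the factored shape with placeholder bodies (only
stopTheWorldWithSema/startTheWorldWithSema echo real names; the rest
is illustrative scaffolding):

package main

func acquireWorldsema()      {}
func releaseWorldsema()      {}
func stopTheWorldWithSema()  {}
func startTheWorldWithSema() {}

// The wrappers bundle the whole pattern so trivial stop/start callers
// no longer open-code the steps.
func stopTheWorld() {
	acquireWorldsema()
	stopTheWorldWithSema()
}

func startTheWorld() {
	startTheWorldWithSema()
	releaseWorldsema() // ordering per the swap described above
}

func main() {
	stopTheWorld()
	// ... work with the world stopped ...
	startTheWorld()
}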

Change-Id: I5dc97f87b44fb98963c49c777d7053653974c911
Reviewed-on: https://go-review.googlesource.com/10154
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:25 +00:00
Austin Clements
5f7060afd2 runtime: don't start GC if preemptoff is set
In order to avoid deadlocks, startGC avoids kicking off GC if locks
are held by the calling M. However, it currently fails to check
preemptoff, which is the other way to disable preemption.

Fix this by adding a check for preemptoff.
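
A sketch of the widened guard; locks and preemptoff mirror the real
field names on the runtime's M, everything else is scaffolding:

package main

type m struct {
	locks      int32
	preemptoff string // non-empty means preemption is disabled (with a reason)
}

// okToStartGC bails out if the calling M holds locks *or* has
// disabled preemption via preemptoff.
func okToStartGC(mp *m) bool {
	return mp.locks == 0 && mp.preemptoff == ""
}

func main() {
	mp := &m{preemptoff: "write barriers off"}
	println(okToStartGC(mp)) // false: kicking off GC here risks deadlock
}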

Change-Id: Ie1083166e5ba4af5c9d6c5a42efdfaaef41ca997
Reviewed-on: https://go-review.googlesource.com/10153
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:18 +00:00
Russ Cox
512f75e8df runtime: replace GC programs with simpler encoding, faster decoder
Small types record the location of pointers in their memory layout
by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries,
and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using
a bitmap for a large type containing arrays does not make sense:
if someone refers to the type [1<<28]*byte in a program in such
a way that the type information makes it into the binary, it would be
a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB
(for 1-bit entries) bitmap full of 1s into the binary or even to keep
one in memory during the execution of the program.

For large types containing arrays, it is much more compact to describe
the locations of pointers using a notation that can express repetition
than to lay out a bitmap of pointers. Go 1.4 included such a notation,
called ``GC programs,'' but it was complex, required recursion during
decoding, and was generally slow. Dmitriy measured the execution of
these programs writing directly to the heap bitmap as being 7x slower
than copying from a preunrolled 4-bit mask (and frankly that code was
not terribly fast either). For some tests, unrollgcprog1 was seen costing
as much as 3x more than the rest of malloc combined.

This CL introduces a different form for the GC programs. They use a
simple Lempel-Ziv-style encoding of the 1-bit pointer information,
in which the only operations are (1) emit the following n bits
and (2) repeat the last n bits c more times. This encoding can be
generated directly from the Go type information (using repetition
only for arrays or large runs of non-pointer data) and it can be decoded
very efficiently. In particular the decoding requires little state and
no recursion, so that the entire decoding can run without any memory
accesses other than the reads of the encoding and the writes of the
decoded form to the heap bitmap. For recursive types like arrays of
arrays of arrays, the inner instructions are only executed once, not
n times, so that large repetitions run at full speed. (In contrast, large
repetitions in the old programs repeated the individual bit-level layout
of the inner data over and over.) The result is as much as 25x faster
decoding compared to the old form.
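
A self-contained sketch of such a decoder with a simplified
instruction set (the real runtime packs instructions into a byte
stream; a struct keeps the sketch readable). Note the loop needs no
recursion and no state beyond the output being built:

package main

import "fmt"

// insn is an illustrative instruction form.
type insn struct {
	op   int    // 0 = stop, 1 = emit literal bits, 2 = repeat
	bits []byte // op 1: the literal bits, one byte per bit for clarity
	n, c int    // op 2: repeat the last n output bits c more times
}

// decode expands a program into a 1-bit pointer bitmap.
func decode(prog []insn) []byte {
	var out []byte
	for _, in := range prog {
		switch in.op {
		case 0:
			return out
		case 1:
			out = append(out, in.bits...)
		case 2:
			last := out[len(out)-in.n:]
			for i := 0; i < in.c; i++ {
				out = append(out, last...)
			}
		}
	}
	return out
}

func main() {
	// A type like [8]*byte: emit one pointer bit, repeat it 7 more times.
	prog := []insn{{op: 1, bits: []byte{1}}, {op: 2, n: 1, c: 7}}
	fmt.Println(decode(prog)) // [1 1 1 1 1 1 1 1]
}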

Because the old decoder was so slow, Go 1.4 had three (or so) cases
for how to set the heap bitmap bits for an allocation of a given type:

(1) If the type had an even number of words up to 32 words, then
the 4-bit pointer mask for the type fit in no more than 16 bytes;
store the 4-bit pointer mask directly in the binary and copy from it.

(1b) If the type had an odd number of words up to 15 words, then
the 4-bit pointer mask for the type, doubled to end on a byte boundary,
fit in no more than 16 bytes; store that doubled mask directly in the
binary and copy from it.

(2) If the type had an even number of words up to 128 words,
or an odd number of words up to 63 words (again due to doubling),
then the 4-bit pointer mask would fit in a 64-byte unrolled mask.
Store a GC program in the binary, but leave space in the BSS for
the unrolled mask. Execute the GC program to construct the mask the
first time it is needed, and thereafter copy from the mask.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
(This is the case that was 7x slower than the other two.)

Because the new pointer masks store 1-bit entries instead of 4-bit
entries and because using the decoder no longer carries a significant
overhead, after this CL (that is, for Go 1.5) there are only two cases:

(1) If the type is 128 words or less (no condition about odd or even),
store the 1-bit pointer mask directly in the binary and use it to
initialize the heap bitmap during malloc. (Implemented in CL 9702.)

(2) There is no case 2 anymore.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.

Executing the GC program directly into the heap bitmap (case (3) above)
was disabled for the Go 1.5 dev cycle, both to avoid needing to use
GC programs for typedmemmove and to avoid updating that code as
the heap bitmap format changed. Typedmemmove no longer uses this
type information; as of CL 9886 it uses the heap bitmap directly.
Now that the heap bitmap format is stable, we reintroduce GC programs
and their space savings.

Benchmarks for heapBitsSetType, before this CL vs this CL:

name                    old mean               new mean              delta
SetTypePtr              7.59ns × (0.99,1.02)   5.16ns × (1.00,1.00)  -32.05% (p=0.000)
SetTypePtr8             21.0ns × (0.98,1.05)   21.4ns × (1.00,1.00)     ~    (p=0.179)
SetTypePtr16            24.1ns × (0.99,1.01)   24.6ns × (1.00,1.00)   +2.41% (p=0.001)
SetTypePtr32            31.2ns × (0.99,1.01)   32.4ns × (0.99,1.02)   +3.72% (p=0.001)
SetTypePtr64            45.2ns × (1.00,1.00)   47.2ns × (1.00,1.00)   +4.42% (p=0.000)
SetTypePtr126           75.8ns × (0.99,1.01)   79.1ns × (1.00,1.00)   +4.25% (p=0.000)
SetTypePtr128           74.3ns × (0.99,1.01)   77.6ns × (1.00,1.01)   +4.55% (p=0.000)
SetTypePtrSlice          726ns × (1.00,1.01)    712ns × (1.00,1.00)   -1.95% (p=0.001)
SetTypeNode1            20.0ns × (0.99,1.01)   20.7ns × (1.00,1.00)   +3.71% (p=0.000)
SetTypeNode1Slice        112ns × (1.00,1.00)    113ns × (0.99,1.00)     ~    (p=0.070)
SetTypeNode8            23.9ns × (1.00,1.00)   24.7ns × (1.00,1.01)   +3.18% (p=0.000)
SetTypeNode8Slice        294ns × (0.99,1.02)    287ns × (0.99,1.01)   -2.38% (p=0.015)
SetTypeNode64           52.8ns × (0.99,1.03)   51.8ns × (0.99,1.01)     ~    (p=0.069)
SetTypeNode64Slice      1.13µs × (0.99,1.05)   1.14µs × (0.99,1.00)     ~    (p=0.767)
SetTypeNode64Dead       36.0ns × (1.00,1.01)   32.5ns × (0.99,1.00)   -9.67% (p=0.000)
SetTypeNode64DeadSlice  1.43µs × (0.99,1.01)   1.40µs × (1.00,1.00)   -2.39% (p=0.001)
SetTypeNode124          75.7ns × (1.00,1.01)   79.0ns × (1.00,1.00)   +4.44% (p=0.000)
SetTypeNode124Slice     1.94µs × (1.00,1.01)   2.04µs × (0.99,1.01)   +4.98% (p=0.000)
SetTypeNode126          75.4ns × (1.00,1.01)   77.7ns × (0.99,1.01)   +3.11% (p=0.000)
SetTypeNode126Slice     1.95µs × (0.99,1.01)   2.03µs × (1.00,1.00)   +3.74% (p=0.000)
SetTypeNode128          85.4ns × (0.99,1.01)  122.0ns × (1.00,1.00)  +42.89% (p=0.000)
SetTypeNode128Slice     2.20µs × (1.00,1.01)   2.36µs × (0.98,1.02)   +7.48% (p=0.001)
SetTypeNode130          83.3ns × (1.00,1.00)  123.0ns × (1.00,1.00)  +47.61% (p=0.000)
SetTypeNode130Slice     2.30µs × (0.99,1.01)   2.40µs × (0.98,1.01)   +4.37% (p=0.000)
SetTypeNode1024          498ns × (1.00,1.00)    537ns × (1.00,1.00)   +7.96% (p=0.000)
SetTypeNode1024Slice    15.5µs × (0.99,1.01)   17.8µs × (1.00,1.00)  +15.27% (p=0.000)

The above compares always using a cached pointer mask (and the
corresponding waste of memory) against using the programs directly.
Some slowdown is expected, in exchange for having a better general algorithm.
The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024,
along with the slice variants of those.
It is possible that the cutoff of 128 words (bits) should be raised
in a followup CL, but even with this low cutoff the GC programs are
faster than Go 1.4's "fast path" non-GC program case.

Benchmarks for heapBitsSetType, Go 1.4 vs this CL:

name                    old mean              new mean              delta
SetTypePtr              6.89ns × (1.00,1.00)  5.17ns × (1.00,1.00)  -25.02% (p=0.000)
SetTypePtr8             25.8ns × (0.97,1.05)  21.5ns × (1.00,1.00)  -16.70% (p=0.000)
SetTypePtr16            39.8ns × (0.97,1.02)  24.7ns × (0.99,1.01)  -37.81% (p=0.000)
SetTypePtr32            68.8ns × (0.98,1.01)  32.2ns × (1.00,1.01)  -53.18% (p=0.000)
SetTypePtr64             130ns × (1.00,1.00)    47ns × (1.00,1.00)  -63.67% (p=0.000)
SetTypePtr126            241ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.25% (p=0.000)
SetTypePtr128           2.07µs × (1.00,1.00)  0.08µs × (1.00,1.00)  -96.27% (p=0.000)
SetTypePtrSlice         1.05µs × (0.99,1.01)  0.72µs × (0.99,1.02)  -31.70% (p=0.000)
SetTypeNode1            16.0ns × (0.99,1.01)  20.8ns × (0.99,1.03)  +29.91% (p=0.000)
SetTypeNode1Slice        184ns × (0.99,1.01)   112ns × (0.99,1.01)  -39.26% (p=0.000)
SetTypeNode8            29.5ns × (0.97,1.02)  24.6ns × (1.00,1.00)  -16.50% (p=0.000)
SetTypeNode8Slice        624ns × (0.98,1.02)   285ns × (1.00,1.00)  -54.31% (p=0.000)
SetTypeNode64            135ns × (0.96,1.08)    52ns × (0.99,1.02)  -61.32% (p=0.000)
SetTypeNode64Slice      3.83µs × (1.00,1.00)  1.14µs × (0.99,1.01)  -70.16% (p=0.000)
SetTypeNode64Dead        134ns × (0.99,1.01)    32ns × (1.00,1.01)  -75.74% (p=0.000)
SetTypeNode64DeadSlice  3.83µs × (0.99,1.00)  1.40µs × (1.00,1.01)  -63.42% (p=0.000)
SetTypeNode124           240ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.05% (p=0.000)
SetTypeNode124Slice     7.27µs × (1.00,1.00)  2.04µs × (1.00,1.00)  -71.95% (p=0.000)
SetTypeNode126          2.06µs × (0.99,1.01)  0.08µs × (0.99,1.01)  -96.23% (p=0.000)
SetTypeNode126Slice     64.4µs × (1.00,1.00)   2.0µs × (1.00,1.00)  -96.85% (p=0.000)
SetTypeNode128          2.09µs × (1.00,1.01)  0.12µs × (1.00,1.00)  -94.15% (p=0.000)
SetTypeNode128Slice     65.4µs × (1.00,1.00)   2.4µs × (0.99,1.03)  -96.39% (p=0.000)
SetTypeNode130          2.11µs × (1.00,1.00)  0.12µs × (1.00,1.00)  -94.18% (p=0.000)
SetTypeNode130Slice     66.3µs × (1.00,1.00)   2.4µs × (0.97,1.08)  -96.34% (p=0.000)
SetTypeNode1024         16.0µs × (1.00,1.01)   0.5µs × (1.00,1.00)  -96.65% (p=0.000)
SetTypeNode1024Slice     512µs × (1.00,1.00)    18µs × (0.98,1.04)  -96.45% (p=0.000)

SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation.
Both Go 1.4 and this CL are using pointer bitmaps for this case,
so that's an overall 3x speedup for using pointer bitmaps.

SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation.
Both Go 1.4 and this CL are running the GC program for this case,
so that's an overall 17x speedup when using GC programs (and
I've seen >20x on other systems).

Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against
this CL's SetTypeNode128 (GC program), the slow path in the
code in this CL is 2x faster than the fast path in Go 1.4.

The Go 1 benchmarks are basically unaffected compared to just before this CL.

Go 1 benchmarks, before this CL vs this CL:

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.97,1.04)   5.91s × (0.96,1.04)    ~    (p=0.306)
Fannkuch11              4.38s × (1.00,1.00)   4.37s × (1.00,1.01)  -0.22% (p=0.006)
FmtFprintfEmpty        90.7ns × (0.97,1.10)  89.3ns × (0.96,1.09)    ~    (p=0.280)
FmtFprintfString        282ns × (0.98,1.04)   287ns × (0.98,1.07)  +1.72% (p=0.039)
FmtFprintfInt           269ns × (0.99,1.03)   282ns × (0.97,1.04)  +4.87% (p=0.000)
FmtFprintfIntInt        478ns × (0.99,1.02)   481ns × (0.99,1.02)  +0.61% (p=0.048)
FmtFprintfPrefixedInt   399ns × (0.98,1.03)   400ns × (0.98,1.05)    ~    (p=0.533)
FmtFprintfFloat         563ns × (0.99,1.01)   570ns × (1.00,1.01)  +1.37% (p=0.000)
FmtManyArgs            1.89µs × (0.99,1.01)  1.92µs × (0.99,1.02)  +1.88% (p=0.000)
GobDecode              15.2ms × (0.99,1.01)  15.2ms × (0.98,1.05)    ~    (p=0.609)
GobEncode              11.6ms × (0.98,1.03)  11.9ms × (0.98,1.04)  +2.17% (p=0.000)
Gzip                    648ms × (0.99,1.01)   648ms × (1.00,1.01)    ~    (p=0.835)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.169)
HTTPClientServer       90.5µs × (0.98,1.03)  91.5µs × (0.98,1.04)  +1.04% (p=0.045)
JSONEncode             31.5ms × (0.98,1.03)  31.4ms × (0.98,1.03)    ~    (p=0.549)
JSONDecode              111ms × (0.99,1.01)   107ms × (0.99,1.01)  -3.21% (p=0.000)
Mandelbrot200          6.01ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.878)
GoParse                6.54ms × (0.99,1.02)  6.61ms × (0.99,1.03)  +1.08% (p=0.004)
RegexpMatchEasy0_32     160ns × (1.00,1.01)   161ns × (1.00,1.00)  +0.40% (p=0.000)
RegexpMatchEasy0_1K     560ns × (0.99,1.01)   559ns × (0.99,1.01)    ~    (p=0.088)
RegexpMatchEasy1_32     138ns × (0.99,1.01)   138ns × (1.00,1.00)    ~    (p=0.380)
RegexpMatchEasy1_1K     877ns × (1.00,1.00)   878ns × (1.00,1.00)    ~    (p=0.157)
RegexpMatchMedium_32    251ns × (0.99,1.00)   251ns × (1.00,1.01)  +0.28% (p=0.021)
RegexpMatchMedium_1K   72.6µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.539)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.378)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.067)
Revcomp                 904ms × (0.99,1.02)   904ms × (0.99,1.01)    ~    (p=0.943)
Template                125ms × (0.99,1.02)   127ms × (0.99,1.01)  +1.79% (p=0.000)
TimeParse               627ns × (0.99,1.01)   622ns × (0.99,1.01)  -0.88% (p=0.000)
TimeFormat              655ns × (0.99,1.02)   655ns × (0.99,1.02)    ~    (p=0.976)

For the record, Go 1 benchmarks, Go 1.4 vs this CL:

name                   old mean              new mean              delta
BinaryTree17            4.61s × (0.97,1.05)   5.91s × (0.98,1.03)  +28.35% (p=0.000)
Fannkuch11              4.40s × (0.99,1.03)   4.41s × (0.99,1.01)     ~    (p=0.212)
FmtFprintfEmpty         102ns × (0.99,1.01)    84ns × (0.99,1.02)  -18.38% (p=0.000)
FmtFprintfString        302ns × (0.98,1.01)   303ns × (0.99,1.02)     ~    (p=0.203)
FmtFprintfInt           313ns × (0.97,1.05)   270ns × (0.99,1.01)  -13.69% (p=0.000)
FmtFprintfIntInt        524ns × (0.98,1.02)   477ns × (0.99,1.00)   -8.87% (p=0.000)
FmtFprintfPrefixedInt   424ns × (0.98,1.02)   386ns × (0.99,1.01)   -8.96% (p=0.000)
FmtFprintfFloat         652ns × (0.98,1.02)   594ns × (0.97,1.05)   -8.97% (p=0.000)
FmtManyArgs            2.13µs × (0.99,1.02)  1.94µs × (0.99,1.01)   -8.92% (p=0.000)
GobDecode              17.1ms × (0.99,1.02)  14.9ms × (0.98,1.03)  -13.07% (p=0.000)
GobEncode              13.5ms × (0.98,1.03)  11.5ms × (0.98,1.03)  -15.25% (p=0.000)
Gzip                    656ms × (0.99,1.02)   647ms × (0.99,1.01)   -1.29% (p=0.000)
Gunzip                  143ms × (0.99,1.02)   144ms × (0.99,1.01)     ~    (p=0.204)
HTTPClientServer       88.2µs × (0.98,1.02)  90.8µs × (0.98,1.01)   +2.93% (p=0.000)
JSONEncode             32.2ms × (0.98,1.02)  30.9ms × (0.97,1.04)   -4.06% (p=0.001)
JSONDecode              121ms × (0.98,1.02)   110ms × (0.98,1.05)   -8.95% (p=0.000)
Mandelbrot200          6.06ms × (0.99,1.01)  6.11ms × (0.98,1.04)     ~    (p=0.184)
GoParse                6.76ms × (0.97,1.04)  6.58ms × (0.98,1.05)   -2.63% (p=0.003)
RegexpMatchEasy0_32     195ns × (1.00,1.01)   155ns × (0.99,1.01)  -20.43% (p=0.000)
RegexpMatchEasy0_1K     479ns × (0.98,1.03)   535ns × (0.99,1.02)  +11.59% (p=0.000)
RegexpMatchEasy1_32     169ns × (0.99,1.02)   131ns × (0.99,1.03)  -22.44% (p=0.000)
RegexpMatchEasy1_1K    1.53µs × (0.99,1.01)  0.87µs × (0.99,1.02)  -43.07% (p=0.000)
RegexpMatchMedium_32    334ns × (0.99,1.01)   242ns × (0.99,1.01)  -27.53% (p=0.000)
RegexpMatchMedium_1K    125µs × (1.00,1.01)    72µs × (0.99,1.03)  -42.53% (p=0.000)
RegexpMatchHard_32     6.03µs × (0.99,1.01)  3.79µs × (0.99,1.01)  -37.12% (p=0.000)
RegexpMatchHard_1K      189µs × (0.99,1.02)   115µs × (0.99,1.01)  -39.20% (p=0.000)
Revcomp                 935ms × (0.96,1.03)   926ms × (0.98,1.02)     ~    (p=0.083)
Template                146ms × (0.97,1.05)   119ms × (0.99,1.01)  -18.37% (p=0.000)
TimeParse               660ns × (0.99,1.01)   624ns × (0.99,1.02)   -5.43% (p=0.000)
TimeFormat              670ns × (0.98,1.02)   710ns × (1.00,1.01)   +5.97% (p=0.000)

This CL is a bit larger than I would like, but the compiler, linker, runtime,
and package reflect all need to be in sync about the format of these programs,
so there is no easy way to split this into independent changes (at least
while keeping the build working at each change).

Fixes #9625.
Fixes #10524.

Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a
Reviewed-on: https://go-review.googlesource.com/9888
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
2015-05-16 00:38:17 +00:00
Russ Cox
1635ab7dfe runtime: remove wbshadow mode
The write barrier shadow heap was very useful for
developing the write barriers initially, but it's no longer used,
clunky, and dragging the rest of the implementation down.

The gccheckmark mode will find bugs due to missed barriers
when they result in missed marks; wbshadow mode found the
missed barriers more aggressively, but it required an entire
separate copy of the heap. The gccheckmark mode requires
no extra memory, making it more useful in practice.

Compared to previous CL:
name                   old mean              new mean              delta
BinaryTree17            5.91s × (0.96,1.06)   5.72s × (0.97,1.03)  -3.12% (p=0.000)
Fannkuch11              4.32s × (1.00,1.00)   4.36s × (1.00,1.00)  +0.91% (p=0.000)
FmtFprintfEmpty        89.0ns × (0.93,1.10)  86.6ns × (0.96,1.11)    ~    (p=0.077)
FmtFprintfString        298ns × (0.98,1.06)   283ns × (0.99,1.04)  -4.90% (p=0.000)
FmtFprintfInt           286ns × (0.98,1.03)   283ns × (0.98,1.04)  -1.09% (p=0.032)
FmtFprintfIntInt        498ns × (0.97,1.06)   480ns × (0.99,1.02)  -3.65% (p=0.000)
FmtFprintfPrefixedInt   408ns × (0.98,1.02)   396ns × (0.99,1.01)  -3.00% (p=0.000)
FmtFprintfFloat         587ns × (0.98,1.01)   562ns × (0.99,1.01)  -4.34% (p=0.000)
FmtManyArgs            1.94µs × (0.99,1.02)  1.89µs × (0.99,1.01)  -2.85% (p=0.000)
GobDecode              15.8ms × (0.98,1.03)  15.7ms × (0.99,1.02)    ~    (p=0.251)
GobEncode              12.0ms × (0.96,1.09)  11.8ms × (0.98,1.03)  -1.87% (p=0.024)
Gzip                    648ms × (0.99,1.01)   647ms × (0.99,1.01)    ~    (p=0.688)
Gunzip                  143ms × (1.00,1.01)   143ms × (1.00,1.01)    ~    (p=0.203)
HTTPClientServer       90.3µs × (0.98,1.01)  89.1µs × (0.99,1.02)  -1.30% (p=0.000)
JSONEncode             31.6ms × (0.99,1.01)  31.7ms × (0.98,1.02)    ~    (p=0.219)
JSONDecode              107ms × (1.00,1.01)   111ms × (0.99,1.01)  +3.58% (p=0.000)
Mandelbrot200          6.03ms × (1.00,1.01)  6.01ms × (1.00,1.00)    ~    (p=0.077)
GoParse                6.53ms × (0.99,1.03)  6.54ms × (0.99,1.02)    ~    (p=0.585)
RegexpMatchEasy0_32     161ns × (1.00,1.01)   161ns × (0.98,1.05)    ~    (p=0.948)
RegexpMatchEasy0_1K     541ns × (0.99,1.01)   559ns × (0.98,1.01)  +3.32% (p=0.000)
RegexpMatchEasy1_32     138ns × (1.00,1.00)   137ns × (0.99,1.01)  -0.55% (p=0.001)
RegexpMatchEasy1_1K     887ns × (0.99,1.01)   878ns × (0.99,1.01)  -0.98% (p=0.000)
RegexpMatchMedium_32    253ns × (0.99,1.01)   252ns × (0.99,1.01)  -0.39% (p=0.001)
RegexpMatchMedium_1K   72.8µs × (1.00,1.00)  72.7µs × (1.00,1.00)    ~    (p=0.485)
RegexpMatchHard_32     3.85µs × (1.00,1.01)  3.85µs × (1.00,1.01)    ~    (p=0.283)
RegexpMatchHard_1K      117µs × (1.00,1.01)   117µs × (1.00,1.00)    ~    (p=0.175)
Revcomp                 922ms × (0.97,1.08)   903ms × (0.98,1.05)  -2.15% (p=0.021)
Template                126ms × (0.99,1.01)   126ms × (0.99,1.01)    ~    (p=0.943)
TimeParse               628ns × (0.99,1.01)   634ns × (0.99,1.01)  +0.92% (p=0.000)
TimeFormat              668ns × (0.99,1.01)   698ns × (0.98,1.03)  +4.53% (p=0.000)

It's nice that the microbenchmarks are the ones helped the most,
because those were the ones hurt the most by the conversion from
4-bit to 2-bit heap bitmaps. This CL brings the overall effect of that
process to (compared to CL 9706 patch set 1):

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.94,1.09)   5.72s × (0.97,1.03)  -2.57% (p=0.011)
Fannkuch11              4.32s × (1.00,1.00)   4.36s × (1.00,1.00)  +0.87% (p=0.000)
FmtFprintfEmpty        89.1ns × (0.95,1.16)  86.6ns × (0.96,1.11)    ~    (p=0.090)
FmtFprintfString        283ns × (0.98,1.02)   283ns × (0.99,1.04)    ~    (p=0.681)
FmtFprintfInt           284ns × (0.98,1.04)   283ns × (0.98,1.04)    ~    (p=0.620)
FmtFprintfIntInt        486ns × (0.98,1.03)   480ns × (0.99,1.02)  -1.27% (p=0.002)
FmtFprintfPrefixedInt   400ns × (0.99,1.02)   396ns × (0.99,1.01)  -0.84% (p=0.001)
FmtFprintfFloat         566ns × (0.99,1.01)   562ns × (0.99,1.01)  -0.80% (p=0.000)
FmtManyArgs            1.91µs × (0.99,1.02)  1.89µs × (0.99,1.01)  -1.10% (p=0.000)
GobDecode              15.5ms × (0.98,1.05)  15.7ms × (0.99,1.02)  +1.55% (p=0.005)
GobEncode              11.9ms × (0.97,1.03)  11.8ms × (0.98,1.03)  -0.97% (p=0.048)
Gzip                    648ms × (0.99,1.01)   647ms × (0.99,1.01)    ~    (p=0.627)
Gunzip                  143ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.482)
HTTPClientServer       89.2µs × (0.99,1.02)  89.1µs × (0.99,1.02)    ~    (p=0.740)
JSONEncode             32.3ms × (0.97,1.06)  31.7ms × (0.98,1.02)  -1.95% (p=0.002)
JSONDecode              106ms × (0.99,1.01)   111ms × (0.99,1.01)  +4.22% (p=0.000)
Mandelbrot200          6.02ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.417)
GoParse                6.57ms × (0.97,1.06)  6.54ms × (0.99,1.02)    ~    (p=0.404)
RegexpMatchEasy0_32     162ns × (1.00,1.00)   161ns × (0.98,1.05)    ~    (p=0.088)
RegexpMatchEasy0_1K     561ns × (0.99,1.02)   559ns × (0.98,1.01)  -0.47% (p=0.034)
RegexpMatchEasy1_32     145ns × (0.95,1.04)   137ns × (0.99,1.01)  -5.56% (p=0.000)
RegexpMatchEasy1_1K     864ns × (0.99,1.04)   878ns × (0.99,1.01)  +1.57% (p=0.000)
RegexpMatchMedium_32    255ns × (0.99,1.04)   252ns × (0.99,1.01)  -1.43% (p=0.001)
RegexpMatchMedium_1K   73.9µs × (0.98,1.04)  72.7µs × (1.00,1.00)  -1.55% (p=0.004)
RegexpMatchHard_32     3.92µs × (0.98,1.04)  3.85µs × (1.00,1.01)  -1.80% (p=0.003)
RegexpMatchHard_1K      120µs × (0.98,1.04)   117µs × (1.00,1.00)  -2.13% (p=0.001)
Revcomp                 936ms × (0.95,1.08)   903ms × (0.98,1.05)  -3.58% (p=0.002)
Template                130ms × (0.98,1.04)   126ms × (0.99,1.01)  -2.98% (p=0.000)
TimeParse               638ns × (0.98,1.05)   634ns × (0.99,1.01)    ~    (p=0.198)
TimeFormat              674ns × (0.99,1.01)   698ns × (0.98,1.03)  +3.69% (p=0.000)

Change-Id: Ia0e9b50b1d75a3c0c7556184cd966305574fe07c
Reviewed-on: https://go-review.googlesource.com/9706
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-11 14:55:11 +00:00
Rick Hudson
b6e178ed7e runtime: set heap minimum default based on GOGC
Currently the heap minimum is set to 4MB, which prevents collecting
at every allocation by setting GOGC=0. This adjusts the heap minimum
to 4MB*GOGC/100, thus re-enabling collection at every allocation.
Fixes #10681

Change-Id: I912d027dac4b14ae535597e8beefa9ac3fb8ad94
Reviewed-on: https://go-review.googlesource.com/9814
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-07 21:05:58 +00:00
Austin Clements
17db6e0420 runtime: use heap scan size as estimate of GC scan work
Currently, the GC uses a moving average of recent scan work ratios to
estimate the total scan work required by this cycle. This is in turn
used to compute how much scan work should be done by mutators when
they allocate in order to perform all expected scan work by the time
the allocated heap reaches the heap goal.

However, our current scan work estimate can be arbitrarily wrong if
the heap topography changes significantly from one cycle to the
next. For example, in the go1 benchmarks, at the beginning of each
benchmark, the heap is dominated by a 256MB no-scan object, so the GC
learns that the scan density of the heap is very low. In benchmarks
that then rapidly allocate pointer-dense objects, by the time of the
next GC cycle, our estimate of the scan work can be too low by a large
factor. This in turn lets the mutator allocate faster than the GC can
collect, allowing it to get arbitrarily far ahead of the scan work
estimate, which leads to very long GC cycles with very little mutator
assist that can overshoot the heap goal by large margins. This is
particularly easy to demonstrate with BinaryTree17:

$ GODEBUG=gctrace=1 ./go1.test -test.bench BinaryTree17
gc #1 @0.017s 2%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 4->262->262 MB, 4 MB goal, 1 P
gc #2 @0.026s 3%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 262->262->262 MB, 524 MB goal, 1 P
testing: warning: no tests to run
PASS
BenchmarkBinaryTree17	gc #3 @1.906s 0%: 0+0+0+0+7 ms clock, 0+0+0+0/0/0+7 ms cpu, 325->325->287 MB, 325 MB goal, 1 P (forced)
gc #4 @12.203s 20%: 0+0+0+10067+10 ms clock, 0+0+0+0/2523/852+10 ms cpu, 430->2092->1950 MB, 574 MB goal, 1 P
       1       9150447353 ns/op

Change this estimate to instead use the *current* scannable heap
size. This has the advantage of being based solely on the current
state of the heap, not on past densities or reachable heap sizes, so
it isn't susceptible to falling behind during these sorts of phase
changes. This is strictly an over-estimate, but it's better to
over-estimate and get more assist than necessary than it is to
under-estimate and potentially spiral out of control. Experiments with
scaling this estimate back showed no obvious benefit for mutator
utilization, heap size, or assist time.
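
A hedged sketch of the pacing idea, with illustrative names
throughout:

package main

import "fmt"

// expectedScanWork is the new estimate: the current scannable heap is
// a deliberate over-estimate of the cycle's scan work, based only on
// the heap's present state.
func expectedScanWork(heapScan uint64) uint64 {
	return heapScan
}

// assistRatio is roughly the scan work a mutator must perform per
// byte allocated so the work finishes by the time the heap hits the
// goal (assumes heapLive < heapGoal for the sketch).
func assistRatio(scanWorkExpected, heapGoal, heapLive uint64) float64 {
	return float64(scanWorkExpected) / float64(heapGoal-heapLive)
}

func main() {
	// 64 MB scannable, 128 MB of headroom: ~0.5 bytes of scan work
	// owed per byte allocated.
	fmt.Println(assistRatio(expectedScanWork(64<<20), 256<<20, 128<<20))
}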

This new estimate has little effect for most benchmarks, including
most go1 benchmarks, x/benchmarks, and the 6g benchmark. It has a huge
effect for benchmarks that triggered the bad pacer behavior:

name                   old mean              new mean              delta
BinaryTree17            10.0s × (1.00,1.00)    3.5s × (0.98,1.01)  -64.93% (p=0.000)
Fannkuch11              2.74s × (1.00,1.01)   2.65s × (1.00,1.00)   -3.52% (p=0.000)
FmtFprintfEmpty        56.4ns × (0.99,1.00)  57.8ns × (1.00,1.01)   +2.43% (p=0.000)
FmtFprintfString        187ns × (0.99,1.00)   185ns × (0.99,1.01)   -1.19% (p=0.010)
FmtFprintfInt           184ns × (1.00,1.00)   183ns × (1.00,1.00)  (no variance)
FmtFprintfIntInt        321ns × (1.00,1.00)   315ns × (1.00,1.00)   -1.80% (p=0.000)
FmtFprintfPrefixedInt   266ns × (1.00,1.00)   263ns × (1.00,1.00)   -1.22% (p=0.000)
FmtFprintfFloat         353ns × (1.00,1.00)   353ns × (1.00,1.00)   -0.13% (p=0.035)
FmtManyArgs            1.21µs × (1.00,1.00)  1.19µs × (1.00,1.00)   -1.33% (p=0.000)
GobDecode              9.69ms × (1.00,1.00)  9.59ms × (1.00,1.00)   -1.07% (p=0.000)
GobEncode              7.89ms × (0.99,1.01)  7.74ms × (1.00,1.00)   -1.92% (p=0.000)
Gzip                    391ms × (1.00,1.00)   392ms × (1.00,1.00)     ~    (p=0.522)
Gunzip                 97.1ms × (1.00,1.00)  97.0ms × (1.00,1.00)   -0.10% (p=0.000)
HTTPClientServer       55.7µs × (0.99,1.01)  56.7µs × (0.99,1.01)   +1.81% (p=0.001)
JSONEncode             19.1ms × (1.00,1.00)  19.0ms × (1.00,1.00)   -0.85% (p=0.000)
JSONDecode             66.8ms × (1.00,1.00)  66.9ms × (1.00,1.00)     ~    (p=0.288)
Mandelbrot200          4.13ms × (1.00,1.00)  4.12ms × (1.00,1.00)   -0.08% (p=0.000)
GoParse                3.97ms × (1.00,1.01)  4.01ms × (1.00,1.00)   +0.99% (p=0.000)
RegexpMatchEasy0_32     114ns × (1.00,1.00)   115ns × (0.99,1.00)     ~    (p=0.070)
RegexpMatchEasy0_1K     376ns × (1.00,1.00)   376ns × (1.00,1.00)     ~    (p=0.900)
RegexpMatchEasy1_32    94.9ns × (1.00,1.00)  96.3ns × (1.00,1.01)   +1.53% (p=0.001)
RegexpMatchEasy1_1K     568ns × (1.00,1.00)   567ns × (1.00,1.00)   -0.22% (p=0.001)
RegexpMatchMedium_32    159ns × (1.00,1.00)   159ns × (1.00,1.00)     ~    (p=0.178)
RegexpMatchMedium_1K   46.4µs × (1.00,1.00)  46.6µs × (1.00,1.00)   +0.29% (p=0.000)
RegexpMatchHard_32     2.37µs × (1.00,1.00)  2.37µs × (1.00,1.00)     ~    (p=0.722)
RegexpMatchHard_1K     71.1µs × (1.00,1.00)  71.2µs × (1.00,1.00)     ~    (p=0.229)
Revcomp                 565ms × (1.00,1.00)   562ms × (1.00,1.00)   -0.52% (p=0.000)
Template               81.0ms × (1.00,1.00)  80.2ms × (1.00,1.00)   -0.97% (p=0.000)
TimeParse               380ns × (1.00,1.00)   380ns × (1.00,1.00)     ~    (p=0.148)
TimeFormat              405ns × (0.99,1.00)   385ns × (0.99,1.00)   -5.00% (p=0.000)

Change-Id: I11274158bf3affaf62662e02de7af12d5fb789e4
Reviewed-on: https://go-review.googlesource.com/9696
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-06 19:40:38 +00:00
Austin Clements
3be3cbd548 runtime: track "scannable" bytes of heap
This tracks the number of scannable bytes in the allocated heap. That
is, bytes that the garbage collector must scan before reaching the
last pointer field in each object.

This will be used to compute a more robust estimate of the GC scan
work.
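
A sketch of the bookkeeping, assuming a hypothetical allocator hook
(heapScan and noteAlloc are illustrative stand-ins):

package main

import "fmt"

// heapScan tracks scannable bytes in the allocated heap.
var heapScan uint64

// noteAlloc records an allocation: a type's ptrdata (bytes up to and
// including its last pointer field) is what the GC must scan, so only
// that much is added to the total.
func noteAlloc(size, ptrdata uint64) {
	heapScan += ptrdata
	_ = size // the full size feeds other heap statistics, not heapScan
}

func main() {
	noteAlloc(64, 16)     // pointers only in the first 16 bytes
	noteAlloc(1<<20, 0)   // a large pointer-free buffer adds nothing
	fmt.Println(heapScan) // 16
}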

Change-Id: I1eecd45ef9cdd65b69d2afb5db5da885c80086bb
Reviewed-on: https://go-review.googlesource.com/9695
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:33 +00:00
Austin Clements
c4931a8433 runtime: dispose gcWork caches before updating controller state
Currently, we only flush the per-P gcWork caches in gcMark, at the
beginning of mark termination. This is necessary to ensure that no
work is held up in these caches.

However, this flush happens after we update the GC controller state,
which depends on statistics about marked heap size and scan work that
are only updated by this flush. Hence, the controller is missing the
bulk of heap marking and scan work. This bug was introduced in commit
1b4025f, which introduced the per-P gcWork caches.

Fix this by flushing these caches before we update the GC controller
state. We continue to flush them at the beginning of mark termination
as well to be robust in case any write barriers happened between the
previous flush and entering mark termination, but this should be a
no-op.
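
A sketch of the ordering fix with stand-in types: flush every P's
gcWork cache first, so the controller sees the full totals.

package main

import "fmt"

// Illustrative stand-ins for the per-P gcWork and the global totals
// the GC controller reads.
type gcWork struct{ bytesMarked, scanWork int64 }

type p struct{ gcw gcWork }

var (
	allp        []*p
	bytesMarked int64
	scanWork    int64
)

// dispose flushes a P's cached statistics to the globals.
func (w *gcWork) dispose() {
	bytesMarked += w.bytesMarked
	scanWork += w.scanWork
	*w = gcWork{}
}

func updateController() {
	// The controller's inputs are complete only after every flush.
	fmt.Println("marked:", bytesMarked, "scan work:", scanWork)
}

func main() {
	allp = []*p{{gcWork{1 << 20, 1 << 18}}, {gcWork{2 << 20, 1 << 19}}}
	for _, pp := range allp {
		pp.gcw.dispose() // flush first...
	}
	updateController() // ...then let the controller read the totals
}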

Change-Id: I8f0f91024df967ebf0c616d1c4f0c339c304ebaa
Reviewed-on: https://go-review.googlesource.com/9646
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:22 +00:00
Rick Hudson
1845314560 runtime: remove unused GC timers
During development some tracing routines were added that are not
needed in the release. These included GCstarttimes, GCendtimes, and
GCprinttimes.
Fixes #10462

Change-Id: I0788e6409d61038571a5ae0cbbab793102df0a65
Reviewed-on: https://go-review.googlesource.com/9689
Reviewed-by: Austin Clements <austin@google.com>
2015-05-06 12:53:08 +00:00
Austin Clements
dc870d5f4b runtime: detailed debug output of controller state
This adds a detailed debug dump of the state of the GC controller and
a GODEBUG flag to enable it.

Change-Id: I562fed7981691a84ddf0f9e6fcd9f089f497ac13
Reviewed-on: https://go-review.googlesource.com/9640
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 19:39:43 +00:00
Russ Cox
79a990b845 runtime: schedule GC work more aggressively
Schedule the work as early as possible, while still respecting the
utilization percentage on average. The old code tried never to
go above the utilization percentage. The new code is willing
to go above the utilization percentage by one time slice
(but of course after doing that it must wait until the percentage
drops back down to the target before it gets another time slice).

The effect is that for concurrent GCs that can run in a small number
of time slices, the time during which write barriers are enabled is
reduced by one mutator + GC time slice round (possibly 30 ms per GC).

This only affects the fractional GC processor (the remainder of GOMAXPROCS/4),
so it matters most in GOMAXPROCS=1, a bit in GOMAXPROCS=2, and not at
all in GOMAXPROCS=4.
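
A sketch of the relaxed credit rule with illustrative parameters
(gcTime and elapsed are cumulative seconds, slice is one time slice,
target is the utilization fraction):

package main

import "fmt"

func shouldRunFractional(gcTime, elapsed, slice, target float64) bool {
	// Old rule (never exceed the target, even transiently):
	//   return (gcTime+slice)/(elapsed+slice) <= target
	// New rule: start if utilization is at or below target now; the
	// slice may push it above, and then this test fails until the
	// average decays back down to the target.
	return gcTime/(elapsed+slice) <= target
}

func main() {
	fmt.Println(shouldRunFractional(0.020, 0.100, 0.010, 0.25)) // true
	fmt.Println(shouldRunFractional(0.035, 0.110, 0.010, 0.25)) // false: paying back
}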

GOMAXPROCS=1
name                                      old mean                new mean        delta
BenchmarkBinaryTree17                12.4s × (0.98,1.03)     13.5s × (0.97,1.04)  +8.84% (p=0.000)
BenchmarkFannkuch11                  4.38s × (1.00,1.01)     4.38s × (1.00,1.01)  ~ (p=0.343)
BenchmarkFmtFprintfEmpty            88.9ns × (0.97,1.10)    90.1ns × (0.93,1.14)  ~ (p=0.224)
BenchmarkFmtFprintfString            356ns × (0.94,1.05)     321ns × (0.94,1.12)  -9.77% (p=0.000)
BenchmarkFmtFprintfInt               344ns × (0.98,1.03)     325ns × (0.96,1.03)  -5.46% (p=0.000)
BenchmarkFmtFprintfIntInt            622ns × (0.97,1.03)     571ns × (0.95,1.05)  -8.09% (p=0.000)
BenchmarkFmtFprintfPrefixedInt       462ns × (0.96,1.04)     431ns × (0.95,1.05)  -6.81% (p=0.000)
BenchmarkFmtFprintfFloat             653ns × (0.98,1.03)     621ns × (0.99,1.03)  -4.90% (p=0.000)
BenchmarkFmtManyArgs                2.32µs × (0.97,1.03)    2.19µs × (0.98,1.02)  -5.43% (p=0.000)
BenchmarkGobDecode                  27.0ms × (0.96,1.04)    20.0ms × (0.97,1.04)  -26.06% (p=0.000)
BenchmarkGobEncode                  26.6ms × (0.99,1.01)    17.8ms × (0.95,1.05)  -33.19% (p=0.000)
BenchmarkGzip                        659ms × (0.98,1.03)     650ms × (0.99,1.01)  -1.34% (p=0.000)
BenchmarkGunzip                      145ms × (0.98,1.04)     143ms × (1.00,1.01)  -1.47% (p=0.000)
BenchmarkHTTPClientServer            111µs × (0.97,1.04)     110µs × (0.96,1.03)  -1.30% (p=0.000)
BenchmarkJSONEncode                 52.0ms × (0.97,1.03)    40.8ms × (0.97,1.03)  -21.47% (p=0.000)
BenchmarkJSONDecode                  127ms × (0.98,1.04)     120ms × (0.98,1.02)  -5.55% (p=0.000)
BenchmarkMandelbrot200              6.04ms × (0.99,1.04)    6.02ms × (1.00,1.01)  ~ (p=0.176)
BenchmarkGoParse                    8.62ms × (0.96,1.08)    8.55ms × (0.93,1.09)  ~ (p=0.302)
BenchmarkRegexpMatchEasy0_32         164ns × (0.98,1.05)     165ns × (0.98,1.07)  ~ (p=0.293)
BenchmarkRegexpMatchEasy0_1K         546ns × (0.98,1.06)     547ns × (0.97,1.07)  ~ (p=0.741)
BenchmarkRegexpMatchEasy1_32         142ns × (0.97,1.09)     141ns × (0.97,1.05)  ~ (p=0.231)
BenchmarkRegexpMatchEasy1_1K         904ns × (0.97,1.07)     900ns × (0.98,1.04)  ~ (p=0.294)
BenchmarkRegexpMatchMedium_32        256ns × (0.98,1.06)     256ns × (0.97,1.04)  ~ (p=0.530)
BenchmarkRegexpMatchMedium_1K       74.2µs × (0.98,1.05)    73.8µs × (0.98,1.04)  ~ (p=0.334)
BenchmarkRegexpMatchHard_32         3.94µs × (0.98,1.07)    3.92µs × (0.98,1.05)  ~ (p=0.356)
BenchmarkRegexpMatchHard_1K          119µs × (0.98,1.07)     119µs × (0.98,1.06)  ~ (p=0.467)
BenchmarkRevcomp                     978ms × (0.96,1.09)     984ms × (0.95,1.07)  ~ (p=0.448)
BenchmarkTemplate                    151ms × (0.96,1.03)     142ms × (0.95,1.04)  -5.55% (p=0.000)
BenchmarkTimeParse                   628ns × (0.99,1.01)     628ns × (0.99,1.01)  ~ (p=0.855)
BenchmarkTimeFormat                  729ns × (0.98,1.06)     734ns × (0.97,1.05)  ~ (p=0.149)

GOMAXPROCS=2
name                                      old mean                new mean        delta
BenchmarkBinaryTree17-2              9.80s × (0.97,1.03)     9.85s × (0.99,1.02)  ~ (p=0.444)
BenchmarkFannkuch11-2                4.35s × (0.99,1.01)     4.40s × (0.98,1.05)  ~ (p=0.099)
BenchmarkFmtFprintfEmpty-2          86.7ns × (0.97,1.05)    85.9ns × (0.98,1.04)  ~ (p=0.409)
BenchmarkFmtFprintfString-2          297ns × (0.98,1.01)     297ns × (0.99,1.01)  ~ (p=0.743)
BenchmarkFmtFprintfInt-2             309ns × (0.98,1.02)     310ns × (0.99,1.01)  ~ (p=0.464)
BenchmarkFmtFprintfIntInt-2          525ns × (0.97,1.05)     518ns × (0.99,1.01)  ~ (p=0.151)
BenchmarkFmtFprintfPrefixedInt-2     408ns × (0.98,1.02)     408ns × (0.98,1.03)  ~ (p=0.797)
BenchmarkFmtFprintfFloat-2           603ns × (0.99,1.01)     604ns × (0.98,1.02)  ~ (p=0.588)
BenchmarkFmtManyArgs-2              2.07µs × (0.98,1.02)    2.05µs × (0.99,1.01)  ~ (p=0.091)
BenchmarkGobDecode-2                19.1ms × (0.97,1.01)    19.3ms × (0.97,1.04)  ~ (p=0.195)
BenchmarkGobEncode-2                16.2ms × (0.97,1.03)    16.4ms × (0.99,1.01)  ~ (p=0.069)
BenchmarkGzip-2                      652ms × (0.99,1.01)     651ms × (0.99,1.01)  ~ (p=0.705)
BenchmarkGunzip-2                    143ms × (1.00,1.01)     143ms × (1.00,1.00)  ~ (p=0.665)
BenchmarkHTTPClientServer-2          149µs × (0.92,1.11)     149µs × (0.91,1.08)  ~ (p=0.862)
BenchmarkJSONEncode-2               34.6ms × (0.98,1.02)    37.2ms × (0.99,1.01)  +7.56% (p=0.000)
BenchmarkJSONDecode-2                117ms × (0.99,1.01)     117ms × (0.99,1.01)  ~ (p=0.858)
BenchmarkMandelbrot200-2            6.10ms × (0.99,1.03)    6.03ms × (1.00,1.00)  ~ (p=0.083)
BenchmarkGoParse-2                  8.25ms × (0.98,1.01)    8.21ms × (0.99,1.02)  ~ (p=0.307)
BenchmarkRegexpMatchEasy0_32-2       162ns × (0.99,1.02)     162ns × (0.99,1.01)  ~ (p=0.857)
BenchmarkRegexpMatchEasy0_1K-2       541ns × (0.99,1.01)     540ns × (1.00,1.00)  ~ (p=0.530)
BenchmarkRegexpMatchEasy1_32-2       138ns × (1.00,1.00)     141ns × (0.98,1.04)  +1.88% (p=0.038)
BenchmarkRegexpMatchEasy1_1K-2       887ns × (0.99,1.01)     894ns × (0.99,1.01)  ~ (p=0.087)
BenchmarkRegexpMatchMedium_32-2      252ns × (0.99,1.01)     252ns × (0.99,1.01)  ~ (p=0.954)
BenchmarkRegexpMatchMedium_1K-2     73.4µs × (0.99,1.02)    72.8µs × (1.00,1.01)  -0.87% (p=0.029)
BenchmarkRegexpMatchHard_32-2       3.95µs × (0.97,1.05)    3.87µs × (1.00,1.01)  -2.11% (p=0.035)
BenchmarkRegexpMatchHard_1K-2        117µs × (0.99,1.01)     117µs × (0.99,1.01)  ~ (p=0.669)
BenchmarkRevcomp-2                   980ms × (0.95,1.03)     993ms × (0.94,1.09)  ~ (p=0.527)
BenchmarkTemplate-2                  136ms × (0.98,1.01)     135ms × (0.99,1.01)  ~ (p=0.200)
BenchmarkTimeParse-2                 630ns × (1.00,1.01)     630ns × (1.00,1.00)  ~ (p=0.634)
BenchmarkTimeFormat-2                705ns × (0.99,1.01)     710ns × (0.98,1.02)  ~ (p=0.174)

GOMAXPROCS=4
BenchmarkBinaryTree17-4              9.87s × (0.96,1.04)     9.75s × (0.96,1.03)  ~ (p=0.178)
BenchmarkFannkuch11-4                4.35s × (1.00,1.01)     4.40s × (0.99,1.04)  ~ (p=0.071)
BenchmarkFmtFprintfEmpty-4          85.8ns × (0.98,1.06)    85.6ns × (0.98,1.04)  ~ (p=0.858)
BenchmarkFmtFprintfString-4          306ns × (0.99,1.03)     304ns × (0.97,1.02)  ~ (p=0.470)
BenchmarkFmtFprintfInt-4             317ns × (0.98,1.01)     315ns × (0.98,1.02)  -0.92% (p=0.044)
BenchmarkFmtFprintfIntInt-4          527ns × (0.99,1.01)     525ns × (0.98,1.01)  ~ (p=0.164)
BenchmarkFmtFprintfPrefixedInt-4     421ns × (0.98,1.03)     417ns × (0.99,1.02)  ~ (p=0.092)
BenchmarkFmtFprintfFloat-4           623ns × (0.98,1.02)     618ns × (0.98,1.03)  ~ (p=0.172)
BenchmarkFmtManyArgs-4              2.09µs × (0.98,1.02)    2.09µs × (0.98,1.02)  ~ (p=0.679)
BenchmarkGobDecode-4                18.6ms × (0.99,1.01)    18.6ms × (0.98,1.03)  ~ (p=0.595)
BenchmarkGobEncode-4                15.0ms × (0.98,1.02)    15.1ms × (0.99,1.01)  ~ (p=0.301)
BenchmarkGzip-4                      659ms × (0.98,1.04)     660ms × (0.97,1.02)  ~ (p=0.724)
BenchmarkGunzip-4                    145ms × (0.98,1.04)     144ms × (0.99,1.04)  ~ (p=0.671)
BenchmarkHTTPClientServer-4          139µs × (0.97,1.02)     138µs × (0.99,1.02)  ~ (p=0.392)
BenchmarkJSONEncode-4               35.0ms × (0.99,1.02)    35.1ms × (0.98,1.02)  ~ (p=0.777)
BenchmarkJSONDecode-4                119ms × (0.98,1.01)     118ms × (0.98,1.02)  ~ (p=0.710)
BenchmarkMandelbrot200-4            6.02ms × (1.00,1.00)    6.02ms × (1.00,1.00)  ~ (p=0.289)
BenchmarkGoParse-4                  7.96ms × (0.99,1.01)    7.96ms × (0.99,1.01)  ~ (p=0.884)
BenchmarkRegexpMatchEasy0_32-4       164ns × (0.98,1.04)     166ns × (0.97,1.04)  ~ (p=0.221)
BenchmarkRegexpMatchEasy0_1K-4       540ns × (0.99,1.01)     552ns × (0.97,1.04)  +2.10% (p=0.018)
BenchmarkRegexpMatchEasy1_32-4       140ns × (0.99,1.04)     142ns × (0.97,1.04)  ~ (p=0.226)
BenchmarkRegexpMatchEasy1_1K-4       896ns × (0.99,1.03)     907ns × (0.97,1.04)  ~ (p=0.155)
BenchmarkRegexpMatchMedium_32-4      255ns × (0.99,1.04)     255ns × (0.98,1.04)  ~ (p=0.904)
BenchmarkRegexpMatchMedium_1K-4     73.4µs × (0.99,1.04)    73.8µs × (0.98,1.04)  ~ (p=0.560)
BenchmarkRegexpMatchHard_32-4       3.93µs × (0.98,1.04)    3.95µs × (0.98,1.04)  ~ (p=0.571)
BenchmarkRegexpMatchHard_1K-4        117µs × (1.00,1.01)     119µs × (0.98,1.04)  +1.48% (p=0.048)
BenchmarkRevcomp-4                   990ms × (0.94,1.08)     989ms × (0.94,1.10)  ~ (p=0.957)
BenchmarkTemplate-4                  137ms × (0.98,1.02)     137ms × (0.99,1.01)  ~ (p=0.996)
BenchmarkTimeParse-4                 629ns × (1.00,1.00)     629ns × (0.99,1.01)  ~ (p=0.924)
BenchmarkTimeFormat-4                710ns × (0.99,1.01)     716ns × (0.98,1.02)  +0.84% (p=0.033)

Change-Id: I43a04e0f6ad5e3ba9847dddf12e13222561f9cf4
Reviewed-on: https://go-review.googlesource.com/9543
Reviewed-by: Austin Clements <austin@google.com>
2015-04-30 15:50:12 +00:00
Russ Cox
32d6fbcb4f runtime: replace needwb() with writeBarrierEnabled
Reduce the write barrier check to a single load and compare
so that it can be inlined into write barrier use sites.
Makes the standard write barrier a little faster too.
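
A sketch of what a barrier site reduces to (illustrative names; the
shade placeholder stands in for the marking work):

package main

// writeBarrierEnabled stands in for the global flag the GC sets
// around the mark phase.
var writeBarrierEnabled bool

func shade(obj *int) { /* placeholder: mark obj grey */ }

// writePointer: a single load and compare, cheap enough to inline,
// with no call on the common (disabled) path.
func writePointer(dst **int, src *int) {
	if writeBarrierEnabled {
		shade(src)
	}
	*dst = src
}

func main() {
	var slot *int
	x := 42
	writePointer(&slot, &x)
	println(*slot) // 42
}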

name                                       old                     new          delta
BenchmarkBinaryTree17              17.9s × (0.99,1.01)     17.9s × (1.00,1.01)  ~
BenchmarkFannkuch11                4.35s × (1.00,1.00)     4.43s × (1.00,1.00)  +1.81%
BenchmarkFmtFprintfEmpty           120ns × (0.93,1.06)     110ns × (1.00,1.06)  -7.92%
BenchmarkFmtFprintfString          479ns × (0.99,1.00)     487ns × (0.99,1.00)  +1.67%
BenchmarkFmtFprintfInt             452ns × (0.99,1.02)     450ns × (0.99,1.00)  ~
BenchmarkFmtFprintfIntInt          766ns × (0.99,1.01)     762ns × (1.00,1.00)  ~
BenchmarkFmtFprintfPrefixedInt     576ns × (0.98,1.01)     584ns × (0.99,1.01)  ~
BenchmarkFmtFprintfFloat           730ns × (1.00,1.01)     738ns × (1.00,1.00)  +1.16%
BenchmarkFmtManyArgs              2.84µs × (0.99,1.00)    2.80µs × (1.00,1.01)  -1.22%
BenchmarkGobDecode                39.3ms × (0.98,1.01)    39.0ms × (0.99,1.00)  ~
BenchmarkGobEncode                39.5ms × (0.99,1.01)    37.8ms × (0.98,1.01)  -4.33%
BenchmarkGzip                      663ms × (1.00,1.01)     661ms × (0.99,1.01)  ~
BenchmarkGunzip                    143ms × (1.00,1.00)     142ms × (1.00,1.00)  ~
BenchmarkHTTPClientServer          132µs × (0.99,1.01)     132µs × (0.99,1.01)  ~
BenchmarkJSONEncode               57.4ms × (0.99,1.01)    56.3ms × (0.99,1.01)  -1.96%
BenchmarkJSONDecode                139ms × (0.99,1.00)     138ms × (0.99,1.01)  ~
BenchmarkMandelbrot200            6.03ms × (1.00,1.00)    6.01ms × (1.00,1.00)  ~
BenchmarkGoParse                  10.3ms × (0.89,1.14)    10.2ms × (0.87,1.05)  ~
BenchmarkRegexpMatchEasy0_32       209ns × (1.00,1.00)     208ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy0_1K       591ns × (0.99,1.00)     588ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy1_32       184ns × (0.99,1.02)     182ns × (0.99,1.01)  ~
BenchmarkRegexpMatchEasy1_1K      1.01µs × (1.00,1.00)    0.99µs × (1.00,1.01)  -2.33%
BenchmarkRegexpMatchMedium_32      330ns × (1.00,1.00)     323ns × (1.00,1.01)  -2.12%
BenchmarkRegexpMatchMedium_1K     92.6µs × (1.00,1.00)    89.9µs × (1.00,1.00)  -2.92%
BenchmarkRegexpMatchHard_32       4.80µs × (0.95,1.00)    4.72µs × (0.95,1.01)  ~
BenchmarkRegexpMatchHard_1K        136µs × (1.00,1.00)     133µs × (1.00,1.01)  -1.86%
BenchmarkRevcomp                   900ms × (0.99,1.04)     900ms × (1.00,1.05)  ~
BenchmarkTemplate                  172ms × (1.00,1.00)     168ms × (0.99,1.01)  -2.07%
BenchmarkTimeParse                 637ns × (1.00,1.00)     637ns × (1.00,1.00)  ~
BenchmarkTimeFormat                744ns × (1.00,1.01)     738ns × (1.00,1.00)  -0.67%

Change-Id: I4ecc925805da1f5ee264377f1f7574f54ee575e7
Reviewed-on: https://go-review.googlesource.com/9321
Reviewed-by: Austin Clements <austin@google.com>
2015-04-28 01:37:53 +00:00
Austin Clements
1b01910c06 runtime: rename gcController.findRunnable to findRunnableGCWorker
This avoids confusion with the main findrunnable in the scheduler.

Change-Id: I8cf40657557a8610a2fe5a2f74598518256ca7f0
Reviewed-on: https://go-review.googlesource.com/9305
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:42 +00:00
Austin Clements
bb6320535d runtime: replace STW for enabling write barriers with ragged barrier
Currently, we use a full stop-the-world around enabling write
barriers. This is to ensure that all Gs have enabled write barriers
before any blackening occurs (either in gcBgMarkWorker() or in
gcAssistAlloc()).

However, there's no need to bring the whole world to a synchronous
stop to ensure this. This change replaces the STW with a ragged
barrier that ensures each P has individually observed that write
barriers should be enabled before GC performs any blackening.
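
A user-space sketch of the ragged-barrier idea, with channels as a
stand-in for the runtime's mechanism: each worker acknowledges the
flag individually at its own safe point, and nothing ever stops all
at once.

package main

import (
	"sync"
	"sync/atomic"
)

// writeBarrierEnabled stands in for the flag each P must observe.
var writeBarrierEnabled atomic.Bool

// raggedBarrier runs fn on every worker and returns once all have
// acknowledged.
func raggedBarrier(workers []chan func(), fn func()) {
	var wg sync.WaitGroup
	for _, w := range workers {
		wg.Add(1)
		w <- func() { fn(); wg.Done() }
	}
	wg.Wait()
}

func main() {
	workers := make([]chan func(), 4)
	for i := range workers {
		workers[i] = make(chan func(), 1)
		go func(c chan func()) {
			for f := range c {
				f() // the worker's "safe point"
			}
		}(workers[i])
	}
	raggedBarrier(workers, func() { writeBarrierEnabled.Store(true) })
	println(writeBarrierEnabled.Load()) // true: every worker has seen it
}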

Change-Id: If2f129a6a55bd8bdd4308067af2b739f3fb41955
Reviewed-on: https://go-review.googlesource.com/8207
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:37 +00:00
Austin Clements
1b4025f4bd runtime: replace per-M workbuf cache with per-P gcWork cache
Currently, each M has a cache of the most recently used *workbuf. This
is used primarily by the write barrier so it doesn't have to access
the global workbuf lists on every write barrier. It's also used by
stack scanning because it's convenient.

This cache is important for write barrier performance, but this
particular approach has several downsides. It's faster than no cache,
but far from optimal (as the benchmarks below show). It's complex:
access to the cache is sprinkled through most of the workbuf list
operations and it requires special care to transform into and back out
of the gcWork cache that's actually used for scanning and marking. It
requires atomic exchanges to take ownership of the cached workbuf and
to return it to the M's cache even though it's almost always used by
only the current M. Since it's per-M, flushing these caches is O(# of
Ms), which may be high. And it has some significant subtleties: for
example, in general the cache shouldn't be used after the
harvestwbufs() in mark termination because it could hide work from
mark termination, but stack scanning can happen after this and *will*
use the cache (but it turns out this is okay because it will always be
followed by a getfull(), which drains the cache).

This change replaces this cache with a per-P gcWork object. This
gcWork cache can be used directly by scanning and marking (as long as
preemption is disabled, which is a general requirement of gcWork).
Since it's per-P, it doesn't require synchronization, which simplifies
things and means the only atomic operations in the write barrier are
occasionally fetching new work buffers and setting a mark bit if the
object isn't already marked. This cache can be flushed in O(# of Ps),
which is generally small. It follows a simple flushing rule: the cache
can be used during any phase, but during mark termination it must be
flushed before allowing preemption. This also makes the dispose during
mutator assist no longer necessary, which eliminates the vast majority
of gcWork dispose calls and reduces contention on the global workbuf
lists. And it's a lot faster on some benchmarks:

benchmark                          old ns/op       new ns/op       delta
BenchmarkBinaryTree17              11963668673     11206112763     -6.33%
BenchmarkFannkuch11                2643217136      2649182499      +0.23%
BenchmarkFmtFprintfEmpty           70.4            70.2            -0.28%
BenchmarkFmtFprintfString          364             307             -15.66%
BenchmarkFmtFprintfInt             317             282             -11.04%
BenchmarkFmtFprintfIntInt          512             483             -5.66%
BenchmarkFmtFprintfPrefixedInt     404             380             -5.94%
BenchmarkFmtFprintfFloat           521             479             -8.06%
BenchmarkFmtManyArgs               2164            1894            -12.48%
BenchmarkGobDecode                 30366146        22429593        -26.14%
BenchmarkGobEncode                 29867472        26663152        -10.73%
BenchmarkGzip                      391236616       396779490       +1.42%
BenchmarkGunzip                    96639491        96297024        -0.35%
BenchmarkHTTPClientServer          100110          70763           -29.31%
BenchmarkJSONEncode                51866051        52511382        +1.24%
BenchmarkJSONDecode                103813138       86094963        -17.07%
BenchmarkMandelbrot200             4121834         4120886         -0.02%
BenchmarkGoParse                   16472789        5879949         -64.31%
BenchmarkRegexpMatchEasy0_32       140             140             +0.00%
BenchmarkRegexpMatchEasy0_1K       394             394             +0.00%
BenchmarkRegexpMatchEasy1_32       120             120             +0.00%
BenchmarkRegexpMatchEasy1_1K       621             614             -1.13%
BenchmarkRegexpMatchMedium_32      209             202             -3.35%
BenchmarkRegexpMatchMedium_1K      54889           55175           +0.52%
BenchmarkRegexpMatchHard_32        2682            2675            -0.26%
BenchmarkRegexpMatchHard_1K        79383           79524           +0.18%
BenchmarkRevcomp                   584116718       584595320       +0.08%
BenchmarkTemplate                  125400565       109620196       -12.58%
BenchmarkTimeParse                 386             387             +0.26%
BenchmarkTimeFormat                580             447             -22.93%

(Best out of 10 runs. The delta of averages is similar.)

This also puts us in a good position to flush these caches when
nearing the end of concurrent marking, which will let us increase the
size of the work buffers while still controlling mark termination
pause time.
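
A sketch of the per-P cache shape described above, with illustrative
types (the shape, not the exact layout, is the point):

package main

type workbuf struct {
	obj [256]uintptr
	n   int
}

// gcWork is owned by a single P (with preemption disabled), so put
// needs no atomics; only trading buffers with the global lists does.
type gcWork struct{ wbuf *workbuf }

func (w *gcWork) put(obj uintptr) {
	if w.wbuf == nil {
		w.wbuf = getEmptyBuf()
	}
	if w.wbuf.n == len(w.wbuf.obj) {
		putFullBuf(w.wbuf) // the rare synchronized step
		w.wbuf = getEmptyBuf()
	}
	w.wbuf.obj[w.wbuf.n] = obj
	w.wbuf.n++
}

// Placeholders for the global full/empty workbuf lists.
func getEmptyBuf() *workbuf { return new(workbuf) }
func putFullBuf(*workbuf)   {}

func main() {
	var w gcWork // imagine one of these per P
	for i := 0; i < 1000; i++ {
		w.put(uintptr(i))
	}
}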

Change-Id: I2dd94c8517a19297a98ec280203cccaa58792522
Reviewed-on: https://go-review.googlesource.com/9178
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 20:10:14 +00:00
Austin Clements
d1cae6358c runtime: fix check for pending GC work
When findRunnable considers running a fractional mark worker, it first
checks if there's any work to be done; if there isn't there's no point
in running the worker because it will just reschedule immediately.
However, currently findRunnable just checks work.full and
work.partial, whereas getfull can *also* draw work from m.currentwbuf.
As a result, findRunnable may not start a worker even though there
actually is work.

This problem manifests itself in occasional failures of the
test/init1.go test. This test is unusual because it performs a large
amount of allocation without executing any write barriers, which means
there's nothing to force the pointers in currentwbuf out to the
work.partial/full lists where findRunnable can see them.

This change fixes this problem by making findRunnable also check for a
currentwbuf. This aligns findRunnable with trygetfull's notion of
whether or not there's work.
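
A sketch of the widened test with stand-in fields:

package main

type workbuf struct{ n int }

var (
	full, partial *workbuf // work.full / work.partial stand-ins
	currentwbuf   *workbuf // the M's cached buffer
)

// gcWorkAvailable: the old code checked only the two global lists and
// could miss work cached in currentwbuf.
func gcWorkAvailable() bool {
	return full != nil || partial != nil ||
		(currentwbuf != nil && currentwbuf.n > 0)
}

func main() {
	currentwbuf = &workbuf{n: 17}
	println(gcWorkAvailable()) // true even though the global lists are empty
}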

Change-Id: Ic76d22b7b5d040bc4f58a6b5975e9217650e66c4
Reviewed-on: https://go-review.googlesource.com/9299
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 20:10:10 +00:00
Austin Clements
26eac917dc runtime: start dedicated mark workers even if there's no work
Currently, findRunnable only considers running a mark worker if
there's work in the work queue. In principle, this can delay the start
of the desired number of dedicated mark workers if there's no work
pending. This is unlikely to occur in practice, since there should be
work queued from the scan phase, but if it were to come up, a CPU hog
mutator could slow down or delay garbage collection.

This check makes sense for fractional mark workers, since they'll just
return to the scheduler immediately if there's no work, but we want
the scheduler to start all of the dedicated mark workers promptly,
even if there's currently no queued work. Hence, this change moves the
pending work check after the check for starting a dedicated worker.
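
A sketch of the reordered decision, illustrative names throughout:

package main

import "fmt"

// pickWorkerMode checks the dedicated-worker quota before the
// pending-work test.
func pickWorkerMode(dedicatedNeeded, workAvailable bool) string {
	if dedicatedNeeded {
		return "dedicated" // start promptly even if no work is queued yet
	}
	if !workAvailable {
		return "none" // a fractional worker would just reschedule
	}
	return "fractional"
}

func main() {
	fmt.Println(pickWorkerMode(true, false))  // dedicated
	fmt.Println(pickWorkerMode(false, false)) // none
}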

Change-Id: I52b851cc9e41f508a0955b3f905ca80f109ea101
Reviewed-on: https://go-review.googlesource.com/9298
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-24 20:10:05 +00:00
Austin Clements
711a164267 runtime: fix some out-of-date comments
bgMarkCount no longer exists.

Change-Id: I3aa406fdccfca659814da311229afbae55af8304
Reviewed-on: https://go-review.googlesource.com/9297
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-24 20:10:01 +00:00
Austin Clements
0e6a6c510f runtime: simplify process for starting GC goroutine
Currently, when allocation reaches the GC trigger, the runtime uses
readyExecute to start the GC goroutine immediately rather than wait
for the scheduler to get around to the GC goroutine while the mutator
continues to grow the heap.

Now that the scheduler runs the most recently readied goroutine when a
goroutine yields its time slice, this rigmarole is no longer
necessary. The runtime can simply ready the GC goroutine and yield
from the readying goroutine.
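
A user-space analogy of the simplified handoff, assuming the
scheduler semantics described above (a channel models the readying;
this is not the runtime's actual mechanism):

package main

import "runtime"

func main() {
	gcReady := make(chan struct{})
	done := make(chan struct{})

	go func() { // stands in for the GC goroutine
		<-gcReady
		close(done)
	}()

	close(gcReady)    // ready the GC goroutine...
	runtime.Gosched() // ...and yield so it runs immediately
	<-done
}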

Change-Id: I3b4ebadd2a72a923b1389f7598f82973dd5c8710
Reviewed-on: https://go-review.googlesource.com/9292
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-24 15:13:05 +00:00
Austin Clements
ce502b063c runtime: use park/ready to wake up GC at end of concurrent mark
Currently, the main GC goroutine sleeps on a note during concurrent
mark, and the first background mark worker or assist to finish marking
wakes up that note to let the main goroutine proceed into mark
termination. Unfortunately, the latency of this wakeup can be quite
high, since the GC goroutine will typically have lost its P while in
the futex sleep, meaning it will be placed on the global run queue and
will wait there until some P is kind enough to pick it up. This delay
gives the mutator more time to allocate and create floating garbage,
growing the heap unnecessarily. Worse, it's likely that background
marking has stopped at this point (unless GOMAXPROCS>4), so anything
that's allocated and published to the heap during this window will
have to be scanned during mark termination while the world is stopped.

This change replaces the note sleep/wakeup with a gopark/ready
scheme. This keeps the wakeup inside the Go scheduler and lets the
garbage collector take advantage of the new scheduler semantics that
run the ready()d goroutine immediately when the ready()ing goroutine
sleeps.

For the json benchmark from x/benchmarks with GOMAXPROCS=4, this
reduces the delay in waking up the GC goroutine and entering mark
termination once concurrent marking is done from ~100ms to typically
<100µs.

Change-Id: Ib11f8b581b8914f2d68e0094f121e49bac3bb384
Reviewed-on: https://go-review.googlesource.com/9291
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:13:01 +00:00
Austin Clements
4e32718d3e runtime: use timer for GC control revise rather than timeout
Currently, we use a note sleep with a timeout in a loop in func gc to
periodically revise the GC control variables. Replace this with a
fully blocking note sleep and use a periodic timer to trigger the
revise instead. This is a step toward replacing the note sleep in func
gc.
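
A sketch of the resulting revise loop (assumes "time" is imported;
revise() and the 10ms period are hypothetical stand-ins):

  // reviseLoop periodically recomputes the GC control variables
  // until mark termination closes the done channel.
  func reviseLoop(done chan struct{}) {
      t := time.NewTicker(10 * time.Millisecond)
      defer t.Stop()
      for {
          select {
          case <-t.C:
              revise() // recompute pacing state
          case <-done:
              return
          }
      }
  }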

Change-Id: I2d562f6b9b2e5f0c28e9a54227e2c0f8a2603f63
Reviewed-on: https://go-review.googlesource.com/9290
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:56 +00:00
Austin Clements
ed09e0e2bf runtime: fix underflow in next_gc calculation
Currently, it's possible for the next_gc calculation to underflow.
Since next_gc is unsigned, this wraps around and effectively disables
GC for the rest of the program's execution. Besides being obviously
wrong, this is causing test failures on 32-bit because some tests are
running out of heap.

This underflow happens for two reasons, both having to do with how we
estimate the reachable heap size at the end of the GC cycle.

One reason is that this calculation depends on the value of heap_live
at the beginning of the GC cycle, but we currently only record that
value during a concurrent GC and not during a forced STW GC. Fix this
by moving the recorded value from gcController to work and recording
it on a common code path.

The other reason is that we use the amount of allocation during the GC
cycle as an approximation of the amount of floating garbage and
subtract it from the marked heap to estimate the reachable heap.
However, since this is only an approximation, it's possible for the
amount of allocation during the cycle to be *larger* than the marked
heap size (since the runtime allocates white and it's possible for
these allocations to never be made reachable from the heap). Currently
this causes wrap-around in our estimate of the reachable heap size,
which in turn causes wrap-around in next_gc. Fix this by bottoming out
the reachable heap estimate at 0, in which case we just fall back to
triggering GC at heapminimum (which is okay since this only happens on
small heaps).
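
The guarded estimate amounts to something like this (illustrative
names, not the actual runtime code):

  // estimateReachable bottoms out at zero so the unsigned
  // subtraction cannot wrap around and poison next_gc.
  func estimateReachable(marked, allocedDuringGC uint64) uint64 {
      if allocedDuringGC >= marked {
          return 0 // fall back to triggering at heapminimum
      }
      return marked - allocedDuringGC
  }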

Fixes #10555, fixes #10556, and fixes #10559.

Change-Id: Iad07b529c03772356fede2ae557732f13ebfdb63
Reviewed-on: https://go-review.googlesource.com/9286
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-23 20:52:54 +00:00
Austin Clements
4655aadd00 runtime: use reachable heap estimate to set trigger/goal
Currently, we set the heap goal for the next GC cycle using the size
of the marked heap at the end of the current cycle. This can lead to a
bad feedback loop if the mutator is rapidly allocating and releasing
pointers that can significantly bloat heap size.

If the GC were STW, the marked heap size would be exactly the
reachable heap size (call it stwLive). However, in concurrent GC,
marked=stwLive+floatLive, where floatLive is the amount of "floating
garbage": objects that were reachable at some point during the cycle
and were marked, but which are no longer reachable by the end of the
cycle. If the GC cycle is short, then the mutator doesn't have much
time to create floating garbage, so marked≈stwLive. However, if the GC
cycle is long and the mutator is allocating and creating floating
garbage very rapidly, then it's possible that marked≫stwLive. Since
the runtime currently sets the heap goal based on marked, this will
cause it to set a high heap goal. This means that 1) the next GC cycle
will take longer because of the larger heap and 2) the assist ratio
will be low because of the large distance between the trigger and the
goal. The combination of these lets the mutator produce even more
floating garbage in the next cycle, which further exacerbates the
problem.

For example, on the garbage benchmark with GOMAXPROCS=1, this causes
the heap to grow to ~500MB and the garbage collector to retain upwards
of ~300MB of heap, while the true reachable heap size is ~32MB. This,
in turn, causes the GC cycle to take upwards of ~3 seconds.

Fix this bad feedback loop by estimating the true reachable heap size
(stwLive) and using this rather than the marked heap size
(stwLive+floatLive) as the basis for the GC trigger and heap goal.
This breaks the bad feedback loop and causes the mutator to assist
more, which decreases the rate at which it can create floating
garbage. On the same garbage benchmark, this reduces the maximum heap
size to ~73MB, the retained heap to ~40MB, and the duration of the GC
cycle to ~200ms.
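
In rough terms, the goal computation becomes (a sketch; gogc is the
GOGC percentage and the names are illustrative):

  // heapGoal bases the next cycle's goal on the reachable-heap
  // estimate (stwLive) rather than marked (stwLive+floatLive).
  func heapGoal(reachableEstimate, gogc uint64) uint64 {
      return reachableEstimate + reachableEstimate*gogc/100
  }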

Change-Id: I7712244c94240743b266f9eb720c03802799cdd1
Reviewed-on: https://go-review.googlesource.com/9177
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:28:42 +00:00
Austin Clements
1ccc577b8a runtime: include heap goal in gctrace line
This may or may not be useful to the end user, but it's incredibly
useful for us to understand the behavior of the pacer. Currently this
is fairly easy (though not trivial) to derive from the other heap
stats we print, but we're about to change how we compute the goal,
which will make it much harder to derive.

Change-Id: I796ef233d470c01f606bd9929820c01ece1f585a
Reviewed-on: https://go-review.googlesource.com/9176
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:07:44 +00:00
Austin Clements
1f39beb01a runtime: avoid divide-by-zero in GC trigger controller
The trigger controller computes GC CPU utilization by dividing by the
wall-clock time that's passed since concurrent mark began. Since this
delta is in nanoseconds, it's borderline impossible for it to be zero, but
if it is zero we'll currently divide by zero. Be robust to this
possibility by ignoring the utilization in the error term if no time
has elapsed.
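
The guard is essentially (a sketch with illustrative names):

  // utilization computes GC CPU utilization, reporting ok=false
  // when no wall-clock time has elapsed so the caller can drop
  // the term from the error computation.
  func utilization(gcCPUNanos, elapsedNanos int64) (u float64, ok bool) {
      if elapsedNanos == 0 {
          return 0, false
      }
      return float64(gcCPUNanos) / float64(elapsedNanos), true
  }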

Change-Id: I93dfc9e84735682af3e637f6538d1e7602634f09
Reviewed-on: https://go-review.googlesource.com/9175
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:07:36 +00:00
Austin Clements
170fb10089 runtime: assist harder if GC exceeds the estimated marked heap
Currently, the GC controller computes the mutator assist ratio at the
beginning of the cycle by estimating that the marked heap size this
cycle will be the same as it was the previous cycle. It then uses that
assist ratio for the rest of the cycle. However, this means that if
the mutator is quickly growing its reachable heap, the heap size is
likely to exceed the heap goal and currently there's no additional
pressure on mutator assists when this happens. For example, 6g (with
GOMAXPROCS=1) frequently exceeds the goal heap size by ~25% because of
this.

This change makes GC revise its work estimate and the resulting assist
ratio every 10ms during the concurrent mark. Instead of
unconditionally using the marked heap size from the last cycle as an
estimate for this cycle, it takes the minimum of the previously marked
heap and the currently marked heap. As a result, as the cycle
approaches or exceeds its heap goal, this will increase the assist
ratio to put more pressure on the mutator assist to bring the cycle to
an end. For 6g, this causes the GC to always finish within 5% and
often within 1% of its heap goal.
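
The pressure mechanism can be pictured like this (a schematic, not
the runtime's exact formula; all names are illustrative):

  // assistRatio grows as the live heap approaches the goal: the
  // shrinking headroom inflates the ratio, so each allocated byte
  // owes more scan work.
  func assistRatio(scanWorkRemaining, heapGoal, heapLive uint64) float64 {
      headroom := uint64(1) // clamp when at or past the goal
      if heapLive < heapGoal {
          headroom = heapGoal - heapLive
      }
      return float64(scanWorkRemaining) / float64(headroom)
  }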

Change-Id: I4333b92ad0878c704964be42c655c38a862b4224
Reviewed-on: https://go-review.googlesource.com/9070
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-21 15:35:55 +00:00
Austin Clements
e0c3d85f08 runtime: fix background marking at 25% utilization
Currently, in accordance with the GC pacing proposal, we schedule
background marking with a goal of achieving 25% utilization *total*
between mutator assists and background marking. This is stricter than
was set out in the Go 1.5 proposal, which suggests that the garbage
collector can use 25% just for itself and anything the mutator does to
help out is on top of that. It also has several technical
drawbacks. Because mutator assist time is constantly changing and we
can't have instantaneous information on background marking time, it
effectively requires hitting a moving target based on out-of-date
information. This works out in the long run, but works poorly for
short GC cycles and on short time scales. Also, this requires
time-multiplexing all Ps between the mutator and background GC since
the goal utilization of background GC constantly fluctuates. This
results in a complicated scheduling algorithm, poor affinity, and
extra overheads from context switching.

This change modifies the way we schedule and run background marking so
that background marking always consumes 25% of GOMAXPROCS and mutator
assist is in addition to this. This enables a much more robust
scheduling algorithm where we pre-determine the number of Ps we should
dedicate to background marking as well as the utilization goal for a
single floating "remainder" mark worker.
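
The pre-computed split can be sketched as follows (not the runtime's
exact arithmetic; names are illustrative):

  // markWorkers splits a 25% utilization target into some number
  // of fully dedicated Ps plus one fractional "remainder" worker.
  func markWorkers(gomaxprocs int) (dedicated int, fractional float64) {
      target := 0.25 * float64(gomaxprocs)
      dedicated = int(target)                  // whole Ps
      fractional = target - float64(dedicated) // remainder worker
      return
  }

For example, GOMAXPROCS=6 yields one dedicated worker plus a
remainder worker with a 0.5 utilization goal.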

Change-Id: I187fa4c03ab6fe78012a84d95975167299eb9168
Reviewed-on: https://go-review.googlesource.com/9013
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:50 +00:00
Austin Clements
24a7252e25 runtime: finish sweeping before concurrent GC starts
Currently, the concurrent sweep follows a 1:1 rule: when allocation
needs a span, it sweeps a span (likewise, when a large allocation
needs N pages, it sweeps until it frees N pages). This rule worked
well for the STW collector (especially when GOGC==100) because it did
no more sweeping than necessary to keep the heap from growing, would
generally finish sweeping just before GC, and ensured good temporal
locality between sweeping a page and allocating from it.

It doesn't work well with concurrent GC. Since concurrent GC requires
starting GC earlier (sometimes much earlier), the sweep often won't be
done when GC starts. Unfortunately, the first thing GC has to do is
finish the sweep. In the meantime, the mutator can continue
allocating, pushing the heap size even closer to the goal size. This
worked okay with the 7/8ths trigger, but it gets into a vicious cycle
with the GC trigger controller: if the mutator is allocating quickly
and driving the trigger lower, more and more sweep work will be left
to GC; this both causes GC to take longer (allowing the mutator to
allocate more during GC) and delays the start of the concurrent mark
phase, which throws off the GC controller's statistics and generally
causes it to push the trigger even lower.

As an example of a particularly bad case, the garbage benchmark with
GOMAXPROCS=4 and -benchmem 512 (MB) spends the first 0.4-0.8 seconds
of each GC cycle sweeping, during which the heap grows by between
109MB and 252MB.

To fix this, this change replaces the 1:1 sweep rule with a
proportional sweep rule. At the end of GC, GC knows exactly how much
heap allocation will occur before the next concurrent GC as well as
how many span pages must be swept. This change computes this "sweep
ratio" and when mallocgc asks for a span, the mcentral sweeps
enough spans to bring the swept span count into ratio with the
allocated byte count.

On the benchmark from above, this entirely eliminates sweeping at the
beginning of GC, which reduces the time between startGC readying the
GC goroutine and GC stopping the world for sweep termination to ~100µs
during which the heap grows at most 134KB.
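
The proportional rule is roughly (a sketch; sweepRatio here is
pages-to-sweep per allocated byte and all names are illustrative):

  // pagesOwed returns how many span pages must be swept now to
  // keep sweeping in ratio with allocation.
  func pagesOwed(sweepRatio float64, bytesAllocated, pagesSwept uint64) uint64 {
      target := uint64(sweepRatio * float64(bytesAllocated))
      if target <= pagesSwept {
          return 0 // already in ratio; no sweep debt
      }
      return target - pagesSwept
  }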

Change-Id: I35422d6bba0c2310d48bb1f8f30a72d29e98c1af
Reviewed-on: https://go-review.googlesource.com/8921
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:46 +00:00
Austin Clements
a0452a6821 runtime: proportional response GC trigger controller
Currently, concurrent GC triggers at a fixed 7/8*GOGC heap growth. For
mutators that allocate slowly, this means GC will trigger too early
and run too often, wasting CPU time on GC. For mutators that allocate
quickly, this means GC will trigger too late, causing the program to
exceed the GOGC heap growth goal and/or to exceed CPU goals because of
a high mutator assist ratio.

This change adds a feedback control loop to dynamically adjust the GC
trigger from cycle to cycle. By monitoring the heap growth and GC CPU
utilization from cycle to cycle, this adjusts the Go garbage collector
to target the GOGC heap growth goal and the 25% CPU utilization goal.
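
One step of such a controller looks schematically like this (a
proportional-response sketch, not the runtime's actual error terms;
all parameters are illustrative):

  // adjustTrigger nudges the trigger ratio by the observed errors:
  // overshooting the growth goal or the CPU goal lowers the
  // trigger (start GC earlier); undershooting raises it.
  func adjustTrigger(trigger, goalGrowth, actualGrowth,
      goalCPU, actualCPU, gain float64) float64 {
      err := (goalGrowth - actualGrowth) + (goalCPU - actualCPU)
      return trigger + gain*err
  }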

Change-Id: Ic82eef288c1fa122f73b69fe604d32cbb219e293
Reviewed-on: https://go-review.googlesource.com/8851
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:37 +00:00
Austin Clements
8d03acce54 runtime: multi-threaded, utilization-scheduled background mark
Currently, the concurrent mark phase is performed by the main GC
goroutine. Prior to the previous commit enabling preemption, this
caused marking to always consume 1/GOMAXPROCS of the available CPU
time. If GOMAXPROCS=1, this meant background GC would consume 100% of
the CPU (effectively a STW). If GOMAXPROCS>4, background GC would use
less than the goal of 25%. If GOMAXPROCS=4, background GC would use
the goal 25%, but if the mutator wasn't using the remaining 75%,
background marking wouldn't take advantage of the idle time. Enabling
preemption in the previous commit made GC miss CPU targets in
completely different ways, but set us up to bring everything back in
line.

This change replaces the fixed GC goroutine with per-P background mark
goroutines. Once started, these goroutines don't go in the standard
run queues; instead, they are scheduled specially such that the time
spent in mutator assists and the background mark goroutines totals 25%
of the CPU time available to the program. Furthermore, this lets
background marking take advantage of idle Ps, which significantly
boosts GC performance for applications that under-utilize the CPU.

This requires also changing how time is reported for gctrace, so this
change splits the concurrent mark CPU time into assist/background/idle
scanning.

This also requires increasing the size of the StackRecord slice used
in a GoroutineProfile test.

Change-Id: I0936ff907d2cee6cb687a208f2df47e8988e3157
Reviewed-on: https://go-review.googlesource.com/8850
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:32 +00:00
Austin Clements
af060c3086 runtime: generally allow preemption during concurrent GC phases
Currently, the entire GC process runs with g.m.preemptoff set. In the
concurrent phases, the parts that actually need preemption disabled
are run on a system stack and there's no overall need to stay on the
same M or P during the concurrent phases. Hence, move the setting of
g.m.preemptoff to when we start mark termination, at which point we
really do need preemption disabled.

This dramatically changes the scheduling behavior of the concurrent
mark phase. Currently, since this is non-preemptible, concurrent mark
gets one dedicated P (so 1/GOMAXPROCS utilization). With this change,
the GC goroutine is scheduled like any other goroutine during
concurrent mark, so it gets 1/<runnable goroutines> utilization.

You might think it's not even necessary to set g.m.preemptoff at that
point since the world is stopped, but stackalloc/stackfree use this as
a signal that the per-P pools are not safe to access without
synchronization.

Change-Id: I08aebe8179a7d304650fb8449ff36262b3771099
Reviewed-on: https://go-review.googlesource.com/8839
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:27 +00:00
Austin Clements
100da60979 runtime: track time spent in mutator assists
This time is tracked per P and periodically flushed to the global
controller state. This will be used to compute mutator assist
utilization in order to schedule background GC work.

Change-Id: Ib94f90903d426a02cf488bf0e2ef67a068eb3eec
Reviewed-on: https://go-review.googlesource.com/8837
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:22 +00:00
Austin Clements
4b2fde945a runtime: proportional mutator assist
Currently, mutator allocation periodically assists the garbage
collector by performing a small, fixed amount of scanning work.
However, to control heap growth, mutators need to perform scanning
work *proportional* to their allocation rate.

This change implements proportional mutator assists. This uses the
scan work estimate computed by the garbage collector at the beginning
of each cycle to compute how much scan work must be performed per
allocation byte to complete the estimated scan work by the time the
heap reaches the goal size. When allocation triggers an assist, it
uses this ratio and the amount allocated since the last assist to
compute the assist work, then attempts to steal as much of this work
as possible from the background collector's credit, and then performs
any remaining scan work itself.
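
A single assist then reduces to something like this (a sketch with
illustrative names; the real credit steal is an atomic operation):

  // assistWork computes the scan work owed for an allocation,
  // steals as much as possible from background credit, and
  // returns what the mutator must scan itself.
  func assistWork(ratio float64, bytesAllocated, bgCredit int64) (selfScan, creditLeft int64) {
      owed := int64(ratio * float64(bytesAllocated))
      steal := owed
      if steal > bgCredit {
          steal = bgCredit
      }
      return owed - steal, bgCredit - steal
  }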

Change-Id: I98b2078147a60d01d6228b99afd414ef857e4fba
Reviewed-on: https://go-review.googlesource.com/8836
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:18 +00:00
Austin Clements
8e24283a28 runtime: track background scan work credit
This tracks scan work done by background GC in a global pool. Mutator
assists will draw on this credit to avoid doing work when background
GC is staying ahead.

Unlike the other GC controller tracking variables, this will be both
written and read throughout the cycle. Hence, we can't arbitrarily
delay updates like we can for scan work and bytes marked. However, we
still want to minimize contention, so this global credit pool is
allowed some error from the "true" amount of credit. Background GC
accumulates credit locally up to a limit and only then flushes to the
global pool. Similarly, mutator assists will draw from the credit pool
in batches.
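
The batching pattern is roughly (a sketch; creditBatch and flush
are hypothetical, and the real flush is an atomic add):

  const creditBatch = 1024 // hypothetical flush threshold

  // accumulate adds locally earned credit and only flushes to the
  // global pool once a batch has built up, trading a bounded error
  // in the pool for less contention.
  func accumulate(local *int64, flush func(int64), work int64) {
      *local += work
      if *local >= creditBatch {
          flush(*local)
          *local = 0
      }
  }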

Change-Id: I1aa4fc604b63bf53d1ee2a967694dffdfc3e255e
Reviewed-on: https://go-review.googlesource.com/8834
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:09 +00:00
Austin Clements
4e9fc0df48 runtime: implement GC scan work estimator
This implements tracking the scan work ratio of a GC cycle and using
this to estimate the scan work that will be required by the next GC
cycle. Currently this estimate is unused; it will be used to drive
mutator assists.

Change-Id: I8685b59d89cf1d83eddfc9b30d84da4e3a7f4b72
Reviewed-on: https://go-review.googlesource.com/8833
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:04 +00:00
Austin Clements
571ebae6ef runtime: track scan work performed during concurrent mark
This tracks the amount of scan work in terms of scanned pointers
during the concurrent mark phase. We'll use this information to
estimate scan work for the next cycle.

Currently this aggregates the work counter in gcWork, and dispose
atomically aggregates this into a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.

Change-Id: Iac0364c466ee35fab781dbbbe7970a5f3c4e1fc1
Reviewed-on: https://go-review.googlesource.com/8832
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:00 +00:00
Russ Cox
6a2b0c0b6d runtime: delete cgo_allocate
This memory is untyped and can't be used anymore.
The next version of SWIG won't need it.

Change-Id: I592b287c5f5186975ee09a9b28d8efe3b57134e7
Reviewed-on: https://go-review.googlesource.com/8956
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-17 01:30:47 +00:00
Austin Clements
4b956ae317 runtime: start concurrent GC promptly when we reach its trigger
Currently, when allocation reaches the concurrent GC trigger size, we
start the concurrent collector by ready'ing its G. This simply puts it
on the end of the P's run queue, which means we may not actually start
GC for some time as the current G continues to run and then the P
drains other Gs already on its run queue. Since the mutator can
continue to allocate, the heap can potentially be much larger than we
intended by the time GC actually starts. Furthermore, how much larger
is difficult to predict since it depends on the scheduler.

Fix this by preempting the current G and switching directly to the
concurrent GC G as soon as we reach the trigger heap size.

On the garbage benchmark from the benchmarks subrepo with
GOMAXPROCS=4, this reduces the time from triggering the GC to the
beginning of sweep termination by 10 to 30 milliseconds, which reduces
allocation after the trigger by up to 10MB (a large fraction of the
64MB live heap the benchmark tries to maintain).

One other known source of delay before we "really" start GC is the
sweep finalization performed before sweep termination. This has
similar negative effects on heap size and predictability, but is an
orthogonal problem. This change adds a TODO for this.

Change-Id: I8bae98cb43685c1bf353ff55868e4647e3743c47
Reviewed-on: https://go-review.googlesource.com/8513
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-10 18:22:52 +00:00
Austin Clements
6afb5fa48f runtime: remove GoSched/GoStart trace events around GC
These were appropriate for STW GC, since it interrupted the allocating
goroutine, but don't apply to concurrent GC, which runs on its own
goroutine. Forced GC is still STW, but it makes sense to attribute the
GC to the goroutine that called runtime.GC().

Change-Id: If12418ca66dc7e53b8b16025af4e03adb5d9577e
Reviewed-on: https://go-review.googlesource.com/8715
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-10 18:21:52 +00:00
Michael Hudson-Doyle
a1f57598cc runtime, cmd/internal/ld: rename themoduledata to firstmoduledata
'themoduledata' doesn't really make sense now that we support multiple moduledata
objects.

Change-Id: I8263045d8f62a42cb523502b37289b0fba054f62
Reviewed-on: https://go-review.googlesource.com/8521
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-10 05:11:49 +00:00
Michael Hudson-Doyle
fae4a128cb runtime, reflect: support multiple moduledata objects
This changes all the places that consult themoduledata to consult a
linked list of moduledata objects, as will be necessary for
-linkshared to work.

Obviously, as there is as yet no way of adding moduledata objects to
this list, all this change achieves right now is wasting a few
instructions here and there.

Change-Id: I397af7f60d0849b76aaccedf72238fe664867051
Reviewed-on: https://go-review.googlesource.com/8231
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-10 04:51:42 +00:00
Austin Clements
cb10ff1ef9 runtime: report next_gc for initial heap size in gctrace
Currently, the initial heap size reported in the gctrace line is the
heap_live right before sweep termination. However, we triggered GC
when heap_live reached next_gc, and there may have been significant
allocation between that point and the beginning of sweep
termination. Ideally these would be essentially the same, but
currently there's scheduler delay when readying the GC goroutine as
well as delay from background sweep finalization.

We should fix this delay, but in the meantime, to give the user a
better idea of how much the heap grew during the whole of garbage
collection, report the trigger rather than what the heap size happened
to be after the garbage collector finished rolling out of bed. This
will also be more useful for heap growth plots.

Change-Id: I08476b9fbcfb2de90592405e9c9f434dfb9eb1f8
Reviewed-on: https://go-review.googlesource.com/8512
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-09 22:18:06 +00:00
Austin Clements
8c3fc088fb runtime: report marked heap size in gctrace
When the gctrace GODEBUG option is enabled, it will now report three
heap sizes: the heap size at the beginning of the GC cycle, the heap
size at the end of the GC cycle before sweeping, and the marked heap size,
which is the amount of heap that will be retained until the next GC
cycle.

Change-Id: Ie13f8a6d5c609bc9cc47c7555960ab55b37b5f1c
Reviewed-on: https://go-review.googlesource.com/8430
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-06 21:28:23 +00:00
Austin Clements
6d12b1780e runtime: make next_gc be heap size to trigger GC at
In the STW collector, next_gc was both the heap size to trigger GC at
as well as the goal heap size.

Early in the concurrent collector's development, next_gc was the goal
heap size, but was also used as the heap size to trigger GC at. This
meant we always overshot the goal because of allocation during
concurrent GC.

Currently, next_gc is still the goal heap size, but we trigger
concurrent GC at 7/8*GOGC heap growth. This complicates
shouldtriggergc, but was necessary because of the incremental
maintenance of next_gc.

Now we simply compute next_gc for the next cycle during mark
termination. Hence, it's now easy to take the simpler route and
redefine next_gc as the heap size at which the next GC triggers. We
can directly compute this with the 7/8 backoff during mark termination
and shouldtriggergc can simply test if the live heap size has grown
over the next_gc trigger.

This will also simplify later changes once we start setting next_gc in
more sophisticated ways.
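
The computation reduces to roughly this (a sketch; gogc is the GOGC
percentage and the names are illustrative):

  // nextGCTrigger backs off 7/8 of the way from the marked heap
  // to the goal, computed once at mark termination.
  func nextGCTrigger(marked, gogc uint64) uint64 {
      goal := marked + marked*gogc/100
      return marked + (goal-marked)*7/8
  }

  // shouldTriggerGC is now a direct comparison.
  func shouldTriggerGC(heapLive, trigger uint64) bool {
      return heapLive >= trigger
  }

With GOGC=100 this triggers at 7/8 of the way to a doubled heap,
matching the 7/8*GOGC growth rule above.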

Change-Id: I872be4ae06b4f7a0d7f7967360a054bd36b90eea
Reviewed-on: https://go-review.googlesource.com/8420
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-06 21:28:18 +00:00
Austin Clements
d7e0ad4b82 runtime: introduce heap_live; replace use of heap_alloc in GC
Currently there are two main consumers of memstats.heap_alloc:
updatememstats (aka ReadMemStats) and shouldtriggergc.

updatememstats recomputes heap_alloc from the ground up, so we don't
need to keep heap_alloc up to date for it. shouldtriggergc wants to
know how many bytes were marked by the previous GC plus how many bytes
have been allocated since then, but this *isn't* what heap_alloc
tracks. heap_alloc also includes objects that are not marked and
haven't yet been swept.

Introduce a new memstat called heap_live that actually tracks what
shouldtriggergc wants to know and stop keeping heap_alloc up to date.

Unlike heap_alloc, heap_live follows a simple sawtooth that drops
during each mark termination and increases monotonically between GCs.
heap_alloc, on the other hand, has much more complicated behavior: it
may drop during sweep termination, slowly decreases from background
sweeping between GCs, is roughly unaffected by allocation as long as
there are unswept spans (because we sweep and allocate at the same
rate), and may go up after background sweeping is done depending on
the GC trigger.

heap_live simplifies computing next_gc and using it to figure out when
to trigger garbage collection. Currently, we guess next_gc at the end
of a cycle and update it as we sweep and get a better idea of how much
heap was marked. Now, since we're directly tracking how much heap is
marked, we can directly compute next_gc.

This also corrects bugs that could cause us to trigger GC early.
Currently, in any case where sweep termination actually finds spans to
sweep, heap_alloc is an overestimation of live heap, so we'll trigger
GC too early. heap_live, on the other hand, is unaffected by sweeping.

Change-Id: I1f96807b6ed60d4156e8173a8e68745ffc742388
Reviewed-on: https://go-review.googlesource.com/8389
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-06 21:28:13 +00:00
Austin Clements
50a66562a0 runtime: track heap bytes marked by GC
This tracks the number of heap bytes marked by a GC cycle. We'll use
this information to precisely trigger the next GC cycle.

Currently this aggregates the work counter in gcWork, and dispose
atomically aggregates this into a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.

Change-Id: I1bc377cb2e802ef61c2968602b63146d52e7f5db
Reviewed-on: https://go-review.googlesource.com/8388
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-06 21:28:07 +00:00
Austin Clements
f244a1471d runtime: add cumulative GC CPU % to gctrace line
This tracks both total CPU time used by GC and the total time
available to all Ps since the beginning of the program and uses this
to derive a cumulative CPU usage percent for the gctrace line.
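
The cumulative percentage is essentially (a sketch; total CPU time
available is roughly wall-clock time times GOMAXPROCS, and the
names are illustrative):

  // gcCPUPercent reports cumulative GC CPU time as a percentage
  // of all CPU time available to the program so far.
  func gcCPUPercent(gcCPUNanos, totalCPUNanos int64) float64 {
      if totalCPUNanos == 0 {
          return 0
      }
      return 100 * float64(gcCPUNanos) / float64(totalCPUNanos)
  }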

Change-Id: Ica85372b8dd45f7621909b325d5ac713a9b0d015
Reviewed-on: https://go-review.googlesource.com/8350
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-02 23:37:13 +00:00
Austin Clements
24ee948269 runtime: update gctrace line for new garbage collector
GODEBUG=gctrace=1 turns on a per-GC cycle trace line. The current line
is left over from the STW garbage collector and includes a lot of
information that is no longer meaningful for the concurrent GC and
doesn't include a lot of information that is important.

Replace this line with a new line designed for the new garbage
collector.

This new line is focused more on helping the user understand the
impact of the garbage collector on their program and less on telling
us, the runtime developers, everything that's happening inside
GC. It's designed to fit in 80 columns and intentionally omit some
potentially useful things that were in the old line. We might want a
"verbose" mode that adds information for us.

We'll be able to further simplify the line once we eliminate the STW
around enabling the write barrier. Then we'll have just one STW phase,
one concurrent phase, and one more STW phase, so we'll be able to
reduce the number of times from five to three.

Change-Id: Icc30939fe4576fb4491b4eac811649395727aa2a
Reviewed-on: https://go-review.googlesource.com/8208
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-02 23:37:06 +00:00
Michael Hudson-Doyle
67426a8a9e runtime, cmd/internal/ld: change runtime to use a single linker symbol
In preparation for being able to run a go program that has code
in several objects, this changes from having several linker
symbols used by the runtime into having one linker symbol that
points at a structure containing the needed data.  Multiple
object support will construct a linked list of such structures.

A follow-up will initialize the slices in the themoduledata
structure directly from the linker, but I was aiming for a minimal
diff for now.

Change-Id: I613cce35309801cf265a1d5ae5aaca8d689c5cbf
Reviewed-on: https://go-review.googlesource.com/7441
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-31 22:45:07 +00:00
Austin Clements
06de3f52a7 runtime: document subtlety around entering mark termination
The barrier in gcDrain does not account for concurrent gcDrainNs
happening in gchelpwork, so it can actually return while there is
still work being done. It turns out this is okay, but for subtle
reasons involving gcDrainN always being run on the system
stack. Document these reasons.

Change-Id: Ib07b3753cc4e2b54533ab3081a359cbd1c3c08fb
Reviewed-on: https://go-review.googlesource.com/7736
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-20 14:05:05 +00:00
Russ Cox
631d6a33bf runtime: implement atomicand8 atomically
We're skating on thin ice, and things are finally starting to melt around here.
(I want to avoid the debugging session that will happen when someone
uses atomicand8 expecting it to be atomic with respect to other operations.)

Change-Id: I254f1582be4eb1f2d7fbba05335a91c6bf0c7f02
Reviewed-on: https://go-review.googlesource.com/7861
Reviewed-by: Minux Ma <minux@golang.org>
2015-03-20 04:45:29 +00:00
Austin Clements
fa2f9c2c09 runtime: run concurrent mark phase on regular stack
Currently, the GC's concurrent mark phase runs on the system
stack. There's no need to do this, and running it this way ties up the
entire M and P running the GC by preventing the scheduler from
preempting the GC even during concurrent mark.

Fix this by running concurrent mark on the regular G stack. It's still
non-preemptible because we also set preemptoff around the whole GC
process, but this moves us closer to making it preemptible.

Change-Id: Ia9f1245e299b8c5c513a4b1e3ef13eaa35ac5e73
Reviewed-on: https://go-review.googlesource.com/7730
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-19 15:55:12 +00:00
Austin Clements
bef356b282 runtime: improve comment in concurrent GC
"Sync" is not very informative. What's being synchronized and with
whom? Update this comment to explain what we're really doing: enabling
write barriers.

Change-Id: I4f0cbb8771988c7ba4606d566b77c26c64165f0f
Reviewed-on: https://go-review.googlesource.com/7700
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-19 15:55:06 +00:00
Austin Clements
d21cef1f8f runtime: remove pointless harvestwbufs
Currently we harvestwbufs the moment we enter the mark phase, even
before starting the world again. Since cached wbufs are only filled
when we're in mark or mark termination, they should all be empty at
this point, making the harvest pointless. Remove the harvest.

We should, but do not currently, harvest at the end of the mark phase
when we're running out of work to do.

Change-Id: I5f4ba874f14dd915b8dfbc4ee5bb526eecc2c0b4
Reviewed-on: https://go-review.googlesource.com/7669
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-03-19 15:55:00 +00:00
Rick Hudson
d0eab03091 runtime: Adjust when write barriers are active
Even though the world is stopped the GC may do pointer
writes that need to be protected by write barriers.
This means that the write barrier must be on
continuously from the time the mark phase starts and
the mark termination phase ends. Checks were added to
ensure that no allocation happens during a GC.

Hoist the logic that clears pools to the start of the GC
so that the memory can be reclaimed during this GC cycle.

Change-Id: I9d1551ac5db9bac7bac0cb5370d5b2b19a9e6a52
Reviewed-on: https://go-review.googlesource.com/6990
Reviewed-by: Austin Clements <austin@google.com>
2015-03-10 15:04:12 +00:00
Dmitry Vyukov
919fd24884 runtime: remove runtime frames from stacks in traces
Strip uninteresting bottom and top frames from trace stacks.
This makes both binary and json trace files smaller,
and also makes stacks shorter and more readable in the viewer.

Change-Id: Ib9c80ccc280504f0e235f867f53f1d2652c41583
Reviewed-on: https://go-review.googlesource.com/5523
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
2015-03-10 14:46:15 +00:00
Russ Cox
5789b28525 runtime: start GC background sweep eagerly
Starting it lazily causes a memory allocation (for the goroutine) during GC.

First use of channels for runtime implementation.

Change-Id: I9cd24dcadbbf0ee5070ee6d0ed7ea415504f316c
Reviewed-on: https://go-review.googlesource.com/6960
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-03-05 21:41:55 +00:00
Dmitry Vyukov
b759e225f5 runtime: bound defer pools (try 2)
The unbounded list-based defer pool can grow infinitely.
This can happen if a goroutine routinely allocates a defer,
then blocks on one P, and is then unblocked and scheduled on
another P, where it frees the defer.
The scenario was reported on golang-nuts list.

We've been here several times. Any unbounded local caches
are bad and grow to infinite size. This change introduces
central defer pool; local pools become fixed-size
with the only purpose of amortizing accesses to the
central pool.

Freedefer now executes on system stack to not consume
nosplit stack space.
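
The bounded-cache pattern looks roughly like this (a sketch, not
the actual defer pool; item, localCap, and the spill policy are
illustrative, and the central pool is lock-protected in reality):

  type item struct{} // stand-in for a defer record

  const localCap = 32 // hypothetical local cache size

  type cache struct {
      local   []*item   // per-P, at most localCap entries
      central [][]*item // global pool of spilled batches
  }

  // put adds d to the local cache, spilling half of the cache to
  // the central pool when it is full.
  func (c *cache) put(d *item) {
      if len(c.local) == localCap {
          spill := make([]*item, localCap/2)
          copy(spill, c.local[localCap/2:])
          c.central = append(c.central, spill)
          c.local = c.local[:localCap/2]
      }
      c.local = append(c.local, d)
  }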

Change-Id: I1a27695838409259d1586a0adfa9f92bccf7ceba
Reviewed-on: https://go-review.googlesource.com/3967
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
2015-03-04 14:29:58 +00:00
Dmitry Vyukov
5ef145c809 runtime: bound sudog cache
The unbounded list-based sudog cache can grow infinitely.
This can happen if a goroutine is routinely blocked on one P
and then unblocked and scheduled on another P.
The scenario was reported on golang-nuts list.

We've been here several times. Any unbounded local caches
are bad and grow to infinite size. This change introduces
central sudog cache; local caches become fixed-size
with the only purpose of amortizing accesses to the
central cache.

The change required moving the sudog cache from mcache to P,
because mcache is not scanned by GC.

Change-Id: I3bb7b14710354c026dcba28b3d3c8936a8db4e90
Reviewed-on: https://go-review.googlesource.com/3742
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
2015-03-04 14:14:29 +00:00
Austin Clements
07b73ce146 runtime: simplify gcResetGState
Since allglock is held in this function, there's no point to
tip-toeing around allgs.  Just use a for-range loop.

Change-Id: I1ee61c7e8cac8b8ebc8107c0c22f739db5db9840
Reviewed-on: https://go-review.googlesource.com/5882
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-25 15:48:57 +00:00
Austin Clements
b3d791c7bb runtime: consolidate gcworkdone/gcscanvalid clearing loops
Previously, we had three loops in the garbage collector that all
cleared the per-G GC flags.  Consolidate these into one function.
This one function is designed to work in a concurrent setting.  As a
result, it's slightly more expensive than the loops it replaces during
STW phases, but these happen at most twice per GC.

Change-Id: Id1ec0074fd58865eb0112b8a0547b267802d0df1
Reviewed-on: https://go-review.googlesource.com/5881
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-25 15:46:41 +00:00
Austin Clements
37b8597178 runtime: remove unnecessary gcworkdone resetting loop
The loop in gcMark is redundant with the gcworkdone resetting
performed by markroot, which is called a few lines later in gcMark.

Change-Id: Ie0a826a614ecfa79e6e6b866e8d1de40ba515856
Reviewed-on: https://go-review.googlesource.com/5880
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-25 15:46:21 +00:00
Rick Hudson
e31e35a0de runtime: reset gcscanvalid and gcworkdone when GODEBUG=gctrace=2
When GODEBUG=gctrace=2 two GCs are performed. During the first GC
the stack scan sets the g's gcscanvalid and gcworkdone flags to true,
indicating that the stacks have been scanned and do not need to
be rescanned. These need to be reset to false for the second GC so the
stacks are rescanned, otherwise if the only pointer to an object is
on the stack it will not be discovered and the object will be freed.
Typically this will include the object that was just allocated in
the mallocgc call that initiated the GC.

Change-Id: Ic25163f4689905fd810c90abfca777324005c02f
Reviewed-on: https://go-review.googlesource.com/5861
Reviewed-by: Russ Cox <rsc@golang.org>
2015-02-25 00:14:42 +00:00
Russ Cox
89a091de24 runtime: split gc_m into gcMark and gcSweep
This is a nice split but more importantly it provides a better
way to fit the checkmark phase into the sequencing.

Also factor out common span copying into gcSpanCopy.

Change-Id: Ia058644974e4ed4ac3cf4b017a3446eb2284d053
Reviewed-on: https://go-review.googlesource.com/5333
Reviewed-by: Austin Clements <austin@google.com>
2015-02-20 17:00:39 +00:00
Russ Cox
929597b9e9 runtime: unroll gc_m loop
The loop made more sense when gc_m was not its own function.

Change-Id: I71a7f21d777e69c1924e3b534c507476daa4dfdd
Reviewed-on: https://go-review.googlesource.com/5332
Reviewed-by: Austin Clements <austin@google.com>
2015-02-20 17:00:30 +00:00
Russ Cox
2b655c0b92 runtime: tidy GC driver
Change-Id: I0da26e89ae73272e49e82c6549c774e5bc97f64c
Reviewed-on: https://go-review.googlesource.com/5331
Reviewed-by: Austin Clements <austin@google.com>
2015-02-20 17:00:22 +00:00
Russ Cox
5254b7e9ce runtime: do not unmap work.spans until after checkmark phase
This is causing crashes.

Change-Id: I1832f33d114bc29894e491dd2baac45d7ab3a50d
Reviewed-on: https://go-review.googlesource.com/5330
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-02-19 21:33:06 +00:00
Russ Cox
6c4b54f409 runtime: missed change from reorganization CL
That is, I accidentally dropped this change of Austin's
when preparing my CL. I blame Git.

Change-Id: I9dd772c84edefad96c4b16785fdd2dea04a4a0d6
Reviewed-on: https://go-review.googlesource.com/5320
Reviewed-by: Austin Clements <austin@google.com>
2015-02-19 20:46:59 +00:00
Russ Cox
484f801ff4 runtime: reorganize memory code
Move code from malloc1.go, malloc2.go, mem.go, mgc0.go into
appropriate locations.

Factor mgc.go into mgc.go, mgcmark.go, mgcsweep.go, mstats.go.

A lot of this code was in certain files because the right place was in
a C file but it was written in Go, or vice versa. This is one step toward
making things actually well-organized again.

Change-Id: I6741deb88a7cfb1c17ffe0bcca3989e10207968f
Reviewed-on: https://go-review.googlesource.com/5300
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-19 20:17:01 +00:00
Austin Clements
02dcdba7c8 runtime: switch to gcWork abstraction
This converts the garbage collector from directly manipulating work
buffers to using the new gcWork abstraction.

The previous management of work buffers was rather ad hoc.  As a
result, switching to the gcWork abstraction changes many details of
work buffer management.

If greyobject fills a work buffer, it can now pull from work.partial
in addition to work.empty.

Previously, gcDrain started with a partial or empty work buffer and
fetched an empty work buffer if it filled its current buffer (in
greyobject).  Now, gcDrain starts with a full work buffer and fetches
a partial or empty work buffer if it fills its current buffer (in
greyobject).  The original behavior was bad because gcDrain would
immediately drop the empty work buffer returned by greyobject and
fetch a full work buffer, which greyobject was likely to immediately
overflow, fetching another empty work buffer, etc.  The new behavior
isn't great at the start because greyobject is likely to immediately
overflow the full buffer, but the steady-state behavior should be more
stable.  Both before and after this change, gcDrain fetches a full
work buffer if it drains its current buffer.  Basically all of these
choices are bad; the right answer is to use a dual work buffer scheme.

Previously, shade always fetched a work buffer (though usually from
m.currentwbuf), even if the object was already marked.  Now it only
fetches a work buffer if it actually greys an object.

Change-Id: I8b880ed660eb63135236fa5d5678f0c1c041881f
Reviewed-on: https://go-review.googlesource.com/5232
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-19 16:59:34 +00:00
Austin Clements
b30d19de59 runtime: introduce higher-level GC work abstraction
This introduces a producer/consumer abstraction for GC work pointers
that internally handles the details of filling, draining, and
shuffling work buffers.

In addition to simplifying the GC code, this should make it easy for
us to change how we use work buffers, including cleaning up how we use
the work.partial queue, reintroducing a FIFO lookahead cache, adding
prefetching, and using dual buffers to avoid flapping.

This commit doesn't change any existing code.  The following commit
will switch the garbage collector from explicit workbuf manipulation
to gcWork.

Change-Id: Ifbfe5fff45bf0362d6d7c3cecb061f0c9874077d
Reviewed-on: https://go-review.googlesource.com/5231
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-19 16:59:26 +00:00
Austin Clements
1ae124b5ff runtime: make gcDrainN take an int instead of uintptr
Nit.  There's no reason to take a uintptr and doing so just requires
casts in annoying places.

Change-Id: Ifeb9638c6d94eae619c490930cf724cc315680ba
Reviewed-on: https://go-review.googlesource.com/5230
Reviewed-by: Russ Cox <rsc@golang.org>
2015-02-19 02:47:45 +00:00
Austin Clements
6e5cc1f1ac runtime: rename drainworkbuf and drainobjects
drainworkbuf is now gcDrain, since it drains until there's
nothing left to drain.  drainobjects is now gcDrainN because it's
the bounded equivalent to gcDrain.

The new names use the Go camel case convention because we have to
start somewhere.  The "gc" prefix is because we don't have runtime
packages yet and just "drain" is too ambiguous.

Change-Id: I88dbdf32e8ce4ce6c3b7e1f234664be9b76cb8fd
Reviewed-on: https://go-review.googlesource.com/4785
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-13 15:34:55 +00:00
Austin Clements
60a16ea367 runtime: remove drainallwbufs argument to drainworkbuf
All calls to drainworkbuf now pass true for this argument, so remove
the argument and update the documentation to reflect the simplified
interface.

At a higher level, there are no longer any situations where we drain
"one wbuf" (though drainworkbuf didn't guarantee this anyway).  We
either drain everything, or we drain a specific number of objects.

Change-Id: Ib7ee0fde56577eff64232ee1e711ec57c4361335
Reviewed-on: https://go-review.googlesource.com/4784
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-13 15:34:49 +00:00
Austin Clements
15c9a2ef4e runtime: eliminate drainworkbufs from scanblock
scanblock is only called during _GCscan and _GCmarktermination.
During _GCscan, scanblock didn't call drainworkbufs anyway.  During
_GCmarktermination, there's really no point in draining some (largely
arbitrary) amount of work during the scanblock, since the GC is about
to drain everything anyway, so simply eliminate this case.

Change-Id: I7f3c59ce9186a83037c6f9e9b143181acd04c597
Reviewed-on: https://go-review.googlesource.com/4783
Reviewed-by: Russ Cox <rsc@golang.org>
2015-02-13 15:34:39 +00:00
Austin Clements
1ac65f82ad runtime: eliminate b == 0 special case from scanblock
We no longer ever call scanblock with b == 0.

Change-Id: I9b01da39595e0cc251668c24d58748d88f5f0792
Reviewed-on: https://go-review.googlesource.com/4782
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-13 15:34:34 +00:00
Austin Clements
cf964e1653 runtime: replace scanblock(0, 0, nil, nil) with drainworkbuf
scanblock(0, 0, nil, nil) was just a confusing way of saying

  wbuf = getpartialorempty() // grab a partial or empty workbuf
  drainworkbuf(wbuf, true)   // drain it, and all remaining wbufs

Make drainworkbuf accept a nil workbuf and perform the
getpartialorempty itself and replace all uses of scanblock(0, 0, nil,
nil) with direct calls to drainworkbuf(nil, true).

Change-Id: I7002a2f8f3eaf6aa85bbf17ccc81d7288acfef1c
Reviewed-on: https://go-review.googlesource.com/4781
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-13 15:34:24 +00:00
Austin Clements
c2de2f87f0 runtime: move checknocurrentwbuf() from scanblock to drainworkbuf
Previously, scanblock called checknocurrentwbuf() after
drainworkbuf().  Move this call into drainworkbuf so that every return
path from drainworkbuf calls checknocurrentwbuf().  This is equivalent
to the previous code because scanblock was the only caller of
drainworkbuf.

Change-Id: I96ef2168c8aa169bfc4d368f296342fa0fbeafb4
Reviewed-on: https://go-review.googlesource.com/4780
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-13 15:34:08 +00:00
Austin Clements
277d5870a0 runtime: move wbuf-related functions to new gcwork.go
No code modifications.

This is in preparation for improving the wbuf abstraction.

Change-Id: I719543a345c34d079b7e39b251eccd5dd8a07826
Reviewed-on: https://go-review.googlesource.com/4710
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-12 20:16:35 +00:00
Rick Hudson
a15818fed3 runtime: cache workbufs on Ms and add consistency checks
Add local workbufs to the m struct in order to reduce contention.
Add consistency checks for workbuf ownership.
Chain workbufs through call chains to avoid swapping them
to and from the m struct.
Adjust the size of the workbuf so that the mutators can
more frequently pass modifications to the GC thus shifting
some work from the STW mark termination phase to the concurrent
mark phase.

Change-Id: I557b53af34ad9972265e0ed9f5996e52d548563d
Reviewed-on: https://go-review.googlesource.com/3972
Reviewed-by: Austin Clements <austin@google.com>
2015-02-11 16:27:17 +00:00
Austin Clements
28b5118415 runtime: rename m.gcing to m.preemptoff and make it a string
m.gcing has become overloaded to mean "don't preempt this g" in
general.  Once the garbage collector is preemptible, the one thing it
*won't* mean is that we're in the garbage collector.

So, rename gcing to "preemptoff" and make it a string giving a reason
that preemption is disabled.  gcing was never set to anything but 0 or
1, so we don't have to worry about there being a stack of reasons.

Change-Id: I4337c29e8e942e7aa4f106fc29597e1b5de4ef46
Reviewed-on: https://go-review.googlesource.com/3660
Reviewed-by: Russ Cox <rsc@golang.org>
2015-02-02 19:34:51 +00:00
Austin Clements
f95becaddb runtime: update a few "onM"s in comments to say "systemstack"
Commit 656be31 replaced onM with systemstack, but missed updating a
few comments that still referred to onM.  Update these.

Change-Id: I0efb017e9a66ea0adebb6e1da6e518ee11263f69
Reviewed-on: https://go-review.googlesource.com/3664
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-02-02 19:33:55 +00:00
Austin Clements
ebbdf2a14c runtime: eliminate parfor ctx field
Prior to the conversion of the runtime to Go, this void* was
necessary to get closure information in to C callbacks.  There
are no more C callbacks and parfor is perfectly capable of
invoking a Go closure now, so eliminate ctx and all of its
unsafe-ness.  (Plus, the runtime currently doesn't use ctx for
anything.)

Change-Id: I39fc53b7dd3d7f660710abc76b0d831bfc6296d8
Reviewed-on: https://go-review.googlesource.com/3395
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-01-29 17:38:16 +00:00
Rick Hudson
813e97b786 runtime: set minimum heap size to 4Mbytes
Set the minimum heap size to 4Mbytes except when the hash
table code wants to force a GC. In an unrelated change, when a
mutator is asked to assist the GC by marking pointer workbufs,
it will keep working until the requested number of pointers
is processed, even if it means asking for additional workbufs.

Change-Id: I661cfc0a7f2efcf6286b5d37d73e593d9ecd04d5
Reviewed-on: https://go-review.googlesource.com/3392
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-01-28 22:14:06 +00:00
Rick Hudson
13aff7831d runtime: avoid redundant scans
During a concurrent GC, stacks are scanned in
an initial scan phase, informing the GC of all
pointers on the stack. The GC only needs to rescan
the stack if it potentially changes, which can only
happen if the goroutine runs.
This CL tracks whether the goroutine has run
since it was last scanned and thus may have changed
its stack. If necessary, the stack is rescanned.

Change-Id: I5fb1c4338d42e3f61ab56c9beb63b7b2da25f4f1
Reviewed-on: https://go-review.googlesource.com/3275
Reviewed-by: Russ Cox <rsc@golang.org>
2015-01-28 20:05:55 +00:00
Dmitry Vyukov
5288fadbdc runtime: add tracing of runtime events
Add actual tracing of interesting runtime events.
Part of a larger tracing functionality:
https://docs.google.com/document/u/1/d/1FP5apqzBgr7ahCCgFO-yoVhk4YZrNIDNf9RybngBc14/pub
Full change:
https://codereview.appspot.com/146920043

Change-Id: Icccf54aea54e09350bb698ba6bf11532f9fbe6d3
Reviewed-on: https://go-review.googlesource.com/1451
Reviewed-by: Russ Cox <rsc@golang.org>
2015-01-28 16:35:24 +00:00
Rick Hudson
34bc85f6f3 runtime: fix trigger for concurrent GC
Adjust triggergc so that we trigger when we have used 7/8
of the available heap memory. Do first collection when we
exceed 4Mbytes.

Change-Id: I467b4335e16dc9cd1521d687fc1f99a51cc7e54b
Reviewed-on: https://go-review.googlesource.com/3149
Reviewed-by: Austin Clements <austin@google.com>
2015-01-21 21:30:46 +00:00
Rick Hudson
0635706849 runtime: Add some diagnostic messages printing source of unmarked object
Print out the object holding the reference to the object
that checkmark detects as not being properly marked.

Change-Id: Ieedbb6fddfaa65714504af9e7230bd9424cd0ae0
Reviewed-on: https://go-review.googlesource.com/2744
Reviewed-by: Austin Clements <austin@google.com>
2015-01-20 19:58:22 +00:00
Russ Cox
3965d7508e runtime: factor out bitmap, finalizer code from malloc/mgc
The code in mfinal.go is moved from malloc*.go and mgc*.go
and substantially unchanged.

The code in mbitmap.go is also moved from those files, but
cleaned up so that it can be called from those files (in most cases
the code being moved was not already a standalone function).
I also renamed the constants and wrote comments describing
the format. The result is a significant cleanup and isolation of
the bitmap code, but, roughly speaking, it should be treated
and reviewed as new code.

The other files changed only as much as necessary to support
this code movement.

This CL does NOT change the semantics of the heap or type
bitmaps at all, although there are now some obvious opportunities
to do so in followup CLs.

Change-Id: I41b8d5de87ad1d3cd322709931ab25e659dbb21d
Reviewed-on: https://go-review.googlesource.com/2991
Reviewed-by: Keith Randall <khr@golang.org>
2015-01-19 16:26:51 +00:00
Russ Cox
4d226dfee9 runtime: move write barrier code into mbarrier.go
I also added new comments at the top of mbarrier.go,
but the rest of the code is just copy-and-paste.

Change-Id: Iaeb2b12f8b1eaa33dbff5c2de676ca902bfddf2e
Reviewed-on: https://go-review.googlesource.com/2990
Reviewed-by: Austin Clements <austin@google.com>
2015-01-19 15:27:23 +00:00
Russ Cox
6482fe6c65 runtime: delete dead code called from C.
printf, vprintf, snprintf, gc_m_ptr, gc_g_ptr, gc_itab_ptr, gc_unixnanotime.

These were called from C.
There is no more C.

Now that vprintf is gone, delete roundup, which is unsafe (see CL 2814).

Change-Id: If8a7b727d497ffa13165c0d3a1ed62abc18f008c
Reviewed-on: https://go-review.googlesource.com/2824
Reviewed-by: Austin Clements <austin@google.com>
2015-01-14 22:20:44 +00:00
Austin Clements
ea4f14cf2b runtime: rename var checkmark to checkmarkphase
The old name was too ambiguous (is it a verb?  is it a predicate?  is
it a constant?) and too close to debug.gccheckmark.  Hopefully the new
name conveys that this variable indicates that we are currently doing
mark checking.

Change-Id: I031cd48b0906cdc7774f5395281d3aeeb8ef3ec9
Reviewed-on: https://go-review.googlesource.com/2656
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-01-14 16:08:46 +00:00
Russ Cox
aae0f074c0 runtime: fix a few GC-related bugs
1) Move non-preemption check even earlier in newstack.
This avoids a few priority inversion problems.

2) Always use atomic operations to update bitmap for 1-word objects.
This avoids lost mark bits during concurrent GC.

3) Stop using work.nproc == 1 as a signal for being single-threaded.
The concurrent GC runs with work.nproc == 1 but other procs are
running mutator code.

The use of work.nproc == 1 in getfull *is* safe, but remove it anyway,
since it is saving only a single atomic operation per GC round.

Fixes #9225.

Change-Id: I24134f100ad592ea8cb59efb6a54f5a1311093dc
Reviewed-on: https://go-review.googlesource.com/2745
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-01-14 15:05:33 +00:00
Austin Clements
654297cb02 runtime: add GODEBUG=gccheckmark=0/1
Previously, gccheckmark could only be enabled or disabled by calling
runtime.GCcheckmarkenable/GCcheckmarkdisable.  This was a necessary
hack because GODEBUG was broken.

Now that GODEBUG works again, move control over gccheckmark to a
GODEBUG variable and remove these runtime functions.  Currently,
gccheckmark is enabled by default (and will probably remain so for
much of the 1.5 development cycle).

Change-Id: I2bc6f30c21b795264edf7dbb6bd7354b050673ab
Reviewed-on: https://go-review.googlesource.com/2603
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-01-12 16:36:50 +00:00
Rick Hudson
db7fd1c142 runtime: increase GC concurrency.
Run GC in its own background goroutine, making the
caller runnable if resources are available. This is
critical in single-goroutine applications.
Allow goroutines that allocate a lot to help out
the GC and, in doing so, throttle their own allocation.
Adjust the test so that it only detects that a GC is run
during init calls and not whether the GC is memory
efficient. Memory efficiency work will happen later
in 1.5.
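
A toy model of the assist idea (all names and the pay-off rate
are invented; the real accounting lives in the allocator and is
considerably more involved):

        package main

        import "fmt"

        // gcState tracks a goroutine's allocation "debt". Once the debt
        // passes a threshold, the allocator makes the goroutine do
        // assist work, throttling its own allocation.
        type gcState struct {
                debt      int // bytes allocated but not yet paid for
                threshold int // debt allowed before assisting
        }

        func (g *gcState) allocate(n int) {
                g.debt += n
                for g.debt > g.threshold {
                        g.assist()
                }
        }

        // assist stands in for one unit of mark/scan work.
        func (g *gcState) assist() {
                fmt.Println("assisting GC")
                g.debt -= 1024
        }

        func main() {
                g := &gcState{threshold: 2048}
                g.allocate(8192) // a big allocation forces several assists
        }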

Change-Id: I4306f5e377bb47c69bda1aedba66164f12b20c2b
Reviewed-on: https://go-review.googlesource.com/2349
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-01-08 20:34:56 +00:00
Russ Cox
02f89331c2 runtime: fix two garbage collector bugs
First, call clearcheckmarks immediately after changing checkmark,
so that there is less time when the checkmark flag and the bitmap
are inconsistent. The tiny gap between the two lines is fine, because
the world is stopped. Before, the gap was much larger and included
such code as "go bgsweep()", which allocated.

Second, modify gcphase only when the world is stopped.
As written, gcscan_m was changing gcphase from 0 to GCscan
and back to 0 while other goroutines were running.
Another goroutine running at the same time might decide to
sleep, see GCscan, call gcphasework, and start "helping" by
scanning its stack. That's fine, except that if gcphase flips back
to 0 as the goroutine calls scanblock, it will start draining the
work buffers prematurely.

Both of these were found with wbshadow=2 (and a lot of hard work).
Eventually that will run automatically, but right now it still
doesn't quite work for all.bash, due to mmap conflicts with
pthread-created threads.

Change-Id: I99aa8210cff9c6e7d0a1b62c75be32a23321897b
Reviewed-on: https://go-review.googlesource.com/2340
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-01-06 15:10:13 +00:00
Russ Cox
dcec123a49 runtime: add GODEBUG wbshadow for finding missing write barriers
This is the detection code. It works well enough that I know of
a handful of missing write barriers. However, those are subtle
enough that I'll address them in separate followup CLs.

GODEBUG=wbshadow=1 checks for a write that bypassed the
write barrier at the next write barrier of the same word.
If a bug can be detected in this mode it is typically easy to
understand, since the crash says quite clearly what kind of
word has missed a write barrier.

GODEBUG=wbshadow=2 adds a check of the write barrier
shadow copy during garbage collection. Bugs detected at
garbage collection can be difficult to understand, because
there is no context for what the found word means.
Typically you have to reproduce the problem with allocfreetrace=1
in order to understand the type of the badly updated word.
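
A toy model of the shadow-copy check (names invented; the real
implementation shadows the heap at a fixed offset and hooks the
runtime's write barrier):

        package main

        import "fmt"

        // shadowHeap keeps a second copy of every tracked word that is
        // updated only through writeBarrier. A write that bypasses the
        // barrier leaves heap and shadow out of sync, and the next
        // barrier on the same word reports it.
        type shadowHeap struct {
                heap   []uintptr
                shadow []uintptr
        }

        func (h *shadowHeap) writeBarrier(i int, v uintptr) {
                if h.heap[i] != h.shadow[i] {
                        panic(fmt.Sprintf("missed write barrier on word %d: heap=%#x shadow=%#x",
                                i, h.heap[i], h.shadow[i]))
                }
                h.heap[i] = v
                h.shadow[i] = v
        }

        func main() {
                h := &shadowHeap{heap: make([]uintptr, 4), shadow: make([]uintptr, 4)}
                h.writeBarrier(0, 0x10) // barriered write: ok
                h.heap[0] = 0x20        // direct write: bypasses the barrier
                h.writeBarrier(0, 0x30) // detected: heap and shadow disagree
        }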

Change-Id: If863837308e7c50d96b5bdc7d65af4969bf53a6e
Reviewed-on: https://go-review.googlesource.com/2061
Reviewed-by: Austin Clements <austin@google.com>
2015-01-06 00:26:35 +00:00
Keith Randall
0bb8fc6614 runtime: remove go prefix from a few routines
They are no longer needed now that C is gone.

goatoi -> atoi
gofuncname/funcname -> funcname/cfuncname
goroundupsize -> already existing roundupsize

Change-Id: I278bc33d279e1fdc5e8a2a04e961c4c1573b28c7
Reviewed-on: https://go-review.googlesource.com/2154
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2014-12-29 15:18:29 +00:00
Keith Randall
b2a950bb73 runtime: rename gothrow to throw
Rename "gothrow" to "throw" now that the C version of "throw"
is no longer needed.

This change is purely mechanical except in panic.go where the
old version of "throw" has been deleted.

sed -i "" 's/[[:<:]]gothrow[[:>:]]/throw/g' runtime/*.go

Change-Id: Icf0752299c35958b92870a97111c67bcd9159dc3
Reviewed-on: https://go-review.googlesource.com/2150
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2014-12-28 06:16:16 +00:00
Russ Cox
557a61d270 cmd/gc: add //go:nowritebarrier to diagnose unintended write barriers
//go:nowritebarrier can only be used in package runtime.
It does not disable write barriers; it is an assertion, checked
by the compiler, that the following function needs no write
barriers.
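
For example (hypothetical names; the directive is only accepted
when compiling package runtime itself, so this is a sketch rather
than a standalone program):

        //go:nowritebarrier
        func clearFlags(s *spanFlags) { // spanFlags is invented for illustration
                // A scalar store needs no write barrier, so this compiles.
                s.flags = 0
                // A heap pointer store such as "s.next = p" would normally
                // get a write barrier, and the compiler would reject it here.
        }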

Change-Id: Id7978b779b66dc1feea39ee6bda9fd4d80280b7c
Reviewed-on: https://go-review.googlesource.com/1224
Reviewed-by: Rick Hudson <rlh@golang.org>
2014-12-12 20:48:10 +00:00
Keith Randall
b796cbc406 runtime: fix finalizer iterator
It could only handle one finalizer before it raised an out-of-bounds error.

Fixes issue #9172

Change-Id: Ibb4d0c8aff2d78a1396e248c7129a631176ab427
Reviewed-on: https://go-review.googlesource.com/1201
Reviewed-by: Russ Cox <rsc@golang.org>
2014-12-10 16:33:26 +00:00
Rick Hudson
2937d88af5 runtime: fix some comment formatting
Change-Id: Ife7d6ce1131ff26444f09e8fda4f61344e669e21
Reviewed-on: https://go-review.googlesource.com/1260
Reviewed-by: Russ Cox <rsc@golang.org>
2014-12-09 22:08:45 +00:00
Rick Hudson
273507aa8f [dev.garbage] runtime: Stop running gs during the GCscan phase.
Ensure that all gs are in a scan state when their stacks are being scanned.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/179160044
2014-11-21 16:46:27 -05:00
Rick Hudson
cc73a44f67 [dev.garbage] runtime: Fix constant overflow on 32 bit machines
LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, rsc
https://golang.org/cl/180040043
2014-11-20 14:24:01 -05:00
Rick Hudson
8cfb084534 [dev.garbage] runtime: Turn concurrent GC on by default. Avoid write barriers for GC internal structures such as free lists.
LGTM=rsc
R=rsc
CC=golang-codereviews, rsc
https://golang.org/cl/179000043
2014-11-20 12:08:13 -05:00
Russ Cox
0fcf54b3d2 [dev.garbage] all: merge dev.cc into dev.garbage
The garbage collector is now written in Go.
There is plenty to clean up (just like on dev.cc).

all.bash passes on darwin/amd64, darwin/386, linux/amd64, linux/386.

TBR=rlh
R=austin, rlh, bradfitz
CC=golang-codereviews
https://golang.org/cl/173250043
2014-11-15 08:00:38 -05:00
Russ Cox
3e804631d9 [dev.cc] all: merge dev.power64 (7667e41f3ced) into dev.cc
This is to reduce the delta between dev.cc and dev.garbage to just garbage collector changes.

These are the files that had merge conflicts and have been edited by hand:
        malloc.go
        mem_linux.go
        mgc.go
        os1_linux.go
        proc1.go
        panic1.go
        runtime1.go

LGTM=austin
R=austin
CC=golang-codereviews
https://golang.org/cl/174180043
2014-11-14 12:10:52 -05:00
Russ Cox
656be317d0 [dev.cc] runtime: delete scalararg, ptrarg; rename onM to systemstack
Scalararg and ptrarg are not "signal safe".
Go code filling them out can be interrupted by a signal,
and then the signal handler runs, and if it also ends up
in Go code that uses scalararg or ptrarg, now the old
values have been smashed.
For the pieces of code that do need to run in a signal handler,
we introduced onM_signalok, which is really just onM
except that the _signalok is meant to convey that the caller
asserts that scalararg and ptrarg will be restored to their old
values after the call (instead of the usual behavior, zeroing them).

Scalararg and ptrarg are also untyped and therefore error-prone.

Go code can always pass a closure instead of using scalararg
and ptrarg; they were only really necessary for C code.
And there's no more C code.

For all these reasons, delete scalararg and ptrarg, converting
the few remaining references to use closures.

Once those are gone, there is no need for a distinction between
onM and onM_signalok, so replace both with a single function
equivalent to the current onM_signalok (that is, it can be called
on any of the curg, g0, and gsignal stacks).

The name onM and the phrase 'm stack' are misnomers,
because on most systems an M has two system stacks:
the main thread stack and the signal handling stack.

Correct the misnomer by naming the replacement function systemstack.
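
A standalone sketch of the closure-based convention (the function
below merely stands in for the runtime's systemstack; the names
are invented):

        package main

        import "fmt"

        // runOnSystemStack stands in for runtime.systemstack: here it
        // simply calls fn, but the point is the calling convention.
        // Arguments travel inside the closure instead of in per-M
        // scratch slots (the old scalararg/ptrarg), so a signal handler
        // that interrupts the call cannot smash them.
        func runOnSystemStack(fn func()) { fn() }

        func main() {
                buf := []byte("gc work")
                n := 2
                runOnSystemStack(func() {
                        // buf and n are captured by the closure, typed and
                        // private to this call.
                        fmt.Println(string(buf[:n]))
                })
        }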

Fix a few references to "M stack" in code.

The main motivation for this change is to eliminate scalararg/ptrarg.
Rick and I have already seen them cause problems because
the calling sequence m.ptrarg[0] = p is a heap pointer assignment,
so it gets a write barrier. The write barrier also uses onM, so it has
all the same problems as if it were being invoked by a signal handler.
We worked around this by saving and restoring the old values
and by calling onM_signalok, but there's no point in keeping this nice
home for bugs around any longer.

This CL also changes funcline to return the file name as a result
instead of filling in a passed-in *string. (The *string signature is
left over from when the code was written in and called from C.)
That's arguably an unrelated change, except that once I had done
the ptrarg/scalararg/onM cleanup I started getting false positives
about the *string argument escaping (not allowed in package runtime).
The compiler is wrong, but the easiest fix is to write the code like
Go code instead of like C code. I am a bit worried that the compiler
is wrong because of some use of uninitialized memory in the escape
analysis. If that's the reason, it will go away when we convert the
compiler to Go. (And if not, we'll debug it the next time.)

LGTM=khr
R=r, khr
CC=austin, golang-codereviews, iant, rlh
https://golang.org/cl/174950043
2014-11-12 14:54:31 -05:00
Russ Cox
1e2d2f0947 [dev.cc] runtime: convert memory allocator and garbage collector to Go
The conversion was done with an automated tool and then
modified only as necessary to make it compile and run.

[This CL is part of the removal of C code from package runtime.
See golang.org/s/dev.cc for an overview.]

LGTM=r
R=r
CC=austin, dvyukov, golang-codereviews, iant, khr
https://golang.org/cl/167540043
2014-11-11 17:05:02 -05:00