While we're here, update the documentation and delete variables with no effect.
Change-Id: I4df0d266dff880df61b488ed547c2870205862f0
Reviewed-on: https://go-review.googlesource.com/10790
Reviewed-by: Austin Clements <austin@google.com>
These were found by grepping the comments from the Go code and feeding
the output to aspell.
Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395
Reviewed-on: https://go-review.googlesource.com/10941
Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Stack barriers assume that writes through pointers to frames above the
current frame will get write barriers, and hence these frames do not
need to be re-scanned to pick up these changes. For normal writes,
this is true. However, there are places in the runtime that use
typedmemmove to potentially write through pointers to higher frames
(such as mapassign1). Currently, typedmemmove does not execute write
barriers if the destination is on the stack. If there's a stack
barrier between the current frame and the frame being modified with
typedmemmove, and the stack barrier is not otherwise hit, it's
possible that the garbage collector will never see the updated pointer
and incorrectly reclaim the object.
Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove
and its variants) detect when the destination is in the stack and
unwind stack barriers up to that point, forcing mark termination to
later rescan the affected frame and collect these pointers.
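A minimal sketch of the unwinding step, with stand-in types and names
rather than the actual runtime code:

    // Sketch only: these types stand in for the runtime's real
    // stack-barrier bookkeeping.
    type stkbarSketch struct {
        savedLRPtr uintptr // stack slot whose return PC was overwritten
        savedLRVal uintptr // original return PC saved from that slot
    }

    type gSketch struct {
        stackLo, stackHi uintptr
        stkbarPos        int
        stkbar           []stkbarSketch // ordered from lowest address up
    }

    // If dst is a slot on gp's stack, remove every barrier below dst by
    // restoring the saved return PCs, so the bulk write cannot hide
    // behind a barrier; mark termination will then rescan that frame.
    func bulkBarrierStackCheck(gp *gSketch, dst uintptr, restore func(stkbarSketch)) {
        if dst < gp.stackLo || dst >= gp.stackHi {
            return // heap destination: ordinary write barriers cover it
        }
        for gp.stkbarPos < len(gp.stkbar) && gp.stkbar[gp.stkbarPos].savedLRPtr < dst {
            restore(gp.stkbar[gp.stkbarPos])
            gp.stkbarPos++
        }
    }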
Fixes #11084. Might be related to #10240, #10541, #10941, #11023,
#11027 and possibly others.
Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd
Reviewed-on: https://go-review.googlesource.com/10791
Reviewed-by: Russ Cox <rsc@golang.org>
Currently the stack barriers are installed at the next frame boundary
after gp.sched.sp + 1024*2^n for n=0,1,2,... However, when a G is in a
system call, we set gp.sched.sp to 0, which causes stack barriers to
be installed at *every* frame. This easily overflows the slice we've
reserved for storing the stack barrier information, and causes a
"slice bounds out of range" panic in gcInstallStackBarrier.
Fix this by using gp.syscallsp instead of gp.sched.sp when gp.syscallsp
is non-zero. This is the same logic that gentraceback uses to determine
the current SP.
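A sketch of the SP selection, using plain parameters in place of the
G's fields:

    // Sketch: choose the SP from which stack-barrier spacing is measured.
    // While a goroutine is in a system call its sched.sp is 0, so fall
    // back to syscallsp, mirroring gentraceback.
    func barrierBaseSP(schedSP, syscallSP uintptr) uintptr {
        if syscallSP != 0 {
            return syscallSP
        }
        return schedSP
    }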
Fixes #11049.
Change-Id: Ie40eeee5bec59b7c1aa715a7c17aa63b1f1cf4e8
Reviewed-on: https://go-review.googlesource.com/10755
Reviewed-by: Russ Cox <rsc@golang.org>
This commit implements stack barriers to minimize the amount of
stack re-scanning that must be done during mark termination.
Currently the GC scans stacks of active goroutines twice during every
GC cycle: once at the beginning during root discovery and once at the
end during mark termination. The second scan happens while the world
is stopped and guarantees that we've seen all of the roots (since
there are no write barriers on writes to local stack
variables). However, this means pause time is proportional to stack
size. In particularly recursive programs, this can drive pause time up
past our 10ms goal (e.g., it takes about 150ms to scan a 50MB heap).
Re-scanning the entire stack is rarely necessary, especially for large
stacks, because usually most of the frames on the stack were not
active between the first and second scans and hence any changes to
these frames (via non-escaping pointers passed down the stack) were
tracked by write barriers.
To efficiently track how far a stack has been unwound since the first
scan (and, hence, how much needs to be re-scanned), this commit
introduces stack barriers. During the first scan, at exponentially
spaced points in each stack, the scan overwrites return PCs with the
PC of the stack barrier function. When "returned" to, the stack
barrier function records how far the stack has unwound and jumps to
the original return PC for that point in the stack. Then the second
scan only needs to proceed as far as the lowest barrier that hasn't
been hit.
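A sketch of the exponential spacing (the real code then snaps each
watermark to the next frame boundary and records the return PC it
overwrites there):

    // Sketch: watermarks at sp+1024, sp+2048, sp+4096, ... up the stack.
    // A barrier is installed at the first frame boundary past each one.
    func barrierWatermarks(sp, stackHi uintptr) []uintptr {
        var marks []uintptr
        for interval := uintptr(1024); sp+interval < stackHi; interval *= 2 {
            marks = append(marks, sp+interval)
        }
        return marks
    }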
For deeply recursive programs, this substantially reduces mark
termination time (and hence pause time). For the goscheme example
linked in issue #10898, prior to this change, mark termination times
were typically between 100 and 500ms; with this change, mark
termination times are typically between 10 and 20ms. As a result of
the reduced stack scanning work, this reduces overall execution time
of the goscheme example by 20%.
Fixes #10898.
The effect of this on programs that are not deeply recursive is
minimal:
name old time/op new time/op delta
BinaryTree17 3.16s ± 2% 3.26s ± 1% +3.31% (p=0.000 n=19+19)
Fannkuch11 2.42s ± 1% 2.48s ± 1% +2.24% (p=0.000 n=17+19)
FmtFprintfEmpty 50.0ns ± 3% 49.8ns ± 1% ~ (p=0.534 n=20+19)
FmtFprintfString 173ns ± 0% 175ns ± 0% +1.49% (p=0.000 n=16+19)
FmtFprintfInt 170ns ± 1% 175ns ± 1% +2.97% (p=0.000 n=20+19)
FmtFprintfIntInt 288ns ± 0% 295ns ± 0% +2.73% (p=0.000 n=16+19)
FmtFprintfPrefixedInt 242ns ± 1% 252ns ± 1% +4.13% (p=0.000 n=18+18)
FmtFprintfFloat 324ns ± 0% 323ns ± 0% -0.36% (p=0.000 n=20+19)
FmtManyArgs 1.14µs ± 0% 1.12µs ± 1% -1.01% (p=0.000 n=18+19)
GobDecode 8.88ms ± 1% 8.87ms ± 0% ~ (p=0.480 n=19+18)
GobEncode 6.80ms ± 1% 6.85ms ± 0% +0.82% (p=0.000 n=20+18)
Gzip 363ms ± 1% 363ms ± 1% ~ (p=0.077 n=18+20)
Gunzip 90.6ms ± 0% 90.0ms ± 1% -0.71% (p=0.000 n=17+18)
HTTPClientServer 51.5µs ± 1% 50.8µs ± 1% -1.32% (p=0.000 n=18+18)
JSONEncode 17.0ms ± 0% 17.1ms ± 0% +0.40% (p=0.000 n=18+17)
JSONDecode 61.8ms ± 0% 63.8ms ± 1% +3.11% (p=0.000 n=18+17)
Mandelbrot200 3.84ms ± 0% 3.84ms ± 1% ~ (p=0.583 n=19+19)
GoParse 3.71ms ± 1% 3.72ms ± 1% ~ (p=0.159 n=18+19)
RegexpMatchEasy0_32 100ns ± 0% 100ns ± 1% -0.19% (p=0.033 n=17+19)
RegexpMatchEasy0_1K 342ns ± 1% 331ns ± 0% -3.41% (p=0.000 n=19+19)
RegexpMatchEasy1_32 82.5ns ± 0% 81.7ns ± 0% -0.98% (p=0.000 n=18+18)
RegexpMatchEasy1_1K 505ns ± 0% 494ns ± 1% -2.16% (p=0.000 n=18+18)
RegexpMatchMedium_32 137ns ± 1% 137ns ± 1% -0.24% (p=0.048 n=20+18)
RegexpMatchMedium_1K 41.6µs ± 0% 41.3µs ± 1% -0.57% (p=0.004 n=18+20)
RegexpMatchHard_32 2.11µs ± 0% 2.11µs ± 1% +0.20% (p=0.037 n=17+19)
RegexpMatchHard_1K 63.9µs ± 2% 63.3µs ± 0% -0.99% (p=0.000 n=20+17)
Revcomp 560ms ± 1% 522ms ± 0% -6.87% (p=0.000 n=18+16)
Template 75.0ms ± 0% 75.1ms ± 1% +0.18% (p=0.013 n=18+19)
TimeParse 358ns ± 1% 364ns ± 0% +1.74% (p=0.000 n=20+15)
TimeFormat 360ns ± 0% 372ns ± 0% +3.55% (p=0.000 n=20+18)
Change-Id: If8a9bfae6c128d15a4f405e02bcfa50129df82a2
Reviewed-on: https://go-review.googlesource.com/10314
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently there's a race between stopg scanning another G's stack and
the G reaching a preemption point and scanning its own stack. When
this race occurs, the G's stack is scanned twice. Currently this is
okay, so this race is benign.
However, we will shortly be adding stack barriers during the first
stack scan, so scanning will no longer be idempotent. To prepare for
this, this change ensures that each stack is scanned only once during
each GC phase by checking the flag that indicates that the stack has
been scanned in this phase before scanning the stack.
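A sketch of the guard, with a stand-in flag; the real code also
synchronizes on the G's status so the flag check and the scan cannot
race:

    type gScanFlag struct {
        gcscanvalid bool // stack already scanned in the current phase?
    }

    func maybeScanStack(gp *gScanFlag, scan func()) {
        if gp.gcscanvalid {
            return // already scanned once in this phase
        }
        scan()
        gp.gcscanvalid = true
    }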
Change-Id: Id9f4d5e2e5b839bc3f200ec1723a4a12dd677ab4
Reviewed-on: https://go-review.googlesource.com/10458
Reviewed-by: Rick Hudson <rlh@golang.org>
The stack barrier code will need a bookkeeping structure to keep track
of the overwritten return PCs. This commit introduces and allocates
this structure, but does not yet use the structure.
We don't want to allocate space for this structure during garbage
collection, so this commit allocates it along with the allocation of
the corresponding stack. However, we can't do a regular allocation in
newstack because mallocgc may itself grow the stack (which would lead
to a recursive allocation). Hence, this commit makes the bookkeeping
structure part of the stack allocation itself by stealing the
necessary space from the top of the stack allocation. Since the size
of this bookkeeping structure is logarithmic in the size of the stack,
this has minimal impact on stack behavior.
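A sketch of the sizing arithmetic, assuming the 1KB*2^n spacing used
for the barriers:

    // Sketch: one bookkeeping slot per barrier, and barriers sit at
    // sp+1024, sp+2048, sp+4096, ..., so the slot count is logarithmic
    // in the stack size.
    func stackBarrierSlots(stackSize uintptr) int {
        n := 0
        for spacing := uintptr(1024); spacing < stackSize; spacing *= 2 {
            n++
        }
        return n
    }

For a 1MB stack this is only about 10 slots, so stealing the space from
the top of the stack allocation costs very little.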
Change-Id: Ia14408be06aafa9ca4867f4e70bddb3fe0e96665
Reviewed-on: https://go-review.googlesource.com/10313
Reviewed-by: Russ Cox <rsc@golang.org>
This is dead code. If you want to quiesce the system, the preferred
way is to use forEachP(func(*p){}).
Change-Id: Ic7677a5dd55e3639b99e78ddeb2c71dd1dd091fa
Reviewed-on: https://go-review.googlesource.com/10267
Reviewed-by: Austin Clements <austin@google.com>
For the conversion of the heap bitmap from 4-bit to 2-bit fields,
I replaced heapBitsSetType with the dumbest thing that could possibly work:
two atomic operations (atomicand8+atomicor8) per 2-bit field.
This CL replaces that code with a proper implementation that
avoids the atomics whenever possible. Benchmarks vs base CL
(before the conversion to 2-bit heap bitmap) and vs Go 1.4 below.
Compared to Go 1.4, SetTypePtr (a 1-pointer allocation)
is 10ns slower because a race against the concurrent GC requires the
use of an atomicor8 that used to be an ordinary write. This slowdown
was present even in the base CL.
Compared to both Go 1.4 and base, SetTypeNode8 (a 10-word allocation)
is 10ns slower because it too needs a new atomic, because with the
denser representation, the byte on the end of the allocation is now shared
with the object next to it; this was not true with the 4-bit representation.
Excluding these two (fundamental) slowdowns due to the use of atomics,
the new code is noticeably faster than both Go 1.4 and the base CL.
The next CL will reintroduce the ``typeDead'' optimization.
Stats are from 5 runs on a MacBookPro10,2 (late 2012 Core i5).
Compared to base CL (** = new atomic)
name old mean new mean delta
SetTypePtr 14.1ns × (0.99,1.02) 14.7ns × (0.93,1.10) ~ (p=0.175)
SetTypePtr8 18.4ns × (1.00,1.01) 18.6ns × (0.81,1.21) ~ (p=0.866)
SetTypePtr16 28.7ns × (1.00,1.00) 22.4ns × (0.90,1.27) -21.88% (p=0.015)
SetTypePtr32 52.3ns × (1.00,1.00) 33.8ns × (0.93,1.24) -35.37% (p=0.001)
SetTypePtr64 79.2ns × (1.00,1.00) 55.1ns × (1.00,1.01) -30.43% (p=0.000)
SetTypePtr126 118ns × (1.00,1.00) 100ns × (1.00,1.00) -15.97% (p=0.000)
SetTypePtr128 130ns × (0.92,1.19) 98ns × (1.00,1.00) -24.36% (p=0.008)
SetTypePtrSlice 726ns × (0.96,1.08) 760ns × (1.00,1.00) ~ (p=0.152)
SetTypeNode1 14.1ns × (0.94,1.15) 12.0ns × (1.00,1.01) -14.60% (p=0.020)
SetTypeNode1Slice 135ns × (0.96,1.07) 88ns × (1.00,1.00) -34.53% (p=0.000)
SetTypeNode8 20.9ns × (1.00,1.01) 32.6ns × (1.00,1.00) +55.37% (p=0.000) **
SetTypeNode8Slice 414ns × (0.99,1.02) 244ns × (1.00,1.00) -41.09% (p=0.000)
SetTypeNode64 80.0ns × (1.00,1.00) 57.4ns × (1.00,1.00) -28.23% (p=0.000)
SetTypeNode64Slice 2.15µs × (1.00,1.01) 1.56µs × (1.00,1.00) -27.43% (p=0.000)
SetTypeNode124 119ns × (0.99,1.00) 100ns × (1.00,1.00) -16.11% (p=0.000)
SetTypeNode124Slice 3.40µs × (1.00,1.00) 2.93µs × (1.00,1.00) -13.80% (p=0.000)
SetTypeNode126 120ns × (1.00,1.01) 98ns × (1.00,1.00) -18.19% (p=0.000)
SetTypeNode126Slice 3.53µs × (0.98,1.08) 3.02µs × (1.00,1.00) -14.49% (p=0.002)
SetTypeNode1024 726ns × (0.97,1.09) 740ns × (1.00,1.00) ~ (p=0.451)
SetTypeNode1024Slice 24.9µs × (0.89,1.37) 23.1µs × (1.00,1.00) ~ (p=0.476)
Compared to Go 1.4 (** = new atomic)
name old mean new mean delta
SetTypePtr 5.71ns × (0.89,1.19) 14.68ns × (0.93,1.10) +157.24% (p=0.000) **
SetTypePtr8 19.3ns × (0.96,1.10) 18.6ns × (0.81,1.21) ~ (p=0.638)
SetTypePtr16 30.7ns × (0.99,1.03) 22.4ns × (0.90,1.27) -26.88% (p=0.005)
SetTypePtr32 51.5ns × (1.00,1.00) 33.8ns × (0.93,1.24) -34.40% (p=0.001)
SetTypePtr64 83.6ns × (0.94,1.12) 55.1ns × (1.00,1.01) -34.12% (p=0.001)
SetTypePtr126 137ns × (0.87,1.26) 100ns × (1.00,1.00) -27.10% (p=0.028)
SetTypePtrSlice 865ns × (0.80,1.23) 760ns × (1.00,1.00) ~ (p=0.243)
SetTypeNode1 15.2ns × (0.88,1.12) 12.0ns × (1.00,1.01) -20.89% (p=0.014)
SetTypeNode1Slice 156ns × (0.93,1.16) 88ns × (1.00,1.00) -43.57% (p=0.001)
SetTypeNode8 23.8ns × (0.90,1.18) 32.6ns × (1.00,1.00) +36.76% (p=0.003) **
SetTypeNode8Slice 502ns × (0.92,1.10) 244ns × (1.00,1.00) -51.46% (p=0.000)
SetTypeNode64 85.6ns × (0.94,1.11) 57.4ns × (1.00,1.00) -32.89% (p=0.001)
SetTypeNode64Slice 2.36µs × (0.91,1.14) 1.56µs × (1.00,1.00) -33.96% (p=0.002)
SetTypeNode124 130ns × (0.91,1.12) 100ns × (1.00,1.00) -23.49% (p=0.004)
SetTypeNode124Slice 3.81µs × (0.90,1.22) 2.93µs × (1.00,1.00) -23.09% (p=0.025)
There are fewer benchmarks vs Go 1.4 because unrolling directly
into the heap bitmap is not yet implemented, so those would not
be meaningful comparisons.
These benchmarks were not present in Go 1.4 as distributed.
The backport to Go 1.4 is in github.com/rsc/go's go14bench branch,
commit 71d5ee5.
Change-Id: I95ed05a22bf484b0fc9efad549279e766c98d2b6
Reviewed-on: https://go-review.googlesource.com/9704
Reviewed-by: Rick Hudson <rlh@golang.org>
Previous CLs changed the representation of the non-heap type bitmaps
to be 1-bit bitmaps (pointer or not). Before this CL, the heap bitmap
stored a 2-bit type for each word and a mark bit and checkmark bit
for the first word of the object. (There used to be additional per-word bits.)
Reduce the heap bitmap to 2 bits per word, with one bit dedicated to
pointer-or-not and the other used for mark, checkmark, and "keep
scanning forward to find pointers in this object." See comments for details.
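An illustrative sketch of the two roles each word's entry plays (this
shows the meaning of the bits, not the runtime's exact packing in
bytes):

    const (
        bitPointer = 1 << 0 // this word holds a pointer
        bitScan    = 1 << 1 // first word: marked/checkmarked; later
                            // words: keep scanning forward for pointers
    )

    func entryBits(isPointer, markedOrScan bool) uint8 {
        var e uint8
        if isPointer {
            e |= bitPointer
        }
        if markedOrScan {
            e |= bitScan
        }
        return e
    }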
This CL replaces heapBitsSetType with very slow but obviously correct code.
A followup CL will optimize it. (Spoiler: the new code is faster than Go 1.4 was.)
Change-Id: I999577a133f3cfecacebdec9cdc3573c235c7fb9
Reviewed-on: https://go-review.googlesource.com/9703
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
It was testing the mark bits on what roots pointed at,
but not the remainder of the live heap, because in
CL 2991 I accidentally inverted this check during
refactoring.
The next CL will turn it back off by default again,
but I want one run on the builders with the full
checkmark checks.
Change-Id: Ic166458cea25c0a56e5387fc527cb166ff2e5ada
Reviewed-on: https://go-review.googlesource.com/9824
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This tracks the number of scannable bytes in the allocated heap. That
is, bytes that the garbage collector must scan before reaching the
last pointer field in each object.
This will be used to compute a more robust estimate of the GC scan
work.
Change-Id: I1eecd45ef9cdd65b69d2afb5db5da885c80086bb
Reviewed-on: https://go-review.googlesource.com/9695
Reviewed-by: Russ Cox <rsc@golang.org>
The garbage collector predicts how much "scan work" must be done in a
cycle to determine how much work should be done by mutators when they
allocate. Most code doesn't care what units the scan work is in: it
simply knows that a certain amount of scan work has to be done in the
cycle. Currently, the GC uses the number of pointer slots scanned as
the scan work on the theory that this is the bulk of the time spent in
the garbage collector and hence reflects real CPU resource usage.
However, this metric is difficult to estimate at the beginning of a
cycle.
Switch to counting the total number of bytes scanned, including both
pointer and scalar slots. This is still less than the total marked
heap since it omits no-scan objects and no-scan tails of objects. This
metric may not reflect absolute performance as well as the count of
scanned pointer slots (though it still takes time to scan scalar
fields), but it will be much easier to estimate robustly, which is
more important.
Change-Id: Ie3a5eeeb0384a1ca566f61b2f11e9ff3a75ca121
Reviewed-on: https://go-review.googlesource.com/9694
Reviewed-by: Russ Cox <rsc@golang.org>
(1) Count pointer-free objects found during scanning roots
as marked bytes, by not zeroing the mark total after scanning roots.
(2) Don't count the bytes for the roots themselves, by not adding
them to the mark total in scanblock (the zeroing removed by (1)
was aimed at that add but hit more than that).
Combined, (1) and (2) fix the calculation of the marked heap size.
This makes the GC trigger much less often in the Go 1 benchmarks,
which have a global []byte pointing at 256 MB of data.
That 256 MB allocation was not being included in the heap size
in the current code, but was included in Go 1.4.
This is the source of much of the relative slowdown in that directory.
(3) Count the bytes for the roots as scanned work, by not zeroing
the scan total after scanning roots. There is no strict justification
for this, and it probably doesn't matter much either way,
but it was always combined with another buggy zeroing
(removed in (1)), so guilty by association.
Austin noticed this.
name old mean new mean delta
BenchmarkBinaryTree17 13.1s × (0.97,1.03) 5.9s × (0.97,1.05) -55.19% (p=0.000)
BenchmarkFannkuch11 4.35s × (0.99,1.01) 4.37s × (1.00,1.01) +0.47% (p=0.032)
BenchmarkFmtFprintfEmpty 84.6ns × (0.95,1.14) 85.7ns × (0.94,1.05) ~ (p=0.521)
BenchmarkFmtFprintfString 320ns × (0.95,1.06) 283ns × (0.99,1.02) -11.48% (p=0.000)
BenchmarkFmtFprintfInt 311ns × (0.98,1.03) 288ns × (0.99,1.02) -7.26% (p=0.000)
BenchmarkFmtFprintfIntInt 554ns × (0.96,1.05) 478ns × (0.99,1.02) -13.70% (p=0.000)
BenchmarkFmtFprintfPrefixedInt 434ns × (0.96,1.06) 393ns × (0.98,1.04) -9.60% (p=0.000)
BenchmarkFmtFprintfFloat 620ns × (0.99,1.03) 584ns × (0.99,1.01) -5.73% (p=0.000)
BenchmarkFmtManyArgs 2.19µs × (0.98,1.03) 1.94µs × (0.99,1.01) -11.62% (p=0.000)
BenchmarkGobDecode 21.2ms × (0.97,1.06) 15.2ms × (0.99,1.01) -28.17% (p=0.000)
BenchmarkGobEncode 18.1ms × (0.94,1.06) 11.8ms × (0.99,1.01) -35.00% (p=0.000)
BenchmarkGzip 650ms × (0.98,1.01) 649ms × (0.99,1.02) ~ (p=0.802)
BenchmarkGunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.438)
BenchmarkHTTPClientServer 110µs × (0.98,1.04) 101µs × (0.98,1.02) -8.79% (p=0.000)
BenchmarkJSONEncode 40.3ms × (0.97,1.03) 31.8ms × (0.98,1.03) -20.92% (p=0.000)
BenchmarkJSONDecode 119ms × (0.97,1.02) 108ms × (0.99,1.02) -9.15% (p=0.000)
BenchmarkMandelbrot200 6.03ms × (1.00,1.01) 6.03ms × (0.99,1.01) ~ (p=0.750)
BenchmarkGoParse 8.58ms × (0.89,1.10) 6.80ms × (1.00,1.00) -20.71% (p=0.000)
BenchmarkRegexpMatchEasy0_32 162ns × (1.00,1.01) 162ns × (0.99,1.02) ~ (p=0.131)
BenchmarkRegexpMatchEasy0_1K 540ns × (0.99,1.02) 559ns × (0.99,1.02) +3.58% (p=0.000)
BenchmarkRegexpMatchEasy1_32 139ns × (0.98,1.04) 139ns × (1.00,1.00) ~ (p=0.466)
BenchmarkRegexpMatchEasy1_1K 889ns × (0.99,1.01) 885ns × (0.99,1.01) -0.50% (p=0.022)
BenchmarkRegexpMatchMedium_32 252ns × (0.99,1.02) 252ns × (0.99,1.01) ~ (p=0.469)
BenchmarkRegexpMatchMedium_1K 72.9µs × (0.99,1.01) 73.6µs × (0.99,1.03) ~ (p=0.168)
BenchmarkRegexpMatchHard_32 3.87µs × (1.00,1.01) 3.86µs × (1.00,1.00) ~ (p=0.055)
BenchmarkRegexpMatchHard_1K 118µs × (0.99,1.01) 117µs × (0.99,1.00) ~ (p=0.133)
BenchmarkRevcomp 995ms × (0.94,1.10) 949ms × (0.99,1.01) -4.64% (p=0.000)
BenchmarkTemplate 141ms × (0.97,1.02) 127ms × (0.99,1.01) -10.00% (p=0.000)
BenchmarkTimeParse 641ns × (0.99,1.01) 623ns × (0.99,1.01) -2.79% (p=0.000)
BenchmarkTimeFormat 729ns × (0.98,1.03) 679ns × (0.99,1.00) -6.93% (p=0.000)
Change-Id: I839bd7356630d18377989a0748763414e15ed057
Reviewed-on: https://go-review.googlesource.com/9602
Reviewed-by: Austin Clements <austin@google.com>
gcDumpObject is used to print the source and destination objects when
checkmark finds a missing mark. However, gcDumpObject currently assumes
the given pointer will point to a heap object. This is not true of the
source object during root marking and may not even be true of the
destination object in the limited situations where the heap points
back into the stack.
If the pointer isn't a heap object, gcDumpObject will attempt an
out-of-bounds access to h_spans. This will cause a panicslice, which
will attempt to construct a useful panic message. This will cause a
string allocation, which will lead mallocgc to panic because the GC is
in mark termination (checkmark only happens during mark termination).
Fix this by checking that the pointer points into the heap arena
before attempting to use it as an arena pointer.
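A sketch of the guard, with stand-in names for the arena bounds:

    // Sketch: only treat p as a heap object if it lies inside the arena;
    // otherwise bail out instead of computing an out-of-range h_spans
    // index. arenaStart and arenaUsed stand in for the runtime's bounds.
    func dumpIfHeapObject(p, arenaStart, arenaUsed uintptr, dump func(uintptr)) {
        if p < arenaStart || p >= arenaUsed {
            return // not a heap pointer (e.g. a stack root); nothing to look up
        }
        dump(p)
    }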
Change-Id: I09da600c380d4773f1f8f38e45b82cb229ea6382
Reviewed-on: https://go-review.googlesource.com/9498
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, we use a full stop-the-world around enabling write
barriers. This is to ensure that all Gs have enabled write barriers
before any blackening occurs (either in gcBgMarkWorker() or in
gcAssistAlloc()).
However, there's no need to bring the whole world to a synchronous
stop to ensure this. This change replaces the STW with a ragged
barrier that ensures each P has individually observed that write
barriers should be enabled before GC performs any blackening.
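A compile-only sketch of the sequence, with stand-ins for the runtime
pieces:

    // Stand-ins so the sketch is self-contained; the real forEachP and
    // write-barrier flag live in the runtime.
    type p struct{}

    var writeBarrierEnabled bool

    func forEachP(fn func(*p)) { /* runs fn on every P at a safe point */ }

    // Flip the flag, then use the ragged barrier to wait until every P
    // has individually passed a safe point, at which point it is
    // guaranteed to observe the flag before any blackening begins.
    func enableWriteBarriers() {
        writeBarrierEnabled = true
        forEachP(func(_ *p) {
            // Reaching here means this P has acknowledged the new setting.
        })
    }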
Change-Id: If2f129a6a55bd8bdd4308067af2b739f3fb41955
Reviewed-on: https://go-review.googlesource.com/8207
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, each M has a cache of the most recently used *workbuf. This
is used primarily by the write barrier so it doesn't have to access
the global workbuf lists on every write barrier. It's also used by
stack scanning because it's convenient.
This cache is important for write barrier performance, but this
particular approach has several downsides. It's faster than no cache,
but far from optimal (as the benchmarks below show). It's complex:
access to the cache is sprinkled through most of the workbuf list
operations and it requires special care to transform into and back out
of the gcWork cache that's actually used for scanning and marking. It
requires atomic exchanges to take ownership of the cached workbuf and
to return it to the M's cache even though it's almost always used by
only the current M. Since it's per-M, flushing these caches is O(# of
Ms), which may be high. And it has some significant subtleties: for
example, in general the cache shouldn't be used after the
harvestwbufs() in mark termination because it could hide work from
mark termination, but stack scanning can happen after this and *will*
use the cache (but it turns out this is okay because it will always be
followed by a getfull(), which drains the cache).
This change replaces this cache with a per-P gcWork object. This
gcWork cache can be used directly by scanning and marking (as long as
preemption is disabled, which is a general requirement of gcWork).
Since it's per-P, it doesn't require synchronization, which simplifies
things and means the only atomic operations in the write barrier are
occasionally fetching new work buffers and setting a mark bit if the
object isn't already marked. This cache can be flushed in O(# of Ps),
which is generally small. It follows a simple flushing rule: the cache
can be used during any phase, but during mark termination it must be
flushed before allowing preemption. This also makes the dispose during
mutator assist no longer necessary, which eliminates the vast majority
of gcWork dispose calls and reduces contention on the global workbuf
lists. And it's a lot faster on some benchmarks:
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 11963668673 11206112763 -6.33%
BenchmarkFannkuch11 2643217136 2649182499 +0.23%
BenchmarkFmtFprintfEmpty 70.4 70.2 -0.28%
BenchmarkFmtFprintfString 364 307 -15.66%
BenchmarkFmtFprintfInt 317 282 -11.04%
BenchmarkFmtFprintfIntInt 512 483 -5.66%
BenchmarkFmtFprintfPrefixedInt 404 380 -5.94%
BenchmarkFmtFprintfFloat 521 479 -8.06%
BenchmarkFmtManyArgs 2164 1894 -12.48%
BenchmarkGobDecode 30366146 22429593 -26.14%
BenchmarkGobEncode 29867472 26663152 -10.73%
BenchmarkGzip 391236616 396779490 +1.42%
BenchmarkGunzip 96639491 96297024 -0.35%
BenchmarkHTTPClientServer 100110 70763 -29.31%
BenchmarkJSONEncode 51866051 52511382 +1.24%
BenchmarkJSONDecode 103813138 86094963 -17.07%
BenchmarkMandelbrot200 4121834 4120886 -0.02%
BenchmarkGoParse 16472789 5879949 -64.31%
BenchmarkRegexpMatchEasy0_32 140 140 +0.00%
BenchmarkRegexpMatchEasy0_1K 394 394 +0.00%
BenchmarkRegexpMatchEasy1_32 120 120 +0.00%
BenchmarkRegexpMatchEasy1_1K 621 614 -1.13%
BenchmarkRegexpMatchMedium_32 209 202 -3.35%
BenchmarkRegexpMatchMedium_1K 54889 55175 +0.52%
BenchmarkRegexpMatchHard_32 2682 2675 -0.26%
BenchmarkRegexpMatchHard_1K 79383 79524 +0.18%
BenchmarkRevcomp 584116718 584595320 +0.08%
BenchmarkTemplate 125400565 109620196 -12.58%
BenchmarkTimeParse 386 387 +0.26%
BenchmarkTimeFormat 580 447 -22.93%
(Best out of 10 runs. The delta of averages is similar.)
This also puts us in a good position to flush these caches when
nearing the end of concurrent marking, which will let us increase the
size of the work buffers while still controlling mark termination
pause time.
Change-Id: I2dd94c8517a19297a98ec280203cccaa58792522
Reviewed-on: https://go-review.googlesource.com/9178
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Currently, the main GC goroutine sleeps on a note during concurrent
mark, and the first background mark worker or assist to finish marking
wakes up that note to let the main goroutine proceed into mark
termination. Unfortunately, the latency of this wakeup can be quite
high, since the GC goroutine will typically have lost its P while in
the futex sleep, meaning it will be placed on the global run queue and
will wait there until some P is kind enough to pick it up. This delay
gives the mutator more time to allocate and create floating garbage,
growing the heap unnecessarily. Worse, it's likely that background
marking has stopped at this point (unless GOMAXPROCS>4), so anything
that's allocated and published to the heap during this window will
have to be scanned during mark termination while the world is stopped.
This change replaces the note sleep/wakeup with a gopark/ready
scheme. This keeps the wakeup inside the Go scheduler and lets the
garbage collector take advantage of the new scheduler semantics that
run the ready()d goroutine immediately when the ready()ing goroutine
sleeps.
For the json benchmark from x/benchmarks with GOMAXPROCS=4, this
reduces the delay in waking up the GC goroutine and entering mark
termination once concurrent marking is done from ~100ms to typically
<100µs.
Change-Id: Ib11f8b581b8914f2d68e0094f121e49bac3bb384
Reviewed-on: https://go-review.googlesource.com/9291
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
To achieve a 2% improvement in the garbage benchmark this CL removes
an unneeded assert and avoids one hbits.next() call per object
being scanned.
Change-Id: Ibd542d01e9c23eace42228886f9edc488354df0d
Reviewed-on: https://go-review.googlesource.com/9244
Reviewed-by: Austin Clements <austin@google.com>
Currently, the concurrent mark phase is performed by the main GC
goroutine. Prior to the previous commit enabling preemption, this
caused marking to always consume 1/GOMAXPROCS of the available CPU
time. If GOMAXPROCS=1, this meant background GC would consume 100% of
the CPU (effectively a STW). If GOMAXPROCS>4, background GC would use
less than the goal of 25%. If GOMAXPROCS=4, background GC would use
the goal 25%, but if the mutator wasn't using the remaining 75%,
background marking wouldn't take advantage of the idle time. Enabling
preemption in the previous commit made GC miss CPU targets in
completely different ways, but set us up to bring everything back in
line.
This change replaces the fixed GC goroutine with per-P background mark
goroutines. Once started, these goroutines don't go in the standard
run queues; instead, they are scheduled specially such that the time
spent in mutator assists and the background mark goroutines totals 25%
of the CPU time available to the program. Furthermore, this lets
background marking take advantage of idle Ps, which significantly
boosts GC performance for applications that under-utilize the CPU.
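A sketch of how a 25% budget can be split across Ps (illustrative
arithmetic, not the scheduler code):

    // Sketch: with GOMAXPROCS=n and a 25% utilization goal, some Ps run
    // a mark worker full time ("dedicated") and the remainder of the
    // budget is covered by a worker that runs only part of the time.
    func markWorkerPlan(gomaxprocs int) (dedicated int, fractional float64) {
        const utilizationGoal = 0.25
        budget := float64(gomaxprocs) * utilizationGoal
        dedicated = int(budget)
        fractional = budget - float64(dedicated)
        return
    }

With GOMAXPROCS=4 this gives one dedicated worker and no fractional
worker; with GOMAXPROCS=6 it gives one dedicated worker plus a
half-time fractional worker, and idle Ps can pick up extra marking on
top of that.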
This also requires changing how time is reported for gctrace, so this
change splits the concurrent mark CPU time into assist/background/idle
scanning.
This also requires increasing the size of the StackRecord slice used
in a GoroutineProfile test.
Change-Id: I0936ff907d2cee6cb687a208f2df47e8988e3157
Reviewed-on: https://go-review.googlesource.com/8850
Reviewed-by: Rick Hudson <rlh@golang.org>
This time is tracked per P and periodically flushed to the global
controller state. This will be used to compute mutator assist
utilization in order to schedule background GC work.
Change-Id: Ib94f90903d426a02cf488bf0e2ef67a068eb3eec
Reviewed-on: https://go-review.googlesource.com/8837
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, mutator allocation periodically assists the garbage
collector by performing a small, fixed amount of scanning work.
However, to control heap growth, mutators need to perform scanning
work *proportional* to their allocation rate.
This change implements proportional mutator assists. This uses the
scan work estimate computed by the garbage collector at the beginning
of each cycle to compute how much scan work must be performed per
allocation byte to complete the estimated scan work by the time the
heap reaches the goal size. When allocation triggers an assist, it
uses this ratio and the amount allocated since the last assist to
compute the assist work, then attempts to steal as much of this work
as possible from the background collector's credit, and then performs
any remaining scan work itself.
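A sketch of the per-allocation arithmetic; names are illustrative and
the real controller also clamps the values and steals from background
credit first, as described above:

    // Sketch: scan work a mutator owes for the bytes it has allocated
    // since its last assist, so that estScanWork is finished by the time
    // the live heap grows from heapLive to heapGoal.
    func assistWork(estScanWork, heapLive, heapGoal, bytesAllocated int64) int64 {
        if heapGoal <= heapLive {
            return estScanWork // already at the goal: do as much as possible
        }
        ratio := float64(estScanWork) / float64(heapGoal-heapLive)
        return int64(ratio * float64(bytesAllocated))
    }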
Change-Id: I98b2078147a60d01d6228b99afd414ef857e4fba
Reviewed-on: https://go-review.googlesource.com/8836
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, the "n" in gcDrainN is in terms of objects to scan. This is
used by gchelpwork to perform a limited amount of work on allocation,
but is a pretty arbitrary way to bound this amount of work since the
number of objects has little relation to how long they take to scan.
Modify gcDrainN to perform a fixed amount of scan work instead. For
now, gchelpwork still performs a fairly arbitrary amount of scan work,
but at least this is much more closely related to how long the work
will take. Shortly, we'll use this to precisely control the scan work
performed by mutator assists during allocation to achieve the heap
size goal.
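A sketch of a drain loop bounded by scan work rather than object
count; scanOne is a stand-in that scans one object and reports the
work units it cost:

    func drainN(scanOne func() (work int64, ok bool), n int64) int64 {
        var done int64
        for done < n {
            work, ok := scanOne()
            if !ok {
                break // no more work available
            }
            done += work
        }
        return done
    }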
Change-Id: I3cd07fe0516304298a0af188d0ccdf621d4651cc
Reviewed-on: https://go-review.googlesource.com/8835
Reviewed-by: Rick Hudson <rlh@golang.org>
This tracks scan work done by background GC in a global pool. Mutator
assists will draw on this credit to avoid doing work when background
GC is staying ahead.
Unlike the other GC controller tracking variables, this will be both
written and read throughout the cycle. Hence, we can't arbitrarily
delay updates like we can for scan work and bytes marked. However, we
still want to minimize contention, so this global credit pool is
allowed some error from the "true" amount of credit. Background GC
accumulates credit locally up to a limit and only then flushes to the
global pool. Similarly, mutator assists will draw from the credit pool
in batches.
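A compile-only sketch of the batching, using sync/atomic in place of
the runtime's own atomics; the names and the flush threshold are
illustrative:

    import "sync/atomic"

    var bgScanCredit int64 // shared credit pool; may lag the true total

    const creditFlushLimit = 1 << 10 // flush once this much builds up locally

    type bgWorker struct{ localCredit int64 }

    // Accumulate credit locally and only occasionally touch the shared
    // pool, keeping contention low.
    func (w *bgWorker) didScanWork(n int64) {
        w.localCredit += n
        if w.localCredit >= creditFlushLimit {
            atomic.AddInt64(&bgScanCredit, w.localCredit)
            w.localCredit = 0
        }
    }

    // Let a mutator assist draw up to want units from the pool,
    // returning how much it actually got.
    func stealCredit(want int64) int64 {
        after := atomic.AddInt64(&bgScanCredit, -want)
        if after < 0 {
            atomic.AddInt64(&bgScanCredit, -after) // give the shortfall back
            return want + after
        }
        return want
    }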
Change-Id: I1aa4fc604b63bf53d1ee2a967694dffdfc3e255e
Reviewed-on: https://go-review.googlesource.com/8834
Reviewed-by: Rick Hudson <rlh@golang.org>
This tracks the amount of scan work in terms of scanned pointers
during the concurrent mark phase. We'll use this information to
estimate scan work for the next cycle.
Currently this work is accumulated in the gcWork counter, and dispose
atomically aggregates it into a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.
Change-Id: Iac0364c466ee35fab781dbbbe7970a5f3c4e1fc1
Reviewed-on: https://go-review.googlesource.com/8832
Reviewed-by: Rick Hudson <rlh@golang.org>
'themoduledata' doesn't really make sense now that we support multiple
moduledata objects.
Change-Id: I8263045d8f62a42cb523502b37289b0fba054f62
Reviewed-on: https://go-review.googlesource.com/8521
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This changes all the places that consult themoduledata to consult a
linked list of moduledata objects, as will be necessary for
-linkshared to work.
Obviously, as there is as yet no way of adding moduledata objects to
this list, all this change achieves right now is wasting a few
instructions here and there.
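A sketch of the pattern; the struct fields are placeholders and only
the linked-list walk matters here:

    type moduledataSketch struct {
        minpc, maxpc uintptr // PC range covered by this object's code
        next         *moduledataSketch
    }

    var firstModule *moduledataSketch // head of the list, one entry per object

    func findModuleForPC(pc uintptr) *moduledataSketch {
        for m := firstModule; m != nil; m = m.next {
            if m.minpc <= pc && pc < m.maxpc {
                return m
            }
        }
        return nil
    }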
Change-Id: I397af7f60d0849b76aaccedf72238fe664867051
Reviewed-on: https://go-review.googlesource.com/8231
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This tracks the number of heap bytes marked by a GC cycle. We'll use
this information to precisely trigger the next GC cycle.
Currently this count is accumulated in the gcWork counter, and dispose
atomically aggregates it into a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.
Change-Id: I1bc377cb2e802ef61c2968602b63146d52e7f5db
Reviewed-on: https://go-review.googlesource.com/8388
Reviewed-by: Russ Cox <rsc@golang.org>
In preparation for being able to run a go program that has code
in several objects, this changes from having several linker
symbols used by the runtime into having one linker symbol that
points at a structure containing the needed data. Multiple
object support will construct a linked list of such structures.
A follow-up will initialize the slices in the themoduledata
structure directly from the linker, but I was aiming for a minimal
diff for now.
Change-Id: I613cce35309801cf265a1d5ae5aaca8d689c5cbf
Reviewed-on: https://go-review.googlesource.com/7441
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently, gcDrainN is documented saying that it must be run on the
system stack. In fact, the problem and solution here are somewhat
subtler. First, it doesn't have to happen on the system stack, it just
has to be non-stoppable (that is, non-preemptible). Second, this isn't
specific to gcDrainN (though gcDrainN is perhaps the most surprising
instance); it's general to anything that uses the gcWork structure.
Move the comment to gcWork and generalize it.
Change-Id: I5277b5abb070e47f8d783bc15a310b379c6adc22
Reviewed-on: https://go-review.googlesource.com/8247
Reviewed-by: Rick Hudson <rlh@golang.org>
gcDrain used to be passed a *workbuf to start draining from, but now
it takes a gcWork, which hides whether or not there's an initial
workbuf. Update the comment to match this.
Change-Id: I976b58e5bfebc451cfd4fa75e770113067b5cc07
Reviewed-on: https://go-review.googlesource.com/8246
Reviewed-by: Rick Hudson <rlh@golang.org>
The barrier in gcDrain does not account for concurrent gcDrainNs
happening in gchelpwork, so it can actually return while there is
still work being done. It turns out this is okay, but for subtle
reasons involving gcDrainN always being run on the system
stack. Document these reasons.
Change-Id: Ib07b3753cc4e2b54533ab3081a359cbd1c3c08fb
Reviewed-on: https://go-review.googlesource.com/7736
Reviewed-by: Rick Hudson <rlh@golang.org>
The distinction between gcWorkProducer and gcWork (producer and
consumer) is not serving us as originally intended, so merge these
into just gcWork.
The original intent was to replace the currentwbuf cache with a
gcWorkProducer. However, with gchelpwork (aka mutator assists),
mutators can both produce and consume work, so it will make more sense
to cache a whole gcWork.
Change-Id: I6e633e96db7cb23a64fbadbfc4607e3ad32bcfb3
Reviewed-on: https://go-review.googlesource.com/7733
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently markroot fetches the wbuf to fill from the per-M wbuf
cache. The wbuf cache is primarily meant for the write barrier because
it produces very little work on each call. There's little point to
using the cache in mark root, since each call to markroot is likely to
produce a large amount of work (so the slight win on getting it from
the cache instead of from the central wbuf lists doesn't matter), and
markroot does not dispose the wbuf back to the cache (so most markroot
calls won't get anything from the wbuf cache anyway).
Instead, just get the wbuf from the central wbuf lists like other work
producers. This will simplify later changes.
Change-Id: I07a18a4335a41e266a6d70aa3a0911a40babce23
Reviewed-on: https://go-review.googlesource.com/7732
Reviewed-by: Rick Hudson <rlh@golang.org>
When checkmark fails, greyobject dumps both the object that pointed to
the unmarked object and the unmarked object. This code cluttered up
greyobject, was copy-pasted for the two objects, and the copy for
dumping the unmarked object was not entirely correct.
Extract object dumping out to a new function. This declutters
greyobject and fixes the bugs in dumping the unmarked object. The new
function is slightly cleaned up from the original code to have more
natural control flow and shows a marker on the field in the base
object that points to the unmarked object to make it easy to find.
Change-Id: Ib51318a943f50b0b99995f0941d03ee8876b9fcf
Reviewed-on: https://go-review.googlesource.com/7506
Reviewed-by: Rick Hudson <rlh@golang.org>
scanobject no longer returns the new wbuf.
Change-Id: I0da335ae5cd7ef7ea0e0fa965cf0e9f3a650d0e6
Reviewed-on: https://go-review.googlesource.com/7505
Reviewed-by: Rick Hudson <rlh@golang.org>
Everything has moved to Go, but comments still refer to .c/.h files.
Fix all of those up, at least for these three directories.
Fixes #10138.
Change-Id: Ie5efe89b247841e0b3f82aac5256b2c606ef67dc
Reviewed-on: https://go-review.googlesource.com/7431
Reviewed-by: Russ Cox <rsc@golang.org>
This is an experiment to see if removing the boundary bit logic will
lead to fewer cache misses and improved performance. Instead of using
boundary bits we use the span information to get element size and use
some bit whacking to get the boundary without having to touch the
random heap bits which cause cache misses.
Furthermore once the boundary bit is removed we can either use that
bit for a simpler checkmark routine or we can reduce the number of
bits in the GC bitmap to 2 bits per pointer-sized word. For example
the 2 bits at the boundary can be used for marking and pointer/scalar
differentiation. Since we don't need the mark bit except at the
boundary nibble of the object other nibbles can use this bit
as a noscan bit to indicate that there are no more pointers in
the object.
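A sketch of the boundary computation from span metadata; the division
here is presumably what the "bit whacking" above replaces for the
common size classes:

    // Sketch: given the span's start address and element size, an
    // interior pointer's object base is a rounded-down offset, with no
    // per-word boundary bit needed in the heap bitmap.
    type spanSketch struct {
        start    uintptr // address of the first object in the span
        elemSize uintptr // size of every object in this span
    }

    func objectBase(s *spanSketch, p uintptr) uintptr {
        offset := p - s.start
        return s.start + (offset/s.elemSize)*s.elemSize
    }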
Currently the changes included in this CL slow down the garbage
benchmark. With the boundary bits, garbage gives 5.78 and without
them (this CL) it gives 5.88, which is a 2% slowdown.
Change-Id: Id68f831ad668176f7dc9f7b57b339e4ebb6dc4c2
Reviewed-on: https://go-review.googlesource.com/6665
Reviewed-by: Austin Clements <austin@google.com>
Previously, the typeDead check in greyobject was under a separate
!useCheckmark conditional. Put it with the rest of the !useCheckmark
code. Also move a comment about atomic update of the marked bit to
where we actually do that update now.
Change-Id: Ief5f16401a25739ad57d959607b8d81ffe0bc211
Reviewed-on: https://go-review.googlesource.com/6271
Reviewed-by: Rick Hudson <rlh@golang.org>
Previously, we had three loops in the garbage collector that all
cleared the per-G GC flags. Consolidate these into one function.
This one function is designed to work in a concurrent setting. As a
result, it's slightly more expensive than the loops it replaces during
STW phases, but these happen at most twice per GC.
Change-Id: Id1ec0074fd58865eb0112b8a0547b267802d0df1
Reviewed-on: https://go-review.googlesource.com/5881
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This is a nice split but more importantly it provides a better
way to fit the checkmark phase into the sequencing.
Also factor out common span copying into gcSpanCopy.
Change-Id: Ia058644974e4ed4ac3cf4b017a3446eb2284d053
Reviewed-on: https://go-review.googlesource.com/5333
Reviewed-by: Austin Clements <austin@google.com>
Move code from malloc1.go, malloc2.go, mem.go, mgc0.go into
appropriate locations.
Factor mgc.go into mgc.go, mgcmark.go, mgcsweep.go, mstats.go.
A lot of this code was in certain files because the right place was in
a C file but it was written in Go, or vice versa. This is one step toward
making things actually well-organized again.
Change-Id: I6741deb88a7cfb1c17ffe0bcca3989e10207968f
Reviewed-on: https://go-review.googlesource.com/5300
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>