The following performance improvements have been made to the
low-level atomic functions for ppc64le & ppc64:
- For those cases containing a lwarx and stwcx (or other sizes):
sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
The sync is moved before (outside) the lwarx/stwcx loop, and the
sync after is removed, so it becomes:
sync, lwarx, maybe something, stwcx, loop to lwarx, isync
- For the Or8 and And8, the shifting and manipulation of the
address to the word aligned version were removed and the
instructions were changed to use lbarx, stbcx instead of
register shifting, xor, then lwarx, stwcx.
- New instructions LWSYNC, LBAR, STBCC were tested and added.
runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode
instead of the WORD encoding.
Fixes #15469
Ran some of the benchmarks in the runtime and sync directories.
Some results varied from run to run but the trend was improvement
based on best times (ns/op) for base and new:
runtime.test:
BenchmarkChanNonblocking-128 0.88 0.89 +1.14%
BenchmarkChanUncontended-128 569 511 -10.19%
BenchmarkChanContended-128 63110 53231 -15.65%
BenchmarkChanSync-128 691 598 -13.46%
BenchmarkChanSyncWork-128 11355 11649 +2.59%
BenchmarkChanProdCons0-128 2402 2090 -12.99%
BenchmarkChanProdCons10-128 1348 1363 +1.11%
BenchmarkChanProdCons100-128 1002 746 -25.55%
BenchmarkChanProdConsWork0-128 2554 2720 +6.50%
BenchmarkChanProdConsWork10-128 1909 1804 -5.50%
BenchmarkChanProdConsWork100-128 1624 1580 -2.71%
BenchmarkChanCreation-128 237 212 -10.55%
BenchmarkChanSem-128 705 667 -5.39%
BenchmarkChanPopular-128 5081190 4497566 -11.49%
BenchmarkCreateGoroutines-128 532 473 -11.09%
BenchmarkCreateGoroutinesParallel-128 35.0 34.7 -0.86%
BenchmarkCreateGoroutinesCapture-128 4923 4200 -14.69%
sync.test:
BenchmarkUncontendedSemaphore-128 112 94.2 -15.89%
BenchmarkContendedSemaphore-128 133 128 -3.76%
BenchmarkMutexUncontended-128 1.90 1.67 -12.11%
BenchmarkMutex-128 353 310 -12.18%
BenchmarkMutexSlack-128 304 283 -6.91%
BenchmarkMutexWork-128 554 541 -2.35%
BenchmarkMutexWorkSlack-128 567 556 -1.94%
BenchmarkMutexNoSpin-128 275 242 -12.00%
BenchmarkMutexSpin-128 1129 1030 -8.77%
BenchmarkOnce-128 1.08 0.96 -11.11%
BenchmarkPool-128 29.8 27.4 -8.05%
BenchmarkPoolOverflow-128 40564 36583 -9.81%
BenchmarkSemaUncontended-128 3.14 2.63 -16.24%
BenchmarkSemaSyntNonblock-128 1087 1069 -1.66%
BenchmarkSemaSyntBlock-128 897 893 -0.45%
BenchmarkSemaWorkNonblock-128 1034 1028 -0.58%
BenchmarkSemaWorkBlock-128 949 886 -6.64%
Change-Id: I4403fb29d3cd5254b7b1ce87a216bd11b391079e
Reviewed-on: https://go-review.googlesource.com/22549
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
The builder is too slow. This test passed on builder machines but took
15+ min.
Change-Id: Ief9d67ea47671a57e954e402751043bc1ce09451
Reviewed-on: https://go-review.googlesource.com/22798
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Since tracebackctxt.go uses //export functions, the C functions can't be
externally visible in the C comment. The code was using attributes to
work around that, but that failed on Windows.
Change-Id: If4449fd8209a8998b4f6855ea89e5db1471b2981
Reviewed-on: https://go-review.googlesource.com/22786
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
If we collected a cgo traceback when entering the SIGPROF signal
handler, record it as part of the profiling stack trace.
This serves as the promised test for https://golang.org/cl/21055 .
Change-Id: I5f60cd6cea1d9b7c3932211483a6bfab60ed21d2
Reviewed-on: https://go-review.googlesource.com/22650
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Race runtime also needs local malloc caches and currently uses
a mix of per-OS-thread and per-goroutine caches. This leads to
increased memory consumption. But more importantly, the cache of
synchronization objects is per-goroutine and we don't always
have goroutine context when freeing memory in GC. As a result,
synchronization object descriptors leak (more precisely, they
can be reused if another synchronization object is recreated
at the same address, but it does not always help). For example,
the added BenchmarkSyncLeak has effectively runaway memory
consumption (based on a real long running server).
This change updates race runtime with support for per-P contexts.
BenchmarkSyncLeak now stabilizes at ~1GB memory consumption.
Long term, this will allow us to remove race runtime dependency
on glibc (as malloc is the main cornerstone).
I've also implemented a different scheme to pass P context to the
race runtime: the scheduler notified the race runtime about the
association between G and P by calling procwire(g, p)/procunwire(g, p).
But it turned out to be very messy as we have lots of places
where the association changes (e.g. syscalls). So I dropped it
in favor of the current scheme: the race runtime asks the scheduler
about the current P.
Fixes #14533
Change-Id: Iad10d2f816a44affae1b9fed446b3580eafd8c69
Reviewed-on: https://go-review.googlesource.com/19970
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Runqempty is a critical predicate for the scheduler. If runqempty spuriously
returns true, then the scheduler can fail to schedule an arbitrary number of
runnable goroutines on idle Ps for an arbitrarily long time. With the addition
of runnext, the runqempty predicate became broken (it can spuriously return
true). Consider that runnext is not nil and the main array is empty. Runqempty
observes that the array is empty, then it is descheduled for some time.
Then the queue owner pushes another element to the queue, evicting runnext
into the array. Then the queue owner pops runnext. Then runqempty resumes
and observes that runnext is nil and returns true. But there was no point
in time when the queue was empty.
Fix runqempty predicate to not return true spuriously.
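Below is a minimal sketch of the race-free check on a toy P, assuming
illustrative field names that only loosely mirror the runtime's; the loop
re-reads runqtail so the head/tail/runnext loads form a consistent snapshot:

    package sketch

    import "sync/atomic"

    // Toy model of a P's local run queue: a ring buffer plus the runnext slot.
    type p struct {
        runqhead uint32
        runqtail uint32
        runnext  uintptr // 0 means no runnext goroutine
        runq     [256]uintptr
    }

    // runqempty reports whether pp has no runnable goroutines. Loading head,
    // tail, and runnext independently can spuriously report empty (see the
    // interleaving above), so retry until runqtail is unchanged across the
    // three loads.
    func runqempty(pp *p) bool {
        for {
            head := atomic.LoadUint32(&pp.runqhead)
            tail := atomic.LoadUint32(&pp.runqtail)
            runnext := atomic.LoadUintptr(&pp.runnext)
            if tail == atomic.LoadUint32(&pp.runqtail) {
                return head == tail && runnext == 0
            }
        }
    }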
Change-Id: Ifb7d75a699101f3ff753c4ce7c983cf08befd31e
Reviewed-on: https://go-review.googlesource.com/20858
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This updates some comments that became out of date when we moved the
mark bit out of the heap bitmap and started using the high bit for the
first word as a scan/dead bit.
Change-Id: I4a572d16db6114cadff006825466c1f18359f2db
Reviewed-on: https://go-review.googlesource.com/22662
Reviewed-by: Rick Hudson <rlh@golang.org>
MIPS N64 ABI passes arguments in registers R4-R11, return value in R2.
R16-R23, R28, R30 and F24-F31 are callee-save. gcc PIC code expects
to be called with an indirect call through R25.
Change-Id: I24f582b4b58e1891ba9fd606509990f95cca8051
Reviewed-on: https://go-review.googlesource.com/19805
Reviewed-by: Minux Ma <minux@golang.org>
The SB register (R28) is introduced for accessing external addresses with shorter
instruction sequences. It is loaded at entry points. External data within
2G of SB can be accessed this way.
cmd/internal/obj: relocation R_ADDRMIPS is split into two relocations,
R_ADDRMIPS and R_ADDRMIPSU, handling the low 16 bits and the "upper" 16
bits of external addresses, respectively, since the instructions may not
be adjacent (see the sketch below). It might be better if relocation
Variant could be used.
cmd/link/internal/mips64: support new relocations.
cmd/compile/internal/mips64: reserve SB register.
runtime: initialize SB register at entry points.
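As a rough illustration of the split (a hedged sketch; the helper and its
names are invented here, not toolchain code), the two halves of a 32-bit
displacement that R_ADDRMIPSU and R_ADDRMIPS would carry can be computed as
follows. The upper half absorbs a +0x8000 adjustment because the low half is
sign-extended when the instructions combine it:

    package sketch

    // splitHiLo splits a 32-bit displacement into an "upper" half and a low
    // half. The low half is sign-extended by the instruction that consumes
    // it, so the upper half must compensate: hi = (disp + 0x8000) >> 16.
    func splitHiLo(disp int32) (hi, lo int16) {
        lo = int16(disp)                     // low 16 bits, sign-extended on use
        hi = int16((disp - int32(lo)) >> 16) // same as (disp + 0x8000) >> 16
        return hi, lo
    }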
Change-Id: I5f34868f88c5a9698c042a8a1f12f76806c187b9
Reviewed-on: https://go-review.googlesource.com/19802
Reviewed-by: Minux Ma <minux@golang.org>
Leave R28 for the SB register, which will be introduced in CL 19802.
Change-Id: I1cf7a789695c5de664267ec8086bfb0b043ebc14
Reviewed-on: https://go-review.googlesource.com/19863
Reviewed-by: Minux Ma <minux@golang.org>
With the switch to separate mark bitmaps, the scan/dead bit for the
first word of each object is now unused. Reclaim this bit and use it
as a scan/dead bit, just like words three and on. The second word is
still used for checkmark.
This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
since they no longer need different cases for 1, 2, and 3+ word
objects. They can instead just manipulate the heap bitmap for the
first word and be done with it.
In order to enable this, we change heapBitsSetType and runGCProg to
always set the scan/dead bit to scan for the first word on every code
path. Since these functions only apply to types that have pointers,
there's no need to do this conditionally: it's *always* necessary to
set the scan bit in the first word.
We also change every place that scans an object and checks if there
are more pointers. Rather than only checking morePointers if the word
is >= 2, we now check morePointers if word != 1 (since that's the
checkmark word).
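The loop shape this describes, in a hedged toy model (one byte per word and
invented helper names rather than the runtime's real 2-bit heap bitmap
encoding), looks like:

    package sketch

    // Simplified stand-in for the heap bitmap: one byte per word, bit 0 =
    // "this word is a pointer", bit 1 = "scan bit: more pointers may follow".
    type heapBits struct {
        bits []byte
        i    int
    }

    func (h heapBits) isPointer() bool    { return h.bits[h.i]&1 != 0 }
    func (h heapBits) morePointers() bool { return h.bits[h.i]&2 != 0 }
    func (h heapBits) next() heapBits     { h.i++; return h }

    // scanObject visits the pointer words of an object. Word 1's bit is the
    // checkmark bit, not a scan/dead bit, so the early-exit check is skipped
    // there; for every other word a clear scan bit ends the scan.
    func scanObject(h heapBits, words int, visit func(word int)) {
        for i := 0; i < words; i++ {
            if i != 1 && !h.morePointers() {
                break
            }
            if h.isPointer() {
                visit(i)
            }
            h = h.next()
        }
    }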
Looking forward, we should probably reclaim the checkmark bit, too,
but that's going to be quite a bit more work.
Tested by setting doubleCheck in heapBitsSetType and running all.bash
on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.
This particularly improves the FmtFprintf* go1 benchmarks, since they
do a large amount of noscan allocation.
name old time/op new time/op delta
BinaryTree17-12 2.34s ± 1% 2.38s ± 1% +1.70% (p=0.000 n=17+19)
Fannkuch11-12 2.09s ± 0% 2.09s ± 1% ~ (p=0.276 n=17+16)
FmtFprintfEmpty-12 44.9ns ± 2% 44.8ns ± 2% ~ (p=0.340 n=19+18)
FmtFprintfString-12 127ns ± 0% 125ns ± 0% -1.57% (p=0.000 n=16+15)
FmtFprintfInt-12 128ns ± 0% 122ns ± 1% -4.45% (p=0.000 n=15+20)
FmtFprintfIntInt-12 207ns ± 1% 193ns ± 0% -6.55% (p=0.000 n=19+14)
FmtFprintfPrefixedInt-12 197ns ± 1% 191ns ± 0% -2.93% (p=0.000 n=17+18)
FmtFprintfFloat-12 263ns ± 0% 248ns ± 1% -5.88% (p=0.000 n=15+19)
FmtManyArgs-12 794ns ± 0% 779ns ± 1% -1.90% (p=0.000 n=18+18)
GobDecode-12 7.14ms ± 2% 7.11ms ± 1% ~ (p=0.072 n=20+20)
GobEncode-12 5.85ms ± 1% 5.82ms ± 1% -0.49% (p=0.000 n=20+20)
Gzip-12 218ms ± 1% 215ms ± 1% -1.22% (p=0.000 n=19+19)
Gunzip-12 36.8ms ± 0% 36.7ms ± 0% -0.18% (p=0.006 n=18+20)
HTTPClientServer-12 77.1µs ± 4% 77.1µs ± 3% ~ (p=0.945 n=19+20)
JSONEncode-12 15.6ms ± 1% 15.9ms ± 1% +1.68% (p=0.000 n=18+20)
JSONDecode-12 55.2ms ± 1% 53.6ms ± 1% -2.93% (p=0.000 n=17+19)
Mandelbrot200-12 4.05ms ± 1% 4.05ms ± 0% ~ (p=0.306 n=17+17)
GoParse-12 3.14ms ± 1% 3.10ms ± 1% -1.31% (p=0.000 n=19+18)
RegexpMatchEasy0_32-12 69.3ns ± 1% 70.0ns ± 0% +0.89% (p=0.000 n=19+17)
RegexpMatchEasy0_1K-12 237ns ± 1% 236ns ± 0% -0.62% (p=0.000 n=19+16)
RegexpMatchEasy1_32-12 69.5ns ± 1% 70.3ns ± 1% +1.14% (p=0.000 n=18+17)
RegexpMatchEasy1_1K-12 377ns ± 1% 366ns ± 1% -3.03% (p=0.000 n=15+19)
RegexpMatchMedium_32-12 107ns ± 1% 107ns ± 2% ~ (p=0.318 n=20+19)
RegexpMatchMedium_1K-12 33.8µs ± 3% 33.5µs ± 1% -1.04% (p=0.001 n=20+19)
RegexpMatchHard_32-12 1.68µs ± 1% 1.73µs ± 0% +2.50% (p=0.000 n=20+18)
RegexpMatchHard_1K-12 50.8µs ± 1% 52.0µs ± 1% +2.50% (p=0.000 n=19+18)
Revcomp-12 381ms ± 1% 385ms ± 1% +1.00% (p=0.000 n=17+18)
Template-12 64.9ms ± 3% 62.6ms ± 1% -3.55% (p=0.000 n=19+18)
TimeParse-12 324ns ± 0% 328ns ± 1% +1.25% (p=0.000 n=18+18)
TimeFormat-12 345ns ± 0% 334ns ± 0% -3.31% (p=0.000 n=15+17)
[Geo mean] 52.1µs 51.5µs -1.00%
Change-Id: I13e74da3193a7f80794c654f944d1f0d60817049
Reviewed-on: https://go-review.googlesource.com/22632
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This makes this code more self-documenting and makes it easier to
find these places in the future.
Change-Id: I31dc5598ae67f937fb9ef26df92fd41d01e983c3
Reviewed-on: https://go-review.googlesource.com/22631
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
heapBits.bits is carefully written to produce good machine code. Use
it in heapBits.morePointers and heapBits.isPointer to get good machine
code there, too.
Change-Id: I208c7d0d38697e7a22cad67f692162589b75f1e2
Reviewed-on: https://go-review.googlesource.com/22630
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Fix issues introduced in 5f9a870.
Change-Id: Ia75945ef563956613bf88bbe57800a96455c265d
Reviewed-on: https://go-review.googlesource.com/22661
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Add support for the context function set by runtime.SetCgoTraceback.
The context function was added in CL 17761, without support.
This CL is the support.
This CL has not been tested for real C code, as a working context
function for C code requires unwind support that does not seem to exist.
I wanted to get the CL out before the freeze.
I apologize for the length of this CL. It's mostly plumbing, but
unfortunately the plumbing is processor-specific.
Change-Id: I8ce11a0de9b3dafcc29efd2649d776e93bff0e90
Reviewed-on: https://go-review.googlesource.com/22508
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This commit moves the GC from free list allocation to
bit mark allocation. Instead of using the bitmaps
generated during the mark phases to generate free
lists and then using the free lists for allocation, we
allocate directly from the bitmaps.
The change in the garbage benchmark
name old time/op new time/op delta
XBenchGarbage-12 2.22ms ± 1% 2.13ms ± 1% -3.90% (p=0.000 n=18+18)
Change-Id: I17f57233336f0ca5ef5404c3be4ecb443ab622aa
nextFreeFast is currently not inlined by the compiler due
to its size and complexity. This CL simplifies
nextFreeFast by letting the slow path (nextFree)
handle corner cases.
Change-Id: Ia9c5d1a7912bcb4bec072f5fd240f0e0bafb20e4
Reviewed-on: https://go-review.googlesource.com/22598
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
sweep used to skip mcentral.freeSpan (and its locking) if it didn't
find any new free objects. We lost that optimization when the
freed-object counting changed in dad83f7 to count total free objects
instead of newly freed objects.
The previous commit brings back counting of newly freed objects, so we
can easily revive this optimization by checking that count (like we
used to) instead of the total free objects count.
Change-Id: I43658707a1c61674d0366124d5976b00d98741a9
Reviewed-on: https://go-review.googlesource.com/22596
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Commit 8dda1c4 changed the meaning of "nfree" in sweep from the number
of newly freed objects to the total number of free objects in the
span, but didn't update where sweep added nfree to c.local_nsmallfree.
Hence, we're over-accounting the number of frees. This is causing
TestArrayHash to fail with "too many allocs NNN - hash not balanced".
Fix this by computing the number of newly freed objects and adding
that to c.local_nsmallfree, so it behaves like it used to. Computing
this requires a small tweak to mallocgc: apparently we've never set
s.allocCount when allocating a large object; fix this by setting it to
1 so sweep doesn't get confused.
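In sketch form, with invented names standing in for the runtime's fields,
the corrected accounting is:

    package sketch

    // sweepAccount credits only the objects newly freed by this sweep: the
    // span's allocCount before sweeping minus the number of objects still
    // marked after. Crediting the span's total free count is what
    // over-counted frees.
    func sweepAccount(allocCountBefore, liveAfter uint16, localNSmallFree *uintptr) (newAllocCount uint16) {
        nfreed := allocCountBefore - liveAfter
        *localNSmallFree += uintptr(nfreed)
        return liveAfter
    }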
Change-Id: I31902ffd310110da4ffd807c5c06f1117b872dc8
Reviewed-on: https://go-review.googlesource.com/22595
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
We broke tracing of freed objects in GODEBUG=allocfreetrace=1 mode
when we removed the sweep over the mark bitmap. Fix it by
re-introducing the sweep over the bitmap specifically if we're in
allocfreetrace mode. This doesn't have to be even remotely efficient,
since the overhead of allocfreetrace is huge anyway, so we can keep
the code for this down to just a few lines.
Change-Id: I9e176b3b04c73608a0ea3068d5d0cd30760ebd40
Reviewed-on: https://go-review.googlesource.com/22592
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently we always zero objects when we allocate them. We used to
have an optimization that would not zero objects that had not been
allocated since the whole span was last zeroed (either by getting it
from the system or by getting it from the heap, which does a bulk
zero), but this depended on the sweeper clobbering the first two words
of each object. Hence, we lost this optimization when the bitmap
sweeper went away.
Re-introduce this optimization using a different mechanism. Each span
already keeps a flag indicating that it just came from the OS or was
just bulk zeroed by the mheap. We can simply use this flag to know
when we don't need to zero an object. This is slightly less efficient
than the old optimization: if a span gets allocated and partially
used, then GC happens and the span gets returned to the mcentral, then
the span gets re-acquired, the old optimization knew that it only had
to re-zero the objects that had been reclaimed, whereas this
optimization will re-zero everything. However, in this case, you're
already paying for the garbage collection, and you've only wasted one
zeroing of the span, so in practice there seems to be little
difference. (If we did want to revive the full optimization, each span
could keep track of a frontier beyond which all free slots are zeroed.
I prototyped this and it didn't obviously do any better than the much
simpler approach in this commit.)
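A minimal sketch of the flag-based decision, using a toy span whose field
names are illustrative rather than the runtime's:

    package sketch

    // span is a toy model: mem is the span's memory and needzero records
    // whether it may hold stale data (false right after the span comes from
    // the OS or is bulk zeroed by the heap).
    type span struct {
        mem      []byte
        needzero bool
    }

    // alloc hands out size bytes at offset off, clearing them only when the
    // span is not already known to be zero.
    func (s *span) alloc(off, size int) []byte {
        obj := s.mem[off : off+size]
        if s.needzero {
            for i := range obj {
                obj[i] = 0
            }
        }
        return obj
    }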
This significantly improves BinaryTree17, which is allocation-heavy
(and runs first, so most pages are already zeroed), and slightly
improves everything else.
name old time/op new time/op delta
XBenchGarbage-12 2.15ms ± 1% 2.14ms ± 1% -0.80% (p=0.000 n=17+17)
name old time/op new time/op delta
BinaryTree17-12 2.71s ± 1% 2.56s ± 1% -5.73% (p=0.000 n=18+19)
DivconstI64-12 1.70ns ± 1% 1.70ns ± 1% ~ (p=0.562 n=18+18)
DivconstU64-12 1.74ns ± 2% 1.74ns ± 1% ~ (p=0.394 n=20+20)
DivconstI32-12 1.74ns ± 0% 1.74ns ± 0% ~ (all samples are equal)
DivconstU32-12 1.66ns ± 1% 1.66ns ± 0% ~ (p=0.516 n=15+16)
DivconstI16-12 1.84ns ± 0% 1.84ns ± 0% ~ (all samples are equal)
DivconstU16-12 1.82ns ± 0% 1.82ns ± 0% ~ (all samples are equal)
DivconstI8-12 1.79ns ± 0% 1.79ns ± 0% ~ (all samples are equal)
DivconstU8-12 1.60ns ± 0% 1.60ns ± 1% ~ (p=0.603 n=17+19)
Fannkuch11-12 2.11s ± 1% 2.11s ± 0% ~ (p=0.333 n=16+19)
FmtFprintfEmpty-12 45.1ns ± 4% 45.4ns ± 5% ~ (p=0.111 n=20+20)
FmtFprintfString-12 134ns ± 0% 129ns ± 0% -3.45% (p=0.000 n=18+16)
FmtFprintfInt-12 131ns ± 1% 129ns ± 1% -1.54% (p=0.000 n=16+18)
FmtFprintfIntInt-12 205ns ± 2% 203ns ± 0% -0.56% (p=0.014 n=20+18)
FmtFprintfPrefixedInt-12 200ns ± 2% 197ns ± 1% -1.48% (p=0.000 n=20+18)
FmtFprintfFloat-12 256ns ± 1% 256ns ± 0% -0.21% (p=0.008 n=18+20)
FmtManyArgs-12 805ns ± 0% 804ns ± 0% -0.19% (p=0.001 n=18+18)
GobDecode-12 7.21ms ± 1% 7.14ms ± 1% -0.92% (p=0.000 n=19+20)
GobEncode-12 5.88ms ± 1% 5.88ms ± 1% ~ (p=0.641 n=18+19)
Gzip-12 218ms ± 1% 218ms ± 1% ~ (p=0.271 n=19+18)
Gunzip-12 37.1ms ± 0% 36.9ms ± 0% -0.29% (p=0.000 n=18+17)
HTTPClientServer-12 78.1µs ± 2% 77.4µs ± 2% ~ (p=0.070 n=19+19)
JSONEncode-12 15.5ms ± 1% 15.5ms ± 0% ~ (p=0.063 n=20+18)
JSONDecode-12 56.1ms ± 0% 55.4ms ± 1% -1.18% (p=0.000 n=19+18)
Mandelbrot200-12 4.05ms ± 0% 4.06ms ± 0% +0.29% (p=0.001 n=18+18)
GoParse-12 3.28ms ± 1% 3.21ms ± 1% -2.30% (p=0.000 n=20+20)
RegexpMatchEasy0_32-12 69.4ns ± 2% 69.3ns ± 1% ~ (p=0.205 n=18+16)
RegexpMatchEasy0_1K-12 239ns ± 0% 239ns ± 0% ~ (all samples are equal)
RegexpMatchEasy1_32-12 69.4ns ± 1% 69.4ns ± 1% ~ (p=0.620 n=15+18)
RegexpMatchEasy1_1K-12 370ns ± 1% 369ns ± 2% ~ (p=0.088 n=20+20)
RegexpMatchMedium_32-12 108ns ± 0% 108ns ± 0% ~ (all samples are equal)
RegexpMatchMedium_1K-12 33.6µs ± 3% 33.5µs ± 3% ~ (p=0.718 n=20+20)
RegexpMatchHard_32-12 1.68µs ± 1% 1.67µs ± 2% ~ (p=0.316 n=20+20)
RegexpMatchHard_1K-12 50.5µs ± 3% 50.4µs ± 3% ~ (p=0.659 n=20+20)
Revcomp-12 381ms ± 1% 381ms ± 1% ~ (p=0.916 n=19+18)
Template-12 66.5ms ± 1% 65.8ms ± 2% -1.08% (p=0.000 n=20+20)
TimeParse-12 317ns ± 0% 319ns ± 0% +0.48% (p=0.000 n=19+12)
TimeFormat-12 338ns ± 0% 338ns ± 0% ~ (p=0.124 n=19+18)
[Geo mean] 5.99µs 5.96µs -0.54%
Change-Id: I638ffd9d9f178835bbfa499bac20bd7224f1a907
Reviewed-on: https://go-review.googlesource.com/22591
Reviewed-by: Rick Hudson <rlh@golang.org>
This converts all remaining uses of mspan.start to instead use
mspan.base(). In many cases, this actually reduces the complexity of
the code.
Change-Id: If113840e00d3345a6cf979637f6a152e6344aee7
Reviewed-on: https://go-review.googlesource.com/22590
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Currently we have lots of (s.start << _PageShift) and variants. We now
have an s.base() function that returns this. It's faster and more
readable, so use it.
Change-Id: I888060a9dae15ea75ca8cc1c2b31c905e71b452b
Reviewed-on: https://go-review.googlesource.com/22559
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
These used to be used for the list of newly freed objects, but that's
no longer a thing.
Change-Id: I5a4503137b74ec0eae5372ca271b1aa0b32df074
Reviewed-on: https://go-review.googlesource.com/22557
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Our compilers now provide intrinsics, including
sys.Ctz64, that support CTZ (count trailing zeros)
instructions. This CL replaces the Go versions
of CTZ with the compiler intrinsic.
Count trailing zeros (CTZ) finds the least
significant 1 in a word and returns the number
of less significant 0s in the word.
Allocation uses the bitmap created by the garbage
collector to locate an unmarked object. The logic
takes a word of the bitmap, complements it, and then
caches it. It then uses CTZ to locate an available
unmarked object. It then shifts marked bits out of
the bitmap word preparing it for the next search.
Once all the unmarked objects in the cached word
are used, the allocator takes another word from the
bitmap and repeats the process.
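Roughly, that walk looks like the following sketch, where
math/bits.TrailingZeros64 stands in for the sys.Ctz64 intrinsic and the
names are illustrative:

    package sketch

    import "math/bits"

    // freeObjects returns the indices of the unmarked objects encoded in one
    // 64-bit word of the mark bitmap (bit i set = object i is marked).
    func freeObjects(markWord uint64) []int {
        cache := ^markWord // complement: a 1 bit now means "free"
        var free []int
        base := 0
        for cache != 0 {
            tz := bits.TrailingZeros64(cache) // CTZ: distance to the next free slot
            free = append(free, base+tz)
            cache >>= uint(tz) + 1 // shift consumed bits out for the next search
            base += tz + 1
        }
        return free
    }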
Change-Id: Id2fc42d1d4b9893efaa2e1bd01896985b7e42f82
Reviewed-on: https://go-review.googlesource.com/21366
Reviewed-by: Austin Clements <austin@google.com>
Two changes are included here that are dependent on each other.
The first is that allocBits and gcmarkBits are changed to
a *uint8 which points to the first byte of that span's
mark and alloc bits. Several places were altered to
perform pointer arithmetic to locate the byte corresponding
to an object in the span. The actual bit corresponding
to an object is indexed in the byte by using the lower three
bits of the object's index.
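A hedged sketch of that addressing (the helper and names are invented for
illustration, not the runtime's API):

    package sketch

    import "unsafe"

    // markBitFor locates the byte and bit for object index i, given the
    // *uint8 that points at the first byte of a span's mark/alloc bits.
    func markBitFor(first *uint8, i uintptr) (bytep *uint8, mask uint8) {
        bytep = (*uint8)(unsafe.Pointer(uintptr(unsafe.Pointer(first)) + i/8))
        mask = 1 << (i % 8) // the object's low three index bits pick the bit
        return bytep, mask
    }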
The second change avoids the redundant calculation of an
object's index. The index is returned from heapBitsForObject
and then used by the functions indexing allocBits
and gcmarkBits.
Finally we no longer allocate the gc bits in the span
structures. Instead we use an arena based allocation scheme
that allows for a more compact bit map as well as recycling
and bulk clearing of the mark bits.
Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef
Reviewed-on: https://go-review.googlesource.com/20705
Reviewed-by: Austin Clements <austin@google.com>
The _SigUnblock flag was appended to the SIGSYS slot of the runtime signal
table for Linux in https://go-review.googlesource.com/22202, but there is
still no concrete opinion on whether SIGSYS must be an unblocked signal
for the runtime.
This change removes the _SigUnblock flag from SIGSYS on Linux for
consistency in runtime signal handling and adds a reference to #15204 to
the runtime signal table for FreeBSD.
Updates #15204.
Change-Id: I42992b1d852c2ab5dd37d6dbb481dba46929f665
Reviewed-on: https://go-review.googlesource.com/22537
Reviewed-by: Ian Lance Taylor <iant@golang.org>
It wasn't rendering as HTML nicely.
Change-Id: I5408ec22932a05e85c210c0faa434bd19dce5650
Reviewed-on: https://go-review.googlesource.com/22532
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The complexity of the GC work buffers' put and tryGet
prevented them from being inlined. This CL simplifies
the fast path, thus enabling inlining. If the fast
path does not succeed, the previous put and tryGet
functions are called.
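A minimal sketch of the fast-path/slow-path split, assuming a simplified
gcWork with a single fixed buffer (the real type and its buffer handling
differ):

    package sketch

    // gcWork is a toy work buffer: buf holds pending pointers, nobj counts them.
    type gcWork struct {
        buf  [256]uintptr
        nobj int
    }

    // putFast succeeds only when the current buffer has room, keeping it small
    // enough to inline; on failure the caller invokes the original put, which
    // can acquire a fresh buffer.
    func (w *gcWork) putFast(obj uintptr) bool {
        if w.nobj == len(w.buf) {
            return false
        }
        w.buf[w.nobj] = obj
        w.nobj++
        return true
    }

    // tryGetFast mirrors putFast on the consume side; a zero return means the
    // caller should fall back to the full tryGet.
    func (w *gcWork) tryGetFast() uintptr {
        if w.nobj == 0 {
            return 0
        }
        w.nobj--
        return w.buf[w.nobj]
    }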
Change-Id: I6da6495d0dadf42bd0377c110b502274cc01acf5
Reviewed-on: https://go-review.googlesource.com/20704
Reviewed-by: Austin Clements <austin@google.com>
Prior to this CL the base of a span was calculated in various
places using shifts or calls to base(). This CL now
always calls base() which has been optimized to calculate the
base of the span when the span is initialized and store that
value in the span structure.
Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100
Reviewed-on: https://go-review.googlesource.com/20703
Reviewed-by: Austin Clements <austin@google.com>
Prior to this CL the sweep phase was responsible for locating
all objects that were about to be freed and calling a function
to process the object. This was done by the function
heapBitsSweepSpan. Part of processing included calls to
tracefree and msanfree as well as counting how many objects
were freed.
The calls to tracefree and msanfree have been moved into the
gcmalloc routine and are called when the object is about to be
reallocated. The counting of free objects has been optimized
using an array-based popcnt algorithm, and if all the objects
in a span are free then the span is freed.
Similarly, the code to locate the next free object has been
optimized to use an array-based ctz (count trailing zeros).
Various hot paths in the allocation logic have been optimized.
At this point the garbage benchmark is within 3% of the 1.6
release.
Change-Id: I00643c442e2ada1685c010c3447e4ea8537d2dfa
Reviewed-on: https://go-review.googlesource.com/20201
Reviewed-by: Austin Clements <austin@google.com>
Add to each span a 64 bit cache (allocCache) of the allocBits
at freeindex. allocCache is shifted such that the lowest bit
corresponds to the bit at freeindex. allocBits uses a 0 to
indicate an object is free; allocCache, on the other hand,
uses a 1 to indicate an object is free. This facilitates
ctz64 (count trailing zero) which counts the number of 0s
trailing the least significant 1. This is also the index of
the least significant 1.
Each span maintains a freeindex indicating the boundary
between allocated objects and unallocated objects. allocCache
is shifted as freeindex is incremented such that the low bit
in allocCache corresponds to the bit at freeindex in the
allocBits array.
Currently ctz64 is written in Go using a for loop so it is
not very efficient. Use of the hardware instruction will
follow. With this in mind comparisons of the garbage
benchmark are as follows.
1.6 release 2.8 seconds
dev:garbage branch 3.1 seconds.
Profiling shows the Go implementation of ctz64 takes up
1% of the total time.
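For reference, a for-loop ctz64 of the kind described (illustrative, not
necessarily the exact code) looks like:

    package sketch

    // ctz64 returns the number of trailing zero bits in x, i.e. the index of
    // the least significant 1 bit, or 64 if x is zero.
    func ctz64(x uint64) int {
        if x == 0 {
            return 64
        }
        n := 0
        for x&1 == 0 {
            x >>= 1
            n++
        }
        return n
    }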
Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f
Reviewed-on: https://go-review.googlesource.com/20200
Reviewed-by: Austin Clements <austin@google.com>
Most (all?) processors that Go supports supply a hardware
instruction that takes a byte and returns the number
of zeros trailing the first 1 encountered, or 8
if no ones are found. This is the index within the
byte of the first 1 encountered. CTZ should improve the
performance of the nextFreeIndex function.
Since nextFreeIndex wants the next unmarked (0) bit,
a bit-wise complement is needed before calling ctz.
Furthermore, unmarked bits associated with previously
allocated objects need to be ignored. Instead of writing
a 1 as we allocate, the code masks all bits less than the
freeindex after loading the byte.
While this CL does not actually execute a CTZ instruction,
it supplies a ctz function with the appropriate signature
along with the logic to execute it.
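In sketch form (invented names; math/bits stands in for the eventual
hardware instruction), the complement-and-mask step before the ctz looks
like:

    package sketch

    import "math/bits"

    // nextFreeInByte returns the index (0..7) of the next free object encoded
    // in one byte of mark bits, ignoring slots below freeindex that were
    // already handed out, or 8 if the byte has no usable free slot.
    func nextFreeInByte(markByte uint8, freeindex uintptr) int {
        free := ^markByte                          // complement: 1 now means unmarked/free
        free &^= (uint8(1) << (freeindex % 8)) - 1 // mask off bits below freeindex
        return bits.TrailingZeros8(free)           // 8 when no bit is set
    }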
Change-Id: I5c55ce0ed48ca22c21c4dd9f969b0819b4eadaa7
Reviewed-on: https://go-review.googlesource.com/20169
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This is a renaming of the field ref to the
more appropriate allocCount. The field
holds the number of objects in the span
that are currently allocated. Some throw
strings were adjusted to more accurately
convey the meaning of allocCount.
Change-Id: I10daf44e3e9cc24a10912638c7de3c1984ef8efe
Reviewed-on: https://go-review.googlesource.com/19518
Reviewed-by: Austin Clements <austin@google.com>
Instead of building a freelist from the mark bits generated
by the GC this CL allocates directly from the mark bits.
The approach moves the mark bits from the pointer/no pointer
heap structures into their own per span data structures. The
mark/allocation vectors consist of a single mark bit per
object. Two vectors are maintained, one for allocation and
one for the GC's mark phase. During the GC cycle's sweep
phase the interpretation of the vectors is swapped. The
mark vector becomes the allocation vector and the old
allocation vector is cleared and becomes the mark vector that
the next GC cycle will use.
Marked entries in the allocation vector indicate that the
object is not free. Each allocation vector maintains a boundary
between areas of the span already allocated from and areas
not yet allocated from. As objects are allocated this boundary
is moved until it reaches the end of the span. At this point
further allocations will be done from another span.
Since we no longer sweep a span inspecting each freed object,
the responsibility for maintaining pointer/scalar bits in
the heap bitmap now falls to the routines doing the
actual allocation.
This CL is functionally complete and ready for performance
tuning.
Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04
Reviewed-on: https://go-review.googlesource.com/19470
Reviewed-by: Austin Clements <austin@google.com>
The gcmarkBits is a bit vector used by the GC to mark
reachable objects. Once a GC cycle is complete, the gcmarkBits
swaps places with the allocBits. allocBits is then used directly
by malloc to locate free objects, thus avoiding the
construction of a linked free list. This CL introduces a set
of helper functions for manipulating gcmarkBits and allocBits
that will be used by later CLs to realize the actual
algorithm. Minimal attempts have been made to optimize these
helper routines.
Change-Id: I55ad6240ca32cd456e8ed4973c6970b3b882dd34
Reviewed-on: https://go-review.googlesource.com/19420
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In preparation for changing how the next free object is chosen,
refactor and consolidate the code into a single function.
Change-Id: I6836cd88ed7cbf0b2df87abd7c1c3b9fabc1cbd8
Reviewed-on: https://go-review.googlesource.com/19317
Reviewed-by: Austin Clements <austin@google.com>
The freelist for normal objects and the freelist
for stacks share the same mspan field for holding
the list head but are operated on by different code
sequences. This overloading complicates the use of bit
vectors for allocation of normal objects. This change
refactors the use of the stackfreelist out from the
use of freelist.
Change-Id: I5b155b5b8a1fcd8e24c12ee1eb0800ad9b6b4fa0
Reviewed-on: https://go-review.googlesource.com/19315
Reviewed-by: Austin Clements <austin@google.com>
This adds prototypes of the bitmap allocation data structures.
Before this is released, these underlying data structures need
to be made more performant, but the signatures of the helper
functions utilizing these structures will remain stable.
Change-Id: I5ace12f2fb512a7038a52bbde2bfb7e98783bcbe
Reviewed-on: https://go-review.googlesource.com/19221
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
These are used at the bottom level of various GC operations that must
not be preempted. To be on the safe side, mark them all nosplit.
Change-Id: I8f7360e79c9852bd044df71413b8581ad764380c
Reviewed-on: https://go-review.googlesource.com/22504
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>