Commit Graph

50 Commits

Author SHA1 Message Date
Josh Bleecher Snyder
6cb064c9c4 Revert "runtime: convert g.waitreason from string to uint8"
This reverts commit 4eea887fd4.

Reason for revert: broke s390x build

Change-Id: Id6c2b6a7319273c4d21f613d4cdd38b00d49f847
Reviewed-on: https://go-review.googlesource.com/100375
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-03-13 15:21:21 +00:00
Josh Bleecher Snyder
4eea887fd4 runtime: convert g.waitreason from string to uint8
Every time I poke at #14921, the g.waitreason string
pointer writes show up.

They're not particularly important performance-wise,
but it'd be nice to clear the noise away.

And it does open up a few extra bytes in the g struct
for some future use.

Change-Id: I7ffbd52fbc2a286931a2218038fda52ed6473cc9
Reviewed-on: https://go-review.googlesource.com/99078
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-03-12 21:56:50 +00:00
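
A minimal sketch of the shape of the change above, using illustrative names (waitReason, waitReasonStrings are not copied from the runtime): the free-form string becomes a one-byte enum plus a lookup table, so recording a wait reason is a single byte store.

    package main

    import "fmt"

    // waitReason is a compact replacement for the free-form string field.
    // The names below are illustrative, not copied from the runtime.
    type waitReason uint8

    const (
        waitReasonZero waitReason = iota
        waitReasonGCAssistMarking
        waitReasonIOWait
        waitReasonChanReceive
    )

    // waitReasonStrings maps each value back to the human-readable text
    // that used to be stored directly in the g struct.
    var waitReasonStrings = [...]string{
        waitReasonZero:            "",
        waitReasonGCAssistMarking: "GC assist marking",
        waitReasonIOWait:          "IO wait",
        waitReasonChanReceive:     "chan receive",
    }

    func (w waitReason) String() string {
        if int(w) < len(waitReasonStrings) {
            return waitReasonStrings[w]
        }
        return "unknown wait reason"
    }

    func main() {
        // Recording a wait reason is now a one-byte store instead of a
        // two-word string header write.
        var reason waitReason = waitReasonChanReceive
        fmt.Println(reason) // chan receive
    }
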
Austin Clements
1a033b1a70 runtime: separate spans of noscan objects
Currently, we mix objects with pointers and objects without pointers
("noscan" objects) together in memory. As a result, for every object
we grey, we have to check that object's heap bits to find out if it's
noscan, which adds to the per-object cost of GC. This also hurts the
TLB footprint of the garbage collector because it decreases the
density of scannable objects at the page level.

This commit improves the situation by using separate spans for noscan
objects. This will allow a much simpler noscan check (in a follow up
CL), eliminate the need to clear the bitmap of noscan objects (in a
follow up CL), and improves TLB footprint by increasing the density of
scannable objects.

This is also a step toward eliminating dead bits, since the current
noscan check depends on checking the dead bit of the first word.

This has no effect on the heap size of the garbage benchmark.

We'll measure the performance change of this after the follow-up
optimizations.

This is a cherry-pick from dev.garbage commit d491e550c3. The only
non-trivial merge conflict was in updatememstats in mstats.go, where
we now have to separate the per-spanclass stats from the per-sizeclass
stats.

Change-Id: I13bdc4869538ece5649a8d2a41c6605371618e40
Reviewed-on: https://go-review.googlesource.com/41251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:31 +00:00
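
A simplified sketch of how a size class and a noscan bit can be packed into one span class, consistent with the commit above; makeSpanClass and the exact bit layout here are illustrative rather than quoted from the runtime.

    package main

    import "fmt"

    // spanClass packs a size class together with a noscan bit, so that
    // pointer-free objects get spans of their own.
    type spanClass uint8

    func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
        sc := spanClass(sizeclass) << 1
        if noscan {
            sc |= 1
        }
        return sc
    }

    func (sc spanClass) sizeclass() uint8 { return uint8(sc >> 1) }
    func (sc spanClass) noscan() bool     { return sc&1 != 0 }

    func main() {
        sc := makeSpanClass(5, true)
        // The noscan check becomes a single bit test on the span's class,
        // with no need to consult the object's heap bits.
        fmt.Println(sc.sizeclass(), sc.noscan()) // 5 true
    }
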
Austin Clements
1c4f3c5ea0 runtime: make gcSetTriggerRatio work at any time
This changes gcSetTriggerRatio so it can be called even during
concurrent mark or sweep. In this case, it will adjust the pacing of
the current phase, accounting for progress that has already been made.

To make this work for concurrent sweep, this introduces a "basis" for
the pagesSwept count, much like the basis we just introduced for
heap_live. This lets gcSetTriggerRatio shift the basis to the current
heap_live and pagesSwept and compute a slope from there to completion.
This avoids creating a discontinuity where, if the ratio has
increased, there has to be a flurry of sweep activity to catch up.
Instead, this creates a continuous, piece-wise linear function as
adjustments are made.

For #19076.

Change-Id: Ibcd76aeeb81ff4814b00be7cbd3530b73bbdbba9
Reviewed-on: https://go-review.googlesource.com/39833
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:59 +00:00
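
A rough sketch, under assumed names (sweepPacer, reset), of the "basis" idea described above: pacing is a slope anchored at a movable progress point, so it can be recomputed mid-cycle without a discontinuity or a catch-up flurry of sweeping.

    package main

    import "fmt"

    // sweepPacer models pacing as a slope from a movable basis point to
    // the trigger, so it can be recomputed at any time.
    type sweepPacer struct {
        sweepHeapLiveBasis uint64  // heap_live when the slope was last set
        pagesSweptBasis    uint64  // pagesSwept when the slope was last set
        sweepPagesPerByte  float64 // slope from the basis to completion
    }

    // reset re-anchors the pacer at the current progress point and
    // computes a new slope covering only the remaining work.
    func (p *sweepPacer) reset(heapLive, trigger, pagesSwept, pagesToSweep uint64) {
        p.sweepHeapLiveBasis = heapLive
        p.pagesSweptBasis = pagesSwept
        heapDistance := float64(trigger) - float64(heapLive)
        pagesLeft := float64(pagesToSweep) - float64(pagesSwept)
        if heapDistance <= 0 || pagesLeft <= 0 {
            p.sweepPagesPerByte = 0 // nothing left to pace
            return
        }
        p.sweepPagesPerByte = pagesLeft / heapDistance
    }

    func main() {
        var p sweepPacer
        // Half the sweep work is already done when pacing is recomputed;
        // the new slope covers only the remaining half.
        p.reset(60<<20, 100<<20, 500, 1000)
        fmt.Printf("%.2e pages/byte\n", p.sweepPagesPerByte)
    }
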
Austin Clements
a5eb3dceaf runtime: drive proportional sweep directly off heap_live
Currently, proportional sweep maintains its own count of how many
bytes have been allocated since the beginning of the sweep cycle so it
can compute how many pages need to be swept for a given allocation.

However, this requires a somewhat complex reimbursement scheme since
proportional sweep must be done before a span is allocated, but we
don't know how many bytes to charge until we've allocated a span. This
means that the allocated byte count used by proportional sweep can go
up and down, which has led to underflow bugs in the past (#18043) and
is going to interfere with adjusting sweep pacing on-the-fly (for #19076).

This approach also means we're maintaining a statistic that is very
closely related to heap_live, but has a different 0 value. This is
particularly confusing because the sweep ratio is computed based on
heap_live, so you have to understand that these two statistics are
very closely related.

Replace all of this and compute the sweep debt directly from the
current value of heap_live. To make this work, we simply save the
value of heap_live when the sweep ratio is computed to use as a
"basis" for later computing the sweep debt.

This eliminates the need for reimbursement as well as the code for
maintaining the sweeper's version of the live heap size.

For #19076.

Coincidentally fixes #18043, since this eliminates sweep reimbursement
entirely.

Change-Id: I1f931ddd6e90c901a3972c7506874c899251dc2a
Reviewed-on: https://go-review.googlesource.com/39832
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:57 +00:00
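
A hedged sketch of the debt computation the commit above describes, with invented parameter names: the sweep debt falls directly out of heap_live minus the saved basis, with no separate allocated-bytes counter to reimburse.

    package main

    import "fmt"

    // sweepDebt derives the pages still owed directly from heap_live
    // relative to a saved basis, with no separate byte counter to reimburse.
    func sweepDebt(heapLive, sweepHeapLiveBasis uint64, sweepPagesPerByte float64,
        pagesSwept, pagesSweptBasis uint64) int64 {
        // Pages we should have swept by now, given allocation progress...
        target := int64(sweepPagesPerByte * float64(heapLive-sweepHeapLiveBasis))
        // ...minus pages actually swept since the basis was set.
        return target - int64(pagesSwept-pagesSweptBasis)
    }

    func main() {
        // 10 MiB allocated since the basis, at 0.0005 pages/byte, with 4000
        // pages already swept: the allocating goroutine still owes 1242 pages.
        fmt.Println(sweepDebt(110<<20, 100<<20, 0.0005, 4000, 0))
    }
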
Austin Clements
22000f5407 runtime: record swept and reclaimed bytes in sweep trace
This extends the GCSweepDone event with counts of swept and reclaimed
bytes. These are useful for understanding the duration and
effectiveness of sweep events.

Change-Id: I3c97a4f0f3aad3adbd188adb264859775f54e2df
Reviewed-on: https://go-review.googlesource.com/40811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2017-04-19 18:31:14 +00:00
Austin Clements
79c56addb6 runtime: make sweep trace events encompass entire sweep loop
Currently, each individual span sweep emits an event to the trace. But
sweeps are generally done in loops until some condition is satisfied,
so this tracing is lower-level than anyone really wants and hides the
fact that no other work is being accomplished between adjacent sweep
events. This is also high overhead: enabling tracing significantly
impacts sweep latency.

Replace this by instead tracing around the sweep loops used for
allocation. This is slightly tricky because sweep loops don't
generally know if any sweeping will happen in them. Hence, we make the
tracing lazy by recording in the P that we would like to start tracing
the sweep *if* one happens, and then only closing the sweep event if
we started it.

This does mean we don't get tracing on every sweep path, and sweep
paths are legion. However, we get much more informative tracing on the paths
that block allocation, which are the paths that matter.

Change-Id: I73e14fbb250acb0c9d92e3648bddaa5e7d7e271c
Reviewed-on: https://go-review.googlesource.com/40810
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-19 18:31:11 +00:00
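
A sketch of the lazy begin/end pattern described above; a plain struct stands in for the per-P state, and lazySweepTrace and the printed event names are placeholders rather than the runtime's tracer API.

    package main

    import "fmt"

    // lazySweepTrace stands in for the per-P state the commit describes:
    // a sweep loop arms tracing up front, the first span actually swept
    // opens the event, and the loop closes it only if it was opened.
    type lazySweepTrace struct {
        armed   bool // the enclosing loop would like a sweep event
        started bool // a span was actually swept, so "begin" was emitted
    }

    func (t *lazySweepTrace) arm() { t.armed = true }

    // onSpanSwept is called from the per-span sweep path.
    func (t *lazySweepTrace) onSpanSwept() {
        if t.armed && !t.started {
            t.started = true
            fmt.Println("trace: sweep begin")
        }
    }

    // finish closes the event only if one was opened, so loops that never
    // swept anything add no trace noise.
    func (t *lazySweepTrace) finish(sweptBytes int) {
        if t.started {
            fmt.Println("trace: sweep end, swept", sweptBytes, "bytes")
        }
        t.armed, t.started = false, false
    }

    func main() {
        var tr lazySweepTrace
        tr.arm()
        tr.onSpanSwept() // first span swept inside the loop opens the event
        tr.onSpanSwept() // later spans do not re-open it
        tr.finish(8192)
    }
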
Austin Clements
051809e352 runtime: free workbufs during sweeping
This extends the sweeper to free workbufs back to the heap between GC
cycles, allowing this memory to be reused for GC'd allocations or
eventually returned to the OS.

This helps for applications that have high peak heap usage relative to
their regular heap usage (for example, a high-memory initialization
phase). Workbuf memory is roughly proportional to heap size and since
we currently never free workbufs, it's proportional to *peak* heap
size. By freeing workbufs, we can release and reuse this memory for
other purposes when the heap shrinks.

This is somewhat complicated because this costs ~1–2 µs per workbuf
span, so for large heaps it's too expensive to just do synchronously
after mark termination between starting the world and dropping the
worldsema. Hence, we do it asynchronously in the sweeper. This adds a
list of "free" workbuf spans that can be returned to the heap. GC
moves all workbuf spans to this list after mark termination and the
background sweeper drains this list back to the heap. If the sweeper
doesn't finish, that's fine, since getempty can directly reuse any
remaining spans to allocate more workbufs.

Performance impact is negligible. On the x/benchmarks, this reduces
GC-bytes-from-system by 6–11%.

Fixes #19325.

Change-Id: Icb92da2196f0c39ee984faf92d52f29fd9ded7a8
Reviewed-on: https://go-review.googlesource.com/38582
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:47 +00:00
Austin Clements
ecb7b63820 runtime: fix gcpacertrace printing of sweep ratio
Commit 44ed88a5a7 moved printing of the "sweep done" gcpacertrace
message so that it is printed when the final sweeper finishes.
However, by this point some other thread has often already observed
that there are no more spans to sweep and zeroed sweepPagesPerByte.

Avoid printing a 0 sweep ratio in the trace when this race happens by
getting the value of the sweep ratio upon entry to sweepone and
printing that.

Change-Id: Iac0c48ae899e12f193267cdfb012c921f8b71c85
Reviewed-on: https://go-review.googlesource.com/39492
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-05 18:24:52 +00:00
Austin Clements
44ed88a5a7 runtime: track the number of active sweepone calls
sweepone returns ^uintptr(0) when there are no more spans to *start*
sweeping, but there may be spans being swept concurrently at the time
and there's currently no efficient way to tell when the sweeper is
done sweeping all the spans.

We'll need this for concurrent runtime.GC(), so add a count of the
number of active sweepone calls to make it possible to block until
sweeping is truly done.

This is also useful for more accurately printing the gcpacertrace,
since that should be printed after all of the sweeping stats are in
(currently we can print it slightly too early).

For #18216.

Change-Id: I06e6240c9e7b40aca6fd7b788bb6962107c10a0f
Reviewed-on: https://go-review.googlesource.com/37716
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:18 +00:00
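
A small sketch of the counter the commit above adds, using illustrative names (sweepState, sweepDone): sweeping is only truly done when no spans remain to start and no sweepone call is still in flight.

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // sweepState counts in-flight sweepone calls so that "no spans left to
    // start AND no active sweepers" means sweeping is truly finished.
    type sweepState struct {
        active    int64 // sweepone calls currently in flight
        spansLeft int64 // spans not yet picked up for sweeping
    }

    func (s *sweepState) sweepone() {
        atomic.AddInt64(&s.active, 1)
        defer atomic.AddInt64(&s.active, -1)
        if atomic.LoadInt64(&s.spansLeft) > 0 {
            atomic.AddInt64(&s.spansLeft, -1)
            // ... sweep the span (elided) ...
        }
    }

    // sweepDone reports whether all sweeping has completed, not merely
    // whether all sweeping has been started.
    func (s *sweepState) sweepDone() bool {
        return atomic.LoadInt64(&s.spansLeft) == 0 &&
            atomic.LoadInt64(&s.active) == 0
    }

    func main() {
        s := sweepState{spansLeft: 2}
        s.sweepone()
        s.sweepone()
        fmt.Println(s.sweepDone()) // true
    }
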
Austin Clements
b50b728587 runtime: simplify sweep allocation counting
Currently sweep counts the number of allocated objects, computes the
number of free objects from that, then re-computes the number of
allocated objects from that. Simplify and clean this up by skipping
these intermediate steps.

Change-Id: I3ed98e371eb54bbcab7c8530466c4ab5fde35f0a
Reviewed-on: https://go-review.googlesource.com/34935
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-03 17:02:16 +00:00
Austin Clements
2817e77024 runtime: debug prints for spanBytesAlloc underflow
Updates #18043.

Change-Id: I24e687fdd5521c48b672987f15f0d5de9f308884
Reviewed-on: https://go-review.googlesource.com/34612
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-10 15:59:39 +00:00
Austin Clements
575b1dda4e runtime: eliminate allspans snapshot
Now that sweeping and span marking use the sweep list, there's no need
for the work.spans snapshot of the allspans list. This change
eliminates the few remaining uses of it, which are either dead code or
can use allspans directly, and removes work.spans and its support
functions.

Change-Id: Id5388b42b1e68e8baee853d8eafb8bb4ff95bb43
Reviewed-on: https://go-review.googlesource.com/30537
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:33:02 +00:00
Austin Clements
f9497a6747 runtime: make sweep time proportional to in-use spans
Currently sweeping walks the list of all spans, which means the work
in sweeping is proportional to the maximum number of spans ever used.
If the heap was once large but is now small, this causes an
amortization failure: on a small heap, GCs happen frequently, but a
full sweep still has to happen in each GC cycle, which means we spent
a lot of time in sweeping.

Fix this by creating a separate list consisting of just the in-use
spans to be swept, so sweeping is proportional to the number of in-use
spans (which is proportional to the live heap). Specifically, we
create two lists: a list of unswept in-use spans and a list of swept
in-use spans. At the start of the sweep cycle, the swept list becomes
the unswept list and the new swept list is empty. Allocating a new
in-use span adds it to the swept list. Sweeping moves spans from the
unswept list to the swept list.

This fixes the amortization problem because a shrinking heap moves
spans off the unswept list without adding them to the swept list,
reducing the time required by the next sweep cycle.

Updates #9265. This fix eliminates almost all of the time spent in
sweepone; however, markrootSpans has essentially the same bug, so now
the test program from this issue spends all of its time in
markrootSpans.

No significant effect on other benchmarks.

Change-Id: Ib382e82790aad907da1c127e62b3ab45d7a4ac1e
Reviewed-on: https://go-review.googlesource.com/30535
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:57 +00:00
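
A simplified sketch of the two-list scheme described above (slices stand in for the runtime's span lists, and the names are illustrative): the swap at cycle start makes sweep work proportional to in-use spans.

    package main

    import "fmt"

    // span is a stand-in for mspan; only an ID is needed here.
    type span struct{ id int }

    // sweepLists holds the two in-use lists: at the start of a sweep cycle
    // the roles swap, so sweeping only ever walks spans that are in use.
    type sweepLists struct {
        unswept []*span // in-use spans still to be swept this cycle
        swept   []*span // in-use spans already swept (or allocated swept)
    }

    // startSweepCycle makes last cycle's swept spans this cycle's work.
    func (l *sweepLists) startSweepCycle() {
        l.unswept, l.swept = l.swept, nil
    }

    // allocSpan records a freshly allocated in-use span; it is born swept.
    func (l *sweepLists) allocSpan(s *span) { l.swept = append(l.swept, s) }

    // sweepOne moves one span from the unswept list to the swept list.
    func (l *sweepLists) sweepOne() *span {
        if len(l.unswept) == 0 {
            return nil
        }
        s := l.unswept[len(l.unswept)-1]
        l.unswept = l.unswept[:len(l.unswept)-1]
        l.swept = append(l.swept, s)
        return s
    }

    func main() {
        var l sweepLists
        l.allocSpan(&span{1})
        l.allocSpan(&span{2})
        l.startSweepCycle()
        for s := l.sweepOne(); s != nil; s = l.sweepOne() {
            fmt.Println("swept span", s.id)
        }
    }
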
Dmitry Vyukov
caa2147532 runtime: per-P contexts for race detector
Race runtime also needs local malloc caches and currently uses
a mix of per-OS-thread and per-goroutine caches. This leads to
increased memory consumption. But more importantly, the cache of
synchronization objects is per-goroutine and we don't always
have goroutine context when freeing memory in GC. As a result,
synchronization object descriptors leak (more precisely, they
can be reused if another synchronization object is recreated
at the same address, but it does not always help). For example,
the added BenchmarkSyncLeak has effectively runaway memory
consumption (based on a real long running server).

This change updates race runtime with support for per-P contexts.
BenchmarkSyncLeak now stabilizes at ~1GB memory consumption.

Long term, this will allow us to remove race runtime dependency
on glibc (as malloc is the main cornerstone).

I've also implemented a different scheme to pass P context to
race runtime: the scheduler notified the race runtime about the association
between G and P by calling procwire(g, p)/procunwire(g, p).
But it turned out to be very messy as we have lots of places
where the association changes (e.g. syscalls). So I dropped it
in favor of the current scheme: race runtime asks scheduler
about the current P.

Fixes #14533

Change-Id: Iad10d2f816a44affae1b9fed446b3580eafd8c69
Reviewed-on: https://go-review.googlesource.com/19970
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-03 11:00:43 +00:00
Austin Clements
b3579c095e [dev.garbage] runtime: revive sweep fast path
sweep used to skip mcentral.freeSpan (and its locking) if it didn't
find any new free objects. We lost that optimization when the
freed-object counting changed in dad83f7 to count total free objects
instead of newly freed objects.

The previous commit brings back counting of newly freed objects, so we
can easily revive this optimization by checking that count (like we
used to) instead of the total free objects count.

Change-Id: I43658707a1c61674d0366124d5976b00d98741a9
Reviewed-on: https://go-review.googlesource.com/22596
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-29 15:25:28 +00:00
Austin Clements
d97625ae9e [dev.garbage] runtime: fix nfree accounting
Commit 8dda1c4 changed the meaning of "nfree" in sweep from the number
of newly freed objects to the total number of free objects in the
span, but didn't update where sweep added nfree to c.local_nsmallfree.
Hence, we're over-accounting the number of frees. This is causing
TestArrayHash to fail with "too many allocs NNN - hash not balanced".

Fix this by computing the number of newly freed objects and adding
that to c.local_nsmallfree, so it behaves like it used to. Computing
this requires a small tweak to mallocgc: apparently we've never set
s.allocCount when allocating a large object; fix this by setting it to
1 so sweep doesn't get confused.

Change-Id: I31902ffd310110da4ffd807c5c06f1117b872dc8
Reviewed-on: https://go-review.googlesource.com/22595
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2016-04-29 15:25:26 +00:00
Austin Clements
6d11490539 [dev.garbage] runtime: fix allocfreetrace
We broke tracing of freed objects in GODEBUG=allocfreetrace=1 mode
when we removed the sweep over the mark bitmap. Fix it by
re-introducing the sweep over the bitmap specifically if we're in
allocfreetrace mode. This doesn't have to be even remotely efficient,
since the overhead of allocfreetrace is huge anyway, so we can keep
the code for this down to just a few lines.

Change-Id: I9e176b3b04c73608a0ea3068d5d0cd30760ebd40
Reviewed-on: https://go-review.googlesource.com/22592
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-29 15:08:21 +00:00
Austin Clements
15744c92de [dev.garbage] runtime: remove unused head/end arguments from freeSpan
These used to be used for the list of newly freed objects, but that's
no longer a thing.

Change-Id: I5a4503137b74ec0eae5372ca271b1aa0b32df074
Reviewed-on: https://go-review.googlesource.com/22557
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-29 03:53:08 +00:00
Rick Hudson
2063d5d903 [dev.garbage] runtime: restructure alloc and mark bits
Two changes are included here that depend on each other.
The first is that allocBits and gcmarkBits are changed to
a *uint8 which points to the first byte of that span's
mark and alloc bits. Several places were altered to
perform pointer arithmetic to locate the byte corresponding
to an object in the span. The actual bit corresponding
to an object is indexed in the byte by using the lower three
bits of the object's index.

The second change avoids the redundant calculation of an
object's index. The index is returned from heapBitsForObject
and then used by the functions indexing allocBits
and gcmarkBits.

Finally we no longer allocate the gc bits in the span
structures. Instead we use an arena based allocation scheme
that allows for a more compact bit map as well as recycling
and bulk clearing of the mark bits.

Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef
Reviewed-on: https://go-review.googlesource.com/20705
Reviewed-by: Austin Clements <austin@google.com>
2016-04-29 00:00:47 +00:00
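
A sketch of the byte/bit indexing described above, using a slice rather than raw *uint8 pointer arithmetic; the low three bits of the object index select the bit within the byte.

    package main

    import "fmt"

    // allocBitmap holds one bit per object: byte index = objIndex/8, and
    // the low three bits of objIndex select the bit within that byte.
    type allocBitmap []uint8

    func (b allocBitmap) isMarked(objIndex uintptr) bool {
        return b[objIndex/8]&(1<<(objIndex&7)) != 0
    }

    func (b allocBitmap) setMarked(objIndex uintptr) {
        b[objIndex/8] |= 1 << (objIndex & 7)
    }

    func main() {
        // A span with 64 objects needs 8 bytes of alloc or mark bits.
        bits := make(allocBitmap, 8)
        bits.setMarked(13)
        fmt.Println(bits.isMarked(13), bits.isMarked(14)) // true false
    }
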
Rick Hudson
f8d0d4fd59 [dev.garbage] runtime: cleanup and optimize span.base()
Prior to this CL the base of a span was calculated in various
places using shifts or calls to base(). This CL now
always calls base() which has been optimized to calculate the
base of the span when the span is initialized and store that
value in the span structure.

Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100
Reviewed-on: https://go-review.googlesource.com/20703
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:59 +00:00
Rick Hudson
8dda1c4c08 [dev.garbage] runtime: remove heapBitsSweepSpan
Prior to this CL the sweep phase was responsible for locating
all objects that were about to be freed and calling a function
to process the object. This was done by the function
heapBitsSweepSpan. Part of processing included calls to
tracefree and msanfree as well as counting how many objects
were freed.

The calls to tracefree and msanfree have been moved into the
gcmalloc routine and called when the object is about to be
reallocated. The counting of free objects has been optimized
using an array based popcnt algorithm and if all the objects
in a span are free then the span is freed.

Similarly the code to locate the next free object has been
optimized to use an array based ctz (count trailing zero).
Various hot paths in the allocation logic have been optimized.

At this point the garbage benchmark is within 3% of the 1.6
release.

Change-Id: I00643c442e2ada1685c010c3447e4ea8537d2dfa
Reviewed-on: https://go-review.googlesource.com/20201
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:57 +00:00
Rick Hudson
4093481523 [dev.garbage] runtime: add bit and cache ctz64 (count trailing zero)
Add to each span a 64 bit cache (allocCache) of the allocBits
at freeindex. allocCache is shifted such that the lowest bit
corresponds to the bit at freeindex. allocBits uses a 0 to
indicate an object is free; allocCache, on the other hand,
uses a 1 to indicate an object is free. This facilitates
ctz64 (count trailing zero) which counts the number of 0s
trailing the least significant 1. This is also the index of
the least significant 1.

Each span maintains a freeindex indicating the boundary
between allocated objects and unallocated objects. allocCache
is shifted as freeindex is incremented such that the low bit
in allocCache corresponds to the bit at freeindex in the
allocBits array.

Currently ctz64 is written in Go using a for loop so it is
not very efficient. Use of the hardware instruction will
follow. With this in mind comparisons of the garbage
benchmark are as follows.

1.6 release        2.8 seconds
dev:garbage branch 3.1 seconds.

Profiling shows the Go implementation of ctz64 takes up
1% of the total time.

Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f
Reviewed-on: https://go-review.googlesource.com/20200
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:54 +00:00
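
A simplified sketch of the allocCache scheme above, with math/bits.TrailingZeros64 standing in for ctz64 and invented type names; the cache is kept shifted so bit 0 always corresponds to freeindex.

    package main

    import (
        "fmt"
        "math/bits"
    )

    // spanCache is a 64-bit window over allocBits, shifted so bit 0 always
    // corresponds to freeindex, with 1 meaning "free" so count-trailing-zeros
    // finds the next free slot directly.
    type spanCache struct {
        freeindex  uintptr // boundary between allocated and unallocated objects
        allocCache uint64  // inverted allocBits window starting at freeindex
    }

    func (s *spanCache) nextFree() (objIndex uintptr, ok bool) {
        tz := bits.TrailingZeros64(s.allocCache) // stands in for ctz64
        if tz == 64 {
            return 0, false // window exhausted; the runtime would refill it
        }
        objIndex = s.freeindex + uintptr(tz)
        // Consume the slot: shift the window and advance freeindex past it.
        s.allocCache >>= uint(tz) + 1
        s.freeindex = objIndex + 1
        return objIndex, true
    }

    func main() {
        // Objects 0 and 1 are allocated (cache bits 0); object 2 is free.
        s := spanCache{freeindex: 0, allocCache: ^uint64(3)}
        i, _ := s.nextFree()
        fmt.Println("next free object index:", i) // 2
    }
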
Rick Hudson
3479b065d4 [dev.garbage] runtime: allocate directly from GC mark bits
Instead of building a freelist from the mark bits generated
by the GC this CL allocates directly from the mark bits.

The approach moves the mark bits from the pointer/no pointer
heap structures into their own per span data structures. The
mark/allocation vectors consist of a single mark bit per
object. Two vectors are maintained, one for allocation and
one for the GC's mark phase. During the GC cycle's sweep
phase the interpretation of the vectors is swapped. The
mark vector becomes the allocation vector and the old
allocation vector is cleared and becomes the mark vector that
the next GC cycle will use.

Marked entries in the allocation vector indicate that the
object is not free. Each allocation vector maintains a boundary
between areas of the span already allocated from and areas
not yet allocated from. As objects are allocated this boundary
is moved until it reaches the end of the span. At this point
further allocations will be done from another span.

Since we no longer sweep a span inspecting each freed object,
maintaining the pointer/scalar bits in the heapBitMap is now the
responsibility of the routines doing the actual allocation.

This CL is functionally complete and ready for performance
tuning.

Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04
Reviewed-on: https://go-review.googlesource.com/19470
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:47 +00:00
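
A hedged sketch of the vector swap described above (slices and names are illustrative): at sweep, the mark bits become the allocation bits and a cleared vector becomes the next cycle's mark bits, so unmarked objects become allocatable without visiting them.

    package main

    import "fmt"

    // markSpan carries the two per-span bit vectors: allocation bits and
    // the GC's mark bits for the current cycle.
    type markSpan struct {
        allocBits  []uint8 // 1 = slot not available for allocation
        gcmarkBits []uint8 // set by the GC mark phase
        freeindex  uintptr // boundary between allocated-from and fresh area
    }

    // sweep swaps the interpretation of the vectors: marked means "not
    // free", so allocation simply skips marked slots, and a cleared vector
    // becomes the mark bits for the next cycle.
    func (s *markSpan) sweep() {
        s.allocBits = s.gcmarkBits
        s.gcmarkBits = make([]uint8, len(s.allocBits)) // cleared for next GC
        s.freeindex = 0                                // allocation restarts from the left
    }

    func main() {
        s := markSpan{
            allocBits:  []uint8{0xff}, // all 8 slots were allocated
            gcmarkBits: []uint8{0x81}, // GC marked only objects 0 and 7
        }
        s.sweep()
        // After the swap, objects 1 through 6 are free without visiting them.
        fmt.Printf("allocBits=%08b\n", s.allocBits[0]) // 10000001
    }
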
Austin Clements
17f6e5396b runtime: print sweep ratio if gcpacertrace>0
Change-Id: I5217bf4b75e110ca2946e1abecac6310ed84dad5
Reviewed-on: https://go-review.googlesource.com/21205
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-03-30 02:27:58 +00:00
Austin Clements
0d26efb12a runtime: remove unnecessary clears of the heap bitmap
Currently we clear the heap bitmap of a span both when we allocate
that span *and* when we free it. There's no point in doing both, and
we definitely have to write the heap bitmap when we allocate a span
for pointer-sized objects, so switch to clearing only when we allocate
a span.

This results in a slight overall performance improvement; however,
most of the benchmarks that get slower are very short, while the
longer benchmarks generally got faster.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.48ms ± 1%  2.47ms ± 1%  -0.58%  (p=0.000 n=91+91)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.85s ± 2%     2.85s ± 2%    ~     (p=0.550 n=20+19)
Fannkuch11-12                2.54s ± 0%     2.47s ± 1%  -2.72%  (p=0.000 n=19+18)
FmtFprintfEmpty-12          51.3ns ± 4%    51.0ns ± 3%    ~     (p=0.223 n=20+20)
FmtFprintfString-12          169ns ± 0%     167ns ± 0%  -1.18%  (p=0.000 n=17+16)
FmtFprintfInt-12             160ns ± 0%     161ns ± 0%  +0.63%  (p=0.000 n=16+15)
FmtFprintfIntInt-12          267ns ± 0%     269ns ± 1%  +0.62%  (p=0.000 n=17+20)
FmtFprintfPrefixedInt-12     234ns ± 1%     240ns ± 0%  +2.80%  (p=0.000 n=20+20)
FmtFprintfFloat-12           316ns ± 0%     313ns ± 0%  -0.76%  (p=0.000 n=20+19)
FmtManyArgs-12              1.04µs ± 0%    1.05µs ± 0%  +0.45%  (p=0.000 n=19+16)
GobDecode-12                7.90ms ± 1%    7.81ms ± 0%  -1.10%  (p=0.000 n=18+18)
GobEncode-12                6.61ms ± 1%    6.58ms ± 0%  -0.46%  (p=0.000 n=20+15)
Gzip-12                      320ms ± 1%     322ms ± 1%  +0.47%  (p=0.030 n=20+20)
Gunzip-12                   42.4ms ± 1%    42.6ms ± 0%  +0.37%  (p=0.000 n=20+20)
HTTPClientServer-12         70.7µs ± 1%    70.6µs ± 2%    ~     (p=0.784 n=18+20)
JSONEncode-12               16.9ms ± 1%    16.8ms ± 0%  -0.64%  (p=0.000 n=20+20)
JSONDecode-12               60.8ms ± 0%    58.6ms ± 1%  -3.50%  (p=0.000 n=17+18)
Mandelbrot200-12            3.92ms ± 0%    3.91ms ± 0%  -0.25%  (p=0.000 n=19+19)
GoParse-12                  3.65ms ± 0%    3.68ms ± 1%  +0.67%  (p=0.000 n=17+16)
RegexpMatchEasy0_32-12       102ns ± 1%     102ns ± 2%  +0.67%  (p=0.009 n=19+19)
RegexpMatchEasy0_1K-12       350ns ± 0%     351ns ± 1%  +0.34%  (p=0.002 n=20+20)
RegexpMatchEasy1_32-12      84.1ns ± 2%    84.2ns ± 2%    ~     (p=0.799 n=20+18)
RegexpMatchEasy1_1K-12       510ns ± 1%     508ns ± 1%  -0.45%  (p=0.000 n=20+17)
RegexpMatchMedium_32-12      132ns ± 1%     134ns ± 1%  +0.85%  (p=0.000 n=20+19)
RegexpMatchMedium_1K-12     40.0µs ± 1%    39.9µs ± 1%  -0.29%  (p=0.014 n=19+18)
RegexpMatchHard_32-12       2.09µs ± 1%    2.05µs ± 0%  -1.76%  (p=0.000 n=20+18)
RegexpMatchHard_1K-12       62.7µs ± 1%    61.8µs ± 1%  -1.39%  (p=0.000 n=20+19)
Revcomp-12                   541ms ± 1%     534ms ± 0%  -1.16%  (p=0.000 n=19+20)
Template-12                 71.1ms ± 0%    69.1ms ± 0%  -2.83%  (p=0.000 n=18+19)
TimeParse-12                 356ns ± 0%     357ns ± 0%  +0.36%  (p=0.000 n=17+19)
TimeFormat-12                358ns ± 0%     372ns ± 1%  +3.74%  (p=0.000 n=15+18)
[Geo mean]                  62.6µs         62.5µs       -0.25%

Change-Id: Ied190b77c7a4d91ec7b2218c592fc31cf7acf362
Reviewed-on: https://go-review.googlesource.com/19633
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:19 +00:00
Austin Clements
4ad64cadf8 runtime: trace sweep completion in gcpacertrace mode
Change-Id: I7991612e4d064c15492a39c19f753df1db926203
Reviewed-on: https://go-review.googlesource.com/17747
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-15 16:15:59 +00:00
Austin Clements
c1cbe5b577 runtime: check for spanBytesAlloc underflow
Change-Id: I5e6739ff0c6c561195ed9891fb90f933b81e7750
Reviewed-on: https://go-review.googlesource.com/17746
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-15 16:15:47 +00:00
Michael Matloob
432cb66f16 runtime: break out system-specific constants into package sys
runtime/internal/sys will hold system-, architecture- and config-
specific constants.

Updates #11647

Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54
Reviewed-on: https://go-review.googlesource.com/16817
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-12 17:04:45 +00:00
Matthew Dempsky
c17c42e8a5 runtime: rewrite lots of foo_Bar(f, ...) into f.bar(...)
Applies to types fixAlloc, mCache, mCentral, mHeap, mSpan, and
mSpanList.

Two special cases:

1. mHeap_Scavenge() previously didn't take an *mheap parameter, so it
was specially handled in this CL.

2. mHeap_Free() would have collided with mheap's "free" field, so it's
been renamed to (*mheap).freeSpan to parallel its underlying
(*mheap).freeSpanLocked method.

Change-Id: I325938554cca432c166fe9d9d689af2bbd68de4b
Reviewed-on: https://go-review.googlesource.com/16221
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-12 00:34:58 +00:00
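
A tiny before/after sketch of the rewrite pattern above, on a stand-in type; the receivers and the freeSpan rename follow the commit description rather than the full runtime source.

    package main

    import "fmt"

    // mheap is a stand-in type used only to show the rewrite pattern.
    type mheap struct{ spans int }

    // Before: a C-style free function taking the receiver as its first
    // argument.
    //
    //    func mHeap_FreeSpan(h *mheap, n int) { h.spans -= n }
    //
    // After: the same logic as a method, matching Go conventions; the name
    // freeSpan also avoids colliding with a field named "free".
    func (h *mheap) freeSpan(n int) { h.spans -= n }

    func main() {
        h := &mheap{spans: 3}
        h.freeSpan(1)
        fmt.Println(h.spans) // 2
    }
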
Austin Clements
7d1d642956 runtime: fix use of xadd64
Commit 7407d8e was rebased over the switch to runtime/internal/atomic
and introduced a call to xadd64, which no longer exists. Fix that
call.

Change-Id: I99c93469794c16504ae4a8ffe3066ac382c66a3a
Reviewed-on: https://go-review.googlesource.com/16816
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-11 15:26:24 +00:00
Austin Clements
7407d8e582 runtime: fix over-aggressive proportional sweep
Currently, sweeping is performed before allocating a span by charging
for the entire size of the span requested, rather than the number of
bytes actually available for allocation from the returned span. That
is, if the returned span is 8K, but already has 6K in use, the mutator
is charged for 8K of heap allocation even though it can only allocate
2K more from the span. As a result, proportional sweep is
over-aggressive and tends to finish much earlier than it needs to.
This effect is more amplified by fragmented heaps.

Fix this by reimbursing the mutator for the used space in a span once
it has allocated that span. We still have to charge up-front for the
worst-case because we don't know which span the mutator will get, but
at least we can correct the over-charge once it has a span, which will
go toward later span allocations.

This has negligible effect on the throughput of the go1 benchmarks and
the garbage benchmark.

Fixes #12040.

Change-Id: I0e23e7a4ccf126cca000fed5067b20017028dd6b
Reviewed-on: https://go-review.googlesource.com/16515
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-11-11 15:21:32 +00:00
Michael Matloob
67faca7d9c runtime: break atomics out into package runtime/internal/atomic
This change breaks out most of the atomics functions in the runtime
into package runtime/internal/atomic. It adds some basic support
in the toolchain for runtime packages, and also modifies linux/arm
atomics to remove the dependency on the runtime's mutex. The mutexes
have been replaced with spinlocks.

all trybots are happy!
In addition to the trybots, I've tested on the darwin/arm64 builder,
on the darwin/arm builder, and on a ppc64le machine.

Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
Reviewed-on: https://go-review.googlesource.com/14204
Run-TryBot: Michael Matloob <matloob@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 17:38:04 +00:00
Dmitry Vyukov
ee0305e036 runtime: remove dead code
runtime.free is long gone.

Change-Id: I058f69e6481b8fa008e1951c29724731a8a3d081
Reviewed-on: https://go-review.googlesource.com/16593
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2015-11-03 19:20:21 +00:00
Dmitry Vyukov
bf606094ee runtime: fix finalization and profiling of tiny allocations
Handling of special records for tiny allocations has two problems:
1. Once we queue a finalizer we mark the object. As a result any
   subsequent finalizers for the same object will not be queued
   during this GC cycle. If we have 16 finalizers set up (the worst case),
   finalization will take 16 GC cycles. This is what caused the misbehavior
   of tinyfin.go. The actual flakiness was caused by the fact that fing
   is asynchronous and doesn't always run before the check.
2. If a tiny block has both finalizer and profile specials,
   it is possible that we both queue the finalizer, which keeps the object
   live, and free the profile record. As a result the heap profile can be skewed.

Fix both issues by analyzing all special records for a single object at once.

Also, make tinyfin test stricter and remove reliance on real time.

Also, add a test for problem 2. Currently the heap profile misses about
half of the live memory.

Fixes #13100

Change-Id: I9ae4dc1c44893724138a4565ca5cae29f2e97544
Reviewed-on: https://go-review.googlesource.com/16591
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
2015-11-03 18:57:18 +00:00
Ian Lance Taylor
73f329f472 runtime, syscall: add calls to msan functions
Add explicit memory sanitizer instrumentation to the runtime and syscall
packages.  The compiler does not instrument the runtime package.  It
does instrument the syscall package, but we need to add a couple of
cases that it can't see.

Change-Id: I2d66073f713fe67e33a6720460d2bb8f72f31394
Reviewed-on: https://go-review.googlesource.com/16164
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-10-21 19:17:46 +00:00
Austin Clements
9a31d38f65 runtime: remove sweep wait loop in finishsweep_m
In general, finishsweep_m must block until any spans that are
concurrently being swept have been swept. It accomplishes this by
looping over all spans, which, as in the previous commit, takes
~1ms/heap GB. Unfortunately, we do this during the STW sweep
termination phase, so multi-gigabyte heaps can push our STW time past
10ms.

However, there's no need to do this wait if the world is stopped
because, in effect, stopping the world already had to wait for
anything that was sweeping (and if it didn't, the wait in
finishsweep_m would deadlock). Hence, we can simply skip this loop if
the world is stopped, such as during sweep termination. In fact,
currently all calls to finishsweep_m are STW, but this hasn't always
been the case and may not be the case in the future, so we keep the
logic around.

For 24GB heaps, this reduces max pause time by 75% relative to tip and
by 90% relative to Go 1.5. Notably, all pauses are now well under
10ms. Here are the results for the garbage benchmark:

               ------------- max pause ------------
Heap   Procs   after change   before change   1.5.1
24GB     12        3.8ms          16ms         37ms
24GB      4        3.7ms          16ms         37ms
 4GB      4        3.7ms           3ms        6.9ms

In the 4GB/4P case, it seems the "before change" run got lucky: the
max went up, but the 99%ile pause time went down from 3ms to 2.04ms.

Change-Id: Ica22189559f231d408ef2815019c9dbb5f38bf31
Reviewed-on: https://go-review.googlesource.com/15071
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-02 19:56:01 +00:00
Austin Clements
70462f90ec runtime: simplify mSpan_Sweep
This is a cleanup following cc8f544, which was a minimal change to fix
issue #11617. This consolidates the two places in mSpan_Sweep that
update sweepgen. Previously this was necessary because sweepgen must
be updated before freeing the span, but we freed large spans early.
Now we free large spans later, so there's no need to duplicate the
sweepgen update. This also means large spans can take advantage of the
sweepgen sanity checking performed for other spans.

Change-Id: I23b79dbd9ec81d08575cd307cdc0fa6b20831768
Reviewed-on: https://go-review.googlesource.com/12451
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-14 18:29:58 +00:00
Austin Clements
fc9ca85f4c runtime: make sweep proportional to spans bytes allocated
Proportional concurrent sweep is currently based on a ratio of spans
to be swept per bytes of object allocation. However, proportional
sweeping is performed during span allocation, not object allocation,
in order to minimize contention and overhead. Since objects are
allocated from spans after those spans are allocated, the system tends
to operate in debt, which means when the next GC cycle starts, there
is often sweep debt remaining, so GC has to finish the sweep, which
delays the start of the cycle and delays enabling mutator assists.

For example, it's quite likely that many Ps will simultaneously refill
their span caches immediately after a GC cycle (because GC flushes the
span caches), but at this point, there has been very little object
allocation since the end of GC, so very little sweeping is done. The
Ps then allocate objects from these cached spans, which drives up the
bytes of object allocation, but since these allocations are coming
from cached spans, nothing considers whether more sweeping has to
happen. If the sweep ratio is high enough (which can happen if the
next GC trigger is very close to the retained heap size), this can
easily represent a sweep debt of thousands of pages.

Fix this by making proportional sweep proportional to the number of
bytes of spans allocated, rather than the number of bytes of objects
allocated. Prior to allocating a span, both the small object path and
the large object path ensure credit for allocating that span, so the
system operates in the black, rather than in the red.

Combined with the previous commit, this should eliminate all sweeping
from GC start up. On the stress test in issue #11911, this reduces the
time spent sweeping during GC (and delaying start up) by several
orders of magnitude:

                mean    99%ile     max
    pre fix      1 ms    11 ms   144 ms
    post fix   270 ns   735 ns   916 ns

Updates #11911.

Change-Id: I89223712883954c9d6ec2a7a51ecb97172097df3
Reviewed-on: https://go-review.googlesource.com/13044
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:44 +00:00
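
A minimal sketch of the rule described above, with invented names (chargeForSpan, sweepSomePages): sweep credit is charged in span bytes before the span is handed out, so the allocator stays in the black.

    package main

    import "fmt"

    // chargeForSpan pays the sweep debt for a span *before* it is handed
    // out, measured in span bytes rather than object bytes.
    func chargeForSpan(spanBytes uintptr, sweepPagesPerByte float64,
        sweepSomePages func() int64) {
        pagesOwed := int64(sweepPagesPerByte * float64(spanBytes))
        for pagesOwed > 0 {
            n := sweepSomePages()
            if n == 0 {
                break // nothing left to sweep
            }
            pagesOwed -= n
        }
    }

    func main() {
        swept := int64(0)
        // Acquiring one 8 KB span at 0.001 pages/byte costs 8 pages of
        // sweeping, paid up front so the system stays in the black.
        chargeForSpan(8192, 0.001, func() int64 { swept++; return 1 })
        fmt.Println("pages swept before span allocation:", swept) // 8
    }
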
Austin Clements
e33d6b3d4d runtime: remove out-of-date comment
An out-of-date comment snuck into cc8f544. Remove it.

Change-Id: I5bc7c17e737d1cabe57b88de06d7579c60ca28ff
Reviewed-on: https://go-review.googlesource.com/12328
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2015-07-17 16:52:32 +00:00
Austin Clements
cc8f544198 runtime: don't free large spans until heapBitsSweepSpan returns
This fixes a race between 1) sweeping and freeing an unmarked large
span and 2) reusing that span and allocating from it. This race arises
because mSpan_Sweep returns spans for large objects to the heap
*before* heapBitsSweepSpan clears the mark bit on the object in the
span.

Specifically, the following sequence of events can lead to an
incorrectly zeroed bitmap byte, which causes the garbage collector to
not trace any pointers in that object (the pointer bits for the first
four words are cleared, and the scan bits are also cleared, so it
looks like a no-scan object).

1) P0 calls mSpan_Sweep on a large span S0 with an unmarked object on it.

2) mSpan_Sweep calls heapBitsSweepSpan, which invokes the callback for
   the one (unmarked) object on the span.

3) The callback calls mHeap_Free, which makes span S0 available for
   allocation, but this is too early.

4) P1 grabs this S0 from the heap to use for allocation.

5) P1 allocates an object on this span and writes that object's type
   bits to the bitmap.

6) P0 returns from the callback to heapBitsSweepSpan.
   heapBitsSweepSpan clears the byte containing the mark, even though
   this span is now owned by P1 and this byte contains important
   bitmap information.

This fixes this problem by simply delaying the mHeap_Free until after
the heapBitsSweepSpan. I think the overall logic of mSpan_Sweep could
be simplified now, but this seems like the minimal change.

Fixes #11617.

Change-Id: I6b1382c7e7cc35f81984467c0772fe9848b7522a
Reviewed-on: https://go-review.googlesource.com/12320
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-17 03:34:11 +00:00
Austin Clements
5250279eb9 runtime: detect and print corrupted free lists
Issues #10240, #10541, #10941, #11023, #11027 and possibly others are
indicating memory corruption in the runtime. One of the easiest places
to both get corruption and detect it is in the allocator's free lists
since they appear throughout memory and follow strict invariants. This
commit adds a check when sweeping a span that its free list is sane
and, if not, it prints the corrupted free list and panics. Hopefully
this will help us collect more information on these failures.

Change-Id: I6d417bcaeedf654943a5e068bd76b58bb02d4a64
Reviewed-on: https://go-review.googlesource.com/10713
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-06-16 21:17:47 +00:00
Austin Clements
24a7252e25 runtime: finish sweeping before concurrent GC starts
Currently, the concurrent sweep follows a 1:1 rule: when allocation
needs a span, it sweeps a span (likewise, when a large allocation
needs N pages, it sweeps until it frees N pages). This rule worked
well for the STW collector (especially when GOGC==100) because it did
no more sweeping than necessary to keep the heap from growing, would
generally finish sweeping just before GC, and ensured good temporal
locality between sweeping a page and allocating from it.

It doesn't work well with concurrent GC. Since concurrent GC requires
starting GC earlier (sometimes much earlier), the sweep often won't be
done when GC starts. Unfortunately, the first thing GC has to do is
finish the sweep. In the mean time, the mutator can continue
allocating, pushing the heap size even closer to the goal size. This
worked okay with the 7/8ths trigger, but it gets into a vicious cycle
with the GC trigger controller: if the mutator is allocating quickly
and driving the trigger lower, more and more sweep work will be left
to GC; this both causes GC to take longer (allowing the mutator to
allocate more during GC) and delays the start of the concurrent mark
phase, which throws off the GC controller's statistics and generally
causes it to push the trigger even lower.

As an example of a particularly bad case, the garbage benchmark with
GOMAXPROCS=4 and -benchmem 512 (MB) spends the first 0.4-0.8 seconds
of each GC cycle sweeping, during which the heap grows by between
109MB and 252MB.

To fix this, this change replaces the 1:1 sweep rule with a
proportional sweep rule. At the end of GC, GC knows exactly how much
heap allocation will occur before the next concurrent GC as well as
how many span pages must be swept. This change computes this "sweep
ratio" and when the mallocgc asks for a span, the mcentral sweeps
enough spans to bring the swept span count into ratio with the
allocated byte count.

On the benchmark from above, this entirely eliminates sweeping at the
beginning of GC, which reduces the time between startGC readying the
GC goroutine and GC stopping the world for sweep termination to ~100µs
during which the heap grows at most 134KB.

Change-Id: I35422d6bba0c2310d48bb1f8f30a72d29e98c1af
Reviewed-on: https://go-review.googlesource.com/8921
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:46 +00:00
Austin Clements
bedb6f8aef runtime: remove unnecessary traceNextGC
Commit d7e0ad4 removed the next_gc manipulation from mSpan_Sweep, but
left in the traceNextGC() call that records the updated next_gc
value. Remove this now unnecessary call.

Change-Id: I28e0de071661199be9810d7bdcc81ce50b5a58ae
Reviewed-on: https://go-review.googlesource.com/8894
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-14 20:54:23 +00:00
Austin Clements
d7e0ad4b82 runtime: introduce heap_live; replace use of heap_alloc in GC
Currently there are two main consumers of memstats.heap_alloc:
updatememstats (aka ReadMemStats) and shouldtriggergc.

updatememstats recomputes heap_alloc from the ground up, so we don't
need to keep heap_alloc up to date for it. shouldtriggergc wants to
know how many bytes were marked by the previous GC plus how many bytes
have been allocated since then, but this *isn't* what heap_alloc
tracks. heap_alloc also includes objects that are not marked and
haven't yet been swept.

Introduce a new memstat called heap_live that actually tracks what
shouldtriggergc wants to know and stop keeping heap_alloc up to date.

Unlike heap_alloc, heap_live follows a simple sawtooth that drops
during each mark termination and increases monotonically between GCs.
heap_alloc, on the other hand, has much more complicated behavior: it
may drop during sweep termination, slowly decreases from background
sweeping between GCs, is roughly unaffected by allocation as long as
there are unswept spans (because we sweep and allocate at the same
rate), and may go up after background sweeping is done depending on
the GC trigger.

heap_live simplifies computing next_gc and using it to figure out when
to trigger garbage collection. Currently, we guess next_gc at the end
of a cycle and update it as we sweep and get a better idea of how much
heap was marked. Now, since we're directly tracking how much heap is
marked, we can directly compute next_gc.

This also corrects bugs that could cause us to trigger GC early.
Currently, in any case where sweep termination actually finds spans to
sweep, heap_alloc is an overestimation of live heap, so we'll trigger
GC too early. heap_live, on the other hand, is unaffected by sweeping.

Change-Id: I1f96807b6ed60d4156e8173a8e68745ffc742388
Reviewed-on: https://go-review.googlesource.com/8389
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-06 21:28:13 +00:00
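
A rough sketch of the pacing idea above, with simplified names and arithmetic: heap_live rises monotonically between GCs and resets to the marked heap at mark termination, so the trigger check becomes a direct comparison against a next_gc computed from heap_marked.

    package main

    import "fmt"

    // pacerState tracks heap_live: it rises monotonically with allocation
    // between GCs and drops to the marked heap size at mark termination.
    type pacerState struct {
        heapMarked uint64 // bytes marked by the previous GC
        heapLive   uint64 // marked bytes plus bytes allocated since then
        gogc       uint64 // GOGC, as a percentage
    }

    // nextGC is computed directly from the marked heap rather than guessed
    // and refined during sweeping.
    func (p *pacerState) nextGC() uint64 {
        return p.heapMarked + p.heapMarked*p.gogc/100
    }

    // allocate models the monotonic rise of heap_live between cycles.
    func (p *pacerState) allocate(bytes uint64) { p.heapLive += bytes }

    func (p *pacerState) shouldTriggerGC() bool { return p.heapLive >= p.nextGC() }

    func main() {
        p := pacerState{heapMarked: 50 << 20, heapLive: 50 << 20, gogc: 100}
        p.allocate(60 << 20)
        fmt.Println(p.shouldTriggerGC()) // true: live heap passed next_gc
    }
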
Dmitry Vyukov
919fd24884 runtime: remove runtime frames from stacks in traces
Strip uninteresting bottom and top frames from trace stacks.
This makes both binary and json trace files smaller,
and also makes stacks shorter and more readable in the viewer.

Change-Id: Ib9c80ccc280504f0e235f867f53f1d2652c41583
Reviewed-on: https://go-review.googlesource.com/5523
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
2015-03-10 14:46:15 +00:00
Russ Cox
5789b28525 runtime: start GC background sweep eagerly
Starting it lazily causes a memory allocation (for the goroutine) during GC.

First use of channels for runtime implementation.

Change-Id: I9cd24dcadbbf0ee5070ee6d0ed7ea415504f316c
Reviewed-on: https://go-review.googlesource.com/6960
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-03-05 21:41:55 +00:00
Rick Hudson
122384e489 runtime: Remove boundary bit logic.
This is an experiment to see if removing the boundary bit logic will
lead to fewer cache misses and improved performance. Instead of using
boundary bits we use the span information to get element size and use
some bit whacking to get the boundary without having to touch the
random heap bits which cause cache misses.

Furthermore once the boundary bit is removed we can either use that
bit for a simpler checkmark routine or we can reduce the number of
bits in the GC bitmap to 2 bits per pointer-sized word. For example
the 2 bits at the boundary can be used for marking and pointer/scalar
differentiation. Since we don't need the mark bit except at the
boundary nibble of the object, the other nibbles can use this bit
as a noscan bit to indicate that there are no more pointers in
the object.

Currently the change included in this CL slows down the garbage
benchmark. With the boundary bits garbage gives 5.78 and without
(this CL) it gives 5.88 which is a 2% slowdown.

Change-Id: Id68f831ad668176f7dc9f7b57b339e4ebb6dc4c2
Reviewed-on: https://go-review.googlesource.com/6665
Reviewed-by: Austin Clements <austin@google.com>
2015-03-04 20:55:55 +00:00
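
A small sketch of the arithmetic the commit above relies on, with illustrative names (spanInfo, objectBase): the span's element size alone is enough to round an interior pointer down to its object's base, with no boundary bit in the heap bitmap.

    package main

    import "fmt"

    // spanInfo carries just enough span metadata to find an object's base.
    type spanInfo struct {
        base     uintptr // first byte of the span
        elemsize uintptr // size of each object in the span
    }

    // objectBase rounds an interior pointer down to its object's base using
    // only the span's element size, with no boundary bit in the heap bitmap.
    func (s spanInfo) objectBase(p uintptr) uintptr {
        objIndex := (p - s.base) / s.elemsize
        return s.base + objIndex*s.elemsize
    }

    func main() {
        s := spanInfo{base: 0x1000, elemsize: 48}
        // A pointer into the middle of the second object (0x1030..0x105f).
        fmt.Printf("%#x\n", s.objectBase(0x1040)) // 0x1030
    }
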
Russ Cox
89a091de24 runtime: split gc_m into gcMark and gcSweep
This is a nice split but more importantly it provides a better
way to fit the checkmark phase into the sequencing.

Also factor out common span copying into gcSpanCopy.

Change-Id: Ia058644974e4ed4ac3cf4b017a3446eb2284d053
Reviewed-on: https://go-review.googlesource.com/5333
Reviewed-by: Austin Clements <austin@google.com>
2015-02-20 17:00:39 +00:00
Russ Cox
484f801ff4 runtime: reorganize memory code
Move code from malloc1.go, malloc2.go, mem.go, mgc0.go into
appropriate locations.

Factor mgc.go into mgc.go, mgcmark.go, mgcsweep.go, mstats.go.

A lot of this code was in certain files because the right place was in
a C file but it was written in Go, or vice versa. This is one step toward
making things actually well-organized again.

Change-Id: I6741deb88a7cfb1c17ffe0bcca3989e10207968f
Reviewed-on: https://go-review.googlesource.com/5300
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-19 20:17:01 +00:00