The only non test user of the assembler prefetch functions is the
heapBits.prefetch function which is itself unused.
The runtime prefetch functions have no functionality on most platforms
and are not inlineable since they are written in assembler. The function
call overhead eliminates the performance gains that could be achieved with
prefetching and would degrade performance for platforms where the functions
are no-ops.
If prefetch functions are needed back again later they can be improved
by avoiding the function call overhead and implementing them as intrinsics.
Change-Id: I52c553cf3607ffe09f0441c6e7a0a818cb21117d
Reviewed-on: https://go-review.googlesource.com/44370
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This adds debugging information when we panic with "heapBitsForSpan:
base out of range".
Updates #20259.
Change-Id: I0dc1a106aa9e9531051c7d08867ace5ef230eb3f
Reviewed-on: https://go-review.googlesource.com/43310
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This is no longer necessary now that we can more efficiently consult
the span's noscan bit.
This is a cherry-pick of dev.garbage commit 312aa09996.
Change-Id: Id0b00b278533660973f45eb6efa5b00f373d58af
Reviewed-on: https://go-review.googlesource.com/41252
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, we mix objects with pointers and objects without pointers
("noscan" objects) together in memory. As a result, for every object
we grey, we have to check that object's heap bits to find out if it's
noscan, which adds to the per-object cost of GC. This also hurts the
TLB footprint of the garbage collector because it decreases the
density of scannable objects at the page level.
This commit improves the situation by using separate spans for noscan
objects. This will allow a much simpler noscan check (in a follow up
CL), eliminate the need to clear the bitmap of noscan objects (in a
follow up CL), and improves TLB footprint by increasing the density of
scannable objects.
This is also a step toward eliminating dead bits, since the current
noscan check depends on checking the dead bit of the first word.
This has no effect on the heap size of the garbage benchmark.
We'll measure the performance change of this after the follow-up
optimizations.
This is a cherry-pick from dev.garbage commit d491e550c3. The only
non-trivial merge conflict was in updatememstats in mstats.go, where
we now have to separate the per-spanclass stats from the per-sizeclass
stats.
Change-Id: I13bdc4869538ece5649a8d2a41c6605371618e40
Reviewed-on: https://go-review.googlesource.com/41251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This introduces a new type, *gcBits, to use for alloc/mark bitmap
allocations instead of *uint8. This type is marked go:notinheap, so
uses of it correctly eliminate write barriers. Since we now have a
type, this also extracts some common operations to methods both for
convenience and to avoid (*uint8) casts at most use sites.
For #19325.
Change-Id: Id51f734fb2e96b8b7715caa348c8dcd4aef0696a
Reviewed-on: https://go-review.googlesource.com/38580
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
We're about to generalize _MSpanStack to be used for other forms of
in-heap manual memory management in the runtime. This is an automated
rename of _MSpanStack to _MSpanManual plus some comment fix-ups.
For #19325.
Change-Id: I1e20a57bb3b87a0d324382f92a3e294ffc767395
Reviewed-on: https://go-review.googlesource.com/38574
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Changing mheap_.arena_used requires several steps that are currently
repeated multiple times in mheap_.sysAlloc. Consolidate these into a
single function.
In the future, this will also make it easier to add other auxiliary VM
structures.
Change-Id: Ie68837d2612e1f4ba4904acb1b6b832b15431d56
Reviewed-on: https://go-review.googlesource.com/40151
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Implement math/bits.TrailingZerosX using intrinsics.
Generally reorganize the intrinsic spec a bit.
The instrinsics data structure is now built at init time.
This will make doing the other functions in math/bits easier.
Update sys.CtzX to return int instead of uint{64,32} so it
matches math/bits.TrailingZerosX.
Improve the intrinsics a bit for amd64. We don't need the CMOV
for <64 bit versions.
Update #18616
Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39
Reviewed-on: https://go-review.googlesource.com/38155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
If the bad pointer is on a stack, this makes it possible to find the
frame containing the bad pointer.
Change-Id: Ieda44e054aa9ebf22d15d184457c7610b056dded
Reviewed-on: https://go-review.googlesource.com/37858
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently almost every function that deals with a *_func has to first
look up the *moduledata for the module containing the function's entry
point. This means we almost always do at least two identical module
lookups whenever we deal with a *_func (one to get the *_func and
another to get something from its module data) and sometimes several
more.
Fix this by making findfunc return a new funcInfo type that embeds
*_func, but also includes the *moduledata, and making all of the
functions that currently take a *_func instead take a funcInfo and use
the already-found *moduledata.
This transformation is trivial for the most part, since the *_func
type is usually inferred. The annoying part is that we can no longer
use nil to indicate failure, so this introduces a funcInfo.valid()
method and replaces nil checks with calls to valid.
Change-Id: I9b8075ef1c31185c1943596d96dec45c7ab5100f
Reviewed-on: https://go-review.googlesource.com/37331
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Currently sweep counts the number of allocated objects, computes the
number of free objects from that, then re-computes the number of
allocated objects from that. Simplify and clean this up by skipping
these intermediate steps.
Change-Id: I3ed98e371eb54bbcab7c8530466c4ab5fde35f0a
Reviewed-on: https://go-review.googlesource.com/34935
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Now that we don't rescan stacks, stack barriers are unnecessary. This
removes all of the code and structures supporting them as well as
tests that were specifically for stack barriers.
Updates #17503.
Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c
Reviewed-on: https://go-review.googlesource.com/36620
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
It's not used, it's never been used, and it doesn't do what its doc
comment says it does.
Fixes#18941.
Change-Id: Ia89d97fb87525f5b861d7701f919e0d6b7cbd376
Reviewed-on: https://go-review.googlesource.com/36322
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The introduction of -buildmode=plugin means modules can be added to a
Go program while it is running. This means there exists some time
while the program is running with the module is on the moduledata
linked list, but it has not been initialized to the satisfaction of
other parts of the runtime. Notably, the GC.
This CL adds a new way of access modules, an activeModules function.
It returns a slice of modules that is built in the background and
atomically swapped in. The parts of the runtime that need to wait on
module initialization can use this slice instead of the linked list.
Fixes#17455
Change-Id: I04790fd07e40c7295beb47cea202eb439206d33d
Reviewed-on: https://go-review.googlesource.com/32357
Reviewed-by: Ian Lance Taylor <iant@golang.org>
No point in computing this info on startup.
Compute it at build time.
This lets us spend more time computing & checking the size classes.
Improve the div magic for rounding to the start of an object.
We can now use 32-bit multiplies & shifts, which should help
32-bit platforms.
The static data is <1KB.
The actual size classes are not changed by this CL.
Change-Id: I6450cec7d1b2b4ad31fd3f945f504ed2ec6570e7
Reviewed-on: https://go-review.googlesource.com/32219
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This implements the unconditional version of the hybrid deletion write
barrier, which always shades both the old and new pointer. It's
unconditional for now because barriers on channel operations require
checking both the source and destination stacks and we don't have a
way to funnel this information into the write barrier at the moment.
As part of this change, we modify the typed memclr operations
introduced earlier to invoke the write barrier.
This has basically no overall effect on benchmark performance. This is
good, since it indicates that neither the extra shade nor the new bulk
clear barriers have much effect. It also has little effect on latency.
This is expected, since we haven't yet modified mark termination to
take advantage of the hybrid barrier.
Updates #17503.
Change-Id: Iebedf84af2f0e857bd5d3a2d525f760b5cf7224b
Reviewed-on: https://go-review.googlesource.com/31765
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently bulkBarrierPreWrite calls writebarrierptr_prewrite, but this
means that we check writeBarrier.needed twice and perform cgo checks
twice.
Change bulkBarrierPreWrite to call writebarrierptr_prewrite1 to skip
over these duplicate checks.
This may speed up bulkBarrierPreWrite slightly, but mostly this will
save us from running out of nosplit stack space on ppc64x in the near
future.
Updates #17503.
Change-Id: I1cea1a2207e884ab1a279c6a5e378dcdc048b63e
Reviewed-on: https://go-review.googlesource.com/31890
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, we perform write barriers after performing pointer writes.
At the moment, it simply doesn't matter what order this happens in, as
long as they appear atomic to GC. But both the hybrid barrier and ROC
are going to require a pre-write write barrier.
For the hybrid barrier, this is important because the barrier needs to
observe both the current value of the slot and the value that will be
written to it. (Alternatively, the caller could do the write and pass
in the old value, but it seems easier and more useful to just swap the
order of the barrier and the write.)
For ROC, this is necessary because, if the pointer write is going to
make the pointer reachable to some goroutine that it currently is not
visible to, the garbage collector must take some special action before
that pointer becomes more broadly visible.
This commits swaps pointer writes around so the write barrier occurs
before the pointer write.
The main subtlety here is bulk memory writes. Currently, these copy to
the destination first and then use the pointer bitmap of the
destination to find the copied pointers and invoke the write barrier.
This is necessary because the source may not have a pointer bitmap. To
handle these, we pass both the source and the destination to the bulk
memory barrier, which uses the pointer bitmap of the destination, but
reads the pointer values from the source.
Updates #17503.
Change-Id: I78ecc0c5c94ee81c29019c305b3d232069294a55
Reviewed-on: https://go-review.googlesource.com/31763
Reviewed-by: Rick Hudson <rlh@golang.org>
Since barrier-less memclr is only safe in very narrow circumstances,
this commit renames memclr to avoid accidentally calling memclr on
typed memory. This can cause subtle, non-deterministic bugs, so it's
worth some effort to prevent. In the near term, this will also prevent
bugs creeping in from any concurrent CLs that add calls to memclr; if
this happens, whichever patch hits master second will fail to compile.
This also adds the other new memclr variants to the compiler's
builtin.go to minimize the churn on that binary blob. We'll use these
in future commits.
Updates #17503.
Change-Id: I00eead049f5bd35ca107ea525966831f3d1ed9ca
Reviewed-on: https://go-review.googlesource.com/31369
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Like h_allspans and mheap_.allspans, these were two ways of referring
to the spans array from when the runtime was split between C and Go.
Clean this up by making mheap_.spans a slice and eliminating h_spans.
Change-Id: I3aa7038d53c3a4252050aa33e468c48dfed0b70e
Reviewed-on: https://go-review.googlesource.com/30532
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently mspan.isFree technically returns whether the object was not
allocated *during this cycle*. Fix it so it actually returns whether
or not the object is allocated so the method is more generally useful
(especially for debugging).
It has one caller, which is carefully written to be insensitive to
this distinction, but this lets us simplify this caller.
Change-Id: I9d79cf784a56015e434961733093c1d8d03fc091
Reviewed-on: https://go-review.googlesource.com/30145
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Change-Id: I8fd271066925734c3f7196f64db04f27c4ce27cb
Reviewed-on: https://go-review.googlesource.com/30274
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
These used to be necessary when racing with updates to the mark bit,
but since the mark bit is no longer in the bitmap and the checkmark is
only updated with the world stopped, we can now always use regular
writes to update the type information in the heap bitmap.
Somewhat surprisingly, this has basically no overall performance
effect beyond the usual noise, but it does clean up the code.
Change-Id: I3933d0b4c0bc1c9bcf6313613515c0b496212105
Reviewed-on: https://go-review.googlesource.com/29277
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Now that the runtime fetches the true physical page size from the OS,
make the physical page size used by heap growth a variable instead of
a constant. This isn't used in any performance-critical paths, so it
shouldn't be an issue.
sys.PhysPageSize is also renamed to sys.DefaultPhysPageSize to make it
clear that it's not necessarily the true page size. There are no uses
of this constant any more, but we'll keep it around for now.
Updates #12480 and #10180.
Change-Id: I6c23b9df860db309c38c8287a703c53817754f03
Reviewed-on: https://go-review.googlesource.com/25022
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Commit 59877bf renamed bitMarked to bitScan, since the bitmap is no
longer used for marking. However, there were several other references
to this strewn about comments and in some other constant names. Fix
these up, too.
Change-Id: I4183d28c6b01977f1d75a99ad55b150f2211772d
Reviewed-on: https://go-review.googlesource.com/28450
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently the heapBitsSetType documentation says that there are no
races on the heap bitmap, but that isn't exactly true. There are no
*write-write* races, but there are read-write races. Expand the
documentation to explain this and why it's okay.
Change-Id: Ibd92b69bcd6524a40a9dd4ec82422b50831071ed
Reviewed-on: https://go-review.googlesource.com/23092
Reviewed-by: Rick Hudson <rlh@golang.org>
In issue #13992, Russ mentioned that the heap bitmap footprint was
halved but that the bitmap size calculation hadn't been updated. This
presents the opportunity to either halve the bitmap size or double
the addressable virtual space. This CL doubles the addressable virtual
space. On 32 bit this can be tweaked further to allow the bitmap to
cover the entire 4GB virtual address space, removing a failure mode
if the kernel hands out memory with a too low address.
First, fix the calculation and double _MaxArena32 to cover 4GB virtual
memory space with the same bitmap size (256 MB).
Then, allow the fallback mode for the initial memory reservation
on 32 bit (or 64 bit with too little available virtual memory) to not
include space for the arena. mheap.sysAlloc will automatically reserve
additional space when the existing arena is full.
Finally, set arena_start to 0 in 32 bit mode, so that any address is
acceptable for subsequent (additional) reservations.
Before, the bitmap was always located just before arena_start, so
fix the two places relying on that assumption: Point the otherwise unused
mheap.bitmap to one byte after the end of the bitmap, and use it for
bitmap addressing instead of arena_start.
With arena_start set to 0 on 32 bit, the cgoInRange check is no longer a
sufficient check for Go pointers. Introduce and call inHeapOrStack to
check whether a pointer is to the Go heap or stack.
While we're here, remove sysReserveHigh which seems to be unused.
Fixes#13992
Change-Id: I592b513148a50b9d3967b5c5d94b86b3ec39acc2
Reviewed-on: https://go-review.googlesource.com/20471
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This updates some comments that became out of date when we moved the
mark bit out of the heap bitmap and started using the high bit for the
first word as a scan/dead bit.
Change-Id: I4a572d16db6114cadff006825466c1f18359f2db
Reviewed-on: https://go-review.googlesource.com/22662
Reviewed-by: Rick Hudson <rlh@golang.org>
With the switch to separate mark bitmaps, the scan/dead bit for the
first word of each object is now unused. Reclaim this bit and use it
as a scan/dead bit, just like words three and on. The second word is
still used for checkmark.
This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
since they no longer need different cases for 1, 2, and 3+ word
objects. They can instead just manipulate the heap bitmap for the
first word and be done with it.
In order to enable this, we change heapBitsSetType and runGCProg to
always set the scan/dead bit to scan for the first word on every code
path. Since these functions only apply to types that have pointers,
there's no need to do this conditionally: it's *always* necessary to
set the scan bit in the first word.
We also change every place that scans an object and checks if there
are more pointers. Rather than only checking morePointers if the word
is >= 2, we now check morePointers if word != 1 (since that's the
checkmark word).
Looking forward, we should probably reclaim the checkmark bit, too,
but that's going to be quite a bit more work.
Tested by setting doubleCheck in heapBitsSetType and running all.bash
on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.
This particularly improves the FmtFprintf* go1 benchmarks, since they
do a large amount of noscan allocation.
name old time/op new time/op delta
BinaryTree17-12 2.34s ± 1% 2.38s ± 1% +1.70% (p=0.000 n=17+19)
Fannkuch11-12 2.09s ± 0% 2.09s ± 1% ~ (p=0.276 n=17+16)
FmtFprintfEmpty-12 44.9ns ± 2% 44.8ns ± 2% ~ (p=0.340 n=19+18)
FmtFprintfString-12 127ns ± 0% 125ns ± 0% -1.57% (p=0.000 n=16+15)
FmtFprintfInt-12 128ns ± 0% 122ns ± 1% -4.45% (p=0.000 n=15+20)
FmtFprintfIntInt-12 207ns ± 1% 193ns ± 0% -6.55% (p=0.000 n=19+14)
FmtFprintfPrefixedInt-12 197ns ± 1% 191ns ± 0% -2.93% (p=0.000 n=17+18)
FmtFprintfFloat-12 263ns ± 0% 248ns ± 1% -5.88% (p=0.000 n=15+19)
FmtManyArgs-12 794ns ± 0% 779ns ± 1% -1.90% (p=0.000 n=18+18)
GobDecode-12 7.14ms ± 2% 7.11ms ± 1% ~ (p=0.072 n=20+20)
GobEncode-12 5.85ms ± 1% 5.82ms ± 1% -0.49% (p=0.000 n=20+20)
Gzip-12 218ms ± 1% 215ms ± 1% -1.22% (p=0.000 n=19+19)
Gunzip-12 36.8ms ± 0% 36.7ms ± 0% -0.18% (p=0.006 n=18+20)
HTTPClientServer-12 77.1µs ± 4% 77.1µs ± 3% ~ (p=0.945 n=19+20)
JSONEncode-12 15.6ms ± 1% 15.9ms ± 1% +1.68% (p=0.000 n=18+20)
JSONDecode-12 55.2ms ± 1% 53.6ms ± 1% -2.93% (p=0.000 n=17+19)
Mandelbrot200-12 4.05ms ± 1% 4.05ms ± 0% ~ (p=0.306 n=17+17)
GoParse-12 3.14ms ± 1% 3.10ms ± 1% -1.31% (p=0.000 n=19+18)
RegexpMatchEasy0_32-12 69.3ns ± 1% 70.0ns ± 0% +0.89% (p=0.000 n=19+17)
RegexpMatchEasy0_1K-12 237ns ± 1% 236ns ± 0% -0.62% (p=0.000 n=19+16)
RegexpMatchEasy1_32-12 69.5ns ± 1% 70.3ns ± 1% +1.14% (p=0.000 n=18+17)
RegexpMatchEasy1_1K-12 377ns ± 1% 366ns ± 1% -3.03% (p=0.000 n=15+19)
RegexpMatchMedium_32-12 107ns ± 1% 107ns ± 2% ~ (p=0.318 n=20+19)
RegexpMatchMedium_1K-12 33.8µs ± 3% 33.5µs ± 1% -1.04% (p=0.001 n=20+19)
RegexpMatchHard_32-12 1.68µs ± 1% 1.73µs ± 0% +2.50% (p=0.000 n=20+18)
RegexpMatchHard_1K-12 50.8µs ± 1% 52.0µs ± 1% +2.50% (p=0.000 n=19+18)
Revcomp-12 381ms ± 1% 385ms ± 1% +1.00% (p=0.000 n=17+18)
Template-12 64.9ms ± 3% 62.6ms ± 1% -3.55% (p=0.000 n=19+18)
TimeParse-12 324ns ± 0% 328ns ± 1% +1.25% (p=0.000 n=18+18)
TimeFormat-12 345ns ± 0% 334ns ± 0% -3.31% (p=0.000 n=15+17)
[Geo mean] 52.1µs 51.5µs -1.00%
Change-Id: I13e74da3193a7f80794c654f944d1f0d60817049
Reviewed-on: https://go-review.googlesource.com/22632
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
heapBits.bits is carefully written to produce good machine code. Use
it in heapBits.morePointers and heapBits.isPointer to get good machine
code there, too.
Change-Id: I208c7d0d38697e7a22cad67f692162589b75f1e2
Reviewed-on: https://go-review.googlesource.com/22630
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This converts all remaining uses of mspan.start to instead use
mspan.base(). In many cases, this actually reduces the complexity of
the code.
Change-Id: If113840e00d3345a6cf979637f6a152e6344aee7
Reviewed-on: https://go-review.googlesource.com/22590
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Our compilers now provides instrinsics including
sys.Ctz64 that support CTZ (count trailing zero)
instructions. This CL replaces the Go versions
of CTZ with the compiler intrinsic.
Count trailing zeros CTZ finds the least
significant 1 in a word and returns the number
of less significant 0s in the word.
Allocation uses the bitmap created by the garbage
collector to locate an unmarked object. The logic
takes a word of the bitmap, complements, and then
caches it. It then uses CTZ to locate an available
unmarked object. It then shifts marked bits out of
the bitmap word preparing it for the next search.
Once all the unmarked objects are used in the
cached work the bitmap gets another word and
repeats the process.
Change-Id: Id2fc42d1d4b9893efaa2e1bd01896985b7e42f82
Reviewed-on: https://go-review.googlesource.com/21366
Reviewed-by: Austin Clements <austin@google.com>
Two changes are included here that are dependent on the other.
The first is that allocBits and gcamrkBits are changed to
a *uint8 which points to the first byte of that span's
mark and alloc bits. Several places were altered to
perform pointer arithmetic to locate the byte corresponding
to an object in the span. The actual bit corresponding
to an object is indexed in the byte by using the lower three
bits of the objects index.
The second change avoids the redundant calculation of an
object's index. The index is returned from heapBitsForObject
and then used by the functions indexing allocBits
and gcmarkBits.
Finally we no longer allocate the gc bits in the span
structures. Instead we use an arena based allocation scheme
that allows for a more compact bit map as well as recycling
and bulk clearing of the mark bits.
Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef
Reviewed-on: https://go-review.googlesource.com/20705
Reviewed-by: Austin Clements <austin@google.com>
Prior to this CL the base of a span was calculated in various
places using shifts or calls to base(). This CL now
always calls base() which has been optimized to calculate the
base of the span when the span is initialized and store that
value in the span structure.
Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100
Reviewed-on: https://go-review.googlesource.com/20703
Reviewed-by: Austin Clements <austin@google.com>
Prior to this CL the sweep phase was responsible for locating
all objects that were about to be freed and calling a function
to process the object. This was done by the function
heapBitsSweepSpan. Part of processing included calls to
tracefree and msanfree as well as counting how many objects
were freed.
The calls to tracefree and msanfree have been moved into the
gcmalloc routine and called when the object is about to be
reallocated. The counting of free objects has been optimized
using an array based popcnt algorithm and if all the objects
in a span are free then span is freed.
Similarly the code to locate the next free object has been
optimized to use an array based ctz (count trailing zero).
Various hot paths in the allocation logic have been optimized.
At this point the garbage benchmark is within 3% of the 1.6
release.
Change-Id: I00643c442e2ada1685c010c3447e4ea8537d2dfa
Reviewed-on: https://go-review.googlesource.com/20201
Reviewed-by: Austin Clements <austin@google.com>
Add to each span a 64 bit cache (allocCache) of the allocBits
at freeindex. allocCache is shifted such that the lowest bit
corresponds to the bit freeindex. allocBits uses a 0 to
indicate an object is free, on the other hand allocCache
uses a 1 to indicate an object is free. This facilitates
ctz64 (count trailing zero) which counts the number of 0s
trailing the least significant 1. This is also the index of
the least significant 1.
Each span maintains a freeindex indicating the boundary
between allocated objects and unallocated objects. allocCache
is shifted as freeindex is incremented such that the low bit
in allocCache corresponds to the bit a freeindex in the
allocBits array.
Currently ctz64 is written in Go using a for loop so it is
not very efficient. Use of the hardware instruction will
follow. With this in mind comparisons of the garbage
benchmark are as follows.
1.6 release 2.8 seconds
dev:garbage branch 3.1 seconds.
Profiling shows the go implementation of ctz64 takes up
1% of the total time.
Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f
Reviewed-on: https://go-review.googlesource.com/20200
Reviewed-by: Austin Clements <austin@google.com>
Most (all?) processors that Go supports supply a hardware
instruction that takes a byte and returns the number
of zeros trailing the first 1 encountered, or 8
if no ones are found. This is the index within the
byte of the first 1 encountered. CTZ should improve the
performance of the nextFreeIndex function.
Since nextFreeIndex wants the next unmarked (0) bit
a bit-wise complement is needed before calling ctz.
Furthermore unmarked bits associated with previously
allocated objects need to be ignored. Instead of writing
a 1 as we allocate the code masks all bits less than the
freeindex after loading the byte.
While this CL does not actual execute a CTZ instruction
it supplies a ctz function with the appropiate signature
along with the logic to execute it.
Change-Id: I5c55ce0ed48ca22c21c4dd9f969b0819b4eadaa7
Reviewed-on: https://go-review.googlesource.com/20169
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Instead of building a freelist from the mark bits generated
by the GC this CL allocates directly from the mark bits.
The approach moves the mark bits from the pointer/no pointer
heap structures into their own per span data structures. The
mark/allocation vectors consist of a single mark bit per
object. Two vectors are maintained, one for allocation and
one for the GC's mark phase. During the GC cycle's sweep
phase the interpretation of the vectors is swapped. The
mark vector becomes the allocation vector and the old
allocation vector is cleared and becomes the mark vector that
the next GC cycle will use.
Marked entries in the allocation vector indicate that the
object is not free. Each allocation vector maintains a boundary
between areas of the span already allocated from and areas
not yet allocated from. As objects are allocated this boundary
is moved until it reaches the end of the span. At this point
further allocations will be done from another span.
Since we no longer sweep a span inspecting each freed object
the responsibility for maintaining pointer/scalar bits in
the heapBitMap containing is now the responsibility of the
the routines doing the actual allocation.
This CL is functionally complete and ready for performance
tuning.
Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04
Reviewed-on: https://go-review.googlesource.com/19470
Reviewed-by: Austin Clements <austin@google.com>
The gcmarkBits is a bit vector used by the GC to mark
reachable objects. Once a GC cycle is complete the gcmarkBits
swap places with the allocBits. allocBits is then used directly
by malloc to locate free objects, thus avoiding the
construction of a linked free list. This CL introduces a set
of helper functions for manipulating gcmarkBits and allocBits
that will be used by later CLs to realize the actual
algorithm. Minimal attempts have been made to optimize these
helper routines.
Change-Id: I55ad6240ca32cd456e8ed4973c6970b3b882dd34
Reviewed-on: https://go-review.googlesource.com/19420
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
These are used at the bottom level of various GC operations that must
not be preempted. To be on the safe side, mark them all nosplit.
Change-Id: I8f7360e79c9852bd044df71413b8581ad764380c
Reviewed-on: https://go-review.googlesource.com/22504
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
This is a subset of https://golang.org/cl/20022 with only the copyright
header lines, so the next CL will be smaller and more reviewable.
Go policy has been single space after periods in comments for some time.
The copyright header template at:
https://golang.org/doc/contribute.html#copyright
also uses a single space.
Make them all consistent.
Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0
Reviewed-on: https://go-review.googlesource.com/20111
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>