mirror of https://github.com/golang/go synced 2024-11-19 16:34:49 -07:00
Commit Graph

160 Commits

Author SHA1 Message Date
Austin Clements
295d160e01 runtime: make _TinySizeClass an int8 to prevent use as spanClass
Currently _TinySizeClass is untyped, which means it can accidentally
be used as a spanClass (not that I would know this from experience or
anything). Make it an int8 to avoid this mix-up.
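
As a standalone illustration of what the typed constant buys (only the
constant's type and value come from the runtime; the rest is a sketch):

    package main

    type spanClass uint8

    // Typed as int8, the constant can no longer be passed silently where
    // a spanClass is expected; the old untyped constant would convert
    // without complaint.
    const _TinySizeClass int8 = 2

    func main() {
        var sc spanClass
        // sc = _TinySizeClass // now a compile error: cannot use int8 as spanClass
        _ = sc
    }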

This is a cherry-pick of dev.garbage commit 81b74bf9c5.

Change-Id: I1e69eccee436ea5aa45e9a9828a013e369e03f1a
Reviewed-on: https://go-review.googlesource.com/41254
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:39 +00:00
Austin Clements
8e25d4ccef runtime: eliminate heapBitsSetTypeNoScan
It's no longer necessary to maintain the bitmap of noscan objects
since we now use the span metadata to determine that they're noscan
instead of the bitmap.

The combined effect of segregating noscan spans and the follow-on
optimizations is roughly a 1% improvement in performance across the
go1 benchmarks and the x/benchmarks, with no increase in heap size.

Benchmark details: https://perf.golang.org/search?q=upload:20170420.1

name                       old time/op    new time/op    delta
Garbage/benchmem-MB=64-12    2.27ms ± 0%    2.25ms ± 1%  -0.96% (p=0.000 n=15+18)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.53s ± 2%     2.55s ± 1%  +0.68%        (p=0.001 n=17+16)
Fannkuch11-12                3.02s ± 0%     3.01s ± 0%  -0.15%        (p=0.000 n=16+16)
FmtFprintfEmpty-12          47.1ns ± 7%    47.0ns ± 5%    ~           (p=0.886 n=20+17)
FmtFprintfString-12         73.6ns ± 3%    73.8ns ± 1%  +0.30%        (p=0.026 n=19+17)
FmtFprintfInt-12            80.3ns ± 2%    80.2ns ± 1%    ~           (p=0.994 n=20+18)
FmtFprintfIntInt-12          124ns ± 0%     124ns ± 0%    ~     (all samples are equal)
FmtFprintfPrefixedInt-12     172ns ± 1%     171ns ± 1%  -0.72%        (p=0.003 n=20+18)
FmtFprintfFloat-12           217ns ± 1%     216ns ± 1%  -0.27%        (p=0.019 n=18+19)
FmtManyArgs-12               490ns ± 1%     488ns ± 0%  -0.36%        (p=0.014 n=18+18)
GobDecode-12                6.71ms ± 1%    6.73ms ± 1%  +0.42%        (p=0.000 n=20+20)
GobEncode-12                5.25ms ± 0%    5.24ms ± 0%  -0.20%        (p=0.001 n=18+20)
Gzip-12                      227ms ± 0%     226ms ± 1%    ~           (p=0.107 n=20+19)
Gunzip-12                   38.8ms ± 0%    38.8ms ± 0%    ~           (p=0.221 n=19+18)
HTTPClientServer-12         75.4µs ± 1%    76.3µs ± 1%  +1.26%        (p=0.000 n=20+19)
JSONEncode-12               14.7ms ± 0%    14.7ms ± 1%  -0.14%        (p=0.002 n=18+17)
JSONDecode-12               57.6ms ± 0%    55.2ms ± 0%  -4.13%        (p=0.000 n=19+19)
Mandelbrot200-12            3.73ms ± 0%    3.73ms ± 0%  -0.09%        (p=0.000 n=19+17)
GoParse-12                  3.18ms ± 1%    3.15ms ± 1%  -0.90%        (p=0.000 n=18+20)
RegexpMatchEasy0_32-12      73.3ns ± 2%    73.2ns ± 1%    ~           (p=0.994 n=20+18)
RegexpMatchEasy0_1K-12       236ns ± 2%     234ns ± 1%  -0.70%        (p=0.002 n=19+17)
RegexpMatchEasy1_32-12      69.7ns ± 2%    69.9ns ± 2%    ~           (p=0.416 n=20+20)
RegexpMatchEasy1_1K-12       366ns ± 1%     365ns ± 1%    ~           (p=0.376 n=19+17)
RegexpMatchMedium_32-12      109ns ± 1%     108ns ± 1%    ~           (p=0.461 n=17+18)
RegexpMatchMedium_1K-12     35.2µs ± 1%    35.2µs ± 3%    ~           (p=0.238 n=19+20)
RegexpMatchHard_32-12       1.77µs ± 1%    1.77µs ± 1%  +0.33%        (p=0.007 n=17+16)
RegexpMatchHard_1K-12       53.2µs ± 0%    53.3µs ± 0%  +0.26%        (p=0.001 n=17+17)
Revcomp-12                  1.13s ±117%    0.87s ±184%    ~           (p=0.813 n=20+19)
Template-12                 63.9ms ± 1%    64.6ms ± 1%  +1.18%        (p=0.000 n=19+20)
TimeParse-12                 313ns ± 5%     312ns ± 0%    ~           (p=0.114 n=20+19)
TimeFormat-12                336ns ± 0%     333ns ± 0%  -0.97%        (p=0.000 n=18+16)
[Geo mean]                  50.6µs         50.1µs       -1.04%

This is a cherry-pick of dev.garbage commit edb54c300f, with updated
benchmark results.

Change-Id: Ic77faaa15cdac3bfbbb0032dde5c204e05a0fd8e
Reviewed-on: https://go-review.googlesource.com/41253
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:37 +00:00
Austin Clements
1a033b1a70 runtime: separate spans of noscan objects
Currently, we mix objects with pointers and objects without pointers
("noscan" objects) together in memory. As a result, for every object
we grey, we have to check that object's heap bits to find out if it's
noscan, which adds to the per-object cost of GC. This also hurts the
TLB footprint of the garbage collector because it decreases the
density of scannable objects at the page level.

This commit improves the situation by using separate spans for noscan
objects. This will allow a much simpler noscan check (in a follow-up
CL), eliminate the need to clear the bitmap of noscan objects (in a
follow-up CL), and improve TLB footprint by increasing the density of
scannable objects.
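
The per-span metadata that makes the cheap noscan check possible is a
small bit-packed type; a sketch of the encoding, close to the
dev.garbage code but abridged:

    package main

    import "fmt"

    // A spanClass packs a span's size class and a noscan bit into one
    // byte: the size class in the upper seven bits, noscan in the lowest.
    type spanClass uint8

    func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
        sc := spanClass(sizeclass << 1)
        if noscan {
            sc |= 1
        }
        return sc
    }

    func (sc spanClass) sizeclass() int8 { return int8(sc >> 1) }
    func (sc spanClass) noscan() bool    { return sc&1 != 0 }

    func main() {
        sc := makeSpanClass(5, true)
        fmt.Println(sc.sizeclass(), sc.noscan()) // 5 true
    }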

This is also a step toward eliminating dead bits, since the current
noscan check depends on checking the dead bit of the first word.

This has no effect on the heap size of the garbage benchmark.

We'll measure the performance change of this after the follow-up
optimizations.

This is a cherry-pick from dev.garbage commit d491e550c3. The only
non-trivial merge conflict was in updatememstats in mstats.go, where
we now have to separate the per-spanclass stats from the per-sizeclass
stats.

Change-Id: I13bdc4869538ece5649a8d2a41c6605371618e40
Reviewed-on: https://go-review.googlesource.com/41251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:31 +00:00
Austin Clements
bb6309cd63 runtime: inform arena placement using sbrk(0)
On 32-bit architectures (or if we fail to map a 64-bit-style arena),
we try to map the heap arena just above the end of the process image.
While we can accept any address, using lower addresses is preferable
because lower addresses cause us to map less of the heap bitmap.

However, if a program is linked against C code that has global
constructors, those constructors may call brk/sbrk to allocate memory
(e.g., many C malloc implementations do this for small allocations).
The brk also starts just above the process image, so this may adjust
the brk past the beginning of where we want to put the heap arena. In
this case, the kernel will pick a different address for the arena and
it will usually be very high (at least, as these things go in a 32-bit
address space).

Fix this by consulting the current value of the brk and using this in
addition to the end of the process image to compute the initial arena
placement.
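
For illustration, the current break can be queried from Go on Linux
roughly like this (the runtime itself issues a raw brk syscall during
init; this sketch uses the syscall package):

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        // brk(0) is an invalid request, so the kernel leaves the break
        // unchanged and returns its current value, like sbrk(0) in C.
        brk, _, _ := syscall.Syscall(syscall.SYS_BRK, 0, 0, 0)
        fmt.Printf("current program break: %#x\n", brk)
    }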

This is implemented only on Linux currently, since we have no evidence
that it's an issue on any other OSes.

Fixes #19831.

Change-Id: Id64b45d08d8c91e4f50d92d0339146250b04f2f8
Reviewed-on: https://go-review.googlesource.com/39810
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-21 14:34:10 +00:00
Austin Clements
6c6f455f88 runtime: consolidate changes to arena_used
Changing mheap_.arena_used requires several steps that are currently
repeated multiple times in mheap_.sysAlloc. Consolidate these into a
single function.

In the future, this will also make it easier to add other auxiliary VM
structures.

Change-Id: Ie68837d2612e1f4ba4904acb1b6b832b15431d56
Reviewed-on: https://go-review.googlesource.com/40151
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-11 01:35:47 +00:00
Austin Clements
29be3f1999 runtime: generalize GC trigger
Currently the GC triggering condition is an awkward combination of the
gcMode (whether or not it's gcBackgroundMode) and a boolean
"forceTrigger" flag.

Replace this with a new gcTrigger type that represents the range of
transition predicates we need. This has several advantages:

1. We can remove the awkward logic that affects the trigger behavior
   based on the gcMode. Now gcMode purely controls whether to run a
   STW GC or not and the gcTrigger controls whether this is a forced
   GC that cannot be consolidated with other GC cycles.

2. We can lift the time-based triggering logic in sysmon to just
   another type of GC trigger and move the logic to the trigger test.

3. This sets us up to have a cycle count-based trigger, which we'll
   use to make runtime.GC trigger concurrent GC with the desired
   consolidation properties.
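
A sketch of the resulting shape (names follow the CL, but the test here
is simplified to take its inputs as parameters instead of reading
runtime state):

    package main

    import (
        "fmt"
        "time"
    )

    // A gcTriggerKind selects one of the transition predicates.
    type gcTriggerKind int

    const (
        gcTriggerAlways gcTriggerKind = iota // unconditionally start a (forced) cycle
        gcTriggerHeap                        // the live heap has reached the trigger size
        gcTriggerTime                        // too long since the last cycle
    )

    // A gcTrigger bundles the predicate for starting a GC cycle, so the
    // decision lives in one place instead of being spread across gcMode
    // checks and a forceTrigger flag.
    type gcTrigger struct {
        kind gcTriggerKind
        now  int64 // for gcTriggerTime
    }

    func (t gcTrigger) test(heapLive, trigger uint64, lastGC int64) bool {
        const forcegcperiod int64 = 2 * 60 * 1e9 // 2 minutes, as in the runtime's forcegcperiod
        switch t.kind {
        case gcTriggerAlways:
            return true
        case gcTriggerHeap:
            return heapLive >= trigger
        case gcTriggerTime:
            return lastGC != 0 && t.now-lastGC > forcegcperiod
        }
        return false
    }

    func main() {
        t := gcTrigger{kind: gcTriggerTime, now: time.Now().UnixNano()}
        fmt.Println(t.test(0, 0, 1)) // true: far more than forcegcperiod has passed
    }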

For #18216.

Change-Id: If9cd49349579a548800f5022ae47b8128004bbfc
Reviewed-on: https://go-review.googlesource.com/37516
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:06 +00:00
Keith Randall
d5dc490519 cmd/compile: intrinsics for math/bits.TrailingZerosX
Implement math/bits.TrailingZerosX using intrinsics.

Generally reorganize the intrinsic spec a bit.
The intrinsics data structure is now built at init time.
This will make doing the other functions in math/bits easier.

Update sys.CtzX to return int instead of uint{64,32} so it
matches math/bits.TrailingZerosX.

Improve the intrinsics a bit for amd64.  We don't need the CMOV
for <64 bit versions.
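
For example, after this CL the math/bits calls below compile to
CTZ-style instructions on amd64 instead of function calls:

    package main

    import (
        "fmt"
        "math/bits"
    )

    func main() {
        fmt.Println(bits.TrailingZeros32(0x80))    // 7
        fmt.Println(bits.TrailingZeros64(1 << 40)) // 40
    }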

Update #18616

Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39
Reviewed-on: https://go-review.googlesource.com/38155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 02:44:16 +00:00
Sokolov Yura
663226d8e1 runtime: make fastrand to generate 32bit values
Extend the period of fastrand from (1<<31)-1 to (1<<32)-1 by
choosing a different polynomial and reacting to the high bit before
shifting.

The polynomial is taken from 32.dat.gz at
https://users.ece.cmu.edu/~koopman/lfsr/index.html. It is referred to
as F7711115 because that list of polynomials is for LFSRs that shift
to the right (fastrand shifts to the left). (The old polynomial is
referred to in 31.dat.gz as 7BB88888.)

There were a couple of places that converted fastrand to int, which
led to negative values on 32-bit platforms. They are fixed.
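
A user-space sketch of the scheme (the constant is F7711115 reflected
for a left-shifting LFSR; in the runtime the state is per-M rather than
a global):

    package main

    import "fmt"

    var state uint32 = 1 // must be non-zero; lives per-M in the runtime

    // fastrand steps a 32-bit Galois LFSR to the left. The high bit is
    // examined before the shift (afterwards it would be lost); when it
    // is set, the arithmetic shift produces an all-ones mask and the
    // state is XORed with the reflected polynomial.
    func fastrand() uint32 {
        mask := uint32(int32(state) >> 31) // all ones iff the high bit is set
        state = state<<1 ^ mask&0xa8888eef
        return state
    }

    func main() {
        for i := 0; i < 4; i++ {
            fmt.Printf("%#08x\n", fastrand())
        }
    }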

Change-Id: Ibee518a3f9103e0aea220ada494b3aec77babb72
Reviewed-on: https://go-review.googlesource.com/36875
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-13 20:22:02 +00:00
Austin Clements
4af6b81d41 runtime: fix confusion between _MaxMem and _MaxArena32
Currently both _MaxMem and _MaxArena32 represent the maximum arena
size on 32-bit hosts (except on MIPS32 where _MaxMem is confusingly
smaller than _MaxArena32).

Clean up sysAlloc so that it always uses _MaxMem, which is the maximum
arena size on both 32- and 64-bit architectures and is the arena size
we allocate auxiliary structures for. This lets us simplify and unify
some code paths and eliminate _MaxArena32.

Fixes #18651. mheap.sysAlloc currently assumes that if the arena is
small, we must be on a 32-bit machine and can therefore grow the arena
to _MaxArena32. This breaks down on darwin/arm64, where _MaxMem is
only 2 GB. As a result, on darwin/arm64, we only reserve spans and
bitmap space for a 2 GB heap, and if the application tries to allocate
beyond that, sysAlloc takes the 32-bit path, tries to grow the arena
beyond 2 GB, and panics when it tries to grow the spans array
allocation past its reserved size. This has probably been a problem
for several releases now, but was only noticed recently because
mapSpans didn't check the bounds on the span reservation until
recently. Most likely it corrupted the bitmap before. By using _MaxMem
consistently, we avoid thinking that we can grow the arena larger than
we have auxiliary structures for.

Change-Id: Ifef28cb746a3ead4b31c1d7348495c2242fef520
Reviewed-on: https://go-review.googlesource.com/35253
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-07 18:39:18 +00:00
Austin Clements
1cc24690b8 runtime: simplify and cleanup mallocinit
mallocinit has evolved organically. Make a pass to clean it up in
various ways:

1. Merge the computation of spansSize and bitmapSize. These were
   computed on every loop iteration of two different loops, but always
   have the same value, which can be derived directly from _MaxMem.
   This also avoids over-reserving these on MIPS, where _MaxArena32 is
   larger than _MaxMem.

2. Remove the ulimit -v logic. It's been disabled for many releases
   and the dead code paths to support it are even more wrong now than
   they were when it was first disabled, since now we *must* reserve
   spans and bitmaps for the full address space.

3. Make it clear that we're using a simple linear allocation to lay
   out the spans, bitmap, and arena spaces. Previously there were a
   lot of redundant pointer computations. Now we just bump p1 up as we
   reserve the spaces.

In preparation for #18651.

Updates #5049 (respect ulimit).

Change-Id: Icbe66570d3a7a17bea227dc54fb3c4978b52a3af
Reviewed-on: https://go-review.googlesource.com/35252
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-07 18:39:15 +00:00
Austin Clements
efb5eae3cf runtime: make _MaxMem an untyped constant
Currently _MaxMem is a uintptr, which is going to complicate some
further changes. Make it untyped so we'll be able to do untyped math
on it before truncating it to a uintptr.

The runtime assembly is identical before and after this change on
{linux,windows}/{amd64,386}.

Updates #18651.

Change-Id: I0f64511faa9e0aa25179a556ab9f185ebf8c9cf8
Reviewed-on: https://go-review.googlesource.com/35251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-02-07 18:39:12 +00:00
Austin Clements
7aefdfded0 runtime: use 4K as the boundary of legal pointers
Currently, the check for legal pointers in stack copying uses
_PageSize (8K) as the minimum legal pointer. By default, Linux won't
let you map under 64K, but

1) it's less clear what other OSes allow or will allow in the future;

2) while mapping the first page is a terrible idea, mapping anywhere
above that is arguably more justifiable;

3) the compiler only assumes the first physical page (4K) is never
mapped.

Make the runtime consistent with the compiler and more robust by
changing the bad pointer check to use 4K as the minimum legal pointer.

This came out of discussions on CLs 34663 and 34719.

Change-Id: Idf721a788bd9699fb348f47bdd083cf8fa8bd3e5
Reviewed-on: https://go-review.googlesource.com/34890
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-01-06 16:19:14 +00:00
Vladimir Stefanovic
272032d0b2 runtime: add support files for linux/mips{,le} port
Only the exe buildmode, without cgo, is supported.

Change-Id: Id104a79a99d3285c04db00fd98b8affa94ea3c37
Reviewed-on: https://go-review.googlesource.com/31487
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-11-15 21:49:01 +00:00
Keith Randall
7ba36f4adb runtime: compute size classes statically
No point in computing this info on startup.
Compute it at build time.
This lets us spend more time computing & checking the size classes.

Improve the div magic for rounding to the start of an object.
We can now use 32-bit multiplies & shifts, which should help
32-bit platforms.
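
A sketch of the reciprocal-multiply idea behind the div magic (the size
and the brute-force search are illustrative; the real magic numbers are
generated with the size class tables at build time):

    package main

    import "fmt"

    func main() {
        const size = 96   // an example object size
        const span = 8192 // offsets within one span page
        // Find magic and shift such that off/size == off*magic>>shift
        // for all offsets in the span, so the hot path needs no division.
        for shift := uint(1); shift < 32; shift++ {
            magic := (uint64(1)<<shift + size - 1) / size // ceil(2^shift / size)
            ok := true
            for off := uint64(0); off < span; off++ {
                if off/size != off*magic>>shift {
                    ok = false
                    break
                }
            }
            if ok {
                fmt.Printf("off/%d == off*%d>>%d for off < %d\n", size, magic, shift, span)
                return
            }
        }
    }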

The static data is <1KB.

The actual size classes are not changed by this CL.

Change-Id: I6450cec7d1b2b4ad31fd3f945f504ed2ec6570e7
Reviewed-on: https://go-review.googlesource.com/32219
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-10-30 03:48:49 +00:00
Austin Clements
87e48c5afd runtime, cmd/compile: rename memclr -> memclrNoHeapPointers
Since barrier-less memclr is only safe in very narrow circumstances,
this commit renames memclr to avoid accidentally calling memclr on
typed memory. This can cause subtle, non-deterministic bugs, so it's
worth some effort to prevent. In the near term, this will also prevent
bugs creeping in from any concurrent CLs that add calls to memclr; if
this happens, whichever patch hits master second will fail to compile.

This also adds the other new memclr variants to the compiler's
builtin.go to minimize the churn on that binary blob. We'll use these
in future commits.

Updates #17503.

Change-Id: I00eead049f5bd35ca107ea525966831f3d1ed9ca
Reviewed-on: https://go-review.googlesource.com/31369
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:20:33 +00:00
Austin Clements
ae3bb4a537 runtime: make fixalloc zero allocations on reuse
Currently fixalloc does not zero memory it reuses. This is dangerous
with the hybrid barrier if the type may contain heap pointers, since
it may cause us to observe a dead heap pointer on reuse. It's also
error-prone since it's the only allocator that doesn't zero on
allocation (mallocgc of course zeroes, but so do persistentalloc and
sysAlloc). It's also largely pointless: for mcache, the caller
immediately memclrs the allocation; and the two specials types are
tiny so there's no real cost to zeroing them.

Change fixalloc to zero allocations by default.

The only type we don't zero by default is mspan. This actually
requires that the span's sweepgen survive across freeing and
reallocating a span. If we were to zero it, the following race would
be possible:

1. The current sweepgen is 2. Span s is on the unswept list.

2. Direct sweeping sweeps span s, finds it's all free, and releases s
   to the fixalloc.

3. Thread 1 allocates s from fixalloc. Suppose this zeros s, including
   s.sweepgen.

4. Thread 1 calls s.init, which sets s.state to _MSpanDead.

5. On thread 2, background sweeping comes across span s in allspans
   and cas's s.sweepgen from 0 (sg-2) to 1 (sg-1). Now it thinks it
   owns it for sweeping.

6. Thread 1 continues initializing s. Everything breaks.

I would like to fix this because it's obviously confusing, but it's a
subtle enough problem that I'm leaving it alone for now. The solution
may be to skip sweepgen 0, but then we have to think about wrap-around
much more carefully.

Updates #17503.

Change-Id: Ie08691feed3abbb06a31381b94beb0a2e36a0613
Reviewed-on: https://go-review.googlesource.com/31368
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:20:23 +00:00
Austin Clements
6b0f668044 runtime: consolidate h_spans and mheap_.spans
Like h_allspans and mheap_.allspans, these were two ways of referring
to the spans array from when the runtime was split between C and Go.
Clean this up by making mheap_.spans a slice and eliminating h_spans.

Change-Id: I3aa7038d53c3a4252050aa33e468c48dfed0b70e
Reviewed-on: https://go-review.googlesource.com/30532
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:48 +00:00
Austin Clements
1bc6be6423 runtime: mark several types go:notinheap
This covers basically all sysAlloc'd, persistentalloc'd, and
fixalloc'd types.

Change-Id: I0487c887c2a0ade5e33d4c4c12d837e97468e66b
Reviewed-on: https://go-review.googlesource.com/30941
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-15 17:58:20 +00:00
Austin Clements
aaf4099a5c runtime: update malloc.go documentation
The big documentation comment at the top of malloc.go has gotten
woefully out of date. Update it.

Change-Id: Ibdb1bdcfdd707a6dc9db79d0633a36a28882301b
Reviewed-on: https://go-review.googlesource.com/29731
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-26 22:00:53 +00:00
Austin Clements
6dda7b2f5f runtime: don't hard-code physical page size
Now that the runtime fetches the true physical page size from the OS,
make the physical page size used by heap growth a variable instead of
a constant. This isn't used in any performance-critical paths, so it
shouldn't be an issue.

sys.PhysPageSize is also renamed to sys.DefaultPhysPageSize to make it
clear that it's not necessarily the true page size. There are no uses
of this constant any more, but we'll keep it around for now.

Updates #12480 and #10180.

Change-Id: I6c23b9df860db309c38c8287a703c53817754f03
Reviewed-on: https://go-review.googlesource.com/25022
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-06 21:05:53 +00:00
Austin Clements
276a52de55 runtime: fetch physical page size from the OS
Currently the physical page size assumed by the runtime is hard-coded.
On Linux the runtime at least fetches the OS page size during init and
sanity checks against the hard-coded value, but they may still differ.
On other OSes we wouldn't even notice.

Add support on all OSes to fetch the actual OS physical page size
during runtime init and lift the sanity check of PhysPageSize from the
Linux init code to general malloc init. Currently this is the only use
of the retrieved page size, but we'll add more shortly.

Updates #12480 and #10180.

Change-Id: I065f2834bc97c71d3208edc17fd990ec9058b6da
Reviewed-on: https://go-review.googlesource.com/25050
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-06 21:05:50 +00:00
Josh Bleecher Snyder
2b74de3ed9 runtime: rename fastrand1 to fastrand
Change-Id: I37706ff0a3486827c5b072c95ad890ea87ede847
Reviewed-on: https://go-review.googlesource.com/28210
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-30 23:59:21 +00:00
Cherry Zhang
b2e0e9688a cmd/compile: remove Zero and NilCheck for newobject
Recognize runtime.newobject and don't Zero or NilCheck it.

Fixes #15914 (?)
Updates #15390.

TBD: add test

Change-Id: Ia3bfa5c2ddbe2c27c92d9f68534a713b5ce95934
Reviewed-on: https://go-review.googlesource.com/27930
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-30 23:10:43 +00:00
Dmitry Vyukov
14e5951166 runtime: increase malloc size classes
When we calculate class sizes, in some cases we discard considerable
amounts of memory without an apparent reason. For example, we choose
size 8448 with 6 objects in 7 pages. But we could just as well use
object size 9472, which is also 6 objects in 7 pages but +1024 bytes
(+12.12%).

Increase class sizes to the max value that leads to the same
page count/number of objects. Full list of affected size classes:

class 36: pages: 2 size: 1664->1792 +128 (7.69%)
class 39: pages: 1 size: 2560->2688 +128 (5.0%)
class 40: pages: 3 size: 2816->3072 +256 (9.9%)
class 41: pages: 2 size: 3072->3200 +128 (4.16%)
class 42: pages: 3 size: 3328->3456 +128 (3.84%)
class 44: pages: 3 size: 4608->4864 +256 (5.55%)
class 47: pages: 4 size: 6400->6528 +128 (2.0%)
class 48: pages: 5 size: 6656->6784 +128 (1.92%)
class 51: pages: 7 size: 8448->9472 +1024 (12.12%)
class 52: pages: 6 size: 8704->9728 +1024 (11.76%)
class 53: pages: 5 size: 9472->10240 +768 (8.10%)
class 54: pages: 4 size: 10496->10880 +384 (3.65%)
class 57: pages: 7 size: 14080->14336 +256 (1.81%)
class 59: pages: 9 size: 16640->18432 +1792 (10.76%)
class 60: pages: 7 size: 17664->19072 +1408 (7.97%)
class 62: pages: 8 size: 21248->21760 +512 (2.40%)
class 64: pages: 10 size: 24832->27264 +2432 (9.79%)
class 65: pages: 7 size: 28416->28672 +256 (0.90%)
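
The rule behind the list above can be sketched as follows (pageSize
8192; the alignment argument stands in for the per-size-range alignment
the generator uses; helper name illustrative):

    package main

    import "fmt"

    // grow returns the largest size that still fits the same number of
    // objects in the same number of pages, rounded down to the class's
    // alignment.
    func grow(pages, size, align uintptr) uintptr {
        const pageSize = 8192
        objects := pages * pageSize / size
        return (pages * pageSize / objects) &^ (align - 1)
    }

    func main() {
        // class 51 above: 7 pages, size 8448, 128-byte alignment -> 9472
        fmt.Println(grow(7, 8448, 128))
    }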

name                      old time/op    new time/op    delta
BinaryTree17-12              2.59s ± 5%     2.52s ± 4%    ~     (p=0.132 n=6+6)
Fannkuch11-12                2.13s ± 3%     2.17s ± 3%    ~     (p=0.180 n=6+6)
FmtFprintfEmpty-12          47.0ns ± 3%    46.6ns ± 1%    ~     (p=0.355 n=6+5)
FmtFprintfString-12          131ns ± 0%     131ns ± 1%    ~     (p=0.476 n=4+6)
FmtFprintfInt-12             121ns ± 6%     122ns ± 2%    ~     (p=0.511 n=6+6)
FmtFprintfIntInt-12          182ns ± 2%     186ns ± 1%  +2.20%  (p=0.015 n=6+6)
FmtFprintfPrefixedInt-12     184ns ± 5%     181ns ± 2%    ~     (p=0.645 n=6+6)
FmtFprintfFloat-12           272ns ± 7%     265ns ± 1%    ~     (p=1.000 n=6+5)
FmtManyArgs-12               783ns ± 2%     802ns ± 2%  +2.38%  (p=0.017 n=6+6)
GobDecode-12                7.04ms ± 4%    7.00ms ± 2%    ~     (p=1.000 n=6+6)
GobEncode-12                6.36ms ± 6%    6.17ms ± 6%    ~     (p=0.240 n=6+6)
Gzip-12                      242ms ±14%     233ms ± 7%    ~     (p=0.310 n=6+6)
Gunzip-12                   36.6ms ±22%    36.0ms ± 9%    ~     (p=0.841 n=5+5)
HTTPClientServer-12         93.1µs ±29%    88.0µs ±32%    ~     (p=0.240 n=6+6)
JSONEncode-12               27.1ms ±39%    26.2ms ±35%    ~     (p=0.589 n=6+6)
JSONDecode-12               71.7ms ±36%    71.5ms ±36%    ~     (p=0.937 n=6+6)
Mandelbrot200-12            4.78ms ±10%    4.70ms ±16%    ~     (p=0.394 n=6+6)
GoParse-12                  4.86ms ±34%    4.95ms ±36%    ~     (p=1.000 n=6+6)
RegexpMatchEasy0_32-12       110ns ±37%     110ns ±36%    ~     (p=0.660 n=6+6)
RegexpMatchEasy0_1K-12       240ns ±38%     234ns ±47%    ~     (p=0.554 n=6+6)
RegexpMatchEasy1_32-12      77.2ns ± 2%    77.2ns ±10%    ~     (p=0.699 n=6+6)
RegexpMatchEasy1_1K-12       337ns ± 5%     331ns ± 4%    ~     (p=0.552 n=6+6)
RegexpMatchMedium_32-12      125ns ±13%     132ns ±26%    ~     (p=0.561 n=6+6)
RegexpMatchMedium_1K-12     35.9µs ± 3%    36.1µs ± 5%    ~     (p=0.818 n=6+6)
RegexpMatchHard_32-12       1.81µs ± 4%    1.82µs ± 5%    ~     (p=0.452 n=5+5)
RegexpMatchHard_1K-12       52.4µs ± 2%    54.4µs ± 3%  +3.84%  (p=0.002 n=6+6)
Revcomp-12                   401ms ± 2%     390ms ± 1%  -2.82%  (p=0.002 n=6+6)
Template-12                 54.5ms ± 3%    54.6ms ± 1%    ~     (p=0.589 n=6+6)
TimeParse-12                 294ns ± 1%     298ns ± 2%    ~     (p=0.160 n=6+6)
TimeFormat-12                323ns ± 4%     318ns ± 5%    ~     (p=0.297 n=6+6)

name                      old speed      new speed      delta
GobDecode-12               109MB/s ± 4%   110MB/s ± 2%    ~     (p=1.000 n=6+6)
GobEncode-12               121MB/s ± 6%   125MB/s ± 6%    ~     (p=0.240 n=6+6)
Gzip-12                   80.4MB/s ±12%  83.3MB/s ± 7%    ~     (p=0.310 n=6+6)
Gunzip-12                  495MB/s ±41%   541MB/s ± 9%    ~     (p=0.931 n=6+5)
JSONEncode-12             80.7MB/s ±39%  82.8MB/s ±34%    ~     (p=0.589 n=6+6)
JSONDecode-12             30.4MB/s ±40%  31.0MB/s ±37%    ~     (p=0.937 n=6+6)
GoParse-12                13.2MB/s ±33%  13.2MB/s ±35%    ~     (p=1.000 n=6+6)
RegexpMatchEasy0_32-12     321MB/s ±34%   326MB/s ±34%    ~     (p=0.699 n=6+6)
RegexpMatchEasy0_1K-12    4.49GB/s ±31%  4.74GB/s ±37%    ~     (p=0.589 n=6+6)
RegexpMatchEasy1_32-12     414MB/s ± 2%   415MB/s ± 9%    ~     (p=0.699 n=6+6)
RegexpMatchEasy1_1K-12    3.03GB/s ± 5%  3.09GB/s ± 4%    ~     (p=0.699 n=6+6)
RegexpMatchMedium_32-12   7.99MB/s ±12%  7.68MB/s ±22%    ~     (p=0.589 n=6+6)
RegexpMatchMedium_1K-12   28.5MB/s ± 3%  28.4MB/s ± 5%    ~     (p=0.818 n=6+6)
RegexpMatchHard_32-12     17.7MB/s ± 4%  17.0MB/s ±15%    ~     (p=0.351 n=5+6)
RegexpMatchHard_1K-12     19.6MB/s ± 2%  18.8MB/s ± 3%  -3.67%  (p=0.002 n=6+6)
Revcomp-12                 634MB/s ± 2%   653MB/s ± 1%  +2.89%  (p=0.002 n=6+6)
Template-12               35.6MB/s ± 3%  35.5MB/s ± 1%    ~     (p=0.615 n=6+6)

Change-Id: I465a47f74227f316e3abea231444f48c7a30ef85
Reviewed-on: https://go-review.googlesource.com/24493
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-08-19 21:24:28 +00:00
Austin Clements
d8b08c3aa4 runtime: perform publication barrier even for noscan objects
Currently we only execute a publication barrier for scan objects (and
skip it for noscan objects). This used to be okay because GC would
never consult the object itself (so it wouldn't observe uninitialized
memory even if it found a pointer to a noscan object), and the heap
bitmap was pre-initialized to noscan.

However, now we explicitly initialize the heap bitmap for noscan
objects when we allocate them. While the GC will still never consult
the contents of a noscan object, it does need to see the initialized
heap bitmap. Hence, we need to execute a publication barrier to make
the bitmap visible before user code can expose a pointer to the newly
allocated object even for noscan objects.

Change-Id: Ie4133c638db0d9055b4f7a8061a634d970627153
Reviewed-on: https://go-review.googlesource.com/23043
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-05-14 13:49:51 +00:00
Elias Naur
e6ec82067a runtime: use entire address space on 32 bit
In issue #13992, Russ mentioned that the heap bitmap footprint was
halved but that the bitmap size calculation hadn't been updated. This
presents the opportunity to either halve the bitmap size or double
the addressable virtual space. This CL doubles the addressable virtual
space. On 32 bit this can be tweaked further to allow the bitmap to
cover the entire 4GB virtual address space, removing a failure mode
if the kernel hands out memory with a too low address.

First, fix the calculation and double _MaxArena32 to cover 4GB virtual
memory space with the same bitmap size (256 MB).

Then, allow the fallback mode for the initial memory reservation
on 32 bit (or 64 bit with too little available virtual memory) to not
include space for the arena. mheap.sysAlloc will automatically reserve
additional space when the existing arena is full.

Finally, set arena_start to 0 in 32 bit mode, so that any address is
acceptable for subsequent (additional) reservations.

Before, the bitmap was always located just before arena_start, so
fix the two places relying on that assumption: Point the otherwise unused
mheap.bitmap to one byte after the end of the bitmap, and use it for
bitmap addressing instead of arena_start.

With arena_start set to 0 on 32 bit, the cgoInRange check is no longer a
sufficient check for Go pointers. Introduce and call inHeapOrStack to
check whether a pointer is to the Go heap or stack.

While we're here, remove sysReserveHigh which seems to be unused.

Fixes #13992

Change-Id: I592b513148a50b9d3967b5c5d94b86b3ec39acc2
Reviewed-on: https://go-review.googlesource.com/20471
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-07 03:04:39 +00:00
Austin Clements
a20fd1f6ba runtime: reclaim scan/dead bit in first word
With the switch to separate mark bitmaps, the scan/dead bit for the
first word of each object is now unused. Reclaim this bit and use it
as a scan/dead bit, just like words three and on. The second word is
still used for checkmark.

This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
since they no longer need different cases for 1, 2, and 3+ word
objects. They can instead just manipulate the heap bitmap for the
first word and be done with it.

In order to enable this, we change heapBitsSetType and runGCProg to
always set the scan/dead bit to scan for the first word on every code
path. Since these functions only apply to types that have pointers,
there's no need to do this conditionally: it's *always* necessary to
set the scan bit in the first word.

We also change every place that scans an object and checks if there
are more pointers. Rather than only checking morePointers if the word
is >= 2, we now check morePointers if word != 1 (since that's the
checkmark word).

Looking forward, we should probably reclaim the checkmark bit, too,
but that's going to be quite a bit more work.

Tested by setting doubleCheck in heapBitsSetType and running all.bash
on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.

This particularly improves the FmtFprintf* go1 benchmarks, since they
do a large amount of noscan allocation.

name                      old time/op    new time/op    delta
BinaryTree17-12              2.34s ± 1%     2.38s ± 1%  +1.70%  (p=0.000 n=17+19)
Fannkuch11-12                2.09s ± 0%     2.09s ± 1%    ~     (p=0.276 n=17+16)
FmtFprintfEmpty-12          44.9ns ± 2%    44.8ns ± 2%    ~     (p=0.340 n=19+18)
FmtFprintfString-12          127ns ± 0%     125ns ± 0%  -1.57%  (p=0.000 n=16+15)
FmtFprintfInt-12             128ns ± 0%     122ns ± 1%  -4.45%  (p=0.000 n=15+20)
FmtFprintfIntInt-12          207ns ± 1%     193ns ± 0%  -6.55%  (p=0.000 n=19+14)
FmtFprintfPrefixedInt-12     197ns ± 1%     191ns ± 0%  -2.93%  (p=0.000 n=17+18)
FmtFprintfFloat-12           263ns ± 0%     248ns ± 1%  -5.88%  (p=0.000 n=15+19)
FmtManyArgs-12               794ns ± 0%     779ns ± 1%  -1.90%  (p=0.000 n=18+18)
GobDecode-12                7.14ms ± 2%    7.11ms ± 1%    ~     (p=0.072 n=20+20)
GobEncode-12                5.85ms ± 1%    5.82ms ± 1%  -0.49%  (p=0.000 n=20+20)
Gzip-12                      218ms ± 1%     215ms ± 1%  -1.22%  (p=0.000 n=19+19)
Gunzip-12                   36.8ms ± 0%    36.7ms ± 0%  -0.18%  (p=0.006 n=18+20)
HTTPClientServer-12         77.1µs ± 4%    77.1µs ± 3%    ~     (p=0.945 n=19+20)
JSONEncode-12               15.6ms ± 1%    15.9ms ± 1%  +1.68%  (p=0.000 n=18+20)
JSONDecode-12               55.2ms ± 1%    53.6ms ± 1%  -2.93%  (p=0.000 n=17+19)
Mandelbrot200-12            4.05ms ± 1%    4.05ms ± 0%    ~     (p=0.306 n=17+17)
GoParse-12                  3.14ms ± 1%    3.10ms ± 1%  -1.31%  (p=0.000 n=19+18)
RegexpMatchEasy0_32-12      69.3ns ± 1%    70.0ns ± 0%  +0.89%  (p=0.000 n=19+17)
RegexpMatchEasy0_1K-12       237ns ± 1%     236ns ± 0%  -0.62%  (p=0.000 n=19+16)
RegexpMatchEasy1_32-12      69.5ns ± 1%    70.3ns ± 1%  +1.14%  (p=0.000 n=18+17)
RegexpMatchEasy1_1K-12       377ns ± 1%     366ns ± 1%  -3.03%  (p=0.000 n=15+19)
RegexpMatchMedium_32-12      107ns ± 1%     107ns ± 2%    ~     (p=0.318 n=20+19)
RegexpMatchMedium_1K-12     33.8µs ± 3%    33.5µs ± 1%  -1.04%  (p=0.001 n=20+19)
RegexpMatchHard_32-12       1.68µs ± 1%    1.73µs ± 0%  +2.50%  (p=0.000 n=20+18)
RegexpMatchHard_1K-12       50.8µs ± 1%    52.0µs ± 1%  +2.50%  (p=0.000 n=19+18)
Revcomp-12                   381ms ± 1%     385ms ± 1%  +1.00%  (p=0.000 n=17+18)
Template-12                 64.9ms ± 3%    62.6ms ± 1%  -3.55%  (p=0.000 n=19+18)
TimeParse-12                 324ns ± 0%     328ns ± 1%  +1.25%  (p=0.000 n=18+18)
TimeFormat-12                345ns ± 0%     334ns ± 0%  -3.31%  (p=0.000 n=15+17)
[Geo mean]                  52.1µs         51.5µs       -1.00%

Change-Id: I13e74da3193a7f80794c654f944d1f0d60817049
Reviewed-on: https://go-review.googlesource.com/22632
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-30 16:49:54 +00:00
Rick Hudson
e9eaa181fc [dev.garbage] runtime: simplify nextFreeFast so it is inlined
nextFreeFast is currently not inlined by the compiler due
to its size and complexity. This CL simplifies
nextFreeFast by letting the slow path (nextFree)
handle the corner cases.

Change-Id: Ia9c5d1a7912bcb4bec072f5fd240f0e0bafb20e4
Reviewed-on: https://go-review.googlesource.com/22598
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2016-04-29 16:47:11 +00:00
Austin Clements
d97625ae9e [dev.garbage] runtime: fix nfree accounting
Commit 8dda1c4 changed the meaning of "nfree" in sweep from the number
of newly freed objects to the total number of free objects in the
span, but didn't update where sweep added nfree to c.local_nsmallfree.
Hence, we're over-accounting the number of frees. This is causing
TestArrayHash to fail with "too many allocs NNN - hash not balanced".

Fix this by computing the number of newly freed objects and adding
that to c.local_nsmallfree, so it behaves like it used to. Computing
this requires a small tweak to mallocgc: apparently we've never set
s.allocCount when allocating a large object; fix this by setting it to
1 so sweep doesn't get confused.

Change-Id: I31902ffd310110da4ffd807c5c06f1117b872dc8
Reviewed-on: https://go-review.googlesource.com/22595
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2016-04-29 15:25:26 +00:00
Austin Clements
6d11490539 [dev.garbage] runtime: fix allocfreetrace
We broke tracing of freed objects in GODEBUG=allocfreetrace=1 mode
when we removed the sweep over the mark bitmap. Fix it by
re-introducing the sweep over the bitmap specifically if we're in
allocfreetrace mode. This doesn't have to be even remotely efficient,
since the overhead of allocfreetrace is huge anyway, so we can keep
the code for this down to just a few lines.

Change-Id: I9e176b3b04c73608a0ea3068d5d0cd30760ebd40
Reviewed-on: https://go-review.googlesource.com/22592
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-29 15:08:21 +00:00
Austin Clements
38f674687a [dev.garbage] runtime: reintroduce no-zeroing optimization
Currently we always zero objects when we allocate them. We used to
have an optimization that would not zero objects that had not been
allocated since the whole span was last zeroed (either by getting it
from the system or by getting it from the heap, which does a bulk
zero), but this depended on the sweeper clobbering the first two words
of each object. Hence, we lost this optimization when the bitmap
sweeper went away.

Re-introduce this optimization using a different mechanism. Each span
already keeps a flag indicating that it just came from the OS or was
just bulk zeroed by the mheap. We can simply use this flag to know
when we don't need to zero an object. This is slightly less efficient
than the old optimization: if a span gets allocated and partially
used, GC happens, and the span gets returned to the mcentral and
later re-acquired, the old optimization knew that it only had
to re-zero the objects that had been reclaimed, whereas this
optimization will re-zero everything. However, in this case, you're
already paying for the garbage collection, and you've only wasted one
zeroing of the span, so in practice there seems to be little
difference. (If we did want to revive the full optimization, each span
could keep track of a frontier beyond which all free slots are zeroed.
I prototyped this and it didn't obviously do any better than the much
simpler approach in this commit.)

This significantly improves BinaryTree17, which is allocation-heavy
(and runs first, so most pages are already zeroed), and slightly
improves everything else.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.15ms ± 1%  2.14ms ± 1%  -0.80%  (p=0.000 n=17+17)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.71s ± 1%     2.56s ± 1%  -5.73%        (p=0.000 n=18+19)
DivconstI64-12              1.70ns ± 1%    1.70ns ± 1%    ~           (p=0.562 n=18+18)
DivconstU64-12              1.74ns ± 2%    1.74ns ± 1%    ~           (p=0.394 n=20+20)
DivconstI32-12              1.74ns ± 0%    1.74ns ± 0%    ~     (all samples are equal)
DivconstU32-12              1.66ns ± 1%    1.66ns ± 0%    ~           (p=0.516 n=15+16)
DivconstI16-12              1.84ns ± 0%    1.84ns ± 0%    ~     (all samples are equal)
DivconstU16-12              1.82ns ± 0%    1.82ns ± 0%    ~     (all samples are equal)
DivconstI8-12               1.79ns ± 0%    1.79ns ± 0%    ~     (all samples are equal)
DivconstU8-12               1.60ns ± 0%    1.60ns ± 1%    ~           (p=0.603 n=17+19)
Fannkuch11-12                2.11s ± 1%     2.11s ± 0%    ~           (p=0.333 n=16+19)
FmtFprintfEmpty-12          45.1ns ± 4%    45.4ns ± 5%    ~           (p=0.111 n=20+20)
FmtFprintfString-12          134ns ± 0%     129ns ± 0%  -3.45%        (p=0.000 n=18+16)
FmtFprintfInt-12             131ns ± 1%     129ns ± 1%  -1.54%        (p=0.000 n=16+18)
FmtFprintfIntInt-12          205ns ± 2%     203ns ± 0%  -0.56%        (p=0.014 n=20+18)
FmtFprintfPrefixedInt-12     200ns ± 2%     197ns ± 1%  -1.48%        (p=0.000 n=20+18)
FmtFprintfFloat-12           256ns ± 1%     256ns ± 0%  -0.21%        (p=0.008 n=18+20)
FmtManyArgs-12               805ns ± 0%     804ns ± 0%  -0.19%        (p=0.001 n=18+18)
GobDecode-12                7.21ms ± 1%    7.14ms ± 1%  -0.92%        (p=0.000 n=19+20)
GobEncode-12                5.88ms ± 1%    5.88ms ± 1%    ~           (p=0.641 n=18+19)
Gzip-12                      218ms ± 1%     218ms ± 1%    ~           (p=0.271 n=19+18)
Gunzip-12                   37.1ms ± 0%    36.9ms ± 0%  -0.29%        (p=0.000 n=18+17)
HTTPClientServer-12         78.1µs ± 2%    77.4µs ± 2%    ~           (p=0.070 n=19+19)
JSONEncode-12               15.5ms ± 1%    15.5ms ± 0%    ~           (p=0.063 n=20+18)
JSONDecode-12               56.1ms ± 0%    55.4ms ± 1%  -1.18%        (p=0.000 n=19+18)
Mandelbrot200-12            4.05ms ± 0%    4.06ms ± 0%  +0.29%        (p=0.001 n=18+18)
GoParse-12                  3.28ms ± 1%    3.21ms ± 1%  -2.30%        (p=0.000 n=20+20)
RegexpMatchEasy0_32-12      69.4ns ± 2%    69.3ns ± 1%    ~           (p=0.205 n=18+16)
RegexpMatchEasy0_1K-12       239ns ± 0%     239ns ± 0%    ~     (all samples are equal)
RegexpMatchEasy1_32-12      69.4ns ± 1%    69.4ns ± 1%    ~           (p=0.620 n=15+18)
RegexpMatchEasy1_1K-12       370ns ± 1%     369ns ± 2%    ~           (p=0.088 n=20+20)
RegexpMatchMedium_32-12      108ns ± 0%     108ns ± 0%    ~     (all samples are equal)
RegexpMatchMedium_1K-12     33.6µs ± 3%    33.5µs ± 3%    ~           (p=0.718 n=20+20)
RegexpMatchHard_32-12       1.68µs ± 1%    1.67µs ± 2%    ~           (p=0.316 n=20+20)
RegexpMatchHard_1K-12       50.5µs ± 3%    50.4µs ± 3%    ~           (p=0.659 n=20+20)
Revcomp-12                   381ms ± 1%     381ms ± 1%    ~           (p=0.916 n=19+18)
Template-12                 66.5ms ± 1%    65.8ms ± 2%  -1.08%        (p=0.000 n=20+20)
TimeParse-12                 317ns ± 0%     319ns ± 0%  +0.48%        (p=0.000 n=19+12)
TimeFormat-12                338ns ± 0%     338ns ± 0%    ~           (p=0.124 n=19+18)
[Geo mean]                  5.99µs         5.96µs       -0.54%

Change-Id: I638ffd9d9f178835bbfa499bac20bd7224f1a907
Reviewed-on: https://go-review.googlesource.com/22591
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-29 15:08:13 +00:00
Austin Clements
3e2462387f [dev.garbage] runtime: eliminate mspan.start
This converts all remaining uses of mspan.start to instead use
mspan.base(). In many cases, this actually reduces the complexity of
the code.

Change-Id: If113840e00d3345a6cf979637f6a152e6344aee7
Reviewed-on: https://go-review.googlesource.com/22590
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2016-04-29 03:53:17 +00:00
Austin Clements
2e8b74b695 [dev.garbage] runtime: document sysAlloc
In particular, it always returns an aligned pointer.

Change-Id: I763789a539a4bfd8b0efb36a39a80be1a479d3e2
Reviewed-on: https://go-review.googlesource.com/22558
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-29 03:53:12 +00:00
Rick Hudson
2fb75ea6c6 [dev.garbage] runtime: use sys.Ctz64 intrinsic
Our compiler now provides intrinsics, including
sys.Ctz64, that map to CTZ (count trailing zero)
instructions. This CL replaces the Go versions
of CTZ with the compiler intrinsic.

Count trailing zeros (CTZ) finds the least
significant 1 in a word and returns the number
of less significant 0s in the word.

Allocation uses the bitmap created by the garbage
collector to locate an unmarked object. The logic
takes a word of the bitmap, complements it, and then
caches it. It then uses CTZ to locate an available
unmarked object, and shifts the marked bits out of
the bitmap word, preparing it for the next search.
Once all the unmarked objects in the cached word
are used, the allocator fetches another word from
the bitmap and repeats the process.

Change-Id: Id2fc42d1d4b9893efaa2e1bd01896985b7e42f82
Reviewed-on: https://go-review.googlesource.com/21366
Reviewed-by: Austin Clements <austin@google.com>
2016-04-29 00:00:50 +00:00
Rick Hudson
2063d5d903 [dev.garbage] runtime: restructure alloc and mark bits
Two changes are included here that depend on each other.
The first is that allocBits and gcmarkBits are changed to
a *uint8 which points to the first byte of that span's
mark and alloc bits. Several places were altered to
perform pointer arithmetic to locate the byte corresponding
to an object in the span. The actual bit corresponding
to an object is indexed in the byte by using the lower three
bits of the object's index.
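
A sketch of that indexing (the unsafe arithmetic mirrors the
description above; names are illustrative):

    package main

    import (
        "fmt"
        "unsafe"
    )

    // bitOf locates the byte and mask for object index i in a bit vector
    // starting at base: i/8 selects the byte, and the lower three bits
    // of i select the bit within that byte.
    func bitOf(base *uint8, i uintptr) (bytep *uint8, mask uint8) {
        bytep = (*uint8)(unsafe.Pointer(uintptr(unsafe.Pointer(base)) + i/8))
        mask = 1 << (i % 8)
        return
    }

    func main() {
        bits := make([]uint8, 4)
        bytep, mask := bitOf(&bits[0], 13)
        *bytep |= mask                // mark object 13
        fmt.Printf("%08b\n", bits[1]) // bit 5 of byte 1: 00100000
    }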

The second change avoids the redundant calculation of an
object's index. The index is returned from heapBitsForObject
and then used by the functions indexing allocBits
and gcmarkBits.

Finally we no longer allocate the gc bits in the span
structures. Instead we use an arena based allocation scheme
that allows for a more compact bit map as well as recycling
and bulk clearing of the mark bits.

Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef
Reviewed-on: https://go-review.googlesource.com/20705
Reviewed-by: Austin Clements <austin@google.com>
2016-04-29 00:00:47 +00:00
Rick Hudson
23aeb34df1 [dev.garbage] Merge remote-tracking branch 'origin/master' into HEAD
Change-Id: I282fd9ce9db435dfd35e882a9502ab1abc185297
2016-04-27 18:46:52 -04:00
Rick Hudson
f8d0d4fd59 [dev.garbage] runtime: cleanup and optimize span.base()
Prior to this CL, the base of a span was calculated in various
places using shifts or calls to base(). This CL
always calls base(), which has been optimized to compute the
base of the span when the span is initialized and to store that
value in the span structure.

Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100
Reviewed-on: https://go-review.googlesource.com/20703
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:59 +00:00
Rick Hudson
8dda1c4c08 [dev.garbage] runtime: remove heapBitsSweepSpan
Prior to this CL the sweep phase was responsible for locating
all objects that were about to be freed and calling a function
to process the object. This was done by the function
heapBitsSweepSpan. Part of processing included calls to
tracefree and msanfree as well as counting how many objects
were freed.

The calls to tracefree and msanfree have been moved into the
mallocgc routine and are called when the object is about to be
reallocated. The counting of free objects has been optimized
using an array-based popcnt algorithm, and if all the objects
in a span are free then the span is freed.

Similarly the code to locate the next free object has been
optimized to use an array based ctz (count trailing zero).
Various hot paths in the allocation logic have been optimized.

At this point the garbage benchmark is within 3% of the 1.6
release.

Change-Id: I00643c442e2ada1685c010c3447e4ea8537d2dfa
Reviewed-on: https://go-review.googlesource.com/20201
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:57 +00:00
Rick Hudson
4093481523 [dev.garbage] runtime: add bit and cache ctz64 (count trailing zero)
Add to each span a 64 bit cache (allocCache) of the allocBits
at freeindex. allocCache is shifted such that the lowest bit
corresponds to the bit at freeindex. allocBits uses a 0 to
indicate an object is free; allocCache, on the other hand,
uses a 1 to indicate an object is free. This facilitates
ctz64 (count trailing zero), which counts the number of 0s
trailing the least significant 1. This is also the index of
the least significant 1.

Each span maintains a freeindex indicating the boundary
between allocated objects and unallocated objects. allocCache
is shifted as freeindex is incremented such that the low bit
in allocCache corresponds to the bit at freeindex in the
allocBits array.
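
A simplified sketch of the search (math/bits.TrailingZeros64 stands in
for the runtime-internal ctz64; names are illustrative):

    package main

    import (
        "fmt"
        "math/bits"
    )

    // nextFree locates the next free object using the cached,
    // complemented bitmap word: a 1 bit in allocCache means "free".
    func nextFree(allocCache uint64, freeindex uintptr) (idx uintptr, newCache uint64, ok bool) {
        tz := bits.TrailingZeros64(allocCache)
        if tz == 64 {
            return 0, 0, false // cache exhausted: refill from allocBits
        }
        idx = freeindex + uintptr(tz)
        // Shift the consumed bits out so the low bit again corresponds
        // to the next candidate object (freeindex advances to idx+1).
        newCache = allocCache >> (uint(tz) + 1)
        return idx, newCache, true
    }

    func main() {
        // objects 0, 1, and 3 allocated; 2 and 4 free
        idx, cache, ok := nextFree(0b10100, 0)
        fmt.Println(idx, cache, ok) // 2 2 true
    }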

Currently ctz64 is written in Go using a for loop so it is
not very efficient. Use of the hardware instruction will
follow. With this in mind comparisons of the garbage
benchmark are as follows.

1.6 release        2.8 seconds
dev:garbage branch 3.1 seconds.

Profiling shows the go implementation of ctz64 takes up
1% of the total time.

Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f
Reviewed-on: https://go-review.googlesource.com/20200
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:54 +00:00
Rick Hudson
e4ac2d4acc [dev.garbage] runtime: replace ref with allocCount
This is a renaming of the field ref to the
more appropriate allocCount. The field
holds the number of objects in the span
that are currently allocated. Some throw
strings were adjusted to more accurately
convey the meaning of allocCount.

Change-Id: I10daf44e3e9cc24a10912638c7de3c1984ef8efe
Reviewed-on: https://go-review.googlesource.com/19518
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:49 +00:00
Rick Hudson
3479b065d4 [dev.garbage] runtime: allocate directly from GC mark bits
Instead of building a freelist from the mark bits generated
by the GC this CL allocates directly from the mark bits.

The approach moves the mark bits from the pointer/no pointer
heap structures into their own per-span data structures. The
mark/allocation vectors consist of a single mark bit per
object. Two vectors are maintained, one for allocation and
one for the GC's mark phase. During the GC cycle's sweep
phase the interpretation of the vectors is swapped. The
mark vector becomes the allocation vector and the old
allocation vector is cleared and becomes the mark vector that
the next GC cycle will use.
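
A sketch of the swap (fields and names illustrative; the runtime also
recycles and clears the retired vector, which is elided here):

    package main

    import "fmt"

    type mspan struct {
        allocBits, gcmarkBits *uint8
    }

    // swapBits exchanges the roles of the two vectors at sweep time:
    // last cycle's marks become the allocation vector, and a freshly
    // cleared vector becomes the next cycle's mark vector.
    func (s *mspan) swapBits(cleared *uint8) {
        s.allocBits = s.gcmarkBits
        s.gcmarkBits = cleared
    }

    func main() {
        marks, fresh := new(uint8), new(uint8)
        *marks = 0b0101 // objects 0 and 2 were marked reachable
        s := &mspan{gcmarkBits: marks}
        s.swapBits(fresh)
        fmt.Printf("%04b\n", *s.allocBits) // 0101: marked objects are not free
    }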

Marked entries in the allocation vector indicate that the
object is not free. Each allocation vector maintains a boundary
between areas of the span already allocated from and areas
not yet allocated from. As objects are allocated this boundary
is moved until it reaches the end of the span. At this point
further allocations will be done from another span.

Since we no longer sweep a span inspecting each freed object,
the responsibility for maintaining the pointer/scalar bits in
the heapBitMap now falls to the routines doing the actual
allocation.

This CL is functionally complete and ready for performance
tuning.

Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04
Reviewed-on: https://go-review.googlesource.com/19470
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:47 +00:00
Rick Hudson
dc65a82eff [dev.garbage] runtime: mark/allocation helper functions
The gcmarkBits is a bit vector used by the GC to mark
reachable objects. Once a GC cycle is complete the gcmarkBits
swap places with the allocBits. allocBits is then used directly
by malloc to locate free objects, thus avoiding the
construction of a linked free list. This CL introduces a set
of helper functions for manipulating gcmarkBits and allocBits
that will be used by later CLs to realize the actual
algorithm. Minimal attempts have been made to optimize these
helper routines.

Change-Id: I55ad6240ca32cd456e8ed4973c6970b3b882dd34
Reviewed-on: https://go-review.googlesource.com/19420
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-27 21:54:44 +00:00
Rick Hudson
e1c4e9a754 [dev.garbage] runtime: refactor next free object
In preparation for changing how the next free object is chosen,
refactor and consolidate code into a single function.

Change-Id: I6836cd88ed7cbf0b2df87abd7c1c3b9fabc1cbd8
Reviewed-on: https://go-review.googlesource.com/19317
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:41 +00:00
Rick Hudson
2ac8bdc52a [dev.garbage] runtime: bitmap allocation data structs
This CL adds prototypes of the bitmap allocation data structures.
Before this is released these underlying data structures need
to be made more performant, but the signatures of the helper
functions utilizing these structures will remain stable.

Change-Id: I5ace12f2fb512a7038a52bbde2bfb7e98783bcbe
Reviewed-on: https://go-review.googlesource.com/19221
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-27 21:54:35 +00:00
Austin Clements
6002e01e34 runtime: allocate black during GC
Currently we allocate white for most of concurrent marking. This is
based on the classical argument that it produces less floating
garbage, since allocations during GC may not get linked into the heap
and allocating white lets us reclaim these. However, it's not clear
how often this actually happens, especially since our write barrier
shades any pointer as soon as it's installed in the heap regardless of
the color of the slot.

On the other hand, allocating black has several advantages that seem
to significantly outweigh this downside.

1) It naturally bounds the total scan work to the live heap size at
the start of a GC cycle. Allocating white does not, and thus depends
entirely on assists to prevent the heap from growing faster than it
can be scanned.

2) It reduces the total amount of scan work per GC cycle by the size
of newly allocated objects that are linked into the heap graph, since
objects allocated black never need to be scanned.

3) It reduces total write barrier work since more objects will already
be black when they are linked into the heap graph.

This gives a slight overall improvement in benchmarks.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.24ms ± 0%  2.21ms ± 1%  -1.32%  (p=0.000 n=18+17)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.60s ± 3%     2.53s ± 3%  -2.56%  (p=0.000 n=20+20)
Fannkuch11-12                2.08s ± 1%     2.08s ± 0%    ~     (p=0.452 n=19+19)
FmtFprintfEmpty-12          45.1ns ± 2%    45.3ns ± 2%    ~     (p=0.367 n=19+20)
FmtFprintfString-12          131ns ± 3%     129ns ± 0%  -1.60%  (p=0.000 n=20+16)
FmtFprintfInt-12             122ns ± 0%     121ns ± 2%  -0.86%  (p=0.000 n=16+19)
FmtFprintfIntInt-12          187ns ± 1%     186ns ± 1%    ~     (p=0.514 n=18+19)
FmtFprintfPrefixedInt-12     189ns ± 0%     188ns ± 1%  -0.54%  (p=0.000 n=16+18)
FmtFprintfFloat-12           256ns ± 0%     254ns ± 1%  -0.43%  (p=0.000 n=17+19)
FmtManyArgs-12               769ns ± 0%     763ns ± 0%  -0.72%  (p=0.000 n=18+18)
GobDecode-12                7.08ms ± 2%    7.00ms ± 1%  -1.22%  (p=0.000 n=20+20)
GobEncode-12                5.88ms ± 0%    5.88ms ± 1%    ~     (p=0.406 n=18+18)
Gzip-12                      214ms ± 0%     214ms ± 1%    ~     (p=0.103 n=17+18)
Gunzip-12                   37.6ms ± 0%    37.6ms ± 0%    ~     (p=0.563 n=17+17)
HTTPClientServer-12         77.2µs ± 3%    76.9µs ± 2%    ~     (p=0.606 n=20+20)
JSONEncode-12               15.1ms ± 1%    15.2ms ± 2%    ~     (p=0.138 n=19+19)
JSONDecode-12               53.3ms ± 1%    53.1ms ± 1%  -0.33%  (p=0.000 n=19+18)
Mandelbrot200-12            4.04ms ± 1%    4.04ms ± 1%    ~     (p=0.075 n=19+18)
GoParse-12                  3.30ms ± 1%    3.29ms ± 1%  -0.57%  (p=0.000 n=18+16)
RegexpMatchEasy0_32-12      69.5ns ± 1%    69.9ns ± 3%    ~     (p=0.822 n=18+20)
RegexpMatchEasy0_1K-12       237ns ± 1%     237ns ± 0%    ~     (p=0.398 n=19+18)
RegexpMatchEasy1_32-12      69.8ns ± 2%    69.5ns ± 1%    ~     (p=0.090 n=20+16)
RegexpMatchEasy1_1K-12       371ns ± 1%     372ns ± 1%    ~     (p=0.178 n=19+20)
RegexpMatchMedium_32-12      108ns ± 2%     108ns ± 3%    ~     (p=0.124 n=20+19)
RegexpMatchMedium_1K-12     33.9µs ± 2%    34.2µs ± 4%    ~     (p=0.309 n=20+19)
RegexpMatchHard_32-12       1.75µs ± 2%    1.77µs ± 4%  +1.28%  (p=0.018 n=19+18)
RegexpMatchHard_1K-12       52.7µs ± 1%    53.4µs ± 4%  +1.23%  (p=0.013 n=15+18)
Revcomp-12                   354ms ± 1%     359ms ± 4%  +1.27%  (p=0.043 n=20+20)
Template-12                 63.6ms ± 2%    63.7ms ± 2%    ~     (p=0.654 n=20+18)
TimeParse-12                 313ns ± 1%     316ns ± 2%  +0.80%  (p=0.014 n=17+20)
TimeFormat-12                332ns ± 0%     329ns ± 0%  -0.66%  (p=0.000 n=16+16)
[Geo mean]                  51.7µs         51.6µs       -0.09%

Change-Id: I2214a6a0e4f544699ea166073249a8efdf080dc0
Reviewed-on: https://go-review.googlesource.com/21323
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 20:07:22 +00:00
Austin Clements
64a26b79ac runtime: simplify/optimize allocate-black a bit
Currently allocating black switches to the system stack (which is
probably a historical accident) and atomically updates the global
bytes marked stat. Since we're about to depend on this much more,
optimize it a bit by putting it back on the regular stack and updating
the per-P bytes marked stat, which gets lazily folded into the global
bytes marked stat.

Change-Id: Ibbe16e5382d3fd2256e4381f88af342bf7020b04
Reviewed-on: https://go-review.googlesource.com/22170
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 20:07:20 +00:00
Austin Clements
479501c14c runtime: count black allocations toward scan work
Currently we count black allocations toward the scannable heap size,
but not toward the scan work we've done so far. This is clearly
inconsistent (we have, in effect, scanned these allocations and since
they're already black, we're not going to scan them again). Worse, it
means we don't count black allocations toward the scannable heap size
as of the *next* GC because this is based on the amount of scan work
we did in this cycle.

Fix this by counting black allocations as scan work. Currently the GC
spends very little time in allocate-black mode, so this probably
hasn't been a problem, but this will become important when we switch
to always allocating black.

Change-Id: If6ff693b070c385b65b6ecbbbbf76283a0f9d990
Reviewed-on: https://go-review.googlesource.com/22119
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 20:07:17 +00:00
Martin Möhrmann
7e460e70d9 runtime: use type int to specify size for newarray
Consistently use type int for the size argument of
runtime.newarray, runtime.reflect_unsafe_NewArray
and reflect.unsafe_NewArray.

Change-Id: Ic77bf2dde216c92ca8c49462f8eedc0385b6314e
Reviewed-on: https://go-review.googlesource.com/22311
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 04:15:14 +00:00
Keith Randall
001e8e8070 runtime: simplify mallocgc flag argument
mallocgc can calculate noscan itself.  The only remaining
flag argument is needzero, so we just make that a boolean arg.

Fixes #15379

Change-Id: I839a70790b2a0c9dbcee2600052bfbd6c8148e20
Reviewed-on: https://go-review.googlesource.com/22290
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-20 14:02:22 +00:00
Josh Bleecher Snyder
a4dd6ea152 runtime: add maxSliceCap
This avoids expensive division calculations
for many common slice element sizes.
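
A sketch of the table-lookup idea (the memory limit and table length
are illustrative; the real table is sized per platform):

    package main

    import "fmt"

    const maxMem = 1 << 31 // illustrative address-space limit

    // maxElems[i] caches maxMem / i for small element sizes, so common
    // makeslice calls avoid a hardware divide.
    var maxElems = [...]uintptr{
        ^uintptr(0), // size 0: no limit
        maxMem / 1, maxMem / 2, maxMem / 3, maxMem / 4,
        maxMem / 5, maxMem / 6, maxMem / 7, maxMem / 8,
    }

    // maxSliceCap returns the largest capacity a slice with the given
    // element size may have.
    func maxSliceCap(elemsize uintptr) uintptr {
        if elemsize < uintptr(len(maxElems)) {
            return maxElems[elemsize]
        }
        return maxMem / elemsize
    }

    func main() {
        fmt.Println(maxSliceCap(8), maxSliceCap(24))
    }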

name                      old time/op  new time/op  delta
MakeSlice-8               51.9ns ± 3%  35.1ns ± 2%  -32.41%  (p=0.000 n=10+10)
GrowSliceBytes-8          44.1ns ± 2%  44.1ns ± 1%     ~     (p=0.984 n=10+10)
GrowSliceInts-8           60.9ns ± 3%  60.9ns ± 3%     ~     (p=0.698 n=10+10)
GrowSlicePtr-8             131ns ± 1%   120ns ± 2%   -8.41%   (p=0.000 n=8+10)
GrowSliceStruct24Bytes-8   111ns ± 2%   103ns ± 3%   -7.23%    (p=0.000 n=8+8)

Change-Id: I2630eb3d73c814db030cad16e620ea7fecbbd312
Reviewed-on: https://go-review.googlesource.com/22223
Reviewed-by: Keith Randall <khr@golang.org>
2016-04-19 21:38:52 +00:00