Moves sweep phase out of stoptheworld by adding
background sweeper goroutine and lazy on-demand sweeping.
It turned out to be somewhat trickier than I expected,
because there is no point in time when we know size of live heap
nor consistent number of mallocs and frees.
So everything related to next_gc, mprof, memstats, etc becomes trickier.
At the end of GC next_gc is conservatively set to heap_alloc*GOGC,
which is much larger than real value. But after every sweep
next_gc is decremented by freed*GOGC. So when everything is swept
next_gc becomes what it should be.
For mprof I had to introduce 3-generation scheme (allocs, revent_allocs, prev_allocs),
because by the end of GC we know number of frees for the *previous* GC.
Significant caution is required to not cross yet-unknown real value of next_gc.
This is achieved by 2 means:
1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are
returned to heap.
This provides quite strong guarantees that heap does not grow when it should now.
http-1
allocated 7036 7033 -0.04%
allocs 60 60 +0.00%
cputime 51050 46700 -8.52%
gc-pause-one 34060569 1777993 -94.78%
gc-pause-total 2554 133 -94.79%
latency-50 178448 170926 -4.22%
latency-95 284350 198294 -30.26%
latency-99 345191 220652 -36.08%
rss 101564416 101007360 -0.55%
sys-gc 6606832 6541296 -0.99%
sys-heap 88801280 87752704 -1.18%
sys-other 7334208 7405928 +0.98%
sys-stack 524288 524288 +0.00%
sys-total 103266608 102224216 -1.01%
time 50339 46533 -7.56%
virtual-mem 292990976 293728256 +0.25%
garbage-1
allocated 2983818 2990889 +0.24%
allocs 62880 62902 +0.03%
cputime 16480000 16190000 -1.76%
gc-pause-one 828462467 487875135 -41.11%
gc-pause-total 4142312 2439375 -41.11%
rss 1151709184 1153712128 +0.17%
sys-gc 66068352 66068352 +0.00%
sys-heap 1039728640 1039728640 +0.00%
sys-other 37776064 40770176 +7.93%
sys-stack 8781824 8781824 +0.00%
sys-total 1152354880 1155348992 +0.26%
time 16496998 16199876 -1.80%
virtual-mem 1409564672 1402281984 -0.52%
LGTM=rsc
R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot
CC=golang-codereviews, khr
https://golang.org/cl/46430043
This depends on: 9791044: runtime: allocate page table lazily
Once page table is moved out of heap, the heap becomes small.
This removes unnecessary dereferences during heap access.
No logical changes.
R=golang-dev, khr
CC=golang-dev
https://golang.org/cl/9802043
The nlistmin/size thresholds are copied from tcmalloc,
but are unnecesary for Go malloc. We do not do explicit
frees into MCache. For sparse cases when we do (mainly hashmap),
simpler logic will do.
R=rsc, dave, iant
CC=gobot, golang-dev, r, remyoudompheng
https://golang.org/cl/9373043
Finer-grained transfers were relevant with per-M caches,
with per-P caches they are not relevant and harmful for performance.
For few small size classes where it makes difference,
it's fine to grab the whole span (4K).
benchmark old ns/op new ns/op delta
BenchmarkMalloc 42 40 -4.45%
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/9374043
Before, the mheap structure was in the bss,
but it's quite large (today, 256 MB, much of
which is never actually paged in), and it makes
Go binaries run afoul of exec-time bss size
limits on some BSD systems.
Fixes#4447.
R=golang-dev, dave, minux.ma, remyoudompheng, iant
CC=golang-dev
https://golang.org/cl/7307122
Collapse the arch,os-specific directories into the main directory
by renaming xxx/foo.c to foo_xxx.c, and so on.
There are no substantial edits here, except to the Makefile.
The assumption is that the Go tool will #define GOOS_darwin
and GOARCH_amd64 and will make any file named something
like signals_darwin.h available as signals_GOOS.h during the
build. This replaces what used to be done with -I$(GOOS).
There is still work to be done to make runtime build with
standard tools, but this is a big step. After this we will have
to write a script to generate all the generated files so they
can be checked in (instead of generated during the build).
R=r, iant, r, lucio.dere
CC=golang-dev
https://golang.org/cl/5490053
GC is still single-threaded.
Multiple threads will happen in another CL.
Garbage collection pauses are typically
about half as long as they were before this CL.
R=brainman, iant, r
CC=golang-dev
https://golang.org/cl/3975046
The old heap maps used a multilevel table, but that
was overkill: there are only 1M entries on a 32-bit
machine and we can arrange to use a dense address
range on a 64-bit machine.
The heap map is in bss. The assumption is that if
we don't touch the pages they won't be mapped in.
Also moved some duplicated memory allocation
code out of the OS-specific files.
R=r
CC=golang-dev
https://golang.org/cl/4118042
Prefix all external symbols in runtime by runtime·,
to avoid conflicts with possible symbols of the same
name in linked-in C libraries. The obvious conflicts
are printf, malloc, and free, but hide everything to
avoid future pain.
The symbols left alone are:
** known to cgo **
_cgo_free
_cgo_malloc
libcgo_thread_start
initcgo
ncgocall
** known to linker **
_rt0_$GOARCH
_rt0_$GOARCH_$GOOS
text
etext
data
end
pclntab
epclntab
symtab
esymtab
** known to C compiler **
_divv
_modv
_div64by32
etc (arch specific)
Tested on darwin/386, darwin/amd64, linux/386, linux/amd64.
Built (but not tested) for freebsd/386, freebsd/amd64, linux/arm, windows/386.
R=r, PeterGo
CC=golang-dev
https://golang.org/cl/2899041
This keeps fragmentation from delaying
garbage collections (and causing more fragmentation).
Cuts fresh godoc (with indexes) from 261M to 166M (120M live).
Cuts toy wc program from 50M to 8M.
Fixes#647.
R=r, cw
CC=golang-dev
https://golang.org/cl/257041
* specialize sweepspan as sweepspan0 and sweepspan1.
* in sweepspan1, inline "free" to avoid expensive mlookup.
R=iant
CC=golang-dev
https://golang.org/cl/206060