Count only number of frees, everything else is derivable
and does not need to be counted on every malloc.
benchmark old ns/op new ns/op delta
BenchmarkMalloc8 68 66 -3.07%
BenchmarkMalloc16 75 70 -6.48%
BenchmarkMallocTypeInfo8 102 97 -4.80%
BenchmarkMallocTypeInfo16 108 105 -2.78%
R=golang-dev, dave, rsc
CC=golang-dev
https://golang.org/cl/9776043
It is a caching wrapper around SysAlloc() that can allocate small chunks.
Use it for symtab allocations. Reduces number of symtab walks from 4 to 3
(reduces buildfuncs time from 10ms to 7.5ms on a large binary,
reduces initial heap size by 680K on the same binary).
Also can be used for type info allocation, itab allocation.
There are also several places in GC where we do the same thing,
they can be changed to use persistentalloc().
Also can be used in FixAlloc, because each instance of FixAlloc allocates
in 128K regions, which is too eager.
Reincarnation of committed and rolled back https://golang.org/cl/9805043
The latent bugs that it revealed are fixed:
https://golang.org/cl/9837049https://golang.org/cl/9778048
R=golang-dev, khr
CC=golang-dev
https://golang.org/cl/9778049
This depends on: 9791044: runtime: allocate page table lazily
Once page table is moved out of heap, the heap becomes small.
This removes unnecessary dereferences during heap access.
No logical changes.
R=golang-dev, khr
CC=golang-dev
https://golang.org/cl/9802043
This removes the 256MB memory allocation at startup,
which conflicts with ulimit.
Also will allow to eliminate an unnecessary memory dereference in GC,
because the page table is usually mapped at known address.
Update #5049.
Update #5236.
R=golang-dev, khr, r, khr, rsc
CC=golang-dev
https://golang.org/cl/9791044
multiple failures on amd64
««« original CL description
runtime: introduce helper persistentalloc() function
It is a caching wrapper around SysAlloc() that can allocate small chunks.
Use it for symtab allocations. Reduces number of symtab walks from 4 to 3
(reduces buildfuncs time from 10ms to 7.5ms on a large binary,
reduces initial heap size by 680K on the same binary).
Also can be used for type info allocation, itab allocation.
There are also several places in GC where we do the same thing,
they can be changed to use persistentalloc().
Also can be used in FixAlloc, because each instance of FixAlloc allocates
in 128K regions, which is too eager.
R=golang-dev, daniel.morsing, khr
CC=golang-dev
https://golang.org/cl/9805043
»»»
R=golang-dev
CC=golang-dev
https://golang.org/cl/9822043
It is a caching wrapper around SysAlloc() that can allocate small chunks.
Use it for symtab allocations. Reduces number of symtab walks from 4 to 3
(reduces buildfuncs time from 10ms to 7.5ms on a large binary,
reduces initial heap size by 680K on the same binary).
Also can be used for type info allocation, itab allocation.
There are also several places in GC where we do the same thing,
they can be changed to use persistentalloc().
Also can be used in FixAlloc, because each instance of FixAlloc allocates
in 128K regions, which is too eager.
R=golang-dev, daniel.morsing, khr
CC=golang-dev
https://golang.org/cl/9805043
Currently per-sizeclass stats are lost for destroyed MCache's. This patch fixes this.
Also, only update mstats.heap_alloc on heap operations, because that's the only
stat that needs to be promptly updated. Everything else needs to be up-to-date only in ReadMemStats().
R=golang-dev, remyoudompheng, dave, iant
CC=golang-dev
https://golang.org/cl/9207047
The nlistmin/size thresholds are copied from tcmalloc,
but are unnecesary for Go malloc. We do not do explicit
frees into MCache. For sparse cases when we do (mainly hashmap),
simpler logic will do.
R=rsc, dave, iant
CC=gobot, golang-dev, r, remyoudompheng
https://golang.org/cl/9373043
Finer-grained transfers were relevant with per-M caches,
with per-P caches they are not relevant and harmful for performance.
For few small size classes where it makes difference,
it's fine to grab the whole span (4K).
benchmark old ns/op new ns/op delta
BenchmarkMalloc 42 40 -4.45%
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/9374043
Also change table type from int32[] to int8[] to save space in L1$.
benchmark old ns/op new ns/op delta
BenchmarkMalloc 42 40 -4.68%
R=golang-dev, bradfitz, r
CC=golang-dev
https://golang.org/cl/9199044
Before, the mheap structure was in the bss,
but it's quite large (today, 256 MB, much of
which is never actually paged in), and it makes
Go binaries run afoul of exec-time bss size
limits on some BSD systems.
Fixes#4447.
R=golang-dev, dave, minux.ma, remyoudompheng, iant
CC=golang-dev
https://golang.org/cl/7307122
PauseNs is a circular buffer of recent pause times, and the
most recent one is at [((NumGC-1)+256)%256].
Also fix comments cross-linking the Go and C definition of
various structs.
R=golang-dev, rsc, bradfitz
CC=golang-dev
https://golang.org/cl/6657047
This CL makes the runtime understand that the type of
the len or cap of a map, slice, or string is 'int', not 'int32',
and it is also careful to distinguish between function arguments
and results of type 'int' vs type 'int32'.
In the runtime, the new typedefs 'intgo' and 'uintgo' refer
to Go int and uint. The C types int and uint continue to be
unavailable (cause intentional compile errors).
This CL does not change the meaning of int, but it should make
the eventual change of the meaning of int on amd64 a bit
smoother.
Update #2188.
R=iant, r, dave, remyoudompheng
CC=golang-dev
https://golang.org/cl/6551067
Parallel GC needs to know in advance how many helper threads will be there.
Hopefully it's the last patch before I can tackle parallel sweep phase.
The benchmarks are unaffected.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/6200064
Periodically browse MHeap's freelists for long unused spans and release them if any.
Current hardcoded settings:
- GC is forced if none occured over the last 2 minutes.
- spans are handed back after 5 minutes of uselessness.
SysUnused (for Unix) is a wrapper on madvise MADV_DONTNEED on Linux and MADV_FREE on BSDs.
R=rsc, dvyukov, remyoudompheng
CC=golang-dev
https://golang.org/cl/5451057
Unexports runtime.MemStats and rename MemStatsType to MemStats.
The new accessor requires passing a pointer to a user-allocated
MemStats structure.
Fixes#2572.
R=bradfitz, rsc, bradfitz, gustavo
CC=golang-dev, remy
https://golang.org/cl/5616072
Fixes#2337.
Unfortunate sequence of events is:
1. maxcpu=2, mcpu=1, grunning=1
2. starttheworld creates an extra M:
maxcpu=2, mcpu=2, grunning=1
4. the goroutine calls runtime.GOMAXPROCS(1)
maxcpu=1, mcpu=2, grunning=1
5. since it sees mcpu>maxcpu, it calls gosched()
6. schedule() deschedules the goroutine:
maxcpu=1, mcpu=1, grunning=0
7. schedule() call getnextandunlock() which
fails to pick up the goroutine again,
because canaddcpu() fails, because mcpu==maxcpu
8. then it sees that grunning==0,
reports deadlock and terminates
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/5191044
Running test/garbage/parser.out.
On a 4-core Lenovo X201s (Linux):
31.12u 0.60s 31.74r 1 cpu, no atomics
32.27u 0.58s 32.86r 1 cpu, atomic instructions
33.04u 0.83s 27.47r 2 cpu
On a 16-core Xeon (Linux):
33.08u 0.65s 33.80r 1 cpu, no atomics
34.87u 1.12s 29.60r 2 cpu
36.00u 1.87s 28.43r 3 cpu
36.46u 2.34s 27.10r 4 cpu
38.28u 3.85s 26.92r 5 cpu
37.72u 5.25s 26.73r 6 cpu
39.63u 7.11s 26.95r 7 cpu
39.67u 8.10s 26.68r 8 cpu
On a 2-core MacBook Pro Core 2 Duo 2.26 (circa 2009, MacBookPro5,5):
39.43u 1.45s 41.27r 1 cpu, no atomics
43.98u 2.95s 38.69r 2 cpu
On a 2-core Mac Mini Core 2 Duo 1.83 (circa 2008; Macmini2,1):
48.81u 2.12s 51.76r 1 cpu, no atomics
57.15u 4.72s 51.54r 2 cpu
The handoff algorithm is really only good for two cores.
Beyond that we will need to so something more sophisticated,
like have each core hand off to the next one, around a circle.
Even so, the code is a good checkpoint; for now we'll limit the
number of gc procs to at most 2.
R=dvyukov
CC=golang-dev
https://golang.org/cl/4641082
GC is still single-threaded.
Multiple threads will happen in another CL.
Garbage collection pauses are typically
about half as long as they were before this CL.
R=brainman, iant, r
CC=golang-dev
https://golang.org/cl/3975046
The old heap maps used a multilevel table, but that
was overkill: there are only 1M entries on a 32-bit
machine and we can arrange to use a dense address
range on a 64-bit machine.
The heap map is in bss. The assumption is that if
we don't touch the pages they won't be mapped in.
Also moved some duplicated memory allocation
code out of the OS-specific files.
R=r
CC=golang-dev
https://golang.org/cl/4118042