Drops mallocrep1.go back to a reasonable
amount of time. (154 -> 0.8 seconds on my Mac)
Fixes#2085.
R=golang-dev, dvyukov, r
CC=golang-dev
https://golang.org/cl/4811045
The g->sched.sp saved stack pointer and the
g->stackbase and g->stackguard stack bounds
can change even while "the world is stopped",
because a goroutine has to call functions (and
therefore might split its stack) when exiting a
system call to check whether the world is stopped
(and if so, wait until the world continues).
That means the garbage collector cannot access
those values safely (without a race) for goroutines
executing system calls. Instead, save a consistent
triple in g->gcsp, g->gcstack, g->gcguard during
entersyscall and have the garbage collector refer
to those.
The old code was occasionally seeing (because of
the race) an sp and stk that did not correspond to
each other, so that stk - sp was not the number of
stack bytes following sp. In that case, if sp < stk
then the call scanblock(sp, stk - sp) scanned too
many bytes (anything between the two pointers,
which pointed into different allocation blocks).
If sp > stk then stk - sp wrapped around.
On 32-bit, stk - sp is a uintptr (uint32) converted
to int64 in the call to scanblock, so a large (~4G)
but positive number. Scanblock would try to scan
that many bytes and eventually fault accessing
unmapped memory. On 64-bit, stk - sp is a uintptr (uint64)
promoted to int64 in the call to scanblock, so a negative
number. Scanblock would not scan anything, possibly
causing in-use blocks to be freed.
In short, 32-bit platforms would have seen either
ineffective garbage collection or crashes during garbage
collection, while 64-bit platforms would have seen
either ineffective or incorrect garbage collection.
You can see the invalid arguments to scanblock in the
stack traces in issue 1620.
Fixes#1620.
Fixes#1746.
R=iant, r
CC=golang-dev
https://golang.org/cl/4437075
* Change use of m->g0 stack (aka scheduler stack).
* Provide runtime.mcall(f) to invoke f() on m->g0 stack.
* Replace scheduler loop entry with runtime.mcall(schedule).
Runtime.mcall eliminates the need for fake scheduler states that
exist just to run a bit of code on the m->g0 stack
(Grecovery, Gstackalloc).
The elimination of the scheduler as a loop that stops and
starts using gosave and gogo fixes a bad interaction with the
way cgo uses the m->g0 stack. Cgo runs external (gcc-compiled)
C functions on that stack, and then when calling back into Go,
it sets m->g0->sched.sp below the added call frames, so that
other uses of m->g0's stack will not interfere with those frames.
Unfortunately, gogo (longjmp) back to the scheduler loop at
this point would end up running scheduler with the lower
sp, which no longer points at a valid stack frame for
a call to scheduler. If scheduler then wrote any function call
arguments or local variables to where it expected the stack
frame to be, it would overwrite other data on the stack.
I realized this possibility while debugging a problem with
calling complex Go code in a Go -> C -> Go cgo callback.
This wasn't the bug I was looking for, it turns out, but I believe
it is a real bug nonetheless. Switching to runtime.mcall, which
only adds new frames to the stack and never jumps into
functions running in existing ones, fixes this bug.
* Move cgo-related code out of proc.c into cgocall.c.
* Add very large comment describing cgo call sequences.
* Simpilify, regularize cgo function implementations and names.
* Add test suite as misc/cgo/test.
Now the Go -> C path calls cgocall, which calls asmcgocall,
and the C -> Go path calls cgocallback, which calls cgocallbackg.
The shuffling, which affects mainly the callback case, moves
most of the callback implementation to cgocallback running
on the m->curg stack (not the m->g0 scheduler stack) and
only while accounted for with $GOMAXPROCS (between calls
to exitsyscall and entersyscall).
The previous callback code did not block in startcgocallback's
approximation to exitsyscall, so if, say, the garbage collector
were running, it would still barge in and start doing things
like call malloc. Similarly endcgocallback's approximation of
entersyscall did not call matchmg to kick off new OS threads
when necessary, which caused the bug in issue 1560.
Fixes#1560.
R=iant
CC=golang-dev
https://golang.org/cl/4253054
Avoids deadlocks like the one below, in which a stack split happened
in order to call lock(&stacks), but then the stack unsplit cannot run
because stacks is now locked.
The only code calling stackalloc that wasn't on a scheduler
stack already was malg, which creates a new goroutine.
runtime.futex+0x23 /home/rsc/g/go/src/pkg/runtime/linux/amd64/sys.s:139
runtime.futex()
futexsleep+0x50 /home/rsc/g/go/src/pkg/runtime/linux/thread.c:51
futexsleep(0x5b0188, 0x300000003, 0x100020000, 0x4159e2)
futexlock+0x85 /home/rsc/g/go/src/pkg/runtime/linux/thread.c:119
futexlock(0x5b0188, 0x5b0188)
runtime.lock+0x56 /home/rsc/g/go/src/pkg/runtime/linux/thread.c:158
runtime.lock(0x5b0188, 0x7f0d27b4a000)
runtime.stackfree+0x4d /home/rsc/g/go/src/pkg/runtime/malloc.goc:336
runtime.stackfree(0x7f0d27b4a000, 0x1000, 0x8, 0x7fff37e1e218)
runtime.oldstack+0xa6 /home/rsc/g/go/src/pkg/runtime/proc.c:705
runtime.oldstack()
runtime.lessstack+0x22 /home/rsc/g/go/src/pkg/runtime/amd64/asm.s:224
runtime.lessstack()
----- lessstack called from goroutine 2 -----
runtime.lock+0x56 /home/rsc/g/go/src/pkg/runtime/linux/thread.c:158
runtime.lock(0x5b0188, 0x40a5e2)
runtime.stackalloc+0x55 /home/rsc/g/go/src/pkg/runtime/malloc.c:316
runtime.stackalloc(0x1000, 0x4055b0)
runtime.malg+0x3d /home/rsc/g/go/src/pkg/runtime/proc.c:803
runtime.malg(0x1000, 0x40add9)
runtime.newproc1+0x12b /home/rsc/g/go/src/pkg/runtime/proc.c:854
runtime.newproc1(0xf840027440, 0x7f0d27b49230, 0x0, 0x49f238, 0x40, ...)
runtime.newproc+0x2f /home/rsc/g/go/src/pkg/runtime/proc.c:831
runtime.newproc(0x0, 0xf840027440, 0xf800000010, 0x44b059)
...
R=r, r2
CC=golang-dev
https://golang.org/cl/4216045
GC is still single-threaded.
Multiple threads will happen in another CL.
Garbage collection pauses are typically
about half as long as they were before this CL.
R=brainman, iant, r
CC=golang-dev
https://golang.org/cl/3975046
The old heap maps used a multilevel table, but that
was overkill: there are only 1M entries on a 32-bit
machine and we can arrange to use a dense address
range on a 64-bit machine.
The heap map is in bss. The assumption is that if
we don't touch the pages they won't be mapped in.
Also moved some duplicated memory allocation
code out of the OS-specific files.
R=r
CC=golang-dev
https://golang.org/cl/4118042
The recent linker changes broke NaCl support
a month ago, and there are no known users of it.
The NaCl code can always be recovered from the
repository history.
R=adg, r
CC=golang-dev
https://golang.org/cl/3671042
Prefix all external symbols in runtime by runtime·,
to avoid conflicts with possible symbols of the same
name in linked-in C libraries. The obvious conflicts
are printf, malloc, and free, but hide everything to
avoid future pain.
The symbols left alone are:
** known to cgo **
_cgo_free
_cgo_malloc
libcgo_thread_start
initcgo
ncgocall
** known to linker **
_rt0_$GOARCH
_rt0_$GOARCH_$GOOS
text
etext
data
end
pclntab
epclntab
symtab
esymtab
** known to C compiler **
_divv
_modv
_div64by32
etc (arch specific)
Tested on darwin/386, darwin/amd64, linux/386, linux/amd64.
Built (but not tested) for freebsd/386, freebsd/amd64, linux/arm, windows/386.
R=r, PeterGo
CC=golang-dev
https://golang.org/cl/2899041
Old code was using recursion to traverse object graph.
New code uses an explicit stack, cutting the per-pointer
footprint to two words during the recursion and avoiding
the standard allocator and stack splitting code.
in test/garbage:
Reduces parser runtime by 2-3%
Reduces Peano runtime by 40%
Increases tree runtime by 4-5%
R=r
CC=golang-dev
https://golang.org/cl/2150042
This keeps fragmentation from delaying
garbage collections (and causing more fragmentation).
Cuts fresh godoc (with indexes) from 261M to 166M (120M live).
Cuts toy wc program from 50M to 8M.
Fixes#647.
R=r, cw
CC=golang-dev
https://golang.org/cl/257041
because free needs to mark the block as freed to
coordinate with the garbage collector.
(in C++ free can blindly put the block on the free list,
no questions asked, so the cache saves some work.)
R=iant
CC=golang-dev
https://golang.org/cl/206069
* specialize sweepspan as sweepspan0 and sweepspan1.
* in sweepspan1, inline "free" to avoid expensive mlookup.
R=iant
CC=golang-dev
https://golang.org/cl/206060
introduced explicit "data" symbol instead of etext
to mark beginning of data, so that using larger
alignment (i.e. 4MB like GNU loader) doesn't
confuse garbage collector.
split dodata into dodata and dobss in preparation
for putting the dynamic data + headers in the data
segment instead of stuffed at the beginning of the binary.
R=r
DELTA=52 (37 added, 3 deleted, 12 changed)
OCL=33610
CL=33618
saving of sp was too far away from use in scanstack;
the stack had changed since the sp was saved.
R=r
DELTA=9 (4 added, 2 deleted, 3 changed)
OCL=32232
CL=32237
* keep coherent SP/PC in gobuf
(i.e., SP that would be in use at that PC)
* gogocall replaces setspgoto,
should work better in presence of link registers
* delete unused system calls
only amd64; 386 is now broken
R=r
DELTA=548 (183 added, 183 deleted, 182 changed)
OCL=30381
CL=30442
not observed: do not use malloc to allocate stacks
during garbage collection, because it would make the
malloc data structures change underfoot.
R=r
DELTA=6 (3 added, 0 deleted, 3 changed)
OCL=30323
CL=30326