mirror of https://github.com/golang/go synced 2024-10-04 08:31:22 -06:00

305 Commits

Author SHA1 Message Date
Dmitriy Vyukov
dfa5a99ebb runtime: generate type info for chans
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, khr
https://golang.org/cl/115280043
2014-07-29 22:06:47 +04:00
Dmitriy Vyukov
cd17a717f9 runtime: simpler and faster GC
Implement the design described in:
https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbKcg/pub

Summary of the changes:
GC uses "2-bits per word" pointer type info embedded directly into the bitmap.
Scanning of stacks/data/heap is unified.
The old spans types go away.
Compiler generates "sparse" 4-bits type info for GC (directly for GC bitmap).
Linker generates "dense" 2-bits type info for data/bss (the same as stacks use).

Summary of results:
-1680 lines of code total (-1000+ in mgc0.c only)
-25% memory consumption
-3-7% binary size
-15% GC pause reduction
-7% run time reduction

LGTM=khr
R=golang-codereviews, rsc, christoph, khr
CC=golang-codereviews, rlh
https://golang.org/cl/106260045
2014-07-29 11:01:02 +04:00
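The "2-bits per word" bitmap described in cd17a717f9 above can be pictured with a small sketch. This is only an illustration of packing one 2-bit entry per word, four entries per byte; the type codes and the `bitmap` type are hypothetical names chosen here, not the runtime's actual encoding or layout.

```go
package main

import "fmt"

// Hypothetical 2-bit type codes; the real runtime's values may differ.
const (
	bitsDead    = 0
	bitsScalar  = 1
	bitsPointer = 2
)

// bitmap stores one 2-bit entry per word, four entries per byte.
type bitmap []byte

// newBitmap allocates enough bytes to describe the given number of words.
func newBitmap(words int) bitmap { return make(bitmap, (words+3)/4) }

// set overwrites the 2-bit entry for the given word index.
func (b bitmap) set(word int, v byte) {
	shift := uint(word%4) * 2
	b[word/4] = b[word/4]&^(3<<shift) | (v&3)<<shift
}

// get returns the 2-bit entry for the given word index.
func (b bitmap) get(word int) byte {
	shift := uint(word%4) * 2
	return (b[word/4] >> shift) & 3
}

func main() {
	b := newBitmap(10)
	b.set(5, bitsPointer)
	fmt.Println(b.get(4), b.get(5))
}
```

A scanner built on such a bitmap only needs to follow words whose entry marks a pointer, which is the sense in which one format can serve stacks, data, and heap alike.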
Dmitriy Vyukov
0603fbb01c runtime: fix unexpected return pc for runtime.newstackcall
With cl/112640043, TestCgoDeadlockCrash episodically prints:
unexpected return pc for runtime.newstackcall
After adding debug output I see the following trace:

runtime: unexpected return pc for runtime.newstackcall called from 0xc208011b00
runtime.throw(0x414da86)
	src/pkg/runtime/panic.c:523 +0x77
runtime.gentraceback(0x40165fc, 0xba440c28, 0x0, 0xc208d15200, 0xc200000000, 0xc208ddfd20, 0x20, 0x0, 0x0, 0x300)
	src/pkg/runtime/traceback_x86.c:185 +0xca4
runtime.callers(0x1, 0xc208ddfd20, 0x20)
	src/pkg/runtime/traceback_x86.c:438 +0x98
mcommoninit(0xc208ddfc00)
	src/pkg/runtime/proc.c:369 +0x5c
runtime.allocm(0xc208052000)
	src/pkg/runtime/proc.c:686 +0xa6
newm(0x4017850, 0xc208052000)
	src/pkg/runtime/proc.c:933 +0x27
startm(0xc208052000, 0x100000001)
	src/pkg/runtime/proc.c:1011 +0xba
wakep()
	src/pkg/runtime/proc.c:1071 +0x57
resetspinning()
	src/pkg/runtime/proc.c:1297 +0xa1
schedule()
	src/pkg/runtime/proc.c:1366 +0x14b
runtime.gosched0(0xc20808e240)
	src/pkg/runtime/proc.c:1465 +0x5b
runtime.newstack()
	src/pkg/runtime/stack.c:891 +0x44d
runtime: unexpected return pc for runtime.newstackcall called from 0xc208011b00
runtime.newstackcall(0x4000cbd, 0x4000b80)
	src/pkg/runtime/asm_amd64.s:278 +0x6f

I suspect that it can happen on any stack split.
So don't unwind g0 stack.
Also, that comment is lying -- we can traceback w/o mcache,
CPU profiler does that.

LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, khr, rsc
https://golang.org/cl/120040043
2014-07-23 18:51:34 +04:00
Keith Randall
76f1b901db runtime: keep build version around in binaries
So we can tell from a binary which version of
Go built it.

LGTM=minux, rsc
R=golang-codereviews, minux, khr, rsc, dave
CC=golang-codereviews
https://golang.org/cl/117040043
2014-07-21 20:52:11 -07:00
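As a related illustration for 76f1b901db above: a running program can report the toolchain version it was built with through `runtime.Version`. This is a minimal sketch of the in-process view only; how the version string is stored in the binary, and how an external tool would read it, is not shown and is not assumed here. The helper name `toolchainVersion` is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// toolchainVersion returns the Go version string recorded for this binary.
func toolchainVersion() string {
	return runtime.Version()
}

func main() {
	fmt.Println("built with", toolchainVersion())
}
```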
Keith Randall
f378f30034 undo CL 101570044 / 2c57aaea79c4
redo stack allocation.  This is mostly the same as
the original CL with a few bug fixes.

1. add racemalloc() for stack allocations
2. fix poolalloc/poolfree to terminate free lists correctly.
3. adjust span ref count correctly.
4. don't use cache for sizes >= StackCacheSize.

Should fix bugs and memory leaks in original changelist.

««« original CL description
undo CL 104200047 / 318b04f28372

Breaks windows and race detector.
TBR=rsc

««« original CL description
runtime: stack allocator, separate from mallocgc

In order to move malloc to Go, we need to have a
separate stack allocator.  If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.

Stacks are the last remaining FlagNoGC objects in the
GC heap.  Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)

Fixes #7468
Fixes #7424

LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»

TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
»»»

LGTM=dvyukov
R=dvyukov, dave, khr, alex.brainman
CC=golang-codereviews
https://golang.org/cl/112240044
2014-07-17 14:41:46 -07:00
Dmitriy Vyukov
92c1e72040 runtime: make NumGoroutines faster
Resolves TODO for not walking all goroutines in NumGoroutines.

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/107290044
2014-07-17 21:51:03 +04:00
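For context on 92c1e72040 above, the public entry point is `runtime.NumGoroutine`. The sketch below counts goroutines while a known number of workers are parked; the helper `countWithWorkers` is a hypothetical name used only for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// countWithWorkers starts n goroutines, waits until all are running,
// samples runtime.NumGoroutine, and then lets the workers finish.
func countWithWorkers(n int) int {
	var wg sync.WaitGroup
	started := make(chan struct{})
	release := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			started <- struct{}{}
			<-release
		}()
	}
	for i := 0; i < n; i++ {
		<-started
	}
	count := runtime.NumGoroutine() // includes the caller and the n workers
	close(release)
	wg.Wait()
	return count
}

func main() {
	fmt.Println("goroutines while 3 workers are parked:", countWithWorkers(3))
}
```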
Dmitriy Vyukov
aa76377423 runtime: start goroutine ids at 1
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, khr
https://golang.org/cl/117810043
2014-07-16 12:19:33 +04:00
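Goroutine ids, as touched by aa76377423 above, are visible in traceback headers such as `goroutine 1 [running]:`. The sketch below reads the id from the first line of `runtime.Stack` output. It assumes that header format, which is a debugging aid rather than a documented API, so treat it as illustrative only; `currentGoroutineID` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// currentGoroutineID parses the id from the traceback header
// "goroutine N [status]:". The format is assumed, not guaranteed.
func currentGoroutineID() (int64, error) {
	buf := make([]byte, 64) // enough for the header line
	n := runtime.Stack(buf, false)
	fields := strings.Fields(string(buf[:n]))
	if len(fields) < 2 || fields[0] != "goroutine" {
		return 0, errors.New("unexpected traceback header")
	}
	return strconv.ParseInt(fields[1], 10, 64)
}

func main() {
	id, err := currentGoroutineID()
	fmt.Println(id, err)
}
```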
Russ Cox
64c2083ebc runtime: refactor routines for stopping, running goroutine from m
This CL adds 'dropg', which is called to drop the association
between m and its current goroutine, and it makes schedule
handle locked goroutines correctly, instead of requiring all
callers of schedule to do that.

The effect is that if you want to take over an m for, say,
garbage collection work while still allowing the current g
to run on some other m, you can do an mcall to a function
that is:

        // dissociate gp
        dropg();
        gp->status = Gwaiting; // for ready

        // put gp on run queue for others to find
        runtime·ready(gp);

        /* ... do other work here ... */

        // done with m, let it run goroutines again
        schedule();

Before this CL, the dropg() body had to be written explicitly,
and the check for lockedg before schedule had to be
written explicitly too, both of which make the code a bit
more fragile than it needs to be.

LGTM=iant
R=dvyukov, iant
CC=golang-codereviews, rlh
https://golang.org/cl/113110043
2014-07-14 20:56:37 -04:00
Keith Randall
3cf83c182a undo CL 104200047 / 318b04f28372
Breaks windows and race detector.
TBR=rsc

««« original CL description
runtime: stack allocator, separate from mallocgc

In order to move malloc to Go, we need to have a
separate stack allocator.  If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.

Stacks are the last remaining FlagNoGC objects in the
GC heap.  Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)

Fixes #7468
Fixes #7424

LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»

TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
2014-06-30 19:48:08 -07:00
Keith Randall
7c13860cd0 runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator.  If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.

Stacks are the last remaining FlagNoGC objects in the
GC heap.  Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)

Fixes #7468
Fixes #7424

LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
2014-06-30 18:59:24 -07:00
Dmitriy Vyukov
a7186dc303 runtime: improve scheduler trace
Output the number of spinning threads;
this is useful for understanding whether the scheduler
is in a steady state or not.

R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/103540045
2014-06-26 17:16:43 -07:00
Dmitriy Vyukov
07f6f313a9 runtime: say when a goroutine is locked to OS thread
Say when a goroutine is locked to OS thread in crash reports
and goroutine profiles.
It can be useful to understand what goroutines consume OS threads
(syscall and locked), e.g. if you forget to call UnlockOSThread
or leak locked goroutines.

R=golang-codereviews
CC=golang-codereviews, rsc
https://golang.org/cl/94170043
2014-06-26 11:40:48 -07:00
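The leak that 07f6f313a9 above helps diagnose comes from calling `runtime.LockOSThread` without a matching `runtime.UnlockOSThread`. A minimal sketch of the paired usage follows; `onLockedThread` is a hypothetical helper, not part of the runtime.

```go
package main

import (
	"fmt"
	"runtime"
)

// onLockedThread runs work in a goroutine that is wired to its OS thread
// for the duration of the call and unlocked again afterwards.
func onLockedThread(work func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		runtime.LockOSThread()
		defer runtime.UnlockOSThread() // forgetting this keeps the thread tied up
		work()
	}()
	<-done
}

func main() {
	onLockedThread(func() { fmt.Println("running on a locked thread") })
}
```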
Russ Cox
89f185fe8a all: remove 'extern register M *m' from runtime
The runtime has historically held two dedicated values g (current goroutine)
and m (current thread) in 'extern register' slots (TLS on x86, real registers
backed by TLS on ARM).

This CL removes the extern register m; code now uses g->m.

On ARM, this frees up the register that formerly held m (R9).
This is important for NaCl, because NaCl ARM code cannot use R9 at all.

The Go 1 macrobenchmarks (those with per-op times >= 10 µs) are unaffected:

BenchmarkBinaryTree17              5491374955     5471024381     -0.37%
BenchmarkFannkuch11                4357101311     4275174828     -1.88%
BenchmarkGobDecode                 11029957       11364184       +3.03%
BenchmarkGobEncode                 6852205        6784822        -0.98%
BenchmarkGzip                      650795967      650152275      -0.10%
BenchmarkGunzip                    140962363      141041670      +0.06%
BenchmarkHTTPClientServer          71581          73081          +2.10%
BenchmarkJSONEncode                31928079       31913356       -0.05%
BenchmarkJSONDecode                117470065      113689916      -3.22%
BenchmarkMandelbrot200             6008923        5998712        -0.17%
BenchmarkGoParse                   6310917        6327487        +0.26%
BenchmarkRegexpMatchMedium_1K      114568         114763         +0.17%
BenchmarkRegexpMatchHard_1K        168977         169244         +0.16%
BenchmarkRevcomp                   935294971      914060918      -2.27%
BenchmarkTemplate                  145917123      148186096      +1.55%

Minux previously reported larger variations, but these were caused by
run-to-run noise, not repeatable slowdowns.

Actual code changes by Minux.
I only did the docs and the benchmarking.

LGTM=dvyukov, iant, minux
R=minux, josharian, iant, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/109050043
2014-06-26 11:54:39 -04:00
Dmitriy Vyukov
0d72364616 runtime/race: update runtime to tip
This requires minimal changes to the runtime hooks. In particular,
synchronization events must be done only on valid addresses now,
so I've added the additional checks to race.c.

LGTM=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/101000046
2014-06-20 16:36:21 -07:00
Russ Cox
4534fdb144 runtime: fix panic stack during runtime.Goexit during panic
A runtime.Goexit during a panic-invoked deferred call
left the panic stack intact even though all the stack frames
are gone when the goroutine is torn down.
The next goroutine to reuse that struct will have a
bogus panic stack and can cause the traceback routines
to walk into garbage.

Most likely to happen during tests, because t.Fatal might
be called during a deferred func and uses runtime.Goexit.

This "not enough cleared in Goexit" failure mode has
happened to us multiple times now. Clear all the pointers
that don't make sense to keep, not just gp->panic.

Fixes #8158.

LGTM=iant, dvyukov
R=iant, dvyukov
CC=golang-codereviews
https://golang.org/cl/102220043
2014-06-06 16:52:14 -04:00
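As background for 4534fdb144 above, the sketch below shows only the basic contract of `runtime.Goexit`: deferred calls run, and the code after the call does not. It does not reproduce the Goexit-during-panic case the commit fixes. `runAndExit` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"runtime"
)

// runAndExit starts a goroutine that calls runtime.Goexit and reports
// whether its deferred call ran and whether code after Goexit ran.
func runAndExit() (deferredRan, afterRan bool) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		defer func() { deferredRan = true }()
		runtime.Goexit()
		afterRan = true // not reached: Goexit terminates the goroutine
	}()
	<-done
	return deferredRan, afterRan
}

func main() {
	fmt.Println(runAndExit())
}
```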
Keith Randall
548b15def6 runtime: mark some C globals as having no pointers.
C globals are conservatively scanned.  This helps
avoid false retention, especially for 32 bit.

LGTM=rsc
R=golang-codereviews, khr, rsc
CC=golang-codereviews
https://golang.org/cl/102040043
2014-05-31 19:21:17 -04:00
Russ Cox
14d2ee1d00 runtime: make continuation pc available to stack walk
The 'continuation pc' is where the frame will continue
execution, if anywhere. For a frame that stopped execution
due to a CALL instruction, the continuation pc is immediately
after the CALL. But for a frame that stopped execution due to
a fault, the continuation pc is the pc after the most recent CALL
to deferproc in that frame, or else 0. That is where execution
will continue, if anywhere.

The liveness information is only recorded for CALL instructions.
This change makes sure that we never look for liveness information
except for CALL instructions.

Using a valid PC fixes crashes when a garbage collection or
stack copying tries to process a stack frame that has faulted.

Record continuation pc in heapdump (format change).

Fixes #8048.

LGTM=iant, khr
R=khr, iant, dvyukov
CC=golang-codereviews, r
https://golang.org/cl/100870044
2014-05-31 10:10:12 -04:00
Dmitriy Vyukov
b5caa02067 runtime: fix go of nil func value
Currently the runtime dereferences nil with m->locks>0,
which causes an unrecoverable fatal error.
Panic instead.
Fixes #8045.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, khr
https://golang.org/cl/97620043
2014-05-28 00:00:01 -04:00
Keith Randall
29d1b211fd runtime: clean up scanning of Gs
Use a real type for Gs instead of scanning them conservatively.
Zero the schedlink pointer when it is dead.

Update #7820

LGTM=rsc
R=rsc, dvyukov
CC=golang-codereviews
https://golang.org/cl/89360043
2014-04-28 12:47:09 -04:00
Russ Cox
ade6bc68b0 runtime: crash when func main calls Goexit and all other goroutines exit
This has typically crashed in the past, although usually with
an 'all goroutines are asleep - deadlock!' message that shows
no goroutines (because there aren't any).

Previous discussion at:
https://groups.google.com/d/msg/golang-nuts/uCT_7WxxopQ/BoSBlLFzUTkJ
https://groups.google.com/d/msg/golang-dev/KUojayEr20I/u4fp_Ej5PdUJ
http://golang.org/issue/7711

There is general agreement that runtime.Goexit terminates the
main goroutine, so that main cannot return, so the program does
not exit.

The interpretation that all other goroutines exiting causes an
exit(0) is relatively new and was not part of those discussions.
That is what this CL changes.

Thankfully, even though the exit(0) has been there for a while,
some other accounting bugs made it very difficult to trigger,
so it is reasonable to replace. In particular, see golang.org/issue/7711#c10
for an examination of the behavior across past releases.

Fixes #7711.

LGTM=iant, r
R=golang-codereviews, iant, dvyukov, r
CC=golang-codereviews
https://golang.org/cl/88210044
2014-04-16 13:12:18 -04:00
Russ Cox
5556bfa9c7 runtime: cache gotraceback setting
On Plan 9, gotraceback calls getenv, which calls malloc, and we call gotraceback
on every call to gentraceback, which happens during garbage collection.
Honestly I don't even know how this works on Plan 9.
I suspect it does not, and that we are getting by because
no one has tried to run with $GOTRACEBACK set at all.

This will speed up all the other systems by epsilon, since they
won't call getenv and atoi repeatedly.

LGTM=bradfitz
R=golang-codereviews, bradfitz, 0intro
CC=golang-codereviews
https://golang.org/cl/85430046
2014-04-08 22:35:41 -04:00
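The caching idea in 5556bfa9c7 above, reading the environment once and reusing the parsed value, can be sketched in user-level Go. This is not the runtime's code: `parseLevel`, its treatment of empty and invalid values, and the use of `sync.Once` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
)

// parseLevel is a hypothetical parser: empty means 1, invalid means 0.
func parseLevel(s string) int {
	if s == "" {
		return 1
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0
	}
	return n
}

var (
	levelOnce sync.Once
	level     int
)

// tracebackLevel reads and parses $GOTRACEBACK on first use only;
// later calls return the cached value without touching the environment.
func tracebackLevel() int {
	levelOnce.Do(func() {
		level = parseLevel(os.Getenv("GOTRACEBACK"))
	})
	return level
}

func main() {
	fmt.Println(tracebackLevel())
}
```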
Dmitriy Vyukov
f8c350873c runtime: fix yet another race in bgsweep
Currently it's possible that bgsweep finishes before all spans
have been swept (we only know that sweeping of all spans has *started*).
In such a case bgsweep may fail to wake up the runfinq goroutine when it needs to.
finq may still be nil at this point, but some finalizers may be queued later.
Make bgsweep wait for sweeping to *complete*; then it can decide
for sure whether it needs to wake up runfinq.
Update #7533

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75960043
2014-03-26 15:11:36 +04:00
Russ Cox
3750904a7e runtime: use VEH, not SEH, for windows/386 exception handling
Structured Exception Handling (SEH) was the first way to handle
exceptions (memory faults, divides by zero) on Windows.
The S might as well stand for "stack-based": the implementation
interprets stack addresses in a few different ways, and it gets
subtly confused by Go's management of stacks. It's also something
that requires active maintenance during cgo switches, and we've
had bugs in that maintenance in the past.

We have recently come to believe that SEH cannot work with
Go's stack usage. See http://golang.org/issue/7325 for details.

Vectored Exception Handling (VEH) is more like a Unix signal
handler: you set it once for the whole process and forget about it.

This CL drops all the SEH code and replaces it with VEH code.
Many special cases and 7 #ifdefs disappear.

VEH was introduced in Windows XP, so Go on windows/386 will
now require Windows XP or later. The previous requirement was
Windows 2000 or later. Windows 2000 immediately preceded
Windows XP, so Windows 2000 is the only affected version.
Microsoft stopped supporting Windows 2000 in 2010.
See http://golang.org/s/win2000-golang-nuts for details.

Fixes #7325.

LGTM=alex.brainman, r
R=golang-codereviews, alex.brainman, stephen.gutekanst, dave
CC=golang-codereviews, iant, r
https://golang.org/cl/74790043
2014-03-24 21:22:16 -04:00
Dmitriy Vyukov
1895014257 runtime: fix stack split detection around fork
If runtime_BeforeFork splits the stack, it will unsplit it
with a spoiled g->stackguard. This leads to a check failure in oldstack:

fatal error: stackfree: bad fixed size

runtime stack:
runtime.throw(0xadf3cd)
runtime.stackfree(0xc208040480, 0xfffffffffffff9dd, 0x1b00fa8)
runtime.oldstack()
runtime.lessstack()

goroutine 311 [stack unsplit]:
syscall.forkAndExecInChild(0xc20802eea0, 0xc208192c00, 0x5, 0x5, 0xc208072a80, ...)
syscall.forkExec(0xc20802ed80, 0x54, 0xc2081ccb40, 0x4, 0x4, ...)

Fixes #7567.

LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews, khr, rsc
https://golang.org/cl/77340045
2014-03-19 17:04:51 +04:00
Dmitriy Vyukov
e678ab4e37 runtime: detect stack split after fork
This check would have easily prevented issue 7511.
Update #7511

LGTM=rsc
R=rsc, aram
CC=golang-codereviews
https://golang.org/cl/75260043
2014-03-13 17:41:08 +04:00
Anthony Martin
64e041652a runtime: call symtabinit earlier
Otherwise, we won't get a stack trace in some of the early init.

Here's one example:

        http://build.golang.org/log/a96d10f6aee1fa3e3ae51f41da46d414a7ab02de

After walking the stack by hand in acid, I was able to determine
that the stackalloc inside mpreinit was causing the throw.

LGTM=rsc
R=rsc, dvyukov
CC=golang-codereviews
https://golang.org/cl/72450044
2014-03-12 19:42:58 -07:00
Shenghou Ma
84570aa9a1 runtime: round stack size to power of 2.
Fixes build on windows/386 and plan9/386.
Fixes #7487.

LGTM=mattn.jp, dvyukov, rsc
R=golang-codereviews, mattn.jp, dvyukov, 0intro, rsc
CC=golang-codereviews
https://golang.org/cl/72360043
2014-03-07 15:11:16 -05:00
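Rounding a size up to a power of two, as named in the title of 84570aa9a1 above, can be sketched as follows. This is a generic illustration, not the runtime's implementation; `roundUpPow2` is a hypothetical helper and assumes the input is small enough that the result fits in a `uintptr`.

```go
package main

import "fmt"

// roundUpPow2 returns the smallest power of two that is >= n.
// Assumes n does not exceed the largest power of two a uintptr can hold.
func roundUpPow2(n uintptr) uintptr {
	p := uintptr(1)
	for p < n {
		p <<= 1
	}
	return p
}

func main() {
	for _, n := range []uintptr{1, 5, 8, 4097} {
		fmt.Println(n, roundUpPow2(n))
	}
}
```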
Dmitriy Vyukov
1a89e6388c runtime: refactor and fix stack management code
There are at least 3 bugs:
1. g->stacksize accounting is broken during copystack/shrinkstack
2. stktop->free is not properly maintained during copystack/shrinkstack
3. stktop->free logic is broken:
        we can have stktop->free==FixedStack,
        and we will free it into stack cache,
        but it actually comes from heap as the result of non-copying segment shrink
This shows up as at least spurious races on the race builders (and maybe something else as well; I don't know).

The idea behind the refactoring is to consolidate stacksize and
segment origin logic in stackalloc/stackfree.

Fixes #7490.

LGTM=rsc, khr
R=golang-codereviews, rsc, khr
CC=golang-codereviews
https://golang.org/cl/72440043
2014-03-07 20:52:29 +04:00
Dmitriy Vyukov
a1695d2ea3 runtime: use custom thunks for race calls instead of cgo
Implement custom assembly thunks for hot race calls (memory accesses and function entry/exit).
The thunks extract caller pc, verify that the address is in heap or global and switch to g0 stack.

Before:
ok  	regexp	3.692s
ok  	compress/bzip2	9.461s
ok  	encoding/json	6.380s
After:
ok  	regexp	2.229s (-40%)
ok  	compress/bzip2	4.703s (-50%)
ok  	encoding/json	3.629s (-43%)

For comparison, normal non-race build:
ok  	regexp	0.348s
ok  	compress/bzip2	0.304s
ok  	encoding/json	0.661s
Race build:
ok  	regexp	2.229s (+540%)
ok  	compress/bzip2	4.703s (+1447%)
ok  	encoding/json	3.629s (+449%)

Also removes some race-related special cases from cgocall and scheduler.
In the long term, it will allow removing the cyclic runtime/race dependency on cmd/cgo.

Fixes #4249.
Fixes #7460.
Update #6508
Update #6688

R=iant, rsc, bradfitz
CC=golang-codereviews
https://golang.org/cl/55100044
2014-03-06 23:48:30 +04:00
Dmitriy Vyukov
8ca3372d7b runtime: fix bad g status after copystack
LGTM=khr
R=khr
CC=golang-codereviews, rsc
https://golang.org/cl/69870054
2014-03-06 21:33:19 +04:00
Russ Cox
1249d3a518 runtime: handle Go calls C calls Go panic correctly on windows/386
32-bit Windows uses "structured exception handling" (SEH) to
handle hardware faults: there is a per-thread linked list
of fault handlers maintained in user space instead of
something like Unix's signal handlers. The structures in the
linked list are required to live on the OS stack, and the
usual discipline is that the function that pushes a record
(allocated from the current stack frame) onto the list pops
that record before returning. Not to pop the entry before
returning creates a dangling pointer error: the list head
points to a stack frame that no longer exists.

Go pushes an SEH record in the top frame of every OS thread,
and that record suffices for all Go execution on that thread,
at least until cgo gets involved.

If we call into C using cgo, that called C code may push its
own SEH records, but by the convention it must pop them before
returning back to the Go code. We assume it does, and that's
fine.

If the C code calls back into Go, we want the Go SEH handler
to become active again, not whatever C has set up. So
runtime.callbackasm1, which handles a call from C back into
Go, pushes a new SEH record before calling the Go code and
pops it when the Go code returns. That's also fine.

It can happen that when Go calls C calls Go like this, the
inner Go code panics. We allow a defer in the outer Go to
recover the panic, effectively wiping not only the inner Go
frames but also the C calls. This sequence was not popping the
SEH stack up to what it was before the cgo calls, so it was
creating the dangling pointer warned about above. When
eventually the m stack was used enough to overwrite the
dangling SEH records, the SEH chain was lost, and any future
panic would not end up in Go's handler.

The bug in TestCallbackPanic and friends was thus creating a
situation where TestSetPanicOnFault - which causes a hardware
fault - would not find the Go fault handler and instead crash
the binary.

Add checks to TestCallbackPanicLocked to diagnose the mistake
in that test instead of leaving a bad state for another test
case to stumble over.

Fix bug by restoring SEH chain during deferred "endcgo"
cleanup.

This bug is likely present in Go 1.2.1, but since it depends
on Go calling C calling Go, with the inner Go panicking and
the outer Go recovering the panic, it seems not important
enough to bother fixing before Go 1.3. Certainly no one has
complained.

Fixes #7470.

LGTM=alex.brainman
R=golang-codereviews, alex.brainman
CC=golang-codereviews, iant, khr
https://golang.org/cl/71440043
2014-03-05 11:10:40 -05:00
Keith Randall
e9445547b6 runtime: move stack shrinking until after sweepgen is incremented.
Before GC, we flush all the per-P allocation caches.  Doing
stack shrinking mid-GC causes these caches to fill up.  At the
end of gc, the sweepgen is incremented which causes all of the
data in these caches to be in a bad state (cached but not yet
swept).

Move the stack shrinking until after sweepgen is incremented,
so any caching that happens as part of shrinking is done with
already-swept data.

Reenable stack copying.

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/69620043
2014-02-27 14:20:15 -08:00
Keith Randall
f50a87058b runtime: disable stack copying
TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/69080045
2014-02-27 01:45:22 -08:00
Keith Randall
1665b006a5 runtime: grow stack by copying
On stack overflow, if all frames on the stack are
copyable, we copy the frames to a new stack twice
as large as the old one.  During GC, if a G is using
less than 1/4 of its stack, copy the stack to a stack
half its size.

TODO
- Do something about C frames.  When a C frame is in the
  stack segment, it isn't copyable.  We allocate a new segment
  in this case.
  - For idempotent C code, we can abort it, copy the stack,
    then retry.  I'm working on a separate CL for this.
  - For other C code, we can raise the stackguard
    to the lowest Go frame so the next call that Go frame
    makes triggers a copy, which will then succeed.
- Pick a starting stack size?

The plan is that eventually we reach a point where the
stack contains only copyable frames.

LGTM=rsc
R=dvyukov, rsc
CC=golang-codereviews
https://golang.org/cl/54650044
2014-02-26 23:28:44 -08:00
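The sizing rule stated in 1665b006a5 above (double on overflow, halve during GC when less than a quarter is in use) can be written down as a small sketch. The function names and the `minStack` floor are hypothetical; the commit does not state a minimum size, and the real code also has to check that every frame is copyable.

```go
package main

import "fmt"

// minStack is a hypothetical floor below which the sketch never shrinks.
const minStack = 8 << 10

// grownSize returns the size of the new stack after an overflow:
// twice as large as the old one.
func grownSize(old uintptr) uintptr {
	return old * 2
}

// shrunkSize returns the stack size to use after GC: half the current
// size if less than a quarter is in use, otherwise unchanged.
func shrunkSize(size, used uintptr) uintptr {
	if used < size/4 && size/2 >= minStack {
		return size / 2
	}
	return size
}

func main() {
	fmt.Println(grownSize(8192), shrunkSize(32768, 4096), shrunkSize(32768, 16384))
}
```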
Russ Cox
e56c6e7535 runtime/debug: add SetPanicOnFault
SetPanicOnFault allows recovery from unexpected memory faults.
This can be useful if you are using a memory-mapped file
or probing the address space of the current program.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/66590044
2014-02-20 16:18:05 -05:00
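A common way to use `debug.SetPanicOnFault` from e56c6e7535 above is to enable it around a risky region, recover any resulting panic, and restore the previous setting. The sketch below shows that pattern without triggering a real memory fault; `withPanicOnFault` is a hypothetical helper, and it converts any panic from `f` into an error, not only faults.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// withPanicOnFault runs f with panic-on-fault enabled for the calling
// goroutine and restores the previous setting afterwards. A panic raised
// while f runs is recovered and returned as an error.
func withPanicOnFault(f func()) (err error) {
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old)
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	f()
	return nil
}

func main() {
	fmt.Println(withPanicOnFault(func() {}))
}
```

The setting applies only to the current goroutine, so the helper calls `f` directly instead of starting a new goroutine.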
Russ Cox
67c83db60d runtime: use goc2c as much as possible
Package runtime's C functions written to be called from Go
started out written in C using carefully constructed argument
lists and the FLUSH macro to write a result back to memory.

For some functions, the appropriate parameter list ended up
being architecture-dependent due to differences in alignment,
so we added 'goc2c', which takes a .goc file containing Go func
declarations but C bodies, rewrites the Go func declaration to
equivalent C declarations for the target architecture, adds the
needed FLUSH statements, and writes out an equivalent C file.
That C file is compiled as part of package runtime.

Native Client's x86-64 support introduces the most complex
alignment rules yet, breaking many functions that could until
now be portably written in C. Using goc2c for those avoids the
breakage.

Separately, Keith's work on emitting stack information from
the C compiler would require the hand-written functions
to add #pragmas specifying how many arguments are result
parameters. Using goc2c for those avoids maintaining #pragmas.

For both reasons, use goc2c for as many Go-called C functions
as possible.

This CL is a replay of the bulk of CL 15400047 and CL 15790043,
both of which were reviewed as part of the NaCl port and are
checked in to the NaCl branch. This CL is part of bringing the
NaCl code into the main tree.

No new code here, just reformatting and occasional movement
into .h files.

LGTM=r
R=dave, alex.brainman, r
CC=golang-codereviews
https://golang.org/cl/65220044
2014-02-20 15:58:47 -05:00
Russ Cox
53061193f1 cmd/gc, runtime: enable precisestack by default
[Repeat of CL 64100044, after 32-bit fix in CL 66170043.]

Precisestack makes stack collection completely precise,
in the sense that there are no "used and not set" errors
in the collection of stack frames, no times where the collector
reads a pointer from a stack word that has not actually been
initialized with a pointer (possibly a nil pointer) in that function.

The most important part is interfaces: precisestack means
that if reading an interface value, the interface value is guaranteed
to be initialized, meaning that the type word can be relied
upon to be either nil or a valid interface type word describing
the data word.

This requires additional zeroing of certain values on the stack
on entry, which right now costs about 5% overall execution
time in all.bash. That cost will come down before Go 1.3
(issue 7345).

There are at least two known garbage collector bugs right now,
issues 7343 and 7344. The first happens even without precisestack.
The second I have only seen with precisestack, but that does not
mean that precisestack is what causes it. In fact it is very difficult
to explain by what precisestack does directly. Precisestack may
be exacerbating an existing problem. Both of those issues are
marked for Go 1.3 as well.

The reasons for enabling precisestack now are to give it more
time to soak and because the copying stack work depends on it.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/65820044
2014-02-19 17:09:08 -05:00
Russ Cox
aad23e708c undo CL 64100044 / 04d062c2e81c
broke 32-bit builds

««« original CL description
cmd/gc, runtime: enable precisestack by default

Precisestack makes stack collection completely precise,
in the sense that there are no "used and not set" errors
in the collection of stack frames, no times where the collector
reads a pointer from a stack word that has not actually been
initialized with a pointer (possibly a nil pointer) in that function.

The most important part is interfaces: precisestack means
that if reading an interface value, the interface value is guaranteed
to be initialized, meaning that the type word can be relied
upon to be either nil or a valid interface type word describing
the data word.

This requires additional zeroing of certain values on the stack
on entry, which right now costs about 5% overall execution
time in all.bash. That cost will come down before Go 1.3
(issue 7345).

There are at least two known garbage collector bugs right now,
issues 7343 and 7344. The first happens even without precisestack.
The second I have only seen with precisestack, but that does not
mean that precisestack is what causes it. In fact it is very difficult
to explain by what precisestack does directly. Precisestack may
be exacerbating an existing problem. Both of those issues are
marked for Go 1.3 as well.

The reasons for enabling precisestack now are to give it more
time to soak and because the copying stack work depends on it.

LGTM=r
R=r
CC=golang-codereviews, iant, khr
https://golang.org/cl/64100044
»»»

TBR=r
CC=golang-codereviews
https://golang.org/cl/65230043
2014-02-17 21:34:58 -05:00
Russ Cox
ecf700b5ee cmd/gc, runtime: enable precisestack by default
Precisestack makes stack collection completely precise,
in the sense that there are no "used and not set" errors
in the collection of stack frames, no times where the collector
reads a pointer from a stack word that has not actually been
initialized with a pointer (possibly a nil pointer) in that function.

The most important part is interfaces: precisestack means
that if reading an interface value, the interface value is guaranteed
to be initialized, meaning that the type word can be relied
upon to be either nil or a valid interface type word describing
the data word.

This requires additional zeroing of certain values on the stack
on entry, which right now costs about 5% overall execution
time in all.bash. That cost will come down before Go 1.3
(issue 7345).

There are at least two known garbage collector bugs right now,
issues 7343 and 7344. The first happens even without precisestack.
The second I have only seen with precisestack, but that does not
mean that precisestack is what causes it. In fact it is very difficult
to explain by what precisestack does directly. Precisestack may
be exacerbating an existing problem. Both of those issues are
marked for Go 1.3 as well.

The reasons for enabling precisestack now are to give it more
time to soak and because the copying stack work depends on it.

LGTM=r
R=r
CC=golang-codereviews, iant, khr
https://golang.org/cl/64100044
2014-02-17 20:12:40 -05:00
Dmitriy Vyukov
47534ddc68 runtime: remove misleading message during crash
The following checkdead message is a false positive:

$ go test -race -c runtime
$ ./runtime.test -test.cpu=2 -test.run=TestSmhasherWindowed -test.v
=== RUN TestSmhasherWindowed-2
checkdead: find g 18 in status 1
SIGABRT: abort
PC=0x42bff1

LGTM=rsc
R=golang-codereviews, gobot, rsc
CC=golang-codereviews, iant, khr
https://golang.org/cl/59490046
2014-02-14 13:24:48 +04:00
Dmitriy Vyukov
eca55f5ac0 runtime: fix windows cpu profiler
Currently it periodically fails with the following message.
The immediate cause is the wrong base register when obtaining g
in sys_windows_amd64/386.s.
But there are several secondary problems as well.

runtime: unknown pc 0x0 after stack split
panic: invalid memory address or nil pointer dereference
fatal error: panic during malloc
[signal 0xc0000005 code=0x0 addr=0x60 pc=0x42267a]

runtime stack:
runtime.panic(0x7914c0, 0xc862af)
        c:/src/perfer/work/windows-amd64-a15f344a9efa/go/src/pkg/runtime/panic.c:217 +0x2c
runtime: unexpected return pc for runtime.externalthreadhandler called from 0x0

R=rsc, alex.brainman
CC=golang-codereviews
https://golang.org/cl/63310043
2014-02-14 09:20:51 +04:00
Dmitriy Vyukov
5e72fae9b2 runtime: improve cpu profiles for GC/syscalls/cgo
The current "System->etext" is not very informative.
Add a parent "GC" frame.
Replace un-unwindable syscall/cgo frames with the Go stack that leads to the call.

LGTM=rsc
R=rsc, alex.brainman, ality
CC=golang-codereviews
https://golang.org/cl/61270043
2014-02-12 22:31:36 +04:00
Dmitriy Vyukov
373e1e94d8 runtime: fix crash during cpu profiling
mp->mcache can be concurrently modified by runtime·helpgc.
In that case sigprof can save mcache=nil, then helpgc sets it to non-nil,
then sigprof restores it to nil, and the GC crashes with a nil mcache.

R=rsc
CC=golang-codereviews
https://golang.org/cl/58860044
2014-02-10 20:24:47 +04:00
Dmitriy Vyukov
0229dc6dbe runtime: do not cpu profile idle threads on windows
Currently this leads to a significant skew towards the 'etext' entry,
since all idle threads are profiled every tick.
Before:
Total: 66608 samples
   63188  94.9%  94.9%    63188  94.9% etext
     278   0.4%  95.3%      278   0.4% sweepspan
     216   0.3%  95.6%      448   0.7% runtime.mallocgc
     122   0.2%  95.8%      122   0.2% scanblock
     113   0.2%  96.0%      113   0.2% net/textproto.canonicalMIMEHeaderKey
After:
Total: 8008 samples
    3949  49.3%  49.3%     3949  49.3% etext
     231   2.9%  52.2%      231   2.9% scanblock
     211   2.6%  54.8%      211   2.6% runtime.cas64
     182   2.3%  57.1%      408   5.1% runtime.mallocgc
     178   2.2%  59.3%      178   2.2% runtime.atomicload64

LGTM=alex.brainman
R=golang-codereviews, alex.brainman
CC=golang-codereviews
https://golang.org/cl/61250043
2014-02-10 15:40:55 +04:00
Dmitriy Vyukov
179d41fecc runtime: tune P retake logic
When GOMAXPROCS>1 the last P in syscall is never retaken
(because there are already idle P's -- npidle>0).
This prevents the sysmon thread from sleeping.
On a darwin machine the program from issue 6673 constantly
consumes ~0.2% CPU. With this change it consistently consumes 0.0% CPU.
Fixes #6673.

R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/56990045
2014-01-27 23:17:46 +04:00
Dmitriy Vyukov
f8e0057bb7 sync: scalable Pool
Introduce fixed-size P-local caches.
When a local cache overflows or underflows, a batch of items
is transferred to or from a global mutex-protected cache.
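
The overflow/underflow scheme can be sketched roughly as follows. This is
only an illustration, not the code from this CL: the names global, local,
localCap and batch are hypothetical, and each local cache is assumed to be
owned by a single worker (standing in for a P), so only the global cache
takes a lock.

```go
package main

import (
	"fmt"
	"sync"
)

// localCap and batch are hypothetical sizes chosen for illustration.
const (
	localCap = 8
	batch    = localCap / 2
)

// global is the shared, mutex-protected cache.
type global struct {
	mu    sync.Mutex
	items []interface{}
}

// local is a fixed-size cache owned by one worker (standing in for a P),
// so its own slice is accessed without locking.
type local struct {
	g     *global
	items []interface{}
}

// put stores x locally; on overflow it first moves a batch to the global cache.
func (l *local) put(x interface{}) {
	if len(l.items) == localCap {
		n := len(l.items) - batch
		l.g.mu.Lock()
		l.g.items = append(l.g.items, l.items[n:]...)
		l.g.mu.Unlock()
		l.items = l.items[:n]
	}
	l.items = append(l.items, x)
}

// get returns a cached item or nil; on underflow it first pulls up to
// one batch from the global cache.
func (l *local) get() interface{} {
	if len(l.items) == 0 {
		l.g.mu.Lock()
		n := len(l.g.items)
		k := batch
		if k > n {
			k = n
		}
		l.items = append(l.items, l.g.items[n-k:]...)
		l.g.items = l.g.items[:n-k]
		l.g.mu.Unlock()
	}
	if len(l.items) == 0 {
		return nil
	}
	x := l.items[len(l.items)-1]
	l.items = l.items[:len(l.items)-1]
	return x
}

func main() {
	g := &global{}
	a, b := &local{g: g}, &local{g: g}
	for i := 0; i < localCap+1; i++ {
		a.put(i) // the final put overflows and spills a batch to g
	}
	v := b.get() // b is empty, so it refills from g first
	fmt.Println(len(a.items), len(b.items), len(g.items), v)
}
```

The common path touches only the local slice; the mutex is taken once per
batch rather than once per item.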

benchmark                    old ns/op    new ns/op    delta
BenchmarkPool                    50554        22423  -55.65%
BenchmarkPool-4                 400359         5904  -98.53%
BenchmarkPool-16                403311         1598  -99.60%
BenchmarkPool-32                367310         1526  -99.58%

BenchmarkPoolOverlflow            5214         3633  -30.32%
BenchmarkPoolOverlflow-4         42663         9539  -77.64%
BenchmarkPoolOverlflow-8         46919        11385  -75.73%
BenchmarkPoolOverlflow-16        39454        13048  -66.93%

BenchmarkSprintfEmpty                    84           63  -25.68%
BenchmarkSprintfEmpty-2                 371           32  -91.13%
BenchmarkSprintfEmpty-4                 465           22  -95.25%
BenchmarkSprintfEmpty-8                 565           12  -97.77%
BenchmarkSprintfEmpty-16                498            5  -98.87%
BenchmarkSprintfEmpty-32                492            4  -99.04%

BenchmarkSprintfString                  259          229  -11.58%
BenchmarkSprintfString-2                574          144  -74.91%
BenchmarkSprintfString-4                651           77  -88.05%
BenchmarkSprintfString-8                868           47  -94.48%
BenchmarkSprintfString-16               825           33  -95.96%
BenchmarkSprintfString-32               825           30  -96.28%

BenchmarkSprintfInt                     213          188  -11.74%
BenchmarkSprintfInt-2                   448          138  -69.20%
BenchmarkSprintfInt-4                   624           52  -91.63%
BenchmarkSprintfInt-8                   691           31  -95.43%
BenchmarkSprintfInt-16                  724           18  -97.46%
BenchmarkSprintfInt-32                  718           16  -97.70%

BenchmarkSprintfIntInt                  311          282   -9.32%
BenchmarkSprintfIntInt-2                333          145  -56.46%
BenchmarkSprintfIntInt-4                642          110  -82.87%
BenchmarkSprintfIntInt-8                832           42  -94.90%
BenchmarkSprintfIntInt-16               817           24  -97.00%
BenchmarkSprintfIntInt-32               805           22  -97.17%

BenchmarkSprintfPrefixedInt             309          269  -12.94%
BenchmarkSprintfPrefixedInt-2           245          168  -31.43%
BenchmarkSprintfPrefixedInt-4           598           99  -83.36%
BenchmarkSprintfPrefixedInt-8           770           67  -91.23%
BenchmarkSprintfPrefixedInt-16          829           54  -93.49%
BenchmarkSprintfPrefixedInt-32          824           50  -93.83%

BenchmarkSprintfFloat                   418          398   -4.78%
BenchmarkSprintfFloat-2                 295          203  -31.19%
BenchmarkSprintfFloat-4                 585          128  -78.12%
BenchmarkSprintfFloat-8                 873           60  -93.13%
BenchmarkSprintfFloat-16                884           33  -96.24%
BenchmarkSprintfFloat-32                881           29  -96.62%

BenchmarkManyArgs                      1097         1069   -2.55%
BenchmarkManyArgs-2                     705          567  -19.57%
BenchmarkManyArgs-4                     792          319  -59.72%
BenchmarkManyArgs-8                     963          172  -82.14%
BenchmarkManyArgs-16                   1115          103  -90.76%
BenchmarkManyArgs-32                   1133           90  -92.03%

LGTM=rsc
R=golang-codereviews, bradfitz, minux.ma, gobot, rsc
CC=golang-codereviews
https://golang.org/cl/46010043
2014-01-24 22:29:53 +04:00
Dmitriy Vyukov
9cbd2fb1aa runtime: remove locks from netpoll hotpaths
Introduces a two-phase goroutine parking mechanism -- prepare to park, commit park.
This mechanism does not require a backing mutex to protect the wait predicate.
Use it in netpoll. See the comment in netpoll.goc for details.
This slightly reduces contention between reader, writer and read/write io notifications,
and eliminates a bunch of mutex operations from hotpaths, thus making them faster.
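
A loose user-level analogue of the prepare/commit idea, using an atomic
state word instead of a mutex around the wait predicate. This is a sketch
under assumptions, not the runtime code: the event type, its states and the
buffered channel are hypothetical, it supports a single waiter at a time,
and notifications that arrive close together coalesce.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical states of the wait word.
const (
	idle    int32 = iota // nobody waiting, nothing pending
	waiting              // the waiter has announced its intent to park
	ready                // a notification has been posted
)

// event supports one waiter at a time.
type event struct {
	state int32
	wake  chan struct{}
}

func newEvent() *event { return &event{wake: make(chan struct{}, 1)} }

// wait blocks until a notification is posted, or returns immediately
// if one is already pending.
func (e *event) wait() {
	// Phase 1 (prepare): publish the intent to park. The CAS fails if a
	// notification is already pending, in which case we do not block.
	if atomic.CompareAndSwapInt32(&e.state, idle, waiting) {
		// Phase 2 (commit): actually block until the notifier wakes us.
		<-e.wake
	}
	// Consume the notification.
	atomic.CompareAndSwapInt32(&e.state, ready, idle)
}

// notify posts a notification and wakes the waiter if it has prepared to park.
func (e *event) notify() {
	if atomic.SwapInt32(&e.state, ready) == waiting {
		e.wake <- struct{}{}
	}
}

func main() {
	e := newEvent()
	go e.notify()
	e.wait() // returns whether notify runs before or after the prepare step
	fmt.Println("woken")
}
```

The point of the two phases is that a notification racing with the waiter
is never lost: either the prepare step sees it, or the notifier sees the
prepared waiter and wakes it.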

benchmark                             old ns/op    new ns/op    delta
BenchmarkTCP4ConcurrentReadWrite           2109         1945   -7.78%
BenchmarkTCP4ConcurrentReadWrite-2         1162         1113   -4.22%
BenchmarkTCP4ConcurrentReadWrite-4          798          755   -5.39%
BenchmarkTCP4ConcurrentReadWrite-8          803          748   -6.85%
BenchmarkTCP4Persistent                    9411         9240   -1.82%
BenchmarkTCP4Persistent-2                  5888         5813   -1.27%
BenchmarkTCP4Persistent-4                  4016         3968   -1.20%
BenchmarkTCP4Persistent-8                  3943         3857   -2.18%

R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
CC=golang-codereviews, khr
https://golang.org/cl/45700043
2014-01-22 11:27:16 +04:00
Dmitriy Vyukov
98b50b89a8 runtime: allocate goroutine ids in batches
Helps reduce contention on sched.goidgen.
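
The batching idea can be sketched as follows, assuming a hypothetical batch
size and hypothetical types (idGen stands in for the shared sched.goidgen
counter, idCache for a per-P cache). It is an illustration, not the code
from this CL.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// idBatch is a hypothetical batch size.
const idBatch = 16

// idGen stands in for the shared counter; last is the highest ID reserved so far.
type idGen struct{ last uint64 }

// idCache models a per-P cache; IDs in [next, end) are reserved for it.
type idCache struct {
	gen       *idGen
	next, end uint64
}

// nextID returns a unique ID, touching the shared counter only once per batch.
func (c *idCache) nextID() uint64 {
	if c.next == c.end {
		// One atomic add reserves idBatch consecutive IDs.
		hi := atomic.AddUint64(&c.gen.last, idBatch)
		c.next, c.end = hi-idBatch+1, hi+1
	}
	id := c.next
	c.next++
	return id
}

func main() {
	gen := &idGen{}
	a, b := &idCache{gen: gen}, &idCache{gen: gen}
	x, y, z := a.nextID(), b.nextID(), a.nextID()
	fmt.Println(x, y, z)
}
```

IDs stay unique but are no longer handed out in global creation order,
since each cache draws from its own reserved range.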

benchmark                               old ns/op    new ns/op    delta
BenchmarkCreateGoroutines-16                  259          237   -8.49%
BenchmarkCreateGoroutinesParallel-16          127           43  -66.06%

R=golang-codereviews, dave, bradfitz, khr
CC=golang-codereviews, rsc
https://golang.org/cl/46970043
2014-01-22 10:34:36 +04:00
Dmitriy Vyukov
8a3c587dc1 runtime: fix and improve CPU profiling
- do not lose profiling signals when we have no mcache (possible for syscalls/cgo)
- do not lose any profiling signals on windows
- fix profiling of cgo programs on windows (they had no m->thread setup)
- properly setup tls in cgo programs on windows
- check _beginthread return value

Fixes #6417.
Fixes #6986.

R=alex.brainman, rsc
CC=golang-codereviews
https://golang.org/cl/44820047
2014-01-22 10:30:10 +04:00
Dmitriy Vyukov
cb133c6607 runtime: do not collect GC roots explicitly
Currently we collect (add) all roots into a global array in a single-threaded GC phase.
This hinders parallelism.
With this change we just kick off a parallel for with number_of_goroutines+5 iterations.
The parallel for callback then decides whether it needs to scan the stack of a goroutine,
scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely.
This requires storing all goroutines in an array instead of a linked list
(to allow direct indexing).
This CL also removes the DebugScan functionality. It is broken because it uses
unbounded stack, so it cannot run on g0. Even when it was working, I found
it unhelpful for debugging issues because the two algorithms are too different now.
This change would require updating DebugScan, so it's simpler to just delete it.
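
The index-dispatch idea can be sketched as follows. This is an illustration
under assumptions, not the runtime code: the five fixed root names are
hypothetical placeholders, and this parallelFor hands out indexes from a
shared atomic counter, which may differ from how the runtime's parallel for
distributes work.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hypothetical names for the five fixed roots; goroutine stacks follow them.
const (
	rootData = iota
	rootBss
	rootFinalizers
	rootSpans
	rootFlushCaches
	rootCount
)

// parallelFor runs body(i) exactly once for each i in [0, n) on the given
// number of workers. Indexes are claimed with an atomic counter.
func parallelFor(n, workers int, body func(i int)) {
	var next int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				i := int(atomic.AddInt64(&next, 1) - 1)
				if i >= n {
					return
				}
				body(i)
			}
		}()
	}
	wg.Wait()
}

// describeRoot maps an iteration index to the root it stands for. Because
// goroutines live in an array, index i-rootCount addresses one directly.
func describeRoot(i int) string {
	switch i {
	case rootData:
		return "data segment"
	case rootBss:
		return "bss segment"
	case rootFinalizers:
		return "finalizers"
	case rootSpans:
		return "spans"
	case rootFlushCaches:
		return "flush caches"
	}
	return fmt.Sprintf("stack of goroutine %d", i-rootCount)
}

func main() {
	const ngoroutines = 3
	work := make([]string, rootCount+ngoroutines)
	// No root list is built up front; each iteration decides what to scan.
	parallelFor(len(work), 4, func(i int) { work[i] = describeRoot(i) })
	for _, w := range work {
		fmt.Println(w)
	}
}
```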

With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same.

garbage-8
allocated                 2987886      2989221      +0.04%
allocs                      62885        62887      +0.00%
cputime                  21286000     21272000      -0.07%
gc-pause-one             26633247     24885421      -6.56%
gc-pause-total             873570       811264      -7.13%
rss                     242089984    242515968      +0.18%
sys-gc                   13934336     13869056      -0.47%
sys-heap                205062144    205062144      +0.00%
sys-other                12628288     12628288      +0.00%
sys-stack                11534336     11927552      +3.41%
sys-total               243159104    243487040      +0.13%
time                      2809477      2740795      -2.44%

R=golang-codereviews, rsc
CC=cshapiro, golang-codereviews, khr
https://golang.org/cl/46860043
2014-01-21 13:06:57 +04:00