mirror of https://github.com/golang/go synced 2024-10-04 08:21:22 -06:00
Commit Graph

284 Commits

Author SHA1 Message Date
Dmitriy Vyukov
a12661329b runtime: fix triggering of forced GC
mstats.last_gc is Unix time now, but it is compared with abstract monotonic time.
On my machine GC is forced every 5 minutes regardless of last_gc.
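
For illustration, the shape of the buggy comparison (a sketch with invented names, not the actual runtime code):

        // lastGC holds Unix nanoseconds; monotonicNow comes from the
        // abstract monotonic clock, so the difference below is meaningless
        // and the forced-GC threshold misfires.
        func needForcedGC(lastGC, monotonicNow, period int64) bool {
                return monotonicNow-lastGC > period // comparing two different clocks
        }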

LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, iant, rsc
https://golang.org/cl/91350045
2014-05-13 09:53:03 +04:00
Dmitriy Vyukov
acb03b8028 runtime: optimize markspan
Increases throughput by 2x on a memory hungry program on 8-node NUMA machine.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/100230043
2014-05-07 19:32:34 +04:00
Dmitriy Vyukov
350a8fcde1 runtime: make MemStats.LastGC Unix time again
The monotonic clock patch changed all runtime times
to abstract monotonic time. As a result, the user-visible
MemStats.LastGC became monotonic time as well.
Restore Unix time for LastGC.

This is the simplest way I found to expose time.now to the runtime.
Another option would be to change time.now to a C-called
int64 runtime.unixnanotime() and then express time.now in terms of it.
But this would require introducing two 64-bit divisions into time.now.
Yet another option would be to change time.now to a C-called
void runtime.unixnanotime1(struct {int64 sec, int32 nsec} *now)
and then express both time.now and runtime.unixnanotime in terms of it.
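
For reference, once LastGC is Unix time again it can be consumed directly with the standard library:

        package main

        import (
                "fmt"
                "runtime"
                "time"
        )

        func main() {
                runtime.GC()
                var ms runtime.MemStats
                runtime.ReadMemStats(&ms)
                fmt.Println("last GC:", time.Unix(0, int64(ms.LastGC)))
        }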

Fixes #7852.

LGTM=minux.ma, iant
R=minux.ma, rsc, iant
CC=golang-codereviews
https://golang.org/cl/93720045
2014-05-02 17:32:42 +01:00
Russ Cox
a5b1530557 runtime: adjust GC debug print to include source pointers
Having the pointers means you can grub around in the
binary finding out more about them.

This helped with issue 7748.

LGTM=minux.ma, bradfitz
R=golang-codereviews, minux.ma, bradfitz
CC=golang-codereviews
https://golang.org/cl/88090045
2014-04-16 11:39:43 -04:00
Dmitriy Vyukov
55e0f36fb4 runtime: fix program termination when main goroutine calls Goexit
Do not consider idle finalizer/bgsweep/timer goroutines as doing something useful.
We can't simply set isbackground for the whole lifetime of these goroutines,
because when the finalizer goroutine calls a user function, we do want to consider it
as doing something useful.
This has been broken by timers for quite some time.
With the background sweep it has become even more broken.
Fixes #7784.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/87960044
2014-04-15 19:48:17 +04:00
Dmitriy Vyukov
8fc6ed4c89 sync: less aggressive local caching in Pool
Currently Pool can cache up to 15 elements per P, and these elements are not accessible to other Ps.
If a Pool caches large objects, say 2MB, and GOMAXPROCS is set to a large value, say 32,
then the Pool can waste up to 960MB.
The new caching policy caches at most 1 per-P element; the rest is shared between Ps.
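
The 960MB figure is just the product of the old limits; as a worked check:

        // worst case under the old policy (illustrative arithmetic):
        const (
                elemsPerP = 15      // old per-P cache limit
                elemSize  = 2 << 20 // 2MB objects, as above
                numP      = 32      // large GOMAXPROCS, as above
        )
        const wasted = elemsPerP * elemSize * numP // 960 << 20 bytes = 960MB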

Get/Put performance is unchanged. Nested Get/Put performance is 57% worse.
However, overall scalability of nested Get/Put is significantly improved,
so the new policy starts winning under contention.

benchmark                     old ns/op     new ns/op     delta
BenchmarkPool                 27.4          26.7          -2.55%
BenchmarkPool-4               6.63          6.59          -0.60%
BenchmarkPool-16              1.98          1.87          -5.56%
BenchmarkPool-64              1.93          1.86          -3.63%
BenchmarkPoolOverflow         3970          6235          +57.05%
BenchmarkPoolOverflow-4       10935         1668          -84.75%
BenchmarkPoolOverflow-16      13419         520           -96.12%
BenchmarkPoolOverflow-64      10295         380           -96.31%

LGTM=rsc
R=rsc
CC=golang-codereviews, khr
https://golang.org/cl/86020043
2014-04-14 21:13:32 +04:00
Russ Cox
5539ef02b6 runtime: make times in GODEBUG=gctrace=1 output clearer
TBR=0intro
CC=golang-codereviews
https://golang.org/cl/86620043
2014-04-10 14:34:48 -04:00
Russ Cox
95ee7d6414 runtime: use 3x fewer nanotime calls in garbage collection
Cuts the number of calls from 6 to 2 in the non-debug case.

LGTM=iant
R=golang-codereviews, iant
CC=0intro, aram, golang-codereviews, khr
https://golang.org/cl/86040043
2014-04-09 10:38:12 -04:00
Russ Cox
72c5d5e756 reflect, runtime: fix crash in GC due to reflect.call + precise GC
Given
        type Outer struct {
                *Inner
                ...
        }
the compiler generates the implementation of (*Outer).M dispatching to
the embedded Inner. The implementation is logically:
        func (p *Outer) M() {
                (p.Inner).M()
        }
but since the only change here is the replacement of one pointer
receiver with another, the actual generated code overwrites the
original receiver with the p.Inner pointer and then jumps to the M
method expecting the *Inner receiver.

During reflect.Value.Call, we create an argument frame and the
associated data structures to describe it to the garbage collector,
populate the frame, call reflect.call to run a function call using
that frame, and then copy the results back out of the frame. The
reflect.call function does a memmove of the frame structure onto the
stack (to set up the inputs), runs the call, and then memmoves the
stack back to the frame structure (to preserve the outputs).

Originally reflect.call did not distinguish inputs from outputs: both
memmoves were for the full stack frame. However, in the case where the
called function was one of these wrappers, the rewritten receiver is
almost certainly a different type than the original receiver. This is
not a problem on the stack, where we use the program counter to
determine the type information and understand that during (*Outer).M
the receiver is an *Outer while during (*Inner).M the receiver in the
same memory word is now an *Inner. But in the statically typed
argument frame created by reflect, the receiver is always an *Outer.
Copying the modified receiver pointer off the stack into the frame
will store an *Inner there, and then if a garbage collection happens
to scan that argument frame before it is discarded, it will scan the
*Inner memory as if it were an *Outer. If the two have different
memory layouts, the collection will interpret the memory incorrectly.

Fix by only copying back the results.
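
A minimal program that exercises the wrapper path described above (the types mirror the sketch at the top of this message):

        package main

        import (
                "fmt"
                "reflect"
        )

        type Inner struct{ x int }

        func (p *Inner) M() { fmt.Println(p.x) }

        type Outer struct {
                *Inner
                extra [16]byte // make the two layouts differ
        }

        func main() {
                o := &Outer{Inner: &Inner{x: 7}}
                // reflect builds an argument frame typed as *Outer; the generated
                // wrapper rewrites the receiver word to the embedded *Inner before
                // jumping to (*Inner).M, which is what confused the collector.
                reflect.ValueOf(o).MethodByName("M").Call(nil)
        }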

Fixes #7725.

LGTM=khr
R=khr
CC=dave, golang-codereviews
https://golang.org/cl/85180043
2014-04-08 11:11:35 -04:00
Russ Cox
28f1868fed cmd/gc, runtime: make GODEBUG=gcdead=1 mode work with liveness
Trying to make GODEBUG=gcdead=1 work with liveness
and in particular ambiguously live variables.

1. In the liveness computation, mark all ambiguously live
variables as live for the entire function, except the entry.
They are zeroed directly after entry, and we need them not
to be poisoned thereafter.

2. In the liveness computation, compute liveness (and deadness)
for all parameters, not just pointer-containing parameters.
Otherwise gcdead poisons untracked scalar parameters and results.

3. Fix liveness debugging print for -live=2 to use correct bitmaps.
(Was not updated for compaction during compaction CL.)

4. Correct varkill during map literal initialization.
Was killing the map itself instead of the inserted value temp.

5. Disable aggressive varkill cleanup for call arguments if
the call appears in a defer or go statement.

6. In the garbage collector, avoid a bug when scanning empty
strings. An empty string is two zeros. The multiword
code only looked at the first zero and then interpreted
the next two bits in the bitmap as an ordinary word bitmap.
For a string the bits are 11 00, so if a live string was zero
length with a 0 base pointer, the poisoning code treated
the length as an ordinary word with code 00, meaning it
needed poisoning, turning the string into a poison-length
string with base pointer 0. By the same logic I believe that
a live nil slice (bits 11 01 00) will have its cap poisoned.
Always scan full multiword struct.

7. In the runtime, treat both poison words (PoisonGC and
PoisonStack) as invalid pointers that warrant crashes.

Manual testing as follows:

- Create a script called gcdead on your PATH containing:

        #!/bin/bash
        GODEBUG=gcdead=1 GOGC=10 GOTRACEBACK=2 exec "$@"
- Now you can build a test and then run 'gcdead ./foo.test'.
- More importantly, you can run 'go test -short -exec gcdead std'
   to run all the tests.

Fixes #7676.

While here, enable the precise scanning of slices, since that was
disabled due to bugs like these. That now works, both with and
without gcdead.

Fixes #7549.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83410044
2014-04-03 20:33:25 -04:00
Russ Cox
4676fae525 cmd/gc, cmd/ld, runtime: compact liveness bitmaps
Reduce footprint of liveness bitmaps by about 5x.

1. Mark all liveness bitmap symbols as 4-byte aligned
(they were aligned to a larger size by default).

2. The bitmap data is a bitmap count n followed by n bitmaps.
Each bitmap begins with its own count m giving the number
of bits. All the m's are the same for the n bitmaps.
Emit this bitmap length once instead of n times.

3. Many bitmaps within a function have the same bit values,
but each call site was given a distinct bitmap. Merge duplicate
bitmaps so that no bitmap is written more than once.

4. Many functions end up with the same aggregate bitmap data.
We used to name the bitmap data funcname.gcargs and funcname.gclocals.
Instead, name it gclocals.<md5 of data> and mark it dupok so
that the linker coalesces duplicate sets. This cut the bitmap
data remaining after step 3 by 40%; I was not expecting it to
be quite so dramatic.

Applied to "go build -ldflags -w code.google.com/p/go.tools/cmd/godoc":

                bitmaps           pclntab           binary on disk
before this CL  1326600           1985854           12738268
4-byte align    1154288 (0.87x)   1985854 (1.00x)   12566236 (0.99x)
one bitmap len   782528 (0.54x)   1985854 (1.00x)   12193500 (0.96x)
dedup bitmap     414748 (0.31x)   1948478 (0.98x)   11787996 (0.93x)
dedup bitmap set 245580 (0.19x)   1948478 (0.98x)   11620060 (0.91x)

While here, remove various dead blocks of code from plive.c.

Fixes #6929.
Fixes #7568.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83630044
2014-04-02 16:49:27 -04:00
Russ Cox
1ec4d5e9e7 runtime: adjust GODEBUG=allocfreetrace=1 and GODEBUG=gcdead=1
GODEBUG=allocfreetrace=1:

The allocfreetrace=1 mode prints a stack trace for each block
allocated and freed, and also a stack trace for each garbage collection.

It was implemented by reusing the heap profiling support: if allocfreetrace=1
then the heap profile was effectively running at 1 sample per 1 byte allocated
(always sample). The stack being shown at allocation was the stack gathered
for profiling, meaning it was derived only from the program counters and
did not include information about function arguments or frame pointers.
The stack being shown at free was the allocation stack, not the free stack.
If you are generating this log, you can find the allocation stack yourself, but
it can be useful to see exactly the sequence that led to freeing the block:
was it the garbage collector or an explicit free? Now that the garbage collector
runs on an m0 stack, the stack trace for the garbage collector is never interesting.

Fix all these problems:

1. Decouple allocfreetrace=1 from heap profiling.
2. Print the standard goroutine stack traces instead of a custom format.
3. Print the stack trace at time of allocation for an allocation,
   and print the stack trace at time of free (not the allocation trace again)
   for a free.
4. Print all goroutine stacks at garbage collection. Having all the stacks
   means that you can see the exact point at which each goroutine was
   preempted, which is often useful for identifying liveness-related errors.

GODEBUG=gcdead=1:

This mode overwrites dead pointers with a poison value.
Detect the poison value as an invalid pointer during collection,
the same way that small integers are invalid pointers.
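
A minimal way to exercise both modes (the GODEBUG values are the ones named above; the program itself is illustrative):

        package main

        import "runtime"

        // Run as: GODEBUG=allocfreetrace=1 ./prog  (or GODEBUG=gcdead=1 ./prog)
        func main() {
                for i := 0; i < 4; i++ {
                        _ = make([]byte, 1<<20) // each allocation and free appears in the trace
                        runtime.GC()            // each collection prints all goroutine stacks
                }
        }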

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/81670043
2014-04-01 13:30:10 -04:00
Russ Cox
5a23a7e52c runtime: enable 'bad pointer' check during garbage collection of Go stack frames
This is the same check we use during stack copying.
The check cannot be applied to C stack frames, even
though we do emit pointer bitmaps for the arguments,
because (1) the pointer bitmaps assume all arguments
are always live, which is not true of outputs during the prologue,
and (2) the pointer bitmaps encode interface values as
pointer pairs, which is not true of interfaces holding integers.

For the rest of the frames, however, we should hold ourselves
to the rule that a pointer marked live really is initialized.
The interface scanning already implicitly checks this
because it interprets the type word as a valid type pointer.

This may slow things down a little because of the extra loads.
Or it may speed things up because we don't bother enqueuing
nil pointers anymore. Enough of the rest of the system is slow
right now that we can't measure it meaningfully.
Enable for now, even if it is slow, to shake out bugs in the
liveness bitmaps, and then decide whether to turn it off
for the Go 1.3 release (issue 7650 reminds us to do this).

The new m->traceback field lets us force printing of fp=
values on all goroutine stack traces when we detect a
bad pointer. This makes it easier to understand exactly
where in the frame the bad pointer is, so that we can trace
it back to a specific variable and determine what is wrong.

Update #7650

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/80860044
2014-03-27 14:06:15 -04:00
Dmitriy Vyukov
f8c350873c runtime: fix yet another race in bgsweep
Currently it's possible that bgsweep finishes before all spans
have been swept (we only know that sweeping of all spans has *started*).
In such a case bgsweep may fail to wake up the runfinq goroutine when it needs to.
finq may still be nil at this point, but some finalizers may be queued later.
Make bgsweep wait for sweeping to *complete*; then it can decide for sure
whether it needs to wake up runfinq.
Update #7533

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75960043
2014-03-26 15:11:36 +04:00
Dmitriy Vyukov
40f5e67571 runtime: minor improvement of string scanning
If we set obj, then it will be enqueued for marking at the end of the scanning loop.
This is not necessary, since we've already marked it.
This can wait for 1.4 if you wish.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/80030043
2014-03-26 15:03:58 +04:00
Keith Randall
fff63c2448 runtime: WriteHeapDump dumps the heap to a file.
See http://golang.org/s/go13heapdump for the file format.

LGTM=rsc
R=rsc, bradfitz, dvyukov, khr
CC=golang-codereviews
https://golang.org/cl/37540043
2014-03-25 15:09:49 -07:00
Keith Randall
1b45cc45e3 runtime: redo stack map entries to avoid false retention
Change two-bit stack map entries to encode:
0 = dead
1 = scalar
2 = pointer
3 = multiword

If multiword, the two-bit entry for the following word encodes:
0 = string
1 = slice
2 = iface
3 = eface

That way, during stack scanning we can check if a string
is zero length or a slice has zero capacity.  We can avoid
following the contained pointer in those cases.  It is safe
to do so because it can never be dereferenced, and it is
desirable to do so because it may cause false retention
of the following block in memory.
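
A sketch of decoding these entries (constant names are illustrative; the runtime's own may differ):

        // first two-bit entry per stack word:
        const (
                bitsDead      = 0
                bitsScalar    = 1
                bitsPointer   = 2
                bitsMultiWord = 3
        )

        // if bitsMultiWord, the next two-bit entry encodes the kind:
        const (
                bitsString = 0
                bitsSlice  = 1
                bitsIface  = 2
                bitsEface  = 3
        )

        // entry extracts the i'th two-bit entry from a packed bitmap
        // (four entries per byte):
        func entry(bitmap []byte, i int) byte {
                return bitmap[i/4] >> (uint(i%4) * 2) & 3
        }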

Slice feature turned off until issue 7564 is fixed.

Update #7549

LGTM=rsc
R=golang-codereviews, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/76380043
2014-03-25 14:11:34 -07:00
Ian Lance Taylor
4ebfa83199 runtime: accurately record whether heap memory is reserved
The existing code did not have a clear notion of whether
memory has actually been reserved.  It checked based on
whether it was in 32-bit or 64-bit mode and (on GNU/Linux) on the
requested address, but it confused the requested address with
the returned address.

LGTM=rsc
R=rsc, dvyukov
CC=golang-codereviews, michael.hudson
https://golang.org/cl/79610043
2014-03-25 13:22:19 -07:00
Ian Lance Taylor
8de04c78b7 runtime: change nproc local variable to uint32
The nproc and ndone fields are uint32.  This makes the type
consistent.

LGTM=minux.ma
R=golang-codereviews, minux.ma
CC=golang-codereviews
https://golang.org/cl/79340044
2014-03-25 05:18:08 -07:00
Dmitriy Vyukov
fed5428c4a runtime: fix another race in bgsweep
It's possible that bgsweep constantly does not catch up for some reason;
in this case runfinq was not woken at all.

R=rsc
CC=golang-codereviews
https://golang.org/cl/75940043
2014-03-14 23:32:12 +04:00
Dmitriy Vyukov
8d321625fd runtime: fix spans corruption
The problem was that spans end up in the wrong lists after a split
(e.g. in h->busy instead of h->central->empty).
Also, the span can be unswept before the split;
I don't know what that can cause, but it's safer to operate on swept spans.
Fixes #7544.

R=golang-codereviews, rsc
CC=golang-codereviews, khr
https://golang.org/cl/76160043
2014-03-14 23:25:48 +04:00
Dmitriy Vyukov
0da73b9f07 runtime: fix a race in bgsweep
See the comment for description.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75670044
2014-03-14 21:21:44 +04:00
Russ Cox
54c901cd08 runtime: fix empty string handling in garbage collector
The garbage collector uses type information to guide the
traversal of the heap. If it sees a field that should be a string,
it marks the object pointed at by the string data pointer as
visited but does not bother to look at the data, because
strings contain bytes, not pointers.

If you save s[len(s):] somewhere, though, the string data pointer
actually points just beyond the string data; if the string data
were exactly the size of an allocated block, the string data
pointer would actually point at the next block. It is incorrect
to mark that next block as visited and not bother to look at
the data, because the next block may be some other type
entirely.

The fix is to ignore strings with zero length during collection:
they are empty and can never become non-empty, so the base
pointer will never be used again. The handling of slices already
does this (but using cap instead of len).
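
A small demonstration of why s[len(s):] is dangerous for the collector; the header layout is inspected via unsafe and reflects the implementation at the time, not a language guarantee:

        package main

        import (
                "fmt"
                "unsafe"
        )

        type stringHeader struct { // mirrors the runtime's string layout
                data uintptr
                len  int
        }

        func main() {
                s := "hello"
                tail := s[len(s):] // empty, but its data pointer is one past the end of s
                hs := (*stringHeader)(unsafe.Pointer(&s))
                ht := (*stringHeader)(unsafe.Pointer(&tail))
                // true: tail's base pointer sits just past s's allocation
                fmt.Println(ht.data == hs.data+uintptr(len(s)))
        }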

This was not a bug in Go 1.2, because until January all string
allocations included a trailing NUL byte not included in the
length, so s[len(s):] still pointed inside the string allocation
(at the NUL).

This bug was causing the crashes in test/run.go. Specifically,
the parsing of a regexp in package regexp/syntax allocated a
[]syntax.Inst with rounded size 1152 bytes. In fact it
allocated many such slices, because during the processing of
test/index2.go it creates thousands of regexps that are all
approximately the same complexity. That takes a long time, and
test/run works on other tests in other goroutines. One such
other test is chan/perm.go, which uses a 1152-byte source
file. test/run reads that file into a []byte and then calls
strings.Split(string(src), "\n"). The string(src) creates a
1152-byte string - and there's a very good chance of it
landing next to one of the many many regexp slices already
allocated - and then because the file ends in a \n,
strings.Split records the tail empty string as the final
element in the slice. A garbage collection happens at this
point, the collection finds that string before encountering
the []syntax.Inst data it now inadvertently points to, and the
[]syntax.Inst data is not scanned for the pointers that it
contains. Each syntax.Inst contains a []rune, those are
missed, and the backing rune arrays are freed for reuse. When
the regexp is later executed, the runes being searched for are
no longer runes at all, and there is no match, even on text
that should match.

On 64-bit machines the pointer in the []rune inside the
syntax.Inst is larger (along with a few other pointers),
pushing the []syntax.Inst backing array into a larger size
class, avoiding the collision with chan/perm.go's
inadvertently sized file.

I expect this was more prevalent on OS X than on Linux or
Windows because those managed to run faster or slower and
didn't overlap index2.go with chan/perm.go as often. On the
ARM systems, we only run one errorcheck test at a time, so
index2 and chan/perm would never overlap.

It is possible that this bug is the root cause of other crashes
as well. For now we only know it is the cause of the test/run crash.

Many thanks to Dmitriy for help debugging.

Fixes #7344.
Fixes #7455.

LGTM=r, dvyukov, dave, iant
R=golang-codereviews, dave, r, dvyukov, delpontej, iant
CC=golang-codereviews, khr
https://golang.org/cl/74250043
2014-03-11 23:58:39 -04:00
Dmitriy Vyukov
3877f1d9c8 runtime: remove atomic CAS loop from marknogc
Spans are now private to threads, and the loop
is removed from all other functions.
Remove it from marknogc for consistency.

LGTM=khr, rsc
R=golang-codereviews, bradfitz, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/72520043
2014-03-11 17:35:49 +04:00
Dmitriy Vyukov
38f6c3f59d runtime: wipe out bitSpecial from GC code
LGTM=khr, rsc
R=golang-codereviews, bradfitz, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/72480044
2014-03-11 17:33:03 +04:00
Keith Randall
1306279cd1 runtime: remove unused declarations.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/73720044
2014-03-10 16:02:46 -07:00
Russ Cox
b08156cd87 runtime: fix memory leak in runfinq
One reason the sync.Pool finalizer test can fail is that
this function's ef1 contains uninitialized data that just
happens to point at some of the old pool. I've seen this cause
retention of a single pool cache line (32 elements) on arm.

Really we need liveness information for C functions, but
for now we can be more careful about data in long-lived
C functions that block.

LGTM=bradfitz, dvyukov
R=golang-codereviews, bradfitz, dvyukov
CC=golang-codereviews, iant, khr
https://golang.org/cl/72490043
2014-03-07 11:27:01 -05:00
Russ Cox
da1bea0ef0 runtime: fix malloc page alignment + efence
Two memory allocator bug fixes.

- efence is not maintaining the proper heap metadata
  to make eventual memory reuse safe, so use SysFault.

- now that our heap PageSize is 8k but most hardware
  uses 4k pages, SysAlloc and SysReserve results must be
  explicitly aligned. Do that in a few more call sites and
  document this fact in malloc.h.

Fixes #7448.

LGTM=iant
R=golang-codereviews, josharian, iant
CC=dvyukov, golang-codereviews
https://golang.org/cl/71750048
2014-03-06 18:34:29 -05:00
David du Colombier
fb5e1e1fa1 runtime: fix warnings on Plan 9
warning: pkg/runtime/mgc0.c:2352 format mismatch p UVLONG, arg 2
warning: pkg/runtime/mgc0.c:2352 format mismatch p UVLONG, arg 3

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/71950044
2014-03-06 20:56:22 +01:00
Dmitriy Vyukov
aea99eda0f runtime: fix runaway memory usage
It was caused by mstats.heap_alloc skew.
Fixes #7430.

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/69870055
2014-03-06 21:33:00 +04:00
Keith Randall
e9445547b6 runtime: move stack shrinking until after sweepgen is incremented.
Before GC, we flush all the per-P allocation caches.  Doing
stack shrinking mid-GC causes these caches to fill up.  At the
end of GC, the sweepgen is incremented, which causes all of the
data in these caches to be in a bad state (cached but not yet
swept).

Move the stack shrinking until after sweepgen is incremented,
so any caching that happens as part of shrinking is done with
already-swept data.

Reenable stack copying.

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/69620043
2014-02-27 14:20:15 -08:00
Keith Randall
1665b006a5 runtime: grow stack by copying
On stack overflow, if all frames on the stack are
copyable, we copy the frames to a new stack twice
as large as the old one.  During GC, if a G is using
less than 1/4 of its stack, copy the stack to a stack
half its size.
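
The resize policy above, as a sketch (illustrative, not the actual runtime code):

        func newStackSize(oldSize, used uintptr, overflow bool) uintptr {
                if overflow {
                        return oldSize * 2 // grow: copy frames to a stack twice as large
                }
                if used < oldSize/4 {
                        return oldSize / 2 // shrink: GC halves underused stacks
                }
                return oldSize
        }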

TODO
- Do something about C frames.  When a C frame is in the
  stack segment, it isn't copyable.  We allocate a new segment
  in this case.
  - For idempotent C code, we can abort it, copy the stack,
    then retry.  I'm working on a separate CL for this.
  - For other C code, we can raise the stackguard
    to the lowest Go frame so the next call that Go frame
    makes triggers a copy, which will then succeed.
- Pick a starting stack size?

The plan is that eventually we reach a point where the
stack contains only copyable frames.

LGTM=rsc
R=dvyukov, rsc
CC=golang-codereviews
https://golang.org/cl/54650044
2014-02-26 23:28:44 -08:00
Keith Randall
3b5278fca6 runtime: get rid of the settype buffer and lock.
MCaches now hold an MSpan for each size class, which they have
exclusive access to allocate from, so no lock is needed.

Modifying the heap bitmaps also no longer requires a cas.

runtime.free gets more expensive.  But we don't use it
much any more.

It's not much faster on 1 processor, but it's a lot
faster on multiple processors.

benchmark                 old ns/op    new ns/op    delta
BenchmarkSetTypeNoPtr1           24           23   -0.42%
BenchmarkSetTypeNoPtr2           33           34   +0.89%
BenchmarkSetTypePtr1             51           49   -3.72%
BenchmarkSetTypePtr2             55           54   -1.98%

benchmark                old ns/op    new ns/op    delta
BenchmarkAllocation          52739        50770   -3.73%
BenchmarkAllocation-2        33957        34141   +0.54%
BenchmarkAllocation-3        33326        29015  -12.94%
BenchmarkAllocation-4        38105        25795  -32.31%
BenchmarkAllocation-5        68055        24409  -64.13%
BenchmarkAllocation-6        71544        23488  -67.17%
BenchmarkAllocation-7        68374        23041  -66.30%
BenchmarkAllocation-8        70117        20758  -70.40%

LGTM=rsc, dvyukov
R=dvyukov, bradfitz, khr, rsc
CC=golang-codereviews
https://golang.org/cl/46810043
2014-02-26 15:52:58 -08:00
Dave Cheney
7c8280c9ef all: merge NaCl branch (part 1)
See golang.org/s/go13nacl for design overview.

This CL is the mostly mechanical changes from rsc's Go 1.2 based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support; there are probably two or three more large merges to come.

CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.

The exact change lists included are

15050047: syscall: support for Native Client
15360044: syscall: unzip implementation for Native Client
15370044: syscall: Native Client SRPC implementation
15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
15410048: runtime: support for Native Client
15410049: syscall: file descriptor table for Native Client
15410050: syscall: in-memory file system for Native Client
15440048: all: update +build lines for Native Client port
15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
15570045: os: support for Native Client
15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
15690044: net: support for Native Client
15690048: runtime: support for fake time like on Go Playground
15690051: build: disable various tests on Native Client

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/68150047
2014-02-25 09:47:42 -05:00
Dmitriy Vyukov
ea87501750 runtime: fix heap memory corruption
With concurrent sweeping, finc is modified by runfinq and queuefinalizer concurrently.
Fixes crashes like this one:
http://build.golang.org/log/6ad7b59ef2e93e3c9347eabfb4c4bd66df58fd5a
Fixes #7324.
Update #7396

LGTM=rsc
R=golang-codereviews, minux.ma, rsc
CC=golang-codereviews, khr
https://golang.org/cl/67980043
2014-02-24 20:53:50 +04:00
Dmitriy Vyukov
6e612ae0f5 runtime: fix potential memory corruption
Reinforce the guarantee that MSpan_EnsureSwept actually ensures that the span is swept.
I have not observed crashes related to this, but I do not see why it could not crash as well.

LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, khr, rsc
https://golang.org/cl/67990043
2014-02-24 20:53:20 +04:00
Dmitriy Vyukov
0ef0d6cd7b runtime: fix double symbol definition
runfinqv is already defined the same way on line 271.
There may also be something to fix in compiler/linker wrt diagnostics.
Fixes #7375.

LGTM=bradfitz
R=golang-codereviews, dave, bradfitz
CC=golang-codereviews
https://golang.org/cl/67850044
2014-02-24 20:23:03 +04:00
Russ Cox
67c83db60d runtime: use goc2c as much as possible
Package runtime's C functions written to be called from Go
started out written in C using carefully constructed argument
lists and the FLUSH macro to write a result back to memory.

For some functions, the appropriate parameter list ended up
being architecture-dependent due to differences in alignment,
so we added 'goc2c', which takes a .goc file containing Go func
declarations but C bodies, rewrites the Go func declaration to
equivalent C declarations for the target architecture, adds the
needed FLUSH statements, and writes out an equivalent C file.
That C file is compiled as part of package runtime.

Native Client's x86-64 support introduces the most complex
alignment rules yet, breaking many functions that could until
now be portably written in C. Using goc2c for those avoids the
breakage.

Separately, Keith's work on emitting stack information from
the C compiler would require the hand-written functions
to add #pragmas specifying how many arguments are result
parameters. Using goc2c for those avoids maintaining #pragmas.

For both reasons, use goc2c for as many Go-called C functions
as possible.
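
For context, the rough shape of a .goc file: a Go func header with a C body, which goc2c expands into architecture-correct C (the function here is invented for illustration):

        package runtime
        #include "runtime.h"

        func example(x uint32, y uint32) (sum uint32) {
                sum = x + y;
        }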

This CL is a replay of the bulk of CL 15400047 and CL 15790043,
both of which were reviewed as part of the NaCl port and are
checked in to the NaCl branch. This CL is part of bringing the
NaCl code into the main tree.

No new code here, just reformatting and occasional movement
into .h files.

LGTM=r
R=dave, alex.brainman, r
CC=golang-codereviews
https://golang.org/cl/65220044
2014-02-20 15:58:47 -05:00
Russ Cox
86e3cb8da5 runtime: introduce MSpan.needzero instead of writing to span data
This cleans up the code significantly, and it avoids any
possible problems with madvise zeroing out some but
not all of the data.

Fixes #6400.

LGTM=dave
R=dvyukov, dave
CC=golang-codereviews
https://golang.org/cl/57680046
2014-02-13 11:10:31 -05:00
Dmitriy Vyukov
f8e4a2ef94 runtime: fix concurrent GC sweep
The issue was that one of the MSpan_Sweep callers
was doing sweep with preemption enabled.
Additional checks are added.

LGTM=rsc
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/62990043
2014-02-13 19:36:45 +04:00
Russ Cox
73a304356b runtime: fix non-concurrent sweep
State of the world:

CL 46430043 introduced a new concurrent sweep but is broken.

CL 62360043 made the new sweep non-concurrent
to try to fix the world while we understand what's wrong with
the concurrent version.

This CL fixes the non-concurrent form to run finalizers.
This CL is just a band-aid to get the build green again.

Dmitriy is working on understanding and then fixing what's
wrong with the concurrent sweep.

TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/62370043
2014-02-12 15:54:21 -05:00
Dmitriy Vyukov
3cac829ff4 runtime: temporary disable concurrent GC sweep
We see failures on builders, e.g.:
http://build.golang.org/log/70bb28cd6bcf8c4f49810a011bb4337a61977bf4

LGTM=rsc, dave
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/62360043
2014-02-13 00:03:27 +04:00
Dmitriy Vyukov
3c3be62201 runtime: concurrent GC sweep
Moves the sweep phase out of stoptheworld by adding a
background sweeper goroutine and lazy on-demand sweeping.

It turned out to be somewhat trickier than I expected,
because there is no point in time when we know the size of the live heap
or a consistent number of mallocs and frees.
So everything related to next_gc, mprof, memstats, etc. becomes trickier.

At the end of GC next_gc is conservatively set to heap_alloc*GOGC,
which is much larger than the real value. But after every sweep
next_gc is decremented by freed*GOGC. So when everything is swept,
next_gc becomes what it should be.

For mprof I had to introduce a 3-generation scheme (allocs, recent_allocs, prev_allocs),
because by the end of GC we know the number of frees for the *previous* GC.

Significant caution is required not to cross the yet-unknown real value of next_gc.
This is achieved by two means:
1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are
returned to the heap.
This provides quite strong guarantees that the heap does not grow when it should not.
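
A worked instance of that accounting, in the message's own shorthand:

        heap_alloc at end of GC = 100MB, of which 40MB is unswept garbage
        next_gc = 100MB * GOGC              (conservative: garbage still counts)
        sweeping eventually frees 40MB, so  next_gc -= 40MB * GOGC
        leaving next_gc = 60MB * GOGC, the value the live heap implies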

http-1
allocated                    7036         7033      -0.04%
allocs                         60           60      +0.00%
cputime                     51050        46700      -8.52%
gc-pause-one             34060569      1777993     -94.78%
gc-pause-total               2554          133     -94.79%
latency-50                 178448       170926      -4.22%
latency-95                 284350       198294     -30.26%
latency-99                 345191       220652     -36.08%
rss                     101564416    101007360      -0.55%
sys-gc                    6606832      6541296      -0.99%
sys-heap                 88801280     87752704      -1.18%
sys-other                 7334208      7405928      +0.98%
sys-stack                  524288       524288      +0.00%
sys-total               103266608    102224216      -1.01%
time                        50339        46533      -7.56%
virtual-mem             292990976    293728256      +0.25%

garbage-1
allocated                 2983818      2990889      +0.24%
allocs                      62880        62902      +0.03%
cputime                  16480000     16190000      -1.76%
gc-pause-one            828462467    487875135     -41.11%
gc-pause-total            4142312      2439375     -41.11%
rss                    1151709184   1153712128      +0.17%
sys-gc                   66068352     66068352      +0.00%
sys-heap               1039728640   1039728640      +0.00%
sys-other                37776064     40770176      +7.93%
sys-stack                 8781824      8781824      +0.00%
sys-total              1152354880   1155348992      +0.26%
time                     16496998     16199876      -1.80%
virtual-mem            1409564672   1402281984      -0.52%

LGTM=rsc
R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot
CC=golang-codereviews, khr
https://golang.org/cl/46430043
2014-02-12 22:16:42 +04:00
Dmitriy Vyukov
e48751e217 runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

This is the exact reincarnation of already LGTMed:
https://golang.org/cl/45770044
which must not break darwin/freebsd after:
https://golang.org/cl/56630043
TBR=iant

LGTM=khr, iant
R=iant, khr
CC=golang-codereviews
https://golang.org/cl/58230043
2014-01-30 13:28:19 +04:00
Dmitriy Vyukov
86a3a54284 runtime: fix windows build
Currently Windows crashes because early allocations in schedinit
try to allocate tiny memory blocks, but m->p is not yet set up.
I considered calling procresize(1) earlier in schedinit,
but this refactoring is better and should fix the issue as well.
Fixes #7218.

R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045
2014-01-28 00:26:56 +04:00
Dmitriy Vyukov
1fa7029425 runtime: combine small NoScan allocations
Combine NoScan allocations < 16 bytes into a single memory block.
This reduces the number of allocations on the json/garbage benchmarks by 10+%.
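
A hypothetical sketch of the combining scheme: a bump allocator inside a shared 16-byte noscan block (alignment handling omitted):

        type tinyBlock struct {
                buf [16]byte
                off int
        }

        // alloc returns a chunk of the shared block, or nil when the block
        // is exhausted and a fresh one must be started.
        func (b *tinyBlock) alloc(n int) []byte {
                if b.off+n > len(b.buf) {
                        return nil
                }
                p := b.buf[b.off : b.off+n]
                b.off += n
                return p
        }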

json-1
allocated                 8039872      7949194      -1.13%
allocs                     105774        93776     -11.34%
cputime                 156200000    100700000     -35.53%
gc-pause-one              4908873      3814853     -22.29%
gc-pause-total            2748969      2899288      +5.47%
rss                      52674560     43560960     -17.30%
sys-gc                    3796976      3256304     -14.24%
sys-heap                 43843584     35192832     -19.73%
sys-other                 5589312      5310784      -4.98%
sys-stack                  393216       393216      +0.00%
sys-total                53623088     44153136     -17.66%
time                    156193436    100886714     -35.41%
virtual-mem             256548864    256540672      -0.00%

garbage-1
allocated                 2996885      2932982      -2.13%
allocs                      62904        55200     -12.25%
cputime                  17470000     17400000      -0.40%
gc-pause-one            932757485    925806143      -0.75%
gc-pause-total            4663787      4629030      -0.75%
rss                    1151074304   1133670400      -1.51%
sys-gc                   66068352     65085312      -1.49%
sys-heap               1039728640   1024065536      -1.51%
sys-other                38038208     37485248      -1.45%
sys-stack                 8650752      8781824      +1.52%
sys-total              1152485952   1135417920      -1.48%
time                     17478088     17418005      -0.34%
virtual-mem            1343709184   1324204032      -1.45%

LGTM=iant, bradfitz
R=golang-codereviews, dave, iant, rsc, bradfitz
CC=golang-codereviews, khr
https://golang.org/cl/38750047
2014-01-24 22:35:11 +04:00
Dmitriy Vyukov
f8e0057bb7 sync: scalable Pool
Introduce fixed-size P-local caches.
When local caches overflow or underflow, a batch of items
is transferred to or from a global mutex-protected cache.
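
A rough sketch of the structure (names invented; the real code indexes local state by the current P and keeps the fast path lock-free; sync.Mutex is from package sync):

        const batch = 16 // illustrative local-cache / transfer size

        type pool struct {
                local  [][]interface{} // fixed-size per-P caches
                mu     sync.Mutex      // guards global
                global [][]interface{} // overflow batches shared by all Ps
        }

        func (p *pool) put(pid int, x interface{}) {
                l := p.local[pid]
                if len(l) < batch {
                        p.local[pid] = append(l, x)
                        return
                }
                p.mu.Lock()
                p.global = append(p.global, l) // move the full batch to the shared cache
                p.mu.Unlock()
                p.local[pid] = append(make([]interface{}, 0, batch), x)
        }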

benchmark                    old ns/op    new ns/op    delta
BenchmarkPool                    50554        22423  -55.65%
BenchmarkPool-4                 400359         5904  -98.53%
BenchmarkPool-16                403311         1598  -99.60%
BenchmarkPool-32                367310         1526  -99.58%

BenchmarkPoolOverflow             5214         3633  -30.32%
BenchmarkPoolOverflow-4          42663         9539  -77.64%
BenchmarkPoolOverflow-8          46919        11385  -75.73%
BenchmarkPoolOverflow-16         39454        13048  -66.93%

BenchmarkSprintfEmpty                    84           63  -25.68%
BenchmarkSprintfEmpty-2                 371           32  -91.13%
BenchmarkSprintfEmpty-4                 465           22  -95.25%
BenchmarkSprintfEmpty-8                 565           12  -97.77%
BenchmarkSprintfEmpty-16                498            5  -98.87%
BenchmarkSprintfEmpty-32                492            4  -99.04%

BenchmarkSprintfString                  259          229  -11.58%
BenchmarkSprintfString-2                574          144  -74.91%
BenchmarkSprintfString-4                651           77  -88.05%
BenchmarkSprintfString-8                868           47  -94.48%
BenchmarkSprintfString-16               825           33  -95.96%
BenchmarkSprintfString-32               825           30  -96.28%

BenchmarkSprintfInt                     213          188  -11.74%
BenchmarkSprintfInt-2                   448          138  -69.20%
BenchmarkSprintfInt-4                   624           52  -91.63%
BenchmarkSprintfInt-8                   691           31  -95.43%
BenchmarkSprintfInt-16                  724           18  -97.46%
BenchmarkSprintfInt-32                  718           16  -97.70%

BenchmarkSprintfIntInt                  311          282   -9.32%
BenchmarkSprintfIntInt-2                333          145  -56.46%
BenchmarkSprintfIntInt-4                642          110  -82.87%
BenchmarkSprintfIntInt-8                832           42  -94.90%
BenchmarkSprintfIntInt-16               817           24  -97.00%
BenchmarkSprintfIntInt-32               805           22  -97.17%

BenchmarkSprintfPrefixedInt             309          269  -12.94%
BenchmarkSprintfPrefixedInt-2           245          168  -31.43%
BenchmarkSprintfPrefixedInt-4           598           99  -83.36%
BenchmarkSprintfPrefixedInt-8           770           67  -91.23%
BenchmarkSprintfPrefixedInt-16          829           54  -93.49%
BenchmarkSprintfPrefixedInt-32          824           50  -93.83%

BenchmarkSprintfFloat                   418          398   -4.78%
BenchmarkSprintfFloat-2                 295          203  -31.19%
BenchmarkSprintfFloat-4                 585          128  -78.12%
BenchmarkSprintfFloat-8                 873           60  -93.13%
BenchmarkSprintfFloat-16                884           33  -96.24%
BenchmarkSprintfFloat-32                881           29  -96.62%

BenchmarkManyArgs                      1097         1069   -2.55%
BenchmarkManyArgs-2                     705          567  -19.57%
BenchmarkManyArgs-4                     792          319  -59.72%
BenchmarkManyArgs-8                     963          172  -82.14%
BenchmarkManyArgs-16                   1115          103  -90.76%
BenchmarkManyArgs-32                   1133           90  -92.03%

LGTM=rsc
R=golang-codereviews, bradfitz, minux.ma, gobot, rsc
CC=golang-codereviews
https://golang.org/cl/46010043
2014-01-24 22:29:53 +04:00
Russ Cox
a81692e265 cmd/gc: add zeroing to enable precise stack accounting
There is more zeroing than I would like right now -
temporaries used for the new map and channel runtime
calls need to be eliminated - but it will do for now.

This CL only has an effect if you are building with

        GOEXPERIMENT=precisestack ./all.bash

(or make.bash). It costs about 5% in the overall time
spent in all.bash. That number will come down before
we make it on by default, but this should be enough for
Keith to try using the precise maps for copying stacks.

amd64 only (and it's not really great generated code).

TBR=khr, iant
CC=golang-codereviews
https://golang.org/cl/56430043
2014-01-23 23:11:04 -05:00
Dmitriy Vyukov
8371b0142e undo CL 45770044 / d795425bfa18
Breaks darwin and freebsd.

««« original CL description
runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044
»»»

R=golang-codereviews
CC=golang-codereviews
https://golang.org/cl/56060043
2014-01-23 19:56:59 +04:00
Dmitriy Vyukov
6d603af6dc runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044
2014-01-23 18:59:43 +04:00
Dmitriy Vyukov
9cbd2fb1aa runtime: remove locks from netpoll hotpaths
Introduces a two-phase goroutine parking mechanism -- prepare to park, commit park.
This mechanism does not require a backing mutex to protect the wait predicate.
Use it in netpoll. See the comment in netpoll.goc for details.
This slightly reduces contention between reader, writer and read/write io notifications,
and eliminates a bunch of mutex operations from hotpaths, thus making them faster.
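
A loose user-level analogue of the two-phase idea, assuming a single waiter (a sketch using sync/atomic and a buffered channel, not the runtime's internals):

        var state int32 // 0 = empty, 1 = waiter committed, 2 = notified
        var sema = make(chan struct{}, 1)

        func wait() {
                for {
                        if atomic.CompareAndSwapInt32(&state, 2, 0) {
                                return // notification already pending: no park needed
                        }
                        if atomic.CompareAndSwapInt32(&state, 0, 1) { // prepare to park
                                <-sema // commit the park; no mutex guards the predicate
                        }
                }
        }

        func notify() {
                if atomic.SwapInt32(&state, 2) == 1 {
                        sema <- struct{}{} // a waiter committed; wake it
                }
        }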

benchmark                             old ns/op    new ns/op    delta
BenchmarkTCP4ConcurrentReadWrite           2109         1945   -7.78%
BenchmarkTCP4ConcurrentReadWrite-2         1162         1113   -4.22%
BenchmarkTCP4ConcurrentReadWrite-4          798          755   -5.39%
BenchmarkTCP4ConcurrentReadWrite-8          803          748   -6.85%
BenchmarkTCP4Persistent                    9411         9240   -1.82%
BenchmarkTCP4Persistent-2                  5888         5813   -1.27%
BenchmarkTCP4Persistent-4                  4016         3968   -1.20%
BenchmarkTCP4Persistent-8                  3943         3857   -2.18%

R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
CC=golang-codereviews, khr
https://golang.org/cl/45700043
2014-01-22 11:27:16 +04:00
Dmitriy Vyukov
cb133c6607 runtime: do not collect GC roots explicitly
Currently we collect (add) all roots into a global array in a single-threaded GC phase.
This hinders parallelism.
With this change we just kick off a parallel for over number_of_goroutines+5 iterations.
Then the parallel-for callback decides whether it needs to scan the stack of a goroutine,
scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely.
This requires storing all goroutines in an array instead of a linked list
(to allow direct indexing).
This CL also removes the DebugScan functionality. It is broken because it uses
an unbounded stack, so it cannot run on g0. When it was working, I found
it unhelpful for debugging issues because the two algorithms are too different now.
This change would require updating DebugScan, so it's simpler to just delete it.

With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same.

garbage-8
allocated                 2987886      2989221      +0.04%
allocs                      62885        62887      +0.00%
cputime                  21286000     21272000      -0.07%
gc-pause-one             26633247     24885421      -6.56%
gc-pause-total             873570       811264      -7.13%
rss                     242089984    242515968      +0.18%
sys-gc                   13934336     13869056      -0.47%
sys-heap                205062144    205062144      +0.00%
sys-other                12628288     12628288      +0.00%
sys-stack                11534336     11927552      +3.41%
sys-total               243159104    243487040      +0.13%
time                      2809477      2740795      -2.44%

R=golang-codereviews, rsc
CC=cshapiro, golang-codereviews, khr
https://golang.org/cl/46860043
2014-01-21 13:06:57 +04:00
Dmitriy Vyukov
1ba04c171a runtime: per-P defer pool
Instead of a per-goroutine stack of defers for all sizes,
introduce a per-P defer pool for argument sizes 8, 24, 40, 56, 72 bytes.

For a program that starts 1e6 goroutines and then joins them:
old: rss=6.6g virtmem=10.2g time=4.85s
new: rss=4.5g virtmem= 8.2g time=3.48s
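
A sketch of mapping an argument size to one of the five classes (the rounding here is illustrative, not the exact runtime formula):

        // classes cover argument sizes 8, 24, 40, 56, 72 (8 + 16*i):
        func deferClass(siz uintptr) int {
                return int((siz + 7) / 16) // 8 -> 0, 24 -> 1, ..., 72 -> 4
        }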

R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/42750044
2014-01-21 11:20:23 +04:00
Dmitriy Vyukov
d5a36cd6bb runtime: zero 2-word memory blocks in-place
Currently for 2-word blocks we set the flag just to clear the flag, which makes no sense.
In particular, on 32-bit we always call memclr.

R=golang-codereviews, dave, iant
CC=golang-codereviews, khr, rsc
https://golang.org/cl/41170044
2014-01-21 10:53:51 +04:00
Dmitriy Vyukov
c0b9e6218c runtime: output how long goroutines are blocked
Example of output:

goroutine 4 [sleep for 3 min]:
time.Sleep(0x34630b8a000)
        src/pkg/runtime/time.goc:31 +0x31
main.func·002()
        block.go:16 +0x2c
created by main.main
        block.go:17 +0x33

Full program and output are here:
http://play.golang.org/p/NEZdADI3Td

Fixes #6809.

R=golang-codereviews, khr, kamil.kisiel, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/50420043
2014-01-16 12:54:46 +04:00
Dmitriy Vyukov
b3a3afc9b7 runtime: fix data race in GC
Fixes #5139.
Update #7065.

R=golang-codereviews, bradfitz, minux.ma
CC=golang-codereviews
https://golang.org/cl/52090045
2014-01-15 19:38:08 +04:00
Russ Cox
3ec60c253d runtime: emit collection stacks in GODEBUG=allocfreetrace mode
R=khr, dvyukov
CC=golang-codereviews
https://golang.org/cl/51830043
2014-01-14 10:39:50 -05:00
Ian Lance Taylor
8da8b37674 runtime: fix 32-bit malloc for pointers >= 0x80000000
The spans array is allocated in runtime·mallocinit.  On a
32-bit system the number of entries in the spans array is
MaxArena32 / PageSize, which is (2U << 30) / (1 << 12) == (1 << 19).
So we are allocating an array that can hold 19 bits for an
index that can hold 20 bits.  According to the comment in the
function, this is intentional: we only allocate enough spans
(and bitmaps) for a 2G arena, because allocating more would
probably be wasteful.

But since the span index is simply the upper 20 bits of the
memory address, this scheme only works if memory addresses are
limited to the low 2G of memory.  That would be OK if we were
careful to enforce it, but we're not.  What we are careful to
enforce, in functions like runtime·MHeap_SysAlloc, is that we
always return addresses between the heap's arena_start and
arena_start + MaxArena32.

We generally get away with it because we start allocating just
after the program end, so we only run into trouble with
programs that allocate a lot of memory, enough to get past
address 0x80000000.

This changes the code that computes a span index to subtract
arena_start on 32-bit systems just as we currently do on
64-bit systems.
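
The corrected index computation, as a sketch (names illustrative):

        func spanIndex(addr, arenaStart uintptr) uintptr {
                const pageShift = 12 // 4K pages at the time of this CL
                // was: addr >> pageShift, which overflows the array past 0x80000000
                return (addr - arenaStart) >> pageShift
        }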

R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/49460043
2014-01-09 15:00:00 -08:00
Keith Randall
020b39c3f3 runtime: use special records hung off the MSpan to
record finalizers and heap profile info.  Enables
removing the special bit from the heap bitmap.  Also
provides a generic mechanism for annotating occasional
heap objects.

finalizers
        overhead      per obj
old	680 B	      80 B avg
new	16 B/span     48 B

profile
        overhead      per obj
old	32KB	      24 B + hash tables
new	16 B/span     24 B

R=cshapiro, khr, dvyukov, gobot
CC=golang-codereviews
https://golang.org/cl/13314053
2014-01-07 13:45:50 -08:00
Keith Randall
2d20c0d625 runtime: mark objects in free lists as allocated and unscannable.
On the plus side, we don't need to change the bits when mallocing
pointerless objects.  On the other hand, we need to mark objects in the
free lists during GC.  But the free lists are small at GC time, so it
should be a net win.

benchmark                    old ns/op    new ns/op    delta
BenchmarkMalloc8                    40           33  -17.65%
BenchmarkMalloc16                   45           38  -15.72%
BenchmarkMallocTypeInfo8            58           59   +0.85%
BenchmarkMallocTypeInfo16           63           64   +1.10%

R=golang-dev, rsc, dvyukov
CC=cshapiro, golang-dev
https://golang.org/cl/41040043
2013-12-18 17:13:59 -08:00
Brad Fitzpatrick
8c6ef061e3 sync: add Pool type
Adds the Pool type and docs, and uses it in fmt.
This is a temporary implementation, until Dmitry
makes it fast.

Uses the API proposal from Russ in http://goo.gl/cCKeb2 but
adds an optional New field, as used in fmt and elsewhere.
Almost all callers want that.

Update #4720

R=golang-dev, rsc, cshapiro, iant, r, dvyukov, khr
CC=golang-dev
https://golang.org/cl/41860043
2013-12-18 11:08:34 -08:00
Russ Cox
bc135f6492 runtime: fix crash in runtime.GoroutineProfile
This is a possible Go 1.2.1 candidate.

Fixes #6946.

R=iant, r
CC=golang-dev
https://golang.org/cl/41640043
2013-12-13 15:44:57 -05:00
Carl Shapiro
c69402d82b runtime: remove outdated comment and related indentation
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/39810043
2013-12-10 11:17:43 -08:00
Carl Shapiro
76c54c1193 runtime: add GODEBUG option for an electric fence like heap mode
When enabled, this new debugging mode will allocate objects on
their own page and never recycle memory addresses.  This is an
essential tool for root-causing a broad class of heap corruption.

R=golang-dev, dave, daniel.morsing, dvyukov, rsc, iant, cshapiro
CC=golang-dev
https://golang.org/cl/22060046
2013-12-06 14:40:45 -08:00
Carl Shapiro
f056daf075 cmd/5g, cmd/5l, cmd/6g, cmd/6l, cmd/8g, cmd/8l, cmd/gc, runtime: generate pointer maps by liveness analysis
This change allows the garbage collector to examine stack
slots that liveness analysis determines to be live and to
contain a pointer value.  This results in a mean
reduction of 65% in the number of stack slots scanned during
an invocation of "GOGC=1 all.bash".

Unfortunately, this does not yet allow garbage collection to
be precise for the stack slots computed as live.  Pointers
confound the determination of what definitions reach a given
instruction.  In general, this problem is not solvable without
runtime cost but some advanced cooperation from the compiler
might mitigate common cases.

R=golang-dev, rsc, cshapiro
CC=golang-dev
https://golang.org/cl/14430048
2013-12-05 17:35:22 -08:00
Carl Shapiro
0368a7ceb6 runtime: move stack scanning into the parallel mark phase
This change reduces the cost of scanning the stack by frames.
It moves the stack scanning from the serial root enumeration
phase to the parallel tracing phase.  The output that follows
shows timings for the issue 6482 benchmark

Baseline

BenchmarkGoroutineSelect	      50	 108027405 ns/op
BenchmarkGoroutineBlocking	      50	  89573332 ns/op
BenchmarkGoroutineForRange	      20	  95614116 ns/op
BenchmarkGoroutineIdle		      20	 122809512 ns/op

Stack scan by frames, non-parallel

BenchmarkGoroutineSelect	      20	 297138929 ns/op
BenchmarkGoroutineBlocking	      20	 301137599 ns/op
BenchmarkGoroutineForRange	      10	 312499469 ns/op
BenchmarkGoroutineIdle		      10	 209428876 ns/op

Stack scan by frames, parallel

BenchmarkGoroutineSelect	      20	 183938431 ns/op
BenchmarkGoroutineBlocking	      20	 170109999 ns/op
BenchmarkGoroutineForRange	      20	 179628882 ns/op
BenchmarkGoroutineIdle		      20	 157541498 ns/op

The remaining performance disparity is due to inefficiencies
in gentraceback and its callees.  The effect was isolated by
using a parallel stack scan where scanstack was modified to do
a conservative scan of the stack segments without gentraceback
followed by a call to gentraceback with a no-op callback.

The output that follows lists the top 10 most frequent stack
tops as determined by the Linux perf record facility.

Baseline

+  25.19%  gc.test  gc.test            [.] runtime.xchg
+  19.00%  gc.test  gc.test            [.] scanblock
+   8.53%  gc.test  gc.test            [.] scanstack
+   8.46%  gc.test  gc.test            [.] flushptrbuf
+   5.08%  gc.test  gc.test            [.] procresize
+   3.57%  gc.test  gc.test            [.] runtime.chanrecv
+   2.94%  gc.test  gc.test            [.] dequeue
+   2.74%  gc.test  gc.test            [.] addroots
+   2.25%  gc.test  gc.test            [.] runtime.ready
+   1.33%  gc.test  gc.test            [.] runtime.cas64

Gentraceback

+  18.12%  gc.test  gc.test             [.] runtime.xchg
+  14.68%  gc.test  gc.test             [.] scanblock
+   8.20%  gc.test  gc.test             [.] runtime.gentraceback
+   7.38%  gc.test  gc.test             [.] flushptrbuf
+   6.84%  gc.test  gc.test             [.] scanstack
+   5.92%  gc.test  gc.test             [.] runtime.findfunc
+   3.62%  gc.test  gc.test             [.] procresize
+   3.15%  gc.test  gc.test             [.] readvarint
+   1.92%  gc.test  gc.test             [.] addroots
+   1.87%  gc.test  gc.test             [.] runtime.chanrecv

R=golang-dev, dvyukov, rsc
CC=golang-dev
https://golang.org/cl/17410043
2013-12-03 14:12:55 -08:00
Keith Randall
139cc96a57 runtime: markfreed's error reports should be prefixed with "markfreed", not "markallocated".
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/14441055
2013-10-09 13:28:47 -07:00
Russ Cox
c3dadca977 runtime: do not scan stack by frames during garbage collection
Walking the stack by frames is ~3x more expensive
than not, and since it didn't end up being precise,
there is not enough benefit to outweigh the cost.

This is the conservative choice: this CL makes the
stack scanning behavior the same as it was in Go 1.1.

Add benchmarks to package runtime so that we have
them when we re-enable this feature during the
Go 1.3 development.

benchmark                     old ns/op    new ns/op    delta
BenchmarkGoroutineSelect        3194909      1272092  -60.18%
BenchmarkGoroutineBlocking      3120282       866366  -72.23%
BenchmarkGoroutineForRange      3256179       939902  -71.13%
BenchmarkGoroutineIdle          2005571       482982  -75.92%

The Go 1 benchmarks, just to add more data.
As far as I can tell the changes are mainly noise.

benchmark                         old ns/op    new ns/op    delta
BenchmarkBinaryTree17            4409403046   4414734932   +0.12%
BenchmarkFannkuch11              3407708965   3378306120   -0.86%
BenchmarkFmtFprintfEmpty                100           99   -0.60%
BenchmarkFmtFprintfString               242          239   -1.24%
BenchmarkFmtFprintfInt                  204          206   +0.98%
BenchmarkFmtFprintfIntInt               320          316   -1.25%
BenchmarkFmtFprintfPrefixedInt          295          299   +1.36%
BenchmarkFmtFprintfFloat                442          435   -1.58%
BenchmarkFmtManyArgs                   1246         1216   -2.41%
BenchmarkGobDecode                 10186951     10051210   -1.33%
BenchmarkGobEncode                 16504381     16445650   -0.36%
BenchmarkGzip                     447030885    447056865   +0.01%
BenchmarkGunzip                   111056154    111696305   +0.58%
BenchmarkHTTPClientServer             89973        93040   +3.41%
BenchmarkJSONEncode                28174182     27933893   -0.85%
BenchmarkJSONDecode               106353777    110443817   +3.85%
BenchmarkMandelbrot200              4822289      4806083   -0.34%
BenchmarkGoParse                    6102436      6142734   +0.66%
BenchmarkRegexpMatchEasy0_32            133          132   -0.75%
BenchmarkRegexpMatchEasy0_1K            372          373   +0.27%
BenchmarkRegexpMatchEasy1_32            113          111   -1.77%
BenchmarkRegexpMatchEasy1_1K            964          940   -2.49%
BenchmarkRegexpMatchMedium_32           202          205   +1.49%
BenchmarkRegexpMatchMedium_1K         68862        68858   -0.01%
BenchmarkRegexpMatchHard_32            3480         3407   -2.10%
BenchmarkRegexpMatchHard_1K          108255       112614   +4.03%
BenchmarkRevcomp                  751393035    743929976   -0.99%
BenchmarkTemplate                 139637041    135402220   -3.03%
BenchmarkTimeParse                      479          475   -0.84%
BenchmarkTimeFormat                     460          466   +1.30%

benchmark                          old MB/s     new MB/s  speedup
BenchmarkGobDecode                    75.34        76.36    1.01x
BenchmarkGobEncode                    46.50        46.67    1.00x
BenchmarkGzip                         43.41        43.41    1.00x
BenchmarkGunzip                      174.73       173.73    0.99x
BenchmarkJSONEncode                   68.87        69.47    1.01x
BenchmarkJSONDecode                   18.25        17.57    0.96x
BenchmarkGoParse                       9.49         9.43    0.99x
BenchmarkRegexpMatchEasy0_32         239.58       241.74    1.01x
BenchmarkRegexpMatchEasy0_1K        2749.74      2738.00    1.00x
BenchmarkRegexpMatchEasy1_32         282.49       286.32    1.01x
BenchmarkRegexpMatchEasy1_1K        1062.00      1088.96    1.03x
BenchmarkRegexpMatchMedium_32          4.93         4.86    0.99x
BenchmarkRegexpMatchMedium_1K         14.87        14.87    1.00x
BenchmarkRegexpMatchHard_32            9.19         9.39    1.02x
BenchmarkRegexpMatchHard_1K            9.46         9.09    0.96x
BenchmarkRevcomp                     338.26       341.65    1.01x
BenchmarkTemplate                     13.90        14.33    1.03x

Fixes #6482.

R=golang-dev, dave, r
CC=golang-dev
https://golang.org/cl/14257043
2013-10-02 11:59:53 -04:00
Russ Cox
30ecb4cd05 build: disable precise collection of stack frames
The code for call site-specific pointer bitmaps was not ready in time,
but the zeroing required without it is too expensive to use by default.
We will have to wait for precise collection of stack frames until Go 1.3.

The precise collection can be re-enabled by

        GOEXPERIMENT=precisestack ./all.bash

but that will not be the default for a Go 1.2 build.

Fixes #6087.

R=golang-dev, jeremyjackins, dan.kortschak, r
CC=golang-dev
https://golang.org/cl/13677045
2013-09-16 20:26:10 -04:00
Dmitriy Vyukov
a33ef8d11b runtime: account for all sys memory in MemStats
Currently lots of sys allocations are not accounted in any of the XxxSys
stats, including the GC bitmap, spans table, GC roots blocks, GC finalizer
blocks, iface table, netpoll descriptors and more. Up to ~20% can be
unaccounted for.
This change introduces 2 new stats: GCSys and OtherSys for GC metadata
and all other misc allocations, respectively.
It also ensures that all XxxSys stats indeed sum up to Sys. All sys memory
allocation functions require the stat for accounting, so it is impossible
to miss something.
Also fixes updating of mcache_sys/inuse, which were not updated after
deallocation.

test/bench/garbage/parser before:
Sys		670064344
HeapSys		610271232
StackSys	65536
MSpanSys	14204928
MCacheSys	16384
BuckHashSys	1439992

after:
Sys		670064344
HeapSys		610271232
StackSys	65536
MSpanSys	14188544
MCacheSys	16384
BuckHashSys	3194304
GCSys		39198688
OtherSys	3129656
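
A quick way to observe the new accounting from Go (GCSys and
OtherSys are the fields this change adds to runtime.MemStats):

        package main

        import (
                "fmt"
                "runtime"
        )

        func main() {
                var ms runtime.MemStats
                runtime.ReadMemStats(&ms)
                sum := ms.HeapSys + ms.StackSys + ms.MSpanSys +
                        ms.MCacheSys + ms.BuckHashSys + ms.GCSys + ms.OtherSys
                fmt.Println(ms.Sys == sum) // the XxxSys stats now sum to Sys
        }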

Fixes #5799.

R=rsc, dave, alex.brainman
CC=golang-dev
https://golang.org/cl/12946043
2013-09-06 16:55:40 -04:00
Keith Randall
23f9751e83 runtime: clean up map code. Remove hashmap.h.
Use cnew/cnewarray instead of mallocgc.

R=golang-dev, dvyukov
CC=golang-dev
https://golang.org/cl/13396045
2013-08-31 14:09:34 -07:00
Keith Randall
fb376021be runtime: record type information for hashtable internal structures.
Remove all hashtable-specific GC code.

Fixes #6119.

R=cshapiro, dvyukov, khr
CC=golang-dev
https://golang.org/cl/13078044
2013-08-31 09:09:50 -07:00
Carl Shapiro
c51152f438 runtime: check bitmap word for allocated bit in markonly
When searching for an allocated bit, flushptrbuf would search
backward in the bitmap word containing the bit of the pointer
being looked up before searching the span.  This extra check
was not replicated in markonly which, instead, after not
finding an allocated bit for a pointer would directly look in
the span.

Using statistics generated from godoc, before this change span
lookups were, on average, more common than word lookups.  It
was common for markonly to consult spans for one third of its
pointer lookups.  With this change in place, what were
previously span lookups are overwhelmingly handled by word
lookups, making the total number of span lookups a relatively
small fraction of the whole.

This change also introduces some statistics gathering about
lookups guarded by the CollectStats enum.

R=golang-dev, khr
CC=golang-dev
https://golang.org/cl/13311043
2013-08-29 13:52:38 -07:00
Keith Randall
ed467db6d8 cmd/cc,runtime: change preprocessor to expand macros inside of
#pragma textflag and #pragma dataflag directives.
Update dataflag directives to use symbols instead of integer constants.

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/13310043
2013-08-29 12:36:59 -07:00
Keith Randall
d0dd420a24 runtime: rename FlagNoPointers to FlagNoScan. Better represents
the use of the flag, especially for objects that actually do have
pointers but that we don't want the GC to scan.

R=golang-dev, cshapiro
CC=golang-dev
https://golang.org/cl/13181045
2013-08-23 17:28:47 -07:00
Dmitriy Vyukov
dfdd1ba028 runtime: do not trigger GC on g0
GC acquires worldsema, a goroutine-level semaphore
that parks goroutines. g0 cannot be parked.
Fixes #6193.

R=khr, khr
CC=golang-dev
https://golang.org/cl/12880045
2013-08-22 02:17:45 +04:00
Carl Shapiro
87fdb8fb9a undo CL 13010045 / 04f8101b46dd
Update the original change but do not read interface types in
the arguments area.  Once the arguments area is zeroed, as the
locals area is, we can safely read interface type values there
too.

««« original CL description
undo CL 12785045 / 71ce80dc4195

This has broken the 32-bit builds.

««« original CL description
cmd/gc, runtime: use type information to scan interface values

R=golang-dev, rsc, dvyukov
CC=golang-dev
https://golang.org/cl/12785045
»»»

R=khr, golang-dev, khr
CC=golang-dev
https://golang.org/cl/13010045
»»»

R=khr, khr
CC=golang-dev
https://golang.org/cl/13073045
2013-08-21 13:51:00 -07:00
Carl Shapiro
ca2d32b46d undo CL 12785045 / 71ce80dc4195
This has broken the 32-bit builds.

««« original CL description
cmd/gc, runtime: use type information to scan interface values

R=golang-dev, rsc, dvyukov
CC=golang-dev
https://golang.org/cl/12785045
»»»

R=khr, golang-dev, khr
CC=golang-dev
https://golang.org/cl/13010045
2013-08-19 14:16:55 -07:00
Keith Randall
6401e0f83f runtime: don't run finalizers if we're still on the g0 stack.
R=golang-dev, rsc, dvyukov, khr
CC=golang-dev
https://golang.org/cl/11386044
2013-08-19 12:20:50 -07:00
Carl Shapiro
21ea5103a4 cmd/gc, runtime: use type information to scan interface values
R=golang-dev, rsc, dvyukov
CC=golang-dev
https://golang.org/cl/12785045
2013-08-19 10:19:59 -07:00
Russ Cox
5822e7848a runtime: make SetFinalizer(x, f) accept any f for which f(x) is valid
Originally the requirement was f(x) where f's argument is
exactly x's type.

CL 11858043 relaxed the requirement in a non-standard
way: f's argument must be exactly x's type or interface{}.

If we're going to relax the requirement, it should be done
in a way consistent with the rest of Go. This CL allows f's
argument to have any type to which x is assignable;
that's the same requirement the compiler would impose
if compiling f(x) directly.

Fixes #5368.

R=dvyukov, bradfitz, pieter
CC=golang-dev
https://golang.org/cl/12895043
2013-08-14 14:54:31 -04:00
Carl Shapiro
abc516e420 cmd/cc, cmd/gc, runtime: Uniquely encode iface and eface pointers in the pointer map.
Prior to this change, pointer maps encoded the disposition of
a word using a single bit.  A zero signaled a non-pointer
value and a one signaled a pointer value.  Interface values,
which are a effectively a union type, were conservatively
labeled as a pointer.

This change widens the logical element size of the pointer map
to two bits per word.  As before, zero signals a non-pointer
value and one signals a pointer value.  Additionally, a two
signals an iface pointer and a three signals an eface pointer.

Following other changes to the runtime, values two and three
will allow type information to drive interpretation of the
subsequent word so that only those interface values containing
a pointer value will be scanned.
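
A sketch of the widened encoding (the constant names are
hypothetical; the values are the ones described above):

        // Two bits per word in the pointer map:
        const (
                bitsScalar  = 0 // non-pointer value
                bitsPointer = 1 // pointer value
                bitsIface   = 2 // iface pointer
                bitsEface   = 3 // eface pointer
        )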

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12689046
2013-08-09 16:48:12 -07:00
Dmitriy Vyukov
23e15f7253 net: add special netFD mutex
The mutex, fdMutex, handles locking and lifetime of sysfd,
and serializes Read and Write methods.
This allows stripping 2 sync.Mutex.Lock calls,
2 sync.Mutex.Unlock calls, 1 defer and some amount
of misc overhead from every network operation.

On linux/amd64, Intel E5-2690:
benchmark                             old ns/op    new ns/op    delta
BenchmarkTCP4Persistent                    9595         9454   -1.47%
BenchmarkTCP4Persistent-2                  8978         8772   -2.29%
BenchmarkTCP4ConcurrentReadWrite           4900         4625   -5.61%
BenchmarkTCP4ConcurrentReadWrite-2         2603         2500   -3.96%

In general it strips 70-500 ns from every network operation depending
on the processor model. On my relatively new E5-2690 it amounts to ~5%
of network op cost.

Fixes #6074.

R=golang-dev, bradfitz, alex.brainman, iant, mikioh.mikioh
CC=golang-dev
https://golang.org/cl/12418043
2013-08-09 21:43:00 +04:00
Carl Shapiro
73c93a404c cmd/cc, cmd/gc, runtime: emit bitmaps for scanning locals.
Previously, all word-aligned locations in the local variables
area were scanned as conservative roots.  With this change, a
bitmap is generated describing the locations of pointer values
in local variables.

With this change the argument bitmap information has been
changed to only store information about arguments.  The locals
member has been removed.  In its place, the bitmap data for
local variables is now used to store the size of locals.  If
the size is negative, the magnitude indicates the size of the
local variables area.

R=rsc
CC=golang-dev
https://golang.org/cl/12328044
2013-08-07 12:47:01 -07:00
Dmitriy Vyukov
9c0500b466 runtime: use gcpc/gcsp during traceback of goroutines in syscalls
gcpc/gcsp are used by the GC in a similar situation.
gcpc/gcsp are also more stable than gp->sched,
because gp->sched is mutated by entersyscall/exitsyscall
in morestack and mcall, so it has a higher chance of being inconsistent.
Also, rename gcpc/gcsp to syscallpc/syscallsp.

This is the same as reverted change 12250043
with save marked as textflag 7.
The problem was that if save calls morestack,
then subsequent lessstack spoils g->sched.pc/sp.
And those bad values were remembered in g->syscallpc/sp.
Entersyscallblock had the same problem,
but it was never triggered to date.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12478043
2013-08-06 13:38:44 +04:00
Dmitriy Vyukov
f38ff9e5ea undo CL 12250043 / e911f94c4902
Broke all 386 builders.

««« original CL description
runtime: use gcpc/gcsp during traceback of goroutines in syscalls
gcpc/gcsp are used by the GC in a similar situation.
gcpc/gcsp are also more stable than gp->sched,
because gp->sched is mutated by entersyscall/exitsyscall
in morestack and mcall, so it has a higher chance of being inconsistent.
Also, rename gcpc/gcsp to syscallpc/syscallsp.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12250043
»»»

R=rsc
CC=golang-dev
https://golang.org/cl/12424045
2013-08-05 23:33:50 +04:00
Dmitriy Vyukov
d5ab784611 runtime: remove singleproc var
It was needed for the old scheduler,
because there temporarily could be more threads than gomaxprocs.
In the new scheduler, gomaxprocs is always respected.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12438043
2013-08-05 22:58:02 +04:00
Dmitriy Vyukov
f73972fa33 runtime: use gcpc/gcsp during traceback of goroutines in syscalls
gcpc/gcsp are used by the GC in a similar situation.
gcpc/gcsp are also more stable than gp->sched,
because gp->sched is mutated by entersyscall/exitsyscall
in morestack and mcall, so it has a higher chance of being inconsistent.
Also, rename gcpc/gcsp to syscallpc/syscallsp.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12250043
2013-08-05 22:55:54 +04:00
Dmitriy Vyukov
0a904a3f2e runtime: remove dead code
Remove dead code related to allocation of type metadata with SysAlloc.

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/12311045
2013-08-04 23:32:06 +04:00
Pieter Droogendijk
6350e45892 runtime: allow SetFinalizer with a func(interface{})
Fixes #5368.

R=golang-dev, dvyukov
CC=golang-dev, rsc
https://golang.org/cl/11858043
2013-07-29 19:43:08 +04:00
Dmitriy Vyukov
f8a850b250 runtime: refactor mallocgc
Make it accept a type and combine flags.
Several reasons for the change:
1. mallocgc and settype must be atomic wrt GC
2. settype is called from only one place now
3. it will help performance (eventually settype
functionality must be combined with markallocated)
4. flags are easier to read now (no mallocgc(sz, 0, 1, 0) anymore)

R=golang-dev, iant, nightlyone, rsc, dave, khr, bradfitz, r
CC=golang-dev
https://golang.org/cl/10136043
2013-07-26 21:17:24 +04:00
Russ Cox
48769bf546 runtime: use funcdata to supply garbage collection information
This CL introduces a FUNCDATA number for runtime-specific
garbage collection metadata, changes the C and Go compilers
to emit that metadata, and changes the runtime to expect it.

The old pseudo-instructions that carried this information
are gone, as is the linker code to process them.

R=golang-dev, dvyukov, cshapiro
CC=golang-dev
https://golang.org/cl/11406044
2013-07-19 16:04:09 -04:00
Dmitriy Vyukov
eb04df75cd runtime: prevent GC from seeing the contents of a frame in runfinq
This holds the last finalized object and arguments to its finalizer.
Fixes #5348.

R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/11454044
2013-07-19 18:01:33 +04:00
Russ Cox
c3de91bb15 cmd/ld, runtime: use new contiguous pcln table
R=golang-dev, r, dave
CC=golang-dev
https://golang.org/cl/11494043
2013-07-18 10:43:22 -04:00
Dmitriy Vyukov
5887f142a3 runtime: more reliable preemption
Currently the preemption signal g->stackguard0==StackPreempt
can be lost if it is received when preemption is disabled
(e.g. m->lock!=0). This change duplicates the preemption
signal in g->preempt and restores g->stackguard0
when preemption is enabled.
Update #543.
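
A sketch of the re-arm step in Go-ish pseudocode (the g type and
the stackPreempt sentinel are stand-ins for the runtime's C
structures):

        type g struct {
                preempt     bool    // duplicated preemption request
                stackguard0 uintptr // stack check word, or stackPreempt
        }

        const stackPreempt = ^uintptr(0) // illustrative sentinel value

        // enablePreemption restores a request that may have been
        // dropped from stackguard0 while preemption was disabled.
        func enablePreemption(gp *g) {
                if gp.preempt {
                        gp.stackguard0 = stackPreempt
                }
        }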

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/10792043
2013-07-17 12:52:37 -04:00
Russ Cox
5d363c6357 cmd/ld, runtime: new in-memory symbol table format
Design at http://golang.org/s/go12symtab.

This enables some cleanup of the garbage collector metadata
that will be done in future CLs.

This CL does not move the old symtab and pclntab back into
an unmapped section of the file. That's a bit tricky and will be
done separately.

Fixes #4020.

R=golang-dev, dave, cshapiro, iant, r
CC=golang-dev, nigeltao
https://golang.org/cl/11085043
2013-07-16 09:41:38 -04:00
Dmitriy Vyukov
4b536a550f runtime: introduce GODEBUG env var
Currently it replaces the GOGCTRACE env var (GODEBUG=gctrace=1).
The plan is to extend it with other types of debug tracing,
e.g. GODEBUG=gctrace=1,schedtrace=100.
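
For example (the second form is the planned extension mentioned
above):

        GODEBUG=gctrace=1 ./your-program
        GODEBUG=gctrace=1,schedtrace=100 ./your-program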

R=rsc
CC=bradfitz, daniel.morsing, gobot, golang-dev
https://golang.org/cl/10026045
2013-06-28 18:37:06 +04:00
Russ Cox
6fa3c89b77 runtime: record proper goroutine state during stack split
Until now, the goroutine state has been scattered during the
execution of newstack and oldstack. It's all there, and those routines
know how to get back to a working goroutine, but other pieces of
the system, like stack traces, do not. If something does interrupt
the newstack or oldstack execution, the rest of the system can't
understand the goroutine. For example, if newstack decides there
is an overflow and calls throw, the stack tracer wouldn't dump the
goroutine correctly.

For newstack to save a useful state snapshot, it needs to be able
to rewind the PC in the function that triggered the split back to
the beginning of the function. (The PC is a few instructions in, just
after the call to morestack.) To make that possible, we change the
prologues to insert a jmp back to the beginning of the function
after the call to morestack. That is, the prologue used to be roughly:

        TEXT myfunc
                check for split
                jmpcond nosplit
                call morestack
        nosplit:
                sub $xxx, sp

Now an extra instruction is inserted after the call:

        TEXT myfunc
        start:
                check for split
                jmpcond nosplit
                call morestack
                jmp start
        nosplit:
                sub $xxx, sp

The jmp is not executed directly. It is decoded and simulated by
runtime.rewindmorestack to discover the beginning of the function,
and then the call to morestack returns directly to the start label
instead of to the jump instruction. So logically the jmp is still
executed, just not by the cpu.

The prologue thus repeats in the case of a function that needs a
stack split, but against the cost of the split itself, the extra few
instructions are noise. The repeated prologue has the nice effect of
making a stack split double-check that the new stack is big enough:
if morestack happens to return on a too-small stack, we'll now notice
before corruption happens.

The ability for newstack to rewind to the beginning of the function
should help preemption too. If newstack decides that it was called
for preemption instead of a stack split, it now has the goroutine state
correctly paused if rescheduling is needed, and when the goroutine
can run again, it can return to the start label on its original stack
and re-execute the split check.

Here is an example of a split stack overflow showing the full
trace, without any special cases in the stack printer.
(This one was triggered by making the split check incorrect.)

runtime: newstack framesize=0x0 argsize=0x18 sp=0x6aebd0 stack=[0x6b0000, 0x6b0fa0]
        morebuf={pc:0x69f5b sp:0x6aebd8 lr:0x0}
        sched={pc:0x68880 sp:0x6aebd0 lr:0x0 ctxt:0x34e700}
runtime: split stack overflow: 0x6aebd0 < 0x6b0000
fatal error: runtime: split stack overflow

goroutine 1 [stack split]:
runtime.mallocgc(0x290, 0x100000000, 0x1)
        /Users/rsc/g/go/src/pkg/runtime/zmalloc_darwin_amd64.c:21 fp=0x6aebd8
runtime.new()
        /Users/rsc/g/go/src/pkg/runtime/zmalloc_darwin_amd64.c:682 +0x5b fp=0x6aec08
go/build.(*Context).Import(0x5ae340, 0xc210030c71, 0xa, 0xc2100b4380, 0x1b, ...)
        /Users/rsc/g/go/src/pkg/go/build/build.go:424 +0x3a fp=0x6b00a0
main.loadImport(0xc210030c71, 0xa, 0xc2100b4380, 0x1b, 0xc2100b42c0, ...)
        /Users/rsc/g/go/src/cmd/go/pkg.go:249 +0x371 fp=0x6b01a8
main.(*Package).load(0xc21017c800, 0xc2100b42c0, 0xc2101828c0, 0x0, 0x0, ...)
        /Users/rsc/g/go/src/cmd/go/pkg.go:431 +0x2801 fp=0x6b0c98
main.loadPackage(0x369040, 0x7, 0xc2100b42c0, 0x0)
        /Users/rsc/g/go/src/cmd/go/pkg.go:709 +0x857 fp=0x6b0f80
----- stack segment boundary -----
main.(*builder).action(0xc2100902a0, 0x0, 0x0, 0xc2100e6c00, 0xc2100e5750, ...)
        /Users/rsc/g/go/src/cmd/go/build.go:539 +0x437 fp=0x6b14a0
main.(*builder).action(0xc2100902a0, 0x0, 0x0, 0xc21015b400, 0x2, ...)
        /Users/rsc/g/go/src/cmd/go/build.go:528 +0x1d2 fp=0x6b1658
main.(*builder).test(0xc2100902a0, 0xc210092000, 0x0, 0x0, 0xc21008ff60, ...)
        /Users/rsc/g/go/src/cmd/go/test.go:622 +0x1b53 fp=0x6b1f68
----- stack segment boundary -----
main.runTest(0x5a6b20, 0xc21000a020, 0x2, 0x2)
        /Users/rsc/g/go/src/cmd/go/test.go:366 +0xd09 fp=0x6a5cf0
main.main()
        /Users/rsc/g/go/src/cmd/go/main.go:161 +0x4f9 fp=0x6a5f78
runtime.main()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:183 +0x92 fp=0x6a5fa0
runtime.goexit()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:1266 fp=0x6a5fa8

And here is a seg fault during oldstack:

SIGSEGV: segmentation violation
PC=0x1b2a6

runtime.oldstack()
        /Users/rsc/g/go/src/pkg/runtime/stack.c:159 +0x76
runtime.lessstack()
        /Users/rsc/g/go/src/pkg/runtime/asm_amd64.s:270 +0x22

goroutine 1 [stack unsplit]:
fmt.(*pp).printArg(0x2102e64e0, 0xe5c80, 0x2102c9220, 0x73, 0x0, ...)
        /Users/rsc/g/go/src/pkg/fmt/print.go:818 +0x3d3 fp=0x221031e6f8
fmt.(*pp).doPrintf(0x2102e64e0, 0x12fb20, 0x2, 0x221031eb98, 0x1, ...)
        /Users/rsc/g/go/src/pkg/fmt/print.go:1183 +0x15cb fp=0x221031eaf0
fmt.Sprintf(0x12fb20, 0x2, 0x221031eb98, 0x1, 0x1, ...)
        /Users/rsc/g/go/src/pkg/fmt/print.go:234 +0x67 fp=0x221031eb40
flag.(*stringValue).String(0x2102c9210, 0x1, 0x0)
        /Users/rsc/g/go/src/pkg/flag/flag.go:180 +0xb3 fp=0x221031ebb0
flag.(*FlagSet).Var(0x2102f6000, 0x293d38, 0x2102c9210, 0x143490, 0xa, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:633 +0x40 fp=0x221031eca0
flag.(*FlagSet).StringVar(0x2102f6000, 0x2102c9210, 0x143490, 0xa, 0x12fa60, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:550 +0x91 fp=0x221031ece8
flag.(*FlagSet).String(0x2102f6000, 0x143490, 0xa, 0x12fa60, 0x0, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:563 +0x87 fp=0x221031ed38
flag.String(0x143490, 0xa, 0x12fa60, 0x0, 0x161950, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:570 +0x6b fp=0x221031ed80
testing.init()
        /Users/rsc/g/go/src/pkg/testing/testing.go:-531 +0xbb fp=0x221031edc0
strings_test.init()
        /Users/rsc/g/go/src/pkg/strings/strings_test.go:1115 +0x62 fp=0x221031ef70
main.init()
        strings/_test/_testmain.go:90 +0x3d fp=0x221031ef78
runtime.main()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:180 +0x8a fp=0x221031efa0
runtime.goexit()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:1269 fp=0x221031efa8

goroutine 2 [runnable]:
runtime.MHeap_Scavenger()
        /Users/rsc/g/go/src/pkg/runtime/mheap.c:438
runtime.goexit()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:1269
created by runtime.main
        /Users/rsc/g/go/src/pkg/runtime/proc.c:166

rax     0x23ccc0
rbx     0x23ccc0
rcx     0x0
rdx     0x38
rdi     0x2102c0170
rsi     0x221032cfe0
rbp     0x221032cfa0
rsp     0x7fff5fbff5b0
r8      0x2102c0120
r9      0x221032cfa0
r10     0x221032c000
r11     0x104ce8
r12     0xe5c80
r13     0x1be82baac718
r14     0x13091135f7d69200
r15     0x0
rip     0x1b2a6
rflags  0x10246
cs      0x2b
fs      0x0
gs      0x0

Fixes #5723.

R=r, dvyukov, go.peter.90, dave, iant
CC=golang-dev
https://golang.org/cl/10360048
2013-06-27 11:32:01 -04:00
Dmitriy Vyukov
94dc963b55 runtime: fix race condition between GC and setGCPercent
If the first GC runs concurrently with setGCPercent,
it can overwrite the gcpercent value with the default.
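
The user-facing call involved in the race, for reference (a
minimal sketch):

        package main

        import "runtime/debug"

        func main() {
                // An early setting must not be overwritten when the
                // first GC initializes gcpercent from GOGC.
                old := debug.SetGCPercent(200) // returns the previous value
                defer debug.SetGCPercent(old)
                // ... allocate and run ...
        }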

R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/10242047
2013-06-15 16:07:06 +04:00
Keith Randall
de316388a7 runtime: garbage collector runs on g0 now.
No need to change to Grunnable state.
Add some more checks for Grunning state.

R=golang-dev, rsc, khr, dvyukov
CC=golang-dev
https://golang.org/cl/10186045
2013-06-14 11:42:51 -07:00