This is based on the crash dump provided by Alan
and on mental experiments:
sweep 0 74
fatal error: gc: unswept span
runtime stack:
runtime.throw(0x9df60d)
markroot(0xc208002000, 0x3)
runtime.parfordo(0xc208002000)
runtime.gchelper()
I think that when we moved all stacks into heap,
we introduced a bunch of bad data races. This was later
worsened by parallel stack shrinking.
Observation 1: exitsyscall can allocate a stack from heap at any time (including during STW).
Observation 2: parallel stack shrinking can (surprisingly) grow heap during marking.
Consider that we steadily grow stacks of a number of goroutines from 8K to 16K.
And during GC they all can be shrunk back to 8K. Shrinking will allocate lots of 8K
stacks, and we do not necessary have that many in heap at this moment. So shrinking
can grow heap as well.
Consequence: any access to mheap.allspans in GC (and otherwise) must take heap lock.
This is not true in several places.
Fix this by protecting accesses to mheap.allspans and introducing allspans cache for marking,
similar to what we use for sweeping.
LGTM=rsc
R=golang-codereviews, rsc
CC=adonovan, golang-codereviews, khr, rlh
https://golang.org/cl/126510043
These are required for chans, semaphores, timers, etc.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rlh, rsc
https://golang.org/cl/123640043
Half the code in the garbage collector accesses the bitmap
as an array of bytes instead of as an array of uintptrs.
This is tricky to do correctly in a portable fashion,
it breaks on big-endian systems.
Make the bitmap a byte array.
Simplifies markallocated, scanblock and span sweep along the way,
as we don't need to recalculate bitmap position for each word.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rlh, rsc
https://golang.org/cl/125250043
We need to change the interface value representation for
concurrent garbage collection, so that there is no ambiguity
about whether the data word holds a pointer or scalar.
This CL does NOT make any representation changes.
Instead, it removes representation assumptions from
various pieces of code throughout the tree.
The isdirectiface function in cmd/gc/subr.c is now
the only place that decides that policy.
The policy propagates out from there in the reflect
metadata, as a new flag in the internal kind value.
A follow-up CL will change the representation by
changing the isdirectiface function. If that CL causes
problems, it will be easy to roll back.
Update #8405.
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews, r
https://golang.org/cl/129090043
Currently we do the following dance after sweeping a span:
1. lock mcentral
2. remove the span from a list
3. unlock mcentral
4. unmark span
5. lock mheap
6. insert the span into heap
7. unlock mheap
8. lock mcentral
9. observe empty list
10. unlock mcentral
11. lock mheap
12. grab the span
13. unlock mheap
14. mark span
15. lock mcentral
16. insert the span into empty list
17. unlock mcentral
This change short-circuits this sequence to nothing,
that is, we just cache and use the span after sweeping.
This gives us functionality similar (even better) to tcmalloc's transfer cache.
benchmark old ns/op new ns/op delta
BenchmarkMalloc8 22.2 19.5 -12.16%
BenchmarkMalloc16 31.0 26.6 -14.19%
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rlh, rsc
https://golang.org/cl/119550043
bv.data is an array of uint32s but the code was using
offsets computed for an array of bytes.
Add a test for stack GC info.
Fixes#8531.
LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, khr, rsc
https://golang.org/cl/124450043
Restore https://golang.org/cl/41040043 after GC rewrite.
Original description:
On the plus side, we don't need to change the bits on malloc and free.
On the downside, we need to mark objects in the free lists during GC.
But the free lists are small at GC time, so it should be a net win.
benchmark old ns/op new ns/op delta
BenchmarkMalloc8 21.9 20.4 -6.85%
BenchmarkMalloc16 31.1 29.6 -4.82%
LGTM=khr
R=khr
CC=golang-codereviews, rlh, rsc
https://golang.org/cl/122280043
Eliminating use of this extension makes it easier to port the Go runtime
to other compilers. This CL also disables the extension in cc to prevent
accidental use.
LGTM=rsc, khr
R=rsc, aram, khr, dvyukov
CC=axwalk, golang-codereviews
https://golang.org/cl/106790044
FlagNoGC is unused now.
FlagNoInvokeGC is unneeded as we don't invoke GC
on g0 and when holding locks anyway.
mal/malloc have very few uses and you never remember
the exact set of flags they use and the difference between them.
Moreover, eventually we need to give exact types to all allocations,
something what mal/malloc do not support.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/117580043
Shrinkstack does not touch normal heap anymore,
so we can shink stacks concurrently with marking.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, khr, rlh, rsc
https://golang.org/cl/122130043
Introduce the mFunction type to represent an mcall/onM-able function.
Name such functions using _m.
LGTM=bradfitz
R=bradfitz
CC=golang-codereviews
https://golang.org/cl/121320043
We call scanblock for lots of small root pieces
e.g. for every stack frame args and locals area.
Every scanblock invocation calls getempty/putempty,
which accesses lock-free stack shared among all worker threads.
One-element local cache allows most scanblock calls
to proceed without accessing the shared stack.
LGTM=rsc
R=golang-codereviews, rlh
CC=golang-codereviews, khr, rsc
https://golang.org/cl/121250043
For consistency with other code, as that was the only use of
memcopy outside of alg.goc.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/122030044
Several reasons:
1. Significantly simplifies runtime.
2. This code proved to be buggy.
3. Free is incompatible with bump-the-pointer allocation.
4. We want to write runtime in Go, Go does not have free.
5. Too much code to free env strings on startup.
LGTM=khr
R=golang-codereviews, josharian, tracey.brendan, khr
CC=bradfitz, golang-codereviews, r, rlh, rsc
https://golang.org/cl/116390043
This change introduces gomallocgc, a Go clone of mallocgc.
Only a few uses have been moved over, so there are still
lots of uses from C. Many of these C uses will be moved
over to Go (e.g. in slice.goc), but probably not all.
What should remain of C's mallocgc is an open question.
LGTM=rsc, dvyukov
R=rsc, khr, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/108840046
Implement the design described in:
https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbKcg/pub
Summary of the changes:
GC uses "2-bits per word" pointer type info embed directly into bitmap.
Scanning of stacks/data/heap is unified.
The old spans types go away.
Compiler generates "sparse" 4-bits type info for GC (directly for GC bitmap).
Linker generates "dense" 2-bits type info for data/bss (the same as stacks use).
Summary of results:
-1680 lines of code total (-1000+ in mgc0.c only)
-25% memory consumption
-3-7% binary size
-15% GC pause reduction
-7% run time reduction
LGTM=khr
R=golang-codereviews, rsc, christoph, khr
CC=golang-codereviews, rlh
https://golang.org/cl/106260045
updatememstats is called on both the m and g stacks.
Call into flushallmcaches correctly. flushallmcaches
can only run on the M stack.
This is somewhat temporary. once ReadMemStats is in
Go we can have all of this code M-only.
LGTM=dvyukov
R=golang-codereviews, dvyukov
CC=golang-codereviews
https://golang.org/cl/116880043
redo stack allocation. This is mostly the same as
the original CL with a few bug fixes.
1. add racemalloc() for stack allocations
2. fix poolalloc/poolfree to terminate free lists correctly.
3. adjust span ref count correctly.
4. don't use cache for sizes >= StackCacheSize.
Should fix bugs and memory leaks in original changelist.
««« original CL description
undo CL 104200047 / 318b04f28372
Breaks windows and race detector.
TBR=rsc
««« original CL description
runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes#7468Fixes#7424
LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»
TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
»»»
LGTM=dvyukov
R=dvyukov, dave, khr, alex.brainman
CC=golang-codereviews
https://golang.org/cl/112240044
The code in GC that handles gp->gobuf.ctxt is wrong,
because it does not mark the ctxt object itself,
if just queues the ctxt object for scanning.
So the ctxt object can be collected as garbage.
However, Gobuf.ctxt is void*, so it's always marked and
scanned through G.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/105490044
Breaks windows and race detector.
TBR=rsc
««« original CL description
runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes#7468Fixes#7424
LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»
TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes#7468Fixes#7424
LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
Remove GC bitmap backward scanning.
This was already done once in https://golang.org/cl/5530074/
Still makes GC a bit faster.
On the garbage benchmark, before:
gc-pause-one=237345195
gc-pause-total=4746903
cputime=32427775
time=32458208
after:
gc-pause-one=235484019
gc-pause-total=4709680
cputime=31861965
time=31877772
Also prepares mgc0.c for future changes.
R=golang-codereviews, khr, khr
CC=golang-codereviews, rsc
https://golang.org/cl/105380043
The runtime has historically held two dedicated values g (current goroutine)
and m (current thread) in 'extern register' slots (TLS on x86, real registers
backed by TLS on ARM).
This CL removes the extern register m; code now uses g->m.
On ARM, this frees up the register that formerly held m (R9).
This is important for NaCl, because NaCl ARM code cannot use R9 at all.
The Go 1 macrobenchmarks (those with per-op times >= 10 µs) are unaffected:
BenchmarkBinaryTree17 5491374955 5471024381 -0.37%
BenchmarkFannkuch11 4357101311 4275174828 -1.88%
BenchmarkGobDecode 11029957 11364184 +3.03%
BenchmarkGobEncode 6852205 6784822 -0.98%
BenchmarkGzip 650795967 650152275 -0.10%
BenchmarkGunzip 140962363 141041670 +0.06%
BenchmarkHTTPClientServer 71581 73081 +2.10%
BenchmarkJSONEncode 31928079 31913356 -0.05%
BenchmarkJSONDecode 117470065 113689916 -3.22%
BenchmarkMandelbrot200 6008923 5998712 -0.17%
BenchmarkGoParse 6310917 6327487 +0.26%
BenchmarkRegexpMatchMedium_1K 114568 114763 +0.17%
BenchmarkRegexpMatchHard_1K 168977 169244 +0.16%
BenchmarkRevcomp 935294971 914060918 -2.27%
BenchmarkTemplate 145917123 148186096 +1.55%
Minux previous reported larger variations, but these were caused by
run-to-run noise, not repeatable slowdowns.
Actual code changes by Minux.
I only did the docs and the benchmarking.
LGTM=dvyukov, iant, minux
R=minux, josharian, iant, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/109050043
Afterprologue check was required when did not know
about return arguments of functions and/or they were not zeroed.
Now 100% precision is required for stacks due to stack copying,
so it must work w/o afterprologue one way or another.
I can limit this change for 1.3 to merely adding a TODO,
but this check is super confusing so I don't want this knowledge to get lost.
LGTM=rsc
R=golang-codereviews, gobot, rsc, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/96580045
The 'continuation pc' is where the frame will continue
execution, if anywhere. For a frame that stopped execution
due to a CALL instruction, the continuation pc is immediately
after the CALL. But for a frame that stopped execution due to
a fault, the continuation pc is the pc after the most recent CALL
to deferproc in that frame, or else 0. That is where execution
will continue, if anywhere.
The liveness information is only recorded for CALL instructions.
This change makes sure that we never look for liveness information
except for CALL instructions.
Using a valid PC fixes crashes when a garbage collection or
stack copying tries to process a stack frame that has faulted.
Record continuation pc in heapdump (format change).
Fixes#8048.
LGTM=iant, khr
R=khr, iant, dvyukov
CC=golang-codereviews, r
https://golang.org/cl/100870044
Currently freeOSMemory makes only marking phase of GC, but not sweeping phase.
So recently memory is not released after freeOSMemory.
Do both marking and sweeping during freeOSMemory.
Fixes#8019.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/97550043
The GC program describing a data structure sometimes trusts the
pointer base type and other times does not (if not, the garbage collector
must fall back on per-allocation type information stored in the heap).
Make the scanning of a pointer in an interface do the same.
This fixes a crash in a particular use of reflect.SliceHeader.
Fixes#8004.
LGTM=khr
R=golang-codereviews, khr
CC=0xe2.0x9a.0x9b, golang-codereviews, iant, r
https://golang.org/cl/100470045
mstats.last_gc is unix time now, it is compared with abstract monotonic time.
On my machine GC is forced every 5 mins regardless of last_gc.
LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, iant, rsc
https://golang.org/cl/91350045
The monotonic clock patch changed all runtime times
to abstract monotonic time. As the result user-visible
MemStats.LastGC become monotonic time as well.
Restore Unix time for LastGC.
This is the simplest way to expose time.now to runtime that I found.
Another option would be to change time.now to C called
int64 runtime.unixnanotime() and then express time.now in terms of it.
But this would require to introduce 2 64-bit divisions into time.now.
Another option would be to change time.now to C called
void runtime.unixnanotime1(struct {int64 sec, int32 nsec} *now)
and then express both time.now and runtime.unixnanotime in terms of it.
Fixes#7852.
LGTM=minux.ma, iant
R=minux.ma, rsc, iant
CC=golang-codereviews
https://golang.org/cl/93720045
Having the pointers means you can grub around in the
binary finding out more about them.
This helped with issue 7748.
LGTM=minux.ma, bradfitz
R=golang-codereviews, minux.ma, bradfitz
CC=golang-codereviews
https://golang.org/cl/88090045
Do not consider idle finalizer/bgsweep/timer goroutines as doing something useful.
We can't simply set isbackground for the whole lifetime of the goroutines,
because when finalizer goroutine calls user function, we do want to consider it
as doing something useful.
This is borken due to timers for quite some time.
With background sweep is become even more broken.
Fixes#7784.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/87960044
Currently Pool can cache up to 15 elements per P, and these elements are not accesible to other Ps.
If a Pool caches large objects, say 2MB, and GOMAXPROCS is set to a large value, say 32,
then the Pool can waste up to 960MB.
The new caching policy caches at most 1 per-P element, the rest is shared between Ps.
Get/Put performance is unchanged. Nested Get/Put performance is 57% worse.
However, overall scalability of nested Get/Put is significantly improved,
so the new policy starts winning under contention.
benchmark old ns/op new ns/op delta
BenchmarkPool 27.4 26.7 -2.55%
BenchmarkPool-4 6.63 6.59 -0.60%
BenchmarkPool-16 1.98 1.87 -5.56%
BenchmarkPool-64 1.93 1.86 -3.63%
BenchmarkPoolOverlflow 3970 6235 +57.05%
BenchmarkPoolOverlflow-4 10935 1668 -84.75%
BenchmarkPoolOverlflow-16 13419 520 -96.12%
BenchmarkPoolOverlflow-64 10295 380 -96.31%
LGTM=rsc
R=rsc
CC=golang-codereviews, khr
https://golang.org/cl/86020043
Cuts the number of calls from 6 to 2 in the non-debug case.
LGTM=iant
R=golang-codereviews, iant
CC=0intro, aram, golang-codereviews, khr
https://golang.org/cl/86040043
Given
type Outer struct {
*Inner
...
}
the compiler generates the implementation of (*Outer).M dispatching to
the embedded Inner. The implementation is logically:
func (p *Outer) M() {
(p.Inner).M()
}
but since the only change here is the replacement of one pointer
receiver with another, the actual generated code overwrites the
original receiver with the p.Inner pointer and then jumps to the M
method expecting the *Inner receiver.
During reflect.Value.Call, we create an argument frame and the
associated data structures to describe it to the garbage collector,
populate the frame, call reflect.call to run a function call using
that frame, and then copy the results back out of the frame. The
reflect.call function does a memmove of the frame structure onto the
stack (to set up the inputs), runs the call, and the memmoves the
stack back to the frame structure (to preserve the outputs).
Originally reflect.call did not distinguish inputs from outputs: both
memmoves were for the full stack frame. However, in the case where the
called function was one of these wrappers, the rewritten receiver is
almost certainly a different type than the original receiver. This is
not a problem on the stack, where we use the program counter to
determine the type information and understand that during (*Outer).M
the receiver is an *Outer while during (*Inner).M the receiver in the
same memory word is now an *Inner. But in the statically typed
argument frame created by reflect, the receiver is always an *Outer.
Copying the modified receiver pointer off the stack into the frame
will store an *Inner there, and then if a garbage collection happens
to scan that argument frame before it is discarded, it will scan the
*Inner memory as if it were an *Outer. If the two have different
memory layouts, the collection will intepret the memory incorrectly.
Fix by only copying back the results.
Fixes#7725.
LGTM=khr
R=khr
CC=dave, golang-codereviews
https://golang.org/cl/85180043
Trying to make GODEBUG=gcdead=1 work with liveness
and in particular ambiguously live variables.
1. In the liveness computation, mark all ambiguously live
variables as live for the entire function, except the entry.
They are zeroed directly after entry, and we need them not
to be poisoned thereafter.
2. In the liveness computation, compute liveness (and deadness)
for all parameters, not just pointer-containing parameters.
Otherwise gcdead poisons untracked scalar parameters and results.
3. Fix liveness debugging print for -live=2 to use correct bitmaps.
(Was not updated for compaction during compaction CL.)
4. Correct varkill during map literal initialization.
Was killing the map itself instead of the inserted value temp.
5. Disable aggressive varkill cleanup for call arguments if
the call appears in a defer or go statement.
6. In the garbage collector, avoid bug scanning empty
strings. An empty string is two zeros. The multiword
code only looked at the first zero and then interpreted
the next two bits in the bitmap as an ordinary word bitmap.
For a string the bits are 11 00, so if a live string was zero
length with a 0 base pointer, the poisoning code treated
the length as an ordinary word with code 00, meaning it
needed poisoning, turning the string into a poison-length
string with base pointer 0. By the same logic I believe that
a live nil slice (bits 11 01 00) will have its cap poisoned.
Always scan full multiword struct.
7. In the runtime, treat both poison words (PoisonGC and
PoisonStack) as invalid pointers that warrant crashes.
Manual testing as follows:
- Create a script called gcdead on your PATH containing:
#!/bin/bash
GODEBUG=gcdead=1 GOGC=10 GOTRACEBACK=2 exec "$@"
- Now you can build a test and then run 'gcdead ./foo.test'.
- More importantly, you can run 'go test -short -exec gcdead std'
to run all the tests.
Fixes#7676.
While here, enable the precise scanning of slices, since that was
disabled due to bugs like these. That now works, both with and
without gcdead.
Fixes#7549.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83410044
Reduce footprint of liveness bitmaps by about 5x.
1. Mark all liveness bitmap symbols as 4-byte aligned
(they were aligned to a larger size by default).
2. The bitmap data is a bitmap count n followed by n bitmaps.
Each bitmap begins with its own count m giving the number
of bits. All the m's are the same for the n bitmaps.
Emit this bitmap length once instead of n times.
3. Many bitmaps within a function have the same bit values,
but each call site was given a distinct bitmap. Merge duplicate
bitmaps so that no bitmap is written more than once.
4. Many functions end up with the same aggregate bitmap data.
We used to name the bitmap data funcname.gcargs and funcname.gclocals.
Instead, name it gclocals.<md5 of data> and mark it dupok so
that the linker coalesces duplicate sets. This cut the bitmap
data remaining after step 3 by 40%; I was not expecting it to
be quite so dramatic.
Applied to "go build -ldflags -w code.google.com/p/go.tools/cmd/godoc":
bitmaps pclntab binary on disk
before this CL 1326600 1985854 12738268
4-byte align 1154288 (0.87x) 1985854 (1.00x) 12566236 (0.99x)
one bitmap len 782528 (0.54x) 1985854 (1.00x) 12193500 (0.96x)
dedup bitmap 414748 (0.31x) 1948478 (0.98x) 11787996 (0.93x)
dedup bitmap set 245580 (0.19x) 1948478 (0.98x) 11620060 (0.91x)
While here, remove various dead blocks of code from plive.c.
Fixes#6929.
Fixes#7568.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83630044
GODEBUG=allocfreetrace=1:
The allocfreetrace=1 mode prints a stack trace for each block
allocated and freed, and also a stack trace for each garbage collection.
It was implemented by reusing the heap profiling support: if allocfreetrace=1
then the heap profile was effectively running at 1 sample per 1 byte allocated
(always sample). The stack being shown at allocation was the stack gathered
for profiling, meaning it was derived only from the program counters and
did not include information about function arguments or frame pointers.
The stack being shown at free was the allocation stack, not the free stack.
If you are generating this log, you can find the allocation stack yourself, but
it can be useful to see exactly the sequence that led to freeing the block:
was it the garbage collector or an explicit free? Now that the garbage collector
runs on an m0 stack, the stack trace for the garbage collector was never interesting.
Fix all these problems:
1. Decouple allocfreetrace=1 from heap profiling.
2. Print the standard goroutine stack traces instead of a custom format.
3. Print the stack trace at time of allocation for an allocation,
and print the stack trace at time of free (not the allocation trace again)
for a free.
4. Print all goroutine stacks at garbage collection. Having all the stacks
means that you can see the exact point at which each goroutine was
preempted, which is often useful for identifying liveness-related errors.
GODEBUG=gcdead=1:
This mode overwrites dead pointers with a poison value.
Detect the poison value as an invalid pointer during collection,
the same way that small integers are invalid pointers.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/81670043
This is the same check we use during stack copying.
The check cannot be applied to C stack frames, even
though we do emit pointer bitmaps for the arguments,
because (1) the pointer bitmaps assume all arguments
are always live, not true of outputs during the prologue,
and (2) the pointer bitmaps encode interface values as
pointer pairs, not true of interfaces holding integers.
For the rest of the frames, however, we should hold ourselves
to the rule that a pointer marked live really is initialized.
The interface scanning already implicitly checks this
because it interprets the type word as a valid type pointer.
This may slow things down a little because of the extra loads.
Or it may speed things up because we don't bother enqueuing
nil pointers anymore. Enough of the rest of the system is slow
right now that we can't measure it meaningfully.
Enable for now, even if it is slow, to shake out bugs in the
liveness bitmaps, and then decide whether to turn it off
for the Go 1.3 release (issue 7650 reminds us to do this).
The new m->traceback field lets us force printing of fp=
values on all goroutine stack traces when we detect a
bad pointer. This makes it easier to understand exactly
where in the frame the bad pointer is, so that we can trace
it back to a specific variable and determine what is wrong.
Update #7650
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/80860044
Currently it's possible that bgsweep finishes before all spans
have been swept (we only know that sweeping of all spans has *started*).
In such case bgsweep may fail wake up runfinq goroutine when it needs to.
finq may still be nil at this point, but some finalizers may be queued later.
Make bgsweep to wait for sweeping to *complete*, then it can decide
whether it needs to wake up runfinq for sure.
Update #7533
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75960043
If we set obj, then it will be enqueued for marking at the end of the scanning loop.
This is not necessary, since we've already marked it.
This can wait for 1.4 if you wish.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/80030043