Several reasons:
1. Significantly simplifies runtime.
2. This code proved to be buggy.
3. Free is incompatible with bump-the-pointer allocation.
4. We want to write runtime in Go, Go does not have free.
5. Too much code to free env strings on startup.
LGTM=khr
R=golang-codereviews, josharian, tracey.brendan, khr
CC=bradfitz, golang-codereviews, r, rlh, rsc
https://golang.org/cl/116390043
Stand-alone this test is fine. Run together with
others, however, the stack used can actually go
negative because other tests are freeing stack
during its execution.
This behavior is new with the new stack allocator.
The old allocator never returned (min-sized) stacks.
This test is fairly poor - it needs to run in
isolation to be accurate. Maybe we should delete it.
LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/119330044
The DISPATCH and CALLFN macro definitions depend on an inconsistency
between the internal cpp mini-implementation and the language proper in
whether center-dot is an identifier character. The macro depends on it not
being an identifier character, but the resulting code depends on it being one.
Remove the dependence on the inconsistency by placing the center-dot into
the macro invocation rather that the body.
No semantic change. This is just renaming macro arguments.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/119320043
This change introduces gomallocgc, a Go clone of mallocgc.
Only a few uses have been moved over, so there are still
lots of uses from C. Many of these C uses will be moved
over to Go (e.g. in slice.goc), but probably not all.
What should remain of C's mallocgc is an open question.
LGTM=rsc, dvyukov
R=rsc, khr, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/108840046
Implement the design described in:
https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbKcg/pub
Summary of the changes:
GC uses "2-bits per word" pointer type info embed directly into bitmap.
Scanning of stacks/data/heap is unified.
The old spans types go away.
Compiler generates "sparse" 4-bits type info for GC (directly for GC bitmap).
Linker generates "dense" 2-bits type info for data/bss (the same as stacks use).
Summary of results:
-1680 lines of code total (-1000+ in mgc0.c only)
-25% memory consumption
-3-7% binary size
-15% GC pause reduction
-7% run time reduction
LGTM=khr
R=golang-codereviews, rsc, christoph, khr
CC=golang-codereviews, rlh
https://golang.org/cl/106260045
Sweepone may be running while a new span is allocating. It
must not see the state updated while the sweepgen is unset.
Fixes#8399
LGTM=dvyukov
R=golang-codereviews, dvyukov
CC=golang-codereviews
https://golang.org/cl/118050043
This is bad for 2 reasons:
1. if the code under lock ever grows stack,
it will deadlock as stack growing acquires mheap lock.
2. It currently deadlocks with SetCPUProfileRate:
scavenger locks mheap, receives prof signal and tries to lock prof lock;
meanwhile SetCPUProfileRate locks prof lock and tries to grow stack
(presumably in runtime.unlock->futexwakeup). Boom.
Let's assume that it
Fixes#8407.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, khr
https://golang.org/cl/112640043
With cl/112640043 TestCgoDeadlockCrash episodically print:
unexpected return pc for runtime.newstackcall
After adding debug output I see the following trace:
runtime: unexpected return pc for runtime.newstackcall called from 0xc208011b00
runtime.throw(0x414da86)
src/pkg/runtime/panic.c:523 +0x77
runtime.gentraceback(0x40165fc, 0xba440c28, 0x0, 0xc208d15200, 0xc200000000, 0xc208ddfd20, 0x20, 0x0, 0x0, 0x300)
src/pkg/runtime/traceback_x86.c:185 +0xca4
runtime.callers(0x1, 0xc208ddfd20, 0x20)
src/pkg/runtime/traceback_x86.c:438 +0x98
mcommoninit(0xc208ddfc00)
src/pkg/runtime/proc.c:369 +0x5c
runtime.allocm(0xc208052000)
src/pkg/runtime/proc.c:686 +0xa6
newm(0x4017850, 0xc208052000)
src/pkg/runtime/proc.c:933 +0x27
startm(0xc208052000, 0x100000001)
src/pkg/runtime/proc.c:1011 +0xba
wakep()
src/pkg/runtime/proc.c:1071 +0x57
resetspinning()
src/pkg/runtime/proc.c:1297 +0xa1
schedule()
src/pkg/runtime/proc.c:1366 +0x14b
runtime.gosched0(0xc20808e240)
src/pkg/runtime/proc.c:1465 +0x5b
runtime.newstack()
src/pkg/runtime/stack.c:891 +0x44d
runtime: unexpected return pc for runtime.newstackcall called from 0xc208011b00
runtime.newstackcall(0x4000cbd, 0x4000b80)
src/pkg/runtime/asm_amd64.s:278 +0x6f
I suspect that it can happen on any stack split.
So don't unwind g0 stack.
Also, that comment is lying -- we can traceback w/o mcache,
CPU profiler does that.
LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, khr, rsc
https://golang.org/cl/120040043
So we can tell from a binary which version of
Go built it.
LGTM=minux, rsc
R=golang-codereviews, minux, khr, rsc, dave
CC=golang-codereviews
https://golang.org/cl/117040043
In both cases we lie to malloc about the actual size that we need.
In panic we ask for less memory than we are going to use.
In slice we ask for more memory than we are going to use
(potentially asking for a fractional number of elements).
This breaks the new GC.
LGTM=khr
R=golang-codereviews, dave, khr
CC=golang-codereviews, rsc
https://golang.org/cl/116940043
Even though pointers are 4 bytes the stack frame should be kept
a multiple of 8 bytes so that return addresses pushed on the stack
are properly aligned.
Fixes#8379.
LGTM=dvyukov, minux
R=minux, bradfitz, dvyukov, dave
CC=golang-codereviews
https://golang.org/cl/115840048
These correspond to 2 and 3 word fat copies/clears on 8g, which dominate usage in the stdlib. (70% of copies and 46% of clears are for 2 or 3 words.) I missed these in CL 111350043, which added 2 and 3 word benchmarks for 6g. A follow-up CL will optimize these cases.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/115160043
benchmark old ns/op new ns/op delta
BenchmarkSelectUncontended 220 165 -25.00%
BenchmarkSelectContended 209 161 -22.97%
BenchmarkSelectProdCons 1042 904 -13.24%
But more importantly this change will allow
to get rid of free function in runtime.
Fixes#6494.
LGTM=rsc, khr
R=golang-codereviews, rsc, dominik.honnef, khr
CC=golang-codereviews, remyoudompheng
https://golang.org/cl/107670043
The CopyFat benchmarks were changed in CL 92760044. See CL 111350043 for discussion.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/116000043
These benchmarks are important for performance. When compiling the stdlib:
* 77.1% of the calls to sgen (copyfat) are for 16 bytes; another 8.7% are for 24 bytes. (The next most common is 32 bytes, at 5.7%.)
* Over half the calls to clearfat are for 16 or 24 bytes.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews
https://golang.org/cl/111350043
updatememstats is called on both the m and g stacks.
Call into flushallmcaches correctly. flushallmcaches
can only run on the M stack.
This is somewhat temporary. once ReadMemStats is in
Go we can have all of this code M-only.
LGTM=dvyukov
R=golang-codereviews, dvyukov
CC=golang-codereviews
https://golang.org/cl/116880043
redo stack allocation. This is mostly the same as
the original CL with a few bug fixes.
1. add racemalloc() for stack allocations
2. fix poolalloc/poolfree to terminate free lists correctly.
3. adjust span ref count correctly.
4. don't use cache for sizes >= StackCacheSize.
Should fix bugs and memory leaks in original changelist.
««« original CL description
undo CL 104200047 / 318b04f28372
Breaks windows and race detector.
TBR=rsc
««« original CL description
runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes#7468Fixes#7424
LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»
TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
»»»
LGTM=dvyukov
R=dvyukov, dave, khr, alex.brainman
CC=golang-codereviews
https://golang.org/cl/112240044
Resolves TODO for not walking all goroutines in NumGoroutines.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/107290044
1. Add select on sync channels benchmark.
2. Make channels in BenchmarkSelectNonblock shared.
With GOMAXPROCS=1 it is the same, but with GOMAXPROCS>1
it becomes a more interesting benchmark.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews
https://golang.org/cl/115780043
The issue is discovered during testing of a change to runtime.
Even if it is unlikely to happen, the comment can safe an hour
next person who hits it.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rlh, rsc
https://golang.org/cl/116790043
I don't see how it can lead to bad things today.
But it's better to kill it before it does.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/111130045
This CL adds 'dropg', which is called to drop the association
between m and its current goroutine, and it makes schedule
handle locked goroutines correctly, instead of requiring all
callers of schedule to do that.
The effect is that if you want to take over an m for, say,
garbage collection work while still allowing the current g
to run on some other m, you can do an mcall to a function
that is:
// dissociate gp
dropg();
gp->status = Gwaiting; // for ready
// put gp on run queue for others to find
runtime·ready(gp);
/* ... do other work here ... */
// done with m, let it run goroutines again
schedule();
Before this CL, the dropg() body had to be written explicitly,
and the check for lockedg before schedule had to be
written explicitly too, both of which make the code a bit
more fragile than it needs to be.
LGTM=iant
R=dvyukov, iant
CC=golang-codereviews, rlh
https://golang.org/cl/113110043
When we've switched to 8K pages,
heap started to grow by 128K instead of 64K,
because it was implicitly assuming that pages are 4K.
Fix that and make the code more robust.
LGTM=khr
R=golang-codereviews, dave, khr
CC=golang-codereviews, rsc
https://golang.org/cl/106450044
Maxstring is not updated in the new string routines,
this makes runtime think that long strings are bogus.
Fixes#8339.
LGTM=crawshaw, iant
R=golang-codereviews, crawshaw, iant
CC=golang-codereviews, khr, rsc
https://golang.org/cl/110930043
Both stdout and stderr are sent to /dev/null in android
apps. Introducing fatalf allows android to implement its
own copy that sends fatal errors to __android_log_print.
LGTM=minux, dave
R=minux, dave
CC=golang-codereviews
https://golang.org/cl/108400045
Based on cl/69170045 by Elias Naur.
There are currently several schemes for acquiring a TLS
slot to save the g register. None of them appear to work
for android. The closest are linux and darwin.
Linux uses a linker TLS relocation. This is not supported
by the android linker.
Darwin uses a fixed offset, and calls pthread_key_create
until it gets the slot it wants. As the runtime loads
late in the android process lifecycle, after an
arbitrary number of other libraries, we cannot rely on
any particular slot being available.
So we call pthread_key_create, take the first slot we are
given, and put it in runtime.tlsg, which we turn into a
regular variable in cmd/ld.
Makes android/arm cgo binaries work.
LGTM=minux
R=elias.naur, minux, dave, josharian
CC=golang-codereviews
https://golang.org/cl/106380043
The code in GC that handles gp->gobuf.ctxt is wrong,
because it does not mark the ctxt object itself,
if just queues the ctxt object for scanning.
So the ctxt object can be collected as garbage.
However, Gobuf.ctxt is void*, so it's always marked and
scanned through G.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/105490044
runtime·usleep and runtime·osyield fall back to calling an
assembly wrapper for the libc functions in the absence of a m,
so they can be called in cgo callback context.
LGTM=rsc
R=minux.ma, rsc
CC=dave, golang-codereviews
https://golang.org/cl/102620044
The main changes fall into a few patterns:
1. Replace #define with enum.
2. Add /*c2go */ comment giving effect of #define.
This is necessary for function-like #defines and
non-enum-able #defined constants.
(Not all compilers handle negative or large enums.)
3. Add extra braces in struct initializer.
(c2go does not implement the full rules.)
This is enough to let c2go typecheck the source tree.
There may be more changes once it is doing
other semantic analyses.
LGTM=minux, iant
R=minux, dave, iant
CC=golang-codereviews
https://golang.org/cl/106860045
A TLS slot is reserved by _rt0_.*_plan9 as an automatic and
its address (which is static on Plan 9) is saved in the
global _privates symbol. The startup linkage now is exactly
like that from Plan 9 libc, and the way we access g is
exactly as if we'd have used privalloc(2).
Aside from making the code more standard, this change
drastically simplifies it, both for 386 and for amd64, and
makes the Plan 9 code in liblink common for both 386 and
amd64.
The amd64 runtime code was cleared of nxm assumptions, and
now runs on the standard Plan 9 kernel.
Note handling fixes will follow in a separate CL.
LGTM=rsc
R=golang-codereviews, rsc, bradfitz, dave
CC=0intro, ality, golang-codereviews, jas, minux.ma, mischief
https://golang.org/cl/101510049
We restored registers correctly in the usual case where the thread
is a Go-managed thread and called runtime·sighandler, but we
failed to do so when runtime·sigtramp was called on a cgo-created
thread. In that case, runtime·sigtramp called runtime·badsignal,
a Go function, and did not restore registers after it returned
LGTM=rsc, dave
R=rsc, dave
CC=golang-codereviews, minux.ma
https://golang.org/cl/105280050
Breaks windows and race detector.
TBR=rsc
««« original CL description
runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes#7468Fixes#7424
LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»
TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes#7468Fixes#7424
LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
Remove GC bitmap backward scanning.
This was already done once in https://golang.org/cl/5530074/
Still makes GC a bit faster.
On the garbage benchmark, before:
gc-pause-one=237345195
gc-pause-total=4746903
cputime=32427775
time=32458208
after:
gc-pause-one=235484019
gc-pause-total=4709680
cputime=31861965
time=31877772
Also prepares mgc0.c for future changes.
R=golang-codereviews, khr, khr
CC=golang-codereviews, rsc
https://golang.org/cl/105380043
newproc takes two extra pointers, not two extra registers.
On amd64p32 (nacl) they are different.
We diagnosed this before the 1.3 cut but the tree was frozen.
I believe this is causing the random problems on the builder.
Fixes#8199.
TBR=r
CC=golang-codereviews
https://golang.org/cl/102710043
Output number of spinning threads,
this is useful to understanding whether the scheduler
is in a steady state or not.
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/103540045
Say when a goroutine is locked to OS thread in crash reports
and goroutine profiles.
It can be useful to understand what goroutines consume OS threads
(syscall and locked), e.g. if you forget to call UnlockOSThread
or leak locked goroutines.
R=golang-codereviews
CC=golang-codereviews, rsc
https://golang.org/cl/94170043
The runtime has historically held two dedicated values g (current goroutine)
and m (current thread) in 'extern register' slots (TLS on x86, real registers
backed by TLS on ARM).
This CL removes the extern register m; code now uses g->m.
On ARM, this frees up the register that formerly held m (R9).
This is important for NaCl, because NaCl ARM code cannot use R9 at all.
The Go 1 macrobenchmarks (those with per-op times >= 10 µs) are unaffected:
BenchmarkBinaryTree17 5491374955 5471024381 -0.37%
BenchmarkFannkuch11 4357101311 4275174828 -1.88%
BenchmarkGobDecode 11029957 11364184 +3.03%
BenchmarkGobEncode 6852205 6784822 -0.98%
BenchmarkGzip 650795967 650152275 -0.10%
BenchmarkGunzip 140962363 141041670 +0.06%
BenchmarkHTTPClientServer 71581 73081 +2.10%
BenchmarkJSONEncode 31928079 31913356 -0.05%
BenchmarkJSONDecode 117470065 113689916 -3.22%
BenchmarkMandelbrot200 6008923 5998712 -0.17%
BenchmarkGoParse 6310917 6327487 +0.26%
BenchmarkRegexpMatchMedium_1K 114568 114763 +0.17%
BenchmarkRegexpMatchHard_1K 168977 169244 +0.16%
BenchmarkRevcomp 935294971 914060918 -2.27%
BenchmarkTemplate 145917123 148186096 +1.55%
Minux previous reported larger variations, but these were caused by
run-to-run noise, not repeatable slowdowns.
Actual code changes by Minux.
I only did the docs and the benchmarking.
LGTM=dvyukov, iant, minux
R=minux, josharian, iant, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/109050043
MOV with SSE registers seems faster than REP MOVSQ if the
size being copied is less than about 2K. Previously we
didn't use MOV if the memory region is larger than 256
byte. This patch improves the performance of 257 ~ 2048
byte non-overlapping copy by using MOV.
Here is the benchmark result on Intel Xeon 3.5GHz (Nehalem).
benchmark old ns/op new ns/op delta
BenchmarkMemmove16 4 4 +0.42%
BenchmarkMemmove32 5 5 -0.20%
BenchmarkMemmove64 6 6 -0.81%
BenchmarkMemmove128 7 7 -0.82%
BenchmarkMemmove256 10 10 +1.92%
BenchmarkMemmove512 29 16 -44.90%
BenchmarkMemmove1024 37 25 -31.55%
BenchmarkMemmove2048 55 44 -19.46%
BenchmarkMemmove4096 92 91 -0.76%
benchmark old MB/s new MB/s speedup
BenchmarkMemmove16 3370.61 3356.88 1.00x
BenchmarkMemmove32 6368.68 6386.99 1.00x
BenchmarkMemmove64 10367.37 10462.62 1.01x
BenchmarkMemmove128 17551.16 17713.48 1.01x
BenchmarkMemmove256 24692.81 24142.99 0.98x
BenchmarkMemmove512 17428.70 31687.72 1.82x
BenchmarkMemmove1024 27401.82 40009.45 1.46x
BenchmarkMemmove2048 36884.86 45766.98 1.24x
BenchmarkMemmove4096 44295.91 44627.86 1.01x
LGTM=khr
R=golang-codereviews, gobot, khr
CC=golang-codereviews
https://golang.org/cl/90500043
This requires minimal changes to the runtime hooks. In particular,
synchronization events must be done only on valid addresses now,
so I've added the additional checks to race.c.
LGTM=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/101000046
Afterprologue check was required when did not know
about return arguments of functions and/or they were not zeroed.
Now 100% precision is required for stacks due to stack copying,
so it must work w/o afterprologue one way or another.
I can limit this change for 1.3 to merely adding a TODO,
but this check is super confusing so I don't want this knowledge to get lost.
LGTM=rsc
R=golang-codereviews, gobot, rsc, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/96580045
Also implement go:nosplit annotation. Not really needed
for now, but we'll definitely need it for other conversions.
benchmark old ns/op new ns/op delta
BenchmarkRuneIterate 534 474 -11.24%
BenchmarkRuneIterate2 535 470 -12.15%
LGTM=bradfitz
R=golang-codereviews, dave, bradfitz, minux
CC=golang-codereviews
https://golang.org/cl/93380044
Reportedly in the Linux 3.16 kernel the VDSO will not have
section headers or a normal symbol table.
Too late for 1.3 but perhaps for 1.3.1, if there is one.
Fixes#8197.
LGTM=rsc
R=golang-codereviews, mattn.jp, rsc
CC=golang-codereviews
https://golang.org/cl/101260044
It appears that something about Go on Windows
cannot handle the fault cause by a jump to address 0.
The way Go represents and calls functions, this
never happened at all, until CL 105140044.
This CL changes the code added in CL 105140044
to make jump to 0 impossible once again.
Fixes#8047. (again, on Windows)
TBR=bradfitz
R=golang-codereviews, dave
CC=adg, golang-codereviews, iant, r
https://golang.org/cl/105120044
jmpdefer modifies PC, SP, and LR, and not atomically,
so walking past jmpdefer will often end up in a state
where the three are not a consistent execution snapshot.
This was causing warning messages a few frames later
when the traceback realized it was confused, but given
the right memory it could easily crash instead.
Update #8153
LGTM=minux, iant
R=golang-codereviews, minux, iant
CC=golang-codereviews, r
https://golang.org/cl/107970043
A runtime.Goexit during a panic-invoked deferred call
left the panic stack intact even though all the stack frames
are gone when the goroutine is torn down.
The next goroutine to reuse that struct will have a
bogus panic stack and can cause the traceback routines
to walk into garbage.
Most likely to happen during tests, because t.Fatal might
be called during a deferred func and uses runtime.Goexit.
This "not enough cleared in Goexit" failure mode has
happened to us multiple times now. Clear all the pointers
that don't make sense to keep, not just gp->panic.
Fixes#8158.
LGTM=iant, dvyukov
R=iant, dvyukov
CC=golang-codereviews
https://golang.org/cl/102220043
The 1-byte write was silently clearing a byte on the stack.
If there was another function call with more arguments
in the same stack frame, no harm done.
Otherwise, if the variable at that location was already zero,
no harm done.
Otherwise, problems.
Fixes#8139.
LGTM=dsymonds
R=golang-codereviews, dsymonds
CC=golang-codereviews, iant, r
https://golang.org/cl/100940043
We were requiring that the defer stack and the panic stack
be completely processed, thinking that if any were left over
the stack scan and the defer stack/panic stack must be out
of sync. It turns out that the panic stack may well have
leftover entries in some situations, and that's okay.
Fixes#8132.
LGTM=minux, r
R=golang-codereviews, minux, r
CC=golang-codereviews, iant, khr
https://golang.org/cl/100900044
C globals are conservatively scanned. This helps
avoid false retention, especially for 32 bit.
LGTM=rsc
R=golang-codereviews, khr, rsc
CC=golang-codereviews
https://golang.org/cl/102040043
The 'continuation pc' is where the frame will continue
execution, if anywhere. For a frame that stopped execution
due to a CALL instruction, the continuation pc is immediately
after the CALL. But for a frame that stopped execution due to
a fault, the continuation pc is the pc after the most recent CALL
to deferproc in that frame, or else 0. That is where execution
will continue, if anywhere.
The liveness information is only recorded for CALL instructions.
This change makes sure that we never look for liveness information
except for CALL instructions.
Using a valid PC fixes crashes when a garbage collection or
stack copying tries to process a stack frame that has faulted.
Record continuation pc in heapdump (format change).
Fixes#8048.
LGTM=iant, khr
R=khr, iant, dvyukov
CC=golang-codereviews, r
https://golang.org/cl/100870044
Update #2675
The code here was using the error check for Linux/386,
not the one for FreeBSD/386. Most of the time it worked.
Thanks to Neel Natu (FreeBSD developer) for finding this.
The s/JCC/JAE/ a few lines later is a no-op but makes the
test match the rest of the file. Why we write JAE instead of JCC
I don't know, but the two are equivalent and the file might
as well be consistent.
LGTM=bradfitz, minux
R=golang-codereviews, bradfitz, minux
CC=golang-codereviews
https://golang.org/cl/99680044
The rtype struct is meant to be a copy of reflect.rtype. The
zero field was added to reflect.rtype in 18495:6e50725ac753.
LGTM=rsc
R=khr, rsc
CC=golang-codereviews
https://golang.org/cl/93660045
Add nacl.bash, the NaCl version of all.bash.
It's a separate script because it builds a variant of package syscall
with a large zip file embedded in it, containing all the input files
needed for tests.
Disable various tests new since the last round, mostly the ones using os/exec.
Fixes#7945.
LGTM=dave
R=golang-codereviews, remyoudompheng, dave, bradfitz
CC=golang-codereviews
https://golang.org/cl/100590044
The move from 4kB to 8kB in Go 1.2 was to eliminate many stack split hot spots.
The move back to 4kB was predicated on copying stacks eliminating
the potential for hot spots.
Unfortunately, the fact that stacks do not copy 100% of the time means
that hot spots can still happen under the right conditions, and the slowdown
is worse now than it was in Go 1.2. There is a real program in issue 8030 that
sees about a 30x slowdown: it has a reflect call near the top of the stack
which inhibits any stack copying on that segment.
Go back to 8kB until stack copying can be used 100% of the time.
Fixes#8030.
LGTM=khr, dave, iant
R=iant, khr, r, bradfitz, dave
CC=golang-codereviews
https://golang.org/cl/92540043
Currently freeOSMemory makes only marking phase of GC, but not sweeping phase.
So recently memory is not released after freeOSMemory.
Do both marking and sweeping during freeOSMemory.
Fixes#8019.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/97550043
The GC program describing a data structure sometimes trusts the
pointer base type and other times does not (if not, the garbage collector
must fall back on per-allocation type information stored in the heap).
Make the scanning of a pointer in an interface do the same.
This fixes a crash in a particular use of reflect.SliceHeader.
Fixes#8004.
LGTM=khr
R=golang-codereviews, khr
CC=0xe2.0x9a.0x9b, golang-codereviews, iant, r
https://golang.org/cl/100470045
mstats.last_gc is unix time now, it is compared with abstract monotonic time.
On my machine GC is forced every 5 mins regardless of last_gc.
LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, iant, rsc
https://golang.org/cl/91350045
I have no test case for this at tip.
The original report included a program crashing at revision 88ac7297d2fa.
I tested this code at that revision and it does fix the crash.
However, at tip the reported code no longer crashes, presumably
because some allocation patterns have changed. I believe the
bug is still present at tip and that this code still fixes it.
Fixes#7143.
LGTM=alex.brainman
R=golang-codereviews, alex.brainman
CC=dvyukov, golang-codereviews
https://golang.org/cl/96300046
If it's not used (such as on other systems or if softfloat
is disabled) the linker will discard it.
The alternative is to teach cmd/go that every binary
depends on math implicitly on arm. I started down that
path but it's too scary. If we're going to get dependencies
right we should get dependencies right.
Fixes#6994.
LGTM=bradfitz, dave
R=golang-codereviews, bradfitz, dave
CC=golang-codereviews
https://golang.org/cl/95290043
<enter reason for undo>
««« original CL description
runtime/race: fix the link for the race detector.
LGTM=bradfitz
R=golang-dev, bradfitz
CC=golang-codereviews
https://golang.org/cl/100330043
»»»
TBR=minux
R=minux.ma
CC=golang-codereviews
https://golang.org/cl/96200044
Number of lost samples was overcounted (never reset).
Also remove unused variable (it's trivial to restore it for debugging if needed).
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews, rsc
https://golang.org/cl/96060043
Where the spelling changed from British to
US norm (e.g., optimise -> optimize) it follows
the style in that file.
LGTM=adonovan
R=golang-codereviews, adonovan
CC=golang-codereviews
https://golang.org/cl/96980043
Because gotraceback is called early and often, its cache commits to the value of getenv("GOTRACEBACK") before getenv is even ready. So now we reset its cache once getenv becomes ready. Panicking programs now dump core again.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/97800045
If slice append is the only place where a program allocates,
then it will consume all available memory w/o triggering GC.
This was demonstrated in the issue.
Fixes#7922.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, iant, khr
https://golang.org/cl/91010048
The monotonic clock patch changed all runtime times
to abstract monotonic time. As the result user-visible
MemStats.LastGC become monotonic time as well.
Restore Unix time for LastGC.
This is the simplest way to expose time.now to runtime that I found.
Another option would be to change time.now to C called
int64 runtime.unixnanotime() and then express time.now in terms of it.
But this would require to introduce 2 64-bit divisions into time.now.
Another option would be to change time.now to C called
void runtime.unixnanotime1(struct {int64 sec, int32 nsec} *now)
and then express both time.now and runtime.unixnanotime in terms of it.
Fixes#7852.
LGTM=minux.ma, iant
R=minux.ma, rsc, iant
CC=golang-codereviews
https://golang.org/cl/93720045
The backing memory for >1 word interfaces was being scanned
conservatively.
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/94000043
Use a real type for Gs instead of scanning them conservatively.
Zero the schedlink pointer when it is dead.
Update #7820
LGTM=rsc
R=rsc, dvyukov
CC=golang-codereviews
https://golang.org/cl/89360043
This has typically crashed in the past, although usually with
an 'all goroutines are asleep - deadlock!' message that shows
no goroutines (because there aren't any).
Previous discussion at:
https://groups.google.com/d/msg/golang-nuts/uCT_7WxxopQ/BoSBlLFzUTkJhttps://groups.google.com/d/msg/golang-dev/KUojayEr20I/u4fp_Ej5PdUJhttp://golang.org/issue/7711
There is general agreement that runtime.Goexit terminates the
main goroutine, so that main cannot return, so the program does
not exit.
The interpretation that all other goroutines exiting causes an
exit(0) is relatively new and was not part of those discussions.
That is what this CL changes.
Thankfully, even though the exit(0) has been there for a while,
some other accounting bugs made it very difficult to trigger,
so it is reasonable to replace. In particular, see golang.org/issue/7711#c10
for an examination of the behavior across past releases.
Fixes#7711.
LGTM=iant, r
R=golang-codereviews, iant, dvyukov, r
CC=golang-codereviews
https://golang.org/cl/88210044
Having the pointers means you can grub around in the
binary finding out more about them.
This helped with issue 7748.
LGTM=minux.ma, bradfitz
R=golang-codereviews, minux.ma, bradfitz
CC=golang-codereviews
https://golang.org/cl/88090045
When I did the original 386 ports on Linux and OS X, I chose to
define GS-relative expressions like 4(GS) as relative to the actual
thread-local storage base, which was usually GS but might not be
(it might be FS, or it might be a different constant offset from GS or FS).
The original scope was limited but since then the rewrites have
gotten out of control. Sometimes GS is rewritten, sometimes FS.
Some ports do other rewrites to enable shared libraries and
other linking. At no point in the code is it clear whether you are
looking at the real GS/FS or some synthesized thing that will be
rewritten. The code manipulating all these is duplicated in many
places.
The first step to fixing issue 7719 is to make the code intelligible
again.
This CL adds an explicit TLS pseudo-register to the 386 and amd64.
As a register, TLS refers to the thread-local storage base, and it
can only be loaded into another register:
MOVQ TLS, AX
An offset from the thread-local storage base is written off(reg)(TLS*1).
Semantically it is off(reg), but the (TLS*1) annotation marks this as
indexing from the loaded TLS base. This emits a relocation so that
if the linker needs to adjust the offset, it can. For example:
MOVQ TLS, AX
MOVQ 8(AX)(TLS*1), CX // load m into CX
On systems that support direct access to the TLS memory, this
pair of instructions can be reduced to a direct TLS memory reference:
MOVQ 8(TLS), CX // load m into CX
The 2-instruction and 1-instruction forms correspond roughly to
ELF TLS initial exec mode and ELF TLS local exec mode, respectively.
Liblink applies this rewrite on systems that support the 1-instruction form.
The decision is made using only the operating system (and probably
the -shared flag, eventually), not the link mode. If some link modes
on a particular operating system require the 2-instruction form,
then all builds for that operating system will use the 2-instruction
form, so that the link mode decision can be delayed to link time.
Obviously it is late to be making changes like this, but I despair
of correcting issue 7719 and issue 7164 without it. To make sure
I am not changing existing behavior, I built a "hello world" program
for every GOOS/GOARCH combination we have and then worked
to make sure that the rewrite generates exactly the same binaries,
byte for byte. There are a handful of TODOs in the code marking
kludges to get the byte-for-byte property, but at least now I can
explain exactly how each binary is handled.
The targets I tested this way are:
darwin-386
darwin-amd64
dragonfly-386
dragonfly-amd64
freebsd-386
freebsd-amd64
freebsd-arm
linux-386
linux-amd64
linux-arm
nacl-386
nacl-amd64p32
netbsd-386
netbsd-amd64
openbsd-386
openbsd-amd64
plan9-386
plan9-amd64
solaris-amd64
windows-386
windows-amd64
There were four exceptions to the byte-for-byte goal:
windows-386 and windows-amd64 have a time stamp
at bytes 137 and 138 of the header.
darwin-386 and plan9-386 have five or six modified
bytes in the middle of the Go symbol table, caused by
editing comments in runtime/sys_{darwin,plan9}_386.s.
Fixes#7164.
LGTM=iant
R=iant, aram, minux.ma, dave
CC=golang-codereviews
https://golang.org/cl/87920043
Do not consider idle finalizer/bgsweep/timer goroutines as doing something useful.
We can't simply set isbackground for the whole lifetime of the goroutines,
because when finalizer goroutine calls user function, we do want to consider it
as doing something useful.
This is borken due to timers for quite some time.
With background sweep is become even more broken.
Fixes#7784.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/87960044
Currently Pool can cache up to 15 elements per P, and these elements are not accesible to other Ps.
If a Pool caches large objects, say 2MB, and GOMAXPROCS is set to a large value, say 32,
then the Pool can waste up to 960MB.
The new caching policy caches at most 1 per-P element, the rest is shared between Ps.
Get/Put performance is unchanged. Nested Get/Put performance is 57% worse.
However, overall scalability of nested Get/Put is significantly improved,
so the new policy starts winning under contention.
benchmark old ns/op new ns/op delta
BenchmarkPool 27.4 26.7 -2.55%
BenchmarkPool-4 6.63 6.59 -0.60%
BenchmarkPool-16 1.98 1.87 -5.56%
BenchmarkPool-64 1.93 1.86 -3.63%
BenchmarkPoolOverlflow 3970 6235 +57.05%
BenchmarkPoolOverlflow-4 10935 1668 -84.75%
BenchmarkPoolOverlflow-16 13419 520 -96.12%
BenchmarkPoolOverlflow-64 10295 380 -96.31%
LGTM=rsc
R=rsc
CC=golang-codereviews, khr
https://golang.org/cl/86020043
It looks like maybe on slower builders 4 seconds is not enough.
Trying to get rid of the flaky failures.
TBR=iant
CC=golang-codereviews
https://golang.org/cl/86870044
It runs too long in -short mode.
Disable the one in init, because it doesn't respect -short.
Make the part that claims to test execution in a finalizer
actually execute the test in the finalizer.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=aram.h, golang-codereviews, iant, khr
https://golang.org/cl/86550045
We originally decided to skip this test in short mode
to prevent the parallel runtime test to timeout on the
Plan 9 builder. This should no longer be required since
the issue was fixed in CL 86210043.
LGTM=dave, bradfitz
R=dvyukov, dave, bradfitz
CC=golang-codereviews, rsc
https://golang.org/cl/84790044
If you pass ns = 100,000 to this function, timediv will
return ms = 0. tsemacquire in /sys/src/9/port/sysproc.c
will return immediately when ms == 0 and the semaphore
cannot be acquired immediately - it doesn't sleep - so
notetsleep will spin, chewing cpu and repeatedly reading
the time, until the 100us have passed.
Thanks to the time reads it won't take too many iterations,
but whatever we are waiting for does not get a chance to
run. Eventually the notetsleep spin loop returns and we
end up in the stoptheworld spin loop - actually a sleep
loop but we're not doing a good job of sleeping.
After 100ms or so of this, the kernel says enough and
schedules a different thread. That thread manages to do
whatever we're waiting for, and the spinning in the other
thread stops. If tsemacquire had actually slept, this
would have happened much quicker.
Many thanks to Russ Cox for help debugging.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/86210043
Cuts the number of calls from 6 to 2 in the non-debug case.
LGTM=iant
R=golang-codereviews, iant
CC=0intro, aram, golang-codereviews, khr
https://golang.org/cl/86040043
Getenv() should not call malloc when called from
gotraceback(). Instead, we return a static buffer
in this case, with enough room to hold the longest
value.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/85680043
On Plan 9 gotraceback calls getenv calls malloc, and we gotraceback
on every call to gentraceback, which happens during garbage collection.
Honestly I don't even know how this works on Plan 9.
I suspect it does not, and that we are getting by because
no one has tried to run with $GOTRACEBACK set at all.
This will speed up all the other systems by epsilon, since they
won't call getenv and atoi repeatedly.
LGTM=bradfitz
R=golang-codereviews, bradfitz, 0intro
CC=golang-codereviews
https://golang.org/cl/85430046
Given
type Outer struct {
*Inner
...
}
the compiler generates the implementation of (*Outer).M dispatching to
the embedded Inner. The implementation is logically:
func (p *Outer) M() {
(p.Inner).M()
}
but since the only change here is the replacement of one pointer
receiver with another, the actual generated code overwrites the
original receiver with the p.Inner pointer and then jumps to the M
method expecting the *Inner receiver.
During reflect.Value.Call, we create an argument frame and the
associated data structures to describe it to the garbage collector,
populate the frame, call reflect.call to run a function call using
that frame, and then copy the results back out of the frame. The
reflect.call function does a memmove of the frame structure onto the
stack (to set up the inputs), runs the call, and the memmoves the
stack back to the frame structure (to preserve the outputs).
Originally reflect.call did not distinguish inputs from outputs: both
memmoves were for the full stack frame. However, in the case where the
called function was one of these wrappers, the rewritten receiver is
almost certainly a different type than the original receiver. This is
not a problem on the stack, where we use the program counter to
determine the type information and understand that during (*Outer).M
the receiver is an *Outer while during (*Inner).M the receiver in the
same memory word is now an *Inner. But in the statically typed
argument frame created by reflect, the receiver is always an *Outer.
Copying the modified receiver pointer off the stack into the frame
will store an *Inner there, and then if a garbage collection happens
to scan that argument frame before it is discarded, it will scan the
*Inner memory as if it were an *Outer. If the two have different
memory layouts, the collection will intepret the memory incorrectly.
Fix by only copying back the results.
Fixes#7725.
LGTM=khr
R=khr
CC=dave, golang-codereviews
https://golang.org/cl/85180043
It turns out there is a relatively common pattern that relies on
inverted channel semaphore:
gate := make(chan bool, N)
for ... {
// limit concurrency
gate <- true
go func() {
foo(...)
<-gate
}()
}
// join all goroutines
for i := 0; i < N; i++ {
gate <- true
}
So handle synchronization on inverted semaphores with cap>1.
Fixes#7718.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/84880046
Defers generated from cgo lie to us about their argument layout.
Mark those defers as not copyable.
CL 83820043 contains an additional test for this code and should be
checked in (and enabled) after this change is in.
Fixes bug 7695.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/84740043
Iterate the right number of times in arrays and channels.
Handle channels with zero-sized objects in them.
Output longer type names if we have them.
Compute argument offset correctly.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/82980043
I have no idea what this code is for, but it pretty
clearly needs to be uint64, not uint32.
LGTM=aram
R=0intro, aram
CC=golang-codereviews
https://golang.org/cl/84410043
Trying to make GODEBUG=gcdead=1 work with liveness
and in particular ambiguously live variables.
1. In the liveness computation, mark all ambiguously live
variables as live for the entire function, except the entry.
They are zeroed directly after entry, and we need them not
to be poisoned thereafter.
2. In the liveness computation, compute liveness (and deadness)
for all parameters, not just pointer-containing parameters.
Otherwise gcdead poisons untracked scalar parameters and results.
3. Fix liveness debugging print for -live=2 to use correct bitmaps.
(Was not updated for compaction during compaction CL.)
4. Correct varkill during map literal initialization.
Was killing the map itself instead of the inserted value temp.
5. Disable aggressive varkill cleanup for call arguments if
the call appears in a defer or go statement.
6. In the garbage collector, avoid bug scanning empty
strings. An empty string is two zeros. The multiword
code only looked at the first zero and then interpreted
the next two bits in the bitmap as an ordinary word bitmap.
For a string the bits are 11 00, so if a live string was zero
length with a 0 base pointer, the poisoning code treated
the length as an ordinary word with code 00, meaning it
needed poisoning, turning the string into a poison-length
string with base pointer 0. By the same logic I believe that
a live nil slice (bits 11 01 00) will have its cap poisoned.
Always scan full multiword struct.
7. In the runtime, treat both poison words (PoisonGC and
PoisonStack) as invalid pointers that warrant crashes.
Manual testing as follows:
- Create a script called gcdead on your PATH containing:
#!/bin/bash
GODEBUG=gcdead=1 GOGC=10 GOTRACEBACK=2 exec "$@"
- Now you can build a test and then run 'gcdead ./foo.test'.
- More importantly, you can run 'go test -short -exec gcdead std'
to run all the tests.
Fixes#7676.
While here, enable the precise scanning of slices, since that was
disabled due to bugs like these. That now works, both with and
without gcdead.
Fixes#7549.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83410044
The garbage collector poison pointers
(0x6969696969696969 and 0x6868686868686868)
are malformed addresses on amd64.
That is, they are not 48-bit addresses sign extended
to 64 bits. This causes a different kind of hardware fault
than the usual 'unmapped page' when accessing such
an address, and OS X 10.9.2 sends the resulting SIGSEGV
incorrectly, making it look like it was user-generated
rather than kernel-generated and does not include the
faulting address. This means that in GODEBUG=gcdead=1
mode, if there is a bug and something tries to dereference
a poisoned pointer, the runtime delivers the SIGSEGV to
os/signal and returns to the faulting code, which faults
again, causing the process to hang instead of crashing.
Fix by rewriting "user-generated" SIGSEGV on OS X to
look like a kernel-generated SIGSEGV with fault address
0xb01dfacedebac1e.
I chose that address because (1) when printed in hex
during a crash, it is obviously spelling out English text,
(2) there are no current Google hits for that pointer,
which will make its origin easy to find once this CL
is indexed, and (3) it is not an altogether inaccurate
description of the situation.
Add a test. Maybe other systems will break too.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, iant, ken
https://golang.org/cl/83270049
Delaying the runtime.throw until here will print more information.
In particular it will print the signal and code values, which means
it will show the fault address.
The canpanic checks were added recently, in CL 75320043.
They were just not added in exactly the right place.
LGTM=iant
R=dvyukov, iant
CC=golang-codereviews
https://golang.org/cl/83980043
Brad has been asking for this for a while.
I have resisted because I wanted to find a more general way to
do this, one that would keep the performance of code introducing
variables the same as the performance of code that did not.
(See golang.org/issue/3512#c20).
I have not found the more general way, and recent changes to
remove ambiguously live temporaries have blown away the
property I was trying to preserve, so that's no longer a reason
not to make the change.
Fixes#3512.
LGTM=iant
R=iant
CC=bradfitz, golang-codereviews, khr, r
https://golang.org/cl/83740044
The software floating point runs with m->locks++
to avoid being preempted; recognize this case in panic
and undo it so that m->locks is maintained correctly
when panicking.
Fixes#7553.
LGTM=dvyukov
R=golang-codereviews, dvyukov
CC=golang-codereviews
https://golang.org/cl/84030043
The old limit of 5 was chosen because we didn't actually know how
many bytes of arguments there were; 5 was a halfway point between
printing some useful information and looking ridiculous.
Now we know how many bytes of arguments there are, and we stop
the printing when we reach that point, so the "looking ridiculous" case
doesn't happen anymore: we only print actual argument words.
The cutoff now serves only to truncate very long (but real) argument lists.
In multiple debugging sessions recently (completely unrelated bugs)
I have been frustrated by not seeing more of the long argument lists:
5 words is only 2.5 interface values or strings, and not even 2 slices.
Double the max amount we'll show.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews, iant, r
https://golang.org/cl/83850043
Reduce footprint of liveness bitmaps by about 5x.
1. Mark all liveness bitmap symbols as 4-byte aligned
(they were aligned to a larger size by default).
2. The bitmap data is a bitmap count n followed by n bitmaps.
Each bitmap begins with its own count m giving the number
of bits. All the m's are the same for the n bitmaps.
Emit this bitmap length once instead of n times.
3. Many bitmaps within a function have the same bit values,
but each call site was given a distinct bitmap. Merge duplicate
bitmaps so that no bitmap is written more than once.
4. Many functions end up with the same aggregate bitmap data.
We used to name the bitmap data funcname.gcargs and funcname.gclocals.
Instead, name it gclocals.<md5 of data> and mark it dupok so
that the linker coalesces duplicate sets. This cut the bitmap
data remaining after step 3 by 40%; I was not expecting it to
be quite so dramatic.
Applied to "go build -ldflags -w code.google.com/p/go.tools/cmd/godoc":
bitmaps pclntab binary on disk
before this CL 1326600 1985854 12738268
4-byte align 1154288 (0.87x) 1985854 (1.00x) 12566236 (0.99x)
one bitmap len 782528 (0.54x) 1985854 (1.00x) 12193500 (0.96x)
dedup bitmap 414748 (0.31x) 1948478 (0.98x) 11787996 (0.93x)
dedup bitmap set 245580 (0.19x) 1948478 (0.98x) 11620060 (0.91x)
While here, remove various dead blocks of code from plive.c.
Fixes#6929.
Fixes#7568.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83630044
REP MOVSQ and REP STOSQ have a really high startup overhead.
Use a Duff's device to do the repetition instead.
benchmark old ns/op new ns/op delta
BenchmarkClearFat32 7.20 1.60 -77.78%
BenchmarkCopyFat32 6.88 2.38 -65.41%
BenchmarkClearFat64 7.15 3.20 -55.24%
BenchmarkCopyFat64 6.88 3.44 -50.00%
BenchmarkClearFat128 9.53 5.34 -43.97%
BenchmarkCopyFat128 9.27 5.56 -40.02%
BenchmarkClearFat256 13.8 9.53 -30.94%
BenchmarkCopyFat256 13.5 10.3 -23.70%
BenchmarkClearFat512 22.3 18.0 -19.28%
BenchmarkCopyFat512 22.0 19.7 -10.45%
BenchmarkCopyFat1024 36.5 38.4 +5.21%
BenchmarkClearFat1024 35.1 35.0 -0.28%
TODO: use for stack frame zeroing
TODO: REP prefixes are still used for "reverse" copying when src/dst
regions overlap. Might be worth fixing.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, r
https://golang.org/cl/81370046
The old code was using the PC of the instruction after the CALL.
Variables live during the call but not live when it returns would
not be seen as live during the stack copy, which might lead to
corruption. The correct PC to use is the one just before the
return address. After this CL the lookup matches what mgc0.c does.
The only time this matters is if you have back to back CALL instructions:
CALL f1 // x live here
CALL f2 // x no longer live
If a stack copy occurs during the execution of f1, the old code will
use the liveness bitmap intended for the execution of f2 and will not
treat x as live.
The only way this situation can arise and cause a problem in a stack copy
is if x lives on the stack has had its address taken but the compiler knows
enough about the context to know that x is no longer needed once f1
returns. The compiler has never known that much, so using the f2 context
cannot currently cause incorrect execution. For the same reason, it is not
possible to write a test for this today.
CL 83090046 will make the compiler precise enough in some cases
that this distinction will start mattering. The existing stack growth tests
in package runtime will fail if that CL is submitted without this one.
While we're here, print the frame PC in debug mode and update the
bitmap interpretation strings.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83250043
GODEBUG=allocfreetrace=1:
The allocfreetrace=1 mode prints a stack trace for each block
allocated and freed, and also a stack trace for each garbage collection.
It was implemented by reusing the heap profiling support: if allocfreetrace=1
then the heap profile was effectively running at 1 sample per 1 byte allocated
(always sample). The stack being shown at allocation was the stack gathered
for profiling, meaning it was derived only from the program counters and
did not include information about function arguments or frame pointers.
The stack being shown at free was the allocation stack, not the free stack.
If you are generating this log, you can find the allocation stack yourself, but
it can be useful to see exactly the sequence that led to freeing the block:
was it the garbage collector or an explicit free? Now that the garbage collector
runs on an m0 stack, the stack trace for the garbage collector was never interesting.
Fix all these problems:
1. Decouple allocfreetrace=1 from heap profiling.
2. Print the standard goroutine stack traces instead of a custom format.
3. Print the stack trace at time of allocation for an allocation,
and print the stack trace at time of free (not the allocation trace again)
for a free.
4. Print all goroutine stacks at garbage collection. Having all the stacks
means that you can see the exact point at which each goroutine was
preempted, which is often useful for identifying liveness-related errors.
GODEBUG=gcdead=1:
This mode overwrites dead pointers with a poison value.
Detect the poison value as an invalid pointer during collection,
the same way that small integers are invalid pointers.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/81670043
This is the same check we use during stack copying.
The check cannot be applied to C stack frames, even
though we do emit pointer bitmaps for the arguments,
because (1) the pointer bitmaps assume all arguments
are always live, not true of outputs during the prologue,
and (2) the pointer bitmaps encode interface values as
pointer pairs, not true of interfaces holding integers.
For the rest of the frames, however, we should hold ourselves
to the rule that a pointer marked live really is initialized.
The interface scanning already implicitly checks this
because it interprets the type word as a valid type pointer.
This may slow things down a little because of the extra loads.
Or it may speed things up because we don't bother enqueuing
nil pointers anymore. Enough of the rest of the system is slow
right now that we can't measure it meaningfully.
Enable for now, even if it is slow, to shake out bugs in the
liveness bitmaps, and then decide whether to turn it off
for the Go 1.3 release (issue 7650 reminds us to do this).
The new m->traceback field lets us force printing of fp=
values on all goroutine stack traces when we detect a
bad pointer. This makes it easier to understand exactly
where in the frame the bad pointer is, so that we can trace
it back to a specific variable and determine what is wrong.
Update #7650
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/80860044
The garbage collector will scan these pointers,
so make sure they are initialized.
LGTM=bradfitz, khr
R=khr, bradfitz
CC=golang-codereviews
https://golang.org/cl/80960047
m->moreargp/morebuf were not cleared in case of preemption and stack growing,
it can lead to persistent leaks of large memory blocks.
It seems to fix the sync.Pool finalizer failures. I've run the test 500'000 times
w/o a single failure; previously it would fail dozens of times.
Fixes#7633.
Fixes#7533.
LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, khr, rsc
https://golang.org/cl/80480044
Update channel race annotations to support change in
cl/75130045: doc: allow buffered channel as semaphore without initialization
The new annotations are added only for channels with capacity 1.
Strictly saying it's possible to construct a counter-example that
will produce a false positive with capacity > 1. But it's hardly can
lead to false positives in real programs, at least I would like to see such programs first.
Any additional annotations also increase probability of false negatives,
so I would prefer to add them lazily.
LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, iant, khr, rsc
https://golang.org/cl/76970043
Currently it's possible that bgsweep finishes before all spans
have been swept (we only know that sweeping of all spans has *started*).
In such case bgsweep may fail wake up runfinq goroutine when it needs to.
finq may still be nil at this point, but some finalizers may be queued later.
Make bgsweep to wait for sweeping to *complete*, then it can decide
whether it needs to wake up runfinq for sure.
Update #7533
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75960043
If we set obj, then it will be enqueued for marking at the end of the scanning loop.
This is not necessary, since we've already marked it.
This can wait for 1.4 if you wish.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/80030043
Change two-bit stack map entries to encode:
0 = dead
1 = scalar
2 = pointer
3 = multiword
If multiword, the two-bit entry for the following word encodes:
0 = string
1 = slice
2 = iface
3 = eface
That way, during stack scanning we can check if a string
is zero length or a slice has zero capacity. We can avoid
following the contained pointer in those cases. It is safe
to do so because it can never be dereferenced, and it is
desirable to do so because it may cause false retention
of the following block in memory.
Slice feature turned off until issue 7564 is fixed.
Update #7549
LGTM=rsc
R=golang-codereviews, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/76380043
The existing code did not have a clear notion of whether
memory has been actually reserved. It checked based on
whether in 32-bit mode or 64-bit mode and (on GNU/Linux) the
requested address, but it confused the requested address and
the returned address.
LGTM=rsc
R=rsc, dvyukov
CC=golang-codereviews, michael.hudson
https://golang.org/cl/79610043
The nproc and ndone fields are uint32. This makes the type
consistent.
LGTM=minux.ma
R=golang-codereviews, minux.ma
CC=golang-codereviews
https://golang.org/cl/79340044
Structured Exception Handling (SEH) was the first way to handle
exceptions (memory faults, divides by zero) on Windows.
The S might as well stand for "stack-based": the implementation
interprets stack addresses in a few different ways, and it gets
subtly confused by Go's management of stacks. It's also something
that requires active maintenance during cgo switches, and we've
had bugs in that maintenance in the past.
We have recently come to believe that SEH cannot work with
Go's stack usage. See http://golang.org/issue/7325 for details.
Vectored Exception Handling (VEH) is more like a Unix signal
handler: you set it once for the whole process and forget about it.
This CL drops all the SEH code and replaces it with VEH code.
Many special cases and 7 #ifdefs disappear.
VEH was introduced in Windows XP, so Go on windows/386 will
now require Windows XP or later. The previous requirement was
Windows 2000 or later. Windows 2000 immediately preceded
Windows XP, so Windows 2000 is the only affected version.
Microsoft stopped supporting Windows 2000 in 2010.
See http://golang.org/s/win2000-golang-nuts for details.
Fixes#7325.
LGTM=alex.brainman, r
R=golang-codereviews, alex.brainman, stephen.gutekanst, dave
CC=golang-codereviews, iant, r
https://golang.org/cl/74790043
Also move generated code into a separate file,
because it's difficult to work with the file otherwise.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews
https://golang.org/cl/76080044
It was using the wrong offset and returned random values
making "runoutput" compiler tests crash.
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/76250043
They were rejected by NaCl due to AES instructions and
accesses to %gs:0x8, caused by wrong tlsoffset value.
LGTM=iant
R=rsc, dave, iant
CC=golang-codereviews
https://golang.org/cl/76050044
It's possible that bgsweep constantly does not catch up for some reason,
in this case runfinq was not woken at all.
R=rsc
CC=golang-codereviews
https://golang.org/cl/75940043
The problem was that spans end up in wrong lists after split
(e.g. in h->busy instead of h->central->empty).
Also the span can be non-swept before split,
I don't know what it can cause, but it's safer to operate on swept spans.
Fixes#7544.
R=golang-codereviews, rsc
CC=golang-codereviews, khr
https://golang.org/cl/76160043
Currently processes crash with obscure message.
Say that it's "out of memory".
LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, khr, rsc
https://golang.org/cl/75820045
The Solaris network poller uses event ports, which are
level-triggered. As such, it has to re-arm itself after each
wakeup. The arming mechanism (which runs in its own thread) raced
with the closing of a file descriptor happening in a different
thread. When a network file descriptor is about to be closed,
the network poller is awaken to give it a chance to remove its
association with the file descriptor. Because the poller always
re-armed itself, it raced with code that closed the descriptor.
This change makes the network poller check before re-arming if
the file descriptor is about to be closed, in which case it will
ignore the re-arming request. It uses the per-PollDesc lock in
order to serialize access to the PollDesc.
This change also adds extensive documentation describing the
Solaris implementation of the network poller.
Fixes#7410.
LGTM=dvyukov, iant
R=golang-codereviews, bradfitz, iant, dvyukov, aram.h, gobot
CC=golang-codereviews
https://golang.org/cl/69190044
Mark free memory blocks as unused.
On amd64 it allows the process to eat all 128 GB of heap
without killing the machine.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/74070043
This is especially important for SetPanicOnCrash,
but also useful for e.g. nil deref in mallocgc.
Panics on such crashes can't lead to anything useful,
only to deadlocks, hangs and obscure crashes.
This is a copy of broken but already LGTMed
https://golang.org/cl/68540043/
TBR=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75320043
When we copy stack, we check only new size of the top segment.
This is incorrect, because we can have other segments below it.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/73980045
Calling runtime·cgocall could trigger a GC in the child while
gclock was held by the parent.
Fixes#7511
LGTM=bradfitz, dvyukov, dave
R=golang-codereviews, bradfitz, dvyukov, dave
CC=golang-codereviews, rsc
https://golang.org/cl/75210044
On Plan 9, the kernel disallows the use of floating point
instructions while handling a note. Previously, we worked
around this by using a simple loop in place of memmove.
When I added that work-around, I verified that all paths
from the note handler didn't end up calling memmove. Now
that memclr is using SSE instructions, the same process
will have to be done again.
Instead of doing that, however, this CL just punts and
uses unoptimized functions everywhere on Plan 9.
LGTM=rsc
R=rsc, 0intro
CC=golang-codereviews
https://golang.org/cl/73830044
syscall.naclWrite was missing from sys_nacl_386.s
This gets ./make.bash passing, but doesn't pass validation. I'm not sure if this is the fault of this change, or validation was broken anyway.
LGTM=rsc
R=minux.ma, rsc
CC=golang-codereviews
https://golang.org/cl/74510043
1. Fix the bug that shrinkstack returns memory to heap.
This causes growslice to misbehave (it manually initialized
blocks, and in efence mode shrinkstack's free leads to
partially-initialized blocks coming out of growslice.
Which in turn causes GC to crash while treating the garbage
as Eface/Iface.
2. Enable efence for stack segments.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, khr
https://golang.org/cl/74080043
The garbage collector uses type information to guide the
traversal of the heap. If it sees a field that should be a string,
it marks the object pointed at by the string data pointer as
visited but does not bother to look at the data, because
strings contain bytes, not pointers.
If you save s[len(s):] somewhere, though, the string data pointer
actually points just beyond the string data; if the string data
were exactly the size of an allocated block, the string data
pointer would actually point at the next block. It is incorrect
to mark that next block as visited and not bother to look at
the data, because the next block may be some other type
entirely.
The fix is to ignore strings with zero length during collection:
they are empty and can never become non-empty: the base
pointer will never be used again. The handling of slices already
does this (but using cap instead of len).
This was not a bug in Go 1.2, because until January all string
allocations included a trailing NUL byte not included in the
length, so s[len(s):] still pointed inside the string allocation
(at the NUL).
This bug was causing the crashes in test/run.go. Specifically,
the parsing of a regexp in package regexp/syntax allocated a
[]syntax.Inst with rounded size 1152 bytes. In fact it
allocated many such slices, because during the processing of
test/index2.go it creates thousands of regexps that are all
approximately the same complexity. That takes a long time, and
test/run works on other tests in other goroutines. One such
other test is chan/perm.go, which uses an 1152-byte source
file. test/run reads that file into a []byte and then calls
strings.Split(string(src), "\n"). The string(src) creates an
1152-byte string - and there's a very good chance of it
landing next to one of the many many regexp slices already
allocated - and then because the file ends in a \n,
strings.Split records the tail empty string as the final
element in the slice. A garbage collection happens at this
point, the collection finds that string before encountering
the []syntax.Inst data it now inadvertently points to, and the
[]syntax.Inst data is not scanned for the pointers that it
contains. Each syntax.Inst contains a []rune, those are
missed, and the backing rune arrays are freed for reuse. When
the regexp is later executed, the runes being searched for are
no longer runes at all, and there is no match, even on text
that should match.
On 64-bit machines the pointer in the []rune inside the
syntax.Inst is larger (along with a few other pointers),
pushing the []syntax.Inst backing array into a larger size
class, avoiding the collision with chan/perm.go's
inadvertently sized file.
I expect this was more prevalent on OS X than on Linux or
Windows because those managed to run faster or slower and
didn't overlap index2.go with chan/perm.go as often. On the
ARM systems, we only run one errorcheck test at a time, so
index2 and chan/perm would never overlap.
It is possible that this bug is the root cause of other crashes
as well. For now we only know it is the cause of the test/run crash.
Many thanks to Dmitriy for help debugging.
Fixes#7344.
Fixes#7455.
LGTM=r, dvyukov, dave, iant
R=golang-codereviews, dave, r, dvyukov, delpontej, iant
CC=golang-codereviews, khr
https://golang.org/cl/74250043
For now Note, futexsleep and futexwakeup are designed for threads,
not for processes. The explicit use of UMTX_OP_WAIT_UINT_PRIVATE and
UMTX_OP_WAKE_PRIVATE can avoid unnecessary traversals of VM objects,
to hit undiscovered bugs related to VM system on SMP/SMT/NUMA
environment.
Update #7496
LGTM=iant
R=golang-codereviews, gobot, iant, bradfitz
CC=golang-codereviews
https://golang.org/cl/72760043
This CL is a reformulation of CL 73110043 containing only the minimum required to get the nacl builds compiling.
LGTM=bradfitz
R=rsc, bradfitz
CC=golang-codereviews
https://golang.org/cl/74220043
The change to signal_amd64.c from CL 15790043 was not merged correctly.
This CL reapplies the change, renaming the file to signal_amd64x.c and adds the appropriate build tags.
LGTM=iant, bradfitz
R=rsc, iant, bradfitz
CC=golang-codereviews
https://golang.org/cl/72790043
Spans are now private to threads, and the loop
is removed from all other functions.
Remove it from marknogc for consistency.
LGTM=khr, rsc
R=golang-codereviews, bradfitz, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/72520043
Thanks to Ian for spotting these.
runtime.h: define uintreg correctly.
stack.c: address warning caused by the type of uintreg being 32 bits on amd64p32.
Commentary (mainly for my own use)
nacl/amd64p32 defines a machine with 64bit registers, but address space is limited to a 4gb window (the window is placed randomly inside the full 48 bit virtual address space of a process). To cope with this 6c defines _64BIT and _64BITREG.
_64BITREG is always defined by 6c, so both GOARCH=amd64 and GOARCH=amd64p32 use 64bit wide registers.
However _64BIT itself is only defined when 6c is compiling for amd64 targets. The definition is elided for amd64p32 environments causing int, uint and other arch specific types to revert to their 32bit definitions.
LGTM=iant
R=iant, rsc, remyoudompheng
CC=golang-codereviews
https://golang.org/cl/72860046
This code being buggy is the only explanation I can come up
with for issue 7325. It's probably not, but the only alternative
is a Windows kernel bug. Comment this out to see what breaks
or gets fixed.
Update #7325
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/72590044
From the trace it appears that stackalloc is being
called with 0x1800 which is 6k = 4k + (StackSystem=2k).
Make StackSystem 4k too, to make stackalloc happy.
It's already 4k on windows/amd64.
TBR=khr
CC=golang-codereviews
https://golang.org/cl/72600043
There are at least 3 bugs:
1. g->stacksize accounting is broken during copystack/shrinkstack
2. stktop->free is not properly maintained during copystack/shrinkstack
3. stktop->free logic is broken:
we can have stktop->free==FixedStack,
and we will free it into stack cache,
but it actually comes from heap as the result of non-copying segment shrink
This shows as at least spurious races on race builders (maybe something else as well I don't know).
The idea behind the refactoring is to consolidate stacksize and
segment origin logic in stackalloc/stackfree.
Fixes#7490.
LGTM=rsc, khr
R=golang-codereviews, rsc, khr
CC=golang-codereviews
https://golang.org/cl/72440043
Recursive panics leave dangling Panic structs in g->panic stack.
At best it leads to a Defer leak and incorrect output on a subsequent panic.
At worst it arbitrary corrupts heap.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/72480043
One reason the sync.Pool finalizer test can fail is that
this function's ef1 contains uninitialized data that just
happens to point at some of the old pool. I've seen this cause
retention of a single pool cache line (32 elements) on arm.
Really we need liveness information for C functions, but
for now we can be more careful about data in long-lived
C functions that block.
LGTM=bradfitz, dvyukov
R=golang-codereviews, bradfitz, dvyukov
CC=golang-codereviews, iant, khr
https://golang.org/cl/72490043
Instead, split the underlying storage in half and
free just half of it.
Shrinking without copying lets us reclaim storage used
by a previously profligate Go routine that has now blocked
inside some C code.
To shrink in place, we need all stacks to be a power of 2 in size.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/69580044
Two memory allocator bug fixes.
- efence is not maintaining the proper heap metadata
to make eventual memory reuse safe, so use SysFault.
- now that our heap PageSize is 8k but most hardware
uses 4k pages, SysAlloc and SysReserve results must be
explicitly aligned. Do that in a few more call sites and
document this fact in malloc.h.
Fixes#7448.
LGTM=iant
R=golang-codereviews, josharian, iant
CC=dvyukov, golang-codereviews
https://golang.org/cl/71750048
I've just needed the G status on fault to debug runtime bug.
For some reason we print everything except header here.
Make it more informative and consistent.
R=rsc
CC=golang-codereviews
https://golang.org/cl/67870056
Implement custom assembly thunks for hot race calls (memory accesses and function entry/exit).
The thunks extract caller pc, verify that the address is in heap or global and switch to g0 stack.
Before:
ok regexp 3.692s
ok compress/bzip2 9.461s
ok encoding/json 6.380s
After:
ok regexp 2.229s (-40%)
ok compress/bzip2 4.703s (-50%)
ok encoding/json 3.629s (-43%)
For comparison, normal non-race build:
ok regexp 0.348s
ok compress/bzip2 0.304s
ok encoding/json 0.661s
Race build:
ok regexp 2.229s (+540%)
ok compress/bzip2 4.703s (+1447%)
ok encoding/json 3.629s (+449%)
Also removes some race-related special cases from cgocall and scheduler.
In long-term it will allow to remove cyclic runtime/race dependency on cmd/cgo.
Fixes#4249.
Fixes#7460.
Update #6508
Update #6688
R=iant, rsc, bradfitz
CC=golang-codereviews
https://golang.org/cl/55100044
32-bit Windows uses "structured exception handling" (SEH) to
handle hardware faults: that there is a per-thread linked list
of fault handlers maintained in user space instead of
something like Unix's signal handlers. The structures in the
linked list are required to live on the OS stack, and the
usual discipline is that the function that pushes a record
(allocated from the current stack frame) onto the list pops
that record before returning. Not to pop the entry before
returning creates a dangling pointer error: the list head
points to a stack frame that no longer exists.
Go pushes an SEH record in the top frame of every OS thread,
and that record suffices for all Go execution on that thread,
at least until cgo gets involved.
If we call into C using cgo, that called C code may push its
own SEH records, but by the convention it must pop them before
returning back to the Go code. We assume it does, and that's
fine.
If the C code calls back into Go, we want the Go SEH handler
to become active again, not whatever C has set up. So
runtime.callbackasm1, which handles a call from C back into
Go, pushes a new SEH record before calling the Go code and
pops it when the Go code returns. That's also fine.
It can happen that when Go calls C calls Go like this, the
inner Go code panics. We allow a defer in the outer Go to
recover the panic, effectively wiping not only the inner Go
frames but also the C calls. This sequence was not popping the
SEH stack up to what it was before the cgo calls, so it was
creating the dangling pointer warned about above. When
eventually the m stack was used enough to overwrite the
dangling SEH records, the SEH chain was lost, and any future
panic would not end up in Go's handler.
The bug in TestCallbackPanic and friends was thus creating a
situation where TestSetPanicOnFault - which causes a hardware
fault - would not find the Go fault handler and instead crash
the binary.
Add checks to TestCallbackPanicLocked to diagnose the mistake
in that test instead of leaving a bad state for another test
case to stumble over.
Fix bug by restoring SEH chain during deferred "endcgo"
cleanup.
This bug is likely present in Go 1.2.1, but since it depends
on Go calling C calling Go, with the inner Go panicking and
the outer Go recovering the panic, it seems not important
enough to bother fixing before Go 1.3. Certainly no one has
complained.
Fixes#7470.
LGTM=alex.brainman
R=golang-codereviews, alex.brainman
CC=golang-codereviews, iant, khr
https://golang.org/cl/71440043
For non-closure functions, the context register is uninitialized
on entry and will not be used, but morestack saves it and then the
garbage collector treats it as live. This can be a source of memory
leaks if the context register points at otherwise dead memory.
Avoid this by introducing a parallel set of morestack functions
that clear the context register, and use those for the non-closure functions.
I hope this will help with some of the finalizer flakiness, but it probably won't.
Fixes#7244.
LGTM=dvyukov
R=khr, dvyukov
CC=golang-codereviews
https://golang.org/cl/71030044
The flakiness appears to be just in tests, not in the actual code.
Specifically, the many tests call runtime.GC once and expect that
the finalizers will be running in the background when GC returns.
Now that the sweep phase is concurrent with execution, however,
the finalizers will not be run until sweep finishes, which might
be quite a bit later. To force sweep to finish, implement runtime.GC
by calling the actual collection twice. The second will complete the
sweep from the first.
This was reliably broken after a few runs before the CL and now
passes tens of runs:
while GOMAXPROCS=2 ./runtime.test -test.run=Finalizer -test.short \
-test.timeout=300s -test.cpu=$(perl -e 'print ("1,2,4," x 100) . "1"')
do true; done
Fixes#7328.
LGTM=dvyukov
R=dvyukov, dave
CC=golang-codereviews
https://golang.org/cl/71080043
The exception handler runs on the ordinary g stack,
and the stack copier is now trying to do a traceback
across it. That's never been needed before, so it was
unimplemented. Implement it, in all its ugliness.
Fixes windows/amd64 build.
TBR=khr
CC=golang-codereviews
https://golang.org/cl/71030043
cgocall.c: define the CBARGS macro for GOARCH_amd64p32. I don't think the value of this macro will ever be used under nacl/amd64p32 but it is required to compile even if cgo is not used.
hashmap.goc: amd64p32 uses 32bit words.
LGTM=iant
R=rsc, iant
CC=golang-codereviews
https://golang.org/cl/69960044
This CL replays the following one CL from the rsc-go13nacl repo.
This is the last replay CL: after this CL the main repo will have
everything the rsc-go13nacl repo did. Changes made to the main
repo after the rsc-go13nacl repo branched off probably mean that
NaCl doesn't actually work after this CL, but all the code is now moved
over and just needs to be redebugged.
---
cmd/6l, cmd/8l, cmd/ld: support for Native Client
See golang.org/s/go13nacl for design overview.
This CL is publicly visible but not CC'ed to golang-dev,
to avoid distracting from the preparation of the Go 1.2
release.
This CL and the others will be checked into my rsc-go13nacl
clone repo for now, and I will send CLs against the main
repo early in the Go 1.3 development.
R≡khr
https://golang.org/cl/15750044
---
LGTM=bradfitz, dave, iant
R=dave, bradfitz, iant
CC=golang-codereviews
https://golang.org/cl/69040044
Before GC, we flush all the per-P allocation caches. Doing
stack shrinking mid-GC causes these caches to fill up. At the
end of gc, the sweepgen is incremented which causes all of the
data in these caches to be in a bad state (cached but not yet
swept).
Move the stack shrinking until after sweepgen is incremented,
so any caching that happens as part of shrinking is done with
already-swept data.
Reenable stack copying.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/69620043
This test currently deadlocks on dragonfly/386.
Update #7421
LGTM=minux.ma
R=golang-codereviews, minux.ma
CC=golang-codereviews
https://golang.org/cl/69380043
warning: src/pkg/runtime/mem_plan9.c:72 param declared and not used: n
src/pkg/runtime/mem_plan9.c:73 name not declared: nbytes
src/pkg/runtime/mem_plan9.c:73 bad in naddr: NAME nbytes<>+0(SB)
LGTM=minux.ma, bradfitz
R=khr, minux.ma, bradfitz
CC=golang-codereviews
https://golang.org/cl/69360043
On stack overflow, if all frames on the stack are
copyable, we copy the frames to a new stack twice
as large as the old one. During GC, if a G is using
less than 1/4 of its stack, copy the stack to a stack
half its size.
TODO
- Do something about C frames. When a C frame is in the
stack segment, it isn't copyable. We allocate a new segment
in this case.
- For idempotent C code, we can abort it, copy the stack,
then retry. I'm working on a separate CL for this.
- For other C code, we can raise the stackguard
to the lowest Go frame so the next call that Go frame
makes triggers a copy, which will then succeed.
- Pick a starting stack size?
The plan is that eventually we reach a point where the
stack contains only copyable frames.
LGTM=rsc
R=dvyukov, rsc
CC=golang-codereviews
https://golang.org/cl/54650044
MCaches now hold a MSpan for each sizeclass which they have
exclusive access to allocate from, so no lock is needed.
Modifying the heap bitmaps also no longer requires a cas.
runtime.free gets more expensive. But we don't use it
much any more.
It's not much faster on 1 processor, but it's a lot
faster on multiple processors.
benchmark old ns/op new ns/op delta
BenchmarkSetTypeNoPtr1 24 23 -0.42%
BenchmarkSetTypeNoPtr2 33 34 +0.89%
BenchmarkSetTypePtr1 51 49 -3.72%
BenchmarkSetTypePtr2 55 54 -1.98%
benchmark old ns/op new ns/op delta
BenchmarkAllocation 52739 50770 -3.73%
BenchmarkAllocation-2 33957 34141 +0.54%
BenchmarkAllocation-3 33326 29015 -12.94%
BenchmarkAllocation-4 38105 25795 -32.31%
BenchmarkAllocation-5 68055 24409 -64.13%
BenchmarkAllocation-6 71544 23488 -67.17%
BenchmarkAllocation-7 68374 23041 -66.30%
BenchmarkAllocation-8 70117 20758 -70.40%
LGTM=rsc, dvyukov
R=dvyukov, bradfitz, khr, rsc
CC=golang-codereviews
https://golang.org/cl/46810043
These are mistakes in the first big NaCl CL.
LGTM=minux.ma, iant
R=golang-codereviews, minux.ma, iant
CC=golang-codereviews
https://golang.org/cl/69200043
Switch nanotime to a monotonic clock on openbsd/386 and openbsd/amd64.
Also use a monotonic clock when for thrsleep, since the sleep duration
is based on the value returned from nanotime.
Update #6007
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/68460044
For now we don't use CLOCK_MONOTONIC_FAST instead because
it's not supported on prior to 9-STABLE.
Update #6007
LGTM=minux.ma
R=golang-codereviews, minux.ma, bradfitz
CC=golang-codereviews
https://golang.org/cl/68690043
These previously reviewed CLs are present in this CL.
---
changeset: 18445:436bb084caed
user: Russ Cox <rsc@golang.org>
date: Mon Nov 11 09:50:34 2013 -0500
description:
runtime: assembly and system calls for Native Client x86-64
See golang.org/s/go13nacl for design overview.
This CL is publicly visible but not CC'ed to golang-dev,
to avoid distracting from the preparation of the Go 1.2
release.
This CL and the others will be checked into my rsc-go13nacl
clone repo for now, and I will send CLs against the main
repo early in the Go 1.3 development.
R≡adg
https://golang.org/cl/15760044
---
changeset: 18448:90bd871b5994
user: Russ Cox <rsc@golang.org>
date: Mon Nov 11 09:51:36 2013 -0500
description:
runtime: amd64p32 and Native Client assembly bootstrap
See golang.org/s/go13nacl for design overview.
This CL is publicly visible but not CC'ed to golang-dev,
to avoid distracting from the preparation of the Go 1.2
release.
This CL and the others will be checked into my rsc-go13nacl
clone repo for now, and I will send CLs against the main
repo early in the Go 1.3 development.
R≡khr
https://golang.org/cl/15820043
---
changeset: 18449:b011c3dc687e
user: Russ Cox <rsc@golang.org>
date: Mon Nov 11 09:51:58 2013 -0500
description:
math: amd64p32 assembly routines
These routines only manipulate float64 values,
so the amd64 and amd64p32 can share assembly.
The large number of files is symptomatic of a problem
with package path: it is a Go package structured like a C library.
But that will need to wait for another day.
See golang.org/s/go13nacl for design overview.
This CL is publicly visible but not CC'ed to golang-dev,
to avoid distracting from the preparation of the Go 1.2
release.
This CL and the others will be checked into my rsc-go13nacl
clone repo for now, and I will send CLs against the main
repo early in the Go 1.3 development.
R≡bradfitz
https://golang.org/cl/15870043
---
changeset: 18450:43234f082eec
user: Russ Cox <rsc@golang.org>
date: Mon Nov 11 10:03:19 2013 -0500
description:
syscall: networking for Native Client
See golang.org/s/go13nacl for design overview.
This CL is publicly visible but not CC'ed to golang-dev,
to avoid distracting from the preparation of the Go 1.2
release.
This CL and the others will be checked into my rsc-go13nacl
clone repo for now, and I will send CLs against the main
repo early in the Go 1.3 development.
R≡rsc
https://golang.org/cl/15780043
---
changeset: 18451:9c8d1d890aaa
user: Russ Cox <rsc@golang.org>
date: Mon Nov 11 10:03:34 2013 -0500
description:
runtime: assembly and system calls for Native Client x86-32
See golang.org/s/go13nacl for design overview.
This CL is publicly visible but not CC'ed to golang-dev,
to avoid distracting from the preparation of the Go 1.2
release.
This CL and the others will be checked into my rsc-go13nacl
clone repo for now, and I will send CLs against the main
repo early in the Go 1.3 development.
R≡rsc
https://golang.org/cl/15800043
---
changeset: 18452:f90b1dd9228f
user: Russ Cox <rsc@golang.org>
date: Mon Nov 11 11:04:09 2013 -0500
description:
runtime: fix frame size for linux/amd64 runtime.raise
R≡rsc
https://golang.org/cl/24480043
---
changeset: 18445:436bb084caed
user: Russ Cox <rsc@golang.org>
date: Mon Nov 11 09:50:34 2013 -0500
description:
runtime: assembly and system calls for Native Client x86-64
See golang.org/s/go13nacl for design overview.
This CL is publicly visible but not CC'ed to golang-dev,
to avoid distracting from the preparation of the Go 1.2
release.
This CL and the others will be checked into my rsc-go13nacl
clone repo for now, and I will send CLs against the main
repo early in the Go 1.3 development.
R≡adg
https://golang.org/cl/15760044
---
changeset: 18455:53b06799a938
user: Russ Cox <rsc@golang.org>
date: Mon Nov 11 23:29:52 2013 -0500
description:
cmd/gc: add -nolocalimports flag
R≡dsymonds
https://golang.org/cl/24990043
---
changeset: 18456:24f64e1eaa8a
user: Russ Cox <rsc@golang.org>
date: Tue Nov 12 22:06:29 2013 -0500
description:
runtime: add comments for playback write
R≡adg
https://golang.org/cl/25190043
---
changeset: 18457:d1f615bbb6e4
user: Russ Cox <rsc@golang.org>
date: Wed Nov 13 17:03:52 2013 -0500
description:
runtime: write only to NaCl stdout, never to NaCl stderr
NaCl writes some other messages on standard error
that we would like to be able to squelch.
R≡adg
https://golang.org/cl/26240044
---
changeset: 18458:1f01be1a1dc2
tag: tip
user: Russ Cox <rsc@golang.org>
date: Wed Nov 13 19:45:16 2013 -0500
description:
runtime: remove apparent debugging dreg
Setting timens to 0 turns off fake time.
TBR≡adg
https://golang.org/cl/26400043
LGTM=bradfitz
R=dave, bradfitz
CC=golang-codereviews
https://golang.org/cl/68730043
See golang.org/s/go13nacl for design overview.
This CL is the mostly mechanical changes from rsc's Go 1.2 based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support, there are probably two or three more large merges to come.
CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.
The exact change lists included are
15050047: syscall: support for Native Client
15360044: syscall: unzip implementation for Native Client
15370044: syscall: Native Client SRPC implementation
15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
15410048: runtime: support for Native Client
15410049: syscall: file descriptor table for Native Client
15410050: syscall: in-memory file system for Native Client
15440048: all: update +build lines for Native Client port
15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
15570045: os: support for Native Client
15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
15690044: net: support for Native Client
15690048: runtime: support for fake time like on Go Playground
15690051: build: disable various tests on Native Client
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/68150047