Fixes a regression introduced in revision 4cb93e2900d0.
That revision changed runtime·memmove to use SSE MOVOU
instructions for sizes between 17 and 256 bytes. We were
using memmove to save a copy of the note string during
the note handler. The Plan 9 kernel does not allow the
use of floating point in note handlers (which includes
MOVOU since it touches the XMM registers).
Arguably, runtime·memmove should not be using MOVOU when
GO386=387 but that wouldn't help us on amd64. It's very
important that we guard against any future changes so we
use a simple copy loop instead.
This change is extracted from CL 9796043 (since that CL
is still being ironed out).
R=rsc
CC=golang-dev
https://golang.org/cl/34640045
The funcdata symbol incorrectly named the dead value map the
dead pointer map. The dead value map identifies all dead
values, including pointers and non-pointers, in a stack frame.
The purpose of this map is to allow the runtime to poison
locations of dead data to catch lost invariants.
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/38670043
The new linker will disallow this on arm
(it is already disallowed on amd64 and 386)
in order to be able to lay out each function
separately.
The restriction is only for jumps into the middle
of a function; jumps to the beginning of a function
remain fine.
Prereq for linker cleanup (golang.org/s/go13linker).
R=iant, r, minux.ma
CC=golang-dev
https://golang.org/cl/35800043
When enabled this new debugging mode will allocate objects on
their own page and never recycle memory addresses. This is an
essential tool to root cause a broad class of heap corruption.
R=golang-dev, dave, daniel.morsing, dvyukov, rsc, iant, cshapiro
CC=golang-dev
https://golang.org/cl/22060046
This change allows the garbage collector to examine stack
slots that are determined as live and containing a pointer
value by the garbage collector. This results in a mean
reduction of 65% in the number of stack slots scanned during
an invocation of "GOGC=1 all.bash".
Unfortunately, this does not yet allow garbage collection to
be precise for the stack slots computed as live. Pointers
confound the determination of what definitions reach a given
instruction. In general, this problem is not solvable without
runtime cost but some advanced cooperation from the compiler
might mitigate common cases.
R=golang-dev, rsc, cshapiro
CC=golang-dev
https://golang.org/cl/14430048
Pass as a slice of strings instead. For 2-5 strings, implement
dedicated routines so no slices are needed.
static call counts in the go binary:
2 strings: 342 occurrences
3 strings: 98
4 strings: 30
5 strings: 13
6+ strings: 14
Why? C varags, bad for stack scanning and copying.
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/36380043
Now that the map implementation is reading the keys and values from
arbitrary memory (instead of from stack slots), it needs to tell the
race detector when it does so.
Fixes#6875.
R=golang-dev, dave
CC=golang-dev
https://golang.org/cl/36360043
This change is part of the plan to get rid of all vararg C calls
which are a pain for getting exact stack scanning.
We allocate a chunk of zero memory to return a pointer to when a
map access doesn't find the key. This is simpler than returning nil
and fixing things up in the caller. Linker magic allocates a single
zero memory area that is shared by all (non-reflect-generated) map
types.
Passing things by reference gets rid of some copies, so it speeds
up code with big keys/values.
benchmark old ns/op new ns/op delta
BenchmarkBigKeyMap 34 31 -8.48%
BenchmarkBigValMap 37 30 -18.62%
BenchmarkSmallKeyMap 26 23 -11.28%
R=golang-dev, dvyukov, khr, rsc
CC=golang-dev
https://golang.org/cl/14794043
Two bugs:
1. The first iteration of the traceback always uses LR when provided,
which it is (only) during a profiling signal, but in fact LR is correct
only if the stack frame has not been allocated yet. Otherwise an
intervening call may have changed LR, and the saved copy in the stack
frame should be used. Fix in traceback_arm.c.
2. The division runtime call adds 8 bytes to the stack. In order to
keep the traceback routines happy, it must copy the saved LR into
the new 0(SP). Change
SUB $8, SP
into
MOVW 0(SP), R11 // r11 is temporary, for use by linker
MOVW.W R11, -8(SP)
to update SP and 0(SP) atomically, so that the traceback always
sees a saved LR at 0(SP).
Fixes#6681.
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/19910044
The CL causes misc/cgo/test to fail randomly.
I suspect that the problem is the use of a division instruction
in usleep, which can be called while trying to acquire an m
and therefore cannot store the denominator in m.
The solution to that would be to rewrite the code to use a
magic multiply instead of a divide, but now we're getting
pretty far off the original code.
Go back to the original in preparation for a different,
less efficient but simpler fix.
««« original CL description
cmd/5l, runtime: make ARM integer division profiler-friendly
The implementation of division constructed non-standard
stack frames that could not be handled by the traceback
routines.
CL 13239052 left the frames non-standard but fixed them
for the specific case of a divide-by-zero panic.
A profiling signal can arrive at any time, so that fix
is not sufficient.
Change the division to store the extra argument in the M struct
instead of in a new stack slot. That keeps the frames bog standard
at all times.
Also fix a related bug in the traceback code: when starting
a traceback, the LR register should be ignored if the current
function has already allocated its stack frame and saved the
original LR on the stack. The stack copy should be used, as the
LR register may have been modified.
Combined, these make the torture test from issue 6681 pass.
Fixes#6681.
R=golang-dev, r, josharian
CC=golang-dev
https://golang.org/cl/19810043
»»»
TBR=r
CC=golang-dev
https://golang.org/cl/20350043
The implementation of division constructed non-standard
stack frames that could not be handled by the traceback
routines.
CL 13239052 left the frames non-standard but fixed them
for the specific case of a divide-by-zero panic.
A profiling signal can arrive at any time, so that fix
is not sufficient.
Change the division to store the extra argument in the M struct
instead of in a new stack slot. That keeps the frames bog standard
at all times.
Also fix a related bug in the traceback code: when starting
a traceback, the LR register should be ignored if the current
function has already allocated its stack frame and saved the
original LR on the stack. The stack copy should be used, as the
LR register may have been modified.
Combined, these make the torture test from issue 6681 pass.
Fixes#6681.
R=golang-dev, r, josharian
CC=golang-dev
https://golang.org/cl/19810043
The case can happen when starttheworld is calling acquirep
to get things moving again and acquirep gets preempted.
The stack trace is in golang.org/issue/6644.
It is difficult to build a short test case for this, but
the person who reported issue 6644 confirms that this
solves the problem.
Fixes#6644.
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/18740044
Nomemprof seems to be unneeded now, there is no recursion.
If the recursion will be re-introduced, it will break loudly by deadlocking.
Fixes#6566.
R=golang-dev, minux.ma, rsc
CC=golang-dev
https://golang.org/cl/14695044
Instead of adding an -march=armv5t flag to the gcc command
line, the same effect is obtained with an ".arch armv5t"
pseudo op in the assembly file that uses armv5t instructions.
R=golang-dev, iant, dave
CC=golang-dev
https://golang.org/cl/14511044
If an iterator is started while a map is in the middle of a grow,
and the map has NaN keys, then those keys might get returned by
the iterator more than once. This fix makes the evacuation decision
deterministic and repeatable for NaN keys so each one gets returned
only once.
R=golang-dev, r, khr, iant
CC=golang-dev
https://golang.org/cl/14367043
Changing from 4 kB to 8 kB brings significant improvement
on a variety of the Go 1 benchmarks, on both amd64
and 386 systems.
Significant runtime reductions:
amd64 386
GoParse -14% -1%
GobDecode -12% -20%
GobEncode -64% -1%
JSONDecode -9% -4%
JSONEncode -15% -5%
Template -17% -14%
In the longer term, khr's new stacks will avoid needing to
make this decision at all, but for Go 1.2 this is a reasonable
stopgap that makes performance significantly better.
Demand paging should mean that if the second 4 kB is not
used, it will not be brought into memory, so the change
should not adversely affect resident set size.
The same argument could justify bumping as high as 64 kB
on 64-bit machines, but there are diminishing returns
after 8 kB, and using 8 kB limits the possible unintended
memory overheads we are not aware of.
Benchmark graphs at
http://swtch.com/~rsc/gostackamd64.htmlhttp://swtch.com/~rsc/gostack386.html
Full data at
http://swtch.com/~rsc/gostack.zip
R=golang-dev, khr, dave, bradfitz, dvyukov
CC=golang-dev
https://golang.org/cl/14317043
It is not possible to use (there is no declaration in package syscall),
and no one seems to care.
Alex Brainman may bring this back properly for Go 1.3.
Fixes#6338.
R=golang-dev, r, alex.brainman
CC=golang-dev
https://golang.org/cl/14287043
Not scanning the stack by frames means we are reintroducing
a few false positives into the collection. Run the finalizer registration
in its own goroutine so that stack is guaranteed to be out of
consideration in a later collection.
This is working around a regression from yesterday's tip, but
it's not a regression from Go 1.1.
R=golang-dev
TBR=golang-dev
CC=golang-dev
https://golang.org/cl/14290043
Makes build unnecessarily slower. Will fix the parser instead.
««« original CL description
runtime/pprof: run TestGoroutineSwitch for longer
Short test now takes about 0.5 second here.
Fixes#6417.
The failure was also seen on our builders.
R=golang-dev, minux.ma, r
CC=golang-dev
https://golang.org/cl/13321048
»»»
R=golang-dev, minux.ma
CC=golang-dev
https://golang.org/cl/13720048
Short test now takes about 0.5 second here.
Fixes#6417.
The failure was also seen on our builders.
R=golang-dev, minux.ma, r
CC=golang-dev
https://golang.org/cl/13321048
If a fault happens in malloc, inevitably the next thing that happens
is a deadlock trying to allocate the panic value that says the fault
happened. Stop doing that, two ways.
First, reject panic in malloc just as we reject panic in garbage collection.
Second, runtime.panicstring was using an error implementation
backed by a Go string, so the interface held an allocated *string.
Since the actual errors are C strings, define a new error
implementation backed by a C char*, which needs no indirection
and therefore no allocation.
This second fix will avoid allocation for errors like nil panic derefs
or division by zero, so it is worth doing even though the first fix
should take care of faults during malloc.
Update #6419
R=golang-dev, dvyukov, dave
CC=golang-dev
https://golang.org/cl/13774043
Keeping pointers from the pre-walk phase confuses
the race detection instrumentation.
Fixes#6418.
R=golang-dev, dvyukov, r
CC=golang-dev
https://golang.org/cl/13368057
This interface is required to use the PCDATA interface
implemented in Go 1.2. While initially entirely private, the
FUNCDATA side of the interface has been made public. This
change completes the FUNCDATA/PCDATA interface.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/13735043
The code for call site-specific pointer bitmaps was not ready in time,
but the zeroing required without it is too expensive to use by default.
We will have to wait for precise collection of stack frames until Go 1.3.
The precise collection can be re-enabled by
GOEXPERIMENT=precisestack ./all.bash
but that will not be the default for a Go 1.2 build.
Fixes#6087.
R=golang-dev, jeremyjackins, dan.kortschak, r
CC=golang-dev
https://golang.org/cl/13677045
The uint64 divide function calls _mul64x32 to do a 64x32-bit multiply
and then compares the result against the 64-bit numerator.
If the result is bigger than the numerator, must use the slow path.
Unfortunately, the 64x32 produces a 96-bit product, and only the
low 64 bits were being used in the comparison. Return all 96 bits,
the bottom 64 via the original uint64* pointer, and the top 32
as the function's return value.
Fixes 386 build (broken by ARM division tests).
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/13722044
Because we can, and because it otherwise might crash
the program if we think we're out of memory.
Fixes#6390.
R=golang-dev, iant, minux.ma
CC=golang-dev
https://golang.org/cl/13345048
The implementation of division in the 5 toolchain is a bit too magical.
Hide the magic from the traceback routines.
Also add a test for the results of the software divide routine.
Fixes#5805.
R=golang-dev, minux.ma
CC=golang-dev
https://golang.org/cl/13239052
The kernel implementation of the fast system call path,
the one invoked by the SYSCALL instruction, is broken for
restarting system calls. A C program demonstrating this is below.
Change the system calls to use INT $0x80 instead, because
that (perhaps slightly slower) system call path actually works.
I filed http://www.freebsd.org/cgi/query-pr.cgi?pr=182161.
The C program demonstrating that it is FreeBSD's fault is below.
It reports the same "Bad address" failures from wait.
#include <sys/time.h>
#include <sys/signal.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
static void handler(int);
static void* looper(void*);
int
main(void)
{
int i;
struct sigaction sa;
pthread_cond_t cond;
pthread_mutex_t mu;
memset(&sa, 0, sizeof sa);
sa.sa_handler = handler;
sa.sa_flags = SA_RESTART;
memset(&sa.sa_mask, 0xff, sizeof sa.sa_mask);
sigaction(SIGCHLD, &sa, 0);
for(i=0; i<2; i++)
pthread_create(0, 0, looper, 0);
pthread_mutex_init(&mu, 0);
pthread_mutex_lock(&mu);
pthread_cond_init(&cond, 0);
for(;;)
pthread_cond_wait(&cond, &mu);
return 0;
}
static void
handler(int sig)
{
}
int
mywait4(int pid, int *stat, int options, struct rusage *rusage)
{
int result;
asm("movq %%rcx, %%r10; syscall"
: "=a" (result)
: "a" (7),
"D" (pid),
"S" (stat),
"d" (options),
"c" (rusage));
}
static void*
looper(void *v)
{
int pid, stat, out;
struct rusage rusage;
for(;;) {
if((pid = fork()) == 0)
_exit(0);
out = mywait4(pid, &stat, 0, &rusage);
if(out != pid) {
printf("wait4 returned %d\n", out);
}
}
}
Fixes#6372.
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/13582047
The test 'gp == m->curg' is not valid on Windows,
because the goroutine being profiled is not from the
current m.
TBR=golang-dev
CC=golang-dev
https://golang.org/cl/13718043
Because profiling signals can arrive at any time, we must
handle the case where a profiling signal arrives halfway
through a goroutine switch. Luckily, although there is much
to think through, very little needs to change.
Fixes#6000.
Fixes#6015.
R=golang-dev, dvyukov
CC=golang-dev
https://golang.org/cl/13421048
Bug #1:
Issue 5406 identified an interesting case:
defer iface.M()
may end up calling a wrapper that copies an indirect receiver
from the iface value and then calls the real M method. That's
two calls down, not just one, and so recover() == nil always
in the real M method, even during a panic.
[For the purposes of this entire discussion, a wrapper's
implementation is a function containing an ordinary call, not
the optimized tail call form that is somtimes possible. The
tail call does not create a second frame, so it is already
handled correctly.]
Fix this bug by introducing g->panicwrap, which counts the
number of bytes on current stack segment that are due to
wrapper calls that should not count against the recover
check. All wrapper functions must now adjust g->panicwrap up
on entry and back down on exit. This adds slightly to their
expense; on the x86 it is a single instruction at entry and
exit; on the ARM it is three. However, the alternative is to
make a call to recover depend on being able to walk the stack,
which I very much want to avoid. We have enough problems
walking the stack for garbage collection and profiling.
Also, if performance is critical in a specific case, it is already
faster to use a pointer receiver and avoid this kind of wrapper
entirely.
Bug #2:
The old code, which did not consider the possibility of two
calls, already contained a check to see if the call had split
its stack and so the panic-created segment was one behind the
current segment. In the wrapper case, both of the two calls
might split their stacks, so the panic-created segment can be
two behind the current segment.
Fix this by propagating the Stktop.panic flag forward during
stack splits instead of looking backward during recover.
Fixes#5406.
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/13367052