Running test/garbage/parser.out.
On a 4-core Lenovo X201s (Linux):
31.12u 0.60s 31.74r 1 cpu, no atomics
32.27u 0.58s 32.86r 1 cpu, atomic instructions
33.04u 0.83s 27.47r 2 cpu
On a 16-core Xeon (Linux):
33.08u 0.65s 33.80r 1 cpu, no atomics
34.87u 1.12s 29.60r 2 cpu
36.00u 1.87s 28.43r 3 cpu
36.46u 2.34s 27.10r 4 cpu
38.28u 3.85s 26.92r 5 cpu
37.72u 5.25s 26.73r 6 cpu
39.63u 7.11s 26.95r 7 cpu
39.67u 8.10s 26.68r 8 cpu
On a 2-core MacBook Pro Core 2 Duo 2.26 (circa 2009, MacBookPro5,5):
39.43u 1.45s 41.27r 1 cpu, no atomics
43.98u 2.95s 38.69r 2 cpu
On a 2-core Mac Mini Core 2 Duo 1.83 (circa 2008; Macmini2,1):
48.81u 2.12s 51.76r 1 cpu, no atomics
57.15u 4.72s 51.54r 2 cpu
The handoff algorithm is really only good for two cores.
Beyond that we will need to so something more sophisticated,
like have each core hand off to the next one, around a circle.
Even so, the code is a good checkpoint; for now we'll limit the
number of gc procs to at most 2.
R=dvyukov
CC=golang-dev
https://golang.org/cl/4641082
On darwin amd64 it was impossible to create more that ~132 threads. While
investigating I noticed that go consumes almost 1TB of virtual memory per
OS thread and the reason for such a small limit of OS thread was because
process was running out of virtual memory. While looking at bsdthread_create
I noticed that on amd64 it wasn't using PTHREAD_START_CUSTOM.
If you look at http://fxr.watson.org/fxr/source/bsd/kern/pthread_synch.c?v=xnu-1228
you will see that in that case darwin will use stack pointer as stack size,
allocating huge amounts of memory for stack. This change fixes the issue
and allows for creation of up to 2560 OS threads (which appears to be some
Mac OS X limit) with relatively small virtual memory consumption.
R=rsc
CC=golang-dev
https://golang.org/cl/4289075
The existing code assumed that signals only arrived
while executing on the goroutine stack (g == m->curg),
not while executing on the scheduler stack (g == m->g0).
Most of the signal handling trampolines correctly saved
and restored g already, but the sighandler C code did not
have access to it.
Some rewriting of assembly to make the various
implementations as similar as possible.
Will need to change Windows too but I don't
understand how sigtramp gets called there.
R=r
CC=golang-dev
https://golang.org/cl/4203042
The old heap maps used a multilevel table, but that
was overkill: there are only 1M entries on a 32-bit
machine and we can arrange to use a dense address
range on a 64-bit machine.
The heap map is in bss. The assumption is that if
we don't touch the pages they won't be mapped in.
Also moved some duplicated memory allocation
code out of the OS-specific files.
R=r
CC=golang-dev
https://golang.org/cl/4118042
Prefix all external symbols in runtime by runtime·,
to avoid conflicts with possible symbols of the same
name in linked-in C libraries. The obvious conflicts
are printf, malloc, and free, but hide everything to
avoid future pain.
The symbols left alone are:
** known to cgo **
_cgo_free
_cgo_malloc
libcgo_thread_start
initcgo
ncgocall
** known to linker **
_rt0_$GOARCH
_rt0_$GOARCH_$GOOS
text
etext
data
end
pclntab
epclntab
symtab
esymtab
** known to C compiler **
_divv
_modv
_div64by32
etc (arch specific)
Tested on darwin/386, darwin/amd64, linux/386, linux/amd64.
Built (but not tested) for freebsd/386, freebsd/amd64, linux/arm, windows/386.
R=r, PeterGo
CC=golang-dev
https://golang.org/cl/2899041
Old code was using recursion to traverse object graph.
New code uses an explicit stack, cutting the per-pointer
footprint to two words during the recursion and avoiding
the standard allocator and stack splitting code.
in test/garbage:
Reduces parser runtime by 2-3%
Reduces Peano runtime by 40%
Increases tree runtime by 4-5%
R=r
CC=golang-dev
https://golang.org/cl/2150042
Returns R14 and R15 to the available register pool.
Plays more nicely with ELF ABI C code.
In particular, our signal handlers will no longer crash
when a signal arrives during execution of a cgo C call.
Fixes#720.
R=ken2, r
CC=golang-dev
https://golang.org/cl/1847051
Could not take a signal on threads other than the main thread.
If you look at the spinning binary with dtrace, you can see a
fault happening over and over:
$ dtrace -n '
fbt::user_trap:entry /execname=="boot32" && self->count < 10/
{
self->count++;
printf("%s %x %x %x %x", probefunc, arg1, arg2, arg3, arg4);
stack();
tracemem(arg4, 256);
}'
dtrace: description 'fbt::user_trap:entry ' matched 1 probe
CPU ID FUNCTION:NAME
1 17015 user_trap:entry user_trap 0 10 79af0a0 79af0a0
mach_kernel`lo_alltraps+0x12a
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
0: 0e 00 00 00 37 00 00 00 00 00 00 00 1f 00 00 00 ....7...........
10: 1f 00 00 00 a8 33 00 00 00 00 00 01 00 00 00 00 .....3..........
20: 98 ba dc fe 07 09 00 00 00 00 00 00 98 ba dc fe ................
30: 06 00 00 00 0d 00 00 00 34 00 00 00 9e 1c 00 00 ........4.......
40: 17 00 00 00 00 02 00 00 ac 30 00 00 1f 00 00 00 .........0......
50: 00 00 00 00 00 00 00 00 0d 00 00 00 e0 e6 29 00 ..............).
60: 34 00 00 00 00 00 00 00 9e 1c 00 00 00 00 00 00 4...............
70: 17 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 ................
80: ac 30 00 00 00 00 00 00 1f 00 00 00 00 00 00 00 .0..............
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
a0: 48 00 00 00 10 00 00 00 85 00 00 00 a0 f2 29 00 H.............).
b0: 69 01 00 02 00 00 00 00 e6 93 04 82 ff 7f 00 00 i...............
c0: 2f 00 00 00 00 00 00 00 06 02 00 00 00 00 00 00 /...............
d0: 78 ee 42 01 01 00 00 00 1f 00 00 00 00 00 00 00 x.B.............
e0: 00 ed 9a 07 00 00 00 00 00 00 00 00 00 00 00 00 ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
...
The memory dump shows a 32-bit exception frame:
x86_saved_state32
gs = 0x37
fs = 0
es = 0x1f
ds = 0x1f
edi = 0x33a8
esi = 0x01000000
ebp = 0
cr2 = 0xfedcba98
ebx = 0x0907
edx = 0
ecx = 0xfedcba98
eax = 0x06
trapno = 0x0d
err = 0x34
eip = 0x1c9e
cs = 0x17
efl = 0x0200
uesp = 0x30ac
ss = 0x1f
The cr2 of 0xfedcba98 is the address that the new thread read
to cause the fault, but note that the trap is now a GP fault with
error code 0x34, meaning it's moved past the cr2 problem and on
to an invaild segment selector. The 0x34 is suspiciously similar
to the 0x37 in gs, and sure enough, OS X forces gs to have
that value in the signal handler, and if your thread hasn't set
up that segment (known as USER_CTHREAD), you'll fault on the IRET
into the signal handler and never be able to handle a signal.
The kernel bug is that it forces segment 0x37 without making sure
it is a valid segment. Leopard also forced 0x37 but had the courtesy
to set it up first.
Since OS X requires us to set up that segment (using the
thread_fast_set_cthread_self system call), we might as well
use it instead of the more complicated i386_set_ldt call to
set up our per-OS thread storage.
Also add some more zeros to bsdthread_register for new arguments
in Snow Leopard (apparently unnecessary, but being careful).
Fixes#510.
R=r
CC=golang-dev
https://golang.org/cl/824046
* correct symbol table size
* do not reorder functions in output
* traceback
* signal handling
* use same code for go + defer
* handle leaf functions in symbol table
R=kaib, dpx
CC=golang-dev
https://golang.org/cl/884041
because they are in package runtime.
another step to enforcing package boundaries.
R=r
DELTA=732 (114 added, 93 deleted, 525 changed)
OCL=35811
CL=35824
* keep coherent SP/PC in gobuf
(i.e., SP that would be in use at that PC)
* gogocall replaces setspgoto,
should work better in presence of link registers
* delete unused system calls
only amd64; 386 is now broken
R=r
DELTA=548 (183 added, 183 deleted, 182 changed)
OCL=30381
CL=30442