mirror of https://github.com/golang/go synced 2024-10-05 07:11:22 -06:00
Commit Graph

1147 Commits

Author SHA1 Message Date
Russ Cox
43aac4f9e7 runtime: raise maxmem to 512 GB
A workaround for #10460.

Change-Id: I607a556561d509db6de047892f886fb565513895
Reviewed-on: https://go-review.googlesource.com/10819
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 18:31:25 +00:00
Russ Cox
2c2770c3d4 cmd/cgo: make sure pointers passed to C escape to heap
Fixes #10303.

Change-Id: Ia68d3566ba3ebeea6e18e388446bd9b8c431e156
Reviewed-on: https://go-review.googlesource.com/10814
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-15 17:39:53 +00:00
Russ Cox
a3b9797baa runtime: gofmt
Change-Id: I539bdc438f694610a7cd373f7e1451171737cfb3
Reviewed-on: https://go-review.googlesource.com/11084
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-15 17:36:34 +00:00
Russ Cox
d5b40b6ac2 runtime: add GODEBUG gcshrinkstackoff, gcstackbarrieroff, and gcstoptheworld variables
While we're here, update the documentation and delete variables with no effect.
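
As a hedged illustration (the program below is not part of this change; only the
GODEBUG variable names come from it), these knobs are exercised by setting GODEBUG
in the environment before running any Go program:

// Run with, for example:
//   GODEBUG=gcstoptheworld=1,gcshrinkstackoff=1 ./prog
// The allocation loop just gives the collector something to do.
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var sink [][]byte
	for i := 0; i < 100; i++ {
		sink = append(sink, make([]byte, 1<<20))
	}
	runtime.GC() // force a cycle so the settings above take visible effect
	fmt.Println("buffers:", len(sink))
}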

Change-Id: I4df0d266dff880df61b488ed547c2870205862f0
Reviewed-on: https://go-review.googlesource.com/10790
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 17:31:04 +00:00
Russ Cox
80ec711755 runtime: use type-based write barrier for remote stack write during chansend
A send on an unbuffered channel to a blocked receiver is the only
case in the runtime where one goroutine writes directly to the stack
of another. The garbage collector assumes that if a goroutine is
blocked, its stack contains no new pointers since the last time it ran.
The send on an unbuffered channel violates this, so it needs an
explicit write barrier. It has an explicit write barrier, but not one that
can handle a write to another stack. Use one that can (based on type bitmap
instead of heap bitmap).

To make this work, raise the limit for type bitmaps so that they are
used for all types up to 64 kB in size (256 bytes of bitmap).
(The runtime already imposes a limit of 64 kB for a channel element size.)

I have been unable to reproduce this problem in a simple test program.
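For context, a minimal sketch of the pattern in question (illustrative only, not a
reproducer): the unbuffered send below completes by copying a pointer-bearing
element directly into the blocked receiver's frame.

package main

import "fmt"

type payload struct{ p *int }

func main() {
	c := make(chan payload) // unbuffered
	done := make(chan struct{})
	go func() {
		v := <-c // receiver blocks here; the sender writes v directly into this frame
		fmt.Println(*v.p)
		close(done)
	}()
	x := 42
	c <- payload{p: &x} // the remote stack write that needs the type-based barrier
	<-done
}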

Could help #11035.

Change-Id: I06ad994032d8cff3438c9b3eaa8d853915128af5
Reviewed-on: https://go-review.googlesource.com/10815
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 16:50:30 +00:00
Russ Cox
d57c889ae8 runtime: wait to update arena_used until after mapping bitmap
This avoids a race with gcmarkwb_m that was leading to faults.

Fixes #10212.

Change-Id: I6fcf8d09f2692227063ce29152cb57366ea22487
Reviewed-on: https://go-review.googlesource.com/10816
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-06-11 18:15:21 +00:00
Ainar Garipov
7f9f70e5b6 all: fix misprints in comments
These were found by grepping the comments from the go code and feeding
the output to aspell.

Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395
Reviewed-on: https://go-review.googlesource.com/10941
Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-11 14:18:57 +00:00
Yongjian Xu
93e57a22d5 runtime: correct a drifted comment in referencing m->locked.
Change-Id: Ida4b98aa63e57594fa6fa0b8178106bac9b3cd19
Reviewed-on: https://go-review.googlesource.com/10837
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-10 06:15:20 +00:00
Russ Cox
433c0bc769 runtime: avoid fault in heapBitsBulkBarrier
Change-Id: I0512e461de1f25cb2a1cb7f23e7a77d00700667c
Reviewed-on: https://go-review.googlesource.com/10803
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-08 20:24:00 +00:00
Austin Clements
b0532a96a8 runtime: fix write-barrier-enabled phase list in gcmarkwb_m
Commit 1303957 was supposed to enable write barriers during the
concurrent scan phase, but it only enabled *calls* to the write
barrier during this phase. It failed to update the redundant list of
write-barrier-enabled phases in gcmarkwb_m, so it still wasn't greying
objects during the scan phase.

This commit fixes this by replacing the redundant list of phases in
gcmarkwb_m with simply checking writeBarrierEnabled. This is almost
certainly redundant with checks already done in callers, but the last
time we tried to remove these redundant checks everything got much
slower, so I'm leaving it alone for now.

Fixes #11105.

Change-Id: I00230a3cb80a008e749553a8ae901b409097e4be
Reviewed-on: https://go-review.googlesource.com/10801
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-08 05:13:15 +00:00
Austin Clements
306f8f11ad runtime: unwind stack barriers when writing above the current frame
Stack barriers assume that writes through pointers to frames above the
current frame will get write barriers, and hence these frames do not
need to be re-scanned to pick up these changes. For normal writes,
this is true. However, there are places in the runtime that use
typedmemmove to potentially write through pointers to higher frames
(such as mapassign1). Currently, typedmemmove does not execute write
barriers if the destination is on the stack. If there's a stack
barrier between the current frame and the frame being modified with
typedmemmove, and the stack barrier is not otherwise hit, it's
possible that the garbage collector will never see the updated pointer
and incorrectly reclaim the object.

Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove
and its variants) detect when the destination is in the stack and
unwind stack barriers up to the point, forcing mark termination to
later rescan the affected frame and collect these pointers.

Fixes #11084. Might be related to #10240, #10541, #10941, #11023,
#11027, and possibly others.

Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd
Reviewed-on: https://go-review.googlesource.com/10791
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:57:47 +00:00
Austin Clements
1303957dbf runtime: enable write barriers during concurrent scan
Currently, write barriers are only enabled after completion of the
concurrent scan phase, as we enter the concurrent mark phase. However,
stack barriers are installed during the scan phase and assume that
write barriers will track changes to frames above the stack
barriers. Since write barriers aren't enabled until after stack
barriers are installed, we may miss modifications to the stack that
happen after installing the stack barriers and before enabling write
barriers.

Fix this by enabling write barriers during the scan phase.

This commit intentionally makes the minimal change to do this (there's
only one line of code change; the rest are comment changes). At the
very least, we should consider eliminating the ragged barrier that's
intended to synchronize the enabling of write barriers, but now just
wastes time. I've included a large comment about extensions and
alternative designs.

Change-Id: Ib20fede794e4fcb91ddf36f99bd97344d7f96421
Reviewed-on: https://go-review.googlesource.com/10795
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:55:33 +00:00
Austin Clements
6f6403eddf runtime: fix checkmarks to rescan stacks
Currently checkmarks mode fails to rescan stacks because it sees the
leftover state bits indicating that the stacks haven't changed since
the last scan. As a result, it won't detect lost marks caused by
failing to scan stacks correctly during regular garbage collection.

Fix this by marking all stacks dirty before performing the checkmark
phase.

Change-Id: I1f06882bb8b20257120a4b8e7f95bb3ffc263895
Reviewed-on: https://go-review.googlesource.com/10794
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:55:12 +00:00
Austin Clements
2774b37306 all: use RET instead of RETURN on ppc64
All of the architectures except ppc64 have only "RET" for the return
mnemonic. ppc64 used to have only "RETURN", but commit cf06ea6
introduced RET as a synonym for RETURN to make ppc64 consistent with
the other architectures. However, that commit was never followed up to
make the code itself consistent by eliminating uses of RETURN.

This commit replaces all uses of RETURN in the ppc64 assembly with
RET.

This was done with
    sed -i 's/\<RETURN\>/RET/' **/*_ppc64x.s
plus one manual change to syscall/asm.s.

Change-Id: I3f6c8d2be157df8841d48de988ee43f3e3087995
Reviewed-on: https://go-review.googlesource.com/10672
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-06 00:07:23 +00:00
Alan Donovan
232331f0c7 runtime: add blank assignment to defeat "declared but not used" error from go/types
gc should ideally consider this an error too; see golang/go#8560.

Change-Id: Ieee71c4ecaff493d7f83e15ba8c8a04ee90a4cf1
Reviewed-on: https://go-review.googlesource.com/10757
Reviewed-by: Robert Griesemer <gri@golang.org>
2015-06-05 18:05:16 +00:00
Austin Clements
7529314ed3 runtime: use correct SP when installing stack barriers
Currently the stack barriers are installed at the next frame boundary
after gp.sched.sp + 1024*2^n for n=0,1,2,... However, when a G is in a
system call, we set gp.sched.sp to 0, which causes stack barriers to
be installed at *every* frame. This easily overflows the slice we've
reserved for storing the stack barrier information, and causes a
"slice bounds out of range" panic in gcInstallStackBarrier.

Fix this by using gp.syscallsp instead of gp.sched.sp if it's
non-zero. This is the same logic that gentraceback uses to determine
the current SP.

Fixes #11049.

Change-Id: Ie40eeee5bec59b7c1aa715a7c17aa63b1f1cf4e8
Reviewed-on: https://go-review.googlesource.com/10755
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-05 15:53:07 +00:00
Russ Cox
3ffcbb633e runtime: default GOMAXPROCS to NumCPU(), not 1
See golang.org/s/go15gomaxprocs for details.
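
A small check of the new default (purely illustrative; GOMAXPROCS(0) reads the
current setting without changing it):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// With this change, the two values match unless GOMAXPROCS is
	// overridden by the environment or an explicit call.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("NumCPU:    ", runtime.NumCPU())
}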

Change-Id: I8de5df34fa01d31d78f0194ec78a2474c281243c
Reviewed-on: https://go-review.googlesource.com/10668
Reviewed-by: Rob Pike <r@golang.org>
2015-06-05 04:38:04 +00:00
Josh Bleecher Snyder
5353cde080 runtime, cmd/internal/obj/arm: improve arm function prologue
When stack growth is not needed, as it usually is not,
execute only a single conditional branch
rather than three conditional instructions.
This adds 4 bytes to every function,
but might speed up execution in the common case.

Sample disassembly for

func f() {
	_ = [128]byte{}
}

Before:

TEXT main.f(SB) x.go
	x.go:3	0x2000	e59a1008	MOVW 0x8(R10), R1
	x.go:3	0x2004	e59fb028	MOVW 0x28(R15), R11
	x.go:3	0x2008	e08d200b	ADD R11, R13, R2
	x.go:3	0x200c	e1520001	CMP R1, R2
	x.go:3	0x2010	91a0300e	MOVW.LS R14, R3
	x.go:3	0x2014	9b0118a9	BL.LS runtime.morestack_noctxt(SB)
	x.go:3	0x2018	9afffff8	B.LS main.f(SB)
	x.go:3	0x201c	e52de084	MOVW.W R14, -0x84(R13)
	x.go:4	0x2020	e28d1004	ADD $4, R13, R1
	x.go:4	0x2024	e3a00000	MOVW $0, R0
	x.go:4	0x2028	eb012255	BL 0x4a984
	x.go:5	0x202c	e49df084	RET #132
	x.go:5	0x2030	eafffffe	B 0x2030
	x.go:5	0x2034	ffffff7c	?

After:

TEXT main.f(SB) x.go
	x.go:3	0x2000	e59a1008	MOVW 0x8(R10), R1
	x.go:3	0x2004	e59fb02c	MOVW 0x2c(R15), R11
	x.go:3	0x2008	e08d200b	ADD R11, R13, R2
	x.go:3	0x200c	e1520001	CMP R1, R2
	x.go:3	0x2010	9a000004	B.LS 0x2028
	x.go:3	0x2014	e52de084	MOVW.W R14, -0x84(R13)
	x.go:4	0x2018	e28d1004	ADD $4, R13, R1
	x.go:4	0x201c	e3a00000	MOVW $0, R0
	x.go:4	0x2020	eb0124dc	BL 0x4b398
	x.go:5	0x2024	e49df084	RET #132
	x.go:5	0x2028	e1a0300e	MOVW R14, R3
	x.go:5	0x202c	eb011b0d	BL runtime.morestack_noctxt(SB)
	x.go:5	0x2030	eafffff2	B main.f(SB)
	x.go:5	0x2034	eafffffe	B 0x2034
	x.go:5	0x2038	ffffff7c	?

Updates #10587.

package sort benchmarks on an iPhone 6:

name            old time/op  new time/op  delta
SortString1K     569µs ± 0%   565µs ± 1%  -0.75%  (p=0.000 n=23+24)
StableString1K   872µs ± 1%   870µs ± 1%  -0.16%  (p=0.009 n=23+24)
SortInt1K        317µs ± 2%   316µs ± 2%    ~     (p=0.410 n=26+26)
StableInt1K      343µs ± 1%   339µs ± 1%  -1.07%  (p=0.000 n=22+23)
SortInt64K      30.0ms ± 1%  30.0ms ± 1%    ~     (p=0.091 n=25+24)
StableInt64K    30.2ms ± 0%  30.0ms ± 0%  -0.69%  (p=0.000 n=22+22)
Sort1e2          147µs ± 1%   146µs ± 0%  -0.48%  (p=0.000 n=25+24)
Stable1e2        290µs ± 1%   286µs ± 1%  -1.30%  (p=0.000 n=23+24)
Sort1e4         29.5ms ± 2%  29.7ms ± 1%  +0.71%  (p=0.000 n=23+23)
Stable1e4       88.7ms ± 4%  88.6ms ± 8%  -0.07%  (p=0.022 n=26+26)
Sort1e6          4.81s ± 7%   4.83s ± 7%    ~     (p=0.192 n=26+26)
Stable1e6        18.3s ± 1%   18.1s ± 1%  -0.76%  (p=0.000 n=25+23)
SearchWrappers   318ns ± 1%   344ns ± 1%  +8.14%  (p=0.000 n=23+26)

package sort benchmarks on a first generation rpi:

name            old time/op  new time/op  delta
SearchWrappers  4.13µs ± 0%  3.95µs ± 0%   -4.42%  (p=0.000 n=15+13)
SortString1K    5.81ms ± 1%  5.82ms ± 2%     ~     (p=0.400 n=14+15)
StableString1K  9.69ms ± 1%  9.73ms ± 0%     ~     (p=0.121 n=15+11)
SortInt1K       3.30ms ± 2%  3.66ms ±19%  +10.82%  (p=0.000 n=15+14)
StableInt1K     5.97ms ±15%  4.17ms ± 8%  -30.05%  (p=0.000 n=15+15)
SortInt64K       319ms ± 1%   295ms ± 1%   -7.65%  (p=0.000 n=15+15)
StableInt64K     343ms ± 0%   332ms ± 0%   -3.26%  (p=0.000 n=12+13)
Sort1e2         3.36ms ± 2%  3.22ms ± 4%   -4.10%  (p=0.000 n=15+15)
Stable1e2       6.74ms ± 1%  6.43ms ± 2%   -4.67%  (p=0.000 n=15+15)
Sort1e4          247ms ± 1%   247ms ± 1%     ~     (p=0.331 n=15+14)
Stable1e4        864ms ± 0%   820ms ± 0%   -5.15%  (p=0.000 n=14+15)
Sort1e6          41.2s ± 0%   41.2s ± 0%   +0.15%  (p=0.000 n=13+14)
Stable1e6         192s ± 0%    182s ± 0%   -5.07%  (p=0.000 n=14+14)

Change-Id: I8a9db77e1d4ea1956575895893bc9d04bd81204b
Reviewed-on: https://go-review.googlesource.com/10497
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-04 16:35:12 +00:00
Brad Fitzpatrick
03410f6758 runtime: fix TestFixedGOROOT to properly restore the GOROOT env var after test
Otherwise subsequent tests won't see any modified GOROOT.

With this CL I can move my GOROOT, set GOROOT to the new location, and
the runtime tests pass. Previously the crash_tests would instead look
for the GOROOT baked into the binary, instead of the env var:

--- FAIL: TestGcSys (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestGCFairness (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestGdbPython (0.07s)
        runtime-gdb_test.go:64: building source exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestLargeStringConcat (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
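
The fix follows the usual save-and-restore pattern for test-scoped environment
changes; a generic sketch of that pattern (not the actual test code, and the
replacement path is made up):

package runtime_test

import (
	"os"
	"testing"
)

func TestWithTemporaryGOROOT(t *testing.T) {
	old, existed := os.LookupEnv("GOROOT")
	defer func() {
		// Put the environment back so later tests see the original GOROOT.
		if existed {
			os.Setenv("GOROOT", old)
		} else {
			os.Unsetenv("GOROOT")
		}
	}()
	if err := os.Setenv("GOROOT", "/tmp/new-goroot"); err != nil {
		t.Fatal(err)
	}
	// ... test body that relies on the modified GOROOT ...
}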

Update #10029

Change-Id: If91be0f04d3acdcf39a9e773a4e7905a446bc477
Reviewed-on: https://go-review.googlesource.com/10685
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-03 23:33:48 +00:00
Austin Clements
10083d8007 runtime: print start of GC cycle in gctrace, rather than end
Currently the GODEBUG=gctrace=1 trace line includes "@n.nnns" to
indicate the time that the GC cycle ended relative to the time the
program started. This was meant to be consistent with the utilization
as of the end of the cycle, which is printed next on the trace line,
but it winds up just being confusing and unexpected.

Change the trace line to include the time that the GC cycle started
relative to the time the program started.

Change-Id: I7d64580cd696eb17540716d3e8a74a9d6ae50650
Reviewed-on: https://go-review.googlesource.com/10634
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-03 02:17:43 +00:00
Austin Clements
faa7a7e8ae runtime: implement GC stack barriers
This commit implements stack barriers to minimize the amount of
stack re-scanning that must be done during mark termination.

Currently the GC scans stacks of active goroutines twice during every
GC cycle: once at the beginning during root discovery and once at the
end during mark termination. The second scan happens while the world
is stopped and guarantees that we've seen all of the roots (since
there are no write barriers on writes to local stack
variables). However, this means pause time is proportional to stack
size. In particularly recursive programs, this can drive pause time up
past our 10ms goal (e.g., it takes about 150ms to scan a 50MB stack).

Re-scanning the entire stack is rarely necessary, especially for large
stacks, because usually most of the frames on the stack were not
active between the first and second scans and hence any changes to
these frames (via non-escaping pointers passed down the stack) were
tracked by write barriers.

To efficiently track how far a stack has been unwound since the first
scan (and, hence, how much needs to be re-scanned), this commit
introduces stack barriers. During the first scan, at exponentially
spaced points in each stack, the scan overwrites return PCs with the
PC of the stack barrier function. When "returned" to, the stack
barrier function records how far the stack has unwound and jumps to
the original return PC for that point in the stack. Then the second
scan only needs to proceed as far as the lowest barrier that hasn't
been hit.

For deeply recursive programs, this substantially reduces mark
termination time (and hence pause time). For the goscheme example
linked in issue #10898, prior to this change, mark termination times
were typically between 100 and 500ms; with this change, mark
termination times are typically between 10 and 20ms. As a result of
the reduced stack scanning work, this reduces overall execution time
of the goscheme example by 20%.

Fixes #10898.

The effect of this on programs that are not deeply recursive is
minimal:

name                   old time/op    new time/op    delta
BinaryTree17              3.16s ± 2%     3.26s ± 1%  +3.31%  (p=0.000 n=19+19)
Fannkuch11                2.42s ± 1%     2.48s ± 1%  +2.24%  (p=0.000 n=17+19)
FmtFprintfEmpty          50.0ns ± 3%    49.8ns ± 1%    ~     (p=0.534 n=20+19)
FmtFprintfString          173ns ± 0%     175ns ± 0%  +1.49%  (p=0.000 n=16+19)
FmtFprintfInt             170ns ± 1%     175ns ± 1%  +2.97%  (p=0.000 n=20+19)
FmtFprintfIntInt          288ns ± 0%     295ns ± 0%  +2.73%  (p=0.000 n=16+19)
FmtFprintfPrefixedInt     242ns ± 1%     252ns ± 1%  +4.13%  (p=0.000 n=18+18)
FmtFprintfFloat           324ns ± 0%     323ns ± 0%  -0.36%  (p=0.000 n=20+19)
FmtManyArgs              1.14µs ± 0%    1.12µs ± 1%  -1.01%  (p=0.000 n=18+19)
GobDecode                8.88ms ± 1%    8.87ms ± 0%    ~     (p=0.480 n=19+18)
GobEncode                6.80ms ± 1%    6.85ms ± 0%  +0.82%  (p=0.000 n=20+18)
Gzip                      363ms ± 1%     363ms ± 1%    ~     (p=0.077 n=18+20)
Gunzip                   90.6ms ± 0%    90.0ms ± 1%  -0.71%  (p=0.000 n=17+18)
HTTPClientServer         51.5µs ± 1%    50.8µs ± 1%  -1.32%  (p=0.000 n=18+18)
JSONEncode               17.0ms ± 0%    17.1ms ± 0%  +0.40%  (p=0.000 n=18+17)
JSONDecode               61.8ms ± 0%    63.8ms ± 1%  +3.11%  (p=0.000 n=18+17)
Mandelbrot200            3.84ms ± 0%    3.84ms ± 1%    ~     (p=0.583 n=19+19)
GoParse                  3.71ms ± 1%    3.72ms ± 1%    ~     (p=0.159 n=18+19)
RegexpMatchEasy0_32       100ns ± 0%     100ns ± 1%  -0.19%  (p=0.033 n=17+19)
RegexpMatchEasy0_1K       342ns ± 1%     331ns ± 0%  -3.41%  (p=0.000 n=19+19)
RegexpMatchEasy1_32      82.5ns ± 0%    81.7ns ± 0%  -0.98%  (p=0.000 n=18+18)
RegexpMatchEasy1_1K       505ns ± 0%     494ns ± 1%  -2.16%  (p=0.000 n=18+18)
RegexpMatchMedium_32      137ns ± 1%     137ns ± 1%  -0.24%  (p=0.048 n=20+18)
RegexpMatchMedium_1K     41.6µs ± 0%    41.3µs ± 1%  -0.57%  (p=0.004 n=18+20)
RegexpMatchHard_32       2.11µs ± 0%    2.11µs ± 1%  +0.20%  (p=0.037 n=17+19)
RegexpMatchHard_1K       63.9µs ± 2%    63.3µs ± 0%  -0.99%  (p=0.000 n=20+17)
Revcomp                   560ms ± 1%     522ms ± 0%  -6.87%  (p=0.000 n=18+16)
Template                 75.0ms ± 0%    75.1ms ± 1%  +0.18%  (p=0.013 n=18+19)
TimeParse                 358ns ± 1%     364ns ± 0%  +1.74%  (p=0.000 n=20+15)
TimeFormat                360ns ± 0%     372ns ± 0%  +3.55%  (p=0.000 n=20+18)

Change-Id: If8a9bfae6c128d15a4f405e02bcfa50129df82a2
Reviewed-on: https://go-review.googlesource.com/10314
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-06-02 20:00:57 +00:00
Austin Clements
724f8298a8 runtime: avoid double-scanning of stacks
Currently there's a race between stopg scanning another G's stack and
the G reaching a preemption point and scanning its own stack. When
this race occurs, the G's stack is scanned twice. Currently this is
okay, so this race is benign.

However, we will shortly be adding stack barriers during the first
stack scan, so scanning will no longer be idempotent. To prepare for
this, this change ensures that each stack is scanned only once during
each GC phase by checking the flag that indicates that the stack has
been scanned in this phase before scanning the stack.

Change-Id: Id9f4d5e2e5b839bc3f200ec1723a4a12dd677ab4
Reviewed-on: https://go-review.googlesource.com/10458
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-06-02 19:59:05 +00:00
Austin Clements
3f6e69aca5 runtime: steal space for stack barrier tracking from stack
The stack barrier code will need a bookkeeping structure to keep track
of the overwritten return PCs. This commit introduces and allocates
this structure, but does not yet use the structure.

We don't want to allocate space for this structure during garbage
collection, so this commit allocates it along with the allocation of
the corresponding stack. However, we can't do a regular allocation in
newstack because mallocgc may itself grow the stack (which would lead
to a recursive allocation). Hence, this commit makes the bookkeeping
structure part of the stack allocation itself by stealing the
necessary space from the top of the stack allocation. Since the size
of this bookkeeping structure is logarithmic in the size of the stack,
this has minimal impact on stack behavior.

Change-Id: Ia14408be06aafa9ca4867f4e70bddb3fe0e96665
Reviewed-on: https://go-review.googlesource.com/10313
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:57 +00:00
Austin Clements
e610c25df0 runtime: decouple stack bounds and stack allocation size
Currently the runtime assumes that the allocation for the stack is
exactly [stack.lo, stack.hi). We're about to steal a small part of
this allocation for per-stack GC metadata. To prepare for this, this
commit adds a field to the G for the allocated size of the stack.
With this change, stack.lo and stack.hi continue to act as the true
bounds on the stack, but are no longer also used as the bounds on the
stack allocation.

(I also tried this the other way around, where stack.lo and stack.hi
remained the allocation bounds and I introduced a new top of stack.
However, there are far more places that assume stack.hi is the true
top of the stack than there are places that assume it's the top of the
allocation.)

Change-Id: Ifa9d956753be53d286d09cbc73d47fb34a18c0c6
Reviewed-on: https://go-review.googlesource.com/10312
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:50 +00:00
Austin Clements
c02b8911d8 runtime: clean up signalstack API
Currently signalstack takes a lower limit and a length and all calls
hard-code the passed length. Change the API to take a *stack and
compute the lower limit and length from the passed stack.

This will make it easier for the runtime to steal some space from the
top of the stack since it eliminates the hard-coded stack sizes.

Change-Id: I7d2a9f45894b221f4e521628c2165530bbc57d53
Reviewed-on: https://go-review.googlesource.com/10311
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:42 +00:00
Austin Clements
cc6a7fce53 runtime: increase precision of gctrace times
Currently we truncate gctrace clock and CPU times to millisecond
precision. As a result, many phases are typically printed as 0, which
is fine for user consumption, but makes gathering statistics and
reports over GC traces difficult.

In 1.4, the gctrace line printed times in microseconds. This was
better for statistics, but not as easy for users to read or interpret,
and it generally made the trace lines longer.

This change strikes a balance between these extremes by printing
milliseconds, but including the decimal part to two significant
figures down to microsecond precision. This remains easy to read and
interpret, but includes more precision when it's useful.

For example, where the code currently prints,

gc #29 @1.629s 0%: 0+2+0+12+0 ms clock, 0+2+0+0/12/0+0 ms cpu, 4->4->2 MB, 4 MB goal, 1 P

this prints,

gc #29 @1.629s 0%: 0.005+2.1+0+12+0.29 ms clock, 0.005+2.1+0+0/12/0+0.29 ms cpu, 4->4->2 MB, 4 MB goal, 1 P

Fixes #10970.

Change-Id: I249624779433927cd8b0947b986df9060c289075
Reviewed-on: https://go-review.googlesource.com/10554
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 18:31:36 +00:00
Mikio Hara
1fa0a8cec5 runtime: fix data race in BenchmarkChanPopular
Fixes #11014.

Change-Id: I9a18dacd10564d3eaa1fea4d77f1a48e08e79f53
Reviewed-on: https://go-review.googlesource.com/10563
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-02 11:16:01 +00:00
Austin Clements
df2809f04e runtime: document that runtime.GC() blocks until GC is complete
runtime.GC() is intentionally very weakly specified. However, it is so
weakly specified that it's difficult to know that it's being used
correctly for its one intended use case: to ensure garbage collection
has run in a test that is garbage-sensitive. In particular, it is
unclear whether it is synchronous or asynchronous. In the old STW
collector this was essentially self-evident; short of queuing up a
garbage collection to run later, it had to be synchronous. However,
with the concurrent collector, there's evidence that people are
inferring that it may be asynchronous (e.g., issue #10986), as this is
both unclear in the documentation and possible in the implementation.

In fact, runtime.GC() runs a fully synchronous STW collection. We
probably don't want to commit to this exact behavior. But we can
commit to the essential property that tests rely on: that runtime.GC()
does not return until the GC has finished.
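
A minimal sketch of the use case this guarantees (a garbage-sensitive check that
relies on runtime.GC() being synchronous):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	garbage := make([]byte, 64<<20)
	_ = garbage
	garbage = nil // drop the only reference

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	runtime.GC() // blocks until the collection has completed
	runtime.ReadMemStats(&after)

	// Because GC is synchronous, the cycle count has advanced by the
	// time we read it again.
	fmt.Printf("GC cycles: %d -> %d\n", before.NumGC, after.NumGC)
}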

Change-Id: Ifc3045a505e1898ecdbe32c1f7e80e2e9ffacb5b
Reviewed-on: https://go-review.googlesource.com/10488
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-06-01 14:51:12 +00:00
Austin Clements
f2c3957ed8 runtime: disable GC around TestGoroutineParallelism
TestGoroutineParallelism can deadlock if the GC runs during the
test. Currently it tries to prevent this by forcing a GC before the
test, but this is best effort and fails completely if GOGC is very low
for testing.

This change replaces this best-effort fix with simply setting GOGC to
off for the duration of the test.
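
A sketch of the equivalent user-level pattern (debug.SetGCPercent(-1) corresponds
to GOGC=off); this is illustrative, not the test's actual code:

package runtime_test

import (
	"runtime/debug"
	"testing"
)

func TestNeedsNoGC(t *testing.T) {
	// Disable the collector for the duration of the test, then restore
	// the previous setting.
	old := debug.SetGCPercent(-1)
	defer debug.SetGCPercent(old)

	// ... test body that must not be interrupted by a GC cycle ...
}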

Change-Id: I8229310833f241b149ebcd32845870c1cb14e9f8
Reviewed-on: https://go-review.googlesource.com/10454
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-28 17:40:19 +00:00
Austin Clements
4a1957d0aa runtime: use stripped test environment for TestGdbPython
Most runtime tests that invoke the compiler to build a sub-test binary
do so with a special environment constructed by testEnv that strips
out environment variables that should apply to the test but not to the
build.

Fix TestGdbPython to use this test environment when invoking go build,
like other tests do.

Change-Id: Iafdf89d4765c587cbebc427a5d61cb8a7e71b326
Reviewed-on: https://go-review.googlesource.com/10455
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-28 17:39:08 +00:00
Elias Naur
8017ace496 runtime: don't always block all signals on OpenBSD
Implement the changes from CL 10173 on OpenBSD.

Change-Id: I2db1cd8141fd392a34753a1b8113e2e0401173b9
Reviewed-on: https://go-review.googlesource.com/10342
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-23 17:42:43 +00:00
Elias Naur
84cfba17c2 runtime: don't always unblock all signals
Ian proposed an improved way of handling signal masks in Go, motivated
by a problem where the Android Java runtime expects certain signals to
be blocked for all JVM threads. Discussion here:

https://groups.google.com/forum/#!topic/golang-dev/_TSCkQHJt6g

Ian's text is used in the following:

A Go program always needs to have the synchronous signals enabled.
These are the signals for which _SigPanic is set in sigtable, namely
SIGSEGV, SIGBUS, SIGFPE.

A Go program that uses the os/signal package, and calls signal.Notify,
needs to have at least one thread which is not blocking that signal,
but it doesn't matter much which one.

Unix programs do not change signal mask across execve.  They inherit
signal masks across fork.  The shell uses this fact to some extent;
for example, the job control signals (SIGTTIN, SIGTTOU, SIGTSTP) are
blocked for commands run due to backquote quoting or $().

Our current position on signal masks was not thought out. We wandered
into it step by step; see, e.g., http://golang.org/cl/7323067.

This CL does the following:

Introduce a new platform hook, msigsave, that saves the signal mask of
the current thread to m.sigsave.

Call msigsave from needm and newm.

In minit, set up the signal mask from m.sigsave and unblock the
essential synchronous signals, plus SIGILL, SIGTRAP, SIGPROF, and
SIGSTKFLT (on systems that have it).

In unminit, restore the signal mask from m.sigsave.

The first time that os/signal.Notify is called, start a new thread whose
only purpose is to update its signal mask to make sure signals for
signal.Notify are unblocked on at least one thread.
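
For reference, the user-facing call that triggers that thread the first time is
os/signal.Notify; a minimal illustration:

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	ch := make(chan os.Signal, 1)
	// The first Notify call starts the thread described above, making
	// sure the requested signals are unblocked on at least one thread.
	signal.Notify(ch, syscall.SIGINT, syscall.SIGTERM)

	fmt.Println("waiting for SIGINT or SIGTERM...")
	fmt.Println("got:", <-ch)
}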

The effect on Go programs will be that if they are invoked with some
non-synchronous signals blocked, those signals will normally be
ignored.  Previously, those signals would mostly be ignored.  A change
in behaviour will occur for programs started with any of these signals
blocked, if they receive the signal: SIGHUP, SIGINT, SIGQUIT, SIGABRT,
SIGTERM.  Previously those signals would always cause a crash (unless
using the os/signal package); with this change, they will be ignored
if the program is started with the signal blocked (and does not use
the os/signal package).

./all.bash completes successfully on linux/amd64.

OpenBSD is missing the implementation.

Change-Id: I188098ba7eb85eae4c14861269cc466f2aa40e8c
Reviewed-on: https://go-review.googlesource.com/10173
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-22 20:24:08 +00:00
Russ Cox
001438bdfe runtime: fix callwritebarrier
Given a call frame F of size N where the return values start at offset R,
callwritebarrier was instructing heapBitsBulkBarrier to scan the block
of memory [F+R, F+R+N). It should only scan [F+R, F+N). The extra N-R
bytes scanned might lead into the next allocated block in memory.
Because the scan was consulting the heap bitmap for type information,
scanning into the next block normally "just worked" in the sense of
not crashing.

Scanning the extra N-R bytes of memory is a problem mainly because
it causes the GC to consider pointers that might otherwise not be
considered, leading it to retain objects that should actually be freed.
This is very difficult to detect.

Luckily, juju turned up a case where the heap bitmap and the memory
were out of sync for the block immediately after the call frame, so that
heapBitsBulkBarrier saw an obvious non-pointer where it expected a
pointer, causing a loud crash.

Why is there a non-pointer in memory that the heap bitmap records as
a pointer? That is more difficult to answer. At least one way that it
could happen is that allocations containing no pointers at all do not
update the heap bitmap. So if heapBitsBulkBarrier walked out of the
current object and into a no-pointer object and consulted those bitmap
bits, it would be misled. This doesn't happen in general because all
the paths to heapBitsBulkBarrier first check for the no-pointer case.
This may or may not be what happened, but it's the only scenario
I've been able to construct.

I tried for quite a while to write a simple test for this and could not.
It does fix the juju crash, and it is clearly an improvement over the
old code.

Fixes #10844.

Change-Id: I53982c93ef23ef93155c4086bbd95a4c4fdaac9a
Reviewed-on: https://go-review.googlesource.com/10317
Reviewed-by: Austin Clements <austin@google.com>
2015-05-21 19:14:03 +00:00
Austin Clements
a5c3bbe0b4 runtime: eliminate write barrier from adjustpointers
Currently adjustpointers invokes a write barrier for every stack slot
it updates. This is safe---the write barrier always does nothing
because the new value is never a heap pointer---but it's unnecessary
overhead in performance and complexity.

Fix this by rewriting adjustpointers to work with *uintptrs instead of
*unsafe.Pointers. As an added bonus, this makes the code cleaner.

name                   old mean              new mean              delta
BinaryTree17            3.35s × (0.98,1.01)   3.33s × (0.99,1.02)    ~    (p=0.095 n=20+19)
Fannkuch11              2.49s × (1.00,1.01)   2.52s × (0.99,1.01)  +1.23% (p=0.000 n=19+20)
FmtFprintfEmpty        52.2ns × (0.99,1.02)  52.2ns × (0.99,1.02)    ~    (p=0.766 n=19+19)
FmtFprintfString        181ns × (0.99,1.02)   179ns × (0.99,1.01)  -1.06% (p=0.000 n=20+19)
FmtFprintfInt           177ns × (0.99,1.01)   173ns × (0.99,1.02)  -2.26% (p=0.000 n=17+20)
FmtFprintfIntInt        300ns × (0.99,1.01)   302ns × (0.99,1.01)  +0.76% (p=0.000 n=19+20)
FmtFprintfPrefixedInt   253ns × (0.99,1.02)   256ns × (0.99,1.01)  +0.96% (p=0.000 n=20+19)
FmtFprintfFloat         334ns × (0.99,1.02)   334ns × (1.00,1.01)    ~    (p=0.243 n=20+19)
FmtManyArgs            1.16µs × (0.99,1.01)  1.17µs × (0.99,1.02)  +0.88% (p=0.000 n=20+20)
GobDecode              9.16ms × (0.99,1.02)  9.18ms × (1.00,1.00)  +0.21% (p=0.048 n=20+17)
GobEncode              7.03ms × (0.99,1.01)  7.05ms × (0.99,1.01)    ~    (p=0.091 n=19+19)
Gzip                    374ms × (0.99,1.01)   372ms × (0.99,1.02)  -0.50% (p=0.008 n=18+20)
Gunzip                 92.9ms × (0.99,1.01)  92.5ms × (1.00,1.01)  -0.47% (p=0.002 n=19+19)
HTTPClientServer       53.1µs × (0.98,1.01)  52.5µs × (0.99,1.01)  -0.98% (p=0.000 n=20+19)
JSONEncode             17.4ms × (0.99,1.02)  17.5ms × (0.99,1.01)    ~    (p=0.061 n=19+20)
JSONDecode             66.0ms × (0.99,1.02)  64.7ms × (0.99,1.01)  -1.87% (p=0.000 n=20+20)
Mandelbrot200          3.94ms × (1.00,1.01)  3.95ms × (1.00,1.01)    ~    (p=0.799 n=18+19)
GoParse                3.89ms × (0.99,1.02)  3.86ms × (0.99,1.01)  -0.70% (p=0.016 n=20+19)
RegexpMatchEasy0_32     102ns × (0.99,1.02)   102ns × (1.00,1.01)    ~    (p=0.557 n=20+18)
RegexpMatchEasy0_1K     353ns × (0.99,1.02)   341ns × (0.99,1.01)  -3.38% (p=0.000 n=20+20)
RegexpMatchEasy1_32    85.0ns × (0.99,1.02)  85.0ns × (0.99,1.01)    ~    (p=0.851 n=19+20)
RegexpMatchEasy1_1K     521ns × (0.99,1.02)   506ns × (1.00,1.01)  -2.85% (p=0.000 n=20+18)
RegexpMatchMedium_32    142ns × (0.99,1.02)   141ns × (1.00,1.01)  -1.17% (p=0.000 n=20+19)
RegexpMatchMedium_1K   42.8µs × (0.99,1.01)  42.3µs × (0.99,1.01)  -1.07% (p=0.000 n=20+19)
RegexpMatchHard_32     2.17µs × (0.99,1.01)  2.16µs × (1.00,1.01)  -0.51% (p=0.042 n=20+18)
RegexpMatchHard_1K     65.6µs × (0.99,1.01)  64.8µs × (1.00,1.00)  -1.21% (p=0.000 n=20+17)
Revcomp                 581ms × (0.99,1.04)   536ms × (1.00,1.01)  -7.71% (p=0.000 n=20+18)
Template               77.2ms × (0.99,1.01)  76.8ms × (0.99,1.01)    ~    (p=0.426 n=20+18)
TimeParse               369ns × (0.99,1.02)   371ns × (1.00,1.01)    ~    (p=0.117 n=20+19)
TimeFormat              371ns × (0.99,1.02)   391ns × (0.99,1.01)  +5.33% (p=0.000 n=20+19)

Change-Id: I5b952ba577ac4365c8c87db837c5804a1e30b7be
Reviewed-on: https://go-review.googlesource.com/10293
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-21 18:35:49 +00:00
Rick Hudson
5b66e5d0d8 runtime: turn work buffer tracing off by default
During development we ran with monitoring code turned
on by default. This CL turns the work buffer monitoring
off. Performance change on most go1 benchmarks is small
or insignificant.

name                   old mean              new mean              delta
BinaryTree17            3.35s × (0.99,1.01)   3.35s × (0.99,1.01)    ~    (p=0.841 n=5+5)
Fannkuch11              2.59s × (1.00,1.01)   2.55s × (1.00,1.00)  -1.65% (p=0.008 n=5+5)
FmtFprintfEmpty        52.5ns × (0.99,1.02)  53.2ns × (0.98,1.01)    ~    (p=0.063 n=5+5)
FmtFprintfString        181ns × (1.00,1.00)   180ns × (1.00,1.00)  -0.55% (p=0.029 n=4+4)
FmtFprintfInt           176ns × (1.00,1.01)   174ns × (1.00,1.00)  -0.91% (p=0.000 n=5+4)
FmtFprintfIntInt        298ns × (1.00,1.00)   299ns × (1.00,1.00)    ~    (p=0.143 n=4+4)
FmtFprintfPrefixedInt   250ns × (1.00,1.01)   246ns × (1.00,1.00)  -1.68% (p=0.000 n=5+4)
FmtFprintfFloat         340ns × (1.00,1.00)   340ns × (1.00,1.01)    ~    (p=0.643 n=5+5)
FmtManyArgs            1.16µs × (1.00,1.00)  1.15µs × (1.00,1.00)  -0.47% (p=0.016 n=5+5)
GobDecode              9.22ms × (1.00,1.00)  9.23ms × (1.00,1.00)    ~    (p=0.841 n=5+5)
GobEncode              7.00ms × (1.00,1.01)  7.09ms × (0.99,1.01)  +1.26% (p=0.016 n=5+5)
Gzip                    387ms × (1.00,1.00)   389ms × (0.99,1.02)    ~    (p=1.000 n=5+5)
Gunzip                 97.8ms × (1.00,1.00)  98.3ms × (1.00,1.00)  +0.51% (p=0.016 n=5+4)
HTTPClientServer       52.6µs × (1.00,1.01)  52.7µs × (1.00,1.01)    ~    (p=1.000 n=5+5)
JSONEncode             18.0ms × (0.99,1.02)  17.9ms × (1.00,1.00)    ~    (p=0.310 n=5+5)
JSONDecode             64.8ms × (0.99,1.02)  63.6ms × (1.00,1.00)  -1.94% (p=0.008 n=5+5)
Mandelbrot200          4.05ms × (1.00,1.00)  4.05ms × (1.00,1.00)    ~    (p=0.421 n=5+5)
GoParse                3.86ms × (1.00,1.01)  3.84ms × (0.99,1.01)    ~    (p=0.421 n=5+5)
RegexpMatchEasy0_32     101ns × (1.00,1.00)   102ns × (0.99,1.02)    ~    (p=0.238 n=4+5)
RegexpMatchEasy0_1K     346ns × (1.00,1.01)   345ns × (1.00,1.00)    ~    (p=0.333 n=5+4)
RegexpMatchEasy1_32    87.3ns × (0.99,1.02)  87.4ns × (1.00,1.00)    ~    (p=0.190 n=5+4)
RegexpMatchEasy1_1K     520ns × (1.00,1.00)   520ns × (1.00,1.01)    ~    (p=1.000 n=4+5)
RegexpMatchMedium_32    143ns × (1.00,1.00)   142ns × (1.00,1.00)  -0.70% (p=0.029 n=4+4)
RegexpMatchMedium_1K   43.2µs × (1.00,1.01)  43.2µs × (1.00,1.00)    ~    (p=0.841 n=5+5)
RegexpMatchHard_32     2.24µs × (1.00,1.01)  2.23µs × (1.00,1.01)  -0.63% (p=0.048 n=5+5)
RegexpMatchHard_1K     68.7µs × (1.00,1.00)  68.3µs × (1.00,1.00)  -0.56% (p=0.008 n=5+5)
Revcomp                 577ms × (1.00,1.01)   579ms × (1.00,1.00)    ~    (p=0.151 n=5+5)
Template               74.9ms × (1.00,1.00)  76.5ms × (1.00,1.00)  +2.11% (p=0.008 n=5+5)
TimeParse               359ns × (1.00,1.00)   362ns × (1.00,1.00)  +0.72% (p=0.008 n=5+5)
TimeFormat              369ns × (1.00,1.00)   371ns × (1.00,1.01)    ~    (p=0.071 n=5+5)

Change-Id: I4206a3f77a3d1450966b7a62ea7597aec44cb72f
Reviewed-on: https://go-review.googlesource.com/10294
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-21 16:09:24 +00:00
Austin Clements
719efc70eb runtime: make runtime.callers walk calling G, not g0
Currently runtime.callers invokes gentraceback with the pc and sp of
the G it is called from, but always passes g0 even if it was called
from a regular g. Right now this has no ill effects because
runtime.callers does not use either callback argument or the
_TraceJumpStack flag, but it makes the code fragile and will break
some upcoming changes.

Fix this by lifting the getg() call outside of the systemstack in
runtime.callers.

Change-Id: I4e1e927961c0e0cd4dcf28693be47df7bae9e122
Reviewed-on: https://go-review.googlesource.com/10292
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-21 16:06:37 +00:00
Rick Hudson
197aa9e64d runtime: remove unused quiesce code
This is dead code. If you want to quiesce the system, the
preferred way is to use forEachP(func(*p){}).

Change-Id: Ic7677a5dd55e3639b99e78ddeb2c71dd1dd091fa
Reviewed-on: https://go-review.googlesource.com/10267
Reviewed-by: Austin Clements <austin@google.com>
2015-05-20 17:56:44 +00:00
Rick Hudson
913db7685e runtime: run background mark helpers only if work is available
Prior to this CL, whenever GC marking was enabled and a P was looking
for work, we supplied a G to help the GC do its marking tasks. Once
this G finished all the available marking, it would release the P to
find another available G. When there was no work, the P would drop
into findrunnable, which would execute the mark helper G, which would
immediately return, and the P would drop into findrunnable again,
repeating the process. Since the P was always given a G to run, it
never blocked. This CL first checks whether the GC mark helper G has
available work; if not, the P immediately falls through to its
blocking logic.

Fixes #10901

Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5
Reviewed-on: https://go-review.googlesource.com/10189
Reviewed-by: Austin Clements <austin@google.com>
2015-05-19 15:57:50 +00:00
Austin Clements
f4d51eb2f5 runtime: minor clean up to heapminimum
Currently setGCPercent sets heapminimum to heapminimum*GOGC/100. The
real intent is to set heapminimum to a scaled multiple of a fixed
default heap minimum, not to scale heapminimum based on its current
value. This turns out to be okay because setGCPercent is only called
once and heapminimum is initially set to this default heap minimum.
However, the code as written is confusing, especially since
setGCPercent is otherwise written so it could be called again to
change GOGC. Fix this by introducing a defaultHeapMinimum constant and
using this instead of the current value of heapminimum to compute the
scaled heap minimum.

As part of this, this commit improves the documentation on
heapminimum.
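
The user-facing counterpart of setGCPercent is runtime/debug.SetGCPercent, which
can be called repeatedly to change GOGC at run time; a small illustration (the
values are arbitrary):

package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// SetGCPercent returns the previous setting, so calls compose cleanly.
	old := debug.SetGCPercent(200) // collect less aggressively than GOGC=100
	fmt.Println("previous GOGC:", old)
	debug.SetGCPercent(old) // restore the original setting
}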

Change-Id: I4eb82c73dc2eb44a6e5a17c780a747a2e73d7493
Reviewed-on: https://go-review.googlesource.com/10181
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-19 15:30:34 +00:00
Russ Cox
8903b3db0e runtime: add fast check for self-loop pointer in scanobject
Addresses a problem reported on the mailing list.

This will come up mainly in programs with custom allocators that batch allocations,
but it still helps in our programs, which mainly do not have such allocations.
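
A hedged sketch of the kind of batched allocation this targets (the types below
are invented for illustration): every pointer in the batch points back into the
same allocation, which is the case the fast check short-circuits.

package main

import "fmt"

// node is carved out of a larger batch allocation; free nodes are linked
// through next, so each pointer points back into the same batch object.
type node struct {
	next *node
	data [56]byte
}

// newBatch allocates n nodes at once and threads a free list through them,
// terminating with a self-loop instead of nil.
func newBatch(n int) *node {
	batch := make([]node, n)
	for i := 0; i < n-1; i++ {
		batch[i].next = &batch[i+1]
	}
	batch[n-1].next = &batch[n-1] // self-loop sentinel
	return &batch[0]
}

func main() {
	free := newBatch(4)
	count := 1
	for p := free; p.next != p; p = p.next {
		count++
	}
	fmt.Println("nodes in batch:", count)
}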

name                   old mean              new mean              delta
BinaryTree17            5.95s × (0.97,1.03)   5.93s × (0.97,1.04)    ~    (p=0.613)
Fannkuch11              4.46s × (0.98,1.04)   4.33s × (0.99,1.01)  -2.93% (p=0.000)
FmtFprintfEmpty        86.6ns × (0.98,1.03)  86.8ns × (0.98,1.02)    ~    (p=0.523)
FmtFprintfString        290ns × (0.98,1.05)   287ns × (0.98,1.03)    ~    (p=0.061)
FmtFprintfInt           271ns × (0.98,1.04)   286ns × (0.99,1.01)  +5.54% (p=0.000)
FmtFprintfIntInt        495ns × (0.98,1.04)   489ns × (0.99,1.01)  -1.24% (p=0.015)
FmtFprintfPrefixedInt   391ns × (0.99,1.02)   407ns × (0.99,1.01)  +4.00% (p=0.000)
FmtFprintfFloat         578ns × (0.99,1.01)   559ns × (0.99,1.01)  -3.35% (p=0.000)
FmtManyArgs            1.96µs × (0.98,1.05)  1.94µs × (0.99,1.01)  -1.33% (p=0.030)
GobDecode              15.9ms × (0.97,1.05)  15.7ms × (0.99,1.01)  -1.35% (p=0.044)
GobEncode              11.4ms × (0.97,1.05)  11.3ms × (0.98,1.03)    ~    (p=0.141)
Gzip                    658ms × (0.98,1.05)   648ms × (0.99,1.01)  -1.59% (p=0.009)
Gunzip                  144ms × (0.99,1.03)   144ms × (0.99,1.01)    ~    (p=0.867)
HTTPClientServer       92.1µs × (0.97,1.05)  90.3µs × (0.99,1.01)  -1.89% (p=0.005)
JSONEncode             31.0ms × (0.96,1.07)  30.2ms × (0.98,1.03)  -2.66% (p=0.001)
JSONDecode              110ms × (0.97,1.04)   107ms × (0.99,1.01)  -2.59% (p=0.000)
Mandelbrot200          6.15ms × (0.98,1.04)  6.07ms × (0.99,1.02)  -1.32% (p=0.045)
GoParse                6.79ms × (0.97,1.04)  6.74ms × (0.97,1.04)    ~    (p=0.242)
RegexpMatchEasy0_32     158ns × (0.98,1.05)   155ns × (0.99,1.01)  -1.64% (p=0.010)
RegexpMatchEasy0_1K     548ns × (0.97,1.04)   540ns × (0.99,1.01)  -1.34% (p=0.042)
RegexpMatchEasy1_32     133ns × (0.97,1.04)   132ns × (0.97,1.05)    ~    (p=0.466)
RegexpMatchEasy1_1K     899ns × (0.96,1.05)   878ns × (0.99,1.01)  -2.32% (p=0.002)
RegexpMatchMedium_32    250ns × (0.96,1.03)   243ns × (0.99,1.01)  -2.90% (p=0.000)
RegexpMatchMedium_1K   73.4µs × (0.98,1.04)  73.0µs × (0.98,1.04)    ~    (p=0.411)
RegexpMatchHard_32     3.87µs × (0.97,1.07)  3.84µs × (0.98,1.04)    ~    (p=0.273)
RegexpMatchHard_1K      120µs × (0.97,1.08)   117µs × (0.99,1.01)  -2.06% (p=0.010)
Revcomp                 940ms × (0.96,1.07)   924ms × (0.97,1.07)    ~    (p=0.071)
Template                128ms × (0.96,1.05)   128ms × (0.99,1.01)    ~    (p=0.502)
TimeParse               632ns × (0.96,1.07)   616ns × (0.99,1.01)  -2.58% (p=0.001)
TimeFormat              671ns × (0.97,1.06)   657ns × (0.99,1.02)  -2.10% (p=0.002)

In contrast to the one in test/bench/go1 (above), the binarytree program on the
shootout site uses more goroutines, batches allocations, and sets GOMAXPROCS
to runtime.NumCPU()*2.

Using that version, before vs after:

name          old mean             new mean             delta
BinaryTree20  18.6s × (0.96,1.05)  11.3s × (0.98,1.02)  -39.46% (p=0.000)

And Go 1.4 vs after:

name          old mean             new mean             delta
BinaryTree20  13.0s × (0.97,1.02)  11.3s × (0.98,1.02)  -13.21% (p=0.000)

There is still a scheduling problem - the raw run times are hiding the fact that
this chews up 2x the CPU - but we'll take care of that separately.

Change-Id: I3f5da879b24ae73a0d06745381ffb88c3744948b
Reviewed-on: https://go-review.googlesource.com/10220
Reviewed-by: Austin Clements <austin@google.com>
2015-05-19 15:29:40 +00:00
Josh Bleecher Snyder
79986e24e0 runtime/pprof: write heap statistics to heap profile always
This is a duplicate of CL 9491.
That CL broke the build due to pprof shortcomings
and was reverted in CL 9565.

CL 9623 fixed pprof, so this can go in again.

Fixes #10659.

Change-Id: If470fc90b3db2ade1d161b4417abd2f5c6c330b8
Reviewed-on: https://go-review.googlesource.com/10212
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2015-05-18 20:02:21 +00:00
Austin Clements
f0dd002895 runtime: use separate count and note for forEachP
Currently, forEachP reuses the stopwait and stopnote fields from
stopTheWorld to track how many Ps have not responded to the safe-point
request and to sleep until all Ps have responded.

It was assumed this was safe because both stopTheWorld and forEachP
must occur under worldsema, so stopwait and stopnote cannot be used
for both purposes simultaneously, and callers could always determine
the appropriate use based on sched.gcwaiting (which is only set by
stopTheWorld). However, this is not the case, since it's
possible for there to be a window between when an M observes that
gcwaiting is set and when it checks stopwait during which stopwait
could have changed meanings. When this happens, the M decrements
stopwait and may wakeup stopnote, but does not otherwise participate
in the forEachP protocol. As a result, stopwait is decremented too
many times, so it may reach zero before all Ps have run the safe-point
function, causing forEachP to wake up early. It will then either
observe that some P has not run the safe-point function and panic with
"P did not run fn", or the remaining P (or Ps) will run the safe-point
function before it wakes up and it will observe that stopwait is
negative and panic with "not stopped".

Fix this problem by giving forEachP its own safePointWait and
safePointNote fields.

One known sequence of events that can cause this race is as
follows. It involves three actors:

G1 is running on M1 on P1. P1 has an empty run queue.

G2/M2 is in a blocked syscall and has lost its P. (The details of this
don't matter, it just needs to be in a position where it needs to grab
an idle P.)

GC just started on G3/M3/P3. (These aren't very involved, they just
have to be separate from the other G's, M's, and P's.)

1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1.

Now G1/M1 begins to enter a syscall:

2. G1/M1 invokes reentersyscall, which sets the P1's status to
   _Psyscall.

3. G1/M1's reentersyscall observes gcwaiting != 0 and calls
   entersyscall_gcwait.

4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock.

Back on GC:

5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and
   returns.

6. GC does stuff and then calls startTheWorld().

7. startTheWorld() calls procresize(), which sets P1's status to
   _Pidle and puts P1 on the idle list.

Now G2/M2 returns from its syscall and takes over P1:

8. G2/M2 returns from its blocked syscall and gets P1 from the idle
   list.

9. G2/M2 acquires P1, which sets P1's status to _Prunning.

10. G2/M2 starts a new syscall and invokes reentersyscall, which sets
    P1's status to _Psyscall.

Back on G1/M1:

11. G1/M1 finally acquires sched.lock in entersyscall_gcwait.

At this point, G1/M1 still thinks it's running on P1. P1's status is
_Psyscall, which is consistent with what G1/M1 is doing, but it's
_Psyscall because *G2/M2* put it in to _Psyscall, not G1/M1. This is
basically an ABA race on P1's status.

Because forEachP currently shares stopwait with stopTheWorld, G1/M1's
entersyscall_gcwait observes the non-zero stopwait set by forEachP
but mistakes it for a stopTheWorld. It cas's P1's status from
_Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement
stopwait one more time than forEachP was expecting.

Fixes #10618. (See the issue for details on why the above race is safe
when forEachP is not involved.)

Prior to this commit, the command
  stress ./runtime.test -test.run TestFutexsleep\|TestGoroutineProfile
would reliably fail after a few hundred runs. With this commit, it
ran for over 2 million runs and never crashed.

Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30
Reviewed-on: https://go-review.googlesource.com/10157
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:47 +00:00
Austin Clements
277acca286 runtime: hold worldsema while starting the world
Currently, startTheWorld releases worldsema before starting the
world. Since startTheWorld can change gomaxprocs after allowing Ps to
run, this means that gomaxprocs can change while another P holds
worldsema.

Unfortunately, the garbage collector and forEachP assume that holding
worldsema protects against changes in gomaxprocs (which it *almost*
does). In particular, this is causing somewhat frequent "P did not run
fn" crashes in forEachP in the runtime tests because gomaxprocs is
changing between the several loops that forEachP does over all the Ps.

Fix this by only releasing worldsema after the world is started.

This relates to issue #10618. forEachP still fails under stress
testing, but much less frequently.

Change-Id: I085d627b70cca9ebe9af28fe73b9872f1bb224ff
Reviewed-on: https://go-review.googlesource.com/10156
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:37 +00:00
Austin Clements
9c44a41dd5 runtime: disallow preemption during startTheWorld
Currently, startTheWorld clears preemptoff for the current M before
starting the world. A few callers increment m.locks around
startTheWorld, presumably to prevent preemption any time during
starting the world. This is almost certainly pointless (none of the
other callers do this), but there's no harm in making startTheWorld
keep preemption disabled until it's all done, which definitely lets us
drop these m.locks manipulations.

Change-Id: I8a93658abd0c72276c9bafa3d2c7848a65b4691a
Reviewed-on: https://go-review.googlesource.com/10155
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:31 +00:00
Austin Clements
a1da255aa0 runtime: factor stoptheworld/starttheworld pattern
There are several steps to stopping and starting the world and
currently they're open-coded in several places. The garbage collector
is the only thing that needs to stop and start the world in a
non-trivial pattern. Replace all other uses with calls to higher-level
functions that implement the entire pattern necessary to stop and
start the world.

This is a pure refactoring and should not change any code semantics.
In the following commits, we'll make changes that are easier to do
with this abstraction in place.

This commit renames the old starttheworld to startTheWorldWithSema.
This is a slight misnomer right now because the callers release
worldsema just before calling this. However, a later commit will swap
these and I don't want to think of another name in the mean time.

Change-Id: I5dc97f87b44fb98963c49c777d7053653974c911
Reviewed-on: https://go-review.googlesource.com/10154
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:25 +00:00
Austin Clements
5f7060afd2 runtime: don't start GC if preemptoff is set
In order to avoid deadlocks, startGC avoids kicking off GC if locks
are held by the calling M. However, it currently fails to check
preemptoff, which is the other way to disable preemption.

Fix this by adding a check for preemptoff.

Change-Id: Ie1083166e5ba4af5c9d6c5a42efdfaaef41ca997
Reviewed-on: https://go-review.googlesource.com/10153
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:18 +00:00
Alex Brainman
e544bee1dd runtime: correct exception stack trace output
It is misleading when the stack trace says:

signal arrived during cgo execution

but we are not in a cgo call.

Change-Id: I627e2f2bdc7755074677f77f21befc070a101914
Reviewed-on: https://go-review.googlesource.com/9190
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 03:09:45 +00:00
Austin Clements
a0fc306023 runtime: eliminate runqvictims and a copy from runqsteal
Currently, runqsteal steals Gs from another P into an intermediate
buffer and then copies those Gs into the current P's run queue. This
intermediate buffer itself was moved from the stack to the P in commit
c4fe503 to eliminate the cost of zeroing it on every steal.

This commit follows up c4fe503 by stealing directly into the current
P's run queue, which eliminates the copy and the need for the
intermediate buffer. The update to the tail pointer is only committed
once the entire steal operation has succeeded, so the semantics of
stealing do not change.

Change-Id: Icdd7a0eb82668980bf42c0154b51eef6419fdd51
Reviewed-on: https://go-review.googlesource.com/9998
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-17 01:08:42 +00:00
Russ Cox
512f75e8df runtime: replace GC programs with simpler encoding, faster decoder
Small types record the location of pointers in their memory layout
by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries,
and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using
a bitmap for a large type containing arrays does not make sense:
if someone refers to the type [1<<28]*byte in a program in such
a way that the type information makes it into the binary, it would be
a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB
(for 1-bit entries) bitmap full of 1s into the binary or even to keep
one in memory during the execution of the program.

For large types containing arrays, it is much more compact to describe
the locations of pointers using a notation that can express repetition
than to lay out a bitmap of pointers. Go 1.4 included such a notation,
called "GC programs", but it was complex, required recursion during
decoding, and was generally slow. Dmitriy measured the execution of
these programs writing directly to the heap bitmap as being 7x slower
than copying from a preunrolled 4-bit mask (and frankly that code was
not terribly fast either). For some tests, unrollgcprog1 was seen costing
as much as 3x more than the rest of malloc combined.

This CL introduces a different form for the GC programs. They use a
simple Lempel-Ziv-style encoding of the 1-bit pointer information,
in which the only operations are (1) emit the following n bits
and (2) repeat the last n bits c more times. This encoding can be
generated directly from the Go type information (using repetition
only for arrays or large runs of non-pointer data) and it can be decoded
very efficiently. In particular the decoding requires little state and
no recursion, so that the entire decoding can run without any memory
accesses other than the reads of the encoding and the writes of the
decoded form to the heap bitmap. For recursive types like arrays of
arrays of arrays, the inner instructions are only executed once, not
n times, so that large repetitions run at full speed. (In contrast, large
repetitions in the old programs repeated the individual bit-level layout
of the inner data over and over.) The result is as much as 25x faster
decoding compared to the old form.
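
A toy decoder for such a program, using an invented struct-based
instruction layout (the real program is a packed byte stream and the
real decoder writes straight into the heap bitmap rather than building
a slice):

    package main

    import "fmt"

    // gcInstr is one instruction in a toy version of the encoding:
    // either emit a run of literal pointer bits, or repeat the last
    // n decoded bits count more times.
    type gcInstr struct {
        repeat bool
        n      int    // bits to emit, or length of the repeated suffix
        count  int    // for repeat: how many extra copies to append
        bits   []byte // for emit: one byte (0 or 1) per word
    }

    // decode expands a program into a flat 1-bit-per-word pointer mask.
    // No recursion and no state beyond the output produced so far:
    // a repeat instruction just copies the tail of the output.
    func decode(prog []gcInstr) []byte {
        var mask []byte
        for _, ins := range prog {
            if !ins.repeat {
                mask = append(mask, ins.bits[:ins.n]...)
                continue
            }
            start := len(mask) - ins.n
            for c := 0; c < ins.count; c++ {
                mask = append(mask, mask[start:start+ins.n]...)
            }
        }
        return mask
    }

    func main() {
        // Hypothetical [4]struct{ p *byte; pad [6]uintptr }:
        // emit the 7-bit element layout once, repeat it 3 more times.
        prog := []gcInstr{
            {n: 7, bits: []byte{1, 0, 0, 0, 0, 0, 0}},
            {repeat: true, n: 7, count: 3},
        }
        fmt.Println(decode(prog)) // 28 bits, one per word of the array
    }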

Because the old decoder was so slow, Go 1.4 had three (or so) cases
for how to set the heap bitmap bits for an allocation of a given type:

(1) If the type had an even number of words up to 32 words, then
the 4-bit pointer mask for the type fit in no more than 16 bytes;
store the 4-bit pointer mask directly in the binary and copy from it.

(1b) If the type had an odd number of words up to 15 words, then
the 4-bit pointer mask for the type, doubled to end on a byte boundary,
fit in no more than 16 bytes; store that doubled mask directly in the
binary and copy from it.

(2) If the type had an even number of words up to 128 words,
or an odd number of words up to 63 words (again due to doubling),
then the 4-bit pointer mask would fit in a 64-byte unrolled mask.
Store a GC program in the binary, but leave space in the BSS for
the unrolled mask. Execute the GC program to construct the mask the
first time it is needed, and thereafter copy from the mask.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
(This is the case that was 7x slower than the other two.)

Because the new pointer masks store 1-bit entries instead of 4-bit
entries and because using the decoder no longer carries a significant
overhead, after this CL (that is, for Go 1.5) there are only two cases:

(1) If the type is 128 words or less (no condition about odd or even),
store the 1-bit pointer mask directly in the binary and use it to
initialize the heap bitmap during malloc. (Implemented in CL 9702.)

(2) There is no case 2 anymore.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.

Executing the GC program directly into the heap bitmap (case (3) above)
was disabled for the Go 1.5 dev cycle, both to avoid needing to use
GC programs for typedmemmove and to avoid updating that code as
the heap bitmap format changed. Typedmemmove no longer uses this
type information; as of CL 9886 it uses the heap bitmap directly.
Now that the heap bitmap format is stable, we reintroduce GC programs
and their space savings.

Benchmarks for heapBitsSetType, before this CL vs this CL:

name                    old mean               new mean              delta
SetTypePtr              7.59ns × (0.99,1.02)   5.16ns × (1.00,1.00)  -32.05% (p=0.000)
SetTypePtr8             21.0ns × (0.98,1.05)   21.4ns × (1.00,1.00)     ~    (p=0.179)
SetTypePtr16            24.1ns × (0.99,1.01)   24.6ns × (1.00,1.00)   +2.41% (p=0.001)
SetTypePtr32            31.2ns × (0.99,1.01)   32.4ns × (0.99,1.02)   +3.72% (p=0.001)
SetTypePtr64            45.2ns × (1.00,1.00)   47.2ns × (1.00,1.00)   +4.42% (p=0.000)
SetTypePtr126           75.8ns × (0.99,1.01)   79.1ns × (1.00,1.00)   +4.25% (p=0.000)
SetTypePtr128           74.3ns × (0.99,1.01)   77.6ns × (1.00,1.01)   +4.55% (p=0.000)
SetTypePtrSlice          726ns × (1.00,1.01)    712ns × (1.00,1.00)   -1.95% (p=0.001)
SetTypeNode1            20.0ns × (0.99,1.01)   20.7ns × (1.00,1.00)   +3.71% (p=0.000)
SetTypeNode1Slice        112ns × (1.00,1.00)    113ns × (0.99,1.00)     ~    (p=0.070)
SetTypeNode8            23.9ns × (1.00,1.00)   24.7ns × (1.00,1.01)   +3.18% (p=0.000)
SetTypeNode8Slice        294ns × (0.99,1.02)    287ns × (0.99,1.01)   -2.38% (p=0.015)
SetTypeNode64           52.8ns × (0.99,1.03)   51.8ns × (0.99,1.01)     ~    (p=0.069)
SetTypeNode64Slice      1.13µs × (0.99,1.05)   1.14µs × (0.99,1.00)     ~    (p=0.767)
SetTypeNode64Dead       36.0ns × (1.00,1.01)   32.5ns × (0.99,1.00)   -9.67% (p=0.000)
SetTypeNode64DeadSlice  1.43µs × (0.99,1.01)   1.40µs × (1.00,1.00)   -2.39% (p=0.001)
SetTypeNode124          75.7ns × (1.00,1.01)   79.0ns × (1.00,1.00)   +4.44% (p=0.000)
SetTypeNode124Slice     1.94µs × (1.00,1.01)   2.04µs × (0.99,1.01)   +4.98% (p=0.000)
SetTypeNode126          75.4ns × (1.00,1.01)   77.7ns × (0.99,1.01)   +3.11% (p=0.000)
SetTypeNode126Slice     1.95µs × (0.99,1.01)   2.03µs × (1.00,1.00)   +3.74% (p=0.000)
SetTypeNode128          85.4ns × (0.99,1.01)  122.0ns × (1.00,1.00)  +42.89% (p=0.000)
SetTypeNode128Slice     2.20µs × (1.00,1.01)   2.36µs × (0.98,1.02)   +7.48% (p=0.001)
SetTypeNode130          83.3ns × (1.00,1.00)  123.0ns × (1.00,1.00)  +47.61% (p=0.000)
SetTypeNode130Slice     2.30µs × (0.99,1.01)   2.40µs × (0.98,1.01)   +4.37% (p=0.000)
SetTypeNode1024          498ns × (1.00,1.00)    537ns × (1.00,1.00)   +7.96% (p=0.000)
SetTypeNode1024Slice    15.5µs × (0.99,1.01)   17.8µs × (1.00,1.00)  +15.27% (p=0.000)

The above compares always using a cached pointer mask (and the
corresponding waste of memory) against using the programs directly.
Some slowdown is expected, in exchange for having a better general algorithm.
The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024,
along with the slice variants of those.
It is possible that the cutoff of 128 words (bits) should be raised
in a followup CL, but even with this low cutoff the GC programs are
faster than Go 1.4's "fast path" non-GC program case.

Benchmarks for heapBitsSetType, Go 1.4 vs this CL:

name                    old mean              new mean              delta
SetTypePtr              6.89ns × (1.00,1.00)  5.17ns × (1.00,1.00)  -25.02% (p=0.000)
SetTypePtr8             25.8ns × (0.97,1.05)  21.5ns × (1.00,1.00)  -16.70% (p=0.000)
SetTypePtr16            39.8ns × (0.97,1.02)  24.7ns × (0.99,1.01)  -37.81% (p=0.000)
SetTypePtr32            68.8ns × (0.98,1.01)  32.2ns × (1.00,1.01)  -53.18% (p=0.000)
SetTypePtr64             130ns × (1.00,1.00)    47ns × (1.00,1.00)  -63.67% (p=0.000)
SetTypePtr126            241ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.25% (p=0.000)
SetTypePtr128           2.07µs × (1.00,1.00)  0.08µs × (1.00,1.00)  -96.27% (p=0.000)
SetTypePtrSlice         1.05µs × (0.99,1.01)  0.72µs × (0.99,1.02)  -31.70% (p=0.000)
SetTypeNode1            16.0ns × (0.99,1.01)  20.8ns × (0.99,1.03)  +29.91% (p=0.000)
SetTypeNode1Slice        184ns × (0.99,1.01)   112ns × (0.99,1.01)  -39.26% (p=0.000)
SetTypeNode8            29.5ns × (0.97,1.02)  24.6ns × (1.00,1.00)  -16.50% (p=0.000)
SetTypeNode8Slice        624ns × (0.98,1.02)   285ns × (1.00,1.00)  -54.31% (p=0.000)
SetTypeNode64            135ns × (0.96,1.08)    52ns × (0.99,1.02)  -61.32% (p=0.000)
SetTypeNode64Slice      3.83µs × (1.00,1.00)  1.14µs × (0.99,1.01)  -70.16% (p=0.000)
SetTypeNode64Dead        134ns × (0.99,1.01)    32ns × (1.00,1.01)  -75.74% (p=0.000)
SetTypeNode64DeadSlice  3.83µs × (0.99,1.00)  1.40µs × (1.00,1.01)  -63.42% (p=0.000)
SetTypeNode124           240ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.05% (p=0.000)
SetTypeNode124Slice     7.27µs × (1.00,1.00)  2.04µs × (1.00,1.00)  -71.95% (p=0.000)
SetTypeNode126          2.06µs × (0.99,1.01)  0.08µs × (0.99,1.01)  -96.23% (p=0.000)
SetTypeNode126Slice     64.4µs × (1.00,1.00)   2.0µs × (1.00,1.00)  -96.85% (p=0.000)
SetTypeNode128          2.09µs × (1.00,1.01)  0.12µs × (1.00,1.00)  -94.15% (p=0.000)
SetTypeNode128Slice     65.4µs × (1.00,1.00)   2.4µs × (0.99,1.03)  -96.39% (p=0.000)
SetTypeNode130          2.11µs × (1.00,1.00)  0.12µs × (1.00,1.00)  -94.18% (p=0.000)
SetTypeNode130Slice     66.3µs × (1.00,1.00)   2.4µs × (0.97,1.08)  -96.34% (p=0.000)
SetTypeNode1024         16.0µs × (1.00,1.01)   0.5µs × (1.00,1.00)  -96.65% (p=0.000)
SetTypeNode1024Slice     512µs × (1.00,1.00)    18µs × (0.98,1.04)  -96.45% (p=0.000)

SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation.
Both Go 1.4 and this CL are using pointer bitmaps for this case,
so that's an overall 3x speedup for using pointer bitmaps.

SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation.
Both Go 1.4 and this CL are running the GC program for this case,
so that's an overall 17x speedup when using GC programs (and
I've seen >20x on other systems).

Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against
this CL's SetTypeNode128 (GC program), the slow path in the
code in this CL is 2x faster than the fast path in Go 1.4.

The Go 1 benchmarks are basically unaffected compared to just before this CL.

Go 1 benchmarks, before this CL vs this CL:

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.97,1.04)   5.91s × (0.96,1.04)    ~    (p=0.306)
Fannkuch11              4.38s × (1.00,1.00)   4.37s × (1.00,1.01)  -0.22% (p=0.006)
FmtFprintfEmpty        90.7ns × (0.97,1.10)  89.3ns × (0.96,1.09)    ~    (p=0.280)
FmtFprintfString        282ns × (0.98,1.04)   287ns × (0.98,1.07)  +1.72% (p=0.039)
FmtFprintfInt           269ns × (0.99,1.03)   282ns × (0.97,1.04)  +4.87% (p=0.000)
FmtFprintfIntInt        478ns × (0.99,1.02)   481ns × (0.99,1.02)  +0.61% (p=0.048)
FmtFprintfPrefixedInt   399ns × (0.98,1.03)   400ns × (0.98,1.05)    ~    (p=0.533)
FmtFprintfFloat         563ns × (0.99,1.01)   570ns × (1.00,1.01)  +1.37% (p=0.000)
FmtManyArgs            1.89µs × (0.99,1.01)  1.92µs × (0.99,1.02)  +1.88% (p=0.000)
GobDecode              15.2ms × (0.99,1.01)  15.2ms × (0.98,1.05)    ~    (p=0.609)
GobEncode              11.6ms × (0.98,1.03)  11.9ms × (0.98,1.04)  +2.17% (p=0.000)
Gzip                    648ms × (0.99,1.01)   648ms × (1.00,1.01)    ~    (p=0.835)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.169)
HTTPClientServer       90.5µs × (0.98,1.03)  91.5µs × (0.98,1.04)  +1.04% (p=0.045)
JSONEncode             31.5ms × (0.98,1.03)  31.4ms × (0.98,1.03)    ~    (p=0.549)
JSONDecode              111ms × (0.99,1.01)   107ms × (0.99,1.01)  -3.21% (p=0.000)
Mandelbrot200          6.01ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.878)
GoParse                6.54ms × (0.99,1.02)  6.61ms × (0.99,1.03)  +1.08% (p=0.004)
RegexpMatchEasy0_32     160ns × (1.00,1.01)   161ns × (1.00,1.00)  +0.40% (p=0.000)
RegexpMatchEasy0_1K     560ns × (0.99,1.01)   559ns × (0.99,1.01)    ~    (p=0.088)
RegexpMatchEasy1_32     138ns × (0.99,1.01)   138ns × (1.00,1.00)    ~    (p=0.380)
RegexpMatchEasy1_1K     877ns × (1.00,1.00)   878ns × (1.00,1.00)    ~    (p=0.157)
RegexpMatchMedium_32    251ns × (0.99,1.00)   251ns × (1.00,1.01)  +0.28% (p=0.021)
RegexpMatchMedium_1K   72.6µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.539)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.378)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.067)
Revcomp                 904ms × (0.99,1.02)   904ms × (0.99,1.01)    ~    (p=0.943)
Template                125ms × (0.99,1.02)   127ms × (0.99,1.01)  +1.79% (p=0.000)
TimeParse               627ns × (0.99,1.01)   622ns × (0.99,1.01)  -0.88% (p=0.000)
TimeFormat              655ns × (0.99,1.02)   655ns × (0.99,1.02)    ~    (p=0.976)

For the record, Go 1 benchmarks, Go 1.4 vs this CL:

name                   old mean              new mean              delta
BinaryTree17            4.61s × (0.97,1.05)   5.91s × (0.98,1.03)  +28.35% (p=0.000)
Fannkuch11              4.40s × (0.99,1.03)   4.41s × (0.99,1.01)     ~    (p=0.212)
FmtFprintfEmpty         102ns × (0.99,1.01)    84ns × (0.99,1.02)  -18.38% (p=0.000)
FmtFprintfString        302ns × (0.98,1.01)   303ns × (0.99,1.02)     ~    (p=0.203)
FmtFprintfInt           313ns × (0.97,1.05)   270ns × (0.99,1.01)  -13.69% (p=0.000)
FmtFprintfIntInt        524ns × (0.98,1.02)   477ns × (0.99,1.00)   -8.87% (p=0.000)
FmtFprintfPrefixedInt   424ns × (0.98,1.02)   386ns × (0.99,1.01)   -8.96% (p=0.000)
FmtFprintfFloat         652ns × (0.98,1.02)   594ns × (0.97,1.05)   -8.97% (p=0.000)
FmtManyArgs            2.13µs × (0.99,1.02)  1.94µs × (0.99,1.01)   -8.92% (p=0.000)
GobDecode              17.1ms × (0.99,1.02)  14.9ms × (0.98,1.03)  -13.07% (p=0.000)
GobEncode              13.5ms × (0.98,1.03)  11.5ms × (0.98,1.03)  -15.25% (p=0.000)
Gzip                    656ms × (0.99,1.02)   647ms × (0.99,1.01)   -1.29% (p=0.000)
Gunzip                  143ms × (0.99,1.02)   144ms × (0.99,1.01)     ~    (p=0.204)
HTTPClientServer       88.2µs × (0.98,1.02)  90.8µs × (0.98,1.01)   +2.93% (p=0.000)
JSONEncode             32.2ms × (0.98,1.02)  30.9ms × (0.97,1.04)   -4.06% (p=0.001)
JSONDecode              121ms × (0.98,1.02)   110ms × (0.98,1.05)   -8.95% (p=0.000)
Mandelbrot200          6.06ms × (0.99,1.01)  6.11ms × (0.98,1.04)     ~    (p=0.184)
GoParse                6.76ms × (0.97,1.04)  6.58ms × (0.98,1.05)   -2.63% (p=0.003)
RegexpMatchEasy0_32     195ns × (1.00,1.01)   155ns × (0.99,1.01)  -20.43% (p=0.000)
RegexpMatchEasy0_1K     479ns × (0.98,1.03)   535ns × (0.99,1.02)  +11.59% (p=0.000)
RegexpMatchEasy1_32     169ns × (0.99,1.02)   131ns × (0.99,1.03)  -22.44% (p=0.000)
RegexpMatchEasy1_1K    1.53µs × (0.99,1.01)  0.87µs × (0.99,1.02)  -43.07% (p=0.000)
RegexpMatchMedium_32    334ns × (0.99,1.01)   242ns × (0.99,1.01)  -27.53% (p=0.000)
RegexpMatchMedium_1K    125µs × (1.00,1.01)    72µs × (0.99,1.03)  -42.53% (p=0.000)
RegexpMatchHard_32     6.03µs × (0.99,1.01)  3.79µs × (0.99,1.01)  -37.12% (p=0.000)
RegexpMatchHard_1K      189µs × (0.99,1.02)   115µs × (0.99,1.01)  -39.20% (p=0.000)
Revcomp                 935ms × (0.96,1.03)   926ms × (0.98,1.02)     ~    (p=0.083)
Template                146ms × (0.97,1.05)   119ms × (0.99,1.01)  -18.37% (p=0.000)
TimeParse               660ns × (0.99,1.01)   624ns × (0.99,1.02)   -5.43% (p=0.000)
TimeFormat              670ns × (0.98,1.02)   710ns × (1.00,1.01)   +5.97% (p=0.000)

This CL is a bit larger than I would like, but the compiler, linker, runtime,
and package reflect all need to be in sync about the format of these programs,
so there is no easy way to split this into independent changes (at least
while keeping the build working at each change).

Fixes #9625.
Fixes #10524.

Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a
Reviewed-on: https://go-review.googlesource.com/9888
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
2015-05-16 00:38:17 +00:00
Russ Cox
d820d5f3ab runtime: make mapzero not crash on arm
Change-Id: I40e8a4a2e62253233b66f6a2e61e222437292c31
Reviewed-on: https://go-review.googlesource.com/10151
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-15 20:14:41 +00:00