mirror of https://github.com/golang/go synced 2024-10-05 07:11:22 -06:00
Commit Graph

1147 Commits

Author SHA1 Message Date
Russ Cox
43aac4f9e7 runtime: raise maxmem to 512 GB
A workaround for #10460.

Change-Id: I607a556561d509db6de047892f886fb565513895
Reviewed-on: https://go-review.googlesource.com/10819
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 18:31:25 +00:00
Russ Cox
2c2770c3d4 cmd/cgo: make sure pointers passed to C escape to heap
Fixes #10303.

Change-Id: Ia68d3566ba3ebeea6e18e388446bd9b8c431e156
Reviewed-on: https://go-review.googlesource.com/10814
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-15 17:39:53 +00:00
Russ Cox
a3b9797baa runtime: gofmt
Change-Id: I539bdc438f694610a7cd373f7e1451171737cfb3
Reviewed-on: https://go-review.googlesource.com/11084
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-15 17:36:34 +00:00
Russ Cox
d5b40b6ac2 runtime: add GODEBUG gcshrinkstackoff, gcstackbarrieroff, and gcstoptheworld variables
While we're here, update the documentation and delete variables with no effect.
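
As a hedged illustration (the program below is not part of this change; only the
GODEBUG variable names come from it), these knobs are exercised by setting GODEBUG
in the environment before running any Go program:

// Run with, for example:
//   GODEBUG=gcstoptheworld=1,gcshrinkstackoff=1 ./prog
// The allocation loop just gives the collector something to do.
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var sink [][]byte
	for i := 0; i < 100; i++ {
		sink = append(sink, make([]byte, 1<<20))
	}
	runtime.GC() // force a cycle so the settings above take visible effect
	fmt.Println("buffers:", len(sink))
}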

Change-Id: I4df0d266dff880df61b488ed547c2870205862f0
Reviewed-on: https://go-review.googlesource.com/10790
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 17:31:04 +00:00
Russ Cox
80ec711755 runtime: use type-based write barrier for remote stack write during chansend
A send on an unbuffered channel to a blocked receiver is the only
case in the runtime where one goroutine writes directly to the stack
of another. The garbage collector assumes that if a goroutine is
blocked, its stack contains no new pointers since the last time it ran.
The send on an unbuffered channel violates this, so it needs an
explicit write barrier. It has an explicit write barrier, but not one that
can handle a write to another stack. Use one that can (based on type bitmap
instead of heap bitmap).

To make this work, raise the limit for type bitmaps so that they are
used for all types up to 64 kB in size (256 bytes of bitmap).
(The runtime already imposes a limit of 64 kB for a channel element size.)

I have been unable to reproduce this problem in a simple test program.
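For context, a minimal sketch of the pattern in question (illustrative only, not a
reproducer): the unbuffered send below completes by copying a pointer-bearing
element directly into the blocked receiver's frame.

package main

import "fmt"

type payload struct{ p *int }

func main() {
	c := make(chan payload) // unbuffered
	done := make(chan struct{})
	go func() {
		v := <-c // receiver blocks here; the sender writes v directly into this frame
		fmt.Println(*v.p)
		close(done)
	}()
	x := 42
	c <- payload{p: &x} // the remote stack write that needs the type-based barrier
	<-done
}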

Could help #11035.

Change-Id: I06ad994032d8cff3438c9b3eaa8d853915128af5
Reviewed-on: https://go-review.googlesource.com/10815
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 16:50:30 +00:00
Russ Cox
d57c889ae8 runtime: wait to update arena_used until after mapping bitmap
This avoids a race with gcmarkwb_m that was leading to faults.

Fixes #10212.

Change-Id: I6fcf8d09f2692227063ce29152cb57366ea22487
Reviewed-on: https://go-review.googlesource.com/10816
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-06-11 18:15:21 +00:00
Ainar Garipov
7f9f70e5b6 all: fix misprints in comments
These were found by grepping the comments from the go code and feeding
the output to aspell.

Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395
Reviewed-on: https://go-review.googlesource.com/10941
Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-11 14:18:57 +00:00
Yongjian Xu
93e57a22d5 runtime: correct a drifted comment in referencing m->locked.
Change-Id: Ida4b98aa63e57594fa6fa0b8178106bac9b3cd19
Reviewed-on: https://go-review.googlesource.com/10837
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-10 06:15:20 +00:00
Russ Cox
433c0bc769 runtime: avoid fault in heapBitsBulkBarrier
Change-Id: I0512e461de1f25cb2a1cb7f23e7a77d00700667c
Reviewed-on: https://go-review.googlesource.com/10803
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-08 20:24:00 +00:00
Austin Clements
b0532a96a8 runtime: fix write-barrier-enabled phase list in gcmarkwb_m
Commit 1303957 was supposed to enable write barriers during the
concurrent scan phase, but it only enabled *calls* to the write
barrier during this phase. It failed to update the redundant list of
write-barrier-enabled phases in gcmarkwb_m, so it still wasn't greying
objects during the scan phase.

This commit fixes this by replacing the redundant list of phases in
gcmarkwb_m with simply checking writeBarrierEnabled. This is almost
certainly redundant with checks already done in callers, but the last
time we tried to remove these redundant checks everything got much
slower, so I'm leaving it alone for now.

Fixes #11105.

Change-Id: I00230a3cb80a008e749553a8ae901b409097e4be
Reviewed-on: https://go-review.googlesource.com/10801
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-08 05:13:15 +00:00
Austin Clements
306f8f11ad runtime: unwind stack barriers when writing above the current frame
Stack barriers assume that writes through pointers to frames above the
current frame will get write barriers, and hence these frames do not
need to be re-scanned to pick up these changes. For normal writes,
this is true. However, there are places in the runtime that use
typedmemmove to potentially write through pointers to higher frames
(such as mapassign1). Currently, typedmemmove does not execute write
barriers if the destination is on the stack. If there's a stack
barrier between the current frame and the frame being modified with
typedmemmove, and the stack barrier is not otherwise hit, it's
possible that the garbage collector will never see the updated pointer
and incorrectly reclaim the object.

Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove
and its variants) detect when the destination is in the stack and
unwind stack barriers up to the point, forcing mark termination to
later rescan the affected frame and collect these pointers.

Fixes #11084. Might be related to #10240, #10541, #10941, #11023,
#11027, and possibly others.

Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd
Reviewed-on: https://go-review.googlesource.com/10791
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:57:47 +00:00
Austin Clements
1303957dbf runtime: enable write barriers during concurrent scan
Currently, write barriers are only enabled after completion of the
concurrent scan phase, as we enter the concurrent mark phase. However,
stack barriers are installed during the scan phase and assume that
write barriers will track changes to frames above the stack
barriers. Since write barriers aren't enabled until after stack
barriers are installed, we may miss modifications to the stack that
happen after installing the stack barriers and before enabling write
barriers.

Fix this by enabling write barriers during the scan phase.

This commit intentionally makes the minimal change to do this (there's
only one line of code change; the rest are comment changes). At the
very least, we should consider eliminating the ragged barrier that's
intended to synchronize the enabling of write barriers, but now just
wastes time. I've included a large comment about extensions and
alternative designs.

Change-Id: Ib20fede794e4fcb91ddf36f99bd97344d7f96421
Reviewed-on: https://go-review.googlesource.com/10795
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:55:33 +00:00
Austin Clements
6f6403eddf runtime: fix checkmarks to rescan stacks
Currently checkmarks mode fails to rescan stacks because it sees the
leftover state bits indicating that the stacks haven't changed since
the last scan. As a result, it won't detect lost marks caused by
failing to scan stacks correctly during regular garbage collection.

Fix this by marking all stacks dirty before performing the checkmark
phase.

Change-Id: I1f06882bb8b20257120a4b8e7f95bb3ffc263895
Reviewed-on: https://go-review.googlesource.com/10794
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:55:12 +00:00
Austin Clements
2774b37306 all: use RET instead of RETURN on ppc64
All of the architectures except ppc64 have only "RET" for the return
mnemonic. ppc64 used to have only "RETURN", but commit cf06ea6
introduced RET as a synonym for RETURN to make ppc64 consistent with
the other architectures. However, that commit was never followed up to
make the code itself consistent by eliminating uses of RETURN.

This commit replaces all uses of RETURN in the ppc64 assembly with
RET.

This was done with
    sed -i 's/\<RETURN\>/RET/' **/*_ppc64x.s
plus one manual change to syscall/asm.s.

Change-Id: I3f6c8d2be157df8841d48de988ee43f3e3087995
Reviewed-on: https://go-review.googlesource.com/10672
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-06 00:07:23 +00:00
Alan Donovan
232331f0c7 runtime: add blank assignment to defeat "declared but not used" error from go/types
gc should ideally consider this an error too; see golang/go#8560.

Change-Id: Ieee71c4ecaff493d7f83e15ba8c8a04ee90a4cf1
Reviewed-on: https://go-review.googlesource.com/10757
Reviewed-by: Robert Griesemer <gri@golang.org>
2015-06-05 18:05:16 +00:00
Austin Clements
7529314ed3 runtime: use correct SP when installing stack barriers
Currently the stack barriers are installed at the next frame boundary
after gp.sched.sp + 1024*2^n for n=0,1,2,... However, when a G is in a
system call, we set gp.sched.sp to 0, which causes stack barriers to
be installed at *every* frame. This easily overflows the slice we've
reserved for storing the stack barrier information, and causes a
"slice bounds out of range" panic in gcInstallStackBarrier.

Fix this by using gp.syscallsp instead of gp.sched.sp if it's
non-zero. This is the same logic that gentraceback uses to determine
the current SP.

Fixes #11049.

Change-Id: Ie40eeee5bec59b7c1aa715a7c17aa63b1f1cf4e8
Reviewed-on: https://go-review.googlesource.com/10755
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-05 15:53:07 +00:00
Russ Cox
3ffcbb633e runtime: default GOMAXPROCS to NumCPU(), not 1
See golang.org/s/go15gomaxprocs for details.
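
A small check of the new default (purely illustrative; GOMAXPROCS(0) reads the
current setting without changing it):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// With this change, the two values match unless GOMAXPROCS is
	// overridden by the environment or an explicit call.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("NumCPU:    ", runtime.NumCPU())
}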

Change-Id: I8de5df34fa01d31d78f0194ec78a2474c281243c
Reviewed-on: https://go-review.googlesource.com/10668
Reviewed-by: Rob Pike <r@golang.org>
2015-06-05 04:38:04 +00:00
Josh Bleecher Snyder
5353cde080 runtime, cmd/internal/obj/arm: improve arm function prologue
When stack growth is not needed, as it usually is not,
execute only a single conditional branch
rather than three conditional instructions.
This adds 4 bytes to every function,
but might speed up execution in the common case.

Sample disassembly for

func f() {
	_ = [128]byte{}
}

Before:

TEXT main.f(SB) x.go
	x.go:3	0x2000	e59a1008	MOVW 0x8(R10), R1
	x.go:3	0x2004	e59fb028	MOVW 0x28(R15), R11
	x.go:3	0x2008	e08d200b	ADD R11, R13, R2
	x.go:3	0x200c	e1520001	CMP R1, R2
	x.go:3	0x2010	91a0300e	MOVW.LS R14, R3
	x.go:3	0x2014	9b0118a9	BL.LS runtime.morestack_noctxt(SB)
	x.go:3	0x2018	9afffff8	B.LS main.f(SB)
	x.go:3	0x201c	e52de084	MOVW.W R14, -0x84(R13)
	x.go:4	0x2020	e28d1004	ADD $4, R13, R1
	x.go:4	0x2024	e3a00000	MOVW $0, R0
	x.go:4	0x2028	eb012255	BL 0x4a984
	x.go:5	0x202c	e49df084	RET #132
	x.go:5	0x2030	eafffffe	B 0x2030
	x.go:5	0x2034	ffffff7c	?

After:

TEXT main.f(SB) x.go
	x.go:3	0x2000	e59a1008	MOVW 0x8(R10), R1
	x.go:3	0x2004	e59fb02c	MOVW 0x2c(R15), R11
	x.go:3	0x2008	e08d200b	ADD R11, R13, R2
	x.go:3	0x200c	e1520001	CMP R1, R2
	x.go:3	0x2010	9a000004	B.LS 0x2028
	x.go:3	0x2014	e52de084	MOVW.W R14, -0x84(R13)
	x.go:4	0x2018	e28d1004	ADD $4, R13, R1
	x.go:4	0x201c	e3a00000	MOVW $0, R0
	x.go:4	0x2020	eb0124dc	BL 0x4b398
	x.go:5	0x2024	e49df084	RET #132
	x.go:5	0x2028	e1a0300e	MOVW R14, R3
	x.go:5	0x202c	eb011b0d	BL runtime.morestack_noctxt(SB)
	x.go:5	0x2030	eafffff2	B main.f(SB)
	x.go:5	0x2034	eafffffe	B 0x2034
	x.go:5	0x2038	ffffff7c	?

Updates #10587.

package sort benchmarks on an iPhone 6:

name            old time/op  new time/op  delta
SortString1K     569µs ± 0%   565µs ± 1%  -0.75%  (p=0.000 n=23+24)
StableString1K   872µs ± 1%   870µs ± 1%  -0.16%  (p=0.009 n=23+24)
SortInt1K        317µs ± 2%   316µs ± 2%    ~     (p=0.410 n=26+26)
StableInt1K      343µs ± 1%   339µs ± 1%  -1.07%  (p=0.000 n=22+23)
SortInt64K      30.0ms ± 1%  30.0ms ± 1%    ~     (p=0.091 n=25+24)
StableInt64K    30.2ms ± 0%  30.0ms ± 0%  -0.69%  (p=0.000 n=22+22)
Sort1e2          147µs ± 1%   146µs ± 0%  -0.48%  (p=0.000 n=25+24)
Stable1e2        290µs ± 1%   286µs ± 1%  -1.30%  (p=0.000 n=23+24)
Sort1e4         29.5ms ± 2%  29.7ms ± 1%  +0.71%  (p=0.000 n=23+23)
Stable1e4       88.7ms ± 4%  88.6ms ± 8%  -0.07%  (p=0.022 n=26+26)
Sort1e6          4.81s ± 7%   4.83s ± 7%    ~     (p=0.192 n=26+26)
Stable1e6        18.3s ± 1%   18.1s ± 1%  -0.76%  (p=0.000 n=25+23)
SearchWrappers   318ns ± 1%   344ns ± 1%  +8.14%  (p=0.000 n=23+26)

package sort benchmarks on a first generation rpi:

name            old time/op  new time/op  delta
SearchWrappers  4.13µs ± 0%  3.95µs ± 0%   -4.42%  (p=0.000 n=15+13)
SortString1K    5.81ms ± 1%  5.82ms ± 2%     ~     (p=0.400 n=14+15)
StableString1K  9.69ms ± 1%  9.73ms ± 0%     ~     (p=0.121 n=15+11)
SortInt1K       3.30ms ± 2%  3.66ms ±19%  +10.82%  (p=0.000 n=15+14)
StableInt1K     5.97ms ±15%  4.17ms ± 8%  -30.05%  (p=0.000 n=15+15)
SortInt64K       319ms ± 1%   295ms ± 1%   -7.65%  (p=0.000 n=15+15)
StableInt64K     343ms ± 0%   332ms ± 0%   -3.26%  (p=0.000 n=12+13)
Sort1e2         3.36ms ± 2%  3.22ms ± 4%   -4.10%  (p=0.000 n=15+15)
Stable1e2       6.74ms ± 1%  6.43ms ± 2%   -4.67%  (p=0.000 n=15+15)
Sort1e4          247ms ± 1%   247ms ± 1%     ~     (p=0.331 n=15+14)
Stable1e4        864ms ± 0%   820ms ± 0%   -5.15%  (p=0.000 n=14+15)
Sort1e6          41.2s ± 0%   41.2s ± 0%   +0.15%  (p=0.000 n=13+14)
Stable1e6         192s ± 0%    182s ± 0%   -5.07%  (p=0.000 n=14+14)

Change-Id: I8a9db77e1d4ea1956575895893bc9d04bd81204b
Reviewed-on: https://go-review.googlesource.com/10497
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-04 16:35:12 +00:00
Brad Fitzpatrick
03410f6758 runtime: fix TestFixedGOROOT to properly restore the GOROOT env var after test
Otherwise subsequent tests won't see any modified GOROOT.

With this CL I can move my GOROOT, set GOROOT to the new location, and
the runtime tests pass. Previously the crash_tests would instead look
for the GOROOT baked into the binary, instead of the env var:

--- FAIL: TestGcSys (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestGCFairness (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestGdbPython (0.07s)
        runtime-gdb_test.go:64: building source exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestLargeStringConcat (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
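
The fix follows the usual save-and-restore pattern for test-scoped environment
changes; a generic sketch of that pattern (not the actual test code, and the
replacement path is made up):

package runtime_test

import (
	"os"
	"testing"
)

func TestWithTemporaryGOROOT(t *testing.T) {
	old, existed := os.LookupEnv("GOROOT")
	defer func() {
		// Put the environment back so later tests see the original GOROOT.
		if existed {
			os.Setenv("GOROOT", old)
		} else {
			os.Unsetenv("GOROOT")
		}
	}()
	if err := os.Setenv("GOROOT", "/tmp/new-goroot"); err != nil {
		t.Fatal(err)
	}
	// ... test body that relies on the modified GOROOT ...
}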

Update #10029

Change-Id: If91be0f04d3acdcf39a9e773a4e7905a446bc477
Reviewed-on: https://go-review.googlesource.com/10685
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-03 23:33:48 +00:00
Austin Clements
10083d8007 runtime: print start of GC cycle in gctrace, rather than end
Currently the GODEBUG=gctrace=1 trace line includes "@n.nnns" to
indicate the time that the GC cycle ended relative to the time the
program started. This was meant to be consistent with the utilization
as of the end of the cycle, which is printed next on the trace line,
but it winds up just being confusing and unexpected.

Change the trace line to include the time that the GC cycle started
relative to the time the program started.

Change-Id: I7d64580cd696eb17540716d3e8a74a9d6ae50650
Reviewed-on: https://go-review.googlesource.com/10634
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-03 02:17:43 +00:00
Austin Clements
faa7a7e8ae runtime: implement GC stack barriers
This commit implements stack barriers to minimize the amount of
stack re-scanning that must be done during mark termination.

Currently the GC scans stacks of active goroutines twice during every
GC cycle: once at the beginning during root discovery and once at the
end during mark termination. The second scan happens while the world
is stopped and guarantees that we've seen all of the roots (since
there are no write barriers on writes to local stack
variables). However, this means pause time is proportional to stack
size. In particularly recursive programs, this can drive pause time up
past our 10ms goal (e.g., it takes about 150ms to scan a 50MB stack).

Re-scanning the entire stack is rarely necessary, especially for large
stacks, because usually most of the frames on the stack were not
active between the first and second scans and hence any changes to
these frames (via non-escaping pointers passed down the stack) were
tracked by write barriers.

To efficiently track how far a stack has been unwound since the first
scan (and, hence, how much needs to be re-scanned), this commit
introduces stack barriers. During the first scan, at exponentially
spaced points in each stack, the scan overwrites return PCs with the
PC of the stack barrier function. When "returned" to, the stack
barrier function records how far the stack has unwound and jumps to
the original return PC for that point in the stack. Then the second
scan only needs to proceed as far as the lowest barrier that hasn't
been hit.

For deeply recursive programs, this substantially reduces mark
termination time (and hence pause time). For the goscheme example
linked in issue #10898, prior to this change, mark termination times
were typically between 100 and 500ms; with this change, mark
termination times are typically between 10 and 20ms. As a result of
the reduced stack scanning work, this reduces overall execution time
of the goscheme example by 20%.

Fixes #10898.

The effect of this on programs that are not deeply recursive is
minimal:

name                   old time/op    new time/op    delta
BinaryTree17              3.16s ± 2%     3.26s ± 1%  +3.31%  (p=0.000 n=19+19)
Fannkuch11                2.42s ± 1%     2.48s ± 1%  +2.24%  (p=0.000 n=17+19)
FmtFprintfEmpty          50.0ns ± 3%    49.8ns ± 1%    ~     (p=0.534 n=20+19)
FmtFprintfString          173ns ± 0%     175ns ± 0%  +1.49%  (p=0.000 n=16+19)
FmtFprintfInt             170ns ± 1%     175ns ± 1%  +2.97%  (p=0.000 n=20+19)
FmtFprintfIntInt          288ns ± 0%     295ns ± 0%  +2.73%  (p=0.000 n=16+19)
FmtFprintfPrefixedInt     242ns ± 1%     252ns ± 1%  +4.13%  (p=0.000 n=18+18)
FmtFprintfFloat           324ns ± 0%     323ns ± 0%  -0.36%  (p=0.000 n=20+19)
FmtManyArgs              1.14µs ± 0%    1.12µs ± 1%  -1.01%  (p=0.000 n=18+19)
GobDecode                8.88ms ± 1%    8.87ms ± 0%    ~     (p=0.480 n=19+18)
GobEncode                6.80ms ± 1%    6.85ms ± 0%  +0.82%  (p=0.000 n=20+18)
Gzip                      363ms ± 1%     363ms ± 1%    ~     (p=0.077 n=18+20)
Gunzip                   90.6ms ± 0%    90.0ms ± 1%  -0.71%  (p=0.000 n=17+18)
HTTPClientServer         51.5µs ± 1%    50.8µs ± 1%  -1.32%  (p=0.000 n=18+18)
JSONEncode               17.0ms ± 0%    17.1ms ± 0%  +0.40%  (p=0.000 n=18+17)
JSONDecode               61.8ms ± 0%    63.8ms ± 1%  +3.11%  (p=0.000 n=18+17)
Mandelbrot200            3.84ms ± 0%    3.84ms ± 1%    ~     (p=0.583 n=19+19)
GoParse                  3.71ms ± 1%    3.72ms ± 1%    ~     (p=0.159 n=18+19)
RegexpMatchEasy0_32       100ns ± 0%     100ns ± 1%  -0.19%  (p=0.033 n=17+19)
RegexpMatchEasy0_1K       342ns ± 1%     331ns ± 0%  -3.41%  (p=0.000 n=19+19)
RegexpMatchEasy1_32      82.5ns ± 0%    81.7ns ± 0%  -0.98%  (p=0.000 n=18+18)
RegexpMatchEasy1_1K       505ns ± 0%     494ns ± 1%  -2.16%  (p=0.000 n=18+18)
RegexpMatchMedium_32      137ns ± 1%     137ns ± 1%  -0.24%  (p=0.048 n=20+18)
RegexpMatchMedium_1K     41.6µs ± 0%    41.3µs ± 1%  -0.57%  (p=0.004 n=18+20)
RegexpMatchHard_32       2.11µs ± 0%    2.11µs ± 1%  +0.20%  (p=0.037 n=17+19)
RegexpMatchHard_1K       63.9µs ± 2%    63.3µs ± 0%  -0.99%  (p=0.000 n=20+17)
Revcomp                   560ms ± 1%     522ms ± 0%  -6.87%  (p=0.000 n=18+16)
Template                 75.0ms ± 0%    75.1ms ± 1%  +0.18%  (p=0.013 n=18+19)
TimeParse                 358ns ± 1%     364ns ± 0%  +1.74%  (p=0.000 n=20+15)
TimeFormat                360ns ± 0%     372ns ± 0%  +3.55%  (p=0.000 n=20+18)

Change-Id: If8a9bfae6c128d15a4f405e02bcfa50129df82a2
Reviewed-on: https://go-review.googlesource.com/10314
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-06-02 20:00:57 +00:00
Austin Clements
724f8298a8 runtime: avoid double-scanning of stacks
Currently there's a race between stopg scanning another G's stack and
the G reaching a preemption point and scanning its own stack. When
this race occurs, the G's stack is scanned twice. Currently this is
okay, so this race is benign.

However, we will shortly be adding stack barriers during the first
stack scan, so scanning will no longer be idempotent. To prepare for
this, this change ensures that each stack is scanned only once during
each GC phase by checking the flag that indicates that the stack has
been scanned in this phase before scanning the stack.

Change-Id: Id9f4d5e2e5b839bc3f200ec1723a4a12dd677ab4
Reviewed-on: https://go-review.googlesource.com/10458
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-06-02 19:59:05 +00:00
Austin Clements
3f6e69aca5 runtime: steal space for stack barrier tracking from stack
The stack barrier code will need a bookkeeping structure to keep track
of the overwritten return PCs. This commit introduces and allocates
this structure, but does not yet use the structure.

We don't want to allocate space for this structure during garbage
collection, so this commit allocates it along with the allocation of
the corresponding stack. However, we can't do a regular allocation in
newstack because mallocgc may itself grow the stack (which would lead
to a recursive allocation). Hence, this commit makes the bookkeeping
structure part of the stack allocation itself by stealing the
necessary space from the top of the stack allocation. Since the size
of this bookkeeping structure is logarithmic in the size of the stack,
this has minimal impact on stack behavior.

Change-Id: Ia14408be06aafa9ca4867f4e70bddb3fe0e96665
Reviewed-on: https://go-review.googlesource.com/10313
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:57 +00:00
Austin Clements
e610c25df0 runtime: decouple stack bounds and stack allocation size
Currently the runtime assumes that the allocation for the stack is
exactly [stack.lo, stack.hi). We're about to steal a small part of
this allocation for per-stack GC metadata. To prepare for this, this
commit adds a field to the G for the allocated size of the stack.
With this change, stack.lo and stack.hi continue to act as the true
bounds on the stack, but are no longer also used as the bounds on the
stack allocation.

(I also tried this the other way around, where stack.lo and stack.hi
remained the allocation bounds and I introduced a new top of stack.
However, there are far more places that assume stack.hi is the true
top of the stack than there are places that assume it's the top of the
allocation.)

Change-Id: Ifa9d956753be53d286d09cbc73d47fb34a18c0c6
Reviewed-on: https://go-review.googlesource.com/10312
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:50 +00:00
Austin Clements
c02b8911d8 runtime: clean up signalstack API
Currently signalstack takes a lower limit and a length and all calls
hard-code the passed length. Change the API to take a *stack and
compute the lower limit and length from the passed stack.

This will make it easier for the runtime to steal some space from the
top of the stack since it eliminates the hard-coded stack sizes.

Change-Id: I7d2a9f45894b221f4e521628c2165530bbc57d53
Reviewed-on: https://go-review.googlesource.com/10311
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:42 +00:00
Austin Clements
cc6a7fce53 runtime: increase precision of gctrace times
Currently we truncate gctrace clock and CPU times to millisecond
precision. As a result, many phases are typically printed as 0, which
is fine for user consumption, but makes gathering statistics and
reports over GC traces difficult.

In 1.4, the gctrace line printed times in microseconds. This was
better for statistics, but not as easy for users to read or interpret,
and it generally made the trace lines longer.

This change strikes a balance between these extremes by printing
milliseconds, but including the decimal part to two significant
figures down to microsecond precision. This remains easy to read and
interpret, but includes more precision when it's useful.

For example, where the code currently prints,

gc #29 @1.629s 0%: 0+2+0+12+0 ms clock, 0+2+0+0/12/0+0 ms cpu, 4->4->2 MB, 4 MB goal, 1 P

this prints,

gc #29 @1.629s 0%: 0.005+2.1+0+12+0.29 ms clock, 0.005+2.1+0+0/12/0+0.29 ms cpu, 4->4->2 MB, 4 MB goal, 1 P

Fixes #10970.

Change-Id: I249624779433927cd8b0947b986df9060c289075
Reviewed-on: https://go-review.googlesource.com/10554
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 18:31:36 +00:00
Mikio Hara
1fa0a8cec5 runtime: fix data race in BenchmarkChanPopular
Fixes #11014.

Change-Id: I9a18dacd10564d3eaa1fea4d77f1a48e08e79f53
Reviewed-on: https://go-review.googlesource.com/10563
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-02 11:16:01 +00:00
Austin Clements
df2809f04e runtime: document that runtime.GC() blocks until GC is complete
runtime.GC() is intentionally very weakly specified. However, it is so
weakly specified that it's difficult to know that it's being used
correctly for its one intended use case: to ensure garbage collection
has run in a test that is garbage-sensitive. In particular, it is
unclear whether it is synchronous or asynchronous. In the old STW
collector this was essentially self-evident; short of queuing up a
garbage collection to run later, it had to be synchronous. However,
with the concurrent collector, there's evidence that people are
inferring that it may be asynchronous (e.g., issue #10986), as this is
both unclear in the documentation and possible in the implementation.

In fact, runtime.GC() runs a fully synchronous STW collection. We
probably don't want to commit to this exact behavior. But we can
commit to the essential property that tests rely on: that runtime.GC()
does not return until the GC has finished.
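
A minimal sketch of the use case this guarantees (a garbage-sensitive check that
relies on runtime.GC() being synchronous):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	garbage := make([]byte, 64<<20)
	_ = garbage
	garbage = nil // drop the only reference

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	runtime.GC() // blocks until the collection has completed
	runtime.ReadMemStats(&after)

	// Because GC is synchronous, the cycle count has advanced by the
	// time we read it again.
	fmt.Printf("GC cycles: %d -> %d\n", before.NumGC, after.NumGC)
}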

Change-Id: Ifc3045a505e1898ecdbe32c1f7e80e2e9ffacb5b
Reviewed-on: https://go-review.googlesource.com/10488
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-06-01 14:51:12 +00:00
Austin Clements
f2c3957ed8 runtime: disable GC around TestGoroutineParallelism
TestGoroutineParallelism can deadlock if the GC runs during the
test. Currently it tries to prevent this by forcing a GC before the
test, but this is best effort and fails completely if GOGC is very low
for testing.

This change replaces this best-effort fix with simply setting GOGC to
off for the duration of the test.
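
A sketch of the equivalent user-level pattern (debug.SetGCPercent(-1) corresponds
to GOGC=off); this is illustrative, not the test's actual code:

package runtime_test

import (
	"runtime/debug"
	"testing"
)

func TestNeedsNoGC(t *testing.T) {
	// Disable the collector for the duration of the test, then restore
	// the previous setting.
	old := debug.SetGCPercent(-1)
	defer debug.SetGCPercent(old)

	// ... test body that must not be interrupted by a GC cycle ...
}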

Change-Id: I8229310833f241b149ebcd32845870c1cb14e9f8
Reviewed-on: https://go-review.googlesource.com/10454
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-28 17:40:19 +00:00
Austin Clements
4a1957d0aa runtime: use stripped test environment for TestGdbPython
Most runtime tests that invoke the compiler to build a sub-test binary
do so with a special environment constructed by testEnv that strips
out environment variables that should apply to the test but not to the
build.

Fix TestGdbPython to use this test environment when invoking go build,
like other tests do.

Change-Id: Iafdf89d4765c587cbebc427a5d61cb8a7e71b326
Reviewed-on: https://go-review.googlesource.com/10455
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-28 17:39:08 +00:00
Elias Naur
8017ace496 runtime: don't always block all signals on OpenBSD
Implement the changes from CL 10173 on OpenBSD.

Change-Id: I2db1cd8141fd392a34753a1b8113e2e0401173b9
Reviewed-on: https://go-review.googlesource.com/10342
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-23 17:42:43 +00:00
Elias Naur
84cfba17c2 runtime: don't always unblock all signals
Ian proposed an improved way of handling signal masks in Go, motivated
by a problem where the Android Java runtime expects certain signals to
be blocked for all JVM threads. Discussion here:

https://groups.google.com/forum/#!topic/golang-dev/_TSCkQHJt6g

Ian's text is used in the following:

A Go program always needs to have the synchronous signals enabled.
These are the signals for which _SigPanic is set in sigtable, namely
SIGSEGV, SIGBUS, SIGFPE.

A Go program that uses the os/signal package, and calls signal.Notify,
needs to have at least one thread which is not blocking that signal,
but it doesn't matter much which one.

Unix programs do not change signal mask across execve.  They inherit
signal masks across fork.  The shell uses this fact to some extent;
for example, the job control signals (SIGTTIN, SIGTTOU, SIGTSTP) are
blocked for commands run due to backquote quoting or $().

Our current position on signal masks was not thought out. We wandered
into it step by step; see, e.g., http://golang.org/cl/7323067.

This CL does the following:

Introduce a new platform hook, msigsave, that saves the signal mask of
the current thread to m.sigsave.

Call msigsave from needm and newm.

In minit, set up the signal mask from m.sigsave and unblock the
essential synchronous signals, plus SIGILL, SIGTRAP, SIGPROF, and
SIGSTKFLT (on systems that have it).

In unminit, restore the signal mask from m.sigsave.

The first time that os/signal.Notify is called, start a new thread whose
only purpose is to update its signal mask to make sure signals for
signal.Notify are unblocked on at least one thread.
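
For reference, the user-facing call that triggers that thread the first time is
os/signal.Notify; a minimal illustration:

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	ch := make(chan os.Signal, 1)
	// The first Notify call starts the thread described above, making
	// sure the requested signals are unblocked on at least one thread.
	signal.Notify(ch, syscall.SIGINT, syscall.SIGTERM)

	fmt.Println("waiting for SIGINT or SIGTERM...")
	fmt.Println("got:", <-ch)
}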

The effect on Go programs will be that if they are invoked with some
non-synchronous signals blocked, those signals will normally be
ignored.  Previously, those signals would mostly be ignored.  A change
in behaviour will occur for programs started with any of these signals
blocked, if they receive the signal: SIGHUP, SIGINT, SIGQUIT, SIGABRT,
SIGTERM.  Previously those signals would always cause a crash (unless
using the os/signal package); with this change, they will be ignored
if the program is started with the signal blocked (and does not use
the os/signal package).

./all.bash completes successfully on linux/amd64.

OpenBSD is missing the implementation.

Change-Id: I188098ba7eb85eae4c14861269cc466f2aa40e8c
Reviewed-on: https://go-review.googlesource.com/10173
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-22 20:24:08 +00:00
Russ Cox
001438bdfe runtime: fix callwritebarrier
Given a call frame F of size N where the return values start at offset R,
callwritebarrier was instructing heapBitsBulkBarrier to scan the block
of memory [F+R, F+R+N). It should only scan [F+R, F+N). The extra N-R
bytes scanned might lead into the next allocated block in memory.
Because the scan was consulting the heap bitmap for type information,
scanning into the next block normally "just worked" in the sense of
not crashing.

Scanning the extra N-R bytes of memory is a problem mainly because
it causes the GC to consider pointers that might otherwise not be
considered, leading it to retain objects that should actually be freed.
This is very difficult to detect.

Luckily, juju turned up a case where the heap bitmap and the memory
were out of sync for the block immediately after the call frame, so that
heapBitsBulkBarrier saw an obvious non-pointer where it expected a
pointer, causing a loud crash.

Why is there a non-pointer in memory that the heap bitmap records as
a pointer? That is more difficult to answer. At least one way that it
could happen is that allocations containing no pointers at all do not
update the heap bitmap. So if heapBitsBulkBarrier walked out of the
current object and into a no-pointer object and consulted those bitmap
bits, it would be misled. This doesn't happen in general because all
the paths to heapBitsBulkBarrier first check for the no-pointer case.
This may or may not be what happened, but it's the only scenario
I've been able to construct.

I tried for quite a while to write a simple test for this and could not.
It does fix the juju crash, and it is clearly an improvement over the
old code.

Fixes #10844.

Change-Id: I53982c93ef23ef93155c4086bbd95a4c4fdaac9a
Reviewed-on: https://go-review.googlesource.com/10317
Reviewed-by: Austin Clements <austin@google.com>
2015-05-21 19:14:03 +00:00
Austin Clements
a5c3bbe0b4 runtime: eliminate write barrier from adjustpointers
Currently adjustpointers invokes a write barrier for every stack slot
it updates. This is safe---the write barrier always does nothing
because the new value is never a heap pointer---but it's unnecessary
overhead in performance and complexity.

Fix this by rewriting adjustpointers to work with *uintptrs instead of
*unsafe.Pointers. As an added bonus, this makes the code cleaner.

name                   old mean              new mean              delta
BinaryTree17            3.35s × (0.98,1.01)   3.33s × (0.99,1.02)    ~    (p=0.095 n=20+19)
Fannkuch11              2.49s × (1.00,1.01)   2.52s × (0.99,1.01)  +1.23% (p=0.000 n=19+20)
FmtFprintfEmpty        52.2ns × (0.99,1.02)  52.2ns × (0.99,1.02)    ~    (p=0.766 n=19+19)
FmtFprintfString        181ns × (0.99,1.02)   179ns × (0.99,1.01)  -1.06% (p=0.000 n=20+19)
FmtFprintfInt           177ns × (0.99,1.01)   173ns × (0.99,1.02)  -2.26% (p=0.000 n=17+20)
FmtFprintfIntInt        300ns × (0.99,1.01)   302ns × (0.99,1.01)  +0.76% (p=0.000 n=19+20)
FmtFprintfPrefixedInt   253ns × (0.99,1.02)   256ns × (0.99,1.01)  +0.96% (p=0.000 n=20+19)
FmtFprintfFloat         334ns × (0.99,1.02)   334ns × (1.00,1.01)    ~    (p=0.243 n=20+19)
FmtManyArgs            1.16µs × (0.99,1.01)  1.17µs × (0.99,1.02)  +0.88% (p=0.000 n=20+20)
GobDecode              9.16ms × (0.99,1.02)  9.18ms × (1.00,1.00)  +0.21% (p=0.048 n=20+17)
GobEncode              7.03ms × (0.99,1.01)  7.05ms × (0.99,1.01)    ~    (p=0.091 n=19+19)
Gzip                    374ms × (0.99,1.01)   372ms × (0.99,1.02)  -0.50% (p=0.008 n=18+20)
Gunzip                 92.9ms × (0.99,1.01)  92.5ms × (1.00,1.01)  -0.47% (p=0.002 n=19+19)
HTTPClientServer       53.1µs × (0.98,1.01)  52.5µs × (0.99,1.01)  -0.98% (p=0.000 n=20+19)
JSONEncode             17.4ms × (0.99,1.02)  17.5ms × (0.99,1.01)    ~    (p=0.061 n=19+20)
JSONDecode             66.0ms × (0.99,1.02)  64.7ms × (0.99,1.01)  -1.87% (p=0.000 n=20+20)
Mandelbrot200          3.94ms × (1.00,1.01)  3.95ms × (1.00,1.01)    ~    (p=0.799 n=18+19)
GoParse                3.89ms × (0.99,1.02)  3.86ms × (0.99,1.01)  -0.70% (p=0.016 n=20+19)
RegexpMatchEasy0_32     102ns × (0.99,1.02)   102ns × (1.00,1.01)    ~    (p=0.557 n=20+18)
RegexpMatchEasy0_1K     353ns × (0.99,1.02)   341ns × (0.99,1.01)  -3.38% (p=0.000 n=20+20)
RegexpMatchEasy1_32    85.0ns × (0.99,1.02)  85.0ns × (0.99,1.01)    ~    (p=0.851 n=19+20)
RegexpMatchEasy1_1K     521ns × (0.99,1.02)   506ns × (1.00,1.01)  -2.85% (p=0.000 n=20+18)
RegexpMatchMedium_32    142ns × (0.99,1.02)   141ns × (1.00,1.01)  -1.17% (p=0.000 n=20+19)
RegexpMatchMedium_1K   42.8µs × (0.99,1.01)  42.3µs × (0.99,1.01)  -1.07% (p=0.000 n=20+19)
RegexpMatchHard_32     2.17µs × (0.99,1.01)  2.16µs × (1.00,1.01)  -0.51% (p=0.042 n=20+18)
RegexpMatchHard_1K     65.6µs × (0.99,1.01)  64.8µs × (1.00,1.00)  -1.21% (p=0.000 n=20+17)
Revcomp                 581ms × (0.99,1.04)   536ms × (1.00,1.01)  -7.71% (p=0.000 n=20+18)
Template               77.2ms × (0.99,1.01)  76.8ms × (0.99,1.01)    ~    (p=0.426 n=20+18)
TimeParse               369ns × (0.99,1.02)   371ns × (1.00,1.01)    ~    (p=0.117 n=20+19)
TimeFormat              371ns × (0.99,1.02)   391ns × (0.99,1.01)  +5.33% (p=0.000 n=20+19)

Change-Id: I5b952ba577ac4365c8c87db837c5804a1e30b7be
Reviewed-on: https://go-review.googlesource.com/10293
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-21 18:35:49 +00:00
Rick Hudson
5b66e5d0d8 runtime: turn work buffer tracing off by default
During development we ran with monitoring code turned
on by default. This CL turns the work buffer monitoring
off. Performance change on most go1 benchmarks is small
or insignificant.

name                   old mean              new mean              delta
BinaryTree17            3.35s × (0.99,1.01)   3.35s × (0.99,1.01)    ~    (p=0.841 n=5+5)
Fannkuch11              2.59s × (1.00,1.01)   2.55s × (1.00,1.00)  -1.65% (p=0.008 n=5+5)
FmtFprintfEmpty        52.5ns × (0.99,1.02)  53.2ns × (0.98,1.01)    ~    (p=0.063 n=5+5)
FmtFprintfString        181ns × (1.00,1.00)   180ns × (1.00,1.00)  -0.55% (p=0.029 n=4+4)
FmtFprintfInt           176ns × (1.00,1.01)   174ns × (1.00,1.00)  -0.91% (p=0.000 n=5+4)
FmtFprintfIntInt        298ns × (1.00,1.00)   299ns × (1.00,1.00)    ~    (p=0.143 n=4+4)
FmtFprintfPrefixedInt   250ns × (1.00,1.01)   246ns × (1.00,1.00)  -1.68% (p=0.000 n=5+4)
FmtFprintfFloat         340ns × (1.00,1.00)   340ns × (1.00,1.01)    ~    (p=0.643 n=5+5)
FmtManyArgs            1.16µs × (1.00,1.00)  1.15µs × (1.00,1.00)  -0.47% (p=0.016 n=5+5)
GobDecode              9.22ms × (1.00,1.00)  9.23ms × (1.00,1.00)    ~    (p=0.841 n=5+5)
GobEncode              7.00ms × (1.00,1.01)  7.09ms × (0.99,1.01)  +1.26% (p=0.016 n=5+5)
Gzip                    387ms × (1.00,1.00)   389ms × (0.99,1.02)    ~    (p=1.000 n=5+5)
Gunzip                 97.8ms × (1.00,1.00)  98.3ms × (1.00,1.00)  +0.51% (p=0.016 n=5+4)
HTTPClientServer       52.6µs × (1.00,1.01)  52.7µs × (1.00,1.01)    ~    (p=1.000 n=5+5)
JSONEncode             18.0ms × (0.99,1.02)  17.9ms × (1.00,1.00)    ~    (p=0.310 n=5+5)
JSONDecode             64.8ms × (0.99,1.02)  63.6ms × (1.00,1.00)  -1.94% (p=0.008 n=5+5)
Mandelbrot200          4.05ms × (1.00,1.00)  4.05ms × (1.00,1.00)    ~    (p=0.421 n=5+5)
GoParse                3.86ms × (1.00,1.01)  3.84ms × (0.99,1.01)    ~    (p=0.421 n=5+5)
RegexpMatchEasy0_32     101ns × (1.00,1.00)   102ns × (0.99,1.02)    ~    (p=0.238 n=4+5)
RegexpMatchEasy0_1K     346ns × (1.00,1.01)   345ns × (1.00,1.00)    ~    (p=0.333 n=5+4)
RegexpMatchEasy1_32    87.3ns × (0.99,1.02)  87.4ns × (1.00,1.00)    ~    (p=0.190 n=5+4)
RegexpMatchEasy1_1K     520ns × (1.00,1.00)   520ns × (1.00,1.01)    ~    (p=1.000 n=4+5)
RegexpMatchMedium_32    143ns × (1.00,1.00)   142ns × (1.00,1.00)  -0.70% (p=0.029 n=4+4)
RegexpMatchMedium_1K   43.2µs × (1.00,1.01)  43.2µs × (1.00,1.00)    ~    (p=0.841 n=5+5)
RegexpMatchHard_32     2.24µs × (1.00,1.01)  2.23µs × (1.00,1.01)  -0.63% (p=0.048 n=5+5)
RegexpMatchHard_1K     68.7µs × (1.00,1.00)  68.3µs × (1.00,1.00)  -0.56% (p=0.008 n=5+5)
Revcomp                 577ms × (1.00,1.01)   579ms × (1.00,1.00)    ~    (p=0.151 n=5+5)
Template               74.9ms × (1.00,1.00)  76.5ms × (1.00,1.00)  +2.11% (p=0.008 n=5+5)
TimeParse               359ns × (1.00,1.00)   362ns × (1.00,1.00)  +0.72% (p=0.008 n=5+5)
TimeFormat              369ns × (1.00,1.00)   371ns × (1.00,1.01)    ~    (p=0.071 n=5+5)

Change-Id: I4206a3f77a3d1450966b7a62ea7597aec44cb72f
Reviewed-on: https://go-review.googlesource.com/10294
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-21 16:09:24 +00:00
Austin Clements
719efc70eb runtime: make runtime.callers walk calling G, not g0
Currently runtime.callers invokes gentraceback with the pc and sp of
the G it is called from, but always passes g0 even if it was called
from a regular g. Right now this has no ill effects because
runtime.callers does not use either callback argument or the
_TraceJumpStack flag, but it makes the code fragile and will break
some upcoming changes.

Fix this by lifting the getg() call outside of the systemstack in
runtime.callers.

Change-Id: I4e1e927961c0e0cd4dcf28693be47df7bae9e122
Reviewed-on: https://go-review.googlesource.com/10292
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-21 16:06:37 +00:00
Rick Hudson
197aa9e64d runtime: remove unused quiesce code
This is dead code. If you want to quiesce the system, the
preferred way is to use forEachP(func(*p){}).

Change-Id: Ic7677a5dd55e3639b99e78ddeb2c71dd1dd091fa
Reviewed-on: https://go-review.googlesource.com/10267
Reviewed-by: Austin Clements <austin@google.com>
2015-05-20 17:56:44 +00:00
Rick Hudson
913db7685e runtime: run background mark helpers only if work is available
Prior to this CL, whenever GC marking was enabled and a P was looking
for work, we supplied a G to help the GC do its marking tasks. Once
this G finished all the available marking, it would release the P to
find another available G. When there was no work, the P would drop
into findrunnable, which would execute the mark helper G, which would
immediately return, and the P would drop into findrunnable again,
repeating the process. Since the P was always given a G to run, it
never blocked. This CL first checks whether the GC mark helper G has
available work; if not, the P immediately falls through to its
blocking logic.

Fixes #10901

Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5
Reviewed-on: https://go-review.googlesource.com/10189
Reviewed-by: Austin Clements <austin@google.com>
2015-05-19 15:57:50 +00:00
Austin Clements
f4d51eb2f5 runtime: minor clean up to heapminimum
Currently setGCPercent sets heapminimum to heapminimum*GOGC/100. The
real intent is to set heapminimum to a scaled multiple of a fixed
default heap minimum, not to scale heapminimum based on its current
value. This turns out to be okay because setGCPercent is only called
once and heapminimum is initially set to this default heap minimum.
However, the code as written is confusing, especially since
setGCPercent is otherwise written so it could be called again to
change GOGC. Fix this by introducing a defaultHeapMinimum constant and
using this instead of the current value of heapminimum to compute the
scaled heap minimum.

As part of this, this commit improves the documentation on
heapminimum.
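
The user-facing counterpart of setGCPercent is runtime/debug.SetGCPercent, which
can be called repeatedly to change GOGC at run time; a small illustration (the
values are arbitrary):

package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// SetGCPercent returns the previous setting, so calls compose cleanly.
	old := debug.SetGCPercent(200) // collect less aggressively than GOGC=100
	fmt.Println("previous GOGC:", old)
	debug.SetGCPercent(old) // restore the original setting
}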

Change-Id: I4eb82c73dc2eb44a6e5a17c780a747a2e73d7493
Reviewed-on: https://go-review.googlesource.com/10181
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-19 15:30:34 +00:00
Russ Cox
8903b3db0e runtime: add fast check for self-loop pointer in scanobject
Addresses a problem reported on the mailing list.

This will come up mainly in programs with custom allocators that batch allocations,
but it still helps in our programs, which mainly do not have such allocations.
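
A hedged sketch of the kind of batched allocation this targets (the types below
are invented for illustration): every pointer in the batch points back into the
same allocation, which is the case the fast check short-circuits.

package main

import "fmt"

// node is carved out of a larger batch allocation; free nodes are linked
// through next, so each pointer points back into the same batch object.
type node struct {
	next *node
	data [56]byte
}

// newBatch allocates n nodes at once and threads a free list through them,
// terminating with a self-loop instead of nil.
func newBatch(n int) *node {
	batch := make([]node, n)
	for i := 0; i < n-1; i++ {
		batch[i].next = &batch[i+1]
	}
	batch[n-1].next = &batch[n-1] // self-loop sentinel
	return &batch[0]
}

func main() {
	free := newBatch(4)
	count := 1
	for p := free; p.next != p; p = p.next {
		count++
	}
	fmt.Println("nodes in batch:", count)
}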

name                   old mean              new mean              delta
BinaryTree17            5.95s × (0.97,1.03)   5.93s × (0.97,1.04)    ~    (p=0.613)
Fannkuch11              4.46s × (0.98,1.04)   4.33s × (0.99,1.01)  -2.93% (p=0.000)
FmtFprintfEmpty        86.6ns × (0.98,1.03)  86.8ns × (0.98,1.02)    ~    (p=0.523)
FmtFprintfString        290ns × (0.98,1.05)   287ns × (0.98,1.03)    ~    (p=0.061)
FmtFprintfInt           271ns × (0.98,1.04)   286ns × (0.99,1.01)  +5.54% (p=0.000)
FmtFprintfIntInt        495ns × (0.98,1.04)   489ns × (0.99,1.01)  -1.24% (p=0.015)
FmtFprintfPrefixedInt   391ns × (0.99,1.02)   407ns × (0.99,1.01)  +4.00% (p=0.000)
FmtFprintfFloat         578ns × (0.99,1.01)   559ns × (0.99,1.01)  -3.35% (p=0.000)
FmtManyArgs            1.96µs × (0.98,1.05)  1.94µs × (0.99,1.01)  -1.33% (p=0.030)
GobDecode              15.9ms × (0.97,1.05)  15.7ms × (0.99,1.01)  -1.35% (p=0.044)
GobEncode              11.4ms × (0.97,1.05)  11.3ms × (0.98,1.03)    ~    (p=0.141)
Gzip                    658ms × (0.98,1.05)   648ms × (0.99,1.01)  -1.59% (p=0.009)
Gunzip                  144ms × (0.99,1.03)   144ms × (0.99,1.01)    ~    (p=0.867)
HTTPClientServer       92.1µs × (0.97,1.05)  90.3µs × (0.99,1.01)  -1.89% (p=0.005)
JSONEncode             31.0ms × (0.96,1.07)  30.2ms × (0.98,1.03)  -2.66% (p=0.001)
JSONDecode              110ms × (0.97,1.04)   107ms × (0.99,1.01)  -2.59% (p=0.000)
Mandelbrot200          6.15ms × (0.98,1.04)  6.07ms × (0.99,1.02)  -1.32% (p=0.045)
GoParse                6.79ms × (0.97,1.04)  6.74ms × (0.97,1.04)    ~    (p=0.242)
RegexpMatchEasy0_32     158ns × (0.98,1.05)   155ns × (0.99,1.01)  -1.64% (p=0.010)
RegexpMatchEasy0_1K     548ns × (0.97,1.04)   540ns × (0.99,1.01)  -1.34% (p=0.042)
RegexpMatchEasy1_32     133ns × (0.97,1.04)   132ns × (0.97,1.05)    ~    (p=0.466)
RegexpMatchEasy1_1K     899ns × (0.96,1.05)   878ns × (0.99,1.01)  -2.32% (p=0.002)
RegexpMatchMedium_32    250ns × (0.96,1.03)   243ns × (0.99,1.01)  -2.90% (p=0.000)
RegexpMatchMedium_1K   73.4µs × (0.98,1.04)  73.0µs × (0.98,1.04)    ~    (p=0.411)
RegexpMatchHard_32     3.87µs × (0.97,1.07)  3.84µs × (0.98,1.04)    ~    (p=0.273)
RegexpMatchHard_1K      120µs × (0.97,1.08)   117µs × (0.99,1.01)  -2.06% (p=0.010)
Revcomp                 940ms × (0.96,1.07)   924ms × (0.97,1.07)    ~    (p=0.071)
Template                128ms × (0.96,1.05)   128ms × (0.99,1.01)    ~    (p=0.502)
TimeParse               632ns × (0.96,1.07)   616ns × (0.99,1.01)  -2.58% (p=0.001)
TimeFormat              671ns × (0.97,1.06)   657ns × (0.99,1.02)  -2.10% (p=0.002)

In contrast to the one in test/bench/go1 (above), the binarytree program on the
shootout site uses more goroutines, batches allocations, and sets GOMAXPROCS
to runtime.NumCPU()*2.

Using that version, before vs after:

name          old mean             new mean             delta
BinaryTree20  18.6s × (0.96,1.05)  11.3s × (0.98,1.02)  -39.46% (p=0.000)

And Go 1.4 vs after:

name          old mean             new mean             delta
BinaryTree20  13.0s × (0.97,1.02)  11.3s × (0.98,1.02)  -13.21% (p=0.000)

There is still a scheduling problem - the raw run times are hiding the fact that
this chews up 2x the CPU - but we'll take care of that separately.

Change-Id: I3f5da879b24ae73a0d06745381ffb88c3744948b
Reviewed-on: https://go-review.googlesource.com/10220
Reviewed-by: Austin Clements <austin@google.com>
2015-05-19 15:29:40 +00:00
Josh Bleecher Snyder
79986e24e0 runtime/pprof: write heap statistics to heap profile always
This is a duplicate of CL 9491.
That CL broke the build due to pprof shortcomings
and was reverted in CL 9565.

CL 9623 fixed pprof, so this can go in again.

Fixes #10659.

Change-Id: If470fc90b3db2ade1d161b4417abd2f5c6c330b8
Reviewed-on: https://go-review.googlesource.com/10212
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2015-05-18 20:02:21 +00:00
Austin Clements
f0dd002895 runtime: use separate count and note for forEachP
Currently, forEachP reuses the stopwait and stopnote fields from
stopTheWorld to track how many Ps have not responded to the safe-point
request and to sleep until all Ps have responded.

It was assumed this was safe because both stopTheWorld and forEachP
must occur under worldsema, so stopwait and stopnote cannot be used
for both purposes simultaneously, and callers could always determine
the appropriate use based on sched.gcwaiting (which is only set by
stopTheWorld). However, this is not the case, since it's
possible for there to be a window between when an M observes that
gcwaiting is set and when it checks stopwait during which stopwait
could have changed meanings. When this happens, the M decrements
stopwait and may wakeup stopnote, but does not otherwise participate
in the forEachP protocol. As a result, stopwait is decremented too
many times, so it may reach zero before all Ps have run the safe-point
function, causing forEachP to wake up early. It will then either
observe that some P has not run the safe-point function and panic with
"P did not run fn", or the remaining P (or Ps) will run the safe-point
function before it wakes up and it will observe that stopwait is
negative and panic with "not stopped".

Fix this problem by giving forEachP its own safePointWait and
safePointNote fields.

One known sequence of events that can cause this race is as
follows. It involves three actors:

G1 is running on M1 on P1. P1 has an empty run queue.

G2/M2 is in a blocked syscall and has lost its P. (The details of this
don't matter, it just needs to be in a position where it needs to grab
an idle P.)

GC just started on G3/M3/P3. (These aren't very involved, they just
have to be separate from the other G's, M's, and P's.)

1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1.

Now G1/M1 begins to enter a syscall:

2. G1/M1 invokes reentersyscall, which sets the P1's status to
   _Psyscall.

3. G1/M1's reentersyscall observes gcwaiting != 0 and calls
   entersyscall_gcwait.

4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock.

Back on GC:

5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and
   returns.

6. GC does stuff and then calls startTheWorld().

7. startTheWorld() calls procresize(), which sets P1's status to
   _Pidle and puts P1 on the idle list.

Now G2/M2 returns from its syscall and takes over P1:

8. G2/M2 returns from its blocked syscall and gets P1 from the idle
   list.

9. G2/M2 acquires P1, which sets P1's status to _Prunning.

10. G2/M2 starts a new syscall and invokes reentersyscall, which sets
    P1's status to _Psyscall.

Back on G1/M1:

11. G1/M1 finally acquires sched.lock in entersyscall_gcwait.

At this point, G1/M1 still thinks it's running on P1. P1's status is
_Psyscall, which is consistent with what G1/M1 is doing, but it's
_Psyscall because *G2/M2* put it in to _Psyscall, not G1/M1. This is
basically an ABA race on P1's status.

Because forEachP currently shares stopwait with stopTheWorld, G1/M1's
entersyscall_gcwait observes the non-zero stopwait set by forEachP
but mistakes it for a stopTheWorld. It cas's P1's status from
_Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement
stopwait one more time than forEachP was expecting.

Fixes #10618. (See the issue for details on why the above race is safe
when forEachP is not involved.)

Prior to this commit, the command
  stress ./runtime.test -test.run TestFutexsleep\|TestGoroutineProfile
would reliably fail after a few hundred runs. With this commit, it
ran for over 2 million runs and never crashed.

Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30
Reviewed-on: https://go-review.googlesource.com/10157
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:47 +00:00
Austin Clements
277acca286 runtime: hold worldsema while starting the world
Currently, startTheWorld releases worldsema before starting the
world. Since startTheWorld can change gomaxprocs after allowing Ps to
run, this means that gomaxprocs can change while another P holds
worldsema.

Unfortunately, the garbage collector and forEachP assume that holding
worldsema protects against changes in gomaxprocs (which it *almost*
does). In particular, this is causing somewhat frequent "P did not run
fn" crashes in forEachP in the runtime tests because gomaxprocs is
changing between the several loops that forEachP does over all the Ps.

Fix this by only releasing worldsema after the world is started.

This relates to issue #10618. forEachP still fails under stress
testing, but much less frequently.

Change-Id: I085d627b70cca9ebe9af28fe73b9872f1bb224ff
Reviewed-on: https://go-review.googlesource.com/10156
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:37 +00:00
Austin Clements
9c44a41dd5 runtime: disallow preemption during startTheWorld
Currently, startTheWorld clears preemptoff for the current M before
starting the world. A few callers increment m.locks around
startTheWorld, presumably to prevent preemption any time during
starting the world. This is almost certainly pointless (none of the
other callers do this), but there's no harm in making startTheWorld
keep preemption disabled until it's all done, which definitely lets us
drop these m.locks manipulations.

Change-Id: I8a93658abd0c72276c9bafa3d2c7848a65b4691a
Reviewed-on: https://go-review.googlesource.com/10155
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:31 +00:00
Austin Clements
a1da255aa0 runtime: factor stoptheworld/starttheworld pattern
There are several steps to stopping and starting the world and
currently they're open-coded in several places. The garbage collector
is the only thing that needs to stop and start the world in a
non-trivial pattern. Replace all other uses with calls to higher-level
functions that implement the entire pattern necessary to stop and
start the world.

This is a pure refactoring and should not change any code semantics.
In the following commits, we'll make changes that are easier to do
with this abstraction in place.

This commit renames the old starttheworld to startTheWorldWithSema.
This is a slight misnomer right now because the callers release
worldsema just before calling this. However, a later commit will swap
these and I don't want to think of another name in the mean time.

Change-Id: I5dc97f87b44fb98963c49c777d7053653974c911
Reviewed-on: https://go-review.googlesource.com/10154
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:25 +00:00
Austin Clements
5f7060afd2 runtime: don't start GC if preemptoff is set
In order to avoid deadlocks, startGC avoids kicking off GC if locks
are held by the calling M. However, it currently fails to check
preemptoff, which is the other way to disable preemption.

Fix this by adding a check for preemptoff.

Change-Id: Ie1083166e5ba4af5c9d6c5a42efdfaaef41ca997
Reviewed-on: https://go-review.googlesource.com/10153
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:18 +00:00
Alex Brainman
e544bee1dd runtime: correct exception stack trace output
It is misleading when the stack trace says:

signal arrived during cgo execution

but we are not in a cgo call.

Change-Id: I627e2f2bdc7755074677f77f21befc070a101914
Reviewed-on: https://go-review.googlesource.com/9190
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 03:09:45 +00:00
Austin Clements
a0fc306023 runtime: eliminate runqvictims and a copy from runqsteal
Currently, runqsteal steals Gs from another P into an intermediate
buffer and then copies those Gs into the current P's run queue. This
intermediate buffer itself was moved from the stack to the P in commit
c4fe503 to eliminate the cost of zeroing it on every steal.

This commit follows up c4fe503 by stealing directly into the current
P's run queue, which eliminates the copy and the need for the
intermediate buffer. The update to the tail pointer is only committed
once the entire steal operation has succeeded, so the semantics of
stealing do not change.

Change-Id: Icdd7a0eb82668980bf42c0154b51eef6419fdd51
Reviewed-on: https://go-review.googlesource.com/9998
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-17 01:08:42 +00:00
Russ Cox
512f75e8df runtime: replace GC programs with simpler encoding, faster decoder
Small types record the location of pointers in their memory layout
by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries,
and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using
a bitmap for a large type containing arrays does not make sense:
if someone refers to the type [1<<28]*byte in a program in such
a way that the type information makes it into the binary, it would be
a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB
(for 1-bit entries) bitmap full of 1s into the binary or even to keep
one in memory during the execution of the program.

For large types containing arrays, it is much more compact to describe
the locations of pointers using a notation that can express repetition
than to lay out a bitmap of pointers. Go 1.4 included such a notation,
called "GC programs", but it was complex, required recursion during
decoding, and was generally slow. Dmitriy measured the execution of
these programs writing directly to the heap bitmap as being 7x slower
than copying from a preunrolled 4-bit mask (and frankly that code was
not terribly fast either). For some tests, unrollgcprog1 was seen costing
as much as 3x more than the rest of malloc combined.

This CL introduces a different form for the GC programs. They use a
simple Lempel-Ziv-style encoding of the 1-bit pointer information,
in which the only operations are (1) emit the following n bits
and (2) repeat the last n bits c more times. This encoding can be
generated directly from the Go type information (using repetition
only for arrays or large runs of non-pointer data) and it can be decoded
very efficiently. In particular the decoding requires little state and
no recursion, so that the entire decoding can run without any memory
accesses other than the reads of the encoding and the writes of the
decoded form to the heap bitmap. For recursive types like arrays of
arrays of arrays, the inner instructions are only executed once, not
n times, so that large repetitions run at full speed. (In contrast, large
repetitions in the old programs repeated the individual bit-level layout
of the inner data over and over.) The result is as much as 25x faster
decoding compared to the old form.
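
A toy decoder for such a program, using an invented struct-based
instruction layout (the real program is a packed byte stream and the
real decoder writes straight into the heap bitmap rather than building
a slice):

    package main

    import "fmt"

    // gcInstr is one instruction in a toy version of the encoding:
    // either emit a run of literal pointer bits, or repeat the last
    // n decoded bits count more times.
    type gcInstr struct {
        repeat bool
        n      int    // bits to emit, or length of the repeated suffix
        count  int    // for repeat: how many extra copies to append
        bits   []byte // for emit: one byte (0 or 1) per word
    }

    // decode expands a program into a flat 1-bit-per-word pointer mask.
    // No recursion and no state beyond the output produced so far:
    // a repeat instruction just copies the tail of the output.
    func decode(prog []gcInstr) []byte {
        var mask []byte
        for _, ins := range prog {
            if !ins.repeat {
                mask = append(mask, ins.bits[:ins.n]...)
                continue
            }
            start := len(mask) - ins.n
            for c := 0; c < ins.count; c++ {
                mask = append(mask, mask[start:start+ins.n]...)
            }
        }
        return mask
    }

    func main() {
        // Hypothetical [4]struct{ p *byte; pad [6]uintptr }:
        // emit the 7-bit element layout once, repeat it 3 more times.
        prog := []gcInstr{
            {n: 7, bits: []byte{1, 0, 0, 0, 0, 0, 0}},
            {repeat: true, n: 7, count: 3},
        }
        fmt.Println(decode(prog)) // 28 bits, one per word of the array
    }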

Because the old decoder was so slow, Go 1.4 had three (or so) cases
for how to set the heap bitmap bits for an allocation of a given type:

(1) If the type had an even number of words up to 32 words, then
the 4-bit pointer mask for the type fit in no more than 16 bytes;
store the 4-bit pointer mask directly in the binary and copy from it.

(1b) If the type had an odd number of words up to 15 words, then
the 4-bit pointer mask for the type, doubled to end on a byte boundary,
fit in no more than 16 bytes; store that doubled mask directly in the
binary and copy from it.

(2) If the type had an even number of words up to 128 words,
or an odd number of words up to 63 words (again due to doubling),
then the 4-bit pointer mask would fit in a 64-byte unrolled mask.
Store a GC program in the binary, but leave space in the BSS for
the unrolled mask. Execute the GC program to construct the mask the
first time it is needed, and thereafter copy from the mask.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
(This is the case that was 7x slower than the other two.)

Because the new pointer masks store 1-bit entries instead of 4-bit
entries and because using the decoder no longer carries a significant
overhead, after this CL (that is, for Go 1.5) there are only two cases:

(1) If the type is 128 words or less (no condition about odd or even),
store the 1-bit pointer mask directly in the binary and use it to
initialize the heap bitmap during malloc. (Implemented in CL 9702.)

(2) There is no case 2 anymore.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.

Executing the GC program directly into the heap bitmap (case (3) above)
was disabled for the Go 1.5 dev cycle, both to avoid needing to use
GC programs for typedmemmove and to avoid updating that code as
the heap bitmap format changed. Typedmemmove no longer uses this
type information; as of CL 9886 it uses the heap bitmap directly.
Now that the heap bitmap format is stable, we reintroduce GC programs
and their space savings.

Benchmarks for heapBitsSetType, before this CL vs this CL:

name                    old mean               new mean              delta
SetTypePtr              7.59ns × (0.99,1.02)   5.16ns × (1.00,1.00)  -32.05% (p=0.000)
SetTypePtr8             21.0ns × (0.98,1.05)   21.4ns × (1.00,1.00)     ~    (p=0.179)
SetTypePtr16            24.1ns × (0.99,1.01)   24.6ns × (1.00,1.00)   +2.41% (p=0.001)
SetTypePtr32            31.2ns × (0.99,1.01)   32.4ns × (0.99,1.02)   +3.72% (p=0.001)
SetTypePtr64            45.2ns × (1.00,1.00)   47.2ns × (1.00,1.00)   +4.42% (p=0.000)
SetTypePtr126           75.8ns × (0.99,1.01)   79.1ns × (1.00,1.00)   +4.25% (p=0.000)
SetTypePtr128           74.3ns × (0.99,1.01)   77.6ns × (1.00,1.01)   +4.55% (p=0.000)
SetTypePtrSlice          726ns × (1.00,1.01)    712ns × (1.00,1.00)   -1.95% (p=0.001)
SetTypeNode1            20.0ns × (0.99,1.01)   20.7ns × (1.00,1.00)   +3.71% (p=0.000)
SetTypeNode1Slice        112ns × (1.00,1.00)    113ns × (0.99,1.00)     ~    (p=0.070)
SetTypeNode8            23.9ns × (1.00,1.00)   24.7ns × (1.00,1.01)   +3.18% (p=0.000)
SetTypeNode8Slice        294ns × (0.99,1.02)    287ns × (0.99,1.01)   -2.38% (p=0.015)
SetTypeNode64           52.8ns × (0.99,1.03)   51.8ns × (0.99,1.01)     ~    (p=0.069)
SetTypeNode64Slice      1.13µs × (0.99,1.05)   1.14µs × (0.99,1.00)     ~    (p=0.767)
SetTypeNode64Dead       36.0ns × (1.00,1.01)   32.5ns × (0.99,1.00)   -9.67% (p=0.000)
SetTypeNode64DeadSlice  1.43µs × (0.99,1.01)   1.40µs × (1.00,1.00)   -2.39% (p=0.001)
SetTypeNode124          75.7ns × (1.00,1.01)   79.0ns × (1.00,1.00)   +4.44% (p=0.000)
SetTypeNode124Slice     1.94µs × (1.00,1.01)   2.04µs × (0.99,1.01)   +4.98% (p=0.000)
SetTypeNode126          75.4ns × (1.00,1.01)   77.7ns × (0.99,1.01)   +3.11% (p=0.000)
SetTypeNode126Slice     1.95µs × (0.99,1.01)   2.03µs × (1.00,1.00)   +3.74% (p=0.000)
SetTypeNode128          85.4ns × (0.99,1.01)  122.0ns × (1.00,1.00)  +42.89% (p=0.000)
SetTypeNode128Slice     2.20µs × (1.00,1.01)   2.36µs × (0.98,1.02)   +7.48% (p=0.001)
SetTypeNode130          83.3ns × (1.00,1.00)  123.0ns × (1.00,1.00)  +47.61% (p=0.000)
SetTypeNode130Slice     2.30µs × (0.99,1.01)   2.40µs × (0.98,1.01)   +4.37% (p=0.000)
SetTypeNode1024          498ns × (1.00,1.00)    537ns × (1.00,1.00)   +7.96% (p=0.000)
SetTypeNode1024Slice    15.5µs × (0.99,1.01)   17.8µs × (1.00,1.00)  +15.27% (p=0.000)

The above compares always using a cached pointer mask (and the
corresponding waste of memory) against using the programs directly.
Some slowdown is expected, in exchange for having a better general algorithm.
The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024,
along with the slice variants of those.
It is possible that the cutoff of 128 words (bits) should be raised
in a followup CL, but even with this low cutoff the GC programs are
faster than Go 1.4's "fast path" non-GC program case.

Benchmarks for heapBitsSetType, Go 1.4 vs this CL:

name                    old mean              new mean              delta
SetTypePtr              6.89ns × (1.00,1.00)  5.17ns × (1.00,1.00)  -25.02% (p=0.000)
SetTypePtr8             25.8ns × (0.97,1.05)  21.5ns × (1.00,1.00)  -16.70% (p=0.000)
SetTypePtr16            39.8ns × (0.97,1.02)  24.7ns × (0.99,1.01)  -37.81% (p=0.000)
SetTypePtr32            68.8ns × (0.98,1.01)  32.2ns × (1.00,1.01)  -53.18% (p=0.000)
SetTypePtr64             130ns × (1.00,1.00)    47ns × (1.00,1.00)  -63.67% (p=0.000)
SetTypePtr126            241ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.25% (p=0.000)
SetTypePtr128           2.07µs × (1.00,1.00)  0.08µs × (1.00,1.00)  -96.27% (p=0.000)
SetTypePtrSlice         1.05µs × (0.99,1.01)  0.72µs × (0.99,1.02)  -31.70% (p=0.000)
SetTypeNode1            16.0ns × (0.99,1.01)  20.8ns × (0.99,1.03)  +29.91% (p=0.000)
SetTypeNode1Slice        184ns × (0.99,1.01)   112ns × (0.99,1.01)  -39.26% (p=0.000)
SetTypeNode8            29.5ns × (0.97,1.02)  24.6ns × (1.00,1.00)  -16.50% (p=0.000)
SetTypeNode8Slice        624ns × (0.98,1.02)   285ns × (1.00,1.00)  -54.31% (p=0.000)
SetTypeNode64            135ns × (0.96,1.08)    52ns × (0.99,1.02)  -61.32% (p=0.000)
SetTypeNode64Slice      3.83µs × (1.00,1.00)  1.14µs × (0.99,1.01)  -70.16% (p=0.000)
SetTypeNode64Dead        134ns × (0.99,1.01)    32ns × (1.00,1.01)  -75.74% (p=0.000)
SetTypeNode64DeadSlice  3.83µs × (0.99,1.00)  1.40µs × (1.00,1.01)  -63.42% (p=0.000)
SetTypeNode124           240ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.05% (p=0.000)
SetTypeNode124Slice     7.27µs × (1.00,1.00)  2.04µs × (1.00,1.00)  -71.95% (p=0.000)
SetTypeNode126          2.06µs × (0.99,1.01)  0.08µs × (0.99,1.01)  -96.23% (p=0.000)
SetTypeNode126Slice     64.4µs × (1.00,1.00)   2.0µs × (1.00,1.00)  -96.85% (p=0.000)
SetTypeNode128          2.09µs × (1.00,1.01)  0.12µs × (1.00,1.00)  -94.15% (p=0.000)
SetTypeNode128Slice     65.4µs × (1.00,1.00)   2.4µs × (0.99,1.03)  -96.39% (p=0.000)
SetTypeNode130          2.11µs × (1.00,1.00)  0.12µs × (1.00,1.00)  -94.18% (p=0.000)
SetTypeNode130Slice     66.3µs × (1.00,1.00)   2.4µs × (0.97,1.08)  -96.34% (p=0.000)
SetTypeNode1024         16.0µs × (1.00,1.01)   0.5µs × (1.00,1.00)  -96.65% (p=0.000)
SetTypeNode1024Slice     512µs × (1.00,1.00)    18µs × (0.98,1.04)  -96.45% (p=0.000)

SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation.
Both Go 1.4 and this CL are using pointer bitmaps for this case,
so that's an overall 3x speedup for using pointer bitmaps.

SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation.
Both Go 1.4 and this CL are running the GC program for this case,
so that's an overall 17x speedup when using GC programs (and
I've seen >20x on other systems).

Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against
this CL's SetTypeNode128 (GC program), the slow path in the
code in this CL is 2x faster than the fast path in Go 1.4.

The Go 1 benchmarks are basically unaffected compared to just before this CL.

Go 1 benchmarks, before this CL vs this CL:

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.97,1.04)   5.91s × (0.96,1.04)    ~    (p=0.306)
Fannkuch11              4.38s × (1.00,1.00)   4.37s × (1.00,1.01)  -0.22% (p=0.006)
FmtFprintfEmpty        90.7ns × (0.97,1.10)  89.3ns × (0.96,1.09)    ~    (p=0.280)
FmtFprintfString        282ns × (0.98,1.04)   287ns × (0.98,1.07)  +1.72% (p=0.039)
FmtFprintfInt           269ns × (0.99,1.03)   282ns × (0.97,1.04)  +4.87% (p=0.000)
FmtFprintfIntInt        478ns × (0.99,1.02)   481ns × (0.99,1.02)  +0.61% (p=0.048)
FmtFprintfPrefixedInt   399ns × (0.98,1.03)   400ns × (0.98,1.05)    ~    (p=0.533)
FmtFprintfFloat         563ns × (0.99,1.01)   570ns × (1.00,1.01)  +1.37% (p=0.000)
FmtManyArgs            1.89µs × (0.99,1.01)  1.92µs × (0.99,1.02)  +1.88% (p=0.000)
GobDecode              15.2ms × (0.99,1.01)  15.2ms × (0.98,1.05)    ~    (p=0.609)
GobEncode              11.6ms × (0.98,1.03)  11.9ms × (0.98,1.04)  +2.17% (p=0.000)
Gzip                    648ms × (0.99,1.01)   648ms × (1.00,1.01)    ~    (p=0.835)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.169)
HTTPClientServer       90.5µs × (0.98,1.03)  91.5µs × (0.98,1.04)  +1.04% (p=0.045)
JSONEncode             31.5ms × (0.98,1.03)  31.4ms × (0.98,1.03)    ~    (p=0.549)
JSONDecode              111ms × (0.99,1.01)   107ms × (0.99,1.01)  -3.21% (p=0.000)
Mandelbrot200          6.01ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.878)
GoParse                6.54ms × (0.99,1.02)  6.61ms × (0.99,1.03)  +1.08% (p=0.004)
RegexpMatchEasy0_32     160ns × (1.00,1.01)   161ns × (1.00,1.00)  +0.40% (p=0.000)
RegexpMatchEasy0_1K     560ns × (0.99,1.01)   559ns × (0.99,1.01)    ~    (p=0.088)
RegexpMatchEasy1_32     138ns × (0.99,1.01)   138ns × (1.00,1.00)    ~    (p=0.380)
RegexpMatchEasy1_1K     877ns × (1.00,1.00)   878ns × (1.00,1.00)    ~    (p=0.157)
RegexpMatchMedium_32    251ns × (0.99,1.00)   251ns × (1.00,1.01)  +0.28% (p=0.021)
RegexpMatchMedium_1K   72.6µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.539)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.378)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.067)
Revcomp                 904ms × (0.99,1.02)   904ms × (0.99,1.01)    ~    (p=0.943)
Template                125ms × (0.99,1.02)   127ms × (0.99,1.01)  +1.79% (p=0.000)
TimeParse               627ns × (0.99,1.01)   622ns × (0.99,1.01)  -0.88% (p=0.000)
TimeFormat              655ns × (0.99,1.02)   655ns × (0.99,1.02)    ~    (p=0.976)

For the record, Go 1 benchmarks, Go 1.4 vs this CL:

name                   old mean              new mean              delta
BinaryTree17            4.61s × (0.97,1.05)   5.91s × (0.98,1.03)  +28.35% (p=0.000)
Fannkuch11              4.40s × (0.99,1.03)   4.41s × (0.99,1.01)     ~    (p=0.212)
FmtFprintfEmpty         102ns × (0.99,1.01)    84ns × (0.99,1.02)  -18.38% (p=0.000)
FmtFprintfString        302ns × (0.98,1.01)   303ns × (0.99,1.02)     ~    (p=0.203)
FmtFprintfInt           313ns × (0.97,1.05)   270ns × (0.99,1.01)  -13.69% (p=0.000)
FmtFprintfIntInt        524ns × (0.98,1.02)   477ns × (0.99,1.00)   -8.87% (p=0.000)
FmtFprintfPrefixedInt   424ns × (0.98,1.02)   386ns × (0.99,1.01)   -8.96% (p=0.000)
FmtFprintfFloat         652ns × (0.98,1.02)   594ns × (0.97,1.05)   -8.97% (p=0.000)
FmtManyArgs            2.13µs × (0.99,1.02)  1.94µs × (0.99,1.01)   -8.92% (p=0.000)
GobDecode              17.1ms × (0.99,1.02)  14.9ms × (0.98,1.03)  -13.07% (p=0.000)
GobEncode              13.5ms × (0.98,1.03)  11.5ms × (0.98,1.03)  -15.25% (p=0.000)
Gzip                    656ms × (0.99,1.02)   647ms × (0.99,1.01)   -1.29% (p=0.000)
Gunzip                  143ms × (0.99,1.02)   144ms × (0.99,1.01)     ~    (p=0.204)
HTTPClientServer       88.2µs × (0.98,1.02)  90.8µs × (0.98,1.01)   +2.93% (p=0.000)
JSONEncode             32.2ms × (0.98,1.02)  30.9ms × (0.97,1.04)   -4.06% (p=0.001)
JSONDecode              121ms × (0.98,1.02)   110ms × (0.98,1.05)   -8.95% (p=0.000)
Mandelbrot200          6.06ms × (0.99,1.01)  6.11ms × (0.98,1.04)     ~    (p=0.184)
GoParse                6.76ms × (0.97,1.04)  6.58ms × (0.98,1.05)   -2.63% (p=0.003)
RegexpMatchEasy0_32     195ns × (1.00,1.01)   155ns × (0.99,1.01)  -20.43% (p=0.000)
RegexpMatchEasy0_1K     479ns × (0.98,1.03)   535ns × (0.99,1.02)  +11.59% (p=0.000)
RegexpMatchEasy1_32     169ns × (0.99,1.02)   131ns × (0.99,1.03)  -22.44% (p=0.000)
RegexpMatchEasy1_1K    1.53µs × (0.99,1.01)  0.87µs × (0.99,1.02)  -43.07% (p=0.000)
RegexpMatchMedium_32    334ns × (0.99,1.01)   242ns × (0.99,1.01)  -27.53% (p=0.000)
RegexpMatchMedium_1K    125µs × (1.00,1.01)    72µs × (0.99,1.03)  -42.53% (p=0.000)
RegexpMatchHard_32     6.03µs × (0.99,1.01)  3.79µs × (0.99,1.01)  -37.12% (p=0.000)
RegexpMatchHard_1K      189µs × (0.99,1.02)   115µs × (0.99,1.01)  -39.20% (p=0.000)
Revcomp                 935ms × (0.96,1.03)   926ms × (0.98,1.02)     ~    (p=0.083)
Template                146ms × (0.97,1.05)   119ms × (0.99,1.01)  -18.37% (p=0.000)
TimeParse               660ns × (0.99,1.01)   624ns × (0.99,1.02)   -5.43% (p=0.000)
TimeFormat              670ns × (0.98,1.02)   710ns × (1.00,1.01)   +5.97% (p=0.000)

This CL is a bit larger than I would like, but the compiler, linker, runtime,
and package reflect all need to be in sync about the format of these programs,
so there is no easy way to split this into independent changes (at least
while keeping the build working at each change).

Fixes #9625.
Fixes #10524.

Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a
Reviewed-on: https://go-review.googlesource.com/9888
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
2015-05-16 00:38:17 +00:00
Russ Cox
d820d5f3ab runtime: make mapzero not crash on arm
Change-Id: I40e8a4a2e62253233b66f6a2e61e222437292c31
Reviewed-on: https://go-review.googlesource.com/10151
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-15 20:14:41 +00:00