qbit/go - go - Tape:neT

qbit/go

mirror of https://github.com/golang/go synced 2024-11-20 05:24:41 -07:00

Author	SHA1	Message	Date
Rob Pike	ed9a4c91c2	runtime: document that GC blocks the whole program No code changes. Just make it clear that runtime.GC is not concurrent. Change-Id: I00a99ebd26402817c665c9a128978cef19f037be Reviewed-on: https://go-review.googlesource.com/12345 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-07-17 22:40:21 +00:00
Austin Clements	e33d6b3d4d	runtime: remove out-of-date comment An out-of-date comment snuck in to `cc8f544`. Remove it. Change-Id: I5bc7c17e737d1cabe57b88de06d7579c60ca28ff Reviewed-on: https://go-review.googlesource.com/12328 Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2015-07-17 16:52:32 +00:00
Austin Clements	cc8f544198	runtime: don't free large spans until heapBitsSweepSpan returns This fixes a race between 1) sweeping and freeing an unmarked large span and 2) reusing that span and allocating from it. This race arises because mSpan_Sweep returns spans for large objects to the heap before heapBitsSweepSpan clears the mark bit on the object in the span. Specifically, the following sequence of events can lead to an incorrectly zeroed bitmap byte, which causes the garbage collector to not trace any pointers in that object (the pointer bits for the first four words are cleared, and the scan bits are also cleared, so it looks like a no-scan object). 1) P0 calls mSpan_Sweep on a large span S0 with an unmarked object on it. 2) mSpan_Sweep calls heapBitsSweepSpan, which invokes the callback for the one (unmarked) object on the span. 3) The callback calls mHeap_Free, which makes span S0 available for allocation, but this is too early. 4) P1 grabs this S0 from the heap to use for allocation. 5) P1 allocates an object on this span and writes that object's type bits to the bitmap. 6) P0 returns from the callback to heapBitsSweepSpan. heapBitsSweepSpan clears the byte containing the mark, even though this span is now owned by P1 and this byte contains important bitmap information. This fixes this problem by simply delaying the mHeap_Free until after the heapBitsSweepSpan. I think the overall logic of mSpan_Sweep could be simplified now, but this seems like the minimal change. Fixes #11617. Change-Id: I6b1382c7e7cc35f81984467c0772fe9848b7522a Reviewed-on: https://go-review.googlesource.com/12320 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Rob Pike <r@golang.org>	2015-07-17 03:34:11 +00:00
Russ Cox	a93e5b4ff9	Revert "runtime: diagnose invalid pointers during GC" Broke arm64. Update #9880. This reverts commit `38d9b2a3a9`. Change-Id: I35fa21005af2183828a9d8b195ebcfbe45ec5138 Reviewed-on: https://go-review.googlesource.com/12247 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-16 01:49:58 +00:00
Austin Clements	e42413cecc	runtime: fix saved PC/SP after safe-point function in syscall Running a safe-point function on syscall entry uses systemstack() and hence clobbers g.sched.pc and g.sched.sp. Fix this by re-saving them after the systemstack, just like in the other uses of systemstack in reentersyscall. Change-Id: I47868a53eba24d81919fda56ef6bbcf72f1f922e Reviewed-on: https://go-review.googlesource.com/12125 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-15 21:09:16 +00:00
Austin Clements	edfc979725	runtime: run safe-point function before entering _Psyscall Currently, we run a P's safe-point function immediately after entering _Psyscall state. This is unsafe, since as soon as we put the P in _Psyscall, we no longer control the P and another M may claim it. We'll still run the safe-point function only once (because doing so races on an atomic), but the P may no longer be at a safe-point when we do so. In particular, this means that the use of forEachP to dispose all P's gcw caches is unsafe. A P may enter a syscall, run the safe-point function, and dispose the P's gcw cache concurrently with another M claiming the P and attempting to use its gcw cache. If this happens, we may empty the gcw's workbuf after putting it on work.{full,partial}, or add pointers to it after putting it in work.empty. This will cause an assertion failure when we later pop the workbuf from the list and its object count is inconsistent with the list we got it from. Fix this by running the safe-point function just before putting the P in _Psyscall. Related to #11640. This probably fixes this issue, but while I'm able to show that we can enter a bad safe-point state as a result of this, I can't reproduce that specific failure. Change-Id: I6989c8ca7ef2a4a941ae1931e9a0748cbbb59434 Reviewed-on: https://go-review.googlesource.com/12124 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-15 21:09:07 +00:00
Matthew Dempsky	64e53337af	runtime: fix go:nowritebarrier annotation on gcmarkwb_m Change-Id: I945d46d3bb63f1992bce0d0b1e89e75cac9bbd54 Reviewed-on: https://go-review.googlesource.com/12271 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-07-15 21:06:13 +00:00
Russ Cox	38d9b2a3a9	runtime: diagnose invalid pointers during GC For #9880. Let's see what breaks. Change-Id: Ic8b99a604e60177a448af5f7173595feed607875 Reviewed-on: https://go-review.googlesource.com/10818 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com>	2015-07-15 05:42:06 +00:00
Russ Cox	3290e9c145	runtime: fix build on non-x86 machines Fixes #11656 (again). Change-Id: I170ff10bfbdb0f34e57c11de42b6ee5291837813 Reviewed-on: https://go-review.googlesource.com/12142 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-14 04:42:12 +00:00
Austin Clements	777ab5ce1a	runtime: fix MemStats.{PauseNS,PauseEnd,PauseTotalNS,LastGC} These memstats are currently being computed by gcMark, which was appropriate in Go 1.4, but gcMark is now just one part of a bigger picture. In particular, it can't account for the sweep termination pause time, it can't account for all of the mark termination pause time, and the reported "pause end" and "last GC" times will be slightly earlier than they really are. Lift computing of these statistics into func gc, which has the appropriate visibility into the process to compute them correctly. Fixes one of the issues in #10323. This does not add new statistics appropriate to the concurrent collector; it simply fixes existing statistics that are being misreported. Change-Id: I670cb16594a8641f6b27acf4472db15b6e8e086e Reviewed-on: https://go-review.googlesource.com/11794 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-13 23:32:59 +00:00
Austin Clements	ad60cd8b92	runtime: report MemStats.PauseEnd in UNIX time Currently we report MemStats.PauseEnd in nanoseconds, but with no particular 0 time. On Linux, the 0 time is when the host started. On Darwin, it's the UNIX epoch. This is also inconsistent with the other absolute time in MemStats, LastGC, which is always reported in nanoseconds since 1970. Fix PauseEnd so it's always reported in nanoseconds since 1970, like LastGC. Fixes one of the issues raised in #10323. Change-Id: Ie2fe3169d45113992363a03b764f4e6c47e5c6a8 Reviewed-on: https://go-review.googlesource.com/11801 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-13 23:32:02 +00:00
Russ Cox	0bcdffeea6	runtime: fix x86 stack trace for call to heap memory Fixes #11656. Change-Id: Ib81d583e4b004e67dc9d2f898fd798112434e7a9 Reviewed-on: https://go-review.googlesource.com/12026 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Russ Cox <rsc@golang.org>	2015-07-13 19:42:35 +00:00
Russ Cox	683311175c	runtime: fix race in TestChanSendBarrier Fixes race detector build. Change-Id: I8bdc78d57487580e6b5b8c415df4653a1ba69e37 Reviewed-on: https://go-review.googlesource.com/12087 Reviewed-by: Austin Clements <austin@google.com>	2015-07-13 19:42:20 +00:00
Russ Cox	8c3533c89b	runtime: add memory barrier for sync send in select Missed select case when adding the barrier last time. All the more reason to refactor this code in Go 1.6. Fixes #11643. Change-Id: Ib0d19d6e0939296c0a3e06dda5e9b76f813bbc7e Reviewed-on: https://go-review.googlesource.com/12086 Reviewed-by: Austin Clements <austin@google.com>	2015-07-13 19:10:22 +00:00
Brad Fitzpatrick	2ae77376f7	all: link to https instead of http The one in misc/makerelease/makerelease.go is particularly bad and probably warrants rotating our keys. I didn't update old weekly notes, and reverted some changes involving test code for now, since we're late in the Go 1.5 freeze. Otherwise, the rest are all auto-generated changes, and all manually reviewed. Change-Id: Ia2753576ab5d64826a167d259f48a2f50508792d Reviewed-on: https://go-review.googlesource.com/12048 Reviewed-by: Rob Pike <r@golang.org>	2015-07-11 14:36:33 +00:00
Elias Naur	b3a8b0574a	runtime: abort on fatal errors and panics in c-shared and c-archive modes The default behaviour for fatal errors and runtime panics is to dump the goroutine stack traces and exit with code 2. However, when the process is owned by foreign code, it is suprising and inappropriate to suddenly exit the whole process, even on fatal errors. Instead, re-use the crash behaviour from GOTRACEBACK=crash and abort. The motivating use case is issue #11382, where an Android crash reporter is confused by an exiting process, but I believe the aborting behaviour is appropriate for all cases where Go does not own the process. The change is simple and contained and will enable reliable crash reporting for Android apps in Go 1.5, but I'll leave it to others to judge whether it is too late for Go 1.5. Fixes #11382 Change-Id: I477328e1092f483591c99da1fbb8bc4411911785 Reviewed-on: https://go-review.googlesource.com/12032 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-07-11 11:39:05 +00:00
Alex Brainman	d5004ee69e	runtime: use AddVectoredContinueHandler on Windows XP amd64 Recent change (CL 10370) unexpectedly broke TestRaiseException on Windows XP amd64. I still do not know why. But reverting old CL 8165 fixes the problem. This effectively makes Windows XP amd64 use AddVectoredContinueHandler instead of SetUnhandledExceptionFilter for exception handling. That is what we do for all recent Windows versions too. Fixes #11481 Change-Id: If2e8037711f05bf97e3c69f5a8d86af67c58f6fc Reviewed-on: https://go-review.googlesource.com/11888 Run-TryBot: Alex Brainman <alex.brainman@gmail.com> Reviewed-by: Daniel Theophanes <kardianos@gmail.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-07-11 07:02:57 +00:00
Ian Lance Taylor	6a90b1d621	runtime, cmd/go: fix tests to work when GOROOT_FINAL is set When GOROOT_FINAL is set when running all.bash, the tests are run before the files are copied to GOROOT_FINAL. The tests are run with GOROOT set, so most work fine. This fixes two cases that do not. In cmd/go/go_test.go we were explicitly removing GOROOT from the environment, causing tests that did not themselves explicitly set GOROOT to fail. There was no need to explicitly remove GOROOT, so don't do it. If people choose to run "go test cmd/go" with a bad GOROOT, that is their own lookout. In the runtime GDB test, the linker has told gdb to find the support script in GOROOT_FINAL, which will fail. Check for that case, and skip the test when we see it. Fixes #11652. Change-Id: I4d3a32311e3973c30fd8a79551aaeab6789d0451 Reviewed-on: https://go-review.googlesource.com/12021 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-07-10 21:29:37 +00:00
Ian Lance Taylor	2de67e9974	runtime: clarify that NumCPU returns only available CPUs Update #11609. Change-Id: Ie363facf13f5e62f1af4a8bdc42a18fb36e16ebf Reviewed-on: https://go-review.googlesource.com/12022 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-07-10 21:28:49 +00:00
Austin Clements	4b2774f5ea	runtime: make sysmon-triggered GC concurrent sysmon triggers a GC if there has been no GC for two minutes. Currently, this is a STW GC. There is no reason for this to be STW, so make it concurrent. Fixes #10261. Change-Id: I92f3ac37272d5c2a31480ff1fa897ebad08775a9 Reviewed-on: https://go-review.googlesource.com/11955 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-09 05:53:21 +00:00
David Chase	7929a0ddfa	cmd/compile: initialize line number properly for temporaries The expansion of structure, array, slice, and map literals does not use the right line number in its introduced assignments to temporaries, which leads to incorrect line number attribution for expressions in those literals. Inlining also incorrectly replaced the line numbers of args to inlined functions. This was revealed in CL 9721 because a now-avoided temporary assignment introduced the correct line number. I.e. before CL 9721 "tmp_wrongline := expr" was transformed to "tmp_rightline := expr; tmp_wrongline := tmp_rightline" Also includes a repair to CL 10334 involving line numbers where a spurious -1 remained (should have been 0, now is 0). Fixes #11400. Change-Id: I3a4687efe463977fa1e2c996606f4d91aaf22722 Reviewed-on: https://go-review.googlesource.com/11730 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Sameer Ajmani <sameer@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-07 21:30:59 +00:00
Russ Cox	2028077899	runtime: randomize scheduling in -race mode Basic randomization of goroutine scheduling for -race mode. It is probably possible to do much better (there's a paper linked in the issue that I haven't read, for example), but this suffices to introduce at least some unpredictability into the scheduling order. The goal here is to have _something_ for Go 1.5, so that we don't start hitting more of these scheduling order-dependent bugs if we change the scheduler order again in Go 1.6. For #11372. Change-Id: Idf1154123fbd5b7a1ee4d339e93f97635cc2bacb Reviewed-on: https://go-review.googlesource.com/11795 Reviewed-by: Austin Clements <austin@google.com>	2015-07-07 21:27:38 +00:00
Russ Cox	3b6e86f48a	cmd/compile: fix race detector handling of OBLOCK nodes Fixes #7561 correctly. Fixes #9137. Change-Id: I7f27e199d7101b785a7645f789e8fe41a405a86f Reviewed-on: https://go-review.googlesource.com/11713 Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-06-30 19:25:18 +00:00
Russ Cox	8b99bb7b8c	runtime: fix broken arm builds Change-Id: I08de33aacb3fc932722286d69b1dd70ffe787c89 Reviewed-on: https://go-review.googlesource.com/11697 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-29 17:33:23 +00:00
Russ Cox	434e0bc0a0	cmd/link: record missing pcdata tables correctly The old code was recording the current table output offset, so the table from the next function would be used instead of the runtime realizing that there was no table at all. Add debug constant in runtime to check this for every function at startup. It's too expensive to do that by default, but we can do the last five functions. The end of the table is usually where the C symbols end up, so that's where the problems typically are. Fixes #10747. Fixes #11396. Change-Id: I13592e78017969fc22979fa902e19e1b151d41b1 Reviewed-on: https://go-review.googlesource.com/11657 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Russ Cox <rsc@golang.org>	2015-06-29 16:07:14 +00:00
Austin Clements	1b917484a8	runtime: reset mark state before checkmark and gctrace=2 mark Currently we fail to reset the live heap accounting state before the checkmark mark and before the gctrace=2 extra mark. As a result, if either are enabled, at the end of GC it thinks there are 0 bytes of live heap, which causes the GC controller to initiate a new GC immediately, regardless of the true heap size. Fix this by factoring this state reset into a function and calling it before all three possible marks. This function should be merged with gcResetGState, but doing so requires some additional cleanup, so it will wait for after the freeze. Filed #11427 for this cleanup. Fixes #10492. Change-Id: Ibe46348916fc8368fac6f086e142815c970a6f4d Reviewed-on: https://go-review.googlesource.com/11561 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-29 15:58:29 +00:00
Austin Clements	d57056ba26	runtime: don't free stack spans during GC Memory for stacks is manually managed by the runtime and, currently (with one exception) we free stack spans immediately when the last stack on a span is freed. However, the garbage collector assumes that spans can never transition from non-free to free during scan or mark. This disagreement makes it possible for the garbage collector to mark uninitialized objects and is blocking us from re-enabling the bad pointer test in the garbage collector (issue #9880). For example, the following sequence will result in marking an uninitialized object: 1. scanobject loads a pointer slot out of the object it's scanning. This happens to be one of the special pointers from the heap into a stack. Call the pointer p and suppose it points into X's stack. 2. X, running on another thread, grows its stack and frees its old stack. 3. The old stack happens to be large or was the last stack in its span, so X frees this span, setting it to state _MSpanFree. 4. The span gets reused as a heap span. 5. scanobject calls heapBitsForObject, which loads the span containing p, which is now in state _MSpanInUse, but doesn't necessarily have an object at p. The not-object at p gets marked, and at this point all sorts of things can go wrong. We already have a partial solution to this. When shrinking a stack, we put the old stack on a queue to be freed at the end of garbage collection. This was done to address exactly this problem, but wasn't a complete solution. This commit generalizes this solution to both shrinking and growing stacks. For stacks that fit in the stack pool, we simply don't free the span, even if its reference count reaches zero. It's fine to reuse the span for other stacks, and this enables that. At the end of GC, we sweep for cached stack spans with a zero reference count and free them. For larger stacks, we simply queue the stack span to be freed at the end of GC. Ideally, we would reuse these large stack spans the way we can small stack spans, but that's a more invasive change that will have to wait until after the freeze. Fixes #11267. Change-Id: Ib7f2c5da4845cc0268e8dc098b08465116972a71 Reviewed-on: https://go-review.googlesource.com/11502 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-29 15:33:40 +00:00
Austin Clements	f73b2fca84	runtime: remove unused _GCsweep state We don't use this state. _GCoff means we're sweeping in the background. This makes it clear in the next commit that _GCoff and only _GCoff means sweeping. Change-Id: I416324a829ba0be3794a6cf3cf1655114cb6e47c Reviewed-on: https://go-review.googlesource.com/11501 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-29 15:33:31 +00:00
Austin Clements	840965f8d7	runtime: always clear stack barriers on G exit Currently the runtime fails to clear a G's stack barriers in gfput if the G's stack allocation is _FixedStack bytes. This causes the runtime to panic if the following sequence of events happens: 1) The runtime installs stack barriers on a G. 2) The G exits by calling runtime.Goexit. Since this does not necessarily return through the stack barriers installed on the G, there may still be untriggered stack barriers left on the G's stack in recorded in g.stkbar. 3) The runtime calls gfput to add the exiting G to the free pool. If the G's stack allocation is _FixedStack bytes, we fail to clear g.stkbar. 4) A new G starts and allocates the G that was just added to the free pool. 5) The new G begins to execute and overwrites the stack slots that had stack barriers in them. 6) The garbage collector enters mark termination, attempts to remove stack barriers from the new G, and finds that they've been overwritten. Fix this by clearing the stack barriers in gfput in the case where it reuses the stack. Fixes #11256. Change-Id: I377c44258900e6bcc2d4b3451845814a8eeb2bcf Reviewed-on: https://go-review.googlesource.com/11461 Reviewed-by: Alex Brainman <alex.brainman@gmail.com> Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-29 15:02:30 +00:00
Alex Brainman	85d4d46f3c	runtime: store syscall parameters in m not on stack Stack can move during callback, so libcall struct cannot be stored on stack. asmstdcall updates return values and errno in libcall struct parameter, but these could be at different location when callback returns. Store these in m, so they are not affected by GC. Fixes #10406 Change-Id: Id01c9d2b4b44530494e6d9e9e1c875261ce477cd Reviewed-on: https://go-review.googlesource.com/10370 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-29 02:45:45 +00:00
Austin Clements	d231cb8249	runtime: repeat bitmap for slice of GCprog n-1 times, not n times Currently, to write out the bitmap of a slice of a type with a GCprog, we construct a new GCprog that executes the underlying type's GCprog to write out the bitmap once and then repeats those bits n more times. This results in n+1 repetitions of the bitmap, which is one more repetition than it should be. This corrupts the bitmap of the heap following the slice and may write past the mapped bitmap memory and segfault. Fix this by repeating the bitmap only n-1 more times. Fixes #11430. Change-Id: Ic24854363bffc5a755b66f257339f9309ada3aa5 Reviewed-on: https://go-review.googlesource.com/11570 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-26 21:52:51 +00:00
Dmitry Vyukov	77132c810d	runtime/race: enable tests that now pass These tests pass after cl/11417. Change-Id: Id98088c52e564208ce432e9717eddd672c42c66d Reviewed-on: https://go-review.googlesource.com/11551 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-26 18:54:11 +00:00
Shenghou Ma	21a4c93166	runtime: slightly clean up softfloat code Removes the remains of the old C based stepflt implementation. Also removed goto usage. Change-Id: Ida4742c49000fae4fea4649f28afde630ce4c577 Reviewed-on: https://go-review.googlesource.com/9600 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-26 17:51:22 +00:00
Russ Cox	32fddadd98	runtime: reduce slice growth during append to 2x The new inlined code for append assumed that it could pass the desired new cap to growslice, not the number of new elements. But growslice still interpreted the argument as the number of new elements, making it always grow by >2x (more precisely, 2x+1 rounded up to the next malloc block size). At the time, I had intended to change the other callers to use the new cap as well, but it's too late for that. Instead, introduce growslice_n for the old callers and keep growslice for the inlined (common case) caller. Fixes #11403. Filed #11419 to merge them. Change-Id: I1338b1e5b352f3be4e43641f44b652ef7195251b Reviewed-on: https://go-review.googlesource.com/11541 Reviewed-by: Austin Clements <austin@google.com>	2015-06-26 17:49:33 +00:00
Dmitry Vyukov	cd0a8ed48a	cmd/compile: add instrumentation of OKEY Instrument operands of OKEY. Also instrument OSLICESTR. Previously it was not needed because of preceeding bounds checks (which were instrumented). But the preceeding bounds checks have disappeared. Change-Id: I3b0de213e23cbcf5b8ef800abeded5eeeb3f8287 Reviewed-on: https://go-review.googlesource.com/11417 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-26 15:54:03 +00:00
Aaron Jacobs	8628688304	Fix several out of date references to 4g/5g/6g/8g/9g. Change-Id: Ifb8e4e13c7778a7c0113190051415e096f5db94f Reviewed-on: https://go-review.googlesource.com/11390 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Andrew Gerrand <adg@golang.org>	2015-06-26 03:38:21 +00:00
Dmitry Vyukov	055e1a3ae7	runtime/race: fix test driver At some point it silently stopped recognizing test output. Meanwhile two tests degraded... Change-Id: I90a0325fc9aaa16c3ef16b9c4c642581da2bb10c Reviewed-on: https://go-review.googlesource.com/11416 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-25 11:36:07 +00:00
Russ Cox	a9e536442e	runtime: set m.procid always on Linux For debuggers and other program inspectors. Fixes #9914. Change-Id: I670728cea28c045e6eaba1808c550ee2f34d16ff Reviewed-on: https://go-review.googlesource.com/11341 Reviewed-by: Austin Clements <austin@google.com>	2015-06-24 21:50:39 +00:00
Dmitry Vyukov	77082481d4	runtime/race: make test more robust The test is flaky on builders lately. I don't see any issues other than usage of very small sleeps. So increase the sleeps. Also take opportunity to refactor the code. On my machine this change significantly reduces failure rate with GOMAXPROCS=2. I can't reproduce the failure with GOMAXPROCS=1. Fixes #10726 Change-Id: Iea6f10cf3ce1be5c112a2375d51c13687a8ab4c9 Reviewed-on: https://go-review.googlesource.com/9803 Reviewed-by: Austin Clements <austin@google.com>	2015-06-24 17:53:25 +00:00
Austin Clements	a8ae93fd26	runtime: fix heap bitmap repeating with large scalar tails When heapBitsSetType repeats a source bitmap with a scalar tail (typ.ptrdata < typ.size), it lays out the tail upon reaching the end of the source bitmap by simply increasing the number of bits claimed to be in the incoming bit buffer. This causes later iterations to read the appropriate number of zeros out of the bit buffer before starting on the next repeat of the source bitmap. Currently, however, later iterations of the loop continue to read bits from the source bitmap regardless of the number of bits currently in the bit buffer. The bit buffer can only hold 32 or 64 bits, so if the scalar tail is large and the padding bits exceed the size of the bit buffer, the read from the source bitmap on the next iteration will shift the incoming bits into oblivion when it attempts to put them in the bit buffer. When the buffer does eventually shift down to where these bits were supposed to be, it will contain zeros. As a result, words that should be marked as pointers on later repetitions are marked as scalars, so the garbage collector does not trace them. If this is the only reference to an object, it will be incorrectly freed. Fix this by adding logic to drain the bit buffer down if it is large instead of reading more bits from the source bitmap. Fixes #11286. Change-Id: I964432c4b9f1cec334fc8c3da0ff16460203feb6 Reviewed-on: https://go-review.googlesource.com/11360 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-23 18:37:17 +00:00
Austin Clements	eabdd05892	runtime: document memory ordering for h_spans h_spans can be accessed concurrently without synchronization from other threads, which means it needs the appropriate memory barriers on weakly ordered machines. It happens to already have the necessary memory barriers because all accesses to h_spans are currently protected by the heap lock and the unlocks happen in exactly the places where release barriers are needed, but it's easy to imagine that this could change in the future. Document the fact that we're depending on the barrier implied by the unlock. Related to issue #9984. Change-Id: I1bc3c95cd73361b041c8c95cd4bb92daf8c1f94a Reviewed-on: https://go-review.googlesource.com/11361 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-06-23 18:28:46 +00:00
Rick Hudson	1ab9176e54	runtime: remove race and increase precision in pointer validation. This CL removes the single and racy use of mheap.arena_end outside of the bookkeeping done in mHeap_init and mHeap_Alloc. There should be no way for heapBitsForSpan to see a pointer to an invalid span. This CL makes the check for this more precise by checking that the pointer is between mheap_.arena_start and mheap_.arena_used instead of mheap_.arena_end. Change-Id: I1200b54353ee1eda002d92645fd8d26048600ceb Reviewed-on: https://go-review.googlesource.com/11342 Reviewed-by: Austin Clements <austin@google.com>	2015-06-22 20:37:23 +00:00
Austin Clements	9a3112bcae	runtime: one more Map{Bits,Spans} before arena_used update In order to avoid a race with a concurrent write barrier or garbage collector thread, any update to arena_used must be preceded by mapping the corresponding heap bitmap and spans array memory. Otherwise, the concurrent access may observe that a pointer falls within the heap arena, but then attempt to access unmapped memory to look up its span or heap bits. Commit `d57c889` fixed all of the places where we updated arena_used immediately before mapping the heap bitmap and spans, but it missed the one place where we update arena_used and depend on later code to update it again and map the bitmap and spans. This creates a window where the original race can still happen. This commit fixes this by mapping the heap bitmap and spans before this arena_used update as well. This code path is only taken when expanding the heap reservation on 32-bit over a hole in the address space, so these extra mmap calls should have negligible impact. Fixes #10212, #11324. Change-Id: Id67795e6c7563eb551873bc401e5cc997aaa2bd8 Reviewed-on: https://go-review.googlesource.com/11340 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-06-22 18:54:38 +00:00
Austin Clements	2a331ca8bb	runtime: document relaxed access to arena_used The unsynchronized accesses to mheap_.arena_used in the concurrent part of the garbage collector look like a problem waiting to happen. In fact, they are safe, but the reason is somewhat subtle and undocumented. This commit documents this reasoning. Related to issue #9984. Change-Id: Icdbf2329c1aa11dbe2396a71eb5fc2a85bd4afd5 Reviewed-on: https://go-review.googlesource.com/11254 Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-06-22 18:37:20 +00:00
Austin Clements	f5d494bbdf	runtime: ensure GC sees type-safe memory on weak machines Currently its possible for the garbage collector to observe uninitialized memory or stale heap bitmap bits on weakly ordered architectures such as ARM and PPC. On such architectures, the stores that zero newly allocated memory and initialize its heap bitmap may move after a store in user code that makes the allocated object observable by the garbage collector. To fix this, add a "publication barrier" (also known as an "export barrier") before returning from mallocgc. This is a store/store barrier that ensures any write done by user code that makes the returned object observable to the garbage collector will be ordered after the initialization performed by mallocgc. No barrier is necessary on the reading side because of the data dependency between loading the pointer and loading the contents of the object. Fixes one of the issues raised in #9984. Change-Id: Ia3d96ad9c5fc7f4d342f5e05ec0ceae700cd17c8 Reviewed-on: https://go-review.googlesource.com/11083 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Minux Ma <minux@golang.org> Reviewed-by: Martin Capitanio <capnm9@gmail.com> Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-19 15:29:50 +00:00
Alex Brainman	9d968cb47b	runtime: rename cgocall_errno and asmcgocall_errno into cgocall and asmcgocall Change-Id: I5917bea8bb35b0e725dcc56a68f3a70137cfc180 Reviewed-on: https://go-review.googlesource.com/9387 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-06-19 01:47:11 +00:00
Rick Hudson	90a19961f2	runtime: reduce latency by aggressively ending mark phase Some latency regressions have crept into our system over the past few weeks. This CL fixes those by having the mark phase more aggressively blacken objects so that the mark termination phase, a STW phase, has less work to do. Three approaches were taken when the mark phase believes it has no more work to do, ie all the work buffers are empty. If things have gone well the mark phase is correct and there is in fact little or no work. In that case the following items will take very little time. If the mark phase is wrong this CL will ferret that work out and give the mark phase a chance to deal with it concurrently before mark termination begins. When the mark phase first appears to be out of work, it does three things: 1) It switches from allocating white to allocating black to reduce the number of unmarked objects reachable only from stacks. 2) It flushes and disables per-P GC work caches so all work must be in globally visible work buffers. 3) It rescans the global roots---the BSS and data segments---so there are fewer objects to blacken during mark termination. We do not rescan stacks at this point, though that could be done in a later CL. After these steps, it again drains the global work buffers. On a lightly loaded machine the garbage benchmark has reduced the number of GC cycles with latency > 10 ms from 83 out of 4083 cycles down to 2 out of 3995 cycles. Maximum latency was reduced from 60+ msecs down to 20 ms. Change-Id: I152285b48a7e56c5083a02e8e4485dd39c990492 Reviewed-on: https://go-review.googlesource.com/10590 Reviewed-by: Austin Clements <austin@google.com>	2015-06-18 21:38:46 +00:00
Shenghou Ma	3925a7c5db	all: switch to the new deprecation convention While we're at it, move some misplaced comment blocks around. Change-Id: I1847d7f1ca1dbb8e5de737203c4ed6c66e112508 Reviewed-on: https://go-review.googlesource.com/10188 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-18 19:16:23 +00:00
Dmitry Vyukov	e72f5f67a1	runtime: fix tracing of syscallexit There were two issues. 1. Delayed EvGoSysExit could have been emitted during TraceStart, while it had not yet emitted EvGoInSyscall. 2. Delayed EvGoSysExit could have been emitted during next tracing session. Fixes #10476 Fixes #11262 Change-Id: Iab68eb31cf38eb6eb6eee427f49c5ca0865a8c64 Reviewed-on: https://go-review.googlesource.com/9132 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-18 13:59:55 +00:00
Alex Brainman	2858b73843	runtime: remove cgocall and asmcgocall In preparation for rename of cgocall_errno into cgocall and asmcgocall_errno into asmcgocall in the fllowinng CL. rsc requested CL 9387 to be split into two parts. This is first part. Change-Id: I7434f0e4b44dd37017540695834bfcb1eebf0b2f Reviewed-on: https://go-review.googlesource.com/11166 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-06-18 04:42:53 +00:00
Russ Cox	cfa3eda587	runtime: fix race in scanvalid assertion Change-Id: I389b2e10fe667eaa55f87b71b1e004994694d4a3 Reviewed-on: https://go-review.googlesource.com/11173 Reviewed-by: Austin Clements <austin@google.com>	2015-06-17 20:12:37 +00:00
Russ Cox	3c60e6e8cf	runtime: fix races in stack scan This fixes a hang during runtime.TestTraceStress. It also fixes double-scan of stacks, which leads to stack barrier installation failures. Both of these have shown up as flaky failures on the dashboard. Fixes #10941. Change-Id: Ia2a5991ce2c9f43ba06ae1c7032f7c898dc990e0 Reviewed-on: https://go-review.googlesource.com/11089 Reviewed-by: Austin Clements <austin@google.com>	2015-06-17 17:56:26 +00:00
Russ Cox	08e25fc1ba	cmd/compile: introduce //go:systemstack annotation //go:systemstack means that the function must run on the system stack. Add one use in runtime as a demonstration. Fixes #9174. Change-Id: I8d4a509cb313541426157da703f1c022e964ace4 Reviewed-on: https://go-review.googlesource.com/10840 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com>	2015-06-17 14:23:00 +00:00
Yongjian Xu	e3dc59f33d	runtime: fix typos in os_linux_arm.go Change-Id: I750900e0aed9ec528fea3f442c35196773e3ba5e Reviewed-on: https://go-review.googlesource.com/11163 Reviewed-by: Minux Ma <minux@golang.org>	2015-06-17 08:51:59 +00:00
Austin Clements	7387121ddb	runtime: account for stack guard when shrinking the stack Currently, when shrinkstack computes whether the halved stack allocation will have enough room for the stack, it accounts for the stack space that's actively in use but fails to leave extra room for the stack guard space. As a result, if the minimum stack size is small enough or the guard large enough, it may shrink the stack and leave less than enough room to run nosplit functions. If the next function called after the stack shrink is a nosplit function, it may overflow the stack without noticing and overwrite non-stack memory. We don't think this is happening under normal conditions right now. The minimum stack allocation is 2K and the guard is 640 bytes. The "worst case" stack shrink is from 4K (4048 bytes after stack barrier array reservation) to 2K (2016 bytes after stack barrier array reservation), which means the largest "used" size that will qualify for shrinking is 4048/4 - 8 = 1004 bytes. After copying, that leaves 2016 - 1004 = 1012 bytes of available stack, which is significantly more than the guard space. If we were to reduce the minimum stack size to 1K or raise the guard space above 1012 bytes, the logic in shrinkstack would no longer leave enough space. It's also possible to trigger this problem by setting firstStackBarrierOffset to 0, which puts stack barriers in a debug mode that steals away half of the stack for the stack barrier array reservation. Then, the largest "used" size that qualifies for shrinking is (4096/2)/4 - 8 = 504 bytes. After copying, that leaves (2096/2) - 504 = 8 bytes of available stack; much less than the required guard space. This causes failures like those in issue #11027 because func gc() shrinks its own stack and then immediately calls casgstatus (a nosplit function), which overflows the stack and overwrites a free list pointer in the neighboring span. However, since this seems to require the special debug mode, we don't think it's responsible for issue #11027. To forestall all of these subtle issues, this commit modifies shrinkstack to correctly account for the guard space when considering whether to halve the stack allocation. Change-Id: I7312584addc63b5bfe55cc384a1012f6181f1b9d Reviewed-on: https://go-review.googlesource.com/10714 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-16 21:17:53 +00:00
Austin Clements	5250279eb9	runtime: detect and print corrupted free lists Issues #10240, #10541, #10941, #11023, #11027 and possibly others are indicating memory corruption in the runtime. One of the easiest places to both get corruption and detect it is in the allocator's free lists since they appear throughout memory and follow strict invariants. This commit adds a check when sweeping a span that its free list is sane and, if not, it prints the corrupted free list and panics. Hopefully this will help us collect more information on these failures. Change-Id: I6d417bcaeedf654943a5e068bd76b58bb02d4a64 Reviewed-on: https://go-review.googlesource.com/10713 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-06-16 21:17:47 +00:00
Russ Cox	142e434006	runtime: implement GOTRACEBACK=crash for linux/386 Change-Id: I401ce8d612160a4f4ee617bddca6827fa544763a Reviewed-on: https://go-review.googlesource.com/11087 Reviewed-by: Austin Clements <austin@google.com>	2015-06-16 20:47:47 +00:00
Russ Cox	7bc3e58806	all: extract "can I exec?" check from tests into internal/testenv Change-Id: I7b54be9d8b50b39e01c6be21f310ae9a10404e9d Reviewed-on: https://go-review.googlesource.com/10753 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: David Crawshaw <crawshaw@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-06-16 18:07:36 +00:00
Russ Cox	43aac4f9e7	runtime: raise maxmem to 512 GB A workaround for #10460. Change-Id: I607a556561d509db6de047892f886fb565513895 Reviewed-on: https://go-review.googlesource.com/10819 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-06-15 18:31:25 +00:00
Russ Cox	2c2770c3d4	cmd/cgo: make sure pointers passed to C escape to heap Fixes #10303. Change-Id: Ia68d3566ba3ebeea6e18e388446bd9b8c431e156 Reviewed-on: https://go-review.googlesource.com/10814 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-06-15 17:39:53 +00:00
Russ Cox	a3b9797baa	runtime: gofmt Change-Id: I539bdc438f694610a7cd373f7e1451171737cfb3 Reviewed-on: https://go-review.googlesource.com/11084 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-15 17:36:34 +00:00
Russ Cox	d5b40b6ac2	runtime: add GODEBUG gcshrinkstackoff, gcstackbarrieroff, and gcstoptheworld variables While we're here, update the documentation and delete variables with no effect. Change-Id: I4df0d266dff880df61b488ed547c2870205862f0 Reviewed-on: https://go-review.googlesource.com/10790 Reviewed-by: Austin Clements <austin@google.com>	2015-06-15 17:31:04 +00:00
Russ Cox	80ec711755	runtime: use type-based write barrier for remote stack write during chansend A send on an unbuffered channel to a blocked receiver is the only case in the runtime where one goroutine writes directly to the stack of another. The garbage collector assumes that if a goroutine is blocked, its stack contains no new pointers since the last time it ran. The send on an unbuffered channel violates this, so it needs an explicit write barrier. It has an explicit write barrier, but not one that can handle a write to another stack. Use one that can (based on type bitmap instead of heap bitmap). To make this work, raise the limit for type bitmaps so that they are used for all types up to 64 kB in size (256 bytes of bitmap). (The runtime already imposes a limit of 64 kB for a channel element size.) I have been unable to reproduce this problem in a simple test program. Could help #11035. Change-Id: I06ad994032d8cff3438c9b3eaa8d853915128af5 Reviewed-on: https://go-review.googlesource.com/10815 Reviewed-by: Austin Clements <austin@google.com>	2015-06-15 16:50:30 +00:00
Russ Cox	d57c889ae8	runtime: wait to update arena_used until after mapping bitmap This avoids a race with gcmarkwb_m that was leading to faults. Fixes #10212. Change-Id: I6fcf8d09f2692227063ce29152cb57366ea22487 Reviewed-on: https://go-review.googlesource.com/10816 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-06-11 18:15:21 +00:00
Ainar Garipov	7f9f70e5b6	all: fix misprints in comments These were found by grepping the comments from the go code and feeding the output to aspell. Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395 Reviewed-on: https://go-review.googlesource.com/10941 Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-11 14:18:57 +00:00
Yongjian Xu	93e57a22d5	runtime: correct a drifted comment in referencing m->locked. Change-Id: Ida4b98aa63e57594fa6fa0b8178106bac9b3cd19 Reviewed-on: https://go-review.googlesource.com/10837 Reviewed-by: Minux Ma <minux@golang.org>	2015-06-10 06:15:20 +00:00
Russ Cox	433c0bc769	runtime: avoid fault in heapBitsBulkBarrier Change-Id: I0512e461de1f25cb2a1cb7f23e7a77d00700667c Reviewed-on: https://go-review.googlesource.com/10803 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-08 20:24:00 +00:00
Austin Clements	b0532a96a8	runtime: fix write-barrier-enabled phase list in gcmarkwb_m Commit `1303957` was supposed to enable write barriers during the concurrent scan phase, but it only enabled calls to the write barrier during this phase. It failed to update the redundant list of write-barrier-enabled phases in gcmarkwb_m, so it still wasn't greying objects during the scan phase. This commit fixes this by replacing the redundant list of phases in gcmarkwb_m with simply checking writeBarrierEnabled. This is almost certainly redundant with checks already done in callers, but the last time we tried to remove these redundant checks everything got much slower, so I'm leaving it alone for now. Fixes #11105. Change-Id: I00230a3cb80a008e749553a8ae901b409097e4be Reviewed-on: https://go-review.googlesource.com/10801 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Minux Ma <minux@golang.org>	2015-06-08 05:13:15 +00:00
Austin Clements	306f8f11ad	runtime: unwind stack barriers when writing above the current frame Stack barriers assume that writes through pointers to frames above the current frame will get write barriers, and hence these frames do not need to be re-scanned to pick up these changes. For normal writes, this is true. However, there are places in the runtime that use typedmemmove to potentially write through pointers to higher frames (such as mapassign1). Currently, typedmemmove does not execute write barriers if the destination is on the stack. If there's a stack barrier between the current frame and the frame being modified with typedmemmove, and the stack barrier is not otherwise hit, it's possible that the garbage collector will never see the updated pointer and incorrectly reclaim the object. Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove and its variants) detect when the destination is in the stack and unwind stack barriers up to the point, forcing mark termination to later rescan the effected frame and collect these pointers. Fixes #11084. Might be related to #10240, #10541, #10941, #11023, #11027 and possibly others. Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd Reviewed-on: https://go-review.googlesource.com/10791 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-07 17:57:47 +00:00
Austin Clements	1303957dbf	runtime: enable write barriers during concurrent scan Currently, write barriers are only enabled after completion of the concurrent scan phase, as we enter the concurrent mark phase. However, stack barriers are installed during the scan phase and assume that write barriers will track changes to frames above the stack barriers. Since write barriers aren't enabled until after stack barriers are installed, we may miss modifications to the stack that happen after installing the stack barriers and before enabling write barriers. Fix this by enabling write barriers during the scan phase. This commit intentionally makes the minimal change to do this (there's only one line of code change; the rest are comment changes). At the very least, we should consider eliminating the ragged barrier that's intended to synchronize the enabling of write barriers, but now just wastes time. I've included a large comment about extensions and alternative designs. Change-Id: Ib20fede794e4fcb91ddf36f99bd97344d7f96421 Reviewed-on: https://go-review.googlesource.com/10795 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-07 17:55:33 +00:00
Austin Clements	6f6403eddf	runtime: fix checkmarks to rescan stacks Currently checkmarks mode fails to rescan stacks because it sees the leftover state bits indicating that the stacks haven't changed since the last scan. As a result, it won't detect lost marks caused by failing to scan stacks correctly during regular garbage collection. Fix this by marking all stacks dirty before performing the checkmark phase. Change-Id: I1f06882bb8b20257120a4b8e7f95bb3ffc263895 Reviewed-on: https://go-review.googlesource.com/10794 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-07 17:55:12 +00:00
Austin Clements	2774b37306	all: use RET instead of RETURN on ppc64 All of the architectures except ppc64 have only "RET" for the return mnemonic. ppc64 used to have only "RETURN", but commit `cf06ea6` introduced RET as a synonym for RETURN to make ppc64 consistent with the other architectures. However, that commit was never followed up to make the code itself consistent by eliminating uses of RETURN. This commit replaces all uses of RETURN in the ppc64 assembly with RET. This was done with sed -i 's/\<RETURN\>/RET/' */_ppc64x.s plus one manual change to syscall/asm.s. Change-Id: I3f6c8d2be157df8841d48de988ee43f3e3087995 Reviewed-on: https://go-review.googlesource.com/10672 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Minux Ma <minux@golang.org>	2015-06-06 00:07:23 +00:00
Alan Donovan	232331f0c7	runtime: add blank assignment to defeat "declared but not used" error from go/types gc should ideally consider this an error too; see golang/go#8560. Change-Id: Ieee71c4ecaff493d7f83e15ba8c8a04ee90a4cf1 Reviewed-on: https://go-review.googlesource.com/10757 Reviewed-by: Robert Griesemer <gri@golang.org>	2015-06-05 18:05:16 +00:00
Austin Clements	7529314ed3	runtime: use correct SP when installing stack barriers Currently the stack barriers are installed at the next frame boundary after gp.sched.sp + 10242^n for n=0,1,2,... However, when a G is in a system call, we set gp.sched.sp to 0, which causes stack barriers to be installed at every* frame. This easily overflows the slice we've reserved for storing the stack barrier information, and causes a "slice bounds out of range" panic in gcInstallStackBarrier. Fix this by using gp.syscallsp instead of gp.sched.sp if it's non-zero. This is the same logic that gentraceback uses to determine the current SP. Fixes #11049. Change-Id: Ie40eeee5bec59b7c1aa715a7c17aa63b1f1cf4e8 Reviewed-on: https://go-review.googlesource.com/10755 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-05 15:53:07 +00:00
Russ Cox	3ffcbb633e	runtime: default GOMAXPROCS to NumCPU(), not 1 See golang.org/s/go15gomaxprocs for details. Change-Id: I8de5df34fa01d31d78f0194ec78a2474c281243c Reviewed-on: https://go-review.googlesource.com/10668 Reviewed-by: Rob Pike <r@golang.org>	2015-06-05 04:38:04 +00:00
Josh Bleecher Snyder	5353cde080	runtime, cmd/internal/obj/arm: improve arm function prologue When stack growth is not needed, as it usually is not, execute only a single conditional branch rather than three conditional instructions. This adds 4 bytes to every function, but might speed up execution in the common case. Sample disassembly for func f() { _ = [128]byte{} } Before: TEXT main.f(SB) x.go x.go:3 0x2000 e59a1008 MOVW 0x8(R10), R1 x.go:3 0x2004 e59fb028 MOVW 0x28(R15), R11 x.go:3 0x2008 e08d200b ADD R11, R13, R2 x.go:3 0x200c e1520001 CMP R1, R2 x.go:3 0x2010 91a0300e MOVW.LS R14, R3 x.go:3 0x2014 9b0118a9 BL.LS runtime.morestack_noctxt(SB) x.go:3 0x2018 9afffff8 B.LS main.f(SB) x.go:3 0x201c e52de084 MOVW.W R14, -0x84(R13) x.go:4 0x2020 e28d1004 ADD $4, R13, R1 x.go:4 0x2024 e3a00000 MOVW $0, R0 x.go:4 0x2028 eb012255 BL 0x4a984 x.go:5 0x202c e49df084 RET #132 x.go:5 0x2030 eafffffe B 0x2030 x.go:5 0x2034 ffffff7c ? After: TEXT main.f(SB) x.go x.go:3 0x2000 e59a1008 MOVW 0x8(R10), R1 x.go:3 0x2004 e59fb02c MOVW 0x2c(R15), R11 x.go:3 0x2008 e08d200b ADD R11, R13, R2 x.go:3 0x200c e1520001 CMP R1, R2 x.go:3 0x2010 9a000004 B.LS 0x2028 x.go:3 0x2014 e52de084 MOVW.W R14, -0x84(R13) x.go:4 0x2018 e28d1004 ADD $4, R13, R1 x.go:4 0x201c e3a00000 MOVW $0, R0 x.go:4 0x2020 eb0124dc BL 0x4b398 x.go:5 0x2024 e49df084 RET #132 x.go:5 0x2028 e1a0300e MOVW R14, R3 x.go:5 0x202c eb011b0d BL runtime.morestack_noctxt(SB) x.go:5 0x2030 eafffff2 B main.f(SB) x.go:5 0x2034 eafffffe B 0x2034 x.go:5 0x2038 ffffff7c ? Updates #10587. package sort benchmarks on an iPhone 6: name old time/op new time/op delta SortString1K 569µs ± 0% 565µs ± 1% -0.75% (p=0.000 n=23+24) StableString1K 872µs ± 1% 870µs ± 1% -0.16% (p=0.009 n=23+24) SortInt1K 317µs ± 2% 316µs ± 2% ~ (p=0.410 n=26+26) StableInt1K 343µs ± 1% 339µs ± 1% -1.07% (p=0.000 n=22+23) SortInt64K 30.0ms ± 1% 30.0ms ± 1% ~ (p=0.091 n=25+24) StableInt64K 30.2ms ± 0% 30.0ms ± 0% -0.69% (p=0.000 n=22+22) Sort1e2 147µs ± 1% 146µs ± 0% -0.48% (p=0.000 n=25+24) Stable1e2 290µs ± 1% 286µs ± 1% -1.30% (p=0.000 n=23+24) Sort1e4 29.5ms ± 2% 29.7ms ± 1% +0.71% (p=0.000 n=23+23) Stable1e4 88.7ms ± 4% 88.6ms ± 8% -0.07% (p=0.022 n=26+26) Sort1e6 4.81s ± 7% 4.83s ± 7% ~ (p=0.192 n=26+26) Stable1e6 18.3s ± 1% 18.1s ± 1% -0.76% (p=0.000 n=25+23) SearchWrappers 318ns ± 1% 344ns ± 1% +8.14% (p=0.000 n=23+26) package sort benchmarks on a first generation rpi: name old time/op new time/op delta SearchWrappers 4.13µs ± 0% 3.95µs ± 0% -4.42% (p=0.000 n=15+13) SortString1K 5.81ms ± 1% 5.82ms ± 2% ~ (p=0.400 n=14+15) StableString1K 9.69ms ± 1% 9.73ms ± 0% ~ (p=0.121 n=15+11) SortInt1K 3.30ms ± 2% 3.66ms ±19% +10.82% (p=0.000 n=15+14) StableInt1K 5.97ms ±15% 4.17ms ± 8% -30.05% (p=0.000 n=15+15) SortInt64K 319ms ± 1% 295ms ± 1% -7.65% (p=0.000 n=15+15) StableInt64K 343ms ± 0% 332ms ± 0% -3.26% (p=0.000 n=12+13) Sort1e2 3.36ms ± 2% 3.22ms ± 4% -4.10% (p=0.000 n=15+15) Stable1e2 6.74ms ± 1% 6.43ms ± 2% -4.67% (p=0.000 n=15+15) Sort1e4 247ms ± 1% 247ms ± 1% ~ (p=0.331 n=15+14) Stable1e4 864ms ± 0% 820ms ± 0% -5.15% (p=0.000 n=14+15) Sort1e6 41.2s ± 0% 41.2s ± 0% +0.15% (p=0.000 n=13+14) Stable1e6 192s ± 0% 182s ± 0% -5.07% (p=0.000 n=14+14) Change-Id: I8a9db77e1d4ea1956575895893bc9d04bd81204b Reviewed-on: https://go-review.googlesource.com/10497 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-04 16:35:12 +00:00
Brad Fitzpatrick	03410f6758	runtime: fix TestFixedGOROOT to properly restore the GOROOT env var after test Otherwise subsequent tests won't see any modified GOROOT. With this CL I can move my GOROOT, set GOROOT to the new location, and the runtime tests pass. Previously the crash_tests would instead look for the GOROOT baked into the binary, instead of the env var: --- FAIL: TestGcSys (0.01s) crash_test.go:92: building source: exit status 2 go: cannot find GOROOT directory: /home/bradfitz/go --- FAIL: TestGCFairness (0.01s) crash_test.go:92: building source: exit status 2 go: cannot find GOROOT directory: /home/bradfitz/go --- FAIL: TestGdbPython (0.07s) runtime-gdb_test.go:64: building source exit status 2 go: cannot find GOROOT directory: /home/bradfitz/go --- FAIL: TestLargeStringConcat (0.01s) crash_test.go:92: building source: exit status 2 go: cannot find GOROOT directory: /home/bradfitz/go Update #10029 Change-Id: If91be0f04d3acdcf39a9e773a4e7905a446bc477 Reviewed-on: https://go-review.googlesource.com/10685 Reviewed-by: Andrew Gerrand <adg@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-03 23:33:48 +00:00
Austin Clements	10083d8007	runtime: print start of GC cycle in gctrace, rather than end Currently the GODEBUG=gctrace=1 trace line includes "@n.nnns" to indicate the time that the GC cycle ended relative to the time the program started. This was meant to be consistent with the utilization as of the end of the cycle, which is printed next on the trace line, but it winds up just being confusing and unexpected. Change the trace line to include the time that the GC cycle started relative to the time the program started. Change-Id: I7d64580cd696eb17540716d3e8a74a9d6ae50650 Reviewed-on: https://go-review.googlesource.com/10634 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-03 02:17:43 +00:00
Austin Clements	faa7a7e8ae	runtime: implement GC stack barriers This commit implements stack barriers to minimize the amount of stack re-scanning that must be done during mark termination. Currently the GC scans stacks of active goroutines twice during every GC cycle: once at the beginning during root discovery and once at the end during mark termination. The second scan happens while the world is stopped and guarantees that we've seen all of the roots (since there are no write barriers on writes to local stack variables). However, this means pause time is proportional to stack size. In particularly recursive programs, this can drive pause time up past our 10ms goal (e.g., it takes about 150ms to scan a 50MB heap). Re-scanning the entire stack is rarely necessary, especially for large stacks, because usually most of the frames on the stack were not active between the first and second scans and hence any changes to these frames (via non-escaping pointers passed down the stack) were tracked by write barriers. To efficiently track how far a stack has been unwound since the first scan (and, hence, how much needs to be re-scanned), this commit introduces stack barriers. During the first scan, at exponentially spaced points in each stack, the scan overwrites return PCs with the PC of the stack barrier function. When "returned" to, the stack barrier function records how far the stack has unwound and jumps to the original return PC for that point in the stack. Then the second scan only needs to proceed as far as the lowest barrier that hasn't been hit. For deeply recursive programs, this substantially reduces mark termination time (and hence pause time). For the goscheme example linked in issue #10898, prior to this change, mark termination times were typically between 100 and 500ms; with this change, mark termination times are typically between 10 and 20ms. As a result of the reduced stack scanning work, this reduces overall execution time of the goscheme example by 20%. Fixes #10898. The effect of this on programs that are not deeply recursive is minimal: name old time/op new time/op delta BinaryTree17 3.16s ± 2% 3.26s ± 1% +3.31% (p=0.000 n=19+19) Fannkuch11 2.42s ± 1% 2.48s ± 1% +2.24% (p=0.000 n=17+19) FmtFprintfEmpty 50.0ns ± 3% 49.8ns ± 1% ~ (p=0.534 n=20+19) FmtFprintfString 173ns ± 0% 175ns ± 0% +1.49% (p=0.000 n=16+19) FmtFprintfInt 170ns ± 1% 175ns ± 1% +2.97% (p=0.000 n=20+19) FmtFprintfIntInt 288ns ± 0% 295ns ± 0% +2.73% (p=0.000 n=16+19) FmtFprintfPrefixedInt 242ns ± 1% 252ns ± 1% +4.13% (p=0.000 n=18+18) FmtFprintfFloat 324ns ± 0% 323ns ± 0% -0.36% (p=0.000 n=20+19) FmtManyArgs 1.14µs ± 0% 1.12µs ± 1% -1.01% (p=0.000 n=18+19) GobDecode 8.88ms ± 1% 8.87ms ± 0% ~ (p=0.480 n=19+18) GobEncode 6.80ms ± 1% 6.85ms ± 0% +0.82% (p=0.000 n=20+18) Gzip 363ms ± 1% 363ms ± 1% ~ (p=0.077 n=18+20) Gunzip 90.6ms ± 0% 90.0ms ± 1% -0.71% (p=0.000 n=17+18) HTTPClientServer 51.5µs ± 1% 50.8µs ± 1% -1.32% (p=0.000 n=18+18) JSONEncode 17.0ms ± 0% 17.1ms ± 0% +0.40% (p=0.000 n=18+17) JSONDecode 61.8ms ± 0% 63.8ms ± 1% +3.11% (p=0.000 n=18+17) Mandelbrot200 3.84ms ± 0% 3.84ms ± 1% ~ (p=0.583 n=19+19) GoParse 3.71ms ± 1% 3.72ms ± 1% ~ (p=0.159 n=18+19) RegexpMatchEasy0_32 100ns ± 0% 100ns ± 1% -0.19% (p=0.033 n=17+19) RegexpMatchEasy0_1K 342ns ± 1% 331ns ± 0% -3.41% (p=0.000 n=19+19) RegexpMatchEasy1_32 82.5ns ± 0% 81.7ns ± 0% -0.98% (p=0.000 n=18+18) RegexpMatchEasy1_1K 505ns ± 0% 494ns ± 1% -2.16% (p=0.000 n=18+18) RegexpMatchMedium_32 137ns ± 1% 137ns ± 1% -0.24% (p=0.048 n=20+18) RegexpMatchMedium_1K 41.6µs ± 0% 41.3µs ± 1% -0.57% (p=0.004 n=18+20) RegexpMatchHard_32 2.11µs ± 0% 2.11µs ± 1% +0.20% (p=0.037 n=17+19) RegexpMatchHard_1K 63.9µs ± 2% 63.3µs ± 0% -0.99% (p=0.000 n=20+17) Revcomp 560ms ± 1% 522ms ± 0% -6.87% (p=0.000 n=18+16) Template 75.0ms ± 0% 75.1ms ± 1% +0.18% (p=0.013 n=18+19) TimeParse 358ns ± 1% 364ns ± 0% +1.74% (p=0.000 n=20+15) TimeFormat 360ns ± 0% 372ns ± 0% +3.55% (p=0.000 n=20+18) Change-Id: If8a9bfae6c128d15a4f405e02bcfa50129df82a2 Reviewed-on: https://go-review.googlesource.com/10314 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-06-02 20:00:57 +00:00
Austin Clements	724f8298a8	runtime: avoid double-scanning of stacks Currently there's a race between stopg scanning another G's stack and the G reaching a preemption point and scanning its own stack. When this race occurs, the G's stack is scanned twice. Currently this is okay, so this race is benign. However, we will shortly be adding stack barriers during the first stack scan, so scanning will no longer be idempotent. To prepare for this, this change ensures that each stack is scanned only once during each GC phase by checking the flag that indicates that the stack has been scanned in this phase before scanning the stack. Change-Id: Id9f4d5e2e5b839bc3f200ec1723a4a12dd677ab4 Reviewed-on: https://go-review.googlesource.com/10458 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-06-02 19:59:05 +00:00
Austin Clements	3f6e69aca5	runtime: steal space for stack barrier tracking from stack The stack barrier code will need a bookkeeping structure to keep track of the overwritten return PCs. This commit introduces and allocates this structure, but does not yet use the structure. We don't want to allocate space for this structure during garbage collection, so this commit allocates it along with the allocation of the corresponding stack. However, we can't do a regular allocation in newstack because mallocgc may itself grow the stack (which would lead to a recursive allocation). Hence, this commit makes the bookkeeping structure part of the stack allocation itself by stealing the necessary space from the top of the stack allocation. Since the size of this bookkeeping structure is logarithmic in the size of the stack, this has minimal impact on stack behavior. Change-Id: Ia14408be06aafa9ca4867f4e70bddb3fe0e96665 Reviewed-on: https://go-review.googlesource.com/10313 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-02 19:57:57 +00:00
Austin Clements	e610c25df0	runtime: decouple stack bounds and stack allocation size Currently the runtime assumes that the allocation for the stack is exactly [stack.lo, stack.hi). We're about to steal a small part of this allocation for per-stack GC metadata. To prepare for this, this commit adds a field to the G for the allocated size of the stack. With this change, stack.lo and stack.hi continue to act as the true bounds on the stack, but are no longer also used as the bounds on the stack allocation. (I also tried this the other way around, where stack.lo and stack.hi remained the allocation bounds and I introduced a new top of stack. However, there are far more places that assume stack.hi is the true top of the stack than there are places that assume it's the top of the allocation.) Change-Id: Ifa9d956753be53d286d09cbc73d47fb34a18c0c6 Reviewed-on: https://go-review.googlesource.com/10312 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-02 19:57:50 +00:00
Austin Clements	c02b8911d8	runtime: clean up signalstack API Currently signalstack takes a lower limit and a length and all calls hard-code the passed length. Change the API to take a *stack and compute the lower limit and length from the passed stack. This will make it easier for the runtime to steal some space from the top of the stack since it eliminates the hard-coded stack sizes. Change-Id: I7d2a9f45894b221f4e521628c2165530bbc57d53 Reviewed-on: https://go-review.googlesource.com/10311 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-02 19:57:42 +00:00
Austin Clements	cc6a7fce53	runtime: increase precision of gctrace times Currently we truncate gctrace clock and CPU times to millisecond precision. As a result, many phases are typically printed as 0, which is fine for user consumption, but makes gathering statistics and reports over GC traces difficult. In 1.4, the gctrace line printed times in microseconds. This was better for statistics, but not as easy for users to read or interpret, and it generally made the trace lines longer. This change strikes a balance between these extremes by printing milliseconds, but including the decimal part to two significant figures down to microsecond precision. This remains easy to read and interpret, but includes more precision when it's useful. For example, where the code currently prints, gc #29 @1.629s 0%: 0+2+0+12+0 ms clock, 0+2+0+0/12/0+0 ms cpu, 4->4->2 MB, 4 MB goal, 1 P this prints, gc #29 @1.629s 0%: 0.005+2.1+0+12+0.29 ms clock, 0.005+2.1+0+0/12/0+0.29 ms cpu, 4->4->2 MB, 4 MB goal, 1 P Fixes #10970. Change-Id: I249624779433927cd8b0947b986df9060c289075 Reviewed-on: https://go-review.googlesource.com/10554 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-02 18:31:36 +00:00
Mikio Hara	1fa0a8cec5	runtime: fix data race in BenchmarkChanPopular Fixes #11014. Change-Id: I9a18dacd10564d3eaa1fea4d77f1a48e08e79f53 Reviewed-on: https://go-review.googlesource.com/10563 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-02 11:16:01 +00:00
Austin Clements	df2809f04e	runtime: document that runtime.GC() blocks until GC is complete runtime.GC() is intentionally very weakly specified. However, it is so weakly specified that it's difficult to know that it's being used correctly for its one intended use case: to ensure garbage collection has run in a test that is garbage-sensitive. In particular, it is unclear whether it is synchronous or asynchronous. In the old STW collector this was essentially self-evident; short of queuing up a garbage collection to run later, it had to be synchronous. However, with the concurrent collector, there's evidence that people are inferring that it may be asynchronous (e.g., issue #10986), as this is both unclear in the documentation and possible in the implementation. In fact, runtime.GC() runs a fully synchronous STW collection. We probably don't want to commit to this exact behavior. But we can commit to the essential property that tests rely on: that runtime.GC() does not return until the GC has finished. Change-Id: Ifc3045a505e1898ecdbe32c1f7e80e2e9ffacb5b Reviewed-on: https://go-review.googlesource.com/10488 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-06-01 14:51:12 +00:00
Austin Clements	f2c3957ed8	runtime: disable GC around TestGoroutineParallelism TestGoroutineParallelism can deadlock if the GC runs during the test. Currently it tries to prevent this by forcing a GC before the test, but this is best effort and fails completely if GOGC is very low for testing. This change replaces this best-effort fix with simply setting GOGC to off for the duration of the test. Change-Id: I8229310833f241b149ebcd32845870c1cb14e9f8 Reviewed-on: https://go-review.googlesource.com/10454 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-28 17:40:19 +00:00
Austin Clements	4a1957d0aa	runtime: use stripped test environment for TestGdbPython Most runtime tests that invoke the compiler to build a sub-test binary do so with a special environment constructed by testEnv that strips out environment variables that should apply to the test but not to the build. Fix TestGdbPython to use this test environment when invoking go build, like other tests do. Change-Id: Iafdf89d4765c587cbebc427a5d61cb8a7e71b326 Reviewed-on: https://go-review.googlesource.com/10455 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-28 17:39:08 +00:00
Elias Naur	8017ace496	runtime: don't always block all signals on OpenBSD Implement the changes from CL 10173 on OpenBSD. Change-Id: I2db1cd8141fd392a34753a1b8113e2e0401173b9 Reviewed-on: https://go-review.googlesource.com/10342 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-05-23 17:42:43 +00:00
Elias Naur	84cfba17c2	runtime: don't always unblock all signals Ian proposed an improved way of handling signals masks in Go, motivated by a problem where the Android java runtime expects certain signals to be blocked for all JVM threads. Discussion here https://groups.google.com/forum/#!topic/golang-dev/_TSCkQHJt6g Ian's text is used in the following: A Go program always needs to have the synchronous signals enabled. These are the signals for which _SigPanic is set in sigtable, namely SIGSEGV, SIGBUS, SIGFPE. A Go program that uses the os/signal package, and calls signal.Notify, needs to have at least one thread which is not blocking that signal, but it doesn't matter much which one. Unix programs do not change signal mask across execve. They inherit signal masks across fork. The shell uses this fact to some extent; for example, the job control signals (SIGTTIN, SIGTTOU, SIGTSTP) are blocked for commands run due to backquote quoting or $(). Our current position on signal masks was not thought out. We wandered into step by step, e.g., http://golang.org/cl/7323067 . This CL does the following: Introduce a new platform hook, msigsave, that saves the signal mask of the current thread to m.sigsave. Call msigsave from needm and newm. In minit grab set up the signal mask from m.sigsave and unblock the essential synchronous signals, and SIGILL, SIGTRAP, SIGPROF, SIGSTKFLT (for systems that have it). In unminit, restore the signal mask from m.sigsave. The first time that os/signal.Notify is called, start a new thread whose only purpose is to update its signal mask to make sure signals for signal.Notify are unblocked on at least one thread. The effect on Go programs will be that if they are invoked with some non-synchronous signals blocked, those signals will normally be ignored. Previously, those signals would mostly be ignored. A change in behaviour will occur for programs started with any of these signals blocked, if they receive the signal: SIGHUP, SIGINT, SIGQUIT, SIGABRT, SIGTERM. Previously those signals would always cause a crash (unless using the os/signal package); with this change, they will be ignored if the program is started with the signal blocked (and does not use the os/signal package). ./all.bash completes successfully on linux/amd64. OpenBSD is missing the implementation. Change-Id: I188098ba7eb85eae4c14861269cc466f2aa40e8c Reviewed-on: https://go-review.googlesource.com/10173 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-05-22 20:24:08 +00:00
Russ Cox	001438bdfe	runtime: fix callwritebarrier Given a call frame F of size N where the return values start at offset R, callwritebarrier was instructing heapBitsBulkBarrier to scan the block of memory [F+R, F+R+N). It should only scan [F+R, F+N). The extra N-R bytes scanned might lead into the next allocated block in memory. Because the scan was consulting the heap bitmap for type information, scanning into the next block normally "just worked" in the sense of not crashing. Scanning the extra N-R bytes of memory is a problem mainly because it causes the GC to consider pointers that might otherwise not be considered, leading it to retain objects that should actually be freed. This is very difficult to detect. Luckily, juju turned up a case where the heap bitmap and the memory were out of sync for the block immediately after the call frame, so that heapBitsBulkBarrier saw an obvious non-pointer where it expected a pointer, causing a loud crash. Why is there a non-pointer in memory that the heap bitmap records as a pointer? That is more difficult to answer. At least one way that it could happen is that allocations containing no pointers at all do not update the heap bitmap. So if heapBitsBulkBarrier walked out of the current object and into a no-pointer object and consulted those bitmap bits, it would be misled. This doesn't happen in general because all the paths to heapBitsBulkBarrier first check for the no-pointer case. This may or may not be what happened, but it's the only scenario I've been able to construct. I tried for quite a while to write a simple test for this and could not. It does fix the juju crash, and it is clearly an improvement over the old code. Fixes #10844. Change-Id: I53982c93ef23ef93155c4086bbd95a4c4fdaac9a Reviewed-on: https://go-review.googlesource.com/10317 Reviewed-by: Austin Clements <austin@google.com>	2015-05-21 19:14:03 +00:00
Austin Clements	a5c3bbe0b4	runtime: eliminate write barrier from adjustpointers Currently adjustpointers invokes a write barrier for every stack slot it updates. This is safe---the write barrier always does nothing because the new value is never a heap pointer---but it's unnecessary overhead in performance and complexity. Fix this by rewriting adjustpointers to work with uintptrs instead of unsafe.Pointers. As an added bonus, this makes the code cleaner. name old mean new mean delta BinaryTree17 3.35s × (0.98,1.01) 3.33s × (0.99,1.02) ~ (p=0.095 n=20+19) Fannkuch11 2.49s × (1.00,1.01) 2.52s × (0.99,1.01) +1.23% (p=0.000 n=19+20) FmtFprintfEmpty 52.2ns × (0.99,1.02) 52.2ns × (0.99,1.02) ~ (p=0.766 n=19+19) FmtFprintfString 181ns × (0.99,1.02) 179ns × (0.99,1.01) -1.06% (p=0.000 n=20+19) FmtFprintfInt 177ns × (0.99,1.01) 173ns × (0.99,1.02) -2.26% (p=0.000 n=17+20) FmtFprintfIntInt 300ns × (0.99,1.01) 302ns × (0.99,1.01) +0.76% (p=0.000 n=19+20) FmtFprintfPrefixedInt 253ns × (0.99,1.02) 256ns × (0.99,1.01) +0.96% (p=0.000 n=20+19) FmtFprintfFloat 334ns × (0.99,1.02) 334ns × (1.00,1.01) ~ (p=0.243 n=20+19) FmtManyArgs 1.16µs × (0.99,1.01) 1.17µs × (0.99,1.02) +0.88% (p=0.000 n=20+20) GobDecode 9.16ms × (0.99,1.02) 9.18ms × (1.00,1.00) +0.21% (p=0.048 n=20+17) GobEncode 7.03ms × (0.99,1.01) 7.05ms × (0.99,1.01) ~ (p=0.091 n=19+19) Gzip 374ms × (0.99,1.01) 372ms × (0.99,1.02) -0.50% (p=0.008 n=18+20) Gunzip 92.9ms × (0.99,1.01) 92.5ms × (1.00,1.01) -0.47% (p=0.002 n=19+19) HTTPClientServer 53.1µs × (0.98,1.01) 52.5µs × (0.99,1.01) -0.98% (p=0.000 n=20+19) JSONEncode 17.4ms × (0.99,1.02) 17.5ms × (0.99,1.01) ~ (p=0.061 n=19+20) JSONDecode 66.0ms × (0.99,1.02) 64.7ms × (0.99,1.01) -1.87% (p=0.000 n=20+20) Mandelbrot200 3.94ms × (1.00,1.01) 3.95ms × (1.00,1.01) ~ (p=0.799 n=18+19) GoParse 3.89ms × (0.99,1.02) 3.86ms × (0.99,1.01) -0.70% (p=0.016 n=20+19) RegexpMatchEasy0_32 102ns × (0.99,1.02) 102ns × (1.00,1.01) ~ (p=0.557 n=20+18) RegexpMatchEasy0_1K 353ns × (0.99,1.02) 341ns × (0.99,1.01) -3.38% (p=0.000 n=20+20) RegexpMatchEasy1_32 85.0ns × (0.99,1.02) 85.0ns × (0.99,1.01) ~ (p=0.851 n=19+20) RegexpMatchEasy1_1K 521ns × (0.99,1.02) 506ns × (1.00,1.01) -2.85% (p=0.000 n=20+18) RegexpMatchMedium_32 142ns × (0.99,1.02) 141ns × (1.00,1.01) -1.17% (p=0.000 n=20+19) RegexpMatchMedium_1K 42.8µs × (0.99,1.01) 42.3µs × (0.99,1.01) -1.07% (p=0.000 n=20+19) RegexpMatchHard_32 2.17µs × (0.99,1.01) 2.16µs × (1.00,1.01) -0.51% (p=0.042 n=20+18) RegexpMatchHard_1K 65.6µs × (0.99,1.01) 64.8µs × (1.00,1.00) -1.21% (p=0.000 n=20+17) Revcomp 581ms × (0.99,1.04) 536ms × (1.00,1.01) -7.71% (p=0.000 n=20+18) Template 77.2ms × (0.99,1.01) 76.8ms × (0.99,1.01) ~ (p=0.426 n=20+18) TimeParse 369ns × (0.99,1.02) 371ns × (1.00,1.01) ~ (p=0.117 n=20+19) TimeFormat 371ns × (0.99,1.02) 391ns × (0.99,1.01) +5.33% (p=0.000 n=20+19) Change-Id: I5b952ba577ac4365c8c87db837c5804a1e30b7be Reviewed-on: https://go-review.googlesource.com/10293 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-21 18:35:49 +00:00
Rick Hudson	5b66e5d0d8	runtime: turn work buffer tracing off by default During development we ran with monitoring code turned on by default. This CL turns the work buffer monitoring off. Performance change on most go1 benchmarks is small or insignificant. name old mean new mean delta BinaryTree17 3.35s × (0.99,1.01) 3.35s × (0.99,1.01) ~ (p=0.841 n=5+5) Fannkuch11 2.59s × (1.00,1.01) 2.55s × (1.00,1.00) -1.65% (p=0.008 n=5+5) FmtFprintfEmpty 52.5ns × (0.99,1.02) 53.2ns × (0.98,1.01) ~ (p=0.063 n=5+5) FmtFprintfString 181ns × (1.00,1.00) 180ns × (1.00,1.00) -0.55% (p=0.029 n=4+4) FmtFprintfInt 176ns × (1.00,1.01) 174ns × (1.00,1.00) -0.91% (p=0.000 n=5+4) FmtFprintfIntInt 298ns × (1.00,1.00) 299ns × (1.00,1.00) ~ (p=0.143 n=4+4) FmtFprintfPrefixedInt 250ns × (1.00,1.01) 246ns × (1.00,1.00) -1.68% (p=0.000 n=5+4) FmtFprintfFloat 340ns × (1.00,1.00) 340ns × (1.00,1.01) ~ (p=0.643 n=5+5) FmtManyArgs 1.16µs × (1.00,1.00) 1.15µs × (1.00,1.00) -0.47% (p=0.016 n=5+5) GobDecode 9.22ms × (1.00,1.00) 9.23ms × (1.00,1.00) ~ (p=0.841 n=5+5) GobEncode 7.00ms × (1.00,1.01) 7.09ms × (0.99,1.01) +1.26% (p=0.016 n=5+5) Gzip 387ms × (1.00,1.00) 389ms × (0.99,1.02) ~ (p=1.000 n=5+5) Gunzip 97.8ms × (1.00,1.00) 98.3ms × (1.00,1.00) +0.51% (p=0.016 n=5+4) HTTPClientServer 52.6µs × (1.00,1.01) 52.7µs × (1.00,1.01) ~ (p=1.000 n=5+5) JSONEncode 18.0ms × (0.99,1.02) 17.9ms × (1.00,1.00) ~ (p=0.310 n=5+5) JSONDecode 64.8ms × (0.99,1.02) 63.6ms × (1.00,1.00) -1.94% (p=0.008 n=5+5) Mandelbrot200 4.05ms × (1.00,1.00) 4.05ms × (1.00,1.00) ~ (p=0.421 n=5+5) GoParse 3.86ms × (1.00,1.01) 3.84ms × (0.99,1.01) ~ (p=0.421 n=5+5) RegexpMatchEasy0_32 101ns × (1.00,1.00) 102ns × (0.99,1.02) ~ (p=0.238 n=4+5) RegexpMatchEasy0_1K 346ns × (1.00,1.01) 345ns × (1.00,1.00) ~ (p=0.333 n=5+4) RegexpMatchEasy1_32 87.3ns × (0.99,1.02) 87.4ns × (1.00,1.00) ~ (p=0.190 n=5+4) RegexpMatchEasy1_1K 520ns × (1.00,1.00) 520ns × (1.00,1.01) ~ (p=1.000 n=4+5) RegexpMatchMedium_32 143ns × (1.00,1.00) 142ns × (1.00,1.00) -0.70% (p=0.029 n=4+4) RegexpMatchMedium_1K 43.2µs × (1.00,1.01) 43.2µs × (1.00,1.00) ~ (p=0.841 n=5+5) RegexpMatchHard_32 2.24µs × (1.00,1.01) 2.23µs × (1.00,1.01) -0.63% (p=0.048 n=5+5) RegexpMatchHard_1K 68.7µs × (1.00,1.00) 68.3µs × (1.00,1.00) -0.56% (p=0.008 n=5+5) Revcomp 577ms × (1.00,1.01) 579ms × (1.00,1.00) ~ (p=0.151 n=5+5) Template 74.9ms × (1.00,1.00) 76.5ms × (1.00,1.00) +2.11% (p=0.008 n=5+5) TimeParse 359ns × (1.00,1.00) 362ns × (1.00,1.00) +0.72% (p=0.008 n=5+5) TimeFormat 369ns × (1.00,1.00) 371ns × (1.00,1.01) ~ (p=0.071 n=5+5) Change-Id: I4206a3f77a3d1450966b7a62ea7597aec44cb72f Reviewed-on: https://go-review.googlesource.com/10294 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-05-21 16:09:24 +00:00
Austin Clements	719efc70eb	runtime: make runtime.callers walk calling G, not g0 Currently runtime.callers invokes gentraceback with the pc and sp of the G it is called from, but always passes g0 even if it was called from a regular g. Right now this has no ill effects because runtime.callers does not use either callback argument or the _TraceJumpStack flag, but it makes the code fragile and will break some upcoming changes. Fix this by lifting the getg() call outside of the systemstack in runtime.callers. Change-Id: I4e1e927961c0e0cd4dcf28693be47df7bae9e122 Reviewed-on: https://go-review.googlesource.com/10292 Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-05-21 16:06:37 +00:00
Rick Hudson	197aa9e64d	runtime: remove unused quiesce code This is dead code. If you want to quiesce the system the preferred way is to use forEachP(func(*p){}). Change-Id: Ic7677a5dd55e3639b99e78ddeb2c71dd1dd091fa Reviewed-on: https://go-review.googlesource.com/10267 Reviewed-by: Austin Clements <austin@google.com>	2015-05-20 17:56:44 +00:00
Rick Hudson	913db7685e	runtime: run background mark helpers only if work is available Prior to this CL whenever the GC marking was enabled and a P was looking for work we supplied a G to help the GC do its marking tasks. Once this G finished all the marking available it would release the P to find another available G. In the case where there was no work the P would drop into findrunnable which would execute the mark helper G which would immediately return and the P would drop into findrunnable again repeating the process. Since the P was always given a G to run it never blocks. This CL first checks if the GC mark helper G has available work and if not the P immediately falls through to its blocking logic. Fixes #10901 Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5 Reviewed-on: https://go-review.googlesource.com/10189 Reviewed-by: Austin Clements <austin@google.com>	2015-05-19 15:57:50 +00:00
Austin Clements	f4d51eb2f5	runtime: minor clean up to heapminimum Currently setGCPercent sets heapminimum to heapminimum*GOGC/100. The real intent is to set heapminimum to a scaled multiple of a fixed default heap minimum, not to scale heapminimum based on its current value. This turns out to be okay because setGCPercent is only called once and heapminimum is initially set to this default heap minimum. However, the code as written is confusing, especially since setGCPercent is otherwise written so it could be called again to change GOGC. Fix this by introducing a defaultHeapMinimum constant and using this instead of the current value of heapminimum to compute the scaled heap minimum. As part of this, this commit improves the documentation on heapminimum. Change-Id: I4eb82c73dc2eb44a6e5a17c780a747a2e73d7493 Reviewed-on: https://go-review.googlesource.com/10181 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-19 15:30:34 +00:00
Russ Cox	8903b3db0e	runtime: add fast check for self-loop pointer in scanobject Addresses a problem reported on the mailing list. This will come up mainly in programs custom allocators that batch allocations, but it still helps in our programs, which mainly do not have such allocations. name old mean new mean delta BinaryTree17 5.95s × (0.97,1.03) 5.93s × (0.97,1.04) ~ (p=0.613) Fannkuch11 4.46s × (0.98,1.04) 4.33s × (0.99,1.01) -2.93% (p=0.000) FmtFprintfEmpty 86.6ns × (0.98,1.03) 86.8ns × (0.98,1.02) ~ (p=0.523) FmtFprintfString 290ns × (0.98,1.05) 287ns × (0.98,1.03) ~ (p=0.061) FmtFprintfInt 271ns × (0.98,1.04) 286ns × (0.99,1.01) +5.54% (p=0.000) FmtFprintfIntInt 495ns × (0.98,1.04) 489ns × (0.99,1.01) -1.24% (p=0.015) FmtFprintfPrefixedInt 391ns × (0.99,1.02) 407ns × (0.99,1.01) +4.00% (p=0.000) FmtFprintfFloat 578ns × (0.99,1.01) 559ns × (0.99,1.01) -3.35% (p=0.000) FmtManyArgs 1.96µs × (0.98,1.05) 1.94µs × (0.99,1.01) -1.33% (p=0.030) GobDecode 15.9ms × (0.97,1.05) 15.7ms × (0.99,1.01) -1.35% (p=0.044) GobEncode 11.4ms × (0.97,1.05) 11.3ms × (0.98,1.03) ~ (p=0.141) Gzip 658ms × (0.98,1.05) 648ms × (0.99,1.01) -1.59% (p=0.009) Gunzip 144ms × (0.99,1.03) 144ms × (0.99,1.01) ~ (p=0.867) HTTPClientServer 92.1µs × (0.97,1.05) 90.3µs × (0.99,1.01) -1.89% (p=0.005) JSONEncode 31.0ms × (0.96,1.07) 30.2ms × (0.98,1.03) -2.66% (p=0.001) JSONDecode 110ms × (0.97,1.04) 107ms × (0.99,1.01) -2.59% (p=0.000) Mandelbrot200 6.15ms × (0.98,1.04) 6.07ms × (0.99,1.02) -1.32% (p=0.045) GoParse 6.79ms × (0.97,1.04) 6.74ms × (0.97,1.04) ~ (p=0.242) RegexpMatchEasy0_32 158ns × (0.98,1.05) 155ns × (0.99,1.01) -1.64% (p=0.010) RegexpMatchEasy0_1K 548ns × (0.97,1.04) 540ns × (0.99,1.01) -1.34% (p=0.042) RegexpMatchEasy1_32 133ns × (0.97,1.04) 132ns × (0.97,1.05) ~ (p=0.466) RegexpMatchEasy1_1K 899ns × (0.96,1.05) 878ns × (0.99,1.01) -2.32% (p=0.002) RegexpMatchMedium_32 250ns × (0.96,1.03) 243ns × (0.99,1.01) -2.90% (p=0.000) RegexpMatchMedium_1K 73.4µs × (0.98,1.04) 73.0µs × (0.98,1.04) ~ (p=0.411) RegexpMatchHard_32 3.87µs × (0.97,1.07) 3.84µs × (0.98,1.04) ~ (p=0.273) RegexpMatchHard_1K 120µs × (0.97,1.08) 117µs × (0.99,1.01) -2.06% (p=0.010) Revcomp 940ms × (0.96,1.07) 924ms × (0.97,1.07) ~ (p=0.071) Template 128ms × (0.96,1.05) 128ms × (0.99,1.01) ~ (p=0.502) TimeParse 632ns × (0.96,1.07) 616ns × (0.99,1.01) -2.58% (p=0.001) TimeFormat 671ns × (0.97,1.06) 657ns × (0.99,1.02) -2.10% (p=0.002) In contrast to the one in test/bench/go1 (above), the binarytree program on the shootout site uses more goroutines, batches allocations, and sets GOMAXPROCS to runtime.NumCPU()*2. Using that version, before vs after: name old mean new mean delta BinaryTree20 18.6s × (0.96,1.05) 11.3s × (0.98,1.02) -39.46% (p=0.000) And Go 1.4 vs after: name old mean new mean delta BinaryTree20 13.0s × (0.97,1.02) 11.3s × (0.98,1.02) -13.21% (p=0.000) There is still a scheduling problem - the raw run times are hiding the fact that this chews up 2x the CPU - but we'll take care of that separately. Change-Id: I3f5da879b24ae73a0d06745381ffb88c3744948b Reviewed-on: https://go-review.googlesource.com/10220 Reviewed-by: Austin Clements <austin@google.com>	2015-05-19 15:29:40 +00:00
Josh Bleecher Snyder	79986e24e0	runtime/pprof: write heap statistics to heap profile always This is a duplicate of CL 9491. That CL broke the build due to pprof shortcomings and was reverted in CL 9565. CL 9623 fixed pprof, so this can go in again. Fixes #10659. Change-Id: If470fc90b3db2ade1d161b4417abd2f5c6c330b8 Reviewed-on: https://go-review.googlesource.com/10212 Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2015-05-18 20:02:21 +00:00
Austin Clements	f0dd002895	runtime: use separate count and note for forEachP Currently, forEachP reuses the stopwait and stopnote fields from stopTheWorld to track how many Ps have not responded to the safe-point request and to sleep until all Ps have responded. It was assumed this was safe because both stopTheWorld and forEachP must occur under the worlsema and hence stopwait and stopnote cannot be used for both purposes simultaneously and callers could always determine the appropriate use based on sched.gcwaiting (which is only set by stopTheWorld). However, this is not the case, since it's possible for there to be a window between when an M observes that gcwaiting is set and when it checks stopwait during which stopwait could have changed meanings. When this happens, the M decrements stopwait and may wakeup stopnote, but does not otherwise participate in the forEachP protocol. As a result, stopwait is decremented too many times, so it may reach zero before all Ps have run the safe-point function, causing forEachP to wake up early. It will then either observe that some P has not run the safe-point function and panic with "P did not run fn", or the remaining P (or Ps) will run the safe-point function before it wakes up and it will observe that stopwait is negative and panic with "not stopped". Fix this problem by giving forEachP its own safePointWait and safePointNote fields. One known sequence of events that can cause this race is as follows. It involves three actors: G1 is running on M1 on P1. P1 has an empty run queue. G2/M2 is in a blocked syscall and has lost its P. (The details of this don't matter, it just needs to be in a position where it needs to grab an idle P.) GC just started on G3/M3/P3. (These aren't very involved, they just have to be separate from the other G's, M's, and P's.) 1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1. Now G1/M1 begins to enter a syscall: 2. G1/M1 invokes reentersyscall, which sets the P1's status to _Psyscall. 3. G1/M1's reentersyscall observes gcwaiting != 0 and calls entersyscall_gcwait. 4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock. Back on GC: 5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and returns. 6. GC does stuff and then calls startTheWorld(). 7. startTheWorld() calls procresize(), which sets P1's status to _Pidle and puts P1 on the idle list. Now G2/M2 returns from its syscall and takes over P1: 8. G2/M2 returns from its blocked syscall and gets P1 from the idle list. 9. G2/M2 acquires P1, which sets P1's status to _Prunning. 10. G2/M2 starts a new syscall and invokes reentersyscall, which sets P1's status to _Psyscall. Back on G1/M1: 11. G1/M1 finally acquires sched.lock in entersyscall_gcwait. At this point, G1/M1 still thinks it's running on P1. P1's status is _Psyscall, which is consistent with what G1/M1 is doing, but it's _Psyscall because G2/M2 put it in to _Psyscall, not G1/M1. This is basically an ABA race on P1's status. Because forEachP currently shares stopwait with stopTheWorld. G1/M1's entersyscall_gcwait observes the non-zero stopwait set by forEachP, but mistakes it for a stopTheWorld. It cas's P1's status from _Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement stopwait one more time than forEachP was expecting. Fixes #10618. (See the issue for details on why the above race is safe when forEachP is not involved.) Prior to this commit, the command stress ./runtime.test -test.run TestFutexsleep\\|TestGoroutineProfile would reliably fail after a few hundred runs. With this commit, it ran for over 2 million runs and never crashed. Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30 Reviewed-on: https://go-review.googlesource.com/10157 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-18 14:55:47 +00:00
Austin Clements	277acca286	runtime: hold worldsema while starting the world Currently, startTheWorld releases worldsema before starting the world. Since startTheWorld can change gomaxprocs after allowing Ps to run, this means that gomaxprocs can change while another P holds worldsema. Unfortunately, the garbage collector and forEachP assume that holding worldsema protects against changes in gomaxprocs (which it almost does). In particular, this is causing somewhat frequent "P did not run fn" crashes in forEachP in the runtime tests because gomaxprocs is changing between the several loops that forEachP does over all the Ps. Fix this by only releasing worldsema after the world is started. This relates to issue #10618. forEachP still fails under stress testing, but much less frequently. Change-Id: I085d627b70cca9ebe9af28fe73b9872f1bb224ff Reviewed-on: https://go-review.googlesource.com/10156 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-18 14:55:37 +00:00
Austin Clements	9c44a41dd5	runtime: disallow preemption during startTheWorld Currently, startTheWorld clears preemptoff for the current M before starting the world. A few callers increment m.locks around startTheWorld, presumably to prevent preemption any time during starting the world. This is almost certainly pointless (none of the other callers do this), but there's no harm in making startTheWorld keep preemption disabled until it's all done, which definitely lets us drop these m.locks manipulations. Change-Id: I8a93658abd0c72276c9bafa3d2c7848a65b4691a Reviewed-on: https://go-review.googlesource.com/10155 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-18 14:55:31 +00:00
Austin Clements	a1da255aa0	runtime: factor stoptheworld/starttheworld pattern There are several steps to stopping and starting the world and currently they're open-coded in several places. The garbage collector is the only thing that needs to stop and start the world in a non-trivial pattern. Replace all other uses with calls to higher-level functions that implement the entire pattern necessary to stop and start the world. This is a pure refectoring and should not change any code semantics. In the following commits, we'll make changes that are easier to do with this abstraction in place. This commit renames the old starttheworld to startTheWorldWithSema. This is a slight misnomer right now because the callers release worldsema just before calling this. However, a later commit will swap these and I don't want to think of another name in the mean time. Change-Id: I5dc97f87b44fb98963c49c777d7053653974c911 Reviewed-on: https://go-review.googlesource.com/10154 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-18 14:55:25 +00:00
Austin Clements	5f7060afd2	runtime: don't start GC if preemptoff is set In order to avoid deadlocks, startGC avoids kicking off GC if locks are held by the calling M. However, it currently fails to check preemptoff, which is the other way to disable preemption. Fix this by adding a check for preemptoff. Change-Id: Ie1083166e5ba4af5c9d6c5a42efdfaaef41ca997 Reviewed-on: https://go-review.googlesource.com/10153 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-18 14:55:18 +00:00
Alex Brainman	e544bee1dd	runtime: correct exception stack trace output It is misleading when stack trace say: signal arrived during cgo execution but we are not in cgo call. Change-Id: I627e2f2bdc7755074677f77f21befc070a101914 Reviewed-on: https://go-review.googlesource.com/9190 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-18 03:09:45 +00:00
Austin Clements	a0fc306023	runtime: eliminate runqvictims and a copy from runqsteal Currently, runqsteal steals Gs from another P into an intermediate buffer and then copies those Gs into the current P's run queue. This intermediate buffer itself was moved from the stack to the P in commit `c4fe503` to eliminate the cost of zeroing it on every steal. This commit follows up `c4fe503` by stealing directly into the current P's run queue, which eliminates the copy and the need for the intermediate buffer. The update to the tail pointer is only committed once the entire steal operation has succeeded, so the semantics of stealing do not change. Change-Id: Icdd7a0eb82668980bf42c0154b51eef6419fdd51 Reviewed-on: https://go-review.googlesource.com/9998 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-05-17 01:08:42 +00:00
Russ Cox	512f75e8df	runtime: replace GC programs with simpler encoding, faster decoder Small types record the location of pointers in their memory layout by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries, and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using a bitmap for a large type containing arrays does not make sense: if someone refers to the type [1<<28]*byte in a program in such a way that the type information makes it into the binary, it would be a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB (for 1-bit entries) bitmap full of 1s into the binary or even to keep one in memory during the execution of the program. For large types containing arrays, it is much more compact to describe the locations of pointers using a notation that can express repetition than to lay out a bitmap of pointers. Go 1.4 included such a notation, called ``GC programs'' but it was complex, required recursion during decoding, and was generally slow. Dmitriy measured the execution of these programs writing directly to the heap bitmap as being 7x slower than copying from a preunrolled 4-bit mask (and frankly that code was not terribly fast either). For some tests, unrollgcprog1 was seen costing as much as 3x more than the rest of malloc combined. This CL introduces a different form for the GC programs. They use a simple Lempel-Ziv-style encoding of the 1-bit pointer information, in which the only operations are (1) emit the following n bits and (2) repeat the last n bits c more times. This encoding can be generated directly from the Go type information (using repetition only for arrays or large runs of non-pointer data) and it can be decoded very efficiently. In particular the decoding requires little state and no recursion, so that the entire decoding can run without any memory accesses other than the reads of the encoding and the writes of the decoded form to the heap bitmap. For recursive types like arrays of arrays of arrays, the inner instructions are only executed once, not n times, so that large repetitions run at full speed. (In contrast, large repetitions in the old programs repeated the individual bit-level layout of the inner data over and over.) The result is as much as 25x faster decoding compared to the old form. Because the old decoder was so slow, Go 1.4 had three (or so) cases for how to set the heap bitmap bits for an allocation of a given type: (1) If the type had an even number of words up to 32 words, then the 4-bit pointer mask for the type fit in no more than 16 bytes; store the 4-bit pointer mask directly in the binary and copy from it. (1b) If the type had an odd number of words up to 15 words, then the 4-bit pointer mask for the type, doubled to end on a byte boundary, fit in no more than 16 bytes; store that doubled mask directly in the binary and copy from it. (2) If the type had an even number of words up to 128 words, or an odd number of words up to 63 words (again due to doubling), then the 4-bit pointer mask would fit in a 64-byte unrolled mask. Store a GC program in the binary, but leave space in the BSS for the unrolled mask. Execute the GC program to construct the mask the first time it is needed, and thereafter copy from the mask. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. (This is the case that was 7x slower than the other two.) Because the new pointer masks store 1-bit entries instead of 4-bit entries and because using the decoder no longer carries a significant overhead, after this CL (that is, for Go 1.5) there are only two cases: (1) If the type is 128 words or less (no condition about odd or even), store the 1-bit pointer mask directly in the binary and use it to initialize the heap bitmap during malloc. (Implemented in CL 9702.) (2) There is no case 2 anymore. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. Executing the GC program directly into the heap bitmap (case (3) above) was disabled for the Go 1.5 dev cycle, both to avoid needing to use GC programs for typedmemmove and to avoid updating that code as the heap bitmap format changed. Typedmemmove no longer uses this type information; as of CL 9886 it uses the heap bitmap directly. Now that the heap bitmap format is stable, we reintroduce GC programs and their space savings. Benchmarks for heapBitsSetType, before this CL vs this CL: name old mean new mean delta SetTypePtr 7.59ns × (0.99,1.02) 5.16ns × (1.00,1.00) -32.05% (p=0.000) SetTypePtr8 21.0ns × (0.98,1.05) 21.4ns × (1.00,1.00) ~ (p=0.179) SetTypePtr16 24.1ns × (0.99,1.01) 24.6ns × (1.00,1.00) +2.41% (p=0.001) SetTypePtr32 31.2ns × (0.99,1.01) 32.4ns × (0.99,1.02) +3.72% (p=0.001) SetTypePtr64 45.2ns × (1.00,1.00) 47.2ns × (1.00,1.00) +4.42% (p=0.000) SetTypePtr126 75.8ns × (0.99,1.01) 79.1ns × (1.00,1.00) +4.25% (p=0.000) SetTypePtr128 74.3ns × (0.99,1.01) 77.6ns × (1.00,1.01) +4.55% (p=0.000) SetTypePtrSlice 726ns × (1.00,1.01) 712ns × (1.00,1.00) -1.95% (p=0.001) SetTypeNode1 20.0ns × (0.99,1.01) 20.7ns × (1.00,1.00) +3.71% (p=0.000) SetTypeNode1Slice 112ns × (1.00,1.00) 113ns × (0.99,1.00) ~ (p=0.070) SetTypeNode8 23.9ns × (1.00,1.00) 24.7ns × (1.00,1.01) +3.18% (p=0.000) SetTypeNode8Slice 294ns × (0.99,1.02) 287ns × (0.99,1.01) -2.38% (p=0.015) SetTypeNode64 52.8ns × (0.99,1.03) 51.8ns × (0.99,1.01) ~ (p=0.069) SetTypeNode64Slice 1.13µs × (0.99,1.05) 1.14µs × (0.99,1.00) ~ (p=0.767) SetTypeNode64Dead 36.0ns × (1.00,1.01) 32.5ns × (0.99,1.00) -9.67% (p=0.000) SetTypeNode64DeadSlice 1.43µs × (0.99,1.01) 1.40µs × (1.00,1.00) -2.39% (p=0.001) SetTypeNode124 75.7ns × (1.00,1.01) 79.0ns × (1.00,1.00) +4.44% (p=0.000) SetTypeNode124Slice 1.94µs × (1.00,1.01) 2.04µs × (0.99,1.01) +4.98% (p=0.000) SetTypeNode126 75.4ns × (1.00,1.01) 77.7ns × (0.99,1.01) +3.11% (p=0.000) SetTypeNode126Slice 1.95µs × (0.99,1.01) 2.03µs × (1.00,1.00) +3.74% (p=0.000) SetTypeNode128 85.4ns × (0.99,1.01) 122.0ns × (1.00,1.00) +42.89% (p=0.000) SetTypeNode128Slice 2.20µs × (1.00,1.01) 2.36µs × (0.98,1.02) +7.48% (p=0.001) SetTypeNode130 83.3ns × (1.00,1.00) 123.0ns × (1.00,1.00) +47.61% (p=0.000) SetTypeNode130Slice 2.30µs × (0.99,1.01) 2.40µs × (0.98,1.01) +4.37% (p=0.000) SetTypeNode1024 498ns × (1.00,1.00) 537ns × (1.00,1.00) +7.96% (p=0.000) SetTypeNode1024Slice 15.5µs × (0.99,1.01) 17.8µs × (1.00,1.00) +15.27% (p=0.000) The above compares always using a cached pointer mask (and the corresponding waste of memory) against using the programs directly. Some slowdown is expected, in exchange for having a better general algorithm. The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024, along with the slice variants of those. It is possible that the cutoff of 128 words (bits) should be raised in a followup CL, but even with this low cutoff the GC programs are faster than Go 1.4's "fast path" non-GC program case. Benchmarks for heapBitsSetType, Go 1.4 vs this CL: name old mean new mean delta SetTypePtr 6.89ns × (1.00,1.00) 5.17ns × (1.00,1.00) -25.02% (p=0.000) SetTypePtr8 25.8ns × (0.97,1.05) 21.5ns × (1.00,1.00) -16.70% (p=0.000) SetTypePtr16 39.8ns × (0.97,1.02) 24.7ns × (0.99,1.01) -37.81% (p=0.000) SetTypePtr32 68.8ns × (0.98,1.01) 32.2ns × (1.00,1.01) -53.18% (p=0.000) SetTypePtr64 130ns × (1.00,1.00) 47ns × (1.00,1.00) -63.67% (p=0.000) SetTypePtr126 241ns × (0.99,1.01) 79ns × (1.00,1.01) -67.25% (p=0.000) SetTypePtr128 2.07µs × (1.00,1.00) 0.08µs × (1.00,1.00) -96.27% (p=0.000) SetTypePtrSlice 1.05µs × (0.99,1.01) 0.72µs × (0.99,1.02) -31.70% (p=0.000) SetTypeNode1 16.0ns × (0.99,1.01) 20.8ns × (0.99,1.03) +29.91% (p=0.000) SetTypeNode1Slice 184ns × (0.99,1.01) 112ns × (0.99,1.01) -39.26% (p=0.000) SetTypeNode8 29.5ns × (0.97,1.02) 24.6ns × (1.00,1.00) -16.50% (p=0.000) SetTypeNode8Slice 624ns × (0.98,1.02) 285ns × (1.00,1.00) -54.31% (p=0.000) SetTypeNode64 135ns × (0.96,1.08) 52ns × (0.99,1.02) -61.32% (p=0.000) SetTypeNode64Slice 3.83µs × (1.00,1.00) 1.14µs × (0.99,1.01) -70.16% (p=0.000) SetTypeNode64Dead 134ns × (0.99,1.01) 32ns × (1.00,1.01) -75.74% (p=0.000) SetTypeNode64DeadSlice 3.83µs × (0.99,1.00) 1.40µs × (1.00,1.01) -63.42% (p=0.000) SetTypeNode124 240ns × (0.99,1.01) 79ns × (1.00,1.01) -67.05% (p=0.000) SetTypeNode124Slice 7.27µs × (1.00,1.00) 2.04µs × (1.00,1.00) -71.95% (p=0.000) SetTypeNode126 2.06µs × (0.99,1.01) 0.08µs × (0.99,1.01) -96.23% (p=0.000) SetTypeNode126Slice 64.4µs × (1.00,1.00) 2.0µs × (1.00,1.00) -96.85% (p=0.000) SetTypeNode128 2.09µs × (1.00,1.01) 0.12µs × (1.00,1.00) -94.15% (p=0.000) SetTypeNode128Slice 65.4µs × (1.00,1.00) 2.4µs × (0.99,1.03) -96.39% (p=0.000) SetTypeNode130 2.11µs × (1.00,1.00) 0.12µs × (1.00,1.00) -94.18% (p=0.000) SetTypeNode130Slice 66.3µs × (1.00,1.00) 2.4µs × (0.97,1.08) -96.34% (p=0.000) SetTypeNode1024 16.0µs × (1.00,1.01) 0.5µs × (1.00,1.00) -96.65% (p=0.000) SetTypeNode1024Slice 512µs × (1.00,1.00) 18µs × (0.98,1.04) -96.45% (p=0.000) SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation. Both Go 1.4 and this CL are using pointer bitmaps for this case, so that's an overall 3x speedup for using pointer bitmaps. SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation. Both Go 1.4 and this CL are running the GC program for this case, so that's an overall 17x speedup when using GC programs (and I've seen >20x on other systems). Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against this CL's SetTypeNode128 (GC program), the slow path in the code in this CL is 2x faster than the fast path in Go 1.4. The Go 1 benchmarks are basically unaffected compared to just before this CL. Go 1 benchmarks, before this CL vs this CL: name old mean new mean delta BinaryTree17 5.87s × (0.97,1.04) 5.91s × (0.96,1.04) ~ (p=0.306) Fannkuch11 4.38s × (1.00,1.00) 4.37s × (1.00,1.01) -0.22% (p=0.006) FmtFprintfEmpty 90.7ns × (0.97,1.10) 89.3ns × (0.96,1.09) ~ (p=0.280) FmtFprintfString 282ns × (0.98,1.04) 287ns × (0.98,1.07) +1.72% (p=0.039) FmtFprintfInt 269ns × (0.99,1.03) 282ns × (0.97,1.04) +4.87% (p=0.000) FmtFprintfIntInt 478ns × (0.99,1.02) 481ns × (0.99,1.02) +0.61% (p=0.048) FmtFprintfPrefixedInt 399ns × (0.98,1.03) 400ns × (0.98,1.05) ~ (p=0.533) FmtFprintfFloat 563ns × (0.99,1.01) 570ns × (1.00,1.01) +1.37% (p=0.000) FmtManyArgs 1.89µs × (0.99,1.01) 1.92µs × (0.99,1.02) +1.88% (p=0.000) GobDecode 15.2ms × (0.99,1.01) 15.2ms × (0.98,1.05) ~ (p=0.609) GobEncode 11.6ms × (0.98,1.03) 11.9ms × (0.98,1.04) +2.17% (p=0.000) Gzip 648ms × (0.99,1.01) 648ms × (1.00,1.01) ~ (p=0.835) Gunzip 142ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.169) HTTPClientServer 90.5µs × (0.98,1.03) 91.5µs × (0.98,1.04) +1.04% (p=0.045) JSONEncode 31.5ms × (0.98,1.03) 31.4ms × (0.98,1.03) ~ (p=0.549) JSONDecode 111ms × (0.99,1.01) 107ms × (0.99,1.01) -3.21% (p=0.000) Mandelbrot200 6.01ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.878) GoParse 6.54ms × (0.99,1.02) 6.61ms × (0.99,1.03) +1.08% (p=0.004) RegexpMatchEasy0_32 160ns × (1.00,1.01) 161ns × (1.00,1.00) +0.40% (p=0.000) RegexpMatchEasy0_1K 560ns × (0.99,1.01) 559ns × (0.99,1.01) ~ (p=0.088) RegexpMatchEasy1_32 138ns × (0.99,1.01) 138ns × (1.00,1.00) ~ (p=0.380) RegexpMatchEasy1_1K 877ns × (1.00,1.00) 878ns × (1.00,1.00) ~ (p=0.157) RegexpMatchMedium_32 251ns × (0.99,1.00) 251ns × (1.00,1.01) +0.28% (p=0.021) RegexpMatchMedium_1K 72.6µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.539) RegexpMatchHard_32 3.84µs × (1.00,1.00) 3.84µs × (1.00,1.00) ~ (p=0.378) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) ~ (p=0.067) Revcomp 904ms × (0.99,1.02) 904ms × (0.99,1.01) ~ (p=0.943) Template 125ms × (0.99,1.02) 127ms × (0.99,1.01) +1.79% (p=0.000) TimeParse 627ns × (0.99,1.01) 622ns × (0.99,1.01) -0.88% (p=0.000) TimeFormat 655ns × (0.99,1.02) 655ns × (0.99,1.02) ~ (p=0.976) For the record, Go 1 benchmarks, Go 1.4 vs this CL: name old mean new mean delta BinaryTree17 4.61s × (0.97,1.05) 5.91s × (0.98,1.03) +28.35% (p=0.000) Fannkuch11 4.40s × (0.99,1.03) 4.41s × (0.99,1.01) ~ (p=0.212) FmtFprintfEmpty 102ns × (0.99,1.01) 84ns × (0.99,1.02) -18.38% (p=0.000) FmtFprintfString 302ns × (0.98,1.01) 303ns × (0.99,1.02) ~ (p=0.203) FmtFprintfInt 313ns × (0.97,1.05) 270ns × (0.99,1.01) -13.69% (p=0.000) FmtFprintfIntInt 524ns × (0.98,1.02) 477ns × (0.99,1.00) -8.87% (p=0.000) FmtFprintfPrefixedInt 424ns × (0.98,1.02) 386ns × (0.99,1.01) -8.96% (p=0.000) FmtFprintfFloat 652ns × (0.98,1.02) 594ns × (0.97,1.05) -8.97% (p=0.000) FmtManyArgs 2.13µs × (0.99,1.02) 1.94µs × (0.99,1.01) -8.92% (p=0.000) GobDecode 17.1ms × (0.99,1.02) 14.9ms × (0.98,1.03) -13.07% (p=0.000) GobEncode 13.5ms × (0.98,1.03) 11.5ms × (0.98,1.03) -15.25% (p=0.000) Gzip 656ms × (0.99,1.02) 647ms × (0.99,1.01) -1.29% (p=0.000) Gunzip 143ms × (0.99,1.02) 144ms × (0.99,1.01) ~ (p=0.204) HTTPClientServer 88.2µs × (0.98,1.02) 90.8µs × (0.98,1.01) +2.93% (p=0.000) JSONEncode 32.2ms × (0.98,1.02) 30.9ms × (0.97,1.04) -4.06% (p=0.001) JSONDecode 121ms × (0.98,1.02) 110ms × (0.98,1.05) -8.95% (p=0.000) Mandelbrot200 6.06ms × (0.99,1.01) 6.11ms × (0.98,1.04) ~ (p=0.184) GoParse 6.76ms × (0.97,1.04) 6.58ms × (0.98,1.05) -2.63% (p=0.003) RegexpMatchEasy0_32 195ns × (1.00,1.01) 155ns × (0.99,1.01) -20.43% (p=0.000) RegexpMatchEasy0_1K 479ns × (0.98,1.03) 535ns × (0.99,1.02) +11.59% (p=0.000) RegexpMatchEasy1_32 169ns × (0.99,1.02) 131ns × (0.99,1.03) -22.44% (p=0.000) RegexpMatchEasy1_1K 1.53µs × (0.99,1.01) 0.87µs × (0.99,1.02) -43.07% (p=0.000) RegexpMatchMedium_32 334ns × (0.99,1.01) 242ns × (0.99,1.01) -27.53% (p=0.000) RegexpMatchMedium_1K 125µs × (1.00,1.01) 72µs × (0.99,1.03) -42.53% (p=0.000) RegexpMatchHard_32 6.03µs × (0.99,1.01) 3.79µs × (0.99,1.01) -37.12% (p=0.000) RegexpMatchHard_1K 189µs × (0.99,1.02) 115µs × (0.99,1.01) -39.20% (p=0.000) Revcomp 935ms × (0.96,1.03) 926ms × (0.98,1.02) ~ (p=0.083) Template 146ms × (0.97,1.05) 119ms × (0.99,1.01) -18.37% (p=0.000) TimeParse 660ns × (0.99,1.01) 624ns × (0.99,1.02) -5.43% (p=0.000) TimeFormat 670ns × (0.98,1.02) 710ns × (1.00,1.01) +5.97% (p=0.000) This CL is a bit larger than I would like, but the compiler, linker, runtime, and package reflect all need to be in sync about the format of these programs, so there is no easy way to split this into independent changes (at least while keeping the build working at each change). Fixes #9625. Fixes #10524. Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a Reviewed-on: https://go-review.googlesource.com/9888 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Russ Cox <rsc@golang.org>	2015-05-16 00:38:17 +00:00
Russ Cox	d820d5f3ab	runtime: make mapzero not crash on arm Change-Id: I40e8a4a2e62253233b66f6a2e61e222437292c31 Reviewed-on: https://go-review.googlesource.com/10151 Reviewed-by: Minux Ma <minux@golang.org>	2015-05-15 20:14:41 +00:00
Russ Cox	c3c047a6a3	runtime: test and fix heap bitmap for 1-pointer allocation on 32-bit system Change-Id: Ic064fe7c6bd3304dcc8c3f7b3b5393870b5387c2 Reviewed-on: https://go-review.googlesource.com/10119 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Austin Clements <austin@google.com>	2015-05-15 18:47:00 +00:00
Russ Cox	7e26a2d9a8	runtime: allocate map element zero values for reflect-created types on demand Preallocating them in reflect means that (1) if you say _ = PtrTo(ArrayOf(1000000000, reflect.TypeOf(byte(0)))), you just allocated 1GB of data (2) if you say it again, that's another GB of data. The only use of t.zero in the runtime is for map elements. Delay the allocation until the creation of a map with that element type, and share the zeros. The one downside of the shared zero is that it's not garbage collected, but it's also never written, so the OS should be able to handle it fairly efficiently. Change-Id: I56b098a091abf3ac0945de28ebef9a6c08e76614 Reviewed-on: https://go-review.googlesource.com/10111 Reviewed-by: Keith Randall <khr@golang.org>	2015-05-15 13:56:40 +00:00
Russ Cox	65c4d7beab	runtime: optimize heapBitsBulkBarrier a tiny amount This may be mostly noise but: name old mean new mean delta BinaryTree17 6.03s × (0.98,1.02) 5.98s × (0.97,1.03) ~ (p=0.306) Fannkuch11 4.42s × (0.99,1.01) 4.34s × (0.99,1.02) -1.83% (p=0.000) FmtFprintfEmpty 84.7ns × (0.99,1.01) 84.4ns × (1.00,1.00) ~ (p=0.138) FmtFprintfString 289ns × (0.98,1.02) 289ns × (1.00,1.01) ~ (p=0.509) FmtFprintfInt 280ns × (0.97,1.03) 272ns × (0.98,1.03) -2.64% (p=0.003) FmtFprintfIntInt 484ns × (0.98,1.02) 482ns × (0.98,1.03) ~ (p=0.606) FmtFprintfPrefixedInt 397ns × (0.98,1.03) 393ns × (0.99,1.02) ~ (p=0.064) FmtFprintfFloat 573ns × (0.99,1.01) 569ns × (0.99,1.01) -0.69% (p=0.023) FmtManyArgs 1.89µs × (0.99,1.02) 1.91µs × (0.98,1.02) ~ (p=0.219) GobDecode 15.4ms × (0.99,1.02) 15.1ms × (0.99,1.01) -2.05% (p=0.000) GobEncode 12.0ms × (0.97,1.04) 11.9ms × (0.97,1.03) ~ (p=0.458) Gzip 652ms × (0.99,1.01) 653ms × (0.99,1.01) ~ (p=0.743) Gunzip 144ms × (0.99,1.01) 143ms × (0.99,1.01) ~ (p=0.134) HTTPClientServer 91.6µs × (0.99,1.01) 91.8µs × (0.99,1.03) ~ (p=0.678) JSONEncode 31.9ms × (1.00,1.00) 32.0ms × (0.99,1.01) ~ (p=0.334) JSONDecode 110ms × (0.99,1.01) 110ms × (0.99,1.01) ~ (p=0.315) Mandelbrot200 6.04ms × (0.99,1.01) 6.04ms × (1.00,1.01) ~ (p=0.596) GoParse 6.72ms × (0.98,1.03) 6.74ms × (0.99,1.03) ~ (p=0.577) RegexpMatchEasy0_32 161ns × (0.99,1.01) 160ns × (1.00,1.00) -0.83% (p=0.002) RegexpMatchEasy0_1K 542ns × (0.99,1.02) 541ns × (0.99,1.01) ~ (p=0.396) RegexpMatchEasy1_32 140ns × (0.98,1.01) 137ns × (1.00,1.00) -2.12% (p=0.000) RegexpMatchEasy1_1K 892ns × (0.99,1.01) 891ns × (1.00,1.01) ~ (p=0.631) RegexpMatchMedium_32 255ns × (0.99,1.01) 253ns × (0.99,1.01) -0.76% (p=0.008) RegexpMatchMedium_1K 73.1µs × (1.00,1.01) 72.9µs × (1.00,1.00) ~ (p=0.229) RegexpMatchHard_32 3.86µs × (1.00,1.01) 3.85µs × (1.00,1.00) ~ (p=0.341) RegexpMatchHard_1K 117µs × (1.00,1.01) 117µs × (0.99,1.00) ~ (p=0.955) Revcomp 954ms × (0.97,1.03) 955ms × (0.98,1.02) ~ (p=0.894) Template 133ms × (0.97,1.05) 129ms × (0.99,1.02) -2.50% (p=0.014) TimeParse 629ns × (0.99,1.01) 626ns × (0.99,1.01) ~ (p=0.106) TimeFormat 663ns × (0.99,1.01) 660ns × (0.99,1.02) ~ (p=0.231) Change-Id: I580e03ed01b0629cb5eae4c4637618f20127f924 Reviewed-on: https://go-review.googlesource.com/9994 Reviewed-by: Austin Clements <austin@google.com>	2015-05-15 13:52:00 +00:00
Russ Cox	497970f421	runtime: use memmove during slice append The effect of this CL: name old mean new mean delta BinaryTree17 5.97s × (0.96,1.04) 5.95s × (0.98,1.02) ~ (p=0.697) Fannkuch11 4.39s × (1.00,1.01) 4.41s × (1.00,1.01) +0.52% (p=0.015) FmtFprintfEmpty 90.8ns × (0.97,1.05) 89.4ns × (0.94,1.13) ~ (p=0.571) FmtFprintfString 305ns × (0.99,1.01) 292ns × (0.98,1.05) -4.35% (p=0.000) FmtFprintfInt 278ns × (0.96,1.03) 279ns × (0.98,1.04) ~ (p=0.741) FmtFprintfIntInt 489ns × (0.99,1.02) 482ns × (0.98,1.03) -1.43% (p=0.024) FmtFprintfPrefixedInt 402ns × (0.98,1.02) 395ns × (0.98,1.03) -1.67% (p=0.014) FmtFprintfFloat 578ns × (1.00,1.00) 569ns × (0.99,1.01) -1.48% (p=0.000) FmtManyArgs 1.88µs × (0.99,1.01) 1.88µs × (1.00,1.01) ~ (p=0.055) GobDecode 15.3ms × (0.99,1.01) 15.2ms × (1.00,1.01) -0.61% (p=0.007) GobEncode 11.8ms × (0.98,1.05) 11.6ms × (0.99,1.01) ~ (p=0.075) Gzip 647ms × (0.99,1.01) 647ms × (1.00,1.00) ~ (p=0.790) Gunzip 143ms × (1.00,1.00) 142ms × (1.00,1.00) ~ (p=0.370) HTTPClientServer 91.2µs × (0.99,1.01) 91.7µs × (0.99,1.02) ~ (p=0.233) JSONEncode 31.5ms × (0.98,1.01) 31.8ms × (0.99,1.02) +1.09% (p=0.015) JSONDecode 110ms × (0.99,1.01) 110ms × (0.99,1.02) ~ (p=0.577) Mandelbrot200 6.00ms × (1.00,1.00) 6.02ms × (1.00,1.00) +0.24% (p=0.001) GoParse 6.68ms × (0.98,1.02) 6.61ms × (0.99,1.01) -1.10% (p=0.027) RegexpMatchEasy0_32 162ns × (1.00,1.00) 161ns × (1.00,1.01) -0.66% (p=0.001) RegexpMatchEasy0_1K 539ns × (1.00,1.00) 539ns × (0.99,1.01) ~ (p=0.509) RegexpMatchEasy1_32 140ns × (0.99,1.02) 139ns × (0.99,1.02) ~ (p=0.163) RegexpMatchEasy1_1K 886ns × (1.00,1.00) 887ns × (1.00,1.00) ~ (p=0.408) RegexpMatchMedium_32 252ns × (1.00,1.00) 255ns × (0.99,1.01) +1.01% (p=0.000) RegexpMatchMedium_1K 72.6µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.176) RegexpMatchHard_32 3.84µs × (1.00,1.00) 3.84µs × (1.00,1.00) ~ (p=0.403) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) ~ (p=0.351) Revcomp 926ms × (0.99,1.01) 925ms × (0.99,1.01) ~ (p=0.541) Template 126ms × (0.99,1.02) 130ms × (0.99,1.01) +3.42% (p=0.000) TimeParse 632ns × (0.99,1.01) 626ns × (1.00,1.00) -0.88% (p=0.000) TimeFormat 658ns × (0.99,1.01) 662ns × (0.99,1.02) ~ (p=0.111) The effect of this CL combined with CL 9886: name old mean new mean delta BinaryTree17 5.90s × (0.98,1.03) 5.95s × (0.98,1.02) ~ (p=0.175) Fannkuch11 4.34s × (1.00,1.00) 4.41s × (1.00,1.01) +1.69% (p=0.000) FmtFprintfEmpty 87.3ns × (0.97,1.17) 89.4ns × (0.94,1.13) ~ (p=0.499) FmtFprintfString 288ns × (0.98,1.04) 292ns × (0.98,1.05) ~ (p=0.292) FmtFprintfInt 290ns × (0.98,1.05) 279ns × (0.98,1.04) -3.76% (p=0.001) FmtFprintfIntInt 493ns × (0.98,1.04) 482ns × (0.98,1.03) -2.27% (p=0.017) FmtFprintfPrefixedInt 399ns × (0.98,1.02) 395ns × (0.98,1.03) ~ (p=0.159) FmtFprintfFloat 569ns × (1.00,1.00) 569ns × (0.99,1.01) ~ (p=0.847) FmtManyArgs 1.90µs × (0.99,1.03) 1.88µs × (1.00,1.01) -1.14% (p=0.009) GobDecode 15.2ms × (1.00,1.01) 15.2ms × (1.00,1.01) ~ (p=0.170) GobEncode 11.8ms × (0.99,1.02) 11.6ms × (0.99,1.01) -1.47% (p=0.003) Gzip 649ms × (0.99,1.00) 647ms × (1.00,1.00) ~ (p=0.200) Gunzip 144ms × (0.99,1.01) 142ms × (1.00,1.00) -1.04% (p=0.000) HTTPClientServer 91.1µs × (0.98,1.03) 91.7µs × (0.99,1.02) ~ (p=0.345) JSONEncode 31.5ms × (0.99,1.01) 31.8ms × (0.99,1.02) +0.98% (p=0.021) JSONDecode 110ms × (1.00,1.01) 110ms × (0.99,1.02) ~ (p=0.259) Mandelbrot200 6.02ms × (1.00,1.01) 6.02ms × (1.00,1.00) ~ (p=0.500) GoParse 6.68ms × (1.00,1.01) 6.61ms × (0.99,1.01) -1.17% (p=0.001) RegexpMatchEasy0_32 161ns × (1.00,1.00) 161ns × (1.00,1.01) -0.39% (p=0.033) RegexpMatchEasy0_1K 539ns × (1.00,1.00) 539ns × (0.99,1.01) ~ (p=0.445) RegexpMatchEasy1_32 138ns × (1.00,1.01) 139ns × (0.99,1.02) ~ (p=0.281) RegexpMatchEasy1_1K 887ns × (1.00,1.01) 887ns × (1.00,1.00) ~ (p=0.610) RegexpMatchMedium_32 251ns × (1.00,1.02) 255ns × (0.99,1.01) +1.42% (p=0.000) RegexpMatchMedium_1K 72.7µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.097) RegexpMatchHard_32 3.85µs × (1.00,1.00) 3.84µs × (1.00,1.00) -0.31% (p=0.000) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) ~ (p=0.704) Revcomp 923ms × (0.98,1.02) 925ms × (0.99,1.01) ~ (p=0.574) Template 126ms × (0.98,1.03) 130ms × (0.99,1.01) +3.28% (p=0.000) TimeParse 631ns × (0.99,1.02) 626ns × (1.00,1.00) ~ (p=0.053) TimeFormat 660ns × (0.99,1.01) 662ns × (0.99,1.02) ~ (p=0.398) Change-Id: I59c03d329fe7bc178a31477c6f1f01062b881041 Reviewed-on: https://go-review.googlesource.com/9993 Reviewed-by: Austin Clements <austin@google.com>	2015-05-15 13:51:49 +00:00
Russ Cox	30aacd4ce2	runtime: add Node128, Node130 benchmarks Change-Id: I815a7ceeea48cc652b3c8568967665af39b02834 Reviewed-on: https://go-review.googlesource.com/10045 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-05-14 20:21:34 +00:00
Russ Cox	ecfe42cab0	runtime: keep pointer bits set always in 1-word spans It's dumb to clear them in initSpan, set them in heapBitsSetType, clear them in heapBitsSweepSpan, set them again in heapBitsSetType, clear them again in heapBitsSweepSpan, and so on. Set them in initSpan and be done with it (until the span is reused for objects of a different size). This avoids an atomic operation in a common case (one-word allocation). Suggested by rlh. name old mean new mean delta BinaryTree17 5.87s × (0.97,1.03) 5.93s × (0.98,1.04) ~ (p=0.056) Fannkuch11 4.34s × (1.00,1.01) 4.41s × (1.00,1.00) +1.42% (p=0.000) FmtFprintfEmpty 86.1ns × (0.98,1.03) 88.9ns × (0.95,1.14) ~ (p=0.066) FmtFprintfString 292ns × (0.97,1.04) 284ns × (0.98,1.03) -2.64% (p=0.000) FmtFprintfInt 271ns × (0.98,1.06) 274ns × (0.98,1.05) ~ (p=0.148) FmtFprintfIntInt 478ns × (0.98,1.05) 487ns × (0.98,1.03) +1.85% (p=0.004) FmtFprintfPrefixedInt 397ns × (0.98,1.05) 394ns × (0.98,1.02) ~ (p=0.184) FmtFprintfFloat 553ns × (0.99,1.02) 543ns × (0.99,1.01) -1.71% (p=0.000) FmtManyArgs 1.90µs × (0.98,1.05) 1.88µs × (0.99,1.01) -0.97% (p=0.037) GobDecode 15.1ms × (0.99,1.01) 15.3ms × (0.99,1.01) +0.78% (p=0.001) GobEncode 11.7ms × (0.98,1.05) 11.6ms × (0.99,1.02) -1.39% (p=0.009) Gzip 646ms × (1.00,1.01) 647ms × (1.00,1.01) ~ (p=0.120) Gunzip 142ms × (1.00,1.00) 142ms × (1.00,1.00) ~ (p=0.068) HTTPClientServer 89.7µs × (0.99,1.01) 90.1µs × (0.98,1.03) ~ (p=0.224) JSONEncode 31.3ms × (0.99,1.01) 31.2ms × (0.99,1.02) ~ (p=0.149) JSONDecode 113ms × (0.99,1.01) 111ms × (0.99,1.01) -1.25% (p=0.000) Mandelbrot200 6.01ms × (1.00,1.00) 6.01ms × (1.00,1.00) +0.09% (p=0.015) GoParse 6.63ms × (0.98,1.03) 6.55ms × (0.99,1.02) -1.10% (p=0.006) RegexpMatchEasy0_32 161ns × (1.00,1.00) 161ns × (1.00,1.00) (sample has zero variance) RegexpMatchEasy0_1K 539ns × (0.99,1.01) 563ns × (0.99,1.01) +4.51% (p=0.000) RegexpMatchEasy1_32 140ns × (0.99,1.01) 141ns × (0.99,1.01) +1.34% (p=0.000) RegexpMatchEasy1_1K 886ns × (1.00,1.01) 888ns × (1.00,1.00) +0.20% (p=0.003) RegexpMatchMedium_32 252ns × (1.00,1.02) 255ns × (0.99,1.01) +1.32% (p=0.000) RegexpMatchMedium_1K 72.7µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.296) RegexpMatchHard_32 3.84µs × (1.00,1.01) 3.84µs × (1.00,1.00) ~ (p=0.339) RegexpMatchHard_1K 117µs × (1.00,1.01) 117µs × (1.00,1.00) -0.28% (p=0.022) Revcomp 914ms × (0.99,1.01) 909ms × (0.99,1.01) -0.49% (p=0.031) Template 128ms × (0.99,1.01) 127ms × (0.99,1.01) -1.10% (p=0.000) TimeParse 628ns × (0.99,1.01) 639ns × (0.99,1.01) +1.69% (p=0.000) TimeFormat 660ns × (0.99,1.01) 662ns × (0.99,1.02) ~ (p=0.287) Change-Id: I3127b0ab89708267c74aa7d0eae1db1a1bcdfda5 Reviewed-on: https://go-review.googlesource.com/9884 Reviewed-by: Austin Clements <austin@google.com>	2015-05-14 15:58:54 +00:00
Russ Cox	94934f843e	runtime: rewrite addb/subtractb to be simpler to compile; introduce add1, subtract1 This reduces the depth of the inlining at a particular call site. The inliner introduces many temporary variables, and the compiler can do a better job with fewer. Being verbose in the bodies of these helper functions seems like a reasonable tradeoff: the uses are still just as readable, and they run faster in some important cases. Change-Id: I5323976ed3704d0acd18fb31176cfbf5ba23a89c Reviewed-on: https://go-review.googlesource.com/9883 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-05-14 15:55:42 +00:00
Russ Cox	5b3739357a	runtime: skip atomics in heapBitsSetType when GC is not running Suggested by Rick during code review of this code, but separated out for easier diagnosis in case it causes problems (and also easier rollback). name old mean new mean delta SetTypePtr 13.9ns × (0.98,1.05) 6.2ns × (0.99,1.01) -55.18% (p=0.000) SetTypePtr8 15.5ns × (0.95,1.10) 15.5ns × (0.99,1.05) ~ (p=0.952) SetTypePtr16 17.8ns × (0.99,1.05) 18.0ns × (1.00,1.00) ~ (p=0.157) SetTypePtr32 25.2ns × (0.99,1.01) 24.3ns × (0.99,1.01) -3.86% (p=0.000) SetTypePtr64 42.2ns × (0.93,1.13) 40.8ns × (0.99,1.01) ~ (p=0.239) SetTypePtr126 67.3ns × (1.00,1.00) 67.5ns × (0.99,1.02) ~ (p=0.365) SetTypePtr128 67.6ns × (1.00,1.01) 70.1ns × (0.97,1.10) ~ (p=0.063) SetTypePtrSlice 575ns × (0.98,1.06) 543ns × (0.95,1.17) -5.54% (p=0.034) SetTypeNode1 12.4ns × (0.98,1.09) 12.8ns × (0.99,1.01) +3.40% (p=0.021) SetTypeNode1Slice 97.1ns × (0.97,1.09) 89.5ns × (1.00,1.00) -7.78% (p=0.000) SetTypeNode8 29.8ns × (1.00,1.01) 17.7ns × (1.00,1.01) -40.74% (p=0.000) SetTypeNode8Slice 204ns × (0.99,1.04) 190ns × (0.97,1.06) -6.96% (p=0.000) SetTypeNode64 42.8ns × (0.99,1.01) 44.0ns × (0.95,1.12) ~ (p=0.163) SetTypeNode64Slice 1.00µs × (0.95,1.09) 0.98µs × (0.96,1.08) ~ (p=0.356) SetTypeNode64Dead 12.2ns × (0.99,1.04) 12.7ns × (1.00,1.01) +4.34% (p=0.000) SetTypeNode64DeadSlice 1.14µs × (0.94,1.11) 0.99µs × (0.99,1.03) -13.74% (p=0.000) SetTypeNode124 67.9ns × (0.99,1.03) 70.4ns × (0.95,1.15) ~ (p=0.115) SetTypeNode124Slice 1.76µs × (0.99,1.04) 1.88µs × (0.91,1.23) ~ (p=0.096) SetTypeNode126 67.7ns × (1.00,1.01) 68.2ns × (0.99,1.02) +0.72% (p=0.014) SetTypeNode126Slice 1.76µs × (1.00,1.01) 1.87µs × (0.93,1.15) +6.15% (p=0.035) SetTypeNode1024 462ns × (0.96,1.10) 451ns × (0.99,1.05) ~ (p=0.224) SetTypeNode1024Slice 14.4µs × (0.95,1.15) 14.2µs × (0.97,1.19) ~ (p=0.676) name old mean new mean delta BinaryTree17 5.87s × (0.98,1.04) 5.87s × (0.98,1.03) ~ (p=0.993) Fannkuch11 4.39s × (0.99,1.01) 4.34s × (1.00,1.01) -1.22% (p=0.000) FmtFprintfEmpty 90.6ns × (0.97,1.06) 89.4ns × (0.97,1.03) ~ (p=0.070) FmtFprintfString 305ns × (0.98,1.02) 296ns × (0.99,1.02) -2.94% (p=0.000) FmtFprintfInt 276ns × (0.97,1.04) 270ns × (0.98,1.03) -2.17% (p=0.001) FmtFprintfIntInt 490ns × (0.97,1.05) 473ns × (0.99,1.02) -3.59% (p=0.000) FmtFprintfPrefixedInt 402ns × (0.99,1.02) 397ns × (0.99,1.01) -1.15% (p=0.000) FmtFprintfFloat 577ns × (0.99,1.01) 549ns × (0.99,1.01) -4.78% (p=0.000) FmtManyArgs 1.89µs × (0.99,1.02) 1.87µs × (0.99,1.01) -1.43% (p=0.000) GobDecode 15.2ms × (0.99,1.01) 14.7ms × (0.99,1.02) -3.55% (p=0.000) GobEncode 11.7ms × (0.98,1.04) 11.5ms × (0.99,1.02) -1.63% (p=0.002) Gzip 647ms × (0.99,1.01) 647ms × (1.00,1.01) ~ (p=0.486) Gunzip 142ms × (1.00,1.00) 143ms × (1.00,1.00) ~ (p=0.234) HTTPClientServer 90.7µs × (0.99,1.01) 90.4µs × (0.98,1.04) ~ (p=0.331) JSONEncode 31.9ms × (0.97,1.06) 31.6ms × (0.98,1.02) ~ (p=0.206) JSONDecode 110ms × (0.99,1.01) 112ms × (0.99,1.02) +1.48% (p=0.000) Mandelbrot200 6.00ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.058) GoParse 6.63ms × (0.98,1.03) 6.61ms × (0.98,1.02) ~ (p=0.353) RegexpMatchEasy0_32 162ns × (0.99,1.01) 161ns × (1.00,1.00) -0.33% (p=0.004) RegexpMatchEasy0_1K 539ns × (0.99,1.01) 540ns × (0.99,1.02) ~ (p=0.222) RegexpMatchEasy1_32 139ns × (0.99,1.01) 140ns × (0.97,1.03) ~ (p=0.054) RegexpMatchEasy1_1K 886ns × (1.00,1.00) 887ns × (1.00,1.00) +0.18% (p=0.001) RegexpMatchMedium_32 252ns × (1.00,1.01) 252ns × (1.00,1.00) +0.21% (p=0.010) RegexpMatchMedium_1K 72.7µs × (1.00,1.01) 72.6µs × (1.00,1.00) ~ (p=0.060) RegexpMatchHard_32 3.84µs × (1.00,1.00) 3.84µs × (1.00,1.00) ~ (p=0.065) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) -0.27% (p=0.000) Revcomp 916ms × (0.98,1.04) 909ms × (0.99,1.01) ~ (p=0.054) Template 126ms × (0.99,1.01) 128ms × (0.99,1.02) +1.43% (p=0.000) TimeParse 632ns × (0.99,1.01) 625ns × (1.00,1.01) -1.05% (p=0.000) TimeFormat 655ns × (0.99,1.02) 669ns × (0.99,1.02) +2.01% (p=0.000) Change-Id: I9477b7c9489c6fa98e860c190ce06cd73c53c6a1 Reviewed-on: https://go-review.googlesource.com/9829 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-05-14 15:54:53 +00:00
Brad Fitzpatrick	6f2c0f1585	runtime: add check for malloc in a signal handler Change-Id: Ic8ebbe81eb788626c01bfab238d54236e6e5ef2b Reviewed-on: https://go-review.googlesource.com/9964 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-13 20:36:19 +00:00
Rick Hudson	c4fe503119	runtime: reduce thrashing of gs between ps One important use case is a pipeline computation that pass values from one Goroutine to the next and then exits or is placed in a wait state. If GOMAXPROCS > 1 a Goroutine running on P1 will enable another Goroutine and then immediately make P1 available to execute it. We need to prevent other Ps from stealing the G that P1 is about to execute. Otherwise the Gs can thrash between Ps causing unneeded synchronization and slowing down throughput. Fix this by changing the stealing logic so that when a P attempts to steal the only G on some other P's run queue, it will pause momentarily to allow the victim P to schedule the G. As part of optimizing stealing we also use a per P victim queue move stolen gs. This eliminates the zeroing of a stack local victim queue which turned out to be expensive. This CL is a necessary but not sufficient prerequisite to changing the default value of GOMAXPROCS to something > 1 which is another CL/discussion. For highly serialized programs, such as GoroutineRing below this can make a large difference. For larger and more parallel programs such as the x/benchmarks there is no noticeable detriment. ~/work/code/src/rsc.io/benchstat/benchstat old.txt new.txt name old mean new mean delta GoroutineRing 30.2µs × (0.98,1.01) 30.1µs × (0.97,1.04) ~ (p=0.941) GoroutineRing-2 113µs × (0.91,1.07) 30µs × (0.98,1.03) -73.17% (p=0.004) GoroutineRing-4 144µs × (0.98,1.02) 32µs × (0.98,1.01) -77.69% (p=0.000) GoroutineRingBuf 32.7µs × (0.97,1.03) 32.5µs × (0.97,1.02) ~ (p=0.795) GoroutineRingBuf-2 120µs × (0.92,1.08) 33µs × (1.00,1.00) -72.48% (p=0.004) GoroutineRingBuf-4 138µs × (0.92,1.06) 33µs × (1.00,1.00) -76.21% (p=0.003) The bench benchmarks show little impact. old new garbage 7032879 7011696 httpold 25509 25301 splayold 1022073 1019499 jsonold 28230624 28081433 Change-Id: I228c48fed8d85c9bbef16a7edc53ab7898506f50 Reviewed-on: https://go-review.googlesource.com/9872 Reviewed-by: Austin Clements <austin@google.com>	2015-05-13 12:55:24 +00:00
Austin Clements	350fd548b3	runtime: don't run runq tests on the system stack Running these tests on the system stack is problematic because they allocate Ps, which are large enough to overflow the system stack if they are stack-allocated. It used to be necessary to run these tests on the system stack because they were written in C, but since this is no longer the case, we can fix this problem by simply not running the tests on the system stack. This also means we no longer need the hack in one of these tests that forces the allocated Ps to escape to the heap, so eliminate that as well. Change-Id: I9064f5f8fd7f7b446ff39a22a70b172cfcb2dc57 Reviewed-on: https://go-review.googlesource.com/9923 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-05-12 19:58:08 +00:00
David du Colombier	7de86a1b1c	runtime: terminate exit status buffer on Plan 9 The status buffer built by the exit function was not nil-terminated. Fixes #10789. Change-Id: I2d34ac50a19d138176c4b47393497ba7070d5b61 Reviewed-on: https://go-review.googlesource.com/9953 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>	2015-05-12 16:35:58 +00:00
David du Colombier	f85a05581e	runtime: fix signal handling on Plan 9 Once added to the signal queue, the pointer passed to the signal handler could no longer be valid. Instead of passing the pointer to the note string, we recopy the value of the note string to a static array in the signal queue. Fixes #10784. Change-Id: Iddd6837b58a14dfaa16b069308ae28a7b8e0965b Reviewed-on: https://go-review.googlesource.com/9950 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-05-12 16:35:46 +00:00
Michael Hudson-Doyle	77fc03f4cd	cmd/internal/ld, runtime: abort on shared library ABI mismatch This: 1) Defines the ABI hash of a package (as the SHA1 of the __.PKGDEF) 2) Defines the ABI hash of a shared library (sort the packages by import path, concatenate the hashes of the packages and SHA1 that) 3) When building a shared library, compute the above value and define a global symbol that points to a go string that has the hash as its value. 4) When linking against a shared library, read the abi hash from the library and put both the value seen at link time and a reference to the global symbol into the moduledata. 5) During runtime initialization, check that the hash seen at link time still matches the hash the global symbol points to. Change-Id: Iaa54c783790e6dde3057a2feadc35473d49614a5 Reviewed-on: https://go-review.googlesource.com/8773 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>	2015-05-12 01:30:40 +00:00
Michael Hudson-Doyle	be0cb9224b	runtime: fix addmoduledata to follow the platform ABI addmoduledata is called from a .init_array function and need to follow the platform ABI. It contains accesses to global data which are rewritten to use R15 by the assembler, and as R15 is callee-save we need to save it. Change-Id: I03893efb1576aed4f102f2465421f256f3bb0f30 Reviewed-on: https://go-review.googlesource.com/9941 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-05-12 00:50:32 +00:00
Russ Cox	4212a3c3d9	runtime: use heap bitmap for typedmemmove The current implementation of typedmemmove walks the ptrmask in the type to find out where pointers are. This led to turning off GC programs for the Go 1.5 dev cycle, so that there would always be a ptrmask. Instead of also interpreting the GC programs, interpret the heap bitmap, which we know must be available and up to date. (There is no point to write barriers when writing outside the heap.) This CL is only about correctness. The next CL will optimize the code. Change-Id: Id1305c7c071fd2734ab96634b0e1c745b23fa793 Reviewed-on: https://go-review.googlesource.com/9886 Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 16:38:21 +00:00
Russ Cox	266a842f55	runtime: zero entire bitmap for object, even past dead marker We want typedmemmove to use the heap bitmap to determine where pointers are, instead of reinterpreting the type information. The heap bitmap is simpler to access. In general, typedmemmove will need to be able to look up the bits for any word and find valid pointer information, so fill even after the dead marker. Not filling after the dead marker was an optimization I introduced only a few days ago, when reintroducing the dead marker code. At the time I said it probably wouldn't last, and it didn't. Change-Id: I6ba01bff17ddee1ff429f454abe29867ec60606e Reviewed-on: https://go-review.googlesource.com/9885 Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 16:37:46 +00:00
Russ Cox	e375ca2a25	runtime: reorder bits in heap bitmap bytes The runtime deals with 1-bit pointer bitmaps and 2-bit heap bitmaps that have entries for both pointers and mark bits. Each byte in a 1-bit pointer bitmap looks like pppppppp (all pointer bits). Each byte in a 2-bit heap bitmap looks like mpmpmpmp (mark, pointer, ...). This means that when converting from 1-bit to 2-bit, as we do during malloc, we have to pick up 4 bits in pppp form and use shifts to create the mpmpmpmp form. This CL changes the 2-bit heap bitmap form to mmmmpppp, so that 4 bits picked up in 1-bit form can be used directly in the low bits of the heap bitmap byte, without expansion. This simplifies the code, and it also happens to be faster. name old mean new mean delta SetTypePtr 14.0ns × (0.98,1.09) 14.0ns × (0.98,1.08) ~ (p=0.966) SetTypePtr8 16.5ns × (0.99,1.05) 15.3ns × (0.96,1.16) -6.86% (p=0.012) SetTypePtr16 21.3ns × (0.98,1.05) 18.8ns × (0.94,1.14) -11.49% (p=0.000) SetTypePtr32 34.6ns × (0.93,1.22) 27.7ns × (0.91,1.26) -20.08% (p=0.001) SetTypePtr64 55.7ns × (0.97,1.11) 41.6ns × (0.98,1.04) -25.30% (p=0.000) SetTypePtr126 98.0ns × (1.00,1.00) 67.7ns × (0.99,1.05) -30.88% (p=0.000) SetTypePtr128 98.6ns × (1.00,1.01) 68.6ns × (0.99,1.03) -30.44% (p=0.000) SetTypePtrSlice 781ns × (0.99,1.01) 571ns × (0.99,1.04) -26.93% (p=0.000) SetTypeNode1 13.1ns × (0.99,1.01) 12.1ns × (0.99,1.01) -7.45% (p=0.000) SetTypeNode1Slice 113ns × (0.99,1.01) 94ns × (1.00,1.00) -16.35% (p=0.000) SetTypeNode8 32.7ns × (1.00,1.00) 29.8ns × (0.99,1.01) -8.97% (p=0.000) SetTypeNode8Slice 266ns × (1.00,1.00) 204ns × (1.00,1.00) -23.40% (p=0.000) SetTypeNode64 58.0ns × (0.98,1.08) 42.8ns × (1.00,1.01) -26.24% (p=0.000) SetTypeNode64Slice 1.55µs × (0.99,1.02) 0.96µs × (1.00,1.00) -37.84% (p=0.000) SetTypeNode64Dead 13.1ns × (0.99,1.01) 12.1ns × (1.00,1.00) -7.33% (p=0.000) SetTypeNode64DeadSlice 1.52µs × (1.00,1.01) 1.08µs × (1.00,1.01) -28.95% (p=0.000) SetTypeNode124 97.9ns × (1.00,1.00) 67.1ns × (1.00,1.01) -31.49% (p=0.000) SetTypeNode124Slice 2.87µs × (0.99,1.02) 1.75µs × (1.00,1.01) -39.15% (p=0.000) SetTypeNode126 98.4ns × (1.00,1.01) 68.1ns × (1.00,1.01) -30.79% (p=0.000) SetTypeNode126Slice 2.91µs × (0.99,1.01) 1.77µs × (0.99,1.01) -39.09% (p=0.000) SetTypeNode1024 732ns × (1.00,1.00) 511ns × (0.87,1.42) -30.14% (p=0.000) SetTypeNode1024Slice 23.1µs × (1.00,1.00) 13.9µs × (0.99,1.02) -39.83% (p=0.000) Change-Id: I12e3b850a4e6fa6c8146b8635ff728f3ef658819 Reviewed-on: https://go-review.googlesource.com/9828 Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 16:37:36 +00:00
Russ Cox	363fd1dd1b	runtime: move a few atomic fields up Moving them up makes them properly aligned on 32-bit systems. There are some odd fields above them right now (like fixalloc and mutex maybe). Change-Id: I57851a5bbb2e7cc339712f004f99bb6c0cce0ca5 Reviewed-on: https://go-review.googlesource.com/9889 Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 16:08:57 +00:00
Russ Cox	8f037fa1ab	runtime: fix TestLFStack on 386 The new(uint64) was moving to the stack, which may not be aligned. Change-Id: Iad070964202001b52029494d43e299fed980f939 Reviewed-on: https://go-review.googlesource.com/9787 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: David Chase <drchase@google.com>	2015-05-11 15:21:54 +00:00
Russ Cox	1635ab7dfe	runtime: remove wbshadow mode The write barrier shadow heap was very useful for developing the write barriers initially, but it's no longer used, clunky, and dragging the rest of the implementation down. The gccheckmark mode will find bugs due to missed barriers when they result in missed marks; wbshadow mode found the missed barriers more aggressively, but it required an entire separate copy of the heap. The gccheckmark mode requires no extra memory, making it more useful in practice. Compared to previous CL: name old mean new mean delta BinaryTree17 5.91s × (0.96,1.06) 5.72s × (0.97,1.03) -3.12% (p=0.000) Fannkuch11 4.32s × (1.00,1.00) 4.36s × (1.00,1.00) +0.91% (p=0.000) FmtFprintfEmpty 89.0ns × (0.93,1.10) 86.6ns × (0.96,1.11) ~ (p=0.077) FmtFprintfString 298ns × (0.98,1.06) 283ns × (0.99,1.04) -4.90% (p=0.000) FmtFprintfInt 286ns × (0.98,1.03) 283ns × (0.98,1.04) -1.09% (p=0.032) FmtFprintfIntInt 498ns × (0.97,1.06) 480ns × (0.99,1.02) -3.65% (p=0.000) FmtFprintfPrefixedInt 408ns × (0.98,1.02) 396ns × (0.99,1.01) -3.00% (p=0.000) FmtFprintfFloat 587ns × (0.98,1.01) 562ns × (0.99,1.01) -4.34% (p=0.000) FmtManyArgs 1.94µs × (0.99,1.02) 1.89µs × (0.99,1.01) -2.85% (p=0.000) GobDecode 15.8ms × (0.98,1.03) 15.7ms × (0.99,1.02) ~ (p=0.251) GobEncode 12.0ms × (0.96,1.09) 11.8ms × (0.98,1.03) -1.87% (p=0.024) Gzip 648ms × (0.99,1.01) 647ms × (0.99,1.01) ~ (p=0.688) Gunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.203) HTTPClientServer 90.3µs × (0.98,1.01) 89.1µs × (0.99,1.02) -1.30% (p=0.000) JSONEncode 31.6ms × (0.99,1.01) 31.7ms × (0.98,1.02) ~ (p=0.219) JSONDecode 107ms × (1.00,1.01) 111ms × (0.99,1.01) +3.58% (p=0.000) Mandelbrot200 6.03ms × (1.00,1.01) 6.01ms × (1.00,1.00) ~ (p=0.077) GoParse 6.53ms × (0.99,1.03) 6.54ms × (0.99,1.02) ~ (p=0.585) RegexpMatchEasy0_32 161ns × (1.00,1.01) 161ns × (0.98,1.05) ~ (p=0.948) RegexpMatchEasy0_1K 541ns × (0.99,1.01) 559ns × (0.98,1.01) +3.32% (p=0.000) RegexpMatchEasy1_32 138ns × (1.00,1.00) 137ns × (0.99,1.01) -0.55% (p=0.001) RegexpMatchEasy1_1K 887ns × (0.99,1.01) 878ns × (0.99,1.01) -0.98% (p=0.000) RegexpMatchMedium_32 253ns × (0.99,1.01) 252ns × (0.99,1.01) -0.39% (p=0.001) RegexpMatchMedium_1K 72.8µs × (1.00,1.00) 72.7µs × (1.00,1.00) ~ (p=0.485) RegexpMatchHard_32 3.85µs × (1.00,1.01) 3.85µs × (1.00,1.01) ~ (p=0.283) RegexpMatchHard_1K 117µs × (1.00,1.01) 117µs × (1.00,1.00) ~ (p=0.175) Revcomp 922ms × (0.97,1.08) 903ms × (0.98,1.05) -2.15% (p=0.021) Template 126ms × (0.99,1.01) 126ms × (0.99,1.01) ~ (p=0.943) TimeParse 628ns × (0.99,1.01) 634ns × (0.99,1.01) +0.92% (p=0.000) TimeFormat 668ns × (0.99,1.01) 698ns × (0.98,1.03) +4.53% (p=0.000) It's nice that the microbenchmarks are the ones helped the most, because those were the ones hurt the most by the conversion from 4-bit to 2-bit heap bitmaps. This CL brings the overall effect of that process to (compared to CL 9706 patch set 1): name old mean new mean delta BinaryTree17 5.87s × (0.94,1.09) 5.72s × (0.97,1.03) -2.57% (p=0.011) Fannkuch11 4.32s × (1.00,1.00) 4.36s × (1.00,1.00) +0.87% (p=0.000) FmtFprintfEmpty 89.1ns × (0.95,1.16) 86.6ns × (0.96,1.11) ~ (p=0.090) FmtFprintfString 283ns × (0.98,1.02) 283ns × (0.99,1.04) ~ (p=0.681) FmtFprintfInt 284ns × (0.98,1.04) 283ns × (0.98,1.04) ~ (p=0.620) FmtFprintfIntInt 486ns × (0.98,1.03) 480ns × (0.99,1.02) -1.27% (p=0.002) FmtFprintfPrefixedInt 400ns × (0.99,1.02) 396ns × (0.99,1.01) -0.84% (p=0.001) FmtFprintfFloat 566ns × (0.99,1.01) 562ns × (0.99,1.01) -0.80% (p=0.000) FmtManyArgs 1.91µs × (0.99,1.02) 1.89µs × (0.99,1.01) -1.10% (p=0.000) GobDecode 15.5ms × (0.98,1.05) 15.7ms × (0.99,1.02) +1.55% (p=0.005) GobEncode 11.9ms × (0.97,1.03) 11.8ms × (0.98,1.03) -0.97% (p=0.048) Gzip 648ms × (0.99,1.01) 647ms × (0.99,1.01) ~ (p=0.627) Gunzip 143ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.482) HTTPClientServer 89.2µs × (0.99,1.02) 89.1µs × (0.99,1.02) ~ (p=0.740) JSONEncode 32.3ms × (0.97,1.06) 31.7ms × (0.98,1.02) -1.95% (p=0.002) JSONDecode 106ms × (0.99,1.01) 111ms × (0.99,1.01) +4.22% (p=0.000) Mandelbrot200 6.02ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.417) GoParse 6.57ms × (0.97,1.06) 6.54ms × (0.99,1.02) ~ (p=0.404) RegexpMatchEasy0_32 162ns × (1.00,1.00) 161ns × (0.98,1.05) ~ (p=0.088) RegexpMatchEasy0_1K 561ns × (0.99,1.02) 559ns × (0.98,1.01) -0.47% (p=0.034) RegexpMatchEasy1_32 145ns × (0.95,1.04) 137ns × (0.99,1.01) -5.56% (p=0.000) RegexpMatchEasy1_1K 864ns × (0.99,1.04) 878ns × (0.99,1.01) +1.57% (p=0.000) RegexpMatchMedium_32 255ns × (0.99,1.04) 252ns × (0.99,1.01) -1.43% (p=0.001) RegexpMatchMedium_1K 73.9µs × (0.98,1.04) 72.7µs × (1.00,1.00) -1.55% (p=0.004) RegexpMatchHard_32 3.92µs × (0.98,1.04) 3.85µs × (1.00,1.01) -1.80% (p=0.003) RegexpMatchHard_1K 120µs × (0.98,1.04) 117µs × (1.00,1.00) -2.13% (p=0.001) Revcomp 936ms × (0.95,1.08) 903ms × (0.98,1.05) -3.58% (p=0.002) Template 130ms × (0.98,1.04) 126ms × (0.99,1.01) -2.98% (p=0.000) TimeParse 638ns × (0.98,1.05) 634ns × (0.99,1.01) ~ (p=0.198) TimeFormat 674ns × (0.99,1.01) 698ns × (0.98,1.03) +3.69% (p=0.000) Change-Id: Ia0e9b50b1d75a3c0c7556184cd966305574fe07c Reviewed-on: https://go-review.googlesource.com/9706 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-05-11 14:55:11 +00:00
Russ Cox	54af9a3ba5	runtime: reintroduce ``dead'' space during GC scan Reintroduce an optimization discarded during the initial conversion from 4-bit heap bitmaps to 2-bit heap bitmaps: when we reach the place in the bitmap where there are no more pointers, mark that position for the GC so that it can avoid scanning past that place. During heapBitsSetType we can also avoid initializing heap bitmap beyond that location, which gives a bit of a win compared to Go 1.4. This particular optimization (not initializing the heap bitmap) may not last: we might change typedmemmove to use the heap bitmap, in which case it would all need to be initialized. The early stop in the GC scan will stay no matter what. Compared to Go 1.4 (github.com/rsc/go, branch go14bench): name old mean new mean delta SetTypeNode64 80.7ns × (1.00,1.01) 57.4ns × (1.00,1.01) -28.83% (p=0.000) SetTypeNode64Dead 80.5ns × (1.00,1.01) 13.1ns × (0.99,1.02) -83.77% (p=0.000) SetTypeNode64Slice 2.16µs × (1.00,1.01) 1.54µs × (1.00,1.01) -28.75% (p=0.000) SetTypeNode64DeadSlice 2.16µs × (1.00,1.01) 1.52µs × (1.00,1.00) -29.74% (p=0.000) Compared to previous CL: name old mean new mean delta SetTypeNode64 56.7ns × (1.00,1.00) 57.4ns × (1.00,1.01) +1.19% (p=0.000) SetTypeNode64Dead 57.2ns × (1.00,1.00) 13.1ns × (0.99,1.02) -77.15% (p=0.000) SetTypeNode64Slice 1.56µs × (1.00,1.01) 1.54µs × (1.00,1.01) -0.89% (p=0.000) SetTypeNode64DeadSlice 1.55µs × (1.00,1.01) 1.52µs × (1.00,1.00) -2.23% (p=0.000) This is the last CL in the sequence converting from the 4-bit heap to the 2-bit heap, with all the same optimizations reenabled. Compared to before that process began (compared to CL 9701 patch set 1): name old mean new mean delta BinaryTree17 5.87s × (0.94,1.09) 5.91s × (0.96,1.06) ~ (p=0.578) Fannkuch11 4.32s × (1.00,1.00) 4.32s × (1.00,1.00) ~ (p=0.474) FmtFprintfEmpty 89.1ns × (0.95,1.16) 89.0ns × (0.93,1.10) ~ (p=0.942) FmtFprintfString 283ns × (0.98,1.02) 298ns × (0.98,1.06) +5.33% (p=0.000) FmtFprintfInt 284ns × (0.98,1.04) 286ns × (0.98,1.03) ~ (p=0.208) FmtFprintfIntInt 486ns × (0.98,1.03) 498ns × (0.97,1.06) +2.48% (p=0.000) FmtFprintfPrefixedInt 400ns × (0.99,1.02) 408ns × (0.98,1.02) +2.23% (p=0.000) FmtFprintfFloat 566ns × (0.99,1.01) 587ns × (0.98,1.01) +3.69% (p=0.000) FmtManyArgs 1.91µs × (0.99,1.02) 1.94µs × (0.99,1.02) +1.81% (p=0.000) GobDecode 15.5ms × (0.98,1.05) 15.8ms × (0.98,1.03) +1.94% (p=0.002) GobEncode 11.9ms × (0.97,1.03) 12.0ms × (0.96,1.09) ~ (p=0.263) Gzip 648ms × (0.99,1.01) 648ms × (0.99,1.01) ~ (p=0.992) Gunzip 143ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.585) HTTPClientServer 89.2µs × (0.99,1.02) 90.3µs × (0.98,1.01) +1.24% (p=0.000) JSONEncode 32.3ms × (0.97,1.06) 31.6ms × (0.99,1.01) -2.29% (p=0.000) JSONDecode 106ms × (0.99,1.01) 107ms × (1.00,1.01) +0.62% (p=0.000) Mandelbrot200 6.02ms × (1.00,1.00) 6.03ms × (1.00,1.01) ~ (p=0.250) GoParse 6.57ms × (0.97,1.06) 6.53ms × (0.99,1.03) ~ (p=0.243) RegexpMatchEasy0_32 162ns × (1.00,1.00) 161ns × (1.00,1.01) -0.80% (p=0.000) RegexpMatchEasy0_1K 561ns × (0.99,1.02) 541ns × (0.99,1.01) -3.67% (p=0.000) RegexpMatchEasy1_32 145ns × (0.95,1.04) 138ns × (1.00,1.00) -5.04% (p=0.000) RegexpMatchEasy1_1K 864ns × (0.99,1.04) 887ns × (0.99,1.01) +2.57% (p=0.000) RegexpMatchMedium_32 255ns × (0.99,1.04) 253ns × (0.99,1.01) -1.05% (p=0.012) RegexpMatchMedium_1K 73.9µs × (0.98,1.04) 72.8µs × (1.00,1.00) -1.51% (p=0.005) RegexpMatchHard_32 3.92µs × (0.98,1.04) 3.85µs × (1.00,1.01) -1.88% (p=0.002) RegexpMatchHard_1K 120µs × (0.98,1.04) 117µs × (1.00,1.01) -2.02% (p=0.001) Revcomp 936ms × (0.95,1.08) 922ms × (0.97,1.08) ~ (p=0.234) Template 130ms × (0.98,1.04) 126ms × (0.99,1.01) -2.99% (p=0.000) TimeParse 638ns × (0.98,1.05) 628ns × (0.99,1.01) -1.54% (p=0.004) TimeFormat 674ns × (0.99,1.01) 668ns × (0.99,1.01) -0.80% (p=0.001) The slowdown of the first few benchmarks seems to be due to the new atomic operations for certain small size allocations. But the larger benchmarks mostly improve, probably due to the decreased memory pressure from having half as much heap bitmap. CL 9706, which removes the (never used anymore) wbshadow mode, gets back what is lost in the early microbenchmarks. Change-Id: I37423a209e8ec2a2e92538b45cac5422a6acd32d Reviewed-on: https://go-review.googlesource.com/9705 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-05-11 14:51:40 +00:00
Russ Cox	feb8a3b616	runtime: optimize heapBitsSetType For the conversion of the heap bitmap from 4-bit to 2-bit fields, I replaced heapBitsSetType with the dumbest thing that could possibly work: two atomic operations (atomicand8+atomicor8) per 2-bit field. This CL replaces that code with a proper implementation that avoids the atomics whenever possible. Benchmarks vs base CL (before the conversion to 2-bit heap bitmap) and vs Go 1.4 below. Compared to Go 1.4, SetTypePtr (a 1-pointer allocation) is 10ns slower because a race against the concurrent GC requires the use of an atomicor8 that used to be an ordinary write. This slowdown was present even in the base CL. Compared to both Go 1.4 and base, SetTypeNode8 (a 10-word allocation) is 10ns slower because it too needs a new atomic, because with the denser representation, the byte on the end of the allocation is now shared with the object next to it; this was not true with the 4-bit representation. Excluding these two (fundamental) slowdowns due to the use of atomics, the new code is noticeably faster than both Go 1.4 and the base CL. The next CL will reintroduce the ``typeDead'' optimization. Stats are from 5 runs on a MacBookPro10,2 (late 2012 Core i5). Compared to base CL ( = new atomic) name old mean new mean delta SetTypePtr 14.1ns × (0.99,1.02) 14.7ns × (0.93,1.10) ~ (p=0.175) SetTypePtr8 18.4ns × (1.00,1.01) 18.6ns × (0.81,1.21) ~ (p=0.866) SetTypePtr16 28.7ns × (1.00,1.00) 22.4ns × (0.90,1.27) -21.88% (p=0.015) SetTypePtr32 52.3ns × (1.00,1.00) 33.8ns × (0.93,1.24) -35.37% (p=0.001) SetTypePtr64 79.2ns × (1.00,1.00) 55.1ns × (1.00,1.01) -30.43% (p=0.000) SetTypePtr126 118ns × (1.00,1.00) 100ns × (1.00,1.00) -15.97% (p=0.000) SetTypePtr128 130ns × (0.92,1.19) 98ns × (1.00,1.00) -24.36% (p=0.008) SetTypePtrSlice 726ns × (0.96,1.08) 760ns × (1.00,1.00) ~ (p=0.152) SetTypeNode1 14.1ns × (0.94,1.15) 12.0ns × (1.00,1.01) -14.60% (p=0.020) SetTypeNode1Slice 135ns × (0.96,1.07) 88ns × (1.00,1.00) -34.53% (p=0.000) SetTypeNode8 20.9ns × (1.00,1.01) 32.6ns × (1.00,1.00) +55.37% (p=0.000) SetTypeNode8Slice 414ns × (0.99,1.02) 244ns × (1.00,1.00) -41.09% (p=0.000) SetTypeNode64 80.0ns × (1.00,1.00) 57.4ns × (1.00,1.00) -28.23% (p=0.000) SetTypeNode64Slice 2.15µs × (1.00,1.01) 1.56µs × (1.00,1.00) -27.43% (p=0.000) SetTypeNode124 119ns × (0.99,1.00) 100ns × (1.00,1.00) -16.11% (p=0.000) SetTypeNode124Slice 3.40µs × (1.00,1.00) 2.93µs × (1.00,1.00) -13.80% (p=0.000) SetTypeNode126 120ns × (1.00,1.01) 98ns × (1.00,1.00) -18.19% (p=0.000) SetTypeNode126Slice 3.53µs × (0.98,1.08) 3.02µs × (1.00,1.00) -14.49% (p=0.002) SetTypeNode1024 726ns × (0.97,1.09) 740ns × (1.00,1.00) ~ (p=0.451) SetTypeNode1024Slice 24.9µs × (0.89,1.37) 23.1µs × (1.00,1.00) ~ (p=0.476) Compared to Go 1.4 ( = new atomic) name old mean new mean delta SetTypePtr 5.71ns × (0.89,1.19) 14.68ns × (0.93,1.10) +157.24% (p=0.000) SetTypePtr8 19.3ns × (0.96,1.10) 18.6ns × (0.81,1.21) ~ (p=0.638) SetTypePtr16 30.7ns × (0.99,1.03) 22.4ns × (0.90,1.27) -26.88% (p=0.005) SetTypePtr32 51.5ns × (1.00,1.00) 33.8ns × (0.93,1.24) -34.40% (p=0.001) SetTypePtr64 83.6ns × (0.94,1.12) 55.1ns × (1.00,1.01) -34.12% (p=0.001) SetTypePtr126 137ns × (0.87,1.26) 100ns × (1.00,1.00) -27.10% (p=0.028) SetTypePtrSlice 865ns × (0.80,1.23) 760ns × (1.00,1.00) ~ (p=0.243) SetTypeNode1 15.2ns × (0.88,1.12) 12.0ns × (1.00,1.01) -20.89% (p=0.014) SetTypeNode1Slice 156ns × (0.93,1.16) 88ns × (1.00,1.00) -43.57% (p=0.001) SetTypeNode8 23.8ns × (0.90,1.18) 32.6ns × (1.00,1.00) +36.76% (p=0.003) ** SetTypeNode8Slice 502ns × (0.92,1.10) 244ns × (1.00,1.00) -51.46% (p=0.000) SetTypeNode64 85.6ns × (0.94,1.11) 57.4ns × (1.00,1.00) -32.89% (p=0.001) SetTypeNode64Slice 2.36µs × (0.91,1.14) 1.56µs × (1.00,1.00) -33.96% (p=0.002) SetTypeNode124 130ns × (0.91,1.12) 100ns × (1.00,1.00) -23.49% (p=0.004) SetTypeNode124Slice 3.81µs × (0.90,1.22) 2.93µs × (1.00,1.00) -23.09% (p=0.025) There are fewer benchmarks vs Go 1.4 because unrolling directly into the heap bitmap is not yet implemented, so those would not be meaningful comparisons. These benchmarks were not present in Go 1.4 as distributed. The backport to Go 1.4 is in github.com/rsc/go's go14bench branch, commit 71d5ee5. Change-Id: I95ed05a22bf484b0fc9efad549279e766c98d2b6 Reviewed-on: https://go-review.googlesource.com/9704 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-05-11 14:51:20 +00:00
Russ Cox	0234dfd493	runtime: use 2-bit heap bitmap (in place of 4-bit) Previous CLs changed the representation of the non-heap type bitmaps to be 1-bit bitmaps (pointer or not). Before this CL, the heap bitmap stored a 2-bit type for each word and a mark bit and checkmark bit for the first word of the object. (There used to be additional per-word bits.) Reduce heap bitmap to 2-bit, with 1 dedicated to pointer or not, and the other used for mark, checkmark, and "keep scanning forward to find pointers in this object." See comments for details. This CL replaces heapBitsSetType with very slow but obviously correct code. A followup CL will optimize it. (Spoiler: the new code is faster than Go 1.4 was.) Change-Id: I999577a133f3cfecacebdec9cdc3573c235c7fb9 Reviewed-on: https://go-review.googlesource.com/9703 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 14:43:45 +00:00
Russ Cox	6d8a147bef	runtime: use 1-bit pointer bitmaps in type representation The type information in reflect.Type and the GC programs is now 1 bit per word, down from 2 bits. The in-memory unrolled type bitmap representation are now 1 bit per word, down from 4 bits. The conversion from the unrolled (now 1-bit) bitmap to the heap bitmap (still 4-bit) is not optimized. A followup CL will work on that, after the heap bitmap has been converted to 2-bit. The typeDead optimization, in which a special value denotes that there are no more pointers anywhere in the object, is lost in this CL. A followup CL will bring it back in the final form of heapBitsSetType. Change-Id: If61e67950c16a293b0b516a6fd9a1c755b6d5549 Reviewed-on: https://go-review.googlesource.com/9702 Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 14:43:33 +00:00
Russ Cox	7d9e16abc6	runtime: add benchmark of heapBitsSetType There was an old benchmark that measured this indirectly via allocation, but I don't understand how to factor out the allocation cost when interpreting the numbers. Replace with a benchmark that only calls heapBitsSetType, that does not allocate. This was not possible when the benchmark was first written, because heapBitsSetType had not been factored out of mallocgc. Change-Id: I30f0f02362efab3465a50769398be859832e6640 Reviewed-on: https://go-review.googlesource.com/9701 Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 14:40:27 +00:00
Daniel Morsing	db6f88a84b	runtime: enable profiling on g0 Since we now have stack information for code running on the systemstack, we can traceback over it. To make cpu profiles useful, add a case in gentraceback to jump over systemstack switches. Fixes #10609. Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7 Reviewed-on: https://go-review.googlesource.com/9506 Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-05-11 08:44:30 +00:00
Shenghou Ma	fd392ee52b	cmd/internal/ld: generate correct .debug_frames on RISC architectures With this patch, gdb seems to be able to corretly backtrace Go process on at least linux/{arm,arm64,ppc64}. Change-Id: Ic40a2a70e71a19c4a92e4655710f38a807b67e9a Reviewed-on: https://go-review.googlesource.com/9822 Run-TryBot: Minux Ma <minux@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-05-08 00:34:27 +00:00
Russ Cox	0211d7d7b0	runtime: turn off checkmark by default Change-Id: Ic8cb8b1ed8715d6d5a53ec3cac385c0e93883514 Reviewed-on: https://go-review.googlesource.com/9825 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-05-07 21:08:42 +00:00
Russ Cox	9626561030	runtime: fix gccheckmark mode and enable by default It was testing the mark bits on what roots pointed at, but not the remainder of the live heap, because in CL 2991 I accidentally inverted this check during refactoring. The next CL will turn it back off by default again, but I want one run on the builders with the full checkmark checks. Change-Id: Ic166458cea25c0a56e5387fc527cb166ff2e5ada Reviewed-on: https://go-review.googlesource.com/9824 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-05-07 21:08:29 +00:00
Rick Hudson	b6e178ed7e	runtime: set heap minimum default based on GOGC Currently the heap minimum is set to 4MB which prevents our ability to collect at every allocation by setting GOGC=0. This adjust the heap minimum to 4MB*GOGC/100 thus reenabling collecting at every allocation. Fixes #10681 Change-Id: I912d027dac4b14ae535597e8beefa9ac3fb8ad94 Reviewed-on: https://go-review.googlesource.com/9814 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-07 21:05:58 +00:00
Michael Hudson-Doyle	fa896733b5	runtime: check consistency of all module data objects Current code just checks the consistency (that the functab is correctly sorted by PC, etc) of the moduledata object that the runtime belongs to. Change to check all of them. Change-Id: I544a44c5de7445fff87d3cdb4840ff04c5e2bf75 Reviewed-on: https://go-review.googlesource.com/9773 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-05-07 15:06:08 +00:00
Alex Brainman	a52dc9fcbd	runtime: fix comments that mention g status values Makes searching in source code easier. Change-Id: Ie2e85934d23920ac0bc01d28168bcfbbdc465580 Reviewed-on: https://go-review.googlesource.com/9774 Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com> Reviewed-by: Minux Ma <minux@golang.org>	2015-05-07 00:00:38 +00:00
Austin Clements	17db6e0420	runtime: use heap scan size as estimate of GC scan work Currently, the GC uses a moving average of recent scan work ratios to estimate the total scan work required by this cycle. This is in turn used to compute how much scan work should be done by mutators when they allocate in order to perform all expected scan work by the time the allocated heap reaches the heap goal. However, our current scan work estimate can be arbitrarily wrong if the heap topography changes significantly from one cycle to the next. For example, in the go1 benchmarks, at the beginning of each benchmark, the heap is dominated by a 256MB no-scan object, so the GC learns that the scan density of the heap is very low. In benchmarks that then rapidly allocate pointer-dense objects, by the time of the next GC cycle, our estimate of the scan work can be too low by a large factor. This in turn lets the mutator allocate faster than the GC can collect, allowing it to get arbitrarily far ahead of the scan work estimate, which leads to very long GC cycles with very little mutator assist that can overshoot the heap goal by large margins. This is particularly easy to demonstrate with BinaryTree17: $ GODEBUG=gctrace=1 ./go1.test -test.bench BinaryTree17 gc #1 @0.017s 2%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 4->262->262 MB, 4 MB goal, 1 P gc #2 @0.026s 3%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 262->262->262 MB, 524 MB goal, 1 P testing: warning: no tests to run PASS BenchmarkBinaryTree17 gc #3 @1.906s 0%: 0+0+0+0+7 ms clock, 0+0+0+0/0/0+7 ms cpu, 325->325->287 MB, 325 MB goal, 1 P (forced) gc #4 @12.203s 20%: 0+0+0+10067+10 ms clock, 0+0+0+0/2523/852+10 ms cpu, 430->2092->1950 MB, 574 MB goal, 1 P 1 9150447353 ns/op Change this estimate to instead use the current scannable heap size. This has the advantage of being based solely on the current state of the heap, not on past densities or reachable heap sizes, so it isn't susceptible to falling behind during these sorts of phase changes. This is strictly an over-estimate, but it's better to over-estimate and get more assist than necessary than it is to under-estimate and potentially spiral out of control. Experiments with scaling this estimate back showed no obvious benefit for mutator utilization, heap size, or assist time. This new estimate has little effect for most benchmarks, including most go1 benchmarks, x/benchmarks, and the 6g benchmark. It has a huge effect for benchmarks that triggered the bad pacer behavior: name old mean new mean delta BinaryTree17 10.0s × (1.00,1.00) 3.5s × (0.98,1.01) -64.93% (p=0.000) Fannkuch11 2.74s × (1.00,1.01) 2.65s × (1.00,1.00) -3.52% (p=0.000) FmtFprintfEmpty 56.4ns × (0.99,1.00) 57.8ns × (1.00,1.01) +2.43% (p=0.000) FmtFprintfString 187ns × (0.99,1.00) 185ns × (0.99,1.01) -1.19% (p=0.010) FmtFprintfInt 184ns × (1.00,1.00) 183ns × (1.00,1.00) (no variance) FmtFprintfIntInt 321ns × (1.00,1.00) 315ns × (1.00,1.00) -1.80% (p=0.000) FmtFprintfPrefixedInt 266ns × (1.00,1.00) 263ns × (1.00,1.00) -1.22% (p=0.000) FmtFprintfFloat 353ns × (1.00,1.00) 353ns × (1.00,1.00) -0.13% (p=0.035) FmtManyArgs 1.21µs × (1.00,1.00) 1.19µs × (1.00,1.00) -1.33% (p=0.000) GobDecode 9.69ms × (1.00,1.00) 9.59ms × (1.00,1.00) -1.07% (p=0.000) GobEncode 7.89ms × (0.99,1.01) 7.74ms × (1.00,1.00) -1.92% (p=0.000) Gzip 391ms × (1.00,1.00) 392ms × (1.00,1.00) ~ (p=0.522) Gunzip 97.1ms × (1.00,1.00) 97.0ms × (1.00,1.00) -0.10% (p=0.000) HTTPClientServer 55.7µs × (0.99,1.01) 56.7µs × (0.99,1.01) +1.81% (p=0.001) JSONEncode 19.1ms × (1.00,1.00) 19.0ms × (1.00,1.00) -0.85% (p=0.000) JSONDecode 66.8ms × (1.00,1.00) 66.9ms × (1.00,1.00) ~ (p=0.288) Mandelbrot200 4.13ms × (1.00,1.00) 4.12ms × (1.00,1.00) -0.08% (p=0.000) GoParse 3.97ms × (1.00,1.01) 4.01ms × (1.00,1.00) +0.99% (p=0.000) RegexpMatchEasy0_32 114ns × (1.00,1.00) 115ns × (0.99,1.00) ~ (p=0.070) RegexpMatchEasy0_1K 376ns × (1.00,1.00) 376ns × (1.00,1.00) ~ (p=0.900) RegexpMatchEasy1_32 94.9ns × (1.00,1.00) 96.3ns × (1.00,1.01) +1.53% (p=0.001) RegexpMatchEasy1_1K 568ns × (1.00,1.00) 567ns × (1.00,1.00) -0.22% (p=0.001) RegexpMatchMedium_32 159ns × (1.00,1.00) 159ns × (1.00,1.00) ~ (p=0.178) RegexpMatchMedium_1K 46.4µs × (1.00,1.00) 46.6µs × (1.00,1.00) +0.29% (p=0.000) RegexpMatchHard_32 2.37µs × (1.00,1.00) 2.37µs × (1.00,1.00) ~ (p=0.722) RegexpMatchHard_1K 71.1µs × (1.00,1.00) 71.2µs × (1.00,1.00) ~ (p=0.229) Revcomp 565ms × (1.00,1.00) 562ms × (1.00,1.00) -0.52% (p=0.000) Template 81.0ms × (1.00,1.00) 80.2ms × (1.00,1.00) -0.97% (p=0.000) TimeParse 380ns × (1.00,1.00) 380ns × (1.00,1.00) ~ (p=0.148) TimeFormat 405ns × (0.99,1.00) 385ns × (0.99,1.00) -5.00% (p=0.000) Change-Id: I11274158bf3affaf62662e02de7af12d5fb789e4 Reviewed-on: https://go-review.googlesource.com/9696 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-05-06 19:40:38 +00:00
Austin Clements	3be3cbd548	runtime: track "scannable" bytes of heap This tracks the number of scannable bytes in the allocated heap. That is, bytes that the garbage collector must scan before reaching the last pointer field in each object. This will be used to compute a more robust estimate of the GC scan work. Change-Id: I1eecd45ef9cdd65b69d2afb5db5da885c80086bb Reviewed-on: https://go-review.googlesource.com/9695 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-06 19:40:33 +00:00
Austin Clements	53c53984e7	runtime: include scalar slots in GC scan work metric The garbage collector predicts how much "scan work" must be done in a cycle to determine how much work should be done by mutators when they allocate. Most code doesn't care what units the scan work is in: it simply knows that a certain amount of scan work has to be done in the cycle. Currently, the GC uses the number of pointer slots scanned as the scan work on the theory that this is the bulk of the time spent in the garbage collector and hence reflects real CPU resource usage. However, this metric is difficult to estimate at the beginning of a cycle. Switch to counting the total number of bytes scanned, including both pointer and scalar slots. This is still less than the total marked heap since it omits no-scan objects and no-scan tails of objects. This metric may not reflect absolute performance as well as the count of scanned pointer slots (though it still takes time to scan scalar fields), but it will be much easier to estimate robustly, which is more important. Change-Id: Ie3a5eeeb0384a1ca566f61b2f11e9ff3a75ca121 Reviewed-on: https://go-review.googlesource.com/9694 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-06 19:40:27 +00:00
Austin Clements	c4931a8433	runtime: dispose gcWork caches before updating controller state Currently, we only flush the per-P gcWork caches in gcMark, at the beginning of mark termination. This is necessary to ensure that no work is held up in these caches. However, this flush happens after we update the GC controller state, which depends on statistics about marked heap size and scan work that are only updated by this flush. Hence, the controller is missing the bulk of heap marking and scan work. This bug was introduced in commit `1b4025f`, which introduced the per-P gcWork caches. Fix this by flushing these caches before we update the GC controller state. We continue to flush them at the beginning of mark termination as well to be robust in case any write barriers happened between the previous flush and entering mark termination, but this should be a no-op. Change-Id: I8f0f91024df967ebf0c616d1c4f0c339c304ebaa Reviewed-on: https://go-review.googlesource.com/9646 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-06 19:40:22 +00:00
Rick Hudson	1845314560	runtime: remove unused GC timers During development some tracing routines were added that are not needed in the release. These included GCstarttimes, GCendtimes, and GCprinttimes. Fixes #10462 Change-Id: I0788e6409d61038571a5ae0cbbab793102df0a65 Reviewed-on: https://go-review.googlesource.com/9689 Reviewed-by: Austin Clements <austin@google.com>	2015-05-06 12:53:08 +00:00
Aram Hăvărneanu	fe5ef5c9d7	runtime, syscall: link Solaris binaries directly instead of using dlopen/dlsym Before CL 8214 (use .plt instead of .got on Solaris) Solaris used a dynamic linking scheme that didn't permit lazy binding. To speed program startup, Go binaries only used it for a small number of symbols required by the runtime. Other symbols were resolved on demand on first use, and were cached for subsequent use. This required some moderately complex code in the syscall package. CL 8214 changed the way dynamic linking is implemented, and now lazy binding is supported. As now all symbols are resolved lazily by the dynamic loader, there is no need for the complex code in the syscall package that did the same. This CL makes Go programs link directly with the necessary shared libraries and deletes the lazy-loading code implemented in Go. Change-Id: Ifd7275db72de61b70647242e7056dd303b1aee9e Reviewed-on: https://go-review.googlesource.com/9184 Reviewed-by: Minux Ma <minux@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-05-06 11:38:50 +00:00
Aram Hăvărneanu	121489cbfd	runtime/cgo: add cgo support for solaris/amd64 Change-Id: Ic9744c7716cdd53f27c6e5874230963e5fff0333 Reviewed-on: https://go-review.googlesource.com/8260 Reviewed-by: Minux Ma <minux@golang.org>	2015-05-06 11:37:28 +00:00
Aram Hăvărneanu	c94f1f791b	runtime: always load address of libcFunc on Solaris The linker always uses .plt for externals, so libcFunc is now an actual external symbol instead of a pointer to one. Fixes most of the breakage introduced in previous CL. Change-Id: I64b8c96f93127f2d13b5289b024677fd3ea7dbea Reviewed-on: https://go-review.googlesource.com/8215 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Minux Ma <minux@golang.org>	2015-05-06 11:36:57 +00:00
Russ Cox	ceefebd795	runtime: rename ptrsize to ptrdata I forgot there is already a ptrSize constant. Rename field to avoid some confusion. Change-Id: I098fdcc8afc947d6c02c41c6e6de24624cc1c8ff Reviewed-on: https://go-review.googlesource.com/9700 Reviewed-by: Austin Clements <austin@google.com>	2015-05-05 19:27:47 +00:00
Keith Randall	5a828cfcde	runtime: let freezetheworld work even when gomaxprocs=1 Freezetheworld still has stuff to do when gomaxprocs=1. In particular, signals can come in on other Ms (like the GC M, say) and the single user M is still running. Fixes #10546 Change-Id: I2f07f17d1c81e93cf905df2cb087112d436ca7e7 Reviewed-on: https://go-review.googlesource.com/9551 Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-05-05 15:11:10 +00:00
Shenghou Ma	102436e800	runtime: fix software FP regs corruption when emulating SQRT on ARM When emulating ARM FSQRT instruction, the sqrt function itself should not use any floating point arithmetics, otherwise it will clobber the user software FP registers. Fortunately, the sqrt function only uses floating point instructions to test for corner cases, so it's easy to make that function does all it job using pure integer arithmetic only. I've verified that after this change, runtime.stepflt and runtime.sqrt doesn't contain any call to _sfloat. (Perhaps we should add //go:nosfloat to make the compiler enforce this?) Fixes #10641. Change-Id: Ida4742c49000fae4fea4649f28afde630ce4c576 Signed-off-by: Shenghou Ma <minux@golang.org> Reviewed-on: https://go-review.googlesource.com/9570 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Keith Randall <khr@golang.org>	2015-05-05 07:32:58 +00:00
Austin Clements	98a9d36837	runtime: add pointer size to type structure This adds a field to the runtime type structure that records the size of the prefix of objects of that type containing pointers. Any data after this offset is scalar data. This is necessary for shrinking the type bitmaps to 1 bit and will help the garbage collector efficiently estimate the amount of heap that needs to be scanned. Change-Id: I1318d79e6360dca0ac980245016c562e61f52ff5 Reviewed-on: https://go-review.googlesource.com/9691 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-05-04 20:17:48 +00:00
Rick Hudson	b86e71f5aa	runtime: Reduce calls to shouldtriggergc shouldtriggergc is slightly expensive due to the call overhead and the use of an atomic. This CL reduces the number of time one checks if a GC should be done from one at each allocation to once when a span is allocated. Since shouldtriggergc is an important abstraction simply hand inlining it, along with its atomic instruction would lose the abstraction. Change-Id: Ia3210655b4b3d433f77064a21ecb54e4d9d435f7 Reviewed-on: https://go-review.googlesource.com/9403 Reviewed-by: Austin Clements <austin@google.com>	2015-05-04 17:38:58 +00:00
Alex Brainman	031c3bc9ae	runtime: fix stackDebug comment Change-Id: Ia9191bd7ecdf7bd5ee7d69ae23aa71760f379aa8 Reviewed-on: https://go-review.googlesource.com/9590 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-05-02 02:39:50 +00:00
Austin Clements	dc870d5f4b	runtime: detailed debug output of controller state This adds a detailed debug dump of the state of the GC controller and a GODEBUG flag to enable it. Change-Id: I562fed7981691a84ddf0f9e6fcd9f089f497ac13 Reviewed-on: https://go-review.googlesource.com/9640 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-01 19:39:43 +00:00
Russ Cox	4fffc50c26	runtime: correct accounting of scan work and bytes marked (1) Count pointer-free objects found during scanning roots as marked bytes, by not zeroing the mark total after scanning roots. (2) Don't count the bytes for the roots themselves, by not adding them to the mark total in scanblock (the zeroing removed by (1) was aimed at that add but hitting more). Combined, (1) and (2) fix the calculation of the marked heap size. This makes the GC trigger much less often in the Go 1 benchmarks, which have a global []byte pointing at 256 MB of data. That 256 MB allocation was not being included in the heap size in the current code, but was included in Go 1.4. This is the source of much of the relative slowdown in that directory. (3) Count the bytes for the roots as scanned work, by not zeroing the scan total after scanning roots. There is no strict justification for this, and it probably doesn't matter much either way, but it was always combined with another buggy zeroing (removed in (1)), so guilty by association. Austin noticed this. name old mean new mean delta BenchmarkBinaryTree17 13.1s × (0.97,1.03) 5.9s × (0.97,1.05) -55.19% (p=0.000) BenchmarkFannkuch11 4.35s × (0.99,1.01) 4.37s × (1.00,1.01) +0.47% (p=0.032) BenchmarkFmtFprintfEmpty 84.6ns × (0.95,1.14) 85.7ns × (0.94,1.05) ~ (p=0.521) BenchmarkFmtFprintfString 320ns × (0.95,1.06) 283ns × (0.99,1.02) -11.48% (p=0.000) BenchmarkFmtFprintfInt 311ns × (0.98,1.03) 288ns × (0.99,1.02) -7.26% (p=0.000) BenchmarkFmtFprintfIntInt 554ns × (0.96,1.05) 478ns × (0.99,1.02) -13.70% (p=0.000) BenchmarkFmtFprintfPrefixedInt 434ns × (0.96,1.06) 393ns × (0.98,1.04) -9.60% (p=0.000) BenchmarkFmtFprintfFloat 620ns × (0.99,1.03) 584ns × (0.99,1.01) -5.73% (p=0.000) BenchmarkFmtManyArgs 2.19µs × (0.98,1.03) 1.94µs × (0.99,1.01) -11.62% (p=0.000) BenchmarkGobDecode 21.2ms × (0.97,1.06) 15.2ms × (0.99,1.01) -28.17% (p=0.000) BenchmarkGobEncode 18.1ms × (0.94,1.06) 11.8ms × (0.99,1.01) -35.00% (p=0.000) BenchmarkGzip 650ms × (0.98,1.01) 649ms × (0.99,1.02) ~ (p=0.802) BenchmarkGunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.438) BenchmarkHTTPClientServer 110µs × (0.98,1.04) 101µs × (0.98,1.02) -8.79% (p=0.000) BenchmarkJSONEncode 40.3ms × (0.97,1.03) 31.8ms × (0.98,1.03) -20.92% (p=0.000) BenchmarkJSONDecode 119ms × (0.97,1.02) 108ms × (0.99,1.02) -9.15% (p=0.000) BenchmarkMandelbrot200 6.03ms × (1.00,1.01) 6.03ms × (0.99,1.01) ~ (p=0.750) BenchmarkGoParse 8.58ms × (0.89,1.10) 6.80ms × (1.00,1.00) -20.71% (p=0.000) BenchmarkRegexpMatchEasy0_32 162ns × (1.00,1.01) 162ns × (0.99,1.02) ~ (p=0.131) BenchmarkRegexpMatchEasy0_1K 540ns × (0.99,1.02) 559ns × (0.99,1.02) +3.58% (p=0.000) BenchmarkRegexpMatchEasy1_32 139ns × (0.98,1.04) 139ns × (1.00,1.00) ~ (p=0.466) BenchmarkRegexpMatchEasy1_1K 889ns × (0.99,1.01) 885ns × (0.99,1.01) -0.50% (p=0.022) BenchmarkRegexpMatchMedium_32 252ns × (0.99,1.02) 252ns × (0.99,1.01) ~ (p=0.469) BenchmarkRegexpMatchMedium_1K 72.9µs × (0.99,1.01) 73.6µs × (0.99,1.03) ~ (p=0.168) BenchmarkRegexpMatchHard_32 3.87µs × (1.00,1.01) 3.86µs × (1.00,1.00) ~ (p=0.055) BenchmarkRegexpMatchHard_1K 118µs × (0.99,1.01) 117µs × (0.99,1.00) ~ (p=0.133) BenchmarkRevcomp 995ms × (0.94,1.10) 949ms × (0.99,1.01) -4.64% (p=0.000) BenchmarkTemplate 141ms × (0.97,1.02) 127ms × (0.99,1.01) -10.00% (p=0.000) BenchmarkTimeParse 641ns × (0.99,1.01) 623ns × (0.99,1.01) -2.79% (p=0.000) BenchmarkTimeFormat 729ns × (0.98,1.03) 679ns × (0.99,1.00) -6.93% (p=0.000) Change-Id: I839bd7356630d18377989a0748763414e15ed057 Reviewed-on: https://go-review.googlesource.com/9602 Reviewed-by: Austin Clements <austin@google.com>	2015-05-01 19:31:00 +00:00
Russ Cox	4d0f3a1c95	cmd/internal/gc, runtime: use 1-bit bitmap for stack frames, data, bss The bitmaps were 2 bits per pointer because we needed to distinguish scalar, pointer, multiword, and we used the leftover value to distinguish uninitialized from scalar, even though the garbage collector (GC) didn't care. Now that there are no multiword structures from the GC's point of view, cut the bitmaps down to 1 bit per pointer, recording just live pointer vs not. The GC assumes the same layout for stack frames and for the maps describing the global data and bss sections, so change them all in one CL. The code still refers to 4-bit heap bitmaps and 2-bit "type bitmaps", since the 2-bit representation lives (at least for now) in some of the reflect data. Because these stack frame bitmaps are stored directly in the rodata in the binary, this CL reduces the size of the 6g binary by about 1.1%. Performance change is basically a wash, but using less memory, and smaller binaries, and enables other bitmap reductions. name old mean new mean delta BenchmarkBinaryTree17 13.2s × (0.97,1.03) 13.0s × (0.99,1.01) -0.93% (p=0.005) BenchmarkBinaryTree17-2 9.69s × (0.96,1.05) 9.51s × (0.96,1.03) -1.86% (p=0.001) BenchmarkBinaryTree17-4 10.1s × (0.97,1.05) 10.0s × (0.96,1.05) ~ (p=0.141) BenchmarkFannkuch11 4.35s × (0.99,1.01) 4.43s × (0.98,1.04) +1.75% (p=0.001) BenchmarkFannkuch11-2 4.31s × (0.99,1.03) 4.32s × (1.00,1.00) ~ (p=0.095) BenchmarkFannkuch11-4 4.32s × (0.99,1.02) 4.38s × (0.98,1.04) +1.38% (p=0.008) BenchmarkFmtFprintfEmpty 83.5ns × (0.97,1.10) 87.3ns × (0.92,1.11) +4.55% (p=0.014) BenchmarkFmtFprintfEmpty-2 81.8ns × (0.98,1.04) 82.5ns × (0.97,1.08) ~ (p=0.364) BenchmarkFmtFprintfEmpty-4 80.9ns × (0.99,1.01) 82.6ns × (0.97,1.08) +2.12% (p=0.010) BenchmarkFmtFprintfString 320ns × (0.95,1.04) 322ns × (0.97,1.05) ~ (p=0.368) BenchmarkFmtFprintfString-2 303ns × (0.97,1.04) 304ns × (0.97,1.04) ~ (p=0.484) BenchmarkFmtFprintfString-4 305ns × (0.97,1.05) 306ns × (0.98,1.05) ~ (p=0.543) BenchmarkFmtFprintfInt 311ns × (0.98,1.03) 319ns × (0.97,1.03) +2.63% (p=0.000) BenchmarkFmtFprintfInt-2 297ns × (0.98,1.04) 301ns × (0.97,1.04) +1.19% (p=0.023) BenchmarkFmtFprintfInt-4 302ns × (0.98,1.02) 304ns × (0.97,1.03) ~ (p=0.126) BenchmarkFmtFprintfIntInt 554ns × (0.96,1.05) 554ns × (0.97,1.03) ~ (p=0.975) BenchmarkFmtFprintfIntInt-2 520ns × (0.98,1.03) 517ns × (0.98,1.02) ~ (p=0.153) BenchmarkFmtFprintfIntInt-4 524ns × (0.98,1.02) 525ns × (0.98,1.03) ~ (p=0.597) BenchmarkFmtFprintfPrefixedInt 433ns × (0.97,1.06) 434ns × (0.97,1.06) ~ (p=0.804) BenchmarkFmtFprintfPrefixedInt-2 413ns × (0.98,1.04) 413ns × (0.98,1.03) ~ (p=0.881) BenchmarkFmtFprintfPrefixedInt-4 420ns × (0.97,1.03) 421ns × (0.97,1.03) ~ (p=0.561) BenchmarkFmtFprintfFloat 620ns × (0.99,1.03) 636ns × (0.97,1.03) +2.57% (p=0.000) BenchmarkFmtFprintfFloat-2 601ns × (0.98,1.02) 617ns × (0.98,1.03) +2.58% (p=0.000) BenchmarkFmtFprintfFloat-4 613ns × (0.98,1.03) 626ns × (0.98,1.02) +2.15% (p=0.000) BenchmarkFmtManyArgs 2.19µs × (0.96,1.04) 2.23µs × (0.97,1.02) +1.65% (p=0.000) BenchmarkFmtManyArgs-2 2.08µs × (0.98,1.03) 2.10µs × (0.99,1.02) +0.79% (p=0.019) BenchmarkFmtManyArgs-4 2.10µs × (0.98,1.02) 2.13µs × (0.98,1.02) +1.72% (p=0.000) BenchmarkGobDecode 21.3ms × (0.97,1.05) 21.1ms × (0.97,1.04) -1.36% (p=0.025) BenchmarkGobDecode-2 20.0ms × (0.97,1.03) 19.2ms × (0.97,1.03) -4.00% (p=0.000) BenchmarkGobDecode-4 19.5ms × (0.99,1.02) 19.0ms × (0.99,1.01) -2.39% (p=0.000) BenchmarkGobEncode 18.3ms × (0.95,1.07) 18.1ms × (0.96,1.08) ~ (p=0.305) BenchmarkGobEncode-2 16.8ms × (0.97,1.02) 16.4ms × (0.98,1.02) -2.79% (p=0.000) BenchmarkGobEncode-4 15.4ms × (0.98,1.02) 15.4ms × (0.98,1.02) ~ (p=0.465) BenchmarkGzip 650ms × (0.98,1.03) 655ms × (0.97,1.04) ~ (p=0.075) BenchmarkGzip-2 652ms × (0.98,1.03) 655ms × (0.98,1.02) ~ (p=0.337) BenchmarkGzip-4 656ms × (0.98,1.04) 653ms × (0.98,1.03) ~ (p=0.291) BenchmarkGunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.507) BenchmarkGunzip-2 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.313) BenchmarkGunzip-4 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.312) BenchmarkHTTPClientServer 110µs × (0.98,1.03) 109µs × (0.99,1.02) -1.40% (p=0.000) BenchmarkHTTPClientServer-2 154µs × (0.90,1.08) 149µs × (0.90,1.08) -3.43% (p=0.007) BenchmarkHTTPClientServer-4 138µs × (0.97,1.04) 138µs × (0.96,1.04) ~ (p=0.670) BenchmarkJSONEncode 40.2ms × (0.98,1.02) 40.2ms × (0.98,1.05) ~ (p=0.828) BenchmarkJSONEncode-2 35.1ms × (0.99,1.02) 35.2ms × (0.98,1.03) ~ (p=0.392) BenchmarkJSONEncode-4 35.3ms × (0.98,1.03) 35.3ms × (0.98,1.02) ~ (p=0.813) BenchmarkJSONDecode 119ms × (0.97,1.02) 117ms × (0.98,1.02) -1.80% (p=0.000) BenchmarkJSONDecode-2 115ms × (0.99,1.02) 114ms × (0.98,1.02) -1.18% (p=0.000) BenchmarkJSONDecode-4 116ms × (0.98,1.02) 114ms × (0.98,1.02) -1.43% (p=0.000) BenchmarkMandelbrot200 6.03ms × (1.00,1.01) 6.03ms × (1.00,1.01) ~ (p=0.985) BenchmarkMandelbrot200-2 6.03ms × (1.00,1.01) 6.02ms × (1.00,1.01) ~ (p=0.320) BenchmarkMandelbrot200-4 6.03ms × (1.00,1.01) 6.03ms × (1.00,1.01) ~ (p=0.799) BenchmarkGoParse 8.63ms × (0.89,1.10) 8.58ms × (0.93,1.09) ~ (p=0.667) BenchmarkGoParse-2 8.20ms × (0.97,1.04) 8.37ms × (0.97,1.04) +1.96% (p=0.001) BenchmarkGoParse-4 8.00ms × (0.98,1.02) 8.14ms × (0.99,1.02) +1.75% (p=0.000) BenchmarkRegexpMatchEasy0_32 162ns × (1.00,1.01) 164ns × (0.98,1.04) +1.35% (p=0.011) BenchmarkRegexpMatchEasy0_32-2 161ns × (1.00,1.01) 161ns × (1.00,1.00) ~ (p=0.185) BenchmarkRegexpMatchEasy0_32-4 161ns × (1.00,1.00) 161ns × (1.00,1.00) -0.19% (p=0.001) BenchmarkRegexpMatchEasy0_1K 540ns × (0.99,1.02) 566ns × (0.98,1.04) +4.98% (p=0.000) BenchmarkRegexpMatchEasy0_1K-2 540ns × (0.99,1.01) 557ns × (0.99,1.01) +3.21% (p=0.000) BenchmarkRegexpMatchEasy0_1K-4 541ns × (0.99,1.01) 559ns × (0.99,1.01) +3.26% (p=0.000) BenchmarkRegexpMatchEasy1_32 139ns × (0.98,1.04) 139ns × (0.99,1.03) ~ (p=0.979) BenchmarkRegexpMatchEasy1_32-2 139ns × (0.99,1.04) 139ns × (0.99,1.02) ~ (p=0.777) BenchmarkRegexpMatchEasy1_32-4 139ns × (0.98,1.04) 139ns × (0.99,1.04) ~ (p=0.771) BenchmarkRegexpMatchEasy1_1K 890ns × (0.99,1.03) 885ns × (1.00,1.01) -0.50% (p=0.004) BenchmarkRegexpMatchEasy1_1K-2 888ns × (0.99,1.01) 885ns × (0.99,1.01) -0.37% (p=0.004) BenchmarkRegexpMatchEasy1_1K-4 890ns × (0.99,1.02) 884ns × (1.00,1.00) -0.70% (p=0.000) BenchmarkRegexpMatchMedium_32 252ns × (0.99,1.01) 251ns × (0.99,1.01) ~ (p=0.081) BenchmarkRegexpMatchMedium_32-2 254ns × (0.99,1.04) 252ns × (0.99,1.01) -0.78% (p=0.027) BenchmarkRegexpMatchMedium_32-4 253ns × (0.99,1.04) 252ns × (0.99,1.01) -0.70% (p=0.022) BenchmarkRegexpMatchMedium_1K 72.9µs × (0.99,1.01) 72.7µs × (1.00,1.00) ~ (p=0.064) BenchmarkRegexpMatchMedium_1K-2 74.1µs × (0.98,1.05) 72.9µs × (1.00,1.01) -1.61% (p=0.001) BenchmarkRegexpMatchMedium_1K-4 73.6µs × (0.99,1.05) 72.8µs × (1.00,1.00) -1.13% (p=0.007) BenchmarkRegexpMatchHard_32 3.88µs × (0.99,1.03) 3.92µs × (0.98,1.05) ~ (p=0.143) BenchmarkRegexpMatchHard_32-2 3.89µs × (0.99,1.03) 3.93µs × (0.98,1.09) ~ (p=0.278) BenchmarkRegexpMatchHard_32-4 3.90µs × (0.99,1.05) 3.93µs × (0.98,1.05) ~ (p=0.252) BenchmarkRegexpMatchHard_1K 118µs × (0.99,1.01) 117µs × (0.99,1.02) -0.54% (p=0.003) BenchmarkRegexpMatchHard_1K-2 118µs × (0.99,1.01) 118µs × (0.99,1.03) ~ (p=0.581) BenchmarkRegexpMatchHard_1K-4 118µs × (0.99,1.02) 117µs × (0.99,1.01) -0.54% (p=0.002) BenchmarkRevcomp 991ms × (0.95,1.10) 989ms × (0.94,1.08) ~ (p=0.879) BenchmarkRevcomp-2 978ms × (0.95,1.11) 962ms × (0.96,1.08) ~ (p=0.257) BenchmarkRevcomp-4 979ms × (0.96,1.07) 974ms × (0.96,1.11) ~ (p=0.678) BenchmarkTemplate 141ms × (0.99,1.02) 145ms × (0.99,1.02) +2.75% (p=0.000) BenchmarkTemplate-2 135ms × (0.98,1.02) 138ms × (0.99,1.02) +2.34% (p=0.000) BenchmarkTemplate-4 136ms × (0.98,1.02) 140ms × (0.99,1.02) +2.71% (p=0.000) BenchmarkTimeParse 640ns × (0.99,1.01) 622ns × (0.99,1.01) -2.88% (p=0.000) BenchmarkTimeParse-2 640ns × (0.99,1.01) 622ns × (1.00,1.00) -2.81% (p=0.000) BenchmarkTimeParse-4 640ns × (1.00,1.01) 622ns × (0.99,1.01) -2.82% (p=0.000) BenchmarkTimeFormat 730ns × (0.98,1.02) 731ns × (0.98,1.03) ~ (p=0.767) BenchmarkTimeFormat-2 709ns × (0.99,1.02) 707ns × (0.99,1.02) ~ (p=0.347) BenchmarkTimeFormat-4 717ns × (0.98,1.01) 718ns × (0.98,1.02) ~ (p=0.793) Change-Id: Ie779c47e912bf80eb918bafa13638bd8dfd6c2d9 Reviewed-on: https://go-review.googlesource.com/9406 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-05-01 18:44:36 +00:00
Josh Bleecher Snyder	7bebccb972	Revert "runtime/pprof: write heap statistics to heap profile always" This reverts commit `c26fc88d56`. This broke pprof. See the comments at 9491. Change-Id: Ic99ce026e86040c050a9bf0ea3024a1a42274ad1 Reviewed-on: https://go-review.googlesource.com/9565 Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2015-05-01 15:56:20 +00:00
Keith Randall	a55b131393	cmd/dist, runtime: Make stack guard larger for non-optimized builds Kind of a hack, but makes the non-optimized builds pass. Fixes #10079 Change-Id: I26f41c546867f8f3f16d953dc043e784768f2aff Reviewed-on: https://go-review.googlesource.com/9552 Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-01 15:41:55 +00:00
David Chase	7fbb1b36c3	cmd/internal/gc: improve flow of input params to output params This includes the following information in the per-function summary: outK = paramJ encoded in outK bits for paramJ outK = paramJ encoded in outK bits for paramJ heap = paramJ EscHeap heap = paramJ EscContentEscapes Note that (currently) if the address of a parameter is taken and returned, necessarily a heap allocation occurred to contain that reference, and the heap can never refer to stack, therefore the parameter and everything downstream from it escapes to the heap. The per-function summary information now has a tuneable number of bits (2 is probably noticeably better than 1, 3 is likely overkill, but it is now easy to check and the -m debugging output includes information that allows you to figure out if more would be better.) A new test was added to check pointer flow through struct-typed and struct-typed parameters and returns; some of these are sensitive to the number of summary bits, and ought to yield better results with a more competent escape analysis algorithm. Another new test checks (some) correctness with array parameters, results, and operations. The old analysis inferred a piece of plan9 runtime was non-escaping by counteracting overconservative analysis with buggy analysis; with the bug fixed, the result was too conservative (and it's not easy to fix in this framework) so the source code was tweaked to get the desired result. A test was added against the discovered bug. The escape analysis was further improved splitting the "level" into 3 parts, one tracking the conventional "level" and the other two computing the highest-level-suffix-from-copy, which is used to generally model the cancelling effect of indirection applied to address-of. With the improved escape analysis enabled, it was necessary to modify one of the runtime tests because it now attempts to allocate too much on the (small, fixed-size) G0 (system) stack and this failed the test. Compiling src/std after touching src/runtime/.go with -m logging turned on shows 420 fewer heap allocation sites (10538 vs 10968). Profiling allocations in src/html/template with for i in {1..5} ; do go tool 6g -memprofile=mastx.${i}.prof -memprofilerate=1 *.go; go tool pprof -alloc_objects -text mastx.${i}.prof ; done showed a 15% reduction in allocations performed by the compiler. Update #3753 Update #4720 Fixes #10466 Change-Id: I0fd97d5f5ac527b45f49e2218d158a6e89951432 Reviewed-on: https://go-review.googlesource.com/8202 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-05-01 13:47:20 +00:00
David Crawshaw	4044adedf7	runtime/cgo, cmd/dist: turn off exc_bad_access handler by default App Store policy requires programs do not reference the exc_server symbol. (Some public forum threads show that Unity ran into this several years ago and it is a hard policy rule.) While some research suggests that I could write my own version of exc_server, the expedient course is to disable the exception handler by default. Go programs only need it when running under lldb, which is primarily used by tests. So enable the exception handler in cmd/dist when we are running the tests. Fixes #10646 Change-Id: I853905254894b5367edb8abd381d45585a78ee8b Reviewed-on: https://go-review.googlesource.com/9549 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-05-01 13:19:39 +00:00
Shenghou Ma	5f69e739d3	runtime: adjust traceTickDiv for non-x86 architectures Fixes #10554. Fixes #10623. Change-Id: I90fbaa34e3d55c8758178f8d2e7fa41ff1194a1b Signed-off-by: Shenghou Ma <minux@golang.org> Reviewed-on: https://go-review.googlesource.com/9247 Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Dave Cheney <dave@cheney.net>	2015-05-01 07:25:49 +00:00
Russ Cox	79a990b845	runtime: schedule GC work more aggressively Schedule the work as early as possible, while still respecting the utilization percentage on average. The old code tried never to go above the utilization percentage. The new code is willing to go above the utilization percentage by one time slice (but of course after doing that it must wait until the percentage drops back down to the target before it gets another time slice). The effect is that for concurrent GCs that can run in a small number of time slices, the time during which write barriers are enabled is reduced by one mutator + GC time slice round (possibly 30 ms per GC). This only affects the fractional GC processor (the remainder of GOMAXPROCS/4), so it matters most in GOMAXPROCS=1, a bit in GOMAXPROCS=2, and not at all in GOMAXPROCS=4. GOMAXPROCS=1 name old mean new mean delta BenchmarkBinaryTree17 12.4s × (0.98,1.03) 13.5s × (0.97,1.04) +8.84% (p=0.000) BenchmarkFannkuch11 4.38s × (1.00,1.01) 4.38s × (1.00,1.01) ~ (p=0.343) BenchmarkFmtFprintfEmpty 88.9ns × (0.97,1.10) 90.1ns × (0.93,1.14) ~ (p=0.224) BenchmarkFmtFprintfString 356ns × (0.94,1.05) 321ns × (0.94,1.12) -9.77% (p=0.000) BenchmarkFmtFprintfInt 344ns × (0.98,1.03) 325ns × (0.96,1.03) -5.46% (p=0.000) BenchmarkFmtFprintfIntInt 622ns × (0.97,1.03) 571ns × (0.95,1.05) -8.09% (p=0.000) BenchmarkFmtFprintfPrefixedInt 462ns × (0.96,1.04) 431ns × (0.95,1.05) -6.81% (p=0.000) BenchmarkFmtFprintfFloat 653ns × (0.98,1.03) 621ns × (0.99,1.03) -4.90% (p=0.000) BenchmarkFmtManyArgs 2.32µs × (0.97,1.03) 2.19µs × (0.98,1.02) -5.43% (p=0.000) BenchmarkGobDecode 27.0ms × (0.96,1.04) 20.0ms × (0.97,1.04) -26.06% (p=0.000) BenchmarkGobEncode 26.6ms × (0.99,1.01) 17.8ms × (0.95,1.05) -33.19% (p=0.000) BenchmarkGzip 659ms × (0.98,1.03) 650ms × (0.99,1.01) -1.34% (p=0.000) BenchmarkGunzip 145ms × (0.98,1.04) 143ms × (1.00,1.01) -1.47% (p=0.000) BenchmarkHTTPClientServer 111µs × (0.97,1.04) 110µs × (0.96,1.03) -1.30% (p=0.000) BenchmarkJSONEncode 52.0ms × (0.97,1.03) 40.8ms × (0.97,1.03) -21.47% (p=0.000) BenchmarkJSONDecode 127ms × (0.98,1.04) 120ms × (0.98,1.02) -5.55% (p=0.000) BenchmarkMandelbrot200 6.04ms × (0.99,1.04) 6.02ms × (1.00,1.01) ~ (p=0.176) BenchmarkGoParse 8.62ms × (0.96,1.08) 8.55ms × (0.93,1.09) ~ (p=0.302) BenchmarkRegexpMatchEasy0_32 164ns × (0.98,1.05) 165ns × (0.98,1.07) ~ (p=0.293) BenchmarkRegexpMatchEasy0_1K 546ns × (0.98,1.06) 547ns × (0.97,1.07) ~ (p=0.741) BenchmarkRegexpMatchEasy1_32 142ns × (0.97,1.09) 141ns × (0.97,1.05) ~ (p=0.231) BenchmarkRegexpMatchEasy1_1K 904ns × (0.97,1.07) 900ns × (0.98,1.04) ~ (p=0.294) BenchmarkRegexpMatchMedium_32 256ns × (0.98,1.06) 256ns × (0.97,1.04) ~ (p=0.530) BenchmarkRegexpMatchMedium_1K 74.2µs × (0.98,1.05) 73.8µs × (0.98,1.04) ~ (p=0.334) BenchmarkRegexpMatchHard_32 3.94µs × (0.98,1.07) 3.92µs × (0.98,1.05) ~ (p=0.356) BenchmarkRegexpMatchHard_1K 119µs × (0.98,1.07) 119µs × (0.98,1.06) ~ (p=0.467) BenchmarkRevcomp 978ms × (0.96,1.09) 984ms × (0.95,1.07) ~ (p=0.448) BenchmarkTemplate 151ms × (0.96,1.03) 142ms × (0.95,1.04) -5.55% (p=0.000) BenchmarkTimeParse 628ns × (0.99,1.01) 628ns × (0.99,1.01) ~ (p=0.855) BenchmarkTimeFormat 729ns × (0.98,1.06) 734ns × (0.97,1.05) ~ (p=0.149) GOMAXPROCS=2 name old mean new mean delta BenchmarkBinaryTree17-2 9.80s × (0.97,1.03) 9.85s × (0.99,1.02) ~ (p=0.444) BenchmarkFannkuch11-2 4.35s × (0.99,1.01) 4.40s × (0.98,1.05) ~ (p=0.099) BenchmarkFmtFprintfEmpty-2 86.7ns × (0.97,1.05) 85.9ns × (0.98,1.04) ~ (p=0.409) BenchmarkFmtFprintfString-2 297ns × (0.98,1.01) 297ns × (0.99,1.01) ~ (p=0.743) BenchmarkFmtFprintfInt-2 309ns × (0.98,1.02) 310ns × (0.99,1.01) ~ (p=0.464) BenchmarkFmtFprintfIntInt-2 525ns × (0.97,1.05) 518ns × (0.99,1.01) ~ (p=0.151) BenchmarkFmtFprintfPrefixedInt-2 408ns × (0.98,1.02) 408ns × (0.98,1.03) ~ (p=0.797) BenchmarkFmtFprintfFloat-2 603ns × (0.99,1.01) 604ns × (0.98,1.02) ~ (p=0.588) BenchmarkFmtManyArgs-2 2.07µs × (0.98,1.02) 2.05µs × (0.99,1.01) ~ (p=0.091) BenchmarkGobDecode-2 19.1ms × (0.97,1.01) 19.3ms × (0.97,1.04) ~ (p=0.195) BenchmarkGobEncode-2 16.2ms × (0.97,1.03) 16.4ms × (0.99,1.01) ~ (p=0.069) BenchmarkGzip-2 652ms × (0.99,1.01) 651ms × (0.99,1.01) ~ (p=0.705) BenchmarkGunzip-2 143ms × (1.00,1.01) 143ms × (1.00,1.00) ~ (p=0.665) BenchmarkHTTPClientServer-2 149µs × (0.92,1.11) 149µs × (0.91,1.08) ~ (p=0.862) BenchmarkJSONEncode-2 34.6ms × (0.98,1.02) 37.2ms × (0.99,1.01) +7.56% (p=0.000) BenchmarkJSONDecode-2 117ms × (0.99,1.01) 117ms × (0.99,1.01) ~ (p=0.858) BenchmarkMandelbrot200-2 6.10ms × (0.99,1.03) 6.03ms × (1.00,1.00) ~ (p=0.083) BenchmarkGoParse-2 8.25ms × (0.98,1.01) 8.21ms × (0.99,1.02) ~ (p=0.307) BenchmarkRegexpMatchEasy0_32-2 162ns × (0.99,1.02) 162ns × (0.99,1.01) ~ (p=0.857) BenchmarkRegexpMatchEasy0_1K-2 541ns × (0.99,1.01) 540ns × (1.00,1.00) ~ (p=0.530) BenchmarkRegexpMatchEasy1_32-2 138ns × (1.00,1.00) 141ns × (0.98,1.04) +1.88% (p=0.038) BenchmarkRegexpMatchEasy1_1K-2 887ns × (0.99,1.01) 894ns × (0.99,1.01) ~ (p=0.087) BenchmarkRegexpMatchMedium_32-2 252ns × (0.99,1.01) 252ns × (0.99,1.01) ~ (p=0.954) BenchmarkRegexpMatchMedium_1K-2 73.4µs × (0.99,1.02) 72.8µs × (1.00,1.01) -0.87% (p=0.029) BenchmarkRegexpMatchHard_32-2 3.95µs × (0.97,1.05) 3.87µs × (1.00,1.01) -2.11% (p=0.035) BenchmarkRegexpMatchHard_1K-2 117µs × (0.99,1.01) 117µs × (0.99,1.01) ~ (p=0.669) BenchmarkRevcomp-2 980ms × (0.95,1.03) 993ms × (0.94,1.09) ~ (p=0.527) BenchmarkTemplate-2 136ms × (0.98,1.01) 135ms × (0.99,1.01) ~ (p=0.200) BenchmarkTimeParse-2 630ns × (1.00,1.01) 630ns × (1.00,1.00) ~ (p=0.634) BenchmarkTimeFormat-2 705ns × (0.99,1.01) 710ns × (0.98,1.02) ~ (p=0.174) GOMAXPROCS=4 BenchmarkBinaryTree17-4 9.87s × (0.96,1.04) 9.75s × (0.96,1.03) ~ (p=0.178) BenchmarkFannkuch11-4 4.35s × (1.00,1.01) 4.40s × (0.99,1.04) ~ (p=0.071) BenchmarkFmtFprintfEmpty-4 85.8ns × (0.98,1.06) 85.6ns × (0.98,1.04) ~ (p=0.858) BenchmarkFmtFprintfString-4 306ns × (0.99,1.03) 304ns × (0.97,1.02) ~ (p=0.470) BenchmarkFmtFprintfInt-4 317ns × (0.98,1.01) 315ns × (0.98,1.02) -0.92% (p=0.044) BenchmarkFmtFprintfIntInt-4 527ns × (0.99,1.01) 525ns × (0.98,1.01) ~ (p=0.164) BenchmarkFmtFprintfPrefixedInt-4 421ns × (0.98,1.03) 417ns × (0.99,1.02) ~ (p=0.092) BenchmarkFmtFprintfFloat-4 623ns × (0.98,1.02) 618ns × (0.98,1.03) ~ (p=0.172) BenchmarkFmtManyArgs-4 2.09µs × (0.98,1.02) 2.09µs × (0.98,1.02) ~ (p=0.679) BenchmarkGobDecode-4 18.6ms × (0.99,1.01) 18.6ms × (0.98,1.03) ~ (p=0.595) BenchmarkGobEncode-4 15.0ms × (0.98,1.02) 15.1ms × (0.99,1.01) ~ (p=0.301) BenchmarkGzip-4 659ms × (0.98,1.04) 660ms × (0.97,1.02) ~ (p=0.724) BenchmarkGunzip-4 145ms × (0.98,1.04) 144ms × (0.99,1.04) ~ (p=0.671) BenchmarkHTTPClientServer-4 139µs × (0.97,1.02) 138µs × (0.99,1.02) ~ (p=0.392) BenchmarkJSONEncode-4 35.0ms × (0.99,1.02) 35.1ms × (0.98,1.02) ~ (p=0.777) BenchmarkJSONDecode-4 119ms × (0.98,1.01) 118ms × (0.98,1.02) ~ (p=0.710) BenchmarkMandelbrot200-4 6.02ms × (1.00,1.00) 6.02ms × (1.00,1.00) ~ (p=0.289) BenchmarkGoParse-4 7.96ms × (0.99,1.01) 7.96ms × (0.99,1.01) ~ (p=0.884) BenchmarkRegexpMatchEasy0_32-4 164ns × (0.98,1.04) 166ns × (0.97,1.04) ~ (p=0.221) BenchmarkRegexpMatchEasy0_1K-4 540ns × (0.99,1.01) 552ns × (0.97,1.04) +2.10% (p=0.018) BenchmarkRegexpMatchEasy1_32-4 140ns × (0.99,1.04) 142ns × (0.97,1.04) ~ (p=0.226) BenchmarkRegexpMatchEasy1_1K-4 896ns × (0.99,1.03) 907ns × (0.97,1.04) ~ (p=0.155) BenchmarkRegexpMatchMedium_32-4 255ns × (0.99,1.04) 255ns × (0.98,1.04) ~ (p=0.904) BenchmarkRegexpMatchMedium_1K-4 73.4µs × (0.99,1.04) 73.8µs × (0.98,1.04) ~ (p=0.560) BenchmarkRegexpMatchHard_32-4 3.93µs × (0.98,1.04) 3.95µs × (0.98,1.04) ~ (p=0.571) BenchmarkRegexpMatchHard_1K-4 117µs × (1.00,1.01) 119µs × (0.98,1.04) +1.48% (p=0.048) BenchmarkRevcomp-4 990ms × (0.94,1.08) 989ms × (0.94,1.10) ~ (p=0.957) BenchmarkTemplate-4 137ms × (0.98,1.02) 137ms × (0.99,1.01) ~ (p=0.996) BenchmarkTimeParse-4 629ns × (1.00,1.00) 629ns × (0.99,1.01) ~ (p=0.924) BenchmarkTimeFormat-4 710ns × (0.99,1.01) 716ns × (0.98,1.02) +0.84% (p=0.033) Change-Id: I43a04e0f6ad5e3ba9847dddf12e13222561f9cf4 Reviewed-on: https://go-review.googlesource.com/9543 Reviewed-by: Austin Clements <austin@google.com>	2015-04-30 15:50:12 +00:00
Austin Clements	3ca20218c1	runtime: fix gcDumpObject on non-heap pointers gcDumpObject is used to print the source and destination objects when checkmark find a missing mark. However, gcDumpObject currently assumes the given pointer will point to a heap object. This is not true of the source object during root marking and may not even be true of the destination object in the limited situations where the heap points back in to the stack. If the pointer isn't a heap object, gcDumpObject will attempt an out-of-bounds access to h_spans. This will cause a panicslice, which will attempt to construct a useful panic message. This will cause a string allocation, which will lead mallocgc to panic because the GC is in mark termination (checkmark only happens during mark termination). Fix this by checking that the pointer points into the heap arena before attempting to use it as an arena pointer. Change-Id: I09da600c380d4773f1f8f38e45b82cb229ea6382 Reviewed-on: https://go-review.googlesource.com/9498 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-30 14:53:51 +00:00
Keith Randall	4b78c9575d	runtime: print stack of G during a signal Sequence of operations: - Go code does a systemstack call - during the systemstack call, receive a signal - signal requests a traceback of all goroutines The orignal G is still marked as _Grunning, so the traceback code refuses to print its stack. Fix by allowing traceback of Gs whose caller is on the same M as G is. G can't be modifying its stack if that is the case. Fixes #10546 Change-Id: I2bcea48c0197fbf78ab6fa080027cd80181083ad Reviewed-on: https://go-review.googlesource.com/9435 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-04-29 19:25:10 +00:00
Shenghou Ma	4d1ab2d8d1	runtime: re-enable TestNewProc0 on android/arm and fix heap corruption The problem is not actually specific to android/arm. Linux/ARM's runtime.clone set the stack pointer to child_stk-4 before calling the fn. And then when fn returns, it tries to write to 4(R13) to provide argument for runtime.exit, which is just beyond the allocated child stack, and thus it will corrupt the heap randomly or trigger segfault if that memory happens to be unmapped. While we're at here, shorten the test polling interval to 0.1s to speed up the test (it was only checking at 1s interval, which means the test takes at least 1s). Fixes #10548. Change-Id: I57cd63232022b113b6cd61e987b0684ebcce930a Reviewed-on: https://go-review.googlesource.com/9457 Reviewed-by: David Crawshaw <crawshaw@golang.org>	2015-04-29 19:18:07 +00:00
Russ Cox	c26fc88d56	runtime/pprof: write heap statistics to heap profile always The heap statistics were only written if asked for a profile with debug > 0, but that also prints a stack trace for each profile line, which is comparatively much noisier. The statistics are short enough and separate enough (they only appear at the end) and useful enough that we can print them always. This means that people using -test.memprofile in tests will get a memory profile with statistics included now. Pprof won't care, but if people care to look, the numbers will be there. This avoids the need for hacks like using -memprofilerate=1 to find the number of allocations. Change-Id: I10a4f593403d0315aad11b37c6e554b734caa73f Reviewed-on: https://go-review.googlesource.com/9491 Reviewed-by: David Chase <drchase@google.com>	2015-04-29 18:07:43 +00:00
Keith Randall	c526f3ac10	runtime: tail call into memeq/cmp body implementations There's no need to call/ret to the body implementation. It can write the result to the right place. Just jump to it and have it return to our caller. Old: call body implementation compute result put result in a register return write register to result location return New: load address of result location into a register jump to body implementation compute result write result to passed-in address return It's a bit tricky on 386 because there is no free register with which to pass the result location. Free up a register by keeping around blen-alen instead of both alen and blen. Change-Id: If2cf0682a5bf1cc592bdda7c126ed4eee8944fba Reviewed-on: https://go-review.googlesource.com/9202 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2015-04-29 04:46:25 +00:00
Shenghou Ma	7e49c8193c	runtime: skip gdb goroutine backtrace test on non-x86 Gdb is not able to backtrace our non-standard stack frames on RISC architectures without frame pointer. Change-Id: Id62a566ce2d743602ded2da22ff77b9ae34bc5ae Signed-off-by: Shenghou Ma <minux@golang.org> Reviewed-on: https://go-review.googlesource.com/9456 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-04-29 04:44:38 +00:00
Shenghou Ma	da11a9dda3	cmd/internal/ld, runtime: unify stack reservation in PE header and runtime With 128KB stack reservation, on 32-bit Windows, the maximum number threads is ~9000. The original 65535-byte stack commit is causing problem on Windows XP where it makes the stack reservation to be 1MB despite the fact that the runtime specified 128KB. While we're at here, also fix the extra spacings in the unable to create more OS thread error message: println will insert a space between each argument. See #9457 for more information. Change-Id: I3a82f7d9717d3d55211b6eb1c34b00b0eaad83ed Reviewed-on: https://go-review.googlesource.com/2237 Reviewed-by: Alex Brainman <alex.brainman@gmail.com> Run-TryBot: Minux Ma <minux@golang.org>	2015-04-29 03:27:10 +00:00
Shenghou Ma	e7dd28891e	cmd/internal/gc, cmd/[56789]g: rename stackcopy to blockcopy To avoid confusion with the runtime concept of copying stack. Change-Id: I33442377b71012c2482c2d0ddd561492c71e70d0 Reviewed-on: https://go-review.googlesource.com/8639 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-29 00:28:01 +00:00
Ian Lance Taylor	0c62c93a09	runtime/cgo: use PTHREAD_{MUTEX,COND}_INITIALIZER Technically you must initialize static pthread_mutex_t and pthread_cond_t variables with the appropriate INITIALIZER macro. In practice the default initializers are zero anyhow, but it's still good code hygiene. Change-Id: I517304b16c2c7943b3880855c1b47a9a506b4bdf Reviewed-on: https://go-review.googlesource.com/9433 Reviewed-by: David Crawshaw <crawshaw@golang.org>	2015-04-28 22:27:26 +00:00
Austin Clements	63caec5dee	runtime: eliminate one heapBitsForObject from scanobject scanobject with ptrmask!=nil is only ever called with the base pointer of a heap object. Currently, scanobject calls heapBitsForObject, which goes to a great deal of trouble to check that the pointer points into the heap and to find the base of the object it points to, both of which are completely unnecessary in this case. Replace this call to heapBitsForObject with much simpler logic to fetch the span and compute the heap bits. Benchmark results with five runs: name old mean new mean delta BenchmarkBinaryTree17 9.21s × (0.95,1.02) 8.55s × (0.91,1.03) -7.16% (p=0.022) BenchmarkFannkuch11 2.65s × (1.00,1.00) 2.62s × (1.00,1.00) -1.10% (p=0.000) BenchmarkFmtFprintfEmpty 73.2ns × (0.99,1.01) 71.7ns × (1.00,1.01) -1.99% (p=0.004) BenchmarkFmtFprintfString 302ns × (0.99,1.00) 292ns × (0.98,1.02) -3.31% (p=0.020) BenchmarkFmtFprintfInt 281ns × (0.98,1.01) 279ns × (0.96,1.02) ~ (p=0.596) BenchmarkFmtFprintfIntInt 482ns × (0.98,1.01) 488ns × (0.95,1.02) ~ (p=0.419) BenchmarkFmtFprintfPrefixedInt 382ns × (0.99,1.01) 365ns × (0.96,1.02) -4.35% (p=0.015) BenchmarkFmtFprintfFloat 475ns × (0.99,1.01) 472ns × (1.00,1.00) ~ (p=0.108) BenchmarkFmtManyArgs 1.89µs × (1.00,1.01) 1.90µs × (0.94,1.02) ~ (p=0.883) BenchmarkGobDecode 22.4ms × (0.99,1.01) 21.9ms × (0.92,1.04) ~ (p=0.332) BenchmarkGobEncode 24.7ms × (0.98,1.02) 23.9ms × (0.87,1.07) ~ (p=0.407) BenchmarkGzip 397ms × (0.99,1.01) 398ms × (0.99,1.01) ~ (p=0.718) BenchmarkGunzip 96.7ms × (1.00,1.00) 96.9ms × (1.00,1.00) ~ (p=0.230) BenchmarkHTTPClientServer 71.5µs × (0.98,1.01) 68.5µs × (0.92,1.06) ~ (p=0.243) BenchmarkJSONEncode 46.1ms × (0.98,1.01) 44.9ms × (0.98,1.03) -2.51% (p=0.040) BenchmarkJSONDecode 86.1ms × (0.99,1.01) 86.5ms × (0.99,1.01) ~ (p=0.343) BenchmarkMandelbrot200 4.12ms × (1.00,1.00) 4.13ms × (1.00,1.00) +0.23% (p=0.000) BenchmarkGoParse 5.89ms × (0.96,1.03) 5.82ms × (0.96,1.04) ~ (p=0.522) BenchmarkRegexpMatchEasy0_32 141ns × (0.99,1.01) 142ns × (1.00,1.00) ~ (p=0.178) BenchmarkRegexpMatchEasy0_1K 408ns × (1.00,1.00) 392ns × (0.99,1.00) -3.83% (p=0.000) BenchmarkRegexpMatchEasy1_32 122ns × (1.00,1.00) 122ns × (1.00,1.00) ~ (p=0.178) BenchmarkRegexpMatchEasy1_1K 626ns × (1.00,1.01) 624ns × (0.99,1.00) ~ (p=0.122) BenchmarkRegexpMatchMedium_32 202ns × (0.99,1.00) 205ns × (0.99,1.01) +1.58% (p=0.001) BenchmarkRegexpMatchMedium_1K 54.4µs × (1.00,1.00) 55.5µs × (1.00,1.00) +1.86% (p=0.000) BenchmarkRegexpMatchHard_32 2.68µs × (1.00,1.00) 2.71µs × (1.00,1.00) +0.97% (p=0.002) BenchmarkRegexpMatchHard_1K 79.8µs × (1.00,1.01) 80.5µs × (1.00,1.01) +0.94% (p=0.003) BenchmarkRevcomp 590ms × (0.99,1.01) 585ms × (1.00,1.00) ~ (p=0.066) BenchmarkTemplate 111ms × (0.97,1.02) 112ms × (0.99,1.01) ~ (p=0.201) BenchmarkTimeParse 392ns × (1.00,1.00) 385ns × (1.00,1.00) -1.69% (p=0.000) BenchmarkTimeFormat 449ns × (0.98,1.01) 448ns × (0.99,1.01) ~ (p=0.550) Change-Id: Ie7c3830c481d96c9043e7bf26853c6c1d05dc9f4 Reviewed-on: https://go-review.googlesource.com/9364 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-28 15:22:20 +00:00
Russ Cox	32d6fbcb4f	runtime: replace needwb() with writeBarrierEnabled Reduce the write barrier check to a single load and compare so that it can be inlined into write barrier use sites. Makes the standard write barrier a little faster too. name old new delta BenchmarkBinaryTree17 17.9s × (0.99,1.01) 17.9s × (1.00,1.01) ~ BenchmarkFannkuch11 4.35s × (1.00,1.00) 4.43s × (1.00,1.00) +1.81% BenchmarkFmtFprintfEmpty 120ns × (0.93,1.06) 110ns × (1.00,1.06) -7.92% BenchmarkFmtFprintfString 479ns × (0.99,1.00) 487ns × (0.99,1.00) +1.67% BenchmarkFmtFprintfInt 452ns × (0.99,1.02) 450ns × (0.99,1.00) ~ BenchmarkFmtFprintfIntInt 766ns × (0.99,1.01) 762ns × (1.00,1.00) ~ BenchmarkFmtFprintfPrefixedInt 576ns × (0.98,1.01) 584ns × (0.99,1.01) ~ BenchmarkFmtFprintfFloat 730ns × (1.00,1.01) 738ns × (1.00,1.00) +1.16% BenchmarkFmtManyArgs 2.84µs × (0.99,1.00) 2.80µs × (1.00,1.01) -1.22% BenchmarkGobDecode 39.3ms × (0.98,1.01) 39.0ms × (0.99,1.00) ~ BenchmarkGobEncode 39.5ms × (0.99,1.01) 37.8ms × (0.98,1.01) -4.33% BenchmarkGzip 663ms × (1.00,1.01) 661ms × (0.99,1.01) ~ BenchmarkGunzip 143ms × (1.00,1.00) 142ms × (1.00,1.00) ~ BenchmarkHTTPClientServer 132µs × (0.99,1.01) 132µs × (0.99,1.01) ~ BenchmarkJSONEncode 57.4ms × (0.99,1.01) 56.3ms × (0.99,1.01) -1.96% BenchmarkJSONDecode 139ms × (0.99,1.00) 138ms × (0.99,1.01) ~ BenchmarkMandelbrot200 6.03ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ BenchmarkGoParse 10.3ms × (0.89,1.14) 10.2ms × (0.87,1.05) ~ BenchmarkRegexpMatchEasy0_32 209ns × (1.00,1.00) 208ns × (1.00,1.00) ~ BenchmarkRegexpMatchEasy0_1K 591ns × (0.99,1.00) 588ns × (1.00,1.00) ~ BenchmarkRegexpMatchEasy1_32 184ns × (0.99,1.02) 182ns × (0.99,1.01) ~ BenchmarkRegexpMatchEasy1_1K 1.01µs × (1.00,1.00) 0.99µs × (1.00,1.01) -2.33% BenchmarkRegexpMatchMedium_32 330ns × (1.00,1.00) 323ns × (1.00,1.01) -2.12% BenchmarkRegexpMatchMedium_1K 92.6µs × (1.00,1.00) 89.9µs × (1.00,1.00) -2.92% BenchmarkRegexpMatchHard_32 4.80µs × (0.95,1.00) 4.72µs × (0.95,1.01) ~ BenchmarkRegexpMatchHard_1K 136µs × (1.00,1.00) 133µs × (1.00,1.01) -1.86% BenchmarkRevcomp 900ms × (0.99,1.04) 900ms × (1.00,1.05) ~ BenchmarkTemplate 172ms × (1.00,1.00) 168ms × (0.99,1.01) -2.07% BenchmarkTimeParse 637ns × (1.00,1.00) 637ns × (1.00,1.00) ~ BenchmarkTimeFormat 744ns × (1.00,1.01) 738ns × (1.00,1.00) -0.67% Change-Id: I4ecc925805da1f5ee264377f1f7574f54ee575e7 Reviewed-on: https://go-review.googlesource.com/9321 Reviewed-by: Austin Clements <austin@google.com>	2015-04-28 01:37:53 +00:00
Russ Cox	2050f57141	runtime: change unused argument in fat write barriers from pointer to scalar The argument is unused, only present for alignment of the following argument. The compiler today always passes a zero but I'd rather not write anything there during the call sequence, so mark it as a scalar so the garbage collector won't look at it. As expected, no significant performance change. name old new delta BenchmarkBinaryTree17 17.9s × (0.99,1.00) 17.9s × (0.99,1.01) ~ BenchmarkFannkuch11 4.35s × (1.00,1.00) 4.35s × (1.00,1.00) ~ BenchmarkFmtFprintfEmpty 120ns × (0.94,1.05) 120ns × (0.93,1.06) ~ BenchmarkFmtFprintfString 477ns × (1.00,1.00) 479ns × (0.99,1.00) ~ BenchmarkFmtFprintfInt 450ns × (0.99,1.01) 452ns × (0.99,1.02) ~ BenchmarkFmtFprintfIntInt 765ns × (0.99,1.01) 766ns × (0.99,1.01) ~ BenchmarkFmtFprintfPrefixedInt 569ns × (0.99,1.01) 576ns × (0.98,1.01) ~ BenchmarkFmtFprintfFloat 728ns × (1.00,1.00) 730ns × (1.00,1.01) ~ BenchmarkFmtManyArgs 2.82µs × (0.99,1.01) 2.84µs × (0.99,1.00) ~ BenchmarkGobDecode 39.1ms × (0.99,1.01) 39.3ms × (0.98,1.01) ~ BenchmarkGobEncode 39.4ms × (0.99,1.01) 39.5ms × (0.99,1.01) ~ BenchmarkGzip 661ms × (0.99,1.01) 663ms × (1.00,1.01) ~ BenchmarkGunzip 143ms × (1.00,1.00) 143ms × (1.00,1.00) ~ BenchmarkHTTPClientServer 133µs × (0.99,1.01) 132µs × (0.99,1.01) ~ BenchmarkJSONEncode 57.3ms × (0.99,1.04) 57.4ms × (0.99,1.01) ~ BenchmarkJSONDecode 139ms × (0.99,1.00) 139ms × (0.99,1.00) ~ BenchmarkMandelbrot200 6.02ms × (1.00,1.00) 6.03ms × (1.00,1.00) ~ BenchmarkGoParse 9.72ms × (0.92,1.11) 10.31ms × (0.89,1.14) ~ BenchmarkRegexpMatchEasy0_32 209ns × (1.00,1.01) 209ns × (1.00,1.00) ~ BenchmarkRegexpMatchEasy0_1K 592ns × (0.99,1.00) 591ns × (0.99,1.00) ~ BenchmarkRegexpMatchEasy1_32 183ns × (0.98,1.01) 184ns × (0.99,1.02) ~ BenchmarkRegexpMatchEasy1_1K 1.01µs × (1.00,1.01) 1.01µs × (1.00,1.00) ~ BenchmarkRegexpMatchMedium_32 330ns × (1.00,1.00) 330ns × (1.00,1.00) ~ BenchmarkRegexpMatchMedium_1K 92.4µs × (1.00,1.00) 92.6µs × (1.00,1.00) ~ BenchmarkRegexpMatchHard_32 4.77µs × (0.95,1.01) 4.80µs × (0.95,1.00) ~ BenchmarkRegexpMatchHard_1K 136µs × (1.00,1.00) 136µs × (1.00,1.00) ~ BenchmarkRevcomp 906ms × (0.99,1.05) 900ms × (0.99,1.04) ~ BenchmarkTemplate 171ms × (0.99,1.01) 172ms × (1.00,1.00) ~ BenchmarkTimeParse 638ns × (1.00,1.00) 637ns × (1.00,1.00) ~ BenchmarkTimeFormat 745ns × (0.99,1.02) 744ns × (1.00,1.01) ~ Change-Id: I0aeac5dc7adfd75e2223e3aabfedc7818d339f9b Reviewed-on: https://go-review.googlesource.com/9320 Reviewed-by: Austin Clements <austin@google.com>	2015-04-28 01:37:45 +00:00
Austin Clements	02ba71e547	runtime/race: fix failing tests Some race tests were sensitive to the goroutine scheduling order. When this changed in commit `e870f06`, these tests started to fail. Fix TestRaceHeapParam by ensuring that the racing goroutine has run before the test exits. Fix TestRaceRWMutexMultipleReaders by adding a third reader to ensure that two readers wind up on the same side of the writer (and race with each other) regardless of the schedule. Fix TestRaceRange by ensuring that the racing goroutine runs before the main goroutine exits the loop it races with. Change-Id: Iaf002f8730ea42227feaf2f3c51b9a1e57ccffdd Reviewed-on: https://go-review.googlesource.com/9402 Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-27 23:12:00 +00:00
Russ Cox	f774e6a1f8	runtime/race: stop listening to external network addresses This makes the OS X firewall box pop up. Not run during all.bash so hasn't been noticed before. Change-Id: I78feb4fd3e1d3c983ae3419085048831c04de3da Reviewed-on: https://go-review.googlesource.com/9401 Reviewed-by: Austin Clements <austin@google.com>	2015-04-27 23:11:45 +00:00
Austin Clements	7c7cd69591	runtime: fix stack use accounting ReadMemStats accounts for stacks slightly differently than the runtime does internally. Internally, only stacks allocated by newosproc0 are accounted in memstats.stacks_sys and other stacks are accounted in heap_sys. readmemstats_m shuffles the statistics so all stacks are accounted in StackSys rather than HeapSys. However, currently, readmemstats_m assumes StackSys will be zero when it does this shuffle. This was true until commit `6ad33be`. If it isn't (e.g., if something called newosproc0), StackSys+HeapSys will be different before and after this shuffle, and the Sys sum that was computed earlier will no longer agree with the sum of its components. Fix this by making the shuffle in readmemstats_m not assume that StackSys is zero. Fixes #10585. Change-Id: If13991c8de68bd7b85e1b613d3f12b4fd6fd5813 Reviewed-on: https://go-review.googlesource.com/9366 Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-27 23:09:39 +00:00
David Crawshaw	d707a6e0e2	runtime: remove unnecessary noescape to fix netbsd I introduced this build failure in golang.org/cl/9302 but failed to notice due to the other failures on the dashboard. Change-Id: I84bf00f664ba572c1ca722e0136d8a2cf21613ca Reviewed-on: https://go-review.googlesource.com/9363 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Minux Ma <minux@golang.org>	2015-04-27 23:04:38 +00:00
Austin Clements	23ce80efeb	runtime/race: fix benchmark deadlock Currently TestRaceCrawl fails to wg.Done for every wg.Adds if the depth ever reaches 0. This causes the test to deadlock. Under the race detector, this deadlock is not detected, so the test eventually times out. This only recently became a problem. Prior to commit `e870f06` the depth would never reach 0 because the strict round-robin goroutine schedule ensured that all of the URLs were already "seen" by depth 2. Now that the runtime prefers scheduling the most recently started goroutine, the test is able to reach depth 0 and trigger this deadlock. Change-Id: I5176302a89614a344c84d587073b364833af6590 Reviewed-on: https://go-review.googlesource.com/9344 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-27 20:54:34 +00:00
Russ Cox	42da270024	runtime: fix race in BenchmarkPingPongHog The master goroutine was returning before the child goroutine had done its final i < b.N (the one that fails and causes it to exit the loop) and then the benchmark harness was updating b.N, causing a read+write race on b.N. Change-Id: I2504270a0de30544736f6c32161337a25b505c3e Reviewed-on: https://go-review.googlesource.com/9368 Reviewed-by: Austin Clements <austin@google.com>	2015-04-27 20:10:11 +00:00
Austin Clements	33e0f3d853	runtime: fix some out of date comments and typos Change-Id: I061057414c722c5a0f03c709528afc8554114db6 Reviewed-on: https://go-review.googlesource.com/9367 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-27 20:08:38 +00:00
Josh Bleecher Snyder	9a0fd97ff3	runtime: remove a modulus calculation from pollorder This is a follow-up to CL 9269, as suggested by dvyukov. There is probably even more that can be done to speed up this shuffle. It will matter more once CL 7570 (fine-grained locking in select) is in and can be revisited then, with benchmarks. Change-Id: Ic13a27d11cedd1e1f007951214b3bb56b1644f02 Reviewed-on: https://go-review.googlesource.com/9393 Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-04-27 19:36:37 +00:00
Austin Clements	1b01910c06	runtime: rename gcController.findRunnable to findRunnableGCWorker This avoids confusion with the main findrunnable in the scheduler. Change-Id: I8cf40657557a8610a2fe5a2f74598518256ca7f0 Reviewed-on: https://go-review.googlesource.com/9305 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-27 19:26:42 +00:00
Austin Clements	bb6320535d	runtime: replace STW for enabling write barriers with ragged barrier Currently, we use a full stop-the-world around enabling write barriers. This is to ensure that all Gs have enabled write barriers before any blackening occurs (either in gcBgMarkWorker() or in gcAssistAlloc()). However, there's no need to bring the whole world to a synchronous stop to ensure this. This change replaces the STW with a ragged barrier that ensures each P has individually observed that write barriers should be enabled before GC performs any blackening. Change-Id: If2f129a6a55bd8bdd4308067af2b739f3fb41955 Reviewed-on: https://go-review.googlesource.com/8207 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-27 19:26:37 +00:00
Austin Clements	57afa76471	runtime: add ragged global barrier function This adds forEachP, which performs a general-purpose ragged global barrier. forEachP takes a callback and invokes it for every P at a GC safe point. Ps that are idle or in a syscall are considered to be at a continuous safe point. forEachP ensures that these Ps do not change state by forcing all syscall Ps into idle and holding the sched.lock. To ensure that Ps do not enter syscall or idle without running the safe-point function, this adds checks for a pending callback every place there is currently a gcwaiting check. We'll use forEachP to replace the STW around enabling the write barrier and to replace the current asynchronous per-M wbuf cache with a cooperatively managed per-P gcWork cache. Change-Id: Ie944f8ce1fead7c79bf271d2f42fcd61a41bb3cc Reviewed-on: https://go-review.googlesource.com/8206 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-27 19:26:33 +00:00
Austin Clements	b0b1a66052	runtime: reset spinning in mspinning if work was ready()ed This fixes a bug where the runtime ready()s a goroutine while setting up a new M that's initially marked as spinning, causing the scheduler to later panic when it finds work in the run queue of a P associated with a spinning M. Specifically, the sequence of events that can lead to this is: 1) sysmon calls handoffp to hand off a P stolen from a syscall. 2) handoffp sees no pending work on the P, so it calls startm with spinning set. 3) startm calls newm, which in turn calls allocm to allocate a new M. 4) allocm "borrows" the P we're handing off in order to do allocation and performs this allocation. 5) This allocation may assist the garbage collector, and this assist may detect the end of concurrent mark and ready() the main GC goroutine to signal this. 6) This ready()ing puts the GC goroutine on the run queue of the borrowed P. 7) newm starts the OS thread, which runs mstart and subsequently mstart1, which marks the M spinning because startm was called with spinning set. 8) mstart1 enters the scheduler, which panics because there's work on the run queue, but the M is marked spinning. To fix this, before marking the M spinning in step 7, add a check to see if work was been added to the P's run queue. If this is the case, undo the spinning instead. Fixes #10573. Change-Id: I4670495ae00582144a55ce88c45ae71de597cfa5 Reviewed-on: https://go-review.googlesource.com/9332 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-04-27 12:49:54 +00:00
Austin Clements	2a46f55b35	runtime: panic when idling a P with runnable Gs This adds a check that we never put a P on the idle list when it has work on its local run queue. Change-Id: Ifcfab750de60c335148a7f513d4eef17be03b6a7 Reviewed-on: https://go-review.googlesource.com/9324 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-04-27 12:49:49 +00:00
Josh Bleecher Snyder	fd5540e7e5	runtime: tighten select permutation generation This is the optimization made to math/rand in CL 21030043. Change-Id: I231b24fa77cac1fe74ba887db76313b5efaab3e8 Reviewed-on: https://go-review.googlesource.com/9269 Reviewed-by: Minux Ma <minux@golang.org>	2015-04-27 02:36:24 +00:00
David Crawshaw	a5b693b431	runtime: signal forwarding for darwin/amd64 Follows the linux signal forwarding semantics from http://golang.org/cl/8712, sharing the implementation of sigfwdgo. Forwarding for 386, arm, and arm64 will follow. Change-Id: I6bf30d563d19da39b6aec6900c7fe12d82ed4f62 Reviewed-on: https://go-review.googlesource.com/9302 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-04-26 13:46:13 +00:00
Rick Hudson	ada8cdb9f6	runtime: Fix bug due to elided return. A previous change to mbitmap.go dropped a return on a path the seems not to be excersized. This was a mistake that this CL fixes. Change-Id: I715ee4ef08f5bf8d9f53cee84e8fb31a237e2d43 Reviewed-on: https://go-review.googlesource.com/9295 Reviewed-by: Austin Clements <austin@google.com>	2015-04-24 21:52:30 +00:00
Austin Clements	1b4025f4bd	runtime: replace per-M workbuf cache with per-P gcWork cache Currently, each M has a cache of the most recently used workbuf. This is used primarily by the write barrier so it doesn't have to access the global workbuf lists on every write barrier. It's also used by stack scanning because it's convenient. This cache is important for write barrier performance, but this particular approach has several downsides. It's faster than no cache, but far from optimal (as the benchmarks below show). It's complex: access to the cache is sprinkled through most of the workbuf list operations and it requires special care to transform into and back out of the gcWork cache that's actually used for scanning and marking. It requires atomic exchanges to take ownership of the cached workbuf and to return it to the M's cache even though it's almost always used by only the current M. Since it's per-M, flushing these caches is O(# of Ms), which may be high. And it has some significant subtleties: for example, in general the cache shouldn't be used after the harvestwbufs() in mark termination because it could hide work from mark termination, but stack scanning can happen after this and will* use the cache (but it turns out this is okay because it will always be followed by a getfull(), which drains the cache). This change replaces this cache with a per-P gcWork object. This gcWork cache can be used directly by scanning and marking (as long as preemption is disabled, which is a general requirement of gcWork). Since it's per-P, it doesn't require synchronization, which simplifies things and means the only atomic operations in the write barrier are occasionally fetching new work buffers and setting a mark bit if the object isn't already marked. This cache can be flushed in O(# of Ps), which is generally small. It follows a simple flushing rule: the cache can be used during any phase, but during mark termination it must be flushed before allowing preemption. This also makes the dispose during mutator assist no longer necessary, which eliminates the vast majority of gcWork dispose calls and reduces contention on the global workbuf lists. And it's a lot faster on some benchmarks: benchmark old ns/op new ns/op delta BenchmarkBinaryTree17 11963668673 11206112763 -6.33% BenchmarkFannkuch11 2643217136 2649182499 +0.23% BenchmarkFmtFprintfEmpty 70.4 70.2 -0.28% BenchmarkFmtFprintfString 364 307 -15.66% BenchmarkFmtFprintfInt 317 282 -11.04% BenchmarkFmtFprintfIntInt 512 483 -5.66% BenchmarkFmtFprintfPrefixedInt 404 380 -5.94% BenchmarkFmtFprintfFloat 521 479 -8.06% BenchmarkFmtManyArgs 2164 1894 -12.48% BenchmarkGobDecode 30366146 22429593 -26.14% BenchmarkGobEncode 29867472 26663152 -10.73% BenchmarkGzip 391236616 396779490 +1.42% BenchmarkGunzip 96639491 96297024 -0.35% BenchmarkHTTPClientServer 100110 70763 -29.31% BenchmarkJSONEncode 51866051 52511382 +1.24% BenchmarkJSONDecode 103813138 86094963 -17.07% BenchmarkMandelbrot200 4121834 4120886 -0.02% BenchmarkGoParse 16472789 5879949 -64.31% BenchmarkRegexpMatchEasy0_32 140 140 +0.00% BenchmarkRegexpMatchEasy0_1K 394 394 +0.00% BenchmarkRegexpMatchEasy1_32 120 120 +0.00% BenchmarkRegexpMatchEasy1_1K 621 614 -1.13% BenchmarkRegexpMatchMedium_32 209 202 -3.35% BenchmarkRegexpMatchMedium_1K 54889 55175 +0.52% BenchmarkRegexpMatchHard_32 2682 2675 -0.26% BenchmarkRegexpMatchHard_1K 79383 79524 +0.18% BenchmarkRevcomp 584116718 584595320 +0.08% BenchmarkTemplate 125400565 109620196 -12.58% BenchmarkTimeParse 386 387 +0.26% BenchmarkTimeFormat 580 447 -22.93% (Best out of 10 runs. The delta of averages is similar.) This also puts us in a good position to flush these caches when nearing the end of concurrent marking, which will let us increase the size of the work buffers while still controlling mark termination pause time. Change-Id: I2dd94c8517a19297a98ec280203cccaa58792522 Reviewed-on: https://go-review.googlesource.com/9178 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-24 20:10:14 +00:00
Austin Clements	d1cae6358c	runtime: fix check for pending GC work When findRunnable considers running a fractional mark worker, it first checks if there's any work to be done; if there isn't there's no point in running the worker because it will just reschedule immediately. However, currently findRunnable just checks work.full and work.partial, whereas getfull can also draw work from m.currentwbuf. As a result, findRunnable may not start a worker even though there actually is work. This problem manifests itself in occasional failures of the test/init1.go test. This test is unusual because it performs a large amount of allocation without executing any write barriers, which means there's nothing to force the pointers in currentwbuf out to the work.partial/full lists where findRunnable can see them. This change fixes this problem by making findRunnable also check for a currentwbuf. This aligns findRunnable with trygetfull's notion of whether or not there's work. Change-Id: Ic76d22b7b5d040bc4f58a6b5975e9217650e66c4 Reviewed-on: https://go-review.googlesource.com/9299 Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-24 20:10:10 +00:00
Austin Clements	26eac917dc	runtime: start dedicated mark workers even if there's no work Currently, findRunnable only considers running a mark worker if there's work in the work queue. In principle, this can delay the start of the desired number of dedicated mark workers if there's no work pending. This is unlikely to occur in practice, since there should be work queued from the scan phase, but if it were to come up, a CPU hog mutator could slow down or delay garbage collection. This check makes sense for fractional mark workers, since they'll just return to the scheduler immediately if there's no work, but we want the scheduler to start all of the dedicated mark workers promptly, even if there's currently no queued work. Hence, this change moves the pending work check after the check for starting a dedicated worker. Change-Id: I52b851cc9e41f508a0955b3f905ca80f109ea101 Reviewed-on: https://go-review.googlesource.com/9298 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-24 20:10:05 +00:00
Austin Clements	711a164267	runtime: fix some out-of-date comments bgMarkCount no longer exists. Change-Id: I3aa406fdccfca659814da311229afbae55af8304 Reviewed-on: https://go-review.googlesource.com/9297 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-24 20:10:01 +00:00
Srdjan Petrovic	6ad33be2d9	runtime: implement xadduintptr and update system mstats using it The motivation is that sysAlloc/Free() currently aren't safe to be called without a valid G, because arm's xadd64() uses locks that require a valid G. The solution here was proposed by Dmitry Vyukov: use xadduintptr() instead of xadd64(), until arm can support xadd64 on all of its architectures (not a trivial task for arm). Change-Id: I250252079357ea2e4360e1235958b1c22051498f Reviewed-on: https://go-review.googlesource.com/9002 Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-04-24 16:53:26 +00:00
Austin Clements	0e6a6c510f	runtime: simplify process for starting GC goroutine Currently, when allocation reaches the GC trigger, the runtime uses readyExecute to start the GC goroutine immediately rather than wait for the scheduler to get around to the GC goroutine while the mutator continues to grow the heap. Now that the scheduler runs the most recently readied goroutine when a goroutine yields its time slice, this rigmarole is no longer necessary. The runtime can simply ready the GC goroutine and yield from the readying goroutine. Change-Id: I3b4ebadd2a72a923b1389f7598f82973dd5c8710 Reviewed-on: https://go-review.googlesource.com/9292 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-04-24 15:13:05 +00:00
Austin Clements	ce502b063c	runtime: use park/ready to wake up GC at end of concurrent mark Currently, the main GC goroutine sleeps on a note during concurrent mark and the first background mark worker or assist to finish marking use wakes up that note to let the main goroutine proceed into mark termination. Unfortunately, the latency of this wakeup can be quite high, since the GC goroutine will typically have lost its P while in the futex sleep, meaning it will be placed on the global run queue and will wait there until some P is kind enough to pick it up. This delay gives the mutator more time to allocate and create floating garbage, growing the heap unnecessarily. Worse, it's likely that background marking has stopped at this point (unless GOMAXPROCS>4), so anything that's allocated and published to the heap during this window will have to be scanned during mark termination while the world is stopped. This change replaces the note sleep/wakeup with a gopark/ready scheme. This keeps the wakeup inside the Go scheduler and lets the garbage collector take advantage of the new scheduler semantics that run the ready()d goroutine immediately when the ready()ing goroutine sleeps. For the json benchmark from x/benchmarks with GOMAXPROCS=4, this reduces the delay in waking up the GC goroutine and entering mark termination once concurrent marking is done from ~100ms to typically <100µs. Change-Id: Ib11f8b581b8914f2d68e0094f121e49bac3bb384 Reviewed-on: https://go-review.googlesource.com/9291 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-24 15:13:01 +00:00
Austin Clements	4e32718d3e	runtime: use timer for GC control revise rather than timeout Currently, we use a note sleep with a timeout in a loop in func gc to periodically revise the GC control variables. Replace this with a fully blocking note sleep and use a periodic timer to trigger the revise instead. This is a step toward replacing the note sleep in func gc. Change-Id: I2d562f6b9b2e5f0c28e9a54227e2c0f8a2603f63 Reviewed-on: https://go-review.googlesource.com/9290 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-24 15:12:56 +00:00

... 2 3 4 5 6 ...

1355 Commits