mirror of https://github.com/golang/go synced 2024-11-20 07:34:40 -07:00
Commit Graph

1011 Commits

Author SHA1 Message Date
Austin Clements
26eac917dc runtime: start dedicated mark workers even if there's no work
Currently, findRunnable only considers running a mark worker if
there's work in the work queue. In principle, this can delay the start
of the desired number of dedicated mark workers if there's no work
pending. This is unlikely to occur in practice, since there should be
work queued from the scan phase, but if it were to come up, a CPU hog
mutator could slow down or delay garbage collection.

This check makes sense for fractional mark workers, since they'll just
return to the scheduler immediately if there's no work, but we want
the scheduler to start all of the dedicated mark workers promptly,
even if there's currently no queued work. Hence, this change moves the
pending work check after the check for starting a dedicated worker.

Change-Id: I52b851cc9e41f508a0955b3f905ca80f109ea101
Reviewed-on: https://go-review.googlesource.com/9298
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-24 20:10:05 +00:00
Austin Clements
711a164267 runtime: fix some out-of-date comments
bgMarkCount no longer exists.

Change-Id: I3aa406fdccfca659814da311229afbae55af8304
Reviewed-on: https://go-review.googlesource.com/9297
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-24 20:10:01 +00:00
Srdjan Petrovic
6ad33be2d9 runtime: implement xadduintptr and update system mstats using it
The motivation is that sysAlloc/Free() currently aren't safe to be
called without a valid G, because arm's xadd64() uses locks that require
a valid G.

The solution here was proposed by Dmitry Vyukov: use xadduintptr()
instead of xadd64(), until arm can support xadd64 on all of its
architectures (not a trivial task for arm).
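
The runtime's xadduintptr is internal, but the closest analogue in ordinary Go is sync/atomic.AddUintptr; a small demonstration of updating a stats-style counter without 64-bit atomics (the counter name is made up):

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
    )

    func main() {
        var sysBytes uintptr // e.g. a memory-stats counter
        var wg sync.WaitGroup
        for i := 0; i < 8; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                atomic.AddUintptr(&sysBytes, 4096) // each "allocation" adds a page
            }()
        }
        wg.Wait()
        fmt.Println(sysBytes) // 32768
    }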

Change-Id: I250252079357ea2e4360e1235958b1c22051498f
Reviewed-on: https://go-review.googlesource.com/9002
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-24 16:53:26 +00:00
Austin Clements
0e6a6c510f runtime: simplify process for starting GC goroutine
Currently, when allocation reaches the GC trigger, the runtime uses
readyExecute to start the GC goroutine immediately rather than wait
for the scheduler to get around to the GC goroutine while the mutator
continues to grow the heap.

Now that the scheduler runs the most recently readied goroutine when a
goroutine yields its time slice, this rigmarole is no longer
necessary. The runtime can simply ready the GC goroutine and yield
from the readying goroutine.

Change-Id: I3b4ebadd2a72a923b1389f7598f82973dd5c8710
Reviewed-on: https://go-review.googlesource.com/9292
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-24 15:13:05 +00:00
Austin Clements
ce502b063c runtime: use park/ready to wake up GC at end of concurrent mark
Currently, the main GC goroutine sleeps on a note during concurrent
mark and the first background mark worker or assist to finish marking
wakes up that note to let the main goroutine proceed into mark
termination. Unfortunately, the latency of this wakeup can be quite
high, since the GC goroutine will typically have lost its P while in
the futex sleep, meaning it will be placed on the global run queue and
will wait there until some P is kind enough to pick it up. This delay
gives the mutator more time to allocate and create floating garbage,
growing the heap unnecessarily. Worse, it's likely that background
marking has stopped at this point (unless GOMAXPROCS>4), so anything
that's allocated and published to the heap during this window will
have to be scanned during mark termination while the world is stopped.

This change replaces the note sleep/wakeup with a gopark/ready
scheme. This keeps the wakeup inside the Go scheduler and lets the
garbage collector take advantage of the new scheduler semantics that
run the ready()d goroutine immediately when the ready()ing goroutine
sleeps.

For the json benchmark from x/benchmarks with GOMAXPROCS=4, this
reduces the delay in waking up the GC goroutine and entering mark
termination once concurrent marking is done from ~100ms to typically
<100µs.

Change-Id: Ib11f8b581b8914f2d68e0094f121e49bac3bb384
Reviewed-on: https://go-review.googlesource.com/9291
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:13:01 +00:00
Austin Clements
4e32718d3e runtime: use timer for GC control revise rather than timeout
Currently, we use a note sleep with a timeout in a loop in func gc to
periodically revise the GC control variables. Replace this with a
fully blocking note sleep and use a periodic timer to trigger the
revise instead. This is a step toward replacing the note sleep in func
gc.

Change-Id: I2d562f6b9b2e5f0c28e9a54227e2c0f8a2603f63
Reviewed-on: https://go-review.googlesource.com/9290
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:56 +00:00
Austin Clements
e870f06c3f runtime: yield time slice to most recently readied G
Currently, when the runtime ready()s a G, it adds it to the end of the
current P's run queue and continues running. If there are many other
things in the run queue, this can result in a significant delay before
the ready()d G actually runs and can hurt fairness when other Gs in
the run queue are CPU hogs. For example, if there are three Gs sharing
a P, one of which is a CPU hog that never voluntarily gives up the P
and the other two of which are doing small amounts of work and
communicating back and forth on an unbuffered channel, the two
communicating Gs will get very little CPU time.

Change this so that when G1 ready()s G2 and then blocks, the scheduler
immediately hands off the remainder of G1's time slice to G2. In the
above example, the two communicating Gs will now act as a unit and
together get half of the CPU time, while the CPU hog gets the other
half of the CPU time.

This fixes the problem demonstrated by the ping-pong benchmark added
in the previous commit:

benchmark                old ns/op     new ns/op     delta
BenchmarkPingPongHog     684287        825           -99.88%

On the x/benchmarks suite, this change improves the performance of
garbage by ~6% (for GOMAXPROCS=1 and 4), and json by 28% and 36% for
GOMAXPROCS=1 and 4. It has negligible effect on heap size.

This has no effect on the go1 benchmark suite since those benchmarks
are mostly single-threaded.
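
A toy model of the new behavior (an illustration only, not the runtime's run queue): the most recently readied G occupies a one-slot "next" position that the scheduler prefers over the FIFO tail.

    package main

    import "fmt"

    type runq struct {
        next string   // most recently readied G, runs first
        q    []string // FIFO tail
    }

    func (r *runq) ready(g string) {
        if r.next != "" {
            r.q = append(r.q, r.next) // kick the previous occupant to the tail
        }
        r.next = g
    }

    func (r *runq) schedule() string {
        if r.next != "" {
            g := r.next
            r.next = ""
            return g
        }
        if len(r.q) == 0 {
            return ""
        }
        g := r.q[0]
        r.q = r.q[1:]
        return g
    }

    func main() {
        var r runq
        r.ready("cpuHog")
        r.ready("pong") // readied last, so it runs before the waiting hog
        fmt.Println(r.schedule(), r.schedule()) // pong cpuHog
    }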

Change-Id: I858a08eaa78f702ea98a5fac99d28a4ac91d339f
Reviewed-on: https://go-review.googlesource.com/9289
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:52 +00:00
Austin Clements
da0e37fa8d runtime: benchmark for ping-pong in the presence of a CPU hog
This benchmark demonstrates a current problem with the scheduler where
a set of frequently communicating goroutines get very little CPU time
in the presence of another goroutine that hogs that CPU, even if one
of those communicating goroutines is always runnable.

Currently it takes about 0.5 milliseconds to switch between
ping-ponging goroutines in the presence of a CPU hog:

BenchmarkPingPongHog	    2000	    684287 ns/op
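
A rough reconstruction of what such a benchmark might look like (run with GOMAXPROCS=1 to see the effect; the actual test in the runtime package may differ):

    package sched_test

    import "testing"

    func BenchmarkPingPongHog(b *testing.B) {
        stop := make(chan struct{})
        go func() { // CPU hog: spins until told to stop
            for {
                select {
                case <-stop:
                    return
                default:
                }
            }
        }()

        ping, pong := make(chan struct{}), make(chan struct{})
        go func() { // ponger: echoes every ping
            for range ping {
                pong <- struct{}{}
            }
        }()

        b.ResetTimer()
        for i := 0; i < b.N; i++ { // the benchmark goroutine is the pinger
            ping <- struct{}{}
            <-pong
        }
        b.StopTimer()
        close(ping)
        close(stop)
    }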

Change-Id: I278848c84f778de32344921ae8a4a8056e4898b0
Reviewed-on: https://go-review.googlesource.com/9288
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:47 +00:00
Austin Clements
e5e52f4f2c runtime: factor checking if P run queue is empty
There are a variety of places where we check if a P's run queue is
empty. This test is about to get slightly more complicated, so factor
it out into a new function, runqempty. This function is inlinable, so
this has no effect on performance.
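
A stand-alone sketch of the shape such a helper takes (field names mirror the per-P run queue but this is not the runtime's code; the extra runnext slot, added by a later scheduler change, is likely the complication referred to above):

    package main

    import "fmt"

    type g struct{}

    // Stand-in for a few of the per-P run queue fields; the real P has more.
    type p struct {
        runqhead uint32
        runqtail uint32
        runnext  *g
    }

    // runqempty reports whether pp has no goroutines waiting to run.
    func runqempty(pp *p) bool {
        return pp.runqhead == pp.runqtail && pp.runnext == nil
    }

    func main() {
        pp := &p{}
        fmt.Println(runqempty(pp)) // true
        pp.runqtail++              // pretend one G was queued
        fmt.Println(runqempty(pp)) // false
    }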

Change-Id: If4a0b01ffbd004937de90d8d686f6ded4aad2c6b
Reviewed-on: https://go-review.googlesource.com/9287
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:42 +00:00
Srdjan Petrovic
5c8fbc6f1e runtime: signal forwarding
Forward signals to signal handlers installed before Go installs its own,
under certain circumstances.  In particular, as iant@ suggests, signals are
forwarded iff:
   (1) a non-SIG_DFL signal handler existed before Go, and
   (2) signal is synchronous (i.e., one of SIGSEGV, SIGBUS, SIGFPE), and
   	(3a) signal occurred on a non-Go thread, or
   	(3b) signal occurred on a Go thread but in CGo code.

Supported only on Linux, for now.
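
A schematic of the forwarding predicate above (illustrative code, not the runtime's signal handling; the signal numbers are the usual Linux values):

    package main

    import "fmt"

    type sigctx struct {
        sig        int
        hadHandler bool // a non-SIG_DFL handler was installed before Go
        onGoThread bool
        inCgoCall  bool
    }

    func isSynchronous(sig int) bool {
        const (
            SIGBUS  = 0x7
            SIGFPE  = 0x8
            SIGSEGV = 0xb
        )
        return sig == SIGSEGV || sig == SIGBUS || sig == SIGFPE
    }

    // shouldForward mirrors conditions (1), (2), and (3a)/(3b).
    func shouldForward(c sigctx) bool {
        return c.hadHandler && isSynchronous(c.sig) && (!c.onGoThread || c.inCgoCall)
    }

    func main() {
        fmt.Println(shouldForward(sigctx{sig: 0xb, hadHandler: true}))                             // true: (3a)
        fmt.Println(shouldForward(sigctx{sig: 0xb, hadHandler: true, onGoThread: true, inCgoCall: true})) // true: (3b)
        fmt.Println(shouldForward(sigctx{sig: 0xb, hadHandler: false}))                            // false: (1) fails
    }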

Change-Id: I403219ee47b26cf65da819fb86cf1ec04d3e25f5
Reviewed-on: https://go-review.googlesource.com/8712
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-24 05:19:39 +00:00
Srdjan Petrovic
1f65c9c141 runtime: deflake TestNewOSProc0, fix _rt0_amd64_linux_lib stack alignment
This addresses iant's comments from CL 9164.

Change-Id: I7b5b282f61b11aab587402c2d302697e76666376
Reviewed-on: https://go-review.googlesource.com/9222
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-23 23:09:03 +00:00
Austin Clements
ed09e0e2bf runtime: fix underflow in next_gc calculation
Currently, it's possible for the next_gc calculation to underflow.
Since next_gc is unsigned, this wraps around and effectively disables
GC for the rest of the program's execution. Besides being obviously
wrong, this is causing test failures on 32-bit because some tests are
running out of heap.

This underflow happens for two reasons, both having to do with how we
estimate the reachable heap size at the end of the GC cycle.

One reason is that this calculation depends on the value of heap_live
at the beginning of the GC cycle, but we currently only record that
value during a concurrent GC and not during a forced STW GC. Fix this
by moving the recorded value from gcController to work and recording
it on a common code path.

The other reason is that we use the amount of allocation during the GC
cycle as an approximation of the amount of floating garbage and
subtract it from the marked heap to estimate the reachable heap.
However, since this is only an approximation, it's possible for the
amount of allocation during the cycle to be *larger* than the marked
heap size (since the runtime allocates white and it's possible for
these allocations to never be made reachable from the heap). Currently
this causes wrap-around in our estimate of the reachable heap size,
which in turn causes wrap-around in next_gc. Fix this by bottoming out
the reachable heap estimate at 0, in which case we just fall back to
triggering GC at heapminimum (which is okay since this only happens on
small heaps).
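
A worked example of the underflow and the clamp (the heapminimum value shown is illustrative):

    package main

    import "fmt"

    func nextGC(marked, allocatedDuringCycle, gogc uint64) uint64 {
        var reachable uint64
        if allocatedDuringCycle < marked {
            reachable = marked - allocatedDuringCycle
        } // else: clamp to 0 instead of wrapping around

        goal := reachable + reachable*gogc/100
        const heapminimum = 4 << 20 // illustrative floor
        if goal < heapminimum {
            goal = heapminimum
        }
        return goal
    }

    func main() {
        // Marked heap 8MB but 10MB allocated during the cycle: without the clamp
        // 8MB-10MB wraps to ~2^64 and effectively disables GC.
        fmt.Println(nextGC(8<<20, 10<<20, 100)) // 4194304 (heapminimum)
        fmt.Println(nextGC(32<<20, 1<<20, 100)) // 65011712 (~62MB)
    }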

Fixes #10555, fixes #10556, and fixes #10559.

Change-Id: Iad07b529c03772356fede2ae557732f13ebfdb63
Reviewed-on: https://go-review.googlesource.com/9286
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-23 20:52:54 +00:00
Rick Hudson
77f56af0bc runtime: Improve scanning performance
To achieve a 2% improvement in the garbage benchmark this CL removes
an unneeded assert and avoids one hbits.next() call per object
being scanned.

Change-Id: Ibd542d01e9c23eace42228886f9edc488354df0d
Reviewed-on: https://go-review.googlesource.com/9244
Reviewed-by: Austin Clements <austin@google.com>
2015-04-23 20:27:46 +00:00
Hyang-Ah Hana Kim
aef54d40ac runtime: disable TestNewOSProc0 on android/arm.
newosproc0 does not work on android/arm.
See issue #10548.

Change-Id: Ieaf6f5d0b77cddf5bf0b6c89fd12b1c1b8723f9b
Reviewed-on: https://go-review.googlesource.com/9293
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-23 19:08:33 +00:00
Shenghou Ma
edc53e1f14 runtime: fix build after CL 9164 on Linux
There is an assumption that the function executed in the child thread
created by runtime.clone should not return. Different systems enforce
that differently: some exit that thread, some exit the whole process.

The test TestNewOSProc0 introduced in CL 9161 breaks that assumption,
so we need to adjust the code to only exit the thread should the
called function return.

Change-Id: Id631cb2f02ec6fbd765508377a79f3f96c6a2ed6
Reviewed-on: https://go-review.googlesource.com/9246
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-04-22 23:21:25 +00:00
Austin Clements
4655aadd00 runtime: use reachable heap estimate to set trigger/goal
Currently, we set the heap goal for the next GC cycle using the size
of the marked heap at the end of the current cycle. This can lead to a
bad feedback loop if the mutator is rapidly allocating and releasing
pointers that can significantly bloat heap size.

If the GC were STW, the marked heap size would be exactly the
reachable heap size (call it stwLive). However, in concurrent GC,
marked=stwLive+floatLive, where floatLive is the amount of "floating
garbage": objects that were reachable at some point during the cycle
and were marked, but which are no longer reachable by the end of the
cycle. If the GC cycle is short, then the mutator doesn't have much
time to create floating garbage, so marked≈stwLive. However, if the GC
cycle is long and the mutator is allocating and creating floating
garbage very rapidly, then it's possible that marked≫stwLive. Since
the runtime currently sets the heap goal based on marked, this will
cause it to set a high heap goal. This means that 1) the next GC cycle
will take longer because of the larger heap and 2) the assist ratio
will be low because of the large distance between the trigger and the
goal. The combination of these lets the mutator produce even more
floating garbage in the next cycle, which further exacerbates the
problem.

For example, on the garbage benchmark with GOMAXPROCS=1, this causes
the heap to grow to ~500MB and the garbage collector to retain upwards
of ~300MB of heap, while the true reachable heap size is ~32MB. This,
in turn, causes the GC cycle to take upwards of ~3 seconds.

Fix this bad feedback loop by estimating the true reachable heap size
(stwLive) and using this rather than the marked heap size
(stwLive+floatLive) as the basis for the GC trigger and heap goal.
This breaks the bad feedback loop and causes the mutator to assist
more, which decreases the rate at which it can create floating
garbage. On the same garbage benchmark, this reduces the maximum heap
size to ~73MB, the retained heap to ~40MB, and the duration of the GC
cycle to ~200ms.

Change-Id: I7712244c94240743b266f9eb720c03802799cdd1
Reviewed-on: https://go-review.googlesource.com/9177
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:28:42 +00:00
Austin Clements
1ccc577b8a runtime: include heap goal in gctrace line
This may or may not be useful to the end user, but it's incredibly
useful for us to understand the behavior of the pacer. Currently this
is fairly easy (though not trivial) to derive from the other heap
stats we print, but we're about to change how we compute the goal,
which will make it much harder to derive.

Change-Id: I796ef233d470c01f606bd9929820c01ece1f585a
Reviewed-on: https://go-review.googlesource.com/9176
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:07:44 +00:00
Austin Clements
1f39beb01a runtime: avoid divide-by-zero in GC trigger controller
The trigger controller computes GC CPU utilization by dividing by the
wall-clock time that's passed since concurrent mark began. Since this
delta is nanoseconds it's borderline impossible for it to be zero, but
if it is zero we'll currently divide by zero. Be robust to this
possibility by ignoring the utilization in the error term if no time
has elapsed.
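
A minimal sketch of the guard:

    package main

    import "fmt"

    // utilizationError returns how far GC CPU utilization is from its goal,
    // ignoring the term entirely when no wall-clock time has elapsed.
    func utilizationError(gcCPU, elapsedNS int64, goal float64) float64 {
        if elapsedNS <= 0 {
            return 0 // no elapsed time: skip rather than divide by zero
        }
        return float64(gcCPU)/float64(elapsedNS) - goal
    }

    func main() {
        fmt.Println(utilizationError(5e6, 20e6, 0.25)) // 0: exactly on target
        fmt.Println(utilizationError(5e6, 0, 0.25))    // 0: no elapsed time, term ignored
    }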

Change-Id: I93dfc9e84735682af3e637f6538d1e7602634f09
Reviewed-on: https://go-review.googlesource.com/9175
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-22 19:07:36 +00:00
Srdjan Petrovic
ca9128f18f runtime: merge clone0 and clone
We initially added clone0 to handle the case when G or M don't exist, but
it turns out that we could have just modified clone.  (It also helps that
the function we're invoking in clone0 no longer needs arguments.)

As a side-effect, newosproc0 is now supported on all linux archs.

Change-Id: Ie603af75d8f164310fc16446052d83743961f3ca
Reviewed-on: https://go-review.googlesource.com/9164
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-22 16:28:57 +00:00
Shenghou Ma
87054c4704 runtime: fix more vet reported issues
Change-Id: Ie8dfdb592ee0bfc736d08c92c3d8413a37b6ac03
Reviewed-on: https://go-review.googlesource.com/9241
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-22 02:50:48 +00:00
Keith Randall
3a56aa0d3e runtime: check error codes for arm64 system calls
Unlike linux arm32, linux arm64 does not set the condition codes to indicate
whether a system call failed or not.  We must check if the return value
is in the error code range (the same as amd64 does).

Fixes the runtime.TestBadOpen test.

Change-Id: I97a8b0a17b5f002a3215c535efa91d199cee3309
Reviewed-on: https://go-review.googlesource.com/9220
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-22 02:30:22 +00:00
Josh Bleecher Snyder
a76099f0d9 runtime: fix arm64 asm vet issues
Several naming changes and a real issue in asmcgocall_errno.

Change-Id: Ieb0a328a168819fe233d74e0397358384d7e71b3
Reviewed-on: https://go-review.googlesource.com/9212
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-22 02:30:11 +00:00
Austin Clements
170fb10089 runtime: assist harder if GC exceeds the estimated marked heap
Currently, the GC controller computes the mutator assist ratio at the
beginning of the cycle by estimating that the marked heap size this
cycle will be the same as it was the previous cycle. It then uses that
assist ratio for the rest of the cycle. However, this means that if
the mutator is quickly growing its reachable heap, the heap size is
likely to exceed the heap goal and currently there's no additional
pressure on mutator assists when this happens. For example, 6g (with
GOMAXPROCS=1) frequently exceeds the goal heap size by ~25% because of
this.

This change makes GC revise its work estimate and the resulting assist
ratio every 10ms during the concurrent mark. Instead of
unconditionally using the marked heap size from the last cycle as an
estimate for this cycle, it takes the minimum of the previously marked
heap and the currently marked heap. As a result, as the cycle
approaches or exceeds its heap goal, this will increase the assist
ratio to put more pressure on the mutator assist to bring the cycle to
an end. For 6g, this causes the GC to always finish within 5% and
often within 1% of its heap goal.
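
The effect on assists can be seen with a bit of illustrative arithmetic: as the cycle nears (or passes) its goal, the same remaining scan work implies a much larger assist ratio. Names and numbers below are made up:

    package main

    import "fmt"

    // assistRatio is scan work still owed per byte of remaining headroom.
    func assistRatio(scanWorkRemaining, heapGoal, heapLive int64) float64 {
        headroom := heapGoal - heapLive
        if headroom < 1 {
            headroom = 1 // at or past the goal: maximum pressure
        }
        return float64(scanWorkRemaining) / float64(headroom)
    }

    func main() {
        // Early in the cycle: plenty of headroom, gentle assists.
        fmt.Printf("%.2f\n", assistRatio(8<<20, 64<<20, 40<<20)) // 0.33
        // Near the goal: the same remaining work forces much harder assists.
        fmt.Printf("%.2f\n", assistRatio(8<<20, 64<<20, 62<<20)) // 4.00
    }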

Change-Id: I4333b92ad0878c704964be42c655c38a862b4224
Reviewed-on: https://go-review.googlesource.com/9070
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-21 15:35:55 +00:00
Austin Clements
e0c3d85f08 runtime: fix background marking at 25% utilization
Currently, in accordance with the GC pacing proposal, we schedule
background marking with a goal of achieving 25% utilization *total*
between mutator assists and background marking. This is stricter than
was set out in the Go 1.5 proposal, which suggests that the garbage
collector can use 25% just for itself and anything the mutator does to
help out is on top of that. It also has several technical
drawbacks. Because mutator assist time is constantly changing and we
can't have instantaneous information on background marking time, it
effectively requires hitting a moving target based on out-of-date
information. This works out in the long run, but works poorly for
short GC cycles and on short time scales. Also, this requires
time-multiplexing all Ps between the mutator and background GC since
the goal utilization of background GC constantly fluctuates. This
results in a complicated scheduling algorithm, poor affinity, and
extra overheads from context switching.

This change modifies the way we schedule and run background marking so
that background marking always consumes 25% of GOMAXPROCS and mutator
assist is in addition to this. This enables a much more robust
scheduling algorithm where we pre-determine the number of Ps we should
dedicate to background marking as well as the utilization goal for a
single floating "remainder" mark worker.
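
The split can be illustrated with a few lines of arithmetic (names are made up; the real scheduler also has to decide when the fractional worker runs):

    package main

    import "fmt"

    // markWorkerPlan: whole Ps become dedicated mark workers and any remainder
    // becomes the utilization goal of the single fractional worker.
    func markWorkerPlan(gomaxprocs int) (dedicated int, fractionalUtil float64) {
        const goal = 0.25
        target := goal * float64(gomaxprocs)
        dedicated = int(target)
        fractionalUtil = target - float64(dedicated)
        return
    }

    func main() {
        for _, n := range []int{1, 2, 4, 8} {
            d, f := markWorkerPlan(n)
            fmt.Printf("GOMAXPROCS=%d: %d dedicated, fractional goal %.2f\n", n, d, f)
        }
    }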

Change-Id: I187fa4c03ab6fe78012a84d95975167299eb9168
Reviewed-on: https://go-review.googlesource.com/9013
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:50 +00:00
Austin Clements
24a7252e25 runtime: finish sweeping before concurrent GC starts
Currently, the concurrent sweep follows a 1:1 rule: when allocation
needs a span, it sweeps a span (likewise, when a large allocation
needs N pages, it sweeps until it frees N pages). This rule worked
well for the STW collector (especially when GOGC==100) because it did
no more sweeping than necessary to keep the heap from growing, would
generally finish sweeping just before GC, and ensured good temporal
locality between sweeping a page and allocating from it.

It doesn't work well with concurrent GC. Since concurrent GC requires
starting GC earlier (sometimes much earlier), the sweep often won't be
done when GC starts. Unfortunately, the first thing GC has to do is
finish the sweep. In the mean time, the mutator can continue
allocating, pushing the heap size even closer to the goal size. This
worked okay with the 7/8ths trigger, but it gets into a vicious cycle
with the GC trigger controller: if the mutator is allocating quickly
and driving the trigger lower, more and more sweep work will be left
to GC; this both causes GC to take longer (allowing the mutator to
allocate more during GC) and delays the start of the concurrent mark
phase, which throws off the GC controller's statistics and generally
causes it to push the trigger even lower.

As an example of a particularly bad case, the garbage benchmark with
GOMAXPROCS=4 and -benchmem 512 (MB) spends the first 0.4-0.8 seconds
of each GC cycle sweeping, during which the heap grows by between
109MB and 252MB.

To fix this, this change replaces the 1:1 sweep rule with a
proportional sweep rule. At the end of GC, GC knows exactly how much
heap allocation will occur before the next concurrent GC as well as
how many span pages must be swept. This change computes this "sweep
ratio" and when the mallocgc asks for a span, the mcentral sweeps
enough spans to bring the swept span count into ratio with the
allocated byte count.

On the benchmark from above, this entirely eliminates sweeping at the
beginning of GC, which reduces the time between startGC readying the
GC goroutine and GC stopping the world for sweep termination to ~100µs
during which the heap grows at most 134KB.
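
A worked example of the proportional rule with made-up figures:

    package main

    import "fmt"

    func main() {
        const (
            pagesToSweep = 12000.0  // span pages left unswept at the end of GC
            heapLive     = 60 << 20 // bytes live now
            nextTrigger  = 90 << 20 // heap size at which the next cycle starts
        )
        sweepPagesPerByte := pagesToSweep / float64(nextTrigger-heapLive)

        // A mutator that has allocated 1MB since it last swept must sweep:
        allocated := 1 << 20
        fmt.Printf("%.4f pages/byte, sweep %.0f pages after 1MB of allocation\n",
            sweepPagesPerByte, sweepPagesPerByte*float64(allocated)) // 400 pages
    }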

Change-Id: I35422d6bba0c2310d48bb1f8f30a72d29e98c1af
Reviewed-on: https://go-review.googlesource.com/8921
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:46 +00:00
Austin Clements
91c80ce6c7 runtime: make mcache.local_cachealloc a uintptr
This field used to decrease with sweeps (and potentially go
negative). Now it is always zero or positive, so change it to a
uintptr so it meshes better with other memory stats.

Change-Id: I6a50a956ddc6077eeaf92011c51743cb69540a3c
Reviewed-on: https://go-review.googlesource.com/8899
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:41 +00:00
Austin Clements
a0452a6821 runtime: proportional response GC trigger controller
Currently, concurrent GC triggers at a fixed 7/8*GOGC heap growth. For
mutators that allocate slowly, this means GC will trigger too early
and run too often, wasting CPU time on GC. For mutators that allocate
quickly, this means GC will trigger too late, causing the program to
exceed the GOGC heap growth goal and/or to exceed CPU goals because of
a high mutator assist ratio.

This change adds a feedback control loop to dynamically adjust the GC
trigger from cycle to cycle. By monitoring the heap growth and GC CPU
utilization from cycle to cycle, this adjusts the Go garbage collector
to target the GOGC heap growth goal and the 25% CPU utilization goal.
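
Purely as an illustration of a proportional feedback step (the real formula and constants differ), something of this shape:

    package main

    import "fmt"

    // nextTriggerRatio nudges the trigger based on how far the last cycle
    // missed the heap-growth and CPU goals. Gain and clamps are invented.
    func nextTriggerRatio(trigger, actualGrowth, goalGrowth, gcCPUFrac float64) float64 {
        const gain = 0.5
        err := goalGrowth - actualGrowth - (gcCPUFrac - 0.25)
        trigger += gain * err
        if trigger < 0.05 {
            trigger = 0.05
        }
        if trigger > goalGrowth {
            trigger = goalGrowth
        }
        return trigger
    }

    func main() {
        // GOGC=100 (goal growth 1.0): the cycle overshot to 1.2x growth while
        // using 30% CPU, so the next trigger comes down and GC starts earlier.
        fmt.Printf("%.3f\n", nextTriggerRatio(0.875, 1.2, 1.0, 0.30)) // 0.750
    }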

Change-Id: Ic82eef288c1fa122f73b69fe604d32cbb219e293
Reviewed-on: https://go-review.googlesource.com/8851
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:37 +00:00
Austin Clements
8d03acce54 runtime: multi-threaded, utilization-scheduled background mark
Currently, the concurrent mark phase is performed by the main GC
goroutine. Prior to the previous commit enabling preemption, this
caused marking to always consume 1/GOMAXPROCS of the available CPU
time. If GOMAXPROCS=1, this meant background GC would consume 100% of
the CPU (effectively a STW). If GOMAXPROCS>4, background GC would use
less than the goal of 25%. If GOMAXPROCS=4, background GC would use
the goal 25%, but if the mutator wasn't using the remaining 75%,
background marking wouldn't take advantage of the idle time. Enabling
preemption in the previous commit made GC miss CPU targets in
completely different ways, but set us up to bring everything back in
line.

This change replaces the fixed GC goroutine with per-P background mark
goroutines. Once started, these goroutines don't go in the standard
run queues; instead, they are scheduled specially such that the time
spent in mutator assists and the background mark goroutines totals 25%
of the CPU time available to the program. Furthermore, this lets
background marking take advantage of idle Ps, which significantly
boosts GC performance for applications that under-utilize the CPU.

This requires also changing how time is reported for gctrace, so this
change splits the concurrent mark CPU time into assist/background/idle
scanning.

This also requires increasing the size of the StackRecord slice used
in a GoroutineProfile test.

Change-Id: I0936ff907d2cee6cb687a208f2df47e8988e3157
Reviewed-on: https://go-review.googlesource.com/8850
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:32 +00:00
Austin Clements
af060c3086 runtime: generally allow preemption during concurrent GC phases
Currently, the entire GC process runs with g.m.preemptoff set. In the
concurrent phases, the parts that actually need preemption disabled
are run on a system stack and there's no overall need to stay on the
same M or P during the concurrent phases. Hence, move the setting of
g.m.preemptoff to when we start mark termination, at which point we
really do need preemption disabled.

This dramatically changes the scheduling behavior of the concurrent
mark phase. Currently, since this is non-preemptible, concurrent mark
gets one dedicated P (so 1/GOMAXPROCS utilization). With this change,
the GC goroutine is scheduled like any other goroutine during
concurrent mark, so it gets 1/<runnable goroutines> utilization.

You might think it's not even necessary to set g.m.preemptoff at that
point since the world is stopped, but stackalloc/stackfree use this as
a signal that the per-P pools are not safe to access without
synchronization.

Change-Id: I08aebe8179a7d304650fb8449ff36262b3771099
Reviewed-on: https://go-review.googlesource.com/8839
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:27 +00:00
Austin Clements
100da60979 runtime: track time spent in mutator assists
This time is tracked per P and periodically flushed to the global
controller state. This will be used to compute mutator assist
utilization in order to schedule background GC work.

Change-Id: Ib94f90903d426a02cf488bf0e2ef67a068eb3eec
Reviewed-on: https://go-review.googlesource.com/8837
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:22 +00:00
Austin Clements
4b2fde945a runtime: proportional mutator assist
Currently, mutator allocation periodically assists the garbage
collector by performing a small, fixed amount of scanning work.
However, to control heap growth, mutators need to perform scanning
work *proportional* to their allocation rate.

This change implements proportional mutator assists. This uses the
scan work estimate computed by the garbage collector at the beginning
of each cycle to compute how much scan work must be performed per
allocation byte to complete the estimated scan work by the time the
heap reaches the goal size. When allocation triggers an assist, it
uses this ratio and the amount allocated since the last assist to
compute the assist work, then attempts to steal as much of this work
as possible from the background collector's credit, and then performs
any remaining scan work itself.
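
A sketch of the per-assist bookkeeping described above, with made-up numbers and a simplified global credit variable:

    package main

    import "fmt"

    var bgScanCredit int64 = 3000 // scan work banked by background workers

    // assist computes the scan work owed for bytesAllocated, steals as much as
    // possible from the background credit, and returns what the mutator must
    // scan itself.
    func assist(assistRatio float64, bytesAllocated int64) (stolen, selfScan int64) {
        owed := int64(assistRatio * float64(bytesAllocated))
        stolen = bgScanCredit
        if stolen > owed {
            stolen = owed
        }
        bgScanCredit -= stolen
        selfScan = owed - stolen
        return
    }

    func main() {
        stolen, self := assist(0.5, 16<<10) // 16KB allocated, ratio 0.5
        fmt.Println(stolen, self)           // 3000 stolen, 5192 scanned by the mutator
    }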

Change-Id: I98b2078147a60d01d6228b99afd414ef857e4fba
Reviewed-on: https://go-review.googlesource.com/8836
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:18 +00:00
Austin Clements
028f972847 runtime: make gcDrainN in terms of scan work
Currently, the "n" in gcDrainN is in terms of objects to scan. This is
used by gchelpwork to perform a limited amount of work on allocation,
but is a pretty arbitrary way to bound this amount of work since the
number of objects has little relation to how long they take to scan.

Modify gcDrainN to perform a fixed amount of scan work instead. For
now, gchelpwork still performs a fairly arbitrary amount of scan work,
but at least this is much more closely related to how long the work
will take. Shortly, we'll use this to precisely control the scan work
performed by mutator assists during allocation to achieve the heap
size goal.

Change-Id: I3cd07fe0516304298a0af188d0ccdf621d4651cc
Reviewed-on: https://go-review.googlesource.com/8835
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:14 +00:00
Austin Clements
8e24283a28 runtime: track background scan work credit
This tracks scan work done by background GC in a global pool. Mutator
assists will draw on this credit to avoid doing work when background
GC is staying ahead.

Unlike the other GC controller tracking variables, this will be both
written and read throughout the cycle. Hence, we can't arbitrarily
delay updates like we can for scan work and bytes marked. However, we
still want to minimize contention, so this global credit pool is
allowed some error from the "true" amount of credit. Background GC
accumulates credit locally up to a limit and only then flushes to the
global pool. Similarly, mutator assists will draw from the credit pool
in batches.
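
A sketch of the batched flush using sync/atomic for the shared pool; the threshold and names are invented:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    var bgScanCredit int64 // global pool, drawn on by mutator assists

    const creditFlushThreshold = 2048

    type worker struct{ localCredit int64 }

    // didScanWork accumulates credit locally and only publishes it to the
    // global pool once it crosses the threshold, keeping contention low.
    func (w *worker) didScanWork(n int64) {
        w.localCredit += n
        if w.localCredit >= creditFlushThreshold {
            atomic.AddInt64(&bgScanCredit, w.localCredit)
            w.localCredit = 0
        }
    }

    func main() {
        var w worker
        for i := 0; i < 10; i++ {
            w.didScanWork(500) // 5000 units of scan work in total
        }
        fmt.Println(atomic.LoadInt64(&bgScanCredit), w.localCredit) // 5000 0
    }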

Change-Id: I1aa4fc604b63bf53d1ee2a967694dffdfc3e255e
Reviewed-on: https://go-review.googlesource.com/8834
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:09 +00:00
Austin Clements
4e9fc0df48 runtime: implement GC scan work estimator
This implements tracking the scan work ratio of a GC cycle and using
this to estimate the scan work that will be required by the next GC
cycle. Currently this estimate is unused; it will be used to drive
mutator assists.

Change-Id: I8685b59d89cf1d83eddfc9b30d84da4e3a7f4b72
Reviewed-on: https://go-review.googlesource.com/8833
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:04 +00:00
Austin Clements
571ebae6ef runtime: track scan work performed during concurrent mark
This tracks the amount of scan work in terms of scanned pointers
during the concurrent mark phase. We'll use this information to
estimate scan work for the next cycle.

Currently the work counter is aggregated in gcWork, and dispose
atomically adds it to a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.

Change-Id: Iac0364c466ee35fab781dbbbe7970a5f3c4e1fc1
Reviewed-on: https://go-review.googlesource.com/8832
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:00 +00:00
Austin Clements
fb9fd2bdd7 runtime: atomic ops for int64
These currently use portable implementations in terms of their uint64
counterparts.
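
Outside the runtime the same idea can be written with sync/atomic; a sketch:

    package main

    import (
        "fmt"
        "sync/atomic"
        "unsafe"
    )

    // xaddint64 atomically adds delta to *ptr and returns the new value,
    // built on the unsigned primitive (two's-complement wraparound makes
    // the reinterpretation safe). Sketch only, not the runtime's file.
    func xaddint64(ptr *int64, delta int64) int64 {
        return int64(atomic.AddUint64((*uint64)(unsafe.Pointer(ptr)), uint64(delta)))
    }

    func main() {
        var n int64 = 10
        fmt.Println(xaddint64(&n, -3)) // 7
    }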

Change-Id: Icba5f7134cfcf9d0429edabcdd73091d97e5e905
Reviewed-on: https://go-review.googlesource.com/8831
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:34:54 +00:00
Sebastien Binet
918fdae348 reflect: implement ArrayOf
This change exposes reflect.ArrayOf to create new reflect.Type array
types at runtime, when given a reflect.Type element.

- reflect: implement ArrayOf
- reflect: tests for ArrayOf
- runtime: document that typeAlg is used by reflect and must be kept in
  sync

Fixes #5996.
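
For reference, a small example of the new API in use:

    package main

    import (
        "fmt"
        "reflect"
    )

    func main() {
        t := reflect.ArrayOf(4, reflect.TypeOf(int32(0))) // the type [4]int32
        v := reflect.New(t).Elem()                        // an addressable zero value
        v.Index(2).SetInt(7)
        fmt.Println(t, v.Interface()) // [4]int32 [0 0 7 0]
    }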

Change-Id: I5d07213364ca915c25612deea390507c19461758
Reviewed-on: https://go-review.googlesource.com/4111
Reviewed-by: Keith Randall <khr@golang.org>
2015-04-21 15:21:09 +00:00
Matthew Dempsky
c0fa9e3f6f runtime/pprof: disable flaky TestTraceFutileWakeup on linux/ppc64le
Update #10512.

Change-Id: Ifdc59c3a5d8aba420b34ae4e37b3c2315dd7c783
Reviewed-on: https://go-review.googlesource.com/9162
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-21 10:01:53 +00:00
Rick Hudson
899a4ad47e runtime: Speed up heapBitsForObject
Optimized heapBitsForObject by special-casing objects whose size is a
power of two. When a span holding such objects is initialized, I added
a mask that, when &ed with an interior pointer, yields the base of the
object. For the garbage
benchmark this resulted in CPU_CLK_UNHALTED in
heapBitsForObject going from 7.7% down to 5.9%
of the total, INST_RETIRED went from 12.2 -> 8.7.
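
A worked example of the mask trick (assuming, for illustration, that objects are size-aligned within the span; the real code stores the mask per span):

    package main

    import "fmt"

    func main() {
        const elemSize = 64 // object size, a power of two
        const baseMask = ^uintptr(elemSize - 1)

        spanBase := uintptr(0x100000)          // size-aligned span start
        interior := spanBase + 3*elemSize + 17 // a pointer into the 4th object

        objBase := interior & baseMask // clear the low bits
        fmt.Printf("%#x\n", objBase)   // 0x1000c0 == spanBase + 3*elemSize
    }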

Here are the benchmarks that changed by at least plus or minus 1%.

benchmark                          old ns/op      new ns/op      delta
BenchmarkFmtFprintfString          249            221            -11.24%
BenchmarkFmtFprintfInt             247            223            -9.72%
BenchmarkFmtFprintfEmpty           76.5           69.6           -9.02%
BenchmarkBinaryTree17              4106631412     3744550160     -8.82%
BenchmarkFmtFprintfFloat           424            399            -5.90%
BenchmarkGoParse                   4484421        4242115        -5.40%
BenchmarkGobEncode                 8803668        8449107        -4.03%
BenchmarkFmtManyArgs               1494           1436           -3.88%
BenchmarkGobDecode                 10431051       10032606       -3.82%
BenchmarkFannkuch11                2591306713     2517400464     -2.85%
BenchmarkTimeParse                 361            371            +2.77%
BenchmarkJSONDecode                70620492       68830357       -2.53%
BenchmarkRegexpMatchMedium_1K      54693          53343          -2.47%
BenchmarkTemplate                  90008879       91929940       +2.13%
BenchmarkTimeFormat                380            387            +1.84%
BenchmarkRegexpMatchEasy1_32       111            113            +1.80%
BenchmarkJSONEncode                21359159       21007583       -1.65%
BenchmarkRegexpMatchEasy1_1K       603            613            +1.66%
BenchmarkRegexpMatchEasy0_32       127            129            +1.57%
BenchmarkFmtFprintfIntInt          399            393            -1.50%
BenchmarkRegexpMatchEasy0_1K       373            378            +1.34%

Change-Id: I78e297161026f8b5cc7507c965fd3e486f81ed29
Reviewed-on: https://go-review.googlesource.com/8980
Reviewed-by: Austin Clements <austin@google.com>
2015-04-20 21:39:06 +00:00
Russ Cox
181e26b9fa runtime: replace func-based write barrier skipping with type-based
This CL revises CL 7504 to use explicitly uintptr types for the
struct fields that are going to be updated sometimes without
write barriers. The result is that the fields are now updated *always*
without write barriers.

This approach has two important properties:

1) Now the GC never looks at the field, so if the missing reference
could cause a problem, it will do so all the time, not just when the
write barrier is missed at just the right moment.

2) Now a write barrier never happens for the field, avoiding the
(correct) detection of inconsistent write barriers when GODEBUG=wbshadow=1.

Change-Id: Iebd3962c727c0046495cc08914a8dc0808460e0e
Reviewed-on: https://go-review.googlesource.com/9019
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-20 20:20:09 +00:00
Ian Lance Taylor
357a013060 runtime: save registers in linux/{386,amd64} lib entry point
The callee-saved registers must be saved because for the c-shared case
this code is invoked from C code in the system library, and that code
expects the registers to be saved.  The tests were passing because in
the normal case the code calls a cgo function that naturally saves
callee-saved registers anyhow.  However, it fails when the code takes
the non-cgo path.

Change-Id: I9c1f5e884f5a72db9614478049b1863641c8b2b9
Reviewed-on: https://go-review.googlesource.com/9114
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-20 18:09:41 +00:00
Ian Lance Taylor
725aa3451a runtime: no deadlock error if buildmode=c-archive or c-shared
Change-Id: I4ee6dac32bd3759aabdfdc92b235282785fbcca9
Reviewed-on: https://go-review.googlesource.com/9083
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-20 17:31:44 +00:00
Ian Lance Taylor
9c1868d06d runtime: add -buildmode=c-archive/c-shared support for linux/386
Change-Id: I87147ca6bb53e3121cc4245449c519509f107638
Reviewed-on: https://go-review.googlesource.com/9009
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-17 19:31:37 +00:00
Russ Cox
8e5346571c runtime: leave gccheckmark testing off by default
It's not helping anymore, and it's fooling people who try to
understand performance (like me).

Change-Id: I133a644acae0ddf1bfa17c654cdc01e2089da963
Reviewed-on: https://go-review.googlesource.com/9018
Reviewed-by: Austin Clements <austin@google.com>
2015-04-17 19:29:04 +00:00
Austin Clements
c1c667542c runtime: fix dangling pointer in readyExecute
readyExecute passes a closure to mcall that captures an argument to
readyExecute. Since mcall is marked noescape, this closure lives on
the stack of the calling goroutine. However, the closure puts the
calling goroutine on the run queue (and switches to a new
goroutine). If the calling goroutine gets scheduled before the mcall
returns, this stack-allocated closure will become invalid while it's
still executing. One consequence of this we've observed is that the
captured gp variable can get overwritten before the call to
execute(gp), causing execute(gp) to segfault.

Fix this by passing the currently captured gp variable through a field
in the calling goroutine's g struct so that the func is no longer a
closure.

To prevent problems like this in the future, this change also removes
the go:noescape annotation from mcall. Due to a compiler bug, this
will currently cause a func closure passed to mcall to be implicitly
allocated rather than refusing the implicit allocation. However, this
is okay because there are no other closures passed to mcall right now
and the compiler bug will be fixed shortly.

Fixes #10428.

Change-Id: I49b48b85de5643323b89e9eaa4df63854e968c32
Reviewed-on: https://go-review.googlesource.com/8866
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-17 17:59:14 +00:00
Dave Cheney
7ae9d06880 runtime/pprof: disable TestTraceStressStartStop
Updates #10476

Change-Id: Ic4414f669104905c6004835be5cf0fa873553ea6
Reviewed-on: https://go-review.googlesource.com/8962
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-17 14:54:25 +00:00
David Crawshaw
c8aba85e4a runtime: export main.main for android
Previously we started the Go runtime from a JNI function call, which
eventually called the program's main function. Now the runtime is
initialized by an ELF initialization function as a c-shared library,
and the program's main function is not called. So now we export main
so it can be called from JNI.

This is necessary for all-Go apps because unlike a normal shared
library, the program loading the library is not written by or known
to the programmer. As far as they are concerned, the .so is
everything. In fact the same code is compiled for iOS as a normal Go
program.

Change-Id: I61c6a92243240ed229342362231b1bfc7ca526ba
Reviewed-on: https://go-review.googlesource.com/9015
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-04-17 12:11:04 +00:00
David Crawshaw
5da1c254d5 runtime: do not run main when buildmode=c-shared
Change-Id: Ie7f85873978adf3fd5c739176f501ca219592824
Reviewed-on: https://go-review.googlesource.com/9011
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-17 11:31:01 +00:00
Russ Cox
6a2b0c0b6d runtime: delete cgo_allocate
This memory is untyped and can't be used anymore.
The next version of SWIG won't need it.

Change-Id: I592b287c5f5186975ee09a9b28d8efe3b57134e7
Reviewed-on: https://go-review.googlesource.com/8956
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-17 01:30:47 +00:00
David Crawshaw
5b72b8c7a3 runtime: aeshash stubs for arm64
For some reason the absence of an implementation does not stop arm64
binaries from being built. However, it comes up with -buildmode=c-archive.

Change-Id: Ic0db5fd8fb4fe8252b5aa320818df0c7aec3db8f
Reviewed-on: https://go-review.googlesource.com/8989
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-16 19:49:31 +00:00