Currently, mutator allocation periodically assists the garbage
collector by performing a small, fixed amount of scanning work.
However, to control heap growth, mutators need to perform scanning
work *proportional* to their allocation rate.
This change implements proportional mutator assists. It uses the scan
work estimate computed by the garbage collector at the beginning of
each cycle to determine how much scan work must be performed per
allocation byte to complete the estimated scan work by the time the
heap reaches the goal size. When allocation triggers an assist, it
uses this ratio and the amount allocated since the last assist to
compute the assist work, attempts to steal as much of this work as
possible from the background collector's credit, and then performs
any remaining scan work itself.
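As a sketch of the idea only (the names below are hypothetical, not
the runtime's actual identifiers):

    package main

    import "fmt"

    // assistWorkPerByte sketches the proportional-assist computation:
    // the estimated scan work remaining, divided by the allocation
    // headroom left before the heap reaches its goal size.
    func assistWorkPerByte(estScanWork, heapLive, heapGoal int64) float64 {
    	headroom := heapGoal - heapLive
    	if headroom <= 0 {
    		return float64(estScanWork) // no headroom: assist at full tilt
    	}
    	return float64(estScanWork) / float64(headroom)
    }

    func main() {
    	ratio := assistWorkPerByte(1<<20, 60<<20, 64<<20)
    	allocatedSinceAssist := int64(4096)
    	owed := int64(ratio * float64(allocatedSinceAssist))
    	fmt.Printf("scan work owed for this assist: %d units\n", owed)
    }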
Change-Id: I98b2078147a60d01d6228b99afd414ef857e4fba
Reviewed-on: https://go-review.googlesource.com/8836
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, the "n" in gcDrainN is in terms of objects to scan. This is
used by gchelpwork to perform a limited amount of work on allocation,
but is a pretty arbitrary way to bound this amount of work since the
number of objects has little relation to how long they take to scan.
Modify gcDrainN to perform a fixed amount of scan work instead. For
now, gchelpwork still performs a fairly arbitrary amount of scan work,
but at least this is much more closely related to how long the work
will take. Shortly, we'll use this to precisely control the scan work
performed by mutator assists during allocation to achieve the heap
size goal.
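Schematically (hypothetical types and signatures, not the real gcWork
API), the change turns the drain loop's bound from an object count
into a scan-work budget:

    package main

    import "fmt"

    type object struct{ size int64 }

    // scanObject stands in for scanning one object, reporting the
    // amount of scan work performed (here, the object's size in bytes).
    func scanObject(o object) int64 { return o.size }

    // gcDrainN, sketched: drain until at least scanWork units of work
    // have been done, rather than until n objects have been scanned.
    func gcDrainN(queue []object, scanWork int64) int64 {
    	var done int64
    	for len(queue) > 0 && done < scanWork {
    		done += scanObject(queue[0])
    		queue = queue[1:]
    	}
    	return done
    }

    func main() {
    	q := []object{{100}, {4000}, {16}, {2000}}
    	fmt.Println(gcDrainN(q, 4096)) // 4100: stops once the budget is met
    }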
Change-Id: I3cd07fe0516304298a0af188d0ccdf621d4651cc
Reviewed-on: https://go-review.googlesource.com/8835
Reviewed-by: Rick Hudson <rlh@golang.org>
This tracks scan work done by background GC in a global pool. Mutator
assists will draw on this credit to avoid doing work when background
GC is staying ahead.
Unlike the other GC controller tracking variables, this will be both
written and read throughout the cycle. Hence, we can't arbitrarily
delay updates like we can for scan work and bytes marked. However, we
still want to minimize contention, so this global credit pool is
allowed some error from the "true" amount of credit. Background GC
accumulates credit locally up to a limit and only then flushes to the
global pool. Similarly, mutator assists will draw from the credit pool
in batches.
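A minimal sketch of the batching scheme (illustrative names; the real
runtime uses its own atomics and limits):

    package main

    import (
    	"fmt"
    	"sync/atomic"
    )

    // bgScanCredit is the global pool of scan-work credit, allowed some
    // slack relative to the "true" amount of credit.
    var bgScanCredit int64

    const creditFlushLimit = 1024 // flush local credit in batches

    // flushCredit is called by the background collector: accumulate
    // locally, publish only when the local balance exceeds the limit.
    func flushCredit(local *int64, done int64) {
    	*local += done
    	if *local >= creditFlushLimit {
    		atomic.AddInt64(&bgScanCredit, *local)
    		*local = 0
    	}
    }

    // stealCredit is called by a mutator assist: satisfy up to `want`
    // units of work from the pool instead of scanning directly.
    func stealCredit(want int64) int64 {
    	got := atomic.AddInt64(&bgScanCredit, -want)
    	if got < 0 {
    		// Overdrew; return what we couldn't steal to the pool.
    		atomic.AddInt64(&bgScanCredit, -got)
    		want += got // got is negative: we only got want+got
    	}
    	return want
    }

    func main() {
    	var local int64
    	flushCredit(&local, 2048) // background GC banks credit
    	fmt.Println("stolen:", stealCredit(512))
    }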
Change-Id: I1aa4fc604b63bf53d1ee2a967694dffdfc3e255e
Reviewed-on: https://go-review.googlesource.com/8834
Reviewed-by: Rick Hudson <rlh@golang.org>
This implements tracking the scan work ratio of a GC cycle and using
this to estimate the scan work that will be required by the next GC
cycle. Currently this estimate is unused; it will be used to drive
mutator assists.
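For illustration only (the actual controller's fields differ), the
estimate can be maintained as a ratio of observed scan work to marked
heap, applied to the next cycle's expected heap:

    package main

    import "fmt"

    // estimateScanWork records the ratio of scan work to marked heap
    // bytes from the last cycle and applies it to the next cycle's
    // expected heap size to predict that cycle's scan work.
    func estimateScanWork(lastScanWork, lastHeapMarked, nextHeapExpected int64) int64 {
    	ratio := float64(lastScanWork) / float64(lastHeapMarked)
    	return int64(ratio * float64(nextHeapExpected))
    }

    func main() {
    	fmt.Println(estimateScanWork(8<<20, 64<<20, 72<<20)) // ~9MB of scan work
    }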
Change-Id: I8685b59d89cf1d83eddfc9b30d84da4e3a7f4b72
Reviewed-on: https://go-review.googlesource.com/8833
Reviewed-by: Rick Hudson <rlh@golang.org>
This tracks the amount of scan work in terms of scanned pointers
during the concurrent mark phase. We'll use this information to
estimate scan work for the next cycle.
Currently this work is accumulated in a counter in gcWork, and dispose
atomically adds it to a global work counter. dispose happens
relatively infrequently, so the contention on the global counter
should be low. If this turns out to be an issue, we can reduce the
number of disposes, and if it's still a problem, we can switch to
per-P counters.
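Roughly, the pattern is a per-worker counter flushed on dispose (a
sketch with hypothetical types; the real gcWork is more involved):

    package main

    import (
    	"fmt"
    	"sync/atomic"
    )

    var globalScanWork int64 // cycle-wide total, updated only on dispose

    // gcWork sketches a per-worker buffer with a local scan-work counter.
    type gcWork struct {
    	scanWork int64
    }

    // scan records work locally; no atomics on the hot path.
    func (w *gcWork) scan(pointers int64) { w.scanWork += pointers }

    // dispose flushes the local counter to the global one. Because
    // dispose is infrequent, contention on globalScanWork stays low.
    func (w *gcWork) dispose() {
    	atomic.AddInt64(&globalScanWork, w.scanWork)
    	w.scanWork = 0
    }

    func main() {
    	var w gcWork
    	w.scan(128)
    	w.scan(64)
    	w.dispose()
    	fmt.Println("global scan work:", globalScanWork)
    }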
Change-Id: Iac0364c466ee35fab781dbbbe7970a5f3c4e1fc1
Reviewed-on: https://go-review.googlesource.com/8832
Reviewed-by: Rick Hudson <rlh@golang.org>
These currently use portable implementations in terms of their uint64
counterparts.
Change-Id: Icba5f7134cfcf9d0429edabcdd73091d97e5e905
Reviewed-on: https://go-review.googlesource.com/8831
Reviewed-by: Rick Hudson <rlh@golang.org>
This change exposes reflect.ArrayOf to create new reflect.Type array
types at runtime, when given a reflect.Type element.
- reflect: implement ArrayOf
- reflect: tests for ArrayOf
- runtime: document that typeAlg is used by reflect and must be kept in sync
Fixes #5996.
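For example, a small usage sketch of the new API:

    package main

    import (
    	"fmt"
    	"reflect"
    )

    func main() {
    	// Construct the type [4]int at runtime from its element type.
    	t := reflect.ArrayOf(4, reflect.TypeOf(0))
    	fmt.Println(t) // [4]int

    	// Values of the new type can be created with reflect.New.
    	v := reflect.New(t).Elem()
    	v.Index(2).SetInt(7)
    	fmt.Println(v.Interface()) // [0 0 7 0]
    }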
Change-Id: I5d07213364ca915c25612deea390507c19461758
Reviewed-on: https://go-review.googlesource.com/4111
Reviewed-by: Keith Randall <khr@golang.org>
Optimized heapBitsForObject by special-casing objects whose size is a
power of two. When a span holding such objects is initialized, I
added a mask that, when ANDed with an interior pointer, yields the
base of the object. For the garbage benchmark this resulted in
CPU_CLK_UNHALTED in heapBitsForObject going from 7.7% down to 5.9%
of the total, and INST_RETIRED went from 12.2% to 8.7%.
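The trick, in sketch form (not the runtime's actual span code): when
the object size is a power of two and the span's base is size-aligned,
masking off the low bits of an interior pointer recovers the object's
base:

    package main

    import "fmt"

    // baseOf sketches the power-of-two fast path: the mask ^(size-1)
    // clears the offset-within-object bits, leaving the object's base
    // address (assuming the span base is itself size-aligned).
    func baseOf(p, size uintptr) uintptr {
    	return p &^ (size - 1)
    }

    func main() {
    	const size = 64 // power-of-two object size
    	base := uintptr(0x1000)
    	interior := base + 3*size + 17 // pointer into the 4th object
    	fmt.Printf("%#x\n", baseOf(interior, size)) // 0x10c0: that object's base
    }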
Here are the benchmarks that changed by at least plus or minus 1%.
benchmark old ns/op new ns/op delta
BenchmarkFmtFprintfString 249 221 -11.24%
BenchmarkFmtFprintfInt 247 223 -9.72%
BenchmarkFmtFprintfEmpty 76.5 69.6 -9.02%
BenchmarkBinaryTree17 4106631412 3744550160 -8.82%
BenchmarkFmtFprintfFloat 424 399 -5.90%
BenchmarkGoParse 4484421 4242115 -5.40%
BenchmarkGobEncode 8803668 8449107 -4.03%
BenchmarkFmtManyArgs 1494 1436 -3.88%
BenchmarkGobDecode 10431051 10032606 -3.82%
BenchmarkFannkuch11 2591306713 2517400464 -2.85%
BenchmarkTimeParse 361 371 +2.77%
BenchmarkJSONDecode 70620492 68830357 -2.53%
BenchmarkRegexpMatchMedium_1K 54693 53343 -2.47%
BenchmarkTemplate 90008879 91929940 +2.13%
BenchmarkTimeFormat 380 387 +1.84%
BenchmarkRegexpMatchEasy1_32 111 113 +1.80%
BenchmarkJSONEncode 21359159 21007583 -1.65%
BenchmarkRegexpMatchEasy1_1K 603 613 +1.66%
BenchmarkRegexpMatchEasy0_32 127 129 +1.57%
BenchmarkFmtFprintfIntInt 399 393 -1.50%
BenchmarkRegexpMatchEasy0_1K 373 378 +1.34%
Change-Id: I78e297161026f8b5cc7507c965fd3e486f81ed29
Reviewed-on: https://go-review.googlesource.com/8980
Reviewed-by: Austin Clements <austin@google.com>
This CL revises CL 7504 to use explicitly uintptr types for the
struct fields that are going to be updated sometimes without
write barriers. The result is that the fields are now updated *always*
without write barriers.
This approach has two important properties:
1) Now the GC never looks at the field, so if the missing reference
could cause a problem, it will do so all the time, not just when the
write barrier is missed at just the right moment.
2) Now a write barrier never happens for the field, avoiding the
(correct) detection of inconsistent write barriers when GODEBUG=wbshadow=1.
Change-Id: Iebd3962c727c0046495cc08914a8dc0808460e0e
Reviewed-on: https://go-review.googlesource.com/9019
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The callee-saved registers must be saved because for the c-shared case
this code is invoked from C code in the system library, and that code
expects the registers to be saved. The tests were passing because in
the normal case the code calls a cgo function that naturally saves
callee-saved registers anyhow. However, it fails when the code takes
the non-cgo path.
Change-Id: I9c1f5e884f5a72db9614478049b1863641c8b2b9
Reviewed-on: https://go-review.googlesource.com/9114
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Change-Id: I87147ca6bb53e3121cc4245449c519509f107638
Reviewed-on: https://go-review.googlesource.com/9009
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
It's not helping anymore, and it's fooling people who try to
understand performance (like me).
Change-Id: I133a644acae0ddf1bfa17c654cdc01e2089da963
Reviewed-on: https://go-review.googlesource.com/9018
Reviewed-by: Austin Clements <austin@google.com>
readyExecute passes a closure to mcall that captures an argument to
readyExecute. Since mcall is marked noescape, this closure lives on
the stack of the calling goroutine. However, the closure puts the
calling goroutine on the run queue (and switches to a new
goroutine). If the calling goroutine gets scheduled before the mcall
returns, this stack-allocated closure will become invalid while it's
still executing. One consequence of this that we've observed is that the
captured gp variable can get overwritten before the call to
execute(gp), causing execute(gp) to segfault.
Fix this by passing the currently captured gp variable through a field
in the calling goroutine's g struct so that the func is no longer a
closure.
To prevent problems like this in the future, this change also removes
the go:noescape annotation from mcall. Due to a compiler bug, this
will currently cause a func closure passed to mcall to be implicitly
allocated rather than refusing the implicit allocation. However, this
is okay because there are no other closures passed to mcall right now
and the compiler bug will be fixed shortly.
Fixes#10428.
Change-Id: I49b48b85de5643323b89e9eaa4df63854e968c32
Reviewed-on: https://go-review.googlesource.com/8866
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Previously we started the Go runtime from a JNI function call, which
eventually called the program's main function. Now the runtime is
initialized by an ELF initialization function as a c-shared library,
and the program's main function is not called. So now we export main
so it can be called from JNI.
This is necessary for all-Go apps because, unlike with a normal shared
library, the program loading the library is not written by or known
to the programmer. As far as they are concerned, the .so is
everything. In fact the same code is compiled for iOS as a normal Go
program.
Change-Id: I61c6a92243240ed229342362231b1bfc7ca526ba
Reviewed-on: https://go-review.googlesource.com/9015
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Change-Id: Ie7f85873978adf3fd5c739176f501ca219592824
Reviewed-on: https://go-review.googlesource.com/9011
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This memory is untyped and can't be used anymore.
The next version of SWIG won't need it.
Change-Id: I592b287c5f5186975ee09a9b28d8efe3b57134e7
Reviewed-on: https://go-review.googlesource.com/8956
Reviewed-by: Ian Lance Taylor <iant@golang.org>
For some reason the absence of an implementation does not stop arm64
binaries from being built. However, it comes up with -buildmode=c-archive.
Change-Id: Ic0db5fd8fb4fe8252b5aa320818df0c7aec3db8f
Reviewed-on: https://go-review.googlesource.com/8989
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Change-Id: I8e912ff9327a4163b63b8c628aa3546e86ddcc02
Reviewed-on: https://go-review.googlesource.com/8983
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Change-Id: I09e84161d106960a69972f5fc845a1e40c28e58f
Reviewed-on: https://go-review.googlesource.com/8331
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
It is faster to execute
MOVQ AX,(DI)
MOVQ AX,8(DI)
MOVQ AX,16(DI)
MOVQ AX,24(DI)
ADDQ $32,DI
than
STOSQ
STOSQ
STOSQ
STOSQ
However, in order to be able to jump into
the middle of a block of MOVQs, the call
site needs to pre-adjust DI.
If we're clearing a small area, the cost
of that DI pre-adjustment isn't repaid.
This CL switches the DUFFZERO implementation
to use a hybrid strategy, in which small
clears use STOSQ as before, but large clears
use mostly MOVQ/ADDQ blocks.
benchmark old ns/op new ns/op delta
BenchmarkClearFat8 0.55 0.55 +0.00%
BenchmarkClearFat12 0.82 0.83 +1.22%
BenchmarkClearFat16 0.55 0.55 +0.00%
BenchmarkClearFat24 0.82 0.82 +0.00%
BenchmarkClearFat32 2.20 1.94 -11.82%
BenchmarkClearFat40 1.92 1.66 -13.54%
BenchmarkClearFat48 2.21 1.93 -12.67%
BenchmarkClearFat56 3.03 2.20 -27.39%
BenchmarkClearFat64 3.26 2.48 -23.93%
BenchmarkClearFat72 3.57 2.76 -22.69%
BenchmarkClearFat80 3.83 3.05 -20.37%
BenchmarkClearFat88 4.14 3.30 -20.29%
BenchmarkClearFat128 5.54 4.69 -15.34%
BenchmarkClearFat256 9.95 9.09 -8.64%
BenchmarkClearFat512 18.7 17.9 -4.28%
BenchmarkClearFat1024 36.2 35.4 -2.21%
Change-Id: Ic786406d9b3cab68d5a231688f9e66fcd1bd7103
Reviewed-on: https://go-review.googlesource.com/2585
Reviewed-by: Keith Randall <khr@golang.org>
By removing type slice, renaming type sliceStruct to type slice and
whacking until it compiles.
Has a pleasing net reduction of conversions.
Fixes #10188
Change-Id: I77202b8df637185b632fd7875a1fdd8d52c7a83c
Reviewed-on: https://go-review.googlesource.com/8770
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Fixes #10450
runtime.cputicks is called from runtime.exitsyscall and must not
split the stack. cputicks is implemented in several ways and the
NOSPLIT annotation was missing from a few of these.
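In Go source the equivalent annotation is the //go:nosplit pragma
(cputicks itself is implemented in assembly, where the flag goes in
the TEXT directive); a minimal sketch of the pragma:

    package main

    import "fmt"

    // nosplitDemo must not grow the stack: the //go:nosplit pragma
    // tells the compiler to omit the stack-split check in the
    // function's prologue, as required on paths like exitsyscall.
    //
    //go:nosplit
    func nosplitDemo(a, b int64) int64 {
    	return a + b
    }

    func main() {
    	fmt.Println(nosplitDemo(1, 2))
    }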
Change-Id: I5cbbb4e5888c5d298fe2fef240782d0e49f59af8
Reviewed-on: https://go-review.googlesource.com/8939
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
When Windows calls externalthreadhandler it expects to receive the
return value in AX. We don't set AX anywhere. Change that.
Store ctrlhandler1 and profileloop1 return values into AX before
returning from externalthreadhandler.
Fixes#10215.
Change-Id: Ied04542cc3ebe7d4a26660e970f9f78098143591
Reviewed-on: https://go-review.googlesource.com/8901
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
A G will be preempted if it runs for 10ms without blocking. Currently
this constant is hard-coded in retake. Move it to a global const.
We'll use the time slice length in scheduling background GC.
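The constant, roughly as it reads after this change (a sketch; the
name is the one I believe the runtime uses, wrapped here in a runnable
example):

    package main

    import (
    	"fmt"
    	"time"
    )

    // forcePreemptNS is the time slice given to a G: a G that runs for
    // 10ms without blocking becomes a candidate for preemption by retake.
    const forcePreemptNS = 10 * 1000 * 1000 // nanoseconds

    func main() {
    	fmt.Println(time.Duration(forcePreemptNS)) // 10ms
    }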
Change-Id: I79a979948af2fad3afe5df9d4af4062f166554b7
Reviewed-on: https://go-review.googlesource.com/8838
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
mHeap_ReclaimList is asked to reclaim at least npages pages, but it
counts the number of spans reclaimed, not the number of pages
reclaimed. Since every span covers at least one page, the number of
pages reclaimed is at least the number of spans counted, so this is
not strictly wrong, but it forces more reclamation than the caller
intended, which delays large allocations.
Fix this by increasing the count by the number of pages in the swept
span, rather than just increasing it by 1.
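In sketch form (hypothetical types, not the real mheap code), the
before/after of the counting:

    package main

    import "fmt"

    type span struct{ npages uintptr }

    // reclaimList sweeps spans until at least npages pages are
    // reclaimed. The fix: credit each swept span with its page count,
    // not with 1.
    func reclaimList(spans []span, npages uintptr) (reclaimed uintptr) {
    	for _, s := range spans {
    		if reclaimed >= npages {
    			break
    		}
    		reclaimed += s.npages // was: reclaimed++
    	}
    	return reclaimed
    }

    func main() {
    	spans := []span{{4}, {8}, {2}}
    	fmt.Println(reclaimList(spans, 10)) // 12: stops after two spans
    }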
Fixes#9048.
Change-Id: I5ae364a9837a6012e68fcd431bba000340cfd50c
Reviewed-on: https://go-review.googlesource.com/8920
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Commit d7e0ad4 removed the next_gc manipulation from mSpan_Sweep, but
left in the traceNextGC() call that records the updated next_gc
value. Remove this now-unnecessary call.
Change-Id: I28e0de071661199be9810d7bdcc81ce50b5a58ae
Reviewed-on: https://go-review.googlesource.com/8894
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
With the new buildmodes c-archive and c-shared, it is possible for a
cgo call to come in early in the lifecycle of a Go program. Calls
before the runtime has been initialized are caught by
_cgo_wait_runtime_init_done. However a call can come in after the
runtime has initialized, but before the program's package init
functions have finished running.
To avoid this, cgocallback checks m.ncgo to see if we are on a thread
running Go. If not, we may be on a foreign thread, and we block until
main_init is complete.
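The mechanism, sketched with a plain channel (the runtime's actual
synchronization differs in detail; names here are illustrative):

    package main

    import (
    	"fmt"
    	"time"
    )

    // mainInitDone is closed once the program's package init functions
    // have finished; callbacks arriving from foreign threads before
    // that point block here.
    var mainInitDone = make(chan struct{})

    func cgocallbackSketch(onGoThread bool, f func()) {
    	if !onGoThread {
    		<-mainInitDone // foreign thread: wait for main_init
    	}
    	f()
    }

    func main() {
    	go func() {
    		time.Sleep(10 * time.Millisecond) // stand-in for package init work
    		close(mainInitDone)
    	}()
    	cgocallbackSketch(false, func() { fmt.Println("callback ran after init") })
    }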
Change-Id: I7a9f137fa2a40c322a0b93764261f9aa17fcf5b8
Reviewed-on: https://go-review.googlesource.com/8897
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Avoids shadowing the builtin channel close function.
Change-Id: I7a729b0937c8248fe27222be61318a88db995eee
Reviewed-on: https://go-review.googlesource.com/8898
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Follows http://golang.org/cl/8454, a similar CL for arm architectures.
This CL involves android-specific changes, namely, synthesizing
argv/auxv, as android doesn't provide those to the init functions.
This code is based on crawshaw@ android code in golang.org/x/mobile.
Change-Id: I32364efbb2662e80270a99bd7dfb1d0421b5417d
Reviewed-on: https://go-review.googlesource.com/8457
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Adds the runtime initialization flow for arm akin to amd64.
In particular, we use the library initialization entry point to:
- create a new OS thread and run the "regular" runtime init stack on
that thread
- return immediately from the main (i.e., loader) thread
- at the first CGO invocation, we wait for the runtime initialization
to complete.
Verified to work on a Raspberry Pi and an Android phone.
Change-Id: I32f39228ae30a03ce9569287f234b305790fecf6
Reviewed-on: https://go-review.googlesource.com/8455
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Srdjan Petrovic <spetrovic@google.com>
Related to issue #10410
For some reason, any non-trivial code in _cgo_wait_runtime_init_done
(even fprintf()) will crash that call.
If anybody has any guess why this is happening, please let me know!
For now, I'm clearing the functions for ppc64, as this path is currently unused there.
Change-Id: I1b11383aaf4f9f9a16f1fd6606842cfeedc9f0b3
Reviewed-on: https://go-review.googlesource.com/8766
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Srdjan Petrovic <spetrovic@google.com>
Just like darwin/arm.
Change-Id: Ic75927bd6457d37cda7dd8279fd9b4cd52edc1d1
Reviewed-on: https://go-review.googlesource.com/8813
Reviewed-by: Minux Ma <minux@golang.org>
Like other arm64 platforms, darwin/arm64 has a physical page size that
differs from its logical page size, so it is running into issue 9993. I
hope it can be fixed for Go 1.5, but for now it is demonstrating the
same bug as the other skipped os+arch combinations.
Change-Id: Iedaf9afe56d6954bb4391b6e843d81742a75a00c
Reviewed-on: https://go-review.googlesource.com/8814
Reviewed-by: Minux Ma <minux@golang.org>
Just like darwin/arm.
Change-Id: Ie4998d24b2d891a9f6c8047ec40cd3fdf80622cd
Reviewed-on: https://go-review.googlesource.com/8812
Reviewed-by: Minux Ma <minux@golang.org>
Tested by using -buildmode=c-archive to generate an archive, adding it
to an Xcode project, and calling a Go function from an iOS app. (I'm
still investigating proper buildmode tests for all.bash.)
Change-Id: I7890df15246df8e90ad27837b8d64ba2cde409fe
Reviewed-on: https://go-review.googlesource.com/8719
Reviewed-by: Ian Lance Taylor <iant@golang.org>
A similar fix was applied in 545686857b
but another instance of 'pc' was missed.
Also adds a test for the goroutine gdb command.
It currently uses goroutine 2 for the test, since goroutine 1 has
its stack pointer set to 0 for some reason.
Change-Id: I53ca22be6952f03a862edbdebd9b5c292e0853ae
Reviewed-on: https://go-review.googlesource.com/8729
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently, when allocation reaches the concurrent GC trigger size, we
start the concurrent collector by ready'ing its G. This simply puts it
on the end of the P's run queue, which means we may not actually start
GC for some time as the current G continues to run and then the P
drains other Gs already on its run queue. Since the mutator can
continue to allocate, the heap can potentially be much larger than we
intended by the time GC actually starts. Furthermore, how much larger
is difficult to predict since it depends on the scheduler.
Fix this by preempting the current G and switching directly to the
concurrent GC G as soon as we reach the trigger heap size.
On the garbage benchmark from the benchmarks subrepo with
GOMAXPROCS=4, this reduces the time from triggering the GC to the
beginning of sweep termination by 10 to 30 milliseconds, which reduces
allocation after the trigger by up to 10MB (a large fraction of the
64MB live heap the benchmark tries to maintain).
One other known source of delay before we "really" start GC is the
sweep finalization performed before sweep termination. This has
similar negative effects on heap size and predictability, but is an
orthogonal problem. This change adds a TODO for this.
Change-Id: I8bae98cb43685c1bf353ff55868e4647e3743c47
Reviewed-on: https://go-review.googlesource.com/8513
Reviewed-by: Rick Hudson <rlh@golang.org>
These were appropriate for STW GC, since it interrupted the allocating
goroutine, but they don't apply to concurrent GC, which runs on its own
goroutine. Forced GC is still STW, but it makes sense to attribute the
GC to the goroutine that called runtime.GC().
Change-Id: If12418ca66dc7e53b8b16025af4e03adb5d9577e
Reviewed-on: https://go-review.googlesource.com/8715
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
exitsyscallfast checks for freezetheworld, but does so only by
checking if stopwait is positive. This can also happen during
stoptheworld, which is harmless, but confusing. Shortly, it will be
important that we get to the p.status cas even if stopwait is set.
Hence, make this test more specific so it only triggers with
freezetheworld and not other uses of stopwait.
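Schematically (constant and field names here are illustrative):

    package main

    import "fmt"

    // freezeStopWait is a sentinel stored in stopwait by freezetheworld;
    // ordinary stoptheworld uses small positive counts. Checking for the
    // sentinel, rather than for any positive value, distinguishes the two.
    const freezeStopWait = 0x7fffffff

    type sched struct{ stopwait int32 }

    func exitsyscallfastSketch(s *sched) bool {
    	if s.stopwait == freezeStopWait {
    		return false // frozen world: don't take the fast path
    	}
    	// ...proceed to the p.status cas even if stopwait is set...
    	return true
    }

    func main() {
    	fmt.Println(exitsyscallfastSketch(&sched{stopwait: 1}))              // true
    	fmt.Println(exitsyscallfastSketch(&sched{stopwait: freezeStopWait})) // false
    }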
Change-Id: Ibb722cd8360c3ed5a9654482519e3ceb87a8274d
Reviewed-on: https://go-review.googlesource.com/8205
Reviewed-by: Russ Cox <rsc@golang.org>
'themoduledata' doesn't really make sense now that we support multiple
moduledata objects.
Change-Id: I8263045d8f62a42cb523502b37289b0fba054f62
Reviewed-on: https://go-review.googlesource.com/8521
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>