Currently MaxGomaxprocs is 256. The previous CL saved enough per-P
static space that we can quadruple MaxGomaxprocs (and hence the static
size of allp) and still come out ahead.
This is safe for Go 1.9. In Go 1.10 we'll eliminate the hard-coded
limit entirely.
Updates #15131.
Change-Id: I919ea821c1ce64c27812541dccd7cd7db4122d16
Reviewed-on: https://go-review.googlesource.com/45673
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Back in the day, allp was just a pointer to an array. As a result, the
runtime has a few loops of the form:
for i := 0; ; i++ {
p := allp[i]
if p == nil {
break
}
...
}
This is silly now because it requires that allp be one longer than the
maximum possible number of Ps, but now that allp is in Go it has a
length.
Replace these with range loops.
Change-Id: I91ef4bc7bd3c9d4fda2264f4aa1b1d0271d7f578
Reviewed-on: https://go-review.googlesource.com/45571
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
ARM currently does not use a hardware yield instruction in the spin
loop in procyield because the YIELD instruction was only added in
ARMv6K. However, it appears earlier ARM chips will interpret the YIELD
encoding as an effective NOP (specifically an MSR instruction that
ultimately has no effect on the CPSR register).
Hence, use YIELD in procyield on ARM since it should be, at worst,
harmless.
Fixes#16663.
Change-Id: Id1787ac48862b785b92c28f1ac84cb4908d2173d
Reviewed-on: https://go-review.googlesource.com/45250
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
If we're in a situation where printing the fp and sp in the traceback
is useful, it's almost certainly also useful to print the PC.
Change-Id: Ie48a0d5de8a54b5b90ab1d18638a897958e48f70
Reviewed-on: https://go-review.googlesource.com/45210
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Both runtime.exit and syscall.Exit call Windows ExitProcess.
But recently (CL 34616) runtime.exit was changed to ignore
Windows CreateThread errors if ExitProcess is called.
This CL adjusts syscall.Exit to do the same.
Fixes#18253 (maybe)
Change-Id: I6496c31b01e7c7d73b69c0b2ae33ed7fbe06736b
Reviewed-on: https://go-review.googlesource.com/45115
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This adds diagnostics so we can tell if the finalizer has started, in
addition to whether or not it has finished.
Updates #19381.
Change-Id: Icb7b1b0380c9ad1128b17074828945511a6cca5d
Reviewed-on: https://go-review.googlesource.com/45138
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
runtime.GC no longer triggers a STW GC. This fixes the description of
GODEBUG=gctrace=1 so it doesn't claim otherwise.
Change-Id: Ibd34a55c5ae7b5eda5c2393b9a6674bdf1d51eb3
Reviewed-on: https://go-review.googlesource.com/45131
Reviewed-by: Rick Hudson <rlh@golang.org>
The current implementation of "goroutine N cmd" assumes it can get
goroutine N's state from the goroutine's sched buffer. But this only
works if the goroutine is blocked. Extend find_goroutine so that, if
there is no saved scheduler state for a goorutine, it tries to find
the thread the goroutine is running on and use the thread's current
register state. We also extend find_goroutine to understand saved
syscall register state.
Fixes#13887.
Change-Id: I739008a8987471deaa4a9da918655e4042cf969b
Reviewed-on: https://go-review.googlesource.com/45031
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently the extra Ms created for cgo callbacks have a corresponding
G that's kept in syscall state with only a call to goexit on its
stack. This leads to confusing output from runtime.NumGoroutines and
in tracebacks:
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
.../src/runtime/asm_amd64.s:2197 +0x1
Fix this by putting this goroutine into state _Gdead when it's not in
use instead of _Gsyscall. To keep the goroutine counts correct, we
also add one to sched.ngsys while the goroutine is in _Gdead. The
effect of this is as if the goroutine simply doesn't exist when it's
not in use.
Fixes#16631.
Fixes#16714.
Change-Id: Ieae08a2febd4b3d00bef5c23fd6ca88fb2bb0087
Reviewed-on: https://go-review.googlesource.com/45030
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The test is inherently racy, and for me fails about 0.05% of the time.
So only fail the test if it fails ten times in a row.
Fixes#20594
Change-Id: I3b3f7598f2196f7406f1a3937f38f21ff0c0e4b5
Reviewed-on: https://go-review.googlesource.com/45020
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
For cgo programs on linux-amd64 we call the C function mmap.
This supports programs such as the C memory sanitizer that need to
intercept all calls to mmap. It turns out that there are programs that
intercept both mmap and munmap, or that at least expect that if they
intercept mmap, they also intercept munmap. So, if we permit mmap
to be intercepted, also permit munmap to be intercepted.
No test, as it requires two odd things: a C program that intercepts
mmap and munmap, and a Go program that calls munmap.
Change-Id: Iec33f47d59f70dbb7463fd12d30728c24cd4face
Reviewed-on: https://go-review.googlesource.com/45016
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Try to avoid a race between the main goroutine exiting and a panic
occurring. Don't try too hard, to avoid hanging.
Updates #3934Fixes#20018
Change-Id: I57a02b6d795d2a61f1cadd137ce097145280ece7
Reviewed-on: https://go-review.googlesource.com/41052
Reviewed-by: Austin Clements <austin@google.com>
C code expects CR2, CR3, and CR4 to be preserved across function calls.
Preserve the entire CR register across function calls in
_rt0_ppc64le_linux_lib and crosscall2. The standard ppc64le call frame
uses 8(R1) as the place to save CR; emulate that.
It's hard to write a reliable test for this as it requires writing C
code that sets CR2, CR3, or CR4 across a call to a Go function.
Change-Id: If39e771a5b574602b848227312e83598fe74eab7
Reviewed-on: https://go-review.googlesource.com/44733
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Since TestPingPongHog tests the scheduler, it's ultimately
probabilistic. Currently, it requires the result be at most of factor
of 2 off of the ideal. It turns out this isn't quite enough in
practice, with factors on 1000 iterations on linux/amd64 ranging from
0.48 to 2.5. If the test were failing, we would expect a factor closer
to 1000X, so it's pretty safe to expand the accepted factor from 2 to
5.
Fixes#20494.
Change-Id: If8f2e96194fe66f1fb981a965d1167fe74ff38d7
Reviewed-on: https://go-review.googlesource.com/44859
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
key32 is called between entersyscallblock and exitsyscall
stack split may occur if disable inlining and the G is preempted
Fix the problem by describing key32 as nosplit function
Fixes#20510
Change-Id: I1f0787995936f34ef0052cf79fde036f1b338865
Reviewed-on: https://go-review.googlesource.com/44390
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
cmd/compile/internal/ld/decodesym.go is now
cmd/link/internal/ld/decodesym.go
Change-Id: I16ec5c89aa3507e70676c2b50d70f1fde533a085
Reviewed-on: https://go-review.googlesource.com/44373
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This avoids false-positive TSAN reports when using the C sigaction
function to read handlers registered by the Go runtime.
(Unfortunately, I can't seem to coax the runtime into reproducing the
failure in a small unit-test.)
Change-Id: I744279a163708e24b1fbe296ca691935c394b5f3
Reviewed-on: https://go-review.googlesource.com/44270
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Currently, the heap arena allocator allocates monotonically increasing
addresses. This is fine on 64-bit where we stake out a giant block of
the address space for ourselves and start at the beginning of it, but
on 32-bit the arena starts at address 0 but we start allocating from
wherever the OS feels like giving us memory. We can generally hint the
OS to start us at a low address, but this doesn't always work.
As a result, on 32-bit, if the OS gives us an arena block that's lower
than the current block we're allocating from, we simply say "thanks
but no thanks", return the whole (256MB!) block of memory, and then
take a fallback path that mmaps just the amount of memory we need
(which may be as little as 8K).
We have to do this because mheap_.arena_used is *both* the highest
used address in the arena and the next address we allocate from.
Fix all of this by separating the second role of arena_used out into a
new field called arena_alloc. This lets us accept any arena block the
OS gives us. This also slightly changes the invariants around
arena_end. Previously, we ensured arena_used <= arena_end, but this
was related to arena_used's second role, so the new invariant is
arena_alloc <= arena_end. As a result, we no longer necessarily update
arena_end when we're updating arena_used.
Fixes#20259 properly. (Unlike the original fix, this one should not
be cherry-picked to Go 1.8.)
This is reasonably low risk. I verified several key properties of the
32-bit code path with both 4K and 64K physical pages using a symbolic
model and the change does not materially affect 64-bit (arena_used ==
arena_alloc on 64-bit). The only oddity is that we no longer call
setArenaUsed with racemap == false to indicate that we're creating a
hole in the address space, but this only happened in a 32-bit-only
code path, and the race detector require 64-bit, so this never
mattered anyway.
Change-Id: Ib1334007933e615166bac4159bf357ae06ec6a25
Reviewed-on: https://go-review.googlesource.com/44010
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
We weren't setting r0 to 0, as required by our generated code.
Before this patch, the misc/cgo/testcarchive tests failed on ppc64le.
After this patch, they work, so enable them.
Change-Id: I53b16746961da9f7c34f59030a1e40953c9c1e05
Reviewed-on: https://go-review.googlesource.com/44093
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Commit 4dcba023c6 replaced select with pselect6 on linux/amd64 and
linux/arm, but it turns out the Android emulator uses linux/386. This
makes the equivalent change there, too.
Fixes#20409 more.
Change-Id: If542d6ade06309aab8758d5f5f6edec201ca7670
Reviewed-on: https://go-review.googlesource.com/44011
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
There are two copies each of the stackPreempt/_StackPreempt and
stackFork/_StackFork constants. Remove the ones left over from C that
are no longer used.
Change-Id: I849604c72c11e4a0cb08e45e9817eb3f5a6ce8ba
Reviewed-on: https://go-review.googlesource.com/43638
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Setting stackCache to 0 to disable stack caches for debugging hasn't
worked for a long time. It causes stackalloc to fall back to full span
allocation, round sub-page stacks down to 0 pages, and blow up.
Fix this debug mode so it disables the per-P caches, but continues to
use the global stack pools for small stacks, which correctly handle
sub-page stacks. While we're here, rename stackCache to stackNoCache
so it acts like the rest of the stack allocator debug modes where "0"
is the right default value.
Fixes#17291.
Change-Id: If401c41cee3448513cbd7bb2e9334a8efab257a7
Reviewed-on: https://go-review.googlesource.com/43637
Reviewed-by: Keith Randall <khr@golang.org>
The stackFromSystem debug mode has two problems:
1) It rounds the stack allocation to _PageSize. If the physical page
size is >8K, this can cause unmapping the memory later to either
under-unmap or over-unmap.
2) It doesn't return the rounded-up allocation size to its caller, so
when we later unmap the memory, we may pass the wrong length.
Fix these problems by rounding the size up to the physical page size
and putting that rounded-up size in the returned stack bounds.
Fixes#17289.
Change-Id: I6b854af3b06bb16e3750798397bb5e2a722ec1cb
Reviewed-on: https://go-review.googlesource.com/43636
Reviewed-by: Keith Randall <khr@golang.org>
If mheap.sysAlloc doesn't have room in the heap arena for an
allocation, it will attempt to map more address space with sysReserve.
sysReserve is given a hint, but can return any unused address range.
Currently, mheap.sysAlloc incorrectly assumes the returned region will
never fall between arena_start and arena_used. If it does,
mheap.sysAlloc will blindly accept the new region as the new
arena_used and arena_end, causing these to decrease and make it so any
Go heap above the new arena_used is no longer considered part of the
Go heap. This assumption *used to be* safe because we had all memory
between arena_start and arena_used mapped, but when we switched to an
arena_start of 0 on 32-bit, it became no longer safe.
Most likely, we've only recently seen this bug occur because we
usually start arena_used just above the binary, which is low in the
address space. Hence, the kernel is very unlikely to give us a region
before arena_used.
Since mheap.sysAlloc is a linear allocator, there's not much we can do
to handle this well. Hence, we fix this problem by simply rejecting
the new region if it isn't after arena_end. In this case, we'll take
the fall-back path and mmap a small region at any address just for the
requested memory.
Fixes#20259.
Change-Id: Ib72e8cd621545002d595c7cade1e817cfe3e5b1e
Reviewed-on: https://go-review.googlesource.com/43870
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Android O black-lists the select system call because its libc, Bionic,
does not use this system call. Replace our use of select with pselect6
(which is allowed) on the platforms that support targeting Android.
linux/arm64 already uses pselect6 because there is no select on arm64,
so only linux/amd64 and linux/arm need changing. pselect6 has been
available since Linux 2.6.16, which is before Go's minimum
requirement.
Fixes#20409.
Change-Id: Ic526b5b259a9e01d2f145a1f4d2e76e8c49ce809
Reviewed-on: https://go-review.googlesource.com/43641
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
profileBuilder.locForPC returns 0 to mean "no location" because 0 is
an invalid location index. However, the code to build count profiles
doesn't check the result of locForPC, so this 0 location index ends up
in the profile's location list. This, in turn, causes problems later
when we decode the profile because it puts a nil *Location in the
sample's location slice, which can later lead to a nil pointer panic.
Fix this by making printCountProfile correctly discard the result of
locForPC if it returns 0. This makes this call match the other two
calls of locForPC.
Updates #15156.
Change-Id: I4492b3652b513448bc56f4cfece4e37da5e42f94
Reviewed-on: https://go-review.googlesource.com/43630
Reviewed-by: Michael Matloob <matloob@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
TestGoroutineCounts was flaky when running on a system under load.
This happened on three builds the last couple of days.
Fix this by running this test with a single operating system thread, so
we do not depend on the operating system scheduler. 50 000 tests ran
without failure with the new version, the old version failed 0.5% of the
time.
Fixes#15156.
Change-Id: I1e5a18d0fef4f72cc9a56e376822b2849cdb0f8b
Reviewed-on: https://go-review.googlesource.com/43590
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
In low memory situations mmap(2) on Illumos[2] can return EAGAIN when it
is unable to reserve the necessary space for the requested mapping. Go
was not previously handling this correctly for Illumos and would fail to
recognize it was in a low-memory situation, the result being the program
would terminate with a panic instead of running the GC.
Fixes: #14930
[1]: https://www.illumos.org/man/2/mmap
Change-Id: I889cc0547e23f9d6c56e4fdd7bcbd0e15403873a
Reviewed-on: https://go-review.googlesource.com/43461
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
On other systems we use "SWI $n". Change Plan 9 files to be
consistent. Generated binary is unchanged.
Fixes#20378.
Change-Id: Ia2a722061da2450c7b30cb707ed4f172fafecf74
Reviewed-on: https://go-review.googlesource.com/43533
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
LLV and SCV are 64-bit load-linked and store-conditional. They
were used in runtime as #define WORD. Change them to normal
instruction form.
NOOP is hardware no-op. It was written as WORD $0. Make a name
for it for better disassembly output.
Fixes#12561.
Fixes#18238.
Change-Id: I82c667ce756fa83ef37b034b641e8c4366335e83
Reviewed-on: https://go-review.googlesource.com/40297
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently proto symbolization uses runtime.FuncForPC and assumes each
PC maps to a single frame. This isn't true in the presence of inlining
(even with leaf-only inlining this can get incorrect results).
Change PC symbolization to use runtime.CallersFrames to expand each PC
to all of the frames at that PC.
Change-Id: I8d20dff7495a5de495ae07f569122c225d433ced
Reviewed-on: https://go-review.googlesource.com/41256
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
Proto profile conversion is inconsistent about call vs return PCs in
profile locations. The proto defines locations to be call PCs. This is
what we do when proto-izing CPU profiles, but we fail to convert the
return PCs in memory and count profile stacks to call PCs when
converting them to proto locations.
Fix this in the heap and count profile conversion functions.
TestConvertMemProfile also hard-codes this failure to convert from
return PCs to call PCs, so fix up the addresses in the synthesized
profile to be return PCs while checking that we get call PCs out of
the conversion.
Change-Id: If1fc028b86fceac6d71a2d9fa6c41ff442c89296
Reviewed-on: https://go-review.googlesource.com/42951
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
runtime.gchelper depends on the non-atomic load of work.ndone
happening strictly before the atomic add of work.nwait. Until very
recently (commit 978af9c2db, fixing #20334), the compiler reordered
these operations. This created a race since work.ndone can change as
soon as work.nwait is equal to work.ndone. If that happened, more than
one gchelper could attempt to wake up the work.alldone note, causing a
"double wakeup" panic.
This was fixed in the compiler, but to make this code less subtle,
make the load of work.ndone atomic. This clearly forces the order of
these operations, ensuring the race doesn't happen.
Fixes#19305 (though really 978af9c2db fixed it).
Change-Id: Ieb1a84e1e5044c33ac612c8a5ab6297e7db4c57d
Reviewed-on: https://go-review.googlesource.com/43311
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This adds debugging information when we panic with "heapBitsForSpan:
base out of range".
Updates #20259.
Change-Id: I0dc1a106aa9e9531051c7d08867ace5ef230eb3f
Reviewed-on: https://go-review.googlesource.com/43310
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
They are not exported and not used in the compiler or standard library.
Change-Id: Ie1d210464f826742d282f12258ed1792cbd2d188
Reviewed-on: https://go-review.googlesource.com/43135
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Implements detection of x86 cpu features that
are used in the go standard library.
Changes all standard library packages to use the new cpu package
instead of using runtime internal variables to check x86 cpu features.
Updates: #15403
Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9
Reviewed-on: https://go-review.googlesource.com/41476
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
TestGoroutineCounts currently depends on timing to get 100 goroutines
to a known blocking point before taking a profile. This fails
frequently, with different goroutines captured at different stacks.
The test is disabled on openbsd because it was too flaky, but in fact
it flakes on all platforms.
Fix this by using Gosched instead of timing. This is both much more
reliable and makes the test run faster.
Fixes#15156.
Change-Id: Ia6e894196d717655b8fb4ee96df53f6cc8bc5f1f
Reviewed-on: https://go-review.googlesource.com/42953
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Because the hint parameter is supposed to be treated
purely as a hint, if it doesn't meet the requirements
we disregard it and continue as if there was no hint
at all.
Fixes#19926
Change-Id: I86e7f99472fad6b99ba4e2fd33e4a9e55d55115e
Reviewed-on: https://go-review.googlesource.com/40854
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Changes all cpu features to be detected and stored in bools in rt0_go.
Updates: #15403
Change-Id: I5a9961cdec789b331d09c44d86beb53833d5dc3e
Reviewed-on: https://go-review.googlesource.com/41950
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
overLoadFactor used a uintptr for its calculations.
When the number of potential buckets was large,
perhaps due to a coding error or corrupt/malicious user input
leading to a very large map size hint,
this led to overflow on 32 bit systems.
This overflow resulted in an infinite loop.
Prevent it by always using a 64 bit calculation.
Updates #20195
Change-Id: Iaabc710773cd5da6754f43b913478cc5562d89a2
Reviewed-on: https://go-review.googlesource.com/42185
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Currently Go sets the system-wide timer resolution to 1ms the whole
time it's running. This has negative affects on system performance and
power consumption. Unfortunately, simply reducing the timer resolution
to the default 15ms interferes with several sleeps in the runtime
itself, including sysmon's ability to interrupt goroutines.
This commit takes a hybrid approach: it only reduces the timer
resolution when the Go process is entirely idle. When the process is
idle, nothing needs a high resolution timer. When the process is
non-idle, it's already consuming CPU so it doesn't really matter if
the OS also takes timer interrupts more frequently.
Updates #8687.
Change-Id: I0652564b4a36d61a80e045040094a39c19da3b06
Reviewed-on: https://go-review.googlesource.com/38403
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Currently the pprof tests re-symbolize PCs in profiles, and do so in a
way that can't handle inlining. Proto profiles already contain full
symbol information, so this modifies the tests to use the symbol
information already present in the profile.
Change-Id: I63cd491de7197080fd158b1e4f782630f1bbbb56
Reviewed-on: https://go-review.googlesource.com/41255
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
Currently _TinySizeClass is untyped, which means it can accidentally
be used as a spanClass (not that I would know this from experience or
anything). Make it an int8 to avoid this mix up.
This is a cherry-pick of dev.garbage commit 81b74bf9c5.
Change-Id: I1e69eccee436ea5aa45e9a9828a013e369e03f1a
Reviewed-on: https://go-review.googlesource.com/41254
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This is no longer necessary now that we can more efficiently consult
the span's noscan bit.
This is a cherry-pick of dev.garbage commit 312aa09996.
Change-Id: Id0b00b278533660973f45eb6efa5b00f373d58af
Reviewed-on: https://go-review.googlesource.com/41252
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, we mix objects with pointers and objects without pointers
("noscan" objects) together in memory. As a result, for every object
we grey, we have to check that object's heap bits to find out if it's
noscan, which adds to the per-object cost of GC. This also hurts the
TLB footprint of the garbage collector because it decreases the
density of scannable objects at the page level.
This commit improves the situation by using separate spans for noscan
objects. This will allow a much simpler noscan check (in a follow up
CL), eliminate the need to clear the bitmap of noscan objects (in a
follow up CL), and improves TLB footprint by increasing the density of
scannable objects.
This is also a step toward eliminating dead bits, since the current
noscan check depends on checking the dead bit of the first word.
This has no effect on the heap size of the garbage benchmark.
We'll measure the performance change of this after the follow-up
optimizations.
This is a cherry-pick from dev.garbage commit d491e550c3. The only
non-trivial merge conflict was in updatememstats in mstats.go, where
we now have to separate the per-spanclass stats from the per-sizeclass
stats.
Change-Id: I13bdc4869538ece5649a8d2a41c6605371618e40
Reviewed-on: https://go-review.googlesource.com/41251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
In particular, this says that Frames.Function uniquely identifies a
function within a program. We depend on this in various places that
use runtime.Frames in std, but it wasn't actually written down.
Change-Id: Ie7ede348c17673e11ae513a094862b60c506abc5
Reviewed-on: https://go-review.googlesource.com/41610
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Profile labels added by the user using pprof.Do, if present will
be in a *labelMap stored in the unsafe.Pointer 'tag' field of
the profile map entry. This change extracts the labels from the tag
field and writes them to the profile proto.
Change-Id: Ic40fdc58b66e993ca91d5d5effe0e04ffbb5bc46
Reviewed-on: https://go-review.googlesource.com/39613
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
If g1 sets its labels and then they are copied into a profile buffer
and then g2 reads the profile buffer and inspects the labels,
the race detector must understand that g1's recording of the labels
happens before g2's use of the labels. Make that so.
Fixes race test failure in CL 39613.
Change-Id: Id7cda1c2aac6f8eef49213b5ca414f7154b4acfa
Reviewed-on: https://go-review.googlesource.com/42111
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
Delete old TestRuntimeFunctionTrimming, which is testing a dead API
and is now handled in end-to-end tests.
Change-Id: I64fc2991ed4a7690456356b5f6b546f36935bb67
Reviewed-on: https://go-review.googlesource.com/41815
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
This may improve perormance during concurrent access
to mheap.central array from multiple CPU cores.
Change-Id: I8f48dd2e72aa62e9c32de07ae60fe552d8642782
Reviewed-on: https://go-review.googlesource.com/41550
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Mostly unnecessary *testing.T arguments.
Found with github.com/mvdan/unparam.
Change-Id: Ifb955cb88f2ce8784ee4172f4f94d860fa36ae9a
Reviewed-on: https://go-review.googlesource.com/41691
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reduces cmd/go by 4464 bytes on amd64.
Removes the duplicate detection of AVX support and
presence of Intel processors.
Change-Id: I4670189951a63760fae217708f68d65e94a30dc5
Reviewed-on: https://go-review.googlesource.com/41570
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Implemented low-level time system for windows on hardware (software),
which does not support memory mapped _KSYSTEM_TIME page update.
In particular this problem exists on Wine where _KSYSTEM_TIME
only contains time at the start, and is never modified.
On start we try to detect Wine and if it's so we fallback to
GetSystemTimeAsFileTime() for current time and a monotonic
timer based on QueryPerformanceCounter family of syscalls:
https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspxFixes#18537
Change-Id: I269d22467ed9b0afb62056974d23e731b80c83ed
Reviewed-on: https://go-review.googlesource.com/35710
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The experiment "clobberdead" clobbers all pointer fields that the
compiler thinks are dead, just before and after every safepoint.
Useful for debugging the generation of live pointer bitmaps.
Helped find the following issues:
Update #15936
Update #16026
Update #16095
Update #18860
Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9
Reviewed-on: https://go-review.googlesource.com/23924
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently TestSetGCPercent checks that NextGC is within 10 MB of the
expected value. For some reason it's much noisier on some of the
builders. To get these passing again, raise the threshold to 20 MB.
Change-Id: I14e64025660d782d81ff0421c1eb898f416e11fe
Reviewed-on: https://go-review.googlesource.com/41374
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Currently SetGCPercent forces a GC in order to recompute GC pacing.
Since we can now recompute pacing on the fly using gcSetTriggerRatio,
change SetGCPercent (really runtime.setGCPercent) to go through
gcSetTriggerRatio and not trigger a GC.
Fixes#19076.
Change-Id: Ib30d7ab1bb3b55219535b9f238108f3d45a1b522
Reviewed-on: https://go-review.googlesource.com/39835
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
The current SetGCPercent test is, shall we say, minimal.
Expand it to check that the GC target is actually computed and updated
correctly.
For #19076.
Change-Id: I6e9b2ee0ef369f22f72e43b58d89e9f1e1b73b1b
Reviewed-on: https://go-review.googlesource.com/39834
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This changes gcSetTriggerRatio so it can be called even during
concurrent mark or sweep. In this case, it will adjust the pacing of
the current phase, accounting for progress that has already been made.
To make this work for concurrent sweep, this introduces a "basis" for
the pagesSwept count, much like the basis we just introduced for
heap_live. This lets gcSetTriggerRatio shift the basis to the current
heap_live and pagesSwept and compute a slope from there to completion.
This avoids creating a discontinuity where, if the ratio has
increased, there has to be a flurry of sweep activity to catch up.
Instead, this creates a continuous, piece-wise linear function as
adjustments are made.
For #19076.
Change-Id: Ibcd76aeeb81ff4814b00be7cbd3530b73bbdbba9
Reviewed-on: https://go-review.googlesource.com/39833
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, proportional sweep maintains its own count of how many
bytes have been allocated since the beginning of the sweep cycle so it
can compute how many pages need to be swept for a given allocation.
However, this requires a somewhat complex reimbursement scheme since
proportional sweep must be done before a span is allocated, but we
don't know how many bytes to charge until we've allocated a span. This
means that the allocated byte count used by proportional sweep can go
up and down, which has led to underflow bugs in the past (#18043) and
is going to interfere with adjusting sweep pacing on-the-fly (for #19076).
This approach also means we're maintaining a statistic that is very
closely related to heap_live, but has a different 0 value. This is
particularly confusing because the sweep ratio is computed based on
heap_live, so you have to understand that these two statistics are
very closely related.
Replace all of this and compute the sweep debt directly from the
current value of heap_live. To make this work, we simply save the
value of heap_live when the sweep ratio is computed to use as a
"basis" for later computing the sweep debt.
This eliminates the need for reimbursement as well as the code for
maintaining the sweeper's version of the live heap size.
For #19076.
Coincidentally fixes#18043, since this eliminates sweep reimbursement
entirely.
Change-Id: I1f931ddd6e90c901a3972c7506874c899251dc2a
Reviewed-on: https://go-review.googlesource.com/39832
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, the computations that derive controls from the GC trigger
are spread across several parts of the mark termination code.
Consolidate computing the absolute trigger, the heap goal, and sweep
pacing into a single function called at the end of mark termination.
Unlike the code being consolidated, this has to be more careful about
negative gcpercent. Many of the consolidated code paths simply didn't
execute if GC was off.
This is a step toward being able to change the GC trigger ratio in the
middle of concurrent sweeping and marking. For this commit, we try to
stick close to the original structure of the code that's being
consolidated, so it doesn't yet support mid-cycle adjustments.
For #19076.
Change-Id: Ic5335be04b96ad20e70d53d67913a86bd6b31456
Reviewed-on: https://go-review.googlesource.com/39831
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
gcController.triggerRatio is the only field in gcController that
persists across cycles. As global mutable state, the places where it
written and read are spread out, making it difficult to see that
updates and downstream calculations are done correctly.
Improve this situation by doing two things:
1) Move triggerRatio to memstats so it lives with the other
trigger-related fields and makes gcController entirely transient
state.
2) Commit the new trigger ratio during mark termination when we
compute other next-cycle controls, including the absolute trigger.
This forces us to explicitly thread the new trigger ratio from
gcController.endCycle to mark termination, so we're not just pulling
it out of global state.
Change-Id: I6669932f8039a8c0ef46a3f2a8c537db72e578aa
Reviewed-on: https://go-review.googlesource.com/39830
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
heap_live is updated atomically without locking, so we should also use
atomic loads to read it. Fix the reads of heap_live that happen
outside of STW to be atomic.
Change-Id: Idca9451c348168c2a792a9499af349833a3c333f
Reviewed-on: https://go-review.googlesource.com/41371
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
On 32-bit architectures (or if we fail to map a 64-bit-style arena),
we try to map the heap arena just above the end of the process image.
While we can accept any address, using lower addresses is preferable
because lower addresses cause us to map less of the heap bitmap.
However, if a program is linked against C code that has global
constructors, those constructors may call brk/sbrk to allocate memory
(e.g., many C malloc implementations do this for small allocations).
The brk also starts just above the process image, so this may adjust
the brk past the beginning of where we want to put the heap arena. In
this case, the kernel will pick a different address for the arena and
it will usually be very high (at least, as these things go in a 32-bit
address space).
Fix this by consulting the current value of the brk and using this in
addition to the end of the process image to compute the initial arena
placement.
This is implemented only on Linux currently, since we have no evidence
that it's an issue on any other OSes.
Fixes#19831.
Change-Id: Id64b45d08d8c91e4f50d92d0339146250b04f2f8
Reviewed-on: https://go-review.googlesource.com/39810
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TestBreakpoint expects to see "runtime.Breakpoint()" in the stack trace.
If runtime.Breakpoint() is inlined, then the stack trace prints
"runtime.Breakpoint(...)" since the runtime does not have information
about arguments (or lack thereof) to inlined functions. This change
makes the test independent of inlining by looking for the string
"runtime.Breakpoint(". Now TestBreakpoint passes with -l=4.
Change-Id: Ia044a8e8a4de2337cb2b393d6fa78c73a2f25926
Reviewed-on: https://go-review.googlesource.com/40997
Run-TryBot: David Lazar <lazard@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TestBlockProfile matches samples against a regexp that accepts "," in
profile PCs. I suspect this was just a syntax mistake. Remove "," from
the character class.
Change-Id: Idcfc20ed6900075abae08597ba71db559e89b37b
Reviewed-on: https://go-review.googlesource.com/41111
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Peter Weinberger <pjw@google.com>
TestBlockProfile currently requires exactly five PCs in each sample.
With more aggressive inlining there may be fewer, so change this test
to use the same pattern as TestMutexProfile, which accepts one or more
PCs. With this change, this test passes when compiled with -l=4.
Change-Id: I1421a6d56c96b77111bdc671d88723a222672fd6
Reviewed-on: https://go-review.googlesource.com/41110
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
CL 40876 changed ExampleFrames so that the output
was stable with and without mid-stack inlining.
However, that change lost some of the
pedagogical and copy/paste value of the example.
It was unclear why both more and i were being tracked,
and whether the 5 in i < 5 is related to len(pc),
and if so, why and how.
This CL rewrites the example with lots more comments,
and such that the core structure more closely matches
normal usage, and such that it is obvious
which lines of code should be deleted when copying.
As a bonus, it also now illustrates Frame.File.
Change-Id: Iab73541dd096657ddf79c5795337e8b596d89740
Reviewed-on: https://go-review.googlesource.com/41136
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
The period recorded in CPU profiles is in nanoseconds, but was being
computed incorrectly as hz * 1000. As a result, many absolute times
displayed by pprof were incorrect.
Fix this by computing the period correctly.
Change-Id: I6fadd6d8ad3e57f31e8cc7a25a24fcaec510d8d4
Reviewed-on: https://go-review.googlesource.com/40995
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Instead of populating the aux symbol
of CALLudiv during rewrite rules,
populate it during genssa.
This simplifies the rewrite rules.
It also removes all remaining calls
to ctxt.Lookup from any rewrite rules.
This is a first step towards removing
ctxt from ssa.Cache entirely,
and also a first step towards converting
the obj.LSym.Version field into a boolean.
It should also speed up compilation.
Also, move func udiv into package runtime.
That's where it is anyway,
and it lets udiv look and act like the rest of
the runtime support functions.
Change-Id: I41462a632c14fdc41f61b08049ec13cd80a87bfe
Reviewed-on: https://go-review.googlesource.com/41191
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Returns at the end of func bodies where the funcs have no return values
are pointless.
Change-Id: I0da5ea78671503e41a9f56dd770df8c919310ce5
Reviewed-on: https://go-review.googlesource.com/41093
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This extends the GCSweepDone event with counts of swept and reclaimed
bytes. These are useful for understanding the duration and
effectiveness of sweep events.
Change-Id: I3c97a4f0f3aad3adbd188adb264859775f54e2df
Reviewed-on: https://go-review.googlesource.com/40811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Currently, each individual span sweep emits a span to the trace. But
sweeps are generally done in loops until some condition is satisfied,
so this tracing is lower-level than anyone really wants any hides the
fact that no other work is being accomplished between adjacent sweep
events. This is also high overhead: enabling tracing significantly
impacts sweep latency.
Replace this with instead tracing around the sweep loops used for
allocation. This is slightly tricky because sweep loops don't
generally know if any sweeping will happen in them. Hence, we make the
tracing lazy by recording in the P that we would like to start tracing
the sweep *if* one happens, and then only closing the sweep event if
we started it.
This does mean we don't get tracing on every sweep path, which are
legion. However, we get much more informative tracing on the paths
that block allocation, which are the paths that matter.
Change-Id: I73e14fbb250acb0c9d92e3648bddaa5e7d7e271c
Reviewed-on: https://go-review.googlesource.com/40810
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Changes the text to match GOOS which appends 'and so on' at the
end to avoid restricting the set of possible values.
Change-Id: I54bcde71334202cf701662cdc2582c974ba8bf53
Reviewed-on: https://go-review.googlesource.com/41074
Reviewed-by: Ian Lance Taylor <iant@golang.org>
When allocating a non-small array of buckets for a map,
also preallocate some overflow buckets.
The estimate of the number of overflow buckets
is based on a simulation of putting mid=(low+high)/2 elements
into a map, where low is the minimum number of elements
needed to reach this value of b (according to overLoadFactor),
and high is the maximum number of elements possible
to put in this value of b (according to overLoadFactor).
This estimate is surprisingly reliable and accurate.
The number of overflow buckets needed is quadratic,
for a fixed value of b.
Using this mid estimate means that we will overallocate a few
too many overflow buckets when the actual number of elements is near low,
and underallocate significantly too few overflow buckets
when the actual number of elements is near high.
The mechanism introduced in this CL can be re-used for
other overflow bucket optimizations.
For example, given an initial size hint,
we could estimate quite precisely the number of overflow buckets.
This is #19931.
We could also change from "non-nil means end-of-list"
to "pointer-to-hmap.buckets means end-of-list",
and then create a linked list of reusable overflow buckets
when they are freed by map growth.
That is #19992.
We could also use a similar mechanism to do bulk allocation
of overflow buckets.
All these uses can co-exist with only the one additional pointer
in mapextra, given a little care.
name old time/op new time/op delta
MapPopulate/1-8 60.1ns ± 2% 60.3ns ± 2% ~ (p=0.278 n=19+20)
MapPopulate/10-8 577ns ± 1% 578ns ± 1% ~ (p=0.140 n=20+20)
MapPopulate/100-8 8.06µs ± 1% 8.19µs ± 1% +1.67% (p=0.000 n=20+20)
MapPopulate/1000-8 104µs ± 1% 104µs ± 1% ~ (p=0.317 n=20+20)
MapPopulate/10000-8 891µs ± 1% 888µs ± 1% ~ (p=0.101 n=19+20)
MapPopulate/100000-8 8.61ms ± 1% 8.58ms ± 0% -0.34% (p=0.009 n=20+17)
name old alloc/op new alloc/op delta
MapPopulate/1-8 0.00B 0.00B ~ (all equal)
MapPopulate/10-8 179B ± 0% 179B ± 0% ~ (all equal)
MapPopulate/100-8 3.33kB ± 0% 3.38kB ± 0% +1.48% (p=0.000 n=20+16)
MapPopulate/1000-8 55.5kB ± 0% 53.4kB ± 0% -3.84% (p=0.000 n=19+20)
MapPopulate/10000-8 432kB ± 0% 428kB ± 0% -1.06% (p=0.000 n=19+20)
MapPopulate/100000-8 3.65MB ± 0% 3.62MB ± 0% -0.70% (p=0.000 n=20+20)
name old allocs/op new allocs/op delta
MapPopulate/1-8 0.00 0.00 ~ (all equal)
MapPopulate/10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal)
MapPopulate/100-8 18.0 ± 0% 17.0 ± 0% -5.56% (p=0.000 n=20+20)
MapPopulate/1000-8 96.0 ± 0% 72.6 ± 1% -24.38% (p=0.000 n=20+20)
MapPopulate/10000-8 625 ± 0% 319 ± 0% -48.86% (p=0.000 n=20+20)
MapPopulate/100000-8 6.23k ± 0% 4.00k ± 0% -35.79% (p=0.000 n=20+20)
Change-Id: I01f41cb1374bdb99ccedbc00d04fb9ae43daa204
Reviewed-on: https://go-review.googlesource.com/40979
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Any change to how we allocate overflow buckets
will require some extra hmap storage,
but we don't want hmap to grow,
particular as small maps usually don't need overflow buckets.
This CL converts the existing hmap overflow field,
which is usually used for pointer-free maps,
into a generic extra field.
This extra field can be used to hold data that is optional.
If it is valuable enough to do have special
handling of overflow buckets, which are medium-sized,
it is valuable enough to pay an extra alloc and two extra words for.
Adding fields to extra would entail adding overhead to pointer-free maps;
any mapextra fields added would need to be weighed against that.
This CL is just rearrangement, though.
Updates #19931
Updates #19992
Change-Id: If8537a206905b9d4dc6cd9d886184ece671b3f80
Reviewed-on: https://go-review.googlesource.com/40976
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This simplifies the code, as well as providing
a single place to modify to change the
allocation of new overflow buckets.
Updates #19931
Updates #19992
Change-Id: I77070619f5c8fe449bbc35278278bca5eda780f2
Reviewed-on: https://go-review.googlesource.com/40975
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Only the noinline pragma on testCallerFoo is needed to pass the test,
but the second pragma makes the test robust to future changes to the
inliner.
Change-Id: I80b384380c598f52e0382f53b59bb47ff196363d
Reviewed-on: https://go-review.googlesource.com/40877
Run-TryBot: David Lazar <lazard@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Otherwise, with -l=4, runtime.Callers gets inlined and the example
prints too many frames. Now the example passes with -l=4.
Change-Id: I9e420af9371724ac3ec89efafd76a658cf82bb4a
Reviewed-on: https://go-review.googlesource.com/40876
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This rewrites runtime.Caller in terms of stackExpander, which already
handles inlined frames and partially skipped frames. This also has the
effect of making runtime.Caller understand cgo frames if there is a cgo
symbolizer.
Updates #19348.
Change-Id: Icdf4df921aab5aa394d4d92e3becc4dd169c9a6e
Reviewed-on: https://go-review.googlesource.com/40270
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
The Frames API forces the PC slice to escape to the heap because it
stores it in the Frames object. However, we'd like to use this API for
call stack expansion internally in the runtime in places where it
would be very good to avoid heap allocation.
This commit makes this possible by pulling the bulk of the Frames
implementation into an internal frameExpander API. The key difference
between these APIs is that the frameExpander does not hold the PC
slice; instead, the caller is responsible for threading the PC slice
through the frameExpander API calls. This makes it possible to keep
the PC slice on the stack. The Frames API then becomes a thin shim
around the frameExpander that keeps the PC slice in the Frames object.
Change-Id: If6b2d0b9132a2a905a0cf5deced9feddce76fc0e
Reviewed-on: https://go-review.googlesource.com/40610
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
While debugging a recent regression it was discovered that
the assembler for ppc64x was not always generating the correct
instruction for DS form loads and stores. When an instruction
is DS form then the offset must be a multiple of 4, and if it
isn't then bits outside the offset field were being incorrectly
set resulting in unexpected and incorrect instructions.
This change adds a check to determine when the opcode is DS form
and then verifies that the offset is a multiple of 4 before
generating the instruction, otherwise logs an error.
This also changes a few asm files that were using unaligned offsets
for DS form loads and stores. In the runtime package these were
instructions intended to cause a crash so using aligned or unaligned
offsets doesn't change that behavior.
Change-Id: Ie3a7e1e65dcc9933b54de7a46a054da8459cb56f
Reviewed-on: https://go-review.googlesource.com/40476
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Now the runtime/trace tests pass with -l=4.
This also gets rid of the frames cache for multiple reasons:
1) The frames cache was used to avoid repeated calls to funcname and
funcline. Now these calls happen inside the CallersFrames iterator.
2) Maintaining a frames cache is harder: map[uintptr]traceFrame
doesn't work since each PC can map to multiple traceFrames.
3) It's not clear that the cache is important.
Change-Id: I2914ac0b3ba08e39b60149d99a98f9f532b35bbb
Reviewed-on: https://go-review.googlesource.com/40591
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This extends the sweeper to free workbufs back to the heap between GC
cycles, allowing this memory to be reused for GC'd allocations or
eventually returned to the OS.
This helps for applications that have high peak heap usage relative to
their regular heap usage (for example, a high-memory initialization
phase). Workbuf memory is roughly proportional to heap size and since
we currently never free workbufs, it's proportional to *peak* heap
size. By freeing workbufs, we can release and reuse this memory for
other purposes when the heap shrinks.
This is somewhat complicated because this costs ~1–2 µs per workbuf
span, so for large heaps it's too expensive to just do synchronously
after mark termination between starting the world and dropping the
worldsema. Hence, we do it asynchronously in the sweeper. This adds a
list of "free" workbuf spans that can be returned to the heap. GC
moves all workbuf spans to this list after mark termination and the
background sweeper drains this list back to the heap. If the sweeper
doesn't finish, that's fine, since getempty can directly reuse any
remaining spans to allocate more workbufs.
Performance impact is negligible. On the x/benchmarks, this reduces
GC-bytes-from-system by 6–11%.
Fixes#19325.
Change-Id: Icb92da2196f0c39ee984faf92d52f29fd9ded7a8
Reviewed-on: https://go-review.googlesource.com/38582
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently the runtime allocates workbufs from persistent memory, which
means they can never be freed.
Switch to allocating them from manually-managed heap spans. This
doesn't free them yet, but it puts us in a position to do so.
For #19325.
Change-Id: I94b2512a2f2bbbb456cd9347761b9412e80d2da9
Reviewed-on: https://go-review.googlesource.com/38581
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This introduces a new type, *gcBits, to use for alloc/mark bitmap
allocations instead of *uint8. This type is marked go:notinheap, so
uses of it correctly eliminate write barriers. Since we now have a
type, this also extracts some common operations to methods both for
convenience and to avoid (*uint8) casts at most use sites.
For #19325.
Change-Id: Id51f734fb2e96b8b7715caa348c8dcd4aef0696a
Reviewed-on: https://go-review.googlesource.com/38580
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This clarifies that the gcBits type is actually an arena of gcBits and
will let us introduce a new gcBits type representing a single
mark/alloc bitmap allocated from the arena.
For #19325.
Change-Id: Idedf76d202d9174a17c61bcca9d5539e042e2445
Reviewed-on: https://go-review.googlesource.com/38579
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, manually-managed spans are included in memstats.heap_inuse
and memstats.heap_sys, but when we export these stats to the user, we
subtract out how much has been allocated for stack spans from both.
This works for now because stacks are the only manually-managed spans
we have.
However, we're about to use manually-managed spans for more things
that don't necessarily have obvious stats we can use to adjust the
user-presented numbers. Prepare for this by changing the accounting so
manually-managed spans don't count toward heap_inuse or heap_sys. This
makes these fields align with the fields presented to the user and
means we don't have to track more statistics just so we can adjust
these statistics.
For #19325.
Change-Id: I5cb35527fd65587ff23339276ba2c3969e2ad98f
Reviewed-on: https://go-review.googlesource.com/38577
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
We're going to start using manually-managed spans for GC workbufs, so
rename the allocate/free methods and pass in a pointer to the stats to
use instead of using the stack stats directly.
For #19325.
Change-Id: I37df0147ae5a8e1f3cb37d59c8e57a1fcc6f2980
Reviewed-on: https://go-review.googlesource.com/38576
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
We're going to use this free list for other types of manually-managed
memory in the heap.
For #19325.
Change-Id: Ib7e682295133eabfddf3a84f44db43d937bfdd9c
Reviewed-on: https://go-review.googlesource.com/38575
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
We're about to generalize _MSpanStack to be used for other forms of
in-heap manual memory management in the runtime. This is an automated
rename of _MSpanStack to _MSpanManual plus some comment fix-ups.
For #19325.
Change-Id: I1e20a57bb3b87a0d324382f92a3e294ffc767395
Reviewed-on: https://go-review.googlesource.com/38574
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently CallersFrames expands each PC to a slice of Frames and then
iteratively returns those Frames. However, this makes it very
difficult to avoid heap allocation: either the Frames slice will be
heap allocated, or, if it uses internal scratch space for small slices
(as it currently does), the Frames object itself has to be heap
allocated.
Fix this, at least in the common case, by expanding each PC
iteratively. We introduce a new pcExpander type that's responsible for
expanding a single PC. This maintains state from one Frame to the next
in the same PC. Frames then becomes a wrapper around this responsible
for feeding it the next PC when the pcExpander runs out of frames for
the current PC.
This makes it possible to stack-allocate a Frames object, which will
make it possible to use this API for PC expansion from within the
runtime itself.
Change-Id: I993463945ab574557cf1d6bedbe79ce7e9cbbdcd
Reviewed-on: https://go-review.googlesource.com/40434
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
Prevent a crash if the same type in two plugins had a recursive
definition, either by referring to a pointer to itself or a map existing
with the type as a value type (which creates a recursive definition
through the overflow bucket type).
Fixes#19258
Change-Id: Iac1cbda4c5b6e8edd5e6859a4d5da3bad539a9c6
Reviewed-on: https://go-review.googlesource.com/40292
Run-TryBot: Todd Neal <todd@tneal.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
This was unintentionally emptied rather than removed in 9417c022.
Change-Id: Ie6fdcf7ef55e58f12e2a2750ab448aa2d9f94d15
Reviewed-on: https://go-review.googlesource.com/40413
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
OpenBSD 6.0 and later have support for PT_TLS in ld.so(1). Now that OpenBSD
6.1 has been released, OpenBSD 5.9 is no longer officially supported and Go
can start generating PT_TLS for OpenBSD cgo binaries. This also allows us
to remove the workarounds in the OpenBSD cgo runtime.
This change also removes the environ and progname exports - these are now
provided directly by ld.so(1) itself.
Fixes#19932
Change-Id: I42e75ef9feb5dcd4696add5233497e3cbc48ad52
Reviewed-on: https://go-review.googlesource.com/40331
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The hardware divider is an optional component of ARMv7. This patch
detects whether it is available in runtime and use it or not.
1. The hardware divider is detected at startup and a flag is set/clear
according to a perticular bit of runtime.hwcap.
2. Each call of runtime.udiv will check this flag and decide if
use the hardware division instruction.
A rough test shows the performance improves 40-50% for ARMv7. And
the compatibility of ARMv5/v6 is not broken.
fixes#19118
Change-Id: Ic586bc9659ebc169553ca2004d2bdb721df823ac
Reviewed-on: https://go-review.googlesource.com/37496
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Changing mheap_.arena_used requires several steps that are currently
repeated multiple times in mheap_.sysAlloc. Consolidate these into a
single function.
In the future, this will also make it easier to add other auxiliary VM
structures.
Change-Id: Ie68837d2612e1f4ba4904acb1b6b832b15431d56
Reviewed-on: https://go-review.googlesource.com/40151
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The runtime.writeBarrier variable tries to be helpful by telling you
that the compiler also knows about this variable, which you could
probably guess, but doesn't say how the compiler knows about it. In
fact, the compiler has a complete copy in builtin/runtime.go that
needs to be kept in sync. Say so.
Change-Id: Ia7fb0c591cb6f9b8230decce01008b417dfcec89
Reviewed-on: https://go-review.googlesource.com/40150
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
They are dead code already, but the verifier is still not happy.
Don't assemble them at all.
Looks like it has been like that for long. I don't know why it
was ok. Maybe the verifier is now more picky?
Fixes#19884.
Change-Id: Ib806fb73ca469789dec56f52d484cf8baf7a245c
Reviewed-on: https://go-review.googlesource.com/40111
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Dave Cheney <dave@cheney.net>
This code was added recently, and it doesn't seem like the parameter
will be useful in the near future.
Change-Id: I5d64dadb6820c159b588262ab90df2461b5fdf04
Reviewed-on: https://go-review.googlesource.com/39692
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Stack spans don't internally use many of the fields of the mspan,
which means things like the size class and element size get left over
from whatever last used the mspan. This can lead to confusing crashes
and debugging.
Zero these fields or initialize them to something reasonable. This
also lets us simplify some code that currently has to distinguish
between heap and stack spans.
Change-Id: I9bd114e76c147bb32de497045b932f8bf1988bbf
Reviewed-on: https://go-review.googlesource.com/38573
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Commit 44ed88a5a7 moved printing of the "sweep done" gcpacertrace
message so that it is printed when the final sweeper finishes.
However, by this point some other thread has often already observed
that there are no more spans to sweep and zeroed sweepPagesPerByte.
Avoid printing a 0 sweep ratio in the trace when this race happens by
getting the value of the sweep ratio upon entry to sweepone and
printing that.
Change-Id: Iac0c48ae899e12f193267cdfb012c921f8b71c85
Reviewed-on: https://go-review.googlesource.com/39492
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Providing size hint when creating a map allows avoiding re-allocating
underlying data structure if we know how many elements are going to
be inserted. This can be used for example during decoding maps in
gob.
Fixes#19599
Change-Id: I108035fec29391215d2261a73eaed1310b46bab1
Reviewed-on: https://go-review.googlesource.com/38335
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently runtime.GC() triggers a STW GC. For common uses in tests and
benchmarks, it doesn't matter whether it's STW or concurrent, but for
uses in servers for things like collecting heap profiles and
controlling memory footprint, this pause can be a bit problem for
latency.
This changes runtime.GC() to trigger a concurrent GC. In order to
remain as close as possible to its current meaning, we define it to
always perform a full mark/sweep GC cycle before returning (even if
that means it has to finish up a cycle we're in the middle of first)
and to publish the heap profile as of the triggered mark termination.
While it must perform a full cycle, simultaneous runtime.GC() calls
can be consolidated into a single full cycle.
Fixes#18216.
Change-Id: I9088cc5deef4ab6bcf0245ed1982a852a01c44b5
Reviewed-on: https://go-review.googlesource.com/37520
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
sweepone returns ^uintptr(0) when there are no more spans to *start*
sweeping, but there may be spans being swept concurrently at the time
and there's currently no efficient way to tell when the sweeper is
done sweeping all the spans.
We'll need this for concurrent runtime.GC(), so add a count of the
number of active sweepone calls to make it possible to block until
sweeping is truly done.
This is also useful for more accurately printing the gcpacertrace,
since that should be printed after all of the sweeping stats are in
(currently we can print it slightly too early).
For #18216.
Change-Id: I06e6240c9e7b40aca6fd7b788bb6962107c10a0f
Reviewed-on: https://go-review.googlesource.com/37716
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Forced GCs don't provide good information about how to adjust the GC
trigger. Currently we avoid adjusting the trigger on forced GC because
forced GC is also STW and we don't adjust the trigger on STW GC.
However, this will become a problem when forced GC is concurrent.
Fix this by skipping trigger adjustment if the GC was user-forced.
For #18216.
Change-Id: I03dfdad12ecd3cfeca4573140a0768abb29aac5e
Reviewed-on: https://go-review.googlesource.com/38951
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently gcMode != gcBackgroundMode implies this was a user-forced GC
cycle. This is no longer going to be true when we make runtime.GC()
trigger a concurrent GC, so replace this with an explicit
work.userForced bit.
For #18216.
Change-Id: If7d71bbca78b5f0b35641b070f9d457f5c9a52bd
Reviewed-on: https://go-review.googlesource.com/37519
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently freeOSMemory calls gcStart directly, but we really just want
it to behave like runtime.GC() and then perform a scavenge, so make it
call runtime.GC() rather than gcStart.
For #18216.
Change-Id: I548ec007afc788e87d383532a443a10d92105937
Reviewed-on: https://go-review.googlesource.com/37518
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Now that the gcMode is no longer involved in the GC trigger condition,
we can simplify the triggering of forced GCs. By making the trigger
condition for forced GCs true even if gcphase is not _GCoff, we don't
need any special case path in gcStart to ensure that forced GCs don't
get consolidated.
Change-Id: I6067a13d76e40ff2eef8fade6fc14adb0cb58ee5
Reviewed-on: https://go-review.googlesource.com/37517
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently the GC triggering condition is an awkward combination of the
gcMode (whether or not it's gcBackgroundMode) and a boolean
"forceTrigger" flag.
Replace this with a new gcTrigger type that represents the range of
transition predicates we need. This has several advantages:
1. We can remove the awkward logic that affects the trigger behavior
based on the gcMode. Now gcMode purely controls whether to run a
STW GC or not and the gcTrigger controls whether this is a forced
GC that cannot be consolidated with other GC cycles.
2. We can lift the time-based triggering logic in sysmon to just
another type of GC trigger and move the logic to the trigger test.
3. This sets us up to have a cycle count-based trigger, which we'll
use to make runtime.GC trigger concurrent GC with the desired
consolidation properties.
For #18216.
Change-Id: If9cd49349579a548800f5022ae47b8128004bbfc
Reviewed-on: https://go-review.googlesource.com/37516
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently sysmon triggers periodic GC if GC is not currently running
and it's been long enough since the last GC. This misses some
important conditions; for example, whether GC is enabled at all by
GOGC. As a result, if GOGC is off, once we pass the timeout for
periodic GC, sysmon will attempt to trigger a GC every 10ms. This GC
will be a no-op because gcStart will check all of the appropriate
conditions and do nothing, but it still goes through the motions of
waking the forcegc goroutine and printing a gctrace line.
Fix this by making sysmon call gcShouldStart to check *all* of the
appropriate transition conditions before attempting to trigger a
periodic GC.
Fixes#19247.
Change-Id: Icee5521ce175e8419f934723849853d53773af31
Reviewed-on: https://go-review.googlesource.com/37515
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently the heap profile is flushed by *either* gcSweep in STW mode
or by gcMarkTermination in concurrent mode. Simplify this by making
gcMarkTermination always flush the heap profile and by making gcSweep
do one extra flush (instead of two) in STW mode.
Change-Id: I62147afb2a128e1f3d92ef4bb8144c8a345f53c4
Reviewed-on: https://go-review.googlesource.com/37715
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently we snapshot the heap profile just *after* mark termination
starts the world because it's a relatively expensive operation.
However, this means any alloc or free events that happen between
starting the world and snapshotting the heap profile can be accounted
to the wrong cycle. In the worst case, a free can be accounted to the
cycle before the alloc; if the heap is small, this can result
temporarily in a negative "in use" count in the profile.
Fix this without making STW more expensive by using a global heap
profile cycle counter. This lets us split up the operation into a two
parts: 1) a super-cheap snapshot operation that simply increments the
global cycle counter during STW, and 2) a more expensive cleanup
operation we can do after starting the world that frees up a slot in
all buckets for use by the next heap profile cycle.
Fixes#19311.
Change-Id: I6bdafabf111c48b3d26fe2d91267f7bef0bd4270
Reviewed-on: https://go-review.googlesource.com/37714
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently memRecord has the same set of four fields repeated three
times. Pull these into a type and use this type three times. This
cleans up and simplifies the code a bit and will make it easier to
switch to a globally tracked heap profile cycle for #19311.
Change-Id: I414d15673feaa406a8366b48784437c642997cf2
Reviewed-on: https://go-review.googlesource.com/37713
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Every time I modify heap profiling, I find myself redrawing this
diagram, so add it to the comments. This shows how allocations and
frees are accounted, how we arrive at consistent profile snapshots,
and when those snapshots are published to the user.
Change-Id: I106aba1200af3c773b46e24e5f50205e808e2c69
Reviewed-on: https://go-review.googlesource.com/37514
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Now that we have a nice predicate system, improve the tests performed
by TestMemStats. We add some more non-zero checks (now that we force a
GC, things like NumGC must be non-zero), checks for trivial boolean
fields, and a few more range checks.
Change-Id: I6da46d33fa0ce5738407ee57d587825479413171
Reviewed-on: https://go-review.googlesource.com/37513
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently most TestMemStats failures dump the whole MemStats object if
anything is amiss without telling you what is amiss, or even which
field is wrong. This makes it hard to figure out what the actual
problem is.
Replace this with a reflection walk over MemStats and a map of
predicates to check. If one fails, we can construct a detailed and
descriptive error message. The predicates are a direct translation of
the current tests.
Change-Id: I5a7cafb8e6a1eeab653d2e18bb74e2245eaa5444
Reviewed-on: https://go-review.googlesource.com/37512
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
The `skip` argument passed to runtime.Caller and runtime.Callers should
be interpreted as the number of logical calls to skip (rather than the
number of physical stack frames to skip). This changes runtime.Callers
to skip inlined calls in addition to physical stack frames.
The result value of runtime.Callers is a slice of program counters
([]uintptr) representing physical stack frames. If the `skip` parameter
to runtime.Callers skips part-way into a physical frame, there is no
convenient way to encode that in the resulting slice. To avoid changing
the API in an incompatible way, our solution is to store the number of
skipped logical calls of the first frame in the _second_ uintptr
returned by runtime.Callers. Since this number is a small integer, we
encode it as a valid PC value into a small symbol called:
runtime.skipPleaseUseCallersFrames
For example, if f() calls g(), g() calls `runtime.Callers(2, pcs)`, and
g() is inlined into f, then the frame for f will be partially skipped,
resulting in the following slice:
pcs = []uintptr{pc_in_f, runtime.skipPleaseUseCallersFrames+1, ...}
We store the skip PC in pcs[1] instead of pcs[0] so that `pcs[i:]` will
truncate the captured stack trace rather than grow it for all i.
Updates #19348.
Change-Id: I1c56f89ac48c29e6f52a5d085567c6d77d499cf1
Reviewed-on: https://go-review.googlesource.com/37854
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
The result from CFBundleCopyResourceURL is owned by the caller. This
CL adds the necessary CFRelease to release it after use.
Fixes#19722
Change-Id: I7afe22ef241d21922a7f5cef6498017e6269a5c3
Reviewed-on: https://go-review.googlesource.com/38639
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
We reused the old C stack check mechanism for the implementation of
//go:systemstack, so when we execute a //go:systemstack function on a
user stack, the system fails by calling morestackc. However,
morestackc's message still talks about "executing C code".
Fix morestackc's message to reflect its modern usage.
Change-Id: I7e70e7980eab761c0520f675d3ce89486496030f
Reviewed-on: https://go-review.googlesource.com/38572
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
On Android, the thread local offset is found by looping through memory
starting at the TLS base address. The search is limited to
PTHREAD_KEYS_MAX, but issue 19472 made it clear that in some cases, the
slot is located further from the TLS base.
The limit is merely a sanity check in case our assumptions about the
thread-local storage layout are wrong, so this CL raises it to 384, which
is enough for the test case in issue 19472.
Fixes#19472
Change-Id: I89d1db3e9739d3a7fff5548ae487a7483c0a278a
Reviewed-on: https://go-review.googlesource.com/38636
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Clean up code that does interface equality. Avoid doing checks
in efaceeq/ifaceeq that we already did before calling those routines.
No noticeable performance changes for existing benchmarks.
name old time/op new time/op delta
EfaceCmpDiff-8 604ns ± 1% 553ns ± 1% -8.41% (p=0.000 n=9+10)
Fixes#18618
Change-Id: I3bd46db82b96494873045bc3300c56400bc582eb
Reviewed-on: https://go-review.googlesource.com/38606
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
The darwin linker for ARM does not allow PC-relative relocation
of external symbol in text section. Work around it by accessing
it indirectly: putting its address in a global variable (which is
not external), and accessing through that variable.
Fixes#19684.
Change-Id: I41361bbb281b5dbdda0d100ae49d32c69ed85a81
Reviewed-on: https://go-review.googlesource.com/38596
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
Starting in go1.9, the minimum processor requirement for ppc64 is POWER8. This
means the checks for GOARCH_ppc64 in asm_ppc64x.s can be removed, since we can
assume LBAR and STBCCC instructions (both from ISA 2.06) will always be
available.
Updates #19074
Change-Id: Ib4418169cd9fc6f871a5ab126b28ee58a2f349e2
Reviewed-on: https://go-review.googlesource.com/38406
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>