This ensures that runtime's signal handlers pass through the TSAN and
MSAN libc interceptors and subsequent calls to the intercepted
sigaction function from C will correctly see them.
Fixes#17753.
Change-Id: I9798bb50291a4b8fa20caa39c02a4465ec40bb8d
Reviewed-on: https://go-review.googlesource.com/33142
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In plugins and every program that opens a plugin, include a hash of
every imported package.
There are two versions of each hash: one local and one exported.
As the program starts and plugins are loaded, the first exported
symbol for each package becomes the canonical version.
Any subsequent plugin's local package hash symbol has to match the
canonical version.
Fixes#17832
Change-Id: I4e62c8e1729d322e14b1673bada40fa7a74ea8bc
Reviewed-on: https://go-review.googlesource.com/33161
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Add a variant of sync/atomic's TestUnaligned64 to
runtime/internal/atomic.
Skips the test on arm for now where it's currently failing.
Updates #17786
Change-Id: If63f9c1243e9db7b243a95205b2d27f7d1dc1e6e
Reviewed-on: https://go-review.googlesource.com/33159
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This change is an experimental implementation of asynchronous
cancelable I/O operations on Plan 9, which are required to
implement deadlines.
There are no asynchronous syscalls on Plan 9. I/O operations
are performed with blocking pread and pwrite syscalls.
Implementing deadlines in Go requires a way to interrupt
I/O operations.
It is possible to interrupt reads and writes on a TCP connection
by forcing the closure of the TCP connection. This approach
has been used successfully in CL 31390.
However, we can't implement deadlines with this method, since
we require to be able to reuse the connection after the timeout.
On Plan 9, I/O operations are interrupted when the process
receives a note. We can rely on this behavior to implement
a more generic approach.
When doing an I/O operation (read or write), we start the I/O in
its own process, then wait for the result asynchronously. The
process is able to handle the "hangup" note. When receiving the
"hangup" note, the currently running I/O operation is canceled
and the process returns.
This way, deadlines can be implemented by sending an "hangup"
note to the process running the blocking I/O operation, after
the expiration of a timer.
Fixes#11932.
Fixes#17498.
Change-Id: I414f72c7a9a4f9b8f9c09ed3b6c269f899d9b430
Reviewed-on: https://go-review.googlesource.com/31521
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
When a Go program crashes with GOTRACEBACK=crash, the OS creates a
core dump. Include the text-formatted output of some of the cause of
that crash in the core dump.
Output printed by the runtime before crashing is maintained in a
circular buffer to allow access to messages that may be printed
immediately before calling runtime.throw.
The stack traces printed by the runtime as it crashes are not stored.
The information required to recreate them should be included in the
core file.
Updates #16893
There are no tests covering the generation of core dumps; this change
has not added any.
This adds (reentrant) locking to runtime.gwrite, which may have an
undesired performance impact.
Change-Id: Ia2463be3c12429354d290bdec5f3c8d565d1a2c3
Reviewed-on: https://go-review.googlesource.com/32013
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
I don't have any way to test or reproduce this problem,
but the current code is clearly wrong for Windows.
Make it better.
As I said on #17165:
But the borrowing of M's and the profiling of M's by the CPU profiler
seem not synchronized enough. This code implements the CPU profiler
on Windows:
func profileloop1(param uintptr) uint32 {
stdcall2(_SetThreadPriority, currentThread, _THREAD_PRIORITY_HIGHEST)
for {
stdcall2(_WaitForSingleObject, profiletimer, _INFINITE)
first := (*m)(atomic.Loadp(unsafe.Pointer(&allm)))
for mp := first; mp != nil; mp = mp.alllink {
thread := atomic.Loaduintptr(&mp.thread)
// Do not profile threads blocked on Notes,
// this includes idle worker threads,
// idle timer thread, idle heap scavenger, etc.
if thread == 0 || mp.profilehz == 0 || mp.blocked {
continue
}
stdcall1(_SuspendThread, thread)
if mp.profilehz != 0 && !mp.blocked {
profilem(mp)
}
stdcall1(_ResumeThread, thread)
}
}
}
func profilem(mp *m) {
var r *context
rbuf := make([]byte, unsafe.Sizeof(*r)+15)
tls := &mp.tls[0]
gp := *((**g)(unsafe.Pointer(tls)))
// align Context to 16 bytes
r = (*context)(unsafe.Pointer((uintptr(unsafe.Pointer(&rbuf[15]))) &^ 15))
r.contextflags = _CONTEXT_CONTROL
stdcall2(_GetThreadContext, mp.thread, uintptr(unsafe.Pointer(r)))
sigprof(r.ip(), r.sp(), 0, gp, mp)
}
func sigprof(pc, sp, lr uintptr, gp *g, mp *m) {
if prof.hz == 0 {
return
}
// Profiling runs concurrently with GC, so it must not allocate.
mp.mallocing++
... lots of code ...
mp.mallocing--
}
A borrowed M may migrate between threads. Between the
atomic.Loaduintptr(&mp.thread) and the SuspendThread, mp may have
moved to a new thread, so that it's in active use. In particular
it might be calling malloc, as in the crash stack trace. If so, the
mp.mallocing++ in sigprof would provoke the crash.
Those lines are trying to guard against allocation during sigprof.
But on Windows, mp is the thread being traced, not the current
thread. Those lines should really be using getg().m.mallocing, which
is the same on Unix but not on Windows. With that change, it's
possible the race on the actual thread is not a problem: the traceback
would get confused and eventually return an error, but that's fine.
The code expects that possibility.
Fixes#17165.
Change-Id: If6619731910d65ca4b1a6e7de761fa2518ef339e
Reviewed-on: https://go-review.googlesource.com/33132
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
All the existing CPU profiler tests already parse the profile.
That should be sufficient indication that profiles can be parsed.
Fixes#17853.
Change-Id: Ie8a190e2ae4eef125c8eb0d4e8b7adac420abbdb
Reviewed-on: https://go-review.googlesource.com/33136
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
rsc's change golang.org/cl/32455 added a mechanism
that allows pprof to depend on gzip without introducing
an import cycle. This obsoletes the need for the gzip0
package, which was created solely to remove the need
for that dependency.
Change-Id: Ifa3b98faac9b251f909b84b4da54742046c4e3ad
Reviewed-on: https://go-review.googlesource.com/33137
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This change buffers the entire profile and converts in one shot
in the profile writer, and could use more memory than necessary
to output protocol buffer formatted profiles. It should be
possible to convert each chunk in a stream (maybe maintaining
some minimal state to output in the end) which could save on
memory usage.
Fixes#16093
Change-Id: I946c6a2b044ae644c72c8bb2d3bd82c415b1a847
Reviewed-on: https://go-review.googlesource.com/33071
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
A Go binary may only have 1 executable memory region if it has been
linked using internal linking. This change means that the test will
be skipped if this is the case, rather than fail.
Fixes#17852.
Change-Id: I59459a0f90ae8963aeb9908e5cb9fb64d7d0e0f4
Reviewed-on: https://go-review.googlesource.com/32920
Run-TryBot: Michael Munday <munday@ca.ibm.com>
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
This change adds code, originally written by Russ Cox <rsc@golang.org>
and open-sourced by Google, that converts from the "legacy"
binary pprof profile format to a struct representation of the
new protocol buffer pprof profile format.
This code reads the entire binary format for conversion to the
protobuf format. In a future change, we will update the code
to incrementally read and convert segments of the binary format,
so that the entire profile does not need to be stored in memory.
This change also contains contributions by Daria Kolistratova
<daria.kolistratova@intel.com> from the rolled-back change
golang.org/cl/30556 adapting the code to be used by the package
runtime/pprof.
This code also appeared in the change golang.org/cl/32257, which was based
on Daria Kolistratova's change, but was also rolled back.
Updates #16093
Change-Id: I5c768b1134bc15408d80a3ccc7ed867db9a1c63d
Reviewed-on: https://go-review.googlesource.com/32811
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Fixes#17811
Change-Id: I7bf9cbc5245417047ad28a14d9b9ad6592607d3d
Reviewed-on: https://go-review.googlesource.com/32774
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
I used the slowtests.go tool as described in
https://golang.org/cl/32684 on packages that stood out.
go test -short std drops from ~56 to ~52 seconds.
This isn't a huge win, but it was mostly an exercise.
Updates #17751
Change-Id: I9f3402e36a038d71e662d06ce2c1d52f6c4b674d
Reviewed-on: https://go-review.googlesource.com/32751
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
mheap_.heap_live is an atomically accessed uint64. It is currently not 8-byte
aligned on 32-bit platforms, which has been okay because it's only accessed via
Xadd64, which doesn't require alignment on 386 or ARM32. However, Xadd64 on
MIPS32 does require 8-byte alignment.
Add a padding field to force 8-byte alignment of heap_live and prevent an
alignment check crash on MIPS32.
Change-Id: I7eddf7883aec7a0a7e0525af5d58ed4338a401d0
Reviewed-on: https://go-review.googlesource.com/31635
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Newer versions of gcc notice a type mismatch and complain.
Fix code to match documented signature in MSDN.
Trybots say this still compiles with the older (5.1) version
of gcc.
Fixes#17771.
Change-Id: Ib3fe6f71b40751e1146249e31232da5ac69b9e00
Reviewed-on: https://go-review.googlesource.com/32646
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
OpenBSD's scheduler causes preemption to take 20+ms, so 30ms is not
enough time for 3 goroutines to run. This change continues to sleep for
30ms, but if it finds that the 3 goroutines have not run, it sleeps for
an additional 1s before declaring failure.
Updates #17712
Change-Id: I3e886e40d05192b7cb71b4f242af195836ef62a8
Reviewed-on: https://go-review.googlesource.com/32634
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Quentin Smith <quentin@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The code to do the conversion is smaller than the
call to the runtime.
The 1-result asserts need to call panic if they fail, but that
code is out of line.
The only conversions left in the runtime are those which
might allocate and those which might need to generate an itab.
Given the following types:
type E interface{}
type I interface { foo() }
type I2 iterface { foo(); bar() }
type Big [10]int
func (b Big) foo() { ... }
This CL inlines the following conversions:
was assertE2T
var e E = ...
b := i.(Big)
was assertE2T2
var e E = ...
b, ok := i.(Big)
was assertI2T
var i I = ...
b := i.(Big)
was assertI2T2
var i I = ...
b, ok := i.(Big)
was assertI2E
var i I = ...
e := i.(E)
was assertI2E2
var i I = ...
e, ok := i.(E)
These are the remaining runtime calls:
convT2E:
var b Big = ...
var e E = b
convT2I:
var b Big = ...
var i I = b
convI2I:
var i2 I2 = ...
var i I = i2
assertE2I:
var e E = ...
i := e.(I)
assertE2I2:
var e E = ...
i, ok := e.(I)
assertI2I:
var i I = ...
i2 := i.(I2)
assertI2I2:
var i I = ...
i2, ok := i.(I2)
Fixes#17405Fixes#8422
Change-Id: Ida2367bf8ce3cd2c6bb599a1814f1d275afabe21
Reviewed-on: https://go-review.googlesource.com/32313
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
The runtime.typeEquals function is used during typelinksinit to
determine the canonical set of *_type values to use throughout the
runtime. As such, it is run against non-canonical *_type values, that
is, types from modules that are duplicates of a type from another
module that was loaded earlier in the program life.
These non-canonical *_type values sometimes contain pointers. These
pointers are pointing to position-independent data, and so they are set
by ld.so using dynamic relocations when the module is loaded. As such,
the pointer can point to the equivalent memory from a previous module.
This means if typesEqual follows a pointer inside a *_type, it can end
up at a piece of memory from another module. If it reads a typeOff or
nameOff from that memory and attempts to resolve it against the
non-canonical *_type from the later module, it will end up with a
reference to junk memory.
Instead, resolve against the pointer the offset was read from, so the
data is valid.
Fixes#17709.
Should no longer matter after #17724 is resolved in a later Go.
Change-Id: Ie88b151a3407d82ac030a97b5b6a19fc781901cb
Reviewed-on: https://go-review.googlesource.com/32513
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This makes no practical difference, as SIGSTOP can not be caught, but
may as well be consistent.
Change-Id: I3efbbf092388bb3f6dccc94cf703c5d94d35f6a1
Reviewed-on: https://go-review.googlesource.com/32533
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
sigfwd calls an arbitrary C signal handler function. The System V ABI
for x86_64 (and the most recent revision of the ABI for i386) requires
the stack to be 16-byte aligned.
Fixes: #17641
Change-Id: I77f53d4a8c29c1b0fe8cfbcc8d5381c4e6f75a6b
Reviewed-on: https://go-review.googlesource.com/32107
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The introduction of -buildmode=plugin means modules can be added to a
Go program while it is running. This means there exists some time
while the program is running with the module is on the moduledata
linked list, but it has not been initialized to the satisfaction of
other parts of the runtime. Notably, the GC.
This CL adds a new way of access modules, an activeModules function.
It returns a slice of modules that is built in the background and
atomically swapped in. The parts of the runtime that need to wait on
module initialization can use this slice instead of the linked list.
Fixes#17455
Change-Id: I04790fd07e40c7295beb47cea202eb439206d33d
Reviewed-on: https://go-review.googlesource.com/32357
Reviewed-by: Ian Lance Taylor <iant@golang.org>
- Adds overflow checks
- Adds parsing of negative integers
- Adds boolean return value to signal parsing errors
- Adds atoi32 for parsing of integers that fit in an int32
- Adds tests
Handling of errors to provide error messages
at the call sites is left to future CLs.
Updates #17718
Change-Id: I3cacd0ab1230b9efc5404c68edae7304d39bcbc0
Reviewed-on: https://go-review.googlesource.com/32390
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This implements a check that can be done at runtime for the ISA level and
hardware capability. It follows the same implementation as in s390x.
These checks will be important as we enable new instructions and write go
asm implementations using those.
Updates #15403Fixes#16643
Change-Id: Idfee374a3ffd7cf13a7d8cf0a6c83d247d3bee16
Reviewed-on: https://go-review.googlesource.com/32330
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently we have write barriers for direct channel sends, where the
receiver is blocked and the sender is writing directly to the
receiver's stack; but not for direct channel receives, where the
sender is blocked and the receiver is reading directly from the
sender's stack.
This was okay with the old write barrier because either 1) the
receiver would write the received pointer into the heap (causing it to
be shaded), 2) the pointer would still be on the receiver's stack at
mark termination and we would rescan it, or 3) the receiver dropped
the pointer so it wasn't necessarily reachable anyway.
This is not okay with the write barrier because it lets a grey stack
send a white pointer to a black stack and then remove it from its own
stack. If the grey stack was the sole grey-protector of this pointer,
this hides the object from the garbage collector.
Fix this by making direct receives perform a stack-to-stack write
barrier just like direct sends do.
Fixes#17694.
Change-Id: I1a4cb904e4138d2ac22f96a3e986635534a5ae41
Reviewed-on: https://go-review.googlesource.com/32450
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, assists can only perform heap marking jobs. However, at the
beginning of GC, there are only root jobs and no heap marking jobs. As
a result, there's often a period at the beginning of a GC cycle where
no goroutine has accumulated assist credit, but at the same time it
can't get any credit because there are no heap marking jobs for it to
do yet. As a result, many goroutines often block on the assist queue
at the very beginning of the GC cycle.
This commit fixes this by allowing assists to perform root marking
jobs. The tricky part of this (and the reason we haven't done this
before) is that stack scanning jobs can lead to deadlocks if the
goroutines performing the stack scanning are themselves
non-preemptible, since two non-preemptible goroutines may try to scan
each other. To address this, we use the same insight d6625ca used to
simplify the mark worker stack scanning: as long as we're careful with
the stacks and only drain jobs while on the system stack, we can put
the goroutine into a preemptible state while we drain jobs. This means
an assist's user stack can be scanned while it continues to do work.
This reduces the rate of assist blocking in the x/benchmarks HTTP
benchmark by a factor of 3 and all remaining blocking happens towards
the *end* of the GC cycle, when there may genuinely not be enough work
to go around.
Ideally, assists would get credit for working on root jobs. Currently
they do not; however, this change prioritizes heap work over root jobs
in assists, so they're likely to mostly perform heap work. In contrast
with mark workers, for assists, the root jobs act only as a backstop
to create heap work when there isn't enough heap work.
Fixes#15361.
Change-Id: If6e169863e4ad75710b0c8dc00f6125b41e9a595
Reviewed-on: https://go-review.googlesource.com/32432
Reviewed-by: Rick Hudson <rlh@golang.org>
This lifts the part of gcAssistAlloc that runs on the system stack to
its own function in preparation for letting assists perform root jobs
(notably stack scanning). This makes it easy to see that there are no
references to the user stack once we've entered gcAssistAlloc1, which
means it's safe to shrink the stack while in gcAssistAlloc1.
This does not yet make assists perform root jobs, so it's not actually
possible for the stack to shrink yet. That will happen in the next
commit.
The code in gcAssistAlloc1 is identical to the code that's currently
passed in a closure to systemstack with one exception. Currently, we
set the "completed" variable in the enclosing scope to indicate that
the assist completed the mark phase. This is exactly the sort of
cross-stack reference lifting this function is meant to prevent. We
replace this variable with setting gp.param to nil or non-nil to
indicate the completion status.
Updates #15361.
Change-Id: Iba7cfb758c781070a441aea86c0117b399a24dbd
Reviewed-on: https://go-review.googlesource.com/32431
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
This reverts commit b33030a727.
Reason for revert: We're going to try to get the code in this change
submitted in smaller, more carefully reviewed changes.
Change-Id: I4175f4b297f0e69fb78b11f9dc0bd82f27865be7
Reviewed-on: https://go-review.googlesource.com/32441
Reviewed-by: Russ Cox <rsc@golang.org>
The map[typeOff]*_type object is created at run time and stored in
the moduledata. The moduledata object is marked by the linker as
SNOPTRDATA, so the reference is ignored by the GC. Running
misc/cgo/testplugin/test.bash with GOGC=1 will eventually collect
the typemap and crash.
This bug probably comes up in -linkshared binaries in Go 1.7.
I don't know why we haven't seen a report about this yet.
Fixes#17680
Change-Id: I0e9b5c006010e8edd51d9471651620ba665248d3
Reviewed-on: https://go-review.googlesource.com/32430
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Plumb the import path of a plugin package through to the linker, and
use it as the prefix on the exported symbol names.
Before this we used the basename of the plugin file as the prefix,
which could conflict and result in multiple loaded plugins sharing
symbols that are distinct.
Fixes#17155Fixes#17579
Change-Id: I7ce966ca82d04e8507c0bcb8ea4ad946809b1ef5
Reviewed-on: https://go-review.googlesource.com/32355
Reviewed-by: Ian Lance Taylor <iant@golang.org>
No point in computing this info on startup.
Compute it at build time.
This lets us spend more time computing & checking the size classes.
Improve the div magic for rounding to the start of an object.
We can now use 32-bit multiplies & shifts, which should help
32-bit platforms.
The static data is <1KB.
The actual size classes are not changed by this CL.
Change-Id: I6450cec7d1b2b4ad31fd3f945f504ed2ec6570e7
Reviewed-on: https://go-review.googlesource.com/32219
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
I did 'export GORACE=atexit_sleep_ms=0' in a console
and then was puzzled as to why race tests fail.
Existing GORACE env var may (or may not) override
the one that we setup.
Filter out GORACE as we do for other important env vars.
Change-Id: I29be86b0cbb9b5dc7f9efb15729ade86fc79b0e0
Reviewed-on: https://go-review.googlesource.com/32163
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
On solaris/amd64 sometimes the reported cycle count is negative. Replace
with 0.
Change-Id: I364eea5ca072281245c7ab3afb0bf69adc3a8eae
Reviewed-on: https://go-review.googlesource.com/32258
Reviewed-by: Ian Lance Taylor <iant@golang.org>
On amd64p32, rt0_go attempts to reserve 128 bytes of scratch space on
the stack, but due to a register mixup this ends up being a no-op. Fix
this so we actually reserve the stack space.
Change-Id: I04dbfbeb44f3109528c8ec74e1136bc00d7e1faa
Reviewed-on: https://go-review.googlesource.com/32331
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
With the hybrid barrier in place, we can now disable stack rescanning
by default. This commit adds a "gcrescanstacks" GODEBUG variable that
is off by default but can be set to re-enable STW stack rescanning.
The plan is to leave this off but available in Go 1.8 for debugging
and as a fallback.
With this change, worst-case mark termination time at GOMAXPROCS=12
*not* including time spent stopping the world (which is still
unbounded) is reliably under 100 µs, with a 95%ile around 50 µs in
every benchmark I tried (the go1 benchmarks, the x/benchmarks garbage
benchmark, and the gcbench activegs and rpc benchmarks). Including
time spent stopping the world usually adds about 20 µs to total STW
time at GOMAXPROCS=12, but I've seen it add around 150 µs in these
benchmarks when a goroutine takes time to reach a safe point (see
issue #10958) or when stopping the world races with goroutine
switches. At GOMAXPROCS=1, where this isn't an issue, worst case STW
is typically 30 µs.
The go-gcbench activegs benchmark is designed to stress large numbers
of dirty stacks. This commit reduces 95%ile STW time for 500k dirty
stacks by nearly three orders of magnitude, from 150ms to 195µs.
This has little effect on the throughput of the go1 benchmarks or the
x/benchmarks benchmarks.
name old time/op new time/op delta
XGarbage-12 2.31ms ± 0% 2.32ms ± 1% +0.28% (p=0.001 n=17+16)
XJSON-12 12.4ms ± 0% 12.4ms ± 0% +0.41% (p=0.000 n=18+18)
XHTTP-12 11.8µs ± 0% 11.8µs ± 1% ~ (p=0.492 n=20+18)
It reduces the tail latency of the x/benchmarks HTTP benchmark:
name old p50-time new p50-time delta
XHTTP-12 489µs ± 0% 491µs ± 1% +0.54% (p=0.000 n=20+18)
name old p95-time new p95-time delta
XHTTP-12 957µs ± 1% 960µs ± 1% +0.28% (p=0.002 n=20+17)
name old p99-time new p99-time delta
XHTTP-12 1.76ms ± 1% 1.64ms ± 1% -7.20% (p=0.000 n=20+18)
Comparing to the beginning of the hybrid barrier implementation
("runtime: parallelize STW mcache flushing") shows that the hybrid
barrier trades a small performance impact for much better STW latency,
as expected. The magnitude of the performance impact is generally
small:
name old time/op new time/op delta
BinaryTree17-12 2.37s ± 1% 2.42s ± 1% +2.04% (p=0.000 n=19+18)
Fannkuch11-12 2.84s ± 0% 2.72s ± 0% -4.00% (p=0.000 n=19+19)
FmtFprintfEmpty-12 44.2ns ± 1% 45.2ns ± 1% +2.20% (p=0.000 n=17+19)
FmtFprintfString-12 130ns ± 1% 134ns ± 0% +2.94% (p=0.000 n=18+16)
FmtFprintfInt-12 114ns ± 1% 117ns ± 0% +3.01% (p=0.000 n=19+15)
FmtFprintfIntInt-12 176ns ± 1% 182ns ± 0% +3.17% (p=0.000 n=20+15)
FmtFprintfPrefixedInt-12 186ns ± 1% 187ns ± 1% +1.04% (p=0.000 n=20+19)
FmtFprintfFloat-12 251ns ± 1% 250ns ± 1% -0.74% (p=0.000 n=17+18)
FmtManyArgs-12 746ns ± 1% 761ns ± 0% +2.08% (p=0.000 n=19+20)
GobDecode-12 6.57ms ± 1% 6.65ms ± 1% +1.11% (p=0.000 n=19+20)
GobEncode-12 5.59ms ± 1% 5.65ms ± 0% +1.08% (p=0.000 n=17+17)
Gzip-12 223ms ± 1% 223ms ± 1% -0.31% (p=0.006 n=20+20)
Gunzip-12 38.0ms ± 0% 37.9ms ± 1% -0.25% (p=0.009 n=19+20)
HTTPClientServer-12 77.5µs ± 1% 78.9µs ± 2% +1.89% (p=0.000 n=20+20)
JSONEncode-12 14.7ms ± 1% 14.9ms ± 0% +0.75% (p=0.000 n=20+20)
JSONDecode-12 53.0ms ± 1% 55.9ms ± 1% +5.54% (p=0.000 n=19+19)
Mandelbrot200-12 3.81ms ± 0% 3.81ms ± 1% +0.20% (p=0.023 n=17+19)
GoParse-12 3.17ms ± 1% 3.18ms ± 1% ~ (p=0.057 n=20+19)
RegexpMatchEasy0_32-12 71.7ns ± 1% 70.4ns ± 1% -1.77% (p=0.000 n=19+20)
RegexpMatchEasy0_1K-12 946ns ± 0% 946ns ± 0% ~ (p=0.405 n=18+18)
RegexpMatchEasy1_32-12 67.2ns ± 2% 67.3ns ± 2% ~ (p=0.732 n=20+20)
RegexpMatchEasy1_1K-12 374ns ± 1% 378ns ± 1% +1.14% (p=0.000 n=18+19)
RegexpMatchMedium_32-12 107ns ± 1% 107ns ± 1% ~ (p=0.259 n=18+20)
RegexpMatchMedium_1K-12 34.2µs ± 1% 34.5µs ± 1% +1.03% (p=0.000 n=18+18)
RegexpMatchHard_32-12 1.77µs ± 1% 1.79µs ± 1% +0.73% (p=0.000 n=19+18)
RegexpMatchHard_1K-12 53.6µs ± 1% 54.2µs ± 1% +1.10% (p=0.000 n=19+19)
Template-12 61.5ms ± 1% 63.9ms ± 0% +3.96% (p=0.000 n=18+18)
TimeParse-12 303ns ± 1% 300ns ± 1% -1.08% (p=0.000 n=19+20)
TimeFormat-12 318ns ± 1% 320ns ± 0% +0.79% (p=0.000 n=19+19)
Revcomp-12 (*) 509ms ± 3% 504ms ± 0% ~ (p=0.967 n=7+12)
[Geo mean] 54.3µs 54.8µs +0.88%
(*) Revcomp is highly non-linear, so I only took samples with 2
iterations.
name old time/op new time/op delta
XGarbage-12 2.25ms ± 0% 2.32ms ± 1% +2.74% (p=0.000 n=16+16)
XJSON-12 11.6ms ± 0% 12.4ms ± 0% +6.81% (p=0.000 n=18+18)
XHTTP-12 11.6µs ± 1% 11.8µs ± 1% +1.62% (p=0.000 n=17+18)
Updates #17503.
Updates #17099, since you can't have a rescan list bug if there's no
rescan list. I'm not marking it as fixed, since gcrescanstacks can
still be set to re-enable the rescan lists.
Change-Id: I6e926b4c2dbd4cd56721869d4f817bdbb330b851
Reviewed-on: https://go-review.googlesource.com/31766
Reviewed-by: Rick Hudson <rlh@golang.org>
This implements the unconditional version of the hybrid deletion write
barrier, which always shades both the old and new pointer. It's
unconditional for now because barriers on channel operations require
checking both the source and destination stacks and we don't have a
way to funnel this information into the write barrier at the moment.
As part of this change, we modify the typed memclr operations
introduced earlier to invoke the write barrier.
This has basically no overall effect on benchmark performance. This is
good, since it indicates that neither the extra shade nor the new bulk
clear barriers have much effect. It also has little effect on latency.
This is expected, since we haven't yet modified mark termination to
take advantage of the hybrid barrier.
Updates #17503.
Change-Id: Iebedf84af2f0e857bd5d3a2d525f760b5cf7224b
Reviewed-on: https://go-review.googlesource.com/31765
Reviewed-by: Rick Hudson <rlh@golang.org>
With the hybrid barrier, unless we're doing a STW GC or hit a very
rare race (~once per all.bash) that can start mark termination before
all of the work is drained, we don't need to drain the work queue at
all. Even draining an empty work queue is rather expensive since we
have to enter the getfull() barrier, so it's worth avoiding this.
Conveniently, it's quite easy to detect whether or not we actually
need the getufull() barrier: since the world is stopped when we enter
mark termination, everything must have flushed its work to the work
queue, so we can just check the queue. If the queue is empty and we
haven't queued up any jobs that may create more work (which should
always be the case with the hybrid barrier), we can simply have all GC
workers perform non-blocking drains.
Also conveniently, this solution is quite safe. If we do somehow screw
something up and there's work on the work queue, some worker will
still process it, it just may not happen in parallel.
This is not the "right" solution, but it's simple, expedient,
low-risk, and maintains compatibility with debug.gcrescanstacks. When
we remove the gcrescanstacks fallback in Go 1.9, we should also fix
the race that starts mark termination early, and then we can eliminate
work draining from mark termination.
Updates #17503.
Change-Id: I7b3cd5de6a248ab29d78c2b42aed8b7443641361
Reviewed-on: https://go-review.googlesource.com/32186
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently bulkBarrierPreWrite calls writebarrierptr_prewrite, but this
means that we check writeBarrier.needed twice and perform cgo checks
twice.
Change bulkBarrierPreWrite to call writebarrierptr_prewrite1 to skip
over these duplicate checks.
This may speed up bulkBarrierPreWrite slightly, but mostly this will
save us from running out of nosplit stack space on ppc64x in the near
future.
Updates #17503.
Change-Id: I1cea1a2207e884ab1a279c6a5e378dcdc048b63e
Reviewed-on: https://go-review.googlesource.com/31890
Reviewed-by: Rick Hudson <rlh@golang.org>
gobuf.ctxt is set to nil from many places in assembly code and these
assignments require write barriers with the hybrid barrier.
Conveniently, in most of these places ctxt should already be nil, in
which case we don't need the barrier. This commit changes these places
to assert that ctxt is already nil.
gogo is more complicated, since ctxt may not already be nil. For gogo,
we manually perform the write barrier if ctxt is not nil.
Updates #17503.
Change-Id: I9d75e27c75a1b7f8b715ad112fc5d45ffa856d30
Reviewed-on: https://go-review.googlesource.com/31764
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently, we perform write barriers after performing pointer writes.
At the moment, it simply doesn't matter what order this happens in, as
long as they appear atomic to GC. But both the hybrid barrier and ROC
are going to require a pre-write write barrier.
For the hybrid barrier, this is important because the barrier needs to
observe both the current value of the slot and the value that will be
written to it. (Alternatively, the caller could do the write and pass
in the old value, but it seems easier and more useful to just swap the
order of the barrier and the write.)
For ROC, this is necessary because, if the pointer write is going to
make the pointer reachable to some goroutine that it currently is not
visible to, the garbage collector must take some special action before
that pointer becomes more broadly visible.
This commits swaps pointer writes around so the write barrier occurs
before the pointer write.
The main subtlety here is bulk memory writes. Currently, these copy to
the destination first and then use the pointer bitmap of the
destination to find the copied pointers and invoke the write barrier.
This is necessary because the source may not have a pointer bitmap. To
handle these, we pass both the source and the destination to the bulk
memory barrier, which uses the pointer bitmap of the destination, but
reads the pointer values from the source.
Updates #17503.
Change-Id: I78ecc0c5c94ee81c29019c305b3d232069294a55
Reviewed-on: https://go-review.googlesource.com/31763
Reviewed-by: Rick Hudson <rlh@golang.org>
Apparently on macOS Sierra LLDB thinks /usr/lib/dyld is mapped
at address 0, even if Go code starts at 0x1000, and it looks up
addresses from dyld which shadows Go symbols. Move Go binary at
a higher address to avoid clash.
Fixes#17463. Re-enable TestLldbPython.
Change-Id: I89ca6f3ee48aa6da9862bfa0c2da91477cc93255
Reviewed-on: https://go-review.googlesource.com/32185
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Quentin Smith <quentin@golang.org>
As for dropg, save is writing a nil pointer that will generate a write
barrier with the hybrid barrier. However, in this case, ctxt always
should already be nil, so replace the write with an assertion that
this is the case.
At this point, we're ready to disable the write barrier elision
optimizations that interfere with the hybrid barrier.
Updates #17503.
Change-Id: I83208e65aa33403d442401f355b2e013ab9a50e9
Reviewed-on: https://go-review.googlesource.com/31571
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently this contains no write barriers because it's writing nil
pointers, but with the hybrid barrier, even these will produce write
barriers. However, since these are *gs and *ms, they don't need write
barriers, so we can simply eliminate them.
Updates #17503.
Change-Id: Ib188a60492c5cfb352814bf9b2bcb2941fb7d6c0
Reviewed-on: https://go-review.googlesource.com/31570
Reviewed-by: Rick Hudson <rlh@golang.org>
The hybrid barrier requires allocate-black, but there's one case where
we don't currently allocate black: the tiny allocator. If we allocate
a *new* tiny alloc block during GC, it will be allocated black, but if
we allocated the current block before GC, it won't be black, and the
further allocations from it won't mark it, which means we may free a
reachable tiny block during sweeping.
Fix this by passing over all mcaches at the beginning of mark, while
the world is still stopped, and greying their tiny blocks.
Updates #17503.
Change-Id: I04d4df7cc2f553f8f7b1e4cb0b52e2946588111a
Reviewed-on: https://go-review.googlesource.com/31456
Reviewed-by: Rick Hudson <rlh@golang.org>
The hybrid barrier requires barriers on stack-to-stack copies if
either stack is grey. There are only two instances of this in the
runtime: channel sends and starting a goroutine. Channel sends already
use typedmemmove and hence have the necessary barriers. This commits
adds barriers for the stack-to-stack copy when starting a goroutine.
Updates #17503.
Change-Id: Ibb55e08127ca4d021ac54be61cb96732efa5df5b
Reviewed-on: https://go-review.googlesource.com/31455
Reviewed-by: Rick Hudson <rlh@golang.org>
Original Change by Daria Kolistratova <daria.kolistratova@intel.com>
Added functions with suffix proto and stuff from pprof tool to translate
to protobuf. Done as the profile proto is more extensible than the legacy
pprof format and is pprof's preferred profile format. Large part was taken
from https://github.com/google/pprof tool. Tested by hand and compared the
result with translated by pprof tool, profiles are identical.
Fixes#16093
Change-Id: I2751345b09a66ee2b6aa64be76cba4cd1c326aa6
Reviewed-on: https://go-review.googlesource.com/32257
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
Waiting 2ms for all the kicked-off goroutines to run and block
seems a little optimistic. No harm done by waiting for 200ms instead.
Fixes#17238.
Change-Id: I827532ea2f5f1f3ed04179f8957dd2c563946ed0
Reviewed-on: https://go-review.googlesource.com/32103
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently we initialize LR on a new stack by writing nil to it. But
this is an initializing write since the newly allocated stack is not
zeroed, so this is unsafe with the hybrid barrier. Change this is a
uintptr write to avoid a bad write barrier.
Updates #17503.
Change-Id: I062ac352e35df7da4644c1f2a5aaab87049d1f60
Reviewed-on: https://go-review.googlesource.com/32093
Reviewed-by: Rick Hudson <rlh@golang.org>
We reuse finalizers in finblocks, which are allocated off-heap. This
means they have to be zero-initialized before becoming visible to the
garbage collector. We actually already do this by clearing the
finalizer before returning it to the pool, but we're not careful to
enforce correct memory ordering. Fix this by manipulating the
finalizer count atomically so these writes synchronize properly with
the garbage collector.
Updates #17503.
Change-Id: I7797d31df3c656c9fe654bc6da287f66a9e2037d
Reviewed-on: https://go-review.googlesource.com/31454
Reviewed-by: Rick Hudson <rlh@golang.org>
runfinq allocates a stack frame on the heap for constructing the
finalizer function calls and reuses it for each call. However, because
the type of this frame is constantly shifting, it tells mallocgc there
are no pointers in it and it acts essentially like uninitialized
memory between uses. But runfinq uses pointer writes with write
barriers to "initialize" this memory, which is not going to be safe
with the hybrid barrier, since the hybrid barrier may see a stale
pointer left in the "uninitialized" frame.
Fix this by zero-initializing the argument values in the frame before
writing the argument pointers.
Updates #17503.
Change-Id: I951c0a2be427eb9082a32d65c4410e6fdef041be
Reviewed-on: https://go-review.googlesource.com/31453
Reviewed-by: Rick Hudson <rlh@golang.org>
Since barrier-less memclr is only safe in very narrow circumstances,
this commit renames memclr to avoid accidentally calling memclr on
typed memory. This can cause subtle, non-deterministic bugs, so it's
worth some effort to prevent. In the near term, this will also prevent
bugs creeping in from any concurrent CLs that add calls to memclr; if
this happens, whichever patch hits master second will fail to compile.
This also adds the other new memclr variants to the compiler's
builtin.go to minimize the churn on that binary blob. We'll use these
in future commits.
Updates #17503.
Change-Id: I00eead049f5bd35ca107ea525966831f3d1ed9ca
Reviewed-on: https://go-review.googlesource.com/31369
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently fixalloc does not zero memory it reuses. This is dangerous
with the hybrid barrier if the type may contain heap pointers, since
it may cause us to observe a dead heap pointer on reuse. It's also
error-prone since it's the only allocator that doesn't zero on
allocation (mallocgc of course zeroes, but so do persistentalloc and
sysAlloc). It's also largely pointless: for mcache, the caller
immediately memclrs the allocation; and the two specials types are
tiny so there's no real cost to zeroing them.
Change fixalloc to zero allocations by default.
The only type we don't zero by default is mspan. This actually
requires that the spsn's sweepgen survive across freeing and
reallocating a span. If we were to zero it, the following race would
be possible:
1. The current sweepgen is 2. Span s is on the unswept list.
2. Direct sweeping sweeps span s, finds it's all free, and releases s
to the fixalloc.
3. Thread 1 allocates s from fixalloc. Suppose this zeros s, including
s.sweepgen.
4. Thread 1 calls s.init, which sets s.state to _MSpanDead.
5. On thread 2, background sweeping comes across span s in allspans
and cas's s.sweepgen from 0 (sg-2) to 1 (sg-1). Now it thinks it
owns it for sweeping. 6. Thread 1 continues initializing s.
Everything breaks.
I would like to fix this because it's obviously confusing, but it's a
subtle enough problem that I'm leaving it alone for now. The solution
may be to skip sweepgen 0, but then we have to think about wrap-around
much more carefully.
Updates #17503.
Change-Id: Ie08691feed3abbb06a31381b94beb0a2e36a0613
Reviewed-on: https://go-review.googlesource.com/31368
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently the zero value of an mspan is in the "in use" state. This
seems like a bad idea in general. But it's going to wreak havoc when
we make fixalloc zero allocations: even "freed" mspan objects are
still on the allspans list and still get looked at by the garbage
collector. Hence, if we leave the mspan states the way they are,
allocating a span that reuses old memory will temporarily pass that
span (which is visible to GC!) through the "in use" state, which can
cause "unswept span" panics.
Fix all of this by making the zero state "dead".
Updates #17503.
Change-Id: I77c7ac06e297af4b9e6258bc091c37abe102acc3
Reviewed-on: https://go-review.googlesource.com/31367
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
The hybrid barrier requires distinguishing typed and untyped memory
even when zeroing because the *current* contents of the memory matters
even when overwriting.
This commit introduces runtime.typedmemclr and runtime.memclrHasPointers
as a typed memory clearing functions parallel to runtime.typedmemmove.
Currently these simply call memclr, but with the hybrid barrier we'll
need to shade any pointers we're overwriting. These will provide us
with the necessary hooks to do so.
Updates #17503.
Change-Id: I74478619f8907825898092aaa204d6e4690f27e6
Reviewed-on: https://go-review.googlesource.com/31366
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently all mcaches are flushed in a single STW root job. This takes
about 5 µs per P, but since it's done sequentially it adds about
5*GOMAXPROCS µs to the STW.
Fix this by parallelizing the job. Since there are exactly GOMAXPROCS
mcaches to flush, this parallelizes quite nicely and brings the STW
latency cost down to a constant 5 µs (assuming GOMAXPROCS actually
reflects the number of CPUs).
Updates #17503.
Change-Id: Ibefeb1c2229975d5137c6e67fac3b6c92103742d
Reviewed-on: https://go-review.googlesource.com/32033
Reviewed-by: Rick Hudson <rlh@golang.org>
Added functions with suffix proto and stuff from pprof tool to translate
to protobuf. Done as the profile proto is more extensible than the legacy
pprof format and is pprof's preferred profile format. Large part was taken
from https://github.com/google/pprof tool. Tested by hand and compared the
result with translated by pprof tool, profiles are identical.
Fixes#16093
Change-Id: I5acdb2809cab0d16ed4694fdaa7b8ddfd68df11e
Reviewed-on: https://go-review.googlesource.com/30556
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
The current logic in gcDrain conflates non-blocking with preemptible
draining for root jobs. As a result, if you do a non-blocking (but
*not* preemptible) drain, like dedicated workers do, the root job
drain will stop if preempted and fall through to heap marking jobs,
which won't stop until it fails to get a heap marking job.
This commit fixes the condition on root marking jobs so they only stop
when preempted if the drain is preemptible.
Coincidentally, this also fixes a nil pointer dereference if we call
gcDrain with gcDrainNoBlock and without a user G, since it tries to
get the preempt flag from the nil user G. This combination never
happens right now, but will in the future.
Change-Id: Ia910ec20a9b46237f7926969144a33b1b4a7b2f9
Reviewed-on: https://go-review.googlesource.com/32291
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
This adds support to the runtime/trace test for saving traces
collected by its tests to disk and a script in internal/trace that
uses this to collect canned traces for the trace test suite. This can
be used to add to the test suite when we introduce a new trace format
version.
Change-Id: Id9ac1ff312235bf02f982fdfff8a827f54035758
Reviewed-on: https://go-review.googlesource.com/32290
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Currently when a goroutine blocks on a GC assist, it emits a generic
EvGoBlock event. Since assist blocking events and, in particular, the
length of the blocked assist queue, are important for diagnosing GC
behavior, this commit adds a new EvGoBlockGC event for blocking on a
GC assist. The trace viewer uses this event to report a "waiting on
GC" count in the "Goroutines" row. This makes sense because, unlike
other blocked goroutines, these goroutines do have work to do, so
being blocked on a GC assist is quite similar to being in the
"runnable" state, which we also report in the trace viewer.
Change-Id: Ic21a326992606b121ea3d3d00110d8d1fdc7a5ef
Reviewed-on: https://go-review.googlesource.com/30704
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Currently mark workers are shown in the trace as regular goroutines
labeled "runtime.gcBgMarkWorker". That's somewhat unhelpful to an end
user because of the opaque label and particularly unhelpful to runtime
developers because it doesn't distinguish the different types of mark
workers.
Fix this by introducing a variant of the GoStart event called
GoStartLabel that lets the runtime indicate a label for a goroutine
execution span and using this to label mark worker executions as "GC
(<mode>)" in the trace viewer.
Since this bumps the trace version to 1.8, we also add test data for
1.7 traces.
Change-Id: Id7b9c0536508430c661ffb9e40e436f3901ca121
Reviewed-on: https://go-review.googlesource.com/30702
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Currently, gcDrain looks for the preemption flag at getg().preempt.
However, commit d6625ca moved mark worker draining to the system
stack, which means getg() returns the g0, which never has the preempt
flag set, so idle and fractional workers don't get preempted after
10ms and just run until they run out of work. As a result, if there's
enough idle time, GC becomes effectively STW.
Fix this by looking for the preemption flag on getg().m.curg, which
will always be the user G (where the preempt flag is set), regardless
of whether gcDrain is running on the user or the g0 stack.
Change-Id: Ib554cf49a705b86ccc3d08940bc869f868c50dd2
Reviewed-on: https://go-review.googlesource.com/32251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
runtime.SetMutexProfileFraction(n int) will capture 1/n-th of stack
traces of goroutines holding contended mutexes if n > 0. From runtime/pprof,
pprot.Lookup("mutex").WriteTo writes the accumulated
stack traces to w (in essentially the same format that blocking
profiling uses).
Change-Id: Ie0b54fa4226853d99aa42c14cb529ae586a8335a
Reviewed-on: https://go-review.googlesource.com/29650
Reviewed-by: Austin Clements <austin@google.com>
- removes the runtime function stringtoslicebytetmp
- removes the generation of calls to stringtoslicebytetmp from the frontend
- adds handling of OSTRARRAYBYTETMP in the backend
This reduces binary sizes and avoids function call overhead.
Change-Id: Ib9988d48549cee663b685b4897a483f94727b940
Reviewed-on: https://go-review.googlesource.com/32158
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
I do not know why it is included. All tests pass without it.
Change-Id: I839076ee131816dfd177570a902c69fe8fba5022
Reviewed-on: https://go-review.googlesource.com/32144
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Otherwise, the way the ELF dynamic linker works means that you can end up with
the same itab being passed to additab twice, leading to the itab linked list
having a cycle in it. Add a test to additab in runtime to catch this when it
happens, not some arbitrary and surprsing time later.
Fixes#17594
Change-Id: I6c82edcc9ac88ac188d1185370242dc92f46b1ad
Reviewed-on: https://go-review.googlesource.com/32131
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Assembly copied from the clock_gettime(CLOCK_MONOTONIC)
call in runtime.nanotime in these files and then modified to use
CLOCK_REALTIME.
Also comment system call numbers in a few other files.
Fixes#11222.
Change-Id: Ie132086de7386f865908183aac2713f90fc73e0d
Reviewed-on: https://go-review.googlesource.com/32177
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Don't include package path when creating LSyms for auto and param
variables during Prog generation, and update the DWARF emit routine
accordingly (remove the code that chops off package path from names in
DWARF var location expressions). Implementation suggested by mdempsky@.
The intent of this change is to have saner location expressions in cases
where the variable corresponds to a structure field. For example, the
SSA compiler's "decompose" phase can take a slice value and break it
apart into three scalar variables corresponding to the fields (slice "X"
gets split into "X.len", "X.cap", "X.ptr"). In such cases we want the
name in the location expression to omit the package path but preserve
the original variable name (e.g. "X").
Fixes#16338
Change-Id: Ibc444e7f3454b70fc500a33f0397e669d127daa1
Reviewed-on: https://go-review.googlesource.com/31819
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Currently, markroot delays scanning mark worker stacks until mark
termination by putting the mark worker G directly on the rescan list
when it encounters one during the mark phase. Without this, since mark
workers are non-preemptible, two mark workers that attempt to scan
each other's stacks can deadlock.
However, this is annoyingly asymmetric and causes some real problems.
First, markroot does not own the G at that point, so it's not
technically safe to add it to the rescan list. I haven't been able to
find a specific problem this could cause, but I suspect it's the root
cause of issue #17099. Second, this will interfere with the hybrid
barrier, since there is no stack rescanning during mark termination
with the hybrid barrier.
This commit switches to a different approach. We move the mark
worker's call to gcDrain to the system stack and set the mark worker's
status to _Gwaiting for the duration of the drain to indicate that
it's preemptible. This lets another mark worker scan its G stack while
the drain is running on the system stack. We don't return to the G
stack until we can switch back to _Grunning, which ensures we don't
race with a stack scan. This lets us eliminate the special case for
mark worker stack scans and scan them just like any other goroutine.
The only subtlety to this approach is that we have to disable stack
shrinking for mark workers; they could be referring to captured
variables from the G stack, so it's not safe to move their stacks.
Updates #17099 and #17503.
Change-Id: Ia5213949ec470af63e24dfce01df357c12adbbea
Reviewed-on: https://go-review.googlesource.com/31820
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, if the number of stack barriers for a stack is 0, we'll
create a zero-length slice that points just past the end of the stack
allocation. This bad pointer causes GC panics.
Fix this by creating a nil slice if the stack barrier count is 0.
In practice, the only way this can happen is if
GODEBUG=gcstackbarrieroff=1 is set because even the minimum size stack
reserves space for two stack barriers.
Change-Id: I3527c9a504c445b64b81170ee285a28594e7983d
Reviewed-on: https://go-review.googlesource.com/31762
Reviewed-by: Rick Hudson <rlh@golang.org>
This adds debug code enabled in gccheckmark mode that panics if we
attempt to mark an unallocated object. This is a common issue with the
hybrid barrier when we're manipulating uninitialized memory that
contains stale pointers. This also tends to catch bugs that will lead
to "sweep increased allocation count" crashes closer to the source of
the bug.
Change-Id: I443ead3eac6f316a46f50b106078b524cac317f4
Reviewed-on: https://go-review.googlesource.com/31761
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently reflectcall has a subtle dance with write barriers where the
assembly code copies the result values from the stack to the in-heap
argument frame without write barriers and then calls into the runtime
after the fact to invoke the necessary write barriers.
For the hybrid barrier (and for ROC), we need to switch to a
*pre*-write write barrier, which is very difficult to do with the
current setup. We could tie ourselves in knots of subtle reasoning
about why it's okay in this particular case to have a post-write write
barrier, but this commit instead takes a different approach. Rather
than making things more complex, this simplifies reflection calls so
that the argument copy is done in Go using normal bulk write barriers.
The one difficulty with this approach is that calling into Go requires
putting arguments on the stack, but the call* functions "donate" their
entire stack frame to the called function. We can get away with this
now because the copy avoids using the stack and has copied the results
out before we clobber the stack frame to call into the write barrier.
The solution in this CL is to call another function, passing arguments
in registers instead of on the stack, and let that other function
reserve more stack space and setup the arguments for the runtime.
This approach seemed to work out the best. I also tried making the
call* functions reserve 32 extra bytes of frame for the write barrier
arguments and adjust SP up by 32 bytes around the call. However, even
with the necessary changes to the assembler to correct the spdelta
table, the runtime was still having trouble with the frame layout (and
the changes to the assembler caused many other things that do strange
things with the SP to fail to assemble). The approach I took doesn't
require any funny business with the SP.
Updates #17503.
Change-Id: Ie2bb0084b24d6cff38b5afb218b9e0534ad2119e
Reviewed-on: https://go-review.googlesource.com/31655
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Now that sweeping and span marking use the sweep list, there's no need
for the work.spans snapshot of the allspans list. This change
eliminates the few remaining uses of it, which are either dead code or
can use allspans directly, and removes work.spans and its support
functions.
Change-Id: Id5388b42b1e68e8baee853d8eafb8bb4ff95bb43
Reviewed-on: https://go-review.googlesource.com/30537
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently markrootSpans iterates over all spans ever allocated to find
the in-use spans. Since we now have a list of in-use spans, change it
to iterate over that instead.
This, combined with the previous change, fixes#9265. Before these two
changes, blowing up the heap to 8GB and then shrinking it to a 0MB
live set caused the small-heap portion of the test to run 60x slower
than without the initial blowup. With these two changes, the time is
indistinguishable.
No significant effect on other benchmarks.
Change-Id: I4a27e533efecfb5d18cba3a87c0181a81d0ddc1e
Reviewed-on: https://go-review.googlesource.com/30536
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently sweeping walks the list of all spans, which means the work
in sweeping is proportional to the maximum number of spans ever used.
If the heap was once large but is now small, this causes an
amortization failure: on a small heap, GCs happen frequently, but a
full sweep still has to happen in each GC cycle, which means we spent
a lot of time in sweeping.
Fix this by creating a separate list consisting of just the in-use
spans to be swept, so sweeping is proportional to the number of in-use
spans (which is proportional to the live heap). Specifically, we
create two lists: a list of unswept in-use spans and a list of swept
in-use spans. At the start of the sweep cycle, the swept list becomes
the unswept list and the new swept list is empty. Allocating a new
in-use span adds it to the swept list. Sweeping moves spans from the
unswept list to the swept list.
This fixes the amortization problem because a shrinking heap moves
spans off the unswept list without adding them to the swept list,
reducing the time required by the next sweep cycle.
Updates #9265. This fix eliminates almost all of the time spent in
sweepone; however, markrootSpans has essentially the same bug, so now
the test program from this issue spends all of its time in
markrootSpans.
No significant effect on other benchmarks.
Change-Id: Ib382e82790aad907da1c127e62b3ab45d7a4ac1e
Reviewed-on: https://go-review.googlesource.com/30535
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently we set the len and cap of h.spans to the full reserved
region of the address space and track the actual mapped region
separately in h.spans_mapped. Since we have both the len and cap at
our disposal, change things so len(h.spans) tracks how much of the
spans array is mapped and eliminate h.spans_mapped. This simplifies
mheap and means we'll get nice "index out of bounds" exceptions if we
do try to go off the end of the spans rather than a SIGSEGV.
Change-Id: I8ed9a1a9a844d90e9fd2e269add4704623dbdfe6
Reviewed-on: https://go-review.googlesource.com/30533
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Like h_allspans and mheap_.allspans, these were two ways of referring
to the spans array from when the runtime was split between C and Go.
Clean this up by making mheap_.spans a slice and eliminating h_spans.
Change-Id: I3aa7038d53c3a4252050aa33e468c48dfed0b70e
Reviewed-on: https://go-review.googlesource.com/30532
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This was necessary in the C days when allspans was an mspan**, but now
that allspans is a Go slice, this is redundant with len(allspans) and
we can use range loops over allspans.
Change-Id: Ie1dc39611e574e29a896e01690582933f4c5be7e
Reviewed-on: https://go-review.googlesource.com/30531
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
These are two ways to refer to the allspans array that hark back to
when the runtime was split between C and Go. Clean this up by making
mheap_.allspans a slice and eliminating h_allspans.
Change-Id: Ic9360d040cf3eb590b5dfbab0b82e8ace8525610
Reviewed-on: https://go-review.googlesource.com/30530
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
These are emulated by the assembler and we don't need them.
Change-Id: I2b07c5315a5b642fdb5e50b468453260ae121164
Reviewed-on: https://go-review.googlesource.com/31758
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Looking at the kernel sources, I don't see how this is possible.
But obviously it is. Just try again.
Fixes#17161.
Change-Id: Iea7d53f7cf75944792d2f75a0d07129831c7bcdb
Reviewed-on: https://go-review.googlesource.com/31823
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Makes windows same as others.
Change-Id: Ib4651e06d0bd37473ac345d36c91f39aa8f5e662
Reviewed-on: https://go-review.googlesource.com/31791
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently mspan.isFree technically returns whether the object was not
allocated *during this cycle*. Fix it so it actually returns whether
or not the object is allocated so the method is more generally useful
(especially for debugging).
It has one caller, which is carefully written to be insensitive to
this distinction, but this lets us simplify this caller.
Change-Id: I9d79cf784a56015e434961733093c1d8d03fc091
Reviewed-on: https://go-review.googlesource.com/30145
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
morestack writes the context pointer to gobuf.ctxt, but since
morestack is written in assembly (and has to be very careful with
state), it does *not* invoke the requisite write barrier for this
write. Instead, we patch this up later, in newstack, where we invoke
an explicit write barrier for ctxt.
This already requires some subtle reasoning, and it's going to get a
lot hairier with the hybrid barrier.
Fix this by simplifying the whole mechanism. Instead of writing
gobuf.ctxt in morestack, just pass the value of the context register
to newstack and let it write it to gobuf.ctxt. This is a normal Go
pointer write, so it gets the normal Go write barrier. No subtle
reasoning required.
Updates #17503.
Change-Id: Ia6bf8459bfefc6828f53682ade32c02412e4db63
Reviewed-on: https://go-review.googlesource.com/31550
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This commit fixes two bizarrely related bugs:
1. The signatures for the call* functions were wrong, indicating that
they had only two pointer arguments instead of three. We didn't notice
because the call* functions are defined by a macro expansion, which go
vet doesn't see.
2. deferArgs on a defer object with a zero-sized frame returned a
pointer just past the end of the allocated object, which is illegal in
Go (and can cause the "sweep increased allocation count" crashes).
In a fascinating twist, these two bugs canceled each other out, which
is why I'm fixing them together. The pointer returned by deferArgs is
used in only two ways: as an argument to memmove and as an argument to
reflectcall. memmove is NOSPLIT, so the argument was unobservable.
reflectcall immediately tail calls one of the call* functions, which
are not NOSPLIT, but the deferArgs pointer just happened to be the
third argument that was accidentally marked as a scalar. Hence, when
the garbage collector scanned the stack, it didn't see the bad
pointer as a pointer.
I believe this was all ultimately benign. In principle, stack growth
during the reflectcall could fail to update the args pointer, but it
never points to the stack, so it never needs to be updated. Also in
principle, the garbage collector could fail to mark the args object
because of the incorrect call* signatures, but in all calls to
reflectcall (including the ones spelled "call" in the reflect package)
the args object is kept live by the calling stack.
Change-Id: Ic932c79d5f4382be23118fdd9dba9688e9169e28
Reviewed-on: https://go-review.googlesource.com/31654
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
trace's reader *g is going to cause write barriers in unfortunate
places, so replace it with a guintptr.
Change-Id: Ie8fb13bb89a78238f9d2a77ec77da703e96df8af
Reviewed-on: https://go-review.googlesource.com/31469
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Fixes#16076
Change-Id: I91fa87b642592ee4604537dd8c3197cd61ec8b31
Reviewed-on: https://go-review.googlesource.com/31516
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This is a more robust method for obtaining the availability of vx.
Since this variable may be checked frequently I've also now
padded it so that it will be in its own cache line.
I've kept the other check (in hash/crc32) the same for now until
I can figure out the best way to update it.
Updates #15403.
Change-Id: I74eed651afc6f6a9c5fa3b88fa6a2b0c9ecf5875
Reviewed-on: https://go-review.googlesource.com/31149
Reviewed-by: Austin Clements <austin@google.com>
oneNewExtraM creates a spare M and G for use with cgo callbacks. The G
doesn't run right away, but goes directly into syscall status. For the
garbage collector, it's marked as "scan valid" and not on the rescan
list, but I forgot to also mark it as "scan done". As a result,
gcMarkRootCheck thinks that the goroutine hasn't been scanned and
panics.
This only affects GODEBUG=gccheckmark=1 mode, since we otherwise skip
the gcMarkRootCheck.
Fixes#17473.
Change-Id: I94f5671c42eb44bd5ea7dc68fbf85f0c19e2e52c
Reviewed-on: https://go-review.googlesource.com/31139
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Updating the heap profile stats is one of the most expensive parts of
mark termination other than stack rescanning, but there's really no
need to do this with the world stopped. Move it to right after we've
started the world back up. This creates a *very* small window where
allocations from the next cycle can slip into the profile, but the
exact point where mark termination happens is so non-deterministic
already that a slight reordering here is unimportant.
Change-Id: I2f76f22c70329923ad6a594a2c26869f0736d34e
Reviewed-on: https://go-review.googlesource.com/31363
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
The only reason these flushes are still necessary at all is that
gcmarknewobject doesn't flush its gcWork stats like it's supposed to.
By changing gcmarknewobject to follow the standard protocol, the
flushes become completely unnecessary because mark 2 ensures caches
are flushed (and stay flushed) before we ever enter mark termination.
In the garbage benchmark, this takes roughly 50 µs, which is
surprisingly long for doing nothing. We still double-check after
draining that they are in fact empty.
Change-Id: Ia1c7cf98a53f72baa513792eb33eca6a0b4a7128
Reviewed-on: https://go-review.googlesource.com/31134
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
The pointer checking code needs to know the exact type of the parameter
expected by the C function, so that it can use a type assertion to
convert the empty interface returned by cgoCheckPointer to the correct
type. Previously this was done by using a type conversion, but that
meant that the code accepted arguments that were convertible to the
parameter type, rather than arguments that were assignable as in a
normal function call. In other words, some code that should not have
passed type checking was accepted.
This CL changes cgo to always use a function literal for pointer
checking. Now the argument is passed to the function literal, which has
the correct argument type, so type checking is performed just as for a
function call as it should be.
Since we now always use a function literal, simplify the checking code
to run as a statement by itself. It now no longer needs to return a
value, and we no longer need a type assertion.
This does have the cost of introducing another function call into any
call to a C function that requires pointer checking, but the cost of the
additional call should be minimal compared to the cost of pointer
checking.
Fixes#16591.
Change-Id: I220165564cf69db9fd5f746532d7f977a5b2c989
Reviewed-on: https://go-review.googlesource.com/31233
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
The panic leaves the lock in an unusable state.
Trying to panic with a usable state makes the lock significantly
less efficient and scalable (see early CL patch sets and discussion).
Instead, use runtime.throw, which will crash the program directly.
In general throw is reserved for when the runtime detects truly
serious, unrecoverable problems. This problem is certainly serious,
and, without a significant performance hit, is unrecoverable.
Fixes#13879.
Change-Id: I41920d9e2317270c6f909957d195bd8b68177f8d
Reviewed-on: https://go-review.googlesource.com/31359
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Correcting a line in asm_ppc64x.s in the cmpbodyLE function
that originally was R14 but accidentally changed to R4.
Fixes#17488
Change-Id: Id4ca6fb2e0cd81251557a0627e17b5e734c39e01
Reviewed-on: https://go-review.googlesource.com/31266
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
GC assists retry if preempted or if they fail to park. However, on the
retry path they currently use stale statistics. In particular, the
retry can use "debtBytes", but debtBytes isn't updated when the debt
changes (since other than retries it is only used once). Also, though
less of a problem, the if the assist ratio has changed while the
assist was blocked, the retry will still use the old assist ratio.
Fix all of this by simply making the retry jump back to where we
compute these statistics, rather than just after.
Change-Id: I2ed8b4f0fc9f008ff060aa926f4334b662ac7d3f
Reviewed-on: https://go-review.googlesource.com/30701
Reviewed-by: Rick Hudson <rlh@golang.org>
This puts all of the assist queue-related code together and makes it
easier to modify how the assist queue works.
Change-Id: Id54e06702bdd5a5dd3fef2ce2c14cd7ca215303c
Reviewed-on: https://go-review.googlesource.com/30700
Reviewed-by: Rick Hudson <rlh@golang.org>
It's pretty simple. For:
e = (interface{})(i)
Do:
tmp = i.itab
if tmp != nil {
tmp = tmp.typ_ // load type from itab
}
e = eface{tmp, i.data}
It is smaller and faster than calling the runtime.
Change-Id: I0ad27f62f4ec0b6cd53bc8530e4da0eae3e67a6c
Reviewed-on: https://go-review.googlesource.com/31260
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
getArgInfo for reflect.makeFuncStub and reflect.methodValueCall is
necessarily special. These have dynamically determined argument maps
that are stored in their context (that is, their *funcval). These
functions are written to store this context at 0(SP) when called, and
getArgInfo retrieves it from there.
This technique works if getArgInfo is passed an active call frame for
one of these functions. However, getArgInfo is also used in
tracebackdefers, where the "call" is not a true call with an active
stack frame, but a deferred call. In this situation, getArgInfo
currently crashes because tracebackdefers passes a frame with sp set
to 0. However, the entire approach used by getArgInfo is flawed in
this situation because the wrapper has not actually executed, and
hence hasn't saved this metadata to any stack frame.
In the defer case, we know the *funcval from the _defer itself, so we
can fix this by teaching getArgInfo to use the *funcval context
directly when its available, and otherwise get it from the active call
frame.
While we're here, this commit simplifies getArgInfo a bit by making it
play more nicely with the type system. Rather than decoding the
*reflect.methodValue that is the wrapper's context as a *[2]uintptr,
just write out a copy of the reflect.methodValue type in the runtime.
Fixes#16331. Fixes#17471.
Change-Id: I81db4d985179b4a81c68c490cceeccbfc675456a
Reviewed-on: https://go-review.googlesource.com/31138
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
If morestack runs on the g0 or gsignal stack, it currently performs
some abort operation that typically produces a signal (e.g., it does
an INT $3 on x86). This is useful if you're running in a debugger, but
if you're not, the runtime tries to trap this signal, which is likely
to send the program into a deeper spiral of collapse and lead to very
confusing diagnostic output.
Help out people trying to debug without a debugger by making morestack
print an informative message before blowing up.
Change-Id: I2814c64509b137bfe20a00091d8551d18c2c4749
Reviewed-on: https://go-review.googlesource.com/31133
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This improves the performance for byte.Compare by rewriting
the cmpbody function in runtime/asm_ppc64x.s. The previous code
had a simple loop which loaded a pair of bytes and compared them,
which is inefficient for long buffers. The updated function checks
for 8 or 32 byte chunks and then loads and compares double words where
possible.
Because the byte.Compare result indicates greater or less than,
the doubleword loads must take endianness into account, using a
byte reversed load in the little endian case.
Fixes#17433
benchmark old ns/op new ns/op delta
BenchmarkBytesCompare/8-16 13.6 7.16 -47.35%
BenchmarkBytesCompare/16-16 25.7 7.83 -69.53%
BenchmarkBytesCompare/32-16 38.1 7.78 -79.58%
BenchmarkBytesCompare/64-16 63.0 10.6 -83.17%
BenchmarkBytesCompare/128-16 112 13.0 -88.39%
BenchmarkBytesCompare/256-16 211 28.1 -86.68%
BenchmarkBytesCompare/512-16 410 38.6 -90.59%
BenchmarkBytesCompare/1024-16 807 60.2 -92.54%
BenchmarkBytesCompare/2048-16 1601 103 -93.57%
Change-Id: I121acc74fcd27c430797647b8d682eb0607c63eb
Reviewed-on: https://go-review.googlesource.com/30949
Reviewed-by: David Chase <drchase@google.com>
Copies utf8 constants and EncodeRune implementation from unicode/utf8.
Adds a new decoderune implementation that is used by the compiler
in code generated for ranging over strings. It does not handle
ASCII runes since these are handled directly before calls to decoderune.
The DecodeRuneInString implementation from unicode/utf8 is not used
since it uses a lookup table that would increase the use of cpu caches.
Adds more tests that check decoding of valid and invalid utf8 sequences.
name old time/op new time/op delta
RuneIterate/range2/ASCII-4 7.45ns ± 2% 7.45ns ± 1% ~ (p=0.634 n=16+16)
RuneIterate/range2/Japanese-4 53.5ns ± 1% 49.2ns ± 2% -8.03% (p=0.000 n=20+20)
RuneIterate/range2/MixedLength-4 46.3ns ± 1% 41.0ns ± 2% -11.57% (p=0.000 n=20+20)
new:
"".decoderune t=1 size=423 args=0x28 locals=0x0
old:
"".charntorune t=1 size=666 args=0x28 locals=0x0
Change-Id: I1df1fdb385bb9ea5e5e71b8818ea2bf5ce62de52
Reviewed-on: https://go-review.googlesource.com/28490
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Currently we use go:nowritebarrier in many places in proc.go.
go:notinheap and go:yeswritebarrierrec now let us use
go:nowritebarrierrec (the recursive form of the go:nowritebarrier
pragma) more liberally. Do so in proc.go
Change-Id: Ia7fcbc12ce6c51cb24730bf835fb7634ad53462f
Reviewed-on: https://go-review.googlesource.com/30942
Reviewed-by: Rick Hudson <rlh@golang.org>
This covers basically all sysAlloc'd, persistentalloc'd, and
fixalloc'd types.
Change-Id: I0487c887c2a0ade5e33d4c4c12d837e97468e66b
Reviewed-on: https://go-review.googlesource.com/30941
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently mspan links to its previous mspan using a **mspan field that
points to the previous span's next field. This simplifies some of the
list manipulation code, but is going to make it very hard to convince
the compiler that mspan list manipulations don't need write barriers.
Fix this by using a more traditional ("boring") linked list that uses
a simple *mspan pointer to the previous mspan. This complicates some
of the list manipulation slightly, but it will let us eliminate all
write barriers from the mspan list manipulation code by marking mspan
go:notinheap.
Change-Id: I0d0b212db5f20002435d2a0ed2efc8aa0364b905
Reviewed-on: https://go-review.googlesource.com/30940
Reviewed-by: Rick Hudson <rlh@golang.org>
This adds a //go:notinheap pragma for declarations of types that must
not be heap allocated. We ensure these rules by disallowing new(T),
make([]T), append([]T), or implicit allocation of T, by disallowing
conversions to notinheap types, and by propagating notinheap to any
struct or array that contains notinheap elements.
The utility of this pragma is that we can eliminate write barriers for
writes to pointers to go:notinheap types, since the write barrier is
guaranteed to be a no-op. This will let us mark several scheduler and
memory allocator structures as go:notinheap, which will let us
disallow write barriers in the scheduler and memory allocator much
more thoroughly and also eliminate some problematic hybrid write
barriers.
This also makes go:nowritebarrierrec and go:yeswritebarrierrec much
more powerful. Currently we use go:nowritebarrier all over the place,
but it's almost never what you actually want: when write barriers are
illegal, they're typically illegal for a whole dynamic scope. Partly
this is because go:nowritebarrier has been around longer, but it's
also because go:nowritebarrierrec couldn't be used in situations that
had no-op write barriers or where some nested scope did allow write
barriers. go:notinheap eliminates many no-op write barriers and
go:yeswritebarrierrec makes it possible to opt back in to write
barriers, so these two changes will let us use go:nowritebarrierrec
far more liberally.
This updates #13386, which is about controlling pointers from non-GC'd
memory to GC'd memory. That would require some additional pragma (or
pragmas), but could build on this pragma.
Change-Id: I6314f8f4181535dd166887c9ec239977b54940bd
Reviewed-on: https://go-review.googlesource.com/30939
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This pragma cancels the effect of go:nowritebarrierrec. This is useful
in the scheduler because there are places where we enter a function
without a valid P (and hence cannot have write barriers), but then
obtain a P. This allows us to annotate the function with
go:nowritebarrierrec and split out the part after we've obtained a P
into a go:yeswritebarrierrec function.
Change-Id: Ic8ce4b6d3c074a1ecd8280ad90eaf39f0ffbcc2a
Reviewed-on: https://go-review.googlesource.com/30938
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
To compile:
m[k] = v
instead of:
mapassign(maptype, m, &k, &v), do
do:
*mapassign(maptype, m, &k) = v
mapassign returns a pointer to the value slot in the map. It is just
like mapaccess except that it will allocate a new slot if k is not
already present in the map.
This makes map accesses faster but potentially larger (codewise).
It is faster because the write into the map is done when the compiler
knows the concrete type, so it can be done with a few store
instructions instead of calling typedmemmove. We also potentially
avoid stack temporaries to hold v.
The code can be larger when the map has pointers in its value type,
since there is a write barrier call in addition to the mapassign call.
That makes the code at the callsite a bit bigger (go binary is 0.3%
bigger).
This CL is in preparation for doing operations like m[k] += v with
only a single runtime call. That will roughly double the speed of
such operations.
Update #17133
Update #5147
Change-Id: Ia435f032090a2ed905dac9234e693972fe8c2dc5
Reviewed-on: https://go-review.googlesource.com/30815
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Update comments for duffzero and duffcopy
which referred to legacy locations:
+ cmd/?g/cgen.go
+ cmd/?g/ggen.go
Remnants of the old days when we had 5g, 6g etc.
Those locations have since moved to:
+ cmd/compile/internal/<arch>/cgen.go
+ cmd/compile/internal/<arch>/ggen.go
Change-Id: Ie2ea668559d52d42b747260ea69a6d5b3d70e859
Reviewed-on: https://go-review.googlesource.com/29073
Reviewed-by: Russ Cox <rsc@golang.org>
Add checks for failure of CreateEvent, SetEvent or
WaitForSingleObject. Any failures are considered fatal and
will throw() after printing an informative message.
Updates #16646
Change-Id: I3bacf9001d2abfa8667cc3aff163ff2de1c99915
Reviewed-on: https://go-review.googlesource.com/26655
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Fix typo in word synchronization in comments.
Change-Id: I453b4e799301e758799c93df1e32f5244ca2fb84
Reviewed-on: https://go-review.googlesource.com/25174
Reviewed-by: Russ Cox <rsc@golang.org>
The canBackTrace variable is true for all of the architectures
Go supports and this is likely to remain the case as new
architectures are added.
Change-Id: I73900c018eb4b2e5c02fccd8d3e89853b2ba9d90
Reviewed-on: https://go-review.googlesource.com/22423
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
ARM direct CALL/JMP instruction has 24 bit offset, which can only
encodes jumps within +/-32M. When the target is too far, the top
bits get truncated and the program jumps wild.
This CL detects too-far jumps and automatically insert trampolines,
currently only internal linking on ARM.
It is necessary to make the following changes to the linker:
- Resolve direct jump relocs when assigning addresses to functions.
this allows trampoline insertion without moving all code that
already laid down.
- Lay down packages in dependency order, so that when resolving a
inter-package direct jump reloc, the target address is already
known. Intra-package jumps are assumed never too far.
- a linker flag -debugtramp is added for debugging trampolines:
"-debugtramp=1 -v" prints trampoline debug message
"-debugtramp=2" forces all inter-package jump to use
trampolines (currently ARM only)
"-debugtramp=2 -v" does both
- Some data structures are changed for bookkeeping.
On ARM, pseudo DIV/DIVU/MOD/MODU instructions now clobber R8
(unfortunate). In the standard library there is no ARM assembly
code that uses these instructions, and the compiler no longer emits
them (CL 29390).
all.bash passes with -debugtramp=2, except a disassembly test (this
is unavoidable as we changed the instruction).
TBD: debug info of trampolines?
Fixes#17028.
Change-Id: Idcce347ea7e0af77c4079041a160b2f6e114b474
Reviewed-on: https://go-review.googlesource.com/29397
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
If we get a SIGPROF on a non-Go thread, and the program has not called
runtime.SetCgoTraceback so we have no way to collect a stack trace, then
record a profile that is just the PC where the signal occurred. That
will at least point the user to the right area.
Retrieving the PC from the sigctxt in a signal handler on a non-G thread
required marking a number of trivial sigctxt methods as nosplit, and,
for extra safety, nowritebarrierrec.
The test shows that the existing test CgoPprofThread test does not test
the stack trace, just the profile signal. Leaving that for later.
Change-Id: I8f8f3ff09ac099fc9d9df94b5a9d210ffc20c4ab
Reviewed-on: https://go-review.googlesource.com/30252
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
CL 14472 solved issue #12030 by explicitly linking msvcrt.dll
to every cgo executable we build. This CL achieves the same
by manually loading ntdll.dll during startup.
Updates #12030
Change-Id: I5d9cd925ef65cc34c5d4031c750f0f97794529b2
Reviewed-on: https://go-review.googlesource.com/30737
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Currently these are labeled "MARK", which was accurate in the STW
collector, but these really indicate mark termination now, since
marking happens for the full duration of the concurrent GC. Re-label
them as "MARK TERMINATION" to clarify this.
Change-Id: Ie98bd961195acde49598b4fa3f9e7d90d757c0a6
Reviewed-on: https://go-review.googlesource.com/30018
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
When GC is disabled, we set gcpercent to -1. However, we still use
gcpercent to compute several values, such as next_gc and gc_trigger.
These calculations are meaningless when gcpercent is -1 and result in
meaningless values. This is okay in a sense because we also never use
these values if gcpercent is -1, but they're confusing when exposed to
the user, for example via MemStats or the execution trace. It's
particularly unfortunate in the execution trace because it attempts to
plot the underflowed value of next_gc, which scales all useful
information in the heap row into oblivion.
Fix this by making next_gc ^0 when gcpercent < 0. This has the
advantage of being true in a way: next_gc is effectively infinite when
gcpercent < 0. We can also detect this special value when updating the
execution trace and report next_gc as 0 so it doesn't blow up the
display of the heap line.
Change-Id: I4f366e4451f8892a4908da7b2b6086bdc67ca9a9
Reviewed-on: https://go-review.googlesource.com/30016
Reviewed-by: Rick Hudson <rlh@golang.org>
On 64-bit big-endian GNU/Linux machines we need to treat sigset as a
single uint64, not as a pair of uint32 values. This fix was already made
for s390x, but not for ppc64 (which is big-endian--the little endian
version is known as ppc64le). So copy os_linux_390.x to
os_linux_be64.go, and use build constraints as needed.
Fixes#17361
Change-Id: Ia0eb18221a8f5056bf17675fcfeb010407a13fb0
Reviewed-on: https://go-review.googlesource.com/30602
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The CgoExternalThreadSIGPROF test starts a thread at constructor time
that does a busy loop. That can throw off some other tests. So only
build that code if testprogcgo is built with the tag threadprof, and
adjust the tests that use that code to pass that build tag.
This revealed that the CgoPprofThread test was not testing what it
should have, as it never actually started the cpuHog thread. It was
passing because of the busy loop thread. Fix it to start the thread as
intended.
Change-Id: I087a9e4fc734a86be16a287456441afac5676beb
Reviewed-on: https://go-review.googlesource.com/30362
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
In the function prologue, we emit a jump to the beginning of
the function immediately after calling morestack. And in the
runtime stack growing code, it decodes and emulates that jump.
This emulation was necessary before we had per-PC SP deltas,
since the traceback code assumed that the frame size was fixed
for the whole function, except on the first instruction where
it was 0. Since we now have per-PC SP deltas and PCDATA, we
can correctly record that the frame size is 0. This makes the
emulation unnecessary.
This may be helpful for registerized calling convention, where
there may be unspills of arguments after calling morestack. It
also simplifies the runtime.
Change-Id: I7ebee31eaee81795445b33f521ab6a79624c4ceb
Reviewed-on: https://go-review.googlesource.com/30138
Reviewed-by: David Chase <drchase@google.com>
Calling cgocallback from a signal handler can fail when using the race
detector. Calling cgocallback will lead to a call to newextram which
will call oneNewExtraM which will call racegostart. The racegostart
function will set up some race detector data structures, and doing that
will sometimes call the C memory allocator. If we are running the signal
handler from a signal that interrupted the C memory allocator, we will
crash or hang.
Instead, change the signal handler code to call needm and dropm. The
needm function will grab allocated m and g structures and initialize the
g to use the current stack--the signal stack. That is all we need to
safely call code that allocates memory and checks whether it needs to
split the stack. This may temporarily leave us with no m available to
run a cgo callback, but that is OK in this case since the code we call
will quickly either crash or call dropm to return the m.
Implementing this required changing some of the setSignalstackSP
functions to avoid a write barrier. These functions never need a write
barrier but in some cases generated one anyhow because on some systems
the ss_sp field is a pointer.
Change-Id: I3893f47c3a66278f85eab7f94c1ab11d4f3be133
Reviewed-on: https://go-review.googlesource.com/30218
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Change-Id: I8fd271066925734c3f7196f64db04f27c4ce27cb
Reviewed-on: https://go-review.googlesource.com/30274
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
I avoided anywhere in the compiler or things which might be used by
the compiler in the future, since they need to build with Go 1.4.
I also avoided anywhere where there was no benefit to changing it.
I probably missed some.
Updates #16721
Change-Id: Ib3c895ff475c6dec2d4322393faaf8cb6a6d4956
Reviewed-on: https://go-review.googlesource.com/30250
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
In particular, it wasn't obvious that some values are special (unless
you also found those special values), so document that it isn't
necessarily a hash value.
Change-Id: Iff292822b44408239e26cd882dc07be6df2c1d38
Reviewed-on: https://go-review.googlesource.com/30143
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
gcDumpObject is often used on a stack pointer (for example, when
checkmark finds an unmarked object on the stack), but since stack
spans don't have an elemsize, it doesn't print any of the memory from
the frame. Make it at least slightly more useful by printing
everything between obj and obj+off (inclusive). While we're here, also
print out the span state.
Change-Id: I51be064ea8791b4a365865bfdc7afa7b5aaecfbd
Reviewed-on: https://go-review.googlesource.com/30142
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently span states are untyped constants and the field is just a
uint8. Make this more type-safe by introducing a type for the span
state.
Change-Id: I369bf59fe6e8234475f4921611424fceb7d0a6de
Reviewed-on: https://go-review.googlesource.com/30141
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently the SetFinalizer documentation makes a strong claim that
SetFinalizer will panic if the pointer is not to an object allocated
by calling new, to a composite literal, or to a local variable. This
is not true. For example, it doesn't panic when passed the address of
a package-level variable. Nor can we practically make it true. For
example, we can't distinguish between passing a pointer to a composite
literal and passing a pointer to its first field.
Hence, weaken the guarantee to say that it "may" panic.
Updates #17311. (Might fix it, depending on what we want to do with
package-level variables.)
Change-Id: I1c68ea9d0a5bbd3dd1b7ce329d92b0f05e2e0877
Reviewed-on: https://go-review.googlesource.com/30137
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
In FreeBSD 10.0, the _umtx_op syscall was changed to allow sleeping on
any supported clock, but the default clock was switched from a monotonic
clock to CLOCK_REALTIME.
Prior to 10.0, the __umtx_op_wait* functions ignored the fourth argument
to _umtx_op (uaddr1), expected the fifth argument (uaddr2) to be a
struct timespec pointer, and used a monotonic clock (nanouptime(9)) for
timeout calculations.
Since 10.0, if callers want a clock other than CLOCK_REALTIME, they must
call _umtx_op with uaddr1 set to a value greater than sizeof(struct
timespec), and with uaddr2 as pointer to a struct _umtx_time, rather
than a timespec. Callers can set the _clockid field of the struct
_umtx_time to request the clock they want.
The relevant FreeBSD commit:
https://svnweb.freebsd.org/base?view=revision&revision=232144Fixes#17168
Change-Id: I3dd7b32b683622b8d7b4a6a8f9eb56401bed6bdf
Reviewed-on: https://go-review.googlesource.com/30154
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The endcgo function call is currently deferred in case a cgo
callback into Go panics and unwinds through cgocall. Typical cgo
calls do not have callbacks into Go, and even fewer panic, so we
pay the cost of this defer for no typical benefit.
Amazingly, there is another defer on the cgocallback path also used
to cleanup in case the Go code called by cgo panics. This CL folds
the first defer into the second, to reduce the cost of typical cgo
calls.
This reduces the overhead for a no-op cgo call significantly:
name old time/op new time/op delta
CgoNoop-8 93.5ns ± 0% 51.1ns ± 1% -45.34% (p=0.016 n=4+5)
The total effect between Go 1.7 and 1.8 is even greater, as CL 29656
reduced the cost of defer recently. Hopefully a future Go release
will drop the cost of defer to nothing, making this optimization
unnecessary. But until then, this is nice.
Change-Id: Id1a5648f687a87001d95bec6842e4054bd20ee4f
Reviewed-on: https://go-review.googlesource.com/30080
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Consistently access function parameters using the FP pseudo-register
instead of SP (e.g., x+0(FP) instead of x+4(SP) or x+8(SP), depending
on register size). Two reasons: 1) doc/asm says the SP pseudo-register
should use negative offsets in the range [-framesize, 0), and 2)
cmd/vet only validates parameter offsets when indexed from the FP
pseudo-register.
No binary changes to the compiled object files for any of the affected
package/OS/arch combinations.
Change-Id: I0efc6079bc7519fcea588c114ec6a39b245d68b0
Reviewed-on: https://go-review.googlesource.com/30085
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This change reverts CL 18814 which is a workaroud for older DragonFly
BSD kernels, and fixes#13945 and #13947 in a more general way the
same as other platforms except NetBSD.
This is a followup to CL 29491.
Updates #16329.
Change-Id: I771670bc672c827f2b3dbc7fd7417c49897cb991
Reviewed-on: https://go-review.googlesource.com/29971
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Change setsig, setsigstack, getsig, raise, raiseproc to take uint32 for
signal number parameter, as that is the type mostly used for signal
numbers. Same for dieFromSignal, sigInstallGoHandler, raisebadsignal.
Remove setsig restart parameter, as it is always either true or
irrelevant.
Don't check the handler in setsigstack, as the only caller does that
anyhow.
Don't bother to convert the handler from sigtramp to sighandler in
getsig, as it will never be called when the handler is sigtramp or
sighandler.
Don't check the return value from rt_sigaction in the GNU/Linux version
of setsigstack; no other setsigstack checks it, and it never fails.
Change-Id: I6bbd677e048a77eddf974dd3d017bc3c560fbd48
Reviewed-on: https://go-review.googlesource.com/29953
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This change replaces the use of CryptGenRandom with RtlGenRandom in
Windows to generate cryptographically random numbers during process
startup. RtlGenRandom uses the same RNG as CryptGenRandom, but it has many
fewer DLL dependencies and so does not affect process startup time as
much.
This makes running simple Go program on my computers faster.
Windows XP:
benchmark old ns/op new ns/op delta
BenchmarkRunningGoProgram-2 47408573 10784148 -77.25%
Windows 7 (VM):
benchmark old ns/op new ns/op delta
BenchmarkRunningGoProgram 16260390 12792150 -21.33%
Windows 7:
benchmark old ns/op new ns/op delta
BenchmarkRunningGoProgram-2 13600778 10050574 -26.10%
Fixes#15589
Change-Id: I2816239a2056e3d4a6dcd86a6fa2bb619c6008fe
Reviewed-on: https://go-review.googlesource.com/29700
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The OS-independent sigmask type was not pulling its weight. Replace it
with the OS-dependent sigset type. This requires adding an OS-specific
sigaddset function, but permits removing the OS-specific sigmaskToSigset
function.
Change-Id: I43307b512b0264ec291baadaea902f05ce212305
Reviewed-on: https://go-review.googlesource.com/29950
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Change the two calls to signalstack(nil) to inline the code
instead (it's two lines).
Change-Id: Ie92a05494f924f279e40ac159f1b677fda18f281
Reviewed-on: https://go-review.googlesource.com/29854
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The SetFinalizer documentation states that
"The argument obj must be a pointer to an object allocated by calling
new or by taking the address of a composite literal."
which precludes pointers to local variables. According to a comment
on #6591, this case is expected to work. This CL updates the documentation
for SetFinalizer accordingly.
Fixes#6591
Change-Id: Id861b3436bc1c9521361ea2d51c1ce74a121c1af
Reviewed-on: https://go-review.googlesource.com/29592
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This optimizes deferproc and deferreturn in various ways.
The most important optimization is that it more carefully arranges to
prevent preemption or stack growth. Currently we do this by switching
to the system stack on every deferproc and every deferreturn. While we
need to be on the system stack for the slow path of allocating and
freeing defers, in the common case we can fit in the nosplit stack.
Hence, this change pushes the system stack switch down into the slow
paths and makes everything now exposed to the user stack nosplit. This
also eliminates the need for various acquirem/releasem pairs, since we
are now preventing preemption by preventing stack split checks.
As another smaller optimization, we special case the common cases of
zero-sized and pointer-sized defer frames to respectively skip the
copy and perform the copy in line instead of calling memmove.
This speeds up the runtime defer benchmark by 42%:
name old time/op new time/op delta
Defer-4 75.1ns ± 1% 43.3ns ± 1% -42.31% (p=0.000 n=8+10)
In reality, this speeds up defer by about 2.2X. The two benchmarks
below compare a Lock/defer Unlock pair (DeferLock) with a Lock/Unlock
pair (NoDeferLock). NoDeferLock establishes a baseline cost, so these
two benchmarks together show that this change reduces the overhead of
defer from 61.4ns to 27.9ns.
name old time/op new time/op delta
DeferLock-4 77.4ns ± 1% 43.9ns ± 1% -43.31% (p=0.000 n=10+10)
NoDeferLock-4 16.0ns ± 0% 15.9ns ± 0% -0.39% (p=0.000 n=9+8)
This also shaves 34ns off cgo calls:
name old time/op new time/op delta
CgoNoop-4 122ns ± 1% 88.3ns ± 1% -27.72% (p=0.000 n=8+9)
Updates #14939, #16051.
Change-Id: I2baa0dea378b7e4efebbee8fca919a97d5e15f38
Reviewed-on: https://go-review.googlesource.com/29656
Reviewed-by: Keith Randall <khr@golang.org>
This makes it possible to inline getcallersp. getcallersp is on the
hot path of defers, so this slightly speeds up defer:
name old time/op new time/op delta
Defer-4 78.3ns ± 2% 75.1ns ± 1% -4.00% (p=0.000 n=9+8)
Updates #14939.
Change-Id: Icc1cc4cd2f0a81fc4c8344432d0b2e783accacdd
Reviewed-on: https://go-review.googlesource.com/29655
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The big documentation comment at the top of malloc.go has gotten
woefully out of date. Update it.
Change-Id: Ibdb1bdcfdd707a6dc9db79d0633a36a28882301b
Reviewed-on: https://go-review.googlesource.com/29731
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
This documents all fields in MemStats and more clearly documents where
mstats differs from MemStats.
Fixes#15849.
Change-Id: Ie09374bcdb3a5fdd2d25fe4bba836aaae92cb1dd
Reviewed-on: https://go-review.googlesource.com/28972
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
We used to compute an estimate of the reachable heap size that was
different from the marked heap size. This ultimately caused more
problems than it solved, so we pulled it out, but memstats still has
both heap_reachable and heap_marked, and there are some leftover TODOs
about the problems with this estimate.
Clean this up by eliminating heap_reachable in favor of heap_marked
and deleting the stale TODOs.
Change-Id: I713bc20a7c90683d2b43ff63c0b21a440269cc4d
Reviewed-on: https://go-review.googlesource.com/29271
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Back in Go 1.4, memstats.next_gc was both the heap size at which GC
would trigger, and the size GC kept the heap under. When we switched
to concurrent GC in Go 1.5, we got somewhat confused and made this
variable the trigger heap size, while gcController.heapGoal became the
goal heap size.
memstats.next_gc is exposed to the user via MemStats.NextGC, while
gcController.heapGoal is not. This is unfortunate because 1) the heap
goal is far more useful for diagnostics, and 2) the trigger heap size
is just part of the GC trigger heuristic, which means it wouldn't be
useful to an application even if it tried to use it.
We never noticed this mess because MemStats.NextGC is practically
undocumented. Now that we're trying to document MemStats, it became
clear that this field had diverged from its original usefulness.
Clean up this mess by shuffling things back around so that next_gc is
the goal heap size and the new (unexposed) memstats.gc_trigger field
is the trigger heap size. This eliminates gcController.heapGoal.
Updates #15849.
Change-Id: I2cbbd43b1d78bdf613cb43f53488bd63913189b7
Reviewed-on: https://go-review.googlesource.com/29270
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
All the variants that sets the new signal mask in minit do the same
thing, so merge them. This requires an OS-specific sigdelset function;
the function already exists for linux, and is now added for other OS's.
Change-Id: Ie96f6f02e2cf09c43005085985a078bd9581f670
Reviewed-on: https://go-review.googlesource.com/29771
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Combine the various versions of sigtrampgo into a single function in
signal_unix.go. This requires defining a fixsigcode method on sigctxt
for all operating systems; it only does something on Darwin. This also
requires changing the darwin/amd64 signal handler to call sigreturn
itself, rather than relying on sigtrampgo to call sigreturn for it. We
can then drop the Darwin sigreturn function, as it is no longer used.
Change-Id: I5a0b9d2d2c141957e151b41e694efeb20e4b4b9a
Reviewed-on: https://go-review.googlesource.com/29761
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Change all Unix systems to use stackt for the alternate signal
stack (some were using sigaltstackt). Add OS-specific setSignalstackSP
function to handle different types for ss_sp field, and unify all
OS-specific signalstack functions into one. Unify handling of alternate
signal stack in OS-specific minit and sigtrampgo functions via new
functions minitSignalstack and setGsignalStack.
Change-Id: Idc316dc69b1dd725717acdf61a1cd8b9f33ed174
Reviewed-on: https://go-review.googlesource.com/29757
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Implement a comment by Ralph Corderoy on CL 29754.
Change-Id: I22bbede211ddcb8a057f16b4f47d335a156cc8d2
Reviewed-on: https://go-review.googlesource.com/29756
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Currently raceSymbolizeCode uses funcline, which is internal runtime
function which crashes on incorrect PCs. Use FileLine instead,
it is public and does not crash on invalid data.
Note: FileLine returns "?" file on failure. That string is not NUL-terminated,
so we need to additionally check what FileLine returns.
Fixes#17190
Change-Id: Ic6fbd4f0e68ddd52e9b2dd25e625b50adcb69a98
Reviewed-on: https://go-review.googlesource.com/29714
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
PC passed to racegostart is expected to be a return PC
of the go statement. Race runtime will subtract 1 from the PC
before symbolization. Passing start PC of a function is wrong.
Add sys.PCQuantum to the function start PC.
Update #17190
Change-Id: Ia504c49e79af84ed4ea360c2aea472b370ea8bf5
Reviewed-on: https://go-review.googlesource.com/29712
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Replace all the Unix sighandler functions with a single instance.
Push the relatively small amount of processor-specific code into five
methods on sigctxt: sigpc, sigsp, siglr, fault, preparePanic.
(Some processors already had a fault method.)
Change-Id: Ib459412ff8f7e0f5ad06bfd43eb827c8b196fc32
Reviewed-on: https://go-review.googlesource.com/29752
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Takes a bit too long to run it all the time.
Fixes#17217
Update #17104
Change-Id: I4802190ea16ee0f436a7f95b093ea0f995f5b11d
Reviewed-on: https://go-review.googlesource.com/29751
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Unify the OS-specific versions of msigsave, msigrestore, sigblock,
updatesigmask, and unblocksig into single versions in signal_unix.go.
To do this, make sigprocmask work the same way on all systems, which
required adding a definition of sigprocmask for linux and openbsd.
Also add a single OS-specific function sigmaskToSigset.
Change-Id: I7cbf75131dddb57eeefe648ef845b0791404f785
Reviewed-on: https://go-review.googlesource.com/29689
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Attempt to fix the linux-amd64-clang builder, which broke
with CL 29472.
Turns out pthread_yield is a non-portable Linux function, and
should have #define _GNU_SOURCE before #include <pthread.h>.
GCC doesn't complain about this, but Clang does:
./raceprof.go:44:3: warning: implicit declaration of function 'pthread_yield' is invalid in C99 [-Wimplicit-function-declaration]
(Though the error, while explicable, certainly could be clearer.)
There is a portable POSIX equivalent, sched_yield, so this
CL uses it instead.
Change-Id: I58ca7a3f73a2b3697712fdb02e72a8027c391169
Reviewed-on: https://go-review.googlesource.com/29675
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Inspired by difficulties with plugin support on darwin.
Change-Id: I2cef8410837946454e75d00e94e46791f03f2267
Reviewed-on: https://go-review.googlesource.com/29391
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Instrumenting copy and append for the race detector changes them to call
different functions. In the runtime package the alternate functions are
not marked as nosplit. This caused a crash in the SIGPROF handler when
invoked on a non-Go thread in a program built with the race detector. In
some cases the handler can call copy, the race detector changed that to
a call to a non-nosplit function, the function tried to check the stack
guard, and crashed because it was running on a non-Go thread. The
SIGPROF handler is written carefully to avoid such problems, but hidden
function calls are difficult to avoid.
Fix this by changing the compiler to not instrument copy and append when
compiling the runtime package. Change the runtime package to add
explicit race checks for the only code I could find where copy is used
to write to user data (append is never used).
Change-Id: I11078a66c0aaa459a7d2b827b49f4147922050af
Reviewed-on: https://go-review.googlesource.com/29472
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Requires adding a sigfwd function for Solaris, as previously
signal2_unix.go was not built for Solaris.
Change-Id: Iea3ff0ddfa15af573813eb075bead532b324a3fc
Reviewed-on: https://go-review.googlesource.com/29550
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This change reverts CL 18835 which is a workaroud for older DragonFly
BSD kernels, and fixes#14051, #14052 and #14067 in a more general way
the same as other platforms except NetBSD.
This change also bumps the minimum required version of DragonFly BSD
kernel to 4.4.4.
Fixes#16329.
Change-Id: I0b44b6afa675f5ed9523914226bd9ec4809ba5ae
Reviewed-on: https://go-review.googlesource.com/29491
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Some applications built with Go on ppc64x with external linking
can fail to link with relocation truncation errors if the elf
text section that is generated is larger than 2^26 bytes and that
section contains a call instruction (bl) which calls a function
beyond the limit addressable by the 24 bit field in the
instruction.
This solution consists of generating multiple text sections where
each is small enough to allow the GNU linker to resolve the calls
by generating long branch code where needed. Other changes were added
to handle differences in processing when multiple text sections exist.
Some adjustments were required to the computation of a method's address
when using the method offset table when there are multiple text sections.
The number of possible section headers was increased to allow for up
to 128 text sections. A test case was also added.
Fixes#15823.
Change-Id: If8117b0e0afb058cbc072258425a35aef2363c92
Reviewed-on: https://go-review.googlesource.com/27790
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>