Commit Graph

74 Commits

Author SHA1 Message Date
Austin Clements
24dd83d7eb runtime: buffered write barrier for amd64p32
Updates #22460.

Change-Id: I6656d478625e5e54aa2eaa38d99dfb0f71ea1fdd
Reviewed-on: https://go-review.googlesource.com/92697
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-13 16:34:16 +00:00
Austin Clements
15d6ab69fb runtime: make systemstack tail call if already switched
Currently systemstack always calls its argument, even if we're already
on the system stack. Unfortunately, traceback with _TraceJump stops at
the first systemstack it sees, which often cuts off runtime stacks
early in profiles.

Fix this by performing a tail call if we're already on the system
stack. This eliminates it from the traceback entirely, so it won't
stop prematurely (or all get mushed into a single node in the profile
graph).

Change-Id: Ibc69e8765e899f8d3806078517b8c7314da196f4
Reviewed-on: https://go-review.googlesource.com/74050
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-30 16:33:55 +00:00
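
A minimal sketch of the resulting control flow, with hypothetical helper
names standing in for the real assembly implementation:

package sketch

// onSystemStack, switchToG0, and switchBackToG are hypothetical stand-ins;
// the real systemstack lives in assembly (asm_*.s).
func onSystemStack() bool { return false }
func switchToG0()         {}
func switchBackToG()      {}

func systemstack(fn func()) {
    if onSystemStack() {
        // Already on g0: the assembly version tail-calls (JMPs to) fn here,
        // so no systemstack frame appears in tracebacks at all.
        fn()
        return
    }
    switchToG0()
    fn()
    switchBackToG()
}
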
Austin Clements
3beaf26e4f runtime: remove write barriers from newstack, gogo
Currently, newstack and gogo have write barriers for maintaining the
context register saved in g.sched.ctxt. This is troublesome, because
newstack can be called from go:nowritebarrierrec places that can't
allow write barriers. It happens to be benign because g.sched.ctxt
will always be nil on entry to newstack *and* it so happens the
incoming ctxt will also always be nil in these contexts (I
think/hope), but this is playing with fire. It's also desirable to
mark newstack go:nowritebarrierrec to prevent any other, non-benign
write barriers from creeping in, but we can't do that right now
because of this one write barrier.

Fix all of this by observing that g.sched.ctxt is really just a saved
live pointer register. Hence, we can shade it when we scan g's stack
and otherwise move it back and forth between the actual context
register and g.sched.ctxt without write barriers. This means we can
save it in morestack along with all of the other g.sched, eliminate
the save from newstack along with its troublesome write barrier, and
eliminate the shenanigans in gogo to invoke the write barrier when
restoring it.

Once we've done all of this, we can mark newstack
go:nowritebarrierrec.

Fixes #22385.
For #22460.

Change-Id: I43c24958e3f6785b53c1350e1e83c2844e0d1522
Reviewed-on: https://go-review.googlesource.com/72553
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-29 17:56:08 +00:00
David Chase
6cac100eef cmd/compile: add intrinsic for reading caller's pc
First step towards removing the mandatory argument for
getcallerpc, which solves certain problems for the runtime.
This might also slightly improve performance.

Intrinsic enabled on 386, amd64, amd64p32,
runtime asm implementation removed on those architectures.

The now-superfluous argument remains in getcallerpc's signature
(for a future CL; non-386/amd64 asm funcs ignore it).

Added getcallerpc to the "not a real function" test
in dcl.go; that story is a little odd with respect to
unexported functions, but that is not this CL.

Fixes #17327.

Change-Id: I5df1ad91f27ee9ac1f0dd88fa48f1329d6306c3e
Reviewed-on: https://go-review.googlesource.com/31851
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-09-22 18:37:03 +00:00
Martin Möhrmann
6675fadfb8 runtime: cleanup amd64p32 memmove and memclr file organization
Move memclr to a separate file to make it consistent
with the asm-function-to-file organization used on other platforms.

Remove nacl from the memmove filename, as the implementation
is generic to the amd64p32 platform even though nacl is currently
the only supported amd64p32 target.

Change-Id: I8930b76da430a5cf2664801974e4f5185fc0f82f
Reviewed-on: https://go-review.googlesource.com/61031
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-09-05 20:00:05 +00:00
Martin Möhrmann
3216e0cefa cmd/compile: replace eqstring with memequal
eqstring is only called for strings with equal lengths.
Instead of pushing a pointer and length for each argument string
on the stack we can omit pushing one of the lengths on the stack.

Changing eqstring's signature to eqstring(*uint8, *uint8, int) bool
to implement the above optimization would make it very similar to the
existing memequal(*any, *any, uintptr) bool function.

Since string lengths are positive we can avoid code redundancy and
use memequal instead of using eqstring with an optimized signature.

go command binary size reduced by 4128 bytes on amd64.

name                          old time/op    new time/op    delta
CompareStringEqual              6.03ns ± 1%    5.71ns ± 1%   -5.23%  (p=0.000 n=19+18)
CompareStringIdentical          2.88ns ± 1%    3.22ns ± 7%  +11.86%  (p=0.000 n=20+20)
CompareStringSameLength         4.31ns ± 1%    4.01ns ± 1%   -7.17%  (p=0.000 n=19+19)
CompareStringDifferentLength    0.29ns ± 2%    0.29ns ± 2%     ~     (p=1.000 n=20+20)
CompareStringBigUnaligned       64.3µs ± 2%    64.1µs ± 3%     ~     (p=0.164 n=20+19)
CompareStringBig                61.9µs ± 1%    61.6µs ± 2%   -0.46%  (p=0.033 n=20+19)

Change-Id: Ice15f3b937c981f0d3bc8479a9ea0d10658ac8df
Reviewed-on: https://go-review.googlesource.com/53650
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-22 17:59:02 +00:00
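
In effect, s == t now lowers to a length check emitted by the compiler
followed by a single memequal call. A portable sketch of that shape (the
runtime's memequal is optimized assembly; unsafe.StringData is used here
only for illustration):

package sketch

import "unsafe"

// stringEqual mirrors what the compiler emits for s == t after this CL.
func stringEqual(s, t string) bool {
    if len(s) != len(t) { // inline length check; no redundant length pushed
        return false
    }
    p := unsafe.Pointer(unsafe.StringData(s))
    q := unsafe.Pointer(unsafe.StringData(t))
    for i := uintptr(0); i < uintptr(len(s)); i++ { // stand-in for memequal
        if *(*byte)(unsafe.Add(p, i)) != *(*byte)(unsafe.Add(q, i)) {
            return false
        }
    }
    return true
}
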
Cholerae Hu
57bf6aca71 runtime, cmd/compile: add intrinsic getclosureptr
Intrinsic enabled on all architectures,
runtime asm implementation removed on all architectures.

Fixes #21258

Change-Id: I2cb86d460b497c2f287a5b3df5c37fdb231c23a7
Reviewed-on: https://go-review.googlesource.com/53411
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2017-08-11 18:11:22 +00:00
Martin Möhrmann
7045e6f6c4 runtime: remove unused prefetch functions
The only non test user of the assembler prefetch functions is the
heapBits.prefetch function which is itself unused.

The runtime prefetch functions have no functionality on most platforms
and are not inlineable since they are written in assembler. The function
call overhead eliminates the performance gains that could be achieved with
prefetching and would degrade performance for platforms where the functions
are no-ops.

If prefetch functions are needed back again later they can be improved
by avoiding the function call overhead and implementing them as intrinsics.

Change-Id: I52c553cf3607ffe09f0441c6e7a0a818cb21117d
Reviewed-on: https://go-review.googlesource.com/44370
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-08-08 06:43:49 +00:00
Martin Möhrmann
aeee34cb24 runtime: remove unused cpuid_X variables
They are not exported and not used in the compiler or standard library.

Change-Id: Ie1d210464f826742d282f12258ed1792cbd2d188
Reviewed-on: https://go-review.googlesource.com/43135
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-10 19:28:42 +00:00
Martin Möhrmann
5a6c580990 runtime: refactor cpu feature detection for 386 & amd64
Changes all cpu features to be detected and stored in bools in rt0_go.

Updates: #15403

Change-Id: I5a9961cdec789b331d09c44d86beb53833d5dc3e
Reviewed-on: https://go-review.googlesource.com/41950
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-01 20:46:03 +00:00
Martin Möhrmann
b64e817853 runtime: simplify detection of preference to use AVX memmove
Reduces cmd/go by 4464 bytes on amd64.

Removes the duplicate detection of AVX support and
presence of Intel processors.

Change-Id: I4670189951a63760fae217708f68d65e94a30dc5
Reviewed-on: https://go-review.googlesource.com/41570
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-25 04:50:04 +00:00
Matthew Dempsky
c310c688ff cmd/compile, runtime: simplify multiway select implementation
This commit reworks multiway select statements to use normal control
flow primitives instead of the previous setjmp/longjmp-like behavior.
This simplifies liveness analysis and should prevent issues around
"returns twice" function calls within SSA passes.

test/live.go is updated because liveness analysis's CFG is more
representative of actual control flow. The case bodies are the only
real successors of the selectgo call, but previously the selectsend,
selectrecv, etc. calls were included in the successors list too.

Updates #19331.

Change-Id: I7f879b103a4b85e62fc36a270d812f54c0aa3e83
Reviewed-on: https://go-review.googlesource.com/37661
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-07 20:14:17 +00:00
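
Conceptually, the rework lowers a select into one selectgo call followed by
ordinary control flow; a simplified, hypothetical sketch:

// Source:
//
//	select {
//	case c1 <- v:
//		bodyA()
//	case x = <-c2:
//		bodyB()
//	}
//
// Conceptual lowering after this CL: selectgo picks a ready case and
// returns its index, and the bodies become plain switch arms, with no
// "returns twice" calls left for SSA to worry about.
//
//	chosen := selectgo(&cases)
//	switch chosen {
//	case 0:
//		bodyA()
//	case 1:
//		bodyB()
//	}
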
Austin Clements
d089a6c718 runtime: remove stack barriers
Now that we don't rescan stacks, stack barriers are unnecessary. This
removes all of the code and structures supporting them as well as
tests that were specifically for stack barriers.

Updates #17503.

Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c
Reviewed-on: https://go-review.googlesource.com/36620
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-14 15:52:54 +00:00
Sokolov Yura
d03c124860 runtime: implement fastrand in go
So it could be inlined.

Using bit tricks, it can be implemented without a branch
(improved trick version by Minux Ma).

A simple benchmark shows it is faster on i386 and x86_64, though
I don't know whether it will be faster on other architectures.

benchmark                       old ns/op     new ns/op     delta
BenchmarkFastrand-3             2.79          1.48          -46.95%
BenchmarkFastrandHashiter-3     25.9          24.9          -3.86%

Change-Id: Ie2eb6d0f598c0bb5fac7f6ad0f8b5e3eddaa361b
Reviewed-on: https://go-review.googlesource.com/34782
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-10 19:16:29 +00:00
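
The branch-free trick: an arithmetic right shift of the sign bit yields a
mask of all ones exactly when the shifted-out top bit was set, selecting
the XOR constant without a conditional. A standalone sketch, assuming the
0x88888eef feedback constant from the old assembly:

package sketch

// fastrandStep advances the generator one step with no branch:
// int32(x)>>31 is 0 or -1 (all ones), so the mask applies the XOR
// only when the bit shifted out of x was set.
func fastrandStep(x uint32) uint32 {
    return x<<1 ^ uint32(int32(x)>>31)&0x88888eef
}
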
Austin Clements
1bd39e79db runtime: fix SP adjustment on amd64p32
On amd64p32, rt0_go attempts to reserve 128 bytes of scratch space on
the stack, but due to a register mixup this ends up being a no-op. Fix
this so we actually reserve the stack space.

Change-Id: I04dbfbeb44f3109528c8ec74e1136bc00d7e1faa
Reviewed-on: https://go-review.googlesource.com/32331
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-28 21:39:17 +00:00
Austin Clements
70c107c68d runtime: add deletion barriers on gobuf.ctxt
gobuf.ctxt is set to nil from many places in assembly code and these
assignments require write barriers with the hybrid barrier.

Conveniently, in most of these places ctxt should already be nil, in
which case we don't need the barrier. This commit changes these places
to assert that ctxt is already nil.

gogo is more complicated, since ctxt may not already be nil. For gogo,
we manually perform the write barrier if ctxt is not nil.

Updates #17503.

Change-Id: I9d75e27c75a1b7f8b715ad112fc5d45ffa856d30
Reviewed-on: https://go-review.googlesource.com/31764
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-28 20:48:02 +00:00
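
The pattern described above, sketched with hypothetical names:

// Most assembly sites: ctxt must already be nil, so assert instead of
// performing a write barrier:
//
//	if gp.sched.ctxt != nil {
//		throw("ctxt not nil")
//	}
//
// gogo only: ctxt may be live, so test for nil and invoke the
// write-barrier routine by hand before overwriting it.
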
Austin Clements
87e48c5afd runtime, cmd/compile: rename memclr -> memclrNoHeapPointers
Since barrier-less memclr is only safe in very narrow circumstances,
this commit renames memclr to avoid accidentally calling memclr on
typed memory. This can cause subtle, non-deterministic bugs, so it's
worth some effort to prevent. In the near term, this will also prevent
bugs creeping in from any concurrent CLs that add calls to memclr; if
this happens, whichever patch hits master second will fail to compile.

This also adds the other new memclr variants to the compiler's
builtin.go to minimize the churn on that binary blob. We'll use these
in future commits.

Updates #17503.

Change-Id: I00eead049f5bd35ca107ea525966831f3d1ed9ca
Reviewed-on: https://go-review.googlesource.com/31369
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:20:33 +00:00
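
The rename makes the safety contract visible at every call site; a hedged
sketch of the distinction:

// memclrNoHeapPointers(p, n): legal only if [p, p+n) contains no heap
// pointers the GC could observe mid-clear (untyped or pointer-free memory).
//
// typedmemclr(typ, p): required for typed memory; it coordinates with the
// garbage collector's write barriers.
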
Austin Clements
79561a84ce runtime: simplify reflectcall write barriers
Currently reflectcall has a subtle dance with write barriers where the
assembly code copies the result values from the stack to the in-heap
argument frame without write barriers and then calls into the runtime
after the fact to invoke the necessary write barriers.

For the hybrid barrier (and for ROC), we need to switch to a
*pre*-write write barrier, which is very difficult to do with the
current setup. We could tie ourselves in knots of subtle reasoning
about why it's okay in this particular case to have a post-write write
barrier, but this commit instead takes a different approach. Rather
than making things more complex, this simplifies reflection calls so
that the argument copy is done in Go using normal bulk write barriers.

The one difficulty with this approach is that calling into Go requires
putting arguments on the stack, but the call* functions "donate" their
entire stack frame to the called function. We can get away with this
now because the copy avoids using the stack and has copied the results
out before we clobber the stack frame to call into the write barrier.
The solution in this CL is to call another function, passing arguments
in registers instead of on the stack, and let that other function
reserve more stack space and setup the arguments for the runtime.

This approach seemed to work out the best. I also tried making the
call* functions reserve 32 extra bytes of frame for the write barrier
arguments and adjust SP up by 32 bytes around the call. However, even
with the necessary changes to the assembler to correct the spdelta
table, the runtime was still having trouble with the frame layout (and
the changes to the assembler caused many other things that do strange
things with the SP to fail to assemble). The approach I took doesn't
require any funny business with the SP.

Updates #17503.

Change-Id: Ie2bb0084b24d6cff38b5afb218b9e0534ad2119e
Reviewed-on: https://go-review.googlesource.com/31655
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-26 15:44:44 +00:00
Austin Clements
bf9c71cb43 runtime: make morestack less subtle
morestack writes the context pointer to gobuf.ctxt, but since
morestack is written in assembly (and has to be very careful with
state), it does *not* invoke the requisite write barrier for this
write. Instead, we patch this up later, in newstack, where we invoke
an explicit write barrier for ctxt.

This already requires some subtle reasoning, and it's going to get a
lot hairier with the hybrid barrier.

Fix this by simplifying the whole mechanism. Instead of writing
gobuf.ctxt in morestack, just pass the value of the context register
to newstack and let it write it to gobuf.ctxt. This is a normal Go
pointer write, so it gets the normal Go write barrier. No subtle
reasoning required.

Updates #17503.

Change-Id: Ia6bf8459bfefc6828f53682ade32c02412e4db63
Reviewed-on: https://go-review.googlesource.com/31550
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-24 02:23:16 +00:00
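
A sketch of the new shape, with simplified, hypothetical signatures
(morestack itself stays in assembly):

package sketch

import "unsafe"

type gobuf struct{ ctxt unsafe.Pointer }
type g struct{ sched gobuf }

// newstack receives the context register's value from morestack and stores
// it with an ordinary Go assignment, so the compiler inserts the normal
// write barrier; no assembly-side patching is needed.
func newstack(gp *g, ctxt unsafe.Pointer) {
    gp.sched.ctxt = ctxt
    // ... copy and grow the stack as before ...
}
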
Austin Clements
687d9d5d78 runtime: print a message on bad morestack
If morestack runs on the g0 or gsignal stack, it currently performs
some abort operation that typically produces a signal (e.g., it does
an INT $3 on x86). This is useful if you're running in a debugger, but
if you're not, the runtime tries to trap this signal, which is likely
to send the program into a deeper spiral of collapse and lead to very
confusing diagnostic output.

Help out people trying to debug without a debugger by making morestack
print an informative message before blowing up.

Change-Id: I2814c64509b137bfe20a00091d8551d18c2c4749
Reviewed-on: https://go-review.googlesource.com/31133
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-10-17 18:56:09 +00:00
Austin Clements
d211c2d377 runtime: implement getcallersp in Go
This makes it possible to inline getcallersp. getcallersp is on the
hot path of defers, so this slightly speeds up defer:

name           old time/op  new time/op  delta
Defer-4        78.3ns ± 2%  75.1ns ± 1%  -4.00%   (p=0.000 n=9+8)

Updates #14939.

Change-Id: Icc1cc4cd2f0a81fc4c8344432d0b2e783accacdd
Reviewed-on: https://go-review.googlesource.com/29655
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-26 22:01:32 +00:00
Josh Bleecher Snyder
2b74de3ed9 runtime: rename fastrand1 to fastrand
Change-Id: I37706ff0a3486827c5b072c95ad890ea87ede847
Reviewed-on: https://go-review.googlesource.com/28210
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-30 23:59:21 +00:00
Josh Bleecher Snyder
e2103adb6c crypto/*, runtime: nacl asm fixes
Found by vet.

Updates #11041

Change-Id: I5217b3e20c6af435d7500d6bb487b9895efe6605
Reviewed-on: https://go-review.googlesource.com/27493
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-08-22 19:50:41 +00:00
Brad Fitzpatrick
5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
Keith Randall
bd70bd9cb2 runtime: unify memeq and memequal
They do the same thing, except memequal also has the short-circuit
check if the two pointers are equal.

A) We might as well always do the short-circuit check; it is only 2 instructions.
B) The extra function call (memequal->memeq) is expensive.

benchmark                 old ns/op     new ns/op     delta
BenchmarkArrayEqual-8     8.56          5.31          -37.97%

No noticeable effect on the former memeq user (maps).

Fixes #14302

Change-Id: I85d1ada59ed11e64dd6c54667f79d32cc5f81948
Reviewed-on: https://go-review.googlesource.com/19843
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-02-23 00:15:38 +00:00
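
The merged shape, as a portable sketch (the runtime's version is assembly):

package sketch

import "unsafe"

// memequal keeps the cheap pointer short-circuit unconditionally and then
// compares bytes directly, with no memequal -> memeq call in between.
func memequal(x, y unsafe.Pointer, size uintptr) bool {
    if x == y {
        return true // the ~2-instruction fast path mentioned above
    }
    for i := uintptr(0); i < size; i++ {
        if *(*byte)(unsafe.Add(x, i)) != *(*byte)(unsafe.Add(y, i)) {
            return false
        }
    }
    return true
}
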
Shenghou Ma
3583a44ed2 runtime: check that masks and shifts are correctly aligned
We need a runtime check because the original issue is encountered
when running a cross-compiled Windows program from Linux. It's better
to give a meaningful crash message earlier than to segfault later.

The added test should not impose any measurable overhead to Go
programs.

For #12415.

Change-Id: Ib4a24ef560c09c0585b351d62eefd157b6b7f04c
Reviewed-on: https://go-review.googlesource.com/14207
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-25 00:46:57 +00:00
Matthew Dempsky
7bb38f6e47 runtime: replace tls0 with m0.tls
We're allocating TLS storage for m0 anyway, so might as well use it.

Change-Id: I7dc20bbea5320c8ab8a367f18a9540706751e771
Reviewed-on: https://go-review.googlesource.com/16890
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-13 01:53:00 +00:00
Michael Matloob
67faca7d9c runtime: break atomics out into package runtime/internal/atomic
This change breaks out most of the atomics functions in the runtime
into package runtime/internal/atomic. It adds some basic support
in the toolchain for runtime packages, and also modifies linux/arm
atomics to remove the dependency on the runtime's mutex. The mutexes
have been replaced with spinlocks.

all trybots are happy!
In addition to the trybots, I've tested on the darwin/arm64 builder,
on the darwin/arm builder, and on a ppc64le machine.

Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
Reviewed-on: https://go-review.googlesource.com/14204
Run-TryBot: Michael Matloob <matloob@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 17:38:04 +00:00
Keith Randall
4b7d5f0b94 runtime: memmove/memclr pointers atomically
Make sure that we're moving or zeroing pointers atomically.
Anything that is a multiple of pointer size and at least
pointer aligned might have pointers in it.  All the code looks
ok except for the 1-pointer-sized moves.

Fixes #13160
Update #12552

Change-Id: Ib97d9b918fa9f4cc5c56c67ed90255b7fdfb7b45
Reviewed-on: https://go-review.googlesource.com/16668
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-07 02:42:12 +00:00
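
The invariant: any region that is pointer-aligned and a multiple of the
pointer size may hold pointers, so it must be copied in whole words. A
portable sketch:

package sketch

import "unsafe"

const ptrSize = unsafe.Sizeof(uintptr(0))

// wordCopy copies n bytes (n a multiple of ptrSize; dst and src
// ptrSize-aligned) one word at a time, so a concurrent garbage collector
// can never observe a half-written pointer.
func wordCopy(dst, src unsafe.Pointer, n uintptr) {
    for i := uintptr(0); i < n; i += ptrSize {
        *(*uintptr)(unsafe.Add(dst, i)) = *(*uintptr)(unsafe.Add(src, i))
    }
}
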
Austin Clements
9f6df6c940 runtime: use 4 byte writes in amd64p32 memmove/memclr
Currently, amd64p32's memmove and memclr use 8 byte writes as much as
possible and 1 byte writes for the tail of the object. However, if an
object ends with a 4 byte pointer at an 8 byte aligned offset, this
may copy/zero the pointer field one byte at a time, allowing the
garbage collector to observe a partially copied pointer.

Fix this by using 4 byte writes instead of 8 byte writes.

Updates #12552.

Change-Id: I13324fd05756fb25ae57e812e836f0a975b5595c
Reviewed-on: https://go-review.googlesource.com/15370
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-02 22:49:15 +00:00
Austin Clements
e2bb03f175 runtime: don't install a stack barrier in cgocallback_gofunc's frame
Currently the runtime can install stack barriers in any frame.
However, the frame of cgocallback_gofunc is special: it's the one
function that switches from a regular G stack to the system stack on
return. Hence, the return PC slot in its frame on the G stack is
actually used to save getg().sched.pc (so tracebacks appear to unwind
to the last Go function running on that G), and not as an actual
return PC for cgocallback_gofunc.

Because of this, if we install a stack barrier in cgocallback_gofunc's
return PC slot, when cgocallback_gofunc does return, it will move the
stack barrier stub PC in to getg().sched.pc and switch back to the
system stack. The rest of the runtime doesn't know how to deal with a
stack barrier stub in sched.pc: nothing knows how to match it up with
the G's stack barrier array and, when the runtime removes stack
barriers, it doesn't know to undo the one in sched.pc. Hence, if the C
code later returns back in to Go code, it will attempt to return
through the stack barrier saved in sched.pc, which may no longer have
correct unwinding information.

Fix this by blacklisting cgocallback_gofunc's frame so the runtime
won't install a stack barrier in its return PC slot.

Fixes #12238.

Change-Id: I46aa2155df2fd050dd50de3434b62987dc4947b8
Reviewed-on: https://go-review.googlesource.com/13944
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-30 16:06:47 +00:00
Michael Hudson-Doyle
d497eeb005 runtime: remove unused xchgp/xchgp1
I noticed that they were unimplemented on arm64, and then that they were
in fact not used at all.

Change-Id: Iee579feda2a5e374fa571bcc8c89e4ef607d50f6
Reviewed-on: https://go-review.googlesource.com/13951
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-27 00:28:35 +00:00
Austin Clements
f5d494bbdf runtime: ensure GC sees type-safe memory on weak machines
Currently it's possible for the garbage collector to observe
uninitialized memory or stale heap bitmap bits on weakly ordered
architectures such as ARM and PPC. On such architectures, the stores
that zero newly allocated memory and initialize its heap bitmap may
move after a store in user code that makes the allocated object
observable by the garbage collector.

To fix this, add a "publication barrier" (also known as an "export
barrier") before returning from mallocgc. This is a store/store
barrier that ensures any write done by user code that makes the
returned object observable to the garbage collector will be ordered
after the initialization performed by mallocgc. No barrier is
necessary on the reading side because of the data dependency between
loading the pointer and loading the contents of the object.

Fixes one of the issues raised in #9984.

Change-Id: Ia3d96ad9c5fc7f4d342f5e05ec0ceae700cd17c8
Reviewed-on: https://go-review.googlesource.com/11083
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Martin Capitanio <capnm9@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-19 15:29:50 +00:00
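
A sketch of the ordering requirement in portable terms; here an atomic
release store stands in for the runtime's internal store/store barrier, and
a plain global stands in for whatever write makes the object reachable:

package sketch

import "sync/atomic"

var published atomic.Pointer[int]

func publish() {
    p := new(int) // mallocgc: zeroing and heap bitmap writes happen in here
    *p = 42       // user initialization
    // Release store: everything above is ordered before the pointer becomes
    // visible, so an observer (here, the GC) never sees uninitialized memory.
    published.Store(p)
}
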
Alex Brainman
9d968cb47b runtime: rename cgocall_errno and asmcgocall_errno into cgocall and asmcgocall
Change-Id: I5917bea8bb35b0e725dcc56a68f3a70137cfc180
Reviewed-on: https://go-review.googlesource.com/9387
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-19 01:47:11 +00:00
Alex Brainman
2858b73843 runtime: remove cgocall and asmcgocall
In preparation for the rename of cgocall_errno into cgocall and
asmcgocall_errno into asmcgocall in the following CL.
rsc requested CL 9387 to be split into two parts; this is the first part.

Change-Id: I7434f0e4b44dd37017540695834bfcb1eebf0b2f
Reviewed-on: https://go-review.googlesource.com/11166
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-18 04:42:53 +00:00
Austin Clements
faa7a7e8ae runtime: implement GC stack barriers
This commit implements stack barriers to minimize the amount of
stack re-scanning that must be done during mark termination.

Currently the GC scans stacks of active goroutines twice during every
GC cycle: once at the beginning during root discovery and once at the
end during mark termination. The second scan happens while the world
is stopped and guarantees that we've seen all of the roots (since
there are no write barriers on writes to local stack
variables). However, this means pause time is proportional to stack
size. In particularly recursive programs, this can drive pause time up
past our 10ms goal (e.g., it takes about 150ms to scan a 50MB heap).

Re-scanning the entire stack is rarely necessary, especially for large
stacks, because usually most of the frames on the stack were not
active between the first and second scans and hence any changes to
these frames (via non-escaping pointers passed down the stack) were
tracked by write barriers.

To efficiently track how far a stack has been unwound since the first
scan (and, hence, how much needs to be re-scanned), this commit
introduces stack barriers. During the first scan, at exponentially
spaced points in each stack, the scan overwrites return PCs with the
PC of the stack barrier function. When "returned" to, the stack
barrier function records how far the stack has unwound and jumps to
the original return PC for that point in the stack. Then the second
scan only needs to proceed as far as the lowest barrier that hasn't
been hit.

For deeply recursive programs, this substantially reduces mark
termination time (and hence pause time). For the goscheme example
linked in issue #10898, prior to this change, mark termination times
were typically between 100 and 500ms; with this change, mark
termination times are typically between 10 and 20ms. As a result of
the reduced stack scanning work, this reduces overall execution time
of the goscheme example by 20%.

Fixes #10898.

The effect of this on programs that are not deeply recursive is
minimal:

name                   old time/op    new time/op    delta
BinaryTree17              3.16s ± 2%     3.26s ± 1%  +3.31%  (p=0.000 n=19+19)
Fannkuch11                2.42s ± 1%     2.48s ± 1%  +2.24%  (p=0.000 n=17+19)
FmtFprintfEmpty          50.0ns ± 3%    49.8ns ± 1%    ~     (p=0.534 n=20+19)
FmtFprintfString          173ns ± 0%     175ns ± 0%  +1.49%  (p=0.000 n=16+19)
FmtFprintfInt             170ns ± 1%     175ns ± 1%  +2.97%  (p=0.000 n=20+19)
FmtFprintfIntInt          288ns ± 0%     295ns ± 0%  +2.73%  (p=0.000 n=16+19)
FmtFprintfPrefixedInt     242ns ± 1%     252ns ± 1%  +4.13%  (p=0.000 n=18+18)
FmtFprintfFloat           324ns ± 0%     323ns ± 0%  -0.36%  (p=0.000 n=20+19)
FmtManyArgs              1.14µs ± 0%    1.12µs ± 1%  -1.01%  (p=0.000 n=18+19)
GobDecode                8.88ms ± 1%    8.87ms ± 0%    ~     (p=0.480 n=19+18)
GobEncode                6.80ms ± 1%    6.85ms ± 0%  +0.82%  (p=0.000 n=20+18)
Gzip                      363ms ± 1%     363ms ± 1%    ~     (p=0.077 n=18+20)
Gunzip                   90.6ms ± 0%    90.0ms ± 1%  -0.71%  (p=0.000 n=17+18)
HTTPClientServer         51.5µs ± 1%    50.8µs ± 1%  -1.32%  (p=0.000 n=18+18)
JSONEncode               17.0ms ± 0%    17.1ms ± 0%  +0.40%  (p=0.000 n=18+17)
JSONDecode               61.8ms ± 0%    63.8ms ± 1%  +3.11%  (p=0.000 n=18+17)
Mandelbrot200            3.84ms ± 0%    3.84ms ± 1%    ~     (p=0.583 n=19+19)
GoParse                  3.71ms ± 1%    3.72ms ± 1%    ~     (p=0.159 n=18+19)
RegexpMatchEasy0_32       100ns ± 0%     100ns ± 1%  -0.19%  (p=0.033 n=17+19)
RegexpMatchEasy0_1K       342ns ± 1%     331ns ± 0%  -3.41%  (p=0.000 n=19+19)
RegexpMatchEasy1_32      82.5ns ± 0%    81.7ns ± 0%  -0.98%  (p=0.000 n=18+18)
RegexpMatchEasy1_1K       505ns ± 0%     494ns ± 1%  -2.16%  (p=0.000 n=18+18)
RegexpMatchMedium_32      137ns ± 1%     137ns ± 1%  -0.24%  (p=0.048 n=20+18)
RegexpMatchMedium_1K     41.6µs ± 0%    41.3µs ± 1%  -0.57%  (p=0.004 n=18+20)
RegexpMatchHard_32       2.11µs ± 0%    2.11µs ± 1%  +0.20%  (p=0.037 n=17+19)
RegexpMatchHard_1K       63.9µs ± 2%    63.3µs ± 0%  -0.99%  (p=0.000 n=20+17)
Revcomp                   560ms ± 1%     522ms ± 0%  -6.87%  (p=0.000 n=18+16)
Template                 75.0ms ± 0%    75.1ms ± 1%  +0.18%  (p=0.013 n=18+19)
TimeParse                 358ns ± 1%     364ns ± 0%  +1.74%  (p=0.000 n=20+15)
TimeFormat                360ns ± 0%     372ns ± 0%  +3.55%  (p=0.000 n=20+18)

Change-Id: If8a9bfae6c128d15a4f405e02bcfa50129df82a2
Reviewed-on: https://go-review.googlesource.com/10314
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-06-02 20:00:57 +00:00
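
A sketch of the bookkeeping; the runtime of this era kept a per-goroutine
slice of records with essentially these fields:

package sketch

type stkbar struct {
    savedLRPtr uintptr // stack slot whose return PC was overwritten
    savedLRVal uintptr // the original return PC saved from that slot
}

// Installing a barrier at one frame during the first scan (conceptually):
//
//	bar.savedLRPtr = lrSlot
//	bar.savedLRVal = *(*uintptr)(unsafe.Pointer(lrSlot))
//	*(*uintptr)(unsafe.Pointer(lrSlot)) = stackBarrierPC
//
// When the frame "returns" into the stub, it records how far the stack has
// unwound and jumps to savedLRVal; the second scan then stops at the lowest
// barrier not yet hit.
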
Srdjan Petrovic
6ad33be2d9 runtime: implement xadduintptr and update system mstats using it
The motivation is that sysAlloc/Free() currently aren't safe to be
called without a valid G, because arm's xadd64() uses locks that require
a valid G.

The solution here was proposed by Dmitry Vyukov: use xadduintptr()
instead of xadd64(), until arm can support xadd64 on all of its
architectures (not a trivial task for arm).

Change-Id: I250252079357ea2e4360e1235958b1c22051498f
Reviewed-on: https://go-review.googlesource.com/9002
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-24 16:53:26 +00:00
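
The portable analogue of the workaround: size the stat to a machine word
and update it with a uintptr add, which needs no 64-bit atomic (and hence,
on old 32-bit ARM, no lock and no valid G):

package sketch

import "sync/atomic"

var sysBytes uintptr // stat sized to the machine word

func addSysBytes(n uintptr) {
    atomic.AddUintptr(&sysBytes, n) // exported equivalent of xadduintptr
}
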
Russ Cox
92c826b1b2 cmd/internal/gc: inline runtime.getg
This more closely restores what the old C runtime did.
(In C, g was an 'extern register' with the same effective
implementation as in this CL.)

On a late 2012 MacBookPro10,2, best of 5 old vs best of 5 new:

benchmark                          old ns/op      new ns/op      delta
BenchmarkBinaryTree17              4981312777     4463426605     -10.40%
BenchmarkFannkuch11                3046495712     3006819428     -1.30%
BenchmarkFmtFprintfEmpty           89.3           79.8           -10.64%
BenchmarkFmtFprintfString          284            262            -7.75%
BenchmarkFmtFprintfInt             282            262            -7.09%
BenchmarkFmtFprintfIntInt          480            448            -6.67%
BenchmarkFmtFprintfPrefixedInt     382            358            -6.28%
BenchmarkFmtFprintfFloat           529            486            -8.13%
BenchmarkFmtManyArgs               1849           1773           -4.11%
BenchmarkGobDecode                 12835963       11794385       -8.11%
BenchmarkGobEncode                 10527170       10288422       -2.27%
BenchmarkGzip                      436109569      438422516      +0.53%
BenchmarkGunzip                    110121663      109843648      -0.25%
BenchmarkHTTPClientServer          81930          85446          +4.29%
BenchmarkJSONEncode                24638574       24280603       -1.45%
BenchmarkJSONDecode                93022423       85753546       -7.81%
BenchmarkMandelbrot200             4703899        4735407        +0.67%
BenchmarkGoParse                   5319853        5086843        -4.38%
BenchmarkRegexpMatchEasy0_32       151            151            +0.00%
BenchmarkRegexpMatchEasy0_1K       452            453            +0.22%
BenchmarkRegexpMatchEasy1_32       131            132            +0.76%
BenchmarkRegexpMatchEasy1_1K       761            722            -5.12%
BenchmarkRegexpMatchMedium_32      228            224            -1.75%
BenchmarkRegexpMatchMedium_1K      63751          64296          +0.85%
BenchmarkRegexpMatchHard_32        3188           3238           +1.57%
BenchmarkRegexpMatchHard_1K        95396          96756          +1.43%
BenchmarkRevcomp                   661587262      687107364      +3.86%
BenchmarkTemplate                  108312598      104008540      -3.97%
BenchmarkTimeParse                 453            459            +1.32%
BenchmarkTimeFormat                475            441            -7.16%

The garbage benchmark from the benchmarks subrepo gets 2.6% faster as well.

Change-Id: I320aeda332db81012688b26ffab23f6581c59cfa
Reviewed-on: https://go-review.googlesource.com/8460
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-04-07 14:26:47 +00:00
Michael Hudson-Doyle
f78dc1dac1 runtime: rename ·main·f to ·mainPC to avoid duplicate symbol
runtime·main·f is normalized by the linker to runtime.main.f, as is
the compiler-generated symbol runtime.main·f.  Change the former to
runtime·mainPC instead.

Fixes issue #9934

Change-Id: I656a6fa6422d45385fa2cc55bd036c6affa1abfe
Reviewed-on: https://go-review.googlesource.com/8234
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-30 18:52:14 +00:00
Dave Cheney
98485f5ad4 runtime: fix linux/amd64p32 build
Implement runtime.atomicand8 for amd64p32, which was overlooked
in CL 7861.

Change-Id: Ic7eccddc6fd6c4682cac1761294893928f5428a2
Reviewed-on: https://go-review.googlesource.com/7920
Reviewed-by: Minux Ma <minux@golang.org>
2015-03-21 02:59:43 +00:00
Shenghou Ma
3b00197017 runtime: add argument sizes for asm functions for bytes, strings
Also fixed a stack corruption bug for nacl/amd64p32.

Change-Id: I64b821b16999c296a159137d971af3870053c621
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/7073
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-03-07 06:02:40 +00:00
Dmitry Vyukov
894024f478 runtime: fix traceback from goexit1
We used to not call traceback from goexit1,
but now the tracer does, and it crashes on amd64p32:

runtime: unexpected return pc for runtime.getg called from 0x108a4240
goroutine 18 [runnable, locked to thread]:
runtime.traceGoEnd()
    src/runtime/trace.go:758 fp=0x10818fe0 sp=0x10818fdc
runtime.goexit1()
    src/runtime/proc1.go:1540 +0x20 fp=0x10818fe8 sp=0x10818fe0
runtime.getg(0x0)
    src/runtime/asm_386.s:2414 fp=0x10818fec sp=0x10818fe8
created by runtime/pprof_test.TestTraceStress
    src/runtime/pprof/trace_test.go:123 +0x500

The return PC from goexit1 points right after goexit (+0x6).
It happens to work most of the time somehow.

This change fixes traceback from goexit1 by adding an additional NOP to goexit,
so the saved return address still falls inside goexit's symbol and traceback
attributes the frame to the right function.

Fixes #9931

Change-Id: Ied25240a181b0a2d7bc98127b3ed9068e9a1a13e
Reviewed-on: https://go-review.googlesource.com/5460
Reviewed-by: Russ Cox <rsc@golang.org>
2015-02-28 23:19:57 +00:00
Matthew Dempsky
7abdc90fe3 runtime: remove gogetcallerpc and gogetcallersp functions
Package runtime's Go code was converted to directly call getcallerpc
and getcallersp in https://golang.org/cl/138740043, but the assembly
implementations were not removed.

Change-Id: Ib2eaee674d594cbbe799925aae648af782a01c83
Reviewed-on: https://go-review.googlesource.com/5901
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-02-25 09:34:58 +00:00
Josh Bleecher Snyder
135ef49fde runtime: speed up eqstring
eqstring does not need to check the length of the strings.

6g

benchmark                              old ns/op     new ns/op     delta
BenchmarkCompareStringEqual            7.03          6.14          -12.66%
BenchmarkCompareStringIdentical        3.36          3.04          -9.52%

5g

benchmark                                 old ns/op     new ns/op     delta
BenchmarkCompareStringEqual               238           232           -2.52%
BenchmarkCompareStringIdentical           90.8          80.7          -11.12%

The equivalent PPC changes are in a separate commit
because I don't have the hardware to test them.

Change-Id: I292874324b9bbd9d24f57a390cfff8b550cdd53c
Reviewed-on: https://go-review.googlesource.com/3955
Reviewed-by: Keith Randall <khr@golang.org>
2015-02-06 18:51:00 +00:00
Russ Cox
fd4dc91a96 strings: remove overengineered Compare implementation
The function is here ONLY for symmetry with package bytes.
This function should be used ONLY if it makes code clearer.
It is not here for performance. Remove any performance benefit.

If performance becomes an issue, the compiler should be fixed to
recognize the three-way compare (for all comparable types)
rather than encourage people to micro-optimize by using this function.

Change-Id: I71f4130bce853f7aef724c6044d15def7987b457
Reviewed-on: https://go-review.googlesource.com/3012
Reviewed-by: Rob Pike <r@golang.org>
2015-01-19 02:19:17 +00:00
Alan Donovan
90ce1936e3 strings: add Compare(x, y string) int, for symmetry with bytes.Compare
The implementation is the same assembly (or Go) routine.

Change-Id: Ib937c461c24ad2d5be9b692b4eed40d9eb031412
Reviewed-on: https://go-review.googlesource.com/2828
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-01-15 17:17:05 +00:00
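
Usage mirrors bytes.Compare:

package main

import (
    "fmt"
    "strings"
)

func main() {
    fmt.Println(strings.Compare("ant", "bee")) // -1: "ant" sorts first
    fmt.Println(strings.Compare("bee", "bee")) //  0: equal
    fmt.Println(strings.Compare("bee", "ant")) //  1: "bee" sorts after
}
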
Shenghou Ma
0d4d582c68 cmd/5a, cmd/6a, cmd/8a, cmd/9a: check nerrors before exit
Also fixed one unaligned stack size for nacl that was caught
by this change.

Fixes #9539.

Change-Id: Ib696a573d3f1f9bac7724f3a719aab65a11e04d3
Reviewed-on: https://go-review.googlesource.com/2600
Reviewed-by: Keith Randall <khr@golang.org>
2015-01-09 23:56:28 +00:00
Josh Bleecher Snyder
f7e43f14d3 runtime: remove stray commas in assembly
Change-Id: I4dc97ff8111bdc5ca6e4e3af06aaf4f768031c68
Reviewed-on: https://go-review.googlesource.com/2473
Reviewed-by: Minux Ma <minux@golang.org>
2015-01-07 22:41:54 +00:00
Keith Randall
d5e4c4061b runtime: remove size argument from hash and equal algorithms
The equal algorithm used to take the size
   equal(p, q *T, size uintptr) bool
With this change, it does not
   equal(p, q *T) bool
Similarly for the hash algorithm.

The size is rarely used, as most equal functions know the size
of the thing they are comparing.  For instance f32equal already
knows its inputs are 4 bytes in size.

For cases where the size is not known, we allocate a closure
(one for each size needed) that points to an assembly stub that
reads the size out of the closure and calls generic code that
has a size argument.

Reduces the size of the go binary by 0.07%.  Performance impact
is not measurable.

Change-Id: I6e00adf3dde7ad2974adbcff0ee91e86d2194fec
Reviewed-on: https://go-review.googlesource.com/2392
Reviewed-by: Russ Cox <rsc@golang.org>
2015-01-07 21:57:01 +00:00
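
A sketch of the closure trick, with hypothetical names: one closure is
allocated per needed size, capturing the size for generic code that still
takes it as an argument:

package sketch

import "unsafe"

// makeEqual adapts a size-taking generic comparison to the new size-free
// equal signature by capturing size in a closure; the runtime does the
// moral equivalent with an assembly stub that reads the size out of the
// closure before calling the generic code.
func makeEqual(size uintptr, generic func(p, q unsafe.Pointer, n uintptr) bool) func(p, q unsafe.Pointer) bool {
    return func(p, q unsafe.Pointer) bool {
        return generic(p, q, size)
    }
}
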
Russ Cox
df027aceb9 reflect: add write barriers
Use typedmemmove, typedslicecopy, and adjust reflect.call
to execute the necessary write barriers.

Found with GODEBUG=wbshadow=2 mode.
Eventually that will run automatically, but right now
it still detects other missing write barriers.

Change-Id: Iec5b5b0c1be5589295e28e5228e37f1a92e07742
Reviewed-on: https://go-review.googlesource.com/2312
Reviewed-by: Keith Randall <khr@golang.org>
2015-01-06 00:28:31 +00:00