1
0
mirror of https://github.com/golang/go synced 2024-10-05 19:21:21 -06:00
Commit Graph

1709 Commits

Author SHA1 Message Date
Caleb Spare
fb7178e7cc runtime: copy sqrt normalization bugfix from math
This copies the change from CL 16158 (applied as
22d4c8bf13).

Updates #13013

Change-Id: Id7d02e63d92806f06a4e064a91b2fb6574fe385f
Reviewed-on: https://go-review.googlesource.com/16291
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-23 23:43:47 +00:00
Matthew Dempsky
8ee0fd8623 runtime: replace is{plan9,solaris,windows} with GOOS tests
Change-Id: I27589395f547c5837dc7536a0ab5bc7cc23a4ff6
Reviewed-on: https://go-review.googlesource.com/10872
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-23 18:11:17 +00:00
Alex Brainman
6410e67a1e runtime: account for cpu affinity in windows NumCPU
Fixes #11671

Change-Id: Ide1f8d92637dad2a2faed391329f9b6001789b76
Reviewed-on: https://go-review.googlesource.com/14742
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-23 07:54:42 +00:00
Austin Clements
beedb1ec33 runtime: add pcvalue cache to improve stack scan speed
The cost of scanning large stacks is currently dominated by the time
spent looking up and decoding the pcvalue table. However, large stacks
are usually large not because they contain calls to many different
functions, but because they contain many calls to the same, small set
of recursive functions. Hence, walking large stacks tends to make the
same pcvalue queries many times.

Based on this observation, this commit adds a small, very simple, and
fast cache in front of pcvalue lookup. We thread this cache down from
operations that make many pcvalue calls, such as gentraceback, stack
scanning, and stack adjusting.

This simple cache works well because it has minimal overhead when it's
not effective. I also tried a hashed direct-map cache, CLOCK-based
replacement, round-robin replacement, and round-robin with lookups
disabled until there had been at least 16 probes, but none of these
approaches had obvious wins over the random replacement policy in this
commit.

This nearly doubles the overall performance of the deep stack test
program from issue #10898:

name        old time/op  new time/op  delta
Issue10898   16.5s ±12%    9.2s ±12%  -44.37%  (p=0.008 n=5+5)

It's a very slight win on the garbage benchmark:

name              old time/op  new time/op  delta
XBenchGarbage-12  4.92ms ± 1%  4.89ms ± 1%  -0.75%  (p=0.000 n=18+19)

It's a wash (but doesn't harm performance) on the go1 benchmarks,
which don't have particularly deep stacks:

name                      old time/op    new time/op    delta
BinaryTree17-12              3.11s ± 2%     3.20s ± 3%  +2.83%  (p=0.000 n=17+20)
Fannkuch11-12                2.51s ± 1%     2.51s ± 1%  -0.22%  (p=0.034 n=19+18)
FmtFprintfEmpty-12          50.8ns ± 3%    50.6ns ± 2%    ~     (p=0.793 n=20+20)
FmtFprintfString-12          174ns ± 0%     174ns ± 1%  +0.17%  (p=0.048 n=15+20)
FmtFprintfInt-12             177ns ± 0%     165ns ± 1%  -6.99%  (p=0.000 n=17+19)
FmtFprintfIntInt-12          283ns ± 1%     284ns ± 0%  +0.22%  (p=0.000 n=18+15)
FmtFprintfPrefixedInt-12     243ns ± 1%     244ns ± 1%  +0.40%  (p=0.000 n=20+19)
FmtFprintfFloat-12           318ns ± 0%     319ns ± 0%  +0.27%  (p=0.001 n=19+20)
FmtManyArgs-12              1.12µs ± 0%    1.14µs ± 0%  +1.74%  (p=0.000 n=19+20)
GobDecode-12                8.69ms ± 0%    8.73ms ± 1%  +0.46%  (p=0.000 n=18+18)
GobEncode-12                6.64ms ± 1%    6.61ms ± 1%  -0.46%  (p=0.000 n=20+20)
Gzip-12                      323ms ± 2%     319ms ± 1%  -1.11%  (p=0.000 n=20+20)
Gunzip-12                   42.8ms ± 0%    42.9ms ± 0%    ~     (p=0.158 n=18+20)
HTTPClientServer-12         63.3µs ± 1%    63.1µs ± 1%  -0.35%  (p=0.011 n=20+20)
JSONEncode-12               16.9ms ± 1%    17.3ms ± 1%  +2.84%  (p=0.000 n=19+20)
JSONDecode-12               59.7ms ± 0%    58.5ms ± 0%  -2.05%  (p=0.000 n=19+17)
Mandelbrot200-12            3.92ms ± 0%    3.91ms ± 0%  -0.16%  (p=0.003 n=19+19)
GoParse-12                  3.79ms ± 2%    3.75ms ± 2%  -0.91%  (p=0.005 n=20+20)
RegexpMatchEasy0_32-12       102ns ± 1%     101ns ± 1%  -0.80%  (p=0.001 n=14+20)
RegexpMatchEasy0_1K-12       337ns ± 1%     346ns ± 1%  +2.90%  (p=0.000 n=20+19)
RegexpMatchEasy1_32-12      84.4ns ± 2%    84.3ns ± 2%    ~     (p=0.743 n=20+20)
RegexpMatchEasy1_1K-12       502ns ± 1%     505ns ± 0%  +0.64%  (p=0.000 n=20+20)
RegexpMatchMedium_32-12      133ns ± 1%     132ns ± 1%  -0.85%  (p=0.000 n=20+19)
RegexpMatchMedium_1K-12     40.1µs ± 1%    39.8µs ± 1%  -0.77%  (p=0.000 n=18+18)
RegexpMatchHard_32-12       2.08µs ± 1%    2.07µs ± 1%  -0.55%  (p=0.001 n=18+19)
RegexpMatchHard_1K-12       62.4µs ± 1%    62.0µs ± 1%  -0.74%  (p=0.000 n=19+19)
Revcomp-12                   545ms ± 2%     545ms ± 3%    ~     (p=0.771 n=19+20)
Template-12                 73.7ms ± 1%    72.0ms ± 0%  -2.33%  (p=0.000 n=20+18)
TimeParse-12                 358ns ± 1%     351ns ± 1%  -2.07%  (p=0.000 n=20+20)
TimeFormat-12                369ns ± 1%     356ns ± 0%  -3.53%  (p=0.000 n=20+18)
[Geo mean]                  63.5µs         63.2µs       -0.41%

name                      old speed      new speed      delta
GobDecode-12              88.3MB/s ± 0%  87.9MB/s ± 0%  -0.43%  (p=0.000 n=18+17)
GobEncode-12               116MB/s ± 1%   116MB/s ± 1%  +0.47%  (p=0.000 n=20+20)
Gzip-12                   60.2MB/s ± 2%  60.8MB/s ± 1%  +1.13%  (p=0.000 n=20+20)
Gunzip-12                  453MB/s ± 0%   453MB/s ± 0%    ~     (p=0.160 n=18+20)
JSONEncode-12              115MB/s ± 1%   112MB/s ± 1%  -2.76%  (p=0.000 n=19+20)
JSONDecode-12             32.5MB/s ± 0%  33.2MB/s ± 0%  +2.09%  (p=0.000 n=19+17)
GoParse-12                15.3MB/s ± 2%  15.4MB/s ± 2%  +0.92%  (p=0.004 n=20+20)
RegexpMatchEasy0_32-12     311MB/s ± 1%   314MB/s ± 1%  +0.78%  (p=0.000 n=15+19)
RegexpMatchEasy0_1K-12    3.04GB/s ± 1%  2.95GB/s ± 1%  -2.90%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12     379MB/s ± 2%   380MB/s ± 2%    ~     (p=0.779 n=20+20)
RegexpMatchEasy1_1K-12    2.04GB/s ± 1%  2.02GB/s ± 0%  -0.62%  (p=0.000 n=20+20)
RegexpMatchMedium_32-12   7.46MB/s ± 1%  7.53MB/s ± 1%  +0.86%  (p=0.000 n=20+19)
RegexpMatchMedium_1K-12   25.5MB/s ± 1%  25.7MB/s ± 1%  +0.78%  (p=0.000 n=18+18)
RegexpMatchHard_32-12     15.4MB/s ± 1%  15.5MB/s ± 1%  +0.62%  (p=0.000 n=19+19)
RegexpMatchHard_1K-12     16.4MB/s ± 1%  16.5MB/s ± 1%  +0.82%  (p=0.000 n=20+19)
Revcomp-12                 466MB/s ± 2%   466MB/s ± 3%    ~     (p=0.765 n=19+20)
Template-12               26.3MB/s ± 1%  27.0MB/s ± 0%  +2.38%  (p=0.000 n=20+18)
[Geo mean]                97.8MB/s       98.0MB/s       +0.23%

Change-Id: I281044ae0b24990ba46487cacbc1069493274bc4
Reviewed-on: https://go-review.googlesource.com/13614
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-22 17:48:13 +00:00
Matthew Dempsky
1652a2c316 runtime: add mSpanList type to represent lists of mspans
This CL introduces a new mSpanList type to replace the empty mspan
variables that were previously used as list heads.

To be type safe, the previous circular linked list data structure is
now a tail queue instead.  One complication of this is
mSpanList_Remove needs to know the list a span is being removed from,
but this appears to be computable in all circumstances.

As a temporary sanity check, mSpanList_Insert and mSpanList_InsertBack
record the list that an mspan has been inserted into so that
mSpanList_Remove can verify that the correct list was specified.

Whereas mspan is 112 bytes on amd64, mSpanList is only 16 bytes.  This
shrinks the size of mheap from 50216 bytes to 12584 bytes.

Change-Id: I8146364753dbc3b4ab120afbb9c7b8740653c216
Reviewed-on: https://go-review.googlesource.com/15906
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2015-10-22 17:12:06 +00:00
Aaron Jacobs
151f4ec95d runtime: remove unused printpc and printbyte functions
Change-Id: I40e338f6b445ca72055fc9bac0f09f0dca904e3a
Reviewed-on: https://go-review.googlesource.com/16191
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-22 15:02:44 +00:00
Matthew Dempsky
5a68eb9f25 runtime: prune some dead variables
Change-Id: I7a1c3079b433c4e30d72fb7d59f9594e0d5efe47
Reviewed-on: https://go-review.googlesource.com/16178
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-10-22 03:56:19 +00:00
Matthew Dempsky
29330c118d runtime: change fixalloc's chunk field to unsafe.Pointer
It's never used as a *byte anyway, so might as well just make it an
unsafe.Pointer instead.

Change-Id: I68ee418781ab2fc574eeac0498f2515b5561b7a8
Reviewed-on: https://go-review.googlesource.com/16175
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-22 01:14:23 +00:00
Shenghou Ma
1948aef6e3 runtime: fix typos
Change-Id: Iffc25fc80452baf090bf8ef15ab798cfaa120b8e
Reviewed-on: https://go-review.googlesource.com/16154
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-22 00:40:48 +00:00
Matthew Dempsky
58e3ae2fae runtime: split plan9 and solaris's m fields into new embedded mOS type
Reduces the size of m by ~8% on linux/amd64 (1040 bytes -> 960 bytes).

There are also windows-specific fields, but they're currently
referenced in OS-independent source files (but only when
GOOS=="windows").

Change-Id: I13e1471ff585ccced1271f74209f8ed6df14c202
Reviewed-on: https://go-review.googlesource.com/16173
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-22 00:04:52 +00:00
Matthew Dempsky
7df8ba136c runtime: replace unsafe pointer arithmetic with array indexing
Change-Id: I313819abebd4cda4a6c30fd0fd6f44cb1d09161f
Reviewed-on: https://go-review.googlesource.com/16167
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-21 23:22:20 +00:00
Matthew Dempsky
84afa1be76 runtime: make iface/eface handling more type safe
Change compiler-invoked interface functions to directly take
iface/eface parameters instead of fInterface/interface{} to avoid
needing to always convert.

For the handful of functions that legitimately need to take an
interface{} parameter, add efaceOf to type-safely convert *interface{}
to *eface.

Change-Id: I8928761a12fd3c771394f36adf93d3006a9fcf39
Reviewed-on: https://go-review.googlesource.com/16166
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-21 23:08:22 +00:00
Ian Lance Taylor
73f329f472 runtime, syscall: add calls to msan functions
Add explicit memory sanitizer instrumentation to the runtime and syscall
packages.  The compiler does not instrument the runtime package.  It
does instrument the syscall package, but we need to add a couple of
cases that it can't see.

Change-Id: I2d66073f713fe67e33a6720460d2bb8f72f31394
Reviewed-on: https://go-review.googlesource.com/16164
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-10-21 19:17:46 +00:00
Matthew Dempsky
c279250946 runtime: change functype's in and out fields to []*_type
Allows removing a few gratuitous unsafe.Pointer conversions and
parallels the type of reflect.funcType's in and out fields ([]*rtype).

Change-Id: Ie5ca230a94407301a854dfd8782a3180d5054bc4
Reviewed-on: https://go-review.googlesource.com/16163
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-21 18:37:45 +00:00
Ian Lance Taylor
5174df9087 runtime, runtime/msan: add msan runtime support
These are the runtime support functions for letting Go code interoperate
with the C/C++ memory sanitizer.  Calls to msanread/msanwrite are now
inserted by the compiler with the -msan option.  Calls to
msanmalloc/msanfree will be from other runtime functions in a subsequent
CL.

Change-Id: I64fb061b38cc6519153face242eccd291c07d1f2
Reviewed-on: https://go-review.googlesource.com/16162
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-21 17:50:39 +00:00
Austin Clements
a42f668654 runtime: eliminate unused _GCstw phase
Change-Id: Ie94cd17e1975fdaaa418fa6a7b2d3b164fedc135
Reviewed-on: https://go-review.googlesource.com/16057
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-21 16:26:34 +00:00
Austin Clements
28f458ce5b runtime: eliminate unnecessary ragged barrier
The ragged barrier after entering the concurrent mark phase is
vestigial. This used to be the point where we enabled write barriers,
so it was necessary to synchronize all Ps to ensure write barriers
were enabled before any marking occurred. However, we've long since
switched to enabling write barriers during the concurrent scan phase,
so the start-the-world at the beginning of the concurrent scan phase
ensures that all Ps have enabled the write barrier.

Hence, we can eliminate the old "install write barrier" phase.

Fixes #11971.

Change-Id: I8cdcb84b5525cef19927d51ea11ba0a4db991ea8
Reviewed-on: https://go-review.googlesource.com/16044
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-21 16:26:25 +00:00
Matthew Dempsky
d4a7ea1b71 runtime: add stringStructOf helper function
Instead of open-coding conversions from *string to unsafe.Pointer then
to *stringStruct, add a helper function to add some type safety.
Bonus: This caught two **string values being converted to
*stringStruct in heapdump.go.

While here, get rid of the redundant _string type, but add in a
stringStructDWARF type used for generating DWARF debug info.

Change-Id: I8882f8cca66ac45190270f82019a5d85db023bd2
Reviewed-on: https://go-review.googlesource.com/16131
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-20 23:13:27 +00:00
Aaron Jacobs
ef986fa3fc runtime: change odd 'print1_write' file names
The '1' part is left over from the C conversion, but no longer makes
sense given that print1.go no longer exists.

Change-Id: Iec171251370d740f234afdbd6fb1a4009fde6696
Reviewed-on: https://go-review.googlesource.com/16036
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-20 23:03:06 +00:00
Hyang-Ah Hana Kim
30ee5919bd runtime: add syscalls needed for android/amd64 logging.
access, connect, socket.

In Android-L, logging is done by writing the log messages to the logd
process through a unix domain socket.

Also, changed the arg types of those syscall stubs to match linux
programming APIs.

For golang/go#10743

Change-Id: I66368a03316e253561e9e76aadd180c2cd2e48f3
Reviewed-on: https://go-review.googlesource.com/15993
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-10-20 16:56:58 +00:00
Aaron Jacobs
3bc0601742 runtime: rename _func.frame to make it clear it's deprecated and unused.
When I saw that it was labelled "legacy", I went looking for users of it
to see how it was still used. But there aren't any. Save the next person
the trouble.

Change-Id: I921dd6c57b60331c9816542272555153ac133c02
Reviewed-on: https://go-review.googlesource.com/16035
Reviewed-by: Dave Cheney <dave@cheney.net>
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-20 03:16:09 +00:00
Michael Hudson-Doyle
c5856cfdb6 runtime: tweaks to allow -buildmode=shared to work
Building Go shared libraries requires that all functions that have declarations
without bodies have implementations and vice versa, so remove the
implementation of call16 and add a stub implementation of sigreturn.

Change-Id: I4d5a30c8637a5da7991054e151a536611d5bea46
Reviewed-on: https://go-review.googlesource.com/15966
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-19 21:23:36 +00:00
Austin Clements
3cd56b4dca runtime: combine gcResetGState and gcResetMarkState
These functions are always called together and perform logically
related state resets, so combine them in to just gcResetMarkState.

Fixes #11427.

Change-Id: I06c17ef65f66186494887a767b3993126955b5fe
Reviewed-on: https://go-review.googlesource.com/16041
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-19 18:38:07 +00:00
Austin Clements
b0d5e5c500 runtime: consolidate gcResetGState calls
Currently gcResetGState is called by func gcscan_m for concurrent GC
and directly by func gc for STW GC. Simplify this by consolidating
these two calls in to one call by func gc above where it splits for
concurrent and STW GC.

As a consequence, gcResetGState and gcResetMarkState are always called
together, so the next commit will consolidate these.

Change-Id: Ib62d404c7b32b28f7d3080d26ecf3966cbc4aca0
Reviewed-on: https://go-review.googlesource.com/16040
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-19 18:38:00 +00:00
Austin Clements
feb92a8e8c runtime: remove work.partial queue
This work queue is no longer used (there are many reads of
work.partial, but the only write is in putpartial, which is never
called).

Fixes #11922.

Change-Id: I08b76c0c02a0867a9cdcb94783e1f7629d44249a
Reviewed-on: https://go-review.googlesource.com/15892
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-19 18:37:54 +00:00
Aaron Jacobs
5d88323fa6 runtime: remove a redundant nil pointer check.
It appears this was made possible by commit 89f185f; before that, g was
not dereferenced above.

Change-Id: I70bc571d924b36351392fd4c13d681e938cfb573
Reviewed-on: https://go-review.googlesource.com/16033
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-10-19 09:58:15 +00:00
Nodir Turakulov
386fa03609 runtime: merge proc1.go -> proc.go
from proc1.go to proc.go:
* prepend header comment explaining "Goroutine scheduler"
* insert m0 and g0 var defs after the comment
* append the rest

Updates #12952

Change-Id: I35ee9ae3287675cde0c1b6aeaca0a460393f2354
Reviewed-on: https://go-review.googlesource.com/16024
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-19 01:11:00 +00:00
Nodir Turakulov
243757576d runtime: merge race1.go -> race.go
* append contents of race1.go to race.go
* delete "Implementation of the race detector API." comment
  from race1.go

Updates #12952

Change-Id: Ibdd9c4dc79a63c3bef69eade9525578063c86c1c
Reviewed-on: https://go-review.googlesource.com/16023
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-18 23:48:22 +00:00
Michael Hudson-Doyle
6deb3c0619 runtime, runtime/cgo: conform to PIC register use rules in ppc64 asm
PIC code on ppc64le uses R2 as a TOC pointer and when calling a function
through a function pointer must ensure the function pointer is in R12.  These
rules are easy enough to follow unconditionally in our assembly, so do that.

Change-Id: Icfc4e47ae5dfbe15f581cbdd785cdeed6e40bc32
Reviewed-on: https://go-review.googlesource.com/15526
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-18 23:36:39 +00:00
Michael Hudson-Doyle
b8f8969fbd reflect, runtime, runtime/cgo: use ppc64 asm constant for fixed frame size
Shared libraries on ppc64le will require a larger minimum stack frame (because
the ABI mandates that the TOC pointer is available at 24(R1)). Part 3 of that
is using a #define in the ppc64 assembly to refer to the size of the fixed
part of the stack (finding all these took me about a week!).

Change-Id: I50f22fe1c47af1ec59da1bd7ea8f84a4750df9b7
Reviewed-on: https://go-review.googlesource.com/15525
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-18 23:15:26 +00:00
Michael Hudson-Doyle
a4855812e2 runtime: add a constant for the smallest possible stack frame
Shared libraries on ppc64le will require a larger minimum stack frame (because
the ABI mandates that the TOC pointer is available at 24(R1)). So to prepare
for this, make a constant for the fixed part of a stack and use that where
necessary.

Change-Id: I447949f4d725003bb82e7d2cf7991c1bca5aa887
Reviewed-on: https://go-review.googlesource.com/15523
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-18 22:14:00 +00:00
Michael Hudson-Doyle
45c06b27a4 cmd/internal/obj, runtime: add NOFRAME flag to suppress stack frame set up on ppc64x
Replace the confusing game where a frame size of $-8 would suppress the
implicit setting up of a stack frame with a nice explicit flag.

The code to set up the function prologue is still a little confusing but better
than it was.

Change-Id: I1d49278ff42c6bc734ebfb079998b32bc53f8d9a
Reviewed-on: https://go-review.googlesource.com/15670
Reviewed-by: Minux Ma <minux@golang.org>
2015-10-18 22:13:30 +00:00
Nodir Turakulov
db2e73faeb runtime: merge stack{1,2}.go -> stack.go
* rename stack1.go -> stack.go
* prepend contents of stack2.go to stack.go

Updates #12952

Change-Id: I60d409af37162a5a7596c678dfebc2cea89564ff
Reviewed-on: https://go-review.googlesource.com/16008
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-17 20:52:22 +00:00
Matthew Dempsky
4562784bae runtime: remove some unnecessary unsafe code in mfixalloc
Change-Id: Ie9ea4af4315a4d0eb69d0569726bb3eca2b397af
Reviewed-on: https://go-review.googlesource.com/16005
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-17 00:26:26 +00:00
Nodir Turakulov
9358f7fa61 runtime: merge panic1.go into panic.go
A TODO to merge is removed from panic1.go.
The rest is appended to panic.go

Updates #12952

Change-Id: Ied4382a455abc20bc2938e34d031802e6b4baf8b
Reviewed-on: https://go-review.googlesource.com/15905
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-16 15:51:49 +00:00
Nodir Turakulov
d72d299f3e runtime: rename print1.go -> print.go
It seems that it was called print1.go mistakenly: print.go was deleted
in the same commit:
https://go.googlesource.com/go/+/597b266eafe7d63e9be8da1c1b4813bd2998a11c

Updates #12952

Change-Id: I371e59d6cebc8824857df3f3ee89101147dfffc0
Reviewed-on: https://go-review.googlesource.com/15950
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-16 15:51:30 +00:00
Nodir Turakulov
881b0e7880 runtime: merge string1.go into string.go
string1.go contents are appended to string.go as is

Updates #12952

Change-Id: I30083ba7fdd362d4421e964a494c76ca865bedc2
Reviewed-on: https://go-review.googlesource.com/15951
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-16 15:46:02 +00:00
Michael Hudson-Doyle
42c7929c04 runtime, runtime/debug: access unexported runtime functions with //go:linkname, not assembly stubs
Change-Id: I88f80f5914d6e4c179f3d28aa59fc29b7ef0cc66
Reviewed-on: https://go-review.googlesource.com/15960
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-16 09:14:25 +00:00
Michael Hudson-Doyle
0b8d583320 runtime, os/signal: use //go:linkname instead of assembly stubs to get access to runtime functions
os/signal depends on a few unexported runtime functions. This removes the
assembly stubs it used to get access to these in favour of using
//go:linkname in runtime to make the functions accessible to os/signal.

This is motivated by ppc64le shared libraries, where you cannot BR to a symbol
defined in a shared library (only BL), but it seems like an improvment anyway.

Change-Id: I09361203ce38070bd3f132f6dc5ac212f2dc6f58
Reviewed-on: https://go-review.googlesource.com/15871
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-10-16 07:11:04 +00:00
Matthew Dempsky
4c2465d47d runtime: use unsafe.Pointer(x) instead of (unsafe.Pointer)(x)
This isn't C anymore.  No binary change to pkg/linux_amd64/runtime.a.

Change-Id: I24d66b0f5ac888f432b874aac684b1395e7c8345
Reviewed-on: https://go-review.googlesource.com/15903
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-15 21:48:37 +00:00
Raul Silvera
1d765b77a0 runtime: Reduce testing for fastlog2 implementation
The current fastlog2 testing checks all 64M values in the domain of
interest, which is too much for platforms with no native floating point.

Reduce testing under testing.Short() to speed up builds for those platforms.

Related to #12620

Change-Id: Ie5dcd408724ba91c3b3fcf9ba0dddedb34706cd1
Reviewed-on: https://go-review.googlesource.com/15830
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Joel Sing <jsing@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-14 04:54:33 +00:00
Ian Lance Taylor
2961cab965 runtime: remove _Kind constants
The duplication of _Kind and kind constants is a legacy of the
conversion from C.

Change-Id: I368b35a41f215cf91ac4b09dac59699edb414a0e
Reviewed-on: https://go-review.googlesource.com/15800
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-13 00:15:36 +00:00
Austin Clements
65aa2da617 runtime: assist before allocating
Currently, when the mutator allocates, the runtime first allocates the
memory and then, if that G has done "enough" allocation, the runtime
checks whether the G has assist debt to pay off and, if so, pays it
off. This approach leads to under-assisting, where a G can allocate a
large region (or many small regions) before paying for it, or can even
exit with outstanding debt.

This commit flips this around so that a G always acquires enough
credit for an allocation before it can perform that allocation. We
continue to amortize the cost of assists by requiring that they
over-assist when triggered to build up credit for many allocations.

Fixes #11967.

Change-Id: Idac9f11133b328535667674d837be72c23ebd899
Reviewed-on: https://go-review.googlesource.com/15409
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-10-09 19:39:03 +00:00
Austin Clements
89c341c5e9 runtime: directly track GC assist balance
Currently we track the per-G GC assist balance as two monotonically
increasing values: the bytes allocated by the G this cycle (gcalloc)
and the scan work performed by the G this cycle (gcscanwork). The
assist balance is hence assistRatio*gcalloc - gcscanwork.

This works, but has two important downsides:

1) It requires floating-point math to figure out if a G is in debt or
   not. This makes it inappropriate to check for assist debt in the
   hot path of mallocgc, so we only do this when a G allocates a new
   span. As a result, Gs can operate "in the red", leading to
   under-assist and extended GC cycle length.

2) Revising the assist ratio during a GC cycle can lead to an "assist
   burst". If you think of plotting the scan work performed versus
   heaps size, the assist ratio controls the slope of this line.
   However, in the current system, the target line always passes
   through 0 at the heap size that triggered GC, so if the runtime
   increases the assist ratio, there has to be a potentially large
   assist to jump from the current amount of scan work up to the new
   target scan work for the current heap size.

This commit replaces this approach with directly tracking the GC
assist balance in terms of allocation credit bytes. Allocating N bytes
simply decreases this by N and assisting raises it by the amount of
scan work performed divided by the assist ratio (to get back to
bytes).

This will make it cheap to figure out if a G is in debt, which will
let us efficiently check if an assist is necessary *before* performing
an allocation and hence keep Gs "in the black".

This also fixes assist bursts because the assist ratio is now in terms
of *remaining* work, rather than work from the beginning of the GC
cycle. Hence, the plot of scan work versus heap size becomes
continuous: we can revise the slope, but this slope always starts from
where we are right now, rather than where we were at the beginning of
the cycle.

Change-Id: Ia821c5f07f8a433e8da7f195b52adfedd58bdf2c
Reviewed-on: https://go-review.googlesource.com/15408
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:38:52 +00:00
Austin Clements
9e77c89868 runtime: ensure minimum heap distance via heap goal
Currently we ensure a minimum heap distance of 1MB when computing the
assist ratio. Rather than enforcing this minimum on the heap distance,
it makes more sense to enforce that the heap goal itself is at least
1MB over the live heap size at the beginning of GC. Currently the two
approaches are semantically equivalent, but this will let us switch to
basing the assist ratio on current heap distance rather than the
initial heap distance, since we can't enforce this minimum on the
current heap distance (the GC may never finish because the goal posts
will always be 1MB away).

Change-Id: I0027b1c26a41a0152b01e5b67bdb1140d43ee903
Reviewed-on: https://go-review.googlesource.com/15604
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:38:39 +00:00
Austin Clements
8e8219deb5 runtime: update gcController.scanWork regularly
Currently, gcController.scanWork is updated as lazily as possible
since it is only read at the end of the GC cycle. We're about to read
it during the GC cycle to improve the assist ratio revisions, so
modify gcDrain* to regularly flush to gcController.scanWork in much
the same way as we regularly flush to gcController.bgScanCredit.

One consequence of this is that it's difficult to keep gcw.scanWork
monotonic, so we give up on that and simply return the amount of scan
work done by gcDrainN rather than calculating it in the caller.

Change-Id: I7b50acdc39602f843eed0b5c6d2dacd7e762b81d
Reviewed-on: https://go-review.googlesource.com/15407
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:38:29 +00:00
Austin Clements
c18b163c15 runtime: control background scan credit flushing with flag
Currently callers of gcDrain control whether it flushes scan work
credit to gcController.bgScanCredit by passing a value other than -1
for the flush threshold. Shortly we're going to make this always flush
scan work to gcController.scanWork and optionally also flush scan work
to gcController.bgScanCredit. This will be much easier if the flush
threshold is simply a constant (which it is in practice) and callers
merely control whether or not the flush includes the background
credit. Hence, replace the flush threshold argument with a flag.

Change-Id: Ia27db17de8a3f1e462a5d7137d4b5dc72f99a04e
Reviewed-on: https://go-review.googlesource.com/15406
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:38:16 +00:00
Austin Clements
9b3cdaf0a3 runtime: consolidate gcDrain and gcDrainUntilPreempt
These functions were nearly identical. Consolidate them by adding a
flags argument. In addition to cleaning up this code, this makes
further changes that affect both functions easier.

Change-Id: I6ec5c947603bbbd3ff4040113b2fbc240e99745f
Reviewed-on: https://go-review.googlesource.com/15405
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:38:03 +00:00
Austin Clements
39ed682206 runtime: explain why continuous assist revising is necessary
Change-Id: I950af8d80433b3ae8a1da0aa7a8d2d0b295dd313
Reviewed-on: https://go-review.googlesource.com/15404
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-10-09 19:37:53 +00:00
Austin Clements
3271250ec4 runtime: fix comment for gcAssistAlloc
Change-Id: I312e56e95d8ef8ae036d16444ab1e2df1285845d
Reviewed-on: https://go-review.googlesource.com/15403
Reviewed-by: Russ Cox <rsc@golang.org>
2015-10-09 19:37:41 +00:00
Austin Clements
3e57b17dc3 runtime: fix comment for assistRatio
The comment for assistRatio claimed it to be the reciprocal of what it
actually is.

Change-Id: If7f9bb853d75d0097facff3aa6704b224d9108b8
Reviewed-on: https://go-review.googlesource.com/15402
Reviewed-by: Russ Cox <rsc@golang.org>
2015-10-09 19:37:23 +00:00
Nodir Turakulov
3be4d59820 runtime: remove redundant type cast
(*T)(unsafe.Pointer(&t)) === &t
for t of type T

Change-Id: I43c1aa436747dfa0bf4cb0d615da1647633f9536
Reviewed-on: https://go-review.googlesource.com/15656
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-09 18:48:36 +00:00
Keith Randall
91059de095 runtime: make aeshash more DOS-proof
Improve the aeshash implementation to make it harder to engineer collisions.

1) Scramble the seed before xoring with the input string.  This
   makes it harder to cancel known portions of the seed (like the size)
   because it mixes the per-table seed into those other parts.

2) Use table-dependent seeds for all stripes when hashing >16 byte strings.

For small strings this change uses 4 aesenc ops instead of 3, so it
is somewhat slower.  The first two can run in parallel, though, so
it isn't 33% slower.

benchmark                            old ns/op     new ns/op     delta
BenchmarkHash64-12                   10.2          11.2          +9.80%
BenchmarkHash16-12                   5.71          6.13          +7.36%
BenchmarkHash5-12                    6.64          7.01          +5.57%
BenchmarkHashBytesSpeed-12           30.3          31.9          +5.28%
BenchmarkHash65536-12                2785          2882          +3.48%
BenchmarkHash1024-12                 53.6          55.4          +3.36%
BenchmarkHashStringArraySpeed-12     54.9          56.5          +2.91%
BenchmarkHashStringSpeed-12          18.7          19.2          +2.67%
BenchmarkHashInt32Speed-12           14.8          15.1          +2.03%
BenchmarkHashInt64Speed-12           14.5          14.5          +0.00%

Change-Id: I59ea124b5cb92b1c7e8584008257347f9049996c
Reviewed-on: https://go-review.googlesource.com/14124
Reviewed-by: jcd . <jcd@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-08 16:43:03 +00:00
Michael Hudson-Doyle
168a51b3a1 runtime: adjust the arm64 memmove and memclr to operate by word as much as they can
Not only is this an obvious optimization:

benchmark                           old MB/s     new MB/s     speedup
BenchmarkMemmove1-4                 35.35        29.65        0.84x
BenchmarkMemmove2-4                 63.78        52.53        0.82x
BenchmarkMemmove3-4                 89.72        73.96        0.82x
BenchmarkMemmove4-4                 109.94       95.73        0.87x
BenchmarkMemmove5-4                 127.60       112.80       0.88x
BenchmarkMemmove6-4                 143.59       126.67       0.88x
BenchmarkMemmove7-4                 157.90       138.92       0.88x
BenchmarkMemmove8-4                 167.18       231.81       1.39x
BenchmarkMemmove9-4                 175.23       252.07       1.44x
BenchmarkMemmove10-4                165.68       261.10       1.58x
BenchmarkMemmove11-4                174.43       263.31       1.51x
BenchmarkMemmove12-4                180.76       267.56       1.48x
BenchmarkMemmove13-4                189.06       284.93       1.51x
BenchmarkMemmove14-4                186.31       284.72       1.53x
BenchmarkMemmove15-4                195.75       281.62       1.44x
BenchmarkMemmove16-4                202.96       439.23       2.16x
BenchmarkMemmove32-4                264.77       775.77       2.93x
BenchmarkMemmove64-4                306.81       1209.64      3.94x
BenchmarkMemmove128-4               357.03       1515.41      4.24x
BenchmarkMemmove256-4               380.77       2066.01      5.43x
BenchmarkMemmove512-4               385.05       2556.45      6.64x
BenchmarkMemmove1024-4              381.23       2804.10      7.36x
BenchmarkMemmove2048-4              379.06       2814.83      7.43x
BenchmarkMemmove4096-4              387.43       3064.96      7.91x
BenchmarkMemmoveUnaligned1-4        28.91        25.40        0.88x
BenchmarkMemmoveUnaligned2-4        56.13        47.56        0.85x
BenchmarkMemmoveUnaligned3-4        74.32        69.31        0.93x
BenchmarkMemmoveUnaligned4-4        97.02        83.58        0.86x
BenchmarkMemmoveUnaligned5-4        110.17       103.62       0.94x
BenchmarkMemmoveUnaligned6-4        124.95       113.26       0.91x
BenchmarkMemmoveUnaligned7-4        142.37       130.82       0.92x
BenchmarkMemmoveUnaligned8-4        151.20       205.64       1.36x
BenchmarkMemmoveUnaligned9-4        166.97       215.42       1.29x
BenchmarkMemmoveUnaligned10-4       148.49       221.22       1.49x
BenchmarkMemmoveUnaligned11-4       159.47       239.57       1.50x
BenchmarkMemmoveUnaligned12-4       163.52       247.32       1.51x
BenchmarkMemmoveUnaligned13-4       167.55       256.54       1.53x
BenchmarkMemmoveUnaligned14-4       175.12       251.03       1.43x
BenchmarkMemmoveUnaligned15-4       192.10       267.13       1.39x
BenchmarkMemmoveUnaligned16-4       190.76       378.87       1.99x
BenchmarkMemmoveUnaligned32-4       259.02       562.98       2.17x
BenchmarkMemmoveUnaligned64-4       317.72       842.44       2.65x
BenchmarkMemmoveUnaligned128-4      355.43       1274.49      3.59x
BenchmarkMemmoveUnaligned256-4      378.17       1815.74      4.80x
BenchmarkMemmoveUnaligned512-4      362.15       2180.81      6.02x
BenchmarkMemmoveUnaligned1024-4     376.07       2453.58      6.52x
BenchmarkMemmoveUnaligned2048-4     381.66       2568.32      6.73x
BenchmarkMemmoveUnaligned4096-4     398.51       2669.36      6.70x
BenchmarkMemclr5-4                  113.83       107.93       0.95x
BenchmarkMemclr16-4                 223.84       389.63       1.74x
BenchmarkMemclr64-4                 421.99       1209.58      2.87x
BenchmarkMemclr256-4                525.94       2411.58      4.59x
BenchmarkMemclr4096-4               581.66       4372.20      7.52x
BenchmarkMemclr65536-4              565.84       4747.48      8.39x
BenchmarkGoMemclr5-4                194.63       160.31       0.82x
BenchmarkGoMemclr16-4               295.30       630.07       2.13x
BenchmarkGoMemclr64-4               480.24       1884.03      3.92x
BenchmarkGoMemclr256-4              540.23       2926.49      5.42x

but it turns out that it's necessary to avoid the GC seeing partially written
pointers.

It's of course possible to be more sophisticated (using ldp/stp to move 16
bytes at a time in the core loop and unrolling the tail copying loops being
the obvious ideas) but I wanted something simple and (reasonably) obviously
correct.

Fixes #12552

Change-Id: Iaeaf8a812cd06f4747ba2f792de1ded738890735
Reviewed-on: https://go-review.googlesource.com/14813
Reviewed-by: Austin Clements <austin@google.com>
2015-10-08 07:49:35 +00:00
Michael Hudson-Doyle
a5cb76243a cmd/internal/obj, cmd/link, runtime: lots of TLS cleanup
It's particularly nice to get rid of the android special cases in the linker.

Change-Id: I516363af7ce8a6b2f196fe49cb8887ac787a6dad
Reviewed-on: https://go-review.googlesource.com/14197
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-08 00:21:30 +00:00
Raul Silvera
27ee719fb3 pprof: improve sampling for heap profiling
The current heap sampling introduces some bias that interferes
with unsampling, producing unexpected heap profiles.
The solution is to use a Poisson process to generate the
sampling points, using the formulas described at
https://en.wikipedia.org/wiki/Poisson_process

This fixes #12620

Change-Id: If2400809ed3c41de504dd6cff06be14e476ff96c
Reviewed-on: https://go-review.googlesource.com/14590
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-05 08:15:09 +00:00
Austin Clements
9f6df6c940 runtime: use 4 byte writes in amd64p32 memmove/memclr
Currently, amd64p32's memmove and memclr use 8 byte writes as much as
possible and 1 byte writes for the tail of the object. However, if an
object ends with a 4 byte pointer at an 8 byte aligned offset, this
may copy/zero the pointer field one byte at a time, allowing the
garbage collector to observe a partially copied pointer.

Fix this by using 4 byte writes instead of 8 byte writes.

Updates #12552.

Change-Id: I13324fd05756fb25ae57e812e836f0a975b5595c
Reviewed-on: https://go-review.googlesource.com/15370
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-02 22:49:15 +00:00
Austin Clements
44078a3228 runtime: adjust huge page flags only on huge page granularity
This fixes an issue where the runtime panics with "out of memory" or
"cannot allocate memory" even though there's ample memory by reducing
the number of memory mappings created by the memory allocator.

Commit 7e1b61c worked around issue #8832 where Linux's transparent
huge page support could dramatically increase the RSS of a Go process
by setting the MADV_NOHUGEPAGE flag on any regions of pages released
to the OS with MADV_DONTNEED. This had the side effect of also
increasing the number of VMAs (memory mappings) in a Go address space
because a separate VMA is needed for every region of the virtual
address space with different flags. Unfortunately, by default, Linux
limits the number of VMAs in an address space to 65530, and a large
heap can quickly reach this limit when the runtime starts scavenging
memory.

This commit dramatically reduces the number of VMAs. It does this
primarily by only adjusting the huge page flag at huge page
granularity. With this change, on amd64, even a pessimal heap that
alternates between MADV_NOHUGEPAGE and MADV_HUGEPAGE must reach 128GB
to reach the VMA limit. Because of this rounding to huge page
granularity, this change is also careful to leave large used and
unused regions huge page-enabled.

This change reduces the maximum number of VMAs during the runtime
benchmarks with GODEBUG=scavenge=1 from 692 to 49.

Fixes #12233.

Change-Id: Ic397776d042f20d53783a1cacf122e2e2db00584
Reviewed-on: https://go-review.googlesource.com/15191
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-02 20:20:43 +00:00
Austin Clements
9a31d38f65 runtime: remove sweep wait loop in finishsweep_m
In general, finishsweep_m must block until any spans that are
concurrently being swept have been swept. It accomplishes this by
looping over all spans, which, as in the previous commit, takes
~1ms/heap GB. Unfortunately, we do this during the STW sweep
termination phase, so multi-gigabyte heaps can push our STW time past
10ms.

However, there's no need to do this wait if the world is stopped
because, in effect, stopping the world already had to wait for
anything that was sweeping (and if it didn't, the wait in
finishsweep_m would deadlock). Hence, we can simply skip this loop if
the world is stopped, such as during sweep termination. In fact,
currently all calls to finishsweep_m are STW, but this hasn't always
been the case and may not be the case in the future, so we keep the
logic around.

For 24GB heaps, this reduces max pause time by 75% relative to tip and
by 90% relative to Go 1.5. Notably, all pauses are now well under
10ms. Here are the results for the garbage benchmark:

               ------------- max pause ------------
Heap   Procs   after change   before change   1.5.1
24GB     12        3.8ms          16ms         37ms
24GB      4        3.7ms          16ms         37ms
 4GB      4        3.7ms           3ms        6.9ms

In the 4GB/4P case, it seems the "before change" run got lucky: the
max went up, but the 99%ile pause time went down from 3ms to 2.04ms.

Change-Id: Ica22189559f231d408ef2815019c9dbb5f38bf31
Reviewed-on: https://go-review.googlesource.com/15071
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-02 19:56:01 +00:00
Austin Clements
dac220b0a9 runtime: remove in-use page count loop from STW
In order to compute the sweep ratio, the runtime needs to know how
many pages belong to spans in state _MSpanInUse. Currently it finds
this out by looping over all spans during mark termination. However,
this takes ~1ms/heap GB, so multi-gigabyte heaps can quickly push our
STW time past 10ms.

Replace the loop with an actively maintained count of in-use pages.

For multi-gigabyte heaps, this reduces max mark termination pause time
by 75%–90% relative to tip and by 85%–95% relative to Go 1.5.1. This
shifts the longest pause time for large heaps to the sweep termination
phase, so it only slightly decreases max pause time, though it roughly
halves mean pause time. Here are the results for the garbage
benchmark:

               ---- max mark termination pause ----
Heap   Procs   after change   before change   1.5.1
24GB     12        1.9ms          18ms         37ms
24GB      4        3.7ms          18ms         37ms
 4GB      4        920µs         3.8ms        6.9ms

Fixes #11484.

Change-Id: Ia2d28bb8a1e4f1c3b8ebf79fb203f12b9bf114ac
Reviewed-on: https://go-review.googlesource.com/15070
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-02 19:55:55 +00:00
Austin Clements
608c1b0d56 runtime: scan objects with finalizers concurrently
This reduces pause time by ~25% relative to tip and by ~50% relative
to Go 1.5.1.

Currently one of the steps of STW mark termination is to loop (in
parallel) over all spans to find objects with finalizers in order to
mark all objects reachable from these objects and to treat the
finalizer special as a root. Unfortunately, even if there are no
finalizers at all, this loop takes roughly 1 ms/heap GB/core, so
multi-gigabyte heaps can quickly push our STW time past 10ms.

Fix this by moving this scan from mark termination to concurrent scan,
where it can run in parallel with mutators. The loop itself could also
be optimized, but this cost is small compared to concurrent marking.

Making this scan concurrent introduces two complications:

1) The scan currently walks the specials list of each span without
locking it, which is safe only with the world stopped. We fix this by
speculatively checking if a span has any specials (the vast majority
won't) and then locking the specials list only if there are specials
to check.

2) An object can have a finalizer set after concurrent scan, in which
case it won't have been marked appropriately by concurrent scan. If
the finalizer is a closure and is only reachable from the special, it
could be swept before it is run. Likewise, if the object is not marked
yet when the finalizer is set and then becomes unreachable before it
is marked, other objects reachable only from it may be swept before
the finalizer function is run. We fix this issue by making
addfinalizer ensure the same marking invariants as markroot does.

For multi-gigabyte heaps, this reduces max pause time by 20%–30%
relative to tip (depending on GOMAXPROCS) and by ~50% relative to Go
1.5.1 (where this loop was neither concurrent nor parallel). Here are
the results for the garbage benchmark:

               ---------------- max pause ----------------
Heap   Procs   Concurrent scan   STW parallel scan   1.5.1
24GB     12         18ms              23ms            37ms
24GB      4         18ms              25ms            37ms
 4GB      4         3.8ms            4.9ms           6.9ms

In all cases, 95%ile pause time is similar to the max pause time. This
also improves mean STW time by 10%–30%.

Fixes #11485.

Change-Id: I9359d8c3d120a51d23d924b52bf853a1299b1dfd
Reviewed-on: https://go-review.googlesource.com/14982
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-02 19:55:48 +00:00
Austin Clements
fbd2660af3 runtime: introduce gcMode type for GC modes
Currently, the GC modes constants are untyped and functions pass them
around as ints. Clean this up by introducing a proper type for these
constant.

Change-Id: Ibc022447bdfa203644921fbb548312d7e2272e8d
Reviewed-on: https://go-review.googlesource.com/14981
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-02 19:55:41 +00:00
Austin Clements
1b84bb8c7c runtime: fix out-of-date comment on gcWork usage
Change-Id: I3c21ffa80a5c14911e07238b1f64bec686ed7b72
Reviewed-on: https://go-review.googlesource.com/14980
Reviewed-by: Minux Ma <minux@golang.org>
2015-10-02 19:55:34 +00:00
David Crawshaw
47ccf96a95 runtime: darwin/386 entrypoint for c-archive
Change-Id: Ic22597b5e2824cffe9598cb9b506af3426c285fd
Reviewed-on: https://go-review.googlesource.com/12412
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-02 11:45:52 +00:00
Michael Hudson-Doyle
2c911143fd runtime: adjust the ppc64x memmove and memclr to copy by word as much as it can
Issue #12552 can happen on ppc64 too, although much less frequently in my
testing. I'm fairly sure this fixes it (2 out of 200 runs of oracle.test failed
without this change and 0 of 200 failed with it). It's also a lot faster for
large moves/clears:

name           old speed      new speed       delta
Memmove1-6      157MB/s ± 9%    144MB/s ± 0%    -8.20%         (p=0.004 n=10+9)
Memmove2-6      281MB/s ± 1%    249MB/s ± 1%   -11.53%        (p=0.000 n=10+10)
Memmove3-6      376MB/s ± 1%    328MB/s ± 1%   -12.64%        (p=0.000 n=10+10)
Memmove4-6      475MB/s ± 4%    345MB/s ± 1%   -27.28%         (p=0.000 n=10+8)
Memmove5-6      540MB/s ± 1%    393MB/s ± 0%   -27.21%        (p=0.000 n=10+10)
Memmove6-6      609MB/s ± 0%    423MB/s ± 0%   -30.56%         (p=0.000 n=9+10)
Memmove7-6      659MB/s ± 0%    468MB/s ± 0%   -28.99%         (p=0.000 n=8+10)
Memmove8-6      705MB/s ± 0%   1295MB/s ± 1%   +83.73%          (p=0.000 n=9+9)
Memmove9-6      740MB/s ± 1%   1241MB/s ± 1%   +67.61%         (p=0.000 n=10+8)
Memmove10-6     780MB/s ± 0%   1162MB/s ± 1%   +48.95%         (p=0.000 n=10+9)
Memmove11-6     811MB/s ± 0%   1180MB/s ± 0%   +45.58%          (p=0.000 n=8+9)
Memmove12-6     820MB/s ± 1%   1073MB/s ± 1%   +30.83%         (p=0.000 n=10+9)
Memmove13-6     849MB/s ± 0%   1068MB/s ± 1%   +25.87%        (p=0.000 n=10+10)
Memmove14-6     877MB/s ± 0%    911MB/s ± 0%    +3.83%        (p=0.000 n=10+10)
Memmove15-6     893MB/s ± 0%    922MB/s ± 0%    +3.25%         (p=0.000 n=10+9)
Memmove16-6     897MB/s ± 1%   2418MB/s ± 1%  +169.67%         (p=0.000 n=10+9)
Memmove32-6     908MB/s ± 0%   3927MB/s ± 2%  +332.64%         (p=0.000 n=10+8)
Memmove64-6    1.11GB/s ± 0%   5.59GB/s ± 0%  +404.64%          (p=0.000 n=9+9)
Memmove128-6   1.25GB/s ± 0%   6.71GB/s ± 2%  +437.49%         (p=0.000 n=9+10)
Memmove256-6   1.33GB/s ± 0%   7.25GB/s ± 1%  +445.06%        (p=0.000 n=10+10)
Memmove512-6   1.38GB/s ± 0%   8.87GB/s ± 0%  +544.43%        (p=0.000 n=10+10)
Memmove1024-6  1.40GB/s ± 0%  10.00GB/s ± 0%  +613.80%        (p=0.000 n=10+10)
Memmove2048-6  1.41GB/s ± 0%  10.65GB/s ± 0%  +652.95%         (p=0.000 n=9+10)
Memmove4096-6  1.42GB/s ± 0%  11.01GB/s ± 0%  +675.37%         (p=0.000 n=8+10)
Memclr5-6       269MB/s ± 1%    264MB/s ± 0%    -1.80%        (p=0.000 n=10+10)
Memclr16-6      600MB/s ± 0%    887MB/s ± 1%   +47.83%        (p=0.000 n=10+10)
Memclr64-6     1.06GB/s ± 0%   2.91GB/s ± 1%  +174.58%         (p=0.000 n=8+10)
Memclr256-6    1.32GB/s ± 0%   6.58GB/s ± 0%  +399.86%         (p=0.000 n=9+10)
Memclr4096-6   1.42GB/s ± 0%  10.90GB/s ± 0%  +668.03%         (p=0.000 n=8+10)
Memclr65536-6  1.43GB/s ± 0%  11.37GB/s ± 0%  +697.83%          (p=0.000 n=9+8)
GoMemclr5-6     359MB/s ± 0%    360MB/s ± 0%    +0.46%        (p=0.000 n=10+10)
GoMemclr16-6    750MB/s ± 0%   1264MB/s ± 1%   +68.45%        (p=0.000 n=10+10)
GoMemclr64-6   1.17GB/s ± 0%   3.78GB/s ± 1%  +223.58%         (p=0.000 n=10+9)
GoMemclr256-6  1.35GB/s ± 0%   7.47GB/s ± 0%  +452.44%        (p=0.000 n=10+10)

Update #12552

Change-Id: I7192e9deb9684a843aed37f58a16a4e29970e893
Reviewed-on: https://go-review.googlesource.com/14840
Reviewed-by: Minux Ma <minux@golang.org>
2015-10-02 07:50:52 +00:00
Mikio Hara
9fb79380f0 runtime: drop sigfwd from signal forwarding unsupported platforms
This change splits signal_unix.go into signal_unix.go and
signal2_unix.go and removes the fake symbol sigfwd from signal
forwarding unsupported platforms for clarification purpose.

Change-Id: I205eab5cf1930fda8a68659b35cfa9f3a0e67ca6
Reviewed-on: https://go-review.googlesource.com/12062
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-02 01:07:44 +00:00
Joel Sing
db70c019d7 runtime/trace: reduce memory usage for trace stress tests on openbsd/arm
Reduce allocation to avoid running out of memory on the openbsd/arm builder,
until issue/12032 is resolved.

Update issue #12032

Change-Id: Ibd513829ffdbd0db6cd86a0a5409934336131156
Reviewed-on: https://go-review.googlesource.com/15242
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-10-01 18:00:55 +00:00
Joel Sing
1d5251f707 runtime: handle sysReserve failure in mHeap_SysAlloc
sysReserve will return nil on failure - correctly handle this case and return
nil to the caller. Currently, a failure will result in h.arena_end being set
to psize, h.arena_used being set to zero and fun times ensue.

On the openbsd/arm builder this has resulted in:

  runtime: address space conflict: map(0x0) = 0x40946000
  fatal error: runtime: address space conflict

When it should be reporting out of memory instead.

Change-Id: Iba828d5ee48ee1946de75eba409e0cfb04f089d4
Reviewed-on: https://go-review.googlesource.com/15056
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-10-01 14:40:02 +00:00
Jeremy Schlatter
59bacb285c runtime: update comment to match function name
Change-Id: I8f22434ade576cc7e3e6d9f357bba12c1296e3d1
Reviewed-on: https://go-review.googlesource.com/15250
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-01 13:12:50 +00:00
Ian Lance Taylor
0c1f0549b8 runtime, runtime/cgo: support using msan on cgo code
The memory sanitizer (msan) is a nice compiler feature that can
dynamically check for memory errors in C code.  It's not useful for Go
code, since Go is memory safe.  But it is useful to be able to use the
memory sanitizer on C code that is linked into a Go program via cgo.
Without this change it does not work, as msan considers memory passed
from Go to C as uninitialized.

To make this work, change the runtime to call the C mmap function when
using cgo.  When using msan the mmap call will be intercepted and marked
as returning initialized memory.

Work around what appears to be an msan bug by calling malloc before we
call mmap.

Change-Id: I8ab7286d7595ae84782f68a98bef6d3688b946f9
Reviewed-on: https://go-review.googlesource.com/15170
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-09-30 22:17:55 +00:00
Austin Clements
e01be84149 runtime: test that periodic GC works
We've broken periodic GC a few times without noticing because there's
no test for it, partly because you have to wait two minutes to see if
it happens. This exposes control of the periodic GC timeout to runtime
tests and adds a test that cranks it down to zero and sleeps for a bit
to make sure periodic GCs happen.

Change-Id: I3ec44e967e99f4eda752f85c329eebd18b87709e
Reviewed-on: https://go-review.googlesource.com/13169
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-09-30 19:24:07 +00:00
Shenghou Ma
604fbab3f1 runtime: fix incomplete sentence in comment
Fixes #12709.

Change-Id: If5a2536458fcd26d6f003dde1bfc02f86b09fa94
Reviewed-on: https://go-review.googlesource.com/14793
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-09-23 17:05:39 +00:00
Alex Brainman
d02a4c1d60 runtime: test that timeBeginPeriod succeeds
Change-Id: I5183f767dadb6d24a34d2460d02e97ddbaab129a
Reviewed-on: https://go-review.googlesource.com/12546
Run-TryBot: Minux Ma <minux@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-23 09:01:08 +00:00
Austin Clements
b307910b6e runtime: fix offset in invalidptr panic message
Change-Id: I00e1eebbf5e1a01c8fad5ca5324aa8eec1e4d731
Reviewed-on: https://go-review.googlesource.com/14792
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-22 16:55:17 +00:00
Ilya Tocar
5cf281a9b7 runtime: optimize duffcopy on amd64
Use movups to copy 16 bytes at a time.
Results (haswell):

name            old time/op  new time/op  delta
CopyFat8-48     0.62ns ± 3%  0.63ns ± 3%     ~     (p=0.535 n=20+20)
CopyFat12-48    0.92ns ± 2%  0.93ns ± 3%     ~     (p=0.594 n=17+18)
CopyFat16-48    1.23ns ± 2%  1.23ns ± 2%     ~     (p=0.839 n=20+19)
CopyFat24-48    1.85ns ± 2%  1.84ns ± 0%   -0.48%  (p=0.014 n=19+20)
CopyFat32-48    2.45ns ± 0%  2.45ns ± 1%     ~     (p=1.000 n=16+16)
CopyFat64-48    3.30ns ± 2%  2.14ns ± 1%  -35.00%  (p=0.000 n=20+18)
CopyFat128-48   6.05ns ± 0%  3.98ns ± 0%  -34.22%  (p=0.000 n=18+17)
CopyFat256-48   11.9ns ± 3%   7.7ns ± 0%  -35.87%  (p=0.000 n=20+17)
CopyFat512-48   23.0ns ± 2%  15.1ns ± 2%  -34.52%  (p=0.000 n=20+18)
CopyFat1024-48  44.8ns ± 1%  29.8ns ± 2%  -33.48%  (p=0.000 n=17+19)

Change-Id: I8a78773c656d400726a020894461e00c59f896bf
Reviewed-on: https://go-review.googlesource.com/14836
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-22 15:02:37 +00:00
Dmitry Vyukov
9172a1b573 runtime: race instrument read of convT2E/I arg
Sometimes this read is instrumented by compiler when it creates
a temp to take address, but sometimes it is not (e.g. for global vars
compiler takes address of the global directly).

Instrument convT2E/I similarly to chansend and mapaccess.

Fixes #12664

Change-Id: Ia7807f15d735483996426c5f3aed60a33b279579
Reviewed-on: https://go-review.googlesource.com/14752
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-19 10:26:36 +00:00
Austin Clements
c742ff6adc runtime: remove flaky TestInvalidptrCrash to fix build
This test fails on arm64 and some amd64 OSs and fails on Linux/amd64
if you remove the first runtime.GC(), which should be unnecessary, and
run it in all.bash (but not if you run it in isolation). I don't
understand any of these failures, so for now just remove this test.

TBR=rlh

Change-Id: Ibed00671126000ed7dc5b5d4af1f86fe4a1e30e1
Reviewed-on: https://go-review.googlesource.com/14767
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-19 01:43:00 +00:00
Austin Clements
97b64d88eb runtime: avoid debug prints of huge objects
Currently when the GC prints an object for debugging (e.g., for a
failed invalidptr or checkmark check), it dumps the entire object. To
avoid inundating the user with output for really large objects, limit
this to printing just the first 128 words (which are most likely to be
useful in identifying the type of an object) and the 32 words around
the problematic field.

Change-Id: Id94a5c9d8162f8bd9b2a63bf0b1bfb0adde83c68
Reviewed-on: https://go-review.googlesource.com/14764
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-18 22:23:18 +00:00
Austin Clements
b7c55ba496 runtime: improve invalid pointer error message
By default, the runtime panics if it detects a pointer to an
unallocated span. At this point, this usually catches bad uses of
unsafe or cgo in user code (though it could also catch runtime bugs).
Unfortunately, the rather cryptic error misleads users, offers users
little help with debugging their own problem, and offers the Go
developers little help with root-causing.

Improve the error message in various ways. First, the wording is
improved to make it clearer what condition was detected and to suggest
that this may be the result of incorrect use of unsafe or cgo. Second,
we add a dump of the object containing the bad pointer so that there's
at least some hope of figuring out why a bad pointer was stored in the
Go heap.

Change-Id: I57b91b12bc3cb04476399d7706679e096ce594b9
Reviewed-on: https://go-review.googlesource.com/14763
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-18 22:23:11 +00:00
Shawn Walker-Salas
001a75a74c runtime/trace: fix tracing of blocking system calls
The placement and invocation of traceGoSysCall when using
entersyscallblock() instead of entersyscall() differs enough that the
TestTraceSymbolize test can fail on some platforms.

This change moves the invocation of traceGoSysCall for entersyscall() so
that the same number of "frames to skip" are present in the trace as when
entersyscallblock() is used ensuring system call traces remain identical
regardless of internal implementation choices.

Fixes golang/go#12056

Change-Id: I8361e91aa3708f5053f98263dfe9feb8c5d1d969
Reviewed-on: https://go-review.googlesource.com/13861
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-09-17 09:06:20 +00:00
Alex Brainman
3d1f8c2379 runtime: print errno and byte count before crashing in mem_windows.go
As per iant suggestion during issue #12587 crash investigation.

Also adjust incorrect throw message in sysUsed while we are here.

Change-Id: Ice07904fdd6e0980308cb445965a696d26a1b92e
Reviewed-on: https://go-review.googlesource.com/14633
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-17 07:06:42 +00:00
David Crawshaw
9337dc9b5e runtime/debug: more explicit Stack docs
Change-Id: I81a7f22be827519b5290b4acbcba357680cad3c4
Reviewed-on: https://go-review.googlesource.com/14605
Reviewed-by: Rob Pike <r@golang.org>
2015-09-16 22:25:11 +00:00
Ilya Tocar
2421c6e3df runtime: optimize duffzero for amd64.
Use MOVUPS to zero 16 bytes at a time.

results (haswell):

name             old time/op  new time/op  delta
ClearFat8-48     0.62ns ± 2%  0.62ns ± 1%     ~     (p=0.085 n=20+15)
ClearFat12-48    0.93ns ± 2%  0.93ns ± 2%     ~     (p=0.757 n=19+19)
ClearFat16-48    1.23ns ± 1%  1.23ns ± 1%     ~     (p=0.896 n=19+17)
ClearFat24-48    1.85ns ± 2%  1.84ns ± 0%   -0.51%  (p=0.023 n=20+15)
ClearFat32-48    2.45ns ± 0%  2.46ns ± 2%     ~     (p=0.053 n=17+18)
ClearFat40-48    1.99ns ± 0%  0.92ns ± 2%  -53.54%  (p=0.000 n=19+20)
ClearFat48-48    2.15ns ± 1%  0.92ns ± 2%  -56.93%  (p=0.000 n=19+20)
ClearFat56-48    2.46ns ± 1%  1.23ns ± 0%  -49.98%  (p=0.000 n=19+14)
ClearFat64-48    2.76ns ± 0%  2.14ns ± 1%  -22.21%  (p=0.000 n=17+17)
ClearFat128-48   5.21ns ± 0%  3.99ns ± 0%  -23.46%  (p=0.000 n=17+19)
ClearFat256-48   10.3ns ± 4%   7.7ns ± 0%  -25.37%  (p=0.000 n=20+17)
ClearFat512-48   20.2ns ± 4%  15.0ns ± 1%  -25.58%  (p=0.000 n=20+17)
ClearFat1024-48  39.7ns ± 2%  29.7ns ± 0%  -25.05%  (p=0.000 n=19+19)

Change-Id: I200401eec971b2dd2450c0651c51e378bd982405
Reviewed-on: https://go-review.googlesource.com/14408
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-16 16:07:44 +00:00
David Crawshaw
2d697b2401 runtime/debug: implement Stack using runtime.Stack
Fixes #12363

Change-Id: I1a025ab6a1cbd5a58f5c2bce5416788387495428
Reviewed-on: https://go-review.googlesource.com/14604
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-16 11:36:21 +00:00
David Crawshaw
fb30270037 runtime: preserve R11 in darwin/arm entrypoint
The _rt0_arm_darwin_lib entrypoint has to conform to the darwin ARMv7
calling convention, which requires functions to preserve the value of
R11. Go uses R11 as the liblink REGTMP register, so save it manually.

Also avoid using R4, which is also callee-save.

Fixes #12590

Change-Id: I9c3b374e330f81ff8fc9c01fa20505a33ddcf39a
Reviewed-on: https://go-review.googlesource.com/14603
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-09-16 11:23:32 +00:00
Ian Lance Taylor
b6d115a583 runtime: on unexpected netpoll error, throw instead of looping
The current code prints an error message and then tries to carry on.
This is not helpful for Go users: they see a message that means
nothing and that they can do nothing about.  In the only known case of
this message, in issue 11498, the best guess is that the netpoll code
went into an infinite loop.  Instead of doing that, crash the program.

Fixes #11498.

Change-Id: Idda3456c5b708f0df6a6b56c5bb4e796bbc39d7c
Reviewed-on: https://go-review.googlesource.com/12047
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-09-15 17:56:56 +00:00
Keith Randall
731bdc5115 runtime: fix aeshash of empty string
Aeshash currently computes the hash of the empty string as
hash("", seed) = seed.  This is bad because the hash of a compound
object with empty strings in it doesn't include information about
where those empty strings were.  For instance [2]string{"", "foo"}
and [2]string{"foo", ""} might get the same hash.

Fix this by returning a scrambled seed instead of the seed itself.
With this fix, we can remove the scrambling done by the generated
array hash routines.

The test also rejects hash("", seed) = 0, if we ever thought
it would be a good idea to try that.

The fallback hash is already OK in this regard.

Change-Id: Iaedbaa5be8d6a246dc7e9383d795000e0f562037
Reviewed-on: https://go-review.googlesource.com/14129
Reviewed-by: jcd . <jcd@golang.org>
2015-09-15 17:51:23 +00:00
Alex Brainman
d7c12042bf runtime: provide room for first 4 syscall parameters in windows usleep2
Windows amd64 requires all syscall callers to provide room for first
4 parameters on stack. We do that for all our syscalls, except inside
of usleep2. In https://codereview.appspot.com/7563043#msg3 rsc says:

"We don't need the stack alignment and first 4 parameters on amd64
because it's just a system call, not an ordinary function call."

He seems to be wrong on both counts. But alignment is already fixed.
Fix parameter space now too.

Fixes #12444

Change-Id: I66a2a18d2f2c3846e3aa556cc3acc8ec6240bea0
Reviewed-on: https://go-review.googlesource.com/14282
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-15 01:12:32 +00:00
Ian Lance Taylor
ffd7d31787 runtime: unblock special glibc signals on each thread
Glibc uses some special signals for special thread operations.  These
signals will be used in programs that use cgo and invoke certain glibc
functions, such as setgid.  In order for this to work, these signals
need to not be masked by any thread.  Before this change, they were
being masked by programs that used os/signal.Notify, because it
carefully masks all non-thread-specific signals in all threads so that a
dedicated thread will collect and report those signals (see ensureSigM
in signal1_unix.go).

This change adds the two glibc special signals to the set of signals
that are unmasked in each thread.

Fixes #12498.

Change-Id: I797d71a099a2169c186f024185d44a2e1972d4ad
Reviewed-on: https://go-review.googlesource.com/14297
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-09-14 21:59:54 +00:00
Austin Clements
4ac4085f8e runtime: minor clarifications of markroot
This puts the _Root* indexes in a more friendly order and tweaks
markrootSpans to use a for-range loop instead of its own indexing.

Change-Id: I2c18d55c9a673ea396b6424d51ef4997a1a74825
Reviewed-on: https://go-review.googlesource.com/14548
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-14 19:37:44 +00:00
Austin Clements
a1cad70a2f runtime: remove unused g.readyg field
Commit 0e6a6c5 removed readyExecute a long time ago, but left behind
the g.readyg field that was used by readyExecute. Remove this now
unused field.

Change-Id: I41b87ad2b427974d256ec7a7f6d4bdc2ce8a13bb
Reviewed-on: https://go-review.googlesource.com/13111
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-14 18:40:22 +00:00
Austin Clements
70462f90ec runtime: simplify mSpan_Sweep
This is a cleanup following cc8f544, which was a minimal change to fix
issue #11617. This consolidates the two places in mSpan_Sweep that
update sweepgen. Previously this was necessary because sweepgen must
be updated before freeing the span, but we freed large spans early.
Now we free large spans later, so there's no need to duplicate the
sweepgen update. This also means large spans can take advantage of the
sweepgen sanity checking performed for other spans.

Change-Id: I23b79dbd9ec81d08575cd307cdc0fa6b20831768
Reviewed-on: https://go-review.googlesource.com/12451
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-14 18:29:58 +00:00
Austin Clements
572f08a064 runtime: split marking of span roots into 128 subtasks
Marking of span roots can represent a significant fraction of the time
spent in mark termination. Simply traversing the span list takes about
1ms per GB of heap and if there are a large number of finalizers (for
example, for network connections), it may take much longer.

Improve the situation by splitting the span scan into 128 subtasks
that can be executed in parallel and load balanced by the markroots
parallel for. This lets the GC balance this job across the Ps.

A better solution is to do this during concurrent mark, or to improve
it algorithmically, but this is a simple change with a lot of bang for
the buck.

This was suggested by Rhys Hiltner.

Updates #11485.

Change-Id: I8b281adf0ba827064e154a1b6cc32d4d8031c03c
Reviewed-on: https://go-review.googlesource.com/13112
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-14 18:15:40 +00:00
Austin Clements
739f133837 runtime: fix hashing of trace stacks
The call to hash the trace stack reversed the "seed" and "size"
arguments to memhash and, hence, always called memhash with a 0 size,
which dutifully returned a hash value that depended only on the number
of PCs in the stack and not their values. As a result, all stacks were
put in to a very subset of the 8,192 buckets.

Fix this by passing these arguments in the correct order.

Change-Id: I67cd29312f5615c7ffa23e205008dd72c6b8af62
Reviewed-on: https://go-review.googlesource.com/13613
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-09-14 18:14:14 +00:00
Dave Cheney
4f48507d90 runtime: reduce pthread stack size in TestCgoCallbackGC
Fixes #11959

This test runs 100 concurrent callbacks from C to Go consuming 100
operating system threads, which at 8mb a piece (the default on linux/arm)
would reserve over 800mb of address space. This would frequently
cause the test to fail on platforms with ~1gb of ram, such as the
raspberry pi.

This change reduces the thread stack allocation to 256kb, a number picked
at random, but at 1/32th the previous size, should allow the test to
pass successfully on all platforms.

Change-Id: I8b8bbab30ea7b2972b3269a6ff91e6fe5bc717af
Reviewed-on: https://go-review.googlesource.com/13731
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Capitanio <capnm9@gmail.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-09-13 23:46:55 +00:00
Shenghou Ma
0b5bcf53ee runtime/cgo: explicitly link msvcrt on windows
It's because runtime links to ntdll, and ntdll exports a couple
incompatible libc functions. We must link to msvcrt first and
then try ntdll.

Fixes #12030.

Change-Id: I0105417bada108da55f5ae4482c2423ac7a92957
Reviewed-on: https://go-review.googlesource.com/14472
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-09-12 08:34:52 +00:00
Rob Pike
67ddae87b9 all: use one 'l' when cancelling everywhere except Solaris
Fixes #11626.

Change-Id: I1b70c0844473c3b57a53d7cca747ea5cdc68d232
Reviewed-on: https://go-review.googlesource.com/14526
Run-TryBot: Rob Pike <r@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-11 18:31:51 +00:00
Didier Spezia
4f33436004 runtime,internal/trace: map/slice literals janitoring
Simplify slice/map literal expressions.
Caught with gofmt -d -s, fixed with gofmt -w -s
Checked that the result can still be compiled with Go 1.4.

Change-Id: I06bce110bb5f46ee2f45113681294475aa6968bc
Reviewed-on: https://go-review.googlesource.com/13839
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-09-11 14:03:43 +00:00
Michael Hudson-Doyle
b0344e9fd5 cmd/internal/obj, cmd/link, runtime: a saner model for TLS on arm
this leaves lots of cruft behind, will delete that soon

Change-Id: I12d6b6192f89bcdd89b2b0873774bd3458373b8a
Reviewed-on: https://go-review.googlesource.com/14196
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-10 19:49:13 +00:00
Shenghou Ma
1cf05ee612 runtime: move arch1_$GOARCH.go into arch_$GOARCH.go
Update #12563.

Change-Id: Id87f8e53586accd662575c31961c39787268df7a
Reviewed-on: https://go-review.googlesource.com/14471
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-10 04:52:24 +00:00
Keith Randall
00c638d243 runtime: on map update, don't overwrite key if we don't need to.
Keep track of which types of keys need an update and which don't.

Strings need an update because the new key might pin a smaller backing store.
Floats need an update because it might be +0/-0.
Interfaces need an update because they may contain strings or floats.

Fixes #11088

Change-Id: I9ade53c1dfb3c1a2870d68d07201bc8128e9f217
Reviewed-on: https://go-review.googlesource.com/10843
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-09 21:06:49 +00:00
Austin Clements
e9089e4ab6 runtime: add high-level description of how stack barriers work
Change-Id: I6affe75b5fa9dbf513c16200bff4fd7aa5f3a985
Reviewed-on: https://go-review.googlesource.com/14051
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-09 01:18:56 +00:00
Austin Clements
92e68c89f6 runtime: move stack barrier code to its own file
Currently the stack barrier code is mixed in with the mark and scan
code. Move all of the stack barrier related functions and variables to
a new dedicated source file. There are no code modifications.

Change-Id: I604603045465ef8573b9f88915d28ab6b5910903
Reviewed-on: https://go-review.googlesource.com/14050
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-09 01:18:50 +00:00
Michael Hudson-Doyle
0388d4303f runtime: remove unused FUNCDATA_DeadValueMaps
Change-Id: Iccb0221bd9aef062d20798b952eaa09d9e60b902
Reviewed-on: https://go-review.googlesource.com/14345
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-07 21:02:11 +00:00
Michael Hudson-Doyle
31322996fd runtime: add stub sigreturn on arm
When building a shared library, all functions that are declared must actually
be defined.

Change-Id: I1488690cecfb66e62d9fdb3b8d257a4dc31d202a
Reviewed-on: https://go-review.googlesource.com/14187
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-09-07 07:49:09 +00:00
Michael Hudson-Doyle
40af15f28e runtime: teach softfloat interpreter about "add r11, pc, r11"
This is generated during fp code when -shared is active.

Change-Id: Ia1092299b9c3b63ff771ca4842158b42c34bd008
Reviewed-on: https://go-review.googlesource.com/14286
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-09-04 06:43:35 +00:00
Michael Hudson-Doyle
9e6ba37b86 cmd/internal/obj: some platform independent bits of proper toolchain support for thread local storage
Also simplifies some silliness around making the .tbss section wrt internal
vs external linking. The "make TLS make sense" project has quite a few more
steps to go.

Issue #11270

Change-Id: Ia4fa135cb22d916728ead95bdbc0ebc1ae06f05c
Reviewed-on: https://go-review.googlesource.com/13990
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-03 14:06:07 +00:00
Michael Hudson-Doyle
9f0baca505 runtime: fixes for arm64 shared libraries
Building for shared libraries requires that all functions that are declared
have an implementation and vice versa so make that so on arm64.

It would be nicer to not require the stub sigreturn (it will never be called)
but that seems a bit awkward.

Change-Id: I3cec81697161b452af81fa35939f748bd1acf7fd
Reviewed-on: https://go-review.googlesource.com/13995
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-09-03 01:07:40 +00:00
Keith Randall
a088f1b76c runtime: soften up hash checks a bit
The hash tests generate occasional failures, quiet them some more.

In particular we can get 1 collision when the expected number is
.001 or so. That shouldn't be a dealbreaker.

Fixes #12311

Change-Id: I784e91b5d21f4f1f166dc51bde2d1cd3a7a3bfea
Reviewed-on: https://go-review.googlesource.com/13902
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-08-31 19:38:24 +00:00
Shenghou Ma
32d3b96e8b runtime: implement cmpstring and bytes.Compare in assembly for ppc64
Change-Id: I15bf55aa5ac3588c05f0a253f583c52bab209892
Reviewed-on: https://go-review.googlesource.com/14041
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-08-31 18:41:58 +00:00
Austin Clements
77e528293b runtime: check that stack barrier unwind is in sync
Currently the stack barrier stub blindly unwinds the next stack
barrier from the G's stack barrier array without checking that it's
the right stack barrier. If through some bug the stack barrier array
position gets out of sync with where we actually are on the stack,
this could return to the wrong PC, which would lead to difficult to
debug crashes. To address this, this commit adds a check to the amd64
stack barrier stub that it's unwinding the correct stack barrier.

Updates #12238.

Change-Id: If824d95191d07e2512dc5dba0d9978cfd9f54e02
Reviewed-on: https://go-review.googlesource.com/13948
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-30 16:07:02 +00:00
Austin Clements
3bfc9df21a runtime: add GODEBUG for stack barriers at every frame
Currently enabling the debugging mode where stack barriers are
installed at every frame requires recompiling the runtime. However,
this is potentially useful for field debugging and for runtime tests,
so make this mode a GODEBUG.

Updates #12238.

Change-Id: I6fb128f598b19568ae723a612e099c0ed96917f5
Reviewed-on: https://go-review.googlesource.com/13947
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-30 16:06:55 +00:00
Austin Clements
e2bb03f175 runtime: don't install a stack barrier in cgocallback_gofunc's frame
Currently the runtime can install stack barriers in any frame.
However, the frame of cgocallback_gofunc is special: it's the one
function that switches from a regular G stack to the system stack on
return. Hence, the return PC slot in its frame on the G stack is
actually used to save getg().sched.pc (so tracebacks appear to unwind
to the last Go function running on that G), and not as an actual
return PC for cgocallback_gofunc.

Because of this, if we install a stack barrier in cgocallback_gofunc's
return PC slot, when cgocallback_gofunc does return, it will move the
stack barrier stub PC in to getg().sched.pc and switch back to the
system stack. The rest of the runtime doesn't know how to deal with a
stack barrier stub in sched.pc: nothing knows how to match it up with
the G's stack barrier array and, when the runtime removes stack
barriers, it doesn't know to undo the one in sched.pc. Hence, if the C
code later returns back in to Go code, it will attempt to return
through the stack barrier saved in sched.pc, which may no longer have
correct unwinding information.

Fix this by blacklisting cgocallback_gofunc's frame so the runtime
won't install a stack barrier in it's return PC slot.

Fixes #12238.

Change-Id: I46aa2155df2fd050dd50de3434b62987dc4947b8
Reviewed-on: https://go-review.googlesource.com/13944
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-30 16:06:47 +00:00
Keith Randall
805e56ef47 runtime: short-circuit bytes.Compare if src and dst are the same slice
Should only matter on ppc64 and ppc64le.

Fixes #11336

Change-Id: Id4b0ac28b573648e1aa98e87bf010f00d006b146
Reviewed-on: https://go-review.googlesource.com/13901
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-08-29 02:43:57 +00:00
Russ Cox
9c04d00214 runtime: check explicitly for short unwinding of stacks
Right now we find out implicitly if stack barriers are in place,
or defers. This change makes sure we find out about short
unwinds always.

Change-Id: Ibdde1ba9c79eb792660dcb7aa6f186e4e4d559b3
Reviewed-on: https://go-review.googlesource.com/13966
Reviewed-by: Austin Clements <austin@google.com>
2015-08-28 16:05:59 +00:00
Tim Cooijmans
34db31d5f5 src/runtime: Add missing defs for android/386.
Change-Id: I63bf6d2fdf41b49ff8783052d5d6c53b20e2f050
Reviewed-on: https://go-review.googlesource.com/13760
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-08-27 15:14:41 +00:00
Michael Hudson-Doyle
d497eeb005 runtime: remove unused xchgp/xchgp1
I noticed that they were unimplemented on arm64 but then that they were
in fact not used at all.

Change-Id: Iee579feda2a5e374fa571bcc8c89e4ef607d50f6
Reviewed-on: https://go-review.googlesource.com/13951
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-27 00:28:35 +00:00
Uttam C Pawar
32add8d7c8 bytes: improve Compare function on amd64 for large byte arrays
This patch contains only loop unrolling change for size > 63B

Following are the performance numbers for various sizes on
On Haswell based system: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz.

benchcmp go.head.8.25.15.txt go.head.8.25.15.opt.txt
benchmark                       old ns/op     new ns/op     delta
BenchmarkBytesCompare1-4        5.37          5.37          +0.00%
BenchmarkBytesCompare2-4        5.37          5.38          +0.19%
BenchmarkBytesCompare4-4        5.37          5.37          +0.00%
BenchmarkBytesCompare8-4        4.42          4.38          -0.90%
BenchmarkBytesCompare16-4       4.27          4.45          +4.22%
BenchmarkBytesCompare32-4       5.30          5.36          +1.13%
BenchmarkBytesCompare64-4       6.93          6.78          -2.16%
BenchmarkBytesCompare128-4      10.3          9.50          -7.77%
BenchmarkBytesCompare256-4      17.1          13.8          -19.30%
BenchmarkBytesCompare512-4      31.3          22.1          -29.39%
BenchmarkBytesCompare1024-4     62.5          39.0          -37.60%
BenchmarkBytesCompare2048-4     112           73.2          -34.64%

Change-Id: I4eeb1c22732fd62cbac97ba757b0d29f648d4ef1
Reviewed-on: https://go-review.googlesource.com/11871
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-26 03:52:20 +00:00
Todd Neal
a94e906c41 runtime: remove always false comparison in sigsend
s is a uint32 and can never be zero. It's max value is already tested
against sig.wanted, whose size is derived from _NSIG.  This also
matches the test in signal_enable.

Fixes #11282

Change-Id: I8eec9c7df8eb8682433616462fe51b264c092475
Reviewed-on: https://go-review.googlesource.com/13940
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-08-26 01:02:55 +00:00
Michael Hudson-Doyle
af78482d6b cmd/compile, cmd/link, reflect, runtime: remove type.zero field
No longer used after previous hashmap change.

Change-Id: I558470f872281e84a78406132df4e391d077b833
Reviewed-on: https://go-review.googlesource.com/13785
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-26 00:28:17 +00:00
Michael Hudson-Doyle
38519e69d0 cmd/compile, runtime: stop returning t.zero on hashmap miss
Previously t.zero always pointed to runtime.zerovalue. Change the hashmap code
to always return a runtime pointer directly, and change that pointer to point
to a larger buffer if one is needed.

(It might be better to only copy from the pointer returned by the mapaccess
functions when the value type is small enough and have the compiler insert
explicit zeroing for larger value types, but I tried and failed to do this).

This removes all uses of the zero field of the type data; the field itself can
be removed in a separate change.

Fixes #11491

Change-Id: I5b81752ff4067d74a5a281c41e88f151bae0171e
Reviewed-on: https://go-review.googlesource.com/13784
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-08-26 00:03:21 +00:00
Austin Clements
05a3b1fce5 cmd/compile: fix uninitialized memory in compare of interface value
A comparison of the form l == r where l is an interface and r is
concrete performs a type assertion on l to convert it to r's type.
However, the compiler fails to zero the temporary where the result of
the type assertion is written, so if the type is a pointer type and a
stack scan occurs while in the type assertion, it may see an invalid
pointer on the stack.

Fix this by zeroing the temporary. This is equivalent to the fix for
type switches from c4092ac.

Fixes #12253.

Change-Id: Iaf205d456b856c056b317b4e888ce892f0c555b9
Reviewed-on: https://go-review.googlesource.com/13872
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-25 14:37:08 +00:00
Dave Cheney
686d44d9e0 runtime: check pointer equality in arm64 cmpbody
Updates #11336

Follow the lead of amd64 by doing a pointer equality check
before comparing string/byte contents on arm64.

BenchmarkCompareBytesEqual-8               25.8           26.3           +1.94%
BenchmarkCompareBytesToNil-8               9.59           9.59           +0.00%
BenchmarkCompareBytesEmpty-8               9.59           9.17           -4.38%
BenchmarkCompareBytesIdentical-8           26.3           9.17           -65.13%
BenchmarkCompareBytesSameLength-8          16.3           16.3           +0.00%
BenchmarkCompareBytesDifferentLength-8     16.3           16.3           +0.00%
BenchmarkCompareBytesBigUnaligned-8        1132038        1131409        -0.06%
BenchmarkCompareBytesBig-8                 1126758        1128470        +0.15%
BenchmarkCompareBytesBigIdentical-8        1084366        9.17           -100.00%

Change-Id: Id7125c31957eff1ddb78897d4511bd50e79af3f7
Reviewed-on: https://go-review.googlesource.com/13885
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-25 03:29:47 +00:00
Todd Neal
3efe36d4c4 runtime: fix nmspinning comparison
nmspinning has a value range of [0, 2^31-1].  Update the comment to
indicate this and fix the comparison so it's not always false.

Fixes #11280

Change-Id: Iedaf0654dcba5e2c800645f26b26a1a781ea1991
Reviewed-on: https://go-review.googlesource.com/13877
Reviewed-by: Minux Ma <minux@golang.org>
2015-08-25 02:44:11 +00:00
Shenghou Ma
24be0997a2 runtime: add a missing hex conversion
gobuf.g is a guintptr, so without hex(), it will be printed as
a decimal, which is not very helpful and inconsistent with how
other pointers are printed.

Change-Id: I7c0432e9709e90a5c3b3e22ce799551a6242d017
Reviewed-on: https://go-review.googlesource.com/13879
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-25 01:37:54 +00:00
Dave Cheney
1135b9d671 runtime: check pointer equality in arm cmpbody
Updates #11336

Follow the lead of amd64 do a pointer equality check
before comparing string/byte contents on arm.

BenchmarkCompareBytesEqual-4               208             211             +1.44%
BenchmarkCompareBytesToNil-4               83.6            81.8            -2.15%
BenchmarkCompareBytesEmpty-4               80.2            75.2            -6.23%
BenchmarkCompareBytesIdentical-4           208             75.2            -63.85%
BenchmarkCompareBytesSameLength-4          126             128             +1.59%
BenchmarkCompareBytesDifferentLength-4     128             130             +1.56%
BenchmarkCompareBytesBigUnaligned-4        14192804        14060971        -0.93%
BenchmarkCompareBytesBig-4                 12277313        12128193        -1.21%
BenchmarkCompareBytesBigIdentical-4        9385046         78.5            -100.00%

Change-Id: I5b24620018688c5fe04b6ff6743a24c4ce225788
Reviewed-on: https://go-review.googlesource.com/13881
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-24 21:18:33 +00:00
Hyang-Ah (Hana) Kim
db5eb2a2c3 runtime/cgo: remove __stack_chk_fail_local
I cannot find where it's being used.

This addresses a duplicate symbol issue encountered in golang/go#9327.

Change-Id: I8efda45a006ad3e19423748210c78bd5831215e0
Reviewed-on: https://go-review.googlesource.com/13615
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-21 15:56:36 +00:00
Shawn Walker-Salas
d9e3d16796 runtime, syscall: remove unused bits from Solaris implementation
CL 9184 changed the runtime and syscall packages to link Solaris binaries
directly instead of using dlopen/dlsym but did not remove the unused (and
now broken) references to dlopen, dlclose, and dlsym.

Fixes #11923

Change-Id: I36345ce5e7b371bd601b7d48af000f4ccacd62c0
Reviewed-on: https://go-review.googlesource.com/13410
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
2015-08-21 11:39:24 +00:00
Russ Cox
3ae17043f7 runtime: make sure heapBitsBulkBarrier cannot be preempted
Changes the torture test in #12068 from failing about 1/10 times
to not failing in almost 2,000 runs.

This was only happening in -race mode because functions are
bigger in -race mode, so a few of the helpers for heapBitsBulkBarrier
were not being inlined, and they were not marked nosplit,
so (only in -race mode) the write barrier was being preempted by GC,
causing missed pointer updates.

Filed issue #12069 for diagnosis of any other similar errors.

Fixes #12068.

Change-Id: Ic174d9b050ba278b18b08ab0d85a73c33bd5b175
Reviewed-on: https://go-review.googlesource.com/13364
Reviewed-by: Austin Clements <austin@google.com>
2015-08-07 17:55:26 +00:00
Russ Cox
4a19081358 runtime: run on GOARM=5 and GOARM=6 uniprocessor freebsd/arm systems
Also, crash early on non-Linux SMP ARM systems when GOARM < 7;
without the proper synchronization, SMP cannot work.

Linux is okay because we call kernel-provided routines for
synchronization and barriers, and the kernel takes care of
providing the right routines for the current system.
On non-Linux systems we are left to fend for ourselves.

It is possible to use different synchronization on GOARM=6,
but it's too late to do that in the Go 1.5 cycle.
We don't believe there are any non-Linux SMP GOARM=6 systems anyway.

Fixes #12067.

Change-Id: I771a556e47893ed540ec2cd33d23c06720157ea3
Reviewed-on: https://go-review.googlesource.com/13363
Reviewed-by: Austin Clements <austin@google.com>
2015-08-07 17:39:07 +00:00
Austin Clements
ad731887a7 runtime: call goexit1 instead of goexit
Currently, runtime.Goexit() calls goexit()—the goroutine exit stub—to
terminate the goroutine. This *mostly* works, but can cause a
"leftover stack barriers" panic if the following happens:

1. Goroutine A has a reasonably large stack.

2. The garbage collector scan phase runs and installs stack barriers
   in A's stack. The top-most stack barrier happens to fall at address X.

3. Goroutine A unwinds the stack far enough to be a candidate for
   stack shrinking, but not past X.

4. Goroutine A calls runtime.Goexit(), which calls goexit(), which
   calls goexit1().

5. The garbage collector enters mark termination.

6. Goroutine A is preempted right at the prologue of goexit1() and
   performs a stack shrink, which calls gentraceback.

gentraceback stops as soon as it sees goexit on the stack, which is
only two frames up at this point, even though there may really be many
frames above it. More to the point, the stack barrier at X is above
the goexit frame, so gentraceback never sees that stack barrier. At
the end of gentraceback, it checks that it saw all of the stack
barriers and panics because it didn't see the one at X.

The fix is simple: call goexit1, which actually implements the process
of exiting a goroutine, rather than goexit, the exit stub.

To make sure this doesn't happen again in the future, we also add an
argument to the stub prototype of goexit so you really, really have to
want to call it in order to call it. We were able to reliably
reproduce the above sequence with a fair amount of awful code inserted
at the right places in the runtime, but chose to change the goexit
prototype to ensure this wouldn't happen again rather than pollute the
runtime with ugly testing code.

Change-Id: Ifb6fb53087e09a252baddadc36eebf954468f2a8
Reviewed-on: https://go-review.googlesource.com/13323
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-06 20:21:05 +00:00
Russ Cox
26baed6af7 runtime: fix race that dropped GoSysExit events from trace
This makes TestTraceStressStartStop much less flaky.
Running under stress, it changes the failure rate from
above 1/100 to under 1/50000. That very unlikely
failure happens when an unexpected GoSysExit is
written. Not sure how that happens yet, but it is much
less important.

Fixes #11953.

Change-Id: I034671936334b4f3ab733614ef239aa121d20247
Reviewed-on: https://go-review.googlesource.com/13321
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-08-06 19:29:09 +00:00
Austin Clements
d57f037302 runtime: don't recheck heap trigger for periodic GC
88e945f introduced a non-speculative double check of the heap trigger
before actually starting a concurrent GC. This was necessary to fix a
race for heap-triggered GC, but broke sysmon-triggered periodic GC,
since the heap check will of course fail for periodically triggered
GC.

Fix this by telling startGC whether or not this GC was triggered by
heap size or a timer and only doing the heap size double check for GCs
triggered by heap size.

Fixes #12026.

Change-Id: I7c3f6ec364545c36d619f2b4b3bf3b758e3bcbd6
Reviewed-on: https://go-review.googlesource.com/13168
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-05 17:28:56 +00:00
Russ Cox
2a60d77059 runtime: align stack pointer during initcgo call on arm
This is what is causing freebsd/arm to crash mysteriously when using cgo.
The bug was introduced in golang.org/cl/4030, which moved this code out
of rt0_go and into its own function. The ARM ABI says that calls must
be made with the stack pointer at an 8-byte boundary, but only FreeBSD
seems to crash when this is violated.

Fixes #10119.

Change-Id: Ibdbe76b2c7b80943ab66b8abbb38b47acb70b1e5
Reviewed-on: https://go-review.googlesource.com/13161
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-08-05 05:31:34 +00:00
Austin Clements
be39a42920 runtime: fix typos in comments
Change-Id: I66f7937b22bb6e05c3f2f0f2a057151020ad9699
Reviewed-on: https://go-review.googlesource.com/13049
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:56 +00:00
Austin Clements
e3870aa6f3 runtime: fix assist utilization computation
When commit 510fd13 enabled assists during the scan phase, it failed
to also update the code in the GC controller that computed the assist
CPU utilization and adjusted the trigger based on it. Fix that code so
it uses the start of the scan phase as the wall-clock time when
assists were enabled rather than the start of the mark phase.

Change-Id: I05013734b4448c3e2c730dc7b0b5ee28c86ed8cf
Reviewed-on: https://go-review.googlesource.com/13048
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:53 +00:00
Austin Clements
1fb01a88f9 runtime: revise assist ratio aggressively
At the start of a GC cycle, the garbage collector computes the assist
ratio based on the total scannable heap size. This was intended to be
conservative; after all, this assumes the entire heap may be reachable
and hence needs to be scanned. But it only assumes that the *current*
entire heap may be reachable. It fails to account for heap allocated
during the GC cycle. If the trigger ratio is very low (near zero), and
most of the heap is reachable when GC starts (which is likely if the
trigger ratio is near zero), then it's possible for the mutator to
create new, reachable heap fast enough that the assists won't keep up
based on the assist ratio computed at the beginning of the cycle. As a
result, the heap can grow beyond the heap goal (by hundreds of megs in
stress tests like in issue #11911).

We already have some vestigial logic for dealing with situations like
this; it just doesn't run often enough. Currently, every 10 ms during
the GC cycle, the GC revises the assist ratio. This was put in before
we switched to a conservative assist ratio (when we really were using
estimates of scannable heap), and it turns out to be exactly what we
need now. However, every 10 ms is far too infrequent for a rapidly
allocating mutator.

This commit reuses this logic, but replaces the 10 ms timer with
revising the assist ratio every time the heap is locked, which
coincides precisely with when the statistics used to compute the
assist ratio are updated.

Fixes #11911.

Change-Id: I377b231ab064946228378fa10422a46d1b50f4c5
Reviewed-on: https://go-review.googlesource.com/13047
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:48 +00:00
Austin Clements
f9dc3382ad runtime: when gcpacertrace > 0, print information about assist ratio
This was useful in debugging the mutator assist behavior for #11911,
and it fits with the other gcpacertrace output.

Change-Id: I1e25590bb4098223a160de796578bd11086309c7
Reviewed-on: https://go-review.googlesource.com/13046
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:46 +00:00
Austin Clements
fc9ca85f4c runtime: make sweep proportional to spans bytes allocated
Proportional concurrent sweep is currently based on a ratio of spans
to be swept per bytes of object allocation. However, proportional
sweeping is performed during span allocation, not object allocation,
in order to minimize contention and overhead. Since objects are
allocated from spans after those spans are allocated, the system tends
to operate in debt, which means when the next GC cycle starts, there
is often sweep debt remaining, so GC has to finish the sweep, which
delays the start of the cycle and delays enabling mutator assists.

For example, it's quite likely that many Ps will simultaneously refill
their span caches immediately after a GC cycle (because GC flushes the
span caches), but at this point, there has been very little object
allocation since the end of GC, so very little sweeping is done. The
Ps then allocate objects from these cached spans, which drives up the
bytes of object allocation, but since these allocations are coming
from cached spans, nothing considers whether more sweeping has to
happen. If the sweep ratio is high enough (which can happen if the
next GC trigger is very close to the retained heap size), this can
easily represent a sweep debt of thousands of pages.

Fix this by making proportional sweep proportional to the number of
bytes of spans allocated, rather than the number of bytes of objects
allocated. Prior to allocating a span, both the small object path and
the large object path ensure credit for allocating that span, so the
system operates in the black, rather than in the red.

Combined with the previous commit, this should eliminate all sweeping
from GC start up. On the stress test in issue #11911, this reduces the
time spent sweeping during GC (and delaying start up) by several
orders of magnitude:

                mean    99%ile     max
    pre fix      1 ms    11 ms   144 ms
    post fix   270 ns   735 ns   916 ns

Updates #11911.

Change-Id: I89223712883954c9d6ec2a7a51ecb97172097df3
Reviewed-on: https://go-review.googlesource.com/13044
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:44 +00:00
Austin Clements
e30c6d64ba runtime: always give concurrent sweep some heap distance
Currently it's possible for the next_gc heap size trigger computed for
the next GC cycle to be less than the current allocated heap size.
This means the next cycle will start immediately, which means there's
no time to perform the concurrent sweep between GC cycles. This places
responsibility for finishing the sweep on GC itself, which delays GC
start-up and hence delays mutator assist.

Fix this by ensuring that next_gc is always at least a little higher
than the allocated heap size, so we won't trigger the next cycle
instantly.

Updates #11911.

Change-Id: I74f0b887bf187518d5fedffc7989817cbcf30592
Reviewed-on: https://go-review.googlesource.com/13043
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-08-04 18:54:41 +00:00
Austin Clements
fb5230af8a runtime: assist the GC during GC startup and shutdown
Currently there are two sensitive periods during which a mutator can
allocate past the heap goal but mutator assists can't be enabled: 1)
at the beginning of GC between when the heap first passes the heap
trigger and sweep termination and 2) at the end of GC between mark
termination and when the background GC goroutine parks. During these
periods there's no back-pressure or safety net, so a rapidly
allocating mutator can allocate past the heap goal. This is
exacerbated if there are many goroutines because the GC coordinator is
scheduled as any other goroutine, so if it gets preempted during one
of these periods, it may stay preempted for a long period (10s or 100s
of milliseconds).

Normally the mutator does scan work to create back-pressure against
allocation, but there is no scan work during these periods. Hence, as
a fall back, if a mutator would assist but can't yet, simply yield the
CPU. This delays the mutator somewhat, but more importantly gives more
CPU time to the GC coordinator for it to complete the transition.

This is obviously a workaround. Issue #11970 suggests a far better but
far more invasive way to fix this.

Updates #11911. (This very nearly fixes the issue, but about once
every 15 minutes I get a GC cycle where the assists are enabled but
don't do enough work.)

Change-Id: I9768b79e3778abd3e06d306596c3bd77f65bf3f1
Reviewed-on: https://go-review.googlesource.com/13026
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-08-04 18:54:38 +00:00
Austin Clements
88e945fd23 runtime: recheck GC trigger before actually starting GC
Currently allocation checks the GC trigger speculatively during
allocation and then triggers the GC without rechecking. As a result,
it's possible for G 1 and G 2 to detect the trigger simultaneously,
both enter startGC, G 1 actually starts GC while G 2 gets preempted
until after the whole GC cycle, then G 2 immediately starts another GC
cycle even though the heap is now well under the trigger.

Fix this by re-checking the GC trigger non-speculatively just before
actually kicking off a new GC cycle.

This contributes to #11911 because when this happens, we definitely
don't finish the background sweep before starting the next GC cycle,
which can significantly delay the start of concurrent scan.

Change-Id: I560ab79ba5684ba435084410a9765d28f5745976
Reviewed-on: https://go-review.googlesource.com/13025
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-08-04 18:54:32 +00:00
Mikio Hara
5e15e28e0e runtime: skip TestCgoCallbackGC on dragonfly
Updates #11990.

Change-Id: I6c58923a1b5a3805acfb6e333e3c9e87f4edf4ba
Reviewed-on: https://go-review.googlesource.com/13050
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-08-03 04:41:48 +00:00
Russ Cox
c5dff7282e cmd/compile, runtime: fix placement of map bucket overflow pointer on nacl
On most systems, a pointer is the worst case alignment, so adding
a pointer field at the end of a struct guarantees there will be no
padding added after that field (to satisfy overall struct alignment
due to some more-aligned field also present).

In the runtime, the map implementation needs a quick way to
get to the overflow pointer, which is last in the bucket struct,
so it uses size - sizeof(pointer) as the offset.

NaCl/amd64p32 is the exception, as always.
The worst case alignment is 64 bits but pointers are 32 bits.
There's a long history that is not worth going into, but when
we moved the overflow pointer to the end of the struct,
we didn't get the padding computation right.
The compiler computed the regular struct size and then
on amd64p32 added another 32-bit field.
And the runtime assumed it could step back two 32-bit fields
(one 64-bit register size) to get to the overflow pointer.
But in fact if the struct needed 64-bit alignment, the computation
of the regular struct size would have added a 32-bit pad already,
and then the code unconditionally added a second 32-bit pad.
This placed the overflow pointer three words from the end, not two.
The last two were padding, and since the runtime was consistent
about using the second-to-last word as the overflow pointer,
no harm done in the sense of overwriting useful memory.
But writing the overflow pointer to a non-pointer word of memory
means that the GC can't see the overflow blocks, so it will
collect them prematurely. Then bad things happen.

Correct all this in a few steps:

1. Add an explicit check at the end of the bucket layout in the
compiler that the overflow field is last in the struct, never
followed by padding.

2. When padding is needed on nacl (not always, just when needed),
insert it before the overflow pointer, to preserve the "last in the struct"
property.

3. Let the compiler have the final word on the width of the struct,
by inserting an explicit padding field instead of overwriting the
results of the width computation it does.

4. For the same reason (tell the truth to the compiler), set the type
of the overflow field when we're trying to pretend its not a pointer
(in this case the runtime maintains a list of the overflow blocks
elsewhere).

5. Make the runtime use "last in the struct" as its location algorithm.

This fixes TestTraceStress on nacl/amd64p32.
The 'bad map state' and 'invalid free list' failures no longer occur.

Fixes #11838.

Change-Id: If918887f8f252d988db0a35159944d2b36512f92
Reviewed-on: https://go-review.googlesource.com/12971
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-07-31 18:49:32 +00:00
Russ Cox
108ec5f75a runtime: fix systemstack tracebacks on nacl/arm
For #11956.

Change-Id: Ic9b57cafa197953cc7f435941e44d42b60b3ddf0
Reviewed-on: https://go-review.googlesource.com/13011
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-07-31 04:35:38 +00:00
Russ Cox
abdc77a288 runtime: avoid reference to stale stack after GC shrinkstack
Dangling pointer error. Unlikely to trigger in practice, but still.
Found by running GODEBUG=efence=1 GOGC=1 trace.test.

Change-Id: Ice474dedcf62dd33ab77526287a023ba3b166db9
Reviewed-on: https://go-review.googlesource.com/12991
Reviewed-by: Austin Clements <austin@google.com>
2015-07-31 02:18:42 +00:00
Russ Cox
4bd8040d47 runtime, sync/atomic: add memory barriers in arm cas routines
This only triggers on ARMv7+.
If there are important SMP ARMv6 machines we can reconsider.

Makes TestLFStress tests pass and sync/atomic tests not time out
on Apple iPad Mini 3.

Fixes #7977.
Fixes #10189.

Change-Id: Ie424dea3765176a377d39746be9aa8265d11bec4
Reviewed-on: https://go-review.googlesource.com/12950
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-30 20:11:11 +00:00
Russ Cox
e0c180c44f runtime/cgo: fix darwin/amd64 signal handling setup
Was not allocating space for the frame above sigpanic,
nor was it pushing the LR into the right place.
Because traceback past sigpanic only needs the
LR for faulting leaves, this was not noticed too much.
But it did break the sync/atomic nil deref tests.

Change-Id: Icba53fffa193423aab744c37f21ee893ce2ee3ac
Reviewed-on: https://go-review.googlesource.com/12926
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-30 19:18:45 +00:00
Russ Cox
b2dfacf35e runtime: change arm software div/mod call sequence not to modify stack
Instead of pushing the denominator argument on the stack,
the denominator is now passed in m.

This fixes a variety of bugs related to trying to take stack traces
backwards from the middle of the software div/mod routines.
Some of those bugs have been kludged around in the past,
but others have not. Instead of trying to patch up after breaking
the stack, this CL stops breaking the stack.

This is an update of https://golang.org/cl/19810043,
which was rolled back in https://golang.org/cl/20350043.

The problem in the original CL was that there were divisions
at bad times, when m was not available. These were divisions
by constant denominators, either in C code or in assembly.
The Go compiler knows how to generate division by multiplication
for constant denominators, but the C compiler did not.
There is no longer any C code, so that's taken care of.
There was one problematic DIV in runtime.usleep (assembly)
but https://golang.org/cl/12898 took care of that one.
So now this approach is safe.

Reject DIV/MOD in NOSPLIT functions to keep them from
coming back.

Fixes #6681.
Fixes #6699.
Fixes #10486.

Change-Id: I09a13c76ad08ba75b3bd5d46a3eb78e66a84ab38
Reviewed-on: https://go-review.googlesource.com/12899
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-30 16:14:05 +00:00
Russ Cox
c9d2c7f0d2 runtime: replace divide with multiply in runtime.usleep on arm
We want to adjust the DIV calling convention to use m,
and usleep can be called without an m, so switch to a
multiplication by the reciprocal (and test).

Step toward a fix for #6699 and #10486.

Change-Id: Iccf76a18432d835e48ec64a2fa34a0e4d6d4b955
Reviewed-on: https://go-review.googlesource.com/12898
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-30 15:48:29 +00:00
David Crawshaw
b7205b92c0 runtime/trace: test requires 'go tool addr2line'
For the android/arm builder.

Change-Id: Iad4881689223cd6479870da9541524a8cc458cce
Reviewed-on: https://go-review.googlesource.com/12859
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
2015-07-30 05:57:37 +00:00
Russ Cox
c4092ac398 cmd/compile: fix uninitialized memory during type switch assertE2I2
Fixes arm64 builder crash.

The bug is possible on all architectures; you just have to get lucky
and hit a preemption or a stack growth on entry to assertE2I2.
The test stacks the deck.

Change-Id: I8419da909b06249b1ad15830cbb64e386b6aa5f6
Reviewed-on: https://go-review.googlesource.com/12890
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-30 05:21:56 +00:00
Russ Cox
bfac8623d5 runtime: enable TestEmptySlice
It says to disable until #7564 is fixed. It was fixed in April 2014.

Change-Id: I9bebfe96802bafdd2d1a0a47591df346d91b000c
Reviewed-on: https://go-review.googlesource.com/12858
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-30 04:47:16 +00:00
Russ Cox
d3ffc975f3 runtime: set invalidptr=1 by default, as documented
Also make invalidptr control the recently added GC pointer check,
as documented.

Change-Id: Iccfdf49480219d12be8b33b8f03d8312d8ceabed
Reviewed-on: https://go-review.googlesource.com/12857
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-29 23:50:20 +00:00
Russ Cox
bd5ca22232 runtime/trace: remove existing Skips
The skips added in CL 12579, based on incorrect time stamps,
should be sufficient to identify and exclude all the time-related
flakiness on these systems.

If there is other flakiness, we want to find out.

For #10512.

Change-Id: I5b588ac1585b2e9d1d18143520d2d51686b563e3
Reviewed-on: https://go-review.googlesource.com/12746
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 22:32:23 +00:00
Russ Cox
80c98fa901 runtime/trace: record event sequence numbers explicitly
Nearly all the flaky failures we've seen in trace tests have been
due to the use of time stamps to determine relative event ordering.
This is tricky for many reasons, including:
 - different cores might not have exactly synchronized clocks
 - VMs are worse than real hardware
 - non-x86 chips have different timer resolution than x86 chips
 - on fast systems two events can end up with the same time stamp

Stop trying to make time reliable. It's clearly not going to be for Go 1.5.
Instead, record an explicit event sequence number for ordering.
Using our own counter solves all of the above problems.

The trace still contains time stamps, of course. The sequence number
is just used for ordering.

Should alleviate #10554 somewhat. Then tickDiv can be chosen to
be a useful time unit instead of having to be exact for ordering.

Separating ordering and time stamps lets the trace parser diagnose
systems where the time stamp order and actual order do not match
for one reason or another. This CL adds that check to the end of
trace.Parse, after all other sequence order-based checking.
If that error is found, we skip the test instead of failing it.
Putting the check in trace.Parse means that cmd/trace will pick
up the same check, refusing to display a trace where the time stamps
do not match actual ordering.

Using net/http's BenchmarkClientServerParallel4 on various CPU counts,
not tracing vs tracing:

name                      old time/op    new time/op    delta
ClientServerParallel4       50.4µs ± 4%    80.2µs ± 4%  +59.06%        (p=0.000 n=10+10)
ClientServerParallel4-2     33.1µs ± 7%    57.8µs ± 5%  +74.53%        (p=0.000 n=10+10)
ClientServerParallel4-4     18.5µs ± 4%    32.6µs ± 3%  +75.77%        (p=0.000 n=10+10)
ClientServerParallel4-6     12.9µs ± 5%    24.4µs ± 2%  +89.33%        (p=0.000 n=10+10)
ClientServerParallel4-8     11.4µs ± 6%    21.0µs ± 3%  +83.40%        (p=0.000 n=10+10)
ClientServerParallel4-12    14.4µs ± 4%    23.8µs ± 4%  +65.67%        (p=0.000 n=10+10)

Fixes #10512.

Change-Id: I173eecf8191e86feefd728a5aad25bf1bc094b12
Reviewed-on: https://go-review.googlesource.com/12579
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 22:32:14 +00:00
Russ Cox
fde392623a runtime: ignore arguments in cgocallback_gofunc frame
Otherwise the GC may see uninitialized memory there,
which might be old pointers that are retained, or it might
trigger the invalid pointer check.

Fixes #11907.

Change-Id: I67e306384a68468eef45da1a8eb5c9df216a77c0
Reviewed-on: https://go-review.googlesource.com/12852
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 22:30:46 +00:00
Russ Cox
f6dfe16798 runtime: fix darwin/amd64 assembly frame sizes
Change-Id: I2f0ecdc02ce275feadf07e402b54f988513e9b49
Reviewed-on: https://go-review.googlesource.com/12855
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-29 22:26:02 +00:00
Russ Cox
4addec3aaa runtime: reenable bad pointer check in GC
The last time we tried this, linux/arm64 broke.
The series of CLs leading to this one fixes that problem.
Let's try again.

Fixes #9880.

Change-Id: I67bc1d959175ec972d4dcbe4aa6f153790f74251
Reviewed-on: https://go-review.googlesource.com/12849
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 21:37:55 +00:00
Russ Cox
421220571d runtime, reflect: use correctly aligned stack frame sizes on arm64
arm64 requires either no stack frame or a frame with a size that is 8 mod 16
(adding the saved LR will make it 16-aligned).

The cmd/internal/obj/arm64 has been silently aligning frames, but it led to
a terrible bug when the compiler and obj disagreed on the frame size,
and it's just generally confusing, so we're going to make misaligned frames
an error instead of something that is silently changed.

This CL prepares by updating assembly files.
Note that the changes in this CL are already being done silently by
cmd/internal/obj/arm64, so there is no semantic effect here,
just a clarity effect.

For #9880.

Change-Id: Ibd6928dc5fdcd896c2bacd0291bf26b364591e28
Reviewed-on: https://go-review.googlesource.com/12845
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 21:35:35 +00:00
Austin Clements
23e4744c07 runtime: report GC CPU utilization in MemStats
This adds a GCCPUFraction field to MemStats that reports the
cumulative fraction of the program's execution time spent in the
garbage collector. This is equivalent to the utilization percent shown
in the gctrace output and makes this available programmatically.

This does make one small effect on the gctrace output: we now report
the duration of mark termination up to just before the final
start-the-world, rather than up to just after. However, unlike
stop-the-world, I don't believe there's any way that start-the-world
can block, so it should take negligible time.

While there are many statistics one might want to expose via MemStats,
this is one of the few that will undoubtedly remain meaningful
regardless of future changes to the memory system.

The diff for this change is larger than the actual change. Mostly it
lifts the code for computing the GC CPU utilization out of the
debug.gctrace path.

Updates #10323.

Change-Id: I0f7dc3fdcafe95e8d1233ceb79de606b48acd989
Reviewed-on: https://go-review.googlesource.com/12844
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-29 20:23:34 +00:00
Austin Clements
4b71660c5b runtime: always capture GC phase transition times
Currently we only capture GC phase transition times if
debug.gctrace>0, but we're about to compute GC CPU utilization
regardless of whether debug.gctrace is set, so we need these
regardless of debug.gctrace.

Change-Id: If3acf16505a43d416e9a99753206f03287180660
Reviewed-on: https://go-review.googlesource.com/12843
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-07-29 20:23:25 +00:00
Austin Clements
87f97c73d3 runtime: avoid race between SIGPROF traceback and stack barriers
The following sequence of events can lead to the runtime attempting an
out-of-bounds access on a stack barrier slice:

1. A SIGPROF comes in on a thread while the G on that thread is in
   _Gsyscall. The sigprof handler calls gentraceback, which saves a
   local copy of the G's stkbar slice. Currently the G has no stack
   barriers, so this slice is empty.

2. On another thread, the GC concurrently scans the stack of the
   goroutine being profiled (it considers it stopped because it's in
   _Gsyscall) and installs stack barriers.

3. Back on the sigprof thread, gentraceback comes across a stack
   barrier in the stack and attempts to look it up in its (zero
   length) copy of G's old stkbar slice, which causes an out-of-bounds
   access.

This commit fixes this by adding a simple cas spin to synchronize the
SIGPROF handler with stack barrier insertion.

In general I would prefer that this synchronization be done through
the G status, since that's how stack scans are otherwise synchronized,
but adding a new lock is a much smaller change and G statuses are full
of subtlety.

Fixes #11863.

Change-Id: Ie89614a6238bb9c6a5b1190499b0b48ec759eaf7
Reviewed-on: https://go-review.googlesource.com/12748
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-29 19:31:46 +00:00
Rick Hudson
e95bc5fef7 runtime: force mutator to give work buffer to GC
The scheduler, work buffer's dispose, and write barriers
can conspire to hide the a pointer from the GC's concurent
mark phase. If this pointer is the only path to a large
amount of marking the STW mark termination phase may take
a lot of time.

Consider the following:
1) dispose places a work buffer on the partial queue
2) the GC is busy so it does not immediately remove and
   process the work buffer
3) the scheduler runs a mutator whose write barrier dequeues the
   work buffer from the partial queue so the GC won't see it
This repeats until the GC reaches the mark termination
phase where the GC finally discovers the pointer along
with a lot of work to do.

This CL fixes the problem by having the mutator
dispose of the buffer to the full queue instead of
the partial queue. Since the write buffer never asks for full
buffers the conspiracy described above is not possible.

Updates #11694.

Change-Id: I2ce832f9657a7570f800e8ce4459cd9e304ef43b
Reviewed-on: https://go-review.googlesource.com/12840
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 18:56:11 +00:00
Dmitry Vyukov
0c22a74e85 runtime: fix out-of-bounds in stack debugging
Currently stackDebug=4 crashes as:

panic: runtime error: index out of range
fatal error: panic on system stack
runtime stack:
runtime.throw(0x607470, 0x15)
	src/runtime/panic.go:527 +0x96
runtime.gopanic(0x5ada00, 0xc82000a1d0)
	src/runtime/panic.go:354 +0xb9
runtime.panicindex()
	src/runtime/panic.go:12 +0x49
runtime.adjustpointers(0xc820065ac8, 0x7ffe58b56100, 0x7ffe58b56318, 0x0)
	src/runtime/stack1.go:428 +0x5fb
runtime.adjustframe(0x7ffe58b56200, 0x7ffe58b56318, 0x1)
	src/runtime/stack1.go:542 +0x780
runtime.gentraceback(0x487760, 0xc820065ac0, 0x0, 0xc820001080, 0x0, 0x0, 0x7fffffff, 0x6341b8, 0x7ffe58b56318, 0x0, ...)
	src/runtime/traceback.go:336 +0xa7e
runtime.copystack(0xc820001080, 0x1000)
	src/runtime/stack1.go:616 +0x3b1
runtime.newstack()
	src/runtime/stack1.go:801 +0xdde

Change-Id: If2d60960231480a9dbe545d87385fe650d6db808
Reviewed-on: https://go-review.googlesource.com/12763
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-28 20:11:19 +00:00
Russ Cox
7a63ab1a65 runtime: use 64k page rounding on arm64
Fixes #11886.

Change-Id: I9392fd2ef5951173ae275b3ab42db4f8bd2e1d7a
Reviewed-on: https://go-review.googlesource.com/12747
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-28 19:59:00 +00:00
David du Colombier
68117a91ae runtime: fix x86 stack trace for call to heap memory on Plan 9
Russ Cox fixed this issue for other systems
in CL 12026, but the Plan 9 part was forgotten.

Fixes #11656.

Change-Id: I91c033687987ba43d13ad8f42e3fe4c7a78e6075
Reviewed-on: https://go-review.googlesource.com/12762
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-28 19:01:41 +00:00
Ian Lance Taylor
0229317d76 runtime: don't define libc_getpid in os3_solaris.go
The function is already defined between syscall_solaris.go and
syscall2_solaris.go.

Change-Id: I034baf7c8531566bebfdbc5a4061352cbcc31449
Reviewed-on: https://go-review.googlesource.com/12773
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-28 14:07:17 +00:00
Ian Lance Taylor
deaf0333df runtime: fix definitions of getpid and kill on Solaris
A further attempt to fix raiseproc on Solaris.

Change-Id: I8d8000d6ccd0cd9f029ebe1f211b76ecee230cd0
Reviewed-on: https://go-review.googlesource.com/12771
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-28 06:21:08 +00:00
Ian Lance Taylor
d7223c6cc1 runtime: correct implementation of raiseproc on Solaris
I forgot that the libc raise function only sends the signal to the
current thread.  We need to actually use kill and getpid here, as we
do on other systems.

Change-Id: Iac34af822c93468bf68cab8879db3ee20891caaf
Reviewed-on: https://go-review.googlesource.com/12704
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-28 05:41:27 +00:00
David Crawshaw
249894ab6c runtime/cgo: remove TMPDIR logic for iOS
Seems like the simplest solution for 1.5. All the parts of the test
suite I can run on my current device (for which my exception handler
fix no longer works, apparently) pass without this code. I'll move it
into x/mobile/app.

Fixes #11884

Change-Id: I2da40c8c7b48a4c6970c4d709dd7c148a22e8727
Reviewed-on: https://go-review.googlesource.com/12721
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-27 21:28:31 +00:00
Austin Clements
c1f7a56fc0 runtime: close window that hides GC work from concurrent mark
Currently we enter mark 2 by first flushing all existing gcWork caches
and then setting gcBlackenPromptly, which disables further gcWork
caching. However, if a worker or assist pulls a work buffer in to its
gcWork cache after that cache has been flushed but before caching is
disabled, that work may remain in that cache until mark termination.
If that work represents a heap bottleneck (e.g., a single pointer that
is the only way to reach a large amount of the heap), this can force
mark termination to do a large amount of work, resulting in a long
STW.

Fix this by reversing the order of these steps: first disable caching,
then flush all existing caches.

Rick Hudson <rlh> did the hard work of tracking this down. This CL
combined with CL 12672 and CL 12646 distills the critical parts of his
fix from CL 12539.

Fixes #11694.

Change-Id: Ib10d0a21e3f6170a80727d0286f9990df049fed2
Reviewed-on: https://go-review.googlesource.com/12688
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-07-27 20:00:25 +00:00
Austin Clements
510fd1350d runtime: enable GC assists ASAP
Currently the GC coordinator enables GC assists at the same time it
enables background mark workers, after the concurrent scan phase is
done. However, this means a rapidly allocating mutator has the entire
scan phase during which to allocate beyond the heap trigger and
potentially beyond the heap goal with no back-pressure from assists.
This prevents the feedback system that's supposed to keep the heap
size under the heap goal from doing its job.

Fix this by enabling mutator assists during the scan phase. This is
safe because the write barrier is already enabled and globally
acknowledged at this point.

There's still a very small window between when the heap size reaches
the heap trigger and when the GC coordinator is able to stop the world
during which the mutator can allocate unabated. This allows *very*
rapidly allocator mutators like TestTraceStress to still occasionally
exceed the heap goal by a small amount (~20 MB at most for
TestTraceStress). However, this seems like a corner case.

Fixes #11677.

Change-Id: I0f80d949ec82341cd31ca1604a626efb7295a819
Reviewed-on: https://go-review.googlesource.com/12674
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:05 +00:00
Austin Clements
f5e67e53e7 runtime: allow GC drain whenever write barrier is enabled
Currently we hand-code a set of phases when draining is allowed.
However, this set of phases is conservative. The critical invariant is
simply that the write barrier must be enabled if we're draining.

Shortly we're going to enable mutator assists during the scan phase,
which means we may drain during the scan phase. In preparation, this
commit generalizes these assertions to check the fundamental condition
that the write barrier is enabled, rather than checking that we're in
any particular phase.

Change-Id: I0e1bec1ca823d4a697a0831ec4c50f5dd3f2a893
Reviewed-on: https://go-review.googlesource.com/12673
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:04 +00:00
Austin Clements
64a32ffeee runtime: don't start workers between mark 1 & 2
Currently we clear both the mark 1 and mark 2 signals at the beginning
of concurrent mark. If either if these is clear, it acts as a signal
to the scheduler that it should start background workers. However,
this means that in the interim *between* mark 1 and mark 2, the
scheduler basically loops starting up new workers only to have them
return with nothing to do. In addition to harming performance and
delaying mutator work, this approach has a race where workers started
for mark 1 can mistakenly signal mark 2, causing it to complete
prematurely. This approach also interferes with starting assists
earlier to fix #11677.

Fix this by initially setting both mark 1 and mark 2 to "signaled".
The scheduler will not start background mark workers, though assists
can still run. When we're ready to enter mark 1, we clear the mark 1
signal and wait for it. Then, when we're ready to enter mark 2, we
clear the mark 2 signal and wait for it.

This structure also lets us deal cleanly with the situation where all
work is drained *prior* to the mark 2 wait, meaning that there may be
no workers to signal completion. Currently we deal with this using a
racy (and possibly incorrect) check for work in the coordinator itself
to skip the mark 2 wait if there's no work. This change makes the
coordinator unconditionally wait for mark completion and makes the
scheduler itself signal completion by slightly extending the logic it
already has to determine that there's no work and hence no use in
starting a new worker.

This is a prerequisite to fixing the remaining component of #11677,
which will require enabling assists during the scan phase. However, we
don't want to enable background workers until the mark phase because
they will compete with the scan. This change lets us use bgMark1 and
bgMark2 to indicate when it's okay to start background workers
independent of assists.

This is also a prerequisite to fixing #11694. It significantly reduces
the occurrence of long mark termination pauses in #11694 (from 64 out
of 1000 to 2 out of 1000 in one experiment).

Coincidentally, this also reduces the final heap size (and hence run
time) of TestTraceStress from ~100 MB and ~1.9 seconds to ~14 MB and
~0.4 seconds because it significantly shortens concurrent mark
duration.

Rick Hudson <rlh> did the hard work of tracking this down.

Change-Id: I12ea9ee2db9a0ae9d3a90dde4944a75fcf408f4c
Reviewed-on: https://go-review.googlesource.com/12672
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:02 +00:00
Austin Clements
8f34b25318 runtime: retry GC assist until debt is paid off
Currently, there are three ways to satisfy a GC assist: 1) the mutator
steals credit from background GC, 2) the mutator actually does GC
work, and 3) there is no more work available. 3 was never really
intended as a way to satisfy an assist, and it causes problems: there
are periods when it's expected that the GC won't have any work, such
as when transitioning from mark 1 to mark 2 and from mark 2 to mark
termination. During these periods, there's no back-pressure on rapidly
allocating mutators, which lets them race ahead of the heap goal.

For example, test/init1.go and the runtime/trace test both have small
reachable heaps and contain loops that rapidly allocate large garbage
byte slices. This bug lets these tests exceed the heap goal by several
orders of magnitude.

Fix this by forcing the assist (and hence the allocation) to block
until it can satisfy its debt via either 1 or 2, or the GC cycle
terminates.

This fixes one the causes of #11677. It's still possible to overshoot
the GC heap goal, but with this change the overshoot is almost exactly
by the amount of allocation that happens during the concurrent scan
phase, between when the heap passes the GC trigger and when the GC
enables assists.

Change-Id: I5ef4edcb0d2e13a1e432e66e8245f2bd9f8995be
Reviewed-on: https://go-review.googlesource.com/12671
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:01 +00:00
Austin Clements
500c88d40d runtime: yield to GC coordinator after assist completion
Currently it's possible for the GC assist to signal completion of the
mark phase, which puts the GC coordinator goroutine on the current P's
run queue, and then return to mutator code that delays until the next
forced preemption before actually yielding control to the GC
coordinator, dragging out completion of the mark phase. This delay can
be further exacerbated if the mutator makes other goroutines runnable
before yielding control, since this will push the GC coordinator on
the back of the P's run queue.

To fix this, this adds a Gosched to the assist if it completed the
mark phase. This immediately and directly yields control to the GC
coordinator. This already happens implicitly in the background mark
workers because they park immediately after completing the mark.

This is one of the reasons completion of the mark phase is being
dragged out and allowing the mutator to allocate without assisting,
leading to the large heap goal overshoot in issue #11677. This is also
a prerequisite to making the assist block when it can't pay off its
debt.

Change-Id: I586adfbecb3ca042a37966752c1dc757f5c7fc78
Reviewed-on: https://go-review.googlesource.com/12670
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:59:00 +00:00
Austin Clements
4f188c2d1c runtime: disallow GC assists in non-preemptible contexts
Currently it's possible to perform GC work on a system stack or when
locks are held if there's an allocation that triggers an assist. This
is generally a bad idea because of the fragility of these contexts,
and it's incompatible with two changes we're about to make: one is to
yield after signaling mark completion (which we can't do from a
non-preemptible context) and the other is to make assists block if
there's no other way for them to pay off the assist debt.

This commit simply skips the assist if it's called from a
non-preemptible context. The allocation will still count toward the
assist debt, so it will be paid off by a later assist. There should be
little allocation from non-preemptible contexts, so this shouldn't
harm the overall assist mechanism.

Change-Id: I7bf0e6c73e659fe6b52f27437abf39d76b245c79
Reviewed-on: https://go-review.googlesource.com/12649
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:59 +00:00
Austin Clements
dff9108d98 runtime: make notetsleep_internal nowritebarrier
When notetsleep_internal is called from notetsleepg, notetsleepg has
just given up the P, so write barriers are not allowed in
notetsleep_internal.

Change-Id: I1b214fa388b1ea05b8ce2dcfe1c0074c0a3c8870
Reviewed-on: https://go-review.googlesource.com/12647
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:58 +00:00
Austin Clements
cf225a1748 runtime: fix mark 2 completion in fractional/idle workers
Currently fractional and idle mark workers dispose of their gcWork
cache during mark 2 after incrementing work.nwait and after checking
whether there are any workers or any work available. This creates a
window for two races:

1) If the only remaining work is in this worker's gcWork cache, it
   will see that there are no more workers and no more work on the
   global lists (since it has not yet flushed its own cache) and
   prematurely signal mark 2 completion.

2) After this worker has incremented work.nwait but before it has
   flushed its cache, another worker may observe that there are no
   more workers and no more work and prematurely signal mark 2
   completion.

We can fix both of these by simply moving the cache flush above the
increment of nwait and the test of the completion condition.

This is probably contributing to #11694, though this alone is not
enough to fix it.

Change-Id: Idcf9656e5c460c5ea0d23c19c6c51e951f7716c3
Reviewed-on: https://go-review.googlesource.com/12646
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:56 +00:00
Austin Clements
b8526a8380 runtime: steal the correct amount of GC assist credit
GC assists are supposed to steal at most the amount of background GC
credit available so that background GC credit doesn't go negative.
However, they are instead stealing the *total* amount of their debt
but only claiming up to the amount of credit that was available. This
results in draining the background GC credit pool too quickly, which
results in unnecessary assist work.

The fix is trivial: steal the amount of work we meant to steal (which
is already computed).

Change-Id: I837fe60ed515ba91c6baf363248069734a7895ef
Reviewed-on: https://go-review.googlesource.com/12643
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 19:58:54 +00:00
Austin Clements
4c9464525e runtime: document gctrace format
Fixes #10348.

Change-Id: I3eea9738e3f6fdc1998d04a601dc9b556dd2db72
Reviewed-on: https://go-review.googlesource.com/12453
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 17:45:34 +00:00
Austin Clements
7eeeae2a5c runtime: always report starting heap size in gctrace
Currently the gctrace output reports the trigger heap size, rather
than the actual heap size at the beginning of GC. Often these are the
same, or at least very close. However, it's possible for the heap to
already have exceeded this trigger when we first check the trigger and
start GC; in this case, this output is very misleading. We've
encountered this confusion a few times when debugging and this
behavior is difficult to document succinctly.

Change the gctrace output to report the actual heap size when GC
starts, rather than the trigger.

Change-Id: I246b3ccae4c4c7ea44c012e70d24a46878d7601f
Reviewed-on: https://go-review.googlesource.com/12452
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 17:45:28 +00:00
Austin Clements
cc6ed285e5 runtime: remove # from gctrace line
Whenever someone pastes gctrace output into GitHub, it helpfully turns
the GC cycle number into a link to some unrelated issue. Prevent this
by removing the pound before the cycle number. The fact that this is a
cycle number is probably more obvious at a glance than most of the
other numbers.

Change-Id: Ifa5fc7fe6c715eac50e639f25bc36c81a132ffea
Reviewed-on: https://go-review.googlesource.com/12413
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 17:45:22 +00:00
Ian Lance Taylor
f0876a1a94 runtime: log all thread stack traces during GODEBUG=crash on Unix
This extends https://golang.org/cl/2811, which only applied to Darwin
and GNU/Linux, to all Unix systems.

Fixes #9591.

Change-Id: Iec3fb438564ba2924b15b447c0480f87c0bfd009
Reviewed-on: https://go-review.googlesource.com/12661
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-27 16:58:53 +00:00
Russ Cox
6b8762104a runtime/pprof: document content of heap profile
Fixes #11343.

Change-Id: I46efc24b687b9d060ad864fbb238c74544348e38
Reviewed-on: https://go-review.googlesource.com/12556
Reviewed-by: Rob Pike <r@golang.org>
2015-07-27 16:30:27 +00:00
Russ Cox
f6fb549d22 runtime/cgo: move TMPDIR magic out of os
It's not clear this really belongs anywhere at all,
but this is a better place for it than package os.
This way package os can avoid importing "C".

Fixes #10455.

Change-Id: Ibe321a93bf26f478951c3a067d75e22f3d967eb7
Reviewed-on: https://go-review.googlesource.com/12574
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-07-27 16:05:42 +00:00
Michael Hudson-Doyle
2b0ddb6c23 runtime: pass a smaller buffer to sched_getaffinity on ARM
The system stack is only around 8kb on ARM so one can't put an 8kb buffer on
the stack. More than 1024 ARM cores seems sufficiently unlikely for the
foreseeable future.

Fixes #11853

Change-Id: I7cb27c1250a6153f86e269c172054e9dfc218c72
Reviewed-on: https://go-review.googlesource.com/12622
Reviewed-by: Austin Clements <austin@google.com>
2015-07-27 01:04:10 +00:00
Ian Lance Taylor
eb248c4df2 runtime: require gdb version 7.9 for gdb test
Issue 11214 reports problems with older versions of gdb.  It does work
with gdb 7.9 on my Ubuntu Trusty system, so take that as the minimum
required version.

Fixes #11214.

Change-Id: I61b732895506575be7af595f81fc1bcf696f58c2
Reviewed-on: https://go-review.googlesource.com/12626
Reviewed-by: Austin Clements <austin@google.com>
2015-07-24 17:15:44 +00:00
Ian Lance Taylor
d9ee9a0f6e runtime: fix runtime·raise for dragonfly amd64
Fixes #11847.

Change-Id: I21736a4c6f6fb2f61aec1396ce2c965e3e329e92
Reviewed-on: https://go-review.googlesource.com/12621
Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>
2015-07-24 05:16:19 +00:00
Russ Cox
74ec5bf2d8 runtime: make pcln table check not trigger next to foreign code
Foreign code can be arbitrarily aligned,
so the function before it can have
arbitrarily much padding.
We can't call pcvalue on values in the padding.

Fixes #11653.

Change-Id: I7d57f813ae5a2409d1520fcc909af3eeef2da131
Reviewed-on: https://go-review.googlesource.com/12550
Reviewed-by: Rob Pike <r@golang.org>
2015-07-23 14:14:22 +00:00
Russ Cox
7334cb3a6f runtime/trace: fix TestTraceSymbolize networking
We use 127.0.0.1 instead of localhost in Go networking tests.
The reporter of #11774 has localhost defined to be 120.192.83.162,
for reasons unknown.

Also, if TestTraceSymbolize calls Fatalf (for example because Listen
fails) then we need to stop the trace for future tests to work.
See failure log in #11774.

Fixes #11774.

Change-Id: Iceddb03a72d31e967acd2d559ecb78051f9c14b7
Reviewed-on: https://go-review.googlesource.com/12521
Reviewed-by: Rob Pike <r@golang.org>
2015-07-23 05:37:15 +00:00
Russ Cox
77d38d9cbe runtime: handle linux CPU masks up to 64k CPUs
Fixes #11823.

Change-Id: Ic949ccb9657478f8ca34fdf1a6fe88f57db69f24
Reviewed-on: https://go-review.googlesource.com/12535
Reviewed-by: Austin Clements <austin@google.com>
2015-07-22 20:53:01 +00:00
Russ Cox
75d779566b runtime/cgo: make compatible with race detector
Some routines run without and m or g and cannot invoke the
race detector runtime. They must be opaque to the runtime.
That used to be true because they were written in C.
Now that they are written in Go, disable the race detector
annotations for those functions explicitly.

Add test.

Fixes #10874.

Change-Id: Ia8cc28d51e7051528f9f9594b75634e6bb66a785
Reviewed-on: https://go-review.googlesource.com/12534
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-22 20:28:47 +00:00
Russ Cox
3b26e8b29a runtime/pprof: ignore too few samples on Windows test
Fixes #10842.

Change-Id: I7de98f3073a47911863a252b7a74d8fdaa48c86f
Reviewed-on: https://go-review.googlesource.com/12529
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-22 20:26:37 +00:00
Ian Lance Taylor
872b168fe3 runtime: if we don't handle a signal on a non-Go thread, raise it
In the past badsignal would crash the program.  In
https://golang.org/cl/10757044 badsignal was changed to call sigsend,
to fix issue #3250.  The effect of this was that when a non-Go thread
received a signal, and os/signal.Notify was not being used to check
for occurrences of the signal, the signal was ignored.

This changes the code so that if os/signal.Notify is not being used,
then the signal handler is reset to what it was, and the signal is
raised again.  This lets non-Go threads handle the signal as they
wish.  In particular, it means that a segmentation violation in a
non-Go thread will ordinarily crash the process, as it should.

Fixes #10139.
Update #11794.

Change-Id: I2109444aaada9d963ad03b1d071ec667760515e5
Reviewed-on: https://go-review.googlesource.com/12503
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2015-07-22 20:26:29 +00:00
Russ Cox
4a4eba9f37 runtime: disable TestGoroutineParallelism on uniprocessor
It's a bad test and it's worst on uniprocessors.

Fixes #11143.

Change-Id: I0164231ada294788d7eec251a2fc33e02a26c13b
Reviewed-on: https://go-review.googlesource.com/12522
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-22 18:53:12 +00:00
Austin Clements
58f3a82950 runtime: fix comments referring to trace functions in runtime/pprof
ae1ea2a moved trace-related functions from runtime/pprof to
runtime/trace, but missed a doc comment and a code comment. Update
these to reflect the move.

Change-Id: I6e1e8861e5ede465c08a2e3f80b976145a8b32d8
Reviewed-on: https://go-review.googlesource.com/12525
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-07-22 18:33:38 +00:00
Dmitry Vyukov
ae1ea2aa94 runtime/trace: add new package
Move tracing functions from runtime/pprof to the new runtime/trace package.

Fixes #9710

Change-Id: I718bcb2ae3e5959d9f72cab5e6708289e5c8ebd5
Reviewed-on: https://go-review.googlesource.com/12511
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-22 15:47:16 +00:00
Michael Hudson-Doyle
1125cd4997 cmd/compile: define func value symbols at declaration
This is mostly Russ's https://golang.org/cl/12145 but with some extra fixes to
account for the fact that function declarations without implementations now
break shared libraries, and including my test case.

Fixes #11480.

Change-Id: Iabdc2934a0378e5025e4e7affadb535eaef2c8f1
Reviewed-on: https://go-review.googlesource.com/12340
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-20 00:50:46 +00:00
Austin Clements
1942e3814b runtime: clarify runtime.GC blocking behavior
The runtime.GC documentation was rewritten in df2809f to make it clear
that it blocks until GC is complete, but the re-rewrite in ed9a4c9 and
e28a679 lost this property when clarifying that it may also block the
entire program and not just the caller.

Try to arrive at wording that conveys both of these properties.

Change-Id: I1e255322aa28a21a548556ecf2a44d8d8ac524ef
Reviewed-on: https://go-review.googlesource.com/12392
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-19 15:10:06 +00:00
Ian Lance Taylor
692054e76e runtime: check for findmoduledatap returning nil
The findmoduledatap function will not return nil in ordinary use, but
check for nil to try to avoid crashing when we are already crashing.

Update #11783.

Change-Id: If7b1adb51efab13b4c1a37b6f3c9ad22641a0b56
Reviewed-on: https://go-review.googlesource.com/12391
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-18 21:26:59 +00:00
Alex Brainman
4a0d9587f2 runtime: skip TestReturnAfterStackGrowInCallback if gcc is not found
Fixes #11754

Change-Id: Ifa423ca6eea46d1500278db290498724a9559d14
Reviewed-on: https://go-review.googlesource.com/12347
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-18 01:29:09 +00:00
Rob Pike
e28a679216 runtime: make the GC message less committal.
We shouldn't guarantee this behavior, but suggest it's possible.

Change-Id: I4c2afb48b99be4d91537306d3337171a13c9990a
Reviewed-on: https://go-review.googlesource.com/12346
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-07-18 00:28:50 +00:00
Rob Pike
ed9a4c91c2 runtime: document that GC blocks the whole program
No code changes. Just make it clear that runtime.GC is not concurrent.

Change-Id: I00a99ebd26402817c665c9a128978cef19f037be
Reviewed-on: https://go-review.googlesource.com/12345
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-17 22:40:21 +00:00
Austin Clements
e33d6b3d4d runtime: remove out-of-date comment
An out-of-date comment snuck in to cc8f544. Remove it.

Change-Id: I5bc7c17e737d1cabe57b88de06d7579c60ca28ff
Reviewed-on: https://go-review.googlesource.com/12328
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2015-07-17 16:52:32 +00:00
Austin Clements
cc8f544198 runtime: don't free large spans until heapBitsSweepSpan returns
This fixes a race between 1) sweeping and freeing an unmarked large
span and 2) reusing that span and allocating from it. This race arises
because mSpan_Sweep returns spans for large objects to the heap
*before* heapBitsSweepSpan clears the mark bit on the object in the
span.

Specifically, the following sequence of events can lead to an
incorrectly zeroed bitmap byte, which causes the garbage collector to
not trace any pointers in that object (the pointer bits for the first
four words are cleared, and the scan bits are also cleared, so it
looks like a no-scan object).

1) P0 calls mSpan_Sweep on a large span S0 with an unmarked object on it.

2) mSpan_Sweep calls heapBitsSweepSpan, which invokes the callback for
   the one (unmarked) object on the span.

3) The callback calls mHeap_Free, which makes span S0 available for
   allocation, but this is too early.

4) P1 grabs this S0 from the heap to use for allocation.

5) P1 allocates an object on this span and writes that object's type
   bits to the bitmap.

6) P0 returns from the callback to heapBitsSweepSpan.
   heapBitsSweepSpan clears the byte containing the mark, even though
   this span is now owned by P1 and this byte contains important
   bitmap information.

This fixes this problem by simply delaying the mHeap_Free until after
the heapBitsSweepSpan. I think the overall logic of mSpan_Sweep could
be simplified now, but this seems like the minimal change.

Fixes #11617.

Change-Id: I6b1382c7e7cc35f81984467c0772fe9848b7522a
Reviewed-on: https://go-review.googlesource.com/12320
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-17 03:34:11 +00:00
Russ Cox
a93e5b4ff9 Revert "runtime: diagnose invalid pointers during GC"
Broke arm64. Update #9880.

This reverts commit 38d9b2a3a9.

Change-Id: I35fa21005af2183828a9d8b195ebcfbe45ec5138
Reviewed-on: https://go-review.googlesource.com/12247
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-16 01:49:58 +00:00
Austin Clements
e42413cecc runtime: fix saved PC/SP after safe-point function in syscall
Running a safe-point function on syscall entry uses systemstack() and
hence clobbers g.sched.pc and g.sched.sp. Fix this by re-saving them
after the systemstack, just like in the other uses of systemstack in
reentersyscall.

Change-Id: I47868a53eba24d81919fda56ef6bbcf72f1f922e
Reviewed-on: https://go-review.googlesource.com/12125
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-15 21:09:16 +00:00
Austin Clements
edfc979725 runtime: run safe-point function before entering _Psyscall
Currently, we run a P's safe-point function immediately after entering
_Psyscall state. This is unsafe, since as soon as we put the P in
_Psyscall, we no longer control the P and another M may claim it.
We'll still run the safe-point function only once (because doing so
races on an atomic), but the P may no longer be at a safe-point when
we do so.

In particular, this means that the use of forEachP to dispose all P's
gcw caches is unsafe. A P may enter a syscall, run the safe-point
function, and dispose the P's gcw cache concurrently with another M
claiming the P and attempting to use its gcw cache. If this happens,
we may empty the gcw's workbuf after putting it on
work.{full,partial}, or add pointers to it after putting it in
work.empty. This will cause an assertion failure when we later pop the
workbuf from the list and its object count is inconsistent with the
list we got it from.

Fix this by running the safe-point function just before putting the P
in _Psyscall.

Related to #11640. This probably fixes this issue, but while I'm able
to show that we can enter a bad safe-point state as a result of this,
I can't reproduce that specific failure.

Change-Id: I6989c8ca7ef2a4a941ae1931e9a0748cbbb59434
Reviewed-on: https://go-review.googlesource.com/12124
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-15 21:09:07 +00:00
Matthew Dempsky
64e53337af runtime: fix go:nowritebarrier annotation on gcmarkwb_m
Change-Id: I945d46d3bb63f1992bce0d0b1e89e75cac9bbd54
Reviewed-on: https://go-review.googlesource.com/12271
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-07-15 21:06:13 +00:00
Russ Cox
38d9b2a3a9 runtime: diagnose invalid pointers during GC
For #9880. Let's see what breaks.

Change-Id: Ic8b99a604e60177a448af5f7173595feed607875
Reviewed-on: https://go-review.googlesource.com/10818
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2015-07-15 05:42:06 +00:00
Russ Cox
3290e9c145 runtime: fix build on non-x86 machines
Fixes #11656 (again).

Change-Id: I170ff10bfbdb0f34e57c11de42b6ee5291837813
Reviewed-on: https://go-review.googlesource.com/12142
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-14 04:42:12 +00:00
Austin Clements
777ab5ce1a runtime: fix MemStats.{PauseNS,PauseEnd,PauseTotalNS,LastGC}
These memstats are currently being computed by gcMark, which was
appropriate in Go 1.4, but gcMark is now just one part of a bigger
picture. In particular, it can't account for the sweep termination
pause time, it can't account for all of the mark termination pause
time, and the reported "pause end" and "last GC" times will be
slightly earlier than they really are.

Lift computing of these statistics into func gc, which has the
appropriate visibility into the process to compute them correctly.

Fixes one of the issues in #10323. This does not add new statistics
appropriate to the concurrent collector; it simply fixes existing
statistics that are being misreported.

Change-Id: I670cb16594a8641f6b27acf4472db15b6e8e086e
Reviewed-on: https://go-review.googlesource.com/11794
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-13 23:32:59 +00:00
Austin Clements
ad60cd8b92 runtime: report MemStats.PauseEnd in UNIX time
Currently we report MemStats.PauseEnd in nanoseconds, but with no
particular 0 time. On Linux, the 0 time is when the host started. On
Darwin, it's the UNIX epoch. This is also inconsistent with the other
absolute time in MemStats, LastGC, which is always reported in
nanoseconds since 1970.

Fix PauseEnd so it's always reported in nanoseconds since 1970, like
LastGC.

Fixes one of the issues raised in #10323.

Change-Id: Ie2fe3169d45113992363a03b764f4e6c47e5c6a8
Reviewed-on: https://go-review.googlesource.com/11801
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-13 23:32:02 +00:00
Russ Cox
0bcdffeea6 runtime: fix x86 stack trace for call to heap memory
Fixes #11656.

Change-Id: Ib81d583e4b004e67dc9d2f898fd798112434e7a9
Reviewed-on: https://go-review.googlesource.com/12026
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
2015-07-13 19:42:35 +00:00
Russ Cox
683311175c runtime: fix race in TestChanSendBarrier
Fixes race detector build.

Change-Id: I8bdc78d57487580e6b5b8c415df4653a1ba69e37
Reviewed-on: https://go-review.googlesource.com/12087
Reviewed-by: Austin Clements <austin@google.com>
2015-07-13 19:42:20 +00:00
Russ Cox
8c3533c89b runtime: add memory barrier for sync send in select
Missed select case when adding the barrier last time.
All the more reason to refactor this code in Go 1.6.

Fixes #11643.

Change-Id: Ib0d19d6e0939296c0a3e06dda5e9b76f813bbc7e
Reviewed-on: https://go-review.googlesource.com/12086
Reviewed-by: Austin Clements <austin@google.com>
2015-07-13 19:10:22 +00:00
Brad Fitzpatrick
2ae77376f7 all: link to https instead of http
The one in misc/makerelease/makerelease.go is particularly bad and
probably warrants rotating our keys.

I didn't update old weekly notes, and reverted some changes involving
test code for now, since we're late in the Go 1.5 freeze. Otherwise,
the rest are all auto-generated changes, and all manually reviewed.

Change-Id: Ia2753576ab5d64826a167d259f48a2f50508792d
Reviewed-on: https://go-review.googlesource.com/12048
Reviewed-by: Rob Pike <r@golang.org>
2015-07-11 14:36:33 +00:00
Elias Naur
b3a8b0574a runtime: abort on fatal errors and panics in c-shared and c-archive modes
The default behaviour for fatal errors and runtime panics is to dump
the goroutine stack traces and exit with code 2. However, when the process is
owned by foreign code, it is suprising and inappropriate to suddenly exit
the whole process, even on fatal errors. Instead, re-use the crash behaviour
from GOTRACEBACK=crash and abort.

The motivating use case is issue #11382, where an Android crash reporter
is confused by an exiting process, but I believe the aborting behaviour
is appropriate for all cases where Go does not own the process.

The change is simple and contained and will enable reliable crash reporting
for Android apps in Go 1.5, but I'll leave it to others to judge whether it
is too late for Go 1.5.

Fixes #11382

Change-Id: I477328e1092f483591c99da1fbb8bc4411911785
Reviewed-on: https://go-review.googlesource.com/12032
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-07-11 11:39:05 +00:00
Alex Brainman
d5004ee69e runtime: use AddVectoredContinueHandler on Windows XP amd64
Recent change (CL 10370) unexpectedly broke TestRaiseException on
Windows XP amd64. I still do not know why. But reverting old
CL 8165 fixes the problem.

This effectively makes Windows XP amd64 use AddVectoredContinueHandler
instead of SetUnhandledExceptionFilter for exception handling. That is
what we do for all recent Windows versions too.

Fixes #11481

Change-Id: If2e8037711f05bf97e3c69f5a8d86af67c58f6fc
Reviewed-on: https://go-review.googlesource.com/11888
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Daniel Theophanes <kardianos@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-07-11 07:02:57 +00:00
Ian Lance Taylor
6a90b1d621 runtime, cmd/go: fix tests to work when GOROOT_FINAL is set
When GOROOT_FINAL is set when running all.bash, the tests are run
before the files are copied to GOROOT_FINAL.  The tests are run with
GOROOT set, so most work fine.  This fixes two cases that do not.

In cmd/go/go_test.go we were explicitly removing GOROOT from the
environment, causing tests that did not themselves explicitly set
GOROOT to fail.  There was no need to explicitly remove GOROOT, so
don't do it.  If people choose to run "go test cmd/go" with a bad
GOROOT, that is their own lookout.

In the runtime GDB test, the linker has told gdb to find the support
script in GOROOT_FINAL, which will fail.  Check for that case, and
skip the test when we see it.

Fixes #11652.

Change-Id: I4d3a32311e3973c30fd8a79551aaeab6789d0451
Reviewed-on: https://go-review.googlesource.com/12021
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-10 21:29:37 +00:00
Ian Lance Taylor
2de67e9974 runtime: clarify that NumCPU returns only available CPUs
Update #11609.

Change-Id: Ie363facf13f5e62f1af4a8bdc42a18fb36e16ebf
Reviewed-on: https://go-review.googlesource.com/12022
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-07-10 21:28:49 +00:00
Austin Clements
4b2774f5ea runtime: make sysmon-triggered GC concurrent
sysmon triggers a GC if there has been no GC for two minutes.
Currently, this is a STW GC. There is no reason for this to be STW, so
make it concurrent.

Fixes #10261.

Change-Id: I92f3ac37272d5c2a31480ff1fa897ebad08775a9
Reviewed-on: https://go-review.googlesource.com/11955
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-09 05:53:21 +00:00
David Chase
7929a0ddfa cmd/compile: initialize line number properly for temporaries
The expansion of structure, array, slice, and map literals
does not use the right line number in its introduced assignments
to temporaries, which leads to incorrect line number attribution
for expressions in those literals.

Inlining also incorrectly replaced the line numbers of args to
inlined functions.

This was revealed in CL 9721 because a now-avoided temporary
assignment introduced the correct line number.
I.e. before CL 9721
  "tmp_wrongline := expr"
was transformed to
  "tmp_rightline := expr; tmp_wrongline := tmp_rightline"

Also includes a repair to CL 10334 involving line numbers
where a spurious -1 remained (should have been 0, now is 0).

Fixes #11400.

Change-Id: I3a4687efe463977fa1e2c996606f4d91aaf22722
Reviewed-on: https://go-review.googlesource.com/11730
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Sameer Ajmani <sameer@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-07 21:30:59 +00:00
Russ Cox
2028077899 runtime: randomize scheduling in -race mode
Basic randomization of goroutine scheduling for -race mode.
It is probably possible to do much better (there's a paper linked
in the issue that I haven't read, for example), but this suffices
to introduce at least some unpredictability into the scheduling order.
The goal here is to have _something_ for Go 1.5, so that we don't
start hitting more of these scheduling order-dependent bugs
if we change the scheduler order again in Go 1.6.

For #11372.

Change-Id: Idf1154123fbd5b7a1ee4d339e93f97635cc2bacb
Reviewed-on: https://go-review.googlesource.com/11795
Reviewed-by: Austin Clements <austin@google.com>
2015-07-07 21:27:38 +00:00
Russ Cox
3b6e86f48a cmd/compile: fix race detector handling of OBLOCK nodes
Fixes #7561 correctly.
Fixes #9137.

Change-Id: I7f27e199d7101b785a7645f789e8fe41a405a86f
Reviewed-on: https://go-review.googlesource.com/11713
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-06-30 19:25:18 +00:00
Russ Cox
8b99bb7b8c runtime: fix broken arm builds
Change-Id: I08de33aacb3fc932722286d69b1dd70ffe787c89
Reviewed-on: https://go-review.googlesource.com/11697
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 17:33:23 +00:00
Russ Cox
434e0bc0a0 cmd/link: record missing pcdata tables correctly
The old code was recording the current table output offset,
so the table from the next function would be used instead of
the runtime realizing that there was no table at all.

Add debug constant in runtime to check this for every function
at startup. It's too expensive to do that by default, but we can
do the last five functions. The end of the table is usually where
the C symbols end up, so that's where the problems typically are.

Fixes #10747.
Fixes #11396.

Change-Id: I13592e78017969fc22979fa902e19e1b151d41b1
Reviewed-on: https://go-review.googlesource.com/11657
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
2015-06-29 16:07:14 +00:00
Austin Clements
1b917484a8 runtime: reset mark state before checkmark and gctrace=2 mark
Currently we fail to reset the live heap accounting state before the
checkmark mark and before the gctrace=2 extra mark. As a result, if
either are enabled, at the end of GC it thinks there are 0 bytes of
live heap, which causes the GC controller to initiate a new GC
immediately, regardless of the true heap size.

Fix this by factoring this state reset into a function and calling it
before all three possible marks.

This function should be merged with gcResetGState, but doing so
requires some additional cleanup, so it will wait for after the
freeze. Filed #11427 for this cleanup.

Fixes #10492.

Change-Id: Ibe46348916fc8368fac6f086e142815c970a6f4d
Reviewed-on: https://go-review.googlesource.com/11561
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 15:58:29 +00:00
Austin Clements
d57056ba26 runtime: don't free stack spans during GC
Memory for stacks is manually managed by the runtime and, currently
(with one exception) we free stack spans immediately when the last
stack on a span is freed. However, the garbage collector assumes that
spans can never transition from non-free to free during scan or mark.
This disagreement makes it possible for the garbage collector to mark
uninitialized objects and is blocking us from re-enabling the bad
pointer test in the garbage collector (issue #9880).

For example, the following sequence will result in marking an
uninitialized object:

1. scanobject loads a pointer slot out of the object it's scanning.
   This happens to be one of the special pointers from the heap into a
   stack. Call the pointer p and suppose it points into X's stack.

2. X, running on another thread, grows its stack and frees its old
   stack.

3. The old stack happens to be large or was the last stack in its
   span, so X frees this span, setting it to state _MSpanFree.

4. The span gets reused as a heap span.

5. scanobject calls heapBitsForObject, which loads the span containing
   p, which is now in state _MSpanInUse, but doesn't necessarily have
   an object at p. The not-object at p gets marked, and at this point
   all sorts of things can go wrong.

We already have a partial solution to this. When shrinking a stack, we
put the old stack on a queue to be freed at the end of garbage
collection. This was done to address exactly this problem, but wasn't
a complete solution.

This commit generalizes this solution to both shrinking and growing
stacks. For stacks that fit in the stack pool, we simply don't free
the span, even if its reference count reaches zero. It's fine to reuse
the span for other stacks, and this enables that. At the end of GC, we
sweep for cached stack spans with a zero reference count and free
them. For larger stacks, we simply queue the stack span to be freed at
the end of GC. Ideally, we would reuse these large stack spans the way
we can small stack spans, but that's a more invasive change that will
have to wait until after the freeze.

Fixes #11267.

Change-Id: Ib7f2c5da4845cc0268e8dc098b08465116972a71
Reviewed-on: https://go-review.googlesource.com/11502
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 15:33:40 +00:00
Austin Clements
f73b2fca84 runtime: remove unused _GCsweep state
We don't use this state. _GCoff means we're sweeping in the
background. This makes it clear in the next commit that _GCoff and
only _GCoff means sweeping.

Change-Id: I416324a829ba0be3794a6cf3cf1655114cb6e47c
Reviewed-on: https://go-review.googlesource.com/11501
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 15:33:31 +00:00
Austin Clements
840965f8d7 runtime: always clear stack barriers on G exit
Currently the runtime fails to clear a G's stack barriers in gfput if
the G's stack allocation is _FixedStack bytes. This causes the runtime
to panic if the following sequence of events happens:

1) The runtime installs stack barriers on a G.

2) The G exits by calling runtime.Goexit. Since this does not
   necessarily return through the stack barriers installed on the G,
   there may still be untriggered stack barriers left on the G's stack
   in recorded in g.stkbar.

3) The runtime calls gfput to add the exiting G to the free pool. If
   the G's stack allocation is _FixedStack bytes, we fail to clear
   g.stkbar.

4) A new G starts and allocates the G that was just added to the free
   pool.

5) The new G begins to execute and overwrites the stack slots that had
   stack barriers in them.

6) The garbage collector enters mark termination, attempts to remove
   stack barriers from the new G, and finds that they've been
   overwritten.

Fix this by clearing the stack barriers in gfput in the case where it
reuses the stack.

Fixes #11256.

Change-Id: I377c44258900e6bcc2d4b3451845814a8eeb2bcf
Reviewed-on: https://go-review.googlesource.com/11461
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 15:02:30 +00:00
Alex Brainman
85d4d46f3c runtime: store syscall parameters in m not on stack
Stack can move during callback, so libcall struct cannot be stored on stack.
asmstdcall updates return values and errno in libcall struct parameter, but
these could be at different location when callback returns.
Store these in m, so they are not affected by GC.

Fixes #10406

Change-Id: Id01c9d2b4b44530494e6d9e9e1c875261ce477cd
Reviewed-on: https://go-review.googlesource.com/10370
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 02:45:45 +00:00
Austin Clements
d231cb8249 runtime: repeat bitmap for slice of GCprog n-1 times, not n times
Currently, to write out the bitmap of a slice of a type with a GCprog,
we construct a new GCprog that executes the underlying type's GCprog
to write out the bitmap once and then repeats those bits n more times.
This results in n+1 repetitions of the bitmap, which is one more
repetition than it should be. This corrupts the bitmap of the heap
following the slice and may write past the mapped bitmap memory and
segfault.

Fix this by repeating the bitmap only n-1 more times.

Fixes #11430.

Change-Id: Ic24854363bffc5a755b66f257339f9309ada3aa5
Reviewed-on: https://go-review.googlesource.com/11570
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-26 21:52:51 +00:00
Dmitry Vyukov
77132c810d runtime/race: enable tests that now pass
These tests pass after cl/11417.

Change-Id: Id98088c52e564208ce432e9717eddd672c42c66d
Reviewed-on: https://go-review.googlesource.com/11551
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-26 18:54:11 +00:00
Shenghou Ma
21a4c93166 runtime: slightly clean up softfloat code
Removes the remains of the old C based stepflt implementation.
Also removed goto usage.

Change-Id: Ida4742c49000fae4fea4649f28afde630ce4c577
Reviewed-on: https://go-review.googlesource.com/9600
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-26 17:51:22 +00:00
Russ Cox
32fddadd98 runtime: reduce slice growth during append to 2x
The new inlined code for append assumed that it could pass the
desired new cap to growslice, not the number of new elements.
But growslice still interpreted the argument as the number of new elements,
making it always grow by >2x (more precisely, 2x+1 rounded up
to the next malloc block size). At the time, I had intended to change
the other callers to use the new cap as well, but it's too late for that.
Instead, introduce growslice_n for the old callers and keep growslice
for the inlined (common case) caller.

Fixes #11403.

Filed #11419 to merge them.

Change-Id: I1338b1e5b352f3be4e43641f44b652ef7195251b
Reviewed-on: https://go-review.googlesource.com/11541
Reviewed-by: Austin Clements <austin@google.com>
2015-06-26 17:49:33 +00:00
Dmitry Vyukov
cd0a8ed48a cmd/compile: add instrumentation of OKEY
Instrument operands of OKEY.
Also instrument OSLICESTR. Previously it was not needed
because of preceeding bounds checks (which were instrumented).
But the preceeding bounds checks have disappeared.

Change-Id: I3b0de213e23cbcf5b8ef800abeded5eeeb3f8287
Reviewed-on: https://go-review.googlesource.com/11417
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-26 15:54:03 +00:00
Aaron Jacobs
8628688304 Fix several out of date references to 4g/5g/6g/8g/9g.
Change-Id: Ifb8e4e13c7778a7c0113190051415e096f5db94f
Reviewed-on: https://go-review.googlesource.com/11390
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-06-26 03:38:21 +00:00
Dmitry Vyukov
055e1a3ae7 runtime/race: fix test driver
At some point it silently stopped recognizing test output.
Meanwhile two tests degraded...

Change-Id: I90a0325fc9aaa16c3ef16b9c4c642581da2bb10c
Reviewed-on: https://go-review.googlesource.com/11416
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-25 11:36:07 +00:00
Russ Cox
a9e536442e runtime: set m.procid always on Linux
For debuggers and other program inspectors.

Fixes #9914.

Change-Id: I670728cea28c045e6eaba1808c550ee2f34d16ff
Reviewed-on: https://go-review.googlesource.com/11341
Reviewed-by: Austin Clements <austin@google.com>
2015-06-24 21:50:39 +00:00
Dmitry Vyukov
77082481d4 runtime/race: make test more robust
The test is flaky on builders lately. I don't see any issues other than
usage of very small sleeps. So increase the sleeps. Also take opportunity
to refactor the code.
On my machine this change significantly reduces failure rate with GOMAXPROCS=2.
I can't reproduce the failure with GOMAXPROCS=1.

Fixes #10726

Change-Id: Iea6f10cf3ce1be5c112a2375d51c13687a8ab4c9
Reviewed-on: https://go-review.googlesource.com/9803
Reviewed-by: Austin Clements <austin@google.com>
2015-06-24 17:53:25 +00:00
Austin Clements
a8ae93fd26 runtime: fix heap bitmap repeating with large scalar tails
When heapBitsSetType repeats a source bitmap with a scalar tail
(typ.ptrdata < typ.size), it lays out the tail upon reaching the end
of the source bitmap by simply increasing the number of bits claimed
to be in the incoming bit buffer. This causes later iterations to read
the appropriate number of zeros out of the bit buffer before starting
on the next repeat of the source bitmap.

Currently, however, later iterations of the loop continue to read bits
from the source bitmap *regardless of the number of bits currently in
the bit buffer*. The bit buffer can only hold 32 or 64 bits, so if the
scalar tail is large and the padding bits exceed the size of the bit
buffer, the read from the source bitmap on the next iteration will
shift the incoming bits into oblivion when it attempts to put them in
the bit buffer. When the buffer does eventually shift down to where
these bits were supposed to be, it will contain zeros. As a result,
words that should be marked as pointers on later repetitions are
marked as scalars, so the garbage collector does not trace them. If
this is the only reference to an object, it will be incorrectly freed.

Fix this by adding logic to drain the bit buffer down if it is large
instead of reading more bits from the source bitmap.

Fixes #11286.

Change-Id: I964432c4b9f1cec334fc8c3da0ff16460203feb6
Reviewed-on: https://go-review.googlesource.com/11360
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-23 18:37:17 +00:00
Austin Clements
eabdd05892 runtime: document memory ordering for h_spans
h_spans can be accessed concurrently without synchronization from
other threads, which means it needs the appropriate memory barriers on
weakly ordered machines. It happens to already have the necessary
memory barriers because all accesses to h_spans are currently
protected by the heap lock and the unlocks happen in exactly the
places where release barriers are needed, but it's easy to imagine
that this could change in the future. Document the fact that we're
depending on the barrier implied by the unlock.

Related to issue #9984.

Change-Id: I1bc3c95cd73361b041c8c95cd4bb92daf8c1f94a
Reviewed-on: https://go-review.googlesource.com/11361
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-06-23 18:28:46 +00:00
Rick Hudson
1ab9176e54 runtime: remove race and increase precision in pointer validation.
This CL removes the single and racy use of mheap.arena_end outside
of the bookkeeping done in mHeap_init and mHeap_Alloc.
There should be no way for heapBitsForSpan to see a pointer to
an invalid span. This CL makes the check for this more precise by
checking that the pointer is between mheap_.arena_start and
mheap_.arena_used instead of mheap_.arena_end.

Change-Id: I1200b54353ee1eda002d92645fd8d26048600ceb
Reviewed-on: https://go-review.googlesource.com/11342
Reviewed-by: Austin Clements <austin@google.com>
2015-06-22 20:37:23 +00:00
Austin Clements
9a3112bcae runtime: one more Map{Bits,Spans} before arena_used update
In order to avoid a race with a concurrent write barrier or garbage
collector thread, any update to arena_used must be preceded by mapping
the corresponding heap bitmap and spans array memory. Otherwise, the
concurrent access may observe that a pointer falls within the heap
arena, but then attempt to access unmapped memory to look up its span
or heap bits.

Commit d57c889 fixed all of the places where we updated arena_used
immediately before mapping the heap bitmap and spans, but it missed
the one place where we update arena_used and depend on later code to
update it again and map the bitmap and spans. This creates a window
where the original race can still happen. This commit fixes this by
mapping the heap bitmap and spans before this arena_used update as
well. This code path is only taken when expanding the heap reservation
on 32-bit over a hole in the address space, so these extra mmap calls
should have negligible impact.

Fixes #10212, #11324.

Change-Id: Id67795e6c7563eb551873bc401e5cc997aaa2bd8
Reviewed-on: https://go-review.googlesource.com/11340
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-06-22 18:54:38 +00:00
Austin Clements
2a331ca8bb runtime: document relaxed access to arena_used
The unsynchronized accesses to mheap_.arena_used in the concurrent
part of the garbage collector look like a problem waiting to happen.
In fact, they are safe, but the reason is somewhat subtle and
undocumented. This commit documents this reasoning.

Related to issue #9984.

Change-Id: Icdbf2329c1aa11dbe2396a71eb5fc2a85bd4afd5
Reviewed-on: https://go-review.googlesource.com/11254
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-06-22 18:37:20 +00:00
Austin Clements
f5d494bbdf runtime: ensure GC sees type-safe memory on weak machines
Currently its possible for the garbage collector to observe
uninitialized memory or stale heap bitmap bits on weakly ordered
architectures such as ARM and PPC. On such architectures, the stores
that zero newly allocated memory and initialize its heap bitmap may
move after a store in user code that makes the allocated object
observable by the garbage collector.

To fix this, add a "publication barrier" (also known as an "export
barrier") before returning from mallocgc. This is a store/store
barrier that ensures any write done by user code that makes the
returned object observable to the garbage collector will be ordered
after the initialization performed by mallocgc. No barrier is
necessary on the reading side because of the data dependency between
loading the pointer and loading the contents of the object.

Fixes one of the issues raised in #9984.

Change-Id: Ia3d96ad9c5fc7f4d342f5e05ec0ceae700cd17c8
Reviewed-on: https://go-review.googlesource.com/11083
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Martin Capitanio <capnm9@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-19 15:29:50 +00:00
Alex Brainman
9d968cb47b runtime: rename cgocall_errno and asmcgocall_errno into cgocall and asmcgocall
Change-Id: I5917bea8bb35b0e725dcc56a68f3a70137cfc180
Reviewed-on: https://go-review.googlesource.com/9387
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-19 01:47:11 +00:00
Rick Hudson
90a19961f2 runtime: reduce latency by aggressively ending mark phase
Some latency regressions have crept into our system over the past few
weeks. This CL fixes those by having the mark phase more aggressively
blacken objects so that the mark termination phase, a STW phase, has less
work to do. Three approaches were taken when the mark phase believes
it has no more work to do, ie all the work buffers are empty.
If things have gone well the mark phase is correct and there is
in fact little or no work. In that case the following items will
take very little time. If the mark phase is wrong this CL will
ferret that work out and give the mark phase a chance to deal with
it concurrently before mark termination begins.

When the mark phase first appears to be out of work, it does three things:
1) It switches from allocating white to allocating black to reduce the
number of unmarked objects reachable only from stacks.
2) It flushes and disables per-P GC work caches so all work must be in
globally visible work buffers.
3) It rescans the global roots---the BSS and data segments---so there
are fewer objects to blacken during mark termination. We do not rescan
stacks at this point, though that could be done in a later CL.
After these steps, it again drains the global work buffers.

On a lightly loaded machine the garbage benchmark has reduced the
number of GC cycles with latency > 10 ms from 83 out of 4083 cycles
down to 2 out of 3995 cycles. Maximum latency was reduced from
60+ msecs down to 20 ms.

Change-Id: I152285b48a7e56c5083a02e8e4485dd39c990492
Reviewed-on: https://go-review.googlesource.com/10590
Reviewed-by: Austin Clements <austin@google.com>
2015-06-18 21:38:46 +00:00
Shenghou Ma
3925a7c5db all: switch to the new deprecation convention
While we're at it, move some misplaced comment blocks around.

Change-Id: I1847d7f1ca1dbb8e5de737203c4ed6c66e112508
Reviewed-on: https://go-review.googlesource.com/10188
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-18 19:16:23 +00:00
Dmitry Vyukov
e72f5f67a1 runtime: fix tracing of syscallexit
There were two issues.
1. Delayed EvGoSysExit could have been emitted during TraceStart,
while it had not yet emitted EvGoInSyscall.
2. Delayed EvGoSysExit could have been emitted during next tracing session.

Fixes #10476
Fixes #11262

Change-Id: Iab68eb31cf38eb6eb6eee427f49c5ca0865a8c64
Reviewed-on: https://go-review.googlesource.com/9132
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-18 13:59:55 +00:00
Alex Brainman
2858b73843 runtime: remove cgocall and asmcgocall
In preparation for rename of cgocall_errno into cgocall and
asmcgocall_errno into asmcgocall in the fllowinng CL.
rsc requested CL 9387 to be split into two parts. This is first part.

Change-Id: I7434f0e4b44dd37017540695834bfcb1eebf0b2f
Reviewed-on: https://go-review.googlesource.com/11166
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-18 04:42:53 +00:00
Russ Cox
cfa3eda587 runtime: fix race in scanvalid assertion
Change-Id: I389b2e10fe667eaa55f87b71b1e004994694d4a3
Reviewed-on: https://go-review.googlesource.com/11173
Reviewed-by: Austin Clements <austin@google.com>
2015-06-17 20:12:37 +00:00
Russ Cox
3c60e6e8cf runtime: fix races in stack scan
This fixes a hang during runtime.TestTraceStress.
It also fixes double-scan of stacks, which leads to
stack barrier installation failures.

Both of these have shown up as flaky failures on the dashboard.

Fixes #10941.

Change-Id: Ia2a5991ce2c9f43ba06ae1c7032f7c898dc990e0
Reviewed-on: https://go-review.googlesource.com/11089
Reviewed-by: Austin Clements <austin@google.com>
2015-06-17 17:56:26 +00:00
Russ Cox
08e25fc1ba cmd/compile: introduce //go:systemstack annotation
//go:systemstack means that the function must run on the system stack.

Add one use in runtime as a demonstration.

Fixes #9174.

Change-Id: I8d4a509cb313541426157da703f1c022e964ace4
Reviewed-on: https://go-review.googlesource.com/10840
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2015-06-17 14:23:00 +00:00
Yongjian Xu
e3dc59f33d runtime: fix typos in os_linux_arm.go
Change-Id: I750900e0aed9ec528fea3f442c35196773e3ba5e
Reviewed-on: https://go-review.googlesource.com/11163
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-17 08:51:59 +00:00
Austin Clements
7387121ddb runtime: account for stack guard when shrinking the stack
Currently, when shrinkstack computes whether the halved stack
allocation will have enough room for the stack, it accounts for the
stack space that's actively in use but fails to leave extra room for
the stack guard space. As a result, *if* the minimum stack size is
small enough or the guard large enough, it may shrink the stack and
leave less than enough room to run nosplit functions. If the next
function called after the stack shrink is a nosplit function, it may
overflow the stack without noticing and overwrite non-stack memory.

We don't think this is happening under normal conditions right now.
The minimum stack allocation is 2K and the guard is 640 bytes. The
"worst case" stack shrink is from 4K (4048 bytes after stack barrier
array reservation) to 2K (2016 bytes after stack barrier array
reservation), which means the largest "used" size that will qualify
for shrinking is 4048/4 - 8 = 1004 bytes. After copying, that leaves
2016 - 1004 = 1012 bytes of available stack, which is significantly
more than the guard space.

If we were to reduce the minimum stack size to 1K or raise the guard
space above 1012 bytes, the logic in shrinkstack would no longer leave
enough space.

It's also possible to trigger this problem by setting
firstStackBarrierOffset to 0, which puts stack barriers in a debug
mode that steals away *half* of the stack for the stack barrier array
reservation. Then, the largest "used" size that qualifies for
shrinking is (4096/2)/4 - 8 = 504 bytes. After copying, that leaves
(2096/2) - 504 = 8 bytes of available stack; much less than the
required guard space. This causes failures like those in issue #11027
because func gc() shrinks its own stack and then immediately calls
casgstatus (a nosplit function), which overflows the stack and
overwrites a free list pointer in the neighboring span. However, since
this seems to require the special debug mode, we don't think it's
responsible for issue #11027.

To forestall all of these subtle issues, this commit modifies
shrinkstack to correctly account for the guard space when considering
whether to halve the stack allocation.

Change-Id: I7312584addc63b5bfe55cc384a1012f6181f1b9d
Reviewed-on: https://go-review.googlesource.com/10714
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-16 21:17:53 +00:00
Austin Clements
5250279eb9 runtime: detect and print corrupted free lists
Issues #10240, #10541, #10941, #11023, #11027 and possibly others are
indicating memory corruption in the runtime. One of the easiest places
to both get corruption and detect it is in the allocator's free lists
since they appear throughout memory and follow strict invariants. This
commit adds a check when sweeping a span that its free list is sane
and, if not, it prints the corrupted free list and panics. Hopefully
this will help us collect more information on these failures.

Change-Id: I6d417bcaeedf654943a5e068bd76b58bb02d4a64
Reviewed-on: https://go-review.googlesource.com/10713
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-06-16 21:17:47 +00:00
Russ Cox
142e434006 runtime: implement GOTRACEBACK=crash for linux/386
Change-Id: I401ce8d612160a4f4ee617bddca6827fa544763a
Reviewed-on: https://go-review.googlesource.com/11087
Reviewed-by: Austin Clements <austin@google.com>
2015-06-16 20:47:47 +00:00
Russ Cox
7bc3e58806 all: extract "can I exec?" check from tests into internal/testenv
Change-Id: I7b54be9d8b50b39e01c6be21f310ae9a10404e9d
Reviewed-on: https://go-review.googlesource.com/10753
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-16 18:07:36 +00:00
Russ Cox
43aac4f9e7 runtime: raise maxmem to 512 GB
A workaround for #10460.

Change-Id: I607a556561d509db6de047892f886fb565513895
Reviewed-on: https://go-review.googlesource.com/10819
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 18:31:25 +00:00
Russ Cox
2c2770c3d4 cmd/cgo: make sure pointers passed to C escape to heap
Fixes #10303.

Change-Id: Ia68d3566ba3ebeea6e18e388446bd9b8c431e156
Reviewed-on: https://go-review.googlesource.com/10814
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-15 17:39:53 +00:00
Russ Cox
a3b9797baa runtime: gofmt
Change-Id: I539bdc438f694610a7cd373f7e1451171737cfb3
Reviewed-on: https://go-review.googlesource.com/11084
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-15 17:36:34 +00:00
Russ Cox
d5b40b6ac2 runtime: add GODEBUG gcshrinkstackoff, gcstackbarrieroff, and gcstoptheworld variables
While we're here, update the documentation and delete variables with no effect.

Change-Id: I4df0d266dff880df61b488ed547c2870205862f0
Reviewed-on: https://go-review.googlesource.com/10790
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 17:31:04 +00:00
Russ Cox
80ec711755 runtime: use type-based write barrier for remote stack write during chansend
A send on an unbuffered channel to a blocked receiver is the only
case in the runtime where one goroutine writes directly to the stack
of another. The garbage collector assumes that if a goroutine is
blocked, its stack contains no new pointers since the last time it ran.
The send on an unbuffered channel violates this, so it needs an
explicit write barrier. It has an explicit write barrier, but not one that
can handle a write to another stack. Use one that can (based on type bitmap
instead of heap bitmap).

To make this work, raise the limit for type bitmaps so that they are
used for all types up to 64 kB in size (256 bytes of bitmap).
(The runtime already imposes a limit of 64 kB for a channel element size.)

I have been unable to reproduce this problem in a simple test program.

Could help #11035.

Change-Id: I06ad994032d8cff3438c9b3eaa8d853915128af5
Reviewed-on: https://go-review.googlesource.com/10815
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 16:50:30 +00:00
Russ Cox
d57c889ae8 runtime: wait to update arena_used until after mapping bitmap
This avoids a race with gcmarkwb_m that was leading to faults.

Fixes #10212.

Change-Id: I6fcf8d09f2692227063ce29152cb57366ea22487
Reviewed-on: https://go-review.googlesource.com/10816
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-06-11 18:15:21 +00:00
Ainar Garipov
7f9f70e5b6 all: fix misprints in comments
These were found by grepping the comments from the go code and feeding
the output to aspell.

Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395
Reviewed-on: https://go-review.googlesource.com/10941
Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-11 14:18:57 +00:00
Yongjian Xu
93e57a22d5 runtime: correct a drifted comment in referencing m->locked.
Change-Id: Ida4b98aa63e57594fa6fa0b8178106bac9b3cd19
Reviewed-on: https://go-review.googlesource.com/10837
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-10 06:15:20 +00:00
Russ Cox
433c0bc769 runtime: avoid fault in heapBitsBulkBarrier
Change-Id: I0512e461de1f25cb2a1cb7f23e7a77d00700667c
Reviewed-on: https://go-review.googlesource.com/10803
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-08 20:24:00 +00:00
Austin Clements
b0532a96a8 runtime: fix write-barrier-enabled phase list in gcmarkwb_m
Commit 1303957 was supposed to enable write barriers during the
concurrent scan phase, but it only enabled *calls* to the write
barrier during this phase. It failed to update the redundant list of
write-barrier-enabled phases in gcmarkwb_m, so it still wasn't greying
objects during the scan phase.

This commit fixes this by replacing the redundant list of phases in
gcmarkwb_m with simply checking writeBarrierEnabled. This is almost
certainly redundant with checks already done in callers, but the last
time we tried to remove these redundant checks everything got much
slower, so I'm leaving it alone for now.

Fixes #11105.

Change-Id: I00230a3cb80a008e749553a8ae901b409097e4be
Reviewed-on: https://go-review.googlesource.com/10801
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-08 05:13:15 +00:00
Austin Clements
306f8f11ad runtime: unwind stack barriers when writing above the current frame
Stack barriers assume that writes through pointers to frames above the
current frame will get write barriers, and hence these frames do not
need to be re-scanned to pick up these changes. For normal writes,
this is true. However, there are places in the runtime that use
typedmemmove to potentially write through pointers to higher frames
(such as mapassign1). Currently, typedmemmove does not execute write
barriers if the destination is on the stack. If there's a stack
barrier between the current frame and the frame being modified with
typedmemmove, and the stack barrier is not otherwise hit, it's
possible that the garbage collector will never see the updated pointer
and incorrectly reclaim the object.

Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove
and its variants) detect when the destination is in the stack and
unwind stack barriers up to the point, forcing mark termination to
later rescan the effected frame and collect these pointers.

Fixes #11084. Might be related to #10240, #10541, #10941, #11023,
 #11027 and possibly others.

Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd
Reviewed-on: https://go-review.googlesource.com/10791
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:57:47 +00:00
Austin Clements
1303957dbf runtime: enable write barriers during concurrent scan
Currently, write barriers are only enabled after completion of the
concurrent scan phase, as we enter the concurrent mark phase. However,
stack barriers are installed during the scan phase and assume that
write barriers will track changes to frames above the stack
barriers. Since write barriers aren't enabled until after stack
barriers are installed, we may miss modifications to the stack that
happen after installing the stack barriers and before enabling write
barriers.

Fix this by enabling write barriers during the scan phase.

This commit intentionally makes the minimal change to do this (there's
only one line of code change; the rest are comment changes). At the
very least, we should consider eliminating the ragged barrier that's
intended to synchronize the enabling of write barriers, but now just
wastes time. I've included a large comment about extensions and
alternative designs.

Change-Id: Ib20fede794e4fcb91ddf36f99bd97344d7f96421
Reviewed-on: https://go-review.googlesource.com/10795
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:55:33 +00:00
Austin Clements
6f6403eddf runtime: fix checkmarks to rescan stacks
Currently checkmarks mode fails to rescan stacks because it sees the
leftover state bits indicating that the stacks haven't changed since
the last scan. As a result, it won't detect lost marks caused by
failing to scan stacks correctly during regular garbage collection.

Fix this by marking all stacks dirty before performing the checkmark
phase.

Change-Id: I1f06882bb8b20257120a4b8e7f95bb3ffc263895
Reviewed-on: https://go-review.googlesource.com/10794
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:55:12 +00:00
Austin Clements
2774b37306 all: use RET instead of RETURN on ppc64
All of the architectures except ppc64 have only "RET" for the return
mnemonic. ppc64 used to have only "RETURN", but commit cf06ea6
introduced RET as a synonym for RETURN to make ppc64 consistent with
the other architectures. However, that commit was never followed up to
make the code itself consistent by eliminating uses of RETURN.

This commit replaces all uses of RETURN in the ppc64 assembly with
RET.

This was done with
    sed -i 's/\<RETURN\>/RET/' **/*_ppc64x.s
plus one manual change to syscall/asm.s.

Change-Id: I3f6c8d2be157df8841d48de988ee43f3e3087995
Reviewed-on: https://go-review.googlesource.com/10672
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2015-06-06 00:07:23 +00:00
Alan Donovan
232331f0c7 runtime: add blank assignment to defeat "declared but not used" error from go/types
gc should ideally consider this an error too; see golang/go#8560.

Change-Id: Ieee71c4ecaff493d7f83e15ba8c8a04ee90a4cf1
Reviewed-on: https://go-review.googlesource.com/10757
Reviewed-by: Robert Griesemer <gri@golang.org>
2015-06-05 18:05:16 +00:00
Austin Clements
7529314ed3 runtime: use correct SP when installing stack barriers
Currently the stack barriers are installed at the next frame boundary
after gp.sched.sp + 1024*2^n for n=0,1,2,... However, when a G is in a
system call, we set gp.sched.sp to 0, which causes stack barriers to
be installed at *every* frame. This easily overflows the slice we've
reserved for storing the stack barrier information, and causes a
"slice bounds out of range" panic in gcInstallStackBarrier.

Fix this by using gp.syscallsp instead of gp.sched.sp if it's
non-zero. This is the same logic that gentraceback uses to determine
the current SP.

Fixes #11049.

Change-Id: Ie40eeee5bec59b7c1aa715a7c17aa63b1f1cf4e8
Reviewed-on: https://go-review.googlesource.com/10755
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-05 15:53:07 +00:00
Russ Cox
3ffcbb633e runtime: default GOMAXPROCS to NumCPU(), not 1
See golang.org/s/go15gomaxprocs for details.

Change-Id: I8de5df34fa01d31d78f0194ec78a2474c281243c
Reviewed-on: https://go-review.googlesource.com/10668
Reviewed-by: Rob Pike <r@golang.org>
2015-06-05 04:38:04 +00:00
Josh Bleecher Snyder
5353cde080 runtime, cmd/internal/obj/arm: improve arm function prologue
When stack growth is not needed, as it usually is not,
execute only a single conditional branch
rather than three conditional instructions.
This adds 4 bytes to every function,
but might speed up execution in the common case.

Sample disassembly for

func f() {
	_ = [128]byte{}
}

Before:

TEXT main.f(SB) x.go
	x.go:3	0x2000	e59a1008	MOVW 0x8(R10), R1
	x.go:3	0x2004	e59fb028	MOVW 0x28(R15), R11
	x.go:3	0x2008	e08d200b	ADD R11, R13, R2
	x.go:3	0x200c	e1520001	CMP R1, R2
	x.go:3	0x2010	91a0300e	MOVW.LS R14, R3
	x.go:3	0x2014	9b0118a9	BL.LS runtime.morestack_noctxt(SB)
	x.go:3	0x2018	9afffff8	B.LS main.f(SB)
	x.go:3	0x201c	e52de084	MOVW.W R14, -0x84(R13)
	x.go:4	0x2020	e28d1004	ADD $4, R13, R1
	x.go:4	0x2024	e3a00000	MOVW $0, R0
	x.go:4	0x2028	eb012255	BL 0x4a984
	x.go:5	0x202c	e49df084	RET #132
	x.go:5	0x2030	eafffffe	B 0x2030
	x.go:5	0x2034	ffffff7c	?

After:

TEXT main.f(SB) x.go
	x.go:3	0x2000	e59a1008	MOVW 0x8(R10), R1
	x.go:3	0x2004	e59fb02c	MOVW 0x2c(R15), R11
	x.go:3	0x2008	e08d200b	ADD R11, R13, R2
	x.go:3	0x200c	e1520001	CMP R1, R2
	x.go:3	0x2010	9a000004	B.LS 0x2028
	x.go:3	0x2014	e52de084	MOVW.W R14, -0x84(R13)
	x.go:4	0x2018	e28d1004	ADD $4, R13, R1
	x.go:4	0x201c	e3a00000	MOVW $0, R0
	x.go:4	0x2020	eb0124dc	BL 0x4b398
	x.go:5	0x2024	e49df084	RET #132
	x.go:5	0x2028	e1a0300e	MOVW R14, R3
	x.go:5	0x202c	eb011b0d	BL runtime.morestack_noctxt(SB)
	x.go:5	0x2030	eafffff2	B main.f(SB)
	x.go:5	0x2034	eafffffe	B 0x2034
	x.go:5	0x2038	ffffff7c	?

Updates #10587.

package sort benchmarks on an iPhone 6:

name            old time/op  new time/op  delta
SortString1K     569µs ± 0%   565µs ± 1%  -0.75%  (p=0.000 n=23+24)
StableString1K   872µs ± 1%   870µs ± 1%  -0.16%  (p=0.009 n=23+24)
SortInt1K        317µs ± 2%   316µs ± 2%    ~     (p=0.410 n=26+26)
StableInt1K      343µs ± 1%   339µs ± 1%  -1.07%  (p=0.000 n=22+23)
SortInt64K      30.0ms ± 1%  30.0ms ± 1%    ~     (p=0.091 n=25+24)
StableInt64K    30.2ms ± 0%  30.0ms ± 0%  -0.69%  (p=0.000 n=22+22)
Sort1e2          147µs ± 1%   146µs ± 0%  -0.48%  (p=0.000 n=25+24)
Stable1e2        290µs ± 1%   286µs ± 1%  -1.30%  (p=0.000 n=23+24)
Sort1e4         29.5ms ± 2%  29.7ms ± 1%  +0.71%  (p=0.000 n=23+23)
Stable1e4       88.7ms ± 4%  88.6ms ± 8%  -0.07%  (p=0.022 n=26+26)
Sort1e6          4.81s ± 7%   4.83s ± 7%    ~     (p=0.192 n=26+26)
Stable1e6        18.3s ± 1%   18.1s ± 1%  -0.76%  (p=0.000 n=25+23)
SearchWrappers   318ns ± 1%   344ns ± 1%  +8.14%  (p=0.000 n=23+26)

package sort benchmarks on a first generation rpi:

name            old time/op  new time/op  delta
SearchWrappers  4.13µs ± 0%  3.95µs ± 0%   -4.42%  (p=0.000 n=15+13)
SortString1K    5.81ms ± 1%  5.82ms ± 2%     ~     (p=0.400 n=14+15)
StableString1K  9.69ms ± 1%  9.73ms ± 0%     ~     (p=0.121 n=15+11)
SortInt1K       3.30ms ± 2%  3.66ms ±19%  +10.82%  (p=0.000 n=15+14)
StableInt1K     5.97ms ±15%  4.17ms ± 8%  -30.05%  (p=0.000 n=15+15)
SortInt64K       319ms ± 1%   295ms ± 1%   -7.65%  (p=0.000 n=15+15)
StableInt64K     343ms ± 0%   332ms ± 0%   -3.26%  (p=0.000 n=12+13)
Sort1e2         3.36ms ± 2%  3.22ms ± 4%   -4.10%  (p=0.000 n=15+15)
Stable1e2       6.74ms ± 1%  6.43ms ± 2%   -4.67%  (p=0.000 n=15+15)
Sort1e4          247ms ± 1%   247ms ± 1%     ~     (p=0.331 n=15+14)
Stable1e4        864ms ± 0%   820ms ± 0%   -5.15%  (p=0.000 n=14+15)
Sort1e6          41.2s ± 0%   41.2s ± 0%   +0.15%  (p=0.000 n=13+14)
Stable1e6         192s ± 0%    182s ± 0%   -5.07%  (p=0.000 n=14+14)

Change-Id: I8a9db77e1d4ea1956575895893bc9d04bd81204b
Reviewed-on: https://go-review.googlesource.com/10497
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-04 16:35:12 +00:00
Brad Fitzpatrick
03410f6758 runtime: fix TestFixedGOROOT to properly restore the GOROOT env var after test
Otherwise subsequent tests won't see any modified GOROOT.

With this CL I can move my GOROOT, set GOROOT to the new location, and
the runtime tests pass. Previously the crash_tests would instead look
for the GOROOT baked into the binary, instead of the env var:

--- FAIL: TestGcSys (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestGCFairness (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestGdbPython (0.07s)
        runtime-gdb_test.go:64: building source exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestLargeStringConcat (0.01s)
        crash_test.go:92: building source: exit status 2
                go: cannot find GOROOT directory: /home/bradfitz/go

Update #10029

Change-Id: If91be0f04d3acdcf39a9e773a4e7905a446bc477
Reviewed-on: https://go-review.googlesource.com/10685
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-03 23:33:48 +00:00
Austin Clements
10083d8007 runtime: print start of GC cycle in gctrace, rather than end
Currently the GODEBUG=gctrace=1 trace line includes "@n.nnns" to
indicate the time that the GC cycle ended relative to the time the
program started. This was meant to be consistent with the utilization
as of the end of the cycle, which is printed next on the trace line,
but it winds up just being confusing and unexpected.

Change the trace line to include the time that the GC cycle started
relative to the time the program started.

Change-Id: I7d64580cd696eb17540716d3e8a74a9d6ae50650
Reviewed-on: https://go-review.googlesource.com/10634
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-03 02:17:43 +00:00
Austin Clements
faa7a7e8ae runtime: implement GC stack barriers
This commit implements stack barriers to minimize the amount of
stack re-scanning that must be done during mark termination.

Currently the GC scans stacks of active goroutines twice during every
GC cycle: once at the beginning during root discovery and once at the
end during mark termination. The second scan happens while the world
is stopped and guarantees that we've seen all of the roots (since
there are no write barriers on writes to local stack
variables). However, this means pause time is proportional to stack
size. In particularly recursive programs, this can drive pause time up
past our 10ms goal (e.g., it takes about 150ms to scan a 50MB heap).

Re-scanning the entire stack is rarely necessary, especially for large
stacks, because usually most of the frames on the stack were not
active between the first and second scans and hence any changes to
these frames (via non-escaping pointers passed down the stack) were
tracked by write barriers.

To efficiently track how far a stack has been unwound since the first
scan (and, hence, how much needs to be re-scanned), this commit
introduces stack barriers. During the first scan, at exponentially
spaced points in each stack, the scan overwrites return PCs with the
PC of the stack barrier function. When "returned" to, the stack
barrier function records how far the stack has unwound and jumps to
the original return PC for that point in the stack. Then the second
scan only needs to proceed as far as the lowest barrier that hasn't
been hit.

For deeply recursive programs, this substantially reduces mark
termination time (and hence pause time). For the goscheme example
linked in issue #10898, prior to this change, mark termination times
were typically between 100 and 500ms; with this change, mark
termination times are typically between 10 and 20ms. As a result of
the reduced stack scanning work, this reduces overall execution time
of the goscheme example by 20%.

Fixes #10898.

The effect of this on programs that are not deeply recursive is
minimal:

name                   old time/op    new time/op    delta
BinaryTree17              3.16s ± 2%     3.26s ± 1%  +3.31%  (p=0.000 n=19+19)
Fannkuch11                2.42s ± 1%     2.48s ± 1%  +2.24%  (p=0.000 n=17+19)
FmtFprintfEmpty          50.0ns ± 3%    49.8ns ± 1%    ~     (p=0.534 n=20+19)
FmtFprintfString          173ns ± 0%     175ns ± 0%  +1.49%  (p=0.000 n=16+19)
FmtFprintfInt             170ns ± 1%     175ns ± 1%  +2.97%  (p=0.000 n=20+19)
FmtFprintfIntInt          288ns ± 0%     295ns ± 0%  +2.73%  (p=0.000 n=16+19)
FmtFprintfPrefixedInt     242ns ± 1%     252ns ± 1%  +4.13%  (p=0.000 n=18+18)
FmtFprintfFloat           324ns ± 0%     323ns ± 0%  -0.36%  (p=0.000 n=20+19)
FmtManyArgs              1.14µs ± 0%    1.12µs ± 1%  -1.01%  (p=0.000 n=18+19)
GobDecode                8.88ms ± 1%    8.87ms ± 0%    ~     (p=0.480 n=19+18)
GobEncode                6.80ms ± 1%    6.85ms ± 0%  +0.82%  (p=0.000 n=20+18)
Gzip                      363ms ± 1%     363ms ± 1%    ~     (p=0.077 n=18+20)
Gunzip                   90.6ms ± 0%    90.0ms ± 1%  -0.71%  (p=0.000 n=17+18)
HTTPClientServer         51.5µs ± 1%    50.8µs ± 1%  -1.32%  (p=0.000 n=18+18)
JSONEncode               17.0ms ± 0%    17.1ms ± 0%  +0.40%  (p=0.000 n=18+17)
JSONDecode               61.8ms ± 0%    63.8ms ± 1%  +3.11%  (p=0.000 n=18+17)
Mandelbrot200            3.84ms ± 0%    3.84ms ± 1%    ~     (p=0.583 n=19+19)
GoParse                  3.71ms ± 1%    3.72ms ± 1%    ~     (p=0.159 n=18+19)
RegexpMatchEasy0_32       100ns ± 0%     100ns ± 1%  -0.19%  (p=0.033 n=17+19)
RegexpMatchEasy0_1K       342ns ± 1%     331ns ± 0%  -3.41%  (p=0.000 n=19+19)
RegexpMatchEasy1_32      82.5ns ± 0%    81.7ns ± 0%  -0.98%  (p=0.000 n=18+18)
RegexpMatchEasy1_1K       505ns ± 0%     494ns ± 1%  -2.16%  (p=0.000 n=18+18)
RegexpMatchMedium_32      137ns ± 1%     137ns ± 1%  -0.24%  (p=0.048 n=20+18)
RegexpMatchMedium_1K     41.6µs ± 0%    41.3µs ± 1%  -0.57%  (p=0.004 n=18+20)
RegexpMatchHard_32       2.11µs ± 0%    2.11µs ± 1%  +0.20%  (p=0.037 n=17+19)
RegexpMatchHard_1K       63.9µs ± 2%    63.3µs ± 0%  -0.99%  (p=0.000 n=20+17)
Revcomp                   560ms ± 1%     522ms ± 0%  -6.87%  (p=0.000 n=18+16)
Template                 75.0ms ± 0%    75.1ms ± 1%  +0.18%  (p=0.013 n=18+19)
TimeParse                 358ns ± 1%     364ns ± 0%  +1.74%  (p=0.000 n=20+15)
TimeFormat                360ns ± 0%     372ns ± 0%  +3.55%  (p=0.000 n=20+18)

Change-Id: If8a9bfae6c128d15a4f405e02bcfa50129df82a2
Reviewed-on: https://go-review.googlesource.com/10314
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-06-02 20:00:57 +00:00
Austin Clements
724f8298a8 runtime: avoid double-scanning of stacks
Currently there's a race between stopg scanning another G's stack and
the G reaching a preemption point and scanning its own stack. When
this race occurs, the G's stack is scanned twice. Currently this is
okay, so this race is benign.

However, we will shortly be adding stack barriers during the first
stack scan, so scanning will no longer be idempotent. To prepare for
this, this change ensures that each stack is scanned only once during
each GC phase by checking the flag that indicates that the stack has
been scanned in this phase before scanning the stack.

Change-Id: Id9f4d5e2e5b839bc3f200ec1723a4a12dd677ab4
Reviewed-on: https://go-review.googlesource.com/10458
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-06-02 19:59:05 +00:00
Austin Clements
3f6e69aca5 runtime: steal space for stack barrier tracking from stack
The stack barrier code will need a bookkeeping structure to keep track
of the overwritten return PCs. This commit introduces and allocates
this structure, but does not yet use the structure.

We don't want to allocate space for this structure during garbage
collection, so this commit allocates it along with the allocation of
the corresponding stack. However, we can't do a regular allocation in
newstack because mallocgc may itself grow the stack (which would lead
to a recursive allocation). Hence, this commit makes the bookkeeping
structure part of the stack allocation itself by stealing the
necessary space from the top of the stack allocation. Since the size
of this bookkeeping structure is logarithmic in the size of the stack,
this has minimal impact on stack behavior.

Change-Id: Ia14408be06aafa9ca4867f4e70bddb3fe0e96665
Reviewed-on: https://go-review.googlesource.com/10313
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:57 +00:00
Austin Clements
e610c25df0 runtime: decouple stack bounds and stack allocation size
Currently the runtime assumes that the allocation for the stack is
exactly [stack.lo, stack.hi). We're about to steal a small part of
this allocation for per-stack GC metadata. To prepare for this, this
commit adds a field to the G for the allocated size of the stack.
With this change, stack.lo and stack.hi continue to act as the true
bounds on the stack, but are no longer also used as the bounds on the
stack allocation.

(I also tried this the other way around, where stack.lo and stack.hi
remained the allocation bounds and I introduced a new top of stack.
However, there are far more places that assume stack.hi is the true
top of the stack than there are places that assume it's the top of the
allocation.)

Change-Id: Ifa9d956753be53d286d09cbc73d47fb34a18c0c6
Reviewed-on: https://go-review.googlesource.com/10312
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:50 +00:00
Austin Clements
c02b8911d8 runtime: clean up signalstack API
Currently signalstack takes a lower limit and a length and all calls
hard-code the passed length. Change the API to take a *stack and
compute the lower limit and length from the passed stack.

This will make it easier for the runtime to steal some space from the
top of the stack since it eliminates the hard-coded stack sizes.

Change-Id: I7d2a9f45894b221f4e521628c2165530bbc57d53
Reviewed-on: https://go-review.googlesource.com/10311
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:42 +00:00
Austin Clements
cc6a7fce53 runtime: increase precision of gctrace times
Currently we truncate gctrace clock and CPU times to millisecond
precision. As a result, many phases are typically printed as 0, which
is fine for user consumption, but makes gathering statistics and
reports over GC traces difficult.

In 1.4, the gctrace line printed times in microseconds. This was
better for statistics, but not as easy for users to read or interpret,
and it generally made the trace lines longer.

This change strikes a balance between these extremes by printing
milliseconds, but including the decimal part to two significant
figures down to microsecond precision. This remains easy to read and
interpret, but includes more precision when it's useful.

For example, where the code currently prints,

gc #29 @1.629s 0%: 0+2+0+12+0 ms clock, 0+2+0+0/12/0+0 ms cpu, 4->4->2 MB, 4 MB goal, 1 P

this prints,

gc #29 @1.629s 0%: 0.005+2.1+0+12+0.29 ms clock, 0.005+2.1+0+0/12/0+0.29 ms cpu, 4->4->2 MB, 4 MB goal, 1 P

Fixes #10970.

Change-Id: I249624779433927cd8b0947b986df9060c289075
Reviewed-on: https://go-review.googlesource.com/10554
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 18:31:36 +00:00
Mikio Hara
1fa0a8cec5 runtime: fix data race in BenchmarkChanPopular
Fixes #11014.

Change-Id: I9a18dacd10564d3eaa1fea4d77f1a48e08e79f53
Reviewed-on: https://go-review.googlesource.com/10563
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-02 11:16:01 +00:00
Austin Clements
df2809f04e runtime: document that runtime.GC() blocks until GC is complete
runtime.GC() is intentionally very weakly specified. However, it is so
weakly specified that it's difficult to know that it's being used
correctly for its one intended use case: to ensure garbage collection
has run in a test that is garbage-sensitive. In particular, it is
unclear whether it is synchronous or asynchronous. In the old STW
collector this was essentially self-evident; short of queuing up a
garbage collection to run later, it had to be synchronous. However,
with the concurrent collector, there's evidence that people are
inferring that it may be asynchronous (e.g., issue #10986), as this is
both unclear in the documentation and possible in the implementation.

In fact, runtime.GC() runs a fully synchronous STW collection. We
probably don't want to commit to this exact behavior. But we can
commit to the essential property that tests rely on: that runtime.GC()
does not return until the GC has finished.

Change-Id: Ifc3045a505e1898ecdbe32c1f7e80e2e9ffacb5b
Reviewed-on: https://go-review.googlesource.com/10488
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-06-01 14:51:12 +00:00
Austin Clements
f2c3957ed8 runtime: disable GC around TestGoroutineParallelism
TestGoroutineParallelism can deadlock if the GC runs during the
test. Currently it tries to prevent this by forcing a GC before the
test, but this is best effort and fails completely if GOGC is very low
for testing.

This change replaces this best-effort fix with simply setting GOGC to
off for the duration of the test.

Change-Id: I8229310833f241b149ebcd32845870c1cb14e9f8
Reviewed-on: https://go-review.googlesource.com/10454
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-28 17:40:19 +00:00
Austin Clements
4a1957d0aa runtime: use stripped test environment for TestGdbPython
Most runtime tests that invoke the compiler to build a sub-test binary
do so with a special environment constructed by testEnv that strips
out environment variables that should apply to the test but not to the
build.

Fix TestGdbPython to use this test environment when invoking go build,
like other tests do.

Change-Id: Iafdf89d4765c587cbebc427a5d61cb8a7e71b326
Reviewed-on: https://go-review.googlesource.com/10455
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-28 17:39:08 +00:00
Elias Naur
8017ace496 runtime: don't always block all signals on OpenBSD
Implement the changes from CL 10173 on OpenBSD.

Change-Id: I2db1cd8141fd392a34753a1b8113e2e0401173b9
Reviewed-on: https://go-review.googlesource.com/10342
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-23 17:42:43 +00:00
Elias Naur
84cfba17c2 runtime: don't always unblock all signals
Ian proposed an improved way of handling signals masks in Go, motivated
by a problem where the Android java runtime expects certain signals to
be blocked for all JVM threads. Discussion here

https://groups.google.com/forum/#!topic/golang-dev/_TSCkQHJt6g

Ian's text is used in the following:

A Go program always needs to have the synchronous signals enabled.
These are the signals for which _SigPanic is set in sigtable, namely
SIGSEGV, SIGBUS, SIGFPE.

A Go program that uses the os/signal package, and calls signal.Notify,
needs to have at least one thread which is not blocking that signal,
but it doesn't matter much which one.

Unix programs do not change signal mask across execve.  They inherit
signal masks across fork.  The shell uses this fact to some extent;
for example, the job control signals (SIGTTIN, SIGTTOU, SIGTSTP) are
blocked for commands run due to backquote quoting or $().

Our current position on signal masks was not thought out.  We wandered
into step by step, e.g., http://golang.org/cl/7323067 .

This CL does the following:

Introduce a new platform hook, msigsave, that saves the signal mask of
the current thread to m.sigsave.

Call msigsave from needm and newm.

In minit grab set up the signal mask from m.sigsave and unblock the
essential synchronous signals, and SIGILL, SIGTRAP, SIGPROF, SIGSTKFLT
(for systems that have it).

In unminit, restore the signal mask from m.sigsave.

The first time that os/signal.Notify is called, start a new thread whose
only purpose is to update its signal mask to make sure signals for
signal.Notify are unblocked on at least one thread.

The effect on Go programs will be that if they are invoked with some
non-synchronous signals blocked, those signals will normally be
ignored.  Previously, those signals would mostly be ignored.  A change
in behaviour will occur for programs started with any of these signals
blocked, if they receive the signal: SIGHUP, SIGINT, SIGQUIT, SIGABRT,
SIGTERM.  Previously those signals would always cause a crash (unless
using the os/signal package); with this change, they will be ignored
if the program is started with the signal blocked (and does not use
the os/signal package).

./all.bash completes successfully on linux/amd64.

OpenBSD is missing the implementation.

Change-Id: I188098ba7eb85eae4c14861269cc466f2aa40e8c
Reviewed-on: https://go-review.googlesource.com/10173
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-22 20:24:08 +00:00
Russ Cox
001438bdfe runtime: fix callwritebarrier
Given a call frame F of size N where the return values start at offset R,
callwritebarrier was instructing heapBitsBulkBarrier to scan the block
of memory [F+R, F+R+N). It should only scan [F+R, F+N). The extra N-R
bytes scanned might lead into the next allocated block in memory.
Because the scan was consulting the heap bitmap for type information,
scanning into the next block normally "just worked" in the sense of
not crashing.

Scanning the extra N-R bytes of memory is a problem mainly because
it causes the GC to consider pointers that might otherwise not be
considered, leading it to retain objects that should actually be freed.
This is very difficult to detect.

Luckily, juju turned up a case where the heap bitmap and the memory
were out of sync for the block immediately after the call frame, so that
heapBitsBulkBarrier saw an obvious non-pointer where it expected a
pointer, causing a loud crash.

Why is there a non-pointer in memory that the heap bitmap records as
a pointer? That is more difficult to answer. At least one way that it
could happen is that allocations containing no pointers at all do not
update the heap bitmap. So if heapBitsBulkBarrier walked out of the
current object and into a no-pointer object and consulted those bitmap
bits, it would be misled. This doesn't happen in general because all
the paths to heapBitsBulkBarrier first check for the no-pointer case.
This may or may not be what happened, but it's the only scenario
I've been able to construct.

I tried for quite a while to write a simple test for this and could not.
It does fix the juju crash, and it is clearly an improvement over the
old code.

Fixes #10844.

Change-Id: I53982c93ef23ef93155c4086bbd95a4c4fdaac9a
Reviewed-on: https://go-review.googlesource.com/10317
Reviewed-by: Austin Clements <austin@google.com>
2015-05-21 19:14:03 +00:00
Austin Clements
a5c3bbe0b4 runtime: eliminate write barrier from adjustpointers
Currently adjustpointers invokes a write barrier for every stack slot
it updates. This is safe---the write barrier always does nothing
because the new value is never a heap pointer---but it's unnecessary
overhead in performance and complexity.

Fix this by rewriting adjustpointers to work with *uintptrs instead of
*unsafe.Pointers. As an added bonus, this makes the code cleaner.

name                   old mean              new mean              delta
BinaryTree17            3.35s × (0.98,1.01)   3.33s × (0.99,1.02)    ~    (p=0.095 n=20+19)
Fannkuch11              2.49s × (1.00,1.01)   2.52s × (0.99,1.01)  +1.23% (p=0.000 n=19+20)
FmtFprintfEmpty        52.2ns × (0.99,1.02)  52.2ns × (0.99,1.02)    ~    (p=0.766 n=19+19)
FmtFprintfString        181ns × (0.99,1.02)   179ns × (0.99,1.01)  -1.06% (p=0.000 n=20+19)
FmtFprintfInt           177ns × (0.99,1.01)   173ns × (0.99,1.02)  -2.26% (p=0.000 n=17+20)
FmtFprintfIntInt        300ns × (0.99,1.01)   302ns × (0.99,1.01)  +0.76% (p=0.000 n=19+20)
FmtFprintfPrefixedInt   253ns × (0.99,1.02)   256ns × (0.99,1.01)  +0.96% (p=0.000 n=20+19)
FmtFprintfFloat         334ns × (0.99,1.02)   334ns × (1.00,1.01)    ~    (p=0.243 n=20+19)
FmtManyArgs            1.16µs × (0.99,1.01)  1.17µs × (0.99,1.02)  +0.88% (p=0.000 n=20+20)
GobDecode              9.16ms × (0.99,1.02)  9.18ms × (1.00,1.00)  +0.21% (p=0.048 n=20+17)
GobEncode              7.03ms × (0.99,1.01)  7.05ms × (0.99,1.01)    ~    (p=0.091 n=19+19)
Gzip                    374ms × (0.99,1.01)   372ms × (0.99,1.02)  -0.50% (p=0.008 n=18+20)
Gunzip                 92.9ms × (0.99,1.01)  92.5ms × (1.00,1.01)  -0.47% (p=0.002 n=19+19)
HTTPClientServer       53.1µs × (0.98,1.01)  52.5µs × (0.99,1.01)  -0.98% (p=0.000 n=20+19)
JSONEncode             17.4ms × (0.99,1.02)  17.5ms × (0.99,1.01)    ~    (p=0.061 n=19+20)
JSONDecode             66.0ms × (0.99,1.02)  64.7ms × (0.99,1.01)  -1.87% (p=0.000 n=20+20)
Mandelbrot200          3.94ms × (1.00,1.01)  3.95ms × (1.00,1.01)    ~    (p=0.799 n=18+19)
GoParse                3.89ms × (0.99,1.02)  3.86ms × (0.99,1.01)  -0.70% (p=0.016 n=20+19)
RegexpMatchEasy0_32     102ns × (0.99,1.02)   102ns × (1.00,1.01)    ~    (p=0.557 n=20+18)
RegexpMatchEasy0_1K     353ns × (0.99,1.02)   341ns × (0.99,1.01)  -3.38% (p=0.000 n=20+20)
RegexpMatchEasy1_32    85.0ns × (0.99,1.02)  85.0ns × (0.99,1.01)    ~    (p=0.851 n=19+20)
RegexpMatchEasy1_1K     521ns × (0.99,1.02)   506ns × (1.00,1.01)  -2.85% (p=0.000 n=20+18)
RegexpMatchMedium_32    142ns × (0.99,1.02)   141ns × (1.00,1.01)  -1.17% (p=0.000 n=20+19)
RegexpMatchMedium_1K   42.8µs × (0.99,1.01)  42.3µs × (0.99,1.01)  -1.07% (p=0.000 n=20+19)
RegexpMatchHard_32     2.17µs × (0.99,1.01)  2.16µs × (1.00,1.01)  -0.51% (p=0.042 n=20+18)
RegexpMatchHard_1K     65.6µs × (0.99,1.01)  64.8µs × (1.00,1.00)  -1.21% (p=0.000 n=20+17)
Revcomp                 581ms × (0.99,1.04)   536ms × (1.00,1.01)  -7.71% (p=0.000 n=20+18)
Template               77.2ms × (0.99,1.01)  76.8ms × (0.99,1.01)    ~    (p=0.426 n=20+18)
TimeParse               369ns × (0.99,1.02)   371ns × (1.00,1.01)    ~    (p=0.117 n=20+19)
TimeFormat              371ns × (0.99,1.02)   391ns × (0.99,1.01)  +5.33% (p=0.000 n=20+19)

Change-Id: I5b952ba577ac4365c8c87db837c5804a1e30b7be
Reviewed-on: https://go-review.googlesource.com/10293
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-21 18:35:49 +00:00
Rick Hudson
5b66e5d0d8 runtime: turn work buffer tracing off by default
During development we ran with monitoring code turned
on by default. This CL turns the work buffer monitoring
off. Performance change on most go1 benchmarks is small
or insignificant.

name                   old mean              new mean              delta
BinaryTree17            3.35s × (0.99,1.01)   3.35s × (0.99,1.01)    ~    (p=0.841 n=5+5)
Fannkuch11              2.59s × (1.00,1.01)   2.55s × (1.00,1.00)  -1.65% (p=0.008 n=5+5)
FmtFprintfEmpty        52.5ns × (0.99,1.02)  53.2ns × (0.98,1.01)    ~    (p=0.063 n=5+5)
FmtFprintfString        181ns × (1.00,1.00)   180ns × (1.00,1.00)  -0.55% (p=0.029 n=4+4)
FmtFprintfInt           176ns × (1.00,1.01)   174ns × (1.00,1.00)  -0.91% (p=0.000 n=5+4)
FmtFprintfIntInt        298ns × (1.00,1.00)   299ns × (1.00,1.00)    ~    (p=0.143 n=4+4)
FmtFprintfPrefixedInt   250ns × (1.00,1.01)   246ns × (1.00,1.00)  -1.68% (p=0.000 n=5+4)
FmtFprintfFloat         340ns × (1.00,1.00)   340ns × (1.00,1.01)    ~    (p=0.643 n=5+5)
FmtManyArgs            1.16µs × (1.00,1.00)  1.15µs × (1.00,1.00)  -0.47% (p=0.016 n=5+5)
GobDecode              9.22ms × (1.00,1.00)  9.23ms × (1.00,1.00)    ~    (p=0.841 n=5+5)
GobEncode              7.00ms × (1.00,1.01)  7.09ms × (0.99,1.01)  +1.26% (p=0.016 n=5+5)
Gzip                    387ms × (1.00,1.00)   389ms × (0.99,1.02)    ~    (p=1.000 n=5+5)
Gunzip                 97.8ms × (1.00,1.00)  98.3ms × (1.00,1.00)  +0.51% (p=0.016 n=5+4)
HTTPClientServer       52.6µs × (1.00,1.01)  52.7µs × (1.00,1.01)    ~    (p=1.000 n=5+5)
JSONEncode             18.0ms × (0.99,1.02)  17.9ms × (1.00,1.00)    ~    (p=0.310 n=5+5)
JSONDecode             64.8ms × (0.99,1.02)  63.6ms × (1.00,1.00)  -1.94% (p=0.008 n=5+5)
Mandelbrot200          4.05ms × (1.00,1.00)  4.05ms × (1.00,1.00)    ~    (p=0.421 n=5+5)
GoParse                3.86ms × (1.00,1.01)  3.84ms × (0.99,1.01)    ~    (p=0.421 n=5+5)
RegexpMatchEasy0_32     101ns × (1.00,1.00)   102ns × (0.99,1.02)    ~    (p=0.238 n=4+5)
RegexpMatchEasy0_1K     346ns × (1.00,1.01)   345ns × (1.00,1.00)    ~    (p=0.333 n=5+4)
RegexpMatchEasy1_32    87.3ns × (0.99,1.02)  87.4ns × (1.00,1.00)    ~    (p=0.190 n=5+4)
RegexpMatchEasy1_1K     520ns × (1.00,1.00)   520ns × (1.00,1.01)    ~    (p=1.000 n=4+5)
RegexpMatchMedium_32    143ns × (1.00,1.00)   142ns × (1.00,1.00)  -0.70% (p=0.029 n=4+4)
RegexpMatchMedium_1K   43.2µs × (1.00,1.01)  43.2µs × (1.00,1.00)    ~    (p=0.841 n=5+5)
RegexpMatchHard_32     2.24µs × (1.00,1.01)  2.23µs × (1.00,1.01)  -0.63% (p=0.048 n=5+5)
RegexpMatchHard_1K     68.7µs × (1.00,1.00)  68.3µs × (1.00,1.00)  -0.56% (p=0.008 n=5+5)
Revcomp                 577ms × (1.00,1.01)   579ms × (1.00,1.00)    ~    (p=0.151 n=5+5)
Template               74.9ms × (1.00,1.00)  76.5ms × (1.00,1.00)  +2.11% (p=0.008 n=5+5)
TimeParse               359ns × (1.00,1.00)   362ns × (1.00,1.00)  +0.72% (p=0.008 n=5+5)
TimeFormat              369ns × (1.00,1.00)   371ns × (1.00,1.01)    ~    (p=0.071 n=5+5)

Change-Id: I4206a3f77a3d1450966b7a62ea7597aec44cb72f
Reviewed-on: https://go-review.googlesource.com/10294
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-21 16:09:24 +00:00
Austin Clements
719efc70eb runtime: make runtime.callers walk calling G, not g0
Currently runtime.callers invokes gentraceback with the pc and sp of
the G it is called from, but always passes g0 even if it was called
from a regular g. Right now this has no ill effects because
runtime.callers does not use either callback argument or the
_TraceJumpStack flag, but it makes the code fragile and will break
some upcoming changes.

Fix this by lifting the getg() call outside of the systemstack in
runtime.callers.

Change-Id: I4e1e927961c0e0cd4dcf28693be47df7bae9e122
Reviewed-on: https://go-review.googlesource.com/10292
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-21 16:06:37 +00:00
Rick Hudson
197aa9e64d runtime: remove unused quiesce code
This is dead code. If you want to quiesce the system the
preferred way is to use forEachP(func(*p){}).

Change-Id: Ic7677a5dd55e3639b99e78ddeb2c71dd1dd091fa
Reviewed-on: https://go-review.googlesource.com/10267
Reviewed-by: Austin Clements <austin@google.com>
2015-05-20 17:56:44 +00:00
Rick Hudson
913db7685e runtime: run background mark helpers only if work is available
Prior to this CL whenever the GC marking was enabled and
a P was looking for work we supplied a G to help
the GC do its marking tasks. Once this G finished all
the marking available it would release the P to find another
available G. In the case where there was no work the P would drop
into findrunnable which would execute the mark helper G which would
immediately return and the P would drop into findrunnable again repeating
the process. Since the P was always given a G to run it never blocks.
This CL first checks if the GC mark helper G has available work and if
not the P immediately falls through to its blocking logic.

Fixes #10901

Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5
Reviewed-on: https://go-review.googlesource.com/10189
Reviewed-by: Austin Clements <austin@google.com>
2015-05-19 15:57:50 +00:00
Austin Clements
f4d51eb2f5 runtime: minor clean up to heapminimum
Currently setGCPercent sets heapminimum to heapminimum*GOGC/100. The
real intent is to set heapminimum to a scaled multiple of a fixed
default heap minimum, not to scale heapminimum based on its current
value. This turns out to be okay because setGCPercent is only called
once and heapminimum is initially set to this default heap minimum.
However, the code as written is confusing, especially since
setGCPercent is otherwise written so it could be called again to
change GOGC. Fix this by introducing a defaultHeapMinimum constant and
using this instead of the current value of heapminimum to compute the
scaled heap minimum.

As part of this, this commit improves the documentation on
heapminimum.

Change-Id: I4eb82c73dc2eb44a6e5a17c780a747a2e73d7493
Reviewed-on: https://go-review.googlesource.com/10181
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-19 15:30:34 +00:00
Russ Cox
8903b3db0e runtime: add fast check for self-loop pointer in scanobject
Addresses a problem reported on the mailing list.

This will come up mainly in programs custom allocators that batch allocations,
but it still helps in our programs, which mainly do not have such allocations.

name                   old mean              new mean              delta
BinaryTree17            5.95s × (0.97,1.03)   5.93s × (0.97,1.04)    ~    (p=0.613)
Fannkuch11              4.46s × (0.98,1.04)   4.33s × (0.99,1.01)  -2.93% (p=0.000)
FmtFprintfEmpty        86.6ns × (0.98,1.03)  86.8ns × (0.98,1.02)    ~    (p=0.523)
FmtFprintfString        290ns × (0.98,1.05)   287ns × (0.98,1.03)    ~    (p=0.061)
FmtFprintfInt           271ns × (0.98,1.04)   286ns × (0.99,1.01)  +5.54% (p=0.000)
FmtFprintfIntInt        495ns × (0.98,1.04)   489ns × (0.99,1.01)  -1.24% (p=0.015)
FmtFprintfPrefixedInt   391ns × (0.99,1.02)   407ns × (0.99,1.01)  +4.00% (p=0.000)
FmtFprintfFloat         578ns × (0.99,1.01)   559ns × (0.99,1.01)  -3.35% (p=0.000)
FmtManyArgs            1.96µs × (0.98,1.05)  1.94µs × (0.99,1.01)  -1.33% (p=0.030)
GobDecode              15.9ms × (0.97,1.05)  15.7ms × (0.99,1.01)  -1.35% (p=0.044)
GobEncode              11.4ms × (0.97,1.05)  11.3ms × (0.98,1.03)    ~    (p=0.141)
Gzip                    658ms × (0.98,1.05)   648ms × (0.99,1.01)  -1.59% (p=0.009)
Gunzip                  144ms × (0.99,1.03)   144ms × (0.99,1.01)    ~    (p=0.867)
HTTPClientServer       92.1µs × (0.97,1.05)  90.3µs × (0.99,1.01)  -1.89% (p=0.005)
JSONEncode             31.0ms × (0.96,1.07)  30.2ms × (0.98,1.03)  -2.66% (p=0.001)
JSONDecode              110ms × (0.97,1.04)   107ms × (0.99,1.01)  -2.59% (p=0.000)
Mandelbrot200          6.15ms × (0.98,1.04)  6.07ms × (0.99,1.02)  -1.32% (p=0.045)
GoParse                6.79ms × (0.97,1.04)  6.74ms × (0.97,1.04)    ~    (p=0.242)
RegexpMatchEasy0_32     158ns × (0.98,1.05)   155ns × (0.99,1.01)  -1.64% (p=0.010)
RegexpMatchEasy0_1K     548ns × (0.97,1.04)   540ns × (0.99,1.01)  -1.34% (p=0.042)
RegexpMatchEasy1_32     133ns × (0.97,1.04)   132ns × (0.97,1.05)    ~    (p=0.466)
RegexpMatchEasy1_1K     899ns × (0.96,1.05)   878ns × (0.99,1.01)  -2.32% (p=0.002)
RegexpMatchMedium_32    250ns × (0.96,1.03)   243ns × (0.99,1.01)  -2.90% (p=0.000)
RegexpMatchMedium_1K   73.4µs × (0.98,1.04)  73.0µs × (0.98,1.04)    ~    (p=0.411)
RegexpMatchHard_32     3.87µs × (0.97,1.07)  3.84µs × (0.98,1.04)    ~    (p=0.273)
RegexpMatchHard_1K      120µs × (0.97,1.08)   117µs × (0.99,1.01)  -2.06% (p=0.010)
Revcomp                 940ms × (0.96,1.07)   924ms × (0.97,1.07)    ~    (p=0.071)
Template                128ms × (0.96,1.05)   128ms × (0.99,1.01)    ~    (p=0.502)
TimeParse               632ns × (0.96,1.07)   616ns × (0.99,1.01)  -2.58% (p=0.001)
TimeFormat              671ns × (0.97,1.06)   657ns × (0.99,1.02)  -2.10% (p=0.002)

In contrast to the one in test/bench/go1 (above), the binarytree program on the
shootout site uses more goroutines, batches allocations, and sets GOMAXPROCS
to runtime.NumCPU()*2.

Using that version, before vs after:

name          old mean             new mean             delta
BinaryTree20  18.6s × (0.96,1.05)  11.3s × (0.98,1.02)  -39.46% (p=0.000)

And Go 1.4 vs after:

name          old mean             new mean             delta
BinaryTree20  13.0s × (0.97,1.02)  11.3s × (0.98,1.02)  -13.21% (p=0.000)

There is still a scheduling problem - the raw run times are hiding the fact that
this chews up 2x the CPU - but we'll take care of that separately.

Change-Id: I3f5da879b24ae73a0d06745381ffb88c3744948b
Reviewed-on: https://go-review.googlesource.com/10220
Reviewed-by: Austin Clements <austin@google.com>
2015-05-19 15:29:40 +00:00
Josh Bleecher Snyder
79986e24e0 runtime/pprof: write heap statistics to heap profile always
This is a duplicate of CL 9491.
That CL broke the build due to pprof shortcomings
and was reverted in CL 9565.

CL 9623 fixed pprof, so this can go in again.

Fixes #10659.

Change-Id: If470fc90b3db2ade1d161b4417abd2f5c6c330b8
Reviewed-on: https://go-review.googlesource.com/10212
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2015-05-18 20:02:21 +00:00
Austin Clements
f0dd002895 runtime: use separate count and note for forEachP
Currently, forEachP reuses the stopwait and stopnote fields from
stopTheWorld to track how many Ps have not responded to the safe-point
request and to sleep until all Ps have responded.

It was assumed this was safe because both stopTheWorld and forEachP
must occur under the worlsema and hence stopwait and stopnote cannot
be used for both purposes simultaneously and callers could always
determine the appropriate use based on sched.gcwaiting (which is only
set by stopTheWorld). However, this is not the case, since it's
possible for there to be a window between when an M observes that
gcwaiting is set and when it checks stopwait during which stopwait
could have changed meanings. When this happens, the M decrements
stopwait and may wakeup stopnote, but does not otherwise participate
in the forEachP protocol. As a result, stopwait is decremented too
many times, so it may reach zero before all Ps have run the safe-point
function, causing forEachP to wake up early. It will then either
observe that some P has not run the safe-point function and panic with
"P did not run fn", or the remaining P (or Ps) will run the safe-point
function before it wakes up and it will observe that stopwait is
negative and panic with "not stopped".

Fix this problem by giving forEachP its own safePointWait and
safePointNote fields.

One known sequence of events that can cause this race is as
follows. It involves three actors:

G1 is running on M1 on P1. P1 has an empty run queue.

G2/M2 is in a blocked syscall and has lost its P. (The details of this
don't matter, it just needs to be in a position where it needs to grab
an idle P.)

GC just started on G3/M3/P3. (These aren't very involved, they just
have to be separate from the other G's, M's, and P's.)

1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1.

Now G1/M1 begins to enter a syscall:

2. G1/M1 invokes reentersyscall, which sets the P1's status to
   _Psyscall.

3. G1/M1's reentersyscall observes gcwaiting != 0 and calls
   entersyscall_gcwait.

4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock.

Back on GC:

5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and
   returns.

6. GC does stuff and then calls startTheWorld().

7. startTheWorld() calls procresize(), which sets P1's status to
   _Pidle and puts P1 on the idle list.

Now G2/M2 returns from its syscall and takes over P1:

8. G2/M2 returns from its blocked syscall and gets P1 from the idle
   list.

9. G2/M2 acquires P1, which sets P1's status to _Prunning.

10. G2/M2 starts a new syscall and invokes reentersyscall, which sets
    P1's status to _Psyscall.

Back on G1/M1:

11. G1/M1 finally acquires sched.lock in entersyscall_gcwait.

At this point, G1/M1 still thinks it's running on P1. P1's status is
_Psyscall, which is consistent with what G1/M1 is doing, but it's
_Psyscall because *G2/M2* put it in to _Psyscall, not G1/M1. This is
basically an ABA race on P1's status.

Because forEachP currently shares stopwait with stopTheWorld. G1/M1's
entersyscall_gcwait observes the non-zero stopwait set by forEachP,
but mistakes it for a stopTheWorld. It cas's P1's status from
_Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement
stopwait one more time than forEachP was expecting.

Fixes #10618. (See the issue for details on why the above race is safe
when forEachP is not involved.)

Prior to this commit, the command
  stress ./runtime.test -test.run TestFutexsleep\|TestGoroutineProfile
would reliably fail after a few hundred runs. With this commit, it
ran for over 2 million runs and never crashed.

Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30
Reviewed-on: https://go-review.googlesource.com/10157
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:47 +00:00
Austin Clements
277acca286 runtime: hold worldsema while starting the world
Currently, startTheWorld releases worldsema before starting the
world. Since startTheWorld can change gomaxprocs after allowing Ps to
run, this means that gomaxprocs can change while another P holds
worldsema.

Unfortunately, the garbage collector and forEachP assume that holding
worldsema protects against changes in gomaxprocs (which it *almost*
does). In particular, this is causing somewhat frequent "P did not run
fn" crashes in forEachP in the runtime tests because gomaxprocs is
changing between the several loops that forEachP does over all the Ps.

Fix this by only releasing worldsema after the world is started.

This relates to issue #10618. forEachP still fails under stress
testing, but much less frequently.

Change-Id: I085d627b70cca9ebe9af28fe73b9872f1bb224ff
Reviewed-on: https://go-review.googlesource.com/10156
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:37 +00:00
Austin Clements
9c44a41dd5 runtime: disallow preemption during startTheWorld
Currently, startTheWorld clears preemptoff for the current M before
starting the world. A few callers increment m.locks around
startTheWorld, presumably to prevent preemption any time during
starting the world. This is almost certainly pointless (none of the
other callers do this), but there's no harm in making startTheWorld
keep preemption disabled until it's all done, which definitely lets us
drop these m.locks manipulations.

Change-Id: I8a93658abd0c72276c9bafa3d2c7848a65b4691a
Reviewed-on: https://go-review.googlesource.com/10155
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:31 +00:00
Austin Clements
a1da255aa0 runtime: factor stoptheworld/starttheworld pattern
There are several steps to stopping and starting the world and
currently they're open-coded in several places. The garbage collector
is the only thing that needs to stop and start the world in a
non-trivial pattern. Replace all other uses with calls to higher-level
functions that implement the entire pattern necessary to stop and
start the world.

This is a pure refectoring and should not change any code semantics.
In the following commits, we'll make changes that are easier to do
with this abstraction in place.

This commit renames the old starttheworld to startTheWorldWithSema.
This is a slight misnomer right now because the callers release
worldsema just before calling this. However, a later commit will swap
these and I don't want to think of another name in the mean time.

Change-Id: I5dc97f87b44fb98963c49c777d7053653974c911
Reviewed-on: https://go-review.googlesource.com/10154
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:25 +00:00
Austin Clements
5f7060afd2 runtime: don't start GC if preemptoff is set
In order to avoid deadlocks, startGC avoids kicking off GC if locks
are held by the calling M. However, it currently fails to check
preemptoff, which is the other way to disable preemption.

Fix this by adding a check for preemptoff.

Change-Id: Ie1083166e5ba4af5c9d6c5a42efdfaaef41ca997
Reviewed-on: https://go-review.googlesource.com/10153
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:18 +00:00
Alex Brainman
e544bee1dd runtime: correct exception stack trace output
It is misleading when stack trace say:

signal arrived during cgo execution

but we are not in cgo call.

Change-Id: I627e2f2bdc7755074677f77f21befc070a101914
Reviewed-on: https://go-review.googlesource.com/9190
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 03:09:45 +00:00
Austin Clements
a0fc306023 runtime: eliminate runqvictims and a copy from runqsteal
Currently, runqsteal steals Gs from another P into an intermediate
buffer and then copies those Gs into the current P's run queue. This
intermediate buffer itself was moved from the stack to the P in commit
c4fe503 to eliminate the cost of zeroing it on every steal.

This commit follows up c4fe503 by stealing directly into the current
P's run queue, which eliminates the copy and the need for the
intermediate buffer. The update to the tail pointer is only committed
once the entire steal operation has succeeded, so the semantics of
stealing do not change.

Change-Id: Icdd7a0eb82668980bf42c0154b51eef6419fdd51
Reviewed-on: https://go-review.googlesource.com/9998
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-17 01:08:42 +00:00
Russ Cox
512f75e8df runtime: replace GC programs with simpler encoding, faster decoder
Small types record the location of pointers in their memory layout
by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries,
and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using
a bitmap for a large type containing arrays does not make sense:
if someone refers to the type [1<<28]*byte in a program in such
a way that the type information makes it into the binary, it would be
a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB
(for 1-bit entries) bitmap full of 1s into the binary or even to keep
one in memory during the execution of the program.

For large types containing arrays, it is much more compact to describe
the locations of pointers using a notation that can express repetition
than to lay out a bitmap of pointers. Go 1.4 included such a notation,
called ``GC programs'' but it was complex, required recursion during
decoding, and was generally slow. Dmitriy measured the execution of
these programs writing directly to the heap bitmap as being 7x slower
than copying from a preunrolled 4-bit mask (and frankly that code was
not terribly fast either). For some tests, unrollgcprog1 was seen costing
as much as 3x more than the rest of malloc combined.

This CL introduces a different form for the GC programs. They use a
simple Lempel-Ziv-style encoding of the 1-bit pointer information,
in which the only operations are (1) emit the following n bits
and (2) repeat the last n bits c more times. This encoding can be
generated directly from the Go type information (using repetition
only for arrays or large runs of non-pointer data) and it can be decoded
very efficiently. In particular the decoding requires little state and
no recursion, so that the entire decoding can run without any memory
accesses other than the reads of the encoding and the writes of the
decoded form to the heap bitmap. For recursive types like arrays of
arrays of arrays, the inner instructions are only executed once, not
n times, so that large repetitions run at full speed. (In contrast, large
repetitions in the old programs repeated the individual bit-level layout
of the inner data over and over.) The result is as much as 25x faster
decoding compared to the old form.

Because the old decoder was so slow, Go 1.4 had three (or so) cases
for how to set the heap bitmap bits for an allocation of a given type:

(1) If the type had an even number of words up to 32 words, then
the 4-bit pointer mask for the type fit in no more than 16 bytes;
store the 4-bit pointer mask directly in the binary and copy from it.

(1b) If the type had an odd number of words up to 15 words, then
the 4-bit pointer mask for the type, doubled to end on a byte boundary,
fit in no more than 16 bytes; store that doubled mask directly in the
binary and copy from it.

(2) If the type had an even number of words up to 128 words,
or an odd number of words up to 63 words (again due to doubling),
then the 4-bit pointer mask would fit in a 64-byte unrolled mask.
Store a GC program in the binary, but leave space in the BSS for
the unrolled mask. Execute the GC program to construct the mask the
first time it is needed, and thereafter copy from the mask.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
(This is the case that was 7x slower than the other two.)

Because the new pointer masks store 1-bit entries instead of 4-bit
entries and because using the decoder no longer carries a significant
overhead, after this CL (that is, for Go 1.5) there are only two cases:

(1) If the type is 128 words or less (no condition about odd or even),
store the 1-bit pointer mask directly in the binary and use it to
initialize the heap bitmap during malloc. (Implemented in CL 9702.)

(2) There is no case 2 anymore.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.

Executing the GC program directly into the heap bitmap (case (3) above)
was disabled for the Go 1.5 dev cycle, both to avoid needing to use
GC programs for typedmemmove and to avoid updating that code as
the heap bitmap format changed. Typedmemmove no longer uses this
type information; as of CL 9886 it uses the heap bitmap directly.
Now that the heap bitmap format is stable, we reintroduce GC programs
and their space savings.

Benchmarks for heapBitsSetType, before this CL vs this CL:

name                    old mean               new mean              delta
SetTypePtr              7.59ns × (0.99,1.02)   5.16ns × (1.00,1.00)  -32.05% (p=0.000)
SetTypePtr8             21.0ns × (0.98,1.05)   21.4ns × (1.00,1.00)     ~    (p=0.179)
SetTypePtr16            24.1ns × (0.99,1.01)   24.6ns × (1.00,1.00)   +2.41% (p=0.001)
SetTypePtr32            31.2ns × (0.99,1.01)   32.4ns × (0.99,1.02)   +3.72% (p=0.001)
SetTypePtr64            45.2ns × (1.00,1.00)   47.2ns × (1.00,1.00)   +4.42% (p=0.000)
SetTypePtr126           75.8ns × (0.99,1.01)   79.1ns × (1.00,1.00)   +4.25% (p=0.000)
SetTypePtr128           74.3ns × (0.99,1.01)   77.6ns × (1.00,1.01)   +4.55% (p=0.000)
SetTypePtrSlice          726ns × (1.00,1.01)    712ns × (1.00,1.00)   -1.95% (p=0.001)
SetTypeNode1            20.0ns × (0.99,1.01)   20.7ns × (1.00,1.00)   +3.71% (p=0.000)
SetTypeNode1Slice        112ns × (1.00,1.00)    113ns × (0.99,1.00)     ~    (p=0.070)
SetTypeNode8            23.9ns × (1.00,1.00)   24.7ns × (1.00,1.01)   +3.18% (p=0.000)
SetTypeNode8Slice        294ns × (0.99,1.02)    287ns × (0.99,1.01)   -2.38% (p=0.015)
SetTypeNode64           52.8ns × (0.99,1.03)   51.8ns × (0.99,1.01)     ~    (p=0.069)
SetTypeNode64Slice      1.13µs × (0.99,1.05)   1.14µs × (0.99,1.00)     ~    (p=0.767)
SetTypeNode64Dead       36.0ns × (1.00,1.01)   32.5ns × (0.99,1.00)   -9.67% (p=0.000)
SetTypeNode64DeadSlice  1.43µs × (0.99,1.01)   1.40µs × (1.00,1.00)   -2.39% (p=0.001)
SetTypeNode124          75.7ns × (1.00,1.01)   79.0ns × (1.00,1.00)   +4.44% (p=0.000)
SetTypeNode124Slice     1.94µs × (1.00,1.01)   2.04µs × (0.99,1.01)   +4.98% (p=0.000)
SetTypeNode126          75.4ns × (1.00,1.01)   77.7ns × (0.99,1.01)   +3.11% (p=0.000)
SetTypeNode126Slice     1.95µs × (0.99,1.01)   2.03µs × (1.00,1.00)   +3.74% (p=0.000)
SetTypeNode128          85.4ns × (0.99,1.01)  122.0ns × (1.00,1.00)  +42.89% (p=0.000)
SetTypeNode128Slice     2.20µs × (1.00,1.01)   2.36µs × (0.98,1.02)   +7.48% (p=0.001)
SetTypeNode130          83.3ns × (1.00,1.00)  123.0ns × (1.00,1.00)  +47.61% (p=0.000)
SetTypeNode130Slice     2.30µs × (0.99,1.01)   2.40µs × (0.98,1.01)   +4.37% (p=0.000)
SetTypeNode1024          498ns × (1.00,1.00)    537ns × (1.00,1.00)   +7.96% (p=0.000)
SetTypeNode1024Slice    15.5µs × (0.99,1.01)   17.8µs × (1.00,1.00)  +15.27% (p=0.000)

The above compares always using a cached pointer mask (and the
corresponding waste of memory) against using the programs directly.
Some slowdown is expected, in exchange for having a better general algorithm.
The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024,
along with the slice variants of those.
It is possible that the cutoff of 128 words (bits) should be raised
in a followup CL, but even with this low cutoff the GC programs are
faster than Go 1.4's "fast path" non-GC program case.

Benchmarks for heapBitsSetType, Go 1.4 vs this CL:

name                    old mean              new mean              delta
SetTypePtr              6.89ns × (1.00,1.00)  5.17ns × (1.00,1.00)  -25.02% (p=0.000)
SetTypePtr8             25.8ns × (0.97,1.05)  21.5ns × (1.00,1.00)  -16.70% (p=0.000)
SetTypePtr16            39.8ns × (0.97,1.02)  24.7ns × (0.99,1.01)  -37.81% (p=0.000)
SetTypePtr32            68.8ns × (0.98,1.01)  32.2ns × (1.00,1.01)  -53.18% (p=0.000)
SetTypePtr64             130ns × (1.00,1.00)    47ns × (1.00,1.00)  -63.67% (p=0.000)
SetTypePtr126            241ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.25% (p=0.000)
SetTypePtr128           2.07µs × (1.00,1.00)  0.08µs × (1.00,1.00)  -96.27% (p=0.000)
SetTypePtrSlice         1.05µs × (0.99,1.01)  0.72µs × (0.99,1.02)  -31.70% (p=0.000)
SetTypeNode1            16.0ns × (0.99,1.01)  20.8ns × (0.99,1.03)  +29.91% (p=0.000)
SetTypeNode1Slice        184ns × (0.99,1.01)   112ns × (0.99,1.01)  -39.26% (p=0.000)
SetTypeNode8            29.5ns × (0.97,1.02)  24.6ns × (1.00,1.00)  -16.50% (p=0.000)
SetTypeNode8Slice        624ns × (0.98,1.02)   285ns × (1.00,1.00)  -54.31% (p=0.000)
SetTypeNode64            135ns × (0.96,1.08)    52ns × (0.99,1.02)  -61.32% (p=0.000)
SetTypeNode64Slice      3.83µs × (1.00,1.00)  1.14µs × (0.99,1.01)  -70.16% (p=0.000)
SetTypeNode64Dead        134ns × (0.99,1.01)    32ns × (1.00,1.01)  -75.74% (p=0.000)
SetTypeNode64DeadSlice  3.83µs × (0.99,1.00)  1.40µs × (1.00,1.01)  -63.42% (p=0.000)
SetTypeNode124           240ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.05% (p=0.000)
SetTypeNode124Slice     7.27µs × (1.00,1.00)  2.04µs × (1.00,1.00)  -71.95% (p=0.000)
SetTypeNode126          2.06µs × (0.99,1.01)  0.08µs × (0.99,1.01)  -96.23% (p=0.000)
SetTypeNode126Slice     64.4µs × (1.00,1.00)   2.0µs × (1.00,1.00)  -96.85% (p=0.000)
SetTypeNode128          2.09µs × (1.00,1.01)  0.12µs × (1.00,1.00)  -94.15% (p=0.000)
SetTypeNode128Slice     65.4µs × (1.00,1.00)   2.4µs × (0.99,1.03)  -96.39% (p=0.000)
SetTypeNode130          2.11µs × (1.00,1.00)  0.12µs × (1.00,1.00)  -94.18% (p=0.000)
SetTypeNode130Slice     66.3µs × (1.00,1.00)   2.4µs × (0.97,1.08)  -96.34% (p=0.000)
SetTypeNode1024         16.0µs × (1.00,1.01)   0.5µs × (1.00,1.00)  -96.65% (p=0.000)
SetTypeNode1024Slice     512µs × (1.00,1.00)    18µs × (0.98,1.04)  -96.45% (p=0.000)

SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation.
Both Go 1.4 and this CL are using pointer bitmaps for this case,
so that's an overall 3x speedup for using pointer bitmaps.

SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation.
Both Go 1.4 and this CL are running the GC program for this case,
so that's an overall 17x speedup when using GC programs (and
I've seen >20x on other systems).

Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against
this CL's SetTypeNode128 (GC program), the slow path in the
code in this CL is 2x faster than the fast path in Go 1.4.

The Go 1 benchmarks are basically unaffected compared to just before this CL.

Go 1 benchmarks, before this CL vs this CL:

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.97,1.04)   5.91s × (0.96,1.04)    ~    (p=0.306)
Fannkuch11              4.38s × (1.00,1.00)   4.37s × (1.00,1.01)  -0.22% (p=0.006)
FmtFprintfEmpty        90.7ns × (0.97,1.10)  89.3ns × (0.96,1.09)    ~    (p=0.280)
FmtFprintfString        282ns × (0.98,1.04)   287ns × (0.98,1.07)  +1.72% (p=0.039)
FmtFprintfInt           269ns × (0.99,1.03)   282ns × (0.97,1.04)  +4.87% (p=0.000)
FmtFprintfIntInt        478ns × (0.99,1.02)   481ns × (0.99,1.02)  +0.61% (p=0.048)
FmtFprintfPrefixedInt   399ns × (0.98,1.03)   400ns × (0.98,1.05)    ~    (p=0.533)
FmtFprintfFloat         563ns × (0.99,1.01)   570ns × (1.00,1.01)  +1.37% (p=0.000)
FmtManyArgs            1.89µs × (0.99,1.01)  1.92µs × (0.99,1.02)  +1.88% (p=0.000)
GobDecode              15.2ms × (0.99,1.01)  15.2ms × (0.98,1.05)    ~    (p=0.609)
GobEncode              11.6ms × (0.98,1.03)  11.9ms × (0.98,1.04)  +2.17% (p=0.000)
Gzip                    648ms × (0.99,1.01)   648ms × (1.00,1.01)    ~    (p=0.835)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.169)
HTTPClientServer       90.5µs × (0.98,1.03)  91.5µs × (0.98,1.04)  +1.04% (p=0.045)
JSONEncode             31.5ms × (0.98,1.03)  31.4ms × (0.98,1.03)    ~    (p=0.549)
JSONDecode              111ms × (0.99,1.01)   107ms × (0.99,1.01)  -3.21% (p=0.000)
Mandelbrot200          6.01ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.878)
GoParse                6.54ms × (0.99,1.02)  6.61ms × (0.99,1.03)  +1.08% (p=0.004)
RegexpMatchEasy0_32     160ns × (1.00,1.01)   161ns × (1.00,1.00)  +0.40% (p=0.000)
RegexpMatchEasy0_1K     560ns × (0.99,1.01)   559ns × (0.99,1.01)    ~    (p=0.088)
RegexpMatchEasy1_32     138ns × (0.99,1.01)   138ns × (1.00,1.00)    ~    (p=0.380)
RegexpMatchEasy1_1K     877ns × (1.00,1.00)   878ns × (1.00,1.00)    ~    (p=0.157)
RegexpMatchMedium_32    251ns × (0.99,1.00)   251ns × (1.00,1.01)  +0.28% (p=0.021)
RegexpMatchMedium_1K   72.6µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.539)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.378)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.067)
Revcomp                 904ms × (0.99,1.02)   904ms × (0.99,1.01)    ~    (p=0.943)
Template                125ms × (0.99,1.02)   127ms × (0.99,1.01)  +1.79% (p=0.000)
TimeParse               627ns × (0.99,1.01)   622ns × (0.99,1.01)  -0.88% (p=0.000)
TimeFormat              655ns × (0.99,1.02)   655ns × (0.99,1.02)    ~    (p=0.976)

For the record, Go 1 benchmarks, Go 1.4 vs this CL:

name                   old mean              new mean              delta
BinaryTree17            4.61s × (0.97,1.05)   5.91s × (0.98,1.03)  +28.35% (p=0.000)
Fannkuch11              4.40s × (0.99,1.03)   4.41s × (0.99,1.01)     ~    (p=0.212)
FmtFprintfEmpty         102ns × (0.99,1.01)    84ns × (0.99,1.02)  -18.38% (p=0.000)
FmtFprintfString        302ns × (0.98,1.01)   303ns × (0.99,1.02)     ~    (p=0.203)
FmtFprintfInt           313ns × (0.97,1.05)   270ns × (0.99,1.01)  -13.69% (p=0.000)
FmtFprintfIntInt        524ns × (0.98,1.02)   477ns × (0.99,1.00)   -8.87% (p=0.000)
FmtFprintfPrefixedInt   424ns × (0.98,1.02)   386ns × (0.99,1.01)   -8.96% (p=0.000)
FmtFprintfFloat         652ns × (0.98,1.02)   594ns × (0.97,1.05)   -8.97% (p=0.000)
FmtManyArgs            2.13µs × (0.99,1.02)  1.94µs × (0.99,1.01)   -8.92% (p=0.000)
GobDecode              17.1ms × (0.99,1.02)  14.9ms × (0.98,1.03)  -13.07% (p=0.000)
GobEncode              13.5ms × (0.98,1.03)  11.5ms × (0.98,1.03)  -15.25% (p=0.000)
Gzip                    656ms × (0.99,1.02)   647ms × (0.99,1.01)   -1.29% (p=0.000)
Gunzip                  143ms × (0.99,1.02)   144ms × (0.99,1.01)     ~    (p=0.204)
HTTPClientServer       88.2µs × (0.98,1.02)  90.8µs × (0.98,1.01)   +2.93% (p=0.000)
JSONEncode             32.2ms × (0.98,1.02)  30.9ms × (0.97,1.04)   -4.06% (p=0.001)
JSONDecode              121ms × (0.98,1.02)   110ms × (0.98,1.05)   -8.95% (p=0.000)
Mandelbrot200          6.06ms × (0.99,1.01)  6.11ms × (0.98,1.04)     ~    (p=0.184)
GoParse                6.76ms × (0.97,1.04)  6.58ms × (0.98,1.05)   -2.63% (p=0.003)
RegexpMatchEasy0_32     195ns × (1.00,1.01)   155ns × (0.99,1.01)  -20.43% (p=0.000)
RegexpMatchEasy0_1K     479ns × (0.98,1.03)   535ns × (0.99,1.02)  +11.59% (p=0.000)
RegexpMatchEasy1_32     169ns × (0.99,1.02)   131ns × (0.99,1.03)  -22.44% (p=0.000)
RegexpMatchEasy1_1K    1.53µs × (0.99,1.01)  0.87µs × (0.99,1.02)  -43.07% (p=0.000)
RegexpMatchMedium_32    334ns × (0.99,1.01)   242ns × (0.99,1.01)  -27.53% (p=0.000)
RegexpMatchMedium_1K    125µs × (1.00,1.01)    72µs × (0.99,1.03)  -42.53% (p=0.000)
RegexpMatchHard_32     6.03µs × (0.99,1.01)  3.79µs × (0.99,1.01)  -37.12% (p=0.000)
RegexpMatchHard_1K      189µs × (0.99,1.02)   115µs × (0.99,1.01)  -39.20% (p=0.000)
Revcomp                 935ms × (0.96,1.03)   926ms × (0.98,1.02)     ~    (p=0.083)
Template                146ms × (0.97,1.05)   119ms × (0.99,1.01)  -18.37% (p=0.000)
TimeParse               660ns × (0.99,1.01)   624ns × (0.99,1.02)   -5.43% (p=0.000)
TimeFormat              670ns × (0.98,1.02)   710ns × (1.00,1.01)   +5.97% (p=0.000)

This CL is a bit larger than I would like, but the compiler, linker, runtime,
and package reflect all need to be in sync about the format of these programs,
so there is no easy way to split this into independent changes (at least
while keeping the build working at each change).

Fixes #9625.
Fixes #10524.

Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a
Reviewed-on: https://go-review.googlesource.com/9888
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
2015-05-16 00:38:17 +00:00
Russ Cox
d820d5f3ab runtime: make mapzero not crash on arm
Change-Id: I40e8a4a2e62253233b66f6a2e61e222437292c31
Reviewed-on: https://go-review.googlesource.com/10151
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-15 20:14:41 +00:00
Russ Cox
c3c047a6a3 runtime: test and fix heap bitmap for 1-pointer allocation on 32-bit system
Change-Id: Ic064fe7c6bd3304dcc8c3f7b3b5393870b5387c2
Reviewed-on: https://go-review.googlesource.com/10119
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-15 18:47:00 +00:00
Russ Cox
7e26a2d9a8 runtime: allocate map element zero values for reflect-created types on demand
Preallocating them in reflect means that
(1) if you say _ = PtrTo(ArrayOf(1000000000, reflect.TypeOf(byte(0)))), you just allocated 1GB of data
(2) if you say it again, that's *another* GB of data.

The only use of t.zero in the runtime is for map elements.
Delay the allocation until the creation of a map with that element type,
and share the zeros.

The one downside of the shared zero is that it's not garbage collected,
but it's also never written, so the OS should be able to handle it fairly
efficiently.

Change-Id: I56b098a091abf3ac0945de28ebef9a6c08e76614
Reviewed-on: https://go-review.googlesource.com/10111
Reviewed-by: Keith Randall <khr@golang.org>
2015-05-15 13:56:40 +00:00
Russ Cox
65c4d7beab runtime: optimize heapBitsBulkBarrier a tiny amount
This may be mostly noise but:

name                   old mean              new mean              delta
BinaryTree17            6.03s × (0.98,1.02)   5.98s × (0.97,1.03)    ~    (p=0.306)
Fannkuch11              4.42s × (0.99,1.01)   4.34s × (0.99,1.02)  -1.83% (p=0.000)
FmtFprintfEmpty        84.7ns × (0.99,1.01)  84.4ns × (1.00,1.00)    ~    (p=0.138)
FmtFprintfString        289ns × (0.98,1.02)   289ns × (1.00,1.01)    ~    (p=0.509)
FmtFprintfInt           280ns × (0.97,1.03)   272ns × (0.98,1.03)  -2.64% (p=0.003)
FmtFprintfIntInt        484ns × (0.98,1.02)   482ns × (0.98,1.03)    ~    (p=0.606)
FmtFprintfPrefixedInt   397ns × (0.98,1.03)   393ns × (0.99,1.02)    ~    (p=0.064)
FmtFprintfFloat         573ns × (0.99,1.01)   569ns × (0.99,1.01)  -0.69% (p=0.023)
FmtManyArgs            1.89µs × (0.99,1.02)  1.91µs × (0.98,1.02)    ~    (p=0.219)
GobDecode              15.4ms × (0.99,1.02)  15.1ms × (0.99,1.01)  -2.05% (p=0.000)
GobEncode              12.0ms × (0.97,1.04)  11.9ms × (0.97,1.03)    ~    (p=0.458)
Gzip                    652ms × (0.99,1.01)   653ms × (0.99,1.01)    ~    (p=0.743)
Gunzip                  144ms × (0.99,1.01)   143ms × (0.99,1.01)    ~    (p=0.134)
HTTPClientServer       91.6µs × (0.99,1.01)  91.8µs × (0.99,1.03)    ~    (p=0.678)
JSONEncode             31.9ms × (1.00,1.00)  32.0ms × (0.99,1.01)    ~    (p=0.334)
JSONDecode              110ms × (0.99,1.01)   110ms × (0.99,1.01)    ~    (p=0.315)
Mandelbrot200          6.04ms × (0.99,1.01)  6.04ms × (1.00,1.01)    ~    (p=0.596)
GoParse                6.72ms × (0.98,1.03)  6.74ms × (0.99,1.03)    ~    (p=0.577)
RegexpMatchEasy0_32     161ns × (0.99,1.01)   160ns × (1.00,1.00)  -0.83% (p=0.002)
RegexpMatchEasy0_1K     542ns × (0.99,1.02)   541ns × (0.99,1.01)    ~    (p=0.396)
RegexpMatchEasy1_32     140ns × (0.98,1.01)   137ns × (1.00,1.00)  -2.12% (p=0.000)
RegexpMatchEasy1_1K     892ns × (0.99,1.01)   891ns × (1.00,1.01)    ~    (p=0.631)
RegexpMatchMedium_32    255ns × (0.99,1.01)   253ns × (0.99,1.01)  -0.76% (p=0.008)
RegexpMatchMedium_1K   73.1µs × (1.00,1.01)  72.9µs × (1.00,1.00)    ~    (p=0.229)
RegexpMatchHard_32     3.86µs × (1.00,1.01)  3.85µs × (1.00,1.00)    ~    (p=0.341)
RegexpMatchHard_1K      117µs × (1.00,1.01)   117µs × (0.99,1.00)    ~    (p=0.955)
Revcomp                 954ms × (0.97,1.03)   955ms × (0.98,1.02)    ~    (p=0.894)
Template                133ms × (0.97,1.05)   129ms × (0.99,1.02)  -2.50% (p=0.014)
TimeParse               629ns × (0.99,1.01)   626ns × (0.99,1.01)    ~    (p=0.106)
TimeFormat              663ns × (0.99,1.01)   660ns × (0.99,1.02)    ~    (p=0.231)

Change-Id: I580e03ed01b0629cb5eae4c4637618f20127f924
Reviewed-on: https://go-review.googlesource.com/9994
Reviewed-by: Austin Clements <austin@google.com>
2015-05-15 13:52:00 +00:00
Russ Cox
497970f421 runtime: use memmove during slice append
The effect of this CL:

name                   old mean              new mean              delta
BinaryTree17            5.97s × (0.96,1.04)   5.95s × (0.98,1.02)    ~    (p=0.697)
Fannkuch11              4.39s × (1.00,1.01)   4.41s × (1.00,1.01)  +0.52% (p=0.015)
FmtFprintfEmpty        90.8ns × (0.97,1.05)  89.4ns × (0.94,1.13)    ~    (p=0.571)
FmtFprintfString        305ns × (0.99,1.01)   292ns × (0.98,1.05)  -4.35% (p=0.000)
FmtFprintfInt           278ns × (0.96,1.03)   279ns × (0.98,1.04)    ~    (p=0.741)
FmtFprintfIntInt        489ns × (0.99,1.02)   482ns × (0.98,1.03)  -1.43% (p=0.024)
FmtFprintfPrefixedInt   402ns × (0.98,1.02)   395ns × (0.98,1.03)  -1.67% (p=0.014)
FmtFprintfFloat         578ns × (1.00,1.00)   569ns × (0.99,1.01)  -1.48% (p=0.000)
FmtManyArgs            1.88µs × (0.99,1.01)  1.88µs × (1.00,1.01)    ~    (p=0.055)
GobDecode              15.3ms × (0.99,1.01)  15.2ms × (1.00,1.01)  -0.61% (p=0.007)
GobEncode              11.8ms × (0.98,1.05)  11.6ms × (0.99,1.01)    ~    (p=0.075)
Gzip                    647ms × (0.99,1.01)   647ms × (1.00,1.00)    ~    (p=0.790)
Gunzip                  143ms × (1.00,1.00)   142ms × (1.00,1.00)    ~    (p=0.370)
HTTPClientServer       91.2µs × (0.99,1.01)  91.7µs × (0.99,1.02)    ~    (p=0.233)
JSONEncode             31.5ms × (0.98,1.01)  31.8ms × (0.99,1.02)  +1.09% (p=0.015)
JSONDecode              110ms × (0.99,1.01)   110ms × (0.99,1.02)    ~    (p=0.577)
Mandelbrot200          6.00ms × (1.00,1.00)  6.02ms × (1.00,1.00)  +0.24% (p=0.001)
GoParse                6.68ms × (0.98,1.02)  6.61ms × (0.99,1.01)  -1.10% (p=0.027)
RegexpMatchEasy0_32     162ns × (1.00,1.00)   161ns × (1.00,1.01)  -0.66% (p=0.001)
RegexpMatchEasy0_1K     539ns × (1.00,1.00)   539ns × (0.99,1.01)    ~    (p=0.509)
RegexpMatchEasy1_32     140ns × (0.99,1.02)   139ns × (0.99,1.02)    ~    (p=0.163)
RegexpMatchEasy1_1K     886ns × (1.00,1.00)   887ns × (1.00,1.00)    ~    (p=0.408)
RegexpMatchMedium_32    252ns × (1.00,1.00)   255ns × (0.99,1.01)  +1.01% (p=0.000)
RegexpMatchMedium_1K   72.6µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.176)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.403)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.351)
Revcomp                 926ms × (0.99,1.01)   925ms × (0.99,1.01)    ~    (p=0.541)
Template                126ms × (0.99,1.02)   130ms × (0.99,1.01)  +3.42% (p=0.000)
TimeParse               632ns × (0.99,1.01)   626ns × (1.00,1.00)  -0.88% (p=0.000)
TimeFormat              658ns × (0.99,1.01)   662ns × (0.99,1.02)    ~    (p=0.111)

The effect of this CL combined with CL 9886:

name                   old mean              new mean              delta
BinaryTree17            5.90s × (0.98,1.03)   5.95s × (0.98,1.02)    ~    (p=0.175)
Fannkuch11              4.34s × (1.00,1.00)   4.41s × (1.00,1.01)  +1.69% (p=0.000)
FmtFprintfEmpty        87.3ns × (0.97,1.17)  89.4ns × (0.94,1.13)    ~    (p=0.499)
FmtFprintfString        288ns × (0.98,1.04)   292ns × (0.98,1.05)    ~    (p=0.292)
FmtFprintfInt           290ns × (0.98,1.05)   279ns × (0.98,1.04)  -3.76% (p=0.001)
FmtFprintfIntInt        493ns × (0.98,1.04)   482ns × (0.98,1.03)  -2.27% (p=0.017)
FmtFprintfPrefixedInt   399ns × (0.98,1.02)   395ns × (0.98,1.03)    ~    (p=0.159)
FmtFprintfFloat         569ns × (1.00,1.00)   569ns × (0.99,1.01)    ~    (p=0.847)
FmtManyArgs            1.90µs × (0.99,1.03)  1.88µs × (1.00,1.01)  -1.14% (p=0.009)
GobDecode              15.2ms × (1.00,1.01)  15.2ms × (1.00,1.01)    ~    (p=0.170)
GobEncode              11.8ms × (0.99,1.02)  11.6ms × (0.99,1.01)  -1.47% (p=0.003)
Gzip                    649ms × (0.99,1.00)   647ms × (1.00,1.00)    ~    (p=0.200)
Gunzip                  144ms × (0.99,1.01)   142ms × (1.00,1.00)  -1.04% (p=0.000)
HTTPClientServer       91.1µs × (0.98,1.03)  91.7µs × (0.99,1.02)    ~    (p=0.345)
JSONEncode             31.5ms × (0.99,1.01)  31.8ms × (0.99,1.02)  +0.98% (p=0.021)
JSONDecode              110ms × (1.00,1.01)   110ms × (0.99,1.02)    ~    (p=0.259)
Mandelbrot200          6.02ms × (1.00,1.01)  6.02ms × (1.00,1.00)    ~    (p=0.500)
GoParse                6.68ms × (1.00,1.01)  6.61ms × (0.99,1.01)  -1.17% (p=0.001)
RegexpMatchEasy0_32     161ns × (1.00,1.00)   161ns × (1.00,1.01)  -0.39% (p=0.033)
RegexpMatchEasy0_1K     539ns × (1.00,1.00)   539ns × (0.99,1.01)    ~    (p=0.445)
RegexpMatchEasy1_32     138ns × (1.00,1.01)   139ns × (0.99,1.02)    ~    (p=0.281)
RegexpMatchEasy1_1K     887ns × (1.00,1.01)   887ns × (1.00,1.00)    ~    (p=0.610)
RegexpMatchMedium_32    251ns × (1.00,1.02)   255ns × (0.99,1.01)  +1.42% (p=0.000)
RegexpMatchMedium_1K   72.7µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.097)
RegexpMatchHard_32     3.85µs × (1.00,1.00)  3.84µs × (1.00,1.00)  -0.31% (p=0.000)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.704)
Revcomp                 923ms × (0.98,1.02)   925ms × (0.99,1.01)    ~    (p=0.574)
Template                126ms × (0.98,1.03)   130ms × (0.99,1.01)  +3.28% (p=0.000)
TimeParse               631ns × (0.99,1.02)   626ns × (1.00,1.00)    ~    (p=0.053)
TimeFormat              660ns × (0.99,1.01)   662ns × (0.99,1.02)    ~    (p=0.398)

Change-Id: I59c03d329fe7bc178a31477c6f1f01062b881041
Reviewed-on: https://go-review.googlesource.com/9993
Reviewed-by: Austin Clements <austin@google.com>
2015-05-15 13:51:49 +00:00
Russ Cox
30aacd4ce2 runtime: add Node128, Node130 benchmarks
Change-Id: I815a7ceeea48cc652b3c8568967665af39b02834
Reviewed-on: https://go-review.googlesource.com/10045
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-14 20:21:34 +00:00
Russ Cox
ecfe42cab0 runtime: keep pointer bits set always in 1-word spans
It's dumb to clear them in initSpan, set them in heapBitsSetType,
clear them in heapBitsSweepSpan, set them again in heapBitsSetType,
clear them again in heapBitsSweepSpan, and so on.

Set them in initSpan and be done with it (until the span is reused
for objects of a different size).

This avoids an atomic operation in a common case (one-word allocation).
Suggested by rlh.

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.97,1.03)   5.93s × (0.98,1.04)              ~    (p=0.056)
Fannkuch11              4.34s × (1.00,1.01)   4.41s × (1.00,1.00)            +1.42% (p=0.000)
FmtFprintfEmpty        86.1ns × (0.98,1.03)  88.9ns × (0.95,1.14)              ~    (p=0.066)
FmtFprintfString        292ns × (0.97,1.04)   284ns × (0.98,1.03)            -2.64% (p=0.000)
FmtFprintfInt           271ns × (0.98,1.06)   274ns × (0.98,1.05)              ~    (p=0.148)
FmtFprintfIntInt        478ns × (0.98,1.05)   487ns × (0.98,1.03)            +1.85% (p=0.004)
FmtFprintfPrefixedInt   397ns × (0.98,1.05)   394ns × (0.98,1.02)              ~    (p=0.184)
FmtFprintfFloat         553ns × (0.99,1.02)   543ns × (0.99,1.01)            -1.71% (p=0.000)
FmtManyArgs            1.90µs × (0.98,1.05)  1.88µs × (0.99,1.01)            -0.97% (p=0.037)
GobDecode              15.1ms × (0.99,1.01)  15.3ms × (0.99,1.01)            +0.78% (p=0.001)
GobEncode              11.7ms × (0.98,1.05)  11.6ms × (0.99,1.02)            -1.39% (p=0.009)
Gzip                    646ms × (1.00,1.01)   647ms × (1.00,1.01)              ~    (p=0.120)
Gunzip                  142ms × (1.00,1.00)   142ms × (1.00,1.00)              ~    (p=0.068)
HTTPClientServer       89.7µs × (0.99,1.01)  90.1µs × (0.98,1.03)              ~    (p=0.224)
JSONEncode             31.3ms × (0.99,1.01)  31.2ms × (0.99,1.02)              ~    (p=0.149)
JSONDecode              113ms × (0.99,1.01)   111ms × (0.99,1.01)            -1.25% (p=0.000)
Mandelbrot200          6.01ms × (1.00,1.00)  6.01ms × (1.00,1.00)            +0.09% (p=0.015)
GoParse                6.63ms × (0.98,1.03)  6.55ms × (0.99,1.02)            -1.10% (p=0.006)
RegexpMatchEasy0_32     161ns × (1.00,1.00)   161ns × (1.00,1.00)  (sample has zero variance)
RegexpMatchEasy0_1K     539ns × (0.99,1.01)   563ns × (0.99,1.01)            +4.51% (p=0.000)
RegexpMatchEasy1_32     140ns × (0.99,1.01)   141ns × (0.99,1.01)            +1.34% (p=0.000)
RegexpMatchEasy1_1K     886ns × (1.00,1.01)   888ns × (1.00,1.00)            +0.20% (p=0.003)
RegexpMatchMedium_32    252ns × (1.00,1.02)   255ns × (0.99,1.01)            +1.32% (p=0.000)
RegexpMatchMedium_1K   72.7µs × (1.00,1.00)  72.6µs × (1.00,1.00)              ~    (p=0.296)
RegexpMatchHard_32     3.84µs × (1.00,1.01)  3.84µs × (1.00,1.00)              ~    (p=0.339)
RegexpMatchHard_1K      117µs × (1.00,1.01)   117µs × (1.00,1.00)            -0.28% (p=0.022)
Revcomp                 914ms × (0.99,1.01)   909ms × (0.99,1.01)            -0.49% (p=0.031)
Template                128ms × (0.99,1.01)   127ms × (0.99,1.01)            -1.10% (p=0.000)
TimeParse               628ns × (0.99,1.01)   639ns × (0.99,1.01)            +1.69% (p=0.000)
TimeFormat              660ns × (0.99,1.01)   662ns × (0.99,1.02)              ~    (p=0.287)

Change-Id: I3127b0ab89708267c74aa7d0eae1db1a1bcdfda5
Reviewed-on: https://go-review.googlesource.com/9884
Reviewed-by: Austin Clements <austin@google.com>
2015-05-14 15:58:54 +00:00
Russ Cox
94934f843e runtime: rewrite addb/subtractb to be simpler to compile; introduce add1, subtract1
This reduces the depth of the inlining at a particular call site.
The inliner introduces many temporary variables, and the compiler can do
a better job with fewer. Being verbose in the bodies of these helper functions
seems like a reasonable tradeoff: the uses are still just as readable, and
they run faster in some important cases.

Change-Id: I5323976ed3704d0acd18fb31176cfbf5ba23a89c
Reviewed-on: https://go-review.googlesource.com/9883
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-14 15:55:42 +00:00
Russ Cox
5b3739357a runtime: skip atomics in heapBitsSetType when GC is not running
Suggested by Rick during code review of this code,
but separated out for easier diagnosis in case it causes
problems (and also easier rollback).

name                    old mean              new mean              delta
SetTypePtr              13.9ns × (0.98,1.05)   6.2ns × (0.99,1.01)  -55.18% (p=0.000)
SetTypePtr8             15.5ns × (0.95,1.10)  15.5ns × (0.99,1.05)     ~    (p=0.952)
SetTypePtr16            17.8ns × (0.99,1.05)  18.0ns × (1.00,1.00)     ~    (p=0.157)
SetTypePtr32            25.2ns × (0.99,1.01)  24.3ns × (0.99,1.01)   -3.86% (p=0.000)
SetTypePtr64            42.2ns × (0.93,1.13)  40.8ns × (0.99,1.01)     ~    (p=0.239)
SetTypePtr126           67.3ns × (1.00,1.00)  67.5ns × (0.99,1.02)     ~    (p=0.365)
SetTypePtr128           67.6ns × (1.00,1.01)  70.1ns × (0.97,1.10)     ~    (p=0.063)
SetTypePtrSlice          575ns × (0.98,1.06)   543ns × (0.95,1.17)   -5.54% (p=0.034)
SetTypeNode1            12.4ns × (0.98,1.09)  12.8ns × (0.99,1.01)   +3.40% (p=0.021)
SetTypeNode1Slice       97.1ns × (0.97,1.09)  89.5ns × (1.00,1.00)   -7.78% (p=0.000)
SetTypeNode8            29.8ns × (1.00,1.01)  17.7ns × (1.00,1.01)  -40.74% (p=0.000)
SetTypeNode8Slice        204ns × (0.99,1.04)   190ns × (0.97,1.06)   -6.96% (p=0.000)
SetTypeNode64           42.8ns × (0.99,1.01)  44.0ns × (0.95,1.12)     ~    (p=0.163)
SetTypeNode64Slice      1.00µs × (0.95,1.09)  0.98µs × (0.96,1.08)     ~    (p=0.356)
SetTypeNode64Dead       12.2ns × (0.99,1.04)  12.7ns × (1.00,1.01)   +4.34% (p=0.000)
SetTypeNode64DeadSlice  1.14µs × (0.94,1.11)  0.99µs × (0.99,1.03)  -13.74% (p=0.000)
SetTypeNode124          67.9ns × (0.99,1.03)  70.4ns × (0.95,1.15)     ~    (p=0.115)
SetTypeNode124Slice     1.76µs × (0.99,1.04)  1.88µs × (0.91,1.23)     ~    (p=0.096)
SetTypeNode126          67.7ns × (1.00,1.01)  68.2ns × (0.99,1.02)   +0.72% (p=0.014)
SetTypeNode126Slice     1.76µs × (1.00,1.01)  1.87µs × (0.93,1.15)   +6.15% (p=0.035)
SetTypeNode1024          462ns × (0.96,1.10)   451ns × (0.99,1.05)     ~    (p=0.224)
SetTypeNode1024Slice    14.4µs × (0.95,1.15)  14.2µs × (0.97,1.19)     ~    (p=0.676)

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.98,1.04)   5.87s × (0.98,1.03)    ~    (p=0.993)
Fannkuch11              4.39s × (0.99,1.01)   4.34s × (1.00,1.01)  -1.22% (p=0.000)
FmtFprintfEmpty        90.6ns × (0.97,1.06)  89.4ns × (0.97,1.03)    ~    (p=0.070)
FmtFprintfString        305ns × (0.98,1.02)   296ns × (0.99,1.02)  -2.94% (p=0.000)
FmtFprintfInt           276ns × (0.97,1.04)   270ns × (0.98,1.03)  -2.17% (p=0.001)
FmtFprintfIntInt        490ns × (0.97,1.05)   473ns × (0.99,1.02)  -3.59% (p=0.000)
FmtFprintfPrefixedInt   402ns × (0.99,1.02)   397ns × (0.99,1.01)  -1.15% (p=0.000)
FmtFprintfFloat         577ns × (0.99,1.01)   549ns × (0.99,1.01)  -4.78% (p=0.000)
FmtManyArgs            1.89µs × (0.99,1.02)  1.87µs × (0.99,1.01)  -1.43% (p=0.000)
GobDecode              15.2ms × (0.99,1.01)  14.7ms × (0.99,1.02)  -3.55% (p=0.000)
GobEncode              11.7ms × (0.98,1.04)  11.5ms × (0.99,1.02)  -1.63% (p=0.002)
Gzip                    647ms × (0.99,1.01)   647ms × (1.00,1.01)    ~    (p=0.486)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.00)    ~    (p=0.234)
HTTPClientServer       90.7µs × (0.99,1.01)  90.4µs × (0.98,1.04)    ~    (p=0.331)
JSONEncode             31.9ms × (0.97,1.06)  31.6ms × (0.98,1.02)    ~    (p=0.206)
JSONDecode              110ms × (0.99,1.01)   112ms × (0.99,1.02)  +1.48% (p=0.000)
Mandelbrot200          6.00ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.058)
GoParse                6.63ms × (0.98,1.03)  6.61ms × (0.98,1.02)    ~    (p=0.353)
RegexpMatchEasy0_32     162ns × (0.99,1.01)   161ns × (1.00,1.00)  -0.33% (p=0.004)
RegexpMatchEasy0_1K     539ns × (0.99,1.01)   540ns × (0.99,1.02)    ~    (p=0.222)
RegexpMatchEasy1_32     139ns × (0.99,1.01)   140ns × (0.97,1.03)    ~    (p=0.054)
RegexpMatchEasy1_1K     886ns × (1.00,1.00)   887ns × (1.00,1.00)  +0.18% (p=0.001)
RegexpMatchMedium_32    252ns × (1.00,1.01)   252ns × (1.00,1.00)  +0.21% (p=0.010)
RegexpMatchMedium_1K   72.7µs × (1.00,1.01)  72.6µs × (1.00,1.00)    ~    (p=0.060)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.065)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)  -0.27% (p=0.000)
Revcomp                 916ms × (0.98,1.04)   909ms × (0.99,1.01)    ~    (p=0.054)
Template                126ms × (0.99,1.01)   128ms × (0.99,1.02)  +1.43% (p=0.000)
TimeParse               632ns × (0.99,1.01)   625ns × (1.00,1.01)  -1.05% (p=0.000)
TimeFormat              655ns × (0.99,1.02)   669ns × (0.99,1.02)  +2.01% (p=0.000)

Change-Id: I9477b7c9489c6fa98e860c190ce06cd73c53c6a1
Reviewed-on: https://go-review.googlesource.com/9829
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-14 15:54:53 +00:00
Brad Fitzpatrick
6f2c0f1585 runtime: add check for malloc in a signal handler
Change-Id: Ic8ebbe81eb788626c01bfab238d54236e6e5ef2b
Reviewed-on: https://go-review.googlesource.com/9964
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-13 20:36:19 +00:00
Rick Hudson
c4fe503119 runtime: reduce thrashing of gs between ps
One important use case is a pipeline computation that pass values
from one Goroutine to the next and then exits or is placed in a
wait state. If GOMAXPROCS > 1 a Goroutine running on P1 will enable
another Goroutine and then immediately make P1 available to execute
it. We need to prevent other Ps from stealing the G that P1 is about
to execute. Otherwise the Gs can thrash between Ps causing unneeded
synchronization and slowing down throughput.

Fix this by changing the stealing logic so that when a P attempts to
steal the only G on some other P's run queue, it will pause
momentarily to allow the victim P to schedule the G.

As part of optimizing stealing we also use a per P victim queue
move stolen gs. This eliminates the zeroing of a stack local victim
queue which turned out to be expensive.

This CL is a necessary but not sufficient prerequisite to changing
the default value of GOMAXPROCS to something > 1 which is another
CL/discussion.

For highly serialized programs, such as GoroutineRing below this can
make a large difference. For larger and more parallel programs such
as the x/benchmarks there is no noticeable detriment.

~/work/code/src/rsc.io/benchstat/benchstat old.txt new.txt
name                old mean              new mean              delta
GoroutineRing       30.2µs × (0.98,1.01)  30.1µs × (0.97,1.04)     ~    (p=0.941)
GoroutineRing-2      113µs × (0.91,1.07)    30µs × (0.98,1.03)  -73.17% (p=0.004)
GoroutineRing-4      144µs × (0.98,1.02)    32µs × (0.98,1.01)  -77.69% (p=0.000)
GoroutineRingBuf    32.7µs × (0.97,1.03)  32.5µs × (0.97,1.02)     ~    (p=0.795)
GoroutineRingBuf-2   120µs × (0.92,1.08)    33µs × (1.00,1.00)  -72.48% (p=0.004)
GoroutineRingBuf-4   138µs × (0.92,1.06)    33µs × (1.00,1.00)  -76.21% (p=0.003)

The bench benchmarks show little impact.
    	  	      old  	 new
garbage	      	      7032879	 7011696
httpold		        25509	   25301
splayold	      1022073	 1019499
jsonold		     28230624   28081433

Change-Id: I228c48fed8d85c9bbef16a7edc53ab7898506f50
Reviewed-on: https://go-review.googlesource.com/9872
Reviewed-by: Austin Clements <austin@google.com>
2015-05-13 12:55:24 +00:00
Austin Clements
350fd548b3 runtime: don't run runq tests on the system stack
Running these tests on the system stack is problematic because they
allocate Ps, which are large enough to overflow the system stack if
they are stack-allocated. It used to be necessary to run these tests
on the system stack because they were written in C, but since this is
no longer the case, we can fix this problem by simply not running the
tests on the system stack.

This also means we no longer need the hack in one of these tests that
forces the allocated Ps to escape to the heap, so eliminate that as
well.

Change-Id: I9064f5f8fd7f7b446ff39a22a70b172cfcb2dc57
Reviewed-on: https://go-review.googlesource.com/9923
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-12 19:58:08 +00:00
David du Colombier
7de86a1b1c runtime: terminate exit status buffer on Plan 9
The status buffer built by the exit function
was not nil-terminated.

Fixes #10789.

Change-Id: I2d34ac50a19d138176c4b47393497ba7070d5b61
Reviewed-on: https://go-review.googlesource.com/9953
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-12 16:35:58 +00:00
David du Colombier
f85a05581e runtime: fix signal handling on Plan 9
Once added to the signal queue, the pointer passed to the
signal handler could no longer be valid. Instead of passing
the pointer to the note string, we recopy the value of the
note string to a static array in the signal queue.

Fixes #10784.

Change-Id: Iddd6837b58a14dfaa16b069308ae28a7b8e0965b
Reviewed-on: https://go-review.googlesource.com/9950
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-12 16:35:46 +00:00
Michael Hudson-Doyle
77fc03f4cd cmd/internal/ld, runtime: abort on shared library ABI mismatch
This:

1) Defines the ABI hash of a package (as the SHA1 of the __.PKGDEF)
2) Defines the ABI hash of a shared library (sort the packages by import
   path, concatenate the hashes of the packages and SHA1 that)
3) When building a shared library, compute the above value and define a
   global symbol that points to a go string that has the hash as its value.
4) When linking against a shared library, read the abi hash from the
   library and put both the value seen at link time and a reference
   to the global symbol into the moduledata.
5) During runtime initialization, check that the hash seen at link time
   still matches the hash the global symbol points to.

Change-Id: Iaa54c783790e6dde3057a2feadc35473d49614a5
Reviewed-on: https://go-review.googlesource.com/8773
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
2015-05-12 01:30:40 +00:00
Michael Hudson-Doyle
be0cb9224b runtime: fix addmoduledata to follow the platform ABI
addmoduledata is called from a .init_array function and need to follow the
platform ABI. It contains accesses to global data which are rewritten to use
R15 by the assembler, and as R15 is callee-save we need to save it.

Change-Id: I03893efb1576aed4f102f2465421f256f3bb0f30
Reviewed-on: https://go-review.googlesource.com/9941
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-12 00:50:32 +00:00
Russ Cox
4212a3c3d9 runtime: use heap bitmap for typedmemmove
The current implementation of typedmemmove walks the ptrmask
in the type to find out where pointers are. This led to turning off
GC programs for the Go 1.5 dev cycle, so that there would always
be a ptrmask. Instead of also interpreting the GC programs,
interpret the heap bitmap, which we know must be available and
up to date. (There is no point to write barriers when writing outside
the heap.)

This CL is only about correctness. The next CL will optimize the code.

Change-Id: Id1305c7c071fd2734ab96634b0e1c745b23fa793
Reviewed-on: https://go-review.googlesource.com/9886
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 16:38:21 +00:00
Russ Cox
266a842f55 runtime: zero entire bitmap for object, even past dead marker
We want typedmemmove to use the heap bitmap to determine
where pointers are, instead of reinterpreting the type information.
The heap bitmap is simpler to access.

In general, typedmemmove will need to be able to look up the bits
for any word and find valid pointer information, so fill even after the
dead marker. Not filling after the dead marker was an optimization
I introduced only a few days ago, when reintroducing the dead marker
code. At the time I said it probably wouldn't last, and it didn't.

Change-Id: I6ba01bff17ddee1ff429f454abe29867ec60606e
Reviewed-on: https://go-review.googlesource.com/9885
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 16:37:46 +00:00
Russ Cox
e375ca2a25 runtime: reorder bits in heap bitmap bytes
The runtime deals with 1-bit pointer bitmaps and 2-bit heap bitmaps
that have entries for both pointers and mark bits.

Each byte in a 1-bit pointer bitmap looks like pppppppp (all pointer bits).
Each byte in a 2-bit heap bitmap looks like mpmpmpmp (mark, pointer, ...).
This means that when converting from 1-bit to 2-bit, as we do
during malloc, we have to pick up 4 bits in pppp form and use
shifts to create the mpmpmpmp form.

This CL changes the 2-bit heap bitmap form to mmmmpppp,
so that 4 bits picked up in 1-bit form can be used directly in
the low bits of the heap bitmap byte, without expansion.
This simplifies the code, and it also happens to be faster.

name                    old mean              new mean              delta
SetTypePtr              14.0ns × (0.98,1.09)  14.0ns × (0.98,1.08)     ~    (p=0.966)
SetTypePtr8             16.5ns × (0.99,1.05)  15.3ns × (0.96,1.16)   -6.86% (p=0.012)
SetTypePtr16            21.3ns × (0.98,1.05)  18.8ns × (0.94,1.14)  -11.49% (p=0.000)
SetTypePtr32            34.6ns × (0.93,1.22)  27.7ns × (0.91,1.26)  -20.08% (p=0.001)
SetTypePtr64            55.7ns × (0.97,1.11)  41.6ns × (0.98,1.04)  -25.30% (p=0.000)
SetTypePtr126           98.0ns × (1.00,1.00)  67.7ns × (0.99,1.05)  -30.88% (p=0.000)
SetTypePtr128           98.6ns × (1.00,1.01)  68.6ns × (0.99,1.03)  -30.44% (p=0.000)
SetTypePtrSlice          781ns × (0.99,1.01)   571ns × (0.99,1.04)  -26.93% (p=0.000)
SetTypeNode1            13.1ns × (0.99,1.01)  12.1ns × (0.99,1.01)   -7.45% (p=0.000)
SetTypeNode1Slice        113ns × (0.99,1.01)    94ns × (1.00,1.00)  -16.35% (p=0.000)
SetTypeNode8            32.7ns × (1.00,1.00)  29.8ns × (0.99,1.01)   -8.97% (p=0.000)
SetTypeNode8Slice        266ns × (1.00,1.00)   204ns × (1.00,1.00)  -23.40% (p=0.000)
SetTypeNode64           58.0ns × (0.98,1.08)  42.8ns × (1.00,1.01)  -26.24% (p=0.000)
SetTypeNode64Slice      1.55µs × (0.99,1.02)  0.96µs × (1.00,1.00)  -37.84% (p=0.000)
SetTypeNode64Dead       13.1ns × (0.99,1.01)  12.1ns × (1.00,1.00)   -7.33% (p=0.000)
SetTypeNode64DeadSlice  1.52µs × (1.00,1.01)  1.08µs × (1.00,1.01)  -28.95% (p=0.000)
SetTypeNode124          97.9ns × (1.00,1.00)  67.1ns × (1.00,1.01)  -31.49% (p=0.000)
SetTypeNode124Slice     2.87µs × (0.99,1.02)  1.75µs × (1.00,1.01)  -39.15% (p=0.000)
SetTypeNode126          98.4ns × (1.00,1.01)  68.1ns × (1.00,1.01)  -30.79% (p=0.000)
SetTypeNode126Slice     2.91µs × (0.99,1.01)  1.77µs × (0.99,1.01)  -39.09% (p=0.000)
SetTypeNode1024          732ns × (1.00,1.00)   511ns × (0.87,1.42)  -30.14% (p=0.000)
SetTypeNode1024Slice    23.1µs × (1.00,1.00)  13.9µs × (0.99,1.02)  -39.83% (p=0.000)

Change-Id: I12e3b850a4e6fa6c8146b8635ff728f3ef658819
Reviewed-on: https://go-review.googlesource.com/9828
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 16:37:36 +00:00
Russ Cox
363fd1dd1b runtime: move a few atomic fields up
Moving them up makes them properly aligned on 32-bit systems.
There are some odd fields above them right now
(like fixalloc and mutex maybe).

Change-Id: I57851a5bbb2e7cc339712f004f99bb6c0cce0ca5
Reviewed-on: https://go-review.googlesource.com/9889
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 16:08:57 +00:00
Russ Cox
8f037fa1ab runtime: fix TestLFStack on 386
The new(uint64) was moving to the stack, which may not be aligned.

Change-Id: Iad070964202001b52029494d43e299fed980f939
Reviewed-on: https://go-review.googlesource.com/9787
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-05-11 15:21:54 +00:00
Russ Cox
1635ab7dfe runtime: remove wbshadow mode
The write barrier shadow heap was very useful for
developing the write barriers initially, but it's no longer used,
clunky, and dragging the rest of the implementation down.

The gccheckmark mode will find bugs due to missed barriers
when they result in missed marks; wbshadow mode found the
missed barriers more aggressively, but it required an entire
separate copy of the heap. The gccheckmark mode requires
no extra memory, making it more useful in practice.

Compared to previous CL:
name                   old mean              new mean              delta
BinaryTree17            5.91s × (0.96,1.06)   5.72s × (0.97,1.03)  -3.12% (p=0.000)
Fannkuch11              4.32s × (1.00,1.00)   4.36s × (1.00,1.00)  +0.91% (p=0.000)
FmtFprintfEmpty        89.0ns × (0.93,1.10)  86.6ns × (0.96,1.11)    ~    (p=0.077)
FmtFprintfString        298ns × (0.98,1.06)   283ns × (0.99,1.04)  -4.90% (p=0.000)
FmtFprintfInt           286ns × (0.98,1.03)   283ns × (0.98,1.04)  -1.09% (p=0.032)
FmtFprintfIntInt        498ns × (0.97,1.06)   480ns × (0.99,1.02)  -3.65% (p=0.000)
FmtFprintfPrefixedInt   408ns × (0.98,1.02)   396ns × (0.99,1.01)  -3.00% (p=0.000)
FmtFprintfFloat         587ns × (0.98,1.01)   562ns × (0.99,1.01)  -4.34% (p=0.000)
FmtManyArgs            1.94µs × (0.99,1.02)  1.89µs × (0.99,1.01)  -2.85% (p=0.000)
GobDecode              15.8ms × (0.98,1.03)  15.7ms × (0.99,1.02)    ~    (p=0.251)
GobEncode              12.0ms × (0.96,1.09)  11.8ms × (0.98,1.03)  -1.87% (p=0.024)
Gzip                    648ms × (0.99,1.01)   647ms × (0.99,1.01)    ~    (p=0.688)
Gunzip                  143ms × (1.00,1.01)   143ms × (1.00,1.01)    ~    (p=0.203)
HTTPClientServer       90.3µs × (0.98,1.01)  89.1µs × (0.99,1.02)  -1.30% (p=0.000)
JSONEncode             31.6ms × (0.99,1.01)  31.7ms × (0.98,1.02)    ~    (p=0.219)
JSONDecode              107ms × (1.00,1.01)   111ms × (0.99,1.01)  +3.58% (p=0.000)
Mandelbrot200          6.03ms × (1.00,1.01)  6.01ms × (1.00,1.00)    ~    (p=0.077)
GoParse                6.53ms × (0.99,1.03)  6.54ms × (0.99,1.02)    ~    (p=0.585)
RegexpMatchEasy0_32     161ns × (1.00,1.01)   161ns × (0.98,1.05)    ~    (p=0.948)
RegexpMatchEasy0_1K     541ns × (0.99,1.01)   559ns × (0.98,1.01)  +3.32% (p=0.000)
RegexpMatchEasy1_32     138ns × (1.00,1.00)   137ns × (0.99,1.01)  -0.55% (p=0.001)
RegexpMatchEasy1_1K     887ns × (0.99,1.01)   878ns × (0.99,1.01)  -0.98% (p=0.000)
RegexpMatchMedium_32    253ns × (0.99,1.01)   252ns × (0.99,1.01)  -0.39% (p=0.001)
RegexpMatchMedium_1K   72.8µs × (1.00,1.00)  72.7µs × (1.00,1.00)    ~    (p=0.485)
RegexpMatchHard_32     3.85µs × (1.00,1.01)  3.85µs × (1.00,1.01)    ~    (p=0.283)
RegexpMatchHard_1K      117µs × (1.00,1.01)   117µs × (1.00,1.00)    ~    (p=0.175)
Revcomp                 922ms × (0.97,1.08)   903ms × (0.98,1.05)  -2.15% (p=0.021)
Template                126ms × (0.99,1.01)   126ms × (0.99,1.01)    ~    (p=0.943)
TimeParse               628ns × (0.99,1.01)   634ns × (0.99,1.01)  +0.92% (p=0.000)
TimeFormat              668ns × (0.99,1.01)   698ns × (0.98,1.03)  +4.53% (p=0.000)

It's nice that the microbenchmarks are the ones helped the most,
because those were the ones hurt the most by the conversion from
4-bit to 2-bit heap bitmaps. This CL brings the overall effect of that
process to (compared to CL 9706 patch set 1):

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.94,1.09)   5.72s × (0.97,1.03)  -2.57% (p=0.011)
Fannkuch11              4.32s × (1.00,1.00)   4.36s × (1.00,1.00)  +0.87% (p=0.000)
FmtFprintfEmpty        89.1ns × (0.95,1.16)  86.6ns × (0.96,1.11)    ~    (p=0.090)
FmtFprintfString        283ns × (0.98,1.02)   283ns × (0.99,1.04)    ~    (p=0.681)
FmtFprintfInt           284ns × (0.98,1.04)   283ns × (0.98,1.04)    ~    (p=0.620)
FmtFprintfIntInt        486ns × (0.98,1.03)   480ns × (0.99,1.02)  -1.27% (p=0.002)
FmtFprintfPrefixedInt   400ns × (0.99,1.02)   396ns × (0.99,1.01)  -0.84% (p=0.001)
FmtFprintfFloat         566ns × (0.99,1.01)   562ns × (0.99,1.01)  -0.80% (p=0.000)
FmtManyArgs            1.91µs × (0.99,1.02)  1.89µs × (0.99,1.01)  -1.10% (p=0.000)
GobDecode              15.5ms × (0.98,1.05)  15.7ms × (0.99,1.02)  +1.55% (p=0.005)
GobEncode              11.9ms × (0.97,1.03)  11.8ms × (0.98,1.03)  -0.97% (p=0.048)
Gzip                    648ms × (0.99,1.01)   647ms × (0.99,1.01)    ~    (p=0.627)
Gunzip                  143ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.482)
HTTPClientServer       89.2µs × (0.99,1.02)  89.1µs × (0.99,1.02)    ~    (p=0.740)
JSONEncode             32.3ms × (0.97,1.06)  31.7ms × (0.98,1.02)  -1.95% (p=0.002)
JSONDecode              106ms × (0.99,1.01)   111ms × (0.99,1.01)  +4.22% (p=0.000)
Mandelbrot200          6.02ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.417)
GoParse                6.57ms × (0.97,1.06)  6.54ms × (0.99,1.02)    ~    (p=0.404)
RegexpMatchEasy0_32     162ns × (1.00,1.00)   161ns × (0.98,1.05)    ~    (p=0.088)
RegexpMatchEasy0_1K     561ns × (0.99,1.02)   559ns × (0.98,1.01)  -0.47% (p=0.034)
RegexpMatchEasy1_32     145ns × (0.95,1.04)   137ns × (0.99,1.01)  -5.56% (p=0.000)
RegexpMatchEasy1_1K     864ns × (0.99,1.04)   878ns × (0.99,1.01)  +1.57% (p=0.000)
RegexpMatchMedium_32    255ns × (0.99,1.04)   252ns × (0.99,1.01)  -1.43% (p=0.001)
RegexpMatchMedium_1K   73.9µs × (0.98,1.04)  72.7µs × (1.00,1.00)  -1.55% (p=0.004)
RegexpMatchHard_32     3.92µs × (0.98,1.04)  3.85µs × (1.00,1.01)  -1.80% (p=0.003)
RegexpMatchHard_1K      120µs × (0.98,1.04)   117µs × (1.00,1.00)  -2.13% (p=0.001)
Revcomp                 936ms × (0.95,1.08)   903ms × (0.98,1.05)  -3.58% (p=0.002)
Template                130ms × (0.98,1.04)   126ms × (0.99,1.01)  -2.98% (p=0.000)
TimeParse               638ns × (0.98,1.05)   634ns × (0.99,1.01)    ~    (p=0.198)
TimeFormat              674ns × (0.99,1.01)   698ns × (0.98,1.03)  +3.69% (p=0.000)

Change-Id: Ia0e9b50b1d75a3c0c7556184cd966305574fe07c
Reviewed-on: https://go-review.googlesource.com/9706
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-11 14:55:11 +00:00
Russ Cox
54af9a3ba5 runtime: reintroduce ``dead'' space during GC scan
Reintroduce an optimization discarded during the initial conversion
from 4-bit heap bitmaps to 2-bit heap bitmaps: when we reach the
place in the bitmap where there are no more pointers, mark that position
for the GC so that it can avoid scanning past that place.

During heapBitsSetType we can also avoid initializing heap bitmap
beyond that location, which gives a bit of a win compared to Go 1.4.
This particular optimization (not initializing the heap bitmap) may not last:
we might change typedmemmove to use the heap bitmap, in which
case it would all need to be initialized. The early stop in the GC scan
will stay no matter what.

Compared to Go 1.4 (github.com/rsc/go, branch go14bench):
name                    old mean              new mean              delta
SetTypeNode64           80.7ns × (1.00,1.01)  57.4ns × (1.00,1.01)  -28.83% (p=0.000)
SetTypeNode64Dead       80.5ns × (1.00,1.01)  13.1ns × (0.99,1.02)  -83.77% (p=0.000)
SetTypeNode64Slice      2.16µs × (1.00,1.01)  1.54µs × (1.00,1.01)  -28.75% (p=0.000)
SetTypeNode64DeadSlice  2.16µs × (1.00,1.01)  1.52µs × (1.00,1.00)  -29.74% (p=0.000)

Compared to previous CL:
name                    old mean              new mean              delta
SetTypeNode64           56.7ns × (1.00,1.00)  57.4ns × (1.00,1.01)   +1.19% (p=0.000)
SetTypeNode64Dead       57.2ns × (1.00,1.00)  13.1ns × (0.99,1.02)  -77.15% (p=0.000)
SetTypeNode64Slice      1.56µs × (1.00,1.01)  1.54µs × (1.00,1.01)   -0.89% (p=0.000)
SetTypeNode64DeadSlice  1.55µs × (1.00,1.01)  1.52µs × (1.00,1.00)   -2.23% (p=0.000)

This is the last CL in the sequence converting from the 4-bit heap
to the 2-bit heap, with all the same optimizations reenabled.
Compared to before that process began (compared to CL 9701 patch set 1):

name                    old mean              new mean              delta
BinaryTree17             5.87s × (0.94,1.09)   5.91s × (0.96,1.06)    ~    (p=0.578)
Fannkuch11               4.32s × (1.00,1.00)   4.32s × (1.00,1.00)    ~    (p=0.474)
FmtFprintfEmpty         89.1ns × (0.95,1.16)  89.0ns × (0.93,1.10)    ~    (p=0.942)
FmtFprintfString         283ns × (0.98,1.02)   298ns × (0.98,1.06)  +5.33% (p=0.000)
FmtFprintfInt            284ns × (0.98,1.04)   286ns × (0.98,1.03)    ~    (p=0.208)
FmtFprintfIntInt         486ns × (0.98,1.03)   498ns × (0.97,1.06)  +2.48% (p=0.000)
FmtFprintfPrefixedInt    400ns × (0.99,1.02)   408ns × (0.98,1.02)  +2.23% (p=0.000)
FmtFprintfFloat          566ns × (0.99,1.01)   587ns × (0.98,1.01)  +3.69% (p=0.000)
FmtManyArgs             1.91µs × (0.99,1.02)  1.94µs × (0.99,1.02)  +1.81% (p=0.000)
GobDecode               15.5ms × (0.98,1.05)  15.8ms × (0.98,1.03)  +1.94% (p=0.002)
GobEncode               11.9ms × (0.97,1.03)  12.0ms × (0.96,1.09)    ~    (p=0.263)
Gzip                     648ms × (0.99,1.01)   648ms × (0.99,1.01)    ~    (p=0.992)
Gunzip                   143ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.585)
HTTPClientServer        89.2µs × (0.99,1.02)  90.3µs × (0.98,1.01)  +1.24% (p=0.000)
JSONEncode              32.3ms × (0.97,1.06)  31.6ms × (0.99,1.01)  -2.29% (p=0.000)
JSONDecode               106ms × (0.99,1.01)   107ms × (1.00,1.01)  +0.62% (p=0.000)
Mandelbrot200           6.02ms × (1.00,1.00)  6.03ms × (1.00,1.01)    ~    (p=0.250)
GoParse                 6.57ms × (0.97,1.06)  6.53ms × (0.99,1.03)    ~    (p=0.243)
RegexpMatchEasy0_32      162ns × (1.00,1.00)   161ns × (1.00,1.01)  -0.80% (p=0.000)
RegexpMatchEasy0_1K      561ns × (0.99,1.02)   541ns × (0.99,1.01)  -3.67% (p=0.000)
RegexpMatchEasy1_32      145ns × (0.95,1.04)   138ns × (1.00,1.00)  -5.04% (p=0.000)
RegexpMatchEasy1_1K      864ns × (0.99,1.04)   887ns × (0.99,1.01)  +2.57% (p=0.000)
RegexpMatchMedium_32     255ns × (0.99,1.04)   253ns × (0.99,1.01)  -1.05% (p=0.012)
RegexpMatchMedium_1K    73.9µs × (0.98,1.04)  72.8µs × (1.00,1.00)  -1.51% (p=0.005)
RegexpMatchHard_32      3.92µs × (0.98,1.04)  3.85µs × (1.00,1.01)  -1.88% (p=0.002)
RegexpMatchHard_1K       120µs × (0.98,1.04)   117µs × (1.00,1.01)  -2.02% (p=0.001)
Revcomp                  936ms × (0.95,1.08)   922ms × (0.97,1.08)    ~    (p=0.234)
Template                 130ms × (0.98,1.04)   126ms × (0.99,1.01)  -2.99% (p=0.000)
TimeParse                638ns × (0.98,1.05)   628ns × (0.99,1.01)  -1.54% (p=0.004)
TimeFormat               674ns × (0.99,1.01)   668ns × (0.99,1.01)  -0.80% (p=0.001)

The slowdown of the first few benchmarks seems to be due to the new
atomic operations for certain small size allocations. But the larger
benchmarks mostly improve, probably due to the decreased memory
pressure from having half as much heap bitmap.

CL 9706, which removes the (never used anymore) wbshadow mode,
gets back what is lost in the early microbenchmarks.

Change-Id: I37423a209e8ec2a2e92538b45cac5422a6acd32d
Reviewed-on: https://go-review.googlesource.com/9705
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-11 14:51:40 +00:00
Russ Cox
feb8a3b616 runtime: optimize heapBitsSetType
For the conversion of the heap bitmap from 4-bit to 2-bit fields,
I replaced heapBitsSetType with the dumbest thing that could possibly work:
two atomic operations (atomicand8+atomicor8) per 2-bit field.

This CL replaces that code with a proper implementation that
avoids the atomics whenever possible. Benchmarks vs base CL
(before the conversion to 2-bit heap bitmap) and vs Go 1.4 below.

Compared to Go 1.4, SetTypePtr (a 1-pointer allocation)
is 10ns slower because a race against the concurrent GC requires the
use of an atomicor8 that used to be an ordinary write. This slowdown
was present even in the base CL.

Compared to both Go 1.4 and base, SetTypeNode8 (a 10-word allocation)
is 10ns slower because it too needs a new atomic, because with the
denser representation, the byte on the end of the allocation is now shared
with the object next to it; this was not true with the 4-bit representation.

Excluding these two (fundamental) slowdowns due to the use of atomics,
the new code is noticeably faster than both Go 1.4 and the base CL.

The next CL will reintroduce the ``typeDead'' optimization.

Stats are from 5 runs on a MacBookPro10,2 (late 2012 Core i5).

Compared to base CL (** = new atomic)
name                  old mean              new mean              delta
SetTypePtr            14.1ns × (0.99,1.02)  14.7ns × (0.93,1.10)     ~    (p=0.175)
SetTypePtr8           18.4ns × (1.00,1.01)  18.6ns × (0.81,1.21)     ~    (p=0.866)
SetTypePtr16          28.7ns × (1.00,1.00)  22.4ns × (0.90,1.27)  -21.88% (p=0.015)
SetTypePtr32          52.3ns × (1.00,1.00)  33.8ns × (0.93,1.24)  -35.37% (p=0.001)
SetTypePtr64          79.2ns × (1.00,1.00)  55.1ns × (1.00,1.01)  -30.43% (p=0.000)
SetTypePtr126          118ns × (1.00,1.00)   100ns × (1.00,1.00)  -15.97% (p=0.000)
SetTypePtr128          130ns × (0.92,1.19)    98ns × (1.00,1.00)  -24.36% (p=0.008)
SetTypePtrSlice        726ns × (0.96,1.08)   760ns × (1.00,1.00)     ~    (p=0.152)
SetTypeNode1          14.1ns × (0.94,1.15)  12.0ns × (1.00,1.01)  -14.60% (p=0.020)
SetTypeNode1Slice      135ns × (0.96,1.07)    88ns × (1.00,1.00)  -34.53% (p=0.000)
SetTypeNode8          20.9ns × (1.00,1.01)  32.6ns × (1.00,1.00)  +55.37% (p=0.000) **
SetTypeNode8Slice      414ns × (0.99,1.02)   244ns × (1.00,1.00)  -41.09% (p=0.000)
SetTypeNode64         80.0ns × (1.00,1.00)  57.4ns × (1.00,1.00)  -28.23% (p=0.000)
SetTypeNode64Slice    2.15µs × (1.00,1.01)  1.56µs × (1.00,1.00)  -27.43% (p=0.000)
SetTypeNode124         119ns × (0.99,1.00)   100ns × (1.00,1.00)  -16.11% (p=0.000)
SetTypeNode124Slice   3.40µs × (1.00,1.00)  2.93µs × (1.00,1.00)  -13.80% (p=0.000)
SetTypeNode126         120ns × (1.00,1.01)    98ns × (1.00,1.00)  -18.19% (p=0.000)
SetTypeNode126Slice   3.53µs × (0.98,1.08)  3.02µs × (1.00,1.00)  -14.49% (p=0.002)
SetTypeNode1024        726ns × (0.97,1.09)   740ns × (1.00,1.00)     ~    (p=0.451)
SetTypeNode1024Slice  24.9µs × (0.89,1.37)  23.1µs × (1.00,1.00)     ~    (p=0.476)

Compared to Go 1.4 (** = new atomic)
name                  old mean               new mean              delta
SetTypePtr            5.71ns × (0.89,1.19)  14.68ns × (0.93,1.10)  +157.24% (p=0.000) **
SetTypePtr8           19.3ns × (0.96,1.10)   18.6ns × (0.81,1.21)      ~    (p=0.638)
SetTypePtr16          30.7ns × (0.99,1.03)   22.4ns × (0.90,1.27)   -26.88% (p=0.005)
SetTypePtr32          51.5ns × (1.00,1.00)   33.8ns × (0.93,1.24)   -34.40% (p=0.001)
SetTypePtr64          83.6ns × (0.94,1.12)   55.1ns × (1.00,1.01)   -34.12% (p=0.001)
SetTypePtr126          137ns × (0.87,1.26)    100ns × (1.00,1.00)   -27.10% (p=0.028)
SetTypePtrSlice        865ns × (0.80,1.23)    760ns × (1.00,1.00)      ~    (p=0.243)
SetTypeNode1          15.2ns × (0.88,1.12)   12.0ns × (1.00,1.01)   -20.89% (p=0.014)
SetTypeNode1Slice      156ns × (0.93,1.16)     88ns × (1.00,1.00)   -43.57% (p=0.001)
SetTypeNode8          23.8ns × (0.90,1.18)   32.6ns × (1.00,1.00)   +36.76% (p=0.003) **
SetTypeNode8Slice      502ns × (0.92,1.10)    244ns × (1.00,1.00)   -51.46% (p=0.000)
SetTypeNode64         85.6ns × (0.94,1.11)   57.4ns × (1.00,1.00)   -32.89% (p=0.001)
SetTypeNode64Slice    2.36µs × (0.91,1.14)   1.56µs × (1.00,1.00)   -33.96% (p=0.002)
SetTypeNode124         130ns × (0.91,1.12)    100ns × (1.00,1.00)   -23.49% (p=0.004)
SetTypeNode124Slice   3.81µs × (0.90,1.22)   2.93µs × (1.00,1.00)   -23.09% (p=0.025)

There are fewer benchmarks vs Go 1.4 because unrolling directly
into the heap bitmap is not yet implemented, so those would not
be meaningful comparisons.

These benchmarks were not present in Go 1.4 as distributed.
The backport to Go 1.4 is in github.com/rsc/go's go14bench branch,
commit 71d5ee5.

Change-Id: I95ed05a22bf484b0fc9efad549279e766c98d2b6
Reviewed-on: https://go-review.googlesource.com/9704
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-11 14:51:20 +00:00
Russ Cox
0234dfd493 runtime: use 2-bit heap bitmap (in place of 4-bit)
Previous CLs changed the representation of the non-heap type bitmaps
to be 1-bit bitmaps (pointer or not). Before this CL, the heap bitmap
stored a 2-bit type for each word and a mark bit and checkmark bit
for the first word of the object. (There used to be additional per-word bits.)

Reduce heap bitmap to 2-bit, with 1 dedicated to pointer or not,
and the other used for mark, checkmark, and "keep scanning forward
to find pointers in this object." See comments for details.

This CL replaces heapBitsSetType with very slow but obviously correct code.
A followup CL will optimize it. (Spoiler: the new code is faster than Go 1.4 was.)

Change-Id: I999577a133f3cfecacebdec9cdc3573c235c7fb9
Reviewed-on: https://go-review.googlesource.com/9703
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 14:43:45 +00:00
Russ Cox
6d8a147bef runtime: use 1-bit pointer bitmaps in type representation
The type information in reflect.Type and the GC programs is now
1 bit per word, down from 2 bits.

The in-memory unrolled type bitmap representation are now
1 bit per word, down from 4 bits.

The conversion from the unrolled (now 1-bit) bitmap to the
heap bitmap (still 4-bit) is not optimized. A followup CL will
work on that, after the heap bitmap has been converted to 2-bit.

The typeDead optimization, in which a special value denotes
that there are no more pointers anywhere in the object, is lost
in this CL. A followup CL will bring it back in the final form of
heapBitsSetType.

Change-Id: If61e67950c16a293b0b516a6fd9a1c755b6d5549
Reviewed-on: https://go-review.googlesource.com/9702
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 14:43:33 +00:00
Russ Cox
7d9e16abc6 runtime: add benchmark of heapBitsSetType
There was an old benchmark that measured this indirectly
via allocation, but I don't understand how to factor out the
allocation cost when interpreting the numbers.

Replace with a benchmark that only calls heapBitsSetType,
that does not allocate. This was not possible when the
benchmark was first written, because heapBitsSetType had
not been factored out of mallocgc.

Change-Id: I30f0f02362efab3465a50769398be859832e6640
Reviewed-on: https://go-review.googlesource.com/9701
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 14:40:27 +00:00
Daniel Morsing
db6f88a84b runtime: enable profiling on g0
Since we now have stack information for code running on the
systemstack, we can traceback over it. To make cpu profiles useful,
add a case in gentraceback to jump over systemstack switches.

Fixes #10609.

Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7
Reviewed-on: https://go-review.googlesource.com/9506
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-05-11 08:44:30 +00:00
Shenghou Ma
fd392ee52b cmd/internal/ld: generate correct .debug_frames on RISC architectures
With this patch, gdb seems to be able to corretly backtrace Go
process on at least linux/{arm,arm64,ppc64}.

Change-Id: Ic40a2a70e71a19c4a92e4655710f38a807b67e9a
Reviewed-on: https://go-review.googlesource.com/9822
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-08 00:34:27 +00:00
Russ Cox
0211d7d7b0 runtime: turn off checkmark by default
Change-Id: Ic8cb8b1ed8715d6d5a53ec3cac385c0e93883514
Reviewed-on: https://go-review.googlesource.com/9825
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-07 21:08:42 +00:00
Russ Cox
9626561030 runtime: fix gccheckmark mode and enable by default
It was testing the mark bits on what roots pointed at,
but not the remainder of the live heap, because in
CL 2991 I accidentally inverted this check during
refactoring.

The next CL will turn it back off by default again,
but I want one run on the builders with the full
checkmark checks.

Change-Id: Ic166458cea25c0a56e5387fc527cb166ff2e5ada
Reviewed-on: https://go-review.googlesource.com/9824
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-07 21:08:29 +00:00
Rick Hudson
b6e178ed7e runtime: set heap minimum default based on GOGC
Currently the heap minimum is set to 4MB which prevents our ability to
collect at every allocation by setting GOGC=0. This adjust the
heap minimum to 4MB*GOGC/100 thus reenabling collecting at every allocation.
Fixes #10681

Change-Id: I912d027dac4b14ae535597e8beefa9ac3fb8ad94
Reviewed-on: https://go-review.googlesource.com/9814
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-07 21:05:58 +00:00
Michael Hudson-Doyle
fa896733b5 runtime: check consistency of all module data objects
Current code just checks the consistency (that the functab is correctly
sorted by PC, etc) of the moduledata object that the runtime belongs to.
Change to check all of them.

Change-Id: I544a44c5de7445fff87d3cdb4840ff04c5e2bf75
Reviewed-on: https://go-review.googlesource.com/9773
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-05-07 15:06:08 +00:00
Alex Brainman
a52dc9fcbd runtime: fix comments that mention g status values
Makes searching in source code easier.

Change-Id: Ie2e85934d23920ac0bc01d28168bcfbbdc465580
Reviewed-on: https://go-review.googlesource.com/9774
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Minux Ma <minux@golang.org>
2015-05-07 00:00:38 +00:00
Austin Clements
17db6e0420 runtime: use heap scan size as estimate of GC scan work
Currently, the GC uses a moving average of recent scan work ratios to
estimate the total scan work required by this cycle. This is in turn
used to compute how much scan work should be done by mutators when
they allocate in order to perform all expected scan work by the time
the allocated heap reaches the heap goal.

However, our current scan work estimate can be arbitrarily wrong if
the heap topography changes significantly from one cycle to the
next. For example, in the go1 benchmarks, at the beginning of each
benchmark, the heap is dominated by a 256MB no-scan object, so the GC
learns that the scan density of the heap is very low. In benchmarks
that then rapidly allocate pointer-dense objects, by the time of the
next GC cycle, our estimate of the scan work can be too low by a large
factor. This in turn lets the mutator allocate faster than the GC can
collect, allowing it to get arbitrarily far ahead of the scan work
estimate, which leads to very long GC cycles with very little mutator
assist that can overshoot the heap goal by large margins. This is
particularly easy to demonstrate with BinaryTree17:

$ GODEBUG=gctrace=1 ./go1.test -test.bench BinaryTree17
gc #1 @0.017s 2%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 4->262->262 MB, 4 MB goal, 1 P
gc #2 @0.026s 3%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 262->262->262 MB, 524 MB goal, 1 P
testing: warning: no tests to run
PASS
BenchmarkBinaryTree17	gc #3 @1.906s 0%: 0+0+0+0+7 ms clock, 0+0+0+0/0/0+7 ms cpu, 325->325->287 MB, 325 MB goal, 1 P (forced)
gc #4 @12.203s 20%: 0+0+0+10067+10 ms clock, 0+0+0+0/2523/852+10 ms cpu, 430->2092->1950 MB, 574 MB goal, 1 P
       1       9150447353 ns/op

Change this estimate to instead use the *current* scannable heap
size. This has the advantage of being based solely on the current
state of the heap, not on past densities or reachable heap sizes, so
it isn't susceptible to falling behind during these sorts of phase
changes. This is strictly an over-estimate, but it's better to
over-estimate and get more assist than necessary than it is to
under-estimate and potentially spiral out of control. Experiments with
scaling this estimate back showed no obvious benefit for mutator
utilization, heap size, or assist time.

This new estimate has little effect for most benchmarks, including
most go1 benchmarks, x/benchmarks, and the 6g benchmark. It has a huge
effect for benchmarks that triggered the bad pacer behavior:

name                   old mean              new mean              delta
BinaryTree17            10.0s × (1.00,1.00)    3.5s × (0.98,1.01)  -64.93% (p=0.000)
Fannkuch11              2.74s × (1.00,1.01)   2.65s × (1.00,1.00)   -3.52% (p=0.000)
FmtFprintfEmpty        56.4ns × (0.99,1.00)  57.8ns × (1.00,1.01)   +2.43% (p=0.000)
FmtFprintfString        187ns × (0.99,1.00)   185ns × (0.99,1.01)   -1.19% (p=0.010)
FmtFprintfInt           184ns × (1.00,1.00)   183ns × (1.00,1.00)  (no variance)
FmtFprintfIntInt        321ns × (1.00,1.00)   315ns × (1.00,1.00)   -1.80% (p=0.000)
FmtFprintfPrefixedInt   266ns × (1.00,1.00)   263ns × (1.00,1.00)   -1.22% (p=0.000)
FmtFprintfFloat         353ns × (1.00,1.00)   353ns × (1.00,1.00)   -0.13% (p=0.035)
FmtManyArgs            1.21µs × (1.00,1.00)  1.19µs × (1.00,1.00)   -1.33% (p=0.000)
GobDecode              9.69ms × (1.00,1.00)  9.59ms × (1.00,1.00)   -1.07% (p=0.000)
GobEncode              7.89ms × (0.99,1.01)  7.74ms × (1.00,1.00)   -1.92% (p=0.000)
Gzip                    391ms × (1.00,1.00)   392ms × (1.00,1.00)     ~    (p=0.522)
Gunzip                 97.1ms × (1.00,1.00)  97.0ms × (1.00,1.00)   -0.10% (p=0.000)
HTTPClientServer       55.7µs × (0.99,1.01)  56.7µs × (0.99,1.01)   +1.81% (p=0.001)
JSONEncode             19.1ms × (1.00,1.00)  19.0ms × (1.00,1.00)   -0.85% (p=0.000)
JSONDecode             66.8ms × (1.00,1.00)  66.9ms × (1.00,1.00)     ~    (p=0.288)
Mandelbrot200          4.13ms × (1.00,1.00)  4.12ms × (1.00,1.00)   -0.08% (p=0.000)
GoParse                3.97ms × (1.00,1.01)  4.01ms × (1.00,1.00)   +0.99% (p=0.000)
RegexpMatchEasy0_32     114ns × (1.00,1.00)   115ns × (0.99,1.00)     ~    (p=0.070)
RegexpMatchEasy0_1K     376ns × (1.00,1.00)   376ns × (1.00,1.00)     ~    (p=0.900)
RegexpMatchEasy1_32    94.9ns × (1.00,1.00)  96.3ns × (1.00,1.01)   +1.53% (p=0.001)
RegexpMatchEasy1_1K     568ns × (1.00,1.00)   567ns × (1.00,1.00)   -0.22% (p=0.001)
RegexpMatchMedium_32    159ns × (1.00,1.00)   159ns × (1.00,1.00)     ~    (p=0.178)
RegexpMatchMedium_1K   46.4µs × (1.00,1.00)  46.6µs × (1.00,1.00)   +0.29% (p=0.000)
RegexpMatchHard_32     2.37µs × (1.00,1.00)  2.37µs × (1.00,1.00)     ~    (p=0.722)
RegexpMatchHard_1K     71.1µs × (1.00,1.00)  71.2µs × (1.00,1.00)     ~    (p=0.229)
Revcomp                 565ms × (1.00,1.00)   562ms × (1.00,1.00)   -0.52% (p=0.000)
Template               81.0ms × (1.00,1.00)  80.2ms × (1.00,1.00)   -0.97% (p=0.000)
TimeParse               380ns × (1.00,1.00)   380ns × (1.00,1.00)     ~    (p=0.148)
TimeFormat              405ns × (0.99,1.00)   385ns × (0.99,1.00)   -5.00% (p=0.000)

Change-Id: I11274158bf3affaf62662e02de7af12d5fb789e4
Reviewed-on: https://go-review.googlesource.com/9696
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-06 19:40:38 +00:00
Austin Clements
3be3cbd548 runtime: track "scannable" bytes of heap
This tracks the number of scannable bytes in the allocated heap. That
is, bytes that the garbage collector must scan before reaching the
last pointer field in each object.

This will be used to compute a more robust estimate of the GC scan
work.

Change-Id: I1eecd45ef9cdd65b69d2afb5db5da885c80086bb
Reviewed-on: https://go-review.googlesource.com/9695
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:33 +00:00
Austin Clements
53c53984e7 runtime: include scalar slots in GC scan work metric
The garbage collector predicts how much "scan work" must be done in a
cycle to determine how much work should be done by mutators when they
allocate. Most code doesn't care what units the scan work is in: it
simply knows that a certain amount of scan work has to be done in the
cycle. Currently, the GC uses the number of pointer slots scanned as
the scan work on the theory that this is the bulk of the time spent in
the garbage collector and hence reflects real CPU resource usage.
However, this metric is difficult to estimate at the beginning of a
cycle.

Switch to counting the total number of bytes scanned, including both
pointer and scalar slots. This is still less than the total marked
heap since it omits no-scan objects and no-scan tails of objects. This
metric may not reflect absolute performance as well as the count of
scanned pointer slots (though it still takes time to scan scalar
fields), but it will be much easier to estimate robustly, which is
more important.

Change-Id: Ie3a5eeeb0384a1ca566f61b2f11e9ff3a75ca121
Reviewed-on: https://go-review.googlesource.com/9694
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:27 +00:00
Austin Clements
c4931a8433 runtime: dispose gcWork caches before updating controller state
Currently, we only flush the per-P gcWork caches in gcMark, at the
beginning of mark termination. This is necessary to ensure that no
work is held up in these caches.

However, this flush happens after we update the GC controller state,
which depends on statistics about marked heap size and scan work that
are only updated by this flush. Hence, the controller is missing the
bulk of heap marking and scan work. This bug was introduced in commit
1b4025f, which introduced the per-P gcWork caches.

Fix this by flushing these caches before we update the GC controller
state. We continue to flush them at the beginning of mark termination
as well to be robust in case any write barriers happened between the
previous flush and entering mark termination, but this should be a
no-op.

Change-Id: I8f0f91024df967ebf0c616d1c4f0c339c304ebaa
Reviewed-on: https://go-review.googlesource.com/9646
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-06 19:40:22 +00:00
Rick Hudson
1845314560 runtime: remove unused GC timers
During development some tracing routines were added that are not
needed in the release. These included GCstarttimes, GCendtimes, and
GCprinttimes.
Fixes #10462

Change-Id: I0788e6409d61038571a5ae0cbbab793102df0a65
Reviewed-on: https://go-review.googlesource.com/9689
Reviewed-by: Austin Clements <austin@google.com>
2015-05-06 12:53:08 +00:00