1
0
mirror of https://github.com/golang/go synced 2024-10-02 16:28:34 -06:00
Commit Graph

2145 Commits

Author SHA1 Message Date
Rick Hudson
e9eaa181fc [dev.garbage] runtime: simplify nextFreeFast so it is inlined
nextFreeFast is currently not inlined by the compiler due
to its size and complexity. This CL simplifies
nextFreeFast by letting the slow path handle (nextFree)
handle a corner cases.

Change-Id: Ia9c5d1a7912bcb4bec072f5fd240f0e0bafb20e4
Reviewed-on: https://go-review.googlesource.com/22598
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2016-04-29 16:47:11 +00:00
Austin Clements
b3579c095e [dev.garbage] runtime: revive sweep fast path
sweep used to skip mcental.freeSpan (and its locking) if it didn't
find any new free objects. We lost that optimization when the
freed-object counting changed in dad83f7 to count total free objects
instead of newly freed objects.

The previous commit brings back counting of newly freed objects, so we
can easily revive this optimization by checking that count (like we
used to) instead of the total free objects count.

Change-Id: I43658707a1c61674d0366124d5976b00d98741a9
Reviewed-on: https://go-review.googlesource.com/22596
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-29 15:25:28 +00:00
Austin Clements
d97625ae9e [dev.garbage] runtime: fix nfree accounting
Commit 8dda1c4 changed the meaning of "nfree" in sweep from the number
of newly freed objects to the total number of free objects in the
span, but didn't update where sweep added nfree to c.local_nsmallfree.
Hence, we're over-accounting the number of frees. This is causing
TestArrayHash to fail with "too many allocs NNN - hash not balanced".

Fix this by computing the number of newly freed objects and adding
that to c.local_nsmallfree, so it behaves like it used to. Computing
this requires a small tweak to mallocgc: apparently we've never set
s.allocCount when allocating a large object; fix this by setting it to
1 so sweep doesn't get confused.

Change-Id: I31902ffd310110da4ffd807c5c06f1117b872dc8
Reviewed-on: https://go-review.googlesource.com/22595
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2016-04-29 15:25:26 +00:00
Austin Clements
6d11490539 [dev.garbage] runtime: fix allocfreetrace
We broke tracing of freed objects in GODEBUG=allocfreetrace=1 mode
when we removed the sweep over the mark bitmap. Fix it by
re-introducing the sweep over the bitmap specifically if we're in
allocfreetrace mode. This doesn't have to be even remotely efficient,
since the overhead of allocfreetrace is huge anyway, so we can keep
the code for this down to just a few lines.

Change-Id: I9e176b3b04c73608a0ea3068d5d0cd30760ebd40
Reviewed-on: https://go-review.googlesource.com/22592
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-29 15:08:21 +00:00
Austin Clements
38f674687a [dev.garbage] runtime: reintroduce no-zeroing optimization
Currently we always zero objects when we allocate them. We used to
have an optimization that would not zero objects that had not been
allocated since the whole span was last zeroed (either by getting it
from the system or by getting it from the heap, which does a bulk
zero), but this depended on the sweeper clobbering the first two words
of each object. Hence, we lost this optimization when the bitmap
sweeper went away.

Re-introduce this optimization using a different mechanism. Each span
already keeps a flag indicating that it just came from the OS or was
just bulk zeroed by the mheap. We can simply use this flag to know
when we don't need to zero an object. This is slightly less efficient
than the old optimization: if a span gets allocated and partially
used, then GC happens and the span gets returned to the mcentral, then
the span gets re-acquired, the old optimization knew that it only had
to re-zero the objects that had been reclaimed, whereas this
optimization will re-zero everything. However, in this case, you're
already paying for the garbage collection, and you've only wasted one
zeroing of the span, so in practice there seems to be little
difference. (If we did want to revive the full optimization, each span
could keep track of a frontier beyond which all free slots are zeroed.
I prototyped this and it didn't obvious do any better than the much
simpler approach in this commit.)

This significantly improves BinaryTree17, which is allocation-heavy
(and runs first, so most pages are already zeroed), and slightly
improves everything else.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.15ms ± 1%  2.14ms ± 1%  -0.80%  (p=0.000 n=17+17)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.71s ± 1%     2.56s ± 1%  -5.73%        (p=0.000 n=18+19)
DivconstI64-12              1.70ns ± 1%    1.70ns ± 1%    ~           (p=0.562 n=18+18)
DivconstU64-12              1.74ns ± 2%    1.74ns ± 1%    ~           (p=0.394 n=20+20)
DivconstI32-12              1.74ns ± 0%    1.74ns ± 0%    ~     (all samples are equal)
DivconstU32-12              1.66ns ± 1%    1.66ns ± 0%    ~           (p=0.516 n=15+16)
DivconstI16-12              1.84ns ± 0%    1.84ns ± 0%    ~     (all samples are equal)
DivconstU16-12              1.82ns ± 0%    1.82ns ± 0%    ~     (all samples are equal)
DivconstI8-12               1.79ns ± 0%    1.79ns ± 0%    ~     (all samples are equal)
DivconstU8-12               1.60ns ± 0%    1.60ns ± 1%    ~           (p=0.603 n=17+19)
Fannkuch11-12                2.11s ± 1%     2.11s ± 0%    ~           (p=0.333 n=16+19)
FmtFprintfEmpty-12          45.1ns ± 4%    45.4ns ± 5%    ~           (p=0.111 n=20+20)
FmtFprintfString-12          134ns ± 0%     129ns ± 0%  -3.45%        (p=0.000 n=18+16)
FmtFprintfInt-12             131ns ± 1%     129ns ± 1%  -1.54%        (p=0.000 n=16+18)
FmtFprintfIntInt-12          205ns ± 2%     203ns ± 0%  -0.56%        (p=0.014 n=20+18)
FmtFprintfPrefixedInt-12     200ns ± 2%     197ns ± 1%  -1.48%        (p=0.000 n=20+18)
FmtFprintfFloat-12           256ns ± 1%     256ns ± 0%  -0.21%        (p=0.008 n=18+20)
FmtManyArgs-12               805ns ± 0%     804ns ± 0%  -0.19%        (p=0.001 n=18+18)
GobDecode-12                7.21ms ± 1%    7.14ms ± 1%  -0.92%        (p=0.000 n=19+20)
GobEncode-12                5.88ms ± 1%    5.88ms ± 1%    ~           (p=0.641 n=18+19)
Gzip-12                      218ms ± 1%     218ms ± 1%    ~           (p=0.271 n=19+18)
Gunzip-12                   37.1ms ± 0%    36.9ms ± 0%  -0.29%        (p=0.000 n=18+17)
HTTPClientServer-12         78.1µs ± 2%    77.4µs ± 2%    ~           (p=0.070 n=19+19)
JSONEncode-12               15.5ms ± 1%    15.5ms ± 0%    ~           (p=0.063 n=20+18)
JSONDecode-12               56.1ms ± 0%    55.4ms ± 1%  -1.18%        (p=0.000 n=19+18)
Mandelbrot200-12            4.05ms ± 0%    4.06ms ± 0%  +0.29%        (p=0.001 n=18+18)
GoParse-12                  3.28ms ± 1%    3.21ms ± 1%  -2.30%        (p=0.000 n=20+20)
RegexpMatchEasy0_32-12      69.4ns ± 2%    69.3ns ± 1%    ~           (p=0.205 n=18+16)
RegexpMatchEasy0_1K-12       239ns ± 0%     239ns ± 0%    ~     (all samples are equal)
RegexpMatchEasy1_32-12      69.4ns ± 1%    69.4ns ± 1%    ~           (p=0.620 n=15+18)
RegexpMatchEasy1_1K-12       370ns ± 1%     369ns ± 2%    ~           (p=0.088 n=20+20)
RegexpMatchMedium_32-12      108ns ± 0%     108ns ± 0%    ~     (all samples are equal)
RegexpMatchMedium_1K-12     33.6µs ± 3%    33.5µs ± 3%    ~           (p=0.718 n=20+20)
RegexpMatchHard_32-12       1.68µs ± 1%    1.67µs ± 2%    ~           (p=0.316 n=20+20)
RegexpMatchHard_1K-12       50.5µs ± 3%    50.4µs ± 3%    ~           (p=0.659 n=20+20)
Revcomp-12                   381ms ± 1%     381ms ± 1%    ~           (p=0.916 n=19+18)
Template-12                 66.5ms ± 1%    65.8ms ± 2%  -1.08%        (p=0.000 n=20+20)
TimeParse-12                 317ns ± 0%     319ns ± 0%  +0.48%        (p=0.000 n=19+12)
TimeFormat-12                338ns ± 0%     338ns ± 0%    ~           (p=0.124 n=19+18)
[Geo mean]                  5.99µs         5.96µs       -0.54%

Change-Id: I638ffd9d9f178835bbfa499bac20bd7224f1a907
Reviewed-on: https://go-review.googlesource.com/22591
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-29 15:08:13 +00:00
Austin Clements
3e2462387f [dev.garbage] runtime: eliminate mspan.start
This converts all remaining uses of mspan.start to instead use
mspan.base(). In many cases, this actually reduces the complexity of
the code.

Change-Id: If113840e00d3345a6cf979637f6a152e6344aee7
Reviewed-on: https://go-review.googlesource.com/22590
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2016-04-29 03:53:17 +00:00
Austin Clements
b7adc41fba [dev.garbage] runtime: use s.base() everywhere it makes sense
Currently we have lots of (s.start << _PageShift) and variants. We now
have an s.base() function that returns this. It's faster and more
readable, so use it.

Change-Id: I888060a9dae15ea75ca8cc1c2b31c905e71b452b
Reviewed-on: https://go-review.googlesource.com/22559
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2016-04-29 03:53:14 +00:00
Austin Clements
2e8b74b695 [dev.garbage] runtime: document sysAlloc
In particular, it always returns an aligned pointer.

Change-Id: I763789a539a4bfd8b0efb36a39a80be1a479d3e2
Reviewed-on: https://go-review.googlesource.com/22558
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-29 03:53:12 +00:00
Austin Clements
15744c92de [dev.garbage] runtime: remove unused head/end arguments from freeSpan
These used to be used for the list of newly freed objects, but that's
no longer a thing.

Change-Id: I5a4503137b74ec0eae5372ca271b1aa0b32df074
Reviewed-on: https://go-review.googlesource.com/22557
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-29 03:53:08 +00:00
Rick Hudson
2fb75ea6c6 [dev.garbage] runtime: use sys.Ctz64 intrinsic
Our compilers now provides instrinsics including
sys.Ctz64 that support CTZ (count trailing zero)
instructions. This CL replaces the Go versions
of CTZ with the compiler intrinsic.

Count trailing zeros CTZ finds the least
significant 1 in a word and returns the number
of less significant 0s in the word.

Allocation uses the bitmap created by the garbage
collector to locate an unmarked object. The logic
takes a word of the bitmap, complements, and then
caches it. It then uses CTZ to locate an available
unmarked object. It then shifts marked bits out of
the bitmap word preparing it for the next search.
Once all the unmarked objects are used in the
cached work the bitmap gets another word and
repeats the process.

Change-Id: Id2fc42d1d4b9893efaa2e1bd01896985b7e42f82
Reviewed-on: https://go-review.googlesource.com/21366
Reviewed-by: Austin Clements <austin@google.com>
2016-04-29 00:00:50 +00:00
Rick Hudson
2063d5d903 [dev.garbage] runtime: restructure alloc and mark bits
Two changes are included here that are dependent on the other.
The first is that allocBits and gcamrkBits are changed to
a *uint8 which points to the first byte of that span's
mark and alloc bits. Several places were altered to
perform pointer arithmetic to locate the byte corresponding
to an object in the span. The actual bit corresponding
to an object is indexed in the byte by using the lower three
bits of the objects index.

The second change avoids the redundant calculation of an
object's index. The index is returned from heapBitsForObject
and then used by the functions indexing allocBits
and gcmarkBits.

Finally we no longer allocate the gc bits in the span
structures. Instead we use an arena based allocation scheme
that allows for a more compact bit map as well as recycling
and bulk clearing of the mark bits.

Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef
Reviewed-on: https://go-review.googlesource.com/20705
Reviewed-by: Austin Clements <austin@google.com>
2016-04-29 00:00:47 +00:00
Mikio Hara
be730b49ca runtime: drop _SigUnblock for SIGSYS on Linux
The _SigUnblock flag was appended to SIGSYS slot of runtime signal table
for Linux in https://go-review.googlesource.com/22202, but there is
still no concrete opinion on whether SIGSYS must be an unblocked signal
for runtime.

This change removes _SigUnblock flag from SIGSYS on Linux for
consistency in runtime signal handling and adds a reference to #15204 to
runtime signal table for FreeBSD.

Updates #15204.

Change-Id: I42992b1d852c2ab5dd37d6dbb481dba46929f665
Reviewed-on: https://go-review.googlesource.com/22537
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-28 21:48:44 +00:00
Rick Hudson
23aeb34df1 [dev.garbage] Merge remote-tracking branch 'origin/master' into HEAD
Change-Id: I282fd9ce9db435dfd35e882a9502ab1abc185297
2016-04-27 18:46:52 -04:00
Brad Fitzpatrick
06d639e075 runtime: fix SetCgoTraceback doc indentation
It wasn't rendering as HTML nicely.

Change-Id: I5408ec22932a05e85c210c0faa434bd19dce5650
Reviewed-on: https://go-review.googlesource.com/22532
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-27 22:12:01 +00:00
Rick Hudson
1354b32cd7 [dev.garbage] runtime: add gc work buffer tryGet and put fast paths
The complexity of the GC work buffers put and tryGet
prevented them from being inlined. This CL simplifies
the fast path thus enabling inlining. If the fast
path does not succeed the previous put and tryGet
functions are called.

Change-Id: I6da6495d0dadf42bd0377c110b502274cc01acf5
Reviewed-on: https://go-review.googlesource.com/20704
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:55:02 +00:00
Rick Hudson
f8d0d4fd59 [dev.garbage] runtime: cleanup and optimize span.base()
Prior to this CL the base of a span was calculated in various
places using shifts or calls to base(). This CL now
always calls base() which has been optimized to calculate the
base of the span when the span is initialized and store that
value in the span structure.

Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100
Reviewed-on: https://go-review.googlesource.com/20703
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:59 +00:00
Rick Hudson
8dda1c4c08 [dev.garbage] runtime: remove heapBitsSweepSpan
Prior to this CL the sweep phase was responsible for locating
all objects that were about to be freed and calling a function
to process the object. This was done by the function
heapBitsSweepSpan. Part of processing included calls to
tracefree and msanfree as well as counting how many objects
were freed.

The calls to tracefree and msanfree have been moved into the
gcmalloc routine and called when the object is about to be
reallocated. The counting of free objects has been optimized
using an array based popcnt algorithm and if all the objects
in a span are free then span is freed.

Similarly the code to locate the next free object has been
optimized to use an array based ctz (count trailing zero).
Various hot paths in the allocation logic have been optimized.

At this point the garbage benchmark is within 3% of the 1.6
release.

Change-Id: I00643c442e2ada1685c010c3447e4ea8537d2dfa
Reviewed-on: https://go-review.googlesource.com/20201
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:57 +00:00
Rick Hudson
4093481523 [dev.garbage] runtime: add bit and cache ctz64 (count trailing zero)
Add to each span a 64 bit cache (allocCache) of the allocBits
at freeindex. allocCache is shifted such that the lowest bit
corresponds to the bit freeindex. allocBits uses a 0 to
indicate an object is free, on the other hand allocCache
uses a 1 to indicate an object is free. This facilitates
ctz64 (count trailing zero) which counts the number of 0s
trailing the least significant 1. This is also the index of
the least significant 1.

Each span maintains a freeindex indicating the boundary
between allocated objects and unallocated objects. allocCache
is shifted as freeindex is incremented such that the low bit
in allocCache corresponds to the bit a freeindex in the
allocBits array.

Currently ctz64 is written in Go using a for loop so it is
not very efficient. Use of the hardware instruction will
follow. With this in mind comparisons of the garbage
benchmark are as follows.

1.6 release        2.8 seconds
dev:garbage branch 3.1 seconds.

Profiling shows the go implementation of ctz64 takes up
1% of the total time.

Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f
Reviewed-on: https://go-review.googlesource.com/20200
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:54 +00:00
Rick Hudson
44fe90d0b3 [dev.garbage] runtime: logic that uses count trailing zero (ctz)
Most (all?) processors that Go supports supply a hardware
instruction that takes a byte and returns the number
of zeros trailing the first 1 encountered, or 8
if no ones are found. This is the index within the
byte of the first 1 encountered. CTZ should improve the
performance of the nextFreeIndex function.

Since nextFreeIndex wants the next unmarked (0) bit
a bit-wise complement is needed before calling ctz.
Furthermore unmarked bits associated with previously
allocated objects need to be ignored. Instead of writing
a 1 as we allocate the code masks all bits less than the
freeindex after loading the byte.

While this CL does not actual execute a CTZ instruction
it supplies a ctz function with the appropiate signature
along with the logic to execute it.

Change-Id: I5c55ce0ed48ca22c21c4dd9f969b0819b4eadaa7
Reviewed-on: https://go-review.googlesource.com/20169
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:52 +00:00
Rick Hudson
e4ac2d4acc [dev.garbage] runtime: replace ref with allocCount
This is a renaming of the field ref to the
more appropriate allocCount. The field
holds the number of objects in the span
that are currently allocated. Some throws
strings were adjusted to more accurately
convey the meaning of allocCount.

Change-Id: I10daf44e3e9cc24a10912638c7de3c1984ef8efe
Reviewed-on: https://go-review.googlesource.com/19518
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:49 +00:00
Rick Hudson
3479b065d4 [dev.garbage] runtime: allocate directly from GC mark bits
Instead of building a freelist from the mark bits generated
by the GC this CL allocates directly from the mark bits.

The approach moves the mark bits from the pointer/no pointer
heap structures into their own per span data structures. The
mark/allocation vectors consist of a single mark bit per
object. Two vectors are maintained, one for allocation and
one for the GC's mark phase. During the GC cycle's sweep
phase the interpretation of the vectors is swapped. The
mark vector becomes the allocation vector and the old
allocation vector is cleared and becomes the mark vector that
the next GC cycle will use.

Marked entries in the allocation vector indicate that the
object is not free. Each allocation vector maintains a boundary
between areas of the span already allocated from and areas
not yet allocated from. As objects are allocated this boundary
is moved until it reaches the end of the span. At this point
further allocations will be done from another span.

Since we no longer sweep a span inspecting each freed object
the responsibility for maintaining pointer/scalar bits in
the heapBitMap containing is now the responsibility of the
the routines doing the actual allocation.

This CL is functionally complete and ready for performance
tuning.

Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04
Reviewed-on: https://go-review.googlesource.com/19470
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:47 +00:00
Rick Hudson
dc65a82eff [dev.garbage] runtime: mark/allocation helper functions
The gcmarkBits is a bit vector used by the GC to mark
reachable objects. Once a GC cycle is complete the gcmarkBits
swap places with the allocBits. allocBits is then used directly
by malloc to locate free objects, thus avoiding the
construction of a linked free list. This CL introduces a set
of helper functions for manipulating gcmarkBits and allocBits
that will be used by later CLs to realize the actual
algorithm. Minimal attempts have been made to optimize these
helper routines.

Change-Id: I55ad6240ca32cd456e8ed4973c6970b3b882dd34
Reviewed-on: https://go-review.googlesource.com/19420
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-27 21:54:44 +00:00
Rick Hudson
e1c4e9a754 [dev.garbage] runtime: refactor next free object
In preparation for changing how the next free object is chosen
refactor and consolidate code into a single function.

Change-Id: I6836cd88ed7cbf0b2df87abd7c1c3b9fabc1cbd8
Reviewed-on: https://go-review.googlesource.com/19317
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:41 +00:00
Rick Hudson
aed861038f [dev.garbage] runtime: add stackfreelist
The freelist for normal objects and the freelist
for stacks share the same mspan field for holding
the list head but are operated on by different code
sequences. This overloading complicates the use of bit
vectors for allocation of normal objects. This change
refactors the use of the stackfreelist out from the
use of freelist.

Change-Id: I5b155b5b8a1fcd8e24c12ee1eb0800ad9b6b4fa0
Reviewed-on: https://go-review.googlesource.com/19315
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:39 +00:00
Rick Hudson
2ac8bdc52a [dev.garbage] runtime: bitmap allocation data structs
The bitmap allocation data structure prototypes. Before
this is released these underlying data structures need
to be more performant but the signatures of helper
functions utilizing these structures will remain stable.

Change-Id: I5ace12f2fb512a7038a52bbde2bfb7e98783bcbe
Reviewed-on: https://go-review.googlesource.com/19221
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-27 21:54:35 +00:00
Austin Clements
b49b71ae19 runtime: don't rescan globals
Currently the runtime rescans globals during mark 2 and mark
termination. This costs as much as 500µs/MB in STW time, which is
enough to surpass the 10ms STW limit with only 20MB of globals.

It's also basically unnecessary. The compiler already generates write
barriers for global -> heap pointer updates and the regular write
barrier doesn't check whether the slot is a global or in the heap.
Some less common write barriers do cause problems.
heapBitsBulkBarrier, which is used by typedmemmove and related
functions, currently depends on having access to the pointer bitmap
and as a result ignores writes to globals. Likewise, the
reflect-related write barriers reflect_typedmemmovepartial and
callwritebarrier ignore non-heap destinations; though it appears they
can never be called with global pointers anyway.

This commit makes heapBitsBulkBarrier issue write barriers for writes
to global pointers using the data and BSS pointer bitmaps, removes the
inheap checks from the reflection write barriers, and eliminates the
rescans during mark 2 and mark termination. It also adds a test that
writes to globals have write barriers.

Programs with large data+BSS segments (with pointers) aren't common,
but for programs that do have large data+BSS segments, this
significantly reduces pause time:

name \ 95%ile-time/markTerm              old         new  delta
LargeBSS/bss:1GB/gomaxprocs:4  148200µs ± 6%  302µs ±52%  -99.80% (p=0.008 n=5+5)

This very slightly improves the go1 benchmarks:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.62s ± 3%     2.62s ± 4%    ~     (p=0.904 n=20+20)
Fannkuch11-12                2.15s ± 1%     2.13s ± 0%  -1.29%  (p=0.000 n=18+20)
FmtFprintfEmpty-12          48.3ns ± 2%    47.6ns ± 1%  -1.52%  (p=0.000 n=20+16)
FmtFprintfString-12          152ns ± 0%     152ns ± 1%    ~     (p=0.725 n=18+18)
FmtFprintfInt-12             150ns ± 1%     149ns ± 1%  -1.14%  (p=0.000 n=19+20)
FmtFprintfIntInt-12          250ns ± 0%     244ns ± 1%  -2.12%  (p=0.000 n=20+18)
FmtFprintfPrefixedInt-12     219ns ± 1%     217ns ± 1%  -1.20%  (p=0.000 n=19+20)
FmtFprintfFloat-12           280ns ± 0%     281ns ± 1%  +0.47%  (p=0.000 n=19+19)
FmtManyArgs-12               928ns ± 0%     923ns ± 1%  -0.53%  (p=0.000 n=19+18)
GobDecode-12                7.21ms ± 1%    7.24ms ± 2%    ~     (p=0.091 n=19+19)
GobEncode-12                6.07ms ± 1%    6.05ms ± 1%  -0.36%  (p=0.002 n=20+17)
Gzip-12                      265ms ± 1%     265ms ± 1%    ~     (p=0.496 n=20+19)
Gunzip-12                   39.6ms ± 1%    39.3ms ± 1%  -0.85%  (p=0.000 n=19+19)
HTTPClientServer-12         74.0µs ± 2%    73.8µs ± 1%    ~     (p=0.569 n=20+19)
JSONEncode-12               15.4ms ± 1%    15.3ms ± 1%  -0.25%  (p=0.049 n=17+17)
JSONDecode-12               53.7ms ± 2%    53.0ms ± 1%  -1.29%  (p=0.000 n=18+17)
Mandelbrot200-12            3.97ms ± 1%    3.97ms ± 0%    ~     (p=0.072 n=17+18)
GoParse-12                  3.35ms ± 2%    3.36ms ± 1%  +0.51%  (p=0.005 n=18+20)
RegexpMatchEasy0_32-12      72.7ns ± 2%    72.2ns ± 1%  -0.70%  (p=0.005 n=19+19)
RegexpMatchEasy0_1K-12       246ns ± 1%     245ns ± 0%  -0.60%  (p=0.000 n=18+16)
RegexpMatchEasy1_32-12      72.8ns ± 1%    72.5ns ± 1%  -0.37%  (p=0.011 n=18+18)
RegexpMatchEasy1_1K-12       380ns ± 1%     385ns ± 1%  +1.34%  (p=0.000 n=20+19)
RegexpMatchMedium_32-12      115ns ± 2%     115ns ± 1%  +0.44%  (p=0.047 n=20+20)
RegexpMatchMedium_1K-12     35.4µs ± 1%    35.5µs ± 1%    ~     (p=0.079 n=18+19)
RegexpMatchHard_32-12       1.83µs ± 0%    1.80µs ± 1%  -1.76%  (p=0.000 n=18+18)
RegexpMatchHard_1K-12       55.1µs ± 0%    54.3µs ± 1%  -1.42%  (p=0.000 n=18+19)
Revcomp-12                   386ms ± 1%     381ms ± 1%  -1.14%  (p=0.000 n=18+18)
Template-12                 61.5ms ± 2%    61.5ms ± 2%    ~     (p=0.647 n=19+20)
TimeParse-12                 338ns ± 0%     336ns ± 1%  -0.72%  (p=0.000 n=14+19)
TimeFormat-12                350ns ± 0%     357ns ± 0%  +2.05%  (p=0.000 n=19+18)
[Geo mean]                  55.3µs         55.0µs       -0.41%

Change-Id: I57e8720385a1b991aeebd111b6874354308e2a6b
Reviewed-on: https://go-review.googlesource.com/20829
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-27 18:48:16 +00:00
Austin Clements
30172f1811 runtime: make {add,subtract}{b,1} nosplit
These are used at the bottom level of various GC operations that must
not be preempted. To be on the safe side, mark them all nosplit.

Change-Id: I8f7360e79c9852bd044df71413b8581ad764380c
Reviewed-on: https://go-review.googlesource.com/22504
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-27 18:46:00 +00:00
David Crawshaw
217be5b35d reflect: unnamed interface types have no name
Fixes #15468

Change-Id: I8723171f87774a98d5e80e7832ebb96dd1fbea74
Reviewed-on: https://go-review.googlesource.com/22524
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
2016-04-27 18:06:20 +00:00
Zhongwei Yao
74a9bad638 cmd/compile: enable const division for arm64
performance:
benchmark                   old ns/op     new ns/op     delta
BenchmarkDivconstI64-8      8.28          2.70          -67.39%
BenchmarkDivconstU64-8      8.28          4.69          -43.36%
BenchmarkDivconstI32-8      8.28          6.39          -22.83%
BenchmarkDivconstU32-8      8.28          4.43          -46.50%
BenchmarkDivconstI16-8      5.17          5.17          +0.00%
BenchmarkDivconstU16-8      5.33          5.34          +0.19%
BenchmarkDivconstI8-8       3.50          3.50          +0.00%
BenchmarkDivconstU8-8       3.51          3.50          -0.28%

Fixes #15382

Change-Id: Ibce7b28f0586d593b33c4d4ecc5d5e7e7c905d13
Reviewed-on: https://go-review.googlesource.com/22292
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
2016-04-27 17:47:49 +00:00
Cherry Zhang
9629f55fbb cmd/link: remove absolute address for c-archive on darwin/arm
Now it is possible to build a c-archive as PIC on darwin/arm (this is
now the default). Then the system linker can link the binary using
the archive as PIE.

Fixes #12896.

Change-Id: Iad84131572422190f5fa036e7d71910dc155f155
Reviewed-on: https://go-review.googlesource.com/22461
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-04-27 16:22:06 +00:00
Dmitry Vyukov
6dfba5c7ce runtime/race: improve TestNoRaceIOHttp test
TestNoRaceIOHttp does all kinds of bad things:
1. Binds to a fixed port, so concurrent tests fail.
2. Registers HTTP handler multiple times, so repeated tests fail.
3. Relies on sleep to wait for listen.

Fix all of that.

Change-Id: I1210b7797ef5e92465b37dc407246d92a2a24fe8
Reviewed-on: https://go-review.googlesource.com/19953
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-27 08:08:18 +00:00
Austin Clements
2a889b9d93 runtime: make stack re-scan O(# dirty stacks)
Currently the stack re-scan during mark termination is O(# stacks)
because we enqueue a root marking job for every goroutine. It takes
~34ns to process this root marking job for a valid (clean) stack, so
at around 300k goroutines we exceed the 10ms pause goal. A non-trivial
portion of this time is spent simply taking the cache miss to check
the gcscanvalid flag, so simply optimizing the path that handles clean
stacks can only improve this so much.

Fix this by keeping an explicit list of goroutines with dirty stacks
that need to be rescanned. When a goroutine first transitions to
running after a stack scan and marks its stack dirty, it adds itself
to this list. We enqueue root marking jobs only for the goroutines in
this list, so this improves stack re-scanning asymptotically by
completely eliminating time spent on clean goroutines.

This reduces mark termination time for 500k idle goroutines from 15ms
to 238µs. Overall performance effect is negligible.

name \ 95%ile-time/markTerm     old           new         delta
IdleGs/gs:500000/gomaxprocs:12  15000µs ± 0%  238µs ± 5%  -98.41% (p=0.000 n=10+10)

name              old time/op  new time/op  delta
XBenchGarbage-12  2.30ms ± 3%  2.29ms ± 1%  -0.43%  (p=0.049 n=17+18)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.57s ± 3%     2.59s ± 2%    ~     (p=0.141 n=19+20)
Fannkuch11-12                2.09s ± 0%     2.10s ± 1%  +0.53%  (p=0.000 n=19+19)
FmtFprintfEmpty-12          45.3ns ± 3%    45.2ns ± 2%    ~     (p=0.845 n=20+20)
FmtFprintfString-12          129ns ± 0%     127ns ± 0%  -1.55%  (p=0.000 n=16+16)
FmtFprintfInt-12             123ns ± 0%     119ns ± 1%  -3.24%  (p=0.000 n=19+19)
FmtFprintfIntInt-12          195ns ± 1%     189ns ± 1%  -3.11%  (p=0.000 n=17+17)
FmtFprintfPrefixedInt-12     193ns ± 1%     187ns ± 1%  -3.06%  (p=0.000 n=19+19)
FmtFprintfFloat-12           254ns ± 0%     255ns ± 1%  +0.35%  (p=0.001 n=14+17)
FmtManyArgs-12               781ns ± 0%     770ns ± 0%  -1.48%  (p=0.000 n=16+19)
GobDecode-12                7.00ms ± 1%    6.98ms ± 1%    ~     (p=0.563 n=19+19)
GobEncode-12                5.91ms ± 1%    5.92ms ± 0%    ~     (p=0.118 n=19+18)
Gzip-12                      219ms ± 1%     215ms ± 1%  -1.81%  (p=0.000 n=18+18)
Gunzip-12                   37.2ms ± 0%    37.4ms ± 0%  +0.45%  (p=0.000 n=17+19)
HTTPClientServer-12         76.9µs ± 3%    77.5µs ± 2%  +0.81%  (p=0.030 n=20+19)
JSONEncode-12               15.0ms ± 0%    14.8ms ± 1%  -0.88%  (p=0.001 n=15+19)
JSONDecode-12               50.6ms ± 0%    53.2ms ± 2%  +5.07%  (p=0.000 n=17+19)
Mandelbrot200-12            4.05ms ± 0%    4.05ms ± 1%    ~     (p=0.581 n=16+17)
GoParse-12                  3.34ms ± 1%    3.30ms ± 1%  -1.21%  (p=0.000 n=15+20)
RegexpMatchEasy0_32-12      69.6ns ± 1%    69.8ns ± 2%    ~     (p=0.566 n=19+19)
RegexpMatchEasy0_1K-12       238ns ± 1%     236ns ± 0%  -0.91%  (p=0.000 n=17+13)
RegexpMatchEasy1_32-12      69.8ns ± 1%    70.0ns ± 1%  +0.23%  (p=0.026 n=17+16)
RegexpMatchEasy1_1K-12       371ns ± 1%     363ns ± 1%  -2.07%  (p=0.000 n=19+19)
RegexpMatchMedium_32-12      107ns ± 2%     106ns ± 1%  -0.51%  (p=0.031 n=18+20)
RegexpMatchMedium_1K-12     33.0µs ± 0%    32.9µs ± 0%  -0.30%  (p=0.004 n=16+16)
RegexpMatchHard_32-12       1.70µs ± 0%    1.70µs ± 0%  +0.45%  (p=0.000 n=16+17)
RegexpMatchHard_1K-12       51.1µs ± 2%    51.4µs ± 1%  +0.53%  (p=0.000 n=17+19)
Revcomp-12                   378ms ± 1%     385ms ± 1%  +1.92%  (p=0.000 n=19+18)
Template-12                 64.3ms ± 2%    65.0ms ± 2%  +1.09%  (p=0.001 n=19+19)
TimeParse-12                 315ns ± 1%     317ns ± 2%    ~     (p=0.108 n=18+20)
TimeFormat-12                360ns ± 1%     337ns ± 0%  -6.30%  (p=0.000 n=18+13)
[Geo mean]                  51.8µs         51.6µs       -0.48%

Change-Id: Icf8994671476840e3998236e15407a505d4c760c
Reviewed-on: https://go-review.googlesource.com/20700
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:40:13 +00:00
Austin Clements
5b765ce310 runtime: don't clear gcscanvalid in casfrom_Gscanstatus
Currently we clear gcscanvalid in both casgstatus and
casfrom_Gscanstatus if the new status is _Grunning. This is very
important to do in casgstatus. However, this is potentially wrong in
casfrom_Gscanstatus because in this case the caller doesn't own gp and
hence the write is racy. Unlike the other _Gscan statuses, during
_Gscanrunning, the G is still running. This does not indicate that
it's transitioning into a running state. The scan simply hasn't
happened yet, so it's neither valid nor invalid.

Conveniently, this also means clearing gcscanvalid is unnecessary in
this case because the G was already in _Grunning, so we can simply
remove this code. What will happen instead is that the G will be
preempted to scan itself, that scan will set gcscanvalid to true, and
then the G will return to _Grunning via casgstatus, clearing
gcscanvalid.

This fix will become necessary shortly when we start keeping track of
the set of G's with dirty stacks, since it will no longer be
idempotent to simply set gcscanvalid to false.

Change-Id: I688c82e6fbf00d5dbbbff49efa66acb99ee86785
Reviewed-on: https://go-review.googlesource.com/20669
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:40:10 +00:00
Austin Clements
c707d83856 runtime: fix typos in comment about gcscanvalid
Change-Id: Id4ad7ebf88a21eba2bc5714b96570ed5cfaed757
Reviewed-on: https://go-review.googlesource.com/22210
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:40:07 +00:00
Austin Clements
9f263c14ed runtime: remove stack barriers during sweep
This adds a best-effort pass to remove stack barriers immediately
after the end of mark termination. This isn't necessary for the Go
runtime, but should help external tools that perform stack walks but
aren't aware of Go's stack barriers such as GDB, perf, and VTune.
(Though clearly they'll still have trouble unwinding stacks during
mark.)

Change-Id: I66600fae1f03ee36b5459d2b00dcc376269af18e
Reviewed-on: https://go-review.googlesource.com/20668
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:40:04 +00:00
Austin Clements
269c969c81 runtime: remove stack barriers during concurrent mark
Currently we remove stack barriers during STW mark termination, which
has a non-trivial per-goroutine cost and means that we have to touch
even clean stacks during mark termination. However, there's no problem
with leaving them in during the sweep phase. They just have to be out
by the time we install new stack barriers immediately prior to
scanning the stack such as during the mark phase of the next GC cycle
or during mark termination in a STW GC.

Hence, move the gcRemoveStackBarriers from STW mark termination to
just before we install new stack barriers during concurrent mark. This
removes the cost from STW. Furthermore, this combined with concurrent
stack shrinking means that the mark termination scan of a clean stack
is a complete no-op, which will make it possible to skip clean stacks
entirely during mark termination.

This has the downside that it will mess up anything outside of Go that
tries to walk Go stacks all the time instead of just some of the time.
This includes tools like GDB, perf, and VTune. We'll improve the
situation shortly.

Change-Id: Ia40baad8f8c16aeefac05425e00b0cf478137097
Reviewed-on: https://go-review.googlesource.com/20667
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:40:01 +00:00
Austin Clements
efb0c55407 runtime: avoid span root marking entirely during mark termination
Currently we enqueue span root mark jobs during both concurrent mark
and mark termination, but we make the job a no-op during mark
termination.

This is silly. Instead of queueing them up just to not do them, don't
queue them up in the first place.

Change-Id: Ie1d36de884abfb17dd0db6f0449a2b7c997affab
Reviewed-on: https://go-review.googlesource.com/20666
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:39:58 +00:00
Austin Clements
e8337491aa runtime: free dead G stacks concurrently
Currently we free cached stacks of dead Gs during STW stack root
marking. We do this during STW because there's no way to take
ownership of a particular dead G, so attempting to free a dead G's
stack during concurrent stack root marking could race with reusing
that G.

However, we can do this concurrently if we take a completely different
approach. One way to prevent reuse of a dead G is to remove it from
the free G list. Hence, this adds a new fixed root marking task that
simply removes all Gs from the list of dead Gs with cached stacks,
frees their stacks, and then adds them to the list of dead Gs without
cached stacks.

This is also a necessary step toward rescanning only dirty stacks,
since it eliminates another task from STW stack marking.

Change-Id: Iefbad03078b284a2e7bf30fba397da4ca87fe095
Reviewed-on: https://go-review.googlesource.com/20665
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:39:55 +00:00
Austin Clements
1a2cf91f5e runtime: split gfree list into with-stacks and without-stacks
Currently all free Gs are added to one list. Split this into two
lists: one for free Gs with cached stacks and one for Gs without
cached stacks.

This lets us preferentially allocate Gs that already have a stack, but
more importantly, it sets us up to free cached G stacks concurrently.

Change-Id: Idbe486f708997e1c9d166662995283f02d1eeb3c
Reviewed-on: https://go-review.googlesource.com/20664
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 23:39:51 +00:00
Michael Munday
55154cf0b2 cmd/link: fix gdb backtrace on architectures using a link register
Also adds TestGdbBacktrace to the runtime package.

Dwarf modifications written by Bryan Chan (@bryanpkc) who is also
at IBM and covered by the same CLA.

Fixes #14628

Change-Id: I106a1f704c3745a31f29cdadb0032e3905829850
Reviewed-on: https://go-review.googlesource.com/20193
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26 18:35:47 +00:00
Ilya Tocar
6b02a19247 strings: use SSE4.2 in strings.Index on AMD64
Use PCMPESTRI instruction if available.

Index-4              21.1ns ± 0%  21.1ns ± 0%     ~     (all samples are equal)
IndexHard1-4          395µs ± 0%   105µs ± 0%  -73.53%        (p=0.000 n=19+20)
IndexHard2-4          300µs ± 0%   147µs ± 0%  -51.11%        (p=0.000 n=19+20)
IndexHard3-4          665µs ± 0%   665µs ± 0%     ~           (p=0.942 n=16+19)

Change-Id: I4f66794164740a2b939eb1c78934e2390b489064
Reviewed-on: https://go-review.googlesource.com/22337
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-04-26 10:14:26 +00:00
Keith Randall
9cb79e9536 runtime: arm5, fix large-offset floating-point stores
The code sequence for large-offset floating-point stores
includes adding the base pointer to r11.  Make sure we
can interpret that instruction correctly.

Fixes build.

Fixes #15440

Change-Id: I7fe5a4a57e08682967052bf77c54e0ec47fcb53e
Reviewed-on: https://go-review.googlesource.com/22440
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2016-04-25 22:33:33 +00:00
Keith Randall
6f3f02f80d runtime: zero tmpbuf between len and cap
Zero the entire buffer so we don't need to
lower its capacity upon return.  This lets callers
do some appending without allocation.

Zeroing is cheap, the byte buffer requires only
4 extra instructions.

Fixes #14235

Change-Id: I970d7badcef047dafac75ac17130030181f18fe2
Reviewed-on: https://go-review.googlesource.com/22424
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-25 21:16:52 +00:00
Dmitry Vyukov
75b844f0d2 runtime/trace: test detection of broken timestamps
On some processors cputicks (used to generate trace timestamps)
produce non-monotonic timestamps. It is important that the parser
distinguishes logically inconsistent traces (e.g. missing, excessive
or misordered events) from broken timestamps. The former is a bug
in tracer, the latter is a machine issue.

Test that (1) parser does not return a logical error in case of
broken timestamps and (2) broken timestamps are eventually detected
and reported.

Change-Id: Ib4b1eb43ce128b268e754400ed8b5e8def04bd78
Reviewed-on: https://go-review.googlesource.com/21608
Reviewed-by: Austin Clements <austin@google.com>
2016-04-24 09:11:37 +00:00
Dmitry Vyukov
a3703618ea runtime: use per-goroutine sequence numbers in tracer
Currently tracer uses global sequencer and it introduces
significant slowdown on parallel machines (up to 10x).
Replace the global sequencer with per-goroutine sequencer.

If we assign per-goroutine sequence numbers to only 3 types
of events (start, unblock and syscall exit), it is enough to
restore consistent partial ordering of all events. Even these
events don't need sequence numbers all the time (if goroutine
starts on the same P where it was unblocked, then start does
not need sequence number).
The burden of restoring the order is put on trace parser.
Details of the algorithm are described in the comments.

On http benchmark with GOMAXPROCS=48:
no tracing: 5026 ns/op
tracing: 27803 ns/op (+453%)
with this change: 6369 ns/op (+26%, mostly for traceback)

Also trace size is reduced by ~22%. Average event size before: 4.63
bytes/event, after: 3.62 bytes/event.

Besides running trace tests, I've also tested with manually broken
cputicks (random skew for each event, per-P skew and episodic random skew).
In all cases broken timestamps were detected and no test failures.

Change-Id: I078bde421ccc386a66f6c2051ab207bcd5613efa
Reviewed-on: https://go-review.googlesource.com/21512
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-23 15:57:05 +00:00
Dmitry Vyukov
2d342fba78 runtime: fix description of trace events
Change-Id: I037101b1921fe151695d32e9874b50dd64982298
Reviewed-on: https://go-review.googlesource.com/22314
Reviewed-by: Austin Clements <austin@google.com>
2016-04-22 21:32:37 +00:00
Ian Lance Taylor
32302d6289 runtime/cgo: use normal libinit on PPC GNU/Linux
The special case was because PPC did not support external linking, but
now it does.

Fixes #10410.

Change-Id: I9b024686e0f03da7a44c1c59b41c529802f16ab0
Reviewed-on: https://go-review.googlesource.com/22372
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-22 14:30:27 +00:00
David Crawshaw
c165988360 cmd/compile, etc: use nameOff in uncommonType
linux/amd64 PIE:
	cmd/go:  -62KB (0.5%)
	jujud:  -550KB (0.7%)

For #6853.

Change-Id: Ieb67982abce5832e24b997506f0ae7108f747108
Reviewed-on: https://go-review.googlesource.com/22371
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-22 13:51:29 +00:00
David Crawshaw
1492e7db05 cmd/compile, etc: use nameOff for rtype string
linux/amd64:
	cmd/go:   -8KB (basically nothing)

linux/amd64 PIE:
	cmd/go: -191KB (1.6%)
	jujud:  -1.5MB (1.9%)

Updates #6853
Fixes #15064

Change-Id: I0adbb95685e28be92e8548741df0e11daa0a9b5f
Reviewed-on: https://go-review.googlesource.com/21777
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-22 10:08:05 +00:00
Austin Clements
c8bd293e56 runtime: eliminate floating garbage estimate
Currently when we compute the trigger for the next GC, we do it based
on an estimate of the reachable heap size at the start of the GC
cycle, which is itself based on an estimate of the floating garbage.
This was introduced by 4655aad to fix a bad feedback loop that allowed
the heap to grow to many times the true reachable size.

However, this estimate gets easily confused by rapidly allocating
applications, and, worse it's different than the heap size the trigger
controller uses to compute the trigger itself. This results in the
trigger controller often thinking that GC finished before it started.
Since this would be a pretty great outcome from it's perspective, it
sets the trigger for the next cycle as close to the next goal as
possible (which is limited to 95% of the goal).

Furthermore, the bad feedback loop this estimate originally fixed
seems not to happen any more, suggesting it was fixed more correctly
by some other change in the mean time. Finally, with the change to
allocate black, it shouldn't even be theoretically possible for this
bad feedback loop to occur.

Hence, eliminate the floating garbage estimate and simply consider the
reachable heap to be the marked heap. This harms overall throughput
slightly for allocation-heavy benchmarks, but significantly improves
mutator availability.

Fixes #12204. This brings the average trigger in this benchmark from
0.95 (the cap) to 0.7 and the active GC utilization from ~90% to ~45%.

Updates #14951. This makes the trigger controller much better behaved,
so it pulls the trigger lower if assists are consuming a lot of CPU
like it's supposed to, increasing mutator availability.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.21ms ± 1%  2.28ms ± 3%  +3.29%  (p=0.000 n=17+17)

Some of this slow down we paid for in earlier commits. Relative to the
start of the series to switch to allocate-black (the parent of "count
black allocations toward scan work"), the garbage benchmark is 2.62%
slower.

name                      old time/op    new time/op    delta
BinaryTree17-12              2.53s ± 3%     2.53s ± 3%    ~     (p=0.708 n=20+19)
Fannkuch11-12                2.08s ± 0%     2.08s ± 0%  -0.22%  (p=0.002 n=19+18)
FmtFprintfEmpty-12          45.3ns ± 2%    45.2ns ± 3%    ~     (p=0.505 n=20+20)
FmtFprintfString-12          129ns ± 0%     131ns ± 2%  +1.80%  (p=0.000 n=16+19)
FmtFprintfInt-12             121ns ± 2%     121ns ± 2%    ~     (p=0.768 n=19+19)
FmtFprintfIntInt-12          186ns ± 1%     188ns ± 3%  +0.99%  (p=0.000 n=19+19)
FmtFprintfPrefixedInt-12     188ns ± 1%     188ns ± 1%    ~     (p=0.947 n=18+16)
FmtFprintfFloat-12           254ns ± 1%     255ns ± 1%  +0.30%  (p=0.002 n=19+17)
FmtManyArgs-12               763ns ± 0%     770ns ± 0%  +0.92%  (p=0.000 n=18+18)
GobDecode-12                7.00ms ± 1%    7.04ms ± 1%  +0.61%  (p=0.049 n=20+20)
GobEncode-12                5.88ms ± 1%    5.88ms ± 0%    ~     (p=0.641 n=18+19)
Gzip-12                      214ms ± 1%     215ms ± 1%  +0.43%  (p=0.002 n=18+19)
Gunzip-12                   37.6ms ± 0%    37.6ms ± 0%  +0.11%  (p=0.015 n=17+18)
HTTPClientServer-12         76.9µs ± 2%    78.1µs ± 2%  +1.44%  (p=0.000 n=20+18)
JSONEncode-12               15.2ms ± 2%    15.1ms ± 1%    ~     (p=0.271 n=19+18)
JSONDecode-12               53.1ms ± 1%    53.3ms ± 0%  +0.49%  (p=0.000 n=18+19)
Mandelbrot200-12            4.04ms ± 1%    4.03ms ± 0%  -0.33%  (p=0.005 n=18+18)
GoParse-12                  3.29ms ± 1%    3.28ms ± 1%    ~     (p=0.146 n=16+17)
RegexpMatchEasy0_32-12      69.9ns ± 3%    69.5ns ± 1%    ~     (p=0.785 n=20+19)
RegexpMatchEasy0_1K-12       237ns ± 0%     237ns ± 0%    ~     (p=1.000 n=18+18)
RegexpMatchEasy1_32-12      69.5ns ± 1%    69.2ns ± 1%  -0.44%  (p=0.020 n=16+19)
RegexpMatchEasy1_1K-12       372ns ± 1%     371ns ± 2%    ~     (p=0.086 n=20+19)
RegexpMatchMedium_32-12      108ns ± 3%     107ns ± 1%  -1.00%  (p=0.004 n=19+14)
RegexpMatchMedium_1K-12     34.2µs ± 4%    34.0µs ± 2%    ~     (p=0.380 n=19+20)
RegexpMatchHard_32-12       1.77µs ± 4%    1.76µs ± 3%    ~     (p=0.558 n=18+20)
RegexpMatchHard_1K-12       53.4µs ± 4%    52.8µs ± 2%  -1.10%  (p=0.020 n=18+20)
Revcomp-12                   359ms ± 4%     377ms ± 0%  +5.19%  (p=0.000 n=20+18)
Template-12                 63.7ms ± 2%    62.9ms ± 2%  -1.27%  (p=0.005 n=18+20)
TimeParse-12                 316ns ± 2%     313ns ± 1%    ~     (p=0.059 n=20+16)
TimeFormat-12                329ns ± 0%     331ns ± 0%  +0.39%  (p=0.000 n=16+18)
[Geo mean]                  51.6µs         51.7µs       +0.18%

Change-Id: I1dce4640c8205d41717943b021039fffea863c57
Reviewed-on: https://go-review.googlesource.com/21324
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 20:07:25 +00:00
Austin Clements
6002e01e34 runtime: allocate black during GC
Currently we allocate white for most of concurrent marking. This is
based on the classical argument that it produces less floating
garbage, since allocations during GC may not get linked into the heap
and allocating white lets us reclaim these. However, it's not clear
how often this actually happens, especially since our write barrier
shades any pointer as soon as it's installed in the heap regardless of
the color of the slot.

On the other hand, allocating black has several advantages that seem
to significantly outweigh this downside.

1) It naturally bounds the total scan work to the live heap size at
the start of a GC cycle. Allocating white does not, and thus depends
entirely on assists to prevent the heap from growing faster than it
can be scanned.

2) It reduces the total amount of scan work per GC cycle by the size
of newly allocated objects that are linked into the heap graph, since
objects allocated black never need to be scanned.

3) It reduces total write barrier work since more objects will already
be black when they are linked into the heap graph.

This gives a slight overall improvement in benchmarks.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.24ms ± 0%  2.21ms ± 1%  -1.32%  (p=0.000 n=18+17)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.60s ± 3%     2.53s ± 3%  -2.56%  (p=0.000 n=20+20)
Fannkuch11-12                2.08s ± 1%     2.08s ± 0%    ~     (p=0.452 n=19+19)
FmtFprintfEmpty-12          45.1ns ± 2%    45.3ns ± 2%    ~     (p=0.367 n=19+20)
FmtFprintfString-12          131ns ± 3%     129ns ± 0%  -1.60%  (p=0.000 n=20+16)
FmtFprintfInt-12             122ns ± 0%     121ns ± 2%  -0.86%  (p=0.000 n=16+19)
FmtFprintfIntInt-12          187ns ± 1%     186ns ± 1%    ~     (p=0.514 n=18+19)
FmtFprintfPrefixedInt-12     189ns ± 0%     188ns ± 1%  -0.54%  (p=0.000 n=16+18)
FmtFprintfFloat-12           256ns ± 0%     254ns ± 1%  -0.43%  (p=0.000 n=17+19)
FmtManyArgs-12               769ns ± 0%     763ns ± 0%  -0.72%  (p=0.000 n=18+18)
GobDecode-12                7.08ms ± 2%    7.00ms ± 1%  -1.22%  (p=0.000 n=20+20)
GobEncode-12                5.88ms ± 0%    5.88ms ± 1%    ~     (p=0.406 n=18+18)
Gzip-12                      214ms ± 0%     214ms ± 1%    ~     (p=0.103 n=17+18)
Gunzip-12                   37.6ms ± 0%    37.6ms ± 0%    ~     (p=0.563 n=17+17)
HTTPClientServer-12         77.2µs ± 3%    76.9µs ± 2%    ~     (p=0.606 n=20+20)
JSONEncode-12               15.1ms ± 1%    15.2ms ± 2%    ~     (p=0.138 n=19+19)
JSONDecode-12               53.3ms ± 1%    53.1ms ± 1%  -0.33%  (p=0.000 n=19+18)
Mandelbrot200-12            4.04ms ± 1%    4.04ms ± 1%    ~     (p=0.075 n=19+18)
GoParse-12                  3.30ms ± 1%    3.29ms ± 1%  -0.57%  (p=0.000 n=18+16)
RegexpMatchEasy0_32-12      69.5ns ± 1%    69.9ns ± 3%    ~     (p=0.822 n=18+20)
RegexpMatchEasy0_1K-12       237ns ± 1%     237ns ± 0%    ~     (p=0.398 n=19+18)
RegexpMatchEasy1_32-12      69.8ns ± 2%    69.5ns ± 1%    ~     (p=0.090 n=20+16)
RegexpMatchEasy1_1K-12       371ns ± 1%     372ns ± 1%    ~     (p=0.178 n=19+20)
RegexpMatchMedium_32-12      108ns ± 2%     108ns ± 3%    ~     (p=0.124 n=20+19)
RegexpMatchMedium_1K-12     33.9µs ± 2%    34.2µs ± 4%    ~     (p=0.309 n=20+19)
RegexpMatchHard_32-12       1.75µs ± 2%    1.77µs ± 4%  +1.28%  (p=0.018 n=19+18)
RegexpMatchHard_1K-12       52.7µs ± 1%    53.4µs ± 4%  +1.23%  (p=0.013 n=15+18)
Revcomp-12                   354ms ± 1%     359ms ± 4%  +1.27%  (p=0.043 n=20+20)
Template-12                 63.6ms ± 2%    63.7ms ± 2%    ~     (p=0.654 n=20+18)
TimeParse-12                 313ns ± 1%     316ns ± 2%  +0.80%  (p=0.014 n=17+20)
TimeFormat-12                332ns ± 0%     329ns ± 0%  -0.66%  (p=0.000 n=16+16)
[Geo mean]                  51.7µs         51.6µs       -0.09%

Change-Id: I2214a6a0e4f544699ea166073249a8efdf080dc0
Reviewed-on: https://go-review.googlesource.com/21323
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 20:07:22 +00:00
Austin Clements
64a26b79ac runtime: simplify/optimize allocate-black a bit
Currently allocating black switches to the system stack (which is
probably a historical accident) and atomically updates the global
bytes marked stat. Since we're about to depend on this much more,
optimize it a bit by putting it back on the regular stack and updating
the per-P bytes marked stat, which gets lazily folded into the global
bytes marked stat.

Change-Id: Ibbe16e5382d3fd2256e4381f88af342bf7020b04
Reviewed-on: https://go-review.googlesource.com/22170
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 20:07:20 +00:00
Austin Clements
479501c14c runtime: count black allocations toward scan work
Currently we count black allocations toward the scannable heap size,
but not toward the scan work we've done so far. This is clearly
inconsistent (we have, in effect, scanned these allocations and since
they're already black, we're not going to scan them again). Worse, it
means we don't count black allocations toward the scannable heap size
as of the *next* GC because this is based on the amount of scan work
we did in this cycle.

Fix this by counting black allocations as scan work. Currently the GC
spends very little time in allocate-black mode, so this probably
hasn't been a problem, but this will become important when we switch
to always allocating black.

Change-Id: If6ff693b070c385b65b6ecbbbbf76283a0f9d990
Reviewed-on: https://go-review.googlesource.com/22119
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 20:07:17 +00:00
Martin Möhrmann
7e460e70d9 runtime: use type int to specify size for newarray
Consistently use type int for the size argument of
runtime.newarray, runtime.reflect_unsafe_NewArray
and reflect.unsafe_NewArray.

Change-Id: Ic77bf2dde216c92ca8c49462f8eedc0385b6314e
Reviewed-on: https://go-review.googlesource.com/22311
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 04:15:14 +00:00
Keith Randall
60fd32a47f cmd/compile: change the way we handle large map values
mapaccess{1,2} returns a pointer to the value.  When the key
is not in the map, it returns a pointer to zeroed memory.
Currently, for large map values we have a complicated scheme which
dynamically allocates zeroed memory for this purpose.  It is ugly
code and requires an atomic.Load in a bunch of places we'd rather
not have it.

Switch to a scheme where callsites of mapaccess{1,2} which expect
large return values pass in a pointer to zeroed memory that
mapaccess can return if the key is not found.  This avoids the
atomic.Load on all map accesses with a few extra instructions only
for the large value acccesses, plus a bit of bss space.

There was a time (1.4 & 1.5?) where we did something like this but
all the tricks to make the right size zero value were done by the
linker.  That scheme broke in the presence of dyamic linking.
The scheme in this CL works even when dynamic linking.

Fixes #12337

Change-Id: Ic2d0319944af33bbb59785938d9ab80958d1b4b1
Reviewed-on: https://go-review.googlesource.com/22221
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2016-04-20 21:15:31 +00:00
Keith Randall
001e8e8070 runtime: simplify mallocgc flag argument
mallocgc can calculate noscan itself.  The only remaining
flag argument is needzero, so we just make that a boolean arg.

Fixes #15379

Change-Id: I839a70790b2a0c9dbcee2600052bfbd6c8148e20
Reviewed-on: https://go-review.googlesource.com/22290
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-20 14:02:22 +00:00
Keith Randall
bfe0cbdc50 cmd/compile,runtime: pass elem type to {make,grow}slice
No point in passing the slice type to these functions.
All they need is the element type.  One less indirection,
maybe a few less []T type descriptors in the binary.

Change-Id: Ib0b83b5f14ca21d995ecc199ce8ac00c4eb375e6
Reviewed-on: https://go-review.googlesource.com/22275
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-04-20 00:31:16 +00:00
Josh Bleecher Snyder
0150f15a92 runtime: call mallocgc directly from makeslice and growslice
The extra checks provided by newarray are
redundant in these cases.

This shrinks by one frame the call stack expected
by the pprof test.

name                      old time/op  new time/op  delta
MakeSlice-8               34.3ns ± 2%  30.5ns ± 3%  -11.03%  (p=0.000 n=24+22)
GrowSlicePtr-8             134ns ± 2%   129ns ± 3%   -3.25%  (p=0.000 n=25+24)

Change-Id: Icd828655906b921c732701fd9d61da3fa217b0af
Reviewed-on: https://go-review.googlesource.com/22276
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-04-20 00:05:36 +00:00
Julia Hansbrough
58012ea785 runtime: updated SIGSYS to cause a panic + stacktrace
On GNU/Linux, SIGSYS is specified to cause the process to terminate
without a core dump. In https://codereview.appspot.com/3749041 , it
appears that Golang accidentally introduced incorrect behavior for
this signal, which caused Golang processes to keep running after
receiving SIGSYS. This change reverts it to the old/correct behavior.

Updates #15204

Change-Id: I3aa48a9499c1bc36fa5d3f40c088fdd7599e0db5
Reviewed-on: https://go-review.googlesource.com/22202
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-19 22:48:31 +00:00
Keith Randall
998c8e034c cmd/compile: convT2{I,E} don't handle direct interfaces
We now inline type to interface conversions when the type
is pointer-shaped.  No need to keep code to handle that in
convT2{I,E}.

Change-Id: I3a6668259556077cbb2986a9e8fe42a625d506c9
Reviewed-on: https://go-review.googlesource.com/22249
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
2016-04-19 22:27:08 +00:00
Josh Bleecher Snyder
a4dd6ea152 runtime: add maxSliceCap
This avoids expensive division calculations
for many common slice element sizes.

name                      old time/op  new time/op  delta
MakeSlice-8               51.9ns ± 3%  35.1ns ± 2%  -32.41%  (p=0.000 n=10+10)
GrowSliceBytes-8          44.1ns ± 2%  44.1ns ± 1%     ~     (p=0.984 n=10+10)
GrowSliceInts-8           60.9ns ± 3%  60.9ns ± 3%     ~     (p=0.698 n=10+10)
GrowSlicePtr-8             131ns ± 1%   120ns ± 2%   -8.41%   (p=0.000 n=8+10)
GrowSliceStruct24Bytes-8   111ns ± 2%   103ns ± 3%   -7.23%    (p=0.000 n=8+8)

Change-Id: I2630eb3d73c814db030cad16e620ea7fecbbd312
Reviewed-on: https://go-review.googlesource.com/22223
Reviewed-by: Keith Randall <khr@golang.org>
2016-04-19 21:38:52 +00:00
Josh Bleecher Snyder
411a0adc9b runtime: add benchmarks for in-place append
Change-Id: I2b43cc976d2efbf8b41170be536fdd10364b65e5
Reviewed-on: https://go-review.googlesource.com/22190
Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 19:08:39 +00:00
David Crawshaw
95df0c6ab9 cmd/compile, etc: use name offset in method tables
Introduce and start using nameOff for two encoded names. This pair
of changes is best done together because the linker's method decoder
expects the method layouts to match.

Precursor to converting all existing name and *string fields to
nameOff.

linux/amd64:
	cmd/go:  -45KB (0.5%)
	jujud:  -389KB (0.6%)

linux/amd64 PIE:
	cmd/go: -170KB (1.4%)
	jujud:  -1.5MB (1.8%)

For #6853.

Change-Id: Ia044423f010fb987ce070b94c46a16fc78666ff6
Reviewed-on: https://go-review.googlesource.com/21396
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-18 09:12:41 +00:00
Austin Clements
2cdcb6f829 runtime: scavenge memory on physical page-aligned boundaries
Currently the scavenger marks memory unused in multiples of the
allocator page size (8K). This is safe as long as the true physical
page size is 4K (or 8K), as it is on many platforms. However, on
ARM64, PPC64x, and MIPS64, the physical page size is larger than 8K,
so if we attempt to mark memory unused, the kernel will round the
boundaries of the region *out* to all pages covered by the requested
region, and we'll release a larger region of memory than intended. As
a result, the scavenger is currently disabled on these platforms.

Fix this by first rounding the region to be marked unused *in* to
multiples of the physical page size, so that when we ask the kernel to
mark it unused, it releases exactly the requested region.

Fixes #9993.

Change-Id: I96d5fdc2f77f9d69abadcea29bcfe55e68288cb1
Reviewed-on: https://go-review.googlesource.com/22066
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-16 21:42:43 +00:00
Austin Clements
1151473077 runtime: check that sysUnused is always physical-page aligned
If sysUnused is passed an address or length that is not aligned to the
physical page boundary, the kernel will unmap more memory than the
caller wanted. Add a check for this.

For #9993.

Change-Id: I68ff03032e7b65cf0a853fe706ce21dc7f2aaaf8
Reviewed-on: https://go-review.googlesource.com/22065
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2016-04-16 21:42:40 +00:00
Austin Clements
8ce844e88e runtime: check kernel physical page size during init
The runtime hard-codes an assumed physical page size. If this is
smaller than the kernel's page size or not a multiple of it, sysUnused
may incorrectly release more memory to the system than intended.

Add a runtime startup check that the runtime's assumed physical page
is compatible with the kernel's physical page size.

For #9993.

Change-Id: Ida9d07f93c00ca9a95dd55fc59bf0d8a607f6728
Reviewed-on: https://go-review.googlesource.com/22064
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-16 21:42:37 +00:00
Austin Clements
d6b177d1eb runtime: remove empty 386 archauxv
archauxv no longer does anything on 386, so remove it.

Change-Id: I94545238e40fa6a6832a7c3b40aedfc6c1f6a97b
Reviewed-on: https://go-review.googlesource.com/22063
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-16 21:42:34 +00:00
Austin Clements
90addd3d41 runtime: common handling of _AT_RANDOM auxv
The Linux kernel provides 16 bytes of random data via the auxv vector
at startup. Currently we consume this separately on 386, amd64, arm,
and arm64. Now that we have a common auxv parser, handle _AT_RANDOM in
the common path.

Change-Id: Ib69549a1d37e2d07a351cf0f44007bcd24f0d20d
Reviewed-on: https://go-review.googlesource.com/22062
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-16 21:42:31 +00:00
Austin Clements
c955bb2040 runtime: common auxv parser
Currently several different Linux architectures have separate copies
of the auxv parser. Bring these all together into a single copy of the
parser that calls out to a per-arch handler for each tag/value pair.
This is in preparation for handling common auxv tags in one place.

For #9993.

Change-Id: Iceebc3afad6b4133b70fca7003561ae370445c10
Reviewed-on: https://go-review.googlesource.com/22061
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2016-04-16 21:42:27 +00:00
Mikio Hara
6f59ccb052 runtime: don't always unblock all signals on dragonfly, freebsd and openbsd
https://golang.org/cl/10173 intrduced msigsave, ensureSigM and
_SigUnblock but didn't enable the new signal save/restore mechanism for
SIG{HUP,INT,QUIT,ABRT,TERM} on DragonFly BSD, FreeBSD and OpenBSD.

At present, it looks like they have the implementation. This change
enables the new mechanism on DragonFly BSD, FreeBSD and OpenBSD the same
as Darwin, NetBSD.

Change-Id: Ifb4b4743b3b4f50bfcdc7cf1fe1b59c377fa2a41
Reviewed-on: https://go-review.googlesource.com/18657
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-15 21:20:45 +00:00
Austin Clements
7c7081f514 sync/atomic: don't atomically write pointers twice
sync/atomic.StorePointer (which is implemented in
runtime/atomic_pointer.go) writes the pointer twice (through two
completely different code paths, no less). Fix it to only write once.

Change-Id: Id3b2aef9aa9081c2cf096833e001b93d3dd1f5da
Reviewed-on: https://go-review.googlesource.com/21999
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-14 21:13:26 +00:00
Austin Clements
8f6c35de2f runtime: make sync_atomic_SwapPointer signature match sync/atomic
SwapPointer is declared as

  func SwapPointer(addr *unsafe.Pointer, new unsafe.Pointer) (old unsafe.Pointer)

in sync/atomic, but defined in the runtime (where it's actually
implemented) as

  func sync_atomic_SwapPointer(ptr unsafe.Pointer, new unsafe.Pointer) unsafe.Pointer

Make ptr a *unsafe.Pointer in the runtime definition to match the type
in sync/atomic.

Change-Id: I99bab651b995001bbe54f9e790fdef2417ef0e9e
Reviewed-on: https://go-review.googlesource.com/21998
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2016-04-14 21:13:23 +00:00
Keith Randall
98b6febcef runtime/internal/sys: better fallback algorithms for intrinsics
Use deBruijn sequences to count low-order zeros.
Reorg bswap to not use &^, it takes another instruction on x86.

Change-Id: I4a5ed9fd16ee6a279d88c067e8a2ba11de821156
Reviewed-on: https://go-review.googlesource.com/22084
Reviewed-by: David Chase <drchase@google.com>
2016-04-14 21:09:03 +00:00
Jeremy Jackins
02b8e6978a runtime: find a home for orphaned comments
These comments were left behind after runtime.h was converted
from C to Go. I examined the original code and tried to move these
to the places that the most sense.

Change-Id: I8769d60234c0113d682f9de3bd8d6c34c450c188
Reviewed-on: https://go-review.googlesource.com/21969
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-14 18:34:09 +00:00
David Crawshaw
f120936dff cmd/compile, etc: use name for type pkgPath
By replacing the *string used to represent pkgPath with a
reflect.name everywhere, the embedded *string for package paths
inside the reflect.name can be replaced by an offset, nameOff.
This reduces the number of pointers in the type information.

This also moves all reflect.name types into the same section, making
it possible to use nameOff more widely in later CLs.

No significant binary size change for normal binaries, but:

linux/amd64 PIE:
	cmd/go: -440KB (3.7%)
	jujud:  -2.6MB (3.2%)

For #6853.

Change-Id: I3890b132a784a1090b1b72b32febfe0bea77eaee
Reviewed-on: https://go-review.googlesource.com/21395
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-13 20:48:26 +00:00
Brad Fitzpatrick
73e2ad2022 runtime: rename os1_darwin.go to os_darwin.go
Change-Id: If0e0bc5a85101db1e70faaab168fc2d12024eb93
Reviewed-on: https://go-review.googlesource.com/22005
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-13 20:37:12 +00:00
Brad Fitzpatrick
d9712aa82a runtime: merge the darwin os*.go files together
Merge them together into os1_darwin.go. A future CL will rename it.

Change-Id: Ia4380d3296ebd5ce210908ce3582ff184566f692
Reviewed-on: https://go-review.googlesource.com/22004
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-13 20:35:09 +00:00
Austin Clements
4721ea6abc runtime/internal/atomic: rename Storep1 to StorepNoWB
Make it clear that the point of this function stores a pointer
*without* a write barrier.

sed -i -e 's/Storep1/StorepNoWB/' $(git grep -l Storep1)

Updates #15270.

Change-Id: Ifad7e17815e51a738070655fe3b178afdadaecf6
Reviewed-on: https://go-review.googlesource.com/21994
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2016-04-13 19:17:25 +00:00
Austin Clements
d8e8fc292a runtime/internal/atomic: remove write barrier from Storep1 on s390x
atomic.Storep1 is not supposed to invoke a write barrier (that's what
atomicstorep is for), but currently does on s390x. This causes a panic
in runtime.mapzero when it tries to use atomic.Storep1 to store what's
actually a scalar.

Fix this by eliminating the write barrier from atomic.Storep1 on
s390x. Also add some documentation to atomicstorep to explain the
difference between these.

Fixes #15270.

Change-Id: I291846732d82f090a218df3ef6351180aff54e81
Reviewed-on: https://go-review.googlesource.com/21993
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
2016-04-13 16:06:51 +00:00
Lynn Boger
c4807d4cc7 runtime: improve memmove performance ppc64,ppc64le
This change improves the performance of memmove
on ppc64 & ppc64le mainly for moves >=32 bytes.
In addition, the test to detect backward moves
 was enhanced to avoid backward moves if source
and dest were in different types of storage, since
backward moves might not always be efficient.

Fixes #14507

The following shows some of the improvements from the test
in the runtime package:

BenchmarkMemmove32                   4229.56      4717.13      1.12x
BenchmarkMemmove64                   6156.03      7810.42      1.27x
BenchmarkMemmove128                  7521.69      12468.54     1.66x
BenchmarkMemmove256                  6729.90      18260.33     2.71x
BenchmarkMemmove512                  8521.59      18033.81     2.12x
BenchmarkMemmove1024                 9760.92      25762.61     2.64x
BenchmarkMemmove2048                 10241.00     29584.94     2.89x
BenchmarkMemmove4096                 10399.37     31882.31     3.07x

BenchmarkMemmoveUnalignedDst16       1943.69      2258.33      1.16x
BenchmarkMemmoveUnalignedDst32       3885.08      3965.81      1.02x
BenchmarkMemmoveUnalignedDst64       5121.63      6965.54      1.36x
BenchmarkMemmoveUnalignedDst128      7212.34      11372.68     1.58x
BenchmarkMemmoveUnalignedDst256      6564.52      16913.59     2.58x
BenchmarkMemmoveUnalignedDst512      8364.35      17782.57     2.13x
BenchmarkMemmoveUnalignedDst1024     9539.87      24914.72     2.61x
BenchmarkMemmoveUnalignedDst2048     9199.23      21235.11     2.31x
BenchmarkMemmoveUnalignedDst4096     10077.39     25231.99     2.50x

BenchmarkMemmoveUnalignedSrc32       3249.83      3742.52      1.15x
BenchmarkMemmoveUnalignedSrc64       5562.35      6627.96      1.19x
BenchmarkMemmoveUnalignedSrc128      6023.98      10200.84     1.69x
BenchmarkMemmoveUnalignedSrc256      6921.83      15258.43     2.20x
BenchmarkMemmoveUnalignedSrc512      8593.13      16541.97     1.93x
BenchmarkMemmoveUnalignedSrc1024     9730.95      22927.84     2.36x
BenchmarkMemmoveUnalignedSrc2048     9793.28      21537.73     2.20x
BenchmarkMemmoveUnalignedSrc4096     10132.96     26295.06     2.60x

Change-Id: I73af59970d4c97c728deabb9708b31ec7e01bdf2
Reviewed-on: https://go-review.googlesource.com/21990
Reviewed-by: Bill O'Farrell <billotosyr@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-13 15:27:59 +00:00
David Crawshaw
7d469179e6 cmd/compile, etc: store method tables as offsets
This CL introduces the typeOff type and a lookup method of the same
name that can turn a typeOff offset into an *rtype.

In a typical Go binary (built with buildmode=exe, pie, c-archive, or
c-shared), there is one moduledata and all typeOff values are offsets
relative to firstmoduledata.types. This makes computing the pointer
cheap in typical programs.

With buildmode=shared (and one day, buildmode=plugin) there are
multiple modules whose relative offset is determined at runtime.
We identify a type in the general case by the pair of the original
*rtype that references it and its typeOff value. We determine
the module from the original pointer, and then use the typeOff from
there to compute the final *rtype.

To ensure there is only one *rtype representing each type, the
runtime initializes a typemap for each module, using any identical
type from an earlier module when resolving that offset. This means
that types computed from an offset match the type mapped by the
pointer dynamic relocations.

A series of followup CLs will replace other *rtype values with typeOff
(and name/*string with nameOff).

For types created at runtime by reflect, type offsets are treated as
global IDs and reference into a reflect offset map kept by the runtime.

darwin/amd64:
	cmd/go:  -57KB (0.6%)
	jujud:  -557KB (0.8%)

linux/amd64 PIE:
	cmd/go: -361KB (3.0%)
	jujud:  -3.5MB (4.2%)

For #6853.

Change-Id: Icf096fd884a0a0cb9f280f46f7a26c70a9006c96
Reviewed-on: https://go-review.googlesource.com/21285
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-13 13:03:11 +00:00
Matthew Dempsky
6af4e996e2 runtime: simplify setPanicOnFault slightly
No need to acquire the M just to change G's paniconfault flag, and the
original C implementation of SetPanicOnFault did not. The M
acquisition logic is an artifact of golang.org/cl/131010044, which was
started before golang.org/cl/123640043 (which introduced the current
"getg" function) was submitted.

Change-Id: I6d1939008660210be46904395cf5f5bbc2c8f754
Reviewed-on: https://go-review.googlesource.com/21935
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-13 06:14:06 +00:00
Keith Randall
260b7daf0a cmd/compile: fix arg to getcallerpc
getcallerpc's arg needs to point to the first argument slot.
I believe this bug was introduced by Michel's itab changes
(specifically https://go-review.googlesource.com/c/20902).

Fixes #15145

Change-Id: Ifb2e17f3658e2136c7950dfc789b4d5706320683
Reviewed-on: https://go-review.googlesource.com/21931
Reviewed-by: Michel Lespinasse <walken@google.com>
2016-04-13 00:24:38 +00:00
David Crawshaw
f028b9f9e2 cmd/link, etc: store typelinks as offsets
This is the first in a series of CLs to replace the use of pointers
in binary read-only data with offsets.

In standard Go binaries these CLs have a small effect, shrinking
8-byte pointers to 4-bytes. In position-independent code, it also
saves the dynamic relocation for the pointer. This has a significant
effect on the binary size when building as PIE, c-archive, or
c-shared.

darwin/amd64:
	cmd/go: -12KB (0.1%)
	jujud:  -82KB (0.1%)

linux/amd64 PIE:
	cmd/go:  -86KB (0.7%)
	jujud:  -569KB (0.7%)

For #6853.

Change-Id: Iad5625bbeba58dabfd4d334dbee3fcbfe04b2dcf
Reviewed-on: https://go-review.googlesource.com/21284
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-12 20:32:41 +00:00
Michael Munday
78ecd61f62 runtime/cgo: add s390x support
Change-Id: I64ada9fe34c3cfc4bd514ec5d8c8f4d4c99074fb
Reviewed-on: https://go-review.googlesource.com/20950
Reviewed-by: Bill O'Farrell <billotosyr@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-12 15:42:36 +00:00
Michael Munday
7cbe7b1e86 runtime/internal/atomic: add s390x atomic operations
Load and store instructions are atomic on the s390x.

Change-Id: I0031ed2fba43f33863bca114d0fdec2e7d1ce807
Reviewed-on: https://go-review.googlesource.com/20938
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-12 15:40:48 +00:00
Dmitry Vyukov
3fafe2e888 internal/trace: support parsing of 1.5 traces
1. Parse out version from trace header.
2. Restore handling of 1.5 traces.
3. Restore optional symbolization of traces.
4. Add some canned 1.5 traces for regression testing
   (http benchmark trace, runtime/trace stress traces,
    plus one with broken timestamps).

Change-Id: Idb18a001d03ded8e13c2730eeeb37c5836e31256
Reviewed-on: https://go-review.googlesource.com/21803
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-04-11 17:56:44 +00:00
Dominik Honnef
683917a721 all: use bytes.Equal, bytes.Contains and strings.Contains, again
The previous cleanup was done with a buggy tool, missing some potential
rewrites.

Change-Id: I333467036e355f999a6a493e8de87e084f374e26
Reviewed-on: https://go-review.googlesource.com/21378
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-11 15:16:54 +00:00
Dave Cheney
720c4c016c runtime: merge lfstack_amd64.go into lfstack_64bit.go
Merge the amd64 lfstack implementation into the general 64 bit
implementation.

Change-Id: Id9ed61b90d2e3bc3b0246294c03eb2c92803b6ca
Reviewed-on: https://go-review.googlesource.com/21707
Run-TryBot: Dave Cheney <dave@cheney.net>
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-11 06:18:52 +00:00
Jeremy Jackins
ba09d06e16 runtime: remove remaining references to TheChar
After mdempsky's recent changes, these are the only references to
"TheChar" left in the Go tree. Without the context, and without
knowing the history, this is confusing.

Also rename sys.TheGoos and sys.TheGoarch to sys.GOOS
and sys.GOARCH.

Also change the heap dump format to include sys.GOARCH
rather than TheChar, which is no longer a concept.

Updates #15169 (changes heapdump format)

Change-Id: I3e99eeeae00ed55d7d01e6ed503d958c6e931dca
Reviewed-on: https://go-review.googlesource.com/21647
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-04-11 04:32:07 +00:00
Martin Möhrmann
ad7448fe98 runtime: speed up makeslice by avoiding divisions
Only compute the number of maximum allowed elements per slice once.

name         old time/op  new time/op  delta
MakeSlice-2  55.5ns ± 1%  45.6ns ± 2%  -17.88%  (p=0.000 n=99+100)

Change-Id: I951feffda5d11910a75e55d7e978d306d14da2c5
Reviewed-on: https://go-review.googlesource.com/21801
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-04-10 23:39:46 +00:00
Josh Bleecher Snyder
6b33b0e98e cmd/compile: avoid a spill in append fast path
Instead of spilling newlen, recalculate it.
This removes a spill from the fast path,
at the cost of a cheap recalculation
on the (rare) growth path.
This uses 8 bytes less of stack space.
It generates two more bytes of code,
but that is due to suboptimal register allocation;
see far below.

Runtime append microbenchmarks are all over the map,
presumably due to incidental code movement.

Sample code:

func s(b []byte) []byte {
	b = append(b, 1, 2, 3)
	return b
}

Before:

"".s t=1 size=160 args=0x30 locals=0x48
	0x0000 00000 (append.go:8)	TEXT	"".s(SB), $72-48
	0x0000 00000 (append.go:8)	MOVQ	(TLS), CX
	0x0009 00009 (append.go:8)	CMPQ	SP, 16(CX)
	0x000d 00013 (append.go:8)	JLS	149
	0x0013 00019 (append.go:8)	SUBQ	$72, SP
	0x0017 00023 (append.go:8)	FUNCDATA	$0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB)
	0x0017 00023 (append.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0017 00023 (append.go:9)	MOVQ	"".b+88(FP), CX
	0x001c 00028 (append.go:9)	LEAQ	3(CX), DX
	0x0020 00032 (append.go:9)	MOVQ	DX, "".autotmp_0+64(SP)
	0x0025 00037 (append.go:9)	MOVQ	"".b+96(FP), BX
	0x002a 00042 (append.go:9)	CMPQ	DX, BX
	0x002d 00045 (append.go:9)	JGT	$0, 86
	0x002f 00047 (append.go:8)	MOVQ	"".b+80(FP), AX
	0x0034 00052 (append.go:9)	MOVB	$1, (AX)(CX*1)
	0x0038 00056 (append.go:9)	MOVB	$2, 1(AX)(CX*1)
	0x003d 00061 (append.go:9)	MOVB	$3, 2(AX)(CX*1)
	0x0042 00066 (append.go:10)	MOVQ	AX, "".~r1+104(FP)
	0x0047 00071 (append.go:10)	MOVQ	DX, "".~r1+112(FP)
	0x004c 00076 (append.go:10)	MOVQ	BX, "".~r1+120(FP)
	0x0051 00081 (append.go:10)	ADDQ	$72, SP
	0x0055 00085 (append.go:10)	RET
	0x0056 00086 (append.go:9)	LEAQ	type.[]uint8(SB), AX
	0x005d 00093 (append.go:9)	MOVQ	AX, (SP)
	0x0061 00097 (append.go:9)	MOVQ	"".b+80(FP), BP
	0x0066 00102 (append.go:9)	MOVQ	BP, 8(SP)
	0x006b 00107 (append.go:9)	MOVQ	CX, 16(SP)
	0x0070 00112 (append.go:9)	MOVQ	BX, 24(SP)
	0x0075 00117 (append.go:9)	MOVQ	DX, 32(SP)
	0x007a 00122 (append.go:9)	PCDATA	$0, $0
	0x007a 00122 (append.go:9)	CALL	runtime.growslice(SB)
	0x007f 00127 (append.go:9)	MOVQ	40(SP), AX
	0x0084 00132 (append.go:9)	MOVQ	56(SP), BX
	0x0089 00137 (append.go:8)	MOVQ	"".b+88(FP), CX
	0x008e 00142 (append.go:9)	MOVQ	"".autotmp_0+64(SP), DX
	0x0093 00147 (append.go:9)	JMP	52
	0x0095 00149 (append.go:9)	NOP
	0x0095 00149 (append.go:8)	CALL	runtime.morestack_noctxt(SB)
	0x009a 00154 (append.go:8)	JMP	0

After:

"".s t=1 size=176 args=0x30 locals=0x40
	0x0000 00000 (append.go:8)	TEXT	"".s(SB), $64-48
	0x0000 00000 (append.go:8)	MOVQ	(TLS), CX
	0x0009 00009 (append.go:8)	CMPQ	SP, 16(CX)
	0x000d 00013 (append.go:8)	JLS	151
	0x0013 00019 (append.go:8)	SUBQ	$64, SP
	0x0017 00023 (append.go:8)	FUNCDATA	$0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB)
	0x0017 00023 (append.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0017 00023 (append.go:9)	MOVQ	"".b+80(FP), CX
	0x001c 00028 (append.go:9)	LEAQ	3(CX), DX
	0x0020 00032 (append.go:9)	MOVQ	"".b+88(FP), BX
	0x0025 00037 (append.go:9)	CMPQ	DX, BX
	0x0028 00040 (append.go:9)	JGT	$0, 81
	0x002a 00042 (append.go:8)	MOVQ	"".b+72(FP), AX
	0x002f 00047 (append.go:9)	MOVB	$1, (AX)(CX*1)
	0x0033 00051 (append.go:9)	MOVB	$2, 1(AX)(CX*1)
	0x0038 00056 (append.go:9)	MOVB	$3, 2(AX)(CX*1)
	0x003d 00061 (append.go:10)	MOVQ	AX, "".~r1+96(FP)
	0x0042 00066 (append.go:10)	MOVQ	DX, "".~r1+104(FP)
	0x0047 00071 (append.go:10)	MOVQ	BX, "".~r1+112(FP)
	0x004c 00076 (append.go:10)	ADDQ	$64, SP
	0x0050 00080 (append.go:10)	RET
	0x0051 00081 (append.go:9)	LEAQ	type.[]uint8(SB), AX
	0x0058 00088 (append.go:9)	MOVQ	AX, (SP)
	0x005c 00092 (append.go:9)	MOVQ	"".b+72(FP), BP
	0x0061 00097 (append.go:9)	MOVQ	BP, 8(SP)
	0x0066 00102 (append.go:9)	MOVQ	CX, 16(SP)
	0x006b 00107 (append.go:9)	MOVQ	BX, 24(SP)
	0x0070 00112 (append.go:9)	MOVQ	DX, 32(SP)
	0x0075 00117 (append.go:9)	PCDATA	$0, $0
	0x0075 00117 (append.go:9)	CALL	runtime.growslice(SB)
	0x007a 00122 (append.go:9)	MOVQ	40(SP), AX
	0x007f 00127 (append.go:9)	MOVQ	48(SP), CX
	0x0084 00132 (append.go:9)	MOVQ	56(SP), BX
	0x0089 00137 (append.go:9)	ADDQ	$3, CX
	0x008d 00141 (append.go:9)	MOVQ	CX, DX
	0x0090 00144 (append.go:8)	MOVQ	"".b+80(FP), CX
	0x0095 00149 (append.go:9)	JMP	47
	0x0097 00151 (append.go:9)	NOP
	0x0097 00151 (append.go:8)	CALL	runtime.morestack_noctxt(SB)
	0x009c 00156 (append.go:8)	JMP	0

Observe that in the following sequence,
we should use DX directly instead of using
CX as a temporary register, which would make
the new code a strict improvement on the old:

	0x007f 00127 (append.go:9)	MOVQ	48(SP), CX
	0x0084 00132 (append.go:9)	MOVQ	56(SP), BX
	0x0089 00137 (append.go:9)	ADDQ	$3, CX
	0x008d 00141 (append.go:9)	MOVQ	CX, DX
	0x0090 00144 (append.go:8)	MOVQ	"".b+80(FP), CX

Change-Id: I4ee50b18fa53865901d2d7f86c2cbb54c6fa6924
Reviewed-on: https://go-review.googlesource.com/21812
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 20:02:04 +00:00
Josh Bleecher Snyder
974c201f74 runtime: avoid unnecessary map iteration write barrier
Update #14921

Change-Id: I5c5816d0193757bf7465b1e09c27ca06897df4bf
Reviewed-on: https://go-review.googlesource.com/21814
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 20:01:47 +00:00
Emmanuel Odeke
e4f1d9cf2e runtime: make execution error panic values implement the Error interface
Make execution panics implement Error as
mandated by https://golang.org/ref/spec#Run_time_panics,
instead of panics with strings.

Fixes #14965

Change-Id: I7827f898b9b9c08af541db922cc24fa0800ff18a
Reviewed-on: https://go-review.googlesource.com/21214
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-10 01:16:30 +00:00
Dmitry Vyukov
0435e88a11 runtime: revert "do not call timeBeginPeriod on windows"
This reverts commit ab4c9298b8.

Sysmon critically depends on system timer resolution for retaking
of Ps blocked in system calls. See #14790 for an example
of a program where execution time goes from 2ms to 30ms if
timeBeginPeriod(1) is not used.

We can remove timeBeginPeriod(1) when we support UMS (#7876).

Update #14790

Change-Id: I362b56154359b2c52d47f9f2468fe012b481cf6d
Reviewed-on: https://go-review.googlesource.com/20834
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2016-04-09 16:11:41 +00:00
Dmitry Vyukov
0fb7b4cccd runtime: emit file:line info into traces
This makes traces self-contained and simplifies trace workflow
in modern cloud environments where it is simpler to reach
a service via HTTP than to obtain the binary.

Change-Id: I6ff3ca694dc698270f1e29da37d5efaf4e843a0d
Reviewed-on: https://go-review.googlesource.com/21732
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2016-04-08 20:52:30 +00:00
Michael Munday
e6f36f0cd5 runtime: add s390x support (new files and lfstack_64bit.go modifications)
Change-Id: I51c0a332e3cbdab348564e5dcd27583e75e4b881
Reviewed-on: https://go-review.googlesource.com/20946
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-07 18:56:54 +00:00
Dave Cheney
9cc9e95b28 Revert "runtime: merge lfstack{Pack,Unpack} into one file"
This broke solaris, which apparently does use the upper 17 bits of the address space.

This reverts commit 3b02c5b1b6.

Change-Id: Iedfe54abd0384960845468205f20191a97751c0b
Reviewed-on: https://go-review.googlesource.com/21652
Reviewed-by: Dave Cheney <dave@cheney.net>
2016-04-07 14:05:24 +00:00
Richard Miller
121c434f7a runtime/pprof: make TestBlockProfile less timing dependent
The test for profiling of channel blocking is timing dependent,
and in particular the blockSelectRecvAsync case can fail on a
slow builder (plan9_arm) when many tests are run in parallel.
The child goroutine sleeps for a fixed period so the parent
can be observed to block in a select call reading from the
child; but if the OS process running the parent goroutine is
delayed long enough, the child may wake again before the
parent has reached the blocking point.  By repeating the test
three times, the likelihood of a blocking event is increased.

Fixes #15096

Change-Id: I2ddb9576a83408d06b51ded682bf8e71e53ce59e
Reviewed-on: https://go-review.googlesource.com/21604
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-07 09:57:06 +00:00
Dave Cheney
3b02c5b1b6 runtime: merge lfstack{Pack,Unpack} into one file
Merge the remaining lfstack{Pack,Unpack} implemetations into one file.

unsafe.Sizeof(uintptr(0)) == 4 is a constant comparison so this branch
folds away at compile time.

Dmitry confirmed that the upper 17 bits of an address will be zero for a
user mode pointer, so there is no need to sign extend on amd64 during
unpack, so we can reuse the same implementation as all othe 64 bit
archs.

Change-Id: I99f589416d8b181ccde5364c9c2e78e4a5efc7f1
Reviewed-on: https://go-review.googlesource.com/21597
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-07 09:17:22 +00:00
Michael Hudson-Doyle
31cf1c1779 runtime: clamp OS-reported number of processors to _MaxGomaxprocs
So that all Go processes do not die on startup on a system with >256 CPUs.

I tested this by hacking osinit to set ncpu to 1000.

Updates #15131

Change-Id: I52e061a0de97be41d684dd8b748fa9087d6f1aef
Reviewed-on: https://go-review.googlesource.com/21599
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-07 00:11:25 +00:00
Dave Cheney
0c81248bf4 runtime: remove unused return value from lfstackUnpack
None of the two places that call lfstackUnpack use the second argument.
This simplifies a followup CL that merges the lfstack{Pack,Unpack}
implementations.

Change-Id: I3c93f6259da99e113d94f8c8027584da79c1ac2c
Reviewed-on: https://go-review.googlesource.com/21595
Run-TryBot: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-06 21:04:43 +00:00
Brad Fitzpatrick
2cefd12a1b net, runtime: skip flaky tests on OpenBSD
Flaky tests are a distraction and cover up real problems.

File bugs instead and mark them as flaky.

This moves the net/http flaky test flagging mechanism to internal/testenv.

Updates #15156
Updates #15157
Updates #15158

Change-Id: I0e561cd2a09c0dec369cd4ed93bc5a2b40233dfe
Reviewed-on: https://go-review.googlesource.com/21614
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-06 19:28:24 +00:00
Dave Cheney
5c7ae10f66 runtime: merge 64bit lfstack impls
Merge all the 64bit lfstack impls into one file, adjust build tags to
match.

Merge all the comments on the various lfstack implementations for
posterity.

lfstack_amd64.go can probably be merged, but it is slightly different so
that will happen in a followup.

Change-Id: I5362d5e127daa81c9cb9d4fa8a0cc5c5e5c2707c
Reviewed-on: https://go-review.googlesource.com/21591
Run-TryBot: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-06 06:47:31 +00:00
Brad Fitzpatrick
8455f3a3d5 os: consolidate os{1,2}_*.go files
Change-Id: I463ca59f486b2842f67f151a55f530ee10663830
Reviewed-on: https://go-review.googlesource.com/21568
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-06 05:04:47 +00:00
Brad Fitzpatrick
fd2bb1e30a runtime: rename os1_windows.go to os_windows.go
Change-Id: I11172f3d0e28f17b812e67a4db9cfe513b8e1974
Reviewed-on: https://go-review.googlesource.com/21565
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-06 04:26:01 +00:00
Brad Fitzpatrick
e095f53e9b runtime: merge os{,2}_windows.go into os1_windows.go.
A future CL will rename os1_windows.go to os_windows.go.

Change-Id: I223e76002dd1e9c9d1798fb0beac02c7d3bf4812
Reviewed-on: https://go-review.googlesource.com/21564
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-06 04:25:44 +00:00
Michael Munday
0f08dd2183 runtime: add s390x support (modified files only)
Change-Id: Ib79ad4a890994ad64edb1feb79bd242d26b5b08a
Reviewed-on: https://go-review.googlesource.com/20945
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-06 04:25:06 +00:00
Shenghou Ma
a2eded3421 runtime: get randomness from AT_RANDOM AUXV on linux/arm64
Fixes #15147.

Change-Id: Ibfe46c747dea987787a51eb0c95ccd8c5f24f366
Reviewed-on: https://go-review.googlesource.com/21580
Run-TryBot: Minux Ma <minux@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-06 04:23:37 +00:00
Brad Fitzpatrick
34c58065e5 runtime: rename os1_linux.go to os_linux.go
Change-Id: I938f61763c3256a876d62aeb54ef8c25cc4fc90e
Reviewed-on: https://go-review.googlesource.com/21567
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-06 03:55:38 +00:00
Brad Fitzpatrick
5103fbfdb2 runtime: merge os_linux.go into os1_linux.go
Change-Id: I791c47014fe69e8529c7b2f0b9a554e47902d46c
Reviewed-on: https://go-review.googlesource.com/21566
Reviewed-by: Minux Ma <minux@golang.org>
2016-04-06 03:54:47 +00:00
Brad Fitzpatrick
8556c76f88 runtime: minor Windows cleanup
Change-Id: I9a8081ef1109469e9577c642156aa635188d8954
Reviewed-on: https://go-review.googlesource.com/21538
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2016-04-06 02:23:29 +00:00
David Chase
d8c815d8b5 cmd/compile: note escape of parts of closured-capture vars
Missed a case for closure calls (OCALLFUNC && indirect) in
esc.go:esccall.

Cleanup to runtime code for windows to more thoroughly hide
a technical escape.  Also made code pickier about failing
to late non-optional kernel32.dll.

Fixes #14409.

Change-Id: Ie75486a2c8626c4583224e02e4872c2875f7bca5
Reviewed-on: https://go-review.googlesource.com/20102
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-05 18:10:09 +00:00
Dmitry Vyukov
475d113b53 runtime: don't burn CPU unnecessarily
Two GC-related functions, scang and casgstatus, wait in an active spin loop.
Active spinning is never a good idea in user-space. Once we wait several
times more than the expected wait time, something unexpected is happenning
(e.g. the thread we are waiting for is descheduled or handling a page fault)
and we need to yield to OS scheduler. Moreover, the expected wait time is
very high for these functions: scang wait time can be tens of milliseconds,
casgstatus can be hundreds of microseconds. It does not make sense to spin
even for that time.

go install -a std profile on a 4-core machine shows that 11% of time is spent
in the active spin in scang:

  6.12%    compile  compile                [.] runtime.scang
  3.27%    compile  compile                [.] runtime.readgstatus
  1.72%    compile  compile                [.] runtime/internal/atomic.Load

The active spin also increases tail latency in the case of the slightest
oversubscription: GC goroutines spend whole quantum in the loop instead of
executing user code.

Here is scang wait time histogram during go install -a std:

13707.0000 - 1815442.7667 [   118]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎...
1815442.7667 - 3617178.5333 [     9]: ∎∎∎∎∎∎∎∎∎
3617178.5333 - 5418914.3000 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
5418914.3000 - 7220650.0667 [     5]: ∎∎∎∎∎
7220650.0667 - 9022385.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
9022385.8333 - 10824121.6000 [    13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
10824121.6000 - 12625857.3667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
12625857.3667 - 14427593.1333 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
14427593.1333 - 16229328.9000 [    18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
16229328.9000 - 18031064.6667 [    32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
18031064.6667 - 19832800.4333 [    28]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
19832800.4333 - 21634536.2000 [     6]: ∎∎∎∎∎∎
21634536.2000 - 23436271.9667 [    15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
23436271.9667 - 25238007.7333 [    11]: ∎∎∎∎∎∎∎∎∎∎∎
25238007.7333 - 27039743.5000 [    27]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
27039743.5000 - 28841479.2667 [    20]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
28841479.2667 - 30643215.0333 [    10]: ∎∎∎∎∎∎∎∎∎∎
30643215.0333 - 32444950.8000 [     7]: ∎∎∎∎∎∎∎
32444950.8000 - 34246686.5667 [     4]: ∎∎∎∎
34246686.5667 - 36048422.3333 [     4]: ∎∎∎∎
36048422.3333 - 37850158.1000 [     1]: ∎
37850158.1000 - 39651893.8667 [     5]: ∎∎∎∎∎
39651893.8667 - 41453629.6333 [     2]: ∎∎
41453629.6333 - 43255365.4000 [     2]: ∎∎
43255365.4000 - 45057101.1667 [     2]: ∎∎
45057101.1667 - 46858836.9333 [     1]: ∎
46858836.9333 - 48660572.7000 [     2]: ∎∎
48660572.7000 - 50462308.4667 [     3]: ∎∎∎
50462308.4667 - 52264044.2333 [     2]: ∎∎
52264044.2333 - 54065780.0000 [     2]: ∎∎

and the zoomed-in first part:

13707.0000 - 19916.7667 [     2]: ∎∎
19916.7667 - 26126.5333 [     2]: ∎∎
26126.5333 - 32336.3000 [     9]: ∎∎∎∎∎∎∎∎∎
32336.3000 - 38546.0667 [     8]: ∎∎∎∎∎∎∎∎
38546.0667 - 44755.8333 [    12]: ∎∎∎∎∎∎∎∎∎∎∎∎
44755.8333 - 50965.6000 [    10]: ∎∎∎∎∎∎∎∎∎∎
50965.6000 - 57175.3667 [     5]: ∎∎∎∎∎
57175.3667 - 63385.1333 [     6]: ∎∎∎∎∎∎
63385.1333 - 69594.9000 [     5]: ∎∎∎∎∎
69594.9000 - 75804.6667 [     6]: ∎∎∎∎∎∎
75804.6667 - 82014.4333 [     6]: ∎∎∎∎∎∎
82014.4333 - 88224.2000 [     4]: ∎∎∎∎
88224.2000 - 94433.9667 [     1]: ∎
94433.9667 - 100643.7333 [     1]: ∎
100643.7333 - 106853.5000 [     2]: ∎∎
106853.5000 - 113063.2667 [     0]:
113063.2667 - 119273.0333 [     2]: ∎∎
119273.0333 - 125482.8000 [     2]: ∎∎
125482.8000 - 131692.5667 [     1]: ∎
131692.5667 - 137902.3333 [     1]: ∎
137902.3333 - 144112.1000 [     0]:
144112.1000 - 150321.8667 [     2]: ∎∎
150321.8667 - 156531.6333 [     1]: ∎
156531.6333 - 162741.4000 [     1]: ∎
162741.4000 - 168951.1667 [     0]:
168951.1667 - 175160.9333 [     0]:
175160.9333 - 181370.7000 [     1]: ∎
181370.7000 - 187580.4667 [     1]: ∎
187580.4667 - 193790.2333 [     2]: ∎∎
193790.2333 - 200000.0000 [     0]:

Here is casgstatus wait time histogram:

  631.0000 -  5276.6333 [     3]: ∎∎∎
 5276.6333 -  9922.2667 [     5]: ∎∎∎∎∎
 9922.2667 - 14567.9000 [     2]: ∎∎
14567.9000 - 19213.5333 [     6]: ∎∎∎∎∎∎
19213.5333 - 23859.1667 [     5]: ∎∎∎∎∎
23859.1667 - 28504.8000 [     6]: ∎∎∎∎∎∎
28504.8000 - 33150.4333 [     6]: ∎∎∎∎∎∎
33150.4333 - 37796.0667 [     2]: ∎∎
37796.0667 - 42441.7000 [     1]: ∎
42441.7000 - 47087.3333 [     3]: ∎∎∎
47087.3333 - 51732.9667 [     0]:
51732.9667 - 56378.6000 [     1]: ∎
56378.6000 - 61024.2333 [     0]:
61024.2333 - 65669.8667 [     0]:
65669.8667 - 70315.5000 [     0]:
70315.5000 - 74961.1333 [     1]: ∎
74961.1333 - 79606.7667 [     0]:
79606.7667 - 84252.4000 [     0]:
84252.4000 - 88898.0333 [     0]:
88898.0333 - 93543.6667 [     0]:
93543.6667 - 98189.3000 [     0]:
98189.3000 - 102834.9333 [     0]:
102834.9333 - 107480.5667 [     1]: ∎
107480.5667 - 112126.2000 [     0]:
112126.2000 - 116771.8333 [     0]:
116771.8333 - 121417.4667 [     0]:
121417.4667 - 126063.1000 [     0]:
126063.1000 - 130708.7333 [     0]:
130708.7333 - 135354.3667 [     0]:
135354.3667 - 140000.0000 [     1]: ∎

Ideally we eliminate the waiting by switching to async
state machine for GC, but for now just yield to OS scheduler
after a reasonable wait time.

To choose yielding parameters I've measured
golang.org/x/benchmarks/http tail latencies with different yield
delays and oversubscription levels.

With no oversubscription (to the degree possible):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50   1.41ms ±15%  1.41ms ± 5%    ~     (p=0.611 n=13+12)
Latency-95   5.21ms ± 2%  5.15ms ± 2%  -1.15%  (p=0.012 n=13+13)
Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.54%  (p=0.002 n=13+13)
Latency-999  10.7ms ± 9%  10.2ms ±10%  -5.46%  (p=0.004 n=12+13)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50   1.41ms ±15%  1.41ms ± 8%    ~     (p=0.511 n=13+13)
Latency-95   5.21ms ± 2%  5.14ms ± 2%  -1.23%  (p=0.006 n=13+13)
Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.94%  (p=0.000 n=13+13)
Latency-999  10.7ms ± 9%  10.1ms ± 8%  -6.14%  (p=0.000 n=12+13)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50   1.41ms ±15%  1.45ms ± 6%    ~     (p=0.724 n=13+13)
Latency-95   5.21ms ± 2%  5.18ms ± 1%    ~     (p=0.287 n=13+13)
Latency-99   7.16ms ± 2%  7.05ms ± 2%  -1.64%  (p=0.002 n=13+13)
Latency-999  10.7ms ± 9%  10.0ms ± 5%  -6.72%  (p=0.000 n=12+13)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50   1.41ms ±15%  1.51ms ± 7%  +6.57%  (p=0.002 n=13+13)
Latency-95   5.21ms ± 2%  5.21ms ± 2%    ~     (p=0.960 n=13+13)
Latency-99   7.16ms ± 2%  7.06ms ± 2%  -1.50%  (p=0.012 n=13+13)
Latency-999  10.7ms ± 9%  10.0ms ± 6%  -6.49%  (p=0.000 n=12+13)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50   1.41ms ±15%  1.53ms ± 6%  +8.48%  (p=0.000 n=13+12)
Latency-95   5.21ms ± 2%  5.23ms ± 2%    ~     (p=0.287 n=13+13)
Latency-99   7.16ms ± 2%  7.08ms ± 2%  -1.21%  (p=0.004 n=13+13)
Latency-999  10.7ms ± 9%   9.9ms ± 3%  -7.99%  (p=0.000 n=12+12)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50   1.41ms ±15%  1.47ms ± 5%    ~     (p=0.072 n=13+13)
Latency-95   5.21ms ± 2%  5.17ms ± 2%    ~     (p=0.091 n=13+13)
Latency-99   7.16ms ± 2%  7.02ms ± 2%  -1.99%  (p=0.000 n=13+13)
Latency-999  10.7ms ± 9%   9.9ms ± 5%  -7.86%  (p=0.000 n=12+13)

With slight oversubscription (another instance of http benchmark
was running in background with reduced GOMAXPROCS):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50    840µs ± 3%   804µs ± 3%  -4.37%  (p=0.000 n=15+18)
Latency-95   6.52ms ± 4%  6.03ms ± 4%  -7.51%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.0ms ± 4%  -7.33%  (p=0.000 n=18+14)
Latency-999  18.0ms ± 9%  16.8ms ± 7%  -6.84%  (p=0.000 n=18+18)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50    840µs ± 3%   809µs ± 3%  -3.71%  (p=0.000 n=15+17)
Latency-95   6.52ms ± 4%  6.11ms ± 4%  -6.29%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%   9.9ms ± 6%  -7.55%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.5ms ±11%  -8.49%  (p=0.000 n=18+18)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50    840µs ± 3%   823µs ± 5%  -2.06%  (p=0.002 n=15+18)
Latency-95   6.52ms ± 4%  6.32ms ± 3%  -3.05%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 4%  -5.22%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.7ms ±10%  -7.09%  (p=0.000 n=18+18)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50    840µs ± 3%   836µs ± 5%    ~     (p=0.442 n=15+18)
Latency-95   6.52ms ± 4%  6.39ms ± 3%  -2.00%  (p=0.000 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 6%  -5.15%  (p=0.000 n=18+17)
Latency-999  18.0ms ± 9%  16.6ms ± 8%  -7.48%  (p=0.000 n=18+18)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50    840µs ± 3%   836µs ± 6%    ~     (p=0.401 n=15+18)
Latency-95   6.52ms ± 4%  6.40ms ± 4%  -1.79%  (p=0.010 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 5%  -4.95%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.5ms ±14%  -8.17%  (p=0.000 n=18+18)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50    840µs ± 3%   828µs ± 2%  -1.49%  (p=0.001 n=15+17)
Latency-95   6.52ms ± 4%  6.38ms ± 4%  -2.04%  (p=0.001 n=18+18)
Latency-99   10.8ms ± 7%  10.2ms ± 4%  -4.77%  (p=0.000 n=18+18)
Latency-999  18.0ms ± 9%  16.9ms ± 9%  -6.23%  (p=0.000 n=18+18)

With significant oversubscription (background http benchmark
was running with full GOMAXPROCS):

scang yield delay = 1, casgstatus yield delay = 1
Latency-50   1.32ms ±12%  1.30ms ±13%    ~     (p=0.454 n=14+14)
Latency-95   16.3ms ±10%  15.3ms ± 7%  -6.29%  (p=0.001 n=14+14)
Latency-99   29.4ms ±10%  27.9ms ± 5%  -5.04%  (p=0.001 n=14+12)
Latency-999  49.9ms ±19%  45.9ms ± 5%  -8.00%  (p=0.008 n=14+13)

scang yield delay = 5000, casgstatus yield delay = 3000
Latency-50   1.32ms ±12%  1.29ms ± 9%    ~     (p=0.227 n=14+14)
Latency-95   16.3ms ±10%  15.4ms ± 5%  -5.27%  (p=0.002 n=14+14)
Latency-99   29.4ms ±10%  27.9ms ± 6%  -5.16%  (p=0.001 n=14+14)
Latency-999  49.9ms ±19%  46.8ms ± 8%  -6.21%  (p=0.050 n=14+14)

scang yield delay = 10000, casgstatus yield delay = 5000
Latency-50   1.32ms ±12%  1.35ms ± 9%     ~     (p=0.401 n=14+14)
Latency-95   16.3ms ±10%  15.0ms ± 4%   -7.67%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.4ms ± 5%   -6.98%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.7ms ± 5%  -10.56%  (p=0.000 n=14+11)

scang yield delay = 30000, casgstatus yield delay = 10000
Latency-50   1.32ms ±12%  1.36ms ±10%     ~     (p=0.246 n=14+14)
Latency-95   16.3ms ±10%  14.9ms ± 5%   -8.31%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.4ms ± 7%   -6.70%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.9ms ±15%  -10.13%  (p=0.003 n=14+14)

scang yield delay = 100000, casgstatus yield delay = 50000
Latency-50   1.32ms ±12%  1.41ms ± 9%  +6.37%  (p=0.008 n=14+13)
Latency-95   16.3ms ±10%  15.1ms ± 8%  -7.45%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.5ms ±12%  -6.67%  (p=0.002 n=14+14)
Latency-999  49.9ms ±19%  45.9ms ±16%  -8.06%  (p=0.019 n=14+14)

scang yield delay = 200000, casgstatus yield delay = 100000
Latency-50   1.32ms ±12%  1.42ms ±10%   +7.21%  (p=0.003 n=14+14)
Latency-95   16.3ms ±10%  15.0ms ± 7%   -7.59%  (p=0.000 n=14+14)
Latency-99   29.4ms ±10%  27.3ms ± 8%   -7.20%  (p=0.000 n=14+14)
Latency-999  49.9ms ±19%  44.8ms ± 8%  -10.21%  (p=0.001 n=14+13)

All numbers are on 8 cores and with GOGC=10 (http benchmark has
tiny heap, few goroutines and low allocation rate, so by default
GC barely affects tail latency).

10us/5us yield delays seem to provide a reasonable compromise
and give 5-10% tail latency reduction. That's what used in this change.

go install -a std results on 4 core machine:

name      old time/op  new time/op  delta
Time       8.39s ± 2%   7.94s ± 2%  -5.34%  (p=0.000 n=47+49)
UserTime   24.6s ± 2%   22.9s ± 2%  -6.76%  (p=0.000 n=49+49)
SysTime    1.77s ± 9%   1.89s ±11%  +7.00%  (p=0.000 n=49+49)
CpuLoad    315ns ± 2%   313ns ± 1%  -0.59%  (p=0.000 n=49+48) # %CPU
MaxRSS    97.1ms ± 4%  97.5ms ± 9%    ~     (p=0.838 n=46+49) # bytes

Update #14396
Update #14189

Change-Id: I3f4109bf8f7fd79b39c466576690a778232055a2
Reviewed-on: https://go-review.googlesource.com/21503
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-04-05 15:52:03 +00:00
Dmitry Vyukov
3b246fa863 runtime: sleep less when we can do work
Usleep(100) in runqgrab negatively affects latency and throughput
of parallel application. We are sleeping instead of doing useful work.
This is effect is particularly visible on windows where minimal
sleep duration is 1-15ms.

Reduce sleep from 100us to 3us and use osyield on windows.
Sync chan send/recv takes ~50ns, so 3us gives us ~50x overshoot.

benchmark                    old ns/op     new ns/op     delta
BenchmarkChanSync-12         216           217           +0.46%
BenchmarkChanSyncWork-12     27213         25816         -5.13%

CPU consumption goes up from 106% to 108% in the first case,
and from 107% to 125% in the second case.

Test case from #14790 on windows:

BenchmarkDefaultResolution-8  4583372   29720    -99.35%
Benchmark1ms-8                992056    30701    -96.91%

99-th latency percentile for HTTP request serving is improved by up to 15%
(see http://golang.org/cl/20835 for details).

The following benchmarks are from the change that originally added this sleep
(see https://golang.org/s/go15gomaxprocs):

name        old time/op  new time/op  delta
Chain       22.6µs ± 2%  22.7µs ± 6%    ~      (p=0.905 n=9+10)
ChainBuf    22.4µs ± 3%  22.5µs ± 4%    ~      (p=0.780 n=9+10)
Chain-2     23.5µs ± 4%  24.9µs ± 1%  +5.66%   (p=0.000 n=10+9)
ChainBuf-2  23.7µs ± 1%  24.4µs ± 1%  +3.31%   (p=0.000 n=9+10)
Chain-4     24.2µs ± 2%  25.1µs ± 3%  +3.70%   (p=0.000 n=9+10)
ChainBuf-4  24.4µs ± 5%  25.0µs ± 2%  +2.37%  (p=0.023 n=10+10)
Powser       2.37s ± 1%   2.37s ± 1%    ~       (p=0.423 n=8+9)
Powser-2     2.48s ± 2%   2.57s ± 2%  +3.74%   (p=0.000 n=10+9)
Powser-4     2.66s ± 1%   2.75s ± 1%  +3.40%  (p=0.000 n=10+10)
Sieve        13.3s ± 2%   13.3s ± 2%    ~      (p=1.000 n=10+9)
Sieve-2      7.00s ± 2%   7.44s ±16%    ~      (p=0.408 n=8+10)
Sieve-4      4.13s ±21%   3.85s ±22%    ~       (p=0.113 n=9+9)

Fixes #14790

Change-Id: Ie7c6a1c4f9c8eb2f5d65ab127a3845386d6f8b5d
Reviewed-on: https://go-review.googlesource.com/20835
Reviewed-by: Austin Clements <austin@google.com>
2016-04-05 15:32:06 +00:00
Alex Brainman
ffeae198d0 runtime: leave directory before removing it in TestDLLPreloadMitigation
Fixes #15120

Change-Id: I1d9a192ac163826bad8b46e8c0b0b9e218e69570
Reviewed-on: https://go-review.googlesource.com/21520
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-05 04:32:49 +00:00
Alex Brainman
fcac88098b runtime: remove race out of BenchmarkChanToSyscallPing1ms
Fixes #15119

Change-Id: I31445bf282a5e2a160ff4e66c5a592b989a5798f
Reviewed-on: https://go-review.googlesource.com/21448
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-05 00:52:40 +00:00
Austin Clements
61f56e925e runtime: fix pagesInUse accounting
When we grow the heap, we create a temporary "in use" span for the
memory acquired from the OS and then free that span to link it into
the heap. Hence, we (1) increase pagesInUse when we make the temporary
span so that (2) freeing the span will correctly decrease it.

However, currently step (1) increases pagesInUse by the number of
pages requested from the heap, while step (2) decreases it by the
number of pages requested from the OS (the size of the temporary
span). These aren't necessarily the same, since we round up the number
of pages we request from the OS, so steps 1 and 2 don't necessarily
cancel out like they're supposed to. Over time, this can add up and
cause pagesInUse to underflow and wrap around to 2^64. The garbage
collector computes the sweep ratio from this, so if this happens, the
sweep ratio becomes effectively infinite, causing the first allocation
on each P in a sweep cycle to sweep the entire heap. This makes
sweeping effectively STW.

Fix this by increasing pagesInUse in step 1 by the number of pages
requested from the OS, so that the two steps correctly cancel out. We
add a test that checks that the running total matches the actual state
of the heap.

Fixes #15022. For 1.6.x.

Change-Id: Iefd9d6abe37d0d447cbdbdf9941662e4f18eeffc
Reviewed-on: https://go-review.googlesource.com/21280
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-04-04 15:33:26 +00:00
Alex Brainman
1f5b1b2b66 runtime: change osyield to use Windows SwitchToThread
It appears that windows osyield is just 15ms sleep on my computer
(see benchmarks below). Replace NtWaitForSingleObject in osyield
with SwitchToThread (as suggested by Dmitry).

Also add issue #14790 related benchmarks, so we can track perfomance
changes in CL 20834 and CL 20835 and beyond.

Update #14790

benchmark                             old ns/op     new ns/op     delta
BenchmarkChanToSyscallPing1ms         1953200       1953000       -0.01%
BenchmarkChanToSyscallPing15ms        31562904      31248400      -1.00%
BenchmarkSyscallToSyscallPing1ms      5247          4202          -19.92%
BenchmarkSyscallToSyscallPing15ms     5260          4374          -16.84%
BenchmarkChanToChanPing1ms            474           494           +4.22%
BenchmarkChanToChanPing15ms           468           489           +4.49%
BenchmarkOsYield1ms                   980018        75.5          -99.99%
BenchmarkOsYield15ms                  15625200      75.8          -100.00%

Change-Id: I1b4cc7caca784e2548ee3c846ca07ef152ebedce
Reviewed-on: https://go-review.googlesource.com/21294
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-04 10:05:05 +00:00
Christopher Nelson
ed8f0e5c33 cmd/go: fix -buildmode=c-archive should work on windows
Add supporting code for runtime initialization, including both
32- and 64-bit x86 architectures.

Add .ctors section on Windows to PE .o files, and INITENTRY to .ctors
section to plug in to the GCC C/C++ startup initialization mechanism.
This allows the Go runtime to initialize itself. Add .text section
symbol for .ctor relocations. Note: This is unlikely to be useful for
MSVC-based toolchains.

Fixes #13494

Change-Id: I4286a96f70e5f5228acae88eef46e2bed95813f3
Reviewed-on: https://go-review.googlesource.com/18057
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2016-04-04 03:38:25 +00:00
Eric Engestrom
7a8caf7d43 all: fix spelling mistakes
Signed-off-by: Eric Engestrom <eric@engestrom.ch>

Change-Id: I91873aaebf79bdf1c00d38aacc1a1fb8d79656a7
Reviewed-on: https://go-review.googlesource.com/21433
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-03 17:03:15 +00:00
Brad Fitzpatrick
683448a304 runtime, syscall: only search for Windows DLLs in the System32 directory
Make sure that for any DLL that Go uses itself, we only look for the
DLL in the Windows System32 directory, guarding against DLL preloading
attacks.

(Unless the Windows version is ancient and LoadLibraryEx is
unavailable, in which case the user probably has bigger security
problems anyway.)

This does not change the behavior of syscall.LoadLibrary or NewLazyDLL
if the DLL name is something unused by Go itself.

This change also intentionally does not add any new API surface. Instead,
x/sys is updated with a LoadLibraryEx function and LazyDLL.Flags in:
    https://golang.org/cl/21388

Updates #14959

Change-Id: I8d29200559cc19edf8dcf41dbdd39a389cd6aeb9
Reviewed-on: https://go-review.googlesource.com/21140
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-01 22:55:36 +00:00
Ian Lance Taylor
59fc42b230 runtime: allocate mp.cgocallers earlier
Fixes #15061.

Change-Id: I71f69f398d1c5f3a884bbd044786f1a5600d0fae
Reviewed-on: https://go-review.googlesource.com/21398
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-01 22:23:13 +00:00
Ian Lance Taylor
58394fd7d5 runtime/cgo: only build _cgo_callers if x_cgo_callers is defined
Fixes a problem when using the external linker on Solaris.  The Solaris
external linker still doesn't work due to issue #14957.

The problem is, for example, with `go test cmd/objdump`:

        objdump_test.go:71: go build fmthello.go: exit status 2
                # command-line-arguments
                /var/gcc/iant/go/pkg/tool/solaris_amd64/link: running gcc failed: exit status 1
                Undefined                       first referenced
                 symbol                             in file
                x_cgo_callers                       /tmp/go-link-355600608/go.o
                ld: fatal: symbol referencing errors
                collect2: error: ld returned 1 exit status

Change-Id: I54917cfd5c288ee77ea25c439489bd2c9124fe73
Reviewed-on: https://go-review.googlesource.com/21392
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-04-01 16:13:45 +00:00
Ian Lance Taylor
ea306ae625 runtime: support symbolic backtrace of C code in a cgo crash
The new function runtime.SetCgoTraceback may be used to register stack
traceback and symbolizer functions, written in C, to do a stack
traceback from cgo code.

There is a sample implementation of runtime.SetCgoSymbolizer at
github.com/ianlancetaylor/cgosymbolizer.  Just importing that package is
sufficient to get symbolic C backtraces.

Currently only supported on linux/amd64.

Change-Id: If96ee2eb41c6c7379d407b9561b87557bfe47341
Reviewed-on: https://go-review.googlesource.com/17761
Reviewed-by: Austin Clements <austin@google.com>
2016-04-01 04:13:44 +00:00
Matthew Dempsky
e6066711a0 cmd/compile, runtime: fix pedantic int->string conversions
Previously, cmd/compile rejected constant int->string conversions if
the integer value did not fit into an "int" value. Also, runtime
incorrectly truncated 64-bit values to 32-bit before checking if
they're a valid Unicode code point. According to the Go spec, both of
these cases should instead yield "\uFFFD".

Fixes #15039.

Change-Id: I3c8a3ad9a0780c0a8dc1911386a523800fec9764
Reviewed-on: https://go-review.googlesource.com/21344
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-31 10:28:23 +00:00
Keith Randall
4b209dbf0b runtime: don't use REP;MOVSB if CPUID doesn't say it is fast
Only use REP;MOVSB if:
 1) The CPUID flag says it is fast, and
 2) The pointers are unaligned
Otherwise, use REP;MOVSQ.

Update #14630

Change-Id: I946b28b87880c08e5eed1ce2945016466c89db66
Reviewed-on: https://go-review.googlesource.com/21300
Reviewed-by: Nigel Tao <nigeltao@golang.org>
2016-03-31 02:54:10 +00:00
Austin Clements
17f6e5396b runtime: print sweep ratio if gcpacertrace>0
Change-Id: I5217bf4b75e110ca2946e1abecac6310ed84dad5
Reviewed-on: https://go-review.googlesource.com/21205
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-03-30 02:27:58 +00:00
Michel Lespinasse
859b63cc09 cmd/compile: optimize remaining convT2I calls
See #14874
Updates #6853

This change adds a compiler optimization for non pointer shaped convT2I.
Since itab symbols are now emitted by the compiler, the itab address can
be passed directly to convT2I instead of passing the iface type and a
cache pointer argument.

Compilebench results for the 5-commits series ending here:

name       old time/op     new time/op     delta
Template       336ms ± 4%      344ms ± 4%   +2.61%          (p=0.027 n=9+8)
Unicode        165ms ± 6%      173ms ± 7%   +5.11%          (p=0.014 n=9+9)
GoTypes        1.09s ± 1%      1.06s ± 2%   -3.29%          (p=0.000 n=9+9)
Compiler       5.09s ±10%      4.75s ±10%   -6.64%        (p=0.011 n=10+10)
MakeBash       31.1s ± 5%      30.3s ± 3%     ~           (p=0.089 n=10+10)

name       old text-bytes  new text-bytes  delta
HelloSize       558k ± 0%       558k ± 0%   +0.02%        (p=0.000 n=10+10)
CmdGoSize      6.24M ± 0%      6.11M ± 0%   -2.11%        (p=0.000 n=10+10)

name       old data-bytes  new data-bytes  delta
HelloSize      3.66k ± 0%      3.74k ± 0%   +2.41%        (p=0.000 n=10+10)
CmdGoSize       134k ± 0%       162k ± 0%  +20.76%        (p=0.000 n=10+10)

name       old bss-bytes   new bss-bytes   delta
HelloSize       126k ± 0%       126k ± 0%     ~     (all samples are equal)
CmdGoSize       149k ± 0%       146k ± 0%   -2.17%        (p=0.000 n=10+10)

name       old exe-bytes   new exe-bytes   delta
HelloSize       924k ± 0%       924k ± 0%   +0.05%        (p=0.000 n=10+10)
CmdGoSize      9.77M ± 0%      9.62M ± 0%   -1.47%        (p=0.000 n=10+10)

Change-Id: Ib230ddc04988824035c32287ae544a965fedd344
Reviewed-on: https://go-review.googlesource.com/20902
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Michel Lespinasse <walken@google.com>
2016-03-29 02:21:50 +00:00
Michel Lespinasse
79688ca58f cmd/link: collect itablinks as a slice in moduledata
See #14874

This change tells the linker to collect all the itablink symbols and
collect them so that moduledata can have a slice of all compiler
generated itabs.

The logic is shamelessly adapted from what is done with typelink symbols.

Change-Id: Ie93b59acf0fcba908a876d506afbf796f222dbac
Reviewed-on: https://go-review.googlesource.com/20889
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-29 02:18:56 +00:00
Michel Lespinasse
f00bbd5f81 cmd/compile: emit itabs and itablinks
See #14874

This change tells the compiler to emit itab and itablink symbols in
situations where they could be useful; however the compiled code does
not actually make use of the new symbols yet.

Change-Id: I0db3e6ec0cb1f3b7cebd4c60229e4a48372fe586
Reviewed-on: https://go-review.googlesource.com/20888
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Michel Lespinasse <walken@google.com>
2016-03-29 02:18:03 +00:00
Michel Lespinasse
7043d2bb5e runtime: insert itabs into hash table during init
See #14874

This change makes the runtime register all compiler generated itabs
(as obtained from the moduledata) during init.

Change-Id: I9969a0985b99b8bda820a631f7fe4c78f1174cdf
Reviewed-on: https://go-review.googlesource.com/20900
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Michel Lespinasse <walken@google.com>
2016-03-29 02:14:49 +00:00
Matthew Dempsky
deb83d0639 cmd/compile: remove unused write barrier helpers
These have been unused since CL 10316.

Passes toolstash -cmp.

Change-Id: Icc19f3fcc7275fbee1c665f704e10a110ecce2a5
Reviewed-on: https://go-review.googlesource.com/21242
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-03-29 00:16:55 +00:00
Shinji Tanaka
0f86d1edfb runtime: use set_thread_area instead of modify_ldt on linux/386
linux/386 depends on modify_ldt system call, but recent Linux kernels
can disable this system call. Any Go programs built as linux/386
crash with the message 'Trace/breakpoint trap'.

The kernel config CONFIG_MODIFY_LDT_SYSCALL, which control
enable/disable modify_ldt, is disabled on Amazon Linux 2016.03.

This fixes this problem by using set_thread_area instead of modify_ldt
on linux/386.

Fixes #14795.

Change-Id: I0cc5139e40e9e5591945164156a77b6bdff2c7f1
Reviewed-on: https://go-review.googlesource.com/21190
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
2016-03-28 16:56:38 +00:00
David Chase
8eec2bbfbc cmd/compile: added some intrinsics to SSA back end
One intrinsic was needed to help get the very best
performance out of a future GC; as long as that one was
being added, I also added Bswap since that is sometimes
a handy thing to have.  I had intended to fill out the
bit-scan intrinsic family, but the mismatch between the
"scan forward" instruction and "count leading zeroes"
was large enough to cause me to leave it out -- it poses
a dilemma that I'd rather dodge right now.

These intrinsics are not exposed for general use.
That's a separate issue requiring an API proposal change
( https://github.com/golang/proposal )

All intrinsics are tested, both that they are substituted
on the appropriate architecture, and that they produce the
expected result.

Change-Id: I5848037cfd97de4f75bdc33bdd89bba00af4a8ee
Reviewed-on: https://go-review.googlesource.com/20564
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-28 16:29:59 +00:00
Matthew Dempsky
995fb0319e cmd/compile: fix stringtoslicebytetmp optimization
Fixes #14973.

Change-Id: Iea68c9deca9429bde465c9ae05639209fe0ccf72
Reviewed-on: https://go-review.googlesource.com/21175
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-27 19:12:37 +00:00
Shenghou Ma
a7f7a9cca7 runtime, runtime/cgo: save callee-saved FP registers on arm64
For #14876.

Change-Id: I0992859264cbaf9c9b691fad53345bbb01b4cf3b
Reviewed-on: https://go-review.googlesource.com/21085
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-25 23:04:44 +00:00
Shenghou Ma
71916437fb runtime/cgo: save callee-saved xmm registers on windows/amd64
For #14876.

Change-Id: I33947f74e8058437a784862f1f064974afc99250
Reviewed-on: https://go-review.googlesource.com/21084
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-25 23:04:33 +00:00
Joe Sylve
562e38c0af runtime: fix signal handling on Solaris
This fixes the problems with signal handling that were inadvertently
introduced in https://go-review.googlesource.com/21006.

Fixes #14899

Change-Id: Ia746914dcb3146a52413d32c57b089af763f0810
Reviewed-on: https://go-review.googlesource.com/21145
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-25 21:38:47 +00:00
Marvin Stenger
6b0688f742 runtime: speed up growslice by avoiding divisions 2
This is a follow-up of https://go-review.googlesource.com/#/c/20653/

Special case computation for slices with elements of byte size or
pointer size.

name                      old time/op  new time/op  delta
GrowSliceBytes-4          86.2ns ± 3%  75.4ns ± 2%  -12.50%  (p=0.000 n=20+20)
GrowSliceInts-4            161ns ± 3%   136ns ± 3%  -15.59%  (p=0.000 n=19+19)
GrowSlicePtr-4             239ns ± 2%   233ns ± 2%   -2.52%  (p=0.000 n=20+20)
GrowSliceStruct24Bytes-4   258ns ± 3%   256ns ± 3%     ~     (p=0.134 n=20+20)

Change-Id: Ice5fa648058fe9d7fa89dee97ca359966f671128
Reviewed-on: https://go-review.googlesource.com/21101
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-25 16:47:56 +00:00
Richard Miller
967b9940b4 runtime: avoid fork/exit race in plan9
There's a race between runtime.goexitsall killing all OS processes
of a go program in order to exit, and runtime.newosproc forking a
new one.  If the new process has been created but not yet stored
its pid in m.procid, it will not be killed by goexitsall and
deadlock results.

This CL prevents the race by making the newly forked process
check whether the program is exiting.  It also prevents a
potential "shoot-out" if multiple goroutines call Exit at
the same time, which could possibly lead to two processes
killing each other and leaving the rest deadlocked.

Change-Id: I3170b4a62d2461f6b029b3d6aad70373714ed53e
Reviewed-on: https://go-review.googlesource.com/21135
Run-TryBot: David du Colombier <0intro@gmail.com>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David du Colombier <0intro@gmail.com>
2016-03-25 15:04:45 +00:00
Dmitry Vyukov
ea0386f85f runtime: improve randomized stealing logic
During random stealing we steal 4*GOMAXPROCS times from random procs.
One would expect that most of the time we check all procs this way,
but due to low quality PRNG we actually miss procs with frightening
probability. Below are modelling experiment results for 1e6 tries:

GOMAXPROCS = 2 : missed 1 procs 7944 times

GOMAXPROCS = 3 : missed 1 procs 101620 times
GOMAXPROCS = 3 : missed 2 procs 3571 times

GOMAXPROCS = 4 : missed 1 procs 63916 times
GOMAXPROCS = 4 : missed 2 procs 61 times
GOMAXPROCS = 4 : missed 3 procs 16 times

GOMAXPROCS = 5 : missed 1 procs 133136 times
GOMAXPROCS = 5 : missed 2 procs 1025 times
GOMAXPROCS = 5 : missed 3 procs 101 times
GOMAXPROCS = 5 : missed 4 procs 15 times

GOMAXPROCS = 8 : missed 1 procs 151765 times
GOMAXPROCS = 8 : missed 2 procs 5057 times
GOMAXPROCS = 8 : missed 3 procs 1726 times
GOMAXPROCS = 8 : missed 4 procs 68 times

GOMAXPROCS = 12 : missed 1 procs 199081 times
GOMAXPROCS = 12 : missed 2 procs 27489 times
GOMAXPROCS = 12 : missed 3 procs 3113 times
GOMAXPROCS = 12 : missed 4 procs 233 times
GOMAXPROCS = 12 : missed 5 procs 9 times

GOMAXPROCS = 16 : missed 1 procs 237477 times
GOMAXPROCS = 16 : missed 2 procs 30037 times
GOMAXPROCS = 16 : missed 3 procs 9466 times
GOMAXPROCS = 16 : missed 4 procs 1334 times
GOMAXPROCS = 16 : missed 5 procs 192 times
GOMAXPROCS = 16 : missed 6 procs 5 times
GOMAXPROCS = 16 : missed 7 procs 1 times
GOMAXPROCS = 16 : missed 8 procs 1 times

A missed proc won't lead to underutilization because we check all procs
again after dropping P. But it can lead to an unpleasant situation
when we miss a proc, drop P, check all procs, discover work, acquire P,
miss the proc again, repeat.

Improve stealing logic to cover all procs.
Also don't enter spinning mode and try to steal when there is nobody around.

Change-Id: Ibb6b122cc7fb836991bad7d0639b77c807aab4c2
Reviewed-on: https://go-review.googlesource.com/20836
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2016-03-25 11:00:48 +00:00
David Crawshaw
24ce64d1a9 cmd/compile, runtime: new static name encoding
Create a byte encoding designed for static Go names.

It is intended to be a compact representation of a name
and optional tag data that can be turned into a Go string
without allocating, and describes whether or not it is
exported without unicode table.

The encoding is described in reflect/type.go:

// The first byte is a bit field containing:
//
//	1<<0 the name is exported
//	1<<1 tag data follows the name
//	1<<2 pkgPath *string follow the name and tag
//
// The next two bytes are the data length:
//
//	 l := uint16(data[1])<<8 | uint16(data[2])
//
// Bytes [3:3+l] are the string data.
//
// If tag data follows then bytes 3+l and 3+l+1 are the tag length,
// with the data following.
//
// If the import path follows, then ptrSize bytes at the end of
// the data form a *string. The import path is only set for concrete
// methods that are defined in a different package than their type.

Shrinks binary sizes:

	cmd/go: 164KB (1.6%)
	jujud:  1.0MB (1.5%)

For #6853.

Change-Id: I46b6591015b17936a443c9efb5009de8dfe8b609
Reviewed-on: https://go-review.googlesource.com/20968
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-25 00:13:49 +00:00
Joe Sylve
df2b2eb63d runtime: improve last ditch signal forwarding for Unix libraries
The current runtime attempts to forward signals generated by non-Go
code to the original signal handler.  If it can't call the original
handler directly, it currently attempts to re-raise the signal after
resetting the handler.  In this case, the original context is lost.

This fix prevents that problem by simply returning from the go signal
handler after resetting the original handler.  It only does this when
the original handler is the system default handler, which in all cases
is known to not recover.  The signal is not reset, so it is retriggered
and the original handler takes over with the proper context.

Fixes #14899

Change-Id: Ib1c19dfa4b50d9732d7a453de3784c8141e1cbb3
Reviewed-on: https://go-review.googlesource.com/21006
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-24 19:34:17 +00:00
Marvin Stenger
44532f1a9d runtime: fix inconsistency in slice.go
Fixes #14938.

Additionally some simplifications along the way.

Change-Id: I2c5fb7e32dcc6fab68fff36a49cb72e715756abe
Reviewed-on: https://go-review.googlesource.com/21046
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2016-03-24 18:17:28 +00:00
Elias Naur
be10c51500 runtime/cgo: block signals to the iOS mach exception handler
For darwin/arm{,64} a non-Go thread is created to convert
EXC_BAD_ACCESS to panics. However, the Go signal handler refuse to
handle signals that would otherwise be ignored if they arrive at
non-Go threads.

Block all (posix) signals to that thread, making sure that
no unexpected signals arrive to it. At least one test, TestStop in
os/signal, depends on signals not arriving on any non-Go threads.

For #14318

Change-Id: I901467fb53bdadb0d03b0f1a537116c7f4754423
Reviewed-on: https://go-review.googlesource.com/21047
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-03-24 14:12:12 +00:00
Lynn Boger
baec148767 bytes: Equal perf improvements on ppc64le/ppc64
The existing implementation for Equal and similar
functions in the bytes package operate on one byte at
at time.  This performs poorly on ppc64/ppc64le especially
when the byte buffers are large.  This change improves
those functions by loading and comparing double words where
possible.  The common code has been moved to a function
that can be shared by the other functions in this
file which perform the same type of comparison.
Further optimizations are done for the case where
>= 32 bytes are being compared.  The new function
memeqbody is used by memeq_varlen, Equal, and eqstring.

When running the bytes test with -test.bench=Equal

benchmark                     old MB/s     new MB/s     speedup
BenchmarkEqual1               164.83       129.49       0.79x
BenchmarkEqual6               563.51       445.47       0.79x
BenchmarkEqual9               656.15       1099.00      1.67x
BenchmarkEqual15              591.93       1024.30      1.73x
BenchmarkEqual16              613.25       1914.12      3.12x
BenchmarkEqual20              682.37       1687.04      2.47x
BenchmarkEqual32              807.96       3843.29      4.76x
BenchmarkEqual4K              1076.25      23280.51     21.63x
BenchmarkEqual4M              1079.30      13120.14     12.16x
BenchmarkEqual64M             1073.28      10876.92     10.13x

It was determined that the degradation in the smaller byte tests
were due to unfavorable code alignment of the single byte loop.

Fixes #14368

Change-Id: I0dd87382c28887c70f4fbe80877a8ba03c31d7cd
Reviewed-on: https://go-review.googlesource.com/20249
Reviewed-by: Minux Ma <minux@golang.org>
2016-03-23 14:21:15 +00:00
Keith Randall
6a33f7765f runtime: use MOVSB instead of MOVSQ for unaligned moves
MOVSB is quite a bit faster for unaligned moves.
Possibly we should use MOVSB all of the time, but Intel folks
say it might be a bit faster to use MOVSQ on some processors
(but not any I have access to at the moment).

benchmark                              old ns/op     new ns/op     delta
BenchmarkMemmove4096-8                 93.9          93.2          -0.75%
BenchmarkMemmoveUnalignedDst4096-8     256           151           -41.02%
BenchmarkMemmoveUnalignedSrc4096-8     175           90.5          -48.29%

Fixes #14630

Change-Id: I568e6d6590eb3615e6a699fb474020596be665ff
Reviewed-on: https://go-review.googlesource.com/20293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-21 19:10:24 +00:00
Michael Munday
cc17d1ba76 runtime/internal/sys: add s390x support
Change-Id: I928532b406a3457d2c5f75f4de7d46a3f795192e
Reviewed-on: https://go-review.googlesource.com/20939
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-21 08:04:38 +00:00
Keith Randall
377eaa7494 runtime: add space
Missed this in review of 20812

Change-Id: I01e220499dcd58e1a7205e2a577dd9630a8b7174
Reviewed-on: https://go-review.googlesource.com/20819
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-18 21:31:49 +00:00
Keith Randall
15ea61146e runtime: use unaligned loads on ppc64
benchmark                      old ns/op     new ns/op     delta
BenchmarkAlignedLoad-160       8.67          7.42          -14.42%
BenchmarkUnalignedLoad-160     8.63          7.37          -14.60%

Change-Id: Id4609d7b4038c4d2ec332efc4fe6f1adfb61b82b
Reviewed-on: https://go-review.googlesource.com/20812
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-18 19:21:53 +00:00
Marcel van Lohuizen
872ca73cad runtime: don't assume b.N > 0
Change-Id: I2e26717f2563d7633ffd15f4adf63c3d0ee3403f
Reviewed-on: https://go-review.googlesource.com/20856
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-03-18 15:55:02 +00:00
Austin Clements
857d0b48db runtime: document sudog
Change-Id: I85c0bcf02842cc32dbc9bfdcea27efe871173574
Reviewed-on: https://go-review.googlesource.com/20774
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-17 18:57:10 +00:00
Matthew Dempsky
ac74e5debc cmd/compile: stop constructing sudog type
The compiler doesn't care about the runtime's sudog type. Stop
constructing it.

Change-Id: If1885fe30b2e215a08d17662eab5ea6d81fe58ab
Reviewed-on: https://go-review.googlesource.com/20797
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-03-17 18:02:40 +00:00
Austin Clements
f11e4eb5cc runtime: shrink stacks during concurrent mark
Currently we shrink stacks during STW mark termination because it used
to be unsafe to shrink them concurrently. For some programs, this
significantly increases pause time: stack shrinking costs ~5ms/MB
copied plus 2µs/shrink.

Now that we've made it safe to shrink a stack without the world being
stopped, shrink them during the concurrent mark phase.

This reduces the STW time in the program from issue #12967 by an order
of magnitude and brings it from over the 10ms goal to well under:

name           old 95%ile-markTerm-time  new 95%ile-markTerm-time  delta
Stackshrink-4               23.8ms ±60%               1.80ms ±39%  -92.44%  (p=0.008 n=5+5)

Fixes #12967.

This slows down the go1 and garbage benchmarks overall by < 0.5%.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.48ms ± 1%  2.49ms ± 1%  +0.45%  (p=0.005 n=25+21)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.93s ± 2%     2.97s ± 2%  +1.34%  (p=0.002 n=19+20)
Fannkuch11-12                2.51s ± 1%     2.59s ± 0%  +3.09%  (p=0.000 n=18+18)
FmtFprintfEmpty-12          51.1ns ± 2%    51.5ns ± 1%    ~     (p=0.280 n=20+17)
FmtFprintfString-12          175ns ± 1%     169ns ± 1%  -3.01%  (p=0.000 n=20+20)
FmtFprintfInt-12             160ns ± 1%     160ns ± 0%  +0.53%  (p=0.000 n=20+20)
FmtFprintfIntInt-12          265ns ± 0%     266ns ± 1%  +0.59%  (p=0.000 n=20+20)
FmtFprintfPrefixedInt-12     237ns ± 1%     238ns ± 1%  +0.44%  (p=0.000 n=20+20)
FmtFprintfFloat-12           326ns ± 1%     341ns ± 1%  +4.55%  (p=0.000 n=20+19)
FmtManyArgs-12              1.01µs ± 0%    1.02µs ± 0%  +0.43%  (p=0.000 n=20+19)
GobDecode-12                8.41ms ± 1%    8.30ms ± 2%  -1.22%  (p=0.000 n=20+19)
GobEncode-12                6.66ms ± 1%    6.68ms ± 0%  +0.30%  (p=0.000 n=18+19)
Gzip-12                      322ms ± 1%     322ms ± 1%    ~     (p=1.000 n=20+20)
Gunzip-12                   42.8ms ± 0%    42.9ms ± 0%    ~     (p=0.174 n=20+20)
HTTPClientServer-12         69.7µs ± 1%    70.6µs ± 1%  +1.20%  (p=0.000 n=20+20)
JSONEncode-12               16.8ms ± 0%    16.8ms ± 1%    ~     (p=0.154 n=19+19)
JSONDecode-12               65.1ms ± 0%    65.3ms ± 1%  +0.34%  (p=0.003 n=20+20)
Mandelbrot200-12            3.93ms ± 0%    3.92ms ± 0%    ~     (p=0.396 n=19+20)
GoParse-12                  3.66ms ± 1%    3.65ms ± 1%    ~     (p=0.117 n=16+18)
RegexpMatchEasy0_32-12      85.0ns ± 2%    85.5ns ± 2%    ~     (p=0.143 n=20+20)
RegexpMatchEasy0_1K-12       267ns ± 1%     267ns ± 1%    ~     (p=0.867 n=20+17)
RegexpMatchEasy1_32-12      83.3ns ± 2%    83.8ns ± 1%    ~     (p=0.068 n=20+20)
RegexpMatchEasy1_1K-12       432ns ± 1%     432ns ± 1%    ~     (p=0.804 n=20+19)
RegexpMatchMedium_32-12      133ns ± 0%     133ns ± 0%    ~     (p=1.000 n=20+20)
RegexpMatchMedium_1K-12     40.3µs ± 1%    40.4µs ± 1%    ~     (p=0.319 n=20+19)
RegexpMatchHard_32-12       2.10µs ± 1%    2.10µs ± 1%    ~     (p=0.723 n=20+18)
RegexpMatchHard_1K-12       63.0µs ± 0%    63.0µs ± 0%    ~     (p=0.158 n=19+17)
Revcomp-12                   461ms ± 1%     476ms ± 8%  +3.29%  (p=0.002 n=20+20)
Template-12                 80.1ms ± 1%    79.3ms ± 1%  -1.00%  (p=0.000 n=20+20)
TimeParse-12                 360ns ± 0%     360ns ± 0%    ~     (p=0.802 n=18+19)
TimeFormat-12                374ns ± 1%     372ns ± 0%  -0.77%  (p=0.000 n=20+19)
[Geo mean]                  61.8µs         62.0µs       +0.40%

Change-Id: Ib60cd46b7a4987e07670eb271d22f6cee5802842
Reviewed-on: https://go-review.googlesource.com/20044
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-16 20:13:25 +00:00
Austin Clements
c14d25c648 runtime: generalize work.finalizersDone to work.markrootDone
We're about to add another root marking job that needs to happen only
during the first markroot pass (whether that's concurrent or STW),
just like finalizer scanning. Rather than introducing another flag
that has the same value as finalizersDone, just rename finalizersDone
to markrootDone.

Change-Id: I535356c6ea1f3734cb5b6add264cb7bf48de95e8
Reviewed-on: https://go-review.googlesource.com/20043
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:13:22 +00:00
Austin Clements
276b177771 runtime: make shrinkstack concurrent-safe
Currently shinkstack is only safe during STW because it adjusts
channel-related stack pointers and moves send/receive stack slots
without synchronizing with the channel code. Make it safe to use when
the world isn't stopped by:

1) Locking all channels the G is blocked on while adjusting the sudogs
   and copying the area of the stack that may contain send/receive
   slots.

2) For any stack frames that may contain send/receive slot, using an
   atomic CAS to adjust pointers to prevent races between adjusting a
   pointer in a receive slot and a concurrent send writing to that
   receive slot.

In principle, the synchronization could be finer-grained. For example,
we considered synchronizing around the sudogs, which would allow
channel operations involving other Gs to continue if the G being
shrunk was far enough down the send/receive queue. However, using the
channel lock means no additional locks are necessary in the channel
code. Furthermore, the stack shrinking code holds the channel lock for
a very short time (much less than the time required to shrink the
stack).

This does not yet make stack shrinking concurrent; it merely makes
doing so safe.

This has negligible effect on the go1 and garbage benchmarks.

For #12967.

Change-Id: Ia49df3a8a7be4b36e365aac4155a2416b94b988c
Reviewed-on: https://go-review.googlesource.com/20042
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2016-03-16 20:13:20 +00:00
Austin Clements
d45bf7228f runtime: define lock order between G status and channel lock
Currently, locking a G's stack by setting its status to _Gcopystack or
_Gscan is unordered with respect to channel locks. However, when we
make stack shrinking concurrent, stack shrinking will need to lock the
G and then acquire channel locks, which imposes an order on these.

Document this lock ordering and fix closechan to respect it.
Everything else already happens to respect it.

For #12967.

Change-Id: I4dd02675efffb3e7daa5285cf75bf24f987d90d4
Reviewed-on: https://go-review.googlesource.com/20041
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:13:17 +00:00
Austin Clements
db72b41bcd runtime: protect sudog.elem with hchan.lock
Currently sudog.elem is never accessed concurrently, so in several
cases we drop the channel lock just before reading/writing the
sent/received value from/to sudog.elem. However, concurrent stack
shrinking is going to have to adjust sudog.elem to point to the new
stack, which means it needs a way to synchronize with accesses to
sudog.elem. Hence, add sudog.elem to the fields protected by
hchan.lock and scoot the unlocks down past the uses of sudog.elem.

While we're here, better document the channel synchronization rules.

For #12967.

Change-Id: I3ad0ca71f0a74b0716c261aef21b2f7f13f74917
Reviewed-on: https://go-review.googlesource.com/20040
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:13:15 +00:00
Austin Clements
3c2a21ff13 runtime: fix transient _Gwaiting states in newstack
With concurrent stack shrinking, the stack can move the instant after
a G enters _Gwaiting. There are only two places that put a G into
_Gwaiting: gopark and newstack. We fixed uses of gopark. This commit
fixes newstack by simplifying its G transitions and, in particular,
eliminating or narrowing the transient _Gwaiting states it passes
through so it's clear nothing in the G is accessed while in _Gwaiting.

For #12967.

Change-Id: I2440ead411d2bc61beb1e2ab020ebe3cb3481af9
Reviewed-on: https://go-review.googlesource.com/20039
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:13:12 +00:00
Austin Clements
8fb182d020 runtime: never pass stack pointers to gopark
gopark calls the unlock function after setting the G to _Gwaiting.
This means it's generally unsafe to access the G's stack from the
unlock function because the G may start running on another P. Once we
start shrinking stacks concurrently, a stack shrink could also move
the stack the moment after it enters _Gwaiting and before the unlock
function is called.

Document this restriction and fix the two places where we currently
violate it.

This is unlikely to be a problem in practice for these two places
right now, but they're already skating on thin ice. For example, the
following sequence could in principle cause corruption, deadlock, or a
panic in the select code:

On M1/P1:
1. G1 selects on channels A and B.
2. selectgoImpl calls gopark.
3. gopark puts G1 in _Gwaiting.
4. gopark calls selparkcommit.
5. selparkcommit releases the lock on channel A.

On M2/P2:
6. G2 sends to channel A.
7. The send puts G1 in _Grunnable and puts it on P2's run queue.
8. The scheduler runs, selects G1, puts it in _Grunning, and resumes G1.
9. On G1, the sellock immediately following the gopark gets called.
10. sellock grows and moves the stack.

On M1/P1:
11. selparkcommit continues to scan the lock order for the next
channel to unlock, but it's now reading from a freed (and possibly
reused) stack.

This shouldn't happen in practice because step 10 isn't the first call
to sellock, so the stack should already be big enough. However, once
we start shrinking stacks concurrently, this reasoning won't work any
more.

For #12967.

Change-Id: I3660c5be37e5be9f87433cb8141bdfdf37fadc4c
Reviewed-on: https://go-review.googlesource.com/20038
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:13:10 +00:00
Austin Clements
005140a77e runtime: put g.waiting list in lock order
Currently the g.waiting list created by a select is in poll order.
However, nothing depends on this, and we're going to need access to
the channel lock order in other places shortly, so modify select to
put the waiting list in channel lock order.

For #12967.

Change-Id: If0d38816216ecbb37a36624d9b25dd96e0a775ec
Reviewed-on: https://go-review.googlesource.com/20037
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2016-03-16 20:13:07 +00:00
Austin Clements
26594c3dfd runtime: use indexes for select lock order
Currently the select lock order is a []*hchan. We're going to need to
refer to things other than the channel itself in lock order shortly,
so switch this to a []uint16 of indexes into the select cases. This
parallels the existing representation for the poll order.

Change-Id: I89262223fe20b4ddf5321592655ba9eac489cda1
Reviewed-on: https://go-review.googlesource.com/20036
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:13:05 +00:00
Austin Clements
e4a95b6343 runtime: record channel in sudog
Given a G, there's currently no way to find the channel it's blocking
on. We'll need this information to fix a (probably theoretical) bug in
select and to implement concurrent stack shrinking, so record the
channel in the sudog.

For #12967.

Change-Id: If8fb63a140f1d07175818824d08c0ebeec2bdf66
Reviewed-on: https://go-review.googlesource.com/20035
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:13:02 +00:00
Austin Clements
d7cedc4b74 runtime: perform gcMarkRootCheck during STW in checkmark mode
gcMarkRootCheck is too expensive to do during mark termination.
However, since it's a useful check and it complements checkmark mode
nicely, enable it during mark termination is checkmark is enabled.

Change-Id: Icd9039e85e6e9d22747454441b50f1cdd1412202
Reviewed-on: https://go-review.googlesource.com/20663
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 20:12:59 +00:00
Alan Donovan
ed73efbb74 runtime/debug: clarify WriteHeapDump STW behavior
Change-Id: I049d2596fe8ce0e93391599f5c224779fd8e316f
Reviewed-on: https://go-review.googlesource.com/20761
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-16 17:02:50 +00:00
Michael Matloob
7e3344f74e runtime: update link to WriteHeapDump format
The new link is https://golang.org/s/go15heapdump.

Change-Id: Ifcaf8572bfe815ffaa78442a1991f6e20e990a50
Reviewed-on: https://go-review.googlesource.com/20740
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-15 22:09:43 +00:00
Wedson Almeida Filho
8e7072ca83 sync: new Cond implementation
Change Cond implementation to use a notification list such that waiters
can first register for a notification, release the lock, then actually
wait. Signalers never have to park anymore.

This is intended to address an issue in the previous implementation
where Broadcast could fail to signal all waiters.

Results of the existing benchmark are below.

                                          Original          New  Diff
BenchmarkCond1-48        2000000               745 ns/op    755 +1.3%
BenchmarkCond2-48        1000000              1545 ns/op   1532 -0.8%
BenchmarkCond4-48         300000              3833 ns/op   3896 +1.6%
BenchmarkCond8-48         200000             10049 ns/op  10257 +2.1%
BenchmarkCond16-48        100000             21123 ns/op  21236 +0.5%
BenchmarkCond32-48         30000             40393 ns/op  41097 +1.7%

Fixes #14064

Change-Id: I083466d61593a791a034df61f5305adfb8f1c7f9
Reviewed-on: https://go-review.googlesource.com/18892
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Caleb Spare <cespare@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-15 22:01:20 +00:00
David Crawshaw
f2772a4935 cmd/compile: compute second method type at runtime
The type information for a method includes two variants: a func
without the receiver, and a func with the receiver as the first
parameter. The former is used as part of the dynamic interface
checks, but the latter is only returned as a type in the
reflect.Method struct.

Instead of computing it at compile time, construct it at run time
with reflect.FuncOf.

Using cl/20701 as a baseline,

	cmd/go: -480KB, (4.4%)
	jujud:  -5.6MB, (7.8%)

For #6853.

Change-Id: I1b8c73f3ab894735f53d00cb9c0b506d84d54e92
Reviewed-on: https://go-review.googlesource.com/20709
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-15 19:57:40 +00:00
Elias Naur
ea4b785ae0 runtime: preserve darwin/arm{,64} callee-save registers
CL 14603 attempted to preserve the callee-save registers for
the darwin/arm runtime initialization routine, but I believe it
wasn't sufficient and resulted in the crash reported in issue

Saving and restoring the registers on the stack the same way
linux/arm does seems more obvious and fixes #14778, so do that.

Even though #14778 is not reproducible on darwin/arm64, I applied
a similar change there, and to linux/arm64 which obeys the same
calling convention.

Finally, this CL is a candidate for a 1.6 minor release for the same
reason CL 14603 was in a 1.5 minor release (as CL 16968). It is
small and only touches the iOS platforms and gomobile on darwin/arm
is currently useless without it.

Fixes #14778
Fixes #12590 (again)

Change-Id: I7401daf0bbd7c579a7e84761384a7b763651752a
Reviewed-on: https://go-review.googlesource.com/20621
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-15 08:43:34 +00:00
Austin Clements
b1a6e07919 runtime: document the G states
In particular, write down the rules for stack ownership because the
details of this are about to get very important with concurrent stack
shrinking. (Interestingly, the details don't actually change, but
anything that's currently skating on thin ice is likely to fall
through.)

Fox #12967.

Change-Id: I561e2610e864295e9faba07717a934aabefcaab9
Reviewed-on: https://go-review.googlesource.com/20034
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-03-14 16:30:16 +00:00
Austin Clements
da153354b2 runtime: copy stack before adjusting
Currently copystack adjusts pointers in the old stack and then copies
the adjusted stack to the new stack. In addition to being generally
confusing, this is going to make concurrent stack shrinking harder.

Switch this around so that we first copy the stack and then adjust
pointers on the new stack (never writing to the old stack).

This reprises CL 15996, but takes a different and simpler approach. CL
15996 still walked the old stack while adjusting pointers on the new
stack. In this CL, we adjust auxiliary structures before walking the
stack, so we can just walk the new stack.

For #12967.

Change-Id: I94fa86f823ba9ee478e73b2ba509eed3361c43df
Reviewed-on: https://go-review.googlesource.com/20033
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-03-14 16:29:47 +00:00
Austin Clements
5a50e00306 runtime: improve comment on selectgo
In particular, document that *sel is on the stack no matter what.

Change-Id: I1c264215e026c27721b13eedae73ec845066cdec
Reviewed-on: https://go-review.googlesource.com/20032
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-03-14 16:29:21 +00:00
Richard Miller
8a2d6e9f6f runtime: fix a typo in asssembly macro GO_RESULTS_INITIALIZED
Fixes #14772

Change-Id: I32f2b6b74de28be406b1306364bc07620a453962
Reviewed-on: https://go-review.googlesource.com/20680
Reviewed-by: David du Colombier <0intro@gmail.com>
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-14 14:53:29 +00:00
Martin Möhrmann
43ed65f869 runtime: speed up growslice by avoiding divisions
Only compute the number of maximum allowed elements per slice once.
Special case newcap computation for slices with byte sized elements.

name              old time/op  new time/op  delta
GrowSliceBytes-2  61.1ns ± 1%  43.4ns ± 1%  -29.00%  (p=0.000 n=20+20)
GrowSliceInts-2   85.9ns ± 1%  75.7ns ± 1%  -11.80%  (p=0.000 n=20+20)

Change-Id: I5d9c0d5987cdd108ac29dc32e31912dcefa2324d
Reviewed-on: https://go-review.googlesource.com/20653
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-14 00:45:57 +00:00
Matthew Dempsky
d1341d6cf3 cmd/compile, runtime: eliminate growslice_n
Fixes #11419.

Change-Id: I7935a253e3e96191a33f5041bab203ecc5f0c976
Reviewed-on: https://go-review.googlesource.com/20647
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-13 23:34:31 +00:00
Emmanuel Odeke
6dfcc336c5 runtime: move testSchedLocalQueue* to export_test
Move functions testSchedLocalQueueLocal and testSchedLocalQueueSteal
from proc.go to export_test.go, the only site that they are used.

Fixes #14796

Change-Id: I16b6fa4a13835eab33f66a2c2e87a5f5c79b7bd3
Reviewed-on: https://go-review.googlesource.com/20640
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-13 00:34:58 +00:00
David Crawshaw
cc158403d6 cmd/compile: track reflect.Type.Method in deadcode
In addition to reflect.Value.Call, exported methods can be invoked
by the Func value in the reflect.Method struct. This CL has the
compiler track what functions get access to a legitimate reflect.Method
struct by looking for interface calls to either of:

	Method(int) reflect.Method
	MethodByName(string) (reflect.Method, bool)

This is a little overly conservative. If a user implements a type
with one of these methods without using the underlying calls on
reflect.Type, the linker will assume the worst and include all
exported methods. But it's cheap.

No change to any of the binary sizes reported in cl/20483.

For #14740

Change-Id: Ie17786395d0453ce0384d8b240ecb043b7726137
Reviewed-on: https://go-review.googlesource.com/20489
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-03-11 21:19:20 +00:00
Ian Lance Taylor
b354f91417 runtime: limit TestCgoCCodeSIGPROF test to 1 second
Still fails about 20% of the time on my laptop.

Fixes #14766.

Change-Id: I169ab728c6022dceeb91188f5ad466ed6413c062
Reviewed-on: https://go-review.googlesource.com/20590
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-11 06:43:01 +00:00
Ian Lance Taylor
1d809c5c14 runtime: fix names in SetFinalizer doc comment
Fixes #14554.

Change-Id: I37ab4e4dc1aee84ac448d437314f8eecbbc02994
Reviewed-on: https://go-review.googlesource.com/20021
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-10 18:20:44 +00:00
Richard Miller
6b59d61822 runtime: Plan 9 - prevent preemption by GC while exiting
On Plan 9, there's no "kill all threads" system call, so exit is done
by sending a "go: exit" note to each OS process.  If concurrent GC
occurs during this loop, deadlock sometimes results.  Prevent this by
incrementing m.locks before sending notes.

Change-Id: I31aa15134ff6e42d9a82f9f8a308620b3ad1b1b1
Reviewed-on: https://go-review.googlesource.com/20477
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-09 16:48:00 +00:00
David Crawshaw
8df733bd22 cmd/compile: remove slices from rtype.funcType
Alternative to golang.org/cl/19852. This memory layout doesn't have
an easy type representation, but it is noticeably smaller than the
current funcType, and saves significant extra space.

Some notes on the layout are in reflect/type.go:

// A *rtype for each in and out parameter is stored in an array that
// directly follows the funcType (and possibly its uncommonType). So
// a function type with one method, one input, and one output is:
//
//	struct {
//		funcType
//		uncommonType
//		[2]*rtype    // [0] is in, [1] is out
//		uncommonTypeSliceContents
//	}

There are three arbitrary limits introduced by this CL:

1. No more than 65535 function input parameters.
2. No more than 32767 function output parameters.
3. reflect.FuncOf is limited to 128 parameters.

I don't think these are limits in practice, but are worth noting.

Reduces godoc binary size by 2.4%, 330KB.

For #6853.

Change-Id: I225c0a0516ebdbe92d41dfdf43f716da42dfe347
Reviewed-on: https://go-review.googlesource.com/19916
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-09 01:25:18 +00:00
David Crawshaw
a24b3ed753 cmd/compile: remove rtype *uncommonType field
Instead of a pointer on every rtype, use a bit flag to indicate that
the contents of uncommonType directly follows the rtype value when it
is needed.

This requires a bit of juggling in the compiler's rtype encoder. The
backing arrays for fields in the rtype are presently encoded directly
after the slice header. This packing requires separating the encoding
of the uncommonType slice headers from their backing arrays.

Reduces binary size of godoc by ~180KB (1.5%).
No measurable change in all.bash time.
For #6853.

Change-Id: I60205948ceb5c0abba76fdf619652da9c465a597
Reviewed-on: https://go-review.googlesource.com/19790
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-08 23:23:13 +00:00
Russ Cox
0c7ccbf601 cmd/go: ignore C files when CGO_ENABLED=0
Before, those C files might have been intended for the Plan 9 C compiler,
but that option was removed in Go 1.5. We can simplify the maintenance
of cgo packages now if we assume C files (and C++ and M and SWIG files)
should only be considered when cgo is enabled.

Also remove newly unnecessary build tags in runtime/cgo's C files.

Fixes #14123

Change-Id: Ia5a7fe62b9469965aa7c3547fe43c6c9292b8205
Reviewed-on: https://go-review.googlesource.com/19613
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-08 20:10:52 +00:00
Burcu Dogan
6df8038768 runtime: listen 127.0.0.1 instead of localhost on android
Fixes #14486.
Related to #14485.

Change-Id: I2dd77b0337aebfe885ae828483deeaacb500b12a
Reviewed-on: https://go-review.googlesource.com/20340
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-08 00:22:38 +00:00
Austin Clements
2b19b6e3f1 runtime: fix checkmark scanning of finalizers
Currently work.finalizersDone is reset only at the beginning of
gcStart. As a result, it will be set when checkmark runs, so checkmark
will skip scanning finalizers. Hence, if there are any bugs that cause
the regular scan of finalizers to miss pointers, checkmark will also
miss them and fail to detect the missed pointer.

Fix this by resetting finalizersDone in gcResetMarkState. This way it
gets reset before any full mark, which is exactly what we want.

Change-Id: I4ddb5eba5b3b97e52aaf3e08fd9aa692bda32b20
Reviewed-on: https://go-review.googlesource.com/20332
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-07 22:32:20 +00:00
Matthew Dempsky
a03bdc3e6b runtime: eliminate unnecessary type conversions
Automated refactoring produced using github.com/mdempsky/unconvert.

Change-Id: Iacf871a4f221ef17f48999a464ab2858b2bbaa90
Reviewed-on: https://go-review.googlesource.com/20071
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-07 20:53:27 +00:00
Richard Miller
0de0cafb9f runtime: new files for plan9_arm support
Implementation more or less follows plan9_386 version.
Revised 7 March to correct a bug in runtime.seek and
tidy whitespace for 8-column tabs.

Change-Id: I2e921558b5816502e8aafe330530c5a48a6c7537
Reviewed-on: https://go-review.googlesource.com/18966
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-07 16:25:48 +00:00
Richard Miller
d145456b16 runtime: signal handling support for plan9_arm
Plan 9 trap/signal handling differs on ARM from other architectures
because ARM has a link register.  Also trap message syntax varies
between different architectures (historical accident?).
Revised 7 March to clarify a comment.

Change-Id: Ib6485f82857a2f9a0d6b2c375cf0aaa230b83656
Reviewed-on: https://go-review.googlesource.com/18969
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-07 16:25:17 +00:00
Austin Clements
ff71ed86b6 runtime: merge {bgMark,assist}StartTime
We used to start background mark workers and assists at different
times, so we needed to keep track of these separately. They're now set
to exactly the same time, so clean things up by merging them in to one
value, markStartTime.

Change-Id: I17c9843c3ed2d6f07b4c8cd0b2c438fc6de23b53
Reviewed-on: https://go-review.googlesource.com/20143
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-03-07 00:22:42 +00:00
Hitoshi Mitake
8c838192b8 runtime: don't print EnableGC flag in WriteHeapProfile()
Current runtime.WriteHeapProfile() doesn't print correct
EnableGC. Even if GOGC=off, the result file has below line:
 # EnableGC = true

It is hard to print correct status of the variable because of corner
cases e.g. initialization. For avoiding confusion, this commit removes
the print.

Change-Id: Ia792454a6c650bdc50a06fbaff4df7b6330ae08a
Reviewed-on: https://go-review.googlesource.com/18600
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-03-05 08:14:23 +00:00
Austin Clements
b5481dd0a6 runtime: disable gcMarkRootCheck debugging check during STW
gcMarkRootCheck takes ~10ns per goroutine. This is just a debugging
check, so disable it (plus, if something is going to go wrong, it's
more likely to go wrong during concurrent mark).

We may be able to re-enable this later, or move it to after we've
started the world again. (But not for 1.6.x.)

For 1.6.x.

Fixes #14419.

name / 95%ile-time/markTerm          old          new  delta
500kIdleGs-12                24.0ms ± 0%  18.9ms ± 6%  -21.46%  (p=0.000 n=15+20)

Change-Id: Idb2a2b1771449de772c159ef95920d6df1090666
Reviewed-on: https://go-review.googlesource.com/20148
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-04 21:12:06 +00:00
Austin Clements
9ab9053344 runtime: reset mark state before stopping the world
Currently we reset the mark state during STW sweep termination. This
involves looping over all of the goroutines. Each iteration of this
loop takes ~25ns, so at around 400k goroutines, we'll exceed our 10ms
pause goal.

However, it's safe to do this before we stop the world for sweep
termination because nothing is consuming this state yet. Hence, move
the reset to just before STW.

This isn't perfect: a long reset can still delay allocating goroutines
that block on GC starting. But it's certainly better to block some
things eventually than to block everything immediately.

For 1.6.x.

Fixes #14420.

name \ 95%ile-time/sweepTerm           old          new  delta
500kIdleGs-12                 11312µs ± 6%  18.9µs ± 6%  -99.83%  (p=0.000 n=16+20)

Change-Id: I9815c4d8d9b0d3c3e94dfdab78049cefe0dcc93c
Reviewed-on: https://go-review.googlesource.com/20147
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-04 21:12:03 +00:00
Ian Lance Taylor
1716162a9a runtime: fix off-by-one error finding module for PC
Also fix compiler-invoked panics to avoid a confusing "malloc deadlock"
crash if they are invoked while executing the runtime.

Fixes #14599.

Change-Id: I89436abcbf3587901909abbdca1973301654a76e
Reviewed-on: https://go-review.googlesource.com/20219
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-04 21:06:31 +00:00
David Crawshaw
69285a8b46 reflect: recognize unnamed directional channels
go test github.com/onsi/gomega/gbytes now passes at tip, and tests
added to the reflect package.

Fixes #14645

Change-Id: I16216c1a86211a1103d913237fe6bca5000cf885
Reviewed-on: https://go-review.googlesource.com/20221
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-04 20:34:30 +00:00
Mikio Hara
b0f4ee533a net: deduplicate TCP socket code
This change consolidates functions and methods related to TCPAddr,
TCPConn and TCPListener for maintenance purpose, especially for
documentation. Also refactors Dial error code paths.

The followup changes will update comments and examples.

Updates #10624.

Change-Id: I3333ee218ebcd08928f9e2826cd1984d15ea153e
Reviewed-on: https://go-review.googlesource.com/20009
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-03 04:23:59 +00:00
Russ Cox
6969d9bf03 runtime/pprof: sort counted profiles by count
This is especially helpful in programs with very large numbers of goroutines:
the bulk of the goroutines will show up at the top.

Before:
	1 @ 0x86ab8 0x86893 0x82164 0x8e7ce 0x7b798 0x5b871
	#	0x86ab8	runtime/pprof.writeRuntimeProfile+0xb8		/Users/rsc/go/src/runtime/pprof/pprof.go:545
	#	0x86893	runtime/pprof.writeGoroutine+0x93		/Users/rsc/go/src/runtime/pprof/pprof.go:507
	#	0x82164	runtime/pprof.(*Profile).WriteTo+0xd4		/Users/rsc/go/src/runtime/pprof/pprof.go:236
	#	0x8e7ce	runtime/pprof_test.TestGoroutineCounts+0x15e	/Users/rsc/go/src/runtime/pprof/pprof_test.go:603
	#	0x7b798	testing.tRunner+0x98				/Users/rsc/go/src/testing/testing.go:473

	1 @ 0x2d373 0x2d434 0x560f 0x516b 0x7cd42 0x7b861 0x2297 0x2cf90 0x5b871
	#	0x7cd42	testing.RunTests+0x8d2	/Users/rsc/go/src/testing/testing.go:583
	#	0x7b861	testing.(*M).Run+0x81	/Users/rsc/go/src/testing/testing.go:515
	#	0x2297	main.main+0x117		runtime/pprof/_test/_testmain.go:72
	#	0x2cf90	runtime.main+0x2b0	/Users/rsc/go/src/runtime/proc.go:188

	10 @ 0x2d373 0x2d434 0x560f 0x516b 0x8e5b6 0x5b871
	#	0x8e5b6	runtime/pprof_test.func1+0x36	/Users/rsc/go/src/runtime/pprof/pprof_test.go:582

	50 @ 0x2d373 0x2d434 0x560f 0x516b 0x8e656 0x5b871
	#	0x8e656	runtime/pprof_test.func3+0x36	/Users/rsc/go/src/runtime/pprof/pprof_test.go:584

	40 @ 0x2d373 0x2d434 0x560f 0x516b 0x8e606 0x5b871
	#	0x8e606	runtime/pprof_test.func2+0x36	/Users/rsc/go/src/runtime/pprof/pprof_test.go:583

After:

	50 @ 0x2d373 0x2d434 0x560f 0x516b 0x8ecc6 0x5b871
	#	0x8ecc6	runtime/pprof_test.func3+0x36	/Users/rsc/go/src/runtime/pprof/pprof_test.go:584

	40 @ 0x2d373 0x2d434 0x560f 0x516b 0x8ec76 0x5b871
	#	0x8ec76	runtime/pprof_test.func2+0x36	/Users/rsc/go/src/runtime/pprof/pprof_test.go:583

	10 @ 0x2d373 0x2d434 0x560f 0x516b 0x8ec26 0x5b871
	#	0x8ec26	runtime/pprof_test.func1+0x36	/Users/rsc/go/src/runtime/pprof/pprof_test.go:582

	1 @ 0x2d373 0x2d434 0x560f 0x516b 0x7cd42 0x7b861 0x2297 0x2cf90 0x5b871
	#	0x7cd42	testing.RunTests+0x8d2	/Users/rsc/go/src/testing/testing.go:583
	#	0x7b861	testing.(*M).Run+0x81	/Users/rsc/go/src/testing/testing.go:515
	#	0x2297	main.main+0x117		runtime/pprof/_test/_testmain.go:72
	#	0x2cf90	runtime.main+0x2b0	/Users/rsc/go/src/runtime/proc.go:188

	1 @ 0x87128 0x86f03 0x82164 0x8ee30 0x7b798 0x5b871
	#	0x87128	runtime/pprof.writeRuntimeProfile+0xb8		/Users/rsc/go/src/runtime/pprof/pprof.go:566
	#	0x86f03	runtime/pprof.writeGoroutine+0x93		/Users/rsc/go/src/runtime/pprof/pprof.go:528
	#	0x82164	runtime/pprof.(*Profile).WriteTo+0xd4		/Users/rsc/go/src/runtime/pprof/pprof.go:236
	#	0x8ee30	runtime/pprof_test.TestGoroutineCounts+0x150	/Users/rsc/go/src/runtime/pprof/pprof_test.go:603
	#	0x7b798	testing.tRunner+0x98				/Users/rsc/go/src/testing/testing.go:473

Change-Id: I43de9eee2d96f9c46f7b0fbe099a0571164324f5
Reviewed-on: https://go-review.googlesource.com/20107
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-02 20:04:29 +00:00
Brad Fitzpatrick
5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
Brad Fitzpatrick
519474451a all: make copyright headers consistent with one space after period
This is a subset of https://golang.org/cl/20022 with only the copyright
header lines, so the next CL will be smaller and more reviewable.

Go policy has been single space after periods in comments for some time.

The copyright header template at:

    https://golang.org/doc/contribute.html#copyright

also uses a single space.

Make them all consistent.

Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0
Reviewed-on: https://go-review.googlesource.com/20111
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-01 23:34:33 +00:00
Keith Randall
9d854fd44a Merge branch 'dev.ssa' into mergebranch
Merge dev.ssa branch back into master.

Change-Id: Ie6fac3f8d355ab164f934415fe4fc7fcb8c3db16
2016-03-01 12:50:17 -08:00
Gerrit Code Review
acdb0da47d Merge "[dev.ssa] Merge remote-tracking branch 'origin/master' into ssamerge" into dev.ssa 2016-02-29 23:19:17 +00:00
Keith Randall
4fffd4569d [dev.ssa] Merge remote-tracking branch 'origin/master' into ssamerge
(Last?) Semi-regular merge from tip to dev.ssa.

Conflicts:
	src/cmd/compile/internal/gc/closure.go
	src/cmd/compile/internal/gc/gsubr.go
	src/cmd/compile/internal/gc/lex.go
	src/cmd/compile/internal/gc/pgen.go
	src/cmd/compile/internal/gc/syntax.go
	src/cmd/compile/internal/gc/walk.go
	src/cmd/internal/obj/pass.go

Change-Id: Ib5ea8bf74d420f4902a9c6208761be9f22371ae7
2016-02-29 13:32:20 -08:00
David Chase
8107b0012f [dev.ssa] cmd/compile: use 32-bit load to read writebarrier
Avoid targeting a partial register with load;
ensure source of load (writebarrier) is aligned.

Better yet would be "CMPB $1,writebarrier" but that requires
wrestling with flagalloc (mem operand complicates moving
instruction around).

Didn't see a change in time for
   benchcmd -n 10 Build go build net/http

Verified that we clean the code up properly:
   0x20a8 <main.main+104>:	mov    0xc30a2(%rip),%eax
                            # 0xc5150 <runtime.writeBarrier>
   0x20ae <main.main+110>:	test   %al,%al

Change-Id: Id5fb8c260eaec27bd727cb0ae1476c60343b0986
Reviewed-on: https://go-review.googlesource.com/19998
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-28 22:29:23 +00:00
Austin Clements
d62d831882 runtime: clean up adjustpointer and eliminate write barrier
Commit a5c3bbe modified adjustpointers to use *uintptrs instead of
*unsafe.Pointers for manipulating stack pointers for clarity and to
eliminate the unnecessary write barrier when writing the updated stack
pointer.

This commit makes the equivalent change to adjustpointer.

Change-Id: I6dc309590b298bdd86ecdc9737db848d6786c3f7
Reviewed-on: https://go-review.googlesource.com/17148
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-28 04:19:01 +00:00
Ian Lance Taylor
8d94b9b820 runtime: more deflaking of TestCgoCheckBytes
Fixes #14519.

Change-Id: I8f78f67a463e6467e09df90446f7ebd28789d6c9
Reviewed-on: https://go-review.googlesource.com/19933
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2016-02-26 19:20:47 +00:00
Dmitry Vyukov
bdc14698f8 runtime: unwire g/m in dropg always
Currently dropg does not unwire locked g/m.
This is unnecessary distiction between locked and non-locked g/m.
We always restart goroutines with execute which re-wires g/m.

First, this produces false sense that this distinction is necessary.
Second, it can confuse some sanity and cross checks. For example,
if we check that g/m are unwired before we wire them in execute,
the check will fail for locked g/m. I've hit this while doing some
race detector changes, When we deschedule a goroutine and run
scheduler code, m.curg is generally nil, but not for locked ms.

Remove the distinction.

Change-Id: I3b87a28ff343baa1d564aab1f821b582a84dee07
Reviewed-on: https://go-review.googlesource.com/19950
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-26 15:45:45 +00:00
Austin Clements
3b3d58e119 runtime: remove workbuf logging
Early in Go 1.5 we had bugs with ownership of workbufs, so we added a
system for tracing their ownership to help debug these issues.
However, this system has both CPU and space overhead even when
disabled, it clutters up the workbuf API, the higher level gcWork
abstraction makes it very difficult to mess up the ownership of
workbufs in practice, and the tracing hasn't been enabled or needed
since 5b66e5d nine months ago. Hence, remove it.

Benchmarks show the usual noise from changes at this level, but little
overall movement.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.48ms ± 1%  2.47ms ± 0%  -0.68%  (p=0.000 n=21+21)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.98s ± 7%     2.98s ± 6%    ~     (p=0.799 n=20+20)
Fannkuch11-12                2.61s ± 3%     2.55s ± 5%  -2.55%  (p=0.003 n=20+20)
FmtFprintfEmpty-12          52.8ns ± 6%    53.6ns ± 6%    ~     (p=0.228 n=20+20)
FmtFprintfString-12          177ns ± 4%     177ns ± 4%    ~     (p=0.280 n=20+20)
FmtFprintfInt-12             162ns ± 5%     162ns ± 3%    ~     (p=0.347 n=20+20)
FmtFprintfIntInt-12          277ns ± 7%     273ns ± 4%  -1.62%  (p=0.005 n=20+20)
FmtFprintfPrefixedInt-12     237ns ± 4%     242ns ± 4%  +2.13%  (p=0.005 n=20+20)
FmtFprintfFloat-12           315ns ± 4%     312ns ± 4%  -0.97%  (p=0.001 n=20+20)
FmtManyArgs-12              1.11µs ± 3%    1.15µs ± 4%  +3.41%  (p=0.004 n=20+20)
GobDecode-12                8.50ms ± 7%    8.53ms ± 7%    ~     (p=0.429 n=20+20)
GobEncode-12                6.86ms ± 9%    6.93ms ± 7%  +0.93%  (p=0.030 n=20+20)
Gzip-12                      326ms ± 4%     329ms ± 4%  +0.98%  (p=0.020 n=20+20)
Gunzip-12                   43.3ms ± 3%    43.8ms ± 9%  +1.25%  (p=0.003 n=20+20)
HTTPClientServer-12         72.0µs ± 3%    71.5µs ± 3%    ~     (p=0.053 n=20+20)
JSONEncode-12               17.0ms ± 6%    17.3ms ± 7%  +1.32%  (p=0.006 n=20+20)
JSONDecode-12               64.2ms ± 4%    63.5ms ± 3%  -1.05%  (p=0.005 n=20+20)
Mandelbrot200-12            4.00ms ± 3%    3.99ms ± 3%    ~     (p=0.121 n=20+20)
GoParse-12                  3.74ms ± 5%    3.75ms ± 9%    ~     (p=0.383 n=20+20)
RegexpMatchEasy0_32-12       104ns ± 4%     104ns ± 6%    ~     (p=0.392 n=20+20)
RegexpMatchEasy0_1K-12       358ns ± 3%     361ns ± 4%  +0.95%  (p=0.003 n=20+20)
RegexpMatchEasy1_32-12      86.3ns ± 5%    86.1ns ± 6%    ~     (p=0.614 n=20+20)
RegexpMatchEasy1_1K-12       523ns ± 4%     518ns ± 3%  -1.14%  (p=0.008 n=20+20)
RegexpMatchMedium_32-12      137ns ± 3%     134ns ± 4%  -1.90%  (p=0.005 n=20+20)
RegexpMatchMedium_1K-12     41.0µs ± 3%    40.6µs ± 4%  -1.11%  (p=0.004 n=20+20)
RegexpMatchHard_32-12       2.13µs ± 4%    2.11µs ± 5%  -1.31%  (p=0.014 n=20+20)
RegexpMatchHard_1K-12       64.1µs ± 3%    63.2µs ± 5%  -1.38%  (p=0.005 n=20+20)
Revcomp-12                   555ms ±10%     548ms ± 7%  -1.17%  (p=0.011 n=20+20)
Template-12                 84.2ms ± 5%    88.2ms ± 4%  +4.73%  (p=0.000 n=20+20)
TimeParse-12                 365ns ± 4%     371ns ± 5%  +1.77%  (p=0.002 n=20+20)
TimeFormat-12                361ns ± 4%     365ns ± 3%  +1.08%  (p=0.002 n=20+20)
[Geo mean]                  64.7µs         64.8µs       +0.19%

Change-Id: Ib043a7a0d18b588b298873d60913d44cd19f3b44
Reviewed-on: https://go-review.googlesource.com/19887
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-02-26 15:14:32 +00:00
David Crawshaw
0231f5420f cmd/compile: remove uncommonType.name
Reduces binary size of cmd/go by 0.5%.
For #6853.

Change-Id: I5a4b814049580ab5098ad252d979f80b70d8a5f9
Reviewed-on: https://go-review.googlesource.com/19694
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-26 12:02:39 +00:00
Keith Randall
687abca1ea runtime: avoid using REP prefix for IndexByte
REP-prefixed instructions have a large startup cost.
Avoid them like the plague.

benchmark                  old ns/op     new ns/op     delta
BenchmarkIndexByte10-8     22.4          5.34          -76.16%

Fixes #13983

Change-Id: I857e956e240fc9681d053f2584ccf24c1b272bb3
Reviewed-on: https://go-review.googlesource.com/18703
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-26 01:09:53 +00:00
Austin Clements
cbe849fc38 runtime: eliminate unused _Genqueue state
_Genqueue and _Gscanenqueue were introduced as part of the GC quiesce
code. The quiesce code was removed by 197aa9e, but these states and
some associated code stuck around. Remove them.

Change-Id: I69df81881602d4a431556513dac2959668d27c20
Reviewed-on: https://go-review.googlesource.com/19638
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:32 +00:00
Austin Clements
4eb33f6b8d runtime: eliminate a conditional branch from heapBits.bits
Change-Id: I1fa5e629b2890a8509559ce4ea17b74f47d71925
Reviewed-on: https://go-review.googlesource.com/19637
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:29 +00:00
Austin Clements
0168c2676f runtime: use only per-P gcWork
Currently most uses of gcWork use the per-P gcWork, but there are two
places that still use a stack-based gcWork. Simplify things by making
these instead use the per-P gcWork.

Change-Id: I712d012cce9dd5757c8541824e9641ac1c2a329c
Reviewed-on: https://go-review.googlesource.com/19636
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:27 +00:00
Austin Clements
7b229001e7 runtime: pass gcWork to markroot
Currently markroot uses a gcWork on the stack and disposes of it
immediately after marking one root. This used to be necessary because
markroot was called from the depths of parfor, but now that we call it
directly and have ready access to a gcWork at the call site, pass the
gcWork in, use it directly in markroot, and share it across calls to
markroot from the same P.

Change-Id: Id7c3b811bfb944153760e01873c07c8d18909be1
Reviewed-on: https://go-review.googlesource.com/19635
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2016-02-25 23:37:25 +00:00
Austin Clements
98130b39f5 runtime: remove noescape hacks from gcWork
When gcWork was first introduced, the compiler's escape analysis
wasn't good enough to detect that that method receiver didn't escape,
so we had to hack around this.

Now that the compiler can figure out this for itself, remove these
hacks.

Change-Id: I9f73fab721e272410b8b6905b564e7abc03c0dfe
Reviewed-on: https://go-review.googlesource.com/19634
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:22 +00:00
Austin Clements
0d26efb12a runtime: remove unnecessary clears of the heap bitmap
Currently we clear the heap bitmap of a span both when we allocate
that span *and* when we free it. There's no point in doing both, and
we definitely have to write the heap bitmap when we allocate a span
for pointer-sized objects, so switch to clearing only when we allocate
a span.

This results in a slight overall performance improvement; however,
most of the benchmarks that get slower are very short, while the
longer benchmarks generally got faster.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.48ms ± 1%  2.47ms ± 1%  -0.58%  (p=0.000 n=91+91)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.85s ± 2%     2.85s ± 2%    ~     (p=0.550 n=20+19)
Fannkuch11-12                2.54s ± 0%     2.47s ± 1%  -2.72%  (p=0.000 n=19+18)
FmtFprintfEmpty-12          51.3ns ± 4%    51.0ns ± 3%    ~     (p=0.223 n=20+20)
FmtFprintfString-12          169ns ± 0%     167ns ± 0%  -1.18%  (p=0.000 n=17+16)
FmtFprintfInt-12             160ns ± 0%     161ns ± 0%  +0.63%  (p=0.000 n=16+15)
FmtFprintfIntInt-12          267ns ± 0%     269ns ± 1%  +0.62%  (p=0.000 n=17+20)
FmtFprintfPrefixedInt-12     234ns ± 1%     240ns ± 0%  +2.80%  (p=0.000 n=20+20)
FmtFprintfFloat-12           316ns ± 0%     313ns ± 0%  -0.76%  (p=0.000 n=20+19)
FmtManyArgs-12              1.04µs ± 0%    1.05µs ± 0%  +0.45%  (p=0.000 n=19+16)
GobDecode-12                7.90ms ± 1%    7.81ms ± 0%  -1.10%  (p=0.000 n=18+18)
GobEncode-12                6.61ms ± 1%    6.58ms ± 0%  -0.46%  (p=0.000 n=20+15)
Gzip-12                      320ms ± 1%     322ms ± 1%  +0.47%  (p=0.030 n=20+20)
Gunzip-12                   42.4ms ± 1%    42.6ms ± 0%  +0.37%  (p=0.000 n=20+20)
HTTPClientServer-12         70.7µs ± 1%    70.6µs ± 2%    ~     (p=0.784 n=18+20)
JSONEncode-12               16.9ms ± 1%    16.8ms ± 0%  -0.64%  (p=0.000 n=20+20)
JSONDecode-12               60.8ms ± 0%    58.6ms ± 1%  -3.50%  (p=0.000 n=17+18)
Mandelbrot200-12            3.92ms ± 0%    3.91ms ± 0%  -0.25%  (p=0.000 n=19+19)
GoParse-12                  3.65ms ± 0%    3.68ms ± 1%  +0.67%  (p=0.000 n=17+16)
RegexpMatchEasy0_32-12       102ns ± 1%     102ns ± 2%  +0.67%  (p=0.009 n=19+19)
RegexpMatchEasy0_1K-12       350ns ± 0%     351ns ± 1%  +0.34%  (p=0.002 n=20+20)
RegexpMatchEasy1_32-12      84.1ns ± 2%    84.2ns ± 2%    ~     (p=0.799 n=20+18)
RegexpMatchEasy1_1K-12       510ns ± 1%     508ns ± 1%  -0.45%  (p=0.000 n=20+17)
RegexpMatchMedium_32-12      132ns ± 1%     134ns ± 1%  +0.85%  (p=0.000 n=20+19)
RegexpMatchMedium_1K-12     40.0µs ± 1%    39.9µs ± 1%  -0.29%  (p=0.014 n=19+18)
RegexpMatchHard_32-12       2.09µs ± 1%    2.05µs ± 0%  -1.76%  (p=0.000 n=20+18)
RegexpMatchHard_1K-12       62.7µs ± 1%    61.8µs ± 1%  -1.39%  (p=0.000 n=20+19)
Revcomp-12                   541ms ± 1%     534ms ± 0%  -1.16%  (p=0.000 n=19+20)
Template-12                 71.1ms ± 0%    69.1ms ± 0%  -2.83%  (p=0.000 n=18+19)
TimeParse-12                 356ns ± 0%     357ns ± 0%  +0.36%  (p=0.000 n=17+19)
TimeFormat-12                358ns ± 0%     372ns ± 1%  +3.74%  (p=0.000 n=15+18)
[Geo mean]                  62.6µs         62.5µs       -0.25%

Change-Id: Ied190b77c7a4d91ec7b2218c592fc31cf7acf362
Reviewed-on: https://go-review.googlesource.com/19633
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:19 +00:00
Austin Clements
1e91e2a25a runtime: document non-obvious requirement on sudog.elem
The channel code must not allow stack splits between when it assigns a
potential stack pointer to sudog.elem (or sudog.selectdone) and when
it makes the sudog visible to copystack by putting it on the g.waiting
list. We do get this right everywhere, but add a comment about this
subtlety for future eyes.

Change-Id: I941da150437167acff37b0e56983c793f40fcf79
Reviewed-on: https://go-review.googlesource.com/19632
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:17 +00:00
Austin Clements
39f2bd737b runtime: improve initSpan documentation
Change-Id: I9c45aad1c35a99da4c3b8990649dcd962fd23b81
Reviewed-on: https://go-review.googlesource.com/19631
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:14 +00:00
Austin Clements
e1024b6030 runtime: fix heapBitsSweepSpan comment
Currently the heapBitsSweepSpan comment claims that heapBitsSweepSpan
sets the heap bitmap for the first two words to dead. In fact, it sets
the first *four* words to scalar/dead. This is important because first
two words don't actually have a dead bit, so for objects larger than
two words it *must* set a dead bit in third word to reset the object
to a "noscan" state. For example, we use this in heapBits.hasPointers
to detect that an object larger than two words is noscan.

Change-Id: Ie166a628bed5060851db083475c7377adb349d6c
Reviewed-on: https://go-review.googlesource.com/19630
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:09 +00:00
Keith Randall
d3f15ff6bc [dev.ssa] cmd/compile: shrink stack guard
Our stack frame sizes look pretty good now.  Lower the stack
guard from 1024 to 720.
Tip is currently using 720.
We could go lower (to 640 at least) except PPC doesn't like that.

Change-Id: Ie5f96c0e822435638223f1e8a2bd1a1eed68e6aa
Reviewed-on: https://go-review.googlesource.com/19922
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-02-25 22:32:48 +00:00
Ian Lance Taylor
ad03af66eb runtime, runtime/pprof: add Frames to get file/line for Callers
This indirectly implements a small fix for runtime/pprof: it used to
look for runtime.gopanic when it should have been looking for
runtime.sigpanic.

Update #11432.

Change-Id: I5e3f5203b2ac5463efd85adf6636e64174aacb1d
Reviewed-on: https://go-review.googlesource.com/19869
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-02-25 19:42:19 +00:00
Dmitry Vyukov
db44223fde runtime: fix getcallerpc args
Change-Id: I6b14b8eecf125dd74bd40f4e7fff6b49de150e42
Reviewed-on: https://go-review.googlesource.com/19897
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-25 18:57:28 +00:00
David Crawshaw
30f93f0994 cmd/compile: remove rtype.ptrToThis
Simplifies some code as ptrToThis was unreliable under dynamic
linking. Now the same type lookup is used regardless of execution
mode.

A synthetic relocation, R_USETYPE, is introduced to make sure the
linker includes *T on use of T, if *T is carrying methods.

Changes the heap dump format. Anything reading the format needs to
look at the last bool of a type of an interface value to determine
if the type should be the pointer-to type.

Reduces binary size of cmd/go by 0.2%.
For #6853.

Change-Id: I79fcb19a97402bdb0193f3c7f6d94ddf061ee7b2
Reviewed-on: https://go-review.googlesource.com/19695
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 17:47:42 +00:00
Martin Möhrmann
fdd0179bb1 all: fix typos and spelling
Change-Id: Icd06d99c42b8299fd931c7da821e1f418684d913
Reviewed-on: https://go-review.googlesource.com/19829
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-24 18:42:29 +00:00
David Crawshaw
a858931200 cmd/compile: embed type string header in rtype
Reduces binary size of cmd/go by 1%.

For #6853.

Change-Id: I6f2992a4dd3699db1b532ab08683e82741b9c2e4
Reviewed-on: https://go-review.googlesource.com/19692
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-24 17:12:15 +00:00
Shenghou Ma
1439158120 runtime, syscall: switch linux/386 to use int 0x80
Like bionic, musl also doesn't provide vsyscall helper in %gs:0x10,
and as int $0x80 is as fast as calling %gs:0x10, just use int $0x80
always.

Because we're no longer using vsyscall in VDSO, get rid of VDSO code
for linux/386 too.

Fixes #14476.

Change-Id: I00ec8652060700e0a3c9b524bfe3c16a810263f6
Reviewed-on: https://go-review.googlesource.com/19833
Run-TryBot: Minux Ma <minux@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-24 02:07:17 +00:00
Ian Lance Taylor
5c096cc092 runtime: deflake TestCgoCheckBytes
Bump up the multiplier to 20.  Also run the fast version first, so that
the slow version is likely to start up faster.

Change-Id: Ia0654cc1212ab03a45da1904d3e4b57d6a8d02a0
Reviewed-on: https://go-review.googlesource.com/19835
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-02-24 01:49:05 +00:00
Keith Randall
80bc512449 [dev.ssa] Merge remote-tracking branch 'origin/master' into mergebranch
Semi-regular merge from tip to dev.ssa.

Change-Id: If7d2269f267bcbc0ecd3a483d349951044470e3f
2016-02-23 14:42:20 -08:00
Matthew Dempsky
9877900c8c Revert "cmd/compile: move hiter, hmap, and scase definitions into builtin.go"
This reverts commit f28bbb776a.

Change-Id: I82fb81dcff3ddcaefef72949f1ef3a41bcd22301
Reviewed-on: https://go-review.googlesource.com/19849
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-23 19:42:52 +00:00
Shawn Smith
58ec5839cd all: fix typos
Change-Id: I6035941df8b0de6aeaf6c05df7257bcf6e9191fe
Reviewed-on: https://go-review.googlesource.com/19320
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-23 13:58:47 +00:00
Keith Randall
bd70bd9cb2 runtime: unify memeq and memequal
They do the same thing, except memequal also has the short-circuit
check if the two pointers are equal.

A) We might as well always do the short-circuit check, it is only 2 instructions.
B) The extra function call (memequal->memeq) is expensive.

benchmark                 old ns/op     new ns/op     delta
BenchmarkArrayEqual-8     8.56          5.31          -37.97%

No noticeable affect on the former memeq user (maps).

Fixes #14302

Change-Id: I85d1ada59ed11e64dd6c54667f79d32cc5f81948
Reviewed-on: https://go-review.googlesource.com/19843
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-02-23 00:15:38 +00:00
Matthew Dempsky
a4b833940d runtime: move machport into darwin's mOS
It's not needed on other OSes.

Change-Id: Ia6b13510585392a7062374806527d33876beba2a
Reviewed-on: https://go-review.googlesource.com/19818
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-22 21:15:50 +00:00
Matthew Dempsky
756ea30eb0 runtime: simplify stack copying in ThreadCreateProfile
Change-Id: I7414d2fab18ae6e7e7c50f8697ec64d38290f409
Reviewed-on: https://go-review.googlesource.com/19817
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-02-22 21:15:46 +00:00
Matthew Dempsky
f28bbb776a cmd/compile: move hiter, hmap, and scase definitions into builtin.go
Also eliminates per-maptype hiter and hmap types, since they're not
really needed anyway.  Update packages reflect and runtime
accordingly.

Reduces golang.org/x/tools/cmd/godoc's text segment by ~170kB:

   text	   data	    bss	    dec	    hex	filename
13085702	 140640	 151520	13377862	 cc2146	godoc.before
12915382	 140640	 151520	13207542	 c987f6	godoc.after

Updates #6853.

Change-Id: I948b2bc1f22d477c1756204996b4e3e1fb568d81
Reviewed-on: https://go-review.googlesource.com/16610
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-22 07:42:37 +00:00
Keith Randall
d0c11577b9 cmd/compile: inline {i,e}facethash
These functions are really simple, the overhead of calling
them (in both time and code size) is larger than the inlined versions.

Reorganize how the nil case in a type switch is handled, as we have
to check for nil explicitly now anyway.

Saves about 0.8% in the binary size of the go tool.

Change-Id: I8501b62d72fde43650b79f52b5f699f1fbd0e7e7
Reviewed-on: https://go-review.googlesource.com/19814
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-02-22 05:09:25 +00:00
Austin Clements
8847a5913a runtime: remove unused parfor code
Change-Id: Ibbfae20cab48163f22d661604ef730705f2b97ba
Reviewed-on: https://go-review.googlesource.com/19661
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-02-21 23:22:11 +00:00
Shenghou Ma
d70c04cf08 runtime: fix missing word in comment
Change-Id: I6cb8ac7b59812e82111ab3b0f8303ab8194a5129
Reviewed-on: https://go-review.googlesource.com/19791
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-21 22:40:25 +00:00
Matthew Dempsky
8ffe496ae7 cmd/compile, runtime: eliminate unnecessary algorithm types
There's no need for 8 different ways to represent that a type is
non-comparable.

While here, move AMEM out of the runtime-known algorithm values since
it's not needed at run-time, and get rid of the unused AUNK constant.

Change-Id: Ie23972b692c6f27fc5f1a908561b3e26ef5a50e9
Reviewed-on: https://go-review.googlesource.com/19779
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-02-21 20:59:36 +00:00
Shenghou Ma
e960302410 runtime: when crash with panic, call user Error/String methods before freezing the world
Fixes #14432.

Change-Id: I0a92ef86de95de39217df9a664d8034ef685a906
Reviewed-on: https://go-review.googlesource.com/19792
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-21 20:18:51 +00:00
Josh Bleecher Snyder
e43c74a0d8 all: use cannot instead of can not
You can not use cannot, but you cannot spell cannot can not.

Change-Id: I2f0971481a460804de96fd8c9e46a9cc62a3fc5b
Reviewed-on: https://go-review.googlesource.com/19772
Reviewed-by: Rob Pike <r@golang.org>
2016-02-21 15:35:50 +00:00
Shenghou Ma
315f4c70f1 runtime: use correct psABI SP alignment before calling libc mmap
Fixes #14384.

Change-Id: Ib025cf2d20754b4c2db52f0a8a4717fd303371d6
Reviewed-on: https://go-review.googlesource.com/19660
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-02-20 06:10:01 +00:00
Ian Lance Taylor
c8e7b34b59 runtime: skip cgo check for non-pointer slice elements
Fixes #14387.

Change-Id: Icc98be80f549c5e1f55c5e693bfea97b456a6c41
Reviewed-on: https://go-review.googlesource.com/19621
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-02-19 16:07:27 +00:00
Nathan VanBenschoten
b04f3b06ec all: replace strings.Index with strings.Contains where possible
Change-Id: Ia613f1c37bfce800ece0533a5326fca91d99a66a
Reviewed-on: https://go-review.googlesource.com/18120
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
2016-02-19 01:06:05 +00:00
David Chase
ae276d8c23 [dev.ssa] cmd/compile: reenable TestStackBarrierProfiling
Tested it 1000x on OS X and Linux amd64, no failures.
Updated TODO.

Change-Id: Ia60c8d90962f6e5f7c3ed1ded6ba1b25eee983e1
Reviewed-on: https://go-review.googlesource.com/19662
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-18 23:21:14 +00:00
Austin Clements
7c22af830a runtime: fix deadlock in TestCrashDumpsAllThreads
TestCrashDumpsAllThreads carefully sets the number of Ps to one
greater than the number of non-preemptible loops it starts so that the
main goroutine can continue to run (necessary because of #10958).
However, if GC starts, it can take over that one spare P and lock up
the system while waiting for the non-preemptible loops, causing the
test to eventually time out. This deadlock is easily reproducible if
you run the runtime test with GOGC=1.

Fix this by forcing GOGC=off when running this test.

Change-Id: Ifb22da5ce33f9a61700a326ea92fcf4b049721d1
Reviewed-on: https://go-review.googlesource.com/19516
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-02-16 20:18:40 +00:00
Austin Clements
0c02bc009a runtime: show panics in traceback
We used to include panic calls in tracebacks; however, when
runtime.panic was renamed to runtime.gopanic in the conversion of the
runtime to Go, we missed the special case in showframe that includes
panic calls even though they're in package runtime.

Fix the function name check in showframe (and, while we're here, fix
the other check for "runtime.panic" in runtime/pprof). Since the
"runtime.gopanic" name doesn't match what users call panic and hence
isn't very user-friendly, make traceback rewrite it to just "panic".

Updates #5832, #13857. Fixes #14315.

Change-Id: I8059621b41ec043e63d5cfb4cbee479f47f64973
Reviewed-on: https://go-review.googlesource.com/19492
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-02-16 16:58:43 +00:00
Ian Lance Taylor
387d5b8cfb runtime: remove debugging print in cgoCheckTypedBlock
Change-Id: I83639fcde88e7d9747b54728a9481ee2e1b23a64
Reviewed-on: https://go-review.googlesource.com/19486
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-02-13 17:33:22 +00:00
Ian Lance Taylor
c93193aec0 runtime: return errno value from Solaris mmap as expected
The code in mem_bsd.go expects that when mmap fails it will return a
positive errno value.  This fixes the Solaris implementation of mmap to
work as expected.

Change-Id: Id1c34a9b916e8dc955ced90ea2f4af8321d92265
Reviewed-on: https://go-review.googlesource.com/19477
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-02-12 19:20:19 +00:00
Ryan Brown
68aa7fb636 cmd/link: fix padding for dwarf aranges on 32 bit platforms.
Fixes #14278

Change-Id: I6a0c1370d595f0573ff0eb933450b1eea41f4bb3
Reviewed-on: https://go-review.googlesource.com/19452
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-02-12 19:13:11 +00:00
Ian Lance Taylor
cc0a04d351 runtime: fix errno sign for some mmap and mincore cases
The caller of mmap expects it to return a positive errno value, but the
linux-arm64 and nacl-386 system calls returned a negative errno value.
Correct them to negate the errno value.

The caller of mincore expects it to return a negative errno value (yes,
this is inconsistent), but the linux-mips64x and linux-ppc64x system
call returned a positive errno value.  Correct them to negate the errno
value.

Add a test that mmap returns errno with the correct sign.  Brad added a
test for mincore's errno value in https://golang.org/cl/19457.

Fixes #14297.

Change-Id: I2b93f32e679bd1eae1c9aef9ae7bcf0ba39521b5
Reviewed-on: https://go-review.googlesource.com/19455
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-12 00:07:29 +00:00
Brad Fitzpatrick
70418eb819 runtime: add test for mincore's return value sign on Linux
Updates #14297

Change-Id: I6b5f5020af5efaaa71280bdeb2ff99785ee9b959
Reviewed-on: https://go-review.googlesource.com/19457
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-02-11 19:09:50 +00:00