1
0
mirror of https://github.com/golang/go synced 2024-10-02 14:38:33 -06:00
Commit Graph

34443 Commits

Author SHA1 Message Date
Austin Clements
af192a3e22 runtime: allow 5% mutator assist over 25% background mark
Currently, both the background mark worker and the goal GC CPU are
both fixed at 25%. The trigger controller's goal is to achieve the
goal CPU usage, and with the previous commit it can actually achieve
this. But this means there are *no* assists, which sounds ideal but
actually causes problems for the trigger controller. Since the
controller can't lower CPU usage below the background mark worker CPU,
it saturates at the CPU goal and no longer gets feedback, which
translates into higher variability in heap growth.

This commit fixes this by allowing assists 5% CPU beyond the 25% fixed
background mark. This avoids saturating the trigger controller, since
it can now get feedback from both sides of the CPU goal. This leads to
low variability in both CPU usage and heap growth, at the cost of
reintroducing a low rate of mark assists.

We also experimented with 20% background plus 5% assist, but 25%+5%
clearly performed better in benchmarks.

Updates #14951.
Updates #14812.
Updates #18534.

Combined with the previous CL, this significantly improves tail
mutator utilization in the x/bechmarks garbage benchmark. On a sample
trace, it increased the 99.9%ile mutator utilization at 10ms from 26%
to 59%, and at 5ms from 17% to 52%. It reduced the 99.9%ile zero
utilization window from 2ms to 700µs. It also helps the mean mutator
utilization: it increased the 10s mutator utilization from 83% to 94%.
The minimum mutator utilization is also somewhat improved, though
there is still some unknown artifact that causes a miniscule fraction
of mutator assists to take 5--10ms (in fact, there was exactly one
10ms mutator assist in my sample trace).

This has no significant effect on the throughput of the
github.com/dr2chase/bent benchmarks-50.

This has little effect on the go1 benchmarks (and the slight overall
improvement makes up for the slight overall slowdown from the previous
commit):

name                      old time/op    new time/op    delta
BinaryTree17-12              2.40s ± 0%     2.41s ± 1%  +0.26%  (p=0.010 n=18+19)
Fannkuch11-12                2.95s ± 0%     2.93s ± 0%  -0.62%  (p=0.000 n=18+15)
FmtFprintfEmpty-12          42.2ns ± 0%    42.3ns ± 1%  +0.37%  (p=0.001 n=15+14)
FmtFprintfString-12         67.9ns ± 2%    67.2ns ± 3%  -1.03%  (p=0.002 n=20+18)
FmtFprintfInt-12            75.6ns ± 3%    76.8ns ± 2%  +1.59%  (p=0.000 n=19+17)
FmtFprintfIntInt-12          123ns ± 1%     124ns ± 1%  +0.77%  (p=0.000 n=17+14)
FmtFprintfPrefixedInt-12     148ns ± 1%     150ns ± 1%  +1.28%  (p=0.000 n=20+20)
FmtFprintfFloat-12           212ns ± 0%     211ns ± 1%  -0.67%  (p=0.000 n=16+17)
FmtManyArgs-12               499ns ± 1%     500ns ± 0%  +0.23%  (p=0.004 n=19+16)
GobDecode-12                6.49ms ± 1%    6.51ms ± 1%  +0.32%  (p=0.008 n=19+19)
GobEncode-12                5.47ms ± 0%    5.43ms ± 1%  -0.68%  (p=0.000 n=19+20)
Gzip-12                      220ms ± 1%     216ms ± 1%  -1.66%  (p=0.000 n=20+19)
Gunzip-12                   38.8ms ± 0%    38.5ms ± 0%  -0.80%  (p=0.000 n=19+20)
HTTPClientServer-12         78.5µs ± 1%    78.1µs ± 1%  -0.53%  (p=0.008 n=20+19)
JSONEncode-12               12.2ms ± 0%    11.9ms ± 0%  -2.38%  (p=0.000 n=17+19)
JSONDecode-12               52.3ms ± 0%    53.3ms ± 0%  +1.84%  (p=0.000 n=19+20)
Mandelbrot200-12            3.69ms ± 0%    3.69ms ± 0%  -0.19%  (p=0.000 n=19+19)
GoParse-12                  3.17ms ± 1%    3.19ms ± 1%  +0.61%  (p=0.000 n=20+20)
RegexpMatchEasy0_32-12      73.7ns ± 0%    73.2ns ± 1%  -0.66%  (p=0.000 n=17+20)
RegexpMatchEasy0_1K-12       238ns ± 0%     239ns ± 0%  +0.32%  (p=0.000 n=17+16)
RegexpMatchEasy1_32-12      69.1ns ± 1%    69.2ns ± 1%    ~     (p=0.669 n=19+13)
RegexpMatchEasy1_1K-12       365ns ± 1%     367ns ± 1%  +0.49%  (p=0.000 n=19+19)
RegexpMatchMedium_32-12      104ns ± 1%     105ns ± 1%  +1.33%  (p=0.000 n=16+20)
RegexpMatchMedium_1K-12     33.6µs ± 3%    34.1µs ± 4%  +1.67%  (p=0.001 n=20+20)
RegexpMatchHard_32-12       1.67µs ± 1%    1.62µs ± 1%  -2.78%  (p=0.000 n=18+17)
RegexpMatchHard_1K-12       50.3µs ± 2%    48.7µs ± 1%  -3.09%  (p=0.000 n=19+18)
Revcomp-12                   384ms ± 0%     386ms ± 0%  +0.59%  (p=0.000 n=19+19)
Template-12                 61.1ms ± 1%    60.5ms ± 1%  -1.02%  (p=0.000 n=19+20)
TimeParse-12                 307ns ± 0%     303ns ± 1%  -1.23%  (p=0.000 n=19+15)
TimeFormat-12                323ns ± 0%     323ns ± 0%  -0.12%  (p=0.011 n=15+20)
[Geo mean]                  47.1µs         47.0µs       -0.20%

https://perf.golang.org/search?q=upload:20171030.4

It slightly improve the performance the x/benchmarks:

name                         old time/op  new time/op  delta
Garbage/benchmem-MB=1024-12  2.29ms ± 3%  2.22ms ± 2%  -2.97%  (p=0.000 n=18+18)
Garbage/benchmem-MB=64-12    2.24ms ± 2%  2.21ms ± 2%  -1.64%  (p=0.000 n=18+18)
HTTP-12                      12.6µs ± 1%  12.6µs ± 1%    ~     (p=0.690 n=19+17)
JSON-12                      11.3ms ± 2%  11.3ms ± 1%    ~     (p=0.163 n=17+18)

and fixes some of the heap size bloat caused by the previous commit:

name                         old peak-RSS-bytes  new peak-RSS-bytes  delta
Garbage/benchmem-MB=1024-12          1.88G ± 2%          1.77G ± 2%  -5.52%  (p=0.000 n=20+18)
Garbage/benchmem-MB=64-12             248M ± 8%           226M ± 5%  -8.93%  (p=0.000 n=20+20)
HTTP-12                              47.0M ±27%          47.2M ±12%    ~     (p=0.512 n=20+20)
JSON-12                               206M ±11%           206M ±10%    ~     (p=0.841 n=20+20)

https://perf.golang.org/search?q=upload:20171030.5

Combined with the change to add a soft goal in the previous commit,
the achieves a decent performance improvement on the garbage
benchmark:

name                         old time/op  new time/op  delta
Garbage/benchmem-MB=1024-12  2.40ms ± 4%  2.22ms ± 2%  -7.40%  (p=0.000 n=19+18)
Garbage/benchmem-MB=64-12    2.23ms ± 1%  2.21ms ± 2%  -1.06%  (p=0.000 n=19+18)
HTTP-12                      12.5µs ± 1%  12.6µs ± 1%    ~     (p=0.330 n=20+17)
JSON-12                      11.1ms ± 1%  11.3ms ± 1%  +1.87%  (p=0.000 n=16+18)

https://perf.golang.org/search?q=upload:20171030.6

Change-Id: If04ddb57e1e58ef2fb9eec54c290eb4ae4bea121
Reviewed-on: https://go-review.googlesource.com/59971
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-31 21:59:11 +00:00
Austin Clements
03eb9483e3 runtime: separate soft and hard heap limits
Currently, GC pacing is based on a single hard heap limit computed
based on GOGC. In order to achieve this hard limit, assist pacing
makes the conservative assumption that the entire heap is live.
However, in the steady state (with GOGC=100), only half of the heap is
live. As a result, the garbage collector works twice as hard as
necessary and finishes half way between the trigger and the goal.
Since this is a stable state for the trigger controller, this repeats
from cycle to cycle. Matters are even worse if GOGC is higher. For
example, if GOGC=200, only a third of the heap is live in steady
state, so the GC will work three times harder than necessary and
finish only a third of the way between the trigger and the goal.

Since this causes the garbage collector to consume ~50% of the
available CPU during marking instead of the intended 25%, about 25% of
the CPU goes to mutator assists. This high mutator assist cost causes
high mutator latency variability.

This commit improves the situation by separating the heap goal into
two goals: a soft goal and a hard goal. The soft goal is set based on
GOGC, just like the current goal is, and the hard goal is set at a 10%
larger heap than the soft goal. Prior to the soft goal, assist pacing
assumes the heap is in steady state (e.g., only half of it is live).
Between the soft goal and the hard goal, assist pacing switches to the
current conservative assumption that the entire heap is live.

In benchmarks, this nearly eliminates mutator assists. However, since
background marking is fixed at 25% CPU, this causes the trigger
controller to saturate, which leads to somewhat higher variability in
heap size. The next commit will address this.

The lower CPU usage of course leads to longer mark cycles, though
really it means the mark cycles are as long as they should have been
in the first place. This does, however, lead to two potential
down-sides compared to the current pacing policy: 1. the total
overhead of the write barrier is higher because it's enabled more of
the time and 2. the heap size may be larger because there's more
floating garbage. We addressed 1 by significantly improving the
performance of the write barrier in the preceding commits. 2 can be
demonstrated in intense GC benchmarks, but doesn't seem to be a
problem in any real applications.

Updates #14951.
Updates #14812 (fixes?).
Fixes #18534.

This has no significant effect on the throughput of the
github.com/dr2chase/bent benchmarks-50.

This has little overall throughput effect on the go1 benchmarks:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.41s ± 0%     2.40s ± 0%  -0.22%  (p=0.007 n=20+18)
Fannkuch11-12                2.95s ± 0%     2.95s ± 0%  +0.07%  (p=0.003 n=17+18)
FmtFprintfEmpty-12          41.7ns ± 3%    42.2ns ± 0%  +1.17%  (p=0.002 n=20+15)
FmtFprintfString-12         66.5ns ± 0%    67.9ns ± 2%  +2.16%  (p=0.000 n=16+20)
FmtFprintfInt-12            77.6ns ± 2%    75.6ns ± 3%  -2.55%  (p=0.000 n=19+19)
FmtFprintfIntInt-12          124ns ± 1%     123ns ± 1%  -0.98%  (p=0.000 n=18+17)
FmtFprintfPrefixedInt-12     151ns ± 1%     148ns ± 1%  -1.75%  (p=0.000 n=19+20)
FmtFprintfFloat-12           210ns ± 1%     212ns ± 0%  +0.75%  (p=0.000 n=19+16)
FmtManyArgs-12               501ns ± 1%     499ns ± 1%  -0.30%  (p=0.041 n=17+19)
GobDecode-12                6.50ms ± 1%    6.49ms ± 1%    ~     (p=0.234 n=19+19)
GobEncode-12                5.43ms ± 0%    5.47ms ± 0%  +0.75%  (p=0.000 n=20+19)
Gzip-12                      216ms ± 1%     220ms ± 1%  +1.71%  (p=0.000 n=19+20)
Gunzip-12                   38.6ms ± 0%    38.8ms ± 0%  +0.66%  (p=0.000 n=18+19)
HTTPClientServer-12         78.1µs ± 1%    78.5µs ± 1%  +0.49%  (p=0.035 n=20+20)
JSONEncode-12               12.1ms ± 0%    12.2ms ± 0%  +1.05%  (p=0.000 n=18+17)
JSONDecode-12               53.0ms ± 0%    52.3ms ± 0%  -1.27%  (p=0.000 n=19+19)
Mandelbrot200-12            3.74ms ± 0%    3.69ms ± 0%  -1.17%  (p=0.000 n=18+19)
GoParse-12                  3.17ms ± 1%    3.17ms ± 1%    ~     (p=0.569 n=19+20)
RegexpMatchEasy0_32-12      73.2ns ± 1%    73.7ns ± 0%  +0.76%  (p=0.000 n=18+17)
RegexpMatchEasy0_1K-12       239ns ± 0%     238ns ± 0%  -0.27%  (p=0.000 n=13+17)
RegexpMatchEasy1_32-12      69.0ns ± 2%    69.1ns ± 1%    ~     (p=0.404 n=19+19)
RegexpMatchEasy1_1K-12       367ns ± 1%     365ns ± 1%  -0.60%  (p=0.000 n=19+19)
RegexpMatchMedium_32-12      105ns ± 1%     104ns ± 1%  -1.24%  (p=0.000 n=19+16)
RegexpMatchMedium_1K-12     34.1µs ± 2%    33.6µs ± 3%  -1.60%  (p=0.000 n=20+20)
RegexpMatchHard_32-12       1.62µs ± 1%    1.67µs ± 1%  +2.75%  (p=0.000 n=18+18)
RegexpMatchHard_1K-12       48.8µs ± 1%    50.3µs ± 2%  +3.07%  (p=0.000 n=20+19)
Revcomp-12                   386ms ± 0%     384ms ± 0%  -0.57%  (p=0.000 n=20+19)
Template-12                 59.9ms ± 1%    61.1ms ± 1%  +2.01%  (p=0.000 n=20+19)
TimeParse-12                 301ns ± 2%     307ns ± 0%  +2.11%  (p=0.000 n=20+19)
TimeFormat-12                323ns ± 0%     323ns ± 0%    ~     (all samples are equal)
[Geo mean]                  47.0µs         47.1µs       +0.23%

https://perf.golang.org/search?q=upload:20171030.1

Likewise, the throughput effect on the x/benchmarks is minimal (and
reasonably positive on the garbage benchmark with a large heap):

name                         old time/op  new time/op  delta
Garbage/benchmem-MB=1024-12  2.40ms ± 4%  2.29ms ± 3%  -4.57%  (p=0.000 n=19+18)
Garbage/benchmem-MB=64-12    2.23ms ± 1%  2.24ms ± 2%  +0.59%  (p=0.016 n=19+18)
HTTP-12                      12.5µs ± 1%  12.6µs ± 1%    ~     (p=0.326 n=20+19)
JSON-12                      11.1ms ± 1%  11.3ms ± 2%  +2.15%  (p=0.000 n=16+17)

It does increase the heap size of the garbage benchmarks, but seems to
have relatively little impact on more realistic programs. Also, we'll
gain some of this back with the next commit.

name                         old peak-RSS-bytes  new peak-RSS-bytes  delta
Garbage/benchmem-MB=1024-12          1.21G ± 1%          1.88G ± 2%  +55.59%  (p=0.000 n=19+20)
Garbage/benchmem-MB=64-12             168M ± 3%           248M ± 8%  +48.08%  (p=0.000 n=18+20)
HTTP-12                              45.6M ± 9%          47.0M ±27%     ~     (p=0.925 n=20+20)
JSON-12                               193M ±11%           206M ±11%   +7.06%  (p=0.001 n=20+20)

https://perf.golang.org/search?q=upload:20171030.2

Change-Id: Ic78904135f832b4d64056cbe734ab979f5ad9736
Reviewed-on: https://go-review.googlesource.com/59970
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-31 21:59:08 +00:00
Cherry Zhang
bc723cf340 cmd/compile: on ARM, make sure *const's AuxInt fit into int32
Previously some of the AuxInt are uint32, which may not fit into
int32. This CL convert them to int32. This does not change the
generated code, but make ssacheck happy.

Pass "toolstash -cmp" for std cmd on ARM.

Fixes #22499.

Change-Id: Ib072d3c14962388bfeb0766c861995d00b4fa7c4
Reviewed-on: https://go-review.googlesource.com/74770
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-31 20:14:55 +00:00
Daniel Martí
7cb3e4fb1d all: unindent some if bodies by exiting early
All of these had a return or break in the else body, so flipping the
condition means we can unindent and simplify.

Change-Id: If93e97504480d18a0dac3f2c8ffe57ab8bcb929c
Reviewed-on: https://go-review.googlesource.com/74190
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-31 20:07:46 +00:00
Ilya Tocar
94484d8ed5 cmd/compile: intrinsify math.{Trunc/Ceil/Floor} on amd64
This significantly speed-ups Trunc.
Ceil/Floor are using the same instruction, so do them too.

name     old time/op  new time/op  delta
Floor-6  3.33ns ± 1%  3.22ns ± 0%   -3.39%  (p=0.000 n=10+10)
Ceil-6   3.33ns ± 1%  3.22ns ± 0%   -3.16%  (p=0.000 n=10+7)
Trunc-6  4.83ns ± 0%  3.22ns ± 0%  -33.36%  (p=0.000 n=6+8)

Change-Id: If848790e458eedfe38a6a0407bb4f589c68ac254
Reviewed-on: https://go-review.googlesource.com/68630
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-31 19:30:54 +00:00
Matthew Dempsky
8684534321 cmd/compile: don't export unreachable inline method bodies
Previously, anytime we exported a function or method declaration
(which includes methods for every type transitively exported), we
included the inline function bodies, if any. However, in many cases,
it's impossible (or at least very unlikely) for the importing package
to call the method.

For example:

    package p
    type T int
    func (t T) M() { t.u() }
    func (t T) u() {}
    func (t T) v() {}

T.M and T.u are inlineable, and they're both reachable through calls
to T.M, which is exported. However, t.v is also inlineable, but cannot
be reached.

Exception: if p.T is embedded in another type q.U, p.T.v will be
promoted to q.U.v, and the generated wrapper function could have
inlined the call to p.T.v. However, in practice, this doesn't happen,
and a missed inlining opportunity doesn't affect correctness.

To implement this, this CL introduces an extra flood fill pass before
exporting to mark inline bodies that are actually reachable, so the
exporter can skip over methods like t.v.

This reduces Kubernetes build time (as measured by "time go build -a
k8s.io/kubernetes/cmd/...") on an HP Z620 measurably:

    == before ==
    real    0m44.658s
    user    11m19.136s
    sys     0m53.844s

    == after ==
    real    0m41.702s
    user    10m29.732s
    sys     0m50.908s

It also significantly cuts down the cost of enabling mid-stack
inlining (-l=4):

    == before (-l=4) ==
    real    1m19.236s
    user    20m6.528s
    sys     1m17.328s

    == after (-l=4) ==
    real    0m59.100s
    user    13m12.808s
    sys     0m58.776s

Updates #19348.

Change-Id: Iade58233ca42af823a1630517a53848b5d3c7a7e
Reviewed-on: https://go-review.googlesource.com/74110
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-10-31 19:12:51 +00:00
Cherry Zhang
f33f20ef1f test: fix and re-enable nosplit.go
The test was skipped because it did not work on AMD64 with
frame pointer enabled, and accidentally skipped on other
architectures. Now frame pointer is the default on AMD64.
Update the test to work with frame pointer. Now the test
is skipped only when frame pointer is NOT enabled on AMD64.

Fixes #18317.

Change-Id: I724cb6874e562f16e67ce5f389a1d032a2003115
Reviewed-on: https://go-review.googlesource.com/68610
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-10-31 19:12:11 +00:00
Joe Kyo
54d04c2fcb crypto/tls: remove bookkeeping code from pHash function
Since copy function can figure out how many bytes of data to copy when
two slices have different length, it is not necessary to check how many
bytes need to copy each time before copying the data.

Change-Id: I5151ddfe46af5575566fe9c9a2648e111575ec3d
Reviewed-on: https://go-review.googlesource.com/71090
Reviewed-by: Filippo Valsorda <hi@filippo.io>
Run-TryBot: Filippo Valsorda <hi@filippo.io>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-31 19:11:03 +00:00
Cherry Zhang
622cfd8833 cmd/compile: don't fold address of global into load/store on PPC64
On PPC64 (and a few other architectures), accessing global
requires multiple instructions and use of temp register.
The compiler emits a single MOV prog, and the assembler
expands it to multiple instructions. If globals are accessed
multiple times, each time it generates a reload of the temp
register. As this is done by the assembler, the compiler
cannot optimize it.

This CL makes the compiler not fold address of global into load
and store. If a global is accessed multiple times, or multiple
fields of a struct are accessed, the compiler can CSE the
address. Currently, this doesn't help the case where different
globals are accessed, even though they may be close to each
other in the address space (which we don't know at compile time).

It helps a little bit in go1 benchmark:

name                     old time/op    new time/op    delta
BinaryTree17-2              4.84s ± 1%     4.84s ± 1%    ~     (p=0.796 n=10+10)
Fannkuch11-2                4.10s ± 0%     4.08s ± 0%  -0.58%  (p=0.000 n=9+8)
FmtFprintfEmpty-2          97.9ns ± 1%    96.8ns ± 1%  -1.08%  (p=0.000 n=10+10)
FmtFprintfString-2          147ns ± 0%     147ns ± 1%    ~     (p=0.129 n=9+10)
FmtFprintfInt-2             152ns ± 0%     152ns ± 0%    ~     (p=0.294 n=10+8)
FmtFprintfIntInt-2          218ns ± 1%     217ns ± 0%  -0.64%  (p=0.000 n=10+8)
FmtFprintfPrefixedInt-2     263ns ± 1%     256ns ± 0%  -2.77%  (p=0.000 n=10+8)
FmtFprintfFloat-2           375ns ± 1%     368ns ± 0%  -1.95%  (p=0.000 n=10+7)
FmtManyArgs-2               849ns ± 0%     850ns ± 0%    ~     (p=0.621 n=8+9)
GobDecode-2                12.3ms ± 1%    12.2ms ± 1%  -0.94%  (p=0.003 n=10+10)
GobEncode-2                10.3ms ± 1%    10.5ms ± 1%  +2.03%  (p=0.000 n=10+10)
Gzip-2                      414ms ± 1%     414ms ± 0%    ~     (p=0.842 n=9+10)
Gunzip-2                   66.3ms ± 0%    66.4ms ± 0%    ~     (p=0.077 n=9+9)
HTTPClientServer-2         66.3µs ± 5%    66.4µs ± 1%    ~     (p=0.661 n=10+9)
JSONEncode-2               23.9ms ± 1%    23.9ms ± 1%    ~     (p=0.905 n=10+9)
JSONDecode-2                119ms ± 1%     116ms ± 0%  -2.65%  (p=0.000 n=10+10)
Mandelbrot200-2            5.11ms ± 0%    4.92ms ± 0%  -3.71%  (p=0.000 n=10+10)
GoParse-2                  5.81ms ± 1%    5.84ms ± 1%    ~     (p=0.052 n=10+10)
RegexpMatchEasy0_32-2       315ns ± 0%     317ns ± 0%  +0.67%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K-2       658ns ± 0%     638ns ± 0%  -3.01%  (p=0.000 n=9+9)
RegexpMatchEasy1_32-2       315ns ± 1%     317ns ± 0%  +0.56%  (p=0.000 n=9+9)
RegexpMatchEasy1_1K-2       935ns ± 0%     926ns ± 0%  -0.96%  (p=0.000 n=9+9)
RegexpMatchMedium_32-2      394ns ± 0%     396ns ± 1%  +0.46%  (p=0.001 n=10+10)
RegexpMatchMedium_1K-2     65.1µs ± 0%    64.5µs ± 0%  -0.90%  (p=0.000 n=9+9)
RegexpMatchHard_32-2       3.16µs ± 0%    3.17µs ± 0%  +0.35%  (p=0.000 n=10+9)
RegexpMatchHard_1K-2       89.4µs ± 0%    89.3µs ± 0%    ~     (p=0.136 n=9+9)
Revcomp-2                   703ms ± 2%     694ms ± 2%  -1.41%  (p=0.009 n=10+10)
Template-2                  107ms ± 1%     107ms ± 1%    ~     (p=0.053 n=9+10)
TimeParse-2                 526ns ± 0%     524ns ± 0%  -0.34%  (p=0.002 n=9+9)
TimeFormat-2                534ns ± 0%     504ns ± 1%  -5.51%  (p=0.000 n=10+10)
[Geo mean]                 93.8µs         93.1µs       -0.70%

It also helps in the case mentioned in issue #17110, main.main
in package math's test. Now it generates 4 loads of R31 instead
of 10, for the same piece of code.

This causes a slight increase of binary size: cmd/go increases
0.66%.

If this is a good idea, we should do it on other architectures
where accessing global is expensive.

Updates #17110.

Change-Id: I2687af6eafc04f2a57c19781ec300c33567094b6
Reviewed-on: https://go-review.googlesource.com/68250
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-10-31 18:43:11 +00:00
Michael Munday
4745604bcb cmd/compile: intrinsify math.RoundToEven on s390x
The new RoundToEven function can be implemented as a single FIDBR
instruction on s390x.

name         old time/op  new time/op  delta
RoundToEven  5.32ns ± 1%  0.86ns ± 1%  -83.86%  (p=0.000 n=10+10)

Change-Id: Iaf597e57a0d1085961701e3c75ff4f6f6dcebb5f
Reviewed-on: https://go-review.googlesource.com/74350
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-31 18:04:27 +00:00
Russ Cox
51daa25c6c cmd/dist: avoid darwin_amd64 assumption in debug prints
Noted in CL 73212 review by crawshaw.
Neglected to update CL 73212 before submitting.

Also fix printing of target goos/goarch for cross-compile build.

Change-Id: If702f23071a4456810f1de6abb9115b38933c5c1
Reviewed-on: https://go-review.googlesource.com/74631
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-10-31 16:25:52 +00:00
Russ Cox
fc768da8b8 cmd/vet: tighten printf format error messages
Every time I see an error that begins `missing argument for Fprintf("%s")`
my mental type-checker goes off, since obviously "%s" is not a valid first
argument to Fprintf. Writing Printf("%s") to report an error in Printf("hello %s")
is almost as confusing.

This CL rewords the errors reported by vet's printf check to be more
consistent with each other, avoid placing context like "in printf call"
in the middle of the message, and to avoid the imprecisions above by
not quoting the format string at all.

Before:

	bad.go:9: no formatting directive in Printf call
	bad.go:10: missing argument for Printf("%s"): format reads arg 1, have only 0 args
	bad.go:11: wrong number of args for format in Printf call: 1 needed but 2 args
	bad.go:12: bad syntax for printf argument index: [1]
	bad.go:13: index value [0] for Printf("%[0]s"); indexes start at 1
	bad.go:14: missing argument for Printf("%[2]s"): format reads arg 2, have only 1 args
	bad.go:15: bad syntax for printf argument index: [abc]
	bad.go:16: unrecognized printf verb 'z'
	bad.go:17: arg "hello" for * in printf format not of type int
	bad.go:18: arg fmt.Sprint in printf call is a function value, not a function call
	bad.go:19: arg fmt.Sprint in Print call is a function value, not a function call
	bad.go:20: arg "world" for printf verb %d of wrong type: string
	bad.go:21: missing argument for Printf("%q"): format reads arg 2, have only 1 args
	bad.go:22: first argument to Print is os.Stderr
	bad.go:23: Println call ends with newline
	bad.go:32: arg r in Sprint call causes recursive call to String method
	bad.go:34: arg r for printf causes recursive call to String method

After:

	bad.go:9: Printf call has arguments but no formatting directives
	bad.go:10: Printf format %s reads arg #1, but have only 0 args
	bad.go:11: Printf call needs 1 args but has 2 args
	bad.go:12: Printf format %[1 is missing closing ]
	bad.go:13: Printf format has invalid argument index [0]
	bad.go:14: Printf format has invalid argument index [2]
	bad.go:15: Printf format has invalid argument index [abc]
	bad.go:16: Printf format %.234z has unknown verb z
	bad.go:17: Printf format %.*s uses non-int "hello" as argument of *
	bad.go:18: Printf format %s arg fmt.Sprint is a func value, not called
	bad.go:19: Print arg fmt.Sprint is a func value, not called
	bad.go:20: Printf format %d has arg "world" of wrong type string
	bad.go:21: Printf format %q reads arg #2, but have only 1 args
	bad.go:22: Print does not take io.Writer but has first arg os.Stderr
	bad.go:23: Println args end with redundant newline
	bad.go:32: Sprint arg r causes recursive call to String method
	bad.go:34: Sprintf format %s with arg r causes recursive String method call

Change-Id: I5719f0fb9f2cd84df8ad4c7754ab9b79c691b060
Reviewed-on: https://go-review.googlesource.com/74352
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2017-10-31 16:25:35 +00:00
Russ Cox
9aa6f80ed3 go/importer: support lookup in importer.For
The support in this CL assumes that something at a higher level than
the toolchain-specific importers is taking care of converting imports
in source code into canonical import paths before invoking the
toolchain-specific importers. That kind of "what does an import mean"
as opposed to "find me the import data for this specific path"
should be provided by higher-level layers.

That's a different layering than the default behavior but matches the
current layering in the compiler and linker and works with the metadata
planned for generation by the go command for package management.
It should also eventually allow the importer code to stop concerning
itself with source directories and vendor import translation and maybe
deprecate ImporterFrom in favor of Importer once again. But that's all
in the future. For now, just make non-nil lookups work, and test that.

Fixes #13847.
Adds #22550.

Change-Id: I048c6a384492e634988a7317942667689ae680ff
Reviewed-on: https://go-review.googlesource.com/74354
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-10-31 16:24:26 +00:00
Kenny Grant
3e887ff7ea time: document that valid layouts are not valid Parse values
For #9346 #22135 explicitly state under layout constants
that they are not valid time values for Parse. Also add
examples of parsing valid RFC3339 values and the layout
to the example for time.Parse.

Fix capitalisation of time.Parse and Time.Format.

For #20869 include RFC3339 in the list of layouts that do
not accept all the time formats allowed by RFCs (lowercase z).
This does not fully address #20869.

Fixes #9346
Fixes #22135

Change-Id: Ia4c13e5745de583db5ef7d5b1688d7768bc42c1b
Reviewed-on: https://go-review.googlesource.com/74231
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-31 14:23:19 +00:00
Ramazan AYYILDIZ
bc98cea941 strings: add examples for specialCase
Change-Id: Ifa0384722dd879af7f5edb7b7aaac5ede3cff46d
Reviewed-on: https://go-review.googlesource.com/74690
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-31 14:09:07 +00:00
Austin Clements
52cf91a5d5 cmd/compile,runtime: update instrumentation comments
The compiler's instrumentation pass has some out-of-date comments
about the write barrier and some confusing comments about
typedslicecopy. Update these comments and add a comment to
typedslicecopy explaining why it's manually instrumented while none of
the other operations are.

Change-Id: I024e5361d53f1c3c122db0c85155368a30cabd6b
Reviewed-on: https://go-review.googlesource.com/74430
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-31 14:03:10 +00:00
Joe Kyo
5536180ae7 net/http: display connect methods table properly in go doc
When run `go doc -u http.connectMethod`, the whole table is treated as
a single long line. This commit inserts `\t` at the begining of each line,
so the table can be displayed properly in `go doc`.

Change-Id: I6408efd31f84c113e81167d62e1791643000d629
Reviewed-on: https://go-review.googlesource.com/74651
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-31 14:02:12 +00:00
Russ Cox
8f4f1f63e9 fmt: hide bad format in test from vet
Hide in the source code instead of in the separate whitelist.
Removes the only printf false positive in the standard library.

Change-Id: I99285e67588c7c93bd56d59ee768a03be7c301e7
Reviewed-on: https://go-review.googlesource.com/74590
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2017-10-31 13:49:53 +00:00
Russ Cox
9364c0e337 cmd/vet: do not import net/http at startup
The httpresponse.go module wants to be able to tell if a particular type t
is net/http.Response (and also net/http.Client). It does this by importing
net/http, looking up Response, and then comparing that saved type against
each t.

Instead of doing an eager import of net/http, wait until we have a type t
to ask a question about, and then just look to see if that t is http.Response.
This kind of lazy check does not require assuming that net/http is available
or will be important (perhaps the check is disabled in this run, or perhaps
other conditions that lead to the comparison are not satisfied).

Not loading these kinds of types at startup time will scale better.

Change-Id: Ibb00623901a96e725a4ff6f231e6d15127979dfd
Reviewed-on: https://go-review.googlesource.com/74353
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-31 13:49:46 +00:00
Russ Cox
c1e026a5f6 build: quiet make.bash, make.bat, make.rc
The signal-to-noise ratio is too low.
Stop printing the name of every package.
Can still get the old output with make.bash -v.

Change-Id: Ib2c82e037166e6d2ddc31ae2a4d29af5becce574
Reviewed-on: https://go-review.googlesource.com/74351
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-10-31 13:48:53 +00:00
Russ Cox
84dc501d20 test/run: use go tool compile + link instead of go run when possible
This cuts 6 seconds off all.bash with the new go command.
Not a ton, but also an easy 6 seconds to grab.

The -tags=use_go_run in the misc/cgo tests is just some
go command flag that will make run.go use go run,
but without making everything look stale.
(Those tests have relative imports,
so go tool compile+link is not enough.)

Change-Id: I43bf4bb661d3adde2b2d4aad5e8f64b97bc69ba9
Reviewed-on: https://go-review.googlesource.com/73994
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-31 13:21:05 +00:00
Russ Cox
2e2047a07f runtime/race: install alternate packages to temp dir
The content-based staleness code means that

	go run -gcflags=-l helloworld.go

recompiles all of helloworld.go's dependencies with -gcflags=-l,
whereas before it would have assumed installed packages were
up-to-date. In this test, that means every race iteration rebuilds
the runtime and maybe a few other packages. Instead, install them
to a temporary location for reuse.

This speeds the test from 17s to 9s on my MacBook Pro.

Change-Id: Ied136ce72650261083bb19cc7dee38dac0ad05ca
Reviewed-on: https://go-review.googlesource.com/73992
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-31 13:20:41 +00:00
Russ Cox
94471f6324 runtime: shorten tests in all.bash
This cuts 23 seconds from all.bash on my MacBook Pro.

Change-Id: Ibc4d7c01660b9e9ebd088dd55ba993f0d7ec6aa3
Reviewed-on: https://go-review.googlesource.com/73991
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-31 13:20:27 +00:00
Russ Cox
6c8418f560 cmd/dist: log timing to $GOBUILDTIMELOGFILE
We can't make all.bash faster if we can't measure it.
Measure it.

Change-Id: Ia5da791d4cfbfa1fd9a8e905b3188f63819ade73
Reviewed-on: https://go-review.googlesource.com/73990
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-31 13:20:12 +00:00
Russ Cox
7dea509703 cmd/go: switch to entirely content-based staleness determination
This CL changes the go command to base all its rebuilding decisions
on the content of the files being processed and not their file system
modification times. It also eliminates the special handling of release
toolchains, which were previously considered always up-to-date
because modification time order could not be trusted when unpacking
a pre-built release.

The go command previously tracked "build IDs" as a backup to
modification times, to catch changes not reflected in modification times.
For example, if you remove one .go file in a package with multiple .go
files, there is no modification time remaining in the system that indicates
that the installed package is out of date. The old build ID was the hash
of a list of file names and a few other factors, expected to change if
those factors changed.

This CL moves to using this kind of build ID as the only way to
detect staleness, making sure that the build ID hash includes all
possible factors that need to influence the rebuild decision.

One such factor is the compiler flags. As of this CL, if you run

	go build -gcflags -N cmd/gofmt

you will get a gofmt where every package is built with -N,
regardless of what may or may not be installed already.

Another such factor is the linker flags. As of this CL, if you run

	go install myprog
	go install -ldflags=-s myprog

the second go install will now correctly build a new myprog with
the updated linker flags. (Previously the installed myprog appeared
up-to-date, because the ldflags were not included in the build ID.)

Because we have more precise information we can also validate whether
the target of a "go test -c" operation is already the right binary and
therefore can avoid a rebuild.

This CL sets us up for having a more general build artifact cache,
maybe even a step toward not having a pkg directory with .a files,
but this CL does not take that step. For now the result of go install
is the same as it ever was; we just do a better job of what needs to
be installed.

This CL does slow down builds a small amount by reading all the
dependent source files in full. (The go command already read the
beginning of every dependent source file to discover build tags
and imports.) On my MacBook Pro, before this CL all.bash takes
3m58s, while after this CL and a few optimizations stacked above it
all.bash takes 4m28s. Given that CL 73850 cut 1m43s off the all.bash
time earlier today, we can afford adding 30s back for now.
More optimizations are planned that should make the go command
more efficient than it was even before this CL.

Fixes #15799.
Fixes #18369.
Fixes #19340.
Fixes #21477.

Change-Id: I10d7ca0e31ca3f58aabb9b1f11e2e3d9d18f0bc9
Reviewed-on: https://go-review.googlesource.com/73212
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-10-31 13:19:38 +00:00
Russ Cox
4b5018ce57 cmd/compile: change ssa test to avoid go run -gcflags=-d=ssa/check/on
In the new content-based staleness world, setting -gcflags like this
recompiles all the packages involved in running the program, not just
the "stale" ones. So go run -gcflags=-d=ssa/check/on recompiles
runtime with those flags too, which is not what the test is trying
to check.

Change-Id: I4dbd5bf2970c3a622c01de84bd8aa9d5e9ec5239
Reviewed-on: https://go-review.googlesource.com/74570
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-10-31 13:19:29 +00:00
Russ Cox
2beb173e98 all: respect $GO_GCFLAGS during run.bash
If the go install doesn't use the same flags as the main build
it can overwrite the installed standard library, leading to
flakiness and slow future tests.

Force uses of 'go install' etc to propagate $GO_GCFLAGS
or disable them entirely, to avoid problems.

As I understand it, the main place this happens is the ssacheck builder.
If there are other uses that need to run some of the now-disabled
tests we can reenable fixed tests in followup CLs.

Change-Id: Ib860a253539f402f8a96a3c00ec34f0bbf137c9a
Reviewed-on: https://go-review.googlesource.com/74470
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-10-31 13:19:15 +00:00
Tim Cooper
99be9cc02c flag: add (*FlagSet).Name, (*FlagSet).ErrorHandling, export (*FlagSet).Output
Allows code that operates on a FlagSet to know the name and error
handling behavior of the FlagSet without having to call FlagSet.Init.

Fixes #17628
Fixes #21888

Change-Id: Ib0fe4c8885f9ccdacf5a7fb761d5ecb23f3bb055
Reviewed-on: https://go-review.googlesource.com/70391
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-31 03:54:16 +00:00
Jason Wangsadinata
26e49e695f container/ring: fix example_test.go
The Len method is a linear operation. CL 73090 used Len to iterate over
a ring, resulting in a quadratic time operation.

Change-Id: Ib69c19190ba648311e6c345d8cb26292b50121ee
Reviewed-on: https://go-review.googlesource.com/74390
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-31 03:52:55 +00:00
Ian Lance Taylor
94d9371780 os: clarify that OpenFile reqires one of O_RDONLY/O_WRONLY/O_RDWR
Fixes #21322.

Change-Id: Ia589c576be0b5cdb7cde5d35cd857ad7c93c372b
Reviewed-on: https://go-review.googlesource.com/74550
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-10-31 03:23:10 +00:00
Michael Munday
c280126557 cmd/asm, cmd/internal/obj/s390x, math: add "test under mask" instructions
Adds the following s390x test under mask (immediate) instructions:

TMHH
TMHL
TMLH
TMLL

These are useful for testing bits and are already used in the math package.

Change-Id: Idffb3f83b238dba76ac1e42ac6b0bf7f1d11bea2
Reviewed-on: https://go-review.googlesource.com/41092
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-30 23:55:14 +00:00
Michael Munday
96cdacb971 cmd/asm, cmd/compile: optimize math.Abs and math.Copysign on s390x
This change adds three new instructions:

- LPDFR: load positive (math.Abs(x))
- LNDFR: load negative (-math.Abs(x))
- CPSDR: copy sign (math.Copysign(x, y))

By making use of GPR <-> FPR moves we can now compile math.Abs and
math.Copysign to these instructions using SSA rules.

This CL also adds new rules to merge address generation into combined
load operations. This makes GPR <-> FPR move matching more reliable.

name                 old time/op  new time/op  delta
Copysign             1.85ns ± 0%  1.40ns ± 1%  -24.65%  (p=0.000 n=8+10)
Abs                  1.58ns ± 1%  0.73ns ± 1%  -53.64%  (p=0.000 n=10+10)

The geo mean improvement for all math package benchmarks was 4.6%.

Change-Id: I0cec35c5c1b3fb45243bf666b56b57faca981bc9
Reviewed-on: https://go-review.googlesource.com/73950
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-30 23:42:51 +00:00
Bill O'Farrell
7fff1db060 runtime: remove unnecessary sync from publicationBarrier on s390x
Memory accesses on z are at least as ordered as they are on AMD64.

Change-Id: Ia515430e571ebd07e9314de05c54dc992ab76b95
Reviewed-on: https://go-review.googlesource.com/74010
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2017-10-30 23:42:27 +00:00
Matthew Dempsky
03c8c56682 cmd/compile: skip compiling wrappers for imported defined types
When compiling a package that defines a type T with method T.M, we
already compile and emit the wrapper method (*T).M. There's no need
for every package that uses T to do the same.

Change-Id: I3ca2659029907570f8b98d66111686435fad7ed0
Reviewed-on: https://go-review.googlesource.com/74412
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-10-30 22:03:33 +00:00
Mark Theunissen
84e91e1d6b net/url: preserve leading slashes when resolving path
When doing resolvePath, if there are multiple leading slashes in the
target, preserve them. This prevents an issue where the Go http.Client
cleans up multiple leading slashes in the Location header in a
redirect, resulting in a redirection to the incorrect target.

Fixes #21158.

Change-Id: I6a21ea61ca3bc7033f3c8a6ccc21ecaa3e996fa8
Reviewed-on: https://go-review.googlesource.com/51050
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
2017-10-30 21:00:06 +00:00
David Chase
b4c3fe7b04 cmd/compile: adjust expectations of test for issue 18902
The test for #18902 reads the assembly stream to be sure
that the line number does not change too often (this is an
indication that debugging the code will be unpleasant and
that the compiler is probably getting line numbers "wrong").

It checks that it is getting "enough" input, but the
compiler has gotten enough better since the test was written
that it now fails for lack of enough input.  The old
threshould was 200 instructions, the new one is 150 (the
minimum observed input is on arm64 with 184 instructions).

Fixes #22494.

Change-Id: Ibba7e9ff4ab6a7be369e5dd5859d150b7db94653
Reviewed-on: https://go-review.googlesource.com/74357
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-10-30 20:27:09 +00:00
Keith Randall
0153a4130d cmd/compile: fix runtime.KeepAlive
KeepAlive needs to introduce a use of the spill of the
value it is keeping alive.  Without that, we don't guarantee
that the spill dominates the KeepAlive.

This bug was probably introduced with the code to move spills
down to the dominator of the restores, instead of always spilling
just after the value itself (CL 34822).

Fixes #22458.

Change-Id: I94955a21960448ffdacc4df775fe1213967b1d4c
Reviewed-on: https://go-review.googlesource.com/74210
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-10-30 19:55:02 +00:00
Russ Cox
b09e2de735 cmd/dist: force non-devel version for cross-build buildlets
If the compiler has a non-devel version it will report that version
to the go command for use as the "compiler ID" instead of using
the content ID of the binary. This in turn allows the go command
to see the compiled-for-amd64 arm compiler and the compiled-for-arm
arm compiler as having the same ID, so that packages cross-compiled
from amd64 look up-to-date when copied to the arm system
during the linux-arm buildlets and trybots.

Change-Id: I76cbf129303941f8e31bdb100e263478159ddaa5
Reviewed-on: https://go-review.googlesource.com/74360
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-30 19:44:23 +00:00
Michael Munday
b97688d112 math: optimize dim and remove s390x assembly implementation
By calculating dim directly, rather than calling max, we can simplify
the generated code significantly. The compiler now reports that dim
is easily inlineable, but it can't be inlined because there is still
an assembly stub for Dim.

Since dim is now very simple I no longer think it is worth having
assembly implementations of it. I have therefore removed the s390x
assembly. Removing the other assembly for Dim is #21913.

name  old time/op  new time/op  delta
Dim   4.29ns ± 0%  3.53ns ± 0%  -17.62%  (p=0.000 n=9+8)

Change-Id: Ic38a6b51603cbc661dcdb868ecf2b1947e9f399e
Reviewed-on: https://go-review.googlesource.com/64194
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-10-30 19:05:51 +00:00
Sam Whited
be08ddbfcd encoding/xml: don't panic when custom Unmarshaler sees StartElement
Change-Id: I90aa0a983abd0080f3de75d3340fdb15c1f9ca35
Reviewed-on: https://go-review.googlesource.com/70891
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Sam Whited <sam@samwhited.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-30 18:52:19 +00:00
Terin Stock
01c144c410 net/http/pprof: attach handlers using http.HandleFunc
Simplify how pprof attaches the handlers to the DefaultMux by using
http.HandleFunc instead of manually wrapping the handlers in
a http.HandlerFunc.

Change-Id: I65db262ebb2e29e4b6f30df9d2688f5daf782c29
Reviewed-on: https://go-review.googlesource.com/71251
Reviewed-by: Sam Whited <sam@samwhited.com>
Reviewed-by: Tom Bergan <tombergan@google.com>
Run-TryBot: Sam Whited <sam@samwhited.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-30 18:46:16 +00:00
Austin Clements
877387e38a runtime: use buffered write barrier for bulkBarrierPreWrite
This modifies bulkBarrierPreWrite to use the buffered write barrier
instead of the eager write barrier. This reduces the number of system
stack switches and sanity checks by a factor of the buffer size
(currently 256). This affects both typedmemmove and typedmemclr.

Since this is purely a runtime change, it applies to all arches
(unlike the pointer write barrier).

name                 old time/op  new time/op  delta
BulkWriteBarrier-12  7.33ns ± 6%  4.46ns ± 9%  -39.10%  (p=0.000 n=20+19)

Updates #22460.

Change-Id: I6a686a63bbf08be02b9b97250e37163c5a90cdd8
Reviewed-on: https://go-review.googlesource.com/73832
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-30 18:12:54 +00:00
Austin Clements
6a5f1e58ed runtime: simplify and optimize typedslicecopy
Currently, typedslicecopy meticulously performs a typedmemmove on
every element of the slice. This probably used to be necessary because
we only had an individual element's type, but now we use the heap
bitmap, so we only need to know whether the type has any pointers and
how big it is. Hence, this CL rewrites typedslicecopy to simply
perform one bulk barrier and one memmove.

This also has a side-effect of eliminating two unnecessary write
barriers per slice element that were coming from updates to dstp and
srcp, which were stored in the parent stack frame. However, most of
the win comes from eliminating the loops.

name                 old time/op  new time/op  delta
BulkWriteBarrier-12  7.83ns ±10%  7.33ns ± 6%  -6.45%  (p=0.000 n=20+20)

Updates #22460.

Change-Id: Id3450e9f36cc8e0892f268319b136f0d8f5464b8
Reviewed-on: https://go-review.googlesource.com/73831
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-30 18:12:51 +00:00
Austin Clements
f96b95bcd1 runtime: benchmark for bulk write barriers
This adds a benchmark of typedslicecopy and its bulk write barriers.

For #22460.

Change-Id: I439ca3b130bb22944468095f8f18b464e5bb43ca
Reviewed-on: https://go-review.googlesource.com/74051
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-30 18:12:49 +00:00
Austin Clements
7e343134d3 cmd/compile: compiler support for buffered write barrier
This CL implements the compiler support for calling the buffered write
barrier added by the previous CL.

Since the buffered write barrier is only implemented on amd64 right
now, this still supports the old, eager write barrier as well. There's
little overhead to supporting both and this way a few tests in
test/fixedbugs that expect to have liveness maps at write barrier
calls can easily opt-in to the old, eager barrier.

This significantly improves the performance of the write barrier:

name             old time/op  new time/op  delta
WriteBarrier-12  73.5ns ±20%  19.2ns ±27%  -73.90%  (p=0.000 n=19+18)

It also reduces the size of binaries because the write barrier call is
more compact:

name        old object-bytes  new object-bytes  delta
Template           398k ± 0%         393k ± 0%  -1.14%  (p=0.008 n=5+5)
Unicode            208k ± 0%         206k ± 0%  -1.00%  (p=0.008 n=5+5)
GoTypes           1.18M ± 0%        1.15M ± 0%  -2.00%  (p=0.008 n=5+5)
Compiler          4.05M ± 0%        3.88M ± 0%  -4.26%  (p=0.008 n=5+5)
SSA               8.25M ± 0%        8.11M ± 0%  -1.59%  (p=0.008 n=5+5)
Flate              228k ± 0%         224k ± 0%  -1.83%  (p=0.008 n=5+5)
GoParser           295k ± 0%         284k ± 0%  -3.62%  (p=0.008 n=5+5)
Reflect           1.00M ± 0%        0.99M ± 0%  -0.70%  (p=0.008 n=5+5)
Tar                339k ± 0%         333k ± 0%  -1.67%  (p=0.008 n=5+5)
XML                404k ± 0%         395k ± 0%  -2.10%  (p=0.008 n=5+5)
[Geo mean]         704k              690k       -2.00%

name        old exe-bytes     new exe-bytes     delta
HelloSize         1.05M ± 0%        1.04M ± 0%  -1.55%  (p=0.008 n=5+5)

https://perf.golang.org/search?q=upload:20171027.1

(Amusingly, this also reduces compiler allocations by 0.75%, which,
combined with the better write barrier, speeds up the compiler overall
by 2.10%. See the perf link.)

It slightly improves the performance of most of the go1 benchmarks and
improves the performance of the x/benchmarks:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.40s ± 1%     2.47s ± 1%  +2.69%  (p=0.000 n=19+19)
Fannkuch11-12                2.95s ± 0%     2.95s ± 0%  +0.21%  (p=0.000 n=20+19)
FmtFprintfEmpty-12          41.8ns ± 4%    41.4ns ± 2%  -1.03%  (p=0.014 n=20+20)
FmtFprintfString-12         68.7ns ± 2%    67.5ns ± 1%  -1.75%  (p=0.000 n=20+17)
FmtFprintfInt-12            79.0ns ± 3%    77.1ns ± 1%  -2.40%  (p=0.000 n=19+17)
FmtFprintfIntInt-12          127ns ± 1%     123ns ± 3%  -3.42%  (p=0.000 n=20+20)
FmtFprintfPrefixedInt-12     152ns ± 1%     150ns ± 1%  -1.02%  (p=0.000 n=18+17)
FmtFprintfFloat-12           211ns ± 1%     209ns ± 0%  -0.99%  (p=0.000 n=20+16)
FmtManyArgs-12               500ns ± 0%     496ns ± 0%  -0.73%  (p=0.000 n=17+20)
GobDecode-12                6.44ms ± 1%    6.53ms ± 0%  +1.28%  (p=0.000 n=20+19)
GobEncode-12                5.46ms ± 0%    5.46ms ± 1%    ~     (p=0.550 n=19+20)
Gzip-12                      220ms ± 1%     216ms ± 0%  -1.75%  (p=0.000 n=19+19)
Gunzip-12                   38.8ms ± 0%    38.6ms ± 0%  -0.30%  (p=0.000 n=18+19)
HTTPClientServer-12         79.0µs ± 1%    78.2µs ± 1%  -1.01%  (p=0.000 n=20+20)
JSONEncode-12               11.9ms ± 0%    11.9ms ± 0%  -0.29%  (p=0.000 n=20+19)
JSONDecode-12               52.6ms ± 0%    52.2ms ± 0%  -0.68%  (p=0.000 n=19+20)
Mandelbrot200-12            3.69ms ± 0%    3.68ms ± 0%  -0.36%  (p=0.000 n=20+20)
GoParse-12                  3.13ms ± 1%    3.18ms ± 1%  +1.67%  (p=0.000 n=19+20)
RegexpMatchEasy0_32-12      73.2ns ± 1%    72.3ns ± 1%  -1.19%  (p=0.000 n=19+18)
RegexpMatchEasy0_1K-12       241ns ± 0%     239ns ± 0%  -0.83%  (p=0.000 n=17+16)
RegexpMatchEasy1_32-12      68.6ns ± 1%    69.0ns ± 1%  +0.47%  (p=0.015 n=18+16)
RegexpMatchEasy1_1K-12       364ns ± 0%     361ns ± 0%  -0.67%  (p=0.000 n=16+17)
RegexpMatchMedium_32-12      104ns ± 1%     103ns ± 1%  -0.79%  (p=0.001 n=20+15)
RegexpMatchMedium_1K-12     33.8µs ± 3%    34.0µs ± 2%    ~     (p=0.267 n=20+19)
RegexpMatchHard_32-12       1.64µs ± 1%    1.62µs ± 2%  -1.25%  (p=0.000 n=19+18)
RegexpMatchHard_1K-12       49.2µs ± 0%    48.7µs ± 1%  -0.93%  (p=0.000 n=19+18)
Revcomp-12                   391ms ± 5%     396ms ± 7%    ~     (p=0.154 n=19+19)
Template-12                 63.1ms ± 0%    59.5ms ± 0%  -5.76%  (p=0.000 n=18+19)
TimeParse-12                 307ns ± 0%     306ns ± 0%  -0.39%  (p=0.000 n=19+17)
TimeFormat-12                325ns ± 0%     323ns ± 0%  -0.50%  (p=0.000 n=19+19)
[Geo mean]                  47.3µs         46.9µs       -0.67%

https://perf.golang.org/search?q=upload:20171026.1

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.25ms ± 1%  2.20ms ± 1%  -2.31%  (p=0.000 n=18+18)
HTTP-12                    12.6µs ± 0%  12.6µs ± 0%  -0.72%  (p=0.000 n=18+17)
JSON-12                    11.0ms ± 0%  11.0ms ± 1%  -0.68%  (p=0.000 n=17+19)

https://perf.golang.org/search?q=upload:20171026.2

Updates #14951.
Updates #22460.

Change-Id: Id4c0932890a1d41020071bec73b8522b1367d3e7
Reviewed-on: https://go-review.googlesource.com/73712
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-30 18:12:46 +00:00
Austin Clements
e9079a69f3 runtime: buffered write barrier implementation
This implements runtime support for buffered write barriers on amd64.
The buffered write barrier has a fast path that simply enqueues
pointers in a per-P buffer. Unlike the current write barrier, this
fast path is *not* a normal Go call and does not require the compiler
to spill general-purpose registers or put arguments on the stack. When
the buffer fills up, the write barrier takes the slow path, which
spills all general purpose registers and flushes the buffer. We don't
allow safe-points or stack splits while this frame is active, so it
doesn't matter that we have no type information for the spilled
registers in this frame.

One minor complication is cgocheck=2 mode, which uses the write
barrier to detect Go pointers being written to non-Go memory. We
obviously can't buffer this, so instead we set the buffer to its
minimum size, forcing the write barrier into the slow path on every
call. For this specific case, we pass additional information as
arguments to the flush function. This also requires enabling the cgo
write barrier slightly later during runtime initialization, after Ps
(and the per-P write barrier buffers) have been initialized.

The code in this CL is not yet active. The next CL will modify the
compiler to generate calls to the new write barrier.

This reduces the average cost of the write barrier by roughly a factor
of 4, which will pay for the cost of having it enabled more of the
time after we make the GC pacer less aggressive. (Benchmarks will be
in the next CL.)

Updates #14951.
Updates #22460.

Change-Id: I396b5b0e2c5e5c4acfd761a3235fd15abadc6cb1
Reviewed-on: https://go-review.googlesource.com/73711
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-30 18:12:44 +00:00
Austin Clements
1e8ab99b37 runtime: add benchmark for write barriers
For #22460.

Change-Id: I798f26d45bbe1efd16b632e201413cb26cb3e6c7
Reviewed-on: https://go-review.googlesource.com/73811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-30 18:12:41 +00:00
Austin Clements
15d6ab69fb runtime: make systemstack tail call if already switched
Currently systemstack always calls its argument, even if we're already
on the system stack. Unfortunately, traceback with _TraceJump stops at
the first systemstack it sees, which often cuts off runtime stacks
early in profiles.

Fix this by performing a tail call if we're already on the system
stack. This eliminates it from the traceback entirely, so it won't
stop prematurely (or all get mushed into a single node in the profile
graph).

Change-Id: Ibc69e8765e899f8d3806078517b8c7314da196f4
Reviewed-on: https://go-review.googlesource.com/74050
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-30 16:33:55 +00:00
Russ Cox
67a7d5d885 misc/cgo/testshared: don't assume mtimes trigger rebuilds
The upcoming CL 73212 will see through mtime modifications.
Change the underlying file too.

Change-Id: Ib23b4136a62ee87bce408b76bb0385451ae7dcd2
Reviewed-on: https://go-review.googlesource.com/74130
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-30 16:28:40 +00:00
Lynn Boger
4d0151ede5 cmd/compile,cmd/internal/obj/ppc64: make math.Abs,math.Copysign instrinsics on ppc64x
This adds support for math Abs, Copysign to be instrinsics on ppc64x.

New instruction FCPSGN is added to generate fcpsgn. Some new
rules are added to improve the int<->float conversions that are
generated mainly due to the Float64bits and Float64frombits in
the math package. PPC64.rules is also modified as suggested
in the review for CL 63290.

Improvements:
benchmark                           old ns/op     new ns/op     delta
BenchmarkAbs-16                   1.12          0.69          -38.39%
BenchmarkCopysign-16              1.30          0.93          -28.46%
BenchmarkNextafter32-16           9.34          8.05          -13.81%
BenchmarkFrexp-16                 8.81          7.60          -13.73%

Others that used Copysign also saw smaller improvements.

I attempted to make this work using rules since that
seems to be preferred, but due to the use of Float64bits and
Float64frombits in these functions, several rules had to be added and
even then not all cases were matched. Using rules became too
complicated and seemed too fragile for these.

Updates #21390

Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995
Reviewed-on: https://go-review.googlesource.com/67130
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-30 13:56:39 +00:00