1
0
mirror of https://github.com/golang/go synced 2024-11-18 07:04:52 -07:00
Commit Graph

3625 Commits

Author SHA1 Message Date
Michael Munday
377c1536f5 cmd/compile: mark s390x int <-> float conversions as clobbering flags
These conversion instructions set the condition code and so should
be marked as clobbering flags.

Fixes #39651.

Change-Id: I91cc9687ea70ef0551bb3139c1875071c349d43e
Reviewed-on: https://go-review.googlesource.com/c/go/+/238628
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-06-18 16:36:16 +00:00
Keith Randall
8c8045fd38 cmd/compile: fix ordering problems in struct equality
Make sure that if a field comparison might panic, we evaluate
(and short circuit if not equal) all previous fields, and don't
evaluate any subsequent fields.

Add a bunch more tests to the equality+panic checker.

Update #8606

Change-Id: I6a159bbc8da5b2b7ee835c0cd1fc565575b58c46
Reviewed-on: https://go-review.googlesource.com/c/go/+/237919
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-06-15 18:03:08 +00:00
Michael Munday
ac743dea8e cmd/compile: always tighten and de-duplicate tuple selectors
The scheduler assumes two special invariants that apply to tuple
selectors (Select0 and Select1 ops):

  1. There is only one tuple selector of each type per generator.
  2. Tuple selectors and generators reside in the same block.

Prior to this CL the assumption was that these invariants would
only be broken by the CSE pass. The CSE pass therefore contained
code to move and de-duplicate selectors to fix these invariants.

However it is also possible to write relatively basic optimization
rules that cause these invariants to be broken. For example:

  (A (Select0 (B))) -> (Select1 (B))

This rule could result in the newly added selector (Select1) being
in a different block to the tuple generator (see issue #38356). It
could also result in duplicate selectors if this rule matches
multiple times for the same tuple generator (see issue #39472).

The CSE pass will 'fix' these invariants. However it will only do
so when optimizations are enabled (since disabling optimizations
disables the CSE pass).

This CL moves the CSE tuple selector fixup code into its own pass
and makes it mandatory even when optimizations are disabled. This
allows tuple selectors to be treated like normal ops for most of
the compilation pipeline until after the new pass has run, at which
point we need to be careful to maintain the invariant again.

Fixes #39472.

Change-Id: Ia3f79e09d9c65ac95f897ce37e967ee1258a080b
Reviewed-on: https://go-review.googlesource.com/c/go/+/237118
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-06-10 14:55:29 +00:00
Xiangdong Ji
e031318ca6 cmd/compile: ARM comparisons with 0 incorrect on overflow
Some ARM rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub caculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag:

  Block-Op         Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NEQ & PL

The patch also adds a few test cases to cover scenarios that are specific
to ARM and fine-tunes the code generation tests for 'x-const'.

For more details please refer to the previous fix on 64-bit ARM:
  https://go-review.googlesource.com/c/go/+/233097

Go1 perf, 'old' is the non-optimized version, that is removing all concerned
rewriting rules.

name                     old time/op    new time/op     delta
BinaryTree17-8              7.73s ± 0%      7.81s ± 0%  +0.97%  (p=0.000 n=7+8)
Fannkuch11-8                7.06s ± 0%      7.00s ± 0%  -0.83%  (p=0.000 n=8+8)
FmtFprintfEmpty-8           181ns ± 1%      183ns ± 1%  +1.31%  (p=0.001 n=8+8)
FmtFprintfString-8          319ns ± 1%      325ns ± 2%  +1.71%  (p=0.009 n=7+8)
FmtFprintfInt-8             358ns ± 1%      359ns ± 1%    ~     (p=0.293 n=7+7)
FmtFprintfIntInt-8          459ns ± 3%      456ns ± 1%    ~     (p=0.869 n=8+8)
FmtFprintfPrefixedInt-8     535ns ± 4%      538ns ± 4%    ~     (p=0.572 n=8+8)
FmtFprintfFloat-8          1.01µs ± 2%     1.01µs ± 2%    ~     (p=0.625 n=8+8)
FmtManyArgs-8              1.93µs ± 2%     1.93µs ± 1%    ~     (p=0.979 n=8+7)
GobDecode-8                16.1ms ± 1%     16.5ms ± 1%  +2.32%  (p=0.000 n=8+8)
GobEncode-8                15.9ms ± 0%     15.8ms ± 1%  -1.00%  (p=0.000 n=8+7)
Gzip-8                      690ms ± 1%      670ms ± 0%  -2.90%  (p=0.000 n=8+8)
Gunzip-8                    109ms ± 1%      109ms ± 1%    ~     (p=0.694 n=7+8)
HTTPClientServer-8          149µs ± 3%      146µs ± 2%  -1.70%  (p=0.028 n=8+8)
JSONEncode-8               50.5ms ± 1%     49.2ms ± 0%  -2.60%  (p=0.001 n=7+7)
JSONDecode-8                135ms ± 2%      137ms ± 1%    ~     (p=0.054 n=8+7)
Mandelbrot200-8             951ms ± 0%      952ms ± 0%    ~     (p=0.852 n=6+8)
GoParse-8                  9.47ms ± 1%     9.66ms ± 1%  +2.01%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8       288ns ± 2%      277ns ± 2%  -3.61%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8      1.66µs ± 1%     1.69µs ± 2%  +2.21%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8       334ns ± 1%      305ns ± 2%  -8.86%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8      2.14µs ± 2%     2.15µs ± 0%    ~     (p=0.099 n=8+8)
RegexpMatchMedium_32-8     13.3ns ± 1%     13.3ns ± 0%    ~     (p=1.000 n=7+7)
RegexpMatchMedium_1K-8     81.1µs ± 3%     80.7µs ± 1%    ~     (p=0.955 n=7+8)
RegexpMatchHard_32-8       4.26µs ± 0%     4.26µs ± 0%    ~     (p=0.933 n=7+8)
RegexpMatchHard_1K-8        124µs ± 0%      124µs ± 0%  +0.31%  (p=0.000 n=8+8)
Revcomp-8                  14.7ms ± 2%     14.5ms ± 1%  -1.66%  (p=0.003 n=8+8)
Template-8                  197ms ± 2%      200ms ± 3%  +1.62%  (p=0.021 n=8+8)
TimeParse-8                1.33µs ± 1%     1.30µs ± 1%  -1.86%  (p=0.002 n=8+8)
TimeFormat-8               3.04µs ± 1%     3.02µs ± 0%  -0.60%  (p=0.000 n=8+8)

name                     old speed      new speed       delta
GobDecode-8              47.6MB/s ± 1%   46.5MB/s ± 1%  -2.28%  (p=0.000 n=8+8)
GobEncode-8              48.1MB/s ± 0%   48.6MB/s ± 1%  +1.02%  (p=0.000 n=8+7)
Gzip-8                   28.1MB/s ± 1%   29.0MB/s ± 0%  +2.97%  (p=0.000 n=8+8)
Gunzip-8                  178MB/s ± 1%    179MB/s ± 2%    ~     (p=0.694 n=7+8)
JSONEncode-8             38.4MB/s ± 1%   39.4MB/s ± 0%  +2.67%  (p=0.001 n=7+7)
JSONDecode-8             14.3MB/s ± 2%   14.2MB/s ± 1%  -0.81%  (p=0.043 n=8+7)
GoParse-8                6.12MB/s ± 1%   5.99MB/s ± 1%  -2.00%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8     111MB/s ± 2%    115MB/s ± 2%  +3.77%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8     618MB/s ± 1%    604MB/s ± 2%  -2.16%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8    95.7MB/s ± 1%  105.1MB/s ± 2%  +9.76%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8     479MB/s ± 2%    477MB/s ± 0%    ~     (p=0.105 n=8+8)
RegexpMatchMedium_32-8   75.2MB/s ± 1%   75.2MB/s ± 0%    ~     (p=0.247 n=7+7)
RegexpMatchMedium_1K-8   12.6MB/s ± 3%   12.7MB/s ± 1%    ~     (p=0.538 n=7+8)
RegexpMatchHard_32-8     7.52MB/s ± 0%   7.52MB/s ± 0%    ~     (p=0.968 n=7+8)
RegexpMatchHard_1K-8     8.26MB/s ± 0%   8.24MB/s ± 0%  -0.30%  (p=0.001 n=8+8)
Revcomp-8                 173MB/s ± 2%    176MB/s ± 1%  +1.68%  (p=0.003 n=8+8)
Template-8               9.85MB/s ± 2%   9.69MB/s ± 3%  -1.59%  (p=0.021 n=8+8)

Fixes   #39303
Updates #38740

Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36
Reviewed-on: https://go-review.googlesource.com/c/go/+/236637
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-06-09 15:50:33 +00:00
Dmitri Shuralyov
ee379d2b08 all: treat all files as binary, but check in .bat with CRLF
This is a followup to CL 96495.

It should be simpler and more robust to achieve .bat files having
CRLF line endings by treating it as a binary file, like all other
files, and checking it in with the desired CRLF line endings.

A test is used to check the entire Go tree, short of directories
starting with "." and named "testdata", for any .bat files that
have anything other than strict CRLF line endings. This will help
catch any accidental modifications to existing .bat files or check
ins of new .bat files.

Importantly, this is compatible with how Gerrit serves .tar.gz files,
making it so that CRLF line endings are preserved.

The Go project is supported on many different environments, some of
which may have limited git implementations available, or none at all.
Relying on fewer git features and special rules makes it easier to
have confidence in the exact content of all files. Additionally, Go
development started in Subversion, moved to Perforce, then Mercurial,
and now uses Git.¹ Reducing its reliance on git-specific features will
help if there will be another transition in the project's future.

There are only 5 .bat files in the entire Go source tree, so a new one
being added is a rare event, and we prefer to do things in Go instead.
We still have the option of improving the experience for developers by
adding a pre-commit converter for .bat files to the git-codereview tool.

¹ https://groups.google.com/d/msg/golang-dev/sckirqOWepg/YmyT7dWJiocJ

Fixes #39391.
For #37791.

Change-Id: I6e202216322872f0307ac96f1b8d3f57cb901e6b
Reviewed-on: https://go-review.googlesource.com/c/go/+/236437
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2020-06-08 15:31:43 +00:00
Keith Randall
f72d7cfc8f cmd/compile: add interface equality tests
Add interfaces which differ in type. Those used so far only
differ in value, not type.

These additional tests are needed to generate a failure
before CL 236278 went in.

Update #8606

Change-Id: Icdb7647b1973c2fff7e5afe2bd8b8c1b384f583e
Reviewed-on: https://go-review.googlesource.com/c/go/+/236418
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-06-04 06:21:55 +00:00
Keith Randall
9984ef824c cmd/compile: test that equality is evaluated in order
Make sure that we compare fields of structs and elements of arrays in order,
with proper short-circuiting.

Update #8606

Change-Id: I0a66ad92ea0af7bcc56dfdb275dec2b8d7e8b4fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/236147
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-06-03 19:07:55 +00:00
Richard Musiol
0452f9460f runtime: fix race condition between timer and event handler
This change fixes a race condition between beforeIdle waking up the
innermost event handler and a timer causing a different goroutine to
wake up at the exact same moment. This messes up the wasm event handling
and leads to memory corruption. The solution is to make beforeIdle
return the goroutine that must run next and have findrunnable pick
this goroutine without considering timers again.

Fixes #38093
Fixes #38574

Change-Id: Iffbe99411d25c2730953d1c8b0741fd892f8e540
Reviewed-on: https://go-review.googlesource.com/c/go/+/230178
Run-TryBot: Richard Musiol <neelance@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-05-31 18:35:04 +00:00
Xiangdong Ji
e8f5a33191 cmd/compile: fix incorrect rewriting to if condition
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub caculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag, in the following categories:

  Block-Op        Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NEQ & PL

The backend generates two consecutive branch instructions for 'LEnoov'
and 'GTnoov' to model their expected behavior. A slight change to 'gc'
and amd64/386 backends is made to unify the code generation.

Add a test 'TestCondRewrite' as justification, it covers 32 incorrect rules
identified on arm64, more might be needed on other arches, like 32-bit arm.

Add two benchmarks profiling the aforementioned category 1&2 and category
3&4 separetely, we expect the first two categories will show performance
improvement and the second will not result in visible regression compared with
the non-optimized version.

This change also updates TestFormats to support using %#x.

Examples exhibiting where does the issue come from:
  1: 'if x + 3 < 0' might be converted to:
  before:
    CMN $3, R0
    BGE <else branch> // wrong branch is taken if 'x+3' overflows
  after:
    CMN $3, R0
    BPL <else branch>

  2: 'if y - 3 > 0' might be converted to:
  before:
    CMP $3, R0
    BLE <else branch> // wrong branch is taken if 'y-3' underflows
  after:
    CMP $3, R0
    BMI <else branch>
    BEQ <else branch>

Benchmark data from different kinds of arm64 servers, 'old' is the non-optimized
version (not the parent commit), generally the optimization version outperforms.

S1:
name                    old time/op  new time/op  delta
CondRewrite/SoloJump  13.6ns ± 0%  12.9ns ± 0%  -5.15%  (p=0.000 n=10+10)
CondRewrite/CombJump  13.8ns ± 1%  12.9ns ± 0%  -6.32%  (p=0.000 n=10+10)

S2:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  11.6ns ± 0%  10.9ns ± 0%  -6.03%  (p=0.000 n=10+10)
CondRewrite/CombJump  11.4ns ± 0%  10.8ns ± 1%  -5.53%  (p=0.000 n=10+10)

S3:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  7.36ns ± 0%  7.50ns ± 0%  +1.79%  (p=0.000 n=9+10)
CondRewrite/CombJump  7.35ns ± 0%  7.75ns ± 0%  +5.51%  (p=0.000 n=8+9)

S4:
name                      old time/op  new time/op  delta
CondRewrite/SoloJump-224  11.5ns ± 1%  10.9ns ± 0%  -4.97%  (p=0.000 n=10+10)
CondRewrite/CombJump-224  11.9ns ± 0%  11.5ns ± 0%  -2.95%  (p=0.000 n=10+10)

S5:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  10.0ns ± 0%  10.0ns ± 0%  -0.45%  (p=0.000 n=9+10)
CondRewrite/CombJump  9.93ns ± 0%  9.77ns ± 0%  -1.53%  (p=0.000 n=10+9)

Go1 perf. data:

name                     old time/op    new time/op    delta
BinaryTree17              6.29s ± 1%     6.30s ± 1%    ~     (p=1.000 n=5+5)
Fannkuch11                5.40s ± 0%     5.40s ± 0%    ~     (p=0.841 n=5+5)
FmtFprintfEmpty          97.9ns ± 0%    98.9ns ± 3%    ~     (p=0.937 n=4+5)
FmtFprintfString          171ns ± 3%     171ns ± 2%    ~     (p=0.754 n=5+5)
FmtFprintfInt             212ns ± 0%     217ns ± 6%  +2.55%  (p=0.008 n=5+5)
FmtFprintfIntInt          296ns ± 1%     297ns ± 2%    ~     (p=0.516 n=5+5)
FmtFprintfPrefixedInt     371ns ± 2%     374ns ± 7%    ~     (p=1.000 n=5+5)
FmtFprintfFloat           435ns ± 1%     439ns ± 2%    ~     (p=0.056 n=5+5)
FmtManyArgs              1.37µs ± 1%    1.36µs ± 1%    ~     (p=0.730 n=5+5)
GobDecode                14.6ms ± 4%    14.4ms ± 4%    ~     (p=0.690 n=5+5)
GobEncode                11.8ms ±20%    11.6ms ±15%    ~     (p=1.000 n=5+5)
Gzip                      507ms ± 0%     491ms ± 0%  -3.22%  (p=0.008 n=5+5)
Gunzip                   73.8ms ± 0%    73.9ms ± 0%    ~     (p=0.690 n=5+5)
HTTPClientServer          116µs ± 0%     116µs ± 0%    ~     (p=0.686 n=4+4)
JSONEncode               21.8ms ± 1%    21.6ms ± 2%    ~     (p=0.151 n=5+5)
JSONDecode                104ms ± 1%     103ms ± 1%  -1.08%  (p=0.016 n=5+5)
Mandelbrot200            9.53ms ± 0%    9.53ms ± 0%    ~     (p=0.421 n=5+5)
GoParse                  7.55ms ± 1%    7.51ms ± 1%    ~     (p=0.151 n=5+5)
RegexpMatchEasy0_32       158ns ± 0%     158ns ± 0%    ~     (all equal)
RegexpMatchEasy0_1K       606ns ± 1%     608ns ± 3%    ~     (p=0.937 n=5+5)
RegexpMatchEasy1_32       143ns ± 0%     144ns ± 1%    ~     (p=0.095 n=5+4)
RegexpMatchEasy1_1K       927ns ± 2%     944ns ± 2%    ~     (p=0.056 n=5+5)
RegexpMatchMedium_32     16.0ns ± 0%    16.0ns ± 0%    ~     (all equal)
RegexpMatchMedium_1K     69.3µs ± 2%    69.7µs ± 0%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32       3.73µs ± 0%    3.73µs ± 1%    ~     (p=0.984 n=5+5)
RegexpMatchHard_1K        111µs ± 1%     110µs ± 0%    ~     (p=0.151 n=5+5)
Revcomp                   1.91s ±47%     1.77s ±68%    ~     (p=1.000 n=5+5)
Template                  138ms ± 1%     138ms ± 1%    ~     (p=1.000 n=5+5)
TimeParse                 787ns ± 2%     785ns ± 1%    ~     (p=0.540 n=5+5)
TimeFormat                729ns ± 1%     726ns ± 1%    ~     (p=0.151 n=5+5)

Updates #38740
Change-Id: I06c604874acdc1e63e66452dadee5df053045222
Reviewed-on: https://go-review.googlesource.com/c/go/+/233097
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2020-05-29 15:39:54 +00:00
Josh Bleecher Snyder
364a05e2fe cmd/compile: add test for issue 37246
CL 233857 fixed the underlying issue for #37246,
which had arisen again as #38916.

Add the test case from #37246 to ensure it stays fixed.

Fixes #37246

Change-Id: If7fd75a096d2ce4364dc15509253c3882838161d
Reviewed-on: https://go-review.googlesource.com/c/go/+/233941
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2020-05-14 15:18:29 +00:00
Michael Munday
f073395b73 cmd/compile: fix tuple selector bug in CSE pass
When tuple generators and selectors are eliminated as part of the
CSE pass we may end up with tuple selectors that are in different
blocks to the tuple generators that they correspond to. This breaks
the invariant that tuple generators and their corresponding
selectors must be in the same block. Therefore after CSE this
situation must be corrected.

Unfortunately the fixup code did not take into account that selectors
could be eliminated by CSE. It assumed that only the tuple generators
could be eliminated. In some situations this meant that it got into
a state where it was replacing references to selectors with references
to dead selectors in the wrong block.

To fix this we move the fixup code after the CSE rewrites have been
applied. This removes any difficult-to-reason-about interactions
with the CSE rewriter.

Fixes #38916.

Change-Id: I2211982dcdba399d03299f0a819945b3eb93b291
Reviewed-on: https://go-review.googlesource.com/c/go/+/233857
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-05-14 08:07:52 +00:00
Keith Randall
2cb10d42b7 cmd/compile: in prove, zero right shifts of positive int by #bits - 1
Taking over Zach's CL 212277. Just cleaned up and added a test.

For a positive, signed integer, an arithmetic right shift of count
(bit-width - 1) equals zero. e.g. int64(22) >> 63 -> 0. This CL makes
prove replace these right shifts with a zero-valued constant.

These shifts may arise in source code explicitly, but can also be
created by the generic rewrite of signed division by a power of 2.
// Signed divide by power of 2.
// n / c =       n >> log(c) if n >= 0
//       = (n+c-1) >> log(c) if n < 0
// We conditionally add c-1 by adding n>>63>>(64-log(c))
	(first shift signed, second shift unsigned).
(Div64 <t> n (Const64 [c])) && isPowerOfTwo(c) ->
  (Rsh64x64
    (Add64 <t> n (Rsh64Ux64 <t>
    	(Rsh64x64 <t> n (Const64 <typ.UInt64> [63]))
	(Const64 <typ.UInt64> [64-log2(c)])))
    (Const64 <typ.UInt64> [log2(c)]))

If n is known to be positive, this rewrite includes an extra Add and 2
extra Rsh. This CL will allow prove to replace one of the extra Rsh with
a 0. That replacement then allows lateopt to remove all the unneccesary
fixups from the generic rewrite.

There is a rewrite rule to handle this case directly:
(Div64 n (Const64 [c])) && isNonNegative(n) && isPowerOfTwo(c) ->
	(Rsh64Ux64 n (Const64 <typ.UInt64> [log2(c)]))
But this implementation of isNonNegative really only handles constants
and a few special operations like len/cap. The division could be
handled if the factsTable version of isNonNegative were available.
Unfortunately, the first opt pass happens before prove even has a
chance to deduce the numerator is non-negative, so the generic rewrite
has already fired and created the extra Ops discussed above.

Fixes #36159

By Printf count, this zeroes 137 right shifts when building std and cmd.

Change-Id: Iab486910ac9d7cfb86ace2835456002732b384a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/232857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-05-11 16:23:52 +00:00
Emmanuel T Odeke
7cbee12444 cmd/compile: improve error when setting unexported fields
Improve the error user experience when users try to set/refer
to unexported fields and methods of struct literals, by directly saying

    "cannot refer to unexported field or method"

Fixes #31053

Change-Id: I6fd3caf64b7ca9f9d8ea60b7756875e340792d59
Reviewed-on: https://go-review.googlesource.com/c/go/+/201657
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-05-08 20:44:01 +00:00
Emmanuel T Odeke
26de581a70 cmd/compile: omit file:pos for non-existent errors
Omits printing the file:line:column when trying to
open non-existent files

Given:
    go tool compile x.go

* Before:
    x.go:0: open x.go: no such file or directory

* After:
    open x.go: no such file or directory

Reverts the revert in CL 231043 by only fixing the case
of non-existent errors which is what the original bug
was about. The fix for "permission errors" will come later
on when I have bandwidth to investigate the differences
between running with root and why os.Open works for some
builders and not others.

Fixes #36437

Change-Id: I9c8a0981ad708b504bb43990a4105b42266fa41f
Reviewed-on: https://go-review.googlesource.com/c/go/+/230941
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-05-08 20:28:57 +00:00
Martin Möhrmann
6ed4661807 cmd/compile: optimize make+copy pattern to avoid memclr
match:
 m = make([]T, x); copy(m, s)
for pointer free T and x==len(s) rewrite to:
 m = mallocgc(x*elemsize(T), nil, false); memmove(&m, &s, x*elemsize(T))
otherwise rewrite to:
 m = makeslicecopy([]T, x, s)

This avoids memclear and shading of pointers in the newly created slice
before the copy.

With this CL "s" is only be allowed to bev a variable and not a more
complex expression. This restriction could be lifted in future versions
of this optimization when it can be proven that "s" is not referencing "m".

Triggers 450 times during make.bash..
Reduces go binary size by ~8 kbyte.

name                           old time/op  new time/op  delta
MakeSliceCopy/mallocmove/Byte  71.1ns ± 1%  65.8ns ± 0%  -7.49%  (p=0.000 n=10+9)
MakeSliceCopy/mallocmove/Int   71.2ns ± 1%  66.0ns ± 0%  -7.27%  (p=0.000 n=10+8)
MakeSliceCopy/mallocmove/Ptr    104ns ± 4%    99ns ± 1%  -5.13%  (p=0.000 n=10+10)
MakeSliceCopy/makecopy/Byte    70.3ns ± 0%  68.0ns ± 0%  -3.22%  (p=0.000 n=10+9)
MakeSliceCopy/makecopy/Int     70.3ns ± 0%  68.5ns ± 1%  -2.59%  (p=0.000 n=9+10)
MakeSliceCopy/makecopy/Ptr      102ns ± 0%    99ns ± 1%  -2.97%  (p=0.000 n=9+9)
MakeSliceCopy/nilappend/Byte   75.4ns ± 0%  74.9ns ± 2%  -0.63%  (p=0.015 n=9+9)
MakeSliceCopy/nilappend/Int    75.6ns ± 0%  76.4ns ± 3%    ~     (p=0.245 n=9+10)
MakeSliceCopy/nilappend/Ptr     107ns ± 0%   108ns ± 1%  +0.93%  (p=0.005 n=9+10)

Fixes #26252

Change-Id: Iec553dd1fef6ded16197216a472351c8799a8e71
Reviewed-on: https://go-review.googlesource.com/c/go/+/146719
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-05-07 17:50:24 +00:00
Cuong Manh Le
0f47c12a29 cmd/compile: do not emit code for discardable blank fields
Fixes #38690

Change-Id: I3544daf617fddc0f89636265c113001178d16b0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/230121
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-05-06 04:34:54 +00:00
Keith Randall
b4ecafc986 cmd/compile: restrict bit test rewrite rules
The {AND,OR,XOR}const ops can only take an int32 as an argument.
Make sure that when rewriting a BTx op to one of these, the result
has no high-order bits.

Fixes #38746

Change-Id: Ia7c5f76952329f60974bc033c29a5433610f3b28
Reviewed-on: https://go-review.googlesource.com/c/go/+/231977
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-05-05 15:41:37 +00:00
Keith Randall
9ed0fb42e3 cmd/compile: add indexed memory modification ops to amd64
name            old time/op  new time/op  delta
Modify-16        404ns ± 1%   365ns ± 1%  -9.73%  (p=0.000 n=10+10)
ConstModify-16   407ns ± 0%   385ns ± 2%  -5.56%  (p=0.000 n=9+10)

Seems to generally help generated code.

Binary size change is in the noise.

Change-Id: I57891bfaf0f7dfc5d143bb9f7ebafc7079d2614f
Reviewed-on: https://go-review.googlesource.com/c/go/+/228098
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-04-30 17:21:31 +00:00
Keith Randall
882ec701d2 cmd/compile: add indexed load+op operations to amd64
name        old time/op  new time/op  delta
LoadAdd-16   545ns ± 0%   456ns ± 0%  -16.31%  (p=0.000 n=10+10)

Update #36468

Change-Id: I84f390d55490648fa1f58cdbc24fd74c4f1bc8c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227960
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-04-30 17:19:57 +00:00
Austin Clements
eda6fe3572 Revert "cmd/compile: omit file:pos for non-existent, permission errors"
This reverts commit 4f7053c87f.

Reason for revert: Newly added test is failing on several builders.

Change-Id: I22dcbfebf2f57735b2f479886bbeb623f95b132f
Reviewed-on: https://go-review.googlesource.com/c/go/+/231043
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-30 02:29:55 +00:00
Emmanuel T Odeke
4f7053c87f cmd/compile: omit file:pos for non-existent, permission errors
Omits printing the file:line:column when trying to open either
* non-existent files
* files without permission

Given:
    go tool compile x.go

For either of x.go not existing, or if no read permissions:

* Before:
    x.go:0: open x.go: no such file or directory
    x.go:0: open x.go: permission denied

* After:
    open x.go: no such file or directory
    open x.go: permission denied

While here, noticed an oddity with the Linux builders, that appear
to always be running under root, hence the test for permission errors
with 0222 -W-*-W-*-W- can't pass on linux-amd64 builders.
The filed bug is #38608.

Fixes #36437

Change-Id: I9645ef73177c286c99547e3a0f3719fa07b35cb5
Reviewed-on: https://go-review.googlesource.com/c/go/+/229357
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-30 01:17:47 +00:00
Josh Bleecher Snyder
4a7e363288 cmd/compile: optimize Move with all-zero ro sym src to Zero
We set up static symbols during walk that
we later make copies of to initialize local variables.
It is difficult to ascertain at that time exactly
when copying a symbol is profitable vs locally
initializing an autotmp.

During SSA, we are much better placed to optimize.
This change recognizes when we are copying from a
global readonly all-zero symbol and replaces it with
direct zeroing.

This often allows the all-zero symbol to be
deadcode eliminated at link time.
This is not ideal--it makes for large object files,
and longer link times--but it is the cleanest fix I could find.

This makes the final binary for the program in #38554
shrink from >500mb to ~2.2mb.

It also shrinks the standard binaries:

file      before    after     Δ       %
addr2line 4412496   4404304   -8192   -0.186%
buildid   2893816   2889720   -4096   -0.142%
cgo       4841048   4832856   -8192   -0.169%
compile   19926480  19922432  -4048   -0.020%
cover     5281816   5277720   -4096   -0.078%
link      6734648   6730552   -4096   -0.061%
nm        4366240   4358048   -8192   -0.188%
objdump   4755968   4747776   -8192   -0.172%
pprof     14653060  14612100  -40960  -0.280%
trace     11805940  11777268  -28672  -0.243%
vet       7185560   7181416   -4144   -0.058%
total     113588440 113465560 -122880 -0.108%

And not just by removing unnecessary symbols;
the program text shrinks a bit as well.

Fixes #38554

Change-Id: I8381ae6084ae145a5e0cd9410c451e52c0dc51c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/229704
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-24 23:58:10 +00:00
Matthew Dempsky
a44d06d3b4 cmd/compile: use fixVariadicCall in escape analysis
This CL uses fixVariadicCall before escape analyzing function calls.
This has a number of benefits, though also some minor obstacles:

Most notably, it allows us to remove ODDDARG along with the logic
involved in setting it up, manipulating EscHoles, and later copying
its escape analysis flags to the actual slice argument. Instead, we
uniformly handle all variadic calls the same way. (E.g., issue31573.go
is updated because now f() and f(nil...) are handled identically.)

It also allows us to simplify handling of builtins and generic
function calls. Previously handling of calls was hairy enough to
require multiple dispatches on n.Op, whereas now the logic is uniform
enough that we can easily handle it with a single dispatch.

The downside is handling //go:uintptrescapes is now somewhat clumsy.
(It used to be clumsy, but it still is, too.) The proper fix here is
probably to stop using escape analysis tags for //go:uintptrescapes
and unsafe-uintptr, and have an earlier pass responsible for them.

Finally, note that while we now call fixVariadicCall in Escape, we
still have to call it in Order, because we don't (yet) run Escape on
all compiler-generated functions. In particular, the generated "init"
function for initializing package-level variables can contain calls to
variadic functions and isn't escape analyzed.

Passes toolstash-check -race.

Change-Id: I4cdb92a393ac487910aeee58a5cb8c1500eef881
Reviewed-on: https://go-review.googlesource.com/c/go/+/229759
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2020-04-23 22:02:12 +00:00
Josh Bleecher Snyder
e7c1873691 cmd/compile: optimize x & 1 != 0 to x & 1 on amd64
Triggers a handful of times in std+cmd.

Change-Id: I9bb8ce9a5f8bae2547cb61157cd8f256e1b63e76
Reviewed-on: https://go-review.googlesource.com/c/go/+/229602
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-23 17:52:28 +00:00
Michael Munday
ab7a65f283 cmd/compile: clean up codegen for branch-on-carry on s390x
This CL optimizes code that uses a carry from a function such as
bits.Add64 as the condition in an if statement. For example:

    x, c := bits.Add64(a, b, 0)
    if c != 0 {
        panic("overflow")
    }

Rather than converting the carry into a 0 or a 1 value and using
that as an input to a comparison instruction the carry flag is now
used as the input to a conditional branch directly. This typically
removes an ADD LOGICAL WITH CARRY instruction when user code is
doing overflow detection and is closer to the code that a user
would expect to generate.

Change-Id: I950431270955ab72f1b5c6db873b6abe769be0da
Reviewed-on: https://go-review.googlesource.com/c/go/+/219757
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-22 20:11:06 +00:00
Matthew Dempsky
2a2423bd05 cmd/compile: more precise analysis of method values
Previously for a method value "x.M", we always flowed x directly to
the heap, which led to the receiver argument generally needing to be
heap allocated.

This CL changes it to flow x to the closure and M's receiver
parameter. This allows receiver arguments to be stack allocated as
long as (1) the closure never escapes, *and* (2) method doesn't leak
its receiver parameter.

Within the standard library, this allows a handful of objects to be
stack allocated instead. Listed here are diagnostics that were
previously emitted by "go build -gcflags=-m std cmd" that are no
longer emitted:

archive/tar/writer.go:118:6: moved to heap: f
archive/tar/writer.go:208:6: moved to heap: f
archive/tar/writer.go:248:6: moved to heap: f
cmd/compile/internal/gc/initorder.go:252:2: moved to heap: d
cmd/compile/internal/gc/initorder.go:75:2: moved to heap: s
cmd/go/internal/generate/generate.go:206:7: &Generator literal escapes to heap
cmd/internal/obj/arm64/asm7.go:910:2: moved to heap: c
cmd/internal/obj/mips/asm0.go:415:2: moved to heap: c
cmd/internal/obj/pcln.go:294:22: new(pcinlineState) escapes to heap
cmd/internal/obj/s390x/asmz.go:459:2: moved to heap: c
crypto/tls/handshake_server.go:56:2: moved to heap: hs

Thanks to Cuong Manh Le for help coming up with this solution.

Fixes #27557.

Change-Id: I8c85d671d07fb9b53e11d2dd05949a34dbbd7e17
Reviewed-on: https://go-review.googlesource.com/c/go/+/228263
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-21 20:49:34 +00:00
Michael Munday
e464d7d797 cmd/compile: optimize comparisons with immediates on s390x
When generating code for unsigned equals (==) and not equals (!=)
comparisons we currently, on s390x, always use signed comparisons.

This mostly works well, however signed comparisons on s390x sign
extend their immediates and unsigned comparisons zero extend them.
For compare-and-branch instructions which can only have 8-bit
immediates this significantly changes the range of immediate values
we can represent: [-128, 127] for signed comparisons and [0, 255]
for unsigned comparisons.

When generating equals and not equals checks we don't neet to worry
about whether the comparison is signed or unsigned. This CL
therefore adds rules to allow us to switch signedness for such
comparisons if it means that it brings a constant into range for an
8-bit immediate.

For example, a signed equals with an integer in the range [128, 255]
will now be implemented using an unsigned compare-and-branch
instruction rather than separate compare and branch instructions.

As part of this change I've also added support for adding a name
to block control values using the same `x:(...)` syntax we use for
value rules.

Triggers 792 times when compiling cmd and std.

Change-Id: I77fa80a128f0a8ce51a2888d1e384bd5e9b61a77
Reviewed-on: https://go-review.googlesource.com/c/go/+/228642
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-21 19:23:51 +00:00
Russ Cox
768201729d cmd/compile: detect and diagnose invalid //go: directive placement
Thie CL changes cmd/compile/internal/syntax to give the gc half of
the compiler more control over pragma handling, so that it can prepare
better errors, diagnose misuse, and so on. Before, the API between
the two was hard-coded as a uint16. Now it is an interface{}.
This should set us up better for future directives.

In addition to the split, this CL emits a "misplaced compiler directive"
error for any directive that is in a place where it has no effect.
I've certainly been confused in the past by adding comments
that were doing nothing and not realizing it. This should help
avoid that kind of confusion.

The rule, now applied consistently, is that a //go: directive
must appear on a line by itself immediately before the declaration
specifier it means to apply to. See cmd/compile/doc.go for
precise text and test/directive.go for examples.

This may cause some code to stop compiling, but that code
was broken. For example, this code formerly applied the
//go:noinline to f (not c) but now will fail to compile:

	//go:noinline
	const c = 1

	func f() {}

Change-Id: Ieba9b8d90a27cfab25de79d2790a895cefe5296f
Reviewed-on: https://go-review.googlesource.com/c/go/+/228578
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-21 16:47:01 +00:00
alex-semenyuk
876c1feb7d test/codegen, runtime/pprof, runtime: apply fmt
Change-Id: Ife4e065246729319c39e57a4fbd8e6f7b37724e1
GitHub-Last-Rev: e71803eaeb
GitHub-Pull-Request: golang/go#38527
Reviewed-on: https://go-review.googlesource.com/c/go/+/228901
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
2020-04-21 09:07:42 +00:00
Josh Bleecher Snyder
50b11318fe cmd/compile: use oneBit instead of isPowerOfTwo in bit optimization
This optimization works on any integer with exactly one bit set.
This is identical to being a power of two, except in the
most negative number. Use oneBit instead.

The rule now triggers in a few more places in std+cmd,
in packages encoding/asn1, crypto/elliptic, and
vendor/golang.org/x/crypto/cryptobyte.

This change obviates the need for CL 222479
by doing this optimization consistently in the compiler.

Change-Id: I983c6235290fdc634fda5e11b10f1f8ce041272f
Reviewed-on: https://go-review.googlesource.com/c/go/+/229124
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2020-04-21 00:38:34 +00:00
alex-semenyuk
d0d0028207 test: remove duplicate code from makechan/makemap
Change-Id: Ib9bcfaa12d42bf9d2045aef035080b1a990a8b98
GitHub-Last-Rev: bee77a8970
GitHub-Pull-Request: golang/go#38047
Reviewed-on: https://go-review.googlesource.com/c/go/+/225219
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-19 07:51:23 +00:00
Cherry Zhang
a32262d462 cmd/compile: when marking REFLECTMETHOD, check for reflect package itself
reflect.Type.Method (and MethodByName) can be used to obtain a
reference of a method by reflection. The linker needs to know
if reflect.Type.Method is called, and retain all exported methods
accordingly. This is handled by the compiler, which marks the
caller of reflect.Type.Method with REFLECTMETHOD attribute. The
current code failed to handle the reflect package itself, so the
method wrapper reflect.Type.Method is not marked. This CL fixes
it.

Fixes #38515.

Change-Id: I12904d23eda664cf1794bc3676152f3218fb762b
Reviewed-on: https://go-review.googlesource.com/c/go/+/228880
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-19 03:12:32 +00:00
Russ Cox
8ce21fae60 test: add copyright notice to typecheck.go
Also gofmt.

Change-Id: I36ac990965250867574f8e2318b65b87a0beda04
Reviewed-on: https://go-review.googlesource.com/c/go/+/228697
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-04-17 13:30:49 +00:00
Josh Bleecher Snyder
29d925dfcf test: add test for nil check / bounds check compiler confusion
This test started failing at CL 228106 and was fixed by CL 228677.

Fixes #38496

Change-Id: I2dadcd99227347e8d28179039f5f345e728c4595
Reviewed-on: https://go-review.googlesource.com/c/go/+/228698
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-17 04:26:54 +00:00
David Chase
e4e192484b cmd/compile: split up the addressing mode on OpAMD64CMP*loadidx* always
Benchmarking suggests that the combo instruction is notably slower,
at least in the places where we measure.

Updates #37955

Change-Id: I829f1975dd6edf38163128ba51d84604055512f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/228157
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-15 18:09:14 +00:00
Michael Munday
334d410ae3 cmd/compile: fix incorrect block for s390x Select1 op
When inserting Select0 and Select1 ops we need to ensure that they
live in the same block as their argument. This is because they need
to be scheduled immediately after their argument for register and
flag allocation to work correctly.

Fixes #38356.

Change-Id: Iba384dbe87010f1c7c4ce909f08011e5f1de7fd5
Reviewed-on: https://go-review.googlesource.com/c/go/+/227879
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-14 19:01:47 +00:00
Josh Bleecher Snyder
2db4cc38a0 cmd/compile: improve generated code for concrete cases in type switches
Consider

switch x:= x.(type) {
case int:
  // int stmts
case error:
  // error stmts
}

Prior to this change, we lowered this roughly as:

if x, ok := x.(int); ok {
  // int stmts
} else if x, ok := x.(error); ok {
  // error stmts
}

x, ok := x.(error) is implemented with a call to runtime.assertE2I2 or runtime.assertI2I2.

x, ok := x.(int) generates inline code that checks whether x has type int,
and populates x and ok as appropriate. We then immediately branch again on ok.
The shortcircuit pass in the SSA backend is designed to recognize situations
like this, in which we are immediately branching on a bool value
that we just calculated with a branch.

However, the shortcircuit pass has limitations when the intermediate state has phis.
In this case, the phi value is x (the int).
CL 222923 improved the situation, but many cases are still unhandled.
I have further improvements in progress, which is how I found this particular problem,
but they are expensive, and may or may not see the light of day.

In the common case of a lone concrete type in a type switch case,
it is easier and cheaper to simply lower a different way, roughly:

if _, ok := x.(int); ok {
  x := x.(int)
  // int stmts
}

Instead of using a type assertion, though, we extract the value of x
from the interface directly.

This removes the need to track x (the int) across the branch on ok,
which removes the phi, which lets the shortcircuit pass do its job.

Benchmarks for encoding/binary show improvements, as well as some
wild swings on the super fast benchmarks (alignment effects?):

name                      old time/op    new time/op    delta
ReadSlice1000Int32s-8       5.25µs ± 2%    4.87µs ± 3%   -7.11%  (p=0.000 n=44+49)
ReadStruct-8                 451ns ± 2%     417ns ± 2%   -7.39%  (p=0.000 n=45+46)
WriteStruct-8                412ns ± 2%     405ns ± 3%   -1.58%  (p=0.000 n=46+48)
ReadInts-8                   296ns ± 8%     275ns ± 3%   -7.23%  (p=0.000 n=48+50)
WriteInts-8                  324ns ± 1%     318ns ± 2%   -1.67%  (p=0.000 n=44+49)
WriteSlice1000Int32s-8      5.21µs ± 2%    4.92µs ± 1%   -5.67%  (p=0.000 n=46+44)
PutUint16-8                 0.58ns ± 2%    0.59ns ± 2%   +0.63%  (p=0.000 n=49+49)
PutUint32-8                 0.87ns ± 1%    0.58ns ± 1%  -33.10%  (p=0.000 n=46+44)
PutUint64-8                 0.66ns ± 2%    0.87ns ± 2%  +33.07%  (p=0.000 n=47+48)
LittleEndianPutUint16-8     0.86ns ± 2%    0.87ns ± 2%   +0.55%  (p=0.003 n=47+50)
LittleEndianPutUint32-8     0.87ns ± 1%    0.87ns ± 1%     ~     (p=0.547 n=45+47)
LittleEndianPutUint64-8     0.87ns ± 2%    0.87ns ± 1%     ~     (p=0.451 n=46+47)
ReadFloats-8                79.8ns ± 5%    75.9ns ± 2%   -4.83%  (p=0.000 n=50+47)
WriteFloats-8               89.3ns ± 1%    88.9ns ± 1%   -0.48%  (p=0.000 n=46+44)
ReadSlice1000Float32s-8     5.51µs ± 1%    4.87µs ± 2%  -11.74%  (p=0.000 n=47+46)
WriteSlice1000Float32s-8    5.51µs ± 1%    4.93µs ± 1%  -10.60%  (p=0.000 n=48+47)
PutUvarint32-8              25.9ns ± 2%    24.0ns ± 2%   -7.02%  (p=0.000 n=48+50)
PutUvarint64-8              75.1ns ± 1%    61.5ns ± 2%  -18.12%  (p=0.000 n=45+47)
[Geo mean]                  57.3ns         54.3ns        -5.33%

Despite the rarity of type switches, this generates noticeably smaller binaries.

file      before    after     Δ       %
addr2line 4413296   4409200   -4096   -0.093%
api       5982648   5962168   -20480  -0.342%
cgo       4854168   4833688   -20480  -0.422%
compile   19694784  19682560  -12224  -0.062%
cover     5278008   5265720   -12288  -0.233%
doc       4694824   4682536   -12288  -0.262%
fix       3411336   3394952   -16384  -0.480%
link      6721496   6717400   -4096   -0.061%
nm        4371152   4358864   -12288  -0.281%
objdump   4760960   4752768   -8192   -0.172%
pprof     14810820  14790340  -20480  -0.138%
trace     11681076  11668788  -12288  -0.105%
vet       8285464   8244504   -40960  -0.494%
total     115824120 115627576 -196544 -0.170%

Compiler performance is marginally improved (note that go/types has many type switches):

name        old alloc/op      new alloc/op      delta
Template         35.0MB ± 0%       35.0MB ± 0%  +0.09%  (p=0.008 n=5+5)
Unicode          28.5MB ± 0%       28.5MB ± 0%    ~     (p=0.548 n=5+5)
GoTypes           114MB ± 0%        114MB ± 0%  -0.76%  (p=0.008 n=5+5)
Compiler          541MB ± 0%        541MB ± 0%  -0.03%  (p=0.008 n=5+5)
SSA              1.17GB ± 0%       1.17GB ± 0%    ~     (p=0.841 n=5+5)
Flate            21.9MB ± 0%       21.9MB ± 0%    ~     (p=0.421 n=5+5)
GoParser         26.9MB ± 0%       26.9MB ± 0%    ~     (p=0.222 n=5+5)
Reflect          74.6MB ± 0%       74.6MB ± 0%    ~     (p=1.000 n=5+5)
Tar              32.9MB ± 0%       32.8MB ± 0%    ~     (p=0.056 n=5+5)
XML              42.4MB ± 0%       42.1MB ± 0%  -0.77%  (p=0.008 n=5+5)
[Geo mean]       73.2MB            73.1MB       -0.15%

name        old allocs/op     new allocs/op     delta
Template           377k ± 0%         377k ± 0%  +0.06%  (p=0.008 n=5+5)
Unicode            354k ± 0%         354k ± 0%    ~     (p=0.095 n=5+5)
GoTypes           1.31M ± 0%        1.30M ± 0%  -0.73%  (p=0.008 n=5+5)
Compiler          5.44M ± 0%        5.44M ± 0%  -0.04%  (p=0.008 n=5+5)
SSA               11.7M ± 0%        11.7M ± 0%    ~     (p=1.000 n=5+5)
Flate              239k ± 0%         239k ± 0%    ~     (p=1.000 n=5+5)
GoParser           302k ± 0%         302k ± 0%  -0.04%  (p=0.008 n=5+5)
Reflect            977k ± 0%         977k ± 0%    ~     (p=0.690 n=5+5)
Tar                346k ± 0%         346k ± 0%    ~     (p=0.889 n=5+5)
XML                431k ± 0%         430k ± 0%  -0.25%  (p=0.008 n=5+5)
[Geo mean]         806k              806k       -0.10%

For packages with many type switches, this considerably shrinks function text size.
Some examples:

file                                                           before   after    Δ       %
encoding/binary.s                                              30726    29504    -1222   -3.977%
go/printer.s                                                   77597    76005    -1592   -2.052%
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                 65704    63318    -2386   -3.631%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8047     7714     -333    -4.138%

Text size regressions are rare.

Change-Id: Ic10982bbb04876250eaa5bfee97990141ae5fc28
Reviewed-on: https://go-review.googlesource.com/c/go/+/228106
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-14 17:34:31 +00:00
Cherry Zhang
1b15c7f102 cmd/compile: debug rewrite
If -d=ssa/PASS/debug=N is specified (N >= 2) for a rewrite pass
(e.g. lower), when a Value (or Block) is rewritten, print the
Value (or Block) before and after.

For #31915.
Updates #19013.

Change-Id: I80eadd44302ae736bc7daed0ef68529ab7a16776
Reviewed-on: https://go-review.googlesource.com/c/go/+/176718
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-13 21:56:15 +00:00
Keith Randall
83e288f3db cmd/compile: prevent constant folding of +/- when result is NaN
Missed as part of CL 221790. It isn't just * and / that can make NaNs.

Update #36400
Fixes #38359

Change-Id: I3fa562f772fe03b510793a6dc0cf6189c0c3e652
Reviewed-on: https://go-review.googlesource.com/c/go/+/227860
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2020-04-10 19:32:41 +00:00
Ian Lance Taylor
162f1bf7c2 test: add test case that gccgo failed to compile
Change-Id: I08ca5f77b7352fe3ced1fbe3d027d6f5b4828e35
Reviewed-on: https://go-review.googlesource.com/c/go/+/227783
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2020-04-10 18:42:07 +00:00
Lynn Boger
a1550d3ca3 cmd/compile: use isel with variable shifts on ppc64x
This changes the code generated for variable length shift
counts to use isel instead of instructions that set and
read the carry flag.

This reduces the generated code for shifts like this
by 1 instruction and avoids the use of instructions to
set and read the carry flag.

This sequence can be found in strconv with these results
on power9:

Atof64Decimal                          71.6ns ± 0%  68.3ns ± 0%   -4.61%
Atof64Float                            95.3ns ± 0%  90.9ns ± 0%   -4.62%
Atof64FloatExp                          153ns ± 0%   149ns ± 0%   -2.61%
Atof64Big                               234ns ± 0%   232ns ± 0%   -0.85%
Atof64RandomBits                        348ns ± 0%   369ns ± 0%   +6.03%
Atof64RandomFloats                      262ns ± 0%   262ns ± 0%     ~
Atof32Decimal                          72.0ns ± 0%  68.2ns ± 0%   -5.28%
Atof32Float                            92.1ns ± 0%  87.1ns ± 0%   -5.43%
Atof32FloatExp                          159ns ± 0%   158ns ± 0%   -0.63%
Atof32Random                            194ns ± 0%   191ns ± 0%   -1.55%

Some tests in codegen/shift.go are enabled to verify the
expected instructions are generated.

Change-Id: I968715d10ada405a8c46132bf19b8ed9b85796d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227337
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-09 19:18:56 +00:00
Josh Bleecher Snyder
ade0811dc8 cmd/compile: handle some additional phis in shortcircuit
Prior to this change, the shortcircuit pass could only
handle blocks containing only a single phi control value,
possibly wrapped in some OpNot and OpCopy values.

This change partially lifts this limitation.
It handles some cases in which the block contains other phi values.
This appears to happen most commonly in cases in which
the conditionals being checked involve the memory state,
in which case there is a phi memory value in the block.

The general idea here is to use the information we have about
the CFG to (1) move the other phi values into other blocks
and/or (2) rewrite uses of the other phi values in other blocks.

For example, consider this CFG:

p   q
 \ /
  b
 / \
t   u

And consider a phi value v in block b.
We'll write v = Phi(p: x, q: y) to say that v has value x corresponding
to inbound block p, and value y for block q.

We will rewrite this CFG to:

p    q
|   /
|  b
|/  \
t    u

What should we do with v?

Any uses of v in u can be replaced with y. Why?
If we are in block u, we came from b, and before that from q.
If prior to b we came from p, then we would have gone to t, not u.
Since we came from q, we know that v took the value y.

Uses of v in t are a bit more complicated.
It is going to end up being a phi value: Phi(p: ?, b: ?).

Suppose, after the rewrite, we came from block p.
Then, before the rewrite, we would have gone to b,
where v would have the value x.
So we have Phi(p: x, b: ?).

Suppose, after the rewrite, we came from block b.
Then we must have come from block q.
If we come from block q, v has value y.
So we have Phi(p: x, b: y).
Uses of v in t can thus be replaced with a new phi value,
with the same values as v, but with altered predecessors.

Similar reasoning can be employed to rewrite or replace
other uses of v elsewhere in the CFG, so that v itself can be eliminated,
and the CFG rewrite can proceed.

This change sets up the infrastructure for such optimizations
and adds a few cheap ones. All optimizations in this change depend
only on the shape of the CFG; future changes may also depend on where
v's uses are. That analysis is more powerful but more expensive,
and should be done incrementally.

The use of closures here is perhaps a bit unusual,
but during development it proved critical to having readable code.
We must decide early on whether we can safely do the CFG modifications,
and then later fix up the phis if so.
Safely storing state and decisions across these two phases is hard to do readably.
Closures solve the problem neatly.

I manually instrumented the code paths in shortcircuitPhiPlan.
During make.bash there are nearly 6000 invocations.
The least-visited code path gets run 85 times,
so all the code in this CL is reasonably well-exercised.

Here is a concrete example of code improved by this change:

func f(e interface{}) int {
	if x, ok := e.(int); ok {
		return x
	}
	return 0
}

Omitting PCDATA, FUNCDATA, and the like, it used to compile to:

"".f STEXT nosplit size=50 args=0x18 locals=0x0
	0x0000 00000 (x.go:4)	LEAQ	type.int(SB), AX
	0x0007 00007 (x.go:4)	MOVQ	"".e+8(SP), CX
	0x000c 00012 (x.go:4)	CMPQ	AX, CX
	0x000f 00015 (x.go:4)	JNE	43
	0x0011 00017 (x.go:4)	MOVQ	"".e+16(SP), AX
	0x0016 00022 (x.go:4)	MOVQ	(AX), AX
	0x0019 00025 (x.go:4)	JNE	33
	0x001b 00027 (x.go:5)	MOVQ	AX, "".~r1+24(SP)
	0x0020 00032 (x.go:5)	RET
	0x0021 00033 (x.go:7)	MOVQ	$0, "".~r1+24(SP)
	0x002a 00042 (x.go:7)	RET
	0x002b 00043 (x.go:7)	MOVL	$0, AX
	0x0030 00048 (x.go:4)	JMP	25

Afterwards, it compiles to:

"".f STEXT nosplit size=41 args=0x18 locals=0x0
	0x0000 00000 (x.go:4)	LEAQ	type.int(SB), AX
	0x0007 00007 (x.go:4)	MOVQ	"".e+8(SP), CX
	0x000c 00012 (x.go:4)	CMPQ	AX, CX
	0x000f 00015 (x.go:4)	JNE	31
	0x0011 00017 (x.go:4)	MOVQ	"".e+16(SP), AX
	0x0016 00022 (x.go:4)	MOVQ	(AX), AX
	0x0019 00025 (x.go:5)	MOVQ	AX, "".~r1+24(SP)
	0x001e 00030 (x.go:5)	RET
	0x001f 00031 (x.go:7)	MOVQ	$0, "".~r1+24(SP)
	0x0028 00040 (x.go:7)	RET

Note that there is now only a single JNE and a single RET $0 path.

Updates #37608

Has a minor good effect on compilation speed and memory use.

Provides widespread improvements to generated code.
The rare, minor regressions I have investigated are due to
register allocation fluctuations.

file      before    after     Δ       %       
addr2line 4376080   4371984   -4096   -0.094% 
api       5945400   5933112   -12288  -0.207% 
asm       5034312   5030216   -4096   -0.081% 
buildid   2844952   2840856   -4096   -0.144% 
cgo       4812872   4804680   -8192   -0.170% 
compile   19622064  19610368  -11696  -0.060% 
cover     5236648   5232552   -4096   -0.078% 
dist      3658312   3654216   -4096   -0.112% 
doc       4653512   4649416   -4096   -0.088% 
fix       3370072   3365976   -4096   -0.122% 
link      6671864   6667768   -4096   -0.061% 
pprof     14781652  14761172  -20480  -0.139% 
trace     11639684  11627396  -12288  -0.106% 
vet       8252280   8231800   -20480  -0.248% 
total     115052984 114934792 -118192 -0.103% 


file                                                                     before   after    Δ       %       
internal/cpu.s                                                           3298     3296     -2      -0.061% 
internal/bytealg.s                                                       1730     1737     +7      +0.405% 
cmd/vendor/golang.org/x/mod/semver.s                                     7332     7283     -49     -0.668% 
image/color.s                                                            8248     8156     -92     -1.115% 
math.s                                                                   35966    35956    -10     -0.028% 
math/cmplx.s                                                             6596     6575     -21     -0.318% 
runtime.s                                                                480566   480053   -513    -0.107% 
sync.s                                                                   16408    16385    -23     -0.140% 
math/rand.s                                                              10447    10406    -41     -0.392% 
internal/reflectlite.s                                                   28408    28366    -42     -0.148% 
errors.s                                                                 2736     2701     -35     -1.279% 
sort.s                                                                   17031    17036    +5      +0.029% 
io.s                                                                     16993    16964    -29     -0.171% 
container/heap.s                                                         2006     1997     -9      -0.449% 
text/tabwriter.s                                                         9570     9552     -18     -0.188% 
bytes.s                                                                  31823    31594    -229    -0.720% 
strconv.s                                                                52760    52717    -43     -0.082% 
vendor/golang.org/x/text/transform.s                                     16713    16706    -7      -0.042% 
strings.s                                                                42590    42563    -27     -0.063% 
bufio.s                                                                  22883    22785    -98     -0.428% 
encoding/base32.s                                                        9586     9531     -55     -0.574% 
syscall.s                                                                82237    82243    +6      +0.007% 
image.s                                                                  37465    37452    -13     -0.035% 
regexp/syntax.s                                                          82827    82769    -58     -0.070% 
image/draw.s                                                             18698    18584    -114    -0.610% 
image/jpeg.s                                                             36560    36549    -11     -0.030% 
time.s                                                                   82557    82526    -31     -0.038% 
context.s                                                                10863    10820    -43     -0.396% 
regexp.s                                                                 64114    64049    -65     -0.101% 
os.s                                                                     51751    51524    -227    -0.439% 
reflect.s                                                                168240   168049   -191    -0.114% 
cmd/go/internal/lockedfile/internal/filelock.s                           2317     2290     -27     -1.165% 
path/filepath.s                                                          17831    17766    -65     -0.365% 
io/ioutil.s                                                              6994     6990     -4      -0.057% 
encoding/binary.s                                                        30791    30726    -65     -0.211% 
cmd/vendor/golang.org/x/sys/unix.s                                       78055    78033    -22     -0.028% 
encoding/pem.s                                                           9280     9247     -33     -0.356% 
crypto/cipher.s                                                          20376    20374    -2      -0.010% 
os/exec.s                                                                29229    29140    -89     -0.304% 
internal/goroot.s                                                        4588     4579     -9      -0.196% 
cmd/internal/browser.s                                                   2246     2240     -6      -0.267% 
cmd/vendor/golang.org/x/crypto/ssh/terminal.s                            27183    27149    -34     -0.125% 
fmt.s                                                                    76625    76484    -141    -0.184% 
encoding/hex.s                                                           6154     6152     -2      -0.032% 
compress/lzw.s                                                           7063     7059     -4      -0.057% 
database/sql/driver.s                                                    18875    18862    -13     -0.069% 
debug/plan9obj.s                                                         8268     8266     -2      -0.024% 
net/url.s                                                                29724    29719    -5      -0.017% 
encoding/csv.s                                                           12872    12856    -16     -0.124% 
debug/gosym.s                                                            25303    25268    -35     -0.138% 
compress/flate.s                                                         50952    51019    +67     +0.131% 
compress/zlib.s                                                          7277     7266     -11     -0.151% 
archive/zip.s                                                            42155    42111    -44     -0.104% 
debug/dwarf.s                                                            107632   107541   -91     -0.085% 
database/sql.s                                                           98373    98028    -345    -0.351% 
os/user.s                                                                14722    14708    -14     -0.095% 
encoding/json.s                                                          105836   105711   -125    -0.118% 
debug/macho.s                                                            32598    32560    -38     -0.117% 
encoding/gob.s                                                           136478   135755   -723    -0.530% 
debug/pe.s                                                               31160    30869    -291    -0.934% 
debug/elf.s                                                              63495    63302    -193    -0.304% 
vendor/golang.org/x/text/unicode/bidi.s                                  27220    27217    -3      -0.011% 
vendor/golang.org/x/text/secure/bidirule.s                               3363     3352     -11     -0.327% 
go/token.s                                                               12036    12035    -1      -0.008% 
flag.s                                                                   22277    22256    -21     -0.094% 
mime.s                                                                   39696    39509    -187    -0.471% 
go/scanner.s                                                             19033    19020    -13     -0.068% 
archive/tar.s                                                            70936    70581    -355    -0.500% 
internal/xcoff.s                                                         22823    22820    -3      -0.013% 
text/scanner.s                                                           11631    11629    -2      -0.017% 
encoding/xml.s                                                           110534   110408   -126    -0.114% 
math/big.s                                                               183636   183545   -91     -0.050% 
image/gif.s                                                              27376    27343    -33     -0.121% 
crypto/dsa.s                                                             6029     5969     -60     -0.995% 
image/png.s                                                              42947    42939    -8      -0.019% 
crypto/rand.s                                                            6866     6854     -12     -0.175% 
vendor/golang.org/x/text/unicode/norm.s                                  66394    66354    -40     -0.060% 
runtime/trace.s                                                          2603     2521     -82     -3.150% 
crypto/ed25519.s                                                         6321     6300     -21     -0.332% 
text/template/parse.s                                                    93910    93844    -66     -0.070% 
crypto/rsa.s                                                             31460    31369    -91     -0.289% 
encoding/asn1.s                                                          57021    57023    +2      +0.004% 
crypto/elliptic.s                                                        51382    51363    -19     -0.037% 
crypto/x509/pkix.s                                                       10386    10342    -44     -0.424% 
vendor/golang.org/x/net/idna.s                                           24482    24466    -16     -0.065% 
vendor/golang.org/x/crypto/cryptobyte.s                                  33479    33280    -199    -0.594% 
crypto/ecdsa.s                                                           11936    11883    -53     -0.444% 
go/constant.s                                                            43670    42663    -1007   -2.306% 
go/ast.s                                                                 80383    80191    -192    -0.239% 
testing.s                                                                68069    68057    -12     -0.018% 
runtime/pprof.s                                                          59613    59603    -10     -0.017% 
testing/iotest.s                                                         4895     4891     -4      -0.082% 
internal/trace.s                                                         78136    78089    -47     -0.060% 
cmd/internal/goobj2.s                                                    13158    13154    -4      -0.030% 
cmd/internal/src.s                                                       17661    17657    -4      -0.023% 
go/parser.s                                                              79046    78880    -166    -0.210% 
cmd/internal/objabi.s                                                    16367    16343    -24     -0.147% 
text/template.s                                                          94899    94486    -413    -0.435% 
go/printer.s                                                             77267    76992    -275    -0.356% 
cmd/internal/goobj.s                                                     25988    25947    -41     -0.158% 
runtime/pprof/internal/profile.s                                         102066   101933   -133    -0.130% 
go/format.s                                                              5419     5371     -48     -0.886% 
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s                            37181    37149    -32     -0.086% 
go/doc.s                                                                 74533    74132    -401    -0.538% 
html/template.s                                                          88743    88389    -354    -0.399% 
cmd/asm/internal/lex.s                                                   24881    24872    -9      -0.036% 
cmd/internal/buildid.s                                                   18263    18256    -7      -0.038% 
cmd/vendor/golang.org/x/arch/x86/x86asm.s                                80036    79980    -56     -0.070% 
go/build.s                                                               68905    68737    -168    -0.244% 
cmd/cover.s                                                              46070    45950    -120    -0.260% 
cmd/internal/obj.s                                                       117001   116991   -10     -0.009% 
cmd/doc.s                                                                62700    62419    -281    -0.448% 
cmd/internal/obj/arm.s                                                   66745    66687    -58     -0.087% 
cmd/compile/internal/syntax.s                                            145406   145062   -344    -0.237% 
cmd/internal/obj/wasm.s                                                  44049    44027    -22     -0.050% 
net.s                                                                    291835   291020   -815    -0.279% 
cmd/dist.s                                                               209020   208807   -213    -0.102% 
cmd/cgo.s                                                                241564   241102   -462    -0.191% 
vendor/golang.org/x/net/http/httpproxy.s                                 9407     9399     -8      -0.085% 
log/syslog.s                                                             7921     7909     -12     -0.151% 
go/types.s                                                               319325   317513   -1812   -0.567% 
vendor/golang.org/x/net/http/httpguts.s                                  3834     3825     -9      -0.235% 
mime/multipart.s                                                         21414    21343    -71     -0.332% 
cmd/internal/obj/ppc64.s                                                 119949   119938   -11     -0.009% 
cmd/compile/internal/logopt.s                                            10158    10118    -40     -0.394% 
vendor/golang.org/x/net/nettest.s                                        28012    27991    -21     -0.075% 
go/internal/srcimporter.s                                                6405     6380     -25     -0.390% 
go/internal/gcimporter.s                                                 34525    34493    -32     -0.093% 
net/mail.s                                                               23937    23720    -217    -0.907% 
go/internal/gccgoimporter.s                                              56095    56038    -57     -0.102% 
cmd/compile/internal/types.s                                             47247    47207    -40     -0.085% 
cmd/api.s                                                                39582    39558    -24     -0.061% 
cmd/go/internal/base.s                                                   12572    12551    -21     -0.167% 
cmd/vendor/golang.org/x/xerrors.s                                        17846    17814    -32     -0.179% 
cmd/vendor/golang.org/x/mod/sumdb/note.s                                 18142    18070    -72     -0.397% 
cmd/go/internal/search.s                                                 19994    19876    -118    -0.590% 
cmd/go/internal/imports.s                                                16457    16428    -29     -0.176% 
cmd/vendor/golang.org/x/mod/module.s                                     17838    17759    -79     -0.443% 
cmd/go/internal/cache.s                                                  30551    30514    -37     -0.121% 
cmd/vendor/golang.org/x/mod/sumdb/tlog.s                                 36356    36321    -35     -0.096% 
cmd/internal/test2json.s                                                 9452     9408     -44     -0.466% 
cmd/go/internal/mvs.s                                                    25136    25092    -44     -0.175% 
cmd/go/internal/txtar.s                                                  3488     3461     -27     -0.774% 
cmd/vendor/golang.org/x/mod/zip.s                                        18811    18800    -11     -0.058% 
cmd/go/internal/version.s                                                11213    11171    -42     -0.375% 
cmd/link/internal/benchmark.s                                            4941     4949     +8      +0.162% 
cmd/internal/obj/s390x.s                                                 126865   126849   -16     -0.013% 
cmd/gofmt.s                                                              30684    30596    -88     -0.287% 
cmd/fix.s                                                                87450    86906    -544    -0.622% 
cmd/internal/obj/x86.s                                                   88578    88556    -22     -0.025% 
cmd/vendor/golang.org/x/mod/modfile.s                                    72450    72363    -87     -0.120% 
cmd/oldlink/internal/loader.s                                            16743    16741    -2      -0.012% 
cmd/pack.s                                                               14863    14861    -2      -0.013% 
cmd/go/internal/load.s                                                   106742   106568   -174    -0.163% 
cmd/oldlink/internal/objfile.s                                           21787    21780    -7      -0.032% 
cmd/oldlink/internal/loadmacho.s                                         29309    29317    +8      +0.027% 
cmd/oldlink/internal/loadelf.s                                           35013    35021    +8      +0.023% 
cmd/asm/internal/asm.s                                                   68550    68538    -12     -0.018% 
cmd/link/internal/loader.s                                               94765    94564    -201    -0.212% 
cmd/link/internal/loadelf.s                                              35663    35667    +4      +0.011% 
cmd/link/internal/loadmacho.s                                            29501    29509    +8      +0.027% 
cmd/vendor/golang.org/x/tools/go/analysis.s                              4983     4976     -7      -0.140% 
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s       16771    16709    -62     -0.370% 
cmd/vendor/golang.org/x/tools/go/types/objectpath.s                      18481    18456    -25     -0.135% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2100     2085     -15     -0.714% 
cmd/vendor/github.com/google/pprof/profile.s                             150141   149620   -521    -0.347% 
cmd/vendor/github.com/google/pprof/internal/measurement.s                10420    10404    -16     -0.154% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s               36814    36755    -59     -0.160% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s                 6688     6673     -15     -0.224% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s               9856     9784     -72     -0.731% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/composite.s             3011     2979     -32     -1.063% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s              9737     9682     -55     -0.565% 
cmd/vendor/golang.org/x/tools/go/cfg.s                                   30738    30725    -13     -0.042% 
cmd/vendor/github.com/ianlancetaylor/demangle.s                          175195   174513   -682    -0.389% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/httpresponse.s          3625     3520     -105    -2.897% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s           2987     2971     -16     -0.536% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s                 4372     4340     -32     -0.732% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s            8634     8611     -23     -0.266% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s                 6189     6164     -25     -0.404% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s             8089     8073     -16     -0.198% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unsafeptr.s             2208     2177     -31     -1.404% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s           8050     8047     -3      -0.037% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s          3665     3629     -36     -0.982% 
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                           65773    65680    -93     -0.141% 
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s                  13328    13286    -42     -0.315% 
cmd/vendor/golang.org/x/tools/go/types/typeutil.s                        12263    12162    -101    -0.824% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s              1459     1421     -38     -2.605% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s              5208     5191     -17     -0.326% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unmarshal.s             1801     1782     -19     -1.055% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s            9569     9528     -41     -0.428% 
cmd/go/internal/work.s                                                   304928   304756   -172    -0.056% 
crypto/x509.s                                                            147340   147139   -201    -0.136% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s                34287    34019    -268    -0.782% 
crypto/tls.s                                                             311603   310644   -959    -0.308% 
cmd/oldlink/internal/ld.s                                                533115   532651   -464    -0.087% 
cmd/oldlink/internal/wasm.s                                              16484    16458    -26     -0.158% 
cmd/oldlink/internal/x86.s                                               18832    18830    -2      -0.011% 
cmd/link/internal/ld.s                                                   548200   547626   -574    -0.105% 
cmd/link/internal/wasm.s                                                 16760    16734    -26     -0.155% 
cmd/link/internal/arm64.s                                                20850    20840    -10     -0.048% 
cmd/link/internal/x86.s                                                  17437    17435    -2      -0.011% 
net/http.s                                                               556647   555519   -1128   -0.203% 
net/http/cookiejar.s                                                     15849    15833    -16     -0.101% 
expvar.s                                                                 9521     9508     -13     -0.137% 
net/http/httptest.s                                                      16471    16452    -19     -0.115% 
cmd/vendor/github.com/google/pprof/internal/plugin.s                     4266     4264     -2      -0.047% 
net/http/cgi.s                                                           23448    23428    -20     -0.085% 
cmd/go/internal/web.s                                                    16472    16428    -44     -0.267% 
net/http/httputil.s                                                      39672    39670    -2      -0.005% 
net/rpc.s                                                                33989    33965    -24     -0.071% 
net/http/fcgi.s                                                          19167    19162    -5      -0.026% 
cmd/vendor/github.com/google/pprof/internal/symbolz.s                    5861     5857     -4      -0.068% 
cmd/vendor/github.com/google/pprof/internal/binutils.s                   35842    35823    -19     -0.053% 
cmd/vendor/github.com/google/pprof/internal/symbolizer.s                 11449    11404    -45     -0.393% 
cmd/go/internal/get.s                                                    62726    62582    -144    -0.230% 
cmd/vendor/github.com/google/pprof/internal/report.s                     80032    80022    -10     -0.012% 
cmd/go/internal/modfetch/codehost.s                                      89005    88871    -134    -0.151% 
cmd/trace.s                                                              116607   116496   -111    -0.095% 
cmd/vendor/github.com/google/pprof/internal/driver.s                     143234   143207   -27     -0.019% 
cmd/vendor/github.com/google/pprof/driver.s                              9000     8998     -2      -0.022% 
cmd/go/internal/modfetch.s                                               126300   125726   -574    -0.454% 
cmd/pprof.s                                                              12317    12312    -5      -0.041% 
cmd/go/internal/modconv.s                                                17878    17861    -17     -0.095% 
cmd/go/internal/modload.s                                                150261   149763   -498    -0.331% 
cmd/go/internal/clean.s                                                  11122    11091    -31     -0.279% 
cmd/go/internal/help.s                                                   6523     6521     -2      -0.031% 
cmd/go/internal/generate.s                                               11627    11614    -13     -0.112% 
cmd/go/internal/envcmd.s                                                 22034    21986    -48     -0.218% 
cmd/go/internal/modget.s                                                 38478    38398    -80     -0.208% 
cmd/go/internal/modcmd.s                                                 46430    46229    -201    -0.433% 
cmd/go/internal/test.s                                                   64399    64374    -25     -0.039% 
cmd/compile/internal/ssa.s                                               3615264  3608276  -6988   -0.193% 
cmd/compile/internal/gc.s                                                1538865  1537625  -1240   -0.081% 
cmd/compile/internal/amd64.s                                             33593    33574    -19     -0.057% 
cmd/compile/internal/x86.s                                               30871    30852    -19     -0.062% 
total                                                                    19343565 19311284 -32281  -0.167% 

Change-Id: Ib030eb79458827a5a5b6d0d2f98765f8325a4d7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/222923
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-08 22:13:38 +00:00
Ruixin(Peter) Bao
b2790a2838 cmd/compile: allow floating point Ops to produce flags on s390x
On s390x, some floating point arithmetic instructions (FSUB, FADD)  generate flag.
This patch allows those related SSA ops to return a tuple, where the second argument of
the tuple is the generated flag. We can use the flag and remove the
subsequent comparison instruction (e.g: LTDBR).

This CL also reduces the .text section for math.test binary by 0.4KB.

Benchmarks:
name                    old time/op  new time/op  delta
Acos-18                 12.1ns ± 0%  12.1ns ± 0%     ~     (all equal)
Acosh-18                18.5ns ± 0%  18.5ns ± 0%     ~     (all equal)
Asin-18                 13.1ns ± 0%  13.1ns ± 0%     ~     (all equal)
Asinh-18                19.4ns ± 0%  19.5ns ± 1%     ~     (p=0.444 n=5+5)
Atan-18                 10.0ns ± 0%  10.0ns ± 0%     ~     (all equal)
Atanh-18                19.1ns ± 1%  19.2ns ± 2%     ~     (p=0.841 n=5+5)
Atan2-18                16.4ns ± 0%  16.4ns ± 0%     ~     (all equal)
Cbrt-18                 14.8ns ± 0%  14.8ns ± 0%     ~     (all equal)
Ceil-18                 0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Copysign-18             0.80ns ± 0%  0.80ns ± 0%     ~     (all equal)
Cos-18                  7.19ns ± 0%  7.19ns ± 0%     ~     (p=0.556 n=4+5)
Cosh-18                 12.4ns ± 0%  12.4ns ± 0%     ~     (all equal)
Erf-18                  10.8ns ± 0%  10.8ns ± 0%     ~     (all equal)
Erfc-18                 11.0ns ± 0%  11.0ns ± 0%     ~     (all equal)
Erfinv-18               23.0ns ±16%  26.8ns ± 1%  +16.90%  (p=0.008 n=5+5)
Erfcinv-18              23.3ns ±15%  26.1ns ± 7%     ~     (p=0.087 n=5+5)
Exp-18                  8.67ns ± 0%  8.67ns ± 0%     ~     (p=1.000 n=4+4)
ExpGo-18                50.8ns ± 3%  52.4ns ± 2%     ~     (p=0.063 n=5+5)
Expm1-18                9.49ns ± 1%  9.47ns ± 0%     ~     (p=1.000 n=5+5)
Exp2-18                 52.7ns ± 1%  50.5ns ± 3%   -4.10%  (p=0.024 n=5+5)
Exp2Go-18               50.6ns ± 1%  48.4ns ± 3%   -4.39%  (p=0.008 n=5+5)
Abs-18                  0.67ns ± 0%  0.67ns ± 0%     ~     (p=0.444 n=5+5)
Dim-18                  1.02ns ± 0%  1.03ns ± 0%   +0.98%  (p=0.008 n=5+5)
Floor-18                0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Max-18                  3.09ns ± 1%  3.05ns ± 0%   -1.42%  (p=0.008 n=5+5)
Min-18                  3.32ns ± 1%  3.30ns ± 0%   -0.72%  (p=0.016 n=5+4)
Mod-18                  62.3ns ± 1%  65.8ns ± 3%   +5.55%  (p=0.008 n=5+5)
Frexp-18                5.05ns ± 2%  4.98ns ± 0%     ~     (p=0.683 n=5+5)
Gamma-18                24.4ns ± 0%  24.1ns ± 0%   -1.23%  (p=0.008 n=5+5)
Hypot-18                10.3ns ± 0%  10.3ns ± 0%     ~     (all equal)
HypotGo-18              10.2ns ± 0%  10.2ns ± 0%     ~     (all equal)
Ilogb-18                3.56ns ± 1%  3.54ns ± 0%     ~     (p=0.595 n=5+5)
J0-18                    113ns ± 0%   108ns ± 1%   -4.42%  (p=0.016 n=4+5)
J1-18                    115ns ± 0%   109ns ± 1%   -4.87%  (p=0.016 n=4+5)
Jn-18                    240ns ± 0%   230ns ± 2%   -4.41%  (p=0.008 n=5+5)
Ldexp-18                6.19ns ± 0%  6.19ns ± 0%     ~     (p=0.444 n=5+5)
Lgamma-18               32.2ns ± 0%  32.2ns ± 0%     ~     (all equal)
Log-18                  13.1ns ± 0%  13.1ns ± 0%     ~     (all equal)
Logb-18                 4.23ns ± 0%  4.22ns ± 0%     ~     (p=0.444 n=5+5)
Log1p-18                12.7ns ± 0%  12.7ns ± 0%     ~     (all equal)
Log10-18                18.1ns ± 0%  18.2ns ± 0%     ~     (p=0.167 n=5+5)
Log2-18                 14.0ns ± 0%  14.0ns ± 0%     ~     (all equal)
Modf-18                 10.4ns ± 0%  10.5ns ± 0%   +0.96%  (p=0.016 n=4+5)
Nextafter32-18          11.3ns ± 0%  11.3ns ± 0%     ~     (all equal)
Nextafter64-18          4.01ns ± 1%  3.97ns ± 0%     ~     (p=0.333 n=5+4)
PowInt-18               32.7ns ± 0%  32.7ns ± 0%     ~     (all equal)
PowFrac-18              33.2ns ± 0%  33.1ns ± 0%     ~     (p=0.095 n=4+5)
Pow10Pos-18             1.58ns ± 0%  1.58ns ± 0%     ~     (all equal)
Pow10Neg-18             5.81ns ± 0%  5.81ns ± 0%     ~     (all equal)
Round-18                0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
RoundToEven-18          0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Remainder-18            40.6ns ± 0%  40.7ns ± 0%     ~     (p=0.238 n=5+4)
Signbit-18              1.57ns ± 0%  1.57ns ± 0%     ~     (all equal)
Sin-18                  6.75ns ± 0%  6.74ns ± 0%     ~     (p=0.333 n=5+4)
Sincos-18               29.5ns ± 0%  29.5ns ± 0%     ~     (all equal)
Sinh-18                 14.4ns ± 0%  14.4ns ± 0%     ~     (all equal)
SqrtIndirect-18         3.97ns ± 0%  4.15ns ± 0%   +4.59%  (p=0.008 n=5+5)
SqrtLatency-18          8.01ns ± 0%  8.01ns ± 0%     ~     (all equal)
SqrtIndirectLatency-18  11.6ns ± 0%  11.6ns ± 0%     ~     (all equal)
SqrtGoLatency-18        44.7ns ± 0%  45.0ns ± 0%   +0.67%  (p=0.008 n=5+5)
SqrtPrime-18            1.26µs ± 0%  1.27µs ± 0%   +0.63%  (p=0.029 n=4+4)
Tan-18                  11.1ns ± 0%  11.1ns ± 0%     ~     (all equal)
Tanh-18                 15.8ns ± 0%  15.8ns ± 0%     ~     (all equal)
Trunc-18                0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Y0-18                    113ns ± 2%   108ns ± 3%   -5.11%  (p=0.008 n=5+5)
Y1-18                    112ns ± 3%   107ns ± 0%   -4.29%  (p=0.000 n=5+4)
Yn-18                    229ns ± 0%   220ns ± 1%   -3.76%  (p=0.016 n=4+5)
Float64bits-18          1.09ns ± 0%  1.09ns ± 0%     ~     (all equal)
Float64frombits-18      0.55ns ± 0%  0.55ns ± 0%     ~     (all equal)
Float32bits-18          0.96ns ±16%  0.86ns ± 0%     ~     (p=0.563 n=5+5)
Float32frombits-18      1.03ns ±28%  0.84ns ± 0%     ~     (p=0.167 n=5+5)
FMA-18                  1.60ns ± 0%  1.60ns ± 0%     ~     (all equal)
[Geo mean]              10.0ns        9.9ns        -0.41%
Change-Id: Ief7e63ea5a8ba404b0a4696e12b9b7e0b05a9a03
Reviewed-on: https://go-review.googlesource.com/c/go/+/209160
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-08 20:57:58 +00:00
Austin Clements
da8591b61c all: remove darwin/arm build-tags and files
This removes all files that are only used on darwin/arm and cleans up
build tags in files that are still used on other platforms.

Updates #37611.

Change-Id: Ic9490cf0edfc157c6276a7ca950c1768b34a998f
Reviewed-on: https://go-review.googlesource.com/c/go/+/227197
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-08 18:35:43 +00:00
Dan Scales
0a820007e7 runtime: static lock ranking for the runtime (enabled by GOEXPERIMENT)
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.

Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. This can
happen if a deadlock happens (by acquiring a sequence of locks in
different orders), or if just one side of a possible deadlock happens.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.

Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.

See the constants at the top of runtime/lockrank.go to show the static
lock ranking that I ended up with, along with some comments. This is
great documentation of the current intended lock ordering when acquiring
multiple locks in the runtime.

I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.

I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (i.e. check the ranking when
multiple hchan locks are acquired).

Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.

For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.

Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.

To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.

Fixes #38029

Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619
Reviewed-by: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-07 21:51:03 +00:00
Michael Munday
bfd569fcb0 cmd/compile: delete the floating point Greater and Geq ops
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.

Fixes #37316.

Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-07 19:55:05 +00:00
Lynn Boger
815509ae31 cmd/compile: improve lowered moves and zeros for ppc64le
This change includes the following:
- Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9.
These instructions do not require an index register, which
allows more loads and stores within a loop without initializing
multiple index registers. The LoweredQuadXXX generate LXV/STXV.
- Create LoweredMoveXXXShort and LoweredZeroXXXShort for short
moves that don't generate loops, and therefore don't clobber the
address registers or flags.
- Use registers other than R3 and R4 to avoid conflicting with
registers that have already been allocated to avoid unnecessary
register moves.
- Eliminate the use of R14 as scratch register and use R31
instead.
- Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a
loop with more than 3 iterations.

This performance opportunity was noticed in github.com/golang/snappy
benchmarks. Results on power9:

WordsDecode1e1    54.1ns ± 0%    53.8ns ± 0%   -0.51%  (p=0.029 n=4+4)
WordsDecode1e2     287ns ± 0%     282ns ± 1%   -1.83%  (p=0.029 n=4+4)
WordsDecode1e3    3.98µs ± 0%    3.64µs ± 0%   -8.52%  (p=0.029 n=4+4)
WordsDecode1e4    66.9µs ± 0%    67.0µs ± 0%   +0.20%  (p=0.029 n=4+4)
WordsDecode1e5     723µs ± 0%     723µs ± 0%   -0.01%  (p=0.200 n=4+4)
WordsDecode1e6    7.21ms ± 0%    7.21ms ± 0%   -0.02%  (p=1.000 n=4+4)
WordsEncode1e1    29.9ns ± 0%    29.4ns ± 0%   -1.51%  (p=0.029 n=4+4)
WordsEncode1e2    2.12µs ± 0%    1.75µs ± 0%  -17.70%  (p=0.029 n=4+4)
WordsEncode1e3    11.7µs ± 0%    11.2µs ± 0%   -4.61%  (p=0.029 n=4+4)
WordsEncode1e4     119µs ± 0%     120µs ± 0%   +0.36%  (p=0.029 n=4+4)
WordsEncode1e5    1.21ms ± 0%    1.22ms ± 0%   +0.41%  (p=0.029 n=4+4)
WordsEncode1e6    12.0ms ± 0%    12.0ms ± 0%   +0.57%  (p=0.029 n=4+4)
RandomEncode       286µs ± 0%     203µs ± 0%  -28.82%  (p=0.029 n=4+4)
ExtendMatch       47.4µs ± 0%    47.0µs ± 0%   -0.85%  (p=0.029 n=4+4)

Change-Id: Iecad3a39ae55280286e42760a5c9d5c1168f5858
Reviewed-on: https://go-review.googlesource.com/c/go/+/226539
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-06 12:09:39 +00:00
Josh Bleecher Snyder
fff7509d47 cmd/compile: add intrinsic HasCPUFeature for checking cpu features
Before using some CPU instructions, we must check for their presence.
We use global variables in the runtime package to record features.

Prior to this CL, we issued a regular memory load for these features.
The downside to this is that, because it is a regular memory load,
it cannot be hoisted out of loops or otherwise reordered with other loads.

This CL introduces a new intrinsic just for checking cpu features.
It still ends up resulting in a memory load, but that memory load can
now be floated to the entry block and rematerialized as needed.

One downside is that the regular load could be combined with the comparison
into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE.
(It is possible that MOVBQZX+TESTQ+NE would be better.)

This CL does only amd64. It is easy to extend to other architectures.

For the benchmark in #36196, on my machine, this offers a mild speedup.

name      old time/op  new time/op  delta
FMA-8     1.39ns ± 6%  1.29ns ± 9%  -7.19%  (p=0.000 n=97+96)
NonFMA-8  2.03ns ±11%  2.04ns ±12%    ~     (p=0.618 n=99+98)

Updates #15808
Updates #36196

Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/212360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2020-04-04 01:01:04 +00:00
Dan Scales
ed7a8332c4 cmd/compile: allow mid-stack inlining when there is a cycle of recursion
We still disallow inlining for an immediately-recursive function, but allow
inlining if a function is in a recursion chain.

If all functions in the recursion chain are simple, then we could inline
forever down the recursion chain (eventually running out of stack on the
compiler), so we add a map to keep track of the functions we have
already inlined at a call site. We stop inlining when we reach a
function that we have already inlined in the recursive chain. Of course,
normally the inlining will have stopped earlier, because of the cost
function.

We could also limit the depth of inlining by a simple count (say, limit
max inlining of 10 at any given site). Would that limit other
opportunities too much?

Added a test in test/inline.go. runtime.BenchmarkStackCopyNoCache() is
also already a good test that triggers the check to stop inlining
when we reach the start of the recursive chain again.

For the bent benchmark suite, the performance improvement was mostly not
statistically significant, but the geomean averaged out to: -0.68%. The text size
increase was less than .1% for all bent benchmarks. The cmd/go text size increase
was 0.02% and the cmd/compile text size increase was .1%.

Fixes #29737

Change-Id: I892fa84bb07a947b3125ec8f25ed0e508bf2bdf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/226818
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-03 21:43:52 +00:00
Keith Randall
bba88467f8 cmd/compile: add indexed-load CMP instructions
Things like CMPQ 4(AX)(BX*8), CX

Fixes #37955

Change-Id: Icbed430f65c91a0e3f38a633d8321d79433ad8b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/224219
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-04-01 17:03:26 +00:00