mirror of https://github.com/golang/go synced 2024-11-14 06:50:21 -07:00
Commit Graph

246 Commits

Author SHA1 Message Date
Xiangdong Ji
e031318ca6 cmd/compile: ARM comparisons with 0 incorrect on overflow
Some ARM rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub calculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag:

  Block-Op         Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NEQ & PL

The patch also adds a few test cases to cover scenarios that are specific
to ARM and fine-tunes the code generation tests for 'x-const'.

For more details please refer to the previous fix on 64-bit ARM:
  https://go-review.googlesource.com/c/go/+/233097

Go1 perf, 'old' is the non-optimized version, that is, the compiler with all
the concerned rewriting rules removed.

name                     old time/op    new time/op     delta
BinaryTree17-8              7.73s ± 0%      7.81s ± 0%  +0.97%  (p=0.000 n=7+8)
Fannkuch11-8                7.06s ± 0%      7.00s ± 0%  -0.83%  (p=0.000 n=8+8)
FmtFprintfEmpty-8           181ns ± 1%      183ns ± 1%  +1.31%  (p=0.001 n=8+8)
FmtFprintfString-8          319ns ± 1%      325ns ± 2%  +1.71%  (p=0.009 n=7+8)
FmtFprintfInt-8             358ns ± 1%      359ns ± 1%    ~     (p=0.293 n=7+7)
FmtFprintfIntInt-8          459ns ± 3%      456ns ± 1%    ~     (p=0.869 n=8+8)
FmtFprintfPrefixedInt-8     535ns ± 4%      538ns ± 4%    ~     (p=0.572 n=8+8)
FmtFprintfFloat-8          1.01µs ± 2%     1.01µs ± 2%    ~     (p=0.625 n=8+8)
FmtManyArgs-8              1.93µs ± 2%     1.93µs ± 1%    ~     (p=0.979 n=8+7)
GobDecode-8                16.1ms ± 1%     16.5ms ± 1%  +2.32%  (p=0.000 n=8+8)
GobEncode-8                15.9ms ± 0%     15.8ms ± 1%  -1.00%  (p=0.000 n=8+7)
Gzip-8                      690ms ± 1%      670ms ± 0%  -2.90%  (p=0.000 n=8+8)
Gunzip-8                    109ms ± 1%      109ms ± 1%    ~     (p=0.694 n=7+8)
HTTPClientServer-8          149µs ± 3%      146µs ± 2%  -1.70%  (p=0.028 n=8+8)
JSONEncode-8               50.5ms ± 1%     49.2ms ± 0%  -2.60%  (p=0.001 n=7+7)
JSONDecode-8                135ms ± 2%      137ms ± 1%    ~     (p=0.054 n=8+7)
Mandelbrot200-8             951ms ± 0%      952ms ± 0%    ~     (p=0.852 n=6+8)
GoParse-8                  9.47ms ± 1%     9.66ms ± 1%  +2.01%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8       288ns ± 2%      277ns ± 2%  -3.61%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8      1.66µs ± 1%     1.69µs ± 2%  +2.21%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8       334ns ± 1%      305ns ± 2%  -8.86%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8      2.14µs ± 2%     2.15µs ± 0%    ~     (p=0.099 n=8+8)
RegexpMatchMedium_32-8     13.3ns ± 1%     13.3ns ± 0%    ~     (p=1.000 n=7+7)
RegexpMatchMedium_1K-8     81.1µs ± 3%     80.7µs ± 1%    ~     (p=0.955 n=7+8)
RegexpMatchHard_32-8       4.26µs ± 0%     4.26µs ± 0%    ~     (p=0.933 n=7+8)
RegexpMatchHard_1K-8        124µs ± 0%      124µs ± 0%  +0.31%  (p=0.000 n=8+8)
Revcomp-8                  14.7ms ± 2%     14.5ms ± 1%  -1.66%  (p=0.003 n=8+8)
Template-8                  197ms ± 2%      200ms ± 3%  +1.62%  (p=0.021 n=8+8)
TimeParse-8                1.33µs ± 1%     1.30µs ± 1%  -1.86%  (p=0.002 n=8+8)
TimeFormat-8               3.04µs ± 1%     3.02µs ± 0%  -0.60%  (p=0.000 n=8+8)

name                     old speed      new speed       delta
GobDecode-8              47.6MB/s ± 1%   46.5MB/s ± 1%  -2.28%  (p=0.000 n=8+8)
GobEncode-8              48.1MB/s ± 0%   48.6MB/s ± 1%  +1.02%  (p=0.000 n=8+7)
Gzip-8                   28.1MB/s ± 1%   29.0MB/s ± 0%  +2.97%  (p=0.000 n=8+8)
Gunzip-8                  178MB/s ± 1%    179MB/s ± 2%    ~     (p=0.694 n=7+8)
JSONEncode-8             38.4MB/s ± 1%   39.4MB/s ± 0%  +2.67%  (p=0.001 n=7+7)
JSONDecode-8             14.3MB/s ± 2%   14.2MB/s ± 1%  -0.81%  (p=0.043 n=8+7)
GoParse-8                6.12MB/s ± 1%   5.99MB/s ± 1%  -2.00%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8     111MB/s ± 2%    115MB/s ± 2%  +3.77%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8     618MB/s ± 1%    604MB/s ± 2%  -2.16%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8    95.7MB/s ± 1%  105.1MB/s ± 2%  +9.76%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8     479MB/s ± 2%    477MB/s ± 0%    ~     (p=0.105 n=8+8)
RegexpMatchMedium_32-8   75.2MB/s ± 1%   75.2MB/s ± 0%    ~     (p=0.247 n=7+7)
RegexpMatchMedium_1K-8   12.6MB/s ± 3%   12.7MB/s ± 1%    ~     (p=0.538 n=7+8)
RegexpMatchHard_32-8     7.52MB/s ± 0%   7.52MB/s ± 0%    ~     (p=0.968 n=7+8)
RegexpMatchHard_1K-8     8.26MB/s ± 0%   8.24MB/s ± 0%  -0.30%  (p=0.001 n=8+8)
Revcomp-8                 173MB/s ± 2%    176MB/s ± 1%  +1.68%  (p=0.003 n=8+8)
Template-8               9.85MB/s ± 2%   9.69MB/s ± 3%  -1.59%  (p=0.021 n=8+8)

Fixes   #39303
Updates #38740

Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36
Reviewed-on: https://go-review.googlesource.com/c/go/+/236637
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-06-09 15:50:33 +00:00
Xiangdong Ji
e8f5a33191 cmd/compile: fix incorrect rewriting to if condition
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub calculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag, in the following categories:

  Block-Op         Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NEQ & PL

The backend generates two consecutive branch instructions for 'LEnoov'
and 'GTnoov' to model their expected behavior. A slight change to 'gc'
and amd64/386 backends is made to unify the code generation.

Add a test 'TestCondRewrite' as justification. It covers 32 incorrect rules
identified on arm64; more might be needed on other arches, like 32-bit arm.

Add two benchmarks profiling the aforementioned categories 1&2 and categories
3&4 separately. We expect the first two categories to show a performance
improvement and the second two not to show a visible regression compared with
the non-optimized version.

This change also updates TestFormats to support using %#x.

Examples showing where the issue comes from:
  1: 'if x + 3 < 0' might be converted to:
  before:
    CMN $3, R0
    BGE <else branch> // wrong branch is taken if 'x+3' overflows
  after:
    CMN $3, R0
    BPL <else branch>

  2: 'if y - 3 > 0' might be converted to:
  before:
    CMP $3, R0
    BLE <else branch> // wrong branch is taken if 'y-3' underflows
  after:
    CMP $3, R0
    BMI <else branch>
    BEQ <else branch>

Benchmark data from different kinds of arm64 servers. 'old' is the non-optimized
version (not the parent commit); the optimized version generally outperforms it.

S1:
name                    old time/op  new time/op  delta
CondRewrite/SoloJump  13.6ns ± 0%  12.9ns ± 0%  -5.15%  (p=0.000 n=10+10)
CondRewrite/CombJump  13.8ns ± 1%  12.9ns ± 0%  -6.32%  (p=0.000 n=10+10)

S2:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  11.6ns ± 0%  10.9ns ± 0%  -6.03%  (p=0.000 n=10+10)
CondRewrite/CombJump  11.4ns ± 0%  10.8ns ± 1%  -5.53%  (p=0.000 n=10+10)

S3:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  7.36ns ± 0%  7.50ns ± 0%  +1.79%  (p=0.000 n=9+10)
CondRewrite/CombJump  7.35ns ± 0%  7.75ns ± 0%  +5.51%  (p=0.000 n=8+9)

S4:
name                      old time/op  new time/op  delta
CondRewrite/SoloJump-224  11.5ns ± 1%  10.9ns ± 0%  -4.97%  (p=0.000 n=10+10)
CondRewrite/CombJump-224  11.9ns ± 0%  11.5ns ± 0%  -2.95%  (p=0.000 n=10+10)

S5:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  10.0ns ± 0%  10.0ns ± 0%  -0.45%  (p=0.000 n=9+10)
CondRewrite/CombJump  9.93ns ± 0%  9.77ns ± 0%  -1.53%  (p=0.000 n=10+9)

Go1 perf. data:

name                     old time/op    new time/op    delta
BinaryTree17              6.29s ± 1%     6.30s ± 1%    ~     (p=1.000 n=5+5)
Fannkuch11                5.40s ± 0%     5.40s ± 0%    ~     (p=0.841 n=5+5)
FmtFprintfEmpty          97.9ns ± 0%    98.9ns ± 3%    ~     (p=0.937 n=4+5)
FmtFprintfString          171ns ± 3%     171ns ± 2%    ~     (p=0.754 n=5+5)
FmtFprintfInt             212ns ± 0%     217ns ± 6%  +2.55%  (p=0.008 n=5+5)
FmtFprintfIntInt          296ns ± 1%     297ns ± 2%    ~     (p=0.516 n=5+5)
FmtFprintfPrefixedInt     371ns ± 2%     374ns ± 7%    ~     (p=1.000 n=5+5)
FmtFprintfFloat           435ns ± 1%     439ns ± 2%    ~     (p=0.056 n=5+5)
FmtManyArgs              1.37µs ± 1%    1.36µs ± 1%    ~     (p=0.730 n=5+5)
GobDecode                14.6ms ± 4%    14.4ms ± 4%    ~     (p=0.690 n=5+5)
GobEncode                11.8ms ±20%    11.6ms ±15%    ~     (p=1.000 n=5+5)
Gzip                      507ms ± 0%     491ms ± 0%  -3.22%  (p=0.008 n=5+5)
Gunzip                   73.8ms ± 0%    73.9ms ± 0%    ~     (p=0.690 n=5+5)
HTTPClientServer          116µs ± 0%     116µs ± 0%    ~     (p=0.686 n=4+4)
JSONEncode               21.8ms ± 1%    21.6ms ± 2%    ~     (p=0.151 n=5+5)
JSONDecode                104ms ± 1%     103ms ± 1%  -1.08%  (p=0.016 n=5+5)
Mandelbrot200            9.53ms ± 0%    9.53ms ± 0%    ~     (p=0.421 n=5+5)
GoParse                  7.55ms ± 1%    7.51ms ± 1%    ~     (p=0.151 n=5+5)
RegexpMatchEasy0_32       158ns ± 0%     158ns ± 0%    ~     (all equal)
RegexpMatchEasy0_1K       606ns ± 1%     608ns ± 3%    ~     (p=0.937 n=5+5)
RegexpMatchEasy1_32       143ns ± 0%     144ns ± 1%    ~     (p=0.095 n=5+4)
RegexpMatchEasy1_1K       927ns ± 2%     944ns ± 2%    ~     (p=0.056 n=5+5)
RegexpMatchMedium_32     16.0ns ± 0%    16.0ns ± 0%    ~     (all equal)
RegexpMatchMedium_1K     69.3µs ± 2%    69.7µs ± 0%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32       3.73µs ± 0%    3.73µs ± 1%    ~     (p=0.984 n=5+5)
RegexpMatchHard_1K        111µs ± 1%     110µs ± 0%    ~     (p=0.151 n=5+5)
Revcomp                   1.91s ±47%     1.77s ±68%    ~     (p=1.000 n=5+5)
Template                  138ms ± 1%     138ms ± 1%    ~     (p=1.000 n=5+5)
TimeParse                 787ns ± 2%     785ns ± 1%    ~     (p=0.540 n=5+5)
TimeFormat                729ns ± 1%     726ns ± 1%    ~     (p=0.151 n=5+5)

Updates #38740
Change-Id: I06c604874acdc1e63e66452dadee5df053045222
Reviewed-on: https://go-review.googlesource.com/c/go/+/233097
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2020-05-29 15:39:54 +00:00
Keith Randall
2cb10d42b7 cmd/compile: in prove, zero right shifts of positive int by #bits - 1
Taking over Zach's CL 212277. Just cleaned up and added a test.

For a positive, signed integer, an arithmetic right shift of count
(bit-width - 1) equals zero. e.g. int64(22) >> 63 -> 0. This CL makes
prove replace these right shifts with a zero-valued constant.

These shifts may arise in source code explicitly, but can also be
created by the generic rewrite of signed division by a power of 2.
// Signed divide by power of 2.
// n / c =       n >> log(c) if n >= 0
//       = (n+c-1) >> log(c) if n < 0
// We conditionally add c-1 by adding n>>63>>(64-log(c))
// (first shift signed, second shift unsigned).
(Div64 <t> n (Const64 [c])) && isPowerOfTwo(c) ->
  (Rsh64x64
    (Add64 <t> n (Rsh64Ux64 <t>
    	(Rsh64x64 <t> n (Const64 <typ.UInt64> [63]))
	(Const64 <typ.UInt64> [64-log2(c)])))
    (Const64 <typ.UInt64> [log2(c)]))

If n is known to be positive, this rewrite includes an extra Add and 2
extra Rsh. This CL will allow prove to replace one of the extra Rsh with
a 0. That replacement then allows lateopt to remove all the unnecessary
fixups from the generic rewrite.
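
The shift-based fixup can be checked in ordinary Go. The sketch below is only an illustration of the arithmetic in the rule's comment, not the compiler's rewrite; `divPow2` is a hypothetical helper and assumes 0 < k < 64.

```go
package main

import "fmt"

// divPow2 returns n / (1 << k) for 0 < k < 64 using two shifts and an add,
// mirroring the generic rewrite of signed division by a power of two.
func divPow2(n int64, k uint) int64 {
	sign := n >> 63                        // signed shift: 0 if n >= 0, -1 if n < 0
	fix := int64(uint64(sign) >> (64 - k)) // unsigned shift: c-1 if n < 0, else 0
	return (n + fix) >> k
}

func main() {
	// For a non-negative value the first shift is always zero, so the
	// whole fixup term vanishes and only the final shift remains.
	fmt.Println(int64(22) >> 63)

	for _, n := range []int64{22, 7, -7, -8, -1} {
		fmt.Println(n, divPow2(n, 3), n/8)
	}
}
```

When n is known to be non-negative, `sign` is 0, so `fix` is 0 and the function reduces to a single right shift.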

There is a rewrite rule to handle this case directly:
(Div64 n (Const64 [c])) && isNonNegative(n) && isPowerOfTwo(c) ->
	(Rsh64Ux64 n (Const64 <typ.UInt64> [log2(c)]))
But this implementation of isNonNegative really only handles constants
and a few special operations like len/cap. The division could be
handled if the factsTable version of isNonNegative were available.
Unfortunately, the first opt pass happens before prove even has a
chance to deduce the numerator is non-negative, so the generic rewrite
has already fired and created the extra Ops discussed above.

Fixes #36159

By Printf count, this zeroes 137 right shifts when building std and cmd.

Change-Id: Iab486910ac9d7cfb86ace2835456002732b384a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/232857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-05-11 16:23:52 +00:00
Martin Möhrmann
6ed4661807 cmd/compile: optimize make+copy pattern to avoid memclr
match:
 m = make([]T, x); copy(m, s)
for pointer free T and x==len(s) rewrite to:
 m = mallocgc(x*elemsize(T), nil, false); memmove(&m, &s, x*elemsize(T))
otherwise rewrite to:
 m = makeslicecopy([]T, x, s)

This avoids memclear and shading of pointers in the newly created slice
before the copy.

With this CL "s" is only allowed to be a variable and not a more
complex expression. This restriction could be lifted in future versions
of this optimization when it can be proven that "s" is not referencing "m".
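
For reference, source code of the following shape is what the pattern above describes. This is a minimal sketch; `cloneInts` is a hypothetical name, and whether a given toolchain version actually applies the optimization to it is an assumption, not something this example verifies.

```go
package main

import "fmt"

// cloneInts copies s using the make+copy pattern: the length passed to make
// is len(s), and the copy source is a plain variable rather than a more
// complex expression.
func cloneInts(s []int) []int {
	m := make([]int, len(s))
	copy(m, s)
	return m
}

func main() {
	src := []int{1, 2, 3}
	dst := cloneInts(src)
	dst[0] = 42 // dst has its own backing array, so src is unchanged
	fmt.Println(src, dst)
}
```

Because every element of m is overwritten by the copy, zeroing the fresh allocation first is wasted work, which is what the rewrite avoids.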

Triggers 450 times during make.bash.
Reduces go binary size by ~8 kbyte.

name                           old time/op  new time/op  delta
MakeSliceCopy/mallocmove/Byte  71.1ns ± 1%  65.8ns ± 0%  -7.49%  (p=0.000 n=10+9)
MakeSliceCopy/mallocmove/Int   71.2ns ± 1%  66.0ns ± 0%  -7.27%  (p=0.000 n=10+8)
MakeSliceCopy/mallocmove/Ptr    104ns ± 4%    99ns ± 1%  -5.13%  (p=0.000 n=10+10)
MakeSliceCopy/makecopy/Byte    70.3ns ± 0%  68.0ns ± 0%  -3.22%  (p=0.000 n=10+9)
MakeSliceCopy/makecopy/Int     70.3ns ± 0%  68.5ns ± 1%  -2.59%  (p=0.000 n=9+10)
MakeSliceCopy/makecopy/Ptr      102ns ± 0%    99ns ± 1%  -2.97%  (p=0.000 n=9+9)
MakeSliceCopy/nilappend/Byte   75.4ns ± 0%  74.9ns ± 2%  -0.63%  (p=0.015 n=9+9)
MakeSliceCopy/nilappend/Int    75.6ns ± 0%  76.4ns ± 3%    ~     (p=0.245 n=9+10)
MakeSliceCopy/nilappend/Ptr     107ns ± 0%   108ns ± 1%  +0.93%  (p=0.005 n=9+10)

Fixes #26252

Change-Id: Iec553dd1fef6ded16197216a472351c8799a8e71
Reviewed-on: https://go-review.googlesource.com/c/go/+/146719
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-05-07 17:50:24 +00:00
Keith Randall
9ed0fb42e3 cmd/compile: add indexed memory modification ops to amd64
name            old time/op  new time/op  delta
Modify-16        404ns ± 1%   365ns ± 1%  -9.73%  (p=0.000 n=10+10)
ConstModify-16   407ns ± 0%   385ns ± 2%  -5.56%  (p=0.000 n=9+10)

Seems to generally help generated code.

Binary size change is in the noise.

Change-Id: I57891bfaf0f7dfc5d143bb9f7ebafc7079d2614f
Reviewed-on: https://go-review.googlesource.com/c/go/+/228098
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-04-30 17:21:31 +00:00
Keith Randall
882ec701d2 cmd/compile: add indexed load+op operations to amd64
name        old time/op  new time/op  delta
LoadAdd-16   545ns ± 0%   456ns ± 0%  -16.31%  (p=0.000 n=10+10)

Update #36468

Change-Id: I84f390d55490648fa1f58cdbc24fd74c4f1bc8c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227960
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-04-30 17:19:57 +00:00
Josh Bleecher Snyder
4a7e363288 cmd/compile: optimize Move with all-zero ro sym src to Zero
We set up static symbols during walk that
we later make copies of to initialize local variables.
It is difficult to ascertain at that time exactly
when copying a symbol is profitable vs locally
initializing an autotmp.

During SSA, we are much better placed to optimize.
This change recognizes when we are copying from a
global readonly all-zero symbol and replaces it with
direct zeroing.

This often allows the all-zero symbol to be
deadcode eliminated at link time.
This is not ideal--it makes for large object files,
and longer link times--but it is the cleanest fix I could find.

This makes the final binary for the program in #38554
shrink from >500mb to ~2.2mb.

It also shrinks the standard binaries:

file      before    after     Δ       %
addr2line 4412496   4404304   -8192   -0.186%
buildid   2893816   2889720   -4096   -0.142%
cgo       4841048   4832856   -8192   -0.169%
compile   19926480  19922432  -4048   -0.020%
cover     5281816   5277720   -4096   -0.078%
link      6734648   6730552   -4096   -0.061%
nm        4366240   4358048   -8192   -0.188%
objdump   4755968   4747776   -8192   -0.172%
pprof     14653060  14612100  -40960  -0.280%
trace     11805940  11777268  -28672  -0.243%
vet       7185560   7181416   -4144   -0.058%
total     113588440 113465560 -122880 -0.108%

And not just by removing unnecessary symbols;
the program text shrinks a bit as well.

Fixes #38554

Change-Id: I8381ae6084ae145a5e0cd9410c451e52c0dc51c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/229704
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-24 23:58:10 +00:00
Josh Bleecher Snyder
e7c1873691 cmd/compile: optimize x & 1 != 0 to x & 1 on amd64
Triggers a handful of times in std+cmd.

Change-Id: I9bb8ce9a5f8bae2547cb61157cd8f256e1b63e76
Reviewed-on: https://go-review.googlesource.com/c/go/+/229602
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-23 17:52:28 +00:00
Michael Munday
ab7a65f283 cmd/compile: clean up codegen for branch-on-carry on s390x
This CL optimizes code that uses a carry from a function such as
bits.Add64 as the condition in an if statement. For example:

    x, c := bits.Add64(a, b, 0)
    if c != 0 {
        panic("overflow")
    }

Rather than converting the carry into a 0 or a 1 value and using
that as an input to a comparison instruction the carry flag is now
used as the input to a conditional branch directly. This typically
removes an ADD LOGICAL WITH CARRY instruction when user code is
doing overflow detection and is closer to the code that a user
would expect to be generated.

Change-Id: I950431270955ab72f1b5c6db873b6abe769be0da
Reviewed-on: https://go-review.googlesource.com/c/go/+/219757
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-22 20:11:06 +00:00
Michael Munday
e464d7d797 cmd/compile: optimize comparisons with immediates on s390x
When generating code for unsigned equals (==) and not equals (!=)
comparisons we currently, on s390x, always use signed comparisons.

This mostly works well. However, signed comparisons on s390x sign
extend their immediates and unsigned comparisons zero extend them.
For compare-and-branch instructions which can only have 8-bit
immediates this significantly changes the range of immediate values
we can represent: [-128, 127] for signed comparisons and [0, 255]
for unsigned comparisons.

When generating equals and not equals checks we don't need to worry
about whether the comparison is signed or unsigned. This CL
therefore adds rules to allow us to switch signedness for such
comparisons if it means that it brings a constant into range for an
8-bit immediate.

For example, a signed equals with an integer in the range [128, 255]
will now be implemented using an unsigned compare-and-branch
instruction rather than separate compare and branch instructions.
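
The range argument can be sketched in Go. The helper names below (`fitsInt8`, `fitsUint8`, `pickCompare`) are hypothetical and only model the decision described in this message; they are not the rewrite rules themselves.

```go
package main

import "fmt"

// fitsInt8 reports whether c is representable as a sign-extended 8-bit immediate.
func fitsInt8(c int64) bool { return c >= -128 && c <= 127 }

// fitsUint8 reports whether c is representable as a zero-extended 8-bit immediate.
func fitsUint8(c int64) bool { return c >= 0 && c <= 255 }

// pickCompare names the instruction form that could be chosen for "x == c".
// Equality does not depend on signedness, so either form gives the same answer.
func pickCompare(c int64) string {
	switch {
	case fitsInt8(c):
		return "signed compare-and-branch"
	case fitsUint8(c):
		return "unsigned compare-and-branch"
	default:
		return "separate compare and branch"
	}
}

func main() {
	for _, c := range []int64{-1, 100, 200, 1000} {
		fmt.Println(c, pickCompare(c))
	}
	// The same bit pattern compares equal under both interpretations.
	x := int64(200)
	fmt.Println(x == 200, uint64(x) == uint64(200))
}
```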

As part of this change I've also added support for adding a name
to block control values using the same `x:(...)` syntax we use for
value rules.

Triggers 792 times when compiling cmd and std.

Change-Id: I77fa80a128f0a8ce51a2888d1e384bd5e9b61a77
Reviewed-on: https://go-review.googlesource.com/c/go/+/228642
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-21 19:23:51 +00:00
alex-semenyuk
876c1feb7d test/codegen, runtime/pprof, runtime: apply fmt
Change-Id: Ife4e065246729319c39e57a4fbd8e6f7b37724e1
GitHub-Last-Rev: e71803eaeb
GitHub-Pull-Request: golang/go#38527
Reviewed-on: https://go-review.googlesource.com/c/go/+/228901
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
2020-04-21 09:07:42 +00:00
Josh Bleecher Snyder
50b11318fe cmd/compile: use oneBit instead of isPowerOfTwo in bit optimization
This optimization works on any integer with exactly one bit set.
This is identical to being a power of two, except for the
most negative number. Use oneBit instead.

The rule now triggers in a few more places in std+cmd,
in packages encoding/asn1, crypto/elliptic, and
vendor/golang.org/x/crypto/cryptobyte.

This change obviates the need for CL 222479
by doing this optimization consistently in the compiler.

Change-Id: I983c6235290fdc634fda5e11b10f1f8ce041272f
Reviewed-on: https://go-review.googlesource.com/c/go/+/229124
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2020-04-21 00:38:34 +00:00
David Chase
e4e192484b cmd/compile: split up the addressing mode on OpAMD64CMP*loadidx* always
Benchmarking suggests that the combo instruction is notably slower,
at least in the places where we measure.

Updates #37955

Change-Id: I829f1975dd6edf38163128ba51d84604055512f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/228157
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-15 18:09:14 +00:00
Lynn Boger
a1550d3ca3 cmd/compile: use isel with variable shifts on ppc64x
This changes the code generated for variable length shift
counts to use isel instead of instructions that set and
read the carry flag.

This reduces the generated code for shifts like this
by 1 instruction and avoids the use of instructions to
set and read the carry flag.

This sequence can be found in strconv with these results
on power9:

Atof64Decimal                          71.6ns ± 0%  68.3ns ± 0%   -4.61%
Atof64Float                            95.3ns ± 0%  90.9ns ± 0%   -4.62%
Atof64FloatExp                          153ns ± 0%   149ns ± 0%   -2.61%
Atof64Big                               234ns ± 0%   232ns ± 0%   -0.85%
Atof64RandomBits                        348ns ± 0%   369ns ± 0%   +6.03%
Atof64RandomFloats                      262ns ± 0%   262ns ± 0%     ~
Atof32Decimal                          72.0ns ± 0%  68.2ns ± 0%   -5.28%
Atof32Float                            92.1ns ± 0%  87.1ns ± 0%   -5.43%
Atof32FloatExp                          159ns ± 0%   158ns ± 0%   -0.63%
Atof32Random                            194ns ± 0%   191ns ± 0%   -1.55%

Some tests in codegen/shift.go are enabled to verify the
expected instructions are generated.

Change-Id: I968715d10ada405a8c46132bf19b8ed9b85796d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227337
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-09 19:18:56 +00:00
Josh Bleecher Snyder
ade0811dc8 cmd/compile: handle some additional phis in shortcircuit
Prior to this change, the shortcircuit pass could only
handle blocks containing only a single phi control value,
possibly wrapped in some OpNot and OpCopy values.

This change partially lifts this limitation.
It handles some cases in which the block contains other phi values.
This appears to happen most commonly in cases in which
the conditionals being checked involve the memory state,
in which case there is a phi memory value in the block.

The general idea here is to use the information we have about
the CFG to (1) move the other phi values into other blocks
and/or (2) rewrite uses of the other phi values in other blocks.

For example, consider this CFG:

p   q
 \ /
  b
 / \
t   u

And consider a phi value v in block b.
We'll write v = Phi(p: x, q: y) to say that v has value x corresponding
to inbound block p, and value y for block q.

We will rewrite this CFG to:

p    q
|   /
|  b
|/  \
t    u

What should we do with v?

Any uses of v in u can be replaced with y. Why?
If we are in block u, we came from b, and before that from q.
If prior to b we came from p, then we would have gone to t, not u.
Since we came from q, we know that v took the value y.

Uses of v in t are a bit more complicated.
It is going to end up being a phi value: Phi(p: ?, b: ?).

Suppose, after the rewrite, we came from block p.
Then, before the rewrite, we would have gone to b,
where v would have the value x.
So we have Phi(p: x, b: ?).

Suppose, after the rewrite, we came from block b.
Then we must have come from block q.
If we come from block q, v has value y.
So we have Phi(p: x, b: y).
Uses of v in t can thus be replaced with a new phi value,
with the same values as v, but with altered predecessors.

Similar reasoning can be employed to rewrite or replace
other uses of v elsewhere in the CFG, so that v itself can be eliminated,
and the CFG rewrite can proceed.
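
The argument above can be checked with a toy model. This is a sketch under simplifying assumptions, not the shortcircuit pass: `before` and `after` are hypothetical functions that take the predecessor ("p" or "q") and the branch outcome in b, and return the successor reached and the value observed for v there.

```go
package main

import "fmt"

// before models the original CFG: p and q both enter b, where
// v = Phi(p: x, q: y). Arriving from p always continues to t;
// arriving from q continues to t or u depending on cond.
func before(pred string, cond bool) (dest, v string) {
	if pred == "p" {
		v = "x"
		return "t", v
	}
	v = "y"
	if cond {
		return "t", v
	}
	return "u", v
}

// after models the rewritten CFG: p jumps straight to t and only q enters b.
func after(pred string, cond bool) (dest, v string) {
	if pred == "p" {
		return "t", "x" // new phi in t, input for edge p
	}
	if cond {
		return "t", "y" // new phi in t, input for edge b
	}
	return "u", "y" // uses of v in u are replaced by y
}

func main() {
	for _, pred := range []string{"p", "q"} {
		for _, cond := range []bool{true, false} {
			d1, v1 := before(pred, cond)
			d2, v2 := after(pred, cond)
			fmt.Println(pred, cond, d1, v1, d1 == d2 && v1 == v2)
		}
	}
}
```

Every path reaches the same block with the same value of v in both models, which is the property the phi rewrite has to preserve.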

This change sets up the infrastructure for such optimizations
and adds a few cheap ones. All optimizations in this change depend
only on the shape of the CFG; future changes may also depend on where
v's uses are. That analysis is more powerful but more expensive,
and should be done incrementally.

The use of closures here is perhaps a bit unusual,
but during development it proved critical to having readable code.
We must decide early on whether we can safely do the CFG modifications,
and then later fix up the phis if so.
Safely storing state and decisions across these two phases is hard to do readably.
Closures solve the problem neatly.

I manually instrumented the code paths in shortcircuitPhiPlan.
During make.bash there are nearly 6000 invocations.
The least-visited code path gets run 85 times,
so all the code in this CL is reasonably well-exercised.

Here is a concrete example of code improved by this change:

func f(e interface{}) int {
	if x, ok := e.(int); ok {
		return x
	}
	return 0
}

Omitting PCDATA, FUNCDATA, and the like, it used to compile to:

"".f STEXT nosplit size=50 args=0x18 locals=0x0
	0x0000 00000 (x.go:4)	LEAQ	type.int(SB), AX
	0x0007 00007 (x.go:4)	MOVQ	"".e+8(SP), CX
	0x000c 00012 (x.go:4)	CMPQ	AX, CX
	0x000f 00015 (x.go:4)	JNE	43
	0x0011 00017 (x.go:4)	MOVQ	"".e+16(SP), AX
	0x0016 00022 (x.go:4)	MOVQ	(AX), AX
	0x0019 00025 (x.go:4)	JNE	33
	0x001b 00027 (x.go:5)	MOVQ	AX, "".~r1+24(SP)
	0x0020 00032 (x.go:5)	RET
	0x0021 00033 (x.go:7)	MOVQ	$0, "".~r1+24(SP)
	0x002a 00042 (x.go:7)	RET
	0x002b 00043 (x.go:7)	MOVL	$0, AX
	0x0030 00048 (x.go:4)	JMP	25

Afterwards, it compiles to:

"".f STEXT nosplit size=41 args=0x18 locals=0x0
	0x0000 00000 (x.go:4)	LEAQ	type.int(SB), AX
	0x0007 00007 (x.go:4)	MOVQ	"".e+8(SP), CX
	0x000c 00012 (x.go:4)	CMPQ	AX, CX
	0x000f 00015 (x.go:4)	JNE	31
	0x0011 00017 (x.go:4)	MOVQ	"".e+16(SP), AX
	0x0016 00022 (x.go:4)	MOVQ	(AX), AX
	0x0019 00025 (x.go:5)	MOVQ	AX, "".~r1+24(SP)
	0x001e 00030 (x.go:5)	RET
	0x001f 00031 (x.go:7)	MOVQ	$0, "".~r1+24(SP)
	0x0028 00040 (x.go:7)	RET

Note that there is now only a single JNE and a single RET $0 path.

Updates #37608

Has a minor good effect on compilation speed and memory use.

Provides widespread improvements to generated code.
The rare, minor regressions I have investigated are due to
register allocation fluctuations.

file      before    after     Δ       %       
addr2line 4376080   4371984   -4096   -0.094% 
api       5945400   5933112   -12288  -0.207% 
asm       5034312   5030216   -4096   -0.081% 
buildid   2844952   2840856   -4096   -0.144% 
cgo       4812872   4804680   -8192   -0.170% 
compile   19622064  19610368  -11696  -0.060% 
cover     5236648   5232552   -4096   -0.078% 
dist      3658312   3654216   -4096   -0.112% 
doc       4653512   4649416   -4096   -0.088% 
fix       3370072   3365976   -4096   -0.122% 
link      6671864   6667768   -4096   -0.061% 
pprof     14781652  14761172  -20480  -0.139% 
trace     11639684  11627396  -12288  -0.106% 
vet       8252280   8231800   -20480  -0.248% 
total     115052984 114934792 -118192 -0.103% 


file                                                                     before   after    Δ       %       
internal/cpu.s                                                           3298     3296     -2      -0.061% 
internal/bytealg.s                                                       1730     1737     +7      +0.405% 
cmd/vendor/golang.org/x/mod/semver.s                                     7332     7283     -49     -0.668% 
image/color.s                                                            8248     8156     -92     -1.115% 
math.s                                                                   35966    35956    -10     -0.028% 
math/cmplx.s                                                             6596     6575     -21     -0.318% 
runtime.s                                                                480566   480053   -513    -0.107% 
sync.s                                                                   16408    16385    -23     -0.140% 
math/rand.s                                                              10447    10406    -41     -0.392% 
internal/reflectlite.s                                                   28408    28366    -42     -0.148% 
errors.s                                                                 2736     2701     -35     -1.279% 
sort.s                                                                   17031    17036    +5      +0.029% 
io.s                                                                     16993    16964    -29     -0.171% 
container/heap.s                                                         2006     1997     -9      -0.449% 
text/tabwriter.s                                                         9570     9552     -18     -0.188% 
bytes.s                                                                  31823    31594    -229    -0.720% 
strconv.s                                                                52760    52717    -43     -0.082% 
vendor/golang.org/x/text/transform.s                                     16713    16706    -7      -0.042% 
strings.s                                                                42590    42563    -27     -0.063% 
bufio.s                                                                  22883    22785    -98     -0.428% 
encoding/base32.s                                                        9586     9531     -55     -0.574% 
syscall.s                                                                82237    82243    +6      +0.007% 
image.s                                                                  37465    37452    -13     -0.035% 
regexp/syntax.s                                                          82827    82769    -58     -0.070% 
image/draw.s                                                             18698    18584    -114    -0.610% 
image/jpeg.s                                                             36560    36549    -11     -0.030% 
time.s                                                                   82557    82526    -31     -0.038% 
context.s                                                                10863    10820    -43     -0.396% 
regexp.s                                                                 64114    64049    -65     -0.101% 
os.s                                                                     51751    51524    -227    -0.439% 
reflect.s                                                                168240   168049   -191    -0.114% 
cmd/go/internal/lockedfile/internal/filelock.s                           2317     2290     -27     -1.165% 
path/filepath.s                                                          17831    17766    -65     -0.365% 
io/ioutil.s                                                              6994     6990     -4      -0.057% 
encoding/binary.s                                                        30791    30726    -65     -0.211% 
cmd/vendor/golang.org/x/sys/unix.s                                       78055    78033    -22     -0.028% 
encoding/pem.s                                                           9280     9247     -33     -0.356% 
crypto/cipher.s                                                          20376    20374    -2      -0.010% 
os/exec.s                                                                29229    29140    -89     -0.304% 
internal/goroot.s                                                        4588     4579     -9      -0.196% 
cmd/internal/browser.s                                                   2246     2240     -6      -0.267% 
cmd/vendor/golang.org/x/crypto/ssh/terminal.s                            27183    27149    -34     -0.125% 
fmt.s                                                                    76625    76484    -141    -0.184% 
encoding/hex.s                                                           6154     6152     -2      -0.032% 
compress/lzw.s                                                           7063     7059     -4      -0.057% 
database/sql/driver.s                                                    18875    18862    -13     -0.069% 
debug/plan9obj.s                                                         8268     8266     -2      -0.024% 
net/url.s                                                                29724    29719    -5      -0.017% 
encoding/csv.s                                                           12872    12856    -16     -0.124% 
debug/gosym.s                                                            25303    25268    -35     -0.138% 
compress/flate.s                                                         50952    51019    +67     +0.131% 
compress/zlib.s                                                          7277     7266     -11     -0.151% 
archive/zip.s                                                            42155    42111    -44     -0.104% 
debug/dwarf.s                                                            107632   107541   -91     -0.085% 
database/sql.s                                                           98373    98028    -345    -0.351% 
os/user.s                                                                14722    14708    -14     -0.095% 
encoding/json.s                                                          105836   105711   -125    -0.118% 
debug/macho.s                                                            32598    32560    -38     -0.117% 
encoding/gob.s                                                           136478   135755   -723    -0.530% 
debug/pe.s                                                               31160    30869    -291    -0.934% 
debug/elf.s                                                              63495    63302    -193    -0.304% 
vendor/golang.org/x/text/unicode/bidi.s                                  27220    27217    -3      -0.011% 
vendor/golang.org/x/text/secure/bidirule.s                               3363     3352     -11     -0.327% 
go/token.s                                                               12036    12035    -1      -0.008% 
flag.s                                                                   22277    22256    -21     -0.094% 
mime.s                                                                   39696    39509    -187    -0.471% 
go/scanner.s                                                             19033    19020    -13     -0.068% 
archive/tar.s                                                            70936    70581    -355    -0.500% 
internal/xcoff.s                                                         22823    22820    -3      -0.013% 
text/scanner.s                                                           11631    11629    -2      -0.017% 
encoding/xml.s                                                           110534   110408   -126    -0.114% 
math/big.s                                                               183636   183545   -91     -0.050% 
image/gif.s                                                              27376    27343    -33     -0.121% 
crypto/dsa.s                                                             6029     5969     -60     -0.995% 
image/png.s                                                              42947    42939    -8      -0.019% 
crypto/rand.s                                                            6866     6854     -12     -0.175% 
vendor/golang.org/x/text/unicode/norm.s                                  66394    66354    -40     -0.060% 
runtime/trace.s                                                          2603     2521     -82     -3.150% 
crypto/ed25519.s                                                         6321     6300     -21     -0.332% 
text/template/parse.s                                                    93910    93844    -66     -0.070% 
crypto/rsa.s                                                             31460    31369    -91     -0.289% 
encoding/asn1.s                                                          57021    57023    +2      +0.004% 
crypto/elliptic.s                                                        51382    51363    -19     -0.037% 
crypto/x509/pkix.s                                                       10386    10342    -44     -0.424% 
vendor/golang.org/x/net/idna.s                                           24482    24466    -16     -0.065% 
vendor/golang.org/x/crypto/cryptobyte.s                                  33479    33280    -199    -0.594% 
crypto/ecdsa.s                                                           11936    11883    -53     -0.444% 
go/constant.s                                                            43670    42663    -1007   -2.306% 
go/ast.s                                                                 80383    80191    -192    -0.239% 
testing.s                                                                68069    68057    -12     -0.018% 
runtime/pprof.s                                                          59613    59603    -10     -0.017% 
testing/iotest.s                                                         4895     4891     -4      -0.082% 
internal/trace.s                                                         78136    78089    -47     -0.060% 
cmd/internal/goobj2.s                                                    13158    13154    -4      -0.030% 
cmd/internal/src.s                                                       17661    17657    -4      -0.023% 
go/parser.s                                                              79046    78880    -166    -0.210% 
cmd/internal/objabi.s                                                    16367    16343    -24     -0.147% 
text/template.s                                                          94899    94486    -413    -0.435% 
go/printer.s                                                             77267    76992    -275    -0.356% 
cmd/internal/goobj.s                                                     25988    25947    -41     -0.158% 
runtime/pprof/internal/profile.s                                         102066   101933   -133    -0.130% 
go/format.s                                                              5419     5371     -48     -0.886% 
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s                            37181    37149    -32     -0.086% 
go/doc.s                                                                 74533    74132    -401    -0.538% 
html/template.s                                                          88743    88389    -354    -0.399% 
cmd/asm/internal/lex.s                                                   24881    24872    -9      -0.036% 
cmd/internal/buildid.s                                                   18263    18256    -7      -0.038% 
cmd/vendor/golang.org/x/arch/x86/x86asm.s                                80036    79980    -56     -0.070% 
go/build.s                                                               68905    68737    -168    -0.244% 
cmd/cover.s                                                              46070    45950    -120    -0.260% 
cmd/internal/obj.s                                                       117001   116991   -10     -0.009% 
cmd/doc.s                                                                62700    62419    -281    -0.448% 
cmd/internal/obj/arm.s                                                   66745    66687    -58     -0.087% 
cmd/compile/internal/syntax.s                                            145406   145062   -344    -0.237% 
cmd/internal/obj/wasm.s                                                  44049    44027    -22     -0.050% 
net.s                                                                    291835   291020   -815    -0.279% 
cmd/dist.s                                                               209020   208807   -213    -0.102% 
cmd/cgo.s                                                                241564   241102   -462    -0.191% 
vendor/golang.org/x/net/http/httpproxy.s                                 9407     9399     -8      -0.085% 
log/syslog.s                                                             7921     7909     -12     -0.151% 
go/types.s                                                               319325   317513   -1812   -0.567% 
vendor/golang.org/x/net/http/httpguts.s                                  3834     3825     -9      -0.235% 
mime/multipart.s                                                         21414    21343    -71     -0.332% 
cmd/internal/obj/ppc64.s                                                 119949   119938   -11     -0.009% 
cmd/compile/internal/logopt.s                                            10158    10118    -40     -0.394% 
vendor/golang.org/x/net/nettest.s                                        28012    27991    -21     -0.075% 
go/internal/srcimporter.s                                                6405     6380     -25     -0.390% 
go/internal/gcimporter.s                                                 34525    34493    -32     -0.093% 
net/mail.s                                                               23937    23720    -217    -0.907% 
go/internal/gccgoimporter.s                                              56095    56038    -57     -0.102% 
cmd/compile/internal/types.s                                             47247    47207    -40     -0.085% 
cmd/api.s                                                                39582    39558    -24     -0.061% 
cmd/go/internal/base.s                                                   12572    12551    -21     -0.167% 
cmd/vendor/golang.org/x/xerrors.s                                        17846    17814    -32     -0.179% 
cmd/vendor/golang.org/x/mod/sumdb/note.s                                 18142    18070    -72     -0.397% 
cmd/go/internal/search.s                                                 19994    19876    -118    -0.590% 
cmd/go/internal/imports.s                                                16457    16428    -29     -0.176% 
cmd/vendor/golang.org/x/mod/module.s                                     17838    17759    -79     -0.443% 
cmd/go/internal/cache.s                                                  30551    30514    -37     -0.121% 
cmd/vendor/golang.org/x/mod/sumdb/tlog.s                                 36356    36321    -35     -0.096% 
cmd/internal/test2json.s                                                 9452     9408     -44     -0.466% 
cmd/go/internal/mvs.s                                                    25136    25092    -44     -0.175% 
cmd/go/internal/txtar.s                                                  3488     3461     -27     -0.774% 
cmd/vendor/golang.org/x/mod/zip.s                                        18811    18800    -11     -0.058% 
cmd/go/internal/version.s                                                11213    11171    -42     -0.375% 
cmd/link/internal/benchmark.s                                            4941     4949     +8      +0.162% 
cmd/internal/obj/s390x.s                                                 126865   126849   -16     -0.013% 
cmd/gofmt.s                                                              30684    30596    -88     -0.287% 
cmd/fix.s                                                                87450    86906    -544    -0.622% 
cmd/internal/obj/x86.s                                                   88578    88556    -22     -0.025% 
cmd/vendor/golang.org/x/mod/modfile.s                                    72450    72363    -87     -0.120% 
cmd/oldlink/internal/loader.s                                            16743    16741    -2      -0.012% 
cmd/pack.s                                                               14863    14861    -2      -0.013% 
cmd/go/internal/load.s                                                   106742   106568   -174    -0.163% 
cmd/oldlink/internal/objfile.s                                           21787    21780    -7      -0.032% 
cmd/oldlink/internal/loadmacho.s                                         29309    29317    +8      +0.027% 
cmd/oldlink/internal/loadelf.s                                           35013    35021    +8      +0.023% 
cmd/asm/internal/asm.s                                                   68550    68538    -12     -0.018% 
cmd/link/internal/loader.s                                               94765    94564    -201    -0.212% 
cmd/link/internal/loadelf.s                                              35663    35667    +4      +0.011% 
cmd/link/internal/loadmacho.s                                            29501    29509    +8      +0.027% 
cmd/vendor/golang.org/x/tools/go/analysis.s                              4983     4976     -7      -0.140% 
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s       16771    16709    -62     -0.370% 
cmd/vendor/golang.org/x/tools/go/types/objectpath.s                      18481    18456    -25     -0.135% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2100     2085     -15     -0.714% 
cmd/vendor/github.com/google/pprof/profile.s                             150141   149620   -521    -0.347% 
cmd/vendor/github.com/google/pprof/internal/measurement.s                10420    10404    -16     -0.154% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s               36814    36755    -59     -0.160% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s                 6688     6673     -15     -0.224% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s               9856     9784     -72     -0.731% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/composite.s             3011     2979     -32     -1.063% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s              9737     9682     -55     -0.565% 
cmd/vendor/golang.org/x/tools/go/cfg.s                                   30738    30725    -13     -0.042% 
cmd/vendor/github.com/ianlancetaylor/demangle.s                          175195   174513   -682    -0.389% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/httpresponse.s          3625     3520     -105    -2.897% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s           2987     2971     -16     -0.536% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s                 4372     4340     -32     -0.732% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s            8634     8611     -23     -0.266% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s                 6189     6164     -25     -0.404% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s             8089     8073     -16     -0.198% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unsafeptr.s             2208     2177     -31     -1.404% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s           8050     8047     -3      -0.037% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s          3665     3629     -36     -0.982% 
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                           65773    65680    -93     -0.141% 
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s                  13328    13286    -42     -0.315% 
cmd/vendor/golang.org/x/tools/go/types/typeutil.s                        12263    12162    -101    -0.824% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s              1459     1421     -38     -2.605% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s              5208     5191     -17     -0.326% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unmarshal.s             1801     1782     -19     -1.055% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s            9569     9528     -41     -0.428% 
cmd/go/internal/work.s                                                   304928   304756   -172    -0.056% 
crypto/x509.s                                                            147340   147139   -201    -0.136% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s                34287    34019    -268    -0.782% 
crypto/tls.s                                                             311603   310644   -959    -0.308% 
cmd/oldlink/internal/ld.s                                                533115   532651   -464    -0.087% 
cmd/oldlink/internal/wasm.s                                              16484    16458    -26     -0.158% 
cmd/oldlink/internal/x86.s                                               18832    18830    -2      -0.011% 
cmd/link/internal/ld.s                                                   548200   547626   -574    -0.105% 
cmd/link/internal/wasm.s                                                 16760    16734    -26     -0.155% 
cmd/link/internal/arm64.s                                                20850    20840    -10     -0.048% 
cmd/link/internal/x86.s                                                  17437    17435    -2      -0.011% 
net/http.s                                                               556647   555519   -1128   -0.203% 
net/http/cookiejar.s                                                     15849    15833    -16     -0.101% 
expvar.s                                                                 9521     9508     -13     -0.137% 
net/http/httptest.s                                                      16471    16452    -19     -0.115% 
cmd/vendor/github.com/google/pprof/internal/plugin.s                     4266     4264     -2      -0.047% 
net/http/cgi.s                                                           23448    23428    -20     -0.085% 
cmd/go/internal/web.s                                                    16472    16428    -44     -0.267% 
net/http/httputil.s                                                      39672    39670    -2      -0.005% 
net/rpc.s                                                                33989    33965    -24     -0.071% 
net/http/fcgi.s                                                          19167    19162    -5      -0.026% 
cmd/vendor/github.com/google/pprof/internal/symbolz.s                    5861     5857     -4      -0.068% 
cmd/vendor/github.com/google/pprof/internal/binutils.s                   35842    35823    -19     -0.053% 
cmd/vendor/github.com/google/pprof/internal/symbolizer.s                 11449    11404    -45     -0.393% 
cmd/go/internal/get.s                                                    62726    62582    -144    -0.230% 
cmd/vendor/github.com/google/pprof/internal/report.s                     80032    80022    -10     -0.012% 
cmd/go/internal/modfetch/codehost.s                                      89005    88871    -134    -0.151% 
cmd/trace.s                                                              116607   116496   -111    -0.095% 
cmd/vendor/github.com/google/pprof/internal/driver.s                     143234   143207   -27     -0.019% 
cmd/vendor/github.com/google/pprof/driver.s                              9000     8998     -2      -0.022% 
cmd/go/internal/modfetch.s                                               126300   125726   -574    -0.454% 
cmd/pprof.s                                                              12317    12312    -5      -0.041% 
cmd/go/internal/modconv.s                                                17878    17861    -17     -0.095% 
cmd/go/internal/modload.s                                                150261   149763   -498    -0.331% 
cmd/go/internal/clean.s                                                  11122    11091    -31     -0.279% 
cmd/go/internal/help.s                                                   6523     6521     -2      -0.031% 
cmd/go/internal/generate.s                                               11627    11614    -13     -0.112% 
cmd/go/internal/envcmd.s                                                 22034    21986    -48     -0.218% 
cmd/go/internal/modget.s                                                 38478    38398    -80     -0.208% 
cmd/go/internal/modcmd.s                                                 46430    46229    -201    -0.433% 
cmd/go/internal/test.s                                                   64399    64374    -25     -0.039% 
cmd/compile/internal/ssa.s                                               3615264  3608276  -6988   -0.193% 
cmd/compile/internal/gc.s                                                1538865  1537625  -1240   -0.081% 
cmd/compile/internal/amd64.s                                             33593    33574    -19     -0.057% 
cmd/compile/internal/x86.s                                               30871    30852    -19     -0.062% 
total                                                                    19343565 19311284 -32281  -0.167% 

Change-Id: Ib030eb79458827a5a5b6d0d2f98765f8325a4d7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/222923
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-08 22:13:38 +00:00
Ruixin(Peter) Bao
b2790a2838 cmd/compile: allow floating point Ops to produce flags on s390x
On s390x, some floating point arithmetic instructions (FSUB, FADD) generate flags.
This patch allows the related SSA ops to return a tuple, where the second element of
the tuple is the generated flag. We can use the flag and remove the
subsequent comparison instruction (e.g. LTDBR).
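
As an illustration only, the source pattern below is the kind of code where a subtraction is immediately followed by a comparison of its result against zero. The function name is hypothetical, and whether the compare is actually removed depends on the rewrite rules in the CL, which are not shown here.

```go
package main

import "fmt"

// subIsNegative computes x - y and then branches on the sign of the
// result. On a target whose subtract instruction sets condition flags,
// the branch could in principle reuse those flags instead of issuing a
// separate compare-with-zero instruction.
func subIsNegative(x, y float64) (float64, bool) {
	d := x - y
	if d < 0 {
		return d, true
	}
	return d, false
}

func main() {
	fmt.Println(subIsNegative(1, 2))
	fmt.Println(subIsNegative(2, 1))
}
```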

This CL also reduces the .text section for math.test binary by 0.4KB.

Benchmarks:
name                    old time/op  new time/op  delta
Acos-18                 12.1ns ± 0%  12.1ns ± 0%     ~     (all equal)
Acosh-18                18.5ns ± 0%  18.5ns ± 0%     ~     (all equal)
Asin-18                 13.1ns ± 0%  13.1ns ± 0%     ~     (all equal)
Asinh-18                19.4ns ± 0%  19.5ns ± 1%     ~     (p=0.444 n=5+5)
Atan-18                 10.0ns ± 0%  10.0ns ± 0%     ~     (all equal)
Atanh-18                19.1ns ± 1%  19.2ns ± 2%     ~     (p=0.841 n=5+5)
Atan2-18                16.4ns ± 0%  16.4ns ± 0%     ~     (all equal)
Cbrt-18                 14.8ns ± 0%  14.8ns ± 0%     ~     (all equal)
Ceil-18                 0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Copysign-18             0.80ns ± 0%  0.80ns ± 0%     ~     (all equal)
Cos-18                  7.19ns ± 0%  7.19ns ± 0%     ~     (p=0.556 n=4+5)
Cosh-18                 12.4ns ± 0%  12.4ns ± 0%     ~     (all equal)
Erf-18                  10.8ns ± 0%  10.8ns ± 0%     ~     (all equal)
Erfc-18                 11.0ns ± 0%  11.0ns ± 0%     ~     (all equal)
Erfinv-18               23.0ns ±16%  26.8ns ± 1%  +16.90%  (p=0.008 n=5+5)
Erfcinv-18              23.3ns ±15%  26.1ns ± 7%     ~     (p=0.087 n=5+5)
Exp-18                  8.67ns ± 0%  8.67ns ± 0%     ~     (p=1.000 n=4+4)
ExpGo-18                50.8ns ± 3%  52.4ns ± 2%     ~     (p=0.063 n=5+5)
Expm1-18                9.49ns ± 1%  9.47ns ± 0%     ~     (p=1.000 n=5+5)
Exp2-18                 52.7ns ± 1%  50.5ns ± 3%   -4.10%  (p=0.024 n=5+5)
Exp2Go-18               50.6ns ± 1%  48.4ns ± 3%   -4.39%  (p=0.008 n=5+5)
Abs-18                  0.67ns ± 0%  0.67ns ± 0%     ~     (p=0.444 n=5+5)
Dim-18                  1.02ns ± 0%  1.03ns ± 0%   +0.98%  (p=0.008 n=5+5)
Floor-18                0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Max-18                  3.09ns ± 1%  3.05ns ± 0%   -1.42%  (p=0.008 n=5+5)
Min-18                  3.32ns ± 1%  3.30ns ± 0%   -0.72%  (p=0.016 n=5+4)
Mod-18                  62.3ns ± 1%  65.8ns ± 3%   +5.55%  (p=0.008 n=5+5)
Frexp-18                5.05ns ± 2%  4.98ns ± 0%     ~     (p=0.683 n=5+5)
Gamma-18                24.4ns ± 0%  24.1ns ± 0%   -1.23%  (p=0.008 n=5+5)
Hypot-18                10.3ns ± 0%  10.3ns ± 0%     ~     (all equal)
HypotGo-18              10.2ns ± 0%  10.2ns ± 0%     ~     (all equal)
Ilogb-18                3.56ns ± 1%  3.54ns ± 0%     ~     (p=0.595 n=5+5)
J0-18                    113ns ± 0%   108ns ± 1%   -4.42%  (p=0.016 n=4+5)
J1-18                    115ns ± 0%   109ns ± 1%   -4.87%  (p=0.016 n=4+5)
Jn-18                    240ns ± 0%   230ns ± 2%   -4.41%  (p=0.008 n=5+5)
Ldexp-18                6.19ns ± 0%  6.19ns ± 0%     ~     (p=0.444 n=5+5)
Lgamma-18               32.2ns ± 0%  32.2ns ± 0%     ~     (all equal)
Log-18                  13.1ns ± 0%  13.1ns ± 0%     ~     (all equal)
Logb-18                 4.23ns ± 0%  4.22ns ± 0%     ~     (p=0.444 n=5+5)
Log1p-18                12.7ns ± 0%  12.7ns ± 0%     ~     (all equal)
Log10-18                18.1ns ± 0%  18.2ns ± 0%     ~     (p=0.167 n=5+5)
Log2-18                 14.0ns ± 0%  14.0ns ± 0%     ~     (all equal)
Modf-18                 10.4ns ± 0%  10.5ns ± 0%   +0.96%  (p=0.016 n=4+5)
Nextafter32-18          11.3ns ± 0%  11.3ns ± 0%     ~     (all equal)
Nextafter64-18          4.01ns ± 1%  3.97ns ± 0%     ~     (p=0.333 n=5+4)
PowInt-18               32.7ns ± 0%  32.7ns ± 0%     ~     (all equal)
PowFrac-18              33.2ns ± 0%  33.1ns ± 0%     ~     (p=0.095 n=4+5)
Pow10Pos-18             1.58ns ± 0%  1.58ns ± 0%     ~     (all equal)
Pow10Neg-18             5.81ns ± 0%  5.81ns ± 0%     ~     (all equal)
Round-18                0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
RoundToEven-18          0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Remainder-18            40.6ns ± 0%  40.7ns ± 0%     ~     (p=0.238 n=5+4)
Signbit-18              1.57ns ± 0%  1.57ns ± 0%     ~     (all equal)
Sin-18                  6.75ns ± 0%  6.74ns ± 0%     ~     (p=0.333 n=5+4)
Sincos-18               29.5ns ± 0%  29.5ns ± 0%     ~     (all equal)
Sinh-18                 14.4ns ± 0%  14.4ns ± 0%     ~     (all equal)
SqrtIndirect-18         3.97ns ± 0%  4.15ns ± 0%   +4.59%  (p=0.008 n=5+5)
SqrtLatency-18          8.01ns ± 0%  8.01ns ± 0%     ~     (all equal)
SqrtIndirectLatency-18  11.6ns ± 0%  11.6ns ± 0%     ~     (all equal)
SqrtGoLatency-18        44.7ns ± 0%  45.0ns ± 0%   +0.67%  (p=0.008 n=5+5)
SqrtPrime-18            1.26µs ± 0%  1.27µs ± 0%   +0.63%  (p=0.029 n=4+4)
Tan-18                  11.1ns ± 0%  11.1ns ± 0%     ~     (all equal)
Tanh-18                 15.8ns ± 0%  15.8ns ± 0%     ~     (all equal)
Trunc-18                0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Y0-18                    113ns ± 2%   108ns ± 3%   -5.11%  (p=0.008 n=5+5)
Y1-18                    112ns ± 3%   107ns ± 0%   -4.29%  (p=0.000 n=5+4)
Yn-18                    229ns ± 0%   220ns ± 1%   -3.76%  (p=0.016 n=4+5)
Float64bits-18          1.09ns ± 0%  1.09ns ± 0%     ~     (all equal)
Float64frombits-18      0.55ns ± 0%  0.55ns ± 0%     ~     (all equal)
Float32bits-18          0.96ns ±16%  0.86ns ± 0%     ~     (p=0.563 n=5+5)
Float32frombits-18      1.03ns ±28%  0.84ns ± 0%     ~     (p=0.167 n=5+5)
FMA-18                  1.60ns ± 0%  1.60ns ± 0%     ~     (all equal)
[Geo mean]              10.0ns        9.9ns        -0.41%
Change-Id: Ief7e63ea5a8ba404b0a4696e12b9b7e0b05a9a03
Reviewed-on: https://go-review.googlesource.com/c/go/+/209160
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-08 20:57:58 +00:00
Michael Munday
bfd569fcb0 cmd/compile: delete the floating point Greater and Geq ops
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.

Fixes #37316.

Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-07 19:55:05 +00:00
Lynn Boger
815509ae31 cmd/compile: improve lowered moves and zeros for ppc64le
This change includes the following:
- Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9.
  These instructions do not require an index register, which
  allows more loads and stores within a loop without initializing
  multiple index registers. The LoweredQuadXXX ops generate LXV/STXV.
- Create LoweredMoveXXXShort and LoweredZeroXXXShort for short
  moves that don't generate loops, and therefore don't clobber the
  address registers or flags.
- Use registers other than R3 and R4 so that they do not conflict with
  registers that have already been allocated, which avoids unnecessary
  register moves.
- Eliminate the use of R14 as a scratch register and use R31
  instead.
- Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a
  loop with more than 3 iterations.
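
For orientation, fixed-size copies and clears such as the ones below are the kind of Go source that the compiler lowers to move and zero sequences. The type and function names are hypothetical, and the 64-byte size is an arbitrary choice; the thresholds that select the short or looping forms are not stated here.

```go
package main

import "fmt"

// block is a fixed-size value, so copying or clearing it has a length
// known at compile time.
type block struct {
	data [64]byte
}

// copyBlock copies one block over another with a single assignment.
func copyBlock(dst, src *block) { *dst = *src }

// zeroBlock clears a block by assigning the zero value.
func zeroBlock(b *block) { *b = block{} }

func main() {
	var src, dst block
	for i := range src.data {
		src.data[i] = byte(i)
	}
	copyBlock(&dst, &src)
	fmt.Println(dst == src)
	zeroBlock(&dst)
	fmt.Println(dst == block{})
}
```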

This performance opportunity was noticed in github.com/golang/snappy
benchmarks. Results on power9:

name              old time/op    new time/op    delta
WordsDecode1e1    54.1ns ± 0%    53.8ns ± 0%   -0.51%  (p=0.029 n=4+4)
WordsDecode1e2     287ns ± 0%     282ns ± 1%   -1.83%  (p=0.029 n=4+4)
WordsDecode1e3    3.98µs ± 0%    3.64µs ± 0%   -8.52%  (p=0.029 n=4+4)
WordsDecode1e4    66.9µs ± 0%    67.0µs ± 0%   +0.20%  (p=0.029 n=4+4)
WordsDecode1e5     723µs ± 0%     723µs ± 0%   -0.01%  (p=0.200 n=4+4)
WordsDecode1e6    7.21ms ± 0%    7.21ms ± 0%   -0.02%  (p=1.000 n=4+4)
WordsEncode1e1    29.9ns ± 0%    29.4ns ± 0%   -1.51%  (p=0.029 n=4+4)
WordsEncode1e2    2.12µs ± 0%    1.75µs ± 0%  -17.70%  (p=0.029 n=4+4)
WordsEncode1e3    11.7µs ± 0%    11.2µs ± 0%   -4.61%  (p=0.029 n=4+4)
WordsEncode1e4     119µs ± 0%     120µs ± 0%   +0.36%  (p=0.029 n=4+4)
WordsEncode1e5    1.21ms ± 0%    1.22ms ± 0%   +0.41%  (p=0.029 n=4+4)
WordsEncode1e6    12.0ms ± 0%    12.0ms ± 0%   +0.57%  (p=0.029 n=4+4)
RandomEncode       286µs ± 0%     203µs ± 0%  -28.82%  (p=0.029 n=4+4)
ExtendMatch       47.4µs ± 0%    47.0µs ± 0%   -0.85%  (p=0.029 n=4+4)

Change-Id: Iecad3a39ae55280286e42760a5c9d5c1168f5858
Reviewed-on: https://go-review.googlesource.com/c/go/+/226539
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-06 12:09:39 +00:00
Josh Bleecher Snyder
fff7509d47 cmd/compile: add intrinsic HasCPUFeature for checking cpu features
Before using some CPU instructions, we must check for their presence.
We use global variables in the runtime package to record features.

Prior to this CL, we issued a regular memory load for these features.
The downside to this is that, because it is a regular memory load,
it cannot be hoisted out of loops or otherwise reordered with other loads.

This CL introduces a new intrinsic just for checking cpu features.
It still ends up resulting in a memory load, but that memory load can
now be floated to the entry block and rematerialized as needed.

One downside is that the regular load could be combined with the comparison
into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE.
(It is possible that MOVBQZX+TESTQ+NE would be better.)

This CL covers only amd64. It is easy to extend to other architectures.
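
A minimal sketch of code where such a feature check sits inside a loop,
assuming that math.FMA on amd64 is guarded by a runtime CPU feature check
as in the benchmark from #36196. The function name dotFMA is hypothetical
and is not part of this CL.

```go
package main

import "math"

// dotFMA accumulates a dot product with math.FMA. On amd64 each call is
// assumed to be guarded by a CPU feature check; the check is loop-invariant,
// so a load that can be floated to the entry block need not be repeated on
// every iteration.
func dotFMA(x, y []float64) float64 {
	n := len(x)
	if len(y) < n {
		n = len(y)
	}
	sum := 0.0
	for i := 0; i < n; i++ {
		sum = math.FMA(x[i], y[i], sum)
	}
	return sum
}
```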

For the benchmark in #36196, on my machine, this offers a mild speedup.

name      old time/op  new time/op  delta
FMA-8     1.39ns ± 6%  1.29ns ± 9%  -7.19%  (p=0.000 n=97+96)
NonFMA-8  2.03ns ±11%  2.04ns ±12%    ~     (p=0.618 n=99+98)

Updates #15808
Updates #36196

Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/212360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2020-04-04 01:01:04 +00:00
Keith Randall
bba88467f8 cmd/compile: add indexed-load CMP instructions
Things like CMPQ 4(AX)(BX*8), CX
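
A hedged sketch of Go source that could benefit: comparing a slice element
against a value in a register. The function name countBelow is
hypothetical, and whether the compiler actually folds the load into an
indexed CMPQ here is an assumption that depends on the surrounding code.

```go
package main

// countBelow counts the elements of a that are less than limit.
// The comparison a[i] < limit reads memory at base + i*8, which is the
// shape of operand an indexed-load compare can consume directly.
func countBelow(a []int64, limit int64) int {
	n := 0
	for i := range a {
		if a[i] < limit {
			n++
		}
	}
	return n
}
```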

Fixes #37955

Change-Id: Icbed430f65c91a0e3f38a633d8321d79433ad8b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/224219
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-04-01 17:03:26 +00:00
Josh Bleecher Snyder
8114242359 cmd/compile, runtime: use more registers for amd64 write barrier calls
The compiler-inserted write barrier calls use a special ABI
for speed and to minimize the binary size impact.

runtime.gcWriteBarrier takes its args in DI and AX.
This change adds gcWriteBarrier wrapper functions,
varying only in the register used for the second argument.
(Allowing variation in the first argument doesn't offer improvements,
which is convenient, as it avoids quadratic API growth.)
This reduces the number of register copies.
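
For context, a small sketch of Go source containing pointer stores of the
kind that normally receive compiler-inserted write barriers. The type node
and the function push are hypothetical; whether a particular store needs a
barrier depends on escape analysis and other compiler decisions, so this is
an illustration rather than a guaranteed trigger.

```go
package main

// node is a hypothetical singly linked list element.
type node struct {
	next *node
	val  int
}

// push prepends n to the list rooted at *head. Both assignments store a
// pointer into memory that may live on the heap, which is where the
// compiler would emit a write barrier call while the GC is marking.
func push(head **node, n *node) {
	n.next = *head
	*head = n
}
```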

The goal is a smaller binary, achieved by reducing register pressure and copies.

One downside to this change is that when the write barrier is on,
we may bounce through several different write barrier wrappers,
which is bad for the instruction cache.

Package runtime write barrier benchmarks for this change:

name                old time/op  new time/op  delta
WriteBarrier-8      16.6ns ± 6%  15.6ns ± 6%  -5.73%  (p=0.000 n=97+99)
BulkWriteBarrier-8  4.37ns ± 7%  4.22ns ± 8%  -3.45%  (p=0.000 n=96+99)

However, I don't particularly trust these numbers.
I ran runtime.BenchmarkWriteBarrier multiple times as I rebased
this change, and noticed that the results have high variance
depending on the parent change, perhaps due to alignment.

This change was stress tested with GOGC=1 GODEBUG=gccheckmark=1 go test std.

This change reduces binary sizes:

file      before    after     Δ       %
addr2line 4308720   4296688   -12032  -0.279%
api       5965592   5945368   -20224  -0.339%
asm       5148088   5025464   -122624 -2.382%
buildid   2848760   2844904   -3856   -0.135%
cgo       4828968   4812840   -16128  -0.334%
compile   19754720  19529744  -224976 -1.139%
cover     5256840   5236600   -20240  -0.385%
dist      3670312   3658264   -12048  -0.328%
doc       4669608   4657576   -12032  -0.258%
fix       3377976   3365944   -12032  -0.356%
link      6614888   6586472   -28416  -0.430%
nm        4258368   4254528   -3840   -0.090%
objdump   4656336   4644304   -12032  -0.258%
pack      2295176   2295432   +256    +0.011%
pprof     14762356  14709364  -52992  -0.359%
test2json 2824456   2820600   -3856   -0.137%
trace     11684404  11643700  -40704  -0.348%
vet       8284760   8252248   -32512  -0.392%
total     115210328 114580040 -630288 -0.547%

This change improves compiler performance:

name        old time/op       new time/op       delta
Template          208ms ± 3%        207ms ± 3%  -0.40%  (p=0.030 n=43+44)
Unicode          80.2ms ± 3%       81.3ms ± 3%  +1.25%  (p=0.000 n=41+44)
GoTypes           699ms ± 3%        694ms ± 2%  -0.71%  (p=0.016 n=42+37)
Compiler          3.26s ± 2%        3.23s ± 2%  -0.86%  (p=0.000 n=43+45)
SSA               6.97s ± 1%        6.93s ± 1%  -0.63%  (p=0.000 n=43+45)
Flate             134ms ± 3%        133ms ± 2%    ~     (p=0.139 n=45+42)
GoParser          165ms ± 2%        164ms ± 1%  -0.79%  (p=0.000 n=45+40)
Reflect           434ms ± 4%        435ms ± 4%    ~     (p=0.937 n=44+44)
Tar               181ms ± 2%        181ms ± 2%    ~     (p=0.702 n=43+45)
XML               244ms ± 2%        244ms ± 2%    ~     (p=0.237 n=45+44)
[Geo mean]        403ms             402ms       -0.29%

name        old user-time/op  new user-time/op  delta
Template          271ms ± 2%        268ms ± 1%  -1.40%  (p=0.000 n=42+42)
Unicode           117ms ± 3%        116ms ± 5%    ~     (p=0.066 n=45+45)
GoTypes           948ms ± 2%        936ms ± 2%  -1.30%  (p=0.000 n=41+40)
Compiler          4.26s ± 1%        4.21s ± 2%  -1.25%  (p=0.000 n=37+45)
SSA               9.52s ± 2%        9.41s ± 1%  -1.18%  (p=0.000 n=44+45)
Flate             167ms ± 2%        165ms ± 2%  -1.15%  (p=0.000 n=44+41)
GoParser          201ms ± 2%        198ms ± 1%  -1.40%  (p=0.000 n=43+43)
Reflect           563ms ± 8%        560ms ± 7%    ~     (p=0.206 n=45+44)
Tar               224ms ± 2%        222ms ± 2%  -0.81%  (p=0.000 n=45+45)
XML               308ms ± 2%        304ms ± 1%  -1.17%  (p=0.000 n=42+43)
[Geo mean]        525ms             519ms       -1.08%

name        old alloc/op      new alloc/op      delta
Template         36.3MB ± 0%       36.3MB ± 0%    ~     (p=0.421 n=5+5)
Unicode          28.4MB ± 0%       28.3MB ± 0%    ~     (p=0.056 n=5+5)
GoTypes           121MB ± 0%        121MB ± 0%  -0.14%  (p=0.008 n=5+5)
Compiler          567MB ± 0%        567MB ± 0%  -0.06%  (p=0.016 n=4+5)
SSA              1.26GB ± 0%       1.26GB ± 0%  -0.07%  (p=0.008 n=5+5)
Flate            22.9MB ± 0%       22.8MB ± 0%    ~     (p=0.310 n=5+5)
GoParser         28.0MB ± 0%       27.9MB ± 0%  -0.09%  (p=0.008 n=5+5)
Reflect          78.4MB ± 0%       78.4MB ± 0%  -0.03%  (p=0.008 n=5+5)
Tar              34.2MB ± 0%       34.2MB ± 0%  -0.05%  (p=0.008 n=5+5)
XML              44.4MB ± 0%       44.4MB ± 0%  -0.04%  (p=0.016 n=5+5)
[Geo mean]       76.4MB            76.3MB       -0.05%

name        old allocs/op     new allocs/op     delta
Template           356k ± 0%         356k ± 0%  -0.13%  (p=0.008 n=5+5)
Unicode            326k ± 0%         326k ± 0%  -0.07%  (p=0.008 n=5+5)
GoTypes           1.24M ± 0%        1.24M ± 0%  -0.24%  (p=0.008 n=5+5)
Compiler          5.30M ± 0%        5.28M ± 0%  -0.34%  (p=0.008 n=5+5)
SSA               11.9M ± 0%        11.9M ± 0%  -0.16%  (p=0.008 n=5+5)
Flate              226k ± 0%         225k ± 0%  -0.12%  (p=0.008 n=5+5)
GoParser           287k ± 0%         286k ± 0%  -0.29%  (p=0.008 n=5+5)
Reflect            930k ± 0%         929k ± 0%  -0.05%  (p=0.008 n=5+5)
Tar                332k ± 0%         331k ± 0%  -0.12%  (p=0.008 n=5+5)
XML                411k ± 0%         411k ± 0%  -0.12%  (p=0.008 n=5+5)
[Geo mean]         771k              770k       -0.16%

For some packages, this change significantly reduces the size of executable text.
Examples:

file                                   before   after    Δ       %
cmd/internal/obj/arm.s                 68658    66855    -1803   -2.626%
cmd/internal/obj/mips.s                57486    56272    -1214   -2.112%
cmd/internal/obj/arm64.s               152107   147163   -4944   -3.250%
cmd/internal/obj/ppc64.s               125544   120456   -5088   -4.053%
cmd/vendor/golang.org/x/tools/go/cfg.s 31699    30742    -957    -3.019%

Full listing:

file                                                                     before   after    Δ       %
container/ring.s                                                         1890     1870     -20     -1.058%
container/list.s                                                         5366     5390     +24     +0.447%
internal/cpu.s                                                           3298     3295     -3      -0.091%
internal/testlog.s                                                       1507     1501     -6      -0.398%
image/color.s                                                            8281     8248     -33     -0.399%
runtime.s                                                                480970   480075   -895    -0.186%
sync.s                                                                   16497    16408    -89     -0.539%
internal/singleflight.s                                                  2591     2577     -14     -0.540%
math/rand.s                                                              10456    10438    -18     -0.172%
cmd/go/internal/par.s                                                    2801     2790     -11     -0.393%
internal/reflectlite.s                                                   28477    28417    -60     -0.211%
errors.s                                                                 2750     2736     -14     -0.509%
internal/oserror.s                                                       446      434      -12     -2.691%
sort.s                                                                   17061    17046    -15     -0.088%
io.s                                                                     17063    16999    -64     -0.375%
vendor/golang.org/x/crypto/hkdf.s                                        1962     1936     -26     -1.325%
text/tabwriter.s                                                         9617     9574     -43     -0.447%
hash/crc64.s                                                             3414     3408     -6      -0.176%
hash/crc32.s                                                             6657     6651     -6      -0.090%
bytes.s                                                                  31932    31863    -69     -0.216%
strconv.s                                                                53158    52799    -359    -0.675%
strings.s                                                                42829    42665    -164    -0.383%
encoding/ascii85.s                                                       4833     4791     -42     -0.869%
vendor/golang.org/x/text/transform.s                                     16810    16724    -86     -0.512%
path.s                                                                   6848     6845     -3      -0.044%
encoding/base32.s                                                        9658     9592     -66     -0.683%
bufio.s                                                                  23051    22908    -143    -0.620%
compress/bzip2.s                                                         11773    11764    -9      -0.076%
image.s                                                                  37565    37502    -63     -0.168%
syscall.s                                                                82359    82279    -80     -0.097%
regexp/syntax.s                                                          83573    82930    -643    -0.769%
image/jpeg.s                                                             36535    36490    -45     -0.123%
regexp.s                                                                 64396    64214    -182    -0.283%
time.s                                                                   82724    82622    -102    -0.123%
plugin.s                                                                 6539     6536     -3      -0.046%
context.s                                                                10959    10865    -94     -0.858%
internal/poll.s                                                          24286    24270    -16     -0.066%
reflect.s                                                                168304   167927   -377    -0.224%
internal/fmtsort.s                                                       7416     7376     -40     -0.539%
os.s                                                                     52465    51787    -678    -1.292%
cmd/go/internal/lockedfile/internal/filelock.s                           2326     2317     -9      -0.387%
os/signal.s                                                              4657     4648     -9      -0.193%
runtime/debug.s                                                          6040     5998     -42     -0.695%
encoding/binary.s                                                        30838    30801    -37     -0.120%
vendor/golang.org/x/net/route.s                                          23694    23491    -203    -0.857%
path/filepath.s                                                          17895    17889    -6      -0.034%
cmd/vendor/golang.org/x/sys/unix.s                                       78125    78109    -16     -0.020%
io/ioutil.s                                                              6999     6996     -3      -0.043%
encoding/base64.s                                                        12094    12007    -87     -0.719%
crypto/cipher.s                                                          20466    20372    -94     -0.459%
cmd/go/internal/robustio.s                                               2672     2669     -3      -0.112%
encoding/pem.s                                                           9302     9286     -16     -0.172%
internal/obscuretestdata.s                                               1719     1695     -24     -1.396%
crypto/aes.s                                                             11014    11002    -12     -0.109%
os/exec.s                                                                29388    29231    -157    -0.534%
cmd/internal/browser.s                                                   2266     2260     -6      -0.265%
internal/goroot.s                                                        4601     4592     -9      -0.196%
vendor/golang.org/x/crypto/chacha20poly1305.s                            8945     8942     -3      -0.034%
cmd/vendor/golang.org/x/crypto/ssh/terminal.s                            27226    27195    -31     -0.114%
index/suffixarray.s                                                      36431    36411    -20     -0.055%
fmt.s                                                                    77017    76709    -308    -0.400%
encoding/hex.s                                                           6241     6154     -87     -1.394%
compress/lzw.s                                                           7133     7069     -64     -0.897%
database/sql/driver.s                                                    18888    18877    -11     -0.058%
net/url.s                                                                29838    29739    -99     -0.332%
debug/plan9obj.s                                                         8329     8279     -50     -0.600%
encoding/csv.s                                                           12986    12902    -84     -0.647%
debug/gosym.s                                                            25403    25330    -73     -0.287%
compress/flate.s                                                         51192    50970    -222    -0.434%
vendor/golang.org/x/net/dns/dnsmessage.s                                 86769    86208    -561    -0.647%
compress/gzip.s                                                          9791     9758     -33     -0.337%
compress/zlib.s                                                          7310     7277     -33     -0.451%
archive/zip.s                                                            42356    42166    -190    -0.449%
debug/dwarf.s                                                            108259   107730   -529    -0.489%
encoding/json.s                                                          106378   105910   -468    -0.440%
os/user.s                                                                14751    14724    -27     -0.183%
database/sql.s                                                           99011    98404    -607    -0.613%
log.s                                                                    9466     9423     -43     -0.454%
debug/pe.s                                                               31272    31182    -90     -0.288%
debug/macho.s                                                            32764    32608    -156    -0.476%
encoding/gob.s                                                           136976   136517   -459    -0.335%
vendor/golang.org/x/text/unicode/bidi.s                                  27318    27276    -42     -0.154%
archive/tar.s                                                            71416    70975    -441    -0.618%
vendor/golang.org/x/net/http2/hpack.s                                    23892    23848    -44     -0.184%
vendor/golang.org/x/text/secure/bidirule.s                               3354     3351     -3      -0.089%
mime/quotedprintable.s                                                   5960     5925     -35     -0.587%
net/http/internal.s                                                      5874     5853     -21     -0.358%
math/big.s                                                               184147   183692   -455    -0.247%
debug/elf.s                                                              63775    63567    -208    -0.326%
mime.s                                                                   39802    39709    -93     -0.234%
encoding/xml.s                                                           111038   110713   -325    -0.293%
crypto/dsa.s                                                             6044     6029     -15     -0.248%
go/token.s                                                               12139    12077    -62     -0.511%
crypto/rand.s                                                            6889     6866     -23     -0.334%
go/scanner.s                                                             19030    19008    -22     -0.116%
flag.s                                                                   22320    22236    -84     -0.376%
vendor/golang.org/x/text/unicode/norm.s                                  66652    66391    -261    -0.392%
crypto/rsa.s                                                             31671    31650    -21     -0.066%
crypto/elliptic.s                                                        51553    51403    -150    -0.291%
internal/xcoff.s                                                         22950    22822    -128    -0.558%
go/constant.s                                                            43750    43689    -61     -0.139%
encoding/asn1.s                                                          57086    57035    -51     -0.089%
runtime/trace.s                                                          2609     2603     -6      -0.230%
crypto/x509/pkix.s                                                       10458    10471    +13     +0.124%
image/gif.s                                                              27544    27385    -159    -0.577%
vendor/golang.org/x/net/idna.s                                           24558    24502    -56     -0.228%
image/png.s                                                              42775    42685    -90     -0.210%
vendor/golang.org/x/crypto/cryptobyte.s                                  33616    33493    -123    -0.366%
go/ast.s                                                                 80684    80449    -235    -0.291%
net/internal/socktest.s                                                  16571    16535    -36     -0.217%
crypto/ecdsa.s                                                           11948    11936    -12     -0.100%
text/template/parse.s                                                    95138    94002    -1136   -1.194%
runtime/pprof.s                                                          59702    59639    -63     -0.106%
testing.s                                                                68427    68088    -339    -0.495%
internal/testenv.s                                                       5620     5596     -24     -0.427%
testing/internal/testdeps.s                                              3312     3294     -18     -0.543%
internal/trace.s                                                         78473    78239    -234    -0.298%
testing/iotest.s                                                         4968     4908     -60     -1.208%
os/signal/internal/pty.s                                                 3011     2990     -21     -0.697%
testing/quick.s                                                          12179    12125    -54     -0.443%
cmd/internal/bio.s                                                       9286     9274     -12     -0.129%
cmd/internal/src.s                                                       17684    17663    -21     -0.119%
cmd/internal/goobj2.s                                                    12588    12558    -30     -0.238%
cmd/internal/objabi.s                                                    16408    16390    -18     -0.110%
go/printer.s                                                             77417    77308    -109    -0.141%
go/parser.s                                                              80045    79113    -932    -1.164%
go/format.s                                                              5434     5419     -15     -0.276%
cmd/internal/goobj.s                                                     26146    25954    -192    -0.734%
runtime/pprof/internal/profile.s                                         102518   102178   -340    -0.332%
text/template.s                                                          95343    94935    -408    -0.428%
cmd/internal/dwarf.s                                                     31718    31572    -146    -0.460%
cmd/vendor/golang.org/x/arch/arm/armasm.s                                45240    45151    -89     -0.197%
internal/lazytemplate.s                                                  1470     1457     -13     -0.884%
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s                            37253    37220    -33     -0.089%
cmd/asm/internal/flags.s                                                 2593     2590     -3      -0.116%
cmd/asm/internal/lex.s                                                   25068    24921    -147    -0.586%
cmd/internal/buildid.s                                                   18536    18263    -273    -1.473%
cmd/vendor/golang.org/x/arch/x86/x86asm.s                                80209    80105    -104    -0.130%
go/doc.s                                                                 75140    74585    -555    -0.739%
cmd/internal/edit.s                                                      3893     3899     +6      +0.154%
html/template.s                                                          89377    88809    -568    -0.636%
cmd/vendor/golang.org/x/arch/arm64/arm64asm.s                            117998   117824   -174    -0.147%
cmd/internal/obj.s                                                       115015   114290   -725    -0.630%
go/build.s                                                               69379    68862    -517    -0.745%
cmd/internal/objfile.s                                                   48106    47982    -124    -0.258%
cmd/cover.s                                                              46239    46113    -126    -0.272%
cmd/addr2line.s                                                          2845     2833     -12     -0.422%
cmd/internal/obj/arm.s                                                   68658    66855    -1803   -2.626%
cmd/internal/obj/mips.s                                                  57486    56272    -1214   -2.112%
cmd/internal/obj/riscv.s                                                 63834    63006    -828    -1.297%
cmd/compile/internal/syntax.s                                            146582   145456   -1126   -0.768%
cmd/internal/obj/wasm.s                                                  44117    44066    -51     -0.116%
cmd/cgo.s                                                                242645   241653   -992    -0.409%
cmd/internal/obj/arm64.s                                                 152107   147163   -4944   -3.250%
net.s                                                                    295972   292010   -3962   -1.339%
go/types.s                                                               321371   319432   -1939   -0.603%
vendor/golang.org/x/net/http/httpproxy.s                                 9450     9423     -27     -0.286%
net/textproto.s                                                          19455    19406    -49     -0.252%
cmd/internal/obj/ppc64.s                                                 125544   120456   -5088   -4.053%
go/internal/srcimporter.s                                                6475     6409     -66     -1.019%
log/syslog.s                                                             8017     7929     -88     -1.098%
cmd/compile/internal/logopt.s                                            10183    10162    -21     -0.206%
net/mail.s                                                               24085    23948    -137    -0.569%
mime/multipart.s                                                         21527    21420    -107    -0.497%
cmd/internal/obj/s390x.s                                                 127610   127757   +147    +0.115%
go/internal/gcimporter.s                                                 34913    34548    -365    -1.045%
vendor/golang.org/x/net/nettest.s                                        28103    28016    -87     -0.310%
cmd/go/internal/cfg.s                                                    9967     9916     -51     -0.512%
cmd/api.s                                                                39703    39603    -100    -0.252%
go/internal/gccgoimporter.s                                              56470    56120    -350    -0.620%
go/importer.s                                                            2077     2056     -21     -1.011%
cmd/compile/internal/types.s                                             48202    47282    -920    -1.909%
cmd/go/internal/str.s                                                    4341     4320     -21     -0.484%
cmd/internal/obj/x86.s                                                   89440    88625    -815    -0.911%
cmd/go/internal/base.s                                                   12667    12580    -87     -0.687%
cmd/go/internal/cache.s                                                  30754    30571    -183    -0.595%
cmd/doc.s                                                                62976    62755    -221    -0.351%
cmd/go/internal/search.s                                                 20114    19993    -121    -0.602%
cmd/vendor/golang.org/x/xerrors.s                                        17923    17855    -68     -0.379%
cmd/go/internal/lockedfile.s                                             16451    16415    -36     -0.219%
cmd/vendor/golang.org/x/mod/sumdb/note.s                                 18200    18150    -50     -0.275%
cmd/vendor/golang.org/x/mod/module.s                                     17869    17851    -18     -0.101%
cmd/asm/internal/arch.s                                                  37533    37482    -51     -0.136%
cmd/fix.s                                                                87728    87492    -236    -0.269%
cmd/vendor/golang.org/x/mod/sumdb/tlog.s                                 36394    36367    -27     -0.074%
cmd/vendor/golang.org/x/mod/sumdb/dirhash.s                              4990     4963     -27     -0.541%
cmd/go/internal/imports.s                                                16499    16469    -30     -0.182%
cmd/vendor/golang.org/x/mod/zip.s                                        18816    18745    -71     -0.377%
cmd/go/internal/cmdflag.s                                                5126     5123     -3      -0.059%
cmd/internal/test2json.s                                                 9540     9452     -88     -0.922%
cmd/go/internal/tool.s                                                   3629     3623     -6      -0.165%
cmd/go/internal/version.s                                                11232    11220    -12     -0.107%
cmd/go/internal/mvs.s                                                    25383    25179    -204    -0.804%
cmd/nm.s                                                                 5815     5803     -12     -0.206%
cmd/dist.s                                                               210146   209140   -1006   -0.479%
cmd/asm/internal/asm.s                                                   68655    68549    -106    -0.154%
cmd/vendor/golang.org/x/mod/modfile.s                                    72974    72510    -464    -0.636%
cmd/go/internal/load.s                                                   107548   106861   -687    -0.639%
cmd/link/internal/sym.s                                                  18708    18581    -127    -0.679%
cmd/asm.s                                                                3367     3343     -24     -0.713%
cmd/gofmt.s                                                              30795    30698    -97     -0.315%
cmd/link/internal/objfile.s                                              21828    21630    -198    -0.907%
cmd/pack.s                                                               14878    14869    -9      -0.060%
cmd/vendor/github.com/google/pprof/internal/elfexec.s                    6788     6782     -6      -0.088%
cmd/test2json.s                                                          1647     1641     -6      -0.364%
cmd/link/internal/loader.s                                               48677    48483    -194    -0.399%
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s       16783    16773    -10     -0.060%
cmd/link/internal/loadelf.s                                              35464    35126    -338    -0.953%
cmd/link/internal/loadmacho.s                                            29438    29180    -258    -0.876%
cmd/link/internal/loadpe.s                                               16440    16371    -69     -0.420%
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2106     2100     -6      -0.285%
cmd/link/internal/loadxcoff.s                                            11711    11615    -96     -0.820%
cmd/vendor/golang.org/x/tools/go/analysis/internal/facts.s               14954    14883    -71     -0.475%
cmd/vendor/golang.org/x/tools/go/ast/inspector.s                         5394     5374     -20     -0.371%
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s               37029    36822    -207    -0.559%
cmd/vendor/golang.org/x/tools/go/analysis/passes/inspect.s               340      337      -3      -0.882%
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s               9919     9858     -61     -0.615%
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s                 6705     6690     -15     -0.224%
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s              9783     9741     -42     -0.429%
cmd/vendor/golang.org/x/tools/go/cfg.s                                   31699    30742    -957    -3.019%
cmd/vendor/golang.org/x/tools/go/analysis/passes/ifaceassert.s           2768     2762     -6      -0.217%
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s           3031     2998     -33     -1.089%
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s                 4382     4376     -6      -0.137%
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s            8654     8642     -12     -0.139%
cmd/vendor/golang.org/x/tools/go/analysis/passes/stringintconv.s         3458     3446     -12     -0.347%
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s             8011     7995     -16     -0.200%
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s                 6205     6193     -12     -0.193%
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                           66183    65861    -322    -0.487%
cmd/vendor/github.com/google/pprof/profile.s                             150844   150261   -583    -0.386%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s           8057     8054     -3      -0.037%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s          3670     3667     -3      -0.082%
cmd/vendor/github.com/google/pprof/internal/measurement.s                10464    10440    -24     -0.229%
cmd/vendor/golang.org/x/tools/go/types/typeutil.s                        12319    12274    -45     -0.365%
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s                  13503    13342    -161    -1.192%
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s              5261     5218     -43     -0.817%
cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s              1462     1459     -3      -0.205%
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s            9594     9582     -12     -0.125%
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s                34397    34338    -59     -0.172%
cmd/vendor/github.com/google/pprof/internal/graph.s                      53225    52936    -289    -0.543%
cmd/vendor/github.com/ianlancetaylor/demangle.s                          177450   175329   -2121   -1.195%
crypto/x509.s                                                            147892   147388   -504    -0.341%
cmd/go/internal/work.s                                                   306465   304950   -1515   -0.494%
cmd/go/internal/run.s                                                    4664     4657     -7      -0.150%
crypto/tls.s                                                             313130   311833   -1297   -0.414%
net/http/httptrace.s                                                     3979     3905     -74     -1.860%
net/smtp.s                                                               14413    14344    -69     -0.479%
cmd/link/internal/ld.s                                                   545343   542279   -3064   -0.562%
cmd/link/internal/mips.s                                                 6218     6215     -3      -0.048%
cmd/link/internal/mips64.s                                               6108     6103     -5      -0.082%
cmd/link/internal/amd64.s                                                18154    18112    -42     -0.231%
cmd/link/internal/arm64.s                                                22527    22494    -33     -0.146%
cmd/link/internal/arm.s                                                  22574    22494    -80     -0.354%
cmd/link/internal/s390x.s                                                20779    20746    -33     -0.159%
cmd/link/internal/wasm.s                                                 16531    16493    -38     -0.230%
cmd/link/internal/x86.s                                                  18906    18849    -57     -0.301%
cmd/link/internal/ppc64.s                                                26856    26778    -78     -0.290%
net/http.s                                                               559101   556513   -2588   -0.463%
net/http/cookiejar.s                                                     15912    15885    -27     -0.170%
expvar.s                                                                 9531     9525     -6      -0.063%
net/http/httptest.s                                                      16616    16475    -141    -0.849%
net/http/cgi.s                                                           23624    23458    -166    -0.703%
cmd/go/internal/web.s                                                    16546    16489    -57     -0.344%
cmd/vendor/golang.org/x/mod/sumdb.s                                      33197    33117    -80     -0.241%
net/http/fcgi.s                                                          19266    19169    -97     -0.503%
net/http/httputil.s                                                      39875    39728    -147    -0.369%
cmd/vendor/github.com/google/pprof/internal/symbolz.s                    5888     5867     -21     -0.357%
net/rpc.s                                                                34154    34003    -151    -0.442%
cmd/vendor/github.com/google/pprof/internal/transport.s                  2746     2716     -30     -1.092%
cmd/vendor/github.com/google/pprof/internal/binutils.s                   35999    35875    -124    -0.344%
net/rpc/jsonrpc.s                                                        6637     6598     -39     -0.588%
cmd/vendor/github.com/google/pprof/internal/symbolizer.s                 11533    11458    -75     -0.650%
cmd/go/internal/get.s                                                    62921    62803    -118    -0.188%
cmd/vendor/github.com/google/pprof/internal/report.s                     80364    80058    -306    -0.381%
cmd/go/internal/modfetch/codehost.s                                      89680    89066    -614    -0.685%
cmd/trace.s                                                              117171   116701   -470    -0.401%
cmd/vendor/github.com/google/pprof/internal/driver.s                     144268   143297   -971    -0.673%
cmd/go/internal/modfetch.s                                               126299   125860   -439    -0.348%
cmd/vendor/github.com/google/pprof/driver.s                              9042     9000     -42     -0.464%
cmd/go/internal/modconv.s                                                17947    17889    -58     -0.323%
cmd/pprof.s                                                              12399    12326    -73     -0.589%
cmd/go/internal/modload.s                                                151182   150389   -793    -0.525%
cmd/go/internal/generate.s                                               11738    11636    -102    -0.869%
cmd/go/internal/help.s                                                   6571     6531     -40     -0.609%
cmd/go/internal/clean.s                                                  11174    11142    -32     -0.286%
cmd/go/internal/vet.s                                                    7897     7867     -30     -0.380%
cmd/go/internal/envcmd.s                                                 22176    22095    -81     -0.365%
cmd/go/internal/list.s                                                   15216    15067    -149    -0.979%
cmd/go/internal/modget.s                                                 38698    38519    -179    -0.463%
cmd/go/internal/modcmd.s                                                 46674    46441    -233    -0.499%
cmd/go/internal/test.s                                                   64664    64456    -208    -0.322%
cmd/go.s                                                                 6730     6703     -27     -0.401%
cmd/compile/internal/ssa.s                                               3592565  3582500  -10065  -0.280%
cmd/compile/internal/gc.s                                                1549123  1537123  -12000  -0.775%
cmd/compile/internal/riscv64.s                                           14579    14483    -96     -0.658%
cmd/compile/internal/mips.s                                              20578    20419    -159    -0.773%
cmd/compile/internal/ppc64.s                                             25524    25359    -165    -0.646%
cmd/compile/internal/mips64.s                                            19795    19636    -159    -0.803%
cmd/compile/internal/wasm.s                                              13329    13290    -39     -0.293%
cmd/compile/internal/s390x.s                                             28097    27892    -205    -0.730%
cmd/compile/internal/arm.s                                               31489    31321    -168    -0.534%
cmd/compile/internal/arm64.s                                             29803    29590    -213    -0.715%
cmd/compile/internal/amd64.s                                             32961    33221    +260    +0.789%
cmd/compile/internal/x86.s                                               31029    30878    -151    -0.487%
total                                                                    18534966 18440341 -94625  -0.511%

Change-Id: I830d37364f14f0297800adc42c99f60a74c51aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/226367
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-31 21:26:33 +00:00
Keith Randall
33b648c0e9 cmd/compile: fix ephemeral pointer problem on amd64
Make sure we don't use the rewrite ptr + (c + x) -> c + (ptr + x), as
that may create an ephemeral out-of-bounds pointer.

I have not seen an actual bug caused by this yet, but we've seen
them in the 386 port so I'm fixing this issue for amd64 as well.

The load-combining rules needed to be reworked somewhat to still
work without the above broken rule.

Update #37881

Change-Id: I8046d170e89e2035195f261535e34ca7d8aca68a
Reviewed-on: https://go-review.googlesource.com/c/go/+/226437
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-30 17:25:29 +00:00
Keith Randall
af7eafd150 cmd/compile: convert 386 port to use addressing modes pass (take 2)
Retrying CL 222782, with a fix that will hopefully stop the random crashing.

The issue with the previous CL is that it does pointer arithmetic
in a way that may briefly generate an out-of-bounds pointer. If an
interrupt happens to occur in that state, the referenced object may
be collected incorrectly.

Suppose there was code that did s[x+c].  The previous CL had a rule
to the effect of ptr + (x + c) -> c + (ptr + x).  But ptr+x is not
guaranteed to point to the same object as ptr. In contrast,
ptr+(x+c) is guaranteed to point to the same object as ptr, because
we would have already checked that x+c is in bounds.

For example, strconv.trim used to have this code:
  MOVZX -0x1(BX)(DX*1), BP
  CMPL $0x30, AL
After CL 222782, it had this code:
  LEAL 0(BX)(DX*1), BP
  CMPB $0x30, -0x1(BP)

An interrupt between those last two instructions could see BP pointing
outside the backing store of the slice involved.

It's really hard to actually demonstrate a bug. First, you need to
have an interrupt occur at exactly the right time. Then, there must
be no other pointers to the object in question. Since the interrupted
frame will be scanned conservatively, there can't even be a dead
pointer in another register or on the stack. (In the example above,
a bug can't happen because BX still holds the original pointer.)
Then, the object in question needs to be collected (or at least
scanned?) before the interrupted code continues.
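
For illustration, source of roughly the following shape produces an s[x+c] access with c = -1. This is a minimal sketch: trimTrailingZeros is a hypothetical helper, loosely modeled on the strconv.trim loop mentioned above, not the actual library code.

```go
package main

import "fmt"

// trimTrailingZeros is a hypothetical helper. It returns the length of
// digits once trailing '0' bytes are dropped.
func trimTrailingZeros(digits []byte) int {
	n := len(digits)
	// digits[n-1] is an access of the form s[x+c] with c = -1.
	// The bounds check guarantees that n-1 is in range, so the address
	// ptr+(n-1) lies inside the backing store. The intermediate value
	// ptr+n does not have that guarantee: when n == len(digits) it is
	// one past the last element.
	for n > 0 && digits[n-1] == '0' {
		n--
	}
	return n
}

func main() {
	fmt.Println(trimTrailingZeros([]byte("120500")))
}
```

Whether a given compiler version actually materializes ptr+n in a register depends on the rewrite rules in effect. The sketch only shows where such an intermediate could arise.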

This CL needs to handle load combining somewhat differently than CL 222782
because of the new restriction on arithmetic. That's the only real
difference (other than removing the bad rules) from that old CL.

This bug is also present in the amd64 rewrite rules, and we haven't
seen any crashing as a result. I will fix up that code similarly to
this one in a separate CL.

Update #37881

Change-Id: I5f0d584d9bef4696bfe89a61ef0a27c8d507329f
Reviewed-on: https://go-review.googlesource.com/c/go/+/225798
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-27 18:54:45 +00:00
Lynn Boger
e4a1cf8a56 cmd/compile: add rules to eliminate unnecessary signed shifts
This change to the rules removes some unnecessary signed shifts
that appear in the math/rand functions. Existing rules did not
cover some of the signed cases.

A small improvement is seen in math/rand from removing 1 of the 2
instructions generated for Int31n, which is inlined quite a bit.

name                     old time/op  new time/op  delta
Intn1000                 46.9ns ± 0%  45.5ns ± 0%   -2.99%  (p=1.000 n=1+1)
Int63n1000               33.5ns ± 0%  32.8ns ± 0%   -2.09%  (p=1.000 n=1+1)
Int31n1000               32.7ns ± 0%  32.6ns ± 0%   -0.31%  (p=1.000 n=1+1)
Float32                  32.7ns ± 0%  30.3ns ± 0%   -7.34%  (p=1.000 n=1+1)
Float64                  21.7ns ± 0%  20.9ns ± 0%   -3.69%  (p=1.000 n=1+1)
Perm3                     205ns ± 0%   202ns ± 0%   -1.46%  (p=1.000 n=1+1)
Perm30                   1.71µs ± 0%  1.68µs ± 0%   -1.35%  (p=1.000 n=1+1)
Perm30ViaShuffle         1.65µs ± 0%  1.65µs ± 0%   -0.30%  (p=1.000 n=1+1)
ShuffleOverhead          2.83µs ± 0%  2.83µs ± 0%   -0.07%  (p=1.000 n=1+1)
Read3                    18.7ns ± 0%  16.1ns ± 0%  -13.90%  (p=1.000 n=1+1)
Read64                    126ns ± 0%   124ns ± 0%   -1.59%  (p=1.000 n=1+1)
Read1000                 1.75µs ± 0%  1.63µs ± 0%   -7.08%  (p=1.000 n=1+1)

Change-Id: I11502dfca7d65aafc76749a8d713e9e50c24a858
Reviewed-on: https://go-review.googlesource.com/c/go/+/225917
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-27 16:05:42 +00:00
Ruixin(Peter) Bao
16cfab8d89 cmd/compile: use load and test instructions on s390x
The load and test instructions compare the given value
against zero and will produce a condition code indicating
one of the following scenarios:

0: Result is zero
1: Result is less than zero
2: Result is greater than zero
3: Result is not a number (NaN)

The instruction can be used to simplify floating point comparisons
against zero, which can enable further optimizations.
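
The four outcomes listed above can be modeled in Go as follows. This is only an illustration of the condition-code table. conditionCode is a hypothetical name, and the code says nothing about which instructions the compiler emits.

```go
package main

import (
	"fmt"
	"math"
)

// conditionCode models the four outcomes of comparing a floating-point
// value against zero. It is a Go illustration, not the s390x instruction.
func conditionCode(x float64) int {
	switch {
	case x == 0:
		return 0 // zero (either sign)
	case x < 0:
		return 1 // less than zero
	case x > 0:
		return 2 // greater than zero
	default:
		return 3 // NaN: every ordered comparison is false
	}
}

func main() {
	for _, x := range []float64{0, -1.5, 2.5, math.NaN()} {
		fmt.Println(x, conditionCode(x))
	}
}
```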

This CL also reduces the size of the .text section of the math.test binary by
around 0.7 KB (from 0x1358f0 to 0x135620).

Change-Id: I33cb714f0c6feebac7a1c46dfcc735e7daceff9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/209159
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-25 13:10:07 +00:00
Keith Randall
c785633941 Revert "cmd/compile: convert 386 port to use addressing modes pass"
This reverts commit CL 222782.

Reason for revert: Reverting to see if 386 errors go away

Update #37881

Change-Id: I74f287404c52414db1b6ff1649effa4ed9e5cc0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/225218
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2020-03-24 19:07:15 +00:00
Keith Randall
e0deacd1c0 Revert "cmd/compile: disable mem+op operations on 386"
This reverts commit CL 224837.

Reason for revert: Reverting partial reverts of 222782.

Update #37881

Change-Id: Ie9bf84d6e17ed214abe538965e5ff03936886826
Reviewed-on: https://go-review.googlesource.com/c/go/+/225217
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2020-03-24 19:06:22 +00:00
Keith Randall
f975485ad1 Revert "cmd/compile: disable addressingmodes pass for 386"
This reverts commit CL 225057.

Reason for revert: Undoing partial reverts of CL 222782

Update #37881

Change-Id: Iee024cab2a580a37a0fc355e0e3c5ad3d8fdaf7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/225197
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2020-03-24 19:05:50 +00:00
Keith Randall
5b897ec017 cmd/compile: disable addressingmodes pass for 386
Update #37881

Change-Id: I1f9a3f57f6215a19c31765c257ee78715eab36b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/225057
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-23 20:31:13 +00:00
Keith Randall
3adbdb6d99 cmd/compile: disable mem+op operations on 386
Rolling back portions of CL 222782 to see if that helps
issue #37881 any.

Update #37881

Change-Id: I9cc3ff8c469fa5e4b22daec715d04148033f46f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/224837
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2020-03-23 18:27:37 +00:00
Russ Cox
fc8a6336d1 cmd/asm, cmd/compile, runtime: add -spectre=ret mode
This commit extends the -spectre flag to cmd/asm and adds
a new Spectre mitigation mode "ret", which enables the use
of retpolines.

Retpolines prevent speculation about the target of an indirect
jump or call and are described in more detail here:
https://support.google.com/faqs/answer/7625886

Change-Id: I4f2cb982fa94e44d91e49bd98974fd125619c93a
Reviewed-on: https://go-review.googlesource.com/c/go/+/222661
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-13 19:05:54 +00:00
Russ Cox
877ef86bec cmd/compile: add spectre mitigation mode enabled by -spectre
This commit adds a new cmd/compile flag -spectre,
which accepts a comma-separated list of possible
Spectre mitigations to apply, or the empty string (none),
or "all". The only known mitigation right now is "index",
which uses conditional moves to ensure that x86-64 CPUs
do not speculate past index bounds checks.
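
As a rough sketch of the idea, the index can be forced to a harmless value whenever it is out of range, so that a mispredicted bounds check cannot feed an attacker-chosen index to the load. The code below uses arithmetic masking in plain Go rather than a conditional move, and maskIndex is a hypothetical name. It assumes n is non-negative, like a slice length.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maskIndex returns i when 0 <= i < n and 0 otherwise, without branching
// on the comparison. n is assumed to be non-negative.
func maskIndex(i, n int) int {
	// Sub reports a borrow of 1 exactly when uint(i) < uint(n).
	_, borrow := bits.Sub(uint(i), uint(n), 0)
	mask := -borrow // all ones when in range, zero otherwise
	return int(uint(i) & mask)
}

func main() {
	table := []int{10, 20, 30, 40}
	for _, i := range []int{2, 4, -1} {
		fmt.Println(i, maskIndex(i, len(table)))
	}
}
```

In real code the ordinary bounds check still runs. The masking only limits what a speculative load past that check could read.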

Speculating past index bounds checks may be problematic
on systems running privileged servers that accept requests
from untrusted users who can execute their own programs
on the same machine. (And some more constraints that
make it even more unlikely in practice.)

The cases this protects against are analogous to the ones
Microsoft explains in the "Array out of bounds load/store feeding ..."
sections here:
https://docs.microsoft.com/en-us/cpp/security/developer-guidance-speculative-execution?view=vs-2019#array-out-of-bounds-load-feeding-an-indirect-branch

Change-Id: Ib7532d7e12466b17e04c4e2075c2a456dc98f610
Reviewed-on: https://go-review.googlesource.com/c/go/+/222660
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-13 19:05:46 +00:00
Keith Randall
d84cbec890 cmd/compile: convert 386 port to use addressing modes pass
Update #36468

Change-Id: Idfdb845d097994689be450d6e8a57fa9adb57166
Reviewed-on: https://go-review.googlesource.com/c/go/+/222782
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-03-13 17:00:54 +00:00
Russ Cox
801a9d9a0c test/codegen: mention in README that tests only run on Linux without -all_codegen
This took me a while to figure out. The relevant code is in
test/run.go (note the "linux" hard-coded strings):

	var arch, subarch, os string
	switch {
	case archspec[2] != "": // 3 components: "linux/386/sse2"
		os, arch, subarch = archspec[0], archspec[1][1:], archspec[2][1:]
	case archspec[1] != "": // 2 components: "386/sse2"
		os, arch, subarch = "linux", archspec[0], archspec[1][1:]
	default: // 1 component: "386"
		os, arch, subarch = "linux", archspec[0], ""
		if arch == "wasm" {
			os = "js"
		}
	}
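
The defaulting behavior in that snippet can be reproduced in a self-contained form. This is a sketch, not the code in test/run.go: parseArchSpec is a hypothetical name, and it splits on "/" instead of using the regexp submatches that archspec holds in the original.

```go
package main

import (
	"fmt"
	"strings"
)

// parseArchSpec splits a platform spec such as "linux/386/sse2",
// "386/sse2" or "386" and fills in the same defaults as the snippet
// above: the OS is "linux" unless given, and "js" for a bare "wasm".
func parseArchSpec(spec string) (os, arch, subarch string) {
	parts := strings.Split(spec, "/")
	switch len(parts) {
	case 3: // "linux/386/sse2"
		return parts[0], parts[1], parts[2]
	case 2: // "386/sse2"
		return "linux", parts[0], parts[1]
	default: // "386"
		os, arch = "linux", parts[0]
		if arch == "wasm" {
			os = "js"
		}
		return os, arch, ""
	}
}

func main() {
	for _, s := range []string{"386", "386/sse2", "linux/386/sse2", "wasm"} {
		os, arch, sub := parseArchSpec(s)
		fmt.Printf("%-16s os=%s arch=%s subarch=%q\n", s, os, arch, sub)
	}
}
```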

Change-Id: I92ba280025d2072e17532a5e43cf1d676789c167
Reviewed-on: https://go-review.googlesource.com/c/go/+/222819
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-11 16:17:08 +00:00
Keith Randall
98cb76799c cmd/compile: insert complicated x86 addressing modes as a separate pass
Use a separate compiler pass to introduce complicated x86 addressing
modes.  Loads in the normal architecture rules (for x86 and all other
platforms) can have constant offsets (AuxInt values) and symbols (Aux
values), but no more.

The complex addressing modes (x+y, x+2*y, etc.) are introduced in a
separate pass that combines loads with LEAQx ops.

Organizing rewrites this way reduces the number of rewrite rules
required. Without the separate pass, many different rule orderings
have to be specified to ensure these complex addressing modes are
always found when they are possible.

Update #36468

Change-Id: I5b4bf7b03a1e731d6dfeb9ef19b376175f3b4b44
Reviewed-on: https://go-review.googlesource.com/c/go/+/217097
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-03-10 00:13:21 +00:00
Diogo Pinela
19ed0d993c cmd/compile: use staticuint64s instead of staticbytes
There are still two places in src/runtime/string.go that use
staticbytes, so we cannot delete it just yet.

There is a new codegen test to verify that the index calculation
is constant-folded, at least on amd64. ppc64, mips[64] and s390x
cannot currently do that.

There is also a new runtime benchmark to ensure that this does not
slow down performance (tested against parent commit):

name                      old time/op  new time/op  delta
ConvT2EByteSized/bool-4   1.07ns ± 1%  1.07ns ± 1%   ~     (p=0.060 n=14+15)
ConvT2EByteSized/uint8-4  1.06ns ± 1%  1.07ns ± 1%   ~     (p=0.095 n=14+15)

Updates #37612

Change-Id: I5ec30738edaa48cda78dfab4a78e24a32fa7fd6a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221957
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-03-04 21:43:01 +00:00
Keith Randall
cd9fd640db cmd/compile: don't allow NaNs in floating-point constant ops
Trying this CL again, with a fixed test that allows platforms
to disagree on the exact behavior of converting NaNs.

We store 32-bit floating-point constants in a 64-bit field by
converting the 32-bit float to a 64-bit float to store it, and
converting it back to use it.

That works for *almost* all floating-point constants. The exception is
signaling NaNs. The round trip described above means we can't represent
a 32-bit signaling NaN, because conversions strip the signaling bit.
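
The round trip can be observed from ordinary Go code. The sketch below is an illustration only: roundTrip32 is a hypothetical helper, 0x7fa00000 is one example of a signaling NaN bit pattern, and the exact output pattern may differ between platforms.

```go
package main

import (
	"fmt"
	"math"
)

// roundTrip32 widens a float32 bit pattern to float64 and narrows it
// back, returning the resulting 32-bit pattern.
func roundTrip32(bits uint32) uint32 {
	f := math.Float32frombits(bits)
	return math.Float32bits(float32(float64(f)))
}

func main() {
	const snan = 0x7fa00000 // exponent all ones, quiet bit clear, payload nonzero
	fmt.Printf("in:  %#x\nout: %#x\n", uint32(snan), roundTrip32(snan))
}
```

Ordinary values survive the round trip unchanged. For a signaling NaN the result is still a NaN, but the bit pattern is not guaranteed to match the input.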

To fix this issue, just forbid NaNs as floating-point constants in SSA
form. This shouldn't affect any real-world code, as people seldom
constant-propagate NaNs (except in test code).

Additionally, NaNs are somewhat underspecified (which of the many NaNs
do you get when dividing 0/0?), so when cross-compiling there's a
danger of using the compiler machine's NaN regime for some math, and
the target machine's NaN regime for other math. Better to use the
target machine's NaN regime always.

Update #36400

Change-Id: Idf203b688a15abceabbd66ba290d4e9f63619ecb
Reviewed-on: https://go-review.googlesource.com/c/go/+/221790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-03-04 04:49:54 +00:00
Josh Bleecher Snyder
b49d8ce2fa all: fix two minor typos in comments
Change-Id: Iec6cd81c9787d3419850aa97e75052956ad139bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/221789
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
2020-03-03 17:44:05 +00:00
Michael Munday
e37cc29863 cmd/compile: optimize integer-in-range checks
This CL incorporates code from CL 201206 by Josh Bleecher Snyder
(thanks Josh).

This CL restores the integer-in-range optimizations in the SSA
backend. The fuse pass is enhanced to detect inequalities that
could be merged and fuse their associated blocks while the generic
rules optimize them into a single unsigned comparison.

For example, the inequality `x >= 0 && x < 10` will now be optimized
to `unsigned(x) < 10`.
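
The equivalence behind this rewrite can be checked directly in Go. This is a small sketch with hypothetical function names. It demonstrates the arithmetic identity, not the compiler pass itself.

```go
package main

import "fmt"

// inRangeSigned is the two-comparison form as written in source.
func inRangeSigned(x int) bool { return x >= 0 && x < 10 }

// inRangeUnsigned is the single-comparison form. A negative x becomes a
// very large unsigned value, so it fails the test just as x >= 0 would.
func inRangeUnsigned(x int) bool { return uint(x) < 10 }

func main() {
	for x := -1000; x <= 1000; x++ {
		if inRangeSigned(x) != inRangeUnsigned(x) {
			fmt.Println("mismatch at", x)
			return
		}
	}
	fmt.Println("both forms agree on [-1000, 1000]")
}
```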

Overall, this has a fairly positive impact on binary sizes.

name                      old time/op       new time/op       delta
Template                        192ms ± 1%        192ms ± 1%    ~     (p=0.757 n=17+18)
Unicode                        76.6ms ± 2%       76.5ms ± 2%    ~     (p=0.603 n=19+19)
GoTypes                         694ms ± 1%        693ms ± 1%    ~     (p=0.569 n=19+20)
Compiler                        3.26s ± 0%        3.27s ± 0%  +0.25%  (p=0.000 n=20+20)
SSA                             7.41s ± 0%        7.49s ± 0%  +1.10%  (p=0.000 n=17+19)
Flate                           120ms ± 1%        120ms ± 1%  +0.38%  (p=0.003 n=19+19)
GoParser                        152ms ± 1%        152ms ± 1%    ~     (p=0.061 n=17+19)
Reflect                         422ms ± 1%        425ms ± 2%  +0.76%  (p=0.001 n=18+20)
Tar                             167ms ± 1%        167ms ± 0%    ~     (p=0.730 n=18+19)
XML                             233ms ± 4%        231ms ± 1%    ~     (p=0.752 n=20+17)
LinkCompiler                    927ms ± 8%        928ms ± 8%    ~     (p=0.857 n=19+20)
ExternalLinkCompiler            1.81s ± 2%        1.81s ± 2%    ~     (p=0.513 n=19+20)
LinkWithoutDebugCompiler        556ms ±10%        583ms ±13%  +4.95%  (p=0.007 n=20+20)
[Geo mean]                      478ms             481ms       +0.52%

name                      old user-time/op  new user-time/op  delta
Template                        270ms ± 5%        269ms ± 7%    ~     (p=0.925 n=20+20)
Unicode                         134ms ± 7%        131ms ±14%    ~     (p=0.593 n=18+20)
GoTypes                         981ms ± 3%        987ms ± 2%  +0.63%  (p=0.049 n=19+18)
Compiler                        4.50s ± 2%        4.50s ± 1%    ~     (p=0.588 n=19+20)
SSA                             10.6s ± 2%        10.6s ± 1%    ~     (p=0.141 n=20+19)
Flate                           164ms ± 8%        165ms ±10%    ~     (p=0.738 n=20+20)
GoParser                        202ms ± 5%        203ms ± 6%    ~     (p=0.820 n=20+20)
Reflect                         587ms ± 6%        597ms ± 3%    ~     (p=0.087 n=20+18)
Tar                             230ms ± 6%        228ms ± 8%    ~     (p=0.569 n=19+20)
XML                             311ms ± 6%        314ms ± 5%    ~     (p=0.369 n=20+20)
LinkCompiler                    878ms ± 8%        887ms ± 7%    ~     (p=0.289 n=20+20)
ExternalLinkCompiler            1.60s ± 7%        1.60s ± 7%    ~     (p=0.820 n=20+20)
LinkWithoutDebugCompiler        498ms ±12%        489ms ±11%    ~     (p=0.398 n=20+20)
[Geo mean]                      611ms             611ms       +0.05%

name                      old alloc/op      new alloc/op      delta
Template                       36.1MB ± 0%       36.0MB ± 0%  -0.32%  (p=0.000 n=20+20)
Unicode                        28.3MB ± 0%       28.3MB ± 0%  -0.03%  (p=0.000 n=19+20)
GoTypes                         121MB ± 0%        121MB ± 0%    ~     (p=0.226 n=16+20)
Compiler                        563MB ± 0%        563MB ± 0%    ~     (p=0.166 n=20+19)
SSA                            1.32GB ± 0%       1.33GB ± 0%  +0.88%  (p=0.000 n=20+19)
Flate                          22.7MB ± 0%       22.7MB ± 0%  -0.02%  (p=0.033 n=19+20)
GoParser                       27.9MB ± 0%       27.9MB ± 0%  -0.02%  (p=0.001 n=20+20)
Reflect                        78.3MB ± 0%       78.2MB ± 0%  -0.01%  (p=0.019 n=20+20)
Tar                            34.0MB ± 0%       34.0MB ± 0%  -0.04%  (p=0.000 n=20+20)
XML                            43.9MB ± 0%       43.9MB ± 0%  -0.07%  (p=0.000 n=20+19)
LinkCompiler                    205MB ± 0%        205MB ± 0%  +0.44%  (p=0.000 n=20+18)
ExternalLinkCompiler            223MB ± 0%        223MB ± 0%  +0.03%  (p=0.000 n=20+20)
LinkWithoutDebugCompiler        139MB ± 0%        142MB ± 0%  +1.75%  (p=0.000 n=20+20)
[Geo mean]                     93.7MB            93.9MB       +0.20%

name                      old allocs/op     new allocs/op     delta
Template                         363k ± 0%         361k ± 0%  -0.58%  (p=0.000 n=20+19)
Unicode                          329k ± 0%         329k ± 0%  -0.06%  (p=0.000 n=19+20)
GoTypes                         1.28M ± 0%        1.28M ± 0%  -0.01%  (p=0.000 n=20+20)
Compiler                        5.40M ± 0%        5.40M ± 0%  -0.01%  (p=0.000 n=20+20)
SSA                             12.7M ± 0%        12.8M ± 0%  +0.80%  (p=0.000 n=20+20)
Flate                            228k ± 0%         228k ± 0%    ~     (p=0.194 n=20+20)
GoParser                         295k ± 0%         295k ± 0%  -0.04%  (p=0.000 n=20+20)
Reflect                          949k ± 0%         949k ± 0%  -0.01%  (p=0.000 n=20+20)
Tar                              337k ± 0%         337k ± 0%  -0.06%  (p=0.000 n=20+20)
XML                              418k ± 0%         417k ± 0%  -0.17%  (p=0.000 n=20+20)
LinkCompiler                     553k ± 0%         554k ± 0%  +0.22%  (p=0.000 n=20+19)
ExternalLinkCompiler            1.52M ± 0%        1.52M ± 0%  +0.27%  (p=0.000 n=20+20)
LinkWithoutDebugCompiler         186k ± 0%         186k ± 0%  +0.06%  (p=0.000 n=20+20)
[Geo mean]                       723k              723k       +0.03%

name                      old text-bytes    new text-bytes    delta
HelloSize                       828kB ± 0%        828kB ± 0%  -0.01%  (p=0.000 n=20+20)

name                      old data-bytes    new data-bytes    delta
HelloSize                      13.4kB ± 0%       13.4kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       180kB ± 0%        180kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.23MB ± 0%       1.23MB ± 0%  -0.33%  (p=0.000 n=20+20)

file      before    after     Δ       %
addr2line 4320075   4311883   -8192   -0.190%
asm       5191932   5187836   -4096   -0.079%
buildid   2835338   2831242   -4096   -0.144%
compile   20531717  20569099  +37382  +0.182%
cover     5322511   5318415   -4096   -0.077%
dist      3723749   3719653   -4096   -0.110%
doc       4743515   4739419   -4096   -0.086%
fix       3413960   3409864   -4096   -0.120%
link      6690119   6686023   -4096   -0.061%
nm        4269616   4265520   -4096   -0.096%
pprof     14942189  14929901  -12288  -0.082%
trace     11807164  11790780  -16384  -0.139%
vet       8384104   8388200   +4096   +0.049%
go        15339076  15334980  -4096   -0.027%
total     132258257 132226007 -32250  -0.024%

Fixes #30645.

Change-Id: If551ac5996097f3685870d083151b5843170aab0
Reviewed-on: https://go-review.googlesource.com/c/go/+/165998
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-03 14:30:26 +00:00
Michael Munday
44fe355694 cmd/compile: canonicalize comparison argument order
Ensure that any comparison between two values has the same argument
order. This helps ensure that they can be eliminated during the
lowered CSE pass which will be particularly important if we eliminate
the Greater and Geq ops (see #37316).

Example:

  CMP R0, R1
  BLT L1
  CMP R1, R0 // different order, cannot eliminate
  BEQ L2

  CMP R0, R1
  BLT L1
  CMP R0, R1 // same order, can eliminate
  BEQ L2
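
Source of roughly this shape compares the same pair of values twice with the operands written in opposite orders. It is a hypothetical example, and whether a particular compiler version emits two CMP instructions for it depends on the target and on other optimizations.

```go
package main

import "fmt"

// classify compares a and b twice. The second comparison names the
// operands in the opposite order from the first.
func classify(a, b int) string {
	if a < b {
		return "less"
	}
	if b == a { // same operands as above, opposite order
		return "equal"
	}
	return "greater"
}

func main() {
	fmt.Println(classify(1, 2), classify(2, 2), classify(3, 2))
}
```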

This does have some drawbacks. Notably, comparisons might 'flip'
direction in the assembly output after even small changes to the
code or compiler. It should, however, help make optimizations more
reliable.

compilecmp master -> HEAD
master (218f4572f5): text/template: make reflect.Value indirections more robust
HEAD (f1661fef3e): cmd/compile: canonicalize comparison argument order
platform: linux/amd64

file      before    after     Δ       %
api       6063927   6068023   +4096   +0.068%
asm       5191757   5183565   -8192   -0.158%
cgo       4893518   4901710   +8192   +0.167%
cover     5330345   5326249   -4096   -0.077%
fix       3417778   3421874   +4096   +0.120%
pprof     14889456  14885360  -4096   -0.028%
test2json 2848138   2844042   -4096   -0.144%
trace     11746239  11733951  -12288  -0.105%
total     132739173 132722789 -16384  -0.012%

Change-Id: I11736b3fe2a4553f6fc65018f475e88217fa22f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/220425
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 10:32:22 +00:00
Bryan C. Mills
a9f1ea4a83 Revert "cmd/compile: don't allow NaNs in floating-point constant ops"
This reverts CL 213477.

Reason for revert: tests are failing on linux-mips*-rtrk builders.

Change-Id: I8168f7450890233f1bd7e53930b73693c26d4dc0
Reviewed-on: https://go-review.googlesource.com/c/go/+/220897
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-02-25 15:49:19 +00:00
Keith Randall
2aa7c6c548 cmd/compile: don't allow NaNs in floating-point constant ops
We store 32-bit floating-point constants in a 64-bit field by
converting the 32-bit float to a 64-bit float to store it, and
converting it back to use it.

That works for *almost* all floating-point constants. The exception is
signaling NaNs. The round trip described above means we can't represent
a 32-bit signaling NaN, because conversions strip the signaling bit.

To fix this issue, just forbid NaNs as floating-point constants in SSA
form. This shouldn't affect any real-world code, as people seldom
constant-propagate NaNs (except in test code).

Additionally, NaNs are somewhat underspecified (which of the many NaNs
do you get when dividing 0/0?), so when cross-compiling there's a
danger of using the compiler machine's NaN regime for some math, and
the target machine's NaN regime for other math. Better to use the
target machine's NaN regime always.

This has been a bug since 1.10, and there's an easy workaround
(declare a global variable containing the signaling NaN pattern, and
use that as the argument to math.Float32frombits) so we'll fix it in
1.15.
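
The workaround described above might look like the following sketch. The
bit pattern 0x7f800001 and the names snanBits, signalingNaN32 and isNaN32
are illustrative assumptions, not taken from the CL or the issue.

```go
package main

import (
	"fmt"
	"math"
)

// snanBits is an assumed example pattern for a 32-bit signaling NaN:
// exponent bits all ones, quiet bit (bit 22) clear, nonzero mantissa.
// Keeping it in a package-level variable means the value is not a
// compile-time constant.
var snanBits uint32 = 0x7f800001

// signalingNaN32 builds the float32 from the bit pattern at run time.
func signalingNaN32() float32 {
	return math.Float32frombits(snanBits)
}

// isNaN32 reports whether f is any NaN (NaN is the only value that is
// not equal to itself).
func isNaN32(f float32) bool {
	return f != f
}

func main() {
	f := signalingNaN32()
	fmt.Println(isNaN32(f))
	fmt.Printf("%#x\n", math.Float32bits(f))
}
```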

Fixes #36400
Update #36399

Change-Id: Icf155e743281560eda2eed953d19a829552ccfda
Reviewed-on: https://go-review.googlesource.com/c/go/+/213477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-02-25 02:21:53 +00:00
Keith Randall
1cfe8e91b6 cmd/compile: use ADDQ instead of LEAQ when we can
The address calculations in the example end up doing x << 4 + y + 0.
Before this CL we use a SHLQ+LEAQ. Since the constant offset is 0,
we can use SHLQ+ADDQ instead.
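
The example the message refers to is not reproduced here. As a rough
sketch of the kind of code that produces an x << 4 + y + 0 address
calculation, consider indexing a slice of 16-byte elements. The type pair
and the function elem are hypothetical, and the instructions actually
emitted depend on the compiler version and target.

```go
package main

import "fmt"

// pair is a hypothetical 16-byte element (two 8-byte words).
type pair struct{ a, b uint64 }

// elem returns a pointer to s[i]. On a 64-bit target the element
// address is base + i<<4 with a zero constant offset.
func elem(s []pair, i int) *pair {
	return &s[i]
}

func main() {
	s := []pair{{1, 2}, {3, 4}, {5, 6}}
	fmt.Println(elem(s, 2).a)
}
```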

Change-Id: Ia048c4fdbb3a42121c7e1ab707961062e8247fca
Reviewed-on: https://go-review.googlesource.com/c/go/+/209959
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-02-24 21:33:53 +00:00
Brian Kessler
6b1d5471b9 cmd/compile: add signed indivisibility by power of 2 rules
Commit 44343c777c (CL 173557) added rules for handling
divisibility checks for powers of 2 for signed integers, x%c == 0.
This change adds the complementary indivisibility rules, x%c != 0.
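
To illustrate why such a check can be done without a division, the sketch
below compares the remainder form with a bit-mask form for c = 8. The
function names are hypothetical, and the mask form shows the arithmetic
equivalence on two's complement integers; it is not claimed to be the
exact rewrite the rules produce.

```go
package main

import "fmt"

// notMultipleOf8 uses the remainder form from the commit message.
func notMultipleOf8(x int32) bool {
	return x%8 != 0
}

// notMultipleOf8Mask tests the low three bits instead. For two's
// complement signed integers this gives the same answer, including
// for negative x.
func notMultipleOf8Mask(x int32) bool {
	return x&7 != 0
}

func main() {
	for _, x := range []int32{-16, -9, -8, -1, 0, 1, 7, 8, 9} {
		fmt.Println(x, notMultipleOf8(x), notMultipleOf8Mask(x))
	}
}
```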

Fixes #34166

Change-Id: I87379e30af7aff633371acca82db2397da9b2c07
Reviewed-on: https://go-review.googlesource.com/c/go/+/194219
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07 16:30:46 +00:00
Russ Cox
543c6d2e0d math, cmd/compile: rename Fma to FMA
This API was added for #25819, where it was discussed as math.FMA.
The commit adding it used math.Fma, presumably for consistency
with the rest of the unusual names in package math
(Sincos, Acosh, Erfcinv, Float32bits, etc).

I believe that using an idiomatic Go name is more important here
than consistency with these other names, most of which are historical
baggage from C's standard library.

Early additions like Float32frombits happened before "uppercase for export"
(so they were originally like "float32frombits") and they were not properly
reconsidered when we uppercased the symbols to export them.
That's a mistake we live with.

The names of functions we have added since then, and even a few
that were legacy, are more properly Go-cased, such as IsNaN, IsInf,
and RoundToEven, rather than Isnan, Isinf, and Roundtoeven.
The same goes for constants like MaxFloat32.

For new API, we should keep using proper Go-cased symbols
instead of minimally-upper-cased-C symbols.

So math.FMA, not math.Fma.

This API has not yet been released, so this change does not break
the compatibility promise.

This CL also modifies cmd/compile, since the compiler knows
the name of the function. I could have stopped at changing the
string constants, but it seemed to make more sense to use a
consistent casing everywhere.

Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205317
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-07 14:51:06 +00:00
smasher164
58b031949b cmd/compile: add fma intrinsic for arm
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.

Updates #25819.

Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 17:42:47 +00:00
smasher164
7a6da218b1 cmd/compile: add fma intrinsic for amd64
To permit ssa-level optimization, this change introduces an amd64 intrinsic
that generates the VFMADD231SD instruction for the fused-multiply-add
operation on systems that support it. System support is detected via
cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic
("Fma") to VFMADD231SD.

The benchmark compares the software implementation (old) with the intrinsic
(new).

name   old time/op  new time/op  delta
Fma-4  27.2ns ± 1%   1.0ns ± 9%  -96.48%  (p=0.008 n=5+5)

Updates #25819.

Change-Id: I966655e5f96817a5d06dff5942418a3915b09584
Reviewed-on: https://go-review.googlesource.com/c/go/+/137156
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:42:10 +00:00
smasher164
33425ab8db cmd/compile: introduce generic ssa intrinsic for fused-multiply-add
In order to make math.FMA a compiler intrinsic for ISAs like ARM64,
PPC64[le], and S390X, a generic 3-argument opcode "Fma" is provided and
rewritten as

    ARM64: (Fma x y z) -> (FMADDD z x y)
    PPC64: (Fma x y z) -> (FMADD x y z)
    S390X: (Fma x y z) -> (FMADD z x y)
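
For readers unfamiliar with the operation, the following sketch shows what
the three-argument form computes: x*y + z with a single rounding. It uses
the released name math.FMA; the function roundingDemo and the chosen
inputs are illustrative assumptions. The explicit float64 conversion is
there so that the unfused product is rounded before the subtraction.

```go
package main

import (
	"fmt"
	"math"
)

// roundingDemo returns a*a - c computed two ways, with a = 1 + 2^-27
// and c = 1 + 2^-26. The exact product a*a is 1 + 2^-26 + 2^-54.
func roundingDemo() (unfused, fused float64) {
	a := 1 + math.Ldexp(1, -27) // exactly representable
	c := 1 + math.Ldexp(1, -26) // exactly representable

	// The product is rounded to float64 first, which drops the 2^-54
	// term, so the difference is zero.
	unfused = float64(a*a) - c

	// The fused form rounds only once, after the addition, so the
	// 2^-54 term survives.
	fused = math.FMA(a, a, -c)
	return unfused, fused
}

func main() {
	u, f := roundingDemo()
	fmt.Println(u, f)
}
```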

Updates #25819.

Change-Id: Ie5bc628311e6feeb28ddf9adaa6e702c8c291efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/131959
Run-TryBot: Akhil Indurti <aindurti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:24:15 +00:00
David Chase
6adaf17eaa cmd/compile: preserve statements in late nilcheckelim optimization
When a subsequent load/store of a ptr makes the nil check of that pointer
unnecessary, if their lines differ, change the line of the load/store
to that of the nilcheck, and attempt to rehome the load/store position
instead.

This fix makes profiling less accurate in order to make panics more
informative.

Fixes #33724

Change-Id: Ib9afaac12fe0d0320aea1bf493617facc34034b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/200197
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-15 16:43:44 +00:00
Meng Zhuo
50f1157760 cmd/compile: add math/bits.Mul64 intrinsic on mips64x
Benchmark:
name   old time/op  new time/op  delta
Mul    36.0ns ± 1%   2.8ns ± 0%  -92.31%  (p=0.000 n=10+10)
Mul32  4.37ns ± 0%  4.37ns ± 0%     ~     (p=0.429 n=6+10)
Mul64  36.4ns ± 0%   2.8ns ± 0%  -92.37%  (p=0.000 n=10+9)
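
For context, math/bits.Mul64 returns the full 128-bit product of two
64-bit values as a high and a low word. A minimal usage sketch follows;
the wrapper name mulWide is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulWide returns the 128-bit product of x and y as (hi, lo), so that
// x*y == hi<<64 + lo.
func mulWide(x, y uint64) (hi, lo uint64) {
	return bits.Mul64(x, y)
}

func main() {
	// 2^63 * 4 = 2^65, which does not fit in 64 bits.
	hi, lo := mulWide(1<<63, 4)
	fmt.Println(hi, lo)
}
```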

Change-Id: Ic4f4e5958adbf24999abcee721d0180b5413fca7
Reviewed-on: https://go-review.googlesource.com/c/go/+/200582
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-14 21:23:34 +00:00