1
0
mirror of https://github.com/golang/go synced 2024-11-17 18:54:42 -07:00
Commit Graph

373 Commits

Author SHA1 Message Date
Keith Randall
aaa3d39f27 cmd/compile: include all entries in map literal hint size
Currently we only include static entries in the hint for sizing
the map when allocating a map for a map literal. Change that to
include all entries.

This will be an overallocation if the dynamic entries in the map have
equal keys, but equal keys in map literals are rare, and at worst we
waste a bit of space.

Fixes #43020

Change-Id: I232f82f15316bdf4ea6d657d25a0b094b77884ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/383634
Run-TryBot: Keith Randall <khr@golang.org>
Trust: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-01 23:20:30 +00:00
Archana R
4056934e48 test/codegen: updated arithmetic tests to verify on ppc64,ppc64le
Updated multiple tests in test/codegen/arithmetic.go to verify
on ppc64/ppc64le as well

Change-Id: I79ca9f87017ea31147a4ba16f5d42ba0fcae64e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/358546
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-11-01 13:12:37 +00:00
Archana R
8b9c0d1a79 test/codegen: updated comparison test to verify on ppc64,ppc64le
Updated test/codegen/comparison.go to verify memequal is inlined
as implemented in CL 328291.

Change-Id: If7824aed37ee1f8640e54fda0f9b7610582ba316
Reviewed-on: https://go-review.googlesource.com/c/go/+/357289
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-10-21 13:05:07 +00:00
wdvxdr
fe7df4c4d0 cmd/compile: use MOVBE instruction for GOAMD64>=v3
In CL 354670, I copied some existing rules for convenience but forgot
to update the last rule which broke `GOAMD64=v3 ./make.bat`

Revive CL 354670

Change-Id: Ic1e2047c603f0122482a4b293ce1ef74d806c019
Reviewed-on: https://go-review.googlesource.com/c/go/+/356810
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
2021-10-19 16:18:56 +00:00
Daniel Martí
b0351bfd7d Revert "cmd/compile: use MOVBE instruction for GOAMD64>=v3"
This reverts CL 354670.

Reason for revert: broke make.bash with GOAMD64=v3.

Fixes #49061.

Change-Id: I7f2ed99b7c10100c4e0c1462ea91c4c9d8c609b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/356790
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Koichi Shiraishi <zchee.io@gmail.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2021-10-19 09:49:38 +00:00
Lynn Boger
33b3260c1e cmd/compile/internal/ssagen: set BitLen32 as intrinsic on PPC64
It was noticed through some other investigation that BitLen32
was not generating the best code and found that it wasn't recognized
as an intrinsic. This corrects that and enables the test for PPC64.

Change-Id: Iab496a8830c8552f507b7292649b1b660f3848b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/355872
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-10-18 21:33:08 +00:00
wdvxdr
3e5cc4d6f6 cmd/compile: use MOVBE instruction for GOAMD64>=v3
encoding/binary benchmark on my laptop:
name                      old time/op    new time/op     delta
ReadSlice1000Int32s-8       4.42µs ± 5%     4.20µs ± 1%   -4.94%  (p=0.046 n=9+8)
ReadStruct-8                 359ns ± 8%      368ns ± 5%   +2.35%  (p=0.041 n=9+10)
WriteStruct-8                349ns ± 1%      357ns ± 1%   +2.15%  (p=0.000 n=8+10)
ReadInts-8                   235ns ± 1%      233ns ± 1%   -1.01%  (p=0.005 n=10+10)
WriteInts-8                  265ns ± 1%      274ns ± 1%   +3.45%  (p=0.000 n=10+10)
WriteSlice1000Int32s-8      4.61µs ± 5%     4.59µs ± 5%     ~     (p=0.986 n=10+10)
PutUint16-8                 0.56ns ± 4%     0.57ns ± 4%     ~     (p=0.101 n=10+10)
PutUint32-8                 0.83ns ± 2%     0.56ns ± 6%  -32.91%  (p=0.000 n=10+10)
PutUint64-8                 0.81ns ± 3%     0.62ns ± 4%  -23.82%  (p=0.000 n=10+10)
LittleEndianPutUint16-8     0.55ns ± 4%     0.55ns ± 3%     ~     (p=0.926 n=10+10)
LittleEndianPutUint32-8     0.41ns ± 4%     0.42ns ± 3%     ~     (p=0.148 n=10+9)
LittleEndianPutUint64-8     0.55ns ± 2%     0.56ns ± 6%     ~     (p=0.897 n=10+10)
ReadFloats-8                60.4ns ± 4%     59.0ns ± 1%   -2.25%  (p=0.007 n=10+10)
WriteFloats-8               72.3ns ± 2%     71.5ns ± 7%     ~     (p=0.089 n=10+10)
ReadSlice1000Float32s-8     4.21µs ± 3%     4.18µs ± 2%     ~     (p=0.197 n=10+10)
WriteSlice1000Float32s-8    4.61µs ± 2%     4.68µs ± 7%     ~     (p=1.000 n=8+10)
ReadSlice1000Uint8s-8        250ns ± 4%      247ns ± 4%     ~     (p=0.324 n=10+10)
WriteSlice1000Uint8s-8       227ns ± 5%      229ns ± 2%     ~     (p=0.193 n=10+7)
PutUvarint32-8              15.3ns ± 2%     15.4ns ± 4%     ~     (p=0.782 n=10+10)
PutUvarint64-8              38.5ns ± 1%     38.6ns ± 5%     ~     (p=0.396 n=8+10)

name                      old speed      new speed       delta
ReadSlice1000Int32s-8      890MB/s ±17%    953MB/s ± 1%   +7.00%  (p=0.027 n=10+8)
ReadStruct-8               209MB/s ± 8%    204MB/s ± 5%   -2.42%  (p=0.043 n=9+10)
WriteStruct-8              214MB/s ± 3%    210MB/s ± 1%   -1.75%  (p=0.003 n=9+10)
ReadInts-8                 127MB/s ± 1%    129MB/s ± 1%   +1.01%  (p=0.006 n=10+10)
WriteInts-8                113MB/s ± 1%    109MB/s ± 1%   -3.34%  (p=0.000 n=10+10)
WriteSlice1000Int32s-8     868MB/s ± 5%    872MB/s ± 5%     ~     (p=1.000 n=10+10)
PutUint16-8               3.55GB/s ± 4%   3.50GB/s ± 4%     ~     (p=0.093 n=10+10)
PutUint32-8               4.83GB/s ± 2%   7.21GB/s ± 6%  +49.16%  (p=0.000 n=10+10)
PutUint64-8               9.89GB/s ± 3%  12.99GB/s ± 4%  +31.26%  (p=0.000 n=10+10)
LittleEndianPutUint16-8   3.65GB/s ± 4%   3.65GB/s ± 4%     ~     (p=0.912 n=10+10)
LittleEndianPutUint32-8   9.74GB/s ± 3%   9.63GB/s ± 3%     ~     (p=0.222 n=9+9)
LittleEndianPutUint64-8   14.4GB/s ± 2%   14.3GB/s ± 5%     ~     (p=0.912 n=10+10)
ReadFloats-8               199MB/s ± 4%    203MB/s ± 1%   +2.27%  (p=0.007 n=10+10)
WriteFloats-8              166MB/s ± 2%    168MB/s ± 7%     ~     (p=0.089 n=10+10)
ReadSlice1000Float32s-8    949MB/s ± 3%    958MB/s ± 2%     ~     (p=0.218 n=10+10)
WriteSlice1000Float32s-8   867MB/s ± 2%    857MB/s ± 6%     ~     (p=1.000 n=8+10)
ReadSlice1000Uint8s-8     4.00GB/s ± 4%   4.06GB/s ± 4%     ~     (p=0.353 n=10+10)
WriteSlice1000Uint8s-8    4.40GB/s ± 4%   4.36GB/s ± 2%     ~     (p=0.193 n=10+7)
PutUvarint32-8             262MB/s ± 2%    260MB/s ± 4%     ~     (p=0.739 n=10+10)
PutUvarint64-8             208MB/s ± 1%    207MB/s ± 5%     ~     (p=0.408 n=8+10)

Updates #45453

Change-Id: Ifda0d48d54665cef45d46d3aad974062633142c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/354670
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
2021-10-18 16:01:55 +00:00
Jake Ciolek
732f6fa9d5 cmd/compile: use ANDL for small immediates
We can rewrite ANDQ with an immediate fitting in 32bit with an ANDL, which is shorter to encode.

Looking at Go binary itself, before the change there was:

ANDL: 2337
ANDQ: 4476

After the change:

ANDL: 3790
ANDQ: 3024

So we got rid of 1452 ANDQs

This makes the Linux x86_64 binary 0.03% smaller.

There seems to be an impact on performance.

Intel Cascade Lake benchmarks (with perflock):

name                     old time/op    new time/op    delta
BinaryTree17-8              1.91s ± 1%     1.89s ± 1%  -1.22%  (p=0.000 n=21+18)
Fannkuch11-8                2.34s ± 0%     2.34s ± 0%    ~     (p=0.052 n=20+20)
FmtFprintfEmpty-8          27.7ns ± 1%    27.4ns ± 3%    ~     (p=0.497 n=21+21)
FmtFprintfString-8         53.2ns ± 0%    51.5ns ± 0%  -3.21%  (p=0.000 n=20+19)
FmtFprintfInt-8            57.3ns ± 0%    55.7ns ± 0%  -2.89%  (p=0.000 n=19+19)
FmtFprintfIntInt-8         92.3ns ± 0%    88.4ns ± 1%  -4.23%  (p=0.000 n=20+21)
FmtFprintfPrefixedInt-8     103ns ± 0%     103ns ± 0%  +0.23%  (p=0.000 n=20+21)
FmtFprintfFloat-8           147ns ± 0%     148ns ± 0%  +0.75%  (p=0.000 n=20+21)
FmtManyArgs-8               384ns ± 0%     381ns ± 0%  -0.63%  (p=0.000 n=21+21)
GobDecode-8                3.86ms ± 1%    3.88ms ± 1%  +0.52%  (p=0.000 n=20+21)
GobEncode-8                2.77ms ± 1%    2.77ms ± 0%    ~     (p=0.078 n=21+21)
Gzip-8                      168ms ± 1%     168ms ± 0%  +0.24%  (p=0.000 n=20+20)
Gunzip-8                   25.1ms ± 0%    24.3ms ± 0%  -3.03%  (p=0.000 n=21+21)
HTTPClientServer-8         61.4µs ± 8%    59.1µs ±10%    ~     (p=0.088 n=20+21)
JSONEncode-8               6.86ms ± 0%    6.70ms ± 0%  -2.29%  (p=0.000 n=20+19)
JSONDecode-8               30.8ms ± 1%    30.6ms ± 1%  -0.82%  (p=0.000 n=20+20)
Mandelbrot200-8            3.85ms ± 0%    3.85ms ± 0%    ~     (p=0.191 n=16+17)
GoParse-8                  2.61ms ± 2%    2.60ms ± 1%    ~     (p=0.561 n=21+20)
RegexpMatchEasy0_32-8      48.5ns ± 2%    45.9ns ± 3%  -5.26%  (p=0.000 n=20+21)
RegexpMatchEasy0_1K-8       139ns ± 0%     139ns ± 0%  +0.27%  (p=0.000 n=18+20)
RegexpMatchEasy1_32-8      41.3ns ± 0%    42.1ns ± 4%  +1.95%  (p=0.000 n=17+21)
RegexpMatchEasy1_1K-8       216ns ± 2%     216ns ± 0%  +0.17%  (p=0.020 n=21+19)
RegexpMatchMedium_32-8      790ns ± 7%     803ns ± 8%    ~     (p=0.178 n=21+21)
RegexpMatchMedium_1K-8     23.5µs ± 5%    23.7µs ± 5%    ~     (p=0.421 n=21+21)
RegexpMatchHard_32-8       1.09µs ± 1%    1.09µs ± 1%  -0.53%  (p=0.000 n=19+18)
RegexpMatchHard_1K-8       33.0µs ± 0%    33.0µs ± 0%    ~     (p=0.610 n=21+20)
Revcomp-8                   348ms ± 0%     353ms ± 0%  +1.38%  (p=0.000 n=17+18)
Template-8                 42.0ms ± 1%    41.9ms ± 1%  -0.30%  (p=0.049 n=20+20)
TimeParse-8                 185ns ± 0%     185ns ± 0%    ~     (p=0.387 n=20+18)
TimeFormat-8                237ns ± 1%     241ns ± 1%  +1.57%  (p=0.000 n=21+21)
[Geo mean]                 35.4µs         35.2µs       -0.66%

name                     old speed      new speed      delta
GobDecode-8               199MB/s ± 1%   198MB/s ± 1%  -0.52%  (p=0.000 n=20+21)
GobEncode-8               277MB/s ± 1%   277MB/s ± 0%    ~     (p=0.075 n=21+21)
Gzip-8                    116MB/s ± 1%   115MB/s ± 0%  -0.25%  (p=0.000 n=20+20)
Gunzip-8                  773MB/s ± 0%   797MB/s ± 0%  +3.12%  (p=0.000 n=21+21)
JSONEncode-8              283MB/s ± 0%   290MB/s ± 0%  +2.35%  (p=0.000 n=20+19)
JSONDecode-8             63.0MB/s ± 1%  63.5MB/s ± 1%  +0.82%  (p=0.000 n=20+20)
GoParse-8                22.2MB/s ± 2%  22.3MB/s ± 1%    ~     (p=0.539 n=21+20)
RegexpMatchEasy0_32-8     660MB/s ± 2%   697MB/s ± 3%  +5.57%  (p=0.000 n=20+21)
RegexpMatchEasy0_1K-8    7.36GB/s ± 0%  7.34GB/s ± 0%  -0.26%  (p=0.000 n=18+20)
RegexpMatchEasy1_32-8     775MB/s ± 0%   761MB/s ± 4%  -1.88%  (p=0.000 n=17+21)
RegexpMatchEasy1_1K-8    4.74GB/s ± 2%  4.74GB/s ± 0%  -0.18%  (p=0.020 n=21+19)
RegexpMatchMedium_32-8   40.6MB/s ± 7%  39.9MB/s ± 9%    ~     (p=0.191 n=21+21)
RegexpMatchMedium_1K-8   43.7MB/s ± 5%  43.2MB/s ± 5%    ~     (p=0.435 n=21+21)
RegexpMatchHard_32-8     29.3MB/s ± 1%  29.4MB/s ± 1%  +0.53%  (p=0.000 n=19+18)
RegexpMatchHard_1K-8     31.0MB/s ± 0%  31.0MB/s ± 0%    ~     (p=0.572 n=21+20)
Revcomp-8                 730MB/s ± 0%   720MB/s ± 0%  -1.36%  (p=0.000 n=17+18)
Template-8               46.2MB/s ± 1%  46.3MB/s ± 1%  +0.30%  (p=0.041 n=20+20)
[Geo mean]                204MB/s        205MB/s       +0.30%

Change-Id: Iac75d0ec184a515ce0e65e19559d5fe2e9840514
Reviewed-on: https://go-review.googlesource.com/c/go/+/354970
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-10-12 22:02:39 +00:00
Alejandro García Montoro
e1c294a56d cmd/compile: eliminate successive swaps
The code generated when storing eight bytes loaded from memory in big
endian introduced two successive byte swaps that did not actually
modified the data.

The new rules match this specific pattern both for amd64 and for arm64,
eliminating the double swap.

Fixes #41684

Change-Id: Icb6dc20b68e4393cef4fe6a07b33aba0d18c3ff3
Reviewed-on: https://go-review.googlesource.com/c/go/+/320073
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
2021-10-09 01:04:29 +00:00
Ruslan Andreev
810b08b8ec cmd/compile: inline memequal(x, const, sz) for small sizes
This CL adds late expanded memequal(x, const, sz) inlining for 2, 4, 8
bytes size. This PoC is using the same method as CL 248404.
This optimization fires about 100 times in Go compiler (1675 occurrences
reduced to 1574, so -6%).
Also, added unit-tests to codegen/comparisions.go file.

Updates #37275

Change-Id: Ia52808d573cb706d1da8166c5746ede26f46c5da
Reviewed-on: https://go-review.googlesource.com/c/go/+/328291
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Trust: David Chase <drchase@google.com>
2021-10-06 13:47:50 +00:00
nimelehin
097a82f54d cmd/compile: don't emit unnecessary amd64 extension checks
In case of amd64 the compiler issues checks if extensions are
available on a platform. With GOAMD64 microarchitecture levels
provided, some of the checks could be eliminated.

Change-Id: If15c178bcae273b2ce7d3673415cb8849292e087
Reviewed-on: https://go-review.googlesource.com/c/go/+/352010
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
2021-10-05 18:32:12 +00:00
wdvxdr
060cd73ab9 cmd/compile: use TZCNT instruction for GOAMD64>=v3
on my Intel CoffeeLake CPU:
name               old time/op  new time/op  delta
TrailingZeros-8    0.68ns ± 1%  0.64ns ± 1%  -6.26%  (p=0.000 n=10+10)
TrailingZeros8-8   0.70ns ± 1%  0.70ns ± 1%    ~     (p=0.697 n=10+10)
TrailingZeros16-8  0.70ns ± 1%  0.70ns ± 1%  +0.57%  (p=0.043 n=10+10)
TrailingZeros32-8  0.66ns ± 1%  0.64ns ± 1%  -3.35%  (p=0.000 n=10+10)
TrailingZeros64-8  0.68ns ± 1%  0.64ns ± 1%  -5.84%  (p=0.000 n=9+10)

Updates #45453

Change-Id: I228ff2d51df24b1306136f061432f8a12bb1d6fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/353249
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-10-05 16:06:49 +00:00
Lynn Boger
b8990ec932 test: update test/codegen/noextend.go to work with either ABI on ppc64x
This updates the codegen tests in noextend.go so they are not
dependent on the ABI.

Change-Id: I8433bea9dc78830c143290a7e0cf901b2397d38a
Reviewed-on: https://go-review.googlesource.com/c/go/+/353070
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-29 17:30:11 +00:00
Archana R
b35c668072 cmd/compile: add PPC64-specific inlining for runtime.memmove
Add rule to PPC64.rules to inline runtime.memmove in more cases, as is
done for other target architectures
Updated tests in codegen/copy.go to verify changes are done on
ppc64/ppc64le

Updates #41662

Change-Id: Id937ce21f9b4f4047b3e66dfa3c960128ee16a2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/352054
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2021-09-29 16:55:51 +00:00
Joel Sing
fe8347b61a cmd/compile: optimise immediate operands with constants on riscv64
Instructions with immediates can be precomputed when operating on a
constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI
and ADDI. Additionally, optimise ANDI and ORI when the immediate is
all ones or all zeroes.

In particular, the RISCV64 logical left and right shift rules
(Lsh*x*/Rsh*Ux*) produce sequences that check if the shift amount
exceeds 64 and if so returns zero. When the shift amount is a
constant we can precompute and eliminate the filter entirely.

Likewise the arithmetic right shift rules produce sequences that
check if the shift amount exceeds 64 and if so, ensures that the
lower six bits of the shift are all ones. When the shift amount
is a constant we can precompute the shift value.

Arithmetic right shift sequences like:

   117fc:       00100513                li      a0,1
   11800:       04053593                sltiu   a1,a0,64
   11804:       fff58593                addi    a1,a1,-1
   11808:       0015e593                ori     a1,a1,1
   1180c:       40b45433                sra     s0,s0,a1

Are now a single srai instruction:

   117fc:       40145413                srai    s0,s0,0x1

Likewise for logical left shift (and logical right shift):

   1d560:       01100413                li      s0,17
   1d564:       04043413                sltiu   s0,s0,64
   1d568:       40800433                neg     s0,s0
   1d56c:       01131493                slli    s1,t1,0x11
   1d570:       0084f433                and     s0,s1,s0

Which are now a single slli (or srli) instruction:

   1d120:       01131413                slli    s0,t1,0x11

This removes more than 30,000 instructions from the Go binary and
should improve performance in a variety of areas - of note
runtime.makemap_small drops from 48 to 36 instructions. Similar
gains exist in at least other parts of runtime and math/bits.

Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/350689
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-24 10:51:48 +00:00
Joel Sing
d413908320 test/codegen: add shift tests for RISCV64
Add tests for shift by constant, masked shifts and bounded shifts. While here,
sort tests by architecture and keep order of tests consistent (lsh, rshU, rsh).

Change-Id: I512d64196f34df9cb2884e8c0f6adcf9dd88b0fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/351289
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
2021-09-24 07:22:13 +00:00
Matthew Dempsky
04572fa29b cmd/compile: use BMI1 instructions for GOAMD64=v3 and higher
BMI1 includes four instructions (ANDN, BLSI, BLSMSK, BLSR) that are
easy to peephole optimize, and which GCC always seems to favor using
when available and applicable.

Updates #45453.

Change-Id: I0274184057058f5c579e5bc3ea9c414396d3cf46
Reviewed-on: https://go-review.googlesource.com/c/go/+/351130
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Trust: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-09-22 00:15:27 +00:00
Keith Randall
6f35430faa cmd/compile: allow rotates to be merged with logical ops on arm64
Fixes #48002

Change-Id: Ie3a157d55b291f5ac2ef4845e6ce4fefd84fc642
Reviewed-on: https://go-review.googlesource.com/c/go/+/350912
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-20 16:25:36 +00:00
Keith Randall
315dbd10c9 cmd/compile: fold double negate on arm64
Fixes #48467

Change-Id: I52305dbf561ee3eee6c1f053e555a3a6ec1ab892
Reviewed-on: https://go-review.googlesource.com/c/go/+/350910
Trust: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-09-19 23:50:45 +00:00
Keith Randall
83b36ffb10 cmd/compile: implement constant rotates on arm64
Explicit constant rotates work, but constant arguments to
bits.RotateLeft* needed the additional rule.

Fixes #48465

Change-Id: Ia7544f21d0e7587b6b6506f72421459cd769aea6
Reviewed-on: https://go-review.googlesource.com/c/go/+/350909
Trust: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-09-19 23:50:23 +00:00
Michael Munday
c69f5c0d76 cmd/compile: add support for Abs and Copysign intrinsics on riscv64
Also, add the FABSS and FABSD pseudo instructions to the assembler.
The compiler could use FSGNJX[SD] directly but there doesn't seem
to be much advantage to doing so and the pseudo instructions are
easier to understand.

Change-Id: Ie8825b8aa8773c69cc4f07a32ef04abf4061d80d
Reviewed-on: https://go-review.googlesource.com/c/go/+/348989
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
2021-09-10 10:45:59 +00:00
fanzha02
2091bd3f26 cmd/compile: simiplify arm64 bitfield optimizations
In some rewrite rules for arm64 bitfield optimizations, the
bitfield lsb value and the bitfield width value are related
to datasize, some of them use datasize directly to check the
bitfield lsb value is valid, to get the bitfiled width value,
but some of them call isARM64BFMask() and arm64BFWidth()
functions. In order to be consistent, this patch changes them
all to use datasize.

Besides, this patch sorts the codegen test cases.

Run the "toolstash-check -all" command and find one inconsistent code
is as the following.

new:    src/math/fma.go:104      BEQ	247
master: src/math/fma.go:104      BEQ	248

The above inconsistence is due to this patch changing the range of the
field lsb value in "UBFIZ" optimization rules from "lc+(32|16|8)<64" to
"lc<64", so that the following code is generated as "UBFIZ". The logical
of changed code is still correct.

The code of src/math/fma.go:160:
  const uvinf = 0x7FF0000000000000
  func FMA(a, b uint32) float64 {
  	ps := a+b
  	return Float64frombits(uint64(ps)<<63 | uvinf)
  }

The new assembly code:
  TEXT    "".FMA(SB), LEAF|NOFRAME|ABIInternal, $0-16
  MOVWU   "".a(FP), R0
  MOVWU   "".b+4(FP), R1
  ADD     R1, R0, R0
  UBFIZ   $63, R0, $1, R0
  ORR     $9218868437227405312, R0, R0
  MOVD    R0, "".~r2+8(FP)
  RET     (R30)

The master assembly code:
  TEXT    "".FMA(SB), LEAF|NOFRAME|ABIInternal, $0-16
  MOVWU   "".a(FP), R0
  MOVWU   "".b+4(FP), R1
  ADD     R1, R0, R0
  MOVWU   R0, R0
  LSL     $63, R0, R0
  ORR     $9218868437227405312, R0, R0
  MOVD    R0, "".~r2+8(FP)
  RET     (R30)

Change-Id: I9061104adfdfd3384d0525327ae1e5c8b0df5c35
Reviewed-on: https://go-review.googlesource.com/c/go/+/265038
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-10 02:44:36 +00:00
Michael Munday
8214257347 test/codegen: fix package name for test case
The codegen tests are currently skipped (see #48247). The test
added in CL 346050 did not compile because it was in the main
package but did not contain a main function. Changing the package
to 'codegen' fixes the issue.

Updates #48247.

Change-Id: I0a0eaca8e6a7d7b335606d2c76a204ac0c12e6d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/348392
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-09-08 14:51:40 +00:00
Michael Munday
1da64686f8 test/codegen: fix compilation of bitfield tests
The codegen tests are currently skipped (see #48247) and the
bitfield tests do not actually compile due to a duplicate function
name (sbfiz5) added in CL 267602. Renaming the function fixes the
issue.

Updates #48247.

Change-Id: I626fd5ef13732dc358e73ace9ddcc4cbb6ae5b21
Reviewed-on: https://go-review.googlesource.com/c/go/+/348391
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-08 14:51:31 +00:00
Michael Munday
fdc2072420 test/codegen: remove broken riscv64 test
This test is not executed by default (see #48247) and does not
actually pass. It was added in CL 346689. The code generation
changes made in that CL only change how instructions are assembled,
they do not actually affect the output of the compiler. This test
is unfortunately therefore invalid and will never pass.

Updates #48247.

Change-Id: I0c807e4a111336e5a097fe4e3af2805f9932a87f
Reviewed-on: https://go-review.googlesource.com/c/go/+/348390
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-09-08 14:51:22 +00:00
wdvxdr
21de6bc463 cmd/compile: simplify less with non-negative number and constant 0 or 1
The most common cases:
len(s) > 0
len(s) < 1

and they can be simplified to:
len(s) != 0
len(s) == 0

Fixes #48054

Change-Id: I16e5b0cffcfab62a4acc2a09977a6cd3543dd000
Reviewed-on: https://go-review.googlesource.com/c/go/+/346050
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-07 14:53:41 +00:00
fanzha02
7b69ddc171 cmd/compile: merge sign extension and shift into SBFIZ
This patch adds some rules to rewrite "(LeftShift (SignExtend x) lc)"
expression as "SBFIZ".

Add the test cases.

Change-Id: I294c4ba09712eeb02c7a952447bb006780f1e60d
Reviewed-on: https://go-review.googlesource.com/c/go/+/267602
Trust: fannie zhang <Fannie.Zhang@arm.com>
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-09-06 03:31:53 +00:00
fanzha02
43b05173a2 cmd/compile: merge zero/sign extensions with UBFX/SBFX on arm64
The UBFX and SBFX already zero/sign extend the result. Further
zero/sign extensions are thus unnecessary as long as they leave
the top bits unaltered. This patch absorbs zero/sign extensions
into UBFX/SBFX.

Add the related test cases.

Change-Id: I7c4516c8b52d677f77bf3aaedab87c4a28056ec0
Reviewed-on: https://go-review.googlesource.com/c/go/+/265039
Trust: fannie zhang <Fannie.Zhang@arm.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-06 03:31:43 +00:00
Ben Shi
37d4532867 cmd/internal/obj/riscv: simplify addition with constant
This CL simplifies riscv addition (add r, imm) to
(ADDI (ADDI r, imm/2), imm-imm/2) if imm is in specific ranges.
(-4096 <= imm <= -2049 or 2048 <= imm <= 4094)

There is little impact to the go1 benchmark, while the total
size of pkg/linux_riscv64 decreased by about 11KB.

Change-Id: I236eb8af3b83bb35ce9c0b318fc1d235e8ab9a4e
GitHub-Last-Rev: a2f56a0763
GitHub-Pull-Request: golang/go#48110
Reviewed-on: https://go-review.googlesource.com/c/go/+/346689
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Michael Munday <mike.munday@lowrisc.org>
2021-09-02 14:35:10 +00:00
Michael Munday
ea51e223c2 cmd/{asm,compile}: add fused multiply-add support on riscv64
Add support to the assembler for F[N]M{ADD,SUB}[SD] instructions.
Argument order is:

  OP RS1, RS2, RS3, RD

Also, add support for the FMA intrinsic to the compiler. Automatic
FMA matching is left to a future CL.

Change-Id: I47166c7393b2ab6bfc2e42aa8c1a8997c3a071b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/293030
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
2021-09-01 21:17:04 +00:00
Jake Ciolek
6cf1d5d0fa cmd/compile: generic SSA rules for simplifying 2 and 3 operand integer arithmetic expressions
This applies the following generic integer addition/subtraction transformations:

x - (x + y) = -y
(x - y) - x = -y
y + (x - y) = x
y + (z + (x - y) = x + z

There's over 40 unique functions matching in Go. Hits 2 funcs in the runtime itself:

runtime.stackfree()
runtime.runqdrain()

Go binary size reduced by 0.05% on Linux x86_64.

StackCopy bench (perflocked Cascade Lake x86):

name                old time/op  new time/op  delta
StackCopyPtr-8      87.3ms ± 1%  86.9ms ± 0%  -0.45%  (p=0.000 n=20+20)
StackCopy-8         77.6ms ± 1%  77.0ms ± 0%  -0.76%  (p=0.000 n=20+20)
StackCopyNoCache-8  2.28ms ± 2%  2.26ms ± 2%  -0.93%  (p=0.008 n=19+20)

test/bench/go1 benchmarks (perflocked Cascade Lake x86):

name                     old time/op    new time/op    delta
BinaryTree17-8              1.88s ± 1%     1.88s ± 0%    ~     (p=0.373 n=15+12)
Fannkuch11-8                2.31s ± 0%     2.35s ± 0%  +1.52%  (p=0.000 n=15+14)
FmtFprintfEmpty-8          26.6ns ± 0%    26.6ns ± 0%    ~     (p=0.081 n=14+13)
FmtFprintfString-8         48.6ns ± 0%    50.0ns ± 0%  +2.86%  (p=0.000 n=15+14)
FmtFprintfInt-8            56.9ns ± 0%    54.8ns ± 0%  -3.70%  (p=0.000 n=15+15)
FmtFprintfIntInt-8         90.4ns ± 0%    88.8ns ± 0%  -1.78%  (p=0.000 n=15+15)
FmtFprintfPrefixedInt-8     104ns ± 0%     104ns ± 0%    ~     (p=0.905 n=14+13)
FmtFprintfFloat-8           148ns ± 0%     144ns ± 0%  -2.19%  (p=0.000 n=14+15)
FmtManyArgs-8               389ns ± 0%     390ns ± 0%  +0.35%  (p=0.000 n=12+15)
GobDecode-8                3.90ms ± 1%    3.88ms ± 0%  -0.49%  (p=0.000 n=15+14)
GobEncode-8                2.73ms ± 0%    2.73ms ± 0%    ~     (p=0.425 n=15+14)
Gzip-8                      169ms ± 0%     168ms ± 0%  -0.52%  (p=0.000 n=13+13)
Gunzip-8                   24.7ms ± 0%    24.8ms ± 0%  +0.61%  (p=0.000 n=15+15)
HTTPClientServer-8         60.5µs ± 6%    60.4µs ± 7%    ~     (p=0.595 n=15+15)
JSONEncode-8               6.97ms ± 1%    6.93ms ± 0%  -0.69%  (p=0.000 n=14+14)
JSONDecode-8               31.2ms ± 1%    30.8ms ± 1%  -1.27%  (p=0.000 n=14+14)
Mandelbrot200-8            3.87ms ± 0%    3.87ms ± 0%    ~     (p=0.652 n=15+14)
GoParse-8                  2.65ms ± 2%    2.64ms ± 1%    ~     (p=0.202 n=15+15)
RegexpMatchEasy0_32-8      45.1ns ± 0%    45.9ns ± 0%  +1.68%  (p=0.000 n=14+15)
RegexpMatchEasy0_1K-8       140ns ± 0%     139ns ± 0%  -0.44%  (p=0.000 n=15+14)
RegexpMatchEasy1_32-8      40.9ns ± 3%    40.5ns ± 0%  -0.88%  (p=0.000 n=15+13)
RegexpMatchEasy1_1K-8       215ns ± 1%     220ns ± 1%  +2.27%  (p=0.000 n=15+15)
RegexpMatchMedium_32-8      783ns ± 7%     738ns ± 0%    ~     (p=0.361 n=15+15)
RegexpMatchMedium_1K-8     24.1µs ± 6%    23.4µs ± 6%  -2.94%  (p=0.004 n=15+15)
RegexpMatchHard_32-8       1.10µs ± 1%    1.09µs ± 1%  -0.40%  (p=0.006 n=15+14)
RegexpMatchHard_1K-8       33.0µs ± 0%    33.0µs ± 0%    ~     (p=0.535 n=12+14)
Revcomp-8                   354ms ± 0%     353ms ± 0%  -0.23%  (p=0.002 n=15+13)
Template-8                 42.0ms ± 1%    41.8ms ± 2%  -0.37%  (p=0.023 n=14+15)
TimeParse-8                 181ns ± 0%     180ns ± 1%  -0.18%  (p=0.014 n=12+13)
TimeFormat-8                240ns ± 0%     242ns ± 1%  +0.69%  (p=0.000 n=12+15)
[Geo mean]                 35.2µs         35.1µs       -0.43%

name                     old speed      new speed      delta
GobDecode-8               197MB/s ± 1%   198MB/s ± 0%  +0.49%  (p=0.000 n=15+14)
GobEncode-8               281MB/s ± 0%   281MB/s ± 0%    ~     (p=0.419 n=15+14)
Gzip-8                    115MB/s ± 0%   115MB/s ± 0%  +0.52%  (p=0.000 n=13+13)
Gunzip-8                  786MB/s ± 0%   781MB/s ± 0%  -0.60%  (p=0.000 n=15+15)
JSONEncode-8              278MB/s ± 1%   280MB/s ± 0%  +0.69%  (p=0.000 n=14+14)
JSONDecode-8             62.3MB/s ± 1%  63.1MB/s ± 1%  +1.29%  (p=0.000 n=14+14)
GoParse-8                21.9MB/s ± 2%  22.0MB/s ± 1%    ~     (p=0.205 n=15+15)
RegexpMatchEasy0_32-8     709MB/s ± 0%   697MB/s ± 0%  -1.65%  (p=0.000 n=14+15)
RegexpMatchEasy0_1K-8    7.34GB/s ± 0%  7.37GB/s ± 0%  +0.43%  (p=0.000 n=15+15)
RegexpMatchEasy1_32-8     783MB/s ± 2%   790MB/s ± 0%  +0.88%  (p=0.000 n=15+13)
RegexpMatchEasy1_1K-8    4.77GB/s ± 1%  4.66GB/s ± 1%  -2.23%  (p=0.000 n=15+15)
RegexpMatchMedium_32-8   41.0MB/s ± 7%  43.3MB/s ± 0%    ~     (p=0.360 n=15+15)
RegexpMatchMedium_1K-8   42.5MB/s ± 6%  43.8MB/s ± 6%  +3.07%  (p=0.004 n=15+15)
RegexpMatchHard_32-8     29.2MB/s ± 1%  29.3MB/s ± 1%  +0.41%  (p=0.006 n=15+14)
RegexpMatchHard_1K-8     31.1MB/s ± 0%  31.1MB/s ± 0%    ~     (p=0.495 n=12+14)
Revcomp-8                 718MB/s ± 0%   720MB/s ± 0%  +0.23%  (p=0.002 n=15+13)
Template-8               46.3MB/s ± 1%  46.4MB/s ± 2%  +0.38%  (p=0.021 n=14+15)
[Geo mean]                205MB/s        206MB/s       +0.57%

Change-Id: Ibd1afdf8b6c0b08087dcc3acd8f943637eb95ac0
Reviewed-on: https://go-review.googlesource.com/c/go/+/344930
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
2021-08-25 19:45:20 +00:00
Meng Zhuo
efd206eb40 cmd/compile: intrinsify Mul64 on riscv64
According to RISCV instruction set manual v2.2 Sec 6.1
MULHU followed by MUL will be fused into one multiply by microarchitecture

Benchstat on Hifive unmatched:
name          old time/op    new time/op    delta
Hash8Bytes       245ns ± 3%     186ns ± 4%  -23.99%  (p=0.000 n=10+10)
Hash320Bytes    1.94µs ± 1%    1.31µs ± 1%  -32.38%  (p=0.000 n=9+10)
Hash1K          5.84µs ± 0%    3.84µs ± 0%  -34.20%  (p=0.000 n=10+9)
Hash8K          45.3µs ± 0%    29.4µs ± 0%  -35.04%  (p=0.000 n=10+10)

name          old speed      new speed      delta
Hash8Bytes    32.7MB/s ± 3%  43.0MB/s ± 4%  +31.61%  (p=0.000 n=10+10)
Hash320Bytes   165MB/s ± 1%   244MB/s ± 1%  +47.88%  (p=0.000 n=9+10)
Hash1K         175MB/s ± 0%   266MB/s ± 0%  +51.98%  (p=0.000 n=10+9)
Hash8K         181MB/s ± 0%   279MB/s ± 0%  +53.94%  (p=0.000 n=10+10)

Change-Id: I3561495d02a4a0ad8578e9b9819bf0a4eaca5d12
Reviewed-on: https://go-review.googlesource.com/c/go/+/329970
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Meng Zhuo <mzh@golangcn.org>
2021-08-16 13:50:11 +00:00
Cherry Mui
6b1e4430bb [dev.typeparams] cmd/compile: implement clobberdead mode on ARM64
For debugging.

Change-Id: I5875ccd2413b8ffd2ec97a0ace66b5cae7893b24
Reviewed-on: https://go-review.googlesource.com/c/go/+/324765
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-06-03 20:00:30 +00:00
Cherry Mui
165d39a1d4 [dev.typeparams] test: adjust codegen test for register ABI on ARM64
In codegen/arithmetic.go, previously there are MOVD's that match
for loads of arguments. With register ABI there are no more such
loads. Remove the MOVD matches.

Change-Id: I920ee2629c8c04d454f13a0c08e283d3528d9a64
Reviewed-on: https://go-review.googlesource.com/c/go/+/324251
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-06-03 14:28:14 +00:00
Ruslan Andreev
3b321a9d12 cmd/compile: add arch-specific inlining for runtime.memmove
This CL add runtime.memmove inlining for AMD64 and ARM64.
According to ssa dump from testcases generic rules can't inline
memmomve properly due to one of the arguments is Phi operation. But this
Phi op will be optimized out by later optimization stages. As a result
memmove can be inlined during arch-specific rules.
The commit add new optimization rules to arch-specific rules that can
inline runtime.memmove if it possible during lowering stage.
Optimization fires 5 times in Go source-code using regabi.

Fixes #41662

Change-Id: Iaffaf4c482d068b5f0683d141863892202cc8824
Reviewed-on: https://go-review.googlesource.com/c/go/+/289151
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: David Chase <drchase@google.com>
2021-05-12 16:23:30 +00:00
Keith Randall
b211fe0058 cmd/compile: remove bit operations that modify memory directly
These operations (BT{S,R,C}{Q,L}modify) are quite a bit slower than
other ways of doing the same thing.

Without the BTxmodify operations, there are two fallback ways the compiler
performs these operations: AND/OR/XOR operations directly on memory, or
load-BTx-write sequences. The compiler kinda chooses one arbitrarily
depending on rewrite rule application order. Currently, it uses
load-BTx-write for the Const benchmarks and AND/OR/XOR directly to memory
for the non-Const benchmarks. TBD, someone might investigate which of
the two fallback strategies is really better. For now, they are both
better than BTx ops.

name              old time/op  new time/op  delta
BitSet-8          1.09µs ± 2%  0.64µs ± 5%  -41.60%  (p=0.000 n=9+10)
BitClear-8        1.15µs ± 3%  0.68µs ± 6%  -41.00%  (p=0.000 n=10+10)
BitToggle-8       1.18µs ± 4%  0.73µs ± 2%  -38.36%  (p=0.000 n=10+8)
BitSetConst-8     37.0ns ± 7%  25.8ns ± 2%  -30.24%  (p=0.000 n=10+10)
BitClearConst-8   30.7ns ± 2%  25.0ns ±12%  -18.46%  (p=0.000 n=10+10)
BitToggleConst-8  36.9ns ± 1%  23.8ns ± 3%  -35.46%  (p=0.000 n=9+10)

Fixes #45790
Update #45242

Change-Id: Ie33a72dc139f261af82db15d446cd0855afb4e59
Reviewed-on: https://go-review.googlesource.com/c/go/+/318149
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ben Shi <powerman1st@163.com>
2021-05-08 03:27:59 +00:00
Cherry Zhang
6b8e3e2d06 cmd/compile: reduce redundant register moves for regabi calls
Currently, if we have AX=a and BX=b, and we want to make a call
F(1, a, b), to move arguments into the desired registers it emits

	MOVQ AX, CX
	MOVL $1, AX // AX=1
	MOVQ BX, DX
	MOVQ CX, BX // BX=a
	MOVQ DX, CX // CX=b

This has a few redundant moves.

This is because we process inputs in order. First, allocate 1 to
AX, which kicks out a (in AX) to CX (a free register at the
moment). Then, allocate a to BX, which kicks out b (in BX) to DX.
Finally, put b to CX.

Notice that if we start with allocating CX=b, then BX=a, AX=1,
we will not have redundant moves. This CL reduces redundant moves
by allocating them in different order: First, for inpouts that are
already in place, keep them there. Then allocate free registers.
Then everything else.

                             before       after
cmd/compile binary size     23703888    23609680
            text size        8565899     8533291

(with regabiargs enabled.)

Change-Id: I69e1bdf745f2c90bb791f6d7c45b37384af1e874
Reviewed-on: https://go-review.googlesource.com/c/go/+/311371
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-04-19 16:21:39 +00:00
David Chase
4df3d0e4df cmd/compile: rescue stmt boundaries from OpArgXXXReg and OpSelectN.
Fixes this failure:
go test cmd/compile/internal/ssa -run TestStmtLines -v
=== RUN   TestStmtLines
    stmtlines_test.go:115: Saw too many (amd64, > 1%) lines without
    statement marks, total=88263, nostmt=1930
    ('-run TestStmtLines -v' lists failing lines)

The failure has two causes.

One is that the first-line adjuster in code generation was relocating
"first lines" to instructions that would either not have any code generated,
or would have the statment marker removed by a different believed-good heuristic.

The other was that statement boundaries were getting attached to register
values (that with the old ABI were loads from the stack, hence real instructions).
The register values disappear at code generation.

The fixes are to (1) note that certain instructions are not good choices for
"first value" and skip them, and (2) in an expandCalls post-pass, look for
register valued instructions and under appropriate conditions move their
statement marker to a compatible use.

Also updates TestStmtLines to always log the score, for easier comparison of
minor compiler changes.

Updates #40724.

Change-Id: I485573ce900e292d7c44574adb7629cdb4695c3f
Reviewed-on: https://go-review.googlesource.com/c/go/+/309649
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-04-14 18:46:08 +00:00
Cherry Zhang
444d28295b test: make codegen/memops.go work with both ABIs
Following CL 309335, this fixes memops.go.

Change-Id: Ia2458b5267deee9f906f76c50e70a021ea2fcb5b
Reviewed-on: https://go-review.googlesource.com/c/go/+/309552
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-04-13 14:07:17 +00:00
Cherry Zhang
263e13d1f7 test: make codegen tests work with both ABIs
Some codegen tests were written with the assumption that
arguments and results are in memory, and with a specific stack
layout. With the register ABI, the assumption is no longer true.
Adjust the tests to work with both cases.

- For tests expecting in memory arguments/results, change to use
  global variables or memory-assigned argument/results.

- Allow more registers. E.g. some tests expecting register names
  contain only letters (e.g. AX), but  it can also contain numbers
  (e.g. R10).

- Some instruction selection changes when operate on register vs.
  memory, e.g. ADDQ vs. LEAQ, MOVB vs. MOVL. Accept both.

TODO: mathbits.go and memops.go still need fix.
Change-Id: Ic5932b4b5dd3f5d30ed078d296476b641420c4c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/309335
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-04-12 21:59:59 +00:00
Pat Gavlin
359f44910f cmd/compile: fix long RMW bit operations on AMD64
Under certain circumstances, the existing rules for bit operations can
produce code that writes beyond its intended bounds. For example,
consider the following code:

    func repro(b []byte, addr, bit int32) {
	    _ = b[3]
	    v := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24 | 1<<(bit&31)
	    b[0] = byte(v)
	    b[1] = byte(v >> 8)
	    b[2] = byte(v >> 16)
	    b[3] = byte(v >> 24)
    }

Roughly speaking:

1. The expression `1 << (bit & 31)` is rewritten into `(SHLL 1 bit)`
2. The expression `uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 |
   uint32(b[3])<<24` is rewritten into `(MOVLload &b[0])`
3. The statements `b[0] = byte(v) ... b[3] = byte(v >> 24)` are
   rewritten into `(MOVLstore &b[0], v)`
4. `(ORL (SHLL 1, bit) (MOVLload &b[0]))` is rewritten into
   `(BTSL (MOVLload &b[0]) bit)`. This is a valid transformation because
   the destination is a register: in this case, the bit offset is masked
   by the number of bits in the destination register. This is identical
   to the masking performed by `SHL`.
5. `(MOVLstore &b[0] (BTSL (MOVLload &b[0]) bit))` is rewritten into
   `(BTSLmodify &b[0] bit)`. This is an invalid transformation because
   the destination is memory: in this case, the bit offset is not
   masked, and the chosen instruction may write outside its intended
   32-bit location.

These changes fix the invalid rewrite performed in step (5) by
explicitly maksing the bit offset operand to `BT(S|R|C)(L|Q)modify`. In
the example above, the adjusted rules produce
`(BTSLmodify &b[0] (ANDLconst [31] bit))` in step (5).

These changes also add several new rules to rewrite bit sets, toggles,
and clears that are rooted at `(OR|XOR|AND)(L|Q)modify` operators into
appropriate `BT(S|R|C)(L|Q)modify` operators. These rules catch cases
where `MOV(L|Q)store ((OR|XOR|AND)(L|Q) ...)` is rewritten to
`(OR|XOR|AND)(L|Q)modify` before the `(OR|XOR|AND)(L|Q) ...` can be
rewritten to `BT(S|R|C)(L|Q) ...`.

Overall, compilecmp reports small improvements in code size on
darwin/amd64 when the changes to the compiler itself are exlcuded:

file                               before   after    Δ       %
runtime.s                          536464   536412   -52     -0.010%
bytes.s                            32629    32593    -36     -0.110%
strings.s                          44565    44529    -36     -0.081%
os/signal.s                        7967     7959     -8      -0.100%
cmd/vendor/golang.org/x/sys/unix.s 81686    81678    -8      -0.010%
math/big.s                         188235   188253   +18     +0.010%
cmd/link/internal/loader.s         89295    89056    -239    -0.268%
cmd/link/internal/ld.s             633551   633232   -319    -0.050%
cmd/link/internal/arm.s            18934    18928    -6      -0.032%
cmd/link/internal/arm64.s          31814    31801    -13     -0.041%
cmd/link/internal/riscv64.s        7347     7345     -2      -0.027%
cmd/compile/internal/ssa.s         4029173  4033066  +3893   +0.097%
total                              21298280 21301472 +3192   +0.015%

Change-Id: I2e560548b515865129e1724e150e30540e9d29ce
GitHub-Last-Rev: 9a42bd29a5
GitHub-Pull-Request: golang/go#45242
Reviewed-on: https://go-review.googlesource.com/c/go/+/304869
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
2021-03-26 19:40:37 +00:00
fanzha02
3a0061822e cmd/compile: add arm64 rules to optimize go codes to constant 0
Optimize the following codes to constant 0.

  function shift (x uint32) uint64 {
    return uint64(x) >> 32
  }

Change-Id: Ida6b39d713cc119ad5a2f01fd54bfd252cf2c975
Reviewed-on: https://go-review.googlesource.com/c/go/+/303830
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-26 01:57:35 +00:00
fanzha02
b182ba7fab cmd/compile: optimize codes with arm64 REV16 instruction
Optimize some patterns into rev16/rev16w instruction.

Pattern1:
    (c & 0xff00ff00)>>8 | (c & 0x00ff00ff)<<8
To:
    rev16w c

Pattern2:
    (c & 0xff00ff00ff00ff00)>>8 | (c & 0x00ff00ff00ff00ff)<<8
To:
    rev16 c

This patch is a copy of CL 239637, contributed by Alice Xu(dianhong.xu@arm.com).

Change-Id: I96936c1db87618bc1903c04221c7e9b2779455b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/268377
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-23 01:36:23 +00:00
Cherry Zhang
6ae3b70ef2 cmd/compile: add clobberdeadreg mode
When -clobberdeadreg flag is set, the compiler inserts code that
clobbers integer registers at call sites. This may be helpful for
debugging register ABI.

Only implemented on AMD64 for now.

Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899
Reviewed-on: https://go-review.googlesource.com/c/go/+/302809
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-03-19 23:21:21 +00:00
fanzha02
f5e6d3e879 cmd/compile: add rewrite rules for conditional instructions on arm64
This CL adds rewrite rules for CSETM, CSINC, CSINV, and CSNEG. By adding
these rules, we can save one instruction.

For example,

  func test(cond bool, a int) int {
    if cond {
      a++
    }
    return a
  }

Before:

  MOVD "".a+8(RSP), R0
  ADD $1, R0, R1
  MOVBU "".cond(RSP), R2
  CMPW $0, R2
  CSEL NE, R1, R0, R0

After:

  MOVBU "".cond(RSP), R0
  CMPW $0, R0
  MOVD "".a+8(RSP), R0
  CSINC EQ, R0, R0, R0

This patch is a copy of CL 285694. Co-authored-by: JunchenLi
<junchen.li@arm.com>

Change-Id: Ic1a79e8b8ece409b533becfcb7950f11e7b76f24
Reviewed-on: https://go-review.googlesource.com/c/go/+/302231
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-03-18 01:46:58 +00:00
Cherry Zhang
8628bf9a97 cmd/compile: resurrect clobberdead mode
This CL resurrects the clobberdead debugging mode (CL 23924).
When -clobberdead flag is set (TODO: make it GOEXPERIMENT?), the
compiler inserts code that clobbers all dead stack slots that
contains pointers.

Mark windows syscall functions cgo_unsafe_args, as the code
actually does that, by taking the address of one argument and
passing it to cgocall.

Change-Id: Ie09a015f4bd14ae6053cc707866e30ae509b9d6f
Reviewed-on: https://go-review.googlesource.com/c/go/+/301791
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-03-17 17:50:50 +00:00
Meng Zhuo
afe517590c cmd/compile: loads from readonly globals into const for mips64x
Ref: CL 141118
Update #26498

Change-Id: If4ea55c080b9aa10183eefe81fefbd4072deaf3a
Reviewed-on: https://go-review.googlesource.com/c/go/+/280646
Trust: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-16 09:07:17 +00:00
erifan01
600259b099 cmd/compile: use depth first topological sort algorithm for layout
The current layout algorithm tries to put consecutive blocks together,
so the priority of the successor block is higher than the priority of
the zero indegree block. This algorithm is beneficial for subsequent
register allocation, but will result in more branch instructions.
The depth-first topological sorting algorithm is a well-known layout
algorithm, which has applications in many languages, and it helps to
reduce branch instructions. This CL applies it to the layout pass.
The test results show that it helps to reduce the code size.

This CL also includes the following changes:
1, Removed the primary predecessor mechanism. The new layout algorithm is
  not very friendly to register allocator in some cases, in order to adapt
  to the new layout algorithm, a new primary predecessor selection strategy
  is introduced.
2, Since the new layout implementation may place non-loop blocks between
  loop blocks, some adaptive modifications have also been made to looprotate
  pass.
3, The layout also affects the results of codegen, so this CL also adjusted
  several codegen tests accordingly.

It is inevitable that this CL will cause the code size or performance of a
few functions to decrease, but the number of cases it improves is much larger
than the number of cases it drops.

Statistical data from compilecmp on linux/amd64 is as follow:
name                      old time/op       new time/op       delta
Template                        382ms ± 4%        382ms ± 4%    ~     (p=0.497 n=49+50)
Unicode                         170ms ± 9%        169ms ± 8%    ~     (p=0.344 n=48+50)
GoTypes                         2.01s ± 4%        2.01s ± 4%    ~     (p=0.628 n=50+48)
Compiler                        190ms ±10%        189ms ± 9%    ~     (p=0.734 n=50+50)
SSA                             11.8s ± 2%        11.8s ± 3%    ~     (p=0.877 n=50+50)
Flate                           241ms ± 9%        241ms ± 8%    ~     (p=0.897 n=50+49)
GoParser                        366ms ± 3%        361ms ± 4%  -1.21%  (p=0.004 n=47+50)
Reflect                         835ms ± 3%        838ms ± 3%    ~     (p=0.275 n=50+49)
Tar                             336ms ± 4%        335ms ± 3%    ~     (p=0.454 n=48+48)
XML                             433ms ± 4%        431ms ± 3%    ~     (p=0.071 n=49+48)
LinkCompiler                    706ms ± 4%        705ms ± 4%    ~     (p=0.608 n=50+49)
ExternalLinkCompiler            1.85s ± 3%        1.83s ± 2%  -1.47%  (p=0.000 n=49+48)
LinkWithoutDebugCompiler        437ms ± 5%        437ms ± 6%    ~     (p=0.953 n=49+50)
[Geo mean]                      615ms             613ms       -0.37%

name                      old alloc/op      new alloc/op      delta
Template                       38.7MB ± 1%       38.7MB ± 1%    ~     (p=0.834 n=50+50)
Unicode                        28.1MB ± 0%       28.1MB ± 0%  -0.22%  (p=0.000 n=49+50)
GoTypes                         168MB ± 1%        168MB ± 1%    ~     (p=0.054 n=47+47)
Compiler                       23.0MB ± 1%       23.0MB ± 1%    ~     (p=0.432 n=50+50)
SSA                            1.54GB ± 0%       1.54GB ± 0%  +0.21%  (p=0.000 n=50+50)
Flate                          23.6MB ± 1%       23.6MB ± 1%    ~     (p=0.153 n=43+46)
GoParser                       35.1MB ± 1%       35.1MB ± 2%    ~     (p=0.202 n=50+50)
Reflect                        84.7MB ± 1%       84.7MB ± 1%    ~     (p=0.333 n=48+49)
Tar                            34.5MB ± 1%       34.5MB ± 1%    ~     (p=0.406 n=46+49)
XML                            44.3MB ± 2%       44.2MB ± 3%    ~     (p=0.981 n=50+50)
LinkCompiler                    131MB ± 0%        128MB ± 0%  -2.74%  (p=0.000 n=50+50)
ExternalLinkCompiler            120MB ± 0%        120MB ± 0%  +0.01%  (p=0.007 n=50+50)
LinkWithoutDebugCompiler       77.3MB ± 0%       77.3MB ± 0%  -0.02%  (p=0.000 n=50+50)
[Geo mean]                     69.3MB            69.1MB       -0.22%

file      before    after     Δ        %
addr2line 4104220   4043684   -60536   -1.475%
api       5342502   5249678   -92824   -1.737%
asm       4973785   4858257   -115528  -2.323%
buildid   2667844   2625660   -42184   -1.581%
cgo       4686849   4616313   -70536   -1.505%
compile   23667431  23268406  -399025  -1.686%
cover     4959676   4874108   -85568   -1.725%
dist      3515934   3450422   -65512   -1.863%
doc       3995581   3925469   -70112   -1.755%
fix       3379202   3318522   -60680   -1.796%
link      6743249   6629913   -113336  -1.681%
nm        4047529   3991777   -55752   -1.377%
objdump   4456151   4388151   -68000   -1.526%
pack      2435040   2398072   -36968   -1.518%
pprof     13804080  13565808  -238272  -1.726%
test2json 2690043   2645987   -44056   -1.638%
trace     10418492  10232716  -185776  -1.783%
vet       7258259   7121259   -137000  -1.888%
total     113145867 111204202 -1941665 -1.716%

The situation on linux/arm64 is as follow:
name                      old time/op       new time/op       delta
Template                        280ms ± 1%        282ms ± 1%  +0.75%  (p=0.000 n=46+48)
Unicode                         124ms ± 2%        124ms ± 2%  +0.37%  (p=0.045 n=50+50)
GoTypes                         1.69s ± 1%        1.70s ± 1%  +0.56%  (p=0.000 n=49+50)
Compiler                        122ms ± 1%        123ms ± 1%  +0.93%  (p=0.000 n=50+50)
SSA                             12.6s ± 1%        12.7s ± 0%  +0.72%  (p=0.000 n=50+50)
Flate                           170ms ± 1%        172ms ± 1%  +0.97%  (p=0.000 n=49+49)
GoParser                        262ms ± 1%        263ms ± 1%  +0.39%  (p=0.000 n=49+48)
Reflect                         639ms ± 1%        650ms ± 1%  +1.63%  (p=0.000 n=49+49)
Tar                             243ms ± 1%        245ms ± 1%  +0.82%  (p=0.000 n=50+50)
XML                             324ms ± 1%        327ms ± 1%  +0.72%  (p=0.000 n=50+49)
LinkCompiler                    597ms ± 1%        596ms ± 1%  -0.27%  (p=0.001 n=48+47)
ExternalLinkCompiler            1.90s ± 1%        1.88s ± 1%  -1.00%  (p=0.000 n=50+50)
LinkWithoutDebugCompiler        364ms ± 1%        363ms ± 1%    ~     (p=0.220 n=49+50)
[Geo mean]                      485ms             488ms       +0.49%

name                      old alloc/op      new alloc/op      delta
Template                       38.7MB ± 0%       38.8MB ± 1%    ~     (p=0.093 n=43+49)
Unicode                        28.4MB ± 0%       28.4MB ± 0%  +0.03%  (p=0.000 n=49+45)
GoTypes                         169MB ± 1%        169MB ± 1%  +0.23%  (p=0.010 n=50+50)
Compiler                       23.2MB ± 1%       23.2MB ± 1%  +0.11%  (p=0.000 n=40+44)
SSA                            1.54GB ± 0%       1.55GB ± 0%  +0.45%  (p=0.000 n=47+49)
Flate                          23.8MB ± 2%       23.8MB ± 1%    ~     (p=0.543 n=50+50)
GoParser                       35.3MB ± 1%       35.4MB ± 1%    ~     (p=0.792 n=50+50)
Reflect                        85.2MB ± 1%       85.2MB ± 0%    ~     (p=0.055 n=50+47)
Tar                            34.5MB ± 1%       34.5MB ± 1%  +0.06%  (p=0.015 n=50+50)
XML                            43.8MB ± 2%       43.9MB ± 2%  +0.19%  (p=0.000 n=48+48)
LinkCompiler                    137MB ± 0%        136MB ± 0%  -0.92%  (p=0.000 n=50+50)
ExternalLinkCompiler            127MB ± 0%        127MB ± 0%    ~     (p=0.516 n=50+50)
LinkWithoutDebugCompiler       84.0MB ± 0%       84.0MB ± 0%    ~     (p=0.057 n=50+50)
[Geo mean]                     70.4MB            70.4MB       +0.01%

file      before    after     Δ        %
addr2line 4021557   4002933   -18624   -0.463%
api       5127847   5028503   -99344   -1.937%
asm       5034716   4936836   -97880   -1.944%
buildid   2608118   2594094   -14024   -0.538%
cgo       4488592   4398320   -90272   -2.011%
compile   22501129  22213592  -287537  -1.278%
cover     4742301   4713573   -28728   -0.606%
dist      3388071   3365311   -22760   -0.672%
doc       3802250   3776082   -26168   -0.688%
fix       3306147   3216939   -89208   -2.698%
link      6404483   6363699   -40784   -0.637%
nm        3941026   3921930   -19096   -0.485%
objdump   4383330   4295122   -88208   -2.012%
pack      2404547   2389515   -15032   -0.625%
pprof     12996234  12856818  -139416  -1.073%
test2json 2668500   2586788   -81712   -3.062%
trace     9816276   9609580   -206696  -2.106%
vet       6900682   6787338   -113344  -1.643%
total     108535806 107056973 -1478833 -1.363%

Change-Id: Iaec1cdcaacca8025e9babb0fb8a532fddb70c87d
Reviewed-on: https://go-review.googlesource.com/c/go/+/255239
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: eric fang <eric.fang@arm.com>
2021-03-16 02:44:54 +00:00
Josh Bleecher Snyder
43d5f213e2 cmd/compile: optimize multi-register shifts on amd64
amd64 can shift in bits from another register instead of filling with 0/1.
This pattern is helpful when implementing 128 bit shifts or arbitrary length shifts.
In the standard library, it shows up in pure Go math/big.

Benchmarks results on amd64 with -tags=math_big_pure_go.

name                          old time/op  new time/op  delta
NonZeroShifts/1/shrVU-8       4.45ns ± 3%  4.39ns ± 1%   -1.28%  (p=0.000 n=30+27)
NonZeroShifts/1/shlVU-8       4.13ns ± 4%  4.10ns ± 2%     ~     (p=0.254 n=29+28)
NonZeroShifts/2/shrVU-8       5.55ns ± 1%  5.63ns ± 2%   +1.42%  (p=0.000 n=28+29)
NonZeroShifts/2/shlVU-8       5.70ns ± 2%  5.14ns ± 1%   -9.82%  (p=0.000 n=29+28)
NonZeroShifts/3/shrVU-8       6.79ns ± 2%  6.35ns ± 2%   -6.46%  (p=0.000 n=28+29)
NonZeroShifts/3/shlVU-8       6.69ns ± 1%  6.25ns ± 1%   -6.60%  (p=0.000 n=28+27)
NonZeroShifts/4/shrVU-8       7.79ns ± 2%  7.06ns ± 2%   -9.48%  (p=0.000 n=30+30)
NonZeroShifts/4/shlVU-8       7.82ns ± 1%  7.24ns ± 1%   -7.37%  (p=0.000 n=28+29)
NonZeroShifts/5/shrVU-8       8.90ns ± 3%  7.93ns ± 1%  -10.84%  (p=0.000 n=29+26)
NonZeroShifts/5/shlVU-8       8.68ns ± 1%  7.92ns ± 1%   -8.76%  (p=0.000 n=29+29)
NonZeroShifts/10/shrVU-8      14.4ns ± 1%  12.3ns ± 2%  -14.79%  (p=0.000 n=28+29)
NonZeroShifts/10/shlVU-8      14.1ns ± 1%  11.9ns ± 2%  -15.55%  (p=0.000 n=28+27)
NonZeroShifts/100/shrVU-8      118ns ± 1%    96ns ± 3%  -18.82%  (p=0.000 n=30+29)
NonZeroShifts/100/shlVU-8      120ns ± 2%    98ns ± 2%  -18.46%  (p=0.000 n=29+28)
NonZeroShifts/1000/shrVU-8    1.10µs ± 1%  0.88µs ± 2%  -19.63%  (p=0.000 n=29+30)
NonZeroShifts/1000/shlVU-8    1.10µs ± 2%  0.88µs ± 2%  -20.28%  (p=0.000 n=29+28)
NonZeroShifts/10000/shrVU-8   10.9µs ± 1%   8.7µs ± 1%  -19.78%  (p=0.000 n=28+27)
NonZeroShifts/10000/shlVU-8   10.9µs ± 2%   8.7µs ± 1%  -19.64%  (p=0.000 n=29+27)
NonZeroShifts/100000/shrVU-8   111µs ± 2%    90µs ± 2%  -19.39%  (p=0.000 n=28+29)
NonZeroShifts/100000/shlVU-8   113µs ± 2%    90µs ± 2%  -20.43%  (p=0.000 n=30+27)

The assembly version is still faster, unfortunately, but the gap is narrowing.
Speedup from pure Go to assembly:

name                          old time/op  new time/op  delta
NonZeroShifts/1/shrVU-8       4.39ns ± 1%  3.45ns ± 2%  -21.36%  (p=0.000 n=27+29)
NonZeroShifts/1/shlVU-8       4.10ns ± 2%  3.47ns ± 3%  -15.42%  (p=0.000 n=28+30)
NonZeroShifts/2/shrVU-8       5.63ns ± 2%  3.97ns ± 0%  -29.40%  (p=0.000 n=29+25)
NonZeroShifts/2/shlVU-8       5.14ns ± 1%  3.77ns ± 2%  -26.65%  (p=0.000 n=28+26)
NonZeroShifts/3/shrVU-8       6.35ns ± 2%  4.79ns ± 2%  -24.52%  (p=0.000 n=29+29)
NonZeroShifts/3/shlVU-8       6.25ns ± 1%  4.42ns ± 1%  -29.29%  (p=0.000 n=27+26)
NonZeroShifts/4/shrVU-8       7.06ns ± 2%  5.64ns ± 1%  -20.05%  (p=0.000 n=30+29)
NonZeroShifts/4/shlVU-8       7.24ns ± 1%  5.34ns ± 2%  -26.23%  (p=0.000 n=29+29)
NonZeroShifts/5/shrVU-8       7.93ns ± 1%  6.56ns ± 2%  -17.26%  (p=0.000 n=26+30)
NonZeroShifts/5/shlVU-8       7.92ns ± 1%  6.27ns ± 1%  -20.79%  (p=0.000 n=29+25)
NonZeroShifts/10/shrVU-8      12.3ns ± 2%  10.2ns ± 2%  -17.21%  (p=0.000 n=29+29)
NonZeroShifts/10/shlVU-8      11.9ns ± 2%  10.5ns ± 2%  -12.45%  (p=0.000 n=27+29)
NonZeroShifts/100/shrVU-8     95.9ns ± 3%  77.7ns ± 1%  -19.00%  (p=0.000 n=29+30)
NonZeroShifts/100/shlVU-8     97.5ns ± 2%  66.8ns ± 2%  -31.47%  (p=0.000 n=28+30)
NonZeroShifts/1000/shrVU-8     884ns ± 2%   705ns ± 1%  -20.17%  (p=0.000 n=30+28)
NonZeroShifts/1000/shlVU-8     880ns ± 2%   590ns ± 1%  -32.96%  (p=0.000 n=28+25)
NonZeroShifts/10000/shrVU-8   8.74µs ± 1%  7.34µs ± 3%  -15.94%  (p=0.000 n=27+30)
NonZeroShifts/10000/shlVU-8   8.73µs ± 1%  6.00µs ± 1%  -31.25%  (p=0.000 n=27+28)
NonZeroShifts/100000/shrVU-8  89.6µs ± 2%  75.5µs ± 2%  -15.80%  (p=0.000 n=29+29)
NonZeroShifts/100000/shlVU-8  89.6µs ± 2%  68.0µs ± 3%  -24.09%  (p=0.000 n=27+30)

Change-Id: I18f58d8f5513d737d9cdf09b8f9d14011ffe3958
Reviewed-on: https://go-review.googlesource.com/c/go/+/297050
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-03-11 19:11:46 +00:00
Paul E. Murphy
48ddf70128 cmd/asm,cmd/compile: support 5 operand RLWNM/RLWMI on ppc64
These instructions are actually 5 argument opcodes as specified
by the ISA.  Prior to this patch, the MB and ME arguments were
merged into a single bitmask operand to workaround the limitations
of the ppc64 assembler backend.

This limitation no longer exists. Thus, we can pass operands for
these opcodes without having to merge the MB and ME arguments in
the assembler frontend or compiler backend.

Likewise, support for 4 operand variants is unchanged.

Change-Id: Ib086774f3581edeaadfd2190d652aaaa8a90daeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/298750
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
2021-03-09 20:35:41 +00:00
fanzha02
2b50ab2aee cmd/compile: optimize single-precision floating point square root
Add generic rule to rewrite the single-precision square root expression
with one single-precision instruction. The optimization will reduce two
times of precision converting between double-precision and single-precision.

On arm64 flatform.

previous:
  FCVTSD F0, F0
  FSQRTD F0, F0
  FCVTDS F0, F0

optimized:
  FSQRTS S0, S0

And this patch adds the test case to check the correctness.

This patch refers to CL 241877, contributed by Alice Xu
(dianhong.xu@arm.com)

Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/276873
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-03-02 06:38:07 +00:00
Egon Elbre
3ee32439b5 cmd/compile: ARM64 optimize []float64 and []float32 access
Optimize load and store to []float64 and []float32.
Previously it used LSL instead of shifted register indexed load/store.

Before:

    LSL   $3, R0, R0
    FMOVD F0, (R1)(R0)

After:

    FMOVD F0, (R1)(R0<<3)

Fixes #42798

Change-Id: I0c0912140c3dce5aa6abc27097c0eb93833cc589
Reviewed-on: https://go-review.googlesource.com/c/go/+/273706
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
2021-02-24 19:49:08 +00:00
Alejandro García Montoro
bf48163e8f cmd/compile: add rule to coalesce writes
The code generated when storing eight bytes loaded from memory created a
series of small writes instead of a single, large one. The specific
pattern of instructions generated stored 1 byte, then 2 bytes, then 4
bytes, and finally 1 byte.

The new rules match this specific pattern both for amd64 and for s390x,
and convert it into a single instruction to store the 8 bytes. arm64 and
ppc64le already generated the right code, but the new codegen test
covers also those architectures.

Fixes #41663

Change-Id: Ifb9b464be2d59c2ed5034acf7b9c3e473f344030
Reviewed-on: https://go-review.googlesource.com/c/go/+/280456
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-02-24 19:25:49 +00:00
Keith Randall
42cd40ee74 cmd/compile: improve bit test code
Some bit test instruction generation stopped triggering after
the change to addressing modes. I suspect this was just because
ANDQload was being generated before the rewrite rules could discover
the BTQ. Fix that by decomposing the ANDQload when it is surrounded
by a TESTQ (thus re-enabling the BTQ rules).

Fixes #44228

Change-Id: I489b4a5a7eb01c65fc8db0753f8cec54097cadb2
Reviewed-on: https://go-review.googlesource.com/c/go/+/291749
Trust: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-02-23 18:02:48 +00:00
Cherry Zhang
401d7e5a24 [dev.regabi] cmd/compile: reserve X15 as zero register on AMD64
In ABIInternal, reserve X15 as constant zero, and use it to zero
memory. (Maybe there can be more use of it?)

The register is zeroed when transition to ABIInternal from ABI0.

Caveat: using X15 generates longer instructions than using X0.
Maybe we want to use X0?

Change-Id: I12d5ee92a01fc0b59dad4e5ab023ac71bc2a8b7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/288093
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-02-03 22:44:53 +00:00
David Chase
9a19481acb [dev.regabi] cmd/compile: make ordering for InvertFlags more stable
Current many architectures use a rule along the lines of

// Canonicalize the order of arguments to comparisons - helps with CSE.
((CMP|CMPW) x y) && x.ID > y.ID => (InvertFlags ((CMP|CMPW) y x))

to normalize comparisons as much as possible for CSE.  Replace the
ID comparison with something less variable across compiler changes.
This helps avoid spurious failures in some of the codegen-comparison
tests (though the current choice of comparison is sensitive to Op
ordering).

Two tests changed to accommodate modified instruction choice.

Change-Id: Ib35f450bd2bae9d4f9f7838ceaf7ec682bcf1e1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/280155
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-01-13 02:40:43 +00:00
Lynn Boger
0ae3b7cb74 cmd/compile: fix rules regression with shifts on PPC64
Some rules for PPC64 were checking for a case
where a shift followed by an 'and' of a mask could
be lowered, depending on the format of the mask. The
function to verify if the mask was valid for this purpose
was not checking if the mask was 0 which we don't want to
allow. This case can happen if previous optimizations
resulted in that mask value.

This fixes isPPC64ValidShiftMask to check for a mask of 0 and return
false.

This also adds a codegen testcase to verify it doesn't try to
match the rules in the future.

Fixes #42610

Change-Id: I565d94e88495f51321ab365d6388c01e791b4dbb
Reviewed-on: https://go-review.googlesource.com/c/go/+/270358
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-11-17 13:20:20 +00:00
Alberto Donizetti
7307e86afd test/codegen: go fmt
Fixes #42445

Change-Id: I9653ef094dba2a1ac2e3daaa98279d10df17a2a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/268257
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Trust: Martin Möhrmann <moehrmann@google.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2020-11-08 12:19:55 +00:00
Michael Munday
854e892ce1 cmd/compile: optimize shift pairs and masks on s390x
Optimize combinations of left and right shifts by a constant value
into a 'rotate then insert selected bits [into zero]' instruction.
Use the same instruction for contiguous masks since it has some
benefits over 'and immediate' (not restricted to 32-bits, does not
overwrite source register).

To keep the complexity of this change under control I've only
implemented 64 bit operations for now.

There are a lot more optimizations that can be done with this
instruction family. However, since their function overlaps with other
instructions we need to be somewhat careful not to break existing
optimization rules by creating optimization dead ends. This is
particularly true of the load/store merging rules which contain lots
of zero extensions and shifts.

This CL does interfere with the store merging rules when an operand
is shifted left before it is stored:

  binary.BigEndian.PutUint64(b, x << 1)

This is unfortunate but it's not critical and somewhat complex so
I plan to fix that in a follow up CL.

file      before    after     Δ       %
addr2line 4117446   4117282   -164    -0.004%
api       4945184   4942752   -2432   -0.049%
asm       4998079   4991891   -6188   -0.124%
buildid   2685158   2684074   -1084   -0.040%
cgo       4553732   4553394   -338    -0.007%
compile   19294446  19245070  -49376  -0.256%
cover     4897105   4891319   -5786   -0.118%
dist      3544389   3542785   -1604   -0.045%
doc       3926795   3927617   +822    +0.021%
fix       3302958   3293868   -9090   -0.275%
link      6546274   6543456   -2818   -0.043%
nm        4102021   4100825   -1196   -0.029%
objdump   4542431   4548483   +6052   +0.133%
pack      2482465   2416389   -66076  -2.662%
pprof     13366541  13363915  -2626   -0.020%
test2json 2829007   2761515   -67492  -2.386%
trace     10216164  10219684  +3520   +0.034%
vet       6773956   6773572   -384    -0.006%
total     107124151 106917891 -206260 -0.193%

Change-Id: I7591cce41e06867ba10a745daae9333513062746
Reviewed-on: https://go-review.googlesource.com/c/go/+/233317
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Michael Munday <mike.munday@ibm.com>
2020-11-06 10:45:31 +00:00
Cherry Zhang
0387bedadf cmd/compile: remove racefuncenterfp when it is not needed
We already remove racefuncenter and racefuncexit if they are not
needed (i.e. the function doesn't have any other race  calls).
racefuncenterfp is like racefuncenter but used on LR machines.
Remove unnecessary racefuncenterfp as well.

Change-Id: I65edb00e19c6d9ab55a204cbbb93e9fb710559f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/267099
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-11-02 03:03:16 +00:00
Paul E. Murphy
c3c6fbf314 cmd/compile: combine more 32 bit shift and mask operations on ppc64
Combine (AND m (SRWconst x)) or (SRWconst (AND m x)) when mask m is
and the shift value produce constant which can be encoded into an
RLWINM instruction.

Combine (CLRLSLDI (SRWconst x)) if the combining of the underling rotate
masks produces a constant which can be encoded into RLWINM.

Likewise for (SLDconst (SRWconst x)) and (CLRLSDI (RLWINM x)).

Combine rotate word + and operations which can be encoded as a single
RLWINM/RLWNM instruction.

The most notable performance improvements arise from the crypto
benchmarks below (GOARCH=power8 on a ppc64le/linux):

pkg:golang.org/x/crypto/blowfish goos:linux goarch:ppc64le
ExpandKeyWithSalt                               52.2µs ± 0%    47.5µs ± 0%  -8.88%
ExpandKey                                       44.4µs ± 0%    40.3µs ± 0%  -9.15%

pkg:golang.org/x/crypto/ssh/internal/bcrypt_pbkdf goos:linux goarch:ppc64le
Key                                             57.6ms ± 0%    52.3ms ± 0%  -9.13%

pkg:golang.org/x/crypto/bcrypt goos:linux goarch:ppc64le
Equal                                           90.9ms ± 0%    82.6ms ± 0%  -9.13%
DefaultCost                                     91.0ms ± 0%    82.7ms ± 0%  -9.12%

Change-Id: I59a0ca29face38f4ab46e37124c32906f216c4ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/260798
Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-27 18:33:20 +00:00
Keith Randall
04b8a9fea5 all: implement GO386=softfloat
Backstop support for non-sse2 chips now that 387 is gone.

RELNOTE=yes

Change-Id: Ib10e69c4a3654c15a03568f93393437e1939e013
Reviewed-on: https://go-review.googlesource.com/c/go/+/260017
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-10-06 22:49:38 +00:00
Keith Randall
fe2cfb74ba all: drop 387 support
My last 387 CL. So sad ... ... ... ... not!

Fixes #40255

Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/258957
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-02 00:00:51 +00:00
Lynn Boger
cc2a5cf4b8 cmd/compile,cmd/internal/obj/ppc64: fix some shift rules due to a regression
A recent change to improve shifts was generating some
invalid cases when the rule was based on an AND. The
extended mnemonics CLRLSLDI and CLRLSLWI only allow
certain values for the operands and in the mask case
those values were not being checked properly. This
adds a check to those rules to verify that the
'b' and 'n' values used when an AND was part of the rule
have correct values.

There was a bug in some diag messages in asm9. The
message expected 3 values but only provided 2. Those are
corrected here also.

The test/codegen/shift.go was updated to add a few more
cases to check for the case mentioned here.

Some of the comments that mention the order of operands
in these extended mnemonics were wrong and those have been
corrected.

Fixes #41683.

Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31
Reviewed-on: https://go-review.googlesource.com/c/go/+/258138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-01 18:51:18 +00:00
Lynn Boger
a424f6e45e cmd/asm,cmd/compile,cmd/internal/obj/ppc64: add extswsli support on power9
This adds support for the extswsli instruction which combines
extsw followed by a shift.

New benchmark demonstrates the improvement:
name      old time/op  new time/op  delta
ExtShift  1.34µs ± 0%  1.30µs ± 0%  -3.15%  (p=0.057 n=4+3)

Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/257017
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-09-28 18:13:48 +00:00
Lynn Boger
967465da29 cmd/compile: use combined shifts to improve array addressing on ppc64x
This change adds rules to find pairs of instructions that can
be combined into a single shifts. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.

These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.

Some rules in PPC64.rules were moved because the ordering prevented
some matching.

The following results were generated on power9.

hash/crc32:
    CRC32/poly=Koopman/size=40/align=0          195ns ± 0%     163ns ± 0%  -16.41%
    CRC32/poly=Koopman/size=40/align=1          200ns ± 0%     163ns ± 0%  -18.50%
    CRC32/poly=Koopman/size=512/align=0        1.98µs ± 0%    1.67µs ± 0%  -15.46%
    CRC32/poly=Koopman/size=512/align=1        1.98µs ± 0%    1.69µs ± 0%  -14.80%
    CRC32/poly=Koopman/size=1kB/align=0        3.90µs ± 0%    3.31µs ± 0%  -15.27%
    CRC32/poly=Koopman/size=1kB/align=1        3.85µs ± 0%    3.31µs ± 0%  -14.15%
    CRC32/poly=Koopman/size=4kB/align=0        15.3µs ± 0%    13.1µs ± 0%  -14.22%
    CRC32/poly=Koopman/size=4kB/align=1        15.4µs ± 0%    13.1µs ± 0%  -14.79%
    CRC32/poly=Koopman/size=32kB/align=0        137µs ± 0%     105µs ± 0%  -23.56%
    CRC32/poly=Koopman/size=32kB/align=1        137µs ± 0%     105µs ± 0%  -23.53%

crypto/rc4:
    RC4_128    733ns ± 0%    650ns ± 0%  -11.32%  (p=1.000 n=1+1)
    RC4_1K    5.80µs ± 0%   5.17µs ± 0%  -10.89%  (p=1.000 n=1+1)
    RC4_8K    45.7µs ± 0%   40.8µs ± 0%  -10.73%  (p=1.000 n=1+1)

crypto/sha1:
    Hash8Bytes       635ns ± 0%     613ns ± 0%   -3.46%  (p=1.000 n=1+1)
    Hash320Bytes    2.30µs ± 0%    2.18µs ± 0%   -5.38%  (p=1.000 n=1+1)
    Hash1K          5.88µs ± 0%    5.38µs ± 0%   -8.62%  (p=1.000 n=1+1)
    Hash8K          42.0µs ± 0%    37.9µs ± 0%   -9.75%  (p=1.000 n=1+1)

There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.

Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2020-09-17 12:37:40 +00:00
Cuong Manh Le
27a30186ab cmd/compile,runtime: skip zero'ing order array for select statements
The order array was zero initialized by the compiler, but ends up being
overwritten by the runtime anyway.

So let the runtime takes full responsibility for initializing, save us
one instruction per select.

Fixes #40399

Change-Id: Iec1eca27ad7180d4fcb3cc9ef97348206b7fe6b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/251517
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-08-29 08:02:52 +00:00
Paul E. Murphy
7615b20d06 cmd/compile: generate subfic on ppc64
This merges an lis + subf into subfic, and for 32b constants
lwa + subf into oris + ori + subf.

The carry bit is no longer used in code generation, therefore
I think we can clobber it as needed.  Note, lowered borrow/carry
arithmetic is self-contained and thus is not affected.

A few extra rules are added to ensure early transformations to
SUBFCconst don't trip up earlier rules, fold constant operations,
or otherwise simplify lowering.  Likewise, tests are added to
ensure all rules are hit.  Generic constant folding catches
trivial cases, however some lowering rules insert arithmetic
which can introduce new opportunities (e.g BitLen or Slicemask).

I couldn't find a specific benchmark to demonstrate noteworthy
improvements, but this is generating subfic in many of the default
bent test binaries, so we are at least saving a little code space.

Change-Id: Iad7c6e5767eaa9dc24dc1c989bd1c8cfe1982012
Reviewed-on: https://go-review.googlesource.com/c/go/+/249461
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2020-08-27 20:10:15 +00:00
fanzha02
d556c251a1 cmd/compile: add more generic rewrite rules to reassociate (op (op y C) x|C)
With this patch, opt pass can expose more obvious constant-folding
opportunites.

Example:
func test(i int) int {return (i+8)-(i+4)}

The previous version:
  MOVD	"".i(FP), R0
  ADD	$8, R0, R1
  ADD	$4, R0, R0
  SUB	R0, R1, R0
  MOVD	R0, "".~r1+8(FP)
  RET	(R30)

The optimized version:
  MOVD	$4, R0
  MOVD	R0, "".~r1+8(FP)
  RET	(R30)

This patch removes some existing reassociation rules, such as "x+(z-C)",
because the current generic rewrite rules will canonicalize "x-const"
to "x+(-const)", making "x+(z-C)" equal to "x+(z+(-C))".

This patch also adds test cases.

Change-Id: I857108ba0b5fcc18a879eeab38e2551bc4277797
Reviewed-on: https://go-review.googlesource.com/c/go/+/237137
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-08-24 14:52:54 +00:00
Agniva De Sarker
8acbe4c0b3 cmd/compile: optimize unsigned comparisons with 0/1 on wasm
Updates #21439

Change-Id: I0fbcde6e0c2fc368fe686b271670f9d8be4a7900
Reviewed-on: https://go-review.googlesource.com/c/go/+/247557
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
2020-08-22 12:35:47 +00:00
diaxu01
01aad9ea93 cmd/compile: Optimize ARM64's code with EON
This patch fuses pattern '(MVN (XOR x y))' into '(EON x y)'.

Change-Id: I269c98ce198d51a4945ce8bd0e1024acbd1b7609
Reviewed-on: https://go-review.googlesource.com/c/go/+/239638
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-19 16:47:14 +00:00
Paul E. Murphy
e7c7ce646f cmd/compile: combine multiply/add into maddld on ppc64le/power9
Add a new lowering rule to match and replace such instances
with the MADDLD instruction available on power9 where
possible.

Likewise, this plumbs in a new ppc64 ssa opcode to house
the newly generated MADDLD instructions.

When testing ed25519, this reduced binary size by 936B.
Similarly, MADDLD combination occcurs in a few other less
obvious cases such as division by constant.

Testing of golang.org/x/crypto/ed25519 shows non-trivial
speedup during keygeneration:

name           old time/op  new time/op  delta
KeyGeneration  65.2µs ± 0%  63.1µs ± 0%  -3.19%
Signing        64.3µs ± 0%  64.4µs ± 0%  +0.16%
Verification    147µs ± 0%   147µs ± 0%  +0.11%

Similarly, this test binary has shrunk by 66488B.

Change-Id: I077aeda7943119b41f07e4e62e44a648f16e4ad0
Reviewed-on: https://go-review.googlesource.com/c/go/+/248723
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-08-18 21:09:30 +00:00
Michael Munday
6e876f1985 cmd/compile: clean up and optimize s390x multiplication rules
Some of the existing optimizations aren't triggered because they
are handled by the generic rules so this CL removes them. Also
some constraints were copied without much thought from the amd64
rules and they don't make sense on s390x, so we remove those
constraints.

Finally, add a 'multiply by the sum of two powers of two'
optimization. This makes sense on s390x as shifts are low latency
and can also sometimes be optimized further (especially if we add
support for RISBG instructions).

name                   old time/op  new time/op  delta
IntMulByConst/3-8      1.70ns ±11%  1.10ns ± 5%  -35.26%  (p=0.000 n=10+10)
IntMulByConst/5-8      1.64ns ± 7%  1.10ns ± 4%  -32.94%  (p=0.000 n=10+9)
IntMulByConst/12-8     1.65ns ± 6%  1.20ns ± 4%  -27.16%  (p=0.000 n=10+9)
IntMulByConst/120-8    1.66ns ± 4%  1.22ns ±13%  -26.43%  (p=0.000 n=10+10)
IntMulByConst/-120-8   1.65ns ± 7%  1.19ns ± 4%  -28.06%  (p=0.000 n=9+10)
IntMulByConst/65537-8  0.86ns ± 9%  1.12ns ±12%  +30.41%  (p=0.000 n=10+10)
IntMulByConst/65538-8  1.65ns ± 5%  1.23ns ± 5%  -25.11%  (p=0.000 n=10+10)

Change-Id: Ib196e6bff1e97febfd266134d0a2b2a62897989f
Reviewed-on: https://go-review.googlesource.com/c/go/+/248937
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-08-18 15:39:44 +00:00
Junchen Li
06337823ef cmd/compile: optimize unsigned comparisons to 0/1 on arm64
For an unsigned integer, it's useful to convert its order test with 0/1
to its equality test with 0. We can save a comparison instruction that
followed by a conditional branch on arm64 since it supports
compare-with-zero-and-branch instructions. For example,

  if x > 0 { ... } else { ... }

the original version:
  CMP $0, R0
  BLS 9

the optimized version:
  CBZ R0, 8

Updates #21439

Change-Id: Id1de6f865f6aa72c5d45b29f7894818857288425
Reviewed-on: https://go-review.googlesource.com/c/go/+/246857
Reviewed-by: Keith Randall <khr@golang.org>
2020-08-18 04:09:13 +00:00
Keith Randall
ef9c8a38ad cmd/compile: don't rewrite (CMP (AND x y) 0) to TEST if AND has other uses
If the AND has other uses, we end up saving an argument to the AND
in another register, so we can use it for the TEST. No point in doing that.

Change-Id: I73444a6aeddd6f55e2328ce04d77c3e6cf4a83e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/241280
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-08-17 22:00:44 +00:00
Junchen Li
e30fbe3757 cmd/compile: optimize unsigned comparisons to 0
There are some architecture-independent rules in #21439, since an
unsigned integer >= 0 is always true and < 0 is always false. This CL
adds these optimizations to generic rules.

Updates #21439

Change-Id: Iec7e3040b761ecb1e60908f764815fdd9bc62495
Reviewed-on: https://go-review.googlesource.com/c/go/+/246617
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-08-17 20:06:35 +00:00
Keith Randall
c4fed25553 cmd/compile: add floating point load+op operations to addressing modes pass
They were missed as part of the refactoring to use a separate
addressing modes pass.

Fixes #40426

Change-Id: Ie0418b2fac4ba1ffe720644ac918f6d728d5e420
Reviewed-on: https://go-review.googlesource.com/c/go/+/244859
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-07-27 18:24:32 +00:00
Xiangdong Ji
e031318ca6 cmd/compile: ARM comparisons with 0 incorrect on overflow
Some ARM rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub caculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag:

  Block-Op         Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NEQ & PL

The patch also adds a few test cases to cover scenarios that are specific
to ARM and fine-tunes the code generation tests for 'x-const'.

For more details please refer to the previous fix on 64-bit ARM:
  https://go-review.googlesource.com/c/go/+/233097

Go1 perf, 'old' is the non-optimized version, that is removing all concerned
rewriting rules.

name                     old time/op    new time/op     delta
BinaryTree17-8              7.73s ± 0%      7.81s ± 0%  +0.97%  (p=0.000 n=7+8)
Fannkuch11-8                7.06s ± 0%      7.00s ± 0%  -0.83%  (p=0.000 n=8+8)
FmtFprintfEmpty-8           181ns ± 1%      183ns ± 1%  +1.31%  (p=0.001 n=8+8)
FmtFprintfString-8          319ns ± 1%      325ns ± 2%  +1.71%  (p=0.009 n=7+8)
FmtFprintfInt-8             358ns ± 1%      359ns ± 1%    ~     (p=0.293 n=7+7)
FmtFprintfIntInt-8          459ns ± 3%      456ns ± 1%    ~     (p=0.869 n=8+8)
FmtFprintfPrefixedInt-8     535ns ± 4%      538ns ± 4%    ~     (p=0.572 n=8+8)
FmtFprintfFloat-8          1.01µs ± 2%     1.01µs ± 2%    ~     (p=0.625 n=8+8)
FmtManyArgs-8              1.93µs ± 2%     1.93µs ± 1%    ~     (p=0.979 n=8+7)
GobDecode-8                16.1ms ± 1%     16.5ms ± 1%  +2.32%  (p=0.000 n=8+8)
GobEncode-8                15.9ms ± 0%     15.8ms ± 1%  -1.00%  (p=0.000 n=8+7)
Gzip-8                      690ms ± 1%      670ms ± 0%  -2.90%  (p=0.000 n=8+8)
Gunzip-8                    109ms ± 1%      109ms ± 1%    ~     (p=0.694 n=7+8)
HTTPClientServer-8          149µs ± 3%      146µs ± 2%  -1.70%  (p=0.028 n=8+8)
JSONEncode-8               50.5ms ± 1%     49.2ms ± 0%  -2.60%  (p=0.001 n=7+7)
JSONDecode-8                135ms ± 2%      137ms ± 1%    ~     (p=0.054 n=8+7)
Mandelbrot200-8             951ms ± 0%      952ms ± 0%    ~     (p=0.852 n=6+8)
GoParse-8                  9.47ms ± 1%     9.66ms ± 1%  +2.01%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8       288ns ± 2%      277ns ± 2%  -3.61%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8      1.66µs ± 1%     1.69µs ± 2%  +2.21%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8       334ns ± 1%      305ns ± 2%  -8.86%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8      2.14µs ± 2%     2.15µs ± 0%    ~     (p=0.099 n=8+8)
RegexpMatchMedium_32-8     13.3ns ± 1%     13.3ns ± 0%    ~     (p=1.000 n=7+7)
RegexpMatchMedium_1K-8     81.1µs ± 3%     80.7µs ± 1%    ~     (p=0.955 n=7+8)
RegexpMatchHard_32-8       4.26µs ± 0%     4.26µs ± 0%    ~     (p=0.933 n=7+8)
RegexpMatchHard_1K-8        124µs ± 0%      124µs ± 0%  +0.31%  (p=0.000 n=8+8)
Revcomp-8                  14.7ms ± 2%     14.5ms ± 1%  -1.66%  (p=0.003 n=8+8)
Template-8                  197ms ± 2%      200ms ± 3%  +1.62%  (p=0.021 n=8+8)
TimeParse-8                1.33µs ± 1%     1.30µs ± 1%  -1.86%  (p=0.002 n=8+8)
TimeFormat-8               3.04µs ± 1%     3.02µs ± 0%  -0.60%  (p=0.000 n=8+8)

name                     old speed      new speed       delta
GobDecode-8              47.6MB/s ± 1%   46.5MB/s ± 1%  -2.28%  (p=0.000 n=8+8)
GobEncode-8              48.1MB/s ± 0%   48.6MB/s ± 1%  +1.02%  (p=0.000 n=8+7)
Gzip-8                   28.1MB/s ± 1%   29.0MB/s ± 0%  +2.97%  (p=0.000 n=8+8)
Gunzip-8                  178MB/s ± 1%    179MB/s ± 2%    ~     (p=0.694 n=7+8)
JSONEncode-8             38.4MB/s ± 1%   39.4MB/s ± 0%  +2.67%  (p=0.001 n=7+7)
JSONDecode-8             14.3MB/s ± 2%   14.2MB/s ± 1%  -0.81%  (p=0.043 n=8+7)
GoParse-8                6.12MB/s ± 1%   5.99MB/s ± 1%  -2.00%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8     111MB/s ± 2%    115MB/s ± 2%  +3.77%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8     618MB/s ± 1%    604MB/s ± 2%  -2.16%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8    95.7MB/s ± 1%  105.1MB/s ± 2%  +9.76%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8     479MB/s ± 2%    477MB/s ± 0%    ~     (p=0.105 n=8+8)
RegexpMatchMedium_32-8   75.2MB/s ± 1%   75.2MB/s ± 0%    ~     (p=0.247 n=7+7)
RegexpMatchMedium_1K-8   12.6MB/s ± 3%   12.7MB/s ± 1%    ~     (p=0.538 n=7+8)
RegexpMatchHard_32-8     7.52MB/s ± 0%   7.52MB/s ± 0%    ~     (p=0.968 n=7+8)
RegexpMatchHard_1K-8     8.26MB/s ± 0%   8.24MB/s ± 0%  -0.30%  (p=0.001 n=8+8)
Revcomp-8                 173MB/s ± 2%    176MB/s ± 1%  +1.68%  (p=0.003 n=8+8)
Template-8               9.85MB/s ± 2%   9.69MB/s ± 3%  -1.59%  (p=0.021 n=8+8)

Fixes   #39303
Updates #38740

Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36
Reviewed-on: https://go-review.googlesource.com/c/go/+/236637
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-06-09 15:50:33 +00:00
Xiangdong Ji
e8f5a33191 cmd/compile: fix incorrect rewriting to if condition
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub caculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag, in the following categories:

  Block-Op        Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NEQ & PL

The backend generates two consecutive branch instructions for 'LEnoov'
and 'GTnoov' to model their expected behavior. A slight change to 'gc'
and amd64/386 backends is made to unify the code generation.

Add a test 'TestCondRewrite' as justification, it covers 32 incorrect rules
identified on arm64, more might be needed on other arches, like 32-bit arm.

Add two benchmarks profiling the aforementioned category 1&2 and category
3&4 separetely, we expect the first two categories will show performance
improvement and the second will not result in visible regression compared with
the non-optimized version.

This change also updates TestFormats to support using %#x.

Examples exhibiting where does the issue come from:
  1: 'if x + 3 < 0' might be converted to:
  before:
    CMN $3, R0
    BGE <else branch> // wrong branch is taken if 'x+3' overflows
  after:
    CMN $3, R0
    BPL <else branch>

  2: 'if y - 3 > 0' might be converted to:
  before:
    CMP $3, R0
    BLE <else branch> // wrong branch is taken if 'y-3' underflows
  after:
    CMP $3, R0
    BMI <else branch>
    BEQ <else branch>

Benchmark data from different kinds of arm64 servers, 'old' is the non-optimized
version (not the parent commit), generally the optimization version outperforms.

S1:
name                    old time/op  new time/op  delta
CondRewrite/SoloJump  13.6ns ± 0%  12.9ns ± 0%  -5.15%  (p=0.000 n=10+10)
CondRewrite/CombJump  13.8ns ± 1%  12.9ns ± 0%  -6.32%  (p=0.000 n=10+10)

S2:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  11.6ns ± 0%  10.9ns ± 0%  -6.03%  (p=0.000 n=10+10)
CondRewrite/CombJump  11.4ns ± 0%  10.8ns ± 1%  -5.53%  (p=0.000 n=10+10)

S3:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  7.36ns ± 0%  7.50ns ± 0%  +1.79%  (p=0.000 n=9+10)
CondRewrite/CombJump  7.35ns ± 0%  7.75ns ± 0%  +5.51%  (p=0.000 n=8+9)

S4:
name                      old time/op  new time/op  delta
CondRewrite/SoloJump-224  11.5ns ± 1%  10.9ns ± 0%  -4.97%  (p=0.000 n=10+10)
CondRewrite/CombJump-224  11.9ns ± 0%  11.5ns ± 0%  -2.95%  (p=0.000 n=10+10)

S5:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  10.0ns ± 0%  10.0ns ± 0%  -0.45%  (p=0.000 n=9+10)
CondRewrite/CombJump  9.93ns ± 0%  9.77ns ± 0%  -1.53%  (p=0.000 n=10+9)

Go1 perf. data:

name                     old time/op    new time/op    delta
BinaryTree17              6.29s ± 1%     6.30s ± 1%    ~     (p=1.000 n=5+5)
Fannkuch11                5.40s ± 0%     5.40s ± 0%    ~     (p=0.841 n=5+5)
FmtFprintfEmpty          97.9ns ± 0%    98.9ns ± 3%    ~     (p=0.937 n=4+5)
FmtFprintfString          171ns ± 3%     171ns ± 2%    ~     (p=0.754 n=5+5)
FmtFprintfInt             212ns ± 0%     217ns ± 6%  +2.55%  (p=0.008 n=5+5)
FmtFprintfIntInt          296ns ± 1%     297ns ± 2%    ~     (p=0.516 n=5+5)
FmtFprintfPrefixedInt     371ns ± 2%     374ns ± 7%    ~     (p=1.000 n=5+5)
FmtFprintfFloat           435ns ± 1%     439ns ± 2%    ~     (p=0.056 n=5+5)
FmtManyArgs              1.37µs ± 1%    1.36µs ± 1%    ~     (p=0.730 n=5+5)
GobDecode                14.6ms ± 4%    14.4ms ± 4%    ~     (p=0.690 n=5+5)
GobEncode                11.8ms ±20%    11.6ms ±15%    ~     (p=1.000 n=5+5)
Gzip                      507ms ± 0%     491ms ± 0%  -3.22%  (p=0.008 n=5+5)
Gunzip                   73.8ms ± 0%    73.9ms ± 0%    ~     (p=0.690 n=5+5)
HTTPClientServer          116µs ± 0%     116µs ± 0%    ~     (p=0.686 n=4+4)
JSONEncode               21.8ms ± 1%    21.6ms ± 2%    ~     (p=0.151 n=5+5)
JSONDecode                104ms ± 1%     103ms ± 1%  -1.08%  (p=0.016 n=5+5)
Mandelbrot200            9.53ms ± 0%    9.53ms ± 0%    ~     (p=0.421 n=5+5)
GoParse                  7.55ms ± 1%    7.51ms ± 1%    ~     (p=0.151 n=5+5)
RegexpMatchEasy0_32       158ns ± 0%     158ns ± 0%    ~     (all equal)
RegexpMatchEasy0_1K       606ns ± 1%     608ns ± 3%    ~     (p=0.937 n=5+5)
RegexpMatchEasy1_32       143ns ± 0%     144ns ± 1%    ~     (p=0.095 n=5+4)
RegexpMatchEasy1_1K       927ns ± 2%     944ns ± 2%    ~     (p=0.056 n=5+5)
RegexpMatchMedium_32     16.0ns ± 0%    16.0ns ± 0%    ~     (all equal)
RegexpMatchMedium_1K     69.3µs ± 2%    69.7µs ± 0%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32       3.73µs ± 0%    3.73µs ± 1%    ~     (p=0.984 n=5+5)
RegexpMatchHard_1K        111µs ± 1%     110µs ± 0%    ~     (p=0.151 n=5+5)
Revcomp                   1.91s ±47%     1.77s ±68%    ~     (p=1.000 n=5+5)
Template                  138ms ± 1%     138ms ± 1%    ~     (p=1.000 n=5+5)
TimeParse                 787ns ± 2%     785ns ± 1%    ~     (p=0.540 n=5+5)
TimeFormat                729ns ± 1%     726ns ± 1%    ~     (p=0.151 n=5+5)

Updates #38740
Change-Id: I06c604874acdc1e63e66452dadee5df053045222
Reviewed-on: https://go-review.googlesource.com/c/go/+/233097
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2020-05-29 15:39:54 +00:00
Keith Randall
2cb10d42b7 cmd/compile: in prove, zero right shifts of positive int by #bits - 1
Taking over Zach's CL 212277. Just cleaned up and added a test.

For a positive, signed integer, an arithmetic right shift of count
(bit-width - 1) equals zero. e.g. int64(22) >> 63 -> 0. This CL makes
prove replace these right shifts with a zero-valued constant.

These shifts may arise in source code explicitly, but can also be
created by the generic rewrite of signed division by a power of 2.
// Signed divide by power of 2.
// n / c =       n >> log(c) if n >= 0
//       = (n+c-1) >> log(c) if n < 0
// We conditionally add c-1 by adding n>>63>>(64-log(c))
	(first shift signed, second shift unsigned).
(Div64 <t> n (Const64 [c])) && isPowerOfTwo(c) ->
  (Rsh64x64
    (Add64 <t> n (Rsh64Ux64 <t>
    	(Rsh64x64 <t> n (Const64 <typ.UInt64> [63]))
	(Const64 <typ.UInt64> [64-log2(c)])))
    (Const64 <typ.UInt64> [log2(c)]))

If n is known to be positive, this rewrite includes an extra Add and 2
extra Rsh. This CL will allow prove to replace one of the extra Rsh with
a 0. That replacement then allows lateopt to remove all the unneccesary
fixups from the generic rewrite.

There is a rewrite rule to handle this case directly:
(Div64 n (Const64 [c])) && isNonNegative(n) && isPowerOfTwo(c) ->
	(Rsh64Ux64 n (Const64 <typ.UInt64> [log2(c)]))
But this implementation of isNonNegative really only handles constants
and a few special operations like len/cap. The division could be
handled if the factsTable version of isNonNegative were available.
Unfortunately, the first opt pass happens before prove even has a
chance to deduce the numerator is non-negative, so the generic rewrite
has already fired and created the extra Ops discussed above.

Fixes #36159

By Printf count, this zeroes 137 right shifts when building std and cmd.

Change-Id: Iab486910ac9d7cfb86ace2835456002732b384a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/232857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-05-11 16:23:52 +00:00
Martin Möhrmann
6ed4661807 cmd/compile: optimize make+copy pattern to avoid memclr
match:
 m = make([]T, x); copy(m, s)
for pointer free T and x==len(s) rewrite to:
 m = mallocgc(x*elemsize(T), nil, false); memmove(&m, &s, x*elemsize(T))
otherwise rewrite to:
 m = makeslicecopy([]T, x, s)

This avoids memclear and shading of pointers in the newly created slice
before the copy.

With this CL "s" is only be allowed to bev a variable and not a more
complex expression. This restriction could be lifted in future versions
of this optimization when it can be proven that "s" is not referencing "m".

Triggers 450 times during make.bash..
Reduces go binary size by ~8 kbyte.

name                           old time/op  new time/op  delta
MakeSliceCopy/mallocmove/Byte  71.1ns ± 1%  65.8ns ± 0%  -7.49%  (p=0.000 n=10+9)
MakeSliceCopy/mallocmove/Int   71.2ns ± 1%  66.0ns ± 0%  -7.27%  (p=0.000 n=10+8)
MakeSliceCopy/mallocmove/Ptr    104ns ± 4%    99ns ± 1%  -5.13%  (p=0.000 n=10+10)
MakeSliceCopy/makecopy/Byte    70.3ns ± 0%  68.0ns ± 0%  -3.22%  (p=0.000 n=10+9)
MakeSliceCopy/makecopy/Int     70.3ns ± 0%  68.5ns ± 1%  -2.59%  (p=0.000 n=9+10)
MakeSliceCopy/makecopy/Ptr      102ns ± 0%    99ns ± 1%  -2.97%  (p=0.000 n=9+9)
MakeSliceCopy/nilappend/Byte   75.4ns ± 0%  74.9ns ± 2%  -0.63%  (p=0.015 n=9+9)
MakeSliceCopy/nilappend/Int    75.6ns ± 0%  76.4ns ± 3%    ~     (p=0.245 n=9+10)
MakeSliceCopy/nilappend/Ptr     107ns ± 0%   108ns ± 1%  +0.93%  (p=0.005 n=9+10)

Fixes #26252

Change-Id: Iec553dd1fef6ded16197216a472351c8799a8e71
Reviewed-on: https://go-review.googlesource.com/c/go/+/146719
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-05-07 17:50:24 +00:00
Keith Randall
9ed0fb42e3 cmd/compile: add indexed memory modification ops to amd64
name            old time/op  new time/op  delta
Modify-16        404ns ± 1%   365ns ± 1%  -9.73%  (p=0.000 n=10+10)
ConstModify-16   407ns ± 0%   385ns ± 2%  -5.56%  (p=0.000 n=9+10)

Seems to generally help generated code.

Binary size change is in the noise.

Change-Id: I57891bfaf0f7dfc5d143bb9f7ebafc7079d2614f
Reviewed-on: https://go-review.googlesource.com/c/go/+/228098
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-04-30 17:21:31 +00:00
Keith Randall
882ec701d2 cmd/compile: add indexed load+op operations to amd64
name        old time/op  new time/op  delta
LoadAdd-16   545ns ± 0%   456ns ± 0%  -16.31%  (p=0.000 n=10+10)

Update #36468

Change-Id: I84f390d55490648fa1f58cdbc24fd74c4f1bc8c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227960
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-04-30 17:19:57 +00:00
Josh Bleecher Snyder
4a7e363288 cmd/compile: optimize Move with all-zero ro sym src to Zero
We set up static symbols during walk that
we later make copies of to initialize local variables.
It is difficult to ascertain at that time exactly
when copying a symbol is profitable vs locally
initializing an autotmp.

During SSA, we are much better placed to optimize.
This change recognizes when we are copying from a
global readonly all-zero symbol and replaces it with
direct zeroing.

This often allows the all-zero symbol to be
deadcode eliminated at link time.
This is not ideal--it makes for large object files,
and longer link times--but it is the cleanest fix I could find.

This makes the final binary for the program in #38554
shrink from >500mb to ~2.2mb.

It also shrinks the standard binaries:

file      before    after     Δ       %
addr2line 4412496   4404304   -8192   -0.186%
buildid   2893816   2889720   -4096   -0.142%
cgo       4841048   4832856   -8192   -0.169%
compile   19926480  19922432  -4048   -0.020%
cover     5281816   5277720   -4096   -0.078%
link      6734648   6730552   -4096   -0.061%
nm        4366240   4358048   -8192   -0.188%
objdump   4755968   4747776   -8192   -0.172%
pprof     14653060  14612100  -40960  -0.280%
trace     11805940  11777268  -28672  -0.243%
vet       7185560   7181416   -4144   -0.058%
total     113588440 113465560 -122880 -0.108%

And not just by removing unnecessary symbols;
the program text shrinks a bit as well.

Fixes #38554

Change-Id: I8381ae6084ae145a5e0cd9410c451e52c0dc51c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/229704
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-24 23:58:10 +00:00
Josh Bleecher Snyder
e7c1873691 cmd/compile: optimize x & 1 != 0 to x & 1 on amd64
Triggers a handful of times in std+cmd.

Change-Id: I9bb8ce9a5f8bae2547cb61157cd8f256e1b63e76
Reviewed-on: https://go-review.googlesource.com/c/go/+/229602
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-23 17:52:28 +00:00
Michael Munday
ab7a65f283 cmd/compile: clean up codegen for branch-on-carry on s390x
This CL optimizes code that uses a carry from a function such as
bits.Add64 as the condition in an if statement. For example:

    x, c := bits.Add64(a, b, 0)
    if c != 0 {
        panic("overflow")
    }

Rather than converting the carry into a 0 or a 1 value and using
that as an input to a comparison instruction the carry flag is now
used as the input to a conditional branch directly. This typically
removes an ADD LOGICAL WITH CARRY instruction when user code is
doing overflow detection and is closer to the code that a user
would expect to generate.

Change-Id: I950431270955ab72f1b5c6db873b6abe769be0da
Reviewed-on: https://go-review.googlesource.com/c/go/+/219757
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-22 20:11:06 +00:00
Michael Munday
e464d7d797 cmd/compile: optimize comparisons with immediates on s390x
When generating code for unsigned equals (==) and not equals (!=)
comparisons we currently, on s390x, always use signed comparisons.

This mostly works well, however signed comparisons on s390x sign
extend their immediates and unsigned comparisons zero extend them.
For compare-and-branch instructions which can only have 8-bit
immediates this significantly changes the range of immediate values
we can represent: [-128, 127] for signed comparisons and [0, 255]
for unsigned comparisons.

When generating equals and not equals checks we don't neet to worry
about whether the comparison is signed or unsigned. This CL
therefore adds rules to allow us to switch signedness for such
comparisons if it means that it brings a constant into range for an
8-bit immediate.

For example, a signed equals with an integer in the range [128, 255]
will now be implemented using an unsigned compare-and-branch
instruction rather than separate compare and branch instructions.

As part of this change I've also added support for adding a name
to block control values using the same `x:(...)` syntax we use for
value rules.

Triggers 792 times when compiling cmd and std.

Change-Id: I77fa80a128f0a8ce51a2888d1e384bd5e9b61a77
Reviewed-on: https://go-review.googlesource.com/c/go/+/228642
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-21 19:23:51 +00:00
alex-semenyuk
876c1feb7d test/codegen, runtime/pprof, runtime: apply fmt
Change-Id: Ife4e065246729319c39e57a4fbd8e6f7b37724e1
GitHub-Last-Rev: e71803eaeb
GitHub-Pull-Request: golang/go#38527
Reviewed-on: https://go-review.googlesource.com/c/go/+/228901
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
2020-04-21 09:07:42 +00:00
Josh Bleecher Snyder
50b11318fe cmd/compile: use oneBit instead of isPowerOfTwo in bit optimization
This optimization works on any integer with exactly one bit set.
This is identical to being a power of two, except in the
most negative number. Use oneBit instead.

The rule now triggers in a few more places in std+cmd,
in packages encoding/asn1, crypto/elliptic, and
vendor/golang.org/x/crypto/cryptobyte.

This change obviates the need for CL 222479
by doing this optimization consistently in the compiler.

Change-Id: I983c6235290fdc634fda5e11b10f1f8ce041272f
Reviewed-on: https://go-review.googlesource.com/c/go/+/229124
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2020-04-21 00:38:34 +00:00
David Chase
e4e192484b cmd/compile: split up the addressing mode on OpAMD64CMP*loadidx* always
Benchmarking suggests that the combo instruction is notably slower,
at least in the places where we measure.

Updates #37955

Change-Id: I829f1975dd6edf38163128ba51d84604055512f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/228157
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-15 18:09:14 +00:00
Lynn Boger
a1550d3ca3 cmd/compile: use isel with variable shifts on ppc64x
This changes the code generated for variable length shift
counts to use isel instead of instructions that set and
read the carry flag.

This reduces the generated code for shifts like this
by 1 instruction and avoids the use of instructions to
set and read the carry flag.

This sequence can be found in strconv with these results
on power9:

Atof64Decimal                          71.6ns ± 0%  68.3ns ± 0%   -4.61%
Atof64Float                            95.3ns ± 0%  90.9ns ± 0%   -4.62%
Atof64FloatExp                          153ns ± 0%   149ns ± 0%   -2.61%
Atof64Big                               234ns ± 0%   232ns ± 0%   -0.85%
Atof64RandomBits                        348ns ± 0%   369ns ± 0%   +6.03%
Atof64RandomFloats                      262ns ± 0%   262ns ± 0%     ~
Atof32Decimal                          72.0ns ± 0%  68.2ns ± 0%   -5.28%
Atof32Float                            92.1ns ± 0%  87.1ns ± 0%   -5.43%
Atof32FloatExp                          159ns ± 0%   158ns ± 0%   -0.63%
Atof32Random                            194ns ± 0%   191ns ± 0%   -1.55%

Some tests in codegen/shift.go are enabled to verify the
expected instructions are generated.

Change-Id: I968715d10ada405a8c46132bf19b8ed9b85796d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227337
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-09 19:18:56 +00:00
Josh Bleecher Snyder
ade0811dc8 cmd/compile: handle some additional phis in shortcircuit
Prior to this change, the shortcircuit pass could only
handle blocks containing only a single phi control value,
possibly wrapped in some OpNot and OpCopy values.

This change partially lifts this limitation.
It handles some cases in which the block contains other phi values.
This appears to happen most commonly in cases in which
the conditionals being checked involve the memory state,
in which case there is a phi memory value in the block.

The general idea here is to use the information we have about
the CFG to (1) move the other phi values into other blocks
and/or (2) rewrite uses of the other phi values in other blocks.

For example, consider this CFG:

p   q
 \ /
  b
 / \
t   u

And consider a phi value v in block b.
We'll write v = Phi(p: x, q: y) to say that v has value x corresponding
to inbound block p, and value y for block q.

We will rewrite this CFG to:

p    q
|   /
|  b
|/  \
t    u

What should we do with v?

Any uses of v in u can be replaced with y. Why?
If we are in block u, we came from b, and before that from q.
If prior to b we came from p, then we would have gone to t, not u.
Since we came from q, we know that v took the value y.

Uses of v in t are a bit more complicated.
It is going to end up being a phi value: Phi(p: ?, b: ?).

Suppose, after the rewrite, we came from block p.
Then, before the rewrite, we would have gone to b,
where v would have the value x.
So we have Phi(p: x, b: ?).

Suppose, after the rewrite, we came from block b.
Then we must have come from block q.
If we come from block q, v has value y.
So we have Phi(p: x, b: y).
Uses of v in t can thus be replaced with a new phi value,
with the same values as v, but with altered predecessors.

Similar reasoning can be employed to rewrite or replace
other uses of v elsewhere in the CFG, so that v itself can be eliminated,
and the CFG rewrite can proceed.

This change sets up the infrastructure for such optimizations
and adds a few cheap ones. All optimizations in this change depend
only on the shape of the CFG; future changes may also depend on where
v's uses are. That analysis is more powerful but more expensive,
and should be done incrementally.

The use of closures here is perhaps a bit unusual,
but during development it proved critical to having readable code.
We must decide early on whether we can safely do the CFG modifications,
and then later fix up the phis if so.
Safely storing state and decisions across these two phases is hard to do readably.
Closures solve the problem neatly.

I manually instrumented the code paths in shortcircuitPhiPlan.
During make.bash there are nearly 6000 invocations.
The least-visited code path gets run 85 times,
so all the code in this CL is reasonably well-exercised.

Here is a concrete example of code improved by this change:

func f(e interface{}) int {
	if x, ok := e.(int); ok {
		return x
	}
	return 0
}

Omitting PCDATA, FUNCDATA, and the like, it used to compile to:

"".f STEXT nosplit size=50 args=0x18 locals=0x0
	0x0000 00000 (x.go:4)	LEAQ	type.int(SB), AX
	0x0007 00007 (x.go:4)	MOVQ	"".e+8(SP), CX
	0x000c 00012 (x.go:4)	CMPQ	AX, CX
	0x000f 00015 (x.go:4)	JNE	43
	0x0011 00017 (x.go:4)	MOVQ	"".e+16(SP), AX
	0x0016 00022 (x.go:4)	MOVQ	(AX), AX
	0x0019 00025 (x.go:4)	JNE	33
	0x001b 00027 (x.go:5)	MOVQ	AX, "".~r1+24(SP)
	0x0020 00032 (x.go:5)	RET
	0x0021 00033 (x.go:7)	MOVQ	$0, "".~r1+24(SP)
	0x002a 00042 (x.go:7)	RET
	0x002b 00043 (x.go:7)	MOVL	$0, AX
	0x0030 00048 (x.go:4)	JMP	25

Afterwards, it compiles to:

"".f STEXT nosplit size=41 args=0x18 locals=0x0
	0x0000 00000 (x.go:4)	LEAQ	type.int(SB), AX
	0x0007 00007 (x.go:4)	MOVQ	"".e+8(SP), CX
	0x000c 00012 (x.go:4)	CMPQ	AX, CX
	0x000f 00015 (x.go:4)	JNE	31
	0x0011 00017 (x.go:4)	MOVQ	"".e+16(SP), AX
	0x0016 00022 (x.go:4)	MOVQ	(AX), AX
	0x0019 00025 (x.go:5)	MOVQ	AX, "".~r1+24(SP)
	0x001e 00030 (x.go:5)	RET
	0x001f 00031 (x.go:7)	MOVQ	$0, "".~r1+24(SP)
	0x0028 00040 (x.go:7)	RET

Note that there is now only a single JNE and a single RET $0 path.

Updates #37608

Has a minor good effect on compilation speed and memory use.

Provides widespread improvements to generated code.
The rare, minor regressions I have investigated are due to
register allocation fluctuations.

file      before    after     Δ       %       
addr2line 4376080   4371984   -4096   -0.094% 
api       5945400   5933112   -12288  -0.207% 
asm       5034312   5030216   -4096   -0.081% 
buildid   2844952   2840856   -4096   -0.144% 
cgo       4812872   4804680   -8192   -0.170% 
compile   19622064  19610368  -11696  -0.060% 
cover     5236648   5232552   -4096   -0.078% 
dist      3658312   3654216   -4096   -0.112% 
doc       4653512   4649416   -4096   -0.088% 
fix       3370072   3365976   -4096   -0.122% 
link      6671864   6667768   -4096   -0.061% 
pprof     14781652  14761172  -20480  -0.139% 
trace     11639684  11627396  -12288  -0.106% 
vet       8252280   8231800   -20480  -0.248% 
total     115052984 114934792 -118192 -0.103% 


file                                                                     before   after    Δ       %       
internal/cpu.s                                                           3298     3296     -2      -0.061% 
internal/bytealg.s                                                       1730     1737     +7      +0.405% 
cmd/vendor/golang.org/x/mod/semver.s                                     7332     7283     -49     -0.668% 
image/color.s                                                            8248     8156     -92     -1.115% 
math.s                                                                   35966    35956    -10     -0.028% 
math/cmplx.s                                                             6596     6575     -21     -0.318% 
runtime.s                                                                480566   480053   -513    -0.107% 
sync.s                                                                   16408    16385    -23     -0.140% 
math/rand.s                                                              10447    10406    -41     -0.392% 
internal/reflectlite.s                                                   28408    28366    -42     -0.148% 
errors.s                                                                 2736     2701     -35     -1.279% 
sort.s                                                                   17031    17036    +5      +0.029% 
io.s                                                                     16993    16964    -29     -0.171% 
container/heap.s                                                         2006     1997     -9      -0.449% 
text/tabwriter.s                                                         9570     9552     -18     -0.188% 
bytes.s                                                                  31823    31594    -229    -0.720% 
strconv.s                                                                52760    52717    -43     -0.082% 
vendor/golang.org/x/text/transform.s                                     16713    16706    -7      -0.042% 
strings.s                                                                42590    42563    -27     -0.063% 
bufio.s                                                                  22883    22785    -98     -0.428% 
encoding/base32.s                                                        9586     9531     -55     -0.574% 
syscall.s                                                                82237    82243    +6      +0.007% 
image.s                                                                  37465    37452    -13     -0.035% 
regexp/syntax.s                                                          82827    82769    -58     -0.070% 
image/draw.s                                                             18698    18584    -114    -0.610% 
image/jpeg.s                                                             36560    36549    -11     -0.030% 
time.s                                                                   82557    82526    -31     -0.038% 
context.s                                                                10863    10820    -43     -0.396% 
regexp.s                                                                 64114    64049    -65     -0.101% 
os.s                                                                     51751    51524    -227    -0.439% 
reflect.s                                                                168240   168049   -191    -0.114% 
cmd/go/internal/lockedfile/internal/filelock.s                           2317     2290     -27     -1.165% 
path/filepath.s                                                          17831    17766    -65     -0.365% 
io/ioutil.s                                                              6994     6990     -4      -0.057% 
encoding/binary.s                                                        30791    30726    -65     -0.211% 
cmd/vendor/golang.org/x/sys/unix.s                                       78055    78033    -22     -0.028% 
encoding/pem.s                                                           9280     9247     -33     -0.356% 
crypto/cipher.s                                                          20376    20374    -2      -0.010% 
os/exec.s                                                                29229    29140    -89     -0.304% 
internal/goroot.s                                                        4588     4579     -9      -0.196% 
cmd/internal/browser.s                                                   2246     2240     -6      -0.267% 
cmd/vendor/golang.org/x/crypto/ssh/terminal.s                            27183    27149    -34     -0.125% 
fmt.s                                                                    76625    76484    -141    -0.184% 
encoding/hex.s                                                           6154     6152     -2      -0.032% 
compress/lzw.s                                                           7063     7059     -4      -0.057% 
database/sql/driver.s                                                    18875    18862    -13     -0.069% 
debug/plan9obj.s                                                         8268     8266     -2      -0.024% 
net/url.s                                                                29724    29719    -5      -0.017% 
encoding/csv.s                                                           12872    12856    -16     -0.124% 
debug/gosym.s                                                            25303    25268    -35     -0.138% 
compress/flate.s                                                         50952    51019    +67     +0.131% 
compress/zlib.s                                                          7277     7266     -11     -0.151% 
archive/zip.s                                                            42155    42111    -44     -0.104% 
debug/dwarf.s                                                            107632   107541   -91     -0.085% 
database/sql.s                                                           98373    98028    -345    -0.351% 
os/user.s                                                                14722    14708    -14     -0.095% 
encoding/json.s                                                          105836   105711   -125    -0.118% 
debug/macho.s                                                            32598    32560    -38     -0.117% 
encoding/gob.s                                                           136478   135755   -723    -0.530% 
debug/pe.s                                                               31160    30869    -291    -0.934% 
debug/elf.s                                                              63495    63302    -193    -0.304% 
vendor/golang.org/x/text/unicode/bidi.s                                  27220    27217    -3      -0.011% 
vendor/golang.org/x/text/secure/bidirule.s                               3363     3352     -11     -0.327% 
go/token.s                                                               12036    12035    -1      -0.008% 
flag.s                                                                   22277    22256    -21     -0.094% 
mime.s                                                                   39696    39509    -187    -0.471% 
go/scanner.s                                                             19033    19020    -13     -0.068% 
archive/tar.s                                                            70936    70581    -355    -0.500% 
internal/xcoff.s                                                         22823    22820    -3      -0.013% 
text/scanner.s                                                           11631    11629    -2      -0.017% 
encoding/xml.s                                                           110534   110408   -126    -0.114% 
math/big.s                                                               183636   183545   -91     -0.050% 
image/gif.s                                                              27376    27343    -33     -0.121% 
crypto/dsa.s                                                             6029     5969     -60     -0.995% 
image/png.s                                                              42947    42939    -8      -0.019% 
crypto/rand.s                                                            6866     6854     -12     -0.175% 
vendor/golang.org/x/text/unicode/norm.s                                  66394    66354    -40     -0.060% 
runtime/trace.s                                                          2603     2521     -82     -3.150% 
crypto/ed25519.s                                                         6321     6300     -21     -0.332% 
text/template/parse.s                                                    93910    93844    -66     -0.070% 
crypto/rsa.s                                                             31460    31369    -91     -0.289% 
encoding/asn1.s                                                          57021    57023    +2      +0.004% 
crypto/elliptic.s                                                        51382    51363    -19     -0.037% 
crypto/x509/pkix.s                                                       10386    10342    -44     -0.424% 
vendor/golang.org/x/net/idna.s                                           24482    24466    -16     -0.065% 
vendor/golang.org/x/crypto/cryptobyte.s                                  33479    33280    -199    -0.594% 
crypto/ecdsa.s                                                           11936    11883    -53     -0.444% 
go/constant.s                                                            43670    42663    -1007   -2.306% 
go/ast.s                                                                 80383    80191    -192    -0.239% 
testing.s                                                                68069    68057    -12     -0.018% 
runtime/pprof.s                                                          59613    59603    -10     -0.017% 
testing/iotest.s                                                         4895     4891     -4      -0.082% 
internal/trace.s                                                         78136    78089    -47     -0.060% 
cmd/internal/goobj2.s                                                    13158    13154    -4      -0.030% 
cmd/internal/src.s                                                       17661    17657    -4      -0.023% 
go/parser.s                                                              79046    78880    -166    -0.210% 
cmd/internal/objabi.s                                                    16367    16343    -24     -0.147% 
text/template.s                                                          94899    94486    -413    -0.435% 
go/printer.s                                                             77267    76992    -275    -0.356% 
cmd/internal/goobj.s                                                     25988    25947    -41     -0.158% 
runtime/pprof/internal/profile.s                                         102066   101933   -133    -0.130% 
go/format.s                                                              5419     5371     -48     -0.886% 
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s                            37181    37149    -32     -0.086% 
go/doc.s                                                                 74533    74132    -401    -0.538% 
html/template.s                                                          88743    88389    -354    -0.399% 
cmd/asm/internal/lex.s                                                   24881    24872    -9      -0.036% 
cmd/internal/buildid.s                                                   18263    18256    -7      -0.038% 
cmd/vendor/golang.org/x/arch/x86/x86asm.s                                80036    79980    -56     -0.070% 
go/build.s                                                               68905    68737    -168    -0.244% 
cmd/cover.s                                                              46070    45950    -120    -0.260% 
cmd/internal/obj.s                                                       117001   116991   -10     -0.009% 
cmd/doc.s                                                                62700    62419    -281    -0.448% 
cmd/internal/obj/arm.s                                                   66745    66687    -58     -0.087% 
cmd/compile/internal/syntax.s                                            145406   145062   -344    -0.237% 
cmd/internal/obj/wasm.s                                                  44049    44027    -22     -0.050% 
net.s                                                                    291835   291020   -815    -0.279% 
cmd/dist.s                                                               209020   208807   -213    -0.102% 
cmd/cgo.s                                                                241564   241102   -462    -0.191% 
vendor/golang.org/x/net/http/httpproxy.s                                 9407     9399     -8      -0.085% 
log/syslog.s                                                             7921     7909     -12     -0.151% 
go/types.s                                                               319325   317513   -1812   -0.567% 
vendor/golang.org/x/net/http/httpguts.s                                  3834     3825     -9      -0.235% 
mime/multipart.s                                                         21414    21343    -71     -0.332% 
cmd/internal/obj/ppc64.s                                                 119949   119938   -11     -0.009% 
cmd/compile/internal/logopt.s                                            10158    10118    -40     -0.394% 
vendor/golang.org/x/net/nettest.s                                        28012    27991    -21     -0.075% 
go/internal/srcimporter.s                                                6405     6380     -25     -0.390% 
go/internal/gcimporter.s                                                 34525    34493    -32     -0.093% 
net/mail.s                                                               23937    23720    -217    -0.907% 
go/internal/gccgoimporter.s                                              56095    56038    -57     -0.102% 
cmd/compile/internal/types.s                                             47247    47207    -40     -0.085% 
cmd/api.s                                                                39582    39558    -24     -0.061% 
cmd/go/internal/base.s                                                   12572    12551    -21     -0.167% 
cmd/vendor/golang.org/x/xerrors.s                                        17846    17814    -32     -0.179% 
cmd/vendor/golang.org/x/mod/sumdb/note.s                                 18142    18070    -72     -0.397% 
cmd/go/internal/search.s                                                 19994    19876    -118    -0.590% 
cmd/go/internal/imports.s                                                16457    16428    -29     -0.176% 
cmd/vendor/golang.org/x/mod/module.s                                     17838    17759    -79     -0.443% 
cmd/go/internal/cache.s                                                  30551    30514    -37     -0.121% 
cmd/vendor/golang.org/x/mod/sumdb/tlog.s                                 36356    36321    -35     -0.096% 
cmd/internal/test2json.s                                                 9452     9408     -44     -0.466% 
cmd/go/internal/mvs.s                                                    25136    25092    -44     -0.175% 
cmd/go/internal/txtar.s                                                  3488     3461     -27     -0.774% 
cmd/vendor/golang.org/x/mod/zip.s                                        18811    18800    -11     -0.058% 
cmd/go/internal/version.s                                                11213    11171    -42     -0.375% 
cmd/link/internal/benchmark.s                                            4941     4949     +8      +0.162% 
cmd/internal/obj/s390x.s                                                 126865   126849   -16     -0.013% 
cmd/gofmt.s                                                              30684    30596    -88     -0.287% 
cmd/fix.s                                                                87450    86906    -544    -0.622% 
cmd/internal/obj/x86.s                                                   88578    88556    -22     -0.025% 
cmd/vendor/golang.org/x/mod/modfile.s                                    72450    72363    -87     -0.120% 
cmd/oldlink/internal/loader.s                                            16743    16741    -2      -0.012% 
cmd/pack.s                                                               14863    14861    -2      -0.013% 
cmd/go/internal/load.s                                                   106742   106568   -174    -0.163% 
cmd/oldlink/internal/objfile.s                                           21787    21780    -7      -0.032% 
cmd/oldlink/internal/loadmacho.s                                         29309    29317    +8      +0.027% 
cmd/oldlink/internal/loadelf.s                                           35013    35021    +8      +0.023% 
cmd/asm/internal/asm.s                                                   68550    68538    -12     -0.018% 
cmd/link/internal/loader.s                                               94765    94564    -201    -0.212% 
cmd/link/internal/loadelf.s                                              35663    35667    +4      +0.011% 
cmd/link/internal/loadmacho.s                                            29501    29509    +8      +0.027% 
cmd/vendor/golang.org/x/tools/go/analysis.s                              4983     4976     -7      -0.140% 
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s       16771    16709    -62     -0.370% 
cmd/vendor/golang.org/x/tools/go/types/objectpath.s                      18481    18456    -25     -0.135% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2100     2085     -15     -0.714% 
cmd/vendor/github.com/google/pprof/profile.s                             150141   149620   -521    -0.347% 
cmd/vendor/github.com/google/pprof/internal/measurement.s                10420    10404    -16     -0.154% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s               36814    36755    -59     -0.160% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s                 6688     6673     -15     -0.224% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s               9856     9784     -72     -0.731% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/composite.s             3011     2979     -32     -1.063% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s              9737     9682     -55     -0.565% 
cmd/vendor/golang.org/x/tools/go/cfg.s                                   30738    30725    -13     -0.042% 
cmd/vendor/github.com/ianlancetaylor/demangle.s                          175195   174513   -682    -0.389% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/httpresponse.s          3625     3520     -105    -2.897% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s           2987     2971     -16     -0.536% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s                 4372     4340     -32     -0.732% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s            8634     8611     -23     -0.266% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s                 6189     6164     -25     -0.404% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s             8089     8073     -16     -0.198% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unsafeptr.s             2208     2177     -31     -1.404% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s           8050     8047     -3      -0.037% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s          3665     3629     -36     -0.982% 
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                           65773    65680    -93     -0.141% 
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s                  13328    13286    -42     -0.315% 
cmd/vendor/golang.org/x/tools/go/types/typeutil.s                        12263    12162    -101    -0.824% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s              1459     1421     -38     -2.605% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s              5208     5191     -17     -0.326% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unmarshal.s             1801     1782     -19     -1.055% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s            9569     9528     -41     -0.428% 
cmd/go/internal/work.s                                                   304928   304756   -172    -0.056% 
crypto/x509.s                                                            147340   147139   -201    -0.136% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s                34287    34019    -268    -0.782% 
crypto/tls.s                                                             311603   310644   -959    -0.308% 
cmd/oldlink/internal/ld.s                                                533115   532651   -464    -0.087% 
cmd/oldlink/internal/wasm.s                                              16484    16458    -26     -0.158% 
cmd/oldlink/internal/x86.s                                               18832    18830    -2      -0.011% 
cmd/link/internal/ld.s                                                   548200   547626   -574    -0.105% 
cmd/link/internal/wasm.s                                                 16760    16734    -26     -0.155% 
cmd/link/internal/arm64.s                                                20850    20840    -10     -0.048% 
cmd/link/internal/x86.s                                                  17437    17435    -2      -0.011% 
net/http.s                                                               556647   555519   -1128   -0.203% 
net/http/cookiejar.s                                                     15849    15833    -16     -0.101% 
expvar.s                                                                 9521     9508     -13     -0.137% 
net/http/httptest.s                                                      16471    16452    -19     -0.115% 
cmd/vendor/github.com/google/pprof/internal/plugin.s                     4266     4264     -2      -0.047% 
net/http/cgi.s                                                           23448    23428    -20     -0.085% 
cmd/go/internal/web.s                                                    16472    16428    -44     -0.267% 
net/http/httputil.s                                                      39672    39670    -2      -0.005% 
net/rpc.s                                                                33989    33965    -24     -0.071% 
net/http/fcgi.s                                                          19167    19162    -5      -0.026% 
cmd/vendor/github.com/google/pprof/internal/symbolz.s                    5861     5857     -4      -0.068% 
cmd/vendor/github.com/google/pprof/internal/binutils.s                   35842    35823    -19     -0.053% 
cmd/vendor/github.com/google/pprof/internal/symbolizer.s                 11449    11404    -45     -0.393% 
cmd/go/internal/get.s                                                    62726    62582    -144    -0.230% 
cmd/vendor/github.com/google/pprof/internal/report.s                     80032    80022    -10     -0.012% 
cmd/go/internal/modfetch/codehost.s                                      89005    88871    -134    -0.151% 
cmd/trace.s                                                              116607   116496   -111    -0.095% 
cmd/vendor/github.com/google/pprof/internal/driver.s                     143234   143207   -27     -0.019% 
cmd/vendor/github.com/google/pprof/driver.s                              9000     8998     -2      -0.022% 
cmd/go/internal/modfetch.s                                               126300   125726   -574    -0.454% 
cmd/pprof.s                                                              12317    12312    -5      -0.041% 
cmd/go/internal/modconv.s                                                17878    17861    -17     -0.095% 
cmd/go/internal/modload.s                                                150261   149763   -498    -0.331% 
cmd/go/internal/clean.s                                                  11122    11091    -31     -0.279% 
cmd/go/internal/help.s                                                   6523     6521     -2      -0.031% 
cmd/go/internal/generate.s                                               11627    11614    -13     -0.112% 
cmd/go/internal/envcmd.s                                                 22034    21986    -48     -0.218% 
cmd/go/internal/modget.s                                                 38478    38398    -80     -0.208% 
cmd/go/internal/modcmd.s                                                 46430    46229    -201    -0.433% 
cmd/go/internal/test.s                                                   64399    64374    -25     -0.039% 
cmd/compile/internal/ssa.s                                               3615264  3608276  -6988   -0.193% 
cmd/compile/internal/gc.s                                                1538865  1537625  -1240   -0.081% 
cmd/compile/internal/amd64.s                                             33593    33574    -19     -0.057% 
cmd/compile/internal/x86.s                                               30871    30852    -19     -0.062% 
total                                                                    19343565 19311284 -32281  -0.167% 

Change-Id: Ib030eb79458827a5a5b6d0d2f98765f8325a4d7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/222923
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-08 22:13:38 +00:00
Ruixin(Peter) Bao
b2790a2838 cmd/compile: allow floating point Ops to produce flags on s390x
On s390x, some floating point arithmetic instructions (FSUB, FADD)  generate flag.
This patch allows those related SSA ops to return a tuple, where the second argument of
the tuple is the generated flag. We can use the flag and remove the
subsequent comparison instruction (e.g: LTDBR).

This CL also reduces the .text section for math.test binary by 0.4KB.

Benchmarks:
name                    old time/op  new time/op  delta
Acos-18                 12.1ns ± 0%  12.1ns ± 0%     ~     (all equal)
Acosh-18                18.5ns ± 0%  18.5ns ± 0%     ~     (all equal)
Asin-18                 13.1ns ± 0%  13.1ns ± 0%     ~     (all equal)
Asinh-18                19.4ns ± 0%  19.5ns ± 1%     ~     (p=0.444 n=5+5)
Atan-18                 10.0ns ± 0%  10.0ns ± 0%     ~     (all equal)
Atanh-18                19.1ns ± 1%  19.2ns ± 2%     ~     (p=0.841 n=5+5)
Atan2-18                16.4ns ± 0%  16.4ns ± 0%     ~     (all equal)
Cbrt-18                 14.8ns ± 0%  14.8ns ± 0%     ~     (all equal)
Ceil-18                 0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Copysign-18             0.80ns ± 0%  0.80ns ± 0%     ~     (all equal)
Cos-18                  7.19ns ± 0%  7.19ns ± 0%     ~     (p=0.556 n=4+5)
Cosh-18                 12.4ns ± 0%  12.4ns ± 0%     ~     (all equal)
Erf-18                  10.8ns ± 0%  10.8ns ± 0%     ~     (all equal)
Erfc-18                 11.0ns ± 0%  11.0ns ± 0%     ~     (all equal)
Erfinv-18               23.0ns ±16%  26.8ns ± 1%  +16.90%  (p=0.008 n=5+5)
Erfcinv-18              23.3ns ±15%  26.1ns ± 7%     ~     (p=0.087 n=5+5)
Exp-18                  8.67ns ± 0%  8.67ns ± 0%     ~     (p=1.000 n=4+4)
ExpGo-18                50.8ns ± 3%  52.4ns ± 2%     ~     (p=0.063 n=5+5)
Expm1-18                9.49ns ± 1%  9.47ns ± 0%     ~     (p=1.000 n=5+5)
Exp2-18                 52.7ns ± 1%  50.5ns ± 3%   -4.10%  (p=0.024 n=5+5)
Exp2Go-18               50.6ns ± 1%  48.4ns ± 3%   -4.39%  (p=0.008 n=5+5)
Abs-18                  0.67ns ± 0%  0.67ns ± 0%     ~     (p=0.444 n=5+5)
Dim-18                  1.02ns ± 0%  1.03ns ± 0%   +0.98%  (p=0.008 n=5+5)
Floor-18                0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Max-18                  3.09ns ± 1%  3.05ns ± 0%   -1.42%  (p=0.008 n=5+5)
Min-18                  3.32ns ± 1%  3.30ns ± 0%   -0.72%  (p=0.016 n=5+4)
Mod-18                  62.3ns ± 1%  65.8ns ± 3%   +5.55%  (p=0.008 n=5+5)
Frexp-18                5.05ns ± 2%  4.98ns ± 0%     ~     (p=0.683 n=5+5)
Gamma-18                24.4ns ± 0%  24.1ns ± 0%   -1.23%  (p=0.008 n=5+5)
Hypot-18                10.3ns ± 0%  10.3ns ± 0%     ~     (all equal)
HypotGo-18              10.2ns ± 0%  10.2ns ± 0%     ~     (all equal)
Ilogb-18                3.56ns ± 1%  3.54ns ± 0%     ~     (p=0.595 n=5+5)
J0-18                    113ns ± 0%   108ns ± 1%   -4.42%  (p=0.016 n=4+5)
J1-18                    115ns ± 0%   109ns ± 1%   -4.87%  (p=0.016 n=4+5)
Jn-18                    240ns ± 0%   230ns ± 2%   -4.41%  (p=0.008 n=5+5)
Ldexp-18                6.19ns ± 0%  6.19ns ± 0%     ~     (p=0.444 n=5+5)
Lgamma-18               32.2ns ± 0%  32.2ns ± 0%     ~     (all equal)
Log-18                  13.1ns ± 0%  13.1ns ± 0%     ~     (all equal)
Logb-18                 4.23ns ± 0%  4.22ns ± 0%     ~     (p=0.444 n=5+5)
Log1p-18                12.7ns ± 0%  12.7ns ± 0%     ~     (all equal)
Log10-18                18.1ns ± 0%  18.2ns ± 0%     ~     (p=0.167 n=5+5)
Log2-18                 14.0ns ± 0%  14.0ns ± 0%     ~     (all equal)
Modf-18                 10.4ns ± 0%  10.5ns ± 0%   +0.96%  (p=0.016 n=4+5)
Nextafter32-18          11.3ns ± 0%  11.3ns ± 0%     ~     (all equal)
Nextafter64-18          4.01ns ± 1%  3.97ns ± 0%     ~     (p=0.333 n=5+4)
PowInt-18               32.7ns ± 0%  32.7ns ± 0%     ~     (all equal)
PowFrac-18              33.2ns ± 0%  33.1ns ± 0%     ~     (p=0.095 n=4+5)
Pow10Pos-18             1.58ns ± 0%  1.58ns ± 0%     ~     (all equal)
Pow10Neg-18             5.81ns ± 0%  5.81ns ± 0%     ~     (all equal)
Round-18                0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
RoundToEven-18          0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Remainder-18            40.6ns ± 0%  40.7ns ± 0%     ~     (p=0.238 n=5+4)
Signbit-18              1.57ns ± 0%  1.57ns ± 0%     ~     (all equal)
Sin-18                  6.75ns ± 0%  6.74ns ± 0%     ~     (p=0.333 n=5+4)
Sincos-18               29.5ns ± 0%  29.5ns ± 0%     ~     (all equal)
Sinh-18                 14.4ns ± 0%  14.4ns ± 0%     ~     (all equal)
SqrtIndirect-18         3.97ns ± 0%  4.15ns ± 0%   +4.59%  (p=0.008 n=5+5)
SqrtLatency-18          8.01ns ± 0%  8.01ns ± 0%     ~     (all equal)
SqrtIndirectLatency-18  11.6ns ± 0%  11.6ns ± 0%     ~     (all equal)
SqrtGoLatency-18        44.7ns ± 0%  45.0ns ± 0%   +0.67%  (p=0.008 n=5+5)
SqrtPrime-18            1.26µs ± 0%  1.27µs ± 0%   +0.63%  (p=0.029 n=4+4)
Tan-18                  11.1ns ± 0%  11.1ns ± 0%     ~     (all equal)
Tanh-18                 15.8ns ± 0%  15.8ns ± 0%     ~     (all equal)
Trunc-18                0.78ns ± 0%  0.78ns ± 0%     ~     (all equal)
Y0-18                    113ns ± 2%   108ns ± 3%   -5.11%  (p=0.008 n=5+5)
Y1-18                    112ns ± 3%   107ns ± 0%   -4.29%  (p=0.000 n=5+4)
Yn-18                    229ns ± 0%   220ns ± 1%   -3.76%  (p=0.016 n=4+5)
Float64bits-18          1.09ns ± 0%  1.09ns ± 0%     ~     (all equal)
Float64frombits-18      0.55ns ± 0%  0.55ns ± 0%     ~     (all equal)
Float32bits-18          0.96ns ±16%  0.86ns ± 0%     ~     (p=0.563 n=5+5)
Float32frombits-18      1.03ns ±28%  0.84ns ± 0%     ~     (p=0.167 n=5+5)
FMA-18                  1.60ns ± 0%  1.60ns ± 0%     ~     (all equal)
[Geo mean]              10.0ns        9.9ns        -0.41%
Change-Id: Ief7e63ea5a8ba404b0a4696e12b9b7e0b05a9a03
Reviewed-on: https://go-review.googlesource.com/c/go/+/209160
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-08 20:57:58 +00:00
Michael Munday
bfd569fcb0 cmd/compile: delete the floating point Greater and Geq ops
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.

Fixes #37316.

Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-07 19:55:05 +00:00
Lynn Boger
815509ae31 cmd/compile: improve lowered moves and zeros for ppc64le
This change includes the following:
- Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9.
These instructions do not require an index register, which
allows more loads and stores within a loop without initializing
multiple index registers. The LoweredQuadXXX generate LXV/STXV.
- Create LoweredMoveXXXShort and LoweredZeroXXXShort for short
moves that don't generate loops, and therefore don't clobber the
address registers or flags.
- Use registers other than R3 and R4 to avoid conflicting with
registers that have already been allocated to avoid unnecessary
register moves.
- Eliminate the use of R14 as scratch register and use R31
instead.
- Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a
loop with more than 3 iterations.

This performance opportunity was noticed in github.com/golang/snappy
benchmarks. Results on power9:

WordsDecode1e1    54.1ns ± 0%    53.8ns ± 0%   -0.51%  (p=0.029 n=4+4)
WordsDecode1e2     287ns ± 0%     282ns ± 1%   -1.83%  (p=0.029 n=4+4)
WordsDecode1e3    3.98µs ± 0%    3.64µs ± 0%   -8.52%  (p=0.029 n=4+4)
WordsDecode1e4    66.9µs ± 0%    67.0µs ± 0%   +0.20%  (p=0.029 n=4+4)
WordsDecode1e5     723µs ± 0%     723µs ± 0%   -0.01%  (p=0.200 n=4+4)
WordsDecode1e6    7.21ms ± 0%    7.21ms ± 0%   -0.02%  (p=1.000 n=4+4)
WordsEncode1e1    29.9ns ± 0%    29.4ns ± 0%   -1.51%  (p=0.029 n=4+4)
WordsEncode1e2    2.12µs ± 0%    1.75µs ± 0%  -17.70%  (p=0.029 n=4+4)
WordsEncode1e3    11.7µs ± 0%    11.2µs ± 0%   -4.61%  (p=0.029 n=4+4)
WordsEncode1e4     119µs ± 0%     120µs ± 0%   +0.36%  (p=0.029 n=4+4)
WordsEncode1e5    1.21ms ± 0%    1.22ms ± 0%   +0.41%  (p=0.029 n=4+4)
WordsEncode1e6    12.0ms ± 0%    12.0ms ± 0%   +0.57%  (p=0.029 n=4+4)
RandomEncode       286µs ± 0%     203µs ± 0%  -28.82%  (p=0.029 n=4+4)
ExtendMatch       47.4µs ± 0%    47.0µs ± 0%   -0.85%  (p=0.029 n=4+4)

Change-Id: Iecad3a39ae55280286e42760a5c9d5c1168f5858
Reviewed-on: https://go-review.googlesource.com/c/go/+/226539
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-06 12:09:39 +00:00
Josh Bleecher Snyder
fff7509d47 cmd/compile: add intrinsic HasCPUFeature for checking cpu features
Before using some CPU instructions, we must check for their presence.
We use global variables in the runtime package to record features.

Prior to this CL, we issued a regular memory load for these features.
The downside to this is that, because it is a regular memory load,
it cannot be hoisted out of loops or otherwise reordered with other loads.

This CL introduces a new intrinsic just for checking cpu features.
It still ends up resulting in a memory load, but that memory load can
now be floated to the entry block and rematerialized as needed.

One downside is that the regular load could be combined with the comparison
into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE.
(It is possible that MOVBQZX+TESTQ+NE would be better.)

This CL does only amd64. It is easy to extend to other architectures.

For the benchmark in #36196, on my machine, this offers a mild speedup.

name      old time/op  new time/op  delta
FMA-8     1.39ns ± 6%  1.29ns ± 9%  -7.19%  (p=0.000 n=97+96)
NonFMA-8  2.03ns ±11%  2.04ns ±12%    ~     (p=0.618 n=99+98)

Updates #15808
Updates #36196

Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/212360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2020-04-04 01:01:04 +00:00
Keith Randall
bba88467f8 cmd/compile: add indexed-load CMP instructions
Things like CMPQ 4(AX)(BX*8), CX

Fixes #37955

Change-Id: Icbed430f65c91a0e3f38a633d8321d79433ad8b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/224219
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-04-01 17:03:26 +00:00
Josh Bleecher Snyder
8114242359 cmd/compile, runtime: use more registers for amd64 write barrier calls
The compiler-inserted write barrier calls use a special ABI
for speed and to minimize the binary size impact.

runtime.gcWriteBarrier takes its args in DI and AX.
This change adds gcWriteBarrier wrapper functions,
varying only in the register used for the second argument.
(Allowing variation in the first argument doesn't offer improvements,
which is convenient, as it avoids quadratic API growth.)
This reduces the number of register copies.

The goals are reduced binary size via reduced register pressure/copies.

One downside to this change is that when the write barrier is on,
we may bounce through several different write barrier wrappers,
which is bad for the instruction cache.

Package runtime write barrier benchmarks for this change:

name                old time/op  new time/op  delta
WriteBarrier-8      16.6ns ± 6%  15.6ns ± 6%  -5.73%  (p=0.000 n=97+99)
BulkWriteBarrier-8  4.37ns ± 7%  4.22ns ± 8%  -3.45%  (p=0.000 n=96+99)

However, I don't particularly trust these numbers.
I ran runtime.BenchmarkWriteBarrier multiple times as I rebased
this change, and noticed that the results have high variance
depending on the parent change, perhaps due to aligment.

This change was stress tested with GOGC=1 GODEBUG=gccheckmark=1 go test std.

This change reduces binary sizes:

file      before    after     Δ       %
addr2line 4308720   4296688   -12032  -0.279%
api       5965592   5945368   -20224  -0.339%
asm       5148088   5025464   -122624 -2.382%
buildid   2848760   2844904   -3856   -0.135%
cgo       4828968   4812840   -16128  -0.334%
compile   19754720  19529744  -224976 -1.139%
cover     5256840   5236600   -20240  -0.385%
dist      3670312   3658264   -12048  -0.328%
doc       4669608   4657576   -12032  -0.258%
fix       3377976   3365944   -12032  -0.356%
link      6614888   6586472   -28416  -0.430%
nm        4258368   4254528   -3840   -0.090%
objdump   4656336   4644304   -12032  -0.258%
pack      2295176   2295432   +256    +0.011%
pprof     14762356  14709364  -52992  -0.359%
test2json 2824456   2820600   -3856   -0.137%
trace     11684404  11643700  -40704  -0.348%
vet       8284760   8252248   -32512  -0.392%
total     115210328 114580040 -630288 -0.547%

This change improves compiler performance:

name        old time/op       new time/op       delta
Template          208ms ± 3%        207ms ± 3%  -0.40%  (p=0.030 n=43+44)
Unicode          80.2ms ± 3%       81.3ms ± 3%  +1.25%  (p=0.000 n=41+44)
GoTypes           699ms ± 3%        694ms ± 2%  -0.71%  (p=0.016 n=42+37)
Compiler          3.26s ± 2%        3.23s ± 2%  -0.86%  (p=0.000 n=43+45)
SSA               6.97s ± 1%        6.93s ± 1%  -0.63%  (p=0.000 n=43+45)
Flate             134ms ± 3%        133ms ± 2%    ~     (p=0.139 n=45+42)
GoParser          165ms ± 2%        164ms ± 1%  -0.79%  (p=0.000 n=45+40)
Reflect           434ms ± 4%        435ms ± 4%    ~     (p=0.937 n=44+44)
Tar               181ms ± 2%        181ms ± 2%    ~     (p=0.702 n=43+45)
XML               244ms ± 2%        244ms ± 2%    ~     (p=0.237 n=45+44)
[Geo mean]        403ms             402ms       -0.29%

name        old user-time/op  new user-time/op  delta
Template          271ms ± 2%        268ms ± 1%  -1.40%  (p=0.000 n=42+42)
Unicode           117ms ± 3%        116ms ± 5%    ~     (p=0.066 n=45+45)
GoTypes           948ms ± 2%        936ms ± 2%  -1.30%  (p=0.000 n=41+40)
Compiler          4.26s ± 1%        4.21s ± 2%  -1.25%  (p=0.000 n=37+45)
SSA               9.52s ± 2%        9.41s ± 1%  -1.18%  (p=0.000 n=44+45)
Flate             167ms ± 2%        165ms ± 2%  -1.15%  (p=0.000 n=44+41)
GoParser          201ms ± 2%        198ms ± 1%  -1.40%  (p=0.000 n=43+43)
Reflect           563ms ± 8%        560ms ± 7%    ~     (p=0.206 n=45+44)
Tar               224ms ± 2%        222ms ± 2%  -0.81%  (p=0.000 n=45+45)
XML               308ms ± 2%        304ms ± 1%  -1.17%  (p=0.000 n=42+43)
[Geo mean]        525ms             519ms       -1.08%

name        old alloc/op      new alloc/op      delta
Template         36.3MB ± 0%       36.3MB ± 0%    ~     (p=0.421 n=5+5)
Unicode          28.4MB ± 0%       28.3MB ± 0%    ~     (p=0.056 n=5+5)
GoTypes           121MB ± 0%        121MB ± 0%  -0.14%  (p=0.008 n=5+5)
Compiler          567MB ± 0%        567MB ± 0%  -0.06%  (p=0.016 n=4+5)
SSA              1.26GB ± 0%       1.26GB ± 0%  -0.07%  (p=0.008 n=5+5)
Flate            22.9MB ± 0%       22.8MB ± 0%    ~     (p=0.310 n=5+5)
GoParser         28.0MB ± 0%       27.9MB ± 0%  -0.09%  (p=0.008 n=5+5)
Reflect          78.4MB ± 0%       78.4MB ± 0%  -0.03%  (p=0.008 n=5+5)
Tar              34.2MB ± 0%       34.2MB ± 0%  -0.05%  (p=0.008 n=5+5)
XML              44.4MB ± 0%       44.4MB ± 0%  -0.04%  (p=0.016 n=5+5)
[Geo mean]       76.4MB            76.3MB       -0.05%

name        old allocs/op     new allocs/op     delta
Template           356k ± 0%         356k ± 0%  -0.13%  (p=0.008 n=5+5)
Unicode            326k ± 0%         326k ± 0%  -0.07%  (p=0.008 n=5+5)
GoTypes           1.24M ± 0%        1.24M ± 0%  -0.24%  (p=0.008 n=5+5)
Compiler          5.30M ± 0%        5.28M ± 0%  -0.34%  (p=0.008 n=5+5)
SSA               11.9M ± 0%        11.9M ± 0%  -0.16%  (p=0.008 n=5+5)
Flate              226k ± 0%         225k ± 0%  -0.12%  (p=0.008 n=5+5)
GoParser           287k ± 0%         286k ± 0%  -0.29%  (p=0.008 n=5+5)
Reflect            930k ± 0%         929k ± 0%  -0.05%  (p=0.008 n=5+5)
Tar                332k ± 0%         331k ± 0%  -0.12%  (p=0.008 n=5+5)
XML                411k ± 0%         411k ± 0%  -0.12%  (p=0.008 n=5+5)
[Geo mean]         771k              770k       -0.16%

For some packages, this change significantly reduces the size of executable text.
Examples:

file                                   before   after    Δ       %
cmd/internal/obj/arm.s                 68658    66855    -1803   -2.626%
cmd/internal/obj/mips.s                57486    56272    -1214   -2.112%
cmd/internal/obj/arm64.s               152107   147163   -4944   -3.250%
cmd/internal/obj/ppc64.s               125544   120456   -5088   -4.053%
cmd/vendor/golang.org/x/tools/go/cfg.s 31699    30742    -957    -3.019%

Full listing:

file                                                                     before   after    Δ       %
container/ring.s                                                         1890     1870     -20     -1.058%
container/list.s                                                         5366     5390     +24     +0.447%
internal/cpu.s                                                           3298     3295     -3      -0.091%
internal/testlog.s                                                       1507     1501     -6      -0.398%
image/color.s                                                            8281     8248     -33     -0.399%
runtime.s                                                                480970   480075   -895    -0.186%
sync.s                                                                   16497    16408    -89     -0.539%
internal/singleflight.s                                                  2591     2577     -14     -0.540%
math/rand.s                                                              10456    10438    -18     -0.172%
cmd/go/internal/par.s                                                    2801     2790     -11     -0.393%
internal/reflectlite.s                                                   28477    28417    -60     -0.211%
errors.s                                                                 2750     2736     -14     -0.509%
internal/oserror.s                                                       446      434      -12     -2.691%
sort.s                                                                   17061    17046    -15     -0.088%
io.s                                                                     17063    16999    -64     -0.375%
vendor/golang.org/x/crypto/hkdf.s                                        1962     1936     -26     -1.325%
text/tabwriter.s                                                         9617     9574     -43     -0.447%
hash/crc64.s                                                             3414     3408     -6      -0.176%
hash/crc32.s                                                             6657     6651     -6      -0.090%
bytes.s                                                                  31932    31863    -69     -0.216%
strconv.s                                                                53158    52799    -359    -0.675%
strings.s                                                                42829    42665    -164    -0.383%
encoding/ascii85.s                                                       4833     4791     -42     -0.869%
vendor/golang.org/x/text/transform.s                                     16810    16724    -86     -0.512%
path.s                                                                   6848     6845     -3      -0.044%
encoding/base32.s                                                        9658     9592     -66     -0.683%
bufio.s                                                                  23051    22908    -143    -0.620%
compress/bzip2.s                                                         11773    11764    -9      -0.076%
image.s                                                                  37565    37502    -63     -0.168%
syscall.s                                                                82359    82279    -80     -0.097%
regexp/syntax.s                                                          83573    82930    -643    -0.769%
image/jpeg.s                                                             36535    36490    -45     -0.123%
regexp.s                                                                 64396    64214    -182    -0.283%
time.s                                                                   82724    82622    -102    -0.123%
plugin.s                                                                 6539     6536     -3      -0.046%
context.s                                                                10959    10865    -94     -0.858%
internal/poll.s                                                          24286    24270    -16     -0.066%
reflect.s                                                                168304   167927   -377    -0.224%
internal/fmtsort.s                                                       7416     7376     -40     -0.539%
os.s                                                                     52465    51787    -678    -1.292%
cmd/go/internal/lockedfile/internal/filelock.s                           2326     2317     -9      -0.387%
os/signal.s                                                              4657     4648     -9      -0.193%
runtime/debug.s                                                          6040     5998     -42     -0.695%
encoding/binary.s                                                        30838    30801    -37     -0.120%
vendor/golang.org/x/net/route.s                                          23694    23491    -203    -0.857%
path/filepath.s                                                          17895    17889    -6      -0.034%
cmd/vendor/golang.org/x/sys/unix.s                                       78125    78109    -16     -0.020%
io/ioutil.s                                                              6999     6996     -3      -0.043%
encoding/base64.s                                                        12094    12007    -87     -0.719%
crypto/cipher.s                                                          20466    20372    -94     -0.459%
cmd/go/internal/robustio.s                                               2672     2669     -3      -0.112%
encoding/pem.s                                                           9302     9286     -16     -0.172%
internal/obscuretestdata.s                                               1719     1695     -24     -1.396%
crypto/aes.s                                                             11014    11002    -12     -0.109%
os/exec.s                                                                29388    29231    -157    -0.534%
cmd/internal/browser.s                                                   2266     2260     -6      -0.265%
internal/goroot.s                                                        4601     4592     -9      -0.196%
vendor/golang.org/x/crypto/chacha20poly1305.s                            8945     8942     -3      -0.034%
cmd/vendor/golang.org/x/crypto/ssh/terminal.s                            27226    27195    -31     -0.114%
index/suffixarray.s                                                      36431    36411    -20     -0.055%
fmt.s                                                                    77017    76709    -308    -0.400%
encoding/hex.s                                                           6241     6154     -87     -1.394%
compress/lzw.s                                                           7133     7069     -64     -0.897%
database/sql/driver.s                                                    18888    18877    -11     -0.058%
net/url.s                                                                29838    29739    -99     -0.332%
debug/plan9obj.s                                                         8329     8279     -50     -0.600%
encoding/csv.s                                                           12986    12902    -84     -0.647%
debug/gosym.s                                                            25403    25330    -73     -0.287%
compress/flate.s                                                         51192    50970    -222    -0.434%
vendor/golang.org/x/net/dns/dnsmessage.s                                 86769    86208    -561    -0.647%
compress/gzip.s                                                          9791     9758     -33     -0.337%
compress/zlib.s                                                          7310     7277     -33     -0.451%
archive/zip.s                                                            42356    42166    -190    -0.449%
debug/dwarf.s                                                            108259   107730   -529    -0.489%
encoding/json.s                                                          106378   105910   -468    -0.440%
os/user.s                                                                14751    14724    -27     -0.183%
database/sql.s                                                           99011    98404    -607    -0.613%
log.s                                                                    9466     9423     -43     -0.454%
debug/pe.s                                                               31272    31182    -90     -0.288%
debug/macho.s                                                            32764    32608    -156    -0.476%
encoding/gob.s                                                           136976   136517   -459    -0.335%
vendor/golang.org/x/text/unicode/bidi.s                                  27318    27276    -42     -0.154%
archive/tar.s                                                            71416    70975    -441    -0.618%
vendor/golang.org/x/net/http2/hpack.s                                    23892    23848    -44     -0.184%
vendor/golang.org/x/text/secure/bidirule.s                               3354     3351     -3      -0.089%
mime/quotedprintable.s                                                   5960     5925     -35     -0.587%
net/http/internal.s                                                      5874     5853     -21     -0.358%
math/big.s                                                               184147   183692   -455    -0.247%
debug/elf.s                                                              63775    63567    -208    -0.326%
mime.s                                                                   39802    39709    -93     -0.234%
encoding/xml.s                                                           111038   110713   -325    -0.293%
crypto/dsa.s                                                             6044     6029     -15     -0.248%
go/token.s                                                               12139    12077    -62     -0.511%
crypto/rand.s                                                            6889     6866     -23     -0.334%
go/scanner.s                                                             19030    19008    -22     -0.116%
flag.s                                                                   22320    22236    -84     -0.376%
vendor/golang.org/x/text/unicode/norm.s                                  66652    66391    -261    -0.392%
crypto/rsa.s                                                             31671    31650    -21     -0.066%
crypto/elliptic.s                                                        51553    51403    -150    -0.291%
internal/xcoff.s                                                         22950    22822    -128    -0.558%
go/constant.s                                                            43750    43689    -61     -0.139%
encoding/asn1.s                                                          57086    57035    -51     -0.089%
runtime/trace.s                                                          2609     2603     -6      -0.230%
crypto/x509/pkix.s                                                       10458    10471    +13     +0.124%
image/gif.s                                                              27544    27385    -159    -0.577%
vendor/golang.org/x/net/idna.s                                           24558    24502    -56     -0.228%
image/png.s                                                              42775    42685    -90     -0.210%
vendor/golang.org/x/crypto/cryptobyte.s                                  33616    33493    -123    -0.366%
go/ast.s                                                                 80684    80449    -235    -0.291%
net/internal/socktest.s                                                  16571    16535    -36     -0.217%
crypto/ecdsa.s                                                           11948    11936    -12     -0.100%
text/template/parse.s                                                    95138    94002    -1136   -1.194%
runtime/pprof.s                                                          59702    59639    -63     -0.106%
testing.s                                                                68427    68088    -339    -0.495%
internal/testenv.s                                                       5620     5596     -24     -0.427%
testing/internal/testdeps.s                                              3312     3294     -18     -0.543%
internal/trace.s                                                         78473    78239    -234    -0.298%
testing/iotest.s                                                         4968     4908     -60     -1.208%
os/signal/internal/pty.s                                                 3011     2990     -21     -0.697%
testing/quick.s                                                          12179    12125    -54     -0.443%
cmd/internal/bio.s                                                       9286     9274     -12     -0.129%
cmd/internal/src.s                                                       17684    17663    -21     -0.119%
cmd/internal/goobj2.s                                                    12588    12558    -30     -0.238%
cmd/internal/objabi.s                                                    16408    16390    -18     -0.110%
go/printer.s                                                             77417    77308    -109    -0.141%
go/parser.s                                                              80045    79113    -932    -1.164%
go/format.s                                                              5434     5419     -15     -0.276%
cmd/internal/goobj.s                                                     26146    25954    -192    -0.734%
runtime/pprof/internal/profile.s                                         102518   102178   -340    -0.332%
text/template.s                                                          95343    94935    -408    -0.428%
cmd/internal/dwarf.s                                                     31718    31572    -146    -0.460%
cmd/vendor/golang.org/x/arch/arm/armasm.s                                45240    45151    -89     -0.197%
internal/lazytemplate.s                                                  1470     1457     -13     -0.884%
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s                            37253    37220    -33     -0.089%
cmd/asm/internal/flags.s                                                 2593     2590     -3      -0.116%
cmd/asm/internal/lex.s                                                   25068    24921    -147    -0.586%
cmd/internal/buildid.s                                                   18536    18263    -273    -1.473%
cmd/vendor/golang.org/x/arch/x86/x86asm.s                                80209    80105    -104    -0.130%
go/doc.s                                                                 75140    74585    -555    -0.739%
cmd/internal/edit.s                                                      3893     3899     +6      +0.154%
html/template.s                                                          89377    88809    -568    -0.636%
cmd/vendor/golang.org/x/arch/arm64/arm64asm.s                            117998   117824   -174    -0.147%
cmd/internal/obj.s                                                       115015   114290   -725    -0.630%
go/build.s                                                               69379    68862    -517    -0.745%
cmd/internal/objfile.s                                                   48106    47982    -124    -0.258%
cmd/cover.s                                                              46239    46113    -126    -0.272%
cmd/addr2line.s                                                          2845     2833     -12     -0.422%
cmd/internal/obj/arm.s                                                   68658    66855    -1803   -2.626%
cmd/internal/obj/mips.s                                                  57486    56272    -1214   -2.112%
cmd/internal/obj/riscv.s                                                 63834    63006    -828    -1.297%
cmd/compile/internal/syntax.s                                            146582   145456   -1126   -0.768%
cmd/internal/obj/wasm.s                                                  44117    44066    -51     -0.116%
cmd/cgo.s                                                                242645   241653   -992    -0.409%
cmd/internal/obj/arm64.s                                                 152107   147163   -4944   -3.250%
net.s                                                                    295972   292010   -3962   -1.339%
go/types.s                                                               321371   319432   -1939   -0.603%
vendor/golang.org/x/net/http/httpproxy.s                                 9450     9423     -27     -0.286%
net/textproto.s                                                          19455    19406    -49     -0.252%
cmd/internal/obj/ppc64.s                                                 125544   120456   -5088   -4.053%
go/internal/srcimporter.s                                                6475     6409     -66     -1.019%
log/syslog.s                                                             8017     7929     -88     -1.098%
cmd/compile/internal/logopt.s                                            10183    10162    -21     -0.206%
net/mail.s                                                               24085    23948    -137    -0.569%
mime/multipart.s                                                         21527    21420    -107    -0.497%
cmd/internal/obj/s390x.s                                                 127610   127757   +147    +0.115%
go/internal/gcimporter.s                                                 34913    34548    -365    -1.045%
vendor/golang.org/x/net/nettest.s                                        28103    28016    -87     -0.310%
cmd/go/internal/cfg.s                                                    9967     9916     -51     -0.512%
cmd/api.s                                                                39703    39603    -100    -0.252%
go/internal/gccgoimporter.s                                              56470    56120    -350    -0.620%
go/importer.s                                                            2077     2056     -21     -1.011%
cmd/compile/internal/types.s                                             48202    47282    -920    -1.909%
cmd/go/internal/str.s                                                    4341     4320     -21     -0.484%
cmd/internal/obj/x86.s                                                   89440    88625    -815    -0.911%
cmd/go/internal/base.s                                                   12667    12580    -87     -0.687%
cmd/go/internal/cache.s                                                  30754    30571    -183    -0.595%
cmd/doc.s                                                                62976    62755    -221    -0.351%
cmd/go/internal/search.s                                                 20114    19993    -121    -0.602%
cmd/vendor/golang.org/x/xerrors.s                                        17923    17855    -68     -0.379%
cmd/go/internal/lockedfile.s                                             16451    16415    -36     -0.219%
cmd/vendor/golang.org/x/mod/sumdb/note.s                                 18200    18150    -50     -0.275%
cmd/vendor/golang.org/x/mod/module.s                                     17869    17851    -18     -0.101%
cmd/asm/internal/arch.s                                                  37533    37482    -51     -0.136%
cmd/fix.s                                                                87728    87492    -236    -0.269%
cmd/vendor/golang.org/x/mod/sumdb/tlog.s                                 36394    36367    -27     -0.074%
cmd/vendor/golang.org/x/mod/sumdb/dirhash.s                              4990     4963     -27     -0.541%
cmd/go/internal/imports.s                                                16499    16469    -30     -0.182%
cmd/vendor/golang.org/x/mod/zip.s                                        18816    18745    -71     -0.377%
cmd/go/internal/cmdflag.s                                                5126     5123     -3      -0.059%
cmd/internal/test2json.s                                                 9540     9452     -88     -0.922%
cmd/go/internal/tool.s                                                   3629     3623     -6      -0.165%
cmd/go/internal/version.s                                                11232    11220    -12     -0.107%
cmd/go/internal/mvs.s                                                    25383    25179    -204    -0.804%
cmd/nm.s                                                                 5815     5803     -12     -0.206%
cmd/dist.s                                                               210146   209140   -1006   -0.479%
cmd/asm/internal/asm.s                                                   68655    68549    -106    -0.154%
cmd/vendor/golang.org/x/mod/modfile.s                                    72974    72510    -464    -0.636%
cmd/go/internal/load.s                                                   107548   106861   -687    -0.639%
cmd/link/internal/sym.s                                                  18708    18581    -127    -0.679%
cmd/asm.s                                                                3367     3343     -24     -0.713%
cmd/gofmt.s                                                              30795    30698    -97     -0.315%
cmd/link/internal/objfile.s                                              21828    21630    -198    -0.907%
cmd/pack.s                                                               14878    14869    -9      -0.060%
cmd/vendor/github.com/google/pprof/internal/elfexec.s                    6788     6782     -6      -0.088%
cmd/test2json.s                                                          1647     1641     -6      -0.364%
cmd/link/internal/loader.s                                               48677    48483    -194    -0.399%
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s       16783    16773    -10     -0.060%
cmd/link/internal/loadelf.s                                              35464    35126    -338    -0.953%
cmd/link/internal/loadmacho.s                                            29438    29180    -258    -0.876%
cmd/link/internal/loadpe.s                                               16440    16371    -69     -0.420%
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2106     2100     -6      -0.285%
cmd/link/internal/loadxcoff.s                                            11711    11615    -96     -0.820%
cmd/vendor/golang.org/x/tools/go/analysis/internal/facts.s               14954    14883    -71     -0.475%
cmd/vendor/golang.org/x/tools/go/ast/inspector.s                         5394     5374     -20     -0.371%
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s               37029    36822    -207    -0.559%
cmd/vendor/golang.org/x/tools/go/analysis/passes/inspect.s               340      337      -3      -0.882%
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s               9919     9858     -61     -0.615%
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s                 6705     6690     -15     -0.224%
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s              9783     9741     -42     -0.429%
cmd/vendor/golang.org/x/tools/go/cfg.s                                   31699    30742    -957    -3.019%
cmd/vendor/golang.org/x/tools/go/analysis/passes/ifaceassert.s           2768     2762     -6      -0.217%
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s           3031     2998     -33     -1.089%
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s                 4382     4376     -6      -0.137%
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s            8654     8642     -12     -0.139%
cmd/vendor/golang.org/x/tools/go/analysis/passes/stringintconv.s         3458     3446     -12     -0.347%
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s             8011     7995     -16     -0.200%
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s                 6205     6193     -12     -0.193%
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                           66183    65861    -322    -0.487%
cmd/vendor/github.com/google/pprof/profile.s                             150844   150261   -583    -0.386%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s           8057     8054     -3      -0.037%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s          3670     3667     -3      -0.082%
cmd/vendor/github.com/google/pprof/internal/measurement.s                10464    10440    -24     -0.229%
cmd/vendor/golang.org/x/tools/go/types/typeutil.s                        12319    12274    -45     -0.365%
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s                  13503    13342    -161    -1.192%
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s              5261     5218     -43     -0.817%
cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s              1462     1459     -3      -0.205%
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s            9594     9582     -12     -0.125%
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s                34397    34338    -59     -0.172%
cmd/vendor/github.com/google/pprof/internal/graph.s                      53225    52936    -289    -0.543%
cmd/vendor/github.com/ianlancetaylor/demangle.s                          177450   175329   -2121   -1.195%
crypto/x509.s                                                            147892   147388   -504    -0.341%
cmd/go/internal/work.s                                                   306465   304950   -1515   -0.494%
cmd/go/internal/run.s                                                    4664     4657     -7      -0.150%
crypto/tls.s                                                             313130   311833   -1297   -0.414%
net/http/httptrace.s                                                     3979     3905     -74     -1.860%
net/smtp.s                                                               14413    14344    -69     -0.479%
cmd/link/internal/ld.s                                                   545343   542279   -3064   -0.562%
cmd/link/internal/mips.s                                                 6218     6215     -3      -0.048%
cmd/link/internal/mips64.s                                               6108     6103     -5      -0.082%
cmd/link/internal/amd64.s                                                18154    18112    -42     -0.231%
cmd/link/internal/arm64.s                                                22527    22494    -33     -0.146%
cmd/link/internal/arm.s                                                  22574    22494    -80     -0.354%
cmd/link/internal/s390x.s                                                20779    20746    -33     -0.159%
cmd/link/internal/wasm.s                                                 16531    16493    -38     -0.230%
cmd/link/internal/x86.s                                                  18906    18849    -57     -0.301%
cmd/link/internal/ppc64.s                                                26856    26778    -78     -0.290%
net/http.s                                                               559101   556513   -2588   -0.463%
net/http/cookiejar.s                                                     15912    15885    -27     -0.170%
expvar.s                                                                 9531     9525     -6      -0.063%
net/http/httptest.s                                                      16616    16475    -141    -0.849%
net/http/cgi.s                                                           23624    23458    -166    -0.703%
cmd/go/internal/web.s                                                    16546    16489    -57     -0.344%
cmd/vendor/golang.org/x/mod/sumdb.s                                      33197    33117    -80     -0.241%
net/http/fcgi.s                                                          19266    19169    -97     -0.503%
net/http/httputil.s                                                      39875    39728    -147    -0.369%
cmd/vendor/github.com/google/pprof/internal/symbolz.s                    5888     5867     -21     -0.357%
net/rpc.s                                                                34154    34003    -151    -0.442%
cmd/vendor/github.com/google/pprof/internal/transport.s                  2746     2716     -30     -1.092%
cmd/vendor/github.com/google/pprof/internal/binutils.s                   35999    35875    -124    -0.344%
net/rpc/jsonrpc.s                                                        6637     6598     -39     -0.588%
cmd/vendor/github.com/google/pprof/internal/symbolizer.s                 11533    11458    -75     -0.650%
cmd/go/internal/get.s                                                    62921    62803    -118    -0.188%
cmd/vendor/github.com/google/pprof/internal/report.s                     80364    80058    -306    -0.381%
cmd/go/internal/modfetch/codehost.s                                      89680    89066    -614    -0.685%
cmd/trace.s                                                              117171   116701   -470    -0.401%
cmd/vendor/github.com/google/pprof/internal/driver.s                     144268   143297   -971    -0.673%
cmd/go/internal/modfetch.s                                               126299   125860   -439    -0.348%
cmd/vendor/github.com/google/pprof/driver.s                              9042     9000     -42     -0.464%
cmd/go/internal/modconv.s                                                17947    17889    -58     -0.323%
cmd/pprof.s                                                              12399    12326    -73     -0.589%
cmd/go/internal/modload.s                                                151182   150389   -793    -0.525%
cmd/go/internal/generate.s                                               11738    11636    -102    -0.869%
cmd/go/internal/help.s                                                   6571     6531     -40     -0.609%
cmd/go/internal/clean.s                                                  11174    11142    -32     -0.286%
cmd/go/internal/vet.s                                                    7897     7867     -30     -0.380%
cmd/go/internal/envcmd.s                                                 22176    22095    -81     -0.365%
cmd/go/internal/list.s                                                   15216    15067    -149    -0.979%
cmd/go/internal/modget.s                                                 38698    38519    -179    -0.463%
cmd/go/internal/modcmd.s                                                 46674    46441    -233    -0.499%
cmd/go/internal/test.s                                                   64664    64456    -208    -0.322%
cmd/go.s                                                                 6730     6703     -27     -0.401%
cmd/compile/internal/ssa.s                                               3592565  3582500  -10065  -0.280%
cmd/compile/internal/gc.s                                                1549123  1537123  -12000  -0.775%
cmd/compile/internal/riscv64.s                                           14579    14483    -96     -0.658%
cmd/compile/internal/mips.s                                              20578    20419    -159    -0.773%
cmd/compile/internal/ppc64.s                                             25524    25359    -165    -0.646%
cmd/compile/internal/mips64.s                                            19795    19636    -159    -0.803%
cmd/compile/internal/wasm.s                                              13329    13290    -39     -0.293%
cmd/compile/internal/s390x.s                                             28097    27892    -205    -0.730%
cmd/compile/internal/arm.s                                               31489    31321    -168    -0.534%
cmd/compile/internal/arm64.s                                             29803    29590    -213    -0.715%
cmd/compile/internal/amd64.s                                             32961    33221    +260    +0.789%
cmd/compile/internal/x86.s                                               31029    30878    -151    -0.487%
total                                                                    18534966 18440341 -94625  -0.511%

Change-Id: I830d37364f14f0297800adc42c99f60a74c51aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/226367
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-31 21:26:33 +00:00
Keith Randall
33b648c0e9 cmd/compile: fix ephemeral pointer problem on amd64
Make sure we don't use the rewrite ptr + (c + x) -> c + (ptr + x), as
that may create an ephemeral out-of-bounds pointer.

I have not seen an actual bug caused by this yet, but we've seen
them in the 386 port so I'm fixing this issue for amd64 as well.

The load-combining rules needed to be reworked somewhat to still
work without the above broken rule.

Update #37881

Change-Id: I8046d170e89e2035195f261535e34ca7d8aca68a
Reviewed-on: https://go-review.googlesource.com/c/go/+/226437
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-30 17:25:29 +00:00
Keith Randall
af7eafd150 cmd/compile: convert 386 port to use addressing modes pass (take 2)
Retrying CL 222782, with a fix that will hopefully stop the random crashing.

The issue with the previous CL is that it does pointer arithmetic
in a way that may briefly generate an out-of-bounds pointer. If an
interrupt happens to occur in that state, the referenced object may
be collected incorrectly.

Suppose there was code that did s[x+c].  The previous CL had a rule
to the effect of ptr + (x + c) -> c + (ptr + x).  But ptr+x is not
guaranteed to point to the same object as ptr. In contrast,
ptr+(x+c) is guaranteed to point to the same object as ptr, because
we would have already checked that x+c is in bounds.

For example, strconv.trim used to have this code:
  MOVZX -0x1(BX)(DX*1), BP
  CMPL $0x30, AL
After CL 222782, it had this code:
  LEAL 0(BX)(DX*1), BP
  CMPB $0x30, -0x1(BP)

An interrupt between those last two instructions could see BP pointing
outside the backing store of the slice involved.

It's really hard to actually demonstrate a bug. First, you need to
have an interrupt occur at exactly the right time. Then, there must
be no other pointers to the object in question. Since the interrupted
frame will be scanned conservatively, there can't even be a dead
pointer in another register or on the stack. (In the example above,
a bug can't happen because BX still holds the original pointer.)
Then, the object in question needs to be collected (or at least
scanned?) before the interrupted code continues.

This CL needs to handle load combining somewhat differently than CL 222782
because of the new restriction on arithmetic. That's the only real
difference (other than removing the bad rules) from that old CL.

This bug is also present in the amd64 rewrite rules, and we haven't
seen any crashing as a result. I will fix up that code similarly to
this one in a separate CL.

Update #37881

Change-Id: I5f0d584d9bef4696bfe89a61ef0a27c8d507329f
Reviewed-on: https://go-review.googlesource.com/c/go/+/225798
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-27 18:54:45 +00:00