1
0
mirror of https://github.com/golang/go synced 2024-11-16 23:14:42 -07:00
Commit Graph

333 Commits

Author SHA1 Message Date
Keith Randall
c4b2288755 cmd/compile: add jump table codegen test
Change-Id: Ic67f676f5ebe146166a0d3c1d78a802881320e49
Reviewed-on: https://go-review.googlesource.com/c/go/+/400375
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2022-04-14 21:16:29 +00:00
Wayne Zuo
66f03f79da cmd/compile: add SHLX&SHRX without load
Change-Id: I79eb5e7d6bcb23f26d3a100e915efff6dae70391
Reviewed-on: https://go-review.googlesource.com/c/go/+/399061
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-04-13 17:48:36 +00:00
Wayne Zuo
517781b391 cmd/compile: add SARXQload and SARXLload
Change-Id: I4e8dc5009a5b8af37d71b62f1322f94806d3e9d9
Reviewed-on: https://go-review.googlesource.com/c/go/+/399056
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-04-13 17:48:12 +00:00
Wayne Zuo
d6320f1a58 cmd/compile: add SARX instruction for GOAMD64>=3
name                    old time/op  new time/op  delta
ShiftArithmeticRight-8  0.68ns ± 5%  0.30ns ± 6%  -56.14%  (p=0.000 n=10+10)

Change-Id: I052a0d7b9e6526d526276444e588b0cc288beff4
Reviewed-on: https://go-review.googlesource.com/c/go/+/399055
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-04-12 12:59:27 +00:00
Wayne Zuo
32de2b0d1c cmd/compile: add MOVBE index load/store
Fixes #51724

Change-Id: I94e650a7482dc4c479d597f0162a6a89d779708d
Reviewed-on: https://go-review.googlesource.com/c/go/+/395474
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Cherry Mui <cherryyz@google.com>
2022-04-11 15:41:56 +00:00
Wayne Zuo
615d3c3040 test: adjust load and store test
In the load tests, we only want to test the assembly produced by
the load operations. If we use the global variable sink, it will produce
one load operation and one store operation(assign to sink).

For example:

func load_be64(b []byte) uint64 {
	sink64 = binary.BigEndian.Uint64(b)
}

If we compile this function with GOAMD64=v3, it may produce MOVBEQload
and MOVQstore or MOVQload and MOVBEQstore, but we only want MOVBEQload.
Discovered when developing CL 395474.

Same for the store tests.

Change-Id: I65c3c742f1eff657c3a0d2dd103f51140ae8079e
Reviewed-on: https://go-review.googlesource.com/c/go/+/397875
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Cherry Mui <cherryyz@google.com>
2022-04-11 15:41:04 +00:00
Wayne Zuo
7fbabe8d57 cmd/compile: use shlx&shrx instruction for GOAMD64>=v3
The SHRX/SHLX instruction can take any general register as the shift count operand, and can read source from memory. This CL introduces some operators to combine load and shift to one instruction.

For #47120

Change-Id: I13b48f53c7d30067a72eb2c8382242045dead36a
Reviewed-on: https://go-review.googlesource.com/c/go/+/385174
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Cherry Mui <cherryyz@google.com>
2022-04-04 20:07:23 +00:00
Wayne Zuo
a92ca51507 cmd/compile: use LZCNT instruction for GOAMD64>=3
LZCNT is similar to BSR, but BSR(x) is undefined when x == 0, so using
LZCNT can avoid a special case for zero input. Except that case,
LZCNTQ(x) == 63-BSRQ(x) and LZCNTL(x) == 31-BSRL(x).

And according to https://www.agner.org/optimize/instruction_tables.pdf,
LZCNT instructions are much faster than BSR on AMD CPU.

name              old time/op  new time/op  delta
LeadingZeros-8    0.91ns ± 1%  0.80ns ± 7%  -11.68%  (p=0.000 n=9+9)
LeadingZeros8-8   0.98ns ±15%  0.91ns ± 1%   -7.34%  (p=0.000 n=9+9)
LeadingZeros16-8  0.94ns ± 3%  0.92ns ± 2%   -2.36%  (p=0.001 n=10+10)
LeadingZeros32-8  0.89ns ± 1%  0.78ns ± 2%  -12.49%  (p=0.000 n=10+10)
LeadingZeros64-8  0.92ns ± 1%  0.78ns ± 1%  -14.48%  (p=0.000 n=10+10)

Change-Id: I125147fe3d6994a4cfe558432780408e9a27557a
Reviewed-on: https://go-review.googlesource.com/c/go/+/396794
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-04 04:01:17 +00:00
Wayne Zuo
ba6df85c7c cmd/compile: add MOVBEWstore support for GOAMD64>=3
This CL add MOVBE support for 16-bit version, but MOVBEWload is
excluded because it does not satisfy zero extented.

For #51724

Change-Id: I3fadf20bcbb9b423f6355e6a1e340107e8e621ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/396617
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
2022-04-03 23:48:16 +00:00
fanzha02
8ab42a945a cmd/compile: merge ANDconst and UBFX into UBFX on arm64
Add a new rewrite rule to merge ANDconst and UBFX into
UBFX.

Add test cases.

Change-Id: I24d6442d0c956d7ce092c3a3858d4a3a41771670
Reviewed-on: https://go-review.googlesource.com/c/go/+/377054
Trust: Fannie Zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-25 01:24:44 +00:00
Keith Randall
aaa3d39f27 cmd/compile: include all entries in map literal hint size
Currently we only include static entries in the hint for sizing
the map when allocating a map for a map literal. Change that to
include all entries.

This will be an overallocation if the dynamic entries in the map have
equal keys, but equal keys in map literals are rare, and at worst we
waste a bit of space.

Fixes #43020

Change-Id: I232f82f15316bdf4ea6d657d25a0b094b77884ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/383634
Run-TryBot: Keith Randall <khr@golang.org>
Trust: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-01 23:20:30 +00:00
Archana R
4056934e48 test/codegen: updated arithmetic tests to verify on ppc64,ppc64le
Updated multiple tests in test/codegen/arithmetic.go to verify
on ppc64/ppc64le as well

Change-Id: I79ca9f87017ea31147a4ba16f5d42ba0fcae64e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/358546
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-11-01 13:12:37 +00:00
Archana R
8b9c0d1a79 test/codegen: updated comparison test to verify on ppc64,ppc64le
Updated test/codegen/comparison.go to verify memequal is inlined
as implemented in CL 328291.

Change-Id: If7824aed37ee1f8640e54fda0f9b7610582ba316
Reviewed-on: https://go-review.googlesource.com/c/go/+/357289
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-10-21 13:05:07 +00:00
wdvxdr
fe7df4c4d0 cmd/compile: use MOVBE instruction for GOAMD64>=v3
In CL 354670, I copied some existing rules for convenience but forgot
to update the last rule which broke `GOAMD64=v3 ./make.bat`

Revive CL 354670

Change-Id: Ic1e2047c603f0122482a4b293ce1ef74d806c019
Reviewed-on: https://go-review.googlesource.com/c/go/+/356810
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
2021-10-19 16:18:56 +00:00
Daniel Martí
b0351bfd7d Revert "cmd/compile: use MOVBE instruction for GOAMD64>=v3"
This reverts CL 354670.

Reason for revert: broke make.bash with GOAMD64=v3.

Fixes #49061.

Change-Id: I7f2ed99b7c10100c4e0c1462ea91c4c9d8c609b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/356790
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Koichi Shiraishi <zchee.io@gmail.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2021-10-19 09:49:38 +00:00
Lynn Boger
33b3260c1e cmd/compile/internal/ssagen: set BitLen32 as intrinsic on PPC64
It was noticed through some other investigation that BitLen32
was not generating the best code and found that it wasn't recognized
as an intrinsic. This corrects that and enables the test for PPC64.

Change-Id: Iab496a8830c8552f507b7292649b1b660f3848b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/355872
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-10-18 21:33:08 +00:00
wdvxdr
3e5cc4d6f6 cmd/compile: use MOVBE instruction for GOAMD64>=v3
encoding/binary benchmark on my laptop:
name                      old time/op    new time/op     delta
ReadSlice1000Int32s-8       4.42µs ± 5%     4.20µs ± 1%   -4.94%  (p=0.046 n=9+8)
ReadStruct-8                 359ns ± 8%      368ns ± 5%   +2.35%  (p=0.041 n=9+10)
WriteStruct-8                349ns ± 1%      357ns ± 1%   +2.15%  (p=0.000 n=8+10)
ReadInts-8                   235ns ± 1%      233ns ± 1%   -1.01%  (p=0.005 n=10+10)
WriteInts-8                  265ns ± 1%      274ns ± 1%   +3.45%  (p=0.000 n=10+10)
WriteSlice1000Int32s-8      4.61µs ± 5%     4.59µs ± 5%     ~     (p=0.986 n=10+10)
PutUint16-8                 0.56ns ± 4%     0.57ns ± 4%     ~     (p=0.101 n=10+10)
PutUint32-8                 0.83ns ± 2%     0.56ns ± 6%  -32.91%  (p=0.000 n=10+10)
PutUint64-8                 0.81ns ± 3%     0.62ns ± 4%  -23.82%  (p=0.000 n=10+10)
LittleEndianPutUint16-8     0.55ns ± 4%     0.55ns ± 3%     ~     (p=0.926 n=10+10)
LittleEndianPutUint32-8     0.41ns ± 4%     0.42ns ± 3%     ~     (p=0.148 n=10+9)
LittleEndianPutUint64-8     0.55ns ± 2%     0.56ns ± 6%     ~     (p=0.897 n=10+10)
ReadFloats-8                60.4ns ± 4%     59.0ns ± 1%   -2.25%  (p=0.007 n=10+10)
WriteFloats-8               72.3ns ± 2%     71.5ns ± 7%     ~     (p=0.089 n=10+10)
ReadSlice1000Float32s-8     4.21µs ± 3%     4.18µs ± 2%     ~     (p=0.197 n=10+10)
WriteSlice1000Float32s-8    4.61µs ± 2%     4.68µs ± 7%     ~     (p=1.000 n=8+10)
ReadSlice1000Uint8s-8        250ns ± 4%      247ns ± 4%     ~     (p=0.324 n=10+10)
WriteSlice1000Uint8s-8       227ns ± 5%      229ns ± 2%     ~     (p=0.193 n=10+7)
PutUvarint32-8              15.3ns ± 2%     15.4ns ± 4%     ~     (p=0.782 n=10+10)
PutUvarint64-8              38.5ns ± 1%     38.6ns ± 5%     ~     (p=0.396 n=8+10)

name                      old speed      new speed       delta
ReadSlice1000Int32s-8      890MB/s ±17%    953MB/s ± 1%   +7.00%  (p=0.027 n=10+8)
ReadStruct-8               209MB/s ± 8%    204MB/s ± 5%   -2.42%  (p=0.043 n=9+10)
WriteStruct-8              214MB/s ± 3%    210MB/s ± 1%   -1.75%  (p=0.003 n=9+10)
ReadInts-8                 127MB/s ± 1%    129MB/s ± 1%   +1.01%  (p=0.006 n=10+10)
WriteInts-8                113MB/s ± 1%    109MB/s ± 1%   -3.34%  (p=0.000 n=10+10)
WriteSlice1000Int32s-8     868MB/s ± 5%    872MB/s ± 5%     ~     (p=1.000 n=10+10)
PutUint16-8               3.55GB/s ± 4%   3.50GB/s ± 4%     ~     (p=0.093 n=10+10)
PutUint32-8               4.83GB/s ± 2%   7.21GB/s ± 6%  +49.16%  (p=0.000 n=10+10)
PutUint64-8               9.89GB/s ± 3%  12.99GB/s ± 4%  +31.26%  (p=0.000 n=10+10)
LittleEndianPutUint16-8   3.65GB/s ± 4%   3.65GB/s ± 4%     ~     (p=0.912 n=10+10)
LittleEndianPutUint32-8   9.74GB/s ± 3%   9.63GB/s ± 3%     ~     (p=0.222 n=9+9)
LittleEndianPutUint64-8   14.4GB/s ± 2%   14.3GB/s ± 5%     ~     (p=0.912 n=10+10)
ReadFloats-8               199MB/s ± 4%    203MB/s ± 1%   +2.27%  (p=0.007 n=10+10)
WriteFloats-8              166MB/s ± 2%    168MB/s ± 7%     ~     (p=0.089 n=10+10)
ReadSlice1000Float32s-8    949MB/s ± 3%    958MB/s ± 2%     ~     (p=0.218 n=10+10)
WriteSlice1000Float32s-8   867MB/s ± 2%    857MB/s ± 6%     ~     (p=1.000 n=8+10)
ReadSlice1000Uint8s-8     4.00GB/s ± 4%   4.06GB/s ± 4%     ~     (p=0.353 n=10+10)
WriteSlice1000Uint8s-8    4.40GB/s ± 4%   4.36GB/s ± 2%     ~     (p=0.193 n=10+7)
PutUvarint32-8             262MB/s ± 2%    260MB/s ± 4%     ~     (p=0.739 n=10+10)
PutUvarint64-8             208MB/s ± 1%    207MB/s ± 5%     ~     (p=0.408 n=8+10)

Updates #45453

Change-Id: Ifda0d48d54665cef45d46d3aad974062633142c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/354670
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
2021-10-18 16:01:55 +00:00
Jake Ciolek
732f6fa9d5 cmd/compile: use ANDL for small immediates
We can rewrite ANDQ with an immediate fitting in 32bit with an ANDL, which is shorter to encode.

Looking at Go binary itself, before the change there was:

ANDL: 2337
ANDQ: 4476

After the change:

ANDL: 3790
ANDQ: 3024

So we got rid of 1452 ANDQs

This makes the Linux x86_64 binary 0.03% smaller.

There seems to be an impact on performance.

Intel Cascade Lake benchmarks (with perflock):

name                     old time/op    new time/op    delta
BinaryTree17-8              1.91s ± 1%     1.89s ± 1%  -1.22%  (p=0.000 n=21+18)
Fannkuch11-8                2.34s ± 0%     2.34s ± 0%    ~     (p=0.052 n=20+20)
FmtFprintfEmpty-8          27.7ns ± 1%    27.4ns ± 3%    ~     (p=0.497 n=21+21)
FmtFprintfString-8         53.2ns ± 0%    51.5ns ± 0%  -3.21%  (p=0.000 n=20+19)
FmtFprintfInt-8            57.3ns ± 0%    55.7ns ± 0%  -2.89%  (p=0.000 n=19+19)
FmtFprintfIntInt-8         92.3ns ± 0%    88.4ns ± 1%  -4.23%  (p=0.000 n=20+21)
FmtFprintfPrefixedInt-8     103ns ± 0%     103ns ± 0%  +0.23%  (p=0.000 n=20+21)
FmtFprintfFloat-8           147ns ± 0%     148ns ± 0%  +0.75%  (p=0.000 n=20+21)
FmtManyArgs-8               384ns ± 0%     381ns ± 0%  -0.63%  (p=0.000 n=21+21)
GobDecode-8                3.86ms ± 1%    3.88ms ± 1%  +0.52%  (p=0.000 n=20+21)
GobEncode-8                2.77ms ± 1%    2.77ms ± 0%    ~     (p=0.078 n=21+21)
Gzip-8                      168ms ± 1%     168ms ± 0%  +0.24%  (p=0.000 n=20+20)
Gunzip-8                   25.1ms ± 0%    24.3ms ± 0%  -3.03%  (p=0.000 n=21+21)
HTTPClientServer-8         61.4µs ± 8%    59.1µs ±10%    ~     (p=0.088 n=20+21)
JSONEncode-8               6.86ms ± 0%    6.70ms ± 0%  -2.29%  (p=0.000 n=20+19)
JSONDecode-8               30.8ms ± 1%    30.6ms ± 1%  -0.82%  (p=0.000 n=20+20)
Mandelbrot200-8            3.85ms ± 0%    3.85ms ± 0%    ~     (p=0.191 n=16+17)
GoParse-8                  2.61ms ± 2%    2.60ms ± 1%    ~     (p=0.561 n=21+20)
RegexpMatchEasy0_32-8      48.5ns ± 2%    45.9ns ± 3%  -5.26%  (p=0.000 n=20+21)
RegexpMatchEasy0_1K-8       139ns ± 0%     139ns ± 0%  +0.27%  (p=0.000 n=18+20)
RegexpMatchEasy1_32-8      41.3ns ± 0%    42.1ns ± 4%  +1.95%  (p=0.000 n=17+21)
RegexpMatchEasy1_1K-8       216ns ± 2%     216ns ± 0%  +0.17%  (p=0.020 n=21+19)
RegexpMatchMedium_32-8      790ns ± 7%     803ns ± 8%    ~     (p=0.178 n=21+21)
RegexpMatchMedium_1K-8     23.5µs ± 5%    23.7µs ± 5%    ~     (p=0.421 n=21+21)
RegexpMatchHard_32-8       1.09µs ± 1%    1.09µs ± 1%  -0.53%  (p=0.000 n=19+18)
RegexpMatchHard_1K-8       33.0µs ± 0%    33.0µs ± 0%    ~     (p=0.610 n=21+20)
Revcomp-8                   348ms ± 0%     353ms ± 0%  +1.38%  (p=0.000 n=17+18)
Template-8                 42.0ms ± 1%    41.9ms ± 1%  -0.30%  (p=0.049 n=20+20)
TimeParse-8                 185ns ± 0%     185ns ± 0%    ~     (p=0.387 n=20+18)
TimeFormat-8                237ns ± 1%     241ns ± 1%  +1.57%  (p=0.000 n=21+21)
[Geo mean]                 35.4µs         35.2µs       -0.66%

name                     old speed      new speed      delta
GobDecode-8               199MB/s ± 1%   198MB/s ± 1%  -0.52%  (p=0.000 n=20+21)
GobEncode-8               277MB/s ± 1%   277MB/s ± 0%    ~     (p=0.075 n=21+21)
Gzip-8                    116MB/s ± 1%   115MB/s ± 0%  -0.25%  (p=0.000 n=20+20)
Gunzip-8                  773MB/s ± 0%   797MB/s ± 0%  +3.12%  (p=0.000 n=21+21)
JSONEncode-8              283MB/s ± 0%   290MB/s ± 0%  +2.35%  (p=0.000 n=20+19)
JSONDecode-8             63.0MB/s ± 1%  63.5MB/s ± 1%  +0.82%  (p=0.000 n=20+20)
GoParse-8                22.2MB/s ± 2%  22.3MB/s ± 1%    ~     (p=0.539 n=21+20)
RegexpMatchEasy0_32-8     660MB/s ± 2%   697MB/s ± 3%  +5.57%  (p=0.000 n=20+21)
RegexpMatchEasy0_1K-8    7.36GB/s ± 0%  7.34GB/s ± 0%  -0.26%  (p=0.000 n=18+20)
RegexpMatchEasy1_32-8     775MB/s ± 0%   761MB/s ± 4%  -1.88%  (p=0.000 n=17+21)
RegexpMatchEasy1_1K-8    4.74GB/s ± 2%  4.74GB/s ± 0%  -0.18%  (p=0.020 n=21+19)
RegexpMatchMedium_32-8   40.6MB/s ± 7%  39.9MB/s ± 9%    ~     (p=0.191 n=21+21)
RegexpMatchMedium_1K-8   43.7MB/s ± 5%  43.2MB/s ± 5%    ~     (p=0.435 n=21+21)
RegexpMatchHard_32-8     29.3MB/s ± 1%  29.4MB/s ± 1%  +0.53%  (p=0.000 n=19+18)
RegexpMatchHard_1K-8     31.0MB/s ± 0%  31.0MB/s ± 0%    ~     (p=0.572 n=21+20)
Revcomp-8                 730MB/s ± 0%   720MB/s ± 0%  -1.36%  (p=0.000 n=17+18)
Template-8               46.2MB/s ± 1%  46.3MB/s ± 1%  +0.30%  (p=0.041 n=20+20)
[Geo mean]                204MB/s        205MB/s       +0.30%

Change-Id: Iac75d0ec184a515ce0e65e19559d5fe2e9840514
Reviewed-on: https://go-review.googlesource.com/c/go/+/354970
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-10-12 22:02:39 +00:00
Alejandro García Montoro
e1c294a56d cmd/compile: eliminate successive swaps
The code generated when storing eight bytes loaded from memory in big
endian introduced two successive byte swaps that did not actually
modified the data.

The new rules match this specific pattern both for amd64 and for arm64,
eliminating the double swap.

Fixes #41684

Change-Id: Icb6dc20b68e4393cef4fe6a07b33aba0d18c3ff3
Reviewed-on: https://go-review.googlesource.com/c/go/+/320073
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
2021-10-09 01:04:29 +00:00
Ruslan Andreev
810b08b8ec cmd/compile: inline memequal(x, const, sz) for small sizes
This CL adds late expanded memequal(x, const, sz) inlining for 2, 4, 8
bytes size. This PoC is using the same method as CL 248404.
This optimization fires about 100 times in Go compiler (1675 occurrences
reduced to 1574, so -6%).
Also, added unit-tests to codegen/comparisions.go file.

Updates #37275

Change-Id: Ia52808d573cb706d1da8166c5746ede26f46c5da
Reviewed-on: https://go-review.googlesource.com/c/go/+/328291
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Trust: David Chase <drchase@google.com>
2021-10-06 13:47:50 +00:00
nimelehin
097a82f54d cmd/compile: don't emit unnecessary amd64 extension checks
In case of amd64 the compiler issues checks if extensions are
available on a platform. With GOAMD64 microarchitecture levels
provided, some of the checks could be eliminated.

Change-Id: If15c178bcae273b2ce7d3673415cb8849292e087
Reviewed-on: https://go-review.googlesource.com/c/go/+/352010
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
2021-10-05 18:32:12 +00:00
wdvxdr
060cd73ab9 cmd/compile: use TZCNT instruction for GOAMD64>=v3
on my Intel CoffeeLake CPU:
name               old time/op  new time/op  delta
TrailingZeros-8    0.68ns ± 1%  0.64ns ± 1%  -6.26%  (p=0.000 n=10+10)
TrailingZeros8-8   0.70ns ± 1%  0.70ns ± 1%    ~     (p=0.697 n=10+10)
TrailingZeros16-8  0.70ns ± 1%  0.70ns ± 1%  +0.57%  (p=0.043 n=10+10)
TrailingZeros32-8  0.66ns ± 1%  0.64ns ± 1%  -3.35%  (p=0.000 n=10+10)
TrailingZeros64-8  0.68ns ± 1%  0.64ns ± 1%  -5.84%  (p=0.000 n=9+10)

Updates #45453

Change-Id: I228ff2d51df24b1306136f061432f8a12bb1d6fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/353249
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-10-05 16:06:49 +00:00
Lynn Boger
b8990ec932 test: update test/codegen/noextend.go to work with either ABI on ppc64x
This updates the codegen tests in noextend.go so they are not
dependent on the ABI.

Change-Id: I8433bea9dc78830c143290a7e0cf901b2397d38a
Reviewed-on: https://go-review.googlesource.com/c/go/+/353070
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-29 17:30:11 +00:00
Archana R
b35c668072 cmd/compile: add PPC64-specific inlining for runtime.memmove
Add rule to PPC64.rules to inline runtime.memmove in more cases, as is
done for other target architectures
Updated tests in codegen/copy.go to verify changes are done on
ppc64/ppc64le

Updates #41662

Change-Id: Id937ce21f9b4f4047b3e66dfa3c960128ee16a2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/352054
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2021-09-29 16:55:51 +00:00
Joel Sing
fe8347b61a cmd/compile: optimise immediate operands with constants on riscv64
Instructions with immediates can be precomputed when operating on a
constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI
and ADDI. Additionally, optimise ANDI and ORI when the immediate is
all ones or all zeroes.

In particular, the RISCV64 logical left and right shift rules
(Lsh*x*/Rsh*Ux*) produce sequences that check if the shift amount
exceeds 64 and if so returns zero. When the shift amount is a
constant we can precompute and eliminate the filter entirely.

Likewise the arithmetic right shift rules produce sequences that
check if the shift amount exceeds 64 and if so, ensures that the
lower six bits of the shift are all ones. When the shift amount
is a constant we can precompute the shift value.

Arithmetic right shift sequences like:

   117fc:       00100513                li      a0,1
   11800:       04053593                sltiu   a1,a0,64
   11804:       fff58593                addi    a1,a1,-1
   11808:       0015e593                ori     a1,a1,1
   1180c:       40b45433                sra     s0,s0,a1

Are now a single srai instruction:

   117fc:       40145413                srai    s0,s0,0x1

Likewise for logical left shift (and logical right shift):

   1d560:       01100413                li      s0,17
   1d564:       04043413                sltiu   s0,s0,64
   1d568:       40800433                neg     s0,s0
   1d56c:       01131493                slli    s1,t1,0x11
   1d570:       0084f433                and     s0,s1,s0

Which are now a single slli (or srli) instruction:

   1d120:       01131413                slli    s0,t1,0x11

This removes more than 30,000 instructions from the Go binary and
should improve performance in a variety of areas - of note
runtime.makemap_small drops from 48 to 36 instructions. Similar
gains exist in at least other parts of runtime and math/bits.

Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/350689
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-24 10:51:48 +00:00
Joel Sing
d413908320 test/codegen: add shift tests for RISCV64
Add tests for shift by constant, masked shifts and bounded shifts. While here,
sort tests by architecture and keep order of tests consistent (lsh, rshU, rsh).

Change-Id: I512d64196f34df9cb2884e8c0f6adcf9dd88b0fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/351289
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
2021-09-24 07:22:13 +00:00
Matthew Dempsky
04572fa29b cmd/compile: use BMI1 instructions for GOAMD64=v3 and higher
BMI1 includes four instructions (ANDN, BLSI, BLSMSK, BLSR) that are
easy to peephole optimize, and which GCC always seems to favor using
when available and applicable.

Updates #45453.

Change-Id: I0274184057058f5c579e5bc3ea9c414396d3cf46
Reviewed-on: https://go-review.googlesource.com/c/go/+/351130
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Trust: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-09-22 00:15:27 +00:00
Keith Randall
6f35430faa cmd/compile: allow rotates to be merged with logical ops on arm64
Fixes #48002

Change-Id: Ie3a157d55b291f5ac2ef4845e6ce4fefd84fc642
Reviewed-on: https://go-review.googlesource.com/c/go/+/350912
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-20 16:25:36 +00:00
Keith Randall
315dbd10c9 cmd/compile: fold double negate on arm64
Fixes #48467

Change-Id: I52305dbf561ee3eee6c1f053e555a3a6ec1ab892
Reviewed-on: https://go-review.googlesource.com/c/go/+/350910
Trust: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-09-19 23:50:45 +00:00
Keith Randall
83b36ffb10 cmd/compile: implement constant rotates on arm64
Explicit constant rotates work, but constant arguments to
bits.RotateLeft* needed the additional rule.

Fixes #48465

Change-Id: Ia7544f21d0e7587b6b6506f72421459cd769aea6
Reviewed-on: https://go-review.googlesource.com/c/go/+/350909
Trust: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-09-19 23:50:23 +00:00
Michael Munday
c69f5c0d76 cmd/compile: add support for Abs and Copysign intrinsics on riscv64
Also, add the FABSS and FABSD pseudo instructions to the assembler.
The compiler could use FSGNJX[SD] directly but there doesn't seem
to be much advantage to doing so and the pseudo instructions are
easier to understand.

Change-Id: Ie8825b8aa8773c69cc4f07a32ef04abf4061d80d
Reviewed-on: https://go-review.googlesource.com/c/go/+/348989
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
2021-09-10 10:45:59 +00:00
fanzha02
2091bd3f26 cmd/compile: simiplify arm64 bitfield optimizations
In some rewrite rules for arm64 bitfield optimizations, the
bitfield lsb value and the bitfield width value are related
to datasize, some of them use datasize directly to check the
bitfield lsb value is valid, to get the bitfiled width value,
but some of them call isARM64BFMask() and arm64BFWidth()
functions. In order to be consistent, this patch changes them
all to use datasize.

Besides, this patch sorts the codegen test cases.

Run the "toolstash-check -all" command and find one inconsistent code
is as the following.

new:    src/math/fma.go:104      BEQ	247
master: src/math/fma.go:104      BEQ	248

The above inconsistence is due to this patch changing the range of the
field lsb value in "UBFIZ" optimization rules from "lc+(32|16|8)<64" to
"lc<64", so that the following code is generated as "UBFIZ". The logical
of changed code is still correct.

The code of src/math/fma.go:160:
  const uvinf = 0x7FF0000000000000
  func FMA(a, b uint32) float64 {
  	ps := a+b
  	return Float64frombits(uint64(ps)<<63 | uvinf)
  }

The new assembly code:
  TEXT    "".FMA(SB), LEAF|NOFRAME|ABIInternal, $0-16
  MOVWU   "".a(FP), R0
  MOVWU   "".b+4(FP), R1
  ADD     R1, R0, R0
  UBFIZ   $63, R0, $1, R0
  ORR     $9218868437227405312, R0, R0
  MOVD    R0, "".~r2+8(FP)
  RET     (R30)

The master assembly code:
  TEXT    "".FMA(SB), LEAF|NOFRAME|ABIInternal, $0-16
  MOVWU   "".a(FP), R0
  MOVWU   "".b+4(FP), R1
  ADD     R1, R0, R0
  MOVWU   R0, R0
  LSL     $63, R0, R0
  ORR     $9218868437227405312, R0, R0
  MOVD    R0, "".~r2+8(FP)
  RET     (R30)

Change-Id: I9061104adfdfd3384d0525327ae1e5c8b0df5c35
Reviewed-on: https://go-review.googlesource.com/c/go/+/265038
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-10 02:44:36 +00:00
Michael Munday
8214257347 test/codegen: fix package name for test case
The codegen tests are currently skipped (see #48247). The test
added in CL 346050 did not compile because it was in the main
package but did not contain a main function. Changing the package
to 'codegen' fixes the issue.

Updates #48247.

Change-Id: I0a0eaca8e6a7d7b335606d2c76a204ac0c12e6d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/348392
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-09-08 14:51:40 +00:00
Michael Munday
1da64686f8 test/codegen: fix compilation of bitfield tests
The codegen tests are currently skipped (see #48247) and the
bitfield tests do not actually compile due to a duplicate function
name (sbfiz5) added in CL 267602. Renaming the function fixes the
issue.

Updates #48247.

Change-Id: I626fd5ef13732dc358e73ace9ddcc4cbb6ae5b21
Reviewed-on: https://go-review.googlesource.com/c/go/+/348391
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-08 14:51:31 +00:00
Michael Munday
fdc2072420 test/codegen: remove broken riscv64 test
This test is not executed by default (see #48247) and does not
actually pass. It was added in CL 346689. The code generation
changes made in that CL only change how instructions are assembled,
they do not actually affect the output of the compiler. This test
is unfortunately therefore invalid and will never pass.

Updates #48247.

Change-Id: I0c807e4a111336e5a097fe4e3af2805f9932a87f
Reviewed-on: https://go-review.googlesource.com/c/go/+/348390
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-09-08 14:51:22 +00:00
wdvxdr
21de6bc463 cmd/compile: simplify less with non-negative number and constant 0 or 1
The most common cases:
len(s) > 0
len(s) < 1

and they can be simplified to:
len(s) != 0
len(s) == 0

Fixes #48054

Change-Id: I16e5b0cffcfab62a4acc2a09977a6cd3543dd000
Reviewed-on: https://go-review.googlesource.com/c/go/+/346050
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-07 14:53:41 +00:00
fanzha02
7b69ddc171 cmd/compile: merge sign extension and shift into SBFIZ
This patch adds some rules to rewrite "(LeftShift (SignExtend x) lc)"
expression as "SBFIZ".

Add the test cases.

Change-Id: I294c4ba09712eeb02c7a952447bb006780f1e60d
Reviewed-on: https://go-review.googlesource.com/c/go/+/267602
Trust: fannie zhang <Fannie.Zhang@arm.com>
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-09-06 03:31:53 +00:00
fanzha02
43b05173a2 cmd/compile: merge zero/sign extensions with UBFX/SBFX on arm64
The UBFX and SBFX already zero/sign extend the result. Further
zero/sign extensions are thus unnecessary as long as they leave
the top bits unaltered. This patch absorbs zero/sign extensions
into UBFX/SBFX.

Add the related test cases.

Change-Id: I7c4516c8b52d677f77bf3aaedab87c4a28056ec0
Reviewed-on: https://go-review.googlesource.com/c/go/+/265039
Trust: fannie zhang <Fannie.Zhang@arm.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-06 03:31:43 +00:00
Ben Shi
37d4532867 cmd/internal/obj/riscv: simplify addition with constant
This CL simplifies riscv addition (add r, imm) to
(ADDI (ADDI r, imm/2), imm-imm/2) if imm is in specific ranges.
(-4096 <= imm <= -2049 or 2048 <= imm <= 4094)

There is little impact to the go1 benchmark, while the total
size of pkg/linux_riscv64 decreased by about 11KB.

Change-Id: I236eb8af3b83bb35ce9c0b318fc1d235e8ab9a4e
GitHub-Last-Rev: a2f56a0763
GitHub-Pull-Request: golang/go#48110
Reviewed-on: https://go-review.googlesource.com/c/go/+/346689
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Michael Munday <mike.munday@lowrisc.org>
2021-09-02 14:35:10 +00:00
Michael Munday
ea51e223c2 cmd/{asm,compile}: add fused multiply-add support on riscv64
Add support to the assembler for F[N]M{ADD,SUB}[SD] instructions.
Argument order is:

  OP RS1, RS2, RS3, RD

Also, add support for the FMA intrinsic to the compiler. Automatic
FMA matching is left to a future CL.

Change-Id: I47166c7393b2ab6bfc2e42aa8c1a8997c3a071b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/293030
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
2021-09-01 21:17:04 +00:00
Jake Ciolek
6cf1d5d0fa cmd/compile: generic SSA rules for simplifying 2 and 3 operand integer arithmetic expressions
This applies the following generic integer addition/subtraction transformations:

x - (x + y) = -y
(x - y) - x = -y
y + (x - y) = x
y + (z + (x - y) = x + z

There's over 40 unique functions matching in Go. Hits 2 funcs in the runtime itself:

runtime.stackfree()
runtime.runqdrain()

Go binary size reduced by 0.05% on Linux x86_64.

StackCopy bench (perflocked Cascade Lake x86):

name                old time/op  new time/op  delta
StackCopyPtr-8      87.3ms ± 1%  86.9ms ± 0%  -0.45%  (p=0.000 n=20+20)
StackCopy-8         77.6ms ± 1%  77.0ms ± 0%  -0.76%  (p=0.000 n=20+20)
StackCopyNoCache-8  2.28ms ± 2%  2.26ms ± 2%  -0.93%  (p=0.008 n=19+20)

test/bench/go1 benchmarks (perflocked Cascade Lake x86):

name                     old time/op    new time/op    delta
BinaryTree17-8              1.88s ± 1%     1.88s ± 0%    ~     (p=0.373 n=15+12)
Fannkuch11-8                2.31s ± 0%     2.35s ± 0%  +1.52%  (p=0.000 n=15+14)
FmtFprintfEmpty-8          26.6ns ± 0%    26.6ns ± 0%    ~     (p=0.081 n=14+13)
FmtFprintfString-8         48.6ns ± 0%    50.0ns ± 0%  +2.86%  (p=0.000 n=15+14)
FmtFprintfInt-8            56.9ns ± 0%    54.8ns ± 0%  -3.70%  (p=0.000 n=15+15)
FmtFprintfIntInt-8         90.4ns ± 0%    88.8ns ± 0%  -1.78%  (p=0.000 n=15+15)
FmtFprintfPrefixedInt-8     104ns ± 0%     104ns ± 0%    ~     (p=0.905 n=14+13)
FmtFprintfFloat-8           148ns ± 0%     144ns ± 0%  -2.19%  (p=0.000 n=14+15)
FmtManyArgs-8               389ns ± 0%     390ns ± 0%  +0.35%  (p=0.000 n=12+15)
GobDecode-8                3.90ms ± 1%    3.88ms ± 0%  -0.49%  (p=0.000 n=15+14)
GobEncode-8                2.73ms ± 0%    2.73ms ± 0%    ~     (p=0.425 n=15+14)
Gzip-8                      169ms ± 0%     168ms ± 0%  -0.52%  (p=0.000 n=13+13)
Gunzip-8                   24.7ms ± 0%    24.8ms ± 0%  +0.61%  (p=0.000 n=15+15)
HTTPClientServer-8         60.5µs ± 6%    60.4µs ± 7%    ~     (p=0.595 n=15+15)
JSONEncode-8               6.97ms ± 1%    6.93ms ± 0%  -0.69%  (p=0.000 n=14+14)
JSONDecode-8               31.2ms ± 1%    30.8ms ± 1%  -1.27%  (p=0.000 n=14+14)
Mandelbrot200-8            3.87ms ± 0%    3.87ms ± 0%    ~     (p=0.652 n=15+14)
GoParse-8                  2.65ms ± 2%    2.64ms ± 1%    ~     (p=0.202 n=15+15)
RegexpMatchEasy0_32-8      45.1ns ± 0%    45.9ns ± 0%  +1.68%  (p=0.000 n=14+15)
RegexpMatchEasy0_1K-8       140ns ± 0%     139ns ± 0%  -0.44%  (p=0.000 n=15+14)
RegexpMatchEasy1_32-8      40.9ns ± 3%    40.5ns ± 0%  -0.88%  (p=0.000 n=15+13)
RegexpMatchEasy1_1K-8       215ns ± 1%     220ns ± 1%  +2.27%  (p=0.000 n=15+15)
RegexpMatchMedium_32-8      783ns ± 7%     738ns ± 0%    ~     (p=0.361 n=15+15)
RegexpMatchMedium_1K-8     24.1µs ± 6%    23.4µs ± 6%  -2.94%  (p=0.004 n=15+15)
RegexpMatchHard_32-8       1.10µs ± 1%    1.09µs ± 1%  -0.40%  (p=0.006 n=15+14)
RegexpMatchHard_1K-8       33.0µs ± 0%    33.0µs ± 0%    ~     (p=0.535 n=12+14)
Revcomp-8                   354ms ± 0%     353ms ± 0%  -0.23%  (p=0.002 n=15+13)
Template-8                 42.0ms ± 1%    41.8ms ± 2%  -0.37%  (p=0.023 n=14+15)
TimeParse-8                 181ns ± 0%     180ns ± 1%  -0.18%  (p=0.014 n=12+13)
TimeFormat-8                240ns ± 0%     242ns ± 1%  +0.69%  (p=0.000 n=12+15)
[Geo mean]                 35.2µs         35.1µs       -0.43%

name                     old speed      new speed      delta
GobDecode-8               197MB/s ± 1%   198MB/s ± 0%  +0.49%  (p=0.000 n=15+14)
GobEncode-8               281MB/s ± 0%   281MB/s ± 0%    ~     (p=0.419 n=15+14)
Gzip-8                    115MB/s ± 0%   115MB/s ± 0%  +0.52%  (p=0.000 n=13+13)
Gunzip-8                  786MB/s ± 0%   781MB/s ± 0%  -0.60%  (p=0.000 n=15+15)
JSONEncode-8              278MB/s ± 1%   280MB/s ± 0%  +0.69%  (p=0.000 n=14+14)
JSONDecode-8             62.3MB/s ± 1%  63.1MB/s ± 1%  +1.29%  (p=0.000 n=14+14)
GoParse-8                21.9MB/s ± 2%  22.0MB/s ± 1%    ~     (p=0.205 n=15+15)
RegexpMatchEasy0_32-8     709MB/s ± 0%   697MB/s ± 0%  -1.65%  (p=0.000 n=14+15)
RegexpMatchEasy0_1K-8    7.34GB/s ± 0%  7.37GB/s ± 0%  +0.43%  (p=0.000 n=15+15)
RegexpMatchEasy1_32-8     783MB/s ± 2%   790MB/s ± 0%  +0.88%  (p=0.000 n=15+13)
RegexpMatchEasy1_1K-8    4.77GB/s ± 1%  4.66GB/s ± 1%  -2.23%  (p=0.000 n=15+15)
RegexpMatchMedium_32-8   41.0MB/s ± 7%  43.3MB/s ± 0%    ~     (p=0.360 n=15+15)
RegexpMatchMedium_1K-8   42.5MB/s ± 6%  43.8MB/s ± 6%  +3.07%  (p=0.004 n=15+15)
RegexpMatchHard_32-8     29.2MB/s ± 1%  29.3MB/s ± 1%  +0.41%  (p=0.006 n=15+14)
RegexpMatchHard_1K-8     31.1MB/s ± 0%  31.1MB/s ± 0%    ~     (p=0.495 n=12+14)
Revcomp-8                 718MB/s ± 0%   720MB/s ± 0%  +0.23%  (p=0.002 n=15+13)
Template-8               46.3MB/s ± 1%  46.4MB/s ± 2%  +0.38%  (p=0.021 n=14+15)
[Geo mean]                205MB/s        206MB/s       +0.57%

Change-Id: Ibd1afdf8b6c0b08087dcc3acd8f943637eb95ac0
Reviewed-on: https://go-review.googlesource.com/c/go/+/344930
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
2021-08-25 19:45:20 +00:00
Meng Zhuo
efd206eb40 cmd/compile: intrinsify Mul64 on riscv64
According to RISCV instruction set manual v2.2 Sec 6.1
MULHU followed by MUL will be fused into one multiply by microarchitecture

Benchstat on Hifive unmatched:
name          old time/op    new time/op    delta
Hash8Bytes       245ns ± 3%     186ns ± 4%  -23.99%  (p=0.000 n=10+10)
Hash320Bytes    1.94µs ± 1%    1.31µs ± 1%  -32.38%  (p=0.000 n=9+10)
Hash1K          5.84µs ± 0%    3.84µs ± 0%  -34.20%  (p=0.000 n=10+9)
Hash8K          45.3µs ± 0%    29.4µs ± 0%  -35.04%  (p=0.000 n=10+10)

name          old speed      new speed      delta
Hash8Bytes    32.7MB/s ± 3%  43.0MB/s ± 4%  +31.61%  (p=0.000 n=10+10)
Hash320Bytes   165MB/s ± 1%   244MB/s ± 1%  +47.88%  (p=0.000 n=9+10)
Hash1K         175MB/s ± 0%   266MB/s ± 0%  +51.98%  (p=0.000 n=10+9)
Hash8K         181MB/s ± 0%   279MB/s ± 0%  +53.94%  (p=0.000 n=10+10)

Change-Id: I3561495d02a4a0ad8578e9b9819bf0a4eaca5d12
Reviewed-on: https://go-review.googlesource.com/c/go/+/329970
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Meng Zhuo <mzh@golangcn.org>
2021-08-16 13:50:11 +00:00
Cherry Mui
6b1e4430bb [dev.typeparams] cmd/compile: implement clobberdead mode on ARM64
For debugging.

Change-Id: I5875ccd2413b8ffd2ec97a0ace66b5cae7893b24
Reviewed-on: https://go-review.googlesource.com/c/go/+/324765
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-06-03 20:00:30 +00:00
Cherry Mui
165d39a1d4 [dev.typeparams] test: adjust codegen test for register ABI on ARM64
In codegen/arithmetic.go, previously there are MOVD's that match
for loads of arguments. With register ABI there are no more such
loads. Remove the MOVD matches.

Change-Id: I920ee2629c8c04d454f13a0c08e283d3528d9a64
Reviewed-on: https://go-review.googlesource.com/c/go/+/324251
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-06-03 14:28:14 +00:00
Ruslan Andreev
3b321a9d12 cmd/compile: add arch-specific inlining for runtime.memmove
This CL add runtime.memmove inlining for AMD64 and ARM64.
According to ssa dump from testcases generic rules can't inline
memmomve properly due to one of the arguments is Phi operation. But this
Phi op will be optimized out by later optimization stages. As a result
memmove can be inlined during arch-specific rules.
The commit add new optimization rules to arch-specific rules that can
inline runtime.memmove if it possible during lowering stage.
Optimization fires 5 times in Go source-code using regabi.

Fixes #41662

Change-Id: Iaffaf4c482d068b5f0683d141863892202cc8824
Reviewed-on: https://go-review.googlesource.com/c/go/+/289151
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: David Chase <drchase@google.com>
2021-05-12 16:23:30 +00:00
Keith Randall
b211fe0058 cmd/compile: remove bit operations that modify memory directly
These operations (BT{S,R,C}{Q,L}modify) are quite a bit slower than
other ways of doing the same thing.

Without the BTxmodify operations, there are two fallback ways the compiler
performs these operations: AND/OR/XOR operations directly on memory, or
load-BTx-write sequences. The compiler kinda chooses one arbitrarily
depending on rewrite rule application order. Currently, it uses
load-BTx-write for the Const benchmarks and AND/OR/XOR directly to memory
for the non-Const benchmarks. TBD, someone might investigate which of
the two fallback strategies is really better. For now, they are both
better than BTx ops.

name              old time/op  new time/op  delta
BitSet-8          1.09µs ± 2%  0.64µs ± 5%  -41.60%  (p=0.000 n=9+10)
BitClear-8        1.15µs ± 3%  0.68µs ± 6%  -41.00%  (p=0.000 n=10+10)
BitToggle-8       1.18µs ± 4%  0.73µs ± 2%  -38.36%  (p=0.000 n=10+8)
BitSetConst-8     37.0ns ± 7%  25.8ns ± 2%  -30.24%  (p=0.000 n=10+10)
BitClearConst-8   30.7ns ± 2%  25.0ns ±12%  -18.46%  (p=0.000 n=10+10)
BitToggleConst-8  36.9ns ± 1%  23.8ns ± 3%  -35.46%  (p=0.000 n=9+10)

Fixes #45790
Update #45242

Change-Id: Ie33a72dc139f261af82db15d446cd0855afb4e59
Reviewed-on: https://go-review.googlesource.com/c/go/+/318149
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ben Shi <powerman1st@163.com>
2021-05-08 03:27:59 +00:00
Cherry Zhang
6b8e3e2d06 cmd/compile: reduce redundant register moves for regabi calls
Currently, if we have AX=a and BX=b, and we want to make a call
F(1, a, b), to move arguments into the desired registers it emits

	MOVQ AX, CX
	MOVL $1, AX // AX=1
	MOVQ BX, DX
	MOVQ CX, BX // BX=a
	MOVQ DX, CX // CX=b

This has a few redundant moves.

This is because we process inputs in order. First, allocate 1 to
AX, which kicks out a (in AX) to CX (a free register at the
moment). Then, allocate a to BX, which kicks out b (in BX) to DX.
Finally, put b to CX.

Notice that if we start with allocating CX=b, then BX=a, AX=1,
we will not have redundant moves. This CL reduces redundant moves
by allocating them in different order: First, for inpouts that are
already in place, keep them there. Then allocate free registers.
Then everything else.

                             before       after
cmd/compile binary size     23703888    23609680
            text size        8565899     8533291

(with regabiargs enabled.)

Change-Id: I69e1bdf745f2c90bb791f6d7c45b37384af1e874
Reviewed-on: https://go-review.googlesource.com/c/go/+/311371
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-04-19 16:21:39 +00:00
David Chase
4df3d0e4df cmd/compile: rescue stmt boundaries from OpArgXXXReg and OpSelectN.
Fixes this failure:
go test cmd/compile/internal/ssa -run TestStmtLines -v
=== RUN   TestStmtLines
    stmtlines_test.go:115: Saw too many (amd64, > 1%) lines without
    statement marks, total=88263, nostmt=1930
    ('-run TestStmtLines -v' lists failing lines)

The failure has two causes.

One is that the first-line adjuster in code generation was relocating
"first lines" to instructions that would either not have any code generated,
or would have the statment marker removed by a different believed-good heuristic.

The other was that statement boundaries were getting attached to register
values (that with the old ABI were loads from the stack, hence real instructions).
The register values disappear at code generation.

The fixes are to (1) note that certain instructions are not good choices for
"first value" and skip them, and (2) in an expandCalls post-pass, look for
register valued instructions and under appropriate conditions move their
statement marker to a compatible use.

Also updates TestStmtLines to always log the score, for easier comparison of
minor compiler changes.

Updates #40724.

Change-Id: I485573ce900e292d7c44574adb7629cdb4695c3f
Reviewed-on: https://go-review.googlesource.com/c/go/+/309649
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-04-14 18:46:08 +00:00
Cherry Zhang
444d28295b test: make codegen/memops.go work with both ABIs
Following CL 309335, this fixes memops.go.

Change-Id: Ia2458b5267deee9f906f76c50e70a021ea2fcb5b
Reviewed-on: https://go-review.googlesource.com/c/go/+/309552
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-04-13 14:07:17 +00:00
Cherry Zhang
263e13d1f7 test: make codegen tests work with both ABIs
Some codegen tests were written with the assumption that
arguments and results are in memory, and with a specific stack
layout. With the register ABI, the assumption is no longer true.
Adjust the tests to work with both cases.

- For tests expecting in memory arguments/results, change to use
  global variables or memory-assigned argument/results.

- Allow more registers. E.g. some tests expecting register names
  contain only letters (e.g. AX), but  it can also contain numbers
  (e.g. R10).

- Some instruction selection changes when operate on register vs.
  memory, e.g. ADDQ vs. LEAQ, MOVB vs. MOVL. Accept both.

TODO: mathbits.go and memops.go still need fix.
Change-Id: Ic5932b4b5dd3f5d30ed078d296476b641420c4c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/309335
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-04-12 21:59:59 +00:00