Ben Shi
8a85bce215
test/codegen: improve test cases for arm64
...
1. Some incorrect test cases are disabled.
2. Some wrong test cases are corrected.
3. Some new test cases are added.
Change-Id: Ib5d0473d55159f233ddab79f96967eaec7b08597
Reviewed-on: https://go-review.googlesource.com/113736
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-22 14:50:41 +00:00
Lynn Boger
28edaf4584
cmd/compile,test: combine byte loads and stores on ppc64le
...
CL 74410 added rules to combine consecutive byte loads and
stores when the byte order was little endian for ppc64le. This
is the corresponding change for bytes that are in big endian order.
These rules are all intended for a little endian target arch.
This adds new testcases in test/codegen/memcombine.go
Fixes #22496
Updates #24242
Benchmark improvement for encoding/binary:
name old time/op new time/op delta
ReadSlice1000Int32s-16 11.0µs ± 0% 9.0µs ± 0% -17.47% (p=0.029 n=4+4)
ReadStruct-16 2.47µs ± 1% 2.48µs ± 0% +0.67% (p=0.114 n=4+4)
ReadInts-16 642ns ± 1% 630ns ± 1% -2.02% (p=0.029 n=4+4)
WriteInts-16 654ns ± 0% 653ns ± 1% -0.08% (p=0.629 n=4+4)
WriteSlice1000Int32s-16 8.75µs ± 0% 8.20µs ± 0% -6.19% (p=0.029 n=4+4)
PutUint16-16 1.16ns ± 0% 0.93ns ± 0% -19.83% (p=0.029 n=4+4)
PutUint32-16 1.16ns ± 0% 0.93ns ± 0% -19.83% (p=0.029 n=4+4)
PutUint64-16 1.85ns ± 0% 0.93ns ± 0% -49.73% (p=0.029 n=4+4)
LittleEndianPutUint16-16 1.03ns ± 0% 0.93ns ± 0% -9.71% (p=0.029 n=4+4)
LittleEndianPutUint32-16 0.93ns ± 0% 0.93ns ± 0% ~ (all equal)
LittleEndianPutUint64-16 0.93ns ± 0% 0.93ns ± 0% ~ (all equal)
PutUvarint32-16 43.0ns ± 0% 43.1ns ± 0% +0.12% (p=0.429 n=4+4)
PutUvarint64-16 174ns ± 0% 175ns ± 0% +0.29% (p=0.429 n=4+4)
Updates made to functions in gcm.go to enable their matching. An existing
testcase prevents these functions from being replaced by those in encoding/binary
due to import dependencies.
Change-Id: Idb3bd1e6e7b12d86cd828fb29cb095848a3e485a
Reviewed-on: https://go-review.googlesource.com/98136
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 13:15:39 +00:00
Ben Shi
aaf73c6d1e
cmd/compile: optimize ARM64 with shifted register indexed load/store
...
ARM64 supports efficient instructions which combine shift, addition, load/store
together. Such as "MOVD (R0)(R1<<3), R2" and "MOVWU R6, (R4)(R1<<2)".
This CL optimizes the compiler to emit such efficient instuctions. And below
is some test data.
1. binary size before/after
binary size change
pkg/linux_arm64 +80.1KB
pkg/tool/linux_arm64 +121.9KB
go -4.3KB
gofmt -64KB
2. go1 benchmark
There is big improvement for the test case Fannkuch11, and slight
improvement for sme others, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 43.9s ± 2% 44.0s ± 2% ~ (p=0.820 n=30+30)
Fannkuch11-4 30.6s ± 2% 24.5s ± 3% -19.93% (p=0.000 n=25+30)
FmtFprintfEmpty-4 500ns ± 0% 499ns ± 0% -0.11% (p=0.000 n=23+25)
FmtFprintfString-4 1.03µs ± 0% 1.04µs ± 3% ~ (p=0.065 n=29+30)
FmtFprintfInt-4 1.15µs ± 3% 1.15µs ± 4% -0.56% (p=0.000 n=30+30)
FmtFprintfIntInt-4 1.80µs ± 5% 1.82µs ± 0% ~ (p=0.094 n=30+24)
FmtFprintfPrefixedInt-4 2.17µs ± 5% 2.20µs ± 0% ~ (p=0.100 n=30+23)
FmtFprintfFloat-4 3.08µs ± 3% 3.09µs ± 4% ~ (p=0.123 n=30+30)
FmtManyArgs-4 7.41µs ± 4% 7.17µs ± 1% -3.26% (p=0.000 n=30+23)
GobDecode-4 93.7ms ± 0% 94.7ms ± 4% ~ (p=0.685 n=24+30)
GobEncode-4 78.7ms ± 7% 77.1ms ± 0% ~ (p=0.729 n=30+23)
Gzip-4 4.01s ± 0% 3.97s ± 5% -1.11% (p=0.037 n=24+30)
Gunzip-4 389ms ± 4% 384ms ± 0% ~ (p=0.155 n=30+23)
HTTPClientServer-4 536µs ± 1% 537µs ± 1% ~ (p=0.236 n=30+30)
JSONEncode-4 179ms ± 1% 182ms ± 6% ~ (p=0.763 n=24+30)
JSONDecode-4 843ms ± 0% 839ms ± 6% -0.42% (p=0.003 n=25+30)
Mandelbrot200-4 46.5ms ± 0% 46.5ms ± 0% +0.02% (p=0.000 n=26+26)
GoParse-4 44.3ms ± 6% 43.3ms ± 0% ~ (p=0.067 n=30+27)
RegexpMatchEasy0_32-4 1.07µs ± 7% 1.07µs ± 4% ~ (p=0.835 n=30+30)
RegexpMatchEasy0_1K-4 5.51µs ± 0% 5.49µs ± 0% -0.35% (p=0.000 n=23+26)
RegexpMatchEasy1_32-4 1.01µs ± 0% 1.02µs ± 4% +0.96% (p=0.014 n=24+30)
RegexpMatchEasy1_1K-4 7.43µs ± 0% 7.18µs ± 0% -3.41% (p=0.000 n=23+24)
RegexpMatchMedium_32-4 1.78µs ± 0% 1.81µs ± 4% +1.47% (p=0.012 n=23+30)
RegexpMatchMedium_1K-4 547µs ± 1% 542µs ± 3% -0.90% (p=0.003 n=24+30)
RegexpMatchHard_32-4 30.4µs ± 0% 29.7µs ± 0% -2.15% (p=0.000 n=19+23)
RegexpMatchHard_1K-4 913µs ± 0% 915µs ± 6% +0.25% (p=0.012 n=24+30)
Revcomp-4 6.32s ± 1% 6.42s ± 4% ~ (p=0.342 n=25+30)
Template-4 868ms ± 6% 878ms ± 6% +1.15% (p=0.000 n=30+30)
TimeParse-4 4.57µs ± 4% 4.59µs ± 3% +0.65% (p=0.010 n=29+30)
TimeFormat-4 4.51µs ± 0% 4.50µs ± 0% -0.27% (p=0.000 n=27+24)
[Geo mean] 695µs 689µs -0.92%
name old speed new speed delta
GobDecode-4 8.19MB/s ± 0% 8.12MB/s ± 4% ~ (p=0.680 n=24+30)
GobEncode-4 9.76MB/s ± 7% 9.96MB/s ± 0% ~ (p=0.616 n=30+23)
Gzip-4 4.84MB/s ± 0% 4.89MB/s ± 4% +1.16% (p=0.030 n=24+30)
Gunzip-4 49.9MB/s ± 4% 50.6MB/s ± 0% ~ (p=0.162 n=30+23)
JSONEncode-4 10.9MB/s ± 1% 10.7MB/s ± 6% ~ (p=0.575 n=24+30)
JSONDecode-4 2.30MB/s ± 0% 2.32MB/s ± 5% +0.72% (p=0.003 n=22+30)
GoParse-4 1.31MB/s ± 6% 1.34MB/s ± 0% +2.26% (p=0.002 n=30+27)
RegexpMatchEasy0_32-4 30.0MB/s ± 6% 30.0MB/s ± 4% ~ (p=1.000 n=30+30)
RegexpMatchEasy0_1K-4 186MB/s ± 0% 187MB/s ± 0% +0.35% (p=0.000 n=23+26)
RegexpMatchEasy1_32-4 31.8MB/s ± 0% 31.5MB/s ± 4% -0.92% (p=0.012 n=25+30)
RegexpMatchEasy1_1K-4 138MB/s ± 0% 143MB/s ± 0% +3.53% (p=0.000 n=23+24)
RegexpMatchMedium_32-4 560kB/s ± 0% 553kB/s ± 4% -1.19% (p=0.005 n=23+30)
RegexpMatchMedium_1K-4 1.87MB/s ± 0% 1.89MB/s ± 3% +1.04% (p=0.002 n=24+30)
RegexpMatchHard_32-4 1.05MB/s ± 0% 1.08MB/s ± 0% +2.40% (p=0.000 n=19+23)
RegexpMatchHard_1K-4 1.12MB/s ± 0% 1.12MB/s ± 5% +0.12% (p=0.006 n=25+30)
Revcomp-4 40.2MB/s ± 1% 39.6MB/s ± 4% ~ (p=0.242 n=25+30)
Template-4 2.24MB/s ± 6% 2.21MB/s ± 6% -1.15% (p=0.000 n=30+30)
[Geo mean] 7.87MB/s 7.91MB/s +0.44%
Change-Id: If374cb7abf83537aa0a176f73c0f736f7800db03
Reviewed-on: https://go-review.googlesource.com/108735
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 20:02:05 +00:00
Ben Shi
34f5f8a580
cmd/compile: optimize ARM64 with register indexed load/store
...
ARM64 supports load/store instructions with a memory operand that
the address is calculated by base register + index register.
In this CL,
1. Some rules are added to the compile's ARM64 backend to emit
such efficient instructions.
2. A wrong rule of load combination is fixed.
The go1 benchmark does show improvement.
name old time/op new time/op delta
BinaryTree17-4 44.5s ± 2% 44.1s ± 1% -0.81% (p=0.000 n=28+29)
Fannkuch11-4 32.7s ± 3% 30.5s ± 0% -6.79% (p=0.000 n=30+26)
FmtFprintfEmpty-4 499ns ± 0% 506ns ± 5% +1.39% (p=0.003 n=25+30)
FmtFprintfString-4 1.07µs ± 0% 1.04µs ± 4% -3.17% (p=0.000 n=23+30)
FmtFprintfInt-4 1.15µs ± 4% 1.13µs ± 0% -1.55% (p=0.000 n=30+23)
FmtFprintfIntInt-4 1.77µs ± 4% 1.74µs ± 0% -1.71% (p=0.000 n=30+24)
FmtFprintfPrefixedInt-4 2.37µs ± 5% 2.12µs ± 0% -10.56% (p=0.000 n=30+23)
FmtFprintfFloat-4 3.03µs ± 1% 3.03µs ± 4% -0.13% (p=0.003 n=25+30)
FmtManyArgs-4 7.38µs ± 1% 7.43µs ± 4% +0.59% (p=0.003 n=25+30)
GobDecode-4 101ms ± 6% 95ms ± 5% -5.55% (p=0.000 n=30+30)
GobEncode-4 78.0ms ± 4% 78.8ms ± 6% +1.05% (p=0.000 n=30+30)
Gzip-4 4.25s ± 0% 4.27s ± 4% +0.45% (p=0.003 n=24+30)
Gunzip-4 428ms ± 1% 420ms ± 0% -1.88% (p=0.000 n=23+23)
HTTPClientServer-4 549µs ± 1% 541µs ± 1% -1.56% (p=0.000 n=29+29)
JSONEncode-4 194ms ± 0% 188ms ± 4% ~ (p=0.417 n=23+30)
JSONDecode-4 890ms ± 5% 831ms ± 0% -6.55% (p=0.000 n=30+23)
Mandelbrot200-4 47.3ms ± 2% 46.5ms ± 0% ~ (p=0.980 n=30+26)
GoParse-4 43.1ms ± 6% 43.8ms ± 6% +1.65% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 1.06µs ± 0% 1.07µs ± 3% ~ (p=0.092 n=23+30)
RegexpMatchEasy0_1K-4 5.53µs ± 0% 5.51µs ± 0% -0.24% (p=0.000 n=25+25)
RegexpMatchEasy1_32-4 1.02µs ± 3% 1.01µs ± 0% -1.27% (p=0.000 n=30+24)
RegexpMatchEasy1_1K-4 7.26µs ± 0% 7.33µs ± 0% +0.95% (p=0.000 n=23+26)
RegexpMatchMedium_32-4 1.84µs ± 7% 1.79µs ± 1% ~ (p=0.333 n=30+23)
RegexpMatchMedium_1K-4 553µs ± 0% 547µs ± 0% -1.14% (p=0.000 n=24+22)
RegexpMatchHard_32-4 30.8µs ± 1% 30.3µs ± 0% -1.40% (p=0.000 n=24+24)
RegexpMatchHard_1K-4 928µs ± 0% 929µs ± 5% +0.12% (p=0.013 n=23+30)
Revcomp-4 8.13s ± 4% 6.32s ± 1% -22.23% (p=0.000 n=30+23)
Template-4 899ms ± 6% 854ms ± 1% -5.01% (p=0.000 n=30+24)
TimeParse-4 4.66µs ± 4% 4.59µs ± 1% -1.57% (p=0.000 n=30+23)
TimeFormat-4 4.58µs ± 0% 4.61µs ± 0% +0.57% (p=0.000 n=26+24)
[Geo mean] 717µs 698µs -2.55%
name old speed new speed delta
GobDecode-4 7.63MB/s ± 6% 8.08MB/s ± 5% +5.88% (p=0.000 n=30+30)
GobEncode-4 9.85MB/s ± 4% 9.75MB/s ± 6% -1.04% (p=0.000 n=30+30)
Gzip-4 4.56MB/s ± 0% 4.55MB/s ± 4% -0.36% (p=0.003 n=24+30)
Gunzip-4 45.3MB/s ± 1% 46.2MB/s ± 0% +1.92% (p=0.000 n=23+23)
JSONEncode-4 10.0MB/s ± 0% 10.4MB/s ± 4% ~ (p=0.403 n=23+30)
JSONDecode-4 2.18MB/s ± 5% 2.33MB/s ± 0% +6.91% (p=0.000 n=30+23)
GoParse-4 1.34MB/s ± 5% 1.32MB/s ± 5% -1.66% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 30.2MB/s ± 0% 29.8MB/s ± 3% ~ (p=0.099 n=23+30)
RegexpMatchEasy0_1K-4 185MB/s ± 0% 186MB/s ± 0% +0.24% (p=0.000 n=25+25)
RegexpMatchEasy1_32-4 31.4MB/s ± 3% 31.8MB/s ± 0% +1.24% (p=0.000 n=30+24)
RegexpMatchEasy1_1K-4 141MB/s ± 0% 140MB/s ± 0% -0.94% (p=0.000 n=23+26)
RegexpMatchMedium_32-4 541kB/s ± 6% 560kB/s ± 0% +3.45% (p=0.000 n=30+23)
RegexpMatchMedium_1K-4 1.85MB/s ± 0% 1.87MB/s ± 0% +1.08% (p=0.000 n=24+23)
RegexpMatchHard_32-4 1.04MB/s ± 1% 1.06MB/s ± 1% +1.48% (p=0.000 n=24+24)
RegexpMatchHard_1K-4 1.10MB/s ± 0% 1.10MB/s ± 5% +0.15% (p=0.004 n=23+30)
Revcomp-4 31.3MB/s ± 4% 40.2MB/s ± 1% +28.52% (p=0.000 n=30+23)
Template-4 2.16MB/s ± 6% 2.27MB/s ± 1% +5.18% (p=0.000 n=30+24)
[Geo mean] 7.57MB/s 7.79MB/s +2.98%
fixes #24907
Change-Id: I94afd0e3f53d62a1cf5e452f3dd6daf61be21785
Reviewed-on: https://go-review.googlesource.com/107376
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-19 15:08:10 +00:00
Alberto Donizetti
467eca6076
test/codegen: port last stack and memcombining tests
...
And delete them from asm_test.
Also delete an arm64 cmov test has been already ported to the new test
harness.
Change-Id: I4458721e1f512bc9ecbbe1c22a2c9c7109ad68fe
Reviewed-on: https://go-review.googlesource.com/106335
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-11 16:08:04 +00:00
Alberto Donizetti
54c3f56ee0
test/codegen: port various mem-combining tests
...
And delete them from asm_test.
Change-Id: I0e33d58274951ab5acb67b0117b60ef617ea887a
Reviewed-on: https://go-review.googlesource.com/105735
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-04-09 12:00:06 +00:00
Alberto Donizetti
3e31eb6b84
test/codegen: port arm64 slice zeroing tests
...
Finish porting arm64 slice zeroing codegen tests; delete them from
asm_test.
Change-Id: Id2532df8ba1c340fa662a6b5238daa3de30548be
Reviewed-on: https://go-review.googlesource.com/105136
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-07 09:55:51 +00:00
Alberto Donizetti
f2abca90a2
test/codegen: port arm64 byte slice zeroing tests
...
And delete them from asm_test.
Change-Id: Id533130470da9176a401cb94972f626f43a62148
Reviewed-on: https://go-review.googlesource.com/103656
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-04 13:18:15 +00:00
Wei Xiao
05962561ae
cmd/compile/internal/ssa: improve store combine optimization on arm64
...
Current implementation doesn't consider MOVDreg type operand and fail to combine
it into larger store. This patch fixes the issue.
Fixes #24242
Change-Id: I7d68697f80e76f48c3528ece01a602bf513248ec
Reviewed-on: https://go-review.googlesource.com/98397
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 20:29:04 +00:00
Giovanni Bajo
fad31e513d
test: move load/store combines into asmcheck
...
This CL moves the load/store combining tests into asmcheck.
In addition at being more compact, it's also now easier to
spot what it is missing in each architecture.
While doing so, I think I uncovered a bug in ppc64le and arm64
rules, because they fail to load/store combine in non-trivial
functions. Not sure why, I'll open an issue.
Change-Id: Ia1572d53c0553d9104f3e52b95e4d1768a8440a3
Reviewed-on: https://go-review.googlesource.com/98441
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-04 16:52:03 +00:00