Ben Shi
95dda75bde
cmd/compile: optimize store combination on 386/amd64
...
This CL add 3 rules to combine byte-store to word-store on386 and
amd64.
Change-Id: Iffd9cda42f1961680c81def4edc773ad58f211b3
Reviewed-on: https://go-review.googlesource.com/c/143057
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-19 02:21:04 +00:00
Ben Shi
4158734097
test/codegen: add more combined load/store test cases
...
This CL adds more combined load/store test cases for 386/amd64.
Change-Id: I0a483a6ed0212b65c5e84d67ed8c9f50c389ce2d
Reviewed-on: https://go-review.googlesource.com/c/142878
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-18 01:57:54 +00:00
Ben Shi
4b78fe57a8
cmd/compile: optimize 386's load/store combination
...
This CL adds more combinations of two consequtive MOVBload/MOVBstore
to a unique MOVWload/MOVWstore.
1. The size of the go executable decreases about 4KB, and the total
size of pkg/linux_386 (excluding cmd/compile) decreases about 1.5KB.
2. There is no regression in the go1 benchmark result, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 3.28s ± 2% 3.29s ± 2% ~ (p=0.151 n=40+40)
Fannkuch11-4 3.52s ± 1% 3.51s ± 1% -0.28% (p=0.002 n=40+40)
FmtFprintfEmpty-4 45.4ns ± 4% 45.0ns ± 4% -0.89% (p=0.019 n=40+40)
FmtFprintfString-4 81.9ns ± 7% 81.3ns ± 1% ~ (p=0.660 n=40+25)
FmtFprintfInt-4 91.9ns ± 9% 91.4ns ± 9% ~ (p=0.249 n=40+40)
FmtFprintfIntInt-4 143ns ± 4% 143ns ± 4% ~ (p=0.760 n=40+40)
FmtFprintfPrefixedInt-4 184ns ± 3% 183ns ± 4% ~ (p=0.485 n=40+40)
FmtFprintfFloat-4 408ns ± 3% 409ns ± 3% ~ (p=0.961 n=40+40)
FmtManyArgs-4 597ns ± 4% 602ns ± 3% ~ (p=0.413 n=40+40)
GobDecode-4 7.13ms ± 6% 7.14ms ± 6% ~ (p=0.859 n=40+40)
GobEncode-4 6.86ms ± 9% 6.94ms ± 7% ~ (p=0.162 n=40+40)
Gzip-4 395ms ± 4% 396ms ± 3% ~ (p=0.099 n=40+40)
Gunzip-4 40.9ms ± 4% 41.1ms ± 3% ~ (p=0.064 n=40+40)
HTTPClientServer-4 63.6µs ± 2% 63.6µs ± 3% ~ (p=0.832 n=36+39)
JSONEncode-4 16.1ms ± 3% 15.8ms ± 3% -1.60% (p=0.001 n=40+40)
JSONDecode-4 61.0ms ± 3% 61.5ms ± 4% ~ (p=0.065 n=40+40)
Mandelbrot200-4 5.16ms ± 3% 5.18ms ± 3% ~ (p=0.056 n=40+40)
GoParse-4 3.25ms ± 2% 3.23ms ± 3% ~ (p=0.727 n=40+40)
RegexpMatchEasy0_32-4 90.2ns ± 3% 89.3ns ± 6% -0.98% (p=0.002 n=40+40)
RegexpMatchEasy0_1K-4 812ns ± 3% 815ns ± 3% ~ (p=0.309 n=40+40)
RegexpMatchEasy1_32-4 103ns ± 6% 103ns ± 5% ~ (p=0.680 n=40+40)
RegexpMatchEasy1_1K-4 1.01µs ± 4% 1.02µs ± 3% ~ (p=0.326 n=40+33)
RegexpMatchMedium_32-4 120ns ± 4% 120ns ± 5% ~ (p=0.834 n=40+40)
RegexpMatchMedium_1K-4 40.1µs ± 3% 39.5µs ± 4% -1.35% (p=0.000 n=40+40)
RegexpMatchHard_32-4 2.27µs ± 6% 2.23µs ± 4% -1.67% (p=0.011 n=40+40)
RegexpMatchHard_1K-4 67.2µs ± 3% 67.2µs ± 3% ~ (p=0.149 n=40+40)
Revcomp-4 1.84s ± 2% 1.86s ± 3% +0.70% (p=0.020 n=40+40)
Template-4 69.0ms ± 4% 69.8ms ± 3% +1.20% (p=0.003 n=40+40)
TimeParse-4 438ns ± 3% 439ns ± 4% ~ (p=0.650 n=40+40)
TimeFormat-4 412ns ± 3% 412ns ± 3% ~ (p=0.888 n=40+40)
[Geo mean] 65.2µs 65.2µs -0.04%
name old speed new speed delta
GobDecode-4 108MB/s ± 6% 108MB/s ± 6% ~ (p=0.855 n=40+40)
GobEncode-4 112MB/s ± 9% 111MB/s ± 8% ~ (p=0.159 n=40+40)
Gzip-4 49.2MB/s ± 4% 49.1MB/s ± 3% ~ (p=0.102 n=40+40)
Gunzip-4 474MB/s ± 3% 472MB/s ± 3% ~ (p=0.063 n=40+40)
JSONEncode-4 121MB/s ± 3% 123MB/s ± 3% +1.62% (p=0.001 n=40+40)
JSONDecode-4 31.9MB/s ± 3% 31.6MB/s ± 4% ~ (p=0.070 n=40+40)
GoParse-4 17.9MB/s ± 2% 17.9MB/s ± 3% ~ (p=0.696 n=40+40)
RegexpMatchEasy0_32-4 355MB/s ± 3% 358MB/s ± 5% +0.99% (p=0.002 n=40+40)
RegexpMatchEasy0_1K-4 1.26GB/s ± 3% 1.26GB/s ± 3% ~ (p=0.381 n=40+40)
RegexpMatchEasy1_32-4 310MB/s ± 5% 310MB/s ± 4% ~ (p=0.655 n=40+40)
RegexpMatchEasy1_1K-4 1.01GB/s ± 4% 1.01GB/s ± 3% ~ (p=0.351 n=40+33)
RegexpMatchMedium_32-4 8.32MB/s ± 4% 8.34MB/s ± 5% ~ (p=0.696 n=40+40)
RegexpMatchMedium_1K-4 25.6MB/s ± 3% 25.9MB/s ± 4% +1.36% (p=0.000 n=40+40)
RegexpMatchHard_32-4 14.1MB/s ± 6% 14.3MB/s ± 4% +1.64% (p=0.011 n=40+40)
RegexpMatchHard_1K-4 15.2MB/s ± 3% 15.2MB/s ± 3% ~ (p=0.147 n=40+40)
Revcomp-4 138MB/s ± 2% 137MB/s ± 3% -0.70% (p=0.021 n=40+40)
Template-4 28.1MB/s ± 4% 27.8MB/s ± 3% -1.19% (p=0.003 n=40+40)
[Geo mean] 83.7MB/s 83.7MB/s +0.03%
Change-Id: I2a2b3a942b5c45467491515d201179fd192e65c9
Reviewed-on: https://go-review.googlesource.com/c/141650
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-16 07:17:11 +00:00
Ben Shi
bac6a2925c
test/codegen: add more arm64 test cases
...
This CL adds 3 combined load test cases for arm64.
Change-Id: I2c67308c40fd8a18f9f2d16c6d12911dcdc583e2
Reviewed-on: https://go-review.googlesource.com/c/140700
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-11 15:14:06 +00:00
Giovanni Bajo
f02cc88f46
test: relax whitespaces matching in codegen tests
...
The codegen testsuite uses regexp to parse the syntax, but it doesn't
have a way to tell line comments containing checks from line comments
containing English sentences. This means that any syntax error (that
is, non-matching regexp) is currently ignored and not reported.
There were some tests in memcombine.go that had an extraneous space
and were thus effectively disabled. It would be great if we could
report it as a syntax error, but for now we just punt and swallow the
spaces as a workaround, to avoid the same mistake again.
Fixes #25452
Change-Id: Ic7747a2278bc00adffd0c199ce40937acbbc9cf0
Reviewed-on: https://go-review.googlesource.com/113835
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-09-02 10:31:37 +00:00
Ben Shi
de0e72610b
test/codegen: add more combined store tests for arm64
...
Some combined store optimization was already implemented
in go-1.11, but there is no corresponding test cases.
Change-Id: Iebdad186e92047942e53a74f2c20b390922e1e9c
Reviewed-on: https://go-review.googlesource.com/122915
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-02 04:23:45 +00:00
Ben Shi
f9800a9473
test/codegen: add more test cases for arm64
...
More test cases of combined load for arm64.
Change-Id: I7a9f4dcec6930f161cbded1f47dbf7fcef1db4f1
Reviewed-on: https://go-review.googlesource.com/122582
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-07-10 15:41:15 +00:00
Ben Shi
8a85bce215
test/codegen: improve test cases for arm64
...
1. Some incorrect test cases are disabled.
2. Some wrong test cases are corrected.
3. Some new test cases are added.
Change-Id: Ib5d0473d55159f233ddab79f96967eaec7b08597
Reviewed-on: https://go-review.googlesource.com/113736
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-22 14:50:41 +00:00
Lynn Boger
28edaf4584
cmd/compile,test: combine byte loads and stores on ppc64le
...
CL 74410 added rules to combine consecutive byte loads and
stores when the byte order was little endian for ppc64le. This
is the corresponding change for bytes that are in big endian order.
These rules are all intended for a little endian target arch.
This adds new testcases in test/codegen/memcombine.go
Fixes #22496
Updates #24242
Benchmark improvement for encoding/binary:
name old time/op new time/op delta
ReadSlice1000Int32s-16 11.0µs ± 0% 9.0µs ± 0% -17.47% (p=0.029 n=4+4)
ReadStruct-16 2.47µs ± 1% 2.48µs ± 0% +0.67% (p=0.114 n=4+4)
ReadInts-16 642ns ± 1% 630ns ± 1% -2.02% (p=0.029 n=4+4)
WriteInts-16 654ns ± 0% 653ns ± 1% -0.08% (p=0.629 n=4+4)
WriteSlice1000Int32s-16 8.75µs ± 0% 8.20µs ± 0% -6.19% (p=0.029 n=4+4)
PutUint16-16 1.16ns ± 0% 0.93ns ± 0% -19.83% (p=0.029 n=4+4)
PutUint32-16 1.16ns ± 0% 0.93ns ± 0% -19.83% (p=0.029 n=4+4)
PutUint64-16 1.85ns ± 0% 0.93ns ± 0% -49.73% (p=0.029 n=4+4)
LittleEndianPutUint16-16 1.03ns ± 0% 0.93ns ± 0% -9.71% (p=0.029 n=4+4)
LittleEndianPutUint32-16 0.93ns ± 0% 0.93ns ± 0% ~ (all equal)
LittleEndianPutUint64-16 0.93ns ± 0% 0.93ns ± 0% ~ (all equal)
PutUvarint32-16 43.0ns ± 0% 43.1ns ± 0% +0.12% (p=0.429 n=4+4)
PutUvarint64-16 174ns ± 0% 175ns ± 0% +0.29% (p=0.429 n=4+4)
Updates made to functions in gcm.go to enable their matching. An existing
testcase prevents these functions from being replaced by those in encoding/binary
due to import dependencies.
Change-Id: Idb3bd1e6e7b12d86cd828fb29cb095848a3e485a
Reviewed-on: https://go-review.googlesource.com/98136
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 13:15:39 +00:00
Ben Shi
aaf73c6d1e
cmd/compile: optimize ARM64 with shifted register indexed load/store
...
ARM64 supports efficient instructions which combine shift, addition, load/store
together. Such as "MOVD (R0)(R1<<3), R2" and "MOVWU R6, (R4)(R1<<2)".
This CL optimizes the compiler to emit such efficient instuctions. And below
is some test data.
1. binary size before/after
binary size change
pkg/linux_arm64 +80.1KB
pkg/tool/linux_arm64 +121.9KB
go -4.3KB
gofmt -64KB
2. go1 benchmark
There is big improvement for the test case Fannkuch11, and slight
improvement for sme others, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 43.9s ± 2% 44.0s ± 2% ~ (p=0.820 n=30+30)
Fannkuch11-4 30.6s ± 2% 24.5s ± 3% -19.93% (p=0.000 n=25+30)
FmtFprintfEmpty-4 500ns ± 0% 499ns ± 0% -0.11% (p=0.000 n=23+25)
FmtFprintfString-4 1.03µs ± 0% 1.04µs ± 3% ~ (p=0.065 n=29+30)
FmtFprintfInt-4 1.15µs ± 3% 1.15µs ± 4% -0.56% (p=0.000 n=30+30)
FmtFprintfIntInt-4 1.80µs ± 5% 1.82µs ± 0% ~ (p=0.094 n=30+24)
FmtFprintfPrefixedInt-4 2.17µs ± 5% 2.20µs ± 0% ~ (p=0.100 n=30+23)
FmtFprintfFloat-4 3.08µs ± 3% 3.09µs ± 4% ~ (p=0.123 n=30+30)
FmtManyArgs-4 7.41µs ± 4% 7.17µs ± 1% -3.26% (p=0.000 n=30+23)
GobDecode-4 93.7ms ± 0% 94.7ms ± 4% ~ (p=0.685 n=24+30)
GobEncode-4 78.7ms ± 7% 77.1ms ± 0% ~ (p=0.729 n=30+23)
Gzip-4 4.01s ± 0% 3.97s ± 5% -1.11% (p=0.037 n=24+30)
Gunzip-4 389ms ± 4% 384ms ± 0% ~ (p=0.155 n=30+23)
HTTPClientServer-4 536µs ± 1% 537µs ± 1% ~ (p=0.236 n=30+30)
JSONEncode-4 179ms ± 1% 182ms ± 6% ~ (p=0.763 n=24+30)
JSONDecode-4 843ms ± 0% 839ms ± 6% -0.42% (p=0.003 n=25+30)
Mandelbrot200-4 46.5ms ± 0% 46.5ms ± 0% +0.02% (p=0.000 n=26+26)
GoParse-4 44.3ms ± 6% 43.3ms ± 0% ~ (p=0.067 n=30+27)
RegexpMatchEasy0_32-4 1.07µs ± 7% 1.07µs ± 4% ~ (p=0.835 n=30+30)
RegexpMatchEasy0_1K-4 5.51µs ± 0% 5.49µs ± 0% -0.35% (p=0.000 n=23+26)
RegexpMatchEasy1_32-4 1.01µs ± 0% 1.02µs ± 4% +0.96% (p=0.014 n=24+30)
RegexpMatchEasy1_1K-4 7.43µs ± 0% 7.18µs ± 0% -3.41% (p=0.000 n=23+24)
RegexpMatchMedium_32-4 1.78µs ± 0% 1.81µs ± 4% +1.47% (p=0.012 n=23+30)
RegexpMatchMedium_1K-4 547µs ± 1% 542µs ± 3% -0.90% (p=0.003 n=24+30)
RegexpMatchHard_32-4 30.4µs ± 0% 29.7µs ± 0% -2.15% (p=0.000 n=19+23)
RegexpMatchHard_1K-4 913µs ± 0% 915µs ± 6% +0.25% (p=0.012 n=24+30)
Revcomp-4 6.32s ± 1% 6.42s ± 4% ~ (p=0.342 n=25+30)
Template-4 868ms ± 6% 878ms ± 6% +1.15% (p=0.000 n=30+30)
TimeParse-4 4.57µs ± 4% 4.59µs ± 3% +0.65% (p=0.010 n=29+30)
TimeFormat-4 4.51µs ± 0% 4.50µs ± 0% -0.27% (p=0.000 n=27+24)
[Geo mean] 695µs 689µs -0.92%
name old speed new speed delta
GobDecode-4 8.19MB/s ± 0% 8.12MB/s ± 4% ~ (p=0.680 n=24+30)
GobEncode-4 9.76MB/s ± 7% 9.96MB/s ± 0% ~ (p=0.616 n=30+23)
Gzip-4 4.84MB/s ± 0% 4.89MB/s ± 4% +1.16% (p=0.030 n=24+30)
Gunzip-4 49.9MB/s ± 4% 50.6MB/s ± 0% ~ (p=0.162 n=30+23)
JSONEncode-4 10.9MB/s ± 1% 10.7MB/s ± 6% ~ (p=0.575 n=24+30)
JSONDecode-4 2.30MB/s ± 0% 2.32MB/s ± 5% +0.72% (p=0.003 n=22+30)
GoParse-4 1.31MB/s ± 6% 1.34MB/s ± 0% +2.26% (p=0.002 n=30+27)
RegexpMatchEasy0_32-4 30.0MB/s ± 6% 30.0MB/s ± 4% ~ (p=1.000 n=30+30)
RegexpMatchEasy0_1K-4 186MB/s ± 0% 187MB/s ± 0% +0.35% (p=0.000 n=23+26)
RegexpMatchEasy1_32-4 31.8MB/s ± 0% 31.5MB/s ± 4% -0.92% (p=0.012 n=25+30)
RegexpMatchEasy1_1K-4 138MB/s ± 0% 143MB/s ± 0% +3.53% (p=0.000 n=23+24)
RegexpMatchMedium_32-4 560kB/s ± 0% 553kB/s ± 4% -1.19% (p=0.005 n=23+30)
RegexpMatchMedium_1K-4 1.87MB/s ± 0% 1.89MB/s ± 3% +1.04% (p=0.002 n=24+30)
RegexpMatchHard_32-4 1.05MB/s ± 0% 1.08MB/s ± 0% +2.40% (p=0.000 n=19+23)
RegexpMatchHard_1K-4 1.12MB/s ± 0% 1.12MB/s ± 5% +0.12% (p=0.006 n=25+30)
Revcomp-4 40.2MB/s ± 1% 39.6MB/s ± 4% ~ (p=0.242 n=25+30)
Template-4 2.24MB/s ± 6% 2.21MB/s ± 6% -1.15% (p=0.000 n=30+30)
[Geo mean] 7.87MB/s 7.91MB/s +0.44%
Change-Id: If374cb7abf83537aa0a176f73c0f736f7800db03
Reviewed-on: https://go-review.googlesource.com/108735
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 20:02:05 +00:00
Ben Shi
34f5f8a580
cmd/compile: optimize ARM64 with register indexed load/store
...
ARM64 supports load/store instructions with a memory operand that
the address is calculated by base register + index register.
In this CL,
1. Some rules are added to the compile's ARM64 backend to emit
such efficient instructions.
2. A wrong rule of load combination is fixed.
The go1 benchmark does show improvement.
name old time/op new time/op delta
BinaryTree17-4 44.5s ± 2% 44.1s ± 1% -0.81% (p=0.000 n=28+29)
Fannkuch11-4 32.7s ± 3% 30.5s ± 0% -6.79% (p=0.000 n=30+26)
FmtFprintfEmpty-4 499ns ± 0% 506ns ± 5% +1.39% (p=0.003 n=25+30)
FmtFprintfString-4 1.07µs ± 0% 1.04µs ± 4% -3.17% (p=0.000 n=23+30)
FmtFprintfInt-4 1.15µs ± 4% 1.13µs ± 0% -1.55% (p=0.000 n=30+23)
FmtFprintfIntInt-4 1.77µs ± 4% 1.74µs ± 0% -1.71% (p=0.000 n=30+24)
FmtFprintfPrefixedInt-4 2.37µs ± 5% 2.12µs ± 0% -10.56% (p=0.000 n=30+23)
FmtFprintfFloat-4 3.03µs ± 1% 3.03µs ± 4% -0.13% (p=0.003 n=25+30)
FmtManyArgs-4 7.38µs ± 1% 7.43µs ± 4% +0.59% (p=0.003 n=25+30)
GobDecode-4 101ms ± 6% 95ms ± 5% -5.55% (p=0.000 n=30+30)
GobEncode-4 78.0ms ± 4% 78.8ms ± 6% +1.05% (p=0.000 n=30+30)
Gzip-4 4.25s ± 0% 4.27s ± 4% +0.45% (p=0.003 n=24+30)
Gunzip-4 428ms ± 1% 420ms ± 0% -1.88% (p=0.000 n=23+23)
HTTPClientServer-4 549µs ± 1% 541µs ± 1% -1.56% (p=0.000 n=29+29)
JSONEncode-4 194ms ± 0% 188ms ± 4% ~ (p=0.417 n=23+30)
JSONDecode-4 890ms ± 5% 831ms ± 0% -6.55% (p=0.000 n=30+23)
Mandelbrot200-4 47.3ms ± 2% 46.5ms ± 0% ~ (p=0.980 n=30+26)
GoParse-4 43.1ms ± 6% 43.8ms ± 6% +1.65% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 1.06µs ± 0% 1.07µs ± 3% ~ (p=0.092 n=23+30)
RegexpMatchEasy0_1K-4 5.53µs ± 0% 5.51µs ± 0% -0.24% (p=0.000 n=25+25)
RegexpMatchEasy1_32-4 1.02µs ± 3% 1.01µs ± 0% -1.27% (p=0.000 n=30+24)
RegexpMatchEasy1_1K-4 7.26µs ± 0% 7.33µs ± 0% +0.95% (p=0.000 n=23+26)
RegexpMatchMedium_32-4 1.84µs ± 7% 1.79µs ± 1% ~ (p=0.333 n=30+23)
RegexpMatchMedium_1K-4 553µs ± 0% 547µs ± 0% -1.14% (p=0.000 n=24+22)
RegexpMatchHard_32-4 30.8µs ± 1% 30.3µs ± 0% -1.40% (p=0.000 n=24+24)
RegexpMatchHard_1K-4 928µs ± 0% 929µs ± 5% +0.12% (p=0.013 n=23+30)
Revcomp-4 8.13s ± 4% 6.32s ± 1% -22.23% (p=0.000 n=30+23)
Template-4 899ms ± 6% 854ms ± 1% -5.01% (p=0.000 n=30+24)
TimeParse-4 4.66µs ± 4% 4.59µs ± 1% -1.57% (p=0.000 n=30+23)
TimeFormat-4 4.58µs ± 0% 4.61µs ± 0% +0.57% (p=0.000 n=26+24)
[Geo mean] 717µs 698µs -2.55%
name old speed new speed delta
GobDecode-4 7.63MB/s ± 6% 8.08MB/s ± 5% +5.88% (p=0.000 n=30+30)
GobEncode-4 9.85MB/s ± 4% 9.75MB/s ± 6% -1.04% (p=0.000 n=30+30)
Gzip-4 4.56MB/s ± 0% 4.55MB/s ± 4% -0.36% (p=0.003 n=24+30)
Gunzip-4 45.3MB/s ± 1% 46.2MB/s ± 0% +1.92% (p=0.000 n=23+23)
JSONEncode-4 10.0MB/s ± 0% 10.4MB/s ± 4% ~ (p=0.403 n=23+30)
JSONDecode-4 2.18MB/s ± 5% 2.33MB/s ± 0% +6.91% (p=0.000 n=30+23)
GoParse-4 1.34MB/s ± 5% 1.32MB/s ± 5% -1.66% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 30.2MB/s ± 0% 29.8MB/s ± 3% ~ (p=0.099 n=23+30)
RegexpMatchEasy0_1K-4 185MB/s ± 0% 186MB/s ± 0% +0.24% (p=0.000 n=25+25)
RegexpMatchEasy1_32-4 31.4MB/s ± 3% 31.8MB/s ± 0% +1.24% (p=0.000 n=30+24)
RegexpMatchEasy1_1K-4 141MB/s ± 0% 140MB/s ± 0% -0.94% (p=0.000 n=23+26)
RegexpMatchMedium_32-4 541kB/s ± 6% 560kB/s ± 0% +3.45% (p=0.000 n=30+23)
RegexpMatchMedium_1K-4 1.85MB/s ± 0% 1.87MB/s ± 0% +1.08% (p=0.000 n=24+23)
RegexpMatchHard_32-4 1.04MB/s ± 1% 1.06MB/s ± 1% +1.48% (p=0.000 n=24+24)
RegexpMatchHard_1K-4 1.10MB/s ± 0% 1.10MB/s ± 5% +0.15% (p=0.004 n=23+30)
Revcomp-4 31.3MB/s ± 4% 40.2MB/s ± 1% +28.52% (p=0.000 n=30+23)
Template-4 2.16MB/s ± 6% 2.27MB/s ± 1% +5.18% (p=0.000 n=30+24)
[Geo mean] 7.57MB/s 7.79MB/s +2.98%
fixes #24907
Change-Id: I94afd0e3f53d62a1cf5e452f3dd6daf61be21785
Reviewed-on: https://go-review.googlesource.com/107376
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-19 15:08:10 +00:00
Alberto Donizetti
467eca6076
test/codegen: port last stack and memcombining tests
...
And delete them from asm_test.
Also delete an arm64 cmov test has been already ported to the new test
harness.
Change-Id: I4458721e1f512bc9ecbbe1c22a2c9c7109ad68fe
Reviewed-on: https://go-review.googlesource.com/106335
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-11 16:08:04 +00:00
Alberto Donizetti
54c3f56ee0
test/codegen: port various mem-combining tests
...
And delete them from asm_test.
Change-Id: I0e33d58274951ab5acb67b0117b60ef617ea887a
Reviewed-on: https://go-review.googlesource.com/105735
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-04-09 12:00:06 +00:00
Alberto Donizetti
3e31eb6b84
test/codegen: port arm64 slice zeroing tests
...
Finish porting arm64 slice zeroing codegen tests; delete them from
asm_test.
Change-Id: Id2532df8ba1c340fa662a6b5238daa3de30548be
Reviewed-on: https://go-review.googlesource.com/105136
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-07 09:55:51 +00:00
Alberto Donizetti
f2abca90a2
test/codegen: port arm64 byte slice zeroing tests
...
And delete them from asm_test.
Change-Id: Id533130470da9176a401cb94972f626f43a62148
Reviewed-on: https://go-review.googlesource.com/103656
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-04 13:18:15 +00:00
Wei Xiao
05962561ae
cmd/compile/internal/ssa: improve store combine optimization on arm64
...
Current implementation doesn't consider MOVDreg type operand and fail to combine
it into larger store. This patch fixes the issue.
Fixes #24242
Change-Id: I7d68697f80e76f48c3528ece01a602bf513248ec
Reviewed-on: https://go-review.googlesource.com/98397
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 20:29:04 +00:00
Giovanni Bajo
fad31e513d
test: move load/store combines into asmcheck
...
This CL moves the load/store combining tests into asmcheck.
In addition at being more compact, it's also now easier to
spot what it is missing in each architecture.
While doing so, I think I uncovered a bug in ppc64le and arm64
rules, because they fail to load/store combine in non-trivial
functions. Not sure why, I'll open an issue.
Change-Id: Ia1572d53c0553d9104f3e52b95e4d1768a8440a3
Reviewed-on: https://go-review.googlesource.com/98441
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-04 16:52:03 +00:00