mirror of
https://github.com/golang/go
synced 2024-11-08 11:56:16 -07:00
3deb528199
20 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Agniva De Sarker
|
8acbe4c0b3 |
cmd/compile: optimize unsigned comparisons with 0/1 on wasm
Updates #21439 Change-Id: I0fbcde6e0c2fc368fe686b271670f9d8be4a7900 Reviewed-on: https://go-review.googlesource.com/c/go/+/247557 Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Richard Musiol <neelance@gmail.com> |
||
Junchen Li
|
06337823ef |
cmd/compile: optimize unsigned comparisons to 0/1 on arm64
For an unsigned integer, it's useful to convert its order test with 0/1 to its equality test with 0. We can save a comparison instruction that followed by a conditional branch on arm64 since it supports compare-with-zero-and-branch instructions. For example, if x > 0 { ... } else { ... } the original version: CMP $0, R0 BLS 9 the optimized version: CBZ R0, 8 Updates #21439 Change-Id: Id1de6f865f6aa72c5d45b29f7894818857288425 Reviewed-on: https://go-review.googlesource.com/c/go/+/246857 Reviewed-by: Keith Randall <khr@golang.org> |
||
Junchen Li
|
e30fbe3757 |
cmd/compile: optimize unsigned comparisons to 0
There are some architecture-independent rules in #21439, since an unsigned integer >= 0 is always true and < 0 is always false. This CL adds these optimizations to generic rules. Updates #21439 Change-Id: Iec7e3040b761ecb1e60908f764815fdd9bc62495 Reviewed-on: https://go-review.googlesource.com/c/go/+/246617 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
Xiangdong Ji
|
e031318ca6 |
cmd/compile: ARM comparisons with 0 incorrect on overflow
Some ARM rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI || EQ 4. GTnoov greater than NEQ & PL The patch also adds a few test cases to cover scenarios that are specific to ARM and fine-tunes the code generation tests for 'x-const'. For more details please refer to the previous fix on 64-bit ARM: https://go-review.googlesource.com/c/go/+/233097 Go1 perf, 'old' is the non-optimized version, that is removing all concerned rewriting rules. name old time/op new time/op delta BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8) Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8) FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8) FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8) FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7) FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8) FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8) FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8) FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7) GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8) GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7) Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8) Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8) HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8) JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7) JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7) Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ (p=0.852 n=6+8) GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8) RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7) RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8) RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8) RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8) Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8) Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8) TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8) TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8) name old speed new speed delta GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8) GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7) Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8) Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8) JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7) JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7) GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8) RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7) RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8) RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8) RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8) Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8) Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8) Fixes #39303 Updates #38740 Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36 Reviewed-on: https://go-review.googlesource.com/c/go/+/236637 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Xiangdong Ji
|
e8f5a33191 |
cmd/compile: fix incorrect rewriting to if condition
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag, in the following categories: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI || EQ 4. GTnoov greater than NEQ & PL The backend generates two consecutive branch instructions for 'LEnoov' and 'GTnoov' to model their expected behavior. A slight change to 'gc' and amd64/386 backends is made to unify the code generation. Add a test 'TestCondRewrite' as justification, it covers 32 incorrect rules identified on arm64, more might be needed on other arches, like 32-bit arm. Add two benchmarks profiling the aforementioned category 1&2 and category 3&4 separetely, we expect the first two categories will show performance improvement and the second will not result in visible regression compared with the non-optimized version. This change also updates TestFormats to support using %#x. Examples exhibiting where does the issue come from: 1: 'if x + 3 < 0' might be converted to: before: CMN $3, R0 BGE <else branch> // wrong branch is taken if 'x+3' overflows after: CMN $3, R0 BPL <else branch> 2: 'if y - 3 > 0' might be converted to: before: CMP $3, R0 BLE <else branch> // wrong branch is taken if 'y-3' underflows after: CMP $3, R0 BMI <else branch> BEQ <else branch> Benchmark data from different kinds of arm64 servers, 'old' is the non-optimized version (not the parent commit), generally the optimization version outperforms. S1: name old time/op new time/op delta CondRewrite/SoloJump 13.6ns ± 0% 12.9ns ± 0% -5.15% (p=0.000 n=10+10) CondRewrite/CombJump 13.8ns ± 1% 12.9ns ± 0% -6.32% (p=0.000 n=10+10) S2: name old time/op new time/op delta CondRewrite/SoloJump 11.6ns ± 0% 10.9ns ± 0% -6.03% (p=0.000 n=10+10) CondRewrite/CombJump 11.4ns ± 0% 10.8ns ± 1% -5.53% (p=0.000 n=10+10) S3: name old time/op new time/op delta CondRewrite/SoloJump 7.36ns ± 0% 7.50ns ± 0% +1.79% (p=0.000 n=9+10) CondRewrite/CombJump 7.35ns ± 0% 7.75ns ± 0% +5.51% (p=0.000 n=8+9) S4: name old time/op new time/op delta CondRewrite/SoloJump-224 11.5ns ± 1% 10.9ns ± 0% -4.97% (p=0.000 n=10+10) CondRewrite/CombJump-224 11.9ns ± 0% 11.5ns ± 0% -2.95% (p=0.000 n=10+10) S5: name old time/op new time/op delta CondRewrite/SoloJump 10.0ns ± 0% 10.0ns ± 0% -0.45% (p=0.000 n=9+10) CondRewrite/CombJump 9.93ns ± 0% 9.77ns ± 0% -1.53% (p=0.000 n=10+9) Go1 perf. data: name old time/op new time/op delta BinaryTree17 6.29s ± 1% 6.30s ± 1% ~ (p=1.000 n=5+5) Fannkuch11 5.40s ± 0% 5.40s ± 0% ~ (p=0.841 n=5+5) FmtFprintfEmpty 97.9ns ± 0% 98.9ns ± 3% ~ (p=0.937 n=4+5) FmtFprintfString 171ns ± 3% 171ns ± 2% ~ (p=0.754 n=5+5) FmtFprintfInt 212ns ± 0% 217ns ± 6% +2.55% (p=0.008 n=5+5) FmtFprintfIntInt 296ns ± 1% 297ns ± 2% ~ (p=0.516 n=5+5) FmtFprintfPrefixedInt 371ns ± 2% 374ns ± 7% ~ (p=1.000 n=5+5) FmtFprintfFloat 435ns ± 1% 439ns ± 2% ~ (p=0.056 n=5+5) FmtManyArgs 1.37µs ± 1% 1.36µs ± 1% ~ (p=0.730 n=5+5) GobDecode 14.6ms ± 4% 14.4ms ± 4% ~ (p=0.690 n=5+5) GobEncode 11.8ms ±20% 11.6ms ±15% ~ (p=1.000 n=5+5) Gzip 507ms ± 0% 491ms ± 0% -3.22% (p=0.008 n=5+5) Gunzip 73.8ms ± 0% 73.9ms ± 0% ~ (p=0.690 n=5+5) HTTPClientServer 116µs ± 0% 116µs ± 0% ~ (p=0.686 n=4+4) JSONEncode 21.8ms ± 1% 21.6ms ± 2% ~ (p=0.151 n=5+5) JSONDecode 104ms ± 1% 103ms ± 1% -1.08% (p=0.016 n=5+5) Mandelbrot200 9.53ms ± 0% 9.53ms ± 0% ~ (p=0.421 n=5+5) GoParse 7.55ms ± 1% 7.51ms ± 1% ~ (p=0.151 n=5+5) RegexpMatchEasy0_32 158ns ± 0% 158ns ± 0% ~ (all equal) RegexpMatchEasy0_1K 606ns ± 1% 608ns ± 3% ~ (p=0.937 n=5+5) RegexpMatchEasy1_32 143ns ± 0% 144ns ± 1% ~ (p=0.095 n=5+4) RegexpMatchEasy1_1K 927ns ± 2% 944ns ± 2% ~ (p=0.056 n=5+5) RegexpMatchMedium_32 16.0ns ± 0% 16.0ns ± 0% ~ (all equal) RegexpMatchMedium_1K 69.3µs ± 2% 69.7µs ± 0% ~ (p=0.690 n=5+5) RegexpMatchHard_32 3.73µs ± 0% 3.73µs ± 1% ~ (p=0.984 n=5+5) RegexpMatchHard_1K 111µs ± 1% 110µs ± 0% ~ (p=0.151 n=5+5) Revcomp 1.91s ±47% 1.77s ±68% ~ (p=1.000 n=5+5) Template 138ms ± 1% 138ms ± 1% ~ (p=1.000 n=5+5) TimeParse 787ns ± 2% 785ns ± 1% ~ (p=0.540 n=5+5) TimeFormat 729ns ± 1% 726ns ± 1% ~ (p=0.151 n=5+5) Updates #38740 Change-Id: I06c604874acdc1e63e66452dadee5df053045222 Reviewed-on: https://go-review.googlesource.com/c/go/+/233097 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> |
||
Agniva De Sarker
|
ecc7dd5469 |
test/codegen: fix wasm codegen breakage
i32.eqz instructions don't appear unless needed in if conditions anymore after CL 195204. I forgot to run the codegen tests while submitting the CL. Thanks to @martisch for catching it. Fixes #34442 Change-Id: I177b064b389be48e39d564849714d7a8839be13e Reviewed-on: https://go-review.googlesource.com/c/go/+/196580 Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> |
||
Ben Shi
|
1786ecd502 |
cmd/compile: eliminate WASM's redundant extension & wrapping
This CL eliminates unnecessary pairs of I32WrapI64 and I64ExtendI32U generated by the WASM backend for IF statements. And it makes the total size of pkg/js_wasm/ decreases about 490KB. Change-Id: I16b0abb686c4e30d5624323166ec2d0ec57dbe2d Reviewed-on: https://go-review.googlesource.com/c/go/+/191758 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Richard Musiol <neelance@gmail.com> |
||
Josh Bleecher Snyder
|
870cfe6484 |
test/codegen: gofmt
Change-Id: I33f5b5051e5f75aa264ec656926223c5a3c09c1b Reviewed-on: https://go-review.googlesource.com/c/go/+/167498 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matt Layher <mdlayher@gmail.com> |
||
Lynn Boger
|
4ae49b5921 |
cmd/compile: use ANDCC, ORCC, XORCC to avoid CMP on ppc64x
This change makes use of the cc versions of the AND, OR, XOR instructions, omitting the need for a CMP instruction. In many test programs and in the go binary, this reduces the size of 20-30 functions by at least 1 instruction, many in runtime. Testcase added to test/codegen/comparisons.go Change-Id: I6cc1ca8b80b065d7390749c625bc9784b0039adb Reviewed-on: https://go-review.googlesource.com/c/143059 Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Michael Munday <mike.munday@ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
Lynn Boger
|
39fa301bdc |
test/codegen: enable more tests for ppc64/ppc64le
Adding cases for ppc64,ppc64le to the codegen tests where appropriate. Change-Id: Idf8cbe88a4ab4406a4ef1ea777bd15a58b68f3ed Reviewed-on: https://go-review.googlesource.com/c/142557 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
||
Ben Shi
|
5aeecc4530 |
cmd/compile: optimize arm64's code with more shifted operations
This CL optimizes arm64's NEG/MVN/TST/CMN with a shifted operand. 1. The total size of pkg/android_arm64 decreases about 0.2KB, excluding cmd/compile/ . 2. The go1 benchmark shows no regression, excluding noise. name old time/op new time/op delta BinaryTree17-4 16.4s ± 1% 16.4s ± 1% ~ (p=0.914 n=29+29) Fannkuch11-4 8.72s ± 0% 8.72s ± 0% ~ (p=0.274 n=30+29) FmtFprintfEmpty-4 174ns ± 0% 174ns ± 0% ~ (all equal) FmtFprintfString-4 370ns ± 0% 370ns ± 0% ~ (all equal) FmtFprintfInt-4 419ns ± 0% 419ns ± 0% ~ (all equal) FmtFprintfIntInt-4 672ns ± 1% 675ns ± 2% ~ (p=0.217 n=28+30) FmtFprintfPrefixedInt-4 806ns ± 0% 806ns ± 0% ~ (p=0.402 n=30+28) FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% +0.02% (p=0.011 n=22+27) FmtManyArgs-4 2.67µs ± 0% 2.68µs ± 0% ~ (p=0.279 n=29+30) GobDecode-4 33.1ms ± 1% 33.1ms ± 0% ~ (p=0.052 n=28+29) GobEncode-4 29.6ms ± 0% 29.6ms ± 0% +0.08% (p=0.013 n=28+29) Gzip-4 1.38s ± 2% 1.39s ± 2% ~ (p=0.071 n=29+29) Gunzip-4 139ms ± 0% 139ms ± 0% ~ (p=0.265 n=29+29) HTTPClientServer-4 789µs ± 4% 785µs ± 4% ~ (p=0.206 n=29+28) JSONEncode-4 49.7ms ± 0% 49.6ms ± 0% -0.24% (p=0.000 n=30+30) JSONDecode-4 266ms ± 1% 267ms ± 1% +0.34% (p=0.000 n=30+30) Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.835 n=28+30) GoParse-4 15.9ms ± 0% 15.8ms ± 0% -0.29% (p=0.000 n=27+30) RegexpMatchEasy0_32-4 380ns ± 0% 381ns ± 0% +0.18% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 1.18µs ± 0% 1.19µs ± 0% +0.23% (p=0.000 n=30+30) RegexpMatchEasy1_32-4 357ns ± 0% 358ns ± 0% +0.28% (p=0.000 n=29+29) RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.04µs ± 0% +0.06% (p=0.006 n=30+30) RegexpMatchMedium_32-4 589ns ± 0% 590ns ± 0% +0.24% (p=0.000 n=28+30) RegexpMatchMedium_1K-4 162µs ± 0% 162µs ± 0% -0.01% (p=0.027 n=26+29) RegexpMatchHard_32-4 9.58µs ± 0% 9.58µs ± 0% ~ (p=0.935 n=30+30) RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% ~ (p=0.387 n=29+30) Revcomp-4 2.50s ± 0% 2.50s ± 0% -0.10% (p=0.020 n=28+28) Template-4 310ms ± 0% 310ms ± 1% ~ (p=0.406 n=30+30) TimeParse-4 1.68µs ± 0% 1.68µs ± 0% +0.03% (p=0.014 n=30+17) TimeFormat-4 1.65µs ± 0% 1.66µs ± 0% +0.32% (p=0.000 n=27+29) [Geo mean] 247µs 247µs +0.05% name old speed new speed delta GobDecode-4 23.2MB/s ± 0% 23.2MB/s ± 0% -0.08% (p=0.032 n=27+29) GobEncode-4 26.0MB/s ± 0% 25.9MB/s ± 0% -0.10% (p=0.011 n=29+29) Gzip-4 14.1MB/s ± 2% 14.0MB/s ± 2% ~ (p=0.081 n=29+29) Gunzip-4 139MB/s ± 0% 139MB/s ± 0% ~ (p=0.290 n=29+29) JSONEncode-4 39.0MB/s ± 0% 39.1MB/s ± 0% +0.25% (p=0.000 n=29+30) JSONDecode-4 7.30MB/s ± 1% 7.28MB/s ± 1% -0.33% (p=0.000 n=30+30) GoParse-4 3.65MB/s ± 0% 3.66MB/s ± 0% +0.29% (p=0.000 n=27+30) RegexpMatchEasy0_32-4 84.1MB/s ± 0% 84.0MB/s ± 0% -0.17% (p=0.000 n=30+28) RegexpMatchEasy0_1K-4 864MB/s ± 0% 862MB/s ± 0% -0.24% (p=0.000 n=30+30) RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.3MB/s ± 0% -0.18% (p=0.000 n=28+24) RegexpMatchEasy1_1K-4 502MB/s ± 0% 502MB/s ± 0% -0.05% (p=0.008 n=30+29) RegexpMatchMedium_32-4 1.70MB/s ± 0% 1.69MB/s ± 0% -0.59% (p=0.000 n=29+30) RegexpMatchMedium_1K-4 6.31MB/s ± 0% 6.31MB/s ± 0% +0.05% (p=0.005 n=30+26) RegexpMatchHard_32-4 3.34MB/s ± 0% 3.34MB/s ± 0% ~ (all equal) RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.57MB/s ± 0% ~ (all equal) Revcomp-4 102MB/s ± 0% 102MB/s ± 0% +0.10% (p=0.022 n=28+28) Template-4 6.26MB/s ± 0% 6.26MB/s ± 1% ~ (p=0.768 n=30+30) [Geo mean] 24.2MB/s 24.1MB/s -0.08% Change-Id: I494f9db7f8a568a00e9c74ae25086a58b2221683 Reviewed-on: https://go-review.googlesource.com/137976 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Ben Shi
|
031a35ec84 |
cmd/compile: optimize 386's comparison
Optimization of "(CMPconst [0] (ANDL x y)) -> (TESTL x y)" only get benefits if there is no further use of the result of x&y. A condition of uses==1 will have slight improvements. 1. The code size of pkg/linux_386 decreases about 300 bytes, excluding cmd/compile/. 2. The go1 benchmark shows no regression, and even a slight improvement in test case FmtFprintfEmpty-4, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.34s ± 3% 3.32s ± 2% ~ (p=0.197 n=30+30) Fannkuch11-4 3.48s ± 2% 3.47s ± 1% -0.33% (p=0.015 n=30+30) FmtFprintfEmpty-4 46.3ns ± 4% 44.8ns ± 4% -3.33% (p=0.000 n=30+30) FmtFprintfString-4 78.8ns ± 7% 77.3ns ± 5% ~ (p=0.098 n=30+26) FmtFprintfInt-4 90.2ns ± 1% 90.0ns ± 7% -0.23% (p=0.027 n=18+30) FmtFprintfIntInt-4 144ns ± 4% 143ns ± 5% ~ (p=0.945 n=30+29) FmtFprintfPrefixedInt-4 180ns ± 4% 180ns ± 5% ~ (p=0.858 n=30+30) FmtFprintfFloat-4 409ns ± 4% 406ns ± 3% -0.87% (p=0.028 n=30+30) FmtManyArgs-4 611ns ± 5% 608ns ± 4% ~ (p=0.812 n=30+30) GobDecode-4 7.30ms ± 5% 7.26ms ± 5% ~ (p=0.522 n=30+29) GobEncode-4 6.90ms ± 7% 6.82ms ± 4% ~ (p=0.086 n=29+28) Gzip-4 396ms ± 4% 400ms ± 4% +0.99% (p=0.026 n=30+30) Gunzip-4 41.1ms ± 3% 41.2ms ± 3% ~ (p=0.495 n=30+30) HTTPClientServer-4 63.7µs ± 3% 63.3µs ± 2% ~ (p=0.113 n=29+29) JSONEncode-4 16.1ms ± 2% 16.1ms ± 2% -0.30% (p=0.041 n=30+30) JSONDecode-4 60.9ms ± 3% 61.2ms ± 6% ~ (p=0.187 n=30+30) Mandelbrot200-4 5.17ms ± 2% 5.19ms ± 3% ~ (p=0.676 n=30+30) GoParse-4 3.28ms ± 3% 3.25ms ± 2% -0.97% (p=0.002 n=30+30) RegexpMatchEasy0_32-4 103ns ± 4% 104ns ± 4% ~ (p=0.352 n=30+30) RegexpMatchEasy0_1K-4 849ns ± 2% 845ns ± 2% ~ (p=0.381 n=30+30) RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 4% ~ (p=0.795 n=30+30) RegexpMatchEasy1_1K-4 1.03µs ± 3% 1.03µs ± 4% ~ (p=0.275 n=25+30) RegexpMatchMedium_32-4 132ns ± 3% 132ns ± 3% ~ (p=0.970 n=30+30) RegexpMatchMedium_1K-4 41.4µs ± 3% 41.4µs ± 3% ~ (p=0.212 n=30+30) RegexpMatchHard_32-4 2.22µs ± 4% 2.22µs ± 4% ~ (p=0.399 n=30+30) RegexpMatchHard_1K-4 67.2µs ± 3% 67.6µs ± 4% ~ (p=0.359 n=30+30) Revcomp-4 1.84s ± 2% 1.83s ± 2% ~ (p=0.532 n=30+30) Template-4 69.1ms ± 4% 68.8ms ± 3% ~ (p=0.146 n=30+30) TimeParse-4 441ns ± 3% 442ns ± 3% ~ (p=0.154 n=30+30) TimeFormat-4 413ns ± 3% 414ns ± 3% ~ (p=0.275 n=30+30) [Geo mean] 66.2µs 66.0µs -0.28% name old speed new speed delta GobDecode-4 105MB/s ± 5% 106MB/s ± 5% ~ (p=0.514 n=30+29) GobEncode-4 111MB/s ± 5% 113MB/s ± 4% +1.37% (p=0.046 n=28+28) Gzip-4 49.1MB/s ± 4% 48.6MB/s ± 4% -0.98% (p=0.028 n=30+30) Gunzip-4 472MB/s ± 4% 472MB/s ± 3% ~ (p=0.496 n=30+30) JSONEncode-4 120MB/s ± 2% 121MB/s ± 2% +0.29% (p=0.042 n=30+30) JSONDecode-4 31.9MB/s ± 3% 31.7MB/s ± 6% ~ (p=0.186 n=30+30) GoParse-4 17.6MB/s ± 3% 17.8MB/s ± 2% +0.98% (p=0.002 n=30+30) RegexpMatchEasy0_32-4 309MB/s ± 4% 307MB/s ± 4% ~ (p=0.501 n=30+30) RegexpMatchEasy0_1K-4 1.21GB/s ± 2% 1.21GB/s ± 2% ~ (p=0.301 n=30+30) RegexpMatchEasy1_32-4 283MB/s ± 4% 282MB/s ± 3% ~ (p=0.877 n=30+30) RegexpMatchEasy1_1K-4 1.00GB/s ± 3% 0.99GB/s ± 4% ~ (p=0.276 n=25+30) RegexpMatchMedium_32-4 7.54MB/s ± 3% 7.55MB/s ± 3% ~ (p=0.528 n=30+30) RegexpMatchMedium_1K-4 24.7MB/s ± 3% 24.7MB/s ± 3% ~ (p=0.203 n=30+30) RegexpMatchHard_32-4 14.4MB/s ± 4% 14.4MB/s ± 4% ~ (p=0.407 n=30+30) RegexpMatchHard_1K-4 15.3MB/s ± 3% 15.1MB/s ± 4% ~ (p=0.306 n=30+30) Revcomp-4 138MB/s ± 2% 139MB/s ± 2% ~ (p=0.520 n=30+30) Template-4 28.1MB/s ± 4% 28.2MB/s ± 3% ~ (p=0.149 n=30+30) [Geo mean] 81.5MB/s 81.5MB/s +0.06% Change-Id: I7f75425f79eec93cdd8fdd94db13ad4f61b6a2f5 Reviewed-on: https://go-review.googlesource.com/133657 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Ben Shi
|
067dfce21f |
cmd/compile: optimize ARM's comparision
Optimize (CMPconst [0] (ADD x y)) to (CMN x y) will only get benefits when the result of the addition is no longer used, otherwise there might be even performance drop. And this CL fixes that issue for CMP/CMN/TST/TEQ. There is little regression in the go1 benchmark (excluding noise), and the test case JSONDecode-4 even gets improvement. name old time/op new time/op delta BinaryTree17-4 21.6s ± 1% 21.6s ± 0% -0.22% (p=0.013 n=30+30) Fannkuch11-4 11.1s ± 0% 11.1s ± 0% +0.11% (p=0.000 n=30+29) FmtFprintfEmpty-4 297ns ± 0% 297ns ± 0% +0.08% (p=0.007 n=26+28) FmtFprintfString-4 589ns ± 1% 589ns ± 0% ~ (p=0.659 n=30+25) FmtFprintfInt-4 644ns ± 1% 650ns ± 0% +0.88% (p=0.000 n=30+24) FmtFprintfIntInt-4 964ns ± 0% 977ns ± 0% +1.33% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 1.06µs ± 0% 1.07µs ± 0% +1.31% (p=0.000 n=29+27) FmtFprintfFloat-4 1.89µs ± 0% 1.92µs ± 0% +1.25% (p=0.000 n=29+29) FmtManyArgs-4 3.63µs ± 0% 3.67µs ± 0% +1.33% (p=0.000 n=29+27) GobDecode-4 38.1ms ± 1% 37.9ms ± 1% -0.60% (p=0.000 n=29+29) GobEncode-4 35.3ms ± 2% 35.2ms ± 1% ~ (p=0.286 n=30+30) Gzip-4 2.36s ± 0% 2.37s ± 2% ~ (p=0.277 n=24+28) Gunzip-4 264ms ± 1% 264ms ± 1% ~ (p=0.104 n=28+30) HTTPClientServer-4 1.04ms ± 4% 1.02ms ± 4% -1.65% (p=0.000 n=28+28) JSONEncode-4 78.5ms ± 1% 79.6ms ± 1% +1.34% (p=0.000 n=27+28) JSONDecode-4 379ms ± 4% 352ms ± 5% -7.09% (p=0.000 n=29+30) Mandelbrot200-4 17.6ms ± 0% 17.6ms ± 0% ~ (p=0.206 n=28+29) GoParse-4 21.9ms ± 1% 22.1ms ± 1% +0.87% (p=0.000 n=28+26) RegexpMatchEasy0_32-4 631ns ± 0% 641ns ± 0% +1.63% (p=0.000 n=29+30) RegexpMatchEasy0_1K-4 4.11µs ± 0% 4.11µs ± 0% ~ (p=0.700 n=30+30) RegexpMatchEasy1_32-4 670ns ± 0% 679ns ± 0% +1.37% (p=0.000 n=21+30) RegexpMatchEasy1_1K-4 5.31µs ± 0% 5.26µs ± 0% -1.03% (p=0.000 n=25+28) RegexpMatchMedium_32-4 905ns ± 0% 906ns ± 0% +0.14% (p=0.001 n=30+30) RegexpMatchMedium_1K-4 192µs ± 0% 191µs ± 0% -0.45% (p=0.000 n=29+27) RegexpMatchHard_32-4 11.8µs ± 0% 11.7µs ± 0% -0.39% (p=0.000 n=29+28) RegexpMatchHard_1K-4 347µs ± 0% 347µs ± 0% ~ (p=0.084 n=29+30) Revcomp-4 37.5ms ± 1% 37.5ms ± 1% ~ (p=0.279 n=29+29) Template-4 519ms ± 2% 519ms ± 2% ~ (p=0.652 n=28+29) TimeParse-4 2.83µs ± 0% 2.78µs ± 0% -1.90% (p=0.000 n=27+28) TimeFormat-4 5.79µs ± 0% 5.60µs ± 0% -3.23% (p=0.000 n=29+29) [Geo mean] 331µs 330µs -0.16% name old speed new speed delta GobDecode-4 20.1MB/s ± 1% 20.3MB/s ± 1% +0.61% (p=0.000 n=29+29) GobEncode-4 21.7MB/s ± 2% 21.8MB/s ± 1% ~ (p=0.294 n=30+30) Gzip-4 8.23MB/s ± 1% 8.20MB/s ± 2% ~ (p=0.099 n=26+28) Gunzip-4 73.5MB/s ± 1% 73.4MB/s ± 1% ~ (p=0.107 n=28+30) JSONEncode-4 24.7MB/s ± 1% 24.4MB/s ± 1% -1.32% (p=0.000 n=27+28) JSONDecode-4 5.13MB/s ± 4% 5.52MB/s ± 5% +7.65% (p=0.000 n=29+30) GoParse-4 2.65MB/s ± 1% 2.63MB/s ± 1% -0.87% (p=0.000 n=28+26) RegexpMatchEasy0_32-4 50.7MB/s ± 0% 49.9MB/s ± 0% -1.58% (p=0.000 n=29+29) RegexpMatchEasy0_1K-4 249MB/s ± 0% 249MB/s ± 0% ~ (p=0.342 n=30+28) RegexpMatchEasy1_32-4 47.7MB/s ± 0% 47.1MB/s ± 0% -1.39% (p=0.000 n=26+30) RegexpMatchEasy1_1K-4 193MB/s ± 0% 195MB/s ± 0% +1.04% (p=0.000 n=25+28) RegexpMatchMedium_32-4 1.10MB/s ± 0% 1.10MB/s ± 0% -0.42% (p=0.000 n=30+26) RegexpMatchMedium_1K-4 5.33MB/s ± 0% 5.36MB/s ± 0% +0.43% (p=0.000 n=29+29) RegexpMatchHard_32-4 2.72MB/s ± 0% 2.73MB/s ± 0% +0.37% (p=0.000 n=29+30) RegexpMatchHard_1K-4 2.95MB/s ± 0% 2.95MB/s ± 0% ~ (all equal) Revcomp-4 67.8MB/s ± 1% 67.7MB/s ± 1% ~ (p=0.273 n=29+29) Template-4 3.74MB/s ± 2% 3.74MB/s ± 2% ~ (p=0.665 n=28+29) [Geo mean] 15.2MB/s 15.2MB/s +0.21% Change-Id: Ifed1fb8cc02d5ca52c8bc6c21b6b5bf6dbb2701a Reviewed-on: https://go-review.googlesource.com/132115 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Ben Shi
|
0e9f1de0b7 |
cmd/compile: optimize arm64's comparison
Add more optimization with TST/CMN. 1. A tiny benchmark shows more than 12% improvement. TSTCMN-4 378µs ± 0% 332µs ± 0% -12.15% (p=0.000 n=30+27) (https://github.com/benshi001/ugo1/blob/master/tstcmn_test.go) 2. There is little regression in the go1 benchmark, excluding noise. name old time/op new time/op delta BinaryTree17-4 19.1s ± 0% 19.1s ± 0% ~ (p=0.994 n=28+29) Fannkuch11-4 10.0s ± 0% 10.0s ± 0% ~ (p=0.198 n=30+25) FmtFprintfEmpty-4 233ns ± 0% 233ns ± 0% +0.14% (p=0.002 n=24+30) FmtFprintfString-4 428ns ± 0% 428ns ± 0% ~ (all equal) FmtFprintfInt-4 472ns ± 0% 472ns ± 0% ~ (all equal) FmtFprintfIntInt-4 725ns ± 0% 725ns ± 0% ~ (all equal) FmtFprintfPrefixedInt-4 889ns ± 0% 888ns ± 0% ~ (p=0.632 n=28+30) FmtFprintfFloat-4 1.20µs ± 0% 1.20µs ± 0% +0.05% (p=0.001 n=18+30) FmtManyArgs-4 3.00µs ± 0% 2.99µs ± 0% -0.07% (p=0.001 n=27+30) GobDecode-4 42.1ms ± 0% 42.2ms ± 0% +0.29% (p=0.000 n=28+28) GobEncode-4 38.6ms ± 9% 38.8ms ± 9% ~ (p=0.912 n=30+30) Gzip-4 2.07s ± 1% 2.05s ± 1% -0.64% (p=0.000 n=29+30) Gunzip-4 175ms ± 0% 175ms ± 0% -0.15% (p=0.001 n=30+30) HTTPClientServer-4 872µs ± 5% 880µs ± 6% ~ (p=0.196 n=30+29) JSONEncode-4 88.5ms ± 1% 89.8ms ± 1% +1.49% (p=0.000 n=23+24) JSONDecode-4 393ms ± 1% 390ms ± 1% -0.89% (p=0.000 n=28+30) Mandelbrot200-4 19.5ms ± 0% 19.5ms ± 0% ~ (p=0.405 n=29+28) GoParse-4 19.9ms ± 0% 20.0ms ± 0% +0.27% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 431ns ± 0% 431ns ± 0% ~ (p=1.000 n=30+30) RegexpMatchEasy0_1K-4 1.61µs ± 0% 1.61µs ± 0% ~ (p=0.527 n=26+26) RegexpMatchEasy1_32-4 443ns ± 0% 443ns ± 0% ~ (all equal) RegexpMatchEasy1_1K-4 2.58µs ± 1% 2.58µs ± 1% ~ (p=0.578 n=27+25) RegexpMatchMedium_32-4 740ns ± 0% 740ns ± 0% ~ (p=0.357 n=30+30) RegexpMatchMedium_1K-4 223µs ± 0% 223µs ± 0% +0.16% (p=0.000 n=30+29) RegexpMatchHard_32-4 12.3µs ± 0% 12.3µs ± 0% ~ (p=0.236 n=27+27) RegexpMatchHard_1K-4 371µs ± 0% 371µs ± 0% +0.09% (p=0.000 n=30+27) Revcomp-4 2.85s ± 0% 2.85s ± 0% ~ (p=0.057 n=28+25) Template-4 408ms ± 1% 409ms ± 1% ~ (p=0.117 n=29+29) TimeParse-4 1.93µs ± 0% 1.93µs ± 0% ~ (p=0.535 n=29+28) TimeFormat-4 1.99µs ± 0% 1.99µs ± 0% ~ (p=0.168 n=29+28) [Geo mean] 306µs 307µs +0.07% name old speed new speed delta GobDecode-4 18.3MB/s ± 0% 18.2MB/s ± 0% -0.31% (p=0.000 n=28+29) GobEncode-4 19.9MB/s ± 8% 19.8MB/s ± 9% ~ (p=0.923 n=30+30) Gzip-4 9.39MB/s ± 1% 9.45MB/s ± 1% +0.65% (p=0.000 n=29+30) Gunzip-4 111MB/s ± 0% 111MB/s ± 0% +0.15% (p=0.001 n=30+30) JSONEncode-4 21.9MB/s ± 1% 21.6MB/s ± 1% -1.45% (p=0.000 n=23+23) JSONDecode-4 4.94MB/s ± 1% 4.98MB/s ± 1% +0.84% (p=0.000 n=27+30) GoParse-4 2.91MB/s ± 0% 2.90MB/s ± 0% -0.34% (p=0.000 n=21+22) RegexpMatchEasy0_32-4 74.1MB/s ± 0% 74.1MB/s ± 0% ~ (p=0.469 n=29+28) RegexpMatchEasy0_1K-4 634MB/s ± 0% 634MB/s ± 0% ~ (p=0.978 n=24+28) RegexpMatchEasy1_32-4 72.2MB/s ± 0% 72.2MB/s ± 0% ~ (p=0.064 n=27+29) RegexpMatchEasy1_1K-4 396MB/s ± 1% 396MB/s ± 1% ~ (p=0.583 n=27+25) RegexpMatchMedium_32-4 1.35MB/s ± 0% 1.35MB/s ± 0% ~ (all equal) RegexpMatchMedium_1K-4 4.60MB/s ± 0% 4.59MB/s ± 0% -0.14% (p=0.000 n=30+26) RegexpMatchHard_32-4 2.61MB/s ± 0% 2.61MB/s ± 0% ~ (all equal) RegexpMatchHard_1K-4 2.76MB/s ± 0% 2.76MB/s ± 0% ~ (all equal) Revcomp-4 89.1MB/s ± 0% 89.1MB/s ± 0% ~ (p=0.059 n=28+25) Template-4 4.75MB/s ± 1% 4.75MB/s ± 1% ~ (p=0.106 n=29+29) [Geo mean] 18.3MB/s 18.3MB/s -0.07% Change-Id: I3cd76ce63e84b0c3cebabf9fa3573b76a7343899 Reviewed-on: https://go-review.googlesource.com/124935 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Ben Shi
|
2b69ad0b3c |
cmd/compile: optimize arm's comparison
The CMP/CMN/TST/TEQ perform similar to SUB/ADD/AND/XOR except the result is abondoned, and only NZCV flags are affected. This CL implements further optimization with them. 1. A micro benchmark test gets more than 9% improvment. TSTTEQ-4 6.99ms ± 0% 6.35ms ± 0% -9.15% (p=0.000 n=33+36) (https://github.com/benshi001/ugo1/blob/master/tstteq2_test.go) 2. The go1 benckmark shows no regression, excluding noise. name old time/op new time/op delta BinaryTree17-4 25.7s ± 1% 25.7s ± 1% ~ (p=0.830 n=40+40) Fannkuch11-4 13.3s ± 0% 13.2s ± 0% -0.65% (p=0.000 n=40+34) FmtFprintfEmpty-4 394ns ± 0% 394ns ± 0% ~ (p=0.819 n=40+40) FmtFprintfString-4 677ns ± 0% 677ns ± 0% +0.06% (p=0.039 n=39+40) FmtFprintfInt-4 707ns ± 0% 706ns ± 0% -0.14% (p=0.000 n=40+39) FmtFprintfIntInt-4 1.04µs ± 0% 1.04µs ± 0% +0.10% (p=0.000 n=29+31) FmtFprintfPrefixedInt-4 1.10µs ± 0% 1.11µs ± 0% +0.65% (p=0.000 n=39+37) FmtFprintfFloat-4 2.27µs ± 0% 2.26µs ± 0% -0.53% (p=0.000 n=39+40) FmtManyArgs-4 3.96µs ± 0% 3.96µs ± 0% +0.10% (p=0.000 n=39+40) GobDecode-4 53.4ms ± 1% 52.8ms ± 2% -1.10% (p=0.000 n=39+39) GobEncode-4 50.3ms ± 3% 50.4ms ± 2% ~ (p=0.089 n=40+39) Gzip-4 2.62s ± 0% 2.64s ± 0% +0.60% (p=0.000 n=40+39) Gunzip-4 312ms ± 0% 312ms ± 0% +0.02% (p=0.030 n=40+39) HTTPClientServer-4 1.01ms ± 7% 0.98ms ± 7% -2.37% (p=0.000 n=40+39) JSONEncode-4 126ms ± 1% 126ms ± 1% -0.38% (p=0.004 n=39+39) JSONDecode-4 423ms ± 0% 426ms ± 2% +0.72% (p=0.001 n=39+40) Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% +0.04% (p=0.000 n=38+40) GoParse-4 22.8ms ± 0% 22.6ms ± 0% -0.68% (p=0.000 n=35+40) RegexpMatchEasy0_32-4 699ns ± 0% 704ns ± 0% +0.73% (p=0.000 n=27+40) RegexpMatchEasy0_1K-4 4.27µs ± 0% 4.26µs ± 0% -0.09% (p=0.000 n=35+38) RegexpMatchEasy1_32-4 741ns ± 0% 735ns ± 0% -0.85% (p=0.000 n=40+35) RegexpMatchEasy1_1K-4 5.53µs ± 0% 5.49µs ± 0% -0.69% (p=0.000 n=39+40) RegexpMatchMedium_32-4 1.07µs ± 0% 1.04µs ± 2% -2.34% (p=0.000 n=40+40) RegexpMatchMedium_1K-4 261µs ± 0% 261µs ± 0% -0.16% (p=0.000 n=40+39) RegexpMatchHard_32-4 14.9µs ± 0% 14.9µs ± 0% -0.18% (p=0.000 n=39+40) RegexpMatchHard_1K-4 445µs ± 0% 446µs ± 0% +0.09% (p=0.000 n=36+34) Revcomp-4 41.8ms ± 1% 41.8ms ± 1% ~ (p=0.595 n=39+38) Template-4 530ms ± 1% 528ms ± 1% -0.49% (p=0.000 n=40+40) TimeParse-4 3.39µs ± 0% 3.42µs ± 0% +0.98% (p=0.000 n=36+38) TimeFormat-4 6.12µs ± 0% 6.07µs ± 0% -0.81% (p=0.000 n=34+38) [Geo mean] 384µs 383µs -0.24% name old speed new speed delta GobDecode-4 14.4MB/s ± 1% 14.5MB/s ± 2% +1.11% (p=0.000 n=39+39) GobEncode-4 15.3MB/s ± 3% 15.2MB/s ± 2% ~ (p=0.104 n=40+39) Gzip-4 7.40MB/s ± 1% 7.36MB/s ± 0% -0.60% (p=0.000 n=40+39) Gunzip-4 62.2MB/s ± 0% 62.1MB/s ± 0% -0.02% (p=0.047 n=40+39) JSONEncode-4 15.4MB/s ± 1% 15.4MB/s ± 2% +0.39% (p=0.002 n=39+39) JSONDecode-4 4.59MB/s ± 0% 4.56MB/s ± 2% -0.71% (p=0.000 n=39+40) GoParse-4 2.54MB/s ± 0% 2.56MB/s ± 0% +0.72% (p=0.000 n=26+40) RegexpMatchEasy0_32-4 45.8MB/s ± 0% 45.4MB/s ± 0% -0.75% (p=0.000 n=38+40) RegexpMatchEasy0_1K-4 240MB/s ± 0% 240MB/s ± 0% +0.09% (p=0.000 n=35+38) RegexpMatchEasy1_32-4 43.1MB/s ± 0% 43.5MB/s ± 0% +0.84% (p=0.000 n=40+39) RegexpMatchEasy1_1K-4 185MB/s ± 0% 186MB/s ± 0% +0.69% (p=0.000 n=39+40) RegexpMatchMedium_32-4 936kB/s ± 1% 959kB/s ± 2% +2.38% (p=0.000 n=40+40) RegexpMatchMedium_1K-4 3.92MB/s ± 0% 3.93MB/s ± 0% +0.18% (p=0.000 n=39+40) RegexpMatchHard_32-4 2.15MB/s ± 0% 2.15MB/s ± 0% +0.19% (p=0.000 n=40+40) RegexpMatchHard_1K-4 2.30MB/s ± 0% 2.30MB/s ± 0% ~ (all equal) Revcomp-4 60.8MB/s ± 1% 60.8MB/s ± 1% ~ (p=0.600 n=39+38) Template-4 3.66MB/s ± 1% 3.68MB/s ± 1% +0.46% (p=0.000 n=40+40) [Geo mean] 12.8MB/s 12.8MB/s +0.27% Change-Id: I849161169ecf0876a04b7c1d3990fa8d1435215e Reviewed-on: https://go-review.googlesource.com/122855 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Ben Shi
|
556971316f |
cmd/compile: optimize 386's comparison
CMPL/CMPW/CMPB can take a memory operand on 386, and this CL implements that optimization. 1. The total size of pkg/linux_386 decreases about 45KB, excluding cmd/compile. 2. The go1 benchmark shows a little improvement. name old time/op new time/op delta BinaryTree17-4 3.36s ± 2% 3.37s ± 3% ~ (p=0.537 n=40+40) Fannkuch11-4 3.59s ± 1% 3.53s ± 2% -1.58% (p=0.000 n=40+40) FmtFprintfEmpty-4 46.0ns ± 3% 45.8ns ± 3% ~ (p=0.249 n=40+40) FmtFprintfString-4 80.0ns ± 4% 78.8ns ± 3% -1.49% (p=0.001 n=40+40) FmtFprintfInt-4 89.7ns ± 2% 90.3ns ± 2% +0.74% (p=0.003 n=40+40) FmtFprintfIntInt-4 144ns ± 3% 143ns ± 3% -0.95% (p=0.003 n=40+40) FmtFprintfPrefixedInt-4 181ns ± 4% 180ns ± 2% ~ (p=0.103 n=40+40) FmtFprintfFloat-4 412ns ± 3% 408ns ± 4% -0.97% (p=0.018 n=40+40) FmtManyArgs-4 607ns ± 4% 605ns ± 4% ~ (p=0.148 n=40+40) GobDecode-4 7.19ms ± 4% 7.24ms ± 5% ~ (p=0.340 n=40+40) GobEncode-4 7.04ms ± 9% 6.99ms ± 9% ~ (p=0.289 n=40+40) Gzip-4 400ms ± 6% 398ms ± 5% ~ (p=0.168 n=40+40) Gunzip-4 41.2ms ± 3% 41.7ms ± 3% +1.40% (p=0.001 n=40+40) HTTPClientServer-4 62.5µs ± 1% 62.1µs ± 2% -0.61% (p=0.000 n=37+37) JSONEncode-4 20.7ms ± 4% 20.4ms ± 3% -1.60% (p=0.000 n=40+40) JSONDecode-4 69.4ms ± 4% 69.2ms ± 6% ~ (p=0.177 n=40+40) Mandelbrot200-4 5.22ms ± 6% 5.21ms ± 3% ~ (p=0.531 n=40+40) GoParse-4 3.29ms ± 3% 3.28ms ± 3% ~ (p=0.321 n=40+39) RegexpMatchEasy0_32-4 104ns ± 4% 103ns ± 7% -0.89% (p=0.040 n=40+40) RegexpMatchEasy0_1K-4 852ns ± 3% 853ns ± 2% ~ (p=0.357 n=40+40) RegexpMatchEasy1_32-4 113ns ± 8% 113ns ± 3% ~ (p=0.906 n=40+40) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 5% ~ (p=0.326 n=40+40) RegexpMatchMedium_32-4 136ns ± 3% 133ns ± 3% -2.31% (p=0.000 n=40+40) RegexpMatchMedium_1K-4 44.0µs ± 3% 43.7µs ± 3% ~ (p=0.053 n=40+40) RegexpMatchHard_32-4 2.27µs ± 3% 2.26µs ± 4% ~ (p=0.391 n=40+40) RegexpMatchHard_1K-4 68.0µs ± 3% 68.9µs ± 3% +1.28% (p=0.000 n=40+40) Revcomp-4 1.86s ± 5% 1.86s ± 2% ~ (p=0.950 n=40+40) Template-4 73.4ms ± 4% 69.9ms ± 7% -4.78% (p=0.000 n=40+40) TimeParse-4 449ns ± 4% 441ns ± 5% -1.76% (p=0.000 n=40+40) TimeFormat-4 416ns ± 3% 417ns ± 4% ~ (p=0.304 n=40+40) [Geo mean] 67.7µs 67.3µs -0.55% name old speed new speed delta GobDecode-4 107MB/s ± 4% 106MB/s ± 5% ~ (p=0.336 n=40+40) GobEncode-4 109MB/s ± 5% 110MB/s ± 9% ~ (p=0.142 n=38+40) Gzip-4 48.5MB/s ± 5% 48.8MB/s ± 5% ~ (p=0.172 n=40+40) Gunzip-4 472MB/s ± 3% 465MB/s ± 3% -1.39% (p=0.001 n=40+40) JSONEncode-4 93.6MB/s ± 4% 95.1MB/s ± 3% +1.61% (p=0.000 n=40+40) JSONDecode-4 28.0MB/s ± 3% 28.1MB/s ± 6% ~ (p=0.181 n=40+40) GoParse-4 17.6MB/s ± 3% 17.7MB/s ± 3% ~ (p=0.350 n=40+39) RegexpMatchEasy0_32-4 308MB/s ± 4% 311MB/s ± 6% +0.96% (p=0.025 n=40+40) RegexpMatchEasy0_1K-4 1.20GB/s ± 3% 1.20GB/s ± 2% ~ (p=0.317 n=40+40) RegexpMatchEasy1_32-4 282MB/s ± 7% 282MB/s ± 3% ~ (p=0.516 n=40+40) RegexpMatchEasy1_1K-4 994MB/s ± 4% 991MB/s ± 5% ~ (p=0.319 n=40+40) RegexpMatchMedium_32-4 7.31MB/s ± 3% 7.49MB/s ± 3% +2.46% (p=0.000 n=40+40) RegexpMatchMedium_1K-4 23.3MB/s ± 3% 23.4MB/s ± 3% ~ (p=0.052 n=40+40) RegexpMatchHard_32-4 14.1MB/s ± 3% 14.1MB/s ± 4% ~ (p=0.391 n=40+40) RegexpMatchHard_1K-4 15.1MB/s ± 3% 14.9MB/s ± 3% -1.27% (p=0.000 n=40+40) Revcomp-4 137MB/s ± 5% 137MB/s ± 2% ~ (p=0.942 n=40+40) Template-4 26.5MB/s ± 4% 27.8MB/s ± 7% +5.03% (p=0.000 n=40+40) [Geo mean] 78.6MB/s 79.0MB/s +0.57% Change-Id: Idcacc6881ef57cd7dc33aa87b711282842b72a53 Reviewed-on: https://go-review.googlesource.com/126618 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Michael Munday
|
b65122f99a |
cmd/compile: optimize comparisons using load merging where available
Multi-byte comparison operations were used on amd64, arm64, i386 and s390x for comparisons with constant arrays, but only amd64 and i386 for comparisons with string constants. This CL combines the check for platform capability, since they have the same requirements, and also enables both on ppc64le which also supports load merging. Note that these optimizations currently use little endian byte order which results in byte reversal instructions on s390x. This should be fixed at some point. Change-Id: Ie612d13359b50c77f4d7c6e73fea4a59fa11f322 Reviewed-on: https://go-review.googlesource.com/102558 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Alberto Donizetti
|
a27cd4fd31 |
test/codegen: port tbz/tbnz arm64 tests
And delete them from asm_test. Change-Id: I34fcf85ae8ce09cd146fe4ce6a0ae7616bd97e2d Reviewed-on: https://go-review.googlesource.com/102296 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> |
||
Alberto Donizetti
|
fc6280d4b0 |
test/codegen: port direct comparisons with memory tests
And remove them from asm_test. Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454 Reviewed-on: https://go-review.googlesource.com/102115 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Alberto Donizetti
|
be371edd67 |
test/codegen: port comparisons tests to codegen
And delete them from asm_test. Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab Reviewed-on: https://go-review.googlesource.com/101595 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> |