mirror of
https://github.com/golang/go
synced 2024-11-17 09:54:46 -07:00
40df9cc606
136 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
erifan01
|
dd91269b7c |
cmd/compile: optimize math/bits Len32 intrinsic on arm64
Arm64 has a 32-bit CLZ instruction CLZW, which can be used for intrinsic Len32. Function LeadingZeros32 calls Len32, with this change, the assembly code of LeadingZeros32 becomes more concise. Go code: func f32(x uint32) { z = bits.LeadingZeros32(x) } Before: "".f32 STEXT size=32 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f32(SB), LEAF|NOFRAME|ABIInternal, $0-8 0x0004 00004 (test.go:7) MOVWU "".x(FP), R0 0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZ R0, R0 0x000c 00012 ($GOROOT/src/math/bits/bits.go:30) SUB $32, R0, R0 0x0010 00016 (test.go:7) MOVD R0, "".z(SB) 0x001c 00028 (test.go:7) RET (R30) After: "".f32 STEXT size=32 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f32(SB), LEAF|NOFRAME|ABIInternal, $0-8 0x0004 00004 (test.go:7) MOVWU "".x(FP), R0 0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZW R0, R0 0x000c 00012 (test.go:7) MOVD R0, "".z(SB) 0x0018 00024 (test.go:7) RET (R30) Benchmarks: name old time/op new time/op delta LeadingZeros-8 2.53ns ± 0% 2.55ns ± 0% +0.67% (p=0.000 n=10+10) LeadingZeros8-8 3.56ns ± 0% 3.56ns ± 0% ~ (all equal) LeadingZeros16-8 3.55ns ± 0% 3.56ns ± 0% ~ (p=0.465 n=10+10) LeadingZeros32-8 3.55ns ± 0% 2.96ns ± 0% -16.71% (p=0.000 n=10+7) LeadingZeros64-8 2.53ns ± 0% 2.54ns ± 0% ~ (p=0.059 n=8+10) Change-Id: Ie5666bb82909e341060e02ffd4e86c0e5d67e90a Reviewed-on: https://go-review.googlesource.com/c/157000 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Iskander Sharipov
|
c1050a8e54 |
cmd/compile: don't generate newobject call for 0-sized types
Emit &runtime.zerobase instead of a call to newobject for allocations of zero sized objects in walk.go. Fixes #29446 Change-Id: I11b67981d55009726a17c2e582c12ce0c258682e Reviewed-on: https://go-review.googlesource.com/c/155840 Run-TryBot: Iskander Sharipov <quasilyte@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
Keith Randall
|
933e34ac99 |
cmd/compile: treat slice pointers as non-nil
var a []int = ... p := &a[0] _ = *p We don't need to nil check on the 3rd line. If the bounds check on the 2nd line passes, we know p is non-nil. We rely on the fact that any cap>0 slice has a non-nil pointer as its pointer to the backing array. This is true for all safely-constructed slices, and I don't see any reason why someone would violate this rule using unsafe. R=go1.13 Fixes #30366 Change-Id: I3ed764fcb72cfe1fbf963d8c1a82e24e3b6dead7 Reviewed-on: https://go-review.googlesource.com/c/163740 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
||
Keith Randall
|
c5414457c6 |
cmd/compile: pad zero-sized stack variables
If someone takes a pointer to a zero-sized stack variable, it can be incorrectly interpreted as a pointer to the next object in the stack frame. To avoid this, add some padding after zero-sized variables. We only need to pad if the next variable in memory (which is the previous variable in the order in which we allocate variables to the stack frame) has pointers. If the next variable has no pointers, it won't hurt to have a pointer to it. Because we allocate all pointer-containing variables before all non-pointer-containing variables, we should only have to pad once per frame. Fixes #24993 Change-Id: Ife561cdfdf964fdbf69af03ae6ba97d004e6193c Reviewed-on: https://go-review.googlesource.com/c/155698 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
Ben Shi
|
c042fedbc8 |
test/codegen: add arithmetic tests for 386/amd64/arm/arm64
This CL adds several test cases of arithmetic operations for 386/amd64/arm/arm64. Change-Id: I362687c06249f31091458a1d8c45fc4d006b616a Reviewed-on: https://go-review.googlesource.com/c/151897 Run-TryBot: Ben Shi <powerman1st@163.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
Keith Randall
|
0b79dde112 |
cmd/compile: don't use CMOV ops to compute load addresses
We want to issue loads as soon as possible, especially when they are going to miss in the cache. Using a conditional move (CMOV) here: i := ... if cond { i++ } ... = a[i] means that we have to wait for cond to be computed before the load is issued. Without a CMOV, if the branch is predicted correctly the load can be issued in parallel with computing cond. Even if the branch is predicted incorrectly, maybe the speculative load is close to the real load, and we get a prefetch for free. In the worst case, when the prediction is wrong and the address is way off, we only lose by the time difference between the CMOV latency (~2 cycles) and the mispredict restart latency (~15 cycles). We only squash CMOVs that affect load addresses. Results of CMOVs that are used for other things (store addresses, store values) we use as before. Fixes #26306 Change-Id: I82ca14b664bf05e1d45e58de8c4d9c775a127ca1 Reviewed-on: https://go-review.googlesource.com/c/145717 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
||
Brian Kessler
|
319787a528 |
cmd/compile: intrinsify math/bits.Div on amd64
Note that the intrinsic implementation panics separately for overflow and divide by zero, which matches the behavior of the pure go implementation. There is a modest performance improvement after intrinsic implementation. name old time/op new time/op delta Div-4 53.0ns ± 1% 47.0ns ± 0% -11.28% (p=0.008 n=5+5) Div32-4 18.4ns ± 0% 18.5ns ± 1% ~ (p=0.444 n=5+5) Div64-4 53.3ns ± 0% 47.5ns ± 4% -10.77% (p=0.008 n=5+5) Updates #28273 Change-Id: Ic1688ecc0964acace2e91bf44ef16f5fb6b6bc82 Reviewed-on: https://go-review.googlesource.com/c/144378 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Martin Möhrmann
|
75798e8ada |
runtime: make processor capability variable naming platform specific
The current support_XXX variables are specific for the amd64 and 386 platforms. Prefix processor capability variables by architecture to have a consistent naming scheme and avoid reuse of the existing variables for new platforms. This also aligns naming of runtime variables closer with internal/cpu processor capability variable names. Change-Id: I3eabb29a03874678851376185d3a62e73c1aff1d Reviewed-on: https://go-review.googlesource.com/c/91435 Run-TryBot: Martin Möhrmann <martisch@uos.de> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Lynn Boger
|
4ae49b5921 |
cmd/compile: use ANDCC, ORCC, XORCC to avoid CMP on ppc64x
This change makes use of the cc versions of the AND, OR, XOR instructions, omitting the need for a CMP instruction. In many test programs and in the go binary, this reduces the size of 20-30 functions by at least 1 instruction, many in runtime. Testcase added to test/codegen/comparisons.go Change-Id: I6cc1ca8b80b065d7390749c625bc9784b0039adb Reviewed-on: https://go-review.googlesource.com/c/143059 Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Michael Munday <mike.munday@ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
Keith Randall
|
0ad332d80c |
cmd/compile: implement some moves using non-overlapping reads&writes
For moves >8,<16 bytes, do a move using non-overlapping loads/stores if it would require no more instructions. This helps a bit with the case when the move is from a static constant, because then the code to materialize the value being moved is smaller. Change-Id: Ie47a5a7c654afeb4973142b0a9922faea13c9b54 Reviewed-on: https://go-review.googlesource.com/c/146019 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Ben Shi
|
455ef3f6bc |
test/codegen: improve arithmetic tests
This CL fixes several typos and adds two more cases to arithmetic test. Change-Id: I086560162ea351e2166866e444e2317da36c1729 Reviewed-on: https://go-review.googlesource.com/c/145210 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Ben Shi
|
5f5ea3fd4d |
cmd/compile: optimize amd64's ADDQconstmodify/ADDLconstmodify
This CL optimize amd64's code: "ADDQ $-1, MEM_OP" -> "DECQ MEM_OP" "ADDL $-1, MEM_OP" -> "DECL MEM_OP" 1. The total size of pkg/linux_amd64 (excluding cmd/compile) decreases about 0.1KB. 2. The go1 benchmark shows little regression, excluding noise. name old time/op new time/op delta BinaryTree17-4 2.60s ± 5% 2.64s ± 3% +1.53% (p=0.000 n=38+39) Fannkuch11-4 2.37s ± 2% 2.38s ± 2% ~ (p=0.950 n=40+40) FmtFprintfEmpty-4 40.4ns ± 5% 40.5ns ± 5% ~ (p=0.711 n=40+40) FmtFprintfString-4 72.4ns ± 5% 72.3ns ± 3% ~ (p=0.485 n=40+40) FmtFprintfInt-4 79.7ns ± 3% 80.1ns ± 3% ~ (p=0.124 n=40+40) FmtFprintfIntInt-4 126ns ± 3% 127ns ± 3% +0.71% (p=0.027 n=40+40) FmtFprintfPrefixedInt-4 153ns ± 4% 153ns ± 2% ~ (p=0.604 n=40+40) FmtFprintfFloat-4 206ns ± 5% 210ns ± 5% +1.79% (p=0.002 n=40+40) FmtManyArgs-4 498ns ± 3% 496ns ± 3% ~ (p=0.099 n=40+40) GobDecode-4 6.48ms ± 6% 6.47ms ± 7% ~ (p=0.686 n=39+40) GobEncode-4 5.95ms ± 7% 5.96ms ± 6% ~ (p=0.670 n=40+34) Gzip-4 224ms ± 6% 223ms ± 5% ~ (p=0.143 n=40+40) Gunzip-4 36.5ms ± 4% 36.5ms ± 4% ~ (p=0.556 n=40+40) HTTPClientServer-4 60.7µs ± 2% 59.9µs ± 3% -1.20% (p=0.000 n=39+39) JSONEncode-4 9.03ms ± 4% 9.04ms ± 4% ~ (p=0.589 n=40+40) JSONDecode-4 49.4ms ± 4% 49.2ms ± 4% ~ (p=0.276 n=40+40) Mandelbrot200-4 3.80ms ± 4% 3.79ms ± 4% ~ (p=0.837 n=40+40) GoParse-4 3.15ms ± 5% 3.13ms ± 5% ~ (p=0.240 n=40+40) RegexpMatchEasy0_32-4 72.9ns ± 3% 72.0ns ± 8% -1.25% (p=0.003 n=40+40) RegexpMatchEasy0_1K-4 229ns ± 5% 230ns ± 4% ~ (p=0.318 n=40+40) RegexpMatchEasy1_32-4 66.9ns ± 3% 67.3ns ± 7% ~ (p=0.817 n=40+40) RegexpMatchEasy1_1K-4 371ns ± 5% 370ns ± 4% ~ (p=0.275 n=40+40) RegexpMatchMedium_32-4 106ns ± 4% 104ns ± 7% -2.28% (p=0.000 n=40+40) RegexpMatchMedium_1K-4 32.0µs ± 2% 31.4µs ± 3% -2.08% (p=0.000 n=40+40) RegexpMatchHard_32-4 1.54µs ± 7% 1.52µs ± 3% -1.80% (p=0.007 n=39+40) RegexpMatchHard_1K-4 45.8µs ± 4% 45.5µs ± 3% ~ (p=0.707 n=40+40) Revcomp-4 401ms ± 5% 401ms ± 6% ~ (p=0.935 n=40+40) Template-4 62.4ms ± 4% 61.2ms ± 3% -1.85% (p=0.000 n=40+40) TimeParse-4 315ns ± 2% 318ns ± 3% +1.10% (p=0.002 n=40+40) TimeFormat-4 297ns ± 3% 298ns ± 3% ~ (p=0.238 n=40+40) [Geo mean] 45.8µs 45.7µs -0.22% name old speed new speed delta GobDecode-4 119MB/s ± 6% 119MB/s ± 7% ~ (p=0.684 n=39+40) GobEncode-4 129MB/s ± 7% 128MB/s ± 6% ~ (p=0.413 n=40+34) Gzip-4 86.6MB/s ± 6% 87.0MB/s ± 6% ~ (p=0.145 n=40+40) Gunzip-4 532MB/s ± 4% 532MB/s ± 4% ~ (p=0.556 n=40+40) JSONEncode-4 215MB/s ± 4% 215MB/s ± 4% ~ (p=0.583 n=40+40) JSONDecode-4 39.3MB/s ± 4% 39.5MB/s ± 4% ~ (p=0.277 n=40+40) GoParse-4 18.4MB/s ± 5% 18.5MB/s ± 5% ~ (p=0.229 n=40+40) RegexpMatchEasy0_32-4 439MB/s ± 3% 445MB/s ± 8% +1.28% (p=0.003 n=40+40) RegexpMatchEasy0_1K-4 4.46GB/s ± 4% 4.45GB/s ± 4% ~ (p=0.343 n=40+40) RegexpMatchEasy1_32-4 479MB/s ± 3% 476MB/s ± 7% ~ (p=0.855 n=40+40) RegexpMatchEasy1_1K-4 2.76GB/s ± 5% 2.77GB/s ± 4% ~ (p=0.250 n=40+40) RegexpMatchMedium_32-4 9.36MB/s ± 4% 9.58MB/s ± 6% +2.31% (p=0.001 n=40+40) RegexpMatchMedium_1K-4 32.0MB/s ± 2% 32.7MB/s ± 3% +2.12% (p=0.000 n=40+40) RegexpMatchHard_32-4 20.7MB/s ± 7% 21.1MB/s ± 3% +1.95% (p=0.005 n=40+40) RegexpMatchHard_1K-4 22.4MB/s ± 4% 22.5MB/s ± 3% ~ (p=0.689 n=40+40) Revcomp-4 634MB/s ± 5% 634MB/s ± 6% ~ (p=0.935 n=40+40) Template-4 31.1MB/s ± 3% 31.7MB/s ± 3% +1.88% (p=0.000 n=40+40) [Geo mean] 129MB/s 130MB/s +0.62% Change-Id: I9d61ee810d900920c572cbe89e2f1626bfed12b7 Reviewed-on: https://go-review.googlesource.com/c/145209 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Keith Randall
|
9f291d1fc3 |
cmd/compile: fix rule for combining loads with compares
Unlike normal load+op opcodes, the load+compare opcode does not clobber its non-load argument. Allow the load+compare merge to happen even if the non-load argument is used elsewhere. Noticed when investigating issue #28417. Change-Id: Ibc48d1f2e06ae76034c59f453815d263e8ec7288 Reviewed-on: https://go-review.googlesource.com/c/145097 Reviewed-by: Ainar Garipov <gugl.zadolbal@gmail.com> Reviewed-by: Ben Shi <powerman1st@163.com> |
||
Keith Randall
|
dd789550a7 |
cmd/compile: intrinsify math/bits.Sub on amd64
name old time/op new time/op delta Sub-8 1.12ns ± 1% 1.17ns ± 1% +5.20% (p=0.008 n=5+5) Sub32-8 1.11ns ± 0% 1.11ns ± 0% ~ (all samples are equal) Sub64-8 1.12ns ± 0% 1.18ns ± 1% +5.00% (p=0.016 n=4+5) Sub64multiple-8 4.10ns ± 1% 0.86ns ± 1% -78.93% (p=0.008 n=5+5) Fixes #28273 Change-Id: Ibcb6f2fd32d987c3bcbae4f4cd9d335a3de98548 Reviewed-on: https://go-review.googlesource.com/c/144258 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Keith Randall
|
899f3a2892 |
cmd/compile: intrinsify math/bits.Add on amd64
name old time/op new time/op delta Add-8 1.11ns ± 0% 1.18ns ± 0% +6.31% (p=0.029 n=4+4) Add32-8 1.02ns ± 0% 1.02ns ± 1% ~ (p=0.333 n=4+5) Add64-8 1.11ns ± 1% 1.17ns ± 0% +5.79% (p=0.008 n=5+5) Add64multiple-8 4.35ns ± 1% 0.86ns ± 0% -80.22% (p=0.000 n=5+4) The individual ops are a bit slower (but still very fast). Using the ops in carry chains is very fast. Update #28273 Change-Id: Id975f76df2b930abf0e412911d327b6c5b1befe5 Reviewed-on: https://go-review.googlesource.com/c/144257 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
ChrisALiles
|
13d5cd7847 |
cmd/compile: use proved bounds to remove signed division fix-ups
prove is able to find 94 occurrences in std cmd where a divisor can't have the value -1. The change removes the extraneous fix-up code for these cases. Fixes #25239 Change-Id: Ic184de971f47cc57c702eb72805b8e291c14035d Reviewed-on: https://go-review.googlesource.com/c/130215 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Ben Shi
|
95dda75bde |
cmd/compile: optimize store combination on 386/amd64
This CL add 3 rules to combine byte-store to word-store on386 and amd64. Change-Id: Iffd9cda42f1961680c81def4edc773ad58f211b3 Reviewed-on: https://go-review.googlesource.com/c/143057 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Ben Shi
|
4158734097 |
test/codegen: add more combined load/store test cases
This CL adds more combined load/store test cases for 386/amd64. Change-Id: I0a483a6ed0212b65c5e84d67ed8c9f50c389ce2d Reviewed-on: https://go-review.googlesource.com/c/142878 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Lynn Boger
|
39fa301bdc |
test/codegen: enable more tests for ppc64/ppc64le
Adding cases for ppc64,ppc64le to the codegen tests where appropriate. Change-Id: Idf8cbe88a4ab4406a4ef1ea777bd15a58b68f3ed Reviewed-on: https://go-review.googlesource.com/c/142557 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
||
Ben Shi
|
4b78fe57a8 |
cmd/compile: optimize 386's load/store combination
This CL adds more combinations of two consequtive MOVBload/MOVBstore to a unique MOVWload/MOVWstore. 1. The size of the go executable decreases about 4KB, and the total size of pkg/linux_386 (excluding cmd/compile) decreases about 1.5KB. 2. There is no regression in the go1 benchmark result, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.28s ± 2% 3.29s ± 2% ~ (p=0.151 n=40+40) Fannkuch11-4 3.52s ± 1% 3.51s ± 1% -0.28% (p=0.002 n=40+40) FmtFprintfEmpty-4 45.4ns ± 4% 45.0ns ± 4% -0.89% (p=0.019 n=40+40) FmtFprintfString-4 81.9ns ± 7% 81.3ns ± 1% ~ (p=0.660 n=40+25) FmtFprintfInt-4 91.9ns ± 9% 91.4ns ± 9% ~ (p=0.249 n=40+40) FmtFprintfIntInt-4 143ns ± 4% 143ns ± 4% ~ (p=0.760 n=40+40) FmtFprintfPrefixedInt-4 184ns ± 3% 183ns ± 4% ~ (p=0.485 n=40+40) FmtFprintfFloat-4 408ns ± 3% 409ns ± 3% ~ (p=0.961 n=40+40) FmtManyArgs-4 597ns ± 4% 602ns ± 3% ~ (p=0.413 n=40+40) GobDecode-4 7.13ms ± 6% 7.14ms ± 6% ~ (p=0.859 n=40+40) GobEncode-4 6.86ms ± 9% 6.94ms ± 7% ~ (p=0.162 n=40+40) Gzip-4 395ms ± 4% 396ms ± 3% ~ (p=0.099 n=40+40) Gunzip-4 40.9ms ± 4% 41.1ms ± 3% ~ (p=0.064 n=40+40) HTTPClientServer-4 63.6µs ± 2% 63.6µs ± 3% ~ (p=0.832 n=36+39) JSONEncode-4 16.1ms ± 3% 15.8ms ± 3% -1.60% (p=0.001 n=40+40) JSONDecode-4 61.0ms ± 3% 61.5ms ± 4% ~ (p=0.065 n=40+40) Mandelbrot200-4 5.16ms ± 3% 5.18ms ± 3% ~ (p=0.056 n=40+40) GoParse-4 3.25ms ± 2% 3.23ms ± 3% ~ (p=0.727 n=40+40) RegexpMatchEasy0_32-4 90.2ns ± 3% 89.3ns ± 6% -0.98% (p=0.002 n=40+40) RegexpMatchEasy0_1K-4 812ns ± 3% 815ns ± 3% ~ (p=0.309 n=40+40) RegexpMatchEasy1_32-4 103ns ± 6% 103ns ± 5% ~ (p=0.680 n=40+40) RegexpMatchEasy1_1K-4 1.01µs ± 4% 1.02µs ± 3% ~ (p=0.326 n=40+33) RegexpMatchMedium_32-4 120ns ± 4% 120ns ± 5% ~ (p=0.834 n=40+40) RegexpMatchMedium_1K-4 40.1µs ± 3% 39.5µs ± 4% -1.35% (p=0.000 n=40+40) RegexpMatchHard_32-4 2.27µs ± 6% 2.23µs ± 4% -1.67% (p=0.011 n=40+40) RegexpMatchHard_1K-4 67.2µs ± 3% 67.2µs ± 3% ~ (p=0.149 n=40+40) Revcomp-4 1.84s ± 2% 1.86s ± 3% +0.70% (p=0.020 n=40+40) Template-4 69.0ms ± 4% 69.8ms ± 3% +1.20% (p=0.003 n=40+40) TimeParse-4 438ns ± 3% 439ns ± 4% ~ (p=0.650 n=40+40) TimeFormat-4 412ns ± 3% 412ns ± 3% ~ (p=0.888 n=40+40) [Geo mean] 65.2µs 65.2µs -0.04% name old speed new speed delta GobDecode-4 108MB/s ± 6% 108MB/s ± 6% ~ (p=0.855 n=40+40) GobEncode-4 112MB/s ± 9% 111MB/s ± 8% ~ (p=0.159 n=40+40) Gzip-4 49.2MB/s ± 4% 49.1MB/s ± 3% ~ (p=0.102 n=40+40) Gunzip-4 474MB/s ± 3% 472MB/s ± 3% ~ (p=0.063 n=40+40) JSONEncode-4 121MB/s ± 3% 123MB/s ± 3% +1.62% (p=0.001 n=40+40) JSONDecode-4 31.9MB/s ± 3% 31.6MB/s ± 4% ~ (p=0.070 n=40+40) GoParse-4 17.9MB/s ± 2% 17.9MB/s ± 3% ~ (p=0.696 n=40+40) RegexpMatchEasy0_32-4 355MB/s ± 3% 358MB/s ± 5% +0.99% (p=0.002 n=40+40) RegexpMatchEasy0_1K-4 1.26GB/s ± 3% 1.26GB/s ± 3% ~ (p=0.381 n=40+40) RegexpMatchEasy1_32-4 310MB/s ± 5% 310MB/s ± 4% ~ (p=0.655 n=40+40) RegexpMatchEasy1_1K-4 1.01GB/s ± 4% 1.01GB/s ± 3% ~ (p=0.351 n=40+33) RegexpMatchMedium_32-4 8.32MB/s ± 4% 8.34MB/s ± 5% ~ (p=0.696 n=40+40) RegexpMatchMedium_1K-4 25.6MB/s ± 3% 25.9MB/s ± 4% +1.36% (p=0.000 n=40+40) RegexpMatchHard_32-4 14.1MB/s ± 6% 14.3MB/s ± 4% +1.64% (p=0.011 n=40+40) RegexpMatchHard_1K-4 15.2MB/s ± 3% 15.2MB/s ± 3% ~ (p=0.147 n=40+40) Revcomp-4 138MB/s ± 2% 137MB/s ± 3% -0.70% (p=0.021 n=40+40) Template-4 28.1MB/s ± 4% 27.8MB/s ± 3% -1.19% (p=0.003 n=40+40) [Geo mean] 83.7MB/s 83.7MB/s +0.03% Change-Id: I2a2b3a942b5c45467491515d201179fd192e65c9 Reviewed-on: https://go-review.googlesource.com/c/141650 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Ben Shi
|
3785be3093 |
test/codegen: fix confusing test cases
ARMv7's MULAF/MULSF/MULAD/MULSD are not fused, this CL fixes the confusing test cases. Change-Id: I35022e207e2f0d24a23a7f6f188e41ba8eee9886 Reviewed-on: https://go-review.googlesource.com/c/142439 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Akhil Indurti <aindurti@gmail.com> Reviewed-by: Giovanni Bajo <rasky@develer.com> |
||
Martin Möhrmann
|
a0f57c3fd0 |
cmd/compile: avoid string allocations when map key is struct or array literal
x = map[string(byteslice)] is already optimized by the compiler to avoid a string allocation. This CL generalizes this optimization to: x = map[T1{ ... Tn{..., string(byteslice), ...} ... }] where T1 to Tn is a nesting of struct and array literals. Found in a hot code path that used a struct of strings made from []byte slices to make a map lookup. There are no uses of the more generalized optimization in the standard library. Passes toolstash -cmp. MapStringConversion/32/simple 21.9ns ± 2% 21.9ns ± 3% ~ (p=0.995 n=17+20) MapStringConversion/32/struct 28.8ns ± 3% 22.0ns ± 2% -23.80% (p=0.000 n=20+20) MapStringConversion/32/array 28.5ns ± 2% 21.9ns ± 2% -23.14% (p=0.000 n=19+16) MapStringConversion/64/simple 21.0ns ± 2% 21.1ns ± 3% ~ (p=0.072 n=19+18) MapStringConversion/64/struct 72.4ns ± 3% 21.3ns ± 2% -70.53% (p=0.000 n=20+20) MapStringConversion/64/array 72.8ns ± 1% 21.0ns ± 2% -71.13% (p=0.000 n=17+19) name old allocs/op new allocs/op delta MapStringConversion/32/simple 0.00 0.00 ~ (all equal) MapStringConversion/32/struct 0.00 0.00 ~ (all equal) MapStringConversion/32/array 0.00 0.00 ~ (all equal) MapStringConversion/64/simple 0.00 0.00 ~ (all equal) MapStringConversion/64/struct 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) MapStringConversion/64/array 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Change-Id: I483b4d84d8d74b1025b62c954da9a365e79b7a3a Reviewed-on: https://go-review.googlesource.com/c/116275 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
Alberto Donizetti
|
7c96d87eda |
test/codegen: test ppc64 TrailingZeros, OnesCount codegen
This change adds codegen tests for the intrinsification on ppc64 of the OnesCount{64,32,16,8}, and TrailingZeros{64,32,16,8} math/bits functions. Change-Id: Id3364921fbd18316850e15c8c71330c906187fdb Reviewed-on: https://go-review.googlesource.com/c/141897 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
||
Ben Shi
|
93e27e01af |
test/codegen: add tests of FMA for arm/arm64
This CL adds tests of fused multiplication-accumulation on arm/arm64. Change-Id: Ic85d5277c0d6acb7e1e723653372dfaf96824a39 Reviewed-on: https://go-review.googlesource.com/c/141652 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Ben Shi
|
c3208842e1 |
test/codegen: add tests for multiplication-subtraction
This CL adds tests for armv7's MULS and arm64's MSUBW. Change-Id: Id0fd5d26fd477e4ed14389b0d33cad930423eb5b Reviewed-on: https://go-review.googlesource.com/c/141651 Run-TryBot: Ben Shi <powerman1st@163.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
Keith Randall
|
653a4bd8d4 |
cmd/compile: optimize loads from readonly globals into constants
Instead of MOVB go.string."foo"(SB), AX do MOVB $102, AX When we know the global we're loading from is readonly, we can do that read at compile time. I've made this arch-dependent mostly because the cases where this happens often are memory->memory moves, and those don't get decomposed until lowering. Did amd64/386/arm/arm64. Other architectures could follow. Update #26498 Change-Id: I41b1dc831b2cd0a52dac9b97f4f4457888a46389 Reviewed-on: https://go-review.googlesource.com/c/141118 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
||
Ben Shi
|
bac6a2925c |
test/codegen: add more arm64 test cases
This CL adds 3 combined load test cases for arm64. Change-Id: I2c67308c40fd8a18f9f2d16c6d12911dcdc583e2 Reviewed-on: https://go-review.googlesource.com/c/140700 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
||
Ben Shi
|
27965c1436 |
cmd/compile: optimize 386's ADDLconstmodifyidx4
This CL optimize ADDLconstmodifyidx4 to INCL/DECL, when the constant is +1/-1. 1. The total size of pkg/linux_386/ decreases 28 bytes, excluding cmd/compile. 2. There is no regression in the go1 benchmark test, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.25s ± 2% 3.23s ± 3% -0.70% (p=0.040 n=30+30) Fannkuch11-4 3.50s ± 1% 3.47s ± 1% -0.68% (p=0.000 n=30+30) FmtFprintfEmpty-4 44.6ns ± 3% 44.8ns ± 3% +0.46% (p=0.029 n=30+30) FmtFprintfString-4 79.0ns ± 3% 78.7ns ± 3% ~ (p=0.053 n=30+30) FmtFprintfInt-4 89.2ns ± 2% 89.4ns ± 3% ~ (p=0.665 n=30+29) FmtFprintfIntInt-4 142ns ± 3% 142ns ± 3% ~ (p=0.435 n=30+30) FmtFprintfPrefixedInt-4 182ns ± 2% 182ns ± 2% ~ (p=0.964 n=30+30) FmtFprintfFloat-4 407ns ± 3% 411ns ± 4% ~ (p=0.080 n=30+30) FmtManyArgs-4 597ns ± 3% 593ns ± 4% ~ (p=0.222 n=30+30) GobDecode-4 7.09ms ± 6% 7.07ms ± 7% ~ (p=0.633 n=30+30) GobEncode-4 6.81ms ± 9% 6.81ms ± 8% ~ (p=0.982 n=30+30) Gzip-4 398ms ± 4% 400ms ± 6% ~ (p=0.177 n=30+30) Gunzip-4 41.3ms ± 3% 40.6ms ± 4% -1.71% (p=0.005 n=30+30) HTTPClientServer-4 63.4µs ± 3% 63.4µs ± 4% ~ (p=0.646 n=30+28) JSONEncode-4 16.0ms ± 3% 16.1ms ± 3% ~ (p=0.057 n=30+30) JSONDecode-4 63.3ms ± 8% 63.1ms ± 7% ~ (p=0.786 n=30+30) Mandelbrot200-4 5.17ms ± 3% 5.15ms ± 8% ~ (p=0.654 n=30+30) GoParse-4 3.24ms ± 3% 3.23ms ± 2% ~ (p=0.091 n=30+30) RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.575 n=30+30) RegexpMatchEasy0_1K-4 823ns ± 2% 821ns ± 3% ~ (p=0.827 n=30+30) RegexpMatchEasy1_32-4 113ns ± 3% 112ns ± 3% ~ (p=0.076 n=30+30) RegexpMatchEasy1_1K-4 1.02µs ± 4% 1.01µs ± 5% ~ (p=0.087 n=30+30) RegexpMatchMedium_32-4 129ns ± 3% 127ns ± 4% -1.55% (p=0.009 n=30+30) RegexpMatchMedium_1K-4 39.3µs ± 4% 39.7µs ± 3% ~ (p=0.054 n=30+30) RegexpMatchHard_32-4 2.15µs ± 4% 2.15µs ± 4% ~ (p=0.712 n=30+30) RegexpMatchHard_1K-4 66.0µs ± 3% 65.1µs ± 3% -1.32% (p=0.002 n=30+30) Revcomp-4 1.85s ± 2% 1.85s ± 3% ~ (p=0.168 n=30+30) Template-4 69.5ms ± 7% 68.9ms ± 6% ~ (p=0.250 n=28+28) TimeParse-4 434ns ± 3% 432ns ± 4% ~ (p=0.629 n=30+30) TimeFormat-4 403ns ± 4% 408ns ± 3% +1.23% (p=0.019 n=30+29) [Geo mean] 65.5µs 65.3µs -0.20% name old speed new speed delta GobDecode-4 108MB/s ± 6% 109MB/s ± 6% ~ (p=0.636 n=30+30) GobEncode-4 113MB/s ±10% 113MB/s ± 9% ~ (p=0.982 n=30+30) Gzip-4 48.8MB/s ± 4% 48.6MB/s ± 5% ~ (p=0.178 n=30+30) Gunzip-4 470MB/s ± 3% 479MB/s ± 4% +1.72% (p=0.006 n=30+30) JSONEncode-4 121MB/s ± 3% 120MB/s ± 3% ~ (p=0.057 n=30+30) JSONDecode-4 30.7MB/s ± 8% 30.8MB/s ± 8% ~ (p=0.784 n=30+30) GoParse-4 17.9MB/s ± 3% 17.9MB/s ± 2% ~ (p=0.090 n=30+30) RegexpMatchEasy0_32-4 309MB/s ± 4% 309MB/s ± 3% ~ (p=0.530 n=30+30) RegexpMatchEasy0_1K-4 1.24GB/s ± 2% 1.25GB/s ± 3% ~ (p=0.976 n=30+30) RegexpMatchEasy1_32-4 282MB/s ± 3% 284MB/s ± 3% +0.81% (p=0.041 n=30+30) RegexpMatchEasy1_1K-4 1.00GB/s ± 3% 1.01GB/s ± 4% ~ (p=0.091 n=30+30) RegexpMatchMedium_32-4 7.71MB/s ± 3% 7.84MB/s ± 4% +1.71% (p=0.000 n=30+30) RegexpMatchMedium_1K-4 26.1MB/s ± 4% 25.8MB/s ± 3% ~ (p=0.051 n=30+30) RegexpMatchHard_32-4 14.9MB/s ± 4% 14.9MB/s ± 4% ~ (p=0.712 n=30+30) RegexpMatchHard_1K-4 15.5MB/s ± 3% 15.7MB/s ± 3% +1.34% (p=0.003 n=30+30) Revcomp-4 138MB/s ± 2% 137MB/s ± 3% ~ (p=0.174 n=30+30) Template-4 28.0MB/s ± 6% 28.2MB/s ± 6% ~ (p=0.251 n=28+28) [Geo mean] 82.3MB/s 82.6MB/s +0.36% Change-Id: I389829699ffe9500a013fcf31be58a97e98043e1 Reviewed-on: https://go-review.googlesource.com/c/140701 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Keith Randall
|
ceb0c371d9 |
cmd/compile: make []byte("...") more efficient
Do []byte(string) conversions more efficiently when the string is a constant. Instead of calling stringtobyteslice, allocate just the space we need and encode the initialization directly. []byte("foo") rewrites to the following pseudocode: var s [3]byte // on heap or stack, depending on whether b escapes s = *(*[3]byte)(&"foo"[0]) // initialize s from the string b = s[:] which generates this assembly: 0x001d 00029 (tmp1.go:9) LEAQ type.[3]uint8(SB), AX 0x0024 00036 (tmp1.go:9) MOVQ AX, (SP) 0x0028 00040 (tmp1.go:9) CALL runtime.newobject(SB) 0x002d 00045 (tmp1.go:9) MOVQ 8(SP), AX 0x0032 00050 (tmp1.go:9) MOVBLZX go.string."foo"+2(SB), CX 0x0039 00057 (tmp1.go:9) MOVWLZX go.string."foo"(SB), DX 0x0040 00064 (tmp1.go:9) MOVW DX, (AX) 0x0043 00067 (tmp1.go:9) MOVB CL, 2(AX) // Then the slice is b = {AX, 3, 3} The generated code is still not optimal, as it still does load/store from read-only memory instead of constant stores. Next CL... Update #26498 Fixes #10170 Change-Id: I4b990b19f9a308f60c8f4f148934acffefe0a5bd Reviewed-on: https://go-review.googlesource.com/c/140698 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
||
Ben Shi
|
3933302550 |
cmd/compile: add indexed form for several 386 instructions
This CL implements indexed memory operands for the following instructions. (ADD|SUB|MUL|AND|OR|XOR)Lload -> (ADD|SUB|MUL|AND|OR|XOR)Lloadidx4 (ADD|SUB|AND|OR|XOR)Lmodify -> (ADD|SUB|AND|OR|XOR)Lmodifyidx4 (ADD|AND|OR|XOR)Lconstmodify -> (ADD|AND|OR|XOR)Lconstmodifyidx4 1. The total size of pkg/linux_386/ decreases about 2.5KB, excluding cmd/compile/ . 2. There is little regression in the go1 benchmark test, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.25s ± 3% 3.25s ± 3% ~ (p=0.218 n=40+40) Fannkuch11-4 3.53s ± 1% 3.53s ± 1% ~ (p=0.303 n=40+40) FmtFprintfEmpty-4 44.9ns ± 3% 45.6ns ± 3% +1.48% (p=0.030 n=40+36) FmtFprintfString-4 78.7ns ± 5% 80.1ns ± 7% ~ (p=0.217 n=36+40) FmtFprintfInt-4 90.2ns ± 6% 89.8ns ± 5% ~ (p=0.659 n=40+38) FmtFprintfIntInt-4 140ns ± 5% 141ns ± 5% +1.00% (p=0.027 n=40+40) FmtFprintfPrefixedInt-4 185ns ± 3% 183ns ± 3% ~ (p=0.104 n=40+40) FmtFprintfFloat-4 411ns ± 4% 406ns ± 3% -1.37% (p=0.005 n=40+40) FmtManyArgs-4 590ns ± 4% 598ns ± 4% +1.35% (p=0.008 n=40+40) GobDecode-4 7.16ms ± 5% 7.10ms ± 5% ~ (p=0.335 n=40+40) GobEncode-4 6.85ms ± 7% 6.74ms ± 9% ~ (p=0.058 n=38+40) Gzip-4 400ms ± 4% 399ms ± 2% -0.34% (p=0.003 n=40+33) Gunzip-4 41.4ms ± 3% 41.4ms ± 4% -0.12% (p=0.020 n=40+40) HTTPClientServer-4 64.1µs ± 4% 63.5µs ± 2% -1.07% (p=0.000 n=39+37) JSONEncode-4 15.9ms ± 2% 15.9ms ± 3% ~ (p=0.103 n=40+40) JSONDecode-4 62.2ms ± 4% 61.6ms ± 3% -0.98% (p=0.006 n=39+40) Mandelbrot200-4 5.18ms ± 3% 5.14ms ± 4% ~ (p=0.125 n=40+40) GoParse-4 3.29ms ± 2% 3.27ms ± 2% -0.66% (p=0.006 n=40+40) RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.632 n=40+40) RegexpMatchEasy0_1K-4 830ns ± 3% 828ns ± 3% ~ (p=0.563 n=40+40) RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 4% ~ (p=0.494 n=40+40) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 130ns ± 4% 129ns ± 3% ~ (p=0.458 n=40+40) RegexpMatchMedium_1K-4 39.4µs ± 3% 39.7µs ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 2.16µs ± 4% 2.15µs ± 4% ~ (p=0.137 n=40+40) RegexpMatchHard_1K-4 65.2µs ± 3% 65.4µs ± 4% ~ (p=0.160 n=40+40) Revcomp-4 1.87s ± 2% 1.87s ± 1% +0.17% (p=0.019 n=33+33) Template-4 69.4ms ± 3% 69.8ms ± 3% +0.60% (p=0.009 n=40+40) TimeParse-4 437ns ± 4% 438ns ± 4% ~ (p=0.234 n=40+40) TimeFormat-4 408ns ± 3% 408ns ± 3% ~ (p=0.904 n=40+40) [Geo mean] 65.7µs 65.6µs -0.08% name old speed new speed delta GobDecode-4 107MB/s ± 5% 108MB/s ± 5% ~ (p=0.336 n=40+40) GobEncode-4 112MB/s ± 6% 114MB/s ± 9% +1.95% (p=0.036 n=37+40) Gzip-4 48.5MB/s ± 4% 48.6MB/s ± 2% +0.28% (p=0.003 n=40+33) Gunzip-4 469MB/s ± 4% 469MB/s ± 4% +0.11% (p=0.021 n=40+40) JSONEncode-4 122MB/s ± 2% 122MB/s ± 3% ~ (p=0.105 n=40+40) JSONDecode-4 31.2MB/s ± 4% 31.5MB/s ± 4% +0.99% (p=0.007 n=39+40) GoParse-4 17.6MB/s ± 2% 17.7MB/s ± 2% +0.66% (p=0.007 n=40+40) RegexpMatchEasy0_32-4 310MB/s ± 4% 310MB/s ± 4% ~ (p=0.384 n=40+40) RegexpMatchEasy0_1K-4 1.23GB/s ± 3% 1.24GB/s ± 3% ~ (p=0.186 n=40+40) RegexpMatchEasy1_32-4 283MB/s ± 3% 281MB/s ± 4% ~ (p=0.855 n=40+40) RegexpMatchEasy1_1K-4 1.00GB/s ± 4% 1.00GB/s ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 7.68MB/s ± 4% 7.73MB/s ± 3% ~ (p=0.359 n=40+40) RegexpMatchMedium_1K-4 26.0MB/s ± 3% 25.8MB/s ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 14.8MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.136 n=40+40) RegexpMatchHard_1K-4 15.7MB/s ± 3% 15.7MB/s ± 4% ~ (p=0.150 n=40+40) Revcomp-4 136MB/s ± 1% 136MB/s ± 1% -0.09% (p=0.028 n=32+33) Template-4 28.0MB/s ± 3% 27.8MB/s ± 3% -0.59% (p=0.010 n=40+40) [Geo mean] 82.1MB/s 82.3MB/s +0.25% Change-Id: Ifa387a251056678326d3508aa02753b70bf7e5d0 Reviewed-on: https://go-review.googlesource.com/c/140303 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Carlos Eduardo Seo
|
9aed4cc395 |
cmd/compile: instrinsify math/bits.Mul on ppc64x
Add SSA rules to intrinsify Mul/Mul64 on ppc64x. benchmark old ns/op new ns/op delta BenchmarkMul-40 8.80 0.93 -89.43% BenchmarkMul32-40 1.39 1.39 +0.00% BenchmarkMul64-40 5.39 0.93 -82.75% Updates #24813 Change-Id: I6e95bfbe976a2278bd17799df184a7fbc0e57829 Reviewed-on: https://go-review.googlesource.com/138917 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
||
Ben Shi
|
5aeecc4530 |
cmd/compile: optimize arm64's code with more shifted operations
This CL optimizes arm64's NEG/MVN/TST/CMN with a shifted operand. 1. The total size of pkg/android_arm64 decreases about 0.2KB, excluding cmd/compile/ . 2. The go1 benchmark shows no regression, excluding noise. name old time/op new time/op delta BinaryTree17-4 16.4s ± 1% 16.4s ± 1% ~ (p=0.914 n=29+29) Fannkuch11-4 8.72s ± 0% 8.72s ± 0% ~ (p=0.274 n=30+29) FmtFprintfEmpty-4 174ns ± 0% 174ns ± 0% ~ (all equal) FmtFprintfString-4 370ns ± 0% 370ns ± 0% ~ (all equal) FmtFprintfInt-4 419ns ± 0% 419ns ± 0% ~ (all equal) FmtFprintfIntInt-4 672ns ± 1% 675ns ± 2% ~ (p=0.217 n=28+30) FmtFprintfPrefixedInt-4 806ns ± 0% 806ns ± 0% ~ (p=0.402 n=30+28) FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% +0.02% (p=0.011 n=22+27) FmtManyArgs-4 2.67µs ± 0% 2.68µs ± 0% ~ (p=0.279 n=29+30) GobDecode-4 33.1ms ± 1% 33.1ms ± 0% ~ (p=0.052 n=28+29) GobEncode-4 29.6ms ± 0% 29.6ms ± 0% +0.08% (p=0.013 n=28+29) Gzip-4 1.38s ± 2% 1.39s ± 2% ~ (p=0.071 n=29+29) Gunzip-4 139ms ± 0% 139ms ± 0% ~ (p=0.265 n=29+29) HTTPClientServer-4 789µs ± 4% 785µs ± 4% ~ (p=0.206 n=29+28) JSONEncode-4 49.7ms ± 0% 49.6ms ± 0% -0.24% (p=0.000 n=30+30) JSONDecode-4 266ms ± 1% 267ms ± 1% +0.34% (p=0.000 n=30+30) Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.835 n=28+30) GoParse-4 15.9ms ± 0% 15.8ms ± 0% -0.29% (p=0.000 n=27+30) RegexpMatchEasy0_32-4 380ns ± 0% 381ns ± 0% +0.18% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 1.18µs ± 0% 1.19µs ± 0% +0.23% (p=0.000 n=30+30) RegexpMatchEasy1_32-4 357ns ± 0% 358ns ± 0% +0.28% (p=0.000 n=29+29) RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.04µs ± 0% +0.06% (p=0.006 n=30+30) RegexpMatchMedium_32-4 589ns ± 0% 590ns ± 0% +0.24% (p=0.000 n=28+30) RegexpMatchMedium_1K-4 162µs ± 0% 162µs ± 0% -0.01% (p=0.027 n=26+29) RegexpMatchHard_32-4 9.58µs ± 0% 9.58µs ± 0% ~ (p=0.935 n=30+30) RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% ~ (p=0.387 n=29+30) Revcomp-4 2.50s ± 0% 2.50s ± 0% -0.10% (p=0.020 n=28+28) Template-4 310ms ± 0% 310ms ± 1% ~ (p=0.406 n=30+30) TimeParse-4 1.68µs ± 0% 1.68µs ± 0% +0.03% (p=0.014 n=30+17) TimeFormat-4 1.65µs ± 0% 1.66µs ± 0% +0.32% (p=0.000 n=27+29) [Geo mean] 247µs 247µs +0.05% name old speed new speed delta GobDecode-4 23.2MB/s ± 0% 23.2MB/s ± 0% -0.08% (p=0.032 n=27+29) GobEncode-4 26.0MB/s ± 0% 25.9MB/s ± 0% -0.10% (p=0.011 n=29+29) Gzip-4 14.1MB/s ± 2% 14.0MB/s ± 2% ~ (p=0.081 n=29+29) Gunzip-4 139MB/s ± 0% 139MB/s ± 0% ~ (p=0.290 n=29+29) JSONEncode-4 39.0MB/s ± 0% 39.1MB/s ± 0% +0.25% (p=0.000 n=29+30) JSONDecode-4 7.30MB/s ± 1% 7.28MB/s ± 1% -0.33% (p=0.000 n=30+30) GoParse-4 3.65MB/s ± 0% 3.66MB/s ± 0% +0.29% (p=0.000 n=27+30) RegexpMatchEasy0_32-4 84.1MB/s ± 0% 84.0MB/s ± 0% -0.17% (p=0.000 n=30+28) RegexpMatchEasy0_1K-4 864MB/s ± 0% 862MB/s ± 0% -0.24% (p=0.000 n=30+30) RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.3MB/s ± 0% -0.18% (p=0.000 n=28+24) RegexpMatchEasy1_1K-4 502MB/s ± 0% 502MB/s ± 0% -0.05% (p=0.008 n=30+29) RegexpMatchMedium_32-4 1.70MB/s ± 0% 1.69MB/s ± 0% -0.59% (p=0.000 n=29+30) RegexpMatchMedium_1K-4 6.31MB/s ± 0% 6.31MB/s ± 0% +0.05% (p=0.005 n=30+26) RegexpMatchHard_32-4 3.34MB/s ± 0% 3.34MB/s ± 0% ~ (all equal) RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.57MB/s ± 0% ~ (all equal) Revcomp-4 102MB/s ± 0% 102MB/s ± 0% +0.10% (p=0.022 n=28+28) Template-4 6.26MB/s ± 0% 6.26MB/s ± 1% ~ (p=0.768 n=30+30) [Geo mean] 24.2MB/s 24.1MB/s -0.08% Change-Id: I494f9db7f8a568a00e9c74ae25086a58b2221683 Reviewed-on: https://go-review.googlesource.com/137976 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Ben Shi
|
d60cf39f8e |
cmd/compile: optimize arm64's MADD and MSUB
This CL implements constant folding for MADD/MSUB on arm64. 1. The total size of pkg/android_arm64/ decreases about 4KB, excluding cmd/compile/ . 2. There is no regression in the go1 benchmark, excluding noise. name old time/op new time/op delta BinaryTree17-4 16.4s ± 1% 16.5s ± 1% +0.24% (p=0.008 n=29+29) Fannkuch11-4 8.73s ± 0% 8.71s ± 0% -0.15% (p=0.000 n=29+29) FmtFprintfEmpty-4 174ns ± 0% 174ns ± 0% ~ (all equal) FmtFprintfString-4 370ns ± 0% 372ns ± 2% +0.53% (p=0.007 n=24+30) FmtFprintfInt-4 419ns ± 0% 419ns ± 0% ~ (all equal) FmtFprintfIntInt-4 673ns ± 1% 661ns ± 1% -1.81% (p=0.000 n=30+27) FmtFprintfPrefixedInt-4 806ns ± 0% 805ns ± 0% ~ (p=0.957 n=28+27) FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% -0.04% (p=0.001 n=22+30) FmtManyArgs-4 2.67µs ± 0% 2.68µs ± 0% +0.03% (p=0.045 n=29+28) GobDecode-4 33.2ms ± 1% 32.5ms ± 1% -2.11% (p=0.000 n=29+29) GobEncode-4 29.5ms ± 0% 29.2ms ± 0% -1.04% (p=0.000 n=28+28) Gzip-4 1.39s ± 2% 1.38s ± 1% -0.48% (p=0.023 n=30+30) Gunzip-4 139ms ± 0% 139ms ± 0% ~ (p=0.616 n=30+28) HTTPClientServer-4 766µs ± 4% 758µs ± 3% -1.03% (p=0.013 n=28+29) JSONEncode-4 49.7ms ± 0% 49.6ms ± 0% -0.24% (p=0.000 n=30+30) JSONDecode-4 266ms ± 0% 268ms ± 1% +1.07% (p=0.000 n=29+30) Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.248 n=30+29) GoParse-4 15.9ms ± 0% 16.0ms ± 0% +0.76% (p=0.000 n=29+29) RegexpMatchEasy0_32-4 381ns ± 0% 380ns ± 0% -0.14% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 1.18µs ± 0% 1.19µs ± 1% +0.30% (p=0.000 n=29+30) RegexpMatchEasy1_32-4 357ns ± 0% 357ns ± 0% ~ (all equal) RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.05µs ± 0% +0.50% (p=0.000 n=26+28) RegexpMatchMedium_32-4 590ns ± 0% 589ns ± 0% -0.12% (p=0.000 n=30+23) RegexpMatchMedium_1K-4 162µs ± 0% 162µs ± 0% ~ (p=0.318 n=28+25) RegexpMatchHard_32-4 9.56µs ± 0% 9.56µs ± 0% ~ (p=0.072 n=30+29) RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% -0.02% (p=0.005 n=28+28) Revcomp-4 2.50s ± 0% 2.51s ± 0% ~ (p=0.246 n=29+29) Template-4 312ms ± 1% 313ms ± 1% +0.46% (p=0.002 n=30+30) TimeParse-4 1.68µs ± 0% 1.67µs ± 0% -0.31% (p=0.000 n=27+29) TimeFormat-4 1.66µs ± 0% 1.64µs ± 0% -0.92% (p=0.000 n=29+26) [Geo mean] 247µs 246µs -0.15% name old speed new speed delta GobDecode-4 23.1MB/s ± 1% 23.6MB/s ± 0% +2.17% (p=0.000 n=29+28) GobEncode-4 26.0MB/s ± 0% 26.3MB/s ± 0% +1.05% (p=0.000 n=28+28) Gzip-4 14.0MB/s ± 2% 14.1MB/s ± 1% +0.47% (p=0.026 n=30+30) Gunzip-4 139MB/s ± 0% 139MB/s ± 0% ~ (p=0.624 n=30+28) JSONEncode-4 39.1MB/s ± 0% 39.2MB/s ± 0% +0.24% (p=0.000 n=30+30) JSONDecode-4 7.31MB/s ± 0% 7.23MB/s ± 1% -1.07% (p=0.000 n=28+30) GoParse-4 3.65MB/s ± 0% 3.62MB/s ± 0% -0.77% (p=0.000 n=29+29) RegexpMatchEasy0_32-4 84.0MB/s ± 0% 84.1MB/s ± 0% +0.18% (p=0.000 n=28+30) RegexpMatchEasy0_1K-4 864MB/s ± 0% 861MB/s ± 1% -0.29% (p=0.000 n=29+30) RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.5MB/s ± 0% ~ (p=0.841 n=28+28) RegexpMatchEasy1_1K-4 502MB/s ± 0% 500MB/s ± 0% -0.51% (p=0.000 n=29+29) RegexpMatchMedium_32-4 1.69MB/s ± 0% 1.70MB/s ± 0% +0.41% (p=0.000 n=26+30) RegexpMatchMedium_1K-4 6.31MB/s ± 0% 6.30MB/s ± 0% ~ (p=0.129 n=30+25) RegexpMatchHard_32-4 3.35MB/s ± 0% 3.35MB/s ± 0% ~ (p=0.657 n=30+29) RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.57MB/s ± 0% ~ (all equal) Revcomp-4 102MB/s ± 0% 101MB/s ± 0% ~ (p=0.213 n=29+29) Template-4 6.22MB/s ± 1% 6.19MB/s ± 1% -0.42% (p=0.005 n=30+29) [Geo mean] 24.1MB/s 24.2MB/s +0.08% Change-Id: I6c02d3c9975f6bd8bc215cb1fc14d29602b45649 Reviewed-on: https://go-review.googlesource.com/138095 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Brian Kessler
|
9eb53ab9bc |
cmd/compile: intrinsify math/bits.Mul
Add SSA rules to intrinsify Mul/Mul64 (AMD64 and ARM64). SSA rules for other functions and architectures are left as a future optimization. Benchmark results on AMD64/ARM64 before and after SSA implementation are below. amd64 name old time/op new time/op delta Add-4 1.78ns ± 0% 1.85ns ±12% ~ (p=0.397 n=4+5) Add32-4 1.71ns ± 1% 1.70ns ± 0% ~ (p=0.683 n=5+5) Add64-4 1.80ns ± 2% 1.77ns ± 0% -1.22% (p=0.048 n=5+5) Sub-4 1.78ns ± 0% 1.78ns ± 0% ~ (all equal) Sub32-4 1.78ns ± 1% 1.78ns ± 0% ~ (p=1.000 n=5+5) Sub64-4 1.78ns ± 1% 1.78ns ± 0% ~ (p=0.968 n=5+4) Mul-4 11.5ns ± 1% 1.8ns ± 2% -84.39% (p=0.008 n=5+5) Mul32-4 1.39ns ± 0% 1.38ns ± 3% ~ (p=0.175 n=5+5) Mul64-4 6.85ns ± 1% 1.78ns ± 1% -73.97% (p=0.008 n=5+5) Div-4 57.1ns ± 1% 56.7ns ± 0% ~ (p=0.087 n=5+5) Div32-4 18.0ns ± 0% 18.0ns ± 0% ~ (all equal) Div64-4 56.4ns ±10% 53.6ns ± 1% ~ (p=0.071 n=5+5) arm64 name old time/op new time/op delta Add-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal) Add32-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal) Add64-96 5.52ns ± 0% 5.51ns ± 0% ~ (p=0.444 n=5+5) Sub-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal) Sub32-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal) Sub64-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal) Mul-96 34.6ns ± 0% 5.0ns ± 0% -85.52% (p=0.008 n=5+5) Mul32-96 4.51ns ± 0% 4.51ns ± 0% ~ (all equal) Mul64-96 21.1ns ± 0% 5.0ns ± 0% -76.26% (p=0.008 n=5+5) Div-96 64.7ns ± 0% 64.7ns ± 0% ~ (all equal) Div32-96 17.0ns ± 0% 17.0ns ± 0% ~ (all equal) Div64-96 53.1ns ± 0% 53.1ns ± 0% ~ (all equal) Updates #24813 Change-Id: I9bda6d2102f65cae3d436a2087b47ed8bafeb068 Reviewed-on: https://go-review.googlesource.com/129415 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Iskander Sharipov
|
c03d0e4fec |
cmd/compile/internal/gc: handle arith ops in samesafeexpr
Teach samesafeexpr to handle arithmetic unary and binary ops. It makes map lookup optimization possible in m[k+1] = append(m[k+1], ...) m[-k] = append(m[-k], ...) ... etc Does not cover "+" for strings (concatenation). Change-Id: Ibbb16ac3faf176958da344be1471b06d7cf33a6c Reviewed-on: https://go-review.googlesource.com/135795 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
Ben Shi
|
c6bf9a8109 |
cmd/compile: optimize AMD64's bit wise operation
Currently "arr[idx] |= 0x80" is compiled to MOVLload->BTSL->MOVLstore. And this CL optimizes it to a single BTSLconstmodify. Other bit wise operations with a direct memory operand are also implemented. 1. The size of the executable bin/go decreases about 4KB, and the total size of pkg/linux_amd64 (excluding cmd/compile) decreases about 0.6KB. 2. There a little improvement in the go1 benchmark test (excluding noise). name old time/op new time/op delta BinaryTree17-4 2.66s ± 4% 2.66s ± 3% ~ (p=0.596 n=49+49) Fannkuch11-4 2.38s ± 2% 2.32s ± 2% -2.69% (p=0.000 n=50+50) FmtFprintfEmpty-4 42.7ns ± 4% 43.2ns ± 7% +1.31% (p=0.009 n=50+50) FmtFprintfString-4 71.0ns ± 5% 72.0ns ± 3% +1.33% (p=0.000 n=50+50) FmtFprintfInt-4 80.7ns ± 4% 80.6ns ± 3% ~ (p=0.931 n=50+50) FmtFprintfIntInt-4 125ns ± 3% 126ns ± 4% ~ (p=0.051 n=50+50) FmtFprintfPrefixedInt-4 158ns ± 1% 142ns ± 3% -9.84% (p=0.000 n=36+50) FmtFprintfFloat-4 215ns ± 4% 212ns ± 4% -1.23% (p=0.002 n=50+50) FmtManyArgs-4 519ns ± 3% 510ns ± 3% -1.77% (p=0.000 n=50+50) GobDecode-4 6.49ms ± 6% 6.52ms ± 5% ~ (p=0.866 n=50+50) GobEncode-4 5.93ms ± 8% 6.01ms ± 7% ~ (p=0.076 n=50+50) Gzip-4 222ms ± 4% 224ms ± 8% +0.80% (p=0.001 n=50+50) Gunzip-4 36.6ms ± 5% 36.4ms ± 4% ~ (p=0.093 n=50+50) HTTPClientServer-4 59.1µs ± 1% 58.9µs ± 2% -0.24% (p=0.039 n=49+48) JSONEncode-4 9.23ms ± 4% 9.21ms ± 5% ~ (p=0.244 n=50+50) JSONDecode-4 48.8ms ± 4% 48.7ms ± 4% ~ (p=0.653 n=50+50) Mandelbrot200-4 3.81ms ± 4% 3.80ms ± 3% ~ (p=0.834 n=50+50) GoParse-4 3.20ms ± 5% 3.19ms ± 5% ~ (p=0.494 n=50+50) RegexpMatchEasy0_32-4 78.1ns ± 2% 77.4ns ± 3% -0.86% (p=0.005 n=50+50) RegexpMatchEasy0_1K-4 233ns ± 3% 233ns ± 3% ~ (p=0.074 n=50+50) RegexpMatchEasy1_32-4 74.2ns ± 3% 73.4ns ± 3% -1.06% (p=0.000 n=50+50) RegexpMatchEasy1_1K-4 369ns ± 2% 364ns ± 4% -1.41% (p=0.000 n=36+50) RegexpMatchMedium_32-4 109ns ± 4% 107ns ± 3% -2.06% (p=0.001 n=50+50) RegexpMatchMedium_1K-4 31.5µs ± 3% 30.8µs ± 3% -2.20% (p=0.000 n=50+50) RegexpMatchHard_32-4 1.57µs ± 3% 1.56µs ± 2% -0.57% (p=0.016 n=50+50) RegexpMatchHard_1K-4 47.4µs ± 4% 47.0µs ± 3% -0.82% (p=0.008 n=50+50) Revcomp-4 414ms ± 7% 412ms ± 7% ~ (p=0.285 n=50+50) Template-4 64.3ms ± 4% 62.7ms ± 3% -2.44% (p=0.000 n=50+50) TimeParse-4 316ns ± 3% 313ns ± 3% ~ (p=0.122 n=50+50) TimeFormat-4 291ns ± 3% 293ns ± 3% +0.80% (p=0.001 n=50+50) [Geo mean] 46.5µs 46.2µs -0.81% name old speed new speed delta GobDecode-4 118MB/s ± 6% 118MB/s ± 5% ~ (p=0.863 n=50+50) GobEncode-4 130MB/s ± 9% 128MB/s ± 8% ~ (p=0.076 n=50+50) Gzip-4 87.4MB/s ± 4% 86.8MB/s ± 7% -0.78% (p=0.002 n=50+50) Gunzip-4 531MB/s ± 5% 533MB/s ± 4% ~ (p=0.093 n=50+50) JSONEncode-4 210MB/s ± 4% 211MB/s ± 5% ~ (p=0.247 n=50+50) JSONDecode-4 39.8MB/s ± 4% 39.9MB/s ± 4% ~ (p=0.654 n=50+50) GoParse-4 18.1MB/s ± 5% 18.2MB/s ± 5% ~ (p=0.493 n=50+50) RegexpMatchEasy0_32-4 410MB/s ± 2% 413MB/s ± 3% +0.86% (p=0.004 n=50+50) RegexpMatchEasy0_1K-4 4.39GB/s ± 3% 4.38GB/s ± 3% ~ (p=0.063 n=50+50) RegexpMatchEasy1_32-4 432MB/s ± 3% 436MB/s ± 3% +1.07% (p=0.000 n=50+50) RegexpMatchEasy1_1K-4 2.77GB/s ± 2% 2.81GB/s ± 4% +1.46% (p=0.000 n=36+50) RegexpMatchMedium_32-4 9.16MB/s ± 3% 9.35MB/s ± 4% +2.09% (p=0.001 n=50+50) RegexpMatchMedium_1K-4 32.5MB/s ± 3% 33.2MB/s ± 3% +2.25% (p=0.000 n=50+50) RegexpMatchHard_32-4 20.4MB/s ± 3% 20.5MB/s ± 2% +0.56% (p=0.017 n=50+50) RegexpMatchHard_1K-4 21.6MB/s ± 4% 21.8MB/s ± 3% +0.83% (p=0.008 n=50+50) Revcomp-4 613MB/s ± 4% 618MB/s ± 7% ~ (p=0.152 n=48+50) Template-4 30.2MB/s ± 4% 30.9MB/s ± 3% +2.49% (p=0.000 n=50+50) [Geo mean] 127MB/s 128MB/s +0.64% Change-Id: If405198283855d75697f66cf894b2bef458f620e Reviewed-on: https://go-review.googlesource.com/135422 Reviewed-by: Keith Randall <khr@golang.org> |
||
fanzha02
|
a19a83c8ef |
cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on arm64
Use float <-> int register moves without conversion instead of stores and loads to move float <-> int values. Math package benchmark results. name old time/op new time/op delta Acosh 153ns ± 0% 147ns ± 0% -3.92% (p=0.000 n=10+10) Asinh 183ns ± 0% 177ns ± 0% -3.28% (p=0.000 n=10+10) Atanh 157ns ± 0% 155ns ± 0% -1.27% (p=0.000 n=10+10) Atan2 118ns ± 0% 117ns ± 1% -0.59% (p=0.003 n=10+10) Cbrt 119ns ± 0% 114ns ± 0% -4.20% (p=0.000 n=10+10) Copysign 7.51ns ± 0% 6.51ns ± 0% -13.32% (p=0.000 n=9+10) Cos 73.1ns ± 0% 70.6ns ± 0% -3.42% (p=0.000 n=10+10) Cosh 119ns ± 0% 121ns ± 0% +1.68% (p=0.000 n=10+9) ExpGo 154ns ± 0% 149ns ± 0% -3.05% (p=0.000 n=9+10) Expm1 101ns ± 0% 99ns ± 0% -1.88% (p=0.000 n=10+10) Exp2Go 150ns ± 0% 146ns ± 0% -2.67% (p=0.000 n=10+10) Abs 7.01ns ± 0% 6.01ns ± 0% -14.27% (p=0.000 n=10+9) Mod 234ns ± 0% 212ns ± 0% -9.40% (p=0.000 n=9+10) Frexp 34.5ns ± 0% 30.0ns ± 0% -13.04% (p=0.000 n=10+10) Gamma 112ns ± 0% 111ns ± 0% -0.89% (p=0.000 n=10+10) Hypot 73.6ns ± 0% 68.6ns ± 0% -6.79% (p=0.000 n=10+10) HypotGo 77.1ns ± 0% 72.1ns ± 0% -6.49% (p=0.000 n=10+10) Ilogb 31.0ns ± 0% 28.0ns ± 0% -9.68% (p=0.000 n=10+10) J0 437ns ± 0% 434ns ± 0% -0.62% (p=0.000 n=10+10) J1 433ns ± 0% 431ns ± 0% -0.46% (p=0.000 n=10+10) Jn 927ns ± 0% 922ns ± 0% -0.54% (p=0.000 n=10+10) Ldexp 41.5ns ± 0% 37.0ns ± 0% -10.84% (p=0.000 n=9+10) Log 124ns ± 0% 118ns ± 0% -4.84% (p=0.000 n=10+9) Logb 34.0ns ± 0% 32.0ns ± 0% -5.88% (p=0.000 n=10+10) Log1p 110ns ± 0% 108ns ± 0% -1.82% (p=0.000 n=10+10) Log10 136ns ± 0% 132ns ± 0% -2.94% (p=0.000 n=10+10) Log2 51.6ns ± 0% 47.1ns ± 0% -8.72% (p=0.000 n=10+10) Nextafter32 33.0ns ± 0% 30.5ns ± 0% -7.58% (p=0.000 n=10+10) Nextafter64 29.0ns ± 0% 26.5ns ± 0% -8.62% (p=0.000 n=10+10) PowInt 169ns ± 0% 160ns ± 0% -5.33% (p=0.000 n=10+10) PowFrac 375ns ± 0% 361ns ± 0% -3.73% (p=0.000 n=10+10) RoundToEven 14.0ns ± 0% 12.5ns ± 0% -10.71% (p=0.000 n=10+10) Remainder 206ns ± 0% 192ns ± 0% -6.80% (p=0.000 n=10+9) Signbit 6.01ns ± 0% 5.51ns ± 0% -8.32% (p=0.000 n=10+9) Sin 70.1ns ± 0% 69.6ns ± 0% -0.71% (p=0.000 n=10+10) Sincos 99.1ns ± 0% 99.6ns ± 0% +0.50% (p=0.000 n=9+10) SqrtGoLatency 178ns ± 0% 146ns ± 0% -17.70% (p=0.000 n=8+10) SqrtPrime 9.19µs ± 0% 9.20µs ± 0% +0.01% (p=0.000 n=9+9) Tanh 125ns ± 1% 127ns ± 0% +1.36% (p=0.000 n=10+10) Y0 428ns ± 0% 426ns ± 0% -0.47% (p=0.000 n=10+10) Y1 431ns ± 0% 429ns ± 0% -0.46% (p=0.000 n=10+9) Yn 906ns ± 0% 901ns ± 0% -0.55% (p=0.000 n=10+10) Float64bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.000 n=10+10) Float64frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+9) Float32bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.002 n=8+10) Float32frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+10) Change-Id: Iba829e15d5624962fe0c699139ea783efeefabc2 Reviewed-on: https://go-review.googlesource.com/129715 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
Keith Randall
|
b1f656b1ce |
cmd/compile: fold address calculations into CMPload[const] ops
Makes go binary smaller by 0.2%. I noticed this in autogenerated equal methods, and there are probably a lot of those. Change-Id: I4e04eb3653fbceb9dd6a4eee97ceab1fa4d10b72 Reviewed-on: https://go-review.googlesource.com/135379 Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> |
||
Lynn Boger
|
8dbd9afbb0 |
cmd/compile: improve rules for PPC64.rules
This adds some improvements to the rules for PPC64 to eliminate unnecessary zero or sign extends, and fix some rule for truncates which were not always using the correct sign instruction. This reduces of size of many functions by 1 or 2 instructions and can improve performance in cases where the execution time depends on small loops where at least 1 instruction was removed and where that loop contributes a significant amount of the total execution time. Included is a testcase for codegen to verify the sign/zero extend instructions are omitted. An example of the improvement (strings): IndexAnyASCII/256:1-16 392ns ± 0% 369ns ± 0% -5.79% (p=0.000 n=1+10) IndexAnyASCII/256:2-16 397ns ± 0% 376ns ± 0% -5.23% (p=0.000 n=1+9) IndexAnyASCII/256:4-16 405ns ± 0% 384ns ± 0% -5.19% (p=1.714 n=1+6) IndexAnyASCII/256:8-16 427ns ± 0% 403ns ± 0% -5.57% (p=0.000 n=1+10) IndexAnyASCII/256:16-16 441ns ± 0% 418ns ± 1% -5.33% (p=0.000 n=1+10) IndexAnyASCII/4096:1-16 5.62µs ± 0% 5.27µs ± 1% -6.31% (p=0.000 n=1+10) IndexAnyASCII/4096:2-16 5.67µs ± 0% 5.29µs ± 0% -6.67% (p=0.222 n=1+8) IndexAnyASCII/4096:4-16 5.66µs ± 0% 5.28µs ± 1% -6.66% (p=0.000 n=1+10) IndexAnyASCII/4096:8-16 5.66µs ± 0% 5.31µs ± 1% -6.10% (p=0.000 n=1+10) IndexAnyASCII/4096:16-16 5.70µs ± 0% 5.33µs ± 1% -6.43% (p=0.182 n=1+10) Change-Id: I739a6132b505936d39001aada5a978ff2a5f0500 Reviewed-on: https://go-review.googlesource.com/129875 Reviewed-by: David Chase <drchase@google.com> |
||
erifan01
|
8149db4f64 |
cmd/compile: intrinsify math.RoundToEven and math.Abs on arm64
math.RoundToEven can be done by one arm64 instruction FRINTND, intrinsify it to improve performance. The current pure Go implementation of the function Abs is translated into five instructions on arm64: str, ldr, and, str, ldr. The intrinsic implementation requires only one instruction, so in terms of performance, intrinsify it is worthwhile. Benchmarks: name old time/op new time/op delta Abs-8 3.50ns ± 0% 1.50ns ± 0% -57.14% (p=0.000 n=10+10) RoundToEven-8 9.26ns ± 0% 1.50ns ± 0% -83.80% (p=0.000 n=10+10) Change-Id: I9456b26ab282b544dfac0154fc86f17aed96ac3d Reviewed-on: https://go-review.googlesource.com/116535 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
fanzha02
|
d5377c2026 |
test: fix the wrong test of math.Copysign(c, -1) for arm64
The CL 132915 added the wrong codegen test for math.Copysign(c, -1), it should test that AND is not emitted. This CL fixes this error. Change-Id: Ida1d3d54ebfc7f238abccbc1f70f914e1b5bfd91 Reviewed-on: https://go-review.googlesource.com/134815 Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
Ben Shi
|
9f2411894b |
cmd/compile: optimize arm's bit operation
BFC (Bit Field Clear) was introduced in ARMv7, which can simplify ANDconst and BICconst. And this CL implements that optimization. 1. The total size of pkg/android_arm decreases about 3KB, excluding cmd/compile/. 2. There is no regression in the go1 benchmark result, and some cases (FmtFprintfEmpty-4 and RegexpMatchMedium_32-4) even get slight improvement. name old time/op new time/op delta BinaryTree17-4 25.3s ± 1% 25.2s ± 1% ~ (p=0.072 n=30+29) Fannkuch11-4 13.3s ± 0% 13.3s ± 0% +0.13% (p=0.000 n=30+26) FmtFprintfEmpty-4 407ns ± 0% 394ns ± 0% -3.19% (p=0.000 n=26+28) FmtFprintfString-4 664ns ± 0% 662ns ± 0% -0.22% (p=0.000 n=30+30) FmtFprintfInt-4 712ns ± 0% 706ns ± 0% -0.79% (p=0.000 n=30+30) FmtFprintfIntInt-4 1.06µs ± 0% 1.05µs ± 0% -0.38% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 1.16µs ± 0% 1.16µs ± 0% -0.13% (p=0.000 n=30+29) FmtFprintfFloat-4 2.24µs ± 0% 2.23µs ± 0% -0.51% (p=0.000 n=29+21) FmtManyArgs-4 4.09µs ± 0% 4.06µs ± 0% -0.83% (p=0.000 n=28+30) GobDecode-4 55.0ms ± 5% 55.4ms ± 5% ~ (p=0.307 n=30+30) GobEncode-4 51.2ms ± 1% 51.9ms ± 1% +1.23% (p=0.000 n=29+30) Gzip-4 2.64s ± 0% 2.60s ± 0% -1.35% (p=0.000 n=30+29) Gunzip-4 309ms ± 0% 308ms ± 0% -0.27% (p=0.000 n=30+30) HTTPClientServer-4 1.03ms ± 5% 1.02ms ± 4% ~ (p=0.117 n=30+29) JSONEncode-4 101ms ± 2% 101ms ± 2% ~ (p=0.338 n=29+29) JSONDecode-4 383ms ± 2% 382ms ± 2% ~ (p=0.751 n=26+30) Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% -0.10% (p=0.000 n=29+29) GoParse-4 22.6ms ± 0% 22.5ms ± 0% -0.39% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 761ns ± 0% 750ns ± 0% -1.47% (p=0.000 n=26+29) RegexpMatchEasy0_1K-4 4.33µs ± 0% 4.34µs ± 0% +0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 809ns ± 0% 795ns ± 0% -1.74% (p=0.000 n=27+25) RegexpMatchEasy1_1K-4 5.54µs ± 0% 5.53µs ± 0% -0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 1.11µs ± 0% 1.08µs ± 0% -2.78% (p=0.000 n=27+29) RegexpMatchMedium_1K-4 255µs ± 0% 255µs ± 0% -0.02% (p=0.029 n=30+30) RegexpMatchHard_32-4 14.7µs ± 0% 14.7µs ± 0% -0.28% (p=0.000 n=30+29) RegexpMatchHard_1K-4 439µs ± 0% 439µs ± 0% ~ (p=0.907 n=23+27) Revcomp-4 41.9ms ± 1% 41.9ms ± 1% ~ (p=0.230 n=28+30) Template-4 522ms ± 1% 528ms ± 1% +1.25% (p=0.000 n=30+30) TimeParse-4 3.34µs ± 0% 3.35µs ± 0% +0.23% (p=0.000 n=30+27) TimeFormat-4 6.06µs ± 0% 6.13µs ± 0% +1.08% (p=0.000 n=29+29) [Geo mean] 384µs 382µs -0.37% name old speed new speed delta GobDecode-4 14.0MB/s ± 5% 13.9MB/s ± 5% ~ (p=0.308 n=30+30) GobEncode-4 15.0MB/s ± 1% 14.8MB/s ± 1% -1.22% (p=0.000 n=29+30) Gzip-4 7.36MB/s ± 0% 7.46MB/s ± 0% +1.35% (p=0.000 n=30+30) Gunzip-4 62.8MB/s ± 0% 63.0MB/s ± 0% +0.27% (p=0.000 n=30+30) JSONEncode-4 19.2MB/s ± 2% 19.2MB/s ± 2% ~ (p=0.312 n=29+29) JSONDecode-4 5.05MB/s ± 3% 5.08MB/s ± 2% ~ (p=0.356 n=29+30) GoParse-4 2.56MB/s ± 0% 2.57MB/s ± 0% +0.39% (p=0.000 n=23+27) RegexpMatchEasy0_32-4 42.0MB/s ± 0% 42.6MB/s ± 0% +1.50% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 236MB/s ± 0% 236MB/s ± 0% -0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 39.6MB/s ± 0% 40.2MB/s ± 0% +1.73% (p=0.000 n=27+27) RegexpMatchEasy1_1K-4 185MB/s ± 0% 185MB/s ± 0% +0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 900kB/s ± 0% 920kB/s ± 0% +2.22% (p=0.000 n=29+29) RegexpMatchMedium_1K-4 4.02MB/s ± 0% 4.02MB/s ± 0% +0.07% (p=0.004 n=30+27) RegexpMatchHard_32-4 2.17MB/s ± 0% 2.18MB/s ± 0% +0.46% (p=0.000 n=30+26) RegexpMatchHard_1K-4 2.33MB/s ± 0% 2.33MB/s ± 0% ~ (all equal) Revcomp-4 60.6MB/s ± 1% 60.7MB/s ± 1% ~ (p=0.207 n=28+30) Template-4 3.72MB/s ± 1% 3.67MB/s ± 1% -1.23% (p=0.000 n=30+30) [Geo mean] 12.9MB/s 12.9MB/s +0.29% Change-Id: I07f497f8bb476c950dc555491d00c9066fb64a4e Reviewed-on: https://go-review.googlesource.com/134232 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
erifan01
|
204cc14bdd |
cmd/compile: implement non-constant rotates using ROR on arm64
Add some rules to match the Go code like: y &= 63 x << y | x >> (64-y) or y &= 63 x >> y | x << (64-y) as a ROR instruction. Make math/bits.RotateLeft faster on arm64. Extends CL 132435 to arm64. Benchmarks of math/bits.RotateLeftxxN: name old time/op new time/op delta RotateLeft-8 3.548750ns +- 1% 2.003750ns +- 0% -43.54% (p=0.000 n=8+8) RotateLeft8-8 3.925000ns +- 0% 3.925000ns +- 0% ~ (p=1.000 n=8+8) RotateLeft16-8 3.925000ns +- 0% 3.927500ns +- 0% ~ (p=0.608 n=8+8) RotateLeft32-8 3.925000ns +- 0% 2.002500ns +- 0% -48.98% (p=0.000 n=8+8) RotateLeft64-8 3.536250ns +- 0% 2.003750ns +- 0% -43.34% (p=0.000 n=8+8) Change-Id: I77622cd7f39b917427e060647321f5513973232c Reviewed-on: https://go-review.googlesource.com/122542 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Ben Shi
|
031a35ec84 |
cmd/compile: optimize 386's comparison
Optimization of "(CMPconst [0] (ANDL x y)) -> (TESTL x y)" only get benefits if there is no further use of the result of x&y. A condition of uses==1 will have slight improvements. 1. The code size of pkg/linux_386 decreases about 300 bytes, excluding cmd/compile/. 2. The go1 benchmark shows no regression, and even a slight improvement in test case FmtFprintfEmpty-4, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.34s ± 3% 3.32s ± 2% ~ (p=0.197 n=30+30) Fannkuch11-4 3.48s ± 2% 3.47s ± 1% -0.33% (p=0.015 n=30+30) FmtFprintfEmpty-4 46.3ns ± 4% 44.8ns ± 4% -3.33% (p=0.000 n=30+30) FmtFprintfString-4 78.8ns ± 7% 77.3ns ± 5% ~ (p=0.098 n=30+26) FmtFprintfInt-4 90.2ns ± 1% 90.0ns ± 7% -0.23% (p=0.027 n=18+30) FmtFprintfIntInt-4 144ns ± 4% 143ns ± 5% ~ (p=0.945 n=30+29) FmtFprintfPrefixedInt-4 180ns ± 4% 180ns ± 5% ~ (p=0.858 n=30+30) FmtFprintfFloat-4 409ns ± 4% 406ns ± 3% -0.87% (p=0.028 n=30+30) FmtManyArgs-4 611ns ± 5% 608ns ± 4% ~ (p=0.812 n=30+30) GobDecode-4 7.30ms ± 5% 7.26ms ± 5% ~ (p=0.522 n=30+29) GobEncode-4 6.90ms ± 7% 6.82ms ± 4% ~ (p=0.086 n=29+28) Gzip-4 396ms ± 4% 400ms ± 4% +0.99% (p=0.026 n=30+30) Gunzip-4 41.1ms ± 3% 41.2ms ± 3% ~ (p=0.495 n=30+30) HTTPClientServer-4 63.7µs ± 3% 63.3µs ± 2% ~ (p=0.113 n=29+29) JSONEncode-4 16.1ms ± 2% 16.1ms ± 2% -0.30% (p=0.041 n=30+30) JSONDecode-4 60.9ms ± 3% 61.2ms ± 6% ~ (p=0.187 n=30+30) Mandelbrot200-4 5.17ms ± 2% 5.19ms ± 3% ~ (p=0.676 n=30+30) GoParse-4 3.28ms ± 3% 3.25ms ± 2% -0.97% (p=0.002 n=30+30) RegexpMatchEasy0_32-4 103ns ± 4% 104ns ± 4% ~ (p=0.352 n=30+30) RegexpMatchEasy0_1K-4 849ns ± 2% 845ns ± 2% ~ (p=0.381 n=30+30) RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 4% ~ (p=0.795 n=30+30) RegexpMatchEasy1_1K-4 1.03µs ± 3% 1.03µs ± 4% ~ (p=0.275 n=25+30) RegexpMatchMedium_32-4 132ns ± 3% 132ns ± 3% ~ (p=0.970 n=30+30) RegexpMatchMedium_1K-4 41.4µs ± 3% 41.4µs ± 3% ~ (p=0.212 n=30+30) RegexpMatchHard_32-4 2.22µs ± 4% 2.22µs ± 4% ~ (p=0.399 n=30+30) RegexpMatchHard_1K-4 67.2µs ± 3% 67.6µs ± 4% ~ (p=0.359 n=30+30) Revcomp-4 1.84s ± 2% 1.83s ± 2% ~ (p=0.532 n=30+30) Template-4 69.1ms ± 4% 68.8ms ± 3% ~ (p=0.146 n=30+30) TimeParse-4 441ns ± 3% 442ns ± 3% ~ (p=0.154 n=30+30) TimeFormat-4 413ns ± 3% 414ns ± 3% ~ (p=0.275 n=30+30) [Geo mean] 66.2µs 66.0µs -0.28% name old speed new speed delta GobDecode-4 105MB/s ± 5% 106MB/s ± 5% ~ (p=0.514 n=30+29) GobEncode-4 111MB/s ± 5% 113MB/s ± 4% +1.37% (p=0.046 n=28+28) Gzip-4 49.1MB/s ± 4% 48.6MB/s ± 4% -0.98% (p=0.028 n=30+30) Gunzip-4 472MB/s ± 4% 472MB/s ± 3% ~ (p=0.496 n=30+30) JSONEncode-4 120MB/s ± 2% 121MB/s ± 2% +0.29% (p=0.042 n=30+30) JSONDecode-4 31.9MB/s ± 3% 31.7MB/s ± 6% ~ (p=0.186 n=30+30) GoParse-4 17.6MB/s ± 3% 17.8MB/s ± 2% +0.98% (p=0.002 n=30+30) RegexpMatchEasy0_32-4 309MB/s ± 4% 307MB/s ± 4% ~ (p=0.501 n=30+30) RegexpMatchEasy0_1K-4 1.21GB/s ± 2% 1.21GB/s ± 2% ~ (p=0.301 n=30+30) RegexpMatchEasy1_32-4 283MB/s ± 4% 282MB/s ± 3% ~ (p=0.877 n=30+30) RegexpMatchEasy1_1K-4 1.00GB/s ± 3% 0.99GB/s ± 4% ~ (p=0.276 n=25+30) RegexpMatchMedium_32-4 7.54MB/s ± 3% 7.55MB/s ± 3% ~ (p=0.528 n=30+30) RegexpMatchMedium_1K-4 24.7MB/s ± 3% 24.7MB/s ± 3% ~ (p=0.203 n=30+30) RegexpMatchHard_32-4 14.4MB/s ± 4% 14.4MB/s ± 4% ~ (p=0.407 n=30+30) RegexpMatchHard_1K-4 15.3MB/s ± 3% 15.1MB/s ± 4% ~ (p=0.306 n=30+30) Revcomp-4 138MB/s ± 2% 139MB/s ± 2% ~ (p=0.520 n=30+30) Template-4 28.1MB/s ± 4% 28.2MB/s ± 3% ~ (p=0.149 n=30+30) [Geo mean] 81.5MB/s 81.5MB/s +0.06% Change-Id: I7f75425f79eec93cdd8fdd94db13ad4f61b6a2f5 Reviewed-on: https://go-review.googlesource.com/133657 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
||
fanzha02
|
2e5c32518c |
cmd/compile: optimize math.Copysign on arm64
Add rewrite rules to optimize math.Copysign() when the second argument is negative floating point constant. For example, math.Copysign(c, -2): The previous compile output is "AND $9223372036854775807, R0, R0; ORR $-9223372036854775808, R0, R0". The optimized compile output is "ORR $-9223372036854775808, R0, R0" Math package benchmark results. name old time/op new time/op delta Copysign-8 2.61ns ± 2% 2.49ns ± 0% -4.55% (p=0.000 n=10+10) Cos-8 43.0ns ± 0% 41.5ns ± 0% -3.49% (p=0.000 n=10+10) Cosh-8 98.6ns ± 0% 98.1ns ± 0% -0.51% (p=0.000 n=10+10) ExpGo-8 107ns ± 0% 105ns ± 0% -1.87% (p=0.000 n=10+10) Exp2Go-8 100ns ± 0% 100ns ± 0% +0.39% (p=0.000 n=10+8) Max-8 6.56ns ± 2% 6.45ns ± 1% -1.63% (p=0.002 n=10+10) Min-8 6.66ns ± 3% 6.47ns ± 2% -2.82% (p=0.006 n=10+10) Mod-8 107ns ± 1% 104ns ± 1% -2.72% (p=0.000 n=10+10) Frexp-8 11.5ns ± 1% 11.0ns ± 0% -4.56% (p=0.000 n=8+10) HypotGo-8 19.4ns ± 0% 19.4ns ± 0% +0.36% (p=0.019 n=10+10) Ilogb-8 8.63ns ± 0% 8.51ns ± 0% -1.36% (p=0.000 n=10+10) Jn-8 584ns ± 0% 585ns ± 0% +0.17% (p=0.000 n=7+8) Ldexp-8 13.8ns ± 0% 13.5ns ± 0% -2.17% (p=0.002 n=8+10) Logb-8 10.2ns ± 0% 9.9ns ± 0% -2.65% (p=0.000 n=10+7) Nextafter64-8 7.54ns ± 0% 7.51ns ± 0% -0.37% (p=0.000 n=10+10) Remainder-8 73.5ns ± 1% 70.4ns ± 1% -4.27% (p=0.000 n=10+10) SqrtGoLatency-8 79.6ns ± 0% 76.2ns ± 0% -4.30% (p=0.000 n=9+10) Yn-8 582ns ± 0% 579ns ± 0% -0.52% (p=0.000 n=10+10) Change-Id: I0c9cd1ea87435e7b8bab94b4e79e6e29785f25b1 Reviewed-on: https://go-review.googlesource.com/132915 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
||
Ben Shi
|
067dfce21f |
cmd/compile: optimize ARM's comparision
Optimize (CMPconst [0] (ADD x y)) to (CMN x y) will only get benefits when the result of the addition is no longer used, otherwise there might be even performance drop. And this CL fixes that issue for CMP/CMN/TST/TEQ. There is little regression in the go1 benchmark (excluding noise), and the test case JSONDecode-4 even gets improvement. name old time/op new time/op delta BinaryTree17-4 21.6s ± 1% 21.6s ± 0% -0.22% (p=0.013 n=30+30) Fannkuch11-4 11.1s ± 0% 11.1s ± 0% +0.11% (p=0.000 n=30+29) FmtFprintfEmpty-4 297ns ± 0% 297ns ± 0% +0.08% (p=0.007 n=26+28) FmtFprintfString-4 589ns ± 1% 589ns ± 0% ~ (p=0.659 n=30+25) FmtFprintfInt-4 644ns ± 1% 650ns ± 0% +0.88% (p=0.000 n=30+24) FmtFprintfIntInt-4 964ns ± 0% 977ns ± 0% +1.33% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 1.06µs ± 0% 1.07µs ± 0% +1.31% (p=0.000 n=29+27) FmtFprintfFloat-4 1.89µs ± 0% 1.92µs ± 0% +1.25% (p=0.000 n=29+29) FmtManyArgs-4 3.63µs ± 0% 3.67µs ± 0% +1.33% (p=0.000 n=29+27) GobDecode-4 38.1ms ± 1% 37.9ms ± 1% -0.60% (p=0.000 n=29+29) GobEncode-4 35.3ms ± 2% 35.2ms ± 1% ~ (p=0.286 n=30+30) Gzip-4 2.36s ± 0% 2.37s ± 2% ~ (p=0.277 n=24+28) Gunzip-4 264ms ± 1% 264ms ± 1% ~ (p=0.104 n=28+30) HTTPClientServer-4 1.04ms ± 4% 1.02ms ± 4% -1.65% (p=0.000 n=28+28) JSONEncode-4 78.5ms ± 1% 79.6ms ± 1% +1.34% (p=0.000 n=27+28) JSONDecode-4 379ms ± 4% 352ms ± 5% -7.09% (p=0.000 n=29+30) Mandelbrot200-4 17.6ms ± 0% 17.6ms ± 0% ~ (p=0.206 n=28+29) GoParse-4 21.9ms ± 1% 22.1ms ± 1% +0.87% (p=0.000 n=28+26) RegexpMatchEasy0_32-4 631ns ± 0% 641ns ± 0% +1.63% (p=0.000 n=29+30) RegexpMatchEasy0_1K-4 4.11µs ± 0% 4.11µs ± 0% ~ (p=0.700 n=30+30) RegexpMatchEasy1_32-4 670ns ± 0% 679ns ± 0% +1.37% (p=0.000 n=21+30) RegexpMatchEasy1_1K-4 5.31µs ± 0% 5.26µs ± 0% -1.03% (p=0.000 n=25+28) RegexpMatchMedium_32-4 905ns ± 0% 906ns ± 0% +0.14% (p=0.001 n=30+30) RegexpMatchMedium_1K-4 192µs ± 0% 191µs ± 0% -0.45% (p=0.000 n=29+27) RegexpMatchHard_32-4 11.8µs ± 0% 11.7µs ± 0% -0.39% (p=0.000 n=29+28) RegexpMatchHard_1K-4 347µs ± 0% 347µs ± 0% ~ (p=0.084 n=29+30) Revcomp-4 37.5ms ± 1% 37.5ms ± 1% ~ (p=0.279 n=29+29) Template-4 519ms ± 2% 519ms ± 2% ~ (p=0.652 n=28+29) TimeParse-4 2.83µs ± 0% 2.78µs ± 0% -1.90% (p=0.000 n=27+28) TimeFormat-4 5.79µs ± 0% 5.60µs ± 0% -3.23% (p=0.000 n=29+29) [Geo mean] 331µs 330µs -0.16% name old speed new speed delta GobDecode-4 20.1MB/s ± 1% 20.3MB/s ± 1% +0.61% (p=0.000 n=29+29) GobEncode-4 21.7MB/s ± 2% 21.8MB/s ± 1% ~ (p=0.294 n=30+30) Gzip-4 8.23MB/s ± 1% 8.20MB/s ± 2% ~ (p=0.099 n=26+28) Gunzip-4 73.5MB/s ± 1% 73.4MB/s ± 1% ~ (p=0.107 n=28+30) JSONEncode-4 24.7MB/s ± 1% 24.4MB/s ± 1% -1.32% (p=0.000 n=27+28) JSONDecode-4 5.13MB/s ± 4% 5.52MB/s ± 5% +7.65% (p=0.000 n=29+30) GoParse-4 2.65MB/s ± 1% 2.63MB/s ± 1% -0.87% (p=0.000 n=28+26) RegexpMatchEasy0_32-4 50.7MB/s ± 0% 49.9MB/s ± 0% -1.58% (p=0.000 n=29+29) RegexpMatchEasy0_1K-4 249MB/s ± 0% 249MB/s ± 0% ~ (p=0.342 n=30+28) RegexpMatchEasy1_32-4 47.7MB/s ± 0% 47.1MB/s ± 0% -1.39% (p=0.000 n=26+30) RegexpMatchEasy1_1K-4 193MB/s ± 0% 195MB/s ± 0% +1.04% (p=0.000 n=25+28) RegexpMatchMedium_32-4 1.10MB/s ± 0% 1.10MB/s ± 0% -0.42% (p=0.000 n=30+26) RegexpMatchMedium_1K-4 5.33MB/s ± 0% 5.36MB/s ± 0% +0.43% (p=0.000 n=29+29) RegexpMatchHard_32-4 2.72MB/s ± 0% 2.73MB/s ± 0% +0.37% (p=0.000 n=29+30) RegexpMatchHard_1K-4 2.95MB/s ± 0% 2.95MB/s ± 0% ~ (all equal) Revcomp-4 67.8MB/s ± 1% 67.7MB/s ± 1% ~ (p=0.273 n=29+29) Template-4 3.74MB/s ± 2% 3.74MB/s ± 2% ~ (p=0.665 n=28+29) [Geo mean] 15.2MB/s 15.2MB/s +0.21% Change-Id: Ifed1fb8cc02d5ca52c8bc6c21b6b5bf6dbb2701a Reviewed-on: https://go-review.googlesource.com/132115 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Michael Munday
|
f94de9c9fb |
cmd/compile: make math/bits.RotateLeft{32,64} intrinsics on s390x
Extends CL 132435 to s390x. s390x has 32- and 64-bit variable rotate left instructions. Change-Id: Ic4f1ebb0e0543207ed2fc8c119e0163b428138a5 Reviewed-on: https://go-review.googlesource.com/133035 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Ben Shi
|
0e9f1de0b7 |
cmd/compile: optimize arm64's comparison
Add more optimization with TST/CMN. 1. A tiny benchmark shows more than 12% improvement. TSTCMN-4 378µs ± 0% 332µs ± 0% -12.15% (p=0.000 n=30+27) (https://github.com/benshi001/ugo1/blob/master/tstcmn_test.go) 2. There is little regression in the go1 benchmark, excluding noise. name old time/op new time/op delta BinaryTree17-4 19.1s ± 0% 19.1s ± 0% ~ (p=0.994 n=28+29) Fannkuch11-4 10.0s ± 0% 10.0s ± 0% ~ (p=0.198 n=30+25) FmtFprintfEmpty-4 233ns ± 0% 233ns ± 0% +0.14% (p=0.002 n=24+30) FmtFprintfString-4 428ns ± 0% 428ns ± 0% ~ (all equal) FmtFprintfInt-4 472ns ± 0% 472ns ± 0% ~ (all equal) FmtFprintfIntInt-4 725ns ± 0% 725ns ± 0% ~ (all equal) FmtFprintfPrefixedInt-4 889ns ± 0% 888ns ± 0% ~ (p=0.632 n=28+30) FmtFprintfFloat-4 1.20µs ± 0% 1.20µs ± 0% +0.05% (p=0.001 n=18+30) FmtManyArgs-4 3.00µs ± 0% 2.99µs ± 0% -0.07% (p=0.001 n=27+30) GobDecode-4 42.1ms ± 0% 42.2ms ± 0% +0.29% (p=0.000 n=28+28) GobEncode-4 38.6ms ± 9% 38.8ms ± 9% ~ (p=0.912 n=30+30) Gzip-4 2.07s ± 1% 2.05s ± 1% -0.64% (p=0.000 n=29+30) Gunzip-4 175ms ± 0% 175ms ± 0% -0.15% (p=0.001 n=30+30) HTTPClientServer-4 872µs ± 5% 880µs ± 6% ~ (p=0.196 n=30+29) JSONEncode-4 88.5ms ± 1% 89.8ms ± 1% +1.49% (p=0.000 n=23+24) JSONDecode-4 393ms ± 1% 390ms ± 1% -0.89% (p=0.000 n=28+30) Mandelbrot200-4 19.5ms ± 0% 19.5ms ± 0% ~ (p=0.405 n=29+28) GoParse-4 19.9ms ± 0% 20.0ms ± 0% +0.27% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 431ns ± 0% 431ns ± 0% ~ (p=1.000 n=30+30) RegexpMatchEasy0_1K-4 1.61µs ± 0% 1.61µs ± 0% ~ (p=0.527 n=26+26) RegexpMatchEasy1_32-4 443ns ± 0% 443ns ± 0% ~ (all equal) RegexpMatchEasy1_1K-4 2.58µs ± 1% 2.58µs ± 1% ~ (p=0.578 n=27+25) RegexpMatchMedium_32-4 740ns ± 0% 740ns ± 0% ~ (p=0.357 n=30+30) RegexpMatchMedium_1K-4 223µs ± 0% 223µs ± 0% +0.16% (p=0.000 n=30+29) RegexpMatchHard_32-4 12.3µs ± 0% 12.3µs ± 0% ~ (p=0.236 n=27+27) RegexpMatchHard_1K-4 371µs ± 0% 371µs ± 0% +0.09% (p=0.000 n=30+27) Revcomp-4 2.85s ± 0% 2.85s ± 0% ~ (p=0.057 n=28+25) Template-4 408ms ± 1% 409ms ± 1% ~ (p=0.117 n=29+29) TimeParse-4 1.93µs ± 0% 1.93µs ± 0% ~ (p=0.535 n=29+28) TimeFormat-4 1.99µs ± 0% 1.99µs ± 0% ~ (p=0.168 n=29+28) [Geo mean] 306µs 307µs +0.07% name old speed new speed delta GobDecode-4 18.3MB/s ± 0% 18.2MB/s ± 0% -0.31% (p=0.000 n=28+29) GobEncode-4 19.9MB/s ± 8% 19.8MB/s ± 9% ~ (p=0.923 n=30+30) Gzip-4 9.39MB/s ± 1% 9.45MB/s ± 1% +0.65% (p=0.000 n=29+30) Gunzip-4 111MB/s ± 0% 111MB/s ± 0% +0.15% (p=0.001 n=30+30) JSONEncode-4 21.9MB/s ± 1% 21.6MB/s ± 1% -1.45% (p=0.000 n=23+23) JSONDecode-4 4.94MB/s ± 1% 4.98MB/s ± 1% +0.84% (p=0.000 n=27+30) GoParse-4 2.91MB/s ± 0% 2.90MB/s ± 0% -0.34% (p=0.000 n=21+22) RegexpMatchEasy0_32-4 74.1MB/s ± 0% 74.1MB/s ± 0% ~ (p=0.469 n=29+28) RegexpMatchEasy0_1K-4 634MB/s ± 0% 634MB/s ± 0% ~ (p=0.978 n=24+28) RegexpMatchEasy1_32-4 72.2MB/s ± 0% 72.2MB/s ± 0% ~ (p=0.064 n=27+29) RegexpMatchEasy1_1K-4 396MB/s ± 1% 396MB/s ± 1% ~ (p=0.583 n=27+25) RegexpMatchMedium_32-4 1.35MB/s ± 0% 1.35MB/s ± 0% ~ (all equal) RegexpMatchMedium_1K-4 4.60MB/s ± 0% 4.59MB/s ± 0% -0.14% (p=0.000 n=30+26) RegexpMatchHard_32-4 2.61MB/s ± 0% 2.61MB/s ± 0% ~ (all equal) RegexpMatchHard_1K-4 2.76MB/s ± 0% 2.76MB/s ± 0% ~ (all equal) Revcomp-4 89.1MB/s ± 0% 89.1MB/s ± 0% ~ (p=0.059 n=28+25) Template-4 4.75MB/s ± 1% 4.75MB/s ± 1% ~ (p=0.106 n=29+29) [Geo mean] 18.3MB/s 18.3MB/s -0.07% Change-Id: I3cd76ce63e84b0c3cebabf9fa3573b76a7343899 Reviewed-on: https://go-review.googlesource.com/124935 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Ben Shi
|
b444215116 |
cmd/compile: optimize ARM64's code with MADD/MSUB
MADD does MUL-ADD in a single instruction, and MSUB does the similiar simplification for MUL-SUB. The CL implements the optimization with MADD/MSUB. 1. The total size of pkg/android_arm64/ decreases about 20KB, excluding cmd/compile/. 2. The go1 benchmark shows a little improvement for RegexpMatchHard_32-4 and Template-4, excluding noise. name old time/op new time/op delta BinaryTree17-4 16.3s ± 1% 16.5s ± 1% +1.41% (p=0.000 n=26+28) Fannkuch11-4 8.79s ± 1% 8.76s ± 0% -0.36% (p=0.000 n=26+28) FmtFprintfEmpty-4 172ns ± 0% 172ns ± 0% ~ (all equal) FmtFprintfString-4 362ns ± 1% 364ns ± 0% +0.55% (p=0.000 n=30+30) FmtFprintfInt-4 416ns ± 0% 416ns ± 0% ~ (p=0.099 n=22+30) FmtFprintfIntInt-4 655ns ± 1% 660ns ± 1% +0.76% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 810ns ± 0% 809ns ± 0% -0.08% (p=0.009 n=29+29) FmtFprintfFloat-4 1.08µs ± 0% 1.09µs ± 0% +0.61% (p=0.000 n=30+29) FmtManyArgs-4 2.70µs ± 0% 2.69µs ± 0% -0.23% (p=0.000 n=29+28) GobDecode-4 32.2ms ± 1% 32.1ms ± 1% -0.39% (p=0.000 n=27+26) GobEncode-4 27.4ms ± 2% 27.4ms ± 1% ~ (p=0.864 n=28+28) Gzip-4 1.53s ± 1% 1.52s ± 1% -0.30% (p=0.031 n=29+29) Gunzip-4 146ms ± 0% 146ms ± 0% -0.14% (p=0.001 n=25+30) HTTPClientServer-4 1.00ms ± 4% 0.98ms ± 6% -1.65% (p=0.001 n=29+30) JSONEncode-4 67.3ms ± 1% 67.2ms ± 1% ~ (p=0.520 n=28+28) JSONDecode-4 329ms ± 5% 330ms ± 4% ~ (p=0.142 n=30+30) Mandelbrot200-4 17.3ms ± 0% 17.3ms ± 0% ~ (p=0.055 n=26+29) GoParse-4 16.9ms ± 1% 17.0ms ± 1% +0.82% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 382ns ± 0% 382ns ± 0% ~ (all equal) RegexpMatchEasy0_1K-4 1.33µs ± 0% 1.33µs ± 0% -0.25% (p=0.000 n=30+27) RegexpMatchEasy1_32-4 361ns ± 0% 361ns ± 0% -0.08% (p=0.002 n=30+28) RegexpMatchEasy1_1K-4 2.11µs ± 0% 2.09µs ± 0% -0.54% (p=0.000 n=30+29) RegexpMatchMedium_32-4 594ns ± 0% 592ns ± 0% -0.32% (p=0.000 n=30+30) RegexpMatchMedium_1K-4 173µs ± 0% 172µs ± 0% -0.77% (p=0.000 n=29+27) RegexpMatchHard_32-4 10.4µs ± 0% 10.1µs ± 0% -3.63% (p=0.000 n=28+27) RegexpMatchHard_1K-4 306µs ± 0% 301µs ± 0% -1.64% (p=0.000 n=29+30) Revcomp-4 2.51s ± 1% 2.52s ± 0% +0.18% (p=0.017 n=26+27) Template-4 394ms ± 3% 382ms ± 3% -3.22% (p=0.000 n=28+28) TimeParse-4 1.67µs ± 0% 1.67µs ± 0% +0.05% (p=0.030 n=27+30) TimeFormat-4 1.72µs ± 0% 1.70µs ± 0% -0.79% (p=0.000 n=28+26) [Geo mean] 259µs 259µs -0.33% name old speed new speed delta GobDecode-4 23.8MB/s ± 1% 23.9MB/s ± 1% +0.40% (p=0.001 n=27+26) GobEncode-4 28.0MB/s ± 2% 28.0MB/s ± 1% ~ (p=0.863 n=28+28) Gzip-4 12.7MB/s ± 1% 12.7MB/s ± 1% +0.32% (p=0.026 n=29+29) Gunzip-4 133MB/s ± 0% 133MB/s ± 0% +0.15% (p=0.001 n=24+30) JSONEncode-4 28.8MB/s ± 1% 28.9MB/s ± 1% ~ (p=0.475 n=28+28) JSONDecode-4 5.89MB/s ± 4% 5.87MB/s ± 5% ~ (p=0.174 n=29+30) GoParse-4 3.43MB/s ± 0% 3.40MB/s ± 1% -0.83% (p=0.000 n=28+30) RegexpMatchEasy0_32-4 83.6MB/s ± 0% 83.6MB/s ± 0% ~ (p=0.848 n=28+29) RegexpMatchEasy0_1K-4 768MB/s ± 0% 770MB/s ± 0% +0.25% (p=0.000 n=30+27) RegexpMatchEasy1_32-4 88.5MB/s ± 0% 88.5MB/s ± 0% ~ (p=0.086 n=29+29) RegexpMatchEasy1_1K-4 486MB/s ± 0% 489MB/s ± 0% +0.54% (p=0.000 n=30+29) RegexpMatchMedium_32-4 1.68MB/s ± 0% 1.69MB/s ± 0% +0.60% (p=0.000 n=30+23) RegexpMatchMedium_1K-4 5.90MB/s ± 0% 5.95MB/s ± 0% +0.85% (p=0.000 n=18+20) RegexpMatchHard_32-4 3.07MB/s ± 0% 3.18MB/s ± 0% +3.72% (p=0.000 n=29+26) RegexpMatchHard_1K-4 3.35MB/s ± 0% 3.40MB/s ± 0% +1.69% (p=0.000 n=30+30) Revcomp-4 101MB/s ± 0% 101MB/s ± 0% -0.18% (p=0.018 n=26+27) Template-4 4.92MB/s ± 4% 5.09MB/s ± 3% +3.31% (p=0.000 n=28+28) [Geo mean] 22.4MB/s 22.6MB/s +0.62% Change-Id: I8f304b272785739f57b3c8f736316f658f8c1b2a Reviewed-on: https://go-review.googlesource.com/129119 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
||
Michael Munday
|
6f9b94ab66 |
cmd/compile: implement OnesCount{8,16,32,64} intrinsics on s390x
This CL implements the math/bits.OnesCount{8,16,32,64} functions as intrinsics on s390x using the 'population count' (popcnt) instruction. This instruction was released as the 'population-count' facility which uses the same facility bit (45) as the 'distinct-operands' facility which is a pre-requisite for Go on s390x. We can therefore use it without a feature check. The s390x popcnt instruction treats a 64 bit register as a vector of 8 bytes, summing the number of ones in each byte individually. It then writes the results to the corresponding bytes in the output register. Therefore to implement OnesCount{16,32,64} we need to sum the individual byte counts using some extra instructions. To do this efficiently I've added some additional pseudo operations to the s390x SSA backend. Unlike other architectures the new instruction sequence is faster for OnesCount8, so that is implemented using the intrinsic. name old time/op new time/op delta OnesCount 3.21ns ± 1% 1.35ns ± 0% -58.00% (p=0.000 n=20+20) OnesCount8 0.91ns ± 1% 0.81ns ± 0% -11.43% (p=0.000 n=20+20) OnesCount16 1.51ns ± 3% 1.21ns ± 0% -19.71% (p=0.000 n=20+17) OnesCount32 1.91ns ± 0% 1.12ns ± 1% -41.60% (p=0.000 n=19+20) OnesCount64 3.18ns ± 4% 1.35ns ± 0% -57.52% (p=0.000 n=20+20) Change-Id: Id54f0bd28b6db9a887ad12c0d72fcc168ef9c4e0 Reviewed-on: https://go-review.googlesource.com/114675 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |