fanzha02
6efd51c6b7
cmd/compile: change the condition flags of floating-point comparisons in arm64 backend
...
The compiler currently reverses the operands to work around NaN in
"less than" and "less than or equal" comparisons. That workaround
breaks down if we want to use "FCMPD/FCMPS $(0.0), Fn" as an
optimization, because the assembler does not support the reversed
form "FCMPD/FCMPS Fn, $(0.0)".
This CL changes the condition flags used for floating-point comparisons
to resolve this problem.
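For context, Go requires every ordered comparison involving NaN to evaluate to false, and any choice of condition flags has to preserve that. The sketch below only illustrates the language-level behaviour; the function names are hypothetical and do not come from the CL.

```go
package main

import (
	"fmt"
	"math"
)

// lessThanZero reports whether x < 0.0. A comparison against the constant
// 0.0 is the kind of pattern the message mentions for "FCMPD/FCMPS $(0.0), Fn".
func lessThanZero(x float64) bool { return x < 0.0 }

// lessOrEqual reports whether x <= y.
func lessOrEqual(x, y float64) bool { return x <= y }

func main() {
	nan := math.NaN()
	// Ordered comparisons with NaN are false, whichever side NaN is on.
	fmt.Println(lessThanZero(nan), lessOrEqual(nan, 1), lessOrEqual(1, nan))
	// Ordinary values behave as usual.
	fmt.Println(lessThanZero(-1), lessOrEqual(1, 1))
}
```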
Change-Id: Ia48076a1da95da64596d6e68304018cb301ebe33
Reviewed-on: https://go-review.googlesource.com/c/go/+/164718
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-07 21:23:52 +00:00
erifan01
4e2b0dda8c
cmd/compile: eliminate unnecessary type conversions in TrailingZeros(16|8) for arm64
...
This CL eliminates unnecessary type conversion operations: OpZeroExt16to64 and OpZeroExt8to64.
If the input argument is a nonzero value, then the ORconst operation can also be eliminated.
Benchmarks:
name old time/op new time/op delta
TrailingZeros-8 2.75ns ± 0% 2.75ns ± 0% ~ (all equal)
TrailingZeros8-8 3.49ns ± 1% 2.93ns ± 0% -16.00% (p=0.000 n=10+10)
TrailingZeros16-8 3.49ns ± 1% 2.93ns ± 0% -16.05% (p=0.000 n=9+10)
TrailingZeros32-8 2.67ns ± 1% 2.68ns ± 1% ~ (p=0.468 n=10+10)
TrailingZeros64-8 2.67ns ± 1% 2.65ns ± 0% -0.62% (p=0.022 n=10+9)
code:
func f16(x uint) { z = bits.TrailingZeros16(uint16(x)) }
Before:
"".f16 STEXT size=48 args=0x8 locals=0x0 leaf
0x0000 00000 (test.go:7) TEXT "".f16(SB), LEAF|NOFRAME|ABIInternal, $0-8
0x0000 00000 (test.go:7) FUNCDATA ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (test.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (test.go:7) FUNCDATA $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (test.go:7) PCDATA $2, ZR
0x0000 00000 (test.go:7) PCDATA ZR, ZR
0x0000 00000 (test.go:7) MOVD "".x(FP), R0
0x0004 00004 (test.go:7) MOVHU R0, R0
0x0008 00008 (test.go:7) ORR $65536, R0, R0
0x000c 00012 (test.go:7) RBIT R0, R0
0x0010 00016 (test.go:7) CLZ R0, R0
0x0014 00020 (test.go:7) MOVD R0, "".z(SB)
0x0020 00032 (test.go:7) RET (R30)
This line of code is unnecessary:
0x0004 00004 (test.go:7) MOVHU R0, R0
After:
"".f16 STEXT size=32 args=0x8 locals=0x0 leaf
0x0000 00000 (test.go:7) TEXT "".f16(SB), LEAF|NOFRAME|ABIInternal, $0-8
0x0000 00000 (test.go:7) FUNCDATA ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (test.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (test.go:7) FUNCDATA $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (test.go:7) PCDATA $2, ZR
0x0000 00000 (test.go:7) PCDATA ZR, ZR
0x0000 00000 (test.go:7) MOVD "".x(FP), R0
0x0004 00004 (test.go:7) ORR $65536, R0, R0
0x0008 00008 (test.go:7) RBITW R0, R0
0x000c 00012 (test.go:7) CLZW R0, R0
0x0010 00016 (test.go:7) MOVD R0, "".z(SB)
0x001c 00028 (test.go:7) RET (R30)
TrailingZeros8 is handled in the same way as TrailingZeros16.
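The reason the zero extension can be dropped can be checked in plain Go: once bit 16 is ORed in (the ORR $65536 above), bits above it cannot change the trailing-zero count. This is a sketch of the idea rather than the compiler rule itself, and tz16ViaSentinel is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math/bits"
)

// tz16ViaSentinel computes the trailing-zero count of the low 16 bits of x
// using a 32-bit count. ORing in bit 16 (65536) caps the result at 16 when
// the low 16 bits are all zero, so x does not need to be zero-extended first.
func tz16ViaSentinel(x uint) int {
	return bits.TrailingZeros32(uint32(x) | 1<<16)
}

func main() {
	for _, x := range []uint{0, 1, 0x8000, 0x30000, 0x12340} {
		// The two columns should agree for every input.
		fmt.Println(x, bits.TrailingZeros16(uint16(x)), tz16ViaSentinel(x))
	}
}
```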
Change-Id: I473bdca06be8460a0be87abbae6fe640017e4c9d
Reviewed-on: https://go-review.googlesource.com/c/go/+/156999
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-07 14:24:56 +00:00
erifan01
fee84cc905
cmd/compile: add an optimization rule for math/bits.ReverseBytes16 on arm
...
This CL adds two rules to turn patterns like ((x<<8) | (x>>8)) (the type of
x is uint16, "|" can also be "+" or "^") to a REV16 instruction on arm v6+.
This optimization rule can be used for math/bits.ReverseBytes16.
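The three source patterns are interchangeable because the two shifted halves never share a set bit, so "|", "+" and "^" all produce the byte-swapped value. A small sketch with hypothetical helper names:

```go
package main

import (
	"fmt"
	"math/bits"
)

// Each helper swaps the two bytes of x. Since x<<8 and x>>8 occupy disjoint
// bits of a uint16, OR, ADD and XOR give the same result.
func swapOr(x uint16) uint16  { return (x << 8) | (x >> 8) }
func swapAdd(x uint16) uint16 { return (x << 8) + (x >> 8) }
func swapXor(x uint16) uint16 { return (x << 8) ^ (x >> 8) }

func main() {
	x := uint16(0x1234)
	fmt.Printf("%#04x %#04x %#04x %#04x\n",
		swapOr(x), swapAdd(x), swapXor(x), bits.ReverseBytes16(x))
}
```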
Benchmarks on arm v6:
name old time/op new time/op delta
ReverseBytes-32 2.86ns ± 0% 2.86ns ± 0% ~ (all equal)
ReverseBytes16-32 2.86ns ± 0% 2.86ns ± 0% ~ (all equal)
ReverseBytes32-32 1.29ns ± 0% 1.29ns ± 0% ~ (all equal)
ReverseBytes64-32 1.43ns ± 0% 1.43ns ± 0% ~ (all equal)
Change-Id: I819e633c9a9d308f8e476fb0c82d73fb73dd019f
Reviewed-on: https://go-review.googlesource.com/c/go/+/159019
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-07 13:37:54 +00:00
erifan01
159b2de442
cmd/compile: optimize math/bits.Div32 for arm64
...
Benchmark:
name old time/op new time/op delta
Div-8 22.0ns ± 0% 22.0ns ± 0% ~ (all equal)
Div32-8 6.51ns ± 0% 3.00ns ± 0% -53.90% (p=0.000 n=10+8)
Div64-8 22.5ns ± 0% 22.5ns ± 0% ~ (all equal)
Code:
func div32(hi, lo, y uint32) (q, r uint32) {return bits.Div32(hi, lo, y)}
Before:
0x0020 00032 (test.go:24) MOVWU "".y+8(FP), R0
0x0024 00036 ($GOROOT/src/math/bits/bits.go:472) CBZW R0, 132
0x0028 00040 ($GOROOT/src/math/bits/bits.go:472) MOVWU "".hi(FP), R1
0x002c 00044 ($GOROOT/src/math/bits/bits.go:472) CMPW R1, R0
0x0030 00048 ($GOROOT/src/math/bits/bits.go:472) BLS 96
0x0034 00052 ($GOROOT/src/math/bits/bits.go:475) MOVWU "".lo+4(FP), R2
0x0038 00056 ($GOROOT/src/math/bits/bits.go:475) ORR R1<<32, R2, R1
0x003c 00060 ($GOROOT/src/math/bits/bits.go:476) CBZ R0, 140
0x0040 00064 ($GOROOT/src/math/bits/bits.go:476) UDIV R0, R1, R2
0x0044 00068 (test.go:24) MOVW R2, "".q+16(FP)
0x0048 00072 ($GOROOT/src/math/bits/bits.go:476) UREM R0, R1, R0
0x0050 00080 (test.go:24) MOVW R0, "".r+20(FP)
0x0054 00084 (test.go:24) MOVD -8(RSP), R29
0x0058 00088 (test.go:24) MOVD.P 32(RSP), R30
0x005c 00092 (test.go:24) RET (R30)
After:
0x001c 00028 (test.go:24) MOVWU "".y+8(FP), R0
0x0020 00032 (test.go:24) CBZW R0, 92
0x0024 00036 (test.go:24) MOVWU "".hi(FP), R1
0x0028 00040 (test.go:24) CMPW R0, R1
0x002c 00044 (test.go:24) BHS 84
0x0030 00048 (test.go:24) MOVWU "".lo+4(FP), R2
0x0034 00052 (test.go:24) ORR R1<<32, R2, R4
0x0038 00056 (test.go:24) UDIV R0, R4, R3
0x003c 00060 (test.go:24) MSUB R3, R4, R0, R4
0x0040 00064 (test.go:24) MOVW R3, "".q+16(FP)
0x0044 00068 (test.go:24) MOVW R4, "".r+20(FP)
0x0048 00072 (test.go:24) MOVD -8(RSP), R29
0x004c 00076 (test.go:24) MOVD.P 16(RSP), R30
0x0050 00080 (test.go:24) RET (R30)
The UREM instruction in the "Before" code is converted to UDIV and MSUB instructions
on arm64. However, the UDIV generated for UREM is unnecessary because it duplicates the
preceding UDIV. This CL adds a rule so that CSE removes this extra UDIV instruction.
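In Go terms, the optimized sequence corresponds to dividing once and deriving the remainder as n - q*y. The sketch below assumes y != 0 and hi < y (the conditions under which bits.Div32 does not panic); divRem32 is a hypothetical name.

```go
package main

import (
	"fmt"
	"math/bits"
)

// divRem32 performs a single 64-bit division and then computes the
// remainder as n - q*y, which is the job of the UDIV + MSUB pair.
// It assumes y != 0 and hi < y.
func divRem32(hi, lo, y uint32) (q, r uint32) {
	n := uint64(hi)<<32 | uint64(lo)
	q64 := n / uint64(y)
	return uint32(q64), uint32(n - q64*uint64(y))
}

func main() {
	q, r := divRem32(1, 5, 7)
	wq, wr := bits.Div32(1, 5, 7)
	fmt.Println(q, r, q == wq && r == wr)
}
```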
Change-Id: Ie2508784320020b2de022806d09f75a7871bb3d7
Reviewed-on: https://go-review.googlesource.com/c/159577
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-03 20:20:10 +00:00
erifan01
192b675f17
cmd/compile: add an optimization rule for math/bits.ReverseBytes16 on arm64
...
On amd64, ReverseBytes16 is lowered to a rotate instruction. arm64 has no 16-bit
rotate instruction, but it does have a REV16W instruction, which can be used
for ReverseBytes16. This CL adds a rule to turn patterns like (x<<8) | (x>>8)
(the type of x is uint16, and "|" can also be "^" or "+") into a REV16W instruction.
Code:
func reverseBytes16(i uint16) uint16 { return bits.ReverseBytes16(i) }
Before:
0x0004 00004 (test.go:6) MOVHU "".i(FP), R0
0x0008 00008 ($GOROOT/src/math/bits/bits.go:262) UBFX $8, R0, $8, R1
0x000c 00012 ($GOROOT/src/math/bits/bits.go:262) ORR R0<<8, R1, R0
0x0010 00016 (test.go:6) MOVH R0, "".~r1+8(FP)
0x0014 00020 (test.go:6) RET (R30)
After:
0x0000 00000 (test.go:6) MOVHU "".i(FP), R0
0x0004 00004 (test.go:6) REV16W R0, R0
0x0008 00008 (test.go:6) MOVH R0, "".~r1+8(FP)
0x000c 00012 (test.go:6) RET (R30)
Benchmarks:
name old time/op new time/op delta
ReverseBytes-224 1.000000ns ± 0% 1.000000ns ± 0% ~ (all equal)
ReverseBytes16-224 1.500000ns ± 0% 1.000000ns ± 0% -33.33% (p=0.000 n=9+10)
ReverseBytes32-224 1.000000ns ± 0% 1.000000ns ± 0% ~ (all equal)
ReverseBytes64-224 1.000000ns ± 0% 1.000000ns ± 0% ~ (all equal)
Change-Id: I87cd41b2d8e549bf39c601f185d5775bd42d739c
Reviewed-on: https://go-review.googlesource.com/c/157757
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-01 15:42:19 +00:00
erifan01
dd91269b7c
cmd/compile: optimize math/bits Len32 intrinsic on arm64
...
Arm64 has a 32-bit CLZ instruction, CLZW, which can be used for the Len32 intrinsic.
LeadingZeros32 calls Len32, so with this change the assembly code for
LeadingZeros32 also becomes more concise.
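The relationship between the two forms can be written out in Go. The first helper mirrors the 64-bit count followed by a subtraction of 32, the second expresses LeadingZeros32 through Len32. Both helper names are hypothetical and the code illustrates the arithmetic only, not the compiler's lowering.

```go
package main

import (
	"fmt"
	"math/bits"
)

// viaClz64 counts leading zeros of the zero-extended value with a 64-bit
// count and subtracts the 32 extra zero bits (the CLZ + SUB $32 shape).
func viaClz64(x uint32) int { return bits.LeadingZeros64(uint64(x)) - 32 }

// viaLen32 uses the identity LeadingZeros32(x) == 32 - Len32(x).
func viaLen32(x uint32) int { return 32 - bits.Len32(x) }

func main() {
	for _, x := range []uint32{0, 1, 0xffff, 0x80000000} {
		fmt.Println(x, viaClz64(x), viaLen32(x), bits.LeadingZeros32(x))
	}
}
```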
Go code:
func f32(x uint32) { z = bits.LeadingZeros32(x) }
Before:
"".f32 STEXT size=32 args=0x8 locals=0x0 leaf
0x0000 00000 (test.go:7) TEXT "".f32(SB), LEAF|NOFRAME|ABIInternal, $0-8
0x0004 00004 (test.go:7) MOVWU "".x(FP), R0
0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZ R0, R0
0x000c 00012 ($GOROOT/src/math/bits/bits.go:30) SUB $32, R0, R0
0x0010 00016 (test.go:7) MOVD R0, "".z(SB)
0x001c 00028 (test.go:7) RET (R30)
After:
"".f32 STEXT size=32 args=0x8 locals=0x0 leaf
0x0000 00000 (test.go:7) TEXT "".f32(SB), LEAF|NOFRAME|ABIInternal, $0-8
0x0004 00004 (test.go:7) MOVWU "".x(FP), R0
0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZW R0, R0
0x000c 00012 (test.go:7) MOVD R0, "".z(SB)
0x0018 00024 (test.go:7) RET (R30)
Benchmarks:
name old time/op new time/op delta
LeadingZeros-8 2.53ns ± 0% 2.55ns ± 0% +0.67% (p=0.000 n=10+10)
LeadingZeros8-8 3.56ns ± 0% 3.56ns ± 0% ~ (all equal)
LeadingZeros16-8 3.55ns ± 0% 3.56ns ± 0% ~ (p=0.465 n=10+10)
LeadingZeros32-8 3.55ns ± 0% 2.96ns ± 0% -16.71% (p=0.000 n=10+7)
LeadingZeros64-8 2.53ns ± 0% 2.54ns ± 0% ~ (p=0.059 n=8+10)
Change-Id: Ie5666bb82909e341060e02ffd4e86c0e5d67e90a
Reviewed-on: https://go-review.googlesource.com/c/157000
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-02-27 16:09:33 +00:00
Iskander Sharipov
c1050a8e54
cmd/compile: don't generate newobject call for 0-sized types
...
Emit &runtime.zerobase instead of a call to newobject for
allocations of zero sized objects in walk.go.
Fixes #29446
Change-Id: I11b67981d55009726a17c2e582c12ce0c258682e
Reviewed-on: https://go-review.googlesource.com/c/155840
Run-TryBot: Iskander Sharipov <quasilyte@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-02-26 23:08:15 +00:00
Keith Randall
933e34ac99
cmd/compile: treat slice pointers as non-nil
...
var a []int = ...
p := &a[0]
_ = *p
We don't need to nil check on the 3rd line. If the bounds check on the 2nd
line passes, we know p is non-nil.
We rely on the fact that any cap>0 slice has a non-nil pointer as its
pointer to the backing array. This is true for all safely-constructed slices,
and I don't see any reason why someone would violate this rule using unsafe.
R=go1.13
Fixes #30366
Change-Id: I3ed764fcb72cfe1fbf963d8c1a82e24e3b6dead7
Reviewed-on: https://go-review.googlesource.com/c/163740
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2019-02-26 20:44:52 +00:00
Keith Randall
c5414457c6
cmd/compile: pad zero-sized stack variables
...
If someone takes a pointer to a zero-sized stack variable, it can
be incorrectly interpreted as a pointer to the next object in the
stack frame. To avoid this, add some padding after zero-sized variables.
We only need to pad if the next variable in memory (which is the
previous variable in the order in which we allocate variables to the
stack frame) has pointers. If the next variable has no pointers, it
won't hurt to have a pointer to it.
Because we allocate all pointer-containing variables before all
non-pointer-containing variables, we should only have to pad once per
frame.
Fixes #24993
Change-Id: Ife561cdfdf964fdbf69af03ae6ba97d004e6193c
Reviewed-on: https://go-review.googlesource.com/c/155698
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-12-22 01:16:00 +00:00
Ben Shi
c042fedbc8
test/codegen: add arithmetic tests for 386/amd64/arm/arm64
...
This CL adds several test cases of arithmetic operations for
386/amd64/arm/arm64.
Change-Id: I362687c06249f31091458a1d8c45fc4d006b616a
Reviewed-on: https://go-review.googlesource.com/c/151897
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-12-01 05:17:44 +00:00
Keith Randall
0b79dde112
cmd/compile: don't use CMOV ops to compute load addresses
...
We want to issue loads as soon as possible, especially when they
are going to miss in the cache. Using a conditional move (CMOV) here:
i := ...
if cond {
i++
}
... = a[i]
means that we have to wait for cond to be computed before the load
is issued. Without a CMOV, if the branch is predicted correctly the
load can be issued in parallel with computing cond.
Even if the branch is predicted incorrectly, maybe the speculative
load is close to the real load, and we get a prefetch for free.
In the worst case, when the prediction is wrong and the address is
way off, we only lose by the time difference between the CMOV
latency (~2 cycles) and the mispredict restart latency (~15 cycles).
We only squash CMOVs that affect load addresses. Results of CMOVs
that are used for other things (store addresses, store values) we
use as before.
Fixes #26306
Change-Id: I82ca14b664bf05e1d45e58de8c4d9c775a127ca1
Reviewed-on: https://go-review.googlesource.com/c/145717
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-11-27 17:22:37 +00:00
Brian Kessler
319787a528
cmd/compile: intrinsify math/bits.Div on amd64
...
Note that the intrinsic implementation panics separately for overflow and
divide by zero, which matches the behavior of the pure Go implementation.
The intrinsic implementation gives a modest performance improvement:
name old time/op new time/op delta
Div-4 53.0ns ± 1% 47.0ns ± 0% -11.28% (p=0.008 n=5+5)
Div32-4 18.4ns ± 0% 18.5ns ± 1% ~ (p=0.444 n=5+5)
Div64-4 53.3ns ± 0% 47.5ns ± 4% -10.77% (p=0.008 n=5+5)
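The two panic conditions mentioned above can be observed from user code. This is a minimal sketch; div64Panics is a hypothetical helper, and it only reports whether a panic occurred, not which one.

```go
package main

import (
	"fmt"
	"math/bits"
)

// div64Panics reports whether bits.Div64(hi, lo, y) panics.
// Div64 panics when y == 0 (divide by zero) or when y <= hi (the quotient
// would not fit in 64 bits).
func div64Panics(hi, lo, y uint64) (panicked bool) {
	defer func() { panicked = recover() != nil }()
	bits.Div64(hi, lo, y)
	return false
}

func main() {
	fmt.Println(div64Panics(1, 0, 0))  // divide by zero
	fmt.Println(div64Panics(1, 0, 1))  // quotient overflow, since y <= hi
	fmt.Println(div64Panics(0, 10, 3)) // ordinary division
}
```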
Updates #28273
Change-Id: Ic1688ecc0964acace2e91bf44ef16f5fb6b6bc82
Reviewed-on: https://go-review.googlesource.com/c/144378
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-27 05:04:25 +00:00
Martin Möhrmann
75798e8ada
runtime: make processor capability variable naming platform specific
...
The current support_XXX variables are specific to the
amd64 and 386 platforms.
Prefix processor capability variables by architecture to have a
consistent naming scheme and avoid reuse of the existing
variables for new platforms.
This also aligns naming of runtime variables closer with internal/cpu
processor capability variable names.
Change-Id: I3eabb29a03874678851376185d3a62e73c1aff1d
Reviewed-on: https://go-review.googlesource.com/c/91435
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-14 20:30:31 +00:00
Lynn Boger
4ae49b5921
cmd/compile: use ANDCC, ORCC, XORCC to avoid CMP on ppc64x
...
This change makes use of the CC versions of the AND, OR, XOR
instructions, removing the need for a separate CMP instruction.
In many test programs and in the go binary, this reduces the
size of 20-30 functions by at least 1 instruction, many in
runtime.
Testcase added to test/codegen/comparisons.go
Change-Id: I6cc1ca8b80b065d7390749c625bc9784b0039adb
Reviewed-on: https://go-review.googlesource.com/c/143059
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-09 19:40:52 +00:00
Keith Randall
0ad332d80c
cmd/compile: implement some moves using non-overlapping reads&writes
...
For moves larger than 8 bytes and smaller than 16 bytes, do the move using
non-overlapping loads/stores if that requires no more instructions.
This helps a bit with the case when the move is from a static
constant, because then the code to materialize the value being moved
is smaller.
Change-Id: Ie47a5a7c654afeb4973142b0a9922faea13c9b54
Reviewed-on: https://go-review.googlesource.com/c/146019
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-30 20:27:03 +00:00
Ben Shi
455ef3f6bc
test/codegen: improve arithmetic tests
...
This CL fixes several typos and adds two more cases
to arithmetic test.
Change-Id: I086560162ea351e2166866e444e2317da36c1729
Reviewed-on: https://go-review.googlesource.com/c/145210
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-30 14:39:53 +00:00
Ben Shi
5f5ea3fd4d
cmd/compile: optimize amd64's ADDQconstmodify/ADDLconstmodify
...
This CL optimizes the following amd64 code:
"ADDQ $-1, MEM_OP" -> "DECQ MEM_OP"
"ADDL $-1, MEM_OP" -> "DECL MEM_OP"
1. The total size of pkg/linux_amd64 (excluding cmd/compile)
decreases by about 0.1KB.
2. The go1 benchmark shows little regression, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 2.60s ± 5% 2.64s ± 3% +1.53% (p=0.000 n=38+39)
Fannkuch11-4 2.37s ± 2% 2.38s ± 2% ~ (p=0.950 n=40+40)
FmtFprintfEmpty-4 40.4ns ± 5% 40.5ns ± 5% ~ (p=0.711 n=40+40)
FmtFprintfString-4 72.4ns ± 5% 72.3ns ± 3% ~ (p=0.485 n=40+40)
FmtFprintfInt-4 79.7ns ± 3% 80.1ns ± 3% ~ (p=0.124 n=40+40)
FmtFprintfIntInt-4 126ns ± 3% 127ns ± 3% +0.71% (p=0.027 n=40+40)
FmtFprintfPrefixedInt-4 153ns ± 4% 153ns ± 2% ~ (p=0.604 n=40+40)
FmtFprintfFloat-4 206ns ± 5% 210ns ± 5% +1.79% (p=0.002 n=40+40)
FmtManyArgs-4 498ns ± 3% 496ns ± 3% ~ (p=0.099 n=40+40)
GobDecode-4 6.48ms ± 6% 6.47ms ± 7% ~ (p=0.686 n=39+40)
GobEncode-4 5.95ms ± 7% 5.96ms ± 6% ~ (p=0.670 n=40+34)
Gzip-4 224ms ± 6% 223ms ± 5% ~ (p=0.143 n=40+40)
Gunzip-4 36.5ms ± 4% 36.5ms ± 4% ~ (p=0.556 n=40+40)
HTTPClientServer-4 60.7µs ± 2% 59.9µs ± 3% -1.20% (p=0.000 n=39+39)
JSONEncode-4 9.03ms ± 4% 9.04ms ± 4% ~ (p=0.589 n=40+40)
JSONDecode-4 49.4ms ± 4% 49.2ms ± 4% ~ (p=0.276 n=40+40)
Mandelbrot200-4 3.80ms ± 4% 3.79ms ± 4% ~ (p=0.837 n=40+40)
GoParse-4 3.15ms ± 5% 3.13ms ± 5% ~ (p=0.240 n=40+40)
RegexpMatchEasy0_32-4 72.9ns ± 3% 72.0ns ± 8% -1.25% (p=0.003 n=40+40)
RegexpMatchEasy0_1K-4 229ns ± 5% 230ns ± 4% ~ (p=0.318 n=40+40)
RegexpMatchEasy1_32-4 66.9ns ± 3% 67.3ns ± 7% ~ (p=0.817 n=40+40)
RegexpMatchEasy1_1K-4 371ns ± 5% 370ns ± 4% ~ (p=0.275 n=40+40)
RegexpMatchMedium_32-4 106ns ± 4% 104ns ± 7% -2.28% (p=0.000 n=40+40)
RegexpMatchMedium_1K-4 32.0µs ± 2% 31.4µs ± 3% -2.08% (p=0.000 n=40+40)
RegexpMatchHard_32-4 1.54µs ± 7% 1.52µs ± 3% -1.80% (p=0.007 n=39+40)
RegexpMatchHard_1K-4 45.8µs ± 4% 45.5µs ± 3% ~ (p=0.707 n=40+40)
Revcomp-4 401ms ± 5% 401ms ± 6% ~ (p=0.935 n=40+40)
Template-4 62.4ms ± 4% 61.2ms ± 3% -1.85% (p=0.000 n=40+40)
TimeParse-4 315ns ± 2% 318ns ± 3% +1.10% (p=0.002 n=40+40)
TimeFormat-4 297ns ± 3% 298ns ± 3% ~ (p=0.238 n=40+40)
[Geo mean] 45.8µs 45.7µs -0.22%
name old speed new speed delta
GobDecode-4 119MB/s ± 6% 119MB/s ± 7% ~ (p=0.684 n=39+40)
GobEncode-4 129MB/s ± 7% 128MB/s ± 6% ~ (p=0.413 n=40+34)
Gzip-4 86.6MB/s ± 6% 87.0MB/s ± 6% ~ (p=0.145 n=40+40)
Gunzip-4 532MB/s ± 4% 532MB/s ± 4% ~ (p=0.556 n=40+40)
JSONEncode-4 215MB/s ± 4% 215MB/s ± 4% ~ (p=0.583 n=40+40)
JSONDecode-4 39.3MB/s ± 4% 39.5MB/s ± 4% ~ (p=0.277 n=40+40)
GoParse-4 18.4MB/s ± 5% 18.5MB/s ± 5% ~ (p=0.229 n=40+40)
RegexpMatchEasy0_32-4 439MB/s ± 3% 445MB/s ± 8% +1.28% (p=0.003 n=40+40)
RegexpMatchEasy0_1K-4 4.46GB/s ± 4% 4.45GB/s ± 4% ~ (p=0.343 n=40+40)
RegexpMatchEasy1_32-4 479MB/s ± 3% 476MB/s ± 7% ~ (p=0.855 n=40+40)
RegexpMatchEasy1_1K-4 2.76GB/s ± 5% 2.77GB/s ± 4% ~ (p=0.250 n=40+40)
RegexpMatchMedium_32-4 9.36MB/s ± 4% 9.58MB/s ± 6% +2.31% (p=0.001 n=40+40)
RegexpMatchMedium_1K-4 32.0MB/s ± 2% 32.7MB/s ± 3% +2.12% (p=0.000 n=40+40)
RegexpMatchHard_32-4 20.7MB/s ± 7% 21.1MB/s ± 3% +1.95% (p=0.005 n=40+40)
RegexpMatchHard_1K-4 22.4MB/s ± 4% 22.5MB/s ± 3% ~ (p=0.689 n=40+40)
Revcomp-4 634MB/s ± 5% 634MB/s ± 6% ~ (p=0.935 n=40+40)
Template-4 31.1MB/s ± 3% 31.7MB/s ± 3% +1.88% (p=0.000 n=40+40)
[Geo mean] 129MB/s 130MB/s +0.62%
Change-Id: I9d61ee810d900920c572cbe89e2f1626bfed12b7
Reviewed-on: https://go-review.googlesource.com/c/145209
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-30 00:22:58 +00:00
Keith Randall
9f291d1fc3
cmd/compile: fix rule for combining loads with compares
...
Unlike normal load+op opcodes, the load+compare opcode does
not clobber its non-load argument. Allow the load+compare merge
to happen even if the non-load argument is used elsewhere.
Noticed when investigating issue #28417.
Change-Id: Ibc48d1f2e06ae76034c59f453815d263e8ec7288
Reviewed-on: https://go-review.googlesource.com/c/145097
Reviewed-by: Ainar Garipov <gugl.zadolbal@gmail.com>
Reviewed-by: Ben Shi <powerman1st@163.com>
2018-10-27 00:59:54 +00:00
Keith Randall
dd789550a7
cmd/compile: intrinsify math/bits.Sub on amd64
...
name old time/op new time/op delta
Sub-8 1.12ns ± 1% 1.17ns ± 1% +5.20% (p=0.008 n=5+5)
Sub32-8 1.11ns ± 0% 1.11ns ± 0% ~ (all samples are equal)
Sub64-8 1.12ns ± 0% 1.18ns ± 1% +5.00% (p=0.016 n=4+5)
Sub64multiple-8 4.10ns ± 1% 0.86ns ± 1% -78.93% (p=0.008 n=5+5)
Fixes #28273
Change-Id: Ibcb6f2fd32d987c3bcbae4f4cd9d335a3de98548
Reviewed-on: https://go-review.googlesource.com/c/144258
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-25 19:47:27 +00:00
Keith Randall
899f3a2892
cmd/compile: intrinsify math/bits.Add on amd64
...
name old time/op new time/op delta
Add-8 1.11ns ± 0% 1.18ns ± 0% +6.31% (p=0.029 n=4+4)
Add32-8 1.02ns ± 0% 1.02ns ± 1% ~ (p=0.333 n=4+5)
Add64-8 1.11ns ± 1% 1.17ns ± 0% +5.79% (p=0.008 n=5+5)
Add64multiple-8 4.35ns ± 1% 0.86ns ± 0% -80.22% (p=0.000 n=5+4)
The individual ops are a bit slower (but still very fast).
Using the ops in carry chains is very fast.
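A carry chain here means feeding the carry-out of one bits.Add64 into the next. A minimal sketch of a 128-bit addition follows; add128 is a hypothetical helper and is not the code used by the Add64multiple benchmark.

```go
package main

import (
	"fmt"
	"math/bits"
)

// add128 adds two 128-bit values given as (hi, lo) word pairs. The carry
// out of the low-word addition is the carry in of the high-word addition.
func add128(aHi, aLo, bHi, bLo uint64) (hi, lo, carry uint64) {
	var c uint64
	lo, c = bits.Add64(aLo, bLo, 0)
	hi, carry = bits.Add64(aHi, bHi, c)
	return hi, lo, carry
}

func main() {
	max := ^uint64(0)
	fmt.Println(add128(0, max, 0, 1))   // carry moves into the high word
	fmt.Println(add128(max, max, 0, 1)) // carry moves out of the high word
}
```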
Update #28273
Change-Id: Id975f76df2b930abf0e412911d327b6c5b1befe5
Reviewed-on: https://go-review.googlesource.com/c/144257
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-25 19:47:00 +00:00
ChrisALiles
13d5cd7847
cmd/compile: use proved bounds to remove signed division fix-ups
...
prove is able to find 94 occurrences in std cmd where a divisor
can't have the value -1. The change removes
the extraneous fix-up code for these cases.
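The fix-up exists because dividing the most negative value by -1 overflows, and the Go spec defines the result as the dividend itself rather than a trap. The sketch below shows that case, plus a guarded division where the divisor is known to be positive. Whether prove handles this exact guard is an assumption, and both function names are hypothetical.

```go
package main

import "fmt"

const minInt64 = -1 << 63

// divAny may be called with y == -1, so generated code has to handle the
// minInt64 / -1 case specially on architectures whose divide would trap.
func divAny(x, y int64) int64 { return x / y }

// divPositive only divides when y > 0, so y cannot be -1 at the division.
// It returns 0 for non-positive divisors purely to keep the sketch total.
func divPositive(x, y int64) int64 {
	if y <= 0 {
		return 0
	}
	return x / y
}

func main() {
	fmt.Println(divAny(minInt64, -1) == minInt64)
	fmt.Println(divPositive(7, 2))
}
```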
Fixes #25239
Change-Id: Ic184de971f47cc57c702eb72805b8e291c14035d
Reviewed-on: https://go-review.googlesource.com/c/130215
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-23 02:29:44 +00:00
Ben Shi
95dda75bde
cmd/compile: optimize store combination on 386/amd64
...
This CL adds 3 rules to combine byte stores into word stores on 386 and
amd64.
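The typical source shape for this kind of rule is a pair of adjacent byte stores that together write one 16-bit value. The sketch below shows such a shape; whether a given compiler version merges this exact function is an assumption, and putUint16LE is a hypothetical name.

```go
package main

import "fmt"

// putUint16LE writes v to b in little-endian order using two adjacent byte
// stores, which is the kind of pair a store-combining rule can merge.
func putUint16LE(b []byte, v uint16) {
	_ = b[1] // single bounds check up front
	b[0] = byte(v)
	b[1] = byte(v >> 8)
}

func main() {
	b := make([]byte, 2)
	putUint16LE(b, 0x1234)
	fmt.Printf("%#02x %#02x\n", b[0], b[1])
}
```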
Change-Id: Iffd9cda42f1961680c81def4edc773ad58f211b3
Reviewed-on: https://go-review.googlesource.com/c/143057
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-19 02:21:04 +00:00
Ben Shi
4158734097
test/codegen: add more combined load/store test cases
...
This CL adds more combined load/store test cases for 386/amd64.
Change-Id: I0a483a6ed0212b65c5e84d67ed8c9f50c389ce2d
Reviewed-on: https://go-review.googlesource.com/c/142878
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-18 01:57:54 +00:00
Lynn Boger
39fa301bdc
test/codegen: enable more tests for ppc64/ppc64le
...
Adding cases for ppc64,ppc64le to the codegen tests
where appropriate.
Change-Id: Idf8cbe88a4ab4406a4ef1ea777bd15a58b68f3ed
Reviewed-on: https://go-review.googlesource.com/c/142557
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-16 19:00:53 +00:00
Ben Shi
4b78fe57a8
cmd/compile: optimize 386's load/store combination
...
This CL adds more rules that combine two consecutive MOVBload/MOVBstore
operations into a single MOVWload/MOVWstore.
1. The size of the go executable decreases by about 4KB, and the total
size of pkg/linux_386 (excluding cmd/compile) decreases by about 1.5KB.
2. There is no regression in the go1 benchmark result, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 3.28s ± 2% 3.29s ± 2% ~ (p=0.151 n=40+40)
Fannkuch11-4 3.52s ± 1% 3.51s ± 1% -0.28% (p=0.002 n=40+40)
FmtFprintfEmpty-4 45.4ns ± 4% 45.0ns ± 4% -0.89% (p=0.019 n=40+40)
FmtFprintfString-4 81.9ns ± 7% 81.3ns ± 1% ~ (p=0.660 n=40+25)
FmtFprintfInt-4 91.9ns ± 9% 91.4ns ± 9% ~ (p=0.249 n=40+40)
FmtFprintfIntInt-4 143ns ± 4% 143ns ± 4% ~ (p=0.760 n=40+40)
FmtFprintfPrefixedInt-4 184ns ± 3% 183ns ± 4% ~ (p=0.485 n=40+40)
FmtFprintfFloat-4 408ns ± 3% 409ns ± 3% ~ (p=0.961 n=40+40)
FmtManyArgs-4 597ns ± 4% 602ns ± 3% ~ (p=0.413 n=40+40)
GobDecode-4 7.13ms ± 6% 7.14ms ± 6% ~ (p=0.859 n=40+40)
GobEncode-4 6.86ms ± 9% 6.94ms ± 7% ~ (p=0.162 n=40+40)
Gzip-4 395ms ± 4% 396ms ± 3% ~ (p=0.099 n=40+40)
Gunzip-4 40.9ms ± 4% 41.1ms ± 3% ~ (p=0.064 n=40+40)
HTTPClientServer-4 63.6µs ± 2% 63.6µs ± 3% ~ (p=0.832 n=36+39)
JSONEncode-4 16.1ms ± 3% 15.8ms ± 3% -1.60% (p=0.001 n=40+40)
JSONDecode-4 61.0ms ± 3% 61.5ms ± 4% ~ (p=0.065 n=40+40)
Mandelbrot200-4 5.16ms ± 3% 5.18ms ± 3% ~ (p=0.056 n=40+40)
GoParse-4 3.25ms ± 2% 3.23ms ± 3% ~ (p=0.727 n=40+40)
RegexpMatchEasy0_32-4 90.2ns ± 3% 89.3ns ± 6% -0.98% (p=0.002 n=40+40)
RegexpMatchEasy0_1K-4 812ns ± 3% 815ns ± 3% ~ (p=0.309 n=40+40)
RegexpMatchEasy1_32-4 103ns ± 6% 103ns ± 5% ~ (p=0.680 n=40+40)
RegexpMatchEasy1_1K-4 1.01µs ± 4% 1.02µs ± 3% ~ (p=0.326 n=40+33)
RegexpMatchMedium_32-4 120ns ± 4% 120ns ± 5% ~ (p=0.834 n=40+40)
RegexpMatchMedium_1K-4 40.1µs ± 3% 39.5µs ± 4% -1.35% (p=0.000 n=40+40)
RegexpMatchHard_32-4 2.27µs ± 6% 2.23µs ± 4% -1.67% (p=0.011 n=40+40)
RegexpMatchHard_1K-4 67.2µs ± 3% 67.2µs ± 3% ~ (p=0.149 n=40+40)
Revcomp-4 1.84s ± 2% 1.86s ± 3% +0.70% (p=0.020 n=40+40)
Template-4 69.0ms ± 4% 69.8ms ± 3% +1.20% (p=0.003 n=40+40)
TimeParse-4 438ns ± 3% 439ns ± 4% ~ (p=0.650 n=40+40)
TimeFormat-4 412ns ± 3% 412ns ± 3% ~ (p=0.888 n=40+40)
[Geo mean] 65.2µs 65.2µs -0.04%
name old speed new speed delta
GobDecode-4 108MB/s ± 6% 108MB/s ± 6% ~ (p=0.855 n=40+40)
GobEncode-4 112MB/s ± 9% 111MB/s ± 8% ~ (p=0.159 n=40+40)
Gzip-4 49.2MB/s ± 4% 49.1MB/s ± 3% ~ (p=0.102 n=40+40)
Gunzip-4 474MB/s ± 3% 472MB/s ± 3% ~ (p=0.063 n=40+40)
JSONEncode-4 121MB/s ± 3% 123MB/s ± 3% +1.62% (p=0.001 n=40+40)
JSONDecode-4 31.9MB/s ± 3% 31.6MB/s ± 4% ~ (p=0.070 n=40+40)
GoParse-4 17.9MB/s ± 2% 17.9MB/s ± 3% ~ (p=0.696 n=40+40)
RegexpMatchEasy0_32-4 355MB/s ± 3% 358MB/s ± 5% +0.99% (p=0.002 n=40+40)
RegexpMatchEasy0_1K-4 1.26GB/s ± 3% 1.26GB/s ± 3% ~ (p=0.381 n=40+40)
RegexpMatchEasy1_32-4 310MB/s ± 5% 310MB/s ± 4% ~ (p=0.655 n=40+40)
RegexpMatchEasy1_1K-4 1.01GB/s ± 4% 1.01GB/s ± 3% ~ (p=0.351 n=40+33)
RegexpMatchMedium_32-4 8.32MB/s ± 4% 8.34MB/s ± 5% ~ (p=0.696 n=40+40)
RegexpMatchMedium_1K-4 25.6MB/s ± 3% 25.9MB/s ± 4% +1.36% (p=0.000 n=40+40)
RegexpMatchHard_32-4 14.1MB/s ± 6% 14.3MB/s ± 4% +1.64% (p=0.011 n=40+40)
RegexpMatchHard_1K-4 15.2MB/s ± 3% 15.2MB/s ± 3% ~ (p=0.147 n=40+40)
Revcomp-4 138MB/s ± 2% 137MB/s ± 3% -0.70% (p=0.021 n=40+40)
Template-4 28.1MB/s ± 4% 27.8MB/s ± 3% -1.19% (p=0.003 n=40+40)
[Geo mean] 83.7MB/s 83.7MB/s +0.03%
Change-Id: I2a2b3a942b5c45467491515d201179fd192e65c9
Reviewed-on: https://go-review.googlesource.com/c/141650
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-16 07:17:11 +00:00
Ben Shi
3785be3093
test/codegen: fix confusing test cases
...
ARMv7's MULAF/MULSF/MULAD/MULSD are not fused;
this CL fixes the confusing test cases.
Change-Id: I35022e207e2f0d24a23a7f6f188e41ba8eee9886
Reviewed-on: https://go-review.googlesource.com/c/142439
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Akhil Indurti <aindurti@gmail.com>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-10-16 07:17:02 +00:00
Martin Möhrmann
a0f57c3fd0
cmd/compile: avoid string allocations when map key is struct or array literal
...
x = map[string(byteslice)] is already optimized by the compiler to avoid a
string allocation. This CL generalizes this optimization to:
x = map[T1{ ... Tn{..., string(byteslice), ...} ... }]
where T1 to Tn is a nesting of struct and array literals.
Found in a hot code path that used a struct of strings made from []byte
slices to make a map lookup.
There are no uses of the more generalized optimization in the standard library.
Passes toolstash -cmp.
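A lookup of the generalized shape looks like the following sketch. The key type and lookup function are hypothetical, and the example demonstrates only the source pattern, not the allocation behaviour.

```go
package main

import "fmt"

// key is a struct of strings used as a map key.
type key struct{ a, b string }

// lookup builds the key from byte slices inside the index expression,
// which is the T{..., string(byteslice), ...} pattern described above.
func lookup(m map[key]int, a, b []byte) int {
	return m[key{string(a), string(b)}]
}

func main() {
	m := map[key]int{key{"x", "y"}: 3}
	fmt.Println(lookup(m, []byte("x"), []byte("y")))
}
```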
name old time/op new time/op delta
MapStringConversion/32/simple 21.9ns ± 2% 21.9ns ± 3% ~ (p=0.995 n=17+20)
MapStringConversion/32/struct 28.8ns ± 3% 22.0ns ± 2% -23.80% (p=0.000 n=20+20)
MapStringConversion/32/array 28.5ns ± 2% 21.9ns ± 2% -23.14% (p=0.000 n=19+16)
MapStringConversion/64/simple 21.0ns ± 2% 21.1ns ± 3% ~ (p=0.072 n=19+18)
MapStringConversion/64/struct 72.4ns ± 3% 21.3ns ± 2% -70.53% (p=0.000 n=20+20)
MapStringConversion/64/array 72.8ns ± 1% 21.0ns ± 2% -71.13% (p=0.000 n=17+19)
name old allocs/op new allocs/op delta
MapStringConversion/32/simple 0.00 0.00 ~ (all equal)
MapStringConversion/32/struct 0.00 0.00 ~ (all equal)
MapStringConversion/32/array 0.00 0.00 ~ (all equal)
MapStringConversion/64/simple 0.00 0.00 ~ (all equal)
MapStringConversion/64/struct 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20)
MapStringConversion/64/array 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20)
Change-Id: I483b4d84d8d74b1025b62c954da9a365e79b7a3a
Reviewed-on: https://go-review.googlesource.com/c/116275
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-15 19:22:07 +00:00
Alberto Donizetti
7c96d87eda
test/codegen: test ppc64 TrailingZeros, OnesCount codegen
...
This change adds codegen tests for the intrinsification on ppc64 of
the OnesCount{64,32,16,8}, and TrailingZeros{64,32,16,8} math/bits
functions.
Change-Id: Id3364921fbd18316850e15c8c71330c906187fdb
Reviewed-on: https://go-review.googlesource.com/c/141897
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-10-15 16:53:03 +00:00
Ben Shi
93e27e01af
test/codegen: add tests of FMA for arm/arm64
...
This CL adds tests of fused multiplication-accumulation
on arm/arm64.
Change-Id: Ic85d5277c0d6acb7e1e723653372dfaf96824a39
Reviewed-on: https://go-review.googlesource.com/c/141652
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-15 14:51:30 +00:00
Ben Shi
c3208842e1
test/codegen: add tests for multiplication-subtraction
...
This CL adds tests for armv7's MULS and arm64's MSUBW.
Change-Id: Id0fd5d26fd477e4ed14389b0d33cad930423eb5b
Reviewed-on: https://go-review.googlesource.com/c/141651
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-15 02:41:33 +00:00
Keith Randall
653a4bd8d4
cmd/compile: optimize loads from readonly globals into constants
...
Instead of
MOVB go.string."foo"(SB), AX
do
MOVB $102, AX
When we know the global we're loading from is readonly, we can
do that read at compile time.
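At the source level, the simplest trigger is indexing a string constant. The sketch below returns the same value as the $102 immediate in the example above; firstByte is a hypothetical name, and whether the load is folded depends on the compiler version and architecture.

```go
package main

import "fmt"

// firstByte reads one byte from read-only string data. The value is fixed
// at compile time, so the load can in principle become a constant.
func firstByte() byte {
	s := "foo"
	return s[0]
}

func main() {
	fmt.Println(firstByte()) // 102, the ASCII code of 'f'
}
```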
I've made this arch-dependent mostly because the cases where this
happens often are memory->memory moves, and those don't get
decomposed until lowering.
Did amd64/386/arm/arm64. Other architectures could follow.
Update #26498
Change-Id: I41b1dc831b2cd0a52dac9b97f4f4457888a46389
Reviewed-on: https://go-review.googlesource.com/c/141118
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-10-14 02:54:40 +00:00
Ben Shi
bac6a2925c
test/codegen: add more arm64 test cases
...
This CL adds 3 combined load test cases for arm64.
Change-Id: I2c67308c40fd8a18f9f2d16c6d12911dcdc583e2
Reviewed-on: https://go-review.googlesource.com/c/140700
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-11 15:14:06 +00:00
Ben Shi
27965c1436
cmd/compile: optimize 386's ADDLconstmodifyidx4
...
This CL optimizes ADDLconstmodifyidx4 to INCL/DECL when the
constant is +1/-1.
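A minimal sketch of source that adds +1 or -1 to a 4-byte element in memory. The helper names bump and drop are hypothetical, and it is an assumption that these functions lower to the indexed op named above on 386.

```go
package main

import "fmt"

// bump (hypothetical helper) adds the constant +1 to a 4-byte slice
// element addressed as base + 4*i.
func bump(counts []int32, i int) {
	counts[i]++
}

// drop (hypothetical helper) adds the constant -1 to the same kind of
// element.
func drop(counts []int32, i int) {
	counts[i]--
}

func main() {
	c := []int32{5, 5}
	bump(c, 0)
	drop(c, 1)
	fmt.Println(c)
}
```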
1. The total size of pkg/linux_386/ decreases 28 bytes, excluding
cmd/compile.
2. There is no regression in the go1 benchmark test, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 3.25s ± 2% 3.23s ± 3% -0.70% (p=0.040 n=30+30)
Fannkuch11-4 3.50s ± 1% 3.47s ± 1% -0.68% (p=0.000 n=30+30)
FmtFprintfEmpty-4 44.6ns ± 3% 44.8ns ± 3% +0.46% (p=0.029 n=30+30)
FmtFprintfString-4 79.0ns ± 3% 78.7ns ± 3% ~ (p=0.053 n=30+30)
FmtFprintfInt-4 89.2ns ± 2% 89.4ns ± 3% ~ (p=0.665 n=30+29)
FmtFprintfIntInt-4 142ns ± 3% 142ns ± 3% ~ (p=0.435 n=30+30)
FmtFprintfPrefixedInt-4 182ns ± 2% 182ns ± 2% ~ (p=0.964 n=30+30)
FmtFprintfFloat-4 407ns ± 3% 411ns ± 4% ~ (p=0.080 n=30+30)
FmtManyArgs-4 597ns ± 3% 593ns ± 4% ~ (p=0.222 n=30+30)
GobDecode-4 7.09ms ± 6% 7.07ms ± 7% ~ (p=0.633 n=30+30)
GobEncode-4 6.81ms ± 9% 6.81ms ± 8% ~ (p=0.982 n=30+30)
Gzip-4 398ms ± 4% 400ms ± 6% ~ (p=0.177 n=30+30)
Gunzip-4 41.3ms ± 3% 40.6ms ± 4% -1.71% (p=0.005 n=30+30)
HTTPClientServer-4 63.4µs ± 3% 63.4µs ± 4% ~ (p=0.646 n=30+28)
JSONEncode-4 16.0ms ± 3% 16.1ms ± 3% ~ (p=0.057 n=30+30)
JSONDecode-4 63.3ms ± 8% 63.1ms ± 7% ~ (p=0.786 n=30+30)
Mandelbrot200-4 5.17ms ± 3% 5.15ms ± 8% ~ (p=0.654 n=30+30)
GoParse-4 3.24ms ± 3% 3.23ms ± 2% ~ (p=0.091 n=30+30)
RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.575 n=30+30)
RegexpMatchEasy0_1K-4 823ns ± 2% 821ns ± 3% ~ (p=0.827 n=30+30)
RegexpMatchEasy1_32-4 113ns ± 3% 112ns ± 3% ~ (p=0.076 n=30+30)
RegexpMatchEasy1_1K-4 1.02µs ± 4% 1.01µs ± 5% ~ (p=0.087 n=30+30)
RegexpMatchMedium_32-4 129ns ± 3% 127ns ± 4% -1.55% (p=0.009 n=30+30)
RegexpMatchMedium_1K-4 39.3µs ± 4% 39.7µs ± 3% ~ (p=0.054 n=30+30)
RegexpMatchHard_32-4 2.15µs ± 4% 2.15µs ± 4% ~ (p=0.712 n=30+30)
RegexpMatchHard_1K-4 66.0µs ± 3% 65.1µs ± 3% -1.32% (p=0.002 n=30+30)
Revcomp-4 1.85s ± 2% 1.85s ± 3% ~ (p=0.168 n=30+30)
Template-4 69.5ms ± 7% 68.9ms ± 6% ~ (p=0.250 n=28+28)
TimeParse-4 434ns ± 3% 432ns ± 4% ~ (p=0.629 n=30+30)
TimeFormat-4 403ns ± 4% 408ns ± 3% +1.23% (p=0.019 n=30+29)
[Geo mean] 65.5µs 65.3µs -0.20%
name old speed new speed delta
GobDecode-4 108MB/s ± 6% 109MB/s ± 6% ~ (p=0.636 n=30+30)
GobEncode-4 113MB/s ±10% 113MB/s ± 9% ~ (p=0.982 n=30+30)
Gzip-4 48.8MB/s ± 4% 48.6MB/s ± 5% ~ (p=0.178 n=30+30)
Gunzip-4 470MB/s ± 3% 479MB/s ± 4% +1.72% (p=0.006 n=30+30)
JSONEncode-4 121MB/s ± 3% 120MB/s ± 3% ~ (p=0.057 n=30+30)
JSONDecode-4 30.7MB/s ± 8% 30.8MB/s ± 8% ~ (p=0.784 n=30+30)
GoParse-4 17.9MB/s ± 3% 17.9MB/s ± 2% ~ (p=0.090 n=30+30)
RegexpMatchEasy0_32-4 309MB/s ± 4% 309MB/s ± 3% ~ (p=0.530 n=30+30)
RegexpMatchEasy0_1K-4 1.24GB/s ± 2% 1.25GB/s ± 3% ~ (p=0.976 n=30+30)
RegexpMatchEasy1_32-4 282MB/s ± 3% 284MB/s ± 3% +0.81% (p=0.041 n=30+30)
RegexpMatchEasy1_1K-4 1.00GB/s ± 3% 1.01GB/s ± 4% ~ (p=0.091 n=30+30)
RegexpMatchMedium_32-4 7.71MB/s ± 3% 7.84MB/s ± 4% +1.71% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 26.1MB/s ± 4% 25.8MB/s ± 3% ~ (p=0.051 n=30+30)
RegexpMatchHard_32-4 14.9MB/s ± 4% 14.9MB/s ± 4% ~ (p=0.712 n=30+30)
RegexpMatchHard_1K-4 15.5MB/s ± 3% 15.7MB/s ± 3% +1.34% (p=0.003 n=30+30)
Revcomp-4 138MB/s ± 2% 137MB/s ± 3% ~ (p=0.174 n=30+30)
Template-4 28.0MB/s ± 6% 28.2MB/s ± 6% ~ (p=0.251 n=28+28)
[Geo mean] 82.3MB/s 82.6MB/s +0.36%
Change-Id: I389829699ffe9500a013fcf31be58a97e98043e1
Reviewed-on: https://go-review.googlesource.com/c/140701
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-11 01:20:34 +00:00
Keith Randall
ceb0c371d9
cmd/compile: make []byte("...") more efficient
...
Do []byte(string) conversions more efficiently when the string
is a constant. Instead of calling stringtobyteslice, allocate
just the space we need and encode the initialization directly.
[]byte("foo") rewrites to the following pseudocode:
var s [3]byte // on heap or stack, depending on whether b escapes
s = *(*[3]byte)(&"foo"[0]) // initialize s from the string
b = s[:]
which generates this assembly:
0x001d 00029 (tmp1.go:9) LEAQ type.[3]uint8(SB), AX
0x0024 00036 (tmp1.go:9) MOVQ AX, (SP)
0x0028 00040 (tmp1.go:9) CALL runtime.newobject(SB)
0x002d 00045 (tmp1.go:9) MOVQ 8(SP), AX
0x0032 00050 (tmp1.go:9) MOVBLZX go.string."foo"+2(SB), CX
0x0039 00057 (tmp1.go:9) MOVWLZX go.string."foo"(SB), DX
0x0040 00064 (tmp1.go:9) MOVW DX, (AX)
0x0043 00067 (tmp1.go:9) MOVB CL, 2(AX)
// Then the slice is b = {AX, 3, 3}
The generated code is still not optimal, as it still does load/store
from read-only memory instead of constant stores. Next CL...
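A minimal sketch of the conversion in ordinary Go source. The helper name fooBytes is hypothetical; the example only shows the observable behavior (a fresh, writable 3-byte copy on each call), not the generated code.

```go
package main

import "fmt"

// fooBytes (hypothetical helper) converts a constant string to a byte
// slice. Each call must yield its own writable copy, because the
// string's backing bytes are read-only.
func fooBytes() []byte {
	return []byte("foo")
}

func main() {
	b := fooBytes()
	b[0] = 'F' // modifies only this copy
	fmt.Println(string(b), string(fooBytes()))
}
```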
Update #26498
Fixes #10170
Change-Id: I4b990b19f9a308f60c8f4f148934acffefe0a5bd
Reviewed-on: https://go-review.googlesource.com/c/140698
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-10 16:10:40 +00:00
Ben Shi
3933302550
cmd/compile: add indexed form for several 386 instructions
...
This CL implements indexed memory operands for the following instructions.
(ADD|SUB|MUL|AND|OR|XOR)Lload -> (ADD|SUB|MUL|AND|OR|XOR)Lloadidx4
(ADD|SUB|AND|OR|XOR)Lmodify -> (ADD|SUB|AND|OR|XOR)Lmodifyidx4
(ADD|AND|OR|XOR)Lconstmodify -> (ADD|AND|OR|XOR)Lconstmodifyidx4
1. The total size of pkg/linux_386/ decreases about 2.5KB, excluding
cmd/compile/ .
2. There is little regression in the go1 benchmark test, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 3.25s ± 3% 3.25s ± 3% ~ (p=0.218 n=40+40)
Fannkuch11-4 3.53s ± 1% 3.53s ± 1% ~ (p=0.303 n=40+40)
FmtFprintfEmpty-4 44.9ns ± 3% 45.6ns ± 3% +1.48% (p=0.030 n=40+36)
FmtFprintfString-4 78.7ns ± 5% 80.1ns ± 7% ~ (p=0.217 n=36+40)
FmtFprintfInt-4 90.2ns ± 6% 89.8ns ± 5% ~ (p=0.659 n=40+38)
FmtFprintfIntInt-4 140ns ± 5% 141ns ± 5% +1.00% (p=0.027 n=40+40)
FmtFprintfPrefixedInt-4 185ns ± 3% 183ns ± 3% ~ (p=0.104 n=40+40)
FmtFprintfFloat-4 411ns ± 4% 406ns ± 3% -1.37% (p=0.005 n=40+40)
FmtManyArgs-4 590ns ± 4% 598ns ± 4% +1.35% (p=0.008 n=40+40)
GobDecode-4 7.16ms ± 5% 7.10ms ± 5% ~ (p=0.335 n=40+40)
GobEncode-4 6.85ms ± 7% 6.74ms ± 9% ~ (p=0.058 n=38+40)
Gzip-4 400ms ± 4% 399ms ± 2% -0.34% (p=0.003 n=40+33)
Gunzip-4 41.4ms ± 3% 41.4ms ± 4% -0.12% (p=0.020 n=40+40)
HTTPClientServer-4 64.1µs ± 4% 63.5µs ± 2% -1.07% (p=0.000 n=39+37)
JSONEncode-4 15.9ms ± 2% 15.9ms ± 3% ~ (p=0.103 n=40+40)
JSONDecode-4 62.2ms ± 4% 61.6ms ± 3% -0.98% (p=0.006 n=39+40)
Mandelbrot200-4 5.18ms ± 3% 5.14ms ± 4% ~ (p=0.125 n=40+40)
GoParse-4 3.29ms ± 2% 3.27ms ± 2% -0.66% (p=0.006 n=40+40)
RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.632 n=40+40)
RegexpMatchEasy0_1K-4 830ns ± 3% 828ns ± 3% ~ (p=0.563 n=40+40)
RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 4% ~ (p=0.494 n=40+40)
RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 4% ~ (p=0.665 n=40+40)
RegexpMatchMedium_32-4 130ns ± 4% 129ns ± 3% ~ (p=0.458 n=40+40)
RegexpMatchMedium_1K-4 39.4µs ± 3% 39.7µs ± 3% ~ (p=0.825 n=40+40)
RegexpMatchHard_32-4 2.16µs ± 4% 2.15µs ± 4% ~ (p=0.137 n=40+40)
RegexpMatchHard_1K-4 65.2µs ± 3% 65.4µs ± 4% ~ (p=0.160 n=40+40)
Revcomp-4 1.87s ± 2% 1.87s ± 1% +0.17% (p=0.019 n=33+33)
Template-4 69.4ms ± 3% 69.8ms ± 3% +0.60% (p=0.009 n=40+40)
TimeParse-4 437ns ± 4% 438ns ± 4% ~ (p=0.234 n=40+40)
TimeFormat-4 408ns ± 3% 408ns ± 3% ~ (p=0.904 n=40+40)
[Geo mean] 65.7µs 65.6µs -0.08%
name old speed new speed delta
GobDecode-4 107MB/s ± 5% 108MB/s ± 5% ~ (p=0.336 n=40+40)
GobEncode-4 112MB/s ± 6% 114MB/s ± 9% +1.95% (p=0.036 n=37+40)
Gzip-4 48.5MB/s ± 4% 48.6MB/s ± 2% +0.28% (p=0.003 n=40+33)
Gunzip-4 469MB/s ± 4% 469MB/s ± 4% +0.11% (p=0.021 n=40+40)
JSONEncode-4 122MB/s ± 2% 122MB/s ± 3% ~ (p=0.105 n=40+40)
JSONDecode-4 31.2MB/s ± 4% 31.5MB/s ± 4% +0.99% (p=0.007 n=39+40)
GoParse-4 17.6MB/s ± 2% 17.7MB/s ± 2% +0.66% (p=0.007 n=40+40)
RegexpMatchEasy0_32-4 310MB/s ± 4% 310MB/s ± 4% ~ (p=0.384 n=40+40)
RegexpMatchEasy0_1K-4 1.23GB/s ± 3% 1.24GB/s ± 3% ~ (p=0.186 n=40+40)
RegexpMatchEasy1_32-4 283MB/s ± 3% 281MB/s ± 4% ~ (p=0.855 n=40+40)
RegexpMatchEasy1_1K-4 1.00GB/s ± 4% 1.00GB/s ± 4% ~ (p=0.665 n=40+40)
RegexpMatchMedium_32-4 7.68MB/s ± 4% 7.73MB/s ± 3% ~ (p=0.359 n=40+40)
RegexpMatchMedium_1K-4 26.0MB/s ± 3% 25.8MB/s ± 3% ~ (p=0.825 n=40+40)
RegexpMatchHard_32-4 14.8MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.136 n=40+40)
RegexpMatchHard_1K-4 15.7MB/s ± 3% 15.7MB/s ± 4% ~ (p=0.150 n=40+40)
Revcomp-4 136MB/s ± 1% 136MB/s ± 1% -0.09% (p=0.028 n=32+33)
Template-4 28.0MB/s ± 3% 27.8MB/s ± 3% -0.59% (p=0.010 n=40+40)
[Geo mean] 82.1MB/s 82.3MB/s +0.25%
Change-Id: Ifa387a251056678326d3508aa02753b70bf7e5d0
Reviewed-on: https://go-review.googlesource.com/c/140303
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-09 03:55:08 +00:00
Carlos Eduardo Seo
9aed4cc395
cmd/compile: intrinsify math/bits.Mul on ppc64x
...
Add SSA rules to intrinsify Mul/Mul64 on ppc64x.
benchmark old ns/op new ns/op delta
BenchmarkMul-40 8.80 0.93 -89.43%
BenchmarkMul32-40 1.39 1.39 +0.00%
BenchmarkMul64-40 5.39 0.93 -82.75%
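A minimal sketch of what bits.Mul64 computes: the full 128-bit product of two 64-bit values, returned as high and low halves. The wrapper name mul128 is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mul128 (hypothetical wrapper) returns the 128-bit product x*y as
// (hi, lo), so that the product equals hi<<64 + lo.
func mul128(x, y uint64) (hi, lo uint64) {
	return bits.Mul64(x, y)
}

func main() {
	// 2^63 * 4 = 2^65, which does not fit in 64 bits.
	hi, lo := mul128(1<<63, 4)
	fmt.Println(hi, lo)
}
```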
Updates #24813
Change-Id: I6e95bfbe976a2278bd17799df184a7fbc0e57829
Reviewed-on: https://go-review.googlesource.com/138917
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-10-02 18:56:06 +00:00
Ben Shi
5aeecc4530
cmd/compile: optimize arm64's code with more shifted operations
...
This CL optimizes arm64's NEG/MVN/TST/CMN with a shifted operand.
1. The total size of pkg/android_arm64 decreases about 0.2KB, excluding
cmd/compile/ .
2. The go1 benchmark shows no regression, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 16.4s ± 1% 16.4s ± 1% ~ (p=0.914 n=29+29)
Fannkuch11-4 8.72s ± 0% 8.72s ± 0% ~ (p=0.274 n=30+29)
FmtFprintfEmpty-4 174ns ± 0% 174ns ± 0% ~ (all equal)
FmtFprintfString-4 370ns ± 0% 370ns ± 0% ~ (all equal)
FmtFprintfInt-4 419ns ± 0% 419ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 672ns ± 1% 675ns ± 2% ~ (p=0.217 n=28+30)
FmtFprintfPrefixedInt-4 806ns ± 0% 806ns ± 0% ~ (p=0.402 n=30+28)
FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% +0.02% (p=0.011 n=22+27)
FmtManyArgs-4 2.67µs ± 0% 2.68µs ± 0% ~ (p=0.279 n=29+30)
GobDecode-4 33.1ms ± 1% 33.1ms ± 0% ~ (p=0.052 n=28+29)
GobEncode-4 29.6ms ± 0% 29.6ms ± 0% +0.08% (p=0.013 n=28+29)
Gzip-4 1.38s ± 2% 1.39s ± 2% ~ (p=0.071 n=29+29)
Gunzip-4 139ms ± 0% 139ms ± 0% ~ (p=0.265 n=29+29)
HTTPClientServer-4 789µs ± 4% 785µs ± 4% ~ (p=0.206 n=29+28)
JSONEncode-4 49.7ms ± 0% 49.6ms ± 0% -0.24% (p=0.000 n=30+30)
JSONDecode-4 266ms ± 1% 267ms ± 1% +0.34% (p=0.000 n=30+30)
Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.835 n=28+30)
GoParse-4 15.9ms ± 0% 15.8ms ± 0% -0.29% (p=0.000 n=27+30)
RegexpMatchEasy0_32-4 380ns ± 0% 381ns ± 0% +0.18% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 1.18µs ± 0% 1.19µs ± 0% +0.23% (p=0.000 n=30+30)
RegexpMatchEasy1_32-4 357ns ± 0% 358ns ± 0% +0.28% (p=0.000 n=29+29)
RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.04µs ± 0% +0.06% (p=0.006 n=30+30)
RegexpMatchMedium_32-4 589ns ± 0% 590ns ± 0% +0.24% (p=0.000 n=28+30)
RegexpMatchMedium_1K-4 162µs ± 0% 162µs ± 0% -0.01% (p=0.027 n=26+29)
RegexpMatchHard_32-4 9.58µs ± 0% 9.58µs ± 0% ~ (p=0.935 n=30+30)
RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% ~ (p=0.387 n=29+30)
Revcomp-4 2.50s ± 0% 2.50s ± 0% -0.10% (p=0.020 n=28+28)
Template-4 310ms ± 0% 310ms ± 1% ~ (p=0.406 n=30+30)
TimeParse-4 1.68µs ± 0% 1.68µs ± 0% +0.03% (p=0.014 n=30+17)
TimeFormat-4 1.65µs ± 0% 1.66µs ± 0% +0.32% (p=0.000 n=27+29)
[Geo mean] 247µs 247µs +0.05%
name old speed new speed delta
GobDecode-4 23.2MB/s ± 0% 23.2MB/s ± 0% -0.08% (p=0.032 n=27+29)
GobEncode-4 26.0MB/s ± 0% 25.9MB/s ± 0% -0.10% (p=0.011 n=29+29)
Gzip-4 14.1MB/s ± 2% 14.0MB/s ± 2% ~ (p=0.081 n=29+29)
Gunzip-4 139MB/s ± 0% 139MB/s ± 0% ~ (p=0.290 n=29+29)
JSONEncode-4 39.0MB/s ± 0% 39.1MB/s ± 0% +0.25% (p=0.000 n=29+30)
JSONDecode-4 7.30MB/s ± 1% 7.28MB/s ± 1% -0.33% (p=0.000 n=30+30)
GoParse-4 3.65MB/s ± 0% 3.66MB/s ± 0% +0.29% (p=0.000 n=27+30)
RegexpMatchEasy0_32-4 84.1MB/s ± 0% 84.0MB/s ± 0% -0.17% (p=0.000 n=30+28)
RegexpMatchEasy0_1K-4 864MB/s ± 0% 862MB/s ± 0% -0.24% (p=0.000 n=30+30)
RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.3MB/s ± 0% -0.18% (p=0.000 n=28+24)
RegexpMatchEasy1_1K-4 502MB/s ± 0% 502MB/s ± 0% -0.05% (p=0.008 n=30+29)
RegexpMatchMedium_32-4 1.70MB/s ± 0% 1.69MB/s ± 0% -0.59% (p=0.000 n=29+30)
RegexpMatchMedium_1K-4 6.31MB/s ± 0% 6.31MB/s ± 0% +0.05% (p=0.005 n=30+26)
RegexpMatchHard_32-4 3.34MB/s ± 0% 3.34MB/s ± 0% ~ (all equal)
RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.57MB/s ± 0% ~ (all equal)
Revcomp-4 102MB/s ± 0% 102MB/s ± 0% +0.10% (p=0.022 n=28+28)
Template-4 6.26MB/s ± 0% 6.26MB/s ± 1% ~ (p=0.768 n=30+30)
[Geo mean] 24.2MB/s 24.1MB/s -0.08%
Change-Id: I494f9db7f8a568a00e9c74ae25086a58b2221683
Reviewed-on: https://go-review.googlesource.com/137976
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-28 15:05:17 +00:00
Ben Shi
d60cf39f8e
cmd/compile: optimize arm64's MADD and MSUB
...
This CL implements constant folding for MADD/MSUB on arm64.
1. The total size of pkg/android_arm64/ decreases about 4KB,
excluding cmd/compile/ .
2. There is no regression in the go1 benchmark, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 16.4s ± 1% 16.5s ± 1% +0.24% (p=0.008 n=29+29)
Fannkuch11-4 8.73s ± 0% 8.71s ± 0% -0.15% (p=0.000 n=29+29)
FmtFprintfEmpty-4 174ns ± 0% 174ns ± 0% ~ (all equal)
FmtFprintfString-4 370ns ± 0% 372ns ± 2% +0.53% (p=0.007 n=24+30)
FmtFprintfInt-4 419ns ± 0% 419ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 673ns ± 1% 661ns ± 1% -1.81% (p=0.000 n=30+27)
FmtFprintfPrefixedInt-4 806ns ± 0% 805ns ± 0% ~ (p=0.957 n=28+27)
FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% -0.04% (p=0.001 n=22+30)
FmtManyArgs-4 2.67µs ± 0% 2.68µs ± 0% +0.03% (p=0.045 n=29+28)
GobDecode-4 33.2ms ± 1% 32.5ms ± 1% -2.11% (p=0.000 n=29+29)
GobEncode-4 29.5ms ± 0% 29.2ms ± 0% -1.04% (p=0.000 n=28+28)
Gzip-4 1.39s ± 2% 1.38s ± 1% -0.48% (p=0.023 n=30+30)
Gunzip-4 139ms ± 0% 139ms ± 0% ~ (p=0.616 n=30+28)
HTTPClientServer-4 766µs ± 4% 758µs ± 3% -1.03% (p=0.013 n=28+29)
JSONEncode-4 49.7ms ± 0% 49.6ms ± 0% -0.24% (p=0.000 n=30+30)
JSONDecode-4 266ms ± 0% 268ms ± 1% +1.07% (p=0.000 n=29+30)
Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.248 n=30+29)
GoParse-4 15.9ms ± 0% 16.0ms ± 0% +0.76% (p=0.000 n=29+29)
RegexpMatchEasy0_32-4 381ns ± 0% 380ns ± 0% -0.14% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 1.18µs ± 0% 1.19µs ± 1% +0.30% (p=0.000 n=29+30)
RegexpMatchEasy1_32-4 357ns ± 0% 357ns ± 0% ~ (all equal)
RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.05µs ± 0% +0.50% (p=0.000 n=26+28)
RegexpMatchMedium_32-4 590ns ± 0% 589ns ± 0% -0.12% (p=0.000 n=30+23)
RegexpMatchMedium_1K-4 162µs ± 0% 162µs ± 0% ~ (p=0.318 n=28+25)
RegexpMatchHard_32-4 9.56µs ± 0% 9.56µs ± 0% ~ (p=0.072 n=30+29)
RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% -0.02% (p=0.005 n=28+28)
Revcomp-4 2.50s ± 0% 2.51s ± 0% ~ (p=0.246 n=29+29)
Template-4 312ms ± 1% 313ms ± 1% +0.46% (p=0.002 n=30+30)
TimeParse-4 1.68µs ± 0% 1.67µs ± 0% -0.31% (p=0.000 n=27+29)
TimeFormat-4 1.66µs ± 0% 1.64µs ± 0% -0.92% (p=0.000 n=29+26)
[Geo mean] 247µs 246µs -0.15%
name old speed new speed delta
GobDecode-4 23.1MB/s ± 1% 23.6MB/s ± 0% +2.17% (p=0.000 n=29+28)
GobEncode-4 26.0MB/s ± 0% 26.3MB/s ± 0% +1.05% (p=0.000 n=28+28)
Gzip-4 14.0MB/s ± 2% 14.1MB/s ± 1% +0.47% (p=0.026 n=30+30)
Gunzip-4 139MB/s ± 0% 139MB/s ± 0% ~ (p=0.624 n=30+28)
JSONEncode-4 39.1MB/s ± 0% 39.2MB/s ± 0% +0.24% (p=0.000 n=30+30)
JSONDecode-4 7.31MB/s ± 0% 7.23MB/s ± 1% -1.07% (p=0.000 n=28+30)
GoParse-4 3.65MB/s ± 0% 3.62MB/s ± 0% -0.77% (p=0.000 n=29+29)
RegexpMatchEasy0_32-4 84.0MB/s ± 0% 84.1MB/s ± 0% +0.18% (p=0.000 n=28+30)
RegexpMatchEasy0_1K-4 864MB/s ± 0% 861MB/s ± 1% -0.29% (p=0.000 n=29+30)
RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.5MB/s ± 0% ~ (p=0.841 n=28+28)
RegexpMatchEasy1_1K-4 502MB/s ± 0% 500MB/s ± 0% -0.51% (p=0.000 n=29+29)
RegexpMatchMedium_32-4 1.69MB/s ± 0% 1.70MB/s ± 0% +0.41% (p=0.000 n=26+30)
RegexpMatchMedium_1K-4 6.31MB/s ± 0% 6.30MB/s ± 0% ~ (p=0.129 n=30+25)
RegexpMatchHard_32-4 3.35MB/s ± 0% 3.35MB/s ± 0% ~ (p=0.657 n=30+29)
RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.57MB/s ± 0% ~ (all equal)
Revcomp-4 102MB/s ± 0% 101MB/s ± 0% ~ (p=0.213 n=29+29)
Template-4 6.22MB/s ± 1% 6.19MB/s ± 1% -0.42% (p=0.005 n=30+29)
[Geo mean] 24.1MB/s 24.2MB/s +0.08%
Change-Id: I6c02d3c9975f6bd8bc215cb1fc14d29602b45649
Reviewed-on: https://go-review.googlesource.com/138095
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-28 15:03:17 +00:00
Brian Kessler
9eb53ab9bc
cmd/compile: intrinsify math/bits.Mul
...
Add SSA rules to intrinsify Mul/Mul64 (AMD64 and ARM64).
SSA rules for other functions and architectures are left as a future
optimization. Benchmark results on AMD64/ARM64 before and after SSA
implementation are below.
amd64
name old time/op new time/op delta
Add-4 1.78ns ± 0% 1.85ns ±12% ~ (p=0.397 n=4+5)
Add32-4 1.71ns ± 1% 1.70ns ± 0% ~ (p=0.683 n=5+5)
Add64-4 1.80ns ± 2% 1.77ns ± 0% -1.22% (p=0.048 n=5+5)
Sub-4 1.78ns ± 0% 1.78ns ± 0% ~ (all equal)
Sub32-4 1.78ns ± 1% 1.78ns ± 0% ~ (p=1.000 n=5+5)
Sub64-4 1.78ns ± 1% 1.78ns ± 0% ~ (p=0.968 n=5+4)
Mul-4 11.5ns ± 1% 1.8ns ± 2% -84.39% (p=0.008 n=5+5)
Mul32-4 1.39ns ± 0% 1.38ns ± 3% ~ (p=0.175 n=5+5)
Mul64-4 6.85ns ± 1% 1.78ns ± 1% -73.97% (p=0.008 n=5+5)
Div-4 57.1ns ± 1% 56.7ns ± 0% ~ (p=0.087 n=5+5)
Div32-4 18.0ns ± 0% 18.0ns ± 0% ~ (all equal)
Div64-4 56.4ns ±10% 53.6ns ± 1% ~ (p=0.071 n=5+5)
arm64
name old time/op new time/op delta
Add-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal)
Add32-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal)
Add64-96 5.52ns ± 0% 5.51ns ± 0% ~ (p=0.444 n=5+5)
Sub-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal)
Sub32-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal)
Sub64-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal)
Mul-96 34.6ns ± 0% 5.0ns ± 0% -85.52% (p=0.008 n=5+5)
Mul32-96 4.51ns ± 0% 4.51ns ± 0% ~ (all equal)
Mul64-96 21.1ns ± 0% 5.0ns ± 0% -76.26% (p=0.008 n=5+5)
Div-96 64.7ns ± 0% 64.7ns ± 0% ~ (all equal)
Div32-96 17.0ns ± 0% 17.0ns ± 0% ~ (all equal)
Div64-96 53.1ns ± 0% 53.1ns ± 0% ~ (all equal)
Updates #24813
Change-Id: I9bda6d2102f65cae3d436a2087b47ed8bafeb068
Reviewed-on: https://go-review.googlesource.com/129415
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-09-26 20:35:57 +00:00
Iskander Sharipov
c03d0e4fec
cmd/compile/internal/gc: handle arith ops in samesafeexpr
...
Teach samesafeexpr to handle arithmetic unary and binary ops.
It makes map lookup optimization possible in
m[k+1] = append(m[k+1], ...)
m[-k] = append(m[-k], ...)
... etc
Does not cover "+" for strings (concatenation).
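A minimal sketch of the source pattern described above, where the same arithmetic key expression appears on both sides of the assignment. The helper name appendNext is hypothetical.

```go
package main

import "fmt"

// appendNext (hypothetical helper) appends v to the slice stored under
// key k+1. Both map index expressions use the identical side-effect-free
// key k+1, which is the situation samesafeexpr has to recognize.
func appendNext(m map[int][]string, k int, v string) {
	m[k+1] = append(m[k+1], v)
}

func main() {
	m := map[int][]string{}
	appendNext(m, 1, "a")
	appendNext(m, 1, "b")
	fmt.Println(m[2])
}
```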
Change-Id: Ibbb16ac3faf176958da344be1471b06d7cf33a6c
Reviewed-on: https://go-review.googlesource.com/135795
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-09-19 12:03:58 +00:00
Ben Shi
c6bf9a8109
cmd/compile: optimize AMD64's bitwise operations
...
Currently "arr[idx] |= 0x80" is compiled to MOVLload->BTSL->MOVLstore.
This CL optimizes it to a single BTSLconstmodify. Other bitwise
operations with a direct memory operand are also implemented.
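A minimal sketch of the statement quoted above wrapped in a function. The helper name setBit7 and the uint32 element type are assumptions made for illustration.

```go
package main

import "fmt"

// setBit7 (hypothetical helper) sets bit 7 (mask 0x80) of one element
// in place: a read-modify-write on memory with a single-bit constant.
func setBit7(arr []uint32, idx int) {
	arr[idx] |= 0x80
}

func main() {
	arr := []uint32{0x01, 0x80}
	setBit7(arr, 0)
	setBit7(arr, 1) // already set, so the value is unchanged
	fmt.Printf("%#x %#x\n", arr[0], arr[1])
}
```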
1. The size of the executable bin/go decreases about 4KB, and the total size
of pkg/linux_amd64 (excluding cmd/compile) decreases about 0.6KB.
2. There is a little improvement in the go1 benchmark test (excluding noise).
name old time/op new time/op delta
BinaryTree17-4 2.66s ± 4% 2.66s ± 3% ~ (p=0.596 n=49+49)
Fannkuch11-4 2.38s ± 2% 2.32s ± 2% -2.69% (p=0.000 n=50+50)
FmtFprintfEmpty-4 42.7ns ± 4% 43.2ns ± 7% +1.31% (p=0.009 n=50+50)
FmtFprintfString-4 71.0ns ± 5% 72.0ns ± 3% +1.33% (p=0.000 n=50+50)
FmtFprintfInt-4 80.7ns ± 4% 80.6ns ± 3% ~ (p=0.931 n=50+50)
FmtFprintfIntInt-4 125ns ± 3% 126ns ± 4% ~ (p=0.051 n=50+50)
FmtFprintfPrefixedInt-4 158ns ± 1% 142ns ± 3% -9.84% (p=0.000 n=36+50)
FmtFprintfFloat-4 215ns ± 4% 212ns ± 4% -1.23% (p=0.002 n=50+50)
FmtManyArgs-4 519ns ± 3% 510ns ± 3% -1.77% (p=0.000 n=50+50)
GobDecode-4 6.49ms ± 6% 6.52ms ± 5% ~ (p=0.866 n=50+50)
GobEncode-4 5.93ms ± 8% 6.01ms ± 7% ~ (p=0.076 n=50+50)
Gzip-4 222ms ± 4% 224ms ± 8% +0.80% (p=0.001 n=50+50)
Gunzip-4 36.6ms ± 5% 36.4ms ± 4% ~ (p=0.093 n=50+50)
HTTPClientServer-4 59.1µs ± 1% 58.9µs ± 2% -0.24% (p=0.039 n=49+48)
JSONEncode-4 9.23ms ± 4% 9.21ms ± 5% ~ (p=0.244 n=50+50)
JSONDecode-4 48.8ms ± 4% 48.7ms ± 4% ~ (p=0.653 n=50+50)
Mandelbrot200-4 3.81ms ± 4% 3.80ms ± 3% ~ (p=0.834 n=50+50)
GoParse-4 3.20ms ± 5% 3.19ms ± 5% ~ (p=0.494 n=50+50)
RegexpMatchEasy0_32-4 78.1ns ± 2% 77.4ns ± 3% -0.86% (p=0.005 n=50+50)
RegexpMatchEasy0_1K-4 233ns ± 3% 233ns ± 3% ~ (p=0.074 n=50+50)
RegexpMatchEasy1_32-4 74.2ns ± 3% 73.4ns ± 3% -1.06% (p=0.000 n=50+50)
RegexpMatchEasy1_1K-4 369ns ± 2% 364ns ± 4% -1.41% (p=0.000 n=36+50)
RegexpMatchMedium_32-4 109ns ± 4% 107ns ± 3% -2.06% (p=0.001 n=50+50)
RegexpMatchMedium_1K-4 31.5µs ± 3% 30.8µs ± 3% -2.20% (p=0.000 n=50+50)
RegexpMatchHard_32-4 1.57µs ± 3% 1.56µs ± 2% -0.57% (p=0.016 n=50+50)
RegexpMatchHard_1K-4 47.4µs ± 4% 47.0µs ± 3% -0.82% (p=0.008 n=50+50)
Revcomp-4 414ms ± 7% 412ms ± 7% ~ (p=0.285 n=50+50)
Template-4 64.3ms ± 4% 62.7ms ± 3% -2.44% (p=0.000 n=50+50)
TimeParse-4 316ns ± 3% 313ns ± 3% ~ (p=0.122 n=50+50)
TimeFormat-4 291ns ± 3% 293ns ± 3% +0.80% (p=0.001 n=50+50)
[Geo mean] 46.5µs 46.2µs -0.81%
name old speed new speed delta
GobDecode-4 118MB/s ± 6% 118MB/s ± 5% ~ (p=0.863 n=50+50)
GobEncode-4 130MB/s ± 9% 128MB/s ± 8% ~ (p=0.076 n=50+50)
Gzip-4 87.4MB/s ± 4% 86.8MB/s ± 7% -0.78% (p=0.002 n=50+50)
Gunzip-4 531MB/s ± 5% 533MB/s ± 4% ~ (p=0.093 n=50+50)
JSONEncode-4 210MB/s ± 4% 211MB/s ± 5% ~ (p=0.247 n=50+50)
JSONDecode-4 39.8MB/s ± 4% 39.9MB/s ± 4% ~ (p=0.654 n=50+50)
GoParse-4 18.1MB/s ± 5% 18.2MB/s ± 5% ~ (p=0.493 n=50+50)
RegexpMatchEasy0_32-4 410MB/s ± 2% 413MB/s ± 3% +0.86% (p=0.004 n=50+50)
RegexpMatchEasy0_1K-4 4.39GB/s ± 3% 4.38GB/s ± 3% ~ (p=0.063 n=50+50)
RegexpMatchEasy1_32-4 432MB/s ± 3% 436MB/s ± 3% +1.07% (p=0.000 n=50+50)
RegexpMatchEasy1_1K-4 2.77GB/s ± 2% 2.81GB/s ± 4% +1.46% (p=0.000 n=36+50)
RegexpMatchMedium_32-4 9.16MB/s ± 3% 9.35MB/s ± 4% +2.09% (p=0.001 n=50+50)
RegexpMatchMedium_1K-4 32.5MB/s ± 3% 33.2MB/s ± 3% +2.25% (p=0.000 n=50+50)
RegexpMatchHard_32-4 20.4MB/s ± 3% 20.5MB/s ± 2% +0.56% (p=0.017 n=50+50)
RegexpMatchHard_1K-4 21.6MB/s ± 4% 21.8MB/s ± 3% +0.83% (p=0.008 n=50+50)
Revcomp-4 613MB/s ± 4% 618MB/s ± 7% ~ (p=0.152 n=48+50)
Template-4 30.2MB/s ± 4% 30.9MB/s ± 3% +2.49% (p=0.000 n=50+50)
[Geo mean] 127MB/s 128MB/s +0.64%
Change-Id: If405198283855d75697f66cf894b2bef458f620e
Reviewed-on: https://go-review.googlesource.com/135422
Reviewed-by: Keith Randall <khr@golang.org>
2018-09-19 03:00:58 +00:00
fanzha02
a19a83c8ef
cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on arm64
...
Use float <-> int register moves without conversion instead of stores
and loads to move float <-> int values.
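A minimal sketch of the two functions involved: they reinterpret the bits of a value without any numeric conversion. The helper name roundTrip is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// roundTrip (hypothetical helper) moves a float64 to its IEEE 754 bit
// pattern and back. No rounding or conversion is involved, which is why
// a plain register-to-register move is sufficient.
func roundTrip(f float64) (uint64, float64) {
	b := math.Float64bits(f)
	return b, math.Float64frombits(b)
}

func main() {
	b, f := roundTrip(1.0)
	fmt.Printf("%#x %v\n", b, f)
}
```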
Math package benchmark results.
name old time/op new time/op delta
Acosh 153ns ± 0% 147ns ± 0% -3.92% (p=0.000 n=10+10)
Asinh 183ns ± 0% 177ns ± 0% -3.28% (p=0.000 n=10+10)
Atanh 157ns ± 0% 155ns ± 0% -1.27% (p=0.000 n=10+10)
Atan2 118ns ± 0% 117ns ± 1% -0.59% (p=0.003 n=10+10)
Cbrt 119ns ± 0% 114ns ± 0% -4.20% (p=0.000 n=10+10)
Copysign 7.51ns ± 0% 6.51ns ± 0% -13.32% (p=0.000 n=9+10)
Cos 73.1ns ± 0% 70.6ns ± 0% -3.42% (p=0.000 n=10+10)
Cosh 119ns ± 0% 121ns ± 0% +1.68% (p=0.000 n=10+9)
ExpGo 154ns ± 0% 149ns ± 0% -3.05% (p=0.000 n=9+10)
Expm1 101ns ± 0% 99ns ± 0% -1.88% (p=0.000 n=10+10)
Exp2Go 150ns ± 0% 146ns ± 0% -2.67% (p=0.000 n=10+10)
Abs 7.01ns ± 0% 6.01ns ± 0% -14.27% (p=0.000 n=10+9)
Mod 234ns ± 0% 212ns ± 0% -9.40% (p=0.000 n=9+10)
Frexp 34.5ns ± 0% 30.0ns ± 0% -13.04% (p=0.000 n=10+10)
Gamma 112ns ± 0% 111ns ± 0% -0.89% (p=0.000 n=10+10)
Hypot 73.6ns ± 0% 68.6ns ± 0% -6.79% (p=0.000 n=10+10)
HypotGo 77.1ns ± 0% 72.1ns ± 0% -6.49% (p=0.000 n=10+10)
Ilogb 31.0ns ± 0% 28.0ns ± 0% -9.68% (p=0.000 n=10+10)
J0 437ns ± 0% 434ns ± 0% -0.62% (p=0.000 n=10+10)
J1 433ns ± 0% 431ns ± 0% -0.46% (p=0.000 n=10+10)
Jn 927ns ± 0% 922ns ± 0% -0.54% (p=0.000 n=10+10)
Ldexp 41.5ns ± 0% 37.0ns ± 0% -10.84% (p=0.000 n=9+10)
Log 124ns ± 0% 118ns ± 0% -4.84% (p=0.000 n=10+9)
Logb 34.0ns ± 0% 32.0ns ± 0% -5.88% (p=0.000 n=10+10)
Log1p 110ns ± 0% 108ns ± 0% -1.82% (p=0.000 n=10+10)
Log10 136ns ± 0% 132ns ± 0% -2.94% (p=0.000 n=10+10)
Log2 51.6ns ± 0% 47.1ns ± 0% -8.72% (p=0.000 n=10+10)
Nextafter32 33.0ns ± 0% 30.5ns ± 0% -7.58% (p=0.000 n=10+10)
Nextafter64 29.0ns ± 0% 26.5ns ± 0% -8.62% (p=0.000 n=10+10)
PowInt 169ns ± 0% 160ns ± 0% -5.33% (p=0.000 n=10+10)
PowFrac 375ns ± 0% 361ns ± 0% -3.73% (p=0.000 n=10+10)
RoundToEven 14.0ns ± 0% 12.5ns ± 0% -10.71% (p=0.000 n=10+10)
Remainder 206ns ± 0% 192ns ± 0% -6.80% (p=0.000 n=10+9)
Signbit 6.01ns ± 0% 5.51ns ± 0% -8.32% (p=0.000 n=10+9)
Sin 70.1ns ± 0% 69.6ns ± 0% -0.71% (p=0.000 n=10+10)
Sincos 99.1ns ± 0% 99.6ns ± 0% +0.50% (p=0.000 n=9+10)
SqrtGoLatency 178ns ± 0% 146ns ± 0% -17.70% (p=0.000 n=8+10)
SqrtPrime 9.19µs ± 0% 9.20µs ± 0% +0.01% (p=0.000 n=9+9)
Tanh 125ns ± 1% 127ns ± 0% +1.36% (p=0.000 n=10+10)
Y0 428ns ± 0% 426ns ± 0% -0.47% (p=0.000 n=10+10)
Y1 431ns ± 0% 429ns ± 0% -0.46% (p=0.000 n=10+9)
Yn 906ns ± 0% 901ns ± 0% -0.55% (p=0.000 n=10+10)
Float64bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.000 n=10+10)
Float64frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+9)
Float32bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.002 n=8+10)
Float32frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+10)
Change-Id: Iba829e15d5624962fe0c699139ea783efeefabc2
Reviewed-on: https://go-review.googlesource.com/129715
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-17 20:49:04 +00:00
Keith Randall
b1f656b1ce
cmd/compile: fold address calculations into CMPload[const] ops
...
Makes go binary smaller by 0.2%.
I noticed this in autogenerated equal methods, and there are
probably a lot of those.
Change-Id: I4e04eb3653fbceb9dd6a4eee97ceab1fa4d10b72
Reviewed-on: https://go-review.googlesource.com/135379
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-09-14 19:42:09 +00:00
Lynn Boger
8dbd9afbb0
cmd/compile: improve rules for PPC64.rules
...
This adds some improvements to the rules for PPC64 to eliminate
unnecessary zero or sign extends, and fixes some rules for truncates
which were not always using the correct sign instruction.
This reduces the size of many functions by 1 or 2 instructions and
can improve performance in cases where the execution time depends
on small loops where at least 1 instruction was removed and where that
loop contributes a significant amount of the total execution time.
Included is a testcase for codegen to verify the sign/zero extend
instructions are omitted.
An example of the improvement (strings):
IndexAnyASCII/256:1-16 392ns ± 0% 369ns ± 0% -5.79% (p=0.000 n=1+10)
IndexAnyASCII/256:2-16 397ns ± 0% 376ns ± 0% -5.23% (p=0.000 n=1+9)
IndexAnyASCII/256:4-16 405ns ± 0% 384ns ± 0% -5.19% (p=1.714 n=1+6)
IndexAnyASCII/256:8-16 427ns ± 0% 403ns ± 0% -5.57% (p=0.000 n=1+10)
IndexAnyASCII/256:16-16 441ns ± 0% 418ns ± 1% -5.33% (p=0.000 n=1+10)
IndexAnyASCII/4096:1-16 5.62µs ± 0% 5.27µs ± 1% -6.31% (p=0.000 n=1+10)
IndexAnyASCII/4096:2-16 5.67µs ± 0% 5.29µs ± 0% -6.67% (p=0.222 n=1+8)
IndexAnyASCII/4096:4-16 5.66µs ± 0% 5.28µs ± 1% -6.66% (p=0.000 n=1+10)
IndexAnyASCII/4096:8-16 5.66µs ± 0% 5.31µs ± 1% -6.10% (p=0.000 n=1+10)
IndexAnyASCII/4096:16-16 5.70µs ± 0% 5.33µs ± 1% -6.43% (p=0.182 n=1+10)
Change-Id: I739a6132b505936d39001aada5a978ff2a5f0500
Reviewed-on: https://go-review.googlesource.com/129875
Reviewed-by: David Chase <drchase@google.com>
2018-09-13 18:24:53 +00:00
erifan01
8149db4f64
cmd/compile: intrinsify math.RoundToEven and math.Abs on arm64
...
math.RoundToEven can be done by a single arm64 instruction, FRINTND, so intrinsify it to improve performance.
The current pure Go implementation of Abs is translated into five instructions on arm64:
str, ldr, and, str, ldr. The intrinsic implementation requires only one instruction, so
intrinsifying it is worthwhile for performance.
Benchmarks:
name old time/op new time/op delta
Abs-8 3.50ns ± 0% 1.50ns ± 0% -57.14% (p=0.000 n=10+10)
RoundToEven-8 9.26ns ± 0% 1.50ns ± 0% -83.80% (p=0.000 n=10+10)
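A minimal sketch of the semantics of the two functions, independent of how they are compiled. The helper name absRound is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// absRound (hypothetical helper) returns |x| and x rounded to the
// nearest integer, with ties going to the even neighbor.
func absRound(x float64) (abs, rounded float64) {
	return math.Abs(x), math.RoundToEven(x)
}

func main() {
	fmt.Println(absRound(-2.5)) // tie rounds to the even value -2
	fmt.Println(absRound(3.5))  // tie rounds to the even value 4
}
```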
Change-Id: I9456b26ab282b544dfac0154fc86f17aed96ac3d
Reviewed-on: https://go-review.googlesource.com/116535
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-13 14:52:51 +00:00
fanzha02
d5377c2026
test: fix the wrong test of math.Copysign(c, -1) for arm64
...
CL 132915 added an incorrect codegen test for math.Copysign(c, -1);
it should check that AND is not emitted. This CL fixes the test.
Change-Id: Ida1d3d54ebfc7f238abccbc1f70f914e1b5bfd91
Reviewed-on: https://go-review.googlesource.com/134815
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-12 15:34:20 +00:00
Ben Shi
9f2411894b
cmd/compile: optimize arm's bit operation
...
BFC (Bit Field Clear) was introduced in ARMv7 and can simplify
ANDconst and BICconst. This CL implements that optimization.
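A minimal sketch of source that clears a contiguous bit field with a constant mask. The helper name clearField and the chosen mask are assumptions for illustration; whether BFC is emitted depends on the target (for example GOARM=7) and the compiler version.

```go
package main

import "fmt"

// clearField (hypothetical helper) clears bits 8..15 of x. The AND NOT
// with a constant mask covering one contiguous run of bits is the kind
// of operation a bit-field-clear instruction expresses directly.
func clearField(x uint32) uint32 {
	return x &^ 0x0000ff00
}

func main() {
	fmt.Printf("%#x\n", clearField(0x12345678))
}
```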
1. The total size of pkg/android_arm decreases about 3KB, excluding
cmd/compile/.
2. There is no regression in the go1 benchmark result, and some
cases (FmtFprintfEmpty-4 and RegexpMatchMedium_32-4) even improve
slightly.
name old time/op new time/op delta
BinaryTree17-4 25.3s ± 1% 25.2s ± 1% ~ (p=0.072 n=30+29)
Fannkuch11-4 13.3s ± 0% 13.3s ± 0% +0.13% (p=0.000 n=30+26)
FmtFprintfEmpty-4 407ns ± 0% 394ns ± 0% -3.19% (p=0.000 n=26+28)
FmtFprintfString-4 664ns ± 0% 662ns ± 0% -0.22% (p=0.000 n=30+30)
FmtFprintfInt-4 712ns ± 0% 706ns ± 0% -0.79% (p=0.000 n=30+30)
FmtFprintfIntInt-4 1.06µs ± 0% 1.05µs ± 0% -0.38% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 1.16µs ± 0% 1.16µs ± 0% -0.13% (p=0.000 n=30+29)
FmtFprintfFloat-4 2.24µs ± 0% 2.23µs ± 0% -0.51% (p=0.000 n=29+21)
FmtManyArgs-4 4.09µs ± 0% 4.06µs ± 0% -0.83% (p=0.000 n=28+30)
GobDecode-4 55.0ms ± 5% 55.4ms ± 5% ~ (p=0.307 n=30+30)
GobEncode-4 51.2ms ± 1% 51.9ms ± 1% +1.23% (p=0.000 n=29+30)
Gzip-4 2.64s ± 0% 2.60s ± 0% -1.35% (p=0.000 n=30+29)
Gunzip-4 309ms ± 0% 308ms ± 0% -0.27% (p=0.000 n=30+30)
HTTPClientServer-4 1.03ms ± 5% 1.02ms ± 4% ~ (p=0.117 n=30+29)
JSONEncode-4 101ms ± 2% 101ms ± 2% ~ (p=0.338 n=29+29)
JSONDecode-4 383ms ± 2% 382ms ± 2% ~ (p=0.751 n=26+30)
Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% -0.10% (p=0.000 n=29+29)
GoParse-4 22.6ms ± 0% 22.5ms ± 0% -0.39% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 761ns ± 0% 750ns ± 0% -1.47% (p=0.000 n=26+29)
RegexpMatchEasy0_1K-4 4.33µs ± 0% 4.34µs ± 0% +0.27% (p=0.000 n=25+28)
RegexpMatchEasy1_32-4 809ns ± 0% 795ns ± 0% -1.74% (p=0.000 n=27+25)
RegexpMatchEasy1_1K-4 5.54µs ± 0% 5.53µs ± 0% -0.18% (p=0.000 n=29+29)
RegexpMatchMedium_32-4 1.11µs ± 0% 1.08µs ± 0% -2.78% (p=0.000 n=27+29)
RegexpMatchMedium_1K-4 255µs ± 0% 255µs ± 0% -0.02% (p=0.029 n=30+30)
RegexpMatchHard_32-4 14.7µs ± 0% 14.7µs ± 0% -0.28% (p=0.000 n=30+29)
RegexpMatchHard_1K-4 439µs ± 0% 439µs ± 0% ~ (p=0.907 n=23+27)
Revcomp-4 41.9ms ± 1% 41.9ms ± 1% ~ (p=0.230 n=28+30)
Template-4 522ms ± 1% 528ms ± 1% +1.25% (p=0.000 n=30+30)
TimeParse-4 3.34µs ± 0% 3.35µs ± 0% +0.23% (p=0.000 n=30+27)
TimeFormat-4 6.06µs ± 0% 6.13µs ± 0% +1.08% (p=0.000 n=29+29)
[Geo mean] 384µs 382µs -0.37%
name old speed new speed delta
GobDecode-4 14.0MB/s ± 5% 13.9MB/s ± 5% ~ (p=0.308 n=30+30)
GobEncode-4 15.0MB/s ± 1% 14.8MB/s ± 1% -1.22% (p=0.000 n=29+30)
Gzip-4 7.36MB/s ± 0% 7.46MB/s ± 0% +1.35% (p=0.000 n=30+30)
Gunzip-4 62.8MB/s ± 0% 63.0MB/s ± 0% +0.27% (p=0.000 n=30+30)
JSONEncode-4 19.2MB/s ± 2% 19.2MB/s ± 2% ~ (p=0.312 n=29+29)
JSONDecode-4 5.05MB/s ± 3% 5.08MB/s ± 2% ~ (p=0.356 n=29+30)
GoParse-4 2.56MB/s ± 0% 2.57MB/s ± 0% +0.39% (p=0.000 n=23+27)
RegexpMatchEasy0_32-4 42.0MB/s ± 0% 42.6MB/s ± 0% +1.50% (p=0.000 n=26+28)
RegexpMatchEasy0_1K-4 236MB/s ± 0% 236MB/s ± 0% -0.27% (p=0.000 n=25+28)
RegexpMatchEasy1_32-4 39.6MB/s ± 0% 40.2MB/s ± 0% +1.73% (p=0.000 n=27+27)
RegexpMatchEasy1_1K-4 185MB/s ± 0% 185MB/s ± 0% +0.18% (p=0.000 n=29+29)
RegexpMatchMedium_32-4 900kB/s ± 0% 920kB/s ± 0% +2.22% (p=0.000 n=29+29)
RegexpMatchMedium_1K-4 4.02MB/s ± 0% 4.02MB/s ± 0% +0.07% (p=0.004 n=30+27)
RegexpMatchHard_32-4 2.17MB/s ± 0% 2.18MB/s ± 0% +0.46% (p=0.000 n=30+26)
RegexpMatchHard_1K-4 2.33MB/s ± 0% 2.33MB/s ± 0% ~ (all equal)
Revcomp-4 60.6MB/s ± 1% 60.7MB/s ± 1% ~ (p=0.207 n=28+30)
Template-4 3.72MB/s ± 1% 3.67MB/s ± 1% -1.23% (p=0.000 n=30+30)
[Geo mean] 12.9MB/s 12.9MB/s +0.29%
Change-Id: I07f497f8bb476c950dc555491d00c9066fb64a4e
Reviewed-on: https://go-review.googlesource.com/134232
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-11 14:37:51 +00:00
erifan01
204cc14bdd
cmd/compile: implement non-constant rotates using ROR on arm64
...
Add rules that match Go code such as:
y &= 63
x << y | x >> (64-y)
or
y &= 63
x >> y | x << (64-y)
and compile it to a single ROR instruction. This makes
math/bits.RotateLeft faster on arm64.
Extends CL 132435 to arm64.
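The two source patterns above can be checked against math/bits with a short program. This is only a sketch: the helper names rotl64 and rotr64 are made up for illustration, and whether a given compiler version lowers them to ROR is not verified here. It only shows that the idioms are equivalent to bits.RotateLeft64.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotl64 is the hand-written rotate-left idiom from the commit message.
// (Hypothetical helper name, not part of the compiler or math/bits.)
func rotl64(x uint64, y uint) uint64 {
	y &= 63
	// For y == 0 the right shift is by 64, which yields 0 for unsigned
	// values in Go, so the result is still x.
	return x<<y | x>>(64-y)
}

// rotr64 is the matching rotate-right idiom.
func rotr64(x uint64, y uint) uint64 {
	y &= 63
	return x>>y | x<<(64-y)
}

func main() {
	x := uint64(0x0123456789abcdef)
	for _, y := range []uint{0, 1, 13, 63, 64, 70} {
		left := rotl64(x, y) == bits.RotateLeft64(x, int(y))
		// Rotating right by y is rotating left by -y.
		right := rotr64(x, y) == bits.RotateLeft64(x, -int(y))
		fmt.Println(y, left, right)
	}
}
```

Both comparisons should hold for every shift amount, since the mask keeps y in the range 0..63 before the shifts.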
Benchmarks of math/bits.RotateLeftxxN:
name old time/op new time/op delta
RotateLeft-8 3.548750ns ± 1% 2.003750ns ± 0% -43.54% (p=0.000 n=8+8)
RotateLeft8-8 3.925000ns ± 0% 3.925000ns ± 0% ~ (p=1.000 n=8+8)
RotateLeft16-8 3.925000ns ± 0% 3.927500ns ± 0% ~ (p=0.608 n=8+8)
RotateLeft32-8 3.925000ns ± 0% 2.002500ns ± 0% -48.98% (p=0.000 n=8+8)
RotateLeft64-8 3.536250ns ± 0% 2.003750ns ± 0% -43.34% (p=0.000 n=8+8)
Change-Id: I77622cd7f39b917427e060647321f5513973232c
Reviewed-on: https://go-review.googlesource.com/122542
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-07 14:52:02 +00:00
Ben Shi
031a35ec84
cmd/compile: optimize 386's comparison
...
The optimization "(CMPconst [0] (ANDL x y)) -> (TESTL x y)" only
pays off if the result of x&y has no further use. Restricting the
rule with a uses==1 condition gives slight improvements.
1. The code size of pkg/linux_386 decreases by about 300 bytes, excluding
cmd/compile/.
2. The go1 benchmark shows no regression, and even a slight improvement
in test case FmtFprintfEmpty-4, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 3.34s ± 3% 3.32s ± 2% ~ (p=0.197 n=30+30)
Fannkuch11-4 3.48s ± 2% 3.47s ± 1% -0.33% (p=0.015 n=30+30)
FmtFprintfEmpty-4 46.3ns ± 4% 44.8ns ± 4% -3.33% (p=0.000 n=30+30)
FmtFprintfString-4 78.8ns ± 7% 77.3ns ± 5% ~ (p=0.098 n=30+26)
FmtFprintfInt-4 90.2ns ± 1% 90.0ns ± 7% -0.23% (p=0.027 n=18+30)
FmtFprintfIntInt-4 144ns ± 4% 143ns ± 5% ~ (p=0.945 n=30+29)
FmtFprintfPrefixedInt-4 180ns ± 4% 180ns ± 5% ~ (p=0.858 n=30+30)
FmtFprintfFloat-4 409ns ± 4% 406ns ± 3% -0.87% (p=0.028 n=30+30)
FmtManyArgs-4 611ns ± 5% 608ns ± 4% ~ (p=0.812 n=30+30)
GobDecode-4 7.30ms ± 5% 7.26ms ± 5% ~ (p=0.522 n=30+29)
GobEncode-4 6.90ms ± 7% 6.82ms ± 4% ~ (p=0.086 n=29+28)
Gzip-4 396ms ± 4% 400ms ± 4% +0.99% (p=0.026 n=30+30)
Gunzip-4 41.1ms ± 3% 41.2ms ± 3% ~ (p=0.495 n=30+30)
HTTPClientServer-4 63.7µs ± 3% 63.3µs ± 2% ~ (p=0.113 n=29+29)
JSONEncode-4 16.1ms ± 2% 16.1ms ± 2% -0.30% (p=0.041 n=30+30)
JSONDecode-4 60.9ms ± 3% 61.2ms ± 6% ~ (p=0.187 n=30+30)
Mandelbrot200-4 5.17ms ± 2% 5.19ms ± 3% ~ (p=0.676 n=30+30)
GoParse-4 3.28ms ± 3% 3.25ms ± 2% -0.97% (p=0.002 n=30+30)
RegexpMatchEasy0_32-4 103ns ± 4% 104ns ± 4% ~ (p=0.352 n=30+30)
RegexpMatchEasy0_1K-4 849ns ± 2% 845ns ± 2% ~ (p=0.381 n=30+30)
RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 4% ~ (p=0.795 n=30+30)
RegexpMatchEasy1_1K-4 1.03µs ± 3% 1.03µs ± 4% ~ (p=0.275 n=25+30)
RegexpMatchMedium_32-4 132ns ± 3% 132ns ± 3% ~ (p=0.970 n=30+30)
RegexpMatchMedium_1K-4 41.4µs ± 3% 41.4µs ± 3% ~ (p=0.212 n=30+30)
RegexpMatchHard_32-4 2.22µs ± 4% 2.22µs ± 4% ~ (p=0.399 n=30+30)
RegexpMatchHard_1K-4 67.2µs ± 3% 67.6µs ± 4% ~ (p=0.359 n=30+30)
Revcomp-4 1.84s ± 2% 1.83s ± 2% ~ (p=0.532 n=30+30)
Template-4 69.1ms ± 4% 68.8ms ± 3% ~ (p=0.146 n=30+30)
TimeParse-4 441ns ± 3% 442ns ± 3% ~ (p=0.154 n=30+30)
TimeFormat-4 413ns ± 3% 414ns ± 3% ~ (p=0.275 n=30+30)
[Geo mean] 66.2µs 66.0µs -0.28%
name old speed new speed delta
GobDecode-4 105MB/s ± 5% 106MB/s ± 5% ~ (p=0.514 n=30+29)
GobEncode-4 111MB/s ± 5% 113MB/s ± 4% +1.37% (p=0.046 n=28+28)
Gzip-4 49.1MB/s ± 4% 48.6MB/s ± 4% -0.98% (p=0.028 n=30+30)
Gunzip-4 472MB/s ± 4% 472MB/s ± 3% ~ (p=0.496 n=30+30)
JSONEncode-4 120MB/s ± 2% 121MB/s ± 2% +0.29% (p=0.042 n=30+30)
JSONDecode-4 31.9MB/s ± 3% 31.7MB/s ± 6% ~ (p=0.186 n=30+30)
GoParse-4 17.6MB/s ± 3% 17.8MB/s ± 2% +0.98% (p=0.002 n=30+30)
RegexpMatchEasy0_32-4 309MB/s ± 4% 307MB/s ± 4% ~ (p=0.501 n=30+30)
RegexpMatchEasy0_1K-4 1.21GB/s ± 2% 1.21GB/s ± 2% ~ (p=0.301 n=30+30)
RegexpMatchEasy1_32-4 283MB/s ± 4% 282MB/s ± 3% ~ (p=0.877 n=30+30)
RegexpMatchEasy1_1K-4 1.00GB/s ± 3% 0.99GB/s ± 4% ~ (p=0.276 n=25+30)
RegexpMatchMedium_32-4 7.54MB/s ± 3% 7.55MB/s ± 3% ~ (p=0.528 n=30+30)
RegexpMatchMedium_1K-4 24.7MB/s ± 3% 24.7MB/s ± 3% ~ (p=0.203 n=30+30)
RegexpMatchHard_32-4 14.4MB/s ± 4% 14.4MB/s ± 4% ~ (p=0.407 n=30+30)
RegexpMatchHard_1K-4 15.3MB/s ± 3% 15.1MB/s ± 4% ~ (p=0.306 n=30+30)
Revcomp-4 138MB/s ± 2% 139MB/s ± 2% ~ (p=0.520 n=30+30)
Template-4 28.1MB/s ± 4% 28.2MB/s ± 3% ~ (p=0.149 n=30+30)
[Geo mean] 81.5MB/s 81.5MB/s +0.06%
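As an illustration of the uses==1 condition described in this commit, the sketch below contrasts a function where x&y feeds only the comparison with one where the AND result is also returned. The function names are hypothetical, and the exact instructions emitted depend on the compiler version. One way to inspect them is to build for GOARCH=386 with -gcflags=-S.

```go
package main

import "fmt"

// maskOnly uses x&y only for the comparison with zero, so the AND
// result has a single use and a flags-only test is enough.
func maskOnly(x, y uint32) bool {
	return x&y == 0
}

// maskAndValue uses x&y twice: once as a return value and once in the
// comparison. The AND result must be materialized anyway, so replacing
// the compare with a separate test would not save anything.
func maskAndValue(x, y uint32) (uint32, bool) {
	z := x & y
	return z, z == 0
}

func main() {
	fmt.Println(maskOnly(0xA, 0x5))
	fmt.Println(maskAndValue(6, 3))
}
```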
Change-Id: I7f75425f79eec93cdd8fdd94db13ad4f61b6a2f5
Reviewed-on: https://go-review.googlesource.com/133657
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-09-07 01:20:45 +00:00
fanzha02
2e5c32518c
cmd/compile: optimize math.Copysign on arm64
...
Add rewrite rules to optimize math.Copysign() when the second
argument is a negative floating-point constant.
For example, for math.Copysign(c, -2) the previous compiler output is
"AND $9223372036854775807, R0, R0; ORR $-9223372036854775808, R0, R0".
The optimized compiler output is "ORR $-9223372036854775808, R0, R0".
Math package benchmark results.
name old time/op new time/op delta
Copysign-8 2.61ns ± 2% 2.49ns ± 0% -4.55% (p=0.000 n=10+10)
Cos-8 43.0ns ± 0% 41.5ns ± 0% -3.49% (p=0.000 n=10+10)
Cosh-8 98.6ns ± 0% 98.1ns ± 0% -0.51% (p=0.000 n=10+10)
ExpGo-8 107ns ± 0% 105ns ± 0% -1.87% (p=0.000 n=10+10)
Exp2Go-8 100ns ± 0% 100ns ± 0% +0.39% (p=0.000 n=10+8)
Max-8 6.56ns ± 2% 6.45ns ± 1% -1.63% (p=0.002 n=10+10)
Min-8 6.66ns ± 3% 6.47ns ± 2% -2.82% (p=0.006 n=10+10)
Mod-8 107ns ± 1% 104ns ± 1% -2.72% (p=0.000 n=10+10)
Frexp-8 11.5ns ± 1% 11.0ns ± 0% -4.56% (p=0.000 n=8+10)
HypotGo-8 19.4ns ± 0% 19.4ns ± 0% +0.36% (p=0.019 n=10+10)
Ilogb-8 8.63ns ± 0% 8.51ns ± 0% -1.36% (p=0.000 n=10+10)
Jn-8 584ns ± 0% 585ns ± 0% +0.17% (p=0.000 n=7+8)
Ldexp-8 13.8ns ± 0% 13.5ns ± 0% -2.17% (p=0.002 n=8+10)
Logb-8 10.2ns ± 0% 9.9ns ± 0% -2.65% (p=0.000 n=10+7)
Nextafter64-8 7.54ns ± 0% 7.51ns ± 0% -0.37% (p=0.000 n=10+10)
Remainder-8 73.5ns ± 1% 70.4ns ± 1% -4.27% (p=0.000 n=10+10)
SqrtGoLatency-8 79.6ns ± 0% 76.2ns ± 0% -4.30% (p=0.000 n=9+10)
Yn-8 582ns ± 0% 579ns ± 0% -0.52% (p=0.000 n=10+10)
Change-Id: I0c9cd1ea87435e7b8bab94b4e79e6e29785f25b1
Reviewed-on: https://go-review.googlesource.com/132915
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-06 19:57:25 +00:00