mirror of https://github.com/golang/go synced 2024-10-04 03:21:22 -06:00
go/src/math
Ilya Tocar 6e703ae709 math: fix sqrt regression on AMD64
Go 1.7 introduced a significant regression compared to 1.6:

SqrtIndirect-4  2.32ns ± 0%  7.86ns ± 0%  +238.79%        (p=0.000 n=20+18)

This is caused by SQRTSD preserving the upper part of the destination register,
which introduces a dependency on the previous value of X0.
In 1.6 the benchmark loop didn't use X0 immediately after the call:

callq  *%rbx
movsd  0x8(%rsp),%xmm2
movsd  0x20(%rsp),%xmm1
addsd  %xmm2,%xmm1
mov    0x18(%rsp),%rax
inc    %rax
jmp    loop

In 1.7, however, xmm0 is used just after the call:

callq  *%rbx
mov    0x10(%rsp),%rcx
lea    0x1(%rcx),%rax
movsd  0x8(%rsp),%xmm0
movsd  0x18(%rsp),%xmm1

I've verified that this is caused by the dependency by inserting
XORPS X0,X0 at the beginning of math.Sqrt, which puts performance back at the 1.6 level.

Splitting SQRTSD mem,reg into:
MOVSD mem,reg
SQRTSD reg,reg

removes the dependency, because MOVSD (load version)
doesn't need to preserve the upper part of a register,
and the reg,reg operation is handled by the register renamer in the CPU.

As a result of this change the regression is gone:
SqrtIndirect-4  7.86ns ± 0%  2.33ns ± 0%  -70.36%  (p=0.000 n=18+17)

This also removes the old Sqrt benchmarks in favor of benchmarks measuring latency.
Only SqrtIndirect is kept, to show the impact of this patch.

Change-Id: Ic7eebe8866445adff5bc38192fa8d64c9a6b8872
Reviewed-on: https://go-review.googlesource.com/28392
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-06 15:45:02 +00:00
..
big math/big: add assembly implementation of arith for ppc64{le} 2016-08-29 21:03:21 +00:00
cmplx math/cmplx: added clarifying comment 2016-03-21 16:18:38 +00:00
rand math/rand: document that NewSource sources race 2016-09-02 05:16:21 +00:00
abs.go math: fix typo and braino in my earlier commit 2015-10-29 21:12:08 +00:00
acosh.go all: single space after period. 2016-03-02 00:13:47 +00:00
all_test.go math: fix sqrt regression on AMD64 2016-09-06 15:45:02 +00:00
asin_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin.go
asinh.go all: single space after period. 2016-03-02 00:13:47 +00:00
atan2_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2.go
atan_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan.go
atanh.go all: single space after period. 2016-03-02 00:13:47 +00:00
bits.go
cbrt.go
const.go math: explain OEIS link 2015-06-26 01:25:58 +00:00
copysign.go
dim_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
dim_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
dim_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
dim_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
dim_s390x.s math: add functions and stubs for s390x 2016-04-06 23:35:56 +00:00
dim.go
erf.go all: single space after period. 2016-03-02 00:13:47 +00:00
exp2_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp2_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp2_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp2_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp.go all: single space after period. 2016-03-02 00:13:47 +00:00
expm1_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1.go all: single space after period. 2016-03-02 00:13:47 +00:00
export_test.go all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_asm.go math: optimize ceil/floor functions on amd64 2015-10-03 15:55:08 +00:00
floor_s390x.s math: optimize Ceil, Floor and Trunc on s390x 2016-08-26 17:27:13 +00:00
floor.go
frexp_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
frexp_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
frexp_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
frexp_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
frexp.go
gamma.go all: single space after period. 2016-03-02 00:13:47 +00:00
hypot_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot.go
j0.go math: speed up bessel functions on AMD64 2016-08-31 14:45:29 +00:00
j1.go math: speed up bessel functions on AMD64 2016-08-31 14:45:29 +00:00
jn.go all: single space after period. 2016-03-02 00:13:47 +00:00
ldexp_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
ldexp_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
ldexp_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
ldexp_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
ldexp.go
lgamma.go all: single space after period. 2016-03-02 00:13:47 +00:00
log1p_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p.go all: single space after period. 2016-03-02 00:13:47 +00:00
log10_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10.go math: fix Log2 test failures on ppc64 (and s390) 2015-07-15 05:35:22 +00:00
log_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log.go all: single space after period. 2016-03-02 00:13:47 +00:00
logb.go
mod_386.s
mod_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
mod_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
mod_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
mod.go
modf_386.s all: fix assembly vet issues 2016-08-25 18:52:31 +00:00
modf_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
modf_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
modf_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
modf.go all: single space after period. 2016-03-02 00:13:47 +00:00
nextafter.go
pow10.go
pow.go
remainder_386.s
remainder_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
remainder_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
remainder_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
remainder.go all: single space after period. 2016-03-02 00:13:47 +00:00
signbit.go
sin_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin.go
sincos_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sincos_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sincos_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sincos_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sincos.go
sinh.go
sqrt_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_amd64.s math: fix sqrt regression on AMD64 2016-09-06 15:45:02 +00:00
sqrt_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_arm64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_ppc64x.s all: make copyright headers consistent with one space after period 2016-05-02 13:43:18 +00:00
sqrt_s390x.s math: add functions and stubs for s390x 2016-04-06 23:35:56 +00:00
sqrt.go math: delete unused function sqrtC 2016-03-03 02:29:09 +00:00
stubs_arm64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
stubs_mips64x.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
stubs_ppc64x.s math: improve sqrt for ppc64le,ppc64 2016-03-10 15:01:21 +00:00
stubs_s390x.s math: optimize Ceil, Floor and Trunc on s390x 2016-08-26 17:27:13 +00:00
tan_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tan_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tan_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tan_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tan.go
tanh.go
unsafe.go