mirror of https://github.com/golang/go synced 2024-11-19 11:44:45 -07:00
go/src/math
Carlos Eduardo Seo a44c72823c math/big: improve performance of addVW/subVW for ppc64x
This change adds a better implementation in asm for addVW/subVW for
ppc64x, with speedups up to 3.11x.

benchmark                    old ns/op     new ns/op     delta
BenchmarkAddVW/1-16          6.87          5.71          -16.89%
BenchmarkAddVW/2-16          7.72          5.94          -23.06%
BenchmarkAddVW/3-16          8.74          6.56          -24.94%
BenchmarkAddVW/4-16          9.66          7.26          -24.84%
BenchmarkAddVW/5-16          10.8          7.26          -32.78%
BenchmarkAddVW/10-16         17.4          9.97          -42.70%
BenchmarkAddVW/100-16        164           56.0          -65.85%
BenchmarkAddVW/1000-16       1638          524           -68.01%
BenchmarkAddVW/10000-16      16421         5201          -68.33%
BenchmarkAddVW/100000-16     165762        53324         -67.83%
BenchmarkSubVW/1-16          6.76          5.62          -16.86%
BenchmarkSubVW/2-16          7.69          6.02          -21.72%
BenchmarkSubVW/3-16          8.85          6.61          -25.31%
BenchmarkSubVW/4-16          10.0          7.34          -26.60%
BenchmarkSubVW/5-16          11.3          7.33          -35.13%
BenchmarkSubVW/10-16         19.5          18.7          -4.10%
BenchmarkSubVW/100-16        153           55.9          -63.46%
BenchmarkSubVW/1000-16       1502          519           -65.45%
BenchmarkSubVW/10000-16      15005         5165          -65.58%
BenchmarkSubVW/100000-16     150620        53124         -64.73%

benchmark                    old MB/s     new MB/s     speedup
BenchmarkAddVW/1-16          1165.12      1400.76      1.20x
BenchmarkAddVW/2-16          2071.39      2693.25      1.30x
BenchmarkAddVW/3-16          2744.72      3656.92      1.33x
BenchmarkAddVW/4-16          3311.63      4407.34      1.33x
BenchmarkAddVW/5-16          3700.52      5512.48      1.49x
BenchmarkAddVW/10-16         4605.63      8026.37      1.74x
BenchmarkAddVW/100-16        4856.15      14296.76     2.94x
BenchmarkAddVW/1000-16       4883.96      15264.21     3.13x
BenchmarkAddVW/10000-16      4871.52      15380.78     3.16x
BenchmarkAddVW/100000-16     4826.17      15002.48     3.11x
BenchmarkSubVW/1-16          1183.20      1423.03      1.20x
BenchmarkSubVW/2-16          2081.92      2657.44      1.28x
BenchmarkSubVW/3-16          2711.52      3632.30      1.34x
BenchmarkSubVW/4-16          3198.30      4360.30      1.36x
BenchmarkSubVW/5-16          3534.43      5460.40      1.54x
BenchmarkSubVW/10-16         4106.34      4273.51      1.04x
BenchmarkSubVW/100-16        5213.48      14306.32     2.74x
BenchmarkSubVW/1000-16       5324.27      15391.21     2.89x
BenchmarkSubVW/10000-16      5331.33      15486.57     2.90x
BenchmarkSubVW/100000-16     5311.35      15059.01     2.84x

Change-Id: Ibaa5b9b38d63fba8e01a9c327eb8bef1e6e908c1
Reviewed-on: https://go-review.googlesource.com/101975
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-03-27 15:06:53 +00:00
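
For context, addVW and subVW in math/big add a single word to, or subtract it from, a multi-word vector while propagating the carry or borrow. The following is a minimal portable Go sketch of that operation, not the ppc64x assembly introduced by this change. The Word type is redeclared locally for illustration, and the use of math/bits.Add and bits.Sub assumes Go 1.12 or later.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Word is a local stand-in for math/big's Word type (an assumption
// made so this sketch compiles on its own).
type Word uint

// addVW sets z = x + y, where y is a single word, and returns the
// final carry. Only the first min(len(z), len(x)) words are processed.
func addVW(z, x []Word, y Word) (c Word) {
	c = y
	for i := 0; i < len(z) && i < len(x); i++ {
		zi, cc := bits.Add(uint(x[i]), uint(c), 0)
		z[i] = Word(zi)
		c = Word(cc)
	}
	return c
}

// subVW sets z = x - y, where y is a single word, and returns the
// final borrow.
func subVW(z, x []Word, y Word) (c Word) {
	c = y
	for i := 0; i < len(z) && i < len(x); i++ {
		zi, cc := bits.Sub(uint(x[i]), uint(c), 0)
		z[i] = Word(zi)
		c = Word(cc)
	}
	return c
}

func main() {
	z := make([]Word, 2)

	// The low word overflows, so the carry moves into the high word.
	c := addVW(z, []Word{^Word(0), 0}, 1)
	fmt.Println(z, c) // prints "[0 1] 0"

	// Subtracting 3 from the two-word value {5, 0} needs no borrow.
	c = subVW(z, []Word{5, 0}, 3)
	fmt.Println(z, c) // prints "[2 0] 0"
}
```

Each loop iteration depends on the carry from the previous one, which is why per-word cost dominates at the larger sizes in the benchmark tables above.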
big math/big: improve performance of addVW/subVW for ppc64x 2018-03-27 15:06:53 +00:00
bits math/bits: add examples for right rotation 2017-11-03 20:12:07 +00:00
cmplx math/cmplx: use signed zero to correct branch cuts 2017-11-27 07:44:00 +00:00
rand math/rand: typo fixed in documentation of seedPos 2018-01-04 20:27:29 +00:00
abs.go cmd/compile,math: improve code generation for math.Abs 2017-08-25 19:15:01 +00:00
acos_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
acosh_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
acosh.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
all_test.go math: correct result for Pow(x, ±.5) 2018-01-02 18:10:43 +00:00
arith_s390x_test.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
arith_s390x.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
asin_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
asin.go
asinh_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
asinh_stub.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
asinh.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
atan2_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
atan2.go
atan_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
atan.go
atanh_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
atanh.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
bits.go math: add RoundToEven function 2017-10-24 22:33:09 +00:00
cbrt_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
cbrt_stub.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
cbrt.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
const.go math: change oeis.org urls to https 2017-08-08 08:56:40 +00:00
copysign.go
cosh_s390x.s cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions 2017-04-17 16:33:51 +00:00
dim_386.s math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
dim_amd64.s math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
dim_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
dim_arm64.s math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
dim_arm.s math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
dim_s390x.s math: optimize dim and remove s390x assembly implementation 2017-10-30 19:05:51 +00:00
dim.go math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
erf_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
erf_stub.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
erf.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
erfc_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
erfinv.go math: implement the erfcinv function 2017-08-22 13:13:20 +00:00
example_test.go math: add examples for trig functions 2017-08-25 20:26:19 +00:00
exp2_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp2_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp2_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp2_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp_386.s math: use portable Exp instead of 387 instructions on 386 2016-10-05 03:53:11 +00:00
exp_amd64.s math: implement fast path for Exp 2017-09-20 21:43:00 +00:00
exp_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp_asm.go math: implement fast path for Exp 2017-09-20 21:43:00 +00:00
exp_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
exp.go math: fix inaccurate result of Exp(1) 2017-08-17 09:01:27 +00:00
expm1_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
expm1.go all: unindent some big chunks of code 2017-08-18 06:59:48 +00:00
export_s390x_test.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
export_test.go all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_amd64.s cmd/compile: intrinsify math.{Trunc/Ceil/Floor} on amd64 2017-10-31 19:30:54 +00:00
floor_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_arm64.s math: add some assembly implementations on ARM64 2016-09-27 23:52:12 +00:00
floor_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_ppc64x.s math, cmd/internal/obj/ppc64: improve floor, ceil, trunc with asm 2016-09-23 13:03:08 +00:00
floor_s390x.s math: optimize Ceil, Floor and Trunc on s390x 2016-08-26 17:27:13 +00:00
floor.go math: add RoundToEven function 2017-10-24 22:33:09 +00:00
frexp_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
frexp_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
frexp_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
frexp_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
frexp.go
gamma.go math: speed up Gamma(+Inf) 2016-10-18 22:12:03 +00:00
hypot_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot.go math: use Abs rather than if x < 0 { x = -x } 2018-02-13 20:12:23 +00:00
j0.go math: use Abs rather than if x < 0 { x = -x } 2018-02-13 20:12:23 +00:00
j1.go math: speed up bessel functions on AMD64 2016-08-31 14:45:29 +00:00
jn.go math: fix typos in Bessel function docs 2017-02-16 22:41:34 +00:00
ldexp_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
ldexp_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
ldexp_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
ldexp_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
ldexp.go
lgamma.go all: single space after period. 2016-03-02 00:13:47 +00:00
log1p_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
log1p.go math,math/cmplx: fix linter issues 2016-10-24 23:25:46 +00:00
log10_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_s390x.s cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions 2017-04-17 16:33:51 +00:00
log10.go
log_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log_amd64.s math: speed up Log on amd64 2017-03-29 20:36:29 +00:00
log_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
log.go all: single space after period. 2016-03-02 00:13:47 +00:00
logb.go
mod_386.s
mod_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
mod_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
mod_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
mod.go
modf_386.s all: fix assembly vet issues 2016-08-25 18:52:31 +00:00
modf_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
modf_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
modf_arm64.s all: minor vet fixes 2016-10-24 17:27:37 +00:00
modf_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
modf_ppc64x.s math: implement asm modf for ppc64x 2017-11-02 13:24:32 +00:00
modf.go all: single space after period. 2016-03-02 00:13:47 +00:00
nextafter.go
pow10.go math: speed up and improve accuracy of Pow10 2017-02-22 19:17:04 +00:00
pow_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
pow_stub.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
pow.go math: correct result for Pow(x, ±.5) 2018-01-02 18:10:43 +00:00
remainder_386.s
remainder_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
remainder_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
remainder_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
remainder.go all: single space after period. 2016-03-02 00:13:47 +00:00
signbit.go
sin_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin_s390x.s cmd/asm, cmd/internal/obj/s390x, math: add "test under mask" instructions 2017-10-30 23:55:14 +00:00
sin.go math: use Abs rather than if x < 0 { x = -x } 2018-02-13 20:12:23 +00:00
sincos_386.go math: remove asm version of sincos everywhere, except 386 2017-04-24 15:09:18 +00:00
sincos_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sincos.go math: remove asm version of sincos everywhere, except 386 2017-04-24 15:09:18 +00:00
sinh_s390x.s cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions 2017-04-17 16:33:51 +00:00
sinh_stub.s math: use SIMD to accelerate some scalar math functions on s390x 2016-11-11 20:20:23 +00:00
sinh.go math: optimize sinh and cosh 2018-02-27 04:34:37 +00:00
sqrt_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_amd64.s math: make sqrt smaller on AMD64 2016-09-29 15:56:52 +00:00
sqrt_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_arm64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_mipsx.s runtime/cgo, math: don't use FP instructions for soft-float mips{,le} 2017-11-30 17:12:32 +00:00
sqrt_ppc64x.s all: make copyright headers consistent with one space after period 2016-05-02 13:43:18 +00:00
sqrt_s390x.s math: add functions and stubs for s390x 2016-04-06 23:35:56 +00:00
sqrt.go math: delete unused function sqrtC 2016-03-03 02:29:09 +00:00
stubs_arm64.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
stubs_mips64x.s math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
stubs_mipsx.s math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
stubs_ppc64x.s math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
stubs_s390x.s math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
tan_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tan_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tan_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tan_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tan_s390x.s math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
tan.go math,math/cmplx: fix linter issues 2016-10-24 23:25:46 +00:00
tanh_s390x.s cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions 2017-04-17 16:33:51 +00:00
tanh.go math: use SIMD to accelerate some scalar math functions on s390x 2016-11-11 20:20:23 +00:00
unsafe.go