This function avoids subtle faults found in many ad-hoc implementations,
and is simple enough to be inlined by the compiler.
Fixes#20100
Change-Id: Ib320254e9b1f1f798c6ef906b116f63bc29e8d08
Reviewed-on: https://go-review.googlesource.com/43652
Reviewed-by: Robert Griesemer <gri@golang.org>
Change-Id: Ic3ce2f3c055f2636ec8fc9cec8592e596b18dc05
Reviewed-on: https://go-review.googlesource.com/54771
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Implement int reg <-> fp reg moves on amd64.
If we see a load to int reg followed by an int->fp move, then we can just
load to the fp reg instead. Same for stores.
math.Abs is now:
MOVQ "".x+8(SP), AX
SHLQ $1, AX
SHRQ $1, AX
MOVQ AX, "".~r1+16(SP)
math.Copysign is now:
MOVQ "".x+8(SP), AX
SHLQ $1, AX
SHRQ $1, AX
MOVQ "".y+16(SP), CX
SHRQ $63, CX
SHLQ $63, CX
ORQ CX, AX
MOVQ AX, "".~r2+24(SP)
math.Float64bits is now:
MOVSD "".x+8(SP), X0
MOVSD X0, "".~r1+16(SP)
(it would be nicer to use a non-SSE reg for this, nothing is perfect)
And due to the fix for #21440, the inlined version of these improve as well.
name old time/op new time/op delta
Abs 1.38ns ± 5% 0.89ns ±10% -35.54% (p=0.000 n=10+10)
Copysign 1.56ns ± 7% 1.35ns ± 6% -13.77% (p=0.000 n=9+10)
Fixes#13095
Change-Id: Ibd7f2792412a6668608780b0688a77062e1f1499
Reviewed-on: https://go-review.googlesource.com/58732
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
This commit defines the inverse of error function (erfinv) in the
math package. The function is based on the rational approximation
of percentage points of normal distribution available at
https://www.jstor.org/stable/pdf/2347330.pdf.
Fixes#6359
Change-Id: Icfe4508f623e0574c7fffdbf7aa929540fd4c944
Reviewed-on: https://go-review.googlesource.com/46990
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Updates #13745
Recognize z.Mul(x, x) as squaring for Floats and use
the internal z.sqr(x) method for nat on the mantissa.
Change-Id: I0f792157bad93a13cae1aecc4c10bd20c6397693
Reviewed-on: https://go-review.googlesource.com/56774
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
updates #13745
A squared rational is always positive and can not
be reduced since the numerator and denominator had
no previous common factors. The nat multiplication
can be performed using the internal sqr method.
Change-Id: I558f5b38e379bfd26ff163c9489006d7e5a9cfaa
Reviewed-on: https://go-review.googlesource.com/56776
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Found with mvdan.cc/unindent. Prioritized the ones with the biggest wins
for now.
Change-Id: I2b032e45cdd559fc9ed5b1ee4c4de42c4c92e07b
Reviewed-on: https://go-review.googlesource.com/56470
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The existing implementation is translated from C, which uses a
polynomial coefficient very close to 1/6. If the function uses
1/6 as this coeffient, the result of Exp(1) will be more accurate.
And this change doesn't introduce more error to Exp function.
Fixes#20319
Change-Id: I94c236a18cf95570ebb69f7fb99884b0d7cf5f6e
Reviewed-on: https://go-review.googlesource.com/49294
Reviewed-by: Robert Griesemer <gri@golang.org>
updates #13745
Multiprecision squaring can be done in a straightforward manner
with about half the multiplications of a basic multiplication
due to the symmetry of the operands. This change implements
basic squaring for nat types and uses it for Int multiplication
when the same variable is supplied to both arguments of
z.Mul(x, x). This has some overhead to allocate a temporary
variable to hold the cross products, shift them to double and
add them to the diagonal terms. There is a speed benefit in
the intermediate range when the overhead is neglible and the
asymptotic performance of karatsuba multiplication has not been
reached.
basicSqrThreshold = 20
karatsubaSqrThreshold = 400
Were set by running calibrate_test.go to measure timing differences
between the algorithms. Benchmarks for squaring:
name old time/op new time/op delta
IntSqr/1-4 51.5ns ±25% 25.1ns ± 7% -51.38% (p=0.008 n=5+5)
IntSqr/2-4 79.1ns ± 4% 72.4ns ± 2% -8.47% (p=0.008 n=5+5)
IntSqr/3-4 102ns ± 4% 97ns ± 5% ~ (p=0.056 n=5+5)
IntSqr/5-4 161ns ± 4% 163ns ± 7% ~ (p=0.952 n=5+5)
IntSqr/8-4 277ns ± 5% 267ns ± 6% ~ (p=0.087 n=5+5)
IntSqr/10-4 358ns ± 3% 360ns ± 4% ~ (p=0.730 n=5+5)
IntSqr/20-4 1.07µs ± 3% 1.01µs ± 6% ~ (p=0.056 n=5+5)
IntSqr/30-4 2.36µs ± 4% 1.72µs ± 2% -27.03% (p=0.008 n=5+5)
IntSqr/50-4 5.19µs ± 3% 3.88µs ± 4% -25.37% (p=0.008 n=5+5)
IntSqr/80-4 11.3µs ± 4% 8.6µs ± 3% -23.78% (p=0.008 n=5+5)
IntSqr/100-4 16.2µs ± 4% 12.8µs ± 3% -21.49% (p=0.008 n=5+5)
IntSqr/200-4 50.1µs ± 5% 44.7µs ± 3% -10.65% (p=0.008 n=5+5)
IntSqr/300-4 105µs ±11% 95µs ± 3% -9.50% (p=0.008 n=5+5)
IntSqr/500-4 231µs ± 5% 227µs ± 2% ~ (p=0.310 n=5+5)
IntSqr/800-4 496µs ± 9% 459µs ± 3% -7.40% (p=0.016 n=5+5)
IntSqr/1000-4 700µs ± 3% 710µs ± 5% ~ (p=0.841 n=5+5)
Show a speed up of 10-25% in the range where basicSqr is optimal,
improved single word squaring and no significant difference when
the fallback to standard multiplication is used.
Change-Id: Iae2c82ca91cf890823f91e5c83bbe9a2c534b72b
Reviewed-on: https://go-review.googlesource.com/53638
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The current implementation of the extended Euclidean GCD algorithm
calculates both cosequences x and y inside the division loop. This
is unneccessary since the second Bezout coefficient can be obtained
at the end of calculation via a multiplication, subtraction and a
division. In case only one coefficient is needed, e.g. ModInverse
this calculation can be skipped entirely. This is a standard
optimization, see e.g.
"Handbook of Elliptic and Hyperelliptic Curve Cryptography"
Cohen et al pp 191
Available at:
http://cs.ucsb.edu/~koc/ccs130h/2013/EllipticHyperelliptic-CohenFrey.pdf
Updates #15833
Change-Id: I1e0d2e63567cfed97fd955048fe6373d36f22757
Reviewed-on: https://go-review.googlesource.com/50530
Reviewed-by: Robert Griesemer <gri@golang.org>
The current implementation uses a shift and add
loop to compute the product of x's exponent xe and
the integer part of y (yi) for yi up to 1<<63.
Since xe is an 11-bit exponent, this product can be
up to 74-bits and overflow both 32 and 64-bit int.
This change checks whether the accumulated exponent
will fit in the 11-bit float exponent of the output
and breaks out of the loop early if overflow is detected.
The current handling of yi >= 1<<63 uses Exp(y * Log(x))
which incorrectly returns Nan for x<0. In addition,
for y this large, Exp(y * Log(x)) can be enumerated
to only overflow except when x == -1 since the
boundary cases computed exactly:
Pow(NextAfter(1.0, Inf(1)), 1<<63) == 2.72332... * 10^889
Pow(NextAfter(1.0, Inf(-1)), 1<<63) == 1.91624... * 10^-445
exceed the range of float64. So, the call can be
replaced with a simple case statement analgous to
y == Inf that correctly handles x < 0 as well.
Fixes#7394
Change-Id: I6f50dc951f3693697f9669697599860604323102
Reviewed-on: https://go-review.googlesource.com/48290
Reviewed-by: Robert Griesemer <gri@golang.org>
As noted in the TODO comment, the sticky bit is only used
when the rounding bit is zero or the rounding mode is
ToNearestEven. This change makes that check explicit and
will eliminate half the sticky bit calculations on average
when rounding mode is not ToNearestEven.
Change-Id: Ia4709f08f46e682bf97dabe5eb2a10e8e3d7af43
Reviewed-on: https://go-review.googlesource.com/54111
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Add test cases to verify behavior for Ldexp with exponents outside the
range of Minint32/Maxint32, for a gccgo bug.
Test for issue #21323.
Change-Id: Iea67bc6fcfafdfddf515cf7075bdac59360c277a
Reviewed-on: https://go-review.googlesource.com/54230
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The standard deviation of a uniform distribution is size / √12.
The size of the interval [0, 255] is 256, not 255.
While we're here, simplify the expression.
The tests previously passed only because the error margin was large enough.
Sample observed standard deviations while running tests:
73.7893634666819
73.9221651548294
73.8077961697150
73.9084236069471
73.8968446814785
73.8684209136244
73.9774618960282
73.9523483202549
255 / √12 == 73.6121593216772
256 / √12 == 73.9008344562721
Change-Id: I7bc6cdc11e5d098951f2f2133036f62489275979
Reviewed-on: https://go-review.googlesource.com/51310
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Regular HTTP is insecure, oeis.org supports HTTPS and it is actually
used in some other places in the codebase. This changes these final urls
to use HTTPS.
Change-Id: Ia46410a9c7ce67238a10cb6bfffaceca46112f58
Reviewed-on: https://go-review.googlesource.com/52072
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
Manual hyphenation doesn't work well when text gets reflown,
for example by godoc.
There are a few other manual hyphenations in the tree,
but they are in local comments or comments for unexported functions.
Change-Id: I17c9b1fee1def650da48903b3aae2fa1e1119a65
Reviewed-on: https://go-review.googlesource.com/53510
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Erroneously called OnesCount instead of OnesCount64
Change-Id: Ie877e43f213253e45d31f64931c4a15915849586
Reviewed-on: https://go-review.googlesource.com/53410
Reviewed-by: Chris Broadfoot <cbro@golang.org>
Go doesn't guarantee that the result of floating point operations will
be the same on different architectures. It was not stated in the
documentation, that can lead to confusion.
Fixes#18354
Change-Id: Idb1b4c256fb9a7158a74256136eca3b8ce44476f
Reviewed-on: https://go-review.googlesource.com/34938
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Implements detection of x86 cpu features that
are used in the go standard library.
Changes all standard library packages to use the new cpu package
instead of using runtime internal variables to check x86 cpu features.
Updates: #15403
Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9
Reviewed-on: https://go-review.googlesource.com/41476
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This adds math/bits intrinsics for OnesCount, Len, TrailingZeros on
ppc64x.
benchmark old ns/op new ns/op delta
BenchmarkLeadingZeros-16 4.26 1.71 -59.86%
BenchmarkLeadingZeros16-16 3.04 1.83 -39.80%
BenchmarkLeadingZeros32-16 3.31 1.82 -45.02%
BenchmarkLeadingZeros64-16 3.69 1.71 -53.66%
BenchmarkTrailingZeros-16 2.55 1.62 -36.47%
BenchmarkTrailingZeros32-16 2.55 1.77 -30.59%
BenchmarkTrailingZeros64-16 2.78 1.62 -41.73%
BenchmarkOnesCount-16 3.19 0.93 -70.85%
BenchmarkOnesCount32-16 2.55 1.18 -53.73%
BenchmarkOnesCount64-16 3.22 0.93 -71.12%
Update #18616
I also made a change to bits_test.go because when debugging some failures
the output was not quite providing the right argument information.
Change-Id: Ia58d31d1777cf4582a4505f85b11a1202ca07d3e
Reviewed-on: https://go-review.googlesource.com/41630
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
As necessary, math functions were structured to use stubs, so that they can
be accelerated with assembly on any platform.
Technique used was minimax polynomial approximation using tables of
polynomial coefficients, with argument range reduction.
Benchmark New Old Speedup
BenchmarkAcos 12.2 47.5 3.89
BenchmarkAcosh 18.5 56.2 3.04
BenchmarkAsin 13.1 40.6 3.10
BenchmarkAsinh 19.4 62.8 3.24
BenchmarkAtan 10.1 23 2.28
BenchmarkAtanh 19.1 53.2 2.79
BenchmarkAtan2 16.5 33.9 2.05
BenchmarkCbrt 14.8 58 3.92
BenchmarkErf 10.8 20.1 1.86
BenchmarkErfc 11.2 23.5 2.10
BenchmarkExp 8.77 53.8 6.13
BenchmarkExpm1 10.1 38.3 3.79
BenchmarkLog 13.1 40.1 3.06
BenchmarkLog1p 12.7 38.3 3.02
BenchmarkPowInt 31.7 40.5 1.28
BenchmarkPowFrac 33.1 141 4.26
BenchmarkTan 11.5 30 2.61
Accuracy was tested against a high precision
reference function to determine maximum error.
Note: ulperr is error in "units in the last place"
max
ulperr
Acos 1.15
Acosh 1.07
Asin 2.22
Asinh 1.72
Atan 1.41
Atanh 3.00
Atan2 1.45
Cbrt 1.18
Erf 1.29
Erfc 4.82
Exp 1.00
Expm1 2.26
Log 0.94
Log1p 2.39
Tan 3.14
Pow will have 99.99% correctly rounded results with reasonable inputs
producing numeric (non Inf or NaN) results
Change-Id: I850e8cf7b70426e8b54ec49d74acd4cddc8c6cb2
Reviewed-on: https://go-review.googlesource.com/38585
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
We have dedicated asm implementation of sincos only on 386 and amd64,
on everything else we are just jumping to generic version.
However amd64 version is actually slower than generic one:
Sincos-6 34.4ns ± 0% 24.8ns ± 0% -27.79% (p=0.000 n=8+10)
So remove all sincos*.s and keep only generic and 386.
Updates #19819
Change-Id: I7eefab35743729578264f52f6d23ee2c227c92a5
Reviewed-on: https://go-review.googlesource.com/41200
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The instructions allow moves between floating point and general
purpose registers without any conversion taking place.
Change-Id: I82c6f3ad9c841a83783b5be80dcf5cd538ff49e6
Reviewed-on: https://go-review.googlesource.com/38777
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Starting in go1.9, the minimum processor requirement for ppc64 is POWER8. So it
may now use the same divWW implementation as ppc64le.
Updates #19074
Change-Id: If1a85f175cda89eee06a1024ccd468da6124c844
Reviewed-on: https://go-review.googlesource.com/39010
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
After https://golang.org/cl/31490 we break false
output dependency for CVTS.. in compiler generated code.
I've looked through asm code, which uses CVTS..
and added XOR to the only case where it affected performance.
Log-6 21.6ns ± 0% 19.9ns ± 0% -7.87% (p=0.000 n=10+10)
Change-Id: I25d9b405e3041a3839b40f9f9a52e708034bb347
Reviewed-on: https://go-review.googlesource.com/38771
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Verified that BenchmarkBitLen time went down from 2.25 ns/op to 0.65 ns/op
an a 2.3 GHz Intel Core i7, before removing that benchmark (now covered by
math/bits benchmarks).
Change-Id: I3890bb7d1889e95b9a94bd68f0bdf06f1885adeb
Reviewed-on: https://go-review.googlesource.com/38464
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
A -0 constant is the same as 0. Use explicit negative zero
for float64 -0.0. Also, fix two test cases that were wrong.
Fixes#19673.
Change-Id: Ic09775f29d9bc2ee7814172e59c4a693441ea730
Reviewed-on: https://go-review.googlesource.com/38463
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
nat.setUint64 is nicely generic.
By assuming 32- or 64-bit words, however,
we can write simpler code,
and eliminate some shifts
in dead code that vet complains about.
Generated code for 64 bit systems is unaltered.
Generated code for 32 bit systems is much better.
For 386, the routine length drops from 325
bytes of code to 271 bytes of code, with fewer loops.
Change-Id: I1bc14c06272dee37a7fcb48d33dd1e621eba945d
Reviewed-on: https://go-review.googlesource.com/38070
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Change-Id: I6343c162e27e2e492547c96f1fc504909b1c03c0
Reviewed-on: https://go-review.googlesource.com/37793
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Removes an extra function call for TrailingZeroes and thus may
increase chances for inlining.
Change-Id: Iefd8d4402dc89b64baf4e5c865eb3dadade623af
Reviewed-on: https://go-review.googlesource.com/37613
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>