We have dedicated asm implementation of sincos only on 386 and amd64,
on everything else we are just jumping to generic version.
However amd64 version is actually slower than generic one:
Sincos-6 34.4ns ± 0% 24.8ns ± 0% -27.79% (p=0.000 n=8+10)
So remove all sincos*.s and keep only generic and 386.
Updates #19819
Change-Id: I7eefab35743729578264f52f6d23ee2c227c92a5
Reviewed-on: https://go-review.googlesource.com/41200
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The instructions allow moves between floating point and general
purpose registers without any conversion taking place.
Change-Id: I82c6f3ad9c841a83783b5be80dcf5cd538ff49e6
Reviewed-on: https://go-review.googlesource.com/38777
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Starting in go1.9, the minimum processor requirement for ppc64 is POWER8. So it
may now use the same divWW implementation as ppc64le.
Updates #19074
Change-Id: If1a85f175cda89eee06a1024ccd468da6124c844
Reviewed-on: https://go-review.googlesource.com/39010
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
After https://golang.org/cl/31490 we break false
output dependency for CVTS.. in compiler generated code.
I've looked through asm code, which uses CVTS..
and added XOR to the only case where it affected performance.
Log-6 21.6ns ± 0% 19.9ns ± 0% -7.87% (p=0.000 n=10+10)
Change-Id: I25d9b405e3041a3839b40f9f9a52e708034bb347
Reviewed-on: https://go-review.googlesource.com/38771
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Verified that BenchmarkBitLen time went down from 2.25 ns/op to 0.65 ns/op
an a 2.3 GHz Intel Core i7, before removing that benchmark (now covered by
math/bits benchmarks).
Change-Id: I3890bb7d1889e95b9a94bd68f0bdf06f1885adeb
Reviewed-on: https://go-review.googlesource.com/38464
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
A -0 constant is the same as 0. Use explicit negative zero
for float64 -0.0. Also, fix two test cases that were wrong.
Fixes#19673.
Change-Id: Ic09775f29d9bc2ee7814172e59c4a693441ea730
Reviewed-on: https://go-review.googlesource.com/38463
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
nat.setUint64 is nicely generic.
By assuming 32- or 64-bit words, however,
we can write simpler code,
and eliminate some shifts
in dead code that vet complains about.
Generated code for 64 bit systems is unaltered.
Generated code for 32 bit systems is much better.
For 386, the routine length drops from 325
bytes of code to 271 bytes of code, with fewer loops.
Change-Id: I1bc14c06272dee37a7fcb48d33dd1e621eba945d
Reviewed-on: https://go-review.googlesource.com/38070
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Change-Id: I6343c162e27e2e492547c96f1fc504909b1c03c0
Reviewed-on: https://go-review.googlesource.com/37793
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Removes an extra function call for TrailingZeroes and thus may
increase chances for inlining.
Change-Id: Iefd8d4402dc89b64baf4e5c865eb3dadade623af
Reviewed-on: https://go-review.googlesource.com/37613
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This change adds math/bits as a new dependency of math/big.
- use bits.LeadingZeroes instead of local implementation
(they are identical, so there's no performance loss here)
- leave other functionality local (ntz, bitLen) since there's
faster implementations in math/big at the moment
Change-Id: I1218aa8a1df0cc9783583b090a4bb5a8a145c4a2
Reviewed-on: https://go-review.googlesource.com/37141
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Removes init function from the math package.
Allows stripping of arrays with pre-computed values
used for Pow10 from binaries if Pow10 is not used.
cmd/go shrinks by 128 bytes.
Fixed small values like 10**-323 being 0 instead of 1e-323.
Overall precision is increased but still not as good as
predefined constants for some inputs.
Samples:
Pow10(208)
before: 1.0000000000000006662e+208
after: 1.0000000000000000959e+208
Pow10(202)
before 1.0000000000000009895e+202
after 1.0000000000000001193e+202
Pow10(60)
before 1.0000000000000001278e+60
after 0.9999999999999999494e+60
Pow10(-100)
before 0.99999999999999938551e-100
after 0.99999999999999989309e-100
Pow10(-200)
before 0.9999999999999988218e-200
after 1.0000000000000001271e-200
name old time/op new time/op delta
Pow10Pos-4 44.6ns ± 2% 1.2ns ± 1% -97.39% (p=0.000 n=19+17)
Pow10Neg-4 50.8ns ± 1% 4.1ns ± 2% -92.02% (p=0.000 n=17+19)
Change-Id: If094034286b8ac64be3a95fd9e8ffa3d4ad39b31
Reviewed-on: https://go-review.googlesource.com/36331
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Test finite negative x with Y0(-1), Y1(-1), Yn(2,-1), Yn(-3,-1).
Also test the special case Yn(0,0).
Fixes#19130.
Change-Id: I95f05a72e1c455ed8ddf202c56f4266f03f370fd
Reviewed-on: https://go-review.googlesource.com/37310
Reviewed-by: Robert Griesemer <gri@golang.org>
For compatibility with math/bits uint operations.
When math/big was written originally, the Go compiler used 32bit
int/uint values even on a 64bit machine. uintptr was the type that
represented the machine register size. Now, the int/uint types are
sized to the native machine register size, so they are the natural
machine Word type.
On most machines, the size of int/uint correspond to the size of
uintptr. On platforms where uint and uintptr have different sizes,
this change may lead to performance differences (e.g., amd64p32).
Change-Id: Ief249c160b707b6441848f20041e32e9e9d8d8ca
Reviewed-on: https://go-review.googlesource.com/37372
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Add exported global variables and store the results of benchmarked
functions in them. This prevents the current compiler optimizations
from removing the instructions that are needed to compute the return
values of the benchmarked functions.
Change-Id: If8b08424e85f3796bb6dd73e761c653abbabcc5e
Reviewed-on: https://go-review.googlesource.com/37195
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
- moved from: x&m>>k | x&^m<<k to: x&m>>k | x<<k&m
This permits use of the same constant m twice (*) which may be
better for machines that can't use large immediate constants
directly with an AND instruction and have to load them explicitly.
*) CPUs don't usually have a &^ instruction, so x&^m becomes x&(^m)
- simplified returns
This improves the generated code because the compiler recognizes
x>>k | x<<k as ROT when k is the bitsize of x.
The 8-bit versions of these instructions can be significantly faster
still if they are replaced with table lookups, as long as the table
is in cache. If the table is not in cache, table-lookup is probably
slower, hence the choice of an explicit register-only implementation
for now.
BenchmarkReverse-8 8.50 6.86 -19.29%
BenchmarkReverse8-8 2.17 1.74 -19.82%
BenchmarkReverse16-8 2.89 2.34 -19.03%
BenchmarkReverse32-8 3.55 2.95 -16.90%
BenchmarkReverse64-8 6.81 5.57 -18.21%
BenchmarkReverseBytes-8 3.49 2.48 -28.94%
BenchmarkReverseBytes16-8 0.93 0.62 -33.33%
BenchmarkReverseBytes32-8 1.55 1.13 -27.10%
BenchmarkReverseBytes64-8 2.47 2.47 +0.00%
Reverse-8 8.50ns ± 0% 6.86ns ± 0% ~ (p=1.000 n=1+1)
Reverse8-8 2.17ns ± 0% 1.74ns ± 0% ~ (p=1.000 n=1+1)
Reverse16-8 2.89ns ± 0% 2.34ns ± 0% ~ (p=1.000 n=1+1)
Reverse32-8 3.55ns ± 0% 2.95ns ± 0% ~ (p=1.000 n=1+1)
Reverse64-8 6.81ns ± 0% 5.57ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes-8 3.49ns ± 0% 2.48ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes16-8 0.93ns ± 0% 0.62ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes32-8 1.55ns ± 0% 1.13ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes64-8 2.47ns ± 0% 2.47ns ± 0% ~ (all samples are equal)
Change-Id: I0064de8c7e0e568ca7885d6f7064344bef91a06d
Reviewed-on: https://go-review.googlesource.com/37215
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Change-Id: I280c53be455f2fe0474ad577c0f7b7908a4eccb2
Reviewed-on: https://go-review.googlesource.com/36993
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The tests failed to compile when using the math_big_pure_go tag on
s390x.
Change-Id: I2a09f53ff6562ab9bc9b886cffc0f6205bbfcfbb
Reviewed-on: https://go-review.googlesource.com/36956
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The s390x port was based on the ppc64 port and, because of the way the
port was done, inherited some instructions from it. ppc64 supports
3-operand (4-operand for FMADD etc.) floating point instructions
but s390x doesn't (the destination register is always an input) and
so these were emulated.
There is a bug in the emulation of FMADD whereby if the destination
register is also a source for the multiplication it will be
clobbered. This doesn't break any assembly code in the std lib but
could affect future work.
To fix this I have gone through the floating point instructions and
removed all unnecessary 3-/4-operand emulation. The compiler doesn't
need it and assembly writers don't need it, it's just a source of
bugs.
I've also deleted the FNMADD family of emulated instructions. They
aren't used anywhere.
Change-Id: Ic07cedcf141a6a3b43a0c84895460f6cfbf56c04
Reviewed-on: https://go-review.googlesource.com/33350
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Unlike the pure go implementation used by every other architecture,
the amd64 asm implementation of Exp does not fail early if the
argument is known to overflow. Make it fail early.
Cost of the check is < 1ns (on an old Sandy Bridge machine):
name old time/op new time/op delta
Exp-4 18.3ns ± 1% 18.7ns ± 1% +2.08% (p=0.000 n=18+20)
Fixes#14932Fixes#18912
Change-Id: I04b3f9b4ee853822cbdc97feade726fbe2907289
Reviewed-on: https://go-review.googlesource.com/36271
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
There is some code value too: types intending to implement
Source64 can write a conversion confirming that.
For #4254 and the Go 1.8 release notes.
Change-Id: I7fc350a84f3a963e4dab317ad228fa340dda5c66
Reviewed-on: https://go-review.googlesource.com/33456
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
After x.ProbablyPrime(n) passes the n Miller-Rabin rounds,
add a Baillie-PSW test before declaring x probably prime.
Although the provable error bounds are unchanged, the empirical
error bounds drop dramatically: there are no known inputs
for which Baillie-PSW gives the wrong answer. For example,
before this CL, big.NewInt(443*1327).ProbablyPrime(1) == true.
Now it is (correctly) false.
The new Baillie-PSW test is two pieces: an added Miller-Rabin
round with base 2, and a so-called extra strong Lucas test.
(See the references listed in prime.go for more details.)
The Lucas test takes about 3.5x as long as the Miller-Rabin round,
which is close to theoretical expectations.
name time/op
ProbablyPrime/Lucas 2.91ms ± 2%
ProbablyPrime/MillerRabinBase2 850µs ± 1%
ProbablyPrime/n=0 3.75ms ± 3%
The speed of prime testing for a prime input does get slower:
name old time/op new time/op delta
ProbablyPrime/n=1 849µs ± 1% 4521µs ± 1% +432.31% (p=0.000 n=10+9)
ProbablyPrime/n=5 4.31ms ± 3% 7.87ms ± 1% +82.70% (p=0.000 n=10+10)
ProbablyPrime/n=10 8.52ms ± 3% 12.28ms ± 1% +44.11% (p=0.000 n=10+10)
ProbablyPrime/n=20 16.9ms ± 2% 21.4ms ± 2% +26.35% (p=0.000 n=9+10)
However, because the Baillie-PSW test is only added when the old
ProbablyPrime(n) would return true, testing composites runs at
the same speed as before, except in the case where the result
would have been incorrect and is now correct.
In particular, the most important use of this code is for
generating random primes in crypto/rand. That use spends
essentially all its time testing composites, so it is not
slowed down by the new Baillie-PSW check:
name old time/op new time/op delta
Prime 104ms ±22% 111ms ±16% ~ (p=0.165 n=10+10)
Thanks to Serhat Şevki Dinçer for CL 20170, which this CL builds on.
Fixes#13229.
Change-Id: Id26dde9b012c7637c85f2e96355d029b6382812a
Reviewed-on: https://go-review.googlesource.com/30770
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
The tree is inconsistent about single l vs double l in those
words in documentation, test messages, and one error value text.
$ git grep -E '[Mm]arshall(|s|er|ers|ed|ing)' | wc -l
42
$ git grep -E '[Mm]arshal(|s|er|ers|ed|ing)' | wc -l
1694
Make it consistently a single l, per earlier decisions. This means
contributors won't be confused by misleading precedence, and it helps
consistency.
Change the spelling in one error value text in newRawAttributes of
crypto/x509 package to be consistent.
This change was generated with:
perl -i -npe 's,([Mm]arshal)l(|s|er|ers|ed|ing),$1$2,' $(git grep -l -E '[Mm]arshall' | grep -v AUTHORS | grep -v CONTRIBUTORS)
Updates #12431.
Follows https://golang.org/cl/14150.
Change-Id: I85d28a2d7692862ccb02d6a09f5d18538b6049a2
Reviewed-on: https://go-review.googlesource.com/33017
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Note, most math functions are structured to use stubs, so that they can
be accelerated with assembly on any platform.
Sinh, cosh, and tanh were not structued with stubs, so this CL does
that. This set of routines was chosen as likely to produce good speedups
with assembly on any platform.
Technique used was minimax polynomial approximation using tables of
polynomial coefficients, with argument range reduction.
A table of scaling factors was also used for cosh and log10.
before after speedup
BenchmarkCos 22.1 ns/op 6.79 ns/op 3.25x
BenchmarkCosh 125 ns/op 11.7 ns/op 10.68x
BenchmarkLog10 48.4 ns/op 12.5 ns/op 3.87x
BenchmarkSin 22.2 ns/op 6.55 ns/op 3.39x
BenchmarkSinh 125 ns/op 14.2 ns/op 8.80x
BenchmarkTanh 65.0 ns/op 15.1 ns/op 4.30x
Accuracy was tested against a high precision
reference function to determine maximum error.
Approximately 4,000,000 points were tested for each function,
producing the following result.
Note: ulperr is error in "units in the last place"
max
ulperr
sin 1.43 (returns NaN beyond +-2^50)
cos 1.79 (returns NaN beyond +-2^50)
cosh 1.05
sinh 3.02
tanh 3.69
log10 1.75
Also includes a set of tests to test non-vector functions even
when SIMD is enabled
Change-Id: Icb45f14d00864ee19ed973d209c3af21e4df4edc
Reviewed-on: https://go-review.googlesource.com/32352
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
The condition to determine if any further iterations are needed is
evaluated to false in case it encounters a NaN. Instead, flip the
condition to keep looping until the factor is greater than the machine
roundoff error.
Updates #17577
Change-Id: I058abe73fcd49d3ae4e2f7b33020437cc8f290c3
Reviewed-on: https://go-review.googlesource.com/31952
Reviewed-by: Robert Griesemer <gri@golang.org>