1
0
mirror of https://github.com/golang/go synced 2024-11-19 14:14:40 -07:00
Commit Graph

17 Commits

Author SHA1 Message Date
Robert Griesemer
654a185f20 math/big: faster assembly kernels for AddVx/SubVx for 386.
(analog to Change-Id: Ia473e9ab9c63a955c252426684176bca566645ae)

Fixes #9243.

benchmark              old ns/op     new ns/op     delta
BenchmarkAddVV_1       5.76          5.60          -2.78%
BenchmarkAddVV_2       7.17          6.98          -2.65%
BenchmarkAddVV_3       8.69          8.57          -1.38%
BenchmarkAddVV_4       10.5          10.5          +0.00%
BenchmarkAddVV_5       13.3          11.6          -12.78%
BenchmarkAddVV_1e1     20.4          19.3          -5.39%
BenchmarkAddVV_1e2     166           140           -15.66%
BenchmarkAddVV_1e3     1588          1278          -19.52%
BenchmarkAddVV_1e4     16138         12657         -21.57%
BenchmarkAddVV_1e5     167608        127836        -23.73%
BenchmarkAddVW_1       4.87          4.76          -2.26%
BenchmarkAddVW_2       6.10          6.07          -0.49%
BenchmarkAddVW_3       7.75          7.65          -1.29%
BenchmarkAddVW_4       9.30          9.39          +0.97%
BenchmarkAddVW_5       10.8          10.9          +0.93%
BenchmarkAddVW_1e1     18.8          18.8          +0.00%
BenchmarkAddVW_1e2     143           134           -6.29%
BenchmarkAddVW_1e3     1390          1266          -8.92%
BenchmarkAddVW_1e4     13877         12545         -9.60%
BenchmarkAddVW_1e5     155330        125432        -19.25%

benchmark              old MB/s     new MB/s     speedup
BenchmarkAddVV_1       5556.09      5715.12      1.03x
BenchmarkAddVV_2       8926.55      9170.64      1.03x
BenchmarkAddVV_3       11042.15     11201.77     1.01x
BenchmarkAddVV_4       12168.21     12245.50     1.01x
BenchmarkAddVV_5       12041.39     13805.73     1.15x
BenchmarkAddVV_1e1     15659.65     16548.18     1.06x
BenchmarkAddVV_1e2     19268.57     22728.64     1.18x
BenchmarkAddVV_1e3     20141.45     25033.36     1.24x
BenchmarkAddVV_1e4     19827.86     25281.92     1.28x
BenchmarkAddVV_1e5     19092.06     25031.92     1.31x
BenchmarkAddVW_1       822.12       840.92       1.02x
BenchmarkAddVW_2       1310.89      1317.89      1.01x
BenchmarkAddVW_3       1549.31      1568.26      1.01x
BenchmarkAddVW_4       1720.45      1703.77      0.99x
BenchmarkAddVW_5       1857.12      1828.66      0.98x
BenchmarkAddVW_1e1     2126.39      2132.38      1.00x
BenchmarkAddVW_1e2     2784.49      2969.21      1.07x
BenchmarkAddVW_1e3     2876.89      3157.35      1.10x
BenchmarkAddVW_1e4     2882.32      3188.51      1.11x
BenchmarkAddVW_1e5     2575.16      3188.96      1.24x

(measured on OS X 10.9.5, 2.3 GHz Intel Core i7, 8GB 1333 MHz DDR3)

Change-Id: I46698729d5e0bc3e277aa0146a9d7a086c0c26f1
Reviewed-on: https://go-review.googlesource.com/2560
Reviewed-by: Keith Randall <khr@golang.org>
2015-01-08 20:58:59 +00:00
Robert Griesemer
067acd51b0 math/big: faster "pure Go" addition/subtraction for long vectors
(platforms w/o corresponding assembly kernels)

For short vector adds there's some erradic slow-down, but overall
these routines have become significantly faster. This only matters
for platforms w/o native (assembly) versions of these kernels, so
we are not concerned about the minor slow-down for short vectors.

This code was already reviewed under Mercurial (golang.org/cl/172810043)
but wasn't submitted before the switch to git.

Benchmarks run on 2.3GHz Intel Core i7, running OS X 10.9.5,
with the respective AddVV and AddVW assembly routines disabled.

benchmark              old ns/op     new ns/op     delta
BenchmarkAddVV_1       6.59          7.09          +7.59%
BenchmarkAddVV_2       10.3          10.1          -1.94%
BenchmarkAddVV_3       10.9          12.6          +15.60%
BenchmarkAddVV_4       13.9          15.6          +12.23%
BenchmarkAddVV_5       16.8          17.3          +2.98%
BenchmarkAddVV_1e1     29.5          29.9          +1.36%
BenchmarkAddVV_1e2     246           232           -5.69%
BenchmarkAddVV_1e3     2374          2185          -7.96%
BenchmarkAddVV_1e4     58942         22292         -62.18%
BenchmarkAddVV_1e5     668622        225279        -66.31%
BenchmarkAddVW_1       6.81          5.58          -18.06%
BenchmarkAddVW_2       7.69          6.86          -10.79%
BenchmarkAddVW_3       9.56          8.32          -12.97%
BenchmarkAddVW_4       12.1          9.53          -21.24%
BenchmarkAddVW_5       13.2          10.9          -17.42%
BenchmarkAddVW_1e1     23.4          18.0          -23.08%
BenchmarkAddVW_1e2     175           141           -19.43%
BenchmarkAddVW_1e3     1568          1266          -19.26%
BenchmarkAddVW_1e4     15425         12596         -18.34%
BenchmarkAddVW_1e5     156737        133539        -14.80%
BenchmarkFibo          381678466     132958666     -65.16%

benchmark              old MB/s     new MB/s     speedup
BenchmarkAddVV_1       9715.25      9028.30      0.93x
BenchmarkAddVV_2       12461.72     12622.60     1.01x
BenchmarkAddVV_3       17549.64     15243.82     0.87x
BenchmarkAddVV_4       18392.54     16398.29     0.89x
BenchmarkAddVV_5       18995.23     18496.57     0.97x
BenchmarkAddVV_1e1     21708.98     21438.28     0.99x
BenchmarkAddVV_1e2     25956.53     27506.88     1.06x
BenchmarkAddVV_1e3     26947.93     29286.66     1.09x
BenchmarkAddVV_1e4     10857.96     28709.46     2.64x
BenchmarkAddVV_1e5     9571.91      28409.21     2.97x
BenchmarkAddVW_1       1175.28      1433.98      1.22x
BenchmarkAddVW_2       2080.01      2332.54      1.12x
BenchmarkAddVW_3       2509.28      2883.97      1.15x
BenchmarkAddVW_4       2646.09      3356.83      1.27x
BenchmarkAddVW_5       3020.69      3671.07      1.22x
BenchmarkAddVW_1e1     3425.76      4441.40      1.30x
BenchmarkAddVW_1e2     4553.17      5642.96      1.24x
BenchmarkAddVW_1e3     5100.14      6318.72      1.24x
BenchmarkAddVW_1e4     5186.15      6350.96      1.22x
BenchmarkAddVW_1e5     5104.07      5990.74      1.17x

Change-Id: I7a62023b1105248a0e85e5b9819d3fd4266123d4
Reviewed-on: https://go-review.googlesource.com/2480
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
2015-01-08 17:00:59 +00:00
Robert Griesemer
80b3ff9f82 math/big: faster assembly kernels for AddVx/SubVx for amd64.
Replaced use of rotate instructions (RCRQ, RCLQ) with ADDQ/SBBQ
for restoring/saving the carry flag per suggestion from Torbjörn
Granlund (author of GMP bignum libs for C).
The rotate instructions tend to be slower on todays machines.

benchmark              old ns/op     new ns/op     delta
BenchmarkAddVV_1       5.69          5.51          -3.16%
BenchmarkAddVV_2       7.15          6.87          -3.92%
BenchmarkAddVV_3       8.69          8.06          -7.25%
BenchmarkAddVV_4       8.10          8.13          +0.37%
BenchmarkAddVV_5       8.37          8.47          +1.19%
BenchmarkAddVV_1e1     13.1          12.0          -8.40%
BenchmarkAddVV_1e2     78.1          69.4          -11.14%
BenchmarkAddVV_1e3     815           656           -19.51%
BenchmarkAddVV_1e4     8137          7345          -9.73%
BenchmarkAddVV_1e5     100127        93909         -6.21%
BenchmarkAddVW_1       4.86          4.71          -3.09%
BenchmarkAddVW_2       5.67          5.50          -3.00%
BenchmarkAddVW_3       6.51          6.34          -2.61%
BenchmarkAddVW_4       6.69          6.66          -0.45%
BenchmarkAddVW_5       7.20          7.21          +0.14%
BenchmarkAddVW_1e1     10.0          9.34          -6.60%
BenchmarkAddVW_1e2     45.4          52.3          +15.20%
BenchmarkAddVW_1e3     417           491           +17.75%
BenchmarkAddVW_1e4     4760          4852          +1.93%
BenchmarkAddVW_1e5     69107         67717         -2.01%

benchmark              old MB/s      new MB/s      speedup
BenchmarkAddVV_1       11241.82      11610.28      1.03x
BenchmarkAddVV_2       17902.68      18631.82      1.04x
BenchmarkAddVV_3       22082.43      23835.64      1.08x
BenchmarkAddVV_4       31588.18      31492.06      1.00x
BenchmarkAddVV_5       38229.90      37783.17      0.99x
BenchmarkAddVV_1e1     48891.67      53340.91      1.09x
BenchmarkAddVV_1e2     81940.61      92191.86      1.13x
BenchmarkAddVV_1e3     78443.09      97480.44      1.24x
BenchmarkAddVV_1e4     78644.18      87129.50      1.11x
BenchmarkAddVV_1e5     63918.48      68150.84      1.07x
BenchmarkAddVW_1       13165.09      13581.00      1.03x
BenchmarkAddVW_2       22588.04      23275.41      1.03x
BenchmarkAddVW_3       29483.82      30303.96      1.03x
BenchmarkAddVW_4       38286.54      38453.21      1.00x
BenchmarkAddVW_5       44414.57      44370.59      1.00x
BenchmarkAddVW_1e1     63816.84      68494.08      1.07x
BenchmarkAddVW_1e2     140885.41     122427.16     0.87x
BenchmarkAddVW_1e3     153258.31     130325.28     0.85x
BenchmarkAddVW_1e4     134447.63     131904.02     0.98x
BenchmarkAddVW_1e5     92609.41      94509.88      1.02x

Change-Id: Ia473e9ab9c63a955c252426684176bca566645ae
Reviewed-on: https://go-review.googlesource.com/2503
Reviewed-by: Keith Randall <khr@golang.org>
2015-01-08 16:57:11 +00:00
Shenghou Ma
43178697db math/big: panic if n <= 0 for ProbablyPrime
Fixes #9509

Change-Id: I3b86745d38e09093fe2f4b918d774bd6608727d7
Reviewed-on: https://go-review.googlesource.com/2313
Reviewed-by: Robert Griesemer <gri@golang.org>
2015-01-05 23:11:35 +00:00
Fazlul Shahriar
e6f76aac32 math: be consistent in how we document special cases
Change-Id: Ic6bc4af7bcc89b2881b2b9e7290aeb6fd54804e2
Reviewed-on: https://go-review.googlesource.com/2239
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-01-05 21:01:46 +00:00
Guobiao Mei
59cb2d9ca6 math/rand: fix example_test to show with the correct method
Originally it used r.Int63() to show "Uint32", and now we use the correct r.Uint32() method.

Fixes #9429

Change-Id: I8a1228f1ca1af93b0e3104676fc99000257c456f
Reviewed-on: https://go-review.googlesource.com/2069
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2014-12-23 17:24:24 +00:00
Alberto Donizetti
5de497bc6f math: Added parity check to ProbablyPrime
Fixes #9269

Change-Id: I25751632e95978537b656aedfa5c35ab2273089b
Reviewed-on: https://go-review.googlesource.com/1380
Reviewed-by: Robert Griesemer <gri@golang.org>
2014-12-12 00:25:16 +00:00
Russ Cox
09d92b6bbf all: power64 is now ppc64
Fixes #8654.

LGTM=austin
R=austin
CC=golang-codereviews
https://golang.org/cl/180600043
2014-12-05 19:13:20 -05:00
Austin Clements
062e354c84 [dev.power64] runtime: power64 fixes and ports of changes
Fix include paths that got moved in the great pkg/ rename.  Add
missing runtime/arch_* files for power64.  Port changes that
happened on default since branching to
runtime/{asm,atomic,sys_linux}_power64x.s (precise stacks,
calling convention change, various new and deleted functions.
Port struct renaming and fix some bugs in
runtime/defs_linux_power64.h.

LGTM=rsc
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/161450043
2014-10-27 17:27:03 -04:00
Austin Clements
f0bd539c59 [dev.power64] all: merge default into dev.power64
This brings dev.power64 up-to-date with the current tip of
default.  go_bootstrap is still panicking with a bad defer
when initializing the runtime (even on amd64).

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/152570049
2014-10-22 15:51:54 -04:00
Austin Clements
2bd616b1a7 build: merge the great pkg/ rename into dev.power64
This also removes pkg/runtime/traceback_lr.c, which was ported
to Go in an earlier commit and then moved to
runtime/traceback.go.

Reviewer: rsc@golang.org
          rsc: LGTM
2014-10-22 13:25:37 -04:00
Keith Randall
96d1e4ab59 math/big: Allow non-prime modulus for ModInverse
The inverse is defined whenever the element and the
modulus are relatively prime.  The code already handles
this situation, but the spec does not.

Test that it does indeed work.

Fixes #8875

LGTM=agl
R=agl
CC=golang-codereviews
https://golang.org/cl/155010043
2014-10-14 14:09:56 -07:00
Casey Marshall
7371153321 math/big: Fixes issue 8920
(*Rat).SetString checks for denominator.

LGTM=gri
R=golang-codereviews, gri
CC=golang-codereviews
https://golang.org/cl/159760043
2014-10-13 12:41:14 -07:00
Robert Griesemer
87f51f1031 math/big: fix doc comments
Fixes #8904.

TBR=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/148650043
2014-10-07 10:56:58 -07:00
Robert Griesemer
28ddfb090c math/big: math.Exp should return result >= 0 for |m| > 0
The documentation states that Exp(x, y, m)
computes x**y mod |m| for m != nil && m > 0.
In math.big, Mod is the Euclidean modulus,
which is always >= 0.

Fixes #8822.

LGTM=agl, r, rsc
R=agl, r, rsc
CC=golang-codereviews
https://golang.org/cl/145650043
2014-10-02 13:02:25 -07:00
Russ Cox
4a8cb4a49c math: avoid assumption of denormalized math mode in Sincos
The extra-clever code in Sincos is trying to do

        if v&2 == 0 {
                mask = 0xffffffffffffffff
        } else {
                mask = 0
        }

It does this by turning v&2 into a float64 X0 and then using

        MOVSD $0.0, X3
        CMPSD X0, X3, 0

That CMPSD is defined to behave like:

        if X0 == X3 {
                X3 = 0xffffffffffffffff
        } else {
                X3 = 0
        }

which gives the desired mask in X3. The goal in using the
CMPSD was to avoid a conditional branch.

This code fails when called from a PortAudio callback.
In particular, the failure behavior is exactly as if the
CMPSD always chose the 'true' execution.

Notice that the comparison X0 == X3 is comparing as
floating point values the 64-bit pattern v&2 and the actual
floating point value zero. The only possible values for v&2
are 0x0000000000000000 (floating point zero)
and 0x0000000000000002 (floating point 1e-323, a denormal).
If they are both comparing equal to zero, I conclude that
in a PortAudio callback (whatever that means), the processor
is running in "denormals are zero" mode.

I confirmed this by placing the processor into that mode
and running the test case in the bug; it produces the
incorrect output reported in the bug.

In general, if a Go program changes the floating point math
modes to something other than what Go expects, the math
library is not going to work exactly as intended, so we might
be justified in not fixing this at all.

However, it seems reasonable that the client code might
have expected "denormals are zero" mode to only affect
actual processing of denormals. This code has produced
what is in effect a gratuitous denormal by being extra clever.
There is nothing about the computation being requested
that fundamentally requires a denormal.

It is also easy to do this computation in integer math instead:

        mask = ((v&2)>>1)-1

Do that.

For the record, the other math tests that fail if you put the
processor in "denormals are zero" mode are the tests for
Frexp, Ilogb, Ldexp, Logb, Log2, and FloatMinMax, but all
fail processing denormal inputs. Sincos was the only function
for which that mode causes incorrect behavior on non-denormal inputs.

The existing tests check that the new assembly is correct.
There is no test for behavior in "denormals are zero" mode,
because I don't want to add assembly to change that.

Fixes #8623.

LGTM=josharian
R=golang-codereviews, josharian
CC=golang-codereviews, iant, r
https://golang.org/cl/151750043
2014-09-26 17:13:24 -04:00
Russ Cox
c007ce824d build: move package sources from src/pkg to src
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
2014-09-08 00:08:51 -04:00