1
0
mirror of https://github.com/golang/go synced 2024-11-17 18:24:45 -07:00
Commit Graph

556 Commits

Author SHA1 Message Date
surechen
73eb24ccb6 math: Remove redundant local variable Ln2
Use the const variable Ln2 in math/const.go for function acosh.

Change-Id: I5381d03dd3142c227ae5773ece9be6c8f377615e
Reviewed-on: https://go-review.googlesource.com/c/go/+/232517
Reviewed-by: Robert Griesemer <gri@golang.org>
Trust: Robert Griesemer <gri@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
2020-09-19 09:09:52 +00:00
Xiangdong Ji
ae7b6a3b77 math/big: tune addVW/subVW performance on arm64
Add an optimization for addVW and subVW over large-sized vectors, it switches
from add/sub with carry to copy the rest of the vector when we are done with
carries. Consistent performance improvement are observed on various arm64
machines.

Add additional tests and benchmarks to increase the test coverage.
TestFunVWExt:
  Testing with various types of input vector, using the result from go-version
  addVW/subVW as golden reference.
BenchmarkAddVWext and BenchmarkSubVWext:
  Benchmarking using input vector having all 1s or all 0s, for evaluating the
  overhead of worst case.

1. Perf. comparison over randomly generated input vectors:

Server 1:
name             old time/op    new time/op    delta
AddVW/1            12.3ns ± 3%    12.0ns ± 0%    -2.60%  (p=0.001 n=10+8)
AddVW/2            12.5ns ± 2%    12.3ns ± 0%    -1.84%  (p=0.001 n=10+8)
AddVW/3            12.6ns ± 2%    12.3ns ± 0%    -1.91%  (p=0.009 n=10+10)
AddVW/4            13.1ns ± 3%    12.7ns ± 0%    -2.98%  (p=0.006 n=10+8)
AddVW/5            14.4ns ± 1%    13.9ns ± 0%    -3.81%  (p=0.000 n=10+10)
AddVW/10           11.7ns ± 0%    11.7ns ± 0%      ~     (all equal)
AddVW/100          47.8ns ± 0%    29.9ns ± 2%   -37.38%  (p=0.000 n=10+9)
AddVW/1000          446ns ± 0%     207ns ± 0%   -53.59%  (p=0.000 n=10+10)
AddVW/10000        4.35µs ± 1%    2.92µs ± 0%   -32.85%  (p=0.000 n=10+10)
AddVW/100000       43.6µs ± 0%    29.7µs ± 0%   -31.92%  (p=0.000 n=8+10)
SubVW/1            12.6ns ± 0%    12.3ns ± 2%    -2.22%  (p=0.000 n=7+10)
SubVW/2            12.7ns ± 0%    12.6ns ± 1%    -0.39%  (p=0.046 n=8+10)
SubVW/3            12.7ns ± 1%    12.6ns ± 1%      ~     (p=0.410 n=10+10)
SubVW/4            13.3ns ± 3%    13.1ns ± 3%      ~     (p=0.072 n=10+10)
SubVW/5            14.2ns ± 0%    14.1ns ± 1%    -0.63%  (p=0.046 n=8+10)
SubVW/10           11.7ns ± 0%    11.7ns ± 0%      ~     (all equal)
SubVW/100          47.8ns ± 0%    33.1ns ±19%   -30.71%  (p=0.000 n=10+10)
SubVW/1000          446ns ± 0%     207ns ± 0%   -53.59%  (p=0.000 n=10+10)
SubVW/10000        4.33µs ± 1%    2.92µs ± 0%   -32.66%  (p=0.000 n=10+6)
SubVW/100000       43.4µs ± 0%    29.6µs ± 0%   -31.90%  (p=0.000 n=10+9)

Server 2:
name             old time/op    new time/op    delta
AddVW/1            5.49ns ± 0%    5.53ns ± 2%     ~     (p=1.000 n=9+10)
AddVW/2            5.96ns ± 2%    5.92ns ± 1%   -0.69%  (p=0.039 n=10+10)
AddVW/3            6.72ns ± 0%    6.73ns ± 0%     ~     (p=0.078 n=10+10)
AddVW/4            7.07ns ± 0%    6.75ns ± 2%   -4.55%  (p=0.000 n=10+10)
AddVW/5            8.14ns ± 0%    8.17ns ± 0%   +0.46%  (p=0.003 n=8+8)
AddVW/10           10.0ns ± 0%    10.1ns ± 1%   +0.70%  (p=0.003 n=10+10)
AddVW/100          43.0ns ± 0%    33.5ns ± 0%  -22.09%  (p=0.000 n=9+9)
AddVW/1000          394ns ± 0%     278ns ± 0%  -29.44%  (p=0.000 n=10+10)
AddVW/10000        4.18µs ± 0%    3.14µs ± 0%  -24.81%  (p=0.000 n=8+8)
AddVW/100000       68.3µs ± 3%    62.1µs ± 5%   -9.13%  (p=0.000 n=10+10)
SubVW/1            5.37ns ± 2%    5.42ns ± 1%     ~     (p=0.990 n=10+10)
SubVW/2            5.89ns ± 0%    5.92ns ± 1%   +0.58%  (p=0.000 n=8+10)
SubVW/3            6.64ns ± 1%    6.82ns ± 3%   +2.63%  (p=0.000 n=9+10)
SubVW/4            7.17ns ± 0%    6.69ns ± 2%   -6.74%  (p=0.000 n=10+9)
SubVW/5            8.22ns ± 0%    8.18ns ± 0%   -0.46%  (p=0.001 n=8+9)
SubVW/10           10.0ns ± 1%    10.1ns ± 1%     ~     (p=0.341 n=10+10)
SubVW/100          43.0ns ± 0%    33.5ns ± 0%  -22.09%  (p=0.000 n=7+10)
SubVW/1000          394ns ± 0%     278ns ± 0%  -29.44%  (p=0.000 n=10+10)
SubVW/10000        4.18µs ± 0%    3.15µs ± 0%  -24.62%  (p=0.000 n=9+9)
SubVW/100000       67.7µs ± 4%    62.4µs ± 2%   -7.92%  (p=0.000 n=10+10)

2. Perf. comparison over input vectors of all 1s or all 0s

Server 1:
name             old time/op    new time/op    delta
AddVWext/1         12.6ns ± 0%    12.0ns ± 0%    -4.76%  (p=0.000 n=6+10)
AddVWext/2         12.7ns ± 0%    12.4ns ± 1%    -2.52%  (p=0.000 n=10+10)
AddVWext/3         12.7ns ± 0%    12.4ns ± 0%    -2.36%  (p=0.000 n=9+7)
AddVWext/4         13.2ns ± 4%    12.7ns ± 0%    -3.71%  (p=0.001 n=10+9)
AddVWext/5         14.6ns ± 0%    13.9ns ± 0%    -4.79%  (p=0.000 n=10+8)
AddVWext/10        11.7ns ± 0%    11.7ns ± 0%      ~     (all equal)
AddVWext/100       47.8ns ± 0%    47.4ns ± 0%    -0.84%  (p=0.000 n=10+10)
AddVWext/1000       446ns ± 0%     399ns ± 0%   -10.54%  (p=0.000 n=10+10)
AddVWext/10000     4.34µs ± 1%    3.90µs ± 0%   -10.12%  (p=0.000 n=10+10)
AddVWext/100000    43.9µs ± 1%    39.4µs ± 0%   -10.18%  (p=0.000 n=10+10)
SubVWext/1         12.6ns ± 0%    12.3ns ± 2%    -2.70%  (p=0.000 n=7+10)
SubVWext/2         12.6ns ± 1%    12.6ns ± 2%      ~     (p=0.234 n=10+10)
SubVWext/3         12.7ns ± 0%    12.6ns ± 2%    -0.71%  (p=0.033 n=10+10)
SubVWext/4         13.4ns ± 0%    13.1ns ± 3%    -2.01%  (p=0.006 n=8+10)
SubVWext/5         14.2ns ± 0%    14.1ns ± 1%    -0.85%  (p=0.003 n=10+10)
SubVWext/10        11.7ns ± 0%    11.7ns ± 0%      ~     (all equal)
SubVWext/100       47.8ns ± 0%    47.4ns ± 0%    -0.84%  (p=0.000 n=10+10)
SubVWext/1000       446ns ± 0%     399ns ± 0%   -10.54%  (p=0.000 n=10+10)
SubVWext/10000     4.33µs ± 1%    3.90µs ± 0%   -10.02%  (p=0.000 n=10+10)
SubVWext/100000    43.5µs ± 0%    39.5µs ± 1%    -9.16%  (p=0.000 n=7+10)

Server 2:
name             old time/op    new time/op    delta
AddVWext/1         5.48ns ± 0%    5.43ns ± 1%   -0.97%  (p=0.000 n=9+9)
AddVWext/2         5.99ns ± 2%    5.93ns ± 1%     ~     (p=0.054 n=10+10)
AddVWext/3         6.74ns ± 0%    6.79ns ± 1%   +0.80%  (p=0.000 n=9+10)
AddVWext/4         7.18ns ± 0%    7.21ns ± 1%   +0.36%  (p=0.034 n=9+10)
AddVWext/5         7.93ns ± 3%    8.18ns ± 0%   +3.18%  (p=0.000 n=10+8)
AddVWext/10        10.0ns ± 0%    10.1ns ± 1%   +0.60%  (p=0.011 n=10+10)
AddVWext/100       43.0ns ± 0%    47.7ns ± 0%  +10.93%  (p=0.000 n=9+10)
AddVWext/1000       394ns ± 0%     399ns ± 0%   +1.27%  (p=0.000 n=10+10)
AddVWext/10000     4.18µs ± 0%    4.50µs ± 0%   +7.73%  (p=0.000 n=9+10)
AddVWext/100000    67.6µs ± 2%    68.4µs ± 3%     ~     (p=0.139 n=9+8)
SubVWext/1         5.46ns ± 1%    5.43ns ± 0%   -0.55%  (p=0.002 n=9+9)
SubVWext/2         5.89ns ± 0%    5.93ns ± 1%   +0.68%  (p=0.000 n=8+10)
SubVWext/3         6.72ns ± 1%    6.79ns ± 1%   +1.07%  (p=0.000 n=10+10)
SubVWext/4         6.98ns ± 1%    7.21ns ± 0%   +3.25%  (p=0.000 n=10+10)
SubVWext/5         8.22ns ± 0%    7.99ns ± 3%   -2.83%  (p=0.000 n=8+10)
SubVWext/10        10.0ns ± 1%    10.1ns ± 1%     ~     (p=0.239 n=10+10)
SubVWext/100       43.0ns ± 0%    47.7ns ± 0%  +10.93%  (p=0.000 n=8+10)
SubVWext/1000       394ns ± 0%     399ns ± 0%   +1.27%  (p=0.000 n=10+10)
SubVWext/10000     4.18µs ± 0%    4.51µs ± 0%   +7.86%  (p=0.000 n=8+8)
SubVWext/100000    68.3µs ± 2%    68.0µs ± 3%     ~     (p=0.515 n=10+8)

Change-Id: I134a5194b8a2deaaebbaa2b771baf72846971d58
Reviewed-on: https://go-review.googlesource.com/c/go/+/229739
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-08-28 16:40:41 +00:00
surechen
bd6dfe9a3e math/big: add a comment for SetMantExp
Change-Id: I9ff5d1767cf70648c2251268e5e815944a7cb371
Reviewed-on: https://go-review.googlesource.com/c/go/+/233737
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-08-28 16:25:32 +00:00
zhouzhongyuan
63828096f6 math/big: add function example
While reading the source code of the math/big package, I found the SetString function example of float type missing.

Change-Id: Id8c16a58e2e24f9463e8ff38adbc98f8c418ab26
Reviewed-on: https://go-review.googlesource.com/c/go/+/232804
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-08-26 16:15:32 +00:00
SparrowLii
41bc0a1713 math/big: fix TestShiftOverlap for test -count arguments > 1
Don't overwrite incoming test data.

The change uses copy instead of assigning statement to avoid this.

Change-Id: Ib907101822d811de5c45145cb9d7961907e212c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/250137
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-08-25 17:04:40 +00:00
Lynn Boger
216714e44f math/big: improve performance of mulAddVWW on ppc64x
This changes the assembly implementation on ppc64x
to improve performance by reordering some instructions.
It also eliminates an unnecessary move by changing an
ADDZE to use the correct target register.

Improvement on power9:

MulAddVWW/1         6.89ns ± 0%    7.30ns ± 0%   +5.95%  (p=1.000 n=1+1)
MulAddVWW/2         8.04ns ± 0%    8.06ns ± 0%   +0.25%  (p=1.000 n=1+1)
MulAddVWW/3         9.39ns ± 0%    9.39ns ± 0%     ~     (all equal)
MulAddVWW/4         9.76ns ± 0%    9.48ns ± 0%   -2.87%  (p=1.000 n=1+1)
MulAddVWW/5         10.5ns ± 0%    10.3ns ± 0%   -1.90%  (p=1.000 n=1+1)
MulAddVWW/10        15.4ns ± 0%    14.9ns ± 0%   -3.25%  (p=1.000 n=1+1)
MulAddVWW/100        149ns ± 0%     125ns ± 0%  -16.11%  (p=1.000 n=1+1)
MulAddVWW/1000      1.42µs ± 0%    1.28µs ± 0%   -9.74%  (p=1.000 n=1+1)
MulAddVWW/10000     14.2µs ± 0%    12.8µs ± 0%   -9.73%  (p=1.000 n=1+1)
MulAddVWW/100000     144µs ± 0%     129µs ± 0%  -10.10%  (p=1.000 n=1+1)

Change-Id: I0ae7002a69783ca19d7a4e3e42042ae75dc60069
Reviewed-on: https://go-review.googlesource.com/c/go/+/248721
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
2020-08-18 20:25:26 +00:00
kakulisen
441b52f566 math: simplify the code
Simplifying some code without compromising performance.
My CPU is Intel Xeon Gold 6161, 2.20GHz, 64-bit operating system.
The memory is 8GB. This is my test environment, I hope to help you judge.

Benchmark:

name      old time/op    new time/op    delta
Log1p-4    21.8ns ± 5%    21.8ns ± 4%   ~     (p=0.973 n=20+20)

Change-Id: Icd8f96f1325b00007602d114300b92d4c57de409
Reviewed-on: https://go-review.googlesource.com/c/go/+/233940
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-08-15 02:20:42 +00:00
Alberto Donizetti
65f514edfb math: fix dead link to springerlink (now link.springer)
Change-Id: Ie5fd026af45d2e7bc371a38d15dbb52a1b4958cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/235717
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2020-05-29 14:33:50 +00:00
Bryan C. Mills
13617380ca testing: clean up remaining TempDir issues from CL 231958
Updates #38850

Change-Id: I33f48762f5520eb0c0a841d8ca1ccdd65ecc20c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/234583
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-05-19 19:55:14 +00:00
Filippo Valsorda
c9d5f60eaa math/big: add (*Int).FillBytes
Replaced almost every use of Bytes with FillBytes.

Note that the approved proposal was for

    func (*Int) FillBytes(buf []byte)

while this implements

    func (*Int) FillBytes(buf []byte) []byte

because the latter was far nicer to use in all callsites.

Fixes #35833

Change-Id: Ia912df123e5d79b763845312ea3d9a8051343c0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/230397
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-05-05 00:36:44 +00:00
Joel Sing
03e6073b13 math: implement Min/Max in riscv64 assembly
Change-Id: If34422859d47bc8f44974a00c6b7908e7655ff41
Reviewed-on: https://go-review.googlesource.com/c/go/+/223561
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-05-04 17:29:13 +00:00
kakulisen
e90b0ce68b math: add function examples.
The function Modf lacks corresponding examples.

Change-Id: Id93423500e87d35b0b6870882be1698b304797ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/231097
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-05-02 20:22:19 +00:00
Brian Kessler
4209a9f65a math/cmplx: handle special cases
Implement special case handling and testing to ensure
conformance with the C99 standard annex G.6 Complex arithmetic.

Fixes #29320

Change-Id: Id72eb4c5a35d5a54b4b8690d2f7176ab11028f1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/220689
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-05-01 03:16:37 +00:00
kakulisen
df2862cf54 math: Add a function example
When I browsed the source code, I saw that there is no corresponding example of this function. I am not sure if there is a need for an increase, this is my first time to submit CL.

Change-Id: Idbf4e1e1ed2995176a76959d561e152263a2fd26
Reviewed-on: https://go-review.googlesource.com/c/go/+/230741
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-30 00:33:38 +00:00
Ruixin(Peter) Bao
a7e9e84716 math/big: simplify hasVX checking on s390x
Originally, we use an assembly function that returns a boolean result to
tell whether the machine has vector facility or not. It is now no longer
needed when we can directly use cpu.S390X.HasVX variable.

Change-Id: Ic1dae851982532bcfd9a9453416c112347f21d87
Reviewed-on: https://go-review.googlesource.com/c/go/+/230318
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-27 20:20:53 +00:00
Ruixin Bao
d2f5e4e38c math: simplify hasVX checking on s390x
Originally, we use an assembly function that returns a boolean result to
tell whether the machine has vector facility or not. It is now no longer
needed when we can directly use cpu.S390X.HasVX variable.

Change-Id: Ic3ffeb9e63238ef41406d97cdc42502145ddb454
Reviewed-on: https://go-review.googlesource.com/c/go/+/230319
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-27 20:06:57 +00:00
Tyson Andre
19648622d2 math/cmplx: fix typo in code comment
Everywhere else is using "cancellation" as of 2019

The reasoning is mentioned in 170060.

> Though there is variation in the spelling of canceled,
> cancellation is always spelled with a double l.
>
> Reference: https://www.grammarly.com/blog/canceled-vs-cancelled/

Change-Id: I933ea68d7251986ce582b92c33b7cb13cee1d207
GitHub-Last-Rev: fc3d5ada2b
GitHub-Pull-Request: golang/go#38661
Reviewed-on: https://go-review.googlesource.com/c/go/+/230199
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2020-04-25 21:06:35 +00:00
Ruixin(Peter) Bao
3a37fd4010 math/big: rewrite subVW to use fast path on s390x
This CL replaces the original subVW implementation with a implementation
that uses a similar idea as CL 164968.

When we know the borrow bit is zero, we can copy the rest of words as
they will not be updated. Also, since we are copying vector of a words,
a faster implementation of copy is written in this CL to copy a word or
multiple words at a time.

Benchmarks:
name             old time/op    new time/op     delta
SubVW/1-18         4.43ns ± 0%     3.82ns ± 0%   -13.85%  (p=0.000 n=20+20)
SubVW/2-18         5.39ns ± 0%     4.25ns ± 0%   -21.23%  (p=0.000 n=20+20)
SubVW/3-18         6.29ns ± 0%     4.65ns ± 0%   -26.07%  (p=0.000 n=16+19)
SubVW/4-18         6.08ns ± 2%     4.84ns ± 0%   -20.43%  (p=0.000 n=20+20)
SubVW/5-18         7.06ns ± 1%     4.93ns ± 0%   -30.18%  (p=0.000 n=20+20)
SubVW/10-18        10.3ns ± 2%      7.2ns ± 0%   -30.35%  (p=0.000 n=20+19)
SubVW/100-18       48.0ns ± 4%     17.6ns ± 0%   -63.32%  (p=0.000 n=18+19)
SubVW/1000-18       448ns ±10%      236ns ± 1%   -47.24%  (p=0.000 n=20+20)
SubVW/10000-18     4.83µs ± 5%     2.96µs ± 0%   -38.73%  (p=0.000 n=20+19)
SubVW/100000-18    46.6µs ± 3%     30.6µs ± 1%   -34.30%  (p=0.000 n=20+20)
[Geo mean]         56.3ns          37.0ns        -34.24%

name             old speed      new speed       delta
SubVW/1-18       1.80GB/s ± 0%   2.10GB/s ± 0%   +16.16%  (p=0.000 n=20+20)
SubVW/2-18       2.97GB/s ± 0%   3.77GB/s ± 0%   +26.95%  (p=0.000 n=20+20)
SubVW/3-18       3.82GB/s ± 0%   5.16GB/s ± 0%   +35.26%  (p=0.000 n=20+19)
SubVW/4-18       5.26GB/s ± 1%   6.61GB/s ± 0%   +25.59%  (p=0.000 n=20+20)
SubVW/5-18       5.67GB/s ± 1%   8.11GB/s ± 0%   +43.12%  (p=0.000 n=20+20)
SubVW/10-18      7.79GB/s ± 2%  11.17GB/s ± 0%   +43.52%  (p=0.000 n=20+19)
SubVW/100-18     16.7GB/s ± 4%   45.5GB/s ± 0%  +172.61%  (p=0.000 n=18+20)
SubVW/1000-18    17.9GB/s ± 9%   33.9GB/s ± 1%   +89.25%  (p=0.000 n=20+20)
SubVW/10000-18   16.6GB/s ± 5%   27.0GB/s ± 0%   +63.08%  (p=0.000 n=20+19)
SubVW/100000-18  17.2GB/s ± 2%   26.1GB/s ± 1%   +52.18%  (p=0.000 n=20+20)
[Geo mean]       7.25GB/s       11.03GB/s        +52.01%

Change-Id: I32e99cbab3260054a96231d02b87049c833ab77e
Reviewed-on: https://go-review.googlesource.com/c/go/+/227297
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-24 14:50:59 +00:00
Ruixin(Peter) Bao
ee8972cd12 math/big: rewrite addVW to use fast path on s390x
Rewrite addVW to use a fast path and remove the original
vector and non vector implementation of addVW in assembly. This CL uses
a similar idea as CL 164968, where we copy the rest of words when we
know carry bit is zero.

In addition, since we are copying vector of words, a faster
implementation of copy is written in this CL to copy a word or multiple
words at a time.

Benchmarks:
name             old time/op    new time/op     delta
AddVW/1-18         4.56ns ± 0%     4.01ns ± 6%   -12.14%  (p=0.000 n=18+20)
AddVW/2-18         5.54ns ± 0%     4.42ns ± 5%   -20.20%  (p=0.000 n=18+20)
AddVW/3-18         6.55ns ± 0%     4.61ns ± 0%   -29.62%  (p=0.000 n=16+18)
AddVW/4-18         6.11ns ± 2%     5.12ns ± 6%   -16.19%  (p=0.000 n=20+20)
AddVW/5-18         7.32ns ± 4%     5.14ns ± 0%   -29.77%  (p=0.000 n=20+19)
AddVW/10-18        10.6ns ± 2%      7.2ns ± 1%   -31.47%  (p=0.000 n=20+20)
AddVW/100-18       49.6ns ± 2%     18.0ns ± 0%   -63.63%  (p=0.000 n=20+20)
AddVW/1000-18       465ns ± 3%      244ns ± 0%   -47.54%  (p=0.000 n=20+20)
AddVW/10000-18     4.99µs ± 4%     2.97µs ± 0%   -40.54%  (p=0.000 n=20+20)
AddVW/100000-18    48.3µs ± 3%     30.8µs ± 1%   -36.29%  (p=0.000 n=20+20)
[Geo mean]         58.1ns          38.0ns        -34.57%

name             old speed      new speed       delta
AddVW/1-18       1.76GB/s ± 0%   2.00GB/s ± 6%   +14.04%  (p=0.000 n=20+20)
AddVW/2-18       2.89GB/s ± 0%   3.63GB/s ± 5%   +25.55%  (p=0.000 n=18+20)
AddVW/3-18       3.66GB/s ± 0%   5.21GB/s ± 0%   +42.25%  (p=0.000 n=18+19)
AddVW/4-18       5.24GB/s ± 2%   6.27GB/s ± 6%   +19.61%  (p=0.000 n=20+20)
AddVW/5-18       5.47GB/s ± 4%   7.78GB/s ± 0%   +42.28%  (p=0.000 n=20+18)
AddVW/10-18      7.55GB/s ± 2%  11.04GB/s ± 1%   +46.09%  (p=0.000 n=20+20)
AddVW/100-18     16.1GB/s ± 2%   44.3GB/s ± 0%  +174.77%  (p=0.000 n=20+20)
AddVW/1000-18    17.2GB/s ± 3%   32.8GB/s ± 1%   +90.58%  (p=0.000 n=20+20)
AddVW/10000-18   16.0GB/s ± 4%   26.9GB/s ± 0%   +68.11%  (p=0.000 n=20+20)
AddVW/100000-18  16.6GB/s ± 3%   26.0GB/s ± 1%   +56.94%  (p=0.000 n=20+20)
[Geo mean]       7.03GB/s       10.75GB/s        +52.93%

Change-Id: Idbb73f3178311bd2b18a93bdc1e48f26869d2f6a
Reviewed-on: https://go-review.googlesource.com/c/go/+/209679
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-24 13:26:34 +00:00
Michael Munday
ab7a65f283 cmd/compile: clean up codegen for branch-on-carry on s390x
This CL optimizes code that uses a carry from a function such as
bits.Add64 as the condition in an if statement. For example:

    x, c := bits.Add64(a, b, 0)
    if c != 0 {
        panic("overflow")
    }

Rather than converting the carry into a 0 or a 1 value and using
that as an input to a comparison instruction the carry flag is now
used as the input to a conditional branch directly. This typically
removes an ADD LOGICAL WITH CARRY instruction when user code is
doing overflow detection and is closer to the code that a user
would expect to generate.

Change-Id: I950431270955ab72f1b5c6db873b6abe769be0da
Reviewed-on: https://go-review.googlesource.com/c/go/+/219757
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-22 20:11:06 +00:00
Ruixin(Peter) Bao
0329c915a0 math/big: clean up whitespace in arith_s390x.s file
This CL looks big but it only does formatting changes to arith_s390x.s.
The file was formatted using asmfmt(https://github.com/klauspost/asmfmt)
, so there should not be any functional impact. I verified that the
generated assembly of big.test file is identical.

Change-Id: I8b4035ef082a4d0357881869327e25253f2d8be1
Reviewed-on: https://go-review.googlesource.com/c/go/+/229302
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-22 15:40:55 +00:00
Alberto Donizetti
813f8eae27 math/big: remove Direct Sqrt computation
The Float.Sqrt method switches (for performance reasons) between
direct (uses Quo) and inverse (doesn't) computation, depending on the
precision, with threshold 128.

Unfortunately the implementation of recursive division in CL 172018
made Quo slightly slower exactly in the range around and below the
threshold Sqrt is using, so this strategy is no longer profitable.

The new division algorithm allocates more, and this has increased the
amount of allocations performed by Sqrt when using the direct method;
on low precisions the computation is fast, so additional allocations
have an negative impact on performance.

Interestingly, only using the inverse method doesn't just reverse the
effects of the Quo algorithm change, but it seems to make performances
better overall for small precisions:

name                 old time/op    new time/op    delta
FloatSqrt/64-4          643ns ± 1%     635ns ± 1%   -1.24%  (p=0.000 n=10+10)
FloatSqrt/128-4        1.44µs ± 1%    1.02µs ± 1%  -29.25%  (p=0.000 n=10+10)
FloatSqrt/256-4        1.49µs ± 1%    1.49µs ± 1%     ~     (p=0.752 n=10+10)
FloatSqrt/1000-4       3.71µs ± 1%    3.74µs ± 1%   +0.87%  (p=0.001 n=10+10)
FloatSqrt/10000-4      35.3µs ± 1%    35.6µs ± 1%   +0.82%  (p=0.002 n=10+9)
FloatSqrt/100000-4      844µs ± 1%     844µs ± 0%     ~     (p=0.549 n=10+9)
FloatSqrt/1000000-4    69.5ms ± 0%    69.6ms ± 0%     ~     (p=0.222 n=9+9)

name                 old alloc/op   new alloc/op   delta
FloatSqrt/64-4           280B ± 0%      200B ± 0%  -28.57%  (p=0.000 n=10+10)
FloatSqrt/128-4          504B ± 0%      248B ± 0%  -50.79%  (p=0.000 n=10+10)
FloatSqrt/256-4          344B ± 0%      344B ± 0%     ~     (all equal)
FloatSqrt/1000-4       1.30kB ± 0%    1.30kB ± 0%     ~     (all equal)
FloatSqrt/10000-4      13.5kB ± 0%    13.5kB ± 0%     ~     (p=0.237 n=10+10)
FloatSqrt/100000-4      123kB ± 0%     123kB ± 0%     ~     (p=0.247 n=10+10)
FloatSqrt/1000000-4    1.83MB ± 1%    1.83MB ± 3%     ~     (p=0.779 n=8+10)

name                 old allocs/op  new allocs/op  delta
FloatSqrt/64-4           8.00 ± 0%      5.00 ± 0%  -37.50%  (p=0.000 n=10+10)
FloatSqrt/128-4          11.0 ± 0%       5.0 ± 0%  -54.55%  (p=0.000 n=10+10)
FloatSqrt/256-4          5.00 ± 0%      5.00 ± 0%     ~     (all equal)
FloatSqrt/1000-4         6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/10000-4        6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/100000-4       6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/1000000-4      10.3 ±13%      10.3 ±13%     ~     (p=1.000 n=10+10)

For example, 1.02µs for FloatSqrt/128 is actually better than what I
was getting on the same machine before the Quo changes.

The .8% slowdown on /1000 and /10000 appears to be real and it is
quite baffling (that codepath was not touched at all); it may be
caused by code alignment changes.

Change-Id: Ib03761cdc1055674bc7526d4f3a23d7a25094029
Reviewed-on: https://go-review.googlesource.com/c/go/+/228062
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-15 16:37:53 +00:00
Brad Fitzpatrick
8f53fad035 math/big: add test that linker is able to remove unused code
(Follow-up to CL 228108.)

Change-Id: Ia6d119ee19c7aa923cdeead06d3cee87a1751105
Reviewed-on: https://go-review.googlesource.com/c/go/+/228109
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-15 03:25:21 +00:00
Hanjun Kim
5a447c0ae9 math/big: fix typo in documentation for Int.Exp
Fixes #38304

Also change `If m > 0, y < 0, ...` to `If m != 0, y < 0, ...` since `Exp` will return `nil`
whatever `m`'s sign is.

Change-Id: I17d7337ccd1404318cea5d42a8de904ad185fd00
GitHub-Last-Rev: 2399510300
GitHub-Pull-Request: golang/go#38390
Reviewed-on: https://go-review.googlesource.com/c/go/+/228000
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-15 00:32:18 +00:00
Brad Fitzpatrick
a55645fa34 math/big: don't use Float in init to help linker discard 162 KiB
Removes 162 KiB from binaries that don't use math/big.Float:

-rwxr-xr-x 1 bradfitz bradfitz 1916590 Apr 14 12:21 x.after
-rwxr-xr-x 1 bradfitz bradfitz 2082575 Apr 14 12:21 x.before

No change in deps (this package already used sync).

No change in benchmarks:

name                 old time/op    new time/op    delta
FloatSqrt/64-8         1.06µs ±10%    1.03µs ± 6%   ~     (p=0.133 n=10+9)
FloatSqrt/128-8        2.26µs ± 9%    2.28µs ± 9%   ~     (p=0.460 n=10+8)
FloatSqrt/256-8        2.29µs ± 5%    2.31µs ± 3%   ~     (p=0.214 n=9+9)
FloatSqrt/1000-8       5.82µs ± 3%    5.87µs ± 7%   ~     (p=0.666 n=9+9)
FloatSqrt/10000-8      56.4µs ± 5%    57.0µs ± 6%   ~     (p=0.436 n=10+10)
FloatSqrt/100000-8     1.34ms ± 8%    1.31ms ± 3%   ~     (p=0.447 n=10+9)
FloatSqrt/1000000-8     106ms ± 5%     107ms ± 7%   ~     (p=0.315 n=10+10)

name                 old alloc/op   new alloc/op   delta
FloatSqrt/64-8           280B ± 0%      280B ± 0%   ~     (all equal)
FloatSqrt/128-8          504B ± 0%      504B ± 0%   ~     (all equal)
FloatSqrt/256-8          344B ± 0%      344B ± 0%   ~     (all equal)
FloatSqrt/1000-8       1.30kB ± 0%    1.30kB ± 0%   ~     (all equal)
FloatSqrt/10000-8      13.5kB ± 0%    13.5kB ± 0%   ~     (p=0.403 n=10+10)
FloatSqrt/100000-8      123kB ± 0%     123kB ± 0%   ~     (p=0.393 n=10+10)
FloatSqrt/1000000-8    1.84MB ± 7%    1.84MB ± 5%   ~     (p=0.739 n=10+10)

name                 old allocs/op  new allocs/op  delta
FloatSqrt/64-8           8.00 ± 0%      8.00 ± 0%   ~     (all equal)
FloatSqrt/128-8          11.0 ± 0%      11.0 ± 0%   ~     (all equal)
FloatSqrt/256-8          5.00 ± 0%      5.00 ± 0%   ~     (all equal)
FloatSqrt/1000-8         6.00 ± 0%      6.00 ± 0%   ~     (all equal)
FloatSqrt/10000-8        6.00 ± 0%      6.00 ± 0%   ~     (all equal)
FloatSqrt/100000-8       6.00 ± 0%      6.00 ± 0%   ~     (all equal)
FloatSqrt/1000000-8      10.9 ±10%      10.8 ±17%   ~     (p=0.974 n=10+10)

Change-Id: I3337f1f531bf7b4fae192b9d90cd24ff2be14fea
Reviewed-on: https://go-review.googlesource.com/c/go/+/228108
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-14 20:50:19 +00:00
Rémy Oudompheng
ac1fd419b6 math/big: correct off-by-one access in divBasic
The divBasic function computes the quotient of big nats u/v word by word.
It estimates each word qhat by performing a long division (top 2 words of u
divided by top word of v), looks at the next word to correct the estimate,
then perform a full multiplication (qhat*v) to catch any inaccuracy in the
estimate.

In the latter case, "negative" values appear temporarily and carries
must be carefully managed, and the recursive division refactoring
introduced a case where qhat*v has the same length as v, triggering an
out-of-bounds write in the case it happens when computing the top word
of the quotient.

Fixes #37499

Change-Id: I15089da4a4027beda43af497bf6de261eb792f94
Reviewed-on: https://go-review.googlesource.com/c/go/+/221980
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-08 20:53:58 +00:00
Brian Kessler
6b6414cab4 math: correct Atan2(±y,+∞) = ±0 on s390x
The s390x assembly implementation was previously only handling this
case correctly for x = -Pi.  Update the special case handling for
any y.

Fixes #35446

Change-Id: I355575e9ec8c7ce8bd9db10d74f42a22f39a2f38
Reviewed-on: https://go-review.googlesource.com/c/go/+/223420
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-03-25 04:06:34 +00:00
Alberto Donizetti
c5058652fd math/big: document that Sqrt doesn't set Accuracy
Document that the Float.Sqrt method does not set the receiver's
Accuracy field.

Updates #37915

Change-Id: Ief1dcac07eacc0ef02f86bfac9044501477bca1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/224497
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-03-20 19:24:37 +00:00
Brian Kessler
d774d979dd math/cmplx: disable TanHuge test on s390x
s390x has inaccurate range reduction for the assembly routines
in math so these tests are diabled until these are corrected.

Updates #37854

Change-Id: I1e26acd6d09ae3e592a3dd90aec73a6844f5c6fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/223457
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2020-03-14 07:03:15 +00:00
Brian Kessler
70dc28f766 math/cmplx: implement Payne-Hanek range reduction
Tan has poles along the real axis. In order to accurately calculate
the value near these poles, a range reduction by Pi is performed and
the result calculated via a Taylor series.  The prior implementation
of range reduction used Cody-Waite range reduction in three parts.
This fails when x is too large to accurately calculate the partial
products in the summation accurately.  Above this threshold, Payne-Hanek
range reduction using a multiple precision value of 1/Pi is required.

Additionally, the threshold used in math/trig_reduce.go for Payne-Hanek
range reduction was not set conservatively enough. The prior threshold
ensured that catastrophic failure did not occur where the argument x
would not actually be reduced below Pi/4. However, errors in reduction
begin to occur at values much lower when z = ((x - y*PI4A) - y*PI4B) - y*PI4C
is not exact because y*PI4A cannot be exactly represented as a float64.
reduceThreshold is lowered to the proper value.

Fixes #31566

Change-Id: I0f39a4171a5be44f64305f18dc57f6c29f19dba7
Reviewed-on: https://go-review.googlesource.com/c/go/+/172838
Reviewed-by: Rob Pike <r@golang.org>
2020-03-14 04:12:41 +00:00
Joel Sing
fe70838598 math/big: initial vector arithmetic in riscv64 assembly
Provide an assembly implementation of mulWW - for now all others run the
Go code.

Change-Id: Icb594c31048255f131bdea8d64f56784fc9db4d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/220919
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-02-25 16:47:02 +00:00
Joel Sing
89f249a40d math: implement Sqrt in assembly for riscv64
Change-Id: I9a5dc33271434e58335f5562a30cc131c6a8332c
Reviewed-on: https://go-review.googlesource.com/c/go/+/220918
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-02-25 16:43:26 +00:00
Filippo Valsorda
88ae4ccefb math/big: reintroduce pre-Go 1.14 mention in GCD docs
It was removed in CL 217302 but was intentionally added in CL 217104.

Change-Id: I1a478d80ad1ec4f0a0184bfebf8f1a5e352cfe8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/217941
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-02-05 20:54:27 +00:00
Filippo Valsorda
cdb7fd6b06 math/big: simplify GCD docs
We don't usually document past behavior (like "As of Go 1.14 ...") and
in isolation the current docs made it sound like a and b could only be
negative or zero.

Change-Id: I0d3c2b8579a9c01159ce528a3128b1478e99042a
Reviewed-on: https://go-review.googlesource.com/c/go/+/217302
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-01-31 23:32:37 +00:00
Robert Griesemer
9bb40ed8ec math/big: update comment on Int.GCD
Per the suggestion https://golang.org/cl/216200/2/doc/go1.14.html#423.

Updates #28878.

Change-Id: I654d2d114409624219a0041916f0a4030efc7573
Reviewed-on: https://go-review.googlesource.com/c/go/+/217104
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-01-30 20:37:01 +00:00
Joel Sing
5a3a5d3525 math, math/big: add support for riscv64
Based on riscv-go port.

Updates #27532

Change-Id: Id8ae7d851c393ec3702e4176c363accb0a42587f
Reviewed-on: https://go-review.googlesource.com/c/go/+/204633
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2020-01-15 18:49:52 +00:00
Brad Fitzpatrick
7d1d944626 math/rand: update comment to avoid use of ^ for exponentiation
Fixes #35920

Change-Id: I1a4d26c5f7f3fbd4de13fc337de482667d83c47f
Reviewed-on: https://go-review.googlesource.com/c/go/+/209758
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
2019-12-04 21:14:24 +00:00
Ville Skyttä
440f7d6404 all: fix a bunch of misspellings
Change-Id: I5b909df0fd048cd66c5a27fca1b06466d3bcaac7
GitHub-Last-Rev: 778c5d2131
GitHub-Pull-Request: golang/go#35624
Reviewed-on: https://go-review.googlesource.com/c/go/+/207421
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-11-15 21:04:43 +00:00
Rémy Oudompheng
7ad27481f8 math/big: fix out-of-bounds panic in divRecursive
The bounds in the last carry branch were wrong as there
is no reason for len(u) >= n+n/2 to always hold true.

We also adjust test to avoid using a remainder of 1
(in which case, the last step of the algorithm computes
(qhatv+1) - qhatv which rarely produces a carry).

Change-Id: I69fbab9c5e19d0db1c087fbfcd5b89352c2d26fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/206839
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-11-13 19:15:27 +00:00
Robert Griesemer
1fe33e3cb2 math/big: ensure correct test input
There is a (theoretical, but possible) chance that the
random number values a, b used for TestDiv are 0 or 1,
in which case the test would fail.

This CL makes sure that a >= 1 and b >= 2 at all times.

Fixes #35523.

Change-Id: I6451feb94241249516a821cd0066e95a0c65b0ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/206818
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-11-12 18:52:52 +00:00
Rémy Oudompheng
194ae3236d math/big: implement recursive algorithm for division
The current division algorithm produces one word of result at a time,
using 2-word division to compute the top word and mulAddVWW to compute
the remainder. The top word may need to be adjusted by 1 or 2 units.

The recursive version, based on Burnikel, Ziegler, "Fast Recursive Division",
uses the same principles, but in a multi-word setting, so that
multiplication benefits from the Karatsuba algorithm (and possibly later
improvements).

benchmark                             old ns/op        new ns/op      delta
BenchmarkDiv/20/10-4                  38.2             38.3           +0.26%
BenchmarkDiv/40/20-4                  38.7             38.5           -0.52%
BenchmarkDiv/100/50-4                 62.5             62.6           +0.16%
BenchmarkDiv/200/100-4                238              259            +8.82%
BenchmarkDiv/400/200-4                311              338            +8.68%
BenchmarkDiv/1000/500-4               604              649            +7.45%
BenchmarkDiv/2000/1000-4              1214             1278           +5.27%
BenchmarkDiv/20000/10000-4            38279            36510          -4.62%
BenchmarkDiv/200000/100000-4          3022057          1359615        -55.01%
BenchmarkDiv/2000000/1000000-4        310827664        54012939       -82.62%
BenchmarkDiv/20000000/10000000-4      33272829421      1965401359     -94.09%
BenchmarkString/10/Base10-4           158              156            -1.27%
BenchmarkString/100/Base10-4          797              792            -0.63%
BenchmarkString/1000/Base10-4         3677             3814           +3.73%
BenchmarkString/10000/Base10-4        16633            17116          +2.90%
BenchmarkString/100000/Base10-4       5779029          1793808        -68.96%
BenchmarkString/1000000/Base10-4      889840820        85524031       -90.39%
BenchmarkString/10000000/Base10-4     134338236860     4935657026     -96.33%

Fixes #21960
Updates #30943

Change-Id: I134c6f81a47870c688ca95b6081eb9211def15a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/172018
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-11-12 05:18:25 +00:00
Ian Lance Taylor
f5e89c2214 Revert "math/cmplx: handle special cases"
This reverts CL 169501.

Reason for revert: The new tests fail at least on s390x and MIPS. This is likely a minor bug in the compiler or runtime. But this point in the release cycle is not the time to debug these details, which are unlikely to be new. Let's try again for 1.15.

Updates #29320
Fixes #35443

Change-Id: I2218b2083f8974b57d528e3742524393fc72b355
Reviewed-on: https://go-review.googlesource.com/c/go/+/206037
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-08 03:07:30 +00:00
Brian Kessler
68dce4296e math/cmplx: handle special cases
Implement special case handling and testing to ensure
conformance with the C99 standard annex G.6 Complex arithmetic.

Fixes #29320

Change-Id: Ieb0527191dd7fdea5b1aecb42b9e23aae3f74260
Reviewed-on: https://go-review.googlesource.com/c/go/+/169501
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-11-07 19:48:30 +00:00
Russ Cox
e8f01d591f math: test portable FMA even on system with hardware FMA
This makes it a little less likely the portable FMA will be
broken without realizing it.

Change-Id: I7f7f4509b35160a9709f8b8a0e494c09ea6e410a
Reviewed-on: https://go-review.googlesource.com/c/go/+/205337
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07 14:53:38 +00:00
Russ Cox
543c6d2e0d math, cmd/compile: rename Fma to FMA
This API was added for #25819, where it was discussed as math.FMA.
The commit adding it used math.Fma, presumably for consistency
with the rest of the unusual names in package math
(Sincos, Acosh, Erfcinv, Float32bits, etc).

I believe that using an idiomatic Go name is more important here
than consistency with these other names, most of which are historical
baggage from C's standard library.

Early additions like Float32frombits happened before "uppercase for export"
(so they were originally like "float32frombits") and they were not properly
reconsidered when we uppercased the symbols to export them.
That's a mistake we live with.

The names of functions we have added since then, and even a few
that were legacy, are more properly Go-cased, such as IsNaN, IsInf,
and RoundToEven, rather than Isnan, Isinf, and Roundtoeven.
And also constants like MaxFloat32.

For new API, we should keep using proper Go-cased symbols
instead of minimally-upper-cased-C symbols.

So math.FMA, not math.Fma.

This API has not yet been released, so this change does not break
the compatibility promise.

This CL also modifies cmd/compile, since the compiler knows
the name of the function. I could have stopped at changing the
string constants, but it seemed to make more sense to use a
consistent casing everywhere.

Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205317
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-07 14:51:06 +00:00
Brian Kessler
f5949b6067 math/big: allow all values for GCD
Allow the inputs a and b to be zero or negative to GCD
with the following definitions.

If x or y are not nil, GCD sets their value such that z = a*x + b*y.
Regardless of the signs of a and b, z is always >= 0.
If a == b == 0, GCD sets z = x = y = 0.
If a == 0 and b != 0, GCD sets z = |b|, x = 0, y = sign(b) * 1.
If a != 0 and b == 0, GCD sets z = |a|, x = sign(a) * 1, y = 0.

Fixes #28878

Change-Id: Ia83fce66912a96545c95cd8df0549bfd852652f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/164972
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-11-07 06:58:44 +00:00
Rémy Oudompheng
8f30d25168 math/big: use nat pool to reduce allocations in mul and sqr
This notably allows to reuse temporaries across
the karatsubaSqr recursion.

benchmark                    old ns/op     new ns/op     delta
BenchmarkNatMul/10-4         227           228           +0.44%
BenchmarkNatMul/100-4        8339          8589          +3.00%
BenchmarkNatMul/1000-4       313796        312272        -0.49%
BenchmarkNatMul/10000-4      11924720      11873589      -0.43%
BenchmarkNatMul/100000-4     503813354     503839058     +0.01%
BenchmarkNatSqr/20-4         549           513           -6.56%
BenchmarkNatSqr/30-4         945           874           -7.51%
BenchmarkNatSqr/50-4         1993          1832          -8.08%
BenchmarkNatSqr/80-4         4096          3874          -5.42%
BenchmarkNatSqr/100-4        6192          5712          -7.75%
BenchmarkNatSqr/200-4        20388         19543         -4.14%
BenchmarkNatSqr/300-4        38735         36715         -5.21%
BenchmarkNatSqr/500-4        99562         93542         -6.05%
BenchmarkNatSqr/800-4        195554        184907        -5.44%
BenchmarkNatSqr/1000-4       286302        275053        -3.93%
BenchmarkNatSqr/10000-4      9817057       9441641       -3.82%
BenchmarkNatSqr/100000-4     390713416     379696789     -2.82%

benchmark                    old allocs     new allocs     delta
BenchmarkNatMul/10-4         1              1              +0.00%
BenchmarkNatMul/100-4        1              1              +0.00%
BenchmarkNatMul/1000-4       2              1              -50.00%
BenchmarkNatMul/10000-4      2              1              -50.00%
BenchmarkNatMul/100000-4     9              11             +22.22%
BenchmarkNatSqr/20-4         2              1              -50.00%
BenchmarkNatSqr/30-4         2              1              -50.00%
BenchmarkNatSqr/50-4         2              1              -50.00%
BenchmarkNatSqr/80-4         2              1              -50.00%
BenchmarkNatSqr/100-4        2              1              -50.00%
BenchmarkNatSqr/200-4        2              1              -50.00%
BenchmarkNatSqr/300-4        4              1              -75.00%
BenchmarkNatSqr/500-4        4              1              -75.00%
BenchmarkNatSqr/800-4        10             1              -90.00%
BenchmarkNatSqr/1000-4       10             1              -90.00%
BenchmarkNatSqr/10000-4      731            1              -99.86%
BenchmarkNatSqr/100000-4     19687          6              -99.97%

benchmark                    old bytes     new bytes     delta
BenchmarkNatMul/10-4         192           192           +0.00%
BenchmarkNatMul/100-4        4864          4864          +0.00%
BenchmarkNatMul/1000-4       57344         49224         -14.16%
BenchmarkNatMul/10000-4      565248        498772        -11.76%
BenchmarkNatMul/100000-4     5749504       7263720       +26.34%
BenchmarkNatSqr/20-4         672           352           -47.62%
BenchmarkNatSqr/30-4         992           512           -48.39%
BenchmarkNatSqr/50-4         1792          896           -50.00%
BenchmarkNatSqr/80-4         2688          1408          -47.62%
BenchmarkNatSqr/100-4        3584          1792          -50.00%
BenchmarkNatSqr/200-4        6656          3456          -48.08%
BenchmarkNatSqr/300-4        24448         16387         -32.97%
BenchmarkNatSqr/500-4        36864         24591         -33.29%
BenchmarkNatSqr/800-4        69760         40981         -41.25%
BenchmarkNatSqr/1000-4       86016         49180         -42.82%
BenchmarkNatSqr/10000-4      2524800       487368        -80.70%
BenchmarkNatSqr/100000-4     68599808      5876581       -91.43%

Change-Id: I8e6e409ae1cb48be9d5aa9b5f428d6cbe487673a
Reviewed-on: https://go-review.googlesource.com/c/go/+/172017
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-10-25 03:14:39 +00:00
Robert Griesemer
a52c0a1992 math/big: make Rat.Denom side-effect free
A Rat is represented via a quotient a/b where a and b are Int values.
To make it possible to use an uninitialized Rat value (with a and b
uninitialized and thus == 0), the implementation treats a 0 denominator
as 1.

Rat.Num and Rat.Denom return pointers to these values a and b. Because
b may be 0, Rat.Denom used to first initialize it to 1 and thus produce
an undesirable side-effect (by changing the Rat's denominator).

This CL changes Denom to return a new (not shared) *Int with value 1
in the rare case where the Rat was not initialized. This eliminates
the side effect and returns the correct denominator value.

While this is changing behavior of the API, the impact should now be
minor because together with (prior) CL https://golang.org/cl/202997,
which initializes Rats ASAP, Denom is unlikely used to access the
denominator of an uninitialized (and thus 0) Rat. Any operation that
will somehow set a Rat value will ensure that the denominator is not 0.

Fixes #33792.
Updates #3521.

Change-Id: I0bf15ac60513cf52162bfb62440817ba36f0c3fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/203059
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-24 03:34:24 +00:00
Robert Griesemer
4412181e7c math/big: normalize unitialized denominators ASAP
A Rat is represented via a quotient a/b where a and b are Int values.
To make it possible to use an uninitialized Rat value (with a and b
uninitialized and thus == 0), the implementation treats a 0 denominator
as 1.

For each operation we check if the denominator is 0, and then treat
it as 1 (if necessary). Operations that create a new Rat result,
normalize that value such that a result denominator 1 is represened
as 0 again.

This CL changes this behavior slightly: 0 denominators are still
interpreted as 1, but whenever we (safely) can, we set an uninitialized
0 denominator to 1. This simplifies the code overall.

Also: Improved some doc strings.

Preparation for addressing issue #33792.

Updates #33792.

Change-Id: I3040587c8d0dad2e840022f96ca027d8470878a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/202997
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-24 03:34:11 +00:00
Akhil Indurti
93a601dd2a math: add guaranteed-precision FMA implementation
Currently, the precision of the float64 multiply-add operation
(x * y) + z varies across architectures. While generated code for
ppc64, s390x, and arm64 can guarantee that there is no intermediate
rounding on those platforms, other architectures like x86, mips, and
arm will exhibit different behavior depending on available instruction
set. Consequently, applications cannot rely on results being identical
across GOARCH-dependent codepaths.

This CL introduces a software implementation that performs an IEEE 754
double-precision fused-multiply-add operation. The only supported
rounding mode is round-to-nearest ties-to-even. Separate CLs include
hardware implementations when available. Otherwise, this software
fallback is given as the default implementation.

Specifically,
    - arm64, ppc64, s390x: Uses the FMA instruction provided by all
      of these ISAs.
    - mips[64][le]: Falls back to this software implementation. Only
      release 6 of the ISA includes a strict FMA instruction with
      MADDF.D (not implementation defined). Because the number of R6
      processors in the wild is scarce, the assembly implementation
      is left as a future optimization.
    - x86: Guards the use of VFMADD213SD by checking cpu.X86.HasFMA.
    - arm: Guards the use of VFMA by checking cpu.ARM.HasVFPv4.
    - software fallback: Uses mostly integer arithmetic except
      for input that involves Inf, NaN, or zero.

Updates #25819.

Change-Id: Iadadff2219638bacc9fec78d3ab885393fea4a08
Reviewed-on: https://go-review.googlesource.com/c/go/+/127458
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:15:30 +00:00