mirror of https://github.com/golang/go synced 2024-11-05 18:46:11 -07:00
35824 Commits

Author SHA1 Message Date
erifan01
0585d41c87 math/big: optimize addVW and subVW on arm64
The biggest hot spot of the existing implementation is the "load" operations, which lead to poor performance.
By unrolling the loop 4 times and 2 times and using the "LDP" and "STP" instructions,
this CL reduces the "load" cost and improves performance.
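
For readers unfamiliar with the routine, the following is a portable Go sketch of what addVW computes (a multi-word value plus a single word, with carry propagation) and of the idea behind a 4-way unrolled loop. It is not the arm64 assembly from this CL; the function names and the use of math/bits are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVW sets z = x + y, where x is a little-endian multi-word number and
// y is a single word, and returns the final carry. Reference version,
// one word per iteration. Assumes len(x) >= len(z).
func addVW(z, x []uint, y uint) (c uint) {
	c = y
	for i := range z {
		z[i], c = bits.Add(x[i], c, 0)
	}
	return c
}

// addVWUnrolled computes the same result but handles four words per
// iteration, loading them up front in the way paired loads would.
func addVWUnrolled(z, x []uint, y uint) (c uint) {
	c = y
	i := 0
	for ; i+4 <= len(z); i += 4 {
		x0, x1, x2, x3 := x[i], x[i+1], x[i+2], x[i+3]
		z[i], c = bits.Add(x0, c, 0)
		z[i+1], c = bits.Add(x1, c, 0)
		z[i+2], c = bits.Add(x2, c, 0)
		z[i+3], c = bits.Add(x3, c, 0)
	}
	// Tail: fewer than four words remain.
	for ; i < len(z); i++ {
		z[i], c = bits.Add(x[i], c, 0)
	}
	return c
}

func main() {
	x := []uint{^uint(0), ^uint(0), 5, 7, 9}
	z := make([]uint, len(x))
	c := addVWUnrolled(z, x, 1)
	fmt.Println(z, c)
}
```

In Go the unrolling mostly illustrates the structure; the gains reported below come from the hand-written assembly.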

Benchmarks:

name                              old time/op    new time/op     delta
AddVV/1-8                           21.5ns ± 0%     21.5ns ± 0%      ~     (all equal)
AddVV/2-8                           13.5ns ± 0%     13.5ns ± 0%      ~     (all equal)
AddVV/3-8                           15.5ns ± 0%     15.5ns ± 0%      ~     (all equal)
AddVV/4-8                           17.5ns ± 0%     17.5ns ± 0%      ~     (all equal)
AddVV/5-8                           19.5ns ± 0%     19.5ns ± 0%      ~     (all equal)
AddVV/10-8                          29.5ns ± 0%     29.5ns ± 0%      ~     (all equal)
AddVV/100-8                          217ns ± 0%      217ns ± 0%      ~     (all equal)
AddVV/1000-8                        2.02µs ± 0%     2.02µs ± 0%      ~     (all equal)
AddVV/10000-8                       20.3µs ± 0%     20.3µs ± 0%      ~     (p=0.603 n=5+5)
AddVV/100000-8                       223µs ± 7%      228µs ± 8%      ~     (p=0.548 n=5+5)
AddVW/1-8                           9.32ns ± 0%     9.26ns ± 0%    -0.64%  (p=0.008 n=5+5)
AddVW/2-8                           19.8ns ± 3%     10.5ns ± 0%   -46.92%  (p=0.008 n=5+5)
AddVW/3-8                           11.5ns ± 0%     11.0ns ± 0%    -4.35%  (p=0.008 n=5+5)
AddVW/4-8                           13.0ns ± 0%     12.0ns ± 0%    -7.69%  (p=0.008 n=5+5)
AddVW/5-8                           14.5ns ± 0%     12.5ns ± 0%   -13.79%  (p=0.008 n=5+5)
AddVW/10-8                          22.0ns ± 0%     15.5ns ± 0%   -29.55%  (p=0.008 n=5+5)
AddVW/100-8                          167ns ± 0%       81ns ± 0%   -51.44%  (p=0.008 n=5+5)
AddVW/1000-8                        1.52µs ± 0%     0.64µs ± 0%   -57.58%  (p=0.008 n=5+5)
AddVW/10000-8                       15.1µs ± 0%      7.2µs ± 0%   -52.55%  (p=0.008 n=5+5)
AddVW/100000-8                       150µs ± 0%       71µs ± 0%   -52.95%  (p=0.008 n=5+5)
SubVW/1-8                           9.32ns ± 0%     9.26ns ± 0%    -0.64%  (p=0.008 n=5+5)
SubVW/2-8                           19.7ns ± 2%     10.5ns ± 0%   -46.70%  (p=0.008 n=5+5)
SubVW/3-8                           11.5ns ± 0%     11.0ns ± 0%    -4.35%  (p=0.008 n=5+5)
SubVW/4-8                           13.0ns ± 0%     12.0ns ± 0%    -7.69%  (p=0.008 n=5+5)
SubVW/5-8                           14.5ns ± 0%     12.5ns ± 0%   -13.79%  (p=0.008 n=5+5)
SubVW/10-8                          22.0ns ± 0%     15.5ns ± 0%   -29.55%  (p=0.008 n=5+5)
SubVW/100-8                          167ns ± 0%       81ns ± 0%   -51.44%  (p=0.008 n=5+5)
SubVW/1000-8                        1.52µs ± 0%     0.64µs ± 0%   -57.58%  (p=0.008 n=5+5)
SubVW/10000-8                       15.1µs ± 0%      7.2µs ± 0%   -52.49%  (p=0.008 n=5+5)
SubVW/100000-8                       150µs ± 0%       71µs ± 0%   -52.91%  (p=0.008 n=5+5)
AddMulVVW/1-8                       32.4ns ± 1%     32.6ns ± 1%      ~     (p=0.119 n=5+5)
AddMulVVW/2-8                       57.0ns ± 0%     57.0ns ± 0%      ~     (p=0.643 n=5+5)
AddMulVVW/3-8                       90.8ns ± 0%     90.7ns ± 0%      ~     (p=0.524 n=5+5)
AddMulVVW/4-8                        118ns ± 0%      118ns ± 1%      ~     (p=1.000 n=4+5)
AddMulVVW/5-8                        144ns ± 1%      144ns ± 0%      ~     (p=0.794 n=5+4)
AddMulVVW/10-8                       294ns ± 1%      296ns ± 0%    +0.48%  (p=0.040 n=5+5)
AddMulVVW/100-8                     2.73µs ± 0%     2.73µs ± 0%      ~     (p=0.278 n=5+5)
AddMulVVW/1000-8                    26.0µs ± 0%     26.5µs ± 0%    +2.14%  (p=0.008 n=5+5)
AddMulVVW/10000-8                    297µs ± 0%      297µs ± 0%    +0.24%  (p=0.008 n=5+5)
AddMulVVW/100000-8                  3.15ms ± 1%     3.13ms ± 0%      ~     (p=0.690 n=5+5)
DecimalConversion-8                  311µs ± 2%      309µs ± 2%      ~     (p=0.310 n=5+5)
FloatString/100-8                   2.55µs ± 2%     2.54µs ± 2%      ~     (p=1.000 n=5+5)
FloatString/1000-8                  58.1µs ± 0%     58.1µs ± 0%      ~     (p=0.151 n=5+5)
FloatString/10000-8                 4.59ms ± 0%     4.59ms ± 0%      ~     (p=0.151 n=5+5)
FloatString/100000-8                 446ms ± 0%      446ms ± 0%    +0.01%  (p=0.016 n=5+5)
FloatAdd/10-8                        183ns ± 0%      183ns ± 0%      ~     (p=0.333 n=4+5)
FloatAdd/100-8                       187ns ± 1%      192ns ± 2%      ~     (p=0.056 n=5+5)
FloatAdd/1000-8                      369ns ± 0%      371ns ± 0%    +0.54%  (p=0.016 n=4+5)
FloatAdd/10000-8                    1.88µs ± 0%     1.88µs ± 0%    -0.14%  (p=0.000 n=4+5)
FloatAdd/100000-8                   17.2µs ± 0%     17.1µs ± 0%    -0.37%  (p=0.008 n=5+5)
FloatSub/10-8                        147ns ± 0%      147ns ± 0%      ~     (all equal)
FloatSub/100-8                       145ns ± 0%      146ns ± 0%      ~     (p=0.238 n=5+4)
FloatSub/1000-8                      241ns ± 0%      241ns ± 0%      ~     (p=0.333 n=5+4)
FloatSub/10000-8                    1.06µs ± 0%     1.06µs ± 0%      ~     (p=0.444 n=5+5)
FloatSub/100000-8                   9.50µs ± 0%     9.48µs ± 0%    -0.14%  (p=0.008 n=5+5)
ParseFloatSmallExp-8                28.4µs ± 2%     28.5µs ± 1%      ~     (p=0.690 n=5+5)
ParseFloatLargeExp-8                 125µs ± 1%      124µs ± 1%      ~     (p=0.095 n=5+5)
GCD10x10/WithoutXY-8                 277ns ± 2%      278ns ± 3%      ~     (p=0.937 n=5+5)
GCD10x10/WithXY-8                   2.08µs ± 3%     2.15µs ± 3%      ~     (p=0.056 n=5+5)
GCD10x100/WithoutXY-8                592ns ± 3%      613ns ± 4%      ~     (p=0.056 n=5+5)
GCD10x100/WithXY-8                  3.40µs ± 2%     3.42µs ± 4%      ~     (p=0.841 n=5+5)
GCD10x1000/WithoutXY-8              1.37µs ± 2%     1.35µs ± 3%      ~     (p=0.460 n=5+5)
GCD10x1000/WithXY-8                 7.34µs ± 2%     7.33µs ± 4%      ~     (p=0.841 n=5+5)
GCD10x10000/WithoutXY-8             8.52µs ± 0%     8.51µs ± 1%      ~     (p=0.421 n=5+5)
GCD10x10000/WithXY-8                27.5µs ± 2%     27.2µs ± 1%      ~     (p=0.151 n=5+5)
GCD10x100000/WithoutXY-8            78.3µs ± 1%     78.5µs ± 1%      ~     (p=0.690 n=5+5)
GCD10x100000/WithXY-8                231µs ± 0%      229µs ± 1%    -1.11%  (p=0.016 n=5+5)
GCD100x100/WithoutXY-8              1.86µs ± 2%     1.86µs ± 2%      ~     (p=0.881 n=5+5)
GCD100x100/WithXY-8                 27.1µs ± 2%     27.2µs ± 1%      ~     (p=0.421 n=5+5)
GCD100x1000/WithoutXY-8             4.44µs ± 2%     4.41µs ± 1%      ~     (p=0.310 n=5+5)
GCD100x1000/WithXY-8                36.3µs ± 1%     36.2µs ± 1%      ~     (p=0.310 n=5+5)
GCD100x10000/WithoutXY-8            22.6µs ± 2%     22.5µs ± 1%      ~     (p=0.690 n=5+5)
GCD100x10000/WithXY-8                145µs ± 1%      145µs ± 1%      ~     (p=1.000 n=5+5)
GCD100x100000/WithoutXY-8            195µs ± 0%      196µs ± 1%      ~     (p=0.548 n=5+5)
GCD100x100000/WithXY-8              1.10ms ± 0%     1.10ms ± 0%    -0.30%  (p=0.016 n=5+5)
GCD1000x1000/WithoutXY-8            25.0µs ± 1%     25.2µs ± 2%      ~     (p=0.222 n=5+5)
GCD1000x1000/WithXY-8                520µs ± 0%      520µs ± 1%      ~     (p=0.151 n=5+5)
GCD1000x10000/WithoutXY-8           57.0µs ± 1%     56.9µs ± 1%      ~     (p=0.690 n=5+5)
GCD1000x10000/WithXY-8              1.21ms ± 0%     1.21ms ± 1%      ~     (p=0.881 n=5+5)
GCD1000x100000/WithoutXY-8           358µs ± 0%      359µs ± 1%      ~     (p=0.548 n=5+5)
GCD1000x100000/WithXY-8             8.73ms ± 0%     8.73ms ± 0%      ~     (p=0.548 n=5+5)
GCD10000x10000/WithoutXY-8           686µs ± 0%      687µs ± 0%      ~     (p=0.548 n=5+5)
GCD10000x10000/WithXY-8             15.9ms ± 0%     15.9ms ± 0%      ~     (p=0.841 n=5+5)
GCD10000x100000/WithoutXY-8         2.08ms ± 0%     2.08ms ± 0%      ~     (p=1.000 n=5+5)
GCD10000x100000/WithXY-8            86.7ms ± 0%     86.7ms ± 0%      ~     (p=1.000 n=5+5)
GCD100000x100000/WithoutXY-8        51.1ms ± 0%     51.0ms ± 0%      ~     (p=0.151 n=5+5)
GCD100000x100000/WithXY-8            1.23s ± 0%      1.23s ± 0%      ~     (p=0.841 n=5+5)
Hilbert-8                           2.41ms ± 1%     2.42ms ± 2%      ~     (p=0.690 n=5+5)
Binomial-8                          4.86µs ± 1%     4.86µs ± 1%      ~     (p=0.889 n=5+5)
QuoRem-8                            7.09µs ± 0%     7.08µs ± 0%    -0.09%  (p=0.024 n=5+5)
Exp-8                                161ms ± 0%      161ms ± 0%    -0.08%  (p=0.032 n=5+5)
Exp2-8                               161ms ± 0%      161ms ± 0%      ~     (p=1.000 n=5+5)
Bitset-8                            40.7ns ± 0%     40.6ns ± 0%      ~     (p=0.095 n=4+5)
BitsetNeg-8                          159ns ± 4%      148ns ± 0%    -6.92%  (p=0.016 n=5+4)
BitsetOrig-8                         378ns ± 1%      378ns ± 1%      ~     (p=0.937 n=5+5)
BitsetNegOrig-8                      647ns ± 5%      647ns ± 4%      ~     (p=1.000 n=5+5)
ModSqrt225_Tonelli-8                7.26ms ± 0%     7.27ms ± 0%      ~     (p=1.000 n=5+5)
ModSqrt224_3Mod4-8                  2.24ms ± 0%     2.24ms ± 0%      ~     (p=0.690 n=5+5)
ModSqrt5430_Tonelli-8                62.8s ± 1%      62.5s ± 0%      ~     (p=0.063 n=5+4)
ModSqrt5430_3Mod4-8                  20.8s ± 0%      20.8s ± 0%      ~     (p=0.310 n=5+5)
Sqrt-8                               101µs ± 1%      101µs ± 0%    -0.35%  (p=0.032 n=5+5)
IntSqr/1-8                          32.3ns ± 1%     32.5ns ± 1%      ~     (p=0.421 n=5+5)
IntSqr/2-8                           157ns ± 5%      156ns ± 5%      ~     (p=0.651 n=5+5)
IntSqr/3-8                           292ns ± 2%      291ns ± 3%      ~     (p=0.881 n=5+5)
IntSqr/5-8                           738ns ± 6%      740ns ± 5%      ~     (p=0.841 n=5+5)
IntSqr/8-8                          1.82µs ± 4%     1.83µs ± 4%      ~     (p=0.730 n=5+5)
IntSqr/10-8                         2.92µs ± 1%     2.93µs ± 1%      ~     (p=0.643 n=5+5)
IntSqr/20-8                         6.28µs ± 2%     6.28µs ± 2%      ~     (p=1.000 n=5+5)
IntSqr/30-8                         13.8µs ± 2%     13.9µs ± 3%      ~     (p=1.000 n=5+5)
IntSqr/50-8                         37.8µs ± 4%     37.9µs ± 4%      ~     (p=0.690 n=5+5)
IntSqr/80-8                         95.9µs ± 1%     95.8µs ± 1%      ~     (p=0.841 n=5+5)
IntSqr/100-8                         148µs ± 1%      148µs ± 1%      ~     (p=0.310 n=5+5)
IntSqr/200-8                         586µs ± 1%      586µs ± 1%      ~     (p=0.841 n=5+5)
IntSqr/300-8                        1.32ms ± 0%     1.31ms ± 0%      ~     (p=0.222 n=5+5)
IntSqr/500-8                        2.48ms ± 0%     2.48ms ± 0%      ~     (p=0.556 n=5+4)
IntSqr/800-8                        4.68ms ± 0%     4.68ms ± 0%      ~     (p=0.548 n=5+5)
IntSqr/1000-8                       7.57ms ± 0%     7.56ms ± 0%      ~     (p=0.421 n=5+5)
Mul-8                                311ms ± 0%      311ms ± 0%      ~     (p=0.548 n=5+5)
Exp3Power/0x10-8                     559ns ± 1%      560ns ± 1%      ~     (p=0.984 n=5+5)
Exp3Power/0x40-8                     641ns ± 1%      634ns ± 1%      ~     (p=0.063 n=5+5)
Exp3Power/0x100-8                   1.39µs ± 2%     1.40µs ± 2%      ~     (p=0.381 n=5+5)
Exp3Power/0x400-8                   8.27µs ± 1%     8.26µs ± 0%      ~     (p=0.571 n=5+5)
Exp3Power/0x1000-8                  59.9µs ± 0%     59.7µs ± 0%    -0.23%  (p=0.008 n=5+5)
Exp3Power/0x4000-8                   816µs ± 0%      816µs ± 0%      ~     (p=1.000 n=5+5)
Exp3Power/0x10000-8                 7.77ms ± 0%     7.77ms ± 0%      ~     (p=0.841 n=5+5)
Exp3Power/0x40000-8                 73.4ms ± 0%     73.4ms ± 0%      ~     (p=0.690 n=5+5)
Exp3Power/0x100000-8                 665ms ± 0%      664ms ± 0%    -0.14%  (p=0.008 n=5+5)
Exp3Power/0x400000-8                 5.98s ± 0%      5.98s ± 0%    -0.09%  (p=0.008 n=5+5)
Fibo-8                               116ms ± 0%      116ms ± 0%    -0.25%  (p=0.008 n=5+5)
NatSqr/1-8                           115ns ± 3%      116ns ± 2%      ~     (p=0.238 n=5+5)
NatSqr/2-8                           237ns ± 1%      237ns ± 1%      ~     (p=0.683 n=5+5)
NatSqr/3-8                           367ns ± 3%      368ns ± 3%      ~     (p=0.817 n=5+5)
NatSqr/5-8                           807ns ± 3%      812ns ± 3%      ~     (p=0.913 n=5+5)
NatSqr/8-8                          1.93µs ± 2%     1.93µs ± 3%      ~     (p=0.651 n=5+5)
NatSqr/10-8                         2.98µs ± 2%     2.99µs ± 2%      ~     (p=0.690 n=5+5)
NatSqr/20-8                         6.49µs ± 2%     6.46µs ± 2%      ~     (p=0.548 n=5+5)
NatSqr/30-8                         14.4µs ± 2%     14.3µs ± 2%      ~     (p=0.690 n=5+5)
NatSqr/50-8                         38.6µs ± 2%     38.7µs ± 2%      ~     (p=0.841 n=5+5)
NatSqr/80-8                         96.1µs ± 2%     95.8µs ± 2%      ~     (p=0.548 n=5+5)
NatSqr/100-8                         149µs ± 1%      149µs ± 1%      ~     (p=0.841 n=5+5)
NatSqr/200-8                         593µs ± 1%      590µs ± 1%      ~     (p=0.421 n=5+5)
NatSqr/300-8                        1.32ms ± 0%     1.32ms ± 1%      ~     (p=0.222 n=5+5)
NatSqr/500-8                        2.49ms ± 0%     2.49ms ± 0%      ~     (p=0.690 n=5+5)
NatSqr/800-8                        4.69ms ± 0%     4.69ms ± 0%      ~     (p=1.000 n=5+5)
NatSqr/1000-8                       7.59ms ± 0%     7.58ms ± 0%      ~     (p=0.841 n=5+5)
ScanPi-8                             322µs ± 0%      321µs ± 0%      ~     (p=0.095 n=5+5)
StringPiParallel-8                  71.4µs ± 5%     68.8µs ± 4%      ~     (p=0.151 n=5+5)
Scan/10/Base2-8                     1.10µs ± 0%     1.09µs ± 0%    -0.36%  (p=0.032 n=5+5)
Scan/100/Base2-8                    7.78µs ± 0%     7.79µs ± 0%    +0.14%  (p=0.008 n=5+5)
Scan/1000/Base2-8                   78.8µs ± 0%     79.0µs ± 0%    +0.24%  (p=0.008 n=5+5)
Scan/10000/Base2-8                  1.22ms ± 0%     1.22ms ± 0%      ~     (p=0.056 n=5+5)
Scan/100000/Base2-8                 55.1ms ± 0%     55.0ms ± 0%    -0.15%  (p=0.008 n=5+5)
Scan/10/Base8-8                      514ns ± 0%      515ns ± 0%      ~     (p=0.079 n=5+5)
Scan/100/Base8-8                    2.89µs ± 0%     2.89µs ± 0%    +0.15%  (p=0.008 n=5+5)
Scan/1000/Base8-8                   31.0µs ± 0%     31.1µs ± 0%    +0.12%  (p=0.008 n=5+5)
Scan/10000/Base8-8                   740µs ± 0%      740µs ± 0%      ~     (p=0.222 n=5+5)
Scan/100000/Base8-8                 50.6ms ± 0%     50.5ms ± 0%    -0.06%  (p=0.016 n=4+5)
Scan/10/Base10-8                     492ns ± 1%      490ns ± 1%      ~     (p=0.310 n=5+5)
Scan/100/Base10-8                   2.67µs ± 0%     2.67µs ± 0%      ~     (p=0.056 n=5+5)
Scan/1000/Base10-8                  28.7µs ± 0%     28.7µs ± 0%      ~     (p=1.000 n=5+5)
Scan/10000/Base10-8                  717µs ± 0%      716µs ± 0%      ~     (p=0.222 n=5+5)
Scan/100000/Base10-8                50.2ms ± 0%     50.3ms ± 0%    +0.05%  (p=0.008 n=5+5)
Scan/10/Base16-8                     442ns ± 1%      442ns ± 0%      ~     (p=0.468 n=5+5)
Scan/100/Base16-8                   2.46µs ± 0%     2.45µs ± 0%      ~     (p=0.159 n=5+5)
Scan/1000/Base16-8                  27.2µs ± 0%     27.2µs ± 0%      ~     (p=0.841 n=5+5)
Scan/10000/Base16-8                  721µs ± 0%      722µs ± 0%      ~     (p=0.548 n=5+5)
Scan/100000/Base16-8                52.6ms ± 0%     52.6ms ± 0%    +0.07%  (p=0.008 n=5+5)
String/10/Base2-8                    244ns ± 1%      242ns ± 1%      ~     (p=0.103 n=5+5)
String/100/Base2-8                  1.48µs ± 0%     1.48µs ± 1%      ~     (p=0.786 n=5+5)
String/1000/Base2-8                 13.3µs ± 1%     13.3µs ± 0%      ~     (p=0.222 n=5+5)
String/10000/Base2-8                 132µs ± 1%      132µs ± 1%      ~     (p=1.000 n=5+5)
String/100000/Base2-8               1.30ms ± 1%     1.30ms ± 1%      ~     (p=1.000 n=5+5)
String/10/Base8-8                    167ns ± 1%      168ns ± 1%      ~     (p=0.135 n=5+5)
String/100/Base8-8                   623ns ± 1%      626ns ± 1%      ~     (p=0.151 n=5+5)
String/1000/Base8-8                 5.24µs ± 1%     5.24µs ± 0%      ~     (p=1.000 n=5+5)
String/10000/Base8-8                50.0µs ± 1%     50.0µs ± 1%      ~     (p=1.000 n=5+5)
String/100000/Base8-8                492µs ± 1%      489µs ± 1%      ~     (p=0.056 n=5+5)
String/10/Base10-8                   503ns ± 1%      501ns ± 0%      ~     (p=0.183 n=5+5)
String/100/Base10-8                 1.96µs ± 0%     1.97µs ± 0%      ~     (p=0.389 n=5+5)
String/1000/Base10-8                12.4µs ± 1%     12.4µs ± 1%      ~     (p=0.841 n=5+5)
String/10000/Base10-8               56.7µs ± 1%     56.6µs ± 0%      ~     (p=1.000 n=5+5)
String/100000/Base10-8              25.6ms ± 0%     25.6ms ± 0%      ~     (p=0.222 n=5+5)
String/10/Base16-8                   147ns ± 0%      148ns ± 2%      ~     (p=1.000 n=4+5)
String/100/Base16-8                  505ns ± 0%      505ns ± 1%      ~     (p=0.778 n=5+5)
String/1000/Base16-8                3.94µs ± 0%     3.94µs ± 0%      ~     (p=0.841 n=5+5)
String/10000/Base16-8               37.4µs ± 1%     37.2µs ± 1%      ~     (p=0.095 n=5+5)
String/100000/Base16-8               367µs ± 1%      367µs ± 0%      ~     (p=1.000 n=5+5)
LeafSize/0-8                        6.64ms ± 0%     6.65ms ± 0%      ~     (p=0.690 n=5+5)
LeafSize/1-8                        72.5µs ± 1%     72.4µs ± 1%      ~     (p=0.841 n=5+5)
LeafSize/2-8                        72.6µs ± 1%     72.6µs ± 1%      ~     (p=1.000 n=5+5)
LeafSize/3-8                         377µs ± 0%      377µs ± 0%      ~     (p=0.421 n=5+5)
LeafSize/4-8                        71.2µs ± 1%     71.3µs ± 0%      ~     (p=0.278 n=5+5)
LeafSize/5-8                         469µs ± 0%      469µs ± 0%      ~     (p=0.310 n=5+5)
LeafSize/6-8                         376µs ± 0%      376µs ± 0%      ~     (p=0.841 n=5+5)
LeafSize/7-8                         244µs ± 0%      244µs ± 0%      ~     (p=0.841 n=5+5)
LeafSize/8-8                        71.9µs ± 1%     72.1µs ± 1%      ~     (p=0.548 n=5+5)
LeafSize/9-8                         536µs ± 0%      536µs ± 0%      ~     (p=0.151 n=5+5)
LeafSize/10-8                        470µs ± 0%      471µs ± 0%    +0.10%  (p=0.032 n=5+5)
LeafSize/11-8                        458µs ± 0%      458µs ± 0%      ~     (p=0.881 n=5+5)
LeafSize/12-8                        376µs ± 0%      376µs ± 0%      ~     (p=0.548 n=5+5)
LeafSize/13-8                        341µs ± 0%      342µs ± 0%      ~     (p=0.222 n=5+5)
LeafSize/14-8                        246µs ± 0%      245µs ± 0%      ~     (p=0.167 n=5+5)
LeafSize/15-8                        168µs ± 0%      168µs ± 0%      ~     (p=0.548 n=5+5)
LeafSize/16-8                       72.1µs ± 1%     72.2µs ± 1%      ~     (p=0.690 n=5+5)
LeafSize/32-8                       81.5µs ± 1%     81.4µs ± 1%      ~     (p=1.000 n=5+5)
LeafSize/64-8                        133µs ± 1%      134µs ± 1%      ~     (p=0.690 n=5+5)
ProbablyPrime/n=0-8                 44.3ms ± 0%     44.2ms ± 0%    -0.28%  (p=0.008 n=5+5)
ProbablyPrime/n=1-8                 64.8ms ± 0%     64.7ms ± 0%    -0.15%  (p=0.008 n=5+5)
ProbablyPrime/n=5-8                  147ms ± 0%      147ms ± 0%    -0.11%  (p=0.008 n=5+5)
ProbablyPrime/n=10-8                 250ms ± 0%      250ms ± 0%      ~     (p=0.056 n=5+5)
ProbablyPrime/n=20-8                 456ms ± 0%      455ms ± 0%    -0.05%  (p=0.008 n=5+5)
ProbablyPrime/Lucas-8               23.6ms ± 0%     23.5ms ± 0%    -0.29%  (p=0.008 n=5+5)
ProbablyPrime/MillerRabinBase2-8    20.6ms ± 0%     20.6ms ± 0%      ~     (p=0.690 n=5+5)
FloatSqrt/64-8                      2.01µs ± 1%     2.02µs ± 1%      ~     (p=0.421 n=5+5)
FloatSqrt/128-8                     4.43µs ± 2%     4.38µs ± 2%      ~     (p=0.222 n=5+5)
FloatSqrt/256-8                     6.64µs ± 1%     6.68µs ± 2%      ~     (p=0.516 n=5+5)
FloatSqrt/1000-8                    31.9µs ± 0%     31.8µs ± 0%      ~     (p=0.095 n=5+5)
FloatSqrt/10000-8                    595µs ± 0%      594µs ± 0%      ~     (p=0.056 n=5+5)
FloatSqrt/100000-8                  17.9ms ± 0%     17.9ms ± 0%      ~     (p=0.151 n=5+5)
FloatSqrt/1000000-8                  1.52s ± 0%      1.52s ± 0%      ~     (p=0.841 n=5+5)

name                              old speed      new speed       delta
AddVV/1-8                         2.97GB/s ± 0%   2.97GB/s ± 0%      ~     (p=0.971 n=4+4)
AddVV/2-8                         9.47GB/s ± 0%   9.47GB/s ± 0%    +0.01%  (p=0.016 n=5+5)
AddVV/3-8                         12.4GB/s ± 0%   12.4GB/s ± 0%      ~     (p=0.548 n=5+5)
AddVV/4-8                         14.6GB/s ± 0%   14.6GB/s ± 0%      ~     (p=1.000 n=5+5)
AddVV/5-8                         16.4GB/s ± 0%   16.4GB/s ± 0%      ~     (p=1.000 n=5+5)
AddVV/10-8                        21.7GB/s ± 0%   21.7GB/s ± 0%      ~     (p=0.548 n=5+5)
AddVV/100-8                       29.4GB/s ± 0%   29.4GB/s ± 0%      ~     (p=1.000 n=5+5)
AddVV/1000-8                      31.7GB/s ± 0%   31.7GB/s ± 0%      ~     (p=0.524 n=5+4)
AddVV/10000-8                     31.5GB/s ± 0%   31.5GB/s ± 0%      ~     (p=0.690 n=5+5)
AddVV/100000-8                    28.8GB/s ± 7%   28.1GB/s ± 8%      ~     (p=0.548 n=5+5)
AddVW/1-8                          859MB/s ± 0%    864MB/s ± 0%    +0.61%  (p=0.008 n=5+5)
AddVW/2-8                          809MB/s ± 2%   1520MB/s ± 0%   +87.78%  (p=0.008 n=5+5)
AddVW/3-8                         2.08GB/s ± 0%   2.18GB/s ± 0%    +4.54%  (p=0.008 n=5+5)
AddVW/4-8                         2.46GB/s ± 0%   2.66GB/s ± 0%    +8.33%  (p=0.016 n=4+5)
AddVW/5-8                         2.76GB/s ± 0%   3.20GB/s ± 0%   +16.03%  (p=0.008 n=5+5)
AddVW/10-8                        3.63GB/s ± 0%   5.15GB/s ± 0%   +41.83%  (p=0.008 n=5+5)
AddVW/100-8                       4.79GB/s ± 0%   9.87GB/s ± 0%  +106.12%  (p=0.008 n=5+5)
AddVW/1000-8                      5.27GB/s ± 0%  12.42GB/s ± 0%  +135.74%  (p=0.008 n=5+5)
AddVW/10000-8                     5.31GB/s ± 0%  11.19GB/s ± 0%  +110.71%  (p=0.008 n=5+5)
AddVW/100000-8                    5.32GB/s ± 0%  11.32GB/s ± 0%  +112.56%  (p=0.008 n=5+5)
SubVW/1-8                          859MB/s ± 0%    864MB/s ± 0%    +0.61%  (p=0.008 n=5+5)
SubVW/2-8                          812MB/s ± 2%   1520MB/s ± 0%   +87.09%  (p=0.008 n=5+5)
SubVW/3-8                         2.08GB/s ± 0%   2.18GB/s ± 0%    +4.55%  (p=0.008 n=5+5)
SubVW/4-8                         2.46GB/s ± 0%   2.66GB/s ± 0%    +8.33%  (p=0.008 n=5+5)
SubVW/5-8                         2.75GB/s ± 0%   3.20GB/s ± 0%   +16.03%  (p=0.008 n=5+5)
SubVW/10-8                        3.63GB/s ± 0%   5.15GB/s ± 0%   +41.82%  (p=0.008 n=5+5)
SubVW/100-8                       4.79GB/s ± 0%   9.87GB/s ± 0%  +106.13%  (p=0.008 n=5+5)
SubVW/1000-8                      5.27GB/s ± 0%  12.42GB/s ± 0%  +135.74%  (p=0.008 n=5+5)
SubVW/10000-8                     5.31GB/s ± 0%  11.17GB/s ± 0%  +110.44%  (p=0.008 n=5+5)
SubVW/100000-8                    5.32GB/s ± 0%  11.31GB/s ± 0%  +112.35%  (p=0.008 n=5+5)
AddMulVVW/1-8                     1.97GB/s ± 1%   1.96GB/s ± 1%      ~     (p=0.151 n=5+5)
AddMulVVW/2-8                     2.24GB/s ± 0%   2.25GB/s ± 0%      ~     (p=0.095 n=5+5)
AddMulVVW/3-8                     2.11GB/s ± 0%   2.12GB/s ± 0%      ~     (p=0.548 n=5+5)
AddMulVVW/4-8                     2.17GB/s ± 1%   2.17GB/s ± 1%      ~     (p=0.548 n=5+5)
AddMulVVW/5-8                     2.22GB/s ± 1%   2.21GB/s ± 1%      ~     (p=0.421 n=5+5)
AddMulVVW/10-8                    2.17GB/s ± 1%   2.16GB/s ± 0%      ~     (p=0.095 n=5+5)
AddMulVVW/100-8                   2.35GB/s ± 0%   2.35GB/s ± 0%      ~     (p=0.421 n=5+5)
AddMulVVW/1000-8                  2.47GB/s ± 0%   2.41GB/s ± 0%    -2.09%  (p=0.008 n=5+5)
AddMulVVW/10000-8                 2.16GB/s ± 0%   2.15GB/s ± 0%    -0.23%  (p=0.008 n=5+5)
AddMulVVW/100000-8                2.03GB/s ± 1%   2.04GB/s ± 0%      ~     (p=0.690 n=5+5)

name                              old alloc/op   new alloc/op    delta
FloatString/100-8                     400B ± 0%       400B ± 0%      ~     (all equal)
FloatString/1000-8                  3.22kB ± 0%     3.22kB ± 0%      ~     (all equal)
FloatString/10000-8                 55.6kB ± 0%     55.5kB ± 0%      ~     (p=0.206 n=5+5)
FloatString/100000-8                 627kB ± 0%      627kB ± 0%      ~     (all equal)
FloatAdd/10-8                        0.00B           0.00B           ~     (all equal)
FloatAdd/100-8                       0.00B           0.00B           ~     (all equal)
FloatAdd/1000-8                      0.00B           0.00B           ~     (all equal)
FloatAdd/10000-8                     0.00B           0.00B           ~     (all equal)
FloatAdd/100000-8                    0.00B           0.00B           ~     (all equal)
FloatSub/10-8                        0.00B           0.00B           ~     (all equal)
FloatSub/100-8                       0.00B           0.00B           ~     (all equal)
FloatSub/1000-8                      0.00B           0.00B           ~     (all equal)
FloatSub/10000-8                     0.00B           0.00B           ~     (all equal)
FloatSub/100000-8                    0.00B           0.00B           ~     (all equal)
FloatSqrt/64-8                        416B ± 0%       416B ± 0%      ~     (all equal)
FloatSqrt/128-8                       720B ± 0%       720B ± 0%      ~     (all equal)
FloatSqrt/256-8                       816B ± 0%       816B ± 0%      ~     (all equal)
FloatSqrt/1000-8                    2.50kB ± 0%     2.50kB ± 0%      ~     (all equal)
FloatSqrt/10000-8                   23.5kB ± 0%     23.5kB ± 0%      ~     (all equal)
FloatSqrt/100000-8                   251kB ± 0%      251kB ± 0%      ~     (all equal)
FloatSqrt/1000000-8                 4.61MB ± 0%     4.61MB ± 0%      ~     (all equal)

name                              old allocs/op  new allocs/op   delta
FloatString/100-8                     8.00 ± 0%       8.00 ± 0%      ~     (all equal)
FloatString/1000-8                    10.0 ± 0%       10.0 ± 0%      ~     (all equal)
FloatString/10000-8                   42.0 ± 0%       42.0 ± 0%      ~     (all equal)
FloatString/100000-8                   346 ± 0%        346 ± 0%      ~     (all equal)
FloatAdd/10-8                         0.00            0.00           ~     (all equal)
FloatAdd/100-8                        0.00            0.00           ~     (all equal)
FloatAdd/1000-8                       0.00            0.00           ~     (all equal)
FloatAdd/10000-8                      0.00            0.00           ~     (all equal)
FloatAdd/100000-8                     0.00            0.00           ~     (all equal)
FloatSub/10-8                         0.00            0.00           ~     (all equal)
FloatSub/100-8                        0.00            0.00           ~     (all equal)
FloatSub/1000-8                       0.00            0.00           ~     (all equal)
FloatSub/10000-8                      0.00            0.00           ~     (all equal)
FloatSub/100000-8                     0.00            0.00           ~     (all equal)
FloatSqrt/64-8                        9.00 ± 0%       9.00 ± 0%      ~     (all equal)
FloatSqrt/128-8                       13.0 ± 0%       13.0 ± 0%      ~     (all equal)
FloatSqrt/256-8                       12.0 ± 0%       12.0 ± 0%      ~     (all equal)
FloatSqrt/1000-8                      19.0 ± 0%       19.0 ± 0%      ~     (all equal)
FloatSqrt/10000-8                     35.0 ± 0%       35.0 ± 0%      ~     (all equal)
FloatSqrt/100000-8                    55.0 ± 0%       55.0 ± 0%      ~     (all equal)
FloatSqrt/1000000-8                    122 ± 0%        122 ± 0%      ~     (all equal)

Change-Id: I6888d84c037d91f9e2199f3492ea3f6a0ed77b24
Reviewed-on: https://go-review.googlesource.com/77832
Reviewed-by: Vlad Krasnov <vlad@cloudflare.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-08 15:31:37 +00:00
Lynn Boger
5b14c7b324 cmd/asm, cmd/internal/obj/ppc64: avoid unnecessary load zeros
When the instructions add, and, or, xor, and movd have
constant operands, the assembler in some cases generates
more instructions than necessary.

This adds more opcode/operand combinations to the optab
and improves the code generation for the cases where the
size and sign of the constant allow the use of 1
instruction instead of 2.

Example of previous code:
	oris r3, r0, 0
	ori  r3, r3, 65533

now:
	ori r3, r0, 65533
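
The size-and-sign test can be sketched in Go as follows. This is a simplified illustration, not the assembler's optab logic: the helper name singleInstruction is hypothetical, and it only covers constants that fit a 16-bit immediate, assuming a source register that holds zero as in the example above.

```go
package main

import "fmt"

// singleInstruction reports whether the constant v can be materialized
// with one immediate instruction, and which one. Hypothetical helper:
// it checks only the two 16-bit immediate forms.
func singleInstruction(v int64) (mnemonic string, ok bool) {
	switch {
	case v >= 0 && v <= 0xFFFF:
		// Fits an unsigned 16-bit immediate.
		return "ori", true
	case v >= -0x8000 && v < 0:
		// Fits a signed 16-bit immediate.
		return "addi", true
	}
	// Anything else needs more than one instruction in this sketch.
	return "", false
}

func main() {
	for _, v := range []int64{65533, -3, 0x12345} {
		m, ok := singleInstruction(v)
		fmt.Println(v, m, ok)
	}
}
```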

This does not significantly reduce the overall binary size
because the improvement depends on the constant value.
Some procedures show a 1-2% reduction in size. This improvement
could also be significant in cases where the extra instructions
occur in a critical loop.

Testcase ppc64enc.s was added to cmd/asm/internal/asm/testdata
with the variations affected by this change.

Updates #23845

Change-Id: I7fdf2320c95815d99f2755ba77d0c6921cd7fad7
Reviewed-on: https://go-review.googlesource.com/95135
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-03-08 14:17:34 +00:00
Joe Tsai
0add9a4dcf encoding/csv: avoid mangling invalid UTF-8 in Writer
In the situation where a quoted field is necessary, avoid processing
each UTF-8 rune one-by-one, which causes mangling of invalid sequences
into utf8.RuneError, causing a loss of information.
Instead, search only for the escaped characters, handle those specially
and copy everything else in between verbatim.

This symmetrically matches the behavior of Reader.
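
A minimal sketch of the difference, assuming a simplified quoting routine that only doubles embedded quotes (the real Writer also handles line endings and other options). Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteField searches only for the character that needs escaping and
// copies everything in between verbatim, so invalid UTF-8 bytes survive.
func quoteField(field string) string {
	var b strings.Builder
	b.WriteByte('"')
	for len(field) > 0 {
		i := strings.IndexByte(field, '"')
		if i < 0 {
			b.WriteString(field)
			break
		}
		b.WriteString(field[:i])
		b.WriteString(`""`)
		field = field[i+1:]
	}
	b.WriteByte('"')
	return b.String()
}

// quoteFieldByRune processes one rune at a time; ranging over a string
// decodes invalid bytes as utf8.RuneError, which is then written out
// in place of the original byte.
func quoteFieldByRune(field string) string {
	var b strings.Builder
	b.WriteByte('"')
	for _, r := range field {
		if r == '"' {
			b.WriteString(`""`)
		} else {
			b.WriteRune(r)
		}
	}
	b.WriteByte('"')
	return b.String()
}

func main() {
	in := "a\xffb\"c" // contains an invalid UTF-8 byte
	fmt.Printf("%q\n", quoteField(in))
	fmt.Printf("%q\n", quoteFieldByRune(in))
}
```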

Fixes #24298

Change-Id: I9276f64891084ce8487678f663fad711b4095dbb
Reviewed-on: https://go-review.googlesource.com/99297
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-08 03:26:22 +00:00
Matthew Dempsky
88466e93a4 cmd/compile: mark anonymous receiver parameters as non-escaping
This was already done for normal parameters, and the same logic
applies for receiver parameters too.
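
As an illustration of the case in question, the method below has an anonymous receiver: the receiver has a type but no name, so the method body cannot reference it. The type and method names are hypothetical.

```go
package main

import "fmt"

type point struct{ x, y int }

// describe has an anonymous receiver: no name is bound to the *point,
// so nothing inside the method can store or return it.
func (*point) describe() string {
	return "point"
}

func main() {
	p := &point{x: 1, y: 2}
	fmt.Println(p.describe(), p.x+p.y)
}
```

Building with `go build -gcflags=-m` prints the compiler's escape-analysis decisions, which is one way to inspect how such a receiver is treated.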

Updates #24305.

Change-Id: Ia2a46f68d14e8fb62004ff0da1db0f065a95a1b7
Reviewed-on: https://go-review.googlesource.com/99335
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-08 00:20:01 +00:00
Ian Lance Taylor
8b8625a328 cmd/cover: don't crash on non-gofmt'ed input
Without the change to cover.go, the new test fails with

panic: overlapping edits: [4946,4950)->"", [4947,4947)->"thisNameMustBeVeryLongToCauseOverflowOfCounterIncrementStatementOntoNextLineForTest.Count[112]++;"

The original code inserts "else{", deletes "else", and then positions
a new block just after the "}" that must come before the "else".
That works on gofmt'ed code, but fails if the code looks like "}else".
When there is no space between the "}" and the "else", the new block
is inserted into a location that we are deleting, leading to the
"overlapping edits" mentioned above.

This CL fixes this case by not deleting the "else" but just using the
one that is already there. That requires adjusting the block offset to
come after the "{" that we insert.
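
The failure mode can be sketched as a check over half-open byte ranges. This is an illustrative model, not the code used by cmd/cover: the edit type and hasOverlap are hypothetical names, the first pair of offsets is taken from the panic message above, and the second pair is made up.

```go
package main

import (
	"fmt"
	"sort"
)

// edit replaces the half-open byte range [start, end) with repl.
// An insertion has start == end.
type edit struct {
	start, end int
	repl       string
}

// hasOverlap reports whether any edit begins inside the range covered
// by an earlier one.
func hasOverlap(edits []edit) bool {
	sorted := append([]edit(nil), edits...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].start != sorted[j].start {
			return sorted[i].start < sorted[j].start
		}
		return sorted[i].end < sorted[j].end
	})
	prevEnd := 0
	for _, e := range sorted {
		if e.start < prevEnd {
			return true
		}
		prevEnd = e.end
	}
	return false
}

func main() {
	// Deleting "else" at [4946,4950) while inserting at 4947.
	broken := []edit{{4946, 4950, ""}, {4947, 4947, "counter++;"}}
	// Keeping "else" and inserting after it (hypothetical offsets).
	fixed := []edit{{4946, 4946, "{"}, {4950, 4950, "counter++;"}}
	fmt.Println(hasOverlap(broken), hasOverlap(fixed))
}
```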

Fixes #23927

Change-Id: I40ef592490878765bbce6550ddb439e43ac525b2
Reviewed-on: https://go-review.googlesource.com/98935
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-03-07 23:36:25 +00:00
Ian Lance Taylor
419c06455a runtime: get traceback from VDSO code
Currently if a profiling signal arrives while executing within a VDSO
the profiler will report _ExternalCode, which is needlessly confusing
for a pure Go program. Change the VDSO calling code to record the
caller's PC/SP, so that we can do a traceback from that point. If that
fails for some reason, report _VDSO rather than _ExternalCode, which
should at least point in the right direction.

This adds some instructions to the code that calls the VDSO, but the
slowdown is reasonably negligible:

name                                  old time/op  new time/op  delta
ClockVDSOAndFallbackPaths/vDSO-8      40.5ns ± 2%  41.3ns ± 1%  +1.85%  (p=0.002 n=10+10)
ClockVDSOAndFallbackPaths/Fallback-8  41.9ns ± 1%  43.5ns ± 1%  +3.84%  (p=0.000 n=9+9)
TimeNow-8                             41.5ns ± 3%  41.5ns ± 2%    ~     (p=0.723 n=10+10)

Fixes #24142

Change-Id: Iacd935db3c4c782150b3809aaa675a71799b1c9c
Reviewed-on: https://go-review.googlesource.com/97315
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-03-07 23:35:25 +00:00
Ian Lance Taylor
c2f28de732 runtime: change from rt_sigaction to sigaction
This normalizes the Linux code to act like other targets. The size
argument to the rt_sigaction system call is pushed to a single
function, sysSigaction.

This is intended as a simplification step for CL 93875 for #14327.

Change-Id: I594788e235f0da20e16e8a028e27ac8c883907c4
Reviewed-on: https://go-review.googlesource.com/99077
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-03-07 23:30:02 +00:00
Brad Fitzpatrick
d8c9ef9e5c cmd/dist: skip rebuild before running tests when on the build systems
Updates #24300

Change-Id: I7752dab67e15a6dfe5fffe5b5ccbf3373bbc2c13
Reviewed-on: https://go-review.googlesource.com/99296
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-07 23:27:24 +00:00
Vlad Krasnov
fd3d27938a math/big: implement addMulVVW on arm64
The lack of a proper addMulVVW implementation for arm64 hurts RSA performance.

This assembly implementation is optimized for arm64 based servers.

name                  old time/op    new time/op     delta
pkg:math/big goos:linux goarch:arm64
AddMulVVW/1             55.2ns ± 0%     11.9ns ± 1%    -78.37%  (p=0.000 n=8+10)
AddMulVVW/2             67.0ns ± 0%     11.2ns ± 0%    -83.28%  (p=0.000 n=7+10)
AddMulVVW/3             93.2ns ± 0%     13.2ns ± 0%    -85.84%  (p=0.000 n=10+10)
AddMulVVW/4              126ns ± 0%       13ns ± 1%    -89.82%  (p=0.000 n=10+10)
AddMulVVW/5              151ns ± 0%       17ns ± 0%    -88.87%  (p=0.000 n=10+9)
AddMulVVW/10             323ns ± 0%       25ns ± 0%    -92.20%  (p=0.000 n=10+10)
AddMulVVW/100           3.28µs ± 0%     0.14µs ± 0%    -95.82%  (p=0.000 n=10+10)
AddMulVVW/1000          31.7µs ± 0%      1.3µs ± 0%    -96.00%  (p=0.000 n=10+8)
AddMulVVW/10000          313µs ± 0%       13µs ± 0%    -95.98%  (p=0.000 n=10+10)
AddMulVVW/100000        3.24ms ± 0%     0.13ms ± 1%    -96.13%  (p=0.000 n=9+9)
pkg:crypto/rsa goos:linux goarch:arm64
RSA2048Decrypt          44.7ms ± 0%      4.0ms ± 6%    -91.08%  (p=0.000 n=8+10)
RSA2048Sign             46.3ms ± 0%      5.0ms ± 0%    -89.29%  (p=0.000 n=9+10)
3PrimeRSA2048Decrypt    22.3ms ± 0%      2.4ms ± 0%    -89.26%  (p=0.000 n=10+10)

Change-Id: I295f0bd5c51a4442d02c44ece1f6026d30dff0bc
Reviewed-on: https://go-review.googlesource.com/76270
Reviewed-by: Vlad Krasnov <vlad@cloudflare.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Vlad Krasnov <vlad@cloudflare.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-07 23:04:38 +00:00
David du Colombier
b1335037fa cmd/go: skip TestVetWithOnlyCgoFiles when cgo is disabled
CL 99175 added TestVetWithOnlyCgoFiles. However, this
test is failing on platforms where cgo is disabled,
because no file can be built.

This change fixes TestVetWithOnlyCgoFiles by skipping
this test when cgo is disabled.

Fixes #24304.

Change-Id: Ibb38fcd3e0ed1a791782145d3f2866f12117c6fe
Reviewed-on: https://go-review.googlesource.com/99275
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-07 22:03:43 +00:00
Elias Naur
7a2a96d6ad runtime/cgo: make sure nil is undefined before defining it
While working on standalone builds of gomobile bindings, I ran into
errors of the form:

gcc_darwin_arm.c:30:31: error: ambiguous expansion of macro 'nil' [-Werror,-Wambiguous-macro]
/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS11.2.sdk/usr/include/MacTypes.h:94:15: note: expanding this definition of 'nil'

Fix it by undefining nil before defining it in libcgo.h.

Change-Id: I8e9660a68c6c351e592684d03d529f0d182c0493
Reviewed-on: https://go-review.googlesource.com/99215
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-07 21:08:19 +00:00
Ian Lance Taylor
709da95513 cmd/go: run vet on packages with only cgo files
CgoFiles is not included in GoFiles, so we need to check both.
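
As a rough illustration of the check described above: go/build reports cgo sources in a separate CgoFiles field, so a package whose only Go files import "C" has an empty GoFiles list. The helper name hasGoSource and the file names are hypothetical; this is a sketch, not the code cmd/go uses.

```go
package main

import (
	"fmt"
	"go/build"
)

// hasGoSource is a hypothetical helper: it reports whether a package has
// anything for vet to look at. CgoFiles is not a subset of GoFiles, so
// both fields have to be consulted.
func hasGoSource(p *build.Package) bool {
	return len(p.GoFiles) > 0 || len(p.CgoFiles) > 0
}

// demo evaluates the helper on three hand-built packages.
func demo() (goOnly, cgoOnly, empty bool) {
	goOnly = hasGoSource(&build.Package{GoFiles: []string{"a.go"}})
	cgoOnly = hasGoSource(&build.Package{CgoFiles: []string{"c.go"}})
	empty = hasGoSource(&build.Package{})
	return
}

func main() {
	fmt.Println(demo())
}
```

Checking only GoFiles would report false for the cgo-only package, which matches the symptom described in #24193.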

Fixes #24193

Change-Id: I6a67bd912e3d9a4be0eae8fa8db6fa8a07fb5df3
Reviewed-on: https://go-review.googlesource.com/99175
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-07 18:42:17 +00:00
Matthew Dempsky
a3b3284ddc cmd/compile: prevent untyped types from reaching walk
We already require expressions to have been typechecked before
reaching walk. Moreover, all untyped expressions should have been
converted to their default type by the time they reach walk.

However, in practice, we've been somewhat sloppy and inconsistent
about ensuring this. In particular, a lot of AST rewrites ended up
leaving untyped bool expressions scattered around. These likely aren't
harmful in practice, but it seems worth cleaning up.

The two most common cases addressed by this CL are:

1) When generating OIF and OFOR nodes, we would often typecheck the
conditional expression, but not apply defaultlit to force it to the
expression's default type.

2) When rewriting string comparisons into more fundamental primitives,
we were simply overwriting r.Type with the desired type, which didn't
propagate the type to nested subexpressions. These are fixed by
utilizing finishcompare, which correctly handles this (and is already
used by other comparison lowering rewrites).

Lastly, walkexpr is extended to assert that it's not called on untyped
expressions.

Fixes #23834.

Change-Id: Icbd29648a293555e4015d3b06a95a24ccbd3f790
Reviewed-on: https://go-review.googlesource.com/98337
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-03-07 18:14:22 +00:00
Kunpei Sakai
ed8b7a7785 cmd/compile: go fmt
Change-Id: I2eae33928641c6ed74badfe44d079ae90e5cc8c8
Reviewed-on: https://go-review.googlesource.com/99195
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-07 16:57:03 +00:00
Alberto Donizetti
c028958393 test/codegen: fix issue with arm64 memmove codegen test
This recently added arm64 memmove codegen check:

  func movesmall() {
    // arm64:-"memmove"
    x := [...]byte{1, 2, 3, 4, 5, 6, 7}
    copy(x[1:], x[:])
  }

is not correct, for two reasons:

1. regexps are matched from the start of the disasm line (excluding
   line information). This means that a negative -"memmove" check will
   pass against a 'CALL runtime.memmove' line because the line does
   not start with 'memmove' (it starts with CALL...).
   The way to specify no 'memmove' match whatsoever on the line is
   -".*memmove"

2. AFAIK comments on their own line are matched against the first
   subsequent non-comment line. So the code above only verifies that
   the x := ... line does not generate a memmove. The comment should
   be moved near the copy() line, if it's that one we want to not
   generate a memmove call.

The fact that the test above is not effective can be checked by
running `go run run.go -v codegen` in the toplevel test directory with
a go1.10 toolchain (that does not have the memmove-elision
optimization). The test will still pass (it shouldn't).

This change changes the regexp to -".*memmove" and moves it near the
line it needs to (not)match.
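
For illustration, the corrected check would look roughly like the sketch below: the pattern becomes -".*memmove" and the comment sits directly above the copy call. The return value and the main function are additions made here so the snippet runs on its own; they are not part of the test file.

```go
package main

import "fmt"

// movesmall mirrors the function quoted above, with the codegen
// comment moved next to the statement it is meant to constrain.
func movesmall() [7]byte {
	x := [...]byte{1, 2, 3, 4, 5, 6, 7}
	// arm64:-".*memmove"
	copy(x[1:], x[:])
	return x
}

func main() {
	fmt.Println(movesmall())
}
```

When run as an ordinary program the comment is inert; it only has meaning to the codegen test driver.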

Change-Id: Ie01ef4d775e77d92dc8d8b7856b89b200f5e5ef2
Reviewed-on: https://go-review.googlesource.com/98977
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-07 16:41:24 +00:00
Tobias Klauser
aa00d97447 debug/pe: use bytes.IndexByte instead of a loop
Follow-up to CL 98759.

Change-Id: I58c8b769741b395e5bf4e723505b149d063d492a
Reviewed-on: https://go-review.googlesource.com/99095
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-07 16:12:08 +00:00
Tobias Klauser
0657235660 database/sql: fix typo in comment
Change-Id: Ie2966bae1dc2e542c42fb32d8059a4b2d4690014
Reviewed-on: https://go-review.googlesource.com/99115
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-07 15:42:31 +00:00
Hana Kim
93b0261d0a cmd/trace: force GC occasionally
to return memory to the OS after completing potentially
large operations.

Update #21870

Sys went down to 3.7G

$ DEBUG_MEMORY_USAGE=1 go tool trace trace.out

2018/03/07 09:35:52 Parsing trace...
after parsing trace
 Alloc:	3385754360 Bytes
 Sys:	3662047864 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	3488907264 Bytes
 HeapInUse:	3426549760 Bytes
 HeapAlloc:	3385754360 Bytes
Enter to continue...
2018/03/07 09:36:09 Splitting trace...
after spliting trace
 Alloc:	3238309424 Bytes
 Sys:	3684410168 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	3488874496 Bytes
 HeapInUse:	3266461696 Bytes
 HeapAlloc:	3238309424 Bytes
Enter to continue...
2018/03/07 09:36:39 Opening browser. Trace viewer is listening on http://100.101.224.241:12345

after httpJsonTrace
 Alloc:	3000633872 Bytes
 Sys:	3693978424 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	3488743424 Bytes
 HeapInUse:	3030966272 Bytes
 HeapAlloc:	3000633872 Bytes
Enter to continue...

Change-Id: I56f64cae66c809cbfbad03fba7bd0d35494c1d04
Reviewed-on: https://go-review.googlesource.com/92376
Reviewed-by: Peter Weinberger <pjw@google.com>
2018-03-07 14:39:25 +00:00
jimmyfrasche
20b14b71df go/build: correct value of .Doc field
Build could use the package comment from test files to populate the .Doc
field on *Package.

As go list uses this data and several packages in the standard library
have tests with package comments, this led to:

$ go list -f '{{.Doc}}' flag container/heap image
These examples demonstrate more intricate uses of the flag package.
This example demonstrates an integer heap built using the heap interface.
This example demonstrates decoding a JPEG image and examining its pixels.

This change now only examines non-test files when attempting to populate
.Doc, resulting in the expected behavior:

$ gotip list -f '{{.Doc}}' flag container/heap image
Package flag implements command-line flag parsing.
Package heap provides heap operations for any type that implements heap.Interface.
Package image implements a basic 2-D image library.

Fixes #23594

Change-Id: I37171c26ec5cc573efd273556a05223c6f675968
Reviewed-on: https://go-review.googlesource.com/96976
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-07 14:35:52 +00:00
Hana Kim
ee465831ec cmd/trace: generate jsontrace data in a streaming fashion
Update #21870

The Sys went down to 4.25G from 6.2G.

$ DEBUG_MEMORY_USAGE=1 go tool trace trace.out
2018/03/07 08:49:01 Parsing trace...
after parsing trace
 Alloc:	3385757184 Bytes
 Sys:	3661195896 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	3488841728 Bytes
 HeapInUse:	3426516992 Bytes
 HeapAlloc:	3385757184 Bytes
Enter to continue...
2018/03/07 08:49:18 Splitting trace...
after spliting trace
 Alloc:	2352071904 Bytes
 Sys:	4243825464 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	4025712640 Bytes
 HeapInUse:	2377703424 Bytes
 HeapAlloc:	2352071904 Bytes
Enter to continue...
after httpJsonTrace
 Alloc:	3228697832 Bytes
 Sys:	4250379064 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	4025647104 Bytes
 HeapInUse:	3260014592 Bytes
 HeapAlloc:	3228697832 Bytes

Change-Id: I546f26bdbc68b1e58f1af1235a0e299dc0ff115e
Reviewed-on: https://go-review.googlesource.com/92375
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Peter Weinberger <pjw@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-07 14:33:54 +00:00
Yuval Pavel Zholkover
083f3957b8 runtime: add missing build constraints to os_linux_{be64,noauxv,novdso,ppc64x}.go files
They do not match the file name patterns of
  *_GOOS
  *_GOARCH
  *_GOOS_GOARCH
therefore the implicit linux constraint was not being added.
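
The file-name rule can be sketched as follows. This is a simplified model written for illustration, not the go/build implementation: the knownOS and knownArch tables are small hypothetical subsets, and implicitGOOS is a made-up helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative subsets only; the real lists are much longer.
var knownOS = map[string]bool{"linux": true, "darwin": true, "windows": true}
var knownArch = map[string]bool{"386": true, "amd64": true, "arm": true, "arm64": true}

// implicitGOOS returns the GOOS implied by a file name of the form
// *_GOOS.go or *_GOOS_GOARCH.go, or "" when the name implies none.
func implicitGOOS(name string) string {
	name = strings.TrimSuffix(name, ".go")
	name = strings.TrimSuffix(name, "_test")
	parts := strings.Split(name, "_")
	n := len(parts)
	if n >= 3 && knownOS[parts[n-2]] && knownArch[parts[n-1]] {
		return parts[n-2] // *_GOOS_GOARCH
	}
	if n >= 2 && knownOS[parts[n-1]] {
		return parts[n-1] // *_GOOS
	}
	return ""
}

func main() {
	for _, f := range []string{"os_linux.go", "os_linux_arm64.go", "os_linux_be64.go", "os_linux_novdso.go"} {
		fmt.Printf("%-20s implicit GOOS: %q\n", f, implicitGOOS(f))
	}
}
```

Under this model os_linux_be64.go and os_linux_novdso.go get no implicit constraint because their last element is neither a GOOS nor a GOARCH, which is why an explicit build constraint is needed in those files.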

Change-Id: Ie506c51cee6818db445516f96fffaa351df62cf5
Reviewed-on: https://go-review.googlesource.com/99116
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-07 14:26:19 +00:00
Elias Naur
9094946f0d androidtest.bash: don't require GOARCH set
The host GOARCH is most likely supported (386, amd64, arm, arm64).

Change-Id: I86324b9c00f22c592ba54bda7d2ae97c86bda904
Reviewed-on: https://go-review.googlesource.com/99155
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2018-03-07 14:01:43 +00:00
Alex Brainman
e83601b435 os: use WIN32_FIND_DATA.Reserved0 to identify symlinks
os.Stat implementation uses instructions described at
https://blogs.msdn.microsoft.com/oldnewthing/20100212-00/?p=14963/
to distinguish symlinks. In particular, it calls
GetFileAttributesEx or FindFirstFile and checks
either WIN32_FILE_ATTRIBUTE_DATA.dwFileAttributes
or WIN32_FIND_DATA.dwFileAttributes to see if
the FILE_ATTRIBUTE_REPARSE_POINT flag is set.
And that seems to have worked fine so far.

But now we have discovered that the OneDrive root folder
is reported as a directory:

c:\>dir C:\Users\Alex | grep OneDrive
30/11/2017  07:25 PM    <DIR>          OneDrive
c:\>

while Go identified it as a symlink.

But we did not follow Microsoft's advice to the letter - we never
checked WIN32_FIND_DATA.Reserved0. And adding that extra check
makes Go treat OneDrive as symlink. So use FindFirstFile and
WIN32_FIND_DATA.Reserved0 to determine symlinks.

Fixes #22579

Change-Id: I0cb88929eb8b47b1d24efaf1907ad5a0e20de83f
Reviewed-on: https://go-review.googlesource.com/86556
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-07 08:51:04 +00:00
Matthew Dempsky
d7eb4901f1 cmd/compile: remove funcdepth variables
There were only two large classes of use for these variables:

1) Testing "funcdepth != 0" or "funcdepth > 0", which is equivalent to
checking "Curfn != nil".

2) In oldname, detecting whether a closure variable has been created
for the current function, which can be handled by instead testing
"n.Name.Curfn != Curfn".

Lastly, merge funcstart into funchdr, since it's only called once, and
it better matches up with funcbody now.

Passes toolstash-check.

Change-Id: I8fe159a9d37ef7debc4cd310354cea22a8b23394
Reviewed-on: https://go-review.googlesource.com/99076
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-07 06:05:18 +00:00
Matthew Dempsky
aa00ca12fe cmd/compile: cleanup funccompile and compile
Bring these functions next to each other, and clean them up a little
bit. Also, change emitptrargsmap to take Curfn as a parameter instead
of a global.

Passes toolstash-check.

Change-Id: Ib9c94fda3b2cb6f0dcec1585622b33b4f311b5e9
Reviewed-on: https://go-review.googlesource.com/99075
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-07 03:12:38 +00:00
Kunpei Sakai
b75e8a2a3b cmd/compile: prevent detection of wrong duplicates
by including *types.Type in typeVal.

Updates #21866
Fixes #24159

Change-Id: I2f8cac252d88d43e723124f2867b1410b7abab7b
Reviewed-on: https://go-review.googlesource.com/98476
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-03-07 01:26:00 +00:00
Matthew Dempsky
2c0c68d621 cmd/compile: fix miscompilation of "defer delete(m, k)"
Previously, for slow map key types (i.e., any type other than a 32-bit
or 64-bit plain memory type), we would rewrite

    defer delete(m, k)

into

    ktmp := k
    defer delete(m, &ktmp)

However, if the defer statement was inside a loop, we would end up
reusing the same ktmp value for all of the deferred deletes.

We already rewrite

    defer print(x, y, z)

into

    defer func(a1, a2, a3) {
        print(a1, a2, a3)
    }(x, y, z)

This CL generalizes this rewrite to also apply for slow map deletes.
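
A small program that exercises the affected pattern might look like the sketch below. The key type and the deleteAll helper are hypothetical; the struct key is chosen on the assumption that a 128-bit key falls into the slow-key category described above. Each deferred delete should see the key value from its own iteration.

```go
package main

import "fmt"

// key is wider than 64 bits, so it is assumed not to take the
// fast-path map routines.
type key struct{ a, b int64 }

// deleteAll defers one delete per key inside a loop. The arguments of
// a deferred call are evaluated when the defer statement executes, so
// every key must be removed by the time deleteAll returns.
func deleteAll(m map[key]int, keys []key) {
	for _, k := range keys {
		defer delete(m, k)
	}
}

func main() {
	keys := []key{{1, 1}, {2, 2}, {3, 3}}
	m := map[key]int{}
	for i, k := range keys {
		m[k] = i
	}
	deleteAll(m, keys)
	fmt.Println(len(m)) // 0 when every deferred delete used its own key
}
```

With the miscompilation described above, all deferred deletes would have shared one temporary and left entries behind.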

This could be extended to apply even more generally to other builtins,
but as discussed on #24259, there are cases where we must *not* do
this (e.g., "defer recover()"). However, if we elect to do this more
generally, this CL should still make that easier.

Lastly, while here, fix a few issues in wrapCall (nee walkprintfunc):

1) lookupN appends the generation number to the symbol anyway, so "%d"
was being literally included in the generated function names.

2) walkstmt will be called when the function is compiled later anyway,
so no need to do it now.

Fixes #24259.

Change-Id: I70286867c64c69c18e9552f69e3f4154a0fc8b04
Reviewed-on: https://go-review.googlesource.com/99017
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-06 23:33:28 +00:00
Ian Lance Taylor
558769a61b internal/poll: if poller init fails, assume blocking mode
Fixes #23943

Change-Id: I16e604872f1615963925ec3c4710106bcce1330c
Reviewed-on: https://go-review.googlesource.com/99015
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-06 23:21:25 +00:00
ChrisALiles
42ecf39e85 cmd/compile: improve compiler error on embedded structs
Fixes #23609

Change-Id: I751aae3d849de7fce1306324fcb1a4c3842d873e
Reviewed-on: https://go-review.googlesource.com/97076
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-06 21:06:46 +00:00
Alberto Donizetti
8516ecd05f test/codegen: port math/bits.ReverseBytes tests to codegen
And remove them from ssa_test.

Change-Id: If767af662801219774d1bdb787c77edfa6067770
Reviewed-on: https://go-review.googlesource.com/98976
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-06 20:34:33 +00:00
Wei Xiao
05962561ae cmd/compile/internal/ssa: improve store combine optimization on arm64
The current implementation doesn't consider MOVDreg-type operands and fails to
combine them into a larger store. This patch fixes the issue.

Fixes #24242

Change-Id: I7d68697f80e76f48c3528ece01a602bf513248ec
Reviewed-on: https://go-review.googlesource.com/98397
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 20:29:04 +00:00
Josh Bleecher Snyder
b85433975a encoding/binary: use an offset instead of slicing
While running make.bash, over 5% of all pointer writes
come from encoding/binary doing struct reads.

This change replaces slicing during such reads with an offset.
This avoids updating the slice pointer with every
struct field read or write.
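
The difference between the two approaches can be sketched like this. The types slicingDecoder and offsetDecoder are hypothetical stand-ins, not the unexported decoder in encoding/binary: the first rewrites its slice (a pointer write) on every read, the second only bumps an integer.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// slicingDecoder consumes input by reslicing buf after each read,
// which stores a new slice pointer every time.
type slicingDecoder struct{ buf []byte }

func (d *slicingDecoder) uint16() uint16 {
	x := binary.LittleEndian.Uint16(d.buf[0:2])
	d.buf = d.buf[2:]
	return x
}

// offsetDecoder leaves buf untouched and advances an integer offset,
// so no pointer is written per read.
type offsetDecoder struct {
	buf    []byte
	offset int
}

func (d *offsetDecoder) uint16() uint16 {
	x := binary.LittleEndian.Uint16(d.buf[d.offset : d.offset+2])
	d.offset += 2
	return x
}

func main() {
	data := []byte{1, 0, 2, 0}
	s := &slicingDecoder{buf: data}
	o := &offsetDecoder{buf: data}
	fmt.Println(s.uint16(), s.uint16())
	fmt.Println(o.uint16(), o.uint16())
}
```

Both decoders return the same values; only the bookkeeping differs, which is where the write barrier cost comes from.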

This has no impact when the write barrier is off.
Running the benchmarks with GOGC=1, however,
shows significant improvement:

name          old time/op    new time/op    delta
ReadStruct-8    13.2µs ± 6%    10.1µs ± 5%  -23.24%  (p=0.000 n=10+10)

name          old speed      new speed      delta
ReadStruct-8  5.69MB/s ± 6%  7.40MB/s ± 5%  +30.18%  (p=0.000 n=10+10)

Change-Id: I22904263196bfeddc38abe8989428e263aee5253
Reviewed-on: https://go-review.googlesource.com/98757
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-06 18:59:03 +00:00
Josh Bleecher Snyder
f7739c07c8 runtime: skip pointless writes in freedefer
Change-Id: I501a0e5c87ec88616c7dcdf1b723758b6df6c088
Reviewed-on: https://go-review.googlesource.com/98758
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-06 18:58:57 +00:00
Josh Bleecher Snyder
4599419e69 debug/macho: use bytes.IndexByte instead of a loop
Simpler, and no doubt faster.
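
A minimal before/after sketch of this kind of change, assuming a helper that extracts a NUL-terminated string from a byte slice (the function names here are illustrative):

```go
package main

import (
	"bytes"
	"fmt"
)

// cstringLoop scans for the terminating NUL byte by hand.
func cstringLoop(b []byte) string {
	var i int
	for i = 0; i < len(b) && b[i] != 0; i++ {
	}
	return string(b[0:i])
}

// cstringIndex does the same search with bytes.IndexByte, falling
// back to the whole slice when there is no NUL byte.
func cstringIndex(b []byte) string {
	i := bytes.IndexByte(b, 0)
	if i == -1 {
		i = len(b)
	}
	return string(b[0:i])
}

func main() {
	b := []byte("abc\x00def")
	fmt.Println(cstringLoop(b), cstringIndex(b))
}
```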

Change-Id: Idd401918da07a257de365087721e9ff061e6fd07
Reviewed-on: https://go-review.googlesource.com/98759
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-06 18:58:50 +00:00
Balaram Makam
0e8b7110f6 cmd/compile/internal/ssa: inline small memmove for arm64
This patch enables the optimization for arm64 target.

Performance results on Amberwing for strconv benchmark:
name             old time/op  new time/op  delta
Quote             721ns ± 0%   617ns ± 0%  -14.40%  (p=0.016 n=5+4)
QuoteRune         118ns ± 0%   117ns ± 0%   -0.85%  (p=0.008 n=5+5)
AppendQuote       436ns ± 2%   321ns ± 0%  -26.31%  (p=0.008 n=5+5)
AppendQuoteRune  34.7ns ± 0%  28.4ns ± 0%  -18.16%  (p=0.000 n=5+4)
[Geo mean]        189ns        160ns       -15.41%

Change-Id: I5714c474e7483d07ca338fbaf49beb4bbcc11c44
Reviewed-on: https://go-review.googlesource.com/98735
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 18:37:19 +00:00
Alberto Donizetti
18ae5eca3b test/codegen: port math/bits.OnesCount tests to codegen
And remove them from ssa_test.

Change-Id: I3efac5fea529bb0efa2dae32124530482ba5058e
Reviewed-on: https://go-review.googlesource.com/98815
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-06 17:53:00 +00:00
Cherry Zhang
f624445473 cmd/internal/obj/arm64: gofmt
Change-Id: Ica778fef2d0245fbb14f595597e45c7cf6adef84
Reviewed-on: https://go-review.googlesource.com/98895
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-06 16:35:20 +00:00
Elias Naur
f5f16d1ec1 iostest.bash: don't build std library twice
Instead, mirror androidtest.bash and build once, then run run.bash.

Change-Id: I174ae30b2a429a62b20bb290a70cb07ed712b1e4
Reviewed-on: https://go-review.googlesource.com/98915
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-06 16:08:20 +00:00
Elias Naur
ad87a67cdf cmd/dist: default to GOARM=7 on android
Auto-detecting GOARM on Android makes as little sense as for nacl/arm
and darwin/arm.

Also update androidtest.bash to not require GOARM set.

Change-Id: Id409ce1573d3c668d00fa4b7e3562ad7ece6fef5
Reviewed-on: https://go-review.googlesource.com/98875
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-06 16:08:04 +00:00
Cherry Zhang
084143d844 math/big: don't use R18 in ARM64 assembly
R18 seems to be reserved on Apple platforms.

May fix darwin/arm64 build.

Change-Id: Ia2c1de550a64827c85a64affa53b94c62aacce8e
Reviewed-on: https://go-review.googlesource.com/98896
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
2018-03-06 15:34:00 +00:00
Tobias Klauser
9745397e1d runtime: fix stack switch check in walltime/nanotime on linux/arm
CL 98095 got the check wrong. We should be testing
'getg() == getg().m.curg', not 'getg().m == getg().m.curg'.

Change-Id: I32f6238b00409b67afa8efe732513d542aec5bc7
Reviewed-on: https://go-review.googlesource.com/98855
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-06 14:24:19 +00:00
Alberto Donizetti
85dcc709a8 test/codegen: port math/bits.TrailingZeros tests to codegen
And remove them from ssa_test.

Change-Id: Ib5de5c0d908f23915e0847eca338cacf2fa5325b
Reviewed-on: https://go-review.googlesource.com/98795
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-06 11:48:37 +00:00
as
df8c2b905b net/http: correct subtle transposition of offset and whence in test
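
Why such a transposition is subtle can be shown with a short sketch. Seek takes (offset, whence), and because both are integers a swapped call still compiles and succeeds. The demo function is hypothetical and is not the test touched by this change.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// demo seeks on the same 5-byte reader twice: once correctly and once
// with the two arguments swapped.
func demo() (correct, transposed int64) {
	r := strings.NewReader("hello")

	// Intended call: offset 0 relative to the end of the data.
	correct, _ = r.Seek(0, io.SeekEnd)

	// Swapped call: io.SeekEnd is the constant 2 and 0 equals
	// io.SeekStart, so this means "offset 2 from the start".
	transposed, _ = r.Seek(io.SeekEnd, 0)
	return
}

func main() {
	fmt.Println(demo())
}
```

Neither call returns an error, so only the resulting position reveals the mistake.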
Change-Id: I788972bdf85c0225397c0e74901bf9c33c6d30c7
GitHub-Last-Rev: 57737fe782
GitHub-Pull-Request: golang/go#24265
Reviewed-on: https://go-review.googlesource.com/98761
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-06 06:13:17 +00:00
Meng Zhuo
8916773a3d runtime, cmd/compile: use ldp for DUFFCOPY on ARM64
name         old time/op  new time/op  delta
CopyFat8     2.15ns ± 1%  2.19ns ± 6%     ~     (p=0.171 n=8+9)
CopyFat12    2.15ns ± 0%  2.17ns ± 2%     ~     (p=0.137 n=8+10)
CopyFat16    2.17ns ± 3%  2.15ns ± 0%     ~     (p=0.211 n=10+10)
CopyFat24    2.16ns ± 1%  2.15ns ± 0%     ~     (p=0.087 n=10+10)
CopyFat32    11.5ns ± 0%  12.8ns ± 2%  +10.87%  (p=0.000 n=8+10)
CopyFat64    20.2ns ± 2%  12.9ns ± 0%  -36.11%  (p=0.000 n=10+10)
CopyFat128   37.2ns ± 0%  21.5ns ± 0%  -42.20%  (p=0.000 n=10+10)
CopyFat256   71.6ns ± 0%  38.7ns ± 0%  -45.95%  (p=0.000 n=10+10)
CopyFat512    140ns ± 0%    73ns ± 0%  -47.86%  (p=0.000 n=10+9)
CopyFat520    142ns ± 0%    74ns ± 0%  -47.54%  (p=0.000 n=10+10)
CopyFat1024   277ns ± 0%   141ns ± 0%  -49.10%  (p=0.000 n=10+10)

Change-Id: If54bc571add5db674d5e081579c87e80153d0a5a
Reviewed-on: https://go-review.googlesource.com/97395
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 04:14:59 +00:00
Rob Pike
baf3eb1625 cmd/doc: make local dot-slash path names work
Before, an argument that started ./ or ../ was not treated as
a package relative to the current directory. Thus

	$ cd $GOROOT/src/text
	$ go doc ./template

could find html/template as $GOROOT/src/html/./template
is a valid Go source directory.

Fix this by catching such paths and making them absolute before
processing.
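
The idea can be sketched as below. The helpers isDotSlash and resolve are hypothetical and take the working directory as a parameter so the example is deterministic; the actual cmd/doc code may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isDotSlash reports whether arg starts with ./ or ../ and should
// therefore be read relative to the current directory.
func isDotSlash(arg string) bool {
	return strings.HasPrefix(arg, "./") || strings.HasPrefix(arg, "../")
}

// resolve turns a dot-slash argument into an absolute path under wd
// and leaves every other argument untouched.
func resolve(wd, arg string) string {
	if isDotSlash(arg) {
		return filepath.Join(wd, arg)
	}
	return arg
}

func main() {
	wd := "/usr/local/go/src/text" // stand-in for $GOROOT/src/text
	fmt.Println(resolve(wd, "./template"))
	fmt.Println(resolve(wd, "../html/template"))
	fmt.Println(resolve(wd, "html/template"))
}
```

Once the argument is absolute, it can no longer be matched as a suffix of an unrelated directory such as html/./template.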

Fixes #23383.

Change-Id: Ic2a92eaa3a6328f728635657f9de72ac3ee82afb
Reviewed-on: https://go-review.googlesource.com/98396
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-06 01:11:26 +00:00
Fangming.Fang
917e72697e crypto/aes: optimize arm64 AES implementation
This patch makes use of arm64 AES instructions to accelerate AES computation
and only supports optimization on Linux for arm64

name        old time/op    new time/op     delta
Encrypt-32     255ns ± 0%       26ns ± 0%   -89.73%
Decrypt-32     256ns ± 0%       26ns ± 0%   -89.77%
Expand-32      990ns ± 5%      901ns ± 0%    -9.05%

name        old speed      new speed       delta
Encrypt-32  62.5MB/s ± 0%  610.4MB/s ± 0%  +876.39%
Decrypt-32  62.3MB/s ± 0%  610.2MB/s ± 0%  +879.6%

Fixes #18498

Change-Id: If416e5a151785325527b32ff72f6da3812493ed0
Reviewed-on: https://go-review.googlesource.com/64490
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 00:44:29 +00:00
erifan01
c4f3fe95c6 math/big: optimize addVV and subVV on arm64
The biggest hot spot of the existing implementation is "load" operations, which lead to poor performance.
By unrolling the cycle 4x and 2x, and using "LDP", "STP" instructions, this CL can reduce the "load" cost and improve performance.
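
The unrolling idea, expressed in Go rather than arm64 assembly, might look like the sketch below. It only illustrates processing four words per iteration with a carry chain; it says nothing about LDP/STP, uses uint64 instead of big.Word, and this addVV is a stand-in rather than the math/big routine.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVV sets z = x + y word by word and returns the final carry.
// All three slices are assumed to have the same length.
func addVV(z, x, y []uint64) (c uint64) {
	i := 0
	// Main loop: four words per iteration.
	for ; i+4 <= len(z); i += 4 {
		z[i], c = bits.Add64(x[i], y[i], c)
		z[i+1], c = bits.Add64(x[i+1], y[i+1], c)
		z[i+2], c = bits.Add64(x[i+2], y[i+2], c)
		z[i+3], c = bits.Add64(x[i+3], y[i+3], c)
	}
	// Tail: the remaining zero to three words.
	for ; i < len(z); i++ {
		z[i], c = bits.Add64(x[i], y[i], c)
	}
	return c
}

func main() {
	const max = ^uint64(0)
	x := []uint64{max, max, max, max, max}
	y := []uint64{1, 0, 0, 0, 0}
	z := make([]uint64, len(x))
	c := addVV(z, x, y)
	fmt.Println(z, c)
}
```

The example input makes the carry ripple through both the unrolled loop and the tail.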

Benchmarks:

name                              old time/op    new time/op     delta
AddVV/1-8                           21.5ns ± 0%     11.5ns ± 0%   -46.51%  (p=0.008 n=5+5)
AddVV/2-8                           13.5ns ± 0%     12.0ns ± 0%   -11.11%  (p=0.008 n=5+5)
AddVV/3-8                           15.5ns ± 0%     13.0ns ± 0%   -16.13%  (p=0.008 n=5+5)
AddVV/4-8                           17.5ns ± 0%     13.5ns ± 0%   -22.86%  (p=0.008 n=5+5)
AddVV/5-8                           19.5ns ± 0%     14.5ns ± 0%   -25.64%  (p=0.008 n=5+5)
AddVV/10-8                          29.5ns ± 0%     18.0ns ± 0%   -38.98%  (p=0.008 n=5+5)
AddVV/100-8                          217ns ± 0%       94ns ± 0%   -56.64%  (p=0.008 n=5+5)
AddVV/1000-8                        2.02µs ± 0%     1.03µs ± 0%   -48.85%  (p=0.008 n=5+5)
AddVV/10000-8                       20.5µs ± 0%     11.3µs ± 0%   -44.70%  (p=0.008 n=5+5)
AddVV/100000-8                       247µs ± 3%      154µs ± 0%   -37.52%  (p=0.008 n=5+5)
SubVV/1-8                           21.5ns ± 0%     11.5ns ± 0%      ~     (p=0.079 n=4+5)
SubVV/2-8                           13.5ns ± 0%     12.0ns ± 0%   -11.11%  (p=0.008 n=5+5)
SubVV/3-8                           15.5ns ± 0%     13.0ns ± 0%   -16.13%  (p=0.008 n=5+5)
SubVV/4-8                           17.5ns ± 0%     13.5ns ± 0%   -22.86%  (p=0.008 n=5+5)
SubVV/5-8                           19.5ns ± 0%     14.5ns ± 0%   -25.64%  (p=0.008 n=5+5)
SubVV/10-8                          29.5ns ± 0%     18.0ns ± 0%   -38.98%  (p=0.008 n=5+5)
SubVV/100-8                          217ns ± 0%       94ns ± 0%   -56.64%  (p=0.008 n=5+5)
SubVV/1000-8                        2.02µs ± 0%     0.80µs ± 0%   -60.50%  (p=0.008 n=5+5)
SubVV/10000-8                       20.5µs ± 0%     11.3µs ± 0%   -44.99%  (p=0.008 n=5+5)
SubVV/100000-8                       221µs ±11%      223µs ±16%      ~     (p=0.690 n=5+5)
AddVW/1-8                           9.32ns ± 0%     9.32ns ± 0%      ~     (all equal)
AddVW/2-8                           19.7ns ± 1%     19.7ns ± 0%      ~     (p=0.381 n=5+4)
AddVW/3-8                           11.5ns ± 0%     11.5ns ± 0%      ~     (all equal)
AddVW/4-8                           13.0ns ± 0%     13.0ns ± 0%      ~     (all equal)
AddVW/5-8                           14.5ns ± 0%     14.5ns ± 0%      ~     (all equal)
AddVW/10-8                          22.0ns ± 0%     22.0ns ± 0%      ~     (all equal)
AddVW/100-8                          167ns ± 0%      167ns ± 0%      ~     (all equal)
AddVW/1000-8                        1.52µs ± 0%     1.52µs ± 0%    +0.40%  (p=0.008 n=5+5)
AddVW/10000-8                       15.1µs ± 0%     15.1µs ± 0%      ~     (p=0.556 n=5+4)
AddVW/100000-8                       152µs ± 1%      152µs ± 1%      ~     (p=0.690 n=5+5)
AddMulVVW/1-8                       33.3ns ± 0%     32.7ns ± 1%    -1.86%  (p=0.008 n=5+5)
AddMulVVW/2-8                       59.3ns ± 1%     56.9ns ± 1%    -4.15%  (p=0.008 n=5+5)
AddMulVVW/3-8                       80.5ns ± 1%     85.4ns ± 3%    +6.19%  (p=0.008 n=5+5)
AddMulVVW/4-8                        127ns ± 0%      111ns ± 1%   -13.19%  (p=0.008 n=5+5)
AddMulVVW/5-8                        144ns ± 0%      149ns ± 0%    +3.47%  (p=0.016 n=4+5)
AddMulVVW/10-8                       298ns ± 1%      283ns ± 0%    -4.77%  (p=0.008 n=5+5)
AddMulVVW/100-8                     3.06µs ± 0%     2.99µs ± 0%    -2.21%  (p=0.008 n=5+5)
AddMulVVW/1000-8                    31.3µs ± 0%     26.9µs ± 0%   -14.17%  (p=0.008 n=5+5)
AddMulVVW/10000-8                    316µs ± 0%      305µs ± 0%    -3.51%  (p=0.008 n=5+5)
AddMulVVW/100000-8                  3.17ms ± 0%     3.17ms ± 1%      ~     (p=0.690 n=5+5)
DecimalConversion-8                  316µs ± 1%      313µs ± 2%      ~     (p=0.095 n=5+5)
FloatString/100-8                   2.53µs ± 1%     2.56µs ± 2%      ~     (p=0.222 n=5+5)
FloatString/1000-8                  58.4µs ± 0%     58.5µs ± 0%      ~     (p=0.206 n=5+5)
FloatString/10000-8                 4.59ms ± 0%     4.58ms ± 0%    -0.31%  (p=0.008 n=5+5)
FloatString/100000-8                 446ms ± 0%      444ms ± 0%    -0.31%  (p=0.008 n=5+5)
FloatAdd/10-8                        184ns ± 0%      172ns ± 0%    -6.30%  (p=0.008 n=5+5)
FloatAdd/100-8                       189ns ± 2%      191ns ± 4%      ~     (p=0.381 n=5+5)
FloatAdd/1000-8                      371ns ± 0%      347ns ± 1%    -6.42%  (p=0.008 n=5+5)
FloatAdd/10000-8                    1.87µs ± 0%     1.68µs ± 0%   -10.16%  (p=0.008 n=5+5)
FloatAdd/100000-8                   17.1µs ± 0%     15.6µs ± 0%    -8.74%  (p=0.016 n=5+4)
FloatSub/10-8                        152ns ± 0%      138ns ± 0%    -9.47%  (p=0.000 n=4+5)
FloatSub/100-8                       148ns ± 0%      142ns ± 0%    -4.05%  (p=0.000 n=5+4)
FloatSub/1000-8                      245ns ± 1%      217ns ± 0%   -11.28%  (p=0.000 n=5+4)
FloatSub/10000-8                    1.07µs ± 0%     0.88µs ± 1%   -18.14%  (p=0.008 n=5+5)
FloatSub/100000-8                   9.58µs ± 0%     7.96µs ± 0%   -16.84%  (p=0.008 n=5+5)
ParseFloatSmallExp-8                28.8µs ± 1%     29.0µs ± 1%      ~     (p=0.095 n=5+5)
ParseFloatLargeExp-8                 126µs ± 1%      126µs ± 1%      ~     (p=0.841 n=5+5)
GCD10x10/WithoutXY-8                 277ns ± 2%      281ns ± 4%      ~     (p=0.746 n=5+5)
GCD10x10/WithXY-8                   2.10µs ± 1%     2.12µs ± 3%      ~     (p=0.548 n=5+5)
GCD10x100/WithoutXY-8                615ns ± 3%      607ns ± 2%      ~     (p=0.135 n=5+5)
GCD10x100/WithXY-8                  3.50µs ± 2%     3.62µs ± 5%      ~     (p=0.151 n=5+5)
GCD10x1000/WithoutXY-8              1.39µs ± 2%     1.39µs ± 3%      ~     (p=0.690 n=5+5)
GCD10x1000/WithXY-8                 7.39µs ± 1%     7.34µs ± 2%      ~     (p=0.135 n=5+5)
GCD10x10000/WithoutXY-8             8.66µs ± 1%     8.68µs ± 1%      ~     (p=0.421 n=5+5)
GCD10x10000/WithXY-8                28.1µs ± 2%     27.0µs ± 2%    -3.81%  (p=0.008 n=5+5)
GCD10x100000/WithoutXY-8            79.3µs ± 1%     79.3µs ± 1%      ~     (p=0.841 n=5+5)
GCD10x100000/WithXY-8                238µs ± 0%      227µs ± 1%    -4.74%  (p=0.008 n=5+5)
GCD100x100/WithoutXY-8              1.89µs ± 1%     1.88µs ± 2%      ~     (p=0.968 n=5+5)
GCD100x100/WithXY-8                 26.7µs ± 1%     27.0µs ± 1%    +1.44%  (p=0.032 n=5+5)
GCD100x1000/WithoutXY-8             4.48µs ± 1%     4.45µs ± 2%      ~     (p=0.341 n=5+5)
GCD100x1000/WithXY-8                36.3µs ± 1%     35.1µs ± 1%    -3.27%  (p=0.008 n=5+5)
GCD100x10000/WithoutXY-8            22.8µs ± 0%     22.7µs ± 1%      ~     (p=0.056 n=5+5)
GCD100x10000/WithXY-8                145µs ± 1%      133µs ± 1%    -8.33%  (p=0.008 n=5+5)
GCD100x100000/WithoutXY-8            198µs ± 0%      195µs ± 0%    -1.56%  (p=0.008 n=5+5)
GCD100x100000/WithXY-8              1.11ms ± 0%     1.00ms ± 0%   -10.04%  (p=0.008 n=5+5)
GCD1000x1000/WithoutXY-8            25.2µs ± 1%     24.8µs ± 1%    -1.63%  (p=0.016 n=5+5)
GCD1000x1000/WithXY-8                513µs ± 0%      517µs ± 2%      ~     (p=0.421 n=5+5)
GCD1000x10000/WithoutXY-8           57.0µs ± 0%     52.7µs ± 1%    -7.56%  (p=0.008 n=5+5)
GCD1000x10000/WithXY-8              1.20ms ± 0%     1.10ms ± 0%    -8.70%  (p=0.008 n=5+5)
GCD1000x100000/WithoutXY-8           358µs ± 0%      318µs ± 1%   -11.03%  (p=0.008 n=5+5)
GCD1000x100000/WithXY-8             8.71ms ± 0%     7.65ms ± 0%   -12.19%  (p=0.008 n=5+5)
GCD10000x10000/WithoutXY-8           690µs ± 0%      630µs ± 0%    -8.71%  (p=0.008 n=5+5)
GCD10000x10000/WithXY-8             16.0ms ± 1%     14.9ms ± 0%    -6.85%  (p=0.008 n=5+5)
GCD10000x100000/WithoutXY-8         2.09ms ± 0%     1.75ms ± 0%   -16.09%  (p=0.016 n=5+4)
GCD10000x100000/WithXY-8            86.8ms ± 0%     76.3ms ± 0%   -12.09%  (p=0.008 n=5+5)
GCD100000x100000/WithoutXY-8        51.1ms ± 0%     46.0ms ± 0%    -9.97%  (p=0.008 n=5+5)
GCD100000x100000/WithXY-8            1.25s ± 0%      1.15s ± 0%    -7.92%  (p=0.008 n=5+5)
Hilbert-8                           2.45ms ± 1%     2.49ms ± 1%    +1.99%  (p=0.008 n=5+5)
Binomial-8                          4.98µs ± 3%     4.90µs ± 2%      ~     (p=0.421 n=5+5)
QuoRem-8                            7.10µs ± 0%     6.21µs ± 0%   -12.55%  (p=0.016 n=5+4)
Exp-8                                161ms ± 0%      161ms ± 0%      ~     (p=0.421 n=5+5)
Exp2-8                               161ms ± 0%      161ms ± 0%      ~     (p=0.151 n=5+5)
Bitset-8                            40.4ns ± 0%     40.3ns ± 0%      ~     (p=0.190 n=5+5)
BitsetNeg-8                          163ns ± 3%      137ns ± 2%   -15.91%  (p=0.008 n=5+5)
BitsetOrig-8                         377ns ± 1%      372ns ± 1%    -1.22%  (p=0.024 n=5+5)
BitsetNegOrig-8                      631ns ± 1%      605ns ± 1%    -4.09%  (p=0.008 n=5+5)
ModSqrt225_Tonelli-8                7.26ms ± 0%     7.26ms ± 0%      ~     (p=0.548 n=5+5)
ModSqrt224_3Mod4-8                  2.24ms ± 0%     2.24ms ± 0%      ~     (p=1.000 n=5+5)
ModSqrt5430_Tonelli-8                62.4s ± 0%      62.4s ± 0%      ~     (p=0.841 n=5+5)
ModSqrt5430_3Mod4-8                  20.8s ± 0%      20.7s ± 0%      ~     (p=0.056 n=5+5)
Sqrt-8                               101µs ± 0%       89µs ± 0%   -12.17%  (p=0.008 n=5+5)
IntSqr/1-8                          32.5ns ± 1%     32.7ns ± 1%      ~     (p=0.056 n=5+5)
IntSqr/2-8                           160ns ± 5%      158ns ± 0%      ~     (p=0.397 n=5+4)
IntSqr/3-8                           298ns ± 4%      296ns ± 4%      ~     (p=0.667 n=5+5)
IntSqr/5-8                           737ns ± 5%      761ns ± 3%    +3.34%  (p=0.016 n=5+5)
IntSqr/8-8                          1.87µs ± 4%     1.90µs ± 3%      ~     (p=0.222 n=5+5)
IntSqr/10-8                         2.96µs ± 4%     2.92µs ± 6%      ~     (p=0.310 n=5+5)
IntSqr/20-8                         6.28µs ± 3%     6.21µs ± 2%      ~     (p=0.310 n=5+5)
IntSqr/30-8                         14.0µs ± 2%     13.9µs ± 2%      ~     (p=0.548 n=5+5)
IntSqr/50-8                         37.7µs ± 3%     38.3µs ± 2%      ~     (p=0.095 n=5+5)
IntSqr/80-8                         95.9µs ± 2%     95.1µs ± 1%      ~     (p=0.310 n=5+5)
IntSqr/100-8                         148µs ± 1%      148µs ± 1%      ~     (p=0.841 n=5+5)
IntSqr/200-8                         586µs ± 1%      587µs ± 1%      ~     (p=1.000 n=5+5)
IntSqr/300-8                        1.32ms ± 0%     1.31ms ± 1%    -0.73%  (p=0.032 n=5+5)
IntSqr/500-8                        2.48ms ± 0%     2.45ms ± 0%    -1.15%  (p=0.008 n=5+5)
IntSqr/800-8                        4.68ms ± 0%     4.62ms ± 0%    -1.23%  (p=0.008 n=5+5)
IntSqr/1000-8                       7.57ms ± 0%     7.50ms ± 0%    -0.84%  (p=0.008 n=5+5)
Mul-8                                311ms ± 0%      308ms ± 0%    -0.81%  (p=0.008 n=5+5)
Exp3Power/0x10-8                     574ns ± 1%      578ns ± 2%      ~     (p=0.500 n=5+5)
Exp3Power/0x40-8                     640ns ± 1%      646ns ± 0%      ~     (p=0.056 n=5+5)
Exp3Power/0x100-8                   1.42µs ± 1%     1.42µs ± 1%      ~     (p=0.246 n=5+5)
Exp3Power/0x400-8                   8.30µs ± 1%     8.29µs ± 1%      ~     (p=0.802 n=5+5)
Exp3Power/0x1000-8                  60.0µs ± 0%     59.9µs ± 0%    -0.24%  (p=0.016 n=5+5)
Exp3Power/0x4000-8                   817µs ± 0%      816µs ± 0%    -0.17%  (p=0.008 n=5+5)
Exp3Power/0x10000-8                 7.80ms ± 1%     7.70ms ± 0%    -1.23%  (p=0.008 n=5+5)
Exp3Power/0x40000-8                 73.4ms ± 0%     72.5ms ± 0%    -1.28%  (p=0.008 n=5+5)
Exp3Power/0x100000-8                 665ms ± 0%      656ms ± 0%    -1.34%  (p=0.008 n=5+5)
Exp3Power/0x400000-8                 5.99s ± 0%      5.90s ± 0%    -1.40%  (p=0.008 n=5+5)
Fibo-8                               116ms ± 0%       50ms ± 0%   -57.09%  (p=0.008 n=5+5)
NatSqr/1-8                           112ns ± 4%      112ns ± 2%      ~     (p=0.968 n=5+5)
NatSqr/2-8                           251ns ± 2%      250ns ± 1%      ~     (p=0.571 n=5+5)
NatSqr/3-8                           378ns ± 2%      379ns ± 2%      ~     (p=0.794 n=5+5)
NatSqr/5-8                           829ns ± 3%      827ns ± 2%      ~     (p=1.000 n=5+5)
NatSqr/8-8                          1.97µs ± 2%     1.95µs ± 2%      ~     (p=0.310 n=5+5)
NatSqr/10-8                         3.02µs ± 2%     2.99µs ± 2%      ~     (p=0.421 n=5+5)
NatSqr/20-8                         6.51µs ± 2%     6.49µs ± 1%      ~     (p=0.841 n=5+5)
NatSqr/30-8                         14.1µs ± 2%     14.0µs ± 2%      ~     (p=0.841 n=5+5)
NatSqr/50-8                         38.1µs ± 2%     38.3µs ± 3%      ~     (p=0.690 n=5+5)
NatSqr/80-8                         95.5µs ± 2%     96.0µs ± 1%      ~     (p=0.421 n=5+5)
NatSqr/100-8                         150µs ± 1%      148µs ± 2%      ~     (p=0.095 n=5+5)
NatSqr/200-8                         588µs ± 1%      590µs ± 1%      ~     (p=0.421 n=5+5)
NatSqr/300-8                        1.32ms ± 1%     1.31ms ± 1%      ~     (p=0.841 n=5+5)
NatSqr/500-8                        2.50ms ± 0%     2.47ms ± 0%    -1.03%  (p=0.008 n=5+5)
NatSqr/800-8                        4.70ms ± 0%     4.64ms ± 0%    -1.31%  (p=0.008 n=5+5)
NatSqr/1000-8                       7.60ms ± 0%     7.52ms ± 0%    -1.01%  (p=0.008 n=5+5)
ScanPi-8                             326µs ± 0%      326µs ± 0%      ~     (p=0.841 n=5+5)
StringPiParallel-8                  70.3µs ± 5%     63.8µs ±10%      ~     (p=0.056 n=5+5)
Scan/10/Base2-8                     1.09µs ± 0%     1.09µs ± 0%      ~     (p=0.317 n=5+5)
Scan/100/Base2-8                    7.79µs ± 0%     7.78µs ± 0%      ~     (p=0.063 n=5+5)
Scan/1000/Base2-8                   79.0µs ± 0%     78.9µs ± 0%    -0.18%  (p=0.008 n=5+5)
Scan/10000/Base2-8                  1.22ms ± 0%     1.22ms ± 0%    -0.15%  (p=0.008 n=5+5)
Scan/100000/Base2-8                 55.1ms ± 0%     55.2ms ± 0%    +0.20%  (p=0.008 n=5+5)
Scan/10/Base8-8                      512ns ± 0%      512ns ± 1%      ~     (p=0.810 n=5+5)
Scan/100/Base8-8                    2.89µs ± 0%     2.89µs ± 0%      ~     (p=0.810 n=5+5)
Scan/1000/Base8-8                   31.0µs ± 0%     31.0µs ± 0%      ~     (p=0.151 n=5+5)
Scan/10000/Base8-8                   740µs ± 0%      741µs ± 0%    +0.10%  (p=0.008 n=5+5)
Scan/100000/Base8-8                 50.6ms ± 0%     50.6ms ± 0%    +0.08%  (p=0.008 n=5+5)
Scan/10/Base10-8                     487ns ± 0%      487ns ± 0%      ~     (p=0.571 n=5+5)
Scan/100/Base10-8                   2.67µs ± 0%     2.67µs ± 0%      ~     (p=0.810 n=5+5)
Scan/1000/Base10-8                  28.7µs ± 0%     28.7µs ± 0%    +0.06%  (p=0.008 n=5+5)
Scan/10000/Base10-8                  716µs ± 0%      717µs ± 0%      ~     (p=0.222 n=5+5)
Scan/100000/Base10-8                50.3ms ± 0%     50.3ms ± 0%    +0.10%  (p=0.008 n=5+5)
Scan/10/Base16-8                     438ns ± 0%      437ns ± 1%      ~     (p=0.786 n=5+5)
Scan/100/Base16-8                   2.47µs ± 0%     2.47µs ± 0%    -0.19%  (p=0.048 n=5+5)
Scan/1000/Base16-8                  27.2µs ± 0%     27.3µs ± 0%      ~     (p=0.087 n=5+5)
Scan/10000/Base16-8                  722µs ± 0%      722µs ± 0%    +0.11%  (p=0.008 n=5+5)
Scan/100000/Base16-8                52.6ms ± 0%     52.7ms ± 0%    +0.15%  (p=0.008 n=5+5)
String/10/Base2-8                    247ns ± 2%      248ns ± 1%      ~     (p=0.437 n=5+5)
String/100/Base2-8                  1.51µs ± 0%     1.51µs ± 0%    -0.37%  (p=0.024 n=5+5)
String/1000/Base2-8                 13.6µs ± 1%     13.5µs ± 0%      ~     (p=0.095 n=5+5)
String/10000/Base2-8                 135µs ± 0%      135µs ± 1%      ~     (p=0.841 n=5+5)
String/100000/Base2-8               1.32ms ± 1%     1.32ms ± 1%      ~     (p=0.690 n=5+5)
String/10/Base8-8                    169ns ± 1%      169ns ± 1%      ~     (p=1.000 n=5+5)
String/100/Base8-8                   636ns ± 0%      634ns ± 1%      ~     (p=0.413 n=5+5)
String/1000/Base8-8                 5.33µs ± 1%     5.32µs ± 0%      ~     (p=0.222 n=5+5)
String/10000/Base8-8                50.9µs ± 1%     50.7µs ± 0%      ~     (p=0.151 n=5+5)
String/100000/Base8-8                500µs ± 1%      497µs ± 0%      ~     (p=0.421 n=5+5)
String/10/Base10-8                   516ns ± 1%      513ns ± 0%    -0.62%  (p=0.016 n=5+4)
String/100/Base10-8                 1.97µs ± 0%     1.96µs ± 0%      ~     (p=0.667 n=4+5)
String/1000/Base10-8                12.5µs ± 0%     11.5µs ± 0%    -7.92%  (p=0.008 n=5+5)
String/10000/Base10-8               57.7µs ± 0%     52.5µs ± 0%    -8.93%  (p=0.008 n=5+5)
String/100000/Base10-8              25.6ms ± 0%     21.6ms ± 0%   -15.94%  (p=0.008 n=5+5)
String/10/Base16-8                   150ns ± 1%      149ns ± 0%      ~     (p=0.413 n=5+4)
String/100/Base16-8                  514ns ± 1%      514ns ± 1%      ~     (p=0.849 n=5+5)
String/1000/Base16-8                4.01µs ± 0%     4.01µs ± 0%      ~     (p=0.421 n=5+5)
String/10000/Base16-8               37.8µs ± 1%     37.8µs ± 1%      ~     (p=0.841 n=5+5)
String/100000/Base16-8               373µs ± 2%      373µs ± 0%      ~     (p=0.421 n=5+5)
LeafSize/0-8                        6.63ms ± 0%     6.63ms ± 0%      ~     (p=0.730 n=4+5)
LeafSize/1-8                        74.0µs ± 0%     67.7µs ± 1%    -8.53%  (p=0.008 n=5+5)
LeafSize/2-8                        74.2µs ± 0%     68.3µs ± 1%    -7.99%  (p=0.008 n=5+5)
LeafSize/3-8                         379µs ± 0%      309µs ± 0%   -18.52%  (p=0.008 n=5+5)
LeafSize/4-8                        72.7µs ± 1%     66.7µs ± 0%    -8.37%  (p=0.008 n=5+5)
LeafSize/5-8                         471µs ± 0%      384µs ± 0%   -18.55%  (p=0.008 n=5+5)
LeafSize/6-8                         378µs ± 0%      308µs ± 0%   -18.59%  (p=0.008 n=5+5)
LeafSize/7-8                         245µs ± 0%      204µs ± 1%   -16.75%  (p=0.008 n=5+5)
LeafSize/8-8                        73.4µs ± 0%     66.9µs ± 1%    -8.79%  (p=0.008 n=5+5)
LeafSize/9-8                         538µs ± 0%      437µs ± 0%   -18.75%  (p=0.008 n=5+5)
LeafSize/10-8                        472µs ± 0%      396µs ± 1%   -16.01%  (p=0.008 n=5+5)
LeafSize/11-8                        460µs ± 0%      374µs ± 0%   -18.58%  (p=0.008 n=5+5)
LeafSize/12-8                        378µs ± 0%      308µs ± 0%   -18.38%  (p=0.008 n=5+5)
LeafSize/13-8                        343µs ± 0%      284µs ± 0%   -17.30%  (p=0.008 n=5+5)
LeafSize/14-8                        248µs ± 0%      206µs ± 0%   -16.94%  (p=0.008 n=5+5)
LeafSize/15-8                        169µs ± 0%      144µs ± 0%   -14.69%  (p=0.008 n=5+5)
LeafSize/16-8                       72.9µs ± 0%     66.8µs ± 1%    -8.27%  (p=0.008 n=5+5)
LeafSize/32-8                       82.5µs ± 0%     76.7µs ± 0%    -7.04%  (p=0.008 n=5+5)
LeafSize/64-8                        134µs ± 0%      129µs ± 0%    -3.80%  (p=0.008 n=5+5)
ProbablyPrime/n=0-8                 44.2ms ± 0%     43.4ms ± 0%    -1.95%  (p=0.008 n=5+5)
ProbablyPrime/n=1-8                 64.9ms ± 0%     64.0ms ± 0%    -1.27%  (p=0.008 n=5+5)
ProbablyPrime/n=5-8                  147ms ± 0%      146ms ± 0%    -0.58%  (p=0.008 n=5+5)
ProbablyPrime/n=10-8                 250ms ± 0%      249ms ± 0%    -0.35%  (p=0.008 n=5+5)
ProbablyPrime/n=20-8                 456ms ± 0%      455ms ± 0%    -0.18%  (p=0.008 n=5+5)
ProbablyPrime/Lucas-8               23.6ms ± 0%     22.7ms ± 0%    -3.74%  (p=0.008 n=5+5)
ProbablyPrime/MillerRabinBase2-8    20.7ms ± 0%     20.6ms ± 0%      ~     (p=0.421 n=5+5)
FloatSqrt/64-8                      2.25µs ± 1%     2.29µs ± 0%    +1.48%  (p=0.008 n=5+5)
FloatSqrt/128-8                     4.86µs ± 1%     4.92µs ± 1%    +1.21%  (p=0.032 n=5+5)
FloatSqrt/256-8                     13.6µs ± 0%     13.7µs ± 1%    +1.31%  (p=0.032 n=5+5)
FloatSqrt/1000-8                    70.0µs ± 1%     70.1µs ± 0%      ~     (p=0.690 n=5+5)
FloatSqrt/10000-8                   1.92ms ± 0%     1.90ms ± 0%    -0.59%  (p=0.008 n=5+5)
FloatSqrt/100000-8                  55.3ms ± 0%     54.8ms ± 0%    -1.01%  (p=0.008 n=5+5)
FloatSqrt/1000000-8                  4.56s ± 0%      4.50s ± 0%    -1.28%  (p=0.008 n=5+5)

name                              old speed      new speed       delta
AddVV/1-8                         2.97GB/s ± 0%   5.56GB/s ± 0%   +86.85%  (p=0.008 n=5+5)
AddVV/2-8                         9.47GB/s ± 0%  10.66GB/s ± 0%   +12.50%  (p=0.008 n=5+5)
AddVV/3-8                         12.4GB/s ± 0%   14.7GB/s ± 0%   +19.10%  (p=0.008 n=5+5)
AddVV/4-8                         14.6GB/s ± 0%   18.9GB/s ± 0%   +29.63%  (p=0.016 n=4+5)
AddVV/5-8                         16.4GB/s ± 0%   22.0GB/s ± 0%   +34.47%  (p=0.016 n=5+4)
AddVV/10-8                        21.7GB/s ± 0%   35.5GB/s ± 0%   +63.89%  (p=0.008 n=5+5)
AddVV/100-8                       29.4GB/s ± 0%   68.0GB/s ± 0%  +131.38%  (p=0.008 n=5+5)
AddVV/1000-8                      31.7GB/s ± 0%   61.9GB/s ± 0%   +95.43%  (p=0.008 n=5+5)
AddVV/10000-8                     31.2GB/s ± 0%   56.4GB/s ± 0%   +80.83%  (p=0.008 n=5+5)
AddVV/100000-8                    25.9GB/s ± 3%   41.4GB/s ± 0%   +59.98%  (p=0.008 n=5+5)
SubVV/1-8                         2.97GB/s ± 0%   5.56GB/s ± 0%   +86.97%  (p=0.016 n=4+5)
SubVV/2-8                         9.47GB/s ± 0%  10.66GB/s ± 0%   +12.51%  (p=0.008 n=5+5)
SubVV/3-8                         12.4GB/s ± 0%   14.8GB/s ± 0%   +19.23%  (p=0.016 n=4+5)
SubVV/4-8                         14.6GB/s ± 0%   18.9GB/s ± 0%   +29.56%  (p=0.008 n=5+5)
SubVV/5-8                         16.4GB/s ± 0%   22.0GB/s ± 0%   +34.47%  (p=0.016 n=4+5)
SubVV/10-8                        21.7GB/s ± 0%   35.5GB/s ± 0%   +63.89%  (p=0.008 n=5+5)
SubVV/100-8                       29.4GB/s ± 0%   68.0GB/s ± 0%  +131.38%  (p=0.008 n=5+5)
SubVV/1000-8                      31.6GB/s ± 0%   80.1GB/s ± 0%  +153.08%  (p=0.008 n=5+5)
SubVV/10000-8                     31.2GB/s ± 0%   56.7GB/s ± 0%   +81.79%  (p=0.008 n=5+5)
SubVV/100000-8                    29.1GB/s ±10%   29.0GB/s ±18%      ~     (p=0.690 n=5+5)
AddVW/1-8                          859MB/s ± 0%    859MB/s ± 0%    -0.01%  (p=0.008 n=5+5)
AddVW/2-8                          811MB/s ± 1%    814MB/s ± 0%      ~     (p=0.413 n=5+4)
AddVW/3-8                         2.08GB/s ± 0%   2.08GB/s ± 0%      ~     (p=0.206 n=5+5)
AddVW/4-8                         2.46GB/s ± 0%   2.46GB/s ± 0%      ~     (p=0.056 n=5+5)
AddVW/5-8                         2.75GB/s ± 0%   2.75GB/s ± 0%      ~     (p=0.508 n=5+5)
AddVW/10-8                        3.63GB/s ± 0%   3.63GB/s ± 0%      ~     (p=0.214 n=5+5)
AddVW/100-8                       4.79GB/s ± 0%   4.79GB/s ± 0%      ~     (p=0.500 n=5+5)
AddVW/1000-8                      5.27GB/s ± 0%   5.25GB/s ± 0%    -0.43%  (p=0.008 n=5+5)
AddVW/10000-8                     5.30GB/s ± 0%   5.30GB/s ± 0%      ~     (p=0.397 n=5+5)
AddVW/100000-8                    5.27GB/s ± 1%   5.25GB/s ± 1%      ~     (p=0.690 n=5+5)
AddMulVVW/1-8                     1.92GB/s ± 0%   1.96GB/s ± 1%    +1.95%  (p=0.008 n=5+5)
AddMulVVW/2-8                     2.16GB/s ± 1%   2.25GB/s ± 1%    +4.32%  (p=0.008 n=5+5)
AddMulVVW/3-8                     2.39GB/s ± 1%   2.25GB/s ± 3%    -5.79%  (p=0.008 n=5+5)
AddMulVVW/4-8                     2.00GB/s ± 0%   2.31GB/s ± 1%   +15.31%  (p=0.008 n=5+5)
AddMulVVW/5-8                     2.22GB/s ± 0%   2.14GB/s ± 0%    -3.86%  (p=0.008 n=5+5)
AddMulVVW/10-8                    2.15GB/s ± 1%   2.25GB/s ± 0%    +5.03%  (p=0.008 n=5+5)
AddMulVVW/100-8                   2.09GB/s ± 0%   2.14GB/s ± 0%    +2.25%  (p=0.008 n=5+5)
AddMulVVW/1000-8                  2.04GB/s ± 0%   2.38GB/s ± 0%   +16.52%  (p=0.008 n=5+5)
AddMulVVW/10000-8                 2.03GB/s ± 0%   2.10GB/s ± 0%    +3.64%  (p=0.008 n=5+5)
AddMulVVW/100000-8                2.02GB/s ± 0%   2.02GB/s ± 1%      ~     (p=0.690 n=5+5)

Change-Id: Ie482d67a7dbb5af6f5d81af2b3d9d14bd66336db
Reviewed-on: https://go-review.googlesource.com/77831
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-06 00:22:08 +00:00
Yury Smolsky
adcf2d59ec os/exec: document Process.Kill behaviour
The documentation does not make clear what Process.Kill does, which
leads to recurring confusion about the Cmd.Start and Cmd.Wait methods.

Fixes #24220

Change-Id: I66609d21d2954e195d13648014681530eed8ea6c
Reviewed-on: https://go-review.googlesource.com/98715
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-05 23:47:41 +00:00
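
A minimal sketch of how Process.Kill interacts with Cmd.Start and Cmd.Wait, assuming a Unix-like system with a `sleep` binary on PATH. Per the os package documentation, Kill does not wait for the process to exit, so Wait is still needed afterwards to collect the exit status. The helper name killAndWait is hypothetical and is not part of os/exec.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// killAndWait is a hypothetical helper: it starts the command, kills it
// immediately, and then waits so the process is reaped.
func killAndWait(name string, args ...string) (*os.ProcessState, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	// Kill asks the OS to terminate the process; it does not block until
	// the process has actually exited.
	if err := cmd.Process.Kill(); err != nil {
		return nil, err
	}
	// Wait collects the exit status. It is expected to report an error
	// here because the process did not exit cleanly, so the error is
	// deliberately ignored and the recorded state is returned instead.
	_ = cmd.Wait()
	return cmd.ProcessState, nil
}

func main() {
	// Assumption: "sleep" exists on this system.
	state, err := killAndWait("sleep", "60")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("succeeded:", state.Success())
}
```

Kill only affects the process that was started; any children that process spawned are not covered by this sketch.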
Mostyn Bramley-Moore
32e459a09c path/filepath: use a temp dir in path_test.go
Avoid writing temporary files to GOROOT, since it might be read-only.

Fixes #23881

Change-Id: Iaa38ec404b303f0cf27fdfb7daf1ddd60fd5d1c9
GitHub-Last-Rev: de0211df84
GitHub-Pull-Request: golang/go#24238
Reviewed-on: https://go-review.googlesource.com/98517
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-05 23:38:39 +00:00
Matthew Dempsky
26708439ec cmd/compile: refactor order.go into methods
No functional changes, just changing all the orderfoo functions
into (*Order).foo methods.

Passes toolstash-check.

Change-Id: Ib9833daa98aff3c645ce56794a414f8472689152
Reviewed-on: https://go-review.googlesource.com/98617
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-05 21:25:24 +00:00
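
The shape of the refactoring described above, turning orderfoo functions into (*Order).foo methods, might look like the sketch below. The Order type here is a simplified stand-in and both function bodies are invented for illustration; neither is taken from cmd/compile's order.go.

```go
package main

import "fmt"

// Order is a hypothetical, much-reduced stand-in for the compiler's
// ordering state.
type Order struct {
	out []string
}

// Before: a free function that takes the state as an explicit parameter.
func orderstmt(s string, order *Order) {
	order.out = append(order.out, s)
}

// After: the same logic as a method on *Order, with no change in behavior.
func (o *Order) stmt(s string) {
	o.out = append(o.out, s)
}

func main() {
	before := &Order{}
	orderstmt("x := f()", before)

	after := &Order{}
	after.stmt("x := f()")

	fmt.Println(before.out[0] == after.out[0])
}
```

Because only the call syntax changes, a tool that compares compiler output before and after the change should see identical results.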