1
0
mirror of https://github.com/golang/go synced 2024-11-13 17:50:23 -07:00
Commit Graph

3127 Commits

Author SHA1 Message Date
Ben Shi
e03220a594 cmd/compile: optimize 386 code with FLDPI
FLDPI pushes the constant pi to 387's register stack, which is
more efficient than MOVSSconst/MOVSDconst.

1. This optimization reduces 0.3KB of the total size of pkg/linux_386
(exlcuding cmd/compile).

2. There is little regression in the go1 benchmark.
name                     old time/op    new time/op    delta
BinaryTree17-4              3.30s ± 3%     3.30s ± 2%    ~     (p=0.759 n=40+39)
Fannkuch11-4                3.53s ± 1%     3.54s ± 1%    ~     (p=0.168 n=40+40)
FmtFprintfEmpty-4          45.5ns ± 3%    45.6ns ± 3%    ~     (p=0.553 n=40+40)
FmtFprintfString-4         78.4ns ± 3%    78.3ns ± 3%    ~     (p=0.593 n=40+40)
FmtFprintfInt-4            88.8ns ± 2%    89.9ns ± 2%    ~     (p=0.083 n=40+33)
FmtFprintfIntInt-4          140ns ± 4%     140ns ± 4%    ~     (p=0.656 n=40+40)
FmtFprintfPrefixedInt-4     180ns ± 2%     181ns ± 3%  +0.53%  (p=0.050 n=40+40)
FmtFprintfFloat-4           408ns ± 4%     411ns ± 3%    ~     (p=0.112 n=40+40)
FmtManyArgs-4               599ns ± 3%     602ns ± 3%    ~     (p=0.784 n=40+40)
GobDecode-4                7.24ms ± 6%    7.30ms ± 5%    ~     (p=0.171 n=40+40)
GobEncode-4                6.98ms ± 5%    6.89ms ± 8%    ~     (p=0.107 n=40+40)
Gzip-4                      396ms ± 4%     396ms ± 3%    ~     (p=0.852 n=40+40)
Gunzip-4                   41.3ms ± 3%    41.5ms ± 4%    ~     (p=0.221 n=40+40)
HTTPClientServer-4         63.4µs ± 3%    63.4µs ± 2%    ~     (p=0.895 n=39+40)
JSONEncode-4               17.5ms ± 2%    17.5ms ± 3%    ~     (p=0.090 n=40+40)
JSONDecode-4               60.6ms ± 3%    60.1ms ± 4%    ~     (p=0.184 n=40+40)
Mandelbrot200-4            7.80ms ± 3%    7.78ms ± 2%    ~     (p=0.512 n=40+40)
GoParse-4                  3.30ms ± 3%    3.28ms ± 2%  -0.61%  (p=0.034 n=40+40)
RegexpMatchEasy0_32-4       104ns ± 4%     103ns ± 4%    ~     (p=0.118 n=40+40)
RegexpMatchEasy0_1K-4       850ns ± 2%     848ns ± 2%    ~     (p=0.370 n=40+40)
RegexpMatchEasy1_32-4       112ns ± 4%     112ns ± 4%    ~     (p=0.848 n=40+40)
RegexpMatchEasy1_1K-4      1.04µs ± 4%    1.03µs ± 4%    ~     (p=0.333 n=40+40)
RegexpMatchMedium_32-4      132ns ± 4%     131ns ± 3%    ~     (p=0.527 n=40+40)
RegexpMatchMedium_1K-4     43.4µs ± 3%    43.5µs ± 3%    ~     (p=0.111 n=40+40)
RegexpMatchHard_32-4       2.24µs ± 4%    2.24µs ± 4%    ~     (p=0.441 n=40+40)
RegexpMatchHard_1K-4       67.9µs ± 3%    68.0µs ± 3%    ~     (p=0.095 n=40+40)
Revcomp-4                   1.84s ± 2%     1.84s ± 2%    ~     (p=0.677 n=40+40)
Template-4                 68.4ms ± 3%    68.6ms ± 3%    ~     (p=0.345 n=40+40)
TimeParse-4                 433ns ± 3%     433ns ± 3%    ~     (p=0.403 n=40+40)
TimeFormat-4                407ns ± 3%     406ns ± 3%    ~     (p=0.900 n=40+40)
[Geo mean]                 67.1µs         67.2µs       +0.04%

name                     old speed      new speed      delta
GobDecode-4               106MB/s ± 5%   105MB/s ± 5%    ~     (p=0.173 n=40+40)
GobEncode-4               110MB/s ± 5%   112MB/s ± 9%    ~     (p=0.104 n=40+40)
Gzip-4                   49.0MB/s ± 4%  49.1MB/s ± 4%    ~     (p=0.836 n=40+40)
Gunzip-4                  471MB/s ± 3%   468MB/s ± 4%    ~     (p=0.218 n=40+40)
JSONEncode-4              111MB/s ± 2%   111MB/s ± 3%    ~     (p=0.090 n=40+40)
JSONDecode-4             32.0MB/s ± 3%  32.3MB/s ± 4%    ~     (p=0.194 n=40+40)
GoParse-4                17.6MB/s ± 3%  17.7MB/s ± 2%  +0.62%  (p=0.035 n=40+40)
RegexpMatchEasy0_32-4     307MB/s ± 4%   309MB/s ± 4%  +0.70%  (p=0.041 n=40+40)
RegexpMatchEasy0_1K-4    1.20GB/s ± 3%  1.21GB/s ± 2%    ~     (p=0.353 n=40+40)
RegexpMatchEasy1_32-4     285MB/s ± 3%   284MB/s ± 4%    ~     (p=0.384 n=40+40)
RegexpMatchEasy1_1K-4     988MB/s ± 4%   992MB/s ± 3%    ~     (p=0.335 n=40+40)
RegexpMatchMedium_32-4   7.56MB/s ± 4%  7.57MB/s ± 4%    ~     (p=0.314 n=40+40)
RegexpMatchMedium_1K-4   23.6MB/s ± 3%  23.6MB/s ± 3%    ~     (p=0.107 n=40+40)
RegexpMatchHard_32-4     14.3MB/s ± 4%  14.3MB/s ± 4%    ~     (p=0.429 n=40+40)
RegexpMatchHard_1K-4     15.1MB/s ± 3%  15.1MB/s ± 3%    ~     (p=0.099 n=40+40)
Revcomp-4                 138MB/s ± 2%   138MB/s ± 2%    ~     (p=0.658 n=40+40)
Template-4               28.4MB/s ± 3%  28.3MB/s ± 3%    ~     (p=0.331 n=40+40)
[Geo mean]               80.8MB/s       80.8MB/s       +0.09%

Change-Id: I0cb715eead68ade097a302e7fb80ccbd1d1b511e
Reviewed-on: https://go-review.googlesource.com/130975
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-25 02:39:49 +00:00
Ben Shi
3bc34385fa cmd/compile: introduce more read-modify-write operations for amd64
Add suport of read-modify-write for AND/SUB/AND/OR/XOR on amd64.

1. The total size of pkg/linux_amd64 decreases about 4KB, excluding
cmd/compile.

2. The go1 benchmark shows a little improvement, excluding noise.

name                     old time/op    new time/op    delta
BinaryTree17-4              2.63s ± 3%     2.65s ± 4%   +1.01%  (p=0.037 n=35+35)
Fannkuch11-4                2.33s ± 2%     2.39s ± 2%   +2.49%  (p=0.000 n=35+35)
FmtFprintfEmpty-4          45.4ns ± 5%    40.8ns ± 6%  -10.09%  (p=0.000 n=35+35)
FmtFprintfString-4         73.3ns ± 4%    70.9ns ± 3%   -3.23%  (p=0.000 n=30+35)
FmtFprintfInt-4            79.9ns ± 4%    79.5ns ± 3%     ~     (p=0.736 n=34+35)
FmtFprintfIntInt-4          126ns ± 4%     125ns ± 4%     ~     (p=0.083 n=35+35)
FmtFprintfPrefixedInt-4     152ns ± 6%     152ns ± 3%     ~     (p=0.855 n=34+35)
FmtFprintfFloat-4           215ns ± 4%     213ns ± 4%     ~     (p=0.066 n=35+35)
FmtManyArgs-4               522ns ± 3%     506ns ± 3%   -3.15%  (p=0.000 n=35+35)
GobDecode-4                6.45ms ± 8%    6.51ms ± 7%   +0.96%  (p=0.026 n=35+35)
GobEncode-4                6.10ms ± 6%    6.02ms ± 8%     ~     (p=0.160 n=35+35)
Gzip-4                      228ms ± 3%     221ms ± 3%   -2.92%  (p=0.000 n=35+35)
Gunzip-4                   37.5ms ± 4%    37.2ms ± 3%   -0.78%  (p=0.036 n=35+35)
HTTPClientServer-4         58.7µs ± 2%    59.2µs ± 1%   +0.80%  (p=0.000 n=33+33)
JSONEncode-4               12.0ms ± 3%    12.2ms ± 3%   +1.84%  (p=0.008 n=35+35)
JSONDecode-4               57.0ms ± 4%    56.6ms ± 3%     ~     (p=0.320 n=35+35)
Mandelbrot200-4            3.82ms ± 3%    3.79ms ± 3%     ~     (p=0.074 n=35+35)
GoParse-4                  3.21ms ± 5%    3.24ms ± 4%     ~     (p=0.119 n=35+35)
RegexpMatchEasy0_32-4      76.3ns ± 4%    75.4ns ± 4%   -1.14%  (p=0.014 n=34+33)
RegexpMatchEasy0_1K-4       251ns ± 4%     254ns ± 3%   +1.28%  (p=0.016 n=35+35)
RegexpMatchEasy1_32-4      69.6ns ± 3%    70.1ns ± 3%   +0.82%  (p=0.005 n=35+35)
RegexpMatchEasy1_1K-4       367ns ± 4%     376ns ± 4%   +2.47%  (p=0.000 n=35+35)
RegexpMatchMedium_32-4      108ns ± 5%     104ns ± 4%   -3.18%  (p=0.000 n=35+35)
RegexpMatchMedium_1K-4     33.8µs ± 3%    32.7µs ± 3%   -3.27%  (p=0.000 n=35+35)
RegexpMatchHard_32-4       1.55µs ± 3%    1.52µs ± 3%   -1.64%  (p=0.000 n=35+35)
RegexpMatchHard_1K-4       46.6µs ± 3%    46.6µs ± 4%     ~     (p=0.149 n=35+35)
Revcomp-4                   416ms ± 7%     412ms ± 6%   -0.95%  (p=0.033 n=33+35)
Template-4                 64.3ms ± 3%    62.4ms ± 7%   -2.94%  (p=0.000 n=35+35)
TimeParse-4                 320ns ± 2%     322ns ± 3%     ~     (p=0.589 n=35+35)
TimeFormat-4                300ns ± 3%     300ns ± 3%     ~     (p=0.597 n=35+35)
[Geo mean]                 47.4µs         47.0µs        -0.86%

name                     old speed      new speed      delta
GobDecode-4               119MB/s ± 7%   118MB/s ± 7%   -0.96%  (p=0.027 n=35+35)
GobEncode-4               126MB/s ± 7%   127MB/s ± 6%     ~     (p=0.157 n=34+34)
Gzip-4                   85.3MB/s ± 3%  87.9MB/s ± 3%   +3.02%  (p=0.000 n=35+35)
Gunzip-4                  518MB/s ± 4%   522MB/s ± 3%   +0.79%  (p=0.037 n=35+35)
JSONEncode-4              162MB/s ± 3%   159MB/s ± 3%   -1.81%  (p=0.009 n=35+35)
JSONDecode-4             34.1MB/s ± 4%  34.3MB/s ± 3%     ~     (p=0.318 n=35+35)
GoParse-4                18.0MB/s ± 5%  17.9MB/s ± 4%     ~     (p=0.117 n=35+35)
RegexpMatchEasy0_32-4     419MB/s ± 3%   425MB/s ± 4%   +1.46%  (p=0.003 n=32+33)
RegexpMatchEasy0_1K-4    4.07GB/s ± 4%  4.02GB/s ± 3%   -1.28%  (p=0.014 n=35+35)
RegexpMatchEasy1_32-4     460MB/s ± 3%   456MB/s ± 4%   -0.82%  (p=0.004 n=35+35)
RegexpMatchEasy1_1K-4    2.79GB/s ± 4%  2.72GB/s ± 4%   -2.39%  (p=0.000 n=35+35)
RegexpMatchMedium_32-4   9.23MB/s ± 4%  9.53MB/s ± 4%   +3.16%  (p=0.000 n=35+35)
RegexpMatchMedium_1K-4   30.3MB/s ± 3%  31.3MB/s ± 3%   +3.38%  (p=0.000 n=35+35)
RegexpMatchHard_32-4     20.7MB/s ± 3%  21.0MB/s ± 3%   +1.67%  (p=0.000 n=35+35)
RegexpMatchHard_1K-4     22.0MB/s ± 3%  21.9MB/s ± 4%     ~     (p=0.277 n=35+33)
Revcomp-4                 612MB/s ± 7%   618MB/s ± 6%   +0.96%  (p=0.034 n=33+35)
Template-4               30.2MB/s ± 3%  31.1MB/s ± 6%   +3.05%  (p=0.000 n=35+35)
[Geo mean]                123MB/s        124MB/s        +0.64%

Change-Id: Ia025da272e07d0069413824bfff3471b106d6280
Reviewed-on: https://go-review.googlesource.com/121535
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-24 23:38:25 +00:00
Kazuhiro Sera
ad644d2e86 all: fix typos detected by github.com/client9/misspell
Change-Id: Iadb3c5de8ae9ea45855013997ed70f7929a88661
GitHub-Last-Rev: ae85bcf82b
GitHub-Pull-Request: golang/go#26920
Reviewed-on: https://go-review.googlesource.com/128955
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-08-23 15:54:07 +00:00
Yury Smolsky
1484270aec test: restore tests for the reject unsafe code option
Tests in test/safe were neglected after moving to the run.go
framework. This change restores them.

These tests are skipped for go/types via -+ option.

Fixes #25668

Change-Id: I8fe26574a76fa7afa8664c467d7c2e6334f1bba9
Reviewed-on: https://go-review.googlesource.com/124660
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-08-22 17:54:09 +00:00
Yury Smolsky
02fecd33f6 test: remove errchk, the perl script
gc tests do not depend on errchk.

Fixes #25669

Change-Id: I99eb87bb9677897b9167d4fc9a6321fa66cd9116
Reviewed-on: https://go-review.googlesource.com/115955
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-22 16:28:36 +00:00
Ben Shi
a0a7e9fc0c cmd/compile: implement "OPC $imm, (mem)" for 386
New read-modify-write operations are introduced in this CL for 386.

1. The total size of pkg/linux_386 decreases about 10KB (excluding
cmd/compile).

2. The go1 benchmark shows little regression.
name                     old time/op    new time/op    delta
BinaryTree17-4              3.32s ± 4%     3.29s ± 2%    ~     (p=0.059 n=30+30)
Fannkuch11-4                3.49s ± 1%     3.46s ± 1%  -0.92%  (p=0.001 n=30+30)
FmtFprintfEmpty-4          47.7ns ± 2%    46.8ns ± 5%  -1.93%  (p=0.011 n=25+30)
FmtFprintfString-4         79.5ns ± 7%    80.2ns ± 3%  +0.89%  (p=0.001 n=28+29)
FmtFprintfInt-4            90.5ns ± 2%    92.1ns ± 2%  +1.82%  (p=0.014 n=22+30)
FmtFprintfIntInt-4          141ns ± 1%     144ns ± 3%  +2.23%  (p=0.013 n=22+30)
FmtFprintfPrefixedInt-4     183ns ± 2%     184ns ± 3%    ~     (p=0.080 n=21+30)
FmtFprintfFloat-4           409ns ± 3%     412ns ± 3%  +0.83%  (p=0.040 n=30+30)
FmtManyArgs-4               597ns ± 6%     607ns ± 4%  +1.71%  (p=0.006 n=30+30)
GobDecode-4                7.21ms ± 5%    7.18ms ± 6%    ~     (p=0.665 n=30+30)
GobEncode-4                7.17ms ± 6%    7.09ms ± 7%    ~     (p=0.117 n=29+30)
Gzip-4                      413ms ± 4%     399ms ± 4%  -3.48%  (p=0.000 n=30+30)
Gunzip-4                   41.3ms ± 4%    41.7ms ± 3%  +1.05%  (p=0.011 n=30+30)
HTTPClientServer-4         63.5µs ± 3%    62.9µs ± 2%  -0.97%  (p=0.017 n=30+27)
JSONEncode-4               20.3ms ± 5%    20.1ms ± 5%  -1.16%  (p=0.004 n=30+30)
JSONDecode-4               66.2ms ± 4%    67.7ms ± 4%  +2.21%  (p=0.000 n=30+30)
Mandelbrot200-4            5.16ms ± 3%    5.18ms ± 3%    ~     (p=0.123 n=30+30)
GoParse-4                  3.23ms ± 2%    3.27ms ± 2%  +1.08%  (p=0.006 n=30+30)
RegexpMatchEasy0_32-4      98.9ns ± 5%    97.1ns ± 4%  -1.83%  (p=0.006 n=30+30)
RegexpMatchEasy0_1K-4       842ns ± 3%     842ns ± 3%    ~     (p=0.550 n=30+30)
RegexpMatchEasy1_32-4       107ns ± 4%     105ns ± 4%  -1.93%  (p=0.012 n=30+30)
RegexpMatchEasy1_1K-4      1.03µs ± 4%    1.04µs ± 4%    ~     (p=0.304 n=30+30)
RegexpMatchMedium_32-4      132ns ± 2%     129ns ± 4%  -2.02%  (p=0.000 n=21+30)
RegexpMatchMedium_1K-4     44.1µs ± 4%    43.8µs ± 3%    ~     (p=0.641 n=30+30)
RegexpMatchHard_32-4       2.26µs ± 4%    2.23µs ± 4%  -1.28%  (p=0.023 n=30+30)
RegexpMatchHard_1K-4       68.1µs ± 3%    68.6µs ± 4%    ~     (p=0.089 n=30+30)
Revcomp-4                   1.85s ± 2%     1.84s ± 2%    ~     (p=0.072 n=30+30)
Template-4                 69.2ms ± 3%    68.5ms ± 3%  -1.04%  (p=0.012 n=30+30)
TimeParse-4                 441ns ± 3%     446ns ± 4%  +1.21%  (p=0.001 n=30+30)
TimeFormat-4                415ns ± 3%     415ns ± 3%    ~     (p=0.436 n=30+30)
[Geo mean]                 67.0µs         66.9µs       -0.17%

name                     old speed      new speed      delta
GobDecode-4               107MB/s ± 5%   107MB/s ± 6%    ~     (p=0.663 n=30+30)
GobEncode-4               107MB/s ± 6%   108MB/s ± 7%    ~     (p=0.117 n=29+30)
Gzip-4                   47.0MB/s ± 4%  48.7MB/s ± 4%  +3.61%  (p=0.000 n=30+30)
Gunzip-4                  470MB/s ± 4%   466MB/s ± 4%  -1.05%  (p=0.011 n=30+30)
JSONEncode-4             95.6MB/s ± 5%  96.7MB/s ± 5%  +1.16%  (p=0.005 n=30+30)
JSONDecode-4             29.3MB/s ± 4%  28.7MB/s ± 4%  -2.17%  (p=0.000 n=30+30)
GoParse-4                17.9MB/s ± 2%  17.7MB/s ± 2%  -1.06%  (p=0.007 n=30+30)
RegexpMatchEasy0_32-4     323MB/s ± 5%   329MB/s ± 4%  +1.93%  (p=0.006 n=30+30)
RegexpMatchEasy0_1K-4    1.22GB/s ± 3%  1.22GB/s ± 3%    ~     (p=0.496 n=30+30)
RegexpMatchEasy1_32-4     298MB/s ± 4%   303MB/s ± 4%  +1.84%  (p=0.017 n=30+30)
RegexpMatchEasy1_1K-4     995MB/s ± 4%   989MB/s ± 4%    ~     (p=0.307 n=30+30)
RegexpMatchMedium_32-4   7.56MB/s ± 4%  7.74MB/s ± 4%  +2.46%  (p=0.000 n=22+30)
RegexpMatchMedium_1K-4   23.2MB/s ± 4%  23.4MB/s ± 3%    ~     (p=0.651 n=30+30)
RegexpMatchHard_32-4     14.2MB/s ± 4%  14.3MB/s ± 4%  +1.29%  (p=0.021 n=30+30)
RegexpMatchHard_1K-4     15.0MB/s ± 3%  14.9MB/s ± 4%    ~     (p=0.069 n=30+29)
Revcomp-4                 138MB/s ± 2%   138MB/s ± 2%    ~     (p=0.072 n=30+30)
Template-4               28.1MB/s ± 3%  28.4MB/s ± 3%  +1.05%  (p=0.012 n=30+30)
[Geo mean]               79.7MB/s       80.2MB/s       +0.60%

Change-Id: I44a1dfc942c9a385904553c4fe1fa8e509c8aa31
Reviewed-on: https://go-review.googlesource.com/120916
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-22 04:12:42 +00:00
Ben Shi
90f2fa0037 cmd/compile: optimize 386 code with MULLload/DIVSSload/DIVSDload
IMULL/DIVSS/DIVSD all can take the source operand from memory
directly. And this CL implement that optimization.

1. The total size of pkg/linux_386 decreases about 84KB (excluding
cmd/compile).

2. The go1 benchmark shows little regression in total (excluding noise).
name                     old time/op    new time/op    delta
BinaryTree17-4              3.29s ± 2%     3.27s ± 4%    ~     (p=0.192 n=30+30)
Fannkuch11-4                3.49s ± 2%     3.54s ± 1%  +1.48%  (p=0.000 n=30+30)
FmtFprintfEmpty-4          45.9ns ± 3%    46.3ns ± 4%  +0.89%  (p=0.037 n=30+30)
FmtFprintfString-4         78.8ns ± 3%    78.7ns ± 4%    ~     (p=0.209 n=30+27)
FmtFprintfInt-4            91.0ns ± 2%    90.3ns ± 2%  -0.82%  (p=0.031 n=30+27)
FmtFprintfIntInt-4          142ns ± 4%     143ns ± 4%    ~     (p=0.136 n=30+30)
FmtFprintfPrefixedInt-4     181ns ± 3%     183ns ± 4%  +1.40%  (p=0.005 n=30+30)
FmtFprintfFloat-4           404ns ± 4%     408ns ± 3%    ~     (p=0.397 n=30+30)
FmtManyArgs-4               601ns ± 3%     609ns ± 5%    ~     (p=0.059 n=30+30)
GobDecode-4                7.21ms ± 5%    7.24ms ± 5%    ~     (p=0.612 n=30+30)
GobEncode-4                6.91ms ± 6%    6.91ms ± 6%    ~     (p=0.797 n=30+30)
Gzip-4                      398ms ± 6%     399ms ± 4%    ~     (p=0.173 n=30+30)
Gunzip-4                   41.7ms ± 3%    41.8ms ± 3%    ~     (p=0.423 n=30+30)
HTTPClientServer-4         62.3µs ± 2%    62.7µs ± 3%    ~     (p=0.085 n=29+30)
JSONEncode-4               21.0ms ± 4%    20.7ms ± 5%  -1.39%  (p=0.014 n=30+30)
JSONDecode-4               66.3ms ± 3%    67.4ms ± 1%  +1.71%  (p=0.003 n=30+24)
Mandelbrot200-4            5.15ms ± 3%    5.16ms ± 3%    ~     (p=0.697 n=30+30)
GoParse-4                  3.24ms ± 3%    3.27ms ± 4%  +0.91%  (p=0.032 n=30+30)
RegexpMatchEasy0_32-4       101ns ± 5%      99ns ± 4%  -1.82%  (p=0.008 n=29+30)
RegexpMatchEasy0_1K-4       848ns ± 4%     841ns ± 2%  -0.77%  (p=0.043 n=30+30)
RegexpMatchEasy1_32-4       106ns ± 6%     106ns ± 3%    ~     (p=0.939 n=29+30)
RegexpMatchEasy1_1K-4      1.02µs ± 3%    1.03µs ± 4%    ~     (p=0.297 n=28+30)
RegexpMatchMedium_32-4      129ns ± 4%     127ns ± 4%    ~     (p=0.073 n=30+30)
RegexpMatchMedium_1K-4     43.9µs ± 3%    43.8µs ± 3%    ~     (p=0.186 n=30+30)
RegexpMatchHard_32-4       2.24µs ± 4%    2.22µs ± 4%    ~     (p=0.332 n=30+29)
RegexpMatchHard_1K-4       68.0µs ± 4%    67.5µs ± 3%    ~     (p=0.290 n=30+30)
Revcomp-4                   1.85s ± 3%     1.85s ± 3%    ~     (p=0.358 n=30+30)
Template-4                 69.6ms ± 3%    70.0ms ± 4%    ~     (p=0.273 n=30+30)
TimeParse-4                 445ns ± 3%     441ns ± 3%    ~     (p=0.494 n=30+30)
TimeFormat-4                412ns ± 3%     412ns ± 6%    ~     (p=0.841 n=30+30)
[Geo mean]                 66.7µs         66.8µs       +0.13%

name                     old speed      new speed      delta
GobDecode-4               107MB/s ± 5%   106MB/s ± 5%    ~     (p=0.615 n=30+30)
GobEncode-4               111MB/s ± 6%   111MB/s ± 6%    ~     (p=0.790 n=30+30)
Gzip-4                   48.8MB/s ± 6%  48.7MB/s ± 4%    ~     (p=0.167 n=30+30)
Gunzip-4                  465MB/s ± 3%   465MB/s ± 3%    ~     (p=0.420 n=30+30)
JSONEncode-4             92.4MB/s ± 4%  93.7MB/s ± 5%  +1.42%  (p=0.015 n=30+30)
JSONDecode-4             29.3MB/s ± 3%  28.8MB/s ± 1%  -1.72%  (p=0.003 n=30+24)
GoParse-4                17.9MB/s ± 3%  17.7MB/s ± 4%  -0.89%  (p=0.037 n=30+30)
RegexpMatchEasy0_32-4     317MB/s ± 8%   324MB/s ± 4%  +2.14%  (p=0.006 n=30+30)
RegexpMatchEasy0_1K-4    1.21GB/s ± 4%  1.22GB/s ± 2%  +0.77%  (p=0.036 n=30+30)
RegexpMatchEasy1_32-4     298MB/s ± 7%   299MB/s ± 4%    ~     (p=0.511 n=30+30)
RegexpMatchEasy1_1K-4    1.00GB/s ± 3%  1.00GB/s ± 4%    ~     (p=0.304 n=28+30)
RegexpMatchMedium_32-4   7.75MB/s ± 4%  7.82MB/s ± 4%    ~     (p=0.089 n=30+30)
RegexpMatchMedium_1K-4   23.3MB/s ± 3%  23.4MB/s ± 3%    ~     (p=0.181 n=30+30)
RegexpMatchHard_32-4     14.3MB/s ± 4%  14.4MB/s ± 4%    ~     (p=0.320 n=30+29)
RegexpMatchHard_1K-4     15.1MB/s ± 4%  15.2MB/s ± 3%    ~     (p=0.273 n=30+30)
Revcomp-4                 137MB/s ± 3%   137MB/s ± 3%    ~     (p=0.352 n=30+30)
Template-4               27.9MB/s ± 3%  27.7MB/s ± 4%    ~     (p=0.277 n=30+30)
[Geo mean]               79.9MB/s       80.1MB/s       +0.15%

Change-Id: I97333cd8ddabb3c7c88ca5aa9e14a005b74d306d
Reviewed-on: https://go-review.googlesource.com/120695
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-22 03:38:38 +00:00
Ben Shi
705f3c74e6 cmd/compile: optimize AMD64 with DIVSSload and DIVSDload
DIVSSload & DIVSDload directly operate on a memory operand. And
binary size can be reduced by them, while the performance is
not affected.

The total size of pkg/linux_amd64 (excluding cmd/compile) decreases
about 6KB.

There is little regression in the go1 benchmark test (excluding noise).
name                     old time/op    new time/op    delta
BinaryTree17-4              2.63s ± 4%     2.62s ± 4%    ~     (p=0.809 n=30+30)
Fannkuch11-4                2.40s ± 2%     2.40s ± 2%    ~     (p=0.109 n=30+30)
FmtFprintfEmpty-4          43.1ns ± 4%    43.2ns ± 9%    ~     (p=0.168 n=30+30)
FmtFprintfString-4         73.6ns ± 4%    74.1ns ± 4%    ~     (p=0.069 n=30+30)
FmtFprintfInt-4            81.0ns ± 3%    81.4ns ± 5%    ~     (p=0.350 n=30+30)
FmtFprintfIntInt-4          127ns ± 4%     129ns ± 4%  +0.99%  (p=0.021 n=30+30)
FmtFprintfPrefixedInt-4     156ns ± 4%     155ns ± 4%    ~     (p=0.415 n=30+30)
FmtFprintfFloat-4           219ns ± 4%     218ns ± 4%    ~     (p=0.071 n=30+30)
FmtManyArgs-4               522ns ± 3%     518ns ± 3%  -0.68%  (p=0.034 n=30+30)
GobDecode-4                6.49ms ± 6%    6.52ms ± 6%    ~     (p=0.832 n=30+30)
GobEncode-4                6.10ms ± 9%    6.14ms ± 7%    ~     (p=0.485 n=30+30)
Gzip-4                      227ms ± 1%     224ms ± 4%    ~     (p=0.484 n=24+30)
Gunzip-4                   37.2ms ± 3%    36.8ms ± 4%    ~     (p=0.889 n=30+30)
HTTPClientServer-4         58.9µs ± 1%    58.7µs ± 2%  -0.42%  (p=0.003 n=28+28)
JSONEncode-4               12.0ms ± 3%    12.0ms ± 4%    ~     (p=0.523 n=30+30)
JSONDecode-4               54.6ms ± 4%    54.5ms ± 4%    ~     (p=0.708 n=30+30)
Mandelbrot200-4            3.78ms ± 4%    3.81ms ± 3%  +0.99%  (p=0.016 n=30+30)
GoParse-4                  3.20ms ± 4%    3.20ms ± 5%    ~     (p=0.994 n=30+30)
RegexpMatchEasy0_32-4      77.0ns ± 4%    75.9ns ± 3%  -1.39%  (p=0.006 n=29+30)
RegexpMatchEasy0_1K-4       255ns ± 4%     253ns ± 4%    ~     (p=0.091 n=30+30)
RegexpMatchEasy1_32-4      69.7ns ± 3%    70.3ns ± 4%    ~     (p=0.120 n=30+30)
RegexpMatchEasy1_1K-4       373ns ± 2%     378ns ± 3%  +1.43%  (p=0.000 n=21+26)
RegexpMatchMedium_32-4      107ns ± 2%     108ns ± 4%  +1.50%  (p=0.012 n=22+30)
RegexpMatchMedium_1K-4     34.0µs ± 1%    34.3µs ± 3%  +1.08%  (p=0.008 n=24+30)
RegexpMatchHard_32-4       1.53µs ± 3%    1.54µs ± 3%    ~     (p=0.234 n=30+30)
RegexpMatchHard_1K-4       46.7µs ± 4%    47.0µs ± 4%    ~     (p=0.420 n=30+30)
Revcomp-4                   411ms ± 7%     415ms ± 6%    ~     (p=0.059 n=30+30)
Template-4                 65.5ms ± 5%    66.9ms ± 4%  +2.21%  (p=0.001 n=30+30)
TimeParse-4                 317ns ± 3%     311ns ± 3%  -1.97%  (p=0.000 n=30+30)
TimeFormat-4                293ns ± 3%     294ns ± 3%    ~     (p=0.243 n=30+30)
[Geo mean]                 47.4µs         47.5µs       +0.17%

name                     old speed      new speed      delta
GobDecode-4               118MB/s ± 5%   118MB/s ± 6%    ~     (p=0.832 n=30+30)
GobEncode-4               125MB/s ± 7%   125MB/s ± 7%    ~     (p=0.625 n=29+30)
Gzip-4                   85.3MB/s ± 1%  86.6MB/s ± 4%    ~     (p=0.486 n=24+30)
Gunzip-4                  522MB/s ± 3%   527MB/s ± 4%    ~     (p=0.889 n=30+30)
JSONEncode-4              162MB/s ± 3%   162MB/s ± 4%    ~     (p=0.520 n=30+30)
JSONDecode-4             35.5MB/s ± 4%  35.6MB/s ± 4%    ~     (p=0.701 n=30+30)
GoParse-4                18.1MB/s ± 4%  18.1MB/s ± 4%    ~     (p=0.891 n=29+30)
RegexpMatchEasy0_32-4     416MB/s ± 4%   422MB/s ± 3%  +1.43%  (p=0.005 n=29+30)
RegexpMatchEasy0_1K-4    4.01GB/s ± 4%  4.04GB/s ± 4%    ~     (p=0.091 n=30+30)
RegexpMatchEasy1_32-4     460MB/s ± 3%   456MB/s ± 5%    ~     (p=0.123 n=30+30)
RegexpMatchEasy1_1K-4    2.74GB/s ± 2%  2.70GB/s ± 3%  -1.33%  (p=0.000 n=22+26)
RegexpMatchMedium_32-4   9.39MB/s ± 3%  9.19MB/s ± 4%  -2.06%  (p=0.001 n=28+30)
RegexpMatchMedium_1K-4   30.1MB/s ± 1%  29.8MB/s ± 3%  -1.04%  (p=0.008 n=24+30)
RegexpMatchHard_32-4     20.9MB/s ± 3%  20.8MB/s ± 3%    ~     (p=0.234 n=30+30)
RegexpMatchHard_1K-4     21.9MB/s ± 4%  21.8MB/s ± 4%    ~     (p=0.420 n=30+30)
Revcomp-4                 619MB/s ± 7%   612MB/s ± 7%    ~     (p=0.059 n=30+30)
Template-4               29.6MB/s ± 4%  29.0MB/s ± 4%  -2.16%  (p=0.002 n=30+30)
[Geo mean]                123MB/s        123MB/s       -0.33%

Change-Id: Ia59e077feae4f2824df79059daea4d0f678e3e4c
Reviewed-on: https://go-review.googlesource.com/120275
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-08-22 02:47:49 +00:00
Ian Lance Taylor
7e64377903 cmd/compile: only support -race and -msan where they work
Consolidate decision about whether -race and -msan options are
supported in cmd/internal/sys. Use consolidated functions in
cmd/compile and cmd/go. Use a copy of them in cmd/dist; cmd/dist can't
import cmd/internal/sys because Go 1.4 doesn't have it.

Fixes #24315

Change-Id: I9cecaed4895eb1a2a49379b4848db40de66d32a9
Reviewed-on: https://go-review.googlesource.com/121816
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-21 03:38:27 +00:00
Ilya Tocar
a3593685cf cmd/compile/internal/ssa: remove useless zero extension
We generate MOVBLZX for byte-sized LoadReg, so
(MOVBQZX (LoadReg (Arg))) is the same as
(LoadReg (Arg)). Remove those zero extension where possible.
Triggers several times during all.bash.

Fixes #25378
Updates #15300

Change-Id: If50656e66f217832a13ee8f49c47997f4fcc093a
Reviewed-on: https://go-review.googlesource.com/115617
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-20 21:38:20 +00:00
Ilya Tocar
c292b32f33 cmd/compile: enable disjoint memmove inlining on amd64
Memmove can use AVX/prefetches/other optional instructions, so
only do it for small sizes, when call overhead dominates.

Change-Id: Ice5e93deb11462217f7fb5fc350b703109bb4090
Reviewed-on: https://go-review.googlesource.com/112517
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2018-08-20 21:10:12 +00:00
Ben Shi
0a382e0b7f cmd/compile: optimize ARMv7 code
"AND $0xffff0000, Rx" will be encoded to 12 bytes.
1. MOVWload from the constant pool to Rtmp
2. AND Rtmp, Rx
3. a 4-byte item in the constant pool

It can be simplified to 8 bytes on ARMv7, since ARMv7 has
"MOVW $imm-16, Rx".
1. MOVW $0xffff, Rtmp
2. BIC Rtmp, Rx

The above optimization also applies to BICconst, ADDconst and
SUBconst.

1. The total size of pkg/android_arm (excluding cmd/compile)
   decreases about 2KB.

2. The go1 benchmark shows no regression, exlcuding noise.
name                     old time/op    new time/op    delta
BinaryTree17-4              25.5s ± 1%     25.2s ± 1%  -0.85%  (p=0.000 n=30+30)
Fannkuch11-4                13.3s ± 0%     13.3s ± 0%  +0.16%  (p=0.000 n=24+25)
FmtFprintfEmpty-4           397ns ± 0%     394ns ± 0%  -0.64%  (p=0.000 n=30+30)
FmtFprintfString-4          679ns ± 0%     678ns ± 0%    ~     (p=0.093 n=30+29)
FmtFprintfInt-4             708ns ± 0%     707ns ± 0%  -0.19%  (p=0.000 n=27+28)
FmtFprintfIntInt-4         1.05µs ± 0%    1.05µs ± 0%  -0.07%  (p=0.001 n=18+30)
FmtFprintfPrefixedInt-4    1.16µs ± 0%    1.15µs ± 0%  -0.41%  (p=0.000 n=29+30)
FmtFprintfFloat-4          2.26µs ± 0%    2.23µs ± 1%  -1.40%  (p=0.000 n=30+30)
FmtManyArgs-4              3.96µs ± 0%    3.95µs ± 0%  -0.29%  (p=0.000 n=29+30)
GobDecode-4                52.9ms ± 2%    53.4ms ± 2%  +0.92%  (p=0.004 n=28+30)
GobEncode-4                49.7ms ± 2%    49.8ms ± 2%    ~     (p=0.890 n=30+26)
Gzip-4                      2.61s ± 0%     2.60s ± 0%  -0.36%  (p=0.000 n=29+29)
Gunzip-4                    312ms ± 0%     311ms ± 0%  -0.13%  (p=0.000 n=30+28)
HTTPClientServer-4         1.02ms ± 8%    1.00ms ± 7%    ~     (p=0.224 n=29+26)
JSONEncode-4                125ms ± 1%     124ms ± 3%  -1.05%  (p=0.000 n=25+30)
JSONDecode-4                432ms ± 1%     436ms ± 2%    ~     (p=0.277 n=26+30)
Mandelbrot200-4            18.4ms ± 0%    18.4ms ± 0%  +0.02%  (p=0.001 n=28+25)
GoParse-4                  22.4ms ± 1%    22.3ms ± 1%  -0.41%  (p=0.000 n=28+28)
RegexpMatchEasy0_32-4       697ns ± 0%     706ns ± 0%  +1.23%  (p=0.000 n=19+30)
RegexpMatchEasy0_1K-4      4.27µs ± 0%    4.26µs ± 0%  -0.06%  (p=0.000 n=30+30)
RegexpMatchEasy1_32-4       741ns ± 0%     735ns ± 0%  -0.86%  (p=0.000 n=26+30)
RegexpMatchEasy1_1K-4      5.49µs ± 0%    5.49µs ± 0%  -0.03%  (p=0.023 n=25+30)
RegexpMatchMedium_32-4     1.05µs ± 2%    1.04µs ± 2%    ~     (p=0.893 n=30+30)
RegexpMatchMedium_1K-4      261µs ± 0%     261µs ± 0%  -0.11%  (p=0.000 n=29+30)
RegexpMatchHard_32-4       14.9µs ± 0%    14.9µs ± 0%  -0.36%  (p=0.000 n=23+29)
RegexpMatchHard_1K-4        446µs ± 0%     445µs ± 0%  -0.17%  (p=0.000 n=30+29)
Revcomp-4                  41.6ms ± 1%    41.7ms ± 1%  +0.27%  (p=0.040 n=28+30)
Template-4                  531ms ± 0%     532ms ± 1%    ~     (p=0.059 n=30+30)
TimeParse-4                3.40µs ± 0%    3.33µs ± 0%  -2.02%  (p=0.000 n=30+30)
TimeFormat-4               6.14µs ± 0%    6.11µs ± 0%  -0.45%  (p=0.000 n=27+29)
[Geo mean]                  384µs          383µs       -0.27%

name                     old speed      new speed      delta
GobDecode-4              14.5MB/s ± 2%  14.4MB/s ± 2%  -0.90%  (p=0.005 n=28+30)
GobEncode-4              15.4MB/s ± 2%  15.4MB/s ± 2%    ~     (p=0.741 n=30+25)
Gzip-4                   7.44MB/s ± 0%  7.47MB/s ± 1%  +0.37%  (p=0.000 n=25+30)
Gunzip-4                 62.3MB/s ± 0%  62.4MB/s ± 0%  +0.13%  (p=0.000 n=30+28)
JSONEncode-4             15.5MB/s ± 1%  15.6MB/s ± 3%  +1.07%  (p=0.000 n=25+30)
JSONDecode-4             4.48MB/s ± 0%  4.46MB/s ± 2%    ~     (p=0.655 n=23+30)
GoParse-4                2.58MB/s ± 1%  2.59MB/s ± 1%  +0.42%  (p=0.000 n=28+29)
RegexpMatchEasy0_32-4    45.9MB/s ± 0%  45.3MB/s ± 0%  -1.23%  (p=0.000 n=28+30)
RegexpMatchEasy0_1K-4     240MB/s ± 0%   240MB/s ± 0%  +0.07%  (p=0.000 n=30+30)
RegexpMatchEasy1_32-4    43.2MB/s ± 0%  43.5MB/s ± 0%  +0.85%  (p=0.000 n=30+28)
RegexpMatchEasy1_1K-4     186MB/s ± 0%   186MB/s ± 0%  +0.03%  (p=0.026 n=25+30)
RegexpMatchMedium_32-4    955kB/s ± 2%   960kB/s ± 2%    ~     (p=0.084 n=30+30)
RegexpMatchMedium_1K-4   3.92MB/s ± 0%  3.93MB/s ± 0%  +0.14%  (p=0.000 n=29+30)
RegexpMatchHard_32-4     2.14MB/s ± 0%  2.15MB/s ± 0%  +0.31%  (p=0.000 n=30+26)
RegexpMatchHard_1K-4     2.30MB/s ± 0%  2.30MB/s ± 0%    ~     (all equal)
Revcomp-4                61.1MB/s ± 1%  60.9MB/s ± 1%  -0.27%  (p=0.039 n=28+30)
Template-4               3.66MB/s ± 0%  3.65MB/s ± 1%  -0.14%  (p=0.045 n=30+30)
[Geo mean]               12.8MB/s       12.8MB/s       +0.04%

Change-Id: I02370e2584b4c041fddd324c97628fd6f0c12183
Reviewed-on: https://go-review.googlesource.com/123179
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-20 15:13:23 +00:00
Ben Shi
556971316f cmd/compile: optimize 386's comparison
CMPL/CMPW/CMPB can take a memory operand on 386, and this CL
implements that optimization.

1. The total size of pkg/linux_386 decreases about 45KB, excluding
cmd/compile.

2. The go1 benchmark shows a little improvement.
name                     old time/op    new time/op    delta
BinaryTree17-4              3.36s ± 2%     3.37s ± 3%    ~     (p=0.537 n=40+40)
Fannkuch11-4                3.59s ± 1%     3.53s ± 2%  -1.58%  (p=0.000 n=40+40)
FmtFprintfEmpty-4          46.0ns ± 3%    45.8ns ± 3%    ~     (p=0.249 n=40+40)
FmtFprintfString-4         80.0ns ± 4%    78.8ns ± 3%  -1.49%  (p=0.001 n=40+40)
FmtFprintfInt-4            89.7ns ± 2%    90.3ns ± 2%  +0.74%  (p=0.003 n=40+40)
FmtFprintfIntInt-4          144ns ± 3%     143ns ± 3%  -0.95%  (p=0.003 n=40+40)
FmtFprintfPrefixedInt-4     181ns ± 4%     180ns ± 2%    ~     (p=0.103 n=40+40)
FmtFprintfFloat-4           412ns ± 3%     408ns ± 4%  -0.97%  (p=0.018 n=40+40)
FmtManyArgs-4               607ns ± 4%     605ns ± 4%    ~     (p=0.148 n=40+40)
GobDecode-4                7.19ms ± 4%    7.24ms ± 5%    ~     (p=0.340 n=40+40)
GobEncode-4                7.04ms ± 9%    6.99ms ± 9%    ~     (p=0.289 n=40+40)
Gzip-4                      400ms ± 6%     398ms ± 5%    ~     (p=0.168 n=40+40)
Gunzip-4                   41.2ms ± 3%    41.7ms ± 3%  +1.40%  (p=0.001 n=40+40)
HTTPClientServer-4         62.5µs ± 1%    62.1µs ± 2%  -0.61%  (p=0.000 n=37+37)
JSONEncode-4               20.7ms ± 4%    20.4ms ± 3%  -1.60%  (p=0.000 n=40+40)
JSONDecode-4               69.4ms ± 4%    69.2ms ± 6%    ~     (p=0.177 n=40+40)
Mandelbrot200-4            5.22ms ± 6%    5.21ms ± 3%    ~     (p=0.531 n=40+40)
GoParse-4                  3.29ms ± 3%    3.28ms ± 3%    ~     (p=0.321 n=40+39)
RegexpMatchEasy0_32-4       104ns ± 4%     103ns ± 7%  -0.89%  (p=0.040 n=40+40)
RegexpMatchEasy0_1K-4       852ns ± 3%     853ns ± 2%    ~     (p=0.357 n=40+40)
RegexpMatchEasy1_32-4       113ns ± 8%     113ns ± 3%    ~     (p=0.906 n=40+40)
RegexpMatchEasy1_1K-4      1.03µs ± 4%    1.03µs ± 5%    ~     (p=0.326 n=40+40)
RegexpMatchMedium_32-4      136ns ± 3%     133ns ± 3%  -2.31%  (p=0.000 n=40+40)
RegexpMatchMedium_1K-4     44.0µs ± 3%    43.7µs ± 3%    ~     (p=0.053 n=40+40)
RegexpMatchHard_32-4       2.27µs ± 3%    2.26µs ± 4%    ~     (p=0.391 n=40+40)
RegexpMatchHard_1K-4       68.0µs ± 3%    68.9µs ± 3%  +1.28%  (p=0.000 n=40+40)
Revcomp-4                   1.86s ± 5%     1.86s ± 2%    ~     (p=0.950 n=40+40)
Template-4                 73.4ms ± 4%    69.9ms ± 7%  -4.78%  (p=0.000 n=40+40)
TimeParse-4                 449ns ± 4%     441ns ± 5%  -1.76%  (p=0.000 n=40+40)
TimeFormat-4                416ns ± 3%     417ns ± 4%    ~     (p=0.304 n=40+40)
[Geo mean]                 67.7µs         67.3µs       -0.55%

name                     old speed      new speed      delta
GobDecode-4               107MB/s ± 4%   106MB/s ± 5%    ~     (p=0.336 n=40+40)
GobEncode-4               109MB/s ± 5%   110MB/s ± 9%    ~     (p=0.142 n=38+40)
Gzip-4                   48.5MB/s ± 5%  48.8MB/s ± 5%    ~     (p=0.172 n=40+40)
Gunzip-4                  472MB/s ± 3%   465MB/s ± 3%  -1.39%  (p=0.001 n=40+40)
JSONEncode-4             93.6MB/s ± 4%  95.1MB/s ± 3%  +1.61%  (p=0.000 n=40+40)
JSONDecode-4             28.0MB/s ± 3%  28.1MB/s ± 6%    ~     (p=0.181 n=40+40)
GoParse-4                17.6MB/s ± 3%  17.7MB/s ± 3%    ~     (p=0.350 n=40+39)
RegexpMatchEasy0_32-4     308MB/s ± 4%   311MB/s ± 6%  +0.96%  (p=0.025 n=40+40)
RegexpMatchEasy0_1K-4    1.20GB/s ± 3%  1.20GB/s ± 2%    ~     (p=0.317 n=40+40)
RegexpMatchEasy1_32-4     282MB/s ± 7%   282MB/s ± 3%    ~     (p=0.516 n=40+40)
RegexpMatchEasy1_1K-4     994MB/s ± 4%   991MB/s ± 5%    ~     (p=0.319 n=40+40)
RegexpMatchMedium_32-4   7.31MB/s ± 3%  7.49MB/s ± 3%  +2.46%  (p=0.000 n=40+40)
RegexpMatchMedium_1K-4   23.3MB/s ± 3%  23.4MB/s ± 3%    ~     (p=0.052 n=40+40)
RegexpMatchHard_32-4     14.1MB/s ± 3%  14.1MB/s ± 4%    ~     (p=0.391 n=40+40)
RegexpMatchHard_1K-4     15.1MB/s ± 3%  14.9MB/s ± 3%  -1.27%  (p=0.000 n=40+40)
Revcomp-4                 137MB/s ± 5%   137MB/s ± 2%    ~     (p=0.942 n=40+40)
Template-4               26.5MB/s ± 4%  27.8MB/s ± 7%  +5.03%  (p=0.000 n=40+40)
[Geo mean]               78.6MB/s       79.0MB/s       +0.57%

Change-Id: Idcacc6881ef57cd7dc33aa87b711282842b72a53
Reviewed-on: https://go-review.googlesource.com/126618
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-20 14:23:22 +00:00
Ben Shi
b75c5c5992 cmd/compile: optimize AMD64 with more read-modify-write operations
6 more operations which do read-modify-write with a constant
source operand are added.

1. The total size of pkg/linux_amd64 decreases about 3KB, excluding
cmd/compile.

2. The go1 benckmark shows a slight improvement.
name                     old time/op    new time/op    delta
BinaryTree17-4              2.61s ± 4%     2.67s ± 2%  +2.26%  (p=0.000 n=30+29)
Fannkuch11-4                2.39s ± 2%     2.32s ± 2%  -2.67%  (p=0.000 n=30+30)
FmtFprintfEmpty-4          44.0ns ± 4%    41.7ns ± 4%  -5.15%  (p=0.000 n=30+30)
FmtFprintfString-4         74.2ns ± 4%    72.3ns ± 4%  -2.59%  (p=0.000 n=30+30)
FmtFprintfInt-4            81.7ns ± 3%    78.8ns ± 4%  -3.54%  (p=0.000 n=27+30)
FmtFprintfIntInt-4          130ns ± 4%     124ns ± 5%  -4.60%  (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4     154ns ± 3%     152ns ± 3%  -1.13%  (p=0.012 n=30+30)
FmtFprintfFloat-4           215ns ± 4%     212ns ± 5%  -1.56%  (p=0.002 n=30+30)
FmtManyArgs-4               522ns ± 3%     512ns ± 3%  -1.84%  (p=0.001 n=30+30)
GobDecode-4                6.42ms ± 5%    6.49ms ± 7%    ~     (p=0.070 n=30+30)
GobEncode-4                6.07ms ± 8%    5.98ms ± 8%    ~     (p=0.150 n=30+30)
Gzip-4                      236ms ± 4%     223ms ± 4%  -5.57%  (p=0.000 n=30+30)
Gunzip-4                   37.4ms ± 3%    36.7ms ± 4%  -2.03%  (p=0.000 n=30+30)
HTTPClientServer-4         58.7µs ± 1%    58.5µs ± 2%  -0.37%  (p=0.018 n=30+29)
JSONEncode-4               12.0ms ± 4%    12.1ms ± 3%    ~     (p=0.112 n=30+30)
JSONDecode-4               54.5ms ± 3%    55.5ms ± 4%  +1.80%  (p=0.006 n=30+30)
Mandelbrot200-4            3.78ms ± 4%    3.78ms ± 4%    ~     (p=0.173 n=30+30)
GoParse-4                  3.16ms ± 5%    3.22ms ± 5%  +1.75%  (p=0.010 n=30+30)
RegexpMatchEasy0_32-4      76.6ns ± 1%    75.9ns ± 3%    ~     (p=0.672 n=25+30)
RegexpMatchEasy0_1K-4       252ns ± 3%     253ns ± 3%  +0.57%  (p=0.027 n=30+30)
RegexpMatchEasy1_32-4      69.8ns ± 4%    70.2ns ± 6%    ~     (p=0.539 n=30+30)
RegexpMatchEasy1_1K-4       374ns ± 3%     373ns ± 5%    ~     (p=0.263 n=30+30)
RegexpMatchMedium_32-4      107ns ± 4%     109ns ± 3%    ~     (p=0.067 n=30+30)
RegexpMatchMedium_1K-4     33.9µs ± 5%    34.1µs ± 4%    ~     (p=0.297 n=30+30)
RegexpMatchHard_32-4       1.54µs ± 3%    1.56µs ± 4%  +1.43%  (p=0.002 n=30+30)
RegexpMatchHard_1K-4       46.6µs ± 3%    47.0µs ± 3%    ~     (p=0.055 n=30+30)
Revcomp-4                   411ms ± 6%     407ms ± 6%    ~     (p=0.219 n=30+30)
Template-4                 66.8ms ± 3%    64.8ms ± 5%  -3.01%  (p=0.000 n=30+30)
TimeParse-4                 312ns ± 2%     319ns ± 3%  +2.50%  (p=0.000 n=30+30)
TimeFormat-4                296ns ± 5%     299ns ± 3%  +0.93%  (p=0.005 n=30+30)
[Geo mean]                 47.5µs         47.1µs       -0.75%

name                     old speed      new speed      delta
GobDecode-4               120MB/s ± 5%   118MB/s ± 6%    ~     (p=0.072 n=30+30)
GobEncode-4               127MB/s ± 8%   129MB/s ± 8%    ~     (p=0.150 n=30+30)
Gzip-4                   82.1MB/s ± 4%  87.0MB/s ± 4%  +5.90%  (p=0.000 n=30+30)
Gunzip-4                  519MB/s ± 4%   529MB/s ± 4%  +2.07%  (p=0.001 n=30+30)
JSONEncode-4              162MB/s ± 4%   161MB/s ± 3%    ~     (p=0.110 n=30+30)
JSONDecode-4             35.6MB/s ± 3%  35.0MB/s ± 4%  -1.77%  (p=0.007 n=30+30)
GoParse-4                18.3MB/s ± 4%  18.0MB/s ± 4%  -1.72%  (p=0.009 n=30+30)
RegexpMatchEasy0_32-4     418MB/s ± 1%   422MB/s ± 3%    ~     (p=0.645 n=25+30)
RegexpMatchEasy0_1K-4    4.06GB/s ± 3%  4.04GB/s ± 3%  -0.57%  (p=0.033 n=30+30)
RegexpMatchEasy1_32-4     459MB/s ± 4%   456MB/s ± 6%    ~     (p=0.530 n=30+30)
RegexpMatchEasy1_1K-4    2.73GB/s ± 3%  2.75GB/s ± 5%    ~     (p=0.279 n=30+30)
RegexpMatchMedium_32-4   9.28MB/s ± 5%  9.18MB/s ± 4%    ~     (p=0.086 n=30+30)
RegexpMatchMedium_1K-4   30.2MB/s ± 4%  30.0MB/s ± 4%    ~     (p=0.300 n=30+30)
RegexpMatchHard_32-4     20.8MB/s ± 3%  20.5MB/s ± 4%  -1.41%  (p=0.002 n=30+30)
RegexpMatchHard_1K-4     22.0MB/s ± 3%  21.8MB/s ± 3%    ~     (p=0.051 n=30+30)
Revcomp-4                 619MB/s ± 7%   625MB/s ± 7%    ~     (p=0.219 n=30+30)
Template-4               29.0MB/s ± 3%  29.9MB/s ± 4%  +3.11%  (p=0.000 n=30+30)
[Geo mean]                123MB/s        123MB/s       +0.28%

Change-Id: I850652cfd53329c1af804b7f57f4393d8097bb0d
Reviewed-on: https://go-review.googlesource.com/121135
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-08-20 14:18:39 +00:00
Richard Musiol
81555cb4f3 cmd/compile/internal/gc: add nil check for closure call on wasm
This commit adds an explicit nil check for closure calls on wasm,
so calling a nil func causes a proper panic instead of crashing on the
WebAssembly level.

Change-Id: I6246844f316677976cdd420618be5664444c25ae
Reviewed-on: https://go-review.googlesource.com/127759
Run-TryBot: Richard Musiol <neelance@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-14 09:19:38 +00:00
Ben Shi
de0e72610b test/codegen: add more combined store tests for arm64
Some combined store optimization was already implemented
in go-1.11, but there is no corresponding test cases.

Change-Id: Iebdad186e92047942e53a74f2c20b390922e1e9c
Reviewed-on: https://go-review.googlesource.com/122915
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-02 04:23:45 +00:00
Keith Randall
5fc70b6fac cmd/compile: set stricter inlining threshold in large functions
If we're compiling a large function, be more picky about how big
the function we're inlining is.  If the function is >5000 nodes,
we lower the inlining threshold from a cost of 80 to 20.

Turns out reflect.Value's cost is exactly 80.  That's the function
at issue in #26546.

20 was chosen as a proxy for "inlined body is smaller than the call would be".
Simple functions still get inlined, like this one at cost 7:

func ifaceIndir(t *rtype) bool {
	return t.kind&kindDirectIface == 0
}

5000 nodes was chosen as the big function size.  Here are all the
5000+ node (~~1000+ lines) functions in the stdlib:

5187 cmd/internal/obj/arm (*ctxt5).asmout
6879 cmd/internal/obj/s390x (*ctxtz).asmout
6567 cmd/internal/obj/ppc64 (*ctxt9).asmout
9643 cmd/internal/obj/arm64 (*ctxt7).asmout
5042 cmd/internal/obj/x86 (*AsmBuf).doasm
8768 cmd/compile/internal/ssa rewriteBlockAMD64
8878 cmd/compile/internal/ssa rewriteBlockARM
8344 cmd/compile/internal/ssa rewriteValueARM64_OpARM64OR_20
7916 cmd/compile/internal/ssa rewriteValueARM64_OpARM64OR_30
5427 cmd/compile/internal/ssa rewriteBlockARM64
5126 cmd/compile/internal/ssa rewriteValuePPC64_OpPPC64OR_50
6152 cmd/compile/internal/ssa rewriteValuePPC64_OpPPC64OR_60
6412 cmd/compile/internal/ssa rewriteValuePPC64_OpPPC64OR_70
6486 cmd/compile/internal/ssa rewriteValuePPC64_OpPPC64OR_80
6534 cmd/compile/internal/ssa rewriteValuePPC64_OpPPC64OR_90
6534 cmd/compile/internal/ssa rewriteValuePPC64_OpPPC64OR_100
6534 cmd/compile/internal/ssa rewriteValuePPC64_OpPPC64OR_110
6675 cmd/compile/internal/gc typecheck1
5433 cmd/compile/internal/gc walkexpr
14070 cmd/vendor/golang.org/x/arch/arm64/arm64asm decodeArg

There are a lot more smaller (~1000 node) functions in the stdlib.
The function in #26546 has 12477 nodes.

At some point it might be nice to have a better heuristic for "inlined
body is smaller than the call", a non-cliff way to scale down the cost
as the function gets bigger, doing cheaper inlined calls first, etc.
All that can wait for another release. I'd like to do this CL for
1.11.

Fixes #26546
Update #17566

Change-Id: Idda13020e46ec2b28d79a17217f44b189f8139ac
Reviewed-on: https://go-review.googlesource.com/125516
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-07-24 16:11:08 +00:00
Cherry Zhang
48c79734ff test: add test for gccgo bug #26495
Gccgo produced incorrect order of evaluation for expressions
involving &&, || subexpressions. The fix is CL 125299.

Updates #26495.

Change-Id: I18d873281709f3160b3e09f0b2e46f5c120e1cab
Reviewed-on: https://go-review.googlesource.com/125301
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-07-20 20:08:15 +00:00
Keith Randall
59d0baeb33 cmd/compile: add test for OPmodify ops clobbering flags
Code fix was in CL 122556.  This is a corresponding test case.

Fixes #26426

Change-Id: Ib8769f367aed8bead029da0a8d2ddccee1d1dccb
Reviewed-on: https://go-review.googlesource.com/124535
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-07-19 16:24:53 +00:00
Daniel Martí
ba6974fdc3 cmd/compile: fix crash on invalid struct literal
If one tries to use promoted fields in a struct literal, the compiler
errors correctly. However, if the embedded fields are of struct pointer
type, the field.Type.Sym.Name expression below panics.

This is because field.Type.Sym is nil in that case. We can simply use
field.Sym.Name in this piece of code though, as it only concerns
embedded fields, in which case what we are after is the field name.

Added a test mirroring fixedbugs/issue23609.go, but with pointer types.

Fixes #26416.

Change-Id: Ia46ce62995c9e1653f315accb99d592aff2f285e
Reviewed-on: https://go-review.googlesource.com/124395
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-07-18 20:32:04 +00:00
Ben Shi
f6ce1e2aa5 cmd/compile: fix an arm64's comparison bug
The arm64 backend generates "TST" for "if uint32(a)&uint32(b) == 0",
which should be "TSTW".

fixes #26438

Change-Id: I7d64c30e3a840b43486bcd10eea2e3e75aaa4857
Reviewed-on: https://go-review.googlesource.com/124637
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-07-18 14:15:05 +00:00
Michael Munday
1546ab5a39 cmd/compile: keep autos if their address reaches a control value
Autos must be kept if their address reaches the control value of a
block. We didn't see this before because it is rare for an auto's
address to reach a control value without also reaching a phi or
being written to memory. We can probably optimize away the
comparisons that lead to this scenario since autos cannot alias
with pointers from elsewhere, however for now we take the
conservative approach and just ensure the auto is properly
initialised if its address reaches a control value.

Fixes #26407.

Change-Id: I02265793f010a9e001c3e1a5397c290c6769d4de
Reviewed-on: https://go-review.googlesource.com/124335
Reviewed-by: David Chase <drchase@google.com>
2018-07-17 14:58:54 +00:00
Keith Randall
85cfa4d55e cmd/compile: handle degenerate write barrier case
If both branches of a write barrier test go to the same block,
then there's no unsafe points.

This can only happen if the resulting memory state is somehow dead,
which can only occur in degenerate cases, like infinite loops. No
point in cleaning up the useless branch in these situations.

Fixes #26024.

Change-Id: I93a7df9fdf2fc94c6c4b1fe61180dc4fd4a0871f
Reviewed-on: https://go-review.googlesource.com/123655
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-07-12 19:06:09 +00:00
David Chase
0029cd479e cmd/compile: add LocalAddr that takes SP,mem operands
Lack of a well-defined order between VarDef and related
address operations sometimes causes problems with store order
and write barrier transformations; glitches in the order are
made irreparable (by later optimizations) if the two parts of
the glitch straddle a split in the original block caused by
insertion of a write barrier diamond.

Fix this by creating a LocalAddr for addresses of locals
(what VarDef matters for) that takes a memory input to
help make the order explicit.  Addr is modified to only
be legal for SB operand, so there is no overlap between
Addr and LocalAddr uses (there may be some downstream
cleanup from this).

Changes to generic.rules and rewrite.go ensure that codegen
tests continue to pass; CSE of LocalAddr is impaired, not
quite sure of the cost.

Fixes #26105.

Change-Id: Id4192b4440aa4e9d7ba54a465c456df9b530b515
Reviewed-on: https://go-review.googlesource.com/122483
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-07-12 18:45:31 +00:00
Ian Lance Taylor
964c15f360 test: add test of valid code that gccgo failed to compile
Updates #26340

Change-Id: I3bc7cd544ea77df660bbda7de99a009b63d5be1b
Reviewed-on: https://go-review.googlesource.com/123477
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-07-12 05:29:12 +00:00
Matthew Dempsky
cc422e64d0 cmd/compile: fix ICE due to missing inline function body
For golang.org/cl/74110, I forgot that you can use range-based for
loops to extract key values from a map value.

This wasn't a problem for the binary format importer, because it was
more tolerant about missing inline function bodies. However, the
indexed importer is more particular about this.

We could potentially just make it more lenient like the binary
importer, but tweaking the logic here is easy enough and seems like
the preferable solution.

Fixes #26341.

Change-Id: I54564dcd0be60ea393f8a0f6954b7d3d61e96ee5
Reviewed-on: https://go-review.googlesource.com/123475
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-07-12 00:13:37 +00:00
Ian Lance Taylor
cda1947fd1 runtime: don't say "different packages" if they may not be different
Fix the panic message produced for an interface conversion error to
only say "types from different packages" if they are definitely from
different packges. If they may be from the same package, say "types
from different scopes."

Updates #18911
Fixes #26094

Change-Id: I0cea50ba31007d88e70c067b4680009ede69bab9
Reviewed-on: https://go-review.googlesource.com/123395
Reviewed-by: Austin Clements <austin@google.com>
2018-07-11 21:34:05 +00:00
Ian Lance Taylor
787ff17dea test: add test case that failed with gccgo
Updates #26335

Change-Id: Ibfb1e232a0c66fa699842c8908ae5ff0f5d2177d
Reviewed-on: https://go-review.googlesource.com/123316
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-07-11 18:14:35 +00:00
Ian Lance Taylor
d49572eaf3 test: add order of evaluation test case that gccgo got wrong
Updates #23188

Change-Id: Idc5567546d1c4c592f997a4cebbbf483b85331e0
Reviewed-on: https://go-review.googlesource.com/123115
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-07-10 21:30:26 +00:00
Keith Randall
be9c994609 internal/bytealg: specify argmaps for exported functions
Functions exported on behalf of other packages need to have their
argument stack maps specified explicitly.  They don't get an implicit
map because they are not in the local package, and if they get defer'd
they need argument maps.

Fixes #24419

Change-Id: I35b7d8b4a03d4770ba88699e1007cb3fcb5397a9
Reviewed-on: https://go-review.googlesource.com/122676
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-07-10 17:13:53 +00:00
Ben Shi
f9800a9473 test/codegen: add more test cases for arm64
More test cases of combined load for arm64.

Change-Id: I7a9f4dcec6930f161cbded1f47dbf7fcef1db4f1
Reviewed-on: https://go-review.googlesource.com/122582
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-07-10 15:41:15 +00:00
Cherry Zhang
c801232525 cmd/compile: make sure alg functions are generated when we call them
When DWARF is disabled, some alg functions were not generated.
Make sure they are generated when we about to generate calls to
them.

Fixes #23546.

Change-Id: Iecfa0eea830e42ee92e55268167cefb1540980b2
Reviewed-on: https://go-review.googlesource.com/122403
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-07-10 01:20:45 +00:00
Cherry Zhang
5f256dc8e6 test: add test for gccgo bug #26248
The fix is CL 122756.

Updates #26248.

Change-Id: Ic4250ab5d01da9f65d0bc033e2306343d9c87a99
Reviewed-on: https://go-review.googlesource.com/122757
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-07-10 01:19:37 +00:00
Keith Randall
58d287e5e8 cmd/compile: ensure that loop condition is detected correctly
We need to make sure that the terminating comparison has the right
sense given the increment direction. If the increment is positive,
the terminating comparsion must be < or <=. If the increment is
negative, the terminating comparison must be > or >=.

Do a few cleanups,  like constant-folding entry==0, adding comments,
removing unused "exported" fields.

Fixes #26116

Change-Id: I14230ee8126054b750e2a1f2b18eb8f09873dbd5
Reviewed-on: https://go-review.googlesource.com/121940
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-07-09 18:23:39 +00:00
Matthew Dempsky
8d5fd871d7 cmd/compile: fix "width not calculated" ICE
Expanding interface method sets is handled during width calculation,
which can't be performed concurrently. Make sure that we eagerly
expand interfaces in the frontend when importing them, even if they're
not actually used by code, because we might need to generate a type
description of them.

Fixes #25055.

Change-Id: I6fd2756de2c7d5dbc33056f70b3028ca3aebab41
Reviewed-on: https://go-review.googlesource.com/122517
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-07-06 20:25:52 +00:00
Cherry Zhang
3f54e8537a cmd/compile: run generic deadcode in -N mode
Late opt pass may generate dead stores, which messes up store
chain calculation in later passes. Run generic deadcode even
in -N mode to remove them.

Fixes #26163.

Change-Id: I8276101717bb978d5980e6c7998f53fd8d0ae10f
Reviewed-on: https://go-review.googlesource.com/121856
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-07-02 20:53:23 +00:00
Michael Munday
adfa8b8691 cmd/compile: keep autos whose address reaches a phi
If the address of an auto reaches a phi then any further stores to
the pointer represented by the phi probably need to be kept. This
is because stores to the other arguments to the phi may be visible
to the program.

Fixes #26153.

Change-Id: Ic506c6c543bf70d792e5b1a64bdde1e5fdf1126a
Reviewed-on: https://go-review.googlesource.com/121796
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-07-02 19:43:07 +00:00
David Chase
d4d8237a5b Revert "cmd/compile: make OpAddr depend on VarDef in storeOrder"
This reverts commit 1a27f048ad.

Reason for revert: Broke the ssacheck and -N-l builders, and the -N-l fix looks like it will take some time and take a different route entirely.

Change-Id: Ie0ac5e86ab7d72a303dfbbc48dfdf1e092d4f61a
Reviewed-on: https://go-review.googlesource.com/121715
Reviewed-by: David Chase <drchase@google.com>
2018-06-29 19:48:48 +00:00
David Chase
1a27f048ad cmd/compile: make OpAddr depend on VarDef in storeOrder
Given a carefully constructed input, writebarrier would
split a block with the OpAddr in the first half and the
VarDef in the second half which ultimately leads to a
compiler crash because the scheduler is no longer able
to put them in the proper order.

To fix, recognize the implicit dependence of OpAddr on
the VarDef of the same symbol if any exists.

This fix was chosen over making OpAddr take a memory
operand to make the dependence explicit, because this
change is less invasive at this late part of the 1.11
release cycle.

Fixes #26105.

Change-Id: I9b65460673af3af41740ef877d2fca91acd336bc
Reviewed-on: https://go-review.googlesource.com/121436
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-06-29 15:20:52 +00:00
Cherry Zhang
d21bdf125c cmd/compile: check SSAability in handling of INDEX of 1-element array
SSA can handle 1-element array, but only when the element type
is SSAable. When building SSA for INDEX of 1-element array, we
did not check the element type is SSAable. And when it's not,
it resulted in an unhandled SSA op.

Fixes #26120.

Change-Id: Id709996b5d9d90212f6c56d3f27eed320a4d8360
Reviewed-on: https://go-review.googlesource.com/121496
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-06-29 12:05:05 +00:00
Robert Griesemer
b410ce750e cmd/compile: don't crash in untyped expr to interface conversion
Fixes #24763.

Change-Id: Ibe534271d75b6961d00ebfd7d42c43a3ac650d79
Reviewed-on: https://go-review.googlesource.com/121335
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-06-28 23:01:22 +00:00
Ilya Tocar
b71ea0b7dd cmd/compile: mark CMOVLEQF, CMOVWEQF as cloberring AX
Code generation for OpAMD64CMOV[WLQ]EQF uses AX as a scratch register,
but only CMOVQEQF, correctly lets compiler know. Mark other 2 as
clobbering AX.

Fixes #26097

Change-Id: I2a65bd67bf18a540898b4a0ae6c8766e0b767b19
Reviewed-on: https://go-review.googlesource.com/121336
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-06-28 18:39:56 +00:00
Cherry Zhang
1d303a0086 cmd/compile: fold offset into address on Wasm
On Wasm, the offset was not folded into LoweredAddr, so it was
not rematerializeable. This led to the address-taken operation
in some cases generated too early, before the local variable
becoming live. The liveness code thinks the variable live when
the address is taken, then backs it up to live at function
entry, then complains about it, because nothing other than
arguments should be live on entry.

This CL folds the offset into the address operation, so it is
rematerializeable and so generated right before use, after the
variable actually becomes live.

It might be possible to relax the liveness code not to think a
variable live when its address being taken, but until the address
actually being used. But it would be quite complicated. As we're
late in Go 1.11 freeze, it would be better not to do it. Also,
I think the address operation is rematerializeable now on all
architectures, so this is probably less necessary.

This may also be a slight optimization, as the address+offset is
now rematerializeable, which can be generated on the Wasm stack,
without using any "registers" which are emulated by local
variables on Wasm. I don't know how to do benchmarks on Wasm. At
least, cmd/go binary size shrinks 9K.

Fixes #25966.

Change-Id: I01e5869515d6a3942fccdcb857f924a866876e57
Reviewed-on: https://go-review.googlesource.com/120599
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
2018-06-27 14:30:00 +00:00
David Chase
7e9d55eeeb cmd/compile: avoid remainder in loopbce when increment=0
For non-unit increment, loopbce checks to see if the
increment evenly divides the difference between (constant)
loop start and end.  This test panics when the increment
is zero.

Fix: check for zero, if found, don't optimize the loop.

Also added missing copyright notice to loopbce.go.

Fixes #26043.

Change-Id: I5f460104879cacc94481949234c9ce8c519d6380
Reviewed-on: https://go-review.googlesource.com/120759
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-06-25 20:36:42 +00:00
Matthew Dempsky
f422bea498 cmd/compile: fix compile failure for lazily resolved shadowed types
If expanding an inline function body required lazily expanding a
package-scoped type whose identifier was shadowed within the function
body, the lazy expansion would instead overwrite the local symbol
definition instead of the package-scoped symbol. This was due to
importsym using s.Def instead of s.PkgDef.

Unfortunately, this is yet another consequence of the current awkward
scope handling code.

Passes toolstash-check.

Fixes #25984.

Change-Id: Ia7033e1749a883e6e979c854d4b12b0b28083dd8
Reviewed-on: https://go-review.googlesource.com/120456
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-06-22 17:57:09 +00:00
Cherry Zhang
78a579316b cmd/compile: convert uint32 to int32 in ARM constant folding rules
MOVWconst's AuxInt is Int32. SSA check complains if the AuxInt
does not fit in int32. Convert uint32 to int32 to make it happy.

The generated code is unchanged. MOVW only cares low 32 bits.

Passes "toolstash -cmp" std cmd for ARM.

Fixes #25993.

Change-Id: I2b6532c9c285ea6d89652505fb7c553f85a98864
Reviewed-on: https://go-review.googlesource.com/120335
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-06-22 16:25:32 +00:00
David Chase
c6e455bb11 cmd/compile: conditional on -race, disable inline of go:norace
Adds the appropriate check to inl.go.
Includes tests of both -race+go:norace and plain go:norace.

Fixes #24651.

Change-Id: Id806342430c20baf4679a985d12eea3b677092e0
Reviewed-on: https://go-review.googlesource.com/119195
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-06-19 20:59:10 +00:00
Robert Griesemer
707ca18d97 cmd/compile: more accurate position for select case error message
Fixes #25958.

Change-Id: I1f4808a70c20334ecfc4eb1789f5389d94dcf00e
Reviewed-on: https://go-review.googlesource.com/119755
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-06-19 17:52:23 +00:00
David Chase
c359d759a7 cmd/compile: ensure that operand of ORETURN is not double-walked
Inlining of switch statements into a RETURNed expression
can sometimes lead to the switch being walked twice, which
results in a miscompiled switch statement. The bug depends
on:

1) multiple results
2) named results
3) a return statement whose expression includes a call to a
function containing a switch statement that is inlined.

It may also be significant that the default case of that
switch is a panic(), though that's not proven.

Rearranged the walk case for ORETURN so that double walks are
not possible.  Added a test, because this is so fiddly.
Added a check against double walks, verified that it fires
w/o other fix.

Fixes #25776.

Change-Id: I2d594351fa082632512ef989af67eb887059729b
Reviewed-on: https://go-review.googlesource.com/118318
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-06-14 20:08:10 +00:00
Emmanuel T Odeke
7bc99ffa05 cmd/compile: make case insensitive suggestions aware of package
Ensure that compiler error suggestions after case insensitive
field lookups don't mistakenly reported unexported fields if
those fields aren't in the local package being processed.

Fixes #25727

Change-Id: Icae84388c2a82c8cb539f3d43ad348f50a644caa
Reviewed-on: https://go-review.googlesource.com/117755
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-06-13 23:12:54 +00:00
Brad Fitzpatrick
39ad208c13 test: add test to verify that string copies don't get optimized away
Fixes #25834

Change-Id: I33e58dabfd04b84dfee1a9a3796796b5d19862e7
Reviewed-on: https://go-review.googlesource.com/118295
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-06-12 19:10:34 +00:00
Robert Griesemer
48987baa09 cmd/compile: correct alias cycle detection
The original fix (https://go-review.googlesource.com/c/go/+/35831)
for this issue was incorrect as it reported cycles in cases where
it shouldn't.

Instead, use a different approach: A type cycle containing aliases
is only a cycle if there are no type definitions. As soon as there
is a type definition, alias expansion terminates and there is no
cycle.

Approach: Split sprint_depchain into two non-recursive and more
easily understandable functions (cycleFor and cycleTrace),
and use those instead for cycle reporting. Analyze the cycle
returned by cycleFor before issueing an alias cycle error.

Also: Removed original fix (main.go) which introduced a separate
crash (#23823).

Fixes #18640.
Fixes #23823.
Fixes #24939.

Change-Id: Ic3707a9dec40a71dc928a3e49b4868c5fac3d3b7
Reviewed-on: https://go-review.googlesource.com/118078
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-06-12 18:55:39 +00:00
David Chase
c08b01ecb4 cmd/compile: fix panic-okay-to-inline change; adjust tests
This line of the inlining tuning experiment
https://go-review.googlesource.com/c/go/+/109918/1/src/cmd/compile/internal/gc/inl.go#347
was incorrectly rewritten in a later patch to use the call
cost, not the panic cost, and thus the inlining of panic
didn't occur when it should.  I discovered this when I
realized that tests should have failed, but didn't.

Fix is to make the correct change, and also to modify the
tests that this causes to fail.  One test now asserts the
new normal, the other calls "ppanic" instead which is
designed to behave like panic but not be inlined.

Change-Id: I423bb7f08bd66a70d999826dd9b87027abf34cdf
Reviewed-on: https://go-review.googlesource.com/116656
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-06-06 20:35:23 +00:00
Yury Smolsky
7c1f361e25 test: add comments for all test actions
Every action has a short annotation.
The errorCheck function has a comment adapted from errchk script.

Removed redundant assigments to tmpDir.

Change-Id: Ifdd1284de046a0ce2aad26bd8da8a8e6a7707a8e
Reviewed-on: https://go-review.googlesource.com/115856
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-06-06 15:12:32 +00:00
Richard Musiol
88dc4aee7c cmd/compile: fix OffPtr with negative offset on wasm
The wasm archtecture was missing a rule to handle OffPtr with a
negative offset. This commit makes it so OffPtr always gets lowered
to I64AddConst.

Fixes #25741

Change-Id: I1d48e2954e3ff31deb8cba9a9bf0cab7c4bab71a
Reviewed-on: https://go-review.googlesource.com/116595
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
2018-06-06 14:40:25 +00:00
Robert Griesemer
9be4f312bf cmd/compile: revert internal parameter rename (from ".anonX" to "") before export
In the old binary export format, parameter names for parameter lists
which contained only types where never written, so this problem didn't
come up.

Fixes #25101.

Change-Id: Ia8b817f7f467570b05f88d584e86b6ef4acdccc6
Reviewed-on: https://go-review.googlesource.com/116376
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-06-05 20:22:07 +00:00
Robert Griesemer
4a2bec9726 cmd/compile: fix printing of array types in error messages
Fixes #23094.

Change-Id: I9aa36046488baa5f55cf2099e10cfb39ecd17b06
Reviewed-on: https://go-review.googlesource.com/116256
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-06-05 00:43:22 +00:00
Robert Griesemer
75c1aed345 runtime: slightly better error message for assertion panics with identical looking types
Fixes #18911.

Change-Id: Ice10f37460a4f0a66cddeacfe26c28045f5e60fe
Reviewed-on: https://go-review.googlesource.com/116255
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-06-05 00:29:50 +00:00
Keith Randall
06b326054d cmd/compile: include callee args section when checking frame too large
The stack frame includes the callee args section. At the point where
we were checking the frame size, that part of the frame had not been
computed yet. Move the check later so we can include the callee args size.

Fixes #20780
Update #25507

Change-Id: Iab97cb89b3a24f8ca19b9123ef2a111d6850c3fe
Reviewed-on: https://go-review.googlesource.com/115195
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-06-02 18:00:44 +00:00
Yury Smolsky
7cb1810fe8 test: skip test/fixedbugs/bug345.go on windows
Before the CL 115277 we did not run the test on Windows,
so let's just go back to not running the test on Windows.
There is nothing OS-specific about this test,
so skipping it on Windows doesn't seem like a big deal.

Updates #25693
Fixes #25586

Change-Id: I1eb3e158b322d73e271ef388f8c6e2f2af0a0729
Reviewed-on: https://go-review.googlesource.com/115857
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-06-01 21:46:07 +00:00
Yury Smolsky
57d40f1b27 test: remove rundircmpout and cmpout actions
This CL removes the rundircmpout action completely
because it is not used anywhere.

The run case already looks for output files. Rename the cmpout action
mentioned in tests to the run action and remove "cmpout" from run.go.

Change-Id: I835ceb70082927f8e9360e0ea0ba74f296363ab3
Reviewed-on: https://go-review.googlesource.com/115575
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-31 17:36:45 +00:00
Yury Smolsky
caf968616a test: eliminate use of Perl in fixedbugs/bug345.go
To allow testing of fixedbugs/bug345.go in Go,
a new flag -n is introduced. This flag disables setting
of relative path for local imports and imports search path
to current dir, namely -D . -I . are not passed to the compiler.
Error regexps are fixed to allow running the test in temp directory.

This change eliminates the last place where Perl
script "errchk" was used.

Fixes #25586.

Change-Id: If085f466e6955312d77315f96d3ef1cb68495aef
Reviewed-on: https://go-review.googlesource.com/115277
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-05-31 13:29:50 +00:00
Keith Randall
db9341a024 cmd/compile: update WBLoads during deadcode
When we deadcode-remove a block which is a write barrier test,
remove that block from the list of write barrier test blocks.

Fixes #25516

Change-Id: I1efe732d5476003eab4ad6bf67d0340d7874ff0c
Reviewed-on: https://go-review.googlesource.com/115037
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-29 17:45:36 +00:00
Keith Randall
d15d055054 cmd/compile: reject large argument areas
Extend stack frame limit of 1GB to include large argument/return areas.
Argument/return areas are part of the parent frame, not the frame itself,
so they need to be handled separately.

Fixes #25507.

Change-Id: I309298a58faee3e7c1dac80bd2f1166c82460087
Reviewed-on: https://go-review.googlesource.com/115036
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-29 16:58:05 +00:00
Josh Bleecher Snyder
002c764533 test: gofmt bounds.go
Change-Id: I8b462e20064658120afc8eb1cbac926254d1e24e
Reviewed-on: https://go-review.googlesource.com/114937
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-05-29 02:39:16 +00:00
Yury Smolsky
cb80c28961 test: eliminate use of Perl in test/fixedbugs/bug248.go
This change enables bug248 to be tested with Go code.
For that, it adds a flag -1 to error check and run directory
with one package failing compilation prior the last package
which should be run.

Specifically, the "p" package in bug1.go file was renamed into "q"
to compile them in separate steps,
bug2.go and bug3.go files were reordered,
bug2.go was changed into non-main package.

Updates #25586.

Change-Id: Ie47aacd56ebb2ce4eac66c792d1a53e1e30e637c
Reviewed-on: https://go-review.googlesource.com/114818
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-26 15:20:42 +00:00
Ben Shi
8a85bce215 test/codegen: improve test cases for arm64
1. Some incorrect test cases are disabled.
2. Some wrong test cases are corrected.
3. Some new test cases are added.

Change-Id: Ib5d0473d55159f233ddab79f96967eaec7b08597
Reviewed-on: https://go-review.googlesource.com/113736
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-22 14:50:41 +00:00
Austin Clements
a367f44c18 cmd/compile: enable stack maps everywhere except unsafe points
This modifies issafepoint in liveness analysis to report almost every
operation as a safe point. There are four things we don't mark as
safe-points:

1. Runtime code (other than at calls).

2. go:nosplit functions (other than at calls).

3. Instructions between the load of the write barrier-enabled flag and
   the write.

4. Instructions leading up to a uintptr -> unsafe.Pointer conversion.

We'll optimize this in later CLs:

name        old time/op       new time/op       delta
Template          185ms ± 2%        190ms ± 2%   +2.95%  (p=0.000 n=10+10)
Unicode          96.3ms ± 3%       96.4ms ± 1%     ~     (p=0.905 n=10+9)
GoTypes           658ms ± 0%        669ms ± 1%   +1.72%  (p=0.000 n=10+9)
Compiler          3.14s ± 1%        3.18s ± 1%   +1.56%  (p=0.000 n=9+10)
SSA               7.41s ± 2%        7.59s ± 1%   +2.48%  (p=0.000 n=9+10)
Flate             126ms ± 1%        128ms ± 1%   +2.08%  (p=0.000 n=10+10)
GoParser          153ms ± 1%        157ms ± 2%   +2.38%  (p=0.000 n=10+10)
Reflect           437ms ± 1%        442ms ± 1%   +0.98%  (p=0.001 n=10+10)
Tar               178ms ± 1%        179ms ± 1%   +0.67%  (p=0.035 n=10+9)
XML               223ms ± 1%        229ms ± 1%   +2.58%  (p=0.000 n=10+10)
[Geo mean]        394ms             401ms        +1.75%

No effect on binary size because we're not yet emitting these extra
safe points.

For #24543.

Change-Id: I16a1eebb9183cad7cef9d53c0fd21a973cad6859
Reviewed-on: https://go-review.googlesource.com/109348
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-05-22 14:43:37 +00:00
Austin Clements
837ed98d63 cmd/compile: don't produce a past-the-end pointer in range loops
Currently, range loops over slices and arrays are compiled roughly
like:

for i, x := range s { b }
  ⇓
for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b }
  ⇓
i, _n, _p := 0, len(s), &s[0]
goto cond
body:
{ b }
i, _p = i+1, _p + unsafe.Sizeof(s[0])
cond:
if i < _n { goto body } else { goto end }
end:

The problem with this lowering is that _p may temporarily point past
the end of the allocation the moment before the loop terminates. Right
now this isn't a problem because there's never a safe-point during
this brief moment.

We're about to introduce safe-points everywhere, so this bad pointer
is going to be a problem. We could mark the increment as an unsafe
block, but this inhibits reordering opportunities and could result in
infrequent safe-points if the body is short.

Instead, this CL fixes this by changing how we compile range loops to
never produce this past-the-end pointer. It changes the lowering to
roughly:

i, _n, _p := 0, len(s), &s[0]
if i < _n { goto body } else { goto end }
top:
_p += unsafe.Sizeof(s[0])
body:
{ b }
i++
if i < _n { goto top } else { goto end }
end:

Notably, the increment is split into two parts: we increment the index
before checking the condition, but increment the pointer only *after*
the condition check has succeeded.

The implementation builds on the OFORUNTIL construct that was
introduced during the loop preemption experiments, since OFORUNTIL
places the increment and condition after the loop body. To support the
extra "late increment" step, we further define OFORUNTIL's "List"
field to contain the late increment statements. This makes all of this
a relatively small change.

This depends on the improvements to the prove pass in CL 102603. With
the current lowering, bounds-check elimination knows that i < _n in
the body because the body block is dominated by the cond block. In the
new lowering, deriving this fact requires detecting that i < _n on
*both* paths into body and hence is true in body. CL 102603 made prove
able to detect this.

The code size effect of this is minimal. The cmd/go binary on
linux/amd64 increases by 0.17%. Performance-wise, this actually
appears to be a net win, though it's mostly noise:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.80s ± 0%     2.61s ± 1%  -6.88%  (p=0.000 n=20+18)
Fannkuch11-12                2.41s ± 0%     2.42s ± 0%  +0.05%  (p=0.005 n=20+20)
FmtFprintfEmpty-12          41.6ns ± 5%    41.4ns ± 6%    ~     (p=0.765 n=20+19)
FmtFprintfString-12         69.4ns ± 3%    69.3ns ± 1%    ~     (p=0.084 n=19+17)
FmtFprintfInt-12            76.1ns ± 1%    77.3ns ± 1%  +1.57%  (p=0.000 n=19+19)
FmtFprintfIntInt-12          122ns ± 2%     123ns ± 3%  +0.95%  (p=0.015 n=20+20)
FmtFprintfPrefixedInt-12     153ns ± 2%     151ns ± 3%  -1.27%  (p=0.013 n=20+20)
FmtFprintfFloat-12           215ns ± 0%     216ns ± 0%  +0.47%  (p=0.000 n=20+16)
FmtManyArgs-12               486ns ± 1%     498ns ± 0%  +2.40%  (p=0.000 n=20+17)
GobDecode-12                6.43ms ± 0%    6.50ms ± 0%  +1.08%  (p=0.000 n=18+19)
GobEncode-12                5.43ms ± 1%    5.47ms ± 0%  +0.76%  (p=0.000 n=20+20)
Gzip-12                      218ms ± 1%     218ms ± 1%    ~     (p=0.883 n=20+20)
Gunzip-12                   38.8ms ± 0%    38.9ms ± 0%    ~     (p=0.644 n=19+19)
HTTPClientServer-12         76.2µs ± 1%    76.4µs ± 2%    ~     (p=0.218 n=20+20)
JSONEncode-12               12.2ms ± 0%    12.3ms ± 1%  +0.45%  (p=0.000 n=19+19)
JSONDecode-12               54.2ms ± 1%    53.3ms ± 0%  -1.67%  (p=0.000 n=20+20)
Mandelbrot200-12            3.71ms ± 0%    3.71ms ± 0%    ~     (p=0.143 n=19+20)
GoParse-12                  3.22ms ± 0%    3.19ms ± 1%  -0.72%  (p=0.000 n=20+20)
RegexpMatchEasy0_32-12      76.7ns ± 1%    75.8ns ± 1%  -1.19%  (p=0.000 n=20+17)
RegexpMatchEasy0_1K-12       245ns ± 1%     243ns ± 0%  -0.72%  (p=0.000 n=18+17)
RegexpMatchEasy1_32-12      71.9ns ± 0%    71.7ns ± 1%  -0.39%  (p=0.006 n=12+18)
RegexpMatchEasy1_1K-12       358ns ± 1%     354ns ± 1%  -1.13%  (p=0.000 n=20+19)
RegexpMatchMedium_32-12      105ns ± 2%     105ns ± 1%  -0.63%  (p=0.007 n=19+20)
RegexpMatchMedium_1K-12     31.9µs ± 1%    31.9µs ± 1%    ~     (p=1.000 n=17+17)
RegexpMatchHard_32-12       1.51µs ± 1%    1.52µs ± 2%  +0.46%  (p=0.042 n=18+18)
RegexpMatchHard_1K-12       45.3µs ± 1%    45.5µs ± 2%  +0.44%  (p=0.029 n=18+19)
Revcomp-12                   388ms ± 1%     385ms ± 0%  -0.57%  (p=0.000 n=19+18)
Template-12                 63.0ms ± 1%    63.3ms ± 0%  +0.50%  (p=0.000 n=19+20)
TimeParse-12                 309ns ± 1%     307ns ± 0%  -0.62%  (p=0.000 n=20+20)
TimeFormat-12                328ns ± 0%     333ns ± 0%  +1.35%  (p=0.000 n=19+19)
[Geo mean]                  47.0µs         46.9µs       -0.20%

(https://perf.golang.org/search?q=upload:20180326.1)

For #10958.
For #24543.

Change-Id: Icbd52e711fdbe7938a1fea3e6baca1104b53ac3a
Reviewed-on: https://go-review.googlesource.com/102604
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-05-22 14:15:46 +00:00
Austin Clements
b812eec928 cmd/compile: detect OFORUNTIL inductive facts in prove
Currently, we compile range loops into for loops with the obvious
initialization and update of the index variable. In this form, the
prove pass can see that the body is dominated by an i < len condition,
and findIndVar can detect that i is an induction variable and that
0 <= i < len.

GOEXPERIMENT=preemptibleloops compiles range loops to OFORUNTIL and
we're preparing to unconditionally switch to a variation of this for
 #24543. OFORUNTIL moves the increment and condition *after* the body,
which makes the bounds on the index variable much less obvious. With
OFORUNTIL, proving anything about the index variable requires
understanding the phi that joins the index values at the top of the
loop body block.

This interferes with both prove's ability to see that i < len (this is
true on both paths that enter the body, but from two different
conditional checks) and with findIndVar's ability to detect the
induction pattern.

Fix this by teaching prove to detect that the index in the pattern
constructed by OFORUNTIL is an induction variable and add both bounds
to the facts table. Currently this is done separately from findIndVar
because it depends on prove's factsTable, while findIndVar runs before
visiting blocks and building the factsTable.

Without any GOEXPERIMENT, this has no effect on std or cmd. However,
with GOEXPERIMENT=preemptibleloops, this change becomes necessary to
prove 90 conditions in std and cmd.

Change-Id: Ic025d669f81b53426309da5a6e8010e5ccaf4f49
Reviewed-on: https://go-review.googlesource.com/102603
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-22 14:15:44 +00:00
David Chase
eeff8fa453 cmd/compile: remove now-irrelevant test
This test measures "line churn" which was minimized to help
improve the debugger experience.  With proper is_stmt markers,
this is no longer necessary, and it is more accurate (for
profiling) to allow line numbers to vary willy-nilly.

"Debugger experience" is now better measured by
cmd/compile/internal/ssa/debug_test.go

This CL made the obsoleting change:
https://go-review.googlesource.com/c/go/+/102435

Change-Id: I874ab89f3b243b905aaeba7836118f632225a667
Reviewed-on: https://go-review.googlesource.com/113155
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-14 20:33:00 +00:00
David Chase
c2c1822b12 cmd/compile: assign and preserve statement boundaries.
A new pass run after ssa building (before any other
optimization) identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).

Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).

Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.

Line number html conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".

The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices.  This
causes differences in the 386 bootstrap, and also can
sometimes put values into an order that does a worse job
of preserving statement boundaries when values are removed.

Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go.  There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.

Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)

This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.

The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.

This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.

Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-14 14:09:49 +00:00
Keith Randall
599f56dc05 cmd/compile: fix zero extend after float->int conversion
Don't do direct loads from argument slots if the sizes don't match.
This prevents us from loading from a float32 using a uint64 load
during expressions like uint64(math.float32Bits(f)) where f is a float32 arg.

Fixes #25322

Change-Id: I3887d76f78c844ba546243e7721d811c3d4a9700
Reviewed-on: https://go-review.googlesource.com/112637
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-10 17:32:43 +00:00
Lynn Boger
e4172e5f5e cmd/compile/internal/ssa: initialize t.UInt in SetTypPtrs()
Initialization of t.UInt is missing from SetTypPtrs in config.go,
preventing rules that use it from matching when they should.
This adds the initialization to allow those rules to work.

Updated test/codegen/rotate.go to test for this case, which
appears in math/bits RotateLeft32 and RotateLeft64. There had been
a testcase for this in go 1.10 but that went away when asm_test.go
was removed.

Change-Id: I82fc825ad8364df6fc36a69a1e448214d2e24ed5
Reviewed-on: https://go-review.googlesource.com/112518
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-10 14:50:30 +00:00
Michael Munday
6d00e8c478 cmd/compile: convert memmove call into Move when arguments are disjoint
Move ops can be faster than memmove calls because the number of bytes
to be moved is fixed and they don't incur the overhead of a call.
This change allows memmove to be converted into a Move op when the
arguments are disjoint.

The optimization is only enabled on s390x at the moment, however
other architectures may also benefit from it in the future. The
memmove inlining rule triggers an extra 12 times when compiling the
standard library. It will most likely make more of a difference as the
disjoint function is improved over time (to recognize fresh heap
allocations for example).

Change-Id: I9af570dcfff28257b8e59e0ff584a46d8e248310
Reviewed-on: https://go-review.googlesource.com/110064
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-05-09 11:20:40 +00:00
Matt Juran
e0adc35c47 test: fast GC+concurrency+types verification
This test runs independent goroutines modifying a comprehensive variety
of local vars to look for garbage collector regressions. This test has
been verified to trigger issue 22781 on the go1.9.2 tag. This test
expands on test/fixedbugs/issue22781.go.

Tests #22781

Change-Id: Id32f8dde7ef650aea1b1b4cf518e6d045537bfdc
Reviewed-on: https://go-review.googlesource.com/93715
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-08 21:15:48 +00:00
Martin Möhrmann
aee71dd70b cmd/compile: optimize map-clearing range idiom
replace map clears of the form:

        for k := range m {
                delete(m, k)
        }

(where m is map with key type that is reflexive for ==)
with a new runtime function that clears the maps backing
array with a memclr and reinitializes the hmap struct.

Map key types that for example contain floats are not
replaced by this optimization since NaN keys cannot
be deleted from maps using delete.

name                           old time/op  new time/op  delta
GoMapClear/Reflexive/1         92.2ns ± 1%  47.1ns ± 2%  -48.89%  (p=0.000 n=9+9)
GoMapClear/Reflexive/10         108ns ± 1%    48ns ± 2%  -55.68%  (p=0.000 n=10+10)
GoMapClear/Reflexive/100        303ns ± 2%   110ns ± 3%  -63.56%  (p=0.000 n=10+10)
GoMapClear/Reflexive/1000      3.58µs ± 3%  1.23µs ± 2%  -65.49%  (p=0.000 n=9+10)
GoMapClear/Reflexive/10000     28.2µs ± 3%  10.3µs ± 2%  -63.55%  (p=0.000 n=9+10)
GoMapClear/NonReflexive/1       121ns ± 2%   124ns ± 7%     ~     (p=0.097 n=10+10)
GoMapClear/NonReflexive/10      137ns ± 2%   139ns ± 3%   +1.53%  (p=0.033 n=10+10)
GoMapClear/NonReflexive/100     331ns ± 3%   334ns ± 2%     ~     (p=0.342 n=10+10)
GoMapClear/NonReflexive/1000   3.64µs ± 3%  3.64µs ± 2%     ~     (p=0.887 n=9+10)
GoMapClear/NonReflexive/10000  28.1µs ± 2%  28.4µs ± 3%     ~     (p=0.247 n=10+10)

Fixes #20138

Change-Id: I181332a8ef434a4f0d89659f492d8711db3f3213
Reviewed-on: https://go-review.googlesource.com/110055
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 21:15:16 +00:00
Michael Munday
8af0c77df3 cmd/compile: simplify shift lowering on s390x
Use conditional moves instead of subtractions with borrow to handle
saturation cases. This allows us to delete the SUBE/SUBEW ops and
associated rules from the SSA backend. Using conditional moves also
means we can detect when shift values are masked so I've added some
new rules to constant fold the relevant comparisons and masking ops.

Also use the new shiftIsBounded() function to avoid generating code
to handle saturation cases where possible.

Updates #25167 for s390x.

Change-Id: Ief9991c91267c9151ce4c5ec07642abb4dcc1c0d
Reviewed-on: https://go-review.googlesource.com/110070
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-08 16:19:56 +00:00
Lynn Boger
28edaf4584 cmd/compile,test: combine byte loads and stores on ppc64le
CL 74410 added rules to combine consecutive byte loads and
stores when the byte order was little endian for ppc64le. This
is the corresponding change for bytes that are in big endian order.
These rules are all intended for a little endian target arch.

This adds new testcases in test/codegen/memcombine.go

Fixes #22496
Updates #24242

Benchmark improvement for encoding/binary:
name                      old time/op    new time/op    delta
ReadSlice1000Int32s-16      11.0µs ± 0%     9.0µs ± 0%  -17.47%  (p=0.029 n=4+4)
ReadStruct-16               2.47µs ± 1%    2.48µs ± 0%   +0.67%  (p=0.114 n=4+4)
ReadInts-16                  642ns ± 1%     630ns ± 1%   -2.02%  (p=0.029 n=4+4)
WriteInts-16                 654ns ± 0%     653ns ± 1%   -0.08%  (p=0.629 n=4+4)
WriteSlice1000Int32s-16     8.75µs ± 0%    8.20µs ± 0%   -6.19%  (p=0.029 n=4+4)
PutUint16-16                1.16ns ± 0%    0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint32-16                1.16ns ± 0%    0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint64-16                1.85ns ± 0%    0.93ns ± 0%  -49.73%  (p=0.029 n=4+4)
LittleEndianPutUint16-16    1.03ns ± 0%    0.93ns ± 0%   -9.71%  (p=0.029 n=4+4)
LittleEndianPutUint32-16    0.93ns ± 0%    0.93ns ± 0%     ~     (all equal)
LittleEndianPutUint64-16    0.93ns ± 0%    0.93ns ± 0%     ~     (all equal)
PutUvarint32-16             43.0ns ± 0%    43.1ns ± 0%   +0.12%  (p=0.429 n=4+4)
PutUvarint64-16              174ns ± 0%     175ns ± 0%   +0.29%  (p=0.429 n=4+4)

Updates made to functions in gcm.go to enable their matching. An existing
testcase prevents these functions from being replaced by those in encoding/binary
due to import dependencies.

Change-Id: Idb3bd1e6e7b12d86cd828fb29cb095848a3e485a
Reviewed-on: https://go-review.googlesource.com/98136
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 13:15:39 +00:00
Michael Munday
f31a18ded4 cmd/compile: add some generic composite type optimizations
Propagate values through some wide Zero/Move operations. Among
other things this allows us to optimize some kinds of array
initialization. For example, the following code no longer
requires a temporary be allocated on the stack. Instead it
writes the values directly into the return value.

func f(i uint32) [4]uint32 {
    return [4]uint32{i, i+1, i+2, i+3}
}

The return value is unnecessarily cleared but removing that is
probably a task for dead store analysis (I think it needs to
be able to match multiple Store ops to wide Zero ops).

In order to reliably remove stack variables that are rendered
unnecessary by these new rules I've added a new generic version
of the unread autos elimination pass.

These rules are triggered more than 5000 times when building and
testing the standard library.

Updates #15925 (fixes for arrays of up to 4 elements).
Updates #24386 (fixes for up to 4 kept elements).
Updates #24416.

compilebench results:

name       old time/op       new time/op       delta
Template         353ms ± 5%        359ms ± 3%    ~     (p=0.143 n=10+10)
Unicode          219ms ± 1%        217ms ± 4%    ~     (p=0.740 n=7+10)
GoTypes          1.26s ± 1%        1.26s ± 2%    ~     (p=0.549 n=9+10)
Compiler         6.00s ± 1%        6.08s ± 1%  +1.42%  (p=0.000 n=9+8)
SSA              15.3s ± 2%        15.6s ± 1%  +2.43%  (p=0.000 n=10+10)
Flate            237ms ± 2%        240ms ± 2%  +1.31%  (p=0.015 n=10+10)
GoParser         285ms ± 1%        285ms ± 1%    ~     (p=0.878 n=8+8)
Reflect          797ms ± 3%        807ms ± 2%    ~     (p=0.065 n=9+10)
Tar              334ms ± 0%        335ms ± 4%    ~     (p=0.460 n=8+10)
XML              419ms ± 0%        423ms ± 1%  +0.91%  (p=0.001 n=7+9)
StdCmd           46.0s ± 0%        46.4s ± 0%  +0.85%  (p=0.000 n=9+9)

name       old user-time/op  new user-time/op  delta
Template         337ms ± 3%        346ms ± 5%    ~     (p=0.053 n=9+10)
Unicode          205ms ±10%        205ms ± 8%    ~     (p=1.000 n=10+10)
GoTypes          1.22s ± 2%        1.21s ± 3%    ~     (p=0.436 n=10+10)
Compiler         5.85s ± 1%        5.93s ± 0%  +1.46%  (p=0.000 n=10+8)
SSA              14.9s ± 1%        15.3s ± 1%  +2.62%  (p=0.000 n=10+10)
Flate            229ms ± 4%        228ms ± 6%    ~     (p=0.796 n=10+10)
GoParser         271ms ± 3%        275ms ± 4%    ~     (p=0.165 n=10+10)
Reflect          779ms ± 5%        775ms ± 2%    ~     (p=0.971 n=10+10)
Tar              317ms ± 4%        319ms ± 5%    ~     (p=0.853 n=10+10)
XML              404ms ± 4%        409ms ± 5%    ~     (p=0.436 n=10+10)

name       old alloc/op      new alloc/op      delta
Template        34.9MB ± 0%       35.0MB ± 0%  +0.26%  (p=0.000 n=10+10)
Unicode         29.3MB ± 0%       29.3MB ± 0%  +0.02%  (p=0.000 n=10+10)
GoTypes          115MB ± 0%        115MB ± 0%  +0.30%  (p=0.000 n=10+10)
Compiler         519MB ± 0%        521MB ± 0%  +0.30%  (p=0.000 n=10+10)
SSA             1.55GB ± 0%       1.57GB ± 0%  +1.34%  (p=0.000 n=10+9)
Flate           24.1MB ± 0%       24.2MB ± 0%  +0.10%  (p=0.000 n=10+10)
GoParser        28.1MB ± 0%       28.1MB ± 0%  +0.07%  (p=0.000 n=10+10)
Reflect         78.7MB ± 0%       78.7MB ± 0%  +0.03%  (p=0.000 n=8+10)
Tar             34.4MB ± 0%       34.5MB ± 0%  +0.12%  (p=0.000 n=10+10)
XML             43.2MB ± 0%       43.2MB ± 0%  +0.13%  (p=0.000 n=10+10)

name       old allocs/op     new allocs/op     delta
Template          330k ± 0%         330k ± 0%  -0.01%  (p=0.017 n=10+10)
Unicode           337k ± 0%         337k ± 0%  +0.01%  (p=0.000 n=9+10)
GoTypes          1.15M ± 0%        1.15M ± 0%  +0.03%  (p=0.000 n=10+10)
Compiler         4.77M ± 0%        4.77M ± 0%  +0.03%  (p=0.000 n=9+10)
SSA              12.5M ± 0%        12.6M ± 0%  +1.16%  (p=0.000 n=10+10)
Flate             221k ± 0%         221k ± 0%  +0.05%  (p=0.000 n=9+10)
GoParser          275k ± 0%         275k ± 0%  +0.01%  (p=0.014 n=10+9)
Reflect           944k ± 0%         944k ± 0%  -0.02%  (p=0.000 n=10+10)
Tar               324k ± 0%         323k ± 0%  -0.12%  (p=0.000 n=10+10)
XML               384k ± 0%         384k ± 0%  -0.01%  (p=0.001 n=10+10)

name       old object-bytes  new object-bytes  delta
Template         476kB ± 0%        476kB ± 0%  -0.04%  (p=0.000 n=10+10)
Unicode          218kB ± 0%        218kB ± 0%    ~     (all equal)
GoTypes         1.58MB ± 0%       1.58MB ± 0%  -0.04%  (p=0.000 n=10+10)
Compiler        6.25MB ± 0%       6.24MB ± 0%  -0.09%  (p=0.000 n=10+10)
SSA             15.9MB ± 0%       16.1MB ± 0%  +1.22%  (p=0.000 n=10+10)
Flate            304kB ± 0%        304kB ± 0%  -0.13%  (p=0.000 n=10+10)
GoParser         370kB ± 0%        370kB ± 0%  -0.00%  (p=0.000 n=10+10)
Reflect         1.27MB ± 0%       1.27MB ± 0%  -0.12%  (p=0.000 n=10+10)
Tar              421kB ± 0%        419kB ± 0%  -0.64%  (p=0.000 n=10+10)
XML              518kB ± 0%        517kB ± 0%  -0.12%  (p=0.000 n=10+10)

name       old export-bytes  new export-bytes  delta
Template        16.7kB ± 0%       16.7kB ± 0%    ~     (all equal)
Unicode         6.52kB ± 0%       6.52kB ± 0%    ~     (all equal)
GoTypes         29.2kB ± 0%       29.2kB ± 0%    ~     (all equal)
Compiler        88.0kB ± 0%       88.0kB ± 0%    ~     (all equal)
SSA              109kB ± 0%        109kB ± 0%    ~     (all equal)
Flate           4.49kB ± 0%       4.49kB ± 0%    ~     (all equal)
GoParser        8.10kB ± 0%       8.10kB ± 0%    ~     (all equal)
Reflect         7.71kB ± 0%       7.71kB ± 0%    ~     (all equal)
Tar             9.15kB ± 0%       9.15kB ± 0%    ~     (all equal)
XML             12.3kB ± 0%       12.3kB ± 0%    ~     (all equal)

name       old text-bytes    new text-bytes    delta
HelloSize        676kB ± 0%        672kB ± 0%  -0.59%  (p=0.000 n=10+10)
CmdGoSize       7.26MB ± 0%       7.24MB ± 0%  -0.18%  (p=0.000 n=10+10)

name       old data-bytes    new data-bytes    delta
HelloSize       10.2kB ± 0%       10.2kB ± 0%    ~     (all equal)
CmdGoSize        248kB ± 0%        248kB ± 0%    ~     (all equal)

name       old bss-bytes     new bss-bytes     delta
HelloSize        125kB ± 0%        125kB ± 0%    ~     (all equal)
CmdGoSize        145kB ± 0%        145kB ± 0%    ~     (all equal)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.46MB ± 0%       1.45MB ± 0%  -0.31%  (p=0.000 n=10+10)
CmdGoSize       14.7MB ± 0%       14.7MB ± 0%  -0.17%  (p=0.000 n=10+10)

Change-Id: Ic72b0c189dd542f391e1c9ab88a76e9148dc4285
Reviewed-on: https://go-review.googlesource.com/106495
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 10:31:21 +00:00
Ben Shi
098ca846c7 cmd/compile: emit more compact 386 instructions
ADDL/SUBL/ANDL/ORL/XORL can have a memory operand as destination,
and this CL optimize the compiler to emit such instructions on
386 for more compact binary.

Here is test report:
1. The total size of pkg/linux_386/ and pkg/tool/linux_386/ decreases
about 14KB.
(pkg/linux_386/cmd/compile/ and pkg/tool/linux_386/compile are excluded)

2. The go1 benchmark shows little change, excluding ±2% noise.
name                     old time/op    new time/op    delta
BinaryTree17-4              3.34s ± 2%     3.38s ± 2%  +1.27%  (p=0.000 n=40+39)
Fannkuch11-4                3.55s ± 1%     3.51s ± 1%  -1.33%  (p=0.000 n=40+40)
FmtFprintfEmpty-4          46.3ns ± 3%    46.9ns ± 4%  +1.41%  (p=0.002 n=40+40)
FmtFprintfString-4         80.8ns ± 3%    80.4ns ± 6%  -0.54%  (p=0.044 n=40+40)
FmtFprintfInt-4            93.0ns ± 3%    92.2ns ± 4%  -0.88%  (p=0.007 n=39+40)
FmtFprintfIntInt-4          144ns ± 5%     145ns ± 2%  +0.78%  (p=0.015 n=40+40)
FmtFprintfPrefixedInt-4     184ns ± 2%     182ns ± 2%  -1.06%  (p=0.004 n=40+40)
FmtFprintfFloat-4           415ns ± 4%     419ns ± 4%    ~     (p=0.434 n=40+40)
FmtManyArgs-4               615ns ± 3%     619ns ± 3%    ~     (p=0.100 n=40+40)
GobDecode-4                7.30ms ± 6%    7.36ms ± 6%    ~     (p=0.074 n=40+40)
GobEncode-4                7.10ms ± 6%    7.21ms ± 5%    ~     (p=0.082 n=40+39)
Gzip-4                      364ms ± 3%     362ms ± 6%  -0.71%  (p=0.020 n=40+40)
Gunzip-4                   42.4ms ± 3%    42.2ms ± 3%    ~     (p=0.303 n=40+40)
HTTPClientServer-4         62.9µs ± 1%    62.9µs ± 1%    ~     (p=0.768 n=38+39)
JSONEncode-4               21.4ms ± 4%    21.5ms ± 5%    ~     (p=0.210 n=40+40)
JSONDecode-4               67.7ms ± 3%    67.9ms ± 4%    ~     (p=0.713 n=40+40)
Mandelbrot200-4            5.18ms ± 3%    5.21ms ± 3%  +0.59%  (p=0.021 n=40+40)
GoParse-4                  3.35ms ± 3%    3.34ms ± 2%    ~     (p=0.996 n=40+40)
RegexpMatchEasy0_32-4      98.5ns ± 5%    96.3ns ± 4%  -2.15%  (p=0.001 n=40+40)
RegexpMatchEasy0_1K-4       851ns ± 4%     850ns ± 5%    ~     (p=0.700 n=40+40)
RegexpMatchEasy1_32-4       105ns ± 7%     107ns ± 4%  +1.50%  (p=0.017 n=40+40)
RegexpMatchEasy1_1K-4      1.03µs ± 5%    1.03µs ± 4%    ~     (p=0.992 n=40+40)
RegexpMatchMedium_32-4      130ns ± 6%     128ns ± 4%  -1.66%  (p=0.012 n=40+40)
RegexpMatchMedium_1K-4     44.0µs ± 5%    43.6µs ± 3%    ~     (p=0.704 n=40+40)
RegexpMatchHard_32-4       2.29µs ± 3%    2.23µs ± 4%  -2.38%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4       69.0µs ± 3%    68.1µs ± 3%  -1.28%  (p=0.003 n=40+40)
Revcomp-4                   1.85s ± 2%     1.87s ± 3%  +1.11%  (p=0.000 n=40+40)
Template-4                 69.8ms ± 3%    69.6ms ± 3%    ~     (p=0.125 n=40+40)
TimeParse-4                 442ns ± 5%     440ns ± 3%    ~     (p=0.585 n=40+40)
TimeFormat-4                419ns ± 3%     420ns ± 3%    ~     (p=0.824 n=40+40)
[Geo mean]                 67.3µs         67.2µs       -0.11%

name                     old speed      new speed      delta
GobDecode-4               105MB/s ± 6%   104MB/s ± 6%    ~     (p=0.074 n=40+40)
GobEncode-4               108MB/s ± 7%   107MB/s ± 5%    ~     (p=0.080 n=40+39)
Gzip-4                   53.3MB/s ± 3%  53.7MB/s ± 6%  +0.73%  (p=0.021 n=40+40)
Gunzip-4                  458MB/s ± 3%   460MB/s ± 3%    ~     (p=0.301 n=40+40)
JSONEncode-4             90.8MB/s ± 4%  90.3MB/s ± 4%    ~     (p=0.213 n=40+40)
JSONDecode-4             28.7MB/s ± 3%  28.6MB/s ± 4%    ~     (p=0.679 n=40+40)
GoParse-4                17.3MB/s ± 3%  17.3MB/s ± 2%    ~     (p=1.000 n=40+40)
RegexpMatchEasy0_32-4     325MB/s ± 5%   333MB/s ± 4%  +2.44%  (p=0.000 n=40+38)
RegexpMatchEasy0_1K-4    1.20GB/s ± 4%  1.21GB/s ± 5%    ~     (p=0.684 n=40+40)
RegexpMatchEasy1_32-4     303MB/s ± 7%   298MB/s ± 4%  -1.52%  (p=0.022 n=40+40)
RegexpMatchEasy1_1K-4     995MB/s ± 5%   996MB/s ± 4%    ~     (p=0.996 n=40+40)
RegexpMatchMedium_32-4   7.67MB/s ± 6%  7.80MB/s ± 4%  +1.68%  (p=0.011 n=40+40)
RegexpMatchMedium_1K-4   23.3MB/s ± 5%  23.5MB/s ± 3%    ~     (p=0.697 n=40+40)
RegexpMatchHard_32-4     14.0MB/s ± 3%  14.3MB/s ± 4%  +2.43%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4     14.8MB/s ± 3%  15.0MB/s ± 3%  +1.30%  (p=0.003 n=40+40)
Revcomp-4                 137MB/s ± 2%   136MB/s ± 3%  -1.10%  (p=0.000 n=40+40)
Template-4               27.8MB/s ± 3%  27.9MB/s ± 3%    ~     (p=0.128 n=40+40)
[Geo mean]               79.6MB/s       79.9MB/s       +0.28%

Change-Id: I02a3efc125dc81e18fc8495eb2bf1bba59ab8733
Reviewed-on: https://go-review.googlesource.com/110157
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-05-08 06:44:54 +00:00
Martin Möhrmann
b9a59d9f2e cmd/compile: optimize len([]rune(string))
Adds a new runtime function to count runes in a string.
Modifies the compiler to detect the pattern len([]rune(string))
and replaces it with the new rune counting runtime function.

RuneCount/lenruneslice/ASCII                  27.8ns ± 2%  14.5ns ± 3%  -47.70%  (p=0.000 n=10+10)
RuneCount/lenruneslice/Japanese                126ns ± 2%    60ns ± 2%  -52.03%  (p=0.000 n=10+10)
RuneCount/lenruneslice/MixedLength             104ns ± 2%    50ns ± 1%  -51.71%  (p=0.000 n=10+9)

Fixes #24923

Change-Id: Ie9c7e7391a4e2cca675c5cdcc1e5ce7d523948b9
Reviewed-on: https://go-review.googlesource.com/108985
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-05-06 05:31:01 +00:00
Martin Möhrmann
a8a60ac2a7 cmd/compile: optimize append(x, make([]T, y)...) slice extension
Changes the compiler to recognize the slice extension pattern

  append(x, make([]T, y)...)

and replace it with growslice and an optional memclr to avoid an allocation for make([]T, y).

Memclr is not called in case growslice already allocated a new cleared backing array
when T contains pointers.

amd64:
name                      old time/op    new time/op    delta
ExtendSlice/IntSlice         103ns ± 4%      57ns ± 4%   -44.55%  (p=0.000 n=18+18)
ExtendSlice/PointerSlice     155ns ± 3%      77ns ± 3%   -49.93%  (p=0.000 n=20+20)
ExtendSlice/NoGrow          50.2ns ± 3%     5.2ns ± 2%   -89.67%  (p=0.000 n=18+18)

name                      old alloc/op   new alloc/op   delta
ExtendSlice/IntSlice         64.0B ± 0%     32.0B ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/PointerSlice     64.0B ± 0%     32.0B ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/NoGrow           32.0B ± 0%      0.0B       -100.00%  (p=0.000 n=20+20)

name                      old allocs/op  new allocs/op  delta
ExtendSlice/IntSlice          2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/PointerSlice      2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/NoGrow            1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)

Fixes #21266

Change-Id: Idc3077665f63cbe89762b590c5967a864fd1c07f
Reviewed-on: https://go-review.googlesource.com/109517
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-05-06 04:28:23 +00:00
Martin Möhrmann
500d79c410 cmd/compile: refactor memclrrange for arrays and slices
Rename memclrrange to signify that it does not handle
all types of range clears.

Simplify checks to detect the range clear idiom for
arrays and slices.

Add tests to verify the optimization for the slice
range clear idiom is being applied by the compiler.

Change-Id: I5c3b7c9a479699ebdb4c407fde692f30f377860c
Reviewed-on: https://go-review.googlesource.com/110477
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-02 04:20:25 +00:00
Matthew Dempsky
004260afde cmd/compile: open code select{send,recv,default}
Registration now looks like:

        var cases [4]runtime.scases
        var order [8]uint16
	cases[0].kind = caseSend
	cases[0].c = c1
	cases[0].elem = &v1
	if raceenabled || msanenabled {
		selectsetpc(&cases[0])
	}
	cases[1].kind = caseRecv
	cases[1].c = c2
	cases[1].elem = &v2
	if raceenabled || msanenabled {
		selectsetpc(&cases[1])
	}
	...

Change-Id: Ib9bcf426a4797fe4bfd8152ca9e6e08e39a70b48
Reviewed-on: https://go-review.googlesource.com/37934
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 03:17:44 +00:00
Matthew Dempsky
3aa53b3135 runtime: eliminate runtime.hselect
Now the registration phase looks like:

    var cases [4]runtime.scases
    var order [8]uint16
    selectsend(&cases[0], c1, &v1)
    selectrecv(&cases[1], c2, &v2, nil)
    selectrecv(&cases[2], c3, &v3, &ok)
    selectdefault(&cases[3])
    chosen := selectgo(&cases[0], &order[0], 4)

Primarily, this is just preparation for having the compiler open-code
selectsend, selectrecv, and selectdefault.

As a minor benefit, order can now be layed out separately on the stack
in the pointer-free segment, so it won't take up space in the
function's stack pointer maps.

Change-Id: I5552ba594201efd31fcb40084da20b42ea569a45
Reviewed-on: https://go-review.googlesource.com/37933
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 03:17:31 +00:00
Richard Musiol
e3c684777a all: skip unsupported tests for js/wasm
The general policy for the current state of js/wasm is that it only
has to support tests that are also supported by nacl.

The test nilptr3.go makes assumptions about which nil checks can be
removed. Since WebAssembly does not signal on reading a null pointer,
all nil checks have to be explicit.

Updates #18892

Change-Id: I06a687860b8d22ae26b1c391499c0f5183e4c485
Reviewed-on: https://go-review.googlesource.com/110096
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-30 19:39:18 +00:00
Giovanni Bajo
e0d37a33ab cmd/compile: teach prove to handle expressions like len(s)-delta
When a loop has bound len(s)-delta, findIndVar detected it and
returned len(s) as (conservative) upper bound. This little lie
allowed loopbce to drop bound checks.

It is obviously more generic to teach prove about relations like
x+d<w for non-constant "w"; we already handled the case for
constant "w", so we just want to learn that if d<0, then x+d<w
proves that x<w.

To be able to remove the code from findIndVar, we also need
to teach prove that len() and cap() are always non-negative.

This CL allows to prove 633 more checks in cmd+std. Most
of them are cases where the code was already testing before
accessing a slice but the compiler didn't know it. For instance,
take strings.HasSuffix:

    func HasSuffix(s, suffix string) bool {
        return len(s) >= len(suffix) && s[len(s)-len(suffix):] == suffix
    }

When suffix is a literal string, the compiler now understands
that the explicit check is enough to not emit a slice check.

I also found a loopbce test that was incorrectly
written to detect an overflow but had a off-by-one (on the
conservative side), so it unexpectly passed with this CL; I
changed it to really trigger the overflow as intended.

Change-Id: Ib5abade337db46b8811425afebad4719b6e46c4a
Reviewed-on: https://go-review.googlesource.com/105635
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:38:32 +00:00
Giovanni Bajo
6d379add0f cmd/compile: in prove, detect loops with negative increments
To be effective, this also requires being able to relax constraints
on min/max bound inclusiveness; they are now exposed through a flags,
and prove has been updated to handle it correctly.

Change-Id: I3490e54461b7b9de8bc4ae40d3b5e2fa2d9f0556
Reviewed-on: https://go-review.googlesource.com/104041
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:38:18 +00:00
Giovanni Bajo
980fdb8dd5 cmd/compile: improve testing of induction variables
Test both minimum and maximum bound, and prepare
formatting for more advanced tests (inclusive / esclusive bounds).

Change-Id: Ibe432916d9c938343bc07943798bc9709ad71845
Reviewed-on: https://go-review.googlesource.com/104040
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-29 09:38:09 +00:00
Giovanni Bajo
7ec25d0acf cmd/compile: implement loop BCE in prove
Reuse findIndVar to discover induction variables, and then
register the facts we know about them into the facts table
when entering the loop block.

Moreover, handle "x+delta > w" while updating the facts table,
to be able to prove accesses to slices with constant offsets
such as slice[i-10].

Change-Id: I2a63d050ed58258136d54712ac7015b25c893d71
Reviewed-on: https://go-review.googlesource.com/104038
Run-TryBot: Giovanni Bajo <rasky@develer.com>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:37:35 +00:00
Giovanni Bajo
29162ec9a7 cmd/compile: in prove, infer unsigned relations while branching
When a branch is followed, we apply the relation as described
in the domain relation table. In case the relation is in the
positive domain, we can also infer an unsigned relation if,
by that point, we know that both operands are non-negative.

Fixes #20393

Change-Id: Ieaf0c81558b36d96616abae3eb834c788dd278d5
Reviewed-on: https://go-review.googlesource.com/100278
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:37:15 +00:00
Giovanni Bajo
5c40210987 cmd/compile: in prove, add transitive closure of relations
Implement it through a partial order datastructure, which
keeps the relations between SSA values in a forest of DAGs
and is able to discover contradictions.

In make.bash, this patch is able to prove hundreds of conditions
which were not proved before.

Compilebench:

name       old time/op       new time/op       delta
Template         371ms ± 2%        368ms ± 1%    ~     (p=0.222 n=5+5)
Unicode          203ms ± 6%        199ms ± 3%    ~     (p=0.421 n=5+5)
GoTypes          1.17s ± 4%        1.18s ± 1%    ~     (p=0.151 n=5+5)
Compiler         5.54s ± 2%        5.59s ± 1%    ~     (p=0.548 n=5+5)
SSA              12.9s ± 2%        13.2s ± 1%  +2.96%  (p=0.032 n=5+5)
Flate            245ms ± 2%        247ms ± 3%    ~     (p=0.690 n=5+5)
GoParser         302ms ± 6%        302ms ± 4%    ~     (p=0.548 n=5+5)
Reflect          764ms ± 4%        773ms ± 3%    ~     (p=0.095 n=5+5)
Tar              354ms ± 6%        361ms ± 3%    ~     (p=0.222 n=5+5)
XML              434ms ± 3%        429ms ± 1%    ~     (p=0.421 n=5+5)
StdCmd           22.6s ± 1%        22.9s ± 1%  +1.40%  (p=0.032 n=5+5)

name       old user-time/op  new user-time/op  delta
Template         436ms ± 8%        426ms ± 5%    ~     (p=0.579 n=5+5)
Unicode          219ms ±15%        219ms ±12%    ~     (p=1.000 n=5+5)
GoTypes          1.47s ± 6%        1.53s ± 6%    ~     (p=0.222 n=5+5)
Compiler         7.26s ± 4%        7.40s ± 2%    ~     (p=0.389 n=5+5)
SSA              17.7s ± 4%        18.5s ± 4%  +4.13%  (p=0.032 n=5+5)
Flate            257ms ± 5%        268ms ± 9%    ~     (p=0.333 n=5+5)
GoParser         354ms ± 6%        348ms ± 6%    ~     (p=0.913 n=5+5)
Reflect          904ms ± 2%        944ms ± 4%    ~     (p=0.056 n=5+5)
Tar              398ms ±11%        430ms ± 7%    ~     (p=0.079 n=5+5)
XML              501ms ± 7%        489ms ± 5%    ~     (p=0.444 n=5+5)

name       old text-bytes    new text-bytes    delta
HelloSize        670kB ± 0%        670kB ± 0%  +0.00%  (p=0.008 n=5+5)
CmdGoSize       7.22MB ± 0%       7.21MB ± 0%  -0.07%  (p=0.008 n=5+5)

name       old data-bytes    new data-bytes    delta
HelloSize       9.88kB ± 0%       9.88kB ± 0%    ~     (all equal)
CmdGoSize        248kB ± 0%        248kB ± 0%  -0.06%  (p=0.008 n=5+5)

name       old bss-bytes     new bss-bytes     delta
HelloSize        125kB ± 0%        125kB ± 0%    ~     (all equal)
CmdGoSize        145kB ± 0%        144kB ± 0%  -0.20%  (p=0.008 n=5+5)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.43MB ± 0%       1.43MB ± 0%    ~     (all equal)
CmdGoSize       14.5MB ± 0%       14.5MB ± 0%  -0.06%  (p=0.008 n=5+5)

Fixes #19714
Updates #20393

Change-Id: Ia090f5b5dc1bcd274ba8a39b233c1e1ace1b330e
Reviewed-on: https://go-review.googlesource.com/100277
Run-TryBot: Giovanni Bajo <rasky@develer.com>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:35:39 +00:00
Ben Shi
aaf73c6d1e cmd/compile: optimize ARM64 with shifted register indexed load/store
ARM64 supports efficient instructions which combine shift, addition, load/store
together. Such as "MOVD (R0)(R1<<3), R2" and "MOVWU R6, (R4)(R1<<2)".

This CL optimizes the compiler to emit such efficient instuctions. And below
is some test data.

1. binary size before/after
binary                 size change
pkg/linux_arm64        +80.1KB
pkg/tool/linux_arm64   +121.9KB
go                     -4.3KB
gofmt                  -64KB

2. go1 benchmark
There is big improvement for the test case Fannkuch11, and slight
improvement for sme others, excluding noise.

name                     old time/op    new time/op    delta
BinaryTree17-4              43.9s ± 2%     44.0s ± 2%     ~     (p=0.820 n=30+30)
Fannkuch11-4                30.6s ± 2%     24.5s ± 3%  -19.93%  (p=0.000 n=25+30)
FmtFprintfEmpty-4           500ns ± 0%     499ns ± 0%   -0.11%  (p=0.000 n=23+25)
FmtFprintfString-4         1.03µs ± 0%    1.04µs ± 3%     ~     (p=0.065 n=29+30)
FmtFprintfInt-4            1.15µs ± 3%    1.15µs ± 4%   -0.56%  (p=0.000 n=30+30)
FmtFprintfIntInt-4         1.80µs ± 5%    1.82µs ± 0%     ~     (p=0.094 n=30+24)
FmtFprintfPrefixedInt-4    2.17µs ± 5%    2.20µs ± 0%     ~     (p=0.100 n=30+23)
FmtFprintfFloat-4          3.08µs ± 3%    3.09µs ± 4%     ~     (p=0.123 n=30+30)
FmtManyArgs-4              7.41µs ± 4%    7.17µs ± 1%   -3.26%  (p=0.000 n=30+23)
GobDecode-4                93.7ms ± 0%    94.7ms ± 4%     ~     (p=0.685 n=24+30)
GobEncode-4                78.7ms ± 7%    77.1ms ± 0%     ~     (p=0.729 n=30+23)
Gzip-4                      4.01s ± 0%     3.97s ± 5%   -1.11%  (p=0.037 n=24+30)
Gunzip-4                    389ms ± 4%     384ms ± 0%     ~     (p=0.155 n=30+23)
HTTPClientServer-4          536µs ± 1%     537µs ± 1%     ~     (p=0.236 n=30+30)
JSONEncode-4                179ms ± 1%     182ms ± 6%     ~     (p=0.763 n=24+30)
JSONDecode-4                843ms ± 0%     839ms ± 6%   -0.42%  (p=0.003 n=25+30)
Mandelbrot200-4            46.5ms ± 0%    46.5ms ± 0%   +0.02%  (p=0.000 n=26+26)
GoParse-4                  44.3ms ± 6%    43.3ms ± 0%     ~     (p=0.067 n=30+27)
RegexpMatchEasy0_32-4      1.07µs ± 7%    1.07µs ± 4%     ~     (p=0.835 n=30+30)
RegexpMatchEasy0_1K-4      5.51µs ± 0%    5.49µs ± 0%   -0.35%  (p=0.000 n=23+26)
RegexpMatchEasy1_32-4      1.01µs ± 0%    1.02µs ± 4%   +0.96%  (p=0.014 n=24+30)
RegexpMatchEasy1_1K-4      7.43µs ± 0%    7.18µs ± 0%   -3.41%  (p=0.000 n=23+24)
RegexpMatchMedium_32-4     1.78µs ± 0%    1.81µs ± 4%   +1.47%  (p=0.012 n=23+30)
RegexpMatchMedium_1K-4      547µs ± 1%     542µs ± 3%   -0.90%  (p=0.003 n=24+30)
RegexpMatchHard_32-4       30.4µs ± 0%    29.7µs ± 0%   -2.15%  (p=0.000 n=19+23)
RegexpMatchHard_1K-4        913µs ± 0%     915µs ± 6%   +0.25%  (p=0.012 n=24+30)
Revcomp-4                   6.32s ± 1%     6.42s ± 4%     ~     (p=0.342 n=25+30)
Template-4                  868ms ± 6%     878ms ± 6%   +1.15%  (p=0.000 n=30+30)
TimeParse-4                4.57µs ± 4%    4.59µs ± 3%   +0.65%  (p=0.010 n=29+30)
TimeFormat-4               4.51µs ± 0%    4.50µs ± 0%   -0.27%  (p=0.000 n=27+24)
[Geo mean]                  695µs          689µs        -0.92%

name                     old speed      new speed      delta
GobDecode-4              8.19MB/s ± 0%  8.12MB/s ± 4%     ~     (p=0.680 n=24+30)
GobEncode-4              9.76MB/s ± 7%  9.96MB/s ± 0%     ~     (p=0.616 n=30+23)
Gzip-4                   4.84MB/s ± 0%  4.89MB/s ± 4%   +1.16%  (p=0.030 n=24+30)
Gunzip-4                 49.9MB/s ± 4%  50.6MB/s ± 0%     ~     (p=0.162 n=30+23)
JSONEncode-4             10.9MB/s ± 1%  10.7MB/s ± 6%     ~     (p=0.575 n=24+30)
JSONDecode-4             2.30MB/s ± 0%  2.32MB/s ± 5%   +0.72%  (p=0.003 n=22+30)
GoParse-4                1.31MB/s ± 6%  1.34MB/s ± 0%   +2.26%  (p=0.002 n=30+27)
RegexpMatchEasy0_32-4    30.0MB/s ± 6%  30.0MB/s ± 4%     ~     (p=1.000 n=30+30)
RegexpMatchEasy0_1K-4     186MB/s ± 0%   187MB/s ± 0%   +0.35%  (p=0.000 n=23+26)
RegexpMatchEasy1_32-4    31.8MB/s ± 0%  31.5MB/s ± 4%   -0.92%  (p=0.012 n=25+30)
RegexpMatchEasy1_1K-4     138MB/s ± 0%   143MB/s ± 0%   +3.53%  (p=0.000 n=23+24)
RegexpMatchMedium_32-4    560kB/s ± 0%   553kB/s ± 4%   -1.19%  (p=0.005 n=23+30)
RegexpMatchMedium_1K-4   1.87MB/s ± 0%  1.89MB/s ± 3%   +1.04%  (p=0.002 n=24+30)
RegexpMatchHard_32-4     1.05MB/s ± 0%  1.08MB/s ± 0%   +2.40%  (p=0.000 n=19+23)
RegexpMatchHard_1K-4     1.12MB/s ± 0%  1.12MB/s ± 5%   +0.12%  (p=0.006 n=25+30)
Revcomp-4                40.2MB/s ± 1%  39.6MB/s ± 4%     ~     (p=0.242 n=25+30)
Template-4               2.24MB/s ± 6%  2.21MB/s ± 6%   -1.15%  (p=0.000 n=30+30)
[Geo mean]               7.87MB/s       7.91MB/s        +0.44%

Change-Id: If374cb7abf83537aa0a176f73c0f736f7800db03
Reviewed-on: https://go-review.googlesource.com/108735
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 20:02:05 +00:00
Milan Knezevic
2959128dc5 cmd/compile: add softfloat support to mips64{,le}
mips64 softfloat support is based on mips implementation and introduces
new enviroment variable GOMIPS64.

GOMIPS64 is a GOARCH=mips64{,le} specific option, for a choice between
hard-float and soft-float. Valid values are 'hardfloat' (default) and
'softfloat'. It is passed to the assembler as
'GOMIPS64_{hardfloat,softfloat}'.

Change-Id: I7f73078627f7cb37c588a38fb5c997fe09c56134
Reviewed-on: https://go-review.googlesource.com/108475
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 14:50:17 +00:00
Josh Bleecher Snyder
d9a50a6531 cmd/compile: use prove pass to detect Ctz of non-zero values
On amd64, Ctz must include special handling of zeros.
But the prove pass has enough information to detect whether the input
is non-zero, allowing a more efficient lowering.

Introduce new CtzNonZero ops to capture and use this information.

Benchmark code:

func BenchmarkVisitBits(b *testing.B) {
	b.Run("8", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			x := uint8(0xff)
			for x != 0 {
				sink = bits.TrailingZeros8(x)
				x &= x - 1
			}
		}
	})

    // and similarly so for 16, 32, 64
}

name            old time/op  new time/op  delta
VisitBits/8-8   7.27ns ± 4%  5.58ns ± 4%  -23.35%  (p=0.000 n=28+26)
VisitBits/16-8  14.7ns ± 7%  10.5ns ± 4%  -28.43%  (p=0.000 n=30+28)
VisitBits/32-8  27.6ns ± 8%  19.3ns ± 3%  -30.14%  (p=0.000 n=30+26)
VisitBits/64-8  44.0ns ±11%  38.0ns ± 5%  -13.48%  (p=0.000 n=30+30)

Fixes #25077

Change-Id: Ie6e5bd86baf39ee8a4ca7cadcf56d934e047f957
Reviewed-on: https://go-review.googlesource.com/109358
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-26 18:22:28 +00:00
Carlos Eduardo Seo
ebb67d993a cmd/compile, cmd/internal/obj/ppc64: make math.Round an intrinsic on ppc64x
This change implements math.Round as an intrinsic on ppc64x so it can be
done using a single instruction.

benchmark                   old ns/op     new ns/op     delta
BenchmarkRound-16           2.60          0.69          -73.46%

Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8
Reviewed-on: https://go-review.googlesource.com/109395
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-04-26 14:12:09 +00:00
Josh Bleecher Snyder
c5f0104daf cmd/compile: use intrinsic for LeadingZeros8 on amd64
The previous change sped up the pure computation form of LeadingZeros8.
This places it somewhat close to the table lookup form.
Depending on something that varies from toolchain to toolchain
(alignment, perhaps?), the slowdown from ditching the table lookup
is either 20% or 5%.

This benchmark is the best case scenario for the table lookup:
It is in the L1 cache already.

I think we're close enough that we can switch to the computational version,
and trust that the memory effects and binary size savings will be worth it.

Code:

func f8(x uint8)   { z = bits.LeadingZeros8(x) }

Before:

"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	LEAQ	math/bits.len8tab(SB), CX
	0x000f 00015 (x.go:7)	MOVBLZX	(CX)(AX*1), AX
	0x0013 00019 (x.go:7)	ADDQ	$-8, AX
	0x0017 00023 (x.go:7)	NEGQ	AX
	0x001a 00026 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:7)	RET

After:

"".f8 STEXT nosplit size=30 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	LEAL	1(AX)(AX*1), AX
	0x000c 00012 (x.go:7)	BSRL	AX, AX
	0x000f 00015 (x.go:7)	ADDQ	$-8, AX
	0x0013 00019 (x.go:7)	NEGQ	AX
	0x0016 00022 (x.go:7)	MOVQ	AX, "".z(SB)
	0x001d 00029 (x.go:7)	RET

Change-Id: Icc7db50a7820fb9a3da8a816d6b6940d7f8e193e
Reviewed-on: https://go-review.googlesource.com/108942
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:34:15 +00:00
Josh Bleecher Snyder
1d321ada73 cmd/compile: optimize LeadingZeros(16|32) on amd64
Introduce Len8 and Len16 ops and provide optimized lowerings for them.
amd64 only for this CL, although it wouldn't surprise me
if other architectures also admit of optimized lowerings.

Also use and optimize the Len32 lowering, along the same lines.

Leave Len8 unused for the moment; a subsequent CL will enable it.

For 16 and 32 bits, this leads to a speed-up.

name              old time/op  new time/op  delta
LeadingZeros16-8  1.42ns ± 5%  1.23ns ± 5%  -13.42%  (p=0.000 n=20+20)
LeadingZeros32-8  1.25ns ± 5%  1.03ns ± 5%  -17.63%  (p=0.000 n=20+16)

Code:

func f16(x uint16) { z = bits.LeadingZeros16(x) }
func f32(x uint32) { z = bits.LeadingZeros32(x) }

Before:

"".f16 STEXT nosplit size=38 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	BSRQ	AX, AX
	0x000c 00012 (x.go:8)	MOVQ	$-1, CX
	0x0013 00019 (x.go:8)	CMOVQEQ	CX, AX
	0x0017 00023 (x.go:8)	ADDQ	$-15, AX
	0x001b 00027 (x.go:8)	NEGQ	AX
	0x001e 00030 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0025 00037 (x.go:8)	RET

"".f32 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:9)	TEXT	"".f32(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:9)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:9)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:9)	MOVL	"".x+8(SP), AX
	0x0004 00004 (x.go:9)	BSRQ	AX, AX
	0x0008 00008 (x.go:9)	MOVQ	$-1, CX
	0x000f 00015 (x.go:9)	CMOVQEQ	CX, AX
	0x0013 00019 (x.go:9)	ADDQ	$-31, AX
	0x0017 00023 (x.go:9)	NEGQ	AX
	0x001a 00026 (x.go:9)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:9)	RET

After:

"".f16 STEXT nosplit size=30 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	LEAL	1(AX)(AX*1), AX
	0x000c 00012 (x.go:8)	BSRL	AX, AX
	0x000f 00015 (x.go:8)	ADDQ	$-16, AX
	0x0013 00019 (x.go:8)	NEGQ	AX
	0x0016 00022 (x.go:8)	MOVQ	AX, "".z(SB)
	0x001d 00029 (x.go:8)	RET

"".f32 STEXT nosplit size=28 args=0x8 locals=0x0
	0x0000 00000 (x.go:9)	TEXT	"".f32(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:9)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:9)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:9)	MOVL	"".x+8(SP), AX
	0x0004 00004 (x.go:9)	LEAQ	1(AX)(AX*1), AX
	0x0009 00009 (x.go:9)	BSRQ	AX, AX
	0x000d 00013 (x.go:9)	ADDQ	$-32, AX
	0x0011 00017 (x.go:9)	NEGQ	AX
	0x0014 00020 (x.go:9)	MOVQ	AX, "".z(SB)
	0x001b 00027 (x.go:9)	RET

Change-Id: I6c93c173752a7bfdeab8be30777ae05a736e1f4b
Reviewed-on: https://go-review.googlesource.com/108941
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:34:04 +00:00
Josh Bleecher Snyder
54dbab5221 cmd/compile: optimize TrailingZeros(8|16) on amd64
Introduce Ctz8 and Ctz16 ops and provide optimized lowerings for them.
amd64 only for this CL, although it wouldn't surprise me
if other architectures also admit of optimized lowerings.

name               old time/op  new time/op  delta
TrailingZeros8-8   1.33ns ± 6%  0.84ns ± 3%  -36.90%  (p=0.000 n=20+20)
TrailingZeros16-8  1.26ns ± 5%  0.84ns ± 5%  -33.50%  (p=0.000 n=20+18)

Code:

func f8(x uint8)   { z = bits.TrailingZeros8(x) }
func f16(x uint16) { z = bits.TrailingZeros16(x) }

Before:

"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	BTSQ	$8, AX
	0x000d 00013 (x.go:7)	BSFQ	AX, AX
	0x0011 00017 (x.go:7)	MOVL	$64, CX
	0x0016 00022 (x.go:7)	CMOVQEQ	CX, AX
	0x001a 00026 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:7)	RET

"".f16 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	BTSQ	$16, AX
	0x000d 00013 (x.go:8)	BSFQ	AX, AX
	0x0011 00017 (x.go:8)	MOVL	$64, CX
	0x0016 00022 (x.go:8)	CMOVQEQ	CX, AX
	0x001a 00026 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:8)	RET

After:

"".f8 STEXT nosplit size=20 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	BTSL	$8, AX
	0x0009 00009 (x.go:7)	BSFL	AX, AX
	0x000c 00012 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0013 00019 (x.go:7)	RET

"".f16 STEXT nosplit size=20 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	BTSL	$16, AX
	0x0009 00009 (x.go:8)	BSFL	AX, AX
	0x000c 00012 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0013 00019 (x.go:8)	RET

Change-Id: I0551e357348de2b724737d569afd6ac9f5c3aa11
Reviewed-on: https://go-review.googlesource.com/108940
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:33:52 +00:00