1
0
mirror of https://github.com/golang/go synced 2024-11-16 20:04:52 -07:00
Commit Graph

6 Commits

Author SHA1 Message Date
Ben Shi
0a382e0b7f cmd/compile: optimize ARMv7 code
"AND $0xffff0000, Rx" will be encoded to 12 bytes.
1. MOVWload from the constant pool to Rtmp
2. AND Rtmp, Rx
3. a 4-byte item in the constant pool

It can be simplified to 8 bytes on ARMv7, since ARMv7 has
"MOVW $imm-16, Rx".
1. MOVW $0xffff, Rtmp
2. BIC Rtmp, Rx

The above optimization also applies to BICconst, ADDconst and
SUBconst.

1. The total size of pkg/android_arm (excluding cmd/compile)
   decreases about 2KB.

2. The go1 benchmark shows no regression, exlcuding noise.
name                     old time/op    new time/op    delta
BinaryTree17-4              25.5s ± 1%     25.2s ± 1%  -0.85%  (p=0.000 n=30+30)
Fannkuch11-4                13.3s ± 0%     13.3s ± 0%  +0.16%  (p=0.000 n=24+25)
FmtFprintfEmpty-4           397ns ± 0%     394ns ± 0%  -0.64%  (p=0.000 n=30+30)
FmtFprintfString-4          679ns ± 0%     678ns ± 0%    ~     (p=0.093 n=30+29)
FmtFprintfInt-4             708ns ± 0%     707ns ± 0%  -0.19%  (p=0.000 n=27+28)
FmtFprintfIntInt-4         1.05µs ± 0%    1.05µs ± 0%  -0.07%  (p=0.001 n=18+30)
FmtFprintfPrefixedInt-4    1.16µs ± 0%    1.15µs ± 0%  -0.41%  (p=0.000 n=29+30)
FmtFprintfFloat-4          2.26µs ± 0%    2.23µs ± 1%  -1.40%  (p=0.000 n=30+30)
FmtManyArgs-4              3.96µs ± 0%    3.95µs ± 0%  -0.29%  (p=0.000 n=29+30)
GobDecode-4                52.9ms ± 2%    53.4ms ± 2%  +0.92%  (p=0.004 n=28+30)
GobEncode-4                49.7ms ± 2%    49.8ms ± 2%    ~     (p=0.890 n=30+26)
Gzip-4                      2.61s ± 0%     2.60s ± 0%  -0.36%  (p=0.000 n=29+29)
Gunzip-4                    312ms ± 0%     311ms ± 0%  -0.13%  (p=0.000 n=30+28)
HTTPClientServer-4         1.02ms ± 8%    1.00ms ± 7%    ~     (p=0.224 n=29+26)
JSONEncode-4                125ms ± 1%     124ms ± 3%  -1.05%  (p=0.000 n=25+30)
JSONDecode-4                432ms ± 1%     436ms ± 2%    ~     (p=0.277 n=26+30)
Mandelbrot200-4            18.4ms ± 0%    18.4ms ± 0%  +0.02%  (p=0.001 n=28+25)
GoParse-4                  22.4ms ± 1%    22.3ms ± 1%  -0.41%  (p=0.000 n=28+28)
RegexpMatchEasy0_32-4       697ns ± 0%     706ns ± 0%  +1.23%  (p=0.000 n=19+30)
RegexpMatchEasy0_1K-4      4.27µs ± 0%    4.26µs ± 0%  -0.06%  (p=0.000 n=30+30)
RegexpMatchEasy1_32-4       741ns ± 0%     735ns ± 0%  -0.86%  (p=0.000 n=26+30)
RegexpMatchEasy1_1K-4      5.49µs ± 0%    5.49µs ± 0%  -0.03%  (p=0.023 n=25+30)
RegexpMatchMedium_32-4     1.05µs ± 2%    1.04µs ± 2%    ~     (p=0.893 n=30+30)
RegexpMatchMedium_1K-4      261µs ± 0%     261µs ± 0%  -0.11%  (p=0.000 n=29+30)
RegexpMatchHard_32-4       14.9µs ± 0%    14.9µs ± 0%  -0.36%  (p=0.000 n=23+29)
RegexpMatchHard_1K-4        446µs ± 0%     445µs ± 0%  -0.17%  (p=0.000 n=30+29)
Revcomp-4                  41.6ms ± 1%    41.7ms ± 1%  +0.27%  (p=0.040 n=28+30)
Template-4                  531ms ± 0%     532ms ± 1%    ~     (p=0.059 n=30+30)
TimeParse-4                3.40µs ± 0%    3.33µs ± 0%  -2.02%  (p=0.000 n=30+30)
TimeFormat-4               6.14µs ± 0%    6.11µs ± 0%  -0.45%  (p=0.000 n=27+29)
[Geo mean]                  384µs          383µs       -0.27%

name                     old speed      new speed      delta
GobDecode-4              14.5MB/s ± 2%  14.4MB/s ± 2%  -0.90%  (p=0.005 n=28+30)
GobEncode-4              15.4MB/s ± 2%  15.4MB/s ± 2%    ~     (p=0.741 n=30+25)
Gzip-4                   7.44MB/s ± 0%  7.47MB/s ± 1%  +0.37%  (p=0.000 n=25+30)
Gunzip-4                 62.3MB/s ± 0%  62.4MB/s ± 0%  +0.13%  (p=0.000 n=30+28)
JSONEncode-4             15.5MB/s ± 1%  15.6MB/s ± 3%  +1.07%  (p=0.000 n=25+30)
JSONDecode-4             4.48MB/s ± 0%  4.46MB/s ± 2%    ~     (p=0.655 n=23+30)
GoParse-4                2.58MB/s ± 1%  2.59MB/s ± 1%  +0.42%  (p=0.000 n=28+29)
RegexpMatchEasy0_32-4    45.9MB/s ± 0%  45.3MB/s ± 0%  -1.23%  (p=0.000 n=28+30)
RegexpMatchEasy0_1K-4     240MB/s ± 0%   240MB/s ± 0%  +0.07%  (p=0.000 n=30+30)
RegexpMatchEasy1_32-4    43.2MB/s ± 0%  43.5MB/s ± 0%  +0.85%  (p=0.000 n=30+28)
RegexpMatchEasy1_1K-4     186MB/s ± 0%   186MB/s ± 0%  +0.03%  (p=0.026 n=25+30)
RegexpMatchMedium_32-4    955kB/s ± 2%   960kB/s ± 2%    ~     (p=0.084 n=30+30)
RegexpMatchMedium_1K-4   3.92MB/s ± 0%  3.93MB/s ± 0%  +0.14%  (p=0.000 n=29+30)
RegexpMatchHard_32-4     2.14MB/s ± 0%  2.15MB/s ± 0%  +0.31%  (p=0.000 n=30+26)
RegexpMatchHard_1K-4     2.30MB/s ± 0%  2.30MB/s ± 0%    ~     (all equal)
Revcomp-4                61.1MB/s ± 1%  60.9MB/s ± 1%  -0.27%  (p=0.039 n=28+30)
Template-4               3.66MB/s ± 0%  3.65MB/s ± 1%  -0.14%  (p=0.045 n=30+30)
[Geo mean]               12.8MB/s       12.8MB/s       +0.04%

Change-Id: I02370e2584b4c041fddd324c97628fd6f0c12183
Reviewed-on: https://go-review.googlesource.com/123179
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-20 15:13:23 +00:00
Ben Shi
b75c5c5992 cmd/compile: optimize AMD64 with more read-modify-write operations
6 more operations which do read-modify-write with a constant
source operand are added.

1. The total size of pkg/linux_amd64 decreases about 3KB, excluding
cmd/compile.

2. The go1 benckmark shows a slight improvement.
name                     old time/op    new time/op    delta
BinaryTree17-4              2.61s ± 4%     2.67s ± 2%  +2.26%  (p=0.000 n=30+29)
Fannkuch11-4                2.39s ± 2%     2.32s ± 2%  -2.67%  (p=0.000 n=30+30)
FmtFprintfEmpty-4          44.0ns ± 4%    41.7ns ± 4%  -5.15%  (p=0.000 n=30+30)
FmtFprintfString-4         74.2ns ± 4%    72.3ns ± 4%  -2.59%  (p=0.000 n=30+30)
FmtFprintfInt-4            81.7ns ± 3%    78.8ns ± 4%  -3.54%  (p=0.000 n=27+30)
FmtFprintfIntInt-4          130ns ± 4%     124ns ± 5%  -4.60%  (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4     154ns ± 3%     152ns ± 3%  -1.13%  (p=0.012 n=30+30)
FmtFprintfFloat-4           215ns ± 4%     212ns ± 5%  -1.56%  (p=0.002 n=30+30)
FmtManyArgs-4               522ns ± 3%     512ns ± 3%  -1.84%  (p=0.001 n=30+30)
GobDecode-4                6.42ms ± 5%    6.49ms ± 7%    ~     (p=0.070 n=30+30)
GobEncode-4                6.07ms ± 8%    5.98ms ± 8%    ~     (p=0.150 n=30+30)
Gzip-4                      236ms ± 4%     223ms ± 4%  -5.57%  (p=0.000 n=30+30)
Gunzip-4                   37.4ms ± 3%    36.7ms ± 4%  -2.03%  (p=0.000 n=30+30)
HTTPClientServer-4         58.7µs ± 1%    58.5µs ± 2%  -0.37%  (p=0.018 n=30+29)
JSONEncode-4               12.0ms ± 4%    12.1ms ± 3%    ~     (p=0.112 n=30+30)
JSONDecode-4               54.5ms ± 3%    55.5ms ± 4%  +1.80%  (p=0.006 n=30+30)
Mandelbrot200-4            3.78ms ± 4%    3.78ms ± 4%    ~     (p=0.173 n=30+30)
GoParse-4                  3.16ms ± 5%    3.22ms ± 5%  +1.75%  (p=0.010 n=30+30)
RegexpMatchEasy0_32-4      76.6ns ± 1%    75.9ns ± 3%    ~     (p=0.672 n=25+30)
RegexpMatchEasy0_1K-4       252ns ± 3%     253ns ± 3%  +0.57%  (p=0.027 n=30+30)
RegexpMatchEasy1_32-4      69.8ns ± 4%    70.2ns ± 6%    ~     (p=0.539 n=30+30)
RegexpMatchEasy1_1K-4       374ns ± 3%     373ns ± 5%    ~     (p=0.263 n=30+30)
RegexpMatchMedium_32-4      107ns ± 4%     109ns ± 3%    ~     (p=0.067 n=30+30)
RegexpMatchMedium_1K-4     33.9µs ± 5%    34.1µs ± 4%    ~     (p=0.297 n=30+30)
RegexpMatchHard_32-4       1.54µs ± 3%    1.56µs ± 4%  +1.43%  (p=0.002 n=30+30)
RegexpMatchHard_1K-4       46.6µs ± 3%    47.0µs ± 3%    ~     (p=0.055 n=30+30)
Revcomp-4                   411ms ± 6%     407ms ± 6%    ~     (p=0.219 n=30+30)
Template-4                 66.8ms ± 3%    64.8ms ± 5%  -3.01%  (p=0.000 n=30+30)
TimeParse-4                 312ns ± 2%     319ns ± 3%  +2.50%  (p=0.000 n=30+30)
TimeFormat-4                296ns ± 5%     299ns ± 3%  +0.93%  (p=0.005 n=30+30)
[Geo mean]                 47.5µs         47.1µs       -0.75%

name                     old speed      new speed      delta
GobDecode-4               120MB/s ± 5%   118MB/s ± 6%    ~     (p=0.072 n=30+30)
GobEncode-4               127MB/s ± 8%   129MB/s ± 8%    ~     (p=0.150 n=30+30)
Gzip-4                   82.1MB/s ± 4%  87.0MB/s ± 4%  +5.90%  (p=0.000 n=30+30)
Gunzip-4                  519MB/s ± 4%   529MB/s ± 4%  +2.07%  (p=0.001 n=30+30)
JSONEncode-4              162MB/s ± 4%   161MB/s ± 3%    ~     (p=0.110 n=30+30)
JSONDecode-4             35.6MB/s ± 3%  35.0MB/s ± 4%  -1.77%  (p=0.007 n=30+30)
GoParse-4                18.3MB/s ± 4%  18.0MB/s ± 4%  -1.72%  (p=0.009 n=30+30)
RegexpMatchEasy0_32-4     418MB/s ± 1%   422MB/s ± 3%    ~     (p=0.645 n=25+30)
RegexpMatchEasy0_1K-4    4.06GB/s ± 3%  4.04GB/s ± 3%  -0.57%  (p=0.033 n=30+30)
RegexpMatchEasy1_32-4     459MB/s ± 4%   456MB/s ± 6%    ~     (p=0.530 n=30+30)
RegexpMatchEasy1_1K-4    2.73GB/s ± 3%  2.75GB/s ± 5%    ~     (p=0.279 n=30+30)
RegexpMatchMedium_32-4   9.28MB/s ± 5%  9.18MB/s ± 4%    ~     (p=0.086 n=30+30)
RegexpMatchMedium_1K-4   30.2MB/s ± 4%  30.0MB/s ± 4%    ~     (p=0.300 n=30+30)
RegexpMatchHard_32-4     20.8MB/s ± 3%  20.5MB/s ± 4%  -1.41%  (p=0.002 n=30+30)
RegexpMatchHard_1K-4     22.0MB/s ± 3%  21.8MB/s ± 3%    ~     (p=0.051 n=30+30)
Revcomp-4                 619MB/s ± 7%   625MB/s ± 7%    ~     (p=0.219 n=30+30)
Template-4               29.0MB/s ± 3%  29.9MB/s ± 4%  +3.11%  (p=0.000 n=30+30)
[Geo mean]                123MB/s        123MB/s       +0.28%

Change-Id: I850652cfd53329c1af804b7f57f4393d8097bb0d
Reviewed-on: https://go-review.googlesource.com/121135
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-08-20 14:18:39 +00:00
Alberto Donizetti
188e2bf897 test/codegen: port arm64 BIC/EON/ORN and masking tests
And delete them from asm_test.

Change-Id: I24f421b87e8cb4770c887a6dfd58eacd0088947d
Reviewed-on: https://go-review.googlesource.com/106056
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-10 10:57:50 +00:00
Giovanni Bajo
79112707bb cmd/compile: add patterns for bit set/clear/complement on amd64
This patch completes implementation of BT(Q|L), and adds support
for BT(S|R|C)(Q|L).

Example of code changes from time.(*Time).addSec:

        if t.wall&hasMonotonic != 0 {
  0x1073465               488b08                  MOVQ 0(AX), CX
  0x1073468               4889ca                  MOVQ CX, DX
  0x107346b               48c1e93f                SHRQ $0x3f, CX
  0x107346f               48c1e13f                SHLQ $0x3f, CX
  0x1073473               48f7c1ffffffff          TESTQ $-0x1, CX
  0x107347a               746b                    JE 0x10734e7

        if t.wall&hasMonotonic != 0 {
  0x1073435               488b08                  MOVQ 0(AX), CX
  0x1073438               480fbae13f              BTQ $0x3f, CX
  0x107343d               7363                    JAE 0x10734a2

Another example:

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x10734c8               4881e1ffffff3f          ANDQ $0x3fffffff, CX
  0x10734cf               48c1e61e                SHLQ $0x1e, SI
  0x10734d3               4809ce                  ORQ CX, SI
  0x10734d6               48b90000000000000080    MOVQ $0x8000000000000000, CX
  0x10734e0               4809f1                  ORQ SI, CX
  0x10734e3               488908                  MOVQ CX, 0(AX)

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x107348b		4881e2ffffff3f		ANDQ $0x3fffffff, DX
  0x1073492		48c1e61e		SHLQ $0x1e, SI
  0x1073496		4809f2			ORQ SI, DX
  0x1073499		480fbaea3f		BTSQ $0x3f, DX
  0x107349e		488910			MOVQ DX, 0(AX)

Go1 benchmarks seem unaffected, and I would be surprised
otherwise:

name                     old time/op    new time/op     delta
BinaryTree17-4              2.64s ± 4%      2.56s ± 9%  -2.92%  (p=0.008 n=9+9)
Fannkuch11-4                2.90s ± 1%      2.95s ± 3%  +1.76%  (p=0.010 n=10+9)
FmtFprintfEmpty-4          35.3ns ± 1%     34.5ns ± 2%  -2.34%  (p=0.004 n=9+8)
FmtFprintfString-4         57.0ns ± 1%     58.4ns ± 5%  +2.52%  (p=0.029 n=9+10)
FmtFprintfInt-4            59.8ns ± 3%     59.8ns ± 6%    ~     (p=0.565 n=10+10)
FmtFprintfIntInt-4         93.9ns ± 3%     91.2ns ± 5%  -2.94%  (p=0.014 n=10+9)
FmtFprintfPrefixedInt-4     107ns ± 6%      104ns ± 6%    ~     (p=0.099 n=10+10)
FmtFprintfFloat-4           187ns ± 3%      188ns ± 3%    ~     (p=0.505 n=10+9)
FmtManyArgs-4               410ns ± 1%      415ns ± 6%    ~     (p=0.649 n=8+10)
GobDecode-4                5.30ms ± 3%     5.27ms ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4                4.62ms ± 5%     4.47ms ± 2%  -3.24%  (p=0.001 n=9+10)
Gzip-4                      197ms ± 4%      193ms ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                   30.4ms ± 3%     30.1ms ± 3%    ~     (p=0.481 n=10+10)
HTTPClientServer-4         76.3µs ± 1%     76.0µs ± 1%    ~     (p=0.236 n=8+9)
JSONEncode-4               10.5ms ± 9%     10.3ms ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4               42.3ms ±10%     41.3ms ± 2%    ~     (p=0.053 n=9+10)
Mandelbrot200-4            3.80ms ± 2%     3.72ms ± 2%  -2.15%  (p=0.001 n=9+10)
GoParse-4                  2.88ms ±10%     2.81ms ± 2%    ~     (p=0.247 n=10+10)
RegexpMatchEasy0_32-4      69.5ns ± 4%     68.6ns ± 2%    ~     (p=0.171 n=10+10)
RegexpMatchEasy0_1K-4       165ns ± 3%      162ns ± 3%    ~     (p=0.137 n=10+10)
RegexpMatchEasy1_32-4      65.7ns ± 6%     64.4ns ± 2%  -2.02%  (p=0.037 n=10+10)
RegexpMatchEasy1_1K-4       278ns ± 2%      279ns ± 3%    ~     (p=0.991 n=8+9)
RegexpMatchMedium_32-4     99.3ns ± 3%     98.5ns ± 4%    ~     (p=0.457 n=10+9)
RegexpMatchMedium_1K-4     30.1µs ± 1%     30.4µs ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4       1.40µs ± 2%     1.41µs ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4       42.5µs ± 1%     41.5µs ± 3%  -2.13%  (p=0.002 n=8+9)
Revcomp-4                   332ms ± 4%      328ms ± 5%    ~     (p=0.720 n=9+10)
Template-4                 48.3ms ± 2%     49.6ms ± 3%  +2.56%  (p=0.002 n=8+10)
TimeParse-4                 252ns ± 2%      249ns ± 3%    ~     (p=0.116 n=9+10)
TimeFormat-4                262ns ± 4%      252ns ± 3%  -4.01%  (p=0.000 n=9+10)

name                     old speed      new speed       delta
GobDecode-4               145MB/s ± 3%    146MB/s ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4               166MB/s ± 5%    172MB/s ± 2%  +3.28%  (p=0.001 n=9+10)
Gzip-4                   98.6MB/s ± 4%  100.4MB/s ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                  639MB/s ± 3%    645MB/s ± 3%    ~     (p=0.481 n=10+10)
JSONEncode-4              185MB/s ± 8%    189MB/s ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4             46.0MB/s ± 9%   47.0MB/s ± 2%  +2.21%  (p=0.046 n=9+10)
GoParse-4                20.1MB/s ± 9%   20.6MB/s ± 2%    ~     (p=0.239 n=10+10)
RegexpMatchEasy0_32-4     460MB/s ± 4%    467MB/s ± 2%    ~     (p=0.165 n=10+10)
RegexpMatchEasy0_1K-4    6.19GB/s ± 3%   6.28GB/s ± 3%    ~     (p=0.165 n=10+10)
RegexpMatchEasy1_32-4     487MB/s ± 5%    497MB/s ± 2%  +2.00%  (p=0.043 n=10+10)
RegexpMatchEasy1_1K-4    3.67GB/s ± 2%   3.67GB/s ± 3%    ~     (p=0.963 n=8+9)
RegexpMatchMedium_32-4   10.1MB/s ± 3%   10.1MB/s ± 4%    ~     (p=0.435 n=10+9)
RegexpMatchMedium_1K-4   34.0MB/s ± 1%   33.7MB/s ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4     22.9MB/s ± 2%   22.7MB/s ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4     24.0MB/s ± 3%   24.7MB/s ± 3%  +2.64%  (p=0.001 n=9+9)
Revcomp-4                 766MB/s ± 4%    775MB/s ± 5%    ~     (p=0.720 n=9+10)
Template-4               40.2MB/s ± 2%   39.2MB/s ± 3%  -2.47%  (p=0.002 n=8+10)

The rules match ~1800 times during all.bash.

Fixes #18943

Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4
Reviewed-on: https://go-review.googlesource.com/94766
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-24 02:38:50 +00:00
Alberto Donizetti
644b2dafc2 test/codegen: add copyright headers to new codegen files
Change-Id: I9fe6572d1043ef9ee09c0925059ded554ad24c6b
Reviewed-on: https://go-review.googlesource.com/98215
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-02 20:13:13 +00:00
Giovanni Bajo
c9438cb198 test: add support for code generation tests (asmcheck)
The top-level test harness is modified to support a new kind
of test: "asmcheck". This is meant to replace asm_test.go
as an easier and more readable way to test code generation.

I've added a couple of codegen tests to get initial feedback
on the syntax. I've created them under a common "codegen"
subdirectory, so that it's easier to run them all with
"go run run.go -v codegen".

The asmcheck syntax allows to insert line comments that
can specify a regular expression to match in the assembly code,
for multiple architectures (the testsuite will automatically
build each testfile multiple times, one per mentioned architecture).

Negative matches are unsupported for now, so this cannot fully
replace asm_test yet.

Change-Id: Ifdbba389f01d55e63e73c99e5f5449e642101d55
Reviewed-on: https://go-review.googlesource.com/97355
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
2018-03-01 07:59:54 +00:00