1
0
mirror of https://github.com/golang/go synced 2024-09-29 14:24:32 -06:00
Commit Graph

136 Commits

Author SHA1 Message Date
Alberto Donizetti
56eaf574a1 test/codegen: match 387 ops too for GOARCH=386
Change-Id: I99407e27e340689009af798989b33cef7cb92070
Reviewed-on: https://go-review.googlesource.com/103376
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-29 20:05:40 +00:00
Alberto Donizetti
a27cd4fd31 test/codegen: port tbz/tbnz arm64 tests
And delete them from asm_test.

Change-Id: I34fcf85ae8ce09cd146fe4ce6a0ae7616bd97e2d
Reviewed-on: https://go-review.googlesource.com/102296
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-24 09:35:53 +00:00
Giovanni Bajo
79112707bb cmd/compile: add patterns for bit set/clear/complement on amd64
This patch completes implementation of BT(Q|L), and adds support
for BT(S|R|C)(Q|L).

Example of code changes from time.(*Time).addSec:

        if t.wall&hasMonotonic != 0 {
  0x1073465               488b08                  MOVQ 0(AX), CX
  0x1073468               4889ca                  MOVQ CX, DX
  0x107346b               48c1e93f                SHRQ $0x3f, CX
  0x107346f               48c1e13f                SHLQ $0x3f, CX
  0x1073473               48f7c1ffffffff          TESTQ $-0x1, CX
  0x107347a               746b                    JE 0x10734e7

        if t.wall&hasMonotonic != 0 {
  0x1073435               488b08                  MOVQ 0(AX), CX
  0x1073438               480fbae13f              BTQ $0x3f, CX
  0x107343d               7363                    JAE 0x10734a2

Another example:

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x10734c8               4881e1ffffff3f          ANDQ $0x3fffffff, CX
  0x10734cf               48c1e61e                SHLQ $0x1e, SI
  0x10734d3               4809ce                  ORQ CX, SI
  0x10734d6               48b90000000000000080    MOVQ $0x8000000000000000, CX
  0x10734e0               4809f1                  ORQ SI, CX
  0x10734e3               488908                  MOVQ CX, 0(AX)

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x107348b		4881e2ffffff3f		ANDQ $0x3fffffff, DX
  0x1073492		48c1e61e		SHLQ $0x1e, SI
  0x1073496		4809f2			ORQ SI, DX
  0x1073499		480fbaea3f		BTSQ $0x3f, DX
  0x107349e		488910			MOVQ DX, 0(AX)

Go1 benchmarks seem unaffected, and I would be surprised
otherwise:

name                     old time/op    new time/op     delta
BinaryTree17-4              2.64s ± 4%      2.56s ± 9%  -2.92%  (p=0.008 n=9+9)
Fannkuch11-4                2.90s ± 1%      2.95s ± 3%  +1.76%  (p=0.010 n=10+9)
FmtFprintfEmpty-4          35.3ns ± 1%     34.5ns ± 2%  -2.34%  (p=0.004 n=9+8)
FmtFprintfString-4         57.0ns ± 1%     58.4ns ± 5%  +2.52%  (p=0.029 n=9+10)
FmtFprintfInt-4            59.8ns ± 3%     59.8ns ± 6%    ~     (p=0.565 n=10+10)
FmtFprintfIntInt-4         93.9ns ± 3%     91.2ns ± 5%  -2.94%  (p=0.014 n=10+9)
FmtFprintfPrefixedInt-4     107ns ± 6%      104ns ± 6%    ~     (p=0.099 n=10+10)
FmtFprintfFloat-4           187ns ± 3%      188ns ± 3%    ~     (p=0.505 n=10+9)
FmtManyArgs-4               410ns ± 1%      415ns ± 6%    ~     (p=0.649 n=8+10)
GobDecode-4                5.30ms ± 3%     5.27ms ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4                4.62ms ± 5%     4.47ms ± 2%  -3.24%  (p=0.001 n=9+10)
Gzip-4                      197ms ± 4%      193ms ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                   30.4ms ± 3%     30.1ms ± 3%    ~     (p=0.481 n=10+10)
HTTPClientServer-4         76.3µs ± 1%     76.0µs ± 1%    ~     (p=0.236 n=8+9)
JSONEncode-4               10.5ms ± 9%     10.3ms ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4               42.3ms ±10%     41.3ms ± 2%    ~     (p=0.053 n=9+10)
Mandelbrot200-4            3.80ms ± 2%     3.72ms ± 2%  -2.15%  (p=0.001 n=9+10)
GoParse-4                  2.88ms ±10%     2.81ms ± 2%    ~     (p=0.247 n=10+10)
RegexpMatchEasy0_32-4      69.5ns ± 4%     68.6ns ± 2%    ~     (p=0.171 n=10+10)
RegexpMatchEasy0_1K-4       165ns ± 3%      162ns ± 3%    ~     (p=0.137 n=10+10)
RegexpMatchEasy1_32-4      65.7ns ± 6%     64.4ns ± 2%  -2.02%  (p=0.037 n=10+10)
RegexpMatchEasy1_1K-4       278ns ± 2%      279ns ± 3%    ~     (p=0.991 n=8+9)
RegexpMatchMedium_32-4     99.3ns ± 3%     98.5ns ± 4%    ~     (p=0.457 n=10+9)
RegexpMatchMedium_1K-4     30.1µs ± 1%     30.4µs ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4       1.40µs ± 2%     1.41µs ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4       42.5µs ± 1%     41.5µs ± 3%  -2.13%  (p=0.002 n=8+9)
Revcomp-4                   332ms ± 4%      328ms ± 5%    ~     (p=0.720 n=9+10)
Template-4                 48.3ms ± 2%     49.6ms ± 3%  +2.56%  (p=0.002 n=8+10)
TimeParse-4                 252ns ± 2%      249ns ± 3%    ~     (p=0.116 n=9+10)
TimeFormat-4                262ns ± 4%      252ns ± 3%  -4.01%  (p=0.000 n=9+10)

name                     old speed      new speed       delta
GobDecode-4               145MB/s ± 3%    146MB/s ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4               166MB/s ± 5%    172MB/s ± 2%  +3.28%  (p=0.001 n=9+10)
Gzip-4                   98.6MB/s ± 4%  100.4MB/s ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                  639MB/s ± 3%    645MB/s ± 3%    ~     (p=0.481 n=10+10)
JSONEncode-4              185MB/s ± 8%    189MB/s ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4             46.0MB/s ± 9%   47.0MB/s ± 2%  +2.21%  (p=0.046 n=9+10)
GoParse-4                20.1MB/s ± 9%   20.6MB/s ± 2%    ~     (p=0.239 n=10+10)
RegexpMatchEasy0_32-4     460MB/s ± 4%    467MB/s ± 2%    ~     (p=0.165 n=10+10)
RegexpMatchEasy0_1K-4    6.19GB/s ± 3%   6.28GB/s ± 3%    ~     (p=0.165 n=10+10)
RegexpMatchEasy1_32-4     487MB/s ± 5%    497MB/s ± 2%  +2.00%  (p=0.043 n=10+10)
RegexpMatchEasy1_1K-4    3.67GB/s ± 2%   3.67GB/s ± 3%    ~     (p=0.963 n=8+9)
RegexpMatchMedium_32-4   10.1MB/s ± 3%   10.1MB/s ± 4%    ~     (p=0.435 n=10+9)
RegexpMatchMedium_1K-4   34.0MB/s ± 1%   33.7MB/s ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4     22.9MB/s ± 2%   22.7MB/s ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4     24.0MB/s ± 3%   24.7MB/s ± 3%  +2.64%  (p=0.001 n=9+9)
Revcomp-4                 766MB/s ± 4%    775MB/s ± 5%    ~     (p=0.720 n=9+10)
Template-4               40.2MB/s ± 2%   39.2MB/s ± 3%  -2.47%  (p=0.002 n=8+10)

The rules match ~1800 times during all.bash.

Fixes #18943

Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4
Reviewed-on: https://go-review.googlesource.com/94766
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-24 02:38:50 +00:00
Alberto Donizetti
fc6280d4b0 test/codegen: port direct comparisons with memory tests
And remove them from asm_test.

Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454
Reviewed-on: https://go-review.googlesource.com/102115
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-22 17:20:09 +00:00
Alberto Donizetti
be371edd67 test/codegen: port comparisons tests to codegen
And delete them from asm_test.

Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab
Reviewed-on: https://go-review.googlesource.com/101595
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-20 19:38:06 +00:00
Vladimir Kuzmin
c12b185a6e cmd/compile: avoid mapaccess at m[k]=append(m[k]..
Currently rvalue m[k] is transformed during walk into:

        tmp1 := *mapaccess(m, k)
        tmp2 := append(tmp1, ...)
        *mapassign(m, k) = tmp2

However, this is suboptimal, as we could instead produce just:
        tmp := mapassign(m, k)
        *tmp := append(*tmp, ...)

Optimization is possible only if during Order it may tell that m[k] is
exactly the same at left and right part of assignment. It doesn't work:
1) m[f(k)] = append(m[f(k)], ...)
2) sink, m[k] = sink, append(m[k]...)
3) m[k] = append(..., m[k],...)

Benchmark:
name                           old time/op    new time/op    delta
MapAppendAssign/Int32/256-8      33.5ns ± 3%    22.4ns ±10%  -33.24%  (p=0.000 n=16+18)
MapAppendAssign/Int32/65536-8    68.2ns ± 6%    48.5ns ±29%  -28.90%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8      34.3ns ± 4%    23.3ns ± 5%  -32.23%  (p=0.000 n=17+18)
MapAppendAssign/Int64/65536-8    65.9ns ± 7%    61.2ns ±19%   -7.06%  (p=0.002 n=18+20)
MapAppendAssign/Str/256-8         116ns ±12%      79ns ±16%  -31.70%  (p=0.000 n=20+19)
MapAppendAssign/Str/65536-8       134ns ±15%     111ns ±45%  -16.95%  (p=0.000 n=19+20)

name                           old alloc/op   new alloc/op   delta
MapAppendAssign/Int32/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=19+18)
MapAppendAssign/Int32/65536-8     27.0B ± 0%     20.7B ±30%  -23.33%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=20+17)
MapAppendAssign/Int64/65536-8     27.0B ± 0%     27.0B ± 0%     ~     (all equal)
MapAppendAssign/Str/256-8         94.0B ± 0%     78.0B ± 0%  -17.02%  (p=0.000 n=20+16)
MapAppendAssign/Str/65536-8       54.0B ± 0%     54.0B ± 0%     ~     (all equal)

Fixes #24364
Updates #5147

Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409
Reviewed-on: https://go-review.googlesource.com/100838
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-20 01:47:07 +00:00
Alberto Donizetti
5a4e09837c test/codegen: port maps test to codegen
And delete them from asm_test.

Change-Id: I3cf0934706a640136cb0f646509174f8c1bf3363
Reviewed-on: https://go-review.googlesource.com/101395
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-19 13:39:34 +00:00
Alberto Donizetti
b61b1d2c57 test/codegen: port structs test to codegen
And delete them from asm_test.

Change-Id: Ia286239a3d8f3915f2ca25dbcb39f3354a4f8aea
Reviewed-on: https://go-review.googlesource.com/101138
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-18 16:53:53 +00:00
Alberto Donizetti
cceee685be test/codegen: port floats tests to codegen
And delete them from asm_test.

Change-Id: Ibdaca3496eefc73c731b511ddb9636a1f3dff68c
Reviewed-on: https://go-review.googlesource.com/100915
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-15 18:05:59 +00:00
Giovanni Bajo
a35ec9a59e cmd/compile: implement CMOV on amd64
This builds upon the branchelim pass, activating it for amd64 and
lowering CondSelect. Special care is made to FPU instructions for
NaN handling.

Benchmark results on Xeon E5630 (Westmere EP):

name                      old time/op    new time/op    delta
BinaryTree17-16              4.99s ± 9%     4.66s ± 2%     ~     (p=0.095 n=5+5)
Fannkuch11-16                4.93s ± 3%     5.04s ± 2%     ~     (p=0.548 n=5+5)
FmtFprintfEmpty-16          58.8ns ± 7%    61.4ns ±14%     ~     (p=0.579 n=5+5)
FmtFprintfString-16          114ns ± 2%     114ns ± 4%     ~     (p=0.603 n=5+5)
FmtFprintfInt-16             181ns ± 4%     125ns ± 3%  -30.90%  (p=0.008 n=5+5)
FmtFprintfIntInt-16          263ns ± 2%     217ns ± 2%  -17.34%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-16     230ns ± 1%     212ns ± 1%   -7.99%  (p=0.008 n=5+5)
FmtFprintfFloat-16           411ns ± 3%     344ns ± 5%  -16.43%  (p=0.008 n=5+5)
FmtManyArgs-16               828ns ± 4%     790ns ± 2%   -4.59%  (p=0.032 n=5+5)
GobDecode-16                10.9ms ± 4%    10.8ms ± 5%     ~     (p=0.548 n=5+5)
GobEncode-16                9.52ms ± 5%    9.46ms ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                      334ms ± 2%     337ms ± 2%     ~     (p=0.548 n=5+5)
Gunzip-16                   64.4ms ± 1%    65.0ms ± 1%   +1.00%  (p=0.008 n=5+5)
HTTPClientServer-16          156µs ± 3%     155µs ± 3%     ~     (p=0.690 n=5+5)
JSONEncode-16               21.0ms ± 1%    21.8ms ± 0%   +3.76%  (p=0.016 n=5+4)
JSONDecode-16               95.1ms ± 0%    95.7ms ± 1%     ~     (p=0.151 n=5+5)
Mandelbrot200-16            6.38ms ± 1%    6.42ms ± 1%     ~     (p=0.095 n=5+5)
GoParse-16                  5.47ms ± 2%    5.36ms ± 1%   -1.95%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16       111ns ± 1%     111ns ± 1%     ~     (p=0.635 n=5+4)
RegexpMatchEasy0_1K-16       408ns ± 1%     411ns ± 2%     ~     (p=0.087 n=5+5)
RegexpMatchEasy1_32-16       103ns ± 1%     104ns ± 1%     ~     (p=0.484 n=5+5)
RegexpMatchEasy1_1K-16       659ns ± 2%     652ns ± 1%     ~     (p=0.571 n=5+5)
RegexpMatchMedium_32-16      176ns ± 2%     174ns ± 1%     ~     (p=0.476 n=5+5)
RegexpMatchMedium_1K-16     58.6µs ± 4%    57.7µs ± 4%     ~     (p=0.548 n=5+5)
RegexpMatchHard_32-16       3.07µs ± 3%    3.04µs ± 4%     ~     (p=0.421 n=5+5)
RegexpMatchHard_1K-16       89.2µs ± 1%    87.9µs ± 2%   -1.52%  (p=0.032 n=5+5)
Revcomp-16                   575ms ± 0%     587ms ± 2%   +2.12%  (p=0.032 n=4+5)
Template-16                  110ms ± 1%     107ms ± 3%   -3.00%  (p=0.032 n=5+5)
TimeParse-16                 463ns ± 0%     462ns ± 0%     ~     (p=0.810 n=5+4)
TimeFormat-16                538ns ± 0%     535ns ± 0%   -0.63%  (p=0.024 n=5+5)

name                      old speed      new speed      delta
GobDecode-16              70.7MB/s ± 4%  71.4MB/s ± 5%     ~     (p=0.452 n=5+5)
GobEncode-16              80.7MB/s ± 5%  81.2MB/s ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                   58.2MB/s ± 2%  57.7MB/s ± 2%     ~     (p=0.452 n=5+5)
Gunzip-16                  302MB/s ± 1%   299MB/s ± 1%   -0.99%  (p=0.008 n=5+5)
JSONEncode-16             92.4MB/s ± 1%  89.1MB/s ± 0%   -3.63%  (p=0.016 n=5+4)
JSONDecode-16             20.4MB/s ± 0%  20.3MB/s ± 1%     ~     (p=0.135 n=5+5)
GoParse-16                10.6MB/s ± 2%  10.8MB/s ± 1%   +2.00%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16     286MB/s ± 1%   285MB/s ± 3%     ~     (p=1.000 n=5+5)
RegexpMatchEasy0_1K-16    2.51GB/s ± 1%  2.49GB/s ± 2%     ~     (p=0.095 n=5+5)
RegexpMatchEasy1_32-16     309MB/s ± 1%   307MB/s ± 1%     ~     (p=0.548 n=5+5)
RegexpMatchEasy1_1K-16    1.55GB/s ± 2%  1.57GB/s ± 1%     ~     (p=0.690 n=5+5)
RegexpMatchMedium_32-16   5.68MB/s ± 2%  5.73MB/s ± 1%     ~     (p=0.579 n=5+5)
RegexpMatchMedium_1K-16   17.5MB/s ± 4%  17.8MB/s ± 4%     ~     (p=0.500 n=5+5)
RegexpMatchHard_32-16     10.4MB/s ± 3%  10.5MB/s ± 4%     ~     (p=0.460 n=5+5)
RegexpMatchHard_1K-16     11.5MB/s ± 1%  11.7MB/s ± 2%   +1.57%  (p=0.032 n=5+5)
Revcomp-16                 442MB/s ± 0%   433MB/s ± 2%   -2.05%  (p=0.032 n=4+5)
Template-16               17.7MB/s ± 1%  18.2MB/s ± 3%   +3.12%  (p=0.032 n=5+5)

Change-Id: I6972e8f35f2b31f9a42ac473a6bf419a18022558
Reviewed-on: https://go-review.googlesource.com/100935
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-15 16:41:59 +00:00
Geoff Berry
e244a7a7d3 cmd/compile/internal/ssa: add patterns for arm64 bitfield opcodes
Add patterns to match common idioms for EXTR, BFI, BFXIL, SBFIZ, SBFX,
UBFIZ and UBFX opcodes.

go1 benchmarks results on Amberwing:
name                   old time/op    new time/op    delta
FmtManyArgs               786ns ± 2%     714ns ± 1%  -9.20%  (p=0.000 n=10+10)
Gzip                      437ms ± 0%     402ms ± 0%  -7.99%  (p=0.000 n=10+10)
FmtFprintfIntInt          196ns ± 0%     182ns ± 0%  -7.28%  (p=0.000 n=10+9)
FmtFprintfPrefixedInt     207ns ± 0%     199ns ± 0%  -3.86%  (p=0.000 n=10+10)
FmtFprintfFloat           324ns ± 0%     316ns ± 0%  -2.47%  (p=0.000 n=10+8)
FmtFprintfInt             119ns ± 0%     117ns ± 0%  -1.68%  (p=0.000 n=10+9)
GobDecode                12.8ms ± 2%    12.6ms ± 1%  -1.62%  (p=0.002 n=10+10)
JSONDecode               94.4ms ± 1%    93.4ms ± 0%  -1.10%  (p=0.000 n=10+10)
RegexpMatchEasy0_32       247ns ± 0%     245ns ± 0%  -0.65%  (p=0.000 n=10+10)
RegexpMatchMedium_32      314ns ± 0%     312ns ± 0%  -0.64%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K       541ns ± 0%     538ns ± 0%  -0.55%  (p=0.000 n=10+9)
TimeParse                 450ns ± 1%     448ns ± 1%  -0.42%  (p=0.035 n=9+9)
RegexpMatchEasy1_32       244ns ± 0%     243ns ± 0%  -0.41%  (p=0.000 n=10+10)
GoParse                  6.03ms ± 0%    6.00ms ± 0%  -0.40%  (p=0.002 n=10+10)
RegexpMatchEasy1_1K       779ns ± 0%     777ns ± 0%  -0.26%  (p=0.000 n=10+10)
RegexpMatchHard_32       2.75µs ± 0%    2.74µs ± 1%  -0.06%  (p=0.026 n=9+9)
BinaryTree17              11.7s ± 0%     11.6s ± 0%    ~     (p=0.089 n=10+10)
HTTPClientServer         89.1µs ± 1%    89.5µs ± 2%    ~     (p=0.436 n=10+10)
RegexpMatchHard_1K       78.9µs ± 0%    79.5µs ± 2%    ~     (p=0.469 n=10+10)
FmtFprintfEmpty          58.5ns ± 0%    58.5ns ± 0%    ~     (all equal)
GobEncode                12.0ms ± 1%    12.1ms ± 0%    ~     (p=0.075 n=10+10)
Revcomp                   669ms ± 0%     668ms ± 0%    ~     (p=0.091 n=7+9)
Mandelbrot200            5.35ms ± 0%    5.36ms ± 0%  +0.07%  (p=0.000 n=9+9)
RegexpMatchMedium_1K     52.1µs ± 0%    52.1µs ± 0%  +0.10%  (p=0.000 n=9+9)
Fannkuch11                3.25s ± 0%     3.26s ± 0%  +0.36%  (p=0.000 n=9+10)
FmtFprintfString          114ns ± 1%     115ns ± 0%  +0.52%  (p=0.011 n=10+10)
JSONEncode               20.2ms ± 0%    20.3ms ± 0%  +0.65%  (p=0.000 n=10+10)
Template                 91.3ms ± 0%    92.3ms ± 0%  +1.08%  (p=0.000 n=10+10)
TimeFormat                484ns ± 0%     495ns ± 1%  +2.30%  (p=0.000 n=9+10)

There are some opportunities to improve this change further by adding
patterns to match the "extended register" versions of ADD/SUB/CMP, but I
think that should be evaluated on its own.  The regressions in Template
and TimeFormat would likely be recovered by this, as they seem to be due
to generating:

    ubfiz x0, x0, #3, #8
    add x1, x2, x0

instead of

    add x1, x2, x0, lsl #3

Change-Id: I5644a8d70ac7a98e784a377a2b76ab47a3415a4b
Reviewed-on: https://go-review.googlesource.com/88355
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-15 14:10:41 +00:00
Alberto Donizetti
ded9a1b372 test/codegen: port len/cap pow2 div tests to codegen
And delete them from asm_test.

Change-Id: I29c8d098a8893e6b669b6272a2f508985ac9d618
Reviewed-on: https://go-review.googlesource.com/100876
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-15 13:34:01 +00:00
Ilya Tocar
644d14ea0f Revert "cmd/compile: implement CMOV on amd64"
This reverts commit 080187f4f7.

It broke build of golang.org/x/exp/shiny/iconvg
See issue 24395 for details

Change-Id: Ifd6134f6214e6cee40bd3c63c32941d5fc96ae8b
Reviewed-on: https://go-review.googlesource.com/100755
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-14 21:21:23 +00:00
Alberto Donizetti
cd3aae9b81 test/codegen: port all small memmove tests to codegen
This change ports all the remaining tests checking that small memmoves
are replaced with MOVs to the new codegen test harness, and deletes
them from the asm_test file.

Change-Id: I01c94b441e27a5d61518035af62d62779dafeb56
Reviewed-on: https://go-review.googlesource.com/100476
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-14 15:57:07 +00:00
Alberto Donizetti
858042b8fd test/codegen: add codegen tests for div
Change-Id: I6ce8981e85fd55ade6078b0946e54a9215d9deca
Reviewed-on: https://go-review.googlesource.com/100575
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-14 15:56:46 +00:00
isharipo
85a8d25d53 cmd/compile/internal/ssa: emit IMUL3{L/Q} for MUL{L/Q}const on x86
cmd/asm now supports three-operand form of IMUL,
so instead of using IMUL with resultInArg0, emit IMUL3 instruction.

This results in less redundant MOVs where SSA assigns
different registers to input[0] and dst arguments.

Note: these have exactly the same encoding when reg0=reg1:
      IMUL3x $const, reg0, reg1
      IMULx $const, reg
Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0].
This is why we don't bother to generate IMULx for the case where
dst is the same as input[0].

Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4
Reviewed-on: https://go-review.googlesource.com/99656
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-12 19:02:36 +00:00
Giovanni Bajo
080187f4f7 cmd/compile: implement CMOV on amd64
This builds upon the branchelim pass, activating it for amd64 and
lowering CondSelect. Special care is made to FPU instructions for
NaN handling.

Benchmark results on Xeon E5630 (Westmere EP):

name                      old time/op    new time/op    delta
BinaryTree17-16              4.99s ± 9%     4.66s ± 2%     ~     (p=0.095 n=5+5)
Fannkuch11-16                4.93s ± 3%     5.04s ± 2%     ~     (p=0.548 n=5+5)
FmtFprintfEmpty-16          58.8ns ± 7%    61.4ns ±14%     ~     (p=0.579 n=5+5)
FmtFprintfString-16          114ns ± 2%     114ns ± 4%     ~     (p=0.603 n=5+5)
FmtFprintfInt-16             181ns ± 4%     125ns ± 3%  -30.90%  (p=0.008 n=5+5)
FmtFprintfIntInt-16          263ns ± 2%     217ns ± 2%  -17.34%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-16     230ns ± 1%     212ns ± 1%   -7.99%  (p=0.008 n=5+5)
FmtFprintfFloat-16           411ns ± 3%     344ns ± 5%  -16.43%  (p=0.008 n=5+5)
FmtManyArgs-16               828ns ± 4%     790ns ± 2%   -4.59%  (p=0.032 n=5+5)
GobDecode-16                10.9ms ± 4%    10.8ms ± 5%     ~     (p=0.548 n=5+5)
GobEncode-16                9.52ms ± 5%    9.46ms ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                      334ms ± 2%     337ms ± 2%     ~     (p=0.548 n=5+5)
Gunzip-16                   64.4ms ± 1%    65.0ms ± 1%   +1.00%  (p=0.008 n=5+5)
HTTPClientServer-16          156µs ± 3%     155µs ± 3%     ~     (p=0.690 n=5+5)
JSONEncode-16               21.0ms ± 1%    21.8ms ± 0%   +3.76%  (p=0.016 n=5+4)
JSONDecode-16               95.1ms ± 0%    95.7ms ± 1%     ~     (p=0.151 n=5+5)
Mandelbrot200-16            6.38ms ± 1%    6.42ms ± 1%     ~     (p=0.095 n=5+5)
GoParse-16                  5.47ms ± 2%    5.36ms ± 1%   -1.95%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16       111ns ± 1%     111ns ± 1%     ~     (p=0.635 n=5+4)
RegexpMatchEasy0_1K-16       408ns ± 1%     411ns ± 2%     ~     (p=0.087 n=5+5)
RegexpMatchEasy1_32-16       103ns ± 1%     104ns ± 1%     ~     (p=0.484 n=5+5)
RegexpMatchEasy1_1K-16       659ns ± 2%     652ns ± 1%     ~     (p=0.571 n=5+5)
RegexpMatchMedium_32-16      176ns ± 2%     174ns ± 1%     ~     (p=0.476 n=5+5)
RegexpMatchMedium_1K-16     58.6µs ± 4%    57.7µs ± 4%     ~     (p=0.548 n=5+5)
RegexpMatchHard_32-16       3.07µs ± 3%    3.04µs ± 4%     ~     (p=0.421 n=5+5)
RegexpMatchHard_1K-16       89.2µs ± 1%    87.9µs ± 2%   -1.52%  (p=0.032 n=5+5)
Revcomp-16                   575ms ± 0%     587ms ± 2%   +2.12%  (p=0.032 n=4+5)
Template-16                  110ms ± 1%     107ms ± 3%   -3.00%  (p=0.032 n=5+5)
TimeParse-16                 463ns ± 0%     462ns ± 0%     ~     (p=0.810 n=5+4)
TimeFormat-16                538ns ± 0%     535ns ± 0%   -0.63%  (p=0.024 n=5+5)

name                      old speed      new speed      delta
GobDecode-16              70.7MB/s ± 4%  71.4MB/s ± 5%     ~     (p=0.452 n=5+5)
GobEncode-16              80.7MB/s ± 5%  81.2MB/s ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                   58.2MB/s ± 2%  57.7MB/s ± 2%     ~     (p=0.452 n=5+5)
Gunzip-16                  302MB/s ± 1%   299MB/s ± 1%   -0.99%  (p=0.008 n=5+5)
JSONEncode-16             92.4MB/s ± 1%  89.1MB/s ± 0%   -3.63%  (p=0.016 n=5+4)
JSONDecode-16             20.4MB/s ± 0%  20.3MB/s ± 1%     ~     (p=0.135 n=5+5)
GoParse-16                10.6MB/s ± 2%  10.8MB/s ± 1%   +2.00%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16     286MB/s ± 1%   285MB/s ± 3%     ~     (p=1.000 n=5+5)
RegexpMatchEasy0_1K-16    2.51GB/s ± 1%  2.49GB/s ± 2%     ~     (p=0.095 n=5+5)
RegexpMatchEasy1_32-16     309MB/s ± 1%   307MB/s ± 1%     ~     (p=0.548 n=5+5)
RegexpMatchEasy1_1K-16    1.55GB/s ± 2%  1.57GB/s ± 1%     ~     (p=0.690 n=5+5)
RegexpMatchMedium_32-16   5.68MB/s ± 2%  5.73MB/s ± 1%     ~     (p=0.579 n=5+5)
RegexpMatchMedium_1K-16   17.5MB/s ± 4%  17.8MB/s ± 4%     ~     (p=0.500 n=5+5)
RegexpMatchHard_32-16     10.4MB/s ± 3%  10.5MB/s ± 4%     ~     (p=0.460 n=5+5)
RegexpMatchHard_1K-16     11.5MB/s ± 1%  11.7MB/s ± 2%   +1.57%  (p=0.032 n=5+5)
Revcomp-16                 442MB/s ± 0%   433MB/s ± 2%   -2.05%  (p=0.032 n=4+5)
Template-16               17.7MB/s ± 1%  18.2MB/s ± 3%   +3.12%  (p=0.032 n=5+5)

Change-Id: Ic7cb7374d07da031e771bdcbfdd832fd1b17159c
Reviewed-on: https://go-review.googlesource.com/98695
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-03-12 18:01:33 +00:00
Giovanni Bajo
f7ac70a566 test: move rotate tests to top-level testsuite.
Remove old tests from asm_test.

Change-Id: Ib408ec7faa60068bddecf709b93ce308e0ef665a
Reviewed-on: https://go-review.googlesource.com/100075
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
2018-03-11 10:08:18 +00:00
Alberto Donizetti
8e427a3878 test/codegen: add README file for the codegen test harness
This change adds a README file inside the test/codegen directory,
explaining how to run the codegen tests and the syntax of the regexps
comments used to match assembly instructions.

Change-Id: Ica4eb3ffa9c6975371538cc8ae0ac3c1a3a03baf
Reviewed-on: https://go-review.googlesource.com/99156
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-09 18:38:53 +00:00
Alberto Donizetti
5f541b11aa test/codegen: port MULs merging tests to codegen
And delete them from asm_go.

Change-Id: I0057cbd90ca55fa51c596e32406e190f3866f93e
Reviewed-on: https://go-review.googlesource.com/99815
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-09 17:01:56 +00:00
Alberto Donizetti
cde34780b7 test/codegen: port math/bits.RotateLeft tests to codegen
Only RotateLeft{64,32} were tested, and just for ppc64. This CL adds
tests for RotateLeft{64,32,16,8} on arm64 and amd64/386, for the cases
where the calls are actually instrinsified.

RotateLeft tests (the last ones for math/bits functions) are deleted
from asm_test.

This CL also adds a space between the "//" and the arch name in the
comments, to uniform this file to the style used in all the other
files.

Change-Id: Ifc2a27261d70bcc294b4ec64490d8367f62d2b89
Reviewed-on: https://go-review.googlesource.com/99596
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-09 10:53:38 +00:00
Alberto Donizetti
3772b2e1d5 test/codegen: port 2^n muls tests to codegen harness
And delete them from the asm_test.go file.

Change-Id: I124c8c352299646ec7db0968cdb0fe59a3b5d83d
Reviewed-on: https://go-review.googlesource.com/99475
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-08 16:30:14 +00:00
Alberto Donizetti
c028958393 test/codegen: fix issue with arm64 memmove codegen test
This recently added arm64 memmove codegen check:

  func movesmall() {
    // arm64:-"memmove"
    x := [...]byte{1, 2, 3, 4, 5, 6, 7}
    copy(x[1:], x[:])
  }

is not correct, for two reasons:

1. regexps are matched from the start of the disasm line (excluding
   line information). This mean that a negative -"memmove" check will
   pass against a 'CALL runtime.memmove' line because the line does
   not start with 'memmove' (its starts with CALL...).
   The way to specify no 'memmove' match whatsoever on the line is
   -".*memmove"

2. AFAIK comments on their own line are matched against the first
   subsequent non-comment line. So the code above only verifies that
   the x := ... line does not generate a memmove. The comment should
   be moved near the copy() line, if it's that one we want to not
   generate a memmove call.

The fact that the test above is not effective can be checked by
running `go run run.go -v codegen` in the toplevel test directory with
a go1.10 toolchain (that does not have the memmove-elision
optimization). The test will still pass (it shouldn't).

This change changes the regexp to -".*memmove" and moves it near the
line it needs to (not)match.

Change-Id: Ie01ef4d775e77d92dc8d8b7856b89b200f5e5ef2
Reviewed-on: https://go-review.googlesource.com/98977
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-07 16:41:24 +00:00
Alberto Donizetti
8516ecd05f test/codegen: port math/bits.ReverseBytes tests to codegen
And remove them from ssa_test.

Change-Id: If767af662801219774d1bdb787c77edfa6067770
Reviewed-on: https://go-review.googlesource.com/98976
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-06 20:34:33 +00:00
Wei Xiao
05962561ae cmd/compile/internal/ssa: improve store combine optimization on arm64
Current implementation doesn't consider MOVDreg type operand and fail to combine
it into larger store. This patch fixes the issue.

Fixes #24242

Change-Id: I7d68697f80e76f48c3528ece01a602bf513248ec
Reviewed-on: https://go-review.googlesource.com/98397
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 20:29:04 +00:00
Balaram Makam
0e8b7110f6 cmd/compile/internal/ssa: inline small memmove for arm64
This patch enables the optimization for arm64 target.

Performance results on Amberwing for strconv benchmark:
name             old time/op  new time/op  delta
Quote             721ns ± 0%   617ns ± 0%  -14.40%  (p=0.016 n=5+4)
QuoteRune         118ns ± 0%   117ns ± 0%   -0.85%  (p=0.008 n=5+5)
AppendQuote       436ns ± 2%   321ns ± 0%  -26.31%  (p=0.008 n=5+5)
AppendQuoteRune  34.7ns ± 0%  28.4ns ± 0%  -18.16%  (p=0.000 n=5+4)
[Geo mean]        189ns        160ns       -15.41%

Change-Id: I5714c474e7483d07ca338fbaf49beb4bbcc11c44
Reviewed-on: https://go-review.googlesource.com/98735
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 18:37:19 +00:00
Alberto Donizetti
18ae5eca3b test/codegen: port math/bits.OnesCount tests to codegen
And remove them from ssa_test.

Change-Id: I3efac5fea529bb0efa2dae32124530482ba5058e
Reviewed-on: https://go-review.googlesource.com/98815
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-06 17:53:00 +00:00
Alberto Donizetti
85dcc709a8 test/codegen: port math/bits.TrailingZeros tests to codegen
And remove them from ssa_test.

Change-Id: Ib5de5c0d908f23915e0847eca338cacf2fa5325b
Reviewed-on: https://go-review.googlesource.com/98795
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-06 11:48:37 +00:00
Alberto Donizetti
83e41b3e76 test/codegen: port math/bits.Leadingzero tests to codegen
Change-Id: Ic21d25db5d56ce77516c53082dfbc010e5875b81
Reviewed-on: https://go-review.googlesource.com/98655
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-05 19:52:04 +00:00
Alberto Donizetti
c1806906d8 test: port bits.Len intrinsics tests to the new codegen harness
This change move bits.Len* intrinsification tests to the new codegen
test harness, removing them from the old ssa_test file. Five different
test functions (one for each bit.Len function tested) was used, to
avoid possible unwanted interactions between multiple calls inside one
function.

Change-Id: Iffd5be55b58e88597fa30a562a28dacb01236d8b
Reviewed-on: https://go-review.googlesource.com/98156
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-05 18:01:19 +00:00
Giovanni Bajo
89ae7045f3 test: convert all math-related tests from asm_test
Change-Id: If542f0b5c5754e6eb2f9b302fe5a148ba9a57338
Reviewed-on: https://go-review.googlesource.com/98443
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-04 16:52:33 +00:00
Giovanni Bajo
fad31e513d test: move load/store combines into asmcheck
This CL moves the load/store combining tests into asmcheck.
In addition at being more compact, it's also now easier to
spot what it is missing in each architecture.

While doing so, I think I uncovered a bug in ppc64le and arm64
rules, because they fail to load/store combine in non-trivial
functions. Not sure why, I'll open an issue.

Change-Id: Ia1572d53c0553d9104f3e52b95e4d1768a8440a3
Reviewed-on: https://go-review.googlesource.com/98441
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-04 16:52:03 +00:00
Giovanni Bajo
8ce74b7d11 test: port a nil-check interface test from asm_test
Change-Id: I69c1688506d1aeca655047acf35d1bff966fc01e
Reviewed-on: https://go-review.googlesource.com/98442
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-03 20:20:54 +00:00
Alberto Donizetti
644b2dafc2 test/codegen: add copyright headers to new codegen files
Change-Id: I9fe6572d1043ef9ee09c0925059ded554ad24c6b
Reviewed-on: https://go-review.googlesource.com/98215
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-02 20:13:13 +00:00
Giovanni Bajo
879a1ff1e4 test: improve asmcheck syntax
asmcheck comments now support a compact form of specifying
multiple checks for each platform, using the following syntax:

   amd64:"SHL\t[$]4","SHR\t[$]4"

Negative checks are also parsed using the following syntax:

   amd64:-"ROR"

though they are still not working.

Moreover, out-of-line comments have been implemented. This
allows to specify asmchecks on comment-only lines, that will
be matched on the first subsequent non-comment non-empty line.

    // amd64:"XOR"
    // arm:"EOR"

    x ^= 1

Change-Id: I110c7462fc6a5c70fd4af0d42f516016ae7f2760
Reviewed-on: https://go-review.googlesource.com/97816
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-01 18:10:48 +00:00
Giovanni Bajo
c9438cb198 test: add support for code generation tests (asmcheck)
The top-level test harness is modified to support a new kind
of test: "asmcheck". This is meant to replace asm_test.go
as an easier and more readable way to test code generation.

I've added a couple of codegen tests to get initial feedback
on the syntax. I've created them under a common "codegen"
subdirectory, so that it's easier to run them all with
"go run run.go -v codegen".

The asmcheck syntax allows to insert line comments that
can specify a regular expression to match in the assembly code,
for multiple architectures (the testsuite will automatically
build each testfile multiple times, one per mentioned architecture).

Negative matches are unsupported for now, so this cannot fully
replace asm_test yet.

Change-Id: Ifdbba389f01d55e63e73c99e5f5449e642101d55
Reviewed-on: https://go-review.googlesource.com/97355
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
2018-03-01 07:59:54 +00:00