1
0
mirror of https://github.com/golang/go synced 2024-11-18 09:14:43 -07:00
Commit Graph

20 Commits

Author SHA1 Message Date
Agniva De Sarker
8acbe4c0b3 cmd/compile: optimize unsigned comparisons with 0/1 on wasm
Updates #21439

Change-Id: I0fbcde6e0c2fc368fe686b271670f9d8be4a7900
Reviewed-on: https://go-review.googlesource.com/c/go/+/247557
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
2020-08-22 12:35:47 +00:00
Junchen Li
06337823ef cmd/compile: optimize unsigned comparisons to 0/1 on arm64
For an unsigned integer, it's useful to convert its order test with 0/1
to its equality test with 0. We can save a comparison instruction that
followed by a conditional branch on arm64 since it supports
compare-with-zero-and-branch instructions. For example,

  if x > 0 { ... } else { ... }

the original version:
  CMP $0, R0
  BLS 9

the optimized version:
  CBZ R0, 8

Updates #21439

Change-Id: Id1de6f865f6aa72c5d45b29f7894818857288425
Reviewed-on: https://go-review.googlesource.com/c/go/+/246857
Reviewed-by: Keith Randall <khr@golang.org>
2020-08-18 04:09:13 +00:00
Junchen Li
e30fbe3757 cmd/compile: optimize unsigned comparisons to 0
There are some architecture-independent rules in #21439, since an
unsigned integer >= 0 is always true and < 0 is always false. This CL
adds these optimizations to generic rules.

Updates #21439

Change-Id: Iec7e3040b761ecb1e60908f764815fdd9bc62495
Reviewed-on: https://go-review.googlesource.com/c/go/+/246617
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-08-17 20:06:35 +00:00
Xiangdong Ji
e031318ca6 cmd/compile: ARM comparisons with 0 incorrect on overflow
Some ARM rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub caculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag:

  Block-Op         Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NEQ & PL

The patch also adds a few test cases to cover scenarios that are specific
to ARM and fine-tunes the code generation tests for 'x-const'.

For more details please refer to the previous fix on 64-bit ARM:
  https://go-review.googlesource.com/c/go/+/233097

Go1 perf, 'old' is the non-optimized version, that is removing all concerned
rewriting rules.

name                     old time/op    new time/op     delta
BinaryTree17-8              7.73s ± 0%      7.81s ± 0%  +0.97%  (p=0.000 n=7+8)
Fannkuch11-8                7.06s ± 0%      7.00s ± 0%  -0.83%  (p=0.000 n=8+8)
FmtFprintfEmpty-8           181ns ± 1%      183ns ± 1%  +1.31%  (p=0.001 n=8+8)
FmtFprintfString-8          319ns ± 1%      325ns ± 2%  +1.71%  (p=0.009 n=7+8)
FmtFprintfInt-8             358ns ± 1%      359ns ± 1%    ~     (p=0.293 n=7+7)
FmtFprintfIntInt-8          459ns ± 3%      456ns ± 1%    ~     (p=0.869 n=8+8)
FmtFprintfPrefixedInt-8     535ns ± 4%      538ns ± 4%    ~     (p=0.572 n=8+8)
FmtFprintfFloat-8          1.01µs ± 2%     1.01µs ± 2%    ~     (p=0.625 n=8+8)
FmtManyArgs-8              1.93µs ± 2%     1.93µs ± 1%    ~     (p=0.979 n=8+7)
GobDecode-8                16.1ms ± 1%     16.5ms ± 1%  +2.32%  (p=0.000 n=8+8)
GobEncode-8                15.9ms ± 0%     15.8ms ± 1%  -1.00%  (p=0.000 n=8+7)
Gzip-8                      690ms ± 1%      670ms ± 0%  -2.90%  (p=0.000 n=8+8)
Gunzip-8                    109ms ± 1%      109ms ± 1%    ~     (p=0.694 n=7+8)
HTTPClientServer-8          149µs ± 3%      146µs ± 2%  -1.70%  (p=0.028 n=8+8)
JSONEncode-8               50.5ms ± 1%     49.2ms ± 0%  -2.60%  (p=0.001 n=7+7)
JSONDecode-8                135ms ± 2%      137ms ± 1%    ~     (p=0.054 n=8+7)
Mandelbrot200-8             951ms ± 0%      952ms ± 0%    ~     (p=0.852 n=6+8)
GoParse-8                  9.47ms ± 1%     9.66ms ± 1%  +2.01%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8       288ns ± 2%      277ns ± 2%  -3.61%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8      1.66µs ± 1%     1.69µs ± 2%  +2.21%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8       334ns ± 1%      305ns ± 2%  -8.86%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8      2.14µs ± 2%     2.15µs ± 0%    ~     (p=0.099 n=8+8)
RegexpMatchMedium_32-8     13.3ns ± 1%     13.3ns ± 0%    ~     (p=1.000 n=7+7)
RegexpMatchMedium_1K-8     81.1µs ± 3%     80.7µs ± 1%    ~     (p=0.955 n=7+8)
RegexpMatchHard_32-8       4.26µs ± 0%     4.26µs ± 0%    ~     (p=0.933 n=7+8)
RegexpMatchHard_1K-8        124µs ± 0%      124µs ± 0%  +0.31%  (p=0.000 n=8+8)
Revcomp-8                  14.7ms ± 2%     14.5ms ± 1%  -1.66%  (p=0.003 n=8+8)
Template-8                  197ms ± 2%      200ms ± 3%  +1.62%  (p=0.021 n=8+8)
TimeParse-8                1.33µs ± 1%     1.30µs ± 1%  -1.86%  (p=0.002 n=8+8)
TimeFormat-8               3.04µs ± 1%     3.02µs ± 0%  -0.60%  (p=0.000 n=8+8)

name                     old speed      new speed       delta
GobDecode-8              47.6MB/s ± 1%   46.5MB/s ± 1%  -2.28%  (p=0.000 n=8+8)
GobEncode-8              48.1MB/s ± 0%   48.6MB/s ± 1%  +1.02%  (p=0.000 n=8+7)
Gzip-8                   28.1MB/s ± 1%   29.0MB/s ± 0%  +2.97%  (p=0.000 n=8+8)
Gunzip-8                  178MB/s ± 1%    179MB/s ± 2%    ~     (p=0.694 n=7+8)
JSONEncode-8             38.4MB/s ± 1%   39.4MB/s ± 0%  +2.67%  (p=0.001 n=7+7)
JSONDecode-8             14.3MB/s ± 2%   14.2MB/s ± 1%  -0.81%  (p=0.043 n=8+7)
GoParse-8                6.12MB/s ± 1%   5.99MB/s ± 1%  -2.00%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8     111MB/s ± 2%    115MB/s ± 2%  +3.77%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8     618MB/s ± 1%    604MB/s ± 2%  -2.16%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8    95.7MB/s ± 1%  105.1MB/s ± 2%  +9.76%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8     479MB/s ± 2%    477MB/s ± 0%    ~     (p=0.105 n=8+8)
RegexpMatchMedium_32-8   75.2MB/s ± 1%   75.2MB/s ± 0%    ~     (p=0.247 n=7+7)
RegexpMatchMedium_1K-8   12.6MB/s ± 3%   12.7MB/s ± 1%    ~     (p=0.538 n=7+8)
RegexpMatchHard_32-8     7.52MB/s ± 0%   7.52MB/s ± 0%    ~     (p=0.968 n=7+8)
RegexpMatchHard_1K-8     8.26MB/s ± 0%   8.24MB/s ± 0%  -0.30%  (p=0.001 n=8+8)
Revcomp-8                 173MB/s ± 2%    176MB/s ± 1%  +1.68%  (p=0.003 n=8+8)
Template-8               9.85MB/s ± 2%   9.69MB/s ± 3%  -1.59%  (p=0.021 n=8+8)

Fixes   #39303
Updates #38740

Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36
Reviewed-on: https://go-review.googlesource.com/c/go/+/236637
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-06-09 15:50:33 +00:00
Xiangdong Ji
e8f5a33191 cmd/compile: fix incorrect rewriting to if condition
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub caculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag, in the following categories:

  Block-Op        Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NEQ & PL

The backend generates two consecutive branch instructions for 'LEnoov'
and 'GTnoov' to model their expected behavior. A slight change to 'gc'
and amd64/386 backends is made to unify the code generation.

Add a test 'TestCondRewrite' as justification, it covers 32 incorrect rules
identified on arm64, more might be needed on other arches, like 32-bit arm.

Add two benchmarks profiling the aforementioned category 1&2 and category
3&4 separetely, we expect the first two categories will show performance
improvement and the second will not result in visible regression compared with
the non-optimized version.

This change also updates TestFormats to support using %#x.

Examples exhibiting where does the issue come from:
  1: 'if x + 3 < 0' might be converted to:
  before:
    CMN $3, R0
    BGE <else branch> // wrong branch is taken if 'x+3' overflows
  after:
    CMN $3, R0
    BPL <else branch>

  2: 'if y - 3 > 0' might be converted to:
  before:
    CMP $3, R0
    BLE <else branch> // wrong branch is taken if 'y-3' underflows
  after:
    CMP $3, R0
    BMI <else branch>
    BEQ <else branch>

Benchmark data from different kinds of arm64 servers, 'old' is the non-optimized
version (not the parent commit), generally the optimization version outperforms.

S1:
name                    old time/op  new time/op  delta
CondRewrite/SoloJump  13.6ns ± 0%  12.9ns ± 0%  -5.15%  (p=0.000 n=10+10)
CondRewrite/CombJump  13.8ns ± 1%  12.9ns ± 0%  -6.32%  (p=0.000 n=10+10)

S2:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  11.6ns ± 0%  10.9ns ± 0%  -6.03%  (p=0.000 n=10+10)
CondRewrite/CombJump  11.4ns ± 0%  10.8ns ± 1%  -5.53%  (p=0.000 n=10+10)

S3:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  7.36ns ± 0%  7.50ns ± 0%  +1.79%  (p=0.000 n=9+10)
CondRewrite/CombJump  7.35ns ± 0%  7.75ns ± 0%  +5.51%  (p=0.000 n=8+9)

S4:
name                      old time/op  new time/op  delta
CondRewrite/SoloJump-224  11.5ns ± 1%  10.9ns ± 0%  -4.97%  (p=0.000 n=10+10)
CondRewrite/CombJump-224  11.9ns ± 0%  11.5ns ± 0%  -2.95%  (p=0.000 n=10+10)

S5:
name                     old time/op  new time/op  delta
CondRewrite/SoloJump  10.0ns ± 0%  10.0ns ± 0%  -0.45%  (p=0.000 n=9+10)
CondRewrite/CombJump  9.93ns ± 0%  9.77ns ± 0%  -1.53%  (p=0.000 n=10+9)

Go1 perf. data:

name                     old time/op    new time/op    delta
BinaryTree17              6.29s ± 1%     6.30s ± 1%    ~     (p=1.000 n=5+5)
Fannkuch11                5.40s ± 0%     5.40s ± 0%    ~     (p=0.841 n=5+5)
FmtFprintfEmpty          97.9ns ± 0%    98.9ns ± 3%    ~     (p=0.937 n=4+5)
FmtFprintfString          171ns ± 3%     171ns ± 2%    ~     (p=0.754 n=5+5)
FmtFprintfInt             212ns ± 0%     217ns ± 6%  +2.55%  (p=0.008 n=5+5)
FmtFprintfIntInt          296ns ± 1%     297ns ± 2%    ~     (p=0.516 n=5+5)
FmtFprintfPrefixedInt     371ns ± 2%     374ns ± 7%    ~     (p=1.000 n=5+5)
FmtFprintfFloat           435ns ± 1%     439ns ± 2%    ~     (p=0.056 n=5+5)
FmtManyArgs              1.37µs ± 1%    1.36µs ± 1%    ~     (p=0.730 n=5+5)
GobDecode                14.6ms ± 4%    14.4ms ± 4%    ~     (p=0.690 n=5+5)
GobEncode                11.8ms ±20%    11.6ms ±15%    ~     (p=1.000 n=5+5)
Gzip                      507ms ± 0%     491ms ± 0%  -3.22%  (p=0.008 n=5+5)
Gunzip                   73.8ms ± 0%    73.9ms ± 0%    ~     (p=0.690 n=5+5)
HTTPClientServer          116µs ± 0%     116µs ± 0%    ~     (p=0.686 n=4+4)
JSONEncode               21.8ms ± 1%    21.6ms ± 2%    ~     (p=0.151 n=5+5)
JSONDecode                104ms ± 1%     103ms ± 1%  -1.08%  (p=0.016 n=5+5)
Mandelbrot200            9.53ms ± 0%    9.53ms ± 0%    ~     (p=0.421 n=5+5)
GoParse                  7.55ms ± 1%    7.51ms ± 1%    ~     (p=0.151 n=5+5)
RegexpMatchEasy0_32       158ns ± 0%     158ns ± 0%    ~     (all equal)
RegexpMatchEasy0_1K       606ns ± 1%     608ns ± 3%    ~     (p=0.937 n=5+5)
RegexpMatchEasy1_32       143ns ± 0%     144ns ± 1%    ~     (p=0.095 n=5+4)
RegexpMatchEasy1_1K       927ns ± 2%     944ns ± 2%    ~     (p=0.056 n=5+5)
RegexpMatchMedium_32     16.0ns ± 0%    16.0ns ± 0%    ~     (all equal)
RegexpMatchMedium_1K     69.3µs ± 2%    69.7µs ± 0%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32       3.73µs ± 0%    3.73µs ± 1%    ~     (p=0.984 n=5+5)
RegexpMatchHard_1K        111µs ± 1%     110µs ± 0%    ~     (p=0.151 n=5+5)
Revcomp                   1.91s ±47%     1.77s ±68%    ~     (p=1.000 n=5+5)
Template                  138ms ± 1%     138ms ± 1%    ~     (p=1.000 n=5+5)
TimeParse                 787ns ± 2%     785ns ± 1%    ~     (p=0.540 n=5+5)
TimeFormat                729ns ± 1%     726ns ± 1%    ~     (p=0.151 n=5+5)

Updates #38740
Change-Id: I06c604874acdc1e63e66452dadee5df053045222
Reviewed-on: https://go-review.googlesource.com/c/go/+/233097
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2020-05-29 15:39:54 +00:00
Agniva De Sarker
ecc7dd5469 test/codegen: fix wasm codegen breakage
i32.eqz instructions don't appear unless needed in if conditions anymore
after CL 195204. I forgot to run the codegen tests while submitting the CL.

Thanks to @martisch for catching it.

Fixes #34442

Change-Id: I177b064b389be48e39d564849714d7a8839be13e
Reviewed-on: https://go-review.googlesource.com/c/go/+/196580
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2019-09-21 16:31:44 +00:00
Ben Shi
1786ecd502 cmd/compile: eliminate WASM's redundant extension & wrapping
This CL eliminates unnecessary pairs of I32WrapI64 and
I64ExtendI32U generated by the WASM backend for IF
statements. And it makes the total size of pkg/js_wasm/
decreases about 490KB.

Change-Id: I16b0abb686c4e30d5624323166ec2d0ec57dbe2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/191758
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
2019-08-30 21:20:03 +00:00
Josh Bleecher Snyder
870cfe6484 test/codegen: gofmt
Change-Id: I33f5b5051e5f75aa264ec656926223c5a3c09c1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/167498
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matt Layher <mdlayher@gmail.com>
2019-03-13 21:44:45 +00:00
Lynn Boger
4ae49b5921 cmd/compile: use ANDCC, ORCC, XORCC to avoid CMP on ppc64x
This change makes use of the cc versions of the AND, OR, XOR
instructions, omitting the need for a CMP instruction.

In many test programs and in the go binary, this reduces the
size of 20-30 functions by at least 1 instruction, many in
runtime.

Testcase added to test/codegen/comparisons.go

Change-Id: I6cc1ca8b80b065d7390749c625bc9784b0039adb
Reviewed-on: https://go-review.googlesource.com/c/143059
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-09 19:40:52 +00:00
Lynn Boger
39fa301bdc test/codegen: enable more tests for ppc64/ppc64le
Adding cases for ppc64,ppc64le to the codegen tests
where appropriate.

Change-Id: Idf8cbe88a4ab4406a4ef1ea777bd15a58b68f3ed
Reviewed-on: https://go-review.googlesource.com/c/142557
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-16 19:00:53 +00:00
Ben Shi
5aeecc4530 cmd/compile: optimize arm64's code with more shifted operations
This CL optimizes arm64's NEG/MVN/TST/CMN with a shifted operand.

1. The total size of pkg/android_arm64 decreases about 0.2KB, excluding
cmd/compile/ .

2. The go1 benchmark shows no regression, excluding noise.
name                     old time/op    new time/op    delta
BinaryTree17-4              16.4s ± 1%     16.4s ± 1%    ~     (p=0.914 n=29+29)
Fannkuch11-4                8.72s ± 0%     8.72s ± 0%    ~     (p=0.274 n=30+29)
FmtFprintfEmpty-4           174ns ± 0%     174ns ± 0%    ~     (all equal)
FmtFprintfString-4          370ns ± 0%     370ns ± 0%    ~     (all equal)
FmtFprintfInt-4             419ns ± 0%     419ns ± 0%    ~     (all equal)
FmtFprintfIntInt-4          672ns ± 1%     675ns ± 2%    ~     (p=0.217 n=28+30)
FmtFprintfPrefixedInt-4     806ns ± 0%     806ns ± 0%    ~     (p=0.402 n=30+28)
FmtFprintfFloat-4          1.09µs ± 0%    1.09µs ± 0%  +0.02%  (p=0.011 n=22+27)
FmtManyArgs-4              2.67µs ± 0%    2.68µs ± 0%    ~     (p=0.279 n=29+30)
GobDecode-4                33.1ms ± 1%    33.1ms ± 0%    ~     (p=0.052 n=28+29)
GobEncode-4                29.6ms ± 0%    29.6ms ± 0%  +0.08%  (p=0.013 n=28+29)
Gzip-4                      1.38s ± 2%     1.39s ± 2%    ~     (p=0.071 n=29+29)
Gunzip-4                    139ms ± 0%     139ms ± 0%    ~     (p=0.265 n=29+29)
HTTPClientServer-4          789µs ± 4%     785µs ± 4%    ~     (p=0.206 n=29+28)
JSONEncode-4               49.7ms ± 0%    49.6ms ± 0%  -0.24%  (p=0.000 n=30+30)
JSONDecode-4                266ms ± 1%     267ms ± 1%  +0.34%  (p=0.000 n=30+30)
Mandelbrot200-4            16.6ms ± 0%    16.6ms ± 0%    ~     (p=0.835 n=28+30)
GoParse-4                  15.9ms ± 0%    15.8ms ± 0%  -0.29%  (p=0.000 n=27+30)
RegexpMatchEasy0_32-4       380ns ± 0%     381ns ± 0%  +0.18%  (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4      1.18µs ± 0%    1.19µs ± 0%  +0.23%  (p=0.000 n=30+30)
RegexpMatchEasy1_32-4       357ns ± 0%     358ns ± 0%  +0.28%  (p=0.000 n=29+29)
RegexpMatchEasy1_1K-4      2.04µs ± 0%    2.04µs ± 0%  +0.06%  (p=0.006 n=30+30)
RegexpMatchMedium_32-4      589ns ± 0%     590ns ± 0%  +0.24%  (p=0.000 n=28+30)
RegexpMatchMedium_1K-4      162µs ± 0%     162µs ± 0%  -0.01%  (p=0.027 n=26+29)
RegexpMatchHard_32-4       9.58µs ± 0%    9.58µs ± 0%    ~     (p=0.935 n=30+30)
RegexpMatchHard_1K-4        287µs ± 0%     287µs ± 0%    ~     (p=0.387 n=29+30)
Revcomp-4                   2.50s ± 0%     2.50s ± 0%  -0.10%  (p=0.020 n=28+28)
Template-4                  310ms ± 0%     310ms ± 1%    ~     (p=0.406 n=30+30)
TimeParse-4                1.68µs ± 0%    1.68µs ± 0%  +0.03%  (p=0.014 n=30+17)
TimeFormat-4               1.65µs ± 0%    1.66µs ± 0%  +0.32%  (p=0.000 n=27+29)
[Geo mean]                  247µs          247µs       +0.05%

name                     old speed      new speed      delta
GobDecode-4              23.2MB/s ± 0%  23.2MB/s ± 0%  -0.08%  (p=0.032 n=27+29)
GobEncode-4              26.0MB/s ± 0%  25.9MB/s ± 0%  -0.10%  (p=0.011 n=29+29)
Gzip-4                   14.1MB/s ± 2%  14.0MB/s ± 2%    ~     (p=0.081 n=29+29)
Gunzip-4                  139MB/s ± 0%   139MB/s ± 0%    ~     (p=0.290 n=29+29)
JSONEncode-4             39.0MB/s ± 0%  39.1MB/s ± 0%  +0.25%  (p=0.000 n=29+30)
JSONDecode-4             7.30MB/s ± 1%  7.28MB/s ± 1%  -0.33%  (p=0.000 n=30+30)
GoParse-4                3.65MB/s ± 0%  3.66MB/s ± 0%  +0.29%  (p=0.000 n=27+30)
RegexpMatchEasy0_32-4    84.1MB/s ± 0%  84.0MB/s ± 0%  -0.17%  (p=0.000 n=30+28)
RegexpMatchEasy0_1K-4     864MB/s ± 0%   862MB/s ± 0%  -0.24%  (p=0.000 n=30+30)
RegexpMatchEasy1_32-4    89.5MB/s ± 0%  89.3MB/s ± 0%  -0.18%  (p=0.000 n=28+24)
RegexpMatchEasy1_1K-4     502MB/s ± 0%   502MB/s ± 0%  -0.05%  (p=0.008 n=30+29)
RegexpMatchMedium_32-4   1.70MB/s ± 0%  1.69MB/s ± 0%  -0.59%  (p=0.000 n=29+30)
RegexpMatchMedium_1K-4   6.31MB/s ± 0%  6.31MB/s ± 0%  +0.05%  (p=0.005 n=30+26)
RegexpMatchHard_32-4     3.34MB/s ± 0%  3.34MB/s ± 0%    ~     (all equal)
RegexpMatchHard_1K-4     3.57MB/s ± 0%  3.57MB/s ± 0%    ~     (all equal)
Revcomp-4                 102MB/s ± 0%   102MB/s ± 0%  +0.10%  (p=0.022 n=28+28)
Template-4               6.26MB/s ± 0%  6.26MB/s ± 1%    ~     (p=0.768 n=30+30)
[Geo mean]               24.2MB/s       24.1MB/s       -0.08%

Change-Id: I494f9db7f8a568a00e9c74ae25086a58b2221683
Reviewed-on: https://go-review.googlesource.com/137976
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-28 15:05:17 +00:00
Ben Shi
031a35ec84 cmd/compile: optimize 386's comparison
Optimization of "(CMPconst [0] (ANDL x y)) -> (TESTL x y)" only
get benefits if there is no further use of the result of x&y. A
condition of uses==1 will have slight improvements.

1. The code size of pkg/linux_386 decreases about 300 bytes, excluding
cmd/compile/.

2. The go1 benchmark shows no regression, and even a slight improvement
in test case FmtFprintfEmpty-4, excluding noise.

name                     old time/op    new time/op    delta
BinaryTree17-4              3.34s ± 3%     3.32s ± 2%    ~     (p=0.197 n=30+30)
Fannkuch11-4                3.48s ± 2%     3.47s ± 1%  -0.33%  (p=0.015 n=30+30)
FmtFprintfEmpty-4          46.3ns ± 4%    44.8ns ± 4%  -3.33%  (p=0.000 n=30+30)
FmtFprintfString-4         78.8ns ± 7%    77.3ns ± 5%    ~     (p=0.098 n=30+26)
FmtFprintfInt-4            90.2ns ± 1%    90.0ns ± 7%  -0.23%  (p=0.027 n=18+30)
FmtFprintfIntInt-4          144ns ± 4%     143ns ± 5%    ~     (p=0.945 n=30+29)
FmtFprintfPrefixedInt-4     180ns ± 4%     180ns ± 5%    ~     (p=0.858 n=30+30)
FmtFprintfFloat-4           409ns ± 4%     406ns ± 3%  -0.87%  (p=0.028 n=30+30)
FmtManyArgs-4               611ns ± 5%     608ns ± 4%    ~     (p=0.812 n=30+30)
GobDecode-4                7.30ms ± 5%    7.26ms ± 5%    ~     (p=0.522 n=30+29)
GobEncode-4                6.90ms ± 7%    6.82ms ± 4%    ~     (p=0.086 n=29+28)
Gzip-4                      396ms ± 4%     400ms ± 4%  +0.99%  (p=0.026 n=30+30)
Gunzip-4                   41.1ms ± 3%    41.2ms ± 3%    ~     (p=0.495 n=30+30)
HTTPClientServer-4         63.7µs ± 3%    63.3µs ± 2%    ~     (p=0.113 n=29+29)
JSONEncode-4               16.1ms ± 2%    16.1ms ± 2%  -0.30%  (p=0.041 n=30+30)
JSONDecode-4               60.9ms ± 3%    61.2ms ± 6%    ~     (p=0.187 n=30+30)
Mandelbrot200-4            5.17ms ± 2%    5.19ms ± 3%    ~     (p=0.676 n=30+30)
GoParse-4                  3.28ms ± 3%    3.25ms ± 2%  -0.97%  (p=0.002 n=30+30)
RegexpMatchEasy0_32-4       103ns ± 4%     104ns ± 4%    ~     (p=0.352 n=30+30)
RegexpMatchEasy0_1K-4       849ns ± 2%     845ns ± 2%    ~     (p=0.381 n=30+30)
RegexpMatchEasy1_32-4       113ns ± 4%     113ns ± 4%    ~     (p=0.795 n=30+30)
RegexpMatchEasy1_1K-4      1.03µs ± 3%    1.03µs ± 4%    ~     (p=0.275 n=25+30)
RegexpMatchMedium_32-4      132ns ± 3%     132ns ± 3%    ~     (p=0.970 n=30+30)
RegexpMatchMedium_1K-4     41.4µs ± 3%    41.4µs ± 3%    ~     (p=0.212 n=30+30)
RegexpMatchHard_32-4       2.22µs ± 4%    2.22µs ± 4%    ~     (p=0.399 n=30+30)
RegexpMatchHard_1K-4       67.2µs ± 3%    67.6µs ± 4%    ~     (p=0.359 n=30+30)
Revcomp-4                   1.84s ± 2%     1.83s ± 2%    ~     (p=0.532 n=30+30)
Template-4                 69.1ms ± 4%    68.8ms ± 3%    ~     (p=0.146 n=30+30)
TimeParse-4                 441ns ± 3%     442ns ± 3%    ~     (p=0.154 n=30+30)
TimeFormat-4                413ns ± 3%     414ns ± 3%    ~     (p=0.275 n=30+30)
[Geo mean]                 66.2µs         66.0µs       -0.28%

name                     old speed      new speed      delta
GobDecode-4               105MB/s ± 5%   106MB/s ± 5%    ~     (p=0.514 n=30+29)
GobEncode-4               111MB/s ± 5%   113MB/s ± 4%  +1.37%  (p=0.046 n=28+28)
Gzip-4                   49.1MB/s ± 4%  48.6MB/s ± 4%  -0.98%  (p=0.028 n=30+30)
Gunzip-4                  472MB/s ± 4%   472MB/s ± 3%    ~     (p=0.496 n=30+30)
JSONEncode-4              120MB/s ± 2%   121MB/s ± 2%  +0.29%  (p=0.042 n=30+30)
JSONDecode-4             31.9MB/s ± 3%  31.7MB/s ± 6%    ~     (p=0.186 n=30+30)
GoParse-4                17.6MB/s ± 3%  17.8MB/s ± 2%  +0.98%  (p=0.002 n=30+30)
RegexpMatchEasy0_32-4     309MB/s ± 4%   307MB/s ± 4%    ~     (p=0.501 n=30+30)
RegexpMatchEasy0_1K-4    1.21GB/s ± 2%  1.21GB/s ± 2%    ~     (p=0.301 n=30+30)
RegexpMatchEasy1_32-4     283MB/s ± 4%   282MB/s ± 3%    ~     (p=0.877 n=30+30)
RegexpMatchEasy1_1K-4    1.00GB/s ± 3%  0.99GB/s ± 4%    ~     (p=0.276 n=25+30)
RegexpMatchMedium_32-4   7.54MB/s ± 3%  7.55MB/s ± 3%    ~     (p=0.528 n=30+30)
RegexpMatchMedium_1K-4   24.7MB/s ± 3%  24.7MB/s ± 3%    ~     (p=0.203 n=30+30)
RegexpMatchHard_32-4     14.4MB/s ± 4%  14.4MB/s ± 4%    ~     (p=0.407 n=30+30)
RegexpMatchHard_1K-4     15.3MB/s ± 3%  15.1MB/s ± 4%    ~     (p=0.306 n=30+30)
Revcomp-4                 138MB/s ± 2%   139MB/s ± 2%    ~     (p=0.520 n=30+30)
Template-4               28.1MB/s ± 4%  28.2MB/s ± 3%    ~     (p=0.149 n=30+30)
[Geo mean]               81.5MB/s       81.5MB/s       +0.06%

Change-Id: I7f75425f79eec93cdd8fdd94db13ad4f61b6a2f5
Reviewed-on: https://go-review.googlesource.com/133657
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-09-07 01:20:45 +00:00
Ben Shi
067dfce21f cmd/compile: optimize ARM's comparision
Optimize (CMPconst [0] (ADD x y)) to (CMN x y) will only get benefits
when the result of the addition is no longer used, otherwise there
might be even performance drop. And this CL fixes that issue for
CMP/CMN/TST/TEQ.

There is little regression in the go1 benchmark (excluding noise),
and the test case JSONDecode-4 even gets improvement.

name                     old time/op    new time/op    delta
BinaryTree17-4              21.6s ± 1%     21.6s ± 0%  -0.22%  (p=0.013 n=30+30)
Fannkuch11-4                11.1s ± 0%     11.1s ± 0%  +0.11%  (p=0.000 n=30+29)
FmtFprintfEmpty-4           297ns ± 0%     297ns ± 0%  +0.08%  (p=0.007 n=26+28)
FmtFprintfString-4          589ns ± 1%     589ns ± 0%    ~     (p=0.659 n=30+25)
FmtFprintfInt-4             644ns ± 1%     650ns ± 0%  +0.88%  (p=0.000 n=30+24)
FmtFprintfIntInt-4          964ns ± 0%     977ns ± 0%  +1.33%  (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4    1.06µs ± 0%    1.07µs ± 0%  +1.31%  (p=0.000 n=29+27)
FmtFprintfFloat-4          1.89µs ± 0%    1.92µs ± 0%  +1.25%  (p=0.000 n=29+29)
FmtManyArgs-4              3.63µs ± 0%    3.67µs ± 0%  +1.33%  (p=0.000 n=29+27)
GobDecode-4                38.1ms ± 1%    37.9ms ± 1%  -0.60%  (p=0.000 n=29+29)
GobEncode-4                35.3ms ± 2%    35.2ms ± 1%    ~     (p=0.286 n=30+30)
Gzip-4                      2.36s ± 0%     2.37s ± 2%    ~     (p=0.277 n=24+28)
Gunzip-4                    264ms ± 1%     264ms ± 1%    ~     (p=0.104 n=28+30)
HTTPClientServer-4         1.04ms ± 4%    1.02ms ± 4%  -1.65%  (p=0.000 n=28+28)
JSONEncode-4               78.5ms ± 1%    79.6ms ± 1%  +1.34%  (p=0.000 n=27+28)
JSONDecode-4                379ms ± 4%     352ms ± 5%  -7.09%  (p=0.000 n=29+30)
Mandelbrot200-4            17.6ms ± 0%    17.6ms ± 0%    ~     (p=0.206 n=28+29)
GoParse-4                  21.9ms ± 1%    22.1ms ± 1%  +0.87%  (p=0.000 n=28+26)
RegexpMatchEasy0_32-4       631ns ± 0%     641ns ± 0%  +1.63%  (p=0.000 n=29+30)
RegexpMatchEasy0_1K-4      4.11µs ± 0%    4.11µs ± 0%    ~     (p=0.700 n=30+30)
RegexpMatchEasy1_32-4       670ns ± 0%     679ns ± 0%  +1.37%  (p=0.000 n=21+30)
RegexpMatchEasy1_1K-4      5.31µs ± 0%    5.26µs ± 0%  -1.03%  (p=0.000 n=25+28)
RegexpMatchMedium_32-4      905ns ± 0%     906ns ± 0%  +0.14%  (p=0.001 n=30+30)
RegexpMatchMedium_1K-4      192µs ± 0%     191µs ± 0%  -0.45%  (p=0.000 n=29+27)
RegexpMatchHard_32-4       11.8µs ± 0%    11.7µs ± 0%  -0.39%  (p=0.000 n=29+28)
RegexpMatchHard_1K-4        347µs ± 0%     347µs ± 0%    ~     (p=0.084 n=29+30)
Revcomp-4                  37.5ms ± 1%    37.5ms ± 1%    ~     (p=0.279 n=29+29)
Template-4                  519ms ± 2%     519ms ± 2%    ~     (p=0.652 n=28+29)
TimeParse-4                2.83µs ± 0%    2.78µs ± 0%  -1.90%  (p=0.000 n=27+28)
TimeFormat-4               5.79µs ± 0%    5.60µs ± 0%  -3.23%  (p=0.000 n=29+29)
[Geo mean]                  331µs          330µs       -0.16%

name                     old speed      new speed      delta
GobDecode-4              20.1MB/s ± 1%  20.3MB/s ± 1%  +0.61%  (p=0.000 n=29+29)
GobEncode-4              21.7MB/s ± 2%  21.8MB/s ± 1%    ~     (p=0.294 n=30+30)
Gzip-4                   8.23MB/s ± 1%  8.20MB/s ± 2%    ~     (p=0.099 n=26+28)
Gunzip-4                 73.5MB/s ± 1%  73.4MB/s ± 1%    ~     (p=0.107 n=28+30)
JSONEncode-4             24.7MB/s ± 1%  24.4MB/s ± 1%  -1.32%  (p=0.000 n=27+28)
JSONDecode-4             5.13MB/s ± 4%  5.52MB/s ± 5%  +7.65%  (p=0.000 n=29+30)
GoParse-4                2.65MB/s ± 1%  2.63MB/s ± 1%  -0.87%  (p=0.000 n=28+26)
RegexpMatchEasy0_32-4    50.7MB/s ± 0%  49.9MB/s ± 0%  -1.58%  (p=0.000 n=29+29)
RegexpMatchEasy0_1K-4     249MB/s ± 0%   249MB/s ± 0%    ~     (p=0.342 n=30+28)
RegexpMatchEasy1_32-4    47.7MB/s ± 0%  47.1MB/s ± 0%  -1.39%  (p=0.000 n=26+30)
RegexpMatchEasy1_1K-4     193MB/s ± 0%   195MB/s ± 0%  +1.04%  (p=0.000 n=25+28)
RegexpMatchMedium_32-4   1.10MB/s ± 0%  1.10MB/s ± 0%  -0.42%  (p=0.000 n=30+26)
RegexpMatchMedium_1K-4   5.33MB/s ± 0%  5.36MB/s ± 0%  +0.43%  (p=0.000 n=29+29)
RegexpMatchHard_32-4     2.72MB/s ± 0%  2.73MB/s ± 0%  +0.37%  (p=0.000 n=29+30)
RegexpMatchHard_1K-4     2.95MB/s ± 0%  2.95MB/s ± 0%    ~     (all equal)
Revcomp-4                67.8MB/s ± 1%  67.7MB/s ± 1%    ~     (p=0.273 n=29+29)
Template-4               3.74MB/s ± 2%  3.74MB/s ± 2%    ~     (p=0.665 n=28+29)
[Geo mean]               15.2MB/s       15.2MB/s       +0.21%

Change-Id: Ifed1fb8cc02d5ca52c8bc6c21b6b5bf6dbb2701a
Reviewed-on: https://go-review.googlesource.com/132115
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-05 14:54:08 +00:00
Ben Shi
0e9f1de0b7 cmd/compile: optimize arm64's comparison
Add more optimization with TST/CMN.

1. A tiny benchmark shows more than 12% improvement.
TSTCMN-4                    378µs ± 0%     332µs ± 0%  -12.15%  (p=0.000 n=30+27)
(https://github.com/benshi001/ugo1/blob/master/tstcmn_test.go)

2. There is little regression in the go1 benchmark, excluding noise.

name                     old time/op    new time/op    delta
BinaryTree17-4              19.1s ± 0%     19.1s ± 0%    ~     (p=0.994 n=28+29)
Fannkuch11-4                10.0s ± 0%     10.0s ± 0%    ~     (p=0.198 n=30+25)
FmtFprintfEmpty-4           233ns ± 0%     233ns ± 0%  +0.14%  (p=0.002 n=24+30)
FmtFprintfString-4          428ns ± 0%     428ns ± 0%    ~     (all equal)
FmtFprintfInt-4             472ns ± 0%     472ns ± 0%    ~     (all equal)
FmtFprintfIntInt-4          725ns ± 0%     725ns ± 0%    ~     (all equal)
FmtFprintfPrefixedInt-4     889ns ± 0%     888ns ± 0%    ~     (p=0.632 n=28+30)
FmtFprintfFloat-4          1.20µs ± 0%    1.20µs ± 0%  +0.05%  (p=0.001 n=18+30)
FmtManyArgs-4              3.00µs ± 0%    2.99µs ± 0%  -0.07%  (p=0.001 n=27+30)
GobDecode-4                42.1ms ± 0%    42.2ms ± 0%  +0.29%  (p=0.000 n=28+28)
GobEncode-4                38.6ms ± 9%    38.8ms ± 9%    ~     (p=0.912 n=30+30)
Gzip-4                      2.07s ± 1%     2.05s ± 1%  -0.64%  (p=0.000 n=29+30)
Gunzip-4                    175ms ± 0%     175ms ± 0%  -0.15%  (p=0.001 n=30+30)
HTTPClientServer-4          872µs ± 5%     880µs ± 6%    ~     (p=0.196 n=30+29)
JSONEncode-4               88.5ms ± 1%    89.8ms ± 1%  +1.49%  (p=0.000 n=23+24)
JSONDecode-4                393ms ± 1%     390ms ± 1%  -0.89%  (p=0.000 n=28+30)
Mandelbrot200-4            19.5ms ± 0%    19.5ms ± 0%    ~     (p=0.405 n=29+28)
GoParse-4                  19.9ms ± 0%    20.0ms ± 0%  +0.27%  (p=0.000 n=30+30)
RegexpMatchEasy0_32-4       431ns ± 0%     431ns ± 0%    ~     (p=1.000 n=30+30)
RegexpMatchEasy0_1K-4      1.61µs ± 0%    1.61µs ± 0%    ~     (p=0.527 n=26+26)
RegexpMatchEasy1_32-4       443ns ± 0%     443ns ± 0%    ~     (all equal)
RegexpMatchEasy1_1K-4      2.58µs ± 1%    2.58µs ± 1%    ~     (p=0.578 n=27+25)
RegexpMatchMedium_32-4      740ns ± 0%     740ns ± 0%    ~     (p=0.357 n=30+30)
RegexpMatchMedium_1K-4      223µs ± 0%     223µs ± 0%  +0.16%  (p=0.000 n=30+29)
RegexpMatchHard_32-4       12.3µs ± 0%    12.3µs ± 0%    ~     (p=0.236 n=27+27)
RegexpMatchHard_1K-4        371µs ± 0%     371µs ± 0%  +0.09%  (p=0.000 n=30+27)
Revcomp-4                   2.85s ± 0%     2.85s ± 0%    ~     (p=0.057 n=28+25)
Template-4                  408ms ± 1%     409ms ± 1%    ~     (p=0.117 n=29+29)
TimeParse-4                1.93µs ± 0%    1.93µs ± 0%    ~     (p=0.535 n=29+28)
TimeFormat-4               1.99µs ± 0%    1.99µs ± 0%    ~     (p=0.168 n=29+28)
[Geo mean]                  306µs          307µs       +0.07%

name                     old speed      new speed      delta
GobDecode-4              18.3MB/s ± 0%  18.2MB/s ± 0%  -0.31%  (p=0.000 n=28+29)
GobEncode-4              19.9MB/s ± 8%  19.8MB/s ± 9%    ~     (p=0.923 n=30+30)
Gzip-4                   9.39MB/s ± 1%  9.45MB/s ± 1%  +0.65%  (p=0.000 n=29+30)
Gunzip-4                  111MB/s ± 0%   111MB/s ± 0%  +0.15%  (p=0.001 n=30+30)
JSONEncode-4             21.9MB/s ± 1%  21.6MB/s ± 1%  -1.45%  (p=0.000 n=23+23)
JSONDecode-4             4.94MB/s ± 1%  4.98MB/s ± 1%  +0.84%  (p=0.000 n=27+30)
GoParse-4                2.91MB/s ± 0%  2.90MB/s ± 0%  -0.34%  (p=0.000 n=21+22)
RegexpMatchEasy0_32-4    74.1MB/s ± 0%  74.1MB/s ± 0%    ~     (p=0.469 n=29+28)
RegexpMatchEasy0_1K-4     634MB/s ± 0%   634MB/s ± 0%    ~     (p=0.978 n=24+28)
RegexpMatchEasy1_32-4    72.2MB/s ± 0%  72.2MB/s ± 0%    ~     (p=0.064 n=27+29)
RegexpMatchEasy1_1K-4     396MB/s ± 1%   396MB/s ± 1%    ~     (p=0.583 n=27+25)
RegexpMatchMedium_32-4   1.35MB/s ± 0%  1.35MB/s ± 0%    ~     (all equal)
RegexpMatchMedium_1K-4   4.60MB/s ± 0%  4.59MB/s ± 0%  -0.14%  (p=0.000 n=30+26)
RegexpMatchHard_32-4     2.61MB/s ± 0%  2.61MB/s ± 0%    ~     (all equal)
RegexpMatchHard_1K-4     2.76MB/s ± 0%  2.76MB/s ± 0%    ~     (all equal)
Revcomp-4                89.1MB/s ± 0%  89.1MB/s ± 0%    ~     (p=0.059 n=28+25)
Template-4               4.75MB/s ± 1%  4.75MB/s ± 1%    ~     (p=0.106 n=29+29)
[Geo mean]               18.3MB/s       18.3MB/s       -0.07%

Change-Id: I3cd76ce63e84b0c3cebabf9fa3573b76a7343899
Reviewed-on: https://go-review.googlesource.com/124935
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-05 02:51:28 +00:00
Ben Shi
2b69ad0b3c cmd/compile: optimize arm's comparison
The CMP/CMN/TST/TEQ perform similar to SUB/ADD/AND/XOR except
the result is abondoned, and only NZCV flags are affected.

This CL implements further optimization with them.

1. A micro benchmark test gets more than 9% improvment.
TSTTEQ-4                   6.99ms ± 0%    6.35ms ± 0%  -9.15%  (p=0.000 n=33+36)
(https://github.com/benshi001/ugo1/blob/master/tstteq2_test.go)

2. The go1 benckmark shows no regression, excluding noise.
name                     old time/op    new time/op    delta
BinaryTree17-4              25.7s ± 1%     25.7s ± 1%    ~     (p=0.830 n=40+40)
Fannkuch11-4                13.3s ± 0%     13.2s ± 0%  -0.65%  (p=0.000 n=40+34)
FmtFprintfEmpty-4           394ns ± 0%     394ns ± 0%    ~     (p=0.819 n=40+40)
FmtFprintfString-4          677ns ± 0%     677ns ± 0%  +0.06%  (p=0.039 n=39+40)
FmtFprintfInt-4             707ns ± 0%     706ns ± 0%  -0.14%  (p=0.000 n=40+39)
FmtFprintfIntInt-4         1.04µs ± 0%    1.04µs ± 0%  +0.10%  (p=0.000 n=29+31)
FmtFprintfPrefixedInt-4    1.10µs ± 0%    1.11µs ± 0%  +0.65%  (p=0.000 n=39+37)
FmtFprintfFloat-4          2.27µs ± 0%    2.26µs ± 0%  -0.53%  (p=0.000 n=39+40)
FmtManyArgs-4              3.96µs ± 0%    3.96µs ± 0%  +0.10%  (p=0.000 n=39+40)
GobDecode-4                53.4ms ± 1%    52.8ms ± 2%  -1.10%  (p=0.000 n=39+39)
GobEncode-4                50.3ms ± 3%    50.4ms ± 2%    ~     (p=0.089 n=40+39)
Gzip-4                      2.62s ± 0%     2.64s ± 0%  +0.60%  (p=0.000 n=40+39)
Gunzip-4                    312ms ± 0%     312ms ± 0%  +0.02%  (p=0.030 n=40+39)
HTTPClientServer-4         1.01ms ± 7%    0.98ms ± 7%  -2.37%  (p=0.000 n=40+39)
JSONEncode-4                126ms ± 1%     126ms ± 1%  -0.38%  (p=0.004 n=39+39)
JSONDecode-4                423ms ± 0%     426ms ± 2%  +0.72%  (p=0.001 n=39+40)
Mandelbrot200-4            18.4ms ± 0%    18.4ms ± 0%  +0.04%  (p=0.000 n=38+40)
GoParse-4                  22.8ms ± 0%    22.6ms ± 0%  -0.68%  (p=0.000 n=35+40)
RegexpMatchEasy0_32-4       699ns ± 0%     704ns ± 0%  +0.73%  (p=0.000 n=27+40)
RegexpMatchEasy0_1K-4      4.27µs ± 0%    4.26µs ± 0%  -0.09%  (p=0.000 n=35+38)
RegexpMatchEasy1_32-4       741ns ± 0%     735ns ± 0%  -0.85%  (p=0.000 n=40+35)
RegexpMatchEasy1_1K-4      5.53µs ± 0%    5.49µs ± 0%  -0.69%  (p=0.000 n=39+40)
RegexpMatchMedium_32-4     1.07µs ± 0%    1.04µs ± 2%  -2.34%  (p=0.000 n=40+40)
RegexpMatchMedium_1K-4      261µs ± 0%     261µs ± 0%  -0.16%  (p=0.000 n=40+39)
RegexpMatchHard_32-4       14.9µs ± 0%    14.9µs ± 0%  -0.18%  (p=0.000 n=39+40)
RegexpMatchHard_1K-4        445µs ± 0%     446µs ± 0%  +0.09%  (p=0.000 n=36+34)
Revcomp-4                  41.8ms ± 1%    41.8ms ± 1%    ~     (p=0.595 n=39+38)
Template-4                  530ms ± 1%     528ms ± 1%  -0.49%  (p=0.000 n=40+40)
TimeParse-4                3.39µs ± 0%    3.42µs ± 0%  +0.98%  (p=0.000 n=36+38)
TimeFormat-4               6.12µs ± 0%    6.07µs ± 0%  -0.81%  (p=0.000 n=34+38)
[Geo mean]                  384µs          383µs       -0.24%

name                     old speed      new speed      delta
GobDecode-4              14.4MB/s ± 1%  14.5MB/s ± 2%  +1.11%  (p=0.000 n=39+39)
GobEncode-4              15.3MB/s ± 3%  15.2MB/s ± 2%    ~     (p=0.104 n=40+39)
Gzip-4                   7.40MB/s ± 1%  7.36MB/s ± 0%  -0.60%  (p=0.000 n=40+39)
Gunzip-4                 62.2MB/s ± 0%  62.1MB/s ± 0%  -0.02%  (p=0.047 n=40+39)
JSONEncode-4             15.4MB/s ± 1%  15.4MB/s ± 2%  +0.39%  (p=0.002 n=39+39)
JSONDecode-4             4.59MB/s ± 0%  4.56MB/s ± 2%  -0.71%  (p=0.000 n=39+40)
GoParse-4                2.54MB/s ± 0%  2.56MB/s ± 0%  +0.72%  (p=0.000 n=26+40)
RegexpMatchEasy0_32-4    45.8MB/s ± 0%  45.4MB/s ± 0%  -0.75%  (p=0.000 n=38+40)
RegexpMatchEasy0_1K-4     240MB/s ± 0%   240MB/s ± 0%  +0.09%  (p=0.000 n=35+38)
RegexpMatchEasy1_32-4    43.1MB/s ± 0%  43.5MB/s ± 0%  +0.84%  (p=0.000 n=40+39)
RegexpMatchEasy1_1K-4     185MB/s ± 0%   186MB/s ± 0%  +0.69%  (p=0.000 n=39+40)
RegexpMatchMedium_32-4    936kB/s ± 1%   959kB/s ± 2%  +2.38%  (p=0.000 n=40+40)
RegexpMatchMedium_1K-4   3.92MB/s ± 0%  3.93MB/s ± 0%  +0.18%  (p=0.000 n=39+40)
RegexpMatchHard_32-4     2.15MB/s ± 0%  2.15MB/s ± 0%  +0.19%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4     2.30MB/s ± 0%  2.30MB/s ± 0%    ~     (all equal)
Revcomp-4                60.8MB/s ± 1%  60.8MB/s ± 1%    ~     (p=0.600 n=39+38)
Template-4               3.66MB/s ± 1%  3.68MB/s ± 1%  +0.46%  (p=0.000 n=40+40)
[Geo mean]               12.8MB/s       12.8MB/s       +0.27%

Change-Id: I849161169ecf0876a04b7c1d3990fa8d1435215e
Reviewed-on: https://go-review.googlesource.com/122855
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-27 15:49:46 +00:00
Ben Shi
556971316f cmd/compile: optimize 386's comparison
CMPL/CMPW/CMPB can take a memory operand on 386, and this CL
implements that optimization.

1. The total size of pkg/linux_386 decreases about 45KB, excluding
cmd/compile.

2. The go1 benchmark shows a little improvement.
name                     old time/op    new time/op    delta
BinaryTree17-4              3.36s ± 2%     3.37s ± 3%    ~     (p=0.537 n=40+40)
Fannkuch11-4                3.59s ± 1%     3.53s ± 2%  -1.58%  (p=0.000 n=40+40)
FmtFprintfEmpty-4          46.0ns ± 3%    45.8ns ± 3%    ~     (p=0.249 n=40+40)
FmtFprintfString-4         80.0ns ± 4%    78.8ns ± 3%  -1.49%  (p=0.001 n=40+40)
FmtFprintfInt-4            89.7ns ± 2%    90.3ns ± 2%  +0.74%  (p=0.003 n=40+40)
FmtFprintfIntInt-4          144ns ± 3%     143ns ± 3%  -0.95%  (p=0.003 n=40+40)
FmtFprintfPrefixedInt-4     181ns ± 4%     180ns ± 2%    ~     (p=0.103 n=40+40)
FmtFprintfFloat-4           412ns ± 3%     408ns ± 4%  -0.97%  (p=0.018 n=40+40)
FmtManyArgs-4               607ns ± 4%     605ns ± 4%    ~     (p=0.148 n=40+40)
GobDecode-4                7.19ms ± 4%    7.24ms ± 5%    ~     (p=0.340 n=40+40)
GobEncode-4                7.04ms ± 9%    6.99ms ± 9%    ~     (p=0.289 n=40+40)
Gzip-4                      400ms ± 6%     398ms ± 5%    ~     (p=0.168 n=40+40)
Gunzip-4                   41.2ms ± 3%    41.7ms ± 3%  +1.40%  (p=0.001 n=40+40)
HTTPClientServer-4         62.5µs ± 1%    62.1µs ± 2%  -0.61%  (p=0.000 n=37+37)
JSONEncode-4               20.7ms ± 4%    20.4ms ± 3%  -1.60%  (p=0.000 n=40+40)
JSONDecode-4               69.4ms ± 4%    69.2ms ± 6%    ~     (p=0.177 n=40+40)
Mandelbrot200-4            5.22ms ± 6%    5.21ms ± 3%    ~     (p=0.531 n=40+40)
GoParse-4                  3.29ms ± 3%    3.28ms ± 3%    ~     (p=0.321 n=40+39)
RegexpMatchEasy0_32-4       104ns ± 4%     103ns ± 7%  -0.89%  (p=0.040 n=40+40)
RegexpMatchEasy0_1K-4       852ns ± 3%     853ns ± 2%    ~     (p=0.357 n=40+40)
RegexpMatchEasy1_32-4       113ns ± 8%     113ns ± 3%    ~     (p=0.906 n=40+40)
RegexpMatchEasy1_1K-4      1.03µs ± 4%    1.03µs ± 5%    ~     (p=0.326 n=40+40)
RegexpMatchMedium_32-4      136ns ± 3%     133ns ± 3%  -2.31%  (p=0.000 n=40+40)
RegexpMatchMedium_1K-4     44.0µs ± 3%    43.7µs ± 3%    ~     (p=0.053 n=40+40)
RegexpMatchHard_32-4       2.27µs ± 3%    2.26µs ± 4%    ~     (p=0.391 n=40+40)
RegexpMatchHard_1K-4       68.0µs ± 3%    68.9µs ± 3%  +1.28%  (p=0.000 n=40+40)
Revcomp-4                   1.86s ± 5%     1.86s ± 2%    ~     (p=0.950 n=40+40)
Template-4                 73.4ms ± 4%    69.9ms ± 7%  -4.78%  (p=0.000 n=40+40)
TimeParse-4                 449ns ± 4%     441ns ± 5%  -1.76%  (p=0.000 n=40+40)
TimeFormat-4                416ns ± 3%     417ns ± 4%    ~     (p=0.304 n=40+40)
[Geo mean]                 67.7µs         67.3µs       -0.55%

name                     old speed      new speed      delta
GobDecode-4               107MB/s ± 4%   106MB/s ± 5%    ~     (p=0.336 n=40+40)
GobEncode-4               109MB/s ± 5%   110MB/s ± 9%    ~     (p=0.142 n=38+40)
Gzip-4                   48.5MB/s ± 5%  48.8MB/s ± 5%    ~     (p=0.172 n=40+40)
Gunzip-4                  472MB/s ± 3%   465MB/s ± 3%  -1.39%  (p=0.001 n=40+40)
JSONEncode-4             93.6MB/s ± 4%  95.1MB/s ± 3%  +1.61%  (p=0.000 n=40+40)
JSONDecode-4             28.0MB/s ± 3%  28.1MB/s ± 6%    ~     (p=0.181 n=40+40)
GoParse-4                17.6MB/s ± 3%  17.7MB/s ± 3%    ~     (p=0.350 n=40+39)
RegexpMatchEasy0_32-4     308MB/s ± 4%   311MB/s ± 6%  +0.96%  (p=0.025 n=40+40)
RegexpMatchEasy0_1K-4    1.20GB/s ± 3%  1.20GB/s ± 2%    ~     (p=0.317 n=40+40)
RegexpMatchEasy1_32-4     282MB/s ± 7%   282MB/s ± 3%    ~     (p=0.516 n=40+40)
RegexpMatchEasy1_1K-4     994MB/s ± 4%   991MB/s ± 5%    ~     (p=0.319 n=40+40)
RegexpMatchMedium_32-4   7.31MB/s ± 3%  7.49MB/s ± 3%  +2.46%  (p=0.000 n=40+40)
RegexpMatchMedium_1K-4   23.3MB/s ± 3%  23.4MB/s ± 3%    ~     (p=0.052 n=40+40)
RegexpMatchHard_32-4     14.1MB/s ± 3%  14.1MB/s ± 4%    ~     (p=0.391 n=40+40)
RegexpMatchHard_1K-4     15.1MB/s ± 3%  14.9MB/s ± 3%  -1.27%  (p=0.000 n=40+40)
Revcomp-4                 137MB/s ± 5%   137MB/s ± 2%    ~     (p=0.942 n=40+40)
Template-4               26.5MB/s ± 4%  27.8MB/s ± 7%  +5.03%  (p=0.000 n=40+40)
[Geo mean]               78.6MB/s       79.0MB/s       +0.57%

Change-Id: Idcacc6881ef57cd7dc33aa87b711282842b72a53
Reviewed-on: https://go-review.googlesource.com/126618
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-20 14:23:22 +00:00
Michael Munday
b65122f99a cmd/compile: optimize comparisons using load merging where available
Multi-byte comparison operations were used on amd64, arm64, i386
and s390x for comparisons with constant arrays, but only amd64 and
i386 for comparisons with string constants. This CL combines the
check for platform capability, since they have the same requirements,
and also enables both on ppc64le which also supports load merging.

Note that these optimizations currently use little endian byte order
which results in byte reversal instructions on s390x. This should
be fixed at some point.

Change-Id: Ie612d13359b50c77f4d7c6e73fea4a59fa11f322
Reviewed-on: https://go-review.googlesource.com/102558
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-09 21:16:47 +00:00
Alberto Donizetti
a27cd4fd31 test/codegen: port tbz/tbnz arm64 tests
And delete them from asm_test.

Change-Id: I34fcf85ae8ce09cd146fe4ce6a0ae7616bd97e2d
Reviewed-on: https://go-review.googlesource.com/102296
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-24 09:35:53 +00:00
Alberto Donizetti
fc6280d4b0 test/codegen: port direct comparisons with memory tests
And remove them from asm_test.

Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454
Reviewed-on: https://go-review.googlesource.com/102115
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-22 17:20:09 +00:00
Alberto Donizetti
be371edd67 test/codegen: port comparisons tests to codegen
And delete them from asm_test.

Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab
Reviewed-on: https://go-review.googlesource.com/101595
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-20 19:38:06 +00:00