1
0
mirror of https://github.com/golang/go synced 2024-11-05 19:56:11 -07:00
Commit Graph

20 Commits

Author SHA1 Message Date
ruinan
454a058ffc cmd/compile: Add shiftIsBounded check for logic shifts of arm64
This CL adds shiftIsBounded checks for the Lsh* and Rsh* rules in arm64.
There is no need to check the shift value again with CMP + CSEL when the
shift value is valid.

Change-Id: I54620de64f02a1b5a11089add237248ae2de01b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/417714
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2022-09-07 20:10:13 +00:00
Keith Randall
5b1fbfba1c cmd/compile: rewrite >>c<<c to &^(1<<c-1)
Fixes #54496

Change-Id: I3c2ed8cd55836d5b07c8cdec00d3b584885aca79
Reviewed-on: https://go-review.googlesource.com/c/go/+/424856
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Martin Möhrmann <martin@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <martin@golang.org>
2022-09-02 18:51:37 +00:00
ruinan
54c7bc9cff cmd/compile: optimize shift ops on arm64 when the shift value is v&63
For the following code case:

  var x uint64
  x >> (shift & 63)

We can directly genereta `x >> shift` on arm64, since the hardware will
only use the bottom 6 bits of the shift amount.

Benchmark               old time/op  new time/op    delta
ShiftArithmeticRight-8  0.40ns       0.31ns        -21.7%

Change-Id: Id58c8a5b2f6dd5c30c3876f4a36e11b4d81e2dc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/425294
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2022-09-02 17:44:29 +00:00
Wayne Zuo
da6556968f cmd/compile: simplify bounded shift on riscv64
The prove pass will mark some shifts bounded, and then we can use that
information to generate better code on riscv64.

Change-Id: Ia22f43d0598453c9417adac7017db28d7240948b
Reviewed-on: https://go-review.googlesource.com/c/go/+/422616
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-08-31 20:21:00 +00:00
Archana R
d09c6ac417 test/codegen: updated multiple tests to verify on ppc64,ppc64le
Updated multiple tests in test/codegen: math.go, mathbits.go, shift.go
and slices.go to verify on ppc64/ppc64le as well

Change-Id: Id88dd41569b7097819fb4d451b615f69cf7f7a94
Reviewed-on: https://go-review.googlesource.com/c/go/+/412115
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2022-08-17 13:56:55 +00:00
Joel Sing
fe8347b61a cmd/compile: optimise immediate operands with constants on riscv64
Instructions with immediates can be precomputed when operating on a
constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI
and ADDI. Additionally, optimise ANDI and ORI when the immediate is
all ones or all zeroes.

In particular, the RISCV64 logical left and right shift rules
(Lsh*x*/Rsh*Ux*) produce sequences that check if the shift amount
exceeds 64 and if so returns zero. When the shift amount is a
constant we can precompute and eliminate the filter entirely.

Likewise the arithmetic right shift rules produce sequences that
check if the shift amount exceeds 64 and if so, ensures that the
lower six bits of the shift are all ones. When the shift amount
is a constant we can precompute the shift value.

Arithmetic right shift sequences like:

   117fc:       00100513                li      a0,1
   11800:       04053593                sltiu   a1,a0,64
   11804:       fff58593                addi    a1,a1,-1
   11808:       0015e593                ori     a1,a1,1
   1180c:       40b45433                sra     s0,s0,a1

Are now a single srai instruction:

   117fc:       40145413                srai    s0,s0,0x1

Likewise for logical left shift (and logical right shift):

   1d560:       01100413                li      s0,17
   1d564:       04043413                sltiu   s0,s0,64
   1d568:       40800433                neg     s0,s0
   1d56c:       01131493                slli    s1,t1,0x11
   1d570:       0084f433                and     s0,s1,s0

Which are now a single slli (or srli) instruction:

   1d120:       01131413                slli    s0,t1,0x11

This removes more than 30,000 instructions from the Go binary and
should improve performance in a variety of areas - of note
runtime.makemap_small drops from 48 to 36 instructions. Similar
gains exist in at least other parts of runtime and math/bits.

Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/350689
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-24 10:51:48 +00:00
Joel Sing
d413908320 test/codegen: add shift tests for RISCV64
Add tests for shift by constant, masked shifts and bounded shifts. While here,
sort tests by architecture and keep order of tests consistent (lsh, rshU, rsh).

Change-Id: I512d64196f34df9cb2884e8c0f6adcf9dd88b0fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/351289
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
2021-09-24 07:22:13 +00:00
Josh Bleecher Snyder
43d5f213e2 cmd/compile: optimize multi-register shifts on amd64
amd64 can shift in bits from another register instead of filling with 0/1.
This pattern is helpful when implementing 128 bit shifts or arbitrary length shifts.
In the standard library, it shows up in pure Go math/big.

Benchmarks results on amd64 with -tags=math_big_pure_go.

name                          old time/op  new time/op  delta
NonZeroShifts/1/shrVU-8       4.45ns ± 3%  4.39ns ± 1%   -1.28%  (p=0.000 n=30+27)
NonZeroShifts/1/shlVU-8       4.13ns ± 4%  4.10ns ± 2%     ~     (p=0.254 n=29+28)
NonZeroShifts/2/shrVU-8       5.55ns ± 1%  5.63ns ± 2%   +1.42%  (p=0.000 n=28+29)
NonZeroShifts/2/shlVU-8       5.70ns ± 2%  5.14ns ± 1%   -9.82%  (p=0.000 n=29+28)
NonZeroShifts/3/shrVU-8       6.79ns ± 2%  6.35ns ± 2%   -6.46%  (p=0.000 n=28+29)
NonZeroShifts/3/shlVU-8       6.69ns ± 1%  6.25ns ± 1%   -6.60%  (p=0.000 n=28+27)
NonZeroShifts/4/shrVU-8       7.79ns ± 2%  7.06ns ± 2%   -9.48%  (p=0.000 n=30+30)
NonZeroShifts/4/shlVU-8       7.82ns ± 1%  7.24ns ± 1%   -7.37%  (p=0.000 n=28+29)
NonZeroShifts/5/shrVU-8       8.90ns ± 3%  7.93ns ± 1%  -10.84%  (p=0.000 n=29+26)
NonZeroShifts/5/shlVU-8       8.68ns ± 1%  7.92ns ± 1%   -8.76%  (p=0.000 n=29+29)
NonZeroShifts/10/shrVU-8      14.4ns ± 1%  12.3ns ± 2%  -14.79%  (p=0.000 n=28+29)
NonZeroShifts/10/shlVU-8      14.1ns ± 1%  11.9ns ± 2%  -15.55%  (p=0.000 n=28+27)
NonZeroShifts/100/shrVU-8      118ns ± 1%    96ns ± 3%  -18.82%  (p=0.000 n=30+29)
NonZeroShifts/100/shlVU-8      120ns ± 2%    98ns ± 2%  -18.46%  (p=0.000 n=29+28)
NonZeroShifts/1000/shrVU-8    1.10µs ± 1%  0.88µs ± 2%  -19.63%  (p=0.000 n=29+30)
NonZeroShifts/1000/shlVU-8    1.10µs ± 2%  0.88µs ± 2%  -20.28%  (p=0.000 n=29+28)
NonZeroShifts/10000/shrVU-8   10.9µs ± 1%   8.7µs ± 1%  -19.78%  (p=0.000 n=28+27)
NonZeroShifts/10000/shlVU-8   10.9µs ± 2%   8.7µs ± 1%  -19.64%  (p=0.000 n=29+27)
NonZeroShifts/100000/shrVU-8   111µs ± 2%    90µs ± 2%  -19.39%  (p=0.000 n=28+29)
NonZeroShifts/100000/shlVU-8   113µs ± 2%    90µs ± 2%  -20.43%  (p=0.000 n=30+27)

The assembly version is still faster, unfortunately, but the gap is narrowing.
Speedup from pure Go to assembly:

name                          old time/op  new time/op  delta
NonZeroShifts/1/shrVU-8       4.39ns ± 1%  3.45ns ± 2%  -21.36%  (p=0.000 n=27+29)
NonZeroShifts/1/shlVU-8       4.10ns ± 2%  3.47ns ± 3%  -15.42%  (p=0.000 n=28+30)
NonZeroShifts/2/shrVU-8       5.63ns ± 2%  3.97ns ± 0%  -29.40%  (p=0.000 n=29+25)
NonZeroShifts/2/shlVU-8       5.14ns ± 1%  3.77ns ± 2%  -26.65%  (p=0.000 n=28+26)
NonZeroShifts/3/shrVU-8       6.35ns ± 2%  4.79ns ± 2%  -24.52%  (p=0.000 n=29+29)
NonZeroShifts/3/shlVU-8       6.25ns ± 1%  4.42ns ± 1%  -29.29%  (p=0.000 n=27+26)
NonZeroShifts/4/shrVU-8       7.06ns ± 2%  5.64ns ± 1%  -20.05%  (p=0.000 n=30+29)
NonZeroShifts/4/shlVU-8       7.24ns ± 1%  5.34ns ± 2%  -26.23%  (p=0.000 n=29+29)
NonZeroShifts/5/shrVU-8       7.93ns ± 1%  6.56ns ± 2%  -17.26%  (p=0.000 n=26+30)
NonZeroShifts/5/shlVU-8       7.92ns ± 1%  6.27ns ± 1%  -20.79%  (p=0.000 n=29+25)
NonZeroShifts/10/shrVU-8      12.3ns ± 2%  10.2ns ± 2%  -17.21%  (p=0.000 n=29+29)
NonZeroShifts/10/shlVU-8      11.9ns ± 2%  10.5ns ± 2%  -12.45%  (p=0.000 n=27+29)
NonZeroShifts/100/shrVU-8     95.9ns ± 3%  77.7ns ± 1%  -19.00%  (p=0.000 n=29+30)
NonZeroShifts/100/shlVU-8     97.5ns ± 2%  66.8ns ± 2%  -31.47%  (p=0.000 n=28+30)
NonZeroShifts/1000/shrVU-8     884ns ± 2%   705ns ± 1%  -20.17%  (p=0.000 n=30+28)
NonZeroShifts/1000/shlVU-8     880ns ± 2%   590ns ± 1%  -32.96%  (p=0.000 n=28+25)
NonZeroShifts/10000/shrVU-8   8.74µs ± 1%  7.34µs ± 3%  -15.94%  (p=0.000 n=27+30)
NonZeroShifts/10000/shlVU-8   8.73µs ± 1%  6.00µs ± 1%  -31.25%  (p=0.000 n=27+28)
NonZeroShifts/100000/shrVU-8  89.6µs ± 2%  75.5µs ± 2%  -15.80%  (p=0.000 n=29+29)
NonZeroShifts/100000/shlVU-8  89.6µs ± 2%  68.0µs ± 3%  -24.09%  (p=0.000 n=27+30)

Change-Id: I18f58d8f5513d737d9cdf09b8f9d14011ffe3958
Reviewed-on: https://go-review.googlesource.com/c/go/+/297050
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-03-11 19:11:46 +00:00
Paul E. Murphy
48ddf70128 cmd/asm,cmd/compile: support 5 operand RLWNM/RLWMI on ppc64
These instructions are actually 5 argument opcodes as specified
by the ISA.  Prior to this patch, the MB and ME arguments were
merged into a single bitmask operand to workaround the limitations
of the ppc64 assembler backend.

This limitation no longer exists. Thus, we can pass operands for
these opcodes without having to merge the MB and ME arguments in
the assembler frontend or compiler backend.

Likewise, support for 4 operand variants is unchanged.

Change-Id: Ib086774f3581edeaadfd2190d652aaaa8a90daeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/298750
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
2021-03-09 20:35:41 +00:00
Michael Munday
854e892ce1 cmd/compile: optimize shift pairs and masks on s390x
Optimize combinations of left and right shifts by a constant value
into a 'rotate then insert selected bits [into zero]' instruction.
Use the same instruction for contiguous masks since it has some
benefits over 'and immediate' (not restricted to 32-bits, does not
overwrite source register).

To keep the complexity of this change under control I've only
implemented 64 bit operations for now.

There are a lot more optimizations that can be done with this
instruction family. However, since their function overlaps with other
instructions we need to be somewhat careful not to break existing
optimization rules by creating optimization dead ends. This is
particularly true of the load/store merging rules which contain lots
of zero extensions and shifts.

This CL does interfere with the store merging rules when an operand
is shifted left before it is stored:

  binary.BigEndian.PutUint64(b, x << 1)

This is unfortunate but it's not critical and somewhat complex so
I plan to fix that in a follow up CL.

file      before    after     Δ       %
addr2line 4117446   4117282   -164    -0.004%
api       4945184   4942752   -2432   -0.049%
asm       4998079   4991891   -6188   -0.124%
buildid   2685158   2684074   -1084   -0.040%
cgo       4553732   4553394   -338    -0.007%
compile   19294446  19245070  -49376  -0.256%
cover     4897105   4891319   -5786   -0.118%
dist      3544389   3542785   -1604   -0.045%
doc       3926795   3927617   +822    +0.021%
fix       3302958   3293868   -9090   -0.275%
link      6546274   6543456   -2818   -0.043%
nm        4102021   4100825   -1196   -0.029%
objdump   4542431   4548483   +6052   +0.133%
pack      2482465   2416389   -66076  -2.662%
pprof     13366541  13363915  -2626   -0.020%
test2json 2829007   2761515   -67492  -2.386%
trace     10216164  10219684  +3520   +0.034%
vet       6773956   6773572   -384    -0.006%
total     107124151 106917891 -206260 -0.193%

Change-Id: I7591cce41e06867ba10a745daae9333513062746
Reviewed-on: https://go-review.googlesource.com/c/go/+/233317
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Michael Munday <mike.munday@ibm.com>
2020-11-06 10:45:31 +00:00
Paul E. Murphy
c3c6fbf314 cmd/compile: combine more 32 bit shift and mask operations on ppc64
Combine (AND m (SRWconst x)) or (SRWconst (AND m x)) when mask m is
and the shift value produce constant which can be encoded into an
RLWINM instruction.

Combine (CLRLSLDI (SRWconst x)) if the combining of the underling rotate
masks produces a constant which can be encoded into RLWINM.

Likewise for (SLDconst (SRWconst x)) and (CLRLSDI (RLWINM x)).

Combine rotate word + and operations which can be encoded as a single
RLWINM/RLWNM instruction.

The most notable performance improvements arise from the crypto
benchmarks below (GOARCH=power8 on a ppc64le/linux):

pkg:golang.org/x/crypto/blowfish goos:linux goarch:ppc64le
ExpandKeyWithSalt                               52.2µs ± 0%    47.5µs ± 0%  -8.88%
ExpandKey                                       44.4µs ± 0%    40.3µs ± 0%  -9.15%

pkg:golang.org/x/crypto/ssh/internal/bcrypt_pbkdf goos:linux goarch:ppc64le
Key                                             57.6ms ± 0%    52.3ms ± 0%  -9.13%

pkg:golang.org/x/crypto/bcrypt goos:linux goarch:ppc64le
Equal                                           90.9ms ± 0%    82.6ms ± 0%  -9.13%
DefaultCost                                     91.0ms ± 0%    82.7ms ± 0%  -9.12%

Change-Id: I59a0ca29face38f4ab46e37124c32906f216c4ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/260798
Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-27 18:33:20 +00:00
Lynn Boger
cc2a5cf4b8 cmd/compile,cmd/internal/obj/ppc64: fix some shift rules due to a regression
A recent change to improve shifts was generating some
invalid cases when the rule was based on an AND. The
extended mnemonics CLRLSLDI and CLRLSLWI only allow
certain values for the operands and in the mask case
those values were not being checked properly. This
adds a check to those rules to verify that the
'b' and 'n' values used when an AND was part of the rule
have correct values.

There was a bug in some diag messages in asm9. The
message expected 3 values but only provided 2. Those are
corrected here also.

The test/codegen/shift.go was updated to add a few more
cases to check for the case mentioned here.

Some of the comments that mention the order of operands
in these extended mnemonics were wrong and those have been
corrected.

Fixes #41683.

Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31
Reviewed-on: https://go-review.googlesource.com/c/go/+/258138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-01 18:51:18 +00:00
Lynn Boger
a424f6e45e cmd/asm,cmd/compile,cmd/internal/obj/ppc64: add extswsli support on power9
This adds support for the extswsli instruction which combines
extsw followed by a shift.

New benchmark demonstrates the improvement:
name      old time/op  new time/op  delta
ExtShift  1.34µs ± 0%  1.30µs ± 0%  -3.15%  (p=0.057 n=4+3)

Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/257017
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-09-28 18:13:48 +00:00
Lynn Boger
967465da29 cmd/compile: use combined shifts to improve array addressing on ppc64x
This change adds rules to find pairs of instructions that can
be combined into a single shifts. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.

These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.

Some rules in PPC64.rules were moved because the ordering prevented
some matching.

The following results were generated on power9.

hash/crc32:
    CRC32/poly=Koopman/size=40/align=0          195ns ± 0%     163ns ± 0%  -16.41%
    CRC32/poly=Koopman/size=40/align=1          200ns ± 0%     163ns ± 0%  -18.50%
    CRC32/poly=Koopman/size=512/align=0        1.98µs ± 0%    1.67µs ± 0%  -15.46%
    CRC32/poly=Koopman/size=512/align=1        1.98µs ± 0%    1.69µs ± 0%  -14.80%
    CRC32/poly=Koopman/size=1kB/align=0        3.90µs ± 0%    3.31µs ± 0%  -15.27%
    CRC32/poly=Koopman/size=1kB/align=1        3.85µs ± 0%    3.31µs ± 0%  -14.15%
    CRC32/poly=Koopman/size=4kB/align=0        15.3µs ± 0%    13.1µs ± 0%  -14.22%
    CRC32/poly=Koopman/size=4kB/align=1        15.4µs ± 0%    13.1µs ± 0%  -14.79%
    CRC32/poly=Koopman/size=32kB/align=0        137µs ± 0%     105µs ± 0%  -23.56%
    CRC32/poly=Koopman/size=32kB/align=1        137µs ± 0%     105µs ± 0%  -23.53%

crypto/rc4:
    RC4_128    733ns ± 0%    650ns ± 0%  -11.32%  (p=1.000 n=1+1)
    RC4_1K    5.80µs ± 0%   5.17µs ± 0%  -10.89%  (p=1.000 n=1+1)
    RC4_8K    45.7µs ± 0%   40.8µs ± 0%  -10.73%  (p=1.000 n=1+1)

crypto/sha1:
    Hash8Bytes       635ns ± 0%     613ns ± 0%   -3.46%  (p=1.000 n=1+1)
    Hash320Bytes    2.30µs ± 0%    2.18µs ± 0%   -5.38%  (p=1.000 n=1+1)
    Hash1K          5.88µs ± 0%    5.38µs ± 0%   -8.62%  (p=1.000 n=1+1)
    Hash8K          42.0µs ± 0%    37.9µs ± 0%   -9.75%  (p=1.000 n=1+1)

There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.

Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2020-09-17 12:37:40 +00:00
Lynn Boger
a1550d3ca3 cmd/compile: use isel with variable shifts on ppc64x
This changes the code generated for variable length shift
counts to use isel instead of instructions that set and
read the carry flag.

This reduces the generated code for shifts like this
by 1 instruction and avoids the use of instructions to
set and read the carry flag.

This sequence can be found in strconv with these results
on power9:

Atof64Decimal                          71.6ns ± 0%  68.3ns ± 0%   -4.61%
Atof64Float                            95.3ns ± 0%  90.9ns ± 0%   -4.62%
Atof64FloatExp                          153ns ± 0%   149ns ± 0%   -2.61%
Atof64Big                               234ns ± 0%   232ns ± 0%   -0.85%
Atof64RandomBits                        348ns ± 0%   369ns ± 0%   +6.03%
Atof64RandomFloats                      262ns ± 0%   262ns ± 0%     ~
Atof32Decimal                          72.0ns ± 0%  68.2ns ± 0%   -5.28%
Atof32Float                            92.1ns ± 0%  87.1ns ± 0%   -5.43%
Atof32FloatExp                          159ns ± 0%   158ns ± 0%   -0.63%
Atof32Random                            194ns ± 0%   191ns ± 0%   -1.55%

Some tests in codegen/shift.go are enabled to verify the
expected instructions are generated.

Change-Id: I968715d10ada405a8c46132bf19b8ed9b85796d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227337
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-09 19:18:56 +00:00
Lynn Boger
e4a1cf8a56 cmd/compile: add rules to eliminate unnecessary signed shifts
This change to the rules removes some unnecessary signed shifts
that appear in the math/rand functions. Existing rules did not
cover some of the signed cases.

A little improvement seen in math/rand due to removing 1 of 2
instructions generated for Int31n, which is inlined quite a bit.

Intn1000                 46.9ns ± 0%  45.5ns ± 0%   -2.99%  (p=1.000 n=1+1)
Int63n1000               33.5ns ± 0%  32.8ns ± 0%   -2.09%  (p=1.000 n=1+1)
Int31n1000               32.7ns ± 0%  32.6ns ± 0%   -0.31%  (p=1.000 n=1+1)
Float32                  32.7ns ± 0%  30.3ns ± 0%   -7.34%  (p=1.000 n=1+1)
Float64                  21.7ns ± 0%  20.9ns ± 0%   -3.69%  (p=1.000 n=1+1)
Perm3                     205ns ± 0%   202ns ± 0%   -1.46%  (p=1.000 n=1+1)
Perm30                   1.71µs ± 0%  1.68µs ± 0%   -1.35%  (p=1.000 n=1+1)
Perm30ViaShuffle         1.65µs ± 0%  1.65µs ± 0%   -0.30%  (p=1.000 n=1+1)
ShuffleOverhead          2.83µs ± 0%  2.83µs ± 0%   -0.07%  (p=1.000 n=1+1)
Read3                    18.7ns ± 0%  16.1ns ± 0%  -13.90%  (p=1.000 n=1+1)
Read64                    126ns ± 0%   124ns ± 0%   -1.59%  (p=1.000 n=1+1)
Read1000                 1.75µs ± 0%  1.63µs ± 0%   -7.08%  (p=1.000 n=1+1)

Change-Id: I11502dfca7d65aafc76749a8d713e9e50c24a858
Reviewed-on: https://go-review.googlesource.com/c/go/+/225917
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-27 16:05:42 +00:00
Agniva De Sarker
8fedb2d338 cmd/compile: optimize bounded shifts on wasm
Use the shiftIsBounded function to generate more efficient
Shift instructions.

Updates #25167

Change-Id: Id350f8462dc3a7ed3bfed0bcbea2860b8f40048a
Reviewed-on: https://go-review.googlesource.com/c/go/+/182558
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Richard Musiol <neelance@gmail.com>
2019-08-28 04:44:21 +00:00
Josh Bleecher Snyder
61945fc502 cmd/compile: don't generate panicshift for masked int shifts
We know that a & 31 is non-negative for all a, signed or not.
We can avoid checking that and needing to write out an
unreachable call to panicshift.

Change-Id: I32f32fb2c950d2b2b35ac5c0e99b7b2dbd47f917
Reviewed-on: https://go-review.googlesource.com/c/go/+/167499
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-14 00:03:29 +00:00
Josh Bleecher Snyder
870cfe6484 test/codegen: gofmt
Change-Id: I33f5b5051e5f75aa264ec656926223c5a3c09c1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/167498
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matt Layher <mdlayher@gmail.com>
2019-03-13 21:44:45 +00:00
Michael Munday
8af0c77df3 cmd/compile: simplify shift lowering on s390x
Use conditional moves instead of subtractions with borrow to handle
saturation cases. This allows us to delete the SUBE/SUBEW ops and
associated rules from the SSA backend. Using conditional moves also
means we can detect when shift values are masked so I've added some
new rules to constant fold the relevant comparisons and masking ops.

Also use the new shiftIsBounded() function to avoid generating code
to handle saturation cases where possible.

Updates #25167 for s390x.

Change-Id: Ief9991c91267c9151ce4c5ec07642abb4dcc1c0d
Reviewed-on: https://go-review.googlesource.com/110070
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-08 16:19:56 +00:00