1
0
mirror of https://github.com/golang/go synced 2024-11-23 04:50:06 -07:00
Commit Graph

31 Commits

Author SHA1 Message Date
Paul E. Murphy
c6d142c4a7 cmd/compile/internal/ssa: fix ppc64 merging of (CLRLSLDI (SRD ...))
The rotate value was not correctly converted from a 64 bit to 32
bit rotate. This caused a miscompile of
golang.org/x/text/unicode/runenames.Names.

Fixes #67526

Change-Id: Ief56fbab27ccc71cd4c01117909bfee7f60a2ea1
Reviewed-on: https://go-review.googlesource.com/c/go/+/586915
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-05-21 18:53:43 +00:00
Paul E. Murphy
0222a028f1 cmd/compile/internal/ssa: combine more shift and masking on PPC64
Investigating binaries, these patterns seem to show up frequently.

Change-Id: I987251e4070e35c25e98da321e444ccaa1526912
Reviewed-on: https://go-review.googlesource.com/c/go/+/583302
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-05-15 13:27:41 +00:00
Paul E. Murphy
7994da4cc1 cmd/compile/internal/ssa: on PPC64, try combining CLRLSLDI and SRDconst into RLWINM
This provides a small performance bump to crc64 as measured on ppc64le/power10:

name              old time/op    new time/op    delta
Crc64/ISO64KB       49.6µs ± 0%    46.6µs ± 0%  -6.18%
Crc64/ISO4KB        3.16µs ± 0%    2.97µs ± 0%  -5.83%
Crc64/ISO1KB         840ns ± 0%     794ns ± 0%  -5.46%
Crc64/ECMA64KB      49.6µs ± 0%    46.5µs ± 0%  -6.20%
Crc64/Random64KB    53.1µs ± 0%    49.9µs ± 0%  -6.04%
Crc64/Random16KB    15.9µs ± 1%    15.0µs ± 0%  -5.73%

Change-Id: I302b5431c7dc46dfd2d211545c483bdcdfe011f1
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/581937
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Eli Bendersky <eliben@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2024-05-03 21:12:29 +00:00
Joel Sing
70c7fb75e9 cmd/compile: correct code generation for right shifts on riscv64
The code generation on riscv64 will currently result in incorrect
assembly when a 32 bit integer is right shifted by an amount that
exceeds the size of the type. In particular, this occurs when an
int32 or uint32 is cast to a 64 bit type and right shifted by a
value larger than 31.

Fix this by moving the SRAW/SRLW conversion into the right shift
rules and removing the SignExt32to64/ZeroExt32to64. Add additional
rules that rewrite to SRAIW/SRLIW when the shift is less than the
size of the type, or replace/eliminate the shift when it exceeds
the size of the type.

Add SSA tests that would have caught this issue. Also add additional
codegen tests to ensure that the resulting assembly is what we
expect in these overflow cases.

Fixes #64285

Change-Id: Ie97b05668597cfcb91413afefaab18ee1aa145ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/545035
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-12-01 19:30:59 +00:00
Ubuntu
8fc043ccfa cmd/compile: optimize right shifts of int32 on riscv64
The compiler is currently sign extending 32 bit signed integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit signed values (sraw and sraiw) which sign extend
the result of the shift to 64 bits.  Change the compiler so that
it uses sraw and sraiw for shifts of signed 32 bit integers reducing
in most cases the number of instructions needed to perform the shift.

Here are some examples of code sequences that are changed by this
patch:

int32(a) >> 2

  before:

    sll     x5,x10,0x20
    sra     x10,x5,0x22

  after:

    sraw    x10,x10,0x2

int32(v) >> int(s)

  before:

    sext.w  x5,x10
    sltiu   x6,x11,64
    add     x6,x6,-1
    or      x6,x11,x6
    sra     x10,x5,x6

  after:

    sltiu   x5,x11,32
    add     x5,x5,-1
    or      x5,x11,x5
    sraw    x10,x10,x5

int32(v) >> (int(s) & 31)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

after:

    and     x5,x11,31
    sraw    x10,x10,x5

int32(100) >> int(a)

  before:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,64
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sra     x10,x6,x5

  after:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,32
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sraw    x10,x6,x5

int32(v) >> (int(s) & 63)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

  after:

    and     x5,x11,63
    sltiu   x6,x5,32
    add     x6,x6,-1
    or      x5,x5,x6
    sraw    x10,x10,x5

In most cases we eliminate one instruction.  In the case where
we shift a int32 constant by a variable the number of instructions
generated is identical.  A sra is simply replaced by a sraw.  In the
unusual case where we shift right by a variable anded with a constant
> 31 but < 64, we generate two additional instructions.  As this is
an unusual case we do not try to optimize for it.

Some improvements can be seen in some of the existing benchmarks,
notably in the utf8 package which performs right shifts of runes
which are signed 32 bit integers.

                      |  utf8-old   |              utf8-new            |
                      |   sec/op    |   sec/op     vs base             |
EncodeASCIIRune-4       17.68n ± 0%   17.67n ± 0%       ~ (p=0.312 n=10)
EncodeJapaneseRune-4    35.34n ± 0%   34.53n ± 1%  -2.31% (p=0.000 n=10)
AppendASCIIRune-4       3.213n ± 0%   3.213n ± 0%       ~ (p=0.318 n=10)
AppendJapaneseRune-4    36.14n ± 0%   35.35n ± 0%  -2.19% (p=0.000 n=10)
DecodeASCIIRune-4       28.11n ± 0%   27.36n ± 0%  -2.69% (p=0.000 n=10)
DecodeJapaneseRune-4    38.55n ± 0%   38.58n ± 0%       ~ (p=0.612 n=10)

Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/535115
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
2023-10-30 14:47:06 +00:00
Lynn Boger
80834af206 cmd/compile: avoid ANDCCconst on PPC64 if condition not needed
In the PPC64 ISA, the instruction to do an 'and' operation
using an immediate constant is only available in the form that
also sets CR0 (i.e. clobbers the condition register.) This means
CR0 is being clobbered unnecessarily in many cases. That
affects some decisions made during some compiler passes
that check for it.

In those cases when the constant used by the ANDCC is a right
justified consecutive set of bits, a shift instruction can
be used which has the same effect if CR0 does not need to be
set. The rule to do that has been added to the late rules file
after other rules using ANDCCconst have been processed in the
main rules file.

Some codegen tests had to be updated since ANDCC is no
longer generated for some cases. A new test case was added to
verify the ANDCC is present if the results for both the AND
and CR0 are used.

Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542
Reviewed-on: https://go-review.googlesource.com/c/go/+/531435
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-10-18 15:56:53 +00:00
Mark Ryan
561bf0457f cmd/compile: optimize right shifts of uint32 on riscv
The compiler is currently zero extending 32 bit unsigned integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit unsigned values (srlw and srliw) which zero extend
the result of the shift to 64 bits.  Change the compiler so that
it uses srlw and srliw for 32 bit unsigned shifts reducing in most
cases the number of instructions needed to perform the shift.

Here are some examples of code sequences that are changed by this
patch:

uint32(a) >> 2

  before:

    sll     x5,x10,0x20
    srl     x10,x5,0x22

  after:

    srlw    x10,x10,0x2

uint32(a) >> int(b)

  before:

    sll     x5,x10,0x20
    srl     x5,x5,0x20
    srl     x5,x5,x11
    sltiu   x6,x11,64
    neg     x6,x6
    and     x10,x5,x6

  after:

    srlw    x5,x10,x11
    sltiu   x6,x11,32
    neg     x6,x6
    and     x10,x5,x6

bits.RotateLeft32(uint32(a), 1)

  before:

    sll     x5,x10,0x1
    sll     x6,x10,0x20
    srl     x7,x6,0x3f
    or      x5,x5,x7

  after:

   sll     x5,x10,0x1
   srlw    x6,x10,0x1f
   or      x10,x5,x6

bits.RotateLeft32(uint32(a), int(b))

  before:
    and     x6,x11,31
    sll     x7,x10,x6
    sll     x8,x10,0x20
    srl     x8,x8,0x20
    add     x6,x6,-32
    neg     x6,x6
    srl     x9,x8,x6
    sltiu   x6,x6,64
    neg     x6,x6
    and     x6,x9,x6
    or      x6,x6,x7

  after:

    and     x5,x11,31
    sll     x6,x10,x5
    add     x5,x5,-32
    neg     x5,x5
    srlw    x7,x10,x5
    sltiu   x5,x5,32
    neg     x5,x5
    and     x5,x7,x5
    or      x10,x6,x5

The one regression observed is the following case, an unbounded right
shift of a uint32 where the value we're shifting by is known to be
< 64 but > 31.  As this is an unusual case this commit does not
optimize for it, although the existing code does.

uint32(a) >> (b & 63)

  before:

    sll     x5,x10,0x20
    srl     x5,x5,0x20
    and     x6,x11,63
    srl     x10,x5,x6

  after

    and     x5,x11,63
    srlw    x6,x10,x5
    sltiu   x5,x5,32
    neg     x5,x5
    and     x10,x6,x5

Here we have one extra instruction.

Some benchmark highlights, generated on a VisionFive2 8GB running
Ubuntu 23.04.

pkg: math/bits
LeadingZeros32-4    18.64n ± 0%     17.32n ± 0%   -7.11% (p=0.000 n=10)
LeadingZeros64-4    15.47n ± 0%     15.51n ± 0%   +0.26% (p=0.027 n=10)
TrailingZeros16-4   18.48n ± 0%     17.68n ± 0%   -4.33% (p=0.000 n=10)
TrailingZeros32-4   16.87n ± 0%     16.07n ± 0%   -4.74% (p=0.000 n=10)
TrailingZeros64-4   15.26n ± 0%     15.27n ± 0%   +0.07% (p=0.043 n=10)
OnesCount32-4       20.08n ± 0%     19.29n ± 0%   -3.96% (p=0.000 n=10)
RotateLeft-4        8.864n ± 0%     8.838n ± 0%   -0.30% (p=0.006 n=10)
RotateLeft32-4      8.837n ± 0%     8.032n ± 0%   -9.11% (p=0.000 n=10)
Reverse32-4         29.77n ± 0%     26.52n ± 0%  -10.93% (p=0.000 n=10)
ReverseBytes32-4    9.640n ± 0%     8.838n ± 0%   -8.32% (p=0.000 n=10)
Sub32-4             8.835n ± 0%     8.035n ± 0%   -9.06% (p=0.000 n=10)
geomean             11.50n          11.33n        -1.45%

pkg: crypto/md5
Hash8Bytes-4             1.486µ ± 0%   1.426µ ± 0%  -4.04% (p=0.000 n=10)
Hash64-4                 2.079µ ± 0%   1.968µ ± 0%  -5.36% (p=0.000 n=10)
Hash128-4                2.720µ ± 0%   2.557µ ± 0%  -5.99% (p=0.000 n=10)
Hash256-4                3.996µ ± 0%   3.733µ ± 0%  -6.58% (p=0.000 n=10)
Hash512-4                6.541µ ± 0%   6.072µ ± 0%  -7.18% (p=0.000 n=10)
Hash1K-4                 11.64µ ± 0%   10.75µ ± 0%  -7.58% (p=0.000 n=10)
Hash8K-4                 82.95µ ± 0%   76.32µ ± 0%  -7.99% (p=0.000 n=10)
Hash1M-4                10.436m ± 0%   9.591m ± 0%  -8.10% (p=0.000 n=10)
Hash8M-4                 83.50m ± 0%   76.73m ± 0%  -8.10% (p=0.000 n=10)
Hash8BytesUnaligned-4    1.494µ ± 0%   1.434µ ± 0%  -4.02% (p=0.000 n=10)
Hash1KUnaligned-4        11.64µ ± 0%   10.76µ ± 0%  -7.52% (p=0.000 n=10)
Hash8KUnaligned-4        83.01µ ± 0%   76.32µ ± 0%  -8.07% (p=0.000 n=10)
geomean                  28.32µ        26.42µ       -6.72%

Change-Id: I20483a6668cca1b53fe83944bee3706aadcf8693
Reviewed-on: https://go-review.googlesource.com/c/go/+/528975
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-07 12:31:38 +00:00
Paul E. Murphy
1540531746 test/codegen: merge identical ppc64 and ppc64le tests
Manually consolidate the remaining ppc64/ppc64le test which
are not so trivial to automatically merge.

The remaining ppc64le tests are limited to cases where load/stores are
merged (this only happens on ppc64le) and the race detector (only
supported on ppc64le).

Change-Id: I1f9c0f3d3ddbb7fbbd8c81fbbd6537394fba63ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/463217
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-01-27 19:03:02 +00:00
Paul E. Murphy
0301c6c351 test/codegen: combine trivial PPC64 tests into ppc64x
Use a small python script to consolidate duplicate
ppc64/ppc64le tests into a single ppc64x codegen test.

This makes small assumption that anytime two tests with
for different arch/variant combos exists, those tests
can be combined into a single ppc64x test.

E.x:

  // ppc64le: foo
  // ppc64le/power9: foo
into
  // ppc64x: foo

or

  // ppc64: foo
  // ppc64le: foo
into
  // ppc64x: foo

import glob
import re
files = glob.glob("codegen/*.go")
for file in files:
    with open(file) as f:
        text = [l for l in f]
    i = 0
    while i < len(text):
        first = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[i])
        if first:
            j = i+1
            while j < len(text):
                second = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[j])
                if not second:
                    break
                if (not first.group(2) or first.group(2) == second.group(2)) and first.group(3) == second.group(3):
                    text[i] = re.sub(" ?ppc64(le|x)?"," ppc64x",text[i])
                    text=text[:j] + (text[j+1:])
                else:
                    j += 1
        i+=1
    with open(file, 'w') as f:
        f.write("".join(text))

Change-Id: Ic6b009b54eacaadc5a23db9c5a3bf7331b595821
Reviewed-on: https://go-review.googlesource.com/c/go/+/463220
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-27 18:24:12 +00:00
Wayne Zuo
af668c689c cmd/compile: fold constant shift with extension on riscv64
For example:

  movb a0, a0
  srai $1, a0, a0

the assembler will expand to:

  slli $56, a0, a0
  srai $56, a0, a0
  srai $1, a0, a0

this CL optimize to:

  slli $56, a0, a0
  srai $57, a0, a0

Remove 270+ instructions from Go binary on linux/riscv64.

Change-Id: I375e19f9d3bd54f2781791d8cbe5970191297dc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/428496
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-06 05:21:04 +00:00
Joel Sing
a7bcc94719 cmd/compile: resolve known outcomes for SLTI/SLTIU on riscv64
When SLTI/SLTIU is used with ANDI/ORI, it may be possible to determine the
outcome based on the values of the immediates. Resolve these cases.

Improves code generation for various shift operations.

While here, sort tests by architecture to improve readability and ease
future maintenance.

Change-Id: I87e71e016a0e396a928e7d6389a2df61583dfd8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/428217
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jenny Rakoczy <jenny@golang.org>
Reviewed-by: Jenny Rakoczy <jenny@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Jenny Rakoczy <jenny@golang.org>
2022-09-17 17:17:52 +00:00
ruinan
454a058ffc cmd/compile: Add shiftIsBounded check for logic shifts of arm64
This CL adds shiftIsBounded checks for the Lsh* and Rsh* rules in arm64.
There is no need to check the shift value again with CMP + CSEL when the
shift value is valid.

Change-Id: I54620de64f02a1b5a11089add237248ae2de01b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/417714
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2022-09-07 20:10:13 +00:00
Keith Randall
5b1fbfba1c cmd/compile: rewrite >>c<<c to &^(1<<c-1)
Fixes #54496

Change-Id: I3c2ed8cd55836d5b07c8cdec00d3b584885aca79
Reviewed-on: https://go-review.googlesource.com/c/go/+/424856
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Martin Möhrmann <martin@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <martin@golang.org>
2022-09-02 18:51:37 +00:00
ruinan
54c7bc9cff cmd/compile: optimize shift ops on arm64 when the shift value is v&63
For the following code case:

  var x uint64
  x >> (shift & 63)

We can directly genereta `x >> shift` on arm64, since the hardware will
only use the bottom 6 bits of the shift amount.

Benchmark               old time/op  new time/op    delta
ShiftArithmeticRight-8  0.40ns       0.31ns        -21.7%

Change-Id: Id58c8a5b2f6dd5c30c3876f4a36e11b4d81e2dc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/425294
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2022-09-02 17:44:29 +00:00
Wayne Zuo
da6556968f cmd/compile: simplify bounded shift on riscv64
The prove pass will mark some shifts bounded, and then we can use that
information to generate better code on riscv64.

Change-Id: Ia22f43d0598453c9417adac7017db28d7240948b
Reviewed-on: https://go-review.googlesource.com/c/go/+/422616
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-08-31 20:21:00 +00:00
Archana R
d09c6ac417 test/codegen: updated multiple tests to verify on ppc64,ppc64le
Updated multiple tests in test/codegen: math.go, mathbits.go, shift.go
and slices.go to verify on ppc64/ppc64le as well

Change-Id: Id88dd41569b7097819fb4d451b615f69cf7f7a94
Reviewed-on: https://go-review.googlesource.com/c/go/+/412115
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2022-08-17 13:56:55 +00:00
Joel Sing
fe8347b61a cmd/compile: optimise immediate operands with constants on riscv64
Instructions with immediates can be precomputed when operating on a
constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI
and ADDI. Additionally, optimise ANDI and ORI when the immediate is
all ones or all zeroes.

In particular, the RISCV64 logical left and right shift rules
(Lsh*x*/Rsh*Ux*) produce sequences that check if the shift amount
exceeds 64 and if so returns zero. When the shift amount is a
constant we can precompute and eliminate the filter entirely.

Likewise the arithmetic right shift rules produce sequences that
check if the shift amount exceeds 64 and if so, ensures that the
lower six bits of the shift are all ones. When the shift amount
is a constant we can precompute the shift value.

Arithmetic right shift sequences like:

   117fc:       00100513                li      a0,1
   11800:       04053593                sltiu   a1,a0,64
   11804:       fff58593                addi    a1,a1,-1
   11808:       0015e593                ori     a1,a1,1
   1180c:       40b45433                sra     s0,s0,a1

Are now a single srai instruction:

   117fc:       40145413                srai    s0,s0,0x1

Likewise for logical left shift (and logical right shift):

   1d560:       01100413                li      s0,17
   1d564:       04043413                sltiu   s0,s0,64
   1d568:       40800433                neg     s0,s0
   1d56c:       01131493                slli    s1,t1,0x11
   1d570:       0084f433                and     s0,s1,s0

Which are now a single slli (or srli) instruction:

   1d120:       01131413                slli    s0,t1,0x11

This removes more than 30,000 instructions from the Go binary and
should improve performance in a variety of areas - of note
runtime.makemap_small drops from 48 to 36 instructions. Similar
gains exist in at least other parts of runtime and math/bits.

Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/350689
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-24 10:51:48 +00:00
Joel Sing
d413908320 test/codegen: add shift tests for RISCV64
Add tests for shift by constant, masked shifts and bounded shifts. While here,
sort tests by architecture and keep order of tests consistent (lsh, rshU, rsh).

Change-Id: I512d64196f34df9cb2884e8c0f6adcf9dd88b0fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/351289
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
2021-09-24 07:22:13 +00:00
Josh Bleecher Snyder
43d5f213e2 cmd/compile: optimize multi-register shifts on amd64
amd64 can shift in bits from another register instead of filling with 0/1.
This pattern is helpful when implementing 128 bit shifts or arbitrary length shifts.
In the standard library, it shows up in pure Go math/big.

Benchmarks results on amd64 with -tags=math_big_pure_go.

name                          old time/op  new time/op  delta
NonZeroShifts/1/shrVU-8       4.45ns ± 3%  4.39ns ± 1%   -1.28%  (p=0.000 n=30+27)
NonZeroShifts/1/shlVU-8       4.13ns ± 4%  4.10ns ± 2%     ~     (p=0.254 n=29+28)
NonZeroShifts/2/shrVU-8       5.55ns ± 1%  5.63ns ± 2%   +1.42%  (p=0.000 n=28+29)
NonZeroShifts/2/shlVU-8       5.70ns ± 2%  5.14ns ± 1%   -9.82%  (p=0.000 n=29+28)
NonZeroShifts/3/shrVU-8       6.79ns ± 2%  6.35ns ± 2%   -6.46%  (p=0.000 n=28+29)
NonZeroShifts/3/shlVU-8       6.69ns ± 1%  6.25ns ± 1%   -6.60%  (p=0.000 n=28+27)
NonZeroShifts/4/shrVU-8       7.79ns ± 2%  7.06ns ± 2%   -9.48%  (p=0.000 n=30+30)
NonZeroShifts/4/shlVU-8       7.82ns ± 1%  7.24ns ± 1%   -7.37%  (p=0.000 n=28+29)
NonZeroShifts/5/shrVU-8       8.90ns ± 3%  7.93ns ± 1%  -10.84%  (p=0.000 n=29+26)
NonZeroShifts/5/shlVU-8       8.68ns ± 1%  7.92ns ± 1%   -8.76%  (p=0.000 n=29+29)
NonZeroShifts/10/shrVU-8      14.4ns ± 1%  12.3ns ± 2%  -14.79%  (p=0.000 n=28+29)
NonZeroShifts/10/shlVU-8      14.1ns ± 1%  11.9ns ± 2%  -15.55%  (p=0.000 n=28+27)
NonZeroShifts/100/shrVU-8      118ns ± 1%    96ns ± 3%  -18.82%  (p=0.000 n=30+29)
NonZeroShifts/100/shlVU-8      120ns ± 2%    98ns ± 2%  -18.46%  (p=0.000 n=29+28)
NonZeroShifts/1000/shrVU-8    1.10µs ± 1%  0.88µs ± 2%  -19.63%  (p=0.000 n=29+30)
NonZeroShifts/1000/shlVU-8    1.10µs ± 2%  0.88µs ± 2%  -20.28%  (p=0.000 n=29+28)
NonZeroShifts/10000/shrVU-8   10.9µs ± 1%   8.7µs ± 1%  -19.78%  (p=0.000 n=28+27)
NonZeroShifts/10000/shlVU-8   10.9µs ± 2%   8.7µs ± 1%  -19.64%  (p=0.000 n=29+27)
NonZeroShifts/100000/shrVU-8   111µs ± 2%    90µs ± 2%  -19.39%  (p=0.000 n=28+29)
NonZeroShifts/100000/shlVU-8   113µs ± 2%    90µs ± 2%  -20.43%  (p=0.000 n=30+27)

The assembly version is still faster, unfortunately, but the gap is narrowing.
Speedup from pure Go to assembly:

name                          old time/op  new time/op  delta
NonZeroShifts/1/shrVU-8       4.39ns ± 1%  3.45ns ± 2%  -21.36%  (p=0.000 n=27+29)
NonZeroShifts/1/shlVU-8       4.10ns ± 2%  3.47ns ± 3%  -15.42%  (p=0.000 n=28+30)
NonZeroShifts/2/shrVU-8       5.63ns ± 2%  3.97ns ± 0%  -29.40%  (p=0.000 n=29+25)
NonZeroShifts/2/shlVU-8       5.14ns ± 1%  3.77ns ± 2%  -26.65%  (p=0.000 n=28+26)
NonZeroShifts/3/shrVU-8       6.35ns ± 2%  4.79ns ± 2%  -24.52%  (p=0.000 n=29+29)
NonZeroShifts/3/shlVU-8       6.25ns ± 1%  4.42ns ± 1%  -29.29%  (p=0.000 n=27+26)
NonZeroShifts/4/shrVU-8       7.06ns ± 2%  5.64ns ± 1%  -20.05%  (p=0.000 n=30+29)
NonZeroShifts/4/shlVU-8       7.24ns ± 1%  5.34ns ± 2%  -26.23%  (p=0.000 n=29+29)
NonZeroShifts/5/shrVU-8       7.93ns ± 1%  6.56ns ± 2%  -17.26%  (p=0.000 n=26+30)
NonZeroShifts/5/shlVU-8       7.92ns ± 1%  6.27ns ± 1%  -20.79%  (p=0.000 n=29+25)
NonZeroShifts/10/shrVU-8      12.3ns ± 2%  10.2ns ± 2%  -17.21%  (p=0.000 n=29+29)
NonZeroShifts/10/shlVU-8      11.9ns ± 2%  10.5ns ± 2%  -12.45%  (p=0.000 n=27+29)
NonZeroShifts/100/shrVU-8     95.9ns ± 3%  77.7ns ± 1%  -19.00%  (p=0.000 n=29+30)
NonZeroShifts/100/shlVU-8     97.5ns ± 2%  66.8ns ± 2%  -31.47%  (p=0.000 n=28+30)
NonZeroShifts/1000/shrVU-8     884ns ± 2%   705ns ± 1%  -20.17%  (p=0.000 n=30+28)
NonZeroShifts/1000/shlVU-8     880ns ± 2%   590ns ± 1%  -32.96%  (p=0.000 n=28+25)
NonZeroShifts/10000/shrVU-8   8.74µs ± 1%  7.34µs ± 3%  -15.94%  (p=0.000 n=27+30)
NonZeroShifts/10000/shlVU-8   8.73µs ± 1%  6.00µs ± 1%  -31.25%  (p=0.000 n=27+28)
NonZeroShifts/100000/shrVU-8  89.6µs ± 2%  75.5µs ± 2%  -15.80%  (p=0.000 n=29+29)
NonZeroShifts/100000/shlVU-8  89.6µs ± 2%  68.0µs ± 3%  -24.09%  (p=0.000 n=27+30)

Change-Id: I18f58d8f5513d737d9cdf09b8f9d14011ffe3958
Reviewed-on: https://go-review.googlesource.com/c/go/+/297050
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-03-11 19:11:46 +00:00
Paul E. Murphy
48ddf70128 cmd/asm,cmd/compile: support 5 operand RLWNM/RLWMI on ppc64
These instructions are actually 5 argument opcodes as specified
by the ISA.  Prior to this patch, the MB and ME arguments were
merged into a single bitmask operand to workaround the limitations
of the ppc64 assembler backend.

This limitation no longer exists. Thus, we can pass operands for
these opcodes without having to merge the MB and ME arguments in
the assembler frontend or compiler backend.

Likewise, support for 4 operand variants is unchanged.

Change-Id: Ib086774f3581edeaadfd2190d652aaaa8a90daeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/298750
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
2021-03-09 20:35:41 +00:00
Michael Munday
854e892ce1 cmd/compile: optimize shift pairs and masks on s390x
Optimize combinations of left and right shifts by a constant value
into a 'rotate then insert selected bits [into zero]' instruction.
Use the same instruction for contiguous masks since it has some
benefits over 'and immediate' (not restricted to 32-bits, does not
overwrite source register).

To keep the complexity of this change under control I've only
implemented 64 bit operations for now.

There are a lot more optimizations that can be done with this
instruction family. However, since their function overlaps with other
instructions we need to be somewhat careful not to break existing
optimization rules by creating optimization dead ends. This is
particularly true of the load/store merging rules which contain lots
of zero extensions and shifts.

This CL does interfere with the store merging rules when an operand
is shifted left before it is stored:

  binary.BigEndian.PutUint64(b, x << 1)

This is unfortunate but it's not critical and somewhat complex so
I plan to fix that in a follow up CL.

file      before    after     Δ       %
addr2line 4117446   4117282   -164    -0.004%
api       4945184   4942752   -2432   -0.049%
asm       4998079   4991891   -6188   -0.124%
buildid   2685158   2684074   -1084   -0.040%
cgo       4553732   4553394   -338    -0.007%
compile   19294446  19245070  -49376  -0.256%
cover     4897105   4891319   -5786   -0.118%
dist      3544389   3542785   -1604   -0.045%
doc       3926795   3927617   +822    +0.021%
fix       3302958   3293868   -9090   -0.275%
link      6546274   6543456   -2818   -0.043%
nm        4102021   4100825   -1196   -0.029%
objdump   4542431   4548483   +6052   +0.133%
pack      2482465   2416389   -66076  -2.662%
pprof     13366541  13363915  -2626   -0.020%
test2json 2829007   2761515   -67492  -2.386%
trace     10216164  10219684  +3520   +0.034%
vet       6773956   6773572   -384    -0.006%
total     107124151 106917891 -206260 -0.193%

Change-Id: I7591cce41e06867ba10a745daae9333513062746
Reviewed-on: https://go-review.googlesource.com/c/go/+/233317
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Michael Munday <mike.munday@ibm.com>
2020-11-06 10:45:31 +00:00
Paul E. Murphy
c3c6fbf314 cmd/compile: combine more 32 bit shift and mask operations on ppc64
Combine (AND m (SRWconst x)) or (SRWconst (AND m x)) when mask m is
and the shift value produce constant which can be encoded into an
RLWINM instruction.

Combine (CLRLSLDI (SRWconst x)) if the combining of the underling rotate
masks produces a constant which can be encoded into RLWINM.

Likewise for (SLDconst (SRWconst x)) and (CLRLSDI (RLWINM x)).

Combine rotate word + and operations which can be encoded as a single
RLWINM/RLWNM instruction.

The most notable performance improvements arise from the crypto
benchmarks below (GOARCH=power8 on a ppc64le/linux):

pkg:golang.org/x/crypto/blowfish goos:linux goarch:ppc64le
ExpandKeyWithSalt                               52.2µs ± 0%    47.5µs ± 0%  -8.88%
ExpandKey                                       44.4µs ± 0%    40.3µs ± 0%  -9.15%

pkg:golang.org/x/crypto/ssh/internal/bcrypt_pbkdf goos:linux goarch:ppc64le
Key                                             57.6ms ± 0%    52.3ms ± 0%  -9.13%

pkg:golang.org/x/crypto/bcrypt goos:linux goarch:ppc64le
Equal                                           90.9ms ± 0%    82.6ms ± 0%  -9.13%
DefaultCost                                     91.0ms ± 0%    82.7ms ± 0%  -9.12%

Change-Id: I59a0ca29face38f4ab46e37124c32906f216c4ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/260798
Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-27 18:33:20 +00:00
Lynn Boger
cc2a5cf4b8 cmd/compile,cmd/internal/obj/ppc64: fix some shift rules due to a regression
A recent change to improve shifts was generating some
invalid cases when the rule was based on an AND. The
extended mnemonics CLRLSLDI and CLRLSLWI only allow
certain values for the operands and in the mask case
those values were not being checked properly. This
adds a check to those rules to verify that the
'b' and 'n' values used when an AND was part of the rule
have correct values.

There was a bug in some diag messages in asm9. The
message expected 3 values but only provided 2. Those are
corrected here also.

The test/codegen/shift.go was updated to add a few more
cases to check for the case mentioned here.

Some of the comments that mention the order of operands
in these extended mnemonics were wrong and those have been
corrected.

Fixes #41683.

Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31
Reviewed-on: https://go-review.googlesource.com/c/go/+/258138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-01 18:51:18 +00:00
Lynn Boger
a424f6e45e cmd/asm,cmd/compile,cmd/internal/obj/ppc64: add extswsli support on power9
This adds support for the extswsli instruction which combines
extsw followed by a shift.

New benchmark demonstrates the improvement:
name      old time/op  new time/op  delta
ExtShift  1.34µs ± 0%  1.30µs ± 0%  -3.15%  (p=0.057 n=4+3)

Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/257017
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-09-28 18:13:48 +00:00
Lynn Boger
967465da29 cmd/compile: use combined shifts to improve array addressing on ppc64x
This change adds rules to find pairs of instructions that can
be combined into a single shifts. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.

These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.

Some rules in PPC64.rules were moved because the ordering prevented
some matching.

The following results were generated on power9.

hash/crc32:
    CRC32/poly=Koopman/size=40/align=0          195ns ± 0%     163ns ± 0%  -16.41%
    CRC32/poly=Koopman/size=40/align=1          200ns ± 0%     163ns ± 0%  -18.50%
    CRC32/poly=Koopman/size=512/align=0        1.98µs ± 0%    1.67µs ± 0%  -15.46%
    CRC32/poly=Koopman/size=512/align=1        1.98µs ± 0%    1.69µs ± 0%  -14.80%
    CRC32/poly=Koopman/size=1kB/align=0        3.90µs ± 0%    3.31µs ± 0%  -15.27%
    CRC32/poly=Koopman/size=1kB/align=1        3.85µs ± 0%    3.31µs ± 0%  -14.15%
    CRC32/poly=Koopman/size=4kB/align=0        15.3µs ± 0%    13.1µs ± 0%  -14.22%
    CRC32/poly=Koopman/size=4kB/align=1        15.4µs ± 0%    13.1µs ± 0%  -14.79%
    CRC32/poly=Koopman/size=32kB/align=0        137µs ± 0%     105µs ± 0%  -23.56%
    CRC32/poly=Koopman/size=32kB/align=1        137µs ± 0%     105µs ± 0%  -23.53%

crypto/rc4:
    RC4_128    733ns ± 0%    650ns ± 0%  -11.32%  (p=1.000 n=1+1)
    RC4_1K    5.80µs ± 0%   5.17µs ± 0%  -10.89%  (p=1.000 n=1+1)
    RC4_8K    45.7µs ± 0%   40.8µs ± 0%  -10.73%  (p=1.000 n=1+1)

crypto/sha1:
    Hash8Bytes       635ns ± 0%     613ns ± 0%   -3.46%  (p=1.000 n=1+1)
    Hash320Bytes    2.30µs ± 0%    2.18µs ± 0%   -5.38%  (p=1.000 n=1+1)
    Hash1K          5.88µs ± 0%    5.38µs ± 0%   -8.62%  (p=1.000 n=1+1)
    Hash8K          42.0µs ± 0%    37.9µs ± 0%   -9.75%  (p=1.000 n=1+1)

There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.

Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2020-09-17 12:37:40 +00:00
Lynn Boger
a1550d3ca3 cmd/compile: use isel with variable shifts on ppc64x
This changes the code generated for variable length shift
counts to use isel instead of instructions that set and
read the carry flag.

This reduces the generated code for shifts like this
by 1 instruction and avoids the use of instructions to
set and read the carry flag.

This sequence can be found in strconv with these results
on power9:

Atof64Decimal                          71.6ns ± 0%  68.3ns ± 0%   -4.61%
Atof64Float                            95.3ns ± 0%  90.9ns ± 0%   -4.62%
Atof64FloatExp                          153ns ± 0%   149ns ± 0%   -2.61%
Atof64Big                               234ns ± 0%   232ns ± 0%   -0.85%
Atof64RandomBits                        348ns ± 0%   369ns ± 0%   +6.03%
Atof64RandomFloats                      262ns ± 0%   262ns ± 0%     ~
Atof32Decimal                          72.0ns ± 0%  68.2ns ± 0%   -5.28%
Atof32Float                            92.1ns ± 0%  87.1ns ± 0%   -5.43%
Atof32FloatExp                          159ns ± 0%   158ns ± 0%   -0.63%
Atof32Random                            194ns ± 0%   191ns ± 0%   -1.55%

Some tests in codegen/shift.go are enabled to verify the
expected instructions are generated.

Change-Id: I968715d10ada405a8c46132bf19b8ed9b85796d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227337
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-09 19:18:56 +00:00
Lynn Boger
e4a1cf8a56 cmd/compile: add rules to eliminate unnecessary signed shifts
This change to the rules removes some unnecessary signed shifts
that appear in the math/rand functions. Existing rules did not
cover some of the signed cases.

A little improvement seen in math/rand due to removing 1 of 2
instructions generated for Int31n, which is inlined quite a bit.

Intn1000                 46.9ns ± 0%  45.5ns ± 0%   -2.99%  (p=1.000 n=1+1)
Int63n1000               33.5ns ± 0%  32.8ns ± 0%   -2.09%  (p=1.000 n=1+1)
Int31n1000               32.7ns ± 0%  32.6ns ± 0%   -0.31%  (p=1.000 n=1+1)
Float32                  32.7ns ± 0%  30.3ns ± 0%   -7.34%  (p=1.000 n=1+1)
Float64                  21.7ns ± 0%  20.9ns ± 0%   -3.69%  (p=1.000 n=1+1)
Perm3                     205ns ± 0%   202ns ± 0%   -1.46%  (p=1.000 n=1+1)
Perm30                   1.71µs ± 0%  1.68µs ± 0%   -1.35%  (p=1.000 n=1+1)
Perm30ViaShuffle         1.65µs ± 0%  1.65µs ± 0%   -0.30%  (p=1.000 n=1+1)
ShuffleOverhead          2.83µs ± 0%  2.83µs ± 0%   -0.07%  (p=1.000 n=1+1)
Read3                    18.7ns ± 0%  16.1ns ± 0%  -13.90%  (p=1.000 n=1+1)
Read64                    126ns ± 0%   124ns ± 0%   -1.59%  (p=1.000 n=1+1)
Read1000                 1.75µs ± 0%  1.63µs ± 0%   -7.08%  (p=1.000 n=1+1)

Change-Id: I11502dfca7d65aafc76749a8d713e9e50c24a858
Reviewed-on: https://go-review.googlesource.com/c/go/+/225917
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-27 16:05:42 +00:00
Agniva De Sarker
8fedb2d338 cmd/compile: optimize bounded shifts on wasm
Use the shiftIsBounded function to generate more efficient
Shift instructions.

Updates #25167

Change-Id: Id350f8462dc3a7ed3bfed0bcbea2860b8f40048a
Reviewed-on: https://go-review.googlesource.com/c/go/+/182558
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Richard Musiol <neelance@gmail.com>
2019-08-28 04:44:21 +00:00
Josh Bleecher Snyder
61945fc502 cmd/compile: don't generate panicshift for masked int shifts
We know that a & 31 is non-negative for all a, signed or not.
We can avoid checking that and needing to write out an
unreachable call to panicshift.

Change-Id: I32f32fb2c950d2b2b35ac5c0e99b7b2dbd47f917
Reviewed-on: https://go-review.googlesource.com/c/go/+/167499
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-14 00:03:29 +00:00
Josh Bleecher Snyder
870cfe6484 test/codegen: gofmt
Change-Id: I33f5b5051e5f75aa264ec656926223c5a3c09c1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/167498
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matt Layher <mdlayher@gmail.com>
2019-03-13 21:44:45 +00:00
Michael Munday
8af0c77df3 cmd/compile: simplify shift lowering on s390x
Use conditional moves instead of subtractions with borrow to handle
saturation cases. This allows us to delete the SUBE/SUBEW ops and
associated rules from the SSA backend. Using conditional moves also
means we can detect when shift values are masked so I've added some
new rules to constant fold the relevant comparisons and masking ops.

Also use the new shiftIsBounded() function to avoid generating code
to handle saturation cases where possible.

Updates #25167 for s390x.

Change-Id: Ief9991c91267c9151ce4c5ec07642abb4dcc1c0d
Reviewed-on: https://go-review.googlesource.com/110070
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-08 16:19:56 +00:00