Fold constant int16 addends for usages of math/bits.Add64(x,const,0)
on PPC64. This usage shows up in a few crypto implementations;
notably the go wrapper for CL 626176.
Change-Id: I6963163330487d04e0479b4fdac235f97bb96889
Reviewed-on: https://go-review.googlesource.com/c/go/+/625899
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
For the SubFromLen64 codegen test case to work as intended, we need
to fold c-(-(x-d)) into x+(c-d).
Still, some instances of LeadingZeros are not optimized into single
CLZ instructions right now (actually, the LeadingZeros micro-benchmarks
are currently still compiled with redundant adds/subs of 64, due to
interference of loop optimizations before lowering), but perf numbers
indicate it's not that bad after all.
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 3.660n ± 0% 1.348n ± 0% -63.17% (p=0.000 n=20)
LeadingZeros8 1.777n ± 0% 1.767n ± 0% -0.56% (p=0.000 n=20)
LeadingZeros16 2.816n ± 0% 1.770n ± 0% -37.14% (p=0.000 n=20)
LeadingZeros32 5.293n ± 1% 1.683n ± 0% -68.21% (p=0.000 n=20)
LeadingZeros64 3.622n ± 0% 1.349n ± 0% -62.76% (p=0.000 n=20)
geomean 3.229n 1.571n -51.35%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 2.410n ± 0% 1.103n ± 1% -54.23% (p=0.000 n=20)
LeadingZeros8 1.236n ± 0% 1.501n ± 0% +21.44% (p=0.000 n=20)
LeadingZeros16 2.106n ± 0% 1.501n ± 0% -28.73% (p=0.000 n=20)
LeadingZeros32 2.860n ± 0% 1.324n ± 0% -53.72% (p=0.000 n=20)
LeadingZeros64 2.6135n ± 0% 0.9509n ± 0% -63.62% (p=0.000 n=20)
geomean 2.159n 1.256n -41.81%
Updates #59120
This patch is a copy of CL 483356.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: Iee81a17f7da06d77a427e73dfcc016f2b15ae556
Reviewed-on: https://go-review.googlesource.com/c/go/+/624575
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
For #69635
Change-Id: Id5696dc9724c3b3afcd7b60a6994f98c5309eb0e
Reviewed-on: https://go-review.googlesource.com/c/go/+/621755
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Change-Id: I03ba70076d6dd3c0b9624d14699b7dd91a3c0e9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/618476
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
CL 621357 introduced new generic lowering rules which caused
several shift related codegen test failures.
Add new rules to fix the test regressions, and cleanup tests
which are changed but not regressed. Some CLRLSLDI tests are
removed as they are no test CLRLSLDI rules.
Fixes#70003
Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034
Reviewed-on: https://go-review.googlesource.com/c/go/+/622236
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The spill/restore code around morestack is almost never exectued, so
we should make it as small as possible. Using 2-register loads/stores
makes sense here. Also, the offsets from SP are pretty small so the
offset almost always fits in the (smaller than a normal load/store)
offset field of the instruction.
Makes cmd/go 0.6% smaller.
Change-Id: I8845283c1b269a259498153924428f6173bda293
Reviewed-on: https://go-review.googlesource.com/c/go/+/621556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
If the expression type is a single compile-time known type, use that
type instead of the dynamic one, so the later passes of the compiler
could skip un-necessary runtime calls.
Thanks Youlin Feng for writing the original test case.
Change-Id: I3f65ab90f041474a9731338a82136c1d394c1773
Reviewed-on: https://go-review.googlesource.com/c/go/+/616975
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Change-Id: I7d996b8d46fbeef933943f806052a30f1f8d50c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/588836
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Tim King <taking@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
This CL optimizes the compilation of string-to-bytes conversion in the
case of string additions.
Fixes#62407
Change-Id: Ic47df758478e5d061880620025c4ec7dbbff8a64
Reviewed-on: https://go-review.googlesource.com/c/go/+/527935
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Tim King <taking@google.com>
When GORISCV64 enables rva22u64, combined shift and addition using the
SH1ADD, SH2ADD and SH3ADD instructions that are available via the Zba
extension. This results in more than 2000 instructions being removed
from the Go binary on riscv64.
Change-Id: Ia62ae7dda3d8083cff315113421bee73f518eea8
Reviewed-on: https://go-review.googlesource.com/c/go/+/606636
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
No point in keeping values in registers when their next use is after
a call, as we'd have to spill/restore them anyway.
cmd/go is 0.1% smaller.
Fixes#59297
Change-Id: I10ee761d0d23229f57de278f734c44d6a8dccd6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/509255
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This can be done efficiently with few instructions.
This also adds MULHDUCC for further codegen improvement.
Change-Id: I06320ba4383a679341b911a237a360ef07b19168
Reviewed-on: https://go-review.googlesource.com/c/go/+/605975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
After newobject, we don't need to write zeroes to initialize the
object. It has already been zeroed by the allocator.
This is already handled in most cases, but because we run builtin
decomposition after the opt pass, we don't handle cases where the zero
of a compound builtin is being written. Improve the zero detector to
handle those cases.
Fixes#68845
Change-Id: If3dde2e304a05e5a6a6723565191d5444b334bcc
Reviewed-on: https://go-review.googlesource.com/c/go/+/605255
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Get rid of TODO in prove pass.
We currently avoid marking shifts of constants as bounded, where
bounded means we don't have to worry about <0 or >=bitwidth shifts.
We do this because it causes different rule applications during lowering
which cause some codegen tests to fail.
Add some new rules which ensure that we get the right final instruction
sequence regardless of the ordering. Then we can remove this special case.
Change-Id: I4e962d4f09992b42ab47e123de5ded3b8b8fb205
Reviewed-on: https://go-review.googlesource.com/c/go/+/602935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
We don't need noLimit checks in a bunch of places.
Also simplify folding of provable constant results.
At this point in the CL stack, compilebench reports no performance
changes. The only thing of note is that binaries got a bit smaller.
name old text-bytes new text-bytes delta
HelloSize 960kB ± 0% 952kB ± 0% -0.83% (p=0.000 n=10+10)
CmdGoSize 12.3MB ± 0% 12.1MB ± 0% -1.53% (p=0.000 n=10+10)
Change-Id: Id4be75eec0f8c93f2f3b93a8521ce2278ee2ee2c
Reviewed-on: https://go-review.googlesource.com/c/go/+/599197
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Handles a lot more cases where constant ranges can eliminate
various (mostly bounds failure) paths.
Fixes#66826Fixes#66692Fixes#48213
Update #57959
TODO: remove constant logic from poset code, no longer needed.
Change-Id: Id196436fcd8a0c84c7d59c04f93bd92e26a0fd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/599096
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Temporary measure to reduce the required MVP code.
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest-swissmap
Change-Id: I44dc8acd0dc8280c6beb40451998e84bc85c238a
Reviewed-on: https://go-review.googlesource.com/c/go/+/580915
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
x86 is better at storing constant ints than constant floats.
(It uses a constant directly in the instruction stream, instead of
loading it from a constant global memory.)
Noticed as part of #67957
Change-Id: I9b7b586ad8e0fe9ce245324f020e9526f82b209d
Reviewed-on: https://go-review.googlesource.com/c/go/+/592596
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
RLIWNM does not clear the upper 32 bits of the target register if
the mask wraps around (e.g 0xF000000F). Don't elide MOVWZreg for
such masks. All other usage clears the upper 32 bits.
Fixes#67844.
Change-Id: I11b89f1da9ae077624369bfe2bf25e9b7c9b79bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/590896
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change-Id: I51e9832317e5dee1e3fe0772e7592b3dae95a625
Reviewed-on: https://go-review.googlesource.com/c/go/+/586797
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The rotate value was not correctly converted from a 64 bit to 32
bit rotate. This caused a miscompile of
golang.org/x/text/unicode/runenames.Names.
Fixes#67526
Change-Id: Ief56fbab27ccc71cd4c01117909bfee7f60a2ea1
Reviewed-on: https://go-review.googlesource.com/c/go/+/586915
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Avoid creating duplicate usages of ANDCCconst. This is preparation for
a patch to reintroduce ANDconst to simplify the lower pass while
treating ANDCCconst like other *CC* ssa opcodes.
Also, move many of the similar rules wich retarget ANDCCconst users
to the flag result to a common rule for all compares against zero.
Change-Id: Ida86efe17ff413cb82c349d8ef69d2899361f4c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/585400
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Investigating binaries, these patterns seem to show up frequently.
Change-Id: I987251e4070e35c25e98da321e444ccaa1526912
Reviewed-on: https://go-review.googlesource.com/c/go/+/583302
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
The results of cmpstring are reuseable if the second call has the
same arguments and memory.
Note that this gets rid of cmpstring, but we still generate a
redundant </<= test and branch afterwards, because the compiler
doesn't know that cmpstring only ever returns -1,0,1.
Update #61725
Change-Id: I93a0d1ccca50d90b1e1a888240ffb75a3b10b59b
Reviewed-on: https://go-review.googlesource.com/c/go/+/578835
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Any normal float32 constant can be generated by this instruction;
use xxspltidp when possible. This prefixed instruction is much
faster than the two instruction load sequence from the
float32/float64 constant pool.
Change-Id: Id751d9ffdae71463adbde66427b986f0b2ef74c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/575555
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
For #66585
Change-Id: Iddc407e3ef4c3b6ecf5173963b66b3e65e43c92d
Reviewed-on: https://go-review.googlesource.com/c/go/+/575336
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
This enables efficient use of the builtin min/max function
for float64 and float32 types on GOPPC64 >= power9.
Extend the assembler to support xsminjdp/xsmaxjdp and use
them to implement float min/max.
Simplify the VSX xx3 opcode rules to allow FPR arguments,
if all arguments are an FPR.
Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328
Reviewed-on: https://go-review.googlesource.com/c/go/+/574535
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Remove dynamic checks for atomic instructions for ARM64 targets that support LSE extension.
For #66131
Change-Id: I0ec1b183a3f4ea4c8a537430646e6bc4b4f64271
Reviewed-on: https://go-review.googlesource.com/c/go/+/569536
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com>
Reviewed-by: Shu-Chun Weng <scw@google.com>
Constant bools are like constant 1-byte values, they memcombine just fine.
(There are still trickier cases that this pass doesn't catch
yet, see TODO at memcombine.go:503.)
Fixes#66413
Change-Id: Ia67cf72ed1c416e27ac22da443bd88a3f09a6cc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/573416
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
This usage shows up in quite a few places, and helps reduce
register pressure in several complex cryto functions by
removing a MOVD $0,... instruction.
Change-Id: I9444ea8f9d19bfd68fb71ea8dc34e109681b3802
Reviewed-on: https://go-review.googlesource.com/c/go/+/571055
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
For GOPPC64 < 10 targets, most large 32 bit constants (those
exceeding int16 capacity) can be added using two instructions
instead of 3.
This cannot be done for values greater than 0x7FFF7FFF, so this
must be done during asm preprocessing as the optab matching
rules cannot differentiate this special case.
Likewise, constants 0x8000 <= x < 0x10000 are not converted. The
assembler currently generates 2 instructions sequences for these
constants.
Change-Id: I1ccc839c6c28fc32f15d286b2e52e2d22a2a06d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/568116
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Provide and use rotation pseudo-instructions for riscv64. The RISC-V bitmanip
extension adds support for hardware rotation instructions in the form of ROL,
ROLW, ROR, RORI, RORIW and RORW. These are easily implemented in the assembler
as pseudo-instructions for CPUs that do not support the bitmanip extension.
This approach provides a number of advantages, including reducing the rewrite
rules needed in the compiler, simplifying codegen tests and most importantly,
allowing these instructions to be used in assembly (for example, riscv64
optimised versions of SHA-256 and SHA-512). When bitmanip support is added,
these instruction sequences can simply be replaced with a single instruction
if permitted by the GORISCV64 profile.
Change-Id: Ia23402e1a82f211ac760690deb063386056ae1fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/565015
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: M Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Currently we use pointer equality on types when deciding whether we can
reuse a stack slot. That's too strict, as we don't guarantee pointer
equality for the same type. In particular, it can vary based on whether
PtrTo has been called in the frontend or not.
Instead, use the type's LinkString, which is guaranteed to both be
unique for a type, and to not vary given two different type structures
describing the same type.
Update #65783
Change-Id: I64f55138475f04bfa30cfb819b786b7cc06aebe4
Reviewed-on: https://go-review.googlesource.com/c/go/+/565436
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Enable canRotate for riscv64, enable rotation intrinsics and provide
better rewrite implementations for rotations. By avoiding Lsh*x64
and Rsh*Ux64 we can produce better code, especially for 32 and 64
bit rotations. By enabling canRotate we also benefit from the generic
rotation rewrite rules.
Benchmark on a StarFive VisionFive 2:
│ rotate.1 │ rotate.2 │
│ sec/op │ sec/op vs base │
RotateLeft-4 14.700n ± 0% 8.016n ± 0% -45.47% (p=0.000 n=10)
RotateLeft8-4 14.70n ± 0% 10.69n ± 0% -27.28% (p=0.000 n=10)
RotateLeft16-4 14.70n ± 0% 12.02n ± 0% -18.23% (p=0.000 n=10)
RotateLeft32-4 13.360n ± 0% 8.016n ± 0% -40.00% (p=0.000 n=10)
RotateLeft64-4 13.360n ± 0% 8.016n ± 0% -40.00% (p=0.000 n=10)
geomean 14.15n 9.208n -34.92%
Change-Id: I1a2036fdc57cf88ebb6617eb8d92e1d187e183b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/560315
Reviewed-by: M Zhuo <mengzhuo1203@gmail.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
As CL 514596 and CL 514775 adds hardware implement of float
max/min, we should add codegen test for these two CL.
Change-Id: I347331032fe9f67a2e6fdb5d3cfe20203296b81c
Reviewed-on: https://go-review.googlesource.com/c/go/+/561295
Reviewed-by: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: M Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>