ISEL is roughly equivalent to CMOV on PPC64. Verify ISEL generation
in all reasonable cases.
Note "ISEL test x y z" is the same as "ISEL !test y x z". test is
always one of LT (0), GT (1), EQ (2), SO (3). Sometimes x and y are
swapped if GE/LE/NE is desired.
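For illustration (a sketch, not from this CL; the function name is
illustrative), a branchless min like this is expected to become ISEL
with the LT test on ppc64:

func min64(x, y int64) int64 {
	if x < y {
		return x
	}
	return y
}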
Change-Id: Ie1bf029224064e004d855099731fe5e8d05aa990
Reviewed-on: https://go-review.googlesource.com/c/go/+/445215
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
This needs to be as low as possible while not breaking the priority
assumptions of other scores, in order to correctly schedule carry chains.
Prior to the arm64 changes, it was set below ReadTuple. At the time,
this prevented the MulHiLo implementation on PPC64 from occluding
the scheduling of a full carry chain.
Memory scores can also prevent better scheduling, as can be observed
with crypto/internal/edwards25519/field.feMulGeneric.
Fixes #56497
Change-Id: Ia4b54e6dffcce584faf46b1b8d7cea18a3913887
Reviewed-on: https://go-review.googlesource.com/c/go/+/447435
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
The recently added rule only works before decomposing slices.
Add a rule that works after decomposing slices.
The reason we need the latter is that although the length may
be a constant, it can be hidden inside a slice that is not constant
(its pointer or capacity might be changing). By applying this
optimization after decomposing slices, we can find more cases
where it applies.
Fixes #56440
Change-Id: I0094e59eee3065ab4d210defdda8227a6e897420
Reviewed-on: https://go-review.googlesource.com/c/go/+/446277
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Fixes a performance regression due to CL 418554.
Fixes #56440
Change-Id: I6ff152e9b83084756363f49ee6b0844a7a284880
Reviewed-on: https://go-review.googlesource.com/c/go/+/445875
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
For example:
movb a0, a0
srai $1, a0, a0
the assembler will expand this to:
slli $56, a0, a0
srai $56, a0, a0
srai $1, a0, a0
this CL optimizes this to:
slli $56, a0, a0
srai $57, a0, a0
This removes 270+ instructions from the Go binary on linux/riscv64.
Change-Id: I375e19f9d3bd54f2781791d8cbe5970191297dc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/428496
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Partial overlaps can only happen for strict sub-pieces of larger arrays.
That's a much stronger condition than the current optimization rules.
Updates #54467
Change-Id: I11e539b71099e50175f37ee78fddf69283f83ee5
Reviewed-on: https://go-review.googlesource.com/c/go/+/433056
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This CL adds some optimization rules:
1. Converts CMP to CMN, or vice versa, when comparing with a negative
number.
2. For equal and not equal comparisons, CMP can be converted to CMN in
some cases. In theory we could do the same optimization for LT, LE, GT
and GE, but we would need to account for overflow, so this CL doesn't
handle them.
There are no noticeable performance changes.
Change-Id: Ia49266c019ab7908ebc9510c2f02e121b1607869
Reviewed-on: https://go-review.googlesource.com/c/go/+/429795
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
When SLTI/SLTIU is used with ANDI/ORI, it may be possible to determine the
outcome based on the values of the immediates. Resolve these cases.
Improves code generation for various shift operations.
While here, sort tests by architecture to improve readability and ease
future maintenance.
Change-Id: I87e71e016a0e396a928e7d6389a2df61583dfd8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/428217
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jenny Rakoczy <jenny@golang.org>
Reviewed-by: Jenny Rakoczy <jenny@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Jenny Rakoczy <jenny@golang.org>
This CL adds shiftIsBounded checks for the Lsh* and Rsh* rules in arm64.
There is no need to check the shift value again with CMP + CSEL when the
shift value is valid.
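For example (an illustrative sketch, not from this CL): prove can mark
the shift below as bounded, so the lowered code needs no CMP + CSEL
guard:

func lsh(x uint64, s uint) uint64 {
	if s < 64 {
		return x << s
	}
	return 0
}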
Change-Id: I54620de64f02a1b5a11089add237248ae2de01b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/417714
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
In go.dev/cl/421821, I included a hack to force OCONVNOP back to
OCONVIFACE for conversions involving shape types and non-empty
interfaces. The comment correctly noted that this was only needed for
conversions between non-identical types, but the code was conservative
and applied to even conversions between identical types.
This CL adds an extra bool to record whether the conversion is between
identical types, so we can keep OCONVNOP instead of forcing back to
OCONVIFACE. This has a small improvement to generated code, because we
no longer need a convI2I call (as demonstrated by codegen/ifaces.go).
But more usefully, this is relevant to pruning unnecessary itab slots
in runtime dictionaries (next CL).
Change-Id: I94f89e961cd26629b925037fea58d283140766ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/427678
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This CL changes the heuristic used to determine whether we can inline a
struct equality check or if we must generate a function and call that
function for equality.
The old method was to count struct fields, but this can lead to poor
inlining decisions. We should really be determining the cost of the
equality check and using that to determine if we should inline or
generate a function.
The new benchmark provided in this CL returns the following when compared
against tip:
```
name old time/op new time/op delta
EqStruct-32 2.46ns ± 4% 0.25ns ±10% -89.72% (p=0.000 n=39+39)
```
Fixes #38494
Change-Id: Ie06b80a2b2a03a3fd0978bcaf7715f9afb66e0ab
GitHub-Last-Rev: e9a18d9389
GitHub-Pull-Request: golang/go#53326
Reviewed-on: https://go-review.googlesource.com/c/go/+/411674
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
This CL optimizes RotateLeft8/16 on arm64.
For 16 bits, we form a 32-bit register by duplicating the 16-bit value
into both halves, then use the RORW instruction to do the rotate shift.
For 8 bits, we just use LSR and LSL instead of RORW because the code is
simpler.
Benchmark Old ThisCL delta
RotateLeft8-46 2.16 ns/op 1.73 ns/op -19.70%
RotateLeft16-46 2.16 ns/op 1.54 ns/op -28.53%
Change-Id: I09cde4383d12e31876a57f8cdfd3bb4f324fadb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/420976
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
For the following code:
var x uint64
x >> (shift & 63)
we can directly generate `x >> shift` on arm64, since the hardware will
only use the bottom 6 bits of the shift amount.
Benchmark old time/op new time/op delta
ShiftArithmeticRight-8 0.40ns 0.31ns -21.7%
Change-Id: Id58c8a5b2f6dd5c30c3876f4a36e11b4d81e2dc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/425294
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Rotating by c, then by d, is the same as rotating by c+d.
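For example (illustrative; names are mine): two constant rotates
collapse into one, so this compiles the same as RotateLeft32(x, 12):

import "math/bits"

func rot(x uint32) uint32 {
	return bits.RotateLeft32(bits.RotateLeft32(x, 4), 8)
}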
Change-Id: I36df82261460ff80f7c6d39bcdf0e840cef1c91a
Reviewed-on: https://go-review.googlesource.com/c/go/+/424894
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Currently we use a full cmpstring to do the comparison for each
split in the binary search for a string switch.
Instead, split by comparing a single byte of the input string with a
constant. That will give us a much faster split (although it might not
be quite as good a split).
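A sketch of the kind of switch this affects (names are illustrative);
the binary search can now split on a single byte, e.g. s[0] compared
against a constant, instead of calling cmpstring at each split:

func kind(s string) int {
	switch s {
	case "alpha":
		return 1
	case "bravo":
		return 2
	case "delta":
		return 3
	case "echo":
		return 4
	}
	return 0
}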
Fixes #53333
R=go1.20
Change-Id: I28c7209342314f367071e4aa1f2beb6ec9ff7123
Reviewed-on: https://go-review.googlesource.com/c/go/+/414894
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
The prove pass will mark some shifts bounded, and then we can use that
information to generate better code on riscv64.
Change-Id: Ia22f43d0598453c9417adac7017db28d7240948b
Reviewed-on: https://go-review.googlesource.com/c/go/+/422616
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This is a follow-up to CL 425101, for RISCV64.
According to RISCV Volume 1, Unprivileged Spec v. 20191213 Chapter 7.1:
If both the high and low bits of the same product are required, then the
recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2
(source register specifiers must be in same order and rdh cannot be the
same as rs1 or rs2). Microarchitectures can then fuse these into a single
multiply operation instead of performing two separate multiplies.
So we should not split Muluhilo into separate instructions.
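For example (illustrative): when both halves of the product are used,
keeping MULHU and MUL adjacent allows the fusion described above:

import "math/bits"

func mul128(a, b uint64) (hi, lo uint64) {
	return bits.Mul64(a, b)
}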
Updates #54607
Change-Id: If47461f3aaaf00e27cd583a9990e144fb8bcdb17
Reviewed-on: https://go-review.googlesource.com/c/go/+/425203
Auto-Submit: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Detect rotate instructions while still in architecture-independent form.
It's easier to do here, and we don't need to repeat it in each
architecture file.
Change-Id: I9396954b3f3b3bfb96c160d064a02002309935bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/421195
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com>
Run-TryBot: Keith Randall <khr@golang.org>
Previously we converted $0 to the ZR register in some cases, which causes
two problems:
1. Confusion: the special case of the ZR register needs to be considered
when dealing with constants. For encoding, in some places we encode ZR, and
in some places we encode $0, even though we have converted $0 to ZR.
2. Unexpected instruction formats: in any instruction that supports a ZR
register operand, the ZR can also be written as $0.
This patch removes this conversion. Note that this patch may cause previously
unintendedly supported instruction formats to no longer be supported.
Change-Id: I3d8d2c06711b7614a38191397da7776417f1861c
Reviewed-on: https://go-review.googlesource.com/c/go/+/404316
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
On ARM64 we use two separate instructions to compute the hi and lo
results of a 64x64->128 multiplication. Lower to two separate ops
so if only one result is needed we can deadcode the other.
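For example (illustrative): only the high half is used here, so after
lowering the op computing the low half can be dead-code eliminated:

import "math/bits"

func mulhi(a, b uint64) uint64 {
	hi, _ := bits.Mul64(a, b)
	return hi
}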
Fixes #54607.
Change-Id: Ib023e77eb2b2b0bcf467b45471cb8a294bce6f90
Reviewed-on: https://go-review.googlesource.com/c/go/+/425101
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
With the introduction of stack objects, VARKILL information is
no longer needed.
With stack objects, an object is dead when there are no more static
references to it, and the stack scanner can't find any live pointers
to it. VARKILL information isn't used to establish live ranges for
address-taken variables any more. In effect, the last static reference
*is* the VARKILL, and there's an additional dynamic liveness check
during stack scanning.
Next CL will actually rip out the VARKILL opcodes.
Change-Id: I030a2ab867445cf4e0e69397911f8a2e2f0ed07b
Reviewed-on: https://go-review.googlesource.com/c/go/+/419234
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Updated multiple tests in test/codegen: math.go, mathbits.go, shift.go
and slices.go to verify on ppc64/ppc64le as well
Change-Id: Id88dd41569b7097819fb4d451b615f69cf7f7a94
Reviewed-on: https://go-review.googlesource.com/c/go/+/412115
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
For signed comparisons, the following four optimization rules hold:
(CMPconst [0] z:(AND x y)) && z.Uses == 1 => (TST x y)
(CMPWconst [0] z:(AND x y)) && z.Uses == 1 => (TSTW x y)
(CMPconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTconst [c] y)
(CMPWconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTWconst [int32(c)] y)
But currently they only apply to jump instructions, not to conditional
instructions within a block, such as cset, csel, etc. This CL extends
the above rules into blocks so that conditional instructions can also be
optimized.
name old time/op new time/op delta
DivisiblePow2constI64-160 1.04ns ± 0% 0.86ns ± 0% -17.30% (p=0.008 n=5+5)
DivisiblePow2constI32-160 1.04ns ± 0% 0.87ns ± 0% -16.16% (p=0.016 n=4+5)
DivisiblePow2constI16-160 1.04ns ± 0% 0.87ns ± 0% -16.03% (p=0.008 n=5+5)
DivisiblePow2constI8-160 1.04ns ± 0% 0.86ns ± 0% -17.15% (p=0.008 n=5+5)
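A sketch in the spirit of those benchmarks (function name is
illustrative): the AND feeding the zero comparison can now lower to
TSTconst followed by CSET instead of a separate AND and compare:

func divisiblePow2(x int64) bool {
	return x&7 == 0
}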
Change-Id: I6bc34bff30862210e8dd001e0340b8fe502fe3de
Reviewed-on: https://go-review.googlesource.com/c/go/+/420434
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
As it can't appear in user package paths.
There is a hack for handling "go:buildid" and "type:*" on windows/386.
Previously, windows/386 requires underscore prefix on external symbols,
but that's only applied for SHOSTOBJ/SUNDEFEXT or cgo export symbols.
"go.buildid" is STEXT, "type.*" is STYPE, thus they are not prefixed
with underscore.
In external linking mode, the external linker can't resolve them as
external symbols. But we are lucky that they have "." in their name,
so the external linker sees them as Forwarder RVA exports. See:
- https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#export-address-table
- https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/pe-dll.c;h=e7b82ba6ffadf74dc1b9ee71dc13d48336941e51;hb=HEAD#l972)
This CL changes "." to ":" in symbol names, so these symbols can no longer
be found by the external linker. So a hacky fix is to add the
underscore prefix for these 2 symbols. I don't have enough knowledge to
verify whether adding the underscore for all STEXT/STYPE symbols is
fine; even if it were, that would be done in a future CL.
Fixes #37762
Change-Id: I92eaaf24c0820926a36e0530fdb07b07af1fcc35
Reviewed-on: https://go-review.googlesource.com/c/go/+/317917
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently there are both an ANDconst and an ANDCCconst op in PPC64,
which is confusing since they map onto the same instruction.
One of these ops sets the result of the AND operation, and the
other sets the flag (condition register).
This converts ANDCCconst into an op with the 2 expected results:
the integer result of the AND and the flag setting. The ANDconst
op has been removed.
Note that in the PPC64 ISA the only variation of the 'and immediate'
is the one that sets the condition bit, which probably led to the
original (confusing) implementation.
This also adds a few rules to improve the use of ANDCCconst with
ISELB and some testcases to verify those improvements.
Change-Id: I523703fa4da2098eb995dc3ba744d36fa28e41d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/422015
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
We don't need a multiply when the element type is size 0 or 1.
The panic functions don't return, so we don't need any post-call
code (register restores, etc.).
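A minimal sketch of the first point (assuming a 1-byte element type;
names are illustrative): the address of s[i] is just the base pointer
plus i, with no multiply by the element size:

func idx(s []byte, i int) byte {
	return s[i]
}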
Change-Id: I0dcea5df56d29d7be26554ddca966b3903c672e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/419754
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Since CL 391014, cmd/compile now requires the -p flag to be set by the
build system. This CL changes it to initialize LocalPkg.Path to the
provided path, rather than relying on writing out `"".` into object
files and expecting cmd/link to substitute them.
However, this actually involved a rather long tail of fixes. Many have
already been submitted, but a few notable ones that have to land
simultaneously with changing LocalPkg:
1. When compiling package runtime, there are really two "runtime"
packages: types.LocalPkg (the source package itself) and
ir.Pkgs.Runtime (the compiler's internal representation, for synthetic
references). Previously, these ended up creating separate link
symbols (`"".xxx` and `runtime.xxx`, respectively), but now they both
end up as `runtime.xxx`, which causes lsym collisions (notably
inittask and funcsyms).
2. test/codegen tests need to be updated to expect symbols to be named
`command-line-arguments.xxx` rather than `"".foo`.
3. The issue20014 test case is sensitive to the sort order of field
tracking symbols. In particular, the local package now sorts to its
natural place in the list, rather than to the front.
Thanks to David Chase for helping track down all of the fixes needed
for this CL.
Updates #51734.
Change-Id: Iba3041cf7ad967d18c6e17922fa06ba11798b565
Reviewed-on: https://go-review.googlesource.com/c/go/+/393715
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
math/bits.Add64 and math/bits.Sub64 now lower and optimize
directly in SSA form.
The optimization of carry chains focuses on eliding
XER<->GPR transfers of the CA bit when it is used exclusively as an
input to a single carry operation, or when the CA value is
known.
This also adds support for handling XER spills in the assembler
which could happen if carry chains contain inter-dependencies
on each other (which seems very unlikely with practical usage),
or a clobber happens (SRAW/SRAD/SUBFC operations clobber CA).
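A sketch of such a carry chain (names are illustrative): each Add64
consumes the carry produced by the previous one, so the CA bit can now
stay in XER across the whole chain:

import "math/bits"

func add256(a, b [4]uint64) (s [4]uint64) {
	var c uint64
	s[0], c = bits.Add64(a[0], b[0], 0)
	s[1], c = bits.Add64(a[1], b[1], c)
	s[2], c = bits.Add64(a[2], b[2], c)
	s[3], _ = bits.Add64(a[3], b[3], c)
	return
}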
With PPC64 Add64/Sub64 lowering into SSA and this patch, the net
performance difference in crypto/elliptic benchmarks on P9/ppc64le
is:
name old time/op new time/op delta
ScalarBaseMult/P256 46.3µs ± 0% 46.9µs ± 0% +1.34%
ScalarBaseMult/P224 356µs ± 0% 209µs ± 0% -41.14%
ScalarBaseMult/P384 1.20ms ± 0% 0.57ms ± 0% -52.14%
ScalarBaseMult/P521 3.38ms ± 0% 1.44ms ± 0% -57.27%
ScalarMult/P256 199µs ± 0% 199µs ± 0% -0.17%
ScalarMult/P224 357µs ± 0% 212µs ± 0% -40.56%
ScalarMult/P384 1.20ms ± 0% 0.58ms ± 0% -51.86%
ScalarMult/P521 3.37ms ± 0% 1.44ms ± 0% -57.32%
MarshalUnmarshal/P256/Uncompressed 2.59µs ± 0% 2.52µs ± 0% -2.63%
MarshalUnmarshal/P256/Compressed 2.58µs ± 0% 2.52µs ± 0% -2.06%
MarshalUnmarshal/P224/Uncompressed 1.54µs ± 0% 1.40µs ± 0% -9.42%
MarshalUnmarshal/P224/Compressed 1.54µs ± 0% 1.39µs ± 0% -9.87%
MarshalUnmarshal/P384/Uncompressed 2.40µs ± 0% 1.80µs ± 0% -24.93%
MarshalUnmarshal/P384/Compressed 2.35µs ± 0% 1.81µs ± 0% -23.03%
MarshalUnmarshal/P521/Uncompressed 3.79µs ± 0% 2.58µs ± 0% -31.81%
MarshalUnmarshal/P521/Compressed 3.80µs ± 0% 2.60µs ± 0% -31.67%
Note: P256 uses an asm implementation, thus little variation is expected.
Change-Id: I88a24f6bf0f4f285c649e40243b1ab69cc452b71
Reviewed-on: https://go-review.googlesource.com/c/go/+/346870
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
It is hit ~70k times while building go.
This makes the go binary 0.04% smaller.
I didn't include benchmarks because this is just constant folding
and is hard to measure objectively.
For example, this enable rewriting things like:
if x == 20 {
return x + 30 + z
}
Into:
if x == 20 {
return 50 + z
}
It's not just fixing programmers' code;
the SSA generator generates code like this sometimes.
Change-Id: I0861f342b27f7227b5f1c34d8267fa0057b1bbbc
GitHub-Last-Rev: 4c2f9b5216
GitHub-Pull-Request: golang/go#52669
Reviewed-on: https://go-review.googlesource.com/c/go/+/403735
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This shows up in a few crypto functions, and other
assorted places.
Change-Id: I5a7f4c25ddd4a6499dc295ef693b9fe43d2448ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/404057
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
In the load tests, we only want to test the assembly produced by
the load operations. If we use the global variable sink, it will produce
one load operation and one store operation (the assignment to sink).
For example:
func load_be64(b []byte) {
	sink64 = binary.BigEndian.Uint64(b)
}
If we compile this function with GOAMD64=v3, it may produce MOVBEQload
and MOVQstore or MOVQload and MOVBEQstore, but we only want MOVBEQload.
Discovered when developing CL 395474.
Same for the store tests.
Change-Id: I65c3c742f1eff657c3a0d2dd103f51140ae8079e
Reviewed-on: https://go-review.googlesource.com/c/go/+/397875
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Cherry Mui <cherryyz@google.com>
The SHRX/SHLX instruction can take any general register as the shift count operand, and can read the source from memory. This CL introduces some operators to combine the load and shift into one instruction.
For #47120
Change-Id: I13b48f53c7d30067a72eb2c8382242045dead36a
Reviewed-on: https://go-review.googlesource.com/c/go/+/385174
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Cherry Mui <cherryyz@google.com>
LZCNT is similar to BSR, but BSR(x) is undefined when x == 0, so using
LZCNT can avoid a special case for zero input. Except for that case,
LZCNTQ(x) == 63-BSRQ(x) and LZCNTL(x) == 31-BSRL(x).
And according to https://www.agner.org/optimize/instruction_tables.pdf,
LZCNT instructions are much faster than BSR on AMD CPUs.
name old time/op new time/op delta
LeadingZeros-8 0.91ns ± 1% 0.80ns ± 7% -11.68% (p=0.000 n=9+9)
LeadingZeros8-8 0.98ns ±15% 0.91ns ± 1% -7.34% (p=0.000 n=9+9)
LeadingZeros16-8 0.94ns ± 3% 0.92ns ± 2% -2.36% (p=0.001 n=10+10)
LeadingZeros32-8 0.89ns ± 1% 0.78ns ± 2% -12.49% (p=0.000 n=10+10)
LeadingZeros64-8 0.92ns ± 1% 0.78ns ± 1% -14.48% (p=0.000 n=10+10)
Change-Id: I125147fe3d6994a4cfe558432780408e9a27557a
Reviewed-on: https://go-review.googlesource.com/c/go/+/396794
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This CL adds MOVBE support for the 16-bit version, but MOVBEWload is
excluded because it does not zero-extend the loaded value.
For #51724
Change-Id: I3fadf20bcbb9b423f6355e6a1e340107e8e621ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/396617
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Add a new rewrite rule to merge ANDconst and UBFX into
UBFX.
Add test cases.
Change-Id: I24d6442d0c956d7ce092c3a3858d4a3a41771670
Reviewed-on: https://go-review.googlesource.com/c/go/+/377054
Trust: Fannie Zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently we only include static entries in the hint for sizing
the map when allocating a map for a map literal. Change that to
include all entries.
This will be an overallocation if the dynamic entries in the map have
equal keys, but equal keys in map literals are rare, and at worst we
waste a bit of space.
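A hypothetical example (names are mine): the literal below now gets a
size hint of 4 (all entries) rather than 2 (just the static keys):

func mk(k1, k2 string) map[string]int {
	return map[string]int{"a": 1, "b": 2, k1: 3, k2: 4}
}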
Fixes #43020
Change-Id: I232f82f15316bdf4ea6d657d25a0b094b77884ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/383634
Run-TryBot: Keith Randall <khr@golang.org>
Trust: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Updated multiple tests in test/codegen/arithmetic.go to verify
on ppc64/ppc64le as well
Change-Id: I79ca9f87017ea31147a4ba16f5d42ba0fcae64e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/358546
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
In CL 354670, I copied some existing rules for convenience but forgot
to update the last rule, which broke `GOAMD64=v3 ./make.bat`.
Revive CL 354670
Change-Id: Ic1e2047c603f0122482a4b293ce1ef74d806c019
Reviewed-on: https://go-review.googlesource.com/c/go/+/356810
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
It was noticed through some other investigation that BitLen32
was not generating the best code; it turned out it wasn't recognized
as an intrinsic. This corrects that and enables the test for PPC64.
Change-Id: Iab496a8830c8552f507b7292649b1b660f3848b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/355872
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
The code generated when storing eight bytes loaded from memory in big
endian introduced two successive byte swaps that did not actually
modify the data.
The new rules match this specific pattern both for amd64 and for arm64,
eliminating the double swap.
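For example (illustrative): a big-endian load immediately stored
big-endian; the two byte swaps cancel:

import "encoding/binary"

func copy8(dst, src []byte) {
	binary.BigEndian.PutUint64(dst, binary.BigEndian.Uint64(src))
}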
Fixes #41684
Change-Id: Icb6dc20b68e4393cef4fe6a07b33aba0d18c3ff3
Reviewed-on: https://go-review.googlesource.com/c/go/+/320073
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
This CL adds late expanded memequal(x, const, sz) inlining for sizes of
2, 4, and 8 bytes. This PoC uses the same method as CL 248404.
This optimization fires about 100 times in the Go compiler (1675
occurrences reduced to 1574, so -6%).
Also, added unit tests to the codegen/comparisons.go file.
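A sketch of the 8-byte case (function name is illustrative): comparing
against an 8-byte constant string can now be an inline load and compare
rather than a call to memequal:

func isMagic(s string) bool {
	return s == "magic123"
}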
Updates #37275
Change-Id: Ia52808d573cb706d1da8166c5746ede26f46c5da
Reviewed-on: https://go-review.googlesource.com/c/go/+/328291
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Trust: David Chase <drchase@google.com>
On amd64 the compiler issues checks to test whether extensions are
available on the platform. With GOAMD64 microarchitecture levels
provided, some of these checks can be eliminated.
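One such case (an assumption for illustration): bits.OnesCount64 is
guarded by a POPCNT feature test at the baseline level; with
GOAMD64=v2 or higher that branch can be dropped:

import "math/bits"

func popcnt(x uint64) int {
	return bits.OnesCount64(x)
}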
Change-Id: If15c178bcae273b2ce7d3673415cb8849292e087
Reviewed-on: https://go-review.googlesource.com/c/go/+/352010
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
This updates the codegen tests in noextend.go so they are not
dependent on the ABI.
Change-Id: I8433bea9dc78830c143290a7e0cf901b2397d38a
Reviewed-on: https://go-review.googlesource.com/c/go/+/353070
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Add rule to PPC64.rules to inline runtime.memmove in more cases, as is
done for other target architectures
Updated tests in codegen/copy.go to verify changes are done on
ppc64/ppc64le
Updates #41662
Change-Id: Id937ce21f9b4f4047b3e66dfa3c960128ee16a2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/352054
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Instructions with immediates can be precomputed when operating on a
constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI
and ADDI. Additionally, optimise ANDI and ORI when the immediate is
all ones or all zeroes.
In particular, the RISCV64 logical left and right shift rules
(Lsh*x*/Rsh*Ux*) produce sequences that check if the shift amount
exceeds 64 and if so returns zero. When the shift amount is a
constant we can precompute and eliminate the filter entirely.
Likewise the arithmetic right shift rules produce sequences that
check if the shift amount exceeds 64 and if so, ensures that the
lower six bits of the shift are all ones. When the shift amount
is a constant we can precompute the shift value.
Arithmetic right shift sequences like:
117fc: 00100513 li a0,1
11800: 04053593 sltiu a1,a0,64
11804: fff58593 addi a1,a1,-1
11808: 0015e593 ori a1,a1,1
1180c: 40b45433 sra s0,s0,a1
Are now a single srai instruction:
117fc: 40145413 srai s0,s0,0x1
Likewise for logical left shift (and logical right shift):
1d560: 01100413 li s0,17
1d564: 04043413 sltiu s0,s0,64
1d568: 40800433 neg s0,s0
1d56c: 01131493 slli s1,t1,0x11
1d570: 0084f433 and s0,s1,s0
Which are now a single slli (or srli) instruction:
1d120: 01131413 slli s0,t1,0x11
This removes more than 30,000 instructions from the Go binary and
should improve performance in a variety of areas - of note
runtime.makemap_small drops from 48 to 36 instructions. Similar
gains exist at least in other parts of runtime and math/bits.
Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/350689
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Add tests for shift by constant, masked shifts and bounded shifts. While here,
sort tests by architecture and keep order of tests consistent (lsh, rshU, rsh).
Change-Id: I512d64196f34df9cb2884e8c0f6adcf9dd88b0fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/351289
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
BMI1 includes four instructions (ANDN, BLSI, BLSMSK, BLSR) that are
easy to peephole optimize, and which GCC always seems to favor using
when available and applicable.
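The four patterns as they typically appear in Go source (a sketch;
variable names are illustrative):

func bmi1(x, y uint64) (uint64, uint64, uint64, uint64) {
	andn := x &^ y        // ANDN: and-not
	blsi := x & -x        // BLSI: isolate lowest set bit
	blsmsk := x ^ (x - 1) // BLSMSK: mask through lowest set bit
	blsr := x & (x - 1)   // BLSR: clear lowest set bit
	return andn, blsi, blsmsk, blsr
}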
Updates #45453.
Change-Id: I0274184057058f5c579e5bc3ea9c414396d3cf46
Reviewed-on: https://go-review.googlesource.com/c/go/+/351130
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Trust: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Also, add the FABSS and FABSD pseudo instructions to the assembler.
The compiler could use FSGNJX[SD] directly but there doesn't seem
to be much advantage to doing so and the pseudo instructions are
easier to understand.
Change-Id: Ie8825b8aa8773c69cc4f07a32ef04abf4061d80d
Reviewed-on: https://go-review.googlesource.com/c/go/+/348989
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
In some rewrite rules for arm64 bitfield optimizations, the
bitfield lsb value and the bitfield width value are related
to datasize. Some of them use datasize directly to check that the
bitfield lsb value is valid and to get the bitfield width value,
but others call the isARM64BFMask() and arm64BFWidth()
functions. In order to be consistent, this patch changes them
all to use datasize.
Besides, this patch sorts the codegen test cases.
Run the "toolstash-check -all" command and find one inconsistent code
is as the following.
new: src/math/fma.go:104 BEQ 247
master: src/math/fma.go:104 BEQ 248
The above inconsistency is due to this patch changing the range of the
bitfield lsb value in the "UBFIZ" optimization rules from "lc+(32|16|8)<64"
to "lc<64", so that the following code is now generated as "UBFIZ". The
logic of the changed code is still correct.
The code of src/math/fma.go:160:
const uvinf = 0x7FF0000000000000
func FMA(a, b uint32) float64 {
ps := a+b
return Float64frombits(uint64(ps)<<63 | uvinf)
}
The new assembly code:
TEXT "".FMA(SB), LEAF|NOFRAME|ABIInternal, $0-16
MOVWU "".a(FP), R0
MOVWU "".b+4(FP), R1
ADD R1, R0, R0
UBFIZ $63, R0, $1, R0
ORR $9218868437227405312, R0, R0
MOVD R0, "".~r2+8(FP)
RET (R30)
The master assembly code:
TEXT "".FMA(SB), LEAF|NOFRAME|ABIInternal, $0-16
MOVWU "".a(FP), R0
MOVWU "".b+4(FP), R1
ADD R1, R0, R0
MOVWU R0, R0
LSL $63, R0, R0
ORR $9218868437227405312, R0, R0
MOVD R0, "".~r2+8(FP)
RET (R30)
Change-Id: I9061104adfdfd3384d0525327ae1e5c8b0df5c35
Reviewed-on: https://go-review.googlesource.com/c/go/+/265038
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
The codegen tests are currently skipped (see #48247). The test
added in CL 346050 did not compile because it was in the main
package but did not contain a main function. Changing the package
to 'codegen' fixes the issue.
Updates #48247.
Change-Id: I0a0eaca8e6a7d7b335606d2c76a204ac0c12e6d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/348392
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
The codegen tests are currently skipped (see #48247) and the
bitfield tests do not actually compile due to a duplicate function
name (sbfiz5) added in CL 267602. Renaming the function fixes the
issue.
Updates #48247.
Change-Id: I626fd5ef13732dc358e73ace9ddcc4cbb6ae5b21
Reviewed-on: https://go-review.googlesource.com/c/go/+/348391
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This test is not executed by default (see #48247) and does not
actually pass. It was added in CL 346689. The code generation
changes made in that CL only change how instructions are assembled,
they do not actually affect the output of the compiler. This test
is unfortunately therefore invalid and will never pass.
Updates #48247.
Change-Id: I0c807e4a111336e5a097fe4e3af2805f9932a87f
Reviewed-on: https://go-review.googlesource.com/c/go/+/348390
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
The most common cases:
len(s) > 0
len(s) < 1
can be simplified to:
len(s) != 0
len(s) == 0
Fixes #48054
Change-Id: I16e5b0cffcfab62a4acc2a09977a6cd3543dd000
Reviewed-on: https://go-review.googlesource.com/c/go/+/346050
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This patch adds some rules to rewrite the "(LeftShift (SignExtend x) lc)"
expression as "SBFIZ".
Add the test cases.
Change-Id: I294c4ba09712eeb02c7a952447bb006780f1e60d
Reviewed-on: https://go-review.googlesource.com/c/go/+/267602
Trust: fannie zhang <Fannie.Zhang@arm.com>
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The UBFX and SBFX already zero/sign extend the result. Further
zero/sign extensions are thus unnecessary as long as they leave
the top bits unaltered. This patch absorbs zero/sign extensions
into UBFX/SBFX.
Add the related test cases.
Change-Id: I7c4516c8b52d677f77bf3aaedab87c4a28056ec0
Reviewed-on: https://go-review.googlesource.com/c/go/+/265039
Trust: fannie zhang <Fannie.Zhang@arm.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This CL simplifies riscv addition (add r, imm) to
(ADDI (ADDI r, imm/2), imm-imm/2) if imm is in a specific range
(-4096 <= imm <= -2049 or 2048 <= imm <= 4094), i.e. just outside
the signed 12-bit range of a single ADDI immediate.
There is little impact on the go1 benchmark, while the total
size of pkg/linux_riscv64 decreased by about 11KB.
Change-Id: I236eb8af3b83bb35ce9c0b318fc1d235e8ab9a4e
GitHub-Last-Rev: a2f56a0763
GitHub-Pull-Request: golang/go#48110
Reviewed-on: https://go-review.googlesource.com/c/go/+/346689
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Michael Munday <mike.munday@lowrisc.org>
Add support to the assembler for F[N]M{ADD,SUB}[SD] instructions.
Argument order is:
OP RS1, RS2, RS3, RD
Also, add support for the FMA intrinsic to the compiler. Automatic
FMA matching is left to a future CL.
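The intrinsic in use (illustrative): math.FMA computes x*y + z with a
single rounding, and with this CL it should compile to one FMADDD on
riscv64:

import "math"

func fma(x, y, z float64) float64 {
	return math.FMA(x, y, z)
}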
Change-Id: I47166c7393b2ab6bfc2e42aa8c1a8997c3a071b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/293030
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
For debugging.
Change-Id: I5875ccd2413b8ffd2ec97a0ace66b5cae7893b24
Reviewed-on: https://go-review.googlesource.com/c/go/+/324765
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
In codegen/arithmetic.go, previously there were MOVD's that matched
loads of arguments. With the register ABI there are no more such
loads. Remove the MOVD matches.
Change-Id: I920ee2629c8c04d454f13a0c08e283d3528d9a64
Reviewed-on: https://go-review.googlesource.com/c/go/+/324251
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
This CL adds runtime.memmove inlining for AMD64 and ARM64.
According to the ssa dump from the testcases, the generic rules can't
inline memmove properly because one of the arguments is a Phi operation.
But this Phi op will be optimized out by later optimization stages, so
as a result memmove can be inlined by the arch-specific rules.
The commit adds new optimization rules to the arch-specific rules that
can inline runtime.memmove where possible during the lowering stage.
The optimization fires 5 times in the Go source code using regabi.
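A sketch of a typical candidate (names are illustrative): a copy with
a small constant length lowers to runtime.memmove, which these rules
can now inline:

func copy16(dst, src []byte) {
	copy(dst[:16], src[:16])
}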
Fixes #41662
Change-Id: Iaffaf4c482d068b5f0683d141863892202cc8824
Reviewed-on: https://go-review.googlesource.com/c/go/+/289151
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: David Chase <drchase@google.com>
These operations (BT{S,R,C}{Q,L}modify) are quite a bit slower than
other ways of doing the same thing.
Without the BTxmodify operations, there are two fallback ways the compiler
performs these operations: AND/OR/XOR operations directly on memory, or
load-BTx-write sequences. The compiler kinda chooses one arbitrarily
depending on rewrite rule application order. Currently, it uses
load-BTx-write for the Const benchmarks and AND/OR/XOR directly to memory
for the non-Const benchmarks. TBD, someone might investigate which of
the two fallback strategies is really better. For now, they are both
better than BTx ops.
name old time/op new time/op delta
BitSet-8 1.09µs ± 2% 0.64µs ± 5% -41.60% (p=0.000 n=9+10)
BitClear-8 1.15µs ± 3% 0.68µs ± 6% -41.00% (p=0.000 n=10+10)
BitToggle-8 1.18µs ± 4% 0.73µs ± 2% -38.36% (p=0.000 n=10+8)
BitSetConst-8 37.0ns ± 7% 25.8ns ± 2% -30.24% (p=0.000 n=10+10)
BitClearConst-8 30.7ns ± 2% 25.0ns ±12% -18.46% (p=0.000 n=10+10)
BitToggleConst-8 36.9ns ± 1% 23.8ns ± 3% -35.46% (p=0.000 n=9+10)
Fixes #45790
Updates #45242
Change-Id: Ie33a72dc139f261af82db15d446cd0855afb4e59
Reviewed-on: https://go-review.googlesource.com/c/go/+/318149
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ben Shi <powerman1st@163.com>
Currently, if we have AX=a and BX=b, and we want to make a call
F(1, a, b), to move arguments into the desired registers it emits
MOVQ AX, CX
MOVL $1, AX // AX=1
MOVQ BX, DX
MOVQ CX, BX // BX=a
MOVQ DX, CX // CX=b
This has a few redundant moves.
This is because we process inputs in order. First, allocate 1 to
AX, which kicks out a (in AX) to CX (a free register at the
moment). Then, allocate a to BX, which kicks out b (in BX) to DX.
Finally, put b to CX.
Notice that if we start with allocating CX=b, then BX=a, AX=1,
we will not have redundant moves. This CL reduces redundant moves
by allocating them in a different order: first, for inputs that are
already in place, keep them there. Then allocate free registers.
Then everything else.
before after
cmd/compile binary size 23703888 23609680
text size 8565899 8533291
(with regabiargs enabled.)
Change-Id: I69e1bdf745f2c90bb791f6d7c45b37384af1e874
Reviewed-on: https://go-review.googlesource.com/c/go/+/311371
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Fixes this failure:
go test cmd/compile/internal/ssa -run TestStmtLines -v
=== RUN TestStmtLines
stmtlines_test.go:115: Saw too many (amd64, > 1%) lines without
statement marks, total=88263, nostmt=1930
('-run TestStmtLines -v' lists failing lines)
The failure has two causes.
One is that the first-line adjuster in code generation was relocating
"first lines" to instructions that would either not have any code generated,
or would have the statement marker removed by a different believed-good heuristic.
The other was that statement boundaries were getting attached to register
values (which, with the old ABI, were loads from the stack, hence real
instructions). The register values disappear at code generation.
The fixes are to (1) note that certain instructions are not good choices for
"first value" and skip them, and (2) in an expandCalls post-pass, look for
register valued instructions and under appropriate conditions move their
statement marker to a compatible use.
Also updates TestStmtLines to always log the score, for easier comparison of
minor compiler changes.
Updates #40724.
Change-Id: I485573ce900e292d7c44574adb7629cdb4695c3f
Reviewed-on: https://go-review.googlesource.com/c/go/+/309649
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Some codegen tests were written with the assumption that
arguments and results are in memory, and with a specific stack
layout. With the register ABI, the assumption is no longer true.
Adjust the tests to work with both cases.
- For tests expecting in memory arguments/results, change to use
global variables or memory-assigned argument/results.
- Allow more registers. E.g. some tests expecting register names
contain only letters (e.g. AX), but it can also contain numbers
(e.g. R10).
- Some instruction selection changes when operating on registers vs.
memory, e.g. ADDQ vs. LEAQ, MOVB vs. MOVL. Accept both.
TODO: mathbits.go and memops.go still need fix.
Change-Id: Ic5932b4b5dd3f5d30ed078d296476b641420c4c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/309335
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Under certain circumstances, the existing rules for bit operations can
produce code that writes beyond its intended bounds. For example,
consider the following code:
func repro(b []byte, addr, bit int32) {
_ = b[3]
v := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24 | 1<<(bit&31)
b[0] = byte(v)
b[1] = byte(v >> 8)
b[2] = byte(v >> 16)
b[3] = byte(v >> 24)
}
Roughly speaking:
1. The expression `1 << (bit & 31)` is rewritten into `(SHLL 1 bit)`
2. The expression `uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 |
uint32(b[3])<<24` is rewritten into `(MOVLload &b[0])`
3. The statements `b[0] = byte(v) ... b[3] = byte(v >> 24)` are
rewritten into `(MOVLstore &b[0], v)`
4. `(ORL (SHLL 1, bit) (MOVLload &b[0]))` is rewritten into
`(BTSL (MOVLload &b[0]) bit)`. This is a valid transformation because
the destination is a register: in this case, the bit offset is masked
by the number of bits in the destination register. This is identical
to the masking performed by `SHL`.
5. `(MOVLstore &b[0] (BTSL (MOVLload &b[0]) bit))` is rewritten into
`(BTSLmodify &b[0] bit)`. This is an invalid transformation because
the destination is memory: in this case, the bit offset is not
masked, and the chosen instruction may write outside its intended
32-bit location.
These changes fix the invalid rewrite performed in step (5) by
explicitly masking the bit offset operand to `BT(S|R|C)(L|Q)modify`. In
the example above, the adjusted rules produce
`(BTSLmodify &b[0] (ANDLconst [31] bit))` in step (5).
These changes also add several new rules to rewrite bit sets, toggles,
and clears that are rooted at `(OR|XOR|AND)(L|Q)modify` operators into
appropriate `BT(S|R|C)(L|Q)modify` operators. These rules catch cases
where `MOV(L|Q)store ((OR|XOR|AND)(L|Q) ...)` is rewritten to
`(OR|XOR|AND)(L|Q)modify` before the `(OR|XOR|AND)(L|Q) ...` can be
rewritten to `BT(S|R|C)(L|Q) ...`.
Overall, compilecmp reports small improvements in code size on
darwin/amd64 when the changes to the compiler itself are excluded:
file before after Δ %
runtime.s 536464 536412 -52 -0.010%
bytes.s 32629 32593 -36 -0.110%
strings.s 44565 44529 -36 -0.081%
os/signal.s 7967 7959 -8 -0.100%
cmd/vendor/golang.org/x/sys/unix.s 81686 81678 -8 -0.010%
math/big.s 188235 188253 +18 +0.010%
cmd/link/internal/loader.s 89295 89056 -239 -0.268%
cmd/link/internal/ld.s 633551 633232 -319 -0.050%
cmd/link/internal/arm.s 18934 18928 -6 -0.032%
cmd/link/internal/arm64.s 31814 31801 -13 -0.041%
cmd/link/internal/riscv64.s 7347 7345 -2 -0.027%
cmd/compile/internal/ssa.s 4029173 4033066 +3893 +0.097%
total 21298280 21301472 +3192 +0.015%
Change-Id: I2e560548b515865129e1724e150e30540e9d29ce
GitHub-Last-Rev: 9a42bd29a5
GitHub-Pull-Request: golang/go#45242
Reviewed-on: https://go-review.googlesource.com/c/go/+/304869
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Optimize some patterns into the rev16/rev16w instructions.
Pattern1:
(c & 0xff00ff00)>>8 | (c & 0x00ff00ff)<<8
To:
rev16w c
Pattern2:
(c & 0xff00ff00ff00ff00)>>8 | (c & 0x00ff00ff00ff00ff)<<8
To:
rev16 c
This patch is a copy of CL 239637, contributed by Alice Xu (dianhong.xu@arm.com).
Change-Id: I96936c1db87618bc1903c04221c7e9b2779455b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/268377
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
When the -clobberdeadreg flag is set, the compiler inserts code that
clobbers integer registers at call sites. This may be helpful for
debugging register ABI.
Only implemented on AMD64 for now.
Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899
Reviewed-on: https://go-review.googlesource.com/c/go/+/302809
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This CL adds rewrite rules for CSETM, CSINC, CSINV, and CSNEG. By adding
these rules, we can save one instruction.
For example,
func test(cond bool, a int) int {
if cond {
a++
}
return a
}
Before:
MOVD "".a+8(RSP), R0
ADD $1, R0, R1
MOVBU "".cond(RSP), R2
CMPW $0, R2
CSEL NE, R1, R0, R0
After:
MOVBU "".cond(RSP), R0
CMPW $0, R0
MOVD "".a+8(RSP), R0
CSINC EQ, R0, R0, R0
This patch is a copy of CL 285694.
Co-authored-by: JunchenLi <junchen.li@arm.com>
Change-Id: Ic1a79e8b8ece409b533becfcb7950f11e7b76f24
Reviewed-on: https://go-review.googlesource.com/c/go/+/302231
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This CL resurrects the clobberdead debugging mode (CL 23924).
When the -clobberdead flag is set (TODO: make it a GOEXPERIMENT?), the
compiler inserts code that clobbers all dead stack slots that
contain pointers.
Mark windows syscall functions cgo_unsafe_args, as the code
actually does that, by taking the address of one argument and
passing it to cgocall.
Change-Id: Ie09a015f4bd14ae6053cc707866e30ae509b9d6f
Reviewed-on: https://go-review.googlesource.com/c/go/+/301791
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>