1
0
mirror of https://github.com/golang/go synced 2024-11-23 09:00:04 -07:00
Commit Graph

454 Commits

Author SHA1 Message Date
Matthew Dempsky
00715d089d cmd/compile/internal/walk: copy SSA-able variables
order.go ensures expressions that are passed to the runtime by address
are in fact addressable. However, in the case of local variables, if the
variable hasn't already been marked as addrtaken, then taking its
address here will effectively prevent the variable from being converted
to SSA form.

Instead, it's better to just copy the variable into a new temporary,
which we can pass by address instead. This ensures the original variable
can still be converted to SSA form.

Fixes #63332.

Change-Id: I182376d98d419df8bf07c400d84c344c9b82c0fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/541715
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
2023-11-21 20:34:12 +00:00
Paul E. Murphy
773039ed5c cmd/compile/internal/ssa: on PPC64, merge (CMPconst [0] (op ...)) more aggressively
Generate the CC version of many opcodes whose result is compared against
signed 0. The approach taken here works even if the opcode result is used in
multiple places too.

Add support for ADD, ADDconst, ANDN, SUB, NEG, CNTLZD, NOR conversions
to their CC opcode variant. These are the most commonly used variants.

Also, do not set clobberFlags of CNTLZD and CNTLZW, they do not clobber
flags.

This results in about 1% smaller text sections in kubernetes binaries,
and no regressions in the crypto benchmarks.

Change-Id: I9e0381944869c3774106bf348dead5ecb96dffda
Reviewed-on: https://go-review.googlesource.com/c/go/+/538636
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-11-13 22:12:32 +00:00
Paul E. Murphy
3128aeec87 cmd/internal/obj/ppc64: remove C_UCON optab matching class
This optab matching rule was used to match signed 16 bit values shifted
left by 16 bits. Unsigned 16 bit values greater than 0x7FFF<<16 were
classified as C_U32CON which led to larger than necessary codegen.

Instead, rewrite logical/arithmetic operations in the preprocessor pass
to use the 16 bit shifted immediate operation (e.g ADDIS vs ADD). This
simplifies the optab matching rules, while also minimizing codegen size
for large unsigned values.

Note, ADDIS sign-extends the constant argument, all others do not.

For matching opcodes, this means:
	MOVD $is<<16,Rx becomes ADDIS $is,Rx or ORIS $is,Rx
	MOVW $is<<16,Rx becomes ADDIS $is,Rx
	ADD $is<<16,[Rx,]Ry becomes ADDIS $is[Rx,]Ry
	OR $is<<16,[Rx,]Ry becomes ORIS $is[Rx,]Ry
	XOR $is<<16,[Rx,]Ry becomes XORIS $is[Rx,]Ry

Change-Id: I1a988d9f52517a04bb8dc2e41d7caf3d5fff867c
Reviewed-on: https://go-review.googlesource.com/c/go/+/536735
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-11-09 18:41:18 +00:00
Ubuntu
8fc043ccfa cmd/compile: optimize right shifts of int32 on riscv64
The compiler is currently sign extending 32 bit signed integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit signed values (sraw and sraiw) which sign extend
the result of the shift to 64 bits.  Change the compiler so that
it uses sraw and sraiw for shifts of signed 32 bit integers reducing
in most cases the number of instructions needed to perform the shift.

Here are some examples of code sequences that are changed by this
patch:

int32(a) >> 2

  before:

    sll     x5,x10,0x20
    sra     x10,x5,0x22

  after:

    sraw    x10,x10,0x2

int32(v) >> int(s)

  before:

    sext.w  x5,x10
    sltiu   x6,x11,64
    add     x6,x6,-1
    or      x6,x11,x6
    sra     x10,x5,x6

  after:

    sltiu   x5,x11,32
    add     x5,x5,-1
    or      x5,x11,x5
    sraw    x10,x10,x5

int32(v) >> (int(s) & 31)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

after:

    and     x5,x11,31
    sraw    x10,x10,x5

int32(100) >> int(a)

  before:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,64
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sra     x10,x6,x5

  after:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,32
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sraw    x10,x6,x5

int32(v) >> (int(s) & 63)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

  after:

    and     x5,x11,63
    sltiu   x6,x5,32
    add     x6,x6,-1
    or      x5,x5,x6
    sraw    x10,x10,x5

In most cases we eliminate one instruction.  In the case where
we shift a int32 constant by a variable the number of instructions
generated is identical.  A sra is simply replaced by a sraw.  In the
unusual case where we shift right by a variable anded with a constant
> 31 but < 64, we generate two additional instructions.  As this is
an unusual case we do not try to optimize for it.

Some improvements can be seen in some of the existing benchmarks,
notably in the utf8 package which performs right shifts of runes
which are signed 32 bit integers.

                      |  utf8-old   |              utf8-new            |
                      |   sec/op    |   sec/op     vs base             |
EncodeASCIIRune-4       17.68n ± 0%   17.67n ± 0%       ~ (p=0.312 n=10)
EncodeJapaneseRune-4    35.34n ± 0%   34.53n ± 1%  -2.31% (p=0.000 n=10)
AppendASCIIRune-4       3.213n ± 0%   3.213n ± 0%       ~ (p=0.318 n=10)
AppendJapaneseRune-4    36.14n ± 0%   35.35n ± 0%  -2.19% (p=0.000 n=10)
DecodeASCIIRune-4       28.11n ± 0%   27.36n ± 0%  -2.69% (p=0.000 n=10)
DecodeJapaneseRune-4    38.55n ± 0%   38.58n ± 0%       ~ (p=0.612 n=10)

Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/535115
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
2023-10-30 14:47:06 +00:00
Dmitri Shuralyov
b2fd76ab8d test: migrate remaining files to go:build syntax
Most of the test cases in the test directory use the new go:build syntax
already. Convert the rest. In general, try to place the build constraint
line below the test directive comment in more places.

For #41184.
For #60268.

Change-Id: I11c41a0642a8a26dc2eda1406da908645bbc005b
Cq-Include-Trybots: luci.golang.try:gotip-linux-386-longtest,gotip-linux-amd64-longtest,gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/536236
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-19 23:33:25 +00:00
Paul E. Murphy
1c5862cc0a test/codegen: fix PPC64 AddLargeConst test
Commit 061d77cb70 was published in parallel with another commit
36ecff0893 which changed how certain constants were generated.

Update the test to account for the changes.

Change-Id: I314b735a34857efa02392b7a0dd9fd634e4ee428
Reviewed-on: https://go-review.googlesource.com/c/go/+/536256
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Paul Murphy <murp@ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-10-19 00:40:54 +00:00
Paul E. Murphy
061d77cb70 cmd/compile/internal/ssa: on PPC64, generate large constant paddi
This is only supported power10/linux/PPC64. This generates smaller,
faster code by merging a pli + add into paddi.

Change-Id: I1f4d522fce53aea4c072713cc119a9e0d7065acc
Reviewed-on: https://go-review.googlesource.com/c/go/+/531717
Run-TryBot: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-10-18 18:04:48 +00:00
Lynn Boger
80834af206 cmd/compile: avoid ANDCCconst on PPC64 if condition not needed
In the PPC64 ISA, the instruction to do an 'and' operation
using an immediate constant is only available in the form that
also sets CR0 (i.e. clobbers the condition register.) This means
CR0 is being clobbered unnecessarily in many cases. That
affects some decisions made during some compiler passes
that check for it.

In those cases when the constant used by the ANDCC is a right
justified consecutive set of bits, a shift instruction can
be used which has the same effect if CR0 does not need to be
set. The rule to do that has been added to the late rules file
after other rules using ANDCCconst have been processed in the
main rules file.

Some codegen tests had to be updated since ANDCC is no
longer generated for some cases. A new test case was added to
verify the ANDCC is present if the results for both the AND
and CR0 are used.

Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542
Reviewed-on: https://go-review.googlesource.com/c/go/+/531435
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-10-18 15:56:53 +00:00
Keith Randall
657c885fb9 cmd/compile: when combining stores, use line number of first store
var p *[2]uint32 = ...
p[0] = 0
p[1] = 0

When we combine these two 32-bit stores into a single 64-bit store,
use the line number of the first store, not the second one.
This differs from the default behavior because usually with the combining
that the compiler does, we use the line number of the last instruction
in the combo (e.g. load+add, we use the line number of the add).

This is the same behavior that gcc does in C (picking the line
number of the first of a set of combined stores).

Change-Id: Ie70bf6151755322d33ecd50e4d9caf62f7881784
Reviewed-on: https://go-review.googlesource.com/c/go/+/521678
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-10-12 18:09:26 +00:00
Keith Randall
e0948d825d cmd/compile: use type hash from itab field instead of type field
It is one less dependent load away, and right next to another
field in the itab we also load as part of the type switch or
type assert.

Change-Id: If7aaa7814c47bd79a6c7ed4232ece0bc1d63550e
Reviewed-on: https://go-review.googlesource.com/c/go/+/533117
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-10-09 18:39:50 +00:00
Keith Randall
afd7c15c7f cmd/compile: use cache in front of convI2I
This is the last of the getitab users to receive a cache.
We should now no longer see getitab (and callees) in profiles.
Hopefully.

Change-Id: I2ed72b9943095bbe8067c805da7f08e00706c98c
Reviewed-on: https://go-review.googlesource.com/c/go/+/531055
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-10-09 17:28:22 +00:00
Mark Ryan
561bf0457f cmd/compile: optimize right shifts of uint32 on riscv
The compiler is currently zero extending 32 bit unsigned integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit unsigned values (srlw and srliw) which zero extend
the result of the shift to 64 bits.  Change the compiler so that
it uses srlw and srliw for 32 bit unsigned shifts reducing in most
cases the number of instructions needed to perform the shift.

Here are some examples of code sequences that are changed by this
patch:

uint32(a) >> 2

  before:

    sll     x5,x10,0x20
    srl     x10,x5,0x22

  after:

    srlw    x10,x10,0x2

uint32(a) >> int(b)

  before:

    sll     x5,x10,0x20
    srl     x5,x5,0x20
    srl     x5,x5,x11
    sltiu   x6,x11,64
    neg     x6,x6
    and     x10,x5,x6

  after:

    srlw    x5,x10,x11
    sltiu   x6,x11,32
    neg     x6,x6
    and     x10,x5,x6

bits.RotateLeft32(uint32(a), 1)

  before:

    sll     x5,x10,0x1
    sll     x6,x10,0x20
    srl     x7,x6,0x3f
    or      x5,x5,x7

  after:

   sll     x5,x10,0x1
   srlw    x6,x10,0x1f
   or      x10,x5,x6

bits.RotateLeft32(uint32(a), int(b))

  before:
    and     x6,x11,31
    sll     x7,x10,x6
    sll     x8,x10,0x20
    srl     x8,x8,0x20
    add     x6,x6,-32
    neg     x6,x6
    srl     x9,x8,x6
    sltiu   x6,x6,64
    neg     x6,x6
    and     x6,x9,x6
    or      x6,x6,x7

  after:

    and     x5,x11,31
    sll     x6,x10,x5
    add     x5,x5,-32
    neg     x5,x5
    srlw    x7,x10,x5
    sltiu   x5,x5,32
    neg     x5,x5
    and     x5,x7,x5
    or      x10,x6,x5

The one regression observed is the following case, an unbounded right
shift of a uint32 where the value we're shifting by is known to be
< 64 but > 31.  As this is an unusual case this commit does not
optimize for it, although the existing code does.

uint32(a) >> (b & 63)

  before:

    sll     x5,x10,0x20
    srl     x5,x5,0x20
    and     x6,x11,63
    srl     x10,x5,x6

  after

    and     x5,x11,63
    srlw    x6,x10,x5
    sltiu   x5,x5,32
    neg     x5,x5
    and     x10,x6,x5

Here we have one extra instruction.

Some benchmark highlights, generated on a VisionFive2 8GB running
Ubuntu 23.04.

pkg: math/bits
LeadingZeros32-4    18.64n ± 0%     17.32n ± 0%   -7.11% (p=0.000 n=10)
LeadingZeros64-4    15.47n ± 0%     15.51n ± 0%   +0.26% (p=0.027 n=10)
TrailingZeros16-4   18.48n ± 0%     17.68n ± 0%   -4.33% (p=0.000 n=10)
TrailingZeros32-4   16.87n ± 0%     16.07n ± 0%   -4.74% (p=0.000 n=10)
TrailingZeros64-4   15.26n ± 0%     15.27n ± 0%   +0.07% (p=0.043 n=10)
OnesCount32-4       20.08n ± 0%     19.29n ± 0%   -3.96% (p=0.000 n=10)
RotateLeft-4        8.864n ± 0%     8.838n ± 0%   -0.30% (p=0.006 n=10)
RotateLeft32-4      8.837n ± 0%     8.032n ± 0%   -9.11% (p=0.000 n=10)
Reverse32-4         29.77n ± 0%     26.52n ± 0%  -10.93% (p=0.000 n=10)
ReverseBytes32-4    9.640n ± 0%     8.838n ± 0%   -8.32% (p=0.000 n=10)
Sub32-4             8.835n ± 0%     8.035n ± 0%   -9.06% (p=0.000 n=10)
geomean             11.50n          11.33n        -1.45%

pkg: crypto/md5
Hash8Bytes-4             1.486µ ± 0%   1.426µ ± 0%  -4.04% (p=0.000 n=10)
Hash64-4                 2.079µ ± 0%   1.968µ ± 0%  -5.36% (p=0.000 n=10)
Hash128-4                2.720µ ± 0%   2.557µ ± 0%  -5.99% (p=0.000 n=10)
Hash256-4                3.996µ ± 0%   3.733µ ± 0%  -6.58% (p=0.000 n=10)
Hash512-4                6.541µ ± 0%   6.072µ ± 0%  -7.18% (p=0.000 n=10)
Hash1K-4                 11.64µ ± 0%   10.75µ ± 0%  -7.58% (p=0.000 n=10)
Hash8K-4                 82.95µ ± 0%   76.32µ ± 0%  -7.99% (p=0.000 n=10)
Hash1M-4                10.436m ± 0%   9.591m ± 0%  -8.10% (p=0.000 n=10)
Hash8M-4                 83.50m ± 0%   76.73m ± 0%  -8.10% (p=0.000 n=10)
Hash8BytesUnaligned-4    1.494µ ± 0%   1.434µ ± 0%  -4.02% (p=0.000 n=10)
Hash1KUnaligned-4        11.64µ ± 0%   10.76µ ± 0%  -7.52% (p=0.000 n=10)
Hash8KUnaligned-4        83.01µ ± 0%   76.32µ ± 0%  -8.07% (p=0.000 n=10)
geomean                  28.32µ        26.42µ       -6.72%

Change-Id: I20483a6668cca1b53fe83944bee3706aadcf8693
Reviewed-on: https://go-review.googlesource.com/c/go/+/528975
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-07 12:31:38 +00:00
David Chase
b72bbaebf9 cmd/compile: expand calls cleanup
Convert expand calls into a smaller number of focused
recursive rewrites, and rely on an enhanced version of
"decompose" to clean up afterwards.

Debugging information seems to emerge intact.

Change-Id: Ic46da4207e3a4da5c8e2c47b637b0e35abbe56bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/507295
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-06 20:57:33 +00:00
Keith Randall
cbcf8efa5f cmd/compile: use cache in front of type assert runtime call
That way we don't need to call into the runtime for every
type assertion (to an interface type).

name           old time/op  new time/op  delta
TypeAssert-24  3.78ns ± 3%  1.00ns ± 1%  -73.53%  (p=0.000 n=10+8)

Change-Id: I0ba308aaf0f24a5495b4e13c814d35af0c58bfde
Reviewed-on: https://go-review.googlesource.com/c/go/+/529316
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-10-06 17:02:53 +00:00
Keith Randall
39263f34a3 cmd/compile: add a cache to interface type switches
That way we don't need to call into the runtime when the type being
switched on has been seen many times before.

The cache is just a hash table of a sample of all the concrete types
that have been switched on at that source location.  We record the
matching case number and the resulting itab for each concrete input
type.

The caches seldom get large. The only two in a run of all.bash that
get more than 100 entries, even with the sampling rate set to 1, are

test/fixedbugs/issue29264.go, with 101
test/fixedbugs/issue29312.go, with 254

Both happen at the type switch in fmt.(*pp).handleMethods, perhaps
unsurprisingly.

name                                 old time/op  new time/op  delta
SwitchInterfaceTypePredictable-24    25.8ns ± 2%   2.5ns ± 3%  -90.43%  (p=0.000 n=10+10)
SwitchInterfaceTypeUnpredictable-24  37.5ns ± 2%  11.2ns ± 1%  -70.02%  (p=0.000 n=10+10)

Change-Id: I4961ac9547b7f15b03be6f55cdcb972d176955eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/526658
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-06 15:44:08 +00:00
Keith Randall
28f4ea16a2 cmd/compile: improve interface type switches
For type switches where the targets are interface types,
call into the runtime once instead of doing a sequence
of assert* calls.

name                                 old time/op  new time/op  delta
SwitchInterfaceTypePredictable-24    26.6ns ± 1%  25.8ns ± 2%  -2.86%  (p=0.000 n=10+10)
SwitchInterfaceTypeUnpredictable-24  39.3ns ± 1%  37.5ns ± 2%  -4.57%  (p=0.000 n=10+10)

Not super helpful by itself, but this code organization allows
followon CLs that add caching to the lookups.

Change-Id: I7967f85a99171faa6c2550690e311bea8b54b01c
Reviewed-on: https://go-review.googlesource.com/c/go/+/526657
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-06 15:42:30 +00:00
Paul E. Murphy
dcd018b5c5 cmd/internal/obj/ppc64: generate MOVD mask constants in register
Add a new form of RLDC which maps directly to the ISA definition
of rldc: RLDC Rs, $sh, $mb, Ra. This is used to generate mask
constants described below.

Using MOVD $-1, Rx; RLDC Rx, $sh, $mb, Rx, any mask constant
can be generated. A mask is a contiguous series of 1 bits, which
may wrap.

Change-Id: Ifcaae1114080ad58b5fdaa3e5fc9019e2051f282
Reviewed-on: https://go-review.googlesource.com/c/go/+/531120
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-05 14:03:32 +00:00
Paul E. Murphy
36ecff0893 cmd/internal/obj/ppc64: generate small, shifted constants in register
Check for shifted 16b constants, and transform them to avoid the load
penalty. This should be much faster than loading, and reduce binary
size by reducing the constant pool size.

Change-Id: I6834e08be7ca88e3b77449d226d08d199db84299
Reviewed-on: https://go-review.googlesource.com/c/go/+/531119
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-04 19:10:19 +00:00
Paul E. Murphy
c8caad423c cmd/compile/internal/ssa: optimize (AND (MOVDconst [-1] x)) on PPC64
This sequence can show up in the lowering pass on PPC64. If it
makes it to the latelower pass, it will cause an error because
it looks like it can be turned into RLDICL, but -1 isn't an
accepted mask.

Also, print more debug info if panic is called from
encodePPC64RotateMask.

Fixes #62698

Change-Id: I0f3322e2205357abe7fc28f96e05e3f7ad65567c
Reviewed-on: https://go-review.googlesource.com/c/go/+/529195
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-09-22 14:09:29 +00:00
eric fang
ace1494d92 cmd/compile: optimize absorbing InvertFlags into Noov comparisons for arm64
Previously (LessThanNoov (InvertFlags x)) is lowered as:
CSET
CSET
BIC
With this CL it's lowered as:
CSET
CSEL
This saves one instruction.

Similarly (GreaterEqualNoov (InvertFlags x)) is now lowered as:
CSET
CSINC

$ benchstat old.bench new.bench
goos: linux
goarch: arm64
                       │  old.bench  │             new.bench              │
                       │   sec/op    │   sec/op     vs base               │
InvertLessThanNoov-160   2.249n ± 2%   2.190n ± 1%  -2.62% (p=0.003 n=10)

Change-Id: Idd8979b7f4fe466e74b1a201c4aba7f1b0cffb0b
Reviewed-on: https://go-review.googlesource.com/c/go/+/526237
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-09-21 02:36:06 +00:00
Yi Yang
4ee1d542ed cmd/compile: sparse conditional constant propagation
sparse conditional constant propagation can discover optimization
opportunities that cannot be found by just combining constant folding
and constant propagation and dead code elimination separately.

This is a re-submit of PR#59575, which fix a broken dominance relationship caught by ssacheck

Updates https://github.com/golang/go/issues/59399

Change-Id: I57482dee38f8e80a610aed4f64295e60c38b7a47
GitHub-Last-Rev: 830016f24e
GitHub-Pull-Request: golang/go#60469
Reviewed-on: https://go-review.googlesource.com/c/go/+/498795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-09-12 21:01:50 +00:00
Paul E. Murphy
5cdb132228 cmd/compile/internal/ssa: improve masking codegen on PPC64
Generate RLDIC[LR] instead of MOVD mask, Rx; AND Rx, Ry, Rz.
This helps reduce code size, and reduces the latency caused
by the constant load.

Similarly, for smaller-than-register values, truncate constants
which exceed the range of the value's type to avoid needing to
load a constant.

Change-Id: I6019684795eb8962d4fd6d9585d08b17c15e7d64
Reviewed-on: https://go-review.googlesource.com/c/go/+/515576
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-09-06 16:34:20 +00:00
Keith Randall
4711299661 cmd/compile: use jump tables for large type switches
For large interface -> concrete type switches, we can use a jump
table on some bits of the type hash instead of a binary search on
the type hash.

name                        old time/op  new time/op  delta
SwitchTypePredictable-24    1.99ns ± 2%  1.78ns ± 5%  -10.87%  (p=0.000 n=10+10)
SwitchTypeUnpredictable-24  11.0ns ± 1%   9.1ns ± 2%  -17.55%  (p=0.000 n=7+9)

Change-Id: Ida4768e5d62c3ce1c2701288b72664aaa9e64259
Reviewed-on: https://go-review.googlesource.com/c/go/+/521497
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-08-23 00:30:54 +00:00
Keith Randall
556e9c5f3e cmd/compile: allow non-pointer writes in the middle of a write barrier
This lets us combine more write barriers, getting rid of some of the
test+branch and gcWriteBarrier* calls.
With the new write barriers, it's easy to add a few non-pointer writes
to the set of values written.

We allow up to 2 non-pointer writes between pointer writes. This is enough
for, for example, adjacent slice fields.

Fixes #62126

Change-Id: I872d0fa9cc4eb855e270ffc0223b39fde1723c4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/521498
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-08-23 00:16:06 +00:00
Meng Zhuo
63ab68ddc5 cmd/compile: add single-precision FMA code generation for riscv64
This CL adds FMADDS,FMSUBS,FNMADDS,FNMSUBS SSA support for riscv

Change-Id: I1e7dd322b46b9e0f4923dbba256303d69ed12066
Reviewed-on: https://go-review.googlesource.com/c/go/+/506616
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: M Zhuo <mzh@golangcn.org>
2023-08-22 12:05:36 +00:00
Meng Zhuo
05f9511582 cmd/compile: improve FP FMA performance on riscv64
FMADD/FMSUB/FNSUB are an efficient FP FMA instructions, which can
be used by the compiler to improve FP performance.

Erf               188.0n ± 2%   139.5n ± 2%  -25.82% (p=0.000 n=10)
Erfc              193.6n ± 1%   143.2n ± 1%  -26.01% (p=0.000 n=10)
Erfinv            244.4n ± 2%   172.6n ± 0%  -29.40% (p=0.000 n=10)
Erfcinv           244.7n ± 2%   173.0n ± 1%  -29.31% (p=0.000 n=10)
geomean           216.0n        156.3n       -27.65%

Ref: The RISC-V Instruction Set Manual Volume I: Unprivileged ISA
11.6 Single-Precision Floating-Point Computational Instructions

Change-Id: I89aa3a4df7576fdd47f4a6ee608ac16feafd093c
Reviewed-on: https://go-review.googlesource.com/c/go/+/506036
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-22 08:38:08 +00:00
Keith Randall
0b47b94a62 cmd/compile: remove more extension ops when not needed
If we're not using the upper bits, don't bother issuing a
sign/zero extension operation.

For arm64, after CL 520916 which fixed a correctness bug with
extensions but as a side effect leaves many unnecessary ones
still in place.

Change-Id: I5f4fe4efbf2e9f80969ab5b9a6122fb812dc2ec0
Reviewed-on: https://go-review.googlesource.com/c/go/+/521496
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-21 20:52:15 +00:00
Keith Randall
611706b171 cmd/compile: don't use BTS when OR works, add direct memory BTS operations
Stop using BTSconst and friends when ORLconst can be used instead.
OR can be issued by more function units than BTS can, so it could
lead to better IPC. OR might take a few more bytes to encode, but
not a lot more.

Still use BTSconst for cases where the constant otherwise wouldn't
fit and would require a separate movabs instruction to materialize
the constant. This happens when setting bits 31-63 of 64-bit targets.

Add BTS-to-memory operations so we don't need to load/bts/store.

Fixes #61694

Change-Id: I00379608df8fb0167cb01466e97d11dec7c1596c
Reviewed-on: https://go-review.googlesource.com/c/go/+/515755
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-08-04 16:40:24 +00:00
Jorropo
bac4e2f241 cmd/compile: try to rewrite loops to count down
Fixes #61629

This reduce the pressure on regalloc because then the loop only keep alive
one value (the iterator) instead of the iterator and the upper bound since
the comparison now acts against an immediate, often zero which can be skipped.

This optimize things like:
  for i := 0; i < n; i++ {
Or a range over a slice where the index is not used:
  for _, v := range someSlice {
Or the new range over int from #61405:
  for range n {

It is hit in 975 unique places while doing ./make.bash.

Change-Id: I5facff8b267a0b60ea3c1b9a58c4d74cdb38f03f
Reviewed-on: https://go-review.googlesource.com/c/go/+/512935
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
2023-07-31 18:33:29 +00:00
Junxian Zhu
b1474672c6 cmd/compile: intrinsify Sub64 on mips64
This CL intrinsify Sub64 on mips64.

pkg: math/bits
                  _   sec/op    _   sec/op     vs base               _
Sub-4               2.849n _ 0%   1.948n _ 0%  -31.64% (p=0.000 n=8)
Sub32-4             3.447n _ 0%   3.446n _ 0%        ~ (p=0.982 n=8)
Sub64-4             2.815n _ 0%   1.948n _ 0%  -30.78% (p=0.000 n=8)
Sub64multiple-4     6.124n _ 0%   3.340n _ 0%  -45.46% (p=0.000 n=8)

Change-Id: Ibba91a4350e4a549ae0b60d8cafc4bca05034b84
Reviewed-on: https://go-review.googlesource.com/c/go/+/498497
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-07-31 03:59:48 +00:00
Junxian Zhu
5f8a2fdf09 cmd/compile: intrinsify Add64 on mips64
This CL intrinsify Add64 on mips64.

pkg: math/bits
                  _   sec/op    _   sec/op     vs base               _
Add64-4             2.783n _ 0%   1.950n _ 0%  -29.93% (p=0.000 n=8)
Add64multiple-4     5.713n _ 0%   3.063n _ 0%  -46.38% (p=0.000 n=8)

pkg: crypto/elliptic
                                     _    sec/op    _   sec/op     vs base               _
ScalarBaseMult/P256-4                   353.7_ _ 0%   282.7_ _ 0%  -20.09% (p=0.000 n=8)
ScalarBaseMult/P224-4                   330.5_ _ 0%   250.0_ _ 0%  -24.37% (p=0.000 n=8)
ScalarBaseMult/P384-4                  1228.8_ _ 0%   791.5_ _ 0%  -35.59% (p=0.000 n=8)
ScalarBaseMult/P521-4                  15.412m _ 0%   2.438m _ 0%  -84.18% (p=0.000 n=8)
ScalarMult/P256-4                      1189.4_ _ 0%   904.2_ _ 0%  -23.98% (p=0.000 n=8)
ScalarMult/P224-4                      1138.8_ _ 0%   813.8_ _ 0%  -28.54% (p=0.000 n=8)
ScalarMult/P384-4                       4.419m _ 0%   2.692m _ 0%  -39.08% (p=0.000 n=8)
ScalarMult/P521-4                      59.768m _ 0%   8.773m _ 0%  -85.32% (p=0.000 n=8)
MarshalUnmarshal/P256/Uncompressed-4    8.697_ _ 1%   7.923_ _ 1%   -8.91% (p=0.000 n=8)
MarshalUnmarshal/P256/Compressed-4     104.75_ _ 0%   66.29_ _ 0%  -36.72% (p=0.000 n=8)
MarshalUnmarshal/P224/Uncompressed-4    8.728_ _ 1%   7.823_ _ 1%  -10.37% (p=0.000 n=8)
MarshalUnmarshal/P224/Compressed-4     1035.7_ _ 0%   676.5_ _ 2%  -34.69% (p=0.000 n=8)
MarshalUnmarshal/P384/Uncompressed-4    15.32_ _ 1%   11.81_ _ 1%  -22.90% (p=0.000 n=8)
MarshalUnmarshal/P384/Compressed-4      399.8_ _ 0%   217.4_ _ 0%  -45.62% (p=0.000 n=8)
MarshalUnmarshal/P521/Uncompressed-4    96.79_ _ 0%   20.32_ _ 0%  -79.01% (p=0.000 n=8)
MarshalUnmarshal/P521/Compressed-4     6640.4_ _ 0%   790.8_ _ 0%  -88.09% (p=0.000 n=8)

Change-Id: I8a0960b9665720c1d3e57dce36386e74db37fefa
Reviewed-on: https://go-review.googlesource.com/c/go/+/498496
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
2023-07-31 03:58:42 +00:00
Keith Randall
67983c0f78 cmd/compile: add indexed SET* opcodes for amd64
Update #61356

Change-Id: I391af98563b1c068208784c80ea736c78c29639d
Reviewed-on: https://go-review.googlesource.com/c/go/+/510435
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Martin Möhrmann <martin@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2023-07-26 17:19:57 +00:00
Keith Randall
505e50b1e3 cmd/compile: get rid of special case in scheduler for entry block
It isn't needed.

Fixes #61356

Change-Id: Ib466a3eac90c3ea57888cf40c294513033fc6118
Reviewed-on: https://go-review.googlesource.com/c/go/+/509856
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-07-26 17:19:14 +00:00
Keith Randall
d0964e172b cmd/compile: optimize s==s for strings
s==s is always true for strings. This comes up in NaN testing in
generic code, where we want x==x to compile completely away except for
float types.

Fixes #60777

Change-Id: I3ce054b5121354de2f9751b010fb409f148cb637
Reviewed-on: https://go-review.googlesource.com/c/go/+/503795
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2023-07-26 17:17:28 +00:00
Keith Randall
e713d6f939 cmd/compile: memcombine if values being stored are from consecutive loads
If we load 2 values and then store those 2 loaded values, we can likely
perform that operation with a single wider load and store.

Fixes #60709

Change-Id: Ifc5f92c2f1b174c6ed82a69070f16cec6853c770
Reviewed-on: https://go-review.googlesource.com/c/go/+/502295
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-07-21 19:06:53 +00:00
Jes Cok
5d481abc87 all: fix typos
Change-Id: I510b0a4bf3472d937393800dd57472c30beef329
GitHub-Last-Rev: 8d289b73a3
GitHub-Pull-Request: golang/go#60960
Reviewed-on: https://go-review.googlesource.com/c/go/+/505398
Auto-Submit: Robert Findley <rfindley@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Robert Findley <rfindley@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-07-18 19:55:29 +00:00
Paul E. Murphy
5e4000ad7f cmd/compile: on PPC64, fix sign/zero extension when masking
(ANDCCconst [y] (MOV.*reg x)) should only be merged when zero
extending. Otherwise, sign bits are lost on negative values.

(ANDCCconst [0xFF] (MOVBreg x)) should be simplified to a zero
extension of x. Likewise for the MOVHreg variant.

Fixes #61297

Change-Id: I04e4fd7dc6a826e870681f37506620d48393698b
Reviewed-on: https://go-review.googlesource.com/c/go/+/508775
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-07-12 16:34:02 +00:00
Meng Zhuo
3fce111535 cmd/compile: fix FMA negative commutativity of riscv64
According to RISCV manual 11.6:

FMADD x,y,z computes x*y+z and
FNMADD x,y,z => -x*y-z
FMSUB x,y,z => x*y-z
FNMSUB x,y,z => -x*y+z respectively

However our implement of SSA convert FMADD -x,y,z to FNMADD x,y,z which
is wrong and should be convert to FNMSUB according to manual.

Change-Id: Ib297bc83824e121fd7dda171ed56ea9694a4e575
Reviewed-on: https://go-review.googlesource.com/c/go/+/506575
Run-TryBot: M Zhuo <mzh@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-07-05 22:05:44 +00:00
Meng Zhuo
5c1a15df41 test/codegen: enable Mul2 DivPow2 test for riscv64
Change-Id: Ice0bb7a665599b334e927a1b00d1a5b400c15e3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/506035
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2023-07-04 13:33:45 +00:00
Meng Zhuo
b7e7467865 test/codegen: add fsqrt test for riscv64
Add FSQRTD FSQRTS codegen tests for riscv64

Change-Id: I16ca3753ad1ba37afbd9d0f887b078e33f98fda0
Reviewed-on: https://go-review.googlesource.com/c/go/+/503275
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-06-15 15:16:20 +00:00
Keith Randall
c643b29381 cmd/compile: use callsite as line number for argument marshaling
Don't use the line number of the argument itself, as that may be from
arbitrarily earlier in the function.

Fixes #60673

Change-Id: Ifc0a2aaae221a256be3a4b0b2e04849bae4b79d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/502656
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-06-12 20:34:37 +00:00
Austin Clements
c2e0bf0abf cmd/internal/testdir: pass if GOEXPERIMENT=cgocheck2 is set
Some testdir tests fail if GOEXPERIMENT=cgocheck2 is set. Fix this by
skipping these tests.

Change-Id: I58d4ef0cceb86bcf93220b4a44de9b9dc4879b16
Reviewed-on: https://go-review.googlesource.com/c/go/+/499675
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-06-01 18:30:44 +00:00
Bryan Mills
02d234e34d Revert "cmd/compile: sparse conditional constant propagation"
This reverts CL 483875.

Reason for revert: appears to cause internal compiler errors on the ssacheck builder.

Change-Id: I662418384291470c1962c417797a5890dd9aa7a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/497855
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
2023-05-24 14:39:34 +00:00
Junxian Zhu
f0d575c266 cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on mips64x
This CL use MFC1/MTC1 instructions to move data between GPR and FPR instead of stores and loads to move float/int values.

goos: linux
goarch: mips64le
pkg: math
                      │   oldmath    │              newmath               │
                      │    sec/op    │   sec/op     vs base               │
Acos-4                   258.2n ± 0%   258.2n ± 0%        ~ (p=0.859 n=8)
Acosh-4                  378.7n ± 0%   323.9n ± 0%  -14.47% (p=0.000 n=8)
Asin-4                   255.1n ± 2%   255.5n ± 0%   +0.16% (p=0.002 n=8)
Asinh-4                  407.1n ± 0%   348.7n ± 0%  -14.35% (p=0.000 n=8)
Atan-4                   189.5n ± 0%   189.9n ± 3%        ~ (p=0.205 n=8)
Atanh-4                  355.6n ± 0%   323.4n ± 2%   -9.03% (p=0.000 n=8)
Atan2-4                  284.1n ± 7%   280.1n ± 4%        ~ (p=0.313 n=8)
Cbrt-4                   314.3n ± 0%   236.4n ± 0%  -24.79% (p=0.000 n=8)
Ceil-4                   144.3n ± 3%   139.6n ± 0%        ~ (p=0.069 n=8)
Compare-4               21.100n ± 0%   7.035n ± 0%  -66.66% (p=0.000 n=8)
Compare32-4             20.100n ± 0%   6.030n ± 0%  -70.00% (p=0.000 n=8)
Copysign-4              34.970n ± 0%   6.221n ± 0%  -82.21% (p=0.000 n=8)
Cos-4                    183.4n ± 3%   184.1n ± 5%        ~ (p=0.159 n=8)
Cosh-4                   487.9n ± 2%   419.6n ± 0%  -14.00% (p=0.000 n=8)
Erf-4                    160.6n ± 0%   157.9n ± 0%   -1.68% (p=0.009 n=8)
Erfc-4                   183.7n ± 4%   169.8n ± 0%   -7.54% (p=0.000 n=8)
Erfinv-4                 191.5n ± 4%   183.6n ± 0%   -4.13% (p=0.023 n=8)
Erfcinv-4                192.0n ± 7%   184.3n ± 0%        ~ (p=0.425 n=8)
Exp-4                    398.2n ± 0%   340.1n ± 4%  -14.58% (p=0.000 n=8)
ExpGo-4                  383.3n ± 0%   327.3n ± 0%  -14.62% (p=0.000 n=8)
Expm1-4                  248.7n ± 5%   216.0n ± 0%  -13.11% (p=0.000 n=8)
Exp2-4                   372.8n ± 0%   316.9n ± 3%  -14.98% (p=0.000 n=8)
Exp2Go-4                 374.1n ± 0%   320.5n ± 0%  -14.33% (p=0.000 n=8)
Abs-4                    3.013n ± 0%   3.016n ± 0%   +0.10% (p=0.020 n=8)
Dim-4                    5.021n ± 0%   5.022n ± 0%        ~ (p=0.270 n=8)
Floor-4                  127.5n ± 4%   126.2n ± 3%        ~ (p=0.186 n=8)
Max-4                    72.32n ± 0%   61.33n ± 0%  -15.20% (p=0.000 n=8)
Min-4                    83.33n ± 1%   61.36n ± 0%  -26.37% (p=0.000 n=8)
Mod-4                    690.7n ± 0%   454.5n ± 0%  -34.20% (p=0.000 n=8)
Frexp-4                 116.30n ± 1%   71.80n ± 1%  -38.26% (p=0.000 n=8)
Gamma-4                  389.0n ± 0%   355.9n ± 1%   -8.48% (p=0.000 n=8)
Hypot-4                 102.40n ± 0%   83.90n ± 0%  -18.07% (p=0.000 n=8)
HypotGo-4               105.45n ± 4%   84.82n ± 2%  -19.56% (p=0.000 n=8)
Ilogb-4                  99.13n ± 4%   63.71n ± 2%  -35.73% (p=0.000 n=8)
J0-4                     859.7n ± 0%   854.8n ± 0%   -0.57% (p=0.000 n=8)
J1-4                     873.9n ± 0%   875.7n ± 0%   +0.21% (p=0.007 n=8)
Jn-4                     1.855µ ± 0%   1.867µ ± 0%   +0.65% (p=0.000 n=8)
Ldexp-4                 130.50n ± 2%   64.35n ± 0%  -50.69% (p=0.000 n=8)
Lgamma-4                 208.8n ± 0%   200.9n ± 0%   -3.78% (p=0.000 n=8)
Log-4                    294.1n ± 0%   255.2n ± 3%  -13.22% (p=0.000 n=8)
Logb-4                  105.45n ± 1%   66.81n ± 1%  -36.64% (p=0.000 n=8)
Log1p-4                  268.2n ± 0%   211.3n ± 0%  -21.21% (p=0.000 n=8)
Log10-4                  295.4n ± 0%   255.2n ± 2%  -13.59% (p=0.000 n=8)
Log2-4                   152.9n ± 1%   127.5n ± 0%  -16.61% (p=0.000 n=8)
Modf-4                  103.40n ± 0%   75.36n ± 0%  -27.12% (p=0.000 n=8)
Nextafter32-4           121.20n ± 1%   78.40n ± 0%  -35.31% (p=0.000 n=8)
Nextafter64-4           110.40n ± 1%   64.91n ± 0%  -41.20% (p=0.000 n=8)
PowInt-4                 509.8n ± 1%   369.3n ± 1%  -27.56% (p=0.000 n=8)
PowFrac-4               1189.0n ± 0%   947.8n ± 0%  -20.29% (p=0.000 n=8)
Pow10Pos-4               15.07n ± 0%   15.07n ± 0%        ~ (p=0.733 n=8)
Pow10Neg-4               20.10n ± 0%   20.10n ± 0%        ~ (p=0.576 n=8)
Round-4                  44.22n ± 0%   26.12n ± 0%  -40.92% (p=0.000 n=8)
RoundToEven-4            46.22n ± 0%   27.12n ± 0%  -41.31% (p=0.000 n=8)
Remainder-4              539.0n ± 1%   417.1n ± 1%  -22.62% (p=0.000 n=8)
Signbit-4               17.985n ± 0%   5.694n ± 0%  -68.34% (p=0.000 n=8)
Sin-4                    185.7n ± 5%   172.9n ± 0%   -6.89% (p=0.001 n=8)
Sincos-4                 176.6n ± 0%   200.9n ± 0%  +13.76% (p=0.000 n=8)
Sinh-4                   495.8n ± 0%   435.9n ± 0%  -12.09% (p=0.000 n=8)
SqrtIndirect-4           5.022n ± 0%   5.024n ± 0%        ~ (p=0.083 n=8)
SqrtLatency-4            8.038n ± 0%   8.044n ± 0%        ~ (p=0.524 n=8)
SqrtIndirectLatency-4    8.035n ± 0%   8.039n ± 0%   +0.06% (p=0.017 n=8)
SqrtGoLatency-4          340.1n ± 0%   278.3n ± 0%  -18.19% (p=0.000 n=8)
SqrtPrime-4              5.381µ ± 0%   5.386µ ± 0%        ~ (p=0.662 n=8)
Tan-4                    198.6n ± 1%   183.1n ± 0%   -7.85% (p=0.000 n=8)
Tanh-4                   491.3n ± 1%   440.8n ± 1%  -10.29% (p=0.000 n=8)
Trunc-4                  121.7n ± 0%   121.7n ± 0%        ~ (p=0.769 n=8)
Y0-4                     855.1n ± 0%   859.8n ± 0%   +0.54% (p=0.007 n=8)
Y1-4                     862.3n ± 0%   865.1n ± 0%   +0.32% (p=0.007 n=8)
Yn-4                     1.830µ ± 0%   1.837µ ± 0%   +0.36% (p=0.011 n=8)
Float64bits-4           13.060n ± 0%   3.016n ± 0%  -76.91% (p=0.000 n=8)
Float64frombits-4       13.060n ± 0%   3.018n ± 0%  -76.90% (p=0.000 n=8)
Float32bits-4           13.060n ± 0%   3.016n ± 0%  -76.91% (p=0.000 n=8)
Float32frombits-4       13.070n ± 0%   3.013n ± 0%  -76.94% (p=0.000 n=8)
FMA-4                    446.0n ± 0%   413.1n ± 1%   -7.38% (p=0.000 n=8)
geomean                  143.4n        108.3n       -24.49%

Change-Id: I2067f7a5ae1126ada7ab3fb2083710e8212535e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/493815
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
2023-05-24 03:36:31 +00:00
Yi Yang
fa50248ce6 cmd/compile: sparse conditional constant propagation
sparse conditional constant propagation can discover optimization opportunities that cannot be found by just combining constant folding and constant propagation and dead code elimination separately.

Updates #59399

Change-Id: Ia954e906480654a6f0cc065d75b5912f96f36b2e
GitHub-Last-Rev: 90fc02db99
GitHub-Pull-Request: golang/go#59575
Reviewed-on: https://go-review.googlesource.com/c/go/+/483875
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-05-24 02:54:03 +00:00
Cuong Manh Le
35a71dc56d cmd/compile: avoid slicebytetostring call in len(string([]byte))
Change-Id: Ie04503e61400a793a6a29a4b58795254deabe472
Reviewed-on: https://go-review.googlesource.com/c/go/+/497276
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-05-23 19:27:38 +00:00
Matthew Dempsky
7f1467ff4d cmd/compile: incorporate inlined function names into closure naming
In Go 1.17, cmd/compile gained the ability to inline calls to
functions that contain function literals (aka "closures"). This was
implemented by duplicating the function literal body and emitting a
second LSym, because in general it might be optimized better than the
original function literal.

However, the second LSym was named simply as any other function
literal appearing literally in the enclosing function would be named.
E.g., if f has a closure "f.funcX", and f is inlined into g, we would
create "g.funcY" (N.B., X and Y need not be the same.). Users then
have no idea this function originally came from f.

With this CL, the inlined call stack is incorporated into the clone
LSym's name: instead of "g.funcY", it's named "g.f.funcY".

In the future, it seems desirable to arrange for the clone's name to
appear exactly as the original name, so stack traces remain the same
as when -l or -d=inlfuncswithclosures are used. But it's unclear
whether the linker supports that today, or whether any downstream
tooling would be confused by this.

Updates #60324.

Change-Id: Ifad0ccef7e959e72005beeecdfffd872f63982f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/497137
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
2023-05-22 22:47:15 +00:00
Keith Randall
6042a062dc cmd/compile: make memcombine pass a bit more robust to reassociation of exprs
Be more liberal about expanding the OR tree. Handle any tree shape
instead of a fully left or right associative tree.

Also remove tail feature, it isn't ever needed.

Change-Id: If16bebef94b952a604d6069e9be3d9129994cb6f
Reviewed-on: https://go-review.googlesource.com/c/go/+/494056
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ryan Berger <ryanbberger@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-05-16 19:13:26 +00:00
Lynn Boger
4481042c43 cmd/compile: update rules to generate more prefixed instructions
This modifies some existing rules to allow more prefixed instructions
to be generated when using GOPPC64=power10. Some rules also check
if PCRel is available, which is currently supported for linux/ppc64le
and linux/ppc64 (internal linking only).

Prior to p10, DS-offset loads and stores had a 16 bit size limit for
the offset field. If the offset of the data for load or store was
beyond this range then an indexed load or store would be selected by
the rules.

In p10 the assembler can generate prefixed instructions in this case,
but does not if an indexed instruction was selected during the lowering
pass.

This allows many more cases to use prefixed loads or stores, reducing
function sizes and improving performance in some cases where the code
change happens in key loops.

For example in strconv BenchmarkAppendQuoteRune before:

  12c5e4:       15 00 10 06     pla     r10,1425660
  12c5e8:       fc c0 40 39
  12c5ec:       00 00 6a e8     ld      r3,0(r10)
  12c5f0:       10 00 aa e8     ld      r5,16(r10)

After this change:

  12a828:       15 00 10 04     pld     r3,1433272
  12a82c:       b8 de 60 e4
  12a830:       15 00 10 04     pld     r5,1433280
  12a834:       c0 de a0 e4

Performs better in the second case.

A testcase was added to verify that the rules correctly select a load or
store based on the offset and whether power10 or earlier.

Change-Id: I4335fed0bd9b8aba8a4f84d69b89f819cc464846
Reviewed-on: https://go-review.googlesource.com/c/go/+/477398
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
2023-05-15 18:20:54 +00:00
Dmitri Shuralyov
7cc4516ac8 internal/testdir: move to cmd/internal/testdir
The effect and motivation is for the test to be selected when doing
'go test cmd' and not when doing 'go test std' since it's primarily
about testing the Go compiler and linker. Other than that, it's run
by all.bash and 'go test std cmd' as before.

For #56844.
Fixes #60059.

Change-Id: I2d499af013f9d9b8761fdf4573f8d27d80c1fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/493876
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-05-12 17:18:08 +00:00