qbit/go - go - Tape:neT

qbit/go

mirror of https://github.com/golang/go synced 2024-11-15 01:00:29 -07:00

Author	SHA1	Message	Date
Keith Randall	657c885fb9	cmd/compile: when combining stores, use line number of first store var p *[2]uint32 = ... p[0] = 0 p[1] = 0 When we combine these two 32-bit stores into a single 64-bit store, use the line number of the first store, not the second one. This differs from the default behavior because usually with the combining that the compiler does, we use the line number of the last instruction in the combo (e.g. load+add, we use the line number of the add). This is the same behavior that gcc does in C (picking the line number of the first of a set of combined stores). Change-Id: Ie70bf6151755322d33ecd50e4d9caf62f7881784 Reviewed-on: https://go-review.googlesource.com/c/go/+/521678 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>	2023-10-12 18:09:26 +00:00
Keith Randall	e0948d825d	cmd/compile: use type hash from itab field instead of type field It is one less dependent load away, and right next to another field in the itab we also load as part of the type switch or type assert. Change-Id: If7aaa7814c47bd79a6c7ed4232ece0bc1d63550e Reviewed-on: https://go-review.googlesource.com/c/go/+/533117 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2023-10-09 18:39:50 +00:00
Keith Randall	afd7c15c7f	cmd/compile: use cache in front of convI2I This is the last of the getitab users to receive a cache. We should now no longer see getitab (and callees) in profiles. Hopefully. Change-Id: I2ed72b9943095bbe8067c805da7f08e00706c98c Reviewed-on: https://go-review.googlesource.com/c/go/+/531055 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2023-10-09 17:28:22 +00:00
Mark Ryan	561bf0457f	cmd/compile: optimize right shifts of uint32 on riscv The compiler is currently zero extending 32 bit unsigned integers to 64 bits before right shifting them using a 64 bit shift instruction. There's no need to do this as RISC-V has instructions for right shifting 32 bit unsigned values (srlw and srliw) which zero extend the result of the shift to 64 bits. Change the compiler so that it uses srlw and srliw for 32 bit unsigned shifts reducing in most cases the number of instructions needed to perform the shift. Here are some examples of code sequences that are changed by this patch: uint32(a) >> 2 before: sll x5,x10,0x20 srl x10,x5,0x22 after: srlw x10,x10,0x2 uint32(a) >> int(b) before: sll x5,x10,0x20 srl x5,x5,0x20 srl x5,x5,x11 sltiu x6,x11,64 neg x6,x6 and x10,x5,x6 after: srlw x5,x10,x11 sltiu x6,x11,32 neg x6,x6 and x10,x5,x6 bits.RotateLeft32(uint32(a), 1) before: sll x5,x10,0x1 sll x6,x10,0x20 srl x7,x6,0x3f or x5,x5,x7 after: sll x5,x10,0x1 srlw x6,x10,0x1f or x10,x5,x6 bits.RotateLeft32(uint32(a), int(b)) before: and x6,x11,31 sll x7,x10,x6 sll x8,x10,0x20 srl x8,x8,0x20 add x6,x6,-32 neg x6,x6 srl x9,x8,x6 sltiu x6,x6,64 neg x6,x6 and x6,x9,x6 or x6,x6,x7 after: and x5,x11,31 sll x6,x10,x5 add x5,x5,-32 neg x5,x5 srlw x7,x10,x5 sltiu x5,x5,32 neg x5,x5 and x5,x7,x5 or x10,x6,x5 The one regression observed is the following case, an unbounded right shift of a uint32 where the value we're shifting by is known to be < 64 but > 31. As this is an unusual case this commit does not optimize for it, although the existing code does. uint32(a) >> (b & 63) before: sll x5,x10,0x20 srl x5,x5,0x20 and x6,x11,63 srl x10,x5,x6 after and x5,x11,63 srlw x6,x10,x5 sltiu x5,x5,32 neg x5,x5 and x10,x6,x5 Here we have one extra instruction. Some benchmark highlights, generated on a VisionFive2 8GB running Ubuntu 23.04. pkg: math/bits LeadingZeros32-4 18.64n ± 0% 17.32n ± 0% -7.11% (p=0.000 n=10) LeadingZeros64-4 15.47n ± 0% 15.51n ± 0% +0.26% (p=0.027 n=10) TrailingZeros16-4 18.48n ± 0% 17.68n ± 0% -4.33% (p=0.000 n=10) TrailingZeros32-4 16.87n ± 0% 16.07n ± 0% -4.74% (p=0.000 n=10) TrailingZeros64-4 15.26n ± 0% 15.27n ± 0% +0.07% (p=0.043 n=10) OnesCount32-4 20.08n ± 0% 19.29n ± 0% -3.96% (p=0.000 n=10) RotateLeft-4 8.864n ± 0% 8.838n ± 0% -0.30% (p=0.006 n=10) RotateLeft32-4 8.837n ± 0% 8.032n ± 0% -9.11% (p=0.000 n=10) Reverse32-4 29.77n ± 0% 26.52n ± 0% -10.93% (p=0.000 n=10) ReverseBytes32-4 9.640n ± 0% 8.838n ± 0% -8.32% (p=0.000 n=10) Sub32-4 8.835n ± 0% 8.035n ± 0% -9.06% (p=0.000 n=10) geomean 11.50n 11.33n -1.45% pkg: crypto/md5 Hash8Bytes-4 1.486µ ± 0% 1.426µ ± 0% -4.04% (p=0.000 n=10) Hash64-4 2.079µ ± 0% 1.968µ ± 0% -5.36% (p=0.000 n=10) Hash128-4 2.720µ ± 0% 2.557µ ± 0% -5.99% (p=0.000 n=10) Hash256-4 3.996µ ± 0% 3.733µ ± 0% -6.58% (p=0.000 n=10) Hash512-4 6.541µ ± 0% 6.072µ ± 0% -7.18% (p=0.000 n=10) Hash1K-4 11.64µ ± 0% 10.75µ ± 0% -7.58% (p=0.000 n=10) Hash8K-4 82.95µ ± 0% 76.32µ ± 0% -7.99% (p=0.000 n=10) Hash1M-4 10.436m ± 0% 9.591m ± 0% -8.10% (p=0.000 n=10) Hash8M-4 83.50m ± 0% 76.73m ± 0% -8.10% (p=0.000 n=10) Hash8BytesUnaligned-4 1.494µ ± 0% 1.434µ ± 0% -4.02% (p=0.000 n=10) Hash1KUnaligned-4 11.64µ ± 0% 10.76µ ± 0% -7.52% (p=0.000 n=10) Hash8KUnaligned-4 83.01µ ± 0% 76.32µ ± 0% -8.07% (p=0.000 n=10) geomean 28.32µ 26.42µ -6.72% Change-Id: I20483a6668cca1b53fe83944bee3706aadcf8693 Reviewed-on: https://go-review.googlesource.com/c/go/+/528975 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-10-07 12:31:38 +00:00
David Chase	b72bbaebf9	cmd/compile: expand calls cleanup Convert expand calls into a smaller number of focused recursive rewrites, and rely on an enhanced version of "decompose" to clean up afterwards. Debugging information seems to emerge intact. Change-Id: Ic46da4207e3a4da5c8e2c47b637b0e35abbe56bb Reviewed-on: https://go-review.googlesource.com/c/go/+/507295 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-10-06 20:57:33 +00:00
Keith Randall	cbcf8efa5f	cmd/compile: use cache in front of type assert runtime call That way we don't need to call into the runtime for every type assertion (to an interface type). name old time/op new time/op delta TypeAssert-24 3.78ns ± 3% 1.00ns ± 1% -73.53% (p=0.000 n=10+8) Change-Id: I0ba308aaf0f24a5495b4e13c814d35af0c58bfde Reviewed-on: https://go-review.googlesource.com/c/go/+/529316 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>	2023-10-06 17:02:53 +00:00
Keith Randall	39263f34a3	cmd/compile: add a cache to interface type switches That way we don't need to call into the runtime when the type being switched on has been seen many times before. The cache is just a hash table of a sample of all the concrete types that have been switched on at that source location. We record the matching case number and the resulting itab for each concrete input type. The caches seldom get large. The only two in a run of all.bash that get more than 100 entries, even with the sampling rate set to 1, are test/fixedbugs/issue29264.go, with 101 test/fixedbugs/issue29312.go, with 254 Both happen at the type switch in fmt.(*pp).handleMethods, perhaps unsurprisingly. name old time/op new time/op delta SwitchInterfaceTypePredictable-24 25.8ns ± 2% 2.5ns ± 3% -90.43% (p=0.000 n=10+10) SwitchInterfaceTypeUnpredictable-24 37.5ns ± 2% 11.2ns ± 1% -70.02% (p=0.000 n=10+10) Change-Id: I4961ac9547b7f15b03be6f55cdcb972d176955eb Reviewed-on: https://go-review.googlesource.com/c/go/+/526658 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Keith Randall <khr@google.com>	2023-10-06 15:44:08 +00:00
Keith Randall	28f4ea16a2	cmd/compile: improve interface type switches For type switches where the targets are interface types, call into the runtime once instead of doing a sequence of assert* calls. name old time/op new time/op delta SwitchInterfaceTypePredictable-24 26.6ns ± 1% 25.8ns ± 2% -2.86% (p=0.000 n=10+10) SwitchInterfaceTypeUnpredictable-24 39.3ns ± 1% 37.5ns ± 2% -4.57% (p=0.000 n=10+10) Not super helpful by itself, but this code organization allows followon CLs that add caching to the lookups. Change-Id: I7967f85a99171faa6c2550690e311bea8b54b01c Reviewed-on: https://go-review.googlesource.com/c/go/+/526657 Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com>	2023-10-06 15:42:30 +00:00
Paul E. Murphy	dcd018b5c5	cmd/internal/obj/ppc64: generate MOVD mask constants in register Add a new form of RLDC which maps directly to the ISA definition of rldc: RLDC Rs, $sh, $mb, Ra. This is used to generate mask constants described below. Using MOVD $-1, Rx; RLDC Rx, $sh, $mb, Rx, any mask constant can be generated. A mask is a contiguous series of 1 bits, which may wrap. Change-Id: Ifcaae1114080ad58b5fdaa3e5fc9019e2051f282 Reviewed-on: https://go-review.googlesource.com/c/go/+/531120 Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-10-05 14:03:32 +00:00
Paul E. Murphy	36ecff0893	cmd/internal/obj/ppc64: generate small, shifted constants in register Check for shifted 16b constants, and transform them to avoid the load penalty. This should be much faster than loading, and reduce binary size by reducing the constant pool size. Change-Id: I6834e08be7ca88e3b77449d226d08d199db84299 Reviewed-on: https://go-review.googlesource.com/c/go/+/531119 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2023-10-04 19:10:19 +00:00
Paul E. Murphy	c8caad423c	cmd/compile/internal/ssa: optimize (AND (MOVDconst [-1] x)) on PPC64 This sequence can show up in the lowering pass on PPC64. If it makes it to the latelower pass, it will cause an error because it looks like it can be turned into RLDICL, but -1 isn't an accepted mask. Also, print more debug info if panic is called from encodePPC64RotateMask. Fixes #62698 Change-Id: I0f3322e2205357abe7fc28f96e05e3f7ad65567c Reviewed-on: https://go-review.googlesource.com/c/go/+/529195 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-09-22 14:09:29 +00:00
eric fang	ace1494d92	cmd/compile: optimize absorbing InvertFlags into Noov comparisons for arm64 Previously (LessThanNoov (InvertFlags x)) is lowered as: CSET CSET BIC With this CL it's lowered as: CSET CSEL This saves one instruction. Similarly (GreaterEqualNoov (InvertFlags x)) is now lowered as: CSET CSINC $ benchstat old.bench new.bench goos: linux goarch: arm64 │ old.bench │ new.bench │ │ sec/op │ sec/op vs base │ InvertLessThanNoov-160 2.249n ± 2% 2.190n ± 1% -2.62% (p=0.003 n=10) Change-Id: Idd8979b7f4fe466e74b1a201c4aba7f1b0cffb0b Reviewed-on: https://go-review.googlesource.com/c/go/+/526237 Reviewed-by: Heschi Kreinick <heschi@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-09-21 02:36:06 +00:00
Yi Yang	4ee1d542ed	cmd/compile: sparse conditional constant propagation sparse conditional constant propagation can discover optimization opportunities that cannot be found by just combining constant folding and constant propagation and dead code elimination separately. This is a re-submit of PR#59575, which fix a broken dominance relationship caught by ssacheck Updates https://github.com/golang/go/issues/59399 Change-Id: I57482dee38f8e80a610aed4f64295e60c38b7a47 GitHub-Last-Rev: `830016f24e` GitHub-Pull-Request: golang/go#60469 Reviewed-on: https://go-review.googlesource.com/c/go/+/498795 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2023-09-12 21:01:50 +00:00
Paul E. Murphy	5cdb132228	cmd/compile/internal/ssa: improve masking codegen on PPC64 Generate RLDIC[LR] instead of MOVD mask, Rx; AND Rx, Ry, Rz. This helps reduce code size, and reduces the latency caused by the constant load. Similarly, for smaller-than-register values, truncate constants which exceed the range of the value's type to avoid needing to load a constant. Change-Id: I6019684795eb8962d4fd6d9585d08b17c15e7d64 Reviewed-on: https://go-review.googlesource.com/c/go/+/515576 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-09-06 16:34:20 +00:00
Keith Randall	4711299661	cmd/compile: use jump tables for large type switches For large interface -> concrete type switches, we can use a jump table on some bits of the type hash instead of a binary search on the type hash. name old time/op new time/op delta SwitchTypePredictable-24 1.99ns ± 2% 1.78ns ± 5% -10.87% (p=0.000 n=10+10) SwitchTypeUnpredictable-24 11.0ns ± 1% 9.1ns ± 2% -17.55% (p=0.000 n=7+9) Change-Id: Ida4768e5d62c3ce1c2701288b72664aaa9e64259 Reviewed-on: https://go-review.googlesource.com/c/go/+/521497 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2023-08-23 00:30:54 +00:00
Keith Randall	556e9c5f3e	cmd/compile: allow non-pointer writes in the middle of a write barrier This lets us combine more write barriers, getting rid of some of the test+branch and gcWriteBarrier* calls. With the new write barriers, it's easy to add a few non-pointer writes to the set of values written. We allow up to 2 non-pointer writes between pointer writes. This is enough for, for example, adjacent slice fields. Fixes #62126 Change-Id: I872d0fa9cc4eb855e270ffc0223b39fde1723c4b Reviewed-on: https://go-review.googlesource.com/c/go/+/521498 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>	2023-08-23 00:16:06 +00:00
Meng Zhuo	63ab68ddc5	cmd/compile: add single-precision FMA code generation for riscv64 This CL adds FMADDS,FMSUBS,FNMADDS,FNMSUBS SSA support for riscv Change-Id: I1e7dd322b46b9e0f4923dbba256303d69ed12066 Reviewed-on: https://go-review.googlesource.com/c/go/+/506616 Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: M Zhuo <mzh@golangcn.org>	2023-08-22 12:05:36 +00:00
Meng Zhuo	05f9511582	cmd/compile: improve FP FMA performance on riscv64 FMADD/FMSUB/FNSUB are an efficient FP FMA instructions, which can be used by the compiler to improve FP performance. Erf 188.0n ± 2% 139.5n ± 2% -25.82% (p=0.000 n=10) Erfc 193.6n ± 1% 143.2n ± 1% -26.01% (p=0.000 n=10) Erfinv 244.4n ± 2% 172.6n ± 0% -29.40% (p=0.000 n=10) Erfcinv 244.7n ± 2% 173.0n ± 1% -29.31% (p=0.000 n=10) geomean 216.0n 156.3n -27.65% Ref: The RISC-V Instruction Set Manual Volume I: Unprivileged ISA 11.6 Single-Precision Floating-Point Computational Instructions Change-Id: I89aa3a4df7576fdd47f4a6ee608ac16feafd093c Reviewed-on: https://go-review.googlesource.com/c/go/+/506036 Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-08-22 08:38:08 +00:00
Keith Randall	0b47b94a62	cmd/compile: remove more extension ops when not needed If we're not using the upper bits, don't bother issuing a sign/zero extension operation. For arm64, after CL 520916 which fixed a correctness bug with extensions but as a side effect leaves many unnecessary ones still in place. Change-Id: I5f4fe4efbf2e9f80969ab5b9a6122fb812dc2ec0 Reviewed-on: https://go-review.googlesource.com/c/go/+/521496 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-08-21 20:52:15 +00:00
Keith Randall	611706b171	cmd/compile: don't use BTS when OR works, add direct memory BTS operations Stop using BTSconst and friends when ORLconst can be used instead. OR can be issued by more function units than BTS can, so it could lead to better IPC. OR might take a few more bytes to encode, but not a lot more. Still use BTSconst for cases where the constant otherwise wouldn't fit and would require a separate movabs instruction to materialize the constant. This happens when setting bits 31-63 of 64-bit targets. Add BTS-to-memory operations so we don't need to load/bts/store. Fixes #61694 Change-Id: I00379608df8fb0167cb01466e97d11dec7c1596c Reviewed-on: https://go-review.googlesource.com/c/go/+/515755 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-08-04 16:40:24 +00:00
Jorropo	bac4e2f241	cmd/compile: try to rewrite loops to count down Fixes #61629 This reduce the pressure on regalloc because then the loop only keep alive one value (the iterator) instead of the iterator and the upper bound since the comparison now acts against an immediate, often zero which can be skipped. This optimize things like: for i := 0; i < n; i++ { Or a range over a slice where the index is not used: for _, v := range someSlice { Or the new range over int from #61405: for range n { It is hit in 975 unique places while doing ./make.bash. Change-Id: I5facff8b267a0b60ea3c1b9a58c4d74cdb38f03f Reviewed-on: https://go-review.googlesource.com/c/go/+/512935 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>	2023-07-31 18:33:29 +00:00
Junxian Zhu	b1474672c6	cmd/compile: intrinsify Sub64 on mips64 This CL intrinsify Sub64 on mips64. pkg: math/bits _ sec/op _ sec/op vs base _ Sub-4 2.849n _ 0% 1.948n _ 0% -31.64% (p=0.000 n=8) Sub32-4 3.447n _ 0% 3.446n _ 0% ~ (p=0.982 n=8) Sub64-4 2.815n _ 0% 1.948n _ 0% -30.78% (p=0.000 n=8) Sub64multiple-4 6.124n _ 0% 3.340n _ 0% -45.46% (p=0.000 n=8) Change-Id: Ibba91a4350e4a549ae0b60d8cafc4bca05034b84 Reviewed-on: https://go-review.googlesource.com/c/go/+/498497 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2023-07-31 03:59:48 +00:00
Junxian Zhu	5f8a2fdf09	cmd/compile: intrinsify Add64 on mips64 This CL intrinsify Add64 on mips64. pkg: math/bits _ sec/op _ sec/op vs base _ Add64-4 2.783n _ 0% 1.950n _ 0% -29.93% (p=0.000 n=8) Add64multiple-4 5.713n _ 0% 3.063n _ 0% -46.38% (p=0.000 n=8) pkg: crypto/elliptic _ sec/op _ sec/op vs base _ ScalarBaseMult/P256-4 353.7_ _ 0% 282.7_ _ 0% -20.09% (p=0.000 n=8) ScalarBaseMult/P224-4 330.5_ _ 0% 250.0_ _ 0% -24.37% (p=0.000 n=8) ScalarBaseMult/P384-4 1228.8_ _ 0% 791.5_ _ 0% -35.59% (p=0.000 n=8) ScalarBaseMult/P521-4 15.412m _ 0% 2.438m _ 0% -84.18% (p=0.000 n=8) ScalarMult/P256-4 1189.4_ _ 0% 904.2_ _ 0% -23.98% (p=0.000 n=8) ScalarMult/P224-4 1138.8_ _ 0% 813.8_ _ 0% -28.54% (p=0.000 n=8) ScalarMult/P384-4 4.419m _ 0% 2.692m _ 0% -39.08% (p=0.000 n=8) ScalarMult/P521-4 59.768m _ 0% 8.773m _ 0% -85.32% (p=0.000 n=8) MarshalUnmarshal/P256/Uncompressed-4 8.697_ _ 1% 7.923_ _ 1% -8.91% (p=0.000 n=8) MarshalUnmarshal/P256/Compressed-4 104.75_ _ 0% 66.29_ _ 0% -36.72% (p=0.000 n=8) MarshalUnmarshal/P224/Uncompressed-4 8.728_ _ 1% 7.823_ _ 1% -10.37% (p=0.000 n=8) MarshalUnmarshal/P224/Compressed-4 1035.7_ _ 0% 676.5_ _ 2% -34.69% (p=0.000 n=8) MarshalUnmarshal/P384/Uncompressed-4 15.32_ _ 1% 11.81_ _ 1% -22.90% (p=0.000 n=8) MarshalUnmarshal/P384/Compressed-4 399.8_ _ 0% 217.4_ _ 0% -45.62% (p=0.000 n=8) MarshalUnmarshal/P521/Uncompressed-4 96.79_ _ 0% 20.32_ _ 0% -79.01% (p=0.000 n=8) MarshalUnmarshal/P521/Compressed-4 6640.4_ _ 0% 790.8_ _ 0% -88.09% (p=0.000 n=8) Change-Id: I8a0960b9665720c1d3e57dce36386e74db37fefa Reviewed-on: https://go-review.googlesource.com/c/go/+/498496 Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@golang.org>	2023-07-31 03:58:42 +00:00
Keith Randall	67983c0f78	cmd/compile: add indexed SET* opcodes for amd64 Update #61356 Change-Id: I391af98563b1c068208784c80ea736c78c29639d Reviewed-on: https://go-review.googlesource.com/c/go/+/510435 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2023-07-26 17:19:57 +00:00
Keith Randall	505e50b1e3	cmd/compile: get rid of special case in scheduler for entry block It isn't needed. Fixes #61356 Change-Id: Ib466a3eac90c3ea57888cf40c294513033fc6118 Reviewed-on: https://go-review.googlesource.com/c/go/+/509856 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2023-07-26 17:19:14 +00:00
Keith Randall	d0964e172b	cmd/compile: optimize s==s for strings s==s is always true for strings. This comes up in NaN testing in generic code, where we want x==x to compile completely away except for float types. Fixes #60777 Change-Id: I3ce054b5121354de2f9751b010fb409f148cb637 Reviewed-on: https://go-review.googlesource.com/c/go/+/503795 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org>	2023-07-26 17:17:28 +00:00
Keith Randall	e713d6f939	cmd/compile: memcombine if values being stored are from consecutive loads If we load 2 values and then store those 2 loaded values, we can likely perform that operation with a single wider load and store. Fixes #60709 Change-Id: Ifc5f92c2f1b174c6ed82a69070f16cec6853c770 Reviewed-on: https://go-review.googlesource.com/c/go/+/502295 Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2023-07-21 19:06:53 +00:00
Jes Cok	5d481abc87	all: fix typos Change-Id: I510b0a4bf3472d937393800dd57472c30beef329 GitHub-Last-Rev: `8d289b73a3` GitHub-Pull-Request: golang/go#60960 Reviewed-on: https://go-review.googlesource.com/c/go/+/505398 Auto-Submit: Robert Findley <rfindley@google.com> Reviewed-by: Robert Findley <rfindley@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Robert Findley <rfindley@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>	2023-07-18 19:55:29 +00:00
Paul E. Murphy	5e4000ad7f	cmd/compile: on PPC64, fix sign/zero extension when masking (ANDCCconst [y] (MOV.*reg x)) should only be merged when zero extending. Otherwise, sign bits are lost on negative values. (ANDCCconst [0xFF] (MOVBreg x)) should be simplified to a zero extension of x. Likewise for the MOVHreg variant. Fixes #61297 Change-Id: I04e4fd7dc6a826e870681f37506620d48393698b Reviewed-on: https://go-review.googlesource.com/c/go/+/508775 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-07-12 16:34:02 +00:00
Meng Zhuo	3fce111535	cmd/compile: fix FMA negative commutativity of riscv64 According to RISCV manual 11.6: FMADD x,y,z computes xy+z and FNMADD x,y,z => -xy-z FMSUB x,y,z => xy-z FNMSUB x,y,z => -xy+z respectively However our implement of SSA convert FMADD -x,y,z to FNMADD x,y,z which is wrong and should be convert to FNMSUB according to manual. Change-Id: Ib297bc83824e121fd7dda171ed56ea9694a4e575 Reviewed-on: https://go-review.googlesource.com/c/go/+/506575 Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-07-05 22:05:44 +00:00
Meng Zhuo	5c1a15df41	test/codegen: enable Mul2 DivPow2 test for riscv64 Change-Id: Ice0bb7a665599b334e927a1b00d1a5b400c15e3d Reviewed-on: https://go-review.googlesource.com/c/go/+/506035 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org>	2023-07-04 13:33:45 +00:00
Meng Zhuo	b7e7467865	test/codegen: add fsqrt test for riscv64 Add FSQRTD FSQRTS codegen tests for riscv64 Change-Id: I16ca3753ad1ba37afbd9d0f887b078e33f98fda0 Reviewed-on: https://go-review.googlesource.com/c/go/+/503275 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-06-15 15:16:20 +00:00
Keith Randall	c643b29381	cmd/compile: use callsite as line number for argument marshaling Don't use the line number of the argument itself, as that may be from arbitrarily earlier in the function. Fixes #60673 Change-Id: Ifc0a2aaae221a256be3a4b0b2e04849bae4b79d7 Reviewed-on: https://go-review.googlesource.com/c/go/+/502656 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>	2023-06-12 20:34:37 +00:00
Austin Clements	c2e0bf0abf	cmd/internal/testdir: pass if GOEXPERIMENT=cgocheck2 is set Some testdir tests fail if GOEXPERIMENT=cgocheck2 is set. Fix this by skipping these tests. Change-Id: I58d4ef0cceb86bcf93220b4a44de9b9dc4879b16 Reviewed-on: https://go-review.googlesource.com/c/go/+/499675 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2023-06-01 18:30:44 +00:00
Bryan Mills	02d234e34d	Revert "cmd/compile: sparse conditional constant propagation" This reverts CL 483875. Reason for revert: appears to cause internal compiler errors on the ssacheck builder. Change-Id: I662418384291470c1962c417797a5890dd9aa7a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/497855 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Bryan Mills <bcmills@google.com>	2023-05-24 14:39:34 +00:00
Junxian Zhu	f0d575c266	cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on mips64x This CL use MFC1/MTC1 instructions to move data between GPR and FPR instead of stores and loads to move float/int values. goos: linux goarch: mips64le pkg: math │ oldmath │ newmath │ │ sec/op │ sec/op vs base │ Acos-4 258.2n ± 0% 258.2n ± 0% ~ (p=0.859 n=8) Acosh-4 378.7n ± 0% 323.9n ± 0% -14.47% (p=0.000 n=8) Asin-4 255.1n ± 2% 255.5n ± 0% +0.16% (p=0.002 n=8) Asinh-4 407.1n ± 0% 348.7n ± 0% -14.35% (p=0.000 n=8) Atan-4 189.5n ± 0% 189.9n ± 3% ~ (p=0.205 n=8) Atanh-4 355.6n ± 0% 323.4n ± 2% -9.03% (p=0.000 n=8) Atan2-4 284.1n ± 7% 280.1n ± 4% ~ (p=0.313 n=8) Cbrt-4 314.3n ± 0% 236.4n ± 0% -24.79% (p=0.000 n=8) Ceil-4 144.3n ± 3% 139.6n ± 0% ~ (p=0.069 n=8) Compare-4 21.100n ± 0% 7.035n ± 0% -66.66% (p=0.000 n=8) Compare32-4 20.100n ± 0% 6.030n ± 0% -70.00% (p=0.000 n=8) Copysign-4 34.970n ± 0% 6.221n ± 0% -82.21% (p=0.000 n=8) Cos-4 183.4n ± 3% 184.1n ± 5% ~ (p=0.159 n=8) Cosh-4 487.9n ± 2% 419.6n ± 0% -14.00% (p=0.000 n=8) Erf-4 160.6n ± 0% 157.9n ± 0% -1.68% (p=0.009 n=8) Erfc-4 183.7n ± 4% 169.8n ± 0% -7.54% (p=0.000 n=8) Erfinv-4 191.5n ± 4% 183.6n ± 0% -4.13% (p=0.023 n=8) Erfcinv-4 192.0n ± 7% 184.3n ± 0% ~ (p=0.425 n=8) Exp-4 398.2n ± 0% 340.1n ± 4% -14.58% (p=0.000 n=8) ExpGo-4 383.3n ± 0% 327.3n ± 0% -14.62% (p=0.000 n=8) Expm1-4 248.7n ± 5% 216.0n ± 0% -13.11% (p=0.000 n=8) Exp2-4 372.8n ± 0% 316.9n ± 3% -14.98% (p=0.000 n=8) Exp2Go-4 374.1n ± 0% 320.5n ± 0% -14.33% (p=0.000 n=8) Abs-4 3.013n ± 0% 3.016n ± 0% +0.10% (p=0.020 n=8) Dim-4 5.021n ± 0% 5.022n ± 0% ~ (p=0.270 n=8) Floor-4 127.5n ± 4% 126.2n ± 3% ~ (p=0.186 n=8) Max-4 72.32n ± 0% 61.33n ± 0% -15.20% (p=0.000 n=8) Min-4 83.33n ± 1% 61.36n ± 0% -26.37% (p=0.000 n=8) Mod-4 690.7n ± 0% 454.5n ± 0% -34.20% (p=0.000 n=8) Frexp-4 116.30n ± 1% 71.80n ± 1% -38.26% (p=0.000 n=8) Gamma-4 389.0n ± 0% 355.9n ± 1% -8.48% (p=0.000 n=8) Hypot-4 102.40n ± 0% 83.90n ± 0% -18.07% (p=0.000 n=8) HypotGo-4 105.45n ± 4% 84.82n ± 2% -19.56% (p=0.000 n=8) Ilogb-4 99.13n ± 4% 63.71n ± 2% -35.73% (p=0.000 n=8) J0-4 859.7n ± 0% 854.8n ± 0% -0.57% (p=0.000 n=8) J1-4 873.9n ± 0% 875.7n ± 0% +0.21% (p=0.007 n=8) Jn-4 1.855µ ± 0% 1.867µ ± 0% +0.65% (p=0.000 n=8) Ldexp-4 130.50n ± 2% 64.35n ± 0% -50.69% (p=0.000 n=8) Lgamma-4 208.8n ± 0% 200.9n ± 0% -3.78% (p=0.000 n=8) Log-4 294.1n ± 0% 255.2n ± 3% -13.22% (p=0.000 n=8) Logb-4 105.45n ± 1% 66.81n ± 1% -36.64% (p=0.000 n=8) Log1p-4 268.2n ± 0% 211.3n ± 0% -21.21% (p=0.000 n=8) Log10-4 295.4n ± 0% 255.2n ± 2% -13.59% (p=0.000 n=8) Log2-4 152.9n ± 1% 127.5n ± 0% -16.61% (p=0.000 n=8) Modf-4 103.40n ± 0% 75.36n ± 0% -27.12% (p=0.000 n=8) Nextafter32-4 121.20n ± 1% 78.40n ± 0% -35.31% (p=0.000 n=8) Nextafter64-4 110.40n ± 1% 64.91n ± 0% -41.20% (p=0.000 n=8) PowInt-4 509.8n ± 1% 369.3n ± 1% -27.56% (p=0.000 n=8) PowFrac-4 1189.0n ± 0% 947.8n ± 0% -20.29% (p=0.000 n=8) Pow10Pos-4 15.07n ± 0% 15.07n ± 0% ~ (p=0.733 n=8) Pow10Neg-4 20.10n ± 0% 20.10n ± 0% ~ (p=0.576 n=8) Round-4 44.22n ± 0% 26.12n ± 0% -40.92% (p=0.000 n=8) RoundToEven-4 46.22n ± 0% 27.12n ± 0% -41.31% (p=0.000 n=8) Remainder-4 539.0n ± 1% 417.1n ± 1% -22.62% (p=0.000 n=8) Signbit-4 17.985n ± 0% 5.694n ± 0% -68.34% (p=0.000 n=8) Sin-4 185.7n ± 5% 172.9n ± 0% -6.89% (p=0.001 n=8) Sincos-4 176.6n ± 0% 200.9n ± 0% +13.76% (p=0.000 n=8) Sinh-4 495.8n ± 0% 435.9n ± 0% -12.09% (p=0.000 n=8) SqrtIndirect-4 5.022n ± 0% 5.024n ± 0% ~ (p=0.083 n=8) SqrtLatency-4 8.038n ± 0% 8.044n ± 0% ~ (p=0.524 n=8) SqrtIndirectLatency-4 8.035n ± 0% 8.039n ± 0% +0.06% (p=0.017 n=8) SqrtGoLatency-4 340.1n ± 0% 278.3n ± 0% -18.19% (p=0.000 n=8) SqrtPrime-4 5.381µ ± 0% 5.386µ ± 0% ~ (p=0.662 n=8) Tan-4 198.6n ± 1% 183.1n ± 0% -7.85% (p=0.000 n=8) Tanh-4 491.3n ± 1% 440.8n ± 1% -10.29% (p=0.000 n=8) Trunc-4 121.7n ± 0% 121.7n ± 0% ~ (p=0.769 n=8) Y0-4 855.1n ± 0% 859.8n ± 0% +0.54% (p=0.007 n=8) Y1-4 862.3n ± 0% 865.1n ± 0% +0.32% (p=0.007 n=8) Yn-4 1.830µ ± 0% 1.837µ ± 0% +0.36% (p=0.011 n=8) Float64bits-4 13.060n ± 0% 3.016n ± 0% -76.91% (p=0.000 n=8) Float64frombits-4 13.060n ± 0% 3.018n ± 0% -76.90% (p=0.000 n=8) Float32bits-4 13.060n ± 0% 3.016n ± 0% -76.91% (p=0.000 n=8) Float32frombits-4 13.070n ± 0% 3.013n ± 0% -76.94% (p=0.000 n=8) FMA-4 446.0n ± 0% 413.1n ± 1% -7.38% (p=0.000 n=8) geomean 143.4n 108.3n -24.49% Change-Id: I2067f7a5ae1126ada7ab3fb2083710e8212535e9 Reviewed-on: https://go-review.googlesource.com/c/go/+/493815 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>	2023-05-24 03:36:31 +00:00
Yi Yang	fa50248ce6	cmd/compile: sparse conditional constant propagation sparse conditional constant propagation can discover optimization opportunities that cannot be found by just combining constant folding and constant propagation and dead code elimination separately. Updates #59399 Change-Id: Ia954e906480654a6f0cc065d75b5912f96f36b2e GitHub-Last-Rev: `90fc02db99` GitHub-Pull-Request: golang/go#59575 Reviewed-on: https://go-review.googlesource.com/c/go/+/483875 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2023-05-24 02:54:03 +00:00
Cuong Manh Le	35a71dc56d	cmd/compile: avoid slicebytetostring call in len(string([]byte)) Change-Id: Ie04503e61400a793a6a29a4b58795254deabe472 Reviewed-on: https://go-review.googlesource.com/c/go/+/497276 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2023-05-23 19:27:38 +00:00
Matthew Dempsky	7f1467ff4d	cmd/compile: incorporate inlined function names into closure naming In Go 1.17, cmd/compile gained the ability to inline calls to functions that contain function literals (aka "closures"). This was implemented by duplicating the function literal body and emitting a second LSym, because in general it might be optimized better than the original function literal. However, the second LSym was named simply as any other function literal appearing literally in the enclosing function would be named. E.g., if f has a closure "f.funcX", and f is inlined into g, we would create "g.funcY" (N.B., X and Y need not be the same.). Users then have no idea this function originally came from f. With this CL, the inlined call stack is incorporated into the clone LSym's name: instead of "g.funcY", it's named "g.f.funcY". In the future, it seems desirable to arrange for the clone's name to appear exactly as the original name, so stack traces remain the same as when -l or -d=inlfuncswithclosures are used. But it's unclear whether the linker supports that today, or whether any downstream tooling would be confused by this. Updates #60324. Change-Id: Ifad0ccef7e959e72005beeecdfffd872f63982f8 Reviewed-on: https://go-review.googlesource.com/c/go/+/497137 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com>	2023-05-22 22:47:15 +00:00
Keith Randall	6042a062dc	cmd/compile: make memcombine pass a bit more robust to reassociation of exprs Be more liberal about expanding the OR tree. Handle any tree shape instead of a fully left or right associative tree. Also remove tail feature, it isn't ever needed. Change-Id: If16bebef94b952a604d6069e9be3d9129994cb6f Reviewed-on: https://go-review.googlesource.com/c/go/+/494056 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Ryan Berger <ryanbberger@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>	2023-05-16 19:13:26 +00:00
Lynn Boger	4481042c43	cmd/compile: update rules to generate more prefixed instructions This modifies some existing rules to allow more prefixed instructions to be generated when using GOPPC64=power10. Some rules also check if PCRel is available, which is currently supported for linux/ppc64le and linux/ppc64 (internal linking only). Prior to p10, DS-offset loads and stores had a 16 bit size limit for the offset field. If the offset of the data for load or store was beyond this range then an indexed load or store would be selected by the rules. In p10 the assembler can generate prefixed instructions in this case, but does not if an indexed instruction was selected during the lowering pass. This allows many more cases to use prefixed loads or stores, reducing function sizes and improving performance in some cases where the code change happens in key loops. For example in strconv BenchmarkAppendQuoteRune before: 12c5e4: 15 00 10 06 pla r10,1425660 12c5e8: fc c0 40 39 12c5ec: 00 00 6a e8 ld r3,0(r10) 12c5f0: 10 00 aa e8 ld r5,16(r10) After this change: 12a828: 15 00 10 04 pld r3,1433272 12a82c: b8 de 60 e4 12a830: 15 00 10 04 pld r5,1433280 12a834: c0 de a0 e4 Performs better in the second case. A testcase was added to verify that the rules correctly select a load or store based on the offset and whether power10 or earlier. Change-Id: I4335fed0bd9b8aba8a4f84d69b89f819cc464846 Reviewed-on: https://go-review.googlesource.com/c/go/+/477398 Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com>	2023-05-15 18:20:54 +00:00
Dmitri Shuralyov	7cc4516ac8	internal/testdir: move to cmd/internal/testdir The effect and motivation is for the test to be selected when doing 'go test cmd' and not when doing 'go test std' since it's primarily about testing the Go compiler and linker. Other than that, it's run by all.bash and 'go test std cmd' as before. For #56844. Fixes #60059. Change-Id: I2d499af013f9d9b8761fdf4573f8d27d80c1fccf Reviewed-on: https://go-review.googlesource.com/c/go/+/493876 Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>	2023-05-12 17:18:08 +00:00
Stefan	95c4f320d5	cmd/compile: add De Morgan's rewrite rule Adds rules that rewrites statements such as ~P&~Q as ~(P\|Q) and ~P\|~Q as ~(P&Q), removing an extraneous instruction. Change-Id: Icedb97df741680ddf9799df79df78657173aa500 GitHub-Last-Rev: `f22e2350c9` GitHub-Pull-Request: golang/go#60018 Reviewed-on: https://go-review.googlesource.com/c/go/+/493175 Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Stefan M <st3f4nm4d4@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-05-10 16:32:25 +00:00
Lynn Boger	bc3bdfa977	test: add memcombine testcases for ppc64 Thanks to the recent addition of the memcombine pass, the ppc64 ports now have the memcombine optimizations. Previously in PPC64.rules, the memcombine rules were only added for ppc64le targets due to the significant increase in size of the rewritePPC64.go file when those rules were added. The ppc64 and ppc64le rules had to be different because of the byte order due to endianness differences. This enables the memcombine tests to be run on ppc64 as well as ppc64le. Change-Id: I4081e2d94617a1b66541d536c0c2662e266c9c1e Reviewed-on: https://go-review.googlesource.com/c/go/+/492615 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Lynn Boger <laboger@linux.vnet.ibm.com>	2023-05-08 16:50:23 +00:00
Junxian Zhu	5cad8d41ca	math: optimize math.Abs on mipsx This commit optimized math.Abs function implementation on mipsx. Tested on loongson 3A2000. goos: linux goarch: mipsle pkg: math │ oldmath │ newmath │ │ sec/op │ sec/op vs base │ Acos-4 282.6n ± 0% 282.3n ± 0% ~ (p=0.140 n=7) Acosh-4 506.1n ± 0% 451.8n ± 0% -10.73% (p=0.001 n=7) Asin-4 272.3n ± 0% 272.2n ± 0% ~ (p=0.808 n=7) Asinh-4 529.7n ± 0% 475.3n ± 0% -10.27% (p=0.001 n=7) Atan-4 208.2n ± 0% 207.9n ± 0% ~ (p=0.134 n=7) Atanh-4 503.4n ± 1% 449.7n ± 0% -10.67% (p=0.001 n=7) Atan2-4 310.5n ± 0% 310.5n ± 0% ~ (p=0.928 n=7) Cbrt-4 359.3n ± 0% 358.8n ± 0% ~ (p=0.121 n=7) Ceil-4 203.9n ± 0% 204.0n ± 0% ~ (p=0.600 n=7) Compare-4 23.11n ± 0% 23.11n ± 0% ~ (p=0.702 n=7) Compare32-4 19.09n ± 0% 19.12n ± 0% ~ (p=0.070 n=7) Copysign-4 33.20n ± 0% 34.02n ± 0% +2.47% (p=0.001 n=7) Cos-4 422.5n ± 0% 385.4n ± 1% -8.78% (p=0.001 n=7) Cosh-4 628.0n ± 0% 545.5n ± 0% -13.14% (p=0.001 n=7) Erf-4 193.7n ± 2% 192.7n ± 1% ~ (p=0.430 n=7) Erfc-4 192.8n ± 1% 193.0n ± 0% ~ (p=0.245 n=7) Erfinv-4 220.7n ± 1% 221.5n ± 2% ~ (p=0.272 n=7) Erfcinv-4 221.3n ± 1% 220.4n ± 2% ~ (p=0.738 n=7) Exp-4 471.4n ± 0% 435.1n ± 0% -7.70% (p=0.001 n=7) ExpGo-4 470.6n ± 0% 434.0n ± 0% -7.78% (p=0.001 n=7) Expm1-4 243.1n ± 0% 243.4n ± 0% ~ (p=0.417 n=7) Exp2-4 463.1n ± 0% 427.0n ± 0% -7.80% (p=0.001 n=7) Exp2Go-4 462.4n ± 0% 426.2n ± 5% -7.83% (p=0.001 n=7) Abs-4 37.000n ± 0% 8.039n ± 9% -78.27% (p=0.001 n=7) Dim-4 18.09n ± 0% 18.11n ± 0% ~ (p=0.094 n=7) Floor-4 151.9n ± 0% 151.8n ± 0% ~ (p=0.190 n=7) Max-4 116.7n ± 1% 116.7n ± 1% ~ (p=0.842 n=7) Min-4 116.6n ± 1% 116.6n ± 0% ~ (p=0.464 n=7) Mod-4 1244.0n ± 0% 980.9n ± 0% -21.15% (p=0.001 n=7) Frexp-4 199.0n ± 0% 146.7n ± 0% -26.28% (p=0.001 n=7) Gamma-4 516.4n ± 0% 479.3n ± 1% -7.18% (p=0.001 n=7) Hypot-4 169.8n ± 0% 117.8n ± 2% -30.62% (p=0.001 n=7) HypotGo-4 170.8n ± 0% 117.5n ± 0% -31.21% (p=0.001 n=7) Ilogb-4 160.8n ± 0% 109.5n ± 0% -31.90% (p=0.001 n=7) J0-4 1.359µ ± 0% 1.305µ ± 0% -3.97% (p=0.001 n=7) J1-4 1.386µ ± 0% 1.334µ ± 0% -3.75% (p=0.001 n=7) Jn-4 2.864µ ± 0% 2.758µ ± 0% -3.70% (p=0.001 n=7) Ldexp-4 202.9n ± 0% 151.7n ± 0% -25.23% (p=0.001 n=7) Lgamma-4 234.0n ± 0% 234.3n ± 0% ~ (p=0.199 n=7) Log-4 444.1n ± 0% 407.9n ± 0% -8.15% (p=0.001 n=7) Logb-4 157.8n ± 0% 121.6n ± 0% -22.94% (p=0.001 n=7) Log1p-4 354.8n ± 0% 315.4n ± 0% -11.10% (p=0.001 n=7) Log10-4 453.9n ± 0% 417.9n ± 0% -7.93% (p=0.001 n=7) Log2-4 245.3n ± 0% 209.1n ± 0% -14.76% (p=0.001 n=7) Modf-4 126.6n ± 0% 126.6n ± 0% ~ (p=0.126 n=7) Nextafter32-4 112.5n ± 0% 112.5n ± 0% ~ (p=0.853 n=7) Nextafter64-4 141.7n ± 0% 141.6n ± 0% ~ (p=0.331 n=7) PowInt-4 878.8n ± 1% 758.3n ± 1% -13.71% (p=0.001 n=7) PowFrac-4 1.809µ ± 0% 1.615µ ± 0% -10.72% (p=0.001 n=7) Pow10Pos-4 18.10n ± 0% 18.12n ± 0% ~ (p=0.464 n=7) Pow10Neg-4 17.09n ± 0% 17.09n ± 0% ~ (p=0.263 n=7) Round-4 68.36n ± 0% 68.33n ± 0% ~ (p=0.325 n=7) RoundToEven-4 78.40n ± 0% 78.40n ± 0% ~ (p=0.934 n=7) Remainder-4 894.0n ± 1% 753.4n ± 1% -15.73% (p=0.001 n=7) Signbit-4 18.09n ± 0% 18.09n ± 0% ~ (p=0.761 n=7) Sin-4 389.8n ± 1% 389.8n ± 0% ~ (p=0.995 n=7) Sincos-4 416.0n ± 0% 415.9n ± 0% ~ (p=0.361 n=7) Sinh-4 634.6n ± 4% 585.6n ± 1% -7.72% (p=0.001 n=7) SqrtIndirect-4 8.035n ± 0% 8.036n ± 0% ~ (p=0.523 n=7) SqrtLatency-4 8.039n ± 0% 8.037n ± 0% ~ (p=0.218 n=7) SqrtIndirectLatency-4 8.040n ± 0% 8.040n ± 0% ~ (p=0.652 n=7) SqrtGoLatency-4 895.7n ± 0% 896.6n ± 0% +0.10% (p=0.004 n=7) SqrtPrime-4 5.406µ ± 0% 5.407µ ± 0% ~ (p=0.592 n=7) Tan-4 406.1n ± 0% 405.8n ± 1% ~ (p=0.435 n=7) Tanh-4 627.6n ± 0% 545.5n ± 0% -13.08% (p=0.001 n=7) Trunc-4 146.7n ± 1% 146.7n ± 0% ~ (p=0.755 n=7) Y0-4 1.359µ ± 0% 1.310µ ± 0% -3.61% (p=0.001 n=7) Y1-4 1.351µ ± 0% 1.301µ ± 0% -3.70% (p=0.001 n=7) Yn-4 2.829µ ± 0% 2.729µ ± 0% -3.53% (p=0.001 n=7) Float64bits-4 14.08n ± 0% 14.07n ± 0% ~ (p=0.069 n=7) Float64frombits-4 19.09n ± 0% 19.10n ± 0% ~ (p=0.755 n=7) Float32bits-4 13.06n ± 0% 13.07n ± 1% ~ (p=0.586 n=7) Float32frombits-4 13.06n ± 0% 13.06n ± 0% ~ (p=0.853 n=7) FMA-4 606.9n ± 0% 606.8n ± 0% ~ (p=0.393 n=7) geomean 201.1n 185.4n -7.81% Change-Id: I6d41a97ad3789ed5731588588859ac0b8b13b664 Reviewed-on: https://go-review.googlesource.com/c/go/+/484675 Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Than McIntosh <thanm@google.com>	2023-05-08 15:53:28 +00:00
Junxian Zhu	574431cfcd	math: optimize math.Abs on mips64x This commit optimized math.Abs function implementation on mips64x. Tested on loongson 3A2000. goos: linux goarch: mips64le pkg: math │ oldmath │ newmath │ │ sec/op │ sec/op vs base │ Acos-4 258.0n ± ∞ ¹ 257.1n ± ∞ ¹ -0.35% (p=0.008 n=5) Acosh-4 417.0n ± ∞ ¹ 377.9n ± ∞ ¹ -9.38% (p=0.008 n=5) Asin-4 248.0n ± ∞ ¹ 259.9n ± ∞ ¹ +4.80% (p=0.008 n=5) Asinh-4 439.6n ± ∞ ¹ 408.3n ± ∞ ¹ -7.12% (p=0.008 n=5) Atan-4 189.6n ± ∞ ¹ 188.8n ± ∞ ¹ ~ (p=0.056 n=5) Atanh-4 390.0n ± ∞ ¹ 356.4n ± ∞ ¹ -8.62% (p=0.008 n=5) Atan2-4 279.0n ± ∞ ¹ 263.9n ± ∞ ¹ -5.41% (p=0.008 n=5) Cbrt-4 314.2n ± ∞ ¹ 322.3n ± ∞ ¹ +2.58% (p=0.008 n=5) Ceil-4 139.7n ± ∞ ¹ 136.6n ± ∞ ¹ -2.22% (p=0.008 n=5) Compare-4 21.11n ± ∞ ¹ 21.09n ± ∞ ¹ ~ (p=0.405 n=5) Compare32-4 20.10n ± ∞ ¹ 20.12n ± ∞ ¹ ~ (p=0.206 n=5) Copysign-4 32.17n ± ∞ ¹ 35.71n ± ∞ ¹ +11.00% (p=0.008 n=5) Cos-4 222.8n ± ∞ ¹ 169.8n ± ∞ ¹ -23.79% (p=0.008 n=5) Cosh-4 550.2n ± ∞ ¹ 477.4n ± ∞ ¹ -13.23% (p=0.008 n=5) Erf-4 171.6n ± ∞ ¹ 174.5n ± ∞ ¹ ~ (p=0.635 n=5) Erfc-4 182.6n ± ∞ ¹ 170.2n ± ∞ ¹ -6.79% (p=0.008 n=5) Erfinv-4 177.6n ± ∞ ¹ 196.6n ± ∞ ¹ +10.70% (p=0.008 n=5) Erfcinv-4 177.8n ± ∞ ¹ 197.8n ± ∞ ¹ +11.25% (p=0.008 n=5) Exp-4 422.8n ± ∞ ¹ 382.1n ± ∞ ¹ -9.63% (p=0.008 n=5) ExpGo-4 416.1n ± ∞ ¹ 383.2n ± ∞ ¹ -7.91% (p=0.008 n=5) Expm1-4 232.9n ± ∞ ¹ 252.2n ± ∞ ¹ +8.29% (p=0.008 n=5) Exp2-4 404.8n ± ∞ ¹ 389.1n ± ∞ ¹ -3.88% (p=0.008 n=5) Exp2Go-4 407.0n ± ∞ ¹ 372.3n ± ∞ ¹ -8.53% (p=0.008 n=5) Abs-4 30.120n ± ∞ ¹ 3.014n ± ∞ ¹ -89.99% (p=0.008 n=5) Dim-4 5.021n ± ∞ ¹ 5.023n ± ∞ ¹ ~ (p=0.071 n=5) Floor-4 127.8n ± ∞ ¹ 127.1n ± ∞ ¹ -0.55% (p=0.008 n=5) Max-4 77.69n ± ∞ ¹ 76.33n ± ∞ ¹ -1.75% (p=0.008 n=5) Min-4 83.27n ± ∞ ¹ 77.87n ± ∞ ¹ -6.48% (p=0.008 n=5) Mod-4 906.2n ± ∞ ¹ 692.9n ± ∞ ¹ -23.54% (p=0.008 n=5) Frexp-4 150.6n ± ∞ ¹ 108.6n ± ∞ ¹ -27.89% (p=0.008 n=5) Gamma-4 418.4n ± ∞ ¹ 386.1n ± ∞ ¹ -7.72% (p=0.008 n=5) Hypot-4 148.20n ± ∞ ¹ 93.78n ± ∞ ¹ -36.72% (p=0.008 n=5) HypotGo-4 148.20n ± ∞ ¹ 94.47n ± ∞ ¹ -36.26% (p=0.008 n=5) Ilogb-4 135.50n ± ∞ ¹ 92.38n ± ∞ ¹ -31.82% (p=0.008 n=5) J0-4 937.7n ± ∞ ¹ 861.7n ± ∞ ¹ -8.10% (p=0.008 n=5) J1-4 915.4n ± ∞ ¹ 875.9n ± ∞ ¹ -4.32% (p=0.008 n=5) Jn-4 1.974µ ± ∞ ¹ 1.863µ ± ∞ ¹ -5.62% (p=0.008 n=5) Ldexp-4 158.5n ± ∞ ¹ 129.3n ± ∞ ¹ -18.42% (p=0.008 n=5) Lgamma-4 209.0n ± ∞ ¹ 211.8n ± ∞ ¹ ~ (p=0.095 n=5) Log-4 326.4n ± ∞ ¹ 295.2n ± ∞ ¹ -9.56% (p=0.008 n=5) Logb-4 147.7n ± ∞ ¹ 105.0n ± ∞ ¹ -28.91% (p=0.008 n=5) Log1p-4 303.4n ± ∞ ¹ 266.3n ± ∞ ¹ -12.23% (p=0.008 n=5) Log10-4 329.2n ± ∞ ¹ 298.3n ± ∞ ¹ -9.39% (p=0.008 n=5) Log2-4 187.4n ± ∞ ¹ 153.0n ± ∞ ¹ -18.36% (p=0.008 n=5) Modf-4 110.5n ± ∞ ¹ 103.5n ± ∞ ¹ -6.33% (p=0.008 n=5) Nextafter32-4 128.4n ± ∞ ¹ 121.5n ± ∞ ¹ -5.37% (p=0.016 n=5) Nextafter64-4 109.5n ± ∞ ¹ 110.5n ± ∞ ¹ +0.91% (p=0.008 n=5) PowInt-4 603.3n ± ∞ ¹ 516.4n ± ∞ ¹ -14.40% (p=0.008 n=5) PowFrac-4 1.365µ ± ∞ ¹ 1.183µ ± ∞ ¹ -13.33% (p=0.008 n=5) Pow10Pos-4 15.07n ± ∞ ¹ 15.07n ± ∞ ¹ ~ (p=0.738 n=5) Pow10Neg-4 21.11n ± ∞ ¹ 21.10n ± ∞ ¹ ~ (p=0.190 n=5) Round-4 44.23n ± ∞ ¹ 44.22n ± ∞ ¹ ~ (p=0.635 n=5) RoundToEven-4 50.25n ± ∞ ¹ 46.27n ± ∞ ¹ -7.92% (p=0.008 n=5) Remainder-4 675.6n ± ∞ ¹ 530.4n ± ∞ ¹ -21.49% (p=0.008 n=5) Signbit-4 17.07n ± ∞ ¹ 17.95n ± ∞ ¹ +5.16% (p=0.008 n=5) Sin-4 171.6n ± ∞ ¹ 189.1n ± ∞ ¹ +10.20% (p=0.008 n=5) Sincos-4 201.5n ± ∞ ¹ 200.5n ± ∞ ¹ ~ (p=0.421 n=5) Sinh-4 529.6n ± ∞ ¹ 484.6n ± ∞ ¹ -8.50% (p=0.008 n=5) SqrtIndirect-4 5.021n ± ∞ ¹ 5.023n ± ∞ ¹ +0.04% (p=0.048 n=5) SqrtLatency-4 8.032n ± ∞ ¹ 8.039n ± ∞ ¹ +0.09% (p=0.024 n=5) SqrtIndirectLatency-4 8.036n ± ∞ ¹ 8.038n ± ∞ ¹ ~ (p=0.056 n=5) SqrtGoLatency-4 338.8n ± ∞ ¹ 338.7n ± ∞ ¹ ~ (p=0.841 n=5) SqrtPrime-4 5.379µ ± ∞ ¹ 5.382µ ± ∞ ¹ +0.06% (p=0.048 n=5) Tan-4 182.7n ± ∞ ¹ 191.8n ± ∞ ¹ +4.98% (p=0.008 n=5) Tanh-4 558.7n ± ∞ ¹ 497.6n ± ∞ ¹ -10.94% (p=0.008 n=5) Trunc-4 122.5n ± ∞ ¹ 122.6n ± ∞ ¹ ~ (p=0.405 n=5) Y0-4 892.8n ± ∞ ¹ 851.7n ± ∞ ¹ -4.60% (p=0.008 n=5) Y1-4 887.2n ± ∞ ¹ 863.2n ± ∞ ¹ -2.71% (p=0.008 n=5) Yn-4 1.889µ ± ∞ ¹ 1.832µ ± ∞ ¹ -3.02% (p=0.008 n=5) Float64bits-4 13.05n ± ∞ ¹ 13.06n ± ∞ ¹ +0.08% (p=0.040 n=5) Float64frombits-4 13.05n ± ∞ ¹ 13.06n ± ∞ ¹ ~ (p=0.143 n=5) Float32bits-4 13.05n ± ∞ ¹ 13.06n ± ∞ ¹ +0.08% (p=0.008 n=5) Float32frombits-4 13.05n ± ∞ ¹ 13.08n ± ∞ ¹ +0.23% (p=0.016 n=5) FMA-4 445.7n ± ∞ ¹ 448.1n ± ∞ ¹ +0.54% (p=0.008 n=5) geomean 157.2n 142.8n -9.17% Change-Id: I9bf104848b588c9ecf79401a81d483d7fcdb0a79 Reviewed-on: https://go-review.googlesource.com/c/go/+/481575 Reviewed-by: M Zhuo <mzh@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Than McIntosh <thanm@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com>	2023-05-05 14:54:39 +00:00
Keith Randall	6b165577fe	cmd/compile: remove memequal call from string compares in more cases Add more rules to ensure that order doesn't matter. Add memequal 0 rule. Try to use a constant argument to memequal when one is available. Fixes #59684 Change-Id: I36e85ffbd949396ed700ed6e8ec2bc3ae013f5d2 Reviewed-on: https://go-review.googlesource.com/c/go/+/485535 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-04-18 21:31:33 +00:00
ruinan	9be533a8ee	cmd/compile: get more bounds info from logic operators in prove pass Currently, the prove pass can get knowledge from some specific logic operators only before the CFG is explored, which means that the bounds information of the branch will be ignored. This CL updates the facts table by the logic operators in every branch. Combined with the branch information, this will be helpful for BCE in some circumstances. Fixes #57243 Change-Id: I0bd164f1b47804ccfc37879abe9788740b016fd5 Reviewed-on: https://go-review.googlesource.com/c/go/+/419555 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2023-04-07 10:09:11 +00:00
erifan01	42f99b203d	cmd/compile: optimize cmp to cmn under conditions < and >= on arm64 Under the right conditions we can optimize cmp comparisons to cmn comparisons, such as: func foo(a, b int) int { var c int if a + b < 0 { c = 1 } return c } Previously it's compiled as: ADD R1, R0, R1 CMP $0, R1 CSET LT, R0 With this CL it's compiled as: CMN R1, R0 CSET MI, R0 Here we need to pay attention to the overflow situation of a+b, the MI flag means N==1, which doesn't honor the overflow flag V, its value depends only on the sign of the result. So it has the same semantic of the Go code, so it's correct. Similarly, this CL also optimizes the case of >= comparison using the PL conditional flag. Change-Id: I47179faba5b30cca84ea69bafa2ad5241bf6dfba Reviewed-on: https://go-review.googlesource.com/c/go/+/476116 Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-03-24 01:19:09 +00:00
erifan01	91a2e921dd	cmd/compile: fix incorrect truncating when converting CMP to TST on arm64 CL 420434 optimized CMP into TST in some situations, but it has a bug, these four rules are not correct: (LessThan (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessThan (TSTconst [c] y)) (LessEqual (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessEqual (TSTconst [c] y)) (GreaterThan (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterThan (TSTconst [c] y)) (GreaterEqual (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterEqual (TSTconst [c] y)) But due to the existence of this rule (LessThan (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessThan (TSTWconst [int32(c)] y)), the above rules have never been fired. This CL corrects them as: (LessThan (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessThan (TSTconst [c] y)) (LessEqual (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessEqual (TSTconst [c] y)) (GreaterThan (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterThan (TSTconst [c] y)) (GreaterEqual (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterEqual (TSTconst [c] y)) Change-Id: I7d60bcc9a266ee58388baeaab9f493b57cf1ad55 Reviewed-on: https://go-review.googlesource.com/c/go/+/473617 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com>	2023-03-22 08:32:53 +00:00

1 2 3 4 5 ...

446 Commits