qbit/go - go - Tape:neT

qbit/go

mirror of https://github.com/golang/go synced 2024-11-17 05:35:00 -07:00

Author	SHA1	Message	Date
Joel Sing	daa58db486	cmd/compile: improve rotations for riscv64 Enable canRotate for riscv64, enable rotation intrinsics and provide better rewrite implementations for rotations. By avoiding Lshx64 and RshUx64 we can produce better code, especially for 32 and 64 bit rotations. By enabling canRotate we also benefit from the generic rotation rewrite rules. Benchmark on a StarFive VisionFive 2: │ rotate.1 │ rotate.2 │ │ sec/op │ sec/op vs base │ RotateLeft-4 14.700n ± 0% 8.016n ± 0% -45.47% (p=0.000 n=10) RotateLeft8-4 14.70n ± 0% 10.69n ± 0% -27.28% (p=0.000 n=10) RotateLeft16-4 14.70n ± 0% 12.02n ± 0% -18.23% (p=0.000 n=10) RotateLeft32-4 13.360n ± 0% 8.016n ± 0% -40.00% (p=0.000 n=10) RotateLeft64-4 13.360n ± 0% 8.016n ± 0% -40.00% (p=0.000 n=10) geomean 14.15n 9.208n -34.92% Change-Id: I1a2036fdc57cf88ebb6617eb8d92e1d187e183b2 Reviewed-on: https://go-review.googlesource.com/c/go/+/560315 Reviewed-by: M Zhuo <mengzhuo1203@gmail.com> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>	2024-02-16 11:59:07 +00:00
Meng Zhuo	1400b26852	test/codegen: add float max/min codegen test As CL 514596 and CL 514775 adds hardware implement of float max/min, we should add codegen test for these two CL. Change-Id: I347331032fe9f67a2e6fdb5d3cfe20203296b81c Reviewed-on: https://go-review.googlesource.com/c/go/+/561295 Reviewed-by: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: M Zhuo <mengzhuo1203@gmail.com> Reviewed-by: David Chase <drchase@google.com>	2024-02-08 03:02:00 +00:00
Danil Timerbulatov	527829a7cb	all: remove newline characters after return statements This commit is aimed at improving the readability and consistency of the code base. Extraneous newline characters were present after some return statements, creating unnecessary separation in the code. Fixes #64610 Change-Id: Ic1b05bf11761c4dff22691c2f1c3755f66d341f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/548316 Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>	2023-12-14 17:22:18 +00:00
Joel Sing	70c7fb75e9	cmd/compile: correct code generation for right shifts on riscv64 The code generation on riscv64 will currently result in incorrect assembly when a 32 bit integer is right shifted by an amount that exceeds the size of the type. In particular, this occurs when an int32 or uint32 is cast to a 64 bit type and right shifted by a value larger than 31. Fix this by moving the SRAW/SRLW conversion into the right shift rules and removing the SignExt32to64/ZeroExt32to64. Add additional rules that rewrite to SRAIW/SRLIW when the shift is less than the size of the type, or replace/eliminate the shift when it exceeds the size of the type. Add SSA tests that would have caught this issue. Also add additional codegen tests to ensure that the resulting assembly is what we expect in these overflow cases. Fixes #64285 Change-Id: Ie97b05668597cfcb91413afefaab18ee1aa145ec Reviewed-on: https://go-review.googlesource.com/c/go/+/545035 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: M Zhuo <mzh@golangcn.org> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-12-01 19:30:59 +00:00
Keith Randall	bda1ef13f8	cmd/compile: fix memcombine pass for big endian, > 1 byte elements The shift amounts were wrong in this case, leading to miscompilation of load combining. Also the store combining was not triggering when it should. Fixes #64468 Change-Id: Iaeb08972c5fc1d6f628800334789c6af7216e87b Reviewed-on: https://go-review.googlesource.com/c/go/+/546355 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2023-11-30 18:35:50 +00:00
Matthew Dempsky	00715d089d	cmd/compile/internal/walk: copy SSA-able variables order.go ensures expressions that are passed to the runtime by address are in fact addressable. However, in the case of local variables, if the variable hasn't already been marked as addrtaken, then taking its address here will effectively prevent the variable from being converted to SSA form. Instead, it's better to just copy the variable into a new temporary, which we can pass by address instead. This ensures the original variable can still be converted to SSA form. Fixes #63332. Change-Id: I182376d98d419df8bf07c400d84c344c9b82c0fb Reviewed-on: https://go-review.googlesource.com/c/go/+/541715 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Matthew Dempsky <mdempsky@google.com>	2023-11-21 20:34:12 +00:00
Paul E. Murphy	773039ed5c	cmd/compile/internal/ssa: on PPC64, merge (CMPconst [0] (op ...)) more aggressively Generate the CC version of many opcodes whose result is compared against signed 0. The approach taken here works even if the opcode result is used in multiple places too. Add support for ADD, ADDconst, ANDN, SUB, NEG, CNTLZD, NOR conversions to their CC opcode variant. These are the most commonly used variants. Also, do not set clobberFlags of CNTLZD and CNTLZW, they do not clobber flags. This results in about 1% smaller text sections in kubernetes binaries, and no regressions in the crypto benchmarks. Change-Id: I9e0381944869c3774106bf348dead5ecb96dffda Reviewed-on: https://go-review.googlesource.com/c/go/+/538636 Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2023-11-13 22:12:32 +00:00
Paul E. Murphy	3128aeec87	cmd/internal/obj/ppc64: remove C_UCON optab matching class This optab matching rule was used to match signed 16 bit values shifted left by 16 bits. Unsigned 16 bit values greater than 0x7FFF<<16 were classified as C_U32CON which led to larger than necessary codegen. Instead, rewrite logical/arithmetic operations in the preprocessor pass to use the 16 bit shifted immediate operation (e.g ADDIS vs ADD). This simplifies the optab matching rules, while also minimizing codegen size for large unsigned values. Note, ADDIS sign-extends the constant argument, all others do not. For matching opcodes, this means: MOVD $is<<16,Rx becomes ADDIS $is,Rx or ORIS $is,Rx MOVW $is<<16,Rx becomes ADDIS $is,Rx ADD $is<<16,[Rx,]Ry becomes ADDIS $is[Rx,]Ry OR $is<<16,[Rx,]Ry becomes ORIS $is[Rx,]Ry XOR $is<<16,[Rx,]Ry becomes XORIS $is[Rx,]Ry Change-Id: I1a988d9f52517a04bb8dc2e41d7caf3d5fff867c Reviewed-on: https://go-review.googlesource.com/c/go/+/536735 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Heschi Kreinick <heschi@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2023-11-09 18:41:18 +00:00
Ubuntu	8fc043ccfa	cmd/compile: optimize right shifts of int32 on riscv64 The compiler is currently sign extending 32 bit signed integers to 64 bits before right shifting them using a 64 bit shift instruction. There's no need to do this as RISC-V has instructions for right shifting 32 bit signed values (sraw and sraiw) which sign extend the result of the shift to 64 bits. Change the compiler so that it uses sraw and sraiw for shifts of signed 32 bit integers reducing in most cases the number of instructions needed to perform the shift. Here are some examples of code sequences that are changed by this patch: int32(a) >> 2 before: sll x5,x10,0x20 sra x10,x5,0x22 after: sraw x10,x10,0x2 int32(v) >> int(s) before: sext.w x5,x10 sltiu x6,x11,64 add x6,x6,-1 or x6,x11,x6 sra x10,x5,x6 after: sltiu x5,x11,32 add x5,x5,-1 or x5,x11,x5 sraw x10,x10,x5 int32(v) >> (int(s) & 31) before: sext.w x5,x10 and x6,x11,63 sra x10,x5,x6 after: and x5,x11,31 sraw x10,x10,x5 int32(100) >> int(a) before: bltz x10,<target address calls runtime.panicshift> sltiu x5,x10,64 add x5,x5,-1 or x5,x10,x5 li x6,100 sra x10,x6,x5 after: bltz x10,<target address calls runtime.panicshift> sltiu x5,x10,32 add x5,x5,-1 or x5,x10,x5 li x6,100 sraw x10,x6,x5 int32(v) >> (int(s) & 63) before: sext.w x5,x10 and x6,x11,63 sra x10,x5,x6 after: and x5,x11,63 sltiu x6,x5,32 add x6,x6,-1 or x5,x5,x6 sraw x10,x10,x5 In most cases we eliminate one instruction. In the case where we shift a int32 constant by a variable the number of instructions generated is identical. A sra is simply replaced by a sraw. In the unusual case where we shift right by a variable anded with a constant > 31 but < 64, we generate two additional instructions. As this is an unusual case we do not try to optimize for it. Some improvements can be seen in some of the existing benchmarks, notably in the utf8 package which performs right shifts of runes which are signed 32 bit integers. \| utf8-old \| utf8-new \| \| sec/op \| sec/op vs base \| EncodeASCIIRune-4 17.68n ± 0% 17.67n ± 0% ~ (p=0.312 n=10) EncodeJapaneseRune-4 35.34n ± 0% 34.53n ± 1% -2.31% (p=0.000 n=10) AppendASCIIRune-4 3.213n ± 0% 3.213n ± 0% ~ (p=0.318 n=10) AppendJapaneseRune-4 36.14n ± 0% 35.35n ± 0% -2.19% (p=0.000 n=10) DecodeASCIIRune-4 28.11n ± 0% 27.36n ± 0% -2.69% (p=0.000 n=10) DecodeJapaneseRune-4 38.55n ± 0% 38.58n ± 0% ~ (p=0.612 n=10) Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2 Reviewed-on: https://go-review.googlesource.com/c/go/+/535115 Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com>	2023-10-30 14:47:06 +00:00
Dmitri Shuralyov	b2fd76ab8d	test: migrate remaining files to go:build syntax Most of the test cases in the test directory use the new go:build syntax already. Convert the rest. In general, try to place the build constraint line below the test directive comment in more places. For #41184. For #60268. Change-Id: I11c41a0642a8a26dc2eda1406da908645bbc005b Cq-Include-Trybots: luci.golang.try:gotip-linux-386-longtest,gotip-linux-amd64-longtest,gotip-windows-amd64-longtest Reviewed-on: https://go-review.googlesource.com/c/go/+/536236 Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2023-10-19 23:33:25 +00:00
Paul E. Murphy	1c5862cc0a	test/codegen: fix PPC64 AddLargeConst test Commit `061d77cb70` was published in parallel with another commit `36ecff0893` which changed how certain constants were generated. Update the test to account for the changes. Change-Id: I314b735a34857efa02392b7a0dd9fd634e4ee428 Reviewed-on: https://go-review.googlesource.com/c/go/+/536256 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Paul Murphy <murp@ibm.com> Reviewed-by: Ian Lance Taylor <iant@google.com>	2023-10-19 00:40:54 +00:00
Paul E. Murphy	061d77cb70	cmd/compile/internal/ssa: on PPC64, generate large constant paddi This is only supported power10/linux/PPC64. This generates smaller, faster code by merging a pli + add into paddi. Change-Id: I1f4d522fce53aea4c072713cc119a9e0d7065acc Reviewed-on: https://go-review.googlesource.com/c/go/+/531717 Run-TryBot: Paul Murphy <murp@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2023-10-18 18:04:48 +00:00
Lynn Boger	80834af206	cmd/compile: avoid ANDCCconst on PPC64 if condition not needed In the PPC64 ISA, the instruction to do an 'and' operation using an immediate constant is only available in the form that also sets CR0 (i.e. clobbers the condition register.) This means CR0 is being clobbered unnecessarily in many cases. That affects some decisions made during some compiler passes that check for it. In those cases when the constant used by the ANDCC is a right justified consecutive set of bits, a shift instruction can be used which has the same effect if CR0 does not need to be set. The rule to do that has been added to the late rules file after other rules using ANDCCconst have been processed in the main rules file. Some codegen tests had to be updated since ANDCC is no longer generated for some cases. A new test case was added to verify the ANDCC is present if the results for both the AND and CR0 are used. Change-Id: I304f607c039a458e2d67d25351dd00aea72ba542 Reviewed-on: https://go-review.googlesource.com/c/go/+/531435 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>	2023-10-18 15:56:53 +00:00
Keith Randall	657c885fb9	cmd/compile: when combining stores, use line number of first store var p *[2]uint32 = ... p[0] = 0 p[1] = 0 When we combine these two 32-bit stores into a single 64-bit store, use the line number of the first store, not the second one. This differs from the default behavior because usually with the combining that the compiler does, we use the line number of the last instruction in the combo (e.g. load+add, we use the line number of the add). This is the same behavior that gcc does in C (picking the line number of the first of a set of combined stores). Change-Id: Ie70bf6151755322d33ecd50e4d9caf62f7881784 Reviewed-on: https://go-review.googlesource.com/c/go/+/521678 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>	2023-10-12 18:09:26 +00:00
Keith Randall	e0948d825d	cmd/compile: use type hash from itab field instead of type field It is one less dependent load away, and right next to another field in the itab we also load as part of the type switch or type assert. Change-Id: If7aaa7814c47bd79a6c7ed4232ece0bc1d63550e Reviewed-on: https://go-review.googlesource.com/c/go/+/533117 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2023-10-09 18:39:50 +00:00
Keith Randall	afd7c15c7f	cmd/compile: use cache in front of convI2I This is the last of the getitab users to receive a cache. We should now no longer see getitab (and callees) in profiles. Hopefully. Change-Id: I2ed72b9943095bbe8067c805da7f08e00706c98c Reviewed-on: https://go-review.googlesource.com/c/go/+/531055 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2023-10-09 17:28:22 +00:00
Mark Ryan	561bf0457f	cmd/compile: optimize right shifts of uint32 on riscv The compiler is currently zero extending 32 bit unsigned integers to 64 bits before right shifting them using a 64 bit shift instruction. There's no need to do this as RISC-V has instructions for right shifting 32 bit unsigned values (srlw and srliw) which zero extend the result of the shift to 64 bits. Change the compiler so that it uses srlw and srliw for 32 bit unsigned shifts reducing in most cases the number of instructions needed to perform the shift. Here are some examples of code sequences that are changed by this patch: uint32(a) >> 2 before: sll x5,x10,0x20 srl x10,x5,0x22 after: srlw x10,x10,0x2 uint32(a) >> int(b) before: sll x5,x10,0x20 srl x5,x5,0x20 srl x5,x5,x11 sltiu x6,x11,64 neg x6,x6 and x10,x5,x6 after: srlw x5,x10,x11 sltiu x6,x11,32 neg x6,x6 and x10,x5,x6 bits.RotateLeft32(uint32(a), 1) before: sll x5,x10,0x1 sll x6,x10,0x20 srl x7,x6,0x3f or x5,x5,x7 after: sll x5,x10,0x1 srlw x6,x10,0x1f or x10,x5,x6 bits.RotateLeft32(uint32(a), int(b)) before: and x6,x11,31 sll x7,x10,x6 sll x8,x10,0x20 srl x8,x8,0x20 add x6,x6,-32 neg x6,x6 srl x9,x8,x6 sltiu x6,x6,64 neg x6,x6 and x6,x9,x6 or x6,x6,x7 after: and x5,x11,31 sll x6,x10,x5 add x5,x5,-32 neg x5,x5 srlw x7,x10,x5 sltiu x5,x5,32 neg x5,x5 and x5,x7,x5 or x10,x6,x5 The one regression observed is the following case, an unbounded right shift of a uint32 where the value we're shifting by is known to be < 64 but > 31. As this is an unusual case this commit does not optimize for it, although the existing code does. uint32(a) >> (b & 63) before: sll x5,x10,0x20 srl x5,x5,0x20 and x6,x11,63 srl x10,x5,x6 after and x5,x11,63 srlw x6,x10,x5 sltiu x5,x5,32 neg x5,x5 and x10,x6,x5 Here we have one extra instruction. Some benchmark highlights, generated on a VisionFive2 8GB running Ubuntu 23.04. pkg: math/bits LeadingZeros32-4 18.64n ± 0% 17.32n ± 0% -7.11% (p=0.000 n=10) LeadingZeros64-4 15.47n ± 0% 15.51n ± 0% +0.26% (p=0.027 n=10) TrailingZeros16-4 18.48n ± 0% 17.68n ± 0% -4.33% (p=0.000 n=10) TrailingZeros32-4 16.87n ± 0% 16.07n ± 0% -4.74% (p=0.000 n=10) TrailingZeros64-4 15.26n ± 0% 15.27n ± 0% +0.07% (p=0.043 n=10) OnesCount32-4 20.08n ± 0% 19.29n ± 0% -3.96% (p=0.000 n=10) RotateLeft-4 8.864n ± 0% 8.838n ± 0% -0.30% (p=0.006 n=10) RotateLeft32-4 8.837n ± 0% 8.032n ± 0% -9.11% (p=0.000 n=10) Reverse32-4 29.77n ± 0% 26.52n ± 0% -10.93% (p=0.000 n=10) ReverseBytes32-4 9.640n ± 0% 8.838n ± 0% -8.32% (p=0.000 n=10) Sub32-4 8.835n ± 0% 8.035n ± 0% -9.06% (p=0.000 n=10) geomean 11.50n 11.33n -1.45% pkg: crypto/md5 Hash8Bytes-4 1.486µ ± 0% 1.426µ ± 0% -4.04% (p=0.000 n=10) Hash64-4 2.079µ ± 0% 1.968µ ± 0% -5.36% (p=0.000 n=10) Hash128-4 2.720µ ± 0% 2.557µ ± 0% -5.99% (p=0.000 n=10) Hash256-4 3.996µ ± 0% 3.733µ ± 0% -6.58% (p=0.000 n=10) Hash512-4 6.541µ ± 0% 6.072µ ± 0% -7.18% (p=0.000 n=10) Hash1K-4 11.64µ ± 0% 10.75µ ± 0% -7.58% (p=0.000 n=10) Hash8K-4 82.95µ ± 0% 76.32µ ± 0% -7.99% (p=0.000 n=10) Hash1M-4 10.436m ± 0% 9.591m ± 0% -8.10% (p=0.000 n=10) Hash8M-4 83.50m ± 0% 76.73m ± 0% -8.10% (p=0.000 n=10) Hash8BytesUnaligned-4 1.494µ ± 0% 1.434µ ± 0% -4.02% (p=0.000 n=10) Hash1KUnaligned-4 11.64µ ± 0% 10.76µ ± 0% -7.52% (p=0.000 n=10) Hash8KUnaligned-4 83.01µ ± 0% 76.32µ ± 0% -8.07% (p=0.000 n=10) geomean 28.32µ 26.42µ -6.72% Change-Id: I20483a6668cca1b53fe83944bee3706aadcf8693 Reviewed-on: https://go-review.googlesource.com/c/go/+/528975 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-10-07 12:31:38 +00:00
David Chase	b72bbaebf9	cmd/compile: expand calls cleanup Convert expand calls into a smaller number of focused recursive rewrites, and rely on an enhanced version of "decompose" to clean up afterwards. Debugging information seems to emerge intact. Change-Id: Ic46da4207e3a4da5c8e2c47b637b0e35abbe56bb Reviewed-on: https://go-review.googlesource.com/c/go/+/507295 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-10-06 20:57:33 +00:00
Keith Randall	cbcf8efa5f	cmd/compile: use cache in front of type assert runtime call That way we don't need to call into the runtime for every type assertion (to an interface type). name old time/op new time/op delta TypeAssert-24 3.78ns ± 3% 1.00ns ± 1% -73.53% (p=0.000 n=10+8) Change-Id: I0ba308aaf0f24a5495b4e13c814d35af0c58bfde Reviewed-on: https://go-review.googlesource.com/c/go/+/529316 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>	2023-10-06 17:02:53 +00:00
Keith Randall	39263f34a3	cmd/compile: add a cache to interface type switches That way we don't need to call into the runtime when the type being switched on has been seen many times before. The cache is just a hash table of a sample of all the concrete types that have been switched on at that source location. We record the matching case number and the resulting itab for each concrete input type. The caches seldom get large. The only two in a run of all.bash that get more than 100 entries, even with the sampling rate set to 1, are test/fixedbugs/issue29264.go, with 101 test/fixedbugs/issue29312.go, with 254 Both happen at the type switch in fmt.(*pp).handleMethods, perhaps unsurprisingly. name old time/op new time/op delta SwitchInterfaceTypePredictable-24 25.8ns ± 2% 2.5ns ± 3% -90.43% (p=0.000 n=10+10) SwitchInterfaceTypeUnpredictable-24 37.5ns ± 2% 11.2ns ± 1% -70.02% (p=0.000 n=10+10) Change-Id: I4961ac9547b7f15b03be6f55cdcb972d176955eb Reviewed-on: https://go-review.googlesource.com/c/go/+/526658 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Keith Randall <khr@google.com>	2023-10-06 15:44:08 +00:00
Keith Randall	28f4ea16a2	cmd/compile: improve interface type switches For type switches where the targets are interface types, call into the runtime once instead of doing a sequence of assert* calls. name old time/op new time/op delta SwitchInterfaceTypePredictable-24 26.6ns ± 1% 25.8ns ± 2% -2.86% (p=0.000 n=10+10) SwitchInterfaceTypeUnpredictable-24 39.3ns ± 1% 37.5ns ± 2% -4.57% (p=0.000 n=10+10) Not super helpful by itself, but this code organization allows followon CLs that add caching to the lookups. Change-Id: I7967f85a99171faa6c2550690e311bea8b54b01c Reviewed-on: https://go-review.googlesource.com/c/go/+/526657 Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com>	2023-10-06 15:42:30 +00:00
Paul E. Murphy	dcd018b5c5	cmd/internal/obj/ppc64: generate MOVD mask constants in register Add a new form of RLDC which maps directly to the ISA definition of rldc: RLDC Rs, $sh, $mb, Ra. This is used to generate mask constants described below. Using MOVD $-1, Rx; RLDC Rx, $sh, $mb, Rx, any mask constant can be generated. A mask is a contiguous series of 1 bits, which may wrap. Change-Id: Ifcaae1114080ad58b5fdaa3e5fc9019e2051f282 Reviewed-on: https://go-review.googlesource.com/c/go/+/531120 Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-10-05 14:03:32 +00:00
Paul E. Murphy	36ecff0893	cmd/internal/obj/ppc64: generate small, shifted constants in register Check for shifted 16b constants, and transform them to avoid the load penalty. This should be much faster than loading, and reduce binary size by reducing the constant pool size. Change-Id: I6834e08be7ca88e3b77449d226d08d199db84299 Reviewed-on: https://go-review.googlesource.com/c/go/+/531119 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2023-10-04 19:10:19 +00:00
Paul E. Murphy	c8caad423c	cmd/compile/internal/ssa: optimize (AND (MOVDconst [-1] x)) on PPC64 This sequence can show up in the lowering pass on PPC64. If it makes it to the latelower pass, it will cause an error because it looks like it can be turned into RLDICL, but -1 isn't an accepted mask. Also, print more debug info if panic is called from encodePPC64RotateMask. Fixes #62698 Change-Id: I0f3322e2205357abe7fc28f96e05e3f7ad65567c Reviewed-on: https://go-review.googlesource.com/c/go/+/529195 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-09-22 14:09:29 +00:00
eric fang	ace1494d92	cmd/compile: optimize absorbing InvertFlags into Noov comparisons for arm64 Previously (LessThanNoov (InvertFlags x)) is lowered as: CSET CSET BIC With this CL it's lowered as: CSET CSEL This saves one instruction. Similarly (GreaterEqualNoov (InvertFlags x)) is now lowered as: CSET CSINC $ benchstat old.bench new.bench goos: linux goarch: arm64 │ old.bench │ new.bench │ │ sec/op │ sec/op vs base │ InvertLessThanNoov-160 2.249n ± 2% 2.190n ± 1% -2.62% (p=0.003 n=10) Change-Id: Idd8979b7f4fe466e74b1a201c4aba7f1b0cffb0b Reviewed-on: https://go-review.googlesource.com/c/go/+/526237 Reviewed-by: Heschi Kreinick <heschi@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-09-21 02:36:06 +00:00
Yi Yang	4ee1d542ed	cmd/compile: sparse conditional constant propagation sparse conditional constant propagation can discover optimization opportunities that cannot be found by just combining constant folding and constant propagation and dead code elimination separately. This is a re-submit of PR#59575, which fix a broken dominance relationship caught by ssacheck Updates https://github.com/golang/go/issues/59399 Change-Id: I57482dee38f8e80a610aed4f64295e60c38b7a47 GitHub-Last-Rev: `830016f24e` GitHub-Pull-Request: golang/go#60469 Reviewed-on: https://go-review.googlesource.com/c/go/+/498795 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2023-09-12 21:01:50 +00:00
Paul E. Murphy	5cdb132228	cmd/compile/internal/ssa: improve masking codegen on PPC64 Generate RLDIC[LR] instead of MOVD mask, Rx; AND Rx, Ry, Rz. This helps reduce code size, and reduces the latency caused by the constant load. Similarly, for smaller-than-register values, truncate constants which exceed the range of the value's type to avoid needing to load a constant. Change-Id: I6019684795eb8962d4fd6d9585d08b17c15e7d64 Reviewed-on: https://go-review.googlesource.com/c/go/+/515576 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-09-06 16:34:20 +00:00
Keith Randall	4711299661	cmd/compile: use jump tables for large type switches For large interface -> concrete type switches, we can use a jump table on some bits of the type hash instead of a binary search on the type hash. name old time/op new time/op delta SwitchTypePredictable-24 1.99ns ± 2% 1.78ns ± 5% -10.87% (p=0.000 n=10+10) SwitchTypeUnpredictable-24 11.0ns ± 1% 9.1ns ± 2% -17.55% (p=0.000 n=7+9) Change-Id: Ida4768e5d62c3ce1c2701288b72664aaa9e64259 Reviewed-on: https://go-review.googlesource.com/c/go/+/521497 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2023-08-23 00:30:54 +00:00
Keith Randall	556e9c5f3e	cmd/compile: allow non-pointer writes in the middle of a write barrier This lets us combine more write barriers, getting rid of some of the test+branch and gcWriteBarrier* calls. With the new write barriers, it's easy to add a few non-pointer writes to the set of values written. We allow up to 2 non-pointer writes between pointer writes. This is enough for, for example, adjacent slice fields. Fixes #62126 Change-Id: I872d0fa9cc4eb855e270ffc0223b39fde1723c4b Reviewed-on: https://go-review.googlesource.com/c/go/+/521498 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>	2023-08-23 00:16:06 +00:00
Meng Zhuo	63ab68ddc5	cmd/compile: add single-precision FMA code generation for riscv64 This CL adds FMADDS,FMSUBS,FNMADDS,FNMSUBS SSA support for riscv Change-Id: I1e7dd322b46b9e0f4923dbba256303d69ed12066 Reviewed-on: https://go-review.googlesource.com/c/go/+/506616 Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: M Zhuo <mzh@golangcn.org>	2023-08-22 12:05:36 +00:00
Meng Zhuo	05f9511582	cmd/compile: improve FP FMA performance on riscv64 FMADD/FMSUB/FNSUB are an efficient FP FMA instructions, which can be used by the compiler to improve FP performance. Erf 188.0n ± 2% 139.5n ± 2% -25.82% (p=0.000 n=10) Erfc 193.6n ± 1% 143.2n ± 1% -26.01% (p=0.000 n=10) Erfinv 244.4n ± 2% 172.6n ± 0% -29.40% (p=0.000 n=10) Erfcinv 244.7n ± 2% 173.0n ± 1% -29.31% (p=0.000 n=10) geomean 216.0n 156.3n -27.65% Ref: The RISC-V Instruction Set Manual Volume I: Unprivileged ISA 11.6 Single-Precision Floating-Point Computational Instructions Change-Id: I89aa3a4df7576fdd47f4a6ee608ac16feafd093c Reviewed-on: https://go-review.googlesource.com/c/go/+/506036 Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-08-22 08:38:08 +00:00
Keith Randall	0b47b94a62	cmd/compile: remove more extension ops when not needed If we're not using the upper bits, don't bother issuing a sign/zero extension operation. For arm64, after CL 520916 which fixed a correctness bug with extensions but as a side effect leaves many unnecessary ones still in place. Change-Id: I5f4fe4efbf2e9f80969ab5b9a6122fb812dc2ec0 Reviewed-on: https://go-review.googlesource.com/c/go/+/521496 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-08-21 20:52:15 +00:00
Keith Randall	611706b171	cmd/compile: don't use BTS when OR works, add direct memory BTS operations Stop using BTSconst and friends when ORLconst can be used instead. OR can be issued by more function units than BTS can, so it could lead to better IPC. OR might take a few more bytes to encode, but not a lot more. Still use BTSconst for cases where the constant otherwise wouldn't fit and would require a separate movabs instruction to materialize the constant. This happens when setting bits 31-63 of 64-bit targets. Add BTS-to-memory operations so we don't need to load/bts/store. Fixes #61694 Change-Id: I00379608df8fb0167cb01466e97d11dec7c1596c Reviewed-on: https://go-review.googlesource.com/c/go/+/515755 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-08-04 16:40:24 +00:00
Jorropo	bac4e2f241	cmd/compile: try to rewrite loops to count down Fixes #61629 This reduce the pressure on regalloc because then the loop only keep alive one value (the iterator) instead of the iterator and the upper bound since the comparison now acts against an immediate, often zero which can be skipped. This optimize things like: for i := 0; i < n; i++ { Or a range over a slice where the index is not used: for _, v := range someSlice { Or the new range over int from #61405: for range n { It is hit in 975 unique places while doing ./make.bash. Change-Id: I5facff8b267a0b60ea3c1b9a58c4d74cdb38f03f Reviewed-on: https://go-review.googlesource.com/c/go/+/512935 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>	2023-07-31 18:33:29 +00:00
Junxian Zhu	b1474672c6	cmd/compile: intrinsify Sub64 on mips64 This CL intrinsify Sub64 on mips64. pkg: math/bits _ sec/op _ sec/op vs base _ Sub-4 2.849n _ 0% 1.948n _ 0% -31.64% (p=0.000 n=8) Sub32-4 3.447n _ 0% 3.446n _ 0% ~ (p=0.982 n=8) Sub64-4 2.815n _ 0% 1.948n _ 0% -30.78% (p=0.000 n=8) Sub64multiple-4 6.124n _ 0% 3.340n _ 0% -45.46% (p=0.000 n=8) Change-Id: Ibba91a4350e4a549ae0b60d8cafc4bca05034b84 Reviewed-on: https://go-review.googlesource.com/c/go/+/498497 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2023-07-31 03:59:48 +00:00
Junxian Zhu	5f8a2fdf09	cmd/compile: intrinsify Add64 on mips64 This CL intrinsify Add64 on mips64. pkg: math/bits _ sec/op _ sec/op vs base _ Add64-4 2.783n _ 0% 1.950n _ 0% -29.93% (p=0.000 n=8) Add64multiple-4 5.713n _ 0% 3.063n _ 0% -46.38% (p=0.000 n=8) pkg: crypto/elliptic _ sec/op _ sec/op vs base _ ScalarBaseMult/P256-4 353.7_ _ 0% 282.7_ _ 0% -20.09% (p=0.000 n=8) ScalarBaseMult/P224-4 330.5_ _ 0% 250.0_ _ 0% -24.37% (p=0.000 n=8) ScalarBaseMult/P384-4 1228.8_ _ 0% 791.5_ _ 0% -35.59% (p=0.000 n=8) ScalarBaseMult/P521-4 15.412m _ 0% 2.438m _ 0% -84.18% (p=0.000 n=8) ScalarMult/P256-4 1189.4_ _ 0% 904.2_ _ 0% -23.98% (p=0.000 n=8) ScalarMult/P224-4 1138.8_ _ 0% 813.8_ _ 0% -28.54% (p=0.000 n=8) ScalarMult/P384-4 4.419m _ 0% 2.692m _ 0% -39.08% (p=0.000 n=8) ScalarMult/P521-4 59.768m _ 0% 8.773m _ 0% -85.32% (p=0.000 n=8) MarshalUnmarshal/P256/Uncompressed-4 8.697_ _ 1% 7.923_ _ 1% -8.91% (p=0.000 n=8) MarshalUnmarshal/P256/Compressed-4 104.75_ _ 0% 66.29_ _ 0% -36.72% (p=0.000 n=8) MarshalUnmarshal/P224/Uncompressed-4 8.728_ _ 1% 7.823_ _ 1% -10.37% (p=0.000 n=8) MarshalUnmarshal/P224/Compressed-4 1035.7_ _ 0% 676.5_ _ 2% -34.69% (p=0.000 n=8) MarshalUnmarshal/P384/Uncompressed-4 15.32_ _ 1% 11.81_ _ 1% -22.90% (p=0.000 n=8) MarshalUnmarshal/P384/Compressed-4 399.8_ _ 0% 217.4_ _ 0% -45.62% (p=0.000 n=8) MarshalUnmarshal/P521/Uncompressed-4 96.79_ _ 0% 20.32_ _ 0% -79.01% (p=0.000 n=8) MarshalUnmarshal/P521/Compressed-4 6640.4_ _ 0% 790.8_ _ 0% -88.09% (p=0.000 n=8) Change-Id: I8a0960b9665720c1d3e57dce36386e74db37fefa Reviewed-on: https://go-review.googlesource.com/c/go/+/498496 Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@golang.org>	2023-07-31 03:58:42 +00:00
Keith Randall	67983c0f78	cmd/compile: add indexed SET* opcodes for amd64 Update #61356 Change-Id: I391af98563b1c068208784c80ea736c78c29639d Reviewed-on: https://go-review.googlesource.com/c/go/+/510435 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2023-07-26 17:19:57 +00:00
Keith Randall	505e50b1e3	cmd/compile: get rid of special case in scheduler for entry block It isn't needed. Fixes #61356 Change-Id: Ib466a3eac90c3ea57888cf40c294513033fc6118 Reviewed-on: https://go-review.googlesource.com/c/go/+/509856 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2023-07-26 17:19:14 +00:00
Keith Randall	d0964e172b	cmd/compile: optimize s==s for strings s==s is always true for strings. This comes up in NaN testing in generic code, where we want x==x to compile completely away except for float types. Fixes #60777 Change-Id: I3ce054b5121354de2f9751b010fb409f148cb637 Reviewed-on: https://go-review.googlesource.com/c/go/+/503795 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org>	2023-07-26 17:17:28 +00:00
Keith Randall	e713d6f939	cmd/compile: memcombine if values being stored are from consecutive loads If we load 2 values and then store those 2 loaded values, we can likely perform that operation with a single wider load and store. Fixes #60709 Change-Id: Ifc5f92c2f1b174c6ed82a69070f16cec6853c770 Reviewed-on: https://go-review.googlesource.com/c/go/+/502295 Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2023-07-21 19:06:53 +00:00
Jes Cok	5d481abc87	all: fix typos Change-Id: I510b0a4bf3472d937393800dd57472c30beef329 GitHub-Last-Rev: `8d289b73a3` GitHub-Pull-Request: golang/go#60960 Reviewed-on: https://go-review.googlesource.com/c/go/+/505398 Auto-Submit: Robert Findley <rfindley@google.com> Reviewed-by: Robert Findley <rfindley@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Robert Findley <rfindley@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>	2023-07-18 19:55:29 +00:00
Paul E. Murphy	5e4000ad7f	cmd/compile: on PPC64, fix sign/zero extension when masking (ANDCCconst [y] (MOV.*reg x)) should only be merged when zero extending. Otherwise, sign bits are lost on negative values. (ANDCCconst [0xFF] (MOVBreg x)) should be simplified to a zero extension of x. Likewise for the MOVHreg variant. Fixes #61297 Change-Id: I04e4fd7dc6a826e870681f37506620d48393698b Reviewed-on: https://go-review.googlesource.com/c/go/+/508775 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-07-12 16:34:02 +00:00
Meng Zhuo	3fce111535	cmd/compile: fix FMA negative commutativity of riscv64 According to RISCV manual 11.6: FMADD x,y,z computes xy+z and FNMADD x,y,z => -xy-z FMSUB x,y,z => xy-z FNMSUB x,y,z => -xy+z respectively However our implement of SSA convert FMADD -x,y,z to FNMADD x,y,z which is wrong and should be convert to FNMSUB according to manual. Change-Id: Ib297bc83824e121fd7dda171ed56ea9694a4e575 Reviewed-on: https://go-review.googlesource.com/c/go/+/506575 Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-07-05 22:05:44 +00:00
Meng Zhuo	5c1a15df41	test/codegen: enable Mul2 DivPow2 test for riscv64 Change-Id: Ice0bb7a665599b334e927a1b00d1a5b400c15e3d Reviewed-on: https://go-review.googlesource.com/c/go/+/506035 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org>	2023-07-04 13:33:45 +00:00
Meng Zhuo	b7e7467865	test/codegen: add fsqrt test for riscv64 Add FSQRTD FSQRTS codegen tests for riscv64 Change-Id: I16ca3753ad1ba37afbd9d0f887b078e33f98fda0 Reviewed-on: https://go-review.googlesource.com/c/go/+/503275 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-06-15 15:16:20 +00:00
Keith Randall	c643b29381	cmd/compile: use callsite as line number for argument marshaling Don't use the line number of the argument itself, as that may be from arbitrarily earlier in the function. Fixes #60673 Change-Id: Ifc0a2aaae221a256be3a4b0b2e04849bae4b79d7 Reviewed-on: https://go-review.googlesource.com/c/go/+/502656 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>	2023-06-12 20:34:37 +00:00
Austin Clements	c2e0bf0abf	cmd/internal/testdir: pass if GOEXPERIMENT=cgocheck2 is set Some testdir tests fail if GOEXPERIMENT=cgocheck2 is set. Fix this by skipping these tests. Change-Id: I58d4ef0cceb86bcf93220b4a44de9b9dc4879b16 Reviewed-on: https://go-review.googlesource.com/c/go/+/499675 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2023-06-01 18:30:44 +00:00
Bryan Mills	02d234e34d	Revert "cmd/compile: sparse conditional constant propagation" This reverts CL 483875. Reason for revert: appears to cause internal compiler errors on the ssacheck builder. Change-Id: I662418384291470c1962c417797a5890dd9aa7a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/497855 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Bryan Mills <bcmills@google.com>	2023-05-24 14:39:34 +00:00
Junxian Zhu	f0d575c266	cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on mips64x This CL use MFC1/MTC1 instructions to move data between GPR and FPR instead of stores and loads to move float/int values. goos: linux goarch: mips64le pkg: math │ oldmath │ newmath │ │ sec/op │ sec/op vs base │ Acos-4 258.2n ± 0% 258.2n ± 0% ~ (p=0.859 n=8) Acosh-4 378.7n ± 0% 323.9n ± 0% -14.47% (p=0.000 n=8) Asin-4 255.1n ± 2% 255.5n ± 0% +0.16% (p=0.002 n=8) Asinh-4 407.1n ± 0% 348.7n ± 0% -14.35% (p=0.000 n=8) Atan-4 189.5n ± 0% 189.9n ± 3% ~ (p=0.205 n=8) Atanh-4 355.6n ± 0% 323.4n ± 2% -9.03% (p=0.000 n=8) Atan2-4 284.1n ± 7% 280.1n ± 4% ~ (p=0.313 n=8) Cbrt-4 314.3n ± 0% 236.4n ± 0% -24.79% (p=0.000 n=8) Ceil-4 144.3n ± 3% 139.6n ± 0% ~ (p=0.069 n=8) Compare-4 21.100n ± 0% 7.035n ± 0% -66.66% (p=0.000 n=8) Compare32-4 20.100n ± 0% 6.030n ± 0% -70.00% (p=0.000 n=8) Copysign-4 34.970n ± 0% 6.221n ± 0% -82.21% (p=0.000 n=8) Cos-4 183.4n ± 3% 184.1n ± 5% ~ (p=0.159 n=8) Cosh-4 487.9n ± 2% 419.6n ± 0% -14.00% (p=0.000 n=8) Erf-4 160.6n ± 0% 157.9n ± 0% -1.68% (p=0.009 n=8) Erfc-4 183.7n ± 4% 169.8n ± 0% -7.54% (p=0.000 n=8) Erfinv-4 191.5n ± 4% 183.6n ± 0% -4.13% (p=0.023 n=8) Erfcinv-4 192.0n ± 7% 184.3n ± 0% ~ (p=0.425 n=8) Exp-4 398.2n ± 0% 340.1n ± 4% -14.58% (p=0.000 n=8) ExpGo-4 383.3n ± 0% 327.3n ± 0% -14.62% (p=0.000 n=8) Expm1-4 248.7n ± 5% 216.0n ± 0% -13.11% (p=0.000 n=8) Exp2-4 372.8n ± 0% 316.9n ± 3% -14.98% (p=0.000 n=8) Exp2Go-4 374.1n ± 0% 320.5n ± 0% -14.33% (p=0.000 n=8) Abs-4 3.013n ± 0% 3.016n ± 0% +0.10% (p=0.020 n=8) Dim-4 5.021n ± 0% 5.022n ± 0% ~ (p=0.270 n=8) Floor-4 127.5n ± 4% 126.2n ± 3% ~ (p=0.186 n=8) Max-4 72.32n ± 0% 61.33n ± 0% -15.20% (p=0.000 n=8) Min-4 83.33n ± 1% 61.36n ± 0% -26.37% (p=0.000 n=8) Mod-4 690.7n ± 0% 454.5n ± 0% -34.20% (p=0.000 n=8) Frexp-4 116.30n ± 1% 71.80n ± 1% -38.26% (p=0.000 n=8) Gamma-4 389.0n ± 0% 355.9n ± 1% -8.48% (p=0.000 n=8) Hypot-4 102.40n ± 0% 83.90n ± 0% -18.07% (p=0.000 n=8) HypotGo-4 105.45n ± 4% 84.82n ± 2% -19.56% (p=0.000 n=8) Ilogb-4 99.13n ± 4% 63.71n ± 2% -35.73% (p=0.000 n=8) J0-4 859.7n ± 0% 854.8n ± 0% -0.57% (p=0.000 n=8) J1-4 873.9n ± 0% 875.7n ± 0% +0.21% (p=0.007 n=8) Jn-4 1.855µ ± 0% 1.867µ ± 0% +0.65% (p=0.000 n=8) Ldexp-4 130.50n ± 2% 64.35n ± 0% -50.69% (p=0.000 n=8) Lgamma-4 208.8n ± 0% 200.9n ± 0% -3.78% (p=0.000 n=8) Log-4 294.1n ± 0% 255.2n ± 3% -13.22% (p=0.000 n=8) Logb-4 105.45n ± 1% 66.81n ± 1% -36.64% (p=0.000 n=8) Log1p-4 268.2n ± 0% 211.3n ± 0% -21.21% (p=0.000 n=8) Log10-4 295.4n ± 0% 255.2n ± 2% -13.59% (p=0.000 n=8) Log2-4 152.9n ± 1% 127.5n ± 0% -16.61% (p=0.000 n=8) Modf-4 103.40n ± 0% 75.36n ± 0% -27.12% (p=0.000 n=8) Nextafter32-4 121.20n ± 1% 78.40n ± 0% -35.31% (p=0.000 n=8) Nextafter64-4 110.40n ± 1% 64.91n ± 0% -41.20% (p=0.000 n=8) PowInt-4 509.8n ± 1% 369.3n ± 1% -27.56% (p=0.000 n=8) PowFrac-4 1189.0n ± 0% 947.8n ± 0% -20.29% (p=0.000 n=8) Pow10Pos-4 15.07n ± 0% 15.07n ± 0% ~ (p=0.733 n=8) Pow10Neg-4 20.10n ± 0% 20.10n ± 0% ~ (p=0.576 n=8) Round-4 44.22n ± 0% 26.12n ± 0% -40.92% (p=0.000 n=8) RoundToEven-4 46.22n ± 0% 27.12n ± 0% -41.31% (p=0.000 n=8) Remainder-4 539.0n ± 1% 417.1n ± 1% -22.62% (p=0.000 n=8) Signbit-4 17.985n ± 0% 5.694n ± 0% -68.34% (p=0.000 n=8) Sin-4 185.7n ± 5% 172.9n ± 0% -6.89% (p=0.001 n=8) Sincos-4 176.6n ± 0% 200.9n ± 0% +13.76% (p=0.000 n=8) Sinh-4 495.8n ± 0% 435.9n ± 0% -12.09% (p=0.000 n=8) SqrtIndirect-4 5.022n ± 0% 5.024n ± 0% ~ (p=0.083 n=8) SqrtLatency-4 8.038n ± 0% 8.044n ± 0% ~ (p=0.524 n=8) SqrtIndirectLatency-4 8.035n ± 0% 8.039n ± 0% +0.06% (p=0.017 n=8) SqrtGoLatency-4 340.1n ± 0% 278.3n ± 0% -18.19% (p=0.000 n=8) SqrtPrime-4 5.381µ ± 0% 5.386µ ± 0% ~ (p=0.662 n=8) Tan-4 198.6n ± 1% 183.1n ± 0% -7.85% (p=0.000 n=8) Tanh-4 491.3n ± 1% 440.8n ± 1% -10.29% (p=0.000 n=8) Trunc-4 121.7n ± 0% 121.7n ± 0% ~ (p=0.769 n=8) Y0-4 855.1n ± 0% 859.8n ± 0% +0.54% (p=0.007 n=8) Y1-4 862.3n ± 0% 865.1n ± 0% +0.32% (p=0.007 n=8) Yn-4 1.830µ ± 0% 1.837µ ± 0% +0.36% (p=0.011 n=8) Float64bits-4 13.060n ± 0% 3.016n ± 0% -76.91% (p=0.000 n=8) Float64frombits-4 13.060n ± 0% 3.018n ± 0% -76.90% (p=0.000 n=8) Float32bits-4 13.060n ± 0% 3.016n ± 0% -76.91% (p=0.000 n=8) Float32frombits-4 13.070n ± 0% 3.013n ± 0% -76.94% (p=0.000 n=8) FMA-4 446.0n ± 0% 413.1n ± 1% -7.38% (p=0.000 n=8) geomean 143.4n 108.3n -24.49% Change-Id: I2067f7a5ae1126ada7ab3fb2083710e8212535e9 Reviewed-on: https://go-review.googlesource.com/c/go/+/493815 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>	2023-05-24 03:36:31 +00:00
Yi Yang	fa50248ce6	cmd/compile: sparse conditional constant propagation sparse conditional constant propagation can discover optimization opportunities that cannot be found by just combining constant folding and constant propagation and dead code elimination separately. Updates #59399 Change-Id: Ia954e906480654a6f0cc065d75b5912f96f36b2e GitHub-Last-Rev: `90fc02db99` GitHub-Pull-Request: golang/go#59575 Reviewed-on: https://go-review.googlesource.com/c/go/+/483875 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2023-05-24 02:54:03 +00:00

1 2 3 4 5 ...

459 Commits