qbit/go - go - Tape:neT

qbit/go

mirror of https://github.com/golang/go synced 2024-11-26 23:11:24 -07:00

Author	SHA1	Message	Date
Jorropo	5c67ebbb31	cmd/compile: AMD64v3 remove unnecessary TEST comparision in isPowerOfTwo With GOAMD64=V3 the canonical isPowerOfTwo function: func isPowerOfTwo(x uintptr) bool { return x&(x-1) == 0 } Used to compile to: temp := BLSR(x) // x&(x-1) flags = TEST(temp, temp) return flags.zf However the blsr instruction already set ZF according to the result. So we can remove the TEST instruction if we are just checking ZF. Such as in multiple pieces of code around memory allocations. This make the code smaller and faster. Change-Id: Ia12d5a73aa3cb49188c0b647b1eff7b56c5a7b58 Reviewed-on: https://go-review.googlesource.com/c/go/+/448255 Run-TryBot: Jakub Ciolek <jakub@ciolek.dev> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-01-20 04:58:59 +00:00
Jorropo	fc814056aa	cmd/compile: rewrite empty makeslice to zerobase pointer make$\[\][a-zA-Z0-9]+, 0$ is seen 52 times in the go source. And at least 391 times on internet: https://grep.app/search?q=make%5C%28%5C%5B%5C%5D%5Ba-zA-Z0-9%5D%2B%2C%200%5C%29&regexp=true This used to compile to calling runtime.makeslice. However we can copy what we do for []T{}, just use a zerobase pointer. On my machine this is 10x faster (from 3ns to 0.3ns). Note that an empty loop also runs in 0.3ns, so this really is free when you count superscallar execution. Change-Id: I1cfe7e69f5a7a4dabbc71912ce6a4f8a2d4a7f3c Reviewed-on: https://go-review.googlesource.com/c/go/+/454036 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Jakub Ciolek <jakub@ciolek.dev>	2023-01-20 04:57:35 +00:00
Keith Randall	f959fb3872	cmd/compile: add anchored version of SP The SPanchored opcode is identical to SP, except that it takes a memory argument so that it (and more importantly, anything that uses it) must be scheduled at or after that memory argument. This opcode ensures that a LEAQ of a variable gets scheduled after the corresponding VARDEF for that variable. This may lead to less CSE of LEAQ operations. The effect is very small. The go binary is only 80 bytes bigger after this CL. Usually LEAQs get folded into load/store operations, so the effect is only for pointerful types, large enough to need a duffzero, and have their address passed somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because the two uses are on different sides of a function call and the LEAQ ends up being rematerialized at the second use anyway. Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293 Reviewed-on: https://go-review.googlesource.com/c/go/+/452916 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2023-01-19 22:43:12 +00:00
Keith Randall	1eb0465fa5	cmd/compile: turn off jump tables when spectre retpolines are on Fixes #57097 Change-Id: I6ab659abbca1ae0ac8710674d39aec116fab0baa Reviewed-on: https://go-review.googlesource.com/c/go/+/455336 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org>	2022-12-06 05:12:12 +00:00
Paul E. Murphy	dc6b7c86df	cmd/compile: merge zero constant ISEL in PPC64 lateLower pass Add a new SSA opcode ISELZ, similar to ISELB to represent a select of value or 0. Then, merge candidate ISEL opcodes inside the late lower pass. This avoids complicating rules within the the lower pass. Change-Id: I3b14c94b763863aadc834b0e910a85870c131313 Reviewed-on: https://go-review.googlesource.com/c/go/+/442596 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Joedian Reid <joedian@golang.org>	2022-11-14 19:44:47 +00:00
Wayne Zuo	268f4629df	cmd/compile: enable brachelim pass on loong64 Change-Id: I4fd1c307901c265ab9865bf8a74460ddc15e5d14 Reviewed-on: https://go-review.googlesource.com/c/go/+/416735 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: xiaodong liu <teaofmoli@gmail.com> Auto-Submit: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>	2022-11-09 06:10:55 +00:00
Paul E. Murphy	390abbbbf1	codegen: check for PPC64 ISEL in condmove tests ISEL is roughly equivalent to CMOV on PPC64. Verify ISEL generation in all reasonable cases. Note "ISEL test x y z" is the same as "ISEL !test y x z". test is always one of LT (0), GT (1), EQ (2), SO (3). Sometimes x and y are swapped if GE/LE/NE is desired. Change-Id: Ie1bf029224064e004d855099731fe5e8d05aa990 Reviewed-on: https://go-review.googlesource.com/c/go/+/445215 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Than McIntosh <thanm@google.com>	2022-11-07 15:19:20 +00:00
Paul E. Murphy	d031e9e07a	cmd/compile/internal/ssa: re-adjust CarryChainTail scheduling priority This needs to be as low as possible while not breaking priority assumptions of other scores to correctly schedule carry chains. Prior to the arm64 changes, it was set below ReadTuple. At the time, this prevented the MulHiLo implementation on PPC64 from occluding the scheduling of a full carry chain. Memory scores can also prevent better scheduling, as can be observed with crypto/internal/edwards25519/field.feMulGeneric. Fixes #56497 Change-Id: Ia4b54e6dffcce584faf46b1b8d7cea18a3913887 Reviewed-on: https://go-review.googlesource.com/c/go/+/447435 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2022-11-03 19:59:19 +00:00
Keith Randall	9ce27feaeb	cmd/compile: add rule for post-decomposed growslice optimization The recently added rule only works before decomposing slices. Add a rule that works after decomposing slices. The reason we need the latter is because although the length may be a constant, it can be hidden inside a slice that is not constant (its pointer or capacity might be changing). By applying this optimization after decomposing slices, we can find more cases where it applies. Fixes #56440 Change-Id: I0094e59eee3065ab4d210defdda8227a6e897420 Reviewed-on: https://go-review.googlesource.com/c/go/+/446277 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2022-10-31 21:40:49 +00:00
Keith Randall	0156b797e6	cmd/compile: recognize when the result of append has a constant length Fixes a performance regression due to CL 418554. Fixes #56440 Change-Id: I6ff152e9b83084756363f49ee6b0844a7a284880 Reviewed-on: https://go-review.googlesource.com/c/go/+/445875 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-10-27 17:09:50 +00:00
Wayne Zuo	90a3527427	cmd/compile: intrinsify Sub64 on loong64 This is a follow up of CL 420095 on loong64. file before after Δ % compile/internal/ssa.a 35649482 35653274 +3792 +0.011% compile/internal/ssagen.a 4099858 4098728 -1130 -0.028% ecdh.a 227896 226896 -1000 -0.439% internal/nistec/fiat.a 1212254 1128184 -84070 -6.935% tls.a 3256800 3256802 +2 +0.000% big.a 1708518 1702496 -6022 -0.352% bits.a 106762 105734 -1028 -0.963% math.a 578762 577288 -1474 -0.255% netip.a 555922 555610 -312 -0.056% net.a 3286528 3286530 +2 +0.000% golang.org/x/crypto/internal/poly1305.a 109546 107686 -1860 -1.698% total 260392768 260299668 -93100 -0.036% Change-Id: Ieffca705aae5666501f284502d986ca179dde494 Reviewed-on: https://go-review.googlesource.com/c/go/+/428557 Reviewed-by: Carlos Amedee <carlos@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>	2022-10-07 18:16:26 +00:00
Wayne Zuo	97760ed651	cmd/compile: intrinsify Add64 on loong64 This is a follow up of CL 420094 on loong64. Reduce go toolchain size slightly on linux/loong64. compilecmp HEAD~1 -> HEAD HEAD~1 (`8a32354219`): internal/trace: use strings.Builder HEAD (1767784ac3): cmd/compile: intrinsify Add64 on loong64 platform: linux/loong64 file before after Δ % addr2line 3882616 3882536 -80 -0.002% api 5528866 5528450 -416 -0.008% asm 5133780 5133796 +16 +0.000% cgo 4668787 4668491 -296 -0.006% compile 25163409 25164729 +1320 +0.005% cover 4658055 4658007 -48 -0.001% dist 3437783 3437727 -56 -0.002% doc 3883069 3883205 +136 +0.004% fix 3383254 3383070 -184 -0.005% link 6747559 6747023 -536 -0.008% nm 3793923 3793939 +16 +0.000% objdump 4256628 4256812 +184 +0.004% pack 2356328 2356144 -184 -0.008% pprof 14233370 14131910 -101460 -0.713% test2json 2638668 2638476 -192 -0.007% trace 13392065 13360781 -31284 -0.234% vet 7456388 7455588 -800 -0.011% total 132498256 132364392 -133864 -0.101% file before after Δ % compile/internal/ssa.a 35644590 35649482 +4892 +0.014% compile/internal/ssagen.a 4101250 4099858 -1392 -0.034% internal/edwards25519/field.a 226064 201718 -24346 -10.770% internal/nistec/fiat.a 1689922 1212254 -477668 -28.266% tls.a 3256798 3256800 +2 +0.000% big.a 1718552 1708518 -10034 -0.584% bits.a 107786 106762 -1024 -0.950% cmplx.a 169434 168214 -1220 -0.720% math.a 581302 578762 -2540 -0.437% netip.a 556096 555922 -174 -0.031% net.a 3286526 3286528 +2 +0.000% runtime.a 8644786 8644510 -276 -0.003% strconv.a 519098 518374 -724 -0.139% golang.org/x/crypto/internal/poly1305.a 115398 109546 -5852 -5.071% total 260913122 260392768 -520354 -0.199% Change-Id: I75b2bb7761fa5a0d0d032d4ebe3582d092ea77be Reviewed-on: https://go-review.googlesource.com/c/go/+/428556 Reviewed-by: Carlos Amedee <carlos@golang.org> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-10-07 18:16:10 +00:00
Wayne Zuo	af668c689c	cmd/compile: fold constant shift with extension on riscv64 For example: movb a0, a0 srai $1, a0, a0 the assembler will expand to: slli $56, a0, a0 srai $56, a0, a0 srai $1, a0, a0 this CL optimize to: slli $56, a0, a0 srai $57, a0, a0 Remove 270+ instructions from Go binary on linux/riscv64. Change-Id: I375e19f9d3bd54f2781791d8cbe5970191297dc8 Reviewed-on: https://go-review.googlesource.com/c/go/+/428496 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-10-06 05:21:04 +00:00
eric fang	ddc7d2a80c	cmd/compile: add late lower pass for last rules to run Usually optimization rules have corresponding priorities, some need to be run first, some run next, and some run last, which produces the best code. But currently our optimization rules have no priority, this CL adds a late lower pass that runs those rules that need to be run at last, such as split unreasonable constant folding. This pass can be seen as the second round of the lower pass. For example: func foo(a, b uint64) uint64 { d := a+0x1234568 d1 := b+0x1234568 return d&d1 } The code generated by the master branch: 0x0004 00004 ADD $19088744, R0, R2 // movz+movk+add 0x0010 00016 ADD $19088744, R1, R1 // movz+movk+add 0x001c 00028 AND R1, R2, R0 This is because the current constant folding optimization rules do not take into account the range of constants, causing the constant to be loaded repeatedly. This CL splits these unreasonable constants folding in the late lower pass. With this CL the generated code: 0x0004 00004 MOVD $19088744, R2 // movz+movk 0x000c 00012 ADD R0, R2, R3 0x0010 00016 ADD R1, R2, R1 0x0014 00020 AND R1, R3, R0 This CL also adds constant folding optimization for ADDS instruction. In addition, in order not to introduce the codegen regression, an optimization rule is added to change the addition of a negative number into a subtraction of a positive number. go1 benchmarks: name old time/op new time/op delta BinaryTree17-8 1.22s ± 1% 1.24s ± 0% +1.56% (p=0.008 n=5+5) Fannkuch11-8 1.54s ± 0% 1.53s ± 0% -0.69% (p=0.016 n=4+5) FmtFprintfEmpty-8 14.1ns ± 0% 14.1ns ± 0% ~ (p=0.079 n=4+5) FmtFprintfString-8 26.0ns ± 0% 26.1ns ± 0% +0.23% (p=0.008 n=5+5) FmtFprintfInt-8 32.3ns ± 0% 32.9ns ± 1% +1.72% (p=0.008 n=5+5) FmtFprintfIntInt-8 54.5ns ± 0% 55.5ns ± 0% +1.83% (p=0.008 n=5+5) FmtFprintfPrefixedInt-8 61.5ns ± 0% 62.0ns ± 0% +0.93% (p=0.008 n=5+5) FmtFprintfFloat-8 72.0ns ± 0% 73.6ns ± 0% +2.24% (p=0.008 n=5+5) FmtManyArgs-8 221ns ± 0% 224ns ± 0% +1.22% (p=0.008 n=5+5) GobDecode-8 1.91ms ± 0% 1.93ms ± 0% +0.98% (p=0.008 n=5+5) GobEncode-8 1.40ms ± 1% 1.39ms ± 0% -0.79% (p=0.032 n=5+5) Gzip-8 115ms ± 0% 117ms ± 1% +1.17% (p=0.008 n=5+5) Gunzip-8 19.4ms ± 1% 19.3ms ± 0% -0.71% (p=0.016 n=5+4) HTTPClientServer-8 27.0µs ± 0% 27.3µs ± 0% +0.80% (p=0.008 n=5+5) JSONEncode-8 3.36ms ± 1% 3.33ms ± 0% ~ (p=0.056 n=5+5) JSONDecode-8 17.5ms ± 2% 17.8ms ± 0% +1.71% (p=0.016 n=5+4) Mandelbrot200-8 2.29ms ± 0% 2.29ms ± 0% ~ (p=0.151 n=5+5) GoParse-8 1.35ms ± 1% 1.36ms ± 1% ~ (p=0.056 n=5+5) RegexpMatchEasy0_32-8 24.5ns ± 0% 24.5ns ± 0% ~ (p=0.444 n=4+5) RegexpMatchEasy0_1K-8 131ns ±11% 118ns ± 6% ~ (p=0.056 n=5+5) RegexpMatchEasy1_32-8 22.9ns ± 0% 22.9ns ± 0% ~ (p=0.905 n=4+5) RegexpMatchEasy1_1K-8 126ns ± 0% 127ns ± 0% ~ (p=0.063 n=4+5) RegexpMatchMedium_32-8 486ns ± 5% 483ns ± 0% ~ (p=0.381 n=5+4) RegexpMatchMedium_1K-8 15.4µs ± 1% 15.5µs ± 0% ~ (p=0.151 n=5+5) RegexpMatchHard_32-8 687ns ± 0% 686ns ± 0% ~ (p=0.103 n=5+5) RegexpMatchHard_1K-8 20.7µs ± 0% 20.7µs ± 1% ~ (p=0.151 n=5+5) Revcomp-8 175ms ± 2% 176ms ± 3% ~ (p=1.000 n=5+5) Template-8 20.4ms ± 6% 20.1ms ± 2% ~ (p=0.151 n=5+5) TimeParse-8 112ns ± 0% 113ns ± 0% +0.97% (p=0.016 n=5+4) TimeFormat-8 156ns ± 0% 145ns ± 0% -7.14% (p=0.029 n=4+4) Change-Id: I3ced26e89041f873ac989586514ccc5ee09f13da Reviewed-on: https://go-review.googlesource.com/c/go/+/425134 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Eric Fang <eric.fang@arm.com>	2022-10-05 02:40:56 +00:00
Keith Randall	6485e8f503	cmd/compile: use stricter rule for possible partial overlap Partial overlaps can only happen for strict sub-pieces of larger arrays. That's a much stronger condition than the current optimization rules. Update #54467 Change-Id: I11e539b71099e50175f37ee78fddf69283f83ee5 Reviewed-on: https://go-review.googlesource.com/c/go/+/433056 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2022-09-27 20:09:33 +00:00
eric fang	cf53990b18	cmd/compile: Add some CMP and CMN optimization rules on arm64 This CL adds some optimizaion rules: 1, Converts CMP to CMN, or vice versa, when comparing with a negative number. 2, For equal and not equal comparisons, CMP can be converted to CMN in some cases. In theory we could do the same optimization for LT, LE, GT and GE, but need to account for overflow, this CL doesn't handle them. There are no noticeable performance changes. Change-Id: Ia49266c019ab7908ebc9510c2f02e121b1607869 Reviewed-on: https://go-review.googlesource.com/c/go/+/429795 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Eric Fang <eric.fang@arm.com>	2022-09-20 01:14:40 +00:00
Joel Sing	a7bcc94719	cmd/compile: resolve known outcomes for SLTI/SLTIU on riscv64 When SLTI/SLTIU is used with ANDI/ORI, it may be possible to determine the outcome based on the values of the immediates. Resolve these cases. Improves code generation for various shift operations. While here, sort tests by architecture to improve readability and ease future maintenance. Change-Id: I87e71e016a0e396a928e7d6389a2df61583dfd8d Reviewed-on: https://go-review.googlesource.com/c/go/+/428217 Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Jenny Rakoczy <jenny@golang.org> Reviewed-by: Jenny Rakoczy <jenny@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Jenny Rakoczy <jenny@golang.org>	2022-09-17 17:17:52 +00:00
ruinan	454a058ffc	cmd/compile: Add shiftIsBounded check for logic shifts of arm64 This CL adds shiftIsBounded checks for the Lsh* and Rsh* rules in arm64. There is no need to check the shift value again with CMP + CSEL when the shift value is valid. Change-Id: I54620de64f02a1b5a11089add237248ae2de01b4 Reviewed-on: https://go-review.googlesource.com/c/go/+/417714 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-09-07 20:10:13 +00:00
Keith Randall	5b1fbfba1c	cmd/compile: rewrite >>c<<c to &^(1<<c-1) Fixes #54496 Change-Id: I3c2ed8cd55836d5b07c8cdec00d3b584885aca79 Reviewed-on: https://go-review.googlesource.com/c/go/+/424856 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Martin Möhrmann <martin@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <martin@golang.org>	2022-09-02 18:51:37 +00:00
Matthew Dempsky	34f0029a85	cmd/compile/internal/noder: allow OCONVNOP for identical iface conversions In go.dev/cl/421821, I included a hack to force OCONVNOP back to OCONVIFACE for conversions involving shape types and non-empty interfaces. The comment correctly noted that this was only needed for conversions between non-identical types, but the code was conservative and applied to even conversions between identical types. This CL adds an extra bool to record whether the conversion is between identical types, so we can keep OCONVNOP instead of forcing back to OCONVIFACE. This has a small improvement to generated code, because we no longer need a convI2I call (as demonstrated by codegen/ifaces.go). But more usefully, this is relevant to pruning unnecessary itab slots in runtime dictionaries (next CL). Change-Id: I94f89e961cd26629b925037fea58d283140766ff Reviewed-on: https://go-review.googlesource.com/c/go/+/427678 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-09-02 18:26:02 +00:00
Derek Parker	6605686e3b	cmd/compile: new inline heuristic for struct compares This CL changes the heuristic used to determine whether we can inline a struct equality check or if we must generate a function and call that function for equality. The old method was to count struct fields, but this can lead to poor in lining decisions. We should really be determining the cost of the equality check and use that to determine if we should inline or generate a function. The new benchmark provided in this CL returns the following when compared against tip: ``` name old time/op new time/op delta EqStruct-32 2.46ns ± 4% 0.25ns ±10% -89.72% (p=0.000 n=39+39) ``` Fixes #38494 Change-Id: Ie06b80a2b2a03a3fd0978bcaf7715f9afb66e0ab GitHub-Last-Rev: `e9a18d9389` GitHub-Pull-Request: golang/go#53326 Reviewed-on: https://go-review.googlesource.com/c/go/+/411674 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-09-02 17:48:22 +00:00
ruinan	121344ac33	cmd/compile: optimize RotateLeft8/16 on arm64 This CL optimizes RotateLeft8/16 on arm64. For 16 bits, we form a 32 bits register by duplicating two 16 bits registers, then use RORW instruction to do the rotate shift. For 8 bits, we just use LSR and LSL instead of RORW because the code is simpler. Benchmark Old ThisCL delta RotateLeft8-46 2.16 ns/op 1.73 ns/op -19.70% RotateLeft16-46 2.16 ns/op 1.54 ns/op -28.53% Change-Id: I09cde4383d12e31876a57f8cdfd3bb4f324fadb0 Reviewed-on: https://go-review.googlesource.com/c/go/+/420976 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2022-09-02 17:46:31 +00:00
ruinan	54c7bc9cff	cmd/compile: optimize shift ops on arm64 when the shift value is v&63 For the following code case: var x uint64 x >> (shift & 63) We can directly genereta `x >> shift` on arm64, since the hardware will only use the bottom 6 bits of the shift amount. Benchmark old time/op new time/op delta ShiftArithmeticRight-8 0.40ns 0.31ns -21.7% Change-Id: Id58c8a5b2f6dd5c30c3876f4a36e11b4d81e2dc9 Reviewed-on: https://go-review.googlesource.com/c/go/+/425294 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-09-02 17:44:29 +00:00
Keith Randall	33a7e5a4b4	cmd/compile: combine multiple rotate instructions Rotating by c, then by d, is the same as rotating by c+d. Change-Id: I36df82261460ff80f7c6d39bcdf0e840cef1c91a Reviewed-on: https://go-review.googlesource.com/c/go/+/424894 Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-08-31 22:10:52 +00:00
Keith Randall	69aed4712d	cmd/compile: use better splitting condition for string binary search Currently we use a full cmpstring to do the comparison for each split in the binary search for a string switch. Instead, split by comparing a single byte of the input string with a constant. That will give us a much faster split (although it might be not quite as good a split). Fixes #53333 R=go1.20 Change-Id: I28c7209342314f367071e4aa1f2beb6ec9ff7123 Reviewed-on: https://go-review.googlesource.com/c/go/+/414894 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-08-31 22:08:26 +00:00
Wayne Zuo	da6556968f	cmd/compile: simplify bounded shift on riscv64 The prove pass will mark some shifts bounded, and then we can use that information to generate better code on riscv64. Change-Id: Ia22f43d0598453c9417adac7017db28d7240948b Reviewed-on: https://go-review.googlesource.com/c/go/+/422616 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-08-31 20:21:00 +00:00
Wayne Zuo	e8f0340fa4	cmd/compile: intrinsify RotateLeft{32,64} on loong64 Benchmark on crypto/sha256 (provided by Xiaodong Liu): name old time/op new time/op delta Hash8Bytes/New 1.19µs ± 0% 0.97µs ± 0% -18.75% (p=0.000 n=9+9) Hash8Bytes/Sum224 1.21µs ± 0% 0.97µs ± 0% -20.04% (p=0.000 n=9+10) Hash8Bytes/Sum256 1.21µs ± 0% 0.98µs ± 0% -19.16% (p=0.000 n=10+7) Hash1K/New 15.9µs ± 0% 12.4µs ± 0% -22.10% (p=0.000 n=10+10) Hash1K/Sum224 15.9µs ± 0% 12.4µs ± 0% -22.18% (p=0.000 n=8+10) Hash1K/Sum256 15.9µs ± 0% 12.4µs ± 0% -22.15% (p=0.000 n=10+9) Hash8K/New 119µs ± 0% 92µs ± 0% -22.40% (p=0.000 n=10+9) Hash8K/Sum224 119µs ± 0% 92µs ± 0% -22.41% (p=0.000 n=9+10) Hash8K/Sum256 119µs ± 0% 92µs ± 0% -22.40% (p=0.000 n=9+9) name old speed new speed delta Hash8Bytes/New 6.70MB/s ± 0% 8.25MB/s ± 0% +23.13% (p=0.000 n=10+10) Hash8Bytes/Sum224 6.60MB/s ± 0% 8.26MB/s ± 0% +25.06% (p=0.000 n=10+10) Hash8Bytes/Sum256 6.59MB/s ± 0% 8.15MB/s ± 0% +23.67% (p=0.000 n=10+7) Hash1K/New 64.3MB/s ± 0% 82.5MB/s ± 0% +28.36% (p=0.000 n=10+10) Hash1K/Sum224 64.3MB/s ± 0% 82.6MB/s ± 0% +28.51% (p=0.000 n=10+10) Hash1K/Sum256 64.3MB/s ± 0% 82.6MB/s ± 0% +28.46% (p=0.000 n=9+9) Hash8K/New 69.0MB/s ± 0% 89.0MB/s ± 0% +28.87% (p=0.000 n=10+8) Hash8K/Sum224 69.0MB/s ± 0% 89.0MB/s ± 0% +28.88% (p=0.000 n=9+10) Hash8K/Sum256 69.0MB/s ± 0% 88.9MB/s ± 0% +28.87% (p=0.000 n=8+9) Benchmark on crypto/sha512 (provided by Xiaodong Liu): name old time/op new time/op delta Hash8Bytes/New 1.55µs ± 0% 1.31µs ± 0% -15.67% (p=0.000 n=10+10) Hash8Bytes/Sum384 1.59µs ± 0% 1.35µs ± 0% -14.97% (p=0.000 n=10+10) Hash8Bytes/Sum512 1.62µs ± 0% 1.39µs ± 0% -14.02% (p=0.000 n=10+10) Hash1K/New 10.7µs ± 0% 8.6µs ± 0% -19.60% (p=0.000 n=8+8) Hash1K/Sum384 10.8µs ± 0% 8.7µs ± 0% -19.40% (p=0.000 n=9+9) Hash1K/Sum512 10.8µs ± 0% 8.7µs ± 0% -19.35% (p=0.000 n=9+10) Hash8K/New 74.6µs ± 0% 59.6µs ± 0% -20.08% (p=0.000 n=10+9) Hash8K/Sum384 74.7µs ± 0% 59.7µs ± 0% -20.04% (p=0.000 n=9+8) Hash8K/Sum512 74.7µs ± 0% 59.7µs ± 0% -20.01% (p=0.000 n=10+10) name old speed new speed delta Hash8Bytes/New 5.16MB/s ± 0% 6.12MB/s ± 0% +18.60% (p=0.000 n=10+8) Hash8Bytes/Sum384 5.02MB/s ± 0% 5.90MB/s ± 0% +17.56% (p=0.000 n=10+10) Hash8Bytes/Sum512 4.94MB/s ± 0% 5.74MB/s ± 0% +16.29% (p=0.000 n=10+9) Hash1K/New 95.4MB/s ± 0% 118.6MB/s ± 0% +24.38% (p=0.000 n=10+10) Hash1K/Sum384 95.0MB/s ± 0% 117.9MB/s ± 0% +24.06% (p=0.000 n=8+9) Hash1K/Sum512 94.8MB/s ± 0% 117.5MB/s ± 0% +23.99% (p=0.000 n=8+9) Hash8K/New 110MB/s ± 0% 137MB/s ± 0% +25.11% (p=0.000 n=9+6) Hash8K/Sum384 110MB/s ± 0% 137MB/s ± 0% +25.07% (p=0.000 n=9+8) Hash8K/Sum512 110MB/s ± 0% 137MB/s ± 0% +25.01% (p=0.000 n=10+10) Change-Id: I28ccfce634659305a336c8e0a3f8589f7361d661 Reviewed-on: https://go-review.googlesource.com/c/go/+/422317 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: David Chase <drchase@google.com>	2022-08-30 03:21:44 +00:00
Wayne Zuo	a6219737e3	cmd/compile: intrinsify Sub64 on riscv64 After this CL, the performance difference in crypto/elliptic benchmarks on linux/riscv64 are: name old time/op new time/op delta ScalarBaseMult/P256 1.64ms ± 1% 1.60ms ± 1% -2.36% (p=0.008 n=5+5) ScalarBaseMult/P224 1.53ms ± 1% 1.47ms ± 2% -4.24% (p=0.008 n=5+5) ScalarBaseMult/P384 5.12ms ± 2% 5.03ms ± 2% ~ (p=0.095 n=5+5) ScalarBaseMult/P521 22.3ms ± 2% 13.8ms ± 1% -37.89% (p=0.008 n=5+5) ScalarMult/P256 4.49ms ± 2% 4.26ms ± 2% -5.13% (p=0.008 n=5+5) ScalarMult/P224 4.33ms ± 1% 4.09ms ± 1% -5.59% (p=0.008 n=5+5) ScalarMult/P384 16.3ms ± 1% 15.5ms ± 2% -4.78% (p=0.008 n=5+5) ScalarMult/P521 101ms ± 0% 47ms ± 2% -53.36% (p=0.008 n=5+5) Change-Id: I31cf0506e27f9d85f576af1813630a19c20dda8a Reviewed-on: https://go-review.googlesource.com/c/go/+/420095 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-27 05:43:59 +00:00
Wayne Zuo	969f48a3a2	cmd/compile: intrinsify Add64 on riscv64 According to RISCV instruction set manual v2.2 Sec 2.4, we can implement overflowing check for unsigned addition cheaply using SLTU instructions. After this CL, the performance difference in crypto/elliptic benchmarks on linux/riscv64 are: name old time/op new time/op delta ScalarBaseMult/P256 1.93ms ± 1% 1.64ms ± 1% -14.96% (p=0.008 n=5+5) ScalarBaseMult/P224 1.80ms ± 2% 1.53ms ± 1% -14.89% (p=0.008 n=5+5) ScalarBaseMult/P384 6.15ms ± 2% 5.12ms ± 2% -16.73% (p=0.008 n=5+5) ScalarBaseMult/P521 25.9ms ± 1% 22.3ms ± 2% -13.78% (p=0.008 n=5+5) ScalarMult/P256 5.59ms ± 1% 4.49ms ± 2% -19.79% (p=0.008 n=5+5) ScalarMult/P224 5.42ms ± 1% 4.33ms ± 1% -20.01% (p=0.008 n=5+5) ScalarMult/P384 19.9ms ± 2% 16.3ms ± 1% -18.15% (p=0.008 n=5+5) ScalarMult/P521 97.3ms ± 1% 100.7ms ± 0% +3.48% (p=0.008 n=5+5) Change-Id: Ic4c82ced4b072a4a6575343fa9f29dd09b0cabc4 Reviewed-on: https://go-review.googlesource.com/c/go/+/420094 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-27 05:43:32 +00:00
Wayne Zuo	b60432df14	cmd/compile: deadcode for LoweredMuluhilo on riscv64 This is a follow up of CL 425101 on RISCV64. According to RISCV Volume 1, Unprivileged Spec v. 20191213 Chapter 7.1: If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. So we should not split Muluhilo to separate instructions. Updates #54607 Change-Id: If47461f3aaaf00e27cd583a9990e144fb8bcdb17 Reviewed-on: https://go-review.googlesource.com/c/go/+/425203 Auto-Submit: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-08-24 18:08:33 +00:00
Keith Randall	60ad3c48f5	cmd/compile: move SSA rotate instruction detection to arch-independent rules Detect rotate instructions while still in architecture-independent form. It's easier to do here, and we don't need to repeat it in each architecture file. Change-Id: I9396954b3f3b3bfb96c160d064a02002309935bb Reviewed-on: https://go-review.googlesource.com/c/go/+/421195 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Eric Fang <eric.fang@arm.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com> Run-TryBot: Keith Randall <khr@golang.org>	2022-08-23 21:24:14 +00:00
eric fang	9f0f87c806	cmd/internal/obj/arm64: remove the transition from $0 to ZR Previously we convert $0 to the ZR register for some reasons, which causes two problems: 1. Confusion, the special case of the ZR register needs to be considered when dealing with constants. For encoding, some places we encode ZR, and some places we encode $0, although we have converted $0 to ZR. 2. Unexpected instruction format. All instructions that support ZR register operands can be replaced by $0. This patch removes this conversion. Note that this patch may cause previously unintendedly supported instruction formats to no longer be supported. Change-Id: I3d8d2c06711b7614a38191397da7776417f1861c Reviewed-on: https://go-review.googlesource.com/c/go/+/404316 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-23 06:11:32 +00:00
eric fang	0a52d80666	cmd/compile/internal/ssa: optimize memory moving on arm64 This CL optimizes memory moving with LDP and STP on arm64. Benchmarks: name old time/op new time/op delta ClearFat7-160 1.08ns ± 0% 0.95ns ± 0% -11.41% (p=0.008 n=5+5) ClearFat8-160 0.84ns ± 0% 0.84ns ± 0% -0.95% (p=0.008 n=5+5) ClearFat11-160 1.08ns ± 0% 0.95ns ± 0% -11.46% (p=0.008 n=5+5) ClearFat12-160 0.95ns ± 0% 0.95ns ± 0% ~ (p=0.063 n=4+5) ClearFat13-160 1.08ns ± 0% 0.95ns ± 0% -11.45% (p=0.008 n=5+5) ClearFat14-160 1.08ns ± 0% 0.95ns ± 0% -11.47% (p=0.008 n=5+5) ClearFat15-160 1.24ns ± 0% 0.95ns ± 0% -22.98% (p=0.029 n=4+4) ClearFat16-160 0.84ns ± 0% 0.83ns ± 0% -0.11% (p=0.008 n=5+5) ClearFat24-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal) ClearFat32-160 2.86ns ± 0% 2.86ns ± 0% ~ (p=0.333 n=5+4) ClearFat40-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal) ClearFat48-160 3.32ns ± 1% 3.31ns ± 1% ~ (p=0.690 n=5+5) ClearFat56-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal) ClearFat64-160 3.25ns ± 1% 3.26ns ± 1% ~ (p=0.841 n=5+5) ClearFat72-160 2.22ns ± 0% 2.22ns ± 0% ~ (p=0.444 n=5+5) ClearFat128-160 4.03ns ± 0% 4.04ns ± 0% +0.32% (p=0.008 n=5+5) ClearFat256-160 6.44ns ± 0% 6.44ns ± 0% +0.08% (p=0.016 n=4+5) ClearFat512-160 12.2ns ± 0% 12.2ns ± 0% +0.13% (p=0.008 n=5+5) ClearFat1024-160 24.3ns ± 0% 24.3ns ± 0% ~ (p=0.167 n=5+5) ClearFat1032-160 24.5ns ± 0% 24.5ns ± 0% ~ (p=0.238 n=4+5) ClearFat1040-160 29.2ns ± 0% 29.3ns ± 0% +0.34% (p=0.008 n=5+5) CopyFat7-160 1.43ns ± 0% 1.07ns ± 0% -24.97% (p=0.008 n=5+5) CopyFat8-160 0.89ns ± 0% 0.89ns ± 0% ~ (p=0.238 n=5+5) CopyFat11-160 1.43ns ± 0% 1.07ns ± 0% -24.97% (p=0.008 n=5+5) CopyFat12-160 1.07ns ± 0% 1.07ns ± 0% ~ (p=0.238 n=5+4) CopyFat13-160 1.43ns ± 0% 1.07ns ± 0% ~ (p=0.079 n=4+5) CopyFat14-160 1.43ns ± 0% 1.07ns ± 0% -24.95% (p=0.008 n=5+5) CopyFat15-160 1.79ns ± 0% 1.07ns ± 0% ~ (p=0.079 n=4+5) CopyFat16-160 1.07ns ± 0% 1.07ns ± 0% ~ (p=0.444 n=5+5) CopyFat24-160 1.84ns ± 2% 1.67ns ± 0% -9.28% (p=0.008 n=5+5) CopyFat32-160 3.22ns ± 0% 2.92ns ± 0% -9.40% (p=0.008 n=5+5) CopyFat64-160 3.64ns ± 0% 3.57ns ± 0% -1.96% (p=0.008 n=5+5) CopyFat72-160 3.56ns ± 0% 3.11ns ± 0% -12.89% (p=0.008 n=5+5) CopyFat128-160 5.06ns ± 0% 5.06ns ± 0% +0.04% (p=0.048 n=5+5) CopyFat256-160 9.13ns ± 0% 9.13ns ± 0% ~ (p=0.659 n=5+5) CopyFat512-160 17.4ns ± 0% 17.4ns ± 0% ~ (p=0.167 n=5+5) CopyFat520-160 17.2ns ± 0% 17.3ns ± 0% +0.37% (p=0.008 n=5+5) CopyFat1024-160 34.1ns ± 0% 34.0ns ± 0% ~ (p=0.127 n=5+5) CopyFat1032-160 80.9ns ± 0% 34.2ns ± 0% -57.74% (p=0.008 n=5+5) CopyFat1040-160 94.4ns ± 0% 41.7ns ± 0% -55.78% (p=0.016 n=5+4) Change-Id: I14186f9f82b0ecf8b6c02191dc5da566b9a21e6c Reviewed-on: https://go-review.googlesource.com/c/go/+/421654 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-23 03:09:07 +00:00
Cherry Mui	8bf9e01473	cmd/compile: split Muluhilo op on ARM64 On ARM64 we use two separate instructions to compute the hi and lo results of a 64x64->128 multiplication. Lower to two separate ops so if only one result is needed we can deadcode the other. Fixes #54607. Change-Id: Ib023e77eb2b2b0bcf467b45471cb8a294bce6f90 Reviewed-on: https://go-review.googlesource.com/c/go/+/425101 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2022-08-22 21:29:31 +00:00
Keith Randall	908499adec	cmd/compile: stop using VARKILL With the introduction of stack objects, VARKILL information is no longer needed. With stack objects, an object is dead when there are no more static references to it, and the stack scanner can't find any live pointers to it. VARKILL information isn't used to establish live ranges for address-taken variables any more. In effect, the last static reference is the VARKILL, and there's an additional dynamic liveness check during stack scanning. Next CL will actually rip out the VARKILL opcodes. Change-Id: I030a2ab867445cf4e0e69397911f8a2e2f0ed07b Reviewed-on: https://go-review.googlesource.com/c/go/+/419234 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2022-08-18 17:36:38 +00:00
Archana R	d09c6ac417	test/codegen: updated multiple tests to verify on ppc64,ppc64le Updated multiple tests in test/codegen: math.go, mathbits.go, shift.go and slices.go to verify on ppc64/ppc64le as well Change-Id: Id88dd41569b7097819fb4d451b615f69cf7f7a94 Reviewed-on: https://go-review.googlesource.com/c/go/+/412115 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-08-17 13:56:55 +00:00
Wayne Zuo	09932f95f5	cmd/compile: combine more constant stores on amd64 Fixes #53324 Change-Id: I06149d860f858b082235e9d80bf0ea494679b386 Reviewed-on: https://go-review.googlesource.com/c/go/+/411614 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-08-15 16:19:21 +00:00
eric fang	efe5929dbd	cmd/compile/internal/ssa: optimize ARM64 code with TST For signed comparisons, the following four optimization rules hold: (CMPconst [0] z:(AND x y)) && z.Uses == 1 => (TST x y) (CMPWconst [0] z:(AND x y)) && z.Uses == 1 => (TSTW x y) (CMPconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTconst [c] y) (CMPWconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTWconst [int32(c)] y) But currently they only apply to jump instructions, not to conditional instructions within a block, such as cset, csel, etc. This CL extends the above rules into blocks so that conditional instructions can also be optimized. name old time/op new time/op delta DivisiblePow2constI64-160 1.04ns ± 0% 0.86ns ± 0% -17.30% (p=0.008 n=5+5) DivisiblePow2constI32-160 1.04ns ± 0% 0.87ns ± 0% -16.16% (p=0.016 n=4+5) DivisiblePow2constI16-160 1.04ns ± 0% 0.87ns ± 0% -16.03% (p=0.008 n=5+5) DivisiblePow2constI8-160 1.04ns ± 0% 0.86ns ± 0% -17.15% (p=0.008 n=5+5) Change-Id: I6bc34bff30862210e8dd001e0340b8fe502fe3de Reviewed-on: https://go-review.googlesource.com/c/go/+/420434 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com>	2022-08-10 02:13:53 +00:00
Cuong Manh Le	0f8dffd0aa	all: use ":" for compiler generated symbols As it can't appear in user package paths. There is a hack for handling "go:buildid" and "type:" on windows/386. Previously, windows/386 requires underscore prefix on external symbols, but that's only applied for SHOSTOBJ/SUNDEFEXT or cgo export symbols. "go.buildid" is STEXT, "type." is STYPE, thus they are not prefixed with underscore. In external linking mode, the external linker can't resolve them as external symbols. But we are lucky that they have "." in their name, so the external linker see them as Forwarder RVA exports. See: - https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#export-address-table - https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/pe-dll.c;h=e7b82ba6ffadf74dc1b9ee71dc13d48336941e51;hb=HEAD#l972) This CL changes "." to ":" in symbols name, so theses symbols can not be found by external linker anymore. So a hacky way is adding the underscore prefix for these 2 symbols. I don't have enough knowledge to verify whether adding the underscore for all STEXT/STYPE symbols are fine, even if it could be, that would be done in future CL. Fixes #37762 Change-Id: I92eaaf24c0820926a36e0530fdb07b07af1fcc35 Reviewed-on: https://go-review.googlesource.com/c/go/+/317917 Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-09 11:28:56 +00:00
Lynn Boger	c1bfefe9d1	cmd/compile: fix confusion with ANDCCconst in PPC64 rules Currently there is a an ANDconst and an ANDCCconst op in PPC64, which is confusing since they map onto the same instruction. One of these ops sets the result of the AND operation, and the other sets the flag (condition register). This converts ANDCCconst into an op with the 2 expected results: the integer result of the AND and the flag setting. The ANDconst op has been removed. Note that in the PPC64 ISA the only variation of the 'and immediate' is the one that sets the condition bit, which probably led to the original (confusing) implementation. This also adds a few rules to improve the use of ANDCCconst with ISELB and some testcases to verify those improvements. Change-Id: I523703fa4da2098eb995dc3ba744d36fa28e41d4 Reviewed-on: https://go-review.googlesource.com/c/go/+/422015 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Paul Murphy <murp@ibm.com>	2022-08-08 20:15:55 +00:00
Keith Randall	c2a9c55823	cmd/compile: optimize unsafe.Slice generated code We don't need a multiply when the element type is size 0 or 1. The panic functions don't return, so we don't need any post-call code (register restores, etc.). Change-Id: I0dcea5df56d29d7be26554ddca966b3903c672e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/419754 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2022-08-08 17:36:47 +00:00
cuiweixie	e7307034cc	cmd/compile: store combine on amd64 Fixes #54120 Change-Id: I6915b6e8d459d9becfdef4fdcba95ee4dea6af05 GitHub-Last-Rev: `03f19942c7` GitHub-Pull-Request: golang/go#54126 Reviewed-on: https://go-review.googlesource.com/c/go/+/420115 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>	2022-08-08 16:40:58 +00:00
Matthew Dempsky	ab8d7dd75e	cmd/compile: set LocalPkg.Path to -p flag Since CL 391014, cmd/compile now requires the -p flag to be set the build system. This CL changes it to initialize LocalPkg.Path to the provided path, rather than relying on writing out `"".` into object files and expecting cmd/link to substitute them. However, this actually involved a rather long tail of fixes. Many have already been submitted, but a few notable ones that have to land simultaneously with changing LocalPkg: 1. When compiling package runtime, there are really two "runtime" packages: types.LocalPkg (the source package itself) and ir.Pkgs.Runtime (the compiler's internal representation, for synthetic references). Previously, these ended up creating separate link symbols (`"".xxx` and `runtime.xxx`, respectively), but now they both end up as `runtime.xxx`, which causes lsym collisions (notably inittask and funcsyms). 2. test/codegen tests need to be updated to expect symbols to be named `command-line-arguments.xxx` rather than `"".foo`. 3. The issue20014 test case is sensitive to the sort order of field tracking symbols. In particular, the local package now sorts to its natural place in the list, rather than to the front. Thanks to David Chase for helping track down all of the fixes needed for this CL. Updates #51734. Change-Id: Iba3041cf7ad967d18c6e17922fa06ba11798b565 Reviewed-on: https://go-review.googlesource.com/c/go/+/393715 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>	2022-05-16 18:19:47 +00:00
Cherry Mui	540f8c2b50	cmd/compile: use jump table on ARM64 Following CL 357330, use jump tables on ARM64. name old time/op new time/op delta Switch8Predictable-4 3.41ns ± 0% 3.21ns ± 0% ~ (p=0.079 n=4+5) Switch8Unpredictable-4 12.0ns ± 0% 9.5ns ± 0% -21.17% (p=0.000 n=5+4) Switch32Predictable-4 3.06ns ± 0% 2.82ns ± 0% -7.78% (p=0.008 n=5+5) Switch32Unpredictable-4 13.3ns ± 0% 9.5ns ± 0% -28.87% (p=0.016 n=4+5) SwitchStringPredictable-4 3.71ns ± 0% 3.21ns ± 0% -13.43% (p=0.000 n=5+4) SwitchStringUnpredictable-4 14.8ns ± 0% 15.1ns ± 0% +2.37% (p=0.008 n=5+5) Change-Id: Ia0b85df7ca9273cf70c05eb957225c6e61822fa6 Reviewed-on: https://go-review.googlesource.com/c/go/+/403979 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>	2022-05-13 19:51:03 +00:00
Paul E. Murphy	0c43878baa	cmd/compile: lower Add64/Sub64 into ssa on PPC64 math/bits.Add64 and math/bits.Sub64 now lower and optimize directly in SSA form. The optimization of carry chains focuses around eliding XER<->GPR transfers of the CA bit when used exclusively as an input to a single carry operations, or when the CA value is known. This also adds support for handling XER spills in the assembler which could happen if carry chains contain inter-dependencies on each other (which seems very unlikely with practical usage), or a clobber happens (SRAW/SRAD/SUBFC operations clobber CA). With PPC64 Add64/Sub64 lowering into SSA and this patch, the net performance difference in crypto/elliptic benchmarks on P9/ppc64le are: name old time/op new time/op delta ScalarBaseMult/P256 46.3µs ± 0% 46.9µs ± 0% +1.34% ScalarBaseMult/P224 356µs ± 0% 209µs ± 0% -41.14% ScalarBaseMult/P384 1.20ms ± 0% 0.57ms ± 0% -52.14% ScalarBaseMult/P521 3.38ms ± 0% 1.44ms ± 0% -57.27% ScalarMult/P256 199µs ± 0% 199µs ± 0% -0.17% ScalarMult/P224 357µs ± 0% 212µs ± 0% -40.56% ScalarMult/P384 1.20ms ± 0% 0.58ms ± 0% -51.86% ScalarMult/P521 3.37ms ± 0% 1.44ms ± 0% -57.32% MarshalUnmarshal/P256/Uncompressed 2.59µs ± 0% 2.52µs ± 0% -2.63% MarshalUnmarshal/P256/Compressed 2.58µs ± 0% 2.52µs ± 0% -2.06% MarshalUnmarshal/P224/Uncompressed 1.54µs ± 0% 1.40µs ± 0% -9.42% MarshalUnmarshal/P224/Compressed 1.54µs ± 0% 1.39µs ± 0% -9.87% MarshalUnmarshal/P384/Uncompressed 2.40µs ± 0% 1.80µs ± 0% -24.93% MarshalUnmarshal/P384/Compressed 2.35µs ± 0% 1.81µs ± 0% -23.03% MarshalUnmarshal/P521/Uncompressed 3.79µs ± 0% 2.58µs ± 0% -31.81% MarshalUnmarshal/P521/Compressed 3.80µs ± 0% 2.60µs ± 0% -31.67% Note, P256 uses an asm implementation, thus, little variation is expected. Change-Id: I88a24f6bf0f4f285c649e40243b1ab69cc452b71 Reviewed-on: https://go-review.googlesource.com/c/go/+/346870 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-05-10 20:03:53 +00:00
Jorropo	e1e056fa6a	cmd/compile: fold constants found by prove It is hit ~70k times building go. This make the go binary, 0.04% smaller. I didn't included benchmarks because this is just constant foldings and is hard to mesure objectively. For example, this enable rewriting things like: if x == 20 { return x + 30 + z } Into: if x == 20 { return 50 + z } It's not just fixing programer's code, the ssa generator generate code like this sometimes. Change-Id: I0861f342b27f7227b5f1c34d8267fa0057b1bbbc GitHub-Last-Rev: `4c2f9b5216` GitHub-Pull-Request: golang/go#52669 Reviewed-on: https://go-review.googlesource.com/c/go/+/403735 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>	2022-05-04 20:30:17 +00:00
Paul E. Murphy	c570f0eda2	cmd/compile: combine OR + NOT into ORN on PPC64 This shows up in a few crypto functions, and other assorted places. Change-Id: I5a7f4c25ddd4a6499dc295ef693b9fe43d2448ab Reviewed-on: https://go-review.googlesource.com/c/go/+/404057 Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2022-05-04 18:49:50 +00:00
Cuong Manh Le	0668e3cb1a	cmd/compile: support pointers to arrays in arrayClear Fixes #52635 Change-Id: I85f182931e30292983ef86c55a0ab6e01282395c Reviewed-on: https://go-review.googlesource.com/c/go/+/403337 Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-05-03 05:42:48 +00:00
Keith Randall	c4b2288755	cmd/compile: add jump table codegen test Change-Id: Ic67f676f5ebe146166a0d3c1d78a802881320e49 Reviewed-on: https://go-review.googlesource.com/c/go/+/400375 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>	2022-04-14 21:16:29 +00:00
Wayne Zuo	66f03f79da	cmd/compile: add SHLX&SHRX without load Change-Id: I79eb5e7d6bcb23f26d3a100e915efff6dae70391 Reviewed-on: https://go-review.googlesource.com/c/go/+/399061 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-04-13 17:48:36 +00:00

1 2 3 4 5 ...

381 Commits