qbit/go - go - Tape:neT

qbit/go

mirror of https://github.com/golang/go synced 2024-11-17 16:24:42 -07:00

Author	SHA1	Message	Date
Keith Randall	9ce27feaeb	cmd/compile: add rule for post-decomposed growslice optimization The recently added rule only works before decomposing slices. Add a rule that works after decomposing slices. The reason we need the latter is because although the length may be a constant, it can be hidden inside a slice that is not constant (its pointer or capacity might be changing). By applying this optimization after decomposing slices, we can find more cases where it applies. Fixes #56440 Change-Id: I0094e59eee3065ab4d210defdda8227a6e897420 Reviewed-on: https://go-review.googlesource.com/c/go/+/446277 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2022-10-31 21:40:49 +00:00
Keith Randall	0156b797e6	cmd/compile: recognize when the result of append has a constant length Fixes a performance regression due to CL 418554. Fixes #56440 Change-Id: I6ff152e9b83084756363f49ee6b0844a7a284880 Reviewed-on: https://go-review.googlesource.com/c/go/+/445875 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-10-27 17:09:50 +00:00
Wayne Zuo	90a3527427	cmd/compile: intrinsify Sub64 on loong64 This is a follow up of CL 420095 on loong64. file before after Δ % compile/internal/ssa.a 35649482 35653274 +3792 +0.011% compile/internal/ssagen.a 4099858 4098728 -1130 -0.028% ecdh.a 227896 226896 -1000 -0.439% internal/nistec/fiat.a 1212254 1128184 -84070 -6.935% tls.a 3256800 3256802 +2 +0.000% big.a 1708518 1702496 -6022 -0.352% bits.a 106762 105734 -1028 -0.963% math.a 578762 577288 -1474 -0.255% netip.a 555922 555610 -312 -0.056% net.a 3286528 3286530 +2 +0.000% golang.org/x/crypto/internal/poly1305.a 109546 107686 -1860 -1.698% total 260392768 260299668 -93100 -0.036% Change-Id: Ieffca705aae5666501f284502d986ca179dde494 Reviewed-on: https://go-review.googlesource.com/c/go/+/428557 Reviewed-by: Carlos Amedee <carlos@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>	2022-10-07 18:16:26 +00:00
Wayne Zuo	97760ed651	cmd/compile: intrinsify Add64 on loong64 This is a follow up of CL 420094 on loong64. Reduce go toolchain size slightly on linux/loong64. compilecmp HEAD~1 -> HEAD HEAD~1 (`8a32354219`): internal/trace: use strings.Builder HEAD (1767784ac3): cmd/compile: intrinsify Add64 on loong64 platform: linux/loong64 file before after Δ % addr2line 3882616 3882536 -80 -0.002% api 5528866 5528450 -416 -0.008% asm 5133780 5133796 +16 +0.000% cgo 4668787 4668491 -296 -0.006% compile 25163409 25164729 +1320 +0.005% cover 4658055 4658007 -48 -0.001% dist 3437783 3437727 -56 -0.002% doc 3883069 3883205 +136 +0.004% fix 3383254 3383070 -184 -0.005% link 6747559 6747023 -536 -0.008% nm 3793923 3793939 +16 +0.000% objdump 4256628 4256812 +184 +0.004% pack 2356328 2356144 -184 -0.008% pprof 14233370 14131910 -101460 -0.713% test2json 2638668 2638476 -192 -0.007% trace 13392065 13360781 -31284 -0.234% vet 7456388 7455588 -800 -0.011% total 132498256 132364392 -133864 -0.101% file before after Δ % compile/internal/ssa.a 35644590 35649482 +4892 +0.014% compile/internal/ssagen.a 4101250 4099858 -1392 -0.034% internal/edwards25519/field.a 226064 201718 -24346 -10.770% internal/nistec/fiat.a 1689922 1212254 -477668 -28.266% tls.a 3256798 3256800 +2 +0.000% big.a 1718552 1708518 -10034 -0.584% bits.a 107786 106762 -1024 -0.950% cmplx.a 169434 168214 -1220 -0.720% math.a 581302 578762 -2540 -0.437% netip.a 556096 555922 -174 -0.031% net.a 3286526 3286528 +2 +0.000% runtime.a 8644786 8644510 -276 -0.003% strconv.a 519098 518374 -724 -0.139% golang.org/x/crypto/internal/poly1305.a 115398 109546 -5852 -5.071% total 260913122 260392768 -520354 -0.199% Change-Id: I75b2bb7761fa5a0d0d032d4ebe3582d092ea77be Reviewed-on: https://go-review.googlesource.com/c/go/+/428556 Reviewed-by: Carlos Amedee <carlos@golang.org> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-10-07 18:16:10 +00:00
Wayne Zuo	af668c689c	cmd/compile: fold constant shift with extension on riscv64 For example: movb a0, a0 srai $1, a0, a0 the assembler will expand to: slli $56, a0, a0 srai $56, a0, a0 srai $1, a0, a0 this CL optimize to: slli $56, a0, a0 srai $57, a0, a0 Remove 270+ instructions from Go binary on linux/riscv64. Change-Id: I375e19f9d3bd54f2781791d8cbe5970191297dc8 Reviewed-on: https://go-review.googlesource.com/c/go/+/428496 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-10-06 05:21:04 +00:00
eric fang	ddc7d2a80c	cmd/compile: add late lower pass for last rules to run Usually optimization rules have corresponding priorities, some need to be run first, some run next, and some run last, which produces the best code. But currently our optimization rules have no priority, this CL adds a late lower pass that runs those rules that need to be run at last, such as split unreasonable constant folding. This pass can be seen as the second round of the lower pass. For example: func foo(a, b uint64) uint64 { d := a+0x1234568 d1 := b+0x1234568 return d&d1 } The code generated by the master branch: 0x0004 00004 ADD $19088744, R0, R2 // movz+movk+add 0x0010 00016 ADD $19088744, R1, R1 // movz+movk+add 0x001c 00028 AND R1, R2, R0 This is because the current constant folding optimization rules do not take into account the range of constants, causing the constant to be loaded repeatedly. This CL splits these unreasonable constants folding in the late lower pass. With this CL the generated code: 0x0004 00004 MOVD $19088744, R2 // movz+movk 0x000c 00012 ADD R0, R2, R3 0x0010 00016 ADD R1, R2, R1 0x0014 00020 AND R1, R3, R0 This CL also adds constant folding optimization for ADDS instruction. In addition, in order not to introduce the codegen regression, an optimization rule is added to change the addition of a negative number into a subtraction of a positive number. go1 benchmarks: name old time/op new time/op delta BinaryTree17-8 1.22s ± 1% 1.24s ± 0% +1.56% (p=0.008 n=5+5) Fannkuch11-8 1.54s ± 0% 1.53s ± 0% -0.69% (p=0.016 n=4+5) FmtFprintfEmpty-8 14.1ns ± 0% 14.1ns ± 0% ~ (p=0.079 n=4+5) FmtFprintfString-8 26.0ns ± 0% 26.1ns ± 0% +0.23% (p=0.008 n=5+5) FmtFprintfInt-8 32.3ns ± 0% 32.9ns ± 1% +1.72% (p=0.008 n=5+5) FmtFprintfIntInt-8 54.5ns ± 0% 55.5ns ± 0% +1.83% (p=0.008 n=5+5) FmtFprintfPrefixedInt-8 61.5ns ± 0% 62.0ns ± 0% +0.93% (p=0.008 n=5+5) FmtFprintfFloat-8 72.0ns ± 0% 73.6ns ± 0% +2.24% (p=0.008 n=5+5) FmtManyArgs-8 221ns ± 0% 224ns ± 0% +1.22% (p=0.008 n=5+5) GobDecode-8 1.91ms ± 0% 1.93ms ± 0% +0.98% (p=0.008 n=5+5) GobEncode-8 1.40ms ± 1% 1.39ms ± 0% -0.79% (p=0.032 n=5+5) Gzip-8 115ms ± 0% 117ms ± 1% +1.17% (p=0.008 n=5+5) Gunzip-8 19.4ms ± 1% 19.3ms ± 0% -0.71% (p=0.016 n=5+4) HTTPClientServer-8 27.0µs ± 0% 27.3µs ± 0% +0.80% (p=0.008 n=5+5) JSONEncode-8 3.36ms ± 1% 3.33ms ± 0% ~ (p=0.056 n=5+5) JSONDecode-8 17.5ms ± 2% 17.8ms ± 0% +1.71% (p=0.016 n=5+4) Mandelbrot200-8 2.29ms ± 0% 2.29ms ± 0% ~ (p=0.151 n=5+5) GoParse-8 1.35ms ± 1% 1.36ms ± 1% ~ (p=0.056 n=5+5) RegexpMatchEasy0_32-8 24.5ns ± 0% 24.5ns ± 0% ~ (p=0.444 n=4+5) RegexpMatchEasy0_1K-8 131ns ±11% 118ns ± 6% ~ (p=0.056 n=5+5) RegexpMatchEasy1_32-8 22.9ns ± 0% 22.9ns ± 0% ~ (p=0.905 n=4+5) RegexpMatchEasy1_1K-8 126ns ± 0% 127ns ± 0% ~ (p=0.063 n=4+5) RegexpMatchMedium_32-8 486ns ± 5% 483ns ± 0% ~ (p=0.381 n=5+4) RegexpMatchMedium_1K-8 15.4µs ± 1% 15.5µs ± 0% ~ (p=0.151 n=5+5) RegexpMatchHard_32-8 687ns ± 0% 686ns ± 0% ~ (p=0.103 n=5+5) RegexpMatchHard_1K-8 20.7µs ± 0% 20.7µs ± 1% ~ (p=0.151 n=5+5) Revcomp-8 175ms ± 2% 176ms ± 3% ~ (p=1.000 n=5+5) Template-8 20.4ms ± 6% 20.1ms ± 2% ~ (p=0.151 n=5+5) TimeParse-8 112ns ± 0% 113ns ± 0% +0.97% (p=0.016 n=5+4) TimeFormat-8 156ns ± 0% 145ns ± 0% -7.14% (p=0.029 n=4+4) Change-Id: I3ced26e89041f873ac989586514ccc5ee09f13da Reviewed-on: https://go-review.googlesource.com/c/go/+/425134 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Eric Fang <eric.fang@arm.com>	2022-10-05 02:40:56 +00:00
Keith Randall	6485e8f503	cmd/compile: use stricter rule for possible partial overlap Partial overlaps can only happen for strict sub-pieces of larger arrays. That's a much stronger condition than the current optimization rules. Update #54467 Change-Id: I11e539b71099e50175f37ee78fddf69283f83ee5 Reviewed-on: https://go-review.googlesource.com/c/go/+/433056 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2022-09-27 20:09:33 +00:00
eric fang	cf53990b18	cmd/compile: Add some CMP and CMN optimization rules on arm64 This CL adds some optimizaion rules: 1, Converts CMP to CMN, or vice versa, when comparing with a negative number. 2, For equal and not equal comparisons, CMP can be converted to CMN in some cases. In theory we could do the same optimization for LT, LE, GT and GE, but need to account for overflow, this CL doesn't handle them. There are no noticeable performance changes. Change-Id: Ia49266c019ab7908ebc9510c2f02e121b1607869 Reviewed-on: https://go-review.googlesource.com/c/go/+/429795 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Eric Fang <eric.fang@arm.com>	2022-09-20 01:14:40 +00:00
Joel Sing	a7bcc94719	cmd/compile: resolve known outcomes for SLTI/SLTIU on riscv64 When SLTI/SLTIU is used with ANDI/ORI, it may be possible to determine the outcome based on the values of the immediates. Resolve these cases. Improves code generation for various shift operations. While here, sort tests by architecture to improve readability and ease future maintenance. Change-Id: I87e71e016a0e396a928e7d6389a2df61583dfd8d Reviewed-on: https://go-review.googlesource.com/c/go/+/428217 Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Jenny Rakoczy <jenny@golang.org> Reviewed-by: Jenny Rakoczy <jenny@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Jenny Rakoczy <jenny@golang.org>	2022-09-17 17:17:52 +00:00
ruinan	454a058ffc	cmd/compile: Add shiftIsBounded check for logic shifts of arm64 This CL adds shiftIsBounded checks for the Lsh* and Rsh* rules in arm64. There is no need to check the shift value again with CMP + CSEL when the shift value is valid. Change-Id: I54620de64f02a1b5a11089add237248ae2de01b4 Reviewed-on: https://go-review.googlesource.com/c/go/+/417714 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-09-07 20:10:13 +00:00
Keith Randall	5b1fbfba1c	cmd/compile: rewrite >>c<<c to &^(1<<c-1) Fixes #54496 Change-Id: I3c2ed8cd55836d5b07c8cdec00d3b584885aca79 Reviewed-on: https://go-review.googlesource.com/c/go/+/424856 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Martin Möhrmann <martin@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <martin@golang.org>	2022-09-02 18:51:37 +00:00
Matthew Dempsky	34f0029a85	cmd/compile/internal/noder: allow OCONVNOP for identical iface conversions In go.dev/cl/421821, I included a hack to force OCONVNOP back to OCONVIFACE for conversions involving shape types and non-empty interfaces. The comment correctly noted that this was only needed for conversions between non-identical types, but the code was conservative and applied to even conversions between identical types. This CL adds an extra bool to record whether the conversion is between identical types, so we can keep OCONVNOP instead of forcing back to OCONVIFACE. This has a small improvement to generated code, because we no longer need a convI2I call (as demonstrated by codegen/ifaces.go). But more usefully, this is relevant to pruning unnecessary itab slots in runtime dictionaries (next CL). Change-Id: I94f89e961cd26629b925037fea58d283140766ff Reviewed-on: https://go-review.googlesource.com/c/go/+/427678 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-09-02 18:26:02 +00:00
Derek Parker	6605686e3b	cmd/compile: new inline heuristic for struct compares This CL changes the heuristic used to determine whether we can inline a struct equality check or if we must generate a function and call that function for equality. The old method was to count struct fields, but this can lead to poor in lining decisions. We should really be determining the cost of the equality check and use that to determine if we should inline or generate a function. The new benchmark provided in this CL returns the following when compared against tip: ``` name old time/op new time/op delta EqStruct-32 2.46ns ± 4% 0.25ns ±10% -89.72% (p=0.000 n=39+39) ``` Fixes #38494 Change-Id: Ie06b80a2b2a03a3fd0978bcaf7715f9afb66e0ab GitHub-Last-Rev: `e9a18d9389` GitHub-Pull-Request: golang/go#53326 Reviewed-on: https://go-review.googlesource.com/c/go/+/411674 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-09-02 17:48:22 +00:00
ruinan	121344ac33	cmd/compile: optimize RotateLeft8/16 on arm64 This CL optimizes RotateLeft8/16 on arm64. For 16 bits, we form a 32 bits register by duplicating two 16 bits registers, then use RORW instruction to do the rotate shift. For 8 bits, we just use LSR and LSL instead of RORW because the code is simpler. Benchmark Old ThisCL delta RotateLeft8-46 2.16 ns/op 1.73 ns/op -19.70% RotateLeft16-46 2.16 ns/op 1.54 ns/op -28.53% Change-Id: I09cde4383d12e31876a57f8cdfd3bb4f324fadb0 Reviewed-on: https://go-review.googlesource.com/c/go/+/420976 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2022-09-02 17:46:31 +00:00
ruinan	54c7bc9cff	cmd/compile: optimize shift ops on arm64 when the shift value is v&63 For the following code case: var x uint64 x >> (shift & 63) We can directly genereta `x >> shift` on arm64, since the hardware will only use the bottom 6 bits of the shift amount. Benchmark old time/op new time/op delta ShiftArithmeticRight-8 0.40ns 0.31ns -21.7% Change-Id: Id58c8a5b2f6dd5c30c3876f4a36e11b4d81e2dc9 Reviewed-on: https://go-review.googlesource.com/c/go/+/425294 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-09-02 17:44:29 +00:00
Keith Randall	33a7e5a4b4	cmd/compile: combine multiple rotate instructions Rotating by c, then by d, is the same as rotating by c+d. Change-Id: I36df82261460ff80f7c6d39bcdf0e840cef1c91a Reviewed-on: https://go-review.googlesource.com/c/go/+/424894 Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-08-31 22:10:52 +00:00
Keith Randall	69aed4712d	cmd/compile: use better splitting condition for string binary search Currently we use a full cmpstring to do the comparison for each split in the binary search for a string switch. Instead, split by comparing a single byte of the input string with a constant. That will give us a much faster split (although it might be not quite as good a split). Fixes #53333 R=go1.20 Change-Id: I28c7209342314f367071e4aa1f2beb6ec9ff7123 Reviewed-on: https://go-review.googlesource.com/c/go/+/414894 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-08-31 22:08:26 +00:00
Wayne Zuo	da6556968f	cmd/compile: simplify bounded shift on riscv64 The prove pass will mark some shifts bounded, and then we can use that information to generate better code on riscv64. Change-Id: Ia22f43d0598453c9417adac7017db28d7240948b Reviewed-on: https://go-review.googlesource.com/c/go/+/422616 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-08-31 20:21:00 +00:00
Wayne Zuo	e8f0340fa4	cmd/compile: intrinsify RotateLeft{32,64} on loong64 Benchmark on crypto/sha256 (provided by Xiaodong Liu): name old time/op new time/op delta Hash8Bytes/New 1.19µs ± 0% 0.97µs ± 0% -18.75% (p=0.000 n=9+9) Hash8Bytes/Sum224 1.21µs ± 0% 0.97µs ± 0% -20.04% (p=0.000 n=9+10) Hash8Bytes/Sum256 1.21µs ± 0% 0.98µs ± 0% -19.16% (p=0.000 n=10+7) Hash1K/New 15.9µs ± 0% 12.4µs ± 0% -22.10% (p=0.000 n=10+10) Hash1K/Sum224 15.9µs ± 0% 12.4µs ± 0% -22.18% (p=0.000 n=8+10) Hash1K/Sum256 15.9µs ± 0% 12.4µs ± 0% -22.15% (p=0.000 n=10+9) Hash8K/New 119µs ± 0% 92µs ± 0% -22.40% (p=0.000 n=10+9) Hash8K/Sum224 119µs ± 0% 92µs ± 0% -22.41% (p=0.000 n=9+10) Hash8K/Sum256 119µs ± 0% 92µs ± 0% -22.40% (p=0.000 n=9+9) name old speed new speed delta Hash8Bytes/New 6.70MB/s ± 0% 8.25MB/s ± 0% +23.13% (p=0.000 n=10+10) Hash8Bytes/Sum224 6.60MB/s ± 0% 8.26MB/s ± 0% +25.06% (p=0.000 n=10+10) Hash8Bytes/Sum256 6.59MB/s ± 0% 8.15MB/s ± 0% +23.67% (p=0.000 n=10+7) Hash1K/New 64.3MB/s ± 0% 82.5MB/s ± 0% +28.36% (p=0.000 n=10+10) Hash1K/Sum224 64.3MB/s ± 0% 82.6MB/s ± 0% +28.51% (p=0.000 n=10+10) Hash1K/Sum256 64.3MB/s ± 0% 82.6MB/s ± 0% +28.46% (p=0.000 n=9+9) Hash8K/New 69.0MB/s ± 0% 89.0MB/s ± 0% +28.87% (p=0.000 n=10+8) Hash8K/Sum224 69.0MB/s ± 0% 89.0MB/s ± 0% +28.88% (p=0.000 n=9+10) Hash8K/Sum256 69.0MB/s ± 0% 88.9MB/s ± 0% +28.87% (p=0.000 n=8+9) Benchmark on crypto/sha512 (provided by Xiaodong Liu): name old time/op new time/op delta Hash8Bytes/New 1.55µs ± 0% 1.31µs ± 0% -15.67% (p=0.000 n=10+10) Hash8Bytes/Sum384 1.59µs ± 0% 1.35µs ± 0% -14.97% (p=0.000 n=10+10) Hash8Bytes/Sum512 1.62µs ± 0% 1.39µs ± 0% -14.02% (p=0.000 n=10+10) Hash1K/New 10.7µs ± 0% 8.6µs ± 0% -19.60% (p=0.000 n=8+8) Hash1K/Sum384 10.8µs ± 0% 8.7µs ± 0% -19.40% (p=0.000 n=9+9) Hash1K/Sum512 10.8µs ± 0% 8.7µs ± 0% -19.35% (p=0.000 n=9+10) Hash8K/New 74.6µs ± 0% 59.6µs ± 0% -20.08% (p=0.000 n=10+9) Hash8K/Sum384 74.7µs ± 0% 59.7µs ± 0% -20.04% (p=0.000 n=9+8) Hash8K/Sum512 74.7µs ± 0% 59.7µs ± 0% -20.01% (p=0.000 n=10+10) name old speed new speed delta Hash8Bytes/New 5.16MB/s ± 0% 6.12MB/s ± 0% +18.60% (p=0.000 n=10+8) Hash8Bytes/Sum384 5.02MB/s ± 0% 5.90MB/s ± 0% +17.56% (p=0.000 n=10+10) Hash8Bytes/Sum512 4.94MB/s ± 0% 5.74MB/s ± 0% +16.29% (p=0.000 n=10+9) Hash1K/New 95.4MB/s ± 0% 118.6MB/s ± 0% +24.38% (p=0.000 n=10+10) Hash1K/Sum384 95.0MB/s ± 0% 117.9MB/s ± 0% +24.06% (p=0.000 n=8+9) Hash1K/Sum512 94.8MB/s ± 0% 117.5MB/s ± 0% +23.99% (p=0.000 n=8+9) Hash8K/New 110MB/s ± 0% 137MB/s ± 0% +25.11% (p=0.000 n=9+6) Hash8K/Sum384 110MB/s ± 0% 137MB/s ± 0% +25.07% (p=0.000 n=9+8) Hash8K/Sum512 110MB/s ± 0% 137MB/s ± 0% +25.01% (p=0.000 n=10+10) Change-Id: I28ccfce634659305a336c8e0a3f8589f7361d661 Reviewed-on: https://go-review.googlesource.com/c/go/+/422317 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: David Chase <drchase@google.com>	2022-08-30 03:21:44 +00:00
Wayne Zuo	a6219737e3	cmd/compile: intrinsify Sub64 on riscv64 After this CL, the performance difference in crypto/elliptic benchmarks on linux/riscv64 are: name old time/op new time/op delta ScalarBaseMult/P256 1.64ms ± 1% 1.60ms ± 1% -2.36% (p=0.008 n=5+5) ScalarBaseMult/P224 1.53ms ± 1% 1.47ms ± 2% -4.24% (p=0.008 n=5+5) ScalarBaseMult/P384 5.12ms ± 2% 5.03ms ± 2% ~ (p=0.095 n=5+5) ScalarBaseMult/P521 22.3ms ± 2% 13.8ms ± 1% -37.89% (p=0.008 n=5+5) ScalarMult/P256 4.49ms ± 2% 4.26ms ± 2% -5.13% (p=0.008 n=5+5) ScalarMult/P224 4.33ms ± 1% 4.09ms ± 1% -5.59% (p=0.008 n=5+5) ScalarMult/P384 16.3ms ± 1% 15.5ms ± 2% -4.78% (p=0.008 n=5+5) ScalarMult/P521 101ms ± 0% 47ms ± 2% -53.36% (p=0.008 n=5+5) Change-Id: I31cf0506e27f9d85f576af1813630a19c20dda8a Reviewed-on: https://go-review.googlesource.com/c/go/+/420095 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-27 05:43:59 +00:00
Wayne Zuo	969f48a3a2	cmd/compile: intrinsify Add64 on riscv64 According to RISCV instruction set manual v2.2 Sec 2.4, we can implement overflowing check for unsigned addition cheaply using SLTU instructions. After this CL, the performance difference in crypto/elliptic benchmarks on linux/riscv64 are: name old time/op new time/op delta ScalarBaseMult/P256 1.93ms ± 1% 1.64ms ± 1% -14.96% (p=0.008 n=5+5) ScalarBaseMult/P224 1.80ms ± 2% 1.53ms ± 1% -14.89% (p=0.008 n=5+5) ScalarBaseMult/P384 6.15ms ± 2% 5.12ms ± 2% -16.73% (p=0.008 n=5+5) ScalarBaseMult/P521 25.9ms ± 1% 22.3ms ± 2% -13.78% (p=0.008 n=5+5) ScalarMult/P256 5.59ms ± 1% 4.49ms ± 2% -19.79% (p=0.008 n=5+5) ScalarMult/P224 5.42ms ± 1% 4.33ms ± 1% -20.01% (p=0.008 n=5+5) ScalarMult/P384 19.9ms ± 2% 16.3ms ± 1% -18.15% (p=0.008 n=5+5) ScalarMult/P521 97.3ms ± 1% 100.7ms ± 0% +3.48% (p=0.008 n=5+5) Change-Id: Ic4c82ced4b072a4a6575343fa9f29dd09b0cabc4 Reviewed-on: https://go-review.googlesource.com/c/go/+/420094 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-27 05:43:32 +00:00
Wayne Zuo	b60432df14	cmd/compile: deadcode for LoweredMuluhilo on riscv64 This is a follow up of CL 425101 on RISCV64. According to RISCV Volume 1, Unprivileged Spec v. 20191213 Chapter 7.1: If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. So we should not split Muluhilo to separate instructions. Updates #54607 Change-Id: If47461f3aaaf00e27cd583a9990e144fb8bcdb17 Reviewed-on: https://go-review.googlesource.com/c/go/+/425203 Auto-Submit: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-08-24 18:08:33 +00:00
Keith Randall	60ad3c48f5	cmd/compile: move SSA rotate instruction detection to arch-independent rules Detect rotate instructions while still in architecture-independent form. It's easier to do here, and we don't need to repeat it in each architecture file. Change-Id: I9396954b3f3b3bfb96c160d064a02002309935bb Reviewed-on: https://go-review.googlesource.com/c/go/+/421195 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Eric Fang <eric.fang@arm.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com> Run-TryBot: Keith Randall <khr@golang.org>	2022-08-23 21:24:14 +00:00
eric fang	9f0f87c806	cmd/internal/obj/arm64: remove the transition from $0 to ZR Previously we convert $0 to the ZR register for some reasons, which causes two problems: 1. Confusion, the special case of the ZR register needs to be considered when dealing with constants. For encoding, some places we encode ZR, and some places we encode $0, although we have converted $0 to ZR. 2. Unexpected instruction format. All instructions that support ZR register operands can be replaced by $0. This patch removes this conversion. Note that this patch may cause previously unintendedly supported instruction formats to no longer be supported. Change-Id: I3d8d2c06711b7614a38191397da7776417f1861c Reviewed-on: https://go-review.googlesource.com/c/go/+/404316 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-23 06:11:32 +00:00
eric fang	0a52d80666	cmd/compile/internal/ssa: optimize memory moving on arm64 This CL optimizes memory moving with LDP and STP on arm64. Benchmarks: name old time/op new time/op delta ClearFat7-160 1.08ns ± 0% 0.95ns ± 0% -11.41% (p=0.008 n=5+5) ClearFat8-160 0.84ns ± 0% 0.84ns ± 0% -0.95% (p=0.008 n=5+5) ClearFat11-160 1.08ns ± 0% 0.95ns ± 0% -11.46% (p=0.008 n=5+5) ClearFat12-160 0.95ns ± 0% 0.95ns ± 0% ~ (p=0.063 n=4+5) ClearFat13-160 1.08ns ± 0% 0.95ns ± 0% -11.45% (p=0.008 n=5+5) ClearFat14-160 1.08ns ± 0% 0.95ns ± 0% -11.47% (p=0.008 n=5+5) ClearFat15-160 1.24ns ± 0% 0.95ns ± 0% -22.98% (p=0.029 n=4+4) ClearFat16-160 0.84ns ± 0% 0.83ns ± 0% -0.11% (p=0.008 n=5+5) ClearFat24-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal) ClearFat32-160 2.86ns ± 0% 2.86ns ± 0% ~ (p=0.333 n=5+4) ClearFat40-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal) ClearFat48-160 3.32ns ± 1% 3.31ns ± 1% ~ (p=0.690 n=5+5) ClearFat56-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal) ClearFat64-160 3.25ns ± 1% 3.26ns ± 1% ~ (p=0.841 n=5+5) ClearFat72-160 2.22ns ± 0% 2.22ns ± 0% ~ (p=0.444 n=5+5) ClearFat128-160 4.03ns ± 0% 4.04ns ± 0% +0.32% (p=0.008 n=5+5) ClearFat256-160 6.44ns ± 0% 6.44ns ± 0% +0.08% (p=0.016 n=4+5) ClearFat512-160 12.2ns ± 0% 12.2ns ± 0% +0.13% (p=0.008 n=5+5) ClearFat1024-160 24.3ns ± 0% 24.3ns ± 0% ~ (p=0.167 n=5+5) ClearFat1032-160 24.5ns ± 0% 24.5ns ± 0% ~ (p=0.238 n=4+5) ClearFat1040-160 29.2ns ± 0% 29.3ns ± 0% +0.34% (p=0.008 n=5+5) CopyFat7-160 1.43ns ± 0% 1.07ns ± 0% -24.97% (p=0.008 n=5+5) CopyFat8-160 0.89ns ± 0% 0.89ns ± 0% ~ (p=0.238 n=5+5) CopyFat11-160 1.43ns ± 0% 1.07ns ± 0% -24.97% (p=0.008 n=5+5) CopyFat12-160 1.07ns ± 0% 1.07ns ± 0% ~ (p=0.238 n=5+4) CopyFat13-160 1.43ns ± 0% 1.07ns ± 0% ~ (p=0.079 n=4+5) CopyFat14-160 1.43ns ± 0% 1.07ns ± 0% -24.95% (p=0.008 n=5+5) CopyFat15-160 1.79ns ± 0% 1.07ns ± 0% ~ (p=0.079 n=4+5) CopyFat16-160 1.07ns ± 0% 1.07ns ± 0% ~ (p=0.444 n=5+5) CopyFat24-160 1.84ns ± 2% 1.67ns ± 0% -9.28% (p=0.008 n=5+5) CopyFat32-160 3.22ns ± 0% 2.92ns ± 0% -9.40% (p=0.008 n=5+5) CopyFat64-160 3.64ns ± 0% 3.57ns ± 0% -1.96% (p=0.008 n=5+5) CopyFat72-160 3.56ns ± 0% 3.11ns ± 0% -12.89% (p=0.008 n=5+5) CopyFat128-160 5.06ns ± 0% 5.06ns ± 0% +0.04% (p=0.048 n=5+5) CopyFat256-160 9.13ns ± 0% 9.13ns ± 0% ~ (p=0.659 n=5+5) CopyFat512-160 17.4ns ± 0% 17.4ns ± 0% ~ (p=0.167 n=5+5) CopyFat520-160 17.2ns ± 0% 17.3ns ± 0% +0.37% (p=0.008 n=5+5) CopyFat1024-160 34.1ns ± 0% 34.0ns ± 0% ~ (p=0.127 n=5+5) CopyFat1032-160 80.9ns ± 0% 34.2ns ± 0% -57.74% (p=0.008 n=5+5) CopyFat1040-160 94.4ns ± 0% 41.7ns ± 0% -55.78% (p=0.016 n=5+4) Change-Id: I14186f9f82b0ecf8b6c02191dc5da566b9a21e6c Reviewed-on: https://go-review.googlesource.com/c/go/+/421654 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-23 03:09:07 +00:00
Cherry Mui	8bf9e01473	cmd/compile: split Muluhilo op on ARM64 On ARM64 we use two separate instructions to compute the hi and lo results of a 64x64->128 multiplication. Lower to two separate ops so if only one result is needed we can deadcode the other. Fixes #54607. Change-Id: Ib023e77eb2b2b0bcf467b45471cb8a294bce6f90 Reviewed-on: https://go-review.googlesource.com/c/go/+/425101 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2022-08-22 21:29:31 +00:00
Keith Randall	908499adec	cmd/compile: stop using VARKILL With the introduction of stack objects, VARKILL information is no longer needed. With stack objects, an object is dead when there are no more static references to it, and the stack scanner can't find any live pointers to it. VARKILL information isn't used to establish live ranges for address-taken variables any more. In effect, the last static reference is the VARKILL, and there's an additional dynamic liveness check during stack scanning. Next CL will actually rip out the VARKILL opcodes. Change-Id: I030a2ab867445cf4e0e69397911f8a2e2f0ed07b Reviewed-on: https://go-review.googlesource.com/c/go/+/419234 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2022-08-18 17:36:38 +00:00
Archana R	d09c6ac417	test/codegen: updated multiple tests to verify on ppc64,ppc64le Updated multiple tests in test/codegen: math.go, mathbits.go, shift.go and slices.go to verify on ppc64/ppc64le as well Change-Id: Id88dd41569b7097819fb4d451b615f69cf7f7a94 Reviewed-on: https://go-review.googlesource.com/c/go/+/412115 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-08-17 13:56:55 +00:00
Wayne Zuo	09932f95f5	cmd/compile: combine more constant stores on amd64 Fixes #53324 Change-Id: I06149d860f858b082235e9d80bf0ea494679b386 Reviewed-on: https://go-review.googlesource.com/c/go/+/411614 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-08-15 16:19:21 +00:00
eric fang	efe5929dbd	cmd/compile/internal/ssa: optimize ARM64 code with TST For signed comparisons, the following four optimization rules hold: (CMPconst [0] z:(AND x y)) && z.Uses == 1 => (TST x y) (CMPWconst [0] z:(AND x y)) && z.Uses == 1 => (TSTW x y) (CMPconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTconst [c] y) (CMPWconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTWconst [int32(c)] y) But currently they only apply to jump instructions, not to conditional instructions within a block, such as cset, csel, etc. This CL extends the above rules into blocks so that conditional instructions can also be optimized. name old time/op new time/op delta DivisiblePow2constI64-160 1.04ns ± 0% 0.86ns ± 0% -17.30% (p=0.008 n=5+5) DivisiblePow2constI32-160 1.04ns ± 0% 0.87ns ± 0% -16.16% (p=0.016 n=4+5) DivisiblePow2constI16-160 1.04ns ± 0% 0.87ns ± 0% -16.03% (p=0.008 n=5+5) DivisiblePow2constI8-160 1.04ns ± 0% 0.86ns ± 0% -17.15% (p=0.008 n=5+5) Change-Id: I6bc34bff30862210e8dd001e0340b8fe502fe3de Reviewed-on: https://go-review.googlesource.com/c/go/+/420434 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com>	2022-08-10 02:13:53 +00:00
Cuong Manh Le	0f8dffd0aa	all: use ":" for compiler generated symbols As it can't appear in user package paths. There is a hack for handling "go:buildid" and "type:" on windows/386. Previously, windows/386 requires underscore prefix on external symbols, but that's only applied for SHOSTOBJ/SUNDEFEXT or cgo export symbols. "go.buildid" is STEXT, "type." is STYPE, thus they are not prefixed with underscore. In external linking mode, the external linker can't resolve them as external symbols. But we are lucky that they have "." in their name, so the external linker see them as Forwarder RVA exports. See: - https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#export-address-table - https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/pe-dll.c;h=e7b82ba6ffadf74dc1b9ee71dc13d48336941e51;hb=HEAD#l972) This CL changes "." to ":" in symbols name, so theses symbols can not be found by external linker anymore. So a hacky way is adding the underscore prefix for these 2 symbols. I don't have enough knowledge to verify whether adding the underscore for all STEXT/STYPE symbols are fine, even if it could be, that would be done in future CL. Fixes #37762 Change-Id: I92eaaf24c0820926a36e0530fdb07b07af1fcc35 Reviewed-on: https://go-review.googlesource.com/c/go/+/317917 Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-09 11:28:56 +00:00
Lynn Boger	c1bfefe9d1	cmd/compile: fix confusion with ANDCCconst in PPC64 rules Currently there is a an ANDconst and an ANDCCconst op in PPC64, which is confusing since they map onto the same instruction. One of these ops sets the result of the AND operation, and the other sets the flag (condition register). This converts ANDCCconst into an op with the 2 expected results: the integer result of the AND and the flag setting. The ANDconst op has been removed. Note that in the PPC64 ISA the only variation of the 'and immediate' is the one that sets the condition bit, which probably led to the original (confusing) implementation. This also adds a few rules to improve the use of ANDCCconst with ISELB and some testcases to verify those improvements. Change-Id: I523703fa4da2098eb995dc3ba744d36fa28e41d4 Reviewed-on: https://go-review.googlesource.com/c/go/+/422015 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Paul Murphy <murp@ibm.com>	2022-08-08 20:15:55 +00:00
Keith Randall	c2a9c55823	cmd/compile: optimize unsafe.Slice generated code We don't need a multiply when the element type is size 0 or 1. The panic functions don't return, so we don't need any post-call code (register restores, etc.). Change-Id: I0dcea5df56d29d7be26554ddca966b3903c672e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/419754 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2022-08-08 17:36:47 +00:00
cuiweixie	e7307034cc	cmd/compile: store combine on amd64 Fixes #54120 Change-Id: I6915b6e8d459d9becfdef4fdcba95ee4dea6af05 GitHub-Last-Rev: `03f19942c7` GitHub-Pull-Request: golang/go#54126 Reviewed-on: https://go-review.googlesource.com/c/go/+/420115 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>	2022-08-08 16:40:58 +00:00
Matthew Dempsky	ab8d7dd75e	cmd/compile: set LocalPkg.Path to -p flag Since CL 391014, cmd/compile now requires the -p flag to be set the build system. This CL changes it to initialize LocalPkg.Path to the provided path, rather than relying on writing out `"".` into object files and expecting cmd/link to substitute them. However, this actually involved a rather long tail of fixes. Many have already been submitted, but a few notable ones that have to land simultaneously with changing LocalPkg: 1. When compiling package runtime, there are really two "runtime" packages: types.LocalPkg (the source package itself) and ir.Pkgs.Runtime (the compiler's internal representation, for synthetic references). Previously, these ended up creating separate link symbols (`"".xxx` and `runtime.xxx`, respectively), but now they both end up as `runtime.xxx`, which causes lsym collisions (notably inittask and funcsyms). 2. test/codegen tests need to be updated to expect symbols to be named `command-line-arguments.xxx` rather than `"".foo`. 3. The issue20014 test case is sensitive to the sort order of field tracking symbols. In particular, the local package now sorts to its natural place in the list, rather than to the front. Thanks to David Chase for helping track down all of the fixes needed for this CL. Updates #51734. Change-Id: Iba3041cf7ad967d18c6e17922fa06ba11798b565 Reviewed-on: https://go-review.googlesource.com/c/go/+/393715 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>	2022-05-16 18:19:47 +00:00
Cherry Mui	540f8c2b50	cmd/compile: use jump table on ARM64 Following CL 357330, use jump tables on ARM64. name old time/op new time/op delta Switch8Predictable-4 3.41ns ± 0% 3.21ns ± 0% ~ (p=0.079 n=4+5) Switch8Unpredictable-4 12.0ns ± 0% 9.5ns ± 0% -21.17% (p=0.000 n=5+4) Switch32Predictable-4 3.06ns ± 0% 2.82ns ± 0% -7.78% (p=0.008 n=5+5) Switch32Unpredictable-4 13.3ns ± 0% 9.5ns ± 0% -28.87% (p=0.016 n=4+5) SwitchStringPredictable-4 3.71ns ± 0% 3.21ns ± 0% -13.43% (p=0.000 n=5+4) SwitchStringUnpredictable-4 14.8ns ± 0% 15.1ns ± 0% +2.37% (p=0.008 n=5+5) Change-Id: Ia0b85df7ca9273cf70c05eb957225c6e61822fa6 Reviewed-on: https://go-review.googlesource.com/c/go/+/403979 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>	2022-05-13 19:51:03 +00:00
Paul E. Murphy	0c43878baa	cmd/compile: lower Add64/Sub64 into ssa on PPC64 math/bits.Add64 and math/bits.Sub64 now lower and optimize directly in SSA form. The optimization of carry chains focuses around eliding XER<->GPR transfers of the CA bit when used exclusively as an input to a single carry operations, or when the CA value is known. This also adds support for handling XER spills in the assembler which could happen if carry chains contain inter-dependencies on each other (which seems very unlikely with practical usage), or a clobber happens (SRAW/SRAD/SUBFC operations clobber CA). With PPC64 Add64/Sub64 lowering into SSA and this patch, the net performance difference in crypto/elliptic benchmarks on P9/ppc64le are: name old time/op new time/op delta ScalarBaseMult/P256 46.3µs ± 0% 46.9µs ± 0% +1.34% ScalarBaseMult/P224 356µs ± 0% 209µs ± 0% -41.14% ScalarBaseMult/P384 1.20ms ± 0% 0.57ms ± 0% -52.14% ScalarBaseMult/P521 3.38ms ± 0% 1.44ms ± 0% -57.27% ScalarMult/P256 199µs ± 0% 199µs ± 0% -0.17% ScalarMult/P224 357µs ± 0% 212µs ± 0% -40.56% ScalarMult/P384 1.20ms ± 0% 0.58ms ± 0% -51.86% ScalarMult/P521 3.37ms ± 0% 1.44ms ± 0% -57.32% MarshalUnmarshal/P256/Uncompressed 2.59µs ± 0% 2.52µs ± 0% -2.63% MarshalUnmarshal/P256/Compressed 2.58µs ± 0% 2.52µs ± 0% -2.06% MarshalUnmarshal/P224/Uncompressed 1.54µs ± 0% 1.40µs ± 0% -9.42% MarshalUnmarshal/P224/Compressed 1.54µs ± 0% 1.39µs ± 0% -9.87% MarshalUnmarshal/P384/Uncompressed 2.40µs ± 0% 1.80µs ± 0% -24.93% MarshalUnmarshal/P384/Compressed 2.35µs ± 0% 1.81µs ± 0% -23.03% MarshalUnmarshal/P521/Uncompressed 3.79µs ± 0% 2.58µs ± 0% -31.81% MarshalUnmarshal/P521/Compressed 3.80µs ± 0% 2.60µs ± 0% -31.67% Note, P256 uses an asm implementation, thus, little variation is expected. Change-Id: I88a24f6bf0f4f285c649e40243b1ab69cc452b71 Reviewed-on: https://go-review.googlesource.com/c/go/+/346870 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-05-10 20:03:53 +00:00
Jorropo	e1e056fa6a	cmd/compile: fold constants found by prove It is hit ~70k times building go. This make the go binary, 0.04% smaller. I didn't included benchmarks because this is just constant foldings and is hard to mesure objectively. For example, this enable rewriting things like: if x == 20 { return x + 30 + z } Into: if x == 20 { return 50 + z } It's not just fixing programer's code, the ssa generator generate code like this sometimes. Change-Id: I0861f342b27f7227b5f1c34d8267fa0057b1bbbc GitHub-Last-Rev: `4c2f9b5216` GitHub-Pull-Request: golang/go#52669 Reviewed-on: https://go-review.googlesource.com/c/go/+/403735 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>	2022-05-04 20:30:17 +00:00
Paul E. Murphy	c570f0eda2	cmd/compile: combine OR + NOT into ORN on PPC64 This shows up in a few crypto functions, and other assorted places. Change-Id: I5a7f4c25ddd4a6499dc295ef693b9fe43d2448ab Reviewed-on: https://go-review.googlesource.com/c/go/+/404057 Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2022-05-04 18:49:50 +00:00
Cuong Manh Le	0668e3cb1a	cmd/compile: support pointers to arrays in arrayClear Fixes #52635 Change-Id: I85f182931e30292983ef86c55a0ab6e01282395c Reviewed-on: https://go-review.googlesource.com/c/go/+/403337 Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-05-03 05:42:48 +00:00
Keith Randall	c4b2288755	cmd/compile: add jump table codegen test Change-Id: Ic67f676f5ebe146166a0d3c1d78a802881320e49 Reviewed-on: https://go-review.googlesource.com/c/go/+/400375 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>	2022-04-14 21:16:29 +00:00
Wayne Zuo	66f03f79da	cmd/compile: add SHLX&SHRX without load Change-Id: I79eb5e7d6bcb23f26d3a100e915efff6dae70391 Reviewed-on: https://go-review.googlesource.com/c/go/+/399061 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-04-13 17:48:36 +00:00
Wayne Zuo	517781b391	cmd/compile: add SARXQload and SARXLload Change-Id: I4e8dc5009a5b8af37d71b62f1322f94806d3e9d9 Reviewed-on: https://go-review.googlesource.com/c/go/+/399056 Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-04-13 17:48:12 +00:00
Wayne Zuo	d6320f1a58	cmd/compile: add SARX instruction for GOAMD64>=3 name old time/op new time/op delta ShiftArithmeticRight-8 0.68ns ± 5% 0.30ns ± 6% -56.14% (p=0.000 n=10+10) Change-Id: I052a0d7b9e6526d526276444e588b0cc288beff4 Reviewed-on: https://go-review.googlesource.com/c/go/+/399055 Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-04-12 12:59:27 +00:00
Wayne Zuo	32de2b0d1c	cmd/compile: add MOVBE index load/store Fixes #51724 Change-Id: I94e650a7482dc4c479d597f0162a6a89d779708d Reviewed-on: https://go-review.googlesource.com/c/go/+/395474 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>	2022-04-11 15:41:56 +00:00
Wayne Zuo	615d3c3040	test: adjust load and store test In the load tests, we only want to test the assembly produced by the load operations. If we use the global variable sink, it will produce one load operation and one store operation(assign to sink). For example: func load_be64(b []byte) uint64 { sink64 = binary.BigEndian.Uint64(b) } If we compile this function with GOAMD64=v3, it may produce MOVBEQload and MOVQstore or MOVQload and MOVBEQstore, but we only want MOVBEQload. Discovered when developing CL 395474. Same for the store tests. Change-Id: I65c3c742f1eff657c3a0d2dd103f51140ae8079e Reviewed-on: https://go-review.googlesource.com/c/go/+/397875 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>	2022-04-11 15:41:04 +00:00
Wayne Zuo	7fbabe8d57	cmd/compile: use shlx&shrx instruction for GOAMD64>=v3 The SHRX/SHLX instruction can take any general register as the shift count operand, and can read source from memory. This CL introduces some operators to combine load and shift to one instruction. For #47120 Change-Id: I13b48f53c7d30067a72eb2c8382242045dead36a Reviewed-on: https://go-review.googlesource.com/c/go/+/385174 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>	2022-04-04 20:07:23 +00:00
Wayne Zuo	a92ca51507	cmd/compile: use LZCNT instruction for GOAMD64>=3 LZCNT is similar to BSR, but BSR(x) is undefined when x == 0, so using LZCNT can avoid a special case for zero input. Except that case, LZCNTQ(x) == 63-BSRQ(x) and LZCNTL(x) == 31-BSRL(x). And according to https://www.agner.org/optimize/instruction_tables.pdf, LZCNT instructions are much faster than BSR on AMD CPU. name old time/op new time/op delta LeadingZeros-8 0.91ns ± 1% 0.80ns ± 7% -11.68% (p=0.000 n=9+9) LeadingZeros8-8 0.98ns ±15% 0.91ns ± 1% -7.34% (p=0.000 n=9+9) LeadingZeros16-8 0.94ns ± 3% 0.92ns ± 2% -2.36% (p=0.001 n=10+10) LeadingZeros32-8 0.89ns ± 1% 0.78ns ± 2% -12.49% (p=0.000 n=10+10) LeadingZeros64-8 0.92ns ± 1% 0.78ns ± 1% -14.48% (p=0.000 n=10+10) Change-Id: I125147fe3d6994a4cfe558432780408e9a27557a Reviewed-on: https://go-review.googlesource.com/c/go/+/396794 Reviewed-by: Keith Randall <khr@golang.org> Trust: Emmanuel Odeke <emmanuel@orijtech.com> Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-04-04 04:01:17 +00:00
Wayne Zuo	ba6df85c7c	cmd/compile: add MOVBEWstore support for GOAMD64>=3 This CL add MOVBE support for 16-bit version, but MOVBEWload is excluded because it does not satisfy zero extented. For #51724 Change-Id: I3fadf20bcbb9b423f6355e6a1e340107e8e621ac Reviewed-on: https://go-review.googlesource.com/c/go/+/396617 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Trust: Emmanuel Odeke <emmanuel@orijtech.com>	2022-04-03 23:48:16 +00:00
fanzha02	8ab42a945a	cmd/compile: merge ANDconst and UBFX into UBFX on arm64 Add a new rewrite rule to merge ANDconst and UBFX into UBFX. Add test cases. Change-Id: I24d6442d0c956d7ce092c3a3858d4a3a41771670 Reviewed-on: https://go-review.googlesource.com/c/go/+/377054 Trust: Fannie Zhang <Fannie.Zhang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-03-25 01:24:44 +00:00

1 2 3 4 5 ...

373 Commits