qbit/go - go - Tape:neT

qbit/go

mirror of https://github.com/golang/go synced 2024-11-26 03:57:57 -07:00

Author	SHA1	Message	Date
Keith Randall	c4b2288755	cmd/compile: add jump table codegen test Change-Id: Ic67f676f5ebe146166a0d3c1d78a802881320e49 Reviewed-on: https://go-review.googlesource.com/c/go/+/400375 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>	2022-04-14 21:16:29 +00:00
Wayne Zuo	66f03f79da	cmd/compile: add SHLX&SHRX without load Change-Id: I79eb5e7d6bcb23f26d3a100e915efff6dae70391 Reviewed-on: https://go-review.googlesource.com/c/go/+/399061 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-04-13 17:48:36 +00:00
Wayne Zuo	517781b391	cmd/compile: add SARXQload and SARXLload Change-Id: I4e8dc5009a5b8af37d71b62f1322f94806d3e9d9 Reviewed-on: https://go-review.googlesource.com/c/go/+/399056 Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-04-13 17:48:12 +00:00
Wayne Zuo	d6320f1a58	cmd/compile: add SARX instruction for GOAMD64>=3 name old time/op new time/op delta ShiftArithmeticRight-8 0.68ns ± 5% 0.30ns ± 6% -56.14% (p=0.000 n=10+10) Change-Id: I052a0d7b9e6526d526276444e588b0cc288beff4 Reviewed-on: https://go-review.googlesource.com/c/go/+/399055 Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-04-12 12:59:27 +00:00
Wayne Zuo	32de2b0d1c	cmd/compile: add MOVBE index load/store Fixes #51724 Change-Id: I94e650a7482dc4c479d597f0162a6a89d779708d Reviewed-on: https://go-review.googlesource.com/c/go/+/395474 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>	2022-04-11 15:41:56 +00:00
Wayne Zuo	615d3c3040	test: adjust load and store test In the load tests, we only want to test the assembly produced by the load operations. If we use the global variable sink, it will produce one load operation and one store operation(assign to sink). For example: func load_be64(b []byte) uint64 { sink64 = binary.BigEndian.Uint64(b) } If we compile this function with GOAMD64=v3, it may produce MOVBEQload and MOVQstore or MOVQload and MOVBEQstore, but we only want MOVBEQload. Discovered when developing CL 395474. Same for the store tests. Change-Id: I65c3c742f1eff657c3a0d2dd103f51140ae8079e Reviewed-on: https://go-review.googlesource.com/c/go/+/397875 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>	2022-04-11 15:41:04 +00:00
Wayne Zuo	7fbabe8d57	cmd/compile: use shlx&shrx instruction for GOAMD64>=v3 The SHRX/SHLX instruction can take any general register as the shift count operand, and can read source from memory. This CL introduces some operators to combine load and shift to one instruction. For #47120 Change-Id: I13b48f53c7d30067a72eb2c8382242045dead36a Reviewed-on: https://go-review.googlesource.com/c/go/+/385174 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>	2022-04-04 20:07:23 +00:00
Wayne Zuo	a92ca51507	cmd/compile: use LZCNT instruction for GOAMD64>=3 LZCNT is similar to BSR, but BSR(x) is undefined when x == 0, so using LZCNT can avoid a special case for zero input. Except that case, LZCNTQ(x) == 63-BSRQ(x) and LZCNTL(x) == 31-BSRL(x). And according to https://www.agner.org/optimize/instruction_tables.pdf, LZCNT instructions are much faster than BSR on AMD CPU. name old time/op new time/op delta LeadingZeros-8 0.91ns ± 1% 0.80ns ± 7% -11.68% (p=0.000 n=9+9) LeadingZeros8-8 0.98ns ±15% 0.91ns ± 1% -7.34% (p=0.000 n=9+9) LeadingZeros16-8 0.94ns ± 3% 0.92ns ± 2% -2.36% (p=0.001 n=10+10) LeadingZeros32-8 0.89ns ± 1% 0.78ns ± 2% -12.49% (p=0.000 n=10+10) LeadingZeros64-8 0.92ns ± 1% 0.78ns ± 1% -14.48% (p=0.000 n=10+10) Change-Id: I125147fe3d6994a4cfe558432780408e9a27557a Reviewed-on: https://go-review.googlesource.com/c/go/+/396794 Reviewed-by: Keith Randall <khr@golang.org> Trust: Emmanuel Odeke <emmanuel@orijtech.com> Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-04-04 04:01:17 +00:00
Wayne Zuo	ba6df85c7c	cmd/compile: add MOVBEWstore support for GOAMD64>=3 This CL add MOVBE support for 16-bit version, but MOVBEWload is excluded because it does not satisfy zero extented. For #51724 Change-Id: I3fadf20bcbb9b423f6355e6a1e340107e8e621ac Reviewed-on: https://go-review.googlesource.com/c/go/+/396617 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Trust: Emmanuel Odeke <emmanuel@orijtech.com>	2022-04-03 23:48:16 +00:00
fanzha02	8ab42a945a	cmd/compile: merge ANDconst and UBFX into UBFX on arm64 Add a new rewrite rule to merge ANDconst and UBFX into UBFX. Add test cases. Change-Id: I24d6442d0c956d7ce092c3a3858d4a3a41771670 Reviewed-on: https://go-review.googlesource.com/c/go/+/377054 Trust: Fannie Zhang <Fannie.Zhang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-03-25 01:24:44 +00:00
Keith Randall	aaa3d39f27	cmd/compile: include all entries in map literal hint size Currently we only include static entries in the hint for sizing the map when allocating a map for a map literal. Change that to include all entries. This will be an overallocation if the dynamic entries in the map have equal keys, but equal keys in map literals are rare, and at worst we waste a bit of space. Fixes #43020 Change-Id: I232f82f15316bdf4ea6d657d25a0b094b77884ce Reviewed-on: https://go-review.googlesource.com/c/go/+/383634 Run-TryBot: Keith Randall <khr@golang.org> Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-03-01 23:20:30 +00:00
Archana R	4056934e48	test/codegen: updated arithmetic tests to verify on ppc64,ppc64le Updated multiple tests in test/codegen/arithmetic.go to verify on ppc64/ppc64le as well Change-Id: I79ca9f87017ea31147a4ba16f5d42ba0fcae64e1 Reviewed-on: https://go-review.googlesource.com/c/go/+/358546 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-11-01 13:12:37 +00:00
Archana R	8b9c0d1a79	test/codegen: updated comparison test to verify on ppc64,ppc64le Updated test/codegen/comparison.go to verify memequal is inlined as implemented in CL 328291. Change-Id: If7824aed37ee1f8640e54fda0f9b7610582ba316 Reviewed-on: https://go-review.googlesource.com/c/go/+/357289 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-10-21 13:05:07 +00:00
wdvxdr	fe7df4c4d0	cmd/compile: use MOVBE instruction for GOAMD64>=v3 In CL 354670, I copied some existing rules for convenience but forgot to update the last rule which broke `GOAMD64=v3 ./make.bat` Revive CL 354670 Change-Id: Ic1e2047c603f0122482a4b293ce1ef74d806c019 Reviewed-on: https://go-review.googlesource.com/c/go/+/356810 Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org> Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org>	2021-10-19 16:18:56 +00:00
Daniel Martí	b0351bfd7d	Revert "cmd/compile: use MOVBE instruction for GOAMD64>=v3" This reverts CL 354670. Reason for revert: broke make.bash with GOAMD64=v3. Fixes #49061. Change-Id: I7f2ed99b7c10100c4e0c1462ea91c4c9d8c609b2 Reviewed-on: https://go-review.googlesource.com/c/go/+/356790 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Koichi Shiraishi <zchee.io@gmail.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>	2021-10-19 09:49:38 +00:00
Lynn Boger	33b3260c1e	cmd/compile/internal/ssagen: set BitLen32 as intrinsic on PPC64 It was noticed through some other investigation that BitLen32 was not generating the best code and found that it wasn't recognized as an intrinsic. This corrects that and enables the test for PPC64. Change-Id: Iab496a8830c8552f507b7292649b1b660f3848b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/355872 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-10-18 21:33:08 +00:00
wdvxdr	3e5cc4d6f6	cmd/compile: use MOVBE instruction for GOAMD64>=v3 encoding/binary benchmark on my laptop: name old time/op new time/op delta ReadSlice1000Int32s-8 4.42µs ± 5% 4.20µs ± 1% -4.94% (p=0.046 n=9+8) ReadStruct-8 359ns ± 8% 368ns ± 5% +2.35% (p=0.041 n=9+10) WriteStruct-8 349ns ± 1% 357ns ± 1% +2.15% (p=0.000 n=8+10) ReadInts-8 235ns ± 1% 233ns ± 1% -1.01% (p=0.005 n=10+10) WriteInts-8 265ns ± 1% 274ns ± 1% +3.45% (p=0.000 n=10+10) WriteSlice1000Int32s-8 4.61µs ± 5% 4.59µs ± 5% ~ (p=0.986 n=10+10) PutUint16-8 0.56ns ± 4% 0.57ns ± 4% ~ (p=0.101 n=10+10) PutUint32-8 0.83ns ± 2% 0.56ns ± 6% -32.91% (p=0.000 n=10+10) PutUint64-8 0.81ns ± 3% 0.62ns ± 4% -23.82% (p=0.000 n=10+10) LittleEndianPutUint16-8 0.55ns ± 4% 0.55ns ± 3% ~ (p=0.926 n=10+10) LittleEndianPutUint32-8 0.41ns ± 4% 0.42ns ± 3% ~ (p=0.148 n=10+9) LittleEndianPutUint64-8 0.55ns ± 2% 0.56ns ± 6% ~ (p=0.897 n=10+10) ReadFloats-8 60.4ns ± 4% 59.0ns ± 1% -2.25% (p=0.007 n=10+10) WriteFloats-8 72.3ns ± 2% 71.5ns ± 7% ~ (p=0.089 n=10+10) ReadSlice1000Float32s-8 4.21µs ± 3% 4.18µs ± 2% ~ (p=0.197 n=10+10) WriteSlice1000Float32s-8 4.61µs ± 2% 4.68µs ± 7% ~ (p=1.000 n=8+10) ReadSlice1000Uint8s-8 250ns ± 4% 247ns ± 4% ~ (p=0.324 n=10+10) WriteSlice1000Uint8s-8 227ns ± 5% 229ns ± 2% ~ (p=0.193 n=10+7) PutUvarint32-8 15.3ns ± 2% 15.4ns ± 4% ~ (p=0.782 n=10+10) PutUvarint64-8 38.5ns ± 1% 38.6ns ± 5% ~ (p=0.396 n=8+10) name old speed new speed delta ReadSlice1000Int32s-8 890MB/s ±17% 953MB/s ± 1% +7.00% (p=0.027 n=10+8) ReadStruct-8 209MB/s ± 8% 204MB/s ± 5% -2.42% (p=0.043 n=9+10) WriteStruct-8 214MB/s ± 3% 210MB/s ± 1% -1.75% (p=0.003 n=9+10) ReadInts-8 127MB/s ± 1% 129MB/s ± 1% +1.01% (p=0.006 n=10+10) WriteInts-8 113MB/s ± 1% 109MB/s ± 1% -3.34% (p=0.000 n=10+10) WriteSlice1000Int32s-8 868MB/s ± 5% 872MB/s ± 5% ~ (p=1.000 n=10+10) PutUint16-8 3.55GB/s ± 4% 3.50GB/s ± 4% ~ (p=0.093 n=10+10) PutUint32-8 4.83GB/s ± 2% 7.21GB/s ± 6% +49.16% (p=0.000 n=10+10) PutUint64-8 9.89GB/s ± 3% 12.99GB/s ± 4% +31.26% (p=0.000 n=10+10) LittleEndianPutUint16-8 3.65GB/s ± 4% 3.65GB/s ± 4% ~ (p=0.912 n=10+10) LittleEndianPutUint32-8 9.74GB/s ± 3% 9.63GB/s ± 3% ~ (p=0.222 n=9+9) LittleEndianPutUint64-8 14.4GB/s ± 2% 14.3GB/s ± 5% ~ (p=0.912 n=10+10) ReadFloats-8 199MB/s ± 4% 203MB/s ± 1% +2.27% (p=0.007 n=10+10) WriteFloats-8 166MB/s ± 2% 168MB/s ± 7% ~ (p=0.089 n=10+10) ReadSlice1000Float32s-8 949MB/s ± 3% 958MB/s ± 2% ~ (p=0.218 n=10+10) WriteSlice1000Float32s-8 867MB/s ± 2% 857MB/s ± 6% ~ (p=1.000 n=8+10) ReadSlice1000Uint8s-8 4.00GB/s ± 4% 4.06GB/s ± 4% ~ (p=0.353 n=10+10) WriteSlice1000Uint8s-8 4.40GB/s ± 4% 4.36GB/s ± 2% ~ (p=0.193 n=10+7) PutUvarint32-8 262MB/s ± 2% 260MB/s ± 4% ~ (p=0.739 n=10+10) PutUvarint64-8 208MB/s ± 1% 207MB/s ± 5% ~ (p=0.408 n=8+10) Updates #45453 Change-Id: Ifda0d48d54665cef45d46d3aad974062633142c4 Reviewed-on: https://go-review.googlesource.com/c/go/+/354670 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Trust: Matthew Dempsky <mdempsky@google.com>	2021-10-18 16:01:55 +00:00
Jake Ciolek	732f6fa9d5	cmd/compile: use ANDL for small immediates We can rewrite ANDQ with an immediate fitting in 32bit with an ANDL, which is shorter to encode. Looking at Go binary itself, before the change there was: ANDL: 2337 ANDQ: 4476 After the change: ANDL: 3790 ANDQ: 3024 So we got rid of 1452 ANDQs This makes the Linux x86_64 binary 0.03% smaller. There seems to be an impact on performance. Intel Cascade Lake benchmarks (with perflock): name old time/op new time/op delta BinaryTree17-8 1.91s ± 1% 1.89s ± 1% -1.22% (p=0.000 n=21+18) Fannkuch11-8 2.34s ± 0% 2.34s ± 0% ~ (p=0.052 n=20+20) FmtFprintfEmpty-8 27.7ns ± 1% 27.4ns ± 3% ~ (p=0.497 n=21+21) FmtFprintfString-8 53.2ns ± 0% 51.5ns ± 0% -3.21% (p=0.000 n=20+19) FmtFprintfInt-8 57.3ns ± 0% 55.7ns ± 0% -2.89% (p=0.000 n=19+19) FmtFprintfIntInt-8 92.3ns ± 0% 88.4ns ± 1% -4.23% (p=0.000 n=20+21) FmtFprintfPrefixedInt-8 103ns ± 0% 103ns ± 0% +0.23% (p=0.000 n=20+21) FmtFprintfFloat-8 147ns ± 0% 148ns ± 0% +0.75% (p=0.000 n=20+21) FmtManyArgs-8 384ns ± 0% 381ns ± 0% -0.63% (p=0.000 n=21+21) GobDecode-8 3.86ms ± 1% 3.88ms ± 1% +0.52% (p=0.000 n=20+21) GobEncode-8 2.77ms ± 1% 2.77ms ± 0% ~ (p=0.078 n=21+21) Gzip-8 168ms ± 1% 168ms ± 0% +0.24% (p=0.000 n=20+20) Gunzip-8 25.1ms ± 0% 24.3ms ± 0% -3.03% (p=0.000 n=21+21) HTTPClientServer-8 61.4µs ± 8% 59.1µs ±10% ~ (p=0.088 n=20+21) JSONEncode-8 6.86ms ± 0% 6.70ms ± 0% -2.29% (p=0.000 n=20+19) JSONDecode-8 30.8ms ± 1% 30.6ms ± 1% -0.82% (p=0.000 n=20+20) Mandelbrot200-8 3.85ms ± 0% 3.85ms ± 0% ~ (p=0.191 n=16+17) GoParse-8 2.61ms ± 2% 2.60ms ± 1% ~ (p=0.561 n=21+20) RegexpMatchEasy0_32-8 48.5ns ± 2% 45.9ns ± 3% -5.26% (p=0.000 n=20+21) RegexpMatchEasy0_1K-8 139ns ± 0% 139ns ± 0% +0.27% (p=0.000 n=18+20) RegexpMatchEasy1_32-8 41.3ns ± 0% 42.1ns ± 4% +1.95% (p=0.000 n=17+21) RegexpMatchEasy1_1K-8 216ns ± 2% 216ns ± 0% +0.17% (p=0.020 n=21+19) RegexpMatchMedium_32-8 790ns ± 7% 803ns ± 8% ~ (p=0.178 n=21+21) RegexpMatchMedium_1K-8 23.5µs ± 5% 23.7µs ± 5% ~ (p=0.421 n=21+21) RegexpMatchHard_32-8 1.09µs ± 1% 1.09µs ± 1% -0.53% (p=0.000 n=19+18) RegexpMatchHard_1K-8 33.0µs ± 0% 33.0µs ± 0% ~ (p=0.610 n=21+20) Revcomp-8 348ms ± 0% 353ms ± 0% +1.38% (p=0.000 n=17+18) Template-8 42.0ms ± 1% 41.9ms ± 1% -0.30% (p=0.049 n=20+20) TimeParse-8 185ns ± 0% 185ns ± 0% ~ (p=0.387 n=20+18) TimeFormat-8 237ns ± 1% 241ns ± 1% +1.57% (p=0.000 n=21+21) [Geo mean] 35.4µs 35.2µs -0.66% name old speed new speed delta GobDecode-8 199MB/s ± 1% 198MB/s ± 1% -0.52% (p=0.000 n=20+21) GobEncode-8 277MB/s ± 1% 277MB/s ± 0% ~ (p=0.075 n=21+21) Gzip-8 116MB/s ± 1% 115MB/s ± 0% -0.25% (p=0.000 n=20+20) Gunzip-8 773MB/s ± 0% 797MB/s ± 0% +3.12% (p=0.000 n=21+21) JSONEncode-8 283MB/s ± 0% 290MB/s ± 0% +2.35% (p=0.000 n=20+19) JSONDecode-8 63.0MB/s ± 1% 63.5MB/s ± 1% +0.82% (p=0.000 n=20+20) GoParse-8 22.2MB/s ± 2% 22.3MB/s ± 1% ~ (p=0.539 n=21+20) RegexpMatchEasy0_32-8 660MB/s ± 2% 697MB/s ± 3% +5.57% (p=0.000 n=20+21) RegexpMatchEasy0_1K-8 7.36GB/s ± 0% 7.34GB/s ± 0% -0.26% (p=0.000 n=18+20) RegexpMatchEasy1_32-8 775MB/s ± 0% 761MB/s ± 4% -1.88% (p=0.000 n=17+21) RegexpMatchEasy1_1K-8 4.74GB/s ± 2% 4.74GB/s ± 0% -0.18% (p=0.020 n=21+19) RegexpMatchMedium_32-8 40.6MB/s ± 7% 39.9MB/s ± 9% ~ (p=0.191 n=21+21) RegexpMatchMedium_1K-8 43.7MB/s ± 5% 43.2MB/s ± 5% ~ (p=0.435 n=21+21) RegexpMatchHard_32-8 29.3MB/s ± 1% 29.4MB/s ± 1% +0.53% (p=0.000 n=19+18) RegexpMatchHard_1K-8 31.0MB/s ± 0% 31.0MB/s ± 0% ~ (p=0.572 n=21+20) Revcomp-8 730MB/s ± 0% 720MB/s ± 0% -1.36% (p=0.000 n=17+18) Template-8 46.2MB/s ± 1% 46.3MB/s ± 1% +0.30% (p=0.041 n=20+20) [Geo mean] 204MB/s 205MB/s +0.30% Change-Id: Iac75d0ec184a515ce0e65e19559d5fe2e9840514 Reviewed-on: https://go-review.googlesource.com/c/go/+/354970 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Trust: Josh Bleecher Snyder <josharian@gmail.com> Trust: Keith Randall <khr@golang.org> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-10-12 22:02:39 +00:00
Alejandro García Montoro	e1c294a56d	cmd/compile: eliminate successive swaps The code generated when storing eight bytes loaded from memory in big endian introduced two successive byte swaps that did not actually modified the data. The new rules match this specific pattern both for amd64 and for arm64, eliminating the double swap. Fixes #41684 Change-Id: Icb6dc20b68e4393cef4fe6a07b33aba0d18c3ff3 Reviewed-on: https://go-review.googlesource.com/c/go/+/320073 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: Daniel Martí <mvdan@mvdan.cc> Trust: Dmitri Shuralyov <dmitshur@golang.org>	2021-10-09 01:04:29 +00:00
Ruslan Andreev	810b08b8ec	cmd/compile: inline memequal(x, const, sz) for small sizes This CL adds late expanded memequal(x, const, sz) inlining for 2, 4, 8 bytes size. This PoC is using the same method as CL 248404. This optimization fires about 100 times in Go compiler (1675 occurrences reduced to 1574, so -6%). Also, added unit-tests to codegen/comparisions.go file. Updates #37275 Change-Id: Ia52808d573cb706d1da8166c5746ede26f46c5da Reviewed-on: https://go-review.googlesource.com/c/go/+/328291 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Trust: David Chase <drchase@google.com>	2021-10-06 13:47:50 +00:00
nimelehin	097a82f54d	cmd/compile: don't emit unnecessary amd64 extension checks In case of amd64 the compiler issues checks if extensions are available on a platform. With GOAMD64 microarchitecture levels provided, some of the checks could be eliminated. Change-Id: If15c178bcae273b2ce7d3673415cb8849292e087 Reviewed-on: https://go-review.googlesource.com/c/go/+/352010 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org>	2021-10-05 18:32:12 +00:00
wdvxdr	060cd73ab9	cmd/compile: use TZCNT instruction for GOAMD64>=v3 on my Intel CoffeeLake CPU: name old time/op new time/op delta TrailingZeros-8 0.68ns ± 1% 0.64ns ± 1% -6.26% (p=0.000 n=10+10) TrailingZeros8-8 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.697 n=10+10) TrailingZeros16-8 0.70ns ± 1% 0.70ns ± 1% +0.57% (p=0.043 n=10+10) TrailingZeros32-8 0.66ns ± 1% 0.64ns ± 1% -3.35% (p=0.000 n=10+10) TrailingZeros64-8 0.68ns ± 1% 0.64ns ± 1% -5.84% (p=0.000 n=9+10) Updates #45453 Change-Id: I228ff2d51df24b1306136f061432f8a12bb1d6fd Reviewed-on: https://go-review.googlesource.com/c/go/+/353249 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-10-05 16:06:49 +00:00
Lynn Boger	b8990ec932	test: update test/codegen/noextend.go to work with either ABI on ppc64x This updates the codegen tests in noextend.go so they are not dependent on the ABI. Change-Id: I8433bea9dc78830c143290a7e0cf901b2397d38a Reviewed-on: https://go-review.googlesource.com/c/go/+/353070 Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-29 17:30:11 +00:00
Archana R	b35c668072	cmd/compile: add PPC64-specific inlining for runtime.memmove Add rule to PPC64.rules to inline runtime.memmove in more cases, as is done for other target architectures Updated tests in codegen/copy.go to verify changes are done on ppc64/ppc64le Updates #41662 Change-Id: Id937ce21f9b4f4047b3e66dfa3c960128ee16a2a Reviewed-on: https://go-review.googlesource.com/c/go/+/352054 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>	2021-09-29 16:55:51 +00:00
Joel Sing	fe8347b61a	cmd/compile: optimise immediate operands with constants on riscv64 Instructions with immediates can be precomputed when operating on a constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI and ADDI. Additionally, optimise ANDI and ORI when the immediate is all ones or all zeroes. In particular, the RISCV64 logical left and right shift rules (Lshx/RshUx) produce sequences that check if the shift amount exceeds 64 and if so returns zero. When the shift amount is a constant we can precompute and eliminate the filter entirely. Likewise the arithmetic right shift rules produce sequences that check if the shift amount exceeds 64 and if so, ensures that the lower six bits of the shift are all ones. When the shift amount is a constant we can precompute the shift value. Arithmetic right shift sequences like: 117fc: 00100513 li a0,1 11800: 04053593 sltiu a1,a0,64 11804: fff58593 addi a1,a1,-1 11808: 0015e593 ori a1,a1,1 1180c: 40b45433 sra s0,s0,a1 Are now a single srai instruction: 117fc: 40145413 srai s0,s0,0x1 Likewise for logical left shift (and logical right shift): 1d560: 01100413 li s0,17 1d564: 04043413 sltiu s0,s0,64 1d568: 40800433 neg s0,s0 1d56c: 01131493 slli s1,t1,0x11 1d570: 0084f433 and s0,s1,s0 Which are now a single slli (or srli) instruction: 1d120: 01131413 slli s0,t1,0x11 This removes more than 30,000 instructions from the Go binary and should improve performance in a variety of areas - of note runtime.makemap_small drops from 48 to 36 instructions. Similar gains exist in at least other parts of runtime and math/bits. Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6 Reviewed-on: https://go-review.googlesource.com/c/go/+/350689 Trust: Joel Sing <joel@sing.id.au> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-24 10:51:48 +00:00
Joel Sing	d413908320	test/codegen: add shift tests for RISCV64 Add tests for shift by constant, masked shifts and bounded shifts. While here, sort tests by architecture and keep order of tests consistent (lsh, rshU, rsh). Change-Id: I512d64196f34df9cb2884e8c0f6adcf9dd88b0fc Reviewed-on: https://go-review.googlesource.com/c/go/+/351289 Trust: Joel Sing <joel@sing.id.au> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>	2021-09-24 07:22:13 +00:00
Matthew Dempsky	04572fa29b	cmd/compile: use BMI1 instructions for GOAMD64=v3 and higher BMI1 includes four instructions (ANDN, BLSI, BLSMSK, BLSR) that are easy to peephole optimize, and which GCC always seems to favor using when available and applicable. Updates #45453. Change-Id: I0274184057058f5c579e5bc3ea9c414396d3cf46 Reviewed-on: https://go-review.googlesource.com/c/go/+/351130 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Trust: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-09-22 00:15:27 +00:00
Keith Randall	6f35430faa	cmd/compile: allow rotates to be merged with logical ops on arm64 Fixes #48002 Change-Id: Ie3a157d55b291f5ac2ef4845e6ce4fefd84fc642 Reviewed-on: https://go-review.googlesource.com/c/go/+/350912 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-20 16:25:36 +00:00
Keith Randall	315dbd10c9	cmd/compile: fold double negate on arm64 Fixes #48467 Change-Id: I52305dbf561ee3eee6c1f053e555a3a6ec1ab892 Reviewed-on: https://go-review.googlesource.com/c/go/+/350910 Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2021-09-19 23:50:45 +00:00
Keith Randall	83b36ffb10	cmd/compile: implement constant rotates on arm64 Explicit constant rotates work, but constant arguments to bits.RotateLeft* needed the additional rule. Fixes #48465 Change-Id: Ia7544f21d0e7587b6b6506f72421459cd769aea6 Reviewed-on: https://go-review.googlesource.com/c/go/+/350909 Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2021-09-19 23:50:23 +00:00
Michael Munday	c69f5c0d76	cmd/compile: add support for Abs and Copysign intrinsics on riscv64 Also, add the FABSS and FABSD pseudo instructions to the assembler. The compiler could use FSGNJX[SD] directly but there doesn't seem to be much advantage to doing so and the pseudo instructions are easier to understand. Change-Id: Ie8825b8aa8773c69cc4f07a32ef04abf4061d80d Reviewed-on: https://go-review.googlesource.com/c/go/+/348989 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au>	2021-09-10 10:45:59 +00:00
fanzha02	2091bd3f26	cmd/compile: simiplify arm64 bitfield optimizations In some rewrite rules for arm64 bitfield optimizations, the bitfield lsb value and the bitfield width value are related to datasize, some of them use datasize directly to check the bitfield lsb value is valid, to get the bitfiled width value, but some of them call isARM64BFMask() and arm64BFWidth() functions. In order to be consistent, this patch changes them all to use datasize. Besides, this patch sorts the codegen test cases. Run the "toolstash-check -all" command and find one inconsistent code is as the following. new: src/math/fma.go:104 BEQ 247 master: src/math/fma.go:104 BEQ 248 The above inconsistence is due to this patch changing the range of the field lsb value in "UBFIZ" optimization rules from "lc+(32\|16\|8)<64" to "lc<64", so that the following code is generated as "UBFIZ". The logical of changed code is still correct. The code of src/math/fma.go:160: const uvinf = 0x7FF0000000000000 func FMA(a, b uint32) float64 { ps := a+b return Float64frombits(uint64(ps)<<63 \| uvinf) } The new assembly code: TEXT "".FMA(SB), LEAF\|NOFRAME\|ABIInternal, $0-16 MOVWU "".a(FP), R0 MOVWU "".b+4(FP), R1 ADD R1, R0, R0 UBFIZ $63, R0, $1, R0 ORR $9218868437227405312, R0, R0 MOVD R0, "".~r2+8(FP) RET (R30) The master assembly code: TEXT "".FMA(SB), LEAF\|NOFRAME\|ABIInternal, $0-16 MOVWU "".a(FP), R0 MOVWU "".b+4(FP), R1 ADD R1, R0, R0 MOVWU R0, R0 LSL $63, R0, R0 ORR $9218868437227405312, R0, R0 MOVD R0, "".~r2+8(FP) RET (R30) Change-Id: I9061104adfdfd3384d0525327ae1e5c8b0df5c35 Reviewed-on: https://go-review.googlesource.com/c/go/+/265038 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-10 02:44:36 +00:00
Michael Munday	8214257347	test/codegen: fix package name for test case The codegen tests are currently skipped (see #48247). The test added in CL 346050 did not compile because it was in the main package but did not contain a main function. Changing the package to 'codegen' fixes the issue. Updates #48247. Change-Id: I0a0eaca8e6a7d7b335606d2c76a204ac0c12e6d5 Reviewed-on: https://go-review.googlesource.com/c/go/+/348392 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-09-08 14:51:40 +00:00
Michael Munday	1da64686f8	test/codegen: fix compilation of bitfield tests The codegen tests are currently skipped (see #48247) and the bitfield tests do not actually compile due to a duplicate function name (sbfiz5) added in CL 267602. Renaming the function fixes the issue. Updates #48247. Change-Id: I626fd5ef13732dc358e73ace9ddcc4cbb6ae5b21 Reviewed-on: https://go-review.googlesource.com/c/go/+/348391 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-08 14:51:31 +00:00
Michael Munday	fdc2072420	test/codegen: remove broken riscv64 test This test is not executed by default (see #48247) and does not actually pass. It was added in CL 346689. The code generation changes made in that CL only change how instructions are assembled, they do not actually affect the output of the compiler. This test is unfortunately therefore invalid and will never pass. Updates #48247. Change-Id: I0c807e4a111336e5a097fe4e3af2805f9932a87f Reviewed-on: https://go-review.googlesource.com/c/go/+/348390 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-09-08 14:51:22 +00:00
wdvxdr	21de6bc463	cmd/compile: simplify less with non-negative number and constant 0 or 1 The most common cases: len(s) > 0 len(s) < 1 and they can be simplified to: len(s) != 0 len(s) == 0 Fixes #48054 Change-Id: I16e5b0cffcfab62a4acc2a09977a6cd3543dd000 Reviewed-on: https://go-review.googlesource.com/c/go/+/346050 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-07 14:53:41 +00:00
fanzha02	7b69ddc171	cmd/compile: merge sign extension and shift into SBFIZ This patch adds some rules to rewrite "(LeftShift (SignExtend x) lc)" expression as "SBFIZ". Add the test cases. Change-Id: I294c4ba09712eeb02c7a952447bb006780f1e60d Reviewed-on: https://go-review.googlesource.com/c/go/+/267602 Trust: fannie zhang <Fannie.Zhang@arm.com> Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-09-06 03:31:53 +00:00
fanzha02	43b05173a2	cmd/compile: merge zero/sign extensions with UBFX/SBFX on arm64 The UBFX and SBFX already zero/sign extend the result. Further zero/sign extensions are thus unnecessary as long as they leave the top bits unaltered. This patch absorbs zero/sign extensions into UBFX/SBFX. Add the related test cases. Change-Id: I7c4516c8b52d677f77bf3aaedab87c4a28056ec0 Reviewed-on: https://go-review.googlesource.com/c/go/+/265039 Trust: fannie zhang <Fannie.Zhang@arm.com> Trust: Keith Randall <khr@golang.org> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-06 03:31:43 +00:00
Ben Shi	37d4532867	cmd/internal/obj/riscv: simplify addition with constant This CL simplifies riscv addition (add r, imm) to (ADDI (ADDI r, imm/2), imm-imm/2) if imm is in specific ranges. (-4096 <= imm <= -2049 or 2048 <= imm <= 4094) There is little impact to the go1 benchmark, while the total size of pkg/linux_riscv64 decreased by about 11KB. Change-Id: I236eb8af3b83bb35ce9c0b318fc1d235e8ab9a4e GitHub-Last-Rev: `a2f56a0763` GitHub-Pull-Request: golang/go#48110 Reviewed-on: https://go-review.googlesource.com/c/go/+/346689 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Trust: Michael Munday <mike.munday@lowrisc.org>	2021-09-02 14:35:10 +00:00
Michael Munday	ea51e223c2	cmd/{asm,compile}: add fused multiply-add support on riscv64 Add support to the assembler for F[N]M{ADD,SUB}[SD] instructions. Argument order is: OP RS1, RS2, RS3, RD Also, add support for the FMA intrinsic to the compiler. Automatic FMA matching is left to a future CL. Change-Id: I47166c7393b2ab6bfc2e42aa8c1a8997c3a071b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/293030 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au>	2021-09-01 21:17:04 +00:00
Jake Ciolek	6cf1d5d0fa	cmd/compile: generic SSA rules for simplifying 2 and 3 operand integer arithmetic expressions This applies the following generic integer addition/subtraction transformations: x - (x + y) = -y (x - y) - x = -y y + (x - y) = x y + (z + (x - y) = x + z There's over 40 unique functions matching in Go. Hits 2 funcs in the runtime itself: runtime.stackfree() runtime.runqdrain() Go binary size reduced by 0.05% on Linux x86_64. StackCopy bench (perflocked Cascade Lake x86): name old time/op new time/op delta StackCopyPtr-8 87.3ms ± 1% 86.9ms ± 0% -0.45% (p=0.000 n=20+20) StackCopy-8 77.6ms ± 1% 77.0ms ± 0% -0.76% (p=0.000 n=20+20) StackCopyNoCache-8 2.28ms ± 2% 2.26ms ± 2% -0.93% (p=0.008 n=19+20) test/bench/go1 benchmarks (perflocked Cascade Lake x86): name old time/op new time/op delta BinaryTree17-8 1.88s ± 1% 1.88s ± 0% ~ (p=0.373 n=15+12) Fannkuch11-8 2.31s ± 0% 2.35s ± 0% +1.52% (p=0.000 n=15+14) FmtFprintfEmpty-8 26.6ns ± 0% 26.6ns ± 0% ~ (p=0.081 n=14+13) FmtFprintfString-8 48.6ns ± 0% 50.0ns ± 0% +2.86% (p=0.000 n=15+14) FmtFprintfInt-8 56.9ns ± 0% 54.8ns ± 0% -3.70% (p=0.000 n=15+15) FmtFprintfIntInt-8 90.4ns ± 0% 88.8ns ± 0% -1.78% (p=0.000 n=15+15) FmtFprintfPrefixedInt-8 104ns ± 0% 104ns ± 0% ~ (p=0.905 n=14+13) FmtFprintfFloat-8 148ns ± 0% 144ns ± 0% -2.19% (p=0.000 n=14+15) FmtManyArgs-8 389ns ± 0% 390ns ± 0% +0.35% (p=0.000 n=12+15) GobDecode-8 3.90ms ± 1% 3.88ms ± 0% -0.49% (p=0.000 n=15+14) GobEncode-8 2.73ms ± 0% 2.73ms ± 0% ~ (p=0.425 n=15+14) Gzip-8 169ms ± 0% 168ms ± 0% -0.52% (p=0.000 n=13+13) Gunzip-8 24.7ms ± 0% 24.8ms ± 0% +0.61% (p=0.000 n=15+15) HTTPClientServer-8 60.5µs ± 6% 60.4µs ± 7% ~ (p=0.595 n=15+15) JSONEncode-8 6.97ms ± 1% 6.93ms ± 0% -0.69% (p=0.000 n=14+14) JSONDecode-8 31.2ms ± 1% 30.8ms ± 1% -1.27% (p=0.000 n=14+14) Mandelbrot200-8 3.87ms ± 0% 3.87ms ± 0% ~ (p=0.652 n=15+14) GoParse-8 2.65ms ± 2% 2.64ms ± 1% ~ (p=0.202 n=15+15) RegexpMatchEasy0_32-8 45.1ns ± 0% 45.9ns ± 0% +1.68% (p=0.000 n=14+15) RegexpMatchEasy0_1K-8 140ns ± 0% 139ns ± 0% -0.44% (p=0.000 n=15+14) RegexpMatchEasy1_32-8 40.9ns ± 3% 40.5ns ± 0% -0.88% (p=0.000 n=15+13) RegexpMatchEasy1_1K-8 215ns ± 1% 220ns ± 1% +2.27% (p=0.000 n=15+15) RegexpMatchMedium_32-8 783ns ± 7% 738ns ± 0% ~ (p=0.361 n=15+15) RegexpMatchMedium_1K-8 24.1µs ± 6% 23.4µs ± 6% -2.94% (p=0.004 n=15+15) RegexpMatchHard_32-8 1.10µs ± 1% 1.09µs ± 1% -0.40% (p=0.006 n=15+14) RegexpMatchHard_1K-8 33.0µs ± 0% 33.0µs ± 0% ~ (p=0.535 n=12+14) Revcomp-8 354ms ± 0% 353ms ± 0% -0.23% (p=0.002 n=15+13) Template-8 42.0ms ± 1% 41.8ms ± 2% -0.37% (p=0.023 n=14+15) TimeParse-8 181ns ± 0% 180ns ± 1% -0.18% (p=0.014 n=12+13) TimeFormat-8 240ns ± 0% 242ns ± 1% +0.69% (p=0.000 n=12+15) [Geo mean] 35.2µs 35.1µs -0.43% name old speed new speed delta GobDecode-8 197MB/s ± 1% 198MB/s ± 0% +0.49% (p=0.000 n=15+14) GobEncode-8 281MB/s ± 0% 281MB/s ± 0% ~ (p=0.419 n=15+14) Gzip-8 115MB/s ± 0% 115MB/s ± 0% +0.52% (p=0.000 n=13+13) Gunzip-8 786MB/s ± 0% 781MB/s ± 0% -0.60% (p=0.000 n=15+15) JSONEncode-8 278MB/s ± 1% 280MB/s ± 0% +0.69% (p=0.000 n=14+14) JSONDecode-8 62.3MB/s ± 1% 63.1MB/s ± 1% +1.29% (p=0.000 n=14+14) GoParse-8 21.9MB/s ± 2% 22.0MB/s ± 1% ~ (p=0.205 n=15+15) RegexpMatchEasy0_32-8 709MB/s ± 0% 697MB/s ± 0% -1.65% (p=0.000 n=14+15) RegexpMatchEasy0_1K-8 7.34GB/s ± 0% 7.37GB/s ± 0% +0.43% (p=0.000 n=15+15) RegexpMatchEasy1_32-8 783MB/s ± 2% 790MB/s ± 0% +0.88% (p=0.000 n=15+13) RegexpMatchEasy1_1K-8 4.77GB/s ± 1% 4.66GB/s ± 1% -2.23% (p=0.000 n=15+15) RegexpMatchMedium_32-8 41.0MB/s ± 7% 43.3MB/s ± 0% ~ (p=0.360 n=15+15) RegexpMatchMedium_1K-8 42.5MB/s ± 6% 43.8MB/s ± 6% +3.07% (p=0.004 n=15+15) RegexpMatchHard_32-8 29.2MB/s ± 1% 29.3MB/s ± 1% +0.41% (p=0.006 n=15+14) RegexpMatchHard_1K-8 31.1MB/s ± 0% 31.1MB/s ± 0% ~ (p=0.495 n=12+14) Revcomp-8 718MB/s ± 0% 720MB/s ± 0% +0.23% (p=0.002 n=15+13) Template-8 46.3MB/s ± 1% 46.4MB/s ± 2% +0.38% (p=0.021 n=14+15) [Geo mean] 205MB/s 206MB/s +0.57% Change-Id: Ibd1afdf8b6c0b08087dcc3acd8f943637eb95ac0 Reviewed-on: https://go-review.googlesource.com/c/go/+/344930 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com>	2021-08-25 19:45:20 +00:00
Meng Zhuo	efd206eb40	cmd/compile: intrinsify Mul64 on riscv64 According to RISCV instruction set manual v2.2 Sec 6.1 MULHU followed by MUL will be fused into one multiply by microarchitecture Benchstat on Hifive unmatched: name old time/op new time/op delta Hash8Bytes 245ns ± 3% 186ns ± 4% -23.99% (p=0.000 n=10+10) Hash320Bytes 1.94µs ± 1% 1.31µs ± 1% -32.38% (p=0.000 n=9+10) Hash1K 5.84µs ± 0% 3.84µs ± 0% -34.20% (p=0.000 n=10+9) Hash8K 45.3µs ± 0% 29.4µs ± 0% -35.04% (p=0.000 n=10+10) name old speed new speed delta Hash8Bytes 32.7MB/s ± 3% 43.0MB/s ± 4% +31.61% (p=0.000 n=10+10) Hash320Bytes 165MB/s ± 1% 244MB/s ± 1% +47.88% (p=0.000 n=9+10) Hash1K 175MB/s ± 0% 266MB/s ± 0% +51.98% (p=0.000 n=10+9) Hash8K 181MB/s ± 0% 279MB/s ± 0% +53.94% (p=0.000 n=10+10) Change-Id: I3561495d02a4a0ad8578e9b9819bf0a4eaca5d12 Reviewed-on: https://go-review.googlesource.com/c/go/+/329970 Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Go Bot <gobot@golang.org> Trust: Meng Zhuo <mzh@golangcn.org>	2021-08-16 13:50:11 +00:00
Cherry Mui	6b1e4430bb	[dev.typeparams] cmd/compile: implement clobberdead mode on ARM64 For debugging. Change-Id: I5875ccd2413b8ffd2ec97a0ace66b5cae7893b24 Reviewed-on: https://go-review.googlesource.com/c/go/+/324765 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Than McIntosh <thanm@google.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-06-03 20:00:30 +00:00
Cherry Mui	165d39a1d4	[dev.typeparams] test: adjust codegen test for register ABI on ARM64 In codegen/arithmetic.go, previously there are MOVD's that match for loads of arguments. With register ABI there are no more such loads. Remove the MOVD matches. Change-Id: I920ee2629c8c04d454f13a0c08e283d3528d9a64 Reviewed-on: https://go-review.googlesource.com/c/go/+/324251 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com>	2021-06-03 14:28:14 +00:00
Ruslan Andreev	3b321a9d12	cmd/compile: add arch-specific inlining for runtime.memmove This CL add runtime.memmove inlining for AMD64 and ARM64. According to ssa dump from testcases generic rules can't inline memmomve properly due to one of the arguments is Phi operation. But this Phi op will be optimized out by later optimization stages. As a result memmove can be inlined during arch-specific rules. The commit add new optimization rules to arch-specific rules that can inline runtime.memmove if it possible during lowering stage. Optimization fires 5 times in Go source-code using regabi. Fixes #41662 Change-Id: Iaffaf4c482d068b5f0683d141863892202cc8824 Reviewed-on: https://go-review.googlesource.com/c/go/+/289151 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: David Chase <drchase@google.com>	2021-05-12 16:23:30 +00:00
Keith Randall	b211fe0058	cmd/compile: remove bit operations that modify memory directly These operations (BT{S,R,C}{Q,L}modify) are quite a bit slower than other ways of doing the same thing. Without the BTxmodify operations, there are two fallback ways the compiler performs these operations: AND/OR/XOR operations directly on memory, or load-BTx-write sequences. The compiler kinda chooses one arbitrarily depending on rewrite rule application order. Currently, it uses load-BTx-write for the Const benchmarks and AND/OR/XOR directly to memory for the non-Const benchmarks. TBD, someone might investigate which of the two fallback strategies is really better. For now, they are both better than BTx ops. name old time/op new time/op delta BitSet-8 1.09µs ± 2% 0.64µs ± 5% -41.60% (p=0.000 n=9+10) BitClear-8 1.15µs ± 3% 0.68µs ± 6% -41.00% (p=0.000 n=10+10) BitToggle-8 1.18µs ± 4% 0.73µs ± 2% -38.36% (p=0.000 n=10+8) BitSetConst-8 37.0ns ± 7% 25.8ns ± 2% -30.24% (p=0.000 n=10+10) BitClearConst-8 30.7ns ± 2% 25.0ns ±12% -18.46% (p=0.000 n=10+10) BitToggleConst-8 36.9ns ± 1% 23.8ns ± 3% -35.46% (p=0.000 n=9+10) Fixes #45790 Update #45242 Change-Id: Ie33a72dc139f261af82db15d446cd0855afb4e59 Reviewed-on: https://go-review.googlesource.com/c/go/+/318149 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ben Shi <powerman1st@163.com>	2021-05-08 03:27:59 +00:00
Cherry Zhang	6b8e3e2d06	cmd/compile: reduce redundant register moves for regabi calls Currently, if we have AX=a and BX=b, and we want to make a call F(1, a, b), to move arguments into the desired registers it emits MOVQ AX, CX MOVL $1, AX // AX=1 MOVQ BX, DX MOVQ CX, BX // BX=a MOVQ DX, CX // CX=b This has a few redundant moves. This is because we process inputs in order. First, allocate 1 to AX, which kicks out a (in AX) to CX (a free register at the moment). Then, allocate a to BX, which kicks out b (in BX) to DX. Finally, put b to CX. Notice that if we start with allocating CX=b, then BX=a, AX=1, we will not have redundant moves. This CL reduces redundant moves by allocating them in different order: First, for inpouts that are already in place, keep them there. Then allocate free registers. Then everything else. before after cmd/compile binary size 23703888 23609680 text size 8565899 8533291 (with regabiargs enabled.) Change-Id: I69e1bdf745f2c90bb791f6d7c45b37384af1e874 Reviewed-on: https://go-review.googlesource.com/c/go/+/311371 Trust: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Than McIntosh <thanm@google.com>	2021-04-19 16:21:39 +00:00
David Chase	4df3d0e4df	cmd/compile: rescue stmt boundaries from OpArgXXXReg and OpSelectN. Fixes this failure: go test cmd/compile/internal/ssa -run TestStmtLines -v === RUN TestStmtLines stmtlines_test.go:115: Saw too many (amd64, > 1%) lines without statement marks, total=88263, nostmt=1930 ('-run TestStmtLines -v' lists failing lines) The failure has two causes. One is that the first-line adjuster in code generation was relocating "first lines" to instructions that would either not have any code generated, or would have the statment marker removed by a different believed-good heuristic. The other was that statement boundaries were getting attached to register values (that with the old ABI were loads from the stack, hence real instructions). The register values disappear at code generation. The fixes are to (1) note that certain instructions are not good choices for "first value" and skip them, and (2) in an expandCalls post-pass, look for register valued instructions and under appropriate conditions move their statement marker to a compatible use. Also updates TestStmtLines to always log the score, for easier comparison of minor compiler changes. Updates #40724. Change-Id: I485573ce900e292d7c44574adb7629cdb4695c3f Reviewed-on: https://go-review.googlesource.com/c/go/+/309649 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2021-04-14 18:46:08 +00:00
Cherry Zhang	444d28295b	test: make codegen/memops.go work with both ABIs Following CL 309335, this fixes memops.go. Change-Id: Ia2458b5267deee9f906f76c50e70a021ea2fcb5b Reviewed-on: https://go-review.googlesource.com/c/go/+/309552 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Than McIntosh <thanm@google.com>	2021-04-13 14:07:17 +00:00
Cherry Zhang	263e13d1f7	test: make codegen tests work with both ABIs Some codegen tests were written with the assumption that arguments and results are in memory, and with a specific stack layout. With the register ABI, the assumption is no longer true. Adjust the tests to work with both cases. - For tests expecting in memory arguments/results, change to use global variables or memory-assigned argument/results. - Allow more registers. E.g. some tests expecting register names contain only letters (e.g. AX), but it can also contain numbers (e.g. R10). - Some instruction selection changes when operate on register vs. memory, e.g. ADDQ vs. LEAQ, MOVB vs. MOVL. Accept both. TODO: mathbits.go and memops.go still need fix. Change-Id: Ic5932b4b5dd3f5d30ed078d296476b641420c4c5 Reviewed-on: https://go-review.googlesource.com/c/go/+/309335 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2021-04-12 21:59:59 +00:00
Pat Gavlin	359f44910f	cmd/compile: fix long RMW bit operations on AMD64 Under certain circumstances, the existing rules for bit operations can produce code that writes beyond its intended bounds. For example, consider the following code: func repro(b []byte, addr, bit int32) { _ = b[3] v := uint32(b[0]) \| uint32(b[1])<<8 \| uint32(b[2])<<16 \| uint32(b[3])<<24 \| 1<<(bit&31) b[0] = byte(v) b[1] = byte(v >> 8) b[2] = byte(v >> 16) b[3] = byte(v >> 24) } Roughly speaking: 1. The expression `1 << (bit & 31)` is rewritten into `(SHLL 1 bit)` 2. The expression `uint32(b[0]) \| uint32(b[1])<<8 \| uint32(b[2])<<16 \| uint32(b[3])<<24` is rewritten into `(MOVLload &b[0])` 3. The statements `b[0] = byte(v) ... b[3] = byte(v >> 24)` are rewritten into `(MOVLstore &b[0], v)` 4. `(ORL (SHLL 1, bit) (MOVLload &b[0]))` is rewritten into `(BTSL (MOVLload &b[0]) bit)`. This is a valid transformation because the destination is a register: in this case, the bit offset is masked by the number of bits in the destination register. This is identical to the masking performed by `SHL`. 5. `(MOVLstore &b[0] (BTSL (MOVLload &b[0]) bit))` is rewritten into `(BTSLmodify &b[0] bit)`. This is an invalid transformation because the destination is memory: in this case, the bit offset is not masked, and the chosen instruction may write outside its intended 32-bit location. These changes fix the invalid rewrite performed in step (5) by explicitly maksing the bit offset operand to `BT(S\|R\|C)(L\|Q)modify`. In the example above, the adjusted rules produce `(BTSLmodify &b[0] (ANDLconst [31] bit))` in step (5). These changes also add several new rules to rewrite bit sets, toggles, and clears that are rooted at `(OR\|XOR\|AND)(L\|Q)modify` operators into appropriate `BT(S\|R\|C)(L\|Q)modify` operators. These rules catch cases where `MOV(L\|Q)store ((OR\|XOR\|AND)(L\|Q) ...)` is rewritten to `(OR\|XOR\|AND)(L\|Q)modify` before the `(OR\|XOR\|AND)(L\|Q) ...` can be rewritten to `BT(S\|R\|C)(L\|Q) ...`. Overall, compilecmp reports small improvements in code size on darwin/amd64 when the changes to the compiler itself are exlcuded: file before after Δ % runtime.s 536464 536412 -52 -0.010% bytes.s 32629 32593 -36 -0.110% strings.s 44565 44529 -36 -0.081% os/signal.s 7967 7959 -8 -0.100% cmd/vendor/golang.org/x/sys/unix.s 81686 81678 -8 -0.010% math/big.s 188235 188253 +18 +0.010% cmd/link/internal/loader.s 89295 89056 -239 -0.268% cmd/link/internal/ld.s 633551 633232 -319 -0.050% cmd/link/internal/arm.s 18934 18928 -6 -0.032% cmd/link/internal/arm64.s 31814 31801 -13 -0.041% cmd/link/internal/riscv64.s 7347 7345 -2 -0.027% cmd/compile/internal/ssa.s 4029173 4033066 +3893 +0.097% total 21298280 21301472 +3192 +0.015% Change-Id: I2e560548b515865129e1724e150e30540e9d29ce GitHub-Last-Rev: `9a42bd29a5` GitHub-Pull-Request: golang/go#45242 Reviewed-on: https://go-review.googlesource.com/c/go/+/304869 Reviewed-by: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com>	2021-03-26 19:40:37 +00:00
fanzha02	3a0061822e	cmd/compile: add arm64 rules to optimize go codes to constant 0 Optimize the following codes to constant 0. function shift (x uint32) uint64 { return uint64(x) >> 32 } Change-Id: Ida6b39d713cc119ad5a2f01fd54bfd252cf2c975 Reviewed-on: https://go-review.googlesource.com/c/go/+/303830 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2021-03-26 01:57:35 +00:00
fanzha02	b182ba7fab	cmd/compile: optimize codes with arm64 REV16 instruction Optimize some patterns into rev16/rev16w instruction. Pattern1: (c & 0xff00ff00)>>8 \| (c & 0x00ff00ff)<<8 To: rev16w c Pattern2: (c & 0xff00ff00ff00ff00)>>8 \| (c & 0x00ff00ff00ff00ff)<<8 To: rev16 c This patch is a copy of CL 239637, contributed by Alice Xu(dianhong.xu@arm.com). Change-Id: I96936c1db87618bc1903c04221c7e9b2779455b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/268377 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2021-03-23 01:36:23 +00:00
Cherry Zhang	6ae3b70ef2	cmd/compile: add clobberdeadreg mode When -clobberdeadreg flag is set, the compiler inserts code that clobbers integer registers at call sites. This may be helpful for debugging register ABI. Only implemented on AMD64 for now. Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899 Reviewed-on: https://go-review.googlesource.com/c/go/+/302809 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2021-03-19 23:21:21 +00:00
fanzha02	f5e6d3e879	cmd/compile: add rewrite rules for conditional instructions on arm64 This CL adds rewrite rules for CSETM, CSINC, CSINV, and CSNEG. By adding these rules, we can save one instruction. For example, func test(cond bool, a int) int { if cond { a++ } return a } Before: MOVD "".a+8(RSP), R0 ADD $1, R0, R1 MOVBU "".cond(RSP), R2 CMPW $0, R2 CSEL NE, R1, R0, R0 After: MOVBU "".cond(RSP), R0 CMPW $0, R0 MOVD "".a+8(RSP), R0 CSINC EQ, R0, R0, R0 This patch is a copy of CL 285694. Co-authored-by: JunchenLi <junchen.li@arm.com> Change-Id: Ic1a79e8b8ece409b533becfcb7950f11e7b76f24 Reviewed-on: https://go-review.googlesource.com/c/go/+/302231 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-03-18 01:46:58 +00:00
Cherry Zhang	8628bf9a97	cmd/compile: resurrect clobberdead mode This CL resurrects the clobberdead debugging mode (CL 23924). When -clobberdead flag is set (TODO: make it GOEXPERIMENT?), the compiler inserts code that clobbers all dead stack slots that contains pointers. Mark windows syscall functions cgo_unsafe_args, as the code actually does that, by taking the address of one argument and passing it to cgocall. Change-Id: Ie09a015f4bd14ae6053cc707866e30ae509b9d6f Reviewed-on: https://go-review.googlesource.com/c/go/+/301791 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Than McIntosh <thanm@google.com>	2021-03-17 17:50:50 +00:00
Meng Zhuo	afe517590c	cmd/compile: loads from readonly globals into const for mips64x Ref: CL 141118 Update #26498 Change-Id: If4ea55c080b9aa10183eefe81fefbd4072deaf3a Reviewed-on: https://go-review.googlesource.com/c/go/+/280646 Trust: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2021-03-16 09:07:17 +00:00
erifan01	600259b099	cmd/compile: use depth first topological sort algorithm for layout The current layout algorithm tries to put consecutive blocks together, so the priority of the successor block is higher than the priority of the zero indegree block. This algorithm is beneficial for subsequent register allocation, but will result in more branch instructions. The depth-first topological sorting algorithm is a well-known layout algorithm, which has applications in many languages, and it helps to reduce branch instructions. This CL applies it to the layout pass. The test results show that it helps to reduce the code size. This CL also includes the following changes: 1, Removed the primary predecessor mechanism. The new layout algorithm is not very friendly to register allocator in some cases, in order to adapt to the new layout algorithm, a new primary predecessor selection strategy is introduced. 2, Since the new layout implementation may place non-loop blocks between loop blocks, some adaptive modifications have also been made to looprotate pass. 3, The layout also affects the results of codegen, so this CL also adjusted several codegen tests accordingly. It is inevitable that this CL will cause the code size or performance of a few functions to decrease, but the number of cases it improves is much larger than the number of cases it drops. Statistical data from compilecmp on linux/amd64 is as follow: name old time/op new time/op delta Template 382ms ± 4% 382ms ± 4% ~ (p=0.497 n=49+50) Unicode 170ms ± 9% 169ms ± 8% ~ (p=0.344 n=48+50) GoTypes 2.01s ± 4% 2.01s ± 4% ~ (p=0.628 n=50+48) Compiler 190ms ±10% 189ms ± 9% ~ (p=0.734 n=50+50) SSA 11.8s ± 2% 11.8s ± 3% ~ (p=0.877 n=50+50) Flate 241ms ± 9% 241ms ± 8% ~ (p=0.897 n=50+49) GoParser 366ms ± 3% 361ms ± 4% -1.21% (p=0.004 n=47+50) Reflect 835ms ± 3% 838ms ± 3% ~ (p=0.275 n=50+49) Tar 336ms ± 4% 335ms ± 3% ~ (p=0.454 n=48+48) XML 433ms ± 4% 431ms ± 3% ~ (p=0.071 n=49+48) LinkCompiler 706ms ± 4% 705ms ± 4% ~ (p=0.608 n=50+49) ExternalLinkCompiler 1.85s ± 3% 1.83s ± 2% -1.47% (p=0.000 n=49+48) LinkWithoutDebugCompiler 437ms ± 5% 437ms ± 6% ~ (p=0.953 n=49+50) [Geo mean] 615ms 613ms -0.37% name old alloc/op new alloc/op delta Template 38.7MB ± 1% 38.7MB ± 1% ~ (p=0.834 n=50+50) Unicode 28.1MB ± 0% 28.1MB ± 0% -0.22% (p=0.000 n=49+50) GoTypes 168MB ± 1% 168MB ± 1% ~ (p=0.054 n=47+47) Compiler 23.0MB ± 1% 23.0MB ± 1% ~ (p=0.432 n=50+50) SSA 1.54GB ± 0% 1.54GB ± 0% +0.21% (p=0.000 n=50+50) Flate 23.6MB ± 1% 23.6MB ± 1% ~ (p=0.153 n=43+46) GoParser 35.1MB ± 1% 35.1MB ± 2% ~ (p=0.202 n=50+50) Reflect 84.7MB ± 1% 84.7MB ± 1% ~ (p=0.333 n=48+49) Tar 34.5MB ± 1% 34.5MB ± 1% ~ (p=0.406 n=46+49) XML 44.3MB ± 2% 44.2MB ± 3% ~ (p=0.981 n=50+50) LinkCompiler 131MB ± 0% 128MB ± 0% -2.74% (p=0.000 n=50+50) ExternalLinkCompiler 120MB ± 0% 120MB ± 0% +0.01% (p=0.007 n=50+50) LinkWithoutDebugCompiler 77.3MB ± 0% 77.3MB ± 0% -0.02% (p=0.000 n=50+50) [Geo mean] 69.3MB 69.1MB -0.22% file before after Δ % addr2line 4104220 4043684 -60536 -1.475% api 5342502 5249678 -92824 -1.737% asm 4973785 4858257 -115528 -2.323% buildid 2667844 2625660 -42184 -1.581% cgo 4686849 4616313 -70536 -1.505% compile 23667431 23268406 -399025 -1.686% cover 4959676 4874108 -85568 -1.725% dist 3515934 3450422 -65512 -1.863% doc 3995581 3925469 -70112 -1.755% fix 3379202 3318522 -60680 -1.796% link 6743249 6629913 -113336 -1.681% nm 4047529 3991777 -55752 -1.377% objdump 4456151 4388151 -68000 -1.526% pack 2435040 2398072 -36968 -1.518% pprof 13804080 13565808 -238272 -1.726% test2json 2690043 2645987 -44056 -1.638% trace 10418492 10232716 -185776 -1.783% vet 7258259 7121259 -137000 -1.888% total 113145867 111204202 -1941665 -1.716% The situation on linux/arm64 is as follow: name old time/op new time/op delta Template 280ms ± 1% 282ms ± 1% +0.75% (p=0.000 n=46+48) Unicode 124ms ± 2% 124ms ± 2% +0.37% (p=0.045 n=50+50) GoTypes 1.69s ± 1% 1.70s ± 1% +0.56% (p=0.000 n=49+50) Compiler 122ms ± 1% 123ms ± 1% +0.93% (p=0.000 n=50+50) SSA 12.6s ± 1% 12.7s ± 0% +0.72% (p=0.000 n=50+50) Flate 170ms ± 1% 172ms ± 1% +0.97% (p=0.000 n=49+49) GoParser 262ms ± 1% 263ms ± 1% +0.39% (p=0.000 n=49+48) Reflect 639ms ± 1% 650ms ± 1% +1.63% (p=0.000 n=49+49) Tar 243ms ± 1% 245ms ± 1% +0.82% (p=0.000 n=50+50) XML 324ms ± 1% 327ms ± 1% +0.72% (p=0.000 n=50+49) LinkCompiler 597ms ± 1% 596ms ± 1% -0.27% (p=0.001 n=48+47) ExternalLinkCompiler 1.90s ± 1% 1.88s ± 1% -1.00% (p=0.000 n=50+50) LinkWithoutDebugCompiler 364ms ± 1% 363ms ± 1% ~ (p=0.220 n=49+50) [Geo mean] 485ms 488ms +0.49% name old alloc/op new alloc/op delta Template 38.7MB ± 0% 38.8MB ± 1% ~ (p=0.093 n=43+49) Unicode 28.4MB ± 0% 28.4MB ± 0% +0.03% (p=0.000 n=49+45) GoTypes 169MB ± 1% 169MB ± 1% +0.23% (p=0.010 n=50+50) Compiler 23.2MB ± 1% 23.2MB ± 1% +0.11% (p=0.000 n=40+44) SSA 1.54GB ± 0% 1.55GB ± 0% +0.45% (p=0.000 n=47+49) Flate 23.8MB ± 2% 23.8MB ± 1% ~ (p=0.543 n=50+50) GoParser 35.3MB ± 1% 35.4MB ± 1% ~ (p=0.792 n=50+50) Reflect 85.2MB ± 1% 85.2MB ± 0% ~ (p=0.055 n=50+47) Tar 34.5MB ± 1% 34.5MB ± 1% +0.06% (p=0.015 n=50+50) XML 43.8MB ± 2% 43.9MB ± 2% +0.19% (p=0.000 n=48+48) LinkCompiler 137MB ± 0% 136MB ± 0% -0.92% (p=0.000 n=50+50) ExternalLinkCompiler 127MB ± 0% 127MB ± 0% ~ (p=0.516 n=50+50) LinkWithoutDebugCompiler 84.0MB ± 0% 84.0MB ± 0% ~ (p=0.057 n=50+50) [Geo mean] 70.4MB 70.4MB +0.01% file before after Δ % addr2line 4021557 4002933 -18624 -0.463% api 5127847 5028503 -99344 -1.937% asm 5034716 4936836 -97880 -1.944% buildid 2608118 2594094 -14024 -0.538% cgo 4488592 4398320 -90272 -2.011% compile 22501129 22213592 -287537 -1.278% cover 4742301 4713573 -28728 -0.606% dist 3388071 3365311 -22760 -0.672% doc 3802250 3776082 -26168 -0.688% fix 3306147 3216939 -89208 -2.698% link 6404483 6363699 -40784 -0.637% nm 3941026 3921930 -19096 -0.485% objdump 4383330 `4295122` -88208 -2.012% pack 2404547 2389515 -15032 -0.625% pprof 12996234 12856818 -139416 -1.073% test2json 2668500 2586788 -81712 -3.062% trace 9816276 9609580 -206696 -2.106% vet 6900682 6787338 -113344 -1.643% total 108535806 107056973 -1478833 -1.363% Change-Id: Iaec1cdcaacca8025e9babb0fb8a532fddb70c87d Reviewed-on: https://go-review.googlesource.com/c/go/+/255239 Reviewed-by: eric fang <eric.fang@arm.com> Reviewed-by: Keith Randall <khr@golang.org> Trust: eric fang <eric.fang@arm.com>	2021-03-16 02:44:54 +00:00
Josh Bleecher Snyder	43d5f213e2	cmd/compile: optimize multi-register shifts on amd64 amd64 can shift in bits from another register instead of filling with 0/1. This pattern is helpful when implementing 128 bit shifts or arbitrary length shifts. In the standard library, it shows up in pure Go math/big. Benchmarks results on amd64 with -tags=math_big_pure_go. name old time/op new time/op delta NonZeroShifts/1/shrVU-8 4.45ns ± 3% 4.39ns ± 1% -1.28% (p=0.000 n=30+27) NonZeroShifts/1/shlVU-8 4.13ns ± 4% 4.10ns ± 2% ~ (p=0.254 n=29+28) NonZeroShifts/2/shrVU-8 5.55ns ± 1% 5.63ns ± 2% +1.42% (p=0.000 n=28+29) NonZeroShifts/2/shlVU-8 5.70ns ± 2% 5.14ns ± 1% -9.82% (p=0.000 n=29+28) NonZeroShifts/3/shrVU-8 6.79ns ± 2% 6.35ns ± 2% -6.46% (p=0.000 n=28+29) NonZeroShifts/3/shlVU-8 6.69ns ± 1% 6.25ns ± 1% -6.60% (p=0.000 n=28+27) NonZeroShifts/4/shrVU-8 7.79ns ± 2% 7.06ns ± 2% -9.48% (p=0.000 n=30+30) NonZeroShifts/4/shlVU-8 7.82ns ± 1% 7.24ns ± 1% -7.37% (p=0.000 n=28+29) NonZeroShifts/5/shrVU-8 8.90ns ± 3% 7.93ns ± 1% -10.84% (p=0.000 n=29+26) NonZeroShifts/5/shlVU-8 8.68ns ± 1% 7.92ns ± 1% -8.76% (p=0.000 n=29+29) NonZeroShifts/10/shrVU-8 14.4ns ± 1% 12.3ns ± 2% -14.79% (p=0.000 n=28+29) NonZeroShifts/10/shlVU-8 14.1ns ± 1% 11.9ns ± 2% -15.55% (p=0.000 n=28+27) NonZeroShifts/100/shrVU-8 118ns ± 1% 96ns ± 3% -18.82% (p=0.000 n=30+29) NonZeroShifts/100/shlVU-8 120ns ± 2% 98ns ± 2% -18.46% (p=0.000 n=29+28) NonZeroShifts/1000/shrVU-8 1.10µs ± 1% 0.88µs ± 2% -19.63% (p=0.000 n=29+30) NonZeroShifts/1000/shlVU-8 1.10µs ± 2% 0.88µs ± 2% -20.28% (p=0.000 n=29+28) NonZeroShifts/10000/shrVU-8 10.9µs ± 1% 8.7µs ± 1% -19.78% (p=0.000 n=28+27) NonZeroShifts/10000/shlVU-8 10.9µs ± 2% 8.7µs ± 1% -19.64% (p=0.000 n=29+27) NonZeroShifts/100000/shrVU-8 111µs ± 2% 90µs ± 2% -19.39% (p=0.000 n=28+29) NonZeroShifts/100000/shlVU-8 113µs ± 2% 90µs ± 2% -20.43% (p=0.000 n=30+27) The assembly version is still faster, unfortunately, but the gap is narrowing. Speedup from pure Go to assembly: name old time/op new time/op delta NonZeroShifts/1/shrVU-8 4.39ns ± 1% 3.45ns ± 2% -21.36% (p=0.000 n=27+29) NonZeroShifts/1/shlVU-8 4.10ns ± 2% 3.47ns ± 3% -15.42% (p=0.000 n=28+30) NonZeroShifts/2/shrVU-8 5.63ns ± 2% 3.97ns ± 0% -29.40% (p=0.000 n=29+25) NonZeroShifts/2/shlVU-8 5.14ns ± 1% 3.77ns ± 2% -26.65% (p=0.000 n=28+26) NonZeroShifts/3/shrVU-8 6.35ns ± 2% 4.79ns ± 2% -24.52% (p=0.000 n=29+29) NonZeroShifts/3/shlVU-8 6.25ns ± 1% 4.42ns ± 1% -29.29% (p=0.000 n=27+26) NonZeroShifts/4/shrVU-8 7.06ns ± 2% 5.64ns ± 1% -20.05% (p=0.000 n=30+29) NonZeroShifts/4/shlVU-8 7.24ns ± 1% 5.34ns ± 2% -26.23% (p=0.000 n=29+29) NonZeroShifts/5/shrVU-8 7.93ns ± 1% 6.56ns ± 2% -17.26% (p=0.000 n=26+30) NonZeroShifts/5/shlVU-8 7.92ns ± 1% 6.27ns ± 1% -20.79% (p=0.000 n=29+25) NonZeroShifts/10/shrVU-8 12.3ns ± 2% 10.2ns ± 2% -17.21% (p=0.000 n=29+29) NonZeroShifts/10/shlVU-8 11.9ns ± 2% 10.5ns ± 2% -12.45% (p=0.000 n=27+29) NonZeroShifts/100/shrVU-8 95.9ns ± 3% 77.7ns ± 1% -19.00% (p=0.000 n=29+30) NonZeroShifts/100/shlVU-8 97.5ns ± 2% 66.8ns ± 2% -31.47% (p=0.000 n=28+30) NonZeroShifts/1000/shrVU-8 884ns ± 2% 705ns ± 1% -20.17% (p=0.000 n=30+28) NonZeroShifts/1000/shlVU-8 880ns ± 2% 590ns ± 1% -32.96% (p=0.000 n=28+25) NonZeroShifts/10000/shrVU-8 8.74µs ± 1% 7.34µs ± 3% -15.94% (p=0.000 n=27+30) NonZeroShifts/10000/shlVU-8 8.73µs ± 1% 6.00µs ± 1% -31.25% (p=0.000 n=27+28) NonZeroShifts/100000/shrVU-8 89.6µs ± 2% 75.5µs ± 2% -15.80% (p=0.000 n=29+29) NonZeroShifts/100000/shlVU-8 89.6µs ± 2% 68.0µs ± 3% -24.09% (p=0.000 n=27+30) Change-Id: I18f58d8f5513d737d9cdf09b8f9d14011ffe3958 Reviewed-on: https://go-review.googlesource.com/c/go/+/297050 Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-03-11 19:11:46 +00:00
Paul E. Murphy	48ddf70128	cmd/asm,cmd/compile: support 5 operand RLWNM/RLWMI on ppc64 These instructions are actually 5 argument opcodes as specified by the ISA. Prior to this patch, the MB and ME arguments were merged into a single bitmask operand to workaround the limitations of the ppc64 assembler backend. This limitation no longer exists. Thus, we can pass operands for these opcodes without having to merge the MB and ME arguments in the assembler frontend or compiler backend. Likewise, support for 4 operand variants is unchanged. Change-Id: Ib086774f3581edeaadfd2190d652aaaa8a90daeb Reviewed-on: https://go-review.googlesource.com/c/go/+/298750 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org> Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>	2021-03-09 20:35:41 +00:00
fanzha02	2b50ab2aee	cmd/compile: optimize single-precision floating point square root Add generic rule to rewrite the single-precision square root expression with one single-precision instruction. The optimization will reduce two times of precision converting between double-precision and single-precision. On arm64 flatform. previous: FCVTSD F0, F0 FSQRTD F0, F0 FCVTDS F0, F0 optimized: FSQRTS S0, S0 And this patch adds the test case to check the correctness. This patch refers to CL 241877, contributed by Alice Xu (dianhong.xu@arm.com) Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e Reviewed-on: https://go-review.googlesource.com/c/go/+/276873 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-03-02 06:38:07 +00:00
Egon Elbre	3ee32439b5	cmd/compile: ARM64 optimize []float64 and []float32 access Optimize load and store to []float64 and []float32. Previously it used LSL instead of shifted register indexed load/store. Before: LSL $3, R0, R0 FMOVD F0, (R1)(R0) After: FMOVD F0, (R1)(R0<<3) Fixes #42798 Change-Id: I0c0912140c3dce5aa6abc27097c0eb93833cc589 Reviewed-on: https://go-review.googlesource.com/c/go/+/273706 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Giovanni Bajo <rasky@develer.com>	2021-02-24 19:49:08 +00:00
Alejandro García Montoro	bf48163e8f	cmd/compile: add rule to coalesce writes The code generated when storing eight bytes loaded from memory created a series of small writes instead of a single, large one. The specific pattern of instructions generated stored 1 byte, then 2 bytes, then 4 bytes, and finally 1 byte. The new rules match this specific pattern both for amd64 and for s390x, and convert it into a single instruction to store the 8 bytes. arm64 and ppc64le already generated the right code, but the new codegen test covers also those architectures. Fixes #41663 Change-Id: Ifb9b464be2d59c2ed5034acf7b9c3e473f344030 Reviewed-on: https://go-review.googlesource.com/c/go/+/280456 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Trust: Josh Bleecher Snyder <josharian@gmail.com> Trust: Jason A. Donenfeld <Jason@zx2c4.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-02-24 19:25:49 +00:00
Keith Randall	42cd40ee74	cmd/compile: improve bit test code Some bit test instruction generation stopped triggering after the change to addressing modes. I suspect this was just because ANDQload was being generated before the rewrite rules could discover the BTQ. Fix that by decomposing the ANDQload when it is surrounded by a TESTQ (thus re-enabling the BTQ rules). Fixes #44228 Change-Id: I489b4a5a7eb01c65fc8db0753f8cec54097cadb2 Reviewed-on: https://go-review.googlesource.com/c/go/+/291749 Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2021-02-23 18:02:48 +00:00
Cherry Zhang	401d7e5a24	[dev.regabi] cmd/compile: reserve X15 as zero register on AMD64 In ABIInternal, reserve X15 as constant zero, and use it to zero memory. (Maybe there can be more use of it?) The register is zeroed when transition to ABIInternal from ABI0. Caveat: using X15 generates longer instructions than using X0. Maybe we want to use X0? Change-Id: I12d5ee92a01fc0b59dad4e5ab023ac71bc2a8b7d Reviewed-on: https://go-review.googlesource.com/c/go/+/288093 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2021-02-03 22:44:53 +00:00
David Chase	9a19481acb	[dev.regabi] cmd/compile: make ordering for InvertFlags more stable Current many architectures use a rule along the lines of // Canonicalize the order of arguments to comparisons - helps with CSE. ((CMP\|CMPW) x y) && x.ID > y.ID => (InvertFlags ((CMP\|CMPW) y x)) to normalize comparisons as much as possible for CSE. Replace the ID comparison with something less variable across compiler changes. This helps avoid spurious failures in some of the codegen-comparison tests (though the current choice of comparison is sensitive to Op ordering). Two tests changed to accommodate modified instruction choice. Change-Id: Ib35f450bd2bae9d4f9f7838ceaf7ec682bcf1e1a Reviewed-on: https://go-review.googlesource.com/c/go/+/280155 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2021-01-13 02:40:43 +00:00
Lynn Boger	0ae3b7cb74	cmd/compile: fix rules regression with shifts on PPC64 Some rules for PPC64 were checking for a case where a shift followed by an 'and' of a mask could be lowered, depending on the format of the mask. The function to verify if the mask was valid for this purpose was not checking if the mask was 0 which we don't want to allow. This case can happen if previous optimizations resulted in that mask value. This fixes isPPC64ValidShiftMask to check for a mask of 0 and return false. This also adds a codegen testcase to verify it doesn't try to match the rules in the future. Fixes #42610 Change-Id: I565d94e88495f51321ab365d6388c01e791b4dbb Reviewed-on: https://go-review.googlesource.com/c/go/+/270358 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>	2020-11-17 13:20:20 +00:00
Alberto Donizetti	7307e86afd	test/codegen: go fmt Fixes #42445 Change-Id: I9653ef094dba2a1ac2e3daaa98279d10df17a2a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/268257 Trust: Alberto Donizetti <alb.donizetti@gmail.com> Trust: Martin Möhrmann <moehrmann@google.com> Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2020-11-08 12:19:55 +00:00
Michael Munday	854e892ce1	cmd/compile: optimize shift pairs and masks on s390x Optimize combinations of left and right shifts by a constant value into a 'rotate then insert selected bits [into zero]' instruction. Use the same instruction for contiguous masks since it has some benefits over 'and immediate' (not restricted to 32-bits, does not overwrite source register). To keep the complexity of this change under control I've only implemented 64 bit operations for now. There are a lot more optimizations that can be done with this instruction family. However, since their function overlaps with other instructions we need to be somewhat careful not to break existing optimization rules by creating optimization dead ends. This is particularly true of the load/store merging rules which contain lots of zero extensions and shifts. This CL does interfere with the store merging rules when an operand is shifted left before it is stored: binary.BigEndian.PutUint64(b, x << 1) This is unfortunate but it's not critical and somewhat complex so I plan to fix that in a follow up CL. file before after Δ % addr2line 4117446 4117282 -164 -0.004% api 4945184 4942752 -2432 -0.049% asm 4998079 4991891 -6188 -0.124% buildid 2685158 2684074 -1084 -0.040% cgo 4553732 4553394 -338 -0.007% compile 19294446 19245070 -49376 -0.256% cover 4897105 4891319 -5786 -0.118% dist 3544389 3542785 -1604 -0.045% doc 3926795 3927617 +822 +0.021% fix 3302958 3293868 -9090 -0.275% link 6546274 6543456 -2818 -0.043% nm 4102021 4100825 -1196 -0.029% objdump 4542431 4548483 +6052 +0.133% pack 2482465 2416389 -66076 -2.662% pprof 13366541 13363915 -2626 -0.020% test2json 2829007 2761515 -67492 -2.386% trace 10216164 10219684 +3520 +0.034% vet 6773956 6773572 -384 -0.006% total 107124151 106917891 -206260 -0.193% Change-Id: I7591cce41e06867ba10a745daae9333513062746 Reviewed-on: https://go-review.googlesource.com/c/go/+/233317 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Trust: Michael Munday <mike.munday@ibm.com>	2020-11-06 10:45:31 +00:00
Cherry Zhang	0387bedadf	cmd/compile: remove racefuncenterfp when it is not needed We already remove racefuncenter and racefuncexit if they are not needed (i.e. the function doesn't have any other race calls). racefuncenterfp is like racefuncenter but used on LR machines. Remove unnecessary racefuncenterfp as well. Change-Id: I65edb00e19c6d9ab55a204cbbb93e9fb710559f1 Reviewed-on: https://go-review.googlesource.com/c/go/+/267099 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2020-11-02 03:03:16 +00:00
Paul E. Murphy	c3c6fbf314	cmd/compile: combine more 32 bit shift and mask operations on ppc64 Combine (AND m (SRWconst x)) or (SRWconst (AND m x)) when mask m is and the shift value produce constant which can be encoded into an RLWINM instruction. Combine (CLRLSLDI (SRWconst x)) if the combining of the underling rotate masks produces a constant which can be encoded into RLWINM. Likewise for (SLDconst (SRWconst x)) and (CLRLSDI (RLWINM x)). Combine rotate word + and operations which can be encoded as a single RLWINM/RLWNM instruction. The most notable performance improvements arise from the crypto benchmarks below (GOARCH=power8 on a ppc64le/linux): pkg:golang.org/x/crypto/blowfish goos:linux goarch:ppc64le ExpandKeyWithSalt 52.2µs ± 0% 47.5µs ± 0% -8.88% ExpandKey 44.4µs ± 0% 40.3µs ± 0% -9.15% pkg:golang.org/x/crypto/ssh/internal/bcrypt_pbkdf goos:linux goarch:ppc64le Key 57.6ms ± 0% 52.3ms ± 0% -9.13% pkg:golang.org/x/crypto/bcrypt goos:linux goarch:ppc64le Equal 90.9ms ± 0% 82.6ms ± 0% -9.13% DefaultCost 91.0ms ± 0% 82.7ms ± 0% -9.12% Change-Id: I59a0ca29face38f4ab46e37124c32906f216c4ce Reviewed-on: https://go-review.googlesource.com/c/go/+/260798 Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>	2020-10-27 18:33:20 +00:00
Keith Randall	04b8a9fea5	all: implement GO386=softfloat Backstop support for non-sse2 chips now that 387 is gone. RELNOTE=yes Change-Id: Ib10e69c4a3654c15a03568f93393437e1939e013 Reviewed-on: https://go-review.googlesource.com/c/go/+/260017 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2020-10-06 22:49:38 +00:00
Keith Randall	fe2cfb74ba	all: drop 387 support My last 387 CL. So sad ... ... ... ... not! Fixes #40255 Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4 Reviewed-on: https://go-review.googlesource.com/c/go/+/258957 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-10-02 00:00:51 +00:00
Lynn Boger	cc2a5cf4b8	cmd/compile,cmd/internal/obj/ppc64: fix some shift rules due to a regression A recent change to improve shifts was generating some invalid cases when the rule was based on an AND. The extended mnemonics CLRLSLDI and CLRLSLWI only allow certain values for the operands and in the mask case those values were not being checked properly. This adds a check to those rules to verify that the 'b' and 'n' values used when an AND was part of the rule have correct values. There was a bug in some diag messages in asm9. The message expected 3 values but only provided 2. Those are corrected here also. The test/codegen/shift.go was updated to add a few more cases to check for the case mentioned here. Some of the comments that mention the order of operands in these extended mnemonics were wrong and those have been corrected. Fixes #41683. Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31 Reviewed-on: https://go-review.googlesource.com/c/go/+/258138 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>	2020-10-01 18:51:18 +00:00
Lynn Boger	a424f6e45e	cmd/asm,cmd/compile,cmd/internal/obj/ppc64: add extswsli support on power9 This adds support for the extswsli instruction which combines extsw followed by a shift. New benchmark demonstrates the improvement: name old time/op new time/op delta ExtShift 1.34µs ± 0% 1.30µs ± 0% -3.15% (p=0.057 n=4+3) Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/257017 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>	2020-09-28 18:13:48 +00:00
Lynn Boger	967465da29	cmd/compile: use combined shifts to improve array addressing on ppc64x This change adds rules to find pairs of instructions that can be combined into a single shifts. These instruction sequences are common in array addressing within loops. Improvements can be seen in many crypto packages and the hash packages. These are based on the extended mnemonics found in the ISA sections C.8.1 and C.8.2. Some rules in PPC64.rules were moved because the ordering prevented some matching. The following results were generated on power9. hash/crc32: CRC32/poly=Koopman/size=40/align=0 195ns ± 0% 163ns ± 0% -16.41% CRC32/poly=Koopman/size=40/align=1 200ns ± 0% 163ns ± 0% -18.50% CRC32/poly=Koopman/size=512/align=0 1.98µs ± 0% 1.67µs ± 0% -15.46% CRC32/poly=Koopman/size=512/align=1 1.98µs ± 0% 1.69µs ± 0% -14.80% CRC32/poly=Koopman/size=1kB/align=0 3.90µs ± 0% 3.31µs ± 0% -15.27% CRC32/poly=Koopman/size=1kB/align=1 3.85µs ± 0% 3.31µs ± 0% -14.15% CRC32/poly=Koopman/size=4kB/align=0 15.3µs ± 0% 13.1µs ± 0% -14.22% CRC32/poly=Koopman/size=4kB/align=1 15.4µs ± 0% 13.1µs ± 0% -14.79% CRC32/poly=Koopman/size=32kB/align=0 137µs ± 0% 105µs ± 0% -23.56% CRC32/poly=Koopman/size=32kB/align=1 137µs ± 0% 105µs ± 0% -23.53% crypto/rc4: RC4_128 733ns ± 0% 650ns ± 0% -11.32% (p=1.000 n=1+1) RC4_1K 5.80µs ± 0% 5.17µs ± 0% -10.89% (p=1.000 n=1+1) RC4_8K 45.7µs ± 0% 40.8µs ± 0% -10.73% (p=1.000 n=1+1) crypto/sha1: Hash8Bytes 635ns ± 0% 613ns ± 0% -3.46% (p=1.000 n=1+1) Hash320Bytes 2.30µs ± 0% 2.18µs ± 0% -5.38% (p=1.000 n=1+1) Hash1K 5.88µs ± 0% 5.38µs ± 0% -8.62% (p=1.000 n=1+1) Hash8K 42.0µs ± 0% 37.9µs ± 0% -9.75% (p=1.000 n=1+1) There are other improvements found in golang.org/x/crypto which are all in the range of 5-15%. Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c Reviewed-on: https://go-review.googlesource.com/c/go/+/252097 Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>	2020-09-17 12:37:40 +00:00
Cuong Manh Le	27a30186ab	cmd/compile,runtime: skip zero'ing order array for select statements The order array was zero initialized by the compiler, but ends up being overwritten by the runtime anyway. So let the runtime takes full responsibility for initializing, save us one instruction per select. Fixes #40399 Change-Id: Iec1eca27ad7180d4fcb3cc9ef97348206b7fe6b8 Reviewed-on: https://go-review.googlesource.com/c/go/+/251517 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2020-08-29 08:02:52 +00:00
Paul E. Murphy	7615b20d06	cmd/compile: generate subfic on ppc64 This merges an lis + subf into subfic, and for 32b constants lwa + subf into oris + ori + subf. The carry bit is no longer used in code generation, therefore I think we can clobber it as needed. Note, lowered borrow/carry arithmetic is self-contained and thus is not affected. A few extra rules are added to ensure early transformations to SUBFCconst don't trip up earlier rules, fold constant operations, or otherwise simplify lowering. Likewise, tests are added to ensure all rules are hit. Generic constant folding catches trivial cases, however some lowering rules insert arithmetic which can introduce new opportunities (e.g BitLen or Slicemask). I couldn't find a specific benchmark to demonstrate noteworthy improvements, but this is generating subfic in many of the default bent test binaries, so we are at least saving a little code space. Change-Id: Iad7c6e5767eaa9dc24dc1c989bd1c8cfe1982012 Reviewed-on: https://go-review.googlesource.com/c/go/+/249461 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>	2020-08-27 20:10:15 +00:00
fanzha02	d556c251a1	cmd/compile: add more generic rewrite rules to reassociate (op (op y C) x\|C) With this patch, opt pass can expose more obvious constant-folding opportunites. Example: func test(i int) int {return (i+8)-(i+4)} The previous version: MOVD "".i(FP), R0 ADD $8, R0, R1 ADD $4, R0, R0 SUB R0, R1, R0 MOVD R0, "".~r1+8(FP) RET (R30) The optimized version: MOVD $4, R0 MOVD R0, "".~r1+8(FP) RET (R30) This patch removes some existing reassociation rules, such as "x+(z-C)", because the current generic rewrite rules will canonicalize "x-const" to "x+(-const)", making "x+(z-C)" equal to "x+(z+(-C))". This patch also adds test cases. Change-Id: I857108ba0b5fcc18a879eeab38e2551bc4277797 Reviewed-on: https://go-review.googlesource.com/c/go/+/237137 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-08-24 14:52:54 +00:00
Agniva De Sarker	8acbe4c0b3	cmd/compile: optimize unsigned comparisons with 0/1 on wasm Updates #21439 Change-Id: I0fbcde6e0c2fc368fe686b271670f9d8be4a7900 Reviewed-on: https://go-review.googlesource.com/c/go/+/247557 Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Richard Musiol <neelance@gmail.com>	2020-08-22 12:35:47 +00:00
diaxu01	01aad9ea93	cmd/compile: Optimize ARM64's code with EON This patch fuses pattern '(MVN (XOR x y))' into '(EON x y)'. Change-Id: I269c98ce198d51a4945ce8bd0e1024acbd1b7609 Reviewed-on: https://go-review.googlesource.com/c/go/+/239638 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-08-19 16:47:14 +00:00
Paul E. Murphy	e7c7ce646f	cmd/compile: combine multiply/add into maddld on ppc64le/power9 Add a new lowering rule to match and replace such instances with the MADDLD instruction available on power9 where possible. Likewise, this plumbs in a new ppc64 ssa opcode to house the newly generated MADDLD instructions. When testing ed25519, this reduced binary size by 936B. Similarly, MADDLD combination occcurs in a few other less obvious cases such as division by constant. Testing of golang.org/x/crypto/ed25519 shows non-trivial speedup during keygeneration: name old time/op new time/op delta KeyGeneration 65.2µs ± 0% 63.1µs ± 0% -3.19% Signing 64.3µs ± 0% 64.4µs ± 0% +0.16% Verification 147µs ± 0% 147µs ± 0% +0.11% Similarly, this test binary has shrunk by 66488B. Change-Id: I077aeda7943119b41f07e4e62e44a648f16e4ad0 Reviewed-on: https://go-review.googlesource.com/c/go/+/248723 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2020-08-18 21:09:30 +00:00
Michael Munday	6e876f1985	cmd/compile: clean up and optimize s390x multiplication rules Some of the existing optimizations aren't triggered because they are handled by the generic rules so this CL removes them. Also some constraints were copied without much thought from the amd64 rules and they don't make sense on s390x, so we remove those constraints. Finally, add a 'multiply by the sum of two powers of two' optimization. This makes sense on s390x as shifts are low latency and can also sometimes be optimized further (especially if we add support for RISBG instructions). name old time/op new time/op delta IntMulByConst/3-8 1.70ns ±11% 1.10ns ± 5% -35.26% (p=0.000 n=10+10) IntMulByConst/5-8 1.64ns ± 7% 1.10ns ± 4% -32.94% (p=0.000 n=10+9) IntMulByConst/12-8 1.65ns ± 6% 1.20ns ± 4% -27.16% (p=0.000 n=10+9) IntMulByConst/120-8 1.66ns ± 4% 1.22ns ±13% -26.43% (p=0.000 n=10+10) IntMulByConst/-120-8 1.65ns ± 7% 1.19ns ± 4% -28.06% (p=0.000 n=9+10) IntMulByConst/65537-8 0.86ns ± 9% 1.12ns ±12% +30.41% (p=0.000 n=10+10) IntMulByConst/65538-8 1.65ns ± 5% 1.23ns ± 5% -25.11% (p=0.000 n=10+10) Change-Id: Ib196e6bff1e97febfd266134d0a2b2a62897989f Reviewed-on: https://go-review.googlesource.com/c/go/+/248937 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-08-18 15:39:44 +00:00
Junchen Li	06337823ef	cmd/compile: optimize unsigned comparisons to 0/1 on arm64 For an unsigned integer, it's useful to convert its order test with 0/1 to its equality test with 0. We can save a comparison instruction that followed by a conditional branch on arm64 since it supports compare-with-zero-and-branch instructions. For example, if x > 0 { ... } else { ... } the original version: CMP $0, R0 BLS 9 the optimized version: CBZ R0, 8 Updates #21439 Change-Id: Id1de6f865f6aa72c5d45b29f7894818857288425 Reviewed-on: https://go-review.googlesource.com/c/go/+/246857 Reviewed-by: Keith Randall <khr@golang.org>	2020-08-18 04:09:13 +00:00
Keith Randall	ef9c8a38ad	cmd/compile: don't rewrite (CMP (AND x y) 0) to TEST if AND has other uses If the AND has other uses, we end up saving an argument to the AND in another register, so we can use it for the TEST. No point in doing that. Change-Id: I73444a6aeddd6f55e2328ce04d77c3e6cf4a83e0 Reviewed-on: https://go-review.googlesource.com/c/go/+/241280 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2020-08-17 22:00:44 +00:00
Junchen Li	e30fbe3757	cmd/compile: optimize unsigned comparisons to 0 There are some architecture-independent rules in #21439, since an unsigned integer >= 0 is always true and < 0 is always false. This CL adds these optimizations to generic rules. Updates #21439 Change-Id: Iec7e3040b761ecb1e60908f764815fdd9bc62495 Reviewed-on: https://go-review.googlesource.com/c/go/+/246617 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2020-08-17 20:06:35 +00:00
Keith Randall	c4fed25553	cmd/compile: add floating point load+op operations to addressing modes pass They were missed as part of the refactoring to use a separate addressing modes pass. Fixes #40426 Change-Id: Ie0418b2fac4ba1ffe720644ac918f6d728d5e420 Reviewed-on: https://go-review.googlesource.com/c/go/+/244859 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-07-27 18:24:32 +00:00
Xiangdong Ji	e031318ca6	cmd/compile: ARM comparisons with 0 incorrect on overflow Some ARM rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI \|\| EQ 4. GTnoov greater than NEQ & PL The patch also adds a few test cases to cover scenarios that are specific to ARM and fine-tunes the code generation tests for 'x-const'. For more details please refer to the previous fix on 64-bit ARM: https://go-review.googlesource.com/c/go/+/233097 Go1 perf, 'old' is the non-optimized version, that is removing all concerned rewriting rules. name old time/op new time/op delta BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8) Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8) FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8) FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8) FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7) FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8) FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8) FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8) FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7) GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8) GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7) Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8) Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8) HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8) JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7) JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7) Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ (p=0.852 n=6+8) GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8) RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7) RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8) RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8) RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8) Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8) Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8) TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8) TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8) name old speed new speed delta GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8) GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7) Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8) Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8) JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7) JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7) GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8) RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7) RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8) RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8) RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8) Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8) Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8) Fixes #39303 Updates #38740 Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36 Reviewed-on: https://go-review.googlesource.com/c/go/+/236637 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-06-09 15:50:33 +00:00
Xiangdong Ji	e8f5a33191	cmd/compile: fix incorrect rewriting to if condition Some ARM64 rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag, in the following categories: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI \|\| EQ 4. GTnoov greater than NEQ & PL The backend generates two consecutive branch instructions for 'LEnoov' and 'GTnoov' to model their expected behavior. A slight change to 'gc' and amd64/386 backends is made to unify the code generation. Add a test 'TestCondRewrite' as justification, it covers 32 incorrect rules identified on arm64, more might be needed on other arches, like 32-bit arm. Add two benchmarks profiling the aforementioned category 1&2 and category 3&4 separetely, we expect the first two categories will show performance improvement and the second will not result in visible regression compared with the non-optimized version. This change also updates TestFormats to support using %#x. Examples exhibiting where does the issue come from: 1: 'if x + 3 < 0' might be converted to: before: CMN $3, R0 BGE <else branch> // wrong branch is taken if 'x+3' overflows after: CMN $3, R0 BPL <else branch> 2: 'if y - 3 > 0' might be converted to: before: CMP $3, R0 BLE <else branch> // wrong branch is taken if 'y-3' underflows after: CMP $3, R0 BMI <else branch> BEQ <else branch> Benchmark data from different kinds of arm64 servers, 'old' is the non-optimized version (not the parent commit), generally the optimization version outperforms. S1: name old time/op new time/op delta CondRewrite/SoloJump 13.6ns ± 0% 12.9ns ± 0% -5.15% (p=0.000 n=10+10) CondRewrite/CombJump 13.8ns ± 1% 12.9ns ± 0% -6.32% (p=0.000 n=10+10) S2: name old time/op new time/op delta CondRewrite/SoloJump 11.6ns ± 0% 10.9ns ± 0% -6.03% (p=0.000 n=10+10) CondRewrite/CombJump 11.4ns ± 0% 10.8ns ± 1% -5.53% (p=0.000 n=10+10) S3: name old time/op new time/op delta CondRewrite/SoloJump 7.36ns ± 0% 7.50ns ± 0% +1.79% (p=0.000 n=9+10) CondRewrite/CombJump 7.35ns ± 0% 7.75ns ± 0% +5.51% (p=0.000 n=8+9) S4: name old time/op new time/op delta CondRewrite/SoloJump-224 11.5ns ± 1% 10.9ns ± 0% -4.97% (p=0.000 n=10+10) CondRewrite/CombJump-224 11.9ns ± 0% 11.5ns ± 0% -2.95% (p=0.000 n=10+10) S5: name old time/op new time/op delta CondRewrite/SoloJump 10.0ns ± 0% 10.0ns ± 0% -0.45% (p=0.000 n=9+10) CondRewrite/CombJump 9.93ns ± 0% 9.77ns ± 0% -1.53% (p=0.000 n=10+9) Go1 perf. data: name old time/op new time/op delta BinaryTree17 6.29s ± 1% 6.30s ± 1% ~ (p=1.000 n=5+5) Fannkuch11 5.40s ± 0% 5.40s ± 0% ~ (p=0.841 n=5+5) FmtFprintfEmpty 97.9ns ± 0% 98.9ns ± 3% ~ (p=0.937 n=4+5) FmtFprintfString 171ns ± 3% 171ns ± 2% ~ (p=0.754 n=5+5) FmtFprintfInt 212ns ± 0% 217ns ± 6% +2.55% (p=0.008 n=5+5) FmtFprintfIntInt 296ns ± 1% 297ns ± 2% ~ (p=0.516 n=5+5) FmtFprintfPrefixedInt 371ns ± 2% 374ns ± 7% ~ (p=1.000 n=5+5) FmtFprintfFloat 435ns ± 1% 439ns ± 2% ~ (p=0.056 n=5+5) FmtManyArgs 1.37µs ± 1% 1.36µs ± 1% ~ (p=0.730 n=5+5) GobDecode 14.6ms ± 4% 14.4ms ± 4% ~ (p=0.690 n=5+5) GobEncode 11.8ms ±20% 11.6ms ±15% ~ (p=1.000 n=5+5) Gzip 507ms ± 0% 491ms ± 0% -3.22% (p=0.008 n=5+5) Gunzip 73.8ms ± 0% 73.9ms ± 0% ~ (p=0.690 n=5+5) HTTPClientServer 116µs ± 0% 116µs ± 0% ~ (p=0.686 n=4+4) JSONEncode 21.8ms ± 1% 21.6ms ± 2% ~ (p=0.151 n=5+5) JSONDecode 104ms ± 1% 103ms ± 1% -1.08% (p=0.016 n=5+5) Mandelbrot200 9.53ms ± 0% 9.53ms ± 0% ~ (p=0.421 n=5+5) GoParse 7.55ms ± 1% 7.51ms ± 1% ~ (p=0.151 n=5+5) RegexpMatchEasy0_32 158ns ± 0% 158ns ± 0% ~ (all equal) RegexpMatchEasy0_1K 606ns ± 1% 608ns ± 3% ~ (p=0.937 n=5+5) RegexpMatchEasy1_32 143ns ± 0% 144ns ± 1% ~ (p=0.095 n=5+4) RegexpMatchEasy1_1K 927ns ± 2% 944ns ± 2% ~ (p=0.056 n=5+5) RegexpMatchMedium_32 16.0ns ± 0% 16.0ns ± 0% ~ (all equal) RegexpMatchMedium_1K 69.3µs ± 2% 69.7µs ± 0% ~ (p=0.690 n=5+5) RegexpMatchHard_32 3.73µs ± 0% 3.73µs ± 1% ~ (p=0.984 n=5+5) RegexpMatchHard_1K 111µs ± 1% 110µs ± 0% ~ (p=0.151 n=5+5) Revcomp 1.91s ±47% 1.77s ±68% ~ (p=1.000 n=5+5) Template 138ms ± 1% 138ms ± 1% ~ (p=1.000 n=5+5) TimeParse 787ns ± 2% 785ns ± 1% ~ (p=0.540 n=5+5) TimeFormat 729ns ± 1% 726ns ± 1% ~ (p=0.151 n=5+5) Updates #38740 Change-Id: I06c604874acdc1e63e66452dadee5df053045222 Reviewed-on: https://go-review.googlesource.com/c/go/+/233097 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org>	2020-05-29 15:39:54 +00:00
Keith Randall	2cb10d42b7	cmd/compile: in prove, zero right shifts of positive int by #bits - 1 Taking over Zach's CL 212277. Just cleaned up and added a test. For a positive, signed integer, an arithmetic right shift of count (bit-width - 1) equals zero. e.g. int64(22) >> 63 -> 0. This CL makes prove replace these right shifts with a zero-valued constant. These shifts may arise in source code explicitly, but can also be created by the generic rewrite of signed division by a power of 2. // Signed divide by power of 2. // n / c = n >> log(c) if n >= 0 // = (n+c-1) >> log(c) if n < 0 // We conditionally add c-1 by adding n>>63>>(64-log(c)) (first shift signed, second shift unsigned). (Div64 <t> n (Const64 [c])) && isPowerOfTwo(c) -> (Rsh64x64 (Add64 <t> n (Rsh64Ux64 <t> (Rsh64x64 <t> n (Const64 <typ.UInt64> [63])) (Const64 <typ.UInt64> [64-log2(c)]))) (Const64 <typ.UInt64> [log2(c)])) If n is known to be positive, this rewrite includes an extra Add and 2 extra Rsh. This CL will allow prove to replace one of the extra Rsh with a 0. That replacement then allows lateopt to remove all the unneccesary fixups from the generic rewrite. There is a rewrite rule to handle this case directly: (Div64 n (Const64 [c])) && isNonNegative(n) && isPowerOfTwo(c) -> (Rsh64Ux64 n (Const64 <typ.UInt64> [log2(c)])) But this implementation of isNonNegative really only handles constants and a few special operations like len/cap. The division could be handled if the factsTable version of isNonNegative were available. Unfortunately, the first opt pass happens before prove even has a chance to deduce the numerator is non-negative, so the generic rewrite has already fired and created the extra Ops discussed above. Fixes #36159 By Printf count, this zeroes 137 right shifts when building std and cmd. Change-Id: Iab486910ac9d7cfb86ace2835456002732b384a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/232857 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-05-11 16:23:52 +00:00
Martin Möhrmann	6ed4661807	cmd/compile: optimize make+copy pattern to avoid memclr match: m = make([]T, x); copy(m, s) for pointer free T and x==len(s) rewrite to: m = mallocgc(xelemsize(T), nil, false); memmove(&m, &s, xelemsize(T)) otherwise rewrite to: m = makeslicecopy([]T, x, s) This avoids memclear and shading of pointers in the newly created slice before the copy. With this CL "s" is only be allowed to bev a variable and not a more complex expression. This restriction could be lifted in future versions of this optimization when it can be proven that "s" is not referencing "m". Triggers 450 times during make.bash.. Reduces go binary size by ~8 kbyte. name old time/op new time/op delta MakeSliceCopy/mallocmove/Byte 71.1ns ± 1% 65.8ns ± 0% -7.49% (p=0.000 n=10+9) MakeSliceCopy/mallocmove/Int 71.2ns ± 1% 66.0ns ± 0% -7.27% (p=0.000 n=10+8) MakeSliceCopy/mallocmove/Ptr 104ns ± 4% 99ns ± 1% -5.13% (p=0.000 n=10+10) MakeSliceCopy/makecopy/Byte 70.3ns ± 0% 68.0ns ± 0% -3.22% (p=0.000 n=10+9) MakeSliceCopy/makecopy/Int 70.3ns ± 0% 68.5ns ± 1% -2.59% (p=0.000 n=9+10) MakeSliceCopy/makecopy/Ptr 102ns ± 0% 99ns ± 1% -2.97% (p=0.000 n=9+9) MakeSliceCopy/nilappend/Byte 75.4ns ± 0% 74.9ns ± 2% -0.63% (p=0.015 n=9+9) MakeSliceCopy/nilappend/Int 75.6ns ± 0% 76.4ns ± 3% ~ (p=0.245 n=9+10) MakeSliceCopy/nilappend/Ptr 107ns ± 0% 108ns ± 1% +0.93% (p=0.005 n=9+10) Fixes #26252 Change-Id: Iec553dd1fef6ded16197216a472351c8799a8e71 Reviewed-on: https://go-review.googlesource.com/c/go/+/146719 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2020-05-07 17:50:24 +00:00
Keith Randall	9ed0fb42e3	cmd/compile: add indexed memory modification ops to amd64 name old time/op new time/op delta Modify-16 404ns ± 1% 365ns ± 1% -9.73% (p=0.000 n=10+10) ConstModify-16 407ns ± 0% 385ns ± 2% -5.56% (p=0.000 n=9+10) Seems to generally help generated code. Binary size change is in the noise. Change-Id: I57891bfaf0f7dfc5d143bb9f7ebafc7079d2614f Reviewed-on: https://go-review.googlesource.com/c/go/+/228098 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-04-30 17:21:31 +00:00
Keith Randall	882ec701d2	cmd/compile: add indexed load+op operations to amd64 name old time/op new time/op delta LoadAdd-16 545ns ± 0% 456ns ± 0% -16.31% (p=0.000 n=10+10) Update #36468 Change-Id: I84f390d55490648fa1f58cdbc24fd74c4f1bc8c1 Reviewed-on: https://go-review.googlesource.com/c/go/+/227960 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-04-30 17:19:57 +00:00
Josh Bleecher Snyder	4a7e363288	cmd/compile: optimize Move with all-zero ro sym src to Zero We set up static symbols during walk that we later make copies of to initialize local variables. It is difficult to ascertain at that time exactly when copying a symbol is profitable vs locally initializing an autotmp. During SSA, we are much better placed to optimize. This change recognizes when we are copying from a global readonly all-zero symbol and replaces it with direct zeroing. This often allows the all-zero symbol to be deadcode eliminated at link time. This is not ideal--it makes for large object files, and longer link times--but it is the cleanest fix I could find. This makes the final binary for the program in #38554 shrink from >500mb to ~2.2mb. It also shrinks the standard binaries: file before after Δ % addr2line 4412496 4404304 -8192 -0.186% buildid 2893816 2889720 -4096 -0.142% cgo 4841048 4832856 -8192 -0.169% compile 19926480 19922432 -4048 -0.020% cover 5281816 5277720 -4096 -0.078% link 6734648 6730552 -4096 -0.061% nm 4366240 4358048 -8192 -0.188% objdump 4755968 4747776 -8192 -0.172% pprof 14653060 14612100 -40960 -0.280% trace 11805940 11777268 -28672 -0.243% vet 7185560 7181416 -4144 -0.058% total 113588440 113465560 -122880 -0.108% And not just by removing unnecessary symbols; the program text shrinks a bit as well. Fixes #38554 Change-Id: I8381ae6084ae145a5e0cd9410c451e52c0dc51c8 Reviewed-on: https://go-review.googlesource.com/c/go/+/229704 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-24 23:58:10 +00:00
Josh Bleecher Snyder	e7c1873691	cmd/compile: optimize x & 1 != 0 to x & 1 on amd64 Triggers a handful of times in std+cmd. Change-Id: I9bb8ce9a5f8bae2547cb61157cd8f256e1b63e76 Reviewed-on: https://go-review.googlesource.com/c/go/+/229602 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-23 17:52:28 +00:00
Michael Munday	ab7a65f283	cmd/compile: clean up codegen for branch-on-carry on s390x This CL optimizes code that uses a carry from a function such as bits.Add64 as the condition in an if statement. For example: x, c := bits.Add64(a, b, 0) if c != 0 { panic("overflow") } Rather than converting the carry into a 0 or a 1 value and using that as an input to a comparison instruction the carry flag is now used as the input to a conditional branch directly. This typically removes an ADD LOGICAL WITH CARRY instruction when user code is doing overflow detection and is closer to the code that a user would expect to generate. Change-Id: I950431270955ab72f1b5c6db873b6abe769be0da Reviewed-on: https://go-review.googlesource.com/c/go/+/219757 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-22 20:11:06 +00:00
Michael Munday	e464d7d797	cmd/compile: optimize comparisons with immediates on s390x When generating code for unsigned equals (==) and not equals (!=) comparisons we currently, on s390x, always use signed comparisons. This mostly works well, however signed comparisons on s390x sign extend their immediates and unsigned comparisons zero extend them. For compare-and-branch instructions which can only have 8-bit immediates this significantly changes the range of immediate values we can represent: [-128, 127] for signed comparisons and [0, 255] for unsigned comparisons. When generating equals and not equals checks we don't neet to worry about whether the comparison is signed or unsigned. This CL therefore adds rules to allow us to switch signedness for such comparisons if it means that it brings a constant into range for an 8-bit immediate. For example, a signed equals with an integer in the range [128, 255] will now be implemented using an unsigned compare-and-branch instruction rather than separate compare and branch instructions. As part of this change I've also added support for adding a name to block control values using the same `x:(...)` syntax we use for value rules. Triggers 792 times when compiling cmd and std. Change-Id: I77fa80a128f0a8ce51a2888d1e384bd5e9b61a77 Reviewed-on: https://go-review.googlesource.com/c/go/+/228642 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-21 19:23:51 +00:00
alex-semenyuk	876c1feb7d	test/codegen, runtime/pprof, runtime: apply fmt Change-Id: Ife4e065246729319c39e57a4fbd8e6f7b37724e1 GitHub-Last-Rev: `e71803eaeb` GitHub-Pull-Request: golang/go#38527 Reviewed-on: https://go-review.googlesource.com/c/go/+/228901 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>	2020-04-21 09:07:42 +00:00
Josh Bleecher Snyder	50b11318fe	cmd/compile: use oneBit instead of isPowerOfTwo in bit optimization This optimization works on any integer with exactly one bit set. This is identical to being a power of two, except in the most negative number. Use oneBit instead. The rule now triggers in a few more places in std+cmd, in packages encoding/asn1, crypto/elliptic, and vendor/golang.org/x/crypto/cryptobyte. This change obviates the need for CL 222479 by doing this optimization consistently in the compiler. Change-Id: I983c6235290fdc634fda5e11b10f1f8ce041272f Reviewed-on: https://go-review.googlesource.com/c/go/+/229124 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2020-04-21 00:38:34 +00:00
David Chase	e4e192484b	cmd/compile: split up the addressing mode on OpAMD64CMPloadidx always Benchmarking suggests that the combo instruction is notably slower, at least in the places where we measure. Updates #37955 Change-Id: I829f1975dd6edf38163128ba51d84604055512f4 Reviewed-on: https://go-review.googlesource.com/c/go/+/228157 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-15 18:09:14 +00:00
Lynn Boger	a1550d3ca3	cmd/compile: use isel with variable shifts on ppc64x This changes the code generated for variable length shift counts to use isel instead of instructions that set and read the carry flag. This reduces the generated code for shifts like this by 1 instruction and avoids the use of instructions to set and read the carry flag. This sequence can be found in strconv with these results on power9: Atof64Decimal 71.6ns ± 0% 68.3ns ± 0% -4.61% Atof64Float 95.3ns ± 0% 90.9ns ± 0% -4.62% Atof64FloatExp 153ns ± 0% 149ns ± 0% -2.61% Atof64Big 234ns ± 0% 232ns ± 0% -0.85% Atof64RandomBits 348ns ± 0% 369ns ± 0% +6.03% Atof64RandomFloats 262ns ± 0% 262ns ± 0% ~ Atof32Decimal 72.0ns ± 0% 68.2ns ± 0% -5.28% Atof32Float 92.1ns ± 0% 87.1ns ± 0% -5.43% Atof32FloatExp 159ns ± 0% 158ns ± 0% -0.63% Atof32Random 194ns ± 0% 191ns ± 0% -1.55% Some tests in codegen/shift.go are enabled to verify the expected instructions are generated. Change-Id: I968715d10ada405a8c46132bf19b8ed9b85796d1 Reviewed-on: https://go-review.googlesource.com/c/go/+/227337 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-04-09 19:18:56 +00:00
Josh Bleecher Snyder	ade0811dc8	cmd/compile: handle some additional phis in shortcircuit Prior to this change, the shortcircuit pass could only handle blocks containing only a single phi control value, possibly wrapped in some OpNot and OpCopy values. This change partially lifts this limitation. It handles some cases in which the block contains other phi values. This appears to happen most commonly in cases in which the conditionals being checked involve the memory state, in which case there is a phi memory value in the block. The general idea here is to use the information we have about the CFG to (1) move the other phi values into other blocks and/or (2) rewrite uses of the other phi values in other blocks. For example, consider this CFG: p q \ / b / \ t u And consider a phi value v in block b. We'll write v = Phi(p: x, q: y) to say that v has value x corresponding to inbound block p, and value y for block q. We will rewrite this CFG to: p q \| / \| b \|/ \ t u What should we do with v? Any uses of v in u can be replaced with y. Why? If we are in block u, we came from b, and before that from q. If prior to b we came from p, then we would have gone to t, not u. Since we came from q, we know that v took the value y. Uses of v in t are a bit more complicated. It is going to end up being a phi value: Phi(p: ?, b: ?). Suppose, after the rewrite, we came from block p. Then, before the rewrite, we would have gone to b, where v would have the value x. So we have Phi(p: x, b: ?). Suppose, after the rewrite, we came from block b. Then we must have come from block q. If we come from block q, v has value y. So we have Phi(p: x, b: y). Uses of v in t can thus be replaced with a new phi value, with the same values as v, but with altered predecessors. Similar reasoning can be employed to rewrite or replace other uses of v elsewhere in the CFG, so that v itself can be eliminated, and the CFG rewrite can proceed. This change sets up the infrastructure for such optimizations and adds a few cheap ones. All optimizations in this change depend only on the shape of the CFG; future changes may also depend on where v's uses are. That analysis is more powerful but more expensive, and should be done incrementally. The use of closures here is perhaps a bit unusual, but during development it proved critical to having readable code. We must decide early on whether we can safely do the CFG modifications, and then later fix up the phis if so. Safely storing state and decisions across these two phases is hard to do readably. Closures solve the problem neatly. I manually instrumented the code paths in shortcircuitPhiPlan. During make.bash there are nearly 6000 invocations. The least-visited code path gets run 85 times, so all the code in this CL is reasonably well-exercised. Here is a concrete example of code improved by this change: func f(e interface{}) int { if x, ok := e.(int); ok { return x } return 0 } Omitting PCDATA, FUNCDATA, and the like, it used to compile to: "".f STEXT nosplit size=50 args=0x18 locals=0x0 0x0000 00000 (x.go:4) LEAQ type.int(SB), AX 0x0007 00007 (x.go:4) MOVQ "".e+8(SP), CX 0x000c 00012 (x.go:4) CMPQ AX, CX 0x000f 00015 (x.go:4) JNE 43 0x0011 00017 (x.go:4) MOVQ "".e+16(SP), AX 0x0016 00022 (x.go:4) MOVQ (AX), AX 0x0019 00025 (x.go:4) JNE 33 0x001b 00027 (x.go:5) MOVQ AX, "".~r1+24(SP) 0x0020 00032 (x.go:5) RET 0x0021 00033 (x.go:7) MOVQ $0, "".~r1+24(SP) 0x002a 00042 (x.go:7) RET 0x002b 00043 (x.go:7) MOVL $0, AX 0x0030 00048 (x.go:4) JMP 25 Afterwards, it compiles to: "".f STEXT nosplit size=41 args=0x18 locals=0x0 0x0000 00000 (x.go:4) LEAQ type.int(SB), AX 0x0007 00007 (x.go:4) MOVQ "".e+8(SP), CX 0x000c 00012 (x.go:4) CMPQ AX, CX 0x000f 00015 (x.go:4) JNE 31 0x0011 00017 (x.go:4) MOVQ "".e+16(SP), AX 0x0016 00022 (x.go:4) MOVQ (AX), AX 0x0019 00025 (x.go:5) MOVQ AX, "".~r1+24(SP) 0x001e 00030 (x.go:5) RET 0x001f 00031 (x.go:7) MOVQ $0, "".~r1+24(SP) 0x0028 00040 (x.go:7) RET Note that there is now only a single JNE and a single RET $0 path. Updates #37608 Has a minor good effect on compilation speed and memory use. Provides widespread improvements to generated code. The rare, minor regressions I have investigated are due to register allocation fluctuations. file before after Δ % addr2line 4376080 4371984 -4096 -0.094% api 5945400 5933112 -12288 -0.207% asm 5034312 5030216 -4096 -0.081% buildid 2844952 2840856 -4096 -0.144% cgo 4812872 4804680 -8192 -0.170% compile 19622064 19610368 -11696 -0.060% cover 5236648 5232552 -4096 -0.078% dist 3658312 3654216 -4096 -0.112% doc 4653512 4649416 -4096 -0.088% fix 3370072 3365976 -4096 -0.122% link 6671864 6667768 -4096 -0.061% pprof 14781652 14761172 -20480 -0.139% trace 11639684 11627396 -12288 -0.106% vet 8252280 8231800 -20480 -0.248% total 115052984 114934792 -118192 -0.103% file before after Δ % internal/cpu.s 3298 3296 -2 -0.061% internal/bytealg.s 1730 1737 +7 +0.405% cmd/vendor/golang.org/x/mod/semver.s 7332 7283 -49 -0.668% image/color.s 8248 8156 -92 -1.115% math.s 35966 35956 -10 -0.028% math/cmplx.s 6596 6575 -21 -0.318% runtime.s 480566 480053 -513 -0.107% sync.s 16408 16385 -23 -0.140% math/rand.s 10447 10406 -41 -0.392% internal/reflectlite.s 28408 28366 -42 -0.148% errors.s 2736 2701 -35 -1.279% sort.s 17031 17036 +5 +0.029% io.s 16993 16964 -29 -0.171% container/heap.s 2006 1997 -9 -0.449% text/tabwriter.s 9570 9552 -18 -0.188% bytes.s 31823 31594 -229 -0.720% strconv.s 52760 52717 -43 -0.082% vendor/golang.org/x/text/transform.s 16713 16706 -7 -0.042% strings.s 42590 42563 -27 -0.063% bufio.s 22883 22785 -98 -0.428% encoding/base32.s 9586 9531 -55 -0.574% syscall.s 82237 82243 +6 +0.007% image.s 37465 37452 -13 -0.035% regexp/syntax.s 82827 82769 -58 -0.070% image/draw.s 18698 18584 -114 -0.610% image/jpeg.s 36560 36549 -11 -0.030% time.s 82557 82526 -31 -0.038% context.s 10863 10820 -43 -0.396% regexp.s 64114 64049 -65 -0.101% os.s 51751 51524 -227 -0.439% reflect.s 168240 168049 -191 -0.114% cmd/go/internal/lockedfile/internal/filelock.s 2317 2290 -27 -1.165% path/filepath.s 17831 17766 -65 -0.365% io/ioutil.s 6994 6990 -4 -0.057% encoding/binary.s 30791 30726 -65 -0.211% cmd/vendor/golang.org/x/sys/unix.s 78055 78033 -22 -0.028% encoding/pem.s 9280 9247 -33 -0.356% crypto/cipher.s 20376 20374 -2 -0.010% os/exec.s 29229 29140 -89 -0.304% internal/goroot.s 4588 4579 -9 -0.196% cmd/internal/browser.s 2246 2240 -6 -0.267% cmd/vendor/golang.org/x/crypto/ssh/terminal.s 27183 27149 -34 -0.125% fmt.s 76625 76484 -141 -0.184% encoding/hex.s 6154 6152 -2 -0.032% compress/lzw.s 7063 7059 -4 -0.057% database/sql/driver.s 18875 18862 -13 -0.069% debug/plan9obj.s 8268 8266 -2 -0.024% net/url.s 29724 29719 -5 -0.017% encoding/csv.s 12872 12856 -16 -0.124% debug/gosym.s 25303 25268 -35 -0.138% compress/flate.s 50952 51019 +67 +0.131% compress/zlib.s 7277 7266 -11 -0.151% archive/zip.s 42155 42111 -44 -0.104% debug/dwarf.s 107632 107541 -91 -0.085% database/sql.s 98373 98028 -345 -0.351% os/user.s 14722 14708 -14 -0.095% encoding/json.s 105836 105711 -125 -0.118% debug/macho.s 32598 32560 -38 -0.117% encoding/gob.s 136478 135755 -723 -0.530% debug/pe.s 31160 30869 -291 -0.934% debug/elf.s 63495 63302 -193 -0.304% vendor/golang.org/x/text/unicode/bidi.s 27220 27217 -3 -0.011% vendor/golang.org/x/text/secure/bidirule.s 3363 3352 -11 -0.327% go/token.s 12036 12035 -1 -0.008% flag.s 22277 22256 -21 -0.094% mime.s 39696 39509 -187 -0.471% go/scanner.s 19033 19020 -13 -0.068% archive/tar.s 70936 70581 -355 -0.500% internal/xcoff.s 22823 22820 -3 -0.013% text/scanner.s 11631 11629 -2 -0.017% encoding/xml.s 110534 110408 -126 -0.114% math/big.s 183636 183545 -91 -0.050% image/gif.s 27376 27343 -33 -0.121% crypto/dsa.s 6029 5969 -60 -0.995% image/png.s 42947 42939 -8 -0.019% crypto/rand.s 6866 6854 -12 -0.175% vendor/golang.org/x/text/unicode/norm.s 66394 66354 -40 -0.060% runtime/trace.s 2603 2521 -82 -3.150% crypto/ed25519.s 6321 6300 -21 -0.332% text/template/parse.s 93910 93844 -66 -0.070% crypto/rsa.s 31460 31369 -91 -0.289% encoding/asn1.s 57021 57023 +2 +0.004% crypto/elliptic.s 51382 51363 -19 -0.037% crypto/x509/pkix.s 10386 10342 -44 -0.424% vendor/golang.org/x/net/idna.s 24482 24466 -16 -0.065% vendor/golang.org/x/crypto/cryptobyte.s 33479 33280 -199 -0.594% crypto/ecdsa.s 11936 11883 -53 -0.444% go/constant.s 43670 42663 -1007 -2.306% go/ast.s 80383 80191 -192 -0.239% testing.s 68069 68057 -12 -0.018% runtime/pprof.s 59613 59603 -10 -0.017% testing/iotest.s 4895 4891 -4 -0.082% internal/trace.s 78136 78089 -47 -0.060% cmd/internal/goobj2.s 13158 13154 -4 -0.030% cmd/internal/src.s 17661 17657 -4 -0.023% go/parser.s 79046 78880 -166 -0.210% cmd/internal/objabi.s 16367 16343 -24 -0.147% text/template.s 94899 94486 -413 -0.435% go/printer.s 77267 76992 -275 -0.356% cmd/internal/goobj.s 25988 25947 -41 -0.158% runtime/pprof/internal/profile.s 102066 101933 -133 -0.130% go/format.s 5419 5371 -48 -0.886% cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s 37181 37149 -32 -0.086% go/doc.s 74533 74132 -401 -0.538% html/template.s 88743 88389 -354 -0.399% cmd/asm/internal/lex.s 24881 24872 -9 -0.036% cmd/internal/buildid.s 18263 18256 -7 -0.038% cmd/vendor/golang.org/x/arch/x86/x86asm.s 80036 79980 -56 -0.070% go/build.s 68905 68737 -168 -0.244% cmd/cover.s 46070 45950 -120 -0.260% cmd/internal/obj.s 117001 116991 -10 -0.009% cmd/doc.s 62700 62419 -281 -0.448% cmd/internal/obj/arm.s 66745 66687 -58 -0.087% cmd/compile/internal/syntax.s 145406 145062 -344 -0.237% cmd/internal/obj/wasm.s 44049 44027 -22 -0.050% net.s 291835 291020 -815 -0.279% cmd/dist.s 209020 208807 -213 -0.102% cmd/cgo.s 241564 241102 -462 -0.191% vendor/golang.org/x/net/http/httpproxy.s 9407 9399 -8 -0.085% log/syslog.s 7921 7909 -12 -0.151% go/types.s 319325 317513 -1812 -0.567% vendor/golang.org/x/net/http/httpguts.s 3834 3825 -9 -0.235% mime/multipart.s 21414 21343 -71 -0.332% cmd/internal/obj/ppc64.s 119949 119938 -11 -0.009% cmd/compile/internal/logopt.s 10158 10118 -40 -0.394% vendor/golang.org/x/net/nettest.s 28012 27991 -21 -0.075% go/internal/srcimporter.s 6405 6380 -25 -0.390% go/internal/gcimporter.s 34525 34493 -32 -0.093% net/mail.s 23937 23720 -217 -0.907% go/internal/gccgoimporter.s 56095 56038 -57 -0.102% cmd/compile/internal/types.s 47247 47207 -40 -0.085% cmd/api.s 39582 39558 -24 -0.061% cmd/go/internal/base.s 12572 12551 -21 -0.167% cmd/vendor/golang.org/x/xerrors.s 17846 17814 -32 -0.179% cmd/vendor/golang.org/x/mod/sumdb/note.s 18142 18070 -72 -0.397% cmd/go/internal/search.s 19994 19876 -118 -0.590% cmd/go/internal/imports.s 16457 16428 -29 -0.176% cmd/vendor/golang.org/x/mod/module.s 17838 17759 -79 -0.443% cmd/go/internal/cache.s 30551 30514 -37 -0.121% cmd/vendor/golang.org/x/mod/sumdb/tlog.s 36356 36321 -35 -0.096% cmd/internal/test2json.s 9452 9408 -44 -0.466% cmd/go/internal/mvs.s 25136 25092 -44 -0.175% cmd/go/internal/txtar.s 3488 3461 -27 -0.774% cmd/vendor/golang.org/x/mod/zip.s 18811 18800 -11 -0.058% cmd/go/internal/version.s 11213 11171 -42 -0.375% cmd/link/internal/benchmark.s 4941 4949 +8 +0.162% cmd/internal/obj/s390x.s 126865 126849 -16 -0.013% cmd/gofmt.s 30684 30596 -88 -0.287% cmd/fix.s 87450 86906 -544 -0.622% cmd/internal/obj/x86.s 88578 88556 -22 -0.025% cmd/vendor/golang.org/x/mod/modfile.s 72450 72363 -87 -0.120% cmd/oldlink/internal/loader.s 16743 16741 -2 -0.012% cmd/pack.s 14863 14861 -2 -0.013% cmd/go/internal/load.s 106742 106568 -174 -0.163% cmd/oldlink/internal/objfile.s 21787 21780 -7 -0.032% cmd/oldlink/internal/loadmacho.s 29309 29317 +8 +0.027% cmd/oldlink/internal/loadelf.s 35013 35021 +8 +0.023% cmd/asm/internal/asm.s 68550 68538 -12 -0.018% cmd/link/internal/loader.s 94765 94564 -201 -0.212% cmd/link/internal/loadelf.s 35663 35667 +4 +0.011% cmd/link/internal/loadmacho.s 29501 29509 +8 +0.027% cmd/vendor/golang.org/x/tools/go/analysis.s 4983 4976 -7 -0.140% cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s 16771 16709 -62 -0.370% cmd/vendor/golang.org/x/tools/go/types/objectpath.s 18481 18456 -25 -0.135% cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2100 2085 -15 -0.714% cmd/vendor/github.com/google/pprof/profile.s 150141 149620 -521 -0.347% cmd/vendor/github.com/google/pprof/internal/measurement.s 10420 10404 -16 -0.154% cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s 36814 36755 -59 -0.160% cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s 6688 6673 -15 -0.224% cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s 9856 9784 -72 -0.731% cmd/vendor/golang.org/x/tools/go/analysis/passes/composite.s 3011 2979 -32 -1.063% cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s 9737 9682 -55 -0.565% cmd/vendor/golang.org/x/tools/go/cfg.s 30738 30725 -13 -0.042% cmd/vendor/github.com/ianlancetaylor/demangle.s 175195 174513 -682 -0.389% cmd/vendor/golang.org/x/tools/go/analysis/passes/httpresponse.s 3625 3520 -105 -2.897% cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s 2987 2971 -16 -0.536% cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s 4372 4340 -32 -0.732% cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s 8634 8611 -23 -0.266% cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s 6189 6164 -25 -0.404% cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s 8089 8073 -16 -0.198% cmd/vendor/golang.org/x/tools/go/analysis/passes/unsafeptr.s 2208 2177 -31 -1.404% cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8050 8047 -3 -0.037% cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s 3665 3629 -36 -0.982% cmd/vendor/golang.org/x/tools/go/ast/astutil.s 65773 65680 -93 -0.141% cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s 13328 13286 -42 -0.315% cmd/vendor/golang.org/x/tools/go/types/typeutil.s 12263 12162 -101 -0.824% cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s 1459 1421 -38 -2.605% cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s 5208 5191 -17 -0.326% cmd/vendor/golang.org/x/tools/go/analysis/passes/unmarshal.s 1801 1782 -19 -1.055% cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s 9569 9528 -41 -0.428% cmd/go/internal/work.s 304928 304756 -172 -0.056% crypto/x509.s 147340 147139 -201 -0.136% cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s 34287 34019 -268 -0.782% crypto/tls.s 311603 310644 -959 -0.308% cmd/oldlink/internal/ld.s 533115 532651 -464 -0.087% cmd/oldlink/internal/wasm.s 16484 16458 -26 -0.158% cmd/oldlink/internal/x86.s 18832 18830 -2 -0.011% cmd/link/internal/ld.s 548200 547626 -574 -0.105% cmd/link/internal/wasm.s 16760 16734 -26 -0.155% cmd/link/internal/arm64.s 20850 20840 -10 -0.048% cmd/link/internal/x86.s 17437 17435 -2 -0.011% net/http.s 556647 555519 -1128 -0.203% net/http/cookiejar.s 15849 15833 -16 -0.101% expvar.s 9521 9508 -13 -0.137% net/http/httptest.s 16471 16452 -19 -0.115% cmd/vendor/github.com/google/pprof/internal/plugin.s 4266 4264 -2 -0.047% net/http/cgi.s 23448 23428 -20 -0.085% cmd/go/internal/web.s 16472 16428 -44 -0.267% net/http/httputil.s 39672 39670 -2 -0.005% net/rpc.s 33989 33965 -24 -0.071% net/http/fcgi.s 19167 19162 -5 -0.026% cmd/vendor/github.com/google/pprof/internal/symbolz.s 5861 5857 -4 -0.068% cmd/vendor/github.com/google/pprof/internal/binutils.s 35842 35823 -19 -0.053% cmd/vendor/github.com/google/pprof/internal/symbolizer.s 11449 11404 -45 -0.393% cmd/go/internal/get.s 62726 62582 -144 -0.230% cmd/vendor/github.com/google/pprof/internal/report.s 80032 80022 -10 -0.012% cmd/go/internal/modfetch/codehost.s 89005 88871 -134 -0.151% cmd/trace.s 116607 116496 -111 -0.095% cmd/vendor/github.com/google/pprof/internal/driver.s 143234 143207 -27 -0.019% cmd/vendor/github.com/google/pprof/driver.s 9000 8998 -2 -0.022% cmd/go/internal/modfetch.s 126300 125726 -574 -0.454% cmd/pprof.s 12317 12312 -5 -0.041% cmd/go/internal/modconv.s 17878 17861 -17 -0.095% cmd/go/internal/modload.s 150261 149763 -498 -0.331% cmd/go/internal/clean.s 11122 11091 -31 -0.279% cmd/go/internal/help.s 6523 6521 -2 -0.031% cmd/go/internal/generate.s 11627 11614 -13 -0.112% cmd/go/internal/envcmd.s 22034 21986 -48 -0.218% cmd/go/internal/modget.s 38478 38398 -80 -0.208% cmd/go/internal/modcmd.s 46430 46229 -201 -0.433% cmd/go/internal/test.s 64399 64374 -25 -0.039% cmd/compile/internal/ssa.s 3615264 3608276 -6988 -0.193% cmd/compile/internal/gc.s 1538865 1537625 -1240 -0.081% cmd/compile/internal/amd64.s 33593 33574 -19 -0.057% cmd/compile/internal/x86.s 30871 30852 -19 -0.062% total 19343565 19311284 -32281 -0.167% Change-Id: Ib030eb79458827a5a5b6d0d2f98765f8325a4d7e Reviewed-on: https://go-review.googlesource.com/c/go/+/222923 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-08 22:13:38 +00:00
Ruixin(Peter) Bao	b2790a2838	cmd/compile: allow floating point Ops to produce flags on s390x On s390x, some floating point arithmetic instructions (FSUB, FADD) generate flag. This patch allows those related SSA ops to return a tuple, where the second argument of the tuple is the generated flag. We can use the flag and remove the subsequent comparison instruction (e.g: LTDBR). This CL also reduces the .text section for math.test binary by 0.4KB. Benchmarks: name old time/op new time/op delta Acos-18 12.1ns ± 0% 12.1ns ± 0% ~ (all equal) Acosh-18 18.5ns ± 0% 18.5ns ± 0% ~ (all equal) Asin-18 13.1ns ± 0% 13.1ns ± 0% ~ (all equal) Asinh-18 19.4ns ± 0% 19.5ns ± 1% ~ (p=0.444 n=5+5) Atan-18 10.0ns ± 0% 10.0ns ± 0% ~ (all equal) Atanh-18 19.1ns ± 1% 19.2ns ± 2% ~ (p=0.841 n=5+5) Atan2-18 16.4ns ± 0% 16.4ns ± 0% ~ (all equal) Cbrt-18 14.8ns ± 0% 14.8ns ± 0% ~ (all equal) Ceil-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal) Copysign-18 0.80ns ± 0% 0.80ns ± 0% ~ (all equal) Cos-18 7.19ns ± 0% 7.19ns ± 0% ~ (p=0.556 n=4+5) Cosh-18 12.4ns ± 0% 12.4ns ± 0% ~ (all equal) Erf-18 10.8ns ± 0% 10.8ns ± 0% ~ (all equal) Erfc-18 11.0ns ± 0% 11.0ns ± 0% ~ (all equal) Erfinv-18 23.0ns ±16% 26.8ns ± 1% +16.90% (p=0.008 n=5+5) Erfcinv-18 23.3ns ±15% 26.1ns ± 7% ~ (p=0.087 n=5+5) Exp-18 8.67ns ± 0% 8.67ns ± 0% ~ (p=1.000 n=4+4) ExpGo-18 50.8ns ± 3% 52.4ns ± 2% ~ (p=0.063 n=5+5) Expm1-18 9.49ns ± 1% 9.47ns ± 0% ~ (p=1.000 n=5+5) Exp2-18 52.7ns ± 1% 50.5ns ± 3% -4.10% (p=0.024 n=5+5) Exp2Go-18 50.6ns ± 1% 48.4ns ± 3% -4.39% (p=0.008 n=5+5) Abs-18 0.67ns ± 0% 0.67ns ± 0% ~ (p=0.444 n=5+5) Dim-18 1.02ns ± 0% 1.03ns ± 0% +0.98% (p=0.008 n=5+5) Floor-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal) Max-18 3.09ns ± 1% 3.05ns ± 0% -1.42% (p=0.008 n=5+5) Min-18 3.32ns ± 1% 3.30ns ± 0% -0.72% (p=0.016 n=5+4) Mod-18 62.3ns ± 1% 65.8ns ± 3% +5.55% (p=0.008 n=5+5) Frexp-18 5.05ns ± 2% 4.98ns ± 0% ~ (p=0.683 n=5+5) Gamma-18 24.4ns ± 0% 24.1ns ± 0% -1.23% (p=0.008 n=5+5) Hypot-18 10.3ns ± 0% 10.3ns ± 0% ~ (all equal) HypotGo-18 10.2ns ± 0% 10.2ns ± 0% ~ (all equal) Ilogb-18 3.56ns ± 1% 3.54ns ± 0% ~ (p=0.595 n=5+5) J0-18 113ns ± 0% 108ns ± 1% -4.42% (p=0.016 n=4+5) J1-18 115ns ± 0% 109ns ± 1% -4.87% (p=0.016 n=4+5) Jn-18 240ns ± 0% 230ns ± 2% -4.41% (p=0.008 n=5+5) Ldexp-18 6.19ns ± 0% 6.19ns ± 0% ~ (p=0.444 n=5+5) Lgamma-18 32.2ns ± 0% 32.2ns ± 0% ~ (all equal) Log-18 13.1ns ± 0% 13.1ns ± 0% ~ (all equal) Logb-18 4.23ns ± 0% 4.22ns ± 0% ~ (p=0.444 n=5+5) Log1p-18 12.7ns ± 0% 12.7ns ± 0% ~ (all equal) Log10-18 18.1ns ± 0% 18.2ns ± 0% ~ (p=0.167 n=5+5) Log2-18 14.0ns ± 0% 14.0ns ± 0% ~ (all equal) Modf-18 10.4ns ± 0% 10.5ns ± 0% +0.96% (p=0.016 n=4+5) Nextafter32-18 11.3ns ± 0% 11.3ns ± 0% ~ (all equal) Nextafter64-18 4.01ns ± 1% 3.97ns ± 0% ~ (p=0.333 n=5+4) PowInt-18 32.7ns ± 0% 32.7ns ± 0% ~ (all equal) PowFrac-18 33.2ns ± 0% 33.1ns ± 0% ~ (p=0.095 n=4+5) Pow10Pos-18 1.58ns ± 0% 1.58ns ± 0% ~ (all equal) Pow10Neg-18 5.81ns ± 0% 5.81ns ± 0% ~ (all equal) Round-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal) RoundToEven-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal) Remainder-18 40.6ns ± 0% 40.7ns ± 0% ~ (p=0.238 n=5+4) Signbit-18 1.57ns ± 0% 1.57ns ± 0% ~ (all equal) Sin-18 6.75ns ± 0% 6.74ns ± 0% ~ (p=0.333 n=5+4) Sincos-18 29.5ns ± 0% 29.5ns ± 0% ~ (all equal) Sinh-18 14.4ns ± 0% 14.4ns ± 0% ~ (all equal) SqrtIndirect-18 3.97ns ± 0% 4.15ns ± 0% +4.59% (p=0.008 n=5+5) SqrtLatency-18 8.01ns ± 0% 8.01ns ± 0% ~ (all equal) SqrtIndirectLatency-18 11.6ns ± 0% 11.6ns ± 0% ~ (all equal) SqrtGoLatency-18 44.7ns ± 0% 45.0ns ± 0% +0.67% (p=0.008 n=5+5) SqrtPrime-18 1.26µs ± 0% 1.27µs ± 0% +0.63% (p=0.029 n=4+4) Tan-18 11.1ns ± 0% 11.1ns ± 0% ~ (all equal) Tanh-18 15.8ns ± 0% 15.8ns ± 0% ~ (all equal) Trunc-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal) Y0-18 113ns ± 2% 108ns ± 3% -5.11% (p=0.008 n=5+5) Y1-18 112ns ± 3% 107ns ± 0% -4.29% (p=0.000 n=5+4) Yn-18 229ns ± 0% 220ns ± 1% -3.76% (p=0.016 n=4+5) Float64bits-18 1.09ns ± 0% 1.09ns ± 0% ~ (all equal) Float64frombits-18 0.55ns ± 0% 0.55ns ± 0% ~ (all equal) Float32bits-18 0.96ns ±16% 0.86ns ± 0% ~ (p=0.563 n=5+5) Float32frombits-18 1.03ns ±28% 0.84ns ± 0% ~ (p=0.167 n=5+5) FMA-18 1.60ns ± 0% 1.60ns ± 0% ~ (all equal) [Geo mean] 10.0ns 9.9ns -0.41% Change-Id: Ief7e63ea5a8ba404b0a4696e12b9b7e0b05a9a03 Reviewed-on: https://go-review.googlesource.com/c/go/+/209160 Reviewed-by: Michael Munday <mike.munday@ibm.com> Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2020-04-08 20:57:58 +00:00
Michael Munday	bfd569fcb0	cmd/compile: delete the floating point Greater and Geq ops Extend CL 220417 (which removed the integer Greater and Geq ops) to floating point comparisons. Greater and Geq can always be implemented using Less and Leq. Fixes #37316. Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7 Reviewed-on: https://go-review.googlesource.com/c/go/+/222397 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-07 19:55:05 +00:00
Lynn Boger	815509ae31	cmd/compile: improve lowered moves and zeros for ppc64le This change includes the following: - Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9. These instructions do not require an index register, which allows more loads and stores within a loop without initializing multiple index registers. The LoweredQuadXXX generate LXV/STXV. - Create LoweredMoveXXXShort and LoweredZeroXXXShort for short moves that don't generate loops, and therefore don't clobber the address registers or flags. - Use registers other than R3 and R4 to avoid conflicting with registers that have already been allocated to avoid unnecessary register moves. - Eliminate the use of R14 as scratch register and use R31 instead. - Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a loop with more than 3 iterations. This performance opportunity was noticed in github.com/golang/snappy benchmarks. Results on power9: WordsDecode1e1 54.1ns ± 0% 53.8ns ± 0% -0.51% (p=0.029 n=4+4) WordsDecode1e2 287ns ± 0% 282ns ± 1% -1.83% (p=0.029 n=4+4) WordsDecode1e3 3.98µs ± 0% 3.64µs ± 0% -8.52% (p=0.029 n=4+4) WordsDecode1e4 66.9µs ± 0% 67.0µs ± 0% +0.20% (p=0.029 n=4+4) WordsDecode1e5 723µs ± 0% 723µs ± 0% -0.01% (p=0.200 n=4+4) WordsDecode1e6 7.21ms ± 0% 7.21ms ± 0% -0.02% (p=1.000 n=4+4) WordsEncode1e1 29.9ns ± 0% 29.4ns ± 0% -1.51% (p=0.029 n=4+4) WordsEncode1e2 2.12µs ± 0% 1.75µs ± 0% -17.70% (p=0.029 n=4+4) WordsEncode1e3 11.7µs ± 0% 11.2µs ± 0% -4.61% (p=0.029 n=4+4) WordsEncode1e4 119µs ± 0% 120µs ± 0% +0.36% (p=0.029 n=4+4) WordsEncode1e5 1.21ms ± 0% 1.22ms ± 0% +0.41% (p=0.029 n=4+4) WordsEncode1e6 12.0ms ± 0% 12.0ms ± 0% +0.57% (p=0.029 n=4+4) RandomEncode 286µs ± 0% 203µs ± 0% -28.82% (p=0.029 n=4+4) ExtendMatch 47.4µs ± 0% 47.0µs ± 0% -0.85% (p=0.029 n=4+4) Change-Id: Iecad3a39ae55280286e42760a5c9d5c1168f5858 Reviewed-on: https://go-review.googlesource.com/c/go/+/226539 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-04-06 12:09:39 +00:00
Josh Bleecher Snyder	fff7509d47	cmd/compile: add intrinsic HasCPUFeature for checking cpu features Before using some CPU instructions, we must check for their presence. We use global variables in the runtime package to record features. Prior to this CL, we issued a regular memory load for these features. The downside to this is that, because it is a regular memory load, it cannot be hoisted out of loops or otherwise reordered with other loads. This CL introduces a new intrinsic just for checking cpu features. It still ends up resulting in a memory load, but that memory load can now be floated to the entry block and rematerialized as needed. One downside is that the regular load could be combined with the comparison into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE. (It is possible that MOVBQZX+TESTQ+NE would be better.) This CL does only amd64. It is easy to extend to other architectures. For the benchmark in #36196, on my machine, this offers a mild speedup. name old time/op new time/op delta FMA-8 1.39ns ± 6% 1.29ns ± 9% -7.19% (p=0.000 n=97+96) NonFMA-8 2.03ns ±11% 2.04ns ±12% ~ (p=0.618 n=99+98) Updates #15808 Updates #36196 Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb Reviewed-on: https://go-review.googlesource.com/c/go/+/212360 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>	2020-04-04 01:01:04 +00:00
Keith Randall	bba88467f8	cmd/compile: add indexed-load CMP instructions Things like CMPQ 4(AX)(BX*8), CX Fixes #37955 Change-Id: Icbed430f65c91a0e3f38a633d8321d79433ad8b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/224219 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2020-04-01 17:03:26 +00:00
Josh Bleecher Snyder	8114242359	cmd/compile, runtime: use more registers for amd64 write barrier calls The compiler-inserted write barrier calls use a special ABI for speed and to minimize the binary size impact. runtime.gcWriteBarrier takes its args in DI and AX. This change adds gcWriteBarrier wrapper functions, varying only in the register used for the second argument. (Allowing variation in the first argument doesn't offer improvements, which is convenient, as it avoids quadratic API growth.) This reduces the number of register copies. The goals are reduced binary size via reduced register pressure/copies. One downside to this change is that when the write barrier is on, we may bounce through several different write barrier wrappers, which is bad for the instruction cache. Package runtime write barrier benchmarks for this change: name old time/op new time/op delta WriteBarrier-8 16.6ns ± 6% 15.6ns ± 6% -5.73% (p=0.000 n=97+99) BulkWriteBarrier-8 4.37ns ± 7% 4.22ns ± 8% -3.45% (p=0.000 n=96+99) However, I don't particularly trust these numbers. I ran runtime.BenchmarkWriteBarrier multiple times as I rebased this change, and noticed that the results have high variance depending on the parent change, perhaps due to aligment. This change was stress tested with GOGC=1 GODEBUG=gccheckmark=1 go test std. This change reduces binary sizes: file before after Δ % addr2line 4308720 4296688 -12032 -0.279% api 5965592 5945368 -20224 -0.339% asm 5148088 5025464 -122624 -2.382% buildid 2848760 2844904 -3856 -0.135% cgo 4828968 4812840 -16128 -0.334% compile 19754720 19529744 -224976 -1.139% cover 5256840 5236600 -20240 -0.385% dist 3670312 3658264 -12048 -0.328% doc 4669608 4657576 -12032 -0.258% fix 3377976 3365944 -12032 -0.356% link 6614888 6586472 -28416 -0.430% nm 4258368 4254528 -3840 -0.090% objdump 4656336 4644304 -12032 -0.258% pack 2295176 2295432 +256 +0.011% pprof 14762356 14709364 -52992 -0.359% test2json 2824456 2820600 -3856 -0.137% trace 11684404 11643700 -40704 -0.348% vet 8284760 8252248 -32512 -0.392% total 115210328 114580040 -630288 -0.547% This change improves compiler performance: name old time/op new time/op delta Template 208ms ± 3% 207ms ± 3% -0.40% (p=0.030 n=43+44) Unicode 80.2ms ± 3% 81.3ms ± 3% +1.25% (p=0.000 n=41+44) GoTypes 699ms ± 3% 694ms ± 2% -0.71% (p=0.016 n=42+37) Compiler 3.26s ± 2% 3.23s ± 2% -0.86% (p=0.000 n=43+45) SSA 6.97s ± 1% 6.93s ± 1% -0.63% (p=0.000 n=43+45) Flate 134ms ± 3% 133ms ± 2% ~ (p=0.139 n=45+42) GoParser 165ms ± 2% 164ms ± 1% -0.79% (p=0.000 n=45+40) Reflect 434ms ± 4% 435ms ± 4% ~ (p=0.937 n=44+44) Tar 181ms ± 2% 181ms ± 2% ~ (p=0.702 n=43+45) XML 244ms ± 2% 244ms ± 2% ~ (p=0.237 n=45+44) [Geo mean] 403ms 402ms -0.29% name old user-time/op new user-time/op delta Template 271ms ± 2% 268ms ± 1% -1.40% (p=0.000 n=42+42) Unicode 117ms ± 3% 116ms ± 5% ~ (p=0.066 n=45+45) GoTypes 948ms ± 2% 936ms ± 2% -1.30% (p=0.000 n=41+40) Compiler 4.26s ± 1% 4.21s ± 2% -1.25% (p=0.000 n=37+45) SSA 9.52s ± 2% 9.41s ± 1% -1.18% (p=0.000 n=44+45) Flate 167ms ± 2% 165ms ± 2% -1.15% (p=0.000 n=44+41) GoParser 201ms ± 2% 198ms ± 1% -1.40% (p=0.000 n=43+43) Reflect 563ms ± 8% 560ms ± 7% ~ (p=0.206 n=45+44) Tar 224ms ± 2% 222ms ± 2% -0.81% (p=0.000 n=45+45) XML 308ms ± 2% 304ms ± 1% -1.17% (p=0.000 n=42+43) [Geo mean] 525ms 519ms -1.08% name old alloc/op new alloc/op delta Template 36.3MB ± 0% 36.3MB ± 0% ~ (p=0.421 n=5+5) Unicode 28.4MB ± 0% 28.3MB ± 0% ~ (p=0.056 n=5+5) GoTypes 121MB ± 0% 121MB ± 0% -0.14% (p=0.008 n=5+5) Compiler 567MB ± 0% 567MB ± 0% -0.06% (p=0.016 n=4+5) SSA 1.26GB ± 0% 1.26GB ± 0% -0.07% (p=0.008 n=5+5) Flate 22.9MB ± 0% 22.8MB ± 0% ~ (p=0.310 n=5+5) GoParser 28.0MB ± 0% 27.9MB ± 0% -0.09% (p=0.008 n=5+5) Reflect 78.4MB ± 0% 78.4MB ± 0% -0.03% (p=0.008 n=5+5) Tar 34.2MB ± 0% 34.2MB ± 0% -0.05% (p=0.008 n=5+5) XML 44.4MB ± 0% 44.4MB ± 0% -0.04% (p=0.016 n=5+5) [Geo mean] 76.4MB 76.3MB -0.05% name old allocs/op new allocs/op delta Template 356k ± 0% 356k ± 0% -0.13% (p=0.008 n=5+5) Unicode 326k ± 0% 326k ± 0% -0.07% (p=0.008 n=5+5) GoTypes 1.24M ± 0% 1.24M ± 0% -0.24% (p=0.008 n=5+5) Compiler 5.30M ± 0% 5.28M ± 0% -0.34% (p=0.008 n=5+5) SSA 11.9M ± 0% 11.9M ± 0% -0.16% (p=0.008 n=5+5) Flate 226k ± 0% 225k ± 0% -0.12% (p=0.008 n=5+5) GoParser 287k ± 0% 286k ± 0% -0.29% (p=0.008 n=5+5) Reflect 930k ± 0% 929k ± 0% -0.05% (p=0.008 n=5+5) Tar 332k ± 0% 331k ± 0% -0.12% (p=0.008 n=5+5) XML 411k ± 0% 411k ± 0% -0.12% (p=0.008 n=5+5) [Geo mean] 771k 770k -0.16% For some packages, this change significantly reduces the size of executable text. Examples: file before after Δ % cmd/internal/obj/arm.s 68658 66855 -1803 -2.626% cmd/internal/obj/mips.s 57486 56272 -1214 -2.112% cmd/internal/obj/arm64.s 152107 147163 -4944 -3.250% cmd/internal/obj/ppc64.s 125544 120456 -5088 -4.053% cmd/vendor/golang.org/x/tools/go/cfg.s 31699 30742 -957 -3.019% Full listing: file before after Δ % container/ring.s 1890 1870 -20 -1.058% container/list.s 5366 5390 +24 +0.447% internal/cpu.s 3298 3295 -3 -0.091% internal/testlog.s 1507 1501 -6 -0.398% image/color.s 8281 8248 -33 -0.399% runtime.s 480970 480075 -895 -0.186% sync.s 16497 16408 -89 -0.539% internal/singleflight.s 2591 2577 -14 -0.540% math/rand.s 10456 10438 -18 -0.172% cmd/go/internal/par.s 2801 2790 -11 -0.393% internal/reflectlite.s 28477 28417 -60 -0.211% errors.s 2750 2736 -14 -0.509% internal/oserror.s 446 434 -12 -2.691% sort.s 17061 17046 -15 -0.088% io.s 17063 16999 -64 -0.375% vendor/golang.org/x/crypto/hkdf.s 1962 1936 -26 -1.325% text/tabwriter.s 9617 9574 -43 -0.447% hash/crc64.s 3414 3408 -6 -0.176% hash/crc32.s 6657 6651 -6 -0.090% bytes.s 31932 31863 -69 -0.216% strconv.s 53158 52799 -359 -0.675% strings.s 42829 42665 -164 -0.383% encoding/ascii85.s 4833 4791 -42 -0.869% vendor/golang.org/x/text/transform.s 16810 16724 -86 -0.512% path.s 6848 6845 -3 -0.044% encoding/base32.s 9658 9592 -66 -0.683% bufio.s 23051 22908 -143 -0.620% compress/bzip2.s 11773 11764 -9 -0.076% image.s 37565 37502 -63 -0.168% syscall.s 82359 82279 -80 -0.097% regexp/syntax.s 83573 82930 -643 -0.769% image/jpeg.s 36535 36490 -45 -0.123% regexp.s 64396 64214 -182 -0.283% time.s 82724 82622 -102 -0.123% plugin.s 6539 6536 -3 -0.046% context.s 10959 10865 -94 -0.858% internal/poll.s 24286 24270 -16 -0.066% reflect.s 168304 167927 -377 -0.224% internal/fmtsort.s 7416 7376 -40 -0.539% os.s 52465 51787 -678 -1.292% cmd/go/internal/lockedfile/internal/filelock.s 2326 2317 -9 -0.387% os/signal.s 4657 4648 -9 -0.193% runtime/debug.s 6040 5998 -42 -0.695% encoding/binary.s 30838 30801 -37 -0.120% vendor/golang.org/x/net/route.s 23694 23491 -203 -0.857% path/filepath.s 17895 17889 -6 -0.034% cmd/vendor/golang.org/x/sys/unix.s 78125 78109 -16 -0.020% io/ioutil.s 6999 6996 -3 -0.043% encoding/base64.s 12094 12007 -87 -0.719% crypto/cipher.s 20466 20372 -94 -0.459% cmd/go/internal/robustio.s 2672 2669 -3 -0.112% encoding/pem.s 9302 9286 -16 -0.172% internal/obscuretestdata.s 1719 1695 -24 -1.396% crypto/aes.s 11014 11002 -12 -0.109% os/exec.s 29388 29231 -157 -0.534% cmd/internal/browser.s 2266 2260 -6 -0.265% internal/goroot.s 4601 4592 -9 -0.196% vendor/golang.org/x/crypto/chacha20poly1305.s 8945 8942 -3 -0.034% cmd/vendor/golang.org/x/crypto/ssh/terminal.s 27226 27195 -31 -0.114% index/suffixarray.s 36431 36411 -20 -0.055% fmt.s 77017 76709 -308 -0.400% encoding/hex.s 6241 6154 -87 -1.394% compress/lzw.s 7133 7069 -64 -0.897% database/sql/driver.s 18888 18877 -11 -0.058% net/url.s 29838 29739 -99 -0.332% debug/plan9obj.s 8329 8279 -50 -0.600% encoding/csv.s 12986 12902 -84 -0.647% debug/gosym.s 25403 25330 -73 -0.287% compress/flate.s 51192 50970 -222 -0.434% vendor/golang.org/x/net/dns/dnsmessage.s 86769 86208 -561 -0.647% compress/gzip.s 9791 9758 -33 -0.337% compress/zlib.s 7310 7277 -33 -0.451% archive/zip.s 42356 42166 -190 -0.449% debug/dwarf.s 108259 107730 -529 -0.489% encoding/json.s 106378 105910 -468 -0.440% os/user.s 14751 14724 -27 -0.183% database/sql.s 99011 98404 -607 -0.613% log.s 9466 9423 -43 -0.454% debug/pe.s 31272 31182 -90 -0.288% debug/macho.s 32764 32608 -156 -0.476% encoding/gob.s 136976 136517 -459 -0.335% vendor/golang.org/x/text/unicode/bidi.s 27318 27276 -42 -0.154% archive/tar.s 71416 70975 -441 -0.618% vendor/golang.org/x/net/http2/hpack.s 23892 23848 -44 -0.184% vendor/golang.org/x/text/secure/bidirule.s 3354 3351 -3 -0.089% mime/quotedprintable.s 5960 5925 -35 -0.587% net/http/internal.s 5874 5853 -21 -0.358% math/big.s 184147 183692 -455 -0.247% debug/elf.s 63775 63567 -208 -0.326% mime.s 39802 39709 -93 -0.234% encoding/xml.s 111038 110713 -325 -0.293% crypto/dsa.s 6044 6029 -15 -0.248% go/token.s 12139 12077 -62 -0.511% crypto/rand.s 6889 6866 -23 -0.334% go/scanner.s 19030 19008 -22 -0.116% flag.s 22320 22236 -84 -0.376% vendor/golang.org/x/text/unicode/norm.s 66652 66391 -261 -0.392% crypto/rsa.s 31671 31650 -21 -0.066% crypto/elliptic.s 51553 51403 -150 -0.291% internal/xcoff.s 22950 22822 -128 -0.558% go/constant.s 43750 43689 -61 -0.139% encoding/asn1.s 57086 57035 -51 -0.089% runtime/trace.s 2609 2603 -6 -0.230% crypto/x509/pkix.s 10458 10471 +13 +0.124% image/gif.s 27544 27385 -159 -0.577% vendor/golang.org/x/net/idna.s 24558 24502 -56 -0.228% image/png.s 42775 42685 -90 -0.210% vendor/golang.org/x/crypto/cryptobyte.s 33616 33493 -123 -0.366% go/ast.s 80684 80449 -235 -0.291% net/internal/socktest.s 16571 16535 -36 -0.217% crypto/ecdsa.s 11948 11936 -12 -0.100% text/template/parse.s 95138 94002 -1136 -1.194% runtime/pprof.s 59702 59639 -63 -0.106% testing.s 68427 68088 -339 -0.495% internal/testenv.s 5620 5596 -24 -0.427% testing/internal/testdeps.s 3312 3294 -18 -0.543% internal/trace.s 78473 78239 -234 -0.298% testing/iotest.s 4968 4908 -60 -1.208% os/signal/internal/pty.s 3011 2990 -21 -0.697% testing/quick.s 12179 12125 -54 -0.443% cmd/internal/bio.s 9286 9274 -12 -0.129% cmd/internal/src.s 17684 17663 -21 -0.119% cmd/internal/goobj2.s 12588 12558 -30 -0.238% cmd/internal/objabi.s 16408 16390 -18 -0.110% go/printer.s 77417 77308 -109 -0.141% go/parser.s 80045 79113 -932 -1.164% go/format.s 5434 5419 -15 -0.276% cmd/internal/goobj.s 26146 25954 -192 -0.734% runtime/pprof/internal/profile.s 102518 102178 -340 -0.332% text/template.s 95343 94935 -408 -0.428% cmd/internal/dwarf.s 31718 31572 -146 -0.460% cmd/vendor/golang.org/x/arch/arm/armasm.s 45240 45151 -89 -0.197% internal/lazytemplate.s 1470 1457 -13 -0.884% cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s 37253 37220 -33 -0.089% cmd/asm/internal/flags.s 2593 2590 -3 -0.116% cmd/asm/internal/lex.s 25068 24921 -147 -0.586% cmd/internal/buildid.s 18536 18263 -273 -1.473% cmd/vendor/golang.org/x/arch/x86/x86asm.s 80209 80105 -104 -0.130% go/doc.s 75140 74585 -555 -0.739% cmd/internal/edit.s 3893 3899 +6 +0.154% html/template.s 89377 88809 -568 -0.636% cmd/vendor/golang.org/x/arch/arm64/arm64asm.s 117998 117824 -174 -0.147% cmd/internal/obj.s 115015 114290 -725 -0.630% go/build.s 69379 68862 -517 -0.745% cmd/internal/objfile.s 48106 47982 -124 -0.258% cmd/cover.s 46239 46113 -126 -0.272% cmd/addr2line.s 2845 2833 -12 -0.422% cmd/internal/obj/arm.s 68658 66855 -1803 -2.626% cmd/internal/obj/mips.s 57486 56272 -1214 -2.112% cmd/internal/obj/riscv.s 63834 63006 -828 -1.297% cmd/compile/internal/syntax.s 146582 145456 -1126 -0.768% cmd/internal/obj/wasm.s 44117 44066 -51 -0.116% cmd/cgo.s 242645 241653 -992 -0.409% cmd/internal/obj/arm64.s 152107 147163 -4944 -3.250% net.s 295972 292010 -3962 -1.339% go/types.s 321371 319432 -1939 -0.603% vendor/golang.org/x/net/http/httpproxy.s 9450 9423 -27 -0.286% net/textproto.s 19455 19406 -49 -0.252% cmd/internal/obj/ppc64.s 125544 120456 -5088 -4.053% go/internal/srcimporter.s 6475 6409 -66 -1.019% log/syslog.s 8017 7929 -88 -1.098% cmd/compile/internal/logopt.s 10183 10162 -21 -0.206% net/mail.s 24085 23948 -137 -0.569% mime/multipart.s 21527 21420 -107 -0.497% cmd/internal/obj/s390x.s 127610 127757 +147 +0.115% go/internal/gcimporter.s 34913 34548 -365 -1.045% vendor/golang.org/x/net/nettest.s 28103 28016 -87 -0.310% cmd/go/internal/cfg.s 9967 9916 -51 -0.512% cmd/api.s 39703 39603 -100 -0.252% go/internal/gccgoimporter.s 56470 56120 -350 -0.620% go/importer.s 2077 2056 -21 -1.011% cmd/compile/internal/types.s 48202 47282 -920 -1.909% cmd/go/internal/str.s 4341 4320 -21 -0.484% cmd/internal/obj/x86.s 89440 88625 -815 -0.911% cmd/go/internal/base.s 12667 12580 -87 -0.687% cmd/go/internal/cache.s 30754 30571 -183 -0.595% cmd/doc.s 62976 62755 -221 -0.351% cmd/go/internal/search.s 20114 19993 -121 -0.602% cmd/vendor/golang.org/x/xerrors.s 17923 17855 -68 -0.379% cmd/go/internal/lockedfile.s 16451 16415 -36 -0.219% cmd/vendor/golang.org/x/mod/sumdb/note.s 18200 18150 -50 -0.275% cmd/vendor/golang.org/x/mod/module.s 17869 17851 -18 -0.101% cmd/asm/internal/arch.s 37533 37482 -51 -0.136% cmd/fix.s 87728 87492 -236 -0.269% cmd/vendor/golang.org/x/mod/sumdb/tlog.s 36394 36367 -27 -0.074% cmd/vendor/golang.org/x/mod/sumdb/dirhash.s 4990 4963 -27 -0.541% cmd/go/internal/imports.s 16499 16469 -30 -0.182% cmd/vendor/golang.org/x/mod/zip.s 18816 18745 -71 -0.377% cmd/go/internal/cmdflag.s 5126 5123 -3 -0.059% cmd/internal/test2json.s 9540 9452 -88 -0.922% cmd/go/internal/tool.s 3629 3623 -6 -0.165% cmd/go/internal/version.s 11232 11220 -12 -0.107% cmd/go/internal/mvs.s 25383 25179 -204 -0.804% cmd/nm.s 5815 5803 -12 -0.206% cmd/dist.s 210146 209140 -1006 -0.479% cmd/asm/internal/asm.s 68655 68549 -106 -0.154% cmd/vendor/golang.org/x/mod/modfile.s 72974 72510 -464 -0.636% cmd/go/internal/load.s 107548 106861 -687 -0.639% cmd/link/internal/sym.s 18708 18581 -127 -0.679% cmd/asm.s 3367 3343 -24 -0.713% cmd/gofmt.s 30795 30698 -97 -0.315% cmd/link/internal/objfile.s 21828 21630 -198 -0.907% cmd/pack.s 14878 14869 -9 -0.060% cmd/vendor/github.com/google/pprof/internal/elfexec.s 6788 6782 -6 -0.088% cmd/test2json.s 1647 1641 -6 -0.364% cmd/link/internal/loader.s 48677 48483 -194 -0.399% cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s 16783 16773 -10 -0.060% cmd/link/internal/loadelf.s 35464 35126 -338 -0.953% cmd/link/internal/loadmacho.s 29438 29180 -258 -0.876% cmd/link/internal/loadpe.s 16440 16371 -69 -0.420% cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2106 2100 -6 -0.285% cmd/link/internal/loadxcoff.s 11711 11615 -96 -0.820% cmd/vendor/golang.org/x/tools/go/analysis/internal/facts.s 14954 14883 -71 -0.475% cmd/vendor/golang.org/x/tools/go/ast/inspector.s 5394 5374 -20 -0.371% cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s 37029 36822 -207 -0.559% cmd/vendor/golang.org/x/tools/go/analysis/passes/inspect.s 340 337 -3 -0.882% cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s 9919 9858 -61 -0.615% cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s 6705 6690 -15 -0.224% cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s 9783 9741 -42 -0.429% cmd/vendor/golang.org/x/tools/go/cfg.s 31699 30742 -957 -3.019% cmd/vendor/golang.org/x/tools/go/analysis/passes/ifaceassert.s 2768 2762 -6 -0.217% cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s 3031 2998 -33 -1.089% cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s 4382 4376 -6 -0.137% cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s 8654 8642 -12 -0.139% cmd/vendor/golang.org/x/tools/go/analysis/passes/stringintconv.s 3458 3446 -12 -0.347% cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s 8011 7995 -16 -0.200% cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s 6205 6193 -12 -0.193% cmd/vendor/golang.org/x/tools/go/ast/astutil.s 66183 65861 -322 -0.487% cmd/vendor/github.com/google/pprof/profile.s 150844 150261 -583 -0.386% cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8057 8054 -3 -0.037% cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s 3670 3667 -3 -0.082% cmd/vendor/github.com/google/pprof/internal/measurement.s 10464 10440 -24 -0.229% cmd/vendor/golang.org/x/tools/go/types/typeutil.s 12319 12274 -45 -0.365% cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s 13503 13342 -161 -1.192% cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s 5261 5218 -43 -0.817% cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s 1462 1459 -3 -0.205% cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s 9594 9582 -12 -0.125% cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s 34397 34338 -59 -0.172% cmd/vendor/github.com/google/pprof/internal/graph.s 53225 52936 -289 -0.543% cmd/vendor/github.com/ianlancetaylor/demangle.s 177450 175329 -2121 -1.195% crypto/x509.s 147892 147388 -504 -0.341% cmd/go/internal/work.s 306465 304950 -1515 -0.494% cmd/go/internal/run.s 4664 4657 -7 -0.150% crypto/tls.s 313130 311833 -1297 -0.414% net/http/httptrace.s 3979 3905 -74 -1.860% net/smtp.s 14413 14344 -69 -0.479% cmd/link/internal/ld.s 545343 542279 -3064 -0.562% cmd/link/internal/mips.s 6218 6215 -3 -0.048% cmd/link/internal/mips64.s 6108 6103 -5 -0.082% cmd/link/internal/amd64.s 18154 18112 -42 -0.231% cmd/link/internal/arm64.s 22527 22494 -33 -0.146% cmd/link/internal/arm.s 22574 22494 -80 -0.354% cmd/link/internal/s390x.s 20779 20746 -33 -0.159% cmd/link/internal/wasm.s 16531 16493 -38 -0.230% cmd/link/internal/x86.s 18906 18849 -57 -0.301% cmd/link/internal/ppc64.s 26856 26778 -78 -0.290% net/http.s 559101 556513 -2588 -0.463% net/http/cookiejar.s 15912 15885 -27 -0.170% expvar.s 9531 9525 -6 -0.063% net/http/httptest.s 16616 16475 -141 -0.849% net/http/cgi.s 23624 23458 -166 -0.703% cmd/go/internal/web.s 16546 16489 -57 -0.344% cmd/vendor/golang.org/x/mod/sumdb.s 33197 33117 -80 -0.241% net/http/fcgi.s 19266 19169 -97 -0.503% net/http/httputil.s 39875 39728 -147 -0.369% cmd/vendor/github.com/google/pprof/internal/symbolz.s 5888 5867 -21 -0.357% net/rpc.s 34154 34003 -151 -0.442% cmd/vendor/github.com/google/pprof/internal/transport.s 2746 2716 -30 -1.092% cmd/vendor/github.com/google/pprof/internal/binutils.s 35999 35875 -124 -0.344% net/rpc/jsonrpc.s 6637 6598 -39 -0.588% cmd/vendor/github.com/google/pprof/internal/symbolizer.s 11533 11458 -75 -0.650% cmd/go/internal/get.s 62921 62803 -118 -0.188% cmd/vendor/github.com/google/pprof/internal/report.s 80364 80058 -306 -0.381% cmd/go/internal/modfetch/codehost.s 89680 89066 -614 -0.685% cmd/trace.s 117171 116701 -470 -0.401% cmd/vendor/github.com/google/pprof/internal/driver.s 144268 143297 -971 -0.673% cmd/go/internal/modfetch.s 126299 125860 -439 -0.348% cmd/vendor/github.com/google/pprof/driver.s 9042 9000 -42 -0.464% cmd/go/internal/modconv.s 17947 17889 -58 -0.323% cmd/pprof.s 12399 12326 -73 -0.589% cmd/go/internal/modload.s 151182 150389 -793 -0.525% cmd/go/internal/generate.s 11738 11636 -102 -0.869% cmd/go/internal/help.s 6571 6531 -40 -0.609% cmd/go/internal/clean.s 11174 11142 -32 -0.286% cmd/go/internal/vet.s 7897 7867 -30 -0.380% cmd/go/internal/envcmd.s 22176 22095 -81 -0.365% cmd/go/internal/list.s 15216 15067 -149 -0.979% cmd/go/internal/modget.s 38698 38519 -179 -0.463% cmd/go/internal/modcmd.s 46674 46441 -233 -0.499% cmd/go/internal/test.s 64664 64456 -208 -0.322% cmd/go.s 6730 6703 -27 -0.401% cmd/compile/internal/ssa.s 3592565 3582500 -10065 -0.280% cmd/compile/internal/gc.s 1549123 1537123 -12000 -0.775% cmd/compile/internal/riscv64.s 14579 14483 -96 -0.658% cmd/compile/internal/mips.s 20578 20419 -159 -0.773% cmd/compile/internal/ppc64.s 25524 25359 -165 -0.646% cmd/compile/internal/mips64.s 19795 19636 -159 -0.803% cmd/compile/internal/wasm.s 13329 13290 -39 -0.293% cmd/compile/internal/s390x.s 28097 27892 -205 -0.730% cmd/compile/internal/arm.s 31489 31321 -168 -0.534% cmd/compile/internal/arm64.s 29803 29590 -213 -0.715% cmd/compile/internal/amd64.s 32961 33221 +260 +0.789% cmd/compile/internal/x86.s 31029 30878 -151 -0.487% total 18534966 18440341 -94625 -0.511% Change-Id: I830d37364f14f0297800adc42c99f60a74c51aca Reviewed-on: https://go-review.googlesource.com/c/go/+/226367 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-03-31 21:26:33 +00:00
Keith Randall	33b648c0e9	cmd/compile: fix ephemeral pointer problem on amd64 Make sure we don't use the rewrite ptr + (c + x) -> c + (ptr + x), as that may create an ephemeral out-of-bounds pointer. I have not seen an actual bug caused by this yet, but we've seen them in the 386 port so I'm fixing this issue for amd64 as well. The load-combining rules needed to be reworked somewhat to still work without the above broken rule. Update #37881 Change-Id: I8046d170e89e2035195f261535e34ca7d8aca68a Reviewed-on: https://go-review.googlesource.com/c/go/+/226437 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-03-30 17:25:29 +00:00
Keith Randall	af7eafd150	cmd/compile: convert 386 port to use addressing modes pass (take 2) Retrying CL 222782, with a fix that will hopefully stop the random crashing. The issue with the previous CL is that it does pointer arithmetic in a way that may briefly generate an out-of-bounds pointer. If an interrupt happens to occur in that state, the referenced object may be collected incorrectly. Suppose there was code that did s[x+c]. The previous CL had a rule to the effect of ptr + (x + c) -> c + (ptr + x). But ptr+x is not guaranteed to point to the same object as ptr. In contrast, ptr+(x+c) is guaranteed to point to the same object as ptr, because we would have already checked that x+c is in bounds. For example, strconv.trim used to have this code: MOVZX -0x1(BX)(DX1), BP CMPL $0x30, AL After CL 222782, it had this code: LEAL 0(BX)(DX1), BP CMPB $0x30, -0x1(BP) An interrupt between those last two instructions could see BP pointing outside the backing store of the slice involved. It's really hard to actually demonstrate a bug. First, you need to have an interrupt occur at exactly the right time. Then, there must be no other pointers to the object in question. Since the interrupted frame will be scanned conservatively, there can't even be a dead pointer in another register or on the stack. (In the example above, a bug can't happen because BX still holds the original pointer.) Then, the object in question needs to be collected (or at least scanned?) before the interrupted code continues. This CL needs to handle load combining somewhat differently than CL 222782 because of the new restriction on arithmetic. That's the only real difference (other than removing the bad rules) from that old CL. This bug is also present in the amd64 rewrite rules, and we haven't seen any crashing as a result. I will fix up that code similarly to this one in a separate CL. Update #37881 Change-Id: I5f0d584d9bef4696bfe89a61ef0a27c8d507329f Reviewed-on: https://go-review.googlesource.com/c/go/+/225798 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-03-27 18:54:45 +00:00
Lynn Boger	e4a1cf8a56	cmd/compile: add rules to eliminate unnecessary signed shifts This change to the rules removes some unnecessary signed shifts that appear in the math/rand functions. Existing rules did not cover some of the signed cases. A little improvement seen in math/rand due to removing 1 of 2 instructions generated for Int31n, which is inlined quite a bit. Intn1000 46.9ns ± 0% 45.5ns ± 0% -2.99% (p=1.000 n=1+1) Int63n1000 33.5ns ± 0% 32.8ns ± 0% -2.09% (p=1.000 n=1+1) Int31n1000 32.7ns ± 0% 32.6ns ± 0% -0.31% (p=1.000 n=1+1) Float32 32.7ns ± 0% 30.3ns ± 0% -7.34% (p=1.000 n=1+1) Float64 21.7ns ± 0% 20.9ns ± 0% -3.69% (p=1.000 n=1+1) Perm3 205ns ± 0% 202ns ± 0% -1.46% (p=1.000 n=1+1) Perm30 1.71µs ± 0% 1.68µs ± 0% -1.35% (p=1.000 n=1+1) Perm30ViaShuffle 1.65µs ± 0% 1.65µs ± 0% -0.30% (p=1.000 n=1+1) ShuffleOverhead 2.83µs ± 0% 2.83µs ± 0% -0.07% (p=1.000 n=1+1) Read3 18.7ns ± 0% 16.1ns ± 0% -13.90% (p=1.000 n=1+1) Read64 126ns ± 0% 124ns ± 0% -1.59% (p=1.000 n=1+1) Read1000 1.75µs ± 0% 1.63µs ± 0% -7.08% (p=1.000 n=1+1) Change-Id: I11502dfca7d65aafc76749a8d713e9e50c24a858 Reviewed-on: https://go-review.googlesource.com/c/go/+/225917 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-03-27 16:05:42 +00:00
Ruixin(Peter) Bao	16cfab8d89	cmd/compile: use load and test instructions on s390x The load and test instructions compare the given value against zero and will produce a condition code indicating one of the following scenarios: 0: Result is zero 1: Result is less than zero 2: Result is greater than zero 3: Result is not a number (NaN) The instruction can be used to simplify floating point comparisons against zero, which can enable further optimizations. This CL also reduces the size of .text section of math.test binary by around 0.7 KB (in hexadecimal, from 1358f0 to 135620). Change-Id: I33cb714f0c6feebac7a1c46dfcc735e7daceff9c Reviewed-on: https://go-review.googlesource.com/c/go/+/209159 Reviewed-by: Michael Munday <mike.munday@ibm.com> Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2020-03-25 13:10:07 +00:00
Keith Randall	c785633941	Revert "cmd/compile: convert 386 port to use addressing modes pass" This reverts commit CL 222782. Reason for revert: Reverting to see if 386 errors go away Update #37881 Change-Id: I74f287404c52414db1b6ff1649effa4ed9e5cc0c Reviewed-on: https://go-review.googlesource.com/c/go/+/225218 Reviewed-by: Bryan C. Mills <bcmills@google.com>	2020-03-24 19:07:15 +00:00
Keith Randall	e0deacd1c0	Revert "cmd/compile: disable mem+op operations on 386" This reverts commit CL 224837. Reason for revert: Reverting partial reverts of 222782. Update #37881 Change-Id: Ie9bf84d6e17ed214abe538965e5ff03936886826 Reviewed-on: https://go-review.googlesource.com/c/go/+/225217 Reviewed-by: Bryan C. Mills <bcmills@google.com>	2020-03-24 19:06:22 +00:00
Keith Randall	f975485ad1	Revert "cmd/compile: disable addressingmodes pass for 386" This reverts commit CL 225057. Reason for revert: Undoing partial reverts of CL 222782 Update #37881 Change-Id: Iee024cab2a580a37a0fc355e0e3c5ad3d8fdaf7d Reviewed-on: https://go-review.googlesource.com/c/go/+/225197 Reviewed-by: Bryan C. Mills <bcmills@google.com>	2020-03-24 19:05:50 +00:00
Keith Randall	5b897ec017	cmd/compile: disable addressingmodes pass for 386 Update #37881 Change-Id: I1f9a3f57f6215a19c31765c257ee78715eab36b7 Reviewed-on: https://go-review.googlesource.com/c/go/+/225057 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2020-03-23 20:31:13 +00:00
Keith Randall	3adbdb6d99	cmd/compile: disable mem+op operations on 386 Rolling back portions of CL 222782 to see if that helps issue #37881 any. Update #37881 Change-Id: I9cc3ff8c469fa5e4b22daec715d04148033f46f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/224837 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com>	2020-03-23 18:27:37 +00:00
Russ Cox	fc8a6336d1	cmd/asm, cmd/compile, runtime: add -spectre=ret mode This commit extends the -spectre flag to cmd/asm and adds a new Spectre mitigation mode "ret", which enables the use of retpolines. Retpolines prevent speculation about the target of an indirect jump or call and are described in more detail here: https://support.google.com/faqs/answer/7625886 Change-Id: I4f2cb982fa94e44d91e49bd98974fd125619c93a Reviewed-on: https://go-review.googlesource.com/c/go/+/222661 Reviewed-by: Keith Randall <khr@golang.org>	2020-03-13 19:05:54 +00:00
Russ Cox	877ef86bec	cmd/compile: add spectre mitigation mode enabled by -spectre This commit adds a new cmd/compile flag -spectre, which accepts a comma-separated list of possible Spectre mitigations to apply, or the empty string (none), or "all". The only known mitigation right now is "index", which uses conditional moves to ensure that x86-64 CPUs do not speculate past index bounds checks. Speculating past index bounds checks may be problematic on systems running privileged servers that accept requests from untrusted users who can execute their own programs on the same machine. (And some more constraints that make it even more unlikely in practice.) The cases this protects against are analogous to the ones Microsoft explains in the "Array out of bounds load/store feeding ..." sections here: https://docs.microsoft.com/en-us/cpp/security/developer-guidance-speculative-execution?view=vs-2019#array-out-of-bounds-load-feeding-an-indirect-branch Change-Id: Ib7532d7e12466b17e04c4e2075c2a456dc98f610 Reviewed-on: https://go-review.googlesource.com/c/go/+/222660 Reviewed-by: Keith Randall <khr@golang.org>	2020-03-13 19:05:46 +00:00
Keith Randall	d84cbec890	cmd/compile: convert 386 port to use addressing modes pass Update #36468 Change-Id: Idfdb845d097994689be450d6e8a57fa9adb57166 Reviewed-on: https://go-review.googlesource.com/c/go/+/222782 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-03-13 17:00:54 +00:00
Russ Cox	801a9d9a0c	test/codegen: mention in README that tests only run on Linux without -all_codegen This took me a while to figure out. The relevant code is in test/run.go (note the "linux" hard-coded strings): var arch, subarch, os string switch { case archspec[2] != "": // 3 components: "linux/386/sse2" os, arch, subarch = archspec[0], archspec[1][1:], archspec[2][1:] case archspec[1] != "": // 2 components: "386/sse2" os, arch, subarch = "linux", archspec[0], archspec[1][1:] default: // 1 component: "386" os, arch, subarch = "linux", archspec[0], "" if arch == "wasm" { os = "js" } } Change-Id: I92ba280025d2072e17532a5e43cf1d676789c167 Reviewed-on: https://go-review.googlesource.com/c/go/+/222819 Reviewed-by: Keith Randall <khr@golang.org>	2020-03-11 16:17:08 +00:00
Keith Randall	98cb76799c	cmd/compile: insert complicated x86 addressing modes as a separate pass Use a separate compiler pass to introduce complicated x86 addressing modes. Loads in the normal architecture rules (for x86 and all other platforms) can have constant offsets (AuxInt values) and symbols (Aux values), but no more. The complex addressing modes (x+y, x+2*y, etc.) are introduced in a separate pass that combines loads with LEAQx ops. Organizing rewrites this way simplifies the number of rewrites required, as there are lots of different rule orderings that have to be specified to ensure these complex addressing modes are always found if they are possible. Update #36468 Change-Id: I5b4bf7b03a1e731d6dfeb9ef19b376175f3b4b44 Reviewed-on: https://go-review.googlesource.com/c/go/+/217097 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-03-10 00:13:21 +00:00
Diogo Pinela	19ed0d993c	cmd/compile: use staticuint64s instead of staticbytes There are still two places in src/runtime/string.go that use staticbytes, so we cannot delete it just yet. There is a new codegen test to verify that the index calculation is constant-folded, at least on amd64. ppc64, mips[64] and s390x cannot currently do that. There is also a new runtime benchmark to ensure that this does not slow down performance (tested against parent commit): name old time/op new time/op delta ConvT2EByteSized/bool-4 1.07ns ± 1% 1.07ns ± 1% ~ (p=0.060 n=14+15) ConvT2EByteSized/uint8-4 1.06ns ± 1% 1.07ns ± 1% ~ (p=0.095 n=14+15) Updates #37612 Change-Id: I5ec30738edaa48cda78dfab4a78e24a32fa7fd6a Reviewed-on: https://go-review.googlesource.com/c/go/+/221957 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-03-04 21:43:01 +00:00
Keith Randall	cd9fd640db	cmd/compile: don't allow NaNs in floating-point constant ops Trying this CL again, with a fixed test that allows platforms to disagree on the exact behavior of converting NaNs. We store 32-bit floating point constants in a 64-bit field, by converting that 32-bit float to 64-bit float to store it, and convert it back to use it. That works for almost all floating-point constants. The exception is signaling NaNs. The round trip described above means we can't represent a 32-bit signaling NaN, because conversions strip the signaling bit. To fix this issue, just forbid NaNs as floating-point constants in SSA form. This shouldn't affect any real-world code, as people seldom constant-propagate NaNs (except in test code). Additionally, NaNs are somewhat underspecified (which of the many NaNs do you get when dividing 0/0?), so when cross-compiling there's a danger of using the compiler machine's NaN regime for some math, and the target machine's NaN regime for other math. Better to use the target machine's NaN regime always. Update #36400 Change-Id: Idf203b688a15abceabbd66ba290d4e9f63619ecb Reviewed-on: https://go-review.googlesource.com/c/go/+/221790 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-03-04 04:49:54 +00:00
Josh Bleecher Snyder	b49d8ce2fa	all: fix two minor typos in comments Change-Id: Iec6cd81c9787d3419850aa97e75052956ad139bc Reviewed-on: https://go-review.googlesource.com/c/go/+/221789 Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>	2020-03-03 17:44:05 +00:00
Michael Munday	e37cc29863	cmd/compile: optimize integer-in-range checks This CL incorporates code from CL 201206 by Josh Bleecher Snyder (thanks Josh). This CL restores the integer-in-range optimizations in the SSA backend. The fuse pass is enhanced to detect inequalities that could be merged and fuse their associated blocks while the generic rules optimize them into a single unsigned comparison. For example, the inequality `x >= 0 && x < 10` will now be optimized to `unsigned(x) < 10`. Overall has a fairly positive impact on binary sizes. name old time/op new time/op delta Template 192ms ± 1% 192ms ± 1% ~ (p=0.757 n=17+18) Unicode 76.6ms ± 2% 76.5ms ± 2% ~ (p=0.603 n=19+19) GoTypes 694ms ± 1% 693ms ± 1% ~ (p=0.569 n=19+20) Compiler 3.26s ± 0% 3.27s ± 0% +0.25% (p=0.000 n=20+20) SSA 7.41s ± 0% 7.49s ± 0% +1.10% (p=0.000 n=17+19) Flate 120ms ± 1% 120ms ± 1% +0.38% (p=0.003 n=19+19) GoParser 152ms ± 1% 152ms ± 1% ~ (p=0.061 n=17+19) Reflect 422ms ± 1% 425ms ± 2% +0.76% (p=0.001 n=18+20) Tar 167ms ± 1% 167ms ± 0% ~ (p=0.730 n=18+19) XML 233ms ± 4% 231ms ± 1% ~ (p=0.752 n=20+17) LinkCompiler 927ms ± 8% 928ms ± 8% ~ (p=0.857 n=19+20) ExternalLinkCompiler 1.81s ± 2% 1.81s ± 2% ~ (p=0.513 n=19+20) LinkWithoutDebugCompiler 556ms ±10% 583ms ±13% +4.95% (p=0.007 n=20+20) [Geo mean] 478ms 481ms +0.52% name old user-time/op new user-time/op delta Template 270ms ± 5% 269ms ± 7% ~ (p=0.925 n=20+20) Unicode 134ms ± 7% 131ms ±14% ~ (p=0.593 n=18+20) GoTypes 981ms ± 3% 987ms ± 2% +0.63% (p=0.049 n=19+18) Compiler 4.50s ± 2% 4.50s ± 1% ~ (p=0.588 n=19+20) SSA 10.6s ± 2% 10.6s ± 1% ~ (p=0.141 n=20+19) Flate 164ms ± 8% 165ms ±10% ~ (p=0.738 n=20+20) GoParser 202ms ± 5% 203ms ± 6% ~ (p=0.820 n=20+20) Reflect 587ms ± 6% 597ms ± 3% ~ (p=0.087 n=20+18) Tar 230ms ± 6% 228ms ± 8% ~ (p=0.569 n=19+20) XML 311ms ± 6% 314ms ± 5% ~ (p=0.369 n=20+20) LinkCompiler 878ms ± 8% 887ms ± 7% ~ (p=0.289 n=20+20) ExternalLinkCompiler 1.60s ± 7% 1.60s ± 7% ~ (p=0.820 n=20+20) LinkWithoutDebugCompiler 498ms ±12% 489ms ±11% ~ (p=0.398 n=20+20) [Geo mean] 611ms 611ms +0.05% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 36.0MB ± 0% -0.32% (p=0.000 n=20+20) Unicode 28.3MB ± 0% 28.3MB ± 0% -0.03% (p=0.000 n=19+20) GoTypes 121MB ± 0% 121MB ± 0% ~ (p=0.226 n=16+20) Compiler 563MB ± 0% 563MB ± 0% ~ (p=0.166 n=20+19) SSA 1.32GB ± 0% 1.33GB ± 0% +0.88% (p=0.000 n=20+19) Flate 22.7MB ± 0% 22.7MB ± 0% -0.02% (p=0.033 n=19+20) GoParser 27.9MB ± 0% 27.9MB ± 0% -0.02% (p=0.001 n=20+20) Reflect 78.3MB ± 0% 78.2MB ± 0% -0.01% (p=0.019 n=20+20) Tar 34.0MB ± 0% 34.0MB ± 0% -0.04% (p=0.000 n=20+20) XML 43.9MB ± 0% 43.9MB ± 0% -0.07% (p=0.000 n=20+19) LinkCompiler 205MB ± 0% 205MB ± 0% +0.44% (p=0.000 n=20+18) ExternalLinkCompiler 223MB ± 0% 223MB ± 0% +0.03% (p=0.000 n=20+20) LinkWithoutDebugCompiler 139MB ± 0% 142MB ± 0% +1.75% (p=0.000 n=20+20) [Geo mean] 93.7MB 93.9MB +0.20% name old allocs/op new allocs/op delta Template 363k ± 0% 361k ± 0% -0.58% (p=0.000 n=20+19) Unicode 329k ± 0% 329k ± 0% -0.06% (p=0.000 n=19+20) GoTypes 1.28M ± 0% 1.28M ± 0% -0.01% (p=0.000 n=20+20) Compiler 5.40M ± 0% 5.40M ± 0% -0.01% (p=0.000 n=20+20) SSA 12.7M ± 0% 12.8M ± 0% +0.80% (p=0.000 n=20+20) Flate 228k ± 0% 228k ± 0% ~ (p=0.194 n=20+20) GoParser 295k ± 0% 295k ± 0% -0.04% (p=0.000 n=20+20) Reflect 949k ± 0% 949k ± 0% -0.01% (p=0.000 n=20+20) Tar 337k ± 0% 337k ± 0% -0.06% (p=0.000 n=20+20) XML 418k ± 0% 417k ± 0% -0.17% (p=0.000 n=20+20) LinkCompiler 553k ± 0% 554k ± 0% +0.22% (p=0.000 n=20+19) ExternalLinkCompiler 1.52M ± 0% 1.52M ± 0% +0.27% (p=0.000 n=20+20) LinkWithoutDebugCompiler 186k ± 0% 186k ± 0% +0.06% (p=0.000 n=20+20) [Geo mean] 723k 723k +0.03% name old text-bytes new text-bytes delta HelloSize 828kB ± 0% 828kB ± 0% -0.01% (p=0.000 n=20+20) name old data-bytes new data-bytes delta HelloSize 13.4kB ± 0% 13.4kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 180kB ± 0% 180kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.23MB ± 0% 1.23MB ± 0% -0.33% (p=0.000 n=20+20) file before after Δ % addr2line 4320075 4311883 -8192 -0.190% asm 5191932 5187836 -4096 -0.079% buildid 2835338 2831242 -4096 -0.144% compile 20531717 20569099 +37382 +0.182% cover 5322511 5318415 -4096 -0.077% dist 3723749 3719653 -4096 -0.110% doc 4743515 4739419 -4096 -0.086% fix 3413960 3409864 -4096 -0.120% link 6690119 6686023 -4096 -0.061% nm 4269616 4265520 -4096 -0.096% pprof 14942189 14929901 -12288 -0.082% trace 11807164 11790780 -16384 -0.139% vet 8384104 8388200 +4096 +0.049% go 15339076 15334980 -4096 -0.027% total 132258257 132226007 -32250 -0.024% Fixes #30645. Change-Id: If551ac5996097f3685870d083151b5843170aab0 Reviewed-on: https://go-review.googlesource.com/c/go/+/165998 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-03-03 14:30:26 +00:00
Michael Munday	44fe355694	cmd/compile: canonicalize comparison argument order Ensure that any comparison between two values has the same argument order. This helps ensure that they can be eliminated during the lowered CSE pass which will be particularly important if we eliminate the Greater and Geq ops (see #37316). Example: CMP R0, R1 BLT L1 CMP R1, R0 // different order, cannot eliminate BEQ L2 CMP R0, R1 BLT L1 CMP R0, R1 // same order, can eliminate BEQ L2 This does have some drawbacks. Notably comparisons might 'flip' direction in the assembly output after even small changes to the code or compiler. It should help make optimizations more reliable however. compilecmp master -> HEAD master (`218f4572f5`): text/template: make reflect.Value indirections more robust HEAD (f1661fef3e): cmd/compile: canonicalize comparison argument order platform: linux/amd64 file before after Δ % api 6063927 6068023 +4096 +0.068% asm 5191757 5183565 -8192 -0.158% cgo 4893518 4901710 +8192 +0.167% cover 5330345 5326249 -4096 -0.077% fix 3417778 3421874 +4096 +0.120% pprof 14889456 14885360 -4096 -0.028% test2json 2848138 2844042 -4096 -0.144% trace 11746239 11733951 -12288 -0.105% total 132739173 132722789 -16384 -0.012% Change-Id: I11736b3fe2a4553f6fc65018f475e88217fa22f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/220425 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-02-26 10:32:22 +00:00
Bryan C. Mills	a9f1ea4a83	Revert "cmd/compile: don't allow NaNs in floating-point constant ops" This reverts CL 213477. Reason for revert: tests are failing on linux-mips*-rtrk builders. Change-Id: I8168f7450890233f1bd7e53930b73693c26d4dc0 Reviewed-on: https://go-review.googlesource.com/c/go/+/220897 Run-TryBot: Bryan C. Mills <bcmills@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2020-02-25 15:49:19 +00:00
Keith Randall	2aa7c6c548	cmd/compile: don't allow NaNs in floating-point constant ops We store 32-bit floating point constants in a 64-bit field, by converting that 32-bit float to 64-bit float to store it, and convert it back to use it. That works for almost all floating-point constants. The exception is signaling NaNs. The round trip described above means we can't represent a 32-bit signaling NaN, because conversions strip the signaling bit. To fix this issue, just forbid NaNs as floating-point constants in SSA form. This shouldn't affect any real-world code, as people seldom constant-propagate NaNs (except in test code). Additionally, NaNs are somewhat underspecified (which of the many NaNs do you get when dividing 0/0?), so when cross-compiling there's a danger of using the compiler machine's NaN regime for some math, and the target machine's NaN regime for other math. Better to use the target machine's NaN regime always. This has been a bug since 1.10, and there's an easy workaround (declare a global varaible containing the signaling NaN pattern, and use that as the argument to math.Float32frombits) so we'll fix it in 1.15. Fixes #36400 Update #36399 Change-Id: Icf155e743281560eda2eed953d19a829552ccfda Reviewed-on: https://go-review.googlesource.com/c/go/+/213477 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-02-25 02:21:53 +00:00
Keith Randall	1cfe8e91b6	cmd/compile: use ADDQ instead of LEAQ when we can The address calculations in the example end up doing x << 4 + y + 0. Before this CL we use a SHLQ+LEAQ. Since the constant offset is 0, we can use SHLQ+ADDQ instead. Change-Id: Ia048c4fdbb3a42121c7e1ab707961062e8247fca Reviewed-on: https://go-review.googlesource.com/c/go/+/209959 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-02-24 21:33:53 +00:00
Brian Kessler	6b1d5471b9	cmd/compile: add signed indivisibility by power of 2 rules Commit `44343c777c` (CL 173557) added rules for handling divisibility checks for powers of 2 for signed integers, x%c ==0. This change adds the complementary indivisibility rules, x%c != 0. Fixes #34166 Change-Id: I87379e30af7aff633371acca82db2397da9b2c07 Reviewed-on: https://go-review.googlesource.com/c/go/+/194219 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-11-07 16:30:46 +00:00
Russ Cox	543c6d2e0d	math, cmd/compile: rename Fma to FMA This API was added for #25819, where it was discussed as math.FMA. The commit adding it used math.Fma, presumably for consistency with the rest of the unusual names in package math (Sincos, Acosh, Erfcinv, Float32bits, etc). I believe that using an idiomatic Go name is more important here than consistency with these other names, most of which are historical baggage from C's standard library. Early additions like Float32frombits happened before "uppercase for export" (so they were originally like "float32frombits") and they were not properly reconsidered when we uppercased the symbols to export them. That's a mistake we live with. The names of functions we have added since then, and even a few that were legacy, are more properly Go-cased, such as IsNaN, IsInf, and RoundToEven, rather than Isnan, Isinf, and Roundtoeven. And also constants like MaxFloat32. For new API, we should keep using proper Go-cased symbols instead of minimally-upper-cased-C symbols. So math.FMA, not math.Fma. This API has not yet been released, so this change does not break the compatibility promise. This CL also modifies cmd/compile, since the compiler knows the name of the function. I could have stopped at changing the string constants, but it seemed to make more sense to use a consistent casing everywhere. Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e Reviewed-on: https://go-review.googlesource.com/c/go/+/205317 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-11-07 14:51:06 +00:00
smasher164	58b031949b	cmd/compile: add fma intrinsic for arm This change introduces an arm intrinsic that generates the FMULAD instruction for the fused-multiply-add operation on systems that support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite rule translates the generic intrinsic to FMULAD. Updates #25819. Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa Reviewed-on: https://go-review.googlesource.com/c/go/+/142117 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-21 17:42:47 +00:00
smasher164	7a6da218b1	cmd/compile: add fma intrinsic for amd64 To permit ssa-level optimization, this change introduces an amd64 intrinsic that generates the VFMADD231SD instruction for the fused-multiply-add operation on systems that support it. System support is detected via cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic ("Fma") to VFMADD231SD. The benchmark compares the software implementation (old) with the intrinsic (new). name old time/op new time/op delta Fma-4 27.2ns ± 1% 1.0ns ± 9% -96.48% (p=0.008 n=5+5) Updates #25819. Change-Id: I966655e5f96817a5d06dff5942418a3915b09584 Reviewed-on: https://go-review.googlesource.com/c/go/+/137156 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-21 16:42:10 +00:00
smasher164	33425ab8db	cmd/compile: introduce generic ssa intrinsic for fused-multiply-add In order to make math.FMA a compiler intrinsic for ISAs like ARM64, PPC64[le], and S390X, a generic 3-argument opcode "Fma" is provided and rewritten as ARM64: (Fma x y z) -> (FMADDD z x y) PPC64: (Fma x y z) -> (FMADD x y z) S390X: (Fma x y z) -> (FMADD z x y) Updates #25819. Change-Id: Ie5bc628311e6feeb28ddf9adaa6e702c8c291efa Reviewed-on: https://go-review.googlesource.com/c/go/+/131959 Run-TryBot: Akhil Indurti <aindurti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-21 16:24:15 +00:00
David Chase	6adaf17eaa	cmd/compile: preserve statements in late nilcheckelim optimization When a subsequent load/store of a ptr makes the nil check of that pointer unnecessary, if their lines differ, change the line of the load/store to that of the nilcheck, and attempt to rehome the load/store position instead. This fix makes profiling less accurate in order to make panics more informative. Fixes #33724 Change-Id: Ib9afaac12fe0d0320aea1bf493617facc34034b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/200197 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-15 16:43:44 +00:00
Meng Zhuo	50f1157760	cmd/compile: add math/bits.Mul64 intrinsic on mips64x Benchmark: name old time/op new time/op delta Mul 36.0ns ± 1% 2.8ns ± 0% -92.31% (p=0.000 n=10+10) Mul32 4.37ns ± 0% 4.37ns ± 0% ~ (p=0.429 n=6+10) Mul64 36.4ns ± 0% 2.8ns ± 0% -92.37% (p=0.000 n=10+9) Change-Id: Ic4f4e5958adbf24999abcee721d0180b5413fca7 Reviewed-on: https://go-review.googlesource.com/c/go/+/200582 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-10-14 21:23:34 +00:00
Michael Munday	6ec4c71eef	cmd/compile: add SSA rules for s390x compare-and-branch instructions This commit adds SSA rules for the s390x combined compare-and-branch instructions. These have a shorter encoding than separate compare and branch instructions and they also don't clobber the condition code (a.k.a. flag register) reducing pressure on the flag allocator. I have deleted the 'loop_test.go' file and replaced it with a new codegen test which performs a wider range of checks. Object sizes from compilebench: name old object-bytes new object-bytes delta Template 562kB ± 0% 561kB ± 0% -0.28% (p=0.000 n=10+10) Unicode 217kB ± 0% 217kB ± 0% -0.17% (p=0.000 n=10+10) GoTypes 2.03MB ± 0% 2.02MB ± 0% -0.59% (p=0.000 n=10+10) Compiler 8.16MB ± 0% 8.11MB ± 0% -0.62% (p=0.000 n=10+10) SSA 27.4MB ± 0% 27.0MB ± 0% -1.45% (p=0.000 n=10+10) Flate 356kB ± 0% 356kB ± 0% -0.12% (p=0.000 n=10+10) GoParser 438kB ± 0% 436kB ± 0% -0.51% (p=0.000 n=10+10) Reflect 1.37MB ± 0% 1.37MB ± 0% -0.42% (p=0.000 n=10+10) Tar 485kB ± 0% 483kB ± 0% -0.39% (p=0.000 n=10+10) XML 630kB ± 0% 621kB ± 0% -1.45% (p=0.000 n=10+10) [Geo mean] 1.14MB 1.13MB -0.60% name old text-bytes new text-bytes delta HelloSize 763kB ± 0% 754kB ± 0% -1.30% (p=0.000 n=10+10) CmdGoSize 10.7MB ± 0% 10.6MB ± 0% -0.91% (p=0.000 n=10+10) [Geo mean] 2.86MB 2.82MB -1.10% Change-Id: Ibca55d9c0aa1254aee69433731ab5d26a43a7c18 Reviewed-on: https://go-review.googlesource.com/c/go/+/198037 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-08 10:03:04 +00:00
Cuong Manh Le	77f5adba55	cmd/compile: don't use statictmps for small object in slice literal Fixes #21561 Change-Id: I89c59752060dd9570d17d73acbbaceaefce5d8ce Reviewed-on: https://go-review.googlesource.com/c/go/+/197560 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-10-08 06:09:26 +00:00
Keith Randall	72dc9ab191	cmd/compile: reuse dead register before reusing register holding constant For commuting ops, check whether the second argument is dead before checking if the first argument is rematerializeable. Reusing the register holding a dead value is always best. Fixes #33580 Change-Id: I7372cfc03d514e6774d2d9cc727a3e6bf6ce2657 Reviewed-on: https://go-review.googlesource.com/c/go/+/199559 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2019-10-07 15:16:26 +00:00
Dan Scales	225f484c88	misc, runtime, test: extra tests and benchmarks for defer Add a bunch of extra tests and benchmarks for defer, in preparation for new low-cost (open-coded) implementation of defers (see #34481), - New file defer_test.go that tests a bunch more unusual defer scenarios, including things that might have problems for open-coded defers. - Additions to callers_test.go actually verifying what the stack trace looks like for various panic or panic-recover scenarios. - Additions to crash_test.go testing several more crash scenarios involving recursive panics. - New benchmark in runtime_test.go measuring speed of panic-recover - New CGo benchmark in cgo_test.go calling from Go to C back to Go that shows defer overhead Updates #34481 Change-Id: I423523f3e05fc0229d4277dd00073289a5526188 Reviewed-on: https://go-review.googlesource.com/c/go/+/197017 Run-TryBot: Dan Scales <danscales@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2019-09-25 23:27:16 +00:00
Martin Möhrmann	f41451e7eb	compile: prefer an AND instead of SHR+SHL instructions On modern 64bit CPUs a SHR, SHL or AND instruction take 1 cycle to execute. A pair of shifts that operate on the same register will take 2 cycles and needs to wait for the input register value to be available. Large constants used to mask the high bits of a register with an AND instruction can not be encoded as an immediate in the AND instruction on amd64 and therefore need to be loaded into a register with a MOV instruction. However that MOV instruction is not dependent on the output register and on many CPUs does not compete with the AND or shift instructions for execution ports. Using a pair of shifts to mask high bits instead of an AND to mask high bits of a register has a shorter encoding and uses one less general purpose register but is slower due to taking one clock cycle longer if there is no register pressure that would make the AND variant need to generate a spill. For example the instructions emitted for (x & 1 << 63) before this CL are: 48c1ea3f SHRQ $0x3f, DX 48c1e23f SHLQ $0x3f, DX after this CL the instructions are the same as GCC and LLVM use: 48b80000000000000080 MOVQ $0x8000000000000000, AX 4821d0 ANDQ DX, AX Some platforms such as arm64 already have SSA optimization rules to fuse two shift instructions back into an AND. Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark: var GlobalU uint func BenchmarkAndHighBits(b *testing.B) { x := uint(0) for i := 0; i < b.N; i++ { x &= 1 << 63 } GlobalU = x } amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz: name old time/op new time/op delta AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25): 'go run run.go -all_codegen -v codegen' passes with following adjustments: ARM64: The BFXIL pattern ((x << lc) >> rc \| y & ac) needed adjustment since ORshiftRL generation fusing '>> rc' and '\|' interferes with matching ((x << lc) >> rc) to generate UBFX. Previously ORshiftLL was created first using the shifts generated for (y & ac). S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs. Updates #33826 Updates #32781 Change-Id: I5a59f6239660d53c029cd22dfb44ddf39f93a56c Reviewed-on: https://go-review.googlesource.com/c/go/+/196810 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-09-24 20:30:59 +00:00
Bryan C. Mills	34fe8295c5	Revert "compile: prefer an AND instead of SHR+SHL instructions" This reverts CL 194297. Reason for revert: introduced register allocation failures on PPC64LE builders. Updates #33826 Updates #32781 Updates #34468 Change-Id: I7d0b55df8cdf8e7d2277f1814299b083c2692e48 Reviewed-on: https://go-review.googlesource.com/c/go/+/196957 Run-TryBot: Bryan C. Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2019-09-23 15:20:12 +00:00
Martin Möhrmann	4e2b84ffc5	compile: prefer an AND instead of SHR+SHL instructions On modern 64bit CPUs a SHR, SHL or AND instruction take 1 cycle to execute. A pair of shifts that operate on the same register will take 2 cycles and needs to wait for the input register value to be available. Large constants used to mask the high bits of a register with an AND instruction can not be encoded as an immediate in the AND instruction on amd64 and therefore need to be loaded into a register with a MOV instruction. However that MOV instruction is not dependent on the output register and on many CPUs does not compete with the AND or shift instructions for execution ports. Using a pair of shifts to mask high bits instead of an AND to mask high bits of a register has a shorter encoding and uses one less general purpose register but is slower due to taking one clock cycle longer if there is no register pressure that would make the AND variant need to generate a spill. For example the instructions emitted for (x & 1 << 63) before this CL are: 48c1ea3f SHRQ $0x3f, DX 48c1e23f SHLQ $0x3f, DX after this CL the instructions are the same as GCC and LLVM use: 48b80000000000000080 MOVQ $0x8000000000000000, AX 4821d0 ANDQ DX, AX Some platforms such as arm64 already have SSA optimization rules to fuse two shift instructions back into an AND. Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark: var GlobalU uint func BenchmarkAndHighBits(b *testing.B) { x := uint(0) for i := 0; i < b.N; i++ { x &= 1 << 63 } GlobalU = x } amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz: name old time/op new time/op delta AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25): 'go run run.go -all_codegen -v codegen' passes with following adjustments: ARM64: The BFXIL pattern ((x << lc) >> rc \| y & ac) needed adjustment since ORshiftRL generation fusing '>> rc' and '\|' interferes with matching ((x << lc) >> rc) to generate UBFX. Previously ORshiftLL was created first using the shifts generated for (y & ac). S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs. Updates #33826 Updates #32781 Change-Id: I43227da76b625de03fbc51117162b23b9c678cdb Reviewed-on: https://go-review.googlesource.com/c/go/+/194297 Run-TryBot: Martin Möhrmann <martisch@uos.de> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-09-21 18:00:13 +00:00
Agniva De Sarker	ecc7dd5469	test/codegen: fix wasm codegen breakage i32.eqz instructions don't appear unless needed in if conditions anymore after CL 195204. I forgot to run the codegen tests while submitting the CL. Thanks to @martisch for catching it. Fixes #34442 Change-Id: I177b064b389be48e39d564849714d7a8839be13e Reviewed-on: https://go-review.googlesource.com/c/go/+/196580 Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2019-09-21 16:31:44 +00:00
Matthew Dempsky	85fc765341	cmd/compile: optimize switch on strings When compiling expression switches, we try to optimize runs of constants into binary searches. The ordering used isn't visible to the application, so it's unimportant as long as we're consistent between sorting and searching. For strings, it's much cheaper to compare string lengths than strings themselves, so instead of ordering strings by "si <= sj", we currently order them by "len(si) < len(sj) \|\| len(si) == len(sj) && si <= sj" (i.e., the lexicographical ordering on the 2-tuple (len(s), s)). However, it's also somewhat cheaper to compare strings for equality (i.e., ==) than for ordering (i.e., <=). And if there were two or three string constants of the same length in a switch statement, we might unnecessarily emit ordering comparisons. For example, given: switch s { case "", "1", "2", "3": // ordered by length then content goto L } we currently compile this as: if len(s) < 1 \|\| len(s) == 1 && s <= "1" { if s == "" { goto L } else if s == "1" { goto L } } else { if s == "2" { goto L } else if s == "3" { goto L } } This CL switches to using a 2-level binary search---first on len(s), then on s itself---so that string ordering comparisons are only needed when there are 4 or more strings of the same length. (4 being the cut-off for when using binary search is actually worthwhile.) So the above switch instead now compiles to: if len(s) == 0 { if s == "" { goto L } } else if len(s) == 1 { if s == "1" { goto L } else if s == "2" { goto L } else if s == "3" { goto L } } which is better optimized by walk and SSA. (Notably, because there are only two distinct lengths and no more than three strings of any particular length, this example ends up falling back to simply using linear search.) Test case by khr@ from CL 195138. Fixes #33934. Change-Id: I8eeebcaf7e26343223be5f443d6a97a0daf84f07 Reviewed-on: https://go-review.googlesource.com/c/go/+/195340 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-09-18 05:33:05 +00:00
LE Manh Cuong	ec4e8517cd	cmd/compile: support more length types for slice extension optimization golang.org/cl/109517 optimized the compiler to avoid the allocation for make in append(x, make([]T, y)...). This was only implemented for the case that y has type int. This change extends the optimization to trigger for all integer types where the value is known at compile time to fit into an int. name old time/op new time/op delta ExtendInt-12 106ns ± 4% 106ns ± 0% ~ (p=0.351 n=10+6) ExtendUint64-12 1.03µs ± 5% 0.10µs ± 4% -90.01% (p=0.000 n=9+10) name old alloc/op new alloc/op delta ExtendInt-12 0.00B 0.00B ~ (all equal) ExtendUint64-12 13.6kB ± 0% 0.0kB -100.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta ExtendInt-12 0.00 0.00 ~ (all equal) ExtendUint64-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) Updates #29785 Change-Id: Ief7760097c285abd591712da98c5b02bc3961fcd Reviewed-on: https://go-review.googlesource.com/c/go/+/182559 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-09-17 17:18:17 +00:00
Alberto Donizetti	c2facbe937	Revert "test/codegen: document -all_codegen option in README" This reverts CL 192101. Reason for revert: The same paragraph was added 2 weeks ago (look a few lines above) Change-Id: I05efb2631d7b4966f66493f178f2a649c715a3cc Reviewed-on: https://go-review.googlesource.com/c/go/+/195637 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-09-16 17:31:37 +00:00
Cherry Zhang	d9b8ffa51c	test/codegen: document -all_codegen option in README It is useful to know about the -all_codegen option for running codegen tests for all platforms. I was puzzling that some codegen test was not failing on my local machine or on trybot, until I found this option. Change-Id: I062cf4d73f6a6c9ebc2258195779d2dab21bc36d Reviewed-on: https://go-review.googlesource.com/c/go/+/192101 Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-09-16 11:53:57 +00:00
Ruixin Bao	98aa97806b	cmd/compile: add math/bits.Mul64 intrinsic on s390x This change adds an intrinsic for Mul64 on s390x. To achieve that, a new assembly instruction, MLGR, is introduced in s390x/asmz.go. This assembly instruction directly uses an existing instruction on Z and supports multiplication of two 64 bit unsigned integer and stores the result in two separate registers. In this case, we require the multiplcand to be stored in register R3 and the output result (the high and low 64 bit of the product) to be stored in R2 and R3 respectively. A test case is also added. Benchmark: name old time/op new time/op delta Mul-18 11.1ns ± 0% 1.4ns ± 0% -87.39% (p=0.002 n=8+10) Mul32-18 2.07ns ± 0% 2.07ns ± 0% ~ (all equal) Mul64-18 11.1ns ± 1% 1.4ns ± 0% -87.42% (p=0.000 n=10+10) Change-Id: Ieca6ad1f61fff9a48a31d50bbd3f3c6d9e6675c1 Reviewed-on: https://go-review.googlesource.com/c/go/+/194572 Reviewed-by: Michael Munday <mike.munday@ibm.com> Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-09-13 09:04:48 +00:00
Michael Munday	5c5f217b63	cmd/compile: improve s390x sign/zero extension removal This CL gets rid of the MOVDreg and MOVDnop SSA operations on s390x. They were originally inserted to help avoid situations where a sign/zero extension was elided but a spill invalidated the optimization. It's not really clear we need to do this though (amd64 doesn't have these ops for example) so long as we are careful when removing sign/zero extensions. Also, the MOVDreg technique doesn't work if the register is spilled before the MOVDreg op (I haven't seen that in practice). Removing these ops reduces the complexity of the rules and also allows us to unblock optimizations. For example, the compiler can now merge the loads in binary.{Big,Little}Endian.PutUint16 which it wasn't able to do before. This CL reduces the size of the .text section in the go tool by about 4.7KB (0.09%). Change-Id: Icaddae7f2e4f9b2debb6fabae845adb3f73b41db Reviewed-on: https://go-review.googlesource.com/c/go/+/173897 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-09-10 13:17:24 +00:00
Martin Möhrmann	5bb59b6d16	Revert "compile: prefer an AND instead of SHR+SHL instructions" This reverts commit `9ec7074a94`. Reason for revert: broke s390x (copysign, abs) and arm64 (bitfield) tests. Change-Id: I16c1b389c062e8c4aa5de079f1d46c9b25b0db52 Reviewed-on: https://go-review.googlesource.com/c/go/+/193850 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Agniva De Sarker <agniva.quicksilver@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-09-09 07:33:25 +00:00
Martin Möhrmann	9ec7074a94	compile: prefer an AND instead of SHR+SHL instructions On modern 64bit CPUs a SHR, SHL or AND instruction take 1 cycle to execute. A pair of shifts that operate on the same register will take 2 cycles and needs to wait for the input register value to be available. Large constants used to mask the high bits of a register with an AND instruction can not be encoded as an immediate in the AND instruction on amd64 and therefore need to be loaded into a register with a MOV instruction. However that MOV instruction is not dependent on the output register and on many CPUs does not compete with the AND or shift instructions for execution ports. Using a pair of shifts to mask high bits instead of an AND to mask high bits of a register has a shorter encoding and uses one less general purpose register but is slower due to taking one clock cycle longer if there is no register pressure that would make the AND variant need to generate a spill. For example the instructions emitted for (x & 1 << 63) before this CL are: 48c1ea3f SHRQ $0x3f, DX 48c1e23f SHLQ $0x3f, DX after this CL the instructions are the same as GCC and LLVM use: 48b80000000000000080 MOVQ $0x8000000000000000, AX 4821d0 ANDQ DX, AX Some platforms such as arm64 already have SSA optimization rules to fuse two shift instructions back into an AND. Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark: var GlobalU uint func BenchmarkAndHighBits(b *testing.B) { x := uint(0) for i := 0; i < b.N; i++ { x &= 1 << 63 } GlobalU = x } amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz: name old time/op new time/op delta AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25): Updates #33826 Updates #32781 Change-Id: I862d3587446410c447b9a7265196b57f85358633 Reviewed-on: https://go-review.googlesource.com/c/go/+/191780 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-09-09 06:49:17 +00:00
Alberto Donizetti	e6d2544d20	test/codegen: mention -all_codegen in the README For performance reasons (avoiding costly cross-compilations) CL 177577 changed the codegen test harness to only run the tests for the machine's GOARCH by default. This change updates the codegen README accordingly, explaining what all.bash does run by default and how to perform the tests for all architectures. Fixes #33924 Change-Id: I43328d878c3e449ebfda46f7e69963a44a511d40 Reviewed-on: https://go-review.googlesource.com/c/go/+/192619 Reviewed-by: Daniel Martí <mvdan@mvdan.cc>	2019-09-01 15:37:13 +00:00
Brian Kessler	b003afe4fe	cmd/compile: intrinsify RotateLeft32 on wasm wasm has 32-bit versions of all integer operations. This change lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits call. Benchmarking on amd64 under node.js this is ~25% faster. node v10.15.3/amd64 name old time/op new time/op delta RotateLeft 8.37ns ± 1% 8.28ns ± 0% -1.05% (p=0.029 n=4+4) RotateLeft8 11.9ns ± 1% 11.8ns ± 0% ~ (p=0.167 n=5+5) RotateLeft16 11.8ns ± 0% 11.8ns ± 0% ~ (all equal) RotateLeft32 11.9ns ± 1% 8.7ns ± 0% -26.32% (p=0.008 n=5+5) RotateLeft64 8.31ns ± 1% 8.43ns ± 2% ~ (p=0.063 n=5+5) Updates #31265 Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46 Reviewed-on: https://go-review.googlesource.com/c/go/+/182359 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Richard Musiol <neelance@gmail.com>	2019-08-31 17:03:04 +00:00
Ben Shi	1786ecd502	cmd/compile: eliminate WASM's redundant extension & wrapping This CL eliminates unnecessary pairs of I32WrapI64 and I64ExtendI32U generated by the WASM backend for IF statements. And it makes the total size of pkg/js_wasm/ decreases about 490KB. Change-Id: I16b0abb686c4e30d5624323166ec2d0ec57dbe2d Reviewed-on: https://go-review.googlesource.com/c/go/+/191758 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Richard Musiol <neelance@gmail.com>	2019-08-30 21:20:03 +00:00
Ben Shi	8d5197d818	cmd/compile: optimize 386's math.bits.TrailingZeros16 This CL reverts CL 192097 and fixes the issue in CL 189277. Change-Id: Icd271262e1f5019a8e01c91f91c12c1261eeb02b Reviewed-on: https://go-review.googlesource.com/c/go/+/192519 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-08-30 17:37:00 +00:00
Cherry Zhang	9859f6bedb	test/codegen: fix ARM32 RotateLeft32 test The syntax of a shifted operation does not have a "$" sign for the shift amount. Remove it. Change-Id: I50782fe942b640076f48c2fafea4d3175be8ff99 Reviewed-on: https://go-review.googlesource.com/c/go/+/192100 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-08-28 20:42:48 +00:00
Ben Shi	3cfd003a8a	cmd/compile: optimize ARM's math.bits.RotateLeft32 This CL optimizes math.bits.RotateLeft32 to inline "MOVW Rx@>Ry, Rd" on ARM. The benchmark results of math/bits show some improvements. name old time/op new time/op delta RotateLeft-4 9.42ns ± 0% 6.91ns ± 0% -26.66% (p=0.000 n=40+33) RotateLeft8-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+31) RotateLeft16-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+32) RotateLeft32-4 8.16ns ± 0% 7.54ns ± 0% -7.68% (p=0.000 n=40+40) RotateLeft64-4 15.7ns ± 0% 15.7ns ± 0% ~ (all equal) updates #31265 Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712 Reviewed-on: https://go-review.googlesource.com/c/go/+/188697 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-08-28 15:41:58 +00:00
Ben Shi	c683ab8128	cmd/compile: optimize ARM's math.Abs This CL optimizes math.Abs to an inline ABSD instruction on ARM. The benchmark results of src/math/ show big improvements. name old time/op new time/op delta Acos-4 181ns ± 0% 182ns ± 0% +0.30% (p=0.000 n=40+40) Acosh-4 202ns ± 0% 202ns ± 0% ~ (all equal) Asin-4 163ns ± 0% 163ns ± 0% ~ (all equal) Asinh-4 242ns ± 0% 242ns ± 0% ~ (all equal) Atan-4 120ns ± 0% 121ns ± 0% +0.83% (p=0.000 n=40+40) Atanh-4 202ns ± 0% 202ns ± 0% ~ (all equal) Atan2-4 173ns ± 0% 173ns ± 0% ~ (all equal) Cbrt-4 1.06µs ± 0% 1.06µs ± 0% +0.09% (p=0.000 n=39+37) Ceil-4 72.9ns ± 0% 72.8ns ± 0% ~ (p=0.237 n=40+40) Copysign-4 13.2ns ± 0% 13.2ns ± 0% ~ (all equal) Cos-4 193ns ± 0% 183ns ± 0% -5.18% (p=0.000 n=40+40) Cosh-4 254ns ± 0% 239ns ± 0% -5.91% (p=0.000 n=40+40) Erf-4 112ns ± 0% 112ns ± 0% ~ (all equal) Erfc-4 117ns ± 0% 117ns ± 0% ~ (all equal) Erfinv-4 127ns ± 0% 127ns ± 1% ~ (p=0.492 n=40+40) Erfcinv-4 128ns ± 0% 128ns ± 0% ~ (all equal) Exp-4 212ns ± 0% 206ns ± 0% -3.05% (p=0.000 n=40+40) ExpGo-4 216ns ± 0% 209ns ± 0% -3.24% (p=0.000 n=40+40) Expm1-4 142ns ± 0% 142ns ± 0% ~ (all equal) Exp2-4 191ns ± 0% 184ns ± 0% -3.45% (p=0.000 n=40+40) Exp2Go-4 194ns ± 0% 187ns ± 0% -3.61% (p=0.000 n=40+40) Abs-4 14.4ns ± 0% 6.3ns ± 0% -56.39% (p=0.000 n=38+39) Dim-4 12.6ns ± 0% 12.6ns ± 0% ~ (all equal) Floor-4 49.6ns ± 0% 49.6ns ± 0% ~ (all equal) Max-4 27.6ns ± 0% 27.6ns ± 0% ~ (all equal) Min-4 27.0ns ± 0% 27.0ns ± 0% ~ (all equal) Mod-4 349ns ± 0% 305ns ± 1% -12.55% (p=0.000 n=33+40) Frexp-4 54.0ns ± 0% 47.1ns ± 0% -12.78% (p=0.000 n=38+38) Gamma-4 242ns ± 0% 234ns ± 0% -3.16% (p=0.000 n=36+40) Hypot-4 84.8ns ± 0% 67.8ns ± 0% -20.05% (p=0.000 n=31+35) HypotGo-4 88.5ns ± 0% 71.6ns ± 0% -19.12% (p=0.000 n=40+38) Ilogb-4 45.8ns ± 0% 38.9ns ± 0% -15.12% (p=0.000 n=40+32) J0-4 821ns ± 0% 802ns ± 0% -2.33% (p=0.000 n=33+40) J1-4 816ns ± 0% 807ns ± 0% -1.05% (p=0.000 n=40+29) Jn-4 1.67µs ± 0% 1.65µs ± 0% -1.45% (p=0.000 n=40+39) Ldexp-4 61.5ns ± 0% 54.6ns ± 0% -11.27% (p=0.000 n=40+32) Lgamma-4 188ns ± 0% 188ns ± 0% ~ (all equal) Log-4 154ns ± 0% 147ns ± 0% -4.78% (p=0.000 n=40+40) Logb-4 50.9ns ± 0% 42.7ns ± 0% -16.11% (p=0.000 n=34+39) Log1p-4 160ns ± 0% 159ns ± 0% ~ (p=0.828 n=40+40) Log10-4 173ns ± 0% 166ns ± 0% -4.05% (p=0.000 n=40+40) Log2-4 65.3ns ± 0% 58.4ns ± 0% -10.57% (p=0.000 n=37+37) Modf-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal) Nextafter32-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal) Nextafter64-4 32.7ns ± 0% 32.6ns ± 0% ~ (p=0.375 n=40+40) PowInt-4 300ns ± 0% 277ns ± 0% -7.78% (p=0.000 n=40+40) PowFrac-4 676ns ± 0% 635ns ± 0% -6.00% (p=0.000 n=40+35) Pow10Pos-4 17.6ns ± 0% 17.6ns ± 0% ~ (all equal) Pow10Neg-4 22.0ns ± 0% 22.0ns ± 0% ~ (all equal) Round-4 30.1ns ± 0% 30.1ns ± 0% ~ (all equal) RoundToEven-4 38.9ns ± 0% 38.9ns ± 0% ~ (all equal) Remainder-4 291ns ± 0% 263ns ± 0% -9.62% (p=0.000 n=40+40) Signbit-4 11.3ns ± 0% 11.3ns ± 0% ~ (all equal) Sin-4 185ns ± 0% 185ns ± 0% ~ (all equal) Sincos-4 230ns ± 0% 230ns ± 0% ~ (all equal) Sinh-4 253ns ± 0% 246ns ± 0% -2.77% (p=0.000 n=39+39) SqrtIndirect-4 41.4ns ± 0% 41.4ns ± 0% ~ (all equal) SqrtLatency-4 13.8ns ± 0% 13.8ns ± 0% ~ (all equal) SqrtIndirectLatency-4 37.0ns ± 0% 37.0ns ± 0% ~ (p=0.632 n=40+40) SqrtGoLatency-4 911ns ± 0% 911ns ± 0% +0.08% (p=0.000 n=40+40) SqrtPrime-4 13.2µs ± 0% 13.2µs ± 0% +0.01% (p=0.038 n=38+40) Tan-4 205ns ± 0% 205ns ± 0% ~ (all equal) Tanh-4 264ns ± 0% 247ns ± 0% -6.44% (p=0.000 n=39+32) Trunc-4 45.2ns ± 0% 45.2ns ± 0% ~ (all equal) Y0-4 796ns ± 0% 792ns ± 0% -0.55% (p=0.000 n=35+40) Y1-4 804ns ± 0% 797ns ± 0% -0.82% (p=0.000 n=24+40) Yn-4 1.64µs ± 0% 1.62µs ± 0% -1.27% (p=0.000 n=40+39) Float64bits-4 8.16ns ± 0% 8.16ns ± 0% +0.04% (p=0.000 n=35+40) Float64frombits-4 10.7ns ± 0% 10.7ns ± 0% ~ (all equal) Float32bits-4 7.53ns ± 0% 7.53ns ± 0% ~ (p=0.760 n=40+40) Float32frombits-4 6.91ns ± 0% 6.91ns ± 0% -0.04% (p=0.002 n=32+38) [Geo mean] 111ns 106ns -3.98% Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d Reviewed-on: https://go-review.googlesource.com/c/go/+/188678 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-08-28 15:41:28 +00:00
Bryan C. Mills	372b0eed17	Revert "cmd/compile: optimize 386's math.bits.TrailingZeros16" This reverts CL 189277. Reason for revert: broke 32-bit builders. Updates #33902 Change-Id: Ie5f180d0371a90e5057ed578c334372e5fc3a286 Reviewed-on: https://go-review.googlesource.com/c/go/+/192097 Run-TryBot: Bryan C. Mills <bcmills@google.com> Reviewed-by: Daniel Martí <mvdan@mvdan.cc>	2019-08-28 12:57:59 +00:00
Agniva De Sarker	7be97af2ff	cmd/compile: apply optimization for readonly globals on wasm Extend the optimization introduced in CL 141118 to the wasm architecture. And for reference, the rules trigger 212 times while building std and cmd $GOOS=js GOARCH=wasm gotip build std cmd $grep -E "Wasm.rules:44(1\|2\|3\|4)" rulelog \| wc -l 212 Updates #26498 Change-Id: I153684a2b98589ae812b42268da08b65679e09d1 Reviewed-on: https://go-review.googlesource.com/c/go/+/185477 Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Richard Musiol <neelance@gmail.com>	2019-08-28 05:55:52 +00:00
Agniva De Sarker	8fedb2d338	cmd/compile: optimize bounded shifts on wasm Use the shiftIsBounded function to generate more efficient Shift instructions. Updates #25167 Change-Id: Id350f8462dc3a7ed3bfed0bcbea2860b8f40048a Reviewed-on: https://go-review.googlesource.com/c/go/+/182558 Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Richard Musiol <neelance@gmail.com>	2019-08-28 04:44:21 +00:00
Ben Shi	22355d6cd2	cmd/compile: optimize 386's math.bits.TrailingZeros16 This CL optimizes math.bits.TrailingZeros16 on 386 with a pair of BSFL and ORL instrcutions. The case TrailingZeros16-4 of the benchmark test in math/bits shows big improvement. name old time/op new time/op delta TrailingZeros16-4 1.55ns ± 1% 0.87ns ± 1% -43.87% (p=0.000 n=50+49) Change-Id: Ia899975b0e46f45dcd20223b713ed632bc32740b Reviewed-on: https://go-review.googlesource.com/c/go/+/189277 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-08-28 02:29:54 +00:00
Ben Shi	731e6fc34e	cmd/compile: generate Select on WASM This CL performs the branchelim optimization on WASM with its select instruction. And the total size of pkg/js_wasm decreased about 80KB by this optimization. Change-Id: I868eb146120a1cac5c4609c8e9ddb07e4da8a1d9 Reviewed-on: https://go-review.googlesource.com/c/go/+/190957 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Richard Musiol <neelance@gmail.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-08-28 02:29:25 +00:00
LE Manh Cuong	c5f142fa9f	cmd/compile: optimize bitset tests The assembly output for x & c == c, where c is power of 2: MOVQ "".set+8(SP), AX ANDQ $8, AX CMPQ AX, $8 SETEQ "".~r2+24(SP) With optimization using bitset: MOVQ "".set+8(SP), AX BTL $3, AX SETCS "".~r2+24(SP) output less than 1 instruction. However, there is no speed improvement: name old time/op new time/op delta AllBitSet-8 0.35ns ± 0% 0.35ns ± 0% ~ (all equal) Fixes #31904 Change-Id: I5dca4e410bf45716ed2145e3473979ec997e35d4 Reviewed-on: https://go-review.googlesource.com/c/go/+/175957 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-08-27 18:01:16 +00:00
Keith Randall	8f296f59de	Revert "Revert "cmd/compile,runtime: allocate defer records on the stack"" This reverts CL 180761 Reason for revert: Reinstate the stack-allocated defer CL. There was nothing wrong with the CL proper, but stack allocation of defers exposed two other issues. Issue #32477: Fix has been submitted as CL 181258. Issue #32498: Possible fix is CL 181377 (not submitted yet). Change-Id: I32b3365d5026600069291b068bbba6cb15295eb3 Reviewed-on: https://go-review.googlesource.com/c/go/+/181378 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-06-10 16:19:39 +00:00
Keith Randall	49200e3f3e	Revert "cmd/compile,runtime: allocate defer records on the stack" This reverts commit `fff4f599fe`. Reason for revert: Seems to still have issues around GC. Fixes #32452 Change-Id: Ibe7af629f9ad6a3d5312acd7b066123f484da7f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/180761 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2019-06-05 19:50:09 +00:00
Keith Randall	fff4f599fe	cmd/compile,runtime: allocate defer records on the stack When a defer is executed at most once in a function body, we can allocate the defer record for it on the stack instead of on the heap. This should make defers like this (which are very common) faster. This optimization applies to 363 out of the 370 static defer sites in the cmd/go binary. name old time/op new time/op delta Defer-4 52.2ns ± 5% 36.2ns ± 3% -30.70% (p=0.000 n=10+10) Fixes #6980 Update #14939 Change-Id: I697109dd7aeef9e97a9eeba2ef65ff53d3ee1004 Reviewed-on: https://go-review.googlesource.com/c/go/+/171758 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2019-06-04 17:35:20 +00:00
fanzha02	822a9f537f	cmd/compile: fix the error of absorbing boolean tests into block(FGE, FGT) The CL 164718 mistyped the comparison flags. The rules for floating point comparison should be GreaterThanF and GreaterEqualF. Fortunately, the wrong optimizations were overwritten by other integer rules, so the issue won't cause failure but just some performance impact. The fixed CL optimizes the floating point test as follows. source code: func foo(f float64) bool { return f > 4 \|\| f < -4} previous version: "FCMPD", "CSET\tGT", "CBZ" fixed version: "FCMPD", BLE" Add the test case. Change-Id: Iea954fdbb8272b2d642dae0f816dc77286e6e1fa Reviewed-on: https://go-review.googlesource.com/c/go/+/177121 Reviewed-by: Ben Shi <powerman1st@163.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-05-16 13:46:25 +00:00
Michael Munday	2c1b5130aa	cmd/compile: add math/bits.{Add,Sub}64 intrinsics on s390x This CL adds intrinsics for the 64-bit addition and subtraction functions in math/bits. These intrinsics use the condition code to propagate the carry or borrow bit. To make the carry chains more efficient I've removed the 'clobberFlags' property from most of the load and store operations. Originally these ops did clobber flags when using offsets that didn't fit in a signed 20-bit integer, however that is no longer true. As with other platforms the intrinsics are faster when executed in a chain rather than a loop because currently we need to spill and restore the carry bit between each loop iteration. We may be able to reduce the need to do this on s390x (e.g. by using compare-and-branch instructions that do not clobber flags) in the future. name old time/op new time/op delta Add64 1.21ns ± 2% 2.03ns ± 2% +67.18% (p=0.000 n=7+10) Add64multiple 2.98ns ± 3% 1.03ns ± 0% -65.39% (p=0.000 n=10+9) Sub64 1.23ns ± 4% 2.03ns ± 1% +64.85% (p=0.000 n=10+10) Sub64multiple 3.73ns ± 4% 1.04ns ± 1% -72.28% (p=0.000 n=10+8) Change-Id: I913bbd5e19e6b95bef52f5bc4f14d6fe40119083 Reviewed-on: https://go-review.googlesource.com/c/go/+/174303 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-05-03 10:41:15 +00:00
Lynn Boger	e30aa166ea	test: enable more memcombine tests for ppc64le This enables more of the testcases in memcombine for ppc64le, and adds more detail to some existing. Change-Id: Ic522a1175bed682b546909c96f9ea758f8db247c Reviewed-on: https://go-review.googlesource.com/c/go/+/174737 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-05-01 14:27:27 +00:00
Brian Kessler	4d9dd35806	cmd/compile: add signed divisibility rules "Division by invariant integers using multiplication" paper by Granlund and Montgomery contains a method for directly computing divisibility (x%c == 0 for c constant) by means of the modular inverse. The method is further elaborated in "Hacker's Delight" by Warren Section 10-17 This general rule can compute divisibilty by one multiplication, and add and a compare for odd divisors and an additional rotate for even divisors. To apply the divisibility rule, we must take into account the rules to rewrite x%c = x-((x/c)c) and (x/c) for c constant on the first optimization pass "opt". This complicates the matching as we want to match only in the cases where the result of (x/c) is not also needed. So, we must match on the expanded form of (x/c) in the expression x == c(x/c) in the "late opt" pass after common subexpresion elimination. Note, that if there is an intermediate opt pass introduced in the future we could simplify these rules by delaying the magic division rewrite to "late opt" and matching directly on (x/c) in the intermediate opt pass. On amd64, the divisibility check is 30-45% faster. name old time/op new time/op delta` DivisiblePow2constI64-4 0.83ns ± 1% 0.82ns ± 0% ~ (p=0.079 n=5+4) DivisibleconstI64-4 2.68ns ± 1% 1.87ns ± 0% -30.33% (p=0.000 n=5+4) DivisibleWDivconstI64-4 2.69ns ± 1% 2.71ns ± 3% ~ (p=1.000 n=5+5) DivisiblePow2constI32-4 1.15ns ± 1% 1.15ns ± 0% ~ (p=0.238 n=5+4) DivisibleconstI32-4 2.24ns ± 1% 1.20ns ± 0% -46.48% (p=0.016 n=5+4) DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.683 n=5+5) DivisiblePow2constI16-4 0.81ns ± 1% 0.82ns ± 1% ~ (p=0.135 n=5+5) DivisibleconstI16-4 2.11ns ± 2% 1.20ns ± 1% -42.99% (p=0.008 n=5+5) DivisibleWDivconstI16-4 2.23ns ± 0% 2.27ns ± 2% +1.79% (p=0.029 n=4+4) DivisiblePow2constI8-4 0.81ns ± 1% 0.81ns ± 1% ~ (p=0.286 n=5+5) DivisibleconstI8-4 2.13ns ± 3% 1.19ns ± 1% -43.84% (p=0.008 n=5+5) DivisibleWDivconstI8-4 2.23ns ± 1% 2.25ns ± 1% ~ (p=0.183 n=5+5) Fixes #30282 Fixes #15806 Change-Id: Id20d78263a4fdfe0509229ae4dfa2fede83fc1d0 Reviewed-on: https://go-review.googlesource.com/c/go/+/173998 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-04-30 22:02:07 +00:00
Carlos Eduardo Seo	50ad09418e	cmd/compile: intrinsify math/bits.Add64 for ppc64x This change creates an intrinsic for Add64 for ppc64x and adds a testcase for it. name old time/op new time/op delta Add64-160 1.90ns ±40% 2.29ns ± 0% ~ (p=0.119 n=5+5) Add64multiple-160 6.69ns ± 2% 2.45ns ± 4% -63.47% (p=0.016 n=4+5) Change-Id: I9abe6fb023fdf62eea3c9b46a1820f60bb0a7f97 Reviewed-on: https://go-review.googlesource.com/c/go/+/173758 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>	2019-04-28 23:51:04 +00:00
Brian Kessler	a28a942768	cmd/compile: add unsigned divisibility rules "Division by invariant integers using multiplication" paper by Granlund and Montgomery contains a method for directly computing divisibility (x%c == 0 for c constant) by means of the modular inverse. The method is further elaborated in "Hacker's Delight" by Warren Section 10-17 This general rule can compute divisibilty by one multiplication and a compare for odd divisors and an additional rotate for even divisors. To apply the divisibility rule, we must take into account the rules to rewrite x%c = x-((x/c)c) and (x/c) for c constant on the first optimization pass "opt". This complicates the matching as we want to match only in the cases where the result of (x/c) is not also available. So, we must match on the expanded form of (x/c) in the expression x == c(x/c) in the "late opt" pass after common subexpresion elimination. Note, that if there is an intermediate opt pass introduced in the future we could simplify these rules by delaying the magic division rewrite to "late opt" and matching directly on (x/c) in the intermediate opt pass. Additional rules to lower the generic RotateLeft* ops were also applied. On amd64, the divisibility check is 25-50% faster. name old time/op new time/op delta DivconstI64-4 2.08ns ± 0% 2.08ns ± 1% ~ (p=0.881 n=5+5) DivisibleconstI64-4 2.67ns ± 0% 2.67ns ± 1% ~ (p=1.000 n=5+5) DivisibleWDivconstI64-4 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.683 n=5+5) DivconstU64-4 2.08ns ± 1% 2.08ns ± 1% ~ (p=1.000 n=5+5) DivisibleconstU64-4 2.77ns ± 1% 1.55ns ± 2% -43.90% (p=0.008 n=5+5) DivisibleWDivconstU64-4 2.99ns ± 1% 2.99ns ± 1% ~ (p=1.000 n=5+5) DivconstI32-4 1.53ns ± 2% 1.53ns ± 0% ~ (p=1.000 n=5+5) DivisibleconstI32-4 2.23ns ± 0% 2.25ns ± 3% ~ (p=0.167 n=5+5) DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.429 n=5+5) DivconstU32-4 1.78ns ± 0% 1.78ns ± 1% ~ (p=1.000 n=4+5) DivisibleconstU32-4 2.52ns ± 2% 1.26ns ± 0% -49.96% (p=0.000 n=5+4) DivisibleWDivconstU32-4 2.63ns ± 0% 2.85ns ±10% +8.29% (p=0.016 n=4+5) DivconstI16-4 1.54ns ± 0% 1.54ns ± 0% ~ (p=0.333 n=4+5) DivisibleconstI16-4 2.10ns ± 0% 2.10ns ± 1% ~ (p=0.571 n=4+5) DivisibleWDivconstI16-4 2.22ns ± 0% 2.23ns ± 1% ~ (p=0.556 n=4+5) DivconstU16-4 1.09ns ± 0% 1.01ns ± 1% -7.74% (p=0.000 n=4+5) DivisibleconstU16-4 1.83ns ± 0% 1.26ns ± 0% -31.52% (p=0.008 n=5+5) DivisibleWDivconstU16-4 1.88ns ± 0% 1.89ns ± 1% ~ (p=0.365 n=5+5) DivconstI8-4 1.54ns ± 1% 1.54ns ± 1% ~ (p=1.000 n=5+5) DivisibleconstI8-4 2.10ns ± 0% 2.11ns ± 0% ~ (p=0.238 n=5+4) DivisibleWDivconstI8-4 2.22ns ± 0% 2.23ns ± 2% ~ (p=0.762 n=5+5) DivconstU8-4 0.92ns ± 1% 0.94ns ± 1% +2.65% (p=0.008 n=5+5) DivisibleconstU8-4 1.66ns ± 0% 1.26ns ± 1% -24.28% (p=0.008 n=5+5) DivisibleWDivconstU8-4 1.79ns ± 0% 1.80ns ± 1% ~ (p=0.079 n=4+5) A follow-up change will address the signed division case. Updates #30282 Change-Id: I7e995f167179aa5c76bb10fbcbeb49c520943403 Reviewed-on: https://go-review.googlesource.com/c/go/+/168037 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-04-27 20:46:46 +00:00
Brian Kessler	44343c777c	cmd/compile: add signed divisibility by power of 2 rules For powers of two (c=1<<k), the divisibility check x%c == 0 can be made just by checking the trailing zeroes via a mask x&(c-1) == 0 even for signed integers. This avoids division fix-ups when just divisibility check is needed. To apply this rule, we match on the fixed-up version of the division. This is neccessary because the mod and division rewrite rules are already applied during the initial opt pass. The speed up on amd64 due to elimination of unneccessary fix-up code is ~55%: name old time/op new time/op delta DivconstI64-4 2.08ns ± 0% 2.09ns ± 1% ~ (p=0.730 n=5+5) DivisiblePow2constI64-4 1.78ns ± 1% 0.81ns ± 1% -54.66% (p=0.008 n=5+5) DivconstU64-4 2.08ns ± 0% 2.08ns ± 0% ~ (p=0.683 n=5+5) DivconstI32-4 1.53ns ± 0% 1.53ns ± 1% ~ (p=0.968 n=4+5) DivisiblePow2constI32-4 1.79ns ± 1% 0.81ns ± 1% -54.97% (p=0.008 n=5+5) DivconstU32-4 1.78ns ± 1% 1.80ns ± 2% ~ (p=0.206 n=5+5) DivconstI16-4 1.54ns ± 2% 1.54ns ± 0% ~ (p=0.238 n=5+4) DivisiblePow2constI16-4 1.78ns ± 0% 0.81ns ± 1% -54.72% (p=0.000 n=4+5) DivconstU16-4 1.00ns ± 5% 1.01ns ± 1% ~ (p=0.119 n=5+5) DivconstI8-4 1.54ns ± 0% 1.54ns ± 2% ~ (p=0.571 n=4+5) DivisiblePow2constI8-4 1.78ns ± 0% 0.82ns ± 8% -53.71% (p=0.008 n=5+5) DivconstU8-4 0.93ns ± 1% 0.93ns ± 1% ~ (p=0.643 n=5+5) A follow-up CL will address the general case of x%c == 0 for signed integers. Updates #15806 Change-Id: Iabadbbe369b6e0998c8ce85d038ebc236142e42a Reviewed-on: https://go-review.googlesource.com/c/go/+/173557 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-04-25 03:00:32 +00:00
Keith Randall	17615969b6	Revert "cmd/compile: add signed divisibility by power of 2 rules" This reverts CL 168038 (git `68819fb6d2`) Reason for revert: Doesn't work on 32 bit archs. Change-Id: Idec9098060dc65bc2f774c5383f0477f8eb63a3d Reviewed-on: https://go-review.googlesource.com/c/go/+/173442 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-04-23 21:23:18 +00:00
Brian Kessler	68819fb6d2	cmd/compile: add signed divisibility by power of 2 rules For powers of two (c=1<<k), the divisibility check x%c == 0 can be made just by checking the trailing zeroes via a mask x&(c-1)==0 even for signed integers. This avoids division fixups when just divisibility check is needed. To apply this rule the generic divisibility rule for A%B = A-(A/B*B) is disabled on the "opt" pass, but this does not affect generated code as this rule is applied later. The speed up on amd64 due to elimination of unneccessary fixup code is ~55%: name old time/op new time/op delta DivconstI64-4 2.08ns ± 0% 2.07ns ± 0% ~ (p=0.079 n=5+5) DivisiblePow2constI64-4 1.78ns ± 1% 0.81ns ± 1% -54.55% (p=0.008 n=5+5) DivconstU64-4 2.08ns ± 0% 2.08ns ± 0% ~ (p=1.000 n=5+5) DivconstI32-4 1.53ns ± 0% 1.53ns ± 0% ~ (all equal) DivisiblePow2constI32-4 1.79ns ± 1% 0.81ns ± 4% -54.75% (p=0.008 n=5+5) DivconstU32-4 1.78ns ± 1% 1.78ns ± 1% ~ (p=1.000 n=5+5) DivconstI16-4 1.54ns ± 2% 1.53ns ± 0% ~ (p=0.333 n=5+4) DivisiblePow2constI16-4 1.78ns ± 0% 0.79ns ± 1% -55.39% (p=0.000 n=4+5) DivconstU16-4 1.00ns ± 5% 0.99ns ± 1% ~ (p=0.730 n=5+5) DivconstI8-4 1.54ns ± 0% 1.53ns ± 0% ~ (p=0.714 n=4+5) DivisiblePow2constI8-4 1.78ns ± 0% 0.80ns ± 0% -55.06% (p=0.000 n=5+4) DivconstU8-4 0.93ns ± 1% 0.95ns ± 1% +1.72% (p=0.024 n=5+5) A follow-up CL will address the general case of x%c == 0 for signed integers. Updates #15806 Change-Id: I0d284863774b1bc8c4ce87443bbaec6103e14ef4 Reviewed-on: https://go-review.googlesource.com/c/go/+/168038 Reviewed-by: Keith Randall <khr@golang.org>	2019-04-23 20:35:54 +00:00
Keith Randall	fd788a86b6	cmd/compile: always mark atColumn1 results as statements In 31618, we end up comparing the is-stmt-ness of positions to repurpose real instructions as inline marks. If the is-stmt-ness doesn't match, we end up not being able to remove the inline mark. Always use statement-full positions to do the matching, so we always find a match if there is one. Also always use positions that are statements for inline marks. Fixes #31618 Change-Id: Idaf39bdb32fa45238d5cd52973cadf4504f947d5 Reviewed-on: https://go-review.googlesource.com/c/go/+/173324 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>	2019-04-23 17:39:11 +00:00
erifan01	f8f265b9cf	cmd/compile: intrinsify math/bits.Sub64 for arm64 This CL instrinsifies Sub64 with arm64 instruction sequence NEGS, SBCS, NGC and NEG, and optimzes the case of borrowing chains. Benchmarks: name old time/op new time/op delta Sub-64 2.500000ns +- 0% 2.048000ns +- 1% -18.08% (p=0.000 n=10+10) Sub32-64 2.500000ns +- 0% 2.500000ns +- 0% ~ (all equal) Sub64-64 2.500000ns +- 0% 2.080000ns +- 0% -16.80% (p=0.000 n=10+7) Sub64multiple-64 7.090000ns +- 0% 2.090000ns +- 0% -70.52% (p=0.000 n=10+10) Change-Id: I3d2664e009a9635e13b55d2c4567c7b34c2c0655 Reviewed-on: https://go-review.googlesource.com/c/go/+/159018 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-04-22 14:40:20 +00:00
Josh Bleecher Snyder	68d4b1265e	cmd/compile: reduce bits.Div64(0, lo, y) to 64 bit division With this change, these two functions generate identical code: func f(x uint64) (uint64, uint64) { return bits.Div64(0, x, 5) } func g(x uint64) (uint64, uint64) { return x / 5, x % 5 } Updates #31582 Change-Id: Ia96c2e67f8af5dd985823afee5f155608c04a4b6 Reviewed-on: https://go-review.googlesource.com/c/go/+/173197 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-04-20 19:34:03 +00:00
Keith Randall	48ef01051a	cmd/compile: handle new panicindex/slice names in optimizations These new calls should not prevent NOSPLIT promotion, like the old ones. These new calls should not prevent racefuncenter/exit removal. (The latter was already true, as the new calls are not yet lowered to StaticCalls at the point where racefuncenter/exit removal is done.) Add tests to make sure we don't regress (again). Fixes #31219 Change-Id: I3fb6b17cdd32c425829f1e2498defa813a5a9ace Reviewed-on: https://go-review.googlesource.com/c/go/+/170639 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2019-04-03 21:24:17 +00:00
erifan01	d0cbf9bf53	cmd/compile: follow up intrinsifying math/bits.Add64 for arm64 This CL deals with the additional comments of CL 159017. Change-Id: I4ad3c60c834646d58dc0c544c741b92bfe83fb8b Reviewed-on: https://go-review.googlesource.com/c/go/+/168857 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-03-22 15:09:47 +00:00
Carlos Eduardo Seo	3023d7da49	cmd/compile/internal, cmd/internal/obj/ppc64: generate new count trailing zeros instructions on POWER9 This change adds new POWER9 instructions for counting trailing zeros (CNTTZW/CNTTZD) to the assembler and generates them in SSA when GOPPC64=power9. name old time/op new time/op delta TrailingZeros-160 1.59ns ±20% 1.45ns ±10% -8.81% (p=0.000 n=14+13) TrailingZeros8-160 1.55ns ±23% 1.62ns ±44% ~ (p=0.593 n=13+15) TrailingZeros16-160 1.78ns ±23% 1.62ns ±38% -9.31% (p=0.003 n=14+14) TrailingZeros32-160 1.64ns ±10% 1.49ns ± 9% -9.15% (p=0.000 n=13+14) TrailingZeros64-160 1.53ns ± 6% 1.45ns ± 5% -5.38% (p=0.000 n=15+13) Change-Id: I365e6ff79f3ce4d8ebe089a6a86b1771853eb596 Reviewed-on: https://go-review.googlesource.com/c/go/+/167517 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2019-03-20 20:27:00 +00:00
erifan01	5714c91b53	cmd/compile: intrinsify math/bits.Add64 for arm64 This CL instrinsifies Add64 with arm64 instruction sequence ADDS, ADCS and ADC, and optimzes the case of carry chains.The CL also changes the test code so that the intrinsic implementation can be tested. Benchmarks: name old time/op new time/op delta Add-224 2.500000ns +- 0% 2.090000ns +- 4% -16.40% (p=0.000 n=9+10) Add32-224 2.500000ns +- 0% 2.500000ns +- 0% ~ (all equal) Add64-224 2.500000ns +- 0% 1.577778ns +- 2% -36.89% (p=0.000 n=10+9) Add64multiple-224 6.000000ns +- 0% 2.000000ns +- 0% -66.67% (p=0.000 n=10+10) Change-Id: I6ee91c9a85c16cc72ade5fd94868c579f16c7615 Reviewed-on: https://go-review.googlesource.com/c/go/+/159017 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-03-20 05:39:49 +00:00
Josh Bleecher Snyder	250b96a7bf	cmd/compile: slightly optimize adding 128 'SUBQ $-0x80, r' is shorter to encode than 'ADDQ $0x80, r', and functionally equivalent. Use it instead. Shaves off a few bytes here and there: file before after Δ % compile 25935856 25927664 -8192 -0.032% nm 4251840 4247744 -4096 -0.096% Change-Id: Ia9e02ea38cbded6a52a613b92e3a914f878d931e Reviewed-on: https://go-review.googlesource.com/c/go/+/168344 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-03-19 19:44:52 +00:00
Keith Randall	2c423f063b	cmd/compile,runtime: provide index information on bounds check failure A few examples (for accessing a slice of length 3): s[-1] runtime error: index out of range [-1] s[3] runtime error: index out of range [3] with length 3 s[-1:0] runtime error: slice bounds out of range [-1:] s[3:0] runtime error: slice bounds out of range [3:0] s[3:-1] runtime error: slice bounds out of range [:-1] s[3:4] runtime error: slice bounds out of range [:4] with capacity 3 s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3 Note that in cases where there are multiple things wrong with the indexes (e.g. s[3:-1]), we report one of those errors kind of arbitrarily, currently the rightmost one. An exhaustive set of examples is in issue30116[u].out in the CL. The message text has the same prefix as the old message text. That leads to slightly awkward phrasing but hopefully minimizes the chance that code depending on the error text will break. Increases the size of the go binary by 0.5% (amd64). The panic functions take arguments in registers in order to keep the size of the compiled code as small as possible. Fixes #30116 Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd Reviewed-on: https://go-review.googlesource.com/c/go/+/161477 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2019-03-18 17:33:38 +00:00
Tobias Klauser	156c830bea	cmd/compile: eliminate unnecessary type conversions in TrailingZeros(16\|8) for arm This follows CL 156999 which did the same for arm64. name old time/op new time/op delta TrailingZeros-4 7.30ns ± 1% 7.30ns ± 0% ~ (p=0.413 n=9+9) TrailingZeros8-4 8.32ns ± 0% 7.17ns ± 0% -13.77% (p=0.000 n=10+9) TrailingZeros16-4 8.30ns ± 0% 7.18ns ± 0% -13.50% (p=0.000 n=9+10) TrailingZeros32-4 6.46ns ± 1% 6.47ns ± 1% ~ (p=0.325 n=10+10) TrailingZeros64-4 16.3ns ± 0% 16.2ns ± 0% -0.61% (p=0.000 n=7+10) Change-Id: I7e9e1abf7e30d811aa474d272b2824ec7cbbaa98 Reviewed-on: https://go-review.googlesource.com/c/go/+/167797 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-03-15 18:37:22 +00:00
Richard Musiol	5ee1b84959	math, math/bits: add intrinsics for wasm This commit adds compiler intrinsics for the packages math and math/bits on the wasm architecture for better performance. benchmark old ns/op new ns/op delta BenchmarkCeil 8.31 3.21 -61.37% BenchmarkCopysign 5.24 3.88 -25.95% BenchmarkAbs 5.42 3.34 -38.38% BenchmarkFloor 8.29 3.18 -61.64% BenchmarkRoundToEven 9.76 3.26 -66.60% BenchmarkSqrtLatency 8.13 4.88 -39.98% BenchmarkSqrtPrime 5246 3535 -32.62% BenchmarkTrunc 8.29 3.15 -62.00% BenchmarkLeadingZeros 13.0 4.23 -67.46% BenchmarkLeadingZeros8 4.65 4.42 -4.95% BenchmarkLeadingZeros16 7.60 4.38 -42.37% BenchmarkLeadingZeros32 10.7 4.48 -58.13% BenchmarkLeadingZeros64 12.9 4.31 -66.59% BenchmarkTrailingZeros 6.52 4.04 -38.04% BenchmarkTrailingZeros8 4.57 4.14 -9.41% BenchmarkTrailingZeros16 6.69 4.16 -37.82% BenchmarkTrailingZeros32 6.97 4.23 -39.31% BenchmarkTrailingZeros64 6.59 4.00 -39.30% BenchmarkOnesCount 7.93 3.30 -58.39% BenchmarkOnesCount8 3.56 3.19 -10.39% BenchmarkOnesCount16 4.85 3.19 -34.23% BenchmarkOnesCount32 7.27 3.19 -56.12% BenchmarkOnesCount64 8.08 3.28 -59.41% BenchmarkRotateLeft 4.88 3.80 -22.13% BenchmarkRotateLeft64 5.03 3.63 -27.83% Change-Id: Ic1e0c2984878be8defb6eb7eb6ee63765c793222 Reviewed-on: https://go-review.googlesource.com/c/go/+/165177 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-03-14 19:46:19 +00:00
Josh Bleecher Snyder	61945fc502	cmd/compile: don't generate panicshift for masked int shifts We know that a & 31 is non-negative for all a, signed or not. We can avoid checking that and needing to write out an unreachable call to panicshift. Change-Id: I32f32fb2c950d2b2b35ac5c0e99b7b2dbd47f917 Reviewed-on: https://go-review.googlesource.com/c/go/+/167499 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>	2019-03-14 00:03:29 +00:00
Josh Bleecher Snyder	870cfe6484	test/codegen: gofmt Change-Id: I33f5b5051e5f75aa264ec656926223c5a3c09c1b Reviewed-on: https://go-review.googlesource.com/c/go/+/167498 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matt Layher <mdlayher@gmail.com>	2019-03-13 21:44:45 +00:00
Keith Randall	486ca37b14	test: fix memcombine tests Two tests (load_le_byte8_uint64_inv and load_be_byte8_uint64) pass but the generated code isn't actually correct. The test regexp provides a false negative, as it matches the MOVQ (SP), BP instruction in the epilogue. Combined loads never worked for these cases - the test was added in error as part of a batch and not noticed because of the above false match. Normalize the amd64/386 tests to always negative match on narrower loads and OR. Change-Id: I256861924774d39db0e65723866c81df5ab5076f Reviewed-on: https://go-review.googlesource.com/c/go/+/166837 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-03-11 19:18:03 +00:00
fanzha02	6efd51c6b7	cmd/compile: change the condition flags of floating-point comparisons in arm64 backend Current compiler reverses operands to work around NaN in "less than" and "less equal than" comparisons. But if we want to use "FCMPD/FCMPS $(0.0), Fn" to do some optimization, the workaround way does not work. Because assembler does not support instruction "FCMPD/FCMPS Fn, $(0.0)". This CL sets condition flags for floating-point comparisons to resolve this problem. Change-Id: Ia48076a1da95da64596d6e68304018cb301ebe33 Reviewed-on: https://go-review.googlesource.com/c/go/+/164718 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-03-07 21:23:52 +00:00
erifan01	4e2b0dda8c	cmd/compile: eliminate unnecessary type conversions in TrailingZeros(16\|8) for arm64 This CL eliminates unnecessary type conversion operations: OpZeroExt16to64 and OpZeroExt8to64. If the input argrument is a nonzero value, then ORconst operation can also be eliminated. Benchmarks: name old time/op new time/op delta TrailingZeros-8 2.75ns ± 0% 2.75ns ± 0% ~ (all equal) TrailingZeros8-8 3.49ns ± 1% 2.93ns ± 0% -16.00% (p=0.000 n=10+10) TrailingZeros16-8 3.49ns ± 1% 2.93ns ± 0% -16.05% (p=0.000 n=9+10) TrailingZeros32-8 2.67ns ± 1% 2.68ns ± 1% ~ (p=0.468 n=10+10) TrailingZeros64-8 2.67ns ± 1% 2.65ns ± 0% -0.62% (p=0.022 n=10+9) code: func f16(x uint) { z = bits.TrailingZeros16(uint16(x)) } Before: "".f16 STEXT size=48 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f16(SB), LEAF\|NOFRAME\|ABIInternal, $0-8 0x0000 00000 (test.go:7) FUNCDATA ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) PCDATA $2, ZR 0x0000 00000 (test.go:7) PCDATA ZR, ZR 0x0000 00000 (test.go:7) MOVD "".x(FP), R0 0x0004 00004 (test.go:7) MOVHU R0, R0 0x0008 00008 (test.go:7) ORR $65536, R0, R0 0x000c 00012 (test.go:7) RBIT R0, R0 0x0010 00016 (test.go:7) CLZ R0, R0 0x0014 00020 (test.go:7) MOVD R0, "".z(SB) 0x0020 00032 (test.go:7) RET (R30) This line of code is unnecessary: 0x0004 00004 (test.go:7) MOVHU R0, R0 After: "".f16 STEXT size=32 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f16(SB), LEAF\|NOFRAME\|ABIInternal, $0-8 0x0000 00000 (test.go:7) FUNCDATA ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) PCDATA $2, ZR 0x0000 00000 (test.go:7) PCDATA ZR, ZR 0x0000 00000 (test.go:7) MOVD "".x(FP), R0 0x0004 00004 (test.go:7) ORR $65536, R0, R0 0x0008 00008 (test.go:7) RBITW R0, R0 0x000c 00012 (test.go:7) CLZW R0, R0 0x0010 00016 (test.go:7) MOVD R0, "".z(SB) 0x001c 00028 (test.go:7) RET (R30) The situation of TrailingZeros8 is similar to TrailingZeros16. Change-Id: I473bdca06be8460a0be87abbae6fe640017e4c9d Reviewed-on: https://go-review.googlesource.com/c/go/+/156999 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-03-07 14:24:56 +00:00
erifan01	fee84cc905	cmd/compile: add an optimization rule for math/bits.ReverseBytes16 on arm This CL adds two rules to turn patterns like ((x<<8) \| (x>>8)) (the type of x is uint16, "\|" can also be "+" or "^") to a REV16 instruction on arm v6+. This optimization rule can be used for math/bits.ReverseBytes16. Benchmarks on arm v6: name old time/op new time/op delta ReverseBytes-32 2.86ns ± 0% 2.86ns ± 0% ~ (all equal) ReverseBytes16-32 2.86ns ± 0% 2.86ns ± 0% ~ (all equal) ReverseBytes32-32 1.29ns ± 0% 1.29ns ± 0% ~ (all equal) ReverseBytes64-32 1.43ns ± 0% 1.43ns ± 0% ~ (all equal) Change-Id: I819e633c9a9d308f8e476fb0c82d73fb73dd019f Reviewed-on: https://go-review.googlesource.com/c/go/+/159019 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-03-07 13:37:54 +00:00
erifan01	159b2de442	cmd/compile: optimize math/bits.Div32 for arm64 Benchmark: name old time/op new time/op delta Div-8 22.0ns ± 0% 22.0ns ± 0% ~ (all equal) Div32-8 6.51ns ± 0% 3.00ns ± 0% -53.90% (p=0.000 n=10+8) Div64-8 22.5ns ± 0% 22.5ns ± 0% ~ (all equal) Code: func div32(hi, lo, y uint32) (q, r uint32) {return bits.Div32(hi, lo, y)} Before: 0x0020 00032 (test.go:24) MOVWU "".y+8(FP), R0 0x0024 00036 ($GOROOT/src/math/bits/bits.go:472) CBZW R0, 132 0x0028 00040 ($GOROOT/src/math/bits/bits.go:472) MOVWU "".hi(FP), R1 0x002c 00044 ($GOROOT/src/math/bits/bits.go:472) CMPW R1, R0 0x0030 00048 ($GOROOT/src/math/bits/bits.go:472) BLS 96 0x0034 00052 ($GOROOT/src/math/bits/bits.go:475) MOVWU "".lo+4(FP), R2 0x0038 00056 ($GOROOT/src/math/bits/bits.go:475) ORR R1<<32, R2, R1 0x003c 00060 ($GOROOT/src/math/bits/bits.go:476) CBZ R0, 140 0x0040 00064 ($GOROOT/src/math/bits/bits.go:476) UDIV R0, R1, R2 0x0044 00068 (test.go:24) MOVW R2, "".q+16(FP) 0x0048 00072 ($GOROOT/src/math/bits/bits.go:476) UREM R0, R1, R0 0x0050 00080 (test.go:24) MOVW R0, "".r+20(FP) 0x0054 00084 (test.go:24) MOVD -8(RSP), R29 0x0058 00088 (test.go:24) MOVD.P 32(RSP), R30 0x005c 00092 (test.go:24) RET (R30) After: 0x001c 00028 (test.go:24) MOVWU "".y+8(FP), R0 0x0020 00032 (test.go:24) CBZW R0, 92 0x0024 00036 (test.go:24) MOVWU "".hi(FP), R1 0x0028 00040 (test.go:24) CMPW R0, R1 0x002c 00044 (test.go:24) BHS 84 0x0030 00048 (test.go:24) MOVWU "".lo+4(FP), R2 0x0034 00052 (test.go:24) ORR R1<<32, R2, R4 0x0038 00056 (test.go:24) UDIV R0, R4, R3 0x003c 00060 (test.go:24) MSUB R3, R4, R0, R4 0x0040 00064 (test.go:24) MOVW R3, "".q+16(FP) 0x0044 00068 (test.go:24) MOVW R4, "".r+20(FP) 0x0048 00072 (test.go:24) MOVD -8(RSP), R29 0x004c 00076 (test.go:24) MOVD.P 16(RSP), R30 0x0050 00080 (test.go:24) RET (R30) UREM instruction in the previous assembly code will be converted to UDIV and MSUB instructions on arm64. However the UDIV instruction in UREM is unnecessary, because it's a duplicate of the previous UDIV. This CL adds a rule to have this extra UDIV instruction removed by CSE. Change-Id: Ie2508784320020b2de022806d09f75a7871bb3d7 Reviewed-on: https://go-review.googlesource.com/c/159577 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Bryan C. Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-03-03 20:20:10 +00:00
erifan01	192b675f17	cmd/compile: add an optimaztion rule for math/bits.ReverseBytes16 on arm64 On amd64 ReverseBytes16 is lowered to a rotate instruction. However arm64 doesn't have 16-bit rotate instruction, but has a REV16W instruction which can be used for ReverseBytes16. This CL adds a rule to turn the patterns like (x<<8) \| (x>>8) (the type of x is uint16, and "\|" can also be "^" or "+") to a REV16W instruction. Code: func reverseBytes16(i uint16) uint16 { return bits.ReverseBytes16(i) } Before: 0x0004 00004 (test.go:6) MOVHU "".i(FP), R0 0x0008 00008 ($GOROOT/src/math/bits/bits.go:262) UBFX $8, R0, $8, R1 0x000c 00012 ($GOROOT/src/math/bits/bits.go:262) ORR R0<<8, R1, R0 0x0010 00016 (test.go:6) MOVH R0, "".~r1+8(FP) 0x0014 00020 (test.go:6) RET (R30) After: 0x0000 00000 (test.go:6) MOVHU "".i(FP), R0 0x0004 00004 (test.go:6) REV16W R0, R0 0x0008 00008 (test.go:6) MOVH R0, "".~r1+8(FP) 0x000c 00012 (test.go:6) RET (R30) Benchmarks: name old time/op new time/op delta ReverseBytes-224 1.000000ns +- 0% 1.000000ns +- 0% ~ (all equal) ReverseBytes16-224 1.500000ns +- 0% 1.000000ns +- 0% -33.33% (p=0.000 n=9+10) ReverseBytes32-224 1.000000ns +- 0% 1.000000ns +- 0% ~ (all equal) ReverseBytes64-224 1.000000ns +- 0% 1.000000ns +- 0% ~ (all equal) Change-Id: I87cd41b2d8e549bf39c601f185d5775bd42d739c Reviewed-on: https://go-review.googlesource.com/c/157757 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-03-01 15:42:19 +00:00
erifan01	dd91269b7c	cmd/compile: optimize math/bits Len32 intrinsic on arm64 Arm64 has a 32-bit CLZ instruction CLZW, which can be used for intrinsic Len32. Function LeadingZeros32 calls Len32, with this change, the assembly code of LeadingZeros32 becomes more concise. Go code: func f32(x uint32) { z = bits.LeadingZeros32(x) } Before: "".f32 STEXT size=32 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f32(SB), LEAF\|NOFRAME\|ABIInternal, $0-8 0x0004 00004 (test.go:7) MOVWU "".x(FP), R0 0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZ R0, R0 0x000c 00012 ($GOROOT/src/math/bits/bits.go:30) SUB $32, R0, R0 0x0010 00016 (test.go:7) MOVD R0, "".z(SB) 0x001c 00028 (test.go:7) RET (R30) After: "".f32 STEXT size=32 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f32(SB), LEAF\|NOFRAME\|ABIInternal, $0-8 0x0004 00004 (test.go:7) MOVWU "".x(FP), R0 0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZW R0, R0 0x000c 00012 (test.go:7) MOVD R0, "".z(SB) 0x0018 00024 (test.go:7) RET (R30) Benchmarks: name old time/op new time/op delta LeadingZeros-8 2.53ns ± 0% 2.55ns ± 0% +0.67% (p=0.000 n=10+10) LeadingZeros8-8 3.56ns ± 0% 3.56ns ± 0% ~ (all equal) LeadingZeros16-8 3.55ns ± 0% 3.56ns ± 0% ~ (p=0.465 n=10+10) LeadingZeros32-8 3.55ns ± 0% 2.96ns ± 0% -16.71% (p=0.000 n=10+7) LeadingZeros64-8 2.53ns ± 0% 2.54ns ± 0% ~ (p=0.059 n=8+10) Change-Id: Ie5666bb82909e341060e02ffd4e86c0e5d67e90a Reviewed-on: https://go-review.googlesource.com/c/157000 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-02-27 16:09:33 +00:00
Iskander Sharipov	c1050a8e54	cmd/compile: don't generate newobject call for 0-sized types Emit &runtime.zerobase instead of a call to newobject for allocations of zero sized objects in walk.go. Fixes #29446 Change-Id: I11b67981d55009726a17c2e582c12ce0c258682e Reviewed-on: https://go-review.googlesource.com/c/155840 Run-TryBot: Iskander Sharipov <quasilyte@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2019-02-26 23:08:15 +00:00
Keith Randall	933e34ac99	cmd/compile: treat slice pointers as non-nil var a []int = ... p := &a[0] _ = *p We don't need to nil check on the 3rd line. If the bounds check on the 2nd line passes, we know p is non-nil. We rely on the fact that any cap>0 slice has a non-nil pointer as its pointer to the backing array. This is true for all safely-constructed slices, and I don't see any reason why someone would violate this rule using unsafe. R=go1.13 Fixes #30366 Change-Id: I3ed764fcb72cfe1fbf963d8c1a82e24e3b6dead7 Reviewed-on: https://go-review.googlesource.com/c/163740 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2019-02-26 20:44:52 +00:00

... 2 3 4 5 6 ...

483 Commits