qbit/go - go - Tape:neT

mirror of https://github.com/golang/go synced 2024-11-07 08:46:19 -07:00

Author	SHA1	Message	Date
Lynn Boger	28edaf4584	cmd/compile,test: combine byte loads and stores on ppc64le CL 74410 added rules to combine consecutive byte loads and stores when the byte order was little endian for ppc64le. This is the corresponding change for bytes that are in big endian order. These rules are all intended for a little endian target arch. This adds new testcases in test/codegen/memcombine.go Fixes #22496 Updates #24242 Benchmark improvement for encoding/binary: name old time/op new time/op delta ReadSlice1000Int32s-16 11.0µs ± 0% 9.0µs ± 0% -17.47% (p=0.029 n=4+4) ReadStruct-16 2.47µs ± 1% 2.48µs ± 0% +0.67% (p=0.114 n=4+4) ReadInts-16 642ns ± 1% 630ns ± 1% -2.02% (p=0.029 n=4+4) WriteInts-16 654ns ± 0% 653ns ± 1% -0.08% (p=0.629 n=4+4) WriteSlice1000Int32s-16 8.75µs ± 0% 8.20µs ± 0% -6.19% (p=0.029 n=4+4) PutUint16-16 1.16ns ± 0% 0.93ns ± 0% -19.83% (p=0.029 n=4+4) PutUint32-16 1.16ns ± 0% 0.93ns ± 0% -19.83% (p=0.029 n=4+4) PutUint64-16 1.85ns ± 0% 0.93ns ± 0% -49.73% (p=0.029 n=4+4) LittleEndianPutUint16-16 1.03ns ± 0% 0.93ns ± 0% -9.71% (p=0.029 n=4+4) LittleEndianPutUint32-16 0.93ns ± 0% 0.93ns ± 0% ~ (all equal) LittleEndianPutUint64-16 0.93ns ± 0% 0.93ns ± 0% ~ (all equal) PutUvarint32-16 43.0ns ± 0% 43.1ns ± 0% +0.12% (p=0.429 n=4+4) PutUvarint64-16 174ns ± 0% 175ns ± 0% +0.29% (p=0.429 n=4+4) Updates made to functions in gcm.go to enable their matching. An existing testcase prevents these functions from being replaced by those in encoding/binary due to import dependencies. Change-Id: Idb3bd1e6e7b12d86cd828fb29cb095848a3e485a Reviewed-on: https://go-review.googlesource.com/98136 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-05-08 13:15:39 +00:00
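The source pattern targeted by this kind of rule can be sketched as below. This is only an illustration of byte-at-a-time big-endian access written in plain Go; the helper names `beUint32` and `bePutUint32` are hypothetical, and whether the compiler actually merges the accesses depends on the toolchain version and target architecture.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// beUint32 (hypothetical name) assembles a uint32 from four consecutive
// bytes stored in big-endian order, one byte load at a time.
func beUint32(b []byte) uint32 {
	_ = b[3] // one bounds check up front covers all four accesses
	return uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24
}

// bePutUint32 (hypothetical name) stores v into b in big-endian order,
// one byte store at a time.
func bePutUint32(b []byte, v uint32) {
	_ = b[3]
	b[0] = byte(v >> 24)
	b[1] = byte(v >> 16)
	b[2] = byte(v >> 8)
	b[3] = byte(v)
}

func main() {
	buf := []byte{0x12, 0x34, 0x56, 0x78}
	// The hand-written form agrees with the standard library helper.
	fmt.Println(beUint32(buf) == binary.BigEndian.Uint32(buf))

	out := make([]byte, 4)
	bePutUint32(out, 0x12345678)
	fmt.Printf("% x\n", out)
}
```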
Michael Munday	f31a18ded4	cmd/compile: add some generic composite type optimizations Propagate values through some wide Zero/Move operations. Among other things this allows us to optimize some kinds of array initialization. For example, the following code no longer requires a temporary be allocated on the stack. Instead it writes the values directly into the return value. func f(i uint32) [4]uint32 { return [4]uint32{i, i+1, i+2, i+3} } The return value is unnecessarily cleared but removing that is probably a task for dead store analysis (I think it needs to be able to match multiple Store ops to wide Zero ops). In order to reliably remove stack variables that are rendered unnecessary by these new rules I've added a new generic version of the unread autos elimination pass. These rules are triggered more than 5000 times when building and testing the standard library. Updates #15925 (fixes for arrays of up to 4 elements). Updates #24386 (fixes for up to 4 kept elements). Updates #24416. compilebench results: name old time/op new time/op delta Template 353ms ± 5% 359ms ± 3% ~ (p=0.143 n=10+10) Unicode 219ms ± 1% 217ms ± 4% ~ (p=0.740 n=7+10) GoTypes 1.26s ± 1% 1.26s ± 2% ~ (p=0.549 n=9+10) Compiler 6.00s ± 1% 6.08s ± 1% +1.42% (p=0.000 n=9+8) SSA 15.3s ± 2% 15.6s ± 1% +2.43% (p=0.000 n=10+10) Flate 237ms ± 2% 240ms ± 2% +1.31% (p=0.015 n=10+10) GoParser 285ms ± 1% 285ms ± 1% ~ (p=0.878 n=8+8) Reflect 797ms ± 3% 807ms ± 2% ~ (p=0.065 n=9+10) Tar 334ms ± 0% 335ms ± 4% ~ (p=0.460 n=8+10) XML 419ms ± 0% 423ms ± 1% +0.91% (p=0.001 n=7+9) StdCmd 46.0s ± 0% 46.4s ± 0% +0.85% (p=0.000 n=9+9) name old user-time/op new user-time/op delta Template 337ms ± 3% 346ms ± 5% ~ (p=0.053 n=9+10) Unicode 205ms ±10% 205ms ± 8% ~ (p=1.000 n=10+10) GoTypes 1.22s ± 2% 1.21s ± 3% ~ (p=0.436 n=10+10) Compiler 5.85s ± 1% 5.93s ± 0% +1.46% (p=0.000 n=10+8) SSA 14.9s ± 1% 15.3s ± 1% +2.62% (p=0.000 n=10+10) Flate 229ms ± 4% 228ms ± 6% ~ (p=0.796 n=10+10) GoParser 271ms ± 3% 275ms ± 4% ~ (p=0.165 
n=10+10) Reflect 779ms ± 5% 775ms ± 2% ~ (p=0.971 n=10+10) Tar 317ms ± 4% 319ms ± 5% ~ (p=0.853 n=10+10) XML 404ms ± 4% 409ms ± 5% ~ (p=0.436 n=10+10) name old alloc/op new alloc/op delta Template 34.9MB ± 0% 35.0MB ± 0% +0.26% (p=0.000 n=10+10) Unicode 29.3MB ± 0% 29.3MB ± 0% +0.02% (p=0.000 n=10+10) GoTypes 115MB ± 0% 115MB ± 0% +0.30% (p=0.000 n=10+10) Compiler 519MB ± 0% 521MB ± 0% +0.30% (p=0.000 n=10+10) SSA 1.55GB ± 0% 1.57GB ± 0% +1.34% (p=0.000 n=10+9) Flate 24.1MB ± 0% 24.2MB ± 0% +0.10% (p=0.000 n=10+10) GoParser 28.1MB ± 0% 28.1MB ± 0% +0.07% (p=0.000 n=10+10) Reflect 78.7MB ± 0% 78.7MB ± 0% +0.03% (p=0.000 n=8+10) Tar 34.4MB ± 0% 34.5MB ± 0% +0.12% (p=0.000 n=10+10) XML 43.2MB ± 0% 43.2MB ± 0% +0.13% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 330k ± 0% 330k ± 0% -0.01% (p=0.017 n=10+10) Unicode 337k ± 0% 337k ± 0% +0.01% (p=0.000 n=9+10) GoTypes 1.15M ± 0% 1.15M ± 0% +0.03% (p=0.000 n=10+10) Compiler 4.77M ± 0% 4.77M ± 0% +0.03% (p=0.000 n=9+10) SSA 12.5M ± 0% 12.6M ± 0% +1.16% (p=0.000 n=10+10) Flate 221k ± 0% 221k ± 0% +0.05% (p=0.000 n=9+10) GoParser 275k ± 0% 275k ± 0% +0.01% (p=0.014 n=10+9) Reflect 944k ± 0% 944k ± 0% -0.02% (p=0.000 n=10+10) Tar 324k ± 0% 323k ± 0% -0.12% (p=0.000 n=10+10) XML 384k ± 0% 384k ± 0% -0.01% (p=0.001 n=10+10) name old object-bytes new object-bytes delta Template 476kB ± 0% 476kB ± 0% -0.04% (p=0.000 n=10+10) Unicode 218kB ± 0% 218kB ± 0% ~ (all equal) GoTypes 1.58MB ± 0% 1.58MB ± 0% -0.04% (p=0.000 n=10+10) Compiler 6.25MB ± 0% 6.24MB ± 0% -0.09% (p=0.000 n=10+10) SSA 15.9MB ± 0% 16.1MB ± 0% +1.22% (p=0.000 n=10+10) Flate 304kB ± 0% 304kB ± 0% -0.13% (p=0.000 n=10+10) GoParser 370kB ± 0% 370kB ± 0% -0.00% (p=0.000 n=10+10) Reflect 1.27MB ± 0% 1.27MB ± 0% -0.12% (p=0.000 n=10+10) Tar 421kB ± 0% 419kB ± 0% -0.64% (p=0.000 n=10+10) XML 518kB ± 0% 517kB ± 0% -0.12% (p=0.000 n=10+10) name old export-bytes new export-bytes delta Template 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) Unicode 6.52kB ± 0% 
6.52kB ± 0% ~ (all equal) GoTypes 29.2kB ± 0% 29.2kB ± 0% ~ (all equal) Compiler 88.0kB ± 0% 88.0kB ± 0% ~ (all equal) SSA 109kB ± 0% 109kB ± 0% ~ (all equal) Flate 4.49kB ± 0% 4.49kB ± 0% ~ (all equal) GoParser 8.10kB ± 0% 8.10kB ± 0% ~ (all equal) Reflect 7.71kB ± 0% 7.71kB ± 0% ~ (all equal) Tar 9.15kB ± 0% 9.15kB ± 0% ~ (all equal) XML 12.3kB ± 0% 12.3kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 676kB ± 0% 672kB ± 0% -0.59% (p=0.000 n=10+10) CmdGoSize 7.26MB ± 0% 7.24MB ± 0% -0.18% (p=0.000 n=10+10) name old data-bytes new data-bytes delta HelloSize 10.2kB ± 0% 10.2kB ± 0% ~ (all equal) CmdGoSize 248kB ± 0% 248kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal) CmdGoSize 145kB ± 0% 145kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.46MB ± 0% 1.45MB ± 0% -0.31% (p=0.000 n=10+10) CmdGoSize 14.7MB ± 0% 14.7MB ± 0% -0.17% (p=0.000 n=10+10) Change-Id: Ic72b0c189dd542f391e1c9ab88a76e9148dc4285 Reviewed-on: https://go-review.googlesource.com/106495 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-05-08 10:31:21 +00:00
Ben Shi	098ca846c7	cmd/compile: emit more compact 386 instructions ADDL/SUBL/ANDL/ORL/XORL can have a memory operand as destination, and this CL optimizes the compiler to emit such instructions on 386 for a more compact binary. Here is the test report: 1. The total size of pkg/linux_386/ and pkg/tool/linux_386/ decreases about 14KB. (pkg/linux_386/cmd/compile/ and pkg/tool/linux_386/compile are excluded) 2. The go1 benchmark shows little change, excluding ±2% noise. name old time/op new time/op delta BinaryTree17-4 3.34s ± 2% 3.38s ± 2% +1.27% (p=0.000 n=40+39) Fannkuch11-4 3.55s ± 1% 3.51s ± 1% -1.33% (p=0.000 n=40+40) FmtFprintfEmpty-4 46.3ns ± 3% 46.9ns ± 4% +1.41% (p=0.002 n=40+40) FmtFprintfString-4 80.8ns ± 3% 80.4ns ± 6% -0.54% (p=0.044 n=40+40) FmtFprintfInt-4 93.0ns ± 3% 92.2ns ± 4% -0.88% (p=0.007 n=39+40) FmtFprintfIntInt-4 144ns ± 5% 145ns ± 2% +0.78% (p=0.015 n=40+40) FmtFprintfPrefixedInt-4 184ns ± 2% 182ns ± 2% -1.06% (p=0.004 n=40+40) FmtFprintfFloat-4 415ns ± 4% 419ns ± 4% ~ (p=0.434 n=40+40) FmtManyArgs-4 615ns ± 3% 619ns ± 3% ~ (p=0.100 n=40+40) GobDecode-4 7.30ms ± 6% 7.36ms ± 6% ~ (p=0.074 n=40+40) GobEncode-4 7.10ms ± 6% 7.21ms ± 5% ~ (p=0.082 n=40+39) Gzip-4 364ms ± 3% 362ms ± 6% -0.71% (p=0.020 n=40+40) Gunzip-4 42.4ms ± 3% 42.2ms ± 3% ~ (p=0.303 n=40+40) HTTPClientServer-4 62.9µs ± 1% 62.9µs ± 1% ~ (p=0.768 n=38+39) JSONEncode-4 21.4ms ± 4% 21.5ms ± 5% ~ (p=0.210 n=40+40) JSONDecode-4 67.7ms ± 3% 67.9ms ± 4% ~ (p=0.713 n=40+40) Mandelbrot200-4 5.18ms ± 3% 5.21ms ± 3% +0.59% (p=0.021 n=40+40) GoParse-4 3.35ms ± 3% 3.34ms ± 2% ~ (p=0.996 n=40+40) RegexpMatchEasy0_32-4 98.5ns ± 5% 96.3ns ± 4% -2.15% (p=0.001 n=40+40) RegexpMatchEasy0_1K-4 851ns ± 4% 850ns ± 5% ~ (p=0.700 n=40+40) RegexpMatchEasy1_32-4 105ns ± 7% 107ns ± 4% +1.50% (p=0.017 n=40+40) RegexpMatchEasy1_1K-4 1.03µs ± 5% 1.03µs ± 4% ~ (p=0.992 n=40+40) RegexpMatchMedium_32-4 130ns ± 6% 128ns ± 4% -1.66% (p=0.012 n=40+40) RegexpMatchMedium_1K-4 44.0µs ± 5% 43.6µs ± 3% ~ (p=0.704 n=40+40) 
RegexpMatchHard_32-4 2.29µs ± 3% 2.23µs ± 4% -2.38% (p=0.000 n=40+40) RegexpMatchHard_1K-4 69.0µs ± 3% 68.1µs ± 3% -1.28% (p=0.003 n=40+40) Revcomp-4 1.85s ± 2% 1.87s ± 3% +1.11% (p=0.000 n=40+40) Template-4 69.8ms ± 3% 69.6ms ± 3% ~ (p=0.125 n=40+40) TimeParse-4 442ns ± 5% 440ns ± 3% ~ (p=0.585 n=40+40) TimeFormat-4 419ns ± 3% 420ns ± 3% ~ (p=0.824 n=40+40) [Geo mean] 67.3µs 67.2µs -0.11% name old speed new speed delta GobDecode-4 105MB/s ± 6% 104MB/s ± 6% ~ (p=0.074 n=40+40) GobEncode-4 108MB/s ± 7% 107MB/s ± 5% ~ (p=0.080 n=40+39) Gzip-4 53.3MB/s ± 3% 53.7MB/s ± 6% +0.73% (p=0.021 n=40+40) Gunzip-4 458MB/s ± 3% 460MB/s ± 3% ~ (p=0.301 n=40+40) JSONEncode-4 90.8MB/s ± 4% 90.3MB/s ± 4% ~ (p=0.213 n=40+40) JSONDecode-4 28.7MB/s ± 3% 28.6MB/s ± 4% ~ (p=0.679 n=40+40) GoParse-4 17.3MB/s ± 3% 17.3MB/s ± 2% ~ (p=1.000 n=40+40) RegexpMatchEasy0_32-4 325MB/s ± 5% 333MB/s ± 4% +2.44% (p=0.000 n=40+38) RegexpMatchEasy0_1K-4 1.20GB/s ± 4% 1.21GB/s ± 5% ~ (p=0.684 n=40+40) RegexpMatchEasy1_32-4 303MB/s ± 7% 298MB/s ± 4% -1.52% (p=0.022 n=40+40) RegexpMatchEasy1_1K-4 995MB/s ± 5% 996MB/s ± 4% ~ (p=0.996 n=40+40) RegexpMatchMedium_32-4 7.67MB/s ± 6% 7.80MB/s ± 4% +1.68% (p=0.011 n=40+40) RegexpMatchMedium_1K-4 23.3MB/s ± 5% 23.5MB/s ± 3% ~ (p=0.697 n=40+40) RegexpMatchHard_32-4 14.0MB/s ± 3% 14.3MB/s ± 4% +2.43% (p=0.000 n=40+40) RegexpMatchHard_1K-4 14.8MB/s ± 3% 15.0MB/s ± 3% +1.30% (p=0.003 n=40+40) Revcomp-4 137MB/s ± 2% 136MB/s ± 3% -1.10% (p=0.000 n=40+40) Template-4 27.8MB/s ± 3% 27.9MB/s ± 3% ~ (p=0.128 n=40+40) [Geo mean] 79.6MB/s 79.9MB/s +0.28% Change-Id: I02a3efc125dc81e18fc8495eb2bf1bba59ab8733 Reviewed-on: https://go-review.googlesource.com/110157 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2018-05-08 06:44:54 +00:00
Martin Möhrmann	b9a59d9f2e	cmd/compile: optimize len([]rune(string)) Adds a new runtime function to count runes in a string. Modifies the compiler to detect the pattern len([]rune(string)) and replaces it with the new rune counting runtime function. RuneCount/lenruneslice/ASCII 27.8ns ± 2% 14.5ns ± 3% -47.70% (p=0.000 n=10+10) RuneCount/lenruneslice/Japanese 126ns ± 2% 60ns ± 2% -52.03% (p=0.000 n=10+10) RuneCount/lenruneslice/MixedLength 104ns ± 2% 50ns ± 1% -51.71% (p=0.000 n=10+9) Fixes #24923 Change-Id: Ie9c7e7391a4e2cca675c5cdcc1e5ce7d523948b9 Reviewed-on: https://go-review.googlesource.com/108985 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2018-05-06 05:31:01 +00:00
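A minimal sketch of the source pattern this commit recognizes. The helper name `countRunes` is hypothetical; the point is only that `len([]rune(s))` and `utf8.RuneCountInString(s)` give the same count, so the conversion to a rune slice never has to be materialized for the result to be correct.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countRunes (hypothetical name) uses the len([]rune(s)) idiom described
// in the commit message.
func countRunes(s string) int {
	return len([]rune(s))
}

func main() {
	for _, s := range []string{"hello", "日本語", "aé日"} {
		// Print the rune count by both methods next to the byte length.
		fmt.Println(countRunes(s), utf8.RuneCountInString(s), len(s))
	}
}
```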
Martin Möhrmann	a8a60ac2a7	cmd/compile: optimize append(x, make([]T, y)...) slice extension Changes the compiler to recognize the slice extension pattern append(x, make([]T, y)...) and replace it with growslice and an optional memclr to avoid an allocation for make([]T, y). Memclr is not called in case growslice already allocated a new cleared backing array when T contains pointers. amd64: name old time/op new time/op delta ExtendSlice/IntSlice 103ns ± 4% 57ns ± 4% -44.55% (p=0.000 n=18+18) ExtendSlice/PointerSlice 155ns ± 3% 77ns ± 3% -49.93% (p=0.000 n=20+20) ExtendSlice/NoGrow 50.2ns ± 3% 5.2ns ± 2% -89.67% (p=0.000 n=18+18) name old alloc/op new alloc/op delta ExtendSlice/IntSlice 64.0B ± 0% 32.0B ± 0% -50.00% (p=0.000 n=20+20) ExtendSlice/PointerSlice 64.0B ± 0% 32.0B ± 0% -50.00% (p=0.000 n=20+20) ExtendSlice/NoGrow 32.0B ± 0% 0.0B -100.00% (p=0.000 n=20+20) name old allocs/op new allocs/op delta ExtendSlice/IntSlice 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=20+20) ExtendSlice/PointerSlice 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=20+20) ExtendSlice/NoGrow 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Fixes #21266 Change-Id: Idc3077665f63cbe89762b590c5967a864fd1c07f Reviewed-on: https://go-review.googlesource.com/109517 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2018-05-06 04:28:23 +00:00
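The slice extension pattern can be sketched as follows. The helper name `extend` is hypothetical. The second half of `main` shows why the added elements must be cleared even when no new backing array is allocated: spare capacity may hold stale values.

```go
package main

import "fmt"

// extend (hypothetical name) grows x by n zero-valued elements using the
// append(x, make([]T, n)...) pattern from the commit message.
func extend(x []int, n int) []int {
	return append(x, make([]int, n)...)
}

func main() {
	fmt.Println(extend([]int{1, 2, 3}, 2))

	// Spare capacity holding stale data: the extension must still be zero.
	buf := make([]int, 5, 8)
	for i := range buf {
		buf[i] = 9
	}
	short := buf[:3]
	fmt.Println(extend(short, 2))
}
```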
Martin Möhrmann	500d79c410	cmd/compile: refactor memclrrange for arrays and slices Rename memclrrange to signify that it does not handle all types of range clears. Simplify checks to detect the range clear idiom for arrays and slices. Add tests to verify the optimization for the slice range clear idiom is being applied by the compiler. Change-Id: I5c3b7c9a479699ebdb4c407fde692f30f377860c Reviewed-on: https://go-review.googlesource.com/110477 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-05-02 04:20:25 +00:00
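For reference, the range clear idiom mentioned here has roughly the following shape. The function name `clearInts` is hypothetical, and whether a given compiler version turns the loop into a single memory clear is not something this sketch can show; it only demonstrates the loop form that such a check looks for.

```go
package main

import "fmt"

// clearInts (hypothetical name) zeroes every element of s with the
// "for i := range s { s[i] = zero value }" idiom.
func clearInts(s []int) {
	for i := range s {
		s[i] = 0
	}
}

func main() {
	s := []int{4, 5, 6}
	clearInts(s)
	fmt.Println(s)
}
```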
Matthew Dempsky	004260afde	cmd/compile: open code select{send,recv,default} Registration now looks like: var cases [4]runtime.scases var order [8]uint16 cases[0].kind = caseSend cases[0].c = c1 cases[0].elem = &v1 if raceenabled || msanenabled { selectsetpc(&cases[0]) } cases[1].kind = caseRecv cases[1].c = c2 cases[1].elem = &v2 if raceenabled || msanenabled { selectsetpc(&cases[1]) } ... Change-Id: Ib9bcf426a4797fe4bfd8152ca9e6e08e39a70b48 Reviewed-on: https://go-review.googlesource.com/37934 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2018-05-01 03:17:44 +00:00
Matthew Dempsky	3aa53b3135	runtime: eliminate runtime.hselect Now the registration phase looks like: var cases [4]runtime.scases var order [8]uint16 selectsend(&cases[0], c1, &v1) selectrecv(&cases[1], c2, &v2, nil) selectrecv(&cases[2], c3, &v3, &ok) selectdefault(&cases[3]) chosen := selectgo(&cases[0], &order[0], 4) Primarily, this is just preparation for having the compiler open-code selectsend, selectrecv, and selectdefault. As a minor benefit, order can now be laid out separately on the stack in the pointer-free segment, so it won't take up space in the function's stack pointer maps. Change-Id: I5552ba594201efd31fcb40084da20b42ea569a45 Reviewed-on: https://go-review.googlesource.com/37933 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2018-05-01 03:17:31 +00:00
Richard Musiol	e3c684777a	all: skip unsupported tests for js/wasm The general policy for the current state of js/wasm is that it only has to support tests that are also supported by nacl. The test nilptr3.go makes assumptions about which nil checks can be removed. Since WebAssembly does not signal on reading a null pointer, all nil checks have to be explicit. Updates #18892 Change-Id: I06a687860b8d22ae26b1c391499c0f5183e4c485 Reviewed-on: https://go-review.googlesource.com/110096 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-04-30 19:39:18 +00:00
Giovanni Bajo	e0d37a33ab	cmd/compile: teach prove to handle expressions like len(s)-delta When a loop has bound len(s)-delta, findIndVar detected it and returned len(s) as (conservative) upper bound. This little lie allowed loopbce to drop bound checks. It is obviously more generic to teach prove about relations like x+d<w for non-constant "w"; we already handled the case for constant "w", so we just want to learn that if d<0, then x+d<w proves that x<w. To be able to remove the code from findIndVar, we also need to teach prove that len() and cap() are always non-negative. This CL allows proving 633 more checks in cmd+std. Most of them are cases where the code was already testing before accessing a slice but the compiler didn't know it. For instance, take strings.HasSuffix: func HasSuffix(s, suffix string) bool { return len(s) >= len(suffix) && s[len(s)-len(suffix):] == suffix } When suffix is a literal string, the compiler now understands that the explicit check is enough to not emit a slice check. I also found a loopbce test that was incorrectly written to detect an overflow but had an off-by-one (on the conservative side), so it unexpectedly passed with this CL; I changed it to really trigger the overflow as intended. Change-Id: Ib5abade337db46b8811425afebad4719b6e46c4a Reviewed-on: https://go-review.googlesource.com/105635 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-04-29 09:38:32 +00:00
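A hedged sketch of a loop whose bound has the form len(s)-delta. The function name `adjacentSums` is hypothetical. Inside the loop, i < len(s)-1 implies that both s[i] and s[i+1] are in range, which is the kind of fact the prove pass is described as learning; whether the checks are actually removed depends on the compiler version. Running the build with `-gcflags=-d=ssa/check_bce/debug=1` is one way to list the bounds checks that remain.

```go
package main

import "fmt"

// adjacentSums (hypothetical name) returns s[i]+s[i+1] for every valid i.
// The loop bound len(s)-1 is the len(s)-delta form from the commit message.
func adjacentSums(s []int) []int {
	out := make([]int, 0, len(s))
	for i := 0; i < len(s)-1; i++ {
		out = append(out, s[i]+s[i+1])
	}
	return out
}

func main() {
	fmt.Println(adjacentSums([]int{1, 2, 3, 4}))
	fmt.Println(adjacentSums(nil)) // len(s)-1 is -1, so the loop never runs
}
```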
Giovanni Bajo	6d379add0f	cmd/compile: in prove, detect loops with negative increments To be effective, this also requires being able to relax constraints on min/max bound inclusiveness; they are now exposed through flags, and prove has been updated to handle it correctly. Change-Id: I3490e54461b7b9de8bc4ae40d3b5e2fa2d9f0556 Reviewed-on: https://go-review.googlesource.com/104041 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-04-29 09:38:18 +00:00
Giovanni Bajo	980fdb8dd5	cmd/compile: improve testing of induction variables Test both minimum and maximum bound, and prepare formatting for more advanced tests (inclusive / exclusive bounds). Change-Id: Ibe432916d9c938343bc07943798bc9709ad71845 Reviewed-on: https://go-review.googlesource.com/104040 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-29 09:38:09 +00:00
Giovanni Bajo	7ec25d0acf	cmd/compile: implement loop BCE in prove Reuse findIndVar to discover induction variables, and then register the facts we know about them into the facts table when entering the loop block. Moreover, handle "x+delta > w" while updating the facts table, to be able to prove accesses to slices with constant offsets such as slice[i-10]. Change-Id: I2a63d050ed58258136d54712ac7015b25c893d71 Reviewed-on: https://go-review.googlesource.com/104038 Run-TryBot: Giovanni Bajo <rasky@develer.com> Reviewed-by: David Chase <drchase@google.com>	2018-04-29 09:37:35 +00:00
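The constant-offset access mentioned above, slice[i-10], can be illustrated with a loop like the one below. The function name `sumLagged` is hypothetical. Since i starts at 10 and stays below len(s), the index i-10 is always within [0, len(s)); this sketch only shows the source shape and makes no claim about what any particular compiler version eliminates.

```go
package main

import "fmt"

// sumLagged (hypothetical name) sums s[i-10] for i in [10, len(s)).
// The induction variable i has known bounds, so i-10 is provably in range.
func sumLagged(s []int) int {
	total := 0
	for i := 10; i < len(s); i++ {
		total += s[i-10]
	}
	return total
}

func main() {
	s := make([]int, 12)
	for i := range s {
		s[i] = i + 1
	}
	// Only i=10 and i=11 run, reading s[0] and s[1].
	fmt.Println(sumLagged(s))
}
```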
Giovanni Bajo	29162ec9a7	cmd/compile: in prove, infer unsigned relations while branching When a branch is followed, we apply the relation as described in the domain relation table. In case the relation is in the positive domain, we can also infer an unsigned relation if, by that point, we know that both operands are non-negative. Fixes #20393 Change-Id: Ieaf0c81558b36d96616abae3eb834c788dd278d5 Reviewed-on: https://go-review.googlesource.com/100278 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: David Chase <drchase@google.com>	2018-04-29 09:37:15 +00:00
Giovanni Bajo	5c40210987	cmd/compile: in prove, add transitive closure of relations Implement it through a partial order datastructure, which keeps the relations between SSA values in a forest of DAGs and is able to discover contradictions. In make.bash, this patch is able to prove hundreds of conditions which were not proved before. Compilebench: name old time/op new time/op delta Template 371ms ± 2% 368ms ± 1% ~ (p=0.222 n=5+5) Unicode 203ms ± 6% 199ms ± 3% ~ (p=0.421 n=5+5) GoTypes 1.17s ± 4% 1.18s ± 1% ~ (p=0.151 n=5+5) Compiler 5.54s ± 2% 5.59s ± 1% ~ (p=0.548 n=5+5) SSA 12.9s ± 2% 13.2s ± 1% +2.96% (p=0.032 n=5+5) Flate 245ms ± 2% 247ms ± 3% ~ (p=0.690 n=5+5) GoParser 302ms ± 6% 302ms ± 4% ~ (p=0.548 n=5+5) Reflect 764ms ± 4% 773ms ± 3% ~ (p=0.095 n=5+5) Tar 354ms ± 6% 361ms ± 3% ~ (p=0.222 n=5+5) XML 434ms ± 3% 429ms ± 1% ~ (p=0.421 n=5+5) StdCmd 22.6s ± 1% 22.9s ± 1% +1.40% (p=0.032 n=5+5) name old user-time/op new user-time/op delta Template 436ms ± 8% 426ms ± 5% ~ (p=0.579 n=5+5) Unicode 219ms ±15% 219ms ±12% ~ (p=1.000 n=5+5) GoTypes 1.47s ± 6% 1.53s ± 6% ~ (p=0.222 n=5+5) Compiler 7.26s ± 4% 7.40s ± 2% ~ (p=0.389 n=5+5) SSA 17.7s ± 4% 18.5s ± 4% +4.13% (p=0.032 n=5+5) Flate 257ms ± 5% 268ms ± 9% ~ (p=0.333 n=5+5) GoParser 354ms ± 6% 348ms ± 6% ~ (p=0.913 n=5+5) Reflect 904ms ± 2% 944ms ± 4% ~ (p=0.056 n=5+5) Tar 398ms ±11% 430ms ± 7% ~ (p=0.079 n=5+5) XML 501ms ± 7% 489ms ± 5% ~ (p=0.444 n=5+5) name old text-bytes new text-bytes delta HelloSize 670kB ± 0% 670kB ± 0% +0.00% (p=0.008 n=5+5) CmdGoSize 7.22MB ± 0% 7.21MB ± 0% -0.07% (p=0.008 n=5+5) name old data-bytes new data-bytes delta HelloSize 9.88kB ± 0% 9.88kB ± 0% ~ (all equal) CmdGoSize 248kB ± 0% 248kB ± 0% -0.06% (p=0.008 n=5+5) name old bss-bytes new bss-bytes delta HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal) CmdGoSize 145kB ± 0% 144kB ± 0% -0.20% (p=0.008 n=5+5) name old exe-bytes new exe-bytes delta HelloSize 1.43MB ± 0% 1.43MB ± 0% ~ (all equal) CmdGoSize 14.5MB ± 0% 14.5MB ± 0% -0.06% 
(p=0.008 n=5+5) Fixes #19714 Updates #20393 Change-Id: Ia090f5b5dc1bcd274ba8a39b233c1e1ace1b330e Reviewed-on: https://go-review.googlesource.com/100277 Run-TryBot: Giovanni Bajo <rasky@develer.com> Reviewed-by: David Chase <drchase@google.com>	2018-04-29 09:35:39 +00:00
Ben Shi	aaf73c6d1e	cmd/compile: optimize ARM64 with shifted register indexed load/store ARM64 supports efficient instructions which combine shift, addition, load/store together. Such as "MOVD (R0)(R1<<3), R2" and "MOVWU R6, (R4)(R1<<2)". This CL optimizes the compiler to emit such efficient instructions. And below is some test data. 1. binary size before/after binary size change pkg/linux_arm64 +80.1KB pkg/tool/linux_arm64 +121.9KB go -4.3KB gofmt -64KB 2. go1 benchmark There is a big improvement for the test case Fannkuch11, and slight improvement for some others, excluding noise. name old time/op new time/op delta BinaryTree17-4 43.9s ± 2% 44.0s ± 2% ~ (p=0.820 n=30+30) Fannkuch11-4 30.6s ± 2% 24.5s ± 3% -19.93% (p=0.000 n=25+30) FmtFprintfEmpty-4 500ns ± 0% 499ns ± 0% -0.11% (p=0.000 n=23+25) FmtFprintfString-4 1.03µs ± 0% 1.04µs ± 3% ~ (p=0.065 n=29+30) FmtFprintfInt-4 1.15µs ± 3% 1.15µs ± 4% -0.56% (p=0.000 n=30+30) FmtFprintfIntInt-4 1.80µs ± 5% 1.82µs ± 0% ~ (p=0.094 n=30+24) FmtFprintfPrefixedInt-4 2.17µs ± 5% 2.20µs ± 0% ~ (p=0.100 n=30+23) FmtFprintfFloat-4 3.08µs ± 3% 3.09µs ± 4% ~ (p=0.123 n=30+30) FmtManyArgs-4 7.41µs ± 4% 7.17µs ± 1% -3.26% (p=0.000 n=30+23) GobDecode-4 93.7ms ± 0% 94.7ms ± 4% ~ (p=0.685 n=24+30) GobEncode-4 78.7ms ± 7% 77.1ms ± 0% ~ (p=0.729 n=30+23) Gzip-4 4.01s ± 0% 3.97s ± 5% -1.11% (p=0.037 n=24+30) Gunzip-4 389ms ± 4% 384ms ± 0% ~ (p=0.155 n=30+23) HTTPClientServer-4 536µs ± 1% 537µs ± 1% ~ (p=0.236 n=30+30) JSONEncode-4 179ms ± 1% 182ms ± 6% ~ (p=0.763 n=24+30) JSONDecode-4 843ms ± 0% 839ms ± 6% -0.42% (p=0.003 n=25+30) Mandelbrot200-4 46.5ms ± 0% 46.5ms ± 0% +0.02% (p=0.000 n=26+26) GoParse-4 44.3ms ± 6% 43.3ms ± 0% ~ (p=0.067 n=30+27) RegexpMatchEasy0_32-4 1.07µs ± 7% 1.07µs ± 4% ~ (p=0.835 n=30+30) RegexpMatchEasy0_1K-4 5.51µs ± 0% 5.49µs ± 0% -0.35% (p=0.000 n=23+26) RegexpMatchEasy1_32-4 1.01µs ± 0% 1.02µs ± 4% +0.96% (p=0.014 n=24+30) RegexpMatchEasy1_1K-4 7.43µs ± 0% 7.18µs ± 0% -3.41% (p=0.000 n=23+24) 
RegexpMatchMedium_32-4 1.78µs ± 0% 1.81µs ± 4% +1.47% (p=0.012 n=23+30) RegexpMatchMedium_1K-4 547µs ± 1% 542µs ± 3% -0.90% (p=0.003 n=24+30) RegexpMatchHard_32-4 30.4µs ± 0% 29.7µs ± 0% -2.15% (p=0.000 n=19+23) RegexpMatchHard_1K-4 913µs ± 0% 915µs ± 6% +0.25% (p=0.012 n=24+30) Revcomp-4 6.32s ± 1% 6.42s ± 4% ~ (p=0.342 n=25+30) Template-4 868ms ± 6% 878ms ± 6% +1.15% (p=0.000 n=30+30) TimeParse-4 4.57µs ± 4% 4.59µs ± 3% +0.65% (p=0.010 n=29+30) TimeFormat-4 4.51µs ± 0% 4.50µs ± 0% -0.27% (p=0.000 n=27+24) [Geo mean] 695µs 689µs -0.92% name old speed new speed delta GobDecode-4 8.19MB/s ± 0% 8.12MB/s ± 4% ~ (p=0.680 n=24+30) GobEncode-4 9.76MB/s ± 7% 9.96MB/s ± 0% ~ (p=0.616 n=30+23) Gzip-4 4.84MB/s ± 0% 4.89MB/s ± 4% +1.16% (p=0.030 n=24+30) Gunzip-4 49.9MB/s ± 4% 50.6MB/s ± 0% ~ (p=0.162 n=30+23) JSONEncode-4 10.9MB/s ± 1% 10.7MB/s ± 6% ~ (p=0.575 n=24+30) JSONDecode-4 2.30MB/s ± 0% 2.32MB/s ± 5% +0.72% (p=0.003 n=22+30) GoParse-4 1.31MB/s ± 6% 1.34MB/s ± 0% +2.26% (p=0.002 n=30+27) RegexpMatchEasy0_32-4 30.0MB/s ± 6% 30.0MB/s ± 4% ~ (p=1.000 n=30+30) RegexpMatchEasy0_1K-4 186MB/s ± 0% 187MB/s ± 0% +0.35% (p=0.000 n=23+26) RegexpMatchEasy1_32-4 31.8MB/s ± 0% 31.5MB/s ± 4% -0.92% (p=0.012 n=25+30) RegexpMatchEasy1_1K-4 138MB/s ± 0% 143MB/s ± 0% +3.53% (p=0.000 n=23+24) RegexpMatchMedium_32-4 560kB/s ± 0% 553kB/s ± 4% -1.19% (p=0.005 n=23+30) RegexpMatchMedium_1K-4 1.87MB/s ± 0% 1.89MB/s ± 3% +1.04% (p=0.002 n=24+30) RegexpMatchHard_32-4 1.05MB/s ± 0% 1.08MB/s ± 0% +2.40% (p=0.000 n=19+23) RegexpMatchHard_1K-4 1.12MB/s ± 0% 1.12MB/s ± 5% +0.12% (p=0.006 n=25+30) Revcomp-4 40.2MB/s ± 1% 39.6MB/s ± 4% ~ (p=0.242 n=25+30) Template-4 2.24MB/s ± 6% 2.21MB/s ± 6% -1.15% (p=0.000 n=30+30) [Geo mean] 7.87MB/s 7.91MB/s +0.44% Change-Id: If374cb7abf83537aa0a176f73c0f736f7800db03 Reviewed-on: https://go-review.googlesource.com/108735 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	
2018-04-27 20:02:05 +00:00
Milan Knezevic	2959128dc5	cmd/compile: add softfloat support to mips64{,le} mips64 softfloat support is based on mips implementation and introduces new environment variable GOMIPS64. GOMIPS64 is a GOARCH=mips64{,le} specific option, for a choice between hard-float and soft-float. Valid values are 'hardfloat' (default) and 'softfloat'. It is passed to the assembler as 'GOMIPS64_{hardfloat,softfloat}'. Change-Id: I7f73078627f7cb37c588a38fb5c997fe09c56134 Reviewed-on: https://go-review.googlesource.com/108475 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-04-27 14:50:17 +00:00
Josh Bleecher Snyder	d9a50a6531	cmd/compile: use prove pass to detect Ctz of non-zero values On amd64, Ctz must include special handling of zeros. But the prove pass has enough information to detect whether the input is non-zero, allowing a more efficient lowering. Introduce new CtzNonZero ops to capture and use this information. Benchmark code: func BenchmarkVisitBits(b *testing.B) { b.Run("8", func(b *testing.B) { for i := 0; i < b.N; i++ { x := uint8(0xff) for x != 0 { sink = bits.TrailingZeros8(x) x &= x - 1 } } }) // and similarly so for 16, 32, 64 } name old time/op new time/op delta VisitBits/8-8 7.27ns ± 4% 5.58ns ± 4% -23.35% (p=0.000 n=28+26) VisitBits/16-8 14.7ns ± 7% 10.5ns ± 4% -28.43% (p=0.000 n=30+28) VisitBits/32-8 27.6ns ± 8% 19.3ns ± 3% -30.14% (p=0.000 n=30+26) VisitBits/64-8 44.0ns ±11% 38.0ns ± 5% -13.48% (p=0.000 n=30+30) Fixes #25077 Change-Id: Ie6e5bd86baf39ee8a4ca7cadcf56d934e047f957 Reviewed-on: https://go-review.googlesource.com/109358 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-26 18:22:28 +00:00
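The bit-visiting loop in the benchmark can be written as a small standalone function. The name `setBits` is hypothetical. The loop condition x != 0 guarantees that the argument to TrailingZeros64 is never zero, which is the fact the commit message says the prove pass can exploit.

```go
package main

import (
	"fmt"
	"math/bits"
)

// setBits (hypothetical name) returns the indices of the set bits of x,
// lowest first. Inside the loop x is known to be non-zero.
func setBits(x uint64) []int {
	var idx []int
	for x != 0 {
		idx = append(idx, bits.TrailingZeros64(x))
		x &= x - 1 // clear the lowest set bit
	}
	return idx
}

func main() {
	fmt.Println(setBits(0x16)) // binary 10110
}
```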
Carlos Eduardo Seo	ebb67d993a	cmd/compile, cmd/internal/obj/ppc64: make math.Round an intrinsic on ppc64x This change implements math.Round as an intrinsic on ppc64x so it can be done using a single instruction. benchmark old ns/op new ns/op delta BenchmarkRound-16 2.60 0.69 -73.46% Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8 Reviewed-on: https://go-review.googlesource.com/109395 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2018-04-26 14:12:09 +00:00
Josh Bleecher Snyder	c5f0104daf	cmd/compile: use intrinsic for LeadingZeros8 on amd64 The previous change sped up the pure computation form of LeadingZeros8. This places it somewhat close to the table lookup form. Depending on something that varies from toolchain to toolchain (alignment, perhaps?), the slowdown from ditching the table lookup is either 20% or 5%. This benchmark is the best case scenario for the table lookup: It is in the L1 cache already. I think we're close enough that we can switch to the computational version, and trust that the memory effects and binary size savings will be worth it. Code: func f8(x uint8) { z = bits.LeadingZeros8(x) } Before: "".f8 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) MOVBLZX AL, AX 0x0008 00008 (x.go:7) LEAQ math/bits.len8tab(SB), CX 0x000f 00015 (x.go:7) MOVBLZX (CX)(AX*1), AX 0x0013 00019 (x.go:7) ADDQ $-8, AX 0x0017 00023 (x.go:7) NEGQ AX 0x001a 00026 (x.go:7) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:7) RET After: "".f8 STEXT nosplit size=30 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) MOVBLZX AL, AX 0x0008 00008 (x.go:7) LEAL 1(AX)(AX*1), AX 0x000c 00012 (x.go:7) BSRL AX, AX 0x000f 00015 (x.go:7) ADDQ $-8, AX 0x0013 00019 (x.go:7) NEGQ AX 0x0016 00022 (x.go:7) MOVQ AX, "".z(SB) 0x001d 00029 (x.go:7) RET Change-Id: Icc7db50a7820fb9a3da8a816d6b6940d7f8e193e Reviewed-on: https://go-review.googlesource.com/108942 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot 
<gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-25 21:34:15 +00:00
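The arithmetic in the "After" listing can be mirrored in plain Go to see why it needs no special case for zero. This is a reading of the assembly, not the compiler's implementation: the names `leadingZeros8` and `allMatch` are hypothetical, and `bits.Len32(y) - 1` stands in for the bit-scan-reverse instruction. Forming 2x+1 makes the input non-zero and moves the highest set bit up by one position, so its index equals the bit length of x.

```go
package main

import (
	"fmt"
	"math/bits"
)

// leadingZeros8 (hypothetical name) follows the instruction sequence step
// by step using portable operations.
func leadingZeros8(x uint8) int {
	y := uint32(x)<<1 | 1    // LEAL 1(AX)(AX*1): 2x+1, never zero
	idx := bits.Len32(y) - 1 // BSRL: index of the highest set bit
	return 8 - idx           // ADDQ $-8 then NEGQ
}

// allMatch reports whether the sketch agrees with math/bits for every
// possible uint8 input.
func allMatch() bool {
	for i := 0; i < 256; i++ {
		if leadingZeros8(uint8(i)) != bits.LeadingZeros8(uint8(i)) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allMatch())
}
```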
Josh Bleecher Snyder	1d321ada73	cmd/compile: optimize LeadingZeros(16\|32) on amd64 Introduce Len8 and Len16 ops and provide optimized lowerings for them. amd64 only for this CL, although it wouldn't surprise me if other architectures also admit of optimized lowerings. Also use and optimize the Len32 lowering, along the same lines. Leave Len8 unused for the moment; a subsequent CL will enable it. For 16 and 32 bits, this leads to a speed-up. name old time/op new time/op delta LeadingZeros16-8 1.42ns ± 5% 1.23ns ± 5% -13.42% (p=0.000 n=20+20) LeadingZeros32-8 1.25ns ± 5% 1.03ns ± 5% -17.63% (p=0.000 n=20+16) Code: func f16(x uint16) { z = bits.LeadingZeros16(x) } func f32(x uint32) { z = bits.LeadingZeros32(x) } Before: "".f16 STEXT nosplit size=38 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) BSRQ AX, AX 0x000c 00012 (x.go:8) MOVQ $-1, CX 0x0013 00019 (x.go:8) CMOVQEQ CX, AX 0x0017 00023 (x.go:8) ADDQ $-15, AX 0x001b 00027 (x.go:8) NEGQ AX 0x001e 00030 (x.go:8) MOVQ AX, "".z(SB) 0x0025 00037 (x.go:8) RET "".f32 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:9) TEXT "".f32(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:9) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:9) MOVL "".x+8(SP), AX 0x0004 00004 (x.go:9) BSRQ AX, AX 0x0008 00008 (x.go:9) MOVQ $-1, CX 0x000f 00015 (x.go:9) CMOVQEQ CX, AX 0x0013 00019 (x.go:9) ADDQ $-31, AX 0x0017 00023 (x.go:9) NEGQ AX 0x001a 00026 (x.go:9) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:9) RET After: "".f16 STEXT nosplit size=30 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA 
$0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) LEAL 1(AX)(AX*1), AX 0x000c 00012 (x.go:8) BSRL AX, AX 0x000f 00015 (x.go:8) ADDQ $-16, AX 0x0013 00019 (x.go:8) NEGQ AX 0x0016 00022 (x.go:8) MOVQ AX, "".z(SB) 0x001d 00029 (x.go:8) RET "".f32 STEXT nosplit size=28 args=0x8 locals=0x0 0x0000 00000 (x.go:9) TEXT "".f32(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:9) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:9) MOVL "".x+8(SP), AX 0x0004 00004 (x.go:9) LEAQ 1(AX)(AX*1), AX 0x0009 00009 (x.go:9) BSRQ AX, AX 0x000d 00013 (x.go:9) ADDQ $-32, AX 0x0011 00017 (x.go:9) NEGQ AX 0x0014 00020 (x.go:9) MOVQ AX, "".z(SB) 0x001b 00027 (x.go:9) RET Change-Id: I6c93c173752a7bfdeab8be30777ae05a736e1f4b Reviewed-on: https://go-review.googlesource.com/108941 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-25 21:34:04 +00:00
Josh Bleecher Snyder	54dbab5221	cmd/compile: optimize TrailingZeros(8\|16) on amd64 Introduce Ctz8 and Ctz16 ops and provide optimized lowerings for them. amd64 only for this CL, although it wouldn't surprise me if other architectures also admit of optimized lowerings. name old time/op new time/op delta TrailingZeros8-8 1.33ns ± 6% 0.84ns ± 3% -36.90% (p=0.000 n=20+20) TrailingZeros16-8 1.26ns ± 5% 0.84ns ± 5% -33.50% (p=0.000 n=20+18) Code: func f8(x uint8) { z = bits.TrailingZeros8(x) } func f16(x uint16) { z = bits.TrailingZeros16(x) } Before: "".f8 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) MOVBLZX AL, AX 0x0008 00008 (x.go:7) BTSQ $8, AX 0x000d 00013 (x.go:7) BSFQ AX, AX 0x0011 00017 (x.go:7) MOVL $64, CX 0x0016 00022 (x.go:7) CMOVQEQ CX, AX 0x001a 00026 (x.go:7) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:7) RET "".f16 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) BTSQ $16, AX 0x000d 00013 (x.go:8) BSFQ AX, AX 0x0011 00017 (x.go:8) MOVL $64, CX 0x0016 00022 (x.go:8) CMOVQEQ CX, AX 0x001a 00026 (x.go:8) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:8) RET After: "".f8 STEXT nosplit size=20 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) BTSL 
$8, AX 0x0009 00009 (x.go:7) BSFL AX, AX 0x000c 00012 (x.go:7) MOVQ AX, "".z(SB) 0x0013 00019 (x.go:7) RET "".f16 STEXT nosplit size=20 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) BTSL $16, AX 0x0009 00009 (x.go:8) BSFL AX, AX 0x000c 00012 (x.go:8) MOVQ AX, "".z(SB) 0x0013 00019 (x.go:8) RET Change-Id: I0551e357348de2b724737d569afd6ac9f5c3aa11 Reviewed-on: https://go-review.googlesource.com/108940 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-25 21:33:52 +00:00
Ilya Tocar	fb017c60bc	cmd/compile/internal/ssa: fix endless compile loop on AMD64 We currently rewrite (TESTQ (MOVQconst [c] x)) into (TESTQconst [c] x) and (TESTQconst [-1] x) into (TESTQ x x). If x is a (MOVQconst [-1]), we get stuck in an endless rewrite loop. Don't perform the rewrite in such cases. Fixes #25006 Change-Id: I77f561ba2605fc104f1e5d5c57f32e9d67a2c000 Reviewed-on: https://go-review.googlesource.com/108879 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-24 16:20:41 +00:00
Josh Bleecher Snyder	d292f77e95	cmd/compile: rewrite 2*x+c into LEAx1 on amd64 Rewrite x<<1+c into x+x+c, which can be expressed as a single LEAQ/LEAL. Bit of a special case, but the single-instruction LEA is both shorter and faster than SHL then ADD. Triggers 293 times during make.bash. Change-Id: I3f09c8e9a8f3859d1eeed336f095fc3ada79c2c1 Reviewed-on: https://go-review.googlesource.com/108938 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-23 22:40:10 +00:00
Josh Bleecher Snyder	566e3e074c	cmd/compile: avoid runtime call during switch string(byteslice) This triggers three times while building std, once in image/png and twice in go/internal/gccgoimporter. There are no instances in std in which a more aggressive optimization would have triggered. This doesn't necessarily avoid an allocation, because escape analysis is already able in many cases to use a temporary backing for the string, but it does at a minimum avoid the runtime call and copy. Fixes #24937 Change-Id: I7019e85638ba8cd7e2f03890e672558b858579bc Reviewed-on: https://go-review.googlesource.com/108035 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2018-04-21 00:50:50 +00:00
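For readers unfamiliar with the pattern named in the switch string(byteslice) commit above, here is a minimal sketch of source code with that shape. The function chunkKind and its case labels are illustrative assumptions (loosely inspired by the mention of image/png), not code from the standard library, and whether the runtime call is actually avoided depends on the compiler version.

```go
package main

import "fmt"

// chunkKind switches directly on string(b). The conversion is only used
// for comparison against constant cases, which is the shape the commit
// message describes.
func chunkKind(b []byte) string {
	switch string(b) {
	case "IHDR":
		return "header"
	case "IDAT":
		return "data"
	case "IEND":
		return "end"
	}
	return "unknown"
}

func main() {
	fmt.Println(chunkKind([]byte("IHDR")))
	fmt.Println(chunkKind([]byte("tEXt")))
}
```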
Lynn Boger	30311e8860	cmd/compile: generate load without DS relocation for go.string on ppc64le Due to some recent optimizations related to the compare instruction, DS-form load instructions started to be used to load 8-byte go.strings. This can cause link time errors if the go.string is not aligned to 4 bytes. For DS-form instructions, the value in the offset field must be a multiple of 4. If the offset is known at the time the rules are processed, a DS-form load will not be chosen. But for go.strings, the offset is not known at that time, but a relocation is generated indicating that the linker should fill in the DS relocation. When the linker tries to fill in the relocation, if the offset is not aligned properly, a link error will occur. To fix this, when loading a go.string using MOVDload, the full address of the go.string is generated and loaded into the base register. Then the go.string is loaded with a 0 offset field. Added a testcase that reproduces this problem. Fixes #24799 Change-Id: I6a154e8e1cba64eae290be0fbcb608b75884ecdd Reviewed-on: https://go-review.googlesource.com/107855 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-04-20 16:16:47 +00:00
Ben Shi	34f5f8a580	cmd/compile: optimize ARM64 with register indexed load/store ARM64 supports load/store instructions with a memory operand whose address is calculated as base register + index register. In this CL, 1. Some rules are added to the compiler's ARM64 backend to emit such efficient instructions. 2. A wrong rule of load combination is fixed. The go1 benchmark does show improvement. name old time/op new time/op delta BinaryTree17-4 44.5s ± 2% 44.1s ± 1% -0.81% (p=0.000 n=28+29) Fannkuch11-4 32.7s ± 3% 30.5s ± 0% -6.79% (p=0.000 n=30+26) FmtFprintfEmpty-4 499ns ± 0% 506ns ± 5% +1.39% (p=0.003 n=25+30) FmtFprintfString-4 1.07µs ± 0% 1.04µs ± 4% -3.17% (p=0.000 n=23+30) FmtFprintfInt-4 1.15µs ± 4% 1.13µs ± 0% -1.55% (p=0.000 n=30+23) FmtFprintfIntInt-4 1.77µs ± 4% 1.74µs ± 0% -1.71% (p=0.000 n=30+24) FmtFprintfPrefixedInt-4 2.37µs ± 5% 2.12µs ± 0% -10.56% (p=0.000 n=30+23) FmtFprintfFloat-4 3.03µs ± 1% 3.03µs ± 4% -0.13% (p=0.003 n=25+30) FmtManyArgs-4 7.38µs ± 1% 7.43µs ± 4% +0.59% (p=0.003 n=25+30) GobDecode-4 101ms ± 6% 95ms ± 5% -5.55% (p=0.000 n=30+30) GobEncode-4 78.0ms ± 4% 78.8ms ± 6% +1.05% (p=0.000 n=30+30) Gzip-4 4.25s ± 0% 4.27s ± 4% +0.45% (p=0.003 n=24+30) Gunzip-4 428ms ± 1% 420ms ± 0% -1.88% (p=0.000 n=23+23) HTTPClientServer-4 549µs ± 1% 541µs ± 1% -1.56% (p=0.000 n=29+29) JSONEncode-4 194ms ± 0% 188ms ± 4% ~ (p=0.417 n=23+30) JSONDecode-4 890ms ± 5% 831ms ± 0% -6.55% (p=0.000 n=30+23) Mandelbrot200-4 47.3ms ± 2% 46.5ms ± 0% ~ (p=0.980 n=30+26) GoParse-4 43.1ms ± 6% 43.8ms ± 6% +1.65% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 1.06µs ± 0% 1.07µs ± 3% ~ (p=0.092 n=23+30) RegexpMatchEasy0_1K-4 5.53µs ± 0% 5.51µs ± 0% -0.24% (p=0.000 n=25+25) RegexpMatchEasy1_32-4 1.02µs ± 3% 1.01µs ± 0% -1.27% (p=0.000 n=30+24) RegexpMatchEasy1_1K-4 7.26µs ± 0% 7.33µs ± 0% +0.95% (p=0.000 n=23+26) RegexpMatchMedium_32-4 1.84µs ± 7% 1.79µs ± 1% ~ (p=0.333 n=30+23) RegexpMatchMedium_1K-4 553µs ± 0% 547µs ± 0% -1.14% (p=0.000 n=24+22) RegexpMatchHard_32-4 30.8µs ± 
1% 30.3µs ± 0% -1.40% (p=0.000 n=24+24) RegexpMatchHard_1K-4 928µs ± 0% 929µs ± 5% +0.12% (p=0.013 n=23+30) Revcomp-4 8.13s ± 4% 6.32s ± 1% -22.23% (p=0.000 n=30+23) Template-4 899ms ± 6% 854ms ± 1% -5.01% (p=0.000 n=30+24) TimeParse-4 4.66µs ± 4% 4.59µs ± 1% -1.57% (p=0.000 n=30+23) TimeFormat-4 4.58µs ± 0% 4.61µs ± 0% +0.57% (p=0.000 n=26+24) [Geo mean] 717µs 698µs -2.55% name old speed new speed delta GobDecode-4 7.63MB/s ± 6% 8.08MB/s ± 5% +5.88% (p=0.000 n=30+30) GobEncode-4 9.85MB/s ± 4% 9.75MB/s ± 6% -1.04% (p=0.000 n=30+30) Gzip-4 4.56MB/s ± 0% 4.55MB/s ± 4% -0.36% (p=0.003 n=24+30) Gunzip-4 45.3MB/s ± 1% 46.2MB/s ± 0% +1.92% (p=0.000 n=23+23) JSONEncode-4 10.0MB/s ± 0% 10.4MB/s ± 4% ~ (p=0.403 n=23+30) JSONDecode-4 2.18MB/s ± 5% 2.33MB/s ± 0% +6.91% (p=0.000 n=30+23) GoParse-4 1.34MB/s ± 5% 1.32MB/s ± 5% -1.66% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 30.2MB/s ± 0% 29.8MB/s ± 3% ~ (p=0.099 n=23+30) RegexpMatchEasy0_1K-4 185MB/s ± 0% 186MB/s ± 0% +0.24% (p=0.000 n=25+25) RegexpMatchEasy1_32-4 31.4MB/s ± 3% 31.8MB/s ± 0% +1.24% (p=0.000 n=30+24) RegexpMatchEasy1_1K-4 141MB/s ± 0% 140MB/s ± 0% -0.94% (p=0.000 n=23+26) RegexpMatchMedium_32-4 541kB/s ± 6% 560kB/s ± 0% +3.45% (p=0.000 n=30+23) RegexpMatchMedium_1K-4 1.85MB/s ± 0% 1.87MB/s ± 0% +1.08% (p=0.000 n=24+23) RegexpMatchHard_32-4 1.04MB/s ± 1% 1.06MB/s ± 1% +1.48% (p=0.000 n=24+24) RegexpMatchHard_1K-4 1.10MB/s ± 0% 1.10MB/s ± 5% +0.15% (p=0.004 n=23+30) Revcomp-4 31.3MB/s ± 4% 40.2MB/s ± 1% +28.52% (p=0.000 n=30+23) Template-4 2.16MB/s ± 6% 2.27MB/s ± 1% +5.18% (p=0.000 n=30+24) [Geo mean] 7.57MB/s 7.79MB/s +2.98% fixes #24907 Change-Id: I94afd0e3f53d62a1cf5e452f3dd6daf61be21785 Reviewed-on: https://go-review.googlesource.com/107376 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-04-19 15:08:10 +00:00
Cherry Zhang	3042463d61	cmd/compile: in escape analysis, use element type for OIND of slice The escape analysis models the flow of "content" of X with a level of "indirection" (OIND node) of X. This content can be pointer dereference, or slice/string element. For the latter case, the type of the OIND node should be the element type of the slice/string. This CL fixes this. In particular, this matters when the element type is pointerless, where the data flow should not cause any escape. Fixes #15730. Change-Id: Iba9f92898681625e7e3ddef76ae65d7cd61c41e0 Reviewed-on: https://go-review.googlesource.com/107597 Reviewed-by: David Chase <drchase@google.com>	2018-04-18 02:59:37 +00:00
Michael Munday	58cdecb9c8	cmd/compile: generate constants for NeqPtr, EqPtr and IsNonNil ops If both inputs are constant offsets from the same pointer then we can evaluate NeqPtr and EqPtr at compile time. Triggers a few times during all.bash. Removes a conditional branch in the following code: copy(x[1:], x[:]) This branch was recently added as an optimization in CL 94596. We now skip the memmove if the pointers are equal. However, in the above code we know at compile time that they are never equal. Also, when the offset is variable, check if the offset is zero rather than if the pointers are equal. For example: copy(x[a:], x[:]) This would now skip the copy if a == 0, rather than if x + a == x. Finally I've also added a rule to make IsNonNil true for pointers to values on the stack. The nil check elimination pass will catch these anyway, but eliminating them here might eliminate branches earlier. Change-Id: If72f436fef0a96ad0f4e296d3a1f8b6c3e712085 Reviewed-on: https://go-review.googlesource.com/106635 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-16 20:43:57 +00:00
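The copy(x[1:], x[:]) case from the NeqPtr/EqPtr commit above is easy to misread, so a small runnable illustration may help. It only demonstrates the language-level behavior of an overlapping copy (source and destination pointers differ by a constant offset, so they can never be equal); it does not show the generated code. The helper name shiftRight is hypothetical.

```go
package main

import "fmt"

// shiftRight copies x[:] over x[1:]. The destination starts one element
// after the source, so the two pointers are a constant offset apart.
// Note: x must be non-empty, otherwise x[1:] panics.
func shiftRight(x []int) {
	copy(x[1:], x[:])
}

func main() {
	x := []int{1, 2, 3, 4}
	shiftRight(x)
	fmt.Println(x) // prints [1 1 2 3]
}
```

copy handles overlapping slices correctly, so the first three elements move up by one position and the first element is left in place.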
Ben Shi	cd65bbc01b	cmd/compile/internal/ssa: optimize 386's subtraction The SUBL instruction can take a memory operand, and this CL implements this optimization. The go1 benchmark shows a little improvement. name old time/op new time/op delta BinaryTree17-4 3.27s ± 2% 3.29s ± 3% ~ (p=0.322 n=37+40) Fannkuch11-4 3.49s ± 0% 3.53s ± 1% +1.21% (p=0.000 n=31+40) FmtFprintfEmpty-4 46.2ns ± 3% 46.3ns ± 2% ~ (p=0.351 n=40+28) FmtFprintfString-4 82.0ns ± 3% 81.5ns ± 2% -0.69% (p=0.002 n=40+30) FmtFprintfInt-4 94.6ns ± 3% 94.6ns ± 6% ~ (p=0.913 n=39+37) FmtFprintfIntInt-4 147ns ± 3% 150ns ± 2% +1.72% (p=0.000 n=40+25) FmtFprintfPrefixedInt-4 186ns ± 3% 186ns ± 0% -0.33% (p=0.006 n=40+25) FmtFprintfFloat-4 388ns ± 4% 388ns ± 4% ~ (p=0.162 n=40+40) FmtManyArgs-4 612ns ± 3% 616ns ± 4% ~ (p=0.223 n=40+40) GobDecode-4 7.35ms ± 5% 7.42ms ± 5% ~ (p=0.095 n=40+40) GobEncode-4 7.21ms ± 8% 7.23ms ± 4% ~ (p=0.294 n=40+40) Gzip-4 360ms ± 4% 359ms ± 4% ~ (p=0.097 n=40+40) Gunzip-4 46.1ms ± 3% 45.6ms ± 3% -1.20% (p=0.000 n=40+40) HTTPClientServer-4 64.0µs ± 2% 64.1µs ± 2% ~ (p=0.648 n=39+40) JSONEncode-4 21.9ms ± 4% 22.1ms ± 5% ~ (p=0.086 n=40+40) JSONDecode-4 67.9ms ± 4% 66.7ms ± 4% -1.63% (p=0.000 n=40+40) Mandelbrot200-4 5.19ms ± 3% 5.17ms ± 3% ~ (p=0.881 n=40+40) GoParse-4 3.34ms ± 3% 3.28ms ± 2% -1.78% (p=0.000 n=40+40) RegexpMatchEasy0_32-4 101ns ± 5% 99ns ± 3% -2.40% (p=0.000 n=40+40) RegexpMatchEasy0_1K-4 851ns ± 1% 848ns ± 3% -0.36% (p=0.004 n=33+40) RegexpMatchEasy1_32-4 109ns ± 5% 105ns ± 3% -3.53% (p=0.000 n=39+40) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 3% ~ (p=0.638 n=40+38) RegexpMatchMedium_32-4 131ns ± 5% 127ns ± 4% -3.36% (p=0.000 n=38+40) RegexpMatchMedium_1K-4 43.4µs ± 4% 43.2µs ± 3% -0.46% (p=0.008 n=40+40) RegexpMatchHard_32-4 2.21µs ± 4% 2.23µs ± 1% +0.77% (p=0.014 n=40+28) RegexpMatchHard_1K-4 67.6µs ± 4% 67.7µs ± 3% +0.11% (p=0.016 n=40+40) Revcomp-4 1.86s ± 3% 1.77s ± 2% -4.81% (p=0.000 n=40+40) Template-4 71.7ms ± 3% 71.6ms ± 4% ~ (p=0.200 n=40+40) 
TimeParse-4 436ns ± 4% 433ns ± 3% ~ (p=0.358 n=40+40) TimeFormat-4 413ns ± 4% 412ns ± 3% ~ (p=0.415 n=40+40) [Geo mean] 63.9µs 63.6µs -0.49% name old speed new speed delta GobDecode-4 105MB/s ± 5% 104MB/s ± 5% ~ (p=0.096 n=40+40) GobEncode-4 106MB/s ± 7% 106MB/s ± 3% ~ (p=0.385 n=39+40) Gzip-4 54.0MB/s ± 4% 54.0MB/s ± 4% ~ (p=0.100 n=40+40) Gunzip-4 421MB/s ± 3% 426MB/s ± 3% +1.21% (p=0.000 n=40+40) JSONEncode-4 88.5MB/s ± 5% 88.0MB/s ± 5% ~ (p=0.083 n=40+40) JSONDecode-4 28.6MB/s ± 4% 29.1MB/s ± 4% +1.65% (p=0.000 n=40+40) GoParse-4 17.3MB/s ± 3% 17.7MB/s ± 2% +1.82% (p=0.000 n=40+40) RegexpMatchEasy0_32-4 316MB/s ± 5% 323MB/s ± 4% +2.44% (p=0.000 n=40+40) RegexpMatchEasy0_1K-4 1.20GB/s ± 1% 1.21GB/s ± 3% +0.40% (p=0.004 n=33+40) RegexpMatchEasy1_32-4 291MB/s ± 7% 302MB/s ± 4% +3.82% (p=0.000 n=40+40) RegexpMatchEasy1_1K-4 993MB/s ± 4% 990MB/s ± 3% ~ (p=0.623 n=40+38) RegexpMatchMedium_32-4 7.61MB/s ± 5% 7.87MB/s ± 4% +3.36% (p=0.000 n=38+40) RegexpMatchMedium_1K-4 23.6MB/s ± 4% 23.7MB/s ± 4% +0.46% (p=0.007 n=40+40) RegexpMatchHard_32-4 14.5MB/s ± 4% 14.3MB/s ± 1% -0.79% (p=0.017 n=40+28) RegexpMatchHard_1K-4 15.1MB/s ± 4% 15.1MB/s ± 3% -0.11% (p=0.015 n=40+40) Revcomp-4 137MB/s ± 3% 144MB/s ± 3% +5.06% (p=0.000 n=40+40) Template-4 27.1MB/s ± 3% 27.1MB/s ± 4% ~ (p=0.211 n=40+40) [Geo mean] 78.9MB/s 79.7MB/s +1.01% Change-Id: I638fa4fef85833e8605919d693f9570cc3cf7334 Reviewed-on: https://go-review.googlesource.com/107275 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-16 04:41:20 +00:00
Giovanni Bajo	e7b1d0a9cf	test: add missing copyright header Change-Id: Ia64535492515f725fe3c4b59ea300363a0c4ce10 Reviewed-on: https://go-review.googlesource.com/107136 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-15 21:17:54 +00:00
Giovanni Bajo	2954ef20bb	test: small cleanup of code and comments in run.go While writing CL 107315, I went back and forth on the syntax used for constraints of build environments in which the architecture did not support variants ("plan9/amd64" vs "plan9/amd64/"). I eventually settled on the latter because the code required fewer heuristics (think parsing "plan9/386" vs "386/sse2") but there were a few leftovers in code and comments. Change-Id: I9d9a008f3814f9a1642609650eb571e7f1a675cf Reviewed-on: https://go-review.googlesource.com/107338 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-15 21:17:43 +00:00
Giovanni Bajo	284ba47b49	test: run codegen tests on all supported architecture variants This CL makes the codegen testsuite automatically test all architecture variants for the architectures specified in tests. For instance, if a test file specifies an "arm" test, it will be automatically run on all GOARM variants (5,6,7), to increase the coverage. The CL also introduces a syntax to specify only a specific variant (eg: "arm/7") in case the test makes sense only there. The same syntax also allows specifying the operating system in case it matters (eg: "plan9/386/sse2"). Fixes #24658 Change-Id: I2eba8b918f51bb6a77a8431a309f8b71af07ea22 Reviewed-on: https://go-review.googlesource.com/107315 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-15 20:02:43 +00:00
Giovanni Bajo	01aa1d7dbe	test: migrate plan9 tests to codegen And remove it from asmtest. Next CL will remove the whole asmtest infrastructure. Change-Id: I5851bf7c617456d62a3c6cffacf70252df7b056b Reviewed-on: https://go-review.googlesource.com/107335 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-15 20:02:30 +00:00
Cherry Zhang	b08a9b7ecc	all: use new softfloat on GOARM=5 Use the new softfloat support in the compiler, originally added for softfloat on MIPS. This support is portable, so we can just use it for softfloat on ARM. In the old softfloat support on ARM, the compiler generates floating point instructions, then the assembler inserts calls to _sfloat before FP instructions. _sfloat decodes the following FP instructions and simulates them. In the new scheme, the compiler generates runtime calls to do FP operations at a higher level. It doesn't generate FP instructions, and therefore the assembler won't insert _sfloat calls, i.e. the old mechanism is automatically suppressed. The old method may still be triggered with assembly code using FP instructions. In the standard library, the only occurrence is math/sqrt_arm.s, which is rewritten to call the Go implementation instead. Some significant speedups for code using floating points: name old time/op new time/op delta BinaryTree17-4 37.1s ± 2% 37.3s ± 1% ~ (p=0.105 n=10+10) Fannkuch11-4 13.0s ± 0% 13.1s ± 0% +0.46% (p=0.000 n=10+10) FmtFprintfEmpty-4 700ns ± 4% 734ns ± 6% +4.84% (p=0.009 n=10+10) FmtFprintfString-4 1.22µs ± 3% 1.22µs ± 4% ~ (p=0.897 n=10+10) FmtFprintfInt-4 1.27µs ± 2% 1.30µs ± 1% +1.91% (p=0.001 n=10+9) FmtFprintfIntInt-4 1.83µs ± 2% 1.81µs ± 3% ~ (p=0.149 n=10+10) FmtFprintfPrefixedInt-4 1.80µs ± 3% 1.81µs ± 2% ~ (p=0.421 n=10+8) FmtFprintfFloat-4 6.89µs ± 3% 3.59µs ± 2% -47.93% (p=0.000 n=10+10) FmtManyArgs-4 6.39µs ± 1% 6.09µs ± 1% -4.61% (p=0.000 n=10+9) GobDecode-4 109ms ± 2% 81ms ± 2% -25.99% (p=0.000 n=9+10) GobEncode-4 109ms ± 2% 76ms ± 2% -29.88% (p=0.000 n=10+9) Gzip-4 3.61s ± 1% 3.59s ± 1% ~ (p=0.247 n=10+10) Gunzip-4 449ms ± 4% 450ms ± 1% ~ (p=0.230 n=10+7) HTTPClientServer-4 1.55ms ± 3% 1.53ms ± 2% ~ (p=0.400 n=9+10) JSONEncode-4 356ms ± 1% 183ms ± 1% -48.73% (p=0.000 n=10+10) JSONDecode-4 1.12s ± 2% 0.87s ± 1% -21.88% (p=0.000 n=10+10) Mandelbrot200-4 5.49s ± 1% 2.55s ± 1% -53.45% 
(p=0.000 n=9+10) GoParse-4 49.6ms ± 2% 47.5ms ± 1% -4.08% (p=0.000 n=10+9) RegexpMatchEasy0_32-4 1.13µs ± 4% 1.20µs ± 4% +6.42% (p=0.000 n=10+10) RegexpMatchEasy0_1K-4 4.41µs ± 2% 4.44µs ± 2% ~ (p=0.128 n=10+10) RegexpMatchEasy1_32-4 1.15µs ± 5% 1.20µs ± 5% +4.85% (p=0.002 n=10+10) RegexpMatchEasy1_1K-4 6.21µs ± 2% 6.37µs ± 4% +2.62% (p=0.001 n=9+10) RegexpMatchMedium_32-4 1.58µs ± 5% 1.65µs ± 3% +4.85% (p=0.000 n=10+10) RegexpMatchMedium_1K-4 341µs ± 3% 351µs ± 7% ~ (p=0.573 n=8+10) RegexpMatchHard_32-4 21.4µs ± 3% 21.5µs ± 5% ~ (p=0.931 n=9+9) RegexpMatchHard_1K-4 626µs ± 2% 626µs ± 1% ~ (p=0.645 n=8+8) Revcomp-4 46.4ms ± 2% 47.4ms ± 2% +2.07% (p=0.000 n=10+10) Template-4 1.31s ± 3% 1.23s ± 4% -6.13% (p=0.000 n=10+10) TimeParse-4 4.49µs ± 1% 4.41µs ± 2% -1.81% (p=0.000 n=10+9) TimeFormat-4 9.31µs ± 1% 9.32µs ± 2% ~ (p=0.561 n=9+9) Change-Id: Iaeeff6c9a09c1b2c064d06e09dd88101dc02bfa4 Reviewed-on: https://go-review.googlesource.com/106735 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-04-13 16:39:39 +00:00
Cherry Zhang	5a91c83ce8	cmd/compile: in escape analysis, propagate loop depth to field The escape analysis models "loop depth". If the address of an expression is assigned to something defined at a lower (outer) loop depth, the escape analysis decides it escapes. However, it uses the loop depth of the address operator instead of where the RHS is defined. This causes an unnecessary escape if there is an assignment inside a loop but the RHS is defined outside the loop. This CL propagates the loop depth. Fixes #24730. Change-Id: I5ff1530688bdfd90561a7b39c8be9bfc009a9dae Reviewed-on: https://go-review.googlesource.com/105257 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2018-04-13 14:48:23 +00:00
Josh Bleecher Snyder	c1ed1f3c80	cmd/compile: fix evaluation of "" < s Fixes #24817 Change-Id: Ifa79ab3dfe69297eeef85f7193cd5f85e5982bc5 Reviewed-on: https://go-review.googlesource.com/106655 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2018-04-12 19:38:37 +00:00
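The commit above gives no detail beyond its title, so the following only restates the language semantics that a correct compiler must preserve: the empty string compares less than every non-empty string and is not less than itself. The helper name emptyLess is hypothetical; it takes a non-constant argument so the comparison is not folded away at the source level.

```go
package main

import "fmt"

// emptyLess evaluates "" < s for a run-time value of s.
// By the language's lexical ordering this is true exactly when s is non-empty.
func emptyLess(s string) bool {
	return "" < s
}

func main() {
	fmt.Println(emptyLess(""))  // prints false
	fmt.Println(emptyLess("a")) // prints true
}
```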
Josh Bleecher Snyder	2dfb423e6e	cmd/compile: loop to ensure all autogenerated functions are compiled I was wrong. There was a need to loop here. Fixes #24761 Change-Id: If13b3ab72febde930bdaebdddd1c05e0d0446020 Reviewed-on: https://go-review.googlesource.com/105615 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2018-04-11 23:46:30 +00:00
Alberto Donizetti	467eca6076	test/codegen: port last stack and memcombining tests And delete them from asm_test. Also delete an arm64 cmov test that has already been ported to the new test harness. Change-Id: I4458721e1f512bc9ecbbe1c22a2c9c7109ad68fe Reviewed-on: https://go-review.googlesource.com/106335 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>	2018-04-11 16:08:04 +00:00
Robert Griesemer	3d501df441	cmd/compile: better error message when referring to ambiguous method/field Fixes #14321. Change-Id: I9c92c767b01cf7938c4808a8fef9f2936fc667bc Reviewed-on: https://go-review.googlesource.com/106119 Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2018-04-10 23:39:13 +00:00
Matthew Dempsky	535ad8efb8	cmd/compile: fix check that ensures main.main is a function The check was previously disallowing package main from even importing a non-function symbol named "main". Fixes #24801. Change-Id: I849b9713890429f0a16860ef16b5dc7e970d04a4 Reviewed-on: https://go-review.googlesource.com/106120 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-04-10 23:34:12 +00:00
Alberto Donizetti	188e2bf897	test/codegen: port arm64 BIC/EON/ORN and masking tests And delete them from asm_test. Change-Id: I24f421b87e8cb4770c887a6dfd58eacd0088947d Reviewed-on: https://go-review.googlesource.com/106056 Reviewed-by: Keith Randall <khr@golang.org>	2018-04-10 10:57:50 +00:00
Alberto Donizetti	d5ff631e6b	test/codegen: port last remaining misc bit/arithmetic tests And delete them from asm_test. Change-Id: I9a75efe9858ef9d7ac86065f860c2ae3f25b0941 Reviewed-on: https://go-review.googlesource.com/105597 Reviewed-by: Daniel Martí <mvdan@mvdan.cc>	2018-04-10 07:58:35 +00:00
Matthew Dempsky	49ed4cbe85	cmd/compile: sort method sets using package height Also, when statically building itabs, compare *types.Sym instead of name alone so that method sets with duplicate non-exported methods are handled correctly. Fixes #24693. Change-Id: I2db8a3d6e80991a71fef5586a15134b6de116269 Reviewed-on: https://go-review.googlesource.com/105039 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-04-10 00:06:06 +00:00
Matthew Dempsky	fe77a5413e	cmd/compile: fix constant pointer comparison failure Previously, constant pointer-typed expressions could use either Mpint or NilVal as their Val depending on their construction, but const.go expects each type to have a single corresponding Val kind. This CL changes pointer-typed expressions to exclusively use Mpint. Fixes #21221. Change-Id: I6ba36c9b11eb19a68306f0b296acb11a8c254c41 Reviewed-on: https://go-review.googlesource.com/105315 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-04-09 23:19:45 +00:00
Michael Munday	b65122f99a	cmd/compile: optimize comparisons using load merging where available Multi-byte comparison operations were used on amd64, arm64, i386 and s390x for comparisons with constant arrays, but only amd64 and i386 for comparisons with string constants. This CL combines the check for platform capability, since they have the same requirements, and also enables both on ppc64le which also supports load merging. Note that these optimizations currently use little endian byte order which results in byte reversal instructions on s390x. This should be fixed at some point. Change-Id: Ie612d13359b50c77f4d7c6e73fea4a59fa11f322 Reviewed-on: https://go-review.googlesource.com/102558 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-04-09 21:16:47 +00:00
Keith Randall	0de0ed369f	test: check that unaligned load-add opcodes work. A test for CL 102036. Change-Id: Ief6dcb4f478670813fbe22ea75a06815a4b201a3 Reviewed-on: https://go-review.googlesource.com/105875 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-04-09 18:57:37 +00:00
Alberto Donizetti	54c3f56ee0	test/codegen: port various mem-combining tests And delete them from asm_test. Change-Id: I0e33d58274951ab5acb67b0117b60ef617ea887a Reviewed-on: https://go-review.googlesource.com/105735 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc>	2018-04-09 12:00:06 +00:00
Alberto Donizetti	3e31eb6b84	test/codegen: port arm64 slice zeroing tests Finish porting arm64 slice zeroing codegen tests; delete them from asm_test. Change-Id: Id2532df8ba1c340fa662a6b5238daa3de30548be Reviewed-on: https://go-review.googlesource.com/105136 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>	2018-04-07 09:55:51 +00:00
Matthew Dempsky	950a56899a	cmd/compile: fix method expressions with anonymous receivers Method expressions with anonymous receiver types like "struct { T }.m" require wrapper functions, which we weren't always creating. This in turn resulted in linker errors. This CL ensures that we generate wrapper functions for any anonymous receiver types used in a method expression. Fixes #22444. Change-Id: Ia8ac27f238c2898965e57b82a91d959792d2ddd4 Reviewed-on: https://go-review.googlesource.com/105044 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-06 15:39:11 +00:00
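A method expression with an anonymous receiver type, as described in the commit above, looks like the following in source form. This is a minimal sketch with hypothetical names (T, m); it shows the construct itself and says nothing about the wrapper functions the compiler generates behind it.

```go
package main

import "fmt"

// T is a hypothetical type with a value-receiver method.
type T struct{ n int }

func (t T) m() int { return t.n }

func main() {
	// struct{ T } is an anonymous struct type that embeds T, so m is
	// promoted into its method set. The method expression yields a
	// function whose first parameter is the receiver.
	f := struct{ T }.m
	v := struct{ T }{T{42}}
	fmt.Println(f(v)) // prints 42
}
```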
Matthew Dempsky	638f112d69	cmd/compile: cleanup method symbol creation There were multiple ad hoc ways to create method symbols, with subtle and confusing differences between them. This CL unifies them into a single well-documented encoding and implementation. This introduces some inconsequential changes to symbol format for the sake of simplicity and consistency. Two notable changes: 1) Symbol construction is now insensitive to the package currently being compiled. Previously, non-exported methods on anonymous types received different method symbols depending on whether the method was local or imported. 2) Symbols for method values parenthesized non-pointer receiver types and non-exported method names, and also always package-qualified non-exported method names. Now they use the same rules as normal method symbols. The methodSym function is also now stricter about rejecting non-sensical method/receiver combinations. Notably, this means that typecheckfunc needs to call addmethod to validate the method before calling declare, which also means we no longer emit errors about redeclaring bogus methods. Change-Id: I9501c7a53dd70ef60e5c74603974e5ecc06e2003 Reviewed-on: https://go-review.googlesource.com/104876 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-04-05 22:01:17 +00:00
Daniel Martí	9767727353	test: skip locklinear's lockmany test for now Since it's been reliably failing on one of the linux-arm builders (arm5spacemonkey) for a long time. Updates #24221. Change-Id: I8fccc7e16631de497ccc2c285e510a110a93ad95 Reviewed-on: https://go-review.googlesource.com/104535 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-05 10:53:40 +00:00
Alberto Donizetti	f2abca90a2	test/codegen: port arm64 byte slice zeroing tests And delete them from asm_test. Change-Id: Id533130470da9176a401cb94972f626f43a62148 Reviewed-on: https://go-review.googlesource.com/103656 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>	2018-04-04 13:18:15 +00:00
Robert Griesemer	4637699e92	cmd/compile/internal/syntax: better error message for incorrect if/switch header Fixes #23664. Change-Id: Ic0637e9f896b2fc6502dfbab2d1c4de3c62c0bd2 Reviewed-on: https://go-review.googlesource.com/104616 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Robert Griesemer <gri@golang.org>	2018-04-03 21:57:37 +00:00
Giovanni Bajo	ac43de3ae5	cmd/compile: in prove, complete support for OpIsInBounds/OpIsSliceInBounds The logic in addBranchRestrictions didn't allow OpIs(Slice)Bound to be modeled correctly for the signed domain, and it was also partly implemented within addRestrictions. Thanks to the previous changes, it is now possible to handle the negative conditions correctly, so that we can learn both signed/LT + unsigned/LT on the positive side, and signed/GE + unsigned/GE on the negative side (but only if the index can be proved to be non-negative). This is able to prove ~50 more slice accesses in std+cmd. Change-Id: I9858080dc03b16f85993a55983dbc4b00f8491b0 Reviewed-on: https://go-review.googlesource.com/104037 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2018-04-03 20:25:34 +00:00
isharipo	dcaf3fb134	cmd/compile: make DCE remove nodes after terminating if This change makes compiler frontend dead code elimination of const expr if statements introduced in https://golang.org/cl/38773 treat both if constCondTrue { ...; returnStmt } toBeRemoved... if constCondFalse { ...; } else { returnStmt } toBeRemoved... identically to: if constCondTrue { ...; returnStmt } else { toBeRemoved... } Where "constCondTrue" is an expression that can be evaluated to "true" at compile time. The additional checks are only triggered for const expr if conditions that evaluate to true. name old time/op new time/op delta Template 431ms ± 2% 429ms ± 1% ~ (p=0.491 n=8+6) Unicode 198ms ± 4% 201ms ± 2% ~ (p=0.234 n=7+6) GoTypes 1.40s ± 1% 1.41s ± 2% ~ (p=0.053 n=7+7) Compiler 6.72s ± 2% 6.81s ± 1% +1.35% (p=0.011 n=7+7) SSA 17.3s ± 1% 17.3s ± 2% ~ (p=0.731 n=6+7) Flate 275ms ± 2% 275ms ± 2% ~ (p=0.902 n=7+7) GoParser 340ms ± 2% 339ms ± 2% ~ (p=0.902 n=7+7) Reflect 910ms ± 2% 905ms ± 1% ~ (p=0.310 n=6+6) Tar 403ms ± 1% 403ms ± 2% ~ (p=0.366 n=7+6) XML 486ms ± 1% 490ms ± 1% ~ (p=0.065 n=6+6) StdCmd 56.2s ± 1% 56.6s ± 2% ~ (p=0.620 n=7+7) name old user-time/op new user-time/op delta Template 559ms ± 8% 557ms ± 7% ~ (p=0.713 n=8+7) Unicode 266ms ±13% 277ms ± 9% ~ (p=0.157 n=8+7) GoTypes 1.83s ± 2% 1.84s ± 1% ~ (p=0.522 n=8+7) Compiler 8.67s ± 4% 8.89s ± 4% ~ (p=0.077 n=7+7) SSA 23.9s ± 1% 24.2s ± 1% +1.31% (p=0.005 n=7+7) Flate 351ms ± 4% 342ms ± 5% ~ (p=0.105 n=7+7) GoParser 437ms ± 2% 423ms ± 5% -3.14% (p=0.016 n=7+7) Reflect 1.16s ± 3% 1.15s ± 2% ~ (p=0.362 n=7+7) Tar 517ms ± 4% 511ms ± 3% ~ (p=0.538 n=7+7) XML 619ms ± 3% 617ms ± 4% ~ (p=0.483 n=7+7) Fixes #23521 Change-Id: I165a7827d869aeb93ce6047d026ff873d039a4f3 Reviewed-on: https://go-review.googlesource.com/91056 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2018-04-03 15:19:41 +00:00
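The two source shapes that the commit above treats identically can be sketched in Go as follows. This is only an illustration, not code from the CL: the constant `debug` and both function names are hypothetical, and whether the trailing statement is actually dropped depends on the compiler version in use.

```go
package main

import "fmt"

// debug is a hypothetical compile-time constant, assumed for illustration.
const debug = true

// withElse keeps the code to be removed inside an explicit else branch.
func withElse() string {
	if debug {
		return "debug"
	} else {
		return "release" // dead when debug is true
	}
}

// fallThrough ends the constant-true branch with a return, so the
// statement after the if can never run when debug is true.
func fallThrough() string {
	if debug {
		return "debug"
	}
	return "release" // dead when debug is true
}

func main() {
	fmt.Println(withElse(), fallThrough())
}
```

Both functions behave the same at run time; the commit is about the compiler front end recognizing the second form as dead code as well.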
Robert Griesemer	c65a2781be	cmd/compile: better handling of incorrect type switches Don't report errors if we don't have a correct type switch guard; instead ignore it and leave it to the type-checker to report the error. This leads to better error messages concentrating on the type switch guard rather than errors around (confusing) syntactic details. Also clean up some code setting up AssertExpr (they never have a nil Type field) and remove some incorrect TODOs. Fixes #24470. Change-Id: I69512f36e0417e3b5ea9c8856768e04b19d654a8 Reviewed-on: https://go-review.googlesource.com/103615 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2018-04-03 05:34:20 +00:00
isharipo	b44e73eacb	test/fixedbugs: fix bug248 and bug345 When the test/run script was removed, these two tests were changed to be executed by test/run.go. Because errchk does not exit with non-zero status on errors, they were silently failing for a while. This change does two things: 1. Compiles the tested packages in GOROOT/test to match the older runner script behavior (strictly required only in bug345, optional in bug248). 2. Checks command output with the "(?m)^BUG" regexp, which approximates the older `grep -q '^BUG'` check. See the referenced issue for a detailed explanation. Fixes #24629 Change-Id: Ie888dcdb4e25cdbb19d434bbc5cb03eb633e9ee8 Reviewed-on: https://go-review.googlesource.com/104095 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-04-02 20:08:27 +00:00
Bryan Chan	625f2dccd4	cmd/compile/internal/ssa: handle symbol address comparisons consistently CL 38338 introduced SSA rules to optimize two types of pointer equality tests: a pointer compared with itself, and comparison of addresses taken of two symbols which may have the same base. This patch adds rules to apply the same optimization to pointer inequality tests, which also ensures that two pointers to zero-width types cannot be both equal and unequal at the same time. Fixes #24503. Change-Id: Ic828aeb86ae2e680caf66c35f4c247674768a9ba Reviewed-on: https://go-review.googlesource.com/102275 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-31 21:37:13 +00:00
Alberto Donizetti	3b0b8bcd68	test/codegen: port stack-related tests to codegen And delete them from asm_test. Change-Id: Idfe1249052d82d15b9c30b292c78656a0bf7b48d Reviewed-on: https://go-review.googlesource.com/103315 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-30 08:08:06 +00:00
Alberto Donizetti	56eaf574a1	test/codegen: match 387 ops too for GOARCH=386 Change-Id: I99407e27e340689009af798989b33cef7cb92070 Reviewed-on: https://go-review.googlesource.com/103376 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-29 20:05:40 +00:00
Alberto Donizetti	aacf7a1846	test: avoid touching GOOS/GOARCH in codegen driver This change modifies the codegen test harness driver so that it no longer modifies the environment GOOS/GOARCH, since that seems to cause flakiness in other concurrently-running tests. The change also enables the codegen tests in run.go. Fixes #24538 Change-Id: I997ac1eb38eb92054efff67fe5c4d3cccc86412b Reviewed-on: https://go-review.googlesource.com/103455 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-29 19:00:10 +00:00
Alberto Donizetti	04e993f953	test: update list of escape reasons The escape_because.go test file (which tests the "because" escape explanations printed by `-m -m`) contains a machine-generated list of all the escape reasons seen in the escape tests. The list appears to be outdated; moreover, a new escape reason was added in CL 102895. This change re-generates the list. Change-Id: Idc721c6bbfe9516895b5cf1e6d09b77deda5a3dd Reviewed-on: https://go-review.googlesource.com/103375 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-29 14:17:40 +00:00
Alberto Donizetti	360c19157a	cmd/compile: print accurate escape reason for non-const-length slices This change makes `-m -m` print a better explanation for the case where a slice is marked as escaping and heap-allocated because it has a non-constant len/cap. Fixes #24578 Change-Id: I0ebafb77c758a99857d72b365817bdba7b446cc0 Reviewed-on: https://go-review.googlesource.com/102895 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2018-03-28 16:56:03 +00:00
Matthew Dempsky	7b177b1a03	cmd/compile: fix method set computation for shadowed methods In expandmeth, we call expand1/expand0 to build a list of all candidate methods to promote, and then we use dotpath to prune down which names actually resolve to a promoted method and how. However, previously we still computed "followsptr" based on the expand1/expand0 traversal (which is depth-first), rather than dotpath (which is breadth-first). The result is that we could sometimes end up miscomputing whether a particular promoted method involves a pointer traversal, which could result in bad code generation for method trampolines. Fixes #24547. Change-Id: I57dc014466d81c165b05d78b98610dc3765b7a90 Reviewed-on: https://go-review.googlesource.com/102618 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-03-27 18:56:36 +00:00
Alberto Donizetti	a27cd4fd31	test/codegen: port tbz/tbnz arm64 tests And delete them from asm_test. Change-Id: I34fcf85ae8ce09cd146fe4ce6a0ae7616bd97e2d Reviewed-on: https://go-review.googlesource.com/102296 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>	2018-03-24 09:35:53 +00:00
Giovanni Bajo	79112707bb	cmd/compile: add patterns for bit set/clear/complement on amd64 This patch completes implementation of BT(Q\|L), and adds support for BT(S\|R\|C)(Q\|L). Example of code changes from time.(*Time).addSec: if t.wall&hasMonotonic != 0 { 0x1073465 488b08 MOVQ 0(AX), CX 0x1073468 4889ca MOVQ CX, DX 0x107346b 48c1e93f SHRQ $0x3f, CX 0x107346f 48c1e13f SHLQ $0x3f, CX 0x1073473 48f7c1ffffffff TESTQ $-0x1, CX 0x107347a 746b JE 0x10734e7 if t.wall&hasMonotonic != 0 { 0x1073435 488b08 MOVQ 0(AX), CX 0x1073438 480fbae13f BTQ $0x3f, CX 0x107343d 7363 JAE 0x10734a2 Another example: t.wall = t.wall&nsecMask \| uint64(dsec)<<nsecShift \| hasMonotonic 0x10734c8 4881e1ffffff3f ANDQ $0x3fffffff, CX 0x10734cf 48c1e61e SHLQ $0x1e, SI 0x10734d3 4809ce ORQ CX, SI 0x10734d6 48b90000000000000080 MOVQ $0x8000000000000000, CX 0x10734e0 4809f1 ORQ SI, CX 0x10734e3 488908 MOVQ CX, 0(AX) t.wall = t.wall&nsecMask \| uint64(dsec)<<nsecShift \| hasMonotonic 0x107348b 4881e2ffffff3f ANDQ $0x3fffffff, DX 0x1073492 48c1e61e SHLQ $0x1e, SI 0x1073496 4809f2 ORQ SI, DX 0x1073499 480fbaea3f BTSQ $0x3f, DX 0x107349e 488910 MOVQ DX, 0(AX) Go1 benchmarks seem unaffected, and I would be surprised otherwise: name old time/op new time/op delta BinaryTree17-4 2.64s ± 4% 2.56s ± 9% -2.92% (p=0.008 n=9+9) Fannkuch11-4 2.90s ± 1% 2.95s ± 3% +1.76% (p=0.010 n=10+9) FmtFprintfEmpty-4 35.3ns ± 1% 34.5ns ± 2% -2.34% (p=0.004 n=9+8) FmtFprintfString-4 57.0ns ± 1% 58.4ns ± 5% +2.52% (p=0.029 n=9+10) FmtFprintfInt-4 59.8ns ± 3% 59.8ns ± 6% ~ (p=0.565 n=10+10) FmtFprintfIntInt-4 93.9ns ± 3% 91.2ns ± 5% -2.94% (p=0.014 n=10+9) FmtFprintfPrefixedInt-4 107ns ± 6% 104ns ± 6% ~ (p=0.099 n=10+10) FmtFprintfFloat-4 187ns ± 3% 188ns ± 3% ~ (p=0.505 n=10+9) FmtManyArgs-4 410ns ± 1% 415ns ± 6% ~ (p=0.649 n=8+10) GobDecode-4 5.30ms ± 3% 5.27ms ± 3% ~ (p=0.436 n=10+10) GobEncode-4 4.62ms ± 5% 4.47ms ± 2% -3.24% (p=0.001 n=9+10) Gzip-4 197ms ± 4% 193ms ± 3% ~ (p=0.123 n=10+10) Gunzip-4 30.4ms ± 3% 30.1ms ± 3% ~ 
(p=0.481 n=10+10) HTTPClientServer-4 76.3µs ± 1% 76.0µs ± 1% ~ (p=0.236 n=8+9) JSONEncode-4 10.5ms ± 9% 10.3ms ± 3% ~ (p=0.280 n=10+10) JSONDecode-4 42.3ms ±10% 41.3ms ± 2% ~ (p=0.053 n=9+10) Mandelbrot200-4 3.80ms ± 2% 3.72ms ± 2% -2.15% (p=0.001 n=9+10) GoParse-4 2.88ms ±10% 2.81ms ± 2% ~ (p=0.247 n=10+10) RegexpMatchEasy0_32-4 69.5ns ± 4% 68.6ns ± 2% ~ (p=0.171 n=10+10) RegexpMatchEasy0_1K-4 165ns ± 3% 162ns ± 3% ~ (p=0.137 n=10+10) RegexpMatchEasy1_32-4 65.7ns ± 6% 64.4ns ± 2% -2.02% (p=0.037 n=10+10) RegexpMatchEasy1_1K-4 278ns ± 2% 279ns ± 3% ~ (p=0.991 n=8+9) RegexpMatchMedium_32-4 99.3ns ± 3% 98.5ns ± 4% ~ (p=0.457 n=10+9) RegexpMatchMedium_1K-4 30.1µs ± 1% 30.4µs ± 2% ~ (p=0.173 n=8+10) RegexpMatchHard_32-4 1.40µs ± 2% 1.41µs ± 4% ~ (p=0.565 n=10+10) RegexpMatchHard_1K-4 42.5µs ± 1% 41.5µs ± 3% -2.13% (p=0.002 n=8+9) Revcomp-4 332ms ± 4% 328ms ± 5% ~ (p=0.720 n=9+10) Template-4 48.3ms ± 2% 49.6ms ± 3% +2.56% (p=0.002 n=8+10) TimeParse-4 252ns ± 2% 249ns ± 3% ~ (p=0.116 n=9+10) TimeFormat-4 262ns ± 4% 252ns ± 3% -4.01% (p=0.000 n=9+10) name old speed new speed delta GobDecode-4 145MB/s ± 3% 146MB/s ± 3% ~ (p=0.436 n=10+10) GobEncode-4 166MB/s ± 5% 172MB/s ± 2% +3.28% (p=0.001 n=9+10) Gzip-4 98.6MB/s ± 4% 100.4MB/s ± 3% ~ (p=0.123 n=10+10) Gunzip-4 639MB/s ± 3% 645MB/s ± 3% ~ (p=0.481 n=10+10) JSONEncode-4 185MB/s ± 8% 189MB/s ± 3% ~ (p=0.280 n=10+10) JSONDecode-4 46.0MB/s ± 9% 47.0MB/s ± 2% +2.21% (p=0.046 n=9+10) GoParse-4 20.1MB/s ± 9% 20.6MB/s ± 2% ~ (p=0.239 n=10+10) RegexpMatchEasy0_32-4 460MB/s ± 4% 467MB/s ± 2% ~ (p=0.165 n=10+10) RegexpMatchEasy0_1K-4 6.19GB/s ± 3% 6.28GB/s ± 3% ~ (p=0.165 n=10+10) RegexpMatchEasy1_32-4 487MB/s ± 5% 497MB/s ± 2% +2.00% (p=0.043 n=10+10) RegexpMatchEasy1_1K-4 3.67GB/s ± 2% 3.67GB/s ± 3% ~ (p=0.963 n=8+9) RegexpMatchMedium_32-4 10.1MB/s ± 3% 10.1MB/s ± 4% ~ (p=0.435 n=10+9) RegexpMatchMedium_1K-4 34.0MB/s ± 1% 33.7MB/s ± 2% ~ (p=0.173 n=8+10) RegexpMatchHard_32-4 22.9MB/s ± 2% 22.7MB/s ± 4% ~ (p=0.565 n=10+10) 
RegexpMatchHard_1K-4 24.0MB/s ± 3% 24.7MB/s ± 3% +2.64% (p=0.001 n=9+9) Revcomp-4 766MB/s ± 4% 775MB/s ± 5% ~ (p=0.720 n=9+10) Template-4 40.2MB/s ± 2% 39.2MB/s ± 3% -2.47% (p=0.002 n=8+10) The rules match ~1800 times during all.bash. Fixes #18943 Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4 Reviewed-on: https://go-review.googlesource.com/94766 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-24 02:38:50 +00:00
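The source-level idioms that the bit test/set/clear/complement patterns above target look roughly like the sketch below. The constant name `hasMonotonic` mirrors the one quoted from package time in the commit message, but it is redefined locally here, and the helper names are hypothetical. Which instructions are actually emitted depends on the architecture and compiler version.

```go
package main

import "fmt"

// hasMonotonic is a local stand-in for the top-bit flag quoted in the commit.
const hasMonotonic = 1 << 63

// isSet tests a single bit (a candidate for a bit-test instruction).
func isSet(w uint64) bool { return w&hasMonotonic != 0 }

// setBit sets a single bit.
func setBit(w uint64) uint64 { return w | hasMonotonic }

// clearBit clears a single bit.
func clearBit(w uint64) uint64 { return w &^ hasMonotonic }

// toggleBit complements a single bit.
func toggleBit(w uint64) uint64 { return w ^ hasMonotonic }

func main() {
	w := setBit(5)
	fmt.Println(isSet(w), clearBit(w), isSet(toggleBit(w)))
}
```

The generated assembly for each helper can be inspected with `go build -gcflags=-S`.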
Alberto Donizetti	fc6280d4b0	test/codegen: port direct comparisons with memory tests And remove them from asm_test. Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454 Reviewed-on: https://go-review.googlesource.com/102115 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-22 17:20:09 +00:00
Alberto Donizetti	be371edd67	test/codegen: port comparisons tests to codegen And delete them from asm_test. Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab Reviewed-on: https://go-review.googlesource.com/101595 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>	2018-03-20 19:38:06 +00:00
Michael Munday	ae10914e67	cmd/compile: mark LAA and LAAG as clobbering flags on s390x The atomic add instructions modify the condition code and so need to be marked as clobbering flags. Fixes #24449. Change-Id: Ic69c8d775fbdbfb2a56c5e0cfca7a49c0d7f6897 Reviewed-on: https://go-review.googlesource.com/101455 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-03-20 09:44:50 +00:00
Vladimir Kuzmin	c12b185a6e	cmd/compile: avoid mapaccess at m[k]=append(m[k].. Currently rvalue m[k] is transformed during walk into: tmp1 := mapaccess(m, k) tmp2 := append(tmp1, ...) mapassign(m, k) = tmp2 However, this is suboptimal, as we could instead produce just: tmp := mapassign(m, k) tmp := append(tmp, ...) Optimization is possible only if during Order it may tell that m[k] is exactly the same at left and right part of assignment. It doesn't work: 1) m[f(k)] = append(m[f(k)], ...) 2) sink, m[k] = sink, append(m[k]...) 3) m[k] = append(..., m[k],...) Benchmark: name old time/op new time/op delta MapAppendAssign/Int32/256-8 33.5ns ± 3% 22.4ns ±10% -33.24% (p=0.000 n=16+18) MapAppendAssign/Int32/65536-8 68.2ns ± 6% 48.5ns ±29% -28.90% (p=0.000 n=20+20) MapAppendAssign/Int64/256-8 34.3ns ± 4% 23.3ns ± 5% -32.23% (p=0.000 n=17+18) MapAppendAssign/Int64/65536-8 65.9ns ± 7% 61.2ns ±19% -7.06% (p=0.002 n=18+20) MapAppendAssign/Str/256-8 116ns ±12% 79ns ±16% -31.70% (p=0.000 n=20+19) MapAppendAssign/Str/65536-8 134ns ±15% 111ns ±45% -16.95% (p=0.000 n=19+20) name old alloc/op new alloc/op delta MapAppendAssign/Int32/256-8 47.0B ± 0% 46.0B ± 0% -2.13% (p=0.000 n=19+18) MapAppendAssign/Int32/65536-8 27.0B ± 0% 20.7B ±30% -23.33% (p=0.000 n=20+20) MapAppendAssign/Int64/256-8 47.0B ± 0% 46.0B ± 0% -2.13% (p=0.000 n=20+17) MapAppendAssign/Int64/65536-8 27.0B ± 0% 27.0B ± 0% ~ (all equal) MapAppendAssign/Str/256-8 94.0B ± 0% 78.0B ± 0% -17.02% (p=0.000 n=20+16) MapAppendAssign/Str/65536-8 54.0B ± 0% 54.0B ± 0% ~ (all equal) Fixes #24364 Updates #5147 Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409 Reviewed-on: https://go-review.googlesource.com/100838 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-20 01:47:07 +00:00
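A minimal sketch of the statement shape this optimization applies to, assuming a hypothetical helper `group` that buckets words by length. The key is held in a plain variable so that both sides of the assignment index the map with exactly the same expression, which the commit message says is required.

```go
package main

import "fmt"

// group buckets words by their length (a hypothetical example function).
func group(words []string) map[int][]string {
	m := make(map[int][]string)
	for _, w := range words {
		k := len(w)
		// Same map and same key on both sides: the m[k] = append(m[k], ...) form.
		m[k] = append(m[k], w)
	}
	return m
}

func main() {
	fmt.Println(group([]string{"go", "c", "ir", "ssa"}))
}
```

Per the commit message, forms such as `m[f(k)] = append(m[f(k)], ...)` are not covered, so hoisting the key into a variable as above keeps the statement in the optimizable shape.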
Alberto Donizetti	5a4e09837c	test/codegen: port maps test to codegen And delete them from asm_test. Change-Id: I3cf0934706a640136cb0f646509174f8c1bf3363 Reviewed-on: https://go-review.googlesource.com/101395 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>	2018-03-19 13:39:34 +00:00
Alberto Donizetti	b61b1d2c57	test/codegen: port structs test to codegen And delete them from asm_test. Change-Id: Ia286239a3d8f3915f2ca25dbcb39f3354a4f8aea Reviewed-on: https://go-review.googlesource.com/101138 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-03-18 16:53:53 +00:00
Alberto Donizetti	cceee685be	test/codegen: port floats tests to codegen And delete them from asm_test. Change-Id: Ibdaca3496eefc73c731b511ddb9636a1f3dff68c Reviewed-on: https://go-review.googlesource.com/100915 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-15 18:05:59 +00:00
Giovanni Bajo	a35ec9a59e	cmd/compile: implement CMOV on amd64 This builds upon the branchelim pass, activating it for amd64 and lowering CondSelect. Special care is made to FPU instructions for NaN handling. Benchmark results on Xeon E5630 (Westmere EP): name old time/op new time/op delta BinaryTree17-16 4.99s ± 9% 4.66s ± 2% ~ (p=0.095 n=5+5) Fannkuch11-16 4.93s ± 3% 5.04s ± 2% ~ (p=0.548 n=5+5) FmtFprintfEmpty-16 58.8ns ± 7% 61.4ns ±14% ~ (p=0.579 n=5+5) FmtFprintfString-16 114ns ± 2% 114ns ± 4% ~ (p=0.603 n=5+5) FmtFprintfInt-16 181ns ± 4% 125ns ± 3% -30.90% (p=0.008 n=5+5) FmtFprintfIntInt-16 263ns ± 2% 217ns ± 2% -17.34% (p=0.008 n=5+5) FmtFprintfPrefixedInt-16 230ns ± 1% 212ns ± 1% -7.99% (p=0.008 n=5+5) FmtFprintfFloat-16 411ns ± 3% 344ns ± 5% -16.43% (p=0.008 n=5+5) FmtManyArgs-16 828ns ± 4% 790ns ± 2% -4.59% (p=0.032 n=5+5) GobDecode-16 10.9ms ± 4% 10.8ms ± 5% ~ (p=0.548 n=5+5) GobEncode-16 9.52ms ± 5% 9.46ms ± 2% ~ (p=1.000 n=5+5) Gzip-16 334ms ± 2% 337ms ± 2% ~ (p=0.548 n=5+5) Gunzip-16 64.4ms ± 1% 65.0ms ± 1% +1.00% (p=0.008 n=5+5) HTTPClientServer-16 156µs ± 3% 155µs ± 3% ~ (p=0.690 n=5+5) JSONEncode-16 21.0ms ± 1% 21.8ms ± 0% +3.76% (p=0.016 n=5+4) JSONDecode-16 95.1ms ± 0% 95.7ms ± 1% ~ (p=0.151 n=5+5) Mandelbrot200-16 6.38ms ± 1% 6.42ms ± 1% ~ (p=0.095 n=5+5) GoParse-16 5.47ms ± 2% 5.36ms ± 1% -1.95% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 111ns ± 1% 111ns ± 1% ~ (p=0.635 n=5+4) RegexpMatchEasy0_1K-16 408ns ± 1% 411ns ± 2% ~ (p=0.087 n=5+5) RegexpMatchEasy1_32-16 103ns ± 1% 104ns ± 1% ~ (p=0.484 n=5+5) RegexpMatchEasy1_1K-16 659ns ± 2% 652ns ± 1% ~ (p=0.571 n=5+5) RegexpMatchMedium_32-16 176ns ± 2% 174ns ± 1% ~ (p=0.476 n=5+5) RegexpMatchMedium_1K-16 58.6µs ± 4% 57.7µs ± 4% ~ (p=0.548 n=5+5) RegexpMatchHard_32-16 3.07µs ± 3% 3.04µs ± 4% ~ (p=0.421 n=5+5) RegexpMatchHard_1K-16 89.2µs ± 1% 87.9µs ± 2% -1.52% (p=0.032 n=5+5) Revcomp-16 575ms ± 0% 587ms ± 2% +2.12% (p=0.032 n=4+5) Template-16 110ms ± 1% 107ms ± 3% -3.00% (p=0.032 n=5+5) TimeParse-16 
463ns ± 0% 462ns ± 0% ~ (p=0.810 n=5+4) TimeFormat-16 538ns ± 0% 535ns ± 0% -0.63% (p=0.024 n=5+5) name old speed new speed delta GobDecode-16 70.7MB/s ± 4% 71.4MB/s ± 5% ~ (p=0.452 n=5+5) GobEncode-16 80.7MB/s ± 5% 81.2MB/s ± 2% ~ (p=1.000 n=5+5) Gzip-16 58.2MB/s ± 2% 57.7MB/s ± 2% ~ (p=0.452 n=5+5) Gunzip-16 302MB/s ± 1% 299MB/s ± 1% -0.99% (p=0.008 n=5+5) JSONEncode-16 92.4MB/s ± 1% 89.1MB/s ± 0% -3.63% (p=0.016 n=5+4) JSONDecode-16 20.4MB/s ± 0% 20.3MB/s ± 1% ~ (p=0.135 n=5+5) GoParse-16 10.6MB/s ± 2% 10.8MB/s ± 1% +2.00% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 286MB/s ± 1% 285MB/s ± 3% ~ (p=1.000 n=5+5) RegexpMatchEasy0_1K-16 2.51GB/s ± 1% 2.49GB/s ± 2% ~ (p=0.095 n=5+5) RegexpMatchEasy1_32-16 309MB/s ± 1% 307MB/s ± 1% ~ (p=0.548 n=5+5) RegexpMatchEasy1_1K-16 1.55GB/s ± 2% 1.57GB/s ± 1% ~ (p=0.690 n=5+5) RegexpMatchMedium_32-16 5.68MB/s ± 2% 5.73MB/s ± 1% ~ (p=0.579 n=5+5) RegexpMatchMedium_1K-16 17.5MB/s ± 4% 17.8MB/s ± 4% ~ (p=0.500 n=5+5) RegexpMatchHard_32-16 10.4MB/s ± 3% 10.5MB/s ± 4% ~ (p=0.460 n=5+5) RegexpMatchHard_1K-16 11.5MB/s ± 1% 11.7MB/s ± 2% +1.57% (p=0.032 n=5+5) Revcomp-16 442MB/s ± 0% 433MB/s ± 2% -2.05% (p=0.032 n=4+5) Template-16 17.7MB/s ± 1% 18.2MB/s ± 3% +3.12% (p=0.032 n=5+5) Change-Id: I6972e8f35f2b31f9a42ac473a6bf419a18022558 Reviewed-on: https://go-review.googlesource.com/100935 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-15 16:41:59 +00:00
Geoff Berry	e244a7a7d3	cmd/compile/internal/ssa: add patterns for arm64 bitfield opcodes Add patterns to match common idioms for EXTR, BFI, BFXIL, SBFIZ, SBFX, UBFIZ and UBFX opcodes. go1 benchmarks results on Amberwing: name old time/op new time/op delta FmtManyArgs 786ns ± 2% 714ns ± 1% -9.20% (p=0.000 n=10+10) Gzip 437ms ± 0% 402ms ± 0% -7.99% (p=0.000 n=10+10) FmtFprintfIntInt 196ns ± 0% 182ns ± 0% -7.28% (p=0.000 n=10+9) FmtFprintfPrefixedInt 207ns ± 0% 199ns ± 0% -3.86% (p=0.000 n=10+10) FmtFprintfFloat 324ns ± 0% 316ns ± 0% -2.47% (p=0.000 n=10+8) FmtFprintfInt 119ns ± 0% 117ns ± 0% -1.68% (p=0.000 n=10+9) GobDecode 12.8ms ± 2% 12.6ms ± 1% -1.62% (p=0.002 n=10+10) JSONDecode 94.4ms ± 1% 93.4ms ± 0% -1.10% (p=0.000 n=10+10) RegexpMatchEasy0_32 247ns ± 0% 245ns ± 0% -0.65% (p=0.000 n=10+10) RegexpMatchMedium_32 314ns ± 0% 312ns ± 0% -0.64% (p=0.000 n=10+10) RegexpMatchEasy0_1K 541ns ± 0% 538ns ± 0% -0.55% (p=0.000 n=10+9) TimeParse 450ns ± 1% 448ns ± 1% -0.42% (p=0.035 n=9+9) RegexpMatchEasy1_32 244ns ± 0% 243ns ± 0% -0.41% (p=0.000 n=10+10) GoParse 6.03ms ± 0% 6.00ms ± 0% -0.40% (p=0.002 n=10+10) RegexpMatchEasy1_1K 779ns ± 0% 777ns ± 0% -0.26% (p=0.000 n=10+10) RegexpMatchHard_32 2.75µs ± 0% 2.74µs ± 1% -0.06% (p=0.026 n=9+9) BinaryTree17 11.7s ± 0% 11.6s ± 0% ~ (p=0.089 n=10+10) HTTPClientServer 89.1µs ± 1% 89.5µs ± 2% ~ (p=0.436 n=10+10) RegexpMatchHard_1K 78.9µs ± 0% 79.5µs ± 2% ~ (p=0.469 n=10+10) FmtFprintfEmpty 58.5ns ± 0% 58.5ns ± 0% ~ (all equal) GobEncode 12.0ms ± 1% 12.1ms ± 0% ~ (p=0.075 n=10+10) Revcomp 669ms ± 0% 668ms ± 0% ~ (p=0.091 n=7+9) Mandelbrot200 5.35ms ± 0% 5.36ms ± 0% +0.07% (p=0.000 n=9+9) RegexpMatchMedium_1K 52.1µs ± 0% 52.1µs ± 0% +0.10% (p=0.000 n=9+9) Fannkuch11 3.25s ± 0% 3.26s ± 0% +0.36% (p=0.000 n=9+10) FmtFprintfString 114ns ± 1% 115ns ± 0% +0.52% (p=0.011 n=10+10) JSONEncode 20.2ms ± 0% 20.3ms ± 0% +0.65% (p=0.000 n=10+10) Template 91.3ms ± 0% 92.3ms ± 0% +1.08% (p=0.000 n=10+10) TimeFormat 484ns ± 0% 495ns ± 1% +2.30% 
(p=0.000 n=9+10) There are some opportunities to improve this change further by adding patterns to match the "extended register" versions of ADD/SUB/CMP, but I think that should be evaluated on its own. The regressions in Template and TimeFormat would likely be recovered by this, as they seem to be due to generating: ubfiz x0, x0, #3, #8 add x1, x2, x0 instead of add x1, x2, x0, lsl #3 Change-Id: I5644a8d70ac7a98e784a377a2b76ab47a3415a4b Reviewed-on: https://go-review.googlesource.com/88355 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-15 14:10:41 +00:00
Alberto Donizetti	ded9a1b372	test/codegen: port len/cap pow2 div tests to codegen And delete them from asm_test. Change-Id: I29c8d098a8893e6b669b6272a2f508985ac9d618 Reviewed-on: https://go-review.googlesource.com/100876 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-03-15 13:34:01 +00:00
Ilya Tocar	644d14ea0f	Revert "cmd/compile: implement CMOV on amd64" This reverts commit `080187f4f7`. It broke build of golang.org/x/exp/shiny/iconvg See issue 24395 for details Change-Id: Ifd6134f6214e6cee40bd3c63c32941d5fc96ae8b Reviewed-on: https://go-review.googlesource.com/100755 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-14 21:21:23 +00:00
Alberto Donizetti	cd3aae9b81	test/codegen: port all small memmove tests to codegen This change ports all the remaining tests checking that small memmoves are replaced with MOVs to the new codegen test harness, and deletes them from the asm_test file. Change-Id: I01c94b441e27a5d61518035af62d62779dafeb56 Reviewed-on: https://go-review.googlesource.com/100476 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-14 15:57:07 +00:00
Alberto Donizetti	858042b8fd	test/codegen: add codegen tests for div Change-Id: I6ce8981e85fd55ade6078b0946e54a9215d9deca Reviewed-on: https://go-review.googlesource.com/100575 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-03-14 15:56:46 +00:00
Tobias Klauser	d32018a500	test: check that size argument errors are emitted at call site Add tests for the "negative size argument in make." and "size argument too large in make." error messages to appear at call sites in case the size is a const defined on another line. As suggested by Matthew in a comment on CL 69910. Change-Id: I5c33d4bec4e3d20bb21fe8019df27999997ddff3 Reviewed-on: https://go-review.googlesource.com/100395 Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2018-03-14 08:36:15 +00:00
Matthew Dempsky	e601c07908	cmd/compile: reject type switch with guarded declaration and no cases Fixes #23116. Change-Id: I5db5c5c39bbb50148ffa18c9393b045f255f80a3 Reviewed-on: https://go-review.googlesource.com/100459 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-03-13 22:02:46 +00:00
Robert Griesemer	363bcd7b4f	cmd/compile: use key position for key:val elements in composite literals Fixes #24339. Change-Id: Ie47764fed27f76b480834b1fdbed0512c94831d9 Reviewed-on: https://go-review.googlesource.com/100457 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2018-03-13 21:44:12 +00:00
Matthew Dempsky	09d4455f45	cmd/compile: enable inlining variadic functions As a side effect of working on mid-stack inlining, we've fixed support for inlining variadic functions. Might as well enable it. Change-Id: I7f555f8b941969791db7eb598c0b49f6dc0820aa Reviewed-on: https://go-review.googlesource.com/100456 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-03-13 20:34:03 +00:00
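As an illustration of the kind of function affected, here is a small variadic helper. The name `sum` is hypothetical and not taken from the CL; whether a given call is inlined depends on the compiler version and its inlining budget.

```go
package main

import "fmt"

// sum is a small variadic function, a plausible inlining candidate.
func sum(nums ...int) int {
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

func main() {
	fmt.Println(sum(1, 2, 3))
}
```

Building with `go build -gcflags=-m` prints the compiler's inlining decisions, which is one way to check what happens for a particular toolchain.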
Vladimir Kuzmin	7395083136	cmd/compile: avoid extra mapaccess in "m[k] op= r" Currently, order desugars map assignment operations like m[k] op= r into m[k] = m[k] op r which in turn is transformed during walk into: tmp := mapaccess(m, k) tmp = tmp op r mapassign(m, k) = tmp However, this is suboptimal, as we could instead produce just: mapassign(m, k) op= r One complication though is if "r == 0", then "m[k] /= r" and "m[k] %= r" will panic, and they need to do so before calling mapassign, otherwise we may insert a new zero-value element into the map. It would be spec compliant to just emit the "r != 0" check before calling mapassign (see #23735), but currently these checks aren't generated until SSA construction. For now, it's simpler to continue desugaring /= and %= into two map indexing operations. Fixes #23661. Change-Id: I46e3739d9adef10e92b46fdd78b88d5aabe68952 Reviewed-on: https://go-review.googlesource.com/91557 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2018-03-12 19:27:44 +00:00
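The "m[k] op= r" form discussed above is the everyday counting idiom. A minimal sketch, with a hypothetical `tally` helper that is not part of the CL:

```go
package main

import "fmt"

// tally counts occurrences of each word using the m[k] += r form.
func tally(words []string) map[string]int {
	counts := make(map[string]int)
	for _, w := range words {
		counts[w] += 1 // a missing key reads as zero, then the entry is stored
	}
	return counts
}

func main() {
	fmt.Println(tally([]string{"a", "b", "a"}))
}
```

The observable behavior is the same before and after the change; the commit only affects how many map runtime calls the compiler emits for this statement.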
isharipo	85a8d25d53	cmd/compile/internal/ssa: emit IMUL3{L/Q} for MUL{L/Q}const on x86 cmd/asm now supports three-operand form of IMUL, so instead of using IMUL with resultInArg0, emit IMUL3 instruction. This results in less redundant MOVs where SSA assigns different registers to input[0] and dst arguments. Note: these have exactly the same encoding when reg0=reg1: IMUL3x $const, reg0, reg1 IMULx $const, reg Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0]. This is why we don't bother to generate IMULx for the case where dst is the same as input[0]. Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4 Reviewed-on: https://go-review.googlesource.com/99656 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-12 19:02:36 +00:00
Giovanni Bajo	080187f4f7	cmd/compile: implement CMOV on amd64 This builds upon the branchelim pass, activating it for amd64 and lowering CondSelect. Special care is made to FPU instructions for NaN handling. Benchmark results on Xeon E5630 (Westmere EP): name old time/op new time/op delta BinaryTree17-16 4.99s ± 9% 4.66s ± 2% ~ (p=0.095 n=5+5) Fannkuch11-16 4.93s ± 3% 5.04s ± 2% ~ (p=0.548 n=5+5) FmtFprintfEmpty-16 58.8ns ± 7% 61.4ns ±14% ~ (p=0.579 n=5+5) FmtFprintfString-16 114ns ± 2% 114ns ± 4% ~ (p=0.603 n=5+5) FmtFprintfInt-16 181ns ± 4% 125ns ± 3% -30.90% (p=0.008 n=5+5) FmtFprintfIntInt-16 263ns ± 2% 217ns ± 2% -17.34% (p=0.008 n=5+5) FmtFprintfPrefixedInt-16 230ns ± 1% 212ns ± 1% -7.99% (p=0.008 n=5+5) FmtFprintfFloat-16 411ns ± 3% 344ns ± 5% -16.43% (p=0.008 n=5+5) FmtManyArgs-16 828ns ± 4% 790ns ± 2% -4.59% (p=0.032 n=5+5) GobDecode-16 10.9ms ± 4% 10.8ms ± 5% ~ (p=0.548 n=5+5) GobEncode-16 9.52ms ± 5% 9.46ms ± 2% ~ (p=1.000 n=5+5) Gzip-16 334ms ± 2% 337ms ± 2% ~ (p=0.548 n=5+5) Gunzip-16 64.4ms ± 1% 65.0ms ± 1% +1.00% (p=0.008 n=5+5) HTTPClientServer-16 156µs ± 3% 155µs ± 3% ~ (p=0.690 n=5+5) JSONEncode-16 21.0ms ± 1% 21.8ms ± 0% +3.76% (p=0.016 n=5+4) JSONDecode-16 95.1ms ± 0% 95.7ms ± 1% ~ (p=0.151 n=5+5) Mandelbrot200-16 6.38ms ± 1% 6.42ms ± 1% ~ (p=0.095 n=5+5) GoParse-16 5.47ms ± 2% 5.36ms ± 1% -1.95% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 111ns ± 1% 111ns ± 1% ~ (p=0.635 n=5+4) RegexpMatchEasy0_1K-16 408ns ± 1% 411ns ± 2% ~ (p=0.087 n=5+5) RegexpMatchEasy1_32-16 103ns ± 1% 104ns ± 1% ~ (p=0.484 n=5+5) RegexpMatchEasy1_1K-16 659ns ± 2% 652ns ± 1% ~ (p=0.571 n=5+5) RegexpMatchMedium_32-16 176ns ± 2% 174ns ± 1% ~ (p=0.476 n=5+5) RegexpMatchMedium_1K-16 58.6µs ± 4% 57.7µs ± 4% ~ (p=0.548 n=5+5) RegexpMatchHard_32-16 3.07µs ± 3% 3.04µs ± 4% ~ (p=0.421 n=5+5) RegexpMatchHard_1K-16 89.2µs ± 1% 87.9µs ± 2% -1.52% (p=0.032 n=5+5) Revcomp-16 575ms ± 0% 587ms ± 2% +2.12% (p=0.032 n=4+5) Template-16 110ms ± 1% 107ms ± 3% -3.00% (p=0.032 n=5+5) TimeParse-16 
463ns ± 0% 462ns ± 0% ~ (p=0.810 n=5+4) TimeFormat-16 538ns ± 0% 535ns ± 0% -0.63% (p=0.024 n=5+5) name old speed new speed delta GobDecode-16 70.7MB/s ± 4% 71.4MB/s ± 5% ~ (p=0.452 n=5+5) GobEncode-16 80.7MB/s ± 5% 81.2MB/s ± 2% ~ (p=1.000 n=5+5) Gzip-16 58.2MB/s ± 2% 57.7MB/s ± 2% ~ (p=0.452 n=5+5) Gunzip-16 302MB/s ± 1% 299MB/s ± 1% -0.99% (p=0.008 n=5+5) JSONEncode-16 92.4MB/s ± 1% 89.1MB/s ± 0% -3.63% (p=0.016 n=5+4) JSONDecode-16 20.4MB/s ± 0% 20.3MB/s ± 1% ~ (p=0.135 n=5+5) GoParse-16 10.6MB/s ± 2% 10.8MB/s ± 1% +2.00% (p=0.016 n=5+5) RegexpMatchEasy0_32-16 286MB/s ± 1% 285MB/s ± 3% ~ (p=1.000 n=5+5) RegexpMatchEasy0_1K-16 2.51GB/s ± 1% 2.49GB/s ± 2% ~ (p=0.095 n=5+5) RegexpMatchEasy1_32-16 309MB/s ± 1% 307MB/s ± 1% ~ (p=0.548 n=5+5) RegexpMatchEasy1_1K-16 1.55GB/s ± 2% 1.57GB/s ± 1% ~ (p=0.690 n=5+5) RegexpMatchMedium_32-16 5.68MB/s ± 2% 5.73MB/s ± 1% ~ (p=0.579 n=5+5) RegexpMatchMedium_1K-16 17.5MB/s ± 4% 17.8MB/s ± 4% ~ (p=0.500 n=5+5) RegexpMatchHard_32-16 10.4MB/s ± 3% 10.5MB/s ± 4% ~ (p=0.460 n=5+5) RegexpMatchHard_1K-16 11.5MB/s ± 1% 11.7MB/s ± 2% +1.57% (p=0.032 n=5+5) Revcomp-16 442MB/s ± 0% 433MB/s ± 2% -2.05% (p=0.032 n=4+5) Template-16 17.7MB/s ± 1% 18.2MB/s ± 3% +3.12% (p=0.032 n=5+5) Change-Id: Ic7cb7374d07da031e771bdcbfdd832fd1b17159c Reviewed-on: https://go-review.googlesource.com/98695 Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2018-03-12 18:01:33 +00:00
Giovanni Bajo	f7ac70a566	test: move rotate tests to top-level testsuite. Remove old tests from asm_test. Change-Id: Ib408ec7faa60068bddecf709b93ce308e0ef665a Reviewed-on: https://go-review.googlesource.com/100075 Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>	2018-03-11 10:08:18 +00:00
Alberto Donizetti	8e427a3878	test/codegen: add README file for the codegen test harness This change adds a README file inside the test/codegen directory, explaining how to run the codegen tests and the syntax of the regexps comments used to match assembly instructions. Change-Id: Ica4eb3ffa9c6975371538cc8ae0ac3c1a3a03baf Reviewed-on: https://go-review.googlesource.com/99156 Reviewed-by: Keith Randall <khr@golang.org>	2018-03-09 18:38:53 +00:00
Alberto Donizetti	5f541b11aa	test/codegen: port MULs merging tests to codegen And delete them from asm_go. Change-Id: I0057cbd90ca55fa51c596e32406e190f3866f93e Reviewed-on: https://go-review.googlesource.com/99815 Reviewed-by: Keith Randall <khr@golang.org>	2018-03-09 17:01:56 +00:00
Alberto Donizetti	cde34780b7	test/codegen: port math/bits.RotateLeft tests to codegen Only RotateLeft{64,32} were tested, and just for ppc64. This CL adds tests for RotateLeft{64,32,16,8} on arm64 and amd64/386, for the cases where the calls are actually intrinsified. RotateLeft tests (the last ones for math/bits functions) are deleted from asm_test. This CL also adds a space between the "//" and the arch name in the comments, to make this file consistent with the style used in all the other files. Change-Id: Ifc2a27261d70bcc294b4ec64490d8367f62d2b89 Reviewed-on: https://go-review.googlesource.com/99596 Reviewed-by: Giovanni Bajo <rasky@develer.com>	2018-03-09 10:53:38 +00:00
Austin Clements	6436270dad	cmd/compile: add fence-post implications to prove This adds four new deductions to the prove pass, all related to adding or subtracting one from a value. This is the first hint of actual arithmetic relations in the prove pass. The most effective of these is x-1 >= w && x > min ⇒ x > w This helps eliminate bounds checks in code like if x > 0 { // do something with s[x-1] } Altogether, these deductions prove an additional 260 branches in std and cmd. Furthermore, they will let us eliminate some tricky compiler-inserted panics in the runtime that are interfering with static analysis. Fixes #23354. Change-Id: I7088223e0e0cd6ff062a75c127eb4bb60e6dce02 Reviewed-on: https://go-review.googlesource.com/87480 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>	2018-03-08 22:25:28 +00:00
Austin Clements	941fc129e2	cmd/compile: derive unsigned limits from signed limits in prove This adds a few simple deductions to the prove pass' fact table to derive unsigned concrete limits from signed concrete limits where possible. This tweak lets the pass prove 70 additional branch conditions in std and cmd. This is based on a comment from the recently-deleted factsTable.get: "// TODO: also use signed data if lim.min >= 0". Change-Id: Ib4340249e7733070f004a0aa31254adf5df8a392 Reviewed-on: https://go-review.googlesource.com/87479 Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-08 22:25:27 +00:00
Austin Clements	669db2cef5	cmd/compile: make prove pass use unsatisfiability Currently the prove pass uses implication queries. For each block, it collects the set of branch conditions leading to that block, and queries this fact table for whether any of these facts imply the block's own branch condition (or its inverse). This works remarkably well considering it doesn't do any deduction on these facts, but it has various downsides: 1. It requires an implementation both of adding facts to the table and determining implications. These are very nearly duals of each other, but require separate implementations. Likewise, the process of asserting facts of dominating branch conditions is very nearly the dual of the process of querying implied branch conditions. 2. It leads to less effective use of derived facts. For example, the prove pass currently derives facts about the relations between len and cap, but can't make use of these unless a branch condition is in the exact form of a derived fact. If one of these derived facts contradicts another fact, it won't notice or make use of this. This CL changes the approach of the prove pass to use contradiction instead of implication. Rather than ever querying a branch condition, it simply adds branch conditions to the fact table. If this leads to a contradiction (specifically, it makes the fact set unsatisfiable), that branch is impossible and can be cut. As a result, 1. We can eliminate the code for determining implications (factsTable.get disappears entirely). Also, there is now a single implementation of visiting and asserting branch conditions, since we don't have to flip them around to treat them as facts in one place and queries in another. 2. Derived facts can be used effectively. It doesn't matter why the fact table is unsatisfiable; a contradiction in any of the facts is enough. 3. As an added benefit, it's now quite easy to avoid traversing beyond provably-unreachable blocks. 
In contrast, the current implementation always visits all blocks. The prove pass already has nearly all of the mechanism necessary to compute unsatisfiability, which means this both simplifies the code and makes it more powerful. The only complication is that the current implication procedure has a hack for dealing with the 0 <= Args[0] condition of OpIsInBounds and OpIsSliceInBounds. We replace this with asserting the appropriate fact when we process one of these conditions. This seems much cleaner anyway, and works because we can now take advantage of derived facts. This has no measurable effect on compiler performance. Effectiveness: There is exactly one condition in all of std and cmd that this fails to prove that the old implementation could: (int64(^uint(0)>>1) < x) in encoding/gob. This can never be true because x is an int, and it's basically coincidence that the old code gets this. (For example, it fails to prove the similar (x < ^int64(^uint(0)>>1)) condition that immediately precedes it, and even though the conditions are logically unrelated, it wouldn't get the second one if it hadn't first processed the first!) It does, however, prove a few dozen additional branches. These come from facts that are added to the fact table about the relations between len and cap. These were almost never queried directly before, but could lead to contradictions, which the unsat-based approach is able to use. There are exactly two branches in std and cmd that this implementation proves in the other direction. This sounds scary, but is okay because both occur in already-unreachable blocks, so it doesn't matter what we chose. Because the fact table logic is sound but incomplete, it fails to prove that the block isn't reachable, even though it is able to prove that both outgoing branches are impossible. We could turn these blocks into BlockExit blocks, but it doesn't seem worth the trouble of the extra proof effort for something that happens twice in all of std and cmd. 
Tests: This CL updates test/prove.go to change the expected messages because it can no longer give a "reason" why it proved or disproved a condition. It also adds a new test of a branch it couldn't prove before. It mostly guts test/sliceopt.go, removing everything related to slice bounds optimizations and moving a few relevant tests to test/prove.go. Much of this test is actually unreachable. The new prove pass figures this out and doesn't try to prove anything about the unreachable parts. The output on the unreachable parts is already suspect because anything can be proved at that point, so it's really just a regression test for an algorithm the compiler no longer uses. This is a step toward fixing #23354. That issue is quite easy to fix once we can use derived facts effectively. Change-Id: Ia48a1b9ee081310579fe474e4a61857424ff8ce8 Reviewed-on: https://go-review.googlesource.com/87478 Reviewed-by: Keith Randall <khr@golang.org>	2018-03-08 22:25:25 +00:00
Alberto Donizetti	3772b2e1d5	test/codegen: port 2^n muls tests to codegen harness And delete them from the asm_test.go file. Change-Id: I124c8c352299646ec7db0968cdb0fe59a3b5d83d Reviewed-on: https://go-review.googlesource.com/99475 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>	2018-03-08 16:30:14 +00:00
Matthew Dempsky	88466e93a4	cmd/compile: mark anonymous receiver parameters as non-escaping This was already done for normal parameters, and the same logic applies for receiver parameters too. Updates #24305. Change-Id: Ia2a46f68d14e8fb62004ff0da1db0f065a95a1b7 Reviewed-on: https://go-review.googlesource.com/99335 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-03-08 00:20:01 +00:00
Alberto Donizetti	c028958393	test/codegen: fix issue with arm64 memmove codegen test This recently added arm64 memmove codegen check: func movesmall() { // arm64:-"memmove" x := [...]byte{1, 2, 3, 4, 5, 6, 7} copy(x[1:], x[:]) } is not correct, for two reasons: 1. regexps are matched from the start of the disasm line (excluding line information). This means that a negative -"memmove" check will pass against a 'CALL runtime.memmove' line because the line does not start with 'memmove' (it starts with CALL...). The way to specify no 'memmove' match whatsoever on the line is -".memmove". 2. AFAIK comments on their own line are matched against the first subsequent non-comment line. So the code above only verifies that the x := ... line does not generate a memmove. The comment should be moved near the copy() line, if it's that one we want to not generate a memmove call. The fact that the test above is not effective can be checked by running `go run run.go -v codegen` in the toplevel test directory with a go1.10 toolchain (that does not have the memmove-elision optimization). The test will still pass (it shouldn't). This change changes the regexp to -".memmove" and moves it near the line it needs to (not) match. Change-Id: Ie01ef4d775e77d92dc8d8b7856b89b200f5e5ef2 Reviewed-on: https://go-review.googlesource.com/98977 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-03-07 16:41:24 +00:00
Kunpei Sakai	b75e8a2a3b	cmd/compile: prevent detection of wrong duplicates by including *types.Type in typeVal. Updates #21866 Fixes #24159 Change-Id: I2f8cac252d88d43e723124f2867b1410b7abab7b Reviewed-on: https://go-review.googlesource.com/98476 Run-TryBot: Kunpei Sakai <namusyaka@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2018-03-07 01:26:00 +00:00
ChrisALiles	42ecf39e85	cmd/compile: improve compiler error on embedded structs Fixes #23609 Change-Id: I751aae3d849de7fce1306324fcb1a4c3842d873e Reviewed-on: https://go-review.googlesource.com/97076 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-06 21:06:46 +00:00
Alberto Donizetti	8516ecd05f	test/codegen: port math/bits.ReverseBytes tests to codegen And remove them from ssa_test. Change-Id: If767af662801219774d1bdb787c77edfa6067770 Reviewed-on: https://go-review.googlesource.com/98976 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>	2018-03-06 20:34:33 +00:00


3049 Commits