qbit/go

mirror of https://github.com/golang/go synced 2024-11-17 15:04:45 -07:00

Author	SHA1	Message	Date
Keith Randall	af7eafd150	cmd/compile: convert 386 port to use addressing modes pass (take 2) Retrying CL 222782, with a fix that will hopefully stop the random crashing. The issue with the previous CL is that it does pointer arithmetic in a way that may briefly generate an out-of-bounds pointer. If an interrupt happens to occur in that state, the referenced object may be collected incorrectly. Suppose there was code that did s[x+c]. The previous CL had a rule to the effect of ptr + (x + c) -> c + (ptr + x). But ptr+x is not guaranteed to point to the same object as ptr. In contrast, ptr+(x+c) is guaranteed to point to the same object as ptr, because we would have already checked that x+c is in bounds. For example, strconv.trim used to have this code: MOVZX -0x1(BX)(DX*1), BP CMPL $0x30, AL After CL 222782, it had this code: LEAL 0(BX)(DX*1), BP CMPB $0x30, -0x1(BP) An interrupt between those last two instructions could see BP pointing outside the backing store of the slice involved. It's really hard to actually demonstrate a bug. First, you need to have an interrupt occur at exactly the right time. Then, there must be no other pointers to the object in question. Since the interrupted frame will be scanned conservatively, there can't even be a dead pointer in another register or on the stack. (In the example above, a bug can't happen because BX still holds the original pointer.) Then, the object in question needs to be collected (or at least scanned?) before the interrupted code continues. This CL needs to handle load combining somewhat differently than CL 222782 because of the new restriction on arithmetic. That's the only real difference (other than removing the bad rules) from that old CL. This bug is also present in the amd64 rewrite rules, and we haven't seen any crashing as a result. I will fix up that code similarly to this one in a separate CL. 
Update #37881 Change-Id: I5f0d584d9bef4696bfe89a61ef0a27c8d507329f Reviewed-on: https://go-review.googlesource.com/c/go/+/225798 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-03-27 18:54:45 +00:00
Lynn Boger	e4a1cf8a56	cmd/compile: add rules to eliminate unnecessary signed shifts This change to the rules removes some unnecessary signed shifts that appear in the math/rand functions. Existing rules did not cover some of the signed cases. A little improvement seen in math/rand due to removing 1 of 2 instructions generated for Int31n, which is inlined quite a bit. Intn1000 46.9ns ± 0% 45.5ns ± 0% -2.99% (p=1.000 n=1+1) Int63n1000 33.5ns ± 0% 32.8ns ± 0% -2.09% (p=1.000 n=1+1) Int31n1000 32.7ns ± 0% 32.6ns ± 0% -0.31% (p=1.000 n=1+1) Float32 32.7ns ± 0% 30.3ns ± 0% -7.34% (p=1.000 n=1+1) Float64 21.7ns ± 0% 20.9ns ± 0% -3.69% (p=1.000 n=1+1) Perm3 205ns ± 0% 202ns ± 0% -1.46% (p=1.000 n=1+1) Perm30 1.71µs ± 0% 1.68µs ± 0% -1.35% (p=1.000 n=1+1) Perm30ViaShuffle 1.65µs ± 0% 1.65µs ± 0% -0.30% (p=1.000 n=1+1) ShuffleOverhead 2.83µs ± 0% 2.83µs ± 0% -0.07% (p=1.000 n=1+1) Read3 18.7ns ± 0% 16.1ns ± 0% -13.90% (p=1.000 n=1+1) Read64 126ns ± 0% 124ns ± 0% -1.59% (p=1.000 n=1+1) Read1000 1.75µs ± 0% 1.63µs ± 0% -7.08% (p=1.000 n=1+1) Change-Id: I11502dfca7d65aafc76749a8d713e9e50c24a858 Reviewed-on: https://go-review.googlesource.com/c/go/+/225917 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-03-27 16:05:42 +00:00
Ruixin(Peter) Bao	16cfab8d89	cmd/compile: use load and test instructions on s390x The load and test instructions compare the given value against zero and will produce a condition code indicating one of the following scenarios: 0: Result is zero 1: Result is less than zero 2: Result is greater than zero 3: Result is not a number (NaN) The instruction can be used to simplify floating point comparisons against zero, which can enable further optimizations. This CL also reduces the size of .text section of math.test binary by around 0.7 KB (in hexadecimal, from 1358f0 to 135620). Change-Id: I33cb714f0c6feebac7a1c46dfcc735e7daceff9c Reviewed-on: https://go-review.googlesource.com/c/go/+/209159 Reviewed-by: Michael Munday <mike.munday@ibm.com> Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2020-03-25 13:10:07 +00:00
Keith Randall	c785633941	Revert "cmd/compile: convert 386 port to use addressing modes pass" This reverts commit CL 222782. Reason for revert: Reverting to see if 386 errors go away Update #37881 Change-Id: I74f287404c52414db1b6ff1649effa4ed9e5cc0c Reviewed-on: https://go-review.googlesource.com/c/go/+/225218 Reviewed-by: Bryan C. Mills <bcmills@google.com>	2020-03-24 19:07:15 +00:00
Keith Randall	e0deacd1c0	Revert "cmd/compile: disable mem+op operations on 386" This reverts commit CL 224837. Reason for revert: Reverting partial reverts of 222782. Update #37881 Change-Id: Ie9bf84d6e17ed214abe538965e5ff03936886826 Reviewed-on: https://go-review.googlesource.com/c/go/+/225217 Reviewed-by: Bryan C. Mills <bcmills@google.com>	2020-03-24 19:06:22 +00:00
Keith Randall	f975485ad1	Revert "cmd/compile: disable addressingmodes pass for 386" This reverts commit CL 225057. Reason for revert: Undoing partial reverts of CL 222782 Update #37881 Change-Id: Iee024cab2a580a37a0fc355e0e3c5ad3d8fdaf7d Reviewed-on: https://go-review.googlesource.com/c/go/+/225197 Reviewed-by: Bryan C. Mills <bcmills@google.com>	2020-03-24 19:05:50 +00:00
Keith Randall	5b897ec017	cmd/compile: disable addressingmodes pass for 386 Update #37881 Change-Id: I1f9a3f57f6215a19c31765c257ee78715eab36b7 Reviewed-on: https://go-review.googlesource.com/c/go/+/225057 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2020-03-23 20:31:13 +00:00
Keith Randall	3adbdb6d99	cmd/compile: disable mem+op operations on 386 Rolling back portions of CL 222782 to see if that helps issue #37881 any. Update #37881 Change-Id: I9cc3ff8c469fa5e4b22daec715d04148033f46f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/224837 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com>	2020-03-23 18:27:37 +00:00
Russ Cox	fc8a6336d1	cmd/asm, cmd/compile, runtime: add -spectre=ret mode This commit extends the -spectre flag to cmd/asm and adds a new Spectre mitigation mode "ret", which enables the use of retpolines. Retpolines prevent speculation about the target of an indirect jump or call and are described in more detail here: https://support.google.com/faqs/answer/7625886 Change-Id: I4f2cb982fa94e44d91e49bd98974fd125619c93a Reviewed-on: https://go-review.googlesource.com/c/go/+/222661 Reviewed-by: Keith Randall <khr@golang.org>	2020-03-13 19:05:54 +00:00
Russ Cox	877ef86bec	cmd/compile: add spectre mitigation mode enabled by -spectre This commit adds a new cmd/compile flag -spectre, which accepts a comma-separated list of possible Spectre mitigations to apply, or the empty string (none), or "all". The only known mitigation right now is "index", which uses conditional moves to ensure that x86-64 CPUs do not speculate past index bounds checks. Speculating past index bounds checks may be problematic on systems running privileged servers that accept requests from untrusted users who can execute their own programs on the same machine. (And some more constraints that make it even more unlikely in practice.) The cases this protects against are analogous to the ones Microsoft explains in the "Array out of bounds load/store feeding ..." sections here: https://docs.microsoft.com/en-us/cpp/security/developer-guidance-speculative-execution?view=vs-2019#array-out-of-bounds-load-feeding-an-indirect-branch Change-Id: Ib7532d7e12466b17e04c4e2075c2a456dc98f610 Reviewed-on: https://go-review.googlesource.com/c/go/+/222660 Reviewed-by: Keith Randall <khr@golang.org>	2020-03-13 19:05:46 +00:00
Keith Randall	d84cbec890	cmd/compile: convert 386 port to use addressing modes pass Update #36468 Change-Id: Idfdb845d097994689be450d6e8a57fa9adb57166 Reviewed-on: https://go-review.googlesource.com/c/go/+/222782 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-03-13 17:00:54 +00:00
Russ Cox	801a9d9a0c	test/codegen: mention in README that tests only run on Linux without -all_codegen This took me a while to figure out. The relevant code is in test/run.go (note the "linux" hard-coded strings): var arch, subarch, os string switch { case archspec[2] != "": // 3 components: "linux/386/sse2" os, arch, subarch = archspec[0], archspec[1][1:], archspec[2][1:] case archspec[1] != "": // 2 components: "386/sse2" os, arch, subarch = "linux", archspec[0], archspec[1][1:] default: // 1 component: "386" os, arch, subarch = "linux", archspec[0], "" if arch == "wasm" { os = "js" } } Change-Id: I92ba280025d2072e17532a5e43cf1d676789c167 Reviewed-on: https://go-review.googlesource.com/c/go/+/222819 Reviewed-by: Keith Randall <khr@golang.org>	2020-03-11 16:17:08 +00:00
Keith Randall	98cb76799c	cmd/compile: insert complicated x86 addressing modes as a separate pass Use a separate compiler pass to introduce complicated x86 addressing modes. Loads in the normal architecture rules (for x86 and all other platforms) can have constant offsets (AuxInt values) and symbols (Aux values), but no more. The complex addressing modes (x+y, x+2*y, etc.) are introduced in a separate pass that combines loads with LEAQx ops. Organizing rewrites this way simplifies the number of rewrites required, as there are lots of different rule orderings that have to be specified to ensure these complex addressing modes are always found if they are possible. Update #36468 Change-Id: I5b4bf7b03a1e731d6dfeb9ef19b376175f3b4b44 Reviewed-on: https://go-review.googlesource.com/c/go/+/217097 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-03-10 00:13:21 +00:00
Diogo Pinela	19ed0d993c	cmd/compile: use staticuint64s instead of staticbytes There are still two places in src/runtime/string.go that use staticbytes, so we cannot delete it just yet. There is a new codegen test to verify that the index calculation is constant-folded, at least on amd64. ppc64, mips[64] and s390x cannot currently do that. There is also a new runtime benchmark to ensure that this does not slow down performance (tested against parent commit): name old time/op new time/op delta ConvT2EByteSized/bool-4 1.07ns ± 1% 1.07ns ± 1% ~ (p=0.060 n=14+15) ConvT2EByteSized/uint8-4 1.06ns ± 1% 1.07ns ± 1% ~ (p=0.095 n=14+15) Updates #37612 Change-Id: I5ec30738edaa48cda78dfab4a78e24a32fa7fd6a Reviewed-on: https://go-review.googlesource.com/c/go/+/221957 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-03-04 21:43:01 +00:00
Keith Randall	cd9fd640db	cmd/compile: don't allow NaNs in floating-point constant ops Trying this CL again, with a fixed test that allows platforms to disagree on the exact behavior of converting NaNs. We store 32-bit floating point constants in a 64-bit field, by converting that 32-bit float to 64-bit float to store it, and convert it back to use it. That works for almost all floating-point constants. The exception is signaling NaNs. The round trip described above means we can't represent a 32-bit signaling NaN, because conversions strip the signaling bit. To fix this issue, just forbid NaNs as floating-point constants in SSA form. This shouldn't affect any real-world code, as people seldom constant-propagate NaNs (except in test code). Additionally, NaNs are somewhat underspecified (which of the many NaNs do you get when dividing 0/0?), so when cross-compiling there's a danger of using the compiler machine's NaN regime for some math, and the target machine's NaN regime for other math. Better to use the target machine's NaN regime always. Update #36400 Change-Id: Idf203b688a15abceabbd66ba290d4e9f63619ecb Reviewed-on: https://go-review.googlesource.com/c/go/+/221790 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-03-04 04:49:54 +00:00
Josh Bleecher Snyder	b49d8ce2fa	all: fix two minor typos in comments Change-Id: Iec6cd81c9787d3419850aa97e75052956ad139bc Reviewed-on: https://go-review.googlesource.com/c/go/+/221789 Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>	2020-03-03 17:44:05 +00:00
Michael Munday	e37cc29863	cmd/compile: optimize integer-in-range checks This CL incorporates code from CL 201206 by Josh Bleecher Snyder (thanks Josh). This CL restores the integer-in-range optimizations in the SSA backend. The fuse pass is enhanced to detect inequalities that could be merged and fuse their associated blocks while the generic rules optimize them into a single unsigned comparison. For example, the inequality `x >= 0 && x < 10` will now be optimized to `unsigned(x) < 10`. Overall has a fairly positive impact on binary sizes. name old time/op new time/op delta Template 192ms ± 1% 192ms ± 1% ~ (p=0.757 n=17+18) Unicode 76.6ms ± 2% 76.5ms ± 2% ~ (p=0.603 n=19+19) GoTypes 694ms ± 1% 693ms ± 1% ~ (p=0.569 n=19+20) Compiler 3.26s ± 0% 3.27s ± 0% +0.25% (p=0.000 n=20+20) SSA 7.41s ± 0% 7.49s ± 0% +1.10% (p=0.000 n=17+19) Flate 120ms ± 1% 120ms ± 1% +0.38% (p=0.003 n=19+19) GoParser 152ms ± 1% 152ms ± 1% ~ (p=0.061 n=17+19) Reflect 422ms ± 1% 425ms ± 2% +0.76% (p=0.001 n=18+20) Tar 167ms ± 1% 167ms ± 0% ~ (p=0.730 n=18+19) XML 233ms ± 4% 231ms ± 1% ~ (p=0.752 n=20+17) LinkCompiler 927ms ± 8% 928ms ± 8% ~ (p=0.857 n=19+20) ExternalLinkCompiler 1.81s ± 2% 1.81s ± 2% ~ (p=0.513 n=19+20) LinkWithoutDebugCompiler 556ms ±10% 583ms ±13% +4.95% (p=0.007 n=20+20) [Geo mean] 478ms 481ms +0.52% name old user-time/op new user-time/op delta Template 270ms ± 5% 269ms ± 7% ~ (p=0.925 n=20+20) Unicode 134ms ± 7% 131ms ±14% ~ (p=0.593 n=18+20) GoTypes 981ms ± 3% 987ms ± 2% +0.63% (p=0.049 n=19+18) Compiler 4.50s ± 2% 4.50s ± 1% ~ (p=0.588 n=19+20) SSA 10.6s ± 2% 10.6s ± 1% ~ (p=0.141 n=20+19) Flate 164ms ± 8% 165ms ±10% ~ (p=0.738 n=20+20) GoParser 202ms ± 5% 203ms ± 6% ~ (p=0.820 n=20+20) Reflect 587ms ± 6% 597ms ± 3% ~ (p=0.087 n=20+18) Tar 230ms ± 6% 228ms ± 8% ~ (p=0.569 n=19+20) XML 311ms ± 6% 314ms ± 5% ~ (p=0.369 n=20+20) LinkCompiler 878ms ± 8% 887ms ± 7% ~ (p=0.289 n=20+20) ExternalLinkCompiler 1.60s ± 7% 1.60s ± 7% ~ (p=0.820 n=20+20) LinkWithoutDebugCompiler 498ms 
±12% 489ms ±11% ~ (p=0.398 n=20+20) [Geo mean] 611ms 611ms +0.05% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 36.0MB ± 0% -0.32% (p=0.000 n=20+20) Unicode 28.3MB ± 0% 28.3MB ± 0% -0.03% (p=0.000 n=19+20) GoTypes 121MB ± 0% 121MB ± 0% ~ (p=0.226 n=16+20) Compiler 563MB ± 0% 563MB ± 0% ~ (p=0.166 n=20+19) SSA 1.32GB ± 0% 1.33GB ± 0% +0.88% (p=0.000 n=20+19) Flate 22.7MB ± 0% 22.7MB ± 0% -0.02% (p=0.033 n=19+20) GoParser 27.9MB ± 0% 27.9MB ± 0% -0.02% (p=0.001 n=20+20) Reflect 78.3MB ± 0% 78.2MB ± 0% -0.01% (p=0.019 n=20+20) Tar 34.0MB ± 0% 34.0MB ± 0% -0.04% (p=0.000 n=20+20) XML 43.9MB ± 0% 43.9MB ± 0% -0.07% (p=0.000 n=20+19) LinkCompiler 205MB ± 0% 205MB ± 0% +0.44% (p=0.000 n=20+18) ExternalLinkCompiler 223MB ± 0% 223MB ± 0% +0.03% (p=0.000 n=20+20) LinkWithoutDebugCompiler 139MB ± 0% 142MB ± 0% +1.75% (p=0.000 n=20+20) [Geo mean] 93.7MB 93.9MB +0.20% name old allocs/op new allocs/op delta Template 363k ± 0% 361k ± 0% -0.58% (p=0.000 n=20+19) Unicode 329k ± 0% 329k ± 0% -0.06% (p=0.000 n=19+20) GoTypes 1.28M ± 0% 1.28M ± 0% -0.01% (p=0.000 n=20+20) Compiler 5.40M ± 0% 5.40M ± 0% -0.01% (p=0.000 n=20+20) SSA 12.7M ± 0% 12.8M ± 0% +0.80% (p=0.000 n=20+20) Flate 228k ± 0% 228k ± 0% ~ (p=0.194 n=20+20) GoParser 295k ± 0% 295k ± 0% -0.04% (p=0.000 n=20+20) Reflect 949k ± 0% 949k ± 0% -0.01% (p=0.000 n=20+20) Tar 337k ± 0% 337k ± 0% -0.06% (p=0.000 n=20+20) XML 418k ± 0% 417k ± 0% -0.17% (p=0.000 n=20+20) LinkCompiler 553k ± 0% 554k ± 0% +0.22% (p=0.000 n=20+19) ExternalLinkCompiler 1.52M ± 0% 1.52M ± 0% +0.27% (p=0.000 n=20+20) LinkWithoutDebugCompiler 186k ± 0% 186k ± 0% +0.06% (p=0.000 n=20+20) [Geo mean] 723k 723k +0.03% name old text-bytes new text-bytes delta HelloSize 828kB ± 0% 828kB ± 0% -0.01% (p=0.000 n=20+20) name old data-bytes new data-bytes delta HelloSize 13.4kB ± 0% 13.4kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 180kB ± 0% 180kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.23MB ± 0% 
1.23MB ± 0% -0.33% (p=0.000 n=20+20) file before after Δ % addr2line 4320075 4311883 -8192 -0.190% asm 5191932 5187836 -4096 -0.079% buildid 2835338 2831242 -4096 -0.144% compile 20531717 20569099 +37382 +0.182% cover 5322511 5318415 -4096 -0.077% dist 3723749 3719653 -4096 -0.110% doc 4743515 4739419 -4096 -0.086% fix 3413960 3409864 -4096 -0.120% link 6690119 6686023 -4096 -0.061% nm 4269616 4265520 -4096 -0.096% pprof 14942189 14929901 -12288 -0.082% trace 11807164 11790780 -16384 -0.139% vet 8384104 8388200 +4096 +0.049% go 15339076 15334980 -4096 -0.027% total 132258257 132226007 -32250 -0.024% Fixes #30645. Change-Id: If551ac5996097f3685870d083151b5843170aab0 Reviewed-on: https://go-review.googlesource.com/c/go/+/165998 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-03-03 14:30:26 +00:00
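The integer-in-range rewrite described in the commit above can be checked at the source level. The following is a minimal sketch, not the compiler's implementation: the function names `inRangeNaive` and `inRangeFused` are hypothetical, and the sketch only shows that the two comparisons agree for every input tried.

```go
package main

import "fmt"

// inRangeNaive spells out both comparisons, as in the commit message's example.
func inRangeNaive(x int) bool {
	return x >= 0 && x < 10
}

// inRangeFused uses a single unsigned comparison. A negative x converts to a
// very large unsigned value, so it fails the "< 10" test just as it would
// fail "x >= 0" in the naive form.
func inRangeFused(x int) bool {
	return uint(x) < 10
}

func main() {
	for _, x := range []int{-1, 0, 9, 10} {
		fmt.Println(x, inRangeNaive(x), inRangeFused(x))
	}
}
```

Whether a given compiler version emits one comparison for `inRangeNaive` depends on the toolchain and target, so the generated assembly should be inspected rather than assumed.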
Michael Munday	44fe355694	cmd/compile: canonicalize comparison argument order Ensure that any comparison between two values has the same argument order. This helps ensure that they can be eliminated during the lowered CSE pass which will be particularly important if we eliminate the Greater and Geq ops (see #37316). Example: CMP R0, R1 BLT L1 CMP R1, R0 // different order, cannot eliminate BEQ L2 CMP R0, R1 BLT L1 CMP R0, R1 // same order, can eliminate BEQ L2 This does have some drawbacks. Notably comparisons might 'flip' direction in the assembly output after even small changes to the code or compiler. It should help make optimizations more reliable however. compilecmp master -> HEAD master (`218f4572f5`): text/template: make reflect.Value indirections more robust HEAD (f1661fef3e): cmd/compile: canonicalize comparison argument order platform: linux/amd64 file before after Δ % api 6063927 6068023 +4096 +0.068% asm 5191757 5183565 -8192 -0.158% cgo 4893518 4901710 +8192 +0.167% cover 5330345 5326249 -4096 -0.077% fix 3417778 3421874 +4096 +0.120% pprof 14889456 14885360 -4096 -0.028% test2json 2848138 2844042 -4096 -0.144% trace 11746239 11733951 -12288 -0.105% total 132739173 132722789 -16384 -0.012% Change-Id: I11736b3fe2a4553f6fc65018f475e88217fa22f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/220425 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-02-26 10:32:22 +00:00
Bryan C. Mills	a9f1ea4a83	Revert "cmd/compile: don't allow NaNs in floating-point constant ops" This reverts CL 213477. Reason for revert: tests are failing on linux-mips*-rtrk builders. Change-Id: I8168f7450890233f1bd7e53930b73693c26d4dc0 Reviewed-on: https://go-review.googlesource.com/c/go/+/220897 Run-TryBot: Bryan C. Mills <bcmills@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2020-02-25 15:49:19 +00:00
Keith Randall	2aa7c6c548	cmd/compile: don't allow NaNs in floating-point constant ops We store 32-bit floating point constants in a 64-bit field, by converting that 32-bit float to 64-bit float to store it, and convert it back to use it. That works for almost all floating-point constants. The exception is signaling NaNs. The round trip described above means we can't represent a 32-bit signaling NaN, because conversions strip the signaling bit. To fix this issue, just forbid NaNs as floating-point constants in SSA form. This shouldn't affect any real-world code, as people seldom constant-propagate NaNs (except in test code). Additionally, NaNs are somewhat underspecified (which of the many NaNs do you get when dividing 0/0?), so when cross-compiling there's a danger of using the compiler machine's NaN regime for some math, and the target machine's NaN regime for other math. Better to use the target machine's NaN regime always. This has been a bug since 1.10, and there's an easy workaround (declare a global variable containing the signaling NaN pattern, and use that as the argument to math.Float32frombits) so we'll fix it in 1.15. Fixes #36400 Update #36399 Change-Id: Icf155e743281560eda2eed953d19a829552ccfda Reviewed-on: https://go-review.googlesource.com/c/go/+/213477 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-02-25 02:21:53 +00:00
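The workaround mentioned in the commit above (a global variable holding the bit pattern, passed to math.Float32frombits) might look like the sketch below. The names `snanBits` and `signalingNaN32` and the specific pattern 0x7f800001 are assumptions chosen for illustration; the commit does not name a pattern.

```go
package main

import (
	"fmt"
	"math"
)

// snanBits is a hypothetical 32-bit signaling NaN pattern: exponent bits all
// ones, quiet bit (bit 22) clear, nonzero payload. It is a package-level
// variable rather than a constant so the value is not constant-folded.
var snanBits uint32 = 0x7f800001

// signalingNaN32 builds the float32 from the bit pattern at run time.
func signalingNaN32() float32 {
	return math.Float32frombits(snanBits)
}

func main() {
	x := signalingNaN32()
	fmt.Println(x != x) // a NaN never compares equal to itself
	// Whether the signaling bit survives further conversions is
	// platform-dependent, so the bits are printed without an expected value.
	fmt.Printf("%#x\n", math.Float32bits(x))
}
```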
Keith Randall	1cfe8e91b6	cmd/compile: use ADDQ instead of LEAQ when we can The address calculations in the example end up doing x << 4 + y + 0. Before this CL we use a SHLQ+LEAQ. Since the constant offset is 0, we can use SHLQ+ADDQ instead. Change-Id: Ia048c4fdbb3a42121c7e1ab707961062e8247fca Reviewed-on: https://go-review.googlesource.com/c/go/+/209959 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-02-24 21:33:53 +00:00
Brian Kessler	6b1d5471b9	cmd/compile: add signed indivisibility by power of 2 rules Commit `44343c777c` (CL 173557) added rules for handling divisibility checks for powers of 2 for signed integers, x%c ==0. This change adds the complementary indivisibility rules, x%c != 0. Fixes #34166 Change-Id: I87379e30af7aff633371acca82db2397da9b2c07 Reviewed-on: https://go-review.googlesource.com/c/go/+/194219 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-11-07 16:30:46 +00:00
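For a power-of-two divisor, the indivisibility check `x%c != 0` on a signed integer gives the same answer as testing the low bits with a mask, because in two's complement a value is a multiple of 2^k exactly when its low k bits are zero. The sketch below illustrates that equivalence for c = 8. The function names are hypothetical, and the mask form is shown as an equivalent formulation, not as the literal rewrite rule from the commit.

```go
package main

import (
	"fmt"
	"math"
)

// indivisibleMod is the straightforward source-level check.
func indivisibleMod(x int32) bool {
	return x%8 != 0
}

// indivisibleMask tests the low three bits instead. This also holds for
// negative x because Go integers use two's complement.
func indivisibleMask(x int32) bool {
	return x&7 != 0
}

func main() {
	for _, x := range []int32{-16, -9, -8, -1, 0, 7, 8, math.MinInt32} {
		fmt.Println(x, indivisibleMod(x), indivisibleMask(x))
	}
}
```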
Russ Cox	543c6d2e0d	math, cmd/compile: rename Fma to FMA This API was added for #25819, where it was discussed as math.FMA. The commit adding it used math.Fma, presumably for consistency with the rest of the unusual names in package math (Sincos, Acosh, Erfcinv, Float32bits, etc). I believe that using an idiomatic Go name is more important here than consistency with these other names, most of which are historical baggage from C's standard library. Early additions like Float32frombits happened before "uppercase for export" (so they were originally like "float32frombits") and they were not properly reconsidered when we uppercased the symbols to export them. That's a mistake we live with. The names of functions we have added since then, and even a few that were legacy, are more properly Go-cased, such as IsNaN, IsInf, and RoundToEven, rather than Isnan, Isinf, and Roundtoeven. And also constants like MaxFloat32. For new API, we should keep using proper Go-cased symbols instead of minimally-upper-cased-C symbols. So math.FMA, not math.Fma. This API has not yet been released, so this change does not break the compatibility promise. This CL also modifies cmd/compile, since the compiler knows the name of the function. I could have stopped at changing the string constants, but it seemed to make more sense to use a consistent casing everywhere. Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e Reviewed-on: https://go-review.googlesource.com/c/go/+/205317 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-11-07 14:51:06 +00:00
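A short usage sketch of the renamed API follows. math.FMA(x, y, z) returns x*y + z computed with a single rounding; the wrapper name `fusedMulAdd` is hypothetical and exists only to give the example something to call.

```go
package main

import (
	"fmt"
	"math"
)

// fusedMulAdd wraps math.FMA, which computes a*b + c with one rounding step.
func fusedMulAdd(a, b, c float64) float64 {
	return math.FMA(a, b, c)
}

func main() {
	fmt.Println(fusedMulAdd(2, 3, 4)) // prints 10
}
```

Whether the call compiles to a hardware fused multiply-add instruction or falls back to the software implementation depends on the target architecture and CPU features.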
smasher164	58b031949b	cmd/compile: add fma intrinsic for arm This change introduces an arm intrinsic that generates the FMULAD instruction for the fused-multiply-add operation on systems that support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite rule translates the generic intrinsic to FMULAD. Updates #25819. Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa Reviewed-on: https://go-review.googlesource.com/c/go/+/142117 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-21 17:42:47 +00:00
smasher164	7a6da218b1	cmd/compile: add fma intrinsic for amd64 To permit ssa-level optimization, this change introduces an amd64 intrinsic that generates the VFMADD231SD instruction for the fused-multiply-add operation on systems that support it. System support is detected via cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic ("Fma") to VFMADD231SD. The benchmark compares the software implementation (old) with the intrinsic (new). name old time/op new time/op delta Fma-4 27.2ns ± 1% 1.0ns ± 9% -96.48% (p=0.008 n=5+5) Updates #25819. Change-Id: I966655e5f96817a5d06dff5942418a3915b09584 Reviewed-on: https://go-review.googlesource.com/c/go/+/137156 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-21 16:42:10 +00:00
smasher164	33425ab8db	cmd/compile: introduce generic ssa intrinsic for fused-multiply-add In order to make math.FMA a compiler intrinsic for ISAs like ARM64, PPC64[le], and S390X, a generic 3-argument opcode "Fma" is provided and rewritten as ARM64: (Fma x y z) -> (FMADDD z x y) PPC64: (Fma x y z) -> (FMADD x y z) S390X: (Fma x y z) -> (FMADD z x y) Updates #25819. Change-Id: Ie5bc628311e6feeb28ddf9adaa6e702c8c291efa Reviewed-on: https://go-review.googlesource.com/c/go/+/131959 Run-TryBot: Akhil Indurti <aindurti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-21 16:24:15 +00:00
David Chase	6adaf17eaa	cmd/compile: preserve statements in late nilcheckelim optimization When a subsequent load/store of a ptr makes the nil check of that pointer unnecessary, if their lines differ, change the line of the load/store to that of the nilcheck, and attempt to rehome the load/store position instead. This fix makes profiling less accurate in order to make panics more informative. Fixes #33724 Change-Id: Ib9afaac12fe0d0320aea1bf493617facc34034b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/200197 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-15 16:43:44 +00:00
Meng Zhuo	50f1157760	cmd/compile: add math/bits.Mul64 intrinsic on mips64x Benchmark: name old time/op new time/op delta Mul 36.0ns ± 1% 2.8ns ± 0% -92.31% (p=0.000 n=10+10) Mul32 4.37ns ± 0% 4.37ns ± 0% ~ (p=0.429 n=6+10) Mul64 36.4ns ± 0% 2.8ns ± 0% -92.37% (p=0.000 n=10+9) Change-Id: Ic4f4e5958adbf24999abcee721d0180b5413fca7 Reviewed-on: https://go-review.googlesource.com/c/go/+/200582 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-10-14 21:23:34 +00:00
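For readers unfamiliar with the function being made an intrinsic here, bits.Mul64 returns the full 128-bit product of two uint64 values as a high and a low 64-bit half. The sketch below shows a call; the wrapper name `mulFull` is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulFull returns the 128-bit product x*y split into high and low halves.
func mulFull(x, y uint64) (hi, lo uint64) {
	return bits.Mul64(x, y)
}

func main() {
	// 2^63 * 4 = 2^65, which does not fit in 64 bits:
	// the high half is 2 and the low half is 0.
	hi, lo := mulFull(1<<63, 4)
	fmt.Println(hi, lo) // prints 2 0
}
```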
Michael Munday	6ec4c71eef	cmd/compile: add SSA rules for s390x compare-and-branch instructions This commit adds SSA rules for the s390x combined compare-and-branch instructions. These have a shorter encoding than separate compare and branch instructions and they also don't clobber the condition code (a.k.a. flag register) reducing pressure on the flag allocator. I have deleted the 'loop_test.go' file and replaced it with a new codegen test which performs a wider range of checks. Object sizes from compilebench: name old object-bytes new object-bytes delta Template 562kB ± 0% 561kB ± 0% -0.28% (p=0.000 n=10+10) Unicode 217kB ± 0% 217kB ± 0% -0.17% (p=0.000 n=10+10) GoTypes 2.03MB ± 0% 2.02MB ± 0% -0.59% (p=0.000 n=10+10) Compiler 8.16MB ± 0% 8.11MB ± 0% -0.62% (p=0.000 n=10+10) SSA 27.4MB ± 0% 27.0MB ± 0% -1.45% (p=0.000 n=10+10) Flate 356kB ± 0% 356kB ± 0% -0.12% (p=0.000 n=10+10) GoParser 438kB ± 0% 436kB ± 0% -0.51% (p=0.000 n=10+10) Reflect 1.37MB ± 0% 1.37MB ± 0% -0.42% (p=0.000 n=10+10) Tar 485kB ± 0% 483kB ± 0% -0.39% (p=0.000 n=10+10) XML 630kB ± 0% 621kB ± 0% -1.45% (p=0.000 n=10+10) [Geo mean] 1.14MB 1.13MB -0.60% name old text-bytes new text-bytes delta HelloSize 763kB ± 0% 754kB ± 0% -1.30% (p=0.000 n=10+10) CmdGoSize 10.7MB ± 0% 10.6MB ± 0% -0.91% (p=0.000 n=10+10) [Geo mean] 2.86MB 2.82MB -1.10% Change-Id: Ibca55d9c0aa1254aee69433731ab5d26a43a7c18 Reviewed-on: https://go-review.googlesource.com/c/go/+/198037 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-08 10:03:04 +00:00
Cuong Manh Le	77f5adba55	cmd/compile: don't use statictmps for small object in slice literal Fixes #21561 Change-Id: I89c59752060dd9570d17d73acbbaceaefce5d8ce Reviewed-on: https://go-review.googlesource.com/c/go/+/197560 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-10-08 06:09:26 +00:00
Keith Randall	72dc9ab191	cmd/compile: reuse dead register before reusing register holding constant For commuting ops, check whether the second argument is dead before checking if the first argument is rematerializeable. Reusing the register holding a dead value is always best. Fixes #33580 Change-Id: I7372cfc03d514e6774d2d9cc727a3e6bf6ce2657 Reviewed-on: https://go-review.googlesource.com/c/go/+/199559 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2019-10-07 15:16:26 +00:00
Dan Scales	225f484c88	misc, runtime, test: extra tests and benchmarks for defer Add a bunch of extra tests and benchmarks for defer, in preparation for new low-cost (open-coded) implementation of defers (see #34481), - New file defer_test.go that tests a bunch more unusual defer scenarios, including things that might have problems for open-coded defers. - Additions to callers_test.go actually verifying what the stack trace looks like for various panic or panic-recover scenarios. - Additions to crash_test.go testing several more crash scenarios involving recursive panics. - New benchmark in runtime_test.go measuring speed of panic-recover - New CGo benchmark in cgo_test.go calling from Go to C back to Go that shows defer overhead Updates #34481 Change-Id: I423523f3e05fc0229d4277dd00073289a5526188 Reviewed-on: https://go-review.googlesource.com/c/go/+/197017 Run-TryBot: Dan Scales <danscales@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2019-09-25 23:27:16 +00:00
Martin Möhrmann	f41451e7eb	compile: prefer an AND instead of SHR+SHL instructions On modern 64bit CPUs a SHR, SHL or AND instruction take 1 cycle to execute. A pair of shifts that operate on the same register will take 2 cycles and needs to wait for the input register value to be available. Large constants used to mask the high bits of a register with an AND instruction can not be encoded as an immediate in the AND instruction on amd64 and therefore need to be loaded into a register with a MOV instruction. However that MOV instruction is not dependent on the output register and on many CPUs does not compete with the AND or shift instructions for execution ports. Using a pair of shifts to mask high bits instead of an AND to mask high bits of a register has a shorter encoding and uses one less general purpose register but is slower due to taking one clock cycle longer if there is no register pressure that would make the AND variant need to generate a spill. For example the instructions emitted for (x & 1 << 63) before this CL are: 48c1ea3f SHRQ $0x3f, DX 48c1e23f SHLQ $0x3f, DX after this CL the instructions are the same as GCC and LLVM use: 48b80000000000000080 MOVQ $0x8000000000000000, AX 4821d0 ANDQ DX, AX Some platforms such as arm64 already have SSA optimization rules to fuse two shift instructions back into an AND. Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark: var GlobalU uint func BenchmarkAndHighBits(b *testing.B) { x := uint(0) for i := 0; i < b.N; i++ { x &= 1 << 63 } GlobalU = x } amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz: name old time/op new time/op delta AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25): 'go run run.go -all_codegen -v codegen' passes with following adjustments: ARM64: The BFXIL pattern ((x << lc) >> rc | y & ac) needed adjustment since ORshiftRL generation fusing '>> rc' and '|' interferes with matching ((x << lc) >> rc) to generate UBFX. 
Previously ORshiftLL was created first using the shifts generated for (y & ac). S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs. Updates #33826 Updates #32781 Change-Id: I5a59f6239660d53c029cd22dfb44ddf39f93a56c Reviewed-on: https://go-review.googlesource.com/c/go/+/196810 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-09-24 20:30:59 +00:00
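The equivalence this commit relies on can be checked directly: clearing everything but the top bit with a shift pair gives the same value as a single AND with a constant. A minimal sketch follows; the function names maskHighShift and maskHighAnd are hypothetical and appear nowhere in the CL, and which instructions the compiler actually emits depends on the toolchain version and target.

```go
package main

import "fmt"

// maskHighShift keeps only bit 63 by shifting right, then left (the SHR+SHL form).
// Hypothetical helper name, for illustration only.
func maskHighShift(x uint64) uint64 { return x >> 63 << 63 }

// maskHighAnd keeps only bit 63 with a single AND against a constant mask.
// Hypothetical helper name, for illustration only.
func maskHighAnd(x uint64) uint64 { return x & (1 << 63) }

func main() {
	for _, x := range []uint64{0, 1, 1 << 63, 1<<63 | 12345, ^uint64(0)} {
		fmt.Printf("x=%#x shift=%#x and=%#x equal=%v\n",
			x, maskHighShift(x), maskHighAnd(x), maskHighShift(x) == maskHighAnd(x))
	}
}
```

Both forms compute the same result; the commit message is only about which machine instructions are preferable for it.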
Bryan C. Mills	34fe8295c5	Revert "compile: prefer an AND instead of SHR+SHL instructions" This reverts CL 194297. Reason for revert: introduced register allocation failures on PPC64LE builders. Updates #33826 Updates #32781 Updates #34468 Change-Id: I7d0b55df8cdf8e7d2277f1814299b083c2692e48 Reviewed-on: https://go-review.googlesource.com/c/go/+/196957 Run-TryBot: Bryan C. Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2019-09-23 15:20:12 +00:00
Martin Möhrmann	4e2b84ffc5	compile: prefer an AND instead of SHR+SHL instructions On modern 64bit CPUs a SHR, SHL or AND instruction take 1 cycle to execute. A pair of shifts that operate on the same register will take 2 cycles and needs to wait for the input register value to be available. Large constants used to mask the high bits of a register with an AND instruction can not be encoded as an immediate in the AND instruction on amd64 and therefore need to be loaded into a register with a MOV instruction. However that MOV instruction is not dependent on the output register and on many CPUs does not compete with the AND or shift instructions for execution ports. Using a pair of shifts to mask high bits instead of an AND to mask high bits of a register has a shorter encoding and uses one less general purpose register but is slower due to taking one clock cycle longer if there is no register pressure that would make the AND variant need to generate a spill. For example the instructions emitted for (x & 1 << 63) before this CL are: 48c1ea3f SHRQ $0x3f, DX 48c1e23f SHLQ $0x3f, DX after this CL the instructions are the same as GCC and LLVM use: 48b80000000000000080 MOVQ $0x8000000000000000, AX 4821d0 ANDQ DX, AX Some platforms such as arm64 already have SSA optimization rules to fuse two shift instructions back into an AND. Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark: var GlobalU uint func BenchmarkAndHighBits(b *testing.B) { x := uint(0) for i := 0; i < b.N; i++ { x &= 1 << 63 } GlobalU = x } amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz: name old time/op new time/op delta AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25): 'go run run.go -all_codegen -v codegen' passes with following adjustments: ARM64: The BFXIL pattern ((x << lc) >> rc \| y & ac) needed adjustment since ORshiftRL generation fusing '>> rc' and '\|' interferes with matching ((x << lc) >> rc) to generate UBFX. 
Previously ORshiftLL was created first using the shifts generated for (y & ac). S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs. Updates #33826 Updates #32781 Change-Id: I43227da76b625de03fbc51117162b23b9c678cdb Reviewed-on: https://go-review.googlesource.com/c/go/+/194297 Run-TryBot: Martin Möhrmann <martisch@uos.de> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-09-21 18:00:13 +00:00
Agniva De Sarker	ecc7dd5469	test/codegen: fix wasm codegen breakage i32.eqz instructions don't appear unless needed in if conditions anymore after CL 195204. I forgot to run the codegen tests while submitting the CL. Thanks to @martisch for catching it. Fixes #34442 Change-Id: I177b064b389be48e39d564849714d7a8839be13e Reviewed-on: https://go-review.googlesource.com/c/go/+/196580 Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2019-09-21 16:31:44 +00:00
Matthew Dempsky	85fc765341	cmd/compile: optimize switch on strings When compiling expression switches, we try to optimize runs of constants into binary searches. The ordering used isn't visible to the application, so it's unimportant as long as we're consistent between sorting and searching. For strings, it's much cheaper to compare string lengths than strings themselves, so instead of ordering strings by "si <= sj", we currently order them by "len(si) < len(sj) \|\| len(si) == len(sj) && si <= sj" (i.e., the lexicographical ordering on the 2-tuple (len(s), s)). However, it's also somewhat cheaper to compare strings for equality (i.e., ==) than for ordering (i.e., <=). And if there were two or three string constants of the same length in a switch statement, we might unnecessarily emit ordering comparisons. For example, given: switch s { case "", "1", "2", "3": // ordered by length then content goto L } we currently compile this as: if len(s) < 1 \|\| len(s) == 1 && s <= "1" { if s == "" { goto L } else if s == "1" { goto L } } else { if s == "2" { goto L } else if s == "3" { goto L } } This CL switches to using a 2-level binary search---first on len(s), then on s itself---so that string ordering comparisons are only needed when there are 4 or more strings of the same length. (4 being the cut-off for when using binary search is actually worthwhile.) So the above switch instead now compiles to: if len(s) == 0 { if s == "" { goto L } } else if len(s) == 1 { if s == "1" { goto L } else if s == "2" { goto L } else if s == "3" { goto L } } which is better optimized by walk and SSA. (Notably, because there are only two distinct lengths and no more than three strings of any particular length, this example ends up falling back to simply using linear search.) Test case by khr@ from CL 195138. Fixes #33934. 
Change-Id: I8eeebcaf7e26343223be5f443d6a97a0daf84f07 Reviewed-on: https://go-review.googlesource.com/c/go/+/195340 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-09-18 05:33:05 +00:00
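The two-level dispatch described in this commit message can be written out by hand to compare it with the source-level switch. This is only a sketch of the shape of the generated code, assuming the same four cases as the example in the message; matchFlat and matchTwoLevel are hypothetical names, and the real transformation happens inside the compiler rather than in user code.

```go
package main

import "fmt"

// matchFlat is the source-level switch from the commit message example.
func matchFlat(s string) bool {
	switch s {
	case "", "1", "2", "3":
		return true
	}
	return false
}

// matchTwoLevel dispatches on len(s) first, then compares for equality only
// among constants of that length, so no string ordering comparison is needed.
func matchTwoLevel(s string) bool {
	switch len(s) {
	case 0:
		return s == ""
	case 1:
		return s == "1" || s == "2" || s == "3"
	}
	return false
}

func main() {
	for _, s := range []string{"", "1", "2", "3", "4", "12", "abc"} {
		fmt.Printf("%q flat=%v twoLevel=%v\n", s, matchFlat(s), matchTwoLevel(s))
	}
}
```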
LE Manh Cuong	ec4e8517cd	cmd/compile: support more length types for slice extension optimization golang.org/cl/109517 optimized the compiler to avoid the allocation for make in append(x, make([]T, y)...). This was only implemented for the case that y has type int. This change extends the optimization to trigger for all integer types where the value is known at compile time to fit into an int. name old time/op new time/op delta ExtendInt-12 106ns ± 4% 106ns ± 0% ~ (p=0.351 n=10+6) ExtendUint64-12 1.03µs ± 5% 0.10µs ± 4% -90.01% (p=0.000 n=9+10) name old alloc/op new alloc/op delta ExtendInt-12 0.00B 0.00B ~ (all equal) ExtendUint64-12 13.6kB ± 0% 0.0kB -100.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta ExtendInt-12 0.00 0.00 ~ (all equal) ExtendUint64-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) Updates #29785 Change-Id: Ief7760097c285abd591712da98c5b02bc3961fcd Reviewed-on: https://go-review.googlesource.com/c/go/+/182559 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-09-17 17:18:17 +00:00
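The pattern this optimization targets looks like the sketch below, with the length given as a uint64 rather than an int. The helper name extend is hypothetical, and whether the temporary slice from make is actually elided depends on the compiler version; the code only shows the source form and its observable result.

```go
package main

import "fmt"

// extend grows x by n zero bytes using the append(x, make([]T, n)...) idiom.
// n is a uint64 here; Go allows any integer type as a make size argument.
func extend(x []byte, n uint64) []byte {
	return append(x, make([]byte, n)...)
}

func main() {
	b := extend([]byte{1, 2}, 3)
	fmt.Println(len(b), b)
}
```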
Alberto Donizetti	c2facbe937	Revert "test/codegen: document -all_codegen option in README" This reverts CL 192101. Reason for revert: The same paragraph was added 2 weeks ago (look a few lines above) Change-Id: I05efb2631d7b4966f66493f178f2a649c715a3cc Reviewed-on: https://go-review.googlesource.com/c/go/+/195637 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-09-16 17:31:37 +00:00
Cherry Zhang	d9b8ffa51c	test/codegen: document -all_codegen option in README It is useful to know about the -all_codegen option for running codegen tests for all platforms. I was puzzling that some codegen test was not failing on my local machine or on trybot, until I found this option. Change-Id: I062cf4d73f6a6c9ebc2258195779d2dab21bc36d Reviewed-on: https://go-review.googlesource.com/c/go/+/192101 Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-09-16 11:53:57 +00:00
Ruixin Bao	98aa97806b	cmd/compile: add math/bits.Mul64 intrinsic on s390x This change adds an intrinsic for Mul64 on s390x. To achieve that, a new assembly instruction, MLGR, is introduced in s390x/asmz.go. This assembly instruction directly uses an existing instruction on Z and supports multiplication of two 64-bit unsigned integers and stores the result in two separate registers. In this case, we require the multiplicand to be stored in register R3 and the output result (the high and low 64 bits of the product) to be stored in R2 and R3 respectively. A test case is also added. Benchmark: name old time/op new time/op delta Mul-18 11.1ns ± 0% 1.4ns ± 0% -87.39% (p=0.002 n=8+10) Mul32-18 2.07ns ± 0% 2.07ns ± 0% ~ (all equal) Mul64-18 11.1ns ± 1% 1.4ns ± 0% -87.42% (p=0.000 n=10+10) Change-Id: Ieca6ad1f61fff9a48a31d50bbd3f3c6d9e6675c1 Reviewed-on: https://go-review.googlesource.com/c/go/+/194572 Reviewed-by: Michael Munday <mike.munday@ibm.com> Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-09-13 09:04:48 +00:00
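For reference, math/bits.Mul64 returns the full 128-bit product of two uint64 values as a high and a low half. A minimal usage sketch follows; the wrapper name mul128 is hypothetical, and nothing here is specific to s390x, since the intrinsic only changes how the call is compiled.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mul128 returns the high and low 64 bits of the 128-bit product a*b.
// Hypothetical wrapper around math/bits.Mul64, for illustration only.
func mul128(a, b uint64) (hi, lo uint64) {
	return bits.Mul64(a, b)
}

func main() {
	// 2^63 * 4 = 2^65, so the high half is 2 and the low half is 0.
	hi, lo := mul128(1<<63, 4)
	fmt.Println(hi, lo)
}
```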
Michael Munday	5c5f217b63	cmd/compile: improve s390x sign/zero extension removal This CL gets rid of the MOVDreg and MOVDnop SSA operations on s390x. They were originally inserted to help avoid situations where a sign/zero extension was elided but a spill invalidated the optimization. It's not really clear we need to do this though (amd64 doesn't have these ops for example) so long as we are careful when removing sign/zero extensions. Also, the MOVDreg technique doesn't work if the register is spilled before the MOVDreg op (I haven't seen that in practice). Removing these ops reduces the complexity of the rules and also allows us to unblock optimizations. For example, the compiler can now merge the loads in binary.{Big,Little}Endian.PutUint16 which it wasn't able to do before. This CL reduces the size of the .text section in the go tool by about 4.7KB (0.09%). Change-Id: Icaddae7f2e4f9b2debb6fabae845adb3f73b41db Reviewed-on: https://go-review.googlesource.com/c/go/+/173897 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-09-10 13:17:24 +00:00
Martin Möhrmann	5bb59b6d16	Revert "compile: prefer an AND instead of SHR+SHL instructions" This reverts commit `9ec7074a94`. Reason for revert: broke s390x (copysign, abs) and arm64 (bitfield) tests. Change-Id: I16c1b389c062e8c4aa5de079f1d46c9b25b0db52 Reviewed-on: https://go-review.googlesource.com/c/go/+/193850 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Agniva De Sarker <agniva.quicksilver@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-09-09 07:33:25 +00:00
Martin Möhrmann	9ec7074a94	compile: prefer an AND instead of SHR+SHL instructions On modern 64bit CPUs a SHR, SHL or AND instruction take 1 cycle to execute. A pair of shifts that operate on the same register will take 2 cycles and needs to wait for the input register value to be available. Large constants used to mask the high bits of a register with an AND instruction can not be encoded as an immediate in the AND instruction on amd64 and therefore need to be loaded into a register with a MOV instruction. However that MOV instruction is not dependent on the output register and on many CPUs does not compete with the AND or shift instructions for execution ports. Using a pair of shifts to mask high bits instead of an AND to mask high bits of a register has a shorter encoding and uses one less general purpose register but is slower due to taking one clock cycle longer if there is no register pressure that would make the AND variant need to generate a spill. For example the instructions emitted for (x & 1 << 63) before this CL are: 48c1ea3f SHRQ $0x3f, DX 48c1e23f SHLQ $0x3f, DX after this CL the instructions are the same as GCC and LLVM use: 48b80000000000000080 MOVQ $0x8000000000000000, AX 4821d0 ANDQ DX, AX Some platforms such as arm64 already have SSA optimization rules to fuse two shift instructions back into an AND. 
Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark: var GlobalU uint func BenchmarkAndHighBits(b *testing.B) { x := uint(0) for i := 0; i < b.N; i++ { x &= 1 << 63 } GlobalU = x } amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz: name old time/op new time/op delta AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25): Updates #33826 Updates #32781 Change-Id: I862d3587446410c447b9a7265196b57f85358633 Reviewed-on: https://go-review.googlesource.com/c/go/+/191780 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-09-09 06:49:17 +00:00
Alberto Donizetti	e6d2544d20	test/codegen: mention -all_codegen in the README For performance reasons (avoiding costly cross-compilations) CL 177577 changed the codegen test harness to only run the tests for the machine's GOARCH by default. This change updates the codegen README accordingly, explaining what all.bash does run by default and how to perform the tests for all architectures. Fixes #33924 Change-Id: I43328d878c3e449ebfda46f7e69963a44a511d40 Reviewed-on: https://go-review.googlesource.com/c/go/+/192619 Reviewed-by: Daniel Martí <mvdan@mvdan.cc>	2019-09-01 15:37:13 +00:00
Brian Kessler	b003afe4fe	cmd/compile: intrinsify RotateLeft32 on wasm wasm has 32-bit versions of all integer operations. This change lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits call. Benchmarking on amd64 under node.js this is ~25% faster. node v10.15.3/amd64 name old time/op new time/op delta RotateLeft 8.37ns ± 1% 8.28ns ± 0% -1.05% (p=0.029 n=4+4) RotateLeft8 11.9ns ± 1% 11.8ns ± 0% ~ (p=0.167 n=5+5) RotateLeft16 11.8ns ± 0% 11.8ns ± 0% ~ (all equal) RotateLeft32 11.9ns ± 1% 8.7ns ± 0% -26.32% (p=0.008 n=5+5) RotateLeft64 8.31ns ± 1% 8.43ns ± 2% ~ (p=0.063 n=5+5) Updates #31265 Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46 Reviewed-on: https://go-review.googlesource.com/c/go/+/182359 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Richard Musiol <neelance@gmail.com>	2019-08-31 17:03:04 +00:00
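The operation being intrinsified is a 32-bit left rotation. The sketch below compares math/bits.RotateLeft32 with a hand-written shift-and-or form for non-negative rotation counts; rotlManual and rotlStd are hypothetical names, and the sketch says nothing about which instruction a given backend emits.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotlManual rotates x left by k bits (k taken modulo 32) using two shifts and an OR.
// In Go, shifting a uint32 by 32 yields 0, so k == 0 is handled correctly.
func rotlManual(x uint32, k uint) uint32 {
	k &= 31
	return x<<k | x>>(32-k)
}

// rotlStd calls the standard library function that the commit intrinsifies.
func rotlStd(x uint32, k uint) uint32 {
	return bits.RotateLeft32(x, int(k))
}

func main() {
	x := uint32(0x80000001)
	for _, k := range []uint{0, 1, 4, 31, 32, 33} {
		fmt.Printf("k=%d manual=%#08x std=%#08x\n", k, rotlManual(x, k), rotlStd(x, k))
	}
}
```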
Ben Shi	1786ecd502	cmd/compile: eliminate WASM's redundant extension & wrapping This CL eliminates unnecessary pairs of I32WrapI64 and I64ExtendI32U generated by the WASM backend for IF statements. It also decreases the total size of pkg/js_wasm/ by about 490KB. Change-Id: I16b0abb686c4e30d5624323166ec2d0ec57dbe2d Reviewed-on: https://go-review.googlesource.com/c/go/+/191758 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Richard Musiol <neelance@gmail.com>	2019-08-30 21:20:03 +00:00
Ben Shi	8d5197d818	cmd/compile: optimize 386's math.bits.TrailingZeros16 This CL reverts CL 192097 and fixes the issue in CL 189277. Change-Id: Icd271262e1f5019a8e01c91f91c12c1261eeb02b Reviewed-on: https://go-review.googlesource.com/c/go/+/192519 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-08-30 17:37:00 +00:00
Cherry Zhang	9859f6bedb	test/codegen: fix ARM32 RotateLeft32 test The syntax of a shifted operation does not have a "$" sign for the shift amount. Remove it. Change-Id: I50782fe942b640076f48c2fafea4d3175be8ff99 Reviewed-on: https://go-review.googlesource.com/c/go/+/192100 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-08-28 20:42:48 +00:00
Ben Shi	3cfd003a8a	cmd/compile: optimize ARM's math.bits.RotateLeft32 This CL optimizes math.bits.RotateLeft32 to inline "MOVW Rx@>Ry, Rd" on ARM. The benchmark results of math/bits show some improvements. name old time/op new time/op delta RotateLeft-4 9.42ns ± 0% 6.91ns ± 0% -26.66% (p=0.000 n=40+33) RotateLeft8-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+31) RotateLeft16-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+32) RotateLeft32-4 8.16ns ± 0% 7.54ns ± 0% -7.68% (p=0.000 n=40+40) RotateLeft64-4 15.7ns ± 0% 15.7ns ± 0% ~ (all equal) updates #31265 Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712 Reviewed-on: https://go-review.googlesource.com/c/go/+/188697 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-08-28 15:41:58 +00:00


224 Commits