qbit/go - go - Tape:neT

qbit/go

mirror of https://github.com/golang/go synced 2024-11-14 14:40:23 -07:00

Author	SHA1	Message	Date
David Chase	5b9ff11c3d	cmd/compile: ppc64le working, not optimized enough This time with the cherry-pick from the proper patch of the old CL. Stack size increased. Corrected NaN-comparison glitches. Marked g register as clobbered by calls. Fixed shared libraries. live_ssa.go still disabled because of differences. Presumably turning on more optimization will fix both the stack size and the live_ssa.go glitches. Enhanced debugging output for shared libs test. Rebased onto master. Updates #16010. Change-Id: I40864faf1ef32c118fb141b7ef8e854498e6b2c4 Reviewed-on: https://go-review.googlesource.com/27159 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-08-18 16:34:47 +00:00
Cherry Zhang	d99cee79b9	[dev.ssa] cmd/compile, etc.: more ARM64 optimizations, and enable SSA by default Add more ARM64 optimizations: - use hardware zero register when it is possible. - use shifted ops. The assembler supports shifted ops but not documented, nor knows how to print it. This CL adds them. - enable fast division. This was disabled because it makes the old backend generate slower code. But with SSA it generates faster code. Turn on SSA by default, also adjust tests. Change-Id: I7794479954c83bb65008dcb457bc1e21d7496da6 Reviewed-on: https://go-review.googlesource.com/26950 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-08-15 03:37:34 +00:00
Keith Randall	c069bc4996	[dev.ssa] cmd/compile: implement GO386=387 Last part of the 386 SSA port. Modify the x86 backend to simulate SSE registers and instructions with 387 registers and instructions. The simulation isn't terribly performant, but it works, and the old implementation wasn't very performant either. Leaving to people who care about 387 to optimize if they want. Turn on SSA backend for 386 by default. Fixes #16358 Change-Id: I678fb59132620b2c47e993c1c10c4c21135f70c0 Reviewed-on: https://go-review.googlesource.com/25271 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-08-10 17:41:01 +00:00
Keith Randall	69a755b602	[dev.ssa] cmd/compile: port SSA backend to amd64p32 It's not a new backend, just a PtrSize==4 modification of the existing AMD64 backend. Change-Id: Icc63521a5cf4ebb379f7430ef3f070894c09afda Reviewed-on: https://go-review.googlesource.com/25586 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-08-09 15:48:26 +00:00
Cherry Zhang	6b6de15d32	[dev.ssa] cmd/compile: support NaCl in SSA for ARM NaCl code runs in sandbox and there are restrictions for its instruction uses (https://developer.chrome.com/native-client/reference/sandbox_internals/arm-32-bit-sandbox). Like the legacy backend, on NaCl, - don't use R9, which is used as NaCl's "thread pointer". - don't use Duff's device. - don't use indexed load/stores. - the assembler rewrites DIV/MOD to runtime calls, which on NaCl clobbers R12, so R12 is marked as clobbered for DIV/MOD. - other restrictions are satisfied by the assembler. Enable SSA specific tests on nacl/arm, and disable non-SSA ones. Updates #15365. Change-Id: I9262693ec6756b89ca29d3ae4e52a96fe5403b02 Reviewed-on: https://go-review.googlesource.com/24859 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2016-07-16 03:13:45 +00:00
Cherry Zhang	42181ad852	[dev.ssa] cmd/compile: enable SSA on ARM by default As Josh mentioned in CL 24716, there has been requests for using SSA for ARM. SSA can still be disabled by setting -ssa=0 for cmd/compile, or partially enabled with GOSSAFUNC, GOSSAPKG, and GOSSAHASH. Not enable SSA by default on NaCl, which is not supported yet. Enable SSA-specific tests on ARM: live_ssa.go and nilptr3_ssa.go; disable non-SSA tests: live.go, nilptr3.go, and slicepot.go. Updates #15365. Change-Id: Ic2ca8d166aeca8517b9d262a55e92f2130683a16 Reviewed-on: https://go-review.googlesource.com/23953 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: David Chase <drchase@google.com>	2016-07-06 15:05:50 +00:00
Emmanuel Odeke	53fd522c0d	all: make copyright headers consistent with one space after period Follows suit with https://go-review.googlesource.com/#/c/20111. Generated by running $ grep -R 'Go Authors. All' * \| cut -d":" -f1 \| while read F;do perl -pi -e 's/Go Authors. All/Go Authors. All/g' $F;done The code in cmd/internal/unvendor wasn't changed. Fixes #15213 Change-Id: I4f235cee0a62ec435f9e8540a1ec08ae03b1a75f Reviewed-on: https://go-review.googlesource.com/21819 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-05-02 13:43:18 +00:00
David Chase	729abfa35c	[dev.ssa] cmd/compile: default compile+test with SSA Some tests disabled, some bifurcated into _ssa and not, with appropriate logging added to compiler. "tests/live.go" in particular needs attention. SSA-specific testing removed, since it's all SSA now. Added "-run_skips" option to tests/run.go to simplify checking whether a test still fails (or how it fails) on a skipped platform. The compiler now compiles with SSA by default. If you don't want SSA, specify GOSSAHASH=n (or N) as an environment variable. Function names ending in "_ssa" are always SSA-compiled. GOSSAFUNC=fname retains its "SSA for fname, log to ssa.html" GOSSAPKG=pkg only has an effect when GOSSAHASH=n GOSSAHASH=10101 etc retains its name-hash-matching behavior for purposes of debugging. See #13068 Change-Id: I8217bfeb34173533eaeb391b5f6935483c7d6b43 Reviewed-on: https://go-review.googlesource.com/16299 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: David Chase <drchase@google.com>	2015-10-30 20:35:20 +00:00
Russ Cox	d447279927	cmd/internal/gc: optimize slice + write barrier The code generated for a slice x[i:j] or x[i:j:k] computes the entire new slice (base, len, cap) and then uses it as the evaluation of the slice expression. If the slice is part of an update x = x[i:j] or x = x[i:j:k], there are opportunities to avoid computing some of these fields. For x = x[0:i], we know that only the len is changing; base can be ignored completely, and cap can be left unmodified. For x = x[0:i:j], we know that only len and cap are changing; base can be ignored completely. For x = x[i:i], we know that the resulting cap is zero, and we don't adjust the base during a slice producing a zero-cap result, so again base can be ignored completely. No write to base, no write barrier. The old slice code was trying to work at a Go syntax level, mainly because that was how you wrote code just once instead of once per architecture. Now the compiler is factored a bit better and we can implement slice during code generation but still have one copy of the code. So the new code is working at that lower level. (It must, to update only parts of the result.) This CL by itself: name old mean new mean delta BinaryTree17 5.81s × (0.98,1.03) 5.71s × (0.96,1.05) ~ (p=0.101) Fannkuch11 4.35s × (1.00,1.00) 4.39s × (1.00,1.00) +0.79% (p=0.000) FmtFprintfEmpty 86.0ns × (0.94,1.11) 82.6ns × (0.98,1.04) -3.86% (p=0.048) FmtFprintfString 276ns × (0.98,1.04) 273ns × (0.98,1.02) ~ (p=0.235) FmtFprintfInt 274ns × (0.98,1.06) 270ns × (0.99,1.01) ~ (p=0.119) FmtFprintfIntInt 506ns × (0.99,1.01) 475ns × (0.99,1.01) -6.02% (p=0.000) FmtFprintfPrefixedInt 391ns × (0.99,1.01) 393ns × (1.00,1.01) ~ (p=0.139) FmtFprintfFloat 566ns × (0.99,1.01) 574ns × (1.00,1.01) +1.33% (p=0.001) FmtManyArgs 1.91µs × (0.99,1.01) 1.87µs × (0.99,1.02) -1.83% (p=0.000) GobDecode 15.3ms × (0.99,1.02) 15.0ms × (0.98,1.05) -1.84% (p=0.042) GobEncode 11.5ms × (0.97,1.03) 11.4ms × (0.99,1.03) ~ (p=0.152) Gzip 645ms × (0.99,1.01) 647ms × (0.99,1.01) ~ (p=0.265) Gunzip 142ms × (1.00,1.00) 143ms × (1.00,1.01) +0.90% (p=0.000) HTTPClientServer 90.5µs × (0.97,1.04) 88.5µs × (0.99,1.03) -2.27% (p=0.014) JSONEncode 32.0ms × (0.98,1.03) 29.6ms × (0.98,1.01) -7.51% (p=0.000) JSONDecode 114ms × (0.99,1.01) 104ms × (1.00,1.01) -8.60% (p=0.000) Mandelbrot200 6.04ms × (1.00,1.01) 6.02ms × (1.00,1.00) ~ (p=0.057) GoParse 6.47ms × (0.97,1.05) 6.37ms × (0.97,1.04) ~ (p=0.105) RegexpMatchEasy0_32 171ns × (0.93,1.07) 152ns × (0.99,1.01) -11.09% (p=0.000) RegexpMatchEasy0_1K 550ns × (0.98,1.01) 530ns × (1.00,1.00) -3.78% (p=0.000) RegexpMatchEasy1_32 135ns × (0.99,1.02) 134ns × (0.99,1.01) -1.33% (p=0.002) RegexpMatchEasy1_1K 879ns × (1.00,1.01) 865ns × (1.00,1.00) -1.58% (p=0.000) RegexpMatchMedium_32 243ns × (1.00,1.00) 233ns × (1.00,1.00) -4.30% (p=0.000) RegexpMatchMedium_1K 70.3µs × (1.00,1.00) 69.5µs × (1.00,1.00) -1.13% (p=0.000) RegexpMatchHard_32 3.82µs × (1.00,1.01) 3.74µs × (1.00,1.00) -1.95% (p=0.000) RegexpMatchHard_1K 117µs × (1.00,1.00) 115µs × (1.00,1.00) -1.69% (p=0.000) Revcomp 917ms × (0.97,1.04) 920ms × (0.97,1.04) ~ (p=0.786) Template 114ms × (0.99,1.01) 117ms × (0.99,1.01) +2.58% (p=0.000) TimeParse 622ns × (0.99,1.01) 615ns × (0.99,1.00) -1.06% (p=0.000) TimeFormat 665ns × (0.99,1.01) 654ns × (0.99,1.00) -1.70% (p=0.000) This CL and previous CL (append) combined: name old mean new mean delta BinaryTree17 5.68s × (0.97,1.04) 5.71s × (0.96,1.05) ~ (p=0.638) Fannkuch11 4.41s × (0.98,1.03) 4.39s × (1.00,1.00) ~ (p=0.474) FmtFprintfEmpty 92.7ns × (0.91,1.16) 82.6ns × (0.98,1.04) -10.89% (p=0.004) FmtFprintfString 281ns × (0.96,1.08) 273ns × (0.98,1.02) ~ (p=0.078) FmtFprintfInt 288ns × (0.97,1.06) 270ns × (0.99,1.01) -6.37% (p=0.000) FmtFprintfIntInt 493ns × (0.97,1.04) 475ns × (0.99,1.01) -3.53% (p=0.002) FmtFprintfPrefixedInt 423ns × (0.97,1.04) 393ns × (1.00,1.01) -7.07% (p=0.000) FmtFprintfFloat 598ns × (0.99,1.01) 574ns × (1.00,1.01) -4.02% (p=0.000) FmtManyArgs 1.89µs × (0.98,1.05) 1.87µs × (0.99,1.02) ~ (p=0.305) GobDecode 14.8ms × (0.98,1.03) 15.0ms × (0.98,1.05) ~ (p=0.237) GobEncode 12.3ms × (0.98,1.01) 11.4ms × (0.99,1.03) -6.95% (p=0.000) Gzip 656ms × (0.99,1.05) 647ms × (0.99,1.01) ~ (p=0.101) Gunzip 142ms × (1.00,1.00) 143ms × (1.00,1.01) +0.58% (p=0.001) HTTPClientServer 91.2µs × (0.97,1.04) 88.5µs × (0.99,1.03) -3.02% (p=0.003) JSONEncode 32.6ms × (0.97,1.08) 29.6ms × (0.98,1.01) -9.10% (p=0.000) JSONDecode 114ms × (0.97,1.05) 104ms × (1.00,1.01) -8.74% (p=0.000) Mandelbrot200 6.11ms × (0.98,1.04) 6.02ms × (1.00,1.00) ~ (p=0.090) GoParse 6.66ms × (0.97,1.04) 6.37ms × (0.97,1.04) -4.41% (p=0.000) RegexpMatchEasy0_32 159ns × (0.99,1.00) 152ns × (0.99,1.01) -4.69% (p=0.000) RegexpMatchEasy0_1K 538ns × (1.00,1.01) 530ns × (1.00,1.00) -1.57% (p=0.000) RegexpMatchEasy1_32 138ns × (1.00,1.00) 134ns × (0.99,1.01) -2.91% (p=0.000) RegexpMatchEasy1_1K 869ns × (0.99,1.01) 865ns × (1.00,1.00) -0.51% (p=0.012) RegexpMatchMedium_32 252ns × (0.99,1.01) 233ns × (1.00,1.00) -7.85% (p=0.000) RegexpMatchMedium_1K 72.7µs × (1.00,1.00) 69.5µs × (1.00,1.00) -4.43% (p=0.000) RegexpMatchHard_32 3.85µs × (1.00,1.00) 3.74µs × (1.00,1.00) -2.74% (p=0.000) RegexpMatchHard_1K 118µs × (1.00,1.00) 115µs × (1.00,1.00) -2.24% (p=0.000) Revcomp 920ms × (0.97,1.07) 920ms × (0.97,1.04) ~ (p=0.998) Template 129ms × (0.98,1.03) 117ms × (0.99,1.01) -9.79% (p=0.000) TimeParse 619ns × (0.99,1.01) 615ns × (0.99,1.00) -0.57% (p=0.011) TimeFormat 661ns × (0.98,1.04) 654ns × (0.99,1.00) ~ (p=0.223) Change-Id: If054d81ab2c71d8d62cf54b5b1fac2af66b387fc Reviewed-on: https://go-review.googlesource.com/9813 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-05-13 19:20:39 +00:00
Russ Cox	8552047a32	cmd/internal/gc: optimize append + write barrier The code generated for x = append(x, v) is roughly: t := x if len(t)+1 > cap(t) { t = grow(t) } t[len(t)] = v len(t)++ x = t We used to generate this code as Go pseudocode during walk. Generate it instead as actual instructions during gen. Doing so lets us apply a few optimizations. The most important is that when, as in the above example, the source slice and the destination slice are the same, the code can instead do: t := x if len(t)+1 > cap(t) { t = grow(t) x = {base(t), len(t)+1, cap(t)} } else { len(x)++ } t[len(t)] = v That is, in the fast path that does not reallocate the array, only the updated length needs to be written back to x, not the array pointer and not the capacity. This is more like what you'd write by hand in C. It's faster in general, since the fast path elides two of the three stores, but it's especially faster when the form of x is such that the base pointer write would turn into a write barrier. No write, no barrier. name old mean new mean delta BinaryTree17 5.68s × (0.97,1.04) 5.81s × (0.98,1.03) +2.35% (p=0.023) Fannkuch11 4.41s × (0.98,1.03) 4.35s × (1.00,1.00) ~ (p=0.090) FmtFprintfEmpty 92.7ns × (0.91,1.16) 86.0ns × (0.94,1.11) -7.31% (p=0.038) FmtFprintfString 281ns × (0.96,1.08) 276ns × (0.98,1.04) ~ (p=0.219) FmtFprintfInt 288ns × (0.97,1.06) 274ns × (0.98,1.06) -4.94% (p=0.002) FmtFprintfIntInt 493ns × (0.97,1.04) 506ns × (0.99,1.01) +2.65% (p=0.009) FmtFprintfPrefixedInt 423ns × (0.97,1.04) 391ns × (0.99,1.01) -7.52% (p=0.000) FmtFprintfFloat 598ns × (0.99,1.01) 566ns × (0.99,1.01) -5.27% (p=0.000) FmtManyArgs 1.89µs × (0.98,1.05) 1.91µs × (0.99,1.01) ~ (p=0.231) GobDecode 14.8ms × (0.98,1.03) 15.3ms × (0.99,1.02) +3.01% (p=0.000) GobEncode 12.3ms × (0.98,1.01) 11.5ms × (0.97,1.03) -5.93% (p=0.000) Gzip 656ms × (0.99,1.05) 645ms × (0.99,1.01) ~ (p=0.055) Gunzip 142ms × (1.00,1.00) 142ms × (1.00,1.00) -0.32% (p=0.034) HTTPClientServer 91.2µs × (0.97,1.04) 90.5µs × (0.97,1.04) ~ (p=0.468) JSONEncode 32.6ms × (0.97,1.08) 32.0ms × (0.98,1.03) ~ (p=0.190) JSONDecode 114ms × (0.97,1.05) 114ms × (0.99,1.01) ~ (p=0.887) Mandelbrot200 6.11ms × (0.98,1.04) 6.04ms × (1.00,1.01) ~ (p=0.167) GoParse 6.66ms × (0.97,1.04) 6.47ms × (0.97,1.05) -2.81% (p=0.014) RegexpMatchEasy0_32 159ns × (0.99,1.00) 171ns × (0.93,1.07) +7.19% (p=0.002) RegexpMatchEasy0_1K 538ns × (1.00,1.01) 550ns × (0.98,1.01) +2.30% (p=0.000) RegexpMatchEasy1_32 138ns × (1.00,1.00) 135ns × (0.99,1.02) -1.60% (p=0.000) RegexpMatchEasy1_1K 869ns × (0.99,1.01) 879ns × (1.00,1.01) +1.08% (p=0.000) RegexpMatchMedium_32 252ns × (0.99,1.01) 243ns × (1.00,1.00) -3.71% (p=0.000) RegexpMatchMedium_1K 72.7µs × (1.00,1.00) 70.3µs × (1.00,1.00) -3.34% (p=0.000) RegexpMatchHard_32 3.85µs × (1.00,1.00) 3.82µs × (1.00,1.01) -0.81% (p=0.000) RegexpMatchHard_1K 118µs × (1.00,1.00) 117µs × (1.00,1.00) -0.56% (p=0.000) Revcomp 920ms × (0.97,1.07) 917ms × (0.97,1.04) ~ (p=0.808) Template 129ms × (0.98,1.03) 114ms × (0.99,1.01) -12.06% (p=0.000) TimeParse 619ns × (0.99,1.01) 622ns × (0.99,1.01) ~ (p=0.062) TimeFormat 661ns × (0.98,1.04) 665ns × (0.99,1.01) ~ (p=0.524) See next CL for combination with a similar optimization for slice. The benchmarks that are slower in this CL are still faster overall with the combination of the two. Change-Id: I2a7421658091b2488c64741b4db15ab6c3b4cb7e Reviewed-on: https://go-review.googlesource.com/9812 Reviewed-by: David Chase <drchase@google.com>	2015-05-12 17:55:09 +00:00

10 Commits