mirror of
https://github.com/golang/go
synced 2024-11-16 22:04:50 -07:00
0b47b94a62
428 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Keith Randall
|
0b47b94a62 |
cmd/compile: remove more extension ops when not needed
If we're not using the upper bits, don't bother issuing a sign/zero extension operation. For arm64, after CL 520916 which fixed a correctness bug with extensions but as a side effect leaves many unnecessary ones still in place. Change-Id: I5f4fe4efbf2e9f80969ab5b9a6122fb812dc2ec0 Reviewed-on: https://go-review.googlesource.com/c/go/+/521496 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
Keith Randall
|
611706b171 |
cmd/compile: don't use BTS when OR works, add direct memory BTS operations
Stop using BTSconst and friends when ORLconst can be used instead. OR can be issued by more function units than BTS can, so it could lead to better IPC. OR might take a few more bytes to encode, but not a lot more. Still use BTSconst for cases where the constant otherwise wouldn't fit and would require a separate movabs instruction to materialize the constant. This happens when setting bits 31-63 of 64-bit targets. Add BTS-to-memory operations so we don't need to load/bts/store. Fixes #61694 Change-Id: I00379608df8fb0167cb01466e97d11dec7c1596c Reviewed-on: https://go-review.googlesource.com/c/go/+/515755 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> |
||
Jorropo
|
bac4e2f241 |
cmd/compile: try to rewrite loops to count down
Fixes #61629 This reduce the pressure on regalloc because then the loop only keep alive one value (the iterator) instead of the iterator and the upper bound since the comparison now acts against an immediate, often zero which can be skipped. This optimize things like: for i := 0; i < n; i++ { Or a range over a slice where the index is not used: for _, v := range someSlice { Or the new range over int from #61405: for range n { It is hit in 975 unique places while doing ./make.bash. Change-Id: I5facff8b267a0b60ea3c1b9a58c4d74cdb38f03f Reviewed-on: https://go-review.googlesource.com/c/go/+/512935 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> |
||
Junxian Zhu
|
b1474672c6 |
cmd/compile: intrinsify Sub64 on mips64
This CL intrinsify Sub64 on mips64. pkg: math/bits _ sec/op _ sec/op vs base _ Sub-4 2.849n _ 0% 1.948n _ 0% -31.64% (p=0.000 n=8) Sub32-4 3.447n _ 0% 3.446n _ 0% ~ (p=0.982 n=8) Sub64-4 2.815n _ 0% 1.948n _ 0% -30.78% (p=0.000 n=8) Sub64multiple-4 6.124n _ 0% 3.340n _ 0% -45.46% (p=0.000 n=8) Change-Id: Ibba91a4350e4a549ae0b60d8cafc4bca05034b84 Reviewed-on: https://go-review.googlesource.com/c/go/+/498497 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com> |
||
Junxian Zhu
|
5f8a2fdf09 |
cmd/compile: intrinsify Add64 on mips64
This CL intrinsify Add64 on mips64. pkg: math/bits _ sec/op _ sec/op vs base _ Add64-4 2.783n _ 0% 1.950n _ 0% -29.93% (p=0.000 n=8) Add64multiple-4 5.713n _ 0% 3.063n _ 0% -46.38% (p=0.000 n=8) pkg: crypto/elliptic _ sec/op _ sec/op vs base _ ScalarBaseMult/P256-4 353.7_ _ 0% 282.7_ _ 0% -20.09% (p=0.000 n=8) ScalarBaseMult/P224-4 330.5_ _ 0% 250.0_ _ 0% -24.37% (p=0.000 n=8) ScalarBaseMult/P384-4 1228.8_ _ 0% 791.5_ _ 0% -35.59% (p=0.000 n=8) ScalarBaseMult/P521-4 15.412m _ 0% 2.438m _ 0% -84.18% (p=0.000 n=8) ScalarMult/P256-4 1189.4_ _ 0% 904.2_ _ 0% -23.98% (p=0.000 n=8) ScalarMult/P224-4 1138.8_ _ 0% 813.8_ _ 0% -28.54% (p=0.000 n=8) ScalarMult/P384-4 4.419m _ 0% 2.692m _ 0% -39.08% (p=0.000 n=8) ScalarMult/P521-4 59.768m _ 0% 8.773m _ 0% -85.32% (p=0.000 n=8) MarshalUnmarshal/P256/Uncompressed-4 8.697_ _ 1% 7.923_ _ 1% -8.91% (p=0.000 n=8) MarshalUnmarshal/P256/Compressed-4 104.75_ _ 0% 66.29_ _ 0% -36.72% (p=0.000 n=8) MarshalUnmarshal/P224/Uncompressed-4 8.728_ _ 1% 7.823_ _ 1% -10.37% (p=0.000 n=8) MarshalUnmarshal/P224/Compressed-4 1035.7_ _ 0% 676.5_ _ 2% -34.69% (p=0.000 n=8) MarshalUnmarshal/P384/Uncompressed-4 15.32_ _ 1% 11.81_ _ 1% -22.90% (p=0.000 n=8) MarshalUnmarshal/P384/Compressed-4 399.8_ _ 0% 217.4_ _ 0% -45.62% (p=0.000 n=8) MarshalUnmarshal/P521/Uncompressed-4 96.79_ _ 0% 20.32_ _ 0% -79.01% (p=0.000 n=8) MarshalUnmarshal/P521/Compressed-4 6640.4_ _ 0% 790.8_ _ 0% -88.09% (p=0.000 n=8) Change-Id: I8a0960b9665720c1d3e57dce36386e74db37fefa Reviewed-on: https://go-review.googlesource.com/c/go/+/498496 Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@golang.org> |
||
Keith Randall
|
67983c0f78 |
cmd/compile: add indexed SET* opcodes for amd64
Update #61356 Change-Id: I391af98563b1c068208784c80ea736c78c29639d Reviewed-on: https://go-review.googlesource.com/c/go/+/510435 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> |
||
Keith Randall
|
505e50b1e3 |
cmd/compile: get rid of special case in scheduler for entry block
It isn't needed. Fixes #61356 Change-Id: Ib466a3eac90c3ea57888cf40c294513033fc6118 Reviewed-on: https://go-review.googlesource.com/c/go/+/509856 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org> |
||
Keith Randall
|
d0964e172b |
cmd/compile: optimize s==s for strings
s==s is always true for strings. This comes up in NaN testing in generic code, where we want x==x to compile completely away except for float types. Fixes #60777 Change-Id: I3ce054b5121354de2f9751b010fb409f148cb637 Reviewed-on: https://go-review.googlesource.com/c/go/+/503795 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> |
||
Keith Randall
|
e713d6f939 |
cmd/compile: memcombine if values being stored are from consecutive loads
If we load 2 values and then store those 2 loaded values, we can likely perform that operation with a single wider load and store. Fixes #60709 Change-Id: Ifc5f92c2f1b174c6ed82a69070f16cec6853c770 Reviewed-on: https://go-review.googlesource.com/c/go/+/502295 Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> |
||
Jes Cok
|
5d481abc87 |
all: fix typos
Change-Id: I510b0a4bf3472d937393800dd57472c30beef329
GitHub-Last-Rev:
|
||
Paul E. Murphy
|
5e4000ad7f |
cmd/compile: on PPC64, fix sign/zero extension when masking
(ANDCCconst [y] (MOV.*reg x)) should only be merged when zero extending. Otherwise, sign bits are lost on negative values. (ANDCCconst [0xFF] (MOVBreg x)) should be simplified to a zero extension of x. Likewise for the MOVHreg variant. Fixes #61297 Change-Id: I04e4fd7dc6a826e870681f37506620d48393698b Reviewed-on: https://go-review.googlesource.com/c/go/+/508775 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
||
Meng Zhuo
|
3fce111535 |
cmd/compile: fix FMA negative commutativity of riscv64
According to RISCV manual 11.6: FMADD x,y,z computes x*y+z and FNMADD x,y,z => -x*y-z FMSUB x,y,z => x*y-z FNMSUB x,y,z => -x*y+z respectively However our implement of SSA convert FMADD -x,y,z to FNMADD x,y,z which is wrong and should be convert to FNMSUB according to manual. Change-Id: Ib297bc83824e121fd7dda171ed56ea9694a4e575 Reviewed-on: https://go-review.googlesource.com/c/go/+/506575 Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
Meng Zhuo
|
5c1a15df41 |
test/codegen: enable Mul2 DivPow2 test for riscv64
Change-Id: Ice0bb7a665599b334e927a1b00d1a5b400c15e3d Reviewed-on: https://go-review.googlesource.com/c/go/+/506035 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> |
||
Meng Zhuo
|
b7e7467865 |
test/codegen: add fsqrt test for riscv64
Add FSQRTD FSQRTS codegen tests for riscv64 Change-Id: I16ca3753ad1ba37afbd9d0f887b078e33f98fda0 Reviewed-on: https://go-review.googlesource.com/c/go/+/503275 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
Keith Randall
|
c643b29381 |
cmd/compile: use callsite as line number for argument marshaling
Don't use the line number of the argument itself, as that may be from arbitrarily earlier in the function. Fixes #60673 Change-Id: Ifc0a2aaae221a256be3a4b0b2e04849bae4b79d7 Reviewed-on: https://go-review.googlesource.com/c/go/+/502656 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> |
||
Austin Clements
|
c2e0bf0abf |
cmd/internal/testdir: pass if GOEXPERIMENT=cgocheck2 is set
Some testdir tests fail if GOEXPERIMENT=cgocheck2 is set. Fix this by skipping these tests. Change-Id: I58d4ef0cceb86bcf93220b4a44de9b9dc4879b16 Reviewed-on: https://go-review.googlesource.com/c/go/+/499675 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
Bryan Mills
|
02d234e34d |
Revert "cmd/compile: sparse conditional constant propagation"
This reverts CL 483875. Reason for revert: appears to cause internal compiler errors on the ssacheck builder. Change-Id: I662418384291470c1962c417797a5890dd9aa7a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/497855 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Bryan Mills <bcmills@google.com> |
||
Junxian Zhu
|
f0d575c266 |
cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on mips64x
This CL use MFC1/MTC1 instructions to move data between GPR and FPR instead of stores and loads to move float/int values. goos: linux goarch: mips64le pkg: math │ oldmath │ newmath │ │ sec/op │ sec/op vs base │ Acos-4 258.2n ± 0% 258.2n ± 0% ~ (p=0.859 n=8) Acosh-4 378.7n ± 0% 323.9n ± 0% -14.47% (p=0.000 n=8) Asin-4 255.1n ± 2% 255.5n ± 0% +0.16% (p=0.002 n=8) Asinh-4 407.1n ± 0% 348.7n ± 0% -14.35% (p=0.000 n=8) Atan-4 189.5n ± 0% 189.9n ± 3% ~ (p=0.205 n=8) Atanh-4 355.6n ± 0% 323.4n ± 2% -9.03% (p=0.000 n=8) Atan2-4 284.1n ± 7% 280.1n ± 4% ~ (p=0.313 n=8) Cbrt-4 314.3n ± 0% 236.4n ± 0% -24.79% (p=0.000 n=8) Ceil-4 144.3n ± 3% 139.6n ± 0% ~ (p=0.069 n=8) Compare-4 21.100n ± 0% 7.035n ± 0% -66.66% (p=0.000 n=8) Compare32-4 20.100n ± 0% 6.030n ± 0% -70.00% (p=0.000 n=8) Copysign-4 34.970n ± 0% 6.221n ± 0% -82.21% (p=0.000 n=8) Cos-4 183.4n ± 3% 184.1n ± 5% ~ (p=0.159 n=8) Cosh-4 487.9n ± 2% 419.6n ± 0% -14.00% (p=0.000 n=8) Erf-4 160.6n ± 0% 157.9n ± 0% -1.68% (p=0.009 n=8) Erfc-4 183.7n ± 4% 169.8n ± 0% -7.54% (p=0.000 n=8) Erfinv-4 191.5n ± 4% 183.6n ± 0% -4.13% (p=0.023 n=8) Erfcinv-4 192.0n ± 7% 184.3n ± 0% ~ (p=0.425 n=8) Exp-4 398.2n ± 0% 340.1n ± 4% -14.58% (p=0.000 n=8) ExpGo-4 383.3n ± 0% 327.3n ± 0% -14.62% (p=0.000 n=8) Expm1-4 248.7n ± 5% 216.0n ± 0% -13.11% (p=0.000 n=8) Exp2-4 372.8n ± 0% 316.9n ± 3% -14.98% (p=0.000 n=8) Exp2Go-4 374.1n ± 0% 320.5n ± 0% -14.33% (p=0.000 n=8) Abs-4 3.013n ± 0% 3.016n ± 0% +0.10% (p=0.020 n=8) Dim-4 5.021n ± 0% 5.022n ± 0% ~ (p=0.270 n=8) Floor-4 127.5n ± 4% 126.2n ± 3% ~ (p=0.186 n=8) Max-4 72.32n ± 0% 61.33n ± 0% -15.20% (p=0.000 n=8) Min-4 83.33n ± 1% 61.36n ± 0% -26.37% (p=0.000 n=8) Mod-4 690.7n ± 0% 454.5n ± 0% -34.20% (p=0.000 n=8) Frexp-4 116.30n ± 1% 71.80n ± 1% -38.26% (p=0.000 n=8) Gamma-4 389.0n ± 0% 355.9n ± 1% -8.48% (p=0.000 n=8) Hypot-4 102.40n ± 0% 83.90n ± 0% -18.07% (p=0.000 n=8) HypotGo-4 105.45n ± 4% 84.82n ± 2% -19.56% (p=0.000 n=8) Ilogb-4 99.13n ± 4% 63.71n ± 2% -35.73% (p=0.000 n=8) J0-4 859.7n ± 0% 854.8n ± 0% -0.57% (p=0.000 n=8) J1-4 873.9n ± 0% 875.7n ± 0% +0.21% (p=0.007 n=8) Jn-4 1.855µ ± 0% 1.867µ ± 0% +0.65% (p=0.000 n=8) Ldexp-4 130.50n ± 2% 64.35n ± 0% -50.69% (p=0.000 n=8) Lgamma-4 208.8n ± 0% 200.9n ± 0% -3.78% (p=0.000 n=8) Log-4 294.1n ± 0% 255.2n ± 3% -13.22% (p=0.000 n=8) Logb-4 105.45n ± 1% 66.81n ± 1% -36.64% (p=0.000 n=8) Log1p-4 268.2n ± 0% 211.3n ± 0% -21.21% (p=0.000 n=8) Log10-4 295.4n ± 0% 255.2n ± 2% -13.59% (p=0.000 n=8) Log2-4 152.9n ± 1% 127.5n ± 0% -16.61% (p=0.000 n=8) Modf-4 103.40n ± 0% 75.36n ± 0% -27.12% (p=0.000 n=8) Nextafter32-4 121.20n ± 1% 78.40n ± 0% -35.31% (p=0.000 n=8) Nextafter64-4 110.40n ± 1% 64.91n ± 0% -41.20% (p=0.000 n=8) PowInt-4 509.8n ± 1% 369.3n ± 1% -27.56% (p=0.000 n=8) PowFrac-4 1189.0n ± 0% 947.8n ± 0% -20.29% (p=0.000 n=8) Pow10Pos-4 15.07n ± 0% 15.07n ± 0% ~ (p=0.733 n=8) Pow10Neg-4 20.10n ± 0% 20.10n ± 0% ~ (p=0.576 n=8) Round-4 44.22n ± 0% 26.12n ± 0% -40.92% (p=0.000 n=8) RoundToEven-4 46.22n ± 0% 27.12n ± 0% -41.31% (p=0.000 n=8) Remainder-4 539.0n ± 1% 417.1n ± 1% -22.62% (p=0.000 n=8) Signbit-4 17.985n ± 0% 5.694n ± 0% -68.34% (p=0.000 n=8) Sin-4 185.7n ± 5% 172.9n ± 0% -6.89% (p=0.001 n=8) Sincos-4 176.6n ± 0% 200.9n ± 0% +13.76% (p=0.000 n=8) Sinh-4 495.8n ± 0% 435.9n ± 0% -12.09% (p=0.000 n=8) SqrtIndirect-4 5.022n ± 0% 5.024n ± 0% ~ (p=0.083 n=8) SqrtLatency-4 8.038n ± 0% 8.044n ± 0% ~ (p=0.524 n=8) SqrtIndirectLatency-4 8.035n ± 0% 8.039n ± 0% +0.06% (p=0.017 n=8) SqrtGoLatency-4 340.1n ± 0% 278.3n ± 0% -18.19% (p=0.000 n=8) SqrtPrime-4 5.381µ ± 0% 5.386µ ± 0% ~ (p=0.662 n=8) Tan-4 198.6n ± 1% 183.1n ± 0% -7.85% (p=0.000 n=8) Tanh-4 491.3n ± 1% 440.8n ± 1% -10.29% (p=0.000 n=8) Trunc-4 121.7n ± 0% 121.7n ± 0% ~ (p=0.769 n=8) Y0-4 855.1n ± 0% 859.8n ± 0% +0.54% (p=0.007 n=8) Y1-4 862.3n ± 0% 865.1n ± 0% +0.32% (p=0.007 n=8) Yn-4 1.830µ ± 0% 1.837µ ± 0% +0.36% (p=0.011 n=8) Float64bits-4 13.060n ± 0% 3.016n ± 0% -76.91% (p=0.000 n=8) Float64frombits-4 13.060n ± 0% 3.018n ± 0% -76.90% (p=0.000 n=8) Float32bits-4 13.060n ± 0% 3.016n ± 0% -76.91% (p=0.000 n=8) Float32frombits-4 13.070n ± 0% 3.013n ± 0% -76.94% (p=0.000 n=8) FMA-4 446.0n ± 0% 413.1n ± 1% -7.38% (p=0.000 n=8) geomean 143.4n 108.3n -24.49% Change-Id: I2067f7a5ae1126ada7ab3fb2083710e8212535e9 Reviewed-on: https://go-review.googlesource.com/c/go/+/493815 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org> |
||
Yi Yang
|
fa50248ce6 |
cmd/compile: sparse conditional constant propagation
sparse conditional constant propagation can discover optimization opportunities that cannot be found by just combining constant folding and constant propagation and dead code elimination separately.
Updates #59399
Change-Id: Ia954e906480654a6f0cc065d75b5912f96f36b2e
GitHub-Last-Rev:
|
||
Cuong Manh Le
|
35a71dc56d |
cmd/compile: avoid slicebytetostring call in len(string([]byte))
Change-Id: Ie04503e61400a793a6a29a4b58795254deabe472 Reviewed-on: https://go-review.googlesource.com/c/go/+/497276 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> |
||
Matthew Dempsky
|
7f1467ff4d |
cmd/compile: incorporate inlined function names into closure naming
In Go 1.17, cmd/compile gained the ability to inline calls to functions that contain function literals (aka "closures"). This was implemented by duplicating the function literal body and emitting a second LSym, because in general it might be optimized better than the original function literal. However, the second LSym was named simply as any other function literal appearing literally in the enclosing function would be named. E.g., if f has a closure "f.funcX", and f is inlined into g, we would create "g.funcY" (N.B., X and Y need not be the same.). Users then have no idea this function originally came from f. With this CL, the inlined call stack is incorporated into the clone LSym's name: instead of "g.funcY", it's named "g.f.funcY". In the future, it seems desirable to arrange for the clone's name to appear exactly as the original name, so stack traces remain the same as when -l or -d=inlfuncswithclosures are used. But it's unclear whether the linker supports that today, or whether any downstream tooling would be confused by this. Updates #60324. Change-Id: Ifad0ccef7e959e72005beeecdfffd872f63982f8 Reviewed-on: https://go-review.googlesource.com/c/go/+/497137 Reviewed-by: Michael Pratt <mpratt@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> |
||
Keith Randall
|
6042a062dc |
cmd/compile: make memcombine pass a bit more robust to reassociation of exprs
Be more liberal about expanding the OR tree. Handle any tree shape instead of a fully left or right associative tree. Also remove tail feature, it isn't ever needed. Change-Id: If16bebef94b952a604d6069e9be3d9129994cb6f Reviewed-on: https://go-review.googlesource.com/c/go/+/494056 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Ryan Berger <ryanbberger@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> |
||
Lynn Boger
|
4481042c43 |
cmd/compile: update rules to generate more prefixed instructions
This modifies some existing rules to allow more prefixed instructions to be generated when using GOPPC64=power10. Some rules also check if PCRel is available, which is currently supported for linux/ppc64le and linux/ppc64 (internal linking only). Prior to p10, DS-offset loads and stores had a 16 bit size limit for the offset field. If the offset of the data for load or store was beyond this range then an indexed load or store would be selected by the rules. In p10 the assembler can generate prefixed instructions in this case, but does not if an indexed instruction was selected during the lowering pass. This allows many more cases to use prefixed loads or stores, reducing function sizes and improving performance in some cases where the code change happens in key loops. For example in strconv BenchmarkAppendQuoteRune before: 12c5e4: 15 00 10 06 pla r10,1425660 12c5e8: fc c0 40 39 12c5ec: 00 00 6a e8 ld r3,0(r10) 12c5f0: 10 00 aa e8 ld r5,16(r10) After this change: 12a828: 15 00 10 04 pld r3,1433272 12a82c: b8 de 60 e4 12a830: 15 00 10 04 pld r5,1433280 12a834: c0 de a0 e4 Performs better in the second case. A testcase was added to verify that the rules correctly select a load or store based on the offset and whether power10 or earlier. Change-Id: I4335fed0bd9b8aba8a4f84d69b89f819cc464846 Reviewed-on: https://go-review.googlesource.com/c/go/+/477398 Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com> |
||
Dmitri Shuralyov
|
7cc4516ac8 |
internal/testdir: move to cmd/internal/testdir
The effect and motivation is for the test to be selected when doing 'go test cmd' and not when doing 'go test std' since it's primarily about testing the Go compiler and linker. Other than that, it's run by all.bash and 'go test std cmd' as before. For #56844. Fixes #60059. Change-Id: I2d499af013f9d9b8761fdf4573f8d27d80c1fccf Reviewed-on: https://go-review.googlesource.com/c/go/+/493876 Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> |
||
Stefan
|
95c4f320d5 |
cmd/compile: add De Morgan's rewrite rule
Adds rules that rewrites statements such as ~P&~Q as ~(P|Q) and ~P|~Q as ~(P&Q), removing an extraneous instruction.
Change-Id: Icedb97df741680ddf9799df79df78657173aa500
GitHub-Last-Rev:
|
||
Lynn Boger
|
bc3bdfa977 |
test: add memcombine testcases for ppc64
Thanks to the recent addition of the memcombine pass, the ppc64 ports now have the memcombine optimizations. Previously in PPC64.rules, the memcombine rules were only added for ppc64le targets due to the significant increase in size of the rewritePPC64.go file when those rules were added. The ppc64 and ppc64le rules had to be different because of the byte order due to endianness differences. This enables the memcombine tests to be run on ppc64 as well as ppc64le. Change-Id: I4081e2d94617a1b66541d536c0c2662e266c9c1e Reviewed-on: https://go-review.googlesource.com/c/go/+/492615 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Lynn Boger <laboger@linux.vnet.ibm.com> |
||
Junxian Zhu
|
5cad8d41ca |
math: optimize math.Abs on mipsx
This commit optimized math.Abs function implementation on mipsx. Tested on loongson 3A2000. goos: linux goarch: mipsle pkg: math │ oldmath │ newmath │ │ sec/op │ sec/op vs base │ Acos-4 282.6n ± 0% 282.3n ± 0% ~ (p=0.140 n=7) Acosh-4 506.1n ± 0% 451.8n ± 0% -10.73% (p=0.001 n=7) Asin-4 272.3n ± 0% 272.2n ± 0% ~ (p=0.808 n=7) Asinh-4 529.7n ± 0% 475.3n ± 0% -10.27% (p=0.001 n=7) Atan-4 208.2n ± 0% 207.9n ± 0% ~ (p=0.134 n=7) Atanh-4 503.4n ± 1% 449.7n ± 0% -10.67% (p=0.001 n=7) Atan2-4 310.5n ± 0% 310.5n ± 0% ~ (p=0.928 n=7) Cbrt-4 359.3n ± 0% 358.8n ± 0% ~ (p=0.121 n=7) Ceil-4 203.9n ± 0% 204.0n ± 0% ~ (p=0.600 n=7) Compare-4 23.11n ± 0% 23.11n ± 0% ~ (p=0.702 n=7) Compare32-4 19.09n ± 0% 19.12n ± 0% ~ (p=0.070 n=7) Copysign-4 33.20n ± 0% 34.02n ± 0% +2.47% (p=0.001 n=7) Cos-4 422.5n ± 0% 385.4n ± 1% -8.78% (p=0.001 n=7) Cosh-4 628.0n ± 0% 545.5n ± 0% -13.14% (p=0.001 n=7) Erf-4 193.7n ± 2% 192.7n ± 1% ~ (p=0.430 n=7) Erfc-4 192.8n ± 1% 193.0n ± 0% ~ (p=0.245 n=7) Erfinv-4 220.7n ± 1% 221.5n ± 2% ~ (p=0.272 n=7) Erfcinv-4 221.3n ± 1% 220.4n ± 2% ~ (p=0.738 n=7) Exp-4 471.4n ± 0% 435.1n ± 0% -7.70% (p=0.001 n=7) ExpGo-4 470.6n ± 0% 434.0n ± 0% -7.78% (p=0.001 n=7) Expm1-4 243.1n ± 0% 243.4n ± 0% ~ (p=0.417 n=7) Exp2-4 463.1n ± 0% 427.0n ± 0% -7.80% (p=0.001 n=7) Exp2Go-4 462.4n ± 0% 426.2n ± 5% -7.83% (p=0.001 n=7) Abs-4 37.000n ± 0% 8.039n ± 9% -78.27% (p=0.001 n=7) Dim-4 18.09n ± 0% 18.11n ± 0% ~ (p=0.094 n=7) Floor-4 151.9n ± 0% 151.8n ± 0% ~ (p=0.190 n=7) Max-4 116.7n ± 1% 116.7n ± 1% ~ (p=0.842 n=7) Min-4 116.6n ± 1% 116.6n ± 0% ~ (p=0.464 n=7) Mod-4 1244.0n ± 0% 980.9n ± 0% -21.15% (p=0.001 n=7) Frexp-4 199.0n ± 0% 146.7n ± 0% -26.28% (p=0.001 n=7) Gamma-4 516.4n ± 0% 479.3n ± 1% -7.18% (p=0.001 n=7) Hypot-4 169.8n ± 0% 117.8n ± 2% -30.62% (p=0.001 n=7) HypotGo-4 170.8n ± 0% 117.5n ± 0% -31.21% (p=0.001 n=7) Ilogb-4 160.8n ± 0% 109.5n ± 0% -31.90% (p=0.001 n=7) J0-4 1.359µ ± 0% 1.305µ ± 0% -3.97% (p=0.001 n=7) J1-4 1.386µ ± 0% 1.334µ ± 0% -3.75% (p=0.001 n=7) Jn-4 2.864µ ± 0% 2.758µ ± 0% -3.70% (p=0.001 n=7) Ldexp-4 202.9n ± 0% 151.7n ± 0% -25.23% (p=0.001 n=7) Lgamma-4 234.0n ± 0% 234.3n ± 0% ~ (p=0.199 n=7) Log-4 444.1n ± 0% 407.9n ± 0% -8.15% (p=0.001 n=7) Logb-4 157.8n ± 0% 121.6n ± 0% -22.94% (p=0.001 n=7) Log1p-4 354.8n ± 0% 315.4n ± 0% -11.10% (p=0.001 n=7) Log10-4 453.9n ± 0% 417.9n ± 0% -7.93% (p=0.001 n=7) Log2-4 245.3n ± 0% 209.1n ± 0% -14.76% (p=0.001 n=7) Modf-4 126.6n ± 0% 126.6n ± 0% ~ (p=0.126 n=7) Nextafter32-4 112.5n ± 0% 112.5n ± 0% ~ (p=0.853 n=7) Nextafter64-4 141.7n ± 0% 141.6n ± 0% ~ (p=0.331 n=7) PowInt-4 878.8n ± 1% 758.3n ± 1% -13.71% (p=0.001 n=7) PowFrac-4 1.809µ ± 0% 1.615µ ± 0% -10.72% (p=0.001 n=7) Pow10Pos-4 18.10n ± 0% 18.12n ± 0% ~ (p=0.464 n=7) Pow10Neg-4 17.09n ± 0% 17.09n ± 0% ~ (p=0.263 n=7) Round-4 68.36n ± 0% 68.33n ± 0% ~ (p=0.325 n=7) RoundToEven-4 78.40n ± 0% 78.40n ± 0% ~ (p=0.934 n=7) Remainder-4 894.0n ± 1% 753.4n ± 1% -15.73% (p=0.001 n=7) Signbit-4 18.09n ± 0% 18.09n ± 0% ~ (p=0.761 n=7) Sin-4 389.8n ± 1% 389.8n ± 0% ~ (p=0.995 n=7) Sincos-4 416.0n ± 0% 415.9n ± 0% ~ (p=0.361 n=7) Sinh-4 634.6n ± 4% 585.6n ± 1% -7.72% (p=0.001 n=7) SqrtIndirect-4 8.035n ± 0% 8.036n ± 0% ~ (p=0.523 n=7) SqrtLatency-4 8.039n ± 0% 8.037n ± 0% ~ (p=0.218 n=7) SqrtIndirectLatency-4 8.040n ± 0% 8.040n ± 0% ~ (p=0.652 n=7) SqrtGoLatency-4 895.7n ± 0% 896.6n ± 0% +0.10% (p=0.004 n=7) SqrtPrime-4 5.406µ ± 0% 5.407µ ± 0% ~ (p=0.592 n=7) Tan-4 406.1n ± 0% 405.8n ± 1% ~ (p=0.435 n=7) Tanh-4 627.6n ± 0% 545.5n ± 0% -13.08% (p=0.001 n=7) Trunc-4 146.7n ± 1% 146.7n ± 0% ~ (p=0.755 n=7) Y0-4 1.359µ ± 0% 1.310µ ± 0% -3.61% (p=0.001 n=7) Y1-4 1.351µ ± 0% 1.301µ ± 0% -3.70% (p=0.001 n=7) Yn-4 2.829µ ± 0% 2.729µ ± 0% -3.53% (p=0.001 n=7) Float64bits-4 14.08n ± 0% 14.07n ± 0% ~ (p=0.069 n=7) Float64frombits-4 19.09n ± 0% 19.10n ± 0% ~ (p=0.755 n=7) Float32bits-4 13.06n ± 0% 13.07n ± 1% ~ (p=0.586 n=7) Float32frombits-4 13.06n ± 0% 13.06n ± 0% ~ (p=0.853 n=7) FMA-4 606.9n ± 0% 606.8n ± 0% ~ (p=0.393 n=7) geomean 201.1n 185.4n -7.81% Change-Id: I6d41a97ad3789ed5731588588859ac0b8b13b664 Reviewed-on: https://go-review.googlesource.com/c/go/+/484675 Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Than McIntosh <thanm@google.com> |
||
Junxian Zhu
|
574431cfcd |
math: optimize math.Abs on mips64x
This commit optimized math.Abs function implementation on mips64x. Tested on loongson 3A2000. goos: linux goarch: mips64le pkg: math │ oldmath │ newmath │ │ sec/op │ sec/op vs base │ Acos-4 258.0n ± ∞ ¹ 257.1n ± ∞ ¹ -0.35% (p=0.008 n=5) Acosh-4 417.0n ± ∞ ¹ 377.9n ± ∞ ¹ -9.38% (p=0.008 n=5) Asin-4 248.0n ± ∞ ¹ 259.9n ± ∞ ¹ +4.80% (p=0.008 n=5) Asinh-4 439.6n ± ∞ ¹ 408.3n ± ∞ ¹ -7.12% (p=0.008 n=5) Atan-4 189.6n ± ∞ ¹ 188.8n ± ∞ ¹ ~ (p=0.056 n=5) Atanh-4 390.0n ± ∞ ¹ 356.4n ± ∞ ¹ -8.62% (p=0.008 n=5) Atan2-4 279.0n ± ∞ ¹ 263.9n ± ∞ ¹ -5.41% (p=0.008 n=5) Cbrt-4 314.2n ± ∞ ¹ 322.3n ± ∞ ¹ +2.58% (p=0.008 n=5) Ceil-4 139.7n ± ∞ ¹ 136.6n ± ∞ ¹ -2.22% (p=0.008 n=5) Compare-4 21.11n ± ∞ ¹ 21.09n ± ∞ ¹ ~ (p=0.405 n=5) Compare32-4 20.10n ± ∞ ¹ 20.12n ± ∞ ¹ ~ (p=0.206 n=5) Copysign-4 32.17n ± ∞ ¹ 35.71n ± ∞ ¹ +11.00% (p=0.008 n=5) Cos-4 222.8n ± ∞ ¹ 169.8n ± ∞ ¹ -23.79% (p=0.008 n=5) Cosh-4 550.2n ± ∞ ¹ 477.4n ± ∞ ¹ -13.23% (p=0.008 n=5) Erf-4 171.6n ± ∞ ¹ 174.5n ± ∞ ¹ ~ (p=0.635 n=5) Erfc-4 182.6n ± ∞ ¹ 170.2n ± ∞ ¹ -6.79% (p=0.008 n=5) Erfinv-4 177.6n ± ∞ ¹ 196.6n ± ∞ ¹ +10.70% (p=0.008 n=5) Erfcinv-4 177.8n ± ∞ ¹ 197.8n ± ∞ ¹ +11.25% (p=0.008 n=5) Exp-4 422.8n ± ∞ ¹ 382.1n ± ∞ ¹ -9.63% (p=0.008 n=5) ExpGo-4 416.1n ± ∞ ¹ 383.2n ± ∞ ¹ -7.91% (p=0.008 n=5) Expm1-4 232.9n ± ∞ ¹ 252.2n ± ∞ ¹ +8.29% (p=0.008 n=5) Exp2-4 404.8n ± ∞ ¹ 389.1n ± ∞ ¹ -3.88% (p=0.008 n=5) Exp2Go-4 407.0n ± ∞ ¹ 372.3n ± ∞ ¹ -8.53% (p=0.008 n=5) Abs-4 30.120n ± ∞ ¹ 3.014n ± ∞ ¹ -89.99% (p=0.008 n=5) Dim-4 5.021n ± ∞ ¹ 5.023n ± ∞ ¹ ~ (p=0.071 n=5) Floor-4 127.8n ± ∞ ¹ 127.1n ± ∞ ¹ -0.55% (p=0.008 n=5) Max-4 77.69n ± ∞ ¹ 76.33n ± ∞ ¹ -1.75% (p=0.008 n=5) Min-4 83.27n ± ∞ ¹ 77.87n ± ∞ ¹ -6.48% (p=0.008 n=5) Mod-4 906.2n ± ∞ ¹ 692.9n ± ∞ ¹ -23.54% (p=0.008 n=5) Frexp-4 150.6n ± ∞ ¹ 108.6n ± ∞ ¹ -27.89% (p=0.008 n=5) Gamma-4 418.4n ± ∞ ¹ 386.1n ± ∞ ¹ -7.72% (p=0.008 n=5) Hypot-4 148.20n ± ∞ ¹ 93.78n ± ∞ ¹ -36.72% (p=0.008 n=5) HypotGo-4 148.20n ± ∞ ¹ 94.47n ± ∞ ¹ -36.26% (p=0.008 n=5) Ilogb-4 135.50n ± ∞ ¹ 92.38n ± ∞ ¹ -31.82% (p=0.008 n=5) J0-4 937.7n ± ∞ ¹ 861.7n ± ∞ ¹ -8.10% (p=0.008 n=5) J1-4 915.4n ± ∞ ¹ 875.9n ± ∞ ¹ -4.32% (p=0.008 n=5) Jn-4 1.974µ ± ∞ ¹ 1.863µ ± ∞ ¹ -5.62% (p=0.008 n=5) Ldexp-4 158.5n ± ∞ ¹ 129.3n ± ∞ ¹ -18.42% (p=0.008 n=5) Lgamma-4 209.0n ± ∞ ¹ 211.8n ± ∞ ¹ ~ (p=0.095 n=5) Log-4 326.4n ± ∞ ¹ 295.2n ± ∞ ¹ -9.56% (p=0.008 n=5) Logb-4 147.7n ± ∞ ¹ 105.0n ± ∞ ¹ -28.91% (p=0.008 n=5) Log1p-4 303.4n ± ∞ ¹ 266.3n ± ∞ ¹ -12.23% (p=0.008 n=5) Log10-4 329.2n ± ∞ ¹ 298.3n ± ∞ ¹ -9.39% (p=0.008 n=5) Log2-4 187.4n ± ∞ ¹ 153.0n ± ∞ ¹ -18.36% (p=0.008 n=5) Modf-4 110.5n ± ∞ ¹ 103.5n ± ∞ ¹ -6.33% (p=0.008 n=5) Nextafter32-4 128.4n ± ∞ ¹ 121.5n ± ∞ ¹ -5.37% (p=0.016 n=5) Nextafter64-4 109.5n ± ∞ ¹ 110.5n ± ∞ ¹ +0.91% (p=0.008 n=5) PowInt-4 603.3n ± ∞ ¹ 516.4n ± ∞ ¹ -14.40% (p=0.008 n=5) PowFrac-4 1.365µ ± ∞ ¹ 1.183µ ± ∞ ¹ -13.33% (p=0.008 n=5) Pow10Pos-4 15.07n ± ∞ ¹ 15.07n ± ∞ ¹ ~ (p=0.738 n=5) Pow10Neg-4 21.11n ± ∞ ¹ 21.10n ± ∞ ¹ ~ (p=0.190 n=5) Round-4 44.23n ± ∞ ¹ 44.22n ± ∞ ¹ ~ (p=0.635 n=5) RoundToEven-4 50.25n ± ∞ ¹ 46.27n ± ∞ ¹ -7.92% (p=0.008 n=5) Remainder-4 675.6n ± ∞ ¹ 530.4n ± ∞ ¹ -21.49% (p=0.008 n=5) Signbit-4 17.07n ± ∞ ¹ 17.95n ± ∞ ¹ +5.16% (p=0.008 n=5) Sin-4 171.6n ± ∞ ¹ 189.1n ± ∞ ¹ +10.20% (p=0.008 n=5) Sincos-4 201.5n ± ∞ ¹ 200.5n ± ∞ ¹ ~ (p=0.421 n=5) Sinh-4 529.6n ± ∞ ¹ 484.6n ± ∞ ¹ -8.50% (p=0.008 n=5) SqrtIndirect-4 5.021n ± ∞ ¹ 5.023n ± ∞ ¹ +0.04% (p=0.048 n=5) SqrtLatency-4 8.032n ± ∞ ¹ 8.039n ± ∞ ¹ +0.09% (p=0.024 n=5) SqrtIndirectLatency-4 8.036n ± ∞ ¹ 8.038n ± ∞ ¹ ~ (p=0.056 n=5) SqrtGoLatency-4 338.8n ± ∞ ¹ 338.7n ± ∞ ¹ ~ (p=0.841 n=5) SqrtPrime-4 5.379µ ± ∞ ¹ 5.382µ ± ∞ ¹ +0.06% (p=0.048 n=5) Tan-4 182.7n ± ∞ ¹ 191.8n ± ∞ ¹ +4.98% (p=0.008 n=5) Tanh-4 558.7n ± ∞ ¹ 497.6n ± ∞ ¹ -10.94% (p=0.008 n=5) Trunc-4 122.5n ± ∞ ¹ 122.6n ± ∞ ¹ ~ (p=0.405 n=5) Y0-4 892.8n ± ∞ ¹ 851.7n ± ∞ ¹ -4.60% (p=0.008 n=5) Y1-4 887.2n ± ∞ ¹ 863.2n ± ∞ ¹ -2.71% (p=0.008 n=5) Yn-4 1.889µ ± ∞ ¹ 1.832µ ± ∞ ¹ -3.02% (p=0.008 n=5) Float64bits-4 13.05n ± ∞ ¹ 13.06n ± ∞ ¹ +0.08% (p=0.040 n=5) Float64frombits-4 13.05n ± ∞ ¹ 13.06n ± ∞ ¹ ~ (p=0.143 n=5) Float32bits-4 13.05n ± ∞ ¹ 13.06n ± ∞ ¹ +0.08% (p=0.008 n=5) Float32frombits-4 13.05n ± ∞ ¹ 13.08n ± ∞ ¹ +0.23% (p=0.016 n=5) FMA-4 445.7n ± ∞ ¹ 448.1n ± ∞ ¹ +0.54% (p=0.008 n=5) geomean 157.2n 142.8n -9.17% Change-Id: I9bf104848b588c9ecf79401a81d483d7fcdb0a79 Reviewed-on: https://go-review.googlesource.com/c/go/+/481575 Reviewed-by: M Zhuo <mzh@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Than McIntosh <thanm@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com> |
||
Keith Randall
|
6b165577fe |
cmd/compile: remove memequal call from string compares in more cases
Add more rules to ensure that order doesn't matter. Add memequal 0 rule. Try to use a constant argument to memequal when one is available. Fixes #59684 Change-Id: I36e85ffbd949396ed700ed6e8ec2bc3ae013f5d2 Reviewed-on: https://go-review.googlesource.com/c/go/+/485535 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
ruinan
|
9be533a8ee |
cmd/compile: get more bounds info from logic operators in prove pass
Currently, the prove pass can get knowledge from some specific logic operators only before the CFG is explored, which means that the bounds information of the branch will be ignored. This CL updates the facts table by the logic operators in every branch. Combined with the branch information, this will be helpful for BCE in some circumstances. Fixes #57243 Change-Id: I0bd164f1b47804ccfc37879abe9788740b016fd5 Reviewed-on: https://go-review.googlesource.com/c/go/+/419555 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com> |
||
erifan01
|
42f99b203d |
cmd/compile: optimize cmp to cmn under conditions < and >= on arm64
Under the right conditions we can optimize cmp comparisons to cmn comparisons, such as: func foo(a, b int) int { var c int if a + b < 0 { c = 1 } return c } Previously it's compiled as: ADD R1, R0, R1 CMP $0, R1 CSET LT, R0 With this CL it's compiled as: CMN R1, R0 CSET MI, R0 Here we need to pay attention to the overflow situation of a+b, the MI flag means N==1, which doesn't honor the overflow flag V, its value depends only on the sign of the result. So it has the same semantic of the Go code, so it's correct. Similarly, this CL also optimizes the case of >= comparison using the PL conditional flag. Change-Id: I47179faba5b30cca84ea69bafa2ad5241bf6dfba Reviewed-on: https://go-review.googlesource.com/c/go/+/476116 Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
erifan01
|
91a2e921dd |
cmd/compile: fix incorrect truncating when converting CMP to TST on arm64
CL 420434 optimized CMP into TST in some situations, but it has a bug, these four rules are not correct: (LessThan (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessThan (TSTconst [c] y)) (LessEqual (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessEqual (TSTconst [c] y)) (GreaterThan (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterThan (TSTconst [c] y)) (GreaterEqual (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterEqual (TSTconst [c] y)) But due to the existence of this rule (LessThan (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessThan (TSTWconst [int32(c)] y)), the above rules have never been fired. This CL corrects them as: (LessThan (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessThan (TSTconst [c] y)) (LessEqual (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessEqual (TSTconst [c] y)) (GreaterThan (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterThan (TSTconst [c] y)) (GreaterEqual (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterEqual (TSTconst [c] y)) Change-Id: I7d60bcc9a266ee58388baeaab9f493b57cf1ad55 Reviewed-on: https://go-review.googlesource.com/c/go/+/473617 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com> |
||
Yi Yang
|
da4687923b |
cmd/compile: add rewrite rules for arithmetic operations
Add the following common local transformations
(t + x) - (t + y) == x - y
(t + x) - (y + t) == x - y
(x + t) - (y + t) == x - y
(x + t) - (t + y) == x - y
(x - t) + (t + y) == x + y
(x - t) + (y + t) == x + y
The compiler itself matches such patterns many times. This also aligns with other popular compilers.
Fixes #59111
Change-Id: Ibdfdb414782f8fcaa20b84ac5d43d0d9ae2c7b60
GitHub-Last-Rev:
|
||
Wayne Zuo
|
cedfcba3e8 |
cmd/compile: instrinsify TrailingZeros{8,32,64} for 386
This CL add support for instrinsifying the TrialingZeros{8,32,64} functions for 386 architecture. We need handle the case when the input is 0, which could lead to undefined output from the BSFL instruction. Next CL will remove the assembly code in runtime/internal/sys package. Change-Id: Ic168edf68e81bf69a536102100fdd3f56f0f4a1b Reviewed-on: https://go-review.googlesource.com/c/go/+/475735 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
ruinan
|
4d180f71dc |
cmd/compile: omit redundant sign/unsign extension on arm64
On Arm64, all 32-bit instructions will ignore the upper 32 bits and clear them to zero for the result. No need to do an unsign extend before a 32 bit op. This CL removes the redundant unsign extension only for the existing 32-bit opcodes, and also omits the sign extension when the upper bit of the result can be predicted. Fixes #42162 Change-Id: I61e6670bfb8982572430e67a4fa61134a3ea240a CustomizedGitHooks: yes Reviewed-on: https://go-review.googlesource.com/c/go/+/427454 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Eric Fang <eric.fang@arm.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
Dmitri Shuralyov
|
7a0799b2c0 |
cmd/dist, test: convert test/run.go runner to a cmd/go test
As motivated on the issue, we want to move the functionality of the run.go program to happen via a normal go test. Each .go test case in the GOROOT/test directory gets a subtest, and cmd/go's support for parallel test execution replaces run.go's own implementation thereof. The goal of this change is to have fairly minimal and readable diff while making an atomic changeover. The working directory is modified during the test execution to be GOROOT/test as it was with run.go, and most of the test struct and its run method are kept unchanged. The next CL in the stack applies further simplifications and cleanups that become viable. There's no noticeable difference in test execution time: it takes around 60-80 seconds both before and after on my machine. Test caching, which the previous runner lacked, can shorten the time significantly. For #37486. Fixes #56844. Change-Id: I209619dc9d90e7529624e49c01efeadfbeb5c9ae Reviewed-on: https://go-review.googlesource.com/c/go/+/463276 Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Austin Clements <austin@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
Michael Munday
|
85d54a7667 |
cmd/compile: use zero constants in comparisons where possible
Some integer comparisons with 1 and -1 can be rewritten as comparisons with 0. For example, x < 1 is equivalent to x <= 0. This is an advantageous transformation on riscv64 because comparisons with zero do not require a constant to be loaded into a register. Other architectures will likely benefit too and the transformation is relatively benign on architectures that do not benefit. Change-Id: I2ce9821dd7605a660eb71d76e83a61f9bae1bf25 Reviewed-on: https://go-review.googlesource.com/c/go/+/350831 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
Lynn Boger
|
ebe49f98c8 |
cmd/compile: inline constant sized memclrNoHeapPointers calls on PPC64
Update the function isInlinableMemclr for ppc64 and ppc64le to enable inlining for the constant sized cases < 512. Larger cases can use dcbz which performs better but requires alignment checking so it is best to continue using memclrNoHeapPointers for those cases. Results on p10: MemclrKnownSize1 2.07ns ± 0% 0.57ns ± 0% -72.59% MemclrKnownSize2 2.56ns ± 5% 0.57ns ± 0% -77.82% MemclrKnownSize4 5.15ns ± 0% 0.57ns ± 0% -89.00% MemclrKnownSize8 2.23ns ± 0% 0.57ns ± 0% -74.57% MemclrKnownSize16 2.23ns ± 0% 0.50ns ± 0% -77.74% MemclrKnownSize32 2.28ns ± 0% 0.56ns ± 0% -75.28% MemclrKnownSize64 2.49ns ± 0% 0.72ns ± 0% -70.95% MemclrKnownSize112 2.97ns ± 2% 1.14ns ± 0% -61.72% MemclrKnownSize128 4.64ns ± 6% 2.45ns ± 1% -47.17% MemclrKnownSize192 5.45ns ± 5% 2.79ns ± 0% -48.87% MemclrKnownSize248 4.51ns ± 0% 2.83ns ± 0% -37.12% MemclrKnownSize256 6.34ns ± 1% 3.58ns ± 0% -43.53% MemclrKnownSize512 3.64ns ± 0% 3.64ns ± 0% -0.03% MemclrKnownSize1024 4.73ns ± 0% 4.73ns ± 0% +0.01% MemclrKnownSize4096 17.1ns ± 0% 17.1ns ± 0% +0.07% MemclrKnownSize512KiB 2.12µs ± 0% 2.12µs ± 0% ~ (all equal) Change-Id: If1abf5749f4802c64523a41fe0058bd144d0ea46 Reviewed-on: https://go-review.googlesource.com/c/go/+/464340 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Jakub Ciolek <jakub@ciolek.dev> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> |
||
Wayne Zuo
|
d7ac5d1480 |
cmd/compile: intrinsify math/bits/ReverseBytes{32|64} for 386
The BSWAPL instruction is supported in i486 and newer. https://github.com/golang/go/wiki/MinimumRequirements#386 says we support "All Pentium MMX or later". The Pentium is also referred to as i586, so that we are safe with these instructions. Change-Id: I6dea1f9d864a45bb07c8f8f35a81cfe16cca216c Reviewed-on: https://go-review.googlesource.com/c/go/+/465515 Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Keith Randall <khr@google.com> |
||
Archana R
|
a432d89137 |
cmd/compile: add rules to emit SETBC/R instructions on power10
This CL adds rules that replaces instances of ISEL that produce a boolean result based on a condition register by SETBC/SETBCR operations. On Power10 these are convereted to SETBC/SETBCR instructions that use one register instead of 3 registers conventionally used by ISEL and hence reduces register pressure. On loops written specifically to exercise such instances of ISEL extensively, a performance improvement of 2.5% is seen on Power10. Also added verification tests to verify correct generation of SETBC/SETBCR instructions on Power10. Change-Id: Ib719897f09d893de40324440a43052dca026e8fa Reviewed-on: https://go-review.googlesource.com/c/go/+/449795 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
Archana R
|
cd1fc87156 |
cmd/compile: intrinsify math/bits/ReverseBytes{16|32|64} for ppc64/power10
This change intrinsifies ReverseBytes{16|32|64} by generating the corresponding new instructions in Power10: brh, brd and brw and adds a verification test for the same. On Power 9 and 8, the .go code performs optimally as it is. Performance improvement seen on Power10: ReverseBytes32 1.38ns ± 0% 1.18ns ± 0% -14.2 ReverseBytes64 1.52ns ± 0% 1.11ns ± 0% -26.87 ReverseBytes16 1.41ns ± 1% 1.18ns ± 0% -16.47 Change-Id: I88f127f3ab9ba24a772becc21ad90acfba324b37 Reviewed-on: https://go-review.googlesource.com/c/go/+/446675 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> |
||
Keith Randall
|
6224db9b4d |
cmd/compile: schedule values with no in-block uses later
When scheduling a block, deprioritize values whose results aren't used until subsequent blocks. For #58166, this has the effect of pushing the induction variable increment to the end of the block, past all the other uses of the pre-incremented value. Do this only with optimizations on. Debuggers have a preference for values in source code order, which this CL can degrade. Fixes #58166 Fixes #57976 Change-Id: I40d5885c661b142443c6d4702294c8abe8026c4f Reviewed-on: https://go-review.googlesource.com/c/go/+/463751 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
Jakub Ciolek
|
1fc585dc2f |
cmd/compile: inline known-size memclrNoHeapPointers calls
This patch rewrites known-size calls to memclrNoHeapPointers with an OpZero. This significantly improves performance and lets some clears get DSE'd. One of the cases where this applies is zeroing a known-size array, example: var x [256]int8 ... for a := range x { x[a] = 0 } Other cases can be found in the runtime itself where memclrNoHeapPointers is sometimes directly invoked with a constant. It seems that for some sized-clears on some architectures (AMD64, maybe others) the memcrlNoHeapPointers is more performant than OpZero. See the issue #56997 for more details. Benches ARM (M1 Pro): name old time/op new time/op delta MemclrKnownSize1-10 2.03ns ± 0% 0.31ns ± 0% -84.69% (p=0.000 n=18+19) MemclrKnownSize2-10 1.97ns ± 0% 0.31ns ± 0% -84.19% (p=0.000 n=12+19) MemclrKnownSize4-10 2.02ns ± 0% 0.31ns ± 0% -84.56% (p=0.000 n=12+20) MemclrKnownSize8-10 2.02ns ± 0% 0.31ns ± 0% -84.59% (p=0.000 n=14+19) MemclrKnownSize16-10 2.15ns ± 0% 0.31ns ± 0% -85.50% (p=0.000 n=18+19) MemclrKnownSize32-10 2.48ns ± 0% 0.31ns ± 0% -87.48% (p=0.000 n=20+19) MemclrKnownSize64-10 1.93ns ± 0% 0.62ns ± 0% -67.88% (p=0.000 n=20+19) MemclrKnownSize112-10 2.48ns ± 0% 1.80ns ± 0% -27.74% (p=0.000 n=19+20) MemclrKnownSize128-10 10.0ns ±112% 2.0ns ± 0% -79.76% (p=0.000 n=18+17) MemclrKnownSize192-10 27.4ns ±103% 2.6ns ± 0% -90.38% (p=0.000 n=16+19) MemclrKnownSize248-10 9.67ns ±43% 3.26ns ± 0% -66.29% (p=0.000 n=19+19) MemclrKnownSize256-10 85.4ns ±148% 3.3ns ± 0% -96.18% (p=0.000 n=20+20) MemclrKnownSize512-10 223ns ±54% 6ns ± 0% -97.42% (p=0.000 n=18+20) MemclrKnownSize1024-10 216ns ±26% 11ns ± 0% -95.00% (p=0.000 n=18+15) MemclrKnownSize4096-10 265ns ± 2% 88ns ± 0% -66.84% (p=0.000 n=19+17) MemclrKnownSize512KiB-10 9.91µs ± 1% 10.23µs ± 2% +3.14% (p=0.000 n=19+19) [Geo mean] 15.6ns 2.5ns -83.62% name old speed new speed delta MemclrKnownSize1-10 493MB/s ± 0% 3216MB/s ± 0% +553.04% (p=0.000 n=18+19) MemclrKnownSize2-10 1.02GB/s ± 0% 6.43GB/s ± 0% +532.33% (p=0.000 n=16+19) MemclrKnownSize4-10 1.99GB/s ± 0% 12.86GB/s ± 0% +547.67% (p=0.000 n=18+20) MemclrKnownSize8-10 3.96GB/s ± 0% 25.72GB/s ± 0% +548.81% (p=0.000 n=19+19) MemclrKnownSize16-10 7.46GB/s ± 0% 51.43GB/s ± 0% +589.72% (p=0.000 n=20+19) MemclrKnownSize32-10 12.9GB/s ± 0% 102.9GB/s ± 0% +698.60% (p=0.000 n=20+18) MemclrKnownSize64-10 33.1GB/s ± 0% 103.0GB/s ± 0% +211.34% (p=0.000 n=19+19) MemclrKnownSize112-10 45.1GB/s ± 0% 62.4GB/s ± 0% +38.38% (p=0.000 n=19+20) MemclrKnownSize128-10 13.3GB/s ±107% 63.5GB/s ± 0% +378.03% (p=0.000 n=19+18) MemclrKnownSize192-10 6.97GB/s ±139% 72.72GB/s ± 0% +943.44% (p=0.000 n=19+19) MemclrKnownSize248-10 25.9GB/s ±46% 76.1GB/s ± 0% +194.16% (p=0.000 n=20+17) MemclrKnownSize256-10 8.64GB/s ±196% 78.51GB/s ± 0% +808.19% (p=0.000 n=20+20) MemclrKnownSize512-10 2.33GB/s ±86% 89.13GB/s ± 0% +3719.50% (p=0.000 n=17+20) MemclrKnownSize1024-10 4.85GB/s ±32% 94.93GB/s ± 0% +1856.74% (p=0.000 n=18+19) MemclrKnownSize4096-10 15.4GB/s ± 2% 46.6GB/s ± 0% +201.55% (p=0.000 n=19+18) MemclrKnownSize512KiB-10 52.9GB/s ± 1% 51.3GB/s ± 2% -3.04% (p=0.000 n=19+19) [Geo mean] 7.54GB/s 42.86GB/s +468.76% Intel Alder Lake 12600k: name old time/op new time/op delta MemclrKnownSize1-16 0.59ns ± 3% 0.38ns ± 6% -36.00% (p=0.000 n=19+18) MemclrKnownSize2-16 0.57ns ± 1% 0.19ns ± 5% -66.27% (p=0.000 n=19+19) MemclrKnownSize4-16 0.66ns ± 2% 0.36ns ±21% -45.12% (p=0.000 n=19+20) MemclrKnownSize8-16 0.74ns ± 1% 0.30ns ±26% -59.81% (p=0.000 n=18+20) MemclrKnownSize16-16 1.00ns ± 7% 0.21ns ± 8% -79.51% (p=0.000 n=20+19) MemclrKnownSize32-16 0.95ns ± 1% 0.40ns ± 1% -57.61% (p=0.000 n=20+18) MemclrKnownSize64-16 1.20ns ± 2% 0.41ns ± 0% -65.82% (p=0.000 n=20+18) MemclrKnownSize112-16 1.27ns ± 2% 1.03ns ± 0% -19.35% (p=0.000 n=20+18) MemclrKnownSize128-16 1.34ns ± 2% 1.03ns ± 0% -23.02% (p=0.000 n=20+20) MemclrKnownSize192-16 1.92ns ± 2% 1.44ns ± 0% -24.89% (p=0.000 n=20+16) MemclrKnownSize248-16 2.77ns ± 1% 3.29ns ± 0% +18.81% (p=0.000 n=20+16) MemclrKnownSize256-16 1.92ns ± 1% 1.86ns ± 0% -3.49% (p=0.000 n=19+15) MemclrKnownSize512-16 2.81ns ± 2% 3.49ns ± 0% +24.15% (p=0.000 n=20+17) MemclrKnownSize1024-16 4.02ns ± 1% 6.78ns ± 0% +68.44% (p=0.000 n=20+18) MemclrKnownSize4096-16 17.2ns ± 2% 14.4ns ± 0% -16.73% (p=0.000 n=20+17) MemclrKnownSize512KiB-16 6.71µs ± 1% 6.52µs ± 0% -2.85% (p=0.000 n=20+18) [Geo mean] 2.60ns 1.71ns -34.06% name old speed new speed delta MemclrKnownSize1-16 1.71GB/s ± 3% 2.67GB/s ± 6% +56.39% (p=0.000 n=19+18) MemclrKnownSize2-16 3.52GB/s ± 2% 10.43GB/s ± 6% +196.04% (p=0.000 n=20+20) MemclrKnownSize4-16 6.06GB/s ± 1% 10.83GB/s ±11% +78.63% (p=0.000 n=19+18) MemclrKnownSize8-16 10.7GB/s ± 1% 27.0GB/s ±21% +151.49% (p=0.000 n=18+20) MemclrKnownSize16-16 16.0GB/s ± 8% 78.1GB/s ± 7% +387.24% (p=0.000 n=20+19) MemclrKnownSize32-16 33.6GB/s ± 1% 79.4GB/s ± 1% +135.89% (p=0.000 n=20+18) MemclrKnownSize64-16 53.3GB/s ± 2% 155.9GB/s ± 0% +192.58% (p=0.000 n=20+18) MemclrKnownSize112-16 88.0GB/s ± 2% 109.1GB/s ± 0% +23.97% (p=0.000 n=20+18) MemclrKnownSize128-16 95.3GB/s ± 2% 123.8GB/s ± 0% +29.88% (p=0.000 n=20+20) MemclrKnownSize192-16 100GB/s ± 2% 133GB/s ± 0% +33.12% (p=0.000 n=20+17) MemclrKnownSize248-16 89.7GB/s ± 1% 75.5GB/s ± 0% -15.84% (p=0.000 n=20+19) MemclrKnownSize256-16 133GB/s ± 1% 138GB/s ± 0% +3.61% (p=0.000 n=19+14) MemclrKnownSize512-16 182GB/s ± 2% 147GB/s ± 0% -19.46% (p=0.000 n=20+17) MemclrKnownSize1024-16 254GB/s ± 1% 151GB/s ± 0% -40.64% (p=0.000 n=20+18) MemclrKnownSize4096-16 237GB/s ± 2% 285GB/s ± 0% +20.09% (p=0.000 n=20+17) MemclrKnownSize512KiB-16 78.2GB/s ± 1% 80.4GB/s ± 0% +2.93% (p=0.000 n=20+18) [Geo mean] 42.1GB/s 63.8GB/s +51.53% compilecmp linux/amd64: runtime runtime.(*pallocData).allocAll 85 -> 45 (-47.06%) runtime.(*pageAlloc).allocRange 942 -> 923 (-2.02%) runtime.(*pageAlloc).free 798 -> 774 (-3.01%) runtime.(*pageBits).clearAll 66 -> 20 (-69.70%) runtime.startCheckmarks 255 -> 246 (-3.53%) runtime.(*pallocData).freeAll 86 -> 46 (-46.51%) runtime.(*pallocBits).freeAll 66 -> 20 (-69.70%) runtime.(*consistentHeapStats).unsafeClear 66 -> 19 (-71.21%) runtime.newproc1 965 -> 933 (-3.32%) crypto/rc4 crypto/rc4.(*Cipher).Reset 78 -> 69 (-11.54%) compress/bzip2 compress/bzip2.(*reader).readBlock 2973 -> 2941 (-1.08%) image/jpeg image/jpeg.(*decoder).processDHT 1179 -> 1166 (-1.10%) index/suffixarray index/suffixarray.bucketMax_8_32 394 -> 241 (-38.83%) index/suffixarray.freq_8_32 317 -> 185 (-41.64%) index/suffixarray.freq_8_64 317 -> 178 (-43.85%) index/suffixarray.bucketMin_8_32 394 -> 243 (-38.32%) index/suffixarray.bucketMin_8_64 398 -> 234 (-41.21%) index/suffixarray.bucketMax_8_64 398 -> 234 (-41.21%) compress/flate compress/flate.(*huffmanBitWriter).generateCodegen 965 -> 838 (-13.16%) compress/flate.(*compressor).reset 429 -> 409 (-4.66%) cmd/vendor/golang.org/x/sys/unix cmd/vendor/golang.org/x/sys/unix.(*FdSet).Zero 66 -> 60 (-9.09%) cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetInet4Addr 211 -> 129 (-38.86%) cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetUint32 98 -> 14 (-85.71%) cmd/vendor/golang.org/x/sys/unix.(*Ifreq).clear 66 -> 11 (-83.33%) cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetUint16 101 -> 15 (-85.15%) cmd/vendor/golang.org/x/sys/unix.(*CPUSet).Zero 66 -> 60 (-9.09%) internal/coverage/decodemeta internal/coverage/decodemeta.(*CoverageMetaFileReader).rdUint64 325 -> 293 (-9.85%) crypto/tls crypto/tls.(*halfConn).setTrafficSecret 253 -> 247 (-2.37%) crypto/tls.(*Conn).readRecordOrCCS 10315 -> 10283 (-0.31%) crypto/tls.(*halfConn).changeCipherSpec 271 -> 261 (-3.69%) crypto/tls.(*Conn).writeRecordLocked 1765 -> 1748 (-0.96%) file before after Δ % runtime.s 512467 512164 -303 -0.059% crypto/rc4.s 955 946 -9 -0.942% compress/bzip2.s 9586 9554 -32 -0.334% image/jpeg.s 32122 32109 -13 -0.040% index/suffixarray.s 38547 37644 -903 -2.343% compress/flate.s 46668 46521 -147 -0.315% cmd/vendor/golang.org/x/sys/unix.s 118620 118301 -319 -0.269% internal/coverage/decodemeta.s 7224 7192 -32 -0.443% crypto/tls.s 288762 288697 -65 -0.023% cmd/compile/internal/ssa.s 3639799 3640727 +928 +0.025% total 20790248 20789353 -895 -0.004% src/runtime benchmarks (Linux Alder Lake 12600k): name old time/op new time/op delta MakeChan/Byte-16 26.2ns ± 2% 25.6ns ± 3% -2.05% (p=0.003 n=9+10) MakeChan/Int-16 33.9ns ± 2% 33.3ns ± 4% -1.99% (p=0.015 n=10+10) MakeChan/Ptr-16 54.2ns ± 2% 53.7ns ± 1% -0.90% (p=0.016 n=10+9) MakeChan/Struct/0-16 23.8ns ± 3% 23.4ns ± 1% -1.72% (p=0.009 n=10+8) MakeChan/Struct/32-16 55.9ns ± 2% 53.9ns ± 1% -3.48% (p=0.000 n=10+10) MakeChan/Struct/40-16 63.5ns ± 1% 61.1ns ± 2% -3.79% (p=0.000 n=10+9) ChanNonblocking-16 0.22ns ± 0% 0.22ns ± 0% +0.40% (p=0.011 n=9+8) SelectUncontended-16 4.63ns ± 1% 4.62ns ± 0% -0.35% (p=0.001 n=10+8) SelectSyncContended-16 1.58µs ± 2% 1.59µs ± 1% ~ (p=0.540 n=10+10) SelectAsyncContended-16 290ns ± 0% 291ns ± 0% +0.14% (p=0.012 n=8+9) SelectNonblock-16 0.95ns ± 1% 0.95ns ± 1% ~ (p=0.546 n=9+9) ChanUncontended-16 239ns ± 3% 242ns ± 6% ~ (p=0.886 n=9+10) ChanContended-16 17.7µs ± 1% 18.2µs ± 1% +2.87% (p=0.000 n=10+9) ChanSync-16 109ns ± 2% 109ns ± 1% ~ (p=0.342 n=10+10) ChanSyncWork-16 6.55µs ± 1% 6.53µs ± 1% ~ (p=0.101 n=10+10) ChanProdCons0-16 502ns ± 1% 499ns ± 0% -0.55% (p=0.001 n=10+9) ChanProdCons10-16 373ns ± 2% 377ns ± 1% ~ (p=0.095 n=10+9) ChanProdCons100-16 224ns ± 2% 223ns ± 3% ~ (p=0.150 n=9+10) ChanProdConsWork0-16 491ns ± 1% 484ns ± 0% -1.26% (p=0.000 n=10+9) ChanProdConsWork10-16 451ns ± 2% 448ns ± 2% ~ (p=0.210 n=8+10) ChanProdConsWork100-16 406ns ± 0% 407ns ± 1% ~ (p=0.138 n=8+8) SelectProdCons-16 509ns ± 0% 509ns ± 0% ~ (p=0.917 n=9+9) ReceiveDataFromClosedChan-16 12.1ns ± 0% 12.1ns ± 0% ~ (p=0.780 n=10+10) ChanCreation-16 22.6ns ± 1% 22.4ns ± 0% -0.72% (p=0.001 n=10+8) ChanSem-16 165ns ± 1% 166ns ± 1% +0.72% (p=0.002 n=10+10) ChanPopular-16 500µs ± 2% 498µs ± 1% ~ (p=0.218 n=10+10) ChanClosed-16 0.29ns ± 0% 0.29ns ± 0% +0.09% (p=0.019 n=9+8) CallClosure-16 1.28ns ± 0% 1.27ns ± 0% -0.51% (p=0.000 n=9+9) CallClosure1-16 1.50ns ± 0% 1.50ns ± 0% ~ (p=0.123 n=9+9) CallClosure2-16 8.86ns ± 1% 8.86ns ± 3% ~ (p=0.590 n=9+10) CallClosure3-16 8.75ns ± 2% 8.69ns ± 2% ~ (p=0.247 n=10+10) CallClosure4-16 8.65ns ± 2% 8.56ns ± 2% ~ (p=0.105 n=10+10) Complex128DivNormal-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.790 n=10+9) Complex128DivNisNaN-16 4.44ns ± 0% 4.43ns ± 0% ~ (p=0.564 n=10+10) Complex128DivDisNaN-16 4.48ns ± 0% 4.48ns ± 0% ~ (p=0.101 n=10+10) Complex128DivNisInf-16 2.58ns ± 0% 2.58ns ± 0% ~ (p=0.808 n=10+10) Complex128DivDisInf-16 6.30ns ± 0% 6.31ns ± 0% ~ (p=0.305 n=10+10) SetTypePtr-16 0.73ns ± 1% 0.73ns ± 3% ~ (p=0.644 n=10+10) SetTypePtr8-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.127 n=10+10) SetTypePtr16-16 4.13ns ± 1% 4.12ns ± 0% ~ (p=0.109 n=10+10) SetTypePtr32-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.203 n=9+10) SetTypePtr64-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.696 n=10+10) SetTypePtr126-16 6.91ns ± 0% 6.91ns ± 0% ~ (p=0.469 n=10+10) SetTypePtr128-16 6.66ns ± 0% 6.67ns ± 0% ~ (p=0.246 n=9+10) SetTypePtrSlice-16 54.1ns ± 1% 54.1ns ± 1% ~ (p=0.509 n=9+10) SetTypeNode1-16 4.13ns ± 1% 4.12ns ± 0% ~ (p=0.342 n=10+10) SetTypeNode1Slice-16 10.1ns ± 1% 10.0ns ± 1% -1.18% (p=0.000 n=10+10) SetTypeNode8-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.137 n=8+8) SetTypeNode8Slice-16 22.6ns ± 0% 22.6ns ± 0% ~ (p=0.423 n=10+10) SetTypeNode64-16 6.90ns ± 0% 6.91ns ± 0% ~ (p=0.275 n=10+10) SetTypeNode64Slice-16 173ns ± 0% 173ns ± 0% ~ (p=0.610 n=9+10) SetTypeNode64Dead-16 5.53ns ± 0% 5.52ns ± 0% ~ (p=0.123 n=10+6) SetTypeNode64DeadSlice-16 150ns ± 0% 150ns ± 0% ~ (p=0.398 n=10+10) SetTypeNode124-16 6.90ns ± 0% 6.90ns ± 0% ~ (p=0.779 n=10+10) SetTypeNode124Slice-16 222ns ± 5% 217ns ± 0% ~ (p=0.302 n=10+10) SetTypeNode126-16 6.66ns ± 0% 6.66ns ± 0% ~ (p=0.324 n=10+9) SetTypeNode126Slice-16 218ns ± 0% 218ns ± 0% ~ (p=0.119 n=9+10) SetTypeNode128-16 9.76ns ± 0% 9.73ns ± 0% -0.31% (p=0.003 n=9+10) SetTypeNode128Slice-16 279ns ± 0% 278ns ± 0% ~ (p=0.112 n=10+9) SetTypeNode130-16 9.77ns ± 0% 9.73ns ± 0% -0.33% (p=0.002 n=10+10) SetTypeNode130Slice-16 284ns ± 0% 284ns ± 0% ~ (p=0.668 n=10+10) SetTypeNode1024-16 51.2ns ± 0% 51.6ns ± 1% ~ (p=0.080 n=9+9) SetTypeNode1024Slice-16 1.83µs ± 0% 1.82µs ± 0% ~ (p=0.115 n=10+10) Allocation-16 4.64µs ± 1% 4.37µs ± 1% -5.69% (p=0.000 n=9+9) ReadMemStats-16 5.62µs ± 2% 5.55µs ± 5% -1.36% (p=0.050 n=10+10) WriteBarrier-16 4.95ns ± 3% 4.99ns ± 3% ~ (p=0.255 n=10+10) BulkWriteBarrier-16 1.69ns ± 2% 1.63ns ± 4% -3.77% (p=0.001 n=10+10) ScanStackNoLocals-16 12.8ms ± 2% 12.9ms ± 1% +0.72% (p=0.019 n=10+10) MSpanCountAlloc/bits=64-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.124 n=10+10) MSpanCountAlloc/bits=128-16 2.08ns ± 1% 2.06ns ± 1% -0.87% (p=0.000 n=10+10) MSpanCountAlloc/bits=256-16 2.71ns ± 1% 2.69ns ± 1% -0.74% (p=0.001 n=10+9) MSpanCountAlloc/bits=512-16 4.15ns ± 0% 4.23ns ± 2% +2.15% (p=0.000 n=10+10) MSpanCountAlloc/bits=1024-16 7.89ns ± 1% 7.89ns ± 1% ~ (p=0.867 n=10+10) Hash5-16 1.93ns ± 1% 2.01ns ± 0% +3.99% (p=0.000 n=10+8) Hash16-16 2.04ns ± 1% 2.21ns ± 1% +8.61% (p=0.000 n=10+10) Hash64-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.154 n=9+9) Hash1024-16 16.4ns ± 0% 16.4ns ± 0% +0.17% (p=0.020 n=9+10) Hash65536-16 886ns ± 0% 885ns ± 0% ~ (p=0.725 n=10+10) AlignedLoad-16 0.96ns ± 2% 0.95ns ± 3% ~ (p=0.123 n=10+10) UnalignedLoad-16 0.95ns ± 2% 1.01ns ± 2% +6.31% (p=0.000 n=10+10) EqEfaceConcrete-16 0.31ns ± 3% 0.33ns ± 5% +8.10% (p=0.000 n=10+10) EqIfaceConcrete-16 0.31ns ±13% 0.28ns ± 2% -9.23% (p=0.001 n=10+10) NeEfaceConcrete-16 0.29ns ± 1% 0.31ns ± 7% +5.59% (p=0.010 n=8+8) NeIfaceConcrete-16 0.28ns ± 2% 0.29ns ± 1% +4.49% (p=0.000 n=9+8) ConvT2EByteSized/bool-16 0.53ns ± 1% 0.52ns ± 1% -2.18% (p=0.000 n=10+10) ConvT2EByteSized/uint8-16 0.53ns ± 1% 0.53ns ± 0% +1.22% (p=0.000 n=10+10) ConvT2ESmall-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.774 n=9+9) ConvT2EUintptr-16 1.03ns ± 0% 1.04ns ± 0% +0.50% (p=0.000 n=10+8) ConvT2ELarge-16 14.4ns ± 2% 14.4ns ± 1% ~ (p=0.726 n=10+10) ConvT2ISmall-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.693 n=9+10) ConvT2IUintptr-16 1.03ns ± 0% 1.03ns ± 0% +0.44% (p=0.000 n=10+10) ConvT2ILarge-16 14.2ns ± 1% 14.4ns ± 1% +0.85% (p=0.007 n=9+10) ConvI2E-16 0.54ns ± 1% 0.54ns ± 0% -0.39% (p=0.037 n=10+8) ConvI2I-16 2.68ns ± 0% 2.70ns ± 1% +0.73% (p=0.000 n=9+10) AssertE2T-16 0.28ns ± 1% 0.39ns ± 5% +37.38% (p=0.000 n=10+10) AssertE2TLarge-16 0.42ns ± 2% 0.48ns ± 1% +14.92% (p=0.000 n=9+10) AssertE2I-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.352 n=9+9) AssertI2T-16 0.37ns ± 3% 0.34ns ± 1% -6.16% (p=0.000 n=10+10) AssertI2I-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.286 n=10+10) AssertI2E-16 0.54ns ± 1% 0.54ns ± 0% -0.94% (p=0.000 n=10+10) AssertE2E-16 0.41ns ± 0% 0.41ns ± 1% ~ (p=0.880 n=9+9) AssertE2T2-16 0.41ns ± 1% 0.41ns ± 1% ~ (p=0.725 n=10+10) AssertE2T2Blank-16 0.24ns ± 5% 0.21ns ± 1% -14.79% (p=0.000 n=10+9) AssertI2E2-16 0.69ns ± 0% 0.69ns ± 1% ~ (p=0.541 n=10+10) AssertI2E2Blank-16 0.26ns ± 9% 0.21ns ± 1% -18.86% (p=0.000 n=10+9) AssertE2E2-16 0.53ns ± 1% 0.53ns ± 1% +0.72% (p=0.004 n=10+10) AssertE2E2Blank-16 0.23ns ± 4% 0.21ns ± 1% -8.42% (p=0.000 n=10+10) ConvT2Ezero/zero/16-16 1.13ns ± 0% 1.14ns ± 1% ~ (p=0.583 n=9+10) ConvT2Ezero/zero/32-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.417 n=10+10) ConvT2Ezero/zero/64-16 1.03ns ± 1% 1.03ns ± 0% ~ (p=0.051 n=10+10) ConvT2Ezero/zero/str-16 1.03ns ± 0% 1.03ns ± 0% ~ (p=0.132 n=10+10) ConvT2Ezero/zero/slice-16 1.14ns ± 0% 1.15ns ± 0% +0.49% (p=0.001 n=10+10) ConvT2Ezero/zero/big-16 123ns ± 1% 123ns ± 1% ~ (p=0.171 n=10+10) ConvT2Ezero/nonzero/str-16 19.4ns ± 1% 19.3ns ± 3% ~ (p=0.548 n=9+10) ConvT2Ezero/nonzero/slice-16 22.2ns ± 2% 22.0ns ± 2% ~ (p=0.109 n=10+10) ConvT2Ezero/nonzero/big-16 123ns ± 1% 123ns ± 1% ~ (p=0.446 n=10+8) ConvT2Ezero/smallint/16-16 1.13ns ± 0% 1.14ns ± 1% ~ (p=0.362 n=10+10) ConvT2Ezero/smallint/32-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.907 n=10+9) ConvT2Ezero/smallint/64-16 1.04ns ± 0% 1.03ns ± 0% -0.38% (p=0.002 n=10+10) ConvT2Ezero/largeint/16-16 6.65ns ± 1% 6.65ns ± 2% ~ (p=0.618 n=10+9) ConvT2Ezero/largeint/32-16 6.75ns ± 3% 6.63ns ± 2% -1.77% (p=0.015 n=10+10) ConvT2Ezero/largeint/64-16 9.19ns ± 1% 9.26ns ± 2% ~ (p=0.123 n=10+10) Malloc8-16 8.66ns ± 1% 8.89ns ± 2% +2.74% (p=0.000 n=10+10) Malloc16-16 13.7ns ± 1% 13.8ns ± 1% +0.71% (p=0.022 n=10+8) MallocTypeInfo8-16 11.7ns ± 3% 11.6ns ± 2% ~ (p=0.469 n=10+10) MallocTypeInfo16-16 18.3ns ± 1% 18.2ns ± 2% ~ (p=0.251 n=9+10) MallocLargeStruct-16 195ns ± 1% 198ns ± 1% +1.65% (p=0.000 n=9+10) GoroutineSelect-16 1.10ms ± 1% 1.12ms ± 1% +1.36% (p=0.000 n=10+8) GoroutineBlocking-16 986µs ± 1% 998µs ± 1% +1.23% (p=0.002 n=10+10) GoroutineForRange-16 985µs ± 1% 1001µs ± 1% +1.68% (p=0.000 n=10+10) GoroutineIdle-16 679µs ± 1% 691µs ± 0% +1.74% (p=0.000 n=10+9) HashStringSpeed-16 5.33ns ± 5% 5.19ns ± 4% ~ (p=0.113 n=9+9) HashBytesSpeed-16 8.20ns ± 3% 8.24ns ± 1% ~ (p=0.497 n=10+9) HashInt32Speed-16 4.01ns ± 2% 3.90ns ± 4% -2.63% (p=0.011 n=9+10) HashInt64Speed-16 3.94ns ± 4% 3.79ns ± 1% -3.74% (p=0.003 n=10+9) HashStringArraySpeed-16 12.5ns ± 4% 12.3ns ± 1% ~ (p=0.055 n=10+10) MegMap-16 3.72ns ± 1% 3.73ns ± 1% ~ (p=0.484 n=9+10) MegOneMap-16 2.28ns ± 1% 2.27ns ± 1% ~ (p=0.287 n=10+10) MegEqMap-16 22.0µs ± 3% 22.3µs ± 2% +1.48% (p=0.028 n=10+9) MegEmptyMap-16 0.93ns ± 1% 0.92ns ± 1% -0.52% (p=0.030 n=10+10) SmallStrMap-16 3.77ns ± 0% 3.77ns ± 0% ~ (p=0.324 n=10+10) MapStringKeysEight_16-16 3.91ns ± 0% 3.91ns ± 0% ~ (p=0.088 n=9+9) MapStringKeysEight_32-16 3.58ns ± 1% 3.50ns ± 0% -2.11% (p=0.000 n=10+10) MapStringKeysEight_64-16 3.58ns ± 1% 3.50ns ± 0% -2.23% (p=0.000 n=10+10) MapStringKeysEight_1M-16 3.57ns ± 1% 3.50ns ± 0% -1.92% (p=0.000 n=10+10) IntMap-16 2.89ns ± 1% 2.89ns ± 0% ~ (p=0.381 n=10+10) MapFirst/1-16 1.60ns ± 1% 1.59ns ± 2% -0.49% (p=0.020 n=10+9) MapFirst/2-16 1.61ns ± 0% 1.59ns ± 1% -1.17% (p=0.001 n=10+10) MapFirst/3-16 1.61ns ± 1% 1.59ns ± 1% -1.45% (p=0.000 n=10+10) MapFirst/4-16 1.60ns ± 1% 1.59ns ± 1% -1.16% (p=0.000 n=10+10) MapFirst/5-16 1.60ns ± 1% 1.58ns ± 1% -0.98% (p=0.000 n=10+10) MapFirst/6-16 1.60ns ± 1% 1.59ns ± 1% -0.87% (p=0.001 n=10+10) MapFirst/7-16 1.60ns ± 1% 1.59ns ± 1% -0.79% (p=0.002 n=10+10) MapFirst/8-16 1.60ns ± 1% 1.59ns ± 1% -0.67% (p=0.017 n=9+10) MapFirst/9-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.492 n=10+10) MapFirst/10-16 2.83ns ± 0% 2.84ns ± 0% +0.24% (p=0.017 n=10+10) MapFirst/11-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.445 n=10+10) MapFirst/12-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.564 n=10+10) MapFirst/13-16 2.83ns ± 0% 2.84ns ± 0% ~ (p=0.175 n=9+10) MapFirst/14-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.322 n=10+9) MapFirst/15-16 2.83ns ± 0% 2.84ns ± 1% ~ (p=0.209 n=10+10) MapFirst/16-16 2.83ns ± 1% 2.84ns ± 0% ~ (p=0.238 n=10+10) MapMid/1-16 1.64ns ± 0% 1.64ns ± 0% ~ (p=0.453 n=10+9) MapMid/2-16 1.86ns ± 1% 1.86ns ± 0% ~ (p=0.764 n=10+9) MapMid/3-16 1.86ns ± 0% 1.86ns ± 1% ~ (p=1.000 n=10+10) MapMid/4-16 2.06ns ± 0% 2.06ns ± 0% -0.27% (p=0.014 n=10+9) MapMid/5-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.075 n=9+10) MapMid/6-16 2.27ns ± 0% 2.27ns ± 1% ~ (p=0.898 n=10+10) MapMid/7-16 2.27ns ± 1% 2.26ns ± 0% -0.23% (p=0.049 n=10+10) MapMid/8-16 2.47ns ± 0% 2.47ns ± 1% ~ (p=0.840 n=10+10) MapMid/9-16 4.21ns ± 7% 4.13ns ±19% ~ (p=0.315 n=10+10) MapMid/10-16 4.17ns ± 7% 4.31ns ± 5% +3.37% (p=0.021 n=10+9) MapMid/11-16 4.18ns ± 7% 4.32ns ± 6% +3.50% (p=0.015 n=10+10) MapMid/12-16 4.34ns ± 7% 4.30ns ± 5% ~ (p=0.858 n=9+10) MapMid/13-16 4.25ns ± 6% 4.28ns ± 6% ~ (p=0.489 n=9+9) MapMid/14-16 3.75ns ±23% 3.90ns ±16% ~ (p=0.353 n=10+10) MapMid/15-16 3.87ns ±25% 3.95ns ±26% ~ (p=0.315 n=10+10) MapMid/16-16 4.06ns ±19% 3.94ns ±16% ~ (p=0.796 n=10+10) MapLast/1-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.607 n=10+10) MapLast/2-16 1.86ns ± 0% 1.86ns ± 0% +0.26% (p=0.029 n=10+10) MapLast/3-16 2.06ns ± 1% 2.06ns ± 0% ~ (p=0.689 n=8+9) MapLast/4-16 2.27ns ± 1% 2.26ns ± 0% ~ (p=0.148 n=10+9) MapLast/5-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.385 n=9+10) MapLast/6-16 2.67ns ± 0% 2.68ns ± 0% ~ (p=0.202 n=9+10) MapLast/7-16 2.88ns ± 0% 2.88ns ± 0% ~ (p=0.751 n=10+10) MapLast/8-16 3.08ns ± 0% 3.08ns ± 0% ~ (p=0.826 n=10+9) MapLast/9-16 4.31ns ± 6% 4.54ns ± 5% ~ (p=0.070 n=9+8) MapLast/10-16 4.25ns ± 5% 4.42ns ± 6% ~ (p=0.321 n=9+8) MapLast/11-16 4.59ns ±16% 5.42ns ±44% +17.99% (p=0.019 n=10+10) MapLast/12-16 5.04ns ±19% 6.11ns ±28% +21.35% (p=0.005 n=9+10) MapLast/13-16 6.00ns ±35% 5.76ns ± 3% ~ (p=0.173 n=10+8) MapLast/14-16 4.27ns ± 5% 4.53ns ± 6% +6.14% (p=0.007 n=10+10) MapLast/15-16 4.41ns ± 1% 4.44ns ± 7% ~ (p=0.515 n=8+10) MapLast/16-16 4.18ns ± 6% 4.99ns ±18% +19.48% (p=0.000 n=10+10) MapCycle-16 7.48ns ± 2% 7.46ns ± 1% ~ (p=0.699 n=10+10) RepeatedLookupStrMapKey32-16 6.98ns ± 3% 6.73ns ± 2% -3.63% (p=0.000 n=10+10) RepeatedLookupStrMapKey1M-16 14.7µs ± 5% 14.7µs ± 4% ~ (p=0.604 n=9+10) MakeMap/[Byte]Byte-16 58.5ns ± 1% 58.5ns ± 1% ~ (p=0.780 n=10+9) MakeMap/[Int]Int-16 113ns ± 0% 113ns ± 1% ~ (p=0.100 n=8+10) NewEmptyMap-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.638 n=10+10) NewSmallMap-16 11.5ns ± 1% 11.6ns ± 0% +1.18% (p=0.000 n=10+10) MapIter-16 42.2ns ± 0% 42.8ns ± 1% +1.50% (p=0.000 n=10+10) MapIterEmpty-16 1.85ns ± 0% 1.85ns ± 0% ~ (p=0.651 n=10+10) SameLengthMap-16 1.85ns ± 1% 1.85ns ± 0% ~ (p=0.247 n=10+10) BigKeyMap-16 7.18ns ± 1% 7.42ns ± 4% +3.33% (p=0.004 n=10+10) BigValMap-16 7.03ns ± 2% 7.19ns ± 1% +2.33% (p=0.000 n=10+9) SmallKeyMap-16 5.32ns ± 1% 5.24ns ± 1% -1.41% (p=0.000 n=10+10) MapPopulate/1-16 6.30ns ± 0% 6.41ns ± 1% +1.81% (p=0.000 n=8+10) MapPopulate/10-16 239ns ± 2% 234ns ± 2% -2.05% (p=0.001 n=9+10) MapPopulate/100-16 4.19µs ± 2% 4.22µs ± 2% ~ (p=0.171 n=10+10) MapPopulate/1000-16 52.3µs ± 1% 52.5µs ± 1% ~ (p=0.133 n=9+10) MapPopulate/10000-16 459µs ± 1% 466µs ± 2% +1.45% (p=0.005 n=10+10) MapPopulate/100000-16 4.22ms ± 2% 4.25ms ± 2% ~ (p=0.393 n=10+10) ComplexAlgMap-16 12.5ns ± 1% 12.4ns ± 1% -0.95% (p=0.022 n=10+10) GoMapClear/Reflexive/1-16 9.61ns ± 1% 9.58ns ± 0% -0.27% (p=0.027 n=10+10) GoMapClear/Reflexive/10-16 10.0ns ± 1% 10.0ns ± 1% ~ (p=0.648 n=9+9) GoMapClear/Reflexive/100-16 31.4ns ± 0% 31.4ns ± 1% ~ (p=0.305 n=9+10) GoMapClear/Reflexive/1000-16 147ns ± 0% 149ns ± 2% +1.21% (p=0.000 n=10+10) GoMapClear/Reflexive/10000-16 3.99µs ± 0% 4.00µs ± 0% +0.21% (p=0.018 n=9+10) GoMapClear/NonReflexive/1-16 41.4ns ± 2% 41.7ns ± 1% +0.55% (p=0.043 n=9+10) GoMapClear/NonReflexive/10-16 50.3ns ± 1% 50.9ns ± 1% +1.16% (p=0.000 n=10+10) GoMapClear/NonReflexive/100-16 125ns ± 0% 126ns ± 0% +0.96% (p=0.000 n=8+10) GoMapClear/NonReflexive/1000-16 1.08µs ± 0% 1.08µs ± 1% ~ (p=0.097 n=10+10) GoMapClear/NonReflexive/10000-16 8.18µs ± 2% 8.10µs ± 0% -0.91% (p=0.019 n=10+8) MapStringConversion/32/simple-16 4.66ns ± 1% 4.69ns ± 3% ~ (p=0.905 n=9+10) MapStringConversion/32/struct-16 4.65ns ± 3% 4.94ns ± 2% +6.23% (p=0.000 n=10+10) MapStringConversion/32/array-16 4.69ns ± 3% 4.72ns ± 3% ~ (p=0.631 n=10+10) MapStringConversion/64/simple-16 4.14ns ± 0% 4.14ns ± 1% ~ (p=0.342 n=10+10) MapStringConversion/64/struct-16 4.13ns ± 0% 4.13ns ± 0% ~ (p=0.809 n=10+10) MapStringConversion/64/array-16 4.13ns ± 1% 4.13ns ± 1% ~ (p=0.752 n=10+10) MapInterfaceString-16 7.90ns ±23% 8.51ns ±33% ~ (p=0.604 n=9+10) MapInterfacePtr-16 7.68ns ±29% 7.10ns ±36% ~ (p=0.353 n=10+10) NewEmptyMapHintLessThan8-16 3.70ns ± 0% 3.70ns ± 0% ~ (p=0.209 n=10+10) NewEmptyMapHintGreaterThan8-16 270ns ± 1% 272ns ± 1% +0.71% (p=0.005 n=10+9) MapPop100-16 6.45µs ± 0% 6.50µs ± 1% +0.77% (p=0.000 n=10+10) MapPop1000-16 114µs ± 1% 114µs ± 1% ~ (p=0.190 n=10+10) MapPop10000-16 2.28ms ± 2% 2.28ms ± 2% ~ (p=0.912 n=10+10) MapAssign/Int32/256-16 4.75ns ± 2% 4.82ns ± 4% ~ (p=0.101 n=10+10) MapAssign/Int32/65536-16 16.4ns ± 1% 16.7ns ± 0% +1.44% (p=0.000 n=10+9) MapAssign/Int64/256-16 4.79ns ± 5% 4.79ns ± 1% ~ (p=0.616 n=10+8) MapAssign/Int64/65536-16 17.1ns ± 1% 16.8ns ± 0% -1.28% (p=0.000 n=10+8) MapAssign/Str/256-16 6.07ns ± 6% 6.24ns ± 2% +2.84% (p=0.035 n=10+9) MapAssign/Str/65536-16 21.4ns ± 0% 21.4ns ± 3% ~ (p=0.300 n=7+10) MapOperatorAssign/Int32/256-16 4.82ns ± 3% 4.81ns ± 3% ~ (p=0.684 n=10+10) MapOperatorAssign/Int32/65536-16 16.8ns ± 1% 16.5ns ± 1% -1.68% (p=0.000 n=9+10) MapOperatorAssign/Int64/256-16 4.74ns ± 1% 4.77ns ± 3% ~ (p=0.563 n=10+9) MapOperatorAssign/Int64/65536-16 16.9ns ± 1% 17.2ns ± 1% +1.88% (p=0.000 n=10+10) MapOperatorAssign/Str/256-16 1.09µs ± 1% 1.10µs ± 2% ~ (p=0.210 n=10+10) MapOperatorAssign/Str/65536-16 184ns ± 9% 184ns ± 8% ~ (p=0.922 n=10+9) MapAppendAssign/Int32/256-16 13.8ns ±10% 14.4ns ±11% ~ (p=0.190 n=10+10) MapAppendAssign/Int32/65536-16 28.9ns ± 5% 30.7ns ± 6% +6.13% (p=0.001 n=9+10) MapAppendAssign/Int64/256-16 14.5ns ±12% 13.8ns ± 8% -5.02% (p=0.037 n=10+10) MapAppendAssign/Int64/65536-16 30.9ns ± 1% 30.4ns ± 2% -1.56% (p=0.001 n=10+10) MapAppendAssign/Str/256-16 30.2ns ± 6% 30.0ns ±10% ~ (p=0.645 n=10+10) MapAppendAssign/Str/65536-16 44.5ns ± 4% 46.8ns ± 3% +5.17% (p=0.001 n=8+9) MapDelete/Int32/100-16 18.7ns ± 0% 18.7ns ± 0% -0.27% (p=0.017 n=10+10) MapDelete/Int32/1000-16 17.6ns ± 1% 17.5ns ± 1% -0.85% (p=0.000 n=9+10) MapDelete/Int32/10000-16 18.7ns ± 0% 18.3ns ± 1% -1.92% (p=0.000 n=10+10) MapDelete/Int64/100-16 19.1ns ± 0% 19.2ns ± 0% +0.68% (p=0.000 n=10+9) MapDelete/Int64/1000-16 17.7ns ± 2% 18.3ns ± 1% +3.00% (p=0.000 n=10+10) MapDelete/Int64/10000-16 18.8ns ± 1% 19.2ns ± 0% +2.01% (p=0.000 n=10+9) MapDelete/Str/100-16 26.5ns ± 0% 26.4ns ± 1% -0.73% (p=0.000 n=10+10) MapDelete/Str/1000-16 23.5ns ± 2% 23.4ns ± 1% ~ (p=0.425 n=10+10) MapDelete/Str/10000-16 25.1ns ± 0% 25.1ns ± 1% +0.28% (p=0.037 n=10+10) MapDelete/Pointer/100-16 20.6ns ± 1% 20.6ns ± 0% ~ (p=0.117 n=10+10) MapDelete/Pointer/1000-16 19.2ns ± 1% 19.4ns ± 1% +0.97% (p=0.004 n=10+10) MapDelete/Pointer/10000-16 20.0ns ± 0% 20.1ns ± 1% +0.52% (p=0.022 n=10+10) Memmove/0-16 0.21ns ± 2% 0.21ns ± 1% ~ (p=0.671 n=10+10) Memmove/1-16 0.93ns ± 0% 0.93ns ± 0% +0.21% (p=0.034 n=10+10) Memmove/2-16 0.93ns ± 0% 0.93ns ± 0% ~ (p=0.101 n=10+10) Memmove/3-16 0.93ns ± 1% 0.93ns ± 1% +0.49% (p=0.004 n=10+10) Memmove/4-16 1.03ns ± 0% 1.03ns ± 0% ~ (p=0.260 n=10+10) Memmove/5-16 1.13ns ± 0% 1.13ns ± 0% +0.20% (p=0.034 n=10+10) Memmove/6-16 1.13ns ± 0% 1.13ns ± 1% ~ (p=0.126 n=10+10) Memmove/7-16 1.13ns ± 0% 1.13ns ± 1% +0.22% (p=0.028 n=10+10) Memmove/8-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.545 n=9+10) Memmove/9-16 1.25ns ± 0% 1.35ns ± 0% +7.98% (p=0.000 n=10+10) Memmove/10-16 1.25ns ± 0% 1.35ns ± 0% +7.96% (p=0.000 n=9+9) Memmove/11-16 1.25ns ± 0% 1.35ns ± 0% +8.53% (p=0.000 n=10+9) Memmove/12-16 1.25ns ± 0% 1.35ns ± 1% +8.24% (p=0.000 n=10+10) Memmove/13-16 1.25ns ± 0% 1.34ns ± 0% +7.75% (p=0.000 n=10+10) Memmove/14-16 1.25ns ± 0% 1.35ns ± 1% +8.28% (p=0.000 n=10+9) Memmove/15-16 1.25ns ± 0% 1.35ns ± 0% +8.07% (p=0.000 n=10+9) Memmove/16-16 1.25ns ± 0% 1.35ns ± 1% +8.35% (p=0.000 n=9+10) Memmove/32-16 1.34ns ± 0% 1.36ns ± 1% +1.22% (p=0.000 n=10+10) Memmove/64-16 1.45ns ± 0% 1.64ns ± 0% +13.07% (p=0.000 n=10+9) Memmove/128-16 1.86ns ± 0% 2.02ns ± 0% +8.64% (p=0.000 n=10+10) Memmove/256-16 2.47ns ± 0% 2.49ns ± 1% +1.14% (p=0.000 n=10+10) Memmove/512-16 3.96ns ± 1% 3.96ns ± 0% ~ (p=0.182 n=10+10) Memmove/1024-16 5.90ns ± 1% 5.87ns ± 1% ~ (p=0.258 n=9+9) Memmove/2048-16 9.62ns ± 1% 9.62ns ± 2% ~ (p=0.963 n=8+9) Memmove/4096-16 16.4ns ± 0% 17.1ns ± 4% +4.19% (p=0.003 n=8+9) MemmoveOverlap/32-16 1.62ns ± 1% 1.68ns ± 1% +3.53% (p=0.000 n=10+10) MemmoveOverlap/64-16 1.64ns ± 0% 1.65ns ± 0% +0.29% (p=0.002 n=9+9) MemmoveOverlap/128-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.070 n=10+10) MemmoveOverlap/256-16 2.67ns ± 0% 2.67ns ± 0% +0.26% (p=0.012 n=10+10) MemmoveOverlap/512-16 6.20ns ±18% 5.74ns ± 0% ~ (p=0.645 n=10+8) MemmoveOverlap/1024-16 7.28ns ± 0% 7.30ns ± 0% +0.28% (p=0.006 n=8+10) MemmoveOverlap/2048-16 11.9ns ± 0% 12.0ns ± 1% +0.37% (p=0.014 n=9+9) MemmoveOverlap/4096-16 23.3ns ± 1% 23.1ns ± 1% -0.84% (p=0.000 n=8+10) MemmoveUnalignedDst/0-16 1.03ns ± 0% 1.03ns ± 0% +0.19% (p=0.007 n=10+10) MemmoveUnalignedDst/1-16 1.24ns ± 0% 1.25ns ± 1% +0.52% (p=0.022 n=10+10) MemmoveUnalignedDst/2-16 1.23ns ± 0% 1.23ns ± 0% ~ (p=0.051 n=10+10) MemmoveUnalignedDst/3-16 1.23ns ± 0% 1.23ns ± 0% +0.14% (p=0.006 n=9+9) MemmoveUnalignedDst/4-16 1.23ns ± 0% 1.24ns ± 1% +0.37% (p=0.004 n=10+10) MemmoveUnalignedDst/5-16 1.35ns ± 0% 1.35ns ± 0% ~ (p=0.075 n=10+10) MemmoveUnalignedDst/6-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=0.779 n=10+10) MemmoveUnalignedDst/7-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=1.000 n=10+10) MemmoveUnalignedDst/8-16 1.34ns ± 0% 1.35ns ± 1% +0.39% (p=0.024 n=10+10) MemmoveUnalignedDst/9-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.849 n=10+10) MemmoveUnalignedDst/10-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.255 n=10+10) MemmoveUnalignedDst/11-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.304 n=10+10) MemmoveUnalignedDst/12-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.672 n=10+10) MemmoveUnalignedDst/13-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.435 n=10+10) MemmoveUnalignedDst/14-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.340 n=10+10) MemmoveUnalignedDst/15-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.911 n=10+9) MemmoveUnalignedDst/16-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.074 n=10+10) MemmoveUnalignedDst/32-16 1.62ns ± 0% 1.63ns ± 0% ~ (p=0.059 n=10+10) MemmoveUnalignedDst/64-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.234 n=10+10) MemmoveUnalignedDst/128-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.709 n=10+9) MemmoveUnalignedDst/256-16 3.69ns ± 0% 3.70ns ± 0% ~ (p=0.144 n=10+10) MemmoveUnalignedDst/512-16 4.15ns ± 1% 4.14ns ± 0% ~ (p=0.778 n=10+8) MemmoveUnalignedDst/1024-16 7.52ns ± 0% 7.53ns ± 1% ~ (p=0.650 n=9+9) MemmoveUnalignedDst/2048-16 12.9ns ± 0% 12.9ns ± 1% ~ (p=0.548 n=8+8) MemmoveUnalignedDst/4096-16 25.4ns ± 0% 25.4ns ± 0% ~ (p=0.947 n=9+9) MemmoveUnalignedDstOverlap/32-16 4.08ns ± 0% 4.09ns ± 0% ~ (p=0.360 n=10+10) MemmoveUnalignedDstOverlap/64-16 4.56ns ± 0% 4.56ns ± 0% ~ (p=0.705 n=10+9) MemmoveUnalignedDstOverlap/128-16 4.67ns ± 0% 4.67ns ± 0% ~ (p=0.397 n=10+10) MemmoveUnalignedDstOverlap/256-16 5.08ns ± 0% 5.08ns ± 0% ~ (p=0.159 n=10+9) MemmoveUnalignedDstOverlap/512-16 8.45ns ± 5% 8.19ns ± 0% -3.10% (p=0.021 n=10+9) MemmoveUnalignedDstOverlap/1024-16 9.55ns ± 0% 9.56ns ± 0% ~ (p=0.221 n=8+8) MemmoveUnalignedDstOverlap/2048-16 14.0ns ± 0% 14.0ns ± 1% ~ (p=0.200 n=10+9) MemmoveUnalignedDstOverlap/4096-16 26.5ns ± 0% 26.5ns ± 0% ~ (p=0.458 n=10+9) MemmoveUnalignedSrc/0-16 1.02ns ± 1% 0.99ns ± 1% -2.67% (p=0.000 n=10+9) MemmoveUnalignedSrc/1-16 1.13ns ± 0% 1.13ns ± 1% -0.25% (p=0.027 n=10+9) MemmoveUnalignedSrc/2-16 1.13ns ± 1% 1.13ns ± 0% -0.28% (p=0.012 n=10+9) MemmoveUnalignedSrc/3-16 1.24ns ± 1% 1.23ns ± 0% -0.25% (p=0.022 n=9+10) MemmoveUnalignedSrc/4-16 1.24ns ± 0% 1.23ns ± 1% ~ (p=0.118 n=9+10) MemmoveUnalignedSrc/5-16 1.34ns ± 0% 1.34ns ± 1% ~ (p=0.564 n=8+10) MemmoveUnalignedSrc/6-16 1.34ns ± 0% 1.34ns ± 0% -0.39% (p=0.000 n=10+10) MemmoveUnalignedSrc/7-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=0.235 n=10+10) MemmoveUnalignedSrc/8-16 1.34ns ± 0% 1.34ns ± 0% -0.37% (p=0.002 n=10+9) MemmoveUnalignedSrc/9-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.579 n=10+9) MemmoveUnalignedSrc/10-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.534 n=10+9) MemmoveUnalignedSrc/11-16 1.44ns ± 0% 1.44ns ± 1% ~ (p=0.415 n=10+10) MemmoveUnalignedSrc/12-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.218 n=10+10) MemmoveUnalignedSrc/13-16 1.44ns ± 0% 1.44ns ± 1% ~ (p=0.693 n=10+10) MemmoveUnalignedSrc/14-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.901 n=10+10) MemmoveUnalignedSrc/15-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.379 n=10+10) MemmoveUnalignedSrc/16-16 1.44ns ± 1% 1.44ns ± 0% ~ (p=0.538 n=10+10) MemmoveUnalignedSrc/32-16 1.60ns ± 1% 1.60ns ± 0% ~ (p=0.491 n=10+10) MemmoveUnalignedSrc/64-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.564 n=10+10) MemmoveUnalignedSrc/128-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.497 n=10+9) MemmoveUnalignedSrc/256-16 2.70ns ± 0% 2.78ns ± 1% +2.81% (p=0.000 n=10+10) MemmoveUnalignedSrc/512-16 4.31ns ± 0% 4.30ns ± 0% -0.26% (p=0.031 n=8+9) MemmoveUnalignedSrc/1024-16 7.28ns ± 0% 7.21ns ± 1% -1.05% (p=0.000 n=8+10) MemmoveUnalignedSrc/2048-16 13.0ns ± 0% 13.0ns ± 0% ~ (p=0.180 n=9+8) MemmoveUnalignedSrc/4096-16 25.4ns ± 0% 25.3ns ± 1% ~ (p=0.054 n=10+10) MemmoveUnalignedSrcOverlap/32-16 4.04ns ± 0% 4.06ns ± 0% +0.62% (p=0.000 n=9+10) MemmoveUnalignedSrcOverlap/64-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.421 n=10+10) MemmoveUnalignedSrcOverlap/128-16 4.53ns ± 0% 4.52ns ± 0% ~ (p=0.251 n=10+10) MemmoveUnalignedSrcOverlap/256-16 6.17ns ± 0% 6.15ns ± 0% -0.35% (p=0.000 n=10+9) MemmoveUnalignedSrcOverlap/512-16 7.43ns ± 0% 7.44ns ± 0% ~ (p=0.524 n=9+8) MemmoveUnalignedSrcOverlap/1024-16 8.94ns ± 0% 8.94ns ± 0% ~ (p=0.419 n=8+8) MemmoveUnalignedSrcOverlap/2048-16 13.2ns ± 0% 14.5ns ±21% ~ (p=0.107 n=8+10) MemmoveUnalignedSrcOverlap/4096-16 25.6ns ± 0% 25.6ns ± 1% ~ (p=0.650 n=9+9) Memclr/5-16 0.86ns ± 1% 0.86ns ± 2% ~ (p=0.531 n=9+9) Memclr/16-16 1.04ns ± 0% 1.04ns ± 0% +0.32% (p=0.013 n=9+10) Memclr/64-16 1.23ns ± 0% 1.26ns ± 0% +2.28% (p=0.000 n=10+10) Memclr/256-16 2.27ns ± 0% 2.27ns ± 0% ~ (p=0.127 n=10+10) Memclr/4096-16 17.1ns ± 1% 17.3ns ± 0% +0.88% (p=0.000 n=10+10) Memclr/65536-16 821ns ± 0% 822ns ± 0% ~ (p=0.516 n=10+10) Memclr/1M-16 14.1µs ± 1% 14.0µs ± 1% ~ (p=0.516 n=10+10) Memclr/4M-16 86.1µs ± 1% 85.9µs ± 0% ~ (p=0.123 n=10+10) Memclr/8M-16 174µs ± 2% 173µs ± 0% ~ (p=0.408 n=10+8) Memclr/16M-16 385µs ± 4% 387µs ± 0% ~ (p=0.173 n=10+8) Memclr/64M-16 2.18ms ± 0% 2.19ms ± 0% ~ (p=0.113 n=10+9) GoMemclr/5-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.346 n=9+10) GoMemclr/16-16 1.02ns ± 0% 1.02ns ± 0% +0.22% (p=0.003 n=10+8) GoMemclr/64-16 1.14ns ± 0% 1.14ns ± 0% ~ (p=0.948 n=10+9) GoMemclr/256-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.868 n=10+10) MemclrRange/1K_2K-16 457ns ± 0% 428ns ± 1% -6.38% (p=0.000 n=10+10) MemclrRange/2K_8K-16 1.46µs ± 0% 1.46µs ± 0% ~ (p=0.700 n=10+10) MemclrRange/4K_16K-16 1.16µs ± 0% 1.16µs ± 0% ~ (p=0.567 n=9+10) MemclrRange/160K_228K-16 20.7µs ± 0% 20.7µs ± 0% ~ (p=0.160 n=10+10) ClearFat7-16 0.38ns ± 5% 0.21ns ± 1% -45.79% (p=0.000 n=9+10) ClearFat8-16 0.21ns ± 3% 0.12ns ± 2% -44.16% (p=0.000 n=8+9) ClearFat11-16 0.35ns ± 3% 0.21ns ± 1% -40.46% (p=0.000 n=9+9) ClearFat12-16 0.23ns ± 9% 0.21ns ± 1% -10.23% (p=0.000 n=10+9) ClearFat13-16 0.22ns ± 6% 0.21ns ± 2% -6.53% (p=0.000 n=10+10) ClearFat14-16 0.22ns ± 4% 0.21ns ± 1% -5.97% (p=0.000 n=10+10) ClearFat15-16 0.22ns ± 4% 0.21ns ± 1% -6.96% (p=0.000 n=10+9) ClearFat16-16 0.19ns ± 9% 0.12ns ± 6% -34.89% (p=0.000 n=9+10) ClearFat24-16 0.23ns ± 6% 0.21ns ± 1% -10.26% (p=0.000 n=10+9) ClearFat32-16 0.22ns ± 5% 0.21ns ± 2% -5.31% (p=0.000 n=10+10) ClearFat40-16 0.34ns ± 4% 0.62ns ± 1% +83.00% (p=0.000 n=10+10) ClearFat48-16 0.33ns ± 2% 0.41ns ± 0% +26.71% (p=0.000 n=10+10) ClearFat56-16 0.41ns ± 1% 0.41ns ± 0% ~ (p=0.838 n=10+10) ClearFat64-16 0.41ns ± 0% 0.41ns ± 0% ~ (p=0.178 n=10+8) ClearFat72-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.669 n=10+10) ClearFat128-16 1.04ns ± 0% 1.04ns ± 0% ~ (p=0.679 n=10+10) ClearFat256-16 1.86ns ± 0% 1.86ns ± 0% ~ (p=0.066 n=9+10) ClearFat512-16 3.50ns ± 0% 3.50ns ± 0% ~ (p=0.626 n=10+10) ClearFat1024-16 6.79ns ± 0% 6.79ns ± 0% ~ (p=0.986 n=10+10) ClearFat1032-16 13.6ns ± 0% 13.6ns ± 0% +0.13% (p=0.044 n=10+10) ClearFat1040-16 10.3ns ± 0% 10.3ns ± 0% ~ (p=0.175 n=10+9) CopyFat7-16 0.37ns ±13% 0.25ns ± 1% -31.74% (p=0.000 n=10+9) CopyFat8-16 0.17ns ± 1% 0.17ns ± 2% +1.35% (p=0.004 n=9+9) CopyFat11-16 0.26ns ± 1% 0.30ns ± 3% +12.58% (p=0.000 n=9+10) CopyFat12-16 0.28ns ± 2% 0.26ns ± 1% -5.66% (p=0.000 n=9+9) CopyFat13-16 0.26ns ± 0% 0.28ns ± 4% +7.35% (p=0.000 n=8+10) CopyFat14-16 0.29ns ± 6% 0.26ns ± 2% -10.46% (p=0.000 n=10+9) CopyFat15-16 0.26ns ± 1% 0.30ns ± 6% +14.12% (p=0.000 n=8+10) CopyFat16-16 0.21ns ± 1% 0.21ns ± 0% ~ (p=0.426 n=8+8) CopyFat24-16 0.29ns ± 3% 0.25ns ± 1% -12.27% (p=0.000 n=9+10) CopyFat32-16 0.26ns ± 4% 0.29ns ± 4% +11.71% (p=0.000 n=10+10) CopyFat64-16 0.46ns ± 8% 0.42ns ± 1% -8.37% (p=0.002 n=10+10) CopyFat72-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.563 n=10+10) CopyFat128-16 1.53ns ± 0% 1.54ns ± 0% +0.62% (p=0.000 n=10+10) CopyFat256-16 2.68ns ± 0% 2.65ns ± 1% -1.23% (p=0.000 n=10+10) CopyFat512-16 4.93ns ± 1% 5.19ns ± 3% +5.16% (p=0.000 n=9+9) CopyFat520-16 6.99ns ± 0% 6.99ns ± 0% ~ (p=0.539 n=10+10) CopyFat1024-16 11.5ns ± 1% 9.8ns ± 1% -14.98% (p=0.000 n=9+10) CopyFat1032-16 13.6ns ± 0% 13.6ns ± 0% ~ (p=0.728 n=10+10) CopyFat1040-16 11.0ns ± 0% 11.1ns ± 0% +0.53% (p=0.000 n=10+10) Issue18740/2byte-16 10.1µs ± 0% 10.1µs ± 0% ~ (p=0.342 n=10+10) Issue18740/4byte-16 2.34µs ± 0% 2.35µs ± 0% +0.30% (p=0.002 n=10+8) Issue18740/8byte-16 1.28µs ± 0% 1.28µs ± 0% +0.32% (p=0.000 n=9+10) Finalizer-16 345µs ± 1% 336µs ± 0% -2.55% (p=0.000 n=10+9) FinalizerRun-16 450ns ± 3% 420ns ± 1% -6.65% (p=0.000 n=10+10) PallocBitsSummarize/Unpacked00-16 2.88ns ± 0% 2.88ns ± 0% ~ (p=0.358 n=10+10) PallocBitsSummarize/UnpackedFFFFFFFFFFFFFFFF-16 15.2ns ± 0% 15.2ns ± 0% ~ (p=0.925 n=10+10) PallocBitsSummarize/UnpackedAA-16 16.4ns ± 0% 16.3ns ± 0% ~ (p=0.113 n=9+9) PallocBitsSummarize/UnpackedAAAAAAAAAAAAAAAA-16 16.5ns ± 0% 16.6ns ± 0% ~ (p=0.238 n=10+10) PallocBitsSummarize/Unpacked80000000AAAAAAAA-16 37.8ns ± 1% 36.4ns ± 0% -3.70% (p=0.000 n=10+9) PallocBitsSummarize/UnpackedAAAAAAAA00000001-16 41.8ns ± 1% 39.9ns ± 0% -4.68% (p=0.000 n=9+10) PallocBitsSummarize/UnpackedBBBBBBBBBBBBBBBB-16 18.3ns ± 0% 18.3ns ± 0% ~ (p=0.781 n=10+10) PallocBitsSummarize/Unpacked80000000BBBBBBBB-16 38.8ns ± 1% 38.1ns ± 0% -1.78% (p=0.000 n=9+10) PallocBitsSummarize/UnpackedBBBBBBBB00000001-16 37.5ns ± 0% 36.1ns ± 1% -3.88% (p=0.000 n=8+10) PallocBitsSummarize/UnpackedCCCCCCCCCCCCCCCC-16 21.8ns ± 0% 21.9ns ± 0% +0.20% (p=0.018 n=10+9) PallocBitsSummarize/Unpacked4444444444444444-16 21.8ns ± 0% 21.9ns ± 0% +0.20% (p=0.029 n=10+9) PallocBitsSummarize/Unpacked4040404040404040-16 26.5ns ± 0% 26.5ns ± 0% -0.24% (p=0.001 n=9+10) PallocBitsSummarize/Unpacked4000400040004000-16 33.4ns ± 1% 31.3ns ± 0% -6.20% (p=0.000 n=9+10) PallocBitsSummarize/Unpacked1000404044CCAAFF-16 36.4ns ± 1% 35.9ns ± 0% -1.50% (p=0.000 n=10+10) FindBitRange64/Pattern00Size2-16 0.34ns ± 1% 0.35ns ± 1% +3.80% (p=0.000 n=10+9) FindBitRange64/Pattern00Size8-16 0.70ns ± 1% 0.70ns ± 0% -0.68% (p=0.000 n=10+10) FindBitRange64/Pattern00Size32-16 0.70ns ± 1% 0.69ns ± 0% -0.86% (p=0.001 n=10+8) FindBitRange64/PatternFFFFFFFFFFFFFFFFSize2-16 0.34ns ± 1% 0.35ns ± 1% +4.45% (p=0.000 n=9+8) FindBitRange64/PatternFFFFFFFFFFFFFFFFSize8-16 1.54ns ± 0% 1.54ns ± 1% ~ (p=0.914 n=9+9) FindBitRange64/PatternFFFFFFFFFFFFFFFFSize32-16 2.78ns ± 0% 2.78ns ± 0% ~ (p=0.295 n=9+10) FindBitRange64/PatternAASize2-16 0.34ns ± 2% 0.35ns ± 2% +4.61% (p=0.000 n=10+10) FindBitRange64/PatternAASize8-16 0.70ns ± 1% 0.70ns ± 1% -0.82% (p=0.005 n=10+10) FindBitRange64/PatternAASize32-16 0.70ns ± 1% 0.70ns ± 0% -0.73% (p=0.003 n=10+9) FindBitRange64/PatternAAAAAAAAAAAAAAAASize2-16 0.34ns ± 2% 0.35ns ± 2% +3.94% (p=0.000 n=10+10) FindBitRange64/PatternAAAAAAAAAAAAAAAASize8-16 0.70ns ± 1% 0.70ns ± 1% -0.67% (p=0.025 n=10+10) FindBitRange64/PatternAAAAAAAAAAAAAAAASize32-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.118 n=9+10) FindBitRange64/Pattern80000000AAAAAAAASize2-16 0.34ns ± 1% 0.35ns ± 2% +3.72% (p=0.000 n=10+9) FindBitRange64/Pattern80000000AAAAAAAASize8-16 0.70ns ± 1% 0.70ns ± 0% ~ (p=0.102 n=10+10) FindBitRange64/Pattern80000000AAAAAAAASize32-16 0.70ns ± 1% 0.70ns ± 1% -0.55% (p=0.011 n=10+10) FindBitRange64/PatternAAAAAAAA00000001Size2-16 0.34ns ± 2% 0.35ns ± 1% +3.83% (p=0.000 n=10+9) FindBitRange64/PatternAAAAAAAA00000001Size8-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.065 n=10+10) FindBitRange64/PatternAAAAAAAA00000001Size32-16 0.70ns ± 1% 0.70ns ± 1% -0.95% (p=0.002 n=10+10) FindBitRange64/PatternBBBBBBBBBBBBBBBBSize2-16 0.34ns ± 0% 0.35ns ± 1% +4.12% (p=0.000 n=8+10) FindBitRange64/PatternBBBBBBBBBBBBBBBBSize8-16 1.24ns ± 0% 1.23ns ± 0% -0.30% (p=0.002 n=10+9) FindBitRange64/PatternBBBBBBBBBBBBBBBBSize32-16 1.24ns ± 0% 1.24ns ± 0% -0.17% (p=0.023 n=9+10) FindBitRange64/Pattern80000000BBBBBBBBSize2-16 0.34ns ± 1% 0.35ns ± 2% +4.82% (p=0.000 n=9+10) FindBitRange64/Pattern80000000BBBBBBBBSize8-16 1.24ns ± 1% 1.24ns ± 0% ~ (p=0.063 n=10+10) FindBitRange64/Pattern80000000BBBBBBBBSize32-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.164 n=9+10) FindBitRange64/PatternBBBBBBBB00000001Size2-16 0.34ns ± 1% 0.35ns ± 1% +4.38% (p=0.000 n=8+10) FindBitRange64/PatternBBBBBBBB00000001Size8-16 1.24ns ± 1% 1.24ns ± 0% ~ (p=0.052 n=10+10) FindBitRange64/PatternBBBBBBBB00000001Size32-16 1.24ns ± 0% 1.23ns ± 0% -0.40% (p=0.000 n=10+10) FindBitRange64/PatternCCCCCCCCCCCCCCCCSize2-16 0.34ns ± 0% 0.35ns ± 2% +3.96% (p=0.000 n=9+10) FindBitRange64/PatternCCCCCCCCCCCCCCCCSize8-16 1.24ns ± 0% 1.23ns ± 0% -0.30% (p=0.000 n=10+9) FindBitRange64/PatternCCCCCCCCCCCCCCCCSize32-16 1.24ns ± 0% 1.24ns ± 1% ~ (p=0.284 n=10+10) FindBitRange64/Pattern4444444444444444Size2-16 0.34ns ± 1% 0.35ns ± 1% +3.91% (p=0.000 n=9+9) FindBitRange64/Pattern4444444444444444Size8-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.617 n=10+10) FindBitRange64/Pattern4444444444444444Size32-16 0.70ns ± 1% 0.70ns ± 1% -0.60% (p=0.006 n=10+10) FindBitRange64/Pattern4040404040404040Size2-16 0.34ns ± 2% 0.35ns ± 2% +3.67% (p=0.000 n=10+10) FindBitRange64/Pattern4040404040404040Size8-16 0.70ns ± 2% 0.70ns ± 1% -0.87% (p=0.014 n=10+10) FindBitRange64/Pattern4040404040404040Size32-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.256 n=10+10) FindBitRange64/Pattern4000400040004000Size2-16 0.34ns ± 2% 0.35ns ± 3% +4.71% (p=0.000 n=10+10) FindBitRange64/Pattern4000400040004000Size8-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.393 n=10+10) FindBitRange64/Pattern4000400040004000Size32-16 0.70ns ± 1% 0.70ns ± 1% -0.86% (p=0.014 n=10+10) NetpollBreak-16 1.49µs ± 1% 1.50µs ± 3% ~ (p=0.181 n=8+10) Syscall-16 3.68ns ± 1% 3.66ns ± 2% ~ (p=0.148 n=10+10) SyscallWork-16 5.15ns ± 1% 5.13ns ± 0% ~ (p=0.188 n=10+9) SyscallExcess-16 3.89ns ± 2% 3.83ns ± 1% -1.52% (p=0.001 n=10+10) SyscallExcessWork-16 5.34ns ± 1% 5.31ns ± 0% -0.64% (p=0.000 n=10+9) PingPongHog-16 397ns ± 7% 394ns ±11% ~ (p=0.912 n=10+10) StackGrowth-16 67.9ns ± 0% 68.8ns ± 0% +1.28% (p=0.000 n=10+8) StackGrowthDeep-16 7.70µs ± 1% 8.48µs ± 2% +10.06% (p=0.000 n=9+10) CreateGoroutines-16 124ns ± 1% 124ns ± 1% ~ (p=0.254 n=10+10) CreateGoroutinesParallel-16 25.7ns ± 1% 27.6ns ± 2% +7.51% (p=0.000 n=10+10) CreateGoroutinesCapture-16 823ns ± 1% 821ns ± 2% ~ (p=0.699 n=10+10) CreateGoroutinesSingle-16 175ns ± 3% 172ns ± 3% -1.90% (p=0.011 n=10+10) ClosureCall-16 0.11ns ± 7% 0.12ns ± 3% ~ (p=0.842 n=9+10) WakeupParallelSpinning/0s-16 11.4µs ± 0% 11.4µs ± 0% ~ (p=0.325 n=9+10) WakeupParallelSpinning/1µs-16 15.4µs ± 0% 15.4µs ± 1% ~ (p=0.955 n=10+10) WakeupParallelSpinning/2µs-16 18.7µs ± 2% 18.9µs ± 2% ~ (p=0.052 n=10+10) WakeupParallelSpinning/5µs-16 30.7µs ± 0% 30.7µs ± 0% -0.03% (p=0.003 n=10+10) WakeupParallelSpinning/10µs-16 48.8µs ± 0% 48.8µs ± 0% ~ (p=0.670 n=10+10) WakeupParallelSpinning/20µs-16 90.8µs ± 0% 90.8µs ± 0% -0.02% (p=0.004 n=10+10) WakeupParallelSpinning/50µs-16 211µs ± 0% 211µs ± 0% ~ (p=0.194 n=10+10) WakeupParallelSpinning/100µs-16 323µs ± 0% 323µs ± 0% ~ (p=1.000 n=10+9) WakeupParallelSyscall/0s-16 118µs ± 0% 118µs ± 0% ~ (p=0.447 n=10+9) WakeupParallelSyscall/1µs-16 119µs ± 2% 119µs ± 1% ~ (p=0.604 n=10+9) WakeupParallelSyscall/2µs-16 120µs ± 1% 121µs ± 3% ~ (p=0.263 n=8+10) WakeupParallelSyscall/5µs-16 126µs ± 2% 126µs ± 2% ~ (p=0.510 n=10+9) WakeupParallelSyscall/10µs-16 136µs ± 1% 137µs ± 1% ~ (p=0.095 n=9+10) WakeupParallelSyscall/20µs-16 156µs ± 2% 157µs ± 3% ~ (p=0.604 n=10+9) WakeupParallelSyscall/50µs-16 221µs ± 1% 220µs ± 1% ~ (p=0.063 n=10+10) WakeupParallelSyscall/100µs-16 326µs ± 0% 325µs ± 0% -0.26% (p=0.003 n=9+10) Matmult-16 0.67ns ± 2% 0.66ns ± 2% ~ (p=0.256 n=10+10) Fastrand-16 0.08ns ±11% 0.08ns ±13% ~ (p=0.661 n=9+10) Fastrand64-16 0.08ns ±11% 0.08ns ± 6% ~ (p=0.631 n=10+10) FastrandHashiter-16 1.76ns ± 1% 1.76ns ± 1% ~ (p=0.854 n=8+8) Fastrandn/2-16 0.86ns ± 1% 0.86ns ± 1% +1.09% (p=0.000 n=10+9) Fastrandn/3-16 0.85ns ± 1% 0.86ns ± 1% +1.23% (p=0.001 n=10+10) Fastrandn/4-16 0.85ns ± 1% 0.87ns ± 2% +1.60% (p=0.000 n=10+10) Fastrandn/5-16 0.85ns ± 1% 0.86ns ± 1% +1.05% (p=0.000 n=10+10) IfaceCmp100-16 46.6ns ± 0% 46.1ns ± 0% -1.18% (p=0.000 n=10+10) IfaceCmpNil100-16 26.8ns ± 0% 26.8ns ± 0% ~ (p=0.777 n=10+8) EfaceCmpDiff-16 132ns ± 0% 130ns ± 0% -0.95% (p=0.000 n=10+9) EfaceCmpDiffIndirect-16 209ns ± 0% 211ns ± 0% +1.14% (p=0.000 n=10+9) Defer-16 3.40ns ± 1% 3.04ns ± 0% -10.67% (p=0.000 n=10+10) Defer10-16 29.4ns ± 2% 27.2ns ± 3% -7.26% (p=0.000 n=10+10) DeferMany-16 110ns ± 6% 113ns ± 2% +3.45% (p=0.017 n=9+9) PanicRecover-16 67.6ns ± 0% 67.7ns ± 2% ~ (p=0.436 n=9+9) GoroutineProfile/small-nil/idle-16 3.90µs ± 4% 3.86µs ± 2% ~ (p=0.305 n=10+9) GoroutineProfile/small-nil/loaded-16 4.82µs ± 6% 4.82µs ± 4% ~ (p=0.905 n=10+9) GoroutineProfile/small/idle-16 103µs ± 3% 102µs ± 3% ~ (p=0.113 n=9+9) GoroutineProfile/small/loaded-16 432µs ± 5% 440µs ±13% ~ (p=0.604 n=9+10) GoroutineProfile/large-nil/idle-16 3.86µs ± 3% 3.82µs ± 3% ~ (p=0.210 n=10+10) GoroutineProfile/large-nil/loaded-16 4.90µs ± 2% 4.90µs ± 5% ~ (p=0.780 n=10+9) GoroutineProfile/large/idle-16 2.58ms ± 1% 2.52ms ± 1% -2.38% (p=0.000 n=10+10) GoroutineProfile/large/loaded-16 8.62ms ± 9% 8.90ms ±11% ~ (p=0.400 n=9+10) GoroutineProfile/sparse-nil/idle-16 3.85µs ± 4% 3.81µs ± 3% ~ (p=0.470 n=10+10) GoroutineProfile/sparse-nil/loaded-16 4.82µs ± 4% 4.69µs ± 5% ~ (p=0.052 n=10+10) GoroutineProfile/sparse/idle-16 102µs ± 4% 102µs ± 2% ~ (p=0.497 n=10+9) GoroutineProfile/sparse/loaded-16 438µs ± 7% 437µs ± 6% ~ (p=0.796 n=10+10) RWMutexUncontended-16 6.79ns ± 0% 6.78ns ± 0% ~ (p=0.228 n=10+8) RWMutexWrite100-16 85.4ns ± 0% 87.1ns ± 0% +2.00% (p=0.000 n=10+8) RWMutexWrite10-16 168ns ±25% 152ns ±11% ~ (p=0.063 n=10+10) RWMutexWorkWrite100-16 106ns ± 0% 106ns ± 3% ~ (p=0.136 n=10+10) RWMutexWorkWrite10-16 567ns ± 3% 571ns ± 1% ~ (p=0.326 n=10+9) SemTable/OneAddrCollision/n=1000-16 15.9µs ± 1% 16.0µs ± 1% +0.50% (p=0.031 n=9+9) SemTable/ManyAddrCollision/n=1000-16 56.2µs ± 1% 56.8µs ± 1% +1.06% (p=0.000 n=10+10) SemTable/OneAddrCollision/n=2000-16 32.6µs ± 2% 32.9µs ± 4% ~ (p=0.156 n=10+9) SemTable/ManyAddrCollision/n=2000-16 118µs ± 0% 119µs ± 0% +0.75% (p=0.000 n=9+10) SemTable/OneAddrCollision/n=4000-16 65.3µs ± 1% 65.6µs ± 3% ~ (p=0.497 n=9+10) SemTable/ManyAddrCollision/n=4000-16 245µs ± 0% 248µs ± 2% +1.36% (p=0.000 n=9+10) SemTable/OneAddrCollision/n=8000-16 131µs ± 1% 130µs ± 1% -1.01% (p=0.002 n=9+10) SemTable/ManyAddrCollision/n=8000-16 503µs ± 1% 508µs ± 0% +0.97% (p=0.000 n=10+10) MakeSliceCopy/mallocmove/Byte-16 67.6ns ± 1% 64.1ns ± 2% -5.20% (p=0.000 n=10+10) MakeSliceCopy/mallocmove/Int-16 65.0ns ± 7% 61.7ns ± 4% -5.08% (p=0.009 n=10+10) MakeSliceCopy/mallocmove/Ptr-16 88.1ns ± 1% 79.9ns ± 1% -9.29% (p=0.000 n=10+10) MakeSliceCopy/makecopy/Byte-16 65.2ns ± 6% 63.4ns ± 0% ~ (p=0.500 n=10+8) MakeSliceCopy/makecopy/Int-16 63.2ns ± 1% 64.1ns ± 1% +1.34% (p=0.001 n=9+9) MakeSliceCopy/makecopy/Ptr-16 88.1ns ± 1% 80.1ns ± 1% -9.09% (p=0.000 n=10+10) MakeSliceCopy/nilappend/Byte-16 69.8ns ± 1% 65.7ns ± 3% -5.80% (p=0.000 n=10+10) MakeSliceCopy/nilappend/Int-16 69.6ns ± 2% 67.2ns ± 1% -3.50% (p=0.000 n=10+9) MakeSliceCopy/nilappend/Ptr-16 91.5ns ± 1% 83.8ns ± 1% -8.42% (p=0.000 n=9+10) MakeSlice/Byte-16 6.64ns ± 3% 6.58ns ± 2% ~ (p=0.393 n=10+10) MakeSlice/Int16-16 8.60ns ± 1% 8.38ns ± 3% -2.48% (p=0.001 n=9+10) MakeSlice/Int-16 17.7ns ± 3% 16.9ns ± 1% -4.67% (p=0.000 n=10+9) MakeSlice/Ptr-16 24.0ns ± 3% 23.3ns ± 2% -3.25% (p=0.000 n=10+9) MakeSlice/Struct/24-16 34.1ns ± 1% 32.0ns ± 1% -6.11% (p=0.000 n=10+10) MakeSlice/Struct/32-16 39.1ns ± 4% 38.2ns ± 1% ~ (p=0.829 n=10+8) MakeSlice/Struct/40-16 47.0ns ± 5% 43.0ns ± 2% -8.55% (p=0.000 n=10+9) GrowSlice/Byte-16 15.3ns ± 3% 15.0ns ± 2% -1.75% (p=0.005 n=9+9) GrowSlice/Int16-16 18.9ns ± 2% 18.4ns ± 2% -2.71% (p=0.000 n=10+9) GrowSlice/Int-16 33.9ns ± 1% 32.2ns ± 1% -4.89% (p=0.000 n=10+9) GrowSlice/Ptr-16 45.3ns ± 2% 43.5ns ± 1% -4.12% (p=0.000 n=10+10) GrowSlice/Struct/24-16 61.9ns ± 2% 60.0ns ± 4% -3.10% (p=0.002 n=10+10) GrowSlice/Struct/32-16 79.9ns ± 2% 72.3ns ± 3% -9.58% (p=0.000 n=8+10) GrowSlice/Struct/40-16 97.1ns ± 7% 88.8ns ± 5% -8.49% (p=0.000 n=10+10) ExtendSlice/IntSlice-16 21.1ns ± 2% 20.3ns ± 2% -3.71% (p=0.000 n=10+10) ExtendSlice/PointerSlice-16 26.8ns ± 2% 26.3ns ± 2% -1.86% (p=0.004 n=10+10) ExtendSlice/NoGrow-16 1.23ns ± 0% 1.30ns ± 1% +5.03% (p=0.000 n=10+10) Append-16 4.58ns ± 1% 4.53ns ± 0% -1.11% (p=0.000 n=10+10) AppendGrowByte-16 1.46ms ± 8% 1.42ms ± 7% -3.24% (p=0.035 n=10+10) AppendGrowString-16 27.8ms ± 4% 27.2ms ± 5% ~ (p=0.052 n=10+10) AppendSlice/1Bytes-16 1.03ns ± 1% 1.04ns ± 1% ~ (p=0.303 n=10+10) AppendSlice/4Bytes-16 1.04ns ± 0% 1.05ns ± 0% +0.79% (p=0.000 n=9+10) AppendSlice/7Bytes-16 1.23ns ± 1% 1.24ns ± 0% +0.45% (p=0.001 n=10+10) AppendSlice/8Bytes-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.183 n=10+10) AppendSlice/15Bytes-16 1.37ns ± 1% 1.43ns ± 1% +3.88% (p=0.000 n=10+10) AppendSlice/16Bytes-16 1.37ns ± 1% 1.42ns ± 1% +3.63% (p=0.000 n=9+10) AppendSlice/32Bytes-16 1.44ns ± 0% 1.47ns ± 1% +1.83% (p=0.000 n=10+10) AppendSliceLarge/1024Bytes-16 257ns ± 2% 234ns ± 1% -8.96% (p=0.000 n=8+9) AppendSliceLarge/4096Bytes-16 871ns ± 6% 812ns ± 1% -6.80% (p=0.000 n=10+10) AppendSliceLarge/16384Bytes-16 3.15µs ± 6% 3.04µs ± 5% ~ (p=0.052 n=10+10) AppendSliceLarge/65536Bytes-16 10.7µs ± 7% 10.8µs ± 2% ~ (p=0.278 n=10+9) AppendSliceLarge/262144Bytes-16 42.9µs ± 1% 39.6µs ± 5% -7.75% (p=0.000 n=9+10) AppendSliceLarge/1048576Bytes-16 147µs ± 4% 144µs ± 4% -2.21% (p=0.035 n=10+10) AppendStr/1Bytes-16 1.20ns ± 0% 1.20ns ± 0% ~ (p=0.755 n=10+10) AppendStr/4Bytes-16 1.13ns ± 0% 1.14ns ± 1% +1.20% (p=0.000 n=10+10) AppendStr/8Bytes-16 1.24ns ± 0% 1.25ns ± 0% +0.93% (p=0.000 n=10+10) AppendStr/16Bytes-16 1.40ns ± 0% 1.42ns ± 0% +2.10% (p=0.000 n=9+10) AppendStr/32Bytes-16 1.44ns ± 0% 1.45ns ± 0% +0.99% (p=0.000 n=10+10) AppendSpecialCase-16 8.64ns ± 1% 8.89ns ± 2% +2.90% (p=0.000 n=10+10) Copy/1Byte-16 1.24ns ± 1% 1.24ns ± 0% -0.28% (p=0.000 n=10+6) Copy/1String-16 1.24ns ± 0% 1.23ns ± 0% ~ (p=0.160 n=10+10) Copy/2Byte-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.115 n=10+10) Copy/2String-16 1.24ns ± 0% 1.24ns ± 1% ~ (p=0.954 n=10+10) Copy/4Byte-16 1.24ns ± 0% 1.24ns ± 0% -0.44% (p=0.001 n=10+10) Copy/4String-16 1.23ns ± 0% 1.23ns ± 0% ~ (p=0.081 n=10+10) Copy/8Byte-16 1.37ns ± 0% 1.34ns ± 0% -1.79% (p=0.000 n=9+9) Copy/8String-16 1.34ns ± 0% 1.34ns ± 0% -0.58% (p=0.000 n=9+10) Copy/12Byte-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.149 n=9+9) Copy/12String-16 1.44ns ± 0% 1.45ns ± 0% ~ (p=0.124 n=9+9) Copy/16Byte-16 1.44ns ± 0% 1.44ns ± 0% -0.19% (p=0.004 n=10+9) Copy/16String-16 1.44ns ± 0% 1.45ns ± 0% +0.30% (p=0.008 n=10+10) Copy/32Byte-16 1.63ns ± 1% 1.62ns ± 1% -0.72% (p=0.002 n=10+10) Copy/32String-16 1.60ns ± 1% 1.64ns ± 0% +2.23% (p=0.000 n=10+10) Copy/128Byte-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.757 n=9+10) Copy/128String-16 2.07ns ± 0% 2.07ns ± 0% +0.36% (p=0.004 n=10+10) Copy/1024Byte-16 6.07ns ± 2% 6.00ns ± 1% -1.20% (p=0.000 n=9+10) Copy/1024String-16 6.05ns ± 0% 5.95ns ± 1% -1.54% (p=0.000 n=10+9) AppendInPlace/NoGrow/Byte-16 288ns ± 1% 284ns ± 1% -1.58% (p=0.000 n=10+10) AppendInPlace/NoGrow/1Ptr-16 844ns ± 1% 809ns ± 3% -4.13% (p=0.000 n=9+10) AppendInPlace/NoGrow/2Ptr-16 1.47µs ± 1% 1.46µs ± 1% ~ (p=0.388 n=9+10) AppendInPlace/NoGrow/3Ptr-16 1.87µs ± 7% 1.91µs ± 1% ~ (p=0.166 n=10+8) AppendInPlace/NoGrow/4Ptr-16 2.66µs ± 1% 2.67µs ± 3% ~ (p=0.968 n=9+10) AppendInPlace/Grow/Byte-16 126ns ± 2% 121ns ± 2% -4.06% (p=0.000 n=10+10) AppendInPlace/Grow/1Ptr-16 132ns ± 2% 127ns ± 2% -4.28% (p=0.000 n=10+9) AppendInPlace/Grow/2Ptr-16 196ns ± 2% 188ns ± 1% -4.20% (p=0.000 n=10+8) AppendInPlace/Grow/3Ptr-16 264ns ± 1% 260ns ± 1% -1.51% (p=0.000 n=9+10) AppendInPlace/Grow/4Ptr-16 297ns ± 2% 294ns ± 2% ~ (p=0.085 n=10+10) StackCopyPtr-16 36.4ms ± 2% 36.7ms ± 2% ~ (p=0.481 n=10+10) StackCopy-16 33.9ms ± 3% 32.6ms ± 1% -3.87% (p=0.000 n=10+8) StackCopyNoCache-16 1.00ms ± 5% 1.01ms ± 5% ~ (p=0.143 n=10+10) StackCopyWithStkobj-16 11.0ms ± 3% 10.9ms ± 4% ~ (p=0.579 n=10+10) Issue18138-16 49.2µs ± 5% 49.0µs ± 4% ~ (p=1.000 n=10+9) CompareStringEqual-16 1.39ns ± 1% 1.45ns ± 2% +3.80% (p=0.000 n=8+10) CompareStringIdentical-16 0.55ns ± 1% 0.55ns ± 0% +0.42% (p=0.007 n=10+10) CompareStringSameLength-16 1.03ns ± 0% 1.03ns ± 0% ~ (p=0.430 n=9+10) CompareStringDifferentLength-16 0.11ns ± 2% 0.11ns ± 3% ~ (p=0.139 n=9+10) CompareStringBigUnaligned-16 23.9µs ± 1% 24.0µs ± 1% ~ (p=0.370 n=9+8) CompareStringBig-16 22.0µs ± 3% 22.2µs ± 3% ~ (p=0.243 n=9+10) ConcatStringAndBytes-16 10.7ns ± 1% 10.0ns ± 2% -6.33% (p=0.000 n=10+10) SliceByteToString/1-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=0.057 n=10+10) SliceByteToString/2-16 6.67ns ± 2% 6.60ns ± 3% ~ (p=0.101 n=10+10) SliceByteToString/4-16 7.76ns ± 2% 7.56ns ± 3% -2.59% (p=0.001 n=10+10) SliceByteToString/8-16 9.81ns ± 4% 9.57ns ± 2% -2.48% (p=0.005 n=10+10) SliceByteToString/16-16 14.0ns ± 3% 13.7ns ± 2% -2.31% (p=0.009 n=10+10) SliceByteToString/32-16 17.3ns ± 1% 16.7ns ± 2% -3.41% (p=0.000 n=10+10) SliceByteToString/64-16 25.1ns ± 1% 24.1ns ± 2% -3.93% (p=0.000 n=9+10) SliceByteToString/128-16 38.6ns ± 1% 36.5ns ± 1% -5.60% (p=0.000 n=10+10) RuneCount/lenruneslice/ASCII-16 4.12ns ± 0% 4.11ns ± 0% ~ (p=0.382 n=10+10) RuneCount/lenruneslice/Japanese-16 25.4ns ± 2% 25.6ns ± 2% ~ (p=0.138 n=9+10) RuneCount/lenruneslice/MixedLength-16 17.1ns ± 0% 17.2ns ± 0% +0.59% (p=0.000 n=9+9) RuneCount/rangeloop/ASCII-16 3.30ns ± 1% 3.29ns ± 0% ~ (p=0.267 n=10+10) RuneCount/rangeloop/Japanese-16 20.1ns ± 1% 24.9ns ± 1% +24.31% (p=0.000 n=9+9) RuneCount/rangeloop/MixedLength-16 16.5ns ± 1% 16.7ns ± 1% +1.34% (p=0.000 n=10+10) RuneCount/utf8.RuneCountInString/ASCII-16 5.71ns ± 1% 5.73ns ± 2% ~ (p=0.579 n=10+10) RuneCount/utf8.RuneCountInString/Japanese-16 22.0ns ± 6% 18.4ns ± 3% -16.41% (p=0.000 n=9+10) RuneCount/utf8.RuneCountInString/MixedLength-16 15.0ns ± 1% 14.9ns ± 1% -1.01% (p=0.004 n=9+10) RuneIterate/range/ASCII-16 2.69ns ± 1% 2.72ns ± 0% +0.94% (p=0.026 n=10+9) RuneIterate/range/Japanese-16 24.5ns ± 2% 25.3ns ± 2% +3.23% (p=0.000 n=9+10) RuneIterate/range/MixedLength-16 17.0ns ± 1% 17.1ns ± 1% +0.85% (p=0.000 n=10+10) RuneIterate/range1/ASCII-16 2.70ns ± 1% 2.72ns ± 0% ~ (p=0.058 n=9+9) RuneIterate/range1/Japanese-16 24.1ns ± 2% 25.2ns ± 3% +4.30% (p=0.000 n=10+10) RuneIterate/range1/MixedLength-16 16.9ns ± 1% 17.7ns ± 0% +5.04% (p=0.000 n=10+8) RuneIterate/range2/ASCII-16 2.84ns ± 8% 2.72ns ± 1% -4.28% (p=0.003 n=10+9) RuneIterate/range2/Japanese-16 22.7ns ± 4% 25.2ns ± 3% +10.97% (p=0.000 n=10+10) RuneIterate/range2/MixedLength-16 17.0ns ± 1% 17.2ns ± 0% +0.95% (p=0.000 n=10+10) ArrayEqual-16 0.40ns ± 5% 0.35ns ± 2% -11.83% (p=0.000 n=10+10) Func/Name-16 8.05ns ± 1% 8.09ns ± 1% +0.40% (p=0.025 n=8+10) Func/Entry-16 1.73ns ± 1% 1.66ns ± 1% -3.93% (p=0.000 n=10+10) Func/FileLine-16 27.5ns ± 2% 26.0ns ± 0% -5.50% (p=0.000 n=10+10) [Geo mean] 16.7ns 15.7ns -6.08% name old speed new speed delta SetTypePtr-16 11.0GB/s ± 1% 11.0GB/s ± 3% ~ (p=0.684 n=10+10) SetTypePtr8-16 15.5GB/s ± 0% 15.5GB/s ± 0% ~ (p=0.123 n=10+10) SetTypePtr16-16 31.0GB/s ± 1% 31.1GB/s ± 0% ~ (p=0.123 n=10+10) SetTypePtr32-16 62.1GB/s ± 0% 62.2GB/s ± 0% ~ (p=0.123 n=10+10) SetTypePtr64-16 124GB/s ± 0% 124GB/s ± 0% ~ (p=0.684 n=10+10) SetTypePtr126-16 146GB/s ± 0% 146GB/s ± 0% ~ (p=0.481 n=10+10) SetTypePtr128-16 154GB/s ± 0% 154GB/s ± 0% ~ (p=0.243 n=9+10) SetTypePtrSlice-16 151GB/s ± 1% 151GB/s ± 1% ~ (p=0.497 n=9+10) SetTypeNode1-16 5.82GB/s ± 1% 5.82GB/s ± 0% ~ (p=0.353 n=10+10) SetTypeNode1Slice-16 76.1GB/s ± 1% 77.0GB/s ± 1% +1.19% (p=0.000 n=10+10) SetTypeNode8-16 19.4GB/s ± 0% 19.4GB/s ± 0% ~ (p=0.130 n=8+8) SetTypeNode8Slice-16 113GB/s ± 0% 113GB/s ± 0% ~ (p=0.604 n=10+9) SetTypeNode64-16 76.5GB/s ± 0% 76.5GB/s ± 0% ~ (p=0.190 n=10+10) SetTypeNode64Slice-16 97.8GB/s ± 0% 97.7GB/s ± 0% ~ (p=0.549 n=9+10) SetTypeNode64Dead-16 95.5GB/s ± 0% 95.7GB/s ± 0% ~ (p=0.118 n=10+6) SetTypeNode64DeadSlice-16 112GB/s ± 0% 112GB/s ± 0% ~ (p=0.353 n=10+10) SetTypeNode124-16 146GB/s ± 0% 146GB/s ± 0% ~ (p=0.853 n=10+10) SetTypeNode124Slice-16 146GB/s ± 5% 149GB/s ± 0% ~ (p=0.315 n=10+10) SetTypeNode126-16 154GB/s ± 0% 154GB/s ± 0% ~ (p=0.356 n=10+9) SetTypeNode126Slice-16 150GB/s ± 0% 150GB/s ± 0% ~ (p=0.095 n=9+10) SetTypeNode128-16 107GB/s ± 0% 107GB/s ± 0% +0.31% (p=0.003 n=9+10) SetTypeNode128Slice-16 119GB/s ± 0% 120GB/s ± 0% ~ (p=0.156 n=10+9) SetTypeNode130-16 108GB/s ± 0% 108GB/s ± 0% +0.33% (p=0.002 n=10+10) SetTypeNode130Slice-16 119GB/s ± 0% 119GB/s ± 0% ~ (p=0.739 n=10+10) SetTypeNode1024-16 160GB/s ± 0% 159GB/s ± 1% ~ (p=0.113 n=9+9) SetTypeNode1024Slice-16 144GB/s ± 0% 144GB/s ± 0% ~ (p=0.063 n=10+10) Hash5-16 2.59GB/s ± 1% 2.49GB/s ± 0% -3.90% (p=0.000 n=10+9) Hash16-16 7.85GB/s ± 1% 7.23GB/s ± 1% -7.92% (p=0.000 n=10+10) Hash64-16 24.0GB/s ± 0% 23.9GB/s ± 0% ~ (p=0.190 n=9+9) Hash1024-16 62.4GB/s ± 0% 62.3GB/s ± 0% -0.16% (p=0.017 n=9+10) Hash65536-16 74.0GB/s ± 0% 74.0GB/s ± 0% ~ (p=0.796 n=10+10) Memmove/1-16 1.08GB/s ± 0% 1.08GB/s ± 0% -0.21% (p=0.035 n=10+10) Memmove/2-16 2.16GB/s ± 0% 2.15GB/s ± 0% ~ (p=0.105 n=10+10) Memmove/3-16 3.24GB/s ± 1% 3.22GB/s ± 1% -0.49% (p=0.004 n=10+10) Memmove/4-16 3.89GB/s ± 0% 3.89GB/s ± 0% ~ (p=0.218 n=10+10) Memmove/5-16 4.42GB/s ± 0% 4.42GB/s ± 0% ~ (p=0.075 n=10+10) Memmove/6-16 5.31GB/s ± 0% 5.29GB/s ± 1% ~ (p=0.218 n=10+10) Memmove/7-16 6.19GB/s ± 0% 6.18GB/s ± 0% -0.15% (p=0.035 n=10+9) Memmove/8-16 7.07GB/s ± 0% 7.07GB/s ± 0% ~ (p=0.684 n=10+10) Memmove/9-16 7.22GB/s ± 0% 6.68GB/s ± 0% -7.37% (p=0.000 n=10+10) Memmove/10-16 8.02GB/s ± 0% 7.43GB/s ± 0% -7.38% (p=0.000 n=9+9) Memmove/11-16 8.83GB/s ± 0% 8.13GB/s ± 0% -7.87% (p=0.000 n=10+9) Memmove/12-16 9.62GB/s ± 0% 8.89GB/s ± 1% -7.61% (p=0.000 n=10+10) Memmove/13-16 10.4GB/s ± 0% 9.7GB/s ± 0% -7.20% (p=0.000 n=10+10) Memmove/14-16 11.2GB/s ± 0% 10.4GB/s ± 1% -7.64% (p=0.000 n=10+9) Memmove/15-16 12.0GB/s ± 0% 11.1GB/s ± 0% -7.46% (p=0.000 n=10+9) Memmove/16-16 12.8GB/s ± 0% 11.8GB/s ± 1% -7.67% (p=0.000 n=10+10) Memmove/32-16 23.8GB/s ± 0% 23.5GB/s ± 1% -1.20% (p=0.000 n=10+10) Memmove/64-16 44.2GB/s ± 0% 39.1GB/s ± 0% -11.56% (p=0.000 n=10+9) Memmove/128-16 68.7GB/s ± 0% 63.2GB/s ± 0% -7.95% (p=0.000 n=10+10) Memmove/256-16 104GB/s ± 0% 103GB/s ± 0% -1.13% (p=0.000 n=10+10) Memmove/512-16 129GB/s ± 1% 129GB/s ± 0% ~ (p=0.165 n=10+10) Memmove/1024-16 174GB/s ± 1% 174GB/s ± 1% ~ (p=0.258 n=9+9) Memmove/2048-16 213GB/s ± 1% 213GB/s ± 2% ~ (p=0.963 n=8+9) Memmove/4096-16 250GB/s ± 1% 240GB/s ± 4% -3.83% (p=0.006 n=9+9) MemmoveOverlap/32-16 19.8GB/s ± 1% 19.1GB/s ± 1% -3.40% (p=0.000 n=10+10) MemmoveOverlap/64-16 39.0GB/s ± 0% 38.8GB/s ± 0% -0.28% (p=0.001 n=9+9) MemmoveOverlap/128-16 62.2GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.063 n=10+10) MemmoveOverlap/256-16 96.0GB/s ± 0% 95.8GB/s ± 0% -0.26% (p=0.009 n=10+10) MemmoveOverlap/512-16 83.6GB/s ±16% 89.2GB/s ± 0% ~ (p=0.696 n=10+8) MemmoveOverlap/1024-16 141GB/s ± 0% 140GB/s ± 0% -0.28% (p=0.006 n=8+10) MemmoveOverlap/2048-16 172GB/s ± 0% 171GB/s ± 1% -0.38% (p=0.008 n=9+9) MemmoveOverlap/4096-16 176GB/s ± 1% 177GB/s ± 1% +0.84% (p=0.001 n=8+10) MemmoveUnalignedDst/1-16 806MB/s ± 0% 802MB/s ± 1% -0.52% (p=0.023 n=10+10) MemmoveUnalignedDst/2-16 1.62GB/s ± 0% 1.62GB/s ± 0% -0.11% (p=0.041 n=10+10) MemmoveUnalignedDst/3-16 2.43GB/s ± 0% 2.43GB/s ± 0% -0.14% (p=0.006 n=9+9) MemmoveUnalignedDst/4-16 3.24GB/s ± 0% 3.23GB/s ± 1% -0.36% (p=0.007 n=10+10) MemmoveUnalignedDst/5-16 3.71GB/s ± 0% 3.71GB/s ± 0% ~ (p=0.063 n=10+10) MemmoveUnalignedDst/6-16 4.48GB/s ± 0% 4.47GB/s ± 0% ~ (p=0.912 n=10+10) MemmoveUnalignedDst/7-16 5.22GB/s ± 0% 5.22GB/s ± 0% ~ (p=1.000 n=10+10) MemmoveUnalignedDst/8-16 5.95GB/s ± 0% 5.93GB/s ± 1% -0.40% (p=0.023 n=10+10) MemmoveUnalignedDst/9-16 6.24GB/s ± 0% 6.24GB/s ± 0% ~ (p=0.912 n=10+10) MemmoveUnalignedDst/10-16 6.94GB/s ± 0% 6.94GB/s ± 0% ~ (p=0.353 n=10+10) MemmoveUnalignedDst/11-16 7.64GB/s ± 0% 7.63GB/s ± 0% ~ (p=0.393 n=10+10) MemmoveUnalignedDst/12-16 8.33GB/s ± 0% 8.33GB/s ± 0% ~ (p=0.971 n=10+10) MemmoveUnalignedDst/13-16 9.02GB/s ± 0% 9.01GB/s ± 0% ~ (p=0.436 n=10+10) MemmoveUnalignedDst/14-16 9.71GB/s ± 0% 9.71GB/s ± 0% ~ (p=0.280 n=10+10) MemmoveUnalignedDst/15-16 10.4GB/s ± 0% 10.4GB/s ± 1% ~ (p=0.853 n=10+10) MemmoveUnalignedDst/16-16 11.1GB/s ± 0% 11.1GB/s ± 0% ~ (p=0.089 n=10+10) MemmoveUnalignedDst/32-16 19.7GB/s ± 1% 19.6GB/s ± 0% ~ (p=0.075 n=10+10) MemmoveUnalignedDst/64-16 38.9GB/s ± 0% 38.8GB/s ± 0% ~ (p=0.218 n=10+10) MemmoveUnalignedDst/128-16 62.1GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.549 n=10+9) MemmoveUnalignedDst/256-16 69.4GB/s ± 0% 69.3GB/s ± 0% ~ (p=0.105 n=10+10) MemmoveUnalignedDst/512-16 124GB/s ± 1% 124GB/s ± 0% ~ (p=0.762 n=10+8) MemmoveUnalignedDst/1024-16 136GB/s ± 0% 136GB/s ± 1% ~ (p=0.666 n=9+9) MemmoveUnalignedDst/2048-16 159GB/s ± 0% 159GB/s ± 0% ~ (p=0.574 n=8+8) MemmoveUnalignedDst/4096-16 161GB/s ± 0% 161GB/s ± 0% ~ (p=1.000 n=9+9) MemmoveUnalignedDstOverlap/32-16 7.84GB/s ± 0% 7.83GB/s ± 0% ~ (p=0.353 n=10+10) MemmoveUnalignedDstOverlap/64-16 14.0GB/s ± 0% 14.0GB/s ± 0% ~ (p=0.661 n=10+9) MemmoveUnalignedDstOverlap/128-16 27.4GB/s ± 0% 27.4GB/s ± 0% ~ (p=0.353 n=10+10) MemmoveUnalignedDstOverlap/256-16 50.4GB/s ± 0% 50.4GB/s ± 0% ~ (p=0.156 n=10+9) MemmoveUnalignedDstOverlap/512-16 60.7GB/s ± 4% 62.5GB/s ± 0% +3.07% (p=0.022 n=10+9) MemmoveUnalignedDstOverlap/1024-16 107GB/s ± 0% 107GB/s ± 0% ~ (p=0.234 n=8+8) MemmoveUnalignedDstOverlap/2048-16 146GB/s ± 0% 146GB/s ± 1% ~ (p=0.182 n=10+9) MemmoveUnalignedDstOverlap/4096-16 155GB/s ± 0% 155GB/s ± 0% ~ (p=0.400 n=10+9) MemmoveUnalignedSrc/1-16 882MB/s ± 0% 884MB/s ± 1% +0.24% (p=0.033 n=10+9) MemmoveUnalignedSrc/2-16 1.76GB/s ± 1% 1.77GB/s ± 0% +0.27% (p=0.028 n=10+9) MemmoveUnalignedSrc/3-16 2.43GB/s ± 0% 2.43GB/s ± 0% +0.26% (p=0.027 n=9+10) MemmoveUnalignedSrc/4-16 3.24GB/s ± 0% 3.24GB/s ± 1% ~ (p=0.079 n=9+10) MemmoveUnalignedSrc/5-16 3.73GB/s ± 0% 3.73GB/s ± 1% ~ (p=0.829 n=8+10) MemmoveUnalignedSrc/6-16 4.47GB/s ± 0% 4.49GB/s ± 0% +0.39% (p=0.000 n=10+10) MemmoveUnalignedSrc/7-16 5.22GB/s ± 0% 5.23GB/s ± 0% ~ (p=0.280 n=10+10) MemmoveUnalignedSrc/8-16 5.95GB/s ± 0% 5.98GB/s ± 0% +0.39% (p=0.001 n=10+9) MemmoveUnalignedSrc/9-16 6.24GB/s ± 0% 6.25GB/s ± 0% ~ (p=0.549 n=10+9) MemmoveUnalignedSrc/10-16 6.93GB/s ± 0% 6.94GB/s ± 0% ~ (p=0.604 n=10+9) MemmoveUnalignedSrc/11-16 7.63GB/s ± 0% 7.63GB/s ± 1% ~ (p=0.353 n=10+10) MemmoveUnalignedSrc/12-16 8.32GB/s ± 0% 8.32GB/s ± 0% ~ (p=0.218 n=10+10) MemmoveUnalignedSrc/13-16 9.02GB/s ± 0% 9.00GB/s ± 1% ~ (p=0.684 n=10+10) MemmoveUnalignedSrc/14-16 9.71GB/s ± 0% 9.71GB/s ± 0% ~ (p=0.739 n=10+10) MemmoveUnalignedSrc/15-16 10.4GB/s ± 0% 10.4GB/s ± 0% ~ (p=0.353 n=10+10) MemmoveUnalignedSrc/16-16 11.1GB/s ± 1% 11.1GB/s ± 0% ~ (p=0.579 n=10+10) MemmoveUnalignedSrc/32-16 20.0GB/s ± 1% 20.0GB/s ± 0% ~ (p=0.631 n=10+10) MemmoveUnalignedSrc/64-16 38.8GB/s ± 0% 38.8GB/s ± 0% ~ (p=0.579 n=10+10) MemmoveUnalignedSrc/128-16 61.2GB/s ± 0% 61.2GB/s ± 0% ~ (p=0.780 n=10+9) MemmoveUnalignedSrc/256-16 94.8GB/s ± 0% 92.2GB/s ± 1% -2.73% (p=0.000 n=10+10) MemmoveUnalignedSrc/512-16 119GB/s ± 0% 119GB/s ± 0% +0.26% (p=0.027 n=8+9) MemmoveUnalignedSrc/1024-16 141GB/s ± 0% 142GB/s ± 1% +1.07% (p=0.000 n=8+10) MemmoveUnalignedSrc/2048-16 157GB/s ± 0% 157GB/s ± 0% ~ (p=0.167 n=9+8) MemmoveUnalignedSrc/4096-16 161GB/s ± 0% 162GB/s ± 1% ~ (p=0.063 n=10+10) MemmoveUnalignedSrcOverlap/32-16 7.93GB/s ± 0% 7.88GB/s ± 0% -0.63% (p=0.000 n=9+10) MemmoveUnalignedSrcOverlap/64-16 15.5GB/s ± 0% 15.5GB/s ± 0% ~ (p=0.529 n=10+10) MemmoveUnalignedSrcOverlap/128-16 28.3GB/s ± 0% 28.3GB/s ± 0% ~ (p=0.218 n=10+10) MemmoveUnalignedSrcOverlap/256-16 41.5GB/s ± 0% 41.6GB/s ± 0% +0.35% (p=0.000 n=10+9) MemmoveUnalignedSrcOverlap/512-16 68.9GB/s ± 0% 68.8GB/s ± 0% ~ (p=0.541 n=9+8) MemmoveUnalignedSrcOverlap/1024-16 115GB/s ± 0% 115GB/s ± 0% ~ (p=0.382 n=8+8) MemmoveUnalignedSrcOverlap/2048-16 155GB/s ± 0% 144GB/s ±18% ~ (p=0.101 n=8+10) MemmoveUnalignedSrcOverlap/4096-16 160GB/s ± 0% 160GB/s ± 1% ~ (p=0.605 n=9+9) Memclr/5-16 5.81GB/s ± 1% 5.80GB/s ± 2% ~ (p=0.546 n=9+9) Memclr/16-16 15.5GB/s ± 0% 15.4GB/s ± 0% -0.32% (p=0.008 n=9+10) Memclr/64-16 51.8GB/s ± 0% 50.7GB/s ± 0% -2.22% (p=0.000 n=10+10) Memclr/256-16 113GB/s ± 0% 113GB/s ± 0% ~ (p=0.143 n=10+10) Memclr/4096-16 239GB/s ± 1% 237GB/s ± 0% -0.87% (p=0.000 n=10+10) Memclr/65536-16 79.8GB/s ± 0% 79.7GB/s ± 0% ~ (p=0.529 n=10+10) Memclr/1M-16 74.6GB/s ± 1% 74.7GB/s ± 1% ~ (p=0.529 n=10+10) Memclr/4M-16 48.7GB/s ± 1% 48.8GB/s ± 0% ~ (p=0.123 n=10+10) Memclr/8M-16 48.2GB/s ± 2% 48.6GB/s ± 0% ~ (p=0.408 n=10+8) Memclr/16M-16 43.6GB/s ± 4% 43.3GB/s ± 0% ~ (p=0.173 n=10+8) Memclr/64M-16 30.7GB/s ± 0% 30.7GB/s ± 0% ~ (p=0.113 n=10+9) GoMemclr/5-16 6.07GB/s ± 0% 6.08GB/s ± 0% ~ (p=0.367 n=9+10) GoMemclr/16-16 15.6GB/s ± 0% 15.6GB/s ± 0% -0.22% (p=0.004 n=9+9) GoMemclr/64-16 56.1GB/s ± 0% 56.1GB/s ± 0% ~ (p=0.968 n=10+9) GoMemclr/256-16 125GB/s ± 0% 124GB/s ± 0% ~ (p=0.912 n=10+10) MemclrRange/1K_2K-16 210GB/s ± 0% 224GB/s ± 1% +6.81% (p=0.000 n=10+10) MemclrRange/2K_8K-16 228GB/s ± 0% 228GB/s ± 0% ~ (p=0.684 n=10+10) MemclrRange/4K_16K-16 279GB/s ± 0% 279GB/s ± 0% ~ (p=0.780 n=9+10) MemclrRange/160K_228K-16 80.3GB/s ± 0% 80.2GB/s ± 0% ~ (p=0.165 n=10+10) Copy/1Byte-16 808MB/s ± 1% 810MB/s ± 0% +0.28% (p=0.000 n=10+8) Copy/1String-16 810MB/s ± 0% 811MB/s ± 0% ~ (p=0.105 n=10+10) Copy/2Byte-16 1.62GB/s ± 0% 1.62GB/s ± 0% ~ (p=0.182 n=10+9) Copy/2String-16 1.62GB/s ± 1% 1.62GB/s ± 1% ~ (p=1.000 n=10+10) Copy/4Byte-16 3.22GB/s ± 0% 3.24GB/s ± 0% +0.46% (p=0.000 n=10+10) Copy/4String-16 3.24GB/s ± 0% 3.24GB/s ± 0% ~ (p=0.075 n=10+10) Copy/8Byte-16 5.86GB/s ± 0% 5.96GB/s ± 0% +1.82% (p=0.000 n=9+9) Copy/8String-16 5.95GB/s ± 0% 5.99GB/s ± 0% +0.59% (p=0.000 n=9+10) Copy/12Byte-16 8.32GB/s ± 0% 8.32GB/s ± 0% ~ (p=0.190 n=9+9) Copy/12String-16 8.31GB/s ± 0% 8.29GB/s ± 0% ~ (p=0.068 n=9+10) Copy/16Byte-16 11.1GB/s ± 0% 11.1GB/s ± 0% +0.18% (p=0.003 n=10+9) Copy/16String-16 11.1GB/s ± 0% 11.1GB/s ± 0% -0.31% (p=0.009 n=10+10) Copy/32Byte-16 19.6GB/s ± 1% 19.8GB/s ± 1% +0.72% (p=0.002 n=10+10) Copy/32String-16 20.0GB/s ± 0% 19.5GB/s ± 0% -2.19% (p=0.000 n=10+10) Copy/128Byte-16 62.2GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.661 n=9+10) Copy/128String-16 61.9GB/s ± 0% 61.7GB/s ± 0% -0.35% (p=0.005 n=10+10) Copy/1024Byte-16 169GB/s ± 2% 171GB/s ± 1% +1.21% (p=0.000 n=9+10) Copy/1024String-16 169GB/s ± 0% 172GB/s ± 1% +1.57% (p=0.000 n=10+9) CompareStringBigUnaligned-16 43.8GB/s ± 1% 43.7GB/s ± 1% ~ (p=0.370 n=9+8) CompareStringBig-16 47.6GB/s ± 3% 47.3GB/s ± 3% ~ (p=0.243 n=9+10) [Geo mean] 25.3GB/s 28.0GB/s +10.66% name old p50-ns new p50-ns delta ReadMemStatsLatency-16 98.4k ±37% 112.7k ±64% ~ (p=0.436 n=10+10) ReadMetricsLatency-16 1.69k ± 3% 1.70k ± 2% ~ (p=0.646 n=9+10) GoroutineProfile/small-nil/idle-16 3.75k ± 4% 3.72k ± 1% ~ (p=0.447 n=10+9) GoroutineProfile/small-nil/loaded-16 4.33k ± 3% 4.31k ± 5% ~ (p=0.931 n=9+9) GoroutineProfile/small/idle-16 102k ± 3% 101k ± 4% ~ (p=0.113 n=9+9) GoroutineProfile/small/loaded-16 214k ± 3% 215k ± 6% ~ (p=0.842 n=9+10) GoroutineProfile/large-nil/idle-16 3.70k ± 2% 3.65k ± 2% ~ (p=0.075 n=10+10) GoroutineProfile/large-nil/loaded-16 4.36k ± 3% 4.31k ±10% ~ (p=0.631 n=10+10) GoroutineProfile/large/idle-16 2.56M ± 1% 2.51M ± 1% -2.28% (p=0.000 n=10+10) GoroutineProfile/large/loaded-16 6.77M ± 4% 6.85M ±19% ~ (p=0.536 n=7+10) GoroutineProfile/sparse-nil/idle-16 3.66k ± 1% 3.64k ± 2% ~ (p=0.136 n=9+9) GoroutineProfile/sparse-nil/loaded-16 4.25k ± 5% 4.15k ± 4% ~ (p=0.190 n=10+10) GoroutineProfile/sparse/idle-16 102k ± 4% 101k ± 3% ~ (p=0.447 n=10+9) GoroutineProfile/sparse/loaded-16 216k ± 4% 218k ± 4% ~ (p=0.549 n=9+10) [Geo mean] 35.8k 35.9k +0.35% name old p90-ns new p90-ns delta ReadMemStatsLatency-16 983k ±310% 200k ±34% -79.62% (p=0.034 n=10+8) ReadMetricsLatency-16 4.01k ±35% 3.75k ±17% ~ (p=0.315 n=10+10) GoroutineProfile/small-nil/idle-16 4.21k ± 4% 4.27k ± 8% ~ (p=0.968 n=9+10) GoroutineProfile/small-nil/loaded-16 5.58k ± 8% 5.35k ±12% ~ (p=0.190 n=10+10) GoroutineProfile/small/idle-16 108k ± 6% 107k ± 7% ~ (p=0.497 n=9+10) GoroutineProfile/small/loaded-16 450k ± 5% 432k ± 3% -3.92% (p=0.002 n=9+9) GoroutineProfile/large-nil/idle-16 4.13k ± 7% 4.04k ± 2% ~ (p=0.181 n=10+8) GoroutineProfile/large-nil/loaded-16 5.76k ± 4% 5.67k ± 5% ~ (p=0.190 n=10+10) GoroutineProfile/large/idle-16 2.63M ± 2% 2.58M ± 1% -1.97% (p=0.000 n=10+10) GoroutineProfile/large/loaded-16 16.9M ± 4% 17.0M ± 6% ~ (p=0.661 n=9+10) GoroutineProfile/sparse-nil/idle-16 4.21k ±10% 4.07k ± 7% ~ (p=0.128 n=10+10) GoroutineProfile/sparse-nil/loaded-16 5.55k ± 8% 5.38k ± 6% ~ (p=0.089 n=10+10) GoroutineProfile/sparse/idle-16 106k ± 4% 106k ± 3% ~ (p=0.661 n=10+9) GoroutineProfile/sparse/loaded-16 454k ± 6% 441k ± 6% -2.86% (p=0.043 n=10+10) [Geo mean] 58.4k 51.0k -12.61% name old p99-ns new p99-ns delta ReadMemStatsLatency-16 983k ±310% 200k ±34% -79.62% (p=0.034 n=10+8) ReadMetricsLatency-16 26.6k ±22% 26.6k ±17% ~ (p=0.971 n=10+10) GoroutineProfile/small-nil/idle-16 5.27k ±12% 5.19k ±12% ~ (p=0.579 n=10+10) GoroutineProfile/small-nil/loaded-16 7.28k ± 3% 7.04k ± 9% ~ (p=0.113 n=9+10) GoroutineProfile/small/idle-16 114k ± 6% 113k ± 6% ~ (p=0.604 n=9+10) GoroutineProfile/small/loaded-16 4.84M ±70% 6.06M ±131% ~ (p=0.842 n=9+10) GoroutineProfile/large-nil/idle-16 5.38k ±19% 5.26k ±13% ~ (p=0.912 n=10+10) GoroutineProfile/large-nil/loaded-16 7.38k ± 3% 7.23k ± 4% ~ (p=0.143 n=10+10) GoroutineProfile/large/idle-16 2.79M ± 5% 2.72M ± 3% ~ (p=0.089 n=10+10) GoroutineProfile/large/loaded-16 24.0M ±24% 24.9M ±29% ~ (p=0.684 n=10+10) GoroutineProfile/sparse-nil/idle-16 5.32k ±17% 5.49k ±17% ~ (p=0.684 n=10+10) GoroutineProfile/sparse-nil/loaded-16 7.25k ± 4% 6.97k ± 5% -3.90% (p=0.005 n=9+10) GoroutineProfile/sparse/idle-16 113k ± 5% 112k ± 6% ~ (p=0.631 n=10+10) GoroutineProfile/sparse/loaded-16 4.00M ±66% 4.26M ±65% ~ (p=0.489 n=9+9) [Geo mean] 107k 97k -9.55% name old alloc/op new alloc/op delta NewEmptyMap-16 0.00B 0.00B ~ (all equal) NewSmallMap-16 0.00B 0.00B ~ (all equal) MapPopulate/1-16 0.00B 0.00B ~ (all equal) MapPopulate/10-16 179B ± 0% 179B ± 0% ~ (all equal) MapPopulate/100-16 3.35kB ± 0% 3.35kB ± 0% ~ (p=0.294 n=10+8) MapPopulate/1000-16 53.3kB ± 0% 53.3kB ± 0% ~ (p=1.000 n=8+10) MapPopulate/10000-16 428kB ± 0% 428kB ± 0% ~ (p=0.469 n=10+10) MapPopulate/100000-16 3.62MB ± 0% 3.62MB ± 0% ~ (p=0.888 n=9+10) MapStringConversion/32/simple-16 0.00B 0.00B ~ (all equal) MapStringConversion/32/struct-16 0.00B 0.00B ~ (all equal) MapStringConversion/32/array-16 0.00B 0.00B ~ (all equal) MapStringConversion/64/simple-16 0.00B 0.00B ~ (all equal) MapStringConversion/64/struct-16 0.00B 0.00B ~ (all equal) MapStringConversion/64/array-16 0.00B 0.00B ~ (all equal) NewEmptyMapHintLessThan8-16 0.00B 0.00B ~ (all equal) NewEmptyMapHintGreaterThan8-16 1.15kB ± 0% 1.15kB ± 0% ~ (all equal) MapAppendAssign/Int32/256-16 41.7B ±15% 44.3B ±12% ~ (p=0.106 n=10+10) MapAppendAssign/Int32/65536-16 22.6B ± 6% 23.5B ± 6% +4.19% (p=0.025 n=9+10) MapAppendAssign/Int64/256-16 43.5B ±10% 42.9B ± 7% ~ (p=0.757 n=10+10) MapAppendAssign/Int64/65536-16 24.7B ± 7% 21.8B ± 6% -11.74% (p=0.000 n=10+10) MapAppendAssign/Str/256-16 87.6B ±10% 89.2B ± 9% ~ (p=0.379 n=10+10) MapAppendAssign/Str/65536-16 45.1B ±14% 47.2B ± 8% ~ (p=0.150 n=10+9) CreateGoroutinesCapture-16 144B ± 0% 144B ± 0% ~ (all equal) [Geo mean] 769B 770B +0.21% name old allocs/op new allocs/op delta NewEmptyMap-16 0.00 0.00 ~ (all equal) NewSmallMap-16 0.00 0.00 ~ (all equal) MapPopulate/1-16 0.00 0.00 ~ (all equal) MapPopulate/10-16 1.00 ± 0% 1.00 ± 0% ~ (all equal) MapPopulate/100-16 17.0 ± 0% 17.0 ± 0% ~ (all equal) MapPopulate/1000-16 73.0 ± 0% 73.0 ± 0% ~ (all equal) MapPopulate/10000-16 320 ± 0% 320 ± 0% ~ (p=1.000 n=10+10) MapPopulate/100000-16 4.00k ± 0% 4.00k ± 0% ~ (p=0.753 n=10+10) MapStringConversion/32/simple-16 0.00 0.00 ~ (all equal) MapStringConversion/32/struct-16 0.00 0.00 ~ (all equal) MapStringConversion/32/array-16 0.00 0.00 ~ (all equal) MapStringConversion/64/simple-16 0.00 0.00 ~ (all equal) MapStringConversion/64/struct-16 0.00 0.00 ~ (all equal) MapStringConversion/64/array-16 0.00 0.00 ~ (all equal) NewEmptyMapHintLessThan8-16 0.00 0.00 ~ (all equal) NewEmptyMapHintGreaterThan8-16 1.00 ± 0% 1.00 ± 0% ~ (all equal) MapAppendAssign/Int32/256-16 0.00 0.00 ~ (all equal) MapAppendAssign/Int32/65536-16 0.00 0.00 ~ (all equal) MapAppendAssign/Int64/256-16 0.00 0.00 ~ (all equal) MapAppendAssign/Int64/65536-16 0.00 0.00 ~ (all equal) MapAppendAssign/Str/256-16 0.00 0.00 ~ (all equal) MapAppendAssign/Str/65536-16 0.00 0.00 ~ (all equal) CreateGoroutinesCapture-16 5.00 ± 0% 5.00 ± 0% ~ (all equal) [Geo mean] 26.0 26.0 +0.00% Change-Id: I5fb03e93df8b380e04795afbdcd1c94aeeecacc6 Reviewed-on: https://go-review.googlesource.com/c/go/+/454255 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Jakub Ciolek <jakub@ciolek.dev> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
Paul E. Murphy
|
1540531746 |
test/codegen: merge identical ppc64 and ppc64le tests
Manually consolidate the remaining ppc64/ppc64le test which are not so trivial to automatically merge. The remaining ppc64le tests are limited to cases where load/stores are merged (this only happens on ppc64le) and the race detector (only supported on ppc64le). Change-Id: I1f9c0f3d3ddbb7fbbd8c81fbbd6537394fba63ce Reviewed-on: https://go-review.googlesource.com/c/go/+/463217 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
||
Paul E. Murphy
|
0301c6c351 |
test/codegen: combine trivial PPC64 tests into ppc64x
Use a small python script to consolidate duplicate ppc64/ppc64le tests into a single ppc64x codegen test. This makes small assumption that anytime two tests with for different arch/variant combos exists, those tests can be combined into a single ppc64x test. E.x: // ppc64le: foo // ppc64le/power9: foo into // ppc64x: foo or // ppc64: foo // ppc64le: foo into // ppc64x: foo import glob import re files = glob.glob("codegen/*.go") for file in files: with open(file) as f: text = [l for l in f] i = 0 while i < len(text): first = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[i]) if first: j = i+1 while j < len(text): second = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[j]) if not second: break if (not first.group(2) or first.group(2) == second.group(2)) and first.group(3) == second.group(3): text[i] = re.sub(" ?ppc64(le|x)?"," ppc64x",text[i]) text=text[:j] + (text[j+1:]) else: j += 1 i+=1 with open(file, 'w') as f: f.write("".join(text)) Change-Id: Ic6b009b54eacaadc5a23db9c5a3bf7331b595821 Reviewed-on: https://go-review.googlesource.com/c/go/+/463220 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
Paul E. Murphy
|
a37672bb7b |
test/codegen: accept ppc64x as alias for ppc64le and ppc64 arches
This helps simplify the noise when adding ppc codegen tests. ppc64x is used in other places to indicate something which runs on either endian. This helps cleanup existing codegen tests which are mostly identical between endian variants. condmove tests are converted as an example. Change-Id: I2b2d98a9a1859015f62db38d62d9d5d7593435b4 Reviewed-on: https://go-review.googlesource.com/c/go/+/462895 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Paul Murphy <murp@ibm.com> |
||
David Chase
|
e22bd2348c |
internal/abi,runtime: refactor map constants into one place
Previously TryBot-tested with bucket bits = 4. Also tested locally with bucket bits = 5. This makes it much easier to change the size of map buckets, and hopefully provides pointers to all the code that in some way depends on details of map layout. Change-Id: I9f6669d1eadd02f182d0bc3f959dc5f385fa1683 Reviewed-on: https://go-review.googlesource.com/c/go/+/462115 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Austin Clements <austin@google.com> |
||
Jorropo
|
5c67ebbb31 |
cmd/compile: AMD64v3 remove unnecessary TEST comparision in isPowerOfTwo
With GOAMD64=V3 the canonical isPowerOfTwo function: func isPowerOfTwo(x uintptr) bool { return x&(x-1) == 0 } Used to compile to: temp := BLSR(x) // x&(x-1) flags = TEST(temp, temp) return flags.zf However the blsr instruction already set ZF according to the result. So we can remove the TEST instruction if we are just checking ZF. Such as in multiple pieces of code around memory allocations. This make the code smaller and faster. Change-Id: Ia12d5a73aa3cb49188c0b647b1eff7b56c5a7b58 Reviewed-on: https://go-review.googlesource.com/c/go/+/448255 Run-TryBot: Jakub Ciolek <jakub@ciolek.dev> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> |
||
Jorropo
|
fc814056aa |
cmd/compile: rewrite empty makeslice to zerobase pointer
make\(\[\][a-zA-Z0-9]+, 0\) is seen 52 times in the go source. And at least 391 times on internet: https://grep.app/search?q=make%5C%28%5C%5B%5C%5D%5Ba-zA-Z0-9%5D%2B%2C%200%5C%29®exp=true This used to compile to calling runtime.makeslice. However we can copy what we do for []T{}, just use a zerobase pointer. On my machine this is 10x faster (from 3ns to 0.3ns). Note that an empty loop also runs in 0.3ns, so this really is free when you count superscallar execution. Change-Id: I1cfe7e69f5a7a4dabbc71912ce6a4f8a2d4a7f3c Reviewed-on: https://go-review.googlesource.com/c/go/+/454036 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Jakub Ciolek <jakub@ciolek.dev> |
||
Keith Randall
|
f959fb3872 |
cmd/compile: add anchored version of SP
The SPanchored opcode is identical to SP, except that it takes a memory argument so that it (and more importantly, anything that uses it) must be scheduled at or after that memory argument. This opcode ensures that a LEAQ of a variable gets scheduled after the corresponding VARDEF for that variable. This may lead to less CSE of LEAQ operations. The effect is very small. The go binary is only 80 bytes bigger after this CL. Usually LEAQs get folded into load/store operations, so the effect is only for pointerful types, large enough to need a duffzero, and have their address passed somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because the two uses are on different sides of a function call and the LEAQ ends up being rematerialized at the second use anyway. Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293 Reviewed-on: https://go-review.googlesource.com/c/go/+/452916 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org> Reviewed-by: Keith Randall <khr@google.com> |