mirror of https://github.com/golang/go synced 2024-09-29 04:24:36 -06:00
416 Commits

Author SHA1 Message Date
Meng Zhuo
5c1a15df41 test/codegen: enable Mul2 DivPow2 test for riscv64
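As a rough illustration of what the Mul2 and DivPow2 codegen tests exercise: multiplying by 2 and dividing by a power of two can be replaced by cheaper operations without changing the result. The sketch below only checks the arithmetic identities in plain Go; the function bodies are assumptions modeled on the test names, not copies of the test file.

```go
package main

import "fmt"

// Mul2 doubles f. f*2 and f+f give the same IEEE 754 result, so a compiler
// may emit a floating-point add instead of a multiply.
func Mul2(f float64) float64 {
	return f * 2.0
}

// DivPow2 divides by 16. Because 1/16 is exactly representable, the division
// can be replaced by a multiplication by 0.0625.
func DivPow2(f float64) float64 {
	return f / 16.0
}

func main() {
	for _, f := range []float64{0.1, 3.5, -7.25, 1e300} {
		fmt.Println(Mul2(f) == f+f, DivPow2(f) == f*0.0625)
	}
}
```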
Change-Id: Ice0bb7a665599b334e927a1b00d1a5b400c15e3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/506035
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2023-07-04 13:33:45 +00:00
Meng Zhuo
b7e7467865 test/codegen: add fsqrt test for riscv64
Add FSQRTD FSQRTS codegen tests for riscv64
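A minimal sketch of what such a codegen test might look like. The comment lines follow the test/codegen convention of naming an architecture and an expected instruction pattern; the function names `sqrt64` and `sqrt32` are hypothetical, and when built as an ordinary program the annotations are plain comments.

```go
package main

import (
	"fmt"
	"math"
)

// sqrt64 is expected to compile to a double-precision square root.
func sqrt64(x float64) float64 {
	// riscv64:"FSQRTD"
	return math.Sqrt(x)
}

// sqrt32 converts to float64 and back; the compiler can narrow this to a
// single-precision square root.
func sqrt32(x float32) float32 {
	// riscv64:"FSQRTS"
	return float32(math.Sqrt(float64(x)))
}

func main() {
	fmt.Println(sqrt64(9), sqrt32(16))
}
```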

Change-Id: I16ca3753ad1ba37afbd9d0f887b078e33f98fda0
Reviewed-on: https://go-review.googlesource.com/c/go/+/503275
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-06-15 15:16:20 +00:00
Keith Randall
c643b29381 cmd/compile: use callsite as line number for argument marshaling
Don't use the line number of the argument itself, as that may be from
arbitrarily earlier in the function.

Fixes #60673

Change-Id: Ifc0a2aaae221a256be3a4b0b2e04849bae4b79d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/502656
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-06-12 20:34:37 +00:00
Austin Clements
c2e0bf0abf cmd/internal/testdir: pass if GOEXPERIMENT=cgocheck2 is set
Some testdir tests fail if GOEXPERIMENT=cgocheck2 is set. Fix this by
skipping these tests.

Change-Id: I58d4ef0cceb86bcf93220b4a44de9b9dc4879b16
Reviewed-on: https://go-review.googlesource.com/c/go/+/499675
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-06-01 18:30:44 +00:00
Bryan Mills
02d234e34d Revert "cmd/compile: sparse conditional constant propagation"
This reverts CL 483875.

Reason for revert: appears to cause internal compiler errors on the ssacheck builder.

Change-Id: I662418384291470c1962c417797a5890dd9aa7a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/497855
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
2023-05-24 14:39:34 +00:00
Junxian Zhu
f0d575c266 cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on mips64x
This CL uses MFC1/MTC1 instructions to move data between GPRs and FPRs, instead of using stores and loads to move float/int values.
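For context, these are the bit-reinterpretation functions being optimized. The sketch below only shows their observable behavior in Go; the helper names `bitsOf64` and `bitsOf32` are hypothetical wrappers, and how the compiler lowers them is architecture-specific.

```go
package main

import (
	"fmt"
	"math"
)

// bitsOf64 returns the IEEE 754 bit pattern of f without changing any bits.
func bitsOf64(f float64) uint64 {
	return math.Float64bits(f)
}

// bitsOf32 does the same for single precision.
func bitsOf32(f float32) uint32 {
	return math.Float32bits(f)
}

func main() {
	f := 1.5
	b := bitsOf64(f)
	fmt.Printf("%x\n", b)
	// Converting back yields the original value exactly.
	fmt.Println(math.Float64frombits(b) == f)
	fmt.Println(math.Float32frombits(bitsOf32(1.5)) == 1.5)
}
```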

goos: linux
goarch: mips64le
pkg: math
                      │   oldmath    │              newmath               │
                      │    sec/op    │   sec/op     vs base               │
Acos-4                   258.2n ± 0%   258.2n ± 0%        ~ (p=0.859 n=8)
Acosh-4                  378.7n ± 0%   323.9n ± 0%  -14.47% (p=0.000 n=8)
Asin-4                   255.1n ± 2%   255.5n ± 0%   +0.16% (p=0.002 n=8)
Asinh-4                  407.1n ± 0%   348.7n ± 0%  -14.35% (p=0.000 n=8)
Atan-4                   189.5n ± 0%   189.9n ± 3%        ~ (p=0.205 n=8)
Atanh-4                  355.6n ± 0%   323.4n ± 2%   -9.03% (p=0.000 n=8)
Atan2-4                  284.1n ± 7%   280.1n ± 4%        ~ (p=0.313 n=8)
Cbrt-4                   314.3n ± 0%   236.4n ± 0%  -24.79% (p=0.000 n=8)
Ceil-4                   144.3n ± 3%   139.6n ± 0%        ~ (p=0.069 n=8)
Compare-4               21.100n ± 0%   7.035n ± 0%  -66.66% (p=0.000 n=8)
Compare32-4             20.100n ± 0%   6.030n ± 0%  -70.00% (p=0.000 n=8)
Copysign-4              34.970n ± 0%   6.221n ± 0%  -82.21% (p=0.000 n=8)
Cos-4                    183.4n ± 3%   184.1n ± 5%        ~ (p=0.159 n=8)
Cosh-4                   487.9n ± 2%   419.6n ± 0%  -14.00% (p=0.000 n=8)
Erf-4                    160.6n ± 0%   157.9n ± 0%   -1.68% (p=0.009 n=8)
Erfc-4                   183.7n ± 4%   169.8n ± 0%   -7.54% (p=0.000 n=8)
Erfinv-4                 191.5n ± 4%   183.6n ± 0%   -4.13% (p=0.023 n=8)
Erfcinv-4                192.0n ± 7%   184.3n ± 0%        ~ (p=0.425 n=8)
Exp-4                    398.2n ± 0%   340.1n ± 4%  -14.58% (p=0.000 n=8)
ExpGo-4                  383.3n ± 0%   327.3n ± 0%  -14.62% (p=0.000 n=8)
Expm1-4                  248.7n ± 5%   216.0n ± 0%  -13.11% (p=0.000 n=8)
Exp2-4                   372.8n ± 0%   316.9n ± 3%  -14.98% (p=0.000 n=8)
Exp2Go-4                 374.1n ± 0%   320.5n ± 0%  -14.33% (p=0.000 n=8)
Abs-4                    3.013n ± 0%   3.016n ± 0%   +0.10% (p=0.020 n=8)
Dim-4                    5.021n ± 0%   5.022n ± 0%        ~ (p=0.270 n=8)
Floor-4                  127.5n ± 4%   126.2n ± 3%        ~ (p=0.186 n=8)
Max-4                    72.32n ± 0%   61.33n ± 0%  -15.20% (p=0.000 n=8)
Min-4                    83.33n ± 1%   61.36n ± 0%  -26.37% (p=0.000 n=8)
Mod-4                    690.7n ± 0%   454.5n ± 0%  -34.20% (p=0.000 n=8)
Frexp-4                 116.30n ± 1%   71.80n ± 1%  -38.26% (p=0.000 n=8)
Gamma-4                  389.0n ± 0%   355.9n ± 1%   -8.48% (p=0.000 n=8)
Hypot-4                 102.40n ± 0%   83.90n ± 0%  -18.07% (p=0.000 n=8)
HypotGo-4               105.45n ± 4%   84.82n ± 2%  -19.56% (p=0.000 n=8)
Ilogb-4                  99.13n ± 4%   63.71n ± 2%  -35.73% (p=0.000 n=8)
J0-4                     859.7n ± 0%   854.8n ± 0%   -0.57% (p=0.000 n=8)
J1-4                     873.9n ± 0%   875.7n ± 0%   +0.21% (p=0.007 n=8)
Jn-4                     1.855µ ± 0%   1.867µ ± 0%   +0.65% (p=0.000 n=8)
Ldexp-4                 130.50n ± 2%   64.35n ± 0%  -50.69% (p=0.000 n=8)
Lgamma-4                 208.8n ± 0%   200.9n ± 0%   -3.78% (p=0.000 n=8)
Log-4                    294.1n ± 0%   255.2n ± 3%  -13.22% (p=0.000 n=8)
Logb-4                  105.45n ± 1%   66.81n ± 1%  -36.64% (p=0.000 n=8)
Log1p-4                  268.2n ± 0%   211.3n ± 0%  -21.21% (p=0.000 n=8)
Log10-4                  295.4n ± 0%   255.2n ± 2%  -13.59% (p=0.000 n=8)
Log2-4                   152.9n ± 1%   127.5n ± 0%  -16.61% (p=0.000 n=8)
Modf-4                  103.40n ± 0%   75.36n ± 0%  -27.12% (p=0.000 n=8)
Nextafter32-4           121.20n ± 1%   78.40n ± 0%  -35.31% (p=0.000 n=8)
Nextafter64-4           110.40n ± 1%   64.91n ± 0%  -41.20% (p=0.000 n=8)
PowInt-4                 509.8n ± 1%   369.3n ± 1%  -27.56% (p=0.000 n=8)
PowFrac-4               1189.0n ± 0%   947.8n ± 0%  -20.29% (p=0.000 n=8)
Pow10Pos-4               15.07n ± 0%   15.07n ± 0%        ~ (p=0.733 n=8)
Pow10Neg-4               20.10n ± 0%   20.10n ± 0%        ~ (p=0.576 n=8)
Round-4                  44.22n ± 0%   26.12n ± 0%  -40.92% (p=0.000 n=8)
RoundToEven-4            46.22n ± 0%   27.12n ± 0%  -41.31% (p=0.000 n=8)
Remainder-4              539.0n ± 1%   417.1n ± 1%  -22.62% (p=0.000 n=8)
Signbit-4               17.985n ± 0%   5.694n ± 0%  -68.34% (p=0.000 n=8)
Sin-4                    185.7n ± 5%   172.9n ± 0%   -6.89% (p=0.001 n=8)
Sincos-4                 176.6n ± 0%   200.9n ± 0%  +13.76% (p=0.000 n=8)
Sinh-4                   495.8n ± 0%   435.9n ± 0%  -12.09% (p=0.000 n=8)
SqrtIndirect-4           5.022n ± 0%   5.024n ± 0%        ~ (p=0.083 n=8)
SqrtLatency-4            8.038n ± 0%   8.044n ± 0%        ~ (p=0.524 n=8)
SqrtIndirectLatency-4    8.035n ± 0%   8.039n ± 0%   +0.06% (p=0.017 n=8)
SqrtGoLatency-4          340.1n ± 0%   278.3n ± 0%  -18.19% (p=0.000 n=8)
SqrtPrime-4              5.381µ ± 0%   5.386µ ± 0%        ~ (p=0.662 n=8)
Tan-4                    198.6n ± 1%   183.1n ± 0%   -7.85% (p=0.000 n=8)
Tanh-4                   491.3n ± 1%   440.8n ± 1%  -10.29% (p=0.000 n=8)
Trunc-4                  121.7n ± 0%   121.7n ± 0%        ~ (p=0.769 n=8)
Y0-4                     855.1n ± 0%   859.8n ± 0%   +0.54% (p=0.007 n=8)
Y1-4                     862.3n ± 0%   865.1n ± 0%   +0.32% (p=0.007 n=8)
Yn-4                     1.830µ ± 0%   1.837µ ± 0%   +0.36% (p=0.011 n=8)
Float64bits-4           13.060n ± 0%   3.016n ± 0%  -76.91% (p=0.000 n=8)
Float64frombits-4       13.060n ± 0%   3.018n ± 0%  -76.90% (p=0.000 n=8)
Float32bits-4           13.060n ± 0%   3.016n ± 0%  -76.91% (p=0.000 n=8)
Float32frombits-4       13.070n ± 0%   3.013n ± 0%  -76.94% (p=0.000 n=8)
FMA-4                    446.0n ± 0%   413.1n ± 1%   -7.38% (p=0.000 n=8)
geomean                  143.4n        108.3n       -24.49%

Change-Id: I2067f7a5ae1126ada7ab3fb2083710e8212535e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/493815
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
2023-05-24 03:36:31 +00:00
Yi Yang
fa50248ce6 cmd/compile: sparse conditional constant propagation
Sparse conditional constant propagation can discover optimization opportunities that cannot be found by applying constant folding, constant propagation, and dead code elimination separately.
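A textbook-style sketch of the kind of code where this matters (the function `alwaysOne` is a hypothetical example, not taken from the CL). At the loop header, x merges the initial 1 with the value from the loop body. A pessimistic pass sees the assignment `x = 2` and gives up; an optimistic analysis first assumes that branch is unreachable, finds x is 1, and thereby confirms the branch is never taken.

```go
package main

import "fmt"

// alwaysOne returns 1 for every n: x starts at 1, and the only assignment
// that could change it is guarded by x != 1, which never holds.
func alwaysOne(n int) int {
	x := 1
	for i := 0; i < n; i++ {
		if x != 1 {
			x = 2 // unreachable, but only provable together with x == 1
		}
	}
	return x
}

func main() {
	fmt.Println(alwaysOne(0), alwaysOne(10))
}
```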

Updates #59399

Change-Id: Ia954e906480654a6f0cc065d75b5912f96f36b2e
GitHub-Last-Rev: 90fc02db99
GitHub-Pull-Request: golang/go#59575
Reviewed-on: https://go-review.googlesource.com/c/go/+/483875
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-05-24 02:54:03 +00:00
Cuong Manh Le
35a71dc56d cmd/compile: avoid slicebytetostring call in len(string([]byte))
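A small sketch of the pattern in the subject line, with a hypothetical helper `byteLen`. The length of `string(b)` is always the byte length of `b`, so the conversion itself is not needed to compute it.

```go
package main

import "fmt"

// byteLen uses the len(string([]byte)) pattern; the result equals len(b),
// so no string needs to be materialized to compute it.
func byteLen(b []byte) int {
	return len(string(b))
}

func main() {
	b := []byte("hello")
	fmt.Println(byteLen(b) == len(b))
}
```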
Change-Id: Ie04503e61400a793a6a29a4b58795254deabe472
Reviewed-on: https://go-review.googlesource.com/c/go/+/497276
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-05-23 19:27:38 +00:00
Matthew Dempsky
7f1467ff4d cmd/compile: incorporate inlined function names into closure naming
In Go 1.17, cmd/compile gained the ability to inline calls to
functions that contain function literals (aka "closures"). This was
implemented by duplicating the function literal body and emitting a
second LSym, because in general it might be optimized better than the
original function literal.

However, the second LSym was named simply as any other function
literal appearing literally in the enclosing function would be named.
E.g., if f has a closure "f.funcX", and f is inlined into g, we would
create "g.funcY" (N.B., X and Y need not be the same.). Users then
have no idea this function originally came from f.

With this CL, the inlined call stack is incorporated into the clone
LSym's name: instead of "g.funcY", it's named "g.f.funcY".
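One way to observe closure symbol names, as a sketch: the helper `symbolName` below is hypothetical and uses runtime.FuncForPC. The exact names printed depend on the Go version and on whether f is actually inlined into g, so no output is shown.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// f returns a function literal. If f is inlined into a caller, the compiler
// may emit a clone of the literal under a different symbol name.
func f() func() int {
	return func() int { return 42 }
}

// g calls f and is a candidate for inlining f.
func g() func() int {
	return f()
}

// symbolName reports the symbol name the runtime associates with fn.
func symbolName(fn func() int) string {
	return runtime.FuncForPC(reflect.ValueOf(fn).Pointer()).Name()
}

func main() {
	fmt.Println(symbolName(f()))
	fmt.Println(symbolName(g()))
}
```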

In the future, it seems desirable to arrange for the clone's name to
appear exactly as the original name, so stack traces remain the same
as when -l or -d=inlfuncswithclosures are used. But it's unclear
whether the linker supports that today, or whether any downstream
tooling would be confused by this.

Updates #60324.

Change-Id: Ifad0ccef7e959e72005beeecdfffd872f63982f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/497137
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
2023-05-22 22:47:15 +00:00
Keith Randall
6042a062dc cmd/compile: make memcombine pass a bit more robust to reassociation of exprs
Be more liberal about expanding the OR tree. Handle any tree shape
instead of a fully left or right associative tree.
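To illustrate the tree shapes involved, the two hypothetical functions below compute the same little-endian 32-bit load with differently associated ORs. Both are candidates for being combined into a single load; whether that happens depends on the target and compiler version.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// load32Left ORs the bytes in a fully left-associative tree.
func load32Left(b []byte) uint32 {
	_ = b[3] // one bounds check up front
	return ((uint32(b[0]) | uint32(b[1])<<8) | uint32(b[2])<<16) | uint32(b[3])<<24
}

// load32Balanced ORs the same bytes in a balanced tree.
func load32Balanced(b []byte) uint32 {
	_ = b[3]
	return (uint32(b[0]) | uint32(b[1])<<8) | (uint32(b[2])<<16 | uint32(b[3])<<24)
}

func main() {
	b := []byte{0x78, 0x56, 0x34, 0x12}
	want := binary.LittleEndian.Uint32(b)
	fmt.Println(load32Left(b) == want, load32Balanced(b) == want)
}
```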

Also remove the tail feature; it is never needed.

Change-Id: If16bebef94b952a604d6069e9be3d9129994cb6f
Reviewed-on: https://go-review.googlesource.com/c/go/+/494056
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ryan Berger <ryanbberger@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-05-16 19:13:26 +00:00
Lynn Boger
4481042c43 cmd/compile: update rules to generate more prefixed instructions
This modifies some existing rules to allow more prefixed instructions
to be generated when using GOPPC64=power10. Some rules also check
if PCRel is available, which is currently supported for linux/ppc64le
and linux/ppc64 (internal linking only).

Prior to p10, DS-offset loads and stores had a 16 bit size limit for
the offset field. If the offset of the data for load or store was
beyond this range then an indexed load or store would be selected by
the rules.

In p10 the assembler can generate prefixed instructions in this case,
but does not if an indexed instruction was selected during the lowering
pass.

This allows many more cases to use prefixed loads or stores, reducing
function sizes and improving performance in some cases where the code
change happens in key loops.

For example in strconv BenchmarkAppendQuoteRune before:

  12c5e4:       15 00 10 06     pla     r10,1425660
  12c5e8:       fc c0 40 39
  12c5ec:       00 00 6a e8     ld      r3,0(r10)
  12c5f0:       10 00 aa e8     ld      r5,16(r10)

After this change:

  12a828:       15 00 10 04     pld     r3,1433272
  12a82c:       b8 de 60 e4
  12a830:       15 00 10 04     pld     r5,1433280
  12a834:       c0 de a0 e4

The second sequence performs better.

A testcase was added to verify that the rules select the correct load or
store based on the offset and on whether the target is power10 or earlier.
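A hypothetical sketch of the kind of access such a testcase would cover (the type `big` and both functions are assumptions, not the actual test): a field whose offset does not fit in a signed 16-bit displacement, next to one that does.

```go
package main

import (
	"fmt"
	"unsafe"
)

// big places far beyond the range of a signed 16-bit offset.
type big struct {
	near int64
	pad  [1 << 16]byte
	far  int64
}

// loadNear reads a field at a small offset from p.
func loadNear(p *big) int64 { return p.near }

// loadFar reads a field at a large offset; before power10 this needs an
// indexed access or extra address arithmetic.
func loadFar(p *big) int64 { return p.far }

func main() {
	p := &big{near: 1, far: 2}
	fmt.Println(unsafe.Offsetof(p.far) > 1<<15)
	fmt.Println(loadNear(p), loadFar(p))
}
```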

Change-Id: I4335fed0bd9b8aba8a4f84d69b89f819cc464846
Reviewed-on: https://go-review.googlesource.com/c/go/+/477398
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
2023-05-15 18:20:54 +00:00
Dmitri Shuralyov
7cc4516ac8 internal/testdir: move to cmd/internal/testdir
The effect and motivation is for the test to be selected when doing
'go test cmd' and not when doing 'go test std' since it's primarily
about testing the Go compiler and linker. Other than that, it's run
by all.bash and 'go test std cmd' as before.

For #56844.
Fixes #60059.

Change-Id: I2d499af013f9d9b8761fdf4573f8d27d80c1fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/493876
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-05-12 17:18:08 +00:00
Stefan
95c4f320d5 cmd/compile: add De Morgan's rewrite rule
Adds rules that rewrite expressions such as ~P&~Q as ~(P|Q) and ~P|~Q as ~(P&Q), removing an extraneous instruction.
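In Go source the complement is written `^`, so the two identities can be checked with a short sketch (function names are hypothetical). Each left-hand form uses two complements; each right-hand form uses one.

```go
package main

import "fmt"

// andOfNots computes ^p & ^q, which uses two complement operations.
func andOfNots(p, q uint64) uint64 { return ^p & ^q }

// notOfOr computes the equivalent ^(p | q) with a single complement.
func notOfOr(p, q uint64) uint64 { return ^(p | q) }

// orOfNots computes ^p | ^q, which uses two complement operations.
func orOfNots(p, q uint64) uint64 { return ^p | ^q }

// notOfAnd computes the equivalent ^(p & q) with a single complement.
func notOfAnd(p, q uint64) uint64 { return ^(p & q) }

func main() {
	p, q := uint64(0xF0F0), uint64(0x0FF0)
	fmt.Println(andOfNots(p, q) == notOfOr(p, q))
	fmt.Println(orOfNots(p, q) == notOfAnd(p, q))
}
```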

Change-Id: Icedb97df741680ddf9799df79df78657173aa500
GitHub-Last-Rev: f22e2350c9
GitHub-Pull-Request: golang/go#60018
Reviewed-on: https://go-review.googlesource.com/c/go/+/493175
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Stefan M <st3f4nm4d4@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-05-10 16:32:25 +00:00
Lynn Boger
bc3bdfa977 test: add memcombine testcases for ppc64
Thanks to the recent addition of the memcombine pass, the
ppc64 ports now have the memcombine optimizations. Previously
in PPC64.rules, the memcombine rules were only added for
ppc64le targets due to the significant increase in size of
the rewritePPC64.go file when those rules were added. The
ppc64 and ppc64le rules had to be different because of the
byte order due to endianness differences.

This enables the memcombine tests to be run on ppc64 as well
as ppc64le.

Change-Id: I4081e2d94617a1b66541d536c0c2662e266c9c1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/492615
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-05-08 16:50:23 +00:00
Junxian Zhu
5cad8d41ca math: optimize math.Abs on mipsx
This commit optimizes the implementation of math.Abs on mipsx.
Tested on Loongson 3A2000.
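As background, the absolute value of an IEEE 754 float is the same bit pattern with the sign bit cleared, which is why a single hardware instruction can replace a longer sequence. A portable sketch, with the hypothetical helper `absBits`:

```go
package main

import (
	"fmt"
	"math"
)

// absBits clears the sign bit (bit 63) of x, which is what taking the
// absolute value of a float64 amounts to.
func absBits(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) &^ (1 << 63))
}

func main() {
	for _, x := range []float64{-2.5, 3, math.Inf(-1)} {
		fmt.Println(absBits(x) == math.Abs(x))
	}
}
```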

goos: linux
goarch: mipsle
pkg: math
                      │   oldmath    │              newmath               │
                      │    sec/op    │   sec/op     vs base               │
Acos-4                   282.6n ± 0%   282.3n ± 0%        ~ (p=0.140 n=7)
Acosh-4                  506.1n ± 0%   451.8n ± 0%  -10.73% (p=0.001 n=7)
Asin-4                   272.3n ± 0%   272.2n ± 0%        ~ (p=0.808 n=7)
Asinh-4                  529.7n ± 0%   475.3n ± 0%  -10.27% (p=0.001 n=7)
Atan-4                   208.2n ± 0%   207.9n ± 0%        ~ (p=0.134 n=7)
Atanh-4                  503.4n ± 1%   449.7n ± 0%  -10.67% (p=0.001 n=7)
Atan2-4                  310.5n ± 0%   310.5n ± 0%        ~ (p=0.928 n=7)
Cbrt-4                   359.3n ± 0%   358.8n ± 0%        ~ (p=0.121 n=7)
Ceil-4                   203.9n ± 0%   204.0n ± 0%        ~ (p=0.600 n=7)
Compare-4                23.11n ± 0%   23.11n ± 0%        ~ (p=0.702 n=7)
Compare32-4              19.09n ± 0%   19.12n ± 0%        ~ (p=0.070 n=7)
Copysign-4               33.20n ± 0%   34.02n ± 0%   +2.47% (p=0.001 n=7)
Cos-4                    422.5n ± 0%   385.4n ± 1%   -8.78% (p=0.001 n=7)
Cosh-4                   628.0n ± 0%   545.5n ± 0%  -13.14% (p=0.001 n=7)
Erf-4                    193.7n ± 2%   192.7n ± 1%        ~ (p=0.430 n=7)
Erfc-4                   192.8n ± 1%   193.0n ± 0%        ~ (p=0.245 n=7)
Erfinv-4                 220.7n ± 1%   221.5n ± 2%        ~ (p=0.272 n=7)
Erfcinv-4                221.3n ± 1%   220.4n ± 2%        ~ (p=0.738 n=7)
Exp-4                    471.4n ± 0%   435.1n ± 0%   -7.70% (p=0.001 n=7)
ExpGo-4                  470.6n ± 0%   434.0n ± 0%   -7.78% (p=0.001 n=7)
Expm1-4                  243.1n ± 0%   243.4n ± 0%        ~ (p=0.417 n=7)
Exp2-4                   463.1n ± 0%   427.0n ± 0%   -7.80% (p=0.001 n=7)
Exp2Go-4                 462.4n ± 0%   426.2n ± 5%   -7.83% (p=0.001 n=7)
Abs-4                   37.000n ± 0%   8.039n ± 9%  -78.27% (p=0.001 n=7)
Dim-4                    18.09n ± 0%   18.11n ± 0%        ~ (p=0.094 n=7)
Floor-4                  151.9n ± 0%   151.8n ± 0%        ~ (p=0.190 n=7)
Max-4                    116.7n ± 1%   116.7n ± 1%        ~ (p=0.842 n=7)
Min-4                    116.6n ± 1%   116.6n ± 0%        ~ (p=0.464 n=7)
Mod-4                   1244.0n ± 0%   980.9n ± 0%  -21.15% (p=0.001 n=7)
Frexp-4                  199.0n ± 0%   146.7n ± 0%  -26.28% (p=0.001 n=7)
Gamma-4                  516.4n ± 0%   479.3n ± 1%   -7.18% (p=0.001 n=7)
Hypot-4                  169.8n ± 0%   117.8n ± 2%  -30.62% (p=0.001 n=7)
HypotGo-4                170.8n ± 0%   117.5n ± 0%  -31.21% (p=0.001 n=7)
Ilogb-4                  160.8n ± 0%   109.5n ± 0%  -31.90% (p=0.001 n=7)
J0-4                     1.359µ ± 0%   1.305µ ± 0%   -3.97% (p=0.001 n=7)
J1-4                     1.386µ ± 0%   1.334µ ± 0%   -3.75% (p=0.001 n=7)
Jn-4                     2.864µ ± 0%   2.758µ ± 0%   -3.70% (p=0.001 n=7)
Ldexp-4                  202.9n ± 0%   151.7n ± 0%  -25.23% (p=0.001 n=7)
Lgamma-4                 234.0n ± 0%   234.3n ± 0%        ~ (p=0.199 n=7)
Log-4                    444.1n ± 0%   407.9n ± 0%   -8.15% (p=0.001 n=7)
Logb-4                   157.8n ± 0%   121.6n ± 0%  -22.94% (p=0.001 n=7)
Log1p-4                  354.8n ± 0%   315.4n ± 0%  -11.10% (p=0.001 n=7)
Log10-4                  453.9n ± 0%   417.9n ± 0%   -7.93% (p=0.001 n=7)
Log2-4                   245.3n ± 0%   209.1n ± 0%  -14.76% (p=0.001 n=7)
Modf-4                   126.6n ± 0%   126.6n ± 0%        ~ (p=0.126 n=7)
Nextafter32-4            112.5n ± 0%   112.5n ± 0%        ~ (p=0.853 n=7)
Nextafter64-4            141.7n ± 0%   141.6n ± 0%        ~ (p=0.331 n=7)
PowInt-4                 878.8n ± 1%   758.3n ± 1%  -13.71% (p=0.001 n=7)
PowFrac-4                1.809µ ± 0%   1.615µ ± 0%  -10.72% (p=0.001 n=7)
Pow10Pos-4               18.10n ± 0%   18.12n ± 0%        ~ (p=0.464 n=7)
Pow10Neg-4               17.09n ± 0%   17.09n ± 0%        ~ (p=0.263 n=7)
Round-4                  68.36n ± 0%   68.33n ± 0%        ~ (p=0.325 n=7)
RoundToEven-4            78.40n ± 0%   78.40n ± 0%        ~ (p=0.934 n=7)
Remainder-4              894.0n ± 1%   753.4n ± 1%  -15.73% (p=0.001 n=7)
Signbit-4                18.09n ± 0%   18.09n ± 0%        ~ (p=0.761 n=7)
Sin-4                    389.8n ± 1%   389.8n ± 0%        ~ (p=0.995 n=7)
Sincos-4                 416.0n ± 0%   415.9n ± 0%        ~ (p=0.361 n=7)
Sinh-4                   634.6n ± 4%   585.6n ± 1%   -7.72% (p=0.001 n=7)
SqrtIndirect-4           8.035n ± 0%   8.036n ± 0%        ~ (p=0.523 n=7)
SqrtLatency-4            8.039n ± 0%   8.037n ± 0%        ~ (p=0.218 n=7)
SqrtIndirectLatency-4    8.040n ± 0%   8.040n ± 0%        ~ (p=0.652 n=7)
SqrtGoLatency-4          895.7n ± 0%   896.6n ± 0%   +0.10% (p=0.004 n=7)
SqrtPrime-4              5.406µ ± 0%   5.407µ ± 0%        ~ (p=0.592 n=7)
Tan-4                    406.1n ± 0%   405.8n ± 1%        ~ (p=0.435 n=7)
Tanh-4                   627.6n ± 0%   545.5n ± 0%  -13.08% (p=0.001 n=7)
Trunc-4                  146.7n ± 1%   146.7n ± 0%        ~ (p=0.755 n=7)
Y0-4                     1.359µ ± 0%   1.310µ ± 0%   -3.61% (p=0.001 n=7)
Y1-4                     1.351µ ± 0%   1.301µ ± 0%   -3.70% (p=0.001 n=7)
Yn-4                     2.829µ ± 0%   2.729µ ± 0%   -3.53% (p=0.001 n=7)
Float64bits-4            14.08n ± 0%   14.07n ± 0%        ~ (p=0.069 n=7)
Float64frombits-4        19.09n ± 0%   19.10n ± 0%        ~ (p=0.755 n=7)
Float32bits-4            13.06n ± 0%   13.07n ± 1%        ~ (p=0.586 n=7)
Float32frombits-4        13.06n ± 0%   13.06n ± 0%        ~ (p=0.853 n=7)
FMA-4                    606.9n ± 0%   606.8n ± 0%        ~ (p=0.393 n=7)
geomean                  201.1n        185.4n        -7.81%

Change-Id: I6d41a97ad3789ed5731588588859ac0b8b13b664
Reviewed-on: https://go-review.googlesource.com/c/go/+/484675
Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
2023-05-08 15:53:28 +00:00
Junxian Zhu
574431cfcd math: optimize math.Abs on mips64x
This commit optimizes the implementation of math.Abs on mips64x.
Tested on Loongson 3A2000.

goos: linux
goarch: mips64le
pkg: math
                      │    oldmath    │               newmath               │
                      │    sec/op     │    sec/op     vs base               │
Acos-4                   258.0n ± ∞ ¹   257.1n ± ∞ ¹   -0.35% (p=0.008 n=5)
Acosh-4                  417.0n ± ∞ ¹   377.9n ± ∞ ¹   -9.38% (p=0.008 n=5)
Asin-4                   248.0n ± ∞ ¹   259.9n ± ∞ ¹   +4.80% (p=0.008 n=5)
Asinh-4                  439.6n ± ∞ ¹   408.3n ± ∞ ¹   -7.12% (p=0.008 n=5)
Atan-4                   189.6n ± ∞ ¹   188.8n ± ∞ ¹        ~ (p=0.056 n=5)
Atanh-4                  390.0n ± ∞ ¹   356.4n ± ∞ ¹   -8.62% (p=0.008 n=5)
Atan2-4                  279.0n ± ∞ ¹   263.9n ± ∞ ¹   -5.41% (p=0.008 n=5)
Cbrt-4                   314.2n ± ∞ ¹   322.3n ± ∞ ¹   +2.58% (p=0.008 n=5)
Ceil-4                   139.7n ± ∞ ¹   136.6n ± ∞ ¹   -2.22% (p=0.008 n=5)
Compare-4                21.11n ± ∞ ¹   21.09n ± ∞ ¹        ~ (p=0.405 n=5)
Compare32-4              20.10n ± ∞ ¹   20.12n ± ∞ ¹        ~ (p=0.206 n=5)
Copysign-4               32.17n ± ∞ ¹   35.71n ± ∞ ¹  +11.00% (p=0.008 n=5)
Cos-4                    222.8n ± ∞ ¹   169.8n ± ∞ ¹  -23.79% (p=0.008 n=5)
Cosh-4                   550.2n ± ∞ ¹   477.4n ± ∞ ¹  -13.23% (p=0.008 n=5)
Erf-4                    171.6n ± ∞ ¹   174.5n ± ∞ ¹        ~ (p=0.635 n=5)
Erfc-4                   182.6n ± ∞ ¹   170.2n ± ∞ ¹   -6.79% (p=0.008 n=5)
Erfinv-4                 177.6n ± ∞ ¹   196.6n ± ∞ ¹  +10.70% (p=0.008 n=5)
Erfcinv-4                177.8n ± ∞ ¹   197.8n ± ∞ ¹  +11.25% (p=0.008 n=5)
Exp-4                    422.8n ± ∞ ¹   382.1n ± ∞ ¹   -9.63% (p=0.008 n=5)
ExpGo-4                  416.1n ± ∞ ¹   383.2n ± ∞ ¹   -7.91% (p=0.008 n=5)
Expm1-4                  232.9n ± ∞ ¹   252.2n ± ∞ ¹   +8.29% (p=0.008 n=5)
Exp2-4                   404.8n ± ∞ ¹   389.1n ± ∞ ¹   -3.88% (p=0.008 n=5)
Exp2Go-4                 407.0n ± ∞ ¹   372.3n ± ∞ ¹   -8.53% (p=0.008 n=5)
Abs-4                   30.120n ± ∞ ¹   3.014n ± ∞ ¹  -89.99% (p=0.008 n=5)
Dim-4                    5.021n ± ∞ ¹   5.023n ± ∞ ¹        ~ (p=0.071 n=5)
Floor-4                  127.8n ± ∞ ¹   127.1n ± ∞ ¹   -0.55% (p=0.008 n=5)
Max-4                    77.69n ± ∞ ¹   76.33n ± ∞ ¹   -1.75% (p=0.008 n=5)
Min-4                    83.27n ± ∞ ¹   77.87n ± ∞ ¹   -6.48% (p=0.008 n=5)
Mod-4                    906.2n ± ∞ ¹   692.9n ± ∞ ¹  -23.54% (p=0.008 n=5)
Frexp-4                  150.6n ± ∞ ¹   108.6n ± ∞ ¹  -27.89% (p=0.008 n=5)
Gamma-4                  418.4n ± ∞ ¹   386.1n ± ∞ ¹   -7.72% (p=0.008 n=5)
Hypot-4                 148.20n ± ∞ ¹   93.78n ± ∞ ¹  -36.72% (p=0.008 n=5)
HypotGo-4               148.20n ± ∞ ¹   94.47n ± ∞ ¹  -36.26% (p=0.008 n=5)
Ilogb-4                 135.50n ± ∞ ¹   92.38n ± ∞ ¹  -31.82% (p=0.008 n=5)
J0-4                     937.7n ± ∞ ¹   861.7n ± ∞ ¹   -8.10% (p=0.008 n=5)
J1-4                     915.4n ± ∞ ¹   875.9n ± ∞ ¹   -4.32% (p=0.008 n=5)
Jn-4                     1.974µ ± ∞ ¹   1.863µ ± ∞ ¹   -5.62% (p=0.008 n=5)
Ldexp-4                  158.5n ± ∞ ¹   129.3n ± ∞ ¹  -18.42% (p=0.008 n=5)
Lgamma-4                 209.0n ± ∞ ¹   211.8n ± ∞ ¹        ~ (p=0.095 n=5)
Log-4                    326.4n ± ∞ ¹   295.2n ± ∞ ¹   -9.56% (p=0.008 n=5)
Logb-4                   147.7n ± ∞ ¹   105.0n ± ∞ ¹  -28.91% (p=0.008 n=5)
Log1p-4                  303.4n ± ∞ ¹   266.3n ± ∞ ¹  -12.23% (p=0.008 n=5)
Log10-4                  329.2n ± ∞ ¹   298.3n ± ∞ ¹   -9.39% (p=0.008 n=5)
Log2-4                   187.4n ± ∞ ¹   153.0n ± ∞ ¹  -18.36% (p=0.008 n=5)
Modf-4                   110.5n ± ∞ ¹   103.5n ± ∞ ¹   -6.33% (p=0.008 n=5)
Nextafter32-4            128.4n ± ∞ ¹   121.5n ± ∞ ¹   -5.37% (p=0.016 n=5)
Nextafter64-4            109.5n ± ∞ ¹   110.5n ± ∞ ¹   +0.91% (p=0.008 n=5)
PowInt-4                 603.3n ± ∞ ¹   516.4n ± ∞ ¹  -14.40% (p=0.008 n=5)
PowFrac-4                1.365µ ± ∞ ¹   1.183µ ± ∞ ¹  -13.33% (p=0.008 n=5)
Pow10Pos-4               15.07n ± ∞ ¹   15.07n ± ∞ ¹        ~ (p=0.738 n=5)
Pow10Neg-4               21.11n ± ∞ ¹   21.10n ± ∞ ¹        ~ (p=0.190 n=5)
Round-4                  44.23n ± ∞ ¹   44.22n ± ∞ ¹        ~ (p=0.635 n=5)
RoundToEven-4            50.25n ± ∞ ¹   46.27n ± ∞ ¹   -7.92% (p=0.008 n=5)
Remainder-4              675.6n ± ∞ ¹   530.4n ± ∞ ¹  -21.49% (p=0.008 n=5)
Signbit-4                17.07n ± ∞ ¹   17.95n ± ∞ ¹   +5.16% (p=0.008 n=5)
Sin-4                    171.6n ± ∞ ¹   189.1n ± ∞ ¹  +10.20% (p=0.008 n=5)
Sincos-4                 201.5n ± ∞ ¹   200.5n ± ∞ ¹        ~ (p=0.421 n=5)
Sinh-4                   529.6n ± ∞ ¹   484.6n ± ∞ ¹   -8.50% (p=0.008 n=5)
SqrtIndirect-4           5.021n ± ∞ ¹   5.023n ± ∞ ¹   +0.04% (p=0.048 n=5)
SqrtLatency-4            8.032n ± ∞ ¹   8.039n ± ∞ ¹   +0.09% (p=0.024 n=5)
SqrtIndirectLatency-4    8.036n ± ∞ ¹   8.038n ± ∞ ¹        ~ (p=0.056 n=5)
SqrtGoLatency-4          338.8n ± ∞ ¹   338.7n ± ∞ ¹        ~ (p=0.841 n=5)
SqrtPrime-4              5.379µ ± ∞ ¹   5.382µ ± ∞ ¹   +0.06% (p=0.048 n=5)
Tan-4                    182.7n ± ∞ ¹   191.8n ± ∞ ¹   +4.98% (p=0.008 n=5)
Tanh-4                   558.7n ± ∞ ¹   497.6n ± ∞ ¹  -10.94% (p=0.008 n=5)
Trunc-4                  122.5n ± ∞ ¹   122.6n ± ∞ ¹        ~ (p=0.405 n=5)
Y0-4                     892.8n ± ∞ ¹   851.7n ± ∞ ¹   -4.60% (p=0.008 n=5)
Y1-4                     887.2n ± ∞ ¹   863.2n ± ∞ ¹   -2.71% (p=0.008 n=5)
Yn-4                     1.889µ ± ∞ ¹   1.832µ ± ∞ ¹   -3.02% (p=0.008 n=5)
Float64bits-4            13.05n ± ∞ ¹   13.06n ± ∞ ¹   +0.08% (p=0.040 n=5)
Float64frombits-4        13.05n ± ∞ ¹   13.06n ± ∞ ¹        ~ (p=0.143 n=5)
Float32bits-4            13.05n ± ∞ ¹   13.06n ± ∞ ¹   +0.08% (p=0.008 n=5)
Float32frombits-4        13.05n ± ∞ ¹   13.08n ± ∞ ¹   +0.23% (p=0.016 n=5)
FMA-4                    445.7n ± ∞ ¹   448.1n ± ∞ ¹   +0.54% (p=0.008 n=5)
geomean                  157.2n         142.8n         -9.17%

Change-Id: I9bf104848b588c9ecf79401a81d483d7fcdb0a79
Reviewed-on: https://go-review.googlesource.com/c/go/+/481575
Reviewed-by: M Zhuo <mzh@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Than McIntosh <thanm@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com>
2023-05-05 14:54:39 +00:00
Keith Randall
6b165577fe cmd/compile: remove memequal call from string compares in more cases
Add more rules to ensure that order doesn't matter.

Add memequal 0 rule.

Try to use a constant argument to memequal when one is available.
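A sketch of the comparisons this concerns, with hypothetical function names. Both functions compare a string against a short constant, with the operands in either order; for such cases the compiler can compare the bytes directly instead of calling a runtime helper, though the exact code generated is version- and target-dependent.

```go
package main

import "fmt"

// isGet compares a variable string against a constant on the right.
func isGet(s string) bool { return s == "GET" }

// isGetFlipped is the same comparison with the constant on the left;
// operand order should not affect the generated code.
func isGetFlipped(s string) bool { return "GET" == s }

func main() {
	fmt.Println(isGet("GET"), isGetFlipped("GET"))
	fmt.Println(isGet("PUT"), isGetFlipped("PUT"))
}
```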

Fixes #59684

Change-Id: I36e85ffbd949396ed700ed6e8ec2bc3ae013f5d2
Reviewed-on: https://go-review.googlesource.com/c/go/+/485535
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-18 21:31:33 +00:00
ruinan
9be533a8ee cmd/compile: get more bounds info from logic operators in prove pass
Currently, the prove pass can get knowledge from some specific logic
operators only before the CFG is explored, which means that the bounds
information of the branch will be ignored.

This CL updates the facts table from the logic operators in every
branch. Combined with the branch information, this helps BCE in some
circumstances.
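A hypothetical sketch of code where a logic operator bounds an index inside a branch (the function `lookup` is an assumption, not the CL's test). Masking with 15 keeps the index in [0, 15] for any int, so the accesses to a 16-element array cannot go out of range; whether a given compiler version removes the bounds checks is not claimed here.

```go
package main

import "fmt"

// lookup indexes a 16-entry table with a masked value. The AND with 15
// bounds the index to 0..15 in both branches, even for negative i.
func lookup(tbl *[16]int, i int, useLow bool) int {
	if useLow {
		return tbl[i&15]
	}
	return tbl[(i>>4)&15]
}

func main() {
	var tbl [16]int
	for k := range tbl {
		tbl[k] = k * 10
	}
	fmt.Println(lookup(&tbl, 0x25, true), lookup(&tbl, 0x25, false), lookup(&tbl, -1, true))
}
```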

Fixes #57243

Change-Id: I0bd164f1b47804ccfc37879abe9788740b016fd5
Reviewed-on: https://go-review.googlesource.com/c/go/+/419555
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-04-07 10:09:11 +00:00
erifan01
42f99b203d cmd/compile: optimize cmp to cmn under conditions < and >= on arm64
Under the right conditions we can optimize cmp comparisons to cmn
comparisons, such as:
func foo(a, b int) int {
  var c int
  if a + b < 0 {
  	c = 1
  }
  return c
}

Previously it was compiled as:
  ADD     R1, R0, R1
  CMP     $0, R1
  CSET    LT, R0
With this CL it is compiled as:
  CMN     R1, R0
  CSET    MI, R0
Here we need to pay attention to possible overflow of a+b. The MI
condition means N==1; it ignores the overflow flag V and depends only
on the sign of the result. This matches the semantics of the Go code,
so the rewrite is correct.

Similarly, this CL also optimizes the case of >= comparison
using the PL conditional flag.
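The overflow point can be checked in plain Go. In the sketch below, foo mirrors the function in the commit message and the constants `maxInt` and `minInt` are helpers added for illustration. Go integer addition wraps, so the comparison looks only at the sign of the wrapped sum, which is the behavior the MI condition reproduces.

```go
package main

import "fmt"

const (
	maxInt = int(^uint(0) >> 1) // largest int value
	minInt = -maxInt - 1        // smallest int value
)

// foo returns 1 when the wrapped sum a+b is negative, and 0 otherwise.
func foo(a, b int) int {
	var c int
	if a+b < 0 {
		c = 1
	}
	return c
}

func main() {
	fmt.Println(foo(1, 2))       // ordinary positive sum
	fmt.Println(foo(-3, 1))      // ordinary negative sum
	fmt.Println(foo(maxInt, 1))  // wraps around to a negative value
	fmt.Println(foo(minInt, -1)) // wraps around to a positive value
}
```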

Change-Id: I47179faba5b30cca84ea69bafa2ad5241bf6dfba
Reviewed-on: https://go-review.googlesource.com/c/go/+/476116
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-24 01:19:09 +00:00
erifan01
91a2e921dd cmd/compile: fix incorrect truncating when converting CMP to TST on arm64
CL 420434 optimized CMP into TST in some situations, but it has a bug.
These four rules are not correct:
(LessThan (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessThan (TSTconst [c] y))
(LessEqual (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessEqual (TSTconst [c] y))
(GreaterThan (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterThan (TSTconst [c] y))
(GreaterEqual (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterEqual (TSTconst [c] y))

But due to the existence of this rule
(LessThan (CMPWconst [0] x:(ANDconst [c] y))) && x.Uses == 1 =>
(LessThan (TSTWconst [int32(c)] y)), the above rules have never
fired. This CL corrects them to:
(LessThan (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessThan (TSTconst [c] y))
(LessEqual (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (LessEqual (TSTconst [c] y))
(GreaterThan (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterThan (TSTconst [c] y))
(GreaterEqual (CMPconst [0] x:(ANDconst [c] y))) && x.Uses == 1 => (GreaterEqual (TSTconst [c] y))
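
Why the operand width matters can be seen without any assembly: a 32-bit and a 64-bit signed comparison of the same AND result can disagree. The sketch below only models the two widths in Go; the function names are hypothetical and it is not the compiler's rewrite logic.

```go
package main

import "fmt"

// lessThanZero64 compares the full 64-bit value of x&c against zero.
func lessThanZero64(x, c int64) bool {
	return x&c < 0
}

// lessThanZero32 compares only the low 32 bits of x&c against zero,
// modelling a 32-bit compare applied to the same AND result.
func lessThanZero32(x, c int64) bool {
	return int32(x&c) < 0
}

func main() {
	x, c := int64(0x80000000), int64(0xffffffff)
	// Bit 31 is set but bit 63 is not, so the two widths disagree.
	fmt.Println(lessThanZero64(x, c), lessThanZero32(x, c)) // false true
}
```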

Change-Id: I7d60bcc9a266ee58388baeaab9f493b57cf1ad55
Reviewed-on: https://go-review.googlesource.com/c/go/+/473617
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
2023-03-22 08:32:53 +00:00
Yi Yang
da4687923b cmd/compile: add rewrite rules for arithmetic operations
Add the following common local transformations

(t + x) - (t + y) == x - y
(t + x) - (y + t) == x - y
(x + t) - (y + t) == x - y
(x + t) - (t + y) == x - y
(x - t) + (t + y) == x + y
(x - t) + (y + t) == x + y

The compiler itself matches such patterns many times. This also aligns with other popular compilers.
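
These identities hold for Go's wrapping integer arithmetic, including when an intermediate sum overflows. A minimal check follows; `subForm` and `addForm` are hypothetical names for two of the unsimplified shapes listed above.

```go
package main

import "fmt"

// subForm is the unsimplified shape (t + x) - (t + y).
func subForm(t, x, y int64) int64 { return (t + x) - (t + y) }

// addForm is the unsimplified shape (x - t) + (t + y).
func addForm(t, x, y int64) int64 { return (x - t) + (t + y) }

func main() {
	// t + x overflows int64 here, yet the identities still hold
	// because addition and subtraction wrap modulo 2^64.
	t, x, y := int64(1<<62), int64(1<<62), int64(5)
	fmt.Println(subForm(t, x, y) == x-y) // true
	fmt.Println(addForm(t, x, y) == x+y) // true
}
```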

Fixes #59111

Change-Id: Ibdfdb414782f8fcaa20b84ac5d43d0d9ae2c7b60
GitHub-Last-Rev: 1aad82e62e
GitHub-Pull-Request: golang/go#59119
Reviewed-on: https://go-review.googlesource.com/c/go/+/477555
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2023-03-20 15:42:09 +00:00
Wayne Zuo
cedfcba3e8 cmd/compile: intrinsify TrailingZeros{8,32,64} for 386
This CL adds support for intrinsifying the TrailingZeros{8,32,64}
functions for the 386 architecture. We need to handle the case when the
input is 0, which could lead to undefined output from the BSFL instruction.
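
For context, math/bits documents the result for a zero input as the operand width, which is the behavior any intrinsic has to preserve. A small check of that documented behavior (the wrapper `tz32` is a hypothetical name used only here):

```go
package main

import (
	"fmt"
	"math/bits"
)

// tz32 wraps bits.TrailingZeros32 so the zero case is easy to exercise.
func tz32(x uint32) int {
	return bits.TrailingZeros32(x)
}

func main() {
	fmt.Println(tz32(8))                 // 3
	fmt.Println(tz32(0))                 // 32: defined result for zero input
	fmt.Println(bits.TrailingZeros8(0))  // 8
	fmt.Println(bits.TrailingZeros64(0)) // 64
}
```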

Next CL will remove the assembly code in runtime/internal/sys package.

Change-Id: Ic168edf68e81bf69a536102100fdd3f56f0f4a1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/475735
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-14 08:10:32 +00:00
ruinan
4d180f71dc cmd/compile: omit redundant sign/unsign extension on arm64
On Arm64, all 32-bit instructions ignore the upper 32 bits of their
inputs and clear the upper 32 bits of the result. There is no need to do
an unsigned extension before a 32-bit op.

This CL removes the redundant unsigned extension only for the existing
32-bit opcodes, and also omits the sign extension when the upper bit of
the result can be predicted.

Fixes #42162

Change-Id: I61e6670bfb8982572430e67a4fa61134a3ea240a
CustomizedGitHooks: yes
Reviewed-on: https://go-review.googlesource.com/c/go/+/427454
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Eric Fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-28 03:16:44 +00:00
Dmitri Shuralyov
7a0799b2c0 cmd/dist, test: convert test/run.go runner to a cmd/go test
As motivated on the issue, we want to move the functionality of the
run.go program to happen via a normal go test. Each .go test case in
the GOROOT/test directory gets a subtest, and cmd/go's support for
parallel test execution replaces run.go's own implementation thereof.

The goal of this change is to have fairly minimal and readable diff
while making an atomic changeover. The working directory is modified
during the test execution to be GOROOT/test as it was with run.go,
and most of the test struct and its run method are kept unchanged.
The next CL in the stack applies further simplifications and cleanups
that become viable.

There's no noticeable difference in test execution time: it takes around
60-80 seconds both before and after on my machine. Test caching, which
the previous runner lacked, can shorten the time significantly.

For #37486.
Fixes #56844.

Change-Id: I209619dc9d90e7529624e49c01efeadfbeb5c9ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/463276
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-28 01:11:37 +00:00
Michael Munday
85d54a7667 cmd/compile: use zero constants in comparisons where possible
Some integer comparisons with 1 and -1 can be rewritten as comparisons
with 0. For example, x < 1 is equivalent to x <= 0. This is an
advantageous transformation on riscv64 because comparisons with zero
do not require a constant to be loaded into a register. Other
architectures will likely benefit too and the transformation is
relatively benign on architectures that do not benefit.
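
The equivalence is easy to confirm for signed integers. The sketch below checks `x < 1` against `x <= 0` from the text, plus the analogous `x > -1` against `x >= 0`, which is added here as an illustration; all function names are hypothetical.

```go
package main

import "fmt"

// Comparisons as written in source.
func ltOne(x int64) bool    { return x < 1 }
func gtNegOne(x int64) bool { return x > -1 }

// Equivalent forms that compare against zero.
func leZero(x int64) bool { return x <= 0 }
func geZero(x int64) bool { return x >= 0 }

func main() {
	for _, x := range []int64{-2, -1, 0, 1, 2} {
		fmt.Println(x, ltOne(x) == leZero(x), gtNegOne(x) == geZero(x))
	}
}
```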

Change-Id: I2ce9821dd7605a660eb71d76e83a61f9bae1bf25
Reviewed-on: https://go-review.googlesource.com/c/go/+/350831
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-27 21:38:30 +00:00
Lynn Boger
ebe49f98c8 cmd/compile: inline constant sized memclrNoHeapPointers calls on PPC64
Update the function isInlinableMemclr for ppc64 and ppc64le
to enable inlining for the constant sized cases < 512.

Larger cases can use dcbz which performs better but requires
alignment checking so it is best to continue using memclrNoHeapPointers
for those cases.

Results on p10:

MemclrKnownSize1         2.07ns ± 0%     0.57ns ± 0%   -72.59%
MemclrKnownSize2         2.56ns ± 5%     0.57ns ± 0%   -77.82%
MemclrKnownSize4         5.15ns ± 0%     0.57ns ± 0%   -89.00%
MemclrKnownSize8         2.23ns ± 0%     0.57ns ± 0%   -74.57%
MemclrKnownSize16        2.23ns ± 0%     0.50ns ± 0%   -77.74%
MemclrKnownSize32        2.28ns ± 0%     0.56ns ± 0%   -75.28%
MemclrKnownSize64        2.49ns ± 0%     0.72ns ± 0%   -70.95%
MemclrKnownSize112       2.97ns ± 2%     1.14ns ± 0%   -61.72%
MemclrKnownSize128       4.64ns ± 6%     2.45ns ± 1%   -47.17%
MemclrKnownSize192       5.45ns ± 5%     2.79ns ± 0%   -48.87%
MemclrKnownSize248       4.51ns ± 0%     2.83ns ± 0%   -37.12%
MemclrKnownSize256       6.34ns ± 1%     3.58ns ± 0%   -43.53%
MemclrKnownSize512       3.64ns ± 0%     3.64ns ± 0%    -0.03%
MemclrKnownSize1024      4.73ns ± 0%     4.73ns ± 0%    +0.01%
MemclrKnownSize4096      17.1ns ± 0%     17.1ns ± 0%    +0.07%
MemclrKnownSize512KiB    2.12µs ± 0%     2.12µs ± 0%      ~     (all equal)

Change-Id: If1abf5749f4802c64523a41fe0058bd144d0ea46
Reviewed-on: https://go-review.googlesource.com/c/go/+/464340
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2023-02-23 18:57:27 +00:00
Wayne Zuo
d7ac5d1480 cmd/compile: intrinsify math/bits/ReverseBytes{32|64} for 386
The BSWAPL instruction is supported in i486 and newer.
https://github.com/golang/go/wiki/MinimumRequirements#386 says we
support "All Pentium MMX or later". The Pentium is also referred to as
i586, so we are safe using this instruction.
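
For reference, the operation being intrinsified reverses the byte order of its argument. A short usage sketch; the wrapper names `swap32` and `swap64` are hypothetical:

```go
package main

import (
	"fmt"
	"math/bits"
)

// swap32 and swap64 wrap the math/bits functions named in the title.
func swap32(x uint32) uint32 { return bits.ReverseBytes32(x) }
func swap64(x uint64) uint64 { return bits.ReverseBytes64(x) }

func main() {
	fmt.Printf("%#x\n", swap32(0x11223344))         // 0x44332211
	fmt.Printf("%#x\n", swap64(0x0102030405060708)) // 0x807060504030201
}
```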

Change-Id: I6dea1f9d864a45bb07c8f8f35a81cfe16cca216c
Reviewed-on: https://go-review.googlesource.com/c/go/+/465515
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-02-08 03:43:23 +00:00
Archana R
a432d89137 cmd/compile: add rules to emit SETBC/R instructions on power10
This CL adds rules that replace instances of ISEL that produce
a boolean result based on a condition register with SETBC/SETBCR
operations. On Power10 these are converted to SETBC/SETBCR
instructions that use one register instead of the 3 registers
conventionally used by ISEL, which reduces register pressure.
On loops written specifically to exercise such instances of ISEL
extensively, a performance improvement of 2.5% is seen on Power10.
Also added verification tests to verify correct generation of
SETBC/SETBCR instructions on Power10.

Change-Id: Ib719897f09d893de40324440a43052dca026e8fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/449795
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-06 12:49:53 +00:00
Archana R
cd1fc87156 cmd/compile: intrinsify math/bits/ReverseBytes{16|32|64} for ppc64/power10
This change intrinsifies ReverseBytes{16|32|64} by generating the
corresponding new instructions in Power10: brh, brd and brw and
adds a verification test for the same.
On Power 9 and 8, the .go code performs optimally as it is.

Performance improvement seen on Power10:
ReverseBytes32  1.38ns ± 0%  1.18ns ± 0%  -14.2%
ReverseBytes64  1.52ns ± 0%  1.11ns ± 0%  -26.87%
ReverseBytes16  1.41ns ± 1%  1.18ns ± 0%  -16.47%

Change-Id: I88f127f3ab9ba24a772becc21ad90acfba324b37
Reviewed-on: https://go-review.googlesource.com/c/go/+/446675
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-02-03 19:01:06 +00:00
Keith Randall
6224db9b4d cmd/compile: schedule values with no in-block uses later
When scheduling a block, deprioritize values whose results aren't used
until subsequent blocks.

For #58166, this has the effect of pushing the induction variable increment
to the end of the block, past all the other uses of the pre-incremented value.

Do this only with optimizations on. Debuggers have a preference for values
in source code order, which this CL can degrade.

Fixes #58166
Fixes #57976

Change-Id: I40d5885c661b142443c6d4702294c8abe8026c4f
Reviewed-on: https://go-review.googlesource.com/c/go/+/463751
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-01 18:41:07 +00:00
Jakub Ciolek
1fc585dc2f cmd/compile: inline known-size memclrNoHeapPointers calls
This patch rewrites known-size calls to memclrNoHeapPointers with an OpZero.
This significantly improves performance and lets some clears get DSE'd.

One of the cases where this applies is zeroing a known-size array, example:

	var x [256]int8

        ...

	for a := range x {
	    x[a] = 0
	}

Other cases can be found in the runtime itself where memclrNoHeapPointers is sometimes directly invoked with a constant.

It seems that for some clear sizes on some architectures (AMD64, maybe others), memclrNoHeapPointers is more performant than OpZero.
See issue #56997 for more details.
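
A self-contained version of the array-zeroing idiom shown above, for experimenting with a toolchain. This is only a sketch: whether a given compiler version turns this exact function into an inlined clear is an assumption to verify (for example by inspecting the generated assembly), and `clearBuf` is a hypothetical name.

```go
package main

import "fmt"

// clearBuf zeroes a fixed-size array using the range-loop idiom
// from the commit message. The array length is a compile-time constant.
func clearBuf(x *[256]int8) {
	for i := range x {
		x[i] = 0
	}
}

func main() {
	var x [256]int8
	x[3], x[255] = 7, -1
	clearBuf(&x)
	fmt.Println(x == [256]int8{}) // true
}
```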

Benches ARM (M1 Pro):

name                      old time/op     new time/op     delta
MemclrKnownSize1-10          2.03ns ± 0%     0.31ns ± 0%    -84.69%  (p=0.000 n=18+19)
MemclrKnownSize2-10          1.97ns ± 0%     0.31ns ± 0%    -84.19%  (p=0.000 n=12+19)
MemclrKnownSize4-10          2.02ns ± 0%     0.31ns ± 0%    -84.56%  (p=0.000 n=12+20)
MemclrKnownSize8-10          2.02ns ± 0%     0.31ns ± 0%    -84.59%  (p=0.000 n=14+19)
MemclrKnownSize16-10         2.15ns ± 0%     0.31ns ± 0%    -85.50%  (p=0.000 n=18+19)
MemclrKnownSize32-10         2.48ns ± 0%     0.31ns ± 0%    -87.48%  (p=0.000 n=20+19)
MemclrKnownSize64-10         1.93ns ± 0%     0.62ns ± 0%    -67.88%  (p=0.000 n=20+19)
MemclrKnownSize112-10        2.48ns ± 0%     1.80ns ± 0%    -27.74%  (p=0.000 n=19+20)
MemclrKnownSize128-10       10.0ns ±112%      2.0ns ± 0%    -79.76%  (p=0.000 n=18+17)
MemclrKnownSize192-10       27.4ns ±103%      2.6ns ± 0%    -90.38%  (p=0.000 n=16+19)
MemclrKnownSize248-10        9.67ns ±43%     3.26ns ± 0%    -66.29%  (p=0.000 n=19+19)
MemclrKnownSize256-10       85.4ns ±148%      3.3ns ± 0%    -96.18%  (p=0.000 n=20+20)
MemclrKnownSize512-10         223ns ±54%        6ns ± 0%    -97.42%  (p=0.000 n=18+20)
MemclrKnownSize1024-10        216ns ±26%       11ns ± 0%    -95.00%  (p=0.000 n=18+15)
MemclrKnownSize4096-10        265ns ± 2%       88ns ± 0%    -66.84%  (p=0.000 n=19+17)
MemclrKnownSize512KiB-10     9.91µs ± 1%    10.23µs ± 2%     +3.14%  (p=0.000 n=19+19)
[Geo mean]                   15.6ns           2.5ns         -83.62%

name                      old speed       new speed       delta
MemclrKnownSize1-10         493MB/s ± 0%   3216MB/s ± 0%   +553.04%  (p=0.000 n=18+19)
MemclrKnownSize2-10        1.02GB/s ± 0%   6.43GB/s ± 0%   +532.33%  (p=0.000 n=16+19)
MemclrKnownSize4-10        1.99GB/s ± 0%  12.86GB/s ± 0%   +547.67%  (p=0.000 n=18+20)
MemclrKnownSize8-10        3.96GB/s ± 0%  25.72GB/s ± 0%   +548.81%  (p=0.000 n=19+19)
MemclrKnownSize16-10       7.46GB/s ± 0%  51.43GB/s ± 0%   +589.72%  (p=0.000 n=20+19)
MemclrKnownSize32-10       12.9GB/s ± 0%  102.9GB/s ± 0%   +698.60%  (p=0.000 n=20+18)
MemclrKnownSize64-10       33.1GB/s ± 0%  103.0GB/s ± 0%   +211.34%  (p=0.000 n=19+19)
MemclrKnownSize112-10      45.1GB/s ± 0%   62.4GB/s ± 0%    +38.38%  (p=0.000 n=19+20)
MemclrKnownSize128-10     13.3GB/s ±107%   63.5GB/s ± 0%   +378.03%  (p=0.000 n=19+18)
MemclrKnownSize192-10     6.97GB/s ±139%  72.72GB/s ± 0%   +943.44%  (p=0.000 n=19+19)
MemclrKnownSize248-10      25.9GB/s ±46%   76.1GB/s ± 0%   +194.16%  (p=0.000 n=20+17)
MemclrKnownSize256-10     8.64GB/s ±196%  78.51GB/s ± 0%   +808.19%  (p=0.000 n=20+20)
MemclrKnownSize512-10      2.33GB/s ±86%  89.13GB/s ± 0%  +3719.50%  (p=0.000 n=17+20)
MemclrKnownSize1024-10     4.85GB/s ±32%  94.93GB/s ± 0%  +1856.74%  (p=0.000 n=18+19)
MemclrKnownSize4096-10     15.4GB/s ± 2%   46.6GB/s ± 0%   +201.55%  (p=0.000 n=19+18)
MemclrKnownSize512KiB-10   52.9GB/s ± 1%   51.3GB/s ± 2%     -3.04%  (p=0.000 n=19+19)
[Geo mean]                 7.54GB/s       42.86GB/s        +468.76%

Intel Alder Lake 12600k:

name                      old time/op    new time/op     delta
MemclrKnownSize1-16         0.59ns ± 3%     0.38ns ± 6%   -36.00%  (p=0.000 n=19+18)
MemclrKnownSize2-16         0.57ns ± 1%     0.19ns ± 5%   -66.27%  (p=0.000 n=19+19)
MemclrKnownSize4-16         0.66ns ± 2%     0.36ns ±21%   -45.12%  (p=0.000 n=19+20)
MemclrKnownSize8-16         0.74ns ± 1%     0.30ns ±26%   -59.81%  (p=0.000 n=18+20)
MemclrKnownSize16-16        1.00ns ± 7%     0.21ns ± 8%   -79.51%  (p=0.000 n=20+19)
MemclrKnownSize32-16        0.95ns ± 1%     0.40ns ± 1%   -57.61%  (p=0.000 n=20+18)
MemclrKnownSize64-16        1.20ns ± 2%     0.41ns ± 0%   -65.82%  (p=0.000 n=20+18)
MemclrKnownSize112-16       1.27ns ± 2%     1.03ns ± 0%   -19.35%  (p=0.000 n=20+18)
MemclrKnownSize128-16       1.34ns ± 2%     1.03ns ± 0%   -23.02%  (p=0.000 n=20+20)
MemclrKnownSize192-16       1.92ns ± 2%     1.44ns ± 0%   -24.89%  (p=0.000 n=20+16)
MemclrKnownSize248-16       2.77ns ± 1%     3.29ns ± 0%   +18.81%  (p=0.000 n=20+16)
MemclrKnownSize256-16       1.92ns ± 1%     1.86ns ± 0%    -3.49%  (p=0.000 n=19+15)
MemclrKnownSize512-16       2.81ns ± 2%     3.49ns ± 0%   +24.15%  (p=0.000 n=20+17)
MemclrKnownSize1024-16      4.02ns ± 1%     6.78ns ± 0%   +68.44%  (p=0.000 n=20+18)
MemclrKnownSize4096-16      17.2ns ± 2%     14.4ns ± 0%   -16.73%  (p=0.000 n=20+17)
MemclrKnownSize512KiB-16    6.71µs ± 1%     6.52µs ± 0%    -2.85%  (p=0.000 n=20+18)
[Geo mean]                  2.60ns          1.71ns        -34.06%

name                      old speed      new speed       delta
MemclrKnownSize1-16       1.71GB/s ± 3%   2.67GB/s ± 6%   +56.39%  (p=0.000 n=19+18)
MemclrKnownSize2-16       3.52GB/s ± 2%  10.43GB/s ± 6%  +196.04%  (p=0.000 n=20+20)
MemclrKnownSize4-16       6.06GB/s ± 1%  10.83GB/s ±11%   +78.63%  (p=0.000 n=19+18)
MemclrKnownSize8-16       10.7GB/s ± 1%   27.0GB/s ±21%  +151.49%  (p=0.000 n=18+20)
MemclrKnownSize16-16      16.0GB/s ± 8%   78.1GB/s ± 7%  +387.24%  (p=0.000 n=20+19)
MemclrKnownSize32-16      33.6GB/s ± 1%   79.4GB/s ± 1%  +135.89%  (p=0.000 n=20+18)
MemclrKnownSize64-16      53.3GB/s ± 2%  155.9GB/s ± 0%  +192.58%  (p=0.000 n=20+18)
MemclrKnownSize112-16     88.0GB/s ± 2%  109.1GB/s ± 0%   +23.97%  (p=0.000 n=20+18)
MemclrKnownSize128-16     95.3GB/s ± 2%  123.8GB/s ± 0%   +29.88%  (p=0.000 n=20+20)
MemclrKnownSize192-16      100GB/s ± 2%    133GB/s ± 0%   +33.12%  (p=0.000 n=20+17)
MemclrKnownSize248-16     89.7GB/s ± 1%   75.5GB/s ± 0%   -15.84%  (p=0.000 n=20+19)
MemclrKnownSize256-16      133GB/s ± 1%    138GB/s ± 0%    +3.61%  (p=0.000 n=19+14)
MemclrKnownSize512-16      182GB/s ± 2%    147GB/s ± 0%   -19.46%  (p=0.000 n=20+17)
MemclrKnownSize1024-16     254GB/s ± 1%    151GB/s ± 0%   -40.64%  (p=0.000 n=20+18)
MemclrKnownSize4096-16     237GB/s ± 2%    285GB/s ± 0%   +20.09%  (p=0.000 n=20+17)
MemclrKnownSize512KiB-16  78.2GB/s ± 1%   80.4GB/s ± 0%    +2.93%  (p=0.000 n=20+18)
[Geo mean]                42.1GB/s        63.8GB/s        +51.53%

compilecmp linux/amd64:

runtime
runtime.(*pallocData).allocAll 85 -> 45  (-47.06%)
runtime.(*pageAlloc).allocRange 942 -> 923  (-2.02%)
runtime.(*pageAlloc).free 798 -> 774  (-3.01%)
runtime.(*pageBits).clearAll 66 -> 20  (-69.70%)
runtime.startCheckmarks 255 -> 246  (-3.53%)
runtime.(*pallocData).freeAll 86 -> 46  (-46.51%)
runtime.(*pallocBits).freeAll 66 -> 20  (-69.70%)
runtime.(*consistentHeapStats).unsafeClear 66 -> 19  (-71.21%)
runtime.newproc1 965 -> 933  (-3.32%)

crypto/rc4
crypto/rc4.(*Cipher).Reset 78 -> 69  (-11.54%)

compress/bzip2
compress/bzip2.(*reader).readBlock 2973 -> 2941  (-1.08%)

image/jpeg
image/jpeg.(*decoder).processDHT 1179 -> 1166  (-1.10%)

index/suffixarray
index/suffixarray.bucketMax_8_32 394 -> 241  (-38.83%)
index/suffixarray.freq_8_32 317 -> 185  (-41.64%)
index/suffixarray.freq_8_64 317 -> 178  (-43.85%)
index/suffixarray.bucketMin_8_32 394 -> 243  (-38.32%)
index/suffixarray.bucketMin_8_64 398 -> 234  (-41.21%)
index/suffixarray.bucketMax_8_64 398 -> 234  (-41.21%)

compress/flate
compress/flate.(*huffmanBitWriter).generateCodegen 965 -> 838  (-13.16%)
compress/flate.(*compressor).reset 429 -> 409  (-4.66%)

cmd/vendor/golang.org/x/sys/unix
cmd/vendor/golang.org/x/sys/unix.(*FdSet).Zero 66 -> 60  (-9.09%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetInet4Addr 211 -> 129  (-38.86%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetUint32 98 -> 14  (-85.71%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).clear 66 -> 11  (-83.33%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetUint16 101 -> 15  (-85.15%)
cmd/vendor/golang.org/x/sys/unix.(*CPUSet).Zero 66 -> 60  (-9.09%)

internal/coverage/decodemeta
internal/coverage/decodemeta.(*CoverageMetaFileReader).rdUint64 325 -> 293  (-9.85%)

crypto/tls
crypto/tls.(*halfConn).setTrafficSecret 253 -> 247  (-2.37%)
crypto/tls.(*Conn).readRecordOrCCS 10315 -> 10283  (-0.31%)
crypto/tls.(*halfConn).changeCipherSpec 271 -> 261  (-3.69%)
crypto/tls.(*Conn).writeRecordLocked 1765 -> 1748  (-0.96%)

file                               before   after    Δ       %
runtime.s                          512467   512164   -303    -0.059%
crypto/rc4.s                       955      946      -9      -0.942%
compress/bzip2.s                   9586     9554     -32     -0.334%
image/jpeg.s                       32122    32109    -13     -0.040%
index/suffixarray.s                38547    37644    -903    -2.343%
compress/flate.s                   46668    46521    -147    -0.315%
cmd/vendor/golang.org/x/sys/unix.s 118620   118301   -319    -0.269%
internal/coverage/decodemeta.s     7224     7192     -32     -0.443%
crypto/tls.s                       288762   288697   -65     -0.023%
cmd/compile/internal/ssa.s         3639799  3640727  +928    +0.025%
total                              20790248 20789353 -895    -0.004%

src/runtime benchmarks (Linux Alder Lake 12600k):

name                                             old time/op    new time/op    delta
MakeChan/Byte-16                                   26.2ns ± 2%    25.6ns ± 3%   -2.05%  (p=0.003 n=9+10)
MakeChan/Int-16                                    33.9ns ± 2%    33.3ns ± 4%   -1.99%  (p=0.015 n=10+10)
MakeChan/Ptr-16                                    54.2ns ± 2%    53.7ns ± 1%   -0.90%  (p=0.016 n=10+9)
MakeChan/Struct/0-16                               23.8ns ± 3%    23.4ns ± 1%   -1.72%  (p=0.009 n=10+8)
MakeChan/Struct/32-16                              55.9ns ± 2%    53.9ns ± 1%   -3.48%  (p=0.000 n=10+10)
MakeChan/Struct/40-16                              63.5ns ± 1%    61.1ns ± 2%   -3.79%  (p=0.000 n=10+9)
ChanNonblocking-16                                 0.22ns ± 0%    0.22ns ± 0%   +0.40%  (p=0.011 n=9+8)
SelectUncontended-16                               4.63ns ± 1%    4.62ns ± 0%   -0.35%  (p=0.001 n=10+8)
SelectSyncContended-16                             1.58µs ± 2%    1.59µs ± 1%     ~     (p=0.540 n=10+10)
SelectAsyncContended-16                             290ns ± 0%     291ns ± 0%   +0.14%  (p=0.012 n=8+9)
SelectNonblock-16                                  0.95ns ± 1%    0.95ns ± 1%     ~     (p=0.546 n=9+9)
ChanUncontended-16                                  239ns ± 3%     242ns ± 6%     ~     (p=0.886 n=9+10)
ChanContended-16                                   17.7µs ± 1%    18.2µs ± 1%   +2.87%  (p=0.000 n=10+9)
ChanSync-16                                         109ns ± 2%     109ns ± 1%     ~     (p=0.342 n=10+10)
ChanSyncWork-16                                    6.55µs ± 1%    6.53µs ± 1%     ~     (p=0.101 n=10+10)
ChanProdCons0-16                                    502ns ± 1%     499ns ± 0%   -0.55%  (p=0.001 n=10+9)
ChanProdCons10-16                                   373ns ± 2%     377ns ± 1%     ~     (p=0.095 n=10+9)
ChanProdCons100-16                                  224ns ± 2%     223ns ± 3%     ~     (p=0.150 n=9+10)
ChanProdConsWork0-16                                491ns ± 1%     484ns ± 0%   -1.26%  (p=0.000 n=10+9)
ChanProdConsWork10-16                               451ns ± 2%     448ns ± 2%     ~     (p=0.210 n=8+10)
ChanProdConsWork100-16                              406ns ± 0%     407ns ± 1%     ~     (p=0.138 n=8+8)
SelectProdCons-16                                   509ns ± 0%     509ns ± 0%     ~     (p=0.917 n=9+9)
ReceiveDataFromClosedChan-16                       12.1ns ± 0%    12.1ns ± 0%     ~     (p=0.780 n=10+10)
ChanCreation-16                                    22.6ns ± 1%    22.4ns ± 0%   -0.72%  (p=0.001 n=10+8)
ChanSem-16                                          165ns ± 1%     166ns ± 1%   +0.72%  (p=0.002 n=10+10)
ChanPopular-16                                      500µs ± 2%     498µs ± 1%     ~     (p=0.218 n=10+10)
ChanClosed-16                                      0.29ns ± 0%    0.29ns ± 0%   +0.09%  (p=0.019 n=9+8)
CallClosure-16                                     1.28ns ± 0%    1.27ns ± 0%   -0.51%  (p=0.000 n=9+9)
CallClosure1-16                                    1.50ns ± 0%    1.50ns ± 0%     ~     (p=0.123 n=9+9)
CallClosure2-16                                    8.86ns ± 1%    8.86ns ± 3%     ~     (p=0.590 n=9+10)
CallClosure3-16                                    8.75ns ± 2%    8.69ns ± 2%     ~     (p=0.247 n=10+10)
CallClosure4-16                                    8.65ns ± 2%    8.56ns ± 2%     ~     (p=0.105 n=10+10)
Complex128DivNormal-16                             2.47ns ± 0%    2.47ns ± 0%     ~     (p=0.790 n=10+9)
Complex128DivNisNaN-16                             4.44ns ± 0%    4.43ns ± 0%     ~     (p=0.564 n=10+10)
Complex128DivDisNaN-16                             4.48ns ± 0%    4.48ns ± 0%     ~     (p=0.101 n=10+10)
Complex128DivNisInf-16                             2.58ns ± 0%    2.58ns ± 0%     ~     (p=0.808 n=10+10)
Complex128DivDisInf-16                             6.30ns ± 0%    6.31ns ± 0%     ~     (p=0.305 n=10+10)
SetTypePtr-16                                      0.73ns ± 1%    0.73ns ± 3%     ~     (p=0.644 n=10+10)
SetTypePtr8-16                                     4.12ns ± 0%    4.12ns ± 0%     ~     (p=0.127 n=10+10)
SetTypePtr16-16                                    4.13ns ± 1%    4.12ns ± 0%     ~     (p=0.109 n=10+10)
SetTypePtr32-16                                    4.12ns ± 0%    4.12ns ± 0%     ~     (p=0.203 n=9+10)
SetTypePtr64-16                                    4.12ns ± 0%    4.12ns ± 0%     ~     (p=0.696 n=10+10)
SetTypePtr126-16                                   6.91ns ± 0%    6.91ns ± 0%     ~     (p=0.469 n=10+10)
SetTypePtr128-16                                   6.66ns ± 0%    6.67ns ± 0%     ~     (p=0.246 n=9+10)
SetTypePtrSlice-16                                 54.1ns ± 1%    54.1ns ± 1%     ~     (p=0.509 n=9+10)
SetTypeNode1-16                                    4.13ns ± 1%    4.12ns ± 0%     ~     (p=0.342 n=10+10)
SetTypeNode1Slice-16                               10.1ns ± 1%    10.0ns ± 1%   -1.18%  (p=0.000 n=10+10)
SetTypeNode8-16                                    4.12ns ± 0%    4.12ns ± 0%     ~     (p=0.137 n=8+8)
SetTypeNode8Slice-16                               22.6ns ± 0%    22.6ns ± 0%     ~     (p=0.423 n=10+10)
SetTypeNode64-16                                   6.90ns ± 0%    6.91ns ± 0%     ~     (p=0.275 n=10+10)
SetTypeNode64Slice-16                               173ns ± 0%     173ns ± 0%     ~     (p=0.610 n=9+10)
SetTypeNode64Dead-16                               5.53ns ± 0%    5.52ns ± 0%     ~     (p=0.123 n=10+6)
SetTypeNode64DeadSlice-16                           150ns ± 0%     150ns ± 0%     ~     (p=0.398 n=10+10)
SetTypeNode124-16                                  6.90ns ± 0%    6.90ns ± 0%     ~     (p=0.779 n=10+10)
SetTypeNode124Slice-16                              222ns ± 5%     217ns ± 0%     ~     (p=0.302 n=10+10)
SetTypeNode126-16                                  6.66ns ± 0%    6.66ns ± 0%     ~     (p=0.324 n=10+9)
SetTypeNode126Slice-16                              218ns ± 0%     218ns ± 0%     ~     (p=0.119 n=9+10)
SetTypeNode128-16                                  9.76ns ± 0%    9.73ns ± 0%   -0.31%  (p=0.003 n=9+10)
SetTypeNode128Slice-16                              279ns ± 0%     278ns ± 0%     ~     (p=0.112 n=10+9)
SetTypeNode130-16                                  9.77ns ± 0%    9.73ns ± 0%   -0.33%  (p=0.002 n=10+10)
SetTypeNode130Slice-16                              284ns ± 0%     284ns ± 0%     ~     (p=0.668 n=10+10)
SetTypeNode1024-16                                 51.2ns ± 0%    51.6ns ± 1%     ~     (p=0.080 n=9+9)
SetTypeNode1024Slice-16                            1.83µs ± 0%    1.82µs ± 0%     ~     (p=0.115 n=10+10)
Allocation-16                                      4.64µs ± 1%    4.37µs ± 1%   -5.69%  (p=0.000 n=9+9)
ReadMemStats-16                                    5.62µs ± 2%    5.55µs ± 5%   -1.36%  (p=0.050 n=10+10)
WriteBarrier-16                                    4.95ns ± 3%    4.99ns ± 3%     ~     (p=0.255 n=10+10)
BulkWriteBarrier-16                                1.69ns ± 2%    1.63ns ± 4%   -3.77%  (p=0.001 n=10+10)
ScanStackNoLocals-16                               12.8ms ± 2%    12.9ms ± 1%   +0.72%  (p=0.019 n=10+10)
MSpanCountAlloc/bits=64-16                         1.65ns ± 0%    1.65ns ± 0%     ~     (p=0.124 n=10+10)
MSpanCountAlloc/bits=128-16                        2.08ns ± 1%    2.06ns ± 1%   -0.87%  (p=0.000 n=10+10)
MSpanCountAlloc/bits=256-16                        2.71ns ± 1%    2.69ns ± 1%   -0.74%  (p=0.001 n=10+9)
MSpanCountAlloc/bits=512-16                        4.15ns ± 0%    4.23ns ± 2%   +2.15%  (p=0.000 n=10+10)
MSpanCountAlloc/bits=1024-16                       7.89ns ± 1%    7.89ns ± 1%     ~     (p=0.867 n=10+10)
Hash5-16                                           1.93ns ± 1%    2.01ns ± 0%   +3.99%  (p=0.000 n=10+8)
Hash16-16                                          2.04ns ± 1%    2.21ns ± 1%   +8.61%  (p=0.000 n=10+10)
Hash64-16                                          2.67ns ± 0%    2.67ns ± 0%     ~     (p=0.154 n=9+9)
Hash1024-16                                        16.4ns ± 0%    16.4ns ± 0%   +0.17%  (p=0.020 n=9+10)
Hash65536-16                                        886ns ± 0%     885ns ± 0%     ~     (p=0.725 n=10+10)
AlignedLoad-16                                     0.96ns ± 2%    0.95ns ± 3%     ~     (p=0.123 n=10+10)
UnalignedLoad-16                                   0.95ns ± 2%    1.01ns ± 2%   +6.31%  (p=0.000 n=10+10)
EqEfaceConcrete-16                                 0.31ns ± 3%    0.33ns ± 5%   +8.10%  (p=0.000 n=10+10)
EqIfaceConcrete-16                                 0.31ns ±13%    0.28ns ± 2%   -9.23%  (p=0.001 n=10+10)
NeEfaceConcrete-16                                 0.29ns ± 1%    0.31ns ± 7%   +5.59%  (p=0.010 n=8+8)
NeIfaceConcrete-16                                 0.28ns ± 2%    0.29ns ± 1%   +4.49%  (p=0.000 n=9+8)
ConvT2EByteSized/bool-16                           0.53ns ± 1%    0.52ns ± 1%   -2.18%  (p=0.000 n=10+10)
ConvT2EByteSized/uint8-16                          0.53ns ± 1%    0.53ns ± 0%   +1.22%  (p=0.000 n=10+10)
ConvT2ESmall-16                                    1.13ns ± 0%    1.13ns ± 0%     ~     (p=0.774 n=9+9)
ConvT2EUintptr-16                                  1.03ns ± 0%    1.04ns ± 0%   +0.50%  (p=0.000 n=10+8)
ConvT2ELarge-16                                    14.4ns ± 2%    14.4ns ± 1%     ~     (p=0.726 n=10+10)
ConvT2ISmall-16                                    1.13ns ± 0%    1.13ns ± 0%     ~     (p=0.693 n=9+10)
ConvT2IUintptr-16                                  1.03ns ± 0%    1.03ns ± 0%   +0.44%  (p=0.000 n=10+10)
ConvT2ILarge-16                                    14.2ns ± 1%    14.4ns ± 1%   +0.85%  (p=0.007 n=9+10)
ConvI2E-16                                         0.54ns ± 1%    0.54ns ± 0%   -0.39%  (p=0.037 n=10+8)
ConvI2I-16                                         2.68ns ± 0%    2.70ns ± 1%   +0.73%  (p=0.000 n=9+10)
AssertE2T-16                                       0.28ns ± 1%    0.39ns ± 5%  +37.38%  (p=0.000 n=10+10)
AssertE2TLarge-16                                  0.42ns ± 2%    0.48ns ± 1%  +14.92%  (p=0.000 n=9+10)
AssertE2I-16                                       2.67ns ± 0%    2.67ns ± 0%     ~     (p=0.352 n=9+9)
AssertI2T-16                                       0.37ns ± 3%    0.34ns ± 1%   -6.16%  (p=0.000 n=10+10)
AssertI2I-16                                       2.67ns ± 0%    2.67ns ± 0%     ~     (p=0.286 n=10+10)
AssertI2E-16                                       0.54ns ± 1%    0.54ns ± 0%   -0.94%  (p=0.000 n=10+10)
AssertE2E-16                                       0.41ns ± 0%    0.41ns ± 1%     ~     (p=0.880 n=9+9)
AssertE2T2-16                                      0.41ns ± 1%    0.41ns ± 1%     ~     (p=0.725 n=10+10)
AssertE2T2Blank-16                                 0.24ns ± 5%    0.21ns ± 1%  -14.79%  (p=0.000 n=10+9)
AssertI2E2-16                                      0.69ns ± 0%    0.69ns ± 1%     ~     (p=0.541 n=10+10)
AssertI2E2Blank-16                                 0.26ns ± 9%    0.21ns ± 1%  -18.86%  (p=0.000 n=10+9)
AssertE2E2-16                                      0.53ns ± 1%    0.53ns ± 1%   +0.72%  (p=0.004 n=10+10)
AssertE2E2Blank-16                                 0.23ns ± 4%    0.21ns ± 1%   -8.42%  (p=0.000 n=10+10)
ConvT2Ezero/zero/16-16                             1.13ns ± 0%    1.14ns ± 1%     ~     (p=0.583 n=9+10)
ConvT2Ezero/zero/32-16                             1.13ns ± 0%    1.13ns ± 0%     ~     (p=0.417 n=10+10)
ConvT2Ezero/zero/64-16                             1.03ns ± 1%    1.03ns ± 0%     ~     (p=0.051 n=10+10)
ConvT2Ezero/zero/str-16                            1.03ns ± 0%    1.03ns ± 0%     ~     (p=0.132 n=10+10)
ConvT2Ezero/zero/slice-16                          1.14ns ± 0%    1.15ns ± 0%   +0.49%  (p=0.001 n=10+10)
ConvT2Ezero/zero/big-16                             123ns ± 1%     123ns ± 1%     ~     (p=0.171 n=10+10)
ConvT2Ezero/nonzero/str-16                         19.4ns ± 1%    19.3ns ± 3%     ~     (p=0.548 n=9+10)
ConvT2Ezero/nonzero/slice-16                       22.2ns ± 2%    22.0ns ± 2%     ~     (p=0.109 n=10+10)
ConvT2Ezero/nonzero/big-16                          123ns ± 1%     123ns ± 1%     ~     (p=0.446 n=10+8)
ConvT2Ezero/smallint/16-16                         1.13ns ± 0%    1.14ns ± 1%     ~     (p=0.362 n=10+10)
ConvT2Ezero/smallint/32-16                         1.13ns ± 0%    1.13ns ± 0%     ~     (p=0.907 n=10+9)
ConvT2Ezero/smallint/64-16                         1.04ns ± 0%    1.03ns ± 0%   -0.38%  (p=0.002 n=10+10)
ConvT2Ezero/largeint/16-16                         6.65ns ± 1%    6.65ns ± 2%     ~     (p=0.618 n=10+9)
ConvT2Ezero/largeint/32-16                         6.75ns ± 3%    6.63ns ± 2%   -1.77%  (p=0.015 n=10+10)
ConvT2Ezero/largeint/64-16                         9.19ns ± 1%    9.26ns ± 2%     ~     (p=0.123 n=10+10)
Malloc8-16                                         8.66ns ± 1%    8.89ns ± 2%   +2.74%  (p=0.000 n=10+10)
Malloc16-16                                        13.7ns ± 1%    13.8ns ± 1%   +0.71%  (p=0.022 n=10+8)
MallocTypeInfo8-16                                 11.7ns ± 3%    11.6ns ± 2%     ~     (p=0.469 n=10+10)
MallocTypeInfo16-16                                18.3ns ± 1%    18.2ns ± 2%     ~     (p=0.251 n=9+10)
MallocLargeStruct-16                                195ns ± 1%     198ns ± 1%   +1.65%  (p=0.000 n=9+10)
GoroutineSelect-16                                 1.10ms ± 1%    1.12ms ± 1%   +1.36%  (p=0.000 n=10+8)
GoroutineBlocking-16                                986µs ± 1%     998µs ± 1%   +1.23%  (p=0.002 n=10+10)
GoroutineForRange-16                                985µs ± 1%    1001µs ± 1%   +1.68%  (p=0.000 n=10+10)
GoroutineIdle-16                                    679µs ± 1%     691µs ± 0%   +1.74%  (p=0.000 n=10+9)
HashStringSpeed-16                                 5.33ns ± 5%    5.19ns ± 4%     ~     (p=0.113 n=9+9)
HashBytesSpeed-16                                  8.20ns ± 3%    8.24ns ± 1%     ~     (p=0.497 n=10+9)
HashInt32Speed-16                                  4.01ns ± 2%    3.90ns ± 4%   -2.63%  (p=0.011 n=9+10)
HashInt64Speed-16                                  3.94ns ± 4%    3.79ns ± 1%   -3.74%  (p=0.003 n=10+9)
HashStringArraySpeed-16                            12.5ns ± 4%    12.3ns ± 1%     ~     (p=0.055 n=10+10)
MegMap-16                                          3.72ns ± 1%    3.73ns ± 1%     ~     (p=0.484 n=9+10)
MegOneMap-16                                       2.28ns ± 1%    2.27ns ± 1%     ~     (p=0.287 n=10+10)
MegEqMap-16                                        22.0µs ± 3%    22.3µs ± 2%   +1.48%  (p=0.028 n=10+9)
MegEmptyMap-16                                     0.93ns ± 1%    0.92ns ± 1%   -0.52%  (p=0.030 n=10+10)
SmallStrMap-16                                     3.77ns ± 0%    3.77ns ± 0%     ~     (p=0.324 n=10+10)
MapStringKeysEight_16-16                           3.91ns ± 0%    3.91ns ± 0%     ~     (p=0.088 n=9+9)
MapStringKeysEight_32-16                           3.58ns ± 1%    3.50ns ± 0%   -2.11%  (p=0.000 n=10+10)
MapStringKeysEight_64-16                           3.58ns ± 1%    3.50ns ± 0%   -2.23%  (p=0.000 n=10+10)
MapStringKeysEight_1M-16                           3.57ns ± 1%    3.50ns ± 0%   -1.92%  (p=0.000 n=10+10)
IntMap-16                                          2.89ns ± 1%    2.89ns ± 0%     ~     (p=0.381 n=10+10)
MapFirst/1-16                                      1.60ns ± 1%    1.59ns ± 2%   -0.49%  (p=0.020 n=10+9)
MapFirst/2-16                                      1.61ns ± 0%    1.59ns ± 1%   -1.17%  (p=0.001 n=10+10)
MapFirst/3-16                                      1.61ns ± 1%    1.59ns ± 1%   -1.45%  (p=0.000 n=10+10)
MapFirst/4-16                                      1.60ns ± 1%    1.59ns ± 1%   -1.16%  (p=0.000 n=10+10)
MapFirst/5-16                                      1.60ns ± 1%    1.58ns ± 1%   -0.98%  (p=0.000 n=10+10)
MapFirst/6-16                                      1.60ns ± 1%    1.59ns ± 1%   -0.87%  (p=0.001 n=10+10)
MapFirst/7-16                                      1.60ns ± 1%    1.59ns ± 1%   -0.79%  (p=0.002 n=10+10)
MapFirst/8-16                                      1.60ns ± 1%    1.59ns ± 1%   -0.67%  (p=0.017 n=9+10)
MapFirst/9-16                                      2.83ns ± 0%    2.83ns ± 0%     ~     (p=0.492 n=10+10)
MapFirst/10-16                                     2.83ns ± 0%    2.84ns ± 0%   +0.24%  (p=0.017 n=10+10)
MapFirst/11-16                                     2.83ns ± 0%    2.83ns ± 0%     ~     (p=0.445 n=10+10)
MapFirst/12-16                                     2.83ns ± 0%    2.83ns ± 0%     ~     (p=0.564 n=10+10)
MapFirst/13-16                                     2.83ns ± 0%    2.84ns ± 0%     ~     (p=0.175 n=9+10)
MapFirst/14-16                                     2.83ns ± 0%    2.83ns ± 0%     ~     (p=0.322 n=10+9)
MapFirst/15-16                                     2.83ns ± 0%    2.84ns ± 1%     ~     (p=0.209 n=10+10)
MapFirst/16-16                                     2.83ns ± 1%    2.84ns ± 0%     ~     (p=0.238 n=10+10)
MapMid/1-16                                        1.64ns ± 0%    1.64ns ± 0%     ~     (p=0.453 n=10+9)
MapMid/2-16                                        1.86ns ± 1%    1.86ns ± 0%     ~     (p=0.764 n=10+9)
MapMid/3-16                                        1.86ns ± 0%    1.86ns ± 1%     ~     (p=1.000 n=10+10)
MapMid/4-16                                        2.06ns ± 0%    2.06ns ± 0%   -0.27%  (p=0.014 n=10+9)
MapMid/5-16                                        2.06ns ± 0%    2.06ns ± 0%     ~     (p=0.075 n=9+10)
MapMid/6-16                                        2.27ns ± 0%    2.27ns ± 1%     ~     (p=0.898 n=10+10)
MapMid/7-16                                        2.27ns ± 1%    2.26ns ± 0%   -0.23%  (p=0.049 n=10+10)
MapMid/8-16                                        2.47ns ± 0%    2.47ns ± 1%     ~     (p=0.840 n=10+10)
MapMid/9-16                                        4.21ns ± 7%    4.13ns ±19%     ~     (p=0.315 n=10+10)
MapMid/10-16                                       4.17ns ± 7%    4.31ns ± 5%   +3.37%  (p=0.021 n=10+9)
MapMid/11-16                                       4.18ns ± 7%    4.32ns ± 6%   +3.50%  (p=0.015 n=10+10)
MapMid/12-16                                       4.34ns ± 7%    4.30ns ± 5%     ~     (p=0.858 n=9+10)
MapMid/13-16                                       4.25ns ± 6%    4.28ns ± 6%     ~     (p=0.489 n=9+9)
MapMid/14-16                                       3.75ns ±23%    3.90ns ±16%     ~     (p=0.353 n=10+10)
MapMid/15-16                                       3.87ns ±25%    3.95ns ±26%     ~     (p=0.315 n=10+10)
MapMid/16-16                                       4.06ns ±19%    3.94ns ±16%     ~     (p=0.796 n=10+10)
MapLast/1-16                                       1.65ns ± 0%    1.65ns ± 0%     ~     (p=0.607 n=10+10)
MapLast/2-16                                       1.86ns ± 0%    1.86ns ± 0%   +0.26%  (p=0.029 n=10+10)
MapLast/3-16                                       2.06ns ± 1%    2.06ns ± 0%     ~     (p=0.689 n=8+9)
MapLast/4-16                                       2.27ns ± 1%    2.26ns ± 0%     ~     (p=0.148 n=10+9)
MapLast/5-16                                       2.47ns ± 0%    2.47ns ± 0%     ~     (p=0.385 n=9+10)
MapLast/6-16                                       2.67ns ± 0%    2.68ns ± 0%     ~     (p=0.202 n=9+10)
MapLast/7-16                                       2.88ns ± 0%    2.88ns ± 0%     ~     (p=0.751 n=10+10)
MapLast/8-16                                       3.08ns ± 0%    3.08ns ± 0%     ~     (p=0.826 n=10+9)
MapLast/9-16                                       4.31ns ± 6%    4.54ns ± 5%     ~     (p=0.070 n=9+8)
MapLast/10-16                                      4.25ns ± 5%    4.42ns ± 6%     ~     (p=0.321 n=9+8)
MapLast/11-16                                      4.59ns ±16%    5.42ns ±44%  +17.99%  (p=0.019 n=10+10)
MapLast/12-16                                      5.04ns ±19%    6.11ns ±28%  +21.35%  (p=0.005 n=9+10)
MapLast/13-16                                      6.00ns ±35%    5.76ns ± 3%     ~     (p=0.173 n=10+8)
MapLast/14-16                                      4.27ns ± 5%    4.53ns ± 6%   +6.14%  (p=0.007 n=10+10)
MapLast/15-16                                      4.41ns ± 1%    4.44ns ± 7%     ~     (p=0.515 n=8+10)
MapLast/16-16                                      4.18ns ± 6%    4.99ns ±18%  +19.48%  (p=0.000 n=10+10)
MapCycle-16                                        7.48ns ± 2%    7.46ns ± 1%     ~     (p=0.699 n=10+10)
RepeatedLookupStrMapKey32-16                       6.98ns ± 3%    6.73ns ± 2%   -3.63%  (p=0.000 n=10+10)
RepeatedLookupStrMapKey1M-16                       14.7µs ± 5%    14.7µs ± 4%     ~     (p=0.604 n=9+10)
MakeMap/[Byte]Byte-16                              58.5ns ± 1%    58.5ns ± 1%     ~     (p=0.780 n=10+9)
MakeMap/[Int]Int-16                                 113ns ± 0%     113ns ± 1%     ~     (p=0.100 n=8+10)
NewEmptyMap-16                                     2.47ns ± 0%    2.47ns ± 0%     ~     (p=0.638 n=10+10)
NewSmallMap-16                                     11.5ns ± 1%    11.6ns ± 0%   +1.18%  (p=0.000 n=10+10)
MapIter-16                                         42.2ns ± 0%    42.8ns ± 1%   +1.50%  (p=0.000 n=10+10)
MapIterEmpty-16                                    1.85ns ± 0%    1.85ns ± 0%     ~     (p=0.651 n=10+10)
SameLengthMap-16                                   1.85ns ± 1%    1.85ns ± 0%     ~     (p=0.247 n=10+10)
BigKeyMap-16                                       7.18ns ± 1%    7.42ns ± 4%   +3.33%  (p=0.004 n=10+10)
BigValMap-16                                       7.03ns ± 2%    7.19ns ± 1%   +2.33%  (p=0.000 n=10+9)
SmallKeyMap-16                                     5.32ns ± 1%    5.24ns ± 1%   -1.41%  (p=0.000 n=10+10)
MapPopulate/1-16                                   6.30ns ± 0%    6.41ns ± 1%   +1.81%  (p=0.000 n=8+10)
MapPopulate/10-16                                   239ns ± 2%     234ns ± 2%   -2.05%  (p=0.001 n=9+10)
MapPopulate/100-16                                 4.19µs ± 2%    4.22µs ± 2%     ~     (p=0.171 n=10+10)
MapPopulate/1000-16                                52.3µs ± 1%    52.5µs ± 1%     ~     (p=0.133 n=9+10)
MapPopulate/10000-16                                459µs ± 1%     466µs ± 2%   +1.45%  (p=0.005 n=10+10)
MapPopulate/100000-16                              4.22ms ± 2%    4.25ms ± 2%     ~     (p=0.393 n=10+10)
ComplexAlgMap-16                                   12.5ns ± 1%    12.4ns ± 1%   -0.95%  (p=0.022 n=10+10)
GoMapClear/Reflexive/1-16                          9.61ns ± 1%    9.58ns ± 0%   -0.27%  (p=0.027 n=10+10)
GoMapClear/Reflexive/10-16                         10.0ns ± 1%    10.0ns ± 1%     ~     (p=0.648 n=9+9)
GoMapClear/Reflexive/100-16                        31.4ns ± 0%    31.4ns ± 1%     ~     (p=0.305 n=9+10)
GoMapClear/Reflexive/1000-16                        147ns ± 0%     149ns ± 2%   +1.21%  (p=0.000 n=10+10)
GoMapClear/Reflexive/10000-16                      3.99µs ± 0%    4.00µs ± 0%   +0.21%  (p=0.018 n=9+10)
GoMapClear/NonReflexive/1-16                       41.4ns ± 2%    41.7ns ± 1%   +0.55%  (p=0.043 n=9+10)
GoMapClear/NonReflexive/10-16                      50.3ns ± 1%    50.9ns ± 1%   +1.16%  (p=0.000 n=10+10)
GoMapClear/NonReflexive/100-16                      125ns ± 0%     126ns ± 0%   +0.96%  (p=0.000 n=8+10)
GoMapClear/NonReflexive/1000-16                    1.08µs ± 0%    1.08µs ± 1%     ~     (p=0.097 n=10+10)
GoMapClear/NonReflexive/10000-16                   8.18µs ± 2%    8.10µs ± 0%   -0.91%  (p=0.019 n=10+8)
MapStringConversion/32/simple-16                   4.66ns ± 1%    4.69ns ± 3%     ~     (p=0.905 n=9+10)
MapStringConversion/32/struct-16                   4.65ns ± 3%    4.94ns ± 2%   +6.23%  (p=0.000 n=10+10)
MapStringConversion/32/array-16                    4.69ns ± 3%    4.72ns ± 3%     ~     (p=0.631 n=10+10)
MapStringConversion/64/simple-16                   4.14ns ± 0%    4.14ns ± 1%     ~     (p=0.342 n=10+10)
MapStringConversion/64/struct-16                   4.13ns ± 0%    4.13ns ± 0%     ~     (p=0.809 n=10+10)
MapStringConversion/64/array-16                    4.13ns ± 1%    4.13ns ± 1%     ~     (p=0.752 n=10+10)
MapInterfaceString-16                              7.90ns ±23%    8.51ns ±33%     ~     (p=0.604 n=9+10)
MapInterfacePtr-16                                 7.68ns ±29%    7.10ns ±36%     ~     (p=0.353 n=10+10)
NewEmptyMapHintLessThan8-16                        3.70ns ± 0%    3.70ns ± 0%     ~     (p=0.209 n=10+10)
NewEmptyMapHintGreaterThan8-16                      270ns ± 1%     272ns ± 1%   +0.71%  (p=0.005 n=10+9)
MapPop100-16                                       6.45µs ± 0%    6.50µs ± 1%   +0.77%  (p=0.000 n=10+10)
MapPop1000-16                                       114µs ± 1%     114µs ± 1%     ~     (p=0.190 n=10+10)
MapPop10000-16                                     2.28ms ± 2%    2.28ms ± 2%     ~     (p=0.912 n=10+10)
MapAssign/Int32/256-16                             4.75ns ± 2%    4.82ns ± 4%     ~     (p=0.101 n=10+10)
MapAssign/Int32/65536-16                           16.4ns ± 1%    16.7ns ± 0%   +1.44%  (p=0.000 n=10+9)
MapAssign/Int64/256-16                             4.79ns ± 5%    4.79ns ± 1%     ~     (p=0.616 n=10+8)
MapAssign/Int64/65536-16                           17.1ns ± 1%    16.8ns ± 0%   -1.28%  (p=0.000 n=10+8)
MapAssign/Str/256-16                               6.07ns ± 6%    6.24ns ± 2%   +2.84%  (p=0.035 n=10+9)
MapAssign/Str/65536-16                             21.4ns ± 0%    21.4ns ± 3%     ~     (p=0.300 n=7+10)
MapOperatorAssign/Int32/256-16                     4.82ns ± 3%    4.81ns ± 3%     ~     (p=0.684 n=10+10)
MapOperatorAssign/Int32/65536-16                   16.8ns ± 1%    16.5ns ± 1%   -1.68%  (p=0.000 n=9+10)
MapOperatorAssign/Int64/256-16                     4.74ns ± 1%    4.77ns ± 3%     ~     (p=0.563 n=10+9)
MapOperatorAssign/Int64/65536-16                   16.9ns ± 1%    17.2ns ± 1%   +1.88%  (p=0.000 n=10+10)
MapOperatorAssign/Str/256-16                       1.09µs ± 1%    1.10µs ± 2%     ~     (p=0.210 n=10+10)
MapOperatorAssign/Str/65536-16                      184ns ± 9%     184ns ± 8%     ~     (p=0.922 n=10+9)
MapAppendAssign/Int32/256-16                       13.8ns ±10%    14.4ns ±11%     ~     (p=0.190 n=10+10)
MapAppendAssign/Int32/65536-16                     28.9ns ± 5%    30.7ns ± 6%   +6.13%  (p=0.001 n=9+10)
MapAppendAssign/Int64/256-16                       14.5ns ±12%    13.8ns ± 8%   -5.02%  (p=0.037 n=10+10)
MapAppendAssign/Int64/65536-16                     30.9ns ± 1%    30.4ns ± 2%   -1.56%  (p=0.001 n=10+10)
MapAppendAssign/Str/256-16                         30.2ns ± 6%    30.0ns ±10%     ~     (p=0.645 n=10+10)
MapAppendAssign/Str/65536-16                       44.5ns ± 4%    46.8ns ± 3%   +5.17%  (p=0.001 n=8+9)
MapDelete/Int32/100-16                             18.7ns ± 0%    18.7ns ± 0%   -0.27%  (p=0.017 n=10+10)
MapDelete/Int32/1000-16                            17.6ns ± 1%    17.5ns ± 1%   -0.85%  (p=0.000 n=9+10)
MapDelete/Int32/10000-16                           18.7ns ± 0%    18.3ns ± 1%   -1.92%  (p=0.000 n=10+10)
MapDelete/Int64/100-16                             19.1ns ± 0%    19.2ns ± 0%   +0.68%  (p=0.000 n=10+9)
MapDelete/Int64/1000-16                            17.7ns ± 2%    18.3ns ± 1%   +3.00%  (p=0.000 n=10+10)
MapDelete/Int64/10000-16                           18.8ns ± 1%    19.2ns ± 0%   +2.01%  (p=0.000 n=10+9)
MapDelete/Str/100-16                               26.5ns ± 0%    26.4ns ± 1%   -0.73%  (p=0.000 n=10+10)
MapDelete/Str/1000-16                              23.5ns ± 2%    23.4ns ± 1%     ~     (p=0.425 n=10+10)
MapDelete/Str/10000-16                             25.1ns ± 0%    25.1ns ± 1%   +0.28%  (p=0.037 n=10+10)
MapDelete/Pointer/100-16                           20.6ns ± 1%    20.6ns ± 0%     ~     (p=0.117 n=10+10)
MapDelete/Pointer/1000-16                          19.2ns ± 1%    19.4ns ± 1%   +0.97%  (p=0.004 n=10+10)
MapDelete/Pointer/10000-16                         20.0ns ± 0%    20.1ns ± 1%   +0.52%  (p=0.022 n=10+10)
Memmove/0-16                                       0.21ns ± 2%    0.21ns ± 1%     ~     (p=0.671 n=10+10)
Memmove/1-16                                       0.93ns ± 0%    0.93ns ± 0%   +0.21%  (p=0.034 n=10+10)
Memmove/2-16                                       0.93ns ± 0%    0.93ns ± 0%     ~     (p=0.101 n=10+10)
Memmove/3-16                                       0.93ns ± 1%    0.93ns ± 1%   +0.49%  (p=0.004 n=10+10)
Memmove/4-16                                       1.03ns ± 0%    1.03ns ± 0%     ~     (p=0.260 n=10+10)
Memmove/5-16                                       1.13ns ± 0%    1.13ns ± 0%   +0.20%  (p=0.034 n=10+10)
Memmove/6-16                                       1.13ns ± 0%    1.13ns ± 1%     ~     (p=0.126 n=10+10)
Memmove/7-16                                       1.13ns ± 0%    1.13ns ± 1%   +0.22%  (p=0.028 n=10+10)
Memmove/8-16                                       1.13ns ± 0%    1.13ns ± 0%     ~     (p=0.545 n=9+10)
Memmove/9-16                                       1.25ns ± 0%    1.35ns ± 0%   +7.98%  (p=0.000 n=10+10)
Memmove/10-16                                      1.25ns ± 0%    1.35ns ± 0%   +7.96%  (p=0.000 n=9+9)
Memmove/11-16                                      1.25ns ± 0%    1.35ns ± 0%   +8.53%  (p=0.000 n=10+9)
Memmove/12-16                                      1.25ns ± 0%    1.35ns ± 1%   +8.24%  (p=0.000 n=10+10)
Memmove/13-16                                      1.25ns ± 0%    1.34ns ± 0%   +7.75%  (p=0.000 n=10+10)
Memmove/14-16                                      1.25ns ± 0%    1.35ns ± 1%   +8.28%  (p=0.000 n=10+9)
Memmove/15-16                                      1.25ns ± 0%    1.35ns ± 0%   +8.07%  (p=0.000 n=10+9)
Memmove/16-16                                      1.25ns ± 0%    1.35ns ± 1%   +8.35%  (p=0.000 n=9+10)
Memmove/32-16                                      1.34ns ± 0%    1.36ns ± 1%   +1.22%  (p=0.000 n=10+10)
Memmove/64-16                                      1.45ns ± 0%    1.64ns ± 0%  +13.07%  (p=0.000 n=10+9)
Memmove/128-16                                     1.86ns ± 0%    2.02ns ± 0%   +8.64%  (p=0.000 n=10+10)
Memmove/256-16                                     2.47ns ± 0%    2.49ns ± 1%   +1.14%  (p=0.000 n=10+10)
Memmove/512-16                                     3.96ns ± 1%    3.96ns ± 0%     ~     (p=0.182 n=10+10)
Memmove/1024-16                                    5.90ns ± 1%    5.87ns ± 1%     ~     (p=0.258 n=9+9)
Memmove/2048-16                                    9.62ns ± 1%    9.62ns ± 2%     ~     (p=0.963 n=8+9)
Memmove/4096-16                                    16.4ns ± 0%    17.1ns ± 4%   +4.19%  (p=0.003 n=8+9)
MemmoveOverlap/32-16                               1.62ns ± 1%    1.68ns ± 1%   +3.53%  (p=0.000 n=10+10)
MemmoveOverlap/64-16                               1.64ns ± 0%    1.65ns ± 0%   +0.29%  (p=0.002 n=9+9)
MemmoveOverlap/128-16                              2.06ns ± 0%    2.06ns ± 0%     ~     (p=0.070 n=10+10)
MemmoveOverlap/256-16                              2.67ns ± 0%    2.67ns ± 0%   +0.26%  (p=0.012 n=10+10)
MemmoveOverlap/512-16                              6.20ns ±18%    5.74ns ± 0%     ~     (p=0.645 n=10+8)
MemmoveOverlap/1024-16                             7.28ns ± 0%    7.30ns ± 0%   +0.28%  (p=0.006 n=8+10)
MemmoveOverlap/2048-16                             11.9ns ± 0%    12.0ns ± 1%   +0.37%  (p=0.014 n=9+9)
MemmoveOverlap/4096-16                             23.3ns ± 1%    23.1ns ± 1%   -0.84%  (p=0.000 n=8+10)
MemmoveUnalignedDst/0-16                           1.03ns ± 0%    1.03ns ± 0%   +0.19%  (p=0.007 n=10+10)
MemmoveUnalignedDst/1-16                           1.24ns ± 0%    1.25ns ± 1%   +0.52%  (p=0.022 n=10+10)
MemmoveUnalignedDst/2-16                           1.23ns ± 0%    1.23ns ± 0%     ~     (p=0.051 n=10+10)
MemmoveUnalignedDst/3-16                           1.23ns ± 0%    1.23ns ± 0%   +0.14%  (p=0.006 n=9+9)
MemmoveUnalignedDst/4-16                           1.23ns ± 0%    1.24ns ± 1%   +0.37%  (p=0.004 n=10+10)
MemmoveUnalignedDst/5-16                           1.35ns ± 0%    1.35ns ± 0%     ~     (p=0.075 n=10+10)
MemmoveUnalignedDst/6-16                           1.34ns ± 0%    1.34ns ± 0%     ~     (p=0.779 n=10+10)
MemmoveUnalignedDst/7-16                           1.34ns ± 0%    1.34ns ± 0%     ~     (p=1.000 n=10+10)
MemmoveUnalignedDst/8-16                           1.34ns ± 0%    1.35ns ± 1%   +0.39%  (p=0.024 n=10+10)
MemmoveUnalignedDst/9-16                           1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.849 n=10+10)
MemmoveUnalignedDst/10-16                          1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.255 n=10+10)
MemmoveUnalignedDst/11-16                          1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.304 n=10+10)
MemmoveUnalignedDst/12-16                          1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.672 n=10+10)
MemmoveUnalignedDst/13-16                          1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.435 n=10+10)
MemmoveUnalignedDst/14-16                          1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.340 n=10+10)
MemmoveUnalignedDst/15-16                          1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.911 n=10+9)
MemmoveUnalignedDst/16-16                          1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.074 n=10+10)
MemmoveUnalignedDst/32-16                          1.62ns ± 0%    1.63ns ± 0%     ~     (p=0.059 n=10+10)
MemmoveUnalignedDst/64-16                          1.65ns ± 0%    1.65ns ± 0%     ~     (p=0.234 n=10+10)
MemmoveUnalignedDst/128-16                         2.06ns ± 0%    2.06ns ± 0%     ~     (p=0.709 n=10+9)
MemmoveUnalignedDst/256-16                         3.69ns ± 0%    3.70ns ± 0%     ~     (p=0.144 n=10+10)
MemmoveUnalignedDst/512-16                         4.15ns ± 1%    4.14ns ± 0%     ~     (p=0.778 n=10+8)
MemmoveUnalignedDst/1024-16                        7.52ns ± 0%    7.53ns ± 1%     ~     (p=0.650 n=9+9)
MemmoveUnalignedDst/2048-16                        12.9ns ± 0%    12.9ns ± 1%     ~     (p=0.548 n=8+8)
MemmoveUnalignedDst/4096-16                        25.4ns ± 0%    25.4ns ± 0%     ~     (p=0.947 n=9+9)
MemmoveUnalignedDstOverlap/32-16                   4.08ns ± 0%    4.09ns ± 0%     ~     (p=0.360 n=10+10)
MemmoveUnalignedDstOverlap/64-16                   4.56ns ± 0%    4.56ns ± 0%     ~     (p=0.705 n=10+9)
MemmoveUnalignedDstOverlap/128-16                  4.67ns ± 0%    4.67ns ± 0%     ~     (p=0.397 n=10+10)
MemmoveUnalignedDstOverlap/256-16                  5.08ns ± 0%    5.08ns ± 0%     ~     (p=0.159 n=10+9)
MemmoveUnalignedDstOverlap/512-16                  8.45ns ± 5%    8.19ns ± 0%   -3.10%  (p=0.021 n=10+9)
MemmoveUnalignedDstOverlap/1024-16                 9.55ns ± 0%    9.56ns ± 0%     ~     (p=0.221 n=8+8)
MemmoveUnalignedDstOverlap/2048-16                 14.0ns ± 0%    14.0ns ± 1%     ~     (p=0.200 n=10+9)
MemmoveUnalignedDstOverlap/4096-16                 26.5ns ± 0%    26.5ns ± 0%     ~     (p=0.458 n=10+9)
MemmoveUnalignedSrc/0-16                           1.02ns ± 1%    0.99ns ± 1%   -2.67%  (p=0.000 n=10+9)
MemmoveUnalignedSrc/1-16                           1.13ns ± 0%    1.13ns ± 1%   -0.25%  (p=0.027 n=10+9)
MemmoveUnalignedSrc/2-16                           1.13ns ± 1%    1.13ns ± 0%   -0.28%  (p=0.012 n=10+9)
MemmoveUnalignedSrc/3-16                           1.24ns ± 1%    1.23ns ± 0%   -0.25%  (p=0.022 n=9+10)
MemmoveUnalignedSrc/4-16                           1.24ns ± 0%    1.23ns ± 1%     ~     (p=0.118 n=9+10)
MemmoveUnalignedSrc/5-16                           1.34ns ± 0%    1.34ns ± 1%     ~     (p=0.564 n=8+10)
MemmoveUnalignedSrc/6-16                           1.34ns ± 0%    1.34ns ± 0%   -0.39%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/7-16                           1.34ns ± 0%    1.34ns ± 0%     ~     (p=0.235 n=10+10)
MemmoveUnalignedSrc/8-16                           1.34ns ± 0%    1.34ns ± 0%   -0.37%  (p=0.002 n=10+9)
MemmoveUnalignedSrc/9-16                           1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.579 n=10+9)
MemmoveUnalignedSrc/10-16                          1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.534 n=10+9)
MemmoveUnalignedSrc/11-16                          1.44ns ± 0%    1.44ns ± 1%     ~     (p=0.415 n=10+10)
MemmoveUnalignedSrc/12-16                          1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.218 n=10+10)
MemmoveUnalignedSrc/13-16                          1.44ns ± 0%    1.44ns ± 1%     ~     (p=0.693 n=10+10)
MemmoveUnalignedSrc/14-16                          1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.901 n=10+10)
MemmoveUnalignedSrc/15-16                          1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.379 n=10+10)
MemmoveUnalignedSrc/16-16                          1.44ns ± 1%    1.44ns ± 0%     ~     (p=0.538 n=10+10)
MemmoveUnalignedSrc/32-16                          1.60ns ± 1%    1.60ns ± 0%     ~     (p=0.491 n=10+10)
MemmoveUnalignedSrc/64-16                          1.65ns ± 0%    1.65ns ± 0%     ~     (p=0.564 n=10+10)
MemmoveUnalignedSrc/128-16                         2.09ns ± 0%    2.09ns ± 0%     ~     (p=0.497 n=10+9)
MemmoveUnalignedSrc/256-16                         2.70ns ± 0%    2.78ns ± 1%   +2.81%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/512-16                         4.31ns ± 0%    4.30ns ± 0%   -0.26%  (p=0.031 n=8+9)
MemmoveUnalignedSrc/1024-16                        7.28ns ± 0%    7.21ns ± 1%   -1.05%  (p=0.000 n=8+10)
MemmoveUnalignedSrc/2048-16                        13.0ns ± 0%    13.0ns ± 0%     ~     (p=0.180 n=9+8)
MemmoveUnalignedSrc/4096-16                        25.4ns ± 0%    25.3ns ± 1%     ~     (p=0.054 n=10+10)
MemmoveUnalignedSrcOverlap/32-16                   4.04ns ± 0%    4.06ns ± 0%   +0.62%  (p=0.000 n=9+10)
MemmoveUnalignedSrcOverlap/64-16                   4.12ns ± 0%    4.12ns ± 0%     ~     (p=0.421 n=10+10)
MemmoveUnalignedSrcOverlap/128-16                  4.53ns ± 0%    4.52ns ± 0%     ~     (p=0.251 n=10+10)
MemmoveUnalignedSrcOverlap/256-16                  6.17ns ± 0%    6.15ns ± 0%   -0.35%  (p=0.000 n=10+9)
MemmoveUnalignedSrcOverlap/512-16                  7.43ns ± 0%    7.44ns ± 0%     ~     (p=0.524 n=9+8)
MemmoveUnalignedSrcOverlap/1024-16                 8.94ns ± 0%    8.94ns ± 0%     ~     (p=0.419 n=8+8)
MemmoveUnalignedSrcOverlap/2048-16                 13.2ns ± 0%    14.5ns ±21%     ~     (p=0.107 n=8+10)
MemmoveUnalignedSrcOverlap/4096-16                 25.6ns ± 0%    25.6ns ± 1%     ~     (p=0.650 n=9+9)
Memclr/5-16                                        0.86ns ± 1%    0.86ns ± 2%     ~     (p=0.531 n=9+9)
Memclr/16-16                                       1.04ns ± 0%    1.04ns ± 0%   +0.32%  (p=0.013 n=9+10)
Memclr/64-16                                       1.23ns ± 0%    1.26ns ± 0%   +2.28%  (p=0.000 n=10+10)
Memclr/256-16                                      2.27ns ± 0%    2.27ns ± 0%     ~     (p=0.127 n=10+10)
Memclr/4096-16                                     17.1ns ± 1%    17.3ns ± 0%   +0.88%  (p=0.000 n=10+10)
Memclr/65536-16                                     821ns ± 0%     822ns ± 0%     ~     (p=0.516 n=10+10)
Memclr/1M-16                                       14.1µs ± 1%    14.0µs ± 1%     ~     (p=0.516 n=10+10)
Memclr/4M-16                                       86.1µs ± 1%    85.9µs ± 0%     ~     (p=0.123 n=10+10)
Memclr/8M-16                                        174µs ± 2%     173µs ± 0%     ~     (p=0.408 n=10+8)
Memclr/16M-16                                       385µs ± 4%     387µs ± 0%     ~     (p=0.173 n=10+8)
Memclr/64M-16                                      2.18ms ± 0%    2.19ms ± 0%     ~     (p=0.113 n=10+9)
GoMemclr/5-16                                      0.82ns ± 0%    0.82ns ± 0%     ~     (p=0.346 n=9+10)
GoMemclr/16-16                                     1.02ns ± 0%    1.02ns ± 0%   +0.22%  (p=0.003 n=10+8)
GoMemclr/64-16                                     1.14ns ± 0%    1.14ns ± 0%     ~     (p=0.948 n=10+9)
GoMemclr/256-16                                    2.06ns ± 0%    2.06ns ± 0%     ~     (p=0.868 n=10+10)
MemclrRange/1K_2K-16                                457ns ± 0%     428ns ± 1%   -6.38%  (p=0.000 n=10+10)
MemclrRange/2K_8K-16                               1.46µs ± 0%    1.46µs ± 0%     ~     (p=0.700 n=10+10)
MemclrRange/4K_16K-16                              1.16µs ± 0%    1.16µs ± 0%     ~     (p=0.567 n=9+10)
MemclrRange/160K_228K-16                           20.7µs ± 0%    20.7µs ± 0%     ~     (p=0.160 n=10+10)
ClearFat7-16                                       0.38ns ± 5%    0.21ns ± 1%  -45.79%  (p=0.000 n=9+10)
ClearFat8-16                                       0.21ns ± 3%    0.12ns ± 2%  -44.16%  (p=0.000 n=8+9)
ClearFat11-16                                      0.35ns ± 3%    0.21ns ± 1%  -40.46%  (p=0.000 n=9+9)
ClearFat12-16                                      0.23ns ± 9%    0.21ns ± 1%  -10.23%  (p=0.000 n=10+9)
ClearFat13-16                                      0.22ns ± 6%    0.21ns ± 2%   -6.53%  (p=0.000 n=10+10)
ClearFat14-16                                      0.22ns ± 4%    0.21ns ± 1%   -5.97%  (p=0.000 n=10+10)
ClearFat15-16                                      0.22ns ± 4%    0.21ns ± 1%   -6.96%  (p=0.000 n=10+9)
ClearFat16-16                                      0.19ns ± 9%    0.12ns ± 6%  -34.89%  (p=0.000 n=9+10)
ClearFat24-16                                      0.23ns ± 6%    0.21ns ± 1%  -10.26%  (p=0.000 n=10+9)
ClearFat32-16                                      0.22ns ± 5%    0.21ns ± 2%   -5.31%  (p=0.000 n=10+10)
ClearFat40-16                                      0.34ns ± 4%    0.62ns ± 1%  +83.00%  (p=0.000 n=10+10)
ClearFat48-16                                      0.33ns ± 2%    0.41ns ± 0%  +26.71%  (p=0.000 n=10+10)
ClearFat56-16                                      0.41ns ± 1%    0.41ns ± 0%     ~     (p=0.838 n=10+10)
ClearFat64-16                                      0.41ns ± 0%    0.41ns ± 0%     ~     (p=0.178 n=10+8)
ClearFat72-16                                      0.82ns ± 0%    0.82ns ± 0%     ~     (p=0.669 n=10+10)
ClearFat128-16                                     1.04ns ± 0%    1.04ns ± 0%     ~     (p=0.679 n=10+10)
ClearFat256-16                                     1.86ns ± 0%    1.86ns ± 0%     ~     (p=0.066 n=9+10)
ClearFat512-16                                     3.50ns ± 0%    3.50ns ± 0%     ~     (p=0.626 n=10+10)
ClearFat1024-16                                    6.79ns ± 0%    6.79ns ± 0%     ~     (p=0.986 n=10+10)
ClearFat1032-16                                    13.6ns ± 0%    13.6ns ± 0%   +0.13%  (p=0.044 n=10+10)
ClearFat1040-16                                    10.3ns ± 0%    10.3ns ± 0%     ~     (p=0.175 n=10+9)
CopyFat7-16                                        0.37ns ±13%    0.25ns ± 1%  -31.74%  (p=0.000 n=10+9)
CopyFat8-16                                        0.17ns ± 1%    0.17ns ± 2%   +1.35%  (p=0.004 n=9+9)
CopyFat11-16                                       0.26ns ± 1%    0.30ns ± 3%  +12.58%  (p=0.000 n=9+10)
CopyFat12-16                                       0.28ns ± 2%    0.26ns ± 1%   -5.66%  (p=0.000 n=9+9)
CopyFat13-16                                       0.26ns ± 0%    0.28ns ± 4%   +7.35%  (p=0.000 n=8+10)
CopyFat14-16                                       0.29ns ± 6%    0.26ns ± 2%  -10.46%  (p=0.000 n=10+9)
CopyFat15-16                                       0.26ns ± 1%    0.30ns ± 6%  +14.12%  (p=0.000 n=8+10)
CopyFat16-16                                       0.21ns ± 1%    0.21ns ± 0%     ~     (p=0.426 n=8+8)
CopyFat24-16                                       0.29ns ± 3%    0.25ns ± 1%  -12.27%  (p=0.000 n=9+10)
CopyFat32-16                                       0.26ns ± 4%    0.29ns ± 4%  +11.71%  (p=0.000 n=10+10)
CopyFat64-16                                       0.46ns ± 8%    0.42ns ± 1%   -8.37%  (p=0.002 n=10+10)
CopyFat72-16                                       0.82ns ± 0%    0.82ns ± 0%     ~     (p=0.563 n=10+10)
CopyFat128-16                                      1.53ns ± 0%    1.54ns ± 0%   +0.62%  (p=0.000 n=10+10)
CopyFat256-16                                      2.68ns ± 0%    2.65ns ± 1%   -1.23%  (p=0.000 n=10+10)
CopyFat512-16                                      4.93ns ± 1%    5.19ns ± 3%   +5.16%  (p=0.000 n=9+9)
CopyFat520-16                                      6.99ns ± 0%    6.99ns ± 0%     ~     (p=0.539 n=10+10)
CopyFat1024-16                                     11.5ns ± 1%     9.8ns ± 1%  -14.98%  (p=0.000 n=9+10)
CopyFat1032-16                                     13.6ns ± 0%    13.6ns ± 0%     ~     (p=0.728 n=10+10)
CopyFat1040-16                                     11.0ns ± 0%    11.1ns ± 0%   +0.53%  (p=0.000 n=10+10)
Issue18740/2byte-16                                10.1µs ± 0%    10.1µs ± 0%     ~     (p=0.342 n=10+10)
Issue18740/4byte-16                                2.34µs ± 0%    2.35µs ± 0%   +0.30%  (p=0.002 n=10+8)
Issue18740/8byte-16                                1.28µs ± 0%    1.28µs ± 0%   +0.32%  (p=0.000 n=9+10)
Finalizer-16                                        345µs ± 1%     336µs ± 0%   -2.55%  (p=0.000 n=10+9)
FinalizerRun-16                                     450ns ± 3%     420ns ± 1%   -6.65%  (p=0.000 n=10+10)
PallocBitsSummarize/Unpacked00-16                  2.88ns ± 0%    2.88ns ± 0%     ~     (p=0.358 n=10+10)
PallocBitsSummarize/UnpackedFFFFFFFFFFFFFFFF-16    15.2ns ± 0%    15.2ns ± 0%     ~     (p=0.925 n=10+10)
PallocBitsSummarize/UnpackedAA-16                  16.4ns ± 0%    16.3ns ± 0%     ~     (p=0.113 n=9+9)
PallocBitsSummarize/UnpackedAAAAAAAAAAAAAAAA-16    16.5ns ± 0%    16.6ns ± 0%     ~     (p=0.238 n=10+10)
PallocBitsSummarize/Unpacked80000000AAAAAAAA-16    37.8ns ± 1%    36.4ns ± 0%   -3.70%  (p=0.000 n=10+9)
PallocBitsSummarize/UnpackedAAAAAAAA00000001-16    41.8ns ± 1%    39.9ns ± 0%   -4.68%  (p=0.000 n=9+10)
PallocBitsSummarize/UnpackedBBBBBBBBBBBBBBBB-16    18.3ns ± 0%    18.3ns ± 0%     ~     (p=0.781 n=10+10)
PallocBitsSummarize/Unpacked80000000BBBBBBBB-16    38.8ns ± 1%    38.1ns ± 0%   -1.78%  (p=0.000 n=9+10)
PallocBitsSummarize/UnpackedBBBBBBBB00000001-16    37.5ns ± 0%    36.1ns ± 1%   -3.88%  (p=0.000 n=8+10)
PallocBitsSummarize/UnpackedCCCCCCCCCCCCCCCC-16    21.8ns ± 0%    21.9ns ± 0%   +0.20%  (p=0.018 n=10+9)
PallocBitsSummarize/Unpacked4444444444444444-16    21.8ns ± 0%    21.9ns ± 0%   +0.20%  (p=0.029 n=10+9)
PallocBitsSummarize/Unpacked4040404040404040-16    26.5ns ± 0%    26.5ns ± 0%   -0.24%  (p=0.001 n=9+10)
PallocBitsSummarize/Unpacked4000400040004000-16    33.4ns ± 1%    31.3ns ± 0%   -6.20%  (p=0.000 n=9+10)
PallocBitsSummarize/Unpacked1000404044CCAAFF-16    36.4ns ± 1%    35.9ns ± 0%   -1.50%  (p=0.000 n=10+10)
FindBitRange64/Pattern00Size2-16                   0.34ns ± 1%    0.35ns ± 1%   +3.80%  (p=0.000 n=10+9)
FindBitRange64/Pattern00Size8-16                   0.70ns ± 1%    0.70ns ± 0%   -0.68%  (p=0.000 n=10+10)
FindBitRange64/Pattern00Size32-16                  0.70ns ± 1%    0.69ns ± 0%   -0.86%  (p=0.001 n=10+8)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize2-16     0.34ns ± 1%    0.35ns ± 1%   +4.45%  (p=0.000 n=9+8)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize8-16     1.54ns ± 0%    1.54ns ± 1%     ~     (p=0.914 n=9+9)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize32-16    2.78ns ± 0%    2.78ns ± 0%     ~     (p=0.295 n=9+10)
FindBitRange64/PatternAASize2-16                   0.34ns ± 2%    0.35ns ± 2%   +4.61%  (p=0.000 n=10+10)
FindBitRange64/PatternAASize8-16                   0.70ns ± 1%    0.70ns ± 1%   -0.82%  (p=0.005 n=10+10)
FindBitRange64/PatternAASize32-16                  0.70ns ± 1%    0.70ns ± 0%   -0.73%  (p=0.003 n=10+9)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize2-16     0.34ns ± 2%    0.35ns ± 2%   +3.94%  (p=0.000 n=10+10)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize8-16     0.70ns ± 1%    0.70ns ± 1%   -0.67%  (p=0.025 n=10+10)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize32-16    0.70ns ± 1%    0.70ns ± 1%     ~     (p=0.118 n=9+10)
FindBitRange64/Pattern80000000AAAAAAAASize2-16     0.34ns ± 1%    0.35ns ± 2%   +3.72%  (p=0.000 n=10+9)
FindBitRange64/Pattern80000000AAAAAAAASize8-16     0.70ns ± 1%    0.70ns ± 0%     ~     (p=0.102 n=10+10)
FindBitRange64/Pattern80000000AAAAAAAASize32-16    0.70ns ± 1%    0.70ns ± 1%   -0.55%  (p=0.011 n=10+10)
FindBitRange64/PatternAAAAAAAA00000001Size2-16     0.34ns ± 2%    0.35ns ± 1%   +3.83%  (p=0.000 n=10+9)
FindBitRange64/PatternAAAAAAAA00000001Size8-16     0.70ns ± 1%    0.70ns ± 1%     ~     (p=0.065 n=10+10)
FindBitRange64/PatternAAAAAAAA00000001Size32-16    0.70ns ± 1%    0.70ns ± 1%   -0.95%  (p=0.002 n=10+10)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize2-16     0.34ns ± 0%    0.35ns ± 1%   +4.12%  (p=0.000 n=8+10)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize8-16     1.24ns ± 0%    1.23ns ± 0%   -0.30%  (p=0.002 n=10+9)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize32-16    1.24ns ± 0%    1.24ns ± 0%   -0.17%  (p=0.023 n=9+10)
FindBitRange64/Pattern80000000BBBBBBBBSize2-16     0.34ns ± 1%    0.35ns ± 2%   +4.82%  (p=0.000 n=9+10)
FindBitRange64/Pattern80000000BBBBBBBBSize8-16     1.24ns ± 1%    1.24ns ± 0%     ~     (p=0.063 n=10+10)
FindBitRange64/Pattern80000000BBBBBBBBSize32-16    1.24ns ± 0%    1.24ns ± 0%     ~     (p=0.164 n=9+10)
FindBitRange64/PatternBBBBBBBB00000001Size2-16     0.34ns ± 1%    0.35ns ± 1%   +4.38%  (p=0.000 n=8+10)
FindBitRange64/PatternBBBBBBBB00000001Size8-16     1.24ns ± 1%    1.24ns ± 0%     ~     (p=0.052 n=10+10)
FindBitRange64/PatternBBBBBBBB00000001Size32-16    1.24ns ± 0%    1.23ns ± 0%   -0.40%  (p=0.000 n=10+10)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize2-16     0.34ns ± 0%    0.35ns ± 2%   +3.96%  (p=0.000 n=9+10)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize8-16     1.24ns ± 0%    1.23ns ± 0%   -0.30%  (p=0.000 n=10+9)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize32-16    1.24ns ± 0%    1.24ns ± 1%     ~     (p=0.284 n=10+10)
FindBitRange64/Pattern4444444444444444Size2-16     0.34ns ± 1%    0.35ns ± 1%   +3.91%  (p=0.000 n=9+9)
FindBitRange64/Pattern4444444444444444Size8-16     0.70ns ± 1%    0.70ns ± 1%     ~     (p=0.617 n=10+10)
FindBitRange64/Pattern4444444444444444Size32-16    0.70ns ± 1%    0.70ns ± 1%   -0.60%  (p=0.006 n=10+10)
FindBitRange64/Pattern4040404040404040Size2-16     0.34ns ± 2%    0.35ns ± 2%   +3.67%  (p=0.000 n=10+10)
FindBitRange64/Pattern4040404040404040Size8-16     0.70ns ± 2%    0.70ns ± 1%   -0.87%  (p=0.014 n=10+10)
FindBitRange64/Pattern4040404040404040Size32-16    0.70ns ± 1%    0.70ns ± 1%     ~     (p=0.256 n=10+10)
FindBitRange64/Pattern4000400040004000Size2-16     0.34ns ± 2%    0.35ns ± 3%   +4.71%  (p=0.000 n=10+10)
FindBitRange64/Pattern4000400040004000Size8-16     0.70ns ± 1%    0.70ns ± 1%     ~     (p=0.393 n=10+10)
FindBitRange64/Pattern4000400040004000Size32-16    0.70ns ± 1%    0.70ns ± 1%   -0.86%  (p=0.014 n=10+10)
NetpollBreak-16                                    1.49µs ± 1%    1.50µs ± 3%     ~     (p=0.181 n=8+10)
Syscall-16                                         3.68ns ± 1%    3.66ns ± 2%     ~     (p=0.148 n=10+10)
SyscallWork-16                                     5.15ns ± 1%    5.13ns ± 0%     ~     (p=0.188 n=10+9)
SyscallExcess-16                                   3.89ns ± 2%    3.83ns ± 1%   -1.52%  (p=0.001 n=10+10)
SyscallExcessWork-16                               5.34ns ± 1%    5.31ns ± 0%   -0.64%  (p=0.000 n=10+9)
PingPongHog-16                                      397ns ± 7%     394ns ±11%     ~     (p=0.912 n=10+10)
StackGrowth-16                                     67.9ns ± 0%    68.8ns ± 0%   +1.28%  (p=0.000 n=10+8)
StackGrowthDeep-16                                 7.70µs ± 1%    8.48µs ± 2%  +10.06%  (p=0.000 n=9+10)
CreateGoroutines-16                                 124ns ± 1%     124ns ± 1%     ~     (p=0.254 n=10+10)
CreateGoroutinesParallel-16                        25.7ns ± 1%    27.6ns ± 2%   +7.51%  (p=0.000 n=10+10)
CreateGoroutinesCapture-16                          823ns ± 1%     821ns ± 2%     ~     (p=0.699 n=10+10)
CreateGoroutinesSingle-16                           175ns ± 3%     172ns ± 3%   -1.90%  (p=0.011 n=10+10)
ClosureCall-16                                     0.11ns ± 7%    0.12ns ± 3%     ~     (p=0.842 n=9+10)
WakeupParallelSpinning/0s-16                       11.4µs ± 0%    11.4µs ± 0%     ~     (p=0.325 n=9+10)
WakeupParallelSpinning/1µs-16                      15.4µs ± 0%    15.4µs ± 1%     ~     (p=0.955 n=10+10)
WakeupParallelSpinning/2µs-16                      18.7µs ± 2%    18.9µs ± 2%     ~     (p=0.052 n=10+10)
WakeupParallelSpinning/5µs-16                      30.7µs ± 0%    30.7µs ± 0%   -0.03%  (p=0.003 n=10+10)
WakeupParallelSpinning/10µs-16                     48.8µs ± 0%    48.8µs ± 0%     ~     (p=0.670 n=10+10)
WakeupParallelSpinning/20µs-16                     90.8µs ± 0%    90.8µs ± 0%   -0.02%  (p=0.004 n=10+10)
WakeupParallelSpinning/50µs-16                      211µs ± 0%     211µs ± 0%     ~     (p=0.194 n=10+10)
WakeupParallelSpinning/100µs-16                     323µs ± 0%     323µs ± 0%     ~     (p=1.000 n=10+9)
WakeupParallelSyscall/0s-16                         118µs ± 0%     118µs ± 0%     ~     (p=0.447 n=10+9)
WakeupParallelSyscall/1µs-16                        119µs ± 2%     119µs ± 1%     ~     (p=0.604 n=10+9)
WakeupParallelSyscall/2µs-16                        120µs ± 1%     121µs ± 3%     ~     (p=0.263 n=8+10)
WakeupParallelSyscall/5µs-16                        126µs ± 2%     126µs ± 2%     ~     (p=0.510 n=10+9)
WakeupParallelSyscall/10µs-16                       136µs ± 1%     137µs ± 1%     ~     (p=0.095 n=9+10)
WakeupParallelSyscall/20µs-16                       156µs ± 2%     157µs ± 3%     ~     (p=0.604 n=10+9)
WakeupParallelSyscall/50µs-16                       221µs ± 1%     220µs ± 1%     ~     (p=0.063 n=10+10)
WakeupParallelSyscall/100µs-16                      326µs ± 0%     325µs ± 0%   -0.26%  (p=0.003 n=9+10)
Matmult-16                                         0.67ns ± 2%    0.66ns ± 2%     ~     (p=0.256 n=10+10)
Fastrand-16                                        0.08ns ±11%    0.08ns ±13%     ~     (p=0.661 n=9+10)
Fastrand64-16                                      0.08ns ±11%    0.08ns ± 6%     ~     (p=0.631 n=10+10)
FastrandHashiter-16                                1.76ns ± 1%    1.76ns ± 1%     ~     (p=0.854 n=8+8)
Fastrandn/2-16                                     0.86ns ± 1%    0.86ns ± 1%   +1.09%  (p=0.000 n=10+9)
Fastrandn/3-16                                     0.85ns ± 1%    0.86ns ± 1%   +1.23%  (p=0.001 n=10+10)
Fastrandn/4-16                                     0.85ns ± 1%    0.87ns ± 2%   +1.60%  (p=0.000 n=10+10)
Fastrandn/5-16                                     0.85ns ± 1%    0.86ns ± 1%   +1.05%  (p=0.000 n=10+10)
IfaceCmp100-16                                     46.6ns ± 0%    46.1ns ± 0%   -1.18%  (p=0.000 n=10+10)
IfaceCmpNil100-16                                  26.8ns ± 0%    26.8ns ± 0%     ~     (p=0.777 n=10+8)
EfaceCmpDiff-16                                     132ns ± 0%     130ns ± 0%   -0.95%  (p=0.000 n=10+9)
EfaceCmpDiffIndirect-16                             209ns ± 0%     211ns ± 0%   +1.14%  (p=0.000 n=10+9)
Defer-16                                           3.40ns ± 1%    3.04ns ± 0%  -10.67%  (p=0.000 n=10+10)
Defer10-16                                         29.4ns ± 2%    27.2ns ± 3%   -7.26%  (p=0.000 n=10+10)
DeferMany-16                                        110ns ± 6%     113ns ± 2%   +3.45%  (p=0.017 n=9+9)
PanicRecover-16                                    67.6ns ± 0%    67.7ns ± 2%     ~     (p=0.436 n=9+9)
GoroutineProfile/small-nil/idle-16                 3.90µs ± 4%    3.86µs ± 2%     ~     (p=0.305 n=10+9)
GoroutineProfile/small-nil/loaded-16               4.82µs ± 6%    4.82µs ± 4%     ~     (p=0.905 n=10+9)
GoroutineProfile/small/idle-16                      103µs ± 3%     102µs ± 3%     ~     (p=0.113 n=9+9)
GoroutineProfile/small/loaded-16                    432µs ± 5%     440µs ±13%     ~     (p=0.604 n=9+10)
GoroutineProfile/large-nil/idle-16                 3.86µs ± 3%    3.82µs ± 3%     ~     (p=0.210 n=10+10)
GoroutineProfile/large-nil/loaded-16               4.90µs ± 2%    4.90µs ± 5%     ~     (p=0.780 n=10+9)
GoroutineProfile/large/idle-16                     2.58ms ± 1%    2.52ms ± 1%   -2.38%  (p=0.000 n=10+10)
GoroutineProfile/large/loaded-16                   8.62ms ± 9%    8.90ms ±11%     ~     (p=0.400 n=9+10)
GoroutineProfile/sparse-nil/idle-16                3.85µs ± 4%    3.81µs ± 3%     ~     (p=0.470 n=10+10)
GoroutineProfile/sparse-nil/loaded-16              4.82µs ± 4%    4.69µs ± 5%     ~     (p=0.052 n=10+10)
GoroutineProfile/sparse/idle-16                     102µs ± 4%     102µs ± 2%     ~     (p=0.497 n=10+9)
GoroutineProfile/sparse/loaded-16                   438µs ± 7%     437µs ± 6%     ~     (p=0.796 n=10+10)
RWMutexUncontended-16                              6.79ns ± 0%    6.78ns ± 0%     ~     (p=0.228 n=10+8)
RWMutexWrite100-16                                 85.4ns ± 0%    87.1ns ± 0%   +2.00%  (p=0.000 n=10+8)
RWMutexWrite10-16                                   168ns ±25%     152ns ±11%     ~     (p=0.063 n=10+10)
RWMutexWorkWrite100-16                              106ns ± 0%     106ns ± 3%     ~     (p=0.136 n=10+10)
RWMutexWorkWrite10-16                               567ns ± 3%     571ns ± 1%     ~     (p=0.326 n=10+9)
SemTable/OneAddrCollision/n=1000-16                15.9µs ± 1%    16.0µs ± 1%   +0.50%  (p=0.031 n=9+9)
SemTable/ManyAddrCollision/n=1000-16               56.2µs ± 1%    56.8µs ± 1%   +1.06%  (p=0.000 n=10+10)
SemTable/OneAddrCollision/n=2000-16                32.6µs ± 2%    32.9µs ± 4%     ~     (p=0.156 n=10+9)
SemTable/ManyAddrCollision/n=2000-16                118µs ± 0%     119µs ± 0%   +0.75%  (p=0.000 n=9+10)
SemTable/OneAddrCollision/n=4000-16                65.3µs ± 1%    65.6µs ± 3%     ~     (p=0.497 n=9+10)
SemTable/ManyAddrCollision/n=4000-16                245µs ± 0%     248µs ± 2%   +1.36%  (p=0.000 n=9+10)
SemTable/OneAddrCollision/n=8000-16                 131µs ± 1%     130µs ± 1%   -1.01%  (p=0.002 n=9+10)
SemTable/ManyAddrCollision/n=8000-16                503µs ± 1%     508µs ± 0%   +0.97%  (p=0.000 n=10+10)
MakeSliceCopy/mallocmove/Byte-16                   67.6ns ± 1%    64.1ns ± 2%   -5.20%  (p=0.000 n=10+10)
MakeSliceCopy/mallocmove/Int-16                    65.0ns ± 7%    61.7ns ± 4%   -5.08%  (p=0.009 n=10+10)
MakeSliceCopy/mallocmove/Ptr-16                    88.1ns ± 1%    79.9ns ± 1%   -9.29%  (p=0.000 n=10+10)
MakeSliceCopy/makecopy/Byte-16                     65.2ns ± 6%    63.4ns ± 0%     ~     (p=0.500 n=10+8)
MakeSliceCopy/makecopy/Int-16                      63.2ns ± 1%    64.1ns ± 1%   +1.34%  (p=0.001 n=9+9)
MakeSliceCopy/makecopy/Ptr-16                      88.1ns ± 1%    80.1ns ± 1%   -9.09%  (p=0.000 n=10+10)
MakeSliceCopy/nilappend/Byte-16                    69.8ns ± 1%    65.7ns ± 3%   -5.80%  (p=0.000 n=10+10)
MakeSliceCopy/nilappend/Int-16                     69.6ns ± 2%    67.2ns ± 1%   -3.50%  (p=0.000 n=10+9)
MakeSliceCopy/nilappend/Ptr-16                     91.5ns ± 1%    83.8ns ± 1%   -8.42%  (p=0.000 n=9+10)
MakeSlice/Byte-16                                  6.64ns ± 3%    6.58ns ± 2%     ~     (p=0.393 n=10+10)
MakeSlice/Int16-16                                 8.60ns ± 1%    8.38ns ± 3%   -2.48%  (p=0.001 n=9+10)
MakeSlice/Int-16                                   17.7ns ± 3%    16.9ns ± 1%   -4.67%  (p=0.000 n=10+9)
MakeSlice/Ptr-16                                   24.0ns ± 3%    23.3ns ± 2%   -3.25%  (p=0.000 n=10+9)
MakeSlice/Struct/24-16                             34.1ns ± 1%    32.0ns ± 1%   -6.11%  (p=0.000 n=10+10)
MakeSlice/Struct/32-16                             39.1ns ± 4%    38.2ns ± 1%     ~     (p=0.829 n=10+8)
MakeSlice/Struct/40-16                             47.0ns ± 5%    43.0ns ± 2%   -8.55%  (p=0.000 n=10+9)
GrowSlice/Byte-16                                  15.3ns ± 3%    15.0ns ± 2%   -1.75%  (p=0.005 n=9+9)
GrowSlice/Int16-16                                 18.9ns ± 2%    18.4ns ± 2%   -2.71%  (p=0.000 n=10+9)
GrowSlice/Int-16                                   33.9ns ± 1%    32.2ns ± 1%   -4.89%  (p=0.000 n=10+9)
GrowSlice/Ptr-16                                   45.3ns ± 2%    43.5ns ± 1%   -4.12%  (p=0.000 n=10+10)
GrowSlice/Struct/24-16                             61.9ns ± 2%    60.0ns ± 4%   -3.10%  (p=0.002 n=10+10)
GrowSlice/Struct/32-16                             79.9ns ± 2%    72.3ns ± 3%   -9.58%  (p=0.000 n=8+10)
GrowSlice/Struct/40-16                             97.1ns ± 7%    88.8ns ± 5%   -8.49%  (p=0.000 n=10+10)
ExtendSlice/IntSlice-16                            21.1ns ± 2%    20.3ns ± 2%   -3.71%  (p=0.000 n=10+10)
ExtendSlice/PointerSlice-16                        26.8ns ± 2%    26.3ns ± 2%   -1.86%  (p=0.004 n=10+10)
ExtendSlice/NoGrow-16                              1.23ns ± 0%    1.30ns ± 1%   +5.03%  (p=0.000 n=10+10)
Append-16                                          4.58ns ± 1%    4.53ns ± 0%   -1.11%  (p=0.000 n=10+10)
AppendGrowByte-16                                  1.46ms ± 8%    1.42ms ± 7%   -3.24%  (p=0.035 n=10+10)
AppendGrowString-16                                27.8ms ± 4%    27.2ms ± 5%     ~     (p=0.052 n=10+10)
AppendSlice/1Bytes-16                              1.03ns ± 1%    1.04ns ± 1%     ~     (p=0.303 n=10+10)
AppendSlice/4Bytes-16                              1.04ns ± 0%    1.05ns ± 0%   +0.79%  (p=0.000 n=9+10)
AppendSlice/7Bytes-16                              1.23ns ± 1%    1.24ns ± 0%   +0.45%  (p=0.001 n=10+10)
AppendSlice/8Bytes-16                              1.24ns ± 0%    1.24ns ± 0%     ~     (p=0.183 n=10+10)
AppendSlice/15Bytes-16                             1.37ns ± 1%    1.43ns ± 1%   +3.88%  (p=0.000 n=10+10)
AppendSlice/16Bytes-16                             1.37ns ± 1%    1.42ns ± 1%   +3.63%  (p=0.000 n=9+10)
AppendSlice/32Bytes-16                             1.44ns ± 0%    1.47ns ± 1%   +1.83%  (p=0.000 n=10+10)
AppendSliceLarge/1024Bytes-16                       257ns ± 2%     234ns ± 1%   -8.96%  (p=0.000 n=8+9)
AppendSliceLarge/4096Bytes-16                       871ns ± 6%     812ns ± 1%   -6.80%  (p=0.000 n=10+10)
AppendSliceLarge/16384Bytes-16                     3.15µs ± 6%    3.04µs ± 5%     ~     (p=0.052 n=10+10)
AppendSliceLarge/65536Bytes-16                     10.7µs ± 7%    10.8µs ± 2%     ~     (p=0.278 n=10+9)
AppendSliceLarge/262144Bytes-16                    42.9µs ± 1%    39.6µs ± 5%   -7.75%  (p=0.000 n=9+10)
AppendSliceLarge/1048576Bytes-16                    147µs ± 4%     144µs ± 4%   -2.21%  (p=0.035 n=10+10)
AppendStr/1Bytes-16                                1.20ns ± 0%    1.20ns ± 0%     ~     (p=0.755 n=10+10)
AppendStr/4Bytes-16                                1.13ns ± 0%    1.14ns ± 1%   +1.20%  (p=0.000 n=10+10)
AppendStr/8Bytes-16                                1.24ns ± 0%    1.25ns ± 0%   +0.93%  (p=0.000 n=10+10)
AppendStr/16Bytes-16                               1.40ns ± 0%    1.42ns ± 0%   +2.10%  (p=0.000 n=9+10)
AppendStr/32Bytes-16                               1.44ns ± 0%    1.45ns ± 0%   +0.99%  (p=0.000 n=10+10)
AppendSpecialCase-16                               8.64ns ± 1%    8.89ns ± 2%   +2.90%  (p=0.000 n=10+10)
Copy/1Byte-16                                      1.24ns ± 1%    1.24ns ± 0%   -0.28%  (p=0.000 n=10+6)
Copy/1String-16                                    1.24ns ± 0%    1.23ns ± 0%     ~     (p=0.160 n=10+10)
Copy/2Byte-16                                      1.24ns ± 0%    1.24ns ± 0%     ~     (p=0.115 n=10+10)
Copy/2String-16                                    1.24ns ± 0%    1.24ns ± 1%     ~     (p=0.954 n=10+10)
Copy/4Byte-16                                      1.24ns ± 0%    1.24ns ± 0%   -0.44%  (p=0.001 n=10+10)
Copy/4String-16                                    1.23ns ± 0%    1.23ns ± 0%     ~     (p=0.081 n=10+10)
Copy/8Byte-16                                      1.37ns ± 0%    1.34ns ± 0%   -1.79%  (p=0.000 n=9+9)
Copy/8String-16                                    1.34ns ± 0%    1.34ns ± 0%   -0.58%  (p=0.000 n=9+10)
Copy/12Byte-16                                     1.44ns ± 0%    1.44ns ± 0%     ~     (p=0.149 n=9+9)
Copy/12String-16                                   1.44ns ± 0%    1.45ns ± 0%     ~     (p=0.124 n=9+9)
Copy/16Byte-16                                     1.44ns ± 0%    1.44ns ± 0%   -0.19%  (p=0.004 n=10+9)
Copy/16String-16                                   1.44ns ± 0%    1.45ns ± 0%   +0.30%  (p=0.008 n=10+10)
Copy/32Byte-16                                     1.63ns ± 1%    1.62ns ± 1%   -0.72%  (p=0.002 n=10+10)
Copy/32String-16                                   1.60ns ± 1%    1.64ns ± 0%   +2.23%  (p=0.000 n=10+10)
Copy/128Byte-16                                    2.06ns ± 0%    2.06ns ± 0%     ~     (p=0.757 n=9+10)
Copy/128String-16                                  2.07ns ± 0%    2.07ns ± 0%   +0.36%  (p=0.004 n=10+10)
Copy/1024Byte-16                                   6.07ns ± 2%    6.00ns ± 1%   -1.20%  (p=0.000 n=9+10)
Copy/1024String-16                                 6.05ns ± 0%    5.95ns ± 1%   -1.54%  (p=0.000 n=10+9)
AppendInPlace/NoGrow/Byte-16                        288ns ± 1%     284ns ± 1%   -1.58%  (p=0.000 n=10+10)
AppendInPlace/NoGrow/1Ptr-16                        844ns ± 1%     809ns ± 3%   -4.13%  (p=0.000 n=9+10)
AppendInPlace/NoGrow/2Ptr-16                       1.47µs ± 1%    1.46µs ± 1%     ~     (p=0.388 n=9+10)
AppendInPlace/NoGrow/3Ptr-16                       1.87µs ± 7%    1.91µs ± 1%     ~     (p=0.166 n=10+8)
AppendInPlace/NoGrow/4Ptr-16                       2.66µs ± 1%    2.67µs ± 3%     ~     (p=0.968 n=9+10)
AppendInPlace/Grow/Byte-16                          126ns ± 2%     121ns ± 2%   -4.06%  (p=0.000 n=10+10)
AppendInPlace/Grow/1Ptr-16                          132ns ± 2%     127ns ± 2%   -4.28%  (p=0.000 n=10+9)
AppendInPlace/Grow/2Ptr-16                          196ns ± 2%     188ns ± 1%   -4.20%  (p=0.000 n=10+8)
AppendInPlace/Grow/3Ptr-16                          264ns ± 1%     260ns ± 1%   -1.51%  (p=0.000 n=9+10)
AppendInPlace/Grow/4Ptr-16                          297ns ± 2%     294ns ± 2%     ~     (p=0.085 n=10+10)
StackCopyPtr-16                                    36.4ms ± 2%    36.7ms ± 2%     ~     (p=0.481 n=10+10)
StackCopy-16                                       33.9ms ± 3%    32.6ms ± 1%   -3.87%  (p=0.000 n=10+8)
StackCopyNoCache-16                                1.00ms ± 5%    1.01ms ± 5%     ~     (p=0.143 n=10+10)
StackCopyWithStkobj-16                             11.0ms ± 3%    10.9ms ± 4%     ~     (p=0.579 n=10+10)
Issue18138-16                                      49.2µs ± 5%    49.0µs ± 4%     ~     (p=1.000 n=10+9)
CompareStringEqual-16                              1.39ns ± 1%    1.45ns ± 2%   +3.80%  (p=0.000 n=8+10)
CompareStringIdentical-16                          0.55ns ± 1%    0.55ns ± 0%   +0.42%  (p=0.007 n=10+10)
CompareStringSameLength-16                         1.03ns ± 0%    1.03ns ± 0%     ~     (p=0.430 n=9+10)
CompareStringDifferentLength-16                    0.11ns ± 2%    0.11ns ± 3%     ~     (p=0.139 n=9+10)
CompareStringBigUnaligned-16                       23.9µs ± 1%    24.0µs ± 1%     ~     (p=0.370 n=9+8)
CompareStringBig-16                                22.0µs ± 3%    22.2µs ± 3%     ~     (p=0.243 n=9+10)
ConcatStringAndBytes-16                            10.7ns ± 1%    10.0ns ± 2%   -6.33%  (p=0.000 n=10+10)
SliceByteToString/1-16                             1.34ns ± 0%    1.34ns ± 0%     ~     (p=0.057 n=10+10)
SliceByteToString/2-16                             6.67ns ± 2%    6.60ns ± 3%     ~     (p=0.101 n=10+10)
SliceByteToString/4-16                             7.76ns ± 2%    7.56ns ± 3%   -2.59%  (p=0.001 n=10+10)
SliceByteToString/8-16                             9.81ns ± 4%    9.57ns ± 2%   -2.48%  (p=0.005 n=10+10)
SliceByteToString/16-16                            14.0ns ± 3%    13.7ns ± 2%   -2.31%  (p=0.009 n=10+10)
SliceByteToString/32-16                            17.3ns ± 1%    16.7ns ± 2%   -3.41%  (p=0.000 n=10+10)
SliceByteToString/64-16                            25.1ns ± 1%    24.1ns ± 2%   -3.93%  (p=0.000 n=9+10)
SliceByteToString/128-16                           38.6ns ± 1%    36.5ns ± 1%   -5.60%  (p=0.000 n=10+10)
RuneCount/lenruneslice/ASCII-16                    4.12ns ± 0%    4.11ns ± 0%     ~     (p=0.382 n=10+10)
RuneCount/lenruneslice/Japanese-16                 25.4ns ± 2%    25.6ns ± 2%     ~     (p=0.138 n=9+10)
RuneCount/lenruneslice/MixedLength-16              17.1ns ± 0%    17.2ns ± 0%   +0.59%  (p=0.000 n=9+9)
RuneCount/rangeloop/ASCII-16                       3.30ns ± 1%    3.29ns ± 0%     ~     (p=0.267 n=10+10)
RuneCount/rangeloop/Japanese-16                    20.1ns ± 1%    24.9ns ± 1%  +24.31%  (p=0.000 n=9+9)
RuneCount/rangeloop/MixedLength-16                 16.5ns ± 1%    16.7ns ± 1%   +1.34%  (p=0.000 n=10+10)
RuneCount/utf8.RuneCountInString/ASCII-16          5.71ns ± 1%    5.73ns ± 2%     ~     (p=0.579 n=10+10)
RuneCount/utf8.RuneCountInString/Japanese-16       22.0ns ± 6%    18.4ns ± 3%  -16.41%  (p=0.000 n=9+10)
RuneCount/utf8.RuneCountInString/MixedLength-16    15.0ns ± 1%    14.9ns ± 1%   -1.01%  (p=0.004 n=9+10)
RuneIterate/range/ASCII-16                         2.69ns ± 1%    2.72ns ± 0%   +0.94%  (p=0.026 n=10+9)
RuneIterate/range/Japanese-16                      24.5ns ± 2%    25.3ns ± 2%   +3.23%  (p=0.000 n=9+10)
RuneIterate/range/MixedLength-16                   17.0ns ± 1%    17.1ns ± 1%   +0.85%  (p=0.000 n=10+10)
RuneIterate/range1/ASCII-16                        2.70ns ± 1%    2.72ns ± 0%     ~     (p=0.058 n=9+9)
RuneIterate/range1/Japanese-16                     24.1ns ± 2%    25.2ns ± 3%   +4.30%  (p=0.000 n=10+10)
RuneIterate/range1/MixedLength-16                  16.9ns ± 1%    17.7ns ± 0%   +5.04%  (p=0.000 n=10+8)
RuneIterate/range2/ASCII-16                        2.84ns ± 8%    2.72ns ± 1%   -4.28%  (p=0.003 n=10+9)
RuneIterate/range2/Japanese-16                     22.7ns ± 4%    25.2ns ± 3%  +10.97%  (p=0.000 n=10+10)
RuneIterate/range2/MixedLength-16                  17.0ns ± 1%    17.2ns ± 0%   +0.95%  (p=0.000 n=10+10)
ArrayEqual-16                                      0.40ns ± 5%    0.35ns ± 2%  -11.83%  (p=0.000 n=10+10)
Func/Name-16                                       8.05ns ± 1%    8.09ns ± 1%   +0.40%  (p=0.025 n=8+10)
Func/Entry-16                                      1.73ns ± 1%    1.66ns ± 1%   -3.93%  (p=0.000 n=10+10)
Func/FileLine-16                                   27.5ns ± 2%    26.0ns ± 0%   -5.50%  (p=0.000 n=10+10)
[Geo mean]                                         16.7ns         15.7ns        -6.08%

name                                             old speed      new speed      delta
SetTypePtr-16                                    11.0GB/s ± 1%  11.0GB/s ± 3%     ~     (p=0.684 n=10+10)
SetTypePtr8-16                                   15.5GB/s ± 0%  15.5GB/s ± 0%     ~     (p=0.123 n=10+10)
SetTypePtr16-16                                  31.0GB/s ± 1%  31.1GB/s ± 0%     ~     (p=0.123 n=10+10)
SetTypePtr32-16                                  62.1GB/s ± 0%  62.2GB/s ± 0%     ~     (p=0.123 n=10+10)
SetTypePtr64-16                                   124GB/s ± 0%   124GB/s ± 0%     ~     (p=0.684 n=10+10)
SetTypePtr126-16                                  146GB/s ± 0%   146GB/s ± 0%     ~     (p=0.481 n=10+10)
SetTypePtr128-16                                  154GB/s ± 0%   154GB/s ± 0%     ~     (p=0.243 n=9+10)
SetTypePtrSlice-16                                151GB/s ± 1%   151GB/s ± 1%     ~     (p=0.497 n=9+10)
SetTypeNode1-16                                  5.82GB/s ± 1%  5.82GB/s ± 0%     ~     (p=0.353 n=10+10)
SetTypeNode1Slice-16                             76.1GB/s ± 1%  77.0GB/s ± 1%   +1.19%  (p=0.000 n=10+10)
SetTypeNode8-16                                  19.4GB/s ± 0%  19.4GB/s ± 0%     ~     (p=0.130 n=8+8)
SetTypeNode8Slice-16                              113GB/s ± 0%   113GB/s ± 0%     ~     (p=0.604 n=10+9)
SetTypeNode64-16                                 76.5GB/s ± 0%  76.5GB/s ± 0%     ~     (p=0.190 n=10+10)
SetTypeNode64Slice-16                            97.8GB/s ± 0%  97.7GB/s ± 0%     ~     (p=0.549 n=9+10)
SetTypeNode64Dead-16                             95.5GB/s ± 0%  95.7GB/s ± 0%     ~     (p=0.118 n=10+6)
SetTypeNode64DeadSlice-16                         112GB/s ± 0%   112GB/s ± 0%     ~     (p=0.353 n=10+10)
SetTypeNode124-16                                 146GB/s ± 0%   146GB/s ± 0%     ~     (p=0.853 n=10+10)
SetTypeNode124Slice-16                            146GB/s ± 5%   149GB/s ± 0%     ~     (p=0.315 n=10+10)
SetTypeNode126-16                                 154GB/s ± 0%   154GB/s ± 0%     ~     (p=0.356 n=10+9)
SetTypeNode126Slice-16                            150GB/s ± 0%   150GB/s ± 0%     ~     (p=0.095 n=9+10)
SetTypeNode128-16                                 107GB/s ± 0%   107GB/s ± 0%   +0.31%  (p=0.003 n=9+10)
SetTypeNode128Slice-16                            119GB/s ± 0%   120GB/s ± 0%     ~     (p=0.156 n=10+9)
SetTypeNode130-16                                 108GB/s ± 0%   108GB/s ± 0%   +0.33%  (p=0.002 n=10+10)
SetTypeNode130Slice-16                            119GB/s ± 0%   119GB/s ± 0%     ~     (p=0.739 n=10+10)
SetTypeNode1024-16                                160GB/s ± 0%   159GB/s ± 1%     ~     (p=0.113 n=9+9)
SetTypeNode1024Slice-16                           144GB/s ± 0%   144GB/s ± 0%     ~     (p=0.063 n=10+10)
Hash5-16                                         2.59GB/s ± 1%  2.49GB/s ± 0%   -3.90%  (p=0.000 n=10+9)
Hash16-16                                        7.85GB/s ± 1%  7.23GB/s ± 1%   -7.92%  (p=0.000 n=10+10)
Hash64-16                                        24.0GB/s ± 0%  23.9GB/s ± 0%     ~     (p=0.190 n=9+9)
Hash1024-16                                      62.4GB/s ± 0%  62.3GB/s ± 0%   -0.16%  (p=0.017 n=9+10)
Hash65536-16                                     74.0GB/s ± 0%  74.0GB/s ± 0%     ~     (p=0.796 n=10+10)
Memmove/1-16                                     1.08GB/s ± 0%  1.08GB/s ± 0%   -0.21%  (p=0.035 n=10+10)
Memmove/2-16                                     2.16GB/s ± 0%  2.15GB/s ± 0%     ~     (p=0.105 n=10+10)
Memmove/3-16                                     3.24GB/s ± 1%  3.22GB/s ± 1%   -0.49%  (p=0.004 n=10+10)
Memmove/4-16                                     3.89GB/s ± 0%  3.89GB/s ± 0%     ~     (p=0.218 n=10+10)
Memmove/5-16                                     4.42GB/s ± 0%  4.42GB/s ± 0%     ~     (p=0.075 n=10+10)
Memmove/6-16                                     5.31GB/s ± 0%  5.29GB/s ± 1%     ~     (p=0.218 n=10+10)
Memmove/7-16                                     6.19GB/s ± 0%  6.18GB/s ± 0%   -0.15%  (p=0.035 n=10+9)
Memmove/8-16                                     7.07GB/s ± 0%  7.07GB/s ± 0%     ~     (p=0.684 n=10+10)
Memmove/9-16                                     7.22GB/s ± 0%  6.68GB/s ± 0%   -7.37%  (p=0.000 n=10+10)
Memmove/10-16                                    8.02GB/s ± 0%  7.43GB/s ± 0%   -7.38%  (p=0.000 n=9+9)
Memmove/11-16                                    8.83GB/s ± 0%  8.13GB/s ± 0%   -7.87%  (p=0.000 n=10+9)
Memmove/12-16                                    9.62GB/s ± 0%  8.89GB/s ± 1%   -7.61%  (p=0.000 n=10+10)
Memmove/13-16                                    10.4GB/s ± 0%   9.7GB/s ± 0%   -7.20%  (p=0.000 n=10+10)
Memmove/14-16                                    11.2GB/s ± 0%  10.4GB/s ± 1%   -7.64%  (p=0.000 n=10+9)
Memmove/15-16                                    12.0GB/s ± 0%  11.1GB/s ± 0%   -7.46%  (p=0.000 n=10+9)
Memmove/16-16                                    12.8GB/s ± 0%  11.8GB/s ± 1%   -7.67%  (p=0.000 n=10+10)
Memmove/32-16                                    23.8GB/s ± 0%  23.5GB/s ± 1%   -1.20%  (p=0.000 n=10+10)
Memmove/64-16                                    44.2GB/s ± 0%  39.1GB/s ± 0%  -11.56%  (p=0.000 n=10+9)
Memmove/128-16                                   68.7GB/s ± 0%  63.2GB/s ± 0%   -7.95%  (p=0.000 n=10+10)
Memmove/256-16                                    104GB/s ± 0%   103GB/s ± 0%   -1.13%  (p=0.000 n=10+10)
Memmove/512-16                                    129GB/s ± 1%   129GB/s ± 0%     ~     (p=0.165 n=10+10)
Memmove/1024-16                                   174GB/s ± 1%   174GB/s ± 1%     ~     (p=0.258 n=9+9)
Memmove/2048-16                                   213GB/s ± 1%   213GB/s ± 2%     ~     (p=0.963 n=8+9)
Memmove/4096-16                                   250GB/s ± 1%   240GB/s ± 4%   -3.83%  (p=0.006 n=9+9)
MemmoveOverlap/32-16                             19.8GB/s ± 1%  19.1GB/s ± 1%   -3.40%  (p=0.000 n=10+10)
MemmoveOverlap/64-16                             39.0GB/s ± 0%  38.8GB/s ± 0%   -0.28%  (p=0.001 n=9+9)
MemmoveOverlap/128-16                            62.2GB/s ± 0%  62.1GB/s ± 0%     ~     (p=0.063 n=10+10)
MemmoveOverlap/256-16                            96.0GB/s ± 0%  95.8GB/s ± 0%   -0.26%  (p=0.009 n=10+10)
MemmoveOverlap/512-16                            83.6GB/s ±16%  89.2GB/s ± 0%     ~     (p=0.696 n=10+8)
MemmoveOverlap/1024-16                            141GB/s ± 0%   140GB/s ± 0%   -0.28%  (p=0.006 n=8+10)
MemmoveOverlap/2048-16                            172GB/s ± 0%   171GB/s ± 1%   -0.38%  (p=0.008 n=9+9)
MemmoveOverlap/4096-16                            176GB/s ± 1%   177GB/s ± 1%   +0.84%  (p=0.001 n=8+10)
MemmoveUnalignedDst/1-16                          806MB/s ± 0%   802MB/s ± 1%   -0.52%  (p=0.023 n=10+10)
MemmoveUnalignedDst/2-16                         1.62GB/s ± 0%  1.62GB/s ± 0%   -0.11%  (p=0.041 n=10+10)
MemmoveUnalignedDst/3-16                         2.43GB/s ± 0%  2.43GB/s ± 0%   -0.14%  (p=0.006 n=9+9)
MemmoveUnalignedDst/4-16                         3.24GB/s ± 0%  3.23GB/s ± 1%   -0.36%  (p=0.007 n=10+10)
MemmoveUnalignedDst/5-16                         3.71GB/s ± 0%  3.71GB/s ± 0%     ~     (p=0.063 n=10+10)
MemmoveUnalignedDst/6-16                         4.48GB/s ± 0%  4.47GB/s ± 0%     ~     (p=0.912 n=10+10)
MemmoveUnalignedDst/7-16                         5.22GB/s ± 0%  5.22GB/s ± 0%     ~     (p=1.000 n=10+10)
MemmoveUnalignedDst/8-16                         5.95GB/s ± 0%  5.93GB/s ± 1%   -0.40%  (p=0.023 n=10+10)
MemmoveUnalignedDst/9-16                         6.24GB/s ± 0%  6.24GB/s ± 0%     ~     (p=0.912 n=10+10)
MemmoveUnalignedDst/10-16                        6.94GB/s ± 0%  6.94GB/s ± 0%     ~     (p=0.353 n=10+10)
MemmoveUnalignedDst/11-16                        7.64GB/s ± 0%  7.63GB/s ± 0%     ~     (p=0.393 n=10+10)
MemmoveUnalignedDst/12-16                        8.33GB/s ± 0%  8.33GB/s ± 0%     ~     (p=0.971 n=10+10)
MemmoveUnalignedDst/13-16                        9.02GB/s ± 0%  9.01GB/s ± 0%     ~     (p=0.436 n=10+10)
MemmoveUnalignedDst/14-16                        9.71GB/s ± 0%  9.71GB/s ± 0%     ~     (p=0.280 n=10+10)
MemmoveUnalignedDst/15-16                        10.4GB/s ± 0%  10.4GB/s ± 1%     ~     (p=0.853 n=10+10)
MemmoveUnalignedDst/16-16                        11.1GB/s ± 0%  11.1GB/s ± 0%     ~     (p=0.089 n=10+10)
MemmoveUnalignedDst/32-16                        19.7GB/s ± 1%  19.6GB/s ± 0%     ~     (p=0.075 n=10+10)
MemmoveUnalignedDst/64-16                        38.9GB/s ± 0%  38.8GB/s ± 0%     ~     (p=0.218 n=10+10)
MemmoveUnalignedDst/128-16                       62.1GB/s ± 0%  62.1GB/s ± 0%     ~     (p=0.549 n=10+9)
MemmoveUnalignedDst/256-16                       69.4GB/s ± 0%  69.3GB/s ± 0%     ~     (p=0.105 n=10+10)
MemmoveUnalignedDst/512-16                        124GB/s ± 1%   124GB/s ± 0%     ~     (p=0.762 n=10+8)
MemmoveUnalignedDst/1024-16                       136GB/s ± 0%   136GB/s ± 1%     ~     (p=0.666 n=9+9)
MemmoveUnalignedDst/2048-16                       159GB/s ± 0%   159GB/s ± 0%     ~     (p=0.574 n=8+8)
MemmoveUnalignedDst/4096-16                       161GB/s ± 0%   161GB/s ± 0%     ~     (p=1.000 n=9+9)
MemmoveUnalignedDstOverlap/32-16                 7.84GB/s ± 0%  7.83GB/s ± 0%     ~     (p=0.353 n=10+10)
MemmoveUnalignedDstOverlap/64-16                 14.0GB/s ± 0%  14.0GB/s ± 0%     ~     (p=0.661 n=10+9)
MemmoveUnalignedDstOverlap/128-16                27.4GB/s ± 0%  27.4GB/s ± 0%     ~     (p=0.353 n=10+10)
MemmoveUnalignedDstOverlap/256-16                50.4GB/s ± 0%  50.4GB/s ± 0%     ~     (p=0.156 n=10+9)
MemmoveUnalignedDstOverlap/512-16                60.7GB/s ± 4%  62.5GB/s ± 0%   +3.07%  (p=0.022 n=10+9)
MemmoveUnalignedDstOverlap/1024-16                107GB/s ± 0%   107GB/s ± 0%     ~     (p=0.234 n=8+8)
MemmoveUnalignedDstOverlap/2048-16                146GB/s ± 0%   146GB/s ± 1%     ~     (p=0.182 n=10+9)
MemmoveUnalignedDstOverlap/4096-16                155GB/s ± 0%   155GB/s ± 0%     ~     (p=0.400 n=10+9)
MemmoveUnalignedSrc/1-16                          882MB/s ± 0%   884MB/s ± 1%   +0.24%  (p=0.033 n=10+9)
MemmoveUnalignedSrc/2-16                         1.76GB/s ± 1%  1.77GB/s ± 0%   +0.27%  (p=0.028 n=10+9)
MemmoveUnalignedSrc/3-16                         2.43GB/s ± 0%  2.43GB/s ± 0%   +0.26%  (p=0.027 n=9+10)
MemmoveUnalignedSrc/4-16                         3.24GB/s ± 0%  3.24GB/s ± 1%     ~     (p=0.079 n=9+10)
MemmoveUnalignedSrc/5-16                         3.73GB/s ± 0%  3.73GB/s ± 1%     ~     (p=0.829 n=8+10)
MemmoveUnalignedSrc/6-16                         4.47GB/s ± 0%  4.49GB/s ± 0%   +0.39%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/7-16                         5.22GB/s ± 0%  5.23GB/s ± 0%     ~     (p=0.280 n=10+10)
MemmoveUnalignedSrc/8-16                         5.95GB/s ± 0%  5.98GB/s ± 0%   +0.39%  (p=0.001 n=10+9)
MemmoveUnalignedSrc/9-16                         6.24GB/s ± 0%  6.25GB/s ± 0%     ~     (p=0.549 n=10+9)
MemmoveUnalignedSrc/10-16                        6.93GB/s ± 0%  6.94GB/s ± 0%     ~     (p=0.604 n=10+9)
MemmoveUnalignedSrc/11-16                        7.63GB/s ± 0%  7.63GB/s ± 1%     ~     (p=0.353 n=10+10)
MemmoveUnalignedSrc/12-16                        8.32GB/s ± 0%  8.32GB/s ± 0%     ~     (p=0.218 n=10+10)
MemmoveUnalignedSrc/13-16                        9.02GB/s ± 0%  9.00GB/s ± 1%     ~     (p=0.684 n=10+10)
MemmoveUnalignedSrc/14-16                        9.71GB/s ± 0%  9.71GB/s ± 0%     ~     (p=0.739 n=10+10)
MemmoveUnalignedSrc/15-16                        10.4GB/s ± 0%  10.4GB/s ± 0%     ~     (p=0.353 n=10+10)
MemmoveUnalignedSrc/16-16                        11.1GB/s ± 1%  11.1GB/s ± 0%     ~     (p=0.579 n=10+10)
MemmoveUnalignedSrc/32-16                        20.0GB/s ± 1%  20.0GB/s ± 0%     ~     (p=0.631 n=10+10)
MemmoveUnalignedSrc/64-16                        38.8GB/s ± 0%  38.8GB/s ± 0%     ~     (p=0.579 n=10+10)
MemmoveUnalignedSrc/128-16                       61.2GB/s ± 0%  61.2GB/s ± 0%     ~     (p=0.780 n=10+9)
MemmoveUnalignedSrc/256-16                       94.8GB/s ± 0%  92.2GB/s ± 1%   -2.73%  (p=0.000 n=10+10)
MemmoveUnalignedSrc/512-16                        119GB/s ± 0%   119GB/s ± 0%   +0.26%  (p=0.027 n=8+9)
MemmoveUnalignedSrc/1024-16                       141GB/s ± 0%   142GB/s ± 1%   +1.07%  (p=0.000 n=8+10)
MemmoveUnalignedSrc/2048-16                       157GB/s ± 0%   157GB/s ± 0%     ~     (p=0.167 n=9+8)
MemmoveUnalignedSrc/4096-16                       161GB/s ± 0%   162GB/s ± 1%     ~     (p=0.063 n=10+10)
MemmoveUnalignedSrcOverlap/32-16                 7.93GB/s ± 0%  7.88GB/s ± 0%   -0.63%  (p=0.000 n=9+10)
MemmoveUnalignedSrcOverlap/64-16                 15.5GB/s ± 0%  15.5GB/s ± 0%     ~     (p=0.529 n=10+10)
MemmoveUnalignedSrcOverlap/128-16                28.3GB/s ± 0%  28.3GB/s ± 0%     ~     (p=0.218 n=10+10)
MemmoveUnalignedSrcOverlap/256-16                41.5GB/s ± 0%  41.6GB/s ± 0%   +0.35%  (p=0.000 n=10+9)
MemmoveUnalignedSrcOverlap/512-16                68.9GB/s ± 0%  68.8GB/s ± 0%     ~     (p=0.541 n=9+8)
MemmoveUnalignedSrcOverlap/1024-16                115GB/s ± 0%   115GB/s ± 0%     ~     (p=0.382 n=8+8)
MemmoveUnalignedSrcOverlap/2048-16                155GB/s ± 0%   144GB/s ±18%     ~     (p=0.101 n=8+10)
MemmoveUnalignedSrcOverlap/4096-16                160GB/s ± 0%   160GB/s ± 1%     ~     (p=0.605 n=9+9)
Memclr/5-16                                      5.81GB/s ± 1%  5.80GB/s ± 2%     ~     (p=0.546 n=9+9)
Memclr/16-16                                     15.5GB/s ± 0%  15.4GB/s ± 0%   -0.32%  (p=0.008 n=9+10)
Memclr/64-16                                     51.8GB/s ± 0%  50.7GB/s ± 0%   -2.22%  (p=0.000 n=10+10)
Memclr/256-16                                     113GB/s ± 0%   113GB/s ± 0%     ~     (p=0.143 n=10+10)
Memclr/4096-16                                    239GB/s ± 1%   237GB/s ± 0%   -0.87%  (p=0.000 n=10+10)
Memclr/65536-16                                  79.8GB/s ± 0%  79.7GB/s ± 0%     ~     (p=0.529 n=10+10)
Memclr/1M-16                                     74.6GB/s ± 1%  74.7GB/s ± 1%     ~     (p=0.529 n=10+10)
Memclr/4M-16                                     48.7GB/s ± 1%  48.8GB/s ± 0%     ~     (p=0.123 n=10+10)
Memclr/8M-16                                     48.2GB/s ± 2%  48.6GB/s ± 0%     ~     (p=0.408 n=10+8)
Memclr/16M-16                                    43.6GB/s ± 4%  43.3GB/s ± 0%     ~     (p=0.173 n=10+8)
Memclr/64M-16                                    30.7GB/s ± 0%  30.7GB/s ± 0%     ~     (p=0.113 n=10+9)
GoMemclr/5-16                                    6.07GB/s ± 0%  6.08GB/s ± 0%     ~     (p=0.367 n=9+10)
GoMemclr/16-16                                   15.6GB/s ± 0%  15.6GB/s ± 0%   -0.22%  (p=0.004 n=9+9)
GoMemclr/64-16                                   56.1GB/s ± 0%  56.1GB/s ± 0%     ~     (p=0.968 n=10+9)
GoMemclr/256-16                                   125GB/s ± 0%   124GB/s ± 0%     ~     (p=0.912 n=10+10)
MemclrRange/1K_2K-16                              210GB/s ± 0%   224GB/s ± 1%   +6.81%  (p=0.000 n=10+10)
MemclrRange/2K_8K-16                              228GB/s ± 0%   228GB/s ± 0%     ~     (p=0.684 n=10+10)
MemclrRange/4K_16K-16                             279GB/s ± 0%   279GB/s ± 0%     ~     (p=0.780 n=9+10)
MemclrRange/160K_228K-16                         80.3GB/s ± 0%  80.2GB/s ± 0%     ~     (p=0.165 n=10+10)
Copy/1Byte-16                                     808MB/s ± 1%   810MB/s ± 0%   +0.28%  (p=0.000 n=10+8)
Copy/1String-16                                   810MB/s ± 0%   811MB/s ± 0%     ~     (p=0.105 n=10+10)
Copy/2Byte-16                                    1.62GB/s ± 0%  1.62GB/s ± 0%     ~     (p=0.182 n=10+9)
Copy/2String-16                                  1.62GB/s ± 1%  1.62GB/s ± 1%     ~     (p=1.000 n=10+10)
Copy/4Byte-16                                    3.22GB/s ± 0%  3.24GB/s ± 0%   +0.46%  (p=0.000 n=10+10)
Copy/4String-16                                  3.24GB/s ± 0%  3.24GB/s ± 0%     ~     (p=0.075 n=10+10)
Copy/8Byte-16                                    5.86GB/s ± 0%  5.96GB/s ± 0%   +1.82%  (p=0.000 n=9+9)
Copy/8String-16                                  5.95GB/s ± 0%  5.99GB/s ± 0%   +0.59%  (p=0.000 n=9+10)
Copy/12Byte-16                                   8.32GB/s ± 0%  8.32GB/s ± 0%     ~     (p=0.190 n=9+9)
Copy/12String-16                                 8.31GB/s ± 0%  8.29GB/s ± 0%     ~     (p=0.068 n=9+10)
Copy/16Byte-16                                   11.1GB/s ± 0%  11.1GB/s ± 0%   +0.18%  (p=0.003 n=10+9)
Copy/16String-16                                 11.1GB/s ± 0%  11.1GB/s ± 0%   -0.31%  (p=0.009 n=10+10)
Copy/32Byte-16                                   19.6GB/s ± 1%  19.8GB/s ± 1%   +0.72%  (p=0.002 n=10+10)
Copy/32String-16                                 20.0GB/s ± 0%  19.5GB/s ± 0%   -2.19%  (p=0.000 n=10+10)
Copy/128Byte-16                                  62.2GB/s ± 0%  62.1GB/s ± 0%     ~     (p=0.661 n=9+10)
Copy/128String-16                                61.9GB/s ± 0%  61.7GB/s ± 0%   -0.35%  (p=0.005 n=10+10)
Copy/1024Byte-16                                  169GB/s ± 2%   171GB/s ± 1%   +1.21%  (p=0.000 n=9+10)
Copy/1024String-16                                169GB/s ± 0%   172GB/s ± 1%   +1.57%  (p=0.000 n=10+9)
CompareStringBigUnaligned-16                     43.8GB/s ± 1%  43.7GB/s ± 1%     ~     (p=0.370 n=9+8)
CompareStringBig-16                              47.6GB/s ± 3%  47.3GB/s ± 3%     ~     (p=0.243 n=9+10)
[Geo mean]                                       25.3GB/s       28.0GB/s       +10.66%

name                                             old p50-ns     new p50-ns     delta
ReadMemStatsLatency-16                              98.4k ±37%    112.7k ±64%     ~     (p=0.436 n=10+10)
ReadMetricsLatency-16                               1.69k ± 3%     1.70k ± 2%     ~     (p=0.646 n=9+10)
GoroutineProfile/small-nil/idle-16                  3.75k ± 4%     3.72k ± 1%     ~     (p=0.447 n=10+9)
GoroutineProfile/small-nil/loaded-16                4.33k ± 3%     4.31k ± 5%     ~     (p=0.931 n=9+9)
GoroutineProfile/small/idle-16                       102k ± 3%      101k ± 4%     ~     (p=0.113 n=9+9)
GoroutineProfile/small/loaded-16                     214k ± 3%      215k ± 6%     ~     (p=0.842 n=9+10)
GoroutineProfile/large-nil/idle-16                  3.70k ± 2%     3.65k ± 2%     ~     (p=0.075 n=10+10)
GoroutineProfile/large-nil/loaded-16                4.36k ± 3%     4.31k ±10%     ~     (p=0.631 n=10+10)
GoroutineProfile/large/idle-16                      2.56M ± 1%     2.51M ± 1%   -2.28%  (p=0.000 n=10+10)
GoroutineProfile/large/loaded-16                    6.77M ± 4%     6.85M ±19%     ~     (p=0.536 n=7+10)
GoroutineProfile/sparse-nil/idle-16                 3.66k ± 1%     3.64k ± 2%     ~     (p=0.136 n=9+9)
GoroutineProfile/sparse-nil/loaded-16               4.25k ± 5%     4.15k ± 4%     ~     (p=0.190 n=10+10)
GoroutineProfile/sparse/idle-16                      102k ± 4%      101k ± 3%     ~     (p=0.447 n=10+9)
GoroutineProfile/sparse/loaded-16                    216k ± 4%      218k ± 4%     ~     (p=0.549 n=9+10)
[Geo mean]                                          35.8k          35.9k        +0.35%

name                                             old p90-ns     new p90-ns     delta
ReadMemStatsLatency-16                              983k ±310%      200k ±34%  -79.62%  (p=0.034 n=10+8)
ReadMetricsLatency-16                               4.01k ±35%     3.75k ±17%     ~     (p=0.315 n=10+10)
GoroutineProfile/small-nil/idle-16                  4.21k ± 4%     4.27k ± 8%     ~     (p=0.968 n=9+10)
GoroutineProfile/small-nil/loaded-16                5.58k ± 8%     5.35k ±12%     ~     (p=0.190 n=10+10)
GoroutineProfile/small/idle-16                       108k ± 6%      107k ± 7%     ~     (p=0.497 n=9+10)
GoroutineProfile/small/loaded-16                     450k ± 5%      432k ± 3%   -3.92%  (p=0.002 n=9+9)
GoroutineProfile/large-nil/idle-16                  4.13k ± 7%     4.04k ± 2%     ~     (p=0.181 n=10+8)
GoroutineProfile/large-nil/loaded-16                5.76k ± 4%     5.67k ± 5%     ~     (p=0.190 n=10+10)
GoroutineProfile/large/idle-16                      2.63M ± 2%     2.58M ± 1%   -1.97%  (p=0.000 n=10+10)
GoroutineProfile/large/loaded-16                    16.9M ± 4%     17.0M ± 6%     ~     (p=0.661 n=9+10)
GoroutineProfile/sparse-nil/idle-16                 4.21k ±10%     4.07k ± 7%     ~     (p=0.128 n=10+10)
GoroutineProfile/sparse-nil/loaded-16               5.55k ± 8%     5.38k ± 6%     ~     (p=0.089 n=10+10)
GoroutineProfile/sparse/idle-16                      106k ± 4%      106k ± 3%     ~     (p=0.661 n=10+9)
GoroutineProfile/sparse/loaded-16                    454k ± 6%      441k ± 6%   -2.86%  (p=0.043 n=10+10)
[Geo mean]                                          58.4k          51.0k       -12.61%

name                                             old p99-ns     new p99-ns     delta
ReadMemStatsLatency-16                              983k ±310%      200k ±34%  -79.62%  (p=0.034 n=10+8)
ReadMetricsLatency-16                               26.6k ±22%     26.6k ±17%     ~     (p=0.971 n=10+10)
GoroutineProfile/small-nil/idle-16                  5.27k ±12%     5.19k ±12%     ~     (p=0.579 n=10+10)
GoroutineProfile/small-nil/loaded-16                7.28k ± 3%     7.04k ± 9%     ~     (p=0.113 n=9+10)
GoroutineProfile/small/idle-16                       114k ± 6%      113k ± 6%     ~     (p=0.604 n=9+10)
GoroutineProfile/small/loaded-16                    4.84M ±70%    6.06M ±131%     ~     (p=0.842 n=9+10)
GoroutineProfile/large-nil/idle-16                  5.38k ±19%     5.26k ±13%     ~     (p=0.912 n=10+10)
GoroutineProfile/large-nil/loaded-16                7.38k ± 3%     7.23k ± 4%     ~     (p=0.143 n=10+10)
GoroutineProfile/large/idle-16                      2.79M ± 5%     2.72M ± 3%     ~     (p=0.089 n=10+10)
GoroutineProfile/large/loaded-16                    24.0M ±24%     24.9M ±29%     ~     (p=0.684 n=10+10)
GoroutineProfile/sparse-nil/idle-16                 5.32k ±17%     5.49k ±17%     ~     (p=0.684 n=10+10)
GoroutineProfile/sparse-nil/loaded-16               7.25k ± 4%     6.97k ± 5%   -3.90%  (p=0.005 n=9+10)
GoroutineProfile/sparse/idle-16                      113k ± 5%      112k ± 6%     ~     (p=0.631 n=10+10)
GoroutineProfile/sparse/loaded-16                   4.00M ±66%     4.26M ±65%     ~     (p=0.489 n=9+9)
[Geo mean]                                           107k            97k        -9.55%

name                                             old alloc/op   new alloc/op   delta
NewEmptyMap-16                                      0.00B          0.00B          ~     (all equal)
NewSmallMap-16                                      0.00B          0.00B          ~     (all equal)
MapPopulate/1-16                                    0.00B          0.00B          ~     (all equal)
MapPopulate/10-16                                    179B ± 0%      179B ± 0%     ~     (all equal)
MapPopulate/100-16                                 3.35kB ± 0%    3.35kB ± 0%     ~     (p=0.294 n=10+8)
MapPopulate/1000-16                                53.3kB ± 0%    53.3kB ± 0%     ~     (p=1.000 n=8+10)
MapPopulate/10000-16                                428kB ± 0%     428kB ± 0%     ~     (p=0.469 n=10+10)
MapPopulate/100000-16                              3.62MB ± 0%    3.62MB ± 0%     ~     (p=0.888 n=9+10)
MapStringConversion/32/simple-16                    0.00B          0.00B          ~     (all equal)
MapStringConversion/32/struct-16                    0.00B          0.00B          ~     (all equal)
MapStringConversion/32/array-16                     0.00B          0.00B          ~     (all equal)
MapStringConversion/64/simple-16                    0.00B          0.00B          ~     (all equal)
MapStringConversion/64/struct-16                    0.00B          0.00B          ~     (all equal)
MapStringConversion/64/array-16                     0.00B          0.00B          ~     (all equal)
NewEmptyMapHintLessThan8-16                         0.00B          0.00B          ~     (all equal)
NewEmptyMapHintGreaterThan8-16                     1.15kB ± 0%    1.15kB ± 0%     ~     (all equal)
MapAppendAssign/Int32/256-16                        41.7B ±15%     44.3B ±12%     ~     (p=0.106 n=10+10)
MapAppendAssign/Int32/65536-16                      22.6B ± 6%     23.5B ± 6%   +4.19%  (p=0.025 n=9+10)
MapAppendAssign/Int64/256-16                        43.5B ±10%     42.9B ± 7%     ~     (p=0.757 n=10+10)
MapAppendAssign/Int64/65536-16                      24.7B ± 7%     21.8B ± 6%  -11.74%  (p=0.000 n=10+10)
MapAppendAssign/Str/256-16                          87.6B ±10%     89.2B ± 9%     ~     (p=0.379 n=10+10)
MapAppendAssign/Str/65536-16                        45.1B ±14%     47.2B ± 8%     ~     (p=0.150 n=10+9)
CreateGoroutinesCapture-16                           144B ± 0%      144B ± 0%     ~     (all equal)
[Geo mean]                                           769B           770B        +0.21%

name                                             old allocs/op  new allocs/op  delta
NewEmptyMap-16                                       0.00           0.00          ~     (all equal)
NewSmallMap-16                                       0.00           0.00          ~     (all equal)
MapPopulate/1-16                                     0.00           0.00          ~     (all equal)
MapPopulate/10-16                                    1.00 ± 0%      1.00 ± 0%     ~     (all equal)
MapPopulate/100-16                                   17.0 ± 0%      17.0 ± 0%     ~     (all equal)
MapPopulate/1000-16                                  73.0 ± 0%      73.0 ± 0%     ~     (all equal)
MapPopulate/10000-16                                  320 ± 0%       320 ± 0%     ~     (p=1.000 n=10+10)
MapPopulate/100000-16                               4.00k ± 0%     4.00k ± 0%     ~     (p=0.753 n=10+10)
MapStringConversion/32/simple-16                     0.00           0.00          ~     (all equal)
MapStringConversion/32/struct-16                     0.00           0.00          ~     (all equal)
MapStringConversion/32/array-16                      0.00           0.00          ~     (all equal)
MapStringConversion/64/simple-16                     0.00           0.00          ~     (all equal)
MapStringConversion/64/struct-16                     0.00           0.00          ~     (all equal)
MapStringConversion/64/array-16                      0.00           0.00          ~     (all equal)
NewEmptyMapHintLessThan8-16                          0.00           0.00          ~     (all equal)
NewEmptyMapHintGreaterThan8-16                       1.00 ± 0%      1.00 ± 0%     ~     (all equal)
MapAppendAssign/Int32/256-16                         0.00           0.00          ~     (all equal)
MapAppendAssign/Int32/65536-16                       0.00           0.00          ~     (all equal)
MapAppendAssign/Int64/256-16                         0.00           0.00          ~     (all equal)
MapAppendAssign/Int64/65536-16                       0.00           0.00          ~     (all equal)
MapAppendAssign/Str/256-16                           0.00           0.00          ~     (all equal)
MapAppendAssign/Str/65536-16                         0.00           0.00          ~     (all equal)
CreateGoroutinesCapture-16                           5.00 ± 0%      5.00 ± 0%     ~     (all equal)
[Geo mean]                                           26.0           26.0        +0.00%

Change-Id: I5fb03e93df8b380e04795afbdcd1c94aeeecacc6
Reviewed-on: https://go-review.googlesource.com/c/go/+/454255
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Jakub Ciolek <jakub@ciolek.dev>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-31 18:11:24 +00:00
Paul E. Murphy
1540531746 test/codegen: merge identical ppc64 and ppc64le tests
Manually consolidate the remaining ppc64/ppc64le tests which
are not so trivial to automatically merge.

The remaining ppc64le tests are limited to cases where load/stores are
merged (this only happens on ppc64le) and the race detector (only
supported on ppc64le).

Change-Id: I1f9c0f3d3ddbb7fbbd8c81fbbd6537394fba63ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/463217
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-01-27 19:03:02 +00:00
Paul E. Murphy
0301c6c351 test/codegen: combine trivial PPC64 tests into ppc64x
Use a small python script to consolidate duplicate
ppc64/ppc64le tests into a single ppc64x codegen test.

This makes the small assumption that any time two tests
for different arch/variant combos exist, those tests
can be combined into a single ppc64x test.

E.g.:

  // ppc64le: foo
  // ppc64le/power9: foo
into
  // ppc64x: foo

or

  // ppc64: foo
  // ppc64le: foo
into
  // ppc64x: foo

import glob
import re
files = glob.glob("codegen/*.go")
for file in files:
    with open(file) as f:
        text = [l for l in f]
    i = 0
    while i < len(text):
        first = re.match(r"\s*// ?ppc64(le)?(/power[89])?:(.*)", text[i])
        if first:
            j = i+1
            while j < len(text):
                second = re.match(r"\s*// ?ppc64(le)?(/power[89])?:(.*)", text[j])
                if not second:
                    break
                if (not first.group(2) or first.group(2) == second.group(2)) and first.group(3) == second.group(3):
                    text[i] = re.sub(" ?ppc64(le|x)?"," ppc64x",text[i])
                    text=text[:j] + (text[j+1:])
                else:
                    j += 1
        i+=1
    with open(file, 'w') as f:
        f.write("".join(text))

Change-Id: Ic6b009b54eacaadc5a23db9c5a3bf7331b595821
Reviewed-on: https://go-review.googlesource.com/c/go/+/463220
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-27 18:24:12 +00:00
Paul E. Murphy
a37672bb7b test/codegen: accept ppc64x as alias for ppc64le and ppc64 arches
This helps reduce the noise when adding ppc codegen tests. ppc64x
is used in other places to indicate something which runs on either
endian.

This helps clean up existing codegen tests which are mostly
identical between endian variants.

condmove tests are converted as an example.
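
As a rough illustration of the alias idea, the sketch below expands an architecture name used in a codegen comment into the concrete architectures it covers. The table archAliases and the function expandArch are hypothetical names made up for this example; they are not taken from the test harness, which may implement the alias differently.

```go
package main

import "fmt"

// archAliases is a hypothetical alias table: an alias name maps to the
// concrete architectures it stands for. Only ppc64x is listed because it
// is the only alias the commit message describes.
var archAliases = map[string][]string{
	"ppc64x": {"ppc64", "ppc64le"},
}

// expandArch returns the concrete architectures for name. A name that is
// not an alias expands to itself.
func expandArch(name string) []string {
	if archs, ok := archAliases[name]; ok {
		return archs
	}
	return []string{name}
}

func main() {
	for _, name := range []string{"ppc64x", "ppc64le", "amd64"} {
		fmt.Println(name, "->", expandArch(name))
	}
}
```

With a table like this, a single `// ppc64x: ...` check could be applied to both endian variants instead of being written twice.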

Change-Id: I2b2d98a9a1859015f62db38d62d9d5d7593435b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/462895
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
2023-01-24 22:55:18 +00:00
David Chase
e22bd2348c internal/abi,runtime: refactor map constants into one place
Previously TryBot-tested with bucket bits = 4.
Also tested locally with bucket bits = 5.
This makes it much easier to change the size of map
buckets, and hopefully provides pointers to all the
code that in some way depends on details of map layout.

Change-Id: I9f6669d1eadd02f182d0bc3f959dc5f385fa1683
Reviewed-on: https://go-review.googlesource.com/c/go/+/462115
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2023-01-23 15:51:32 +00:00
Jorropo
5c67ebbb31 cmd/compile: AMD64v3 remove unnecessary TEST comparison in isPowerOfTwo
With GOAMD64=v3, the canonical isPowerOfTwo function:
  func isPowerOfTwo(x uintptr) bool {
    return x&(x-1) == 0
  }

Used to compile to:
  temp := BLSR(x) // x&(x-1)
  flags = TEST(temp, temp)
  return flags.zf

However, the BLSR instruction already sets ZF according to the result,
so we can remove the TEST instruction when we are only checking ZF,
as happens in multiple pieces of code around memory allocation.

This makes the code smaller and faster.
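
For reference, here is a small runnable version of the function quoted above, so the behaviour of the x&(x-1) idiom can be checked by hand. It only demonstrates the source-level semantics; whether BLSR is emitted depends on building with GOAMD64=v3 and is not shown here.

```go
package main

import "fmt"

// isPowerOfTwo reports whether x has at most one bit set.
// x&(x-1) clears the lowest set bit of x, so the result is zero
// exactly when x had zero or one bits set. Note that this means
// the function also returns true for x == 0.
func isPowerOfTwo(x uintptr) bool {
	return x&(x-1) == 0
}

func main() {
	for _, x := range []uintptr{0, 1, 2, 3, 8, 12, 1 << 20} {
		fmt.Printf("%d: %v\n", x, isPowerOfTwo(x))
	}
}
```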

Change-Id: Ia12d5a73aa3cb49188c0b647b1eff7b56c5a7b58
Reviewed-on: https://go-review.googlesource.com/c/go/+/448255
Run-TryBot: Jakub Ciolek <jakub@ciolek.dev>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-01-20 04:58:59 +00:00
Jorropo
fc814056aa cmd/compile: rewrite empty makeslice to zerobase pointer
make\(\[\][a-zA-Z0-9]+, 0\) is seen 52 times in the Go source,
and at least 391 times on the internet:
https://grep.app/search?q=make%5C%28%5C%5B%5C%5D%5Ba-zA-Z0-9%5D%2B%2C%200%5C%29&regexp=true
This used to compile to calling runtime.makeslice.
However, we can do what we already do for []T{} and just use a zerobase pointer.

On my machine this is 10x faster (from 3ns to 0.3ns).
Note that an empty loop also runs in 0.3ns,
so this really is free when you account for superscalar execution.
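
The two source forms this change brings in line can be compared with a short program. This is only a sketch of the observable behaviour: both forms yield a non-nil slice with zero length and capacity. It does not show which pointer the compiler picks, and the function names are made up for the example.

```go
package main

import "fmt"

// emptyWithMake uses the form matched by the regexp in the commit message.
func emptyWithMake() []int { return make([]int, 0) }

// emptyLiteral uses the []T{} form that the commit message compares against.
func emptyLiteral() []int { return []int{} }

func main() {
	a, b := emptyWithMake(), emptyLiteral()
	// Both slices are non-nil and have zero length and capacity.
	fmt.Println(a != nil, len(a), cap(a))
	fmt.Println(b != nil, len(b), cap(b))
}
```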

Change-Id: I1cfe7e69f5a7a4dabbc71912ce6a4f8a2d4a7f3c
Reviewed-on: https://go-review.googlesource.com/c/go/+/454036
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Jakub Ciolek <jakub@ciolek.dev>
2023-01-20 04:57:35 +00:00
Keith Randall
f959fb3872 cmd/compile: add anchored version of SP
The SPanchored opcode is identical to SP, except that it takes a memory
argument so that it (and more importantly, anything that uses it)
must be scheduled at or after that memory argument.

This opcode ensures that a LEAQ of a variable gets scheduled after the
corresponding VARDEF for that variable.

This may lead to less CSE of LEAQ operations. The effect is very small.
The go binary is only 80 bytes bigger after this CL. Usually LEAQs get
folded into load/store operations, so the effect is limited to pointerful
types that are large enough to need a duffzero and have their address passed
somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because
the two uses are on different sides of a function call and the LEAQ
ends up being rematerialized at the second use anyway.

Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293
Reviewed-on: https://go-review.googlesource.com/c/go/+/452916
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2023-01-19 22:43:12 +00:00
Keith Randall
1eb0465fa5 cmd/compile: turn off jump tables when spectre retpolines are on
Fixes #57097

Change-Id: I6ab659abbca1ae0ac8710674d39aec116fab0baa
Reviewed-on: https://go-review.googlesource.com/c/go/+/455336
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2022-12-06 05:12:12 +00:00
Paul E. Murphy
dc6b7c86df cmd/compile: merge zero constant ISEL in PPC64 lateLower pass
Add a new SSA opcode ISELZ, similar to ISELB to represent a select
of value or 0. Then, merge candidate ISEL opcodes inside the late
lower pass.

This avoids complicating rules within the lower pass.

Change-Id: I3b14c94b763863aadc834b0e910a85870c131313
Reviewed-on: https://go-review.googlesource.com/c/go/+/442596
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
2022-11-14 19:44:47 +00:00
Wayne Zuo
268f4629df cmd/compile: enable branchelim pass on loong64
Change-Id: I4fd1c307901c265ab9865bf8a74460ddc15e5d14
Reviewed-on: https://go-review.googlesource.com/c/go/+/416735
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: xiaodong liu <teaofmoli@gmail.com>
Auto-Submit: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
2022-11-09 06:10:55 +00:00
Paul E. Murphy
390abbbbf1 codegen: check for PPC64 ISEL in condmove tests
ISEL is roughly equivalent to CMOV on PPC64. Verify ISEL generation
in all reasonable cases.

Note "ISEL test x y z" is the same as "ISEL !test y x z". test is
always one of LT (0), GT (1), EQ (2), SO (3). Sometimes x and y are
swapped if GE/LE/NE is desired.
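
A minimal sketch of the identity described above, written in Go rather than PPC64 assembly. The helper `sel` is a hypothetical model of a conditional select and does not reflect ISEL's actual operand order; `maxLT` and `maxGE` show how a GE test can be expressed by inverting the LT test and swapping the operands.

```go
package main

import "fmt"

// sel models a conditional select: it returns x when cond holds, else y.
// It is an illustration, not the ISEL encoding.
func sel(cond bool, x, y int64) int64 {
	if cond {
		return x
	}
	return y
}

// maxLT picks the larger value using a "less than" test.
func maxLT(a, b int64) int64 {
	return sel(a < b, b, a)
}

// maxGE computes the same result with the inverted test (>=)
// and the two selected operands swapped.
func maxGE(a, b int64) int64 {
	return sel(a >= b, a, b)
}

func main() {
	fmt.Println(maxLT(3, 7), maxGE(3, 7))
	fmt.Println(maxLT(9, 2), maxGE(9, 2))
}
```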

Change-Id: Ie1bf029224064e004d855099731fe5e8d05aa990
Reviewed-on: https://go-review.googlesource.com/c/go/+/445215
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2022-11-07 15:19:20 +00:00
Paul E. Murphy
d031e9e07a cmd/compile/internal/ssa: re-adjust CarryChainTail scheduling priority
This needs to be as low as possible while not breaking priority
assumptions of other scores to correctly schedule carry chains.

Prior to the arm64 changes, it was set below ReadTuple. At the time,
this prevented the MulHiLo implementation on PPC64 from occluding
the scheduling of a full carry chain.

Memory scores can also prevent better scheduling, as can be observed
with crypto/internal/edwards25519/field.feMulGeneric.

Fixes #56497

Change-Id: Ia4b54e6dffcce584faf46b1b8d7cea18a3913887
Reviewed-on: https://go-review.googlesource.com/c/go/+/447435
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2022-11-03 19:59:19 +00:00
Keith Randall
9ce27feaeb cmd/compile: add rule for post-decomposed growslice optimization
The recently added rule only works before decomposing slices.
Add a rule that works after decomposing slices.

We need the latter because, although the length may be a constant,
it can be hidden inside a slice that is not constant (its pointer
or capacity might be changing). Applying this optimization after
decomposing slices finds more cases where it applies.
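
As a rough illustration (a hypothetical function, not taken from the CL), the slice below has a statically known length after each append even though its pointer and capacity are only determined at run time. Whether a given compiler version actually folds `len(s)` here is not verified by this sketch.

```go
package main

import "fmt"

// buildLen appends two values to an empty slice and returns its length.
// The length is 2 on every path, while the backing pointer and capacity
// depend on what the runtime allocates.
func buildLen(a, b int) int {
	var s []int      // len 0
	s = append(s, a) // len 1
	s = append(s, b) // len 2
	return len(s)
}

func main() {
	fmt.Println(buildLen(10, 20))
}
```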

Fixes #56440

Change-Id: I0094e59eee3065ab4d210defdda8227a6e897420
Reviewed-on: https://go-review.googlesource.com/c/go/+/446277
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2022-10-31 21:40:49 +00:00
Keith Randall
0156b797e6 cmd/compile: recognize when the result of append has a constant length
Fixes a performance regression due to CL 418554.

Fixes #56440

Change-Id: I6ff152e9b83084756363f49ee6b0844a7a284880
Reviewed-on: https://go-review.googlesource.com/c/go/+/445875
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-10-27 17:09:50 +00:00
Wayne Zuo
90a3527427 cmd/compile: intrinsify Sub64 on loong64
This is a follow up of CL 420095 on loong64.

file                                    before    after     Δ       %
compile/internal/ssa.a                  35649482  35653274  +3792   +0.011%
compile/internal/ssagen.a               4099858   4098728   -1130   -0.028%
ecdh.a                                  227896    226896    -1000   -0.439%
internal/nistec/fiat.a                  1212254   1128184   -84070  -6.935%
tls.a                                   3256800   3256802   +2      +0.000%
big.a                                   1708518   1702496   -6022   -0.352%
bits.a                                  106762    105734    -1028   -0.963%
math.a                                  578762    577288    -1474   -0.255%
netip.a                                 555922    555610    -312    -0.056%
net.a                                   3286528   3286530   +2      +0.000%
golang.org/x/crypto/internal/poly1305.a 109546    107686    -1860   -1.698%
total                                   260392768 260299668 -93100  -0.036%
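
For context, a small sketch of the kind of code that benefits when `math/bits.Sub64` is intrinsified: a two-word subtraction that chains the borrow. The `wide` type and `subWide` function are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// wide is a hypothetical 128-bit unsigned value split into two words.
type wide struct{ hi, lo uint64 }

// subWide returns x - y and the final borrow, chaining bits.Sub64
// from the low word to the high word.
func subWide(x, y wide) (diff wide, borrow uint64) {
	var b uint64
	diff.lo, b = bits.Sub64(x.lo, y.lo, 0)
	diff.hi, borrow = bits.Sub64(x.hi, y.hi, b)
	return diff, borrow
}

func main() {
	d, b := subWide(wide{hi: 1, lo: 0}, wide{hi: 0, lo: 1})
	fmt.Printf("%#x %#x borrow=%d\n", d.hi, d.lo, b)
}
```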

Change-Id: Ieffca705aae5666501f284502d986ca179dde494
Reviewed-on: https://go-review.googlesource.com/c/go/+/428557
Reviewed-by: Carlos Amedee <carlos@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
2022-10-07 18:16:26 +00:00
Wayne Zuo
97760ed651 cmd/compile: intrinsify Add64 on loong64
This is a follow up of CL 420094 on loong64.

Reduce go toolchain size slightly on linux/loong64.

compilecmp HEAD~1 -> HEAD
HEAD~1 (8a32354219): internal/trace: use strings.Builder
HEAD (1767784ac3): cmd/compile: intrinsify Add64 on loong64
platform: linux/loong64

file      before    after     Δ       %
addr2line 3882616   3882536   -80     -0.002%
api       5528866   5528450   -416    -0.008%
asm       5133780   5133796   +16     +0.000%
cgo       4668787   4668491   -296    -0.006%
compile   25163409  25164729  +1320   +0.005%
cover     4658055   4658007   -48     -0.001%
dist      3437783   3437727   -56     -0.002%
doc       3883069   3883205   +136    +0.004%
fix       3383254   3383070   -184    -0.005%
link      6747559   6747023   -536    -0.008%
nm        3793923   3793939   +16     +0.000%
objdump   4256628   4256812   +184    +0.004%
pack      2356328   2356144   -184    -0.008%
pprof     14233370  14131910  -101460 -0.713%
test2json 2638668   2638476   -192    -0.007%
trace     13392065  13360781  -31284  -0.234%
vet       7456388   7455588   -800    -0.011%
total     132498256 132364392 -133864 -0.101%

file                                    before    after     Δ       %
compile/internal/ssa.a                  35644590  35649482  +4892   +0.014%
compile/internal/ssagen.a               4101250   4099858   -1392   -0.034%
internal/edwards25519/field.a           226064    201718    -24346  -10.770%
internal/nistec/fiat.a                  1689922   1212254   -477668 -28.266%
tls.a                                   3256798   3256800   +2      +0.000%
big.a                                   1718552   1708518   -10034  -0.584%
bits.a                                  107786    106762    -1024   -0.950%
cmplx.a                                 169434    168214    -1220   -0.720%
math.a                                  581302    578762    -2540   -0.437%
netip.a                                 556096    555922    -174    -0.031%
net.a                                   3286526   3286528   +2      +0.000%
runtime.a                               8644786   8644510   -276    -0.003%
strconv.a                               519098    518374    -724    -0.139%
golang.org/x/crypto/internal/poly1305.a 115398    109546    -5852   -5.071%
total                                   260913122 260392768 -520354 -0.199%
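
The large reductions in the field and fiat packages are consistent with code that chains carries through `math/bits.Add64`. A minimal sketch of that pattern follows; the `u128` type and `addU128` function are hypothetical and are not taken from those packages.

```go
package main

import (
	"fmt"
	"math/bits"
)

// u128 is a hypothetical 128-bit unsigned value split into two words.
type u128 struct{ hi, lo uint64 }

// addU128 returns x + y and the final carry, chaining bits.Add64
// from the low word to the high word.
func addU128(x, y u128) (sum u128, carry uint64) {
	var c uint64
	sum.lo, c = bits.Add64(x.lo, y.lo, 0)
	sum.hi, carry = bits.Add64(x.hi, y.hi, c)
	return sum, carry
}

func main() {
	s, c := addU128(u128{hi: 0, lo: ^uint64(0)}, u128{hi: 0, lo: 1})
	fmt.Printf("%#x %#x carry=%d\n", s.hi, s.lo, c)
}
```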

Change-Id: I75b2bb7761fa5a0d0d032d4ebe3582d092ea77be
Reviewed-on: https://go-review.googlesource.com/c/go/+/428556
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-07 18:16:10 +00:00
Wayne Zuo
af668c689c cmd/compile: fold constant shift with extension on riscv64
For example:

  movb a0, a0
  srai $1, a0, a0

which the assembler expands to:

  slli $56, a0, a0
  srai $56, a0, a0
  srai $1, a0, a0

This CL optimizes it to:

  slli $56, a0, a0
  srai $57, a0, a0

This removes 270+ instructions from the Go binary on linux/riscv64.
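
The equivalence behind this folding can be checked in plain Go. The sketch below models the two instruction sequences with 64-bit arithmetic shifts; the function names are hypothetical and the code is an illustration, not the compiler rule itself.

```go
package main

import "fmt"

// shiftAfterSignExtend mirrors the unoptimized sequence: sign-extend the
// low byte (shift left 56, arithmetic shift right 56), then shift right by 1.
func shiftAfterSignExtend(x int64) int64 {
	ext := (x << 56) >> 56
	return ext >> 1
}

// foldedShift mirrors the optimized sequence: shift left 56, then a single
// arithmetic shift right by 57.
func foldedShift(x int64) int64 {
	return (x << 56) >> 57
}

func main() {
	for _, x := range []int64{0, 1, 0x7f, 0x80, 0xff, -1, 0x1234} {
		fmt.Println(x, shiftAfterSignExtend(x), foldedShift(x))
	}
}
```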

Change-Id: I375e19f9d3bd54f2781791d8cbe5970191297dc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/428496
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-06 05:21:04 +00:00
eric fang
ddc7d2a80c cmd/compile: add late lower pass for last rules to run
Optimization rules usually have implicit priorities: some need to run
first, some next, and some last to produce the best code. Currently our
optimization rules have no priority. This CL adds a late lower pass that
runs the rules that must run last, such as splitting unreasonable
constant folding. This pass can be seen as a second round of the lower
pass.

For example:
func foo(a, b uint64) uint64 {
        d := a+0x1234568
        d1 := b+0x1234568
        return d&d1
}
The code generated by the master branch:
	0x0004 00004        ADD     $19088744, R0, R2 // movz+movk+add
	0x0010 00016        ADD     $19088744, R1, R1 // movz+movk+add
	0x001c 00028        AND     R1, R2, R0

This is because the current constant folding rules do not take the
range of constants into account, so the constant is loaded repeatedly.
This CL splits such unreasonable constant folding in the late lower
pass. With this CL the generated code is:
	0x0004 00004        MOVD    $19088744, R2 // movz+movk
	0x000c 00012        ADD     R0, R2, R3
	0x0010 00016        ADD     R1, R2, R1
	0x0014 00020        AND     R1, R3, R0

This CL also adds constant folding optimization for the ADDS instruction.

To avoid introducing a codegen regression, an optimization rule is added
that changes the addition of a negative number into a subtraction of a
positive number.
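
A runnable version of the example function, for anyone who wants to inspect the generated code themselves (for instance with `go build -gcflags=-S` on arm64). The constant name `k` is introduced here for illustration; 0x1234568 is the decimal 19088744 seen in the listings above.

```go
package main

import "fmt"

// k is the constant from the commit message; it does not fit in a single
// arm64 ADD immediate, which is why it matters how often it is materialized.
const k = 0x1234568

// foo adds the same wide constant to both arguments and ANDs the results.
func foo(a, b uint64) uint64 {
	d := a + k
	d1 := b + k
	return d & d1
}

func main() {
	fmt.Println(k, foo(1, 2))
}
```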

go1 benchmarks:
name                     old time/op    new time/op    delta
BinaryTree17-8              1.22s ± 1%     1.24s ± 0%  +1.56%  (p=0.008 n=5+5)
Fannkuch11-8                1.54s ± 0%     1.53s ± 0%  -0.69%  (p=0.016 n=4+5)
FmtFprintfEmpty-8          14.1ns ± 0%    14.1ns ± 0%    ~     (p=0.079 n=4+5)
FmtFprintfString-8         26.0ns ± 0%    26.1ns ± 0%  +0.23%  (p=0.008 n=5+5)
FmtFprintfInt-8            32.3ns ± 0%    32.9ns ± 1%  +1.72%  (p=0.008 n=5+5)
FmtFprintfIntInt-8         54.5ns ± 0%    55.5ns ± 0%  +1.83%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8    61.5ns ± 0%    62.0ns ± 0%  +0.93%  (p=0.008 n=5+5)
FmtFprintfFloat-8          72.0ns ± 0%    73.6ns ± 0%  +2.24%  (p=0.008 n=5+5)
FmtManyArgs-8               221ns ± 0%     224ns ± 0%  +1.22%  (p=0.008 n=5+5)
GobDecode-8                1.91ms ± 0%    1.93ms ± 0%  +0.98%  (p=0.008 n=5+5)
GobEncode-8                1.40ms ± 1%    1.39ms ± 0%  -0.79%  (p=0.032 n=5+5)
Gzip-8                      115ms ± 0%     117ms ± 1%  +1.17%  (p=0.008 n=5+5)
Gunzip-8                   19.4ms ± 1%    19.3ms ± 0%  -0.71%  (p=0.016 n=5+4)
HTTPClientServer-8         27.0µs ± 0%    27.3µs ± 0%  +0.80%  (p=0.008 n=5+5)
JSONEncode-8               3.36ms ± 1%    3.33ms ± 0%    ~     (p=0.056 n=5+5)
JSONDecode-8               17.5ms ± 2%    17.8ms ± 0%  +1.71%  (p=0.016 n=5+4)
Mandelbrot200-8            2.29ms ± 0%    2.29ms ± 0%    ~     (p=0.151 n=5+5)
GoParse-8                  1.35ms ± 1%    1.36ms ± 1%    ~     (p=0.056 n=5+5)
RegexpMatchEasy0_32-8      24.5ns ± 0%    24.5ns ± 0%    ~     (p=0.444 n=4+5)
RegexpMatchEasy0_1K-8       131ns ±11%     118ns ± 6%    ~     (p=0.056 n=5+5)
RegexpMatchEasy1_32-8      22.9ns ± 0%    22.9ns ± 0%    ~     (p=0.905 n=4+5)
RegexpMatchEasy1_1K-8       126ns ± 0%     127ns ± 0%    ~     (p=0.063 n=4+5)
RegexpMatchMedium_32-8      486ns ± 5%     483ns ± 0%    ~     (p=0.381 n=5+4)
RegexpMatchMedium_1K-8     15.4µs ± 1%    15.5µs ± 0%    ~     (p=0.151 n=5+5)
RegexpMatchHard_32-8        687ns ± 0%     686ns ± 0%    ~     (p=0.103 n=5+5)
RegexpMatchHard_1K-8       20.7µs ± 0%    20.7µs ± 1%    ~     (p=0.151 n=5+5)
Revcomp-8                   175ms ± 2%     176ms ± 3%    ~     (p=1.000 n=5+5)
Template-8                 20.4ms ± 6%    20.1ms ± 2%    ~     (p=0.151 n=5+5)
TimeParse-8                 112ns ± 0%     113ns ± 0%  +0.97%  (p=0.016 n=5+4)
TimeFormat-8                156ns ± 0%     145ns ± 0%  -7.14%  (p=0.029 n=4+4)

Change-Id: I3ced26e89041f873ac989586514ccc5ee09f13da
Reviewed-on: https://go-review.googlesource.com/c/go/+/425134
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
2022-10-05 02:40:56 +00:00
Keith Randall
6485e8f503 cmd/compile: use stricter rule for possible partial overlap
Partial overlaps can only happen for strict sub-pieces of larger arrays.
That's a much stronger condition than the current optimization rules.
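
A minimal sketch of a partial overlap, assuming Go 1.17 or later for the slice-to-array-pointer conversion. Both array pointers below refer to strict sub-pieces of one larger backing array, so source and destination overlap by three elements. The function name is hypothetical, and this is an illustration of the situation rather than the test case from the issue.

```go
package main

import "fmt"

// shiftRight assigns the array at s[0:4] to the array at s[1:5].
// The two 4-element arrays are sub-pieces of the same backing array and
// partially overlap, so the copy must behave as if the source were read
// in full before the destination is written.
func shiftRight(s []int) {
	dst := (*[4]int)(s[1:5])
	src := (*[4]int)(s[0:4])
	*dst = *src
}

func main() {
	s := []int{1, 2, 3, 4, 5}
	shiftRight(s)
	fmt.Println(s)
}
```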

Update #54467

Change-Id: I11e539b71099e50175f37ee78fddf69283f83ee5
Reviewed-on: https://go-review.googlesource.com/c/go/+/433056
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2022-09-27 20:09:33 +00:00