mirror of https://github.com/golang/go synced 2024-11-16 20:24:54 -07:00
Commit Graph

58447 Commits

Author SHA1 Message Date
apocelipes
e73e25b624 internal/cpu: add comments to copied functions
Just the same as other copied functions,
such as stringsTrimSuffix in "os/executable_procfs.go".

Change-Id: I9c9fbd75b009a5ae0e869cf1fddc77c0e08d9a67
GitHub-Last-Rev: 4c18865e15
GitHub-Pull-Request: golang/go#63704
Reviewed-on: https://go-review.googlesource.com/c/go/+/537056
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-10-31 21:32:19 +00:00
Cherry Mui
d2f3a68bf0 runtime: use testenv.Command in TestG0StackOverflow
For debugging timeouts.

Change-Id: I08dc86ec0264196e5fd54066655e94a9d062ed80
Reviewed-on: https://go-review.googlesource.com/c/go/+/538697
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-31 20:50:47 +00:00
Keith Randall
b11defeaed runtime: make select fairness test less picky
Allow up to 10 standard deviations from the mean, instead of
~5 that the current test allows.

10 standard deviations allows up to a 4500/5500 split.
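
The 4500/5500 figure can be reproduced with a little binomial arithmetic. The sketch below assumes 10000 trials of a two-way select with equally likely cases; the trial count is inferred from the split quoted above, not taken from the test source, and `splitBounds` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math"
)

// splitBounds returns the counts that lie k standard deviations below and
// above the mean for one of two equally likely outcomes over n trials.
// The standard deviation of a binomial(n, 0.5) count is sqrt(n * 0.5 * 0.5).
func splitBounds(n int, k float64) (lo, hi float64) {
	mean := float64(n) / 2
	sd := math.Sqrt(float64(n) * 0.5 * 0.5)
	return mean - k*sd, mean + k*sd
}

func main() {
	// Assumed trial count: 10000 (sd = 50, so 10 sd = 500).
	lo, hi := splitBounds(10000, 10)
	fmt.Printf("%.0f/%.0f\n", lo, hi) // prints "4500/5500"
}
```

With the same assumed trial count, 5 standard deviations would correspond to a 4750/5250 split.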

Fixes #52465

Change-Id: Icb21c1d31fafbcf4723b75435ba5e98863e812c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/538815
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-31 20:47:35 +00:00
Keith Randall
962ccbef91 cmd/compile: ensure pointer arithmetic happens after the nil check
Have nil checks return a pointer that is known non-nil. Users of
that pointer can use the result, ensuring that they are ordered
after the nil check itself.

The order dependence goes away after scheduling, when we've fixed
an order. At that point we move uses back to the original pointer
so it doesn't change regalloc any.

This prevents pointer arithmetic on nil from being spilled to the
stack and then observed by a stack scan.

Fixes #63657

Change-Id: I1a5fa4f2e6d9000d672792b4f90dfc1b7b67f6ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/537775
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-31 20:45:54 +00:00
Keith Randall
43b57b8516 cmd/compile: handle constant pointer offsets in dead store elimination
Update #63657
Update #45573

Change-Id: I163c6038c13d974dc0ca9f02144472bc05331826
Reviewed-on: https://go-review.googlesource.com/c/go/+/538595
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-31 20:42:56 +00:00
Keith Randall
66b8107a26 runtime: on arm32, detect whether we have sync instructions
Make the choice of using these instructions dynamic (triggered by cpu
feature detection) rather than static (triggered by GOARM setting).

If GOARM>=7, we know we have them.
For GOARM=5/6, dynamically dispatch based on auxv information.

Update #17082
Update #61588

Change-Id: I8a50481d942f62cf36348998a99225d0d242f8af
Reviewed-on: https://go-review.googlesource.com/c/go/+/525637
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-10-31 20:38:55 +00:00
Mateusz Poliwczak
dd84bb6824 crypto/x509: add new OID type and use it in Certificate
Fixes #60665

Change-Id: I814b7d4b26b964f74443584fb2048b3e27e3b675
GitHub-Last-Rev: 693c741c76
GitHub-Pull-Request: golang/go#62096
Reviewed-on: https://go-review.googlesource.com/c/go/+/520535
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Mateusz Poliwczak <mpoliwczak34@gmail.com>
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-10-31 19:22:19 +00:00
Jes Cok
68e52bc03c bytes,internal/bytealg: eliminate IndexRabinKarpBytes using generics
This is a follow-up to CL 538175.
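
As a rough illustration of how generics let one function serve both strings and byte slices, here is a minimal Rabin-Karp sketch. It is not the code from this CL: the names `indexRabinKarp`, `hashOf` and `primeRK`, and the multiplier value, are chosen for the example only.

```go
package main

import "fmt"

// primeRK is the rolling-hash multiplier (an illustrative choice).
const primeRK = 16777619

// hashOf returns the hash of sep and primeRK raised to len(sep), the factor
// needed to remove the oldest byte from the rolling hash.
func hashOf[T string | []byte](sep T) (hash, pow uint32) {
	pow = 1
	for i := 0; i < len(sep); i++ {
		hash = hash*primeRK + uint32(sep[i])
		pow *= primeRK
	}
	return hash, pow
}

// indexRabinKarp returns the index of the first occurrence of sep in s,
// or -1 if sep is not present. One generic body covers string and []byte.
func indexRabinKarp[T string | []byte](s, sep T) int {
	n := len(sep)
	if n > len(s) {
		return -1
	}
	target, pow := hashOf(sep)
	var h uint32
	for i := 0; i < n; i++ {
		h = h*primeRK + uint32(s[i])
	}
	if h == target && string(s[:n]) == string(sep) {
		return 0
	}
	for i := n; i < len(s); {
		// Slide the window: add the new byte, drop the oldest one.
		h = h*primeRK + uint32(s[i]) - pow*uint32(s[i-n])
		i++
		// Confirm a hash match with a direct comparison to rule out collisions.
		if h == target && string(s[i-n:i]) == string(sep) {
			return i - n
		}
	}
	return -1
}

func main() {
	fmt.Println(indexRabinKarp("chicken", "ken"))
	fmt.Println(indexRabinKarp([]byte("chicken"), []byte("ken")))
}
```

Both calls in `main` go through the same function body, which is the point of the change: a separate bytes-only copy is no longer needed.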

Change-Id: Iec2523b36a16d7e157c17858c89fcd43c2470d58
GitHub-Last-Rev: 812d36e57c
GitHub-Pull-Request: golang/go#63770
Reviewed-on: https://go-review.googlesource.com/c/go/+/538195
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-31 17:14:04 +00:00
Jes Cok
cbc403af1d cmd/compile/internal/ssa: adjust default to the end in *Block.AuxIntString
Change-Id: Id48cade7811e2dfbf78d3171fe202ad272534e37
GitHub-Last-Rev: ea6abb2dc2
GitHub-Pull-Request: golang/go#63808
Reviewed-on: https://go-review.googlesource.com/c/go/+/538377
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-31 17:13:33 +00:00
Cuong Manh Le
3dea7c3f69 hash/maphash: weaken avalanche test a bit more
CL 495415 weakened the avalanche test, widening the allowed range to
43%-57%. Since then, we have only seen a failure with 58% on the
linux-386-longtest builder, so let's give the test a bit more wiggle
room: 40% to 59%.
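
For readers unfamiliar with avalanche testing, the sketch below shows the general idea: flip one input bit at a time and measure how often each output bit changes, expecting roughly 50%. This is a simplified stand-in, not the test in hash/maphash; `flipRange`, the 8-byte inputs and the trial count are assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// flipRange hashes `trials` 8-byte inputs, flips each of the 64 input bits in
// turn, and records how often each of the 64 output bits changes. It returns
// the lowest and highest per-output-bit flip fractions.
func flipRange(trials int) (lo, hi float64) {
	seed := maphash.MakeSeed()
	var flips [64]int
	buf := make([]byte, 8)
	samples := 0
	for t := 0; t < trials; t++ {
		// Spread a counter over the buffer to get varied inputs.
		x := uint64(t) * 0x9e3779b97f4a7c15
		for i := range buf {
			buf[i] = byte(x >> (8 * i))
		}
		base := maphash.Bytes(seed, buf)
		for bit := 0; bit < 64; bit++ {
			buf[bit/8] ^= 1 << (bit % 8) // flip one input bit
			diff := base ^ maphash.Bytes(seed, buf)
			buf[bit/8] ^= 1 << (bit % 8) // restore it
			for out := 0; out < 64; out++ {
				if diff&(1<<out) != 0 {
					flips[out]++
				}
			}
			samples++
		}
	}
	lo, hi = 1, 0
	for _, c := range flips {
		f := float64(c) / float64(samples)
		if f < lo {
			lo = f
		}
		if f > hi {
			hi = f
		}
	}
	return lo, hi
}

func main() {
	lo, hi := flipRange(200)
	fmt.Printf("flip fraction per output bit: %.3f to %.3f\n", lo, hi)
}
```

Because the seed is random, the exact fractions vary from run to run, which is why a test of this kind needs a tolerance band rather than an exact expectation.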

Fixes #60170

Change-Id: I9528ebc8601975b733c3d9fd464ce41429654273
Reviewed-on: https://go-review.googlesource.com/c/go/+/538655
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-10-31 17:00:31 +00:00
cui fliter
289b823ac9 internal/bytealg: optimize Count/CountString in arm64
For #63678

goos: darwin
goarch: arm64
pkg: strings
                          │ count_old.txt │            count_new.txt            │
                          │    sec/op     │   sec/op     vs base                │
CountHard1-8                 368.7µ ± 11%   332.0µ ± 1%   -9.95% (p=0.002 n=10)
CountHard2-8                 348.8µ ±  5%   333.1µ ± 1%   -4.51% (p=0.000 n=10)
CountHard3-8                 402.7µ ± 25%   359.5µ ± 1%  -10.75% (p=0.000 n=10)
CountTorture-8              10.536µ ± 23%   9.913µ ± 0%   -5.91% (p=0.000 n=10)
CountTortureOverlapping-8    74.86µ ±  9%   67.56µ ± 1%   -9.75% (p=0.000 n=10)
CountByte/10-8               6.905n ±  3%   6.690n ± 1%   -3.11% (p=0.001 n=10)
CountByte/32-8               3.247n ± 13%   3.207n ± 2%   -1.23% (p=0.030 n=10)
CountByte/4096-8             83.72n ±  1%   82.58n ± 1%   -1.36% (p=0.007 n=10)
CountByte/4194304-8          85.17µ ±  5%   84.02µ ± 8%        ~ (p=0.075 n=10)
CountByte/67108864-8         1.497m ±  8%   1.397m ± 2%   -6.69% (p=0.000 n=10)
geomean                      9.977µ         9.426µ        -5.53%

                     │ count_old.txt │            count_new.txt            │
                     │      B/s      │     B/s       vs base               │
CountByte/10-8         1.349Gi ±  3%   1.392Gi ± 1%  +3.20% (p=0.002 n=10)
CountByte/32-8         9.180Gi ± 11%   9.294Gi ± 2%  +1.24% (p=0.029 n=10)
CountByte/4096-8       45.57Gi ±  1%   46.20Gi ± 1%  +1.38% (p=0.007 n=10)
CountByte/4194304-8    45.86Gi ±  5%   46.49Gi ± 7%       ~ (p=0.075 n=10)
CountByte/67108864-8   41.75Gi ±  8%   44.74Gi ± 2%  +7.16% (p=0.000 n=10)
geomean                16.10Gi         16.55Gi       +2.85%

Change-Id: Ifc2173ba3a926b0fa9598372d4404b8645929d45
Reviewed-on: https://go-review.googlesource.com/c/go/+/538116
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: shuang cui <imcusg@gmail.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-31 17:00:27 +00:00
Joel Sing
e293c4b509 runtime: allocate crash stack via stackalloc
On some platforms (notably OpenBSD), stacks must be specifically allocated
and marked as being stack memory. Allocate the crash stack using stackalloc,
which ensures these requirements are met, rather than using a global Go
variable.

Fixes #63794

Change-Id: I6513575797dd69ff0a36f3bfd4e5fc3bd95cbf50
Reviewed-on: https://go-review.googlesource.com/c/go/+/538457
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-31 16:28:14 +00:00
Robert Griesemer
b7a66be69c cmd/compile/internal/syntax: set up dummy name and type if func name is missing
We do the same elsewhere (e.g. in parser.name when a name is missing).
This ensures functions have a (dummy) name and a non-nil type.
Avoids a crash in the type-checker (verified manually).
A test was added here (rather than the type checker) because type-
checker tests are shared between types2 and go/types and error
recovery in this case is different.

Fixes #63835.

Change-Id: I1460fc88d23d80b8d8c181c774d6b0a56ca06317
Reviewed-on: https://go-review.googlesource.com/c/go/+/538059
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Bypass: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Run-TryBot: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
2023-10-31 16:12:41 +00:00
Robert Griesemer
25a59decd5 go/types, types2: more concise error if conversion fails due to integer overflow
This change brings the error message for this case back in line
with the pre-Go1.18 error message.

Fixes #63563.

Change-Id: I3c6587d420907b34ee8a5f295ecb231e9f008380
Reviewed-on: https://go-review.googlesource.com/c/go/+/538058
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Run-TryBot: Robert Griesemer <gri@google.com>
TryBot-Bypass: Robert Griesemer <gri@google.com>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
2023-10-31 16:11:16 +00:00
Joel Sing
b6a3c0273e cmd/dist,internal/platform: enable openbsd/ppc64 port
Updates #56001

Change-Id: I16440114ecf661e9fc17d304ab3b16bc97ef82f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/517935
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2023-10-31 12:43:19 +00:00
Jes Cok
f215a0be4d cmd/compile/internal/ssa: add missing space in comment
Change-Id: I54c3e8e0d61ceb6533284098dc32944f9f14459e
GitHub-Last-Rev: 9793d9d039
GitHub-Pull-Request: golang/go#63806
Reviewed-on: https://go-review.googlesource.com/c/go/+/538375
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: qiulaidongfeng <2645477756@qq.com>
2023-10-30 21:52:15 +00:00
qiulaidongfeng
9c2ab20d48 internal/fmtsort: makeChans pin pointer
Complete TODO.

For #49431

Change-Id: I1399205e430ebd83182c3e0c4becf1fde32d433e
GitHub-Last-Rev: 02cdea740b
GitHub-Pull-Request: golang/go#62673
Reviewed-on: https://go-review.googlesource.com/c/go/+/528796
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Commit-Queue: Keith Randall <khr@golang.org>
Run-TryBot: qiulaidongfeng <2645477756@qq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-30 21:00:16 +00:00
Quan Tong
214ce28503 cmd/go/internal/help: update the documentation to match the design and implementation
The existing documentation implies that build constraints
are ignored after a block comment, but actually they are not.

Fixes #63502

Change-Id: I0597934b7a7eeab8908bf06e1312169b3702bf05
Reviewed-on: https://go-review.googlesource.com/c/go/+/535635
Reviewed-by: Michael Matloob <matloob@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Mark Pictor <mark.pictor@contrastsecurity.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-10-30 18:16:15 +00:00
Allen Li
1e95fc7ffe log/slog: Reorder doc comment for level constants
pkgsite and go doc print the doc comment *after* the code, resulting in:

    const (
            LevelDebug Level = -4
            ...
    )

    Many paragraphs...

    Names for common levels.

The "Names for common levels." feels out of place and confusing at the bottom.

This is also consistent with the recommendation for the first sentence in doc comments to be the "summary".
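
A sketch of the reordered layout, with the summary sentence first so that it still reads sensibly when tools print the comment after the code. `Level` here is a local stand-in type for the example, and the comment text is abbreviated rather than copied from log/slog.

```go
package main

import "fmt"

// Level is a local stand-in for slog.Level, used only in this example.
type Level int

// Names for common levels.
//
// Longer explanatory paragraphs go after the summary sentence, so the
// first line printed by documentation tools describes the constants.
const (
	LevelDebug Level = -4
	LevelInfo  Level = 0
	LevelWarn  Level = 4
	LevelError Level = 8
)

func main() {
	fmt.Println(LevelDebug, LevelInfo, LevelWarn, LevelError)
}
```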

Change-Id: I656e85e27d2a4b23eaba5f2c1f4f811a88848c83
GitHub-Last-Rev: d9f7ee9b94
GitHub-Pull-Request: golang/go#61943
Reviewed-on: https://go-review.googlesource.com/c/go/+/518537
Reviewed-by: Alan Donovan <alan@alandonovan.net>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: qiulaidongfeng <2645477756@qq.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
2023-10-30 17:34:43 +00:00
Russ Cox
8abde68f19 math/rand/v2: delete Mitchell/Reeds source
These slowdowns are because we are now using PCG instead of the
Mitchell/Reeds LFSR for the benchmarks. PCG is in fact a bit slower
(but generates statistically far better random numbers).

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 01ff938549.amd64 │           afa459a2f0.amd64           │
                        │      sec/op      │    sec/op     vs base                │
PCG_DXSM-32                    1.490n ± 0%    1.488n ± 2%        ~ (p=0.408 n=20)
SourceUint64-32                1.352n ± 1%    1.450n ± 3%   +7.21% (p=0.000 n=20)
GlobalInt64-32                 2.083n ± 0%    2.067n ± 2%        ~ (p=0.223 n=20)
GlobalInt64Parallel-32        0.1035n ± 1%   0.1044n ± 2%        ~ (p=0.010 n=20)
GlobalUint64-32                2.038n ± 1%    2.085n ± 0%   +2.28% (p=0.000 n=20)
GlobalUint64Parallel-32       0.1006n ± 1%   0.1008n ± 1%        ~ (p=0.733 n=20)
Int64-32                       1.687n ± 2%    1.779n ± 1%   +5.48% (p=0.000 n=20)
Uint64-32                      1.674n ± 2%    1.854n ± 2%  +10.69% (p=0.000 n=20)
GlobalIntN1000-32              3.135n ± 1%    3.140n ± 3%        ~ (p=0.794 n=20)
IntN1000-32                    2.478n ± 1%    2.496n ± 1%   +0.73% (p=0.006 n=20)
Int64N1000-32                  2.455n ± 1%    2.510n ± 2%   +2.22% (p=0.000 n=20)
Int64N1e8-32                   2.467n ± 2%    2.471n ± 2%        ~ (p=0.050 n=20)
Int64N1e9-32                   2.454n ± 1%    2.488n ± 2%   +1.39% (p=0.000 n=20)
Int64N2e9-32                   2.482n ± 1%    2.478n ± 2%        ~ (p=0.066 n=20)
Int64N1e18-32                  3.349n ± 2%    3.088n ± 1%   -7.81% (p=0.000 n=20)
Int64N2e18-32                  3.537n ± 1%    3.493n ± 1%   -1.24% (p=0.002 n=20)
Int64N4e18-32                  4.917n ± 0%    5.060n ± 2%   +2.91% (p=0.000 n=20)
Int32N1000-32                  2.386n ± 1%    2.620n ± 1%   +9.76% (p=0.000 n=20)
Int32N1e8-32                   2.366n ± 1%    2.652n ± 0%  +12.11% (p=0.000 n=20)
Int32N1e9-32                   2.355n ± 2%    2.644n ± 1%  +12.32% (p=0.000 n=20)
Int32N2e9-32                   2.371n ± 1%    2.619n ± 2%  +10.48% (p=0.000 n=20)
Float32-32                     2.245n ± 2%    2.261n ± 1%        ~ (p=0.625 n=20)
Float64-32                     2.235n ± 1%    2.241n ± 2%        ~ (p=0.393 n=20)
ExpFloat64-32                  3.813n ± 3%    3.716n ± 1%   -2.53% (p=0.000 n=20)
NormFloat64-32                 3.652n ± 2%    3.718n ± 1%   +1.79% (p=0.006 n=20)
Perm3-32                       33.12n ± 3%    34.11n ± 2%        ~ (p=0.021 n=20)
Perm30-32                      205.1n ± 1%    200.6n ± 0%   -2.17% (p=0.000 n=20)
Perm30ViaShuffle-32            110.8n ± 1%    109.7n ± 1%   -0.99% (p=0.002 n=20)
ShuffleOverhead-32             113.0n ± 1%    107.2n ± 1%   -5.09% (p=0.000 n=20)
Concurrent-32                  2.100n ± 0%    2.108n ± 6%        ~ (p=0.103 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
                       │ 01ff938549.arm64 │           afa459a2f0.arm64           │
                       │      sec/op      │    sec/op     vs base                │
PCG_DXSM-8                    2.531n ± 0%    2.531n ± 0%        ~ (p=0.763 n=20)
SourceUint64-8                2.258n ± 1%    2.531n ± 0%  +12.09% (p=0.000 n=20)
GlobalInt64-8                 2.167n ± 0%    2.177n ± 1%        ~ (p=0.213 n=20)
GlobalInt64Parallel-8        0.4310n ± 0%   0.4319n ± 0%        ~ (p=0.027 n=20)
GlobalUint64-8                2.182n ± 1%    2.185n ± 1%        ~ (p=0.683 n=20)
GlobalUint64Parallel-8       0.4297n ± 0%   0.4295n ± 1%        ~ (p=0.941 n=20)
Int64-8                       2.472n ± 1%    4.104n ± 0%  +66.00% (p=0.000 n=20)
Uint64-8                      2.449n ± 1%    4.080n ± 0%  +66.60% (p=0.000 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 1%        ~ (p=0.972 n=20)
IntN1000-8                    2.998n ± 2%    4.140n ± 0%  +38.09% (p=0.000 n=20)
Int64N1000-8                  2.949n ± 2%    4.139n ± 0%  +40.35% (p=0.000 n=20)
Int64N1e8-8                   2.953n ± 2%    4.140n ± 0%  +40.22% (p=0.000 n=20)
Int64N1e9-8                   2.950n ± 0%    4.139n ± 0%  +40.32% (p=0.000 n=20)
Int64N2e9-8                   2.946n ± 2%    4.140n ± 0%  +40.53% (p=0.000 n=20)
Int64N1e18-8                  3.779n ± 1%    5.273n ± 0%  +39.52% (p=0.000 n=20)
Int64N2e18-8                  4.370n ± 1%    6.059n ± 0%  +38.65% (p=0.000 n=20)
Int64N4e18-8                  6.544n ± 1%    8.803n ± 0%  +34.52% (p=0.000 n=20)
Int32N1000-8                  2.950n ± 0%    4.131n ± 0%  +40.06% (p=0.000 n=20)
Int32N1e8-8                   2.950n ± 2%    4.131n ± 0%  +40.03% (p=0.000 n=20)
Int32N1e9-8                   2.951n ± 2%    4.131n ± 0%  +39.99% (p=0.000 n=20)
Int32N2e9-8                   2.950n ± 2%    4.131n ± 0%  +40.03% (p=0.000 n=20)
Float32-8                     3.441n ± 0%    4.110n ± 0%  +19.44% (p=0.000 n=20)
Float64-8                     3.442n ± 0%    4.104n ± 0%  +19.24% (p=0.000 n=20)
ExpFloat64-8                  4.481n ± 0%    5.338n ± 0%  +19.11% (p=0.000 n=20)
NormFloat64-8                 4.725n ± 0%    5.731n ± 0%  +21.28% (p=0.000 n=20)
Perm3-8                       26.55n ± 0%    26.62n ± 0%   +0.28% (p=0.000 n=20)
Perm30-8                      181.9n ± 0%    194.6n ± 2%   +6.98% (p=0.000 n=20)
Perm30ViaShuffle-8            142.9n ± 0%    156.4n ± 0%   +9.45% (p=0.000 n=20)
ShuffleOverhead-8             120.8n ± 2%    125.8n ± 0%   +4.10% (p=0.000 n=20)
Concurrent-8                  2.421n ± 6%    2.654n ± 6%   +9.67% (p=0.002 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 01ff938549.386 │            afa459a2f0.386             │
                        │     sec/op     │    sec/op     vs base                 │
PCG_DXSM-32                  7.613n ± 1%    7.793n ± 2%    +2.38% (p=0.000 n=20)
SourceUint64-32              2.069n ± 0%    7.680n ± 1%  +271.19% (p=0.000 n=20)
GlobalInt64-32               3.456n ± 1%    3.474n ± 3%         ~ (p=0.654 n=20)
GlobalInt64Parallel-32      0.3252n ± 0%   0.3253n ± 0%         ~ (p=0.952 n=20)
GlobalUint64-32              3.573n ± 1%    3.433n ± 2%    -3.92% (p=0.000 n=20)
GlobalUint64Parallel-32     0.3159n ± 0%   0.3156n ± 0%         ~ (p=0.223 n=20)
Int64-32                     2.562n ± 2%    7.707n ± 1%  +200.74% (p=0.000 n=20)
Uint64-32                    2.592n ± 0%    7.714n ± 1%  +197.65% (p=0.000 n=20)
GlobalIntN1000-32            6.266n ± 2%    6.236n ± 1%         ~ (p=0.039 n=20)
IntN1000-32                  4.724n ± 2%   10.410n ± 1%  +120.39% (p=0.000 n=20)
Int64N1000-32                5.490n ± 2%   10.975n ± 2%   +99.89% (p=0.000 n=20)
Int64N1e8-32                 5.513n ± 2%   10.980n ± 1%   +99.15% (p=0.000 n=20)
Int64N1e9-32                 5.476n ± 1%   10.950n ± 0%   +99.96% (p=0.000 n=20)
Int64N2e9-32                 5.501n ± 2%   11.110n ± 1%  +101.96% (p=0.000 n=20)
Int64N1e18-32                9.043n ± 2%   15.180n ± 2%   +67.86% (p=0.000 n=20)
Int64N2e18-32                9.601n ± 2%   15.610n ± 1%   +62.60% (p=0.000 n=20)
Int64N4e18-32                12.00n ± 1%    19.23n ± 2%   +60.14% (p=0.000 n=20)
Int32N1000-32                4.829n ± 2%   10.345n ± 1%  +114.25% (p=0.000 n=20)
Int32N1e8-32                 4.825n ± 2%   10.330n ± 1%  +114.09% (p=0.000 n=20)
Int32N1e9-32                 4.830n ± 2%   10.350n ± 1%  +114.26% (p=0.000 n=20)
Int32N2e9-32                 4.750n ± 2%   10.345n ± 1%  +117.81% (p=0.000 n=20)
Float32-32                   10.89n ± 4%    13.57n ± 1%   +24.61% (p=0.000 n=20)
Float64-32                   19.60n ± 4%    22.95n ± 4%   +17.12% (p=0.000 n=20)
ExpFloat64-32                12.96n ± 3%    15.23n ± 2%   +17.47% (p=0.000 n=20)
NormFloat64-32               7.516n ± 1%   13.780n ± 1%   +83.34% (p=0.000 n=20)
Perm3-32                     36.78n ± 2%    46.62n ± 2%   +26.72% (p=0.000 n=20)
Perm30-32                    238.9n ± 2%    400.7n ± 1%   +67.73% (p=0.000 n=20)
Perm30ViaShuffle-32          189.7n ± 2%    350.5n ± 1%   +84.79% (p=0.000 n=20)
ShuffleOverhead-32           159.8n ± 1%    326.0n ± 2%  +104.01% (p=0.000 n=20)
Concurrent-32                3.286n ± 1%    3.290n ± 0%         ~ (p=0.743 n=20)

On the other hand, compared to the original "update benchmarks" CL,
the cleanups we've made more than compensate for PCG being a bit
slower than LFSR, at least on 64-bit x86. ARM64 (Apple M1) is a bit
slower: perhaps the 64x64→128 multiply is slower there for some reason.
386 is noticeably slower, but it's also a non-SSA backend.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.amd64 │            afa459a2f0.amd64            │
                        │      sec/op      │    sec/op     vs base                  │
SourceUint64-32                1.555n ± 1%    1.450n ± 3%   -6.78% (p=0.000 n=20)
GlobalInt64-32                 2.071n ± 1%    2.067n ± 2%        ~ (p=0.673 n=20)
GlobalInt63Parallel-32        0.1023n ± 1%
GlobalInt64Parallel-32                       0.1044n ± 2%
GlobalUint64-32                5.193n ± 1%    2.085n ± 0%  -59.86% (p=0.000 n=20)
GlobalUint64Parallel-32       0.2341n ± 0%   0.1008n ± 1%  -56.93% (p=0.000 n=20)
Int64-32                       2.056n ± 2%    1.779n ± 1%  -13.47% (p=0.000 n=20)
Uint64-32                      2.077n ± 2%    1.854n ± 2%  -10.74% (p=0.000 n=20)
GlobalIntN1000-32              4.077n ± 2%    3.140n ± 3%  -22.98% (p=0.000 n=20)
IntN1000-32                    3.476n ± 2%    2.496n ± 1%  -28.19% (p=0.000 n=20)
Int64N1000-32                  3.059n ± 1%    2.510n ± 2%  -17.96% (p=0.000 n=20)
Int64N1e8-32                   2.942n ± 1%    2.471n ± 2%  -15.98% (p=0.000 n=20)
Int64N1e9-32                   2.932n ± 1%    2.488n ± 2%  -15.14% (p=0.000 n=20)
Int64N2e9-32                   2.925n ± 1%    2.478n ± 2%  -15.30% (p=0.000 n=20)
Int64N1e18-32                  3.116n ± 1%    3.088n ± 1%        ~ (p=0.013 n=20)
Int64N2e18-32                  4.067n ± 1%    3.493n ± 1%  -14.11% (p=0.000 n=20)
Int64N4e18-32                  4.054n ± 1%    5.060n ± 2%  +24.80% (p=0.000 n=20)
Int32N1000-32                  2.951n ± 1%    2.620n ± 1%  -11.22% (p=0.000 n=20)
Int32N1e8-32                   3.102n ± 1%    2.652n ± 0%  -14.50% (p=0.000 n=20)
Int32N1e9-32                   3.535n ± 1%    2.644n ± 1%  -25.20% (p=0.000 n=20)
Int32N2e9-32                   3.514n ± 1%    2.619n ± 2%  -25.47% (p=0.000 n=20)
Float32-32                     2.760n ± 1%    2.261n ± 1%  -18.06% (p=0.000 n=20)
Float64-32                     2.284n ± 1%    2.241n ± 2%        ~ (p=0.016 n=20)
ExpFloat64-32                  3.757n ± 1%    3.716n ± 1%        ~ (p=0.034 n=20)
NormFloat64-32                 3.837n ± 1%    3.718n ± 1%   -3.09% (p=0.000 n=20)
Perm3-32                       35.23n ± 2%    34.11n ± 2%   -3.19% (p=0.000 n=20)
Perm30-32                      208.8n ± 1%    200.6n ± 0%   -3.93% (p=0.000 n=20)
Perm30ViaShuffle-32            111.7n ± 1%    109.7n ± 1%   -1.84% (p=0.000 n=20)
ShuffleOverhead-32             101.1n ± 1%    107.2n ± 1%   +6.03% (p=0.000 n=20)
Concurrent-32                  2.108n ± 7%    2.108n ± 6%        ~ (p=0.644 n=20)
PCG_DXSM-32                                   1.488n ± 2%

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 220860f76f.arm64 │            afa459a2f0.arm64            │
                       │      sec/op      │    sec/op     vs base                  │
SourceUint64-8                2.316n ± 1%    2.531n ± 0%   +9.33% (p=0.000 n=20)
GlobalInt64-8                 2.183n ± 1%    2.177n ± 1%        ~ (p=0.533 n=20)
GlobalInt63Parallel-8        0.4331n ± 0%
GlobalInt64Parallel-8                       0.4319n ± 0%
GlobalUint64-8                4.377n ± 2%    2.185n ± 1%  -50.07% (p=0.000 n=20)
GlobalUint64Parallel-8       0.9237n ± 0%   0.4295n ± 1%  -53.50% (p=0.000 n=20)
Int64-8                       2.538n ± 1%    4.104n ± 0%  +61.68% (p=0.000 n=20)
Uint64-8                      2.604n ± 1%    4.080n ± 0%  +56.68% (p=0.000 n=20)
GlobalIntN1000-8              3.857n ± 2%    2.814n ± 1%  -27.04% (p=0.000 n=20)
IntN1000-8                    3.822n ± 2%    4.140n ± 0%   +8.32% (p=0.000 n=20)
Int64N1000-8                  3.318n ± 0%    4.139n ± 0%  +24.74% (p=0.000 n=20)
Int64N1e8-8                   3.349n ± 1%    4.140n ± 0%  +23.64% (p=0.000 n=20)
Int64N1e9-8                   3.317n ± 2%    4.139n ± 0%  +24.80% (p=0.000 n=20)
Int64N2e9-8                   3.317n ± 2%    4.140n ± 0%  +24.81% (p=0.000 n=20)
Int64N1e18-8                  3.542n ± 1%    5.273n ± 0%  +48.85% (p=0.000 n=20)
Int64N2e18-8                  5.087n ± 0%    6.059n ± 0%  +19.12% (p=0.000 n=20)
Int64N4e18-8                  5.084n ± 0%    8.803n ± 0%  +73.16% (p=0.000 n=20)
Int32N1000-8                  3.208n ± 2%    4.131n ± 0%  +28.79% (p=0.000 n=20)
Int32N1e8-8                   3.610n ± 1%    4.131n ± 0%  +14.43% (p=0.000 n=20)
Int32N1e9-8                   4.235n ± 0%    4.131n ± 0%   -2.44% (p=0.000 n=20)
Int32N2e9-8                   4.229n ± 1%    4.131n ± 0%   -2.33% (p=0.000 n=20)
Float32-8                     3.468n ± 0%    4.110n ± 0%  +18.50% (p=0.000 n=20)
Float64-8                     3.447n ± 0%    4.104n ± 0%  +19.05% (p=0.000 n=20)
ExpFloat64-8                  4.567n ± 0%    5.338n ± 0%  +16.86% (p=0.000 n=20)
NormFloat64-8                 4.821n ± 0%    5.731n ± 0%  +18.89% (p=0.000 n=20)
Perm3-8                       28.89n ± 0%    26.62n ± 0%   -7.84% (p=0.000 n=20)
Perm30-8                      175.7n ± 0%    194.6n ± 2%  +10.76% (p=0.000 n=20)
Perm30ViaShuffle-8            153.5n ± 0%    156.4n ± 0%   +1.86% (p=0.000 n=20)
ShuffleOverhead-8             119.8n ± 1%    125.8n ± 0%   +4.97% (p=0.000 n=20)
Concurrent-8                  2.433n ± 3%    2.654n ± 6%   +9.13% (p=0.001 n=20)
PCG_DXSM-8                                   2.531n ± 0%

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.386 │             afa459a2f0.386              │
                        │     sec/op     │    sec/op     vs base                   │
SourceUint64-32             2.370n ±  1%    7.680n ± 1%  +224.05% (p=0.000 n=20)
GlobalInt64-32              3.569n ±  1%    3.474n ± 3%    -2.66% (p=0.001 n=20)
GlobalInt63Parallel-32     0.3221n ±  1%
GlobalInt64Parallel-32                     0.3253n ± 0%
GlobalUint64-32             8.797n ± 10%    3.433n ± 2%   -60.98% (p=0.000 n=20)
GlobalUint64Parallel-32    0.6351n ±  0%   0.3156n ± 0%   -50.31% (p=0.000 n=20)
Int64-32                    2.612n ±  2%    7.707n ± 1%  +195.04% (p=0.000 n=20)
Uint64-32                   3.350n ±  1%    7.714n ± 1%  +130.25% (p=0.000 n=20)
GlobalIntN1000-32           5.892n ±  1%    6.236n ± 1%    +5.82% (p=0.000 n=20)
IntN1000-32                 4.546n ±  1%   10.410n ± 1%  +128.97% (p=0.000 n=20)
Int64N1000-32               14.59n ±  1%    10.97n ± 2%   -24.75% (p=0.000 n=20)
Int64N1e8-32                14.76n ±  2%    10.98n ± 1%   -25.58% (p=0.000 n=20)
Int64N1e9-32                16.57n ±  1%    10.95n ± 0%   -33.90% (p=0.000 n=20)
Int64N2e9-32                14.54n ±  1%    11.11n ± 1%   -23.62% (p=0.000 n=20)
Int64N1e18-32               16.14n ±  1%    15.18n ± 2%    -5.95% (p=0.000 n=20)
Int64N2e18-32               18.10n ±  1%    15.61n ± 1%   -13.73% (p=0.000 n=20)
Int64N4e18-32               18.65n ±  1%    19.23n ± 2%    +3.08% (p=0.000 n=20)
Int32N1000-32               3.560n ±  1%   10.345n ± 1%  +190.55% (p=0.000 n=20)
Int32N1e8-32                3.770n ±  2%   10.330n ± 1%  +174.01% (p=0.000 n=20)
Int32N1e9-32                4.098n ±  0%   10.350n ± 1%  +152.53% (p=0.000 n=20)
Int32N2e9-32                4.179n ±  1%   10.345n ± 1%  +147.52% (p=0.000 n=20)
Float32-32                  21.18n ±  4%    13.57n ± 1%   -35.93% (p=0.000 n=20)
Float64-32                  20.60n ±  2%    22.95n ± 4%   +11.41% (p=0.000 n=20)
ExpFloat64-32               13.07n ±  0%    15.23n ± 2%   +16.48% (p=0.000 n=20)
NormFloat64-32              7.738n ±  2%   13.780n ± 1%   +78.08% (p=0.000 n=20)
Perm3-32                    36.73n ±  1%    46.62n ± 2%   +26.91% (p=0.000 n=20)
Perm30-32                   211.9n ±  1%    400.7n ± 1%   +89.05% (p=0.000 n=20)
Perm30ViaShuffle-32         165.2n ±  1%    350.5n ± 1%  +112.20% (p=0.000 n=20)
ShuffleOverhead-32          133.9n ±  1%    326.0n ± 2%  +143.37% (p=0.000 n=20)
Concurrent-32               3.287n ±  2%    3.290n ± 0%         ~ (p=0.365 n=20)
PCG_DXSM-32                                 7.793n ± 2%

For #61716.

Change-Id: I4e9c0525b5f84a2ac46f23da9e365495e2d05777
Reviewed-on: https://go-review.googlesource.com/c/go/+/502506
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 17:09:26 +00:00
Russ Cox
8631fcbf31 math/rand/v2: add PCG-DXSM
For the original math/rand, we ported Plan 9's random number
generator, which was a refinement by Ken Thompson of an algorithm
by Don Mitchell and Jim Reeds, which Mitchell in turn recalls as
having been derived from an algorithm by Marsaglia. At its core,
it is an additive lagged Fibonacci generator (ALFG).

Whatever the details of the history, this generator is nowhere
near the current state of the art for simple, pseudo-random
generators.

This CL adds an implementation of Melissa O'Neill's PCG, specifically
the variant PCG-DXSM, which she defined after writing the PCG paper
and which is now the default in Numpy. The update is slightly slower
(a few multiplies and adds, instead of a few adds), but the state
is dramatically smaller (2 words instead of 607). The statistical
output properties are better too.
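
A short usage sketch of the generator, assuming the math/rand/v2 API as released in Go 1.22 (`rand.NewPCG` taking two 64-bit seed words, matching the two-word state described above). The helper name `draw` and the seed values are made up for the example.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// draw returns n values from a PCG source seeded with the two given words.
func draw(seed1, seed2 uint64, n int) []uint64 {
	r := rand.New(rand.NewPCG(seed1, seed2))
	out := make([]uint64, n)
	for i := range out {
		out[i] = r.Uint64()
	}
	return out
}

func main() {
	a := draw(1, 2, 3)
	b := draw(1, 2, 3)
	// The same seed pair reproduces the same sequence.
	fmt.Println(a[0] == b[0] && a[1] == b[1] && a[2] == b[2])
}
```

Seeding explicitly like this is useful for reproducible simulations and tests; code that just wants random numbers can call the package-level functions instead.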

A followup CL will delete the old generator.

PCG is the only change here, so no existing benchmarks should be
affected. They are included anyway as further evidence for caution.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 8993506f2f.amd64 │           01ff938549.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.325n ± 1%    1.352n ± 1%   +2.00% (p=0.000 n=20)
GlobalInt64-32                 2.240n ± 1%    2.083n ± 0%   -7.03% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1041n ± 1%   0.1035n ± 1%        ~ (p=0.064 n=20)
GlobalUint64-32                2.072n ± 3%    2.038n ± 1%        ~ (p=0.089 n=20)
GlobalUint64Parallel-32       0.1008n ± 1%   0.1006n ± 1%        ~ (p=0.804 n=20)
Int64-32                       1.716n ± 1%    1.687n ± 2%        ~ (p=0.045 n=20)
Uint64-32                      1.665n ± 1%    1.674n ± 2%        ~ (p=0.878 n=20)
GlobalIntN1000-32              3.335n ± 1%    3.135n ± 1%   -6.00% (p=0.000 n=20)
IntN1000-32                    2.484n ± 1%    2.478n ± 1%        ~ (p=0.085 n=20)
Int64N1000-32                  2.502n ± 2%    2.455n ± 1%   -1.88% (p=0.002 n=20)
Int64N1e8-32                   2.484n ± 2%    2.467n ± 2%        ~ (p=0.048 n=20)
Int64N1e9-32                   2.502n ± 0%    2.454n ± 1%   -1.92% (p=0.000 n=20)
Int64N2e9-32                   2.502n ± 0%    2.482n ± 1%   -0.76% (p=0.000 n=20)
Int64N1e18-32                  3.201n ± 1%    3.349n ± 2%   +4.62% (p=0.000 n=20)
Int64N2e18-32                  3.504n ± 1%    3.537n ± 1%        ~ (p=0.185 n=20)
Int64N4e18-32                  4.873n ± 1%    4.917n ± 0%   +0.90% (p=0.000 n=20)
Int32N1000-32                  2.639n ± 1%    2.386n ± 1%   -9.57% (p=0.000 n=20)
Int32N1e8-32                   2.686n ± 2%    2.366n ± 1%  -11.91% (p=0.000 n=20)
Int32N1e9-32                   2.636n ± 1%    2.355n ± 2%  -10.70% (p=0.000 n=20)
Int32N2e9-32                   2.660n ± 1%    2.371n ± 1%  -10.88% (p=0.000 n=20)
Float32-32                     2.261n ± 1%    2.245n ± 2%        ~ (p=0.752 n=20)
Float64-32                     2.280n ± 1%    2.235n ± 1%   -1.97% (p=0.007 n=20)
ExpFloat64-32                  3.891n ± 1%    3.813n ± 3%        ~ (p=0.087 n=20)
NormFloat64-32                 3.711n ± 1%    3.652n ± 2%        ~ (p=0.021 n=20)
Perm3-32                       32.60n ± 2%    33.12n ± 3%        ~ (p=0.107 n=20)
Perm30-32                      204.2n ± 0%    205.1n ± 1%        ~ (p=0.358 n=20)
Perm30ViaShuffle-32            121.7n ± 2%    110.8n ± 1%   -8.96% (p=0.000 n=20)
ShuffleOverhead-32             106.2n ± 2%    113.0n ± 1%   +6.36% (p=0.000 n=20)
Concurrent-32                  2.190n ± 5%    2.100n ± 0%   -4.13% (p=0.001 n=20)
PCG_DXSM-32                                   1.490n ± 0%

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 8993506f2f.arm64 │           01ff938549.arm64           │
                       │      sec/op      │    sec/op     vs base                │
SourceUint64-8                2.271n ± 0%    2.258n ± 1%        ~ (p=0.167 n=20)
GlobalInt64-8                 2.161n ± 1%    2.167n ± 0%        ~ (p=0.693 n=20)
GlobalInt64Parallel-8        0.4303n ± 0%   0.4310n ± 0%        ~ (p=0.051 n=20)
GlobalUint64-8                2.164n ± 1%    2.182n ± 1%        ~ (p=0.042 n=20)
GlobalUint64Parallel-8       0.4287n ± 0%   0.4297n ± 0%        ~ (p=0.082 n=20)
Int64-8                       2.478n ± 1%    2.472n ± 1%        ~ (p=0.151 n=20)
Uint64-8                      2.460n ± 1%    2.449n ± 1%        ~ (p=0.013 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 2%        ~ (p=0.821 n=20)
IntN1000-8                    3.003n ± 2%    2.998n ± 2%        ~ (p=0.024 n=20)
Int64N1000-8                  2.954n ± 0%    2.949n ± 2%        ~ (p=0.192 n=20)
Int64N1e8-8                   2.956n ± 0%    2.953n ± 2%        ~ (p=0.109 n=20)
Int64N1e9-8                   3.325n ± 0%    2.950n ± 0%  -11.26% (p=0.000 n=20)
Int64N2e9-8                   2.956n ± 2%    2.946n ± 2%        ~ (p=0.027 n=20)
Int64N1e18-8                  3.780n ± 1%    3.779n ± 1%        ~ (p=0.815 n=20)
Int64N2e18-8                  4.385n ± 0%    4.370n ± 1%        ~ (p=0.402 n=20)
Int64N4e18-8                  6.527n ± 0%    6.544n ± 1%        ~ (p=0.140 n=20)
Int32N1000-8                  2.964n ± 1%    2.950n ± 0%   -0.47% (p=0.002 n=20)
Int32N1e8-8                   2.964n ± 1%    2.950n ± 2%        ~ (p=0.013 n=20)
Int32N1e9-8                   2.963n ± 2%    2.951n ± 2%        ~ (p=0.062 n=20)
Int32N2e9-8                   2.961n ± 2%    2.950n ± 2%   -0.37% (p=0.002 n=20)
Float32-8                     3.442n ± 0%    3.441n ± 0%        ~ (p=0.211 n=20)
Float64-8                     3.442n ± 0%    3.442n ± 0%        ~ (p=0.067 n=20)
ExpFloat64-8                  4.472n ± 0%    4.481n ± 0%   +0.20% (p=0.000 n=20)
NormFloat64-8                 4.734n ± 0%    4.725n ± 0%   -0.19% (p=0.003 n=20)
Perm3-8                       26.55n ± 0%    26.55n ± 0%        ~ (p=0.833 n=20)
Perm30-8                      181.9n ± 0%    181.9n ± 0%   -0.03% (p=0.004 n=20)
Perm30ViaShuffle-8            143.1n ± 0%    142.9n ± 0%        ~ (p=0.204 n=20)
ShuffleOverhead-8             120.6n ± 1%    120.8n ± 2%        ~ (p=0.102 n=20)
Concurrent-8                  2.357n ± 2%    2.421n ± 6%        ~ (p=0.016 n=20)
PCG_DXSM-8                                   2.531n ± 0%

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 8993506f2f.386 │           01ff938549.386            │
                        │     sec/op     │    sec/op     vs base               │
SourceUint64-32              2.102n ± 2%    2.069n ± 0%       ~ (p=0.021 n=20)
GlobalInt64-32               3.542n ± 2%    3.456n ± 1%  -2.44% (p=0.001 n=20)
GlobalInt64Parallel-32      0.3202n ± 0%   0.3252n ± 0%  +1.56% (p=0.000 n=20)
GlobalUint64-32              3.507n ± 1%    3.573n ± 1%  +1.87% (p=0.000 n=20)
GlobalUint64Parallel-32     0.3170n ± 1%   0.3159n ± 0%       ~ (p=0.167 n=20)
Int64-32                     2.516n ± 1%    2.562n ± 2%       ~ (p=0.016 n=20)
Uint64-32                    2.544n ± 1%    2.592n ± 0%  +1.85% (p=0.000 n=20)
GlobalIntN1000-32            6.237n ± 1%    6.266n ± 2%       ~ (p=0.268 n=20)
IntN1000-32                  4.670n ± 2%    4.724n ± 2%       ~ (p=0.644 n=20)
Int64N1000-32                5.412n ± 1%    5.490n ± 2%       ~ (p=0.159 n=20)
Int64N1e8-32                 5.414n ± 2%    5.513n ± 2%       ~ (p=0.129 n=20)
Int64N1e9-32                 5.473n ± 1%    5.476n ± 1%       ~ (p=0.723 n=20)
Int64N2e9-32                 5.487n ± 1%    5.501n ± 2%       ~ (p=0.481 n=20)
Int64N1e18-32                8.901n ± 2%    9.043n ± 2%       ~ (p=0.330 n=20)
Int64N2e18-32                9.521n ± 1%    9.601n ± 2%       ~ (p=0.703 n=20)
Int64N4e18-32                11.92n ± 1%    12.00n ± 1%       ~ (p=0.489 n=20)
Int32N1000-32                4.785n ± 1%    4.829n ± 2%       ~ (p=0.402 n=20)
Int32N1e8-32                 4.748n ± 1%    4.825n ± 2%       ~ (p=0.218 n=20)
Int32N1e9-32                 4.810n ± 1%    4.830n ± 2%       ~ (p=0.794 n=20)
Int32N2e9-32                 4.812n ± 1%    4.750n ± 2%       ~ (p=0.057 n=20)
Float32-32                   10.48n ± 4%    10.89n ± 4%       ~ (p=0.162 n=20)
Float64-32                   19.79n ± 3%    19.60n ± 4%       ~ (p=0.668 n=20)
ExpFloat64-32                12.91n ± 3%    12.96n ± 3%       ~ (p=1.000 n=20)
NormFloat64-32               7.462n ± 1%    7.516n ± 1%       ~ (p=0.051 n=20)
Perm3-32                     35.98n ± 2%    36.78n ± 2%       ~ (p=0.033 n=20)
Perm30-32                    241.5n ± 1%    238.9n ± 2%       ~ (p=0.126 n=20)
Perm30ViaShuffle-32          187.3n ± 2%    189.7n ± 2%       ~ (p=0.387 n=20)
ShuffleOverhead-32           160.2n ± 1%    159.8n ± 1%       ~ (p=0.256 n=20)
Concurrent-32                3.308n ± 3%    3.286n ± 1%       ~ (p=0.038 n=20)
PCG_DXSM-32                                 7.613n ± 1%

For #61716.

Change-Id: Icb274ca1f782504d658305a40159b4ae6a2f3f1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/502505
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:09:23 +00:00
Russ Cox
f2e2637227 math/rand/v2: simplify Perm
The compiler says Perm is being inlined into BenchmarkPerm,
and yet BenchmarkPerm30ViaShuffle, which you'd think is the
same code, still runs significantly faster.

The benchmarks are mystifying but this is clearly still a step in
the right direction, since BenchmarkPerm30ViaShuffle is still
the fastest and we avoid having two copies of that logic.
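
For readers who want to see what "avoiding two copies of that logic" can look like, here is a minimal sketch of a permutation built on top of Shuffle. The helper name permViaShuffle is hypothetical, and the sketch uses the package-level rand.Shuffle from the released math/rand/v2 (Go 1.22 or later) rather than the method touched by this CL, so treat it as an illustration under those assumptions.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// permViaShuffle is a hypothetical helper that returns a pseudo-random
// permutation of 0..n-1 by filling a slice with the identity permutation
// and delegating the actual shuffling to rand.Shuffle.
func permViaShuffle(n int) []int {
	p := make([]int, n)
	for i := range p {
		p[i] = i
	}
	rand.Shuffle(len(p), func(i, j int) { p[i], p[j] = p[j], p[i] })
	return p
}

func main() {
	fmt.Println(permViaShuffle(5)) // some ordering of 0 1 2 3 4
}
```

With this shape, the only shuffling loop lives inside Shuffle, so any later fix or speedup there applies to permutations as well.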

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ e1bbe739fb.amd64 │           8993506f2f.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.316n ± 2%    1.325n ± 1%        ~ (p=0.208 n=20)
GlobalInt64-32                 2.048n ± 1%    2.240n ± 1%   +9.38% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1037n ± 1%   0.1041n ± 1%        ~ (p=0.774 n=20)
GlobalUint64-32                2.039n ± 2%    2.072n ± 3%        ~ (p=0.115 n=20)
GlobalUint64Parallel-32       0.1013n ± 1%   0.1008n ± 1%        ~ (p=0.417 n=20)
Int64-32                       1.692n ± 2%    1.716n ± 1%        ~ (p=0.122 n=20)
Uint64-32                      1.643n ± 2%    1.665n ± 1%        ~ (p=0.062 n=20)
GlobalIntN1000-32              3.287n ± 1%    3.335n ± 1%        ~ (p=0.147 n=20)
IntN1000-32                    2.678n ± 2%    2.484n ± 1%   -7.24% (p=0.000 n=20)
Int64N1000-32                  2.684n ± 2%    2.502n ± 2%   -6.80% (p=0.000 n=20)
Int64N1e8-32                   2.663n ± 2%    2.484n ± 2%   -6.76% (p=0.000 n=20)
Int64N1e9-32                   2.633n ± 1%    2.502n ± 0%   -4.98% (p=0.000 n=20)
Int64N2e9-32                   2.657n ± 1%    2.502n ± 0%   -5.87% (p=0.000 n=20)
Int64N1e18-32                  3.125n ± 2%    3.201n ± 1%   +2.43% (p=0.000 n=20)
Int64N2e18-32                  3.476n ± 1%    3.504n ± 1%   +0.83% (p=0.009 n=20)
Int64N4e18-32                  4.795n ± 1%    4.873n ± 1%        ~ (p=0.106 n=20)
Int32N1000-32                  2.485n ± 2%    2.639n ± 1%   +6.20% (p=0.000 n=20)
Int32N1e8-32                   2.457n ± 1%    2.686n ± 2%   +9.34% (p=0.000 n=20)
Int32N1e9-32                   2.452n ± 1%    2.636n ± 1%   +7.52% (p=0.000 n=20)
Int32N2e9-32                   2.453n ± 1%    2.660n ± 1%   +8.44% (p=0.000 n=20)
Float32-32                     2.254n ± 1%    2.261n ± 1%        ~ (p=0.888 n=20)
Float64-32                     2.262n ± 1%    2.280n ± 1%        ~ (p=0.040 n=20)
ExpFloat64-32                  3.777n ± 2%    3.891n ± 1%   +3.03% (p=0.000 n=20)
NormFloat64-32                 3.606n ± 1%    3.711n ± 1%   +2.91% (p=0.000 n=20)
Perm3-32                       33.12n ± 2%    32.60n ± 2%        ~ (p=0.045 n=20)
Perm30-32                      176.1n ± 1%    204.2n ± 0%  +15.96% (p=0.000 n=20)
Perm30ViaShuffle-32            109.3n ± 1%    121.7n ± 2%  +11.30% (p=0.000 n=20)
ShuffleOverhead-32             112.5n ± 1%    106.2n ± 2%   -5.56% (p=0.000 n=20)
Concurrent-32                  2.099n ± 0%    2.190n ± 5%   +4.36% (p=0.001 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ e1bbe739fb.arm64 │           8993506f2f.arm64           │
                       │      sec/op      │    sec/op     vs base                │
SourceUint64-8                2.290n ± 1%    2.271n ± 0%        ~ (p=0.015 n=20)
GlobalInt64-8                 2.180n ± 1%    2.161n ± 1%        ~ (p=0.180 n=20)
GlobalInt64Parallel-8        0.4294n ± 0%   0.4303n ± 0%   +0.19% (p=0.001 n=20)
GlobalUint64-8                2.170n ± 1%    2.164n ± 1%        ~ (p=0.673 n=20)
GlobalUint64Parallel-8       0.4283n ± 0%   0.4287n ± 0%        ~ (p=0.128 n=20)
Int64-8                       2.481n ± 1%    2.478n ± 1%        ~ (p=0.867 n=20)
Uint64-8                      2.464n ± 1%    2.460n ± 1%        ~ (p=0.763 n=20)
GlobalIntN1000-8              2.814n ± 0%    2.814n ± 2%        ~ (p=0.969 n=20)
IntN1000-8                    2.934n ± 2%    3.003n ± 2%   +2.35% (p=0.000 n=20)
Int64N1000-8                  2.957n ± 1%    2.954n ± 0%        ~ (p=0.285 n=20)
Int64N1e8-8                   2.935n ± 2%    2.956n ± 0%   +0.73% (p=0.002 n=20)
Int64N1e9-8                   2.935n ± 2%    3.325n ± 0%  +13.29% (p=0.000 n=20)
Int64N2e9-8                   2.933n ± 4%    2.956n ± 2%        ~ (p=0.163 n=20)
Int64N1e18-8                  3.781n ± 1%    3.780n ± 1%        ~ (p=0.805 n=20)
Int64N2e18-8                  4.362n ± 0%    4.385n ± 0%        ~ (p=0.077 n=20)
Int64N4e18-8                  6.576n ± 1%    6.527n ± 0%        ~ (p=0.024 n=20)
Int32N1000-8                  2.942n ± 2%    2.964n ± 1%        ~ (p=0.073 n=20)
Int32N1e8-8                   2.941n ± 1%    2.964n ± 1%        ~ (p=0.058 n=20)
Int32N1e9-8                   2.938n ± 2%    2.963n ± 2%   +0.87% (p=0.003 n=20)
Int32N2e9-8                   2.982n ± 2%    2.961n ± 2%        ~ (p=0.056 n=20)
Float32-8                     3.441n ± 0%    3.442n ± 0%        ~ (p=0.030 n=20)
Float64-8                     3.441n ± 0%    3.442n ± 0%   +0.03% (p=0.001 n=20)
ExpFloat64-8                  4.472n ± 0%    4.472n ± 0%        ~ (p=0.877 n=20)
NormFloat64-8                 4.716n ± 0%    4.734n ± 0%   +0.38% (p=0.000 n=20)
Perm3-8                       26.66n ± 0%    26.55n ± 0%   -0.39% (p=0.000 n=20)
Perm30-8                      143.3n ± 0%    181.9n ± 0%  +26.97% (p=0.000 n=20)
Perm30ViaShuffle-8            142.9n ± 0%    143.1n ± 0%        ~ (p=0.669 n=20)
ShuffleOverhead-8             121.1n ± 1%    120.6n ± 1%   -0.41% (p=0.004 n=20)
Concurrent-8                  2.379n ± 2%    2.357n ± 2%        ~ (p=0.337 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ e1bbe739fb.386 │            8993506f2f.386            │
                        │     sec/op     │    sec/op     vs base                │
SourceUint64-32              2.087n ± 1%    2.102n ± 2%        ~ (p=0.507 n=20)
GlobalInt64-32               3.538n ± 2%    3.542n ± 2%        ~ (p=0.425 n=20)
GlobalInt64Parallel-32      0.3207n ± 1%   0.3202n ± 0%        ~ (p=0.963 n=20)
GlobalUint64-32              3.543n ± 1%    3.507n ± 1%        ~ (p=0.034 n=20)
GlobalUint64Parallel-32     0.3170n ± 0%   0.3170n ± 1%        ~ (p=0.920 n=20)
Int64-32                     2.548n ± 1%    2.516n ± 1%        ~ (p=0.139 n=20)
Uint64-32                    2.565n ± 2%    2.544n ± 1%        ~ (p=0.394 n=20)
GlobalIntN1000-32            6.300n ± 1%    6.237n ± 1%        ~ (p=0.029 n=20)
IntN1000-32                  4.750n ± 0%    4.670n ± 2%        ~ (p=0.034 n=20)
Int64N1000-32                5.515n ± 2%    5.412n ± 1%   -1.86% (p=0.009 n=20)
Int64N1e8-32                 5.527n ± 0%    5.414n ± 2%   -2.05% (p=0.002 n=20)
Int64N1e9-32                 5.531n ± 2%    5.473n ± 1%        ~ (p=0.047 n=20)
Int64N2e9-32                 5.514n ± 2%    5.487n ± 1%        ~ (p=0.298 n=20)
Int64N1e18-32                9.059n ± 1%    8.901n ± 2%        ~ (p=0.037 n=20)
Int64N2e18-32                9.594n ± 1%    9.521n ± 1%        ~ (p=0.051 n=20)
Int64N4e18-32                12.05n ± 2%    11.92n ± 1%        ~ (p=0.357 n=20)
Int32N1000-32                4.840n ± 2%    4.785n ± 1%        ~ (p=0.189 n=20)
Int32N1e8-32                 4.832n ± 2%    4.748n ± 1%        ~ (p=0.042 n=20)
Int32N1e9-32                 4.815n ± 2%    4.810n ± 1%        ~ (p=0.878 n=20)
Int32N2e9-32                 4.813n ± 1%    4.812n ± 1%        ~ (p=0.542 n=20)
Float32-32                   10.90n ± 2%    10.48n ± 4%   -3.85% (p=0.007 n=20)
Float64-32                   20.32n ± 4%    19.79n ± 3%        ~ (p=0.553 n=20)
ExpFloat64-32                12.95n ± 3%    12.91n ± 3%        ~ (p=0.909 n=20)
NormFloat64-32               7.570n ± 1%    7.462n ± 1%   -1.44% (p=0.004 n=20)
Perm3-32                     37.80n ± 2%    35.98n ± 2%   -4.79% (p=0.000 n=20)
Perm30-32                    214.0n ± 1%    241.5n ± 1%  +12.85% (p=0.000 n=20)
Perm30ViaShuffle-32          188.7n ± 2%    187.3n ± 2%        ~ (p=0.029 n=20)
ShuffleOverhead-32           160.8n ± 1%    160.2n ± 1%        ~ (p=0.180 n=20)
Concurrent-32                3.288n ± 0%    3.308n ± 3%        ~ (p=0.037 n=20)

For #61716.

Change-Id: I342b611456c3569520d3c91c849d29eba325d87e
Reviewed-on: https://go-review.googlesource.com/c/go/+/502504
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:09:21 +00:00
Branden Brown
488e2a56b9 math/rand/v2: remove bias in ExpFloat64 and NormFloat64
The original implementation of the ziggurat algorithm was designed for
32-bit random integer inputs. This necessitated reusing some low-order
bits for the slice selection and the random coordinate, which introduces
statistical bias. The result is that PractRand consistently fails the
math/rand normal and exponential sequences (transformed to uniform)
within 2 GB of variates.

This change adjusts the ziggurat procedures to use 63-bit random inputs,
so that there is no need to reuse bits between the slice and coordinate.
This is sufficient for the normal sequence to survive to 256 GB of
PractRand testing.
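
The difference between reusing bits and keeping them separate can be sketched as follows. Both helpers are hypothetical: overlappingBits mimics taking a 7-bit slice index out of the same 32-bit word that also serves as the coordinate, while splitBits shows one possible way to carve a 63-bit input into disjoint fields. The exact bit layout used by this change is not shown in the message, so the layout here is an assumption.

```go
package main

import "fmt"

// overlappingBits models the biased layout: with only 32 random bits, the
// low 7 bits select the slice and also remain part of the coordinate.
func overlappingBits(u uint32) (slice int, coord uint32) {
	slice = int(u & 0x7F)
	coord = u // still contains the 7 bits used for the slice
	return slice, coord
}

// splitBits models an unbiased layout: a 63-bit input is divided into a
// 7-bit slice index and a 56-bit coordinate that share no bits.
func splitBits(u uint64) (slice int, coord uint64) {
	u &= 1<<63 - 1        // keep 63 bits
	slice = int(u & 0x7F) // low 7 bits: one of 128 slices
	coord = u >> 7        // remaining 56 bits: the coordinate
	return slice, coord
}

func main() {
	// Two inputs that differ only in the low 7 bits.
	s1, c1 := splitBits(0x1234567880)
	s2, c2 := splitBits(0x12345678FF)
	fmt.Println(s1, s2, c1 == c2) // 0 127 true

	_, o1 := overlappingBits(0x80)
	_, o2 := overlappingBits(0xFF)
	fmt.Println(o1 == o2) // false
}
```

In the split layout the slice index carries no information about the coordinate, which is the independence the ziggurat method relies on.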

An alternative technique is to recalculate the ziggurats to use 1024
rather than 128 or 256 slices to make full use of 64-bit inputs. This
improves the survival of the normal sequence to far beyond 256 GB and
additionally provides a 6% performance improvement due to the improved
rejection procedure efficiency. However, doing so increases the total
size of the ziggurat tables from 4.5 kB to 48 kB.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 2703446c2e.amd64 │           e1bbe739fb.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.337n ± 1%    1.316n ± 2%        ~ (p=0.024 n=20)
GlobalInt64-32                 2.225n ± 2%    2.048n ± 1%   -7.93% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1043n ± 2%   0.1037n ± 1%        ~ (p=0.587 n=20)
GlobalUint64-32                2.058n ± 1%    2.039n ± 2%        ~ (p=0.030 n=20)
GlobalUint64Parallel-32       0.1009n ± 1%   0.1013n ± 1%        ~ (p=0.984 n=20)
Int64-32                       1.719n ± 2%    1.692n ± 2%        ~ (p=0.085 n=20)
Uint64-32                      1.669n ± 1%    1.643n ± 2%        ~ (p=0.049 n=20)
GlobalIntN1000-32              3.321n ± 2%    3.287n ± 1%        ~ (p=0.298 n=20)
IntN1000-32                    2.479n ± 1%    2.678n ± 2%   +8.01% (p=0.000 n=20)
Int64N1000-32                  2.477n ± 1%    2.684n ± 2%   +8.38% (p=0.000 n=20)
Int64N1e8-32                   2.490n ± 1%    2.663n ± 2%   +6.99% (p=0.000 n=20)
Int64N1e9-32                   2.458n ± 1%    2.633n ± 1%   +7.12% (p=0.000 n=20)
Int64N2e9-32                   2.486n ± 2%    2.657n ± 1%   +6.90% (p=0.000 n=20)
Int64N1e18-32                  3.215n ± 2%    3.125n ± 2%   -2.78% (p=0.000 n=20)
Int64N2e18-32                  3.588n ± 2%    3.476n ± 1%   -3.15% (p=0.000 n=20)
Int64N4e18-32                  4.938n ± 2%    4.795n ± 1%   -2.91% (p=0.000 n=20)
Int32N1000-32                  2.673n ± 2%    2.485n ± 2%   -7.02% (p=0.000 n=20)
Int32N1e8-32                   2.631n ± 2%    2.457n ± 1%   -6.63% (p=0.000 n=20)
Int32N1e9-32                   2.628n ± 2%    2.452n ± 1%   -6.70% (p=0.000 n=20)
Int32N2e9-32                   2.684n ± 2%    2.453n ± 1%   -8.61% (p=0.000 n=20)
Float32-32                     2.240n ± 2%    2.254n ± 1%        ~ (p=0.878 n=20)
Float64-32                     2.253n ± 1%    2.262n ± 1%        ~ (p=0.963 n=20)
ExpFloat64-32                  3.677n ± 1%    3.777n ± 2%   +2.71% (p=0.004 n=20)
NormFloat64-32                 3.761n ± 1%    3.606n ± 1%   -4.15% (p=0.000 n=20)
Perm3-32                       33.55n ± 2%    33.12n ± 2%        ~ (p=0.402 n=20)
Perm30-32                      173.2n ± 1%    176.1n ± 1%   +1.67% (p=0.000 n=20)
Perm30ViaShuffle-32            115.9n ± 1%    109.3n ± 1%   -5.69% (p=0.000 n=20)
ShuffleOverhead-32             101.9n ± 1%    112.5n ± 1%  +10.35% (p=0.000 n=20)
Concurrent-32                  2.107n ± 6%    2.099n ± 0%        ~ (p=0.051 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 2703446c2e.arm64 │          e1bbe739fb.arm64           │
                       │      sec/op      │    sec/op     vs base               │
SourceUint64-8                2.275n ± 0%    2.290n ± 1%       ~ (p=0.044 n=20)
GlobalInt64-8                 2.154n ± 1%    2.180n ± 1%       ~ (p=0.068 n=20)
GlobalInt64Parallel-8        0.4298n ± 0%   0.4294n ± 0%       ~ (p=0.079 n=20)
GlobalUint64-8                2.160n ± 1%    2.170n ± 1%       ~ (p=0.129 n=20)
GlobalUint64Parallel-8       0.4286n ± 0%   0.4283n ± 0%       ~ (p=0.350 n=20)
Int64-8                       2.491n ± 1%    2.481n ± 1%       ~ (p=0.330 n=20)
Uint64-8                      2.458n ± 0%    2.464n ± 1%       ~ (p=0.351 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 0%       ~ (p=0.325 n=20)
IntN1000-8                    2.933n ± 0%    2.934n ± 2%       ~ (p=0.079 n=20)
Int64N1000-8                  2.962n ± 1%    2.957n ± 1%       ~ (p=0.259 n=20)
Int64N1e8-8                   2.960n ± 1%    2.935n ± 2%       ~ (p=0.276 n=20)
Int64N1e9-8                   2.935n ± 2%    2.935n ± 2%       ~ (p=0.984 n=20)
Int64N2e9-8                   2.934n ± 0%    2.933n ± 4%       ~ (p=0.463 n=20)
Int64N1e18-8                  3.777n ± 1%    3.781n ± 1%       ~ (p=0.516 n=20)
Int64N2e18-8                  4.359n ± 1%    4.362n ± 0%       ~ (p=0.256 n=20)
Int64N4e18-8                  6.536n ± 1%    6.576n ± 1%       ~ (p=0.224 n=20)
Int32N1000-8                  2.937n ± 0%    2.942n ± 2%       ~ (p=0.312 n=20)
Int32N1e8-8                   2.937n ± 1%    2.941n ± 1%       ~ (p=0.463 n=20)
Int32N1e9-8                   2.936n ± 0%    2.938n ± 2%       ~ (p=0.044 n=20)
Int32N2e9-8                   2.938n ± 2%    2.982n ± 2%       ~ (p=0.174 n=20)
Float32-8                     3.441n ± 0%    3.441n ± 0%       ~ (p=0.064 n=20)
Float64-8                     3.441n ± 0%    3.441n ± 0%       ~ (p=0.826 n=20)
ExpFloat64-8                  4.486n ± 0%    4.472n ± 0%  -0.31% (p=0.000 n=20)
NormFloat64-8                 4.721n ± 0%    4.716n ± 0%       ~ (p=0.051 n=20)
Perm3-8                       26.65n ± 0%    26.66n ± 0%       ~ (p=0.080 n=20)
Perm30-8                      143.2n ± 0%    143.3n ± 0%  +0.10% (p=0.000 n=20)
Perm30ViaShuffle-8            143.0n ± 0%    142.9n ± 0%       ~ (p=0.642 n=20)
ShuffleOverhead-8             120.6n ± 1%    121.1n ± 1%  +0.41% (p=0.010 n=20)
Concurrent-8                  2.399n ± 5%    2.379n ± 2%       ~ (p=0.365 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 2703446c2e.386 │           e1bbe739fb.386            │
                        │     sec/op     │    sec/op     vs base               │
SourceUint64-32             2.072n ±  2%    2.087n ± 1%       ~ (p=0.440 n=20)
GlobalInt64-32              3.546n ± 27%    3.538n ± 2%       ~ (p=0.101 n=20)
GlobalInt64Parallel-32     0.3211n ±  0%   0.3207n ± 1%       ~ (p=0.753 n=20)
GlobalUint64-32             3.522n ±  2%    3.543n ± 1%       ~ (p=0.071 n=20)
GlobalUint64Parallel-32    0.3172n ±  0%   0.3170n ± 0%       ~ (p=0.507 n=20)
Int64-32                    2.520n ±  2%    2.548n ± 1%       ~ (p=0.267 n=20)
Uint64-32                   2.581n ±  1%    2.565n ± 2%       ~ (p=0.143 n=20)
GlobalIntN1000-32           6.171n ±  1%    6.300n ± 1%       ~ (p=0.037 n=20)
IntN1000-32                 4.752n ±  2%    4.750n ± 0%       ~ (p=0.984 n=20)
Int64N1000-32               5.429n ±  1%    5.515n ± 2%       ~ (p=0.292 n=20)
Int64N1e8-32                5.469n ±  2%    5.527n ± 0%       ~ (p=0.013 n=20)
Int64N1e9-32                5.489n ±  2%    5.531n ± 2%       ~ (p=0.256 n=20)
Int64N2e9-32                5.492n ±  2%    5.514n ± 2%       ~ (p=0.606 n=20)
Int64N1e18-32               8.927n ±  1%    9.059n ± 1%       ~ (p=0.229 n=20)
Int64N2e18-32               9.622n ±  1%    9.594n ± 1%       ~ (p=0.703 n=20)
Int64N4e18-32               12.03n ±  1%    12.05n ± 2%       ~ (p=0.733 n=20)
Int32N1000-32               4.817n ±  1%    4.840n ± 2%       ~ (p=0.941 n=20)
Int32N1e8-32                4.801n ±  1%    4.832n ± 2%       ~ (p=0.228 n=20)
Int32N1e9-32                4.798n ±  1%    4.815n ± 2%       ~ (p=0.560 n=20)
Int32N2e9-32                4.840n ±  1%    4.813n ± 1%       ~ (p=0.015 n=20)
Float32-32                  10.51n ±  4%    10.90n ± 2%  +3.71% (p=0.007 n=20)
Float64-32                  20.33n ±  3%    20.32n ± 4%       ~ (p=0.566 n=20)
ExpFloat64-32               12.59n ±  2%    12.95n ± 3%  +2.86% (p=0.002 n=20)
NormFloat64-32              7.350n ±  2%    7.570n ± 1%  +2.99% (p=0.007 n=20)
Perm3-32                    39.29n ±  2%    37.80n ± 2%  -3.79% (p=0.000 n=20)
Perm30-32                   219.1n ±  2%    214.0n ± 1%  -2.33% (p=0.002 n=20)
Perm30ViaShuffle-32         189.8n ±  2%    188.7n ± 2%       ~ (p=0.147 n=20)
ShuffleOverhead-32          158.9n ±  2%    160.8n ± 1%       ~ (p=0.176 n=20)
Concurrent-32               3.306n ±  3%    3.288n ± 0%  -0.54% (p=0.005 n=20)

For #61716.

Change-Id: I4c5fe710b310dc075ae21c97d1805bcc20db5050
Reviewed-on: https://go-review.googlesource.com/c/go/+/516275
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:08:47 +00:00
Russ Cox
ecda959b99 math/rand/v2: optimize Float32, Float64
We realized too late after Go 1 that float64(r.Uint64())/(1<<64)
is not a correct implementation: it occasionally rounds to 1.
The correct implementation is float64(r.Uint64()&(1<<53-1))/(1<<53)
but we couldn't change the implementation for compatibility, so we
changed it to retry only in the "round to 1" cases.

The change to v2 lets us update the algorithm to the simpler,
faster one.
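
The rounding problem is easy to reproduce. The sketch below applies the two formulas quoted above to a raw 64-bit value; the function names naiveFloat64 and maskedFloat64 are made up for this example and are not the names used in the package.

```go
package main

import (
	"fmt"
	"math"
)

// naiveFloat64 converts all 64 bits. A float64 has a 53-bit significand,
// so inputs close to 2^64 round up to 2^64 and the quotient becomes 1.
func naiveFloat64(u uint64) float64 {
	return float64(u) / (1 << 64)
}

// maskedFloat64 keeps only the low 53 bits, which convert exactly, so the
// result is always in [0, 1).
func maskedFloat64(u uint64) float64 {
	return float64(u&(1<<53-1)) / (1 << 53)
}

func main() {
	u := uint64(math.MaxUint64)
	fmt.Println(naiveFloat64(u) == 1) // true: rounds to 1
	fmt.Println(maskedFloat64(u) < 1) // true: stays below 1
}
```

Because every 53-bit integer is exactly representable as a float64, the masked form needs no retry loop.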

Note that this implementation cannot generate 2⁻⁵⁴, nor 2⁻¹⁰⁰,
nor any of the other numbers between 0 and 2⁻⁵³. A different algorithm
could shift some of the probability of generating the two boundary
values, 0 and 2⁻⁵³, over to the values in between, but that would be
much slower and not necessarily better. In particular, the current
implementation has the property that there are uniform gaps between
the possible returned floats, which might help stability. Also, the
result is often scaled and shifted, like Float64()*X+Y. Multiplying by
X>1 would open new gaps, and adding most Y would erase all the
distinctions that were introduced.
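
The point about adding Y can be checked with ordinary float64 arithmetic. In this sketch the helper absorbed is a hypothetical name; it reports whether adding a small delta to y leaves y unchanged after rounding.

```go
package main

import (
	"fmt"
	"math"
)

// absorbed reports whether y+delta rounds back to y in float64 arithmetic,
// meaning the extra resolution carried by delta is lost.
func absorbed(y, delta float64) bool {
	return y+delta == y
}

func main() {
	step := math.Ldexp(1, -53) // spacing of the masked 53-bit formula
	tiny := math.Ldexp(1, -54) // a value in (0, 2^-53) that is never returned

	fmt.Println(tiny > 0 && tiny < step) // true: representable but skipped
	fmt.Println(absorbed(1, tiny))       // true: 1 + 2^-54 rounds to 1
	fmt.Println(absorbed(1, step))       // true: 1 + 2^-53 is a tie, rounds to even
	fmt.Println(absorbed(0, tiny))       // false: near 0 the value survives
}
```

So for a typical use such as Float64()+1, values finer than the uniform 2⁻⁵³ grid would not survive the addition anyway.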

The only changes to benchmarks should be in Float32 and Float64.
The other changes remain a cautionary tale.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 4d84a369d1.amd64 │           2703446c2e.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.348n ± 2%    1.337n ± 1%        ~ (p=0.662 n=20)
GlobalInt64-32                 2.082n ± 2%    2.225n ± 2%   +6.87% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1036n ± 1%   0.1043n ± 2%        ~ (p=0.171 n=20)
GlobalUint64-32                2.077n ± 2%    2.058n ± 1%        ~ (p=0.560 n=20)
GlobalUint64Parallel-32       0.1012n ± 1%   0.1009n ± 1%        ~ (p=0.995 n=20)
Int64-32                       1.750n ± 0%    1.719n ± 2%   -1.74% (p=0.000 n=20)
Uint64-32                      1.707n ± 2%    1.669n ± 1%   -2.20% (p=0.000 n=20)
GlobalIntN1000-32              3.192n ± 1%    3.321n ± 2%   +4.04% (p=0.000 n=20)
IntN1000-32                    2.462n ± 2%    2.479n ± 1%        ~ (p=0.417 n=20)
Int64N1000-32                  2.470n ± 1%    2.477n ± 1%        ~ (p=0.664 n=20)
Int64N1e8-32                   2.503n ± 2%    2.490n ± 1%        ~ (p=0.245 n=20)
Int64N1e9-32                   2.487n ± 1%    2.458n ± 1%        ~ (p=0.032 n=20)
Int64N2e9-32                   2.487n ± 1%    2.486n ± 2%        ~ (p=0.507 n=20)
Int64N1e18-32                  3.006n ± 2%    3.215n ± 2%   +6.94% (p=0.000 n=20)
Int64N2e18-32                  3.368n ± 1%    3.588n ± 2%   +6.55% (p=0.000 n=20)
Int64N4e18-32                  4.763n ± 1%    4.938n ± 2%   +3.69% (p=0.000 n=20)
Int32N1000-32                  2.403n ± 1%    2.673n ± 2%  +11.19% (p=0.000 n=20)
Int32N1e8-32                   2.405n ± 1%    2.631n ± 2%   +9.42% (p=0.000 n=20)
Int32N1e9-32                   2.402n ± 2%    2.628n ± 2%   +9.41% (p=0.000 n=20)
Int32N2e9-32                   2.384n ± 1%    2.684n ± 2%  +12.56% (p=0.000 n=20)
Float32-32                     2.641n ± 2%    2.240n ± 2%  -15.18% (p=0.000 n=20)
Float64-32                     2.483n ± 1%    2.253n ± 1%   -9.26% (p=0.000 n=20)
ExpFloat64-32                  3.486n ± 2%    3.677n ± 1%   +5.49% (p=0.000 n=20)
NormFloat64-32                 3.648n ± 1%    3.761n ± 1%   +3.11% (p=0.000 n=20)
Perm3-32                       33.04n ± 1%    33.55n ± 2%        ~ (p=0.180 n=20)
Perm30-32                      171.9n ± 1%    173.2n ± 1%        ~ (p=0.050 n=20)
Perm30ViaShuffle-32            100.3n ± 1%    115.9n ± 1%  +15.55% (p=0.000 n=20)
ShuffleOverhead-32             102.5n ± 1%    101.9n ± 1%        ~ (p=0.266 n=20)
Concurrent-32                  2.101n ± 0%    2.107n ± 6%        ~ (p=0.212 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 4d84a369d1.arm64 │          2703446c2e.arm64           │
                       │      sec/op      │    sec/op     vs base               │
SourceUint64-8                2.261n ± 1%    2.275n ± 0%       ~ (p=0.082 n=20)
GlobalInt64-8                 2.160n ± 1%    2.154n ± 1%       ~ (p=0.490 n=20)
GlobalInt64Parallel-8        0.4299n ± 0%   0.4298n ± 0%       ~ (p=0.663 n=20)
GlobalUint64-8                2.169n ± 1%    2.160n ± 1%       ~ (p=0.292 n=20)
GlobalUint64Parallel-8       0.4293n ± 1%   0.4286n ± 0%       ~ (p=0.155 n=20)
Int64-8                       2.473n ± 1%    2.491n ± 1%       ~ (p=0.317 n=20)
Uint64-8                      2.453n ± 1%    2.458n ± 0%       ~ (p=0.941 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 2%       ~ (p=0.972 n=20)
IntN1000-8                    2.933n ± 2%    2.933n ± 0%       ~ (p=0.287 n=20)
Int64N1000-8                  2.934n ± 2%    2.962n ± 1%       ~ (p=0.062 n=20)
Int64N1e8-8                   2.935n ± 2%    2.960n ± 1%       ~ (p=0.183 n=20)
Int64N1e9-8                   2.934n ± 2%    2.935n ± 2%       ~ (p=0.367 n=20)
Int64N2e9-8                   2.935n ± 2%    2.934n ± 0%       ~ (p=0.455 n=20)
Int64N1e18-8                  3.778n ± 1%    3.777n ± 1%       ~ (p=0.995 n=20)
Int64N2e18-8                  4.359n ± 1%    4.359n ± 1%       ~ (p=0.122 n=20)
Int64N4e18-8                  6.546n ± 1%    6.536n ± 1%       ~ (p=0.920 n=20)
Int32N1000-8                  2.940n ± 2%    2.937n ± 0%       ~ (p=0.149 n=20)
Int32N1e8-8                   2.937n ± 2%    2.937n ± 1%       ~ (p=0.620 n=20)
Int32N1e9-8                   2.938n ± 0%    2.936n ± 0%       ~ (p=0.046 n=20)
Int32N2e9-8                   2.938n ± 2%    2.938n ± 2%       ~ (p=0.455 n=20)
Float32-8                     3.486n ± 0%    3.441n ± 0%  -1.28% (p=0.000 n=20)
Float64-8                     3.480n ± 0%    3.441n ± 0%  -1.13% (p=0.000 n=20)
ExpFloat64-8                  4.533n ± 0%    4.486n ± 0%  -1.03% (p=0.000 n=20)
NormFloat64-8                 4.764n ± 0%    4.721n ± 0%  -0.90% (p=0.000 n=20)
Perm3-8                       26.66n ± 0%    26.65n ± 0%       ~ (p=0.019 n=20)
Perm30-8                      143.4n ± 0%    143.2n ± 0%  -0.17% (p=0.000 n=20)
Perm30ViaShuffle-8            142.9n ± 0%    143.0n ± 0%       ~ (p=0.522 n=20)
ShuffleOverhead-8             120.7n ± 0%    120.6n ± 1%       ~ (p=0.488 n=20)
Concurrent-8                  2.360n ± 2%    2.399n ± 5%       ~ (p=0.062 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 4d84a369d1.386 │            2703446c2e.386             │
                        │     sec/op     │    sec/op      vs base                │
SourceUint64-32              2.101n ± 2%    2.072n ±  2%        ~ (p=0.273 n=20)
GlobalInt64-32               3.518n ± 2%    3.546n ± 27%   +0.78% (p=0.007 n=20)
GlobalInt64Parallel-32      0.3206n ± 0%   0.3211n ±  0%        ~ (p=0.386 n=20)
GlobalUint64-32              3.538n ± 1%    3.522n ±  2%        ~ (p=0.331 n=20)
GlobalUint64Parallel-32     0.3231n ± 0%   0.3172n ±  0%   -1.84% (p=0.000 n=20)
Int64-32                     2.554n ± 2%    2.520n ±  2%        ~ (p=0.465 n=20)
Uint64-32                    2.575n ± 2%    2.581n ±  1%        ~ (p=0.213 n=20)
GlobalIntN1000-32            6.292n ± 1%    6.171n ±  1%        ~ (p=0.015 n=20)
IntN1000-32                  4.735n ± 1%    4.752n ±  2%        ~ (p=0.635 n=20)
Int64N1000-32                5.489n ± 2%    5.429n ±  1%        ~ (p=0.324 n=20)
Int64N1e8-32                 5.528n ± 2%    5.469n ±  2%        ~ (p=0.013 n=20)
Int64N1e9-32                 5.438n ± 2%    5.489n ±  2%        ~ (p=0.984 n=20)
Int64N2e9-32                 5.474n ± 1%    5.492n ±  2%        ~ (p=0.616 n=20)
Int64N1e18-32                9.053n ± 1%    8.927n ±  1%        ~ (p=0.037 n=20)
Int64N2e18-32                9.685n ± 2%    9.622n ±  1%        ~ (p=0.449 n=20)
Int64N4e18-32                12.18n ± 1%    12.03n ±  1%        ~ (p=0.013 n=20)
Int32N1000-32                4.862n ± 1%    4.817n ±  1%   -0.94% (p=0.002 n=20)
Int32N1e8-32                 4.758n ± 2%    4.801n ±  1%        ~ (p=0.597 n=20)
Int32N1e9-32                 4.772n ± 1%    4.798n ±  1%        ~ (p=0.774 n=20)
Int32N2e9-32                 4.847n ± 0%    4.840n ±  1%        ~ (p=0.867 n=20)
Float32-32                   22.18n ± 4%    10.51n ±  4%  -52.61% (p=0.000 n=20)
Float64-32                   21.21n ± 3%    20.33n ±  3%   -4.17% (p=0.000 n=20)
ExpFloat64-32                12.39n ± 2%    12.59n ±  2%        ~ (p=0.139 n=20)
NormFloat64-32               7.422n ± 1%    7.350n ±  2%        ~ (p=0.208 n=20)
Perm3-32                     38.00n ± 2%    39.29n ±  2%   +3.38% (p=0.000 n=20)
Perm30-32                    212.7n ± 1%    219.1n ±  2%   +3.03% (p=0.001 n=20)
Perm30ViaShuffle-32          187.5n ± 2%    189.8n ±  2%        ~ (p=0.457 n=20)
ShuffleOverhead-32           159.7n ± 1%    158.9n ±  2%        ~ (p=0.920 n=20)
Concurrent-32                3.470n ± 0%    3.306n ±  3%   -4.71% (p=0.000 n=20)

For #61716.

Change-Id: I1933f1f9efd7e6e832d83e7fa5d84398f67d41f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/502503
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 17:08:40 +00:00
Russ Cox
c266587846 math/rand/v2: add, optimize N, UintN, Uint32N, Uint64N
Now that we can break the value stream, we can take advantage
of better algorithms that have been suggested since the original
code was written.

Also optimizes IntN, Int32N, Int64N, Perm (indirectly).

All the N variants (IntN, Int32N, Int64N, UintN, N, etc) now
return the same values given a Source and parameter n, so that
for example uint(r.IntN(10)) and r.UintN(10) and r.N(uint(10))
are completely interchangeable.

Int64N4e18 gets slower but that is a near worst case for
the algorithm and is extremely unlikely in practice.

32-bit Int32N variants got slower too, by 15-30%, in exchange
for speeding up everything on 64-bit systems and consistency
across the N functions.
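
As a rough illustration of the kind of algorithm this refers to, here is a
minimal sketch of a multiply-and-reject bounded generator (often credited to
Lemire). Whether it matches the committed code line for line is an assumption;
the source type (a splitmix64 stand-in) and the helper names uint64n, intN and
uintN are hypothetical and chosen only to keep the sketch self-contained.

```go
package main

import (
	"fmt"
	"math/bits"
)

// source is a hypothetical stand-in for a 64-bit Source (splitmix64),
// used only so that the sketch runs on its own.
type source struct{ state uint64 }

func (s *source) Uint64() uint64 {
	s.state += 0x9e3779b97f4a7c15
	z := s.state
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9
	z = (z ^ (z >> 27)) * 0x94d049bb133111eb
	return z ^ (z >> 31)
}

// uint64n returns a uniform value in [0, n) by multiply-and-reject:
// take the high 64 bits of a 128-bit product and reject the few
// low halves that would bias the result.
func uint64n(s *source, n uint64) uint64 {
	if n == 0 {
		panic("invalid argument to uint64n")
	}
	hi, lo := bits.Mul64(s.Uint64(), n)
	if lo < n {
		thresh := -n % n // equals 2^64 mod n
		for lo < thresh {
			hi, lo = bits.Mul64(s.Uint64(), n)
		}
	}
	return hi
}

// intN and uintN share uint64n, so they return the same values
// for the same source state and the same n.
func intN(s *source, n int) int {
	if n <= 0 {
		panic("invalid argument to intN")
	}
	return int(uint64n(s, uint64(n)))
}

func uintN(s *source, n uint) uint { return uint(uint64n(s, uint64(n))) }

func main() {
	a, b := &source{state: 1}, &source{state: 1}
	for i := 0; i < 5; i++ {
		fmt.Println(uint(intN(a, 10)) == uintN(b, 10))
	}
}
```

In this sketch the slow path (the modulo) is entered with probability of about
n/2^64, which is one plausible reason a very large n such as 4e18 is the
unfavourable case.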

Also rename previously missed benchmark
GlobalInt63Parallel to GlobalInt64Parallel.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 11ad9fdddc.amd64 │            4d84a369d1.amd64            │
                        │      sec/op      │    sec/op     vs base                  │
SourceUint64-32                1.335n ± 1%    1.348n ± 2%        ~ (p=0.335 n=20)
GlobalInt64-32                 2.046n ± 1%    2.082n ± 2%        ~ (p=0.310 n=20)
GlobalInt63Parallel-32        0.1037n ± 1%
GlobalInt64Parallel-32                       0.1036n ± 1%
GlobalUint64-32                2.075n ± 0%    2.077n ± 2%        ~ (p=0.228 n=20)
GlobalUint64Parallel-32       0.1013n ± 1%   0.1012n ± 1%        ~ (p=0.878 n=20)
Int64-32                       1.726n ± 2%    1.750n ± 0%   +1.39% (p=0.000 n=20)
Uint64-32                      1.673n ± 1%    1.707n ± 2%   +2.03% (p=0.002 n=20)
GlobalIntN1000-32              3.895n ± 2%    3.192n ± 1%  -18.05% (p=0.000 n=20)
IntN1000-32                    3.403n ± 1%    2.462n ± 2%  -27.65% (p=0.000 n=20)
Int64N1000-32                  3.053n ± 2%    2.470n ± 1%  -19.11% (p=0.000 n=20)
Int64N1e8-32                   2.718n ± 1%    2.503n ± 2%   -7.91% (p=0.000 n=20)
Int64N1e9-32                   2.712n ± 1%    2.487n ± 1%   -8.31% (p=0.000 n=20)
Int64N2e9-32                   2.690n ± 1%    2.487n ± 1%   -7.57% (p=0.000 n=20)
Int64N1e18-32                  3.084n ± 2%    3.006n ± 2%   -2.53% (p=0.000 n=20)
Int64N2e18-32                  4.026n ± 1%    3.368n ± 1%  -16.33% (p=0.000 n=20)
Int64N4e18-32                  4.049n ± 2%    4.763n ± 1%  +17.62% (p=0.000 n=20)
Int32N1000-32                  2.730n ± 0%    2.403n ± 1%  -11.94% (p=0.000 n=20)
Int32N1e8-32                   2.916n ± 2%    2.405n ± 1%  -17.53% (p=0.000 n=20)
Int32N1e9-32                   3.375n ± 1%    2.402n ± 2%  -28.83% (p=0.000 n=20)
Int32N2e9-32                   3.292n ± 1%    2.384n ± 1%  -27.58% (p=0.000 n=20)
Float32-32                     2.673n ± 1%    2.641n ± 2%        ~ (p=0.147 n=20)
Float64-32                     2.485n ± 1%    2.483n ± 1%        ~ (p=0.804 n=20)
ExpFloat64-32                  3.577n ± 2%    3.486n ± 2%   -2.57% (p=0.000 n=20)
NormFloat64-32                 3.797n ± 2%    3.648n ± 1%   -3.92% (p=0.000 n=20)
Perm3-32                       35.79n ± 2%    33.04n ± 1%   -7.68% (p=0.000 n=20)
Perm30-32                      205.1n ± 1%    171.9n ± 1%  -16.14% (p=0.000 n=20)
Perm30ViaShuffle-32            111.2n ± 2%    100.3n ± 1%   -9.76% (p=0.000 n=20)
ShuffleOverhead-32             100.5n ± 2%    102.5n ± 1%   +1.99% (p=0.007 n=20)
Concurrent-32                  2.188n ± 5%    2.101n ± 0%        ~ (p=0.013 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 11ad9fdddc.arm64 │            4d84a369d1.arm64            │
                       │      sec/op      │    sec/op     vs base                  │
SourceUint64-8                2.272n ± 1%    2.261n ± 1%        ~ (p=0.172 n=20)
GlobalInt64-8                 2.155n ± 1%    2.160n ± 1%        ~ (p=0.482 n=20)
GlobalInt63Parallel-8        0.4352n ± 0%
GlobalInt64Parallel-8                       0.4299n ± 0%
GlobalUint64-8                2.173n ± 1%    2.169n ± 1%        ~ (p=0.262 n=20)
GlobalUint64Parallel-8       0.4340n ± 0%   0.4293n ± 1%   -1.08% (p=0.000 n=20)
Int64-8                       2.544n ± 1%    2.473n ± 1%   -2.83% (p=0.000 n=20)
Uint64-8                      2.552n ± 1%    2.453n ± 1%   -3.90% (p=0.000 n=20)
GlobalIntN1000-8              3.856n ± 0%    2.814n ± 2%  -27.02% (p=0.000 n=20)
IntN1000-8                    3.820n ± 0%    2.933n ± 2%  -23.22% (p=0.000 n=20)
Int64N1000-8                  3.219n ± 2%    2.934n ± 2%   -8.85% (p=0.000 n=20)
Int64N1e8-8                   3.221n ± 2%    2.935n ± 2%   -8.91% (p=0.000 n=20)
Int64N1e9-8                   3.276n ± 2%    2.934n ± 2%  -10.44% (p=0.000 n=20)
Int64N2e9-8                   3.217n ± 0%    2.935n ± 2%   -8.78% (p=0.000 n=20)
Int64N1e18-8                  3.502n ± 2%    3.778n ± 1%   +7.91% (p=0.000 n=20)
Int64N2e18-8                  4.968n ± 1%    4.359n ± 1%  -12.26% (p=0.000 n=20)
Int64N4e18-8                  4.963n ± 0%    6.546n ± 1%  +31.92% (p=0.000 n=20)
Int32N1000-8                  3.189n ± 1%    2.940n ± 2%   -7.81% (p=0.000 n=20)
Int32N1e8-8                   3.514n ± 1%    2.937n ± 2%  -16.41% (p=0.000 n=20)
Int32N1e9-8                   4.133n ± 0%    2.938n ± 0%  -28.91% (p=0.000 n=20)
Int32N2e9-8                   4.137n ± 0%    2.938n ± 2%  -28.97% (p=0.000 n=20)
Float32-8                     3.468n ± 1%    3.486n ± 0%   +0.52% (p=0.000 n=20)
Float64-8                     3.478n ± 0%    3.480n ± 0%        ~ (p=0.063 n=20)
ExpFloat64-8                  4.563n ± 0%    4.533n ± 0%   -0.67% (p=0.000 n=20)
NormFloat64-8                 4.768n ± 0%    4.764n ± 0%   -0.07% (p=0.001 n=20)
Perm3-8                       28.94n ± 0%    26.66n ± 0%   -7.88% (p=0.000 n=20)
Perm30-8                      175.9n ± 0%    143.4n ± 0%  -18.50% (p=0.000 n=20)
Perm30ViaShuffle-8            152.6n ± 1%    142.9n ± 0%   -6.29% (p=0.000 n=20)
ShuffleOverhead-8             119.6n ± 1%    120.7n ± 0%   +0.96% (p=0.000 n=20)
Concurrent-8                  2.452n ± 3%    2.360n ± 2%   -3.73% (p=0.007 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 11ad9fdddc.386 │             4d84a369d1.386             │
                        │     sec/op     │    sec/op     vs base                  │
SourceUint64-32              2.091n ± 1%    2.101n ± 2%        ~ (p=0.672 n=20)
GlobalInt64-32               3.514n ± 2%    3.518n ± 2%        ~ (p=0.723 n=20)
GlobalInt63Parallel-32      0.3197n ± 0%
GlobalInt64Parallel-32                     0.3206n ± 0%
GlobalUint64-32              3.542n ± 1%    3.538n ± 1%        ~ (p=0.304 n=20)
GlobalUint64Parallel-32     0.3218n ± 0%   0.3231n ± 0%        ~ (p=0.071 n=20)
Int64-32                     2.552n ± 2%    2.554n ± 2%        ~ (p=0.693 n=20)
Uint64-32                    2.566n ± 1%    2.575n ± 2%        ~ (p=0.606 n=20)
GlobalIntN1000-32            5.965n ± 2%    6.292n ± 1%   +5.46% (p=0.000 n=20)
IntN1000-32                  4.652n ± 1%    4.735n ± 1%   +1.77% (p=0.000 n=20)
Int64N1000-32               14.485n ± 1%    5.489n ± 2%  -62.11% (p=0.000 n=20)
Int64N1e8-32                14.675n ± 1%    5.528n ± 2%  -62.33% (p=0.000 n=20)
Int64N1e9-32                16.805n ± 2%    5.438n ± 2%  -67.64% (p=0.000 n=20)
Int64N2e9-32                14.515n ± 1%    5.474n ± 1%  -62.28% (p=0.000 n=20)
Int64N1e18-32               16.165n ± 1%    9.053n ± 1%  -44.00% (p=0.000 n=20)
Int64N2e18-32               17.945n ± 2%    9.685n ± 2%  -46.03% (p=0.000 n=20)
Int64N4e18-32                18.35n ± 2%    12.18n ± 1%  -33.62% (p=0.000 n=20)
Int32N1000-32                3.608n ± 1%    4.862n ± 1%  +34.77% (p=0.000 n=20)
Int32N1e8-32                 3.767n ± 1%    4.758n ± 2%  +26.31% (p=0.000 n=20)
Int32N1e9-32                 4.130n ± 2%    4.772n ± 1%  +15.54% (p=0.000 n=20)
Int32N2e9-32                 4.206n ± 1%    4.847n ± 0%  +15.24% (p=0.000 n=20)
Float32-32                   22.18n ± 4%    22.18n ± 4%        ~ (p=0.195 n=20)
Float64-32                   20.75n ± 4%    21.21n ± 3%        ~ (p=0.394 n=20)
ExpFloat64-32                12.58n ± 3%    12.39n ± 2%        ~ (p=0.032 n=20)
NormFloat64-32               7.920n ± 3%    7.422n ± 1%   -6.29% (p=0.000 n=20)
Perm3-32                     40.27n ± 1%    38.00n ± 2%   -5.65% (p=0.000 n=20)
Perm30-32                    213.2n ± 2%    212.7n ± 1%        ~ (p=0.995 n=20)
Perm30ViaShuffle-32          164.2n ± 2%    187.5n ± 2%  +14.22% (p=0.000 n=20)
ShuffleOverhead-32           134.7n ± 2%    159.7n ± 1%  +18.52% (p=0.000 n=20)
Concurrent-32                3.301n ± 2%    3.470n ± 0%   +5.10% (p=0.000 n=20)

For #61716.

Change-Id: Id1481b04202883cd0b23e21bb58d1bca4e482bd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/502500
Reviewed-by: Rob Pike <r@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 17:08:37 +00:00
Russ Cox
c7dddb02d3 math/rand/v2: change Source to use uint64
This should make Uint64-using functions faster and leave
other things alone. It is a mystery why so much got faster.
A good cautionary tale not to read too much into minor
jitter in the benchmarks.
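
A minimal sketch of why a 64-bit Source should help Uint64-using functions,
assuming the older shape yields 63 bits per call so that a full uint64 needs
two calls. The interface names Source63 and Source64, the counter type and the
helper functions are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// Source63 mimics an older-style source that yields 63 random bits per call.
type Source63 interface{ Int63() int64 }

// Source64 mimics a source that yields 64 random bits per call.
type Source64 interface{ Uint64() uint64 }

// counter is a toy deterministic generator that also counts calls.
type counter struct {
	state uint64
	calls int
}

func (c *counter) Uint64() uint64 {
	c.calls++
	c.state = c.state*6364136223846793005 + 1442695040888963407
	return c.state
}

func (c *counter) Int63() int64 { return int64(c.Uint64() >> 1) }

// uint64From63 needs two draws to assemble 64 bits from a 63-bit source.
func uint64From63(s Source63) uint64 {
	return uint64(s.Int63())>>31 | uint64(s.Int63())<<32
}

// int64From64 needs one draw and clears the sign bit.
func int64From64(s Source64) int64 {
	return int64(s.Uint64() &^ (1 << 63))
}

func main() {
	a := &counter{}
	uint64From63(a)
	fmt.Println("draws for uint64 from a 63-bit source:", a.calls)

	b := &counter{}
	int64From64(b)
	fmt.Println("draws for int64 from a 64-bit source:", b.calls)
}
```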

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.amd64 │           11ad9fdddc.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.555n ± 1%    1.335n ± 1%  -14.15% (p=0.000 n=20)
GlobalInt64-32                 2.071n ± 1%    2.046n ± 1%        ~ (p=0.016 n=20)
GlobalInt63Parallel-32        0.1023n ± 1%   0.1037n ± 1%   +1.37% (p=0.002 n=20)
GlobalUint64-32                5.193n ± 1%    2.075n ± 0%  -60.06% (p=0.000 n=20)
GlobalUint64Parallel-32       0.2341n ± 0%   0.1013n ± 1%  -56.74% (p=0.000 n=20)
Int64-32                       2.056n ± 2%    1.726n ± 2%  -16.10% (p=0.000 n=20)
Uint64-32                      2.077n ± 2%    1.673n ± 1%  -19.46% (p=0.000 n=20)
GlobalIntN1000-32              4.077n ± 2%    3.895n ± 2%   -4.45% (p=0.000 n=20)
IntN1000-32                    3.476n ± 2%    3.403n ± 1%   -2.10% (p=0.000 n=20)
Int64N1000-32                  3.059n ± 1%    3.053n ± 2%        ~ (p=0.131 n=20)
Int64N1e8-32                   2.942n ± 1%    2.718n ± 1%   -7.60% (p=0.000 n=20)
Int64N1e9-32                   2.932n ± 1%    2.712n ± 1%   -7.50% (p=0.000 n=20)
Int64N2e9-32                   2.925n ± 1%    2.690n ± 1%   -8.03% (p=0.000 n=20)
Int64N1e18-32                  3.116n ± 1%    3.084n ± 2%        ~ (p=0.425 n=20)
Int64N2e18-32                  4.067n ± 1%    4.026n ± 1%   -1.02% (p=0.007 n=20)
Int64N4e18-32                  4.054n ± 1%    4.049n ± 2%        ~ (p=0.204 n=20)
Int32N1000-32                  2.951n ± 1%    2.730n ± 0%   -7.49% (p=0.000 n=20)
Int32N1e8-32                   3.102n ± 1%    2.916n ± 2%   -6.03% (p=0.000 n=20)
Int32N1e9-32                   3.535n ± 1%    3.375n ± 1%   -4.54% (p=0.000 n=20)
Int32N2e9-32                   3.514n ± 1%    3.292n ± 1%   -6.30% (p=0.000 n=20)
Float32-32                     2.760n ± 1%    2.673n ± 1%   -3.13% (p=0.000 n=20)
Float64-32                     2.284n ± 1%    2.485n ± 1%   +8.80% (p=0.000 n=20)
ExpFloat64-32                  3.757n ± 1%    3.577n ± 2%   -4.78% (p=0.000 n=20)
NormFloat64-32                 3.837n ± 1%    3.797n ± 2%        ~ (p=0.204 n=20)
Perm3-32                       35.23n ± 2%    35.79n ± 2%        ~ (p=0.298 n=20)
Perm30-32                      208.8n ± 1%    205.1n ± 1%   -1.82% (p=0.000 n=20)
Perm30ViaShuffle-32            111.7n ± 1%    111.2n ± 2%        ~ (p=0.273 n=20)
ShuffleOverhead-32             101.1n ± 1%    100.5n ± 2%        ~ (p=0.878 n=20)
Concurrent-32                  2.108n ± 7%    2.188n ± 5%        ~ (p=0.417 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
                       │ 220860f76f.arm64 │           11ad9fdddc.arm64           │
                       │      sec/op      │    sec/op     vs base                │
SourceUint64-8                2.316n ± 1%    2.272n ± 1%   -1.86% (p=0.000 n=20)
GlobalInt64-8                 2.183n ± 1%    2.155n ± 1%        ~ (p=0.122 n=20)
GlobalInt63Parallel-8        0.4331n ± 0%   0.4352n ± 0%   +0.48% (p=0.000 n=20)
GlobalUint64-8                4.377n ± 2%    2.173n ± 1%  -50.35% (p=0.000 n=20)
GlobalUint64Parallel-8       0.9237n ± 0%   0.4340n ± 0%  -53.02% (p=0.000 n=20)
Int64-8                       2.538n ± 1%    2.544n ± 1%        ~ (p=0.189 n=20)
Uint64-8                      2.604n ± 1%    2.552n ± 1%   -1.98% (p=0.000 n=20)
GlobalIntN1000-8              3.857n ± 2%    3.856n ± 0%        ~ (p=0.051 n=20)
IntN1000-8                    3.822n ± 2%    3.820n ± 0%   -0.05% (p=0.001 n=20)
Int64N1000-8                  3.318n ± 0%    3.219n ± 2%   -2.98% (p=0.000 n=20)
Int64N1e8-8                   3.349n ± 1%    3.221n ± 2%   -3.79% (p=0.000 n=20)
Int64N1e9-8                   3.317n ± 2%    3.276n ± 2%   -1.24% (p=0.001 n=20)
Int64N2e9-8                   3.317n ± 2%    3.217n ± 0%   -3.01% (p=0.000 n=20)
Int64N1e18-8                  3.542n ± 1%    3.502n ± 2%   -1.16% (p=0.001 n=20)
Int64N2e18-8                  5.087n ± 0%    4.968n ± 1%   -2.33% (p=0.000 n=20)
Int64N4e18-8                  5.084n ± 0%    4.963n ± 0%   -2.39% (p=0.000 n=20)
Int32N1000-8                  3.208n ± 2%    3.189n ± 1%   -0.58% (p=0.001 n=20)
Int32N1e8-8                   3.610n ± 1%    3.514n ± 1%   -2.67% (p=0.000 n=20)
Int32N1e9-8                   4.235n ± 0%    4.133n ± 0%   -2.40% (p=0.000 n=20)
Int32N2e9-8                   4.229n ± 1%    4.137n ± 0%   -2.19% (p=0.000 n=20)
Float32-8                     3.468n ± 0%    3.468n ± 1%        ~ (p=0.350 n=20)
Float64-8                     3.447n ± 0%    3.478n ± 0%   +0.90% (p=0.000 n=20)
ExpFloat64-8                  4.567n ± 0%    4.563n ± 0%   -0.10% (p=0.002 n=20)
NormFloat64-8                 4.821n ± 0%    4.768n ± 0%   -1.09% (p=0.000 n=20)
Perm3-8                       28.89n ± 0%    28.94n ± 0%   +0.17% (p=0.000 n=20)
Perm30-8                      175.7n ± 0%    175.9n ± 0%   +0.14% (p=0.000 n=20)
Perm30ViaShuffle-8            153.5n ± 0%    152.6n ± 1%        ~ (p=0.010 n=20)
ShuffleOverhead-8             119.8n ± 1%    119.6n ± 1%        ~ (p=0.147 n=20)
Concurrent-8                  2.433n ± 3%    2.452n ± 3%        ~ (p=0.616 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.386 │            11ad9fdddc.386            │
                        │     sec/op     │    sec/op     vs base                │
SourceUint64-32             2.370n ±  1%    2.091n ± 1%  -11.75% (p=0.000 n=20)
GlobalInt64-32              3.569n ±  1%    3.514n ± 2%   -1.56% (p=0.000 n=20)
GlobalInt63Parallel-32     0.3221n ±  1%   0.3197n ± 0%   -0.76% (p=0.000 n=20)
GlobalUint64-32             8.797n ± 10%    3.542n ± 1%  -59.74% (p=0.000 n=20)
GlobalUint64Parallel-32    0.6351n ±  0%   0.3218n ± 0%  -49.33% (p=0.000 n=20)
Int64-32                    2.612n ±  2%    2.552n ± 2%   -2.30% (p=0.000 n=20)
Uint64-32                   3.350n ±  1%    2.566n ± 1%  -23.42% (p=0.000 n=20)
GlobalIntN1000-32           5.892n ±  1%    5.965n ± 2%        ~ (p=0.082 n=20)
IntN1000-32                 4.546n ±  1%    4.652n ± 1%   +2.33% (p=0.000 n=20)
Int64N1000-32               14.59n ±  1%    14.48n ± 1%        ~ (p=0.652 n=20)
Int64N1e8-32                14.76n ±  2%    14.67n ± 1%        ~ (p=0.836 n=20)
Int64N1e9-32                16.57n ±  1%    16.80n ± 2%        ~ (p=0.016 n=20)
Int64N2e9-32                14.54n ±  1%    14.52n ± 1%        ~ (p=0.533 n=20)
Int64N1e18-32               16.14n ±  1%    16.16n ± 1%        ~ (p=0.606 n=20)
Int64N2e18-32               18.10n ±  1%    17.95n ± 2%        ~ (p=0.062 n=20)
Int64N4e18-32               18.65n ±  1%    18.35n ± 2%   -1.61% (p=0.010 n=20)
Int32N1000-32               3.560n ±  1%    3.608n ± 1%   +1.33% (p=0.001 n=20)
Int32N1e8-32                3.770n ±  2%    3.767n ± 1%        ~ (p=0.155 n=20)
Int32N1e9-32                4.098n ±  0%    4.130n ± 2%        ~ (p=0.016 n=20)
Int32N2e9-32                4.179n ±  1%    4.206n ± 1%        ~ (p=0.011 n=20)
Float32-32                  21.18n ±  4%    22.18n ± 4%   +4.70% (p=0.003 n=20)
Float64-32                  20.60n ±  2%    20.75n ± 4%   +0.73% (p=0.000 n=20)
ExpFloat64-32               13.07n ±  0%    12.58n ± 3%   -3.82% (p=0.000 n=20)
NormFloat64-32              7.738n ±  2%    7.920n ± 3%        ~ (p=0.066 n=20)
Perm3-32                    36.73n ±  1%    40.27n ± 1%   +9.65% (p=0.000 n=20)
Perm30-32                   211.9n ±  1%    213.2n ± 2%        ~ (p=0.262 n=20)
Perm30ViaShuffle-32         165.2n ±  1%    164.2n ± 2%        ~ (p=0.029 n=20)
ShuffleOverhead-32          133.9n ±  1%    134.7n ± 2%        ~ (p=0.551 n=20)
Concurrent-32               3.287n ±  2%    3.301n ± 2%        ~ (p=0.330 n=20)

For #61716.

Change-Id: I8d2f73f87dd3603a0c2ff069988938e0957b6904
Reviewed-on: https://go-review.googlesource.com/c/go/+/502499
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:08:34 +00:00
Ubuntu
8fc043ccfa cmd/compile: optimize right shifts of int32 on riscv64
The compiler is currently sign extending 32 bit signed integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit signed values (sraw and sraiw) which sign extend
the result of the shift to 64 bits.  Change the compiler so that
it uses sraw and sraiw for shifts of signed 32 bit integers, reducing
in most cases the number of instructions needed to perform the shift.

Here are some examples of code sequences that are changed by this
patch:

int32(a) >> 2

  before:

    sll     x5,x10,0x20
    sra     x10,x5,0x22

  after:

    sraw    x10,x10,0x2

int32(v) >> int(s)

  before:

    sext.w  x5,x10
    sltiu   x6,x11,64
    add     x6,x6,-1
    or      x6,x11,x6
    sra     x10,x5,x6

  after:

    sltiu   x5,x11,32
    add     x5,x5,-1
    or      x5,x11,x5
    sraw    x10,x10,x5

int32(v) >> (int(s) & 31)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

  after:

    and     x5,x11,31
    sraw    x10,x10,x5

int32(100) >> int(a)

  before:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,64
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sra     x10,x6,x5

  after:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,32
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sraw    x10,x6,x5

int32(v) >> (int(s) & 63)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

  after:

    and     x5,x11,63
    sltiu   x6,x5,32
    add     x6,x6,-1
    or      x5,x5,x6
    sraw    x10,x10,x5

In most cases we eliminate one instruction.  In the case where
we shift an int32 constant by a variable the number of instructions
generated is identical.  A sra is simply replaced by a sraw.  In the
unusual case where we shift right by a variable anded with a constant
> 31 but < 64, we generate two additional instructions.  As this is
an unusual case we do not try to optimize for it.
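
The sltiu/add/or sequence before sraw can be read as clamping the shift
count. Below is a small Go emulation of that reading, written only for
illustration; srawClamped is a hypothetical name and is not part of the
compiler. It checks the emulation against Go's own semantics, where shifting
an int32 right by 32 or more yields 0 or -1.

```go
package main

import "fmt"

// srawClamped emulates the four-instruction sequence:
//
//	sltiu t, s, 32   // t = 1 if s < 32, else 0
//	add   t, t, -1   // t = 0 if s < 32, else all ones
//	or    t, s, t    // t = s if s < 32, else all ones
//	sraw  v, v, t    // 32-bit arithmetic shift using the low 5 bits of t
func srawClamped(v int32, s uint64) int32 {
	var lt uint64
	if s < 32 {
		lt = 1
	}
	mask := lt - 1         // 0 when s < 32, all ones otherwise
	amt := (s | mask) & 31 // sraw looks only at the low 5 bits
	return v >> amt
}

func main() {
	for _, s := range []uint64{0, 2, 31, 32, 63, 64, 1000} {
		v := int32(-100)
		// Go defines v >> s for any non-negative s; large counts give 0 or -1.
		fmt.Println(s, srawClamped(v, s) == v>>s)
	}
}
```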

Some improvements can be seen in some of the existing benchmarks,
notably in the utf8 package which performs right shifts of runes
which are signed 32 bit integers.

                      |  utf8-old   |              utf8-new            |
                      |   sec/op    |   sec/op     vs base             |
EncodeASCIIRune-4       17.68n ± 0%   17.67n ± 0%       ~ (p=0.312 n=10)
EncodeJapaneseRune-4    35.34n ± 0%   34.53n ± 1%  -2.31% (p=0.000 n=10)
AppendASCIIRune-4       3.213n ± 0%   3.213n ± 0%       ~ (p=0.318 n=10)
AppendJapaneseRune-4    36.14n ± 0%   35.35n ± 0%  -2.19% (p=0.000 n=10)
DecodeASCIIRune-4       28.11n ± 0%   27.36n ± 0%  -2.69% (p=0.000 n=10)
DecodeJapaneseRune-4    38.55n ± 0%   38.58n ± 0%       ~ (p=0.612 n=10)

Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/535115
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
2023-10-30 14:47:06 +00:00
Russ Cox
1f4db9dbd6 math/rand/v2: update benchmarks
Change the benchmarks to use the result of the calls,
as I found that in certain cases inlining resulted in
discarding part of the computation in the benchmark loop.
Add various benchmarks that will be relevant in future CLs.
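
A minimal sketch of the "use the result" pattern, assuming a package-level
sink variable. The xorshift type and the names benchDiscard, benchUse and Sink
are hypothetical stand-ins rather than code from this CL.

```go
package main

import (
	"fmt"
	"testing"
)

// xorshift is a toy generator used only to give the loop some work.
type xorshift uint64

func (x *xorshift) Uint64() uint64 {
	v := uint64(*x)
	v ^= v << 13
	v ^= v >> 7
	v ^= v << 17
	*x = xorshift(v)
	return v
}

// Sink keeps results alive so the compiler cannot drop the computation.
var Sink uint64

// benchDiscard ignores the result; after inlining, part of the work
// may be treated as dead code.
func benchDiscard(b *testing.B) {
	x := xorshift(1)
	for i := 0; i < b.N; i++ {
		x.Uint64()
	}
}

// benchUse accumulates every result and stores the total in Sink.
func benchUse(b *testing.B) {
	x := xorshift(1)
	var t uint64
	for i := 0; i < b.N; i++ {
		t += x.Uint64()
	}
	Sink = t
}

func main() {
	fmt.Println("discard:", testing.Benchmark(benchDiscard))
	fmt.Println("use:    ", testing.Benchmark(benchUse))
}
```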

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.amd64 │
                        │      sec/op      │
SourceUint64-32                1.555n ± 1%
GlobalInt64-32                 2.071n ± 1%
GlobalInt63Parallel-32        0.1023n ± 1%
GlobalUint64-32                5.193n ± 1%
GlobalUint64Parallel-32       0.2341n ± 0%
Int64-32                       2.056n ± 2%
Uint64-32                      2.077n ± 2%
GlobalIntN1000-32              4.077n ± 2%
IntN1000-32                    3.476n ± 2%
Int64N1000-32                  3.059n ± 1%
Int64N1e8-32                   2.942n ± 1%
Int64N1e9-32                   2.932n ± 1%
Int64N2e9-32                   2.925n ± 1%
Int64N1e18-32                  3.116n ± 1%
Int64N2e18-32                  4.067n ± 1%
Int64N4e18-32                  4.054n ± 1%
Int32N1000-32                  2.951n ± 1%
Int32N1e8-32                   3.102n ± 1%
Int32N1e9-32                   3.535n ± 1%
Int32N2e9-32                   3.514n ± 1%
Float32-32                     2.760n ± 1%
Float64-32                     2.284n ± 1%
ExpFloat64-32                  3.757n ± 1%
NormFloat64-32                 3.837n ± 1%
Perm3-32                       35.23n ± 2%
Perm30-32                      208.8n ± 1%
Perm30ViaShuffle-32            111.7n ± 1%
ShuffleOverhead-32             101.1n ± 1%
Concurrent-32                  2.108n ± 7%

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 220860f76f.arm64 │
                       │      sec/op      │
SourceUint64-8                2.316n ± 1%
GlobalInt64-8                 2.183n ± 1%
GlobalInt63Parallel-8        0.4331n ± 0%
GlobalUint64-8                4.377n ± 2%
GlobalUint64Parallel-8       0.9237n ± 0%
Int64-8                       2.538n ± 1%
Uint64-8                      2.604n ± 1%
GlobalIntN1000-8              3.857n ± 2%
IntN1000-8                    3.822n ± 2%
Int64N1000-8                  3.318n ± 0%
Int64N1e8-8                   3.349n ± 1%
Int64N1e9-8                   3.317n ± 2%
Int64N2e9-8                   3.317n ± 2%
Int64N1e18-8                  3.542n ± 1%
Int64N2e18-8                  5.087n ± 0%
Int64N4e18-8                  5.084n ± 0%
Int32N1000-8                  3.208n ± 2%
Int32N1e8-8                   3.610n ± 1%
Int32N1e9-8                   4.235n ± 0%
Int32N2e9-8                   4.229n ± 1%
Float32-8                     3.468n ± 0%
Float64-8                     3.447n ± 0%
ExpFloat64-8                  4.567n ± 0%
NormFloat64-8                 4.821n ± 0%
Perm3-8                       28.89n ± 0%
Perm30-8                      175.7n ± 0%
Perm30ViaShuffle-8            153.5n ± 0%
ShuffleOverhead-8             119.8n ± 1%
Concurrent-8                  2.433n ± 3%

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.386 │
                        │     sec/op     │
SourceUint64-32             2.370n ±  1%
GlobalInt64-32              3.569n ±  1%
GlobalInt63Parallel-32     0.3221n ±  1%
GlobalUint64-32             8.797n ± 10%
GlobalUint64Parallel-32    0.6351n ±  0%
Int64-32                    2.612n ±  2%
Uint64-32                   3.350n ±  1%
GlobalIntN1000-32           5.892n ±  1%
IntN1000-32                 4.546n ±  1%
Int64N1000-32               14.59n ±  1%
Int64N1e8-32                14.76n ±  2%
Int64N1e9-32                16.57n ±  1%
Int64N2e9-32                14.54n ±  1%
Int64N1e18-32               16.14n ±  1%
Int64N2e18-32               18.10n ±  1%
Int64N4e18-32               18.65n ±  1%
Int32N1000-32               3.560n ±  1%
Int32N1e8-32                3.770n ±  2%
Int32N1e9-32                4.098n ±  0%
Int32N2e9-32                4.179n ±  1%
Float32-32                  21.18n ±  4%
Float64-32                  20.60n ±  2%
ExpFloat64-32               13.07n ±  0%
NormFloat64-32              7.738n ±  2%
Perm3-32                    36.73n ±  1%
Perm30-32                   211.9n ±  1%
Perm30ViaShuffle-32         165.2n ±  1%
ShuffleOverhead-32          133.9n ±  1%
Concurrent-32               3.287n ±  2%

For #61716.

Change-Id: I2f0938eae4b7bf736a8cd899a99783e731bf2179
Reviewed-on: https://go-review.googlesource.com/c/go/+/502496
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:32:20 +00:00
Russ Cox
1cc5b34d28 math/rand/v2: remove Rand.Seed
Removing Rand.Seed lets us remove lockedSource as well,
along with the ambiguity in globalRand about which source
to use.

For #61716.

Change-Id: Ibe150520dd1e7dd87165eacaebe9f0c2daeaedfd
Reviewed-on: https://go-review.googlesource.com/c/go/+/502498
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2023-10-30 14:31:46 +00:00
Russ Cox
48bd1fc93b math/rand/v2: clean up regression test
Add more test cases.
Replace -printgolden with -update,
which rewrites the files for us.
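
A rough sketch of the golden-file pattern behind a flag like -update. The
helper checkGolden, the demo function and the file name are hypothetical; in a
real test the update argument would typically come from a boolean flag.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// checkGolden compares got with the golden file at path.
// When update is true it rewrites the file instead of comparing.
func checkGolden(path string, got []byte, update bool) error {
	if update {
		return os.WriteFile(path, got, 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	if !bytes.Equal(got, want) {
		return fmt.Errorf("%s: output differs from golden file", path)
	}
	return nil
}

// demo exercises checkGolden in a temporary directory.
func demo() error {
	dir, err := os.MkdirTemp("", "golden")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "regress.golden") // hypothetical file name
	if err := checkGolden(path, []byte("1 2 3\n"), true); err != nil {
		return err
	}
	if err := checkGolden(path, []byte("1 2 3\n"), false); err != nil {
		return err
	}
	if checkGolden(path, []byte("4 5 6\n"), false) == nil {
		return fmt.Errorf("expected a mismatch")
	}
	return nil
}

func main() {
	fmt.Println("demo error:", demo())
}
```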

For #61716.

Change-Id: I7c4c900ee896042429135a21971a56ebe16b6a66
Reviewed-on: https://go-review.googlesource.com/c/go/+/516858
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:30:24 +00:00
Russ Cox
d6c1ef52ad math/rand/v2: remove Read
In math/rand, Read is deprecated. Remove it in v2.
People should use crypto/rand if they need long strings.
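
For reference, a minimal sketch of filling a buffer with crypto/rand instead.
The helper name randomHex is hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomHex returns n random bytes from crypto/rand, hex-encoded.
func randomHex(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	s, err := randomHex(16)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```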

For #61716.

Change-Id: Ib254b7e1844616e96db60a3a7abb572b0dcb1583
Reviewed-on: https://go-review.googlesource.com/c/go/+/502497
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:30:14 +00:00
Russ Cox
d42750b17c math/rand/v2: rename various functions
Int31 -> Int32
Int31n -> Int32N
Int63 -> Int64
Int63n -> Int64N
Intn -> IntN

The 31 and 63 are pedantic and confusing: the functions should
be named for the type they return, same as all the others.

The lower-case n is inconsistent with Go's usual CamelCase
and especially problematic because we plan to add 'func N'.
Capitalize the n.

For #61716.

Change-Id: Idb1a005a82f353677450d47fb612ade7a41fde69
Reviewed-on: https://go-review.googlesource.com/c/go/+/516857
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:29:37 +00:00
Russ Cox
59f0ab4036 math/rand/v2: start of new API
This is the beginning of the math/rand/v2 package from proposal #61716.
Start by copying old API. This CL copies math/rand/* to math/rand/v2
and updates references to math/rand to add v2 throughout.
Later CLs will make the v2 changes.

For #61716.

Change-Id: I1624ccffae3dfa442d4ba2461942decbd076e11b
Reviewed-on: https://go-review.googlesource.com/c/go/+/502495
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 14:29:30 +00:00
Cherry Mui
8c92897e15 cmd/compile: rework TestPGOHash to not rebuild dependencies
TestPGOHash may rebuild dependencies because we pass -trimpath to the
go command. This CL makes it pass the -trimpath compiler flag only to
the current package instead, as only the current package needs
a stable source file path.

Also refactor buildPGOInliningTest to only take compiler flags,
not go flags, to avoid accidental rebuild.

Should fix #63733.

Change-Id: Iec6c4e90cf659790e21083ee2e697f518234c5b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/535915
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-27 17:54:18 +00:00
Cherry Mui
5613882df7 internal/testenv: use cmd.Environ in CleanCmdEnv
In CleanCmdEnv, use cmd.Environ instead of os.Environ, so it
sets the PWD environment variable if cmd.Dir is set. This ensures
the child process sees a canonical path for its working directory.

Change-Id: Ia769552a488dc909eaf6bb7d21937adba06d1072
Reviewed-on: https://go-review.googlesource.com/c/go/+/538215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-27 17:53:23 +00:00
Jes Cok
b46aec0765 bytes,internal/bytealg: eliminate HashStrBytes,HashStrRevBytes using generics

HashStrBytes and HashStrRevBytes have exactly the same logic as
HashStr and HashStrRev; only the argument types differ.

Since the bootstrap toolchain is bumped to 1.20, we can eliminate them
by using generics.

Change-Id: I4336b1cab494ba963f09646c169b45f6b1ee62e3
GitHub-Last-Rev: b11a2bf947
GitHub-Pull-Request: golang/go#63766
Reviewed-on: https://go-review.googlesource.com/c/go/+/538175
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-27 15:55:16 +00:00
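The idea behind b46aec0765 can be sketched as one generic function that accepts either a string or a byte slice. The code below is an illustrative rolling-hash sketch under that assumption, not a copy of internal/bytealg; the names hashStr and primeRK are chosen for the example.

```go
package main

import "fmt"

// primeRK is the multiplier used by this sketch's rolling hash.
const primeRK = 16777619

// hashStr returns the hash of sep and the multiplier raised to
// len(sep), for use in a Rabin-Karp style search. The type constraint
// lets one body serve both string and []byte arguments.
func hashStr[T string | []byte](sep T) (hash, pow uint32) {
	for i := 0; i < len(sep); i++ {
		hash = hash*primeRK + uint32(sep[i])
	}
	// Compute primeRK^len(sep) by repeated squaring.
	pow = 1
	sq := uint32(primeRK)
	for i := len(sep); i > 0; i >>= 1 {
		if i&1 != 0 {
			pow *= sq
		}
		sq *= sq
	}
	return hash, pow
}

func main() {
	h1, p1 := hashStr("gopher")
	h2, p2 := hashStr([]byte("gopher"))
	fmt.Println(h1 == h2, p1 == p2)
}
```

Because both instantiations share one body, the string and byte-slice results cannot drift apart the way two hand-copied functions could.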
Cherry Mui
0262ea1ff9 runtime: print a stack trace at "morestack on g0"
Errors like "morestack on g0" are very hard to debug because they
often don't print a useful stack trace. The runtime doesn't directly
print a stack trace because the stack is in too bad a state to call
print. Sometimes the SIGABRT may trigger a traceback, but sometimes
not, especially in a cgo binary. Even if it triggers a traceback, it
often does not include the stack trace of the bad stack.

This CL makes it explicitly print a stack trace and throw. The
idea is to have some space as an "emergency" crash stack. When the
stack is in a really bad state, we switch to the crash stack and
do a traceback.

Currently only implemented on AMD64 and ARM64.

TODO: also handle errors like "morestack on gsignal" and bad
systemstack. Also handle other architectures.

Change-Id: Ibfc397202f2bb0737c5cbe99f2763de83301c1c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/419435
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-10-26 18:46:50 +00:00
Alexander Yastrebov
29b80397a8 crypto/subtle: use PCALIGN in xorBytes
goos: linux
goarch: amd64
pkg: crypto/subtle
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
                      │   master    │                HEAD                 │
                      │   sec/op    │   sec/op     vs base                │
XORBytes/8Bytes-8       10.90n ± 1%   10.96n ± 5%        ~ (p=0.617 n=10)
XORBytes/128Bytes-8     14.85n ± 2%   12.05n ± 2%  -18.82% (p=0.000 n=10)
XORBytes/2048Bytes-8    88.30n ± 2%   72.64n ± 1%  -17.73% (p=0.000 n=10)
XORBytes/32768Bytes-8   1.489µ ± 2%   1.442µ ± 1%   -3.12% (p=0.000 n=10)
geomean                 67.91n        60.99n       -10.19%

                      │    master    │                 HEAD                 │
                      │     B/s      │     B/s       vs base                │
XORBytes/8Bytes-8       700.5Mi ± 1%   696.5Mi ± 5%        ~ (p=0.631 n=10)
XORBytes/128Bytes-8     8.026Gi ± 2%   9.890Gi ± 2%  +23.22% (p=0.000 n=10)
XORBytes/2048Bytes-8    21.60Gi ± 2%   26.26Gi ± 1%  +21.55% (p=0.000 n=10)
XORBytes/32768Bytes-8   20.50Gi ± 2%   21.16Gi ± 1%   +3.21% (p=0.000 n=10)
geomean                 7.022Gi        7.819Gi       +11.34%

For #63678

Change-Id: I3996873773748a6f78acc6575e70e09bb6aea979
GitHub-Last-Rev: d9129cb8ea
GitHub-Pull-Request: golang/go#63754
Reviewed-on: https://go-review.googlesource.com/c/go/+/537856
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-26 18:14:32 +00:00
Dmitri Shuralyov
7546c79e91 cmd: update vendored golang.org/x/mod
Pull in CL 500335. It teaches modfile.IsDirectoryPath to recognize all
relative paths that begin with a "." or ".." path element as a valid
directory path (rather than a module path). This allows removing the
path == "." check that CL 389298 added to modload.ToDirectoryPath.

go get golang.org/x/mod@6e58e47c  # CL 500335
go mod tidy
go mod vendor

Updates #51448.
Fixes #60572.

Change-Id: Ide99c728c8dac8fd238e13f6d6a0c3917d7aea2d
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/500355
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-26 17:53:40 +00:00
Mauri de Souza Meneguzzo
0046c1414c net/http: pull http2 underflow fix from x/net/http2
CL 534295, which was merged to fix a CVE, introduced
an underflow when we try to decrement sc.curHandlers
in handlerDone.

Pull in a fix from x/net/http2:
http2: fix underflow in http2 server push
https://go-review.googlesource.com/c/net/+/535595

Fixes #63511

Change-Id: I5c678ce7dcc53635f3ad5e4999857cb120dfc1ab
GitHub-Last-Rev: 587ffa3caf
GitHub-Pull-Request: golang/go#63561
Reviewed-on: https://go-review.googlesource.com/c/go/+/535575
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-26 15:35:15 +00:00
Michael Pratt
1af424c196 runtime: clear g0 stack bounds in dropm
After CL 527715, needm uses callbackUpdateSystemStack to set the stack
bounds for g0 on an M from the extra M list. Since
callbackUpdateSystemStack is also used for recursive cgocallback, it
does nothing if the stack is already in bounds.

Currently, the stack bounds in an extra M may contain stale bounds from
a previous thread that used this M and then returned it to the extra
list in dropm.

Typically a new thread will not have an overlapping stack with an old
thread, but because the old thread has exited there is a small chance
that the C memory allocator will allocate the new thread's stack
partially or fully overlapping with the old thread's stack.

If this occurs, then callbackUpdateSystemStack will not update the stack
bounds. If in addition, the overlap is partial such that SP on
cgocallback is close to the recorded stack lower bound, then Go may
quickly "overflow" the stack and crash with "morestack on g0".

Fix this by clearing the stack bounds in dropm, which ensures that
callbackUpdateSystemStack will unconditionally update the bounds in
needm.

For #62440.

Change-Id: Ic9e2052c2090dd679ed716d1a23a86d66cbcada7
Reviewed-on: https://go-review.googlesource.com/c/go/+/537695
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Bypass: Michael Pratt <mpratt@google.com>
2023-10-26 15:17:33 +00:00
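The failure mode described in 1af424c196 can be reproduced in a toy model. Everything below is a simplified assumption for illustration: the bounds type, the addresses, and the way new bounds are estimated are not the runtime's actual code.

```go
package main

import "fmt"

// bounds is a toy stand-in for the recorded g0 stack bounds.
type bounds struct{ lo, hi uintptr }

// updateIfOutOfBounds models the "do nothing if SP is already in
// bounds" rule. When it does update, it simply assumes size bytes of
// stack below sp. It reports whether the bounds changed.
func (b *bounds) updateIfOutOfBounds(sp, size uintptr) bool {
	if sp > b.lo && sp <= b.hi {
		return false // looks like a recursive callback; keep bounds
	}
	b.hi = sp
	b.lo = sp - size
	return true
}

// clear models the fix: forget the bounds when the M is released.
func (b *bounds) clear() { *b = bounds{} }

func main() {
	const size = 0x8000
	// Stale bounds left behind by a thread that has exited.
	stale := bounds{lo: 0x10000, hi: 0x18000}
	// A new thread's stack partially overlaps the old one, and its SP
	// lands just above the stale lower bound.
	sp := uintptr(0x10800)

	b := stale
	fmt.Println(b.updateIfOutOfBounds(sp, size), sp-b.lo) // false 2048

	b = stale
	b.clear()
	fmt.Println(b.updateIfOutOfBounds(sp, size), sp-b.lo) // true 32768
}
```

In the first case only 2048 bytes appear to remain below SP, which is how a healthy thread can seem to overflow its stack almost immediately. After clearing, the in-bounds test can never succeed, so the bounds are always recomputed.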
Daniel Martí
5fe2035927 internal/profile: actually return errors in postDecode
As spotted by staticcheck, the body did keep track of errors by sharing
a single err variable, but its last value was never used as the function
simply finished by returning nil.

To prevent postDecode from erroring on empty profiles,
which breaks TestEmptyProfile, add a check at the top of the function.

Update the runtime/pprof test accordingly,
since the default units didn't make sense for an empty profile anyway.

Change-Id: I188cd8337434adf9169651ab5c914731b8b20f39
Reviewed-on: https://go-review.googlesource.com/c/go/+/483137
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-26 07:37:45 +00:00
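The class of bug fixed in 5fe2035927 is easy to reproduce in miniature. The sketch below is not the postDecode code; decodeBuggy, decodeFixed, and check are hypothetical names that only mirror the "shared err variable, final return nil" shape described above.

```go
package main

import (
	"errors"
	"fmt"
)

// check is a hypothetical per-field validation step.
func check(field string) error {
	if field == "" {
		return errors.New("empty field")
	}
	return nil
}

// decodeBuggy tracks failures in a shared err variable but discards
// its last value by returning nil unconditionally.
func decodeBuggy(fields []string) error {
	var err error
	for _, f := range fields {
		if err = check(f); err != nil {
			break
		}
	}
	return nil // bug: err is dropped here
}

// decodeFixed is identical except that it returns err.
func decodeFixed(fields []string) error {
	var err error
	for _, f := range fields {
		if err = check(f); err != nil {
			break
		}
	}
	return err
}

func main() {
	in := []string{"ok", ""}
	fmt.Println(decodeBuggy(in)) // <nil>
	fmt.Println(decodeFixed(in)) // empty field
}
```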
Mauri de Souza Meneguzzo
555af99bcc runtime/internal/atomic: add riscv64 operators for And/Or
These primitives will be used by the new And/Or sync/atomic APIs.

For #61395

Change-Id: I4062d6317e01afd94d3588f5425237723ab15ade
GitHub-Last-Rev: c0a8d8f34d
GitHub-Pull-Request: golang/go#63272
Reviewed-on: https://go-review.googlesource.com/c/go/+/531575
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-10-25 21:32:01 +00:00
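For readers unfamiliar with atomic And/Or, the semantics can be emulated portably with a compare-and-swap loop. This is only a sketch of what the operations mean, not the riscv64 assembly added by 555af99bcc; the names orUint32 and andUint32 are chosen for the example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// orUint32 atomically performs *addr |= mask and returns the old value.
func orUint32(addr *uint32, mask uint32) (old uint32) {
	for {
		old = atomic.LoadUint32(addr)
		if atomic.CompareAndSwapUint32(addr, old, old|mask) {
			return old
		}
	}
}

// andUint32 atomically performs *addr &= mask and returns the old value.
func andUint32(addr *uint32, mask uint32) (old uint32) {
	for {
		old = atomic.LoadUint32(addr)
		if atomic.CompareAndSwapUint32(addr, old, old&mask) {
			return old
		}
	}
}

func main() {
	x := uint32(0b0101)
	fmt.Println(orUint32(&x, 0b0010), x)  // 5 7
	fmt.Println(andUint32(&x, 0b0110), x) // 7 6
}
```

A dedicated primitive avoids the retry loop, which is one reason an architecture-specific implementation is worth having.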
Bryan C. Mills
e7908ab9a2 cmd/go: allow suffixed toolchains to satisfy toolchain lines for the same base version
Fixes #63357.

Change-Id: I8380cf0d3965d6aef84a91a515d3e0e8aae9344b
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/535355
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-10-25 21:19:11 +00:00
Bryan C. Mills
55b8e16b2e testing: use monotonic counts to attribute races in subtests
This implements the approach I described in
https://go-review.git.corp.google.com/c/go/+/494057/1#message-5c9773bded2f89b4058848cb036b860aa6716de3.

Specifically:

- Each level of test atomically records the cumulative number of races
  seen as of the last race-induced test failure.

- When a subtest fails, it logs the race error, and then updates its
  parents' counters so that they will not log the same error.

- We check each test or benchmark for races before it starts running
  each of its subtests or sub-benchmarks, before unblocking parallel
  subtests, and after running any cleanup functions.

With this implementation, it should be the case that every test that
is running when a race is detected reports that race, and any race
reported for a subtest is not redundantly reported for its parent.

The regression tests are based on those added in CL 494057 and
CL 501895, with a few additions based on my own review of the code.

Fixes #60083.

Change-Id: I578ae929f192a7a951b31b17ecb560cbbf1ef7a1
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race,gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/506300
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-25 20:44:25 +00:00
Tobias Klauser
a57c5736c5 cmd/go: remove unused (*testgoData).acquireNet test helper
It's unused since CL 518775.

Change-Id: Ic889f0cf1555a8503d0c2b3fb232854609d72764
Reviewed-on: https://go-review.googlesource.com/c/go/+/537597
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-10-25 19:47:04 +00:00
Tobias Klauser
9f84df7f01 cmd/go: remove unused (*testgoData).mustHaveContent test helper
It's unused since CL 214382.

Change-Id: I83a860938f87a7c4d2bdb966689c17ba29066639
Reviewed-on: https://go-review.googlesource.com/c/go/+/537596
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-25 19:46:53 +00:00
Tobias Klauser
37788b8b9e cmd/go: remove unused (*testgoData).runGit test helper
It's unused since CL 518775.

Change-Id: I81a4865d0c656ca2b968d51e52388c88e661a157
Reviewed-on: https://go-review.googlesource.com/c/go/+/537595
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2023-10-25 19:46:46 +00:00
Ian Lance Taylor
57322b3cbf debug/elf: return error in DynString for invalid dynamic section size
No test case because the problem can only happen for invalid data.
Let the fuzzer find cases like this.

Fixes #63610

Change-Id: I797b4d9bdb08286ad3e3a9a6f800ee8c90cb7261
Reviewed-on: https://go-review.googlesource.com/c/go/+/536400
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
2023-10-25 19:25:38 +00:00
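From the caller's side, the change in 57322b3cbf means that the error result of DynString is worth checking when the input may be malformed. The sketch below shows one way to do that; the helper neededLibs is hypothetical, and running it against the program's own binary is only a convenience for the example.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// neededLibs is a hypothetical helper: it returns the DT_NEEDED
// entries of the ELF file at path, propagating any error from Open or
// DynString rather than ignoring it.
func neededLibs(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return f.DynString(elf.DT_NEEDED)
}

func main() {
	// Inspect this program's own binary; on non-ELF platforms Open
	// fails and the error is printed instead.
	libs, err := neededLibs(os.Args[0])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(libs)
}
```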
Rhys Hiltner
9cdcb01320 runtime/pprof: include labels for caller of goroutine profile
The goroutine profile has close to three code paths for adding a
goroutine record to the goroutine profile: one for the goroutine that
requested the profile, one for every other goroutine, plus some special
handling for the finalizer goroutine. The first of those captured the
goroutine stack, but neglected to include that goroutine's labels.

Update the tests to check for the inclusion of labels for all three
types of goroutines, and include labels for the creator of the goroutine
profile.

For #63712

Change-Id: Id5387a5f536d3c37268c240e0b6db3d329a3d632
Reviewed-on: https://go-review.googlesource.com/c/go/+/537515
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: David Chase <drchase@google.com>
2023-10-25 17:37:34 +00:00
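The situation covered by 9cdcb01320 arises when a labeled goroutine requests the goroutine profile itself. A minimal sketch of that situation follows; the label key and value and the helper name profileWithLabel are made up for the example, and whether the label shows up in the output depends on running a Go release that contains this change.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"runtime/pprof"
	"strings"
)

// profileWithLabel is a hypothetical helper: it collects a text-format
// goroutine profile from inside a goroutine that carries a pprof label.
func profileWithLabel() (string, error) {
	var buf bytes.Buffer
	var err error
	labels := pprof.Labels("role", "profiler")
	pprof.Do(context.Background(), labels, func(ctx context.Context) {
		// debug=1 selects the human-readable text format.
		err = pprof.Lookup("goroutine").WriteTo(&buf, 1)
	})
	return buf.String(), err
}

func main() {
	out, err := profileWithLabel()
	if err != nil {
		panic(err)
	}
	// Reports whether the requesting goroutine's label key appears in
	// the profile text.
	fmt.Println(strings.Contains(out, "role"))
}
```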