mirror of https://github.com/golang/go synced 2024-11-17 21:24:55 -07:00
Commit Graph

58562 Commits

Author SHA1 Message Date
Joel Sing
54452b963c syscall: call getfsstat via libc on openbsd
On openbsd, call getfsstat directly via libc, instead of calling it
via syscall.Syscall.

Updates #63900

Change-Id: Ib4c581160b170e6cc6017c42e959e647d97ac993
Reviewed-on: https://go-review.googlesource.com/c/go/+/538736
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Josh Rickmar <jrick@zettaport.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
2023-11-02 10:34:00 +00:00
Joel Sing
4e896d179d runtime: remove map stack version handling for openbsd
OpenBSD 6.3 is more than five years old and has not been supported for
the last four years (only 7.3 and 7.4 are currently supported). As such,
remove special handling of MAP_STACK for 6.3 and earlier.

Change-Id: I1086c910bbcade7fb3938bb1226813212794b587
Reviewed-on: https://go-review.googlesource.com/c/go/+/538458
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Aaron Bieber <aaron@bolddaemon.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
2023-11-02 08:05:10 +00:00
Robert Griesemer
6a7ef36466 test: run range-over-integer tests without need for -goexperiment
Move the range-over-function tests into range4.go.

Change-Id: Idccf30a0c7d7e8d2a17fb1c5561cf21e00506135
Reviewed-on: https://go-review.googlesource.com/c/go/+/539095
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
Run-TryBot: Robert Griesemer <gri@google.com>
2023-11-02 04:17:21 +00:00
Robert Griesemer
11677d983e go/types, types2: enable range over int w/o need for goexperiment
For #61405.

Change-Id: I047ec31bc36b1707799ffef25506070613477d1f
Reviewed-on: https://go-review.googlesource.com/c/go/+/538718
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2023-11-02 04:17:18 +00:00
Robert Griesemer
e5ef484691 spec: document range over integer expression
This CL is partly based on CL 510535.

For #61405.

Change-Id: Ic94f6726f9eb34313f11bec7b651921d7e5c18d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/538859
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Bypass: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2023-11-02 03:57:56 +00:00
Joe Tsai
08b2f1f761 os: fix PathError.Op for dirFS.Open
This appears to be a copy-paste error from CL 455362.

The operation name used to be "open"
but seems to have been accidentally changed to "stat".
This CL reverts back to "open".

Change-Id: I3fc5168095e2d9eee3efa3cc091b10bcf4e3ecde
Reviewed-on: https://go-review.googlesource.com/c/go/+/539056
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Joseph Tsai <joetsai@digital-static.net>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-01 23:58:46 +00:00
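One way to observe the field this commit touches is to open an invalid name through os.DirFS and inspect the resulting error. The sketch below is an assumption about how one might check it, not the test from the CL; `openOp` and `invalidNameOp` are hypothetical helpers. With the fix applied the reported operation should be "open", but the value depends on the Go version in use, so the program only prints it.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// openOp opens name in fsys and returns the Op field of the resulting
// *fs.PathError, or "" if the call succeeded or failed with another error type.
func openOp(fsys fs.FS, name string) string {
	f, err := fsys.Open(name)
	if err == nil {
		f.Close()
		return ""
	}
	var pe *fs.PathError
	if errors.As(err, &pe) {
		return pe.Op
	}
	return ""
}

// invalidNameOp asks a DirFS rooted at the temp directory for a name that is
// not a valid fs.FS path, so Open rejects it without touching the disk.
func invalidNameOp() string {
	return openOp(os.DirFS(os.TempDir()), "../escape")
}

func main() {
	fmt.Println(invalidNameOp())
}
```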
qiulaidongfeng
23711f8ef7 internal/bytealg: optimize indexbyte in amd64
goos: windows
goarch: amd64
pkg: bytes
cpu: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
                         │   old.txt   │               new.txt               │
                         │   sec/op    │   sec/op     vs base                │
IndexByte/10-16            2.613n ± 1%   2.558n ± 1%   -2.09% (p=0.014 n=10)
IndexByte/32-16            3.034n ± 1%   3.010n ± 2%        ~ (p=0.305 n=10)
IndexByte/4K-16            57.20n ± 2%   39.58n ± 2%  -30.81% (p=0.000 n=10)
IndexByte/4M-16            34.48µ ± 1%   33.83µ ± 2%   -1.87% (p=0.023 n=10)
IndexByte/64M-16           1.493m ± 2%   1.450m ± 2%   -2.89% (p=0.000 n=10)
IndexBytePortable/10-16    3.172n ± 4%   3.163n ± 2%        ~ (p=0.684 n=10)
IndexBytePortable/32-16    8.465n ± 2%   8.375n ± 3%        ~ (p=0.631 n=10)
IndexBytePortable/4K-16    852.0n ± 1%   846.6n ± 3%        ~ (p=0.971 n=10)
IndexBytePortable/4M-16    868.2µ ± 2%   856.6µ ± 2%        ~ (p=0.393 n=10)
IndexBytePortable/64M-16   13.81m ± 2%   13.88m ± 3%        ~ (p=0.684 n=10)
geomean                    1.204µ        1.148µ        -4.63%

                         │   old.txt    │               new.txt                │
                         │     B/s      │     B/s       vs base                │
IndexByte/10-16            3.565Gi ± 1%   3.641Gi ± 1%   +2.15% (p=0.015 n=10)
IndexByte/32-16            9.821Gi ± 1%   9.899Gi ± 2%        ~ (p=0.315 n=10)
IndexByte/4K-16            66.70Gi ± 2%   96.39Gi ± 2%  +44.52% (p=0.000 n=10)
IndexByte/4M-16            113.3Gi ± 1%   115.5Gi ± 2%   +1.91% (p=0.023 n=10)
IndexByte/64M-16           41.85Gi ± 2%   43.10Gi ± 2%   +2.98% (p=0.000 n=10)
IndexBytePortable/10-16    2.936Gi ± 4%   2.945Gi ± 2%        ~ (p=0.684 n=10)
IndexBytePortable/32-16    3.521Gi ± 2%   3.559Gi ± 3%        ~ (p=0.631 n=10)
IndexBytePortable/4K-16    4.477Gi ± 1%   4.506Gi ± 3%        ~ (p=0.971 n=10)
IndexBytePortable/4M-16    4.499Gi ± 2%   4.560Gi ± 2%        ~ (p=0.393 n=10)
IndexBytePortable/64M-16   4.525Gi ± 2%   4.504Gi ± 3%        ~ (p=0.684 n=10)
geomean                    10.04Gi        10.53Gi        +4.86%

For #63678

Change-Id: I0571c2b540a816d57bd6ed8bb1df4191c7992d92
GitHub-Last-Rev: 7e95b8bfb0
GitHub-Pull-Request: golang/go#63847
Reviewed-on: https://go-review.googlesource.com/c/go/+/538715
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-11-01 19:06:01 +00:00
dchaofei
ea3010d994 crypto/x509: optimize the performance of checkSignature
The loop should terminate as soon as `algo` has been found.

Fixes #52955

Change-Id: Ib3865c4616a0c1af9b72daea45f5a1750f84562f
GitHub-Last-Rev: 721322725f
GitHub-Pull-Request: golang/go#52987
Reviewed-on: https://go-review.googlesource.com/c/go/+/407215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Auto-Submit: Roland Shoemaker <roland@golang.org>
2023-11-01 19:04:52 +00:00
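The optimization described above is the usual early exit from a table scan. The sketch below shows the pattern on a made-up table; `algoDetail`, `details`, and `lookup` are illustrative names and do not reproduce the crypto/x509 code.

```go
package main

import "fmt"

// algoDetail is a hypothetical table entry used only for this illustration.
type algoDetail struct {
	name string
	id   int
}

var details = []algoDetail{{"RSA", 1}, {"ECDSA", 2}, {"Ed25519", 3}}

// lookup returns the entry with the given id and the number of entries inspected.
func lookup(id int) (found *algoDetail, inspected int) {
	for i := range details {
		inspected++
		if details[i].id == id {
			found = &details[i]
			break // stop scanning once the match is found
		}
	}
	return found, inspected
}

func main() {
	d, n := lookup(1)
	fmt.Println(d.name, n) // the first entry matches, so only one entry is inspected
}
```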
Jes Cok
a05a25cb19 bytes,internal/bytealg: add func bytealg.LastIndexRabinKarp
Also rename 'substr' to 'sep' in IndexRabinKarp for consistency.

Change-Id: Icc2ad1116aecaf002c8264daa2fa608306c9a88a
GitHub-Last-Rev: 1784b93f53
GitHub-Pull-Request: golang/go#63854
Reviewed-on: https://go-review.googlesource.com/c/go/+/538716
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-11-01 19:02:57 +00:00
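As background, a Rabin-Karp search from the end of a string can be sketched as follows. This is a simplified string-only version written for illustration, with explicit guards for short inputs; the names `primeRK`, `hashRev`, and `lastIndexRabinKarp` are assumptions and the code is not copied from internal/bytealg.

```go
package main

import "fmt"

// primeRK is the multiplier of the rolling hash (the 32-bit FNV prime).
const primeRK = 16777619

// hashRev hashes sep from its last byte to its first and also returns
// primeRK raised to len(sep), the factor needed to drop a byte from the window.
func hashRev(sep string) (hash, pow uint32) {
	for i := len(sep) - 1; i >= 0; i-- {
		hash = hash*primeRK + uint32(sep[i])
	}
	pow = 1
	for i := 0; i < len(sep); i++ {
		pow *= primeRK
	}
	return hash, pow
}

// lastIndexRabinKarp returns the index of the last occurrence of sep in s, or -1.
func lastIndexRabinKarp(s, sep string) int {
	n := len(sep)
	if n == 0 {
		return len(s)
	}
	if n > len(s) {
		return -1
	}
	want, pow := hashRev(sep)
	last := len(s) - n
	var h uint32
	for i := len(s) - 1; i >= last; i-- {
		h = h*primeRK + uint32(s[i])
	}
	if h == want && s[last:] == sep {
		return last
	}
	// Slide the window one byte to the left per step: add s[i], drop s[i+n].
	for i := last - 1; i >= 0; i-- {
		h = h*primeRK + uint32(s[i]) - pow*uint32(s[i+n])
		if h == want && s[i:i+n] == sep {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(lastIndexRabinKarp("go gopher go", "go"))
}
```

The hash comparison only filters candidates; the substring comparison that follows guards against hash collisions.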
Bryan C. Mills
0330aad038 os: report IO_REPARSE_TAG_DEDUP files as regular in Stat and Lstat
Prior to CL 460595, Lstat reported most reparse points as regular
files. However, reparse points can in general implement unusual
behaviors (consider IO_REPARSE_TAG_AF_UNIX or IO_REPARSE_TAG_LX_CHR),
and Windows allows arbitrary user-defined reparse points, so in
general we must not assume that an unrecognized reparse tag represents
a regular file; in CL 460595, we began marking them as irregular.

As it turns out, the Data Deduplication service on Windows Server runs
an Optimization job that turns regular files into reparse files with
the tag IO_REPARSE_TAG_DEDUP. Those files still behave more-or-less
like regular files, in that they have well-defined sizes and support
random-access reads and writes, so most programs can treat them as
regular files without difficulty. However, they are still reparse
files: as a result, on servers with the Data Deduplication service
enabled, files could arbitrarily change from “regular” to “irregular”
without explicit user intervention.

Since dedup files are converted in the background and otherwise behave
like regular files, this change adds a special case to report DEDUP
reparse points as regular.

Fixes #63429.

No test because to my knowledge we don't have any Windows builders
that have the deduplication service enabled, nor do we have a way to
reliably guarantee the existence of an IO_REPARSE_TAG_DEDUP file.

(In theory we could add a builder with the service enabled on a
specific volume, write a test that encodes knowledge of that volume,
and use the GO_BUILDER_NAME environment variable to run that test only
on the specially-configured builders. However, I don't currently have
the bandwidth to reconfigure the builders in this way, and given the
simplicity of the change I think it is unlikely to regress
accidentally.)

Change-Id: I649e7ef0b67e3939a980339ce7ec6a20b31b23a1
Cq-Include-Trybots: luci.golang.try:gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/537915
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Quim Muntal <quimmuntal@gmail.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
2023-11-01 19:01:53 +00:00
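To make the regular/irregular distinction concrete, the sketch below shows how a caller might classify a file from its mode bits. The helper `kind` and the value `sampleIrregular` are hypothetical; on Windows, whether a given reparse point ends up with fs.ModeIrregular set is decided inside the os package and is not modeled here.

```go
package main

import (
	"fmt"
	"io/fs"
)

// sampleIrregular is a made-up mode value with the irregular bit set.
var sampleIrregular = fs.ModeIrregular | 0o644

// kind describes how a caller might classify a file based on its mode bits.
func kind(m fs.FileMode) string {
	switch {
	case m.IsRegular():
		return "regular"
	case m&fs.ModeIrregular != 0:
		return "irregular"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(kind(0o644))
	fmt.Println(kind(sampleIrregular))
}
```

Programs that branch on `IsRegular` are the ones affected when a file silently moves between the two categories.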
Robert Griesemer
b7a695bd68 cmd/compile/internal/syntax: better error messages for incorrect type parameter list
When parsing a declaration of the form

        type a [b[c]]d

where a, b, c, d stand for identifiers, b[c] is parsed as a type
constraint (because an array length must be constant and an index
expression b[c] is never constant, even if b is a constant string
and c a constant index - this is crucial for disambiguation of the
various possibilities).

As a result, the error message referred to a missing type parameter
name and not an invalid array declaration.

Recognize this special case and report both possibilities (because
we can't be sure without type information) with the new error:

       "missing type parameter name or invalid array length"

Also, change the previous error message

        "type parameter must be named"

to

        "missing type parameter name"

which is more fitting as the error refers to an absent type parameter
(rather than a type parameter that's somehow invisibly present but
unnamed).

Fixes #60812.

Change-Id: Iaad3b3a9aeff9dfe2184779f3d799f16c7500b34
Reviewed-on: https://go-review.googlesource.com/c/go/+/538856
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
2023-11-01 17:49:03 +00:00
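For context, the two well-formed declarations that the parser has to tell apart look roughly like this. The identifiers `n`, `vec`, and `box` are made up for illustration; the malformed form from the commit message is deliberately not included because it does not compile.

```go
package main

import "fmt"

const n = 3

// vec is an array type: the bracketed expression n is a constant length.
type vec [n]int

// box is a generic type: the brackets declare type parameter T with constraint any.
type box[T any] struct{ v T }

func main() {
	var v vec
	b := box[string]{v: "hi"}
	fmt.Println(len(v), b.v)
}
```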
Robert Griesemer
34a5830c26 cmd/compile/internal/syntax: fix/update various comments
Change-Id: I30b448c8fcdbad94afcd7ff0dfc5cfebb485bdd7
Reviewed-on: https://go-review.googlesource.com/c/go/+/538855
Auto-Submit: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
2023-11-01 12:49:49 +00:00
Joel Sing
0aa2197279 os/signal: use syscall.Wait4 directly in tests
Rather than using syscall.Syscall6 with SYS_WAIT4, use syscall.Wait4
directly.

Updates #59667

Change-Id: I50fea3b7d10003dbc632aafd5e170a9fe96d6f42
Reviewed-on: https://go-review.googlesource.com/c/go/+/538459
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-11-01 07:10:32 +00:00
Joel Sing
1a58fd0fda syscall: regen zsyscall for openbsd/riscv64
This removes the unused writelen function, which was cleaned up for other
platforms in CL#529035.

Change-Id: I1999dc81276763bdc73d8590c16729447c4e8538
Reviewed-on: https://go-review.googlesource.com/c/go/+/538119
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
2023-11-01 07:10:03 +00:00
Joel Sing
6ecadb4d87 syscall: regenerate zsyscall for dragonfly/freebsd/netbsd
The sysctl declaration was moved in CL 141639, however the files were
presumably not regenerated. There is no functional change, however
regenerating avoids unrelated noise in future diffs.

Change-Id: Ifb840b5853f3f1c3c88a3f94df21b6f6d3c635d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/538118
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-11-01 07:09:43 +00:00
apocelipes
e73e25b624 internal/cpu: add comments to copied functions
This matches how other copied functions are commented,
such as stringsTrimSuffix in "os/executable_procfs.go".

Change-Id: I9c9fbd75b009a5ae0e869cf1fddc77c0e08d9a67
GitHub-Last-Rev: 4c18865e15
GitHub-Pull-Request: golang/go#63704
Reviewed-on: https://go-review.googlesource.com/c/go/+/537056
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-10-31 21:32:19 +00:00
Cherry Mui
d2f3a68bf0 runtime: use testenv.Command in TestG0StackOverflow
For debugging timeouts.

Change-Id: I08dc86ec0264196e5fd54066655e94a9d062ed80
Reviewed-on: https://go-review.googlesource.com/c/go/+/538697
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-31 20:50:47 +00:00
Keith Randall
b11defeaed runtime: make select fairness test less picky
Allow up to 10 standard deviations from the mean, instead of
~5 that the current test allows.

10 standard deviations allows up to a 4500/5500 split.

Fixes #52465

Change-Id: Icb21c1d31fafbcf4723b75435ba5e98863e812c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/538815
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-31 20:47:35 +00:00
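The 4500/5500 figure follows from the binomial standard deviation. The sketch below assumes 10000 trials with two equally likely outcomes, which is inferred from the split quoted above rather than stated in the message; `bounds` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// bounds returns the lowest and highest count accepted for one of two equally
// likely outcomes over the given number of trials, allowing k standard deviations.
func bounds(trials int, k float64) (lo, hi float64) {
	mean := float64(trials) / 2
	sigma := math.Sqrt(float64(trials) * 0.5 * 0.5) // binomial std dev with p = 0.5
	return mean - k*sigma, mean + k*sigma
}

func main() {
	lo, hi := bounds(10000, 10) // sigma is 50, so the band is 5000 plus or minus 500
	fmt.Println(lo, hi)         // 4500 5500
}
```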
Keith Randall
962ccbef91 cmd/compile: ensure pointer arithmetic happens after the nil check
Have nil checks return a pointer that is known non-nil. Users of
that pointer can use the result, ensuring that they are ordered
after the nil check itself.

The order dependence goes away after scheduling, when we've fixed
an order. At that point we move uses back to the original pointer
so it doesn't change regalloc any.

This prevents pointer arithmetic on nil from being spilled to the
stack and then observed by a stack scan.

Fixes #63657

Change-Id: I1a5fa4f2e6d9000d672792b4f90dfc1b7b67f6ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/537775
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-31 20:45:54 +00:00
Keith Randall
43b57b8516 cmd/compile: handle constant pointer offsets in dead store elimination
Update #63657
Update #45573

Change-Id: I163c6038c13d974dc0ca9f02144472bc05331826
Reviewed-on: https://go-review.googlesource.com/c/go/+/538595
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-31 20:42:56 +00:00
Keith Randall
66b8107a26 runtime: on arm32, detect whether we have sync instructions
Make the choice of using these instructions dynamic (triggered by cpu
feature detection) rather than static (triggered by GOARM setting).

If GOARM>=7, we know we have them.
For GOARM=5/6, dynamically dispatch based on auxv information.

Update #17082
Update #61588

Change-Id: I8a50481d942f62cf36348998a99225d0d242f8af
Reviewed-on: https://go-review.googlesource.com/c/go/+/525637
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-10-31 20:38:55 +00:00
Mateusz Poliwczak
dd84bb6824 crypto/x509: add new OID type and use it in Certificate
Fixes #60665

Change-Id: I814b7d4b26b964f74443584fb2048b3e27e3b675
GitHub-Last-Rev: 693c741c76
GitHub-Pull-Request: golang/go#62096
Reviewed-on: https://go-review.googlesource.com/c/go/+/520535
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Mateusz Poliwczak <mpoliwczak34@gmail.com>
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-10-31 19:22:19 +00:00
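A short usage sketch of the new type, assuming the API as it shipped in Go 1.22 (`x509.OIDFromInts` and the `String` method on `x509.OID`); the API at the time of this commit may have differed. The helper name `subjectAltNameOID` is made up, and 2.5.29.17 is the subjectAltName extension identifier.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// subjectAltNameOID builds the OID 2.5.29.17 and returns its dotted string form.
func subjectAltNameOID() (string, error) {
	oid, err := x509.OIDFromInts([]uint64{2, 5, 29, 17})
	if err != nil {
		return "", err
	}
	return oid.String(), nil
}

func main() {
	s, err := subjectAltNameOID()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```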
Jes Cok
68e52bc03c bytes,internal/bytealg: eliminate IndexRabinKarpBytes using generics
This is a follow-up to CL 538175.

Change-Id: Iec2523b36a16d7e157c17858c89fcd43c2470d58
GitHub-Last-Rev: 812d36e57c
GitHub-Pull-Request: golang/go#63770
Reviewed-on: https://go-review.googlesource.com/c/go/+/538195
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-31 17:14:04 +00:00
Jes Cok
cbc403af1d cmd/compile/internal/ssa: adjust default to the end in *Block.AuxIntString
Change-Id: Id48cade7811e2dfbf78d3171fe202ad272534e37
GitHub-Last-Rev: ea6abb2dc2
GitHub-Pull-Request: golang/go#63808
Reviewed-on: https://go-review.googlesource.com/c/go/+/538377
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-31 17:13:33 +00:00
Cuong Manh Le
3dea7c3f69 hash/maphash: weaken avalanche test a bit more
CL 495415 weakened the avalanche test, widening the allowed range to 43%
to 57%. Since then, we have only seen a failure with 58% on the
linux-386-longtest builder, so give the test a bit more wiggle room: 40% to 59%.

Fixes #60170

Change-Id: I9528ebc8601975b733c3d9fd464ce41429654273
Reviewed-on: https://go-review.googlesource.com/c/go/+/538655
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2023-10-31 17:00:31 +00:00
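For readers who have not seen an avalanche test, the idea is that flipping one input bit should flip each output bit about half the time. The sketch below estimates that rate for a single input/output bit pair; it is a simplified illustration with hypothetical helpers `flipRate` and `sampleFlipRate`, not the test in hash/maphash.

```go
package main

import (
	"fmt"
	"hash/maphash"
	"math/rand"
)

// flipRate estimates how often output bit outBit changes when input bit inBit
// of a random 8-byte key is flipped.
func flipRate(seed maphash.Seed, inBit, outBit uint, trials int) float64 {
	rng := rand.New(rand.NewSource(1))
	key := make([]byte, 8)
	flips := 0
	for t := 0; t < trials; t++ {
		rng.Read(key)
		before := maphash.Bytes(seed, key)
		key[inBit/8] ^= 1 << (inBit % 8) // flip exactly one input bit
		after := maphash.Bytes(seed, key)
		if (before^after)>>outBit&1 == 1 {
			flips++
		}
	}
	return float64(flips) / float64(trials)
}

// sampleFlipRate runs flipRate for one arbitrary bit pair with a fresh seed.
func sampleFlipRate() float64 {
	return flipRate(maphash.MakeSeed(), 3, 17, 2000)
}

func main() {
	fmt.Printf("%.3f\n", sampleFlipRate())
}
```

Because the seed is random, the printed rate varies from run to run, which is why such a test needs a tolerance band around 50%.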
cui fliter
289b823ac9 internal/bytealg: optimize Count/CountString in arm64
For #63678

goos: darwin
goarch: arm64
pkg: strings
                          │ count_old.txt │            count_new.txt            │
                          │    sec/op     │   sec/op     vs base                │
CountHard1-8                 368.7µ ± 11%   332.0µ ± 1%   -9.95% (p=0.002 n=10)
CountHard2-8                 348.8µ ±  5%   333.1µ ± 1%   -4.51% (p=0.000 n=10)
CountHard3-8                 402.7µ ± 25%   359.5µ ± 1%  -10.75% (p=0.000 n=10)
CountTorture-8              10.536µ ± 23%   9.913µ ± 0%   -5.91% (p=0.000 n=10)
CountTortureOverlapping-8    74.86µ ±  9%   67.56µ ± 1%   -9.75% (p=0.000 n=10)
CountByte/10-8               6.905n ±  3%   6.690n ± 1%   -3.11% (p=0.001 n=10)
CountByte/32-8               3.247n ± 13%   3.207n ± 2%   -1.23% (p=0.030 n=10)
CountByte/4096-8             83.72n ±  1%   82.58n ± 1%   -1.36% (p=0.007 n=10)
CountByte/4194304-8          85.17µ ±  5%   84.02µ ± 8%        ~ (p=0.075 n=10)
CountByte/67108864-8         1.497m ±  8%   1.397m ± 2%   -6.69% (p=0.000 n=10)
geomean                      9.977µ         9.426µ        -5.53%

                     │ count_old.txt │            count_new.txt            │
                     │      B/s      │     B/s       vs base               │
CountByte/10-8         1.349Gi ±  3%   1.392Gi ± 1%  +3.20% (p=0.002 n=10)
CountByte/32-8         9.180Gi ± 11%   9.294Gi ± 2%  +1.24% (p=0.029 n=10)
CountByte/4096-8       45.57Gi ±  1%   46.20Gi ± 1%  +1.38% (p=0.007 n=10)
CountByte/4194304-8    45.86Gi ±  5%   46.49Gi ± 7%       ~ (p=0.075 n=10)
CountByte/67108864-8   41.75Gi ±  8%   44.74Gi ± 2%  +7.16% (p=0.000 n=10)
geomean                16.10Gi         16.55Gi       +2.85%

Change-Id: Ifc2173ba3a926b0fa9598372d4404b8645929d45
Reviewed-on: https://go-review.googlesource.com/c/go/+/538116
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: shuang cui <imcusg@gmail.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-31 17:00:27 +00:00
Joel Sing
e293c4b509 runtime: allocate crash stack via stackalloc
On some platforms (notably OpenBSD), stacks must be specifically allocated
and marked as being stack memory. Allocate the crash stack using stackalloc,
which ensures these requirements are met, rather than using a global Go
variable.

Fixes #63794

Change-Id: I6513575797dd69ff0a36f3bfd4e5fc3bd95cbf50
Reviewed-on: https://go-review.googlesource.com/c/go/+/538457
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-31 16:28:14 +00:00
Robert Griesemer
b7a66be69c cmd/compile/internal/syntax: set up dummy name and type if func name is missing
We do the same elsewhere (e.g. in parser.name when a name is missing).
This ensures functions have a (dummy) name and a non-nil type.
Avoids a crash in the type-checker (verified manually).
A test was added here (rather than the type checker) because type-
checker tests are shared between types2 and go/types and error
recovery in this case is different.

Fixes #63835.

Change-Id: I1460fc88d23d80b8d8c181c774d6b0a56ca06317
Reviewed-on: https://go-review.googlesource.com/c/go/+/538059
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Bypass: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Run-TryBot: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
2023-10-31 16:12:41 +00:00
Robert Griesemer
25a59decd5 go/types, types2: more concise error if conversion fails due to integer overflow
This change brings the error message for this case back in line
with the pre-Go1.18 error message.

Fixes #63563.

Change-Id: I3c6587d420907b34ee8a5f295ecb231e9f008380
Reviewed-on: https://go-review.googlesource.com/c/go/+/538058
Auto-Submit: Robert Griesemer <gri@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Run-TryBot: Robert Griesemer <gri@google.com>
TryBot-Bypass: Robert Griesemer <gri@google.com>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
2023-10-31 16:11:16 +00:00
Joel Sing
b6a3c0273e cmd/dist,internal/platform: enable openbsd/ppc64 port
Updates #56001

Change-Id: I16440114ecf661e9fc17d304ab3b16bc97ef82f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/517935
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2023-10-31 12:43:19 +00:00
Jes Cok
f215a0be4d cmd/compile/internal/ssa: add missing space in comment
Change-Id: I54c3e8e0d61ceb6533284098dc32944f9f14459e
GitHub-Last-Rev: 9793d9d039
GitHub-Pull-Request: golang/go#63806
Reviewed-on: https://go-review.googlesource.com/c/go/+/538375
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: qiulaidongfeng <2645477756@qq.com>
2023-10-30 21:52:15 +00:00
qiulaidongfeng
9c2ab20d48 internal/fmtsort: makeChans pin pointer
Complete TODO.

For #49431

Change-Id: I1399205e430ebd83182c3e0c4becf1fde32d433e
GitHub-Last-Rev: 02cdea740b
GitHub-Pull-Request: golang/go#62673
Reviewed-on: https://go-review.googlesource.com/c/go/+/528796
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Commit-Queue: Keith Randall <khr@golang.org>
Run-TryBot: qiulaidongfeng <2645477756@qq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-30 21:00:16 +00:00
Quan Tong
214ce28503 cmd/go/internal/help: update the documentation to match the design and implementation
The existing documentation implies that build constraints are
ignored after a block comment, but in fact they are not.

Fixes #63502

Change-Id: I0597934b7a7eeab8908bf06e1312169b3702bf05
Reviewed-on: https://go-review.googlesource.com/c/go/+/535635
Reviewed-by: Michael Matloob <matloob@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Mark Pictor <mark.pictor@contrastsecurity.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-10-30 18:16:15 +00:00
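A file laid out like the sketch below illustrates the case in question: the build constraint follows a block comment and, as this reading of the commit suggests, is still honored. The tag name `hypothetical_tag` and the function `greeting` are made up for illustration.

```go
/*
License or package commentary in a block comment.
*/

//go:build !hypothetical_tag

package main

import "fmt"

// greeting returns a fixed string so that the file has something to run.
func greeting() string {
	return "built"
}

func main() {
	fmt.Println(greeting())
}
```

Building with `-tags hypothetical_tag` should exclude this file when it is compiled as part of a package.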
Allen Li
1e95fc7ffe log/slog: Reorder doc comment for level constants
pkgsite and go doc print the doc comment *after* the code, resulting in:

    const (
            LevelDebug Level = -4
            ...
    )

    Many paragraphs...

    Names for common levels.

The "Names for common levels." feels out of place and confusing at the bottom.

This is also consistent with the recommendation for the first sentence in doc comments to be the "summary".

Change-Id: I656e85e27d2a4b23eaba5f2c1f4f811a88848c83
GitHub-Last-Rev: d9f7ee9b94
GitHub-Pull-Request: golang/go#61943
Reviewed-on: https://go-review.googlesource.com/c/go/+/518537
Reviewed-by: Alan Donovan <alan@alandonovan.net>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: qiulaidongfeng <2645477756@qq.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
2023-10-30 17:34:43 +00:00
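The reordering can be pictured with a small declaration in which the summary sentence comes first in the doc comment. This is an illustrative stand-in, not the log/slog source; the longer commentary is elided, and the three values other than LevelDebug are the ones log/slog uses for its common levels.

```go
package main

import "fmt"

// Level is a stand-in for the log/slog level type.
type Level int

// Names for common levels.
//
// Longer explanatory paragraphs would follow the summary sentence here.
const (
	LevelDebug Level = -4
	LevelInfo  Level = 0
	LevelWarn  Level = 4
	LevelError Level = 8
)

func main() {
	fmt.Println(LevelDebug, LevelInfo, LevelWarn, LevelError)
}
```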
Russ Cox
8abde68f19 math/rand/v2: delete Mitchell/Reeds source
These slowdowns are because we are now using PCG instead of the
Mitchell/Reeds LFSR for the benchmarks. PCG is in fact a bit slower
(but generates statistically far better random numbers).
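
For reference, a seeded PCG source can be used as in the sketch below. It assumes the math/rand/v2 API as released in Go 1.22 (`rand.NewPCG`, `rand.New`, `IntN`); the helper `draw` is hypothetical.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// draw returns n values in [0, 1000) from a PCG generator seeded with the two seeds.
func draw(seed1, seed2 uint64, n int) []int {
	r := rand.New(rand.NewPCG(seed1, seed2))
	out := make([]int, n)
	for i := range out {
		out[i] = r.IntN(1000)
	}
	return out
}

func main() {
	// The same seeds produce the same sequence.
	fmt.Println(draw(1, 2, 5))
	fmt.Println(draw(1, 2, 5))
}
```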

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 01ff938549.amd64 │           afa459a2f0.amd64           │
                        │      sec/op      │    sec/op     vs base                │
PCG_DXSM-32                    1.490n ± 0%    1.488n ± 2%        ~ (p=0.408 n=20)
SourceUint64-32                1.352n ± 1%    1.450n ± 3%   +7.21% (p=0.000 n=20)
GlobalInt64-32                 2.083n ± 0%    2.067n ± 2%        ~ (p=0.223 n=20)
GlobalInt64Parallel-32        0.1035n ± 1%   0.1044n ± 2%        ~ (p=0.010 n=20)
GlobalUint64-32                2.038n ± 1%    2.085n ± 0%   +2.28% (p=0.000 n=20)
GlobalUint64Parallel-32       0.1006n ± 1%   0.1008n ± 1%        ~ (p=0.733 n=20)
Int64-32                       1.687n ± 2%    1.779n ± 1%   +5.48% (p=0.000 n=20)
Uint64-32                      1.674n ± 2%    1.854n ± 2%  +10.69% (p=0.000 n=20)
GlobalIntN1000-32              3.135n ± 1%    3.140n ± 3%        ~ (p=0.794 n=20)
IntN1000-32                    2.478n ± 1%    2.496n ± 1%   +0.73% (p=0.006 n=20)
Int64N1000-32                  2.455n ± 1%    2.510n ± 2%   +2.22% (p=0.000 n=20)
Int64N1e8-32                   2.467n ± 2%    2.471n ± 2%        ~ (p=0.050 n=20)
Int64N1e9-32                   2.454n ± 1%    2.488n ± 2%   +1.39% (p=0.000 n=20)
Int64N2e9-32                   2.482n ± 1%    2.478n ± 2%        ~ (p=0.066 n=20)
Int64N1e18-32                  3.349n ± 2%    3.088n ± 1%   -7.81% (p=0.000 n=20)
Int64N2e18-32                  3.537n ± 1%    3.493n ± 1%   -1.24% (p=0.002 n=20)
Int64N4e18-32                  4.917n ± 0%    5.060n ± 2%   +2.91% (p=0.000 n=20)
Int32N1000-32                  2.386n ± 1%    2.620n ± 1%   +9.76% (p=0.000 n=20)
Int32N1e8-32                   2.366n ± 1%    2.652n ± 0%  +12.11% (p=0.000 n=20)
Int32N1e9-32                   2.355n ± 2%    2.644n ± 1%  +12.32% (p=0.000 n=20)
Int32N2e9-32                   2.371n ± 1%    2.619n ± 2%  +10.48% (p=0.000 n=20)
Float32-32                     2.245n ± 2%    2.261n ± 1%        ~ (p=0.625 n=20)
Float64-32                     2.235n ± 1%    2.241n ± 2%        ~ (p=0.393 n=20)
ExpFloat64-32                  3.813n ± 3%    3.716n ± 1%   -2.53% (p=0.000 n=20)
NormFloat64-32                 3.652n ± 2%    3.718n ± 1%   +1.79% (p=0.006 n=20)
Perm3-32                       33.12n ± 3%    34.11n ± 2%        ~ (p=0.021 n=20)
Perm30-32                      205.1n ± 1%    200.6n ± 0%   -2.17% (p=0.000 n=20)
Perm30ViaShuffle-32            110.8n ± 1%    109.7n ± 1%   -0.99% (p=0.002 n=20)
ShuffleOverhead-32             113.0n ± 1%    107.2n ± 1%   -5.09% (p=0.000 n=20)
Concurrent-32                  2.100n ± 0%    2.108n ± 6%        ~ (p=0.103 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
                       │ 01ff938549.arm64 │           afa459a2f0.arm64           │
                       │      sec/op      │    sec/op     vs base                │
PCG_DXSM-8                    2.531n ± 0%    2.531n ± 0%        ~ (p=0.763 n=20)
SourceUint64-8                2.258n ± 1%    2.531n ± 0%  +12.09% (p=0.000 n=20)
GlobalInt64-8                 2.167n ± 0%    2.177n ± 1%        ~ (p=0.213 n=20)
GlobalInt64Parallel-8        0.4310n ± 0%   0.4319n ± 0%        ~ (p=0.027 n=20)
GlobalUint64-8                2.182n ± 1%    2.185n ± 1%        ~ (p=0.683 n=20)
GlobalUint64Parallel-8       0.4297n ± 0%   0.4295n ± 1%        ~ (p=0.941 n=20)
Int64-8                       2.472n ± 1%    4.104n ± 0%  +66.00% (p=0.000 n=20)
Uint64-8                      2.449n ± 1%    4.080n ± 0%  +66.60% (p=0.000 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 1%        ~ (p=0.972 n=20)
IntN1000-8                    2.998n ± 2%    4.140n ± 0%  +38.09% (p=0.000 n=20)
Int64N1000-8                  2.949n ± 2%    4.139n ± 0%  +40.35% (p=0.000 n=20)
Int64N1e8-8                   2.953n ± 2%    4.140n ± 0%  +40.22% (p=0.000 n=20)
Int64N1e9-8                   2.950n ± 0%    4.139n ± 0%  +40.32% (p=0.000 n=20)
Int64N2e9-8                   2.946n ± 2%    4.140n ± 0%  +40.53% (p=0.000 n=20)
Int64N1e18-8                  3.779n ± 1%    5.273n ± 0%  +39.52% (p=0.000 n=20)
Int64N2e18-8                  4.370n ± 1%    6.059n ± 0%  +38.65% (p=0.000 n=20)
Int64N4e18-8                  6.544n ± 1%    8.803n ± 0%  +34.52% (p=0.000 n=20)
Int32N1000-8                  2.950n ± 0%    4.131n ± 0%  +40.06% (p=0.000 n=20)
Int32N1e8-8                   2.950n ± 2%    4.131n ± 0%  +40.03% (p=0.000 n=20)
Int32N1e9-8                   2.951n ± 2%    4.131n ± 0%  +39.99% (p=0.000 n=20)
Int32N2e9-8                   2.950n ± 2%    4.131n ± 0%  +40.03% (p=0.000 n=20)
Float32-8                     3.441n ± 0%    4.110n ± 0%  +19.44% (p=0.000 n=20)
Float64-8                     3.442n ± 0%    4.104n ± 0%  +19.24% (p=0.000 n=20)
ExpFloat64-8                  4.481n ± 0%    5.338n ± 0%  +19.11% (p=0.000 n=20)
NormFloat64-8                 4.725n ± 0%    5.731n ± 0%  +21.28% (p=0.000 n=20)
Perm3-8                       26.55n ± 0%    26.62n ± 0%   +0.28% (p=0.000 n=20)
Perm30-8                      181.9n ± 0%    194.6n ± 2%   +6.98% (p=0.000 n=20)
Perm30ViaShuffle-8            142.9n ± 0%    156.4n ± 0%   +9.45% (p=0.000 n=20)
ShuffleOverhead-8             120.8n ± 2%    125.8n ± 0%   +4.10% (p=0.000 n=20)
Concurrent-8                  2.421n ± 6%    2.654n ± 6%   +9.67% (p=0.002 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 01ff938549.386 │            afa459a2f0.386             │
                        │     sec/op     │    sec/op     vs base                 │
PCG_DXSM-32                  7.613n ± 1%    7.793n ± 2%    +2.38% (p=0.000 n=20)
SourceUint64-32              2.069n ± 0%    7.680n ± 1%  +271.19% (p=0.000 n=20)
GlobalInt64-32               3.456n ± 1%    3.474n ± 3%         ~ (p=0.654 n=20)
GlobalInt64Parallel-32      0.3252n ± 0%   0.3253n ± 0%         ~ (p=0.952 n=20)
GlobalUint64-32              3.573n ± 1%    3.433n ± 2%    -3.92% (p=0.000 n=20)
GlobalUint64Parallel-32     0.3159n ± 0%   0.3156n ± 0%         ~ (p=0.223 n=20)
Int64-32                     2.562n ± 2%    7.707n ± 1%  +200.74% (p=0.000 n=20)
Uint64-32                    2.592n ± 0%    7.714n ± 1%  +197.65% (p=0.000 n=20)
GlobalIntN1000-32            6.266n ± 2%    6.236n ± 1%         ~ (p=0.039 n=20)
IntN1000-32                  4.724n ± 2%   10.410n ± 1%  +120.39% (p=0.000 n=20)
Int64N1000-32                5.490n ± 2%   10.975n ± 2%   +99.89% (p=0.000 n=20)
Int64N1e8-32                 5.513n ± 2%   10.980n ± 1%   +99.15% (p=0.000 n=20)
Int64N1e9-32                 5.476n ± 1%   10.950n ± 0%   +99.96% (p=0.000 n=20)
Int64N2e9-32                 5.501n ± 2%   11.110n ± 1%  +101.96% (p=0.000 n=20)
Int64N1e18-32                9.043n ± 2%   15.180n ± 2%   +67.86% (p=0.000 n=20)
Int64N2e18-32                9.601n ± 2%   15.610n ± 1%   +62.60% (p=0.000 n=20)
Int64N4e18-32                12.00n ± 1%    19.23n ± 2%   +60.14% (p=0.000 n=20)
Int32N1000-32                4.829n ± 2%   10.345n ± 1%  +114.25% (p=0.000 n=20)
Int32N1e8-32                 4.825n ± 2%   10.330n ± 1%  +114.09% (p=0.000 n=20)
Int32N1e9-32                 4.830n ± 2%   10.350n ± 1%  +114.26% (p=0.000 n=20)
Int32N2e9-32                 4.750n ± 2%   10.345n ± 1%  +117.81% (p=0.000 n=20)
Float32-32                   10.89n ± 4%    13.57n ± 1%   +24.61% (p=0.000 n=20)
Float64-32                   19.60n ± 4%    22.95n ± 4%   +17.12% (p=0.000 n=20)
ExpFloat64-32                12.96n ± 3%    15.23n ± 2%   +17.47% (p=0.000 n=20)
NormFloat64-32               7.516n ± 1%   13.780n ± 1%   +83.34% (p=0.000 n=20)
Perm3-32                     36.78n ± 2%    46.62n ± 2%   +26.72% (p=0.000 n=20)
Perm30-32                    238.9n ± 2%    400.7n ± 1%   +67.73% (p=0.000 n=20)
Perm30ViaShuffle-32          189.7n ± 2%    350.5n ± 1%   +84.79% (p=0.000 n=20)
ShuffleOverhead-32           159.8n ± 1%    326.0n ± 2%  +104.01% (p=0.000 n=20)
Concurrent-32                3.286n ± 1%    3.290n ± 0%         ~ (p=0.743 n=20)

On the other hand, compared to the original "update benchmarks" CL,
the cleanups we've made more than compensate for PCG being a bit
slower than the old lagged Fibonacci generator, at least on 64-bit
x86. ARM64 (Apple M1) is a bit slower: perhaps the 64x64→128 multiply
is slower there for some reason. 386 is noticeably slower, but it's
also a non-SSA backend.
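
For reference, the 64x64→128 multiply mentioned above produces the full
128-bit product of two 64-bit words as a high half and a low half. In Go
it is available as math/bits.Mul64; on a 32-bit target such as 386 it has
to be assembled from narrower multiplies. A minimal sketch (the helper
name mul128 is hypothetical, not part of this CL):

```go
package main

import (
	"fmt"
	"math/bits"
)

// mul128 returns the full 128-bit product of x and y, split into the
// high and low 64-bit halves. It is a thin wrapper over bits.Mul64.
func mul128(x, y uint64) (hi, lo uint64) {
	return bits.Mul64(x, y)
}

func main() {
	// 2^63 * 4 = 2^65, which does not fit in 64 bits:
	// the high word holds 2 and the low word holds 0.
	hi, lo := mul128(1<<63, 4)
	fmt.Println(hi, lo)
}
```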

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.amd64 │            afa459a2f0.amd64            │
                        │      sec/op      │    sec/op     vs base                  │
SourceUint64-32                1.555n ± 1%    1.450n ± 3%   -6.78% (p=0.000 n=20)
GlobalInt64-32                 2.071n ± 1%    2.067n ± 2%        ~ (p=0.673 n=20)
GlobalInt63Parallel-32        0.1023n ± 1%
GlobalInt64Parallel-32                       0.1044n ± 2%
GlobalUint64-32                5.193n ± 1%    2.085n ± 0%  -59.86% (p=0.000 n=20)
GlobalUint64Parallel-32       0.2341n ± 0%   0.1008n ± 1%  -56.93% (p=0.000 n=20)
Int64-32                       2.056n ± 2%    1.779n ± 1%  -13.47% (p=0.000 n=20)
Uint64-32                      2.077n ± 2%    1.854n ± 2%  -10.74% (p=0.000 n=20)
GlobalIntN1000-32              4.077n ± 2%    3.140n ± 3%  -22.98% (p=0.000 n=20)
IntN1000-32                    3.476n ± 2%    2.496n ± 1%  -28.19% (p=0.000 n=20)
Int64N1000-32                  3.059n ± 1%    2.510n ± 2%  -17.96% (p=0.000 n=20)
Int64N1e8-32                   2.942n ± 1%    2.471n ± 2%  -15.98% (p=0.000 n=20)
Int64N1e9-32                   2.932n ± 1%    2.488n ± 2%  -15.14% (p=0.000 n=20)
Int64N2e9-32                   2.925n ± 1%    2.478n ± 2%  -15.30% (p=0.000 n=20)
Int64N1e18-32                  3.116n ± 1%    3.088n ± 1%        ~ (p=0.013 n=20)
Int64N2e18-32                  4.067n ± 1%    3.493n ± 1%  -14.11% (p=0.000 n=20)
Int64N4e18-32                  4.054n ± 1%    5.060n ± 2%  +24.80% (p=0.000 n=20)
Int32N1000-32                  2.951n ± 1%    2.620n ± 1%  -11.22% (p=0.000 n=20)
Int32N1e8-32                   3.102n ± 1%    2.652n ± 0%  -14.50% (p=0.000 n=20)
Int32N1e9-32                   3.535n ± 1%    2.644n ± 1%  -25.20% (p=0.000 n=20)
Int32N2e9-32                   3.514n ± 1%    2.619n ± 2%  -25.47% (p=0.000 n=20)
Float32-32                     2.760n ± 1%    2.261n ± 1%  -18.06% (p=0.000 n=20)
Float64-32                     2.284n ± 1%    2.241n ± 2%        ~ (p=0.016 n=20)
ExpFloat64-32                  3.757n ± 1%    3.716n ± 1%        ~ (p=0.034 n=20)
NormFloat64-32                 3.837n ± 1%    3.718n ± 1%   -3.09% (p=0.000 n=20)
Perm3-32                       35.23n ± 2%    34.11n ± 2%   -3.19% (p=0.000 n=20)
Perm30-32                      208.8n ± 1%    200.6n ± 0%   -3.93% (p=0.000 n=20)
Perm30ViaShuffle-32            111.7n ± 1%    109.7n ± 1%   -1.84% (p=0.000 n=20)
ShuffleOverhead-32             101.1n ± 1%    107.2n ± 1%   +6.03% (p=0.000 n=20)
Concurrent-32                  2.108n ± 7%    2.108n ± 6%        ~ (p=0.644 n=20)
PCG_DXSM-32                                   1.488n ± 2%

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 220860f76f.arm64 │            afa459a2f0.arm64            │
                       │      sec/op      │    sec/op     vs base                  │
SourceUint64-8                2.316n ± 1%    2.531n ± 0%   +9.33% (p=0.000 n=20)
GlobalInt64-8                 2.183n ± 1%    2.177n ± 1%        ~ (p=0.533 n=20)
GlobalInt63Parallel-8        0.4331n ± 0%
GlobalInt64Parallel-8                       0.4319n ± 0%
GlobalUint64-8                4.377n ± 2%    2.185n ± 1%  -50.07% (p=0.000 n=20)
GlobalUint64Parallel-8       0.9237n ± 0%   0.4295n ± 1%  -53.50% (p=0.000 n=20)
Int64-8                       2.538n ± 1%    4.104n ± 0%  +61.68% (p=0.000 n=20)
Uint64-8                      2.604n ± 1%    4.080n ± 0%  +56.68% (p=0.000 n=20)
GlobalIntN1000-8              3.857n ± 2%    2.814n ± 1%  -27.04% (p=0.000 n=20)
IntN1000-8                    3.822n ± 2%    4.140n ± 0%   +8.32% (p=0.000 n=20)
Int64N1000-8                  3.318n ± 0%    4.139n ± 0%  +24.74% (p=0.000 n=20)
Int64N1e8-8                   3.349n ± 1%    4.140n ± 0%  +23.64% (p=0.000 n=20)
Int64N1e9-8                   3.317n ± 2%    4.139n ± 0%  +24.80% (p=0.000 n=20)
Int64N2e9-8                   3.317n ± 2%    4.140n ± 0%  +24.81% (p=0.000 n=20)
Int64N1e18-8                  3.542n ± 1%    5.273n ± 0%  +48.85% (p=0.000 n=20)
Int64N2e18-8                  5.087n ± 0%    6.059n ± 0%  +19.12% (p=0.000 n=20)
Int64N4e18-8                  5.084n ± 0%    8.803n ± 0%  +73.16% (p=0.000 n=20)
Int32N1000-8                  3.208n ± 2%    4.131n ± 0%  +28.79% (p=0.000 n=20)
Int32N1e8-8                   3.610n ± 1%    4.131n ± 0%  +14.43% (p=0.000 n=20)
Int32N1e9-8                   4.235n ± 0%    4.131n ± 0%   -2.44% (p=0.000 n=20)
Int32N2e9-8                   4.229n ± 1%    4.131n ± 0%   -2.33% (p=0.000 n=20)
Float32-8                     3.468n ± 0%    4.110n ± 0%  +18.50% (p=0.000 n=20)
Float64-8                     3.447n ± 0%    4.104n ± 0%  +19.05% (p=0.000 n=20)
ExpFloat64-8                  4.567n ± 0%    5.338n ± 0%  +16.86% (p=0.000 n=20)
NormFloat64-8                 4.821n ± 0%    5.731n ± 0%  +18.89% (p=0.000 n=20)
Perm3-8                       28.89n ± 0%    26.62n ± 0%   -7.84% (p=0.000 n=20)
Perm30-8                      175.7n ± 0%    194.6n ± 2%  +10.76% (p=0.000 n=20)
Perm30ViaShuffle-8            153.5n ± 0%    156.4n ± 0%   +1.86% (p=0.000 n=20)
ShuffleOverhead-8             119.8n ± 1%    125.8n ± 0%   +4.97% (p=0.000 n=20)
Concurrent-8                  2.433n ± 3%    2.654n ± 6%   +9.13% (p=0.001 n=20)
PCG_DXSM-8                                   2.531n ± 0%

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.386 │             afa459a2f0.386              │
                        │     sec/op     │    sec/op     vs base                   │
SourceUint64-32             2.370n ±  1%    7.680n ± 1%  +224.05% (p=0.000 n=20)
GlobalInt64-32              3.569n ±  1%    3.474n ± 3%    -2.66% (p=0.001 n=20)
GlobalInt63Parallel-32     0.3221n ±  1%
GlobalInt64Parallel-32                     0.3253n ± 0%
GlobalUint64-32             8.797n ± 10%    3.433n ± 2%   -60.98% (p=0.000 n=20)
GlobalUint64Parallel-32    0.6351n ±  0%   0.3156n ± 0%   -50.31% (p=0.000 n=20)
Int64-32                    2.612n ±  2%    7.707n ± 1%  +195.04% (p=0.000 n=20)
Uint64-32                   3.350n ±  1%    7.714n ± 1%  +130.25% (p=0.000 n=20)
GlobalIntN1000-32           5.892n ±  1%    6.236n ± 1%    +5.82% (p=0.000 n=20)
IntN1000-32                 4.546n ±  1%   10.410n ± 1%  +128.97% (p=0.000 n=20)
Int64N1000-32               14.59n ±  1%    10.97n ± 2%   -24.75% (p=0.000 n=20)
Int64N1e8-32                14.76n ±  2%    10.98n ± 1%   -25.58% (p=0.000 n=20)
Int64N1e9-32                16.57n ±  1%    10.95n ± 0%   -33.90% (p=0.000 n=20)
Int64N2e9-32                14.54n ±  1%    11.11n ± 1%   -23.62% (p=0.000 n=20)
Int64N1e18-32               16.14n ±  1%    15.18n ± 2%    -5.95% (p=0.000 n=20)
Int64N2e18-32               18.10n ±  1%    15.61n ± 1%   -13.73% (p=0.000 n=20)
Int64N4e18-32               18.65n ±  1%    19.23n ± 2%    +3.08% (p=0.000 n=20)
Int32N1000-32               3.560n ±  1%   10.345n ± 1%  +190.55% (p=0.000 n=20)
Int32N1e8-32                3.770n ±  2%   10.330n ± 1%  +174.01% (p=0.000 n=20)
Int32N1e9-32                4.098n ±  0%   10.350n ± 1%  +152.53% (p=0.000 n=20)
Int32N2e9-32                4.179n ±  1%   10.345n ± 1%  +147.52% (p=0.000 n=20)
Float32-32                  21.18n ±  4%    13.57n ± 1%   -35.93% (p=0.000 n=20)
Float64-32                  20.60n ±  2%    22.95n ± 4%   +11.41% (p=0.000 n=20)
ExpFloat64-32               13.07n ±  0%    15.23n ± 2%   +16.48% (p=0.000 n=20)
NormFloat64-32              7.738n ±  2%   13.780n ± 1%   +78.08% (p=0.000 n=20)
Perm3-32                    36.73n ±  1%    46.62n ± 2%   +26.91% (p=0.000 n=20)
Perm30-32                   211.9n ±  1%    400.7n ± 1%   +89.05% (p=0.000 n=20)
Perm30ViaShuffle-32         165.2n ±  1%    350.5n ± 1%  +112.20% (p=0.000 n=20)
ShuffleOverhead-32          133.9n ±  1%    326.0n ± 2%  +143.37% (p=0.000 n=20)
Concurrent-32               3.287n ±  2%    3.290n ± 0%         ~ (p=0.365 n=20)
PCG_DXSM-32                                 7.793n ± 2%

For #61716.

Change-Id: I4e9c0525b5f84a2ac46f23da9e365495e2d05777
Reviewed-on: https://go-review.googlesource.com/c/go/+/502506
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 17:09:26 +00:00
Russ Cox
8631fcbf31 math/rand/v2: add PCG-DXSM
For the original math/rand, we ported Plan 9's random number
generator, which was a refinement by Ken Thompson of an algorithm
by Don Mitchell and Jim Reeds, which Mitchell in turn recalls as
having been derived from an algorithm by Marsaglia. At its core,
it is an additive lagged Fibonacci generator (ALFG).

Whatever the details of the history, this generator is nowhere
near the current state of the art for simple, pseudo-random
generators.

This CL adds an implementation of Melissa O'Neill's PCG, specifically
the variant PCG-DXSM, which she defined after writing the PCG paper
and which is now the default in Numpy. The update is slightly slower
(a few multiplies and adds, instead of a few adds), but the state
is dramatically smaller (2 words instead of 607). The statistical
output properties are better too.
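
As a rough illustration of "a few multiplies and adds" over two words
of state, the sketch below advances a 128-bit linear congruential state
and then applies a DXSM-style (double xorshift multiply) output mix.
It is a sketch under assumptions, not the code in this CL: the type
name pcg, the method names, and all constants are chosen here for
illustration and were not checked against the CL.

```go
package main

import (
	"fmt"
	"math/bits"
)

// pcg holds the whole generator state: two 64-bit words.
type pcg struct {
	hi, lo uint64
}

// Assumed constants, for illustration only.
const (
	mulHi    = 2549297995355413924 // high word of the 128-bit multiplier
	mulLo    = 4865540595714422341 // low word of the 128-bit multiplier
	incHi    = 6364136223846793005 // high word of the 128-bit increment
	incLo    = 1442695040888963407 // low word of the 128-bit increment
	cheapMul = 0xda942042e4dd58b5  // 64-bit multiplier for the output mix
)

// step advances the state: state = state*mul + inc (mod 2^128),
// and returns the new state.
func (p *pcg) step() (hi, lo uint64) {
	hi, lo = bits.Mul64(p.lo, mulLo)  // full 64x64→128 product of the low words
	hi += p.hi*mulLo + p.lo*mulHi     // cross terms only affect the high word
	lo, c := bits.Add64(lo, incLo, 0) // add the increment with carry
	hi, _ = bits.Add64(hi, incHi, c)
	p.hi, p.lo = hi, lo
	return hi, lo
}

// Uint64 steps the state and mixes it down to 64 bits of output.
func (p *pcg) Uint64() uint64 {
	hi, lo := p.step()
	hi ^= hi >> 32
	hi *= cheapMul
	hi ^= hi >> 48
	hi *= lo | 1 // force an odd multiplier
	return hi
}

func main() {
	p := &pcg{hi: 1, lo: 2}
	for i := 0; i < 3; i++ {
		fmt.Println(p.Uint64())
	}
}
```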

A followup CL will delete the old generator.

PCG is the only change here, so no existing benchmarks should be
affected. The results are included anyway as further evidence that
small differences in these benchmarks should be read with caution.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 8993506f2f.amd64 │           01ff938549.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.325n ± 1%    1.352n ± 1%   +2.00% (p=0.000 n=20)
GlobalInt64-32                 2.240n ± 1%    2.083n ± 0%   -7.03% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1041n ± 1%   0.1035n ± 1%        ~ (p=0.064 n=20)
GlobalUint64-32                2.072n ± 3%    2.038n ± 1%        ~ (p=0.089 n=20)
GlobalUint64Parallel-32       0.1008n ± 1%   0.1006n ± 1%        ~ (p=0.804 n=20)
Int64-32                       1.716n ± 1%    1.687n ± 2%        ~ (p=0.045 n=20)
Uint64-32                      1.665n ± 1%    1.674n ± 2%        ~ (p=0.878 n=20)
GlobalIntN1000-32              3.335n ± 1%    3.135n ± 1%   -6.00% (p=0.000 n=20)
IntN1000-32                    2.484n ± 1%    2.478n ± 1%        ~ (p=0.085 n=20)
Int64N1000-32                  2.502n ± 2%    2.455n ± 1%   -1.88% (p=0.002 n=20)
Int64N1e8-32                   2.484n ± 2%    2.467n ± 2%        ~ (p=0.048 n=20)
Int64N1e9-32                   2.502n ± 0%    2.454n ± 1%   -1.92% (p=0.000 n=20)
Int64N2e9-32                   2.502n ± 0%    2.482n ± 1%   -0.76% (p=0.000 n=20)
Int64N1e18-32                  3.201n ± 1%    3.349n ± 2%   +4.62% (p=0.000 n=20)
Int64N2e18-32                  3.504n ± 1%    3.537n ± 1%        ~ (p=0.185 n=20)
Int64N4e18-32                  4.873n ± 1%    4.917n ± 0%   +0.90% (p=0.000 n=20)
Int32N1000-32                  2.639n ± 1%    2.386n ± 1%   -9.57% (p=0.000 n=20)
Int32N1e8-32                   2.686n ± 2%    2.366n ± 1%  -11.91% (p=0.000 n=20)
Int32N1e9-32                   2.636n ± 1%    2.355n ± 2%  -10.70% (p=0.000 n=20)
Int32N2e9-32                   2.660n ± 1%    2.371n ± 1%  -10.88% (p=0.000 n=20)
Float32-32                     2.261n ± 1%    2.245n ± 2%        ~ (p=0.752 n=20)
Float64-32                     2.280n ± 1%    2.235n ± 1%   -1.97% (p=0.007 n=20)
ExpFloat64-32                  3.891n ± 1%    3.813n ± 3%        ~ (p=0.087 n=20)
NormFloat64-32                 3.711n ± 1%    3.652n ± 2%        ~ (p=0.021 n=20)
Perm3-32                       32.60n ± 2%    33.12n ± 3%        ~ (p=0.107 n=20)
Perm30-32                      204.2n ± 0%    205.1n ± 1%        ~ (p=0.358 n=20)
Perm30ViaShuffle-32            121.7n ± 2%    110.8n ± 1%   -8.96% (p=0.000 n=20)
ShuffleOverhead-32             106.2n ± 2%    113.0n ± 1%   +6.36% (p=0.000 n=20)
Concurrent-32                  2.190n ± 5%    2.100n ± 0%   -4.13% (p=0.001 n=20)
PCG_DXSM-32                                   1.490n ± 0%

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 8993506f2f.arm64 │           01ff938549.arm64           │
                       │      sec/op      │    sec/op     vs base                │
SourceUint64-8                2.271n ± 0%    2.258n ± 1%        ~ (p=0.167 n=20)
GlobalInt64-8                 2.161n ± 1%    2.167n ± 0%        ~ (p=0.693 n=20)
GlobalInt64Parallel-8        0.4303n ± 0%   0.4310n ± 0%        ~ (p=0.051 n=20)
GlobalUint64-8                2.164n ± 1%    2.182n ± 1%        ~ (p=0.042 n=20)
GlobalUint64Parallel-8       0.4287n ± 0%   0.4297n ± 0%        ~ (p=0.082 n=20)
Int64-8                       2.478n ± 1%    2.472n ± 1%        ~ (p=0.151 n=20)
Uint64-8                      2.460n ± 1%    2.449n ± 1%        ~ (p=0.013 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 2%        ~ (p=0.821 n=20)
IntN1000-8                    3.003n ± 2%    2.998n ± 2%        ~ (p=0.024 n=20)
Int64N1000-8                  2.954n ± 0%    2.949n ± 2%        ~ (p=0.192 n=20)
Int64N1e8-8                   2.956n ± 0%    2.953n ± 2%        ~ (p=0.109 n=20)
Int64N1e9-8                   3.325n ± 0%    2.950n ± 0%  -11.26% (p=0.000 n=20)
Int64N2e9-8                   2.956n ± 2%    2.946n ± 2%        ~ (p=0.027 n=20)
Int64N1e18-8                  3.780n ± 1%    3.779n ± 1%        ~ (p=0.815 n=20)
Int64N2e18-8                  4.385n ± 0%    4.370n ± 1%        ~ (p=0.402 n=20)
Int64N4e18-8                  6.527n ± 0%    6.544n ± 1%        ~ (p=0.140 n=20)
Int32N1000-8                  2.964n ± 1%    2.950n ± 0%   -0.47% (p=0.002 n=20)
Int32N1e8-8                   2.964n ± 1%    2.950n ± 2%        ~ (p=0.013 n=20)
Int32N1e9-8                   2.963n ± 2%    2.951n ± 2%        ~ (p=0.062 n=20)
Int32N2e9-8                   2.961n ± 2%    2.950n ± 2%   -0.37% (p=0.002 n=20)
Float32-8                     3.442n ± 0%    3.441n ± 0%        ~ (p=0.211 n=20)
Float64-8                     3.442n ± 0%    3.442n ± 0%        ~ (p=0.067 n=20)
ExpFloat64-8                  4.472n ± 0%    4.481n ± 0%   +0.20% (p=0.000 n=20)
NormFloat64-8                 4.734n ± 0%    4.725n ± 0%   -0.19% (p=0.003 n=20)
Perm3-8                       26.55n ± 0%    26.55n ± 0%        ~ (p=0.833 n=20)
Perm30-8                      181.9n ± 0%    181.9n ± 0%   -0.03% (p=0.004 n=20)
Perm30ViaShuffle-8            143.1n ± 0%    142.9n ± 0%        ~ (p=0.204 n=20)
ShuffleOverhead-8             120.6n ± 1%    120.8n ± 2%        ~ (p=0.102 n=20)
Concurrent-8                  2.357n ± 2%    2.421n ± 6%        ~ (p=0.016 n=20)
PCG_DXSM-8                                   2.531n ± 0%

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 8993506f2f.386 │           01ff938549.386            │
                        │     sec/op     │    sec/op     vs base               │
SourceUint64-32              2.102n ± 2%    2.069n ± 0%       ~ (p=0.021 n=20)
GlobalInt64-32               3.542n ± 2%    3.456n ± 1%  -2.44% (p=0.001 n=20)
GlobalInt64Parallel-32      0.3202n ± 0%   0.3252n ± 0%  +1.56% (p=0.000 n=20)
GlobalUint64-32              3.507n ± 1%    3.573n ± 1%  +1.87% (p=0.000 n=20)
GlobalUint64Parallel-32     0.3170n ± 1%   0.3159n ± 0%       ~ (p=0.167 n=20)
Int64-32                     2.516n ± 1%    2.562n ± 2%       ~ (p=0.016 n=20)
Uint64-32                    2.544n ± 1%    2.592n ± 0%  +1.85% (p=0.000 n=20)
GlobalIntN1000-32            6.237n ± 1%    6.266n ± 2%       ~ (p=0.268 n=20)
IntN1000-32                  4.670n ± 2%    4.724n ± 2%       ~ (p=0.644 n=20)
Int64N1000-32                5.412n ± 1%    5.490n ± 2%       ~ (p=0.159 n=20)
Int64N1e8-32                 5.414n ± 2%    5.513n ± 2%       ~ (p=0.129 n=20)
Int64N1e9-32                 5.473n ± 1%    5.476n ± 1%       ~ (p=0.723 n=20)
Int64N2e9-32                 5.487n ± 1%    5.501n ± 2%       ~ (p=0.481 n=20)
Int64N1e18-32                8.901n ± 2%    9.043n ± 2%       ~ (p=0.330 n=20)
Int64N2e18-32                9.521n ± 1%    9.601n ± 2%       ~ (p=0.703 n=20)
Int64N4e18-32                11.92n ± 1%    12.00n ± 1%       ~ (p=0.489 n=20)
Int32N1000-32                4.785n ± 1%    4.829n ± 2%       ~ (p=0.402 n=20)
Int32N1e8-32                 4.748n ± 1%    4.825n ± 2%       ~ (p=0.218 n=20)
Int32N1e9-32                 4.810n ± 1%    4.830n ± 2%       ~ (p=0.794 n=20)
Int32N2e9-32                 4.812n ± 1%    4.750n ± 2%       ~ (p=0.057 n=20)
Float32-32                   10.48n ± 4%    10.89n ± 4%       ~ (p=0.162 n=20)
Float64-32                   19.79n ± 3%    19.60n ± 4%       ~ (p=0.668 n=20)
ExpFloat64-32                12.91n ± 3%    12.96n ± 3%       ~ (p=1.000 n=20)
NormFloat64-32               7.462n ± 1%    7.516n ± 1%       ~ (p=0.051 n=20)
Perm3-32                     35.98n ± 2%    36.78n ± 2%       ~ (p=0.033 n=20)
Perm30-32                    241.5n ± 1%    238.9n ± 2%       ~ (p=0.126 n=20)
Perm30ViaShuffle-32          187.3n ± 2%    189.7n ± 2%       ~ (p=0.387 n=20)
ShuffleOverhead-32           160.2n ± 1%    159.8n ± 1%       ~ (p=0.256 n=20)
Concurrent-32                3.308n ± 3%    3.286n ± 1%       ~ (p=0.038 n=20)
PCG_DXSM-32                                 7.613n ± 1%

For #61716.

Change-Id: Icb274ca1f782504d658305a40159b4ae6a2f3f1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/502505
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:09:23 +00:00
Russ Cox
f2e2637227 math/rand/v2: simplify Perm
The compiler says Perm is being inlined into BenchmarkPerm,
and yet BenchmarkPerm30ViaShuffle, which you'd think is the
same code, still runs significantly faster.

The benchmarks are mystifying but this is clearly still a step in
the right direction, since BenchmarkPerm30ViaShuffle is still
the fastest and we avoid having two copies of that logic.
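
To make "avoid having two copies of that logic" concrete, here is a
minimal sketch of a permutation built on top of a shuffle, so that the
Fisher-Yates loop exists only once. The names shuffle and perm and the
intN parameter are hypothetical stand-ins, not the package's actual
code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffle runs a Fisher-Yates shuffle over n elements. intN(k) must
// return an integer in [0, k); swap exchanges elements i and j.
func shuffle(n int, intN func(int) int, swap func(i, j int)) {
	for i := n - 1; i > 0; i-- {
		swap(i, intN(i+1))
	}
}

// perm returns a pseudo-random permutation of [0, n) by filling a slice
// with the identity permutation and delegating to shuffle.
func perm(n int, intN func(int) int) []int {
	p := make([]int, n)
	for i := range p {
		p[i] = i
	}
	shuffle(n, intN, func(i, j int) { p[i], p[j] = p[j], p[i] })
	return p
}

func main() {
	fmt.Println(perm(10, rand.Intn))
}
```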

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ e1bbe739fb.amd64 │           8993506f2f.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.316n ± 2%    1.325n ± 1%        ~ (p=0.208 n=20)
GlobalInt64-32                 2.048n ± 1%    2.240n ± 1%   +9.38% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1037n ± 1%   0.1041n ± 1%        ~ (p=0.774 n=20)
GlobalUint64-32                2.039n ± 2%    2.072n ± 3%        ~ (p=0.115 n=20)
GlobalUint64Parallel-32       0.1013n ± 1%   0.1008n ± 1%        ~ (p=0.417 n=20)
Int64-32                       1.692n ± 2%    1.716n ± 1%        ~ (p=0.122 n=20)
Uint64-32                      1.643n ± 2%    1.665n ± 1%        ~ (p=0.062 n=20)
GlobalIntN1000-32              3.287n ± 1%    3.335n ± 1%        ~ (p=0.147 n=20)
IntN1000-32                    2.678n ± 2%    2.484n ± 1%   -7.24% (p=0.000 n=20)
Int64N1000-32                  2.684n ± 2%    2.502n ± 2%   -6.80% (p=0.000 n=20)
Int64N1e8-32                   2.663n ± 2%    2.484n ± 2%   -6.76% (p=0.000 n=20)
Int64N1e9-32                   2.633n ± 1%    2.502n ± 0%   -4.98% (p=0.000 n=20)
Int64N2e9-32                   2.657n ± 1%    2.502n ± 0%   -5.87% (p=0.000 n=20)
Int64N1e18-32                  3.125n ± 2%    3.201n ± 1%   +2.43% (p=0.000 n=20)
Int64N2e18-32                  3.476n ± 1%    3.504n ± 1%   +0.83% (p=0.009 n=20)
Int64N4e18-32                  4.795n ± 1%    4.873n ± 1%        ~ (p=0.106 n=20)
Int32N1000-32                  2.485n ± 2%    2.639n ± 1%   +6.20% (p=0.000 n=20)
Int32N1e8-32                   2.457n ± 1%    2.686n ± 2%   +9.34% (p=0.000 n=20)
Int32N1e9-32                   2.452n ± 1%    2.636n ± 1%   +7.52% (p=0.000 n=20)
Int32N2e9-32                   2.453n ± 1%    2.660n ± 1%   +8.44% (p=0.000 n=20)
Float32-32                     2.254n ± 1%    2.261n ± 1%        ~ (p=0.888 n=20)
Float64-32                     2.262n ± 1%    2.280n ± 1%        ~ (p=0.040 n=20)
ExpFloat64-32                  3.777n ± 2%    3.891n ± 1%   +3.03% (p=0.000 n=20)
NormFloat64-32                 3.606n ± 1%    3.711n ± 1%   +2.91% (p=0.000 n=20)
Perm3-32                       33.12n ± 2%    32.60n ± 2%        ~ (p=0.045 n=20)
Perm30-32                      176.1n ± 1%    204.2n ± 0%  +15.96% (p=0.000 n=20)
Perm30ViaShuffle-32            109.3n ± 1%    121.7n ± 2%  +11.30% (p=0.000 n=20)
ShuffleOverhead-32             112.5n ± 1%    106.2n ± 2%   -5.56% (p=0.000 n=20)
Concurrent-32                  2.099n ± 0%    2.190n ± 5%   +4.36% (p=0.001 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ e1bbe739fb.arm64 │           8993506f2f.arm64           │
                       │      sec/op      │    sec/op     vs base                │
SourceUint64-8                2.290n ± 1%    2.271n ± 0%        ~ (p=0.015 n=20)
GlobalInt64-8                 2.180n ± 1%    2.161n ± 1%        ~ (p=0.180 n=20)
GlobalInt64Parallel-8        0.4294n ± 0%   0.4303n ± 0%   +0.19% (p=0.001 n=20)
GlobalUint64-8                2.170n ± 1%    2.164n ± 1%        ~ (p=0.673 n=20)
GlobalUint64Parallel-8       0.4283n ± 0%   0.4287n ± 0%        ~ (p=0.128 n=20)
Int64-8                       2.481n ± 1%    2.478n ± 1%        ~ (p=0.867 n=20)
Uint64-8                      2.464n ± 1%    2.460n ± 1%        ~ (p=0.763 n=20)
GlobalIntN1000-8              2.814n ± 0%    2.814n ± 2%        ~ (p=0.969 n=20)
IntN1000-8                    2.934n ± 2%    3.003n ± 2%   +2.35% (p=0.000 n=20)
Int64N1000-8                  2.957n ± 1%    2.954n ± 0%        ~ (p=0.285 n=20)
Int64N1e8-8                   2.935n ± 2%    2.956n ± 0%   +0.73% (p=0.002 n=20)
Int64N1e9-8                   2.935n ± 2%    3.325n ± 0%  +13.29% (p=0.000 n=20)
Int64N2e9-8                   2.933n ± 4%    2.956n ± 2%        ~ (p=0.163 n=20)
Int64N1e18-8                  3.781n ± 1%    3.780n ± 1%        ~ (p=0.805 n=20)
Int64N2e18-8                  4.362n ± 0%    4.385n ± 0%        ~ (p=0.077 n=20)
Int64N4e18-8                  6.576n ± 1%    6.527n ± 0%        ~ (p=0.024 n=20)
Int32N1000-8                  2.942n ± 2%    2.964n ± 1%        ~ (p=0.073 n=20)
Int32N1e8-8                   2.941n ± 1%    2.964n ± 1%        ~ (p=0.058 n=20)
Int32N1e9-8                   2.938n ± 2%    2.963n ± 2%   +0.87% (p=0.003 n=20)
Int32N2e9-8                   2.982n ± 2%    2.961n ± 2%        ~ (p=0.056 n=20)
Float32-8                     3.441n ± 0%    3.442n ± 0%        ~ (p=0.030 n=20)
Float64-8                     3.441n ± 0%    3.442n ± 0%   +0.03% (p=0.001 n=20)
ExpFloat64-8                  4.472n ± 0%    4.472n ± 0%        ~ (p=0.877 n=20)
NormFloat64-8                 4.716n ± 0%    4.734n ± 0%   +0.38% (p=0.000 n=20)
Perm3-8                       26.66n ± 0%    26.55n ± 0%   -0.39% (p=0.000 n=20)
Perm30-8                      143.3n ± 0%    181.9n ± 0%  +26.97% (p=0.000 n=20)
Perm30ViaShuffle-8            142.9n ± 0%    143.1n ± 0%        ~ (p=0.669 n=20)
ShuffleOverhead-8             121.1n ± 1%    120.6n ± 1%   -0.41% (p=0.004 n=20)
Concurrent-8                  2.379n ± 2%    2.357n ± 2%        ~ (p=0.337 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ e1bbe739fb.386 │            8993506f2f.386            │
                        │     sec/op     │    sec/op     vs base                │
SourceUint64-32              2.087n ± 1%    2.102n ± 2%        ~ (p=0.507 n=20)
GlobalInt64-32               3.538n ± 2%    3.542n ± 2%        ~ (p=0.425 n=20)
GlobalInt64Parallel-32      0.3207n ± 1%   0.3202n ± 0%        ~ (p=0.963 n=20)
GlobalUint64-32              3.543n ± 1%    3.507n ± 1%        ~ (p=0.034 n=20)
GlobalUint64Parallel-32     0.3170n ± 0%   0.3170n ± 1%        ~ (p=0.920 n=20)
Int64-32                     2.548n ± 1%    2.516n ± 1%        ~ (p=0.139 n=20)
Uint64-32                    2.565n ± 2%    2.544n ± 1%        ~ (p=0.394 n=20)
GlobalIntN1000-32            6.300n ± 1%    6.237n ± 1%        ~ (p=0.029 n=20)
IntN1000-32                  4.750n ± 0%    4.670n ± 2%        ~ (p=0.034 n=20)
Int64N1000-32                5.515n ± 2%    5.412n ± 1%   -1.86% (p=0.009 n=20)
Int64N1e8-32                 5.527n ± 0%    5.414n ± 2%   -2.05% (p=0.002 n=20)
Int64N1e9-32                 5.531n ± 2%    5.473n ± 1%        ~ (p=0.047 n=20)
Int64N2e9-32                 5.514n ± 2%    5.487n ± 1%        ~ (p=0.298 n=20)
Int64N1e18-32                9.059n ± 1%    8.901n ± 2%        ~ (p=0.037 n=20)
Int64N2e18-32                9.594n ± 1%    9.521n ± 1%        ~ (p=0.051 n=20)
Int64N4e18-32                12.05n ± 2%    11.92n ± 1%        ~ (p=0.357 n=20)
Int32N1000-32                4.840n ± 2%    4.785n ± 1%        ~ (p=0.189 n=20)
Int32N1e8-32                 4.832n ± 2%    4.748n ± 1%        ~ (p=0.042 n=20)
Int32N1e9-32                 4.815n ± 2%    4.810n ± 1%        ~ (p=0.878 n=20)
Int32N2e9-32                 4.813n ± 1%    4.812n ± 1%        ~ (p=0.542 n=20)
Float32-32                   10.90n ± 2%    10.48n ± 4%   -3.85% (p=0.007 n=20)
Float64-32                   20.32n ± 4%    19.79n ± 3%        ~ (p=0.553 n=20)
ExpFloat64-32                12.95n ± 3%    12.91n ± 3%        ~ (p=0.909 n=20)
NormFloat64-32               7.570n ± 1%    7.462n ± 1%   -1.44% (p=0.004 n=20)
Perm3-32                     37.80n ± 2%    35.98n ± 2%   -4.79% (p=0.000 n=20)
Perm30-32                    214.0n ± 1%    241.5n ± 1%  +12.85% (p=0.000 n=20)
Perm30ViaShuffle-32          188.7n ± 2%    187.3n ± 2%        ~ (p=0.029 n=20)
ShuffleOverhead-32           160.8n ± 1%    160.2n ± 1%        ~ (p=0.180 n=20)
Concurrent-32                3.288n ± 0%    3.308n ± 3%        ~ (p=0.037 n=20)

For #61716.

Change-Id: I342b611456c3569520d3c91c849d29eba325d87e
Reviewed-on: https://go-review.googlesource.com/c/go/+/502504
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:09:21 +00:00
Branden Brown
488e2a56b9 math/rand/v2: remove bias in ExpFloat64 and NormFloat64
The original implementation of the ziggurat algorithm was designed for
32-bit random integer inputs. This necessitated reusing some low-order
bits for the slice selection and the random coordinate, which introduces
statistical bias. The result is that PractRand consistently fails the
math/rand normal and exponential sequences (transformed to uniform)
within 2 GB of variates.

This change adjusts the ziggurat procedures to use 63-bit random inputs,
so that there is no need to reuse bits between the slice and coordinate.
This is sufficient for the normal sequence to survive to 256 GB of
PractRand testing.
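
The difference between reusing bits and keeping them separate can be
sketched as follows. With a single 32-bit input, the slice index is
carved out of the same word that serves as the coordinate; with a wider
input, the two can come from non-overlapping bits. The function names
and the exact bit layout below are assumptions for illustration, not
necessarily what this change does.

```go
package main

import "fmt"

// biasedSplit takes both values from one 32-bit word: the low 7 bits
// pick the slice, and the whole word (including those 7 bits) is the
// coordinate, so the two are correlated.
func biasedSplit(u uint32) (slice uint32, coord int32) {
	return u & 0x7F, int32(u)
}

// disjointSplit takes the coordinate from the low 32 bits of a 64-bit
// word and the slice index from 7 bits above them, so no bit is used
// twice.
func disjointSplit(u uint64) (slice uint64, coord int32) {
	return (u >> 32) & 0x7F, int32(u)
}

func main() {
	s1, c1 := biasedSplit(0x12345685)
	fmt.Println(s1, c1)
	s2, c2 := disjointSplit(0x0000000300000085)
	fmt.Println(s2, c2)
}
```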

An alternative technique is to recalculate the ziggurats to use 1024
rather than 128 or 256 slices to make full use of 64-bit inputs. This
improves the survival of the normal sequence to far beyond 256 GB and
additionally provides a 6% performance improvement due to the improved
rejection procedure efficiency. However, doing so increases the total
size of the ziggurat tables from 4.5 kB to 48 kB.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 2703446c2e.amd64 │           e1bbe739fb.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.337n ± 1%    1.316n ± 2%        ~ (p=0.024 n=20)
GlobalInt64-32                 2.225n ± 2%    2.048n ± 1%   -7.93% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1043n ± 2%   0.1037n ± 1%        ~ (p=0.587 n=20)
GlobalUint64-32                2.058n ± 1%    2.039n ± 2%        ~ (p=0.030 n=20)
GlobalUint64Parallel-32       0.1009n ± 1%   0.1013n ± 1%        ~ (p=0.984 n=20)
Int64-32                       1.719n ± 2%    1.692n ± 2%        ~ (p=0.085 n=20)
Uint64-32                      1.669n ± 1%    1.643n ± 2%        ~ (p=0.049 n=20)
GlobalIntN1000-32              3.321n ± 2%    3.287n ± 1%        ~ (p=0.298 n=20)
IntN1000-32                    2.479n ± 1%    2.678n ± 2%   +8.01% (p=0.000 n=20)
Int64N1000-32                  2.477n ± 1%    2.684n ± 2%   +8.38% (p=0.000 n=20)
Int64N1e8-32                   2.490n ± 1%    2.663n ± 2%   +6.99% (p=0.000 n=20)
Int64N1e9-32                   2.458n ± 1%    2.633n ± 1%   +7.12% (p=0.000 n=20)
Int64N2e9-32                   2.486n ± 2%    2.657n ± 1%   +6.90% (p=0.000 n=20)
Int64N1e18-32                  3.215n ± 2%    3.125n ± 2%   -2.78% (p=0.000 n=20)
Int64N2e18-32                  3.588n ± 2%    3.476n ± 1%   -3.15% (p=0.000 n=20)
Int64N4e18-32                  4.938n ± 2%    4.795n ± 1%   -2.91% (p=0.000 n=20)
Int32N1000-32                  2.673n ± 2%    2.485n ± 2%   -7.02% (p=0.000 n=20)
Int32N1e8-32                   2.631n ± 2%    2.457n ± 1%   -6.63% (p=0.000 n=20)
Int32N1e9-32                   2.628n ± 2%    2.452n ± 1%   -6.70% (p=0.000 n=20)
Int32N2e9-32                   2.684n ± 2%    2.453n ± 1%   -8.61% (p=0.000 n=20)
Float32-32                     2.240n ± 2%    2.254n ± 1%        ~ (p=0.878 n=20)
Float64-32                     2.253n ± 1%    2.262n ± 1%        ~ (p=0.963 n=20)
ExpFloat64-32                  3.677n ± 1%    3.777n ± 2%   +2.71% (p=0.004 n=20)
NormFloat64-32                 3.761n ± 1%    3.606n ± 1%   -4.15% (p=0.000 n=20)
Perm3-32                       33.55n ± 2%    33.12n ± 2%        ~ (p=0.402 n=20)
Perm30-32                      173.2n ± 1%    176.1n ± 1%   +1.67% (p=0.000 n=20)
Perm30ViaShuffle-32            115.9n ± 1%    109.3n ± 1%   -5.69% (p=0.000 n=20)
ShuffleOverhead-32             101.9n ± 1%    112.5n ± 1%  +10.35% (p=0.000 n=20)
Concurrent-32                  2.107n ± 6%    2.099n ± 0%        ~ (p=0.051 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 2703446c2e.arm64 │          e1bbe739fb.arm64           │
                       │      sec/op      │    sec/op     vs base               │
SourceUint64-8                2.275n ± 0%    2.290n ± 1%       ~ (p=0.044 n=20)
GlobalInt64-8                 2.154n ± 1%    2.180n ± 1%       ~ (p=0.068 n=20)
GlobalInt64Parallel-8        0.4298n ± 0%   0.4294n ± 0%       ~ (p=0.079 n=20)
GlobalUint64-8                2.160n ± 1%    2.170n ± 1%       ~ (p=0.129 n=20)
GlobalUint64Parallel-8       0.4286n ± 0%   0.4283n ± 0%       ~ (p=0.350 n=20)
Int64-8                       2.491n ± 1%    2.481n ± 1%       ~ (p=0.330 n=20)
Uint64-8                      2.458n ± 0%    2.464n ± 1%       ~ (p=0.351 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 0%       ~ (p=0.325 n=20)
IntN1000-8                    2.933n ± 0%    2.934n ± 2%       ~ (p=0.079 n=20)
Int64N1000-8                  2.962n ± 1%    2.957n ± 1%       ~ (p=0.259 n=20)
Int64N1e8-8                   2.960n ± 1%    2.935n ± 2%       ~ (p=0.276 n=20)
Int64N1e9-8                   2.935n ± 2%    2.935n ± 2%       ~ (p=0.984 n=20)
Int64N2e9-8                   2.934n ± 0%    2.933n ± 4%       ~ (p=0.463 n=20)
Int64N1e18-8                  3.777n ± 1%    3.781n ± 1%       ~ (p=0.516 n=20)
Int64N2e18-8                  4.359n ± 1%    4.362n ± 0%       ~ (p=0.256 n=20)
Int64N4e18-8                  6.536n ± 1%    6.576n ± 1%       ~ (p=0.224 n=20)
Int32N1000-8                  2.937n ± 0%    2.942n ± 2%       ~ (p=0.312 n=20)
Int32N1e8-8                   2.937n ± 1%    2.941n ± 1%       ~ (p=0.463 n=20)
Int32N1e9-8                   2.936n ± 0%    2.938n ± 2%       ~ (p=0.044 n=20)
Int32N2e9-8                   2.938n ± 2%    2.982n ± 2%       ~ (p=0.174 n=20)
Float32-8                     3.441n ± 0%    3.441n ± 0%       ~ (p=0.064 n=20)
Float64-8                     3.441n ± 0%    3.441n ± 0%       ~ (p=0.826 n=20)
ExpFloat64-8                  4.486n ± 0%    4.472n ± 0%  -0.31% (p=0.000 n=20)
NormFloat64-8                 4.721n ± 0%    4.716n ± 0%       ~ (p=0.051 n=20)
Perm3-8                       26.65n ± 0%    26.66n ± 0%       ~ (p=0.080 n=20)
Perm30-8                      143.2n ± 0%    143.3n ± 0%  +0.10% (p=0.000 n=20)
Perm30ViaShuffle-8            143.0n ± 0%    142.9n ± 0%       ~ (p=0.642 n=20)
ShuffleOverhead-8             120.6n ± 1%    121.1n ± 1%  +0.41% (p=0.010 n=20)
Concurrent-8                  2.399n ± 5%    2.379n ± 2%       ~ (p=0.365 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 2703446c2e.386 │           e1bbe739fb.386            │
                        │     sec/op     │    sec/op     vs base               │
SourceUint64-32             2.072n ±  2%    2.087n ± 1%       ~ (p=0.440 n=20)
GlobalInt64-32              3.546n ± 27%    3.538n ± 2%       ~ (p=0.101 n=20)
GlobalInt64Parallel-32     0.3211n ±  0%   0.3207n ± 1%       ~ (p=0.753 n=20)
GlobalUint64-32             3.522n ±  2%    3.543n ± 1%       ~ (p=0.071 n=20)
GlobalUint64Parallel-32    0.3172n ±  0%   0.3170n ± 0%       ~ (p=0.507 n=20)
Int64-32                    2.520n ±  2%    2.548n ± 1%       ~ (p=0.267 n=20)
Uint64-32                   2.581n ±  1%    2.565n ± 2%       ~ (p=0.143 n=20)
GlobalIntN1000-32           6.171n ±  1%    6.300n ± 1%       ~ (p=0.037 n=20)
IntN1000-32                 4.752n ±  2%    4.750n ± 0%       ~ (p=0.984 n=20)
Int64N1000-32               5.429n ±  1%    5.515n ± 2%       ~ (p=0.292 n=20)
Int64N1e8-32                5.469n ±  2%    5.527n ± 0%       ~ (p=0.013 n=20)
Int64N1e9-32                5.489n ±  2%    5.531n ± 2%       ~ (p=0.256 n=20)
Int64N2e9-32                5.492n ±  2%    5.514n ± 2%       ~ (p=0.606 n=20)
Int64N1e18-32               8.927n ±  1%    9.059n ± 1%       ~ (p=0.229 n=20)
Int64N2e18-32               9.622n ±  1%    9.594n ± 1%       ~ (p=0.703 n=20)
Int64N4e18-32               12.03n ±  1%    12.05n ± 2%       ~ (p=0.733 n=20)
Int32N1000-32               4.817n ±  1%    4.840n ± 2%       ~ (p=0.941 n=20)
Int32N1e8-32                4.801n ±  1%    4.832n ± 2%       ~ (p=0.228 n=20)
Int32N1e9-32                4.798n ±  1%    4.815n ± 2%       ~ (p=0.560 n=20)
Int32N2e9-32                4.840n ±  1%    4.813n ± 1%       ~ (p=0.015 n=20)
Float32-32                  10.51n ±  4%    10.90n ± 2%  +3.71% (p=0.007 n=20)
Float64-32                  20.33n ±  3%    20.32n ± 4%       ~ (p=0.566 n=20)
ExpFloat64-32               12.59n ±  2%    12.95n ± 3%  +2.86% (p=0.002 n=20)
NormFloat64-32              7.350n ±  2%    7.570n ± 1%  +2.99% (p=0.007 n=20)
Perm3-32                    39.29n ±  2%    37.80n ± 2%  -3.79% (p=0.000 n=20)
Perm30-32                   219.1n ±  2%    214.0n ± 1%  -2.33% (p=0.002 n=20)
Perm30ViaShuffle-32         189.8n ±  2%    188.7n ± 2%       ~ (p=0.147 n=20)
ShuffleOverhead-32          158.9n ±  2%    160.8n ± 1%       ~ (p=0.176 n=20)
Concurrent-32               3.306n ±  3%    3.288n ± 0%  -0.54% (p=0.005 n=20)

For #61716.

Change-Id: I4c5fe710b310dc075ae21c97d1805bcc20db5050
Reviewed-on: https://go-review.googlesource.com/c/go/+/516275
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:08:47 +00:00
Russ Cox
ecda959b99 math/rand/v2: optimize Float32, Float64
We realized too late after Go 1 that float64(r.Uint64())/(1<<64)
is not a correct implementation: it occasionally rounds to 1.
The correct implementation is float64(r.Uint64()&(1<<53-1))/(1<<53)
but we couldn't change the implementation for compatibility, so we
changed it to retry only in the "round to 1" cases.

The change to v2 lets us update the algorithm to the simpler,
faster one.

Note that this implementation cannot generate 2⁻⁵⁴, nor 2⁻¹⁰⁰,
nor any of the other numbers between 0 and 2⁻⁵³. A slower algorithm
could shift some of the probability of generating these two boundary
values over to the values in between, but that would be much slower
and not necessarily better. In particular, the current
implementation has the property that there are uniform gaps between
the possible returned floats, which might help stability. Also, the
result is often scaled and shifted, like Float64()*X+Y. Multiplying by
X>1 would open new gaps, and adding most Y would erase all the
distinctions that were introduced.

The only changes to benchmarks should be in Float32 and Float64.
The other changes remain a cautionary tale.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 4d84a369d1.amd64 │           2703446c2e.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.348n ± 2%    1.337n ± 1%        ~ (p=0.662 n=20)
GlobalInt64-32                 2.082n ± 2%    2.225n ± 2%   +6.87% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1036n ± 1%   0.1043n ± 2%        ~ (p=0.171 n=20)
GlobalUint64-32                2.077n ± 2%    2.058n ± 1%        ~ (p=0.560 n=20)
GlobalUint64Parallel-32       0.1012n ± 1%   0.1009n ± 1%        ~ (p=0.995 n=20)
Int64-32                       1.750n ± 0%    1.719n ± 2%   -1.74% (p=0.000 n=20)
Uint64-32                      1.707n ± 2%    1.669n ± 1%   -2.20% (p=0.000 n=20)
GlobalIntN1000-32              3.192n ± 1%    3.321n ± 2%   +4.04% (p=0.000 n=20)
IntN1000-32                    2.462n ± 2%    2.479n ± 1%        ~ (p=0.417 n=20)
Int64N1000-32                  2.470n ± 1%    2.477n ± 1%        ~ (p=0.664 n=20)
Int64N1e8-32                   2.503n ± 2%    2.490n ± 1%        ~ (p=0.245 n=20)
Int64N1e9-32                   2.487n ± 1%    2.458n ± 1%        ~ (p=0.032 n=20)
Int64N2e9-32                   2.487n ± 1%    2.486n ± 2%        ~ (p=0.507 n=20)
Int64N1e18-32                  3.006n ± 2%    3.215n ± 2%   +6.94% (p=0.000 n=20)
Int64N2e18-32                  3.368n ± 1%    3.588n ± 2%   +6.55% (p=0.000 n=20)
Int64N4e18-32                  4.763n ± 1%    4.938n ± 2%   +3.69% (p=0.000 n=20)
Int32N1000-32                  2.403n ± 1%    2.673n ± 2%  +11.19% (p=0.000 n=20)
Int32N1e8-32                   2.405n ± 1%    2.631n ± 2%   +9.42% (p=0.000 n=20)
Int32N1e9-32                   2.402n ± 2%    2.628n ± 2%   +9.41% (p=0.000 n=20)
Int32N2e9-32                   2.384n ± 1%    2.684n ± 2%  +12.56% (p=0.000 n=20)
Float32-32                     2.641n ± 2%    2.240n ± 2%  -15.18% (p=0.000 n=20)
Float64-32                     2.483n ± 1%    2.253n ± 1%   -9.26% (p=0.000 n=20)
ExpFloat64-32                  3.486n ± 2%    3.677n ± 1%   +5.49% (p=0.000 n=20)
NormFloat64-32                 3.648n ± 1%    3.761n ± 1%   +3.11% (p=0.000 n=20)
Perm3-32                       33.04n ± 1%    33.55n ± 2%        ~ (p=0.180 n=20)
Perm30-32                      171.9n ± 1%    173.2n ± 1%        ~ (p=0.050 n=20)
Perm30ViaShuffle-32            100.3n ± 1%    115.9n ± 1%  +15.55% (p=0.000 n=20)
ShuffleOverhead-32             102.5n ± 1%    101.9n ± 1%        ~ (p=0.266 n=20)
Concurrent-32                  2.101n ± 0%    2.107n ± 6%        ~ (p=0.212 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 4d84a369d1.arm64 │          2703446c2e.arm64           │
                       │      sec/op      │    sec/op     vs base               │
SourceUint64-8                2.261n ± 1%    2.275n ± 0%       ~ (p=0.082 n=20)
GlobalInt64-8                 2.160n ± 1%    2.154n ± 1%       ~ (p=0.490 n=20)
GlobalInt64Parallel-8        0.4299n ± 0%   0.4298n ± 0%       ~ (p=0.663 n=20)
GlobalUint64-8                2.169n ± 1%    2.160n ± 1%       ~ (p=0.292 n=20)
GlobalUint64Parallel-8       0.4293n ± 1%   0.4286n ± 0%       ~ (p=0.155 n=20)
Int64-8                       2.473n ± 1%    2.491n ± 1%       ~ (p=0.317 n=20)
Uint64-8                      2.453n ± 1%    2.458n ± 0%       ~ (p=0.941 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 2%       ~ (p=0.972 n=20)
IntN1000-8                    2.933n ± 2%    2.933n ± 0%       ~ (p=0.287 n=20)
Int64N1000-8                  2.934n ± 2%    2.962n ± 1%       ~ (p=0.062 n=20)
Int64N1e8-8                   2.935n ± 2%    2.960n ± 1%       ~ (p=0.183 n=20)
Int64N1e9-8                   2.934n ± 2%    2.935n ± 2%       ~ (p=0.367 n=20)
Int64N2e9-8                   2.935n ± 2%    2.934n ± 0%       ~ (p=0.455 n=20)
Int64N1e18-8                  3.778n ± 1%    3.777n ± 1%       ~ (p=0.995 n=20)
Int64N2e18-8                  4.359n ± 1%    4.359n ± 1%       ~ (p=0.122 n=20)
Int64N4e18-8                  6.546n ± 1%    6.536n ± 1%       ~ (p=0.920 n=20)
Int32N1000-8                  2.940n ± 2%    2.937n ± 0%       ~ (p=0.149 n=20)
Int32N1e8-8                   2.937n ± 2%    2.937n ± 1%       ~ (p=0.620 n=20)
Int32N1e9-8                   2.938n ± 0%    2.936n ± 0%       ~ (p=0.046 n=20)
Int32N2e9-8                   2.938n ± 2%    2.938n ± 2%       ~ (p=0.455 n=20)
Float32-8                     3.486n ± 0%    3.441n ± 0%  -1.28% (p=0.000 n=20)
Float64-8                     3.480n ± 0%    3.441n ± 0%  -1.13% (p=0.000 n=20)
ExpFloat64-8                  4.533n ± 0%    4.486n ± 0%  -1.03% (p=0.000 n=20)
NormFloat64-8                 4.764n ± 0%    4.721n ± 0%  -0.90% (p=0.000 n=20)
Perm3-8                       26.66n ± 0%    26.65n ± 0%       ~ (p=0.019 n=20)
Perm30-8                      143.4n ± 0%    143.2n ± 0%  -0.17% (p=0.000 n=20)
Perm30ViaShuffle-8            142.9n ± 0%    143.0n ± 0%       ~ (p=0.522 n=20)
ShuffleOverhead-8             120.7n ± 0%    120.6n ± 1%       ~ (p=0.488 n=20)
Concurrent-8                  2.360n ± 2%    2.399n ± 5%       ~ (p=0.062 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 4d84a369d1.386 │            2703446c2e.386             │
                        │     sec/op     │    sec/op      vs base                │
SourceUint64-32              2.101n ± 2%    2.072n ±  2%        ~ (p=0.273 n=20)
GlobalInt64-32               3.518n ± 2%    3.546n ± 27%   +0.78% (p=0.007 n=20)
GlobalInt64Parallel-32      0.3206n ± 0%   0.3211n ±  0%        ~ (p=0.386 n=20)
GlobalUint64-32              3.538n ± 1%    3.522n ±  2%        ~ (p=0.331 n=20)
GlobalUint64Parallel-32     0.3231n ± 0%   0.3172n ±  0%   -1.84% (p=0.000 n=20)
Int64-32                     2.554n ± 2%    2.520n ±  2%        ~ (p=0.465 n=20)
Uint64-32                    2.575n ± 2%    2.581n ±  1%        ~ (p=0.213 n=20)
GlobalIntN1000-32            6.292n ± 1%    6.171n ±  1%        ~ (p=0.015 n=20)
IntN1000-32                  4.735n ± 1%    4.752n ±  2%        ~ (p=0.635 n=20)
Int64N1000-32                5.489n ± 2%    5.429n ±  1%        ~ (p=0.324 n=20)
Int64N1e8-32                 5.528n ± 2%    5.469n ±  2%        ~ (p=0.013 n=20)
Int64N1e9-32                 5.438n ± 2%    5.489n ±  2%        ~ (p=0.984 n=20)
Int64N2e9-32                 5.474n ± 1%    5.492n ±  2%        ~ (p=0.616 n=20)
Int64N1e18-32                9.053n ± 1%    8.927n ±  1%        ~ (p=0.037 n=20)
Int64N2e18-32                9.685n ± 2%    9.622n ±  1%        ~ (p=0.449 n=20)
Int64N4e18-32                12.18n ± 1%    12.03n ±  1%        ~ (p=0.013 n=20)
Int32N1000-32                4.862n ± 1%    4.817n ±  1%   -0.94% (p=0.002 n=20)
Int32N1e8-32                 4.758n ± 2%    4.801n ±  1%        ~ (p=0.597 n=20)
Int32N1e9-32                 4.772n ± 1%    4.798n ±  1%        ~ (p=0.774 n=20)
Int32N2e9-32                 4.847n ± 0%    4.840n ±  1%        ~ (p=0.867 n=20)
Float32-32                   22.18n ± 4%    10.51n ±  4%  -52.61% (p=0.000 n=20)
Float64-32                   21.21n ± 3%    20.33n ±  3%   -4.17% (p=0.000 n=20)
ExpFloat64-32                12.39n ± 2%    12.59n ±  2%        ~ (p=0.139 n=20)
NormFloat64-32               7.422n ± 1%    7.350n ±  2%        ~ (p=0.208 n=20)
Perm3-32                     38.00n ± 2%    39.29n ±  2%   +3.38% (p=0.000 n=20)
Perm30-32                    212.7n ± 1%    219.1n ±  2%   +3.03% (p=0.001 n=20)
Perm30ViaShuffle-32          187.5n ± 2%    189.8n ±  2%        ~ (p=0.457 n=20)
ShuffleOverhead-32           159.7n ± 1%    158.9n ±  2%        ~ (p=0.920 n=20)
Concurrent-32                3.470n ± 0%    3.306n ±  3%   -4.71% (p=0.000 n=20)

For #61716.

Change-Id: I1933f1f9efd7e6e832d83e7fa5d84398f67d41f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/502503
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 17:08:40 +00:00
Russ Cox
c266587846 math/rand/v2: add, optimize N, UintN, Uint32N, Uint64N
Now that we can break the value stream, we can take advantage
of better algorithms that have been suggested since the original
code was written.

Also optimizes IntN, Int32N, Int64N, Perm (indirectly).

All the N variants (IntN, Int32N, Int64N, UintN, N, etc) now
return the same values given a Source and parameter n, so that
for example uint(r.IntN(10)) and r.UintN(10) and r.N(uint(10))
are completely interchangeable.
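
A minimal sketch of this property, assuming the math/rand/v2 API as released in Go 1.22 (rand.New, rand.NewPCG, IntN, UintN). NewPCG is not part of this change, and sameStream is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// sameStream reports whether IntN and UintN agree for count draws
// when both generators start from the same seed.
func sameStream(seed1, seed2 uint64, n, count int) bool {
	a := rand.New(rand.NewPCG(seed1, seed2))
	b := rand.New(rand.NewPCG(seed1, seed2))
	for i := 0; i < count; i++ {
		if uint(a.IntN(n)) != b.UintN(uint(n)) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(sameStream(1, 2, 10, 1000))
}
```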

Int64N4e18 gets slower but that is a near worst case for
the algorithm and is extremely unlikely in practice.
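
One well-known technique of this kind is Lemire's multiply-and-reject method. The sketch below assumes that approach for illustration; boundedUint64 and the next parameter are hypothetical names, and the message itself does not say which algorithm was chosen.

```go
package main

import (
	"fmt"
	"math/bits"
)

// boundedUint64 returns a value in [0, n) from a 64-bit source using
// multiply-and-reject. next is a hypothetical stand-in for a Source.
func boundedUint64(next func() uint64, n uint64) uint64 {
	if n == 0 {
		panic("n must be positive")
	}
	if n&(n-1) == 0 { // power of two: masking is exact
		return next() & (n - 1)
	}
	// The high word of the 128-bit product x*n lies in [0, n).
	hi, lo := bits.Mul64(next(), n)
	if lo < n { // slow path: a rejection may be needed
		thresh := -n % n // equals 2^64 mod n
		for lo < thresh {
			hi, lo = bits.Mul64(next(), n)
		}
	}
	return hi
}

func main() {
	// A simple counter-based mixer as a deterministic demo source (not for real use).
	var state uint64
	next := func() uint64 {
		state += 0x9E3779B97F4A7C15
		z := state
		z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9
		z = (z ^ (z >> 27)) * 0x94D049BB133111EB
		return z ^ (z >> 31)
	}
	fmt.Println(boundedUint64(next, 10) < 10)
}
```

Under this scheme the slow path is taken with probability of roughly n/2^64, which is negligible for small n but noticeable for n near 4e18. That would be consistent with the Int64N4e18 result, although the sketch is not a claim about the actual implementation.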

32-bit Int32N variants got slower too, by 15-30%, in exchange
for speeding up everything on 64-bit systems and consistency
across the N functions.

Also rename previously missed benchmark
GlobalInt63Parallel to GlobalInt64Parallel.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 11ad9fdddc.amd64 │            4d84a369d1.amd64            │
                        │      sec/op      │    sec/op     vs base                  │
SourceUint64-32                1.335n ± 1%    1.348n ± 2%        ~ (p=0.335 n=20)
GlobalInt64-32                 2.046n ± 1%    2.082n ± 2%        ~ (p=0.310 n=20)
GlobalInt63Parallel-32        0.1037n ± 1%
GlobalInt64Parallel-32                       0.1036n ± 1%
GlobalUint64-32                2.075n ± 0%    2.077n ± 2%        ~ (p=0.228 n=20)
GlobalUint64Parallel-32       0.1013n ± 1%   0.1012n ± 1%        ~ (p=0.878 n=20)
Int64-32                       1.726n ± 2%    1.750n ± 0%   +1.39% (p=0.000 n=20)
Uint64-32                      1.673n ± 1%    1.707n ± 2%   +2.03% (p=0.002 n=20)
GlobalIntN1000-32              3.895n ± 2%    3.192n ± 1%  -18.05% (p=0.000 n=20)
IntN1000-32                    3.403n ± 1%    2.462n ± 2%  -27.65% (p=0.000 n=20)
Int64N1000-32                  3.053n ± 2%    2.470n ± 1%  -19.11% (p=0.000 n=20)
Int64N1e8-32                   2.718n ± 1%    2.503n ± 2%   -7.91% (p=0.000 n=20)
Int64N1e9-32                   2.712n ± 1%    2.487n ± 1%   -8.31% (p=0.000 n=20)
Int64N2e9-32                   2.690n ± 1%    2.487n ± 1%   -7.57% (p=0.000 n=20)
Int64N1e18-32                  3.084n ± 2%    3.006n ± 2%   -2.53% (p=0.000 n=20)
Int64N2e18-32                  4.026n ± 1%    3.368n ± 1%  -16.33% (p=0.000 n=20)
Int64N4e18-32                  4.049n ± 2%    4.763n ± 1%  +17.62% (p=0.000 n=20)
Int32N1000-32                  2.730n ± 0%    2.403n ± 1%  -11.94% (p=0.000 n=20)
Int32N1e8-32                   2.916n ± 2%    2.405n ± 1%  -17.53% (p=0.000 n=20)
Int32N1e9-32                   3.375n ± 1%    2.402n ± 2%  -28.83% (p=0.000 n=20)
Int32N2e9-32                   3.292n ± 1%    2.384n ± 1%  -27.58% (p=0.000 n=20)
Float32-32                     2.673n ± 1%    2.641n ± 2%        ~ (p=0.147 n=20)
Float64-32                     2.485n ± 1%    2.483n ± 1%        ~ (p=0.804 n=20)
ExpFloat64-32                  3.577n ± 2%    3.486n ± 2%   -2.57% (p=0.000 n=20)
NormFloat64-32                 3.797n ± 2%    3.648n ± 1%   -3.92% (p=0.000 n=20)
Perm3-32                       35.79n ± 2%    33.04n ± 1%   -7.68% (p=0.000 n=20)
Perm30-32                      205.1n ± 1%    171.9n ± 1%  -16.14% (p=0.000 n=20)
Perm30ViaShuffle-32            111.2n ± 2%    100.3n ± 1%   -9.76% (p=0.000 n=20)
ShuffleOverhead-32             100.5n ± 2%    102.5n ± 1%   +1.99% (p=0.007 n=20)
Concurrent-32                  2.188n ± 5%    2.101n ± 0%        ~ (p=0.013 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 11ad9fdddc.arm64 │            4d84a369d1.arm64            │
                       │      sec/op      │    sec/op     vs base                  │
SourceUint64-8                2.272n ± 1%    2.261n ± 1%        ~ (p=0.172 n=20)
GlobalInt64-8                 2.155n ± 1%    2.160n ± 1%        ~ (p=0.482 n=20)
GlobalInt63Parallel-8        0.4352n ± 0%
GlobalInt64Parallel-8                       0.4299n ± 0%
GlobalUint64-8                2.173n ± 1%    2.169n ± 1%        ~ (p=0.262 n=20)
GlobalUint64Parallel-8       0.4340n ± 0%   0.4293n ± 1%   -1.08% (p=0.000 n=20)
Int64-8                       2.544n ± 1%    2.473n ± 1%   -2.83% (p=0.000 n=20)
Uint64-8                      2.552n ± 1%    2.453n ± 1%   -3.90% (p=0.000 n=20)
GlobalIntN1000-8              3.856n ± 0%    2.814n ± 2%  -27.02% (p=0.000 n=20)
IntN1000-8                    3.820n ± 0%    2.933n ± 2%  -23.22% (p=0.000 n=20)
Int64N1000-8                  3.219n ± 2%    2.934n ± 2%   -8.85% (p=0.000 n=20)
Int64N1e8-8                   3.221n ± 2%    2.935n ± 2%   -8.91% (p=0.000 n=20)
Int64N1e9-8                   3.276n ± 2%    2.934n ± 2%  -10.44% (p=0.000 n=20)
Int64N2e9-8                   3.217n ± 0%    2.935n ± 2%   -8.78% (p=0.000 n=20)
Int64N1e18-8                  3.502n ± 2%    3.778n ± 1%   +7.91% (p=0.000 n=20)
Int64N2e18-8                  4.968n ± 1%    4.359n ± 1%  -12.26% (p=0.000 n=20)
Int64N4e18-8                  4.963n ± 0%    6.546n ± 1%  +31.92% (p=0.000 n=20)
Int32N1000-8                  3.189n ± 1%    2.940n ± 2%   -7.81% (p=0.000 n=20)
Int32N1e8-8                   3.514n ± 1%    2.937n ± 2%  -16.41% (p=0.000 n=20)
Int32N1e9-8                   4.133n ± 0%    2.938n ± 0%  -28.91% (p=0.000 n=20)
Int32N2e9-8                   4.137n ± 0%    2.938n ± 2%  -28.97% (p=0.000 n=20)
Float32-8                     3.468n ± 1%    3.486n ± 0%   +0.52% (p=0.000 n=20)
Float64-8                     3.478n ± 0%    3.480n ± 0%        ~ (p=0.063 n=20)
ExpFloat64-8                  4.563n ± 0%    4.533n ± 0%   -0.67% (p=0.000 n=20)
NormFloat64-8                 4.768n ± 0%    4.764n ± 0%   -0.07% (p=0.001 n=20)
Perm3-8                       28.94n ± 0%    26.66n ± 0%   -7.88% (p=0.000 n=20)
Perm30-8                      175.9n ± 0%    143.4n ± 0%  -18.50% (p=0.000 n=20)
Perm30ViaShuffle-8            152.6n ± 1%    142.9n ± 0%   -6.29% (p=0.000 n=20)
ShuffleOverhead-8             119.6n ± 1%    120.7n ± 0%   +0.96% (p=0.000 n=20)
Concurrent-8                  2.452n ± 3%    2.360n ± 2%   -3.73% (p=0.007 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 11ad9fdddc.386 │             4d84a369d1.386             │
                        │     sec/op     │    sec/op     vs base                  │
SourceUint64-32              2.091n ± 1%    2.101n ± 2%        ~ (p=0.672 n=20)
GlobalInt64-32               3.514n ± 2%    3.518n ± 2%        ~ (p=0.723 n=20)
GlobalInt63Parallel-32      0.3197n ± 0%
GlobalInt64Parallel-32                     0.3206n ± 0%
GlobalUint64-32              3.542n ± 1%    3.538n ± 1%        ~ (p=0.304 n=20)
GlobalUint64Parallel-32     0.3218n ± 0%   0.3231n ± 0%        ~ (p=0.071 n=20)
Int64-32                     2.552n ± 2%    2.554n ± 2%        ~ (p=0.693 n=20)
Uint64-32                    2.566n ± 1%    2.575n ± 2%        ~ (p=0.606 n=20)
GlobalIntN1000-32            5.965n ± 2%    6.292n ± 1%   +5.46% (p=0.000 n=20)
IntN1000-32                  4.652n ± 1%    4.735n ± 1%   +1.77% (p=0.000 n=20)
Int64N1000-32               14.485n ± 1%    5.489n ± 2%  -62.11% (p=0.000 n=20)
Int64N1e8-32                14.675n ± 1%    5.528n ± 2%  -62.33% (p=0.000 n=20)
Int64N1e9-32                16.805n ± 2%    5.438n ± 2%  -67.64% (p=0.000 n=20)
Int64N2e9-32                14.515n ± 1%    5.474n ± 1%  -62.28% (p=0.000 n=20)
Int64N1e18-32               16.165n ± 1%    9.053n ± 1%  -44.00% (p=0.000 n=20)
Int64N2e18-32               17.945n ± 2%    9.685n ± 2%  -46.03% (p=0.000 n=20)
Int64N4e18-32                18.35n ± 2%    12.18n ± 1%  -33.62% (p=0.000 n=20)
Int32N1000-32                3.608n ± 1%    4.862n ± 1%  +34.77% (p=0.000 n=20)
Int32N1e8-32                 3.767n ± 1%    4.758n ± 2%  +26.31% (p=0.000 n=20)
Int32N1e9-32                 4.130n ± 2%    4.772n ± 1%  +15.54% (p=0.000 n=20)
Int32N2e9-32                 4.206n ± 1%    4.847n ± 0%  +15.24% (p=0.000 n=20)
Float32-32                   22.18n ± 4%    22.18n ± 4%        ~ (p=0.195 n=20)
Float64-32                   20.75n ± 4%    21.21n ± 3%        ~ (p=0.394 n=20)
ExpFloat64-32                12.58n ± 3%    12.39n ± 2%        ~ (p=0.032 n=20)
NormFloat64-32               7.920n ± 3%    7.422n ± 1%   -6.29% (p=0.000 n=20)
Perm3-32                     40.27n ± 1%    38.00n ± 2%   -5.65% (p=0.000 n=20)
Perm30-32                    213.2n ± 2%    212.7n ± 1%        ~ (p=0.995 n=20)
Perm30ViaShuffle-32          164.2n ± 2%    187.5n ± 2%  +14.22% (p=0.000 n=20)
ShuffleOverhead-32           134.7n ± 2%    159.7n ± 1%  +18.52% (p=0.000 n=20)
Concurrent-32                3.301n ± 2%    3.470n ± 0%   +5.10% (p=0.000 n=20)

For #61716.

Change-Id: Id1481b04202883cd0b23e21bb58d1bca4e482bd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/502500
Reviewed-by: Rob Pike <r@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 17:08:37 +00:00
Russ Cox
c7dddb02d3 math/rand/v2: change Source to use uint64
This should make Uint64-using functions faster and leave
other things alone. It is a mystery why so much got faster.
A good cautionary tale not to read too much into minor
jitter in the benchmarks.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.amd64 │           11ad9fdddc.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.555n ± 1%    1.335n ± 1%  -14.15% (p=0.000 n=20)
GlobalInt64-32                 2.071n ± 1%    2.046n ± 1%        ~ (p=0.016 n=20)
GlobalInt63Parallel-32        0.1023n ± 1%   0.1037n ± 1%   +1.37% (p=0.002 n=20)
GlobalUint64-32                5.193n ± 1%    2.075n ± 0%  -60.06% (p=0.000 n=20)
GlobalUint64Parallel-32       0.2341n ± 0%   0.1013n ± 1%  -56.74% (p=0.000 n=20)
Int64-32                       2.056n ± 2%    1.726n ± 2%  -16.10% (p=0.000 n=20)
Uint64-32                      2.077n ± 2%    1.673n ± 1%  -19.46% (p=0.000 n=20)
GlobalIntN1000-32              4.077n ± 2%    3.895n ± 2%   -4.45% (p=0.000 n=20)
IntN1000-32                    3.476n ± 2%    3.403n ± 1%   -2.10% (p=0.000 n=20)
Int64N1000-32                  3.059n ± 1%    3.053n ± 2%        ~ (p=0.131 n=20)
Int64N1e8-32                   2.942n ± 1%    2.718n ± 1%   -7.60% (p=0.000 n=20)
Int64N1e9-32                   2.932n ± 1%    2.712n ± 1%   -7.50% (p=0.000 n=20)
Int64N2e9-32                   2.925n ± 1%    2.690n ± 1%   -8.03% (p=0.000 n=20)
Int64N1e18-32                  3.116n ± 1%    3.084n ± 2%        ~ (p=0.425 n=20)
Int64N2e18-32                  4.067n ± 1%    4.026n ± 1%   -1.02% (p=0.007 n=20)
Int64N4e18-32                  4.054n ± 1%    4.049n ± 2%        ~ (p=0.204 n=20)
Int32N1000-32                  2.951n ± 1%    2.730n ± 0%   -7.49% (p=0.000 n=20)
Int32N1e8-32                   3.102n ± 1%    2.916n ± 2%   -6.03% (p=0.000 n=20)
Int32N1e9-32                   3.535n ± 1%    3.375n ± 1%   -4.54% (p=0.000 n=20)
Int32N2e9-32                   3.514n ± 1%    3.292n ± 1%   -6.30% (p=0.000 n=20)
Float32-32                     2.760n ± 1%    2.673n ± 1%   -3.13% (p=0.000 n=20)
Float64-32                     2.284n ± 1%    2.485n ± 1%   +8.80% (p=0.000 n=20)
ExpFloat64-32                  3.757n ± 1%    3.577n ± 2%   -4.78% (p=0.000 n=20)
NormFloat64-32                 3.837n ± 1%    3.797n ± 2%        ~ (p=0.204 n=20)
Perm3-32                       35.23n ± 2%    35.79n ± 2%        ~ (p=0.298 n=20)
Perm30-32                      208.8n ± 1%    205.1n ± 1%   -1.82% (p=0.000 n=20)
Perm30ViaShuffle-32            111.7n ± 1%    111.2n ± 2%        ~ (p=0.273 n=20)
ShuffleOverhead-32             101.1n ± 1%    100.5n ± 2%        ~ (p=0.878 n=20)
Concurrent-32                  2.108n ± 7%    2.188n ± 5%        ~ (p=0.417 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
                       │ 220860f76f.arm64 │           11ad9fdddc.arm64           │
                       │      sec/op      │    sec/op     vs base                │
SourceUint64-8                2.316n ± 1%    2.272n ± 1%   -1.86% (p=0.000 n=20)
GlobalInt64-8                 2.183n ± 1%    2.155n ± 1%        ~ (p=0.122 n=20)
GlobalInt63Parallel-8        0.4331n ± 0%   0.4352n ± 0%   +0.48% (p=0.000 n=20)
GlobalUint64-8                4.377n ± 2%    2.173n ± 1%  -50.35% (p=0.000 n=20)
GlobalUint64Parallel-8       0.9237n ± 0%   0.4340n ± 0%  -53.02% (p=0.000 n=20)
Int64-8                       2.538n ± 1%    2.544n ± 1%        ~ (p=0.189 n=20)
Uint64-8                      2.604n ± 1%    2.552n ± 1%   -1.98% (p=0.000 n=20)
GlobalIntN1000-8              3.857n ± 2%    3.856n ± 0%        ~ (p=0.051 n=20)
IntN1000-8                    3.822n ± 2%    3.820n ± 0%   -0.05% (p=0.001 n=20)
Int64N1000-8                  3.318n ± 0%    3.219n ± 2%   -2.98% (p=0.000 n=20)
Int64N1e8-8                   3.349n ± 1%    3.221n ± 2%   -3.79% (p=0.000 n=20)
Int64N1e9-8                   3.317n ± 2%    3.276n ± 2%   -1.24% (p=0.001 n=20)
Int64N2e9-8                   3.317n ± 2%    3.217n ± 0%   -3.01% (p=0.000 n=20)
Int64N1e18-8                  3.542n ± 1%    3.502n ± 2%   -1.16% (p=0.001 n=20)
Int64N2e18-8                  5.087n ± 0%    4.968n ± 1%   -2.33% (p=0.000 n=20)
Int64N4e18-8                  5.084n ± 0%    4.963n ± 0%   -2.39% (p=0.000 n=20)
Int32N1000-8                  3.208n ± 2%    3.189n ± 1%   -0.58% (p=0.001 n=20)
Int32N1e8-8                   3.610n ± 1%    3.514n ± 1%   -2.67% (p=0.000 n=20)
Int32N1e9-8                   4.235n ± 0%    4.133n ± 0%   -2.40% (p=0.000 n=20)
Int32N2e9-8                   4.229n ± 1%    4.137n ± 0%   -2.19% (p=0.000 n=20)
Float32-8                     3.468n ± 0%    3.468n ± 1%        ~ (p=0.350 n=20)
Float64-8                     3.447n ± 0%    3.478n ± 0%   +0.90% (p=0.000 n=20)
ExpFloat64-8                  4.567n ± 0%    4.563n ± 0%   -0.10% (p=0.002 n=20)
NormFloat64-8                 4.821n ± 0%    4.768n ± 0%   -1.09% (p=0.000 n=20)
Perm3-8                       28.89n ± 0%    28.94n ± 0%   +0.17% (p=0.000 n=20)
Perm30-8                      175.7n ± 0%    175.9n ± 0%   +0.14% (p=0.000 n=20)
Perm30ViaShuffle-8            153.5n ± 0%    152.6n ± 1%        ~ (p=0.010 n=20)
ShuffleOverhead-8             119.8n ± 1%    119.6n ± 1%        ~ (p=0.147 n=20)
Concurrent-8                  2.433n ± 3%    2.452n ± 3%        ~ (p=0.616 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.386 │            11ad9fdddc.386            │
                        │     sec/op     │    sec/op     vs base                │
SourceUint64-32             2.370n ±  1%    2.091n ± 1%  -11.75% (p=0.000 n=20)
GlobalInt64-32              3.569n ±  1%    3.514n ± 2%   -1.56% (p=0.000 n=20)
GlobalInt63Parallel-32     0.3221n ±  1%   0.3197n ± 0%   -0.76% (p=0.000 n=20)
GlobalUint64-32             8.797n ± 10%    3.542n ± 1%  -59.74% (p=0.000 n=20)
GlobalUint64Parallel-32    0.6351n ±  0%   0.3218n ± 0%  -49.33% (p=0.000 n=20)
Int64-32                    2.612n ±  2%    2.552n ± 2%   -2.30% (p=0.000 n=20)
Uint64-32                   3.350n ±  1%    2.566n ± 1%  -23.42% (p=0.000 n=20)
GlobalIntN1000-32           5.892n ±  1%    5.965n ± 2%        ~ (p=0.082 n=20)
IntN1000-32                 4.546n ±  1%    4.652n ± 1%   +2.33% (p=0.000 n=20)
Int64N1000-32               14.59n ±  1%    14.48n ± 1%        ~ (p=0.652 n=20)
Int64N1e8-32                14.76n ±  2%    14.67n ± 1%        ~ (p=0.836 n=20)
Int64N1e9-32                16.57n ±  1%    16.80n ± 2%        ~ (p=0.016 n=20)
Int64N2e9-32                14.54n ±  1%    14.52n ± 1%        ~ (p=0.533 n=20)
Int64N1e18-32               16.14n ±  1%    16.16n ± 1%        ~ (p=0.606 n=20)
Int64N2e18-32               18.10n ±  1%    17.95n ± 2%        ~ (p=0.062 n=20)
Int64N4e18-32               18.65n ±  1%    18.35n ± 2%   -1.61% (p=0.010 n=20)
Int32N1000-32               3.560n ±  1%    3.608n ± 1%   +1.33% (p=0.001 n=20)
Int32N1e8-32                3.770n ±  2%    3.767n ± 1%        ~ (p=0.155 n=20)
Int32N1e9-32                4.098n ±  0%    4.130n ± 2%        ~ (p=0.016 n=20)
Int32N2e9-32                4.179n ±  1%    4.206n ± 1%        ~ (p=0.011 n=20)
Float32-32                  21.18n ±  4%    22.18n ± 4%   +4.70% (p=0.003 n=20)
Float64-32                  20.60n ±  2%    20.75n ± 4%   +0.73% (p=0.000 n=20)
ExpFloat64-32               13.07n ±  0%    12.58n ± 3%   -3.82% (p=0.000 n=20)
NormFloat64-32              7.738n ±  2%    7.920n ± 3%        ~ (p=0.066 n=20)
Perm3-32                    36.73n ±  1%    40.27n ± 1%   +9.65% (p=0.000 n=20)
Perm30-32                   211.9n ±  1%    213.2n ± 2%        ~ (p=0.262 n=20)
Perm30ViaShuffle-32         165.2n ±  1%    164.2n ± 2%        ~ (p=0.029 n=20)
ShuffleOverhead-32          133.9n ±  1%    134.7n ± 2%        ~ (p=0.551 n=20)
Concurrent-32               3.287n ±  2%    3.301n ± 2%        ~ (p=0.330 n=20)

For #61716.

Change-Id: I8d2f73f87dd3603a0c2ff069988938e0957b6904
Reviewed-on: https://go-review.googlesource.com/c/go/+/502499
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:08:34 +00:00
Ubuntu
8fc043ccfa cmd/compile: optimize right shifts of int32 on riscv64
The compiler is currently sign extending 32 bit signed integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit signed values (sraw and sraiw) which sign extend
the result of the shift to 64 bits.  Change the compiler so that
it uses sraw and sraiw for shifts of signed 32 bit integers, reducing
in most cases the number of instructions needed to perform the shift.

Here are some examples of code sequences that are changed by this
patch:

int32(a) >> 2

  before:

    sll     x5,x10,0x20
    sra     x10,x5,0x22

  after:

    sraw    x10,x10,0x2

int32(v) >> int(s)

  before:

    sext.w  x5,x10
    sltiu   x6,x11,64
    add     x6,x6,-1
    or      x6,x11,x6
    sra     x10,x5,x6

  after:

    sltiu   x5,x11,32
    add     x5,x5,-1
    or      x5,x11,x5
    sraw    x10,x10,x5

int32(v) >> (int(s) & 31)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

  after:

    and     x5,x11,31
    sraw    x10,x10,x5

int32(100) >> int(a)

  before:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,64
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sra     x10,x6,x5

  after:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,32
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sraw    x10,x6,x5

int32(v) >> (int(s) & 63)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

  after:

    and     x5,x11,63
    sltiu   x6,x5,32
    add     x6,x6,-1
    or      x5,x5,x6
    sraw    x10,x10,x5

In most cases we eliminate one instruction.  In the case where
we shift an int32 constant by a variable, the number of instructions
generated is identical: an sra is simply replaced by an sraw.  In the
unusual case where we shift right by a variable anded with a constant
> 31 but < 64, we generate two additional instructions.  As this is
an unusual case we do not try to optimize for it.

Some improvements can be seen in some of the existing benchmarks,
notably in the utf8 package which performs right shifts of runes
which are signed 32 bit integers.

                      |  utf8-old   |              utf8-new            |
                      |   sec/op    |   sec/op     vs base             |
EncodeASCIIRune-4       17.68n ± 0%   17.67n ± 0%       ~ (p=0.312 n=10)
EncodeJapaneseRune-4    35.34n ± 0%   34.53n ± 1%  -2.31% (p=0.000 n=10)
AppendASCIIRune-4       3.213n ± 0%   3.213n ± 0%       ~ (p=0.318 n=10)
AppendJapaneseRune-4    36.14n ± 0%   35.35n ± 0%  -2.19% (p=0.000 n=10)
DecodeASCIIRune-4       28.11n ± 0%   27.36n ± 0%  -2.69% (p=0.000 n=10)
DecodeJapaneseRune-4    38.55n ± 0%   38.58n ± 0%       ~ (p=0.612 n=10)

Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/535115
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
2023-10-30 14:47:06 +00:00
Russ Cox
1f4db9dbd6 math/rand/v2: update benchmarks
Change the benchmarks to use the result of the calls,
as I found that in certain cases inlining resulted in
discarding part of the computation in the benchmark loop.
Add various benchmarks that will be relevant in future CLs.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.amd64 │
                        │      sec/op      │
SourceUint64-32                1.555n ± 1%
GlobalInt64-32                 2.071n ± 1%
GlobalInt63Parallel-32        0.1023n ± 1%
GlobalUint64-32                5.193n ± 1%
GlobalUint64Parallel-32       0.2341n ± 0%
Int64-32                       2.056n ± 2%
Uint64-32                      2.077n ± 2%
GlobalIntN1000-32              4.077n ± 2%
IntN1000-32                    3.476n ± 2%
Int64N1000-32                  3.059n ± 1%
Int64N1e8-32                   2.942n ± 1%
Int64N1e9-32                   2.932n ± 1%
Int64N2e9-32                   2.925n ± 1%
Int64N1e18-32                  3.116n ± 1%
Int64N2e18-32                  4.067n ± 1%
Int64N4e18-32                  4.054n ± 1%
Int32N1000-32                  2.951n ± 1%
Int32N1e8-32                   3.102n ± 1%
Int32N1e9-32                   3.535n ± 1%
Int32N2e9-32                   3.514n ± 1%
Float32-32                     2.760n ± 1%
Float64-32                     2.284n ± 1%
ExpFloat64-32                  3.757n ± 1%
NormFloat64-32                 3.837n ± 1%
Perm3-32                       35.23n ± 2%
Perm30-32                      208.8n ± 1%
Perm30ViaShuffle-32            111.7n ± 1%
ShuffleOverhead-32             101.1n ± 1%
Concurrent-32                  2.108n ± 7%

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 220860f76f.arm64 │
                       │      sec/op      │
SourceUint64-8                2.316n ± 1%
GlobalInt64-8                 2.183n ± 1%
GlobalInt63Parallel-8        0.4331n ± 0%
GlobalUint64-8                4.377n ± 2%
GlobalUint64Parallel-8       0.9237n ± 0%
Int64-8                       2.538n ± 1%
Uint64-8                      2.604n ± 1%
GlobalIntN1000-8              3.857n ± 2%
IntN1000-8                    3.822n ± 2%
Int64N1000-8                  3.318n ± 0%
Int64N1e8-8                   3.349n ± 1%
Int64N1e9-8                   3.317n ± 2%
Int64N2e9-8                   3.317n ± 2%
Int64N1e18-8                  3.542n ± 1%
Int64N2e18-8                  5.087n ± 0%
Int64N4e18-8                  5.084n ± 0%
Int32N1000-8                  3.208n ± 2%
Int32N1e8-8                   3.610n ± 1%
Int32N1e9-8                   4.235n ± 0%
Int32N2e9-8                   4.229n ± 1%
Float32-8                     3.468n ± 0%
Float64-8                     3.447n ± 0%
ExpFloat64-8                  4.567n ± 0%
NormFloat64-8                 4.821n ± 0%
Perm3-8                       28.89n ± 0%
Perm30-8                      175.7n ± 0%
Perm30ViaShuffle-8            153.5n ± 0%
ShuffleOverhead-8             119.8n ± 1%
Concurrent-8                  2.433n ± 3%

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.386 │
                        │     sec/op     │
SourceUint64-32             2.370n ±  1%
GlobalInt64-32              3.569n ±  1%
GlobalInt63Parallel-32     0.3221n ±  1%
GlobalUint64-32             8.797n ± 10%
GlobalUint64Parallel-32    0.6351n ±  0%
Int64-32                    2.612n ±  2%
Uint64-32                   3.350n ±  1%
GlobalIntN1000-32           5.892n ±  1%
IntN1000-32                 4.546n ±  1%
Int64N1000-32               14.59n ±  1%
Int64N1e8-32                14.76n ±  2%
Int64N1e9-32                16.57n ±  1%
Int64N2e9-32                14.54n ±  1%
Int64N1e18-32               16.14n ±  1%
Int64N2e18-32               18.10n ±  1%
Int64N4e18-32               18.65n ±  1%
Int32N1000-32               3.560n ±  1%
Int32N1e8-32                3.770n ±  2%
Int32N1e9-32                4.098n ±  0%
Int32N2e9-32                4.179n ±  1%
Float32-32                  21.18n ±  4%
Float64-32                  20.60n ±  2%
ExpFloat64-32               13.07n ±  0%
NormFloat64-32              7.738n ±  2%
Perm3-32                    36.73n ±  1%
Perm30-32                   211.9n ±  1%
Perm30ViaShuffle-32         165.2n ±  1%
ShuffleOverhead-32          133.9n ±  1%
Concurrent-32               3.287n ±  2%

For #61716.

Change-Id: I2f0938eae4b7bf736a8cd899a99783e731bf2179
Reviewed-on: https://go-review.googlesource.com/c/go/+/502496
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:32:20 +00:00
Russ Cox
1cc5b34d28 math/rand/v2: remove Rand.Seed
Removing Rand.Seed lets us remove lockedSource as well,
along with the ambiguity in globalRand about which source
to use.

For #61716.

Change-Id: Ibe150520dd1e7dd87165eacaebe9f0c2daeaedfd
Reviewed-on: https://go-review.googlesource.com/c/go/+/502498
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2023-10-30 14:31:46 +00:00
Russ Cox
48bd1fc93b math/rand/v2: clean up regression test
Add more test cases.
Replace -printgolden with -update,
which rewrites the files for us.

For #61716.

Change-Id: I7c4c900ee896042429135a21971a56ebe16b6a66
Reviewed-on: https://go-review.googlesource.com/c/go/+/516858
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:30:24 +00:00
Russ Cox
d6c1ef52ad math/rand/v2: remove Read
In math/rand, Read is deprecated. Remove in v2.
People should use crypto/rand if they need long strings.

For #61716.

Change-Id: Ib254b7e1844616e96db60a3a7abb572b0dcb1583
Reviewed-on: https://go-review.googlesource.com/c/go/+/502497
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:30:14 +00:00
Russ Cox
d42750b17c math/rand/v2: rename various functions
Int31 -> Int32
Int31n -> Int32N
Int63 -> Int64
Int63n -> Int64N
Intn -> IntN

The 31 and 63 are pedantic and confusing: the functions should
be named for the type they return, same as all the others.

The lower-case n is inconsistent with Go's usual CamelCase
and especially problematic because we plan to add 'func N'.
Capitalize the n.

For #61716.

Change-Id: Idb1a005a82f353677450d47fb612ade7a41fde69
Reviewed-on: https://go-review.googlesource.com/c/go/+/516857
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:29:37 +00:00
Russ Cox
59f0ab4036 math/rand/v2: start of new API
This is the beginning of the math/rand/v2 package from proposal #61716.
Start by copying old API. This CL copies math/rand/* to math/rand/v2
and updates references to math/rand to add v2 throughout.
Later CLs will make the v2 changes.

For #61716.

Change-Id: I1624ccffae3dfa442d4ba2461942decbd076e11b
Reviewed-on: https://go-review.googlesource.com/c/go/+/502495
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 14:29:30 +00:00
Cherry Mui
8c92897e15 cmd/compile: rework TestPGOHash to not rebuild dependencies
TestPGOHash may rebuild dependencies as we pass -trimpath to the
go command. This CL makes it pass -trimpath compiler flag to only
the current package instead, as we only need the current package
to have a stable source file path.

Also refactor buildPGOInliningTest to only take compiler flags,
not go flags, to avoid accidental rebuild.

Should fix #63733.

Change-Id: Iec6c4e90cf659790e21083ee2e697f518234c5b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/535915
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-27 17:54:18 +00:00
Cherry Mui
5613882df7 internal/testenv: use cmd.Environ in CleanCmdEnv
In CleanCmdEnv, use cmd.Environ instead of os.Environ, so it
sets the PWD environment variable if cmd.Dir is set. This ensures
the child process sees a canonical path for its working directory.

Change-Id: Ia769552a488dc909eaf6bb7d21937adba06d1072
Reviewed-on: https://go-review.googlesource.com/c/go/+/538215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2023-10-27 17:53:23 +00:00