mirror of https://github.com/golang/go synced 2024-11-06 11:26:12 -07:00
go/src
nimelehin dcfe57b8c2 runtime: use ERMS in memclr_amd64
This patch adds support for REP STOSB in memclr(). The
implementation now uses REP STOSB when 1) ERMS is supported and
2) the size is larger than 2 KB and smaller than 32 MB.
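
The selection rule above can be sketched in Go as a simple predicate.
This is only an illustration of the two conditions in the description:
the real change lives in the amd64 assembly, the names useREPSTOSB,
ermsMinSize and ermsMaxSize are hypothetical, and whether the actual
bounds are inclusive is an assumption here.

```go
package main

import "fmt"

// Thresholds taken from the commit description; the constant names are
// hypothetical and do not appear in the runtime.
const (
	ermsMinSize = 2 << 10  // 2 KB
	ermsMaxSize = 32 << 20 // 32 MB
)

// useREPSTOSB reports whether a clear of n bytes would take the
// REP STOSB path under the two conditions described above.
// Assumption: both bounds are exclusive.
func useREPSTOSB(hasERMS bool, n uintptr) bool {
	return hasERMS && n > ermsMinSize && n < ermsMaxSize
}

func main() {
	for _, n := range []uintptr{256, 4096, 16 << 20, 64 << 20} {
		fmt.Printf("n=%d repstosb=%v\n", n, useREPSTOSB(true, n))
	}
}
```

Under these assumptions, only the mid-sized clears (4096 bytes and
16 MB in the loop above) would select REP STOSB; smaller and larger
sizes fall back to the existing paths.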

The threshold of 2kb is chosen based on benchmark results and is
close to what Intel mentioned in their comparison of ERMSB and AVX
(Table 3-4. Relative Performance of Memcpy() Using ERMSB Vs. 128-bit
AVX in the Intel Optimization Guide).

Although REP STOS uses a no-RFO write protocol, ERMS can perform the
same as or slower than Non-Temporal Stores when the size is bigger
than the LLC, depending on the hardware.

Benchmarks (including MemclrRange from CL373362)
goos: darwin
goarch: amd64
pkg: runtime
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
name                           old time/op    new time/op    delta
Memclr/5-12                      1.90ns ± 2%    2.13ns ± 2%  +11.72%  (p=0.001 n=7+7)
Memclr/16-12                     2.33ns ± 4%    2.36ns ± 4%     ~     (p=0.259 n=7+7)
Memclr/64-12                     2.58ns ± 2%    2.61ns ± 3%     ~     (p=0.091 n=7+7)
Memclr/256-12                    4.89ns ± 4%    4.94ns ± 3%     ~     (p=0.620 n=7+7)
Memclr/4096-12                   38.4ns ± 2%    39.5ns ± 5%     ~     (p=0.078 n=7+7)
Memclr/65536-12                   929ns ± 2%    1040ns ±19%     ~     (p=0.268 n=5+7)
Memclr/1M-12                     24.2µs ± 5%    19.0µs ± 9%  -21.62%  (p=0.001 n=7+7)
Memclr/4M-12                     93.3µs ± 3%    73.2µs ± 4%  -21.50%  (p=0.001 n=7+7)
Memclr/8M-12                      209µs ± 6%     164µs ± 3%  -21.55%  (p=0.001 n=7+7)
Memclr/16M-12                     731µs ± 4%     507µs ± 6%  -30.71%  (p=0.001 n=7+7)
Memclr/64M-12                    1.79ms ± 1%    1.83ms ± 3%   +2.47%  (p=0.041 n=6+6)
MemclrRange/1_2_47K-12            873ns ± 3%     899ns ± 5%     ~     (p=0.053 n=7+7)
MemclrRange/2_8_166K-12          2.98µs ± 4%    2.90µs ± 5%     ~     (p=0.165 n=7+7)
MemclrRange/4_16_315K-12         6.81µs ± 4%    5.31µs ± 9%  -22.01%  (p=0.001 n=7+7)
MemclrRange/128_256_1623K-12     37.5µs ± 4%    28.1µs ± 4%  -25.19%  (p=0.001 n=7+6)
[Geo mean]                       1.56µs         1.43µs        -8.43%

name                           old speed      new speed      delta
Memclr/5-12                    2.63GB/s ± 2%  2.35GB/s ± 2%  -10.50%  (p=0.001 n=7+7)
Memclr/16-12                   6.86GB/s ± 4%  6.79GB/s ± 4%     ~     (p=0.259 n=7+7)
Memclr/64-12                   24.8GB/s ± 2%  24.5GB/s ± 3%     ~     (p=0.097 n=7+7)
Memclr/256-12                  52.4GB/s ± 4%  51.9GB/s ± 3%     ~     (p=0.620 n=7+7)
Memclr/4096-12                  107GB/s ± 2%   104GB/s ± 5%     ~     (p=0.073 n=7+7)
Memclr/65536-12                70.6GB/s ± 2%  64.2GB/s ±21%     ~     (p=0.268 n=5+7)
Memclr/1M-12                   43.4GB/s ± 5%  55.5GB/s ±10%  +28.04%  (p=0.001 n=7+7)
Memclr/4M-12                   45.0GB/s ± 4%  57.3GB/s ± 4%  +27.38%  (p=0.001 n=7+7)
Memclr/8M-12                   40.1GB/s ± 5%  51.1GB/s ± 3%  +27.37%  (p=0.001 n=7+7)
Memclr/16M-12                  23.0GB/s ± 4%  33.1GB/s ± 6%  +44.39%  (p=0.001 n=7+7)
Memclr/64M-12                  37.6GB/s ± 1%  36.7GB/s ± 3%   -2.38%  (p=0.041 n=6+6)
MemclrRange/1_2_47K-12         55.9GB/s ± 3%  54.3GB/s ± 5%     ~     (p=0.053 n=7+7)
MemclrRange/2_8_166K-12        57.4GB/s ± 5%  58.9GB/s ± 5%     ~     (p=0.165 n=7+7)
MemclrRange/4_16_315K-12       47.4GB/s ± 4%  60.9GB/s ± 9%  +28.40%  (p=0.001 n=7+7)
MemclrRange/128_256_1623K-12   44.3GB/s ± 4%  58.4GB/s ± 9%  +31.73%  (p=0.001 n=7+7)
[Geo mean]                     33.6GB/s       36.8GB/s        +9.27%

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) Gold 6230N CPU @ 2.30GHz
name                     old time/op    new time/op     delta
Memclr/5-2                 2.53ns ± 0%     2.52ns ± 0%   -0.25%  (p=0.001 n=7+7)
Memclr/16-2                2.77ns ± 0%     2.55ns ± 0%   -7.97%  (p=0.000 n=5+7)
Memclr/64-2                3.16ns ± 0%     3.16ns ± 0%     ~     (p=0.432 n=7+7)
Memclr/256-2               7.26ns ± 0%     7.26ns ± 0%     ~     (p=0.220 n=7+7)
Memclr/4096-2              49.3ns ± 0%     43.5ns ± 0%  -11.80%  (p=0.001 n=7+7)
Memclr/65536-2             1.32µs ± 1%     1.24µs ± 0%   -6.31%  (p=0.001 n=7+7)
Memclr/1M-2                27.3µs ± 0%     26.6µs ± 5%     ~     (p=0.195 n=7+7)
Memclr/4M-2                 195µs ± 0%      148µs ± 4%  -24.22%  (p=0.001 n=7+7)
Memclr/8M-2                 391µs ± 0%      308µs ± 0%  -21.09%  (p=0.001 n=7+6)
Memclr/16M-2                782µs ± 0%      639µs ± 1%  -18.31%  (p=0.001 n=7+7)
Memclr/64M-2               2.83ms ± 1%     2.84ms ± 1%     ~     (p=0.620 n=7+7)
MemclrRange/1K_2K-2        1.24µs ± 0%     1.24µs ± 0%     ~     (p=1.000 n=7+6)
MemclrRange/2K_8K-2        3.89µs ± 0%     3.11µs ± 0%  -20.00%  (p=0.001 n=6+7)
MemclrRange/4K_16K-2       3.63µs ± 0%     2.37µs ± 0%  -34.61%  (p=0.001 n=7+7)
MemclrRange/160K_228K-2    31.0µs ± 0%     30.6µs ± 1%   -1.50%  (p=0.001 n=7+7)
[Geo mean]                 1.97µs          1.76µs       -10.59%

name                     old speed      new speed       delta
Memclr/5-2               1.98GB/s ± 0%   1.98GB/s ± 0%   +0.27%  (p=0.001 n=7+7)
Memclr/16-2              5.78GB/s ± 0%   6.28GB/s ± 0%   +8.67%  (p=0.001 n=7+7)
Memclr/64-2              20.2GB/s ± 0%   20.3GB/s ± 0%     ~     (p=0.535 n=7+7)
Memclr/256-2             35.3GB/s ± 0%   35.2GB/s ± 0%     ~     (p=0.259 n=7+7)
Memclr/4096-2            83.1GB/s ± 0%   94.2GB/s ± 0%  +13.39%  (p=0.001 n=7+7)
Memclr/65536-2           49.7GB/s ± 1%   53.0GB/s ± 0%   +6.73%  (p=0.001 n=7+7)
Memclr/1M-2              38.4GB/s ± 0%   39.4GB/s ± 4%     ~     (p=0.209 n=7+7)
Memclr/4M-2              21.5GB/s ± 0%   28.4GB/s ± 4%  +32.02%  (p=0.001 n=7+7)
Memclr/8M-2              21.5GB/s ± 0%   27.2GB/s ± 0%  +26.73%  (p=0.001 n=7+6)
Memclr/16M-2             21.4GB/s ± 0%   26.2GB/s ± 1%  +22.42%  (p=0.001 n=7+7)
Memclr/64M-2             23.7GB/s ± 1%   23.7GB/s ± 1%     ~     (p=0.620 n=7+7)
MemclrRange/1K_2K-2      77.3GB/s ± 0%   77.3GB/s ± 0%     ~     (p=0.710 n=7+7)
MemclrRange/2K_8K-2      85.7GB/s ± 0%  107.1GB/s ± 0%  +25.00%  (p=0.001 n=6+7)
MemclrRange/4K_16K-2     89.0GB/s ± 0%  136.1GB/s ± 0%  +52.92%  (p=0.001 n=7+7)
MemclrRange/160K_228K-2  53.6GB/s ± 0%   54.4GB/s ± 1%   +1.52%  (p=0.001 n=7+7)
[Geo mean]               29.2GB/s        32.7GB/s       +11.86%

Change-Id: I8f3533f88ebd303ae1666a77391fec304bea9724
Reviewed-on: https://go-review.googlesource.com/c/go/+/374396
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-09 17:41:45 +00:00
archive archive/zip: permit zip files to have prefixes 2022-05-08 17:26:10 +00:00
bufio bufio: clarify io.EOF behavior of Reader.Read 2022-05-02 21:34:37 +00:00
builtin
bytes internal/bytealg: optimize index function for ppc64le/power9 2022-05-09 12:02:02 +00:00
cmd cmd/compile: update comment/message that mention betypeinit 2022-05-09 17:37:08 +00:00
compress compress/flate: cancel redundant operations 2022-05-08 17:05:16 +00:00
container
context
crypto all: fix some lint issues 2022-05-08 17:27:54 +00:00
database/sql all: fix some lint issues 2022-05-08 17:27:54 +00:00
debug all: fix some lint issues 2022-05-08 17:27:54 +00:00
embed
encoding encoding/base32: decoder output depends on chunking of underlying reader 2022-05-03 18:30:15 +00:00
errors
expvar
flag
fmt
go all: fix some lint issues 2022-05-08 17:27:54 +00:00
hash hash/maphash: use fastrand64 in MakeSeed 2022-04-21 17:46:04 +00:00
html
image
index/suffixarray
internal internal/abi, internal/buildcfg: enable regabi on riscv64 by default 2022-05-09 15:37:44 +00:00
io io: add an Err field to LimitedReader 2022-05-04 20:06:32 +00:00
log
math all: fix some lint issues 2022-05-08 17:27:54 +00:00
mime
net all: fix some lint issues 2022-05-08 17:27:54 +00:00
os os/exec: refactor goroutine communication in Wait 2022-05-06 22:04:35 +00:00
path path/filepath: simplify EvalSymlinks for plan9 2022-05-09 14:44:54 +00:00
plugin
reflect reflect: implement float32 for regabi riscv64 2022-05-04 13:38:32 +00:00
regexp regexp/syntax: fix typo in comment 2022-04-29 01:00:55 +00:00
runtime runtime: use ERMS in memclr_amd64 2022-05-09 17:41:45 +00:00
sort slices: use !{{Less}} instead of {{GreaterOrEqual}} 2022-04-25 19:12:14 +00:00
strconv
strings strings: adding micro-optimization for TrimSpace 2022-04-29 02:01:27 +00:00
sync sync: remove the redundant logic on sync.(*Pool).Put 2022-05-08 17:23:05 +00:00
syscall internal/poll, net, syscall: use accept4 on solaris 2022-05-03 14:38:32 +00:00
testdata
testing all: fix some lint issues 2022-05-08 17:27:54 +00:00
text
time time: return ENOENT instead of ERROR_PATH_NOT_FOUND in windows 2022-05-08 17:19:07 +00:00
unicode
unsafe
vendor vendor, cmd/vendor: update to current x/sys repo 2022-05-03 19:48:07 +00:00
all.bash
all.bat
all.rc
bootstrap.bash cmd/trace: embed static content 2022-04-21 21:18:18 +00:00
buildall.bash
clean.bash
clean.bat
clean.rc
cmp.bash
go.mod vendor, cmd/vendor: update to current x/sys repo 2022-05-03 19:48:07 +00:00
go.sum vendor, cmd/vendor: update to current x/sys repo 2022-05-03 19:48:07 +00:00
make.bash
make.bat
Make.dist
make.rc
race.bash cmd,runtime: enable race detector on s390x 2022-05-04 14:17:20 +00:00
race.bat
README.vendor
run.bash
run.bat
run.rc

Vendoring in std and cmd
========================

The Go command maintains copies of external packages needed by the
standard library in the src/vendor and src/cmd/vendor directories.

In GOPATH mode, imports of vendored packages are resolved to these
directories following normal vendor directory logic
(see golang.org/s/go15vendor).

In module mode, std and cmd are modules (defined in src/go.mod and
src/cmd/go.mod). When a package outside std or cmd is imported
by a package inside std or cmd, the import path is interpreted
as if it had a "vendor/" prefix. For example, within "crypto/tls",
an import of "golang.org/x/crypto/cryptobyte" resolves to
"vendor/golang.org/x/crypto/cryptobyte". When a package with the
same path is imported from a package outside std or cmd, it will
be resolved normally. Consequently, a binary may be built with two
copies of a package at different versions if the package is
imported normally and vendored by the standard library.

Vendored packages are internally renamed with a "vendor/" prefix
to preserve the invariant that all packages have distinct paths.
This is necessary to avoid compiler and linker conflicts. Adding
a "vendor/" prefix also maintains the invariant that standard
library packages begin with a dotless path element.
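
As a rough illustration of the resolution rule described above, the
following Go sketch rewrites an import path the way the text describes.
It is not the go command's implementation: the function names are
hypothetical, and it assumes that "inside std or cmd" can be
approximated by checking for a dotless first path element.

```go
package main

import (
	"fmt"
	"strings"
)

// firstElemHasDot reports whether the first element of an import path
// contains a dot, e.g. "golang.org/x/net" but not "crypto/tls".
func firstElemHasDot(path string) bool {
	first, _, _ := strings.Cut(path, "/")
	return strings.Contains(first, ".")
}

// resolveImport returns the path an import resolves to.
// Assumption (heuristic): an importer whose first path element is
// dotless is treated as being inside std or cmd.
func resolveImport(importer, path string) string {
	importerInStd := !firstElemHasDot(importer)
	if importerInStd && firstElemHasDot(path) {
		// External package imported from std or cmd: add the prefix.
		return "vendor/" + path
	}
	// Everything else is resolved normally.
	return path
}

func main() {
	fmt.Println(resolveImport("crypto/tls", "golang.org/x/crypto/cryptobyte"))
	fmt.Println(resolveImport("example.com/app", "golang.org/x/crypto/cryptobyte"))
	fmt.Println(resolveImport("crypto/tls", "crypto/x509"))
}
```

The second call shows why a binary can end up with two copies of the
same package: the import from outside std keeps its original path,
while the import from inside std gets the "vendor/" prefix.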

The module requirements of std and cmd do not influence version
selection in other modules. They are only considered when running
module commands like 'go get' and 'go mod vendor' from a directory
in GOROOT/src.

Maintaining vendor directories
==============================

Before updating vendor directories, ensure that module mode is enabled.
Make sure GO111MODULE=off is not set ('on' or 'auto' should work).

Requirements may be added, updated, and removed with 'go get'.
The vendor directory may be updated with 'go mod vendor'.
A typical sequence might be:

    cd src
    go get -d golang.org/x/net@latest
    go mod tidy
    go mod vendor

Use caution when passing '-u' to 'go get'. The '-u' flag updates
modules providing all transitively imported packages, not only
the module providing the target package.

Note that 'go mod vendor' only copies packages that are transitively
imported by packages in the current module. If a new package is needed,
it should be imported before running 'go mod vendor'.