1
0
mirror of https://github.com/golang/go synced 2024-09-30 06:34:29 -06:00
go/src/bytes
Carlo Alberto Ferraris dcb90152a4 bytes,strings: optimize Repeat
When generating long strings or slices with Repeat we
currently reuse intermediate states as a way to quickly
build exponentially longer results.

This works well as long as the intermediate states fit into
the processor D-cache. If they don't we start thrashing the
D-cache by reading in the whole intermediate state over and
over on each iteration.

Instead, once we reach a large enough intermediate state (that
allows the memcpy operation to perform at peak) we cap the
size of chunk of the state that is used as source for subsequent
appends. This ensures that this smaller source chunk is always
present in the D-cache, and the append operation does not need
to read the state contents from memory.

Currently the cap is set to 8KB, a number derived via
experimentation to yield the highest performance across a
a large range of result sizes. Slightly higher caps also
produced similar results: 8KB was chosen as the smallest one
in this performance plateau with the intention to minimize
D-cache pollution.

For result sizes larger than the fastest cache levels we get
significantly higher performance compared to the current
implementation:
strings:
name                            old speed      new speed      delta
RepeatLarge/256/1-16            1.73GB/s ± 1%  1.73GB/s ± 0%      ~     (p=0.556 n=5+4)
RepeatLarge/256/16-16           2.02GB/s ± 0%  1.95GB/s ± 8%      ~     (p=0.222 n=5+5)
RepeatLarge/512/1-16            2.30GB/s ±13%  2.47GB/s ± 1%      ~     (p=0.548 n=5+5)
RepeatLarge/512/16-16           2.38GB/s ±16%  2.77GB/s ± 1%   +16.27%  (p=0.032 n=5+5)
RepeatLarge/1024/1-16           3.17GB/s ± 1%  3.18GB/s ± 0%      ~     (p=0.730 n=4+5)
RepeatLarge/1024/16-16          3.39GB/s ± 2%  3.38GB/s ± 1%      ~     (p=0.548 n=5+5)
RepeatLarge/2048/1-16           3.32GB/s ± 2%  3.32GB/s ± 2%      ~     (p=1.000 n=5+5)
RepeatLarge/2048/16-16          3.41GB/s ± 4%  3.46GB/s ± 2%      ~     (p=0.310 n=5+5)
RepeatLarge/4096/1-16           3.60GB/s ± 4%  3.67GB/s ± 3%      ~     (p=0.690 n=5+5)
RepeatLarge/4096/16-16          3.74GB/s ± 3%  3.71GB/s ± 5%      ~     (p=0.690 n=5+5)
RepeatLarge/8192/1-16           3.94GB/s ± 4%  4.01GB/s ± 1%      ~     (p=0.222 n=5+5)
RepeatLarge/8192/16-16          3.94GB/s ± 6%  4.05GB/s ± 1%      ~     (p=0.222 n=5+5)
RepeatLarge/8192/4097-16        4.25GB/s ± 6%  4.32GB/s ± 3%      ~     (p=0.690 n=5+5)
RepeatLarge/16384/1-16          4.96GB/s ± 1%  5.02GB/s ± 2%      ~     (p=0.421 n=5+5)
RepeatLarge/16384/16-16         4.99GB/s ± 2%  5.07GB/s ± 1%      ~     (p=0.421 n=5+5)
RepeatLarge/16384/4097-16       5.15GB/s ± 3%  5.17GB/s ± 1%      ~     (p=1.000 n=5+5)
RepeatLarge/32768/1-16          5.44GB/s ± 2%  5.42GB/s ± 1%      ~     (p=0.841 n=5+5)
RepeatLarge/32768/16-16         5.46GB/s ± 4%  5.44GB/s ± 1%      ~     (p=0.905 n=5+4)
RepeatLarge/32768/4097-16       4.84GB/s ± 2%  4.59GB/s ±12%    -5.05%  (p=0.032 n=5+5)
RepeatLarge/65536/1-16          5.85GB/s ± 0%  5.84GB/s ± 1%      ~     (p=0.690 n=5+5)
RepeatLarge/65536/16-16         5.81GB/s ± 2%  5.84GB/s ± 2%      ~     (p=0.421 n=5+5)
RepeatLarge/65536/4097-16       5.38GB/s ± 6%  5.45GB/s ± 1%      ~     (p=1.000 n=5+5)
RepeatLarge/131072/1-16         6.20GB/s ± 1%  6.31GB/s ± 1%    +1.80%  (p=0.008 n=5+5)
RepeatLarge/131072/16-16        6.12GB/s ± 3%  6.25GB/s ± 3%      ~     (p=0.095 n=5+5)
RepeatLarge/131072/4097-16      5.95GB/s ± 1%  5.85GB/s ±10%      ~     (p=1.000 n=5+5)
RepeatLarge/262144/1-16         6.33GB/s ± 1%  6.56GB/s ± 0%    +3.62%  (p=0.016 n=5+4)
RepeatLarge/262144/16-16        6.42GB/s ± 0%  6.65GB/s ± 1%    +3.58%  (p=0.016 n=4+5)
RepeatLarge/262144/4097-16      6.31GB/s ± 1%  6.44GB/s ± 1%    +1.94%  (p=0.008 n=5+5)
RepeatLarge/524288/1-16         6.23GB/s ± 1%  6.92GB/s ± 3%   +11.02%  (p=0.008 n=5+5)
RepeatLarge/524288/16-16        6.24GB/s ± 1%  6.97GB/s ± 2%   +11.77%  (p=0.016 n=4+5)
RepeatLarge/524288/4097-16      6.14GB/s ± 2%  6.73GB/s ± 3%    +9.50%  (p=0.008 n=5+5)
RepeatLarge/1048576/1-16        5.23GB/s ± 1%  6.53GB/s ± 6%   +24.85%  (p=0.008 n=5+5)
RepeatLarge/1048576/16-16       5.21GB/s ± 1%  6.56GB/s ± 4%   +25.93%  (p=0.008 n=5+5)
RepeatLarge/1048576/4097-16     5.22GB/s ± 1%  6.26GB/s ± 2%   +20.09%  (p=0.008 n=5+5)
RepeatLarge/2097152/1-16        3.95GB/s ± 1%  5.96GB/s ± 1%   +51.01%  (p=0.008 n=5+5)
RepeatLarge/2097152/16-16       3.94GB/s ± 1%  5.98GB/s ± 2%   +51.99%  (p=0.008 n=5+5)
RepeatLarge/2097152/4097-16     4.94GB/s ± 1%  5.71GB/s ± 2%   +15.63%  (p=0.008 n=5+5)
RepeatLarge/4194304/1-16        3.10GB/s ± 1%  5.89GB/s ± 1%   +89.90%  (p=0.008 n=5+5)
RepeatLarge/4194304/16-16       3.09GB/s ± 1%  5.86GB/s ± 1%   +89.89%  (p=0.008 n=5+5)
RepeatLarge/4194304/4097-16     3.13GB/s ± 1%  5.89GB/s ± 1%   +88.36%  (p=0.008 n=5+5)
RepeatLarge/8388608/1-16        3.06GB/s ± 1%  6.31GB/s ±16%  +105.84%  (p=0.008 n=5+5)
RepeatLarge/8388608/16-16       3.08GB/s ± 1%  6.62GB/s ± 1%  +114.66%  (p=0.008 n=5+5)
RepeatLarge/8388608/4097-16     3.13GB/s ± 2%  6.87GB/s ± 1%  +119.62%  (p=0.008 n=5+5)
RepeatLarge/16777216/1-16       3.21GB/s ± 3%  5.88GB/s ± 1%   +83.27%  (p=0.008 n=5+5)
RepeatLarge/16777216/16-16      3.23GB/s ± 2%  5.84GB/s ± 2%   +80.49%  (p=0.008 n=5+5)
RepeatLarge/16777216/4097-16    3.30GB/s ± 6%  5.88GB/s ± 2%   +78.18%  (p=0.008 n=5+5)
RepeatLarge/33554432/1-16       3.71GB/s ± 3%  5.91GB/s ± 2%   +59.17%  (p=0.008 n=5+5)
RepeatLarge/33554432/16-16      3.67GB/s ± 3%  5.91GB/s ± 2%   +61.13%  (p=0.008 n=5+5)
RepeatLarge/33554432/4097-16    3.71GB/s ± 1%  5.77GB/s ± 6%   +55.51%  (p=0.008 n=5+5)
RepeatLarge/67108864/1-16       4.61GB/s ±11%  6.00GB/s ± 5%   +30.15%  (p=0.008 n=5+5)
RepeatLarge/67108864/16-16      4.62GB/s ± 7%  6.11GB/s ± 2%   +32.35%  (p=0.008 n=5+5)
RepeatLarge/67108864/4097-16    4.71GB/s ± 2%  6.24GB/s ± 2%   +32.60%  (p=0.008 n=5+5)
RepeatLarge/134217728/1-16      4.53GB/s ± 8%  6.28GB/s ±11%   +38.57%  (p=0.008 n=5+5)
RepeatLarge/134217728/16-16     4.78GB/s ± 3%  6.36GB/s ± 3%   +33.16%  (p=0.008 n=5+5)
RepeatLarge/134217728/4097-16   4.73GB/s ± 6%  6.46GB/s ± 3%   +36.63%  (p=0.008 n=5+5)
RepeatLarge/268435456/1-16      4.09GB/s ±25%  6.37GB/s ±19%   +56.00%  (p=0.008 n=5+5)
RepeatLarge/268435456/16-16     4.50GB/s ± 4%  6.86GB/s ± 0%   +52.49%  (p=0.016 n=5+4)
RepeatLarge/268435456/4097-16   4.73GB/s ± 5%  6.90GB/s ± 0%   +45.94%  (p=0.008 n=5+5)
RepeatLarge/536870912/1-16      4.38GB/s ±36%  6.52GB/s ± 8%   +48.68%  (p=0.008 n=5+5)
RepeatLarge/536870912/16-16     4.69GB/s ±12%  6.90GB/s ± 1%   +46.97%  (p=0.008 n=5+5)
RepeatLarge/536870912/4097-16   4.87GB/s ± 8%  6.98GB/s ± 0%   +43.36%  (p=0.008 n=5+5)
RepeatLarge/1073741824/1-16     3.87GB/s ±28%  6.96GB/s ± 1%   +79.94%  (p=0.016 n=5+4)
RepeatLarge/1073741824/16-16    4.79GB/s ± 9%  6.93GB/s ± 0%   +44.79%  (p=0.008 n=5+5)
RepeatLarge/1073741824/4097-16  4.65GB/s ± 8%  7.02GB/s ± 1%   +51.02%  (p=0.008 n=5+5)

bytes:
name                            old speed      new speed      delta
RepeatLarge/256/1-16            1.93GB/s ± 1%  1.84GB/s ± 1%    -4.81%  (p=0.000 n=10+10)
RepeatLarge/256/16-16           2.25GB/s ± 2%  2.15GB/s ± 1%    -4.45%  (p=0.000 n=9+8)
RepeatLarge/512/1-16            2.71GB/s ± 1%  2.62GB/s ± 1%    -3.27%  (p=0.000 n=10+9)
RepeatLarge/512/16-16           2.96GB/s ± 4%  2.91GB/s ± 1%      ~     (p=0.243 n=9+10)
RepeatLarge/1024/1-16           3.35GB/s ± 1%  3.27GB/s ± 1%    -2.61%  (p=0.000 n=9+10)
RepeatLarge/1024/16-16          3.56GB/s ± 2%  3.52GB/s ± 1%    -1.10%  (p=0.010 n=10+9)
RepeatLarge/2048/1-16           3.52GB/s ± 1%  3.45GB/s ± 1%    -1.92%  (p=0.000 n=10+10)
RepeatLarge/2048/16-16          3.61GB/s ± 1%  3.58GB/s ± 0%    -0.82%  (p=0.008 n=9+8)
RepeatLarge/4096/1-16           3.85GB/s ± 2%  3.80GB/s ± 2%      ~     (p=0.165 n=10+10)
RepeatLarge/4096/16-16          3.88GB/s ± 3%  3.84GB/s ± 4%      ~     (p=0.393 n=10+10)
RepeatLarge/8192/1-16           4.12GB/s ± 2%  4.04GB/s ± 1%    -1.96%  (p=0.000 n=10+10)
RepeatLarge/8192/16-16          4.11GB/s ± 2%  4.09GB/s ± 1%      ~     (p=0.278 n=9+10)
RepeatLarge/8192/4097-16        4.38GB/s ± 1%  4.39GB/s ± 4%      ~     (p=0.720 n=9+10)
RepeatLarge/16384/1-16          5.06GB/s ± 2%  4.95GB/s ± 3%    -2.29%  (p=0.001 n=10+9)
RepeatLarge/16384/16-16         5.11GB/s ± 3%  5.06GB/s ± 3%      ~     (p=0.315 n=10+9)
RepeatLarge/16384/4097-16       5.22GB/s ± 3%  5.26GB/s ± 3%      ~     (p=0.211 n=9+10)
RepeatLarge/32768/1-16          5.54GB/s ± 2%  5.50GB/s ± 3%      ~     (p=0.353 n=10+10)
RepeatLarge/32768/16-16         5.55GB/s ± 1%  5.60GB/s ± 1%    +0.91%  (p=0.035 n=10+9)
RepeatLarge/32768/4097-16       4.88GB/s ± 2%  4.85GB/s ± 2%      ~     (p=0.447 n=10+9)
RepeatLarge/65536/1-16          5.86GB/s ± 1%  5.93GB/s ± 2%    +1.18%  (p=0.043 n=8+10)
RepeatLarge/65536/16-16         5.83GB/s ± 2%  5.98GB/s ± 1%    +2.67%  (p=0.000 n=10+10)
RepeatLarge/65536/4097-16       5.57GB/s ± 0%  5.56GB/s ± 3%      ~     (p=0.696 n=8+10)
RepeatLarge/131072/1-16         6.23GB/s ± 1%  6.38GB/s ± 2%    +2.51%  (p=0.000 n=9+10)
RepeatLarge/131072/16-16        6.21GB/s ± 2%  6.37GB/s ± 1%    +2.72%  (p=0.000 n=9+10)
RepeatLarge/131072/4097-16      6.04GB/s ± 1%  6.09GB/s ± 3%      ~     (p=0.356 n=9+10)
RepeatLarge/262144/1-16         6.47GB/s ± 1%  6.63GB/s ± 2%    +2.57%  (p=0.003 n=10+10)
RepeatLarge/262144/16-16        6.45GB/s ± 2%  6.69GB/s ± 2%    +3.65%  (p=0.000 n=10+10)
RepeatLarge/262144/4097-16      6.35GB/s ± 1%  6.51GB/s ± 2%    +2.48%  (p=0.000 n=9+10)
RepeatLarge/524288/1-16         6.21GB/s ± 2%  6.95GB/s ± 1%   +11.95%  (p=0.000 n=10+10)
RepeatLarge/524288/16-16        6.24GB/s ± 2%  6.93GB/s ± 2%   +11.11%  (p=0.000 n=10+10)
RepeatLarge/524288/4097-16      6.18GB/s ± 2%  6.82GB/s ± 1%   +10.39%  (p=0.000 n=9+10)
RepeatLarge/1048576/1-16        5.34GB/s ± 2%  6.41GB/s ± 2%   +20.05%  (p=0.000 n=10+10)
RepeatLarge/1048576/16-16       5.33GB/s ± 1%  6.45GB/s ± 2%   +20.84%  (p=0.000 n=10+9)
RepeatLarge/1048576/4097-16     5.28GB/s ± 1%  6.17GB/s ± 2%   +16.75%  (p=0.000 n=10+10)
RepeatLarge/2097152/1-16        4.04GB/s ± 1%  6.21GB/s ± 1%   +53.89%  (p=0.000 n=9+8)
RepeatLarge/2097152/16-16       4.02GB/s ± 1%  6.20GB/s ± 2%   +54.37%  (p=0.000 n=10+9)
RepeatLarge/2097152/4097-16     4.94GB/s ± 1%  6.04GB/s ± 1%   +22.36%  (p=0.000 n=10+10)
RepeatLarge/4194304/1-16        3.10GB/s ± 1%  5.74GB/s ± 0%   +85.04%  (p=0.000 n=10+9)
RepeatLarge/4194304/16-16       3.10GB/s ± 2%  5.72GB/s ± 1%   +84.26%  (p=0.000 n=9+10)
RepeatLarge/4194304/4097-16     3.03GB/s ± 4%  5.61GB/s ± 1%   +85.06%  (p=0.000 n=10+9)
RepeatLarge/8388608/1-16        3.08GB/s ± 2%  6.25GB/s ± 1%  +103.09%  (p=0.000 n=9+9)
RepeatLarge/8388608/16-16       3.07GB/s ± 2%  6.26GB/s ± 3%  +104.07%  (p=0.000 n=10+9)
RepeatLarge/8388608/4097-16     3.08GB/s ± 2%  6.23GB/s ± 2%  +102.09%  (p=0.000 n=9+10)
RepeatLarge/16777216/1-16       3.25GB/s ± 2%  5.78GB/s ± 3%   +78.03%  (p=0.000 n=9+9)
RepeatLarge/16777216/16-16      3.25GB/s ± 1%  5.75GB/s ± 1%   +77.21%  (p=0.000 n=9+10)
RepeatLarge/16777216/4097-16    3.29GB/s ± 3%  5.72GB/s ± 2%   +73.74%  (p=0.000 n=10+10)
RepeatLarge/33554432/1-16       3.68GB/s ± 2%  5.90GB/s ± 1%   +60.20%  (p=0.000 n=10+10)
RepeatLarge/33554432/16-16      3.69GB/s ± 3%  5.88GB/s ± 1%   +59.54%  (p=0.000 n=10+9)
RepeatLarge/33554432/4097-16    3.74GB/s ± 1%  5.94GB/s ± 2%   +58.68%  (p=0.000 n=7+10)
RepeatLarge/67108864/1-16       4.62GB/s ±12%  6.11GB/s ± 3%   +32.23%  (p=0.000 n=10+9)
RepeatLarge/67108864/16-16      4.77GB/s ± 2%  6.09GB/s ± 2%   +27.88%  (p=0.000 n=9+9)
RepeatLarge/67108864/4097-16    4.78GB/s ± 1%  6.19GB/s ± 1%   +29.51%  (p=0.000 n=9+10)
RepeatLarge/134217728/1-16      4.60GB/s ±16%  6.52GB/s ± 9%   +41.67%  (p=0.000 n=10+10)
RepeatLarge/134217728/16-16     4.80GB/s ± 4%  6.81GB/s ± 2%   +41.82%  (p=0.000 n=10+9)
RepeatLarge/134217728/4097-16   4.79GB/s ± 4%  6.81GB/s ± 2%   +42.31%  (p=0.000 n=9+10)
RepeatLarge/268435456/1-16      4.43GB/s ±25%  6.27GB/s ±14%   +41.52%  (p=0.000 n=10+10)
RepeatLarge/268435456/16-16     4.75GB/s ± 4%  6.68GB/s ± 4%   +40.50%  (p=0.000 n=9+10)
RepeatLarge/268435456/4097-16   4.75GB/s ± 3%  6.58GB/s ± 4%   +38.68%  (p=0.000 n=9+10)
RepeatLarge/536870912/1-16      4.96GB/s ± 9%  6.39GB/s ±16%   +28.90%  (p=0.000 n=8+10)
RepeatLarge/536870912/16-16     4.66GB/s ± 6%  6.57GB/s ± 7%   +40.82%  (p=0.000 n=10+9)
RepeatLarge/536870912/4097-16   4.68GB/s ±11%  6.88GB/s ± 3%   +47.01%  (p=0.000 n=10+9)
RepeatLarge/1073741824/1-16     4.39GB/s ±23%  6.57GB/s ± 5%   +49.75%  (p=0.000 n=10+8)
RepeatLarge/1073741824/16-16    4.73GB/s ±13%  6.89GB/s ± 1%   +45.68%  (p=0.000 n=9+8)
RepeatLarge/1073741824/4097-16  4.97GB/s ±15%  6.73GB/s ± 9%   +35.45%  (p=0.000 n=10+10)

The results above come from a Intel i9-9980HK (256KB L2) with
TurboBoost disabled.

Change-Id: I79dd57da0429aee9020ffd7bc458a034b999b740
Reviewed-on: https://go-review.googlesource.com/c/go/+/419054
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-09-27 16:55:15 +00:00
..
boundary_test.go bytes: test for page boundary crosses on sep of Index 2021-11-02 17:31:50 +00:00
buffer_test.go bytes: rely on runtime.growslice for growing 2022-03-10 16:24:16 +00:00
buffer.go bytes: rely on utf8.AppendRune 2022-08-23 20:26:47 +00:00
bytes_test.go bytes,strings: optimize Repeat 2022-09-27 16:55:15 +00:00
bytes.go bytes,strings: optimize Repeat 2022-09-27 16:55:15 +00:00
compare_test.go internal/bytealg: correct alignment checks for compare/memequal on riscv64 2022-09-19 18:59:50 +00:00
example_test.go bytes: add example with (*Buffer).Cap, (*Buffer).Read, (*Buffer).ReadByte 2021-11-05 21:26:54 +00:00
export_test.go bytes, internal/bytealg: simplify Equal 2019-04-24 00:56:36 +00:00
reader_test.go all: gofmt -w -r 'interface{} -> any' src 2021-12-13 18:45:54 +00:00
reader.go bytes: document that Reader.Reset affects the result of Size 2022-07-26 14:20:34 +00:00