1
0
mirror of https://github.com/golang/go synced 2024-11-17 00:14:50 -07:00
The Go programming language
Go to file
Paul E. Murphy db32aba508 internal/bytealg: rewrite indexbytebody on PPC64
Use P8 instructions throughout to be backwards compatible, but
otherwise not impede performance. Use overlapping loads where
possible, and prioritize larger checks over smaller check.

However, some newer instructions can be used surgically when
targeting a newer GOPPC64. These can lead to noticeable
performance improvements with minimal impact to readability.

All tests run below on a Power10/ppc64le, and use a small
modification to BenchmarkIndexByte to ensure the IndexByte
wrapper call is inlined (as it likely is under realistic usage).
This wrapper adds substantial overhead if not inlined.

Previous (power9 path, GOPPC64=power8) vs. GOPPC64=power8:

IndexByte/1       3.81ns ± 8%     3.11ns ± 5%  -18.39%
IndexByte/2       3.82ns ± 3%     3.20ns ± 6%  -16.23%
IndexByte/3       3.61ns ± 4%     3.25ns ± 6%  -10.13%
IndexByte/4       3.66ns ± 5%     3.08ns ± 1%  -15.91%
IndexByte/5       3.82ns ± 0%     3.75ns ± 2%   -1.94%
IndexByte/6       3.83ns ± 0%     3.87ns ± 4%   +1.04%
IndexByte/7       3.83ns ± 0%     3.82ns ± 0%   -0.27%
IndexByte/8       3.82ns ± 0%     2.92ns ±11%  -23.70%
IndexByte/9       3.70ns ± 2%     3.08ns ± 2%  -16.87%
IndexByte/10      3.74ns ± 2%     3.04ns ± 0%  -18.75%
IndexByte/11      3.75ns ± 0%     3.31ns ± 8%  -11.79%
IndexByte/12      3.74ns ± 0%     3.04ns ± 0%  -18.86%
IndexByte/13      3.83ns ± 4%     3.04ns ± 0%  -20.64%
IndexByte/14      3.80ns ± 1%     3.30ns ± 8%  -13.18%
IndexByte/15      3.77ns ± 1%     3.04ns ± 0%  -19.33%
IndexByte/16      3.81ns ± 0%     2.78ns ± 7%  -26.88%
IndexByte/17      4.12ns ± 0%     3.04ns ± 1%  -26.11%
IndexByte/18      4.27ns ± 6%     3.05ns ± 0%  -28.64%
IndexByte/19      4.30ns ± 4%     3.02ns ± 2%  -29.65%
IndexByte/20      4.43ns ± 7%     3.45ns ± 7%  -22.15%
IndexByte/21      4.12ns ± 0%     3.03ns ± 1%  -26.35%
IndexByte/22      4.40ns ± 6%     3.05ns ± 0%  -30.82%
IndexByte/23      4.40ns ± 6%     3.01ns ± 2%  -31.48%
IndexByte/24      4.32ns ± 5%     3.07ns ± 0%  -28.98%
IndexByte/25      4.76ns ± 2%     3.04ns ± 1%  -36.11%
IndexByte/26      4.82ns ± 0%     3.05ns ± 0%  -36.66%
IndexByte/27      4.82ns ± 0%     2.97ns ± 3%  -38.39%
IndexByte/28      4.82ns ± 0%     2.96ns ± 3%  -38.57%
IndexByte/29      4.82ns ± 0%     3.34ns ± 9%  -30.71%
IndexByte/30      4.82ns ± 0%     3.05ns ± 0%  -36.77%
IndexByte/31      4.81ns ± 0%     3.05ns ± 0%  -36.70%
IndexByte/32      3.52ns ± 0%     3.44ns ± 1%   -2.15%
IndexByte/33      4.77ns ± 1%     3.35ns ± 0%  -29.81%
IndexByte/34      5.01ns ± 5%     3.35ns ± 0%  -33.15%
IndexByte/35      4.92ns ± 9%     3.35ns ± 0%  -31.89%
IndexByte/36      4.81ns ± 5%     3.35ns ± 0%  -30.37%
IndexByte/37      4.99ns ± 6%     3.35ns ± 0%  -32.86%
IndexByte/38      5.06ns ± 5%     3.35ns ± 0%  -33.84%
IndexByte/39      5.02ns ± 5%     3.48ns ± 9%  -30.58%
IndexByte/40      5.21ns ± 9%     3.55ns ± 4%  -31.82%
IndexByte/41      5.18ns ± 0%     3.42ns ± 2%  -33.98%
IndexByte/42      5.19ns ± 0%     3.55ns ±11%  -31.56%
IndexByte/43      5.18ns ± 0%     3.45ns ± 5%  -33.46%
IndexByte/44      5.18ns ± 0%     3.39ns ± 0%  -34.56%
IndexByte/45      5.18ns ± 0%     3.43ns ± 4%  -33.74%
IndexByte/46      5.18ns ± 0%     3.47ns ± 1%  -33.03%
IndexByte/47      5.18ns ± 0%     3.44ns ± 2%  -33.54%
IndexByte/48      5.18ns ± 0%     3.39ns ± 0%  -34.52%
IndexByte/49      5.69ns ± 0%     3.79ns ± 0%  -33.45%
IndexByte/50      5.70ns ± 0%     3.70ns ± 3%  -34.98%
IndexByte/51      5.70ns ± 0%     3.70ns ± 2%  -35.05%
IndexByte/52      5.69ns ± 0%     3.80ns ± 1%  -33.35%
IndexByte/53      5.69ns ± 0%     3.78ns ± 0%  -33.54%
IndexByte/54      5.69ns ± 0%     3.78ns ± 1%  -33.51%
IndexByte/55      5.69ns ± 0%     3.78ns ± 0%  -33.61%
IndexByte/56      5.69ns ± 0%     3.81ns ± 3%  -33.12%
IndexByte/57      6.20ns ± 0%     3.79ns ± 4%  -38.89%
IndexByte/58      6.20ns ± 0%     3.74ns ± 2%  -39.58%
IndexByte/59      6.20ns ± 0%     3.69ns ± 2%  -40.47%
IndexByte/60      6.20ns ± 0%     3.79ns ± 1%  -38.81%
IndexByte/61      6.20ns ± 0%     3.77ns ± 1%  -39.23%
IndexByte/62      6.20ns ± 0%     3.79ns ± 0%  -38.89%
IndexByte/63      6.20ns ± 0%     3.79ns ± 0%  -38.90%
IndexByte/64      4.17ns ± 0%     3.47ns ± 3%  -16.70%
IndexByte/65      5.38ns ± 0%     4.21ns ± 0%  -21.59%
IndexByte/66      5.38ns ± 0%     4.21ns ± 0%  -21.58%
IndexByte/67      5.38ns ± 0%     4.22ns ± 0%  -21.58%
IndexByte/68      5.38ns ± 0%     4.22ns ± 0%  -21.59%
IndexByte/69      5.38ns ± 0%     4.22ns ± 0%  -21.56%
IndexByte/70      5.38ns ± 0%     4.21ns ± 0%  -21.59%
IndexByte/71      5.37ns ± 0%     4.21ns ± 0%  -21.51%
IndexByte/72      5.37ns ± 0%     4.22ns ± 0%  -21.46%
IndexByte/73      5.71ns ± 0%     4.22ns ± 0%  -26.20%
IndexByte/74      5.71ns ± 0%     4.21ns ± 0%  -26.21%
IndexByte/75      5.71ns ± 0%     4.21ns ± 0%  -26.17%
IndexByte/76      5.71ns ± 0%     4.22ns ± 0%  -26.22%
IndexByte/77      5.71ns ± 0%     4.22ns ± 0%  -26.22%
IndexByte/78      5.71ns ± 0%     4.21ns ± 0%  -26.22%
IndexByte/79      5.71ns ± 0%     4.22ns ± 0%  -26.21%
IndexByte/80      5.71ns ± 0%     4.21ns ± 0%  -26.19%
IndexByte/81      6.20ns ± 0%     4.39ns ± 0%  -29.13%
IndexByte/82      6.20ns ± 0%     4.36ns ± 0%  -29.67%
IndexByte/83      6.20ns ± 0%     4.36ns ± 0%  -29.63%
IndexByte/84      6.20ns ± 0%     4.39ns ± 0%  -29.21%
IndexByte/85      6.20ns ± 0%     4.36ns ± 0%  -29.64%
IndexByte/86      6.20ns ± 0%     4.36ns ± 0%  -29.63%
IndexByte/87      6.20ns ± 0%     4.39ns ± 0%  -29.21%
IndexByte/88      6.20ns ± 0%     4.36ns ± 0%  -29.65%
IndexByte/89      6.74ns ± 0%     4.36ns ± 0%  -35.33%
IndexByte/90      6.75ns ± 0%     4.37ns ± 0%  -35.22%
IndexByte/91      6.74ns ± 0%     4.36ns ± 0%  -35.30%
IndexByte/92      6.74ns ± 0%     4.36ns ± 0%  -35.34%
IndexByte/93      6.74ns ± 0%     4.37ns ± 0%  -35.20%
IndexByte/94      6.74ns ± 0%     4.36ns ± 0%  -35.33%
IndexByte/95      6.75ns ± 0%     4.36ns ± 0%  -35.32%
IndexByte/96      4.83ns ± 0%     4.34ns ± 2%  -10.24%
IndexByte/97      5.91ns ± 0%     4.65ns ± 0%  -21.24%
IndexByte/98      5.91ns ± 0%     4.65ns ± 0%  -21.24%
IndexByte/99      5.91ns ± 0%     4.65ns ± 0%  -21.23%
IndexByte/100     5.90ns ± 0%     4.65ns ± 0%  -21.21%
IndexByte/101     5.90ns ± 0%     4.65ns ± 0%  -21.22%
IndexByte/102     5.90ns ± 0%     4.65ns ± 0%  -21.23%
IndexByte/103     5.91ns ± 0%     4.65ns ± 0%  -21.23%
IndexByte/104     5.91ns ± 0%     4.65ns ± 0%  -21.24%
IndexByte/105     6.25ns ± 0%     4.65ns ± 0%  -25.59%
IndexByte/106     6.25ns ± 0%     4.65ns ± 0%  -25.59%
IndexByte/107     6.25ns ± 0%     4.65ns ± 0%  -25.60%
IndexByte/108     6.25ns ± 0%     4.65ns ± 0%  -25.58%
IndexByte/109     6.24ns ± 0%     4.65ns ± 0%  -25.50%
IndexByte/110     6.25ns ± 0%     4.65ns ± 0%  -25.56%
IndexByte/111     6.25ns ± 0%     4.65ns ± 0%  -25.60%
IndexByte/112     6.25ns ± 0%     4.65ns ± 0%  -25.59%
IndexByte/113     6.76ns ± 0%     5.05ns ± 0%  -25.37%
IndexByte/114     6.76ns ± 0%     5.05ns ± 0%  -25.31%
IndexByte/115     6.76ns ± 0%     5.05ns ± 0%  -25.38%
IndexByte/116     6.76ns ± 0%     5.05ns ± 0%  -25.31%
IndexByte/117     6.76ns ± 0%     5.05ns ± 0%  -25.38%
IndexByte/118     6.76ns ± 0%     5.05ns ± 0%  -25.31%
IndexByte/119     6.76ns ± 0%     5.05ns ± 0%  -25.38%
IndexByte/120     6.76ns ± 0%     5.05ns ± 0%  -25.36%
IndexByte/121     7.35ns ± 0%     5.05ns ± 0%  -31.33%
IndexByte/122     7.36ns ± 0%     5.05ns ± 0%  -31.42%
IndexByte/123     7.38ns ± 0%     5.05ns ± 0%  -31.60%
IndexByte/124     7.38ns ± 0%     5.05ns ± 0%  -31.59%
IndexByte/125     7.38ns ± 0%     5.05ns ± 0%  -31.60%
IndexByte/126     7.38ns ± 0%     5.05ns ± 0%  -31.58%
IndexByte/128     5.28ns ± 0%     5.10ns ± 0%   -3.41%
IndexByte/256     7.27ns ± 0%     7.28ns ± 2%   +0.13%
IndexByte/512     12.1ns ± 0%     11.8ns ± 0%   -2.51%
IndexByte/1K      23.1ns ± 3%     22.0ns ± 0%   -4.66%
IndexByte/2K      42.6ns ± 0%     42.4ns ± 0%   -0.41%
IndexByte/4K      90.3ns ± 0%     89.4ns ± 0%   -0.98%
IndexByte/8K       170ns ± 0%      170ns ± 0%   -0.59%
IndexByte/16K      331ns ± 0%      330ns ± 0%   -0.27%
IndexByte/32K      660ns ± 0%      660ns ± 0%   -0.08%
IndexByte/64K     1.30µs ± 0%     1.30µs ± 0%   -0.08%
IndexByte/128K    2.58µs ± 0%     2.58µs ± 0%   -0.04%
IndexByte/256K    5.15µs ± 0%     5.15µs ± 0%   -0.04%
IndexByte/512K    10.3µs ± 0%     10.3µs ± 0%   -0.03%
IndexByte/1M      20.6µs ± 0%     20.5µs ± 0%   -0.03%
IndexByte/2M      41.1µs ± 0%     41.1µs ± 0%   -0.03%
IndexByte/4M      82.2µs ± 0%     82.1µs ± 0%   -0.02%
IndexByte/8M       164µs ± 0%      164µs ± 0%   -0.01%
IndexByte/16M      328µs ± 0%      328µs ± 0%   -0.01%
IndexByte/32M      657µs ± 0%      657µs ± 0%   -0.00%

GOPPC64=power8 vs GOPPC64=power9. The Improvement is
most noticed between 16 and 64B, and goes away around
128B.

IndexByte/16      2.78ns ± 7%     2.65ns ±15%   -4.74%
IndexByte/17      3.04ns ± 1%     2.80ns ± 3%   -7.85%
IndexByte/18      3.05ns ± 0%     2.71ns ± 4%  -11.00%
IndexByte/19      3.02ns ± 2%     2.76ns ±10%   -8.74%
IndexByte/20      3.45ns ± 7%     2.91ns ± 0%  -15.46%
IndexByte/21      3.03ns ± 1%     2.84ns ± 9%   -6.33%
IndexByte/22      3.05ns ± 0%     2.67ns ± 1%  -12.38%
IndexByte/23      3.01ns ± 2%     2.67ns ± 1%  -11.24%
IndexByte/24      3.07ns ± 0%     2.92ns ±12%   -4.79%
IndexByte/25      3.04ns ± 1%     3.15ns ±15%   +3.63%
IndexByte/26      3.05ns ± 0%     2.83ns ±13%   -7.33%
IndexByte/27      2.97ns ± 3%     2.98ns ±10%   +0.56%
IndexByte/28      2.96ns ± 3%     2.96ns ± 9%   -0.05%
IndexByte/29      3.34ns ± 9%     3.03ns ±12%   -9.33%
IndexByte/30      3.05ns ± 0%     2.68ns ± 1%  -12.05%
IndexByte/31      3.05ns ± 0%     2.83ns ±12%   -7.27%
IndexByte/32      3.44ns ± 1%     3.21ns ±10%   -6.78%
IndexByte/33      3.35ns ± 0%     3.41ns ± 2%   +1.95%
IndexByte/34      3.35ns ± 0%     3.13ns ± 0%   -6.53%
IndexByte/35      3.35ns ± 0%     3.13ns ± 0%   -6.54%
IndexByte/36      3.35ns ± 0%     3.13ns ± 0%   -6.52%
IndexByte/37      3.35ns ± 0%     3.13ns ± 0%   -6.52%
IndexByte/38      3.35ns ± 0%     3.24ns ± 4%   -3.30%
IndexByte/39      3.48ns ± 9%     3.44ns ± 2%   -1.19%
IndexByte/40      3.55ns ± 4%     3.46ns ± 2%   -2.44%
IndexByte/41      3.42ns ± 2%     3.39ns ± 4%   -0.86%
IndexByte/42      3.55ns ±11%     3.46ns ± 1%   -2.65%
IndexByte/43      3.45ns ± 5%     3.44ns ± 2%   -0.31%
IndexByte/44      3.39ns ± 0%     3.43ns ± 3%   +1.23%
IndexByte/45      3.43ns ± 4%     3.50ns ± 1%   +2.07%
IndexByte/46      3.47ns ± 1%     3.46ns ± 2%   -0.31%
IndexByte/47      3.44ns ± 2%     3.47ns ± 1%   +0.78%
IndexByte/48      3.39ns ± 0%     3.46ns ± 2%   +1.96%
IndexByte/49      3.79ns ± 0%     3.47ns ± 0%   -8.41%
IndexByte/50      3.70ns ± 3%     3.64ns ± 5%   -1.66%
IndexByte/51      3.70ns ± 2%     3.75ns ± 0%   +1.40%
IndexByte/52      3.80ns ± 1%     3.77ns ± 0%   -0.70%
IndexByte/53      3.78ns ± 0%     3.77ns ± 0%   -0.46%
IndexByte/54      3.78ns ± 1%     3.53ns ± 7%   -6.74%
IndexByte/55      3.78ns ± 0%     3.47ns ± 0%   -8.17%
IndexByte/56      3.81ns ± 3%     3.45ns ± 0%   -9.43%
IndexByte/57      3.79ns ± 4%     3.47ns ± 0%   -8.45%
IndexByte/58      3.74ns ± 2%     3.55ns ± 4%   -5.16%
IndexByte/59      3.69ns ± 2%     3.61ns ± 4%   -2.01%
IndexByte/60      3.79ns ± 1%     3.45ns ± 0%   -9.09%
IndexByte/61      3.77ns ± 1%     3.47ns ± 0%   -7.93%
IndexByte/62      3.79ns ± 0%     3.45ns ± 0%   -8.97%
IndexByte/63      3.79ns ± 0%     3.47ns ± 0%   -8.44%
IndexByte/64      3.47ns ± 3%     3.18ns ± 0%   -8.41%

GOPPC64=power9 vs GOPPC64=power10. Only sizes <16 will
show meaningful changes.

IndexByte/1       3.27ns ± 8%     2.36ns ± 2%  -27.58%
IndexByte/2       3.06ns ± 4%     2.34ns ± 1%  -23.42%
IndexByte/3       3.77ns ±11%     2.48ns ± 7%  -34.03%
IndexByte/4       3.18ns ± 8%     2.33ns ± 1%  -26.69%
IndexByte/5       3.18ns ± 5%     2.34ns ± 4%  -26.26%
IndexByte/6       3.13ns ± 3%     2.35ns ± 1%  -24.97%
IndexByte/7       3.25ns ± 1%     2.33ns ± 1%  -28.22%
IndexByte/8       2.79ns ± 2%     2.36ns ± 1%  -15.32%
IndexByte/9       2.90ns ± 0%     2.34ns ± 2%  -19.36%
IndexByte/10      2.99ns ± 3%     2.31ns ± 1%  -22.70%
IndexByte/11      3.13ns ± 7%     2.31ns ± 0%  -26.08%
IndexByte/12      3.01ns ± 4%     2.32ns ± 1%  -22.91%
IndexByte/13      2.98ns ± 3%     2.31ns ± 1%  -22.72%
IndexByte/14      2.92ns ± 2%     2.61ns ±16%  -10.58%
IndexByte/15      3.02ns ± 5%     2.69ns ± 7%  -10.90%
IndexByte/16      2.65ns ±15%     2.29ns ± 1%  -13.61%

Change-Id: I4482f762d25eabf60def4981a0b2bc0c10ccf50c
Reviewed-on: https://go-review.googlesource.com/c/go/+/478656
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-21 16:10:29 +00:00
.github doc: normalize proposal-process links 2023-03-29 22:00:27 +00:00
api log/slog: add Source type for source location 2023-04-20 17:36:42 +00:00
doc doc: add release notes for new context functions 2023-04-20 13:01:21 +00:00
lib/time time/tzdata: generate zip constant during cmd/dist 2023-01-17 22:30:53 +00:00
misc make.{bash,bat}: check unmodified $PATH for $GOROOT/bin presence 2023-04-19 14:36:22 +00:00
src internal/bytealg: rewrite indexbytebody on PPC64 2023-04-21 16:10:29 +00:00
test Revert "runtime, cmd: rationalize StackLimit and StackGuard" 2023-04-20 16:19:35 +00:00
.gitattributes
.gitignore cmd/dist: refactor generated cgo-support logic 2023-04-20 17:26:46 +00:00
codereview.cfg
CONTRIBUTING.md doc: normalize proposal-process links 2023-03-29 22:00:27 +00:00
go.env cmd/go: introduce GOROOT/go.env and move proxy/sumdb config there 2023-01-17 23:10:39 +00:00
LICENSE
PATENTS
README.md README: update from CC-BY-3.0 to CC-BY-4.0 2022-11-02 20:14:56 +00:00
SECURITY.md SECURITY.md: replace golang.org with go.dev 2022-04-26 19:59:47 +00:00

The Go Programming Language

Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.

Gopher image Gopher image by Renee French, licensed under Creative Commons 4.0 Attributions license.

Our canonical Git repository is located at https://go.googlesource.com/go. There is a mirror of the repository at https://github.com/golang/go.

Unless otherwise noted, the Go source files are distributed under the BSD-style license found in the LICENSE file.

Download and Install

Binary Distributions

Official binary distributions are available at https://go.dev/dl/.

After downloading a binary release, visit https://go.dev/doc/install for installation instructions.

Install From Source

If a binary distribution is not available for your combination of operating system and architecture, visit https://go.dev/doc/install/source for source installation instructions.

Contributing

Go is the work of thousands of contributors. We appreciate your help!

To contribute, please read the contribution guidelines at https://go.dev/doc/contribute.

Note that the Go project uses the issue tracker for bug reports and proposals only. See https://go.dev/wiki/Questions for a list of places to ask questions about the Go language.