mirror of https://github.com/golang/go synced 2024-11-27 01:31:21 -07:00
go/src/internal/bytealg
Paul E. Murphy 756841bffa internal/bytealg: optimize Count/CountString for PPC64/Power10
Power10 adds a handful of new instructions which make counting
noticeably quicker for short inputs.

Likewise, since the vector loop requires 32B to enter,
unroll it once to count 32B per iteration. This
improvement benefits all PPC64 CPUs.
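
As a rough illustration of the loop structure described above, here is a pure-Go sketch. It is not the PPC64 assembly from this change: the name countChunked is hypothetical, and the two 16-byte halves only stand in for the two vector-register compares of the unrolled loop. It assumes the main loop is entered only when at least 32 bytes remain, with a byte-wise tail for the rest.

```go
package main

import "fmt"

// countChunked is a hypothetical, illustrative stand-in for the
// assembly routine. It counts occurrences of c in b.
func countChunked(b []byte, c byte) int {
	n := 0
	// Main loop: requires at least 32 bytes to enter and consumes
	// 32 bytes per iteration, as two independent 16-byte halves.
	for len(b) >= 32 {
		lo, hi := 0, 0
		for i := 0; i < 16; i++ {
			if b[i] == c {
				lo++
			}
			if b[i+16] == c {
				hi++
			}
		}
		n += lo + hi
		b = b[32:]
	}
	// Tail: fewer than 32 bytes remain; count them one at a time.
	for _, x := range b {
		if x == c {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countChunked([]byte("banana"), 'a')) // prints 3
}
```

In this sketch, inputs shorter than 32 bytes never enter the main loop, which is why a faster tail matters most for the small sizes in the benchmarks.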

On Power10, comparing a binary built with GOPPC64=power8
before and after this change:

name               old time/op    new time/op    delta
CountSingle/10     8.99ns ± 0%    5.55ns ± 3%   -38.24%
CountSingle/16     7.55ns ± 0%    5.56ns ± 3%   -26.37%
CountSingle/17     7.45ns ± 0%    5.25ns ± 0%   -29.52%
CountSingle/31     18.4ns ± 0%     6.2ns ± 0%   -66.41%
CountSingle/32     6.17ns ± 0%    5.04ns ± 0%   -18.37%
CountSingle/33     7.13ns ± 0%    5.99ns ± 0%   -15.94%
CountSingle/4K      198ns ± 0%     115ns ± 0%   -42.08%
CountSingle/4M      190µs ± 0%     109µs ± 0%   -42.49%
CountSingle/64M    3.28ms ± 0%    2.08ms ± 0%   -36.53%

Furthermore, comparing the new tail implementation built with
GOPPC64=power8 against the same code built with GOPPC64=power10:

name               power8         power10      delta
CountSingle/10     5.55ns ± 3%    4.52ns ± 1%  -18.66%
CountSingle/16     5.56ns ± 3%    4.80ns ± 0%  -13.65%
CountSingle/17     5.25ns ± 0%    4.79ns ± 0%   -8.78%
CountSingle/31     6.17ns ± 0%    4.82ns ± 0%  -21.79%
CountSingle/32     5.04ns ± 0%    5.09ns ± 6%   +1.01%
CountSingle/33     5.99ns ± 0%    5.42ns ± 2%   -9.54%

Change-Id: I62d80be3b5d706e1abbb4bec7d6278a939a5eed4
Reviewed-on: https://go-review.googlesource.com/c/go/+/512695
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-14 20:30:44 +00:00
bytealg.go
compare_386.s
compare_amd64.s
compare_arm64.s
compare_arm.s
compare_generic.go
compare_loong64.s
compare_mips64x.s
compare_mipsx.s
compare_native.go
compare_ppc64x.s cmd/internal/obj/ppc64: modify PCALIGN to ensure alignment 2023-04-21 16:47:45 +00:00
compare_riscv64.s internal/bytealg: fix alignment code in compare_riscv64.s 2023-05-30 16:05:30 +00:00
compare_s390x.s
compare_wasm.s
count_amd64.s internal/bytealg: optimize Count/CountString in amd64 2023-08-07 23:13:36 +00:00
count_arm64.s
count_arm.s
count_generic.go
count_native.go
count_ppc64x.s internal/bytealg: optimize Count/CountString for PPC64/Power10 2023-08-14 20:30:44 +00:00
count_riscv64.s
count_s390x.s
equal_386.s
equal_amd64.s
equal_arm64.s
equal_arm.s
equal_generic.go
equal_loong64.s
equal_mips64x.s
equal_mipsx.s
equal_native.go
equal_ppc64x.s cmd/internal/obj/ppc64: modify PCALIGN to ensure alignment 2023-04-21 16:47:45 +00:00
equal_riscv64.s internal/bytealg: fix alignment code in equal_riscv64.s 2023-06-29 02:34:59 +00:00
equal_s390x.s
equal_wasm.s
index_amd64.go
index_amd64.s internal/bytealg: optimize Index/IndexString in amd64 2023-08-07 00:20:48 +00:00
index_arm64.go
index_arm64.s
index_generic.go
index_native.go
index_ppc64x.go
index_ppc64x.s cmd/internal/obj/ppc64: modify PCALIGN to ensure alignment 2023-04-21 16:47:45 +00:00
index_s390x.go
index_s390x.s
indexbyte_386.s
indexbyte_amd64.s internal/bytealg: use generic IndexByte on plan9/amd64 2023-07-20 17:30:15 +00:00
indexbyte_arm64.s
indexbyte_arm.s
indexbyte_generic.go internal/bytealg: use generic IndexByte on plan9/amd64 2023-07-20 17:30:15 +00:00
indexbyte_loong64.s
indexbyte_mips64x.s
indexbyte_mipsx.s
indexbyte_native.go internal/bytealg: use generic IndexByte on plan9/amd64 2023-07-20 17:30:15 +00:00
indexbyte_ppc64x.s internal/bytealg: rewrite indexbytebody on PPC64 2023-04-21 16:10:29 +00:00
indexbyte_riscv64.s
indexbyte_s390x.s
indexbyte_wasm.s