Mirror of https://github.com/golang/go
Synced 2024-11-27 01:31:21 -07:00
Commit 756841bffa
Power10 adds a handful of new instructions which make this noticeably quicker for smaller values.

Likewise, since the vector loop requires 32B to enter, unroll it once to count 32B per iteration. This improvement benefits all PPC64 cpus.

On Power10 comparing a binary built with GOPPC64=power8:

Benchmark | Old | New | Delta
---|---|---|---
CountSingle/10 | 8.99ns ± 0% | 5.55ns ± 3% | -38.24%
CountSingle/16 | 7.55ns ± 0% | 5.56ns ± 3% | -26.37%
CountSingle/17 | 7.45ns ± 0% | 5.25ns ± 0% | -29.52%
CountSingle/31 | 18.4ns ± 0% | 6.2ns ± 0% | -66.41%
CountSingle/32 | 6.17ns ± 0% | 5.04ns ± 0% | -18.37%
CountSingle/33 | 7.13ns ± 0% | 5.99ns ± 0% | -15.94%
CountSingle/4K | 198ns ± 0% | 115ns ± 0% | -42.08%
CountSingle/4M | 190µs ± 0% | 109µs ± 0% | -42.49%
CountSingle/64M | 3.28ms ± 0% | 2.08ms ± 0% | -36.53%

Furthermore, comparing the new tail implementation on GOPPC64=power8 with GOPPC64=power10:

Benchmark | GOPPC64=power8 | GOPPC64=power10 | Delta
---|---|---|---
CountSingle/10 | 5.55ns ± 3% | 4.52ns ± 1% | -18.66%
CountSingle/16 | 5.56ns ± 3% | 4.80ns ± 0% | -13.65%
CountSingle/17 | 5.25ns ± 0% | 4.79ns ± 0% | -8.78%
CountSingle/31 | 6.17ns ± 0% | 4.82ns ± 0% | -21.79%
CountSingle/32 | 5.04ns ± 0% | 5.09ns ± 6% | +1.01%
CountSingle/33 | 5.99ns ± 0% | 5.42ns ± 2% | -9.54%

Change-Id: I62d80be3b5d706e1abbb4bec7d6278a939a5eed4
Reviewed-on: https://go-review.googlesource.com/c/go/+/512695
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>

||
---|---|---|
bytealg.go | ||
compare_386.s | ||
compare_amd64.s | ||
compare_arm64.s | ||
compare_arm.s | ||
compare_generic.go | ||
compare_loong64.s | ||
compare_mips64x.s | ||
compare_mipsx.s | ||
compare_native.go | ||
compare_ppc64x.s | ||
compare_riscv64.s | ||
compare_s390x.s | ||
compare_wasm.s | ||
count_amd64.s | ||
count_arm64.s | ||
count_arm.s | ||
count_generic.go | ||
count_native.go | ||
count_ppc64x.s | ||
count_riscv64.s | ||
count_s390x.s | ||
equal_386.s | ||
equal_amd64.s | ||
equal_arm64.s | ||
equal_arm.s | ||
equal_generic.go | ||
equal_loong64.s | ||
equal_mips64x.s | ||
equal_mipsx.s | ||
equal_native.go | ||
equal_ppc64x.s | ||
equal_riscv64.s | ||
equal_s390x.s | ||
equal_wasm.s | ||
index_amd64.go | ||
index_amd64.s | ||
index_arm64.go | ||
index_arm64.s | ||
index_generic.go | ||
index_native.go | ||
index_ppc64x.go | ||
index_ppc64x.s | ||
index_s390x.go | ||
index_s390x.s | ||
indexbyte_386.s | ||
indexbyte_amd64.s | ||
indexbyte_arm64.s | ||
indexbyte_arm.s | ||
indexbyte_generic.go | ||
indexbyte_loong64.s | ||
indexbyte_mips64x.s | ||
indexbyte_mipsx.s | ||
indexbyte_native.go | ||
indexbyte_ppc64x.s | ||
indexbyte_riscv64.s | ||
indexbyte_s390x.s | ||
indexbyte_wasm.s |