1
0
mirror of https://github.com/golang/go synced 2024-11-06 04:16:11 -07:00
go/src/hash
ruinan c81dfdd47a hash/crc32: use LDP instead of LDR in crc32 computation
Since recent ARM CPUs support CRC late forwarding of results from a
producer to a consumer, the second CRC instruction can be executed
before the first CRC instruction complete. By loading 16 bytes once with
LDP and ordering two CRC instructions here we can make the data
forwarding happen to get better performance.

Benchmarks                             old         ThisRun    delta
CRC32/poly=IEEE/size=15/align=0-160    1.24GB/s    1.26GB/s   +1.92%
CRC32/poly=IEEE/size=15/align=1-160    1.24GB/s    1.26GB/s   +1.92%
CRC32/poly=IEEE/size=40/align=0-160    2.89GB/s    2.87GB/s   -0.72%
CRC32/poly=IEEE/size=40/align=1-160    2.86GB/s    2.87GB/s     ~
CRC32/poly=IEEE/size=512/align=0-160   9.29GB/s   14.47GB/s  +55.69%
CRC32/poly=IEEE/size=512/align=1-160   9.26GB/s   13.88GB/s  +49.89%
CRC32/poly=IEEE/size=1kB/align=0-160   10.2GB/s    17.6GB/s  +72.97%
CRC32/poly=IEEE/size=1kB/align=1-160   10.1GB/s    17.2GB/s  +69.29%
CRC32/poly=IEEE/size=4kB/align=0-160   10.5GB/s    20.9GB/s  +99.01%
CRC32/poly=IEEE/size=4kB/align=1-160   10.5GB/s    20.5GB/s  +95.02%
CRC32/poly=IEEE/size=32kB/align=0-160  11.1GB/s    22.0GB/s  +98.40%
CRC32/poly=IEEE/size=32kB/align=1-160  11.1GB/s    21.6GB/s  +94.80%

CRC32/poly=Castagnoli/size=15/align=0-160   1.26GB/s   1.26GB/s     ~
CRC32/poly=Castagnoli/size=15/align=1-160   1.25GB/s   1.26GB/s     ~
CRC32/poly=Castagnoli/size=40/align=0-160   2.88GB/s   3.02GB/s   +5.18%
CRC32/poly=Castagnoli/size=40/align=1-160   2.85GB/s   3.01GB/s   +5.57%
CRC32/poly=Castagnoli/size=512/align=0-160  9.24GB/s  14.71GB/s  +59.29%
CRC32/poly=Castagnoli/size=512/align=1-160  9.21GB/s  13.45GB/s  +45.92%
CRC32/poly=Castagnoli/size=1kB/align=0-160  10.1GB/s   17.8GB/s  +75.81%
CRC32/poly=Castagnoli/size=1kB/align=1-160  10.1GB/s   17.0GB/s  +67.80%
CRC32/poly=Castagnoli/size=4kB/align=0-160  10.5GB/s   21.0GB/s  +99.67%
CRC32/poly=Castagnoli/size=4kB/align=1-160  10.5GB/s   20.5GB/s  +94.26%
CRC32/poly=Castagnoli/size=32kB/align=0-160 11.1GB/s   22.0GB/s  +98.39%
CRC32/poly=Castagnoli/size=32kB/align=1-160 11.1GB/s   21.7GB/s  +95.63%

Change-Id: Ifc7be5048cafac242e7b75f652e4aafa9aeae844
Reviewed-on: https://go-review.googlesource.com/c/go/+/407854
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2022-08-10 08:57:33 +00:00
..
adler32 all: gofmt main repo 2022-04-11 16:34:30 +00:00
crc32 hash/crc32: use LDP instead of LDR in crc32 computation 2022-08-10 08:57:33 +00:00
crc64
fnv
maphash hash/maphash: use fastrand64 in MakeSeed 2022-04-21 17:46:04 +00:00
example_test.go
hash.go
marshal_test.go
test_cases.txt
test_gen.awk