This adds "slicing by 8" optimization to Castagnoli tables which will
speed up CRC32 calculation on systems without asssembler,
which are all but AMD64.
In my tests, it is faster to use "slicing by 8" for sizes all down to
16 bytes, so the switchover point has been adjusted.
There are no benchmarks for small sizes, so I have added one for 40 bytes,
as well as one for bigger sizes (32KB).
Castagnoli, No assembler, 40 Byte payload: (before, after)
BenchmarkCastagnoli40B-4 10000000 161 ns/op 246.94 MB/s
BenchmarkCastagnoli40B-4 20000000 100 ns/op 398.01 MB/s
Castagnoli, No assembler, 32KB payload: (before, after)
BenchmarkCastagnoli32KB-4 10000 115426 ns/op 283.89 MB/s
BenchmarkCastagnoli32KB-4 30000 45171 ns/op 725.41 MB/s
IEEE, No assembler, 1KB payload: (before, after)
BenchmarkCrc1KB-4 500000 3604 ns/op 284.10 MB/s
BenchmarkCrc1KB-4 1000000 1463 ns/op 699.79 MB/s
Compared:
benchmark old ns/op new ns/op delta
BenchmarkCastagnoli40B-4 161 100 -37.89%
BenchmarkCastagnoli32KB-4 115426 45171 -60.87%
BenchmarkCrc1KB-4 3604 1463 -59.41%
benchmark old MB/s new MB/s speedup
BenchmarkCastagnoli40B-4 246.94 398.01 1.61x
BenchmarkCastagnoli32KB-4 283.89 725.41 2.56x
BenchmarkCrc1KB-4 284.10 699.79 2.46x
Change-Id: I303e4ec84e8d4dafd057d64c0e43deb2b498e968
Reviewed-on: https://go-review.googlesource.com/19335
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Add several instructions that were used via BYTE and use them.
Instructions added: PEXTRB, PEXTRD, PEXTRQ, PINSRB, XGETBV, POPCNT.
Change-Id: I5a80cd390dc01f3555dbbe856a475f74b5e6df65
Reviewed-on: https://go-review.googlesource.com/18593
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
CRC-32 computation is stateless and the p slice does not get stored
anywhere. Thus, we mark the assembly functions as noescape so that
it doesn't believe that p leaks in:
func Update(crc uint32, tab *Table, p []byte) uint32
Before:
./crc32.go:153: leaking param: p
After:
./crc32.go:153: Update p does not escape
Change-Id: I52ba35b6cc544fff724327140e0c27898431d1dc
Reviewed-on: https://go-review.googlesource.com/17069
Reviewed-by: Russ Cox <rsc@golang.org>
iEEETable violates the Go naming conventions and is inconsistent
with the rest of the package. Use ieeeTable instead.
Change-Id: I04b201aa39759d159de2b0295f43da80488c2263
Reviewed-on: https://go-review.googlesource.com/17068
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Change-Id: I77c6768fff6f0163b36800307c4d573bb6521fe5
Reviewed-on: https://go-review.googlesource.com/14454
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
IEEE is the most commonly used CRC-32 polynomial, used by zip, gzip and others.
Based on http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
benchmark old ns/op new ns/op delta
BenchmarkIEEECrc1KB-8 3193 352 -88.98%
BenchmarkIEEECrc4KB-8 5025 1307 -73.99%
BenchmarkCastagnoliCrc1KB-8 126 126 +0.00%
benchmark old MB/s new MB/s speedup
BenchmarkIEEECrc1KB-8 320.68 2901.92 9.05x
BenchmarkIEEECrc4KB-8 815.08 3131.80 3.84x
BenchmarkCastagnoliCrc1KB-8 8100.80 8109.78 1.00x
Change-Id: I99c9a48365f631827f516e44f97e86155f03cb90
Reviewed-on: https://go-review.googlesource.com/14080
Reviewed-by: Keith Randall <khr@golang.org>
Explicitly say that *Table returned by MakeTable may not be
modified. Otherwise, this leads to very subtle bugs that may
or may not manifest themselves.
Same comment was made on package crc64, to keep the future
open to the caching tables that crc32 effectively does.
Fixes: #12487.
Change-Id: I2881bebb8b16f6f8564412172774c79c2593c6c1
Reviewed-on: https://go-review.googlesource.com/14258
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The URL is shown on go docs and is an eye-sore.
For go1.6.
Change-Id: I8b8ea3751200d06ed36acfe22f47ebb38107f8db
Reviewed-on: https://go-review.googlesource.com/13282
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The Slicing-By-8 [1] algorithm has much performance improvements than
current approach. This patch only uses it for IEEE, which is the most
common case in practice.
There is the benchmark on Mac OS X 10.9:
benchmark old MB/s new MB/s speedup
BenchmarkIEEECrc1KB 349.40 353.03 1.01x
BenchmarkIEEECrc4KB 351.55 934.35 2.66x
BenchmarkCastagnoliCrc1KB 7037.58 7392.63 1.05x
This algorithm need 8K lookup table, so it's enabled only for block
larger than 4K.
We can see about 2.6x improvement for IEEE.
Change-Id: I7f786d20f0949245e4aa101d7921669f496ed0f7
Reviewed-on: https://go-review.googlesource.com/1863
Reviewed-by: Russ Cox <rsc@golang.org>
Explicitly specify that we represent polynomial in reversed notation
Fixes#8229
Change-Id: Idf094c01fd82f133cd0c1b50fa967d12c577bdb5
Reviewed-on: https://go-review.googlesource.com/9237
Reviewed-by: David Chase <drchase@google.com>
This also removes pkg/runtime/traceback_lr.c, which was ported
to Go in an earlier commit and then moved to
runtime/traceback.go.
Reviewer: rsc@golang.org
rsc: LGTM