qbit/go

mirror of https://github.com/golang/go synced 2024-09-29 19:24:33 -06:00

Author	SHA1	Message	Date
Roger Peppe	bd926e1c65	crypto, hash: document marshal/unmarshal implementation Unless you go back and read the hash package documentation, it's not clear that all the hash packages implement marshaling and unmarshaling. Document the behaviour specifically in each package that implements it, as this is hidden behaviour and easy to miss. Change-Id: Id9d3508909362f1a3e53872d0319298359e50a94 Reviewed-on: https://go-review.googlesource.com/77251 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>	2017-11-15 00:06:24 +00:00
Fangming.Fang	66bfbd9ad7	internal/cpu: detect cpu features in internal/cpu package change hash/crc32 package to use cpu package instead of using runtime internal variables to check crc32 instruction Change-Id: I8f88d2351bde8ed4e256f9adf822a08b9a00f532 Reviewed-on: https://go-review.googlesource.com/76490 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>	2017-11-14 19:07:15 +00:00
Joe Tsai	4aea3e7135	hash: document that the encoded state may contain input in plaintext The cryptographic checksums operate in blocks of 64 or 128 bytes, which means that the last 128 bytes or so of the input may be encoded in its original (plaintext) form as part of the state. Document this so users do not falsely assume that the encoded state carries no reversible information about the input. Change-Id: I823dbb87867bf0a77aa20f6ed7a615dbedab3715 Reviewed-on: https://go-review.googlesource.com/77372 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-11-13 22:14:58 +00:00
Tim Cooper	0ee4527ac7	hash: add marshaling, unmarshaling example Example usage of functionality implemented in CL 66710. Change-Id: I87d6e4d2fb7a60e4ba1e6ef02715480eb7e8f8bd Reviewed-on: https://go-review.googlesource.com/76011 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-11-04 03:47:34 +00:00
Joe Tsai	08f19bbde1	go/printer: forbid empty line before first comment in block To improve readability when exported fields are removed, forbid the printer from emitting an empty line before the first comment in a const, var, or type block. Also, when printing the "Has filtered or unexported fields." message, add an empty line before it to separate the message from the struct or interface contents. Before the change: <<< type NamedArg struct { // Name is the name of the parameter placeholder. // // If empty, the ordinal position in the argument list will be // used. // // Name must omit any symbol prefix. Name string // Value is the value of the parameter. // It may be assigned the same value types as the query // arguments. Value interface{} // contains filtered or unexported fields } >>> After the change: <<< type NamedArg struct { // Name is the name of the parameter placeholder. // // If empty, the ordinal position in the argument list will be // used. // // Name must omit any symbol prefix. Name string // Value is the value of the parameter. // It may be assigned the same value types as the query // arguments. Value interface{} // contains filtered or unexported fields } >>> Fixes #18264 Change-Id: I9fe17ca39cf92fcdfea55064bd2eaa784ce48c88 Reviewed-on: https://go-review.googlesource.com/71990 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-02 18:17:22 +00:00
Tim Cooper	731b632172	crypto, hash: implement BinaryMarshaler, BinaryUnmarshaler in hash implementations The marshal method allows the hash's internal state to be serialized and unmarshaled at a later time, without having to re-write the entire stream of data that was already written to the hash. Fixes #20573 Change-Id: I40bbb84702ac4b7c5662f99bf943cdf4081203e5 Reviewed-on: https://go-review.googlesource.com/66710 Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-11-01 21:04:12 +00:00
Mikio Hara	7b659eb155	all: gofmt Change-Id: I2d0439a9f068e726173afafe2ef1f5d62b7feb4d Reviewed-on: https://go-review.googlesource.com/46190 Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-06-21 03:14:30 +00:00
Martin Möhrmann	69972aea74	internal/cpu: new package to detect cpu features Implements detection of x86 cpu features that are used in the go standard library. Changes all standard library packages to use the new cpu package instead of using runtime internal variables to check x86 cpu features. Updates: #15403 Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9 Reviewed-on: https://go-review.googlesource.com/41476 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-05-10 17:02:21 +00:00
Wei Xiao	ab636b899c	hash/crc32: optimize arm64 crc32 implementation ARMv8 defines a crc32 instruction. Compared to the original crc32 calculation, this patch makes use of crc32 instructions to do crc32 calculation instead of the multiple lookup table algorithms. ARMv8 provides IEEE and Castagnoli polynomials for crc32 calculation so that the performance of these two types of crc32 is significantly improved. name old time/op new time/op delta CRC32/poly=IEEE/size=15/align=0-32 117ns ± 0% 38ns ± 0% -67.44% CRC32/poly=IEEE/size=15/align=1-32 117ns ± 0% 38ns ± 0% -67.52% CRC32/poly=IEEE/size=40/align=0-32 129ns ± 0% 41ns ± 0% -68.37% CRC32/poly=IEEE/size=40/align=1-32 129ns ± 0% 41ns ± 0% -68.29% CRC32/poly=IEEE/size=512/align=0-32 828ns ± 0% 246ns ± 0% -70.29% CRC32/poly=IEEE/size=512/align=1-32 828ns ± 0% 132ns ± 0% -84.06% CRC32/poly=IEEE/size=1kB/align=0-32 1.58µs ± 0% 0.46µs ± 0% -70.98% CRC32/poly=IEEE/size=1kB/align=1-32 1.58µs ± 0% 0.46µs ± 0% -70.92% CRC32/poly=IEEE/size=4kB/align=0-32 6.06µs ± 0% 1.74µs ± 0% -71.27% CRC32/poly=IEEE/size=4kB/align=1-32 6.10µs ± 0% 1.74µs ± 0% -71.44% CRC32/poly=IEEE/size=32kB/align=0-32 48.3µs ± 0% 13.7µs ± 0% -71.61% CRC32/poly=IEEE/size=32kB/align=1-32 48.3µs ± 0% 13.7µs ± 0% -71.60% CRC32/poly=Castagnoli/size=15/align=0-32 116ns ± 0% 38ns ± 0% -67.07% CRC32/poly=Castagnoli/size=15/align=1-32 116ns ± 0% 38ns ± 0% -66.90% CRC32/poly=Castagnoli/size=40/align=0-32 127ns ± 0% 40ns ± 0% -68.11% CRC32/poly=Castagnoli/size=40/align=1-32 127ns ± 0% 40ns ± 0% -68.11% CRC32/poly=Castagnoli/size=512/align=0-32 828ns ± 0% 132ns ± 0% -84.06% CRC32/poly=Castagnoli/size=512/align=1-32 827ns ± 0% 132ns ± 0% -84.04% CRC32/poly=Castagnoli/size=1kB/align=0-32 1.59µs ± 0% 0.22µs ± 0% -85.89% CRC32/poly=Castagnoli/size=1kB/align=1-32 1.58µs ± 0% 0.22µs ± 0% -85.79% CRC32/poly=Castagnoli/size=4kB/align=0-32 6.14µs ± 0% 0.77µs ± 0% -87.40% CRC32/poly=Castagnoli/size=4kB/align=1-32 6.06µs ± 0% 0.77µs ± 0% -87.25% 
CRC32/poly=Castagnoli/size=32kB/align=0-32 48.3µs ± 0% 5.9µs ± 0% -87.71% CRC32/poly=Castagnoli/size=32kB/align=1-32 48.4µs ± 0% 6.0µs ± 0% -87.69% CRC32/poly=Koopman/size=15/align=0-32 104ns ± 0% 104ns ± 0% +0.00% CRC32/poly=Koopman/size=15/align=1-32 104ns ± 0% 104ns ± 0% +0.00% CRC32/poly=Koopman/size=40/align=0-32 235ns ± 0% 235ns ± 0% +0.00% CRC32/poly=Koopman/size=40/align=1-32 235ns ± 0% 235ns ± 0% +0.00% CRC32/poly=Koopman/size=512/align=0-32 2.71µs ± 0% 2.71µs ± 0% -0.07% CRC32/poly=Koopman/size=512/align=1-32 2.71µs ± 0% 2.71µs ± 0% -0.04% CRC32/poly=Koopman/size=1kB/align=0-32 5.40µs ± 0% 5.39µs ± 0% -0.06% CRC32/poly=Koopman/size=1kB/align=1-32 5.40µs ± 0% 5.40µs ± 0% +0.02% CRC32/poly=Koopman/size=4kB/align=0-32 21.5µs ± 0% 21.5µs ± 0% -0.16% CRC32/poly=Koopman/size=4kB/align=1-32 21.5µs ± 0% 21.5µs ± 0% -0.05% CRC32/poly=Koopman/size=32kB/align=0-32 172µs ± 0% 172µs ± 0% -0.07% CRC32/poly=Koopman/size=32kB/align=1-32 172µs ± 0% 172µs ± 0% -0.01% name old speed new speed delta CRC32/poly=IEEE/size=15/align=0-32 128MB/s ± 0% 394MB/s ± 0% +207.95% CRC32/poly=IEEE/size=15/align=1-32 128MB/s ± 0% 394MB/s ± 0% +208.09% CRC32/poly=IEEE/size=40/align=0-32 310MB/s ± 0% 979MB/s ± 0% +216.07% CRC32/poly=IEEE/size=40/align=1-32 310MB/s ± 0% 979MB/s ± 0% +216.16% CRC32/poly=IEEE/size=512/align=0-32 618MB/s ± 0% 2074MB/s ± 0% +235.72% CRC32/poly=IEEE/size=512/align=1-32 618MB/s ± 0% 3852MB/s ± 0% +523.55% CRC32/poly=IEEE/size=1kB/align=0-32 646MB/s ± 0% 2225MB/s ± 0% +244.57% CRC32/poly=IEEE/size=1kB/align=1-32 647MB/s ± 0% 2225MB/s ± 0% +243.87% CRC32/poly=IEEE/size=4kB/align=0-32 676MB/s ± 0% 2352MB/s ± 0% +248.02% CRC32/poly=IEEE/size=4kB/align=1-32 672MB/s ± 0% 2352MB/s ± 0% +250.15% CRC32/poly=IEEE/size=32kB/align=0-32 678MB/s ± 0% 2387MB/s ± 0% +252.17% CRC32/poly=IEEE/size=32kB/align=1-32 678MB/s ± 0% 2388MB/s ± 0% +252.11% CRC32/poly=Castagnoli/size=15/align=0-32 129MB/s ± 0% 393MB/s ± 0% +205.51% CRC32/poly=Castagnoli/size=15/align=1-32 129MB/s ± 0% 
390MB/s ± 0% +203.41% CRC32/poly=Castagnoli/size=40/align=0-32 314MB/s ± 0% 988MB/s ± 0% +215.04% CRC32/poly=Castagnoli/size=40/align=1-32 314MB/s ± 0% 987MB/s ± 0% +214.68% CRC32/poly=Castagnoli/size=512/align=0-32 618MB/s ± 0% 3860MB/s ± 0% +524.32% CRC32/poly=Castagnoli/size=512/align=1-32 619MB/s ± 0% 3859MB/s ± 0% +523.66% CRC32/poly=Castagnoli/size=1kB/align=0-32 645MB/s ± 0% 4568MB/s ± 0% +608.56% CRC32/poly=Castagnoli/size=1kB/align=1-32 650MB/s ± 0% 4567MB/s ± 0% +602.94% CRC32/poly=Castagnoli/size=4kB/align=0-32 667MB/s ± 0% 5297MB/s ± 0% +693.81% CRC32/poly=Castagnoli/size=4kB/align=1-32 676MB/s ± 0% 5297MB/s ± 0% +684.00% CRC32/poly=Castagnoli/size=32kB/align=0-32 678MB/s ± 0% 5519MB/s ± 0% +713.83% CRC32/poly=Castagnoli/size=32kB/align=1-32 677MB/s ± 0% 5497MB/s ± 0% +712.04% CRC32/poly=Koopman/size=15/align=0-32 143MB/s ± 0% 144MB/s ± 0% +0.27% CRC32/poly=Koopman/size=15/align=1-32 143MB/s ± 0% 144MB/s ± 0% +0.33% CRC32/poly=Koopman/size=40/align=0-32 169MB/s ± 0% 170MB/s ± 0% +0.12% CRC32/poly=Koopman/size=40/align=1-32 170MB/s ± 0% 170MB/s ± 0% +0.08% CRC32/poly=Koopman/size=512/align=0-32 189MB/s ± 0% 189MB/s ± 0% +0.07% CRC32/poly=Koopman/size=512/align=1-32 189MB/s ± 0% 189MB/s ± 0% +0.04% CRC32/poly=Koopman/size=1kB/align=0-32 190MB/s ± 0% 190MB/s ± 0% +0.05% CRC32/poly=Koopman/size=1kB/align=1-32 190MB/s ± 0% 190MB/s ± 0% -0.01% CRC32/poly=Koopman/size=4kB/align=0-32 190MB/s ± 0% 190MB/s ± 0% +0.15% CRC32/poly=Koopman/size=4kB/align=1-32 190MB/s ± 0% 191MB/s ± 0% +0.05% CRC32/poly=Koopman/size=32kB/align=0-32 191MB/s ± 0% 191MB/s ± 0% +0.06% CRC32/poly=Koopman/size=32kB/align=1-32 191MB/s ± 0% 191MB/s ± 0% +0.02% Also fix a bug of arm64 assembler The optimization is mainly contributed by Fangming.Fang <fangming.fang@arm.com> Change-Id: I900678c2e445d7e8ad9e2a9ab3305d649230905f Reviewed-on: https://go-review.googlesource.com/40074 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot 
Gobot <gobot@golang.org>	2017-04-13 12:44:10 +00:00
Lucas Clemente	e05de6a5be	hash/fnv: add 128-bit FNV hash support The 128bit FNV hash will be used e.g. in QUIC. The algorithm is described at https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function Change-Id: I13f3ec39b0e12b7a5008824a6619dff2e708ee81 Reviewed-on: https://go-review.googlesource.com/38356 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-04-13 01:28:48 +00:00
Eric Lagergren	094498c9a1	all: fix minor misspellings Change-Id: I1f1cfb161640eb8756fb1a283892d06b30b7a8fa Reviewed-on: https://go-review.googlesource.com/39356 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-04-03 23:19:07 +00:00
Lynn Boger	b6cd22c277	hash/crc32: improve performance for ppc64le This change improves the performance of crc32 for ppc64le by using vpmsum and other vector instructions in the algorithm. The testcase was updated to test more sizes. Fixes #19570 BenchmarkCRC32/poly=IEEE/size=15/align=0-8 90.5 81.8 -9.61% BenchmarkCRC32/poly=IEEE/size=15/align=1-8 89.7 81.7 -8.92% BenchmarkCRC32/poly=IEEE/size=40/align=0-8 93.2 61.1 -34.44% BenchmarkCRC32/poly=IEEE/size=40/align=1-8 92.8 60.9 -34.38% BenchmarkCRC32/poly=IEEE/size=512/align=0-8 501 55.8 -88.86% BenchmarkCRC32/poly=IEEE/size=512/align=1-8 502 132 -73.71% BenchmarkCRC32/poly=IEEE/size=1kB/align=0-8 947 69.9 -92.62% BenchmarkCRC32/poly=IEEE/size=1kB/align=1-8 946 144 -84.78% BenchmarkCRC32/poly=IEEE/size=4kB/align=0-8 3602 186 -94.84% BenchmarkCRC32/poly=IEEE/size=4kB/align=1-8 3603 263 -92.70% BenchmarkCRC32/poly=IEEE/size=32kB/align=0-8 28404 1338 -95.29% BenchmarkCRC32/poly=IEEE/size=32kB/align=1-8 28856 1405 -95.13% BenchmarkCRC32/poly=Castagnoli/size=15/align=0-8 89.7 81.8 -8.81% BenchmarkCRC32/poly=Castagnoli/size=15/align=1-8 89.8 81.9 -8.80% BenchmarkCRC32/poly=Castagnoli/size=40/align=0-8 93.8 61.4 -34.54% BenchmarkCRC32/poly=Castagnoli/size=40/align=1-8 94.3 61.3 -34.99% BenchmarkCRC32/poly=Castagnoli/size=512/align=0-8 503 56.4 -88.79% BenchmarkCRC32/poly=Castagnoli/size=512/align=1-8 502 132 -73.71% BenchmarkCRC32/poly=Castagnoli/size=1kB/align=0-8 941 70.2 -92.54% BenchmarkCRC32/poly=Castagnoli/size=1kB/align=1-8 943 145 -84.62% BenchmarkCRC32/poly=Castagnoli/size=4kB/align=0-8 3588 186 -94.82% BenchmarkCRC32/poly=Castagnoli/size=4kB/align=1-8 3595 264 -92.66% BenchmarkCRC32/poly=Castagnoli/size=32kB/align=0-8 28266 1323 -95.32% BenchmarkCRC32/poly=Castagnoli/size=32kB/align=1-8 28344 1404 -95.05% Change-Id: Ic4d8274c66e0e87bfba5f609f508a3877aee6bb5 Reviewed-on: https://go-review.googlesource.com/38184 Reviewed-by: David Chase <drchase@google.com>	2017-03-17 12:28:57 +00:00
Russ Cox	04e0a7622c	hash/crc32: use sub-benchmarks Change-Id: Iae68a097a6897f1616f94fdc3548837ef200e66f Reviewed-on: https://go-review.googlesource.com/36541 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>	2017-02-08 17:17:08 +00:00
Radu Berinde	bdde10137b	hash/crc32: cleanup code and improve tests Major reorganization of the crc32 code: - The arch-specific files now implement a well-defined interface (documented in crc32.go). They no longer have the responsibility of initializing and falling back to a non-accelerated implementation; instead, that happens in the higher level code. - The non-accelerated algorithms are moved to a separate file with no dependencies on other code. - The "cutoff" optimization for slicing-by-8 is moved inside the algorithm itself (as opposed to every callsite). Tests are significantly improved: - direct tests for the non-accelerated algorithms. - "cross-check" tests for arch-specific implementations (all archs). - tests for misaligned buffers for both IEEE and Castagnoli. Fixes #16909. Change-Id: I9b6dd83b7a57cd615eae901c0a6d61c6b8091c74 Reviewed-on: https://go-review.googlesource.com/27935 Reviewed-by: Keith Randall <khr@golang.org>	2016-08-31 15:17:57 +00:00
Radu Berinde	8c15a17251	hash/crc32: fix nil Castagnoli table problem When SSE is available, we don't need the Table. However, it is returned as a handle by MakeTable. Fix this to always generate the table. Further cleanup is discussed in #16909. Change-Id: Ic05400d68c6b5d25073ebd962000451746137afc Reviewed-on: https://go-review.googlesource.com/27934 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-08-28 19:01:07 +00:00
Radu Berinde	90c3cf4b52	hash/crc32: improve the AMD64 implementation using SSE4.2 The algorithm is explained in the comments. The improvement in throughput is about 1.4x for buffers between 500b-4Kb and 2.5x-2.6x for larger buffers. Additionally, we no longer initialize the software tables if SSE4.2 is available. Adding a test for the SSE implementation (restricted to amd64 and amd64p32). Benchmarks on a Haswell i5-4670 @ 3.4 GHz: name old time/op new time/op delta CastagnoliCrc15B-4 21.9ns ± 1% 22.9ns ± 0% +4.45% CastagnoliCrc15BMisaligned-4 22.6ns ± 0% 23.4ns ± 0% +3.43% CastagnoliCrc40B-4 23.3ns ± 0% 23.9ns ± 0% +2.58% CastagnoliCrc40BMisaligned-4 25.4ns ± 0% 26.1ns ± 0% +2.86% CastagnoliCrc512-4 72.6ns ± 0% 52.8ns ± 0% -27.33% CastagnoliCrc512Misaligned-4 76.3ns ± 1% 56.3ns ± 0% -26.18% CastagnoliCrc1KB-4 128ns ± 1% 89ns ± 0% -30.04% CastagnoliCrc1KBMisaligned-4 130ns ± 0% 88ns ± 0% -32.65% CastagnoliCrc4KB-4 461ns ± 0% 187ns ± 0% -59.40% CastagnoliCrc4KBMisaligned-4 463ns ± 0% 191ns ± 0% -58.77% CastagnoliCrc32KB-4 3.58µs ± 0% 1.35µs ± 0% -62.22% CastagnoliCrc32KBMisaligned-4 3.58µs ± 0% 1.36µs ± 0% -61.84% name old speed new speed delta CastagnoliCrc15B-4 684MB/s ± 1% 655MB/s ± 0% -4.32% CastagnoliCrc15BMisaligned-4 663MB/s ± 0% 641MB/s ± 0% -3.32% CastagnoliCrc40B-4 1.72GB/s ± 0% 1.67GB/s ± 0% -2.69% CastagnoliCrc40BMisaligned-4 1.58GB/s ± 0% 1.53GB/s ± 0% -2.82% CastagnoliCrc512-4 7.05GB/s ± 0% 9.70GB/s ± 0% +37.59% CastagnoliCrc512Misaligned-4 6.71GB/s ± 1% 9.09GB/s ± 0% +35.43% CastagnoliCrc1KB-4 7.98GB/s ± 1% 11.46GB/s ± 0% +43.55% CastagnoliCrc1KBMisaligned-4 7.86GB/s ± 0% 11.70GB/s ± 0% +48.75% CastagnoliCrc4KB-4 8.87GB/s ± 0% 21.80GB/s ± 0% +145.69% CastagnoliCrc4KBMisaligned-4 8.83GB/s ± 0% 21.39GB/s ± 0% +142.25% CastagnoliCrc32KB-4 9.15GB/s ± 0% 24.22GB/s ± 0% +164.62% CastagnoliCrc32KBMisaligned-4 9.16GB/s ± 0% 24.00GB/s ± 0% +161.94% Fixes #16107. 
Change-Id: Ibe50ea76574674ce0571ef31c31015e0ed66b907 Reviewed-on: https://go-review.googlesource.com/27931 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-08-28 01:39:03 +00:00
Keith Randall	3427f16642	Revert "hash/crc32: improve the AMD64 implementation using SSE4.2" This reverts commit `54d7de7dd6`. It was breaking non-amd64 builds. Change-Id: I22650e922498eeeba3d4fa08bb4ea40a210c8f97 Reviewed-on: https://go-review.googlesource.com/27925 Reviewed-by: Keith Randall <khr@golang.org>	2016-08-27 16:49:02 +00:00
Radu Berinde	54d7de7dd6	hash/crc32: improve the AMD64 implementation using SSE4.2 The algorithm is explained in the comments. The improvement in throughput is about 1.4x for buffers between 500b-4Kb and 2.5x-2.6x for larger buffers. Additionally, we no longer initialize the software tables if SSE4.2 is available. Benchmarks on a Haswell i5-4670 @ 3.4 GHz: name old time/op new time/op delta CastagnoliCrc15B-4 21.9ns ± 1% 22.9ns ± 0% +4.45% CastagnoliCrc15BMisaligned-4 22.6ns ± 0% 23.4ns ± 0% +3.43% CastagnoliCrc40B-4 23.3ns ± 0% 23.9ns ± 0% +2.58% CastagnoliCrc40BMisaligned-4 25.4ns ± 0% 26.1ns ± 0% +2.86% CastagnoliCrc512-4 72.6ns ± 0% 52.8ns ± 0% -27.33% CastagnoliCrc512Misaligned-4 76.3ns ± 1% 56.3ns ± 0% -26.18% CastagnoliCrc1KB-4 128ns ± 1% 89ns ± 0% -30.04% CastagnoliCrc1KBMisaligned-4 130ns ± 0% 88ns ± 0% -32.65% CastagnoliCrc4KB-4 461ns ± 0% 187ns ± 0% -59.40% CastagnoliCrc4KBMisaligned-4 463ns ± 0% 191ns ± 0% -58.77% CastagnoliCrc32KB-4 3.58µs ± 0% 1.35µs ± 0% -62.22% CastagnoliCrc32KBMisaligned-4 3.58µs ± 0% 1.36µs ± 0% -61.84% name old speed new speed delta CastagnoliCrc15B-4 684MB/s ± 1% 655MB/s ± 0% -4.32% CastagnoliCrc15BMisaligned-4 663MB/s ± 0% 641MB/s ± 0% -3.32% CastagnoliCrc40B-4 1.72GB/s ± 0% 1.67GB/s ± 0% -2.69% CastagnoliCrc40BMisaligned-4 1.58GB/s ± 0% 1.53GB/s ± 0% -2.82% CastagnoliCrc512-4 7.05GB/s ± 0% 9.70GB/s ± 0% +37.59% CastagnoliCrc512Misaligned-4 6.71GB/s ± 1% 9.09GB/s ± 0% +35.43% CastagnoliCrc1KB-4 7.98GB/s ± 1% 11.46GB/s ± 0% +43.55% CastagnoliCrc1KBMisaligned-4 7.86GB/s ± 0% 11.70GB/s ± 0% +48.75% CastagnoliCrc4KB-4 8.87GB/s ± 0% 21.80GB/s ± 0% +145.69% CastagnoliCrc4KBMisaligned-4 8.83GB/s ± 0% 21.39GB/s ± 0% +142.25% CastagnoliCrc32KB-4 9.15GB/s ± 0% 24.22GB/s ± 0% +164.62% CastagnoliCrc32KBMisaligned-4 9.16GB/s ± 0% 24.00GB/s ± 0% +161.94% Fixes #16107. Change-Id: I8fa827ec03f708ba27ee71c833f7544ad9dc5bc3 Reviewed-on: https://go-review.googlesource.com/24471 Reviewed-by: Keith Randall <khr@golang.org>	2016-08-27 15:50:28 +00:00
Michael Munday	4b17b152a3	hash/crc32: fix optimized s390x implementation The code wasn't checking to see if the data was still >= 64 bytes long after aligning it. Aligning the data is an optimization and we don't actually need to do it. In fact for smaller sizes it slows things down due to the overhead of calling the generic function. Therefore for now I have simply removed the alignment stage. I have also added a check into the assembly to deliberately trigger a segmentation fault if the data is too short. Fixes #16779. Change-Id: Ic01636d775efc5ec97689f050991cee04ce8fe73 Reviewed-on: https://go-review.googlesource.com/27409 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-08-21 02:04:43 +00:00
Radu Berinde	0c819b654f	hash/crc32: improve the processing of the last bytes in the SSE4.2 code for AMD64 This commit improves the processing of the final few bytes in castagnoliSSE42: instead of processing one byte at a time, we use all versions of the CRC32 instruction to process 4 bytes, then 2, then 1. The difference is only noticeable for small "odd" sized buffers. We do the similar improvement for processing the first few bytes in the case of unaligned buffer. Fixing the test which was not actually verifying the results for misaligned buffers (WriteString was creating an internal copy which was aligned). Adding benchmarks for length 15 (aligned and misaligned), results below. name old time/op new time/op delta CastagnoliCrc15B-4 25.1ns ± 0% 22.1ns ± 1% -12.14% CastagnoliCrc15BMisaligned-4 25.2ns ± 0% 22.9ns ± 1% -9.03% CastagnoliCrc40B-4 23.1ns ± 0% 23.4ns ± 0% +1.08% CastagnoliCrc1KB-4 127ns ± 0% 128ns ± 0% +1.18% CastagnoliCrc4KB-4 462ns ± 0% 464ns ± 0% ~ CastagnoliCrc32KB-4 3.58µs ± 0% 3.60µs ± 0% +0.58% name old speed new speed delta CastagnoliCrc15B-4 597MB/s ± 0% 679MB/s ± 1% +13.77% CastagnoliCrc15BMisaligned-4 596MB/s ± 0% 655MB/s ± 1% +9.94% CastagnoliCrc40B-4 1.73GB/s ± 0% 1.71GB/s ± 0% -1.14% CastagnoliCrc1KB-4 8.01GB/s ± 0% 7.93GB/s ± 1% -1.06% CastagnoliCrc4KB-4 8.86GB/s ± 0% 8.83GB/s ± 0% ~ CastagnoliCrc32KB-4 9.14GB/s ± 0% 9.09GB/s ± 0% -0.58% Change-Id: I499e37af2241d28e3e5d522bbab836c1a718430a Reviewed-on: https://go-review.googlesource.com/24470 Reviewed-by: Keith Randall <khr@golang.org>	2016-08-17 21:20:50 +00:00
Ilya Tocar	9d73e146da	hash/crc64: Use slicing by 8. Similar to crc32 slicing by 8. This also fixes a Crc64KB benchmark actually using 1024 bytes. Crc64/ISO64KB-4 147µs ± 0% 37µs ± 0% -75.05% (p=0.000 n=18+18) Crc64/ISO4KB-4 9.19µs ± 0% 2.33µs ± 0% -74.70% (p=0.000 n=19+20) Crc64/ISO1KB-4 2.31µs ± 0% 0.60µs ± 0% -73.81% (p=0.000 n=19+15) Crc64/ECMA64KB-4 147µs ± 0% 37µs ± 0% -75.05% (p=0.000 n=20+20) Crc64/Random64KB-4 147µs ± 0% 41µs ± 0% -72.17% (p=0.000 n=20+18) Crc64/Random16KB-4 36.7µs ± 0% 36.5µs ± 0% -0.54% (p=0.000 n=18+19) name old speed new speed delta Crc64/ISO64KB-4 446MB/s ± 0% 1788MB/s ± 0% +300.72% (p=0.000 n=18+18) Crc64/ISO4KB-4 446MB/s ± 0% 1761MB/s ± 0% +295.20% (p=0.000 n=18+20) Crc64/ISO1KB-4 444MB/s ± 0% 1694MB/s ± 0% +281.46% (p=0.000 n=19+20) Crc64/ECMA64KB-4 446MB/s ± 0% 1788MB/s ± 0% +300.77% (p=0.000 n=20+20) Crc64/Random64KB-4 446MB/s ± 0% 1603MB/s ± 0% +259.32% (p=0.000 n=20+18) Crc64/Random16KB-4 446MB/s ± 0% 448MB/s ± 0% +0.54% (p=0.000 n=18+20) Change-Id: I1c7621d836c486d6bfc41dbe1ec2ff9ab11aedfc Reviewed-on: https://go-review.googlesource.com/22222 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2016-05-18 14:38:04 +00:00
Chris Zou	5833d843de	hash/crc32: use vector instructions on s390x The input buffer is aligned to a doubleword boundary to improve performance of the vector instructions. The pure Go implementation is used to align the input data, and is also used when the vector instructions are not available or the data length is less than 64 bytes. Change-Id: Ie259a5f2f1562bcc17961c99e5776c99091d6bed Reviewed-on: https://go-review.googlesource.com/22201 Reviewed-by: Michael Munday <munday@ca.ibm.com> Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bill O'Farrell <billotosyr@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-04-22 18:07:15 +00:00
Ilya Tocar	89a1f02834	hash/adler32: Unroll loop for extra performance. name old time/op new time/op delta Adler32KB-4 592ns ± 0% 447ns ± 0% -24.49% (p=0.000 n=19+20) name old speed new speed delta Adler32KB-4 1.73GB/s ± 0% 2.29GB/s ± 0% +32.41% (p=0.000 n=20+20) Change-Id: I38990aa66ca4452a886200018a57c0bc3af30717 Reviewed-on: https://go-review.googlesource.com/21880 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-04-15 10:17:17 +00:00
Michael Munday	8edf4cb27d	hash/crc32: invert build tags for go implementation It seems cleaner and more consistent with other files to list the architectures that have assembly implementations rather than to list those that do not. This means we don't have to add s390x and future platforms to this list. Change-Id: I2ad3f66b76eb1711333c910236ca7f5151b698e5 Reviewed-on: https://go-review.googlesource.com/21770 Reviewed-by: Bill O'Farrell <billotosyr@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-04-12 16:30:25 +00:00
Ilya Tocar	f5bd3556f5	hash/crc64: Add tests for ECMA polynomial Currently we test crc64 only with ISO polynomial. Change-Id: Ibc5e202db3b960369cbbb18e31eb0fea07b54dba Reviewed-on: https://go-review.googlesource.com/21309 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-03-31 20:42:02 +00:00
Klaus Post	b212c68b90	hash/crc32: use slicing by 8 for Castagnoli and smaller sizes This adds "slicing by 8" optimization to Castagnoli tables which will speed up CRC32 calculation on systems without assembler, which are all but AMD64. In my tests, it is faster to use "slicing by 8" for sizes all down to 16 bytes, so the switchover point has been adjusted. There are no benchmarks for small sizes, so I have added one for 40 bytes, as well as one for bigger sizes (32KB). Castagnoli, No assembler, 40 Byte payload: (before, after) BenchmarkCastagnoli40B-4 10000000 161 ns/op 246.94 MB/s BenchmarkCastagnoli40B-4 20000000 100 ns/op 398.01 MB/s Castagnoli, No assembler, 32KB payload: (before, after) BenchmarkCastagnoli32KB-4 10000 115426 ns/op 283.89 MB/s BenchmarkCastagnoli32KB-4 30000 45171 ns/op 725.41 MB/s IEEE, No assembler, 1KB payload: (before, after) BenchmarkCrc1KB-4 500000 3604 ns/op 284.10 MB/s BenchmarkCrc1KB-4 1000000 1463 ns/op 699.79 MB/s Compared: benchmark old ns/op new ns/op delta BenchmarkCastagnoli40B-4 161 100 -37.89% BenchmarkCastagnoli32KB-4 115426 45171 -60.87% BenchmarkCrc1KB-4 3604 1463 -59.41% benchmark old MB/s new MB/s speedup BenchmarkCastagnoli40B-4 246.94 398.01 1.61x BenchmarkCastagnoli32KB-4 283.89 725.41 2.56x BenchmarkCrc1KB-4 284.10 699.79 2.46x Change-Id: I303e4ec84e8d4dafd057d64c0e43deb2b498e968 Reviewed-on: https://go-review.googlesource.com/19335 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-03-08 16:46:24 +00:00
Ilya Tocar	1d1f2fb4c6	cmd/internal/obj/x86: add new instructions, cleanup. Add several instructions that were used via BYTE and use them. Instructions added: PEXTRB, PEXTRD, PEXTRQ, PINSRB, XGETBV, POPCNT. Change-Id: I5a80cd390dc01f3555dbbe856a475f74b5e6df65 Reviewed-on: https://go-review.googlesource.com/18593 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2016-01-13 14:04:44 +00:00
Joe Tsai	64cc5fd0b3	hash/crc32: add noescape tags to assembly functions CRC-32 computation is stateless and the p slice does not get stored anywhere. Thus, we mark the assembly functions as noescape so that it doesn't believe that p leaks in: func Update(crc uint32, tab *Table, p []byte) uint32 Before: ./crc32.go:153: leaking param: p After: ./crc32.go:153: Update p does not escape Change-Id: I52ba35b6cc544fff724327140e0c27898431d1dc Reviewed-on: https://go-review.googlesource.com/17069 Reviewed-by: Russ Cox <rsc@golang.org>	2015-11-25 15:01:10 +00:00
Joe Tsai	d6ee6c2d06	hash/crc32: rename iEEETable to ieeeTable iEEETable violates the Go naming conventions and is inconsistent with the rest of the package. Use ieeeTable instead. Change-Id: I04b201aa39759d159de2b0295f43da80488c2263 Reviewed-on: https://go-review.googlesource.com/17068 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>	2015-11-20 04:57:07 +00:00
Yao Zhang	84df38181b	hash/crc32: added mips64{,le} build tags Change-Id: I77c6768fff6f0163b36800307c4d573bb6521fe5 Reviewed-on: https://go-review.googlesource.com/14454 Reviewed-by: Minux Ma <minux@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-11-12 04:50:43 +00:00
Klaus Post	2027b00e63	hash/crc32: add AMD64 optimized IEEE CRC calculation IEEE is the most commonly used CRC-32 polynomial, used by zip, gzip and others. Based on http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf benchmark old ns/op new ns/op delta BenchmarkIEEECrc1KB-8 3193 352 -88.98% BenchmarkIEEECrc4KB-8 5025 1307 -73.99% BenchmarkCastagnoliCrc1KB-8 126 126 +0.00% benchmark old MB/s new MB/s speedup BenchmarkIEEECrc1KB-8 320.68 2901.92 9.05x BenchmarkIEEECrc4KB-8 815.08 3131.80 3.84x BenchmarkCastagnoliCrc1KB-8 8100.80 8109.78 1.00x Change-Id: I99c9a48365f631827f516e44f97e86155f03cb90 Reviewed-on: https://go-review.googlesource.com/14080 Reviewed-by: Keith Randall <khr@golang.org>	2015-09-16 15:42:42 +00:00
Shenghou Ma	91ddc07f65	hash/*: document the byte order used by the Sum methods Fixes #12350. Change-Id: I3dcb0e2190c11f83f15fb07cc637fead54f734f7 Reviewed-on: https://go-review.googlesource.com/14275 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-09-10 03:34:23 +00:00
Joe Tsai	e16d80362d	hash: update documentation for MakeTable in crc32 and crc64 Explicitly say that *Table returned by MakeTable may not be modified. Otherwise, this leads to very subtle bugs that may or may not manifest themselves. Same comment was made on package crc64, to keep the future open to the caching tables that crc32 effectively does. Fixes: #12487. Change-Id: I2881bebb8b16f6f8564412172774c79c2593c6c1 Reviewed-on: https://go-review.googlesource.com/14258 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-09-04 02:16:27 +00:00
Joe Tsai	8e2d0e1c4c	hash/fnv: fix wiki url The URL is shown on go docs and is an eye-sore. For go1.6. Change-Id: I8b8ea3751200d06ed36acfe22f47ebb38107f8db Reviewed-on: https://go-review.googlesource.com/13282 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-08-24 21:26:42 +00:00
Davies Liu	1e0760354c	hash/crc32: speedup crc32 of IEEE using slicingBy8 The Slicing-By-8 [1] algorithm performs much better than the current approach. This patch only uses it for IEEE, which is the most common case in practice. Here is the benchmark on Mac OS X 10.9: benchmark old MB/s new MB/s speedup BenchmarkIEEECrc1KB 349.40 353.03 1.01x BenchmarkIEEECrc4KB 351.55 934.35 2.66x BenchmarkCastagnoliCrc1KB 7037.58 7392.63 1.05x This algorithm needs an 8K lookup table, so it's enabled only for blocks larger than 4K. We can see about 2.6x improvement for IEEE. Change-Id: I7f786d20f0949245e4aa101d7921669f496ed0f7 Reviewed-on: https://go-review.googlesource.com/1863 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-18 18:14:24 +00:00
Shenghou Ma	169adec231	hash/crc32: move reverse representation docs to an example Updates #8229. Change-Id: I3e691479d3659ed1b3ff8ebbb71b4fc03f2e67af Reviewed-on: https://go-review.googlesource.com/9680 Reviewed-by: Rob Pike <r@golang.org>	2015-05-04 00:19:22 +00:00
Aamir Khan	80f575b78f	hash/crc32: clarify documentation Explicitly specify that we represent polynomial in reversed notation Fixes #8229 Change-Id: Idf094c01fd82f133cd0c1b50fa967d12c577bdb5 Reviewed-on: https://go-review.googlesource.com/9237 Reviewed-by: David Chase <drchase@google.com>	2015-04-24 13:44:25 +00:00
Aram Hăvărneanu	a25e3c03f3	os/signal, hash/crc32: add arm64 build tags Change-Id: I6ca9caec8ccf12618e56dcf6b83328e7acf8b1ec Reviewed-on: https://go-review.googlesource.com/7148 Reviewed-by: Minux Ma <minux@golang.org> Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Russ Cox <rsc@golang.org>	2015-03-16 18:46:43 +00:00
Russ Cox	09d92b6bbf	all: power64 is now ppc64 Fixes #8654. LGTM=austin R=austin CC=golang-codereviews https://golang.org/cl/180600043	2014-12-05 19:13:20 -05:00
Russ Cox	50e0749f87	[dev.cc] all: merge default (e4ab8f908aac) into dev.cc TBR=austin CC=golang-codereviews https://golang.org/cl/179040044	2014-11-20 11:48:08 -05:00
Nigel Tao	de7d1c4094	hash/crc32: fix comment that the IEEE polynomial applies to MPEG-2. LGTM=minux R=adg, minux CC=golang-codereviews https://golang.org/cl/170520043	2014-11-12 18:48:00 +11:00
Austin Clements	2bd616b1a7	build: merge the great pkg/ rename into dev.power64 This also removes pkg/runtime/traceback_lr.c, which was ported to Go in an earlier commit and then moved to runtime/traceback.go. Reviewer: rsc@golang.org rsc: LGTM	2014-10-22 13:25:37 -04:00
Russ Cox	c007ce824d	build: move package sources from src/pkg to src Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.	2014-09-08 00:08:51 -04:00

43 Commits