mirror of https://github.com/golang/go synced 2024-09-29 19:24:33 -06:00

43 Commits

Author SHA1 Message Date
Roger Peppe
bd926e1c65 crypto, hash: document marshal/unmarshal implementation
Unless you go back and read the hash package documentation, it's
not clear that all the hash packages implement marshaling and
unmarshaling. Document the behaviour specifically in each package
that implements it, as this is hidden behaviour and easy to miss.

Change-Id: Id9d3508909362f1a3e53872d0319298359e50a94
Reviewed-on: https://go-review.googlesource.com/77251
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2017-11-15 00:06:24 +00:00
Fangming.Fang
66bfbd9ad7 internal/cpu: detect cpu features in internal/cpu package
Change the hash/crc32 package to use the cpu package instead of
runtime internal variables to check for the crc32 instruction.

Change-Id: I8f88d2351bde8ed4e256f9adf822a08b9a00f532
Reviewed-on: https://go-review.googlesource.com/76490
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2017-11-14 19:07:15 +00:00
Joe Tsai
4aea3e7135 hash: document that the encoded state may contain input in plaintext
The cryptographic checksums operate in blocks of 64 or 128 bytes,
which means that the last 128 bytes or so of the input may be encoded
in its original (plaintext) form as part of the state.
Document this so users do not falsely assume that the encoded state
carries no reversible information about the input.
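
The point above can be checked with a short sketch. It assumes SHA-256 and
an input shorter than one 64-byte block; the helper name stateContains is
hypothetical and not part of any package.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding"
	"fmt"
)

// stateContains reports whether the marshaled SHA-256 state produced after
// writing input still contains the bytes of tail verbatim.
// (Hypothetical helper, for illustration only.)
func stateContains(input, tail []byte) (bool, error) {
	h := sha256.New()
	h.Write(input)
	m, ok := h.(encoding.BinaryMarshaler)
	if !ok {
		return false, fmt.Errorf("hash does not implement encoding.BinaryMarshaler")
	}
	state, err := m.MarshalBinary()
	if err != nil {
		return false, err
	}
	return bytes.Contains(state, tail), nil
}

func main() {
	secret := []byte("hunter2-not-so-secret")
	found, err := stateContains(secret, secret)
	if err != nil {
		panic(err)
	}
	fmt.Println("input visible in encoded state:", found)
}
```

Because the input is shorter than a block, it has not been compressed yet
and sits in the hash's internal buffer, so the encoded state should be
treated as sensitive as the input itself.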

Change-Id: I823dbb87867bf0a77aa20f6ed7a615dbedab3715
Reviewed-on: https://go-review.googlesource.com/77372
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-11-13 22:14:58 +00:00
Tim Cooper
0ee4527ac7 hash: add marshaling, unmarshaling example
Example usage of functionality implemented in CL 66710.

Change-Id: I87d6e4d2fb7a60e4ba1e6ef02715480eb7e8f8bd
Reviewed-on: https://go-review.googlesource.com/76011
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-11-04 03:47:34 +00:00
Joe Tsai
08f19bbde1 go/printer: forbid empty line before first comment in block
To improve readability when exported fields are removed,
forbid the printer from emitting an empty line before the first comment
in a const, var, or type block.
Also, when printing the "Has filtered or unexported fields." message,
add an empty line before it to separate the message from the struct
or interface contents.

Before the change:
<<<
type NamedArg struct {

        // Name is the name of the parameter placeholder.
        //
        // If empty, the ordinal position in the argument list will be
        // used.
        //
        // Name must omit any symbol prefix.
        Name string

        // Value is the value of the parameter.
        // It may be assigned the same value types as the query
        // arguments.
        Value interface{}
        // contains filtered or unexported fields
}
>>>

After the change:
<<<
type NamedArg struct {
        // Name is the name of the parameter placeholder.
        //
        // If empty, the ordinal position in the argument list will be
        // used.
        //
        // Name must omit any symbol prefix.
        Name string

        // Value is the value of the parameter.
        // It may be assigned the same value types as the query
        // arguments.
        Value interface{}

        // contains filtered or unexported fields
}
>>>

Fixes #18264

Change-Id: I9fe17ca39cf92fcdfea55064bd2eaa784ce48c88
Reviewed-on: https://go-review.googlesource.com/71990
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-11-02 18:17:22 +00:00
Tim Cooper
731b632172 crypto, hash: implement BinaryMarshaler, BinaryUnmarshaler in hash implementations
The marshal method allows the hash's internal state to be serialized and
unmarshaled at a later time, without having to re-write the entire stream
of data that was already written to the hash.

Fixes #20573

Change-Id: I40bbb84702ac4b7c5662f99bf943cdf4081203e5
Reviewed-on: https://go-review.googlesource.com/66710
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-01 21:04:12 +00:00
Mikio Hara
7b659eb155 all: gofmt
Change-Id: I2d0439a9f068e726173afafe2ef1f5d62b7feb4d
Reviewed-on: https://go-review.googlesource.com/46190
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-06-21 03:14:30 +00:00
Martin Möhrmann
69972aea74 internal/cpu: new package to detect cpu features
Implements detection of x86 cpu features that
are used in the go standard library.

Changes all standard library packages to use the new cpu package
instead of using runtime internal variables to check x86 cpu features.

Updates: #15403

Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9
Reviewed-on: https://go-review.googlesource.com/41476
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-05-10 17:02:21 +00:00
Wei Xiao
ab636b899c hash/crc32: optimize arm64 crc32 implementation
ARMv8 defines crc32 instructions.

Compared to the original crc32 calculation, this patch uses the
crc32 instructions instead of the multiple lookup table algorithms.

ARMv8 provides instructions for the IEEE and Castagnoli polynomials,
so the performance of these two types of crc32 is significantly
improved.

name                                        old time/op   new time/op    delta
CRC32/poly=IEEE/size=15/align=0-32            117ns ± 0%      38ns ± 0%   -67.44%
CRC32/poly=IEEE/size=15/align=1-32            117ns ± 0%      38ns ± 0%   -67.52%
CRC32/poly=IEEE/size=40/align=0-32            129ns ± 0%      41ns ± 0%   -68.37%
CRC32/poly=IEEE/size=40/align=1-32            129ns ± 0%      41ns ± 0%   -68.29%
CRC32/poly=IEEE/size=512/align=0-32           828ns ± 0%     246ns ± 0%   -70.29%
CRC32/poly=IEEE/size=512/align=1-32           828ns ± 0%     132ns ± 0%   -84.06%
CRC32/poly=IEEE/size=1kB/align=0-32          1.58µs ± 0%    0.46µs ± 0%   -70.98%
CRC32/poly=IEEE/size=1kB/align=1-32          1.58µs ± 0%    0.46µs ± 0%   -70.92%
CRC32/poly=IEEE/size=4kB/align=0-32          6.06µs ± 0%    1.74µs ± 0%   -71.27%
CRC32/poly=IEEE/size=4kB/align=1-32          6.10µs ± 0%    1.74µs ± 0%   -71.44%
CRC32/poly=IEEE/size=32kB/align=0-32         48.3µs ± 0%    13.7µs ± 0%   -71.61%
CRC32/poly=IEEE/size=32kB/align=1-32         48.3µs ± 0%    13.7µs ± 0%   -71.60%
CRC32/poly=Castagnoli/size=15/align=0-32      116ns ± 0%      38ns ± 0%   -67.07%
CRC32/poly=Castagnoli/size=15/align=1-32      116ns ± 0%      38ns ± 0%   -66.90%
CRC32/poly=Castagnoli/size=40/align=0-32      127ns ± 0%      40ns ± 0%   -68.11%
CRC32/poly=Castagnoli/size=40/align=1-32      127ns ± 0%      40ns ± 0%   -68.11%
CRC32/poly=Castagnoli/size=512/align=0-32     828ns ± 0%     132ns ± 0%   -84.06%
CRC32/poly=Castagnoli/size=512/align=1-32     827ns ± 0%     132ns ± 0%   -84.04%
CRC32/poly=Castagnoli/size=1kB/align=0-32    1.59µs ± 0%    0.22µs ± 0%   -85.89%
CRC32/poly=Castagnoli/size=1kB/align=1-32    1.58µs ± 0%    0.22µs ± 0%   -85.79%
CRC32/poly=Castagnoli/size=4kB/align=0-32    6.14µs ± 0%    0.77µs ± 0%   -87.40%
CRC32/poly=Castagnoli/size=4kB/align=1-32    6.06µs ± 0%    0.77µs ± 0%   -87.25%
CRC32/poly=Castagnoli/size=32kB/align=0-32   48.3µs ± 0%     5.9µs ± 0%   -87.71%
CRC32/poly=Castagnoli/size=32kB/align=1-32   48.4µs ± 0%     6.0µs ± 0%   -87.69%
CRC32/poly=Koopman/size=15/align=0-32         104ns ± 0%     104ns ± 0%    +0.00%
CRC32/poly=Koopman/size=15/align=1-32         104ns ± 0%     104ns ± 0%    +0.00%
CRC32/poly=Koopman/size=40/align=0-32         235ns ± 0%     235ns ± 0%    +0.00%
CRC32/poly=Koopman/size=40/align=1-32         235ns ± 0%     235ns ± 0%    +0.00%
CRC32/poly=Koopman/size=512/align=0-32       2.71µs ± 0%    2.71µs ± 0%    -0.07%
CRC32/poly=Koopman/size=512/align=1-32       2.71µs ± 0%    2.71µs ± 0%    -0.04%
CRC32/poly=Koopman/size=1kB/align=0-32       5.40µs ± 0%    5.39µs ± 0%    -0.06%
CRC32/poly=Koopman/size=1kB/align=1-32       5.40µs ± 0%    5.40µs ± 0%    +0.02%
CRC32/poly=Koopman/size=4kB/align=0-32       21.5µs ± 0%    21.5µs ± 0%    -0.16%
CRC32/poly=Koopman/size=4kB/align=1-32       21.5µs ± 0%    21.5µs ± 0%    -0.05%
CRC32/poly=Koopman/size=32kB/align=0-32       172µs ± 0%     172µs ± 0%    -0.07%
CRC32/poly=Koopman/size=32kB/align=1-32       172µs ± 0%     172µs ± 0%    -0.01%

name                                        old speed     new speed      delta
CRC32/poly=IEEE/size=15/align=0-32          128MB/s ± 0%   394MB/s ± 0%  +207.95%
CRC32/poly=IEEE/size=15/align=1-32          128MB/s ± 0%   394MB/s ± 0%  +208.09%
CRC32/poly=IEEE/size=40/align=0-32          310MB/s ± 0%   979MB/s ± 0%  +216.07%
CRC32/poly=IEEE/size=40/align=1-32          310MB/s ± 0%   979MB/s ± 0%  +216.16%
CRC32/poly=IEEE/size=512/align=0-32         618MB/s ± 0%  2074MB/s ± 0%  +235.72%
CRC32/poly=IEEE/size=512/align=1-32         618MB/s ± 0%  3852MB/s ± 0%  +523.55%
CRC32/poly=IEEE/size=1kB/align=0-32         646MB/s ± 0%  2225MB/s ± 0%  +244.57%
CRC32/poly=IEEE/size=1kB/align=1-32         647MB/s ± 0%  2225MB/s ± 0%  +243.87%
CRC32/poly=IEEE/size=4kB/align=0-32         676MB/s ± 0%  2352MB/s ± 0%  +248.02%
CRC32/poly=IEEE/size=4kB/align=1-32         672MB/s ± 0%  2352MB/s ± 0%  +250.15%
CRC32/poly=IEEE/size=32kB/align=0-32        678MB/s ± 0%  2387MB/s ± 0%  +252.17%
CRC32/poly=IEEE/size=32kB/align=1-32        678MB/s ± 0%  2388MB/s ± 0%  +252.11%
CRC32/poly=Castagnoli/size=15/align=0-32    129MB/s ± 0%   393MB/s ± 0%  +205.51%
CRC32/poly=Castagnoli/size=15/align=1-32    129MB/s ± 0%   390MB/s ± 0%  +203.41%
CRC32/poly=Castagnoli/size=40/align=0-32    314MB/s ± 0%   988MB/s ± 0%  +215.04%
CRC32/poly=Castagnoli/size=40/align=1-32    314MB/s ± 0%   987MB/s ± 0%  +214.68%
CRC32/poly=Castagnoli/size=512/align=0-32   618MB/s ± 0%  3860MB/s ± 0%  +524.32%
CRC32/poly=Castagnoli/size=512/align=1-32   619MB/s ± 0%  3859MB/s ± 0%  +523.66%
CRC32/poly=Castagnoli/size=1kB/align=0-32   645MB/s ± 0%  4568MB/s ± 0%  +608.56%
CRC32/poly=Castagnoli/size=1kB/align=1-32   650MB/s ± 0%  4567MB/s ± 0%  +602.94%
CRC32/poly=Castagnoli/size=4kB/align=0-32   667MB/s ± 0%  5297MB/s ± 0%  +693.81%
CRC32/poly=Castagnoli/size=4kB/align=1-32   676MB/s ± 0%  5297MB/s ± 0%  +684.00%
CRC32/poly=Castagnoli/size=32kB/align=0-32  678MB/s ± 0%  5519MB/s ± 0%  +713.83%
CRC32/poly=Castagnoli/size=32kB/align=1-32  677MB/s ± 0%  5497MB/s ± 0%  +712.04%
CRC32/poly=Koopman/size=15/align=0-32       143MB/s ± 0%   144MB/s ± 0%    +0.27%
CRC32/poly=Koopman/size=15/align=1-32       143MB/s ± 0%   144MB/s ± 0%    +0.33%
CRC32/poly=Koopman/size=40/align=0-32       169MB/s ± 0%   170MB/s ± 0%    +0.12%
CRC32/poly=Koopman/size=40/align=1-32       170MB/s ± 0%   170MB/s ± 0%    +0.08%
CRC32/poly=Koopman/size=512/align=0-32      189MB/s ± 0%   189MB/s ± 0%    +0.07%
CRC32/poly=Koopman/size=512/align=1-32      189MB/s ± 0%   189MB/s ± 0%    +0.04%
CRC32/poly=Koopman/size=1kB/align=0-32      190MB/s ± 0%   190MB/s ± 0%    +0.05%
CRC32/poly=Koopman/size=1kB/align=1-32      190MB/s ± 0%   190MB/s ± 0%    -0.01%
CRC32/poly=Koopman/size=4kB/align=0-32      190MB/s ± 0%   190MB/s ± 0%    +0.15%
CRC32/poly=Koopman/size=4kB/align=1-32      190MB/s ± 0%   191MB/s ± 0%    +0.05%
CRC32/poly=Koopman/size=32kB/align=0-32     191MB/s ± 0%   191MB/s ± 0%    +0.06%
CRC32/poly=Koopman/size=32kB/align=1-32     191MB/s ± 0%   191MB/s ± 0%    +0.02%

Also fix a bug in the arm64 assembler.

The optimization is mainly contributed by Fangming.Fang <fangming.fang@arm.com>

Change-Id: I900678c2e445d7e8ad9e2a9ab3305d649230905f
Reviewed-on: https://go-review.googlesource.com/40074
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-13 12:44:10 +00:00
Lucas Clemente
e05de6a5be hash/fnv: add 128-bit FNV hash support
The 128bit FNV hash will be used e.g. in QUIC.

The algorithm is described at
https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function

Change-Id: I13f3ec39b0e12b7a5008824a6619dff2e708ee81
Reviewed-on: https://go-review.googlesource.com/38356
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-13 01:28:48 +00:00
Eric Lagergren
094498c9a1 all: fix minor misspellings
Change-Id: I1f1cfb161640eb8756fb1a283892d06b30b7a8fa
Reviewed-on: https://go-review.googlesource.com/39356
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-03 23:19:07 +00:00
Lynn Boger
b6cd22c277 hash/crc32: improve performance for ppc64le
This change improves the performance of crc32 for ppc64le by using
vpmsum and other vector instructions in the algorithm.

The testcase was updated to test more sizes.

Fixes #19570

benchmark                                              old ns/op     new ns/op     delta
BenchmarkCRC32/poly=IEEE/size=15/align=0-8             90.5          81.8          -9.61%
BenchmarkCRC32/poly=IEEE/size=15/align=1-8             89.7          81.7          -8.92%
BenchmarkCRC32/poly=IEEE/size=40/align=0-8             93.2          61.1          -34.44%
BenchmarkCRC32/poly=IEEE/size=40/align=1-8             92.8          60.9          -34.38%
BenchmarkCRC32/poly=IEEE/size=512/align=0-8            501           55.8          -88.86%
BenchmarkCRC32/poly=IEEE/size=512/align=1-8            502           132           -73.71%
BenchmarkCRC32/poly=IEEE/size=1kB/align=0-8            947           69.9          -92.62%
BenchmarkCRC32/poly=IEEE/size=1kB/align=1-8            946           144           -84.78%
BenchmarkCRC32/poly=IEEE/size=4kB/align=0-8            3602          186           -94.84%
BenchmarkCRC32/poly=IEEE/size=4kB/align=1-8            3603          263           -92.70%
BenchmarkCRC32/poly=IEEE/size=32kB/align=0-8           28404         1338          -95.29%
BenchmarkCRC32/poly=IEEE/size=32kB/align=1-8           28856         1405          -95.13%
BenchmarkCRC32/poly=Castagnoli/size=15/align=0-8       89.7          81.8          -8.81%
BenchmarkCRC32/poly=Castagnoli/size=15/align=1-8       89.8          81.9          -8.80%
BenchmarkCRC32/poly=Castagnoli/size=40/align=0-8       93.8          61.4          -34.54%
BenchmarkCRC32/poly=Castagnoli/size=40/align=1-8       94.3          61.3          -34.99%
BenchmarkCRC32/poly=Castagnoli/size=512/align=0-8      503           56.4          -88.79%
BenchmarkCRC32/poly=Castagnoli/size=512/align=1-8      502           132           -73.71%
BenchmarkCRC32/poly=Castagnoli/size=1kB/align=0-8      941           70.2          -92.54%
BenchmarkCRC32/poly=Castagnoli/size=1kB/align=1-8      943           145           -84.62%
BenchmarkCRC32/poly=Castagnoli/size=4kB/align=0-8      3588          186           -94.82%
BenchmarkCRC32/poly=Castagnoli/size=4kB/align=1-8      3595          264           -92.66%
BenchmarkCRC32/poly=Castagnoli/size=32kB/align=0-8     28266         1323          -95.32%
BenchmarkCRC32/poly=Castagnoli/size=32kB/align=1-8     28344         1404          -95.05%

Change-Id: Ic4d8274c66e0e87bfba5f609f508a3877aee6bb5
Reviewed-on: https://go-review.googlesource.com/38184
Reviewed-by: David Chase <drchase@google.com>
2017-03-17 12:28:57 +00:00
Russ Cox
04e0a7622c hash/crc32: use sub-benchmarks
Change-Id: Iae68a097a6897f1616f94fdc3548837ef200e66f
Reviewed-on: https://go-review.googlesource.com/36541
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2017-02-08 17:17:08 +00:00
Radu Berinde
bdde10137b hash/crc32: cleanup code and improve tests
Major reorganization of the crc32 code:

 - The arch-specific files now implement a well-defined interface
   (documented in crc32.go). They no longer have the responsibility of
   initializing and falling back to a non-accelerated implementation;
   instead, that happens in the higher level code.

 - The non-accelerated algorithms are moved to a separate file with no
   dependencies on other code.

 - The "cutoff" optimization for slicing-by-8 is moved inside the
   algorithm itself (as opposed to every callsite).

Tests are significantly improved:
 - direct tests for the non-accelerated algorithms.
 - "cross-check" tests for arch-specific implementations (all archs).
 - tests for misaligned buffers for both IEEE and Castagnoli.

Fixes #16909.

Change-Id: I9b6dd83b7a57cd615eae901c0a6d61c6b8091c74
Reviewed-on: https://go-review.googlesource.com/27935
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-31 15:17:57 +00:00
Radu Berinde
8c15a17251 hash/crc32: fix nil Castagnoli table problem
When SSE is available, we don't need the Table. However, it is
returned as a handle by MakeTable. Fix this to always generate
the table.

Further cleanup is discussed in #16909.

Change-Id: Ic05400d68c6b5d25073ebd962000451746137afc
Reviewed-on: https://go-review.googlesource.com/27934
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-28 19:01:07 +00:00
Radu Berinde
90c3cf4b52 hash/crc32: improve the AMD64 implementation using SSE4.2
The algorithm is explained in the comments. The improvement in
throughput is about 1.4x for buffers between 500 B and 4 KB and 2.5x-2.6x
for larger buffers.

Additionally, we no longer initialize the software tables if SSE4.2 is
available.

Adding a test for the SSE implementation (restricted to amd64 and
amd64p32).

Benchmarks on a Haswell i5-4670 @ 3.4 GHz:

name                           old time/op    new time/op     delta
CastagnoliCrc15B-4               21.9ns ± 1%     22.9ns ± 0%    +4.45%
CastagnoliCrc15BMisaligned-4     22.6ns ± 0%     23.4ns ± 0%    +3.43%
CastagnoliCrc40B-4               23.3ns ± 0%     23.9ns ± 0%    +2.58%
CastagnoliCrc40BMisaligned-4     25.4ns ± 0%     26.1ns ± 0%    +2.86%
CastagnoliCrc512-4               72.6ns ± 0%     52.8ns ± 0%   -27.33%
CastagnoliCrc512Misaligned-4     76.3ns ± 1%     56.3ns ± 0%   -26.18%
CastagnoliCrc1KB-4                128ns ± 1%       89ns ± 0%   -30.04%
CastagnoliCrc1KBMisaligned-4      130ns ± 0%       88ns ± 0%   -32.65%
CastagnoliCrc4KB-4                461ns ± 0%      187ns ± 0%   -59.40%
CastagnoliCrc4KBMisaligned-4      463ns ± 0%      191ns ± 0%   -58.77%
CastagnoliCrc32KB-4              3.58µs ± 0%     1.35µs ± 0%   -62.22%
CastagnoliCrc32KBMisaligned-4    3.58µs ± 0%     1.36µs ± 0%   -61.84%

name                           old speed      new speed       delta
CastagnoliCrc15B-4              684MB/s ± 1%    655MB/s ± 0%    -4.32%
CastagnoliCrc15BMisaligned-4    663MB/s ± 0%    641MB/s ± 0%    -3.32%
CastagnoliCrc40B-4             1.72GB/s ± 0%   1.67GB/s ± 0%    -2.69%
CastagnoliCrc40BMisaligned-4   1.58GB/s ± 0%   1.53GB/s ± 0%    -2.82%
CastagnoliCrc512-4             7.05GB/s ± 0%   9.70GB/s ± 0%   +37.59%
CastagnoliCrc512Misaligned-4   6.71GB/s ± 1%   9.09GB/s ± 0%   +35.43%
CastagnoliCrc1KB-4             7.98GB/s ± 1%  11.46GB/s ± 0%   +43.55%
CastagnoliCrc1KBMisaligned-4   7.86GB/s ± 0%  11.70GB/s ± 0%   +48.75%
CastagnoliCrc4KB-4             8.87GB/s ± 0%  21.80GB/s ± 0%  +145.69%
CastagnoliCrc4KBMisaligned-4   8.83GB/s ± 0%  21.39GB/s ± 0%  +142.25%
CastagnoliCrc32KB-4            9.15GB/s ± 0%  24.22GB/s ± 0%  +164.62%
CastagnoliCrc32KBMisaligned-4  9.16GB/s ± 0%  24.00GB/s ± 0%  +161.94%

Fixes #16107.

Change-Id: Ibe50ea76574674ce0571ef31c31015e0ed66b907
Reviewed-on: https://go-review.googlesource.com/27931
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-28 01:39:03 +00:00
Keith Randall
3427f16642 Revert "hash/crc32: improve the AMD64 implementation using SSE4.2"
This reverts commit 54d7de7dd6.

It was breaking non-amd64 builds.

Change-Id: I22650e922498eeeba3d4fa08bb4ea40a210c8f97
Reviewed-on: https://go-review.googlesource.com/27925
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-27 16:49:02 +00:00
Radu Berinde
54d7de7dd6 hash/crc32: improve the AMD64 implementation using SSE4.2
The algorithm is explained in the comments. The improvement in
throughput is about 1.4x for buffers between 500 B and 4 KB and 2.5x-2.6x
for larger buffers.

Additionally, we no longer initialize the software tables if SSE4.2 is
available.

Benchmarks on a Haswell i5-4670 @ 3.4 GHz:

name                           old time/op    new time/op     delta
CastagnoliCrc15B-4               21.9ns ± 1%     22.9ns ± 0%    +4.45%
CastagnoliCrc15BMisaligned-4     22.6ns ± 0%     23.4ns ± 0%    +3.43%
CastagnoliCrc40B-4               23.3ns ± 0%     23.9ns ± 0%    +2.58%
CastagnoliCrc40BMisaligned-4     25.4ns ± 0%     26.1ns ± 0%    +2.86%
CastagnoliCrc512-4               72.6ns ± 0%     52.8ns ± 0%   -27.33%
CastagnoliCrc512Misaligned-4     76.3ns ± 1%     56.3ns ± 0%   -26.18%
CastagnoliCrc1KB-4                128ns ± 1%       89ns ± 0%   -30.04%
CastagnoliCrc1KBMisaligned-4      130ns ± 0%       88ns ± 0%   -32.65%
CastagnoliCrc4KB-4                461ns ± 0%      187ns ± 0%   -59.40%
CastagnoliCrc4KBMisaligned-4      463ns ± 0%      191ns ± 0%   -58.77%
CastagnoliCrc32KB-4              3.58µs ± 0%     1.35µs ± 0%   -62.22%
CastagnoliCrc32KBMisaligned-4    3.58µs ± 0%     1.36µs ± 0%   -61.84%

name                           old speed      new speed       delta
CastagnoliCrc15B-4              684MB/s ± 1%    655MB/s ± 0%    -4.32%
CastagnoliCrc15BMisaligned-4    663MB/s ± 0%    641MB/s ± 0%    -3.32%
CastagnoliCrc40B-4             1.72GB/s ± 0%   1.67GB/s ± 0%    -2.69%
CastagnoliCrc40BMisaligned-4   1.58GB/s ± 0%   1.53GB/s ± 0%    -2.82%
CastagnoliCrc512-4             7.05GB/s ± 0%   9.70GB/s ± 0%   +37.59%
CastagnoliCrc512Misaligned-4   6.71GB/s ± 1%   9.09GB/s ± 0%   +35.43%
CastagnoliCrc1KB-4             7.98GB/s ± 1%  11.46GB/s ± 0%   +43.55%
CastagnoliCrc1KBMisaligned-4   7.86GB/s ± 0%  11.70GB/s ± 0%   +48.75%
CastagnoliCrc4KB-4             8.87GB/s ± 0%  21.80GB/s ± 0%  +145.69%
CastagnoliCrc4KBMisaligned-4   8.83GB/s ± 0%  21.39GB/s ± 0%  +142.25%
CastagnoliCrc32KB-4            9.15GB/s ± 0%  24.22GB/s ± 0%  +164.62%
CastagnoliCrc32KBMisaligned-4  9.16GB/s ± 0%  24.00GB/s ± 0%  +161.94%

Fixes #16107.

Change-Id: I8fa827ec03f708ba27ee71c833f7544ad9dc5bc3
Reviewed-on: https://go-review.googlesource.com/24471
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-27 15:50:28 +00:00
Michael Munday
4b17b152a3 hash/crc32: fix optimized s390x implementation
The code wasn't checking to see if the data was still >= 64 bytes
long after aligning it.

Aligning the data is an optimization and we don't actually need
to do it. In fact for smaller sizes it slows things down due to
the overhead of calling the generic function. Therefore for now
I have simply removed the alignment stage. I have also added a
check into the assembly to deliberately trigger a segmentation
fault if the data is too short.

Fixes #16779.

Change-Id: Ic01636d775efc5ec97689f050991cee04ce8fe73
Reviewed-on: https://go-review.googlesource.com/27409
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-21 02:04:43 +00:00
Radu Berinde
0c819b654f hash/crc32: improve the processing of the last bytes in the SSE4.2 code for AMD64
This commit improves the processing of the final few bytes in
castagnoliSSE42: instead of processing one byte at a time, we use all
versions of the CRC32 instruction to process 4 bytes, then 2, then 1.
The difference is only noticeable for small "odd" sized buffers.

We make a similar improvement for processing the first few bytes in
the case of an unaligned buffer.

Fixing the test which was not actually verifying the results for
misaligned buffers (WriteString was creating an internal copy which
was aligned).

Adding benchmarks for length 15 (aligned and misaligned), results
below.

name                          old time/op    new time/op    delta
CastagnoliCrc15B-4              25.1ns ± 0%    22.1ns ± 1%  -12.14%
CastagnoliCrc15BMisaligned-4    25.2ns ± 0%    22.9ns ± 1%   -9.03%
CastagnoliCrc40B-4              23.1ns ± 0%    23.4ns ± 0%   +1.08%
CastagnoliCrc1KB-4               127ns ± 0%     128ns ± 0%   +1.18%
CastagnoliCrc4KB-4               462ns ± 0%     464ns ± 0%     ~
CastagnoliCrc32KB-4             3.58µs ± 0%    3.60µs ± 0%   +0.58%

name                          old speed      new speed      delta
CastagnoliCrc15B-4             597MB/s ± 0%   679MB/s ± 1%  +13.77%
CastagnoliCrc15BMisaligned-4   596MB/s ± 0%   655MB/s ± 1%   +9.94%
CastagnoliCrc40B-4            1.73GB/s ± 0%  1.71GB/s ± 0%   -1.14%
CastagnoliCrc1KB-4            8.01GB/s ± 0%  7.93GB/s ± 1%   -1.06%
CastagnoliCrc4KB-4            8.86GB/s ± 0%  8.83GB/s ± 0%     ~
CastagnoliCrc32KB-4           9.14GB/s ± 0%  9.09GB/s ± 0%   -0.58%

Change-Id: I499e37af2241d28e3e5d522bbab836c1a718430a
Reviewed-on: https://go-review.googlesource.com/24470
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-17 21:20:50 +00:00
Ilya Tocar
9d73e146da hash/crc64: Use slicing by 8.
Similar to crc32 slicing by 8.
This also fixes a Crc64KB benchmark actually using 1024 bytes.

name                old time/op    new time/op    delta
Crc64/ISO64KB-4       147µs ± 0%      37µs ± 0%   -75.05%  (p=0.000 n=18+18)
Crc64/ISO4KB-4       9.19µs ± 0%    2.33µs ± 0%   -74.70%  (p=0.000 n=19+20)
Crc64/ISO1KB-4       2.31µs ± 0%    0.60µs ± 0%   -73.81%  (p=0.000 n=19+15)
Crc64/ECMA64KB-4      147µs ± 0%      37µs ± 0%   -75.05%  (p=0.000 n=20+20)
Crc64/Random64KB-4    147µs ± 0%      41µs ± 0%   -72.17%  (p=0.000 n=20+18)
Crc64/Random16KB-4   36.7µs ± 0%    36.5µs ± 0%    -0.54%  (p=0.000 n=18+19)

name                old speed     new speed      delta
Crc64/ISO64KB-4     446MB/s ± 0%  1788MB/s ± 0%  +300.72%  (p=0.000 n=18+18)
Crc64/ISO4KB-4      446MB/s ± 0%  1761MB/s ± 0%  +295.20%  (p=0.000 n=18+20)
Crc64/ISO1KB-4      444MB/s ± 0%  1694MB/s ± 0%  +281.46%  (p=0.000 n=19+20)
Crc64/ECMA64KB-4    446MB/s ± 0%  1788MB/s ± 0%  +300.77%  (p=0.000 n=20+20)
Crc64/Random64KB-4  446MB/s ± 0%  1603MB/s ± 0%  +259.32%  (p=0.000 n=20+18)
Crc64/Random16KB-4  446MB/s ± 0%   448MB/s ± 0%    +0.54%  (p=0.000 n=18+20)

Change-Id: I1c7621d836c486d6bfc41dbe1ec2ff9ab11aedfc
Reviewed-on: https://go-review.googlesource.com/22222
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-05-18 14:38:04 +00:00
Chris Zou
5833d843de hash/crc32: use vector instructions on s390x
The input buffer is aligned to a doubleword boundary to
improve performance of the vector instructions. The pure
Go implementation is used to align the input data, and is
also used when the vector instructions are not available
or the data length is less than 64 bytes.

Change-Id: Ie259a5f2f1562bcc17961c99e5776c99091d6bed
Reviewed-on: https://go-review.googlesource.com/22201
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bill O'Farrell <billotosyr@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-22 18:07:15 +00:00
Ilya Tocar
89a1f02834 hash/adler32: Unroll loop for extra performance.
name         old time/op    new time/op    delta
Adler32KB-4     592ns ± 0%     447ns ± 0%  -24.49%  (p=0.000 n=19+20)

name         old speed      new speed      delta
Adler32KB-4  1.73GB/s ± 0%  2.29GB/s ± 0%  +32.41%  (p=0.000 n=20+20)

Change-Id: I38990aa66ca4452a886200018a57c0bc3af30717
Reviewed-on: https://go-review.googlesource.com/21880
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-15 10:17:17 +00:00
Michael Munday
8edf4cb27d hash/crc32: invert build tags for go implementation
It seems cleaner and more consistent with other files to list the
architectures that have assembly implementations rather than to
list those that do not.

This means we don't have to add s390x and future platforms to this
list.

Change-Id: I2ad3f66b76eb1711333c910236ca7f5151b698e5
Reviewed-on: https://go-review.googlesource.com/21770
Reviewed-by: Bill O'Farrell <billotosyr@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-12 16:30:25 +00:00
Ilya Tocar
f5bd3556f5 hash/crc64: Add tests for ECMA polynomial
Currently we test crc64 only with ISO polynomial.

Change-Id: Ibc5e202db3b960369cbbb18e31eb0fea07b54dba
Reviewed-on: https://go-review.googlesource.com/21309
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-31 20:42:02 +00:00
Klaus Post
b212c68b90 hash/crc32: use slicing by 8 for Castagnoli and smaller sizes
This adds "slicing by 8" optimization to Castagnoli tables which will
speed up CRC32 calculation on systems without assembler,
which are all but AMD64.

In my tests, it is faster to use "slicing by 8" for sizes all down to
16 bytes, so the switchover point has been adjusted.

There are no benchmarks for small sizes, so I have added one for 40 bytes,
as well as one for bigger sizes (32KB).

Castagnoli, No assembler, 40 Byte payload: (before, after)
BenchmarkCastagnoli40B-4   10000000     161 ns/op         246.94 MB/s
BenchmarkCastagnoli40B-4   20000000     100 ns/op         398.01 MB/s

Castagnoli, No assembler, 32KB payload: (before, after)
BenchmarkCastagnoli32KB-4     10000     115426 ns/op      283.89 MB/s
BenchmarkCastagnoli32KB-4     30000     45171 ns/op       725.41 MB/s

IEEE, No assembler, 1KB payload: (before, after)
BenchmarkCrc1KB-4       500000     3604 ns/op         284.10 MB/s
BenchmarkCrc1KB-4      1000000     1463 ns/op         699.79 MB/s

Compared:
benchmark                     old ns/op     new ns/op     delta
BenchmarkCastagnoli40B-4      161           100           -37.89%
BenchmarkCastagnoli32KB-4     115426        45171         -60.87%
BenchmarkCrc1KB-4             3604          1463          -59.41%

benchmark                     old MB/s     new MB/s     speedup
BenchmarkCastagnoli40B-4      246.94       398.01       1.61x
BenchmarkCastagnoli32KB-4     283.89       725.41       2.56x
BenchmarkCrc1KB-4             284.10       699.79       2.46x

Change-Id: I303e4ec84e8d4dafd057d64c0e43deb2b498e968
Reviewed-on: https://go-review.googlesource.com/19335
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-08 16:46:24 +00:00
Ilya Tocar
1d1f2fb4c6 cmd/internal/obj/x86: add new instructions, cleanup.
Add several instructions that were used via BYTE and use them.
Instructions added: PEXTRB, PEXTRD, PEXTRQ, PINSRB, XGETBV, POPCNT.

Change-Id: I5a80cd390dc01f3555dbbe856a475f74b5e6df65
Reviewed-on: https://go-review.googlesource.com/18593
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-13 14:04:44 +00:00
Joe Tsai
64cc5fd0b3 hash/crc32: add noescape tags to assembly functions
CRC-32 computation is stateless and the p slice does not get stored
anywhere. Thus, we mark the assembly functions as noescape so that
the compiler doesn't believe that p leaks in:
	func Update(crc uint32, tab *Table, p []byte) uint32

Before:
	./crc32.go:153: leaking param: p

After:
	./crc32.go:153: Update p does not escape

Change-Id: I52ba35b6cc544fff724327140e0c27898431d1dc
Reviewed-on: https://go-review.googlesource.com/17069
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-25 15:01:10 +00:00
Joe Tsai
d6ee6c2d06 hash/crc32: rename iEEETable to ieeeTable
iEEETable violates the Go naming conventions and is inconsistent
with the rest of the package. Use ieeeTable instead.

Change-Id: I04b201aa39759d159de2b0295f43da80488c2263
Reviewed-on: https://go-review.googlesource.com/17068
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2015-11-20 04:57:07 +00:00
Yao Zhang
84df38181b hash/crc32: added mips64{,le} build tags
Change-Id: I77c6768fff6f0163b36800307c4d573bb6521fe5
Reviewed-on: https://go-review.googlesource.com/14454
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-11-12 04:50:43 +00:00
Klaus Post
2027b00e63 hash/crc32: add AMD64 optimized IEEE CRC calculation
IEEE is the most commonly used CRC-32 polynomial, used by zip, gzip and others.

Based on http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf

benchmark                       old ns/op     new ns/op     delta
BenchmarkIEEECrc1KB-8           3193          352           -88.98%
BenchmarkIEEECrc4KB-8           5025          1307          -73.99%
BenchmarkCastagnoliCrc1KB-8     126           126           +0.00%

benchmark                       old MB/s     new MB/s     speedup
BenchmarkIEEECrc1KB-8           320.68       2901.92      9.05x
BenchmarkIEEECrc4KB-8           815.08       3131.80      3.84x
BenchmarkCastagnoliCrc1KB-8     8100.80      8109.78      1.00x
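
As an illustration of the gzip use mentioned above, the sketch below
compresses a buffer and reads the CRC-32 field back out of the gzip
trailer (the last 8 bytes are the CRC-32 and the input size, both
little-endian, per RFC 1952). gzipTrailerCRC is a hypothetical helper
name chosen for this example.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// gzipTrailerCRC (hypothetical helper) gzips data in memory and returns
// the CRC-32 stored in the trailer of the resulting stream.
func gzipTrailerCRC(data []byte) (uint32, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil { // Close writes the trailer
		return 0, err
	}
	b := buf.Bytes()
	// Trailer layout: 4 bytes CRC-32, then 4 bytes uncompressed size.
	return binary.LittleEndian.Uint32(b[len(b)-8:]), nil
}

func main() {
	data := []byte("123456789")
	stored, err := gzipTrailerCRC(data)
	if err != nil {
		panic(err)
	}
	// Compare the stored value with the package's IEEE checksum.
	fmt.Println(stored == crc32.ChecksumIEEE(data))
}
```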

Change-Id: I99c9a48365f631827f516e44f97e86155f03cb90
Reviewed-on: https://go-review.googlesource.com/14080
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-16 15:42:42 +00:00
Shenghou Ma
91ddc07f65 hash/*: document the byte order used by the Sum methods
Fixes #12350.
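
For hash/crc32, the documented order is big-endian. A small sketch that
checks Sum against a big-endian encoding of Sum32; sumIsBigEndian is a
hypothetical helper name used only here.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// sumIsBigEndian (hypothetical helper) reports whether Sum(nil) equals
// the big-endian encoding of Sum32 for the given input.
func sumIsBigEndian(data []byte) bool {
	h := crc32.NewIEEE()
	h.Write(data) // hash.Hash writes never return an error
	want := make([]byte, 4)
	binary.BigEndian.PutUint32(want, h.Sum32())
	return bytes.Equal(h.Sum(nil), want)
}

func main() {
	fmt.Println(sumIsBigEndian([]byte("123456789")))
}
```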

Change-Id: I3dcb0e2190c11f83f15fb07cc637fead54f734f7
Reviewed-on: https://go-review.googlesource.com/14275
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-09-10 03:34:23 +00:00
Joe Tsai
e16d80362d hash: update documentation for MakeTable in crc32 and crc64
Explicitly say that the *Table returned by MakeTable must not be
modified. Modifying it leads to very subtle bugs that may
or may not manifest themselves.

The same comment was added to package crc64, to keep open the option
of caching tables there, as crc32 effectively does.

Fixes: #12487.

Change-Id: I2881bebb8b16f6f8564412172774c79c2593c6c1
Reviewed-on: https://go-review.googlesource.com/14258
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-09-04 02:16:27 +00:00
Joe Tsai
8e2d0e1c4c hash/fnv: fix wiki url
The URL is shown in the Go docs and is an eyesore.

For go1.6.

Change-Id: I8b8ea3751200d06ed36acfe22f47ebb38107f8db
Reviewed-on: https://go-review.googlesource.com/13282
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-08-24 21:26:42 +00:00
Davies Liu
1e0760354c hash/crc32: speedup crc32 of IEEE using slicingBy8
The Slicing-By-8 [1] algorithm performs much better than the
current approach. This patch uses it only for IEEE, which is the most
common case in practice.

Benchmark results on Mac OS X 10.9:

benchmark                     old MB/s     new MB/s     speedup
BenchmarkIEEECrc1KB           349.40       353.03       1.01x
BenchmarkIEEECrc4KB           351.55       934.35       2.66x
BenchmarkCastagnoliCrc1KB     7037.58      7392.63      1.05x

This algorithm needs an 8K lookup table, so it is enabled only for
blocks larger than 4K.

We can see about a 2.6x improvement for IEEE.

Change-Id: I7f786d20f0949245e4aa101d7921669f496ed0f7
Reviewed-on: https://go-review.googlesource.com/1863
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-18 18:14:24 +00:00
Shenghou Ma
169adec231 hash/crc32: move reverse representation docs to an example
Updates #8229.

Change-Id: I3e691479d3659ed1b3ff8ebbb71b4fc03f2e67af
Reviewed-on: https://go-review.googlesource.com/9680
Reviewed-by: Rob Pike <r@golang.org>
2015-05-04 00:19:22 +00:00
Aamir Khan
80f575b78f hash/crc32: clarify documentation
Explicitly specify that polynomials are represented in reversed notation.

Fixes #8229

Change-Id: Idf094c01fd82f133cd0c1b50fa967d12c577bdb5
Reviewed-on: https://go-review.googlesource.com/9237
Reviewed-by: David Chase <drchase@google.com>
2015-04-24 13:44:25 +00:00
Aram Hăvărneanu
a25e3c03f3 os/signal, hash/crc32: add arm64 build tags
Change-Id: I6ca9caec8ccf12618e56dcf6b83328e7acf8b1ec
Reviewed-on: https://go-review.googlesource.com/7148
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-03-16 18:46:43 +00:00
Russ Cox
09d92b6bbf all: power64 is now ppc64
Fixes #8654.

LGTM=austin
R=austin
CC=golang-codereviews
https://golang.org/cl/180600043
2014-12-05 19:13:20 -05:00
Russ Cox
50e0749f87 [dev.cc] all: merge default (e4ab8f908aac) into dev.cc
TBR=austin
CC=golang-codereviews
https://golang.org/cl/179040044
2014-11-20 11:48:08 -05:00
Nigel Tao
de7d1c4094 hash/crc32: fix comment that the IEEE polynomial applies to MPEG-2.
LGTM=minux
R=adg, minux
CC=golang-codereviews
https://golang.org/cl/170520043
2014-11-12 18:48:00 +11:00
Austin Clements
2bd616b1a7 build: merge the great pkg/ rename into dev.power64
This also removes pkg/runtime/traceback_lr.c, which was ported
to Go in an earlier commit and then moved to
runtime/traceback.go.

Reviewer: rsc@golang.org
          rsc: LGTM
2014-10-22 13:25:37 -04:00
Russ Cox
c007ce824d build: move package sources from src/pkg to src
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
2014-09-08 00:08:51 -04:00