1
0
mirror of https://github.com/golang/go synced 2024-11-18 19:34:41 -07:00
go/src/compress
Ilya Tocar 4984d843d9 compress/flate: optimize huffSym
By using local variables and assigning them back to decompressor
at the end of huffSym, we allow compiler to keep them in registers
and avoid reloading/storing them repeatedly. To archive this,
moreBits was inlined and specialized to work with local variables.
Also move EOF error conversion to helper function, to make inlined
part of moreBits more readable. Together this results in nice speed-up:

name                             old time/op    new time/op    delta
Decode/Digits/Huffman/1e4-6         278µs ± 1%     240µs ± 2%  -13.72%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-6        2.38ms ± 1%    2.05ms ± 1%  -14.12%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-6        23.4ms ± 1%    19.9ms ± 0%  -14.69%  (p=0.000 n=9+9)
Decode/Digits/Speed/1e4-6           280µs ± 2%     254µs ± 1%   -9.28%  (p=0.000 n=10+9)
Decode/Digits/Speed/1e5-6          2.53ms ± 1%    2.35ms ± 1%   -7.17%  (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-6          24.8ms ± 1%    23.0ms ± 1%   -7.22%  (p=0.000 n=10+10)
Decode/Digits/Default/1e4-6         281µs ± 2%     259µs ± 3%   -8.03%  (p=0.000 n=10+10)
Decode/Digits/Default/1e5-6        2.45ms ± 1%    2.30ms ± 1%   -6.15%  (p=0.000 n=10+10)
Decode/Digits/Default/1e6-6        24.1ms ± 1%    22.6ms ± 0%   -6.31%  (p=0.000 n=9+9)
Decode/Digits/Compression/1e4-6     279µs ± 2%     261µs ± 2%   -6.53%  (p=0.000 n=8+9)
Decode/Digits/Compression/1e5-6    2.44ms ± 1%    2.30ms ± 1%   -5.72%  (p=0.000 n=10+9)
Decode/Digits/Compression/1e6-6    24.0ms ± 1%    22.6ms ± 0%   -6.10%  (p=0.000 n=10+9)
Decode/Twain/Huffman/1e4-6          316µs ± 2%     267µs ± 3%  -15.30%  (p=0.000 n=9+10)
Decode/Twain/Huffman/1e5-6         2.62ms ± 0%    2.22ms ± 0%  -15.24%  (p=0.000 n=10+10)
Decode/Twain/Huffman/1e6-6         25.7ms ± 1%    21.8ms ± 0%  -15.19%  (p=0.000 n=10+10)
Decode/Twain/Speed/1e4-6            290µs ± 1%     264µs ± 2%   -9.17%  (p=0.000 n=9+10)
Decode/Twain/Speed/1e5-6           2.35ms ± 1%    2.13ms ± 1%   -9.74%  (p=0.000 n=9+10)
Decode/Twain/Speed/1e6-6           22.9ms ± 0%    20.7ms ± 0%   -9.68%  (p=0.000 n=10+9)
Decode/Twain/Default/1e4-6          270µs ± 2%     252µs ± 2%   -6.67%  (p=0.000 n=9+10)
Decode/Twain/Default/1e5-6         2.02ms ± 1%    1.84ms ± 1%   -8.85%  (p=0.000 n=10+10)
Decode/Twain/Default/1e6-6         19.1ms ± 0%    17.5ms ± 1%   -8.73%  (p=0.000 n=9+10)
Decode/Twain/Compression/1e4-6      272µs ± 1%     250µs ± 4%   -8.20%  (p=0.000 n=9+10)
Decode/Twain/Compression/1e5-6     2.01ms ± 0%    1.84ms ± 1%   -8.57%  (p=0.000 n=9+10)
Decode/Twain/Compression/1e6-6     19.1ms ± 0%    17.4ms ± 1%   -8.75%  (p=0.000 n=9+10)

name                             old speed      new speed      delta
Decode/Digits/Huffman/1e4-6      35.9MB/s ± 1%  41.7MB/s ± 2%  +15.91%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-6      41.9MB/s ± 1%  48.8MB/s ± 1%  +16.44%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-6      42.8MB/s ± 1%  50.2MB/s ± 0%  +17.22%  (p=0.000 n=9+9)
Decode/Digits/Speed/1e4-6        35.7MB/s ± 2%  39.4MB/s ± 1%  +10.22%  (p=0.000 n=10+9)
Decode/Digits/Speed/1e5-6        39.6MB/s ± 1%  42.6MB/s ± 1%   +7.73%  (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-6        40.3MB/s ± 1%  43.4MB/s ± 1%   +7.78%  (p=0.000 n=10+10)
Decode/Digits/Default/1e4-6      35.6MB/s ± 2%  38.7MB/s ± 2%   +8.74%  (p=0.000 n=10+10)
Decode/Digits/Default/1e5-6      40.9MB/s ± 1%  43.6MB/s ± 1%   +6.55%  (p=0.000 n=10+10)
Decode/Digits/Default/1e6-6      41.5MB/s ± 1%  44.3MB/s ± 0%   +6.73%  (p=0.000 n=9+9)
Decode/Digits/Compression/1e4-6  35.8MB/s ± 2%  38.3MB/s ± 2%   +6.99%  (p=0.000 n=8+9)
Decode/Digits/Compression/1e5-6  40.9MB/s ± 1%  43.4MB/s ± 1%   +6.07%  (p=0.000 n=10+9)
Decode/Digits/Compression/1e6-6  41.6MB/s ± 1%  44.3MB/s ± 0%   +6.49%  (p=0.000 n=10+9)
Decode/Twain/Huffman/1e4-6       31.7MB/s ± 2%  37.4MB/s ± 3%  +18.08%  (p=0.000 n=9+10)
Decode/Twain/Huffman/1e5-6       38.2MB/s ± 0%  45.0MB/s ± 0%  +17.97%  (p=0.000 n=10+10)
Decode/Twain/Huffman/1e6-6       38.9MB/s ± 1%  45.9MB/s ± 0%  +17.90%  (p=0.000 n=10+10)
Decode/Twain/Speed/1e4-6         34.5MB/s ± 1%  38.0MB/s ± 2%  +10.11%  (p=0.000 n=9+10)
Decode/Twain/Speed/1e5-6         42.5MB/s ± 1%  47.0MB/s ± 1%  +10.79%  (p=0.000 n=9+10)
Decode/Twain/Speed/1e6-6         43.7MB/s ± 0%  48.3MB/s ± 0%  +10.72%  (p=0.000 n=10+9)
Decode/Twain/Default/1e4-6       37.1MB/s ± 2%  39.8MB/s ± 2%   +7.15%  (p=0.000 n=9+10)
Decode/Twain/Default/1e5-6       49.5MB/s ± 1%  54.3MB/s ± 1%   +9.71%  (p=0.000 n=10+10)
Decode/Twain/Default/1e6-6       52.3MB/s ± 0%  57.3MB/s ± 1%   +9.57%  (p=0.000 n=9+10)
Decode/Twain/Compression/1e4-6   36.7MB/s ± 1%  40.0MB/s ± 4%   +8.96%  (p=0.000 n=9+10)
Decode/Twain/Compression/1e5-6   49.8MB/s ± 0%  54.5MB/s ± 1%   +9.38%  (p=0.000 n=9+10)
Decode/Twain/Compression/1e6-6   52.3MB/s ± 0%  57.3MB/s ± 1%   +9.58%  (p=0.000 n=9+10)

Change-Id: Iabfd285535ddb210f7f48f33317c6463b5532400
Reviewed-on: https://go-review.googlesource.com/102235
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-17 22:37:49 +00:00
..
bzip2 compress/bzip2: remove bit-tricks 2018-03-21 21:57:15 +00:00
flate compress/flate: optimize huffSym 2018-04-17 22:37:49 +00:00
gzip compress/gzip: do not count header bytes written in Write 2018-04-02 20:18:14 +00:00
lzw compress/lzw: don't follow code == hi if last is invalid. 2017-06-08 01:27:37 +00:00
testdata
zlib compress/gzip, compress/zlib: fix Writer documentation inconsistencies 2018-03-13 20:58:19 +00:00