This reverts commit 467109bf56.
Replaced by a improved strategy later in the CL relation chain.
Change-Id: Ib90813b5a6c4716b563c8496013d2d57f9c022b8
Reviewed-on: https://go-review.googlesource.com/36066
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The working directory is now adjusted to match the typical Go test
working directory in main, as the old trick for adjusting earlier
stopped working with the latest version of LLDB bugs.
That means the small number of places where testdata files are
read before main is called no longer work. This CL adjusts those
reads to happen after main is called. (This has the bonus effect of
not reading some benchmark testdata files in all.bash.)
Fixes compress/bzip2, go/doc, go/parser, os, and time package
tests on the iOS builder.
Change-Id: If60f026aa7848b37511c36ac5e3985469ec25209
Reviewed-on: https://go-review.googlesource.com/35255
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Ranging over an array causes the array to be copied over to the
stack, which cause large re-growths. Instead, we should iterate
over slices of the array.
Also, assigning a large struct literal uses the stack even
though the actual fields being populated are small in comparison
to the entirety of the struct (see #18636).
Fixing the stack growth does not alter CPU-time performance much
since the stack-growth and copying was such a tiny portion of the
compression work:
name old time/op new time/op delta
Encode/Digits/Default/1e4-8 332µs ± 1% 332µs ± 1% ~ (p=0.796 n=10+10)
Encode/Digits/Default/1e5-8 5.07ms ± 2% 5.05ms ± 1% ~ (p=0.815 n=9+8)
Encode/Digits/Default/1e6-8 53.7ms ± 1% 53.9ms ± 1% ~ (p=0.075 n=10+10)
Encode/Twain/Default/1e4-8 380µs ± 1% 380µs ± 1% ~ (p=0.684 n=10+10)
Encode/Twain/Default/1e5-8 5.79ms ± 2% 5.79ms ± 1% ~ (p=0.497 n=9+10)
Encode/Twain/Default/1e6-8 61.5ms ± 1% 61.8ms ± 1% ~ (p=0.247 n=10+10)
name old speed new speed delta
Encode/Digits/Default/1e4-8 30.1MB/s ± 1% 30.1MB/s ± 1% ~ (p=0.753 n=10+10)
Encode/Digits/Default/1e5-8 19.7MB/s ± 2% 19.8MB/s ± 1% ~ (p=0.795 n=9+8)
Encode/Digits/Default/1e6-8 18.6MB/s ± 1% 18.5MB/s ± 1% ~ (p=0.072 n=10+10)
Encode/Twain/Default/1e4-8 26.3MB/s ± 1% 26.3MB/s ± 1% ~ (p=0.616 n=10+10)
Encode/Twain/Default/1e5-8 17.3MB/s ± 2% 17.3MB/s ± 1% ~ (p=0.484 n=9+10)
Encode/Twain/Default/1e6-8 16.3MB/s ± 1% 16.2MB/s ± 1% ~ (p=0.238 n=10+10)
Updates #18636Fixes#18625
Change-Id: I471b20339bf675f63dc56d38b3acdd824fe23328
Reviewed-on: https://go-review.googlesource.com/35122
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
I used the slowtests.go tool as described in
https://golang.org/cl/32684 on packages that stood out.
go test -short std drops from ~56 to ~52 seconds.
This isn't a huge win, but it was mostly an exercise.
Updates #17751
Change-Id: I9f3402e36a038d71e662d06ce2c1d52f6c4b674d
Reviewed-on: https://go-review.googlesource.com/32751
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Previously, we were off by one.
Also fix a comment typo.
Change-Id: Ib94d23acc56d5fccd44144f71655481f98803ac8
Reviewed-on: https://go-review.googlesource.com/32149
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The GZIP format records the ModTime as an uint32 counting seconds since
the Unix epoch. The zero value is explicitly defined in section 2.3.1
as meaning no timestamp is available.
Currently, the Writer always encodes the ModTime even if it is the zero
time.Time value, which causes the Writer to try and encode the value
-62135596800 into the uint32 MTIME field. This causes an overflow and
results in our GZIP files having MTIME fields indicating a date in 2042-07-13.
We alter the Writer to only encode ModTime if the value does not underflow
the MTIME field (i.e., it is newer than the Unix epoch). We do not attempt
to fix what happens when the timestamp overflows in the year 2106.
We alter the Reader to only decode ModTime if the value is non-zero.
There is no risk of overflowing time.Time when decoding.
Fixes#17663
Change-Id: Ie1b65770c6342cd7b14aeebe10e5a49e6c9eb730
Reviewed-on: https://go-review.googlesource.com/32325
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Tests for determinism was not working as intended since io.Copybuffer
uses the io.WriterTo if available.
This exposed that level 0 (no compression) changed output
based on the number of writes and buffers given to the
writer.
Previously, Write would emit a new raw block (BTYPE=00) for
every non-empty call to Write.
This CL fixes it such that a raw block is only emitted upon
the following conditions:
* A full window is obtained (every 65535 bytes)
* Flush is called
* Close is called
Change-Id: I807f866d97e2db7820f11febab30a96266a6cbf1
Reviewed-on: https://go-review.googlesource.com/31174
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
This exposes HuffmanOnly in zlib and gzip packages, which is currently
unavailable.
Change-Id: If5d103bbc8b5fce2f5d740fd103a235c5d1ed7cd
Reviewed-on: https://go-review.googlesource.com/31186
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The incorrect table was used for estimating output size.
This can give suboptimal selection of entropy encoder in rare cases.
Change-Id: I8b358200f2d1f9a3f9b79a44269d7be704e1d2d9
Reviewed-on: https://go-review.googlesource.com/31172
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In the event of an unexpected error, we should always flush available
decompressed data to the user.
Fixes#16924
Change-Id: I0bc0824c3201f3149e84e6a26e3dbcba72a1aae5
Reviewed-on: https://go-review.googlesource.com/28216
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
For persistent error handling, the methods of huffmanBitWriter have to be
consistent about how they check errors. It must either consistently
check error *before* every operation OR immediately *after* every
operation. Since most of the current logic uses the previous approach,
we apply the same style of error checking to writeBits and all calls
to Write such that they only operate if w.err is already nil going
into them.
The error handling approach is brittle and easily broken by future commits to
the code. In the near future, we should switch the logic to use panic at the
lowest levels and a recover at the edge of the public API to ensure
that errors are always persistent.
Fixes#16749
Change-Id: Ie1d83e4ed8842f6911a31e23311cd3cbf38abe8c
Reviewed-on: https://go-review.googlesource.com/27200
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
As rendered on https://tip.golang.org/pkg/compress/flate/, there is an
extra new-line because of the unexported constants in the same block.
<<<
const (
NoCompression = 0
BestSpeed = 1
BestCompression = 9
DefaultCompression = -1
HuffmanOnly = -2 // Disables match search and only does Huffman entropy reduction.
)
>>>
Instead, seperate the exported compression level constants into its own
const block. This is both more readable and also fixes the issue.
Change-Id: I60b7966c83fb53356c02e4640d05f55a3bee35b7
Reviewed-on: https://go-review.googlesource.com/23557
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This causes the large files to be loaded only once per benchmark.
This CL also serves as an example use case of sub(tests|-benchmarks).
This CL ensures that names are identical to the original
except for an added slashes. Things could be
simplified further if this restriction were dropped.
Change-Id: I45e303e158e3152e33d0d751adfef784713bf997
Reviewed-on: https://go-review.googlesource.com/23420
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Address two documentation issues:
1) Document that the GZIP and ZLIB footer is only verified when the
reader has been fully consumed.
2) The zlib reader is guaranteed to not read past the EOF if the
input io.Reader is also a io.ByteReader. This functionality was
documented in the flate and gzip packages but not on zlib.
Fixes#14867
Change-Id: I43d46b93e38f98a04901dc7d4f18ed2f9e09f6fb
Reviewed-on: https://go-review.googlesource.com/21218
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This makes compress/flate's version of Snappy diverge from the upstream
golang/snappy version, but the latter has a goal of matching C++ snappy
output byte-for-byte. Both C++ and the asm version of golang/snappy can
use a smaller N for the O(N) zero-initialization of the hash table when
the input is small, even if the pure Go golang/snappy algorithm cannot:
"var table [tableSize]uint16" zeroes all tableSize elements.
For this package, we don't have the match-C++-snappy goal, so we can use
a different (constant) hash table size.
This is a small win, in terms of throughput and output size, but it also
enables us to re-use the (constant size) hash table between
encodeBestSpeed calls, avoiding the cost of zero-initializing the hash
table altogether. This will be implemented in follow-up commits.
This package's benchmarks:
name old speed new speed delta
EncodeDigitsSpeed1e4-8 72.8MB/s ± 1% 73.5MB/s ± 1% +0.86% (p=0.000 n=10+10)
EncodeDigitsSpeed1e5-8 77.5MB/s ± 1% 78.0MB/s ± 0% +0.69% (p=0.000 n=10+10)
EncodeDigitsSpeed1e6-8 82.0MB/s ± 1% 82.7MB/s ± 1% +0.85% (p=0.000 n=10+9)
EncodeTwainSpeed1e4-8 65.1MB/s ± 1% 65.6MB/s ± 0% +0.78% (p=0.000 n=10+9)
EncodeTwainSpeed1e5-8 80.0MB/s ± 0% 80.6MB/s ± 1% +0.66% (p=0.000 n=9+10)
EncodeTwainSpeed1e6-8 81.6MB/s ± 1% 82.1MB/s ± 1% +0.55% (p=0.017 n=10+10)
Input size in bytes, output size (and time taken) before and after on
some larger files:
1073741824 57269781 ( 3183ms) 57269781 ( 3177ms) adresser.001
1000000000 391052000 ( 11071ms) 391051996 ( 11067ms) enwik9
1911399616 378679516 ( 13450ms) 378679514 ( 13079ms) gob-stream
8558382592 3972329193 ( 99962ms) 3972329193 ( 91290ms) rawstudio-mint14.tar
200000000 200015265 ( 776ms) 200015265 ( 774ms) sharnd.out
Thanks to Klaus Post for the original suggestion on cl/21021.
Change-Id: Ia4c63a8d1b92c67e1765ec5c3c8c69d289d9a6ce
Reviewed-on: https://go-review.googlesource.com/22604
Reviewed-by: Russ Cox <rsc@golang.org>
This encoding algorithm, which prioritizes speed over output size, is
based on Snappy's LZ77-style encoder: github.com/golang/snappy
This commit keeps the diff between this package's encodeBestSpeed
function and and Snappy's encodeBlock function as small as possible (see
the diff below). Follow-up commits will improve this package's
performance and output size.
This package's speed benchmarks:
name old speed new speed delta
EncodeDigitsSpeed1e4-8 40.7MB/s ± 0% 73.0MB/s ± 0% +79.18% (p=0.008 n=5+5)
EncodeDigitsSpeed1e5-8 33.0MB/s ± 0% 77.3MB/s ± 1% +134.04% (p=0.008 n=5+5)
EncodeDigitsSpeed1e6-8 32.1MB/s ± 0% 82.1MB/s ± 0% +156.18% (p=0.008 n=5+5)
EncodeTwainSpeed1e4-8 42.1MB/s ± 0% 65.0MB/s ± 0% +54.61% (p=0.008 n=5+5)
EncodeTwainSpeed1e5-8 46.3MB/s ± 0% 80.0MB/s ± 0% +72.81% (p=0.008 n=5+5)
EncodeTwainSpeed1e6-8 47.3MB/s ± 0% 81.7MB/s ± 0% +72.86% (p=0.008 n=5+5)
Here's the milliseconds taken, before and after this commit, to compress
a number of test files:
Go's src/compress/testdata files:
4 1 e.txt
8 4 Mark.Twain-Tom.Sawyer.txt
github.com/golang/snappy's benchmark files:
3 1 alice29.txt
12 3 asyoulik.txt
6 1 fireworks.jpeg
1 1 geo.protodata
1 0 html
2 2 html_x_4
6 3 kppkn.gtb
11 4 lcet10.txt
5 1 paper-100k.pdf
14 6 plrabn12.txt
17 6 urls.10K
Larger files linked to from
https://docs.google.com/spreadsheets/d/1VLxi-ac0BAtf735HyH3c1xRulbkYYUkFecKdLPH7NIQ/edit#gid=166102500
2409 3182 adresser.001
16757 11027 enwik9
13764 12946 gob-stream
153978 74317 rawstudio-mint14.tar
4371 770 sharnd.out
Output size is larger. In the table below, the first column is the input
size, the second column is the output size prior to this commit, the
third column is the output size after this commit.
100003 47707 50006 e.txt
387851 172707 182930 Mark.Twain-Tom.Sawyer.txt
152089 62457 66705 alice29.txt
125179 54503 57274 asyoulik.txt
123093 122827 123108 fireworks.jpeg
118588 18574 20558 geo.protodata
102400 16601 17305 html
409600 65506 70313 html_x_4
184320 49007 50944 kppkn.gtb
426754 166957 179355 lcet10.txt
102400 82126 84937 paper-100k.pdf
481861 218617 231988 plrabn12.txt
702087 241774 258020 urls.10K
1073741824 43074110 57269781 adresser.001
1000000000 365772256 391052000 enwik9
1911399616 340364558 378679516 gob-stream
8558382592 3807229562 3972329193 rawstudio-mint14.tar
200000000 200061040 200015265 sharnd.out
The diff between github.com/golang/snappy's encodeBlock function and
this commit's encodeBestSpeed function:
1c1,7
< func encodeBlock(dst, src []byte) (d int) {
---
> func encodeBestSpeed(dst []token, src []byte) []token {
> // This check isn't in the Snappy implementation, but there, the caller
> // instead of the callee handles this case.
> if len(src) < minNonLiteralBlockSize {
> return emitLiteral(dst, src)
> }
>
4c10
< // and len(src) <= maxBlockSize and maxBlockSize == 65536.
---
> // and len(src) <= maxStoreBlockSize and maxStoreBlockSize == 65535.
65c71
< if load32(src, s) == load32(src, candidate) {
---
> if s-candidate < maxOffset && load32(src, s) == load32(src, candidate) {
73c79
< d += emitLiteral(dst[d:], src[nextEmit:s])
---
> dst = emitLiteral(dst, src[nextEmit:s])
90c96
< // This is an inlined version of:
---
> // This is an inlined version of Snappy's:
93c99,103
< for i := candidate + 4; s < len(src) && src[i] == src[s]; i, s = i+1, s+1 {
---
> s1 := base + maxMatchLength
> if s1 > len(src) {
> s1 = len(src)
> }
> for i := candidate + 4; s < s1 && src[i] == src[s]; i, s = i+1, s+1 {
96c106,107
< d += emitCopy(dst[d:], base-candidate, s-base)
---
> // matchToken is flate's equivalent of Snappy's emitCopy.
> dst = append(dst, matchToken(uint32(s-base-3), uint32(base-candidate-minOffsetSize)))
114c125
< if uint32(x>>8) != load32(src, candidate) {
---
> if s-candidate >= maxOffset || uint32(x>>8) != load32(src, candidate) {
124c135
< d += emitLiteral(dst[d:], src[nextEmit:])
---
> dst = emitLiteral(dst, src[nextEmit:])
126c137
< return d
---
> return dst
This change is based on https://go-review.googlesource.com/#/c/21021/ by
Klaus Post, but it is a separate changelist as cl/21021 seems to have
stalled in code review, and the Go 1.7 feature freeze approaches.
Golang-dev discussion:
https://groups.google.com/d/topic/golang-dev/XYgHX9p8IOk/discussion and
of course cl/21021.
Change-Id: Ib662439417b3bd0b61c2977c12c658db3e44d164
Reviewed-on: https://go-review.googlesource.com/22370
Reviewed-by: Russ Cox <rsc@golang.org>
cmd and runtime were handled separately, and I'm intentionally skipped
syscall. This is the rest of the standard library.
CL generated mechanically with github.com/mdempsky/unconvert.
Change-Id: I9e0eff886974dedc37adb93f602064b83e469122
Reviewed-on: https://go-review.googlesource.com/22104
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
It's not a big deal (the for loop drops from 130-ish to 120-ish
milliseconds for me) but it's not a big change either.
Change-Id: I161a49caab5cae5a2b87866ed1dfb93627be8013
Reviewed-on: https://go-review.googlesource.com/22110
Reviewed-by: Klaus Post <klauspost@gmail.com>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
RFC 1952, section 3.2.3 says:
>>>
If FHCRC is set, a CRC16 for the gzip header is present,
immediately before the compressed data. The CRC16 consists of the two
least significant bytes of the CRC32 for all bytes of the
gzip header up to and not including the CRC16.
<<<
Thus, instead of computing the CRC only over the first 10 bytes
of the header, we compute it over the whole header (minus CRC16).
Fixes#15070
Change-Id: I55703fd30b535b12abeb5e3962d4da0a86ed615a
Reviewed-on: https://go-review.googlesource.com/21466
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This improves the short version of the writer test.
First of all, it has a much quicker setup. Previously that
could take up towards 0.5 second.
Secondly, it will test all compression levels in short mode as well.
Execution time is 1.7s/0.03s for normal/short mode.
Change-Id: I275a21f712daff6f7125cc6a493415e86439cb19
Reviewed-on: https://go-review.googlesource.com/21800
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Rather than specifying every field that should be cleared in Reset,
it is better to just zero the entire struct and only preserve or set the
fields that we actually care about. This ensures that the Header field
is reset for the next use.
Fixes#15077
Change-Id: I41832e506d2d64c62b700aa1986e7de24a577511
Reviewed-on: https://go-review.googlesource.com/21465
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Changes made:
* Reader.flg is not used anywhere else other than readHeader and
does not need to be stored.
* Store Reader.digest and Writer.digest as uint32s rather than as
a hash.Hash32 and use the crc32.Update function instead. This simplifies
initialization logic since the zero value of uint32 is the initial
CRC-32 value. There are no performance detriments to doing this since
the hash.Hash32 returned by crc32 simply calls crc32.Update as well.
* s/[0:/[:/ Consistently use shorter notation for slicing.
* s/RFC1952/RFC 1952/ Consistently use RFC notation.
Change-Id: I55416a19f4836cbed943adaa3f672538ea5d166d
Reviewed-on: https://go-review.googlesource.com/21429
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Rather than checking the block final bit on the next invocation
of nextBlock, we check it at the termination of the current block.
This ensures that we return (n, io.EOF) instead of (0, io.EOF)
more frequently for most streams.
However, there are certain situations where an eager io.EOF is not done:
1) We previously returned from Read because the write buffer of the internal
dictionary was full, and it just so happens that there is no more data
remaining in the stream.
2) There exists a [non-final, empty, raw block] after all blocks that
actually contain uncompressed data. We cannot return io.EOF eagerly here
since it would break flushing semantics.
Both situations happen infrequently, but it is still important to note that
this change does *not* guarantee that flate will *always* return (n, io.EOF).
Furthermore, this CL makes no changes to the pattern of ReadByte calls
to the underlying io.ByteReader.
Below is the motivation for this change, pulling the text from
@bradfitz's CL/21290:
net/http and other things work better when io.Reader implementations
return (n, io.EOF) at the end, instead of (n, nil) followed by (0,
io.EOF). Both are legal, but the standard library has been moving
towards n+io.EOF.
An investigation of net/http connection re-use in
https://github.com/google/go-github/pull/317 revealed that with gzip
compression + http/1.1 chunking, the net/http package was not
automatically reusing the underlying TCP connections when the final
EOF bytes were already read off the wire. The net/http package only
reuses the connection if the underlying Readers (many of them nested
in this case) all eagerly return io.EOF.
Previous related CLs:
https://golang.org/cl/76400046 - tls.Reader
https://golang.org/cl/58240043 - http chunked reader
In addition to net/http, this behavior also helps things like
ioutil.ReadAll (see comments about performance improvements in
https://codereview.appspot.com/49570044)
Updates #14867
Updates google/go-github#317
Change-Id: I637c45552efb561d34b13ed918b73c660f668378
Reviewed-on: https://go-review.googlesource.com/21302
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The Read logic should not assume that only (0, io.EOF) is returned
instead of (n, io.EOF) where n is positive.
The fix done here is very similar to the fix to compress/zlib
in CL/20292.
Change-Id: Icb76258cdcf8cfa386a60bab330fefde46fc071d
Reviewed-on: https://go-review.googlesource.com/21308
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Add a "HuffmanOnly" compression level, where the input is
only entropy encoded.
The output is fully inflate compatible. Typical compression
is reduction is about 50% of typical level 1 compression, however
the compression time is very stable, and does not vary as much as
nearly as much level 1 compression (or Snappy).
This mode is useful for:
* HTTP compression in a CPU limited environment.
* Entropy encoding Snappy compressed data, for archiving, etc.
* Compression where compression time needs to be predictable.
* Fast network transfer.
Snappy "usually" performs inbetween this and level 1 compression-wise,
but at the same speed as "Huffman", so this is not a replacement,
but a good supplement for Snappy, since it usually can compress
Snappy output further.
This is implemented as level -2, since this would be too much of a
compression reduction to replace level 1.
>go test -bench=Encode -cpu=1
BenchmarkEncodeDigitsHuffman1e4 30000 52334 ns/op 191.08 MB/s
BenchmarkEncodeDigitsHuffman1e5 3000 518343 ns/op 192.92 MB/s
BenchmarkEncodeDigitsHuffman1e6 300 5356884 ns/op 186.68 MB/s
BenchmarkEncodeDigitsSpeed1e4 5000 324214 ns/op 30.84 MB/s
BenchmarkEncodeDigitsSpeed1e5 500 3952614 ns/op 25.30 MB/s
BenchmarkEncodeDigitsSpeed1e6 30 40760350 ns/op 24.53 MB/s
BenchmarkEncodeDigitsDefault1e4 5000 387056 ns/op 25.84 MB/s
BenchmarkEncodeDigitsDefault1e5 300 5950614 ns/op 16.80 MB/s
BenchmarkEncodeDigitsDefault1e6 20 63842195 ns/op 15.66 MB/s
BenchmarkEncodeDigitsCompress1e4 5000 391859 ns/op 25.52 MB/s
BenchmarkEncodeDigitsCompress1e5 300 5707112 ns/op 17.52 MB/s
BenchmarkEncodeDigitsCompress1e6 20 59839465 ns/op 16.71 MB/s
BenchmarkEncodeTwainHuffman1e4 20000 73498 ns/op 136.06 MB/s
BenchmarkEncodeTwainHuffman1e5 2000 595892 ns/op 167.82 MB/s
BenchmarkEncodeTwainHuffman1e6 200 6059016 ns/op 165.04 MB/s
BenchmarkEncodeTwainSpeed1e4 5000 321212 ns/op 31.13 MB/s
BenchmarkEncodeTwainSpeed1e5 500 2823873 ns/op 35.41 MB/s
BenchmarkEncodeTwainSpeed1e6 50 27237864 ns/op 36.71 MB/s
BenchmarkEncodeTwainDefault1e4 3000 454634 ns/op 22.00 MB/s
BenchmarkEncodeTwainDefault1e5 200 6859537 ns/op 14.58 MB/s
BenchmarkEncodeTwainDefault1e6 20 71547405 ns/op 13.98 MB/s
BenchmarkEncodeTwainCompress1e4 3000 462307 ns/op 21.63 MB/s
BenchmarkEncodeTwainCompress1e5 200 7534992 ns/op 13.27 MB/s
BenchmarkEncodeTwainCompress1e6 20 80353365 ns/op 12.45 MB/s
PASS
ok compress/flate 55.333s
Change-Id: I8e12ad13220e50d4cf7ddba6f292333efad61b0c
Reviewed-on: https://go-review.googlesource.com/20982
Reviewed-by: Joe Tsai <joetsai@digital-static.net>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
- Fix a typo.
- Skip this test on -short on non-builders.
Change-Id: Id102eceb59451694bf92b618e02ccee6603b6852
Reviewed-on: https://go-review.googlesource.com/21113
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
This change removes a lot of dead code. Some of the code has never been
used, not even when it was first commited. The rest shouldn't have
survived refactors.
This change doesn't remove unused routines helpful for debugging, nor
does it remove code that's used in commented out blocks of code that are
only unused temporarily. Furthermore, unused constants weren't removed
when they were part of a set of constants from specifications.
One noteworthy omission from this CL are about 1000 lines of unused code
in cmd/fix, 700 lines of which are the typechecker, which hasn't been
used ever since the pre-Go 1 fixes have been removed. I wasn't sure if
this code should stick around for future uses of cmd/fix or be culled as
well.
Change-Id: Ib714bc7e487edc11ad23ba1c3222d1fd02e4a549
Reviewed-on: https://go-review.googlesource.com/20926
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This changes how matching is done in deflate algorithm.
The major change is that we do not look for matches that are only
3 bytes in length, matches must be 4 bytes at least.
Contrary to what you would expect this actually improves the
compresion ratio, since 3 literal bytes will often be shorter
than a match after huffman encoding.
This varies a bit by source, but is most often the case when the
source is "easy" to compress.
Second of all, a "stronger" hash is used. The hash is similar to
the hashing function used by Snappy.
Overall, the speed impact is biggest on higher compression levels.
I intend to replace the "speed" compression level, which can be
seen in CL 21021.
The built-in benchmark using "digits" is slower at level 1.
I see this as an exception, since "digits" is a special type
of data, where you have low entropy (numbers 0->9), but no
significant matches. Again, CL 20021 fixes that case.
NewWriterDict is also made considerably faster, by not running data
through the entire encoder. This is not reflected by the benchmark.
Overall, the speed impact is biggest on higher compression levels.
I intend to replace the "speed" compression level.
COMPARED to tip/master:
name old time/op new time/op delta
EncodeDigitsSpeed1e4-4 401µs ± 1% 345µs ± 2% -13.95%
EncodeDigitsSpeed1e5-4 3.19ms ± 1% 4.27ms ± 3% +33.96%
EncodeDigitsSpeed1e6-4 27.7ms ± 4% 43.8ms ± 3% +58.00%
EncodeDigitsDefault1e4-4 641µs ± 0% 403µs ± 1% -37.15%
EncodeDigitsDefault1e5-4 13.8ms ± 1% 6.4ms ± 3% -53.73%
EncodeDigitsDefault1e6-4 162ms ± 1% 64ms ± 2% -60.51%
EncodeDigitsCompress1e4-4 627µs ± 1% 405µs ± 2% -35.45%
EncodeDigitsCompress1e5-4 13.9ms ± 0% 6.3ms ± 2% -54.46%
EncodeDigitsCompress1e6-4 159ms ± 1% 64ms ± 0% -59.91%
EncodeTwainSpeed1e4-4 433µs ± 4% 331µs ± 1% -23.53%
EncodeTwainSpeed1e5-4 2.82ms ± 1% 3.08ms ± 0% +9.10%
EncodeTwainSpeed1e6-4 28.1ms ± 2% 28.8ms ± 0% +2.82%
EncodeTwainDefault1e4-4 695µs ± 4% 474µs ± 1% -31.78%
EncodeTwainDefault1e5-4 11.8ms ± 0% 7.4ms ± 0% -37.31%
EncodeTwainDefault1e6-4 128ms ± 0% 75ms ± 0% -40.93%
EncodeTwainCompress1e4-4 719µs ± 3% 480µs ± 0% -33.27%
EncodeTwainCompress1e5-4 15.0ms ± 3% 8.2ms ± 2% -45.55%
EncodeTwainCompress1e6-4 170ms ± 0% 85ms ± 1% -49.99%
name old speed new speed delta
EncodeDigitsSpeed1e4-4 25.0MB/s ± 1% 29.0MB/s ± 2% +16.24%
EncodeDigitsSpeed1e5-4 31.4MB/s ± 1% 23.4MB/s ± 3% -25.34%
EncodeDigitsSpeed1e6-4 36.1MB/s ± 4% 22.8MB/s ± 3% -36.74%
EncodeDigitsDefault1e4-4 15.6MB/s ± 0% 24.8MB/s ± 1% +59.11%
EncodeDigitsDefault1e5-4 7.27MB/s ± 1% 15.72MB/s ± 3% +116.23%
EncodeDigitsDefault1e6-4 6.16MB/s ± 0% 15.60MB/s ± 2% +153.25%
EncodeDigitsCompress1e4-4 15.9MB/s ± 1% 24.7MB/s ± 2% +54.97%
EncodeDigitsCompress1e5-4 7.19MB/s ± 0% 15.78MB/s ± 2% +119.62%
EncodeDigitsCompress1e6-4 6.27MB/s ± 1% 15.65MB/s ± 0% +149.52%
EncodeTwainSpeed1e4-4 23.1MB/s ± 4% 30.2MB/s ± 1% +30.68%
EncodeTwainSpeed1e5-4 35.4MB/s ± 1% 32.5MB/s ± 0% -8.34%
EncodeTwainSpeed1e6-4 35.6MB/s ± 2% 34.7MB/s ± 0% -2.77%
EncodeTwainDefault1e4-4 14.4MB/s ± 4% 21.1MB/s ± 1% +46.48%
EncodeTwainDefault1e5-4 8.49MB/s ± 0% 13.55MB/s ± 0% +59.50%
EncodeTwainDefault1e6-4 7.83MB/s ± 0% 13.25MB/s ± 0% +69.19%
EncodeTwainCompress1e4-4 13.9MB/s ± 3% 20.8MB/s ± 0% +49.83%
EncodeTwainCompress1e5-4 6.65MB/s ± 3% 12.20MB/s ± 2% +83.51%
EncodeTwainCompress1e6-4 5.88MB/s ± 0% 11.76MB/s ± 1% +100.06%
Change-Id: I724e33c1dd3e3a6a1b0a68e094baa959352baf32
Reviewed-on: https://go-review.googlesource.com/20929
Run-TryBot: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
This will test if deflate output is deterministic between two runs
of the deflater, when write sizes differ.
The deflater makes no official promises that results are
deterministic between runs, but this is a good test to determine
unintentional randomness.
Note that this does not guarantee that results are deterministic
across platforms nor that results will be deterministic between
Go versions. This is also not guarantees we should imply.
Change-Id: Id7dd89fe276060fd83a43d0b34ac35d50fcd32d9
Reviewed-on: https://go-review.googlesource.com/20573
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
If the upstream writer has returned an error, it may not
be returned by subsequent calls.
This makes sure that if an error has been returned, the
Writer will keep returning an error on all subsequent calls,
and not silently "swallow" them.
Change-Id: I2c9f614df72e1f4786705bf94e119b66c62abe5e
Reviewed-on: https://go-review.googlesource.com/20515
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Ensure that all errors (including io.EOF) are persistent across method
calls on zlib.Reader. Furthermore, ensure that these persistent errors
are properly cleared when Reset is called.
Fixes#14675
Change-Id: I15a20c7e25dc38219e7e0ff255d1ba775a86bb47
Reviewed-on: https://go-review.googlesource.com/20292
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>