These should never be found in a bzip2 file but it does appear that
there's a buggy encoder that is producing them. Since the official
bzip2 handles this case, this change makes the Go code do likewise.
With this change, the code produces the same output as the official
bzip2 code on the invalid example given in the bug.
Fixes#7279.
LGTM=r
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/64010043
Use the smaller read-only bytes.NewReader/strings.NewReader instead
of a bytes.Buffer when possible.
LGTM=r
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54660045
The huffmanDecoder struct appears to be intented for reuse by calling init a
second time with a second sequence of code lengths. Unfortunately, it can
currently panic if the second sequence of code lengths has a minimum value
greater than 10 due to failure to reinitialize the links table.
This change prevents the panic by resetting the huffmanDecoder struct back to
the struct's zero value at the beginning of the init method if the
huffmanDecoder is being reused (determined by checking if min has been set to a
non-zero value).
Fixes#6255.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/13230043
While we're here, add a test for the same functionality in gzip,
which was already implemented, and add bzip2 CRC checks.
Fixes#5772.
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/12387044
I used just enough of the data provided by Matt in Issue 5915 to trigger
issue 5915. As luck would have it, using slightly less of it triggered
issue 5962.
Fixes#5915.
Fixes#5962.
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/12288043
Decode as much as possible of a Huffman symbol in a single table
lookup (much like the zlib implementation), filling more bits
(conservatively, so we don't consume past the end of the stream)
when the code prefix indicates more bits are needed. This
results in about a 50% performance gain in speed benchmarks.
The following set is benchcmp done on a retina MacBook Pro:
benchmark old MB/s new MB/s speedup
BenchmarkDecodeDigitsSpeed1e4 28.41 42.79 1.51x
BenchmarkDecodeDigitsSpeed1e5 30.18 47.62 1.58x
BenchmarkDecodeDigitsSpeed1e6 30.81 48.14 1.56x
BenchmarkDecodeDigitsDefault1e4 30.28 44.61 1.47x
BenchmarkDecodeDigitsDefault1e5 32.18 51.94 1.61x
BenchmarkDecodeDigitsDefault1e6 35.57 53.28 1.50x
BenchmarkDecodeDigitsCompress1e4 30.39 44.83 1.48x
BenchmarkDecodeDigitsCompress1e5 33.05 51.64 1.56x
BenchmarkDecodeDigitsCompress1e6 35.69 53.04 1.49x
BenchmarkDecodeTwainSpeed1e4 25.90 43.04 1.66x
BenchmarkDecodeTwainSpeed1e5 29.97 48.19 1.61x
BenchmarkDecodeTwainSpeed1e6 31.36 49.43 1.58x
BenchmarkDecodeTwainDefault1e4 28.79 45.02 1.56x
BenchmarkDecodeTwainDefault1e5 37.12 55.65 1.50x
BenchmarkDecodeTwainDefault1e6 39.28 58.16 1.48x
BenchmarkDecodeTwainCompress1e4 28.64 44.90 1.57x
BenchmarkDecodeTwainCompress1e5 37.40 55.98 1.50x
BenchmarkDecodeTwainCompress1e6 39.35 58.06 1.48x
R=rsc, dave, minux.ma, bradfitz, nigeltao
CC=golang-dev
https://golang.org/cl/6872063
This bug has been introduced in the following revision:
changeset: 11404:26dceba5c610
user: Ivan Krasin <krasin@golang.org>
date: Mon Jan 23 09:19:39 2012 -0500
summary: compress/flate: reduce memory pressure at cost of additional arithmetic operation.
This is the review page for that CL: https://golang.org/cl/5555070/
R=rsc, imkrasin
CC=golang-dev
https://golang.org/cl/6249067
These files change from exactly 10003 bytes long to 100003: a digit,
a '.', 100k digits, and a '\n'.
The magic constants in compress/flate/deflate_test.go change since
deflateInflateStringTests checks that the compressed form of e.txt
is not 'too large'. I'm not exactly sure how these numbers were
originally calculated (they were introduced in codereview 5554066
"make lazy matching work"); perhaps krasin@golang.org can comment.
My change was to increase the first one (no compression) to a tight
bound, and multiply all the others by 10.
Benchcmp numbers for compress/flate and compress/lzw below. LZW's
window size of 4096 is less than 10k, so shows no significant change.
Flate's window size is 32768, between 10k and 100k, and so the .*1e5
and .*1e6 benchmarks show a dramatic drop, since the compressed forms
are no longer a trivial forward copy of 10k digits repeated over and
over, but should now be more representative of real world usage.
compress/flate:
benchmark old MB/s new MB/s speedup
BenchmarkDecodeDigitsSpeed1e4 16.58 16.52 1.00x
BenchmarkDecodeDigitsSpeed1e5 68.09 18.10 0.27x
BenchmarkDecodeDigitsSpeed1e6 124.63 18.35 0.15x
BenchmarkDecodeDigitsDefault1e4 17.21 17.12 0.99x
BenchmarkDecodeDigitsDefault1e5 118.28 19.19 0.16x
BenchmarkDecodeDigitsDefault1e6 295.62 20.52 0.07x
BenchmarkDecodeDigitsCompress1e4 17.22 17.17 1.00x
BenchmarkDecodeDigitsCompress1e5 118.19 19.21 0.16x
BenchmarkDecodeDigitsCompress1e6 295.59 20.55 0.07x
BenchmarkEncodeDigitsSpeed1e4 8.18 8.19 1.00x
BenchmarkEncodeDigitsSpeed1e5 43.22 12.84 0.30x
BenchmarkEncodeDigitsSpeed1e6 80.76 13.48 0.17x
BenchmarkEncodeDigitsDefault1e4 6.29 6.19 0.98x
BenchmarkEncodeDigitsDefault1e5 31.63 3.60 0.11x
BenchmarkEncodeDigitsDefault1e6 52.97 3.24 0.06x
BenchmarkEncodeDigitsCompress1e4 6.20 6.19 1.00x
BenchmarkEncodeDigitsCompress1e5 31.59 3.59 0.11x
BenchmarkEncodeDigitsCompress1e6 53.18 3.25 0.06x
compress/lzw:
benchmark old MB/s new MB/s speedup
BenchmarkDecoder1e4 21.99 22.09 1.00x
BenchmarkDecoder1e5 22.77 22.71 1.00x
BenchmarkDecoder1e6 22.90 22.90 1.00x
BenchmarkEncoder1e4 21.04 21.19 1.01x
BenchmarkEncoder1e5 22.06 22.06 1.00x
BenchmarkEncoder1e6 22.16 22.28 1.01x
R=rsc
CC=golang-dev, krasin
https://golang.org/cl/6207043
I'm not sure where the BOM came from, originally.
http://www.gutenberg.org/files/74/74.txt doesn't have it, although
a fresh download of that URL gives me "\r\n"s instead of plain "\n"s,
and the extra line "Character set encoding: ASCII". Maybe Project
Gutenberg has changed their server configuration since we added that
file to the Go repo.
Anyway, this change is just manually excising the BOM from the start
of the file, leaving pure ASCII.
R=r, bradfitz
CC=golang-dev, krasin, rsc
https://golang.org/cl/6197061
In CL 6127051, nigeltao suggested that further gains
were possible by improving the performance of flate.
This CL adds a set of benchmarks (based on compress/lzw)
that can be used to judge any future improvements.
R=nigeltao
CC=golang-dev
https://golang.org/cl/6128049
(*Writer, error) if they take a compression level, and *Writer otherwise.
Rename gzip's Compressor and Decompressor to Writer and Reader, similar to
flate and zlib.
Clarify commentary when writing gzip metadata that is not representable
as Latin-1, and fix io.EOF comment bug.
Also refactor gzip_test to be more straightforward.
Fixes#2839.
R=rsc, r, rsc, bradfitz
CC=golang-dev
https://golang.org/cl/5639057
The practice encourages people to think this is the way to
create a bytes.Buffer when new(bytes.Buffer) or
just var buf bytes.Buffer work fine.
(html/token.go was missing the point altogether.)
R=golang-dev, bradfitz, r
CC=golang-dev
https://golang.org/cl/5637043
Consequently, remove many package Makefiles,
and shorten the few that remain.
gomake becomes 'go tool make'.
Turn off test phases of run.bash that do not work,
flagged with $BROKEN. Future CLs will restore these,
but this seemed like a big enough CL already.
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/5601057