These new methods provide support for cases where performance is a
primary concern. For example, copying files from an existing zip to a
new zip without incurring the decompression and compression overhead.
Using an optimized, external compression method and writing the output
to a zip archive. And compressing file contents in parallel and then
sequentially writing the compressed bytes to a zip archive.
TestWriterCopy is copied verbatim from https://github.com/rsc/zipmergeFixes#34974
Change-Id: Iade5bc245ba34cdbb86364bf59f79f38bb9e2eb6
Reviewed-on: https://go-review.googlesource.com/c/go/+/312310
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Carlos Amedee <carlos@golang.org>
Several adjustments:
1) When encoding the FileHeader for a directory, explicitly set all of the sizes
to zero regardless of their prior values. These values are currently populated
by FileInfoHeader as it calls os.FileInfo.Size regardless of whether the file is
a directory or not. We avoid fixing FileInfoHeader now as it is too late in the
release cycle (see #24082).
We silently adjust slightly wrong FileHeader fields as opposed to returning
an error because the CreateHeader method already does such mutations
(e.g., for UTF-8 detection, data descriptor, etc).
2) Have dirWriter.Write only return an error if some number of bytes are written.
Some code still call Write for both normal files and directories, but just pass
an empty []byte to Write for directories.
Change-Id: I85492a31356107fcf76dc89ceb00a28853754289
Reviewed-on: https://go-review.googlesource.com/124155
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Java fails to unzip archives created by archive/zip because directories are
written with the "data descriptor" flag (bit 3) set, but emits no such
descriptor. To fix this, we explicitly clear the flag.
Fixes#25215
Change-Id: Id3af4c7f863758197063df879717c1710f86c0e5
Reviewed-on: https://go-review.googlesource.com/110795
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When creating a directory, Writer.Create now returns a dummy
io.Writer that always returns an error on Write.
Fixes#24043
Change-Id: I7792f54440d45d22d0aa174cba5015ed5fab1c5c
Reviewed-on: https://go-review.googlesource.com/108615
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
FileHeader.Name also reflects this fact.
Fixes#24018
Change-Id: Id0860a9b23c264ac4c6ddd65ba20e0f1f36e4865
Reviewed-on: https://go-review.googlesource.com/97057
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
In order to avoid a regression where the date of the ModTime method
changed behavior, simply preserve the old behavior of determining
the date based on the legacy fields.
This ensures that anyone relying on ModTime before Go1.10 will have
the exact same behavior as before.
New users should use FileHeader.Modified instead.
We keep the UTC coersion logic in SetModTime since some users
manually compute timezone offsets in order to have precise control
over the MS-DOS time field.
Fixes#22738
Change-Id: Ib18b6ebd863bcf645748e083357dce9bc788cdba
Reviewed-on: https://go-review.googlesource.com/78031
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
A method is more in keeping with the rest of the Writer API and
incidentally allows the comment error to be reported earlier.
Fixes#22737.
Change-Id: I1eee2103a0720c76d0c394ccd6541e6219996dc0
Reviewed-on: https://go-review.googlesource.com/79415
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
The replacement rune is a valid rune and can appear as itself in valid UTF8
(it encodes as three bytes). To check for invalid UTF8 it is necessary to
look for utf8.DecodeRune returning the replacement rune and size==1.
Change-Id: I169be8d1fe61605c921ac13cc2fde94f80f3463c
Reviewed-on: https://go-review.googlesource.com/78126
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
The NonUTF8 field provides users with a way to explictly tell the
ZIP writer to avoid setting the UTF-8 flag.
This is necessary because many readers:
1) (Still) do not support UTF-8
2) And use the local system encoding instead
Thus, even though character encodings other than CP-437 and UTF-8
are not officially supported by the ZIP specification, pragmatically
the world has permitted use of them.
When a non-standard encoding is used, it is the user's responsibility
to ensure that the target system is expecting the encoding used
(e.g., producing a ZIP file you know is used on a Chinese version of Windows).
We adjust the detectUTF8 function to account for Shift-JIS and EUC-KR
not being identical to ASCII for two characters.
We don't need an API for users to explicitly specify that they are encoding
with UTF-8 since all single byte characters are compatible with all other
common encodings (Windows-1256, Windows-1252, Windows-1251, Windows-1250,
IEC-8859, EUC-KR, KOI8-R, Latin-1, Shift-JIS, GB-2312, GBK) except for
the non-printable characters and the backslash character (all of which
are invalid characters in a path name anyways).
Fixes#10741
Change-Id: I9004542d1d522c9137973f1b6e2b623fa54dfd66
Reviewed-on: https://go-review.googlesource.com/75592
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The ModifiedTime and ModifiedDate fields are not expressive enough
for many of the time extensions that have since been added to ZIP,
nor are they easy to access since they in a legacy MS-DOS format,
and must be set and retrieved via the SetModTime and ModTime methods.
Instead, we add new field Modified of time.Time type that contains
all of the previous information and more.
Support for extended timestamps have been attempted before, but the
change was reverted because it provided no ability for the user to
specify the timezone of the legacy MS-DOS fields.
Technically the old API did not either, but users were manually offsetting
the timestamp to achieve the same effect.
The Writer now writes the legacy timestamps according to the timezone
of the FileHeader.Modified field. When the Modified field is set via
the SetModTime method, it is in UTC, which preserves the old behavior.
The Reader attempts to determine the timezone if both the legacy
and extended timestamps are present since it can compute the delta
between the two values.
Since Modified is a superset of the information in ModifiedTime and ModifiedDate,
we mark ModifiedTime, ModifiedDate, ModTime, and SetModTime as deprecated.
Fixes#18359
Change-Id: I29c6bc0a62908095d02740df3e6902f50d3152f1
Reviewed-on: https://go-review.googlesource.com/74970
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
CL 39570 added support for automatically setting flag bit 11 to
indicate that the filename and comment fields are encoded in UTF-8,
which is (conventionally) the encoding using for most Go strings.
However, the detection added is too lose for two reasons:
* We need to ensure both fields are at least possibly UTF-8.
That is, if any field is definitely not UTF-8, then we can't set the bit.
* The utf8.ValidRune returns true for utf8.RuneError, which iterating
over a Go string automatically returns for invalid UTF-8.
Thus, we manually check for that value.
Updates #22367
Updates #10741
Change-Id: Ie8aae388432e546e44c6bebd06a00434373ca99e
Reviewed-on: https://go-review.googlesource.com/72791
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change added support "end of central directory record comemnt" to the Writer.
There is a new exported field Writer.Comment in this change.
If invalid size of comment was set, Close returns error without closing resources.
Fixes#21634
Change-Id: Ifb62bc6c7f81b9257ac83eb570ad9915de727f8c
Reviewed-on: https://go-review.googlesource.com/59310
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The ZIP format uses uint16 to contain the length of the file name and
the length of the Extra section. This change verifies that the length
of these fields fit in an uint16 prior to writing the ZIP file. If not,
an error is returned.
Fixes#17402
Change-Id: Ief9a864d2fe16b89ddb9917838283b801a2c58a4
Reviewed-on: https://go-review.googlesource.com/50250
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
See: https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.0.TXT
Document says:
> If general purpose bit 11 is set, the filename and comment must support The
> Unicode Standard, Version 4.1.0 or greater using the character encoding form
> defined by the UTF-8 storage specification.
Since Go encode the filename to UTF-8, general purpose bit 11 should be set.
Change-Id: Ica4af02b4dc695e9a5c015ae360e70171efb6ee3
Reviewed-on: https://go-review.googlesource.com/39570
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change reverts the following CLs:
CL/18274: handle mtime in NTFS/UNIX/ExtendedTS extra fields
CL/30811: only use Extended Timestamp on non-zero MS-DOS timestamps
We are reverting support for extended timestamps since the support was not
not complete. CL/18274 added full support for reading extended timestamp fields
and minimal support for writing them. CL/18274 is incomplete because it made
no changes to the FileHeader struct, so timezone information was lost when
reading and/or writing.
While CL/18274 was a step in the right direction, we should provide full
support for high precision timestamps in both the reader and writer.
This will probably require that we add a new field of type time.Time.
The complete fix is too involved to add in the time remaining for Go 1.8
and will be completed in Go 1.9.
Updates #10242
Updates #17403
Updates #18359Fixes#18378
Change-Id: Icf6d028047f69379f7979a29bfcb319a02f4783e
Reviewed-on: https://go-review.googlesource.com/34651
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
We should preserve the fact that a roundtrip read on fields with the zero
value should remain the zero for those that are reasonable to stay that way.
If the zero value for a MS-DOS timestamp was used, then it is sensible for
that zero value to also be read back later.
Fixes#17403
Change-Id: I32c3915eab180e91ddd2499007374f7b85f0bd76
Reviewed-on: https://go-review.googlesource.com/30811
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
You can not use cannot, but you cannot spell cannot can not.
Change-Id: I2f0971481a460804de96fd8c9e46a9cc62a3fc5b
Reviewed-on: https://go-review.googlesource.com/19772
Reviewed-by: Rob Pike <r@golang.org>
Read zip files that contain only 64-bit header offset, not 64-bit sizes.
Fixes#13367.
Read zip files that contain completely unexpected Extra fields,
provided we do not need to find 64-bit size or header offset information there.
Fixes#13166.
Write zip file entries with 0xFFFFFFFF uncompressed data bytes
correctly (must use zip64 header, since that's the magic indicator).
Fixes new TestZip64EdgeCase. (Noticed while working on the CL.)
Change-Id: I84a22b3995fafab8052b99de8094a9f35a25de5b
Reviewed-on: https://go-review.googlesource.com/18317
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Implement setting the compression level for a zip archive by registering
a per-Writer compressor through Writer.RegisterCompressor. If no
compressors are registered, fall back to the ones registered at the
package level. Also implements per-Reader decompressors.
Fixes#8359
Change-Id: I93b27c81947b0f817b42e0067aa610ff267fdb21
Reviewed-on: https://go-review.googlesource.com/16669
Reviewed-by: Joe Tsai <joetsai@digital-static.net>
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
Reviewed-by: Klaus Post <klauspost@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
When appending zip data to existing data such as a binary file the
zip headers must use the correct offset. NewWriterWithOptions
allows creating a Writer that uses the provided offset in the zip
headers.
Fixes#8669
Change-Id: I6ec64f1e816cc57b6fc8bb9e8a0918e586fc56b0
Reviewed-on: https://go-review.googlesource.com/2978
Reviewed-by: Andrew Gerrand <adg@golang.org>
Section 4.3.14.1 of the ZIP file format
spec (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) says,
The value stored into the "size of zip64 end of central directory
record" should be the size of the remaining record and should not
include the leading 12 bytes.
We were previously writing the full size, including the 12 bytes.
Fixes#9857
Change-Id: I7cf1fc8457c5f306717cbcf61e02304ab549781f
Reviewed-on: https://go-review.googlesource.com/4760
Reviewed-by: Andrew Gerrand <adg@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>