mirror of https://github.com/golang/go synced 2024-11-08 03:46:10 -07:00
104 Commits

Author SHA1 Message Date
Awn
23c9db657e archive/tar: remove useless type conversions
Change-Id: I259a6ed6a1abc63d2dc39eca7e85f94cf38001cc
Reviewed-on: https://go-review.googlesource.com/47342
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-15 15:11:14 +00:00
Stanislav Afanasev
a4aa5c3181 archive/tar: a cosmetic fix after checking by golint
The existing methods regFileReader.LogicalRemaining and regFileReader.PhysicalRemaining have receiver names that are inconsistent with the previously used name.

Change-Id: Ief2024716737eaf482c4311f3fdf77d92801c36e
Reviewed-on: https://go-review.googlesource.com/76430
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
2017-11-07 20:11:28 +00:00
Joe Tsai
577aab0c59 archive/tar: ignore ChangeTime and AccessTime unless Format is specified
CL 59230 changed Writer.WriteHeader to ignore the ChangeTime and AccessTime
fields when considering using the USTAR format when the format is unspecified.
This policy is confusing and leads to unexpected behavior where some files
have ModTime only, while others have ModTime+AccessTime+ChangeTime if the
format became PAX for some unrelated reason (e.g., long pathname).

Change the policy to simply always ignore ChangeTime, AccessTime, and
sub-second time resolutions unless the user explicitly specifies a format.
This is a safe policy change since WriteHeader had no support for the
above features in any Go release.

Support for ChangeTime and AccessTime was added in CL 55570.
Support for sub-second times was added in CL 55552.
Both CLs landed after the latest Go release (i.e., Go1.9), which was
cut from the master branch around August 6th, 2017.

Change-Id: Ib82baa1bf9dd4573ed4f674b7d55d15f733a4843
Reviewed-on: https://go-review.googlesource.com/69296
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-10 20:13:27 +00:00
Joe Tsai
4cd58c2f26 archive/tar: improve handling of directory paths
The USTAR format says:
<<<
Implementors should be aware that the previous file format did not include
a mechanism to archive directory type files.
For this reason, the convention of using a filename ending with
<slash> was adopted to specify a directory on the archive.
>>>

In light of this suggestion, make the following changes:
* Writer.WriteHeader refuses to encode a header where a file that
is obviously a file-type has a trailing slash in the name.
* formatter.formatString avoids encoding a trailing slash in the event
that the string is truncated (the full string will be encoded elsewhere,
so stripping the slash is safe).
* Reader.Next treats a TypeRegA (which is the zero value of Typeflag)
as a TypeDir if the name has a trailing slash.

Change-Id: Ibf27aa8234cce2032d92e5e5b28546c2f2ae5ef6
Reviewed-on: https://go-review.googlesource.com/69293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-10 20:11:26 +00:00
Marvin Stenger
d153df8e4b all: revert "all: prefer strings.LastIndexByte over strings.LastIndex"
This reverts https://golang.org/cl/66372.

Updates #22148

Change-Id: I3e94af3dfc11a2883bf28e1d5e1f32f98760b3ee
Reviewed-on: https://go-review.googlesource.com/68431
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-05 23:19:42 +00:00
Joe Tsai
e04ff3d133 archive/tar: fix typo in documentation
s/TypeSymLink/TypeSymlink/g

Change-Id: I2550843248eb27d90684d0036fe2add0b247ae5a
Reviewed-on: https://go-review.googlesource.com/67810
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-03 16:24:53 +00:00
Marvin Stenger
d2826d3e06 all: prefer strings.LastIndexByte over strings.LastIndex
strings.LastIndexByte was introduced in go1.5 and it can be used
effectively wherever the second argument to strings.LastIndex is
exactly one byte long.

This avoids generating unnecessary string symbols and saves
a few calls to strings.LastIndex.

Change-Id: I7b5679d616197b055cffe6882a8675d24a98b574
Reviewed-on: https://go-review.googlesource.com/66372
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-27 00:54:24 +00:00
Joe Tsai
7246585f8c archive/tar: avoid empty IO operations
The interfaces for io.Reader and io.Writer permit calling Read/Write
with an empty buffer. However, this condition is often not well tested
and can lead to bugs in various implementations of io.Reader and io.Writer.
For example, see #22028 for buggy io.Reader in the bzip2 package.

We reduce the likelihood of hitting these bugs by adjusting
regFileReader.Read and regFileWriter.Write to avoid performing
Read and Write calls when the buffer is known to be empty.

Fixes #22029

Change-Id: Ie4a26be53cf87bc4d2abd951fa005db5871cc75c
Reviewed-on: https://go-review.googlesource.com/66111
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-25 23:06:04 +00:00
Giovanni Bajo
2f8b555de2 archive/tar: fix sparse files support on Darwin
Apple defined the SEEK_HOLE/SEEK_DATA constants in unistd.h
with swapped values, compared to all other UNIX systems.

Fixes #21970

Change-Id: I84a33e0741f0f33a2e04898e96b788b87aa9890f
Reviewed-on: https://go-review.googlesource.com/65570
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-23 16:53:28 +00:00
David du Colombier
d83b23fd4f archive/tar: skip TestSparseFiles on Plan 9
CL 60871 added TestSparseFiles. This test is succeeding
on Plan 9 when executed on the ramfs file system, but
is failing when executed on the Fossil file system.

This may be due to an issue in the handling of sparse
files in the Fossil file system on Plan 9 that should
be investigated.

Updates #21977.

Change-Id: I177afff519b862a5c548e094203c219504852006
Reviewed-on: https://go-review.googlesource.com/65352
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-22 13:50:50 +00:00
Joe Tsai
718d9de60f archive/tar: perform test for hole-detection on specific builders
The test for hole-detection is heavily dependent on whether the
OS and underlying FS provides support for it.
Even on Linux, which has support for SEEK_HOLE and SEEK_DATA,
the underlying filesystem may not have support for it.
In order to avoid an ever-changing game of whack-a-mole,
we whitelist the specific builders that we expect the test to pass on.

Updates #21964

Change-Id: I7334e8532c96cc346ea83aabbb81b719685ad7e5
Reviewed-on: https://go-review.googlesource.com/65270
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-21 20:42:11 +00:00
Joe Tsai
fdecab6ef0 archive/tar: make check for hole detection support more liberal
On most Unix OSes, lseek reports EINVAL when lacking SEEK_HOLE support.
However, there are reports that ENOTTY is reported instead.
Rather than tracking down every possible errno that may be used to
represent "not supported", just treat any non-nil error as meaning
that there is no support. This is the same strategy taken by the
GNU and BSD tar tools.

Fixes #21958

Change-Id: Iae68afdc934042f52fa914fca45f0ca89220c383
Reviewed-on: https://go-review.googlesource.com/65191
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-21 17:49:35 +00:00
Joe Tsai
1eacf78858 archive/tar: add Header.DetectSparseHoles and Header.PunchSparseHoles
To support the detection and creation of sparse files,
add two new methods:
	func Header.DetectSparseHoles(*os.File) error
	func Header.PunchSparseHoles(*os.File) error

DetectSparseHoles is intended to be used after FileInfoHeader
prior to serializing the Header with WriteHeader.
For each OS, it uses specialized logic to detect
the location of sparse holes. On most Unix systems, it uses
SEEK_HOLE and SEEK_DATA to query for the holes.
On Windows, it uses the specialized FSCTL_QUERY_ALLOCATED_RANGES
syscall to query for all the holes.

PunchSparseHoles is intended to be used after Reader.Next
prior to populating the file with Reader.WriteTo.
On Windows, this uses the FSCTL_SET_ZERO_DATA syscall.
On other operating systems it simply truncates the file
to the end-offset of SparseHoles.

DetectSparseHoles and PunchSparseHoles are added as methods on
Header because they are heavily tied to the operating system,
for which there is already precedent
(since FileInfoHeader makes use of OS-specific details).

Fixes #13548

Change-Id: I98a321dd1ce0165f3d143d4edadfda5e7db67746
Reviewed-on: https://go-review.googlesource.com/60871
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-20 22:12:38 +00:00
Joe Tsai
57c79febda archive/tar: add Reader.WriteTo and Writer.ReadFrom
To support the efficient packing and extracting of sparse files,
add two new methods:
	func Reader.WriteTo(io.Writer) (int64, error)
	func Writer.ReadFrom(io.Reader) (int64, error)

If the current archive entry is sparse and the provided io.{Reader,Writer}
is also an io.Seeker, then use Seek to skip past the holes.
If the last region in a file entry is a hole, then we seek to 1 byte
before the EOF:
	* for Reader.WriteTo to write a single byte
	to ensure that the resulting filesize is correct.
	* for Writer.ReadFrom to read a single byte
	to verify that the input filesize is correct.

The downside of this approach is when the last region in the sparse file
is a hole. In the case of Reader.WriteTo, the 1-byte write will cause
the last fragment to have a single chunk allocated.
However, the goal of ReadFrom/WriteTo is *not* the ability to
exactly reproduce sparse files (in terms of the location of sparse holes),
but rather to provide an efficient way to create them.

File systems already impose their own restrictions on how the sparse file
will be created. Some filesystems (e.g., HFS+) don't support sparseness and
seeking forward simply causes the FS to write zeros. Other filesystems
have different chunk sizes, which will cause chunk allocations at boundaries
different from what was in the original sparse file. In either case,
it should not be a normal expectation of users that the location of holes
in sparse files exactly matches the source.

For users that really desire to have exact reproduction of sparse holes,
they can wrap os.File with their own io.WriteSeeker that discards the
final 1-byte write and uses File.Truncate to resize the file to the
correct size.

Other reasons we chose this approach over special-casing *os.File:
	* The Reader already has special-case logic for io.Seeker
	* As much as possible, we want to decouple OS-specific logic from
	Reader and Writer.
	* This allows other abstractions over *os.File to also benefit from
	the "skip past holes" logic.
	* It is easier to test, since it is harder to mock an *os.File.

Updates #13548

Change-Id: I0a4f293bd53d13d154a946bc4a2ade28a6646f6a
Reviewed-on: https://go-review.googlesource.com/60872
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-18 16:18:17 +00:00
Tobias Klauser
3098cf0175 archive/tar: populate Devmajor and Devminor in FileInfoHeader on *BSD
Extract device major/minor number on all the BSDs and set Devmajor and
Devminor in FileInfoHeader. Code based on the corresponding Major/Minor
implementations in golang.org/x/sys/unix.

Change-Id: Ieffa7ce0cdbe6481950de666b2f5f88407a32382
Reviewed-on: https://go-review.googlesource.com/63470
Reviewed-by: Joe Tsai <joetsai@google.com>
2017-09-13 21:02:11 +00:00
Tobias Klauser
ec359643a1 archive/tar: populate Devmajor and Devminor in FileInfoHeader on Darwin
Extract device major/minor number on Darwin and set Devmajor and
Devminor in FileInfoHeader. Code based on the Major/Minor functions for
Darwin in golang.org/x/sys/unix.

Change-Id: I51b65f607bfa2e6b177b8b66e2b246b771367b84
Reviewed-on: https://go-review.googlesource.com/60850
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-01 10:25:54 +00:00
Joe Tsai
c1679286c3 archive/tar: minor doc fixes
Use "file" consistently instead of "entry".

Change-Id: Ia81c9665d0d956adb78f7fa49de40cdb87fba000
Reviewed-on: https://go-review.googlesource.com/60150
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-30 18:01:08 +00:00
Joe Tsai
f85dc050ba archive/tar: require opt-in to PAX or GNU format for time features
Nearly every Header obtained from FileInfoHeader via the FS has
timestamps with sub-second resolution and the AccessTime
and ChangeTime fields populated. This forces the PAX format
to almost always be used, which has the following problems:
* PAX is still not as widely supported as USTAR
* The PAX headers will occupy at minimum 1KiB for every entry

The old behavior of tar Writer had no support for sub-second resolution
nor any support for AccessTime or ChangeTime, so had neither problem.
Instead the Writer would just truncate sub-second information and
ignore the AccessTime and ChangeTime fields.

In this CL, we preserve the behavior such that the *default* behavior
would output a USTAR header for most cases by truncating sub-second
time measurements and ignoring AccessTime and ChangeTime.
To use either of the features, users will need to explicitly specify
that the format is PAX or GNU.

The exact policy chosen is this:
* USTAR and GNU may still be chosen even if sub-second measurements
are present; they simply truncate the timestamp to the nearest second.
As before, PAX uses sub-second resolutions.
* If the Format is unspecified, then WriteHeader ignores AccessTime
and ChangeTime when using the USTAR format.

This ensures that USTAR may still be chosen for a vast majority of
file entries obtained through FileInfoHeader.

Updates #11171
Updates #17876

Change-Id: Icc5274d4245922924498fd79b8d3ae94d5717271
Reviewed-on: https://go-review.googlesource.com/59230
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-30 18:00:59 +00:00
Joe Tsai
0564e304a6 archive/tar: populate uname/gname/devmajor/devminor in FileInfoHeader
We take a best-effort approach since information for these fields
is not well supported on all platforms.

user.LookupId+user.LookupGroupId is currently 15x slower than os.Stat.
For performance reasons, we perpetually cache username and groupname
with a sync.Map. As a result, this function will not be updated whenever
the user or group names are renamed in the OS. However, this is a better
situation than before, where those fields were not populated at all.
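
The caching trade-off can be sketched as follows. This is an illustration only, not the package's code: nameCache and its lookup field are hypothetical, and a real caller might pass a function wrapping user.LookupId.

```go
package main

import (
	"fmt"
	"sync"
)

// nameCache remembers every successful lookup forever, trading freshness
// (renames in the OS are never noticed) for speed.
type nameCache struct {
	m      sync.Map                        // id (string) -> name (string)
	lookup func(id string) (string, error) // slow lookup, assumed to hit the OS
}

func (c *nameCache) name(id string) (string, error) {
	if v, ok := c.m.Load(id); ok {
		return v.(string), nil
	}
	n, err := c.lookup(id)
	if err != nil {
		return "", err // failures are not cached
	}
	c.m.Store(id, n)
	return n, nil
}

func main() {
	slowCalls := 0
	c := &nameCache{lookup: func(id string) (string, error) {
		slowCalls++
		return "user-" + id, nil // stand-in for an OS user database query
	}}
	for i := 0; i < 3; i++ {
		n, _ := c.name("1000")
		fmt.Println(n)
	}
	fmt.Println("slow lookups performed:", slowCalls)
}
```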

Change-Id: I3cec8291aed7675dea89ee1cbda92bd493c8831f
Reviewed-on: https://go-review.googlesource.com/59531
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-30 00:52:31 +00:00
Joe Tsai
bad6b6fa91 archive/tar: improve package documentation
Many aspects of the package are woefully undocumented.
With the recent flurry of improvements, the package is now at feature
parity with the GNU and TAR tools. Thoroughly document all of the public API
and perform some minor stylistic cleanup in some code segments.

Change-Id: Ic892fd72c587f30dfe91d1b25b88c9c8048cc389
Reviewed-on: https://go-review.googlesource.com/59210
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 23:29:55 +00:00
Joe Tsai
19a995945f archive/tar: add raw support for global PAX records
The PAX specification says the following:
<<<
'g' represents global extended header records for the following files in the archive.
The format of these extended header records shall be as described in pax Extended Header.
Each value shall affect all subsequent files that do not override that value
in their own extended header record and until another global extended header record
is reached that provides another value for the same field.
>>>

This CL adds support for parsing and composing global PAX records,
but intentionally does not provide support for automatically
persisting the global state across files.

Changes made:
* When Reader encounters a TypeXGlobalRecord header, it parses the
PAX records and returns them to the user verbatim. Reader does not
store them in its state, ensuring it has no effect on future Next calls.
* When Writer receives a TypeXGlobalRecord header, it writes the
PAX records to the archive verbatim. It does not store them in
its state, ensuring it has no effect on future WriteHeader calls.
* The restriction regarding empty record values is lifted since this
value is used to represent deletion in global headers.

Why provide raw support only:
* Some archives in the wild have a global header section (often empty)
and it is the user's responsibility to manually read and discard its body.
The logic added here allows users to more easily skip over these sections.
* For users that do care about global headers, having access to the raw
records allows them to implement the functionality of global headers themselves
and manually persist the global state across files.
* We can still upgrade to a full implementation in the future.

Why we don't provide full support:
* Even though the PAX specification describes their operation in detail,
both the GNU and BSD tar tools (which are the most common implementations)
do not have a consistent interpretation of many details.
* Global headers were a controversial feature in PAX, by admission of the
specification itself:
  <<<
  The concept of a global extended header (typeflag g) was controversial.

  The typeflag g global headers should not be used with interchange media that
  could suffer partial data loss in transporting the archive.
  >>>
* Having state persist from entry-to-entry complicates the implementation
for a feature that is not widely used and not well supported.

Change-Id: I1d904cacc2623ddcaa91525a5470b7dbe226c7e8
Reviewed-on: https://go-review.googlesource.com/59190
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
2017-08-25 23:03:52 +00:00
Joe Tsai
a795ca51db archive/tar: support arbitrary PAX records
This CL adds the following new publicly visible API:
	type Header struct { ...; PAXRecords map[string]string }

The new Header.PAXRecords field is a map of all PAX extended header records.

We suggest (but do not enforce) that users use VENDOR-prefixed keys
according to the following in the PAX specification:
<<<
The standard developers have reserved keyword name space for vendor extensions.
It is suggested that the format to be used is:
	VENDOR.keyword
where VENDOR is the name of the vendor or organization in all uppercase letters.
>>>

When reading, the Header.PAXRecords is populated with all PAX records
encountered so far, including basic ones (e.g., "path", "mtime", etc).
When writing, the fields of Header will be merged into PAXRecords,
overwriting any records that may conflict.

Since PAXRecords is a more expressive feature than Xattrs and
is entirely a superset of Xattrs, we mark Xattrs as deprecated,
and steer users towards the new PAXRecords API.

The issue has a discussion about adding a Header.SetPAXRecord method
to help validate records and keep the Header fields in sync.
However, we do not include that in this CL since that helper
method can always be added in the future.
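
As a usage illustration, a VENDOR-prefixed record can be attached to a file entry and read back. This is a sketch assuming the released archive/tar API; the key "EXAMPLE.author", the value, and the helper name are made up.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
)

// roundTripRecords (hypothetical helper) writes one empty file carrying the
// given PAX records and returns the records a Reader reports for it.
func roundTripRecords(records map[string]string) (map[string]string, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	if err := tw.WriteHeader(&tar.Header{
		Name:       "notes.txt", // hypothetical file name
		Mode:       0600,
		PAXRecords: records,
	}); err != nil {
		return nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	hdr, err := tar.NewReader(&buf).Next()
	if err != nil {
		return nil, err
	}
	return hdr.PAXRecords, nil
}

func main() {
	got, err := roundTripRecords(map[string]string{"EXAMPLE.author": "gopher"})
	if err != nil {
		panic(err)
	}
	fmt.Println(got["EXAMPLE.author"])
}
```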

There is no support for global records.

Fixes #14472

Change-Id: If285a52749acc733476cf75a2c7ad15bc1542071
Reviewed-on: https://go-review.googlesource.com/58390
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 21:57:32 +00:00
Joe Tsai
3d62000adc archive/tar: return better WriteHeader errors
WriteHeader may fail to encode a header for any number of reasons,
which can be frustrating for the user when trying to create a tar archive.
As we validate the Header, we generate an informative error message
intended for human consumption and return that if and only if no
format can be selected.

This allows WriteHeader to return informative errors like:
    tar: cannot encode header: invalid PAX record: "linkpath = \x00hello"
    tar: cannot encode header: invalid PAX record: "SCHILY.xattr.foo=bar = baz"
    tar: cannot encode header: Format specifies GNU; and only PAX supports Xattrs
    tar: cannot encode header: Format specifies GNU; and GNU cannot encode ModTime=1969-12-31 15:59:59.0000005 -0800 PST
    tar: cannot encode header: Format specifies GNU; and GNU supports sparse files only with TypeGNUSparse
    tar: cannot encode header: Format specifies USTAR; and USTAR cannot encode ModTime=292277026596-12-04 07:30:07 -0800 PST
    tar: cannot encode header: Format specifies USTAR; and USTAR does not support sparse files
    tar: cannot encode header: Format specifies PAX; and only GNU supports TypeGNUSparse

Updates #18710

Change-Id: I82a498d6f29d02c4e73bce47b768eb578da8499c
Reviewed-on: https://go-review.googlesource.com/58310
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 05:21:00 +00:00
Joe Tsai
9d3d370632 archive/tar: support reporting and selecting the format
The Reader and Writer are now at feature parity,
meaning that everything that can be parsed by the Reader,
can also be composed by the Writer.

This position enables us to support selection of the format
in a backwards compatible way, since it ensures that everything
that can be read can also be round-trip written.

As such, we add the following new API:
    type Format int
            const FormatUnknown Format = 0 ...
    type Header struct { ...; Format Format }

The new Header.Format field is populated by the Reader on the
best guess on what the format is. Note that the Reader is very liberal
in what it permits, so a hybrid TAR file using aspects of multiple
formats can still be decoded, but will be reported as FormatUnknown.

Even though Reader has full support for V7 and basic support for STAR,
it will still report those formats as unknown (and the constants for
those formats are not even exported). The reason for this is that
the Writer has no support for V7 or STAR. Leaving it as unknown allows
the Writer to choose a format (usually USTAR or GNU) that can encode
the equivalent Header.

When writing, the Header.allowedFormats will take the Format field
into consideration if it is a known format.

Fixes #18710

Change-Id: I00980c475d067c6969d3414e1ff0224fdd89cd49
Reviewed-on: https://go-review.googlesource.com/58230
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-24 01:35:39 +00:00
Joe Tsai
e0ab505a97 archive/tar: implement Writer support for sparse files
This CL is the second step (of two; part1 is CL/56771) for adding
sparse file support to the Writer.

There are no new identifiers exported in this CL, but this does make
use of Header.SparseHoles added in part1. If the Typeflag is set to
TypeGNUSparse or len(SparseHoles) > 0, then the Writer will emit a
sparse file, where the holes must be written by the user as zeros.

If TypeGNUSparse is set, then the output file must use the GNU format.
Otherwise, it must use the PAX format (with GNU-defined PAX keys).

A future CL may export Reader.Discard and Writer.FillZeros,
but those methods are currently unexported, and only used by the
tests for efficiency reasons.
Calling Discard or FillZeros on a hole 10GiB in size does take
time, even if it is essentially a memcopy.

Updates #13548

Change-Id: Id586d9178c227c0577f796f731ae2cbb72355601
Reviewed-on: https://go-review.googlesource.com/57212
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-23 22:38:45 +00:00
Agniva De Sarker
ea5e3bd2a1 all: fix easy-to-miss typos
Using the wonderful https://github.com/client9/misspell tool.

Change-Id: Icdbc75a5559854f4a7a61b5271bcc7e3f99a1a24
Reviewed-on: https://go-review.googlesource.com/57851
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-23 03:07:12 +00:00
Joe Tsai
3bece2fa0e archive/tar: refactor Reader support for sparse files
This CL is the first step (of two) for adding sparse file support
to the Writer. This CL only refactors the logic of sparse-file handling
in the Reader so that common logic can be easily shared by the Writer.

As a result of this CL, there are some new publicly visible API changes:
	type SparseEntry struct { Offset, Length int64 }
	type Header struct { ...; SparseHoles []SparseEntry }

A new type is defined to represent a sparse fragment and a new field
Header.SparseHoles is added to represent the sparse holes in a file.
The API intentionally represents sparse files using hole fragments,
rather than data fragments so that the zero value of SparseHoles
naturally represents a normal file (i.e., a file without any holes).
The Reader now populates SparseHoles for sparse files.

It is necessary to export the sparse hole information, otherwise it would
be impossible for the Writer to specify that it is trying to encode
a sparse file, and what it looks like.

Some unexported helper functions were added to common.go:
	func validateSparseEntries(sp []SparseEntry, size int64) bool
	func alignSparseEntries(src []SparseEntry, size int64) []SparseEntry
	func invertSparseEntries(src []SparseEntry, size int64) []SparseEntry

The validation logic that used to be in newSparseFileReader is now moved
to validateSparseEntries so that the Writer can use it in the future.
alignSparseEntries is currently unused by the Reader, but will be used
by the Writer in the future. Since TAR represents sparse files by
only recording the data fragments, we add the invertSparseEntries
function to convert a list of data fragments to a normalized list
of hole fragments (and vice-versa).
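
The inversion can be sketched independently of the package. This is not the actual invertSparseEntries implementation (which may normalize its output differently); sparseEntry and invert are hypothetical stand-ins, and the input is assumed to be sorted, non-overlapping, and within the file size.

```go
package main

import "fmt"

// sparseEntry mirrors the shape of SparseEntry from the commit message.
type sparseEntry struct {
	Offset, Length int64
}

// invert returns the complement of src within [0, size): given data
// fragments it yields hole fragments, and vice versa.
func invert(src []sparseEntry, size int64) []sparseEntry {
	var dst []sparseEntry
	var pos int64 // end of the previous fragment
	for _, s := range src {
		if s.Offset > pos {
			dst = append(dst, sparseEntry{pos, s.Offset - pos})
		}
		pos = s.Offset + s.Length
	}
	if pos < size {
		dst = append(dst, sparseEntry{pos, size - pos})
	}
	return dst
}

func main() {
	data := []sparseEntry{{Offset: 2, Length: 3}} // bytes 2..4 hold data
	holes := invert(data, 10)
	fmt.Println("holes:", holes)
	fmt.Println("data again:", invert(holes, 10))
}
```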

Some other high-level changes:
* skipUnread is deleted, where most of its logic is moved to the
Discard methods on regFileReader and sparseFileReader.
* readGNUSparsePAXHeaders was rewritten to be simpler.
* regFileReader and sparseFileReader were completely rewritten
with simpler and easier-to-understand logic.
* A bug was fixed in sparseFileReader.Read where it failed to
report an error if the logical size of the file ends before
consuming all of the underlying data.
* The tests for sparse-file support were completely rewritten.

Updates #13548

Change-Id: Ic1233ae5daf3b3f4278fe1115d34a90c4aeaf0c2
Reviewed-on: https://go-review.googlesource.com/56771
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-19 00:57:31 +00:00
Agniva De Sarker
d9606e5532 archive/tar: add reader/writer benchmarks
Following the discussion on golang.org/cl/55210,
add benchmarks for reading from and writing to tar archives.

The benchmarks are split into 3 sections: USTAR, GNU, and PAX.

Results from a run with -cpu=1 -count=10 on an amd64 machine (i5-5200U CPU @ 2.20GHz)
name           time/op
/Writer/USTAR  5.31µs ± 0%
/Writer/GNU    5.01µs ± 1%
/Writer/PAX    11.0µs ± 2%
/Reader/USTAR  3.22µs ± 1%
/Reader/GNU    3.04µs ± 1%
/Reader/PAX    7.48µs ± 1%

name           alloc/op
/Writer/USTAR  1.20kB ± 0%
/Writer/GNU    1.15kB ± 0%
/Writer/PAX    2.61kB ± 0%
/Reader/USTAR  1.38kB ± 0%
/Reader/GNU    1.35kB ± 0%
/Reader/PAX    4.91kB ± 0%

name           allocs/op
/Writer/USTAR    53.0 ± 0%
/Writer/GNU      47.0 ± 0%
/Writer/PAX       107 ± 0%
/Reader/USTAR    32.0 ± 0%
/Reader/GNU      30.0 ± 0%
/Reader/PAX      67.0 ± 0%

Change-Id: I58b1b85b52e58cbd566736aae4d722a3ddf2395b
Reviewed-on: https://go-review.googlesource.com/55254
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-16 20:51:52 +00:00
Joe Tsai
b9a79f32b1 archive/tar: make Writer error handling consistent
The Writer logic was not consistent about when an IO error would
persist across multiple calls on Writer's methods.

Thus, to make the error handling more consistent we always check
the persistent state of the error prior to every exported method
call, and return an error if set. Otherwise, it is the responsibility
of every exported method to persist any fatal errors that may occur.

As a simplification, we can remove the close field since that
information can be represented by simply storing ErrWriteAfterClose
in the err field.

Change-Id: I8746ca36b3739803e0373253450db69b3bd12f38
Reviewed-on: https://go-review.googlesource.com/55590
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-16 01:07:12 +00:00
Joe Tsai
5c20ffbb2f archive/tar: add support for long binary strings in GNU format
The GNU tar format defines the following type flags:
	TypeGNULongName = 'L' // Next file has a long name
	TypeGNULongLink = 'K' // Next file symlinks to a file w/ a long name

Anytime a string exceeds the field dedicated to store it, the GNU format
permits a fake "file" to be prepended where that file entry has a Typeflag
of 'L' or 'K' and the contents of the file is a NUL-terminated string.

Contrary to previous TODO comments,
the GNU format supports arbitrary strings (without NUL) rather than UTF-8 strings.
The manual says the following:
<<<
The name, linkname, magic, uname, and gname are
null-terminated character strings
>>>
<<<
All characters in header blocks are represented
by using 8-bit characters in the local variant of ASCII.
>>>

From this description, we gather the following:
* We must forbid NULs in any GNU strings
* Any 8-bit value (other than NUL) is permitted

Since the modern world has moved to UTF-8, it is really difficult to
determine what a "local variant of ASCII" means. For this reason,
we treat strings as just an arbitrary binary string (without NUL)
and leave it to the user to determine the encoding of this string.
(Practically, it seems that UTF-8 is the typical encoding used
in GNU archives seen in the wild).

The implementation of GNU tar seems to confirm this interpretation
of the manual where it permits any arbitrary binary string to exist
within these fields so long as they do not contain the NUL character.

 $ touch `echo -e "not\x80\x81\x82\x83utf8"`
 $ gnutar -H gnu --tar -cvf gnu-not-utf8.tar $(echo -e "not\x80\x81\x82\x83utf8")

The fact that we permit arbitrary binary in GNU strings goes
hand-in-hand with the fact that GNU also permits a "base-256" encoding
of numeric fields, which is effectively two's-complement binary.

Change-Id: Ic037ec6bed306d07d1312f0058594bd9b64d9880
Reviewed-on: https://go-review.googlesource.com/55573
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-16 00:39:32 +00:00
Joe Tsai
4c55774304 archive/tar: re-implement USTAR path splitting
The logic for USTAR was disabled because a previous implementation of
Writer had a wrong understanding of the differences between USTAR and GNU,
causing the prefix field to be incorrectly populated in GNU files.

Now that this issue has been fixed, we can re-enable the logic for USTAR
path splitting, which allows Writer to use the USTAR format for a wider range
of possible inputs.
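
For readers unfamiliar with the technique, a minimal sketch of USTAR path
splitting follows. It assumes the standard USTAR field widths (100-byte name,
155-byte prefix); the helper splitPath is hypothetical and simpler than
whatever the package does (for example, it does not check for ASCII).

```go
package main

import (
	"fmt"
	"strings"
)

const (
	nameSize   = 100 // width of the USTAR name field
	prefixSize = 155 // width of the USTAR prefix field
)

// splitPath is a hypothetical helper. It splits p at a '/' so that the part
// before the slash fits in the prefix field and the part after it fits in
// the name field. The slash itself is not stored.
func splitPath(p string) (prefix, name string, ok bool) {
	if len(p) <= nameSize {
		return "", p, true // fits in the name field alone
	}
	// Only a slash within the first prefixSize+1 bytes can end the prefix.
	end := len(p)
	if end > prefixSize+1 {
		end = prefixSize + 1
	} else if p[end-1] == '/' {
		end-- // do not split on a trailing slash of a directory name
	}
	i := strings.LastIndex(p[:end], "/")
	if i <= 0 {
		return "", "", false // no usable slash
	}
	if rest := len(p) - i - 1; rest == 0 || rest > nameSize {
		return "", "", false // remainder does not fit in the name field
	}
	return p[:i], p[i+1:], true
}

func main() {
	long := strings.Repeat("a", 90) + "/" + strings.Repeat("b", 90)
	prefix, name, ok := splitPath(long)
	fmt.Println(len(prefix), len(name), ok)
}
```

Choosing the last usable slash keeps the name part as short as possible, so if
that split fails, no earlier slash would succeed either.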

Updates #9683
Updates #12594
Updates #17630

Change-Id: I9fe34e5df63f99c6dd56fee3a7e7e4d6ec3995c9
Reviewed-on: https://go-review.googlesource.com/55574
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-15 05:40:22 +00:00
Joe Tsai
a0237c527b archive/tar: centralize errors in common.go
Move all sentinel errors to common.go since some of them are
returned by both the reader and writer and remove errInvalidHeader
since it is not used.

Also, consistently use the "tar: " prefix for errors.

Change-Id: I0afb185bbf3db80dfd9595321603924454a4c2f9
Reviewed-on: https://go-review.googlesource.com/55650
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-15 05:09:54 +00:00
Joe Tsai
9223adcc2c archive/tar: add support for atime and ctime to Writer
Both the GNU and PAX formats support atime and ctime fields.
The implementation is trivial now that we have:
* support for formatting PAX records for timestamps
* dedicated methods that only handle one format (e.g., GNU)

Fixes #17876

Change-Id: I0c604fce14a47d722098afc966399cca2037395d
Reviewed-on: https://go-review.googlesource.com/55570
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-15 03:07:59 +00:00
Joe Tsai
1da0e7e28e archive/tar: reject bad key-value pairs for PAX records
We forbid empty keys or keys with '=' because it leads to ambiguous parsing.
Relevant PAX specification:
<<<
A keyword shall not include an <equals-sign>.
>>>

Also, we forbid the writer from encoding records with an empty value.
While this is a valid record syntactically, the semantics of an empty
value is that previous records with that key should be deleted.
Since we have no support (and probably never will) for global PAX records,
deletion is a non-sensible operation.
<<<
If the <value> field is zero length,
it shall delete any header block field,
previously entered extended header value,
or global extended header value of the same name.
>>>
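
The rules above could be checked along these lines. This is only a sketch;
the helper writableRecord is a hypothetical name, not a function in the
package, and it covers just the three cases described in this message.

```go
package main

import (
	"fmt"
	"strings"
)

// writableRecord is a hypothetical helper that reports whether a PAX
// key-value pair may be encoded by a writer under the rules described above.
func writableRecord(key, value string) bool {
	if key == "" || strings.Contains(key, "=") {
		return false // empty keys and keys with '=' parse ambiguously
	}
	if value == "" {
		return false // an empty value means "delete", which is not supported
	}
	return true
}

func main() {
	fmt.Println(writableRecord("path", "/etc/hosts"))
	fmt.Println(writableRecord("k=v", "x"))
	fmt.Println(writableRecord("path", ""))
}
```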

Fixes #20698
Fixes #15567

Change-Id: Ia29c5c6ef2e36cd9e6d7f6cff10e92b96a62f0d1
Reviewed-on: https://go-review.googlesource.com/55571
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-15 02:29:29 +00:00
Joe Tsai
2bcc24e977 archive/tar: support PAX subsecond resolution times
Add support for PAX subsecond resolution times. Since the parser
supports negative timestamps, the formatter also handles negative
timestamps.

The relevant PAX specification is:
<<<
Portable file timestamps cannot be negative. If pax encounters a
file with a negative timestamp in copy or write mode, it can reject
the file, substitute a non-negative timestamp, or generate a
non-portable timestamp with a leading '-'.
>>>

<<<
All of these time records shall be formatted as a decimal
representation of the time in seconds since the Epoch.
If a <period> ( '.' ) decimal point character is present,
the digits to the right of the point shall represent the units of
a subsecond timing granularity, where the first digit is tenths of
a second and each subsequent digit is a tenth of the previous digit.
>>>
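
A sketch of how such a timestamp might be formatted, including the
non-portable leading '-' for negative times, is given below. The function
formatTime is a hypothetical illustration rather than the code in this CL;
it takes the values returned by time.Time's Unix and Nanosecond methods.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

const nanosPerSec = 1000000000

// formatTime is a hypothetical helper. secs and nsecs are expected to be the
// results of time.Time's Unix and Nanosecond methods, so nsecs is always in
// the range [0, nanosPerSec) and counts forward from secs.
func formatTime(secs int64, nsecs int) string {
	if nsecs == 0 {
		return strconv.FormatInt(secs, 10)
	}
	sign := ""
	if secs < 0 {
		// For example, -1.75s arrives as secs=-2, nsecs=250000000.
		// Convert to a magnitude: 1 second and 750000000 nanoseconds.
		sign = "-"
		secs = -(secs + 1)
		nsecs = nanosPerSec - nsecs
	}
	s := fmt.Sprintf("%s%d.%09d", sign, secs, nsecs)
	return strings.TrimRight(s, "0") // drop trailing zeros of the fraction
}

func main() {
	ts := time.Unix(1350244992, 23960108)
	fmt.Println(formatTime(ts.Unix(), ts.Nanosecond()))
}
```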

Fixes #11171

Change-Id: Ied108f3d2654390bc1b0ddd66a4081c2b83e490b
Reviewed-on: https://go-review.googlesource.com/55552
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-15 02:20:22 +00:00
Joe Tsai
e098e5142d archive/tar: properly handle header-only "files" in Writer
Certain special type-flags, specifically 1, 2, 3, 4, 5, 6,
do not have a data section. Thus, regardless of what the size field
says, we should not attempt to write any data for these special types.

The relevant PAX and USTAR specification says:
<<<
If the typeflag field is set to specify a file to be of type 1 (a link)
or 2 (a symbolic link), the size field shall be specified as zero.
If the typeflag field is set to specify a file of type 5 (directory),
the size field shall be interpreted as described under the definition
of that record type. No data logical records are stored for types 1, 2, or 5.
If the typeflag field is set to 3 (character special file),
4 (block special file), or 6 (FIFO), the meaning of the size field is
unspecified by this volume of POSIX.1-2008, and no data logical records shall
be stored on the medium.
Additionally, for type 6, the size field shall be ignored when reading.
If the typeflag field is set to any other value, the number of logical
records written following the header shall be (size+511)/512, ignoring
any fraction in the result of the division.
>>>
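
The rule quoted above can be sketched as follows. The helper names
isHeaderOnly and dataBlocks are hypothetical, chosen for illustration only.

```go
package main

import "fmt"

// isHeaderOnly reports whether the typeflag is one of '1' through '6',
// the types for which no data records follow the header.
func isHeaderOnly(typeflag byte) bool {
	return typeflag >= '1' && typeflag <= '6'
}

// dataBlocks returns how many 512-byte records follow the header,
// using the (size+511)/512 formula from the specification text.
func dataBlocks(typeflag byte, size int64) int64 {
	if isHeaderOnly(typeflag) || size <= 0 {
		return 0 // ignore the size field for header-only types
	}
	return (size + 511) / 512
}

func main() {
	fmt.Println(dataBlocks('0', 513))  // regular file
	fmt.Println(dataBlocks('5', 4096)) // directory
}
```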

Fixes #15565

Change-Id: Id11886b723b3b13deb15221dca51c25cd778a6b5
Reviewed-on: https://go-review.googlesource.com/55553
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-15 00:54:57 +00:00
Joe Tsai
17fa5a7c9f archive/tar: roundtrip reading device numbers
Both GNU and BSD tar do not care if the devmajor and devminor values are
set on entries (like regular files) that aren't character or block devices.

While this is non-sensible, it is more consistent with the Writer to actually
read these fields always. In a vast majority of the cases these will still
be zero. In the rare situation where someone actually cares about these,
at least information was not silently lost.

Change-Id: I6e4ba01cd897a1b13c28b1837e102a4fdeb420ba
Reviewed-on: https://go-review.googlesource.com/55572
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-15 00:54:37 +00:00
Joe Tsai
694875cbf2 archive/tar: remove writeHeader and writePAXHeaderLegacy
Previous CLs (CL/54970, CL/55231, and CL/55237) re-implemented tar.Writer
entirely using specialized methods (writeUSTARHeader, writePAXHeader,
and writeGNUHeader) allowing tar.Writer to entirely side-step the broken
and buggy logic in writeHeader.

Since writeHeader and writePAXHeaderLegacy are now dead code,
we can delete them.

One minor change is that we call Writer.Flush at the start of WriteHeader.
This used to be performed by writeHeader, but doing so in WriteHeader
ensures each of the specialized methods can benefit from its effect.

Fixes #17665
Fixes #12594

Change-Id: Iff2ef8e7310d40ac5484d2f8852fc5df25201426
Reviewed-on: https://go-review.googlesource.com/55550
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-14 23:29:25 +00:00
Joe Tsai
ffd9810e59 archive/tar: implement specialized logic for GNU format
Rather than going through writeHeader, which attempts to handle all formats,
implement writeGNUHeader, which only has an understanding of the GNU format.

Currently, the implementation is nearly identical to writeUSTARHeader, except:
* formatNumeric is used instead of formatOctal
* the GNU magic value is used

This is kept as a separate method since it makes more logical sense
when we add support for sparse files, long filenames, and atime/ctime fields,
which do not affect USTAR.

Updates #12594

Change-Id: I76efc0b39dc649efc22646dfc9867a7c165f34a8
Reviewed-on: https://go-review.googlesource.com/55237
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2017-08-14 21:50:56 +00:00
Joe Tsai
01385b1bb6 archive/tar: adjust bytediff to print full context
Since test files don't exceed 10KiB, print the full context of the diff,
including bytes that are equal.
Also, fix the labels for got and want; they were backwards before.

Change-Id: Ibac022e5f988d26812c3f75b643cae8b95603fc9
Reviewed-on: https://go-review.googlesource.com/55151
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2017-08-14 06:27:44 +00:00
Joe Tsai
7ae9561610 archive/tar: implement specialized logic for PAX format
Rather than going through writeHeader, which attempts to handle all formats,
implement writePAXHeader, which only has an understanding of the PAX format.

In PAX, the USTAR header is filled out in a best-effort manner.
Thus, we change logic of formatString and formatOctal to try their best to
output something (possibly truncated) in the event of an error.

The new implementation of PAX headers causes several tests to fail.
An investigation into the new output reveals that the new behavior is correct,
while the tests had actually locked in incorrect behavior before.

A dump of the differences is listed below (-before, +after):

<< writer-big.tar >>

This change is due to the fact that we changed the Header.Devminor to force the
tar.Writer to choose the GNU format over the PAX one.
The ability to control the output is an open issue (see #18710).
- 00000150  00 30 30 30 30 30 30 30  00 00 00 00 00 00 00 00  |.0000000........|
+ 00000150  00 ff ff ff ff ff ff ff  ff 00 00 00 00 00 00 00  |................|

<< writer-big-long.tar >>

The previous logic generated the GNU magic values for a PAX file.
The new logic correctly uses the USTAR magic values.
- 00000100  00 75 73 74 61 72 20 20  00 00 00 00 00 00 00 00  |.ustar  ........|
- 00000500  00 75 73 74 61 72 20 20  00 67 75 69 6c 6c 61 75  |.ustar  .guillau|
+ 00000100  00 75 73 74 61 72 00 30  30 00 00 00 00 00 00 00  |.ustar.00.......|
+ 00000500  00 75 73 74 61 72 00 30  30 67 75 69 6c 6c 61 75  |.ustar.00guillau|

The previous logic tried to use the specified timestamp in the PAX headers file,
but this is problematic as this timestamp can overflow, defeating the point
of using PAX, which is intended to extend tar.
The new logic uses the zero timestamp similar to what GNU and BSD tar do.
- 00000080  30 30 30 30 32 33 32 00  31 32 33 33 32 37 37 30  |0000232.12332770|
+ 00000080  30 30 30 30 32 35 36 00  30 30 30 30 30 30 30 30  |0000256.00000000|

The previous logic populated the devminor and devmajor fields.
The new logic leaves them zeroed just like what GNU and BSD tar do.
- 00000140  00 00 00 00 00 00 00 00  00 30 30 30 30 30 30 30  |.........0000000|
- 00000150  00 30 30 30 30 30 30 30  00 00 00 00 00 00 00 00  |.0000000........|
+ 00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
+ 00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The previous logic uses PAX headers, but fails to add a record for the size.
The new logic does properly add a record for the size.
- 00000290  31 36 67 69 67 2e 74 78  74 0a 00 00 00 00 00 00  |16gig.txt.......|
- 000002a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
+ 00000290  31 36 67 69 67 2e 74 78  74 0a 32 30 20 73 69 7a  |16gig.txt.20 siz|
+ 000002a0  65 3d 31 37 31 37 39 38  36 39 31 38 34 0a 00 00  |e=17179869184...|

The previous logic encoded the size as a base-256 field,
which is only valid in GNU, but the previous PAX headers implies this should
be a PAX file. This results in a strange hybrid that is neither GNU nor PAX.
The new logic uses PAX headers to store the size.
- 00000470  37 35 30 00 30 30 30 31  37 35 30 00 80 00 00 00  |750.0001750.....|
- 00000480  00 00 00 04 00 00 00 00  31 32 33 33 32 37 37 30  |........12332770|
+ 00000470  37 35 30 00 30 30 30 31  37 35 30 00 30 30 30 30  |750.0001750.0000|
+ 00000480  30 30 30 30 30 30 30 00  31 32 33 33 32 37 37 30  |0000000.12332770|

<< ustar.issue12594.tar >>

The previous logic used the specified timestamp for the PAX headers file.
The new logic just uses the zero timestamp.
- 00000080  30 30 30 30 32 33 31 00  31 32 31 30 34 34 30 32  |0000231.12104402|
+ 00000080  30 30 30 30 32 33 31 00  30 30 30 30 30 30 30 30  |0000231.00000000|

The previous logic populated the devminor and devmajor fields.
The new logic leaves them zeroed just like what GNU and BSD tar do.
- 00000140  00 00 00 00 00 00 00 00  00 30 30 30 30 30 30 30  |.........0000000|
- 00000150  00 30 30 30 30 30 30 30  00 00 00 00 00 00 00 00  |.0000000........|
+ 00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
+ 00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Change-Id: I33419eb1124951968e9d5a10d50027e03133c811
Reviewed-on: https://go-review.googlesource.com/55231
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-14 06:26:35 +00:00
Joe Tsai
1d81251599 archive/tar: simplify toASCII and parseString
Use a simple []byte instead of bytes.Buffer to create a string.
Use bytes.IndexByte instead of our own for loop.

Change-Id: Ic4a1161d79017fd3af086a05c53d5f20a5f09326
Reviewed-on: https://go-review.googlesource.com/54752
Reviewed-by: Avelino <t@avelino.xxx>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
2017-08-13 02:32:28 +00:00
Agniva De Sarker
23cd87eb0a archive/tar: optimize formatPAXRecord() call
By replacing fmt.Sprintf with a simple string concat, we see
pretty good improvements across the board on time and memory.

name             old time/op    new time/op    delta
FormatPAXRecord     683ns ± 2%     210ns ± 5%  -69.22%  (p=0.000 n=10+10)

name             old alloc/op   new alloc/op   delta
FormatPAXRecord      112B ± 0%       32B ± 0%  -71.43%  (p=0.000 n=10+10)

name             old allocs/op  new allocs/op  delta
FormatPAXRecord      8.00 ± 0%      2.00 ± 0%  -75.00%  (p=0.000 n=10+10)

Ran with - -cpu=1 -count=10 on an AMD64 i5-5200U CPU @ 2.20GHz

Using the following benchmark:
func BenchmarkFormatPAXRecord(b *testing.B) {
  for n := 0; n < b.N; n++ {
    formatPAXRecord("foo", "bar")
  }
}
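
For context, a PAX record has the form "<length> <key>=<value>\n", where the
decimal length counts the whole record including the length digits themselves.
A concatenation-based formatter could look roughly like the sketch below; the
name paxRecord is hypothetical and this is not necessarily the code in this CL.

```go
package main

import (
	"fmt"
	"strconv"
)

// paxRecord is a hypothetical helper that builds "<length> <key>=<value>\n"
// using plain string concatenation instead of fmt.Sprintf.
func paxRecord(k, v string) string {
	const padding = 3 // one byte each for ' ', '=', and '\n'
	size := len(k) + len(v) + padding
	size += len(strconv.Itoa(size)) // first guess at the total length
	record := strconv.Itoa(size) + " " + k + "=" + v + "\n"
	if len(record) != size {
		// Adding the length digits pushed the total to one more digit
		// (for example, 9 becomes 10), so recompute with the real length.
		size = len(record)
		record = strconv.Itoa(size) + " " + k + "=" + v + "\n"
	}
	return record
}

func main() {
	fmt.Printf("%q\n", paxRecord("size", "17179869184"))
}
```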

Change-Id: I828ddbafad2e5d937f0cf5f777b512638344acfc
Reviewed-on: https://go-review.googlesource.com/55210
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2017-08-12 04:52:27 +00:00
Joe Tsai
0d1a8f6e12 archive/tar: implement specialized logic for USTAR format
Rather than going through the complicated logic of writeHeader,
implement a writeUSTARHeader that only knows about the USTAR format.
This makes the logic much easier to reason about since you only
need to be concerned about USTAR and not all the subtle
differences between USTAR, PAX, and GNU.

We separate out the logic in writeUSTARHeader into templateV7Plus
and writeRawHeader since the planned implementations of
writePAXHeader and writeGNUHeader will use them.

Change-Id: Ie75a54ac998420ece82686159ae6fa39f8b128e9
Reviewed-on: https://go-review.googlesource.com/54970
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-12 01:48:06 +00:00
Joe Tsai
ead6255ce3 archive/tar: check for permissible output formats first
The current logic in writeHeader attempts to encode the Header in one
format and, if it discovers that it cannot, attempts to
switch to a different format mid-way through. This makes it very
hard to reason about what format will be used in the end and whether
it will even be a valid format.

Instead, we should verify from the start what formats are allowed
to encode the given input Header. If no formats are possible,
then we can return immediately, rejecting the Header.

For now, we continue on to the hairy logic in writeHeader, but
a future CL can split that logic up and specialize them for each
format now that we know what is possible.
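
The "decide up front" idea can be sketched with a bit set of candidate formats
that is narrowed by each field. Everything here is a simplified assumption:
the type and constant names are hypothetical, only two fields are examined,
and USTAR path splitting is ignored.

```go
package main

import "fmt"

// format is a hypothetical bit set of candidate output formats.
type format uint

const (
	fmtUSTAR format = 1 << iota
	fmtPAX
	fmtGNU
)

// allowedFormats starts with every format and removes those that cannot
// represent the given name and size. A result of 0 means the header
// should be rejected outright.
func allowedFormats(name string, size int64) format {
	allowed := fmtUSTAR | fmtPAX | fmtGNU
	for i := 0; i < len(name); i++ {
		if name[i] == 0 {
			return 0 // NUL cannot be stored in any string field
		}
	}
	if len(name) > 100 {
		allowed &^= fmtUSTAR // too long for the 100-byte name field
	}
	if size >= 1<<33 {
		allowed &^= fmtUSTAR // does not fit in 11 octal digits
	}
	return allowed
}

func main() {
	fmt.Printf("%03b\n", allowedFormats("small.txt", 42))
	fmt.Printf("%03b\n", allowedFormats("16gig.txt", 16<<30))
}
```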

Update #9683
Update #12594

Change-Id: I8406ea855dfcb8b478a03a7058ddf8b2b09d46dc
Reviewed-on: https://go-review.googlesource.com/54433
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-11 04:39:39 +00:00
Joe Tsai
310ba82828 archive/tar: ensure input fits in octal field
The prior logic would over-write the NUL-terminator if the octal value
was long enough. In order to prevent this, we add a fitsInOctal function
that does the proper check.

The relevant USTAR specification about NUL-terminator is:
<<<
Each numeric field is terminated by one or more <space> or NUL characters.
>>>
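
A check of this kind might look like the sketch below, which reserves the last
byte of an n-byte field for the terminator and allows 3 bits per remaining
octal digit. It is an illustration under those assumptions, not necessarily
the function added in this CL.

```go
package main

import "fmt"

// fitsInOctal reports whether x can be written as octal digits in an n-byte
// field while leaving the final byte free for the NUL terminator.
func fitsInOctal(n int, x int64) bool {
	if x < 0 {
		return false // octal fields have no sign
	}
	bits := uint(n-1) * 3 // each octal digit carries 3 bits
	if bits >= 63 {
		return true // every non-negative int64 fits
	}
	return x < int64(1)<<bits
}

func main() {
	fmt.Println(fitsInOctal(8, 07777777))  // largest value for 7 digits
	fmt.Println(fitsInOctal(8, 010000000)) // needs an 8th digit
}
```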

Change-Id: I6fbc6e8fe71168727eea201925d0fe08d43116ac
Reviewed-on: https://go-review.googlesource.com/54432
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-11 03:25:17 +00:00
Joe Tsai
019d8a07e1 archive/tar: forbid NUL character in string fields
USTAR and GNU strings are NUL-terminated. Thus, we should never
allow a NUL character within a string; otherwise, data is lost on a round trip.

Relevant specification text:
<<<
The fields magic, uname, and gname are character strings each terminated by a NUL character.
>>>

Technically, PAX keys and values should be UTF-8, but the observance
of invalid files in the wild causes us to be more liberal.
<<<
The <length> field, <blank>, <equals-sign>, and <newline> shown shall
be limited to the portable character set, as encoded in UTF-8.
>>>

Thus, we only reject NULs in PAX keys, and NULs for PAX values
representing the USTAR string fields (i.e., path, linkpath, uname, gname).
These are treated more strictly because they represent strings that
are typically represented as C-strings on POSIX systems.
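
To show why an embedded NUL loses data, the sketch below writes a string into
a fixed-width field and reads it back as a C-string. The helpers hasNUL and
parseString are hypothetical stand-ins for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// hasNUL reports whether s contains a NUL byte.
func hasNUL(s string) bool {
	return strings.IndexByte(s, 0) >= 0
}

// parseString reads a NUL-terminated string from a fixed-width field.
// If no NUL is present, the whole field is the string.
func parseString(b []byte) string {
	if i := bytes.IndexByte(b, 0); i >= 0 {
		return string(b[:i])
	}
	return string(b)
}

func main() {
	name := "foo\x00bar"
	field := make([]byte, 16)
	copy(field, name)
	// Everything after the embedded NUL is dropped when reading back.
	fmt.Printf("%v %q\n", hasNUL(name), parseString(field))
}
```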

Change-Id: I305b794d9d966faad852ff660bd0b3b0964e52bf
Reviewed-on: https://go-review.googlesource.com/14724
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-11 03:12:47 +00:00
Joe Tsai
c592c05745 archive/tar: expand TestPartialRead to cover sparse files
Given that sparse file logic is not trivial, there should be a test
in TestPartialRead to ensure that partial reads work.

Change-Id: I913da3e331da06dca6758a8be3f5099abba233a6
Reviewed-on: https://go-review.googlesource.com/54430
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-11 03:12:27 +00:00
Joe Tsai
e17405d754 archive/tar: simplify bytediff logic
The encoding/hex package provides a nice Dump formatter that
prints both hex and ASCII. Use that instead for better visual
debugging of binary diffs.
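
As a small illustration of hex.Dump, the sketch below prints two 16-byte
header rows side by side, one with GNU-style magic and one with USTAR-style
magic. The bytediff function here is a hypothetical simplification, not the
test helper itself.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// bytediff is a hypothetical helper that renders both inputs with hex.Dump
// so the hex bytes and their ASCII view can be compared by eye.
func bytediff(got, want []byte) string {
	return "got:\n" + hex.Dump(got) + "want:\n" + hex.Dump(want)
}

func main() {
	// One 16-byte row from a header: a NUL, the magic and version, then padding.
	gnu := append([]byte("\x00ustar  "), make([]byte, 8)...)
	ustar := append([]byte("\x00ustar\x0000"), make([]byte, 7)...)
	fmt.Print(bytediff(gnu, ustar))
}
```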

Change-Id: Iad1084e8e52d7d523595e97ae20912657cea2ab5
Reviewed-on: https://go-review.googlesource.com/14729
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-11 03:12:02 +00:00
Joe Tsai
01e45c7368 archive/tar: fallback to pre-Go1.8 behavior on certain GNU files
Prior to Go1.8, the Writer had a bug where it would output
an invalid tar file in certain rare situations because the logic
incorrectly believed that the old GNU format had a prefix field.
This is wrong and leads to an output file that mangles the
atime and ctime fields, which are often left unused.

In order to continue reading tar files created by former, buggy
versions of Go, we skeptically parse the atime and ctime fields.
If we are unable to parse them and the prefix field looks like
an ASCII string, then we fall back on the pre-Go1.8 behavior
of treating these fields as the USTAR prefix field.

Note that this will not use the fallback logic for all possible
files generated by a pre-Go1.8 toolchain. If the generated file
happened to have a prefix field that parses as valid
atime and ctime fields (e.g., when they are valid octal strings),
then it is impossible to distinguish between a valid GNU file
and an invalid pre-Go1.8 file.

Fixes #21005

Change-Id: Iebf5c67c08e0e46da6ee41a2e8b339f84030dd90
Reviewed-on: https://go-review.googlesource.com/53635
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-11 03:11:49 +00:00