1
0
mirror of https://github.com/golang/go synced 2024-11-08 03:26:11 -07:00
Commit Graph

123 Commits

Author SHA1 Message Date
Russ Cox
1b09d43067 all: update references to symbols moved from io/ioutil to io
The old ioutil references are still valid, but update our code
to reflect best practices and get used to the new locations.

Code compiled with the bootstrap toolchain
(cmd/asm, cmd/dist, cmd/compile, debug/elf)
must remain Go 1.4-compatible and is excluded.
Also excluded vendored code.

For #41190.

Change-Id: I6d86f2bf7bc37a9d904b6cee3fe0c7af6d94d5b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/263142
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
2020-10-20 18:41:18 +00:00
Russ Cox
7bb721b938 all: update references to symbols moved from os to io/fs
The old os references are still valid, but update our code
to reflect best practices and get used to the new locations.

Code compiled with the bootstrap toolchain
(cmd/asm, cmd/dist, cmd/compile, debug/elf)
must remain Go 1.4-compatible and is excluded.

For #41190.

Change-Id: I8f9526977867c10a221e2f392f78d7dec073f1bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/243907
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2020-10-20 02:32:42 +00:00
Cherry Zhang
a413908dd0 all: add GOOS=ios
Introduce GOOS=ios for iOS systems. GOOS=ios matches "darwin"
build tag, like GOOS=android matches "linux" and GOOS=illumos
matches "solaris". Only ios/arm64 is supported (ios/amd64 is
not).

GOOS=ios and GOOS=darwin remain essentially the same at this
point. They will diverge at later time, to differentiate macOS
and iOS.

Uses of GOOS=="darwin" are changed to (GOOS=="darwin" || GOOS=="ios"),
except if it clearly means macOS (e.g. GOOS=="darwin" && GOARCH=="amd64"),
it remains GOOS=="darwin".

Updates #38485.

Change-Id: I4faacdc1008f42434599efb3c3ad90763a83b67c
Reviewed-on: https://go-review.googlesource.com/c/go/+/254740
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-09-23 18:12:59 +00:00
Daniel Martí
41348081fa all: fix a number of misuses of the word "an"
After golang.org/cl/210124, I wondered if the same error had gone
unnoticed elsewhere. I quickly spotted another dozen mistakes after
reading through the output of:

	git grep '\<[Aa]n [bcdfgjklmnpqrtvwyz][a-z]'

Many results are false positives for acronyms like "an mtime", since
it's pronounced "an em-time". However, the total amount of output isn't
that large given how simple the grep pattern is.

Change-Id: Iaa2ca69e42f4587a9e3137d6c5ed758887906ca6
Reviewed-on: https://go-review.googlesource.com/c/go/+/210678
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Zach Jones <zachj1@gmail.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-12-10 16:23:10 +00:00
Clément Chigot
8e8abf368d archive/tar, syscall: add statUnix for aix/ppc64
This commit add statUnix function for aix/ppc64. It also adds Unix
and Nano methods for AIX time structure.

Change-Id: I9fd62d34a47e87cd46f2f936cb736da0bdff7959
Reviewed-on: https://go-review.googlesource.com/c/163957
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-02-27 23:40:20 +00:00
Robert Griesemer
fae44a2be3 src, misc: apply gofmt
This applies the new gofmt literal normalizations to the library.

Change-Id: I8c1e8ef62eb556fc568872c9f77a31ef236348e7
Reviewed-on: https://go-review.googlesource.com/c/162539
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-02-19 20:38:28 +00:00
Yuval Pavel Zholkover
480373c756 syscall: revert to pre-FreeBSD 10 / POSIX-2008 timespec field names in Stat_t on FreeBSD
CL 138595 introduced the new names when the hardcoded stat8 definitions was replaced
with a cgo generated one.

Fixes #29393
Updates #22448

Change-Id: I6309958306329ff301c17344b2e0ead0cc874224
Reviewed-on: https://go-review.googlesource.com/c/155958
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-12-30 19:36:52 +00:00
Yuval Pavel Zholkover
dc6eb200dd syscall: FreeBSD 12 ino64 support
This is similar to CL 136816 for x/sys/unix, changing the FreeBSD ABI to use 64-bit inodes in
Stat_t, Statfs_t, and Dirent types.

The changes are forward compatible, that is FreeBSD 10.x, 11.x continue to use their current sysnum numbers.
The affected types are converted to the new layout (with some overhead).
Thus the same statically linked binary should work using the native sysnums (without any conversion) on FreeBSD 12.

Breaking API changes in package syscall are:
Mknod takes a uint64 (C dev_t) instead of int.
Stat_t: Dev, Ino, Nlink, Rdev, Gen became uint64.
  Atimespec, Mtimespec, Ctimespec, Birthtimespec renamed to Atim, Mtim, Ctim, Birthtim respectively.

Statfs_t: Mntonname and Mntfromname changed from [88]int8 to [1024]int8 arrays.

Dirent: Fileno became uint64, Namlen uint16 and an additional field Off int64 (currently unused) was added.

The following commands were run to generate ztypes_* and zsyscall_* on FreeBSD-12.0-ALPHA6 systems (GOARCH=386 were run on the same amd64 host):
GOOS=freebsd GOARCH=amd64 ./mksyscall.pl -tags freebsd,amd64 syscall_bsd.go syscall_freebsd.go syscall_freebsd_amd64.go |gofmt >zsyscall_freebsd_amd64.go
GOOS=freebsd GOARCH=amd64 go tool cgo -godefs types_freebsd.go | GOOS=freebsd GOARCH=amd64 go run mkpost.go >ztypes_freebsd_amd64.go

GOOS=freebsd GOARCH=386 ./mksyscall.pl -l32 -tags freebsd,386 syscall_bsd.go syscall_freebsd.go syscall_freebsd_386.go |gofmt >zsyscall_freebsd_386.go
GOOS=freebsd GOARCH=386 go tool cgo -godefs types_freebsd.go | GOOS=freebsd GOARCH=386 go run mkpost.go >ztypes_freebsd_386.go

GOOS=freebsd GOARCH=arm ./mksyscall.pl -l32 -arm -tags freebsd,arm syscall_bsd.go syscall_freebsd.go syscall_freebsd_arm.go |gofmt >zsyscall_freebsd_arm.go
GOOS=freebsd GOARCH=arm go tool cgo -godefs -- -fsigned-char types_freebsd.go | GOOS=freebsd GOARCH=arm go run mkpost.go >ztypes_freebsd_arm.go

The Kevent struct was changed to use the FREEBSD_COMPAT11 version always (requiring the COMPAT_FREEBSD11 kernel option FreeBSD-12, this is the default).

The definitions of ifData were not updated, their functionality in has have been replaced by vendored golang.org/x/net/route.

freebsdVersion initialization was dropped from init() in favor of a sync.Once based wrapper - supportsABI().

Updates #22448.

Change-Id: I359b756e2849c036d7ed7f75dbd6ec836e0b90b4
Reviewed-on: https://go-review.googlesource.com/c/138595
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-05 21:38:13 +00:00
Iskander Sharipov
89d533e368 archive/tar: remore redundant parens in type expressions
Simplify `(T)` expressions to  `T` where possible.

Found using https://go-critic.github.io/overview.html#typeUnparen-ref

Change-Id: Ic5ef335e03898f9fea1ff90fd83956376657fe67
Reviewed-on: https://go-review.googlesource.com/123379
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2018-08-21 01:47:17 +00:00
Tim Cooper
161874da2a all: update comment URLs from HTTP to HTTPS, where possible
Each URL was manually verified to ensure it did not serve up incorrect
content.

Change-Id: I4dc846227af95a73ee9a3074d0c379ff0fa955df
Reviewed-on: https://go-review.googlesource.com/115798
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2018-06-01 21:52:00 +00:00
Robert Griesemer
542ea5ad91 go/printer, gofmt: tuned table alignment for better results
The go/printer (and thus gofmt) uses a heuristic to determine
whether to break alignment between elements of an expression
list which is spread across multiple lines. The heuristic only
kicked in if the entry sizes (character length) was above a
certain threshold (20) and the ratio between the previous and
current entry size was above a certain value (4).

This heuristic worked reasonably most of the time, but also
led to unfortunate breaks in many cases where a single entry
was suddenly much smaller (or larger) then the previous one.

The behavior of gofmt was sufficiently mysterious in some of
these situations that many issues were filed against it.

The simplest solution to address this problem is to remove
the heuristic altogether and have a programmer introduce
empty lines to force different alignments if it improves
readability. The problem with that approach is that the
places where it really matters, very long tables with many
(hundreds, or more) entries, may be machine-generated and
not "post-processed" by a human (e.g., unicode/utf8/tables.go).

If a single one of those entries is overlong, the result
would be that the alignment would force all comments or
values in key:value pairs to be adjusted to that overlong
value, making the table hard to read (e.g., that entry may
not even be visible on screen and all other entries seem
spaced out too wide).

Instead, we opted for a slightly improved heuristic that
behaves much better for "normal", human-written code.

1) The threshold is increased from 20 to 40. This disables
the heuristic for many common cases yet even if the alignment
is not "ideal", 40 is not that many characters per line with
todays screens, making it very likely that the entire line
remains "visible" in an editor.

2) Changed the heuristic to not simply look at the size ratio
between current and previous line, but instead considering the
geometric mean of the sizes of the previous (aligned) lines.
This emphasizes the "overall picture" of the previous lines,
rather than a single one (which might be an outlier).

3) Changed the ratio from 4 to 2.5. Now that we ignore sizes
below 40, a ratio of 4 would mean that a new entry would have
to be 4 times bigger (160) or smaller (10) before alignment
would be broken. A ratio of 2.5 seems more sensible.

Applied updated gofmt to all of src and misc. Also tested
against several former issues that complained about this
and verified that the output for the given examples is
satisfactory (added respective test cases).

Some of the files changed because they were not gofmt-ed
in the first place.

For #644.
For #7335.
For #10392.
(and probably more related issues)

Fixes #22852.

Change-Id: I5e48b3d3b157a5cf2d649833b7297b33f43a6f6e
2018-04-04 13:39:34 -07:00
Brad Fitzpatrick
48db2c01b4 all: use strings.Builder instead of bytes.Buffer where appropriate
I grepped for "bytes.Buffer" and "buf.String" and mostly ignored test
files. I skipped a few on purpose and probably missed a few others,
but otherwise I think this should be most of them.

Updates #18990

Change-Id: I5a6ae4296b87b416d8da02d7bfaf981d8cc14774
Reviewed-on: https://go-review.googlesource.com/102479
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-26 23:05:53 +00:00
Agniva De Sarker
39852bf4cc archive/tar: remove loop label from reader
CL 14624 introduced this label. At that time,
the switch-case had a break to label statement which made this necessary.
But now, the code no longer has a break statement and it directly returns.

Hence, it is no longer necessary to have a label.

Change-Id: Idde0fcc4d2db2d76424679f5acfe33ab8573bce4
Reviewed-on: https://go-review.googlesource.com/96935
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2018-02-25 21:50:17 +00:00
Ryuma Yoshida
8fc25b531b all: remove duplicate word "the"
Change-Id: Ia5908e94a6bd362099ca3c63f6ffb7e94457131d
GitHub-Last-Rev: 545a40571a
GitHub-Pull-Request: golang/go#23942
Reviewed-on: https://go-review.googlesource.com/95435
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-20 16:45:55 +00:00
Kunpei Sakai
f356e83e2e all: remove "the" duplications
Change-Id: I1f25b11fb9b7cd3c09968ed99913dc85db2025ef
Reviewed-on: https://go-review.googlesource.com/94976
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-18 17:54:20 +00:00
Caio Marcelo de Oliveira Filho
e4bde05104 archive/tar: automatically promote TypeRegA
Change Reader to promote TypeRegA to TypeReg in headers, unless their
name have a trailing slash which is already promoted to TypeDir. This
will allow client code to handle just TypeReg instead both TypeReg and
TypeRegA.

Change Writer to promote TypeRegA to TypeReg or TypeDir in the headers
depending on whether the name has a trailing slash. This normalization
is motivated by the specification (in pax(1)):

   0 represents a regular file. For backwards-compatibility, a
   typeflag value of binary zero ( '\0' ) should be recognized as
   meaning a regular file when extracting files from the
   archive. Archives written with this version of the archive file
   format create regular files with a typeflag value of the
   ISO/IEC 646:1991 standard IRV '0'.

Fixes #22768.

Change-Id: I149ec55824580d446cdde5a0d7a0457ad7b03466
Reviewed-on: https://go-review.googlesource.com/85656
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-13 18:36:49 +00:00
Joe Tsai
9ec0c7abe1 archive/tar: use placeholder name for global PAX records
Several usages of tar (reasonably) just use the Header.FileInfo
to determine the type of the header. However, the os.FileMode type
is not expressive enough to represent "files" that are not files
at all, but some form of metadata.

Thus, Header{Typeflag: TypeXGlobalHeader}.FileInfo().Mode().IsRegular()
reports true, even though the expected result may have been false.

To reduce (not eliminate) the possibility of failure for such usages,
use the placeholder filename from the global PAX headers.
Thus, in the event the user did not handle special "meta" headers
specifically, they will just be written to disk as a regular file.

As an example use case, the "git archive --format=tgz" command produces
an archive where the first "file" is a global PAX header with the
name "global_pax_header". For users that do not explicitly check
the Header.Typeflag field to ignore such headers, they may end up
extracting a file named "global_pax_header". While it is a bogus file,
it at least does not stop the extraction process.

Updates #22748

Change-Id: I28448b528dcfacb4e92311824c33c71b482f49c9
Reviewed-on: https://go-review.googlesource.com/78355
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-29 19:04:57 +00:00
Joe Tsai
ba2835db6c archive/tar: partially revert sparse file support
This CL removes the following APIs:
	type SparseEntry struct{ ... }
	type Header struct{ SparseHoles []SparseEntry; ... }
	func (*Header) DetectSparseHoles(f *os.File) error
	func (*Header) PunchSparseHoles(f *os.File) error
	func (*Reader) WriteTo(io.Writer) (int, error)
	func (*Writer) ReadFrom(io.Reader) (int, error)

This API was added during the Go1.10 dev cycle, and are safe to remove.

The rationale for reverting is because Header.DetectSparseHoles and
Header.PunchSparseHoles are functionality that probably better belongs in
the os package itself.

The other API like Header.SparseHoles, Reader.WriteTo, and Writer.ReadFrom
perform no OS specific logic and only perform the actual business logic of
reading and writing sparse archives. Since we do know know what the API added to
package os may look like, we preemptively revert these non-OS specific changes
as well by simply commenting them out.

Updates #13548
Updates #22735

Change-Id: I77842acd39a43de63e5c754bfa1c26cc24687b70
Reviewed-on: https://go-review.googlesource.com/78030
Reviewed-by: Russ Cox <rsc@golang.org>
2017-11-16 16:54:08 +00:00
Joe Tsai
d9fb9e7cf5 archive/tar: change error prefix
Change error message prefix from "tar:" to "archive/tar:" to maintain
backwards compatibility with Go1.9 and earlier in the unfortunate event
that someone is relying on string parsing of errors.

Fixes #22740

Change-Id: I59039c59818a0599e9d3b06bb5a531aa22a389b8
Reviewed-on: https://go-review.googlesource.com/77933
Reviewed-by: roger peppe <rogpeppe@gmail.com>
2017-11-15 18:56:32 +00:00
Awn
23c9db657e archive/tar: remove useless type conversions
Change-Id: I259a6ed6a1abc63d2dc39eca7e85f94cf38001cc
Reviewed-on: https://go-review.googlesource.com/47342
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-15 15:11:14 +00:00
Stanislav Afanasev
a4aa5c3181 archive/tar: a cosmetic fix after checking by golint
Existing methods regFileReader.LogicalRemaining and regFileReader.PhysicalRemaining have inconsistent reciever names with the previous name

Change-Id: Ief2024716737eaf482c4311f3fdf77d92801c36e
Reviewed-on: https://go-review.googlesource.com/76430
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
2017-11-07 20:11:28 +00:00
Joe Tsai
577aab0c59 archive/tar: ignore ChangeTime and AccessTime unless Format is specified
CL 59230 changed Writer.WriteHeader to ignore the ChangeTime and AccessTime
fields when considering using the USTAR format when the format is unspecified.
This policy is confusing and leads to unexpected behavior where some files
have ModTime only, while others have ModTime+AccessTime+ChangeTime if the
format became PAX for some unrelated reason (e.g., long pathname).

Change the policy to simply always ignore ChangeTime, AccessTime, and
sub-second time resolutions unless the user explicitly specifies a format.
This is a safe policy change since WriteHeader had no support for the
above features in any Go release.

Support for ChangeTime and AccessTime was added in CL 55570.
Support for sub-second times was added in CL 55552.
Both CLs landed after the latest Go release (i.e., Go1.9), which was
cut from the master branch around August 6th, 2017.

Change-Id: Ib82baa1bf9dd4573ed4f674b7d55d15f733a4843
Reviewed-on: https://go-review.googlesource.com/69296
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-10 20:13:27 +00:00
Joe Tsai
4cd58c2f26 archive/tar: improve handling of directory paths
The USTAR format says:
<<<
Implementors should be aware that the previous file format did not include
a mechanism to archive directory type files.
For this reason, the convention of using a filename ending with
<slash> was adopted to specify a directory on the archive.
>>>

In light of this suggestion, make the following changes:
* Writer.WriteHeader refuses to encode a header where a file that
is obviously a file-type has a trailing slash in the name.
* formatter.formatString avoids encoding a trailing slash in the event
that the string is truncated (the full string will be encoded elsewhere,
so stripping the slash is safe).
* Reader.Next treats a TypeRegA (which is the zero value of Typeflag)
as a TypeDir if the name has a trailing slash.

Change-Id: Ibf27aa8234cce2032d92e5e5b28546c2f2ae5ef6
Reviewed-on: https://go-review.googlesource.com/69293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-10 20:11:26 +00:00
Marvin Stenger
d153df8e4b all: revert "all: prefer strings.LastIndexByte over strings.LastIndex"
This reverts https://golang.org/cl/66372.

Updates #22148

Change-Id: I3e94af3dfc11a2883bf28e1d5e1f32f98760b3ee
Reviewed-on: https://go-review.googlesource.com/68431
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-05 23:19:42 +00:00
Joe Tsai
e04ff3d133 archive/tar: fix typo in documentation
s/TypeSymLink/TypeSymlink/g

Change-Id: I2550843248eb27d90684d0036fe2add0b247ae5a
Reviewed-on: https://go-review.googlesource.com/67810
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-03 16:24:53 +00:00
Marvin Stenger
d2826d3e06 all: prefer strings.LastIndexByte over strings.LastIndex
strings.LastIndexByte was introduced in go1.5 and it can be used
effectively wherever the second argument to strings.LastIndex is
exactly one byte long.

This avoids generating unnecessary string symbols and saves
a few calls to strings.LastIndex.

Change-Id: I7b5679d616197b055cffe6882a8675d24a98b574
Reviewed-on: https://go-review.googlesource.com/66372
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-27 00:54:24 +00:00
Joe Tsai
7246585f8c archive/tar: avoid empty IO operations
The interfaces for io.Reader and io.Writer permit calling Read/Write
with an empty buffer. However, this condition is often not well tested
and can lead to bugs in various implementations of io.Reader and io.Writer.
For example, see #22028 for buggy io.Reader in the bzip2 package.

We reduce the likelihood of hitting these bugs by adjusting
regFileReader.Read and regFileWriter.Write to avoid performing
Read and Write calls when the buffer is known to be empty.

Fixes #22029

Change-Id: Ie4a26be53cf87bc4d2abd951fa005db5871cc75c
Reviewed-on: https://go-review.googlesource.com/66111
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-25 23:06:04 +00:00
Giovanni Bajo
2f8b555de2 archive/tar: fix sparse files support on Darwin
Apple defined the SEEK_HOLE/SEEK_DATA constants in unistd.h
with swapped values, compared to all other UNIX systems.

Fixes #21970

Change-Id: I84a33e0741f0f33a2e04898e96b788b87aa9890f
Reviewed-on: https://go-review.googlesource.com/65570
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-23 16:53:28 +00:00
David du Colombier
d83b23fd4f archive/tar: skip TestSparseFiles on Plan 9
CL 60871 added TestSparseFiles. This test is succeeding
on Plan 9 when executed on the ramfs file system, but
is failing when executed on the Fossil file system.

This may be due to an issue in the handling of sparse
files in the Fossil file system on Plan 9 that should
be investigated.

Updates #21977.

Change-Id: I177afff519b862a5c548e094203c219504852006
Reviewed-on: https://go-review.googlesource.com/65352
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-22 13:50:50 +00:00
Joe Tsai
718d9de60f archive/tar: perform test for hole-detection on specific builders
The test for hole-detection is heavily dependent on whether the
OS and underlying FS provides support for it.
Even on Linux, which has support for SEEK_HOLE and SEEK_DATA,
the underlying filesystem may not have support for it.
In order to avoid an ever-changing game of whack-a-mole,
we whitelist the specific builders that we expect the test to pass on.

Updates #21964

Change-Id: I7334e8532c96cc346ea83aabbb81b719685ad7e5
Reviewed-on: https://go-review.googlesource.com/65270
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-21 20:42:11 +00:00
Joe Tsai
fdecab6ef0 archive/tar: make check for hole detection support more liberal
On most Unix OSes, lseek reports EINVAL when lacking SEEK_HOLE support.
However, there are reports that ENOTTY is reported instead.
Rather than tracking down every possible errno that may be used to
represent "not supported", just treat any non-nil error as meaning
that there is no support. This is the same strategy taken by the
GNU and BSD tar tools.

Fixes #21958

Change-Id: Iae68afdc934042f52fa914fca45f0ca89220c383
Reviewed-on: https://go-review.googlesource.com/65191
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-21 17:49:35 +00:00
Joe Tsai
1eacf78858 archive/tar: add Header.DetectSparseHoles and Header.PunchSparseHoles
To support the detection and creation of sparse files,
add two new methods:
	func Header.DetectSparseHoles(*os.File) error
	func Header.PunchSparseHoles(*os.File) error

DetectSparseHoles is intended to be used after FileInfoHeader
prior to serializing the Header with WriteHeader.
For each OS, it uses specialized logic to detect
the location of sparse holes. On most Unix systems, it uses
SEEK_HOLE and SEEK_DATA to query for the holes.
On Windows, it uses a specialized the FSCTL_QUERY_ALLOCATED_RANGES
syscall to query for all the holes.

PunchSparseHoles is intended to be used after Reader.Next
prior to populating the file with Reader.WriteTo.
On Windows, this uses the FSCTL_SET_ZERO_DATA syscall.
On other operating systems it simply truncates the file
to the end-offset of SparseHoles.

DetectSparseHoles and PunchSparseHoles are added as methods on
Header because they are heavily tied to the operating system,
for which there is already an existing precedence for
(since FileInfoHeader makes uses of OS-specific details).

Fixes #13548

Change-Id: I98a321dd1ce0165f3d143d4edadfda5e7db67746
Reviewed-on: https://go-review.googlesource.com/60871
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-20 22:12:38 +00:00
Joe Tsai
57c79febda archive/tar: add Reader.WriteTo and Writer.ReadFrom
To support the efficient packing and extracting of sparse files,
add two new methods:
	func Reader.WriteTo(io.Writer) (int64, error)
	func Writer.ReadFrom(io.Reader) (int64, error)

If the current archive entry is sparse and the provided io.{Reader,Writer}
is also an io.Seeker, then use Seek to skip past the holes.
If the last region in a file entry is a hole, then we seek to 1 byte
before the EOF:
	* for Reader.WriteTo to write a single byte
	to ensure that the resulting filesize is correct.
	* for Writer.ReadFrom to read a single byte
	to verify that the input filesize is correct.

The downside of this approach is when the last region in the sparse file
is a hole. In the case of Reader.WriteTo, the 1-byte write will cause
the last fragment to have a single chunk allocated.
However, the goal of ReadFrom/WriteTo is *not* the ability to
exactly reproduce sparse files (in terms of the location of sparse holes),
but rather to provide an efficient way to create them.

File systems already impose their own restrictions on how the sparse file
will be created. Some filesystems (e.g., HFS+) don't support sparseness and
seeking forward simply causes the FS to write zeros. Other filesystems
have different chunk sizes, which will cause chunk allocations at boundaries
different from what was in the original sparse file. In either case,
it should not be a normal expectation of users that the location of holes
in sparse files exactly matches the source.

For users that really desire to have exact reproduction of sparse holes,
they can wrap os.File with their own io.WriteSeeker that discards the
final 1-byte write and uses File.Truncate to resize the file to the
correct size.

Other reasons we choose this approach over special-casing *os.File because:
	* The Reader already has special-case logic for io.Seeker
	* As much as possible, we want to decouple OS-specific logic from
	Reader and Writer.
	* This allows other abstractions over *os.File to also benefit from
	the "skip past holes" logic.
	* It is easier to test, since it is harder to mock an *os.File.

Updates #13548

Change-Id: I0a4f293bd53d13d154a946bc4a2ade28a6646f6a
Reviewed-on: https://go-review.googlesource.com/60872
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-18 16:18:17 +00:00
Tobias Klauser
3098cf0175 archive/tar: populate Devmajor and Devminor in FileInfoHeader on *BSD
Extract device major/minor number on all the BSDs and set Devmajor and
Devminor in FileInfoHeader. Code based on the corresponding Major/Minor
implementations in golang.org/x/sys/unix.

Change-Id: Ieffa7ce0cdbe6481950de666b2f5f88407a32382
Reviewed-on: https://go-review.googlesource.com/63470
Reviewed-by: Joe Tsai <joetsai@google.com>
2017-09-13 21:02:11 +00:00
Tobias Klauser
ec359643a1 archive/tar: populate Devmajor and Devminor in FileInfoHeader on Darwin
Extract device major/minor number on Darwin and set Devmajor and
Devminor in FileInfoHeader. Code based on the Major/Minor functions for
Darwin in golang.org/x/sys/unix.

Change-Id: I51b65f607bfa2e6b177b8b66e2b246b771367b84
Reviewed-on: https://go-review.googlesource.com/60850
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-01 10:25:54 +00:00
Joe Tsai
c1679286c3 archive/tar: minor doc fixes
Use "file" consistently instead of "entry".

Change-Id: Ia81c9665d0d956adb78f7fa49de40cdb87fba000
Reviewed-on: https://go-review.googlesource.com/60150
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-30 18:01:08 +00:00
Joe Tsai
f85dc050ba archive/tar: require opt-in to PAX or GNU format for time features
Nearly every Header obtained from FileInfoHeader via the FS has
timestamps with sub-second resolution and the AccessTime
and ChangeTime fields populated. This forces the PAX format
to almost always be used, which has the following problems:
* PAX is still not as widely supported compared to USTAR
* The PAX headers will occupy at minimum 1KiB for every entry

The old behavior of tar Writer had no support for sub-second resolution
nor any support for AccessTime or ChangeTime, so had neither problem.
Instead the Writer would just truncate sub-second information and
ignore the AccessTime and ChangeTime fields.

In this CL, we preserve the behavior such that the *default* behavior
would output a USTAR header for most cases by truncating sub-second
time measurements and ignoring AccessTime and ChangeTime.
To use either of the features, users will need to explicitly specify
that the format is PAX or GNU.

The exact policy chosen is this:
* USTAR and GNU may still be chosen even if sub-second measurements
are present; they simply truncate the timestamp to the nearest second.
As before, PAX uses sub-second resolutions.
* If the Format is unspecified, then WriteHeader ignores AccessTime
and ChangeTime when using the USTAR format.

This ensures that USTAR may still be chosen for a vast majority of
file entries obtained through FileInfoHeader.

Updates #11171
Updates #17876

Change-Id: Icc5274d4245922924498fd79b8d3ae94d5717271
Reviewed-on: https://go-review.googlesource.com/59230
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-30 18:00:59 +00:00
Joe Tsai
0564e304a6 archive/tar: populate uname/gname/devmajor/devminor in FileInfoHeader
We take a best-effort approach since information for these fields
are not well supported on all platforms.

user.LookupId+user.LookupGroupId is currently 15x slower than os.Stat.
For performance reasons, we perpetually cache username and groupname
with a sync.Map. As a result, this function will not be updated whenever
the user or group names are renamed in the OS. However, this is a better
situation than before, where those fields were not populated at all.

Change-Id: I3cec8291aed7675dea89ee1cbda92bd493c8831f
Reviewed-on: https://go-review.googlesource.com/59531
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-30 00:52:31 +00:00
Joe Tsai
bad6b6fa91 archive/tar: improve package documentation
Many aspects of the package is woefully undocumented.
With the recent flurry of improvements, the package is now at feature
parity with the GNU and TAR tools. Thoroughly all of the public API
and perform some minor stylistic cleanup in some code segments.

Change-Id: Ic892fd72c587f30dfe91d1b25b88c9c8048cc389
Reviewed-on: https://go-review.googlesource.com/59210
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 23:29:55 +00:00
Joe Tsai
19a995945f archive/tar: add raw support for global PAX records
The PAX specification says the following:
<<<
'g' represents global extended header records for the following files in the archive.
The format of these extended header records shall be as described in pax Extended Header.
Each value shall affect all subsequent files that do not override that value
in their own extended header record and until another global extended header record
is reached that provides another value for the same field.
>>>

This CL adds support for parsing and composing global PAX records,
but intentionally does not provide support for automatically
persisting the global state across files.

Changes made:
* When Reader encounters a TypeXGlobalRecord header, it parses the
PAX records and returns them to the user ad-verbatim. Reader does not
store them in its state, ensuring it has no effect on future Next calls.
* When Writer receives a TypeXGlobalRecord header, it writes the
PAX records to the archive ad-verbatim. It does not store them in
its state, ensuring it has no effect on future WriteHeader calls.
* The restriction regarding empty record values is lifted since this
value is used to represent deletion in global headers.

Why provide raw support only:
* Some archives in the wild have a global header section (often empty)
and it is the user's responsibility to manually read and discard it's body.
The logic added here allows users to more easily skip over these sections.
* For users that do care about global headers, having access to the raw
records allows them to implement the functionality of global headers themselves
and manually persist the global state across files.
* We can still upgrade to a full implementation in the future.

Why we don't provide full support:
* Even though the PAX specification describes their operation in detail,
both the GNU and BSD tar tools (which are the most common implementations)
do not have a consistent interpretation of many details.
* Global headers were a controversial feature in PAX, by admission of the
specification itself:
  <<<
  The concept of a global extended header (typeflag g) was controversial.

  The typeflag g global headers should not be used with interchange media that
  could suffer partial data loss in transporting the archive.
  >>>
* Having state persist from entry-to-entry complicates the implementation
for a feature that is not widely used and not well supported.

Change-Id: I1d904cacc2623ddcaa91525a5470b7dbe226c7e8
Reviewed-on: https://go-review.googlesource.com/59190
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
2017-08-25 23:03:52 +00:00
Joe Tsai
a795ca51db archive/tar: support arbitrary PAX records
This CL adds the following new publicly visible API:
	type Header struct { ...; PAXRecords map[string]string }

The new Header.PAXRecords field is a map of all PAX extended header records.

We suggest (but do not enforce) that users use VENDOR-prefixed keys
according to the following in the PAX specification:
<<<
The standard developers have reserved keyword name space for vendor extensions.
It is suggested that the format to be used is:
	VENDOR.keyword
where VENDOR is the name of the vendor or organization in all uppercase letters.
>>>

When reading, the Header.PAXRecords is populated with all PAX records
encountered so far, including basic ones (e.g., "path", "mtime", etc).
When writing, the fields of Header will be merged into PAXRecords,
overwriting any records that may conflict.

Since PAXRecords is a more expressive feature than Xattrs and
is entirely a superset of Xattrs, we mark Xattrs as deprecated,
and steer users towards the new PAXRecords API.

The issue has a discussion about adding a Header.SetPAXRecord method
to help validate records and keep the Header fields in sync.
However, we do not include that in this CL since that helper
method can always be added in the future.

There is no support for global records.

Fixes #14472

Change-Id: If285a52749acc733476cf75a2c7ad15bc1542071
Reviewed-on: https://go-review.googlesource.com/58390
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 21:57:32 +00:00
Joe Tsai
3d62000adc archive/tar: return better WriteHeader errors
WriteHeader may fail to encode a header for any number of reasons,
which can be frustrating for the user when trying to create a tar archive.
As we validate the Header, we generate an informative error message
intended for human consumption and return that if and only if no
format can be selected.

This allows WriteHeader to return informative errors like:
    tar: cannot encode header: invalid PAX record: "linkpath = \x00hello"
    tar: cannot encode header: invalid PAX record: "SCHILY.xattr.foo=bar = baz"
    tar: cannot encode header: Format specifies GNU; and only PAX supports Xattrs
    tar: cannot encode header: Format specifies GNU; and GNU cannot encode ModTime=1969-12-31 15:59:59.0000005 -0800 PST
    tar: cannot encode header: Format specifies GNU; and GNU supports sparse files only with TypeGNUSparse
    tar: cannot encode header: Format specifies USTAR; and USTAR cannot encode ModTime=292277026596-12-04 07:30:07 -0800 PST
    tar: cannot encode header: Format specifies USTAR; and USTAR does not support sparse files
    tar: cannot encode header: Format specifies PAX; and only GNU supports TypeGNUSparse

Updates #18710

Change-Id: I82a498d6f29d02c4e73bce47b768eb578da8499c
Reviewed-on: https://go-review.googlesource.com/58310
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-25 05:21:00 +00:00
Joe Tsai
9d3d370632 archive/tar: support reporting and selecting the format
The Reader and Writer are now at feature parity,
meaning that everything that can be parsed by the Reader,
can also be composed by the Writer.

This position enables us to support selection of the format
in a backwards compatible way, since it ensures that everything
that can be read can also be round-trip written.

As such, we add the following new API:
    type Format int
            const FormatUnknown Format = 0 ...
    type Header struct { ...; Format Format }

The new Header.Format field is populated by the Reader on the
best guess on what the format is. Note that the Reader is very liberal
in what it permits, so a hybrid TAR file using aspects of multiple
formats can still be decoded, but will be reported as FormatUnknown.

Even though Reader has full support for V7 and basic support for STAR,
it will still report those formats as unknown (and the constants for
those formats are not even exported). The reasons for this is because
the Writer has no support for V7 or STAR. Leaving it as unknown allows
the Writer to choose a format usually USTAR or GNU that can encode
the equivalent Header.

When writing, the Header.allowedFormats will take the Format field
into consideration if it is a known format.

Fixes #18710

Change-Id: I00980c475d067c6969d3414e1ff0224fdd89cd49
Reviewed-on: https://go-review.googlesource.com/58230
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-24 01:35:39 +00:00
Joe Tsai
e0ab505a97 archive/tar: implement Writer support for sparse files
This CL is the second step (of two; part1 is CL/56771) for adding
sparse file support to the Writer.

There are no new identifiers exported in this CL, but this does make
use of Header.SparseHoles added in part1. If the Typeflag is set to
TypeGNUSparse or len(SparseHoles) > 0, then the Writer will emit an
sparse file, where the holes must be written by the user as zeros.

If TypeGNUSparse is set, then the output file must use the GNU format.
Otherwise, it must use the PAX format (with GNU-defined PAX keys).

A future CL may export Reader.Discard and Writer.FillZeros,
but those methods are currently unexported, and only used by the
tests for efficiency reasons.
Calling Discard or FillZeros on a hole 10GiB in size does take
time, even if it is essentially a memcopy.

Updates #13548

Change-Id: Id586d9178c227c0577f796f731ae2cbb72355601
Reviewed-on: https://go-review.googlesource.com/57212
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-23 22:38:45 +00:00
Agniva De Sarker
ea5e3bd2a1 all: fix easy-to-miss typos
Using the wonderful https://github.com/client9/misspell tool.

Change-Id: Icdbc75a5559854f4a7a61b5271bcc7e3f99a1a24
Reviewed-on: https://go-review.googlesource.com/57851
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-23 03:07:12 +00:00
Joe Tsai
3bece2fa0e archive/tar: refactor Reader support for sparse files
This CL is the first step (of two) for adding sparse file support
to the Writer. This CL only refactors the logic of sparse-file handling
in the Reader so that common logic can be easily shared by the Writer.

As a result of this CL, there are some new publicly visible API changes:
	type SparseEntry struct { Offset, Length int64 }
	type Header struct { ...; SparseHoles []SparseEntry }

A new type is defined to represent a sparse fragment and a new field
Header.SparseHoles is added to represent the sparse holes in a file.
The API intentionally represent sparse files using hole fragments,
rather than data fragments so that the zero value of SparseHoles
naturally represents a normal file (i.e., a file without any holes).
The Reader now populates SparseHoles for sparse files.

It is necessary to export the sparse hole information, otherwise it would
be impossible for the Writer to specify that it is trying to encode
a sparse file, and what it looks like.

Some unexported helper functions were added to common.go:
	func validateSparseEntries(sp []SparseEntry, size int64) bool
	func alignSparseEntries(src []SparseEntry, size int64) []SparseEntry
	func invertSparseEntries(src []SparseEntry, size int64) []SparseEntry

The validation logic that used to be in newSparseFileReader is now moved
to validateSparseEntries so that the Writer can use it in the future.
alignSparseEntries is currently unused by the Reader, but will be used
by the Writer in the future. Since TAR represents sparse files by
only recording the data fragments, we add the invertSparseEntries
function to convert a list of data fragments to a normalized list
of hole fragments (and vice-versa).

Some other high-level changes:
* skipUnread is deleted, where most of it's logic is moved to the
Discard methods on regFileReader and sparseFileReader.
* readGNUSparsePAXHeaders was rewritten to be simpler.
* regFileReader and sparseFileReader were completely rewritten
in simpler and easier to understand logic.
* A bug was fixed in sparseFileReader.Read where it failed to
report an error if the logical size of the file ends before
consuming all of the underlying data.
* The tests for sparse-file support was completely rewritten.

Updates #13548

Change-Id: Ic1233ae5daf3b3f4278fe1115d34a90c4aeaf0c2
Reviewed-on: https://go-review.googlesource.com/56771
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-19 00:57:31 +00:00
Agniva De Sarker
d9606e5532 archive/tar: add reader/writer benchmarks
According to the discussion on golang.org/cl/55210,
adding benchmarks for reading from and writing to tar archives.

Splitting the benchmarks into 3 sections of USTAR, GNU, PAX each.

Results ran with -cpu=1 -count=10 on an amd64 machine (i5-5200U CPU @ 2.20GHz)
name           time/op
/Writer/USTAR  5.31µs ± 0%
/Writer/GNU    5.01µs ± 1%
/Writer/PAX    11.0µs ± 2%
/Reader/USTAR  3.22µs ± 1%
/Reader/GNU    3.04µs ± 1%
/Reader/PAX    7.48µs ± 1%

name           alloc/op
/Writer/USTAR  1.20kB ± 0%
/Writer/GNU    1.15kB ± 0%
/Writer/PAX    2.61kB ± 0%
/Reader/USTAR  1.38kB ± 0%
/Reader/GNU    1.35kB ± 0%
/Reader/PAX    4.91kB ± 0%

name           allocs/op
/Writer/USTAR    53.0 ± 0%
/Writer/GNU      47.0 ± 0%
/Writer/PAX       107 ± 0%
/Reader/USTAR    32.0 ± 0%
/Reader/GNU      30.0 ± 0%
/Reader/PAX      67.0 ± 0%

Change-Id: I58b1b85b52e58cbd566736aae4d722a3ddf2395b
Reviewed-on: https://go-review.googlesource.com/55254
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-16 20:51:52 +00:00
Joe Tsai
b9a79f32b1 archive/tar: make Writer error handling consistent
The Writer logic was not consistent about when an IO error would
persist across multiple calls on Writer's methods.

Thus, to make the error handling more consistent we always check
the persistent state of the error prior to every exported method
call, and return an error if set. Otherwise, it is the responsibility
of every exported method to persist any fatal errors that may occur.

As a simplification, we can remove the close field since that
information can be represented by simply storing ErrWriteAfterClose
in the err field.

Change-Id: I8746ca36b3739803e0373253450db69b3bd12f38
Reviewed-on: https://go-review.googlesource.com/55590
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-16 01:07:12 +00:00
Joe Tsai
5c20ffbb2f archive/tar: add support for long binary strings in GNU format
The GNU tar format defines the following type flags:
	TypeGNULongName = 'L' // Next file has a long name
	TypeGNULongLink = 'K' // Next file symlinks to a file w/ a long name

Anytime a string exceeds the field dedicated to store it, the GNU format
permits a fake "file" to be prepended where that file entry has a Typeflag
of 'L' or 'K' and the contents of the file is a NUL-terminated string.

Contrary to previous TODO comments,
the GNU format supports arbitrary strings (without NUL) rather UTF-8 strings.
The manual says the following:
<<<
The name, linkname, magic, uname, and gname are
null-terminated character strings
>>>
<<<
All characters in header blocks are represented
by using 8-bit characters in the local variant of ASCII.
>>>

From this description, we gather the following:
* We must forbid NULs in any GNU strings
* Any 8-bit value (other than NUL) is permitted

Since the modern world has moved to UTF-8, it is really difficult to
determine what a "local variant of ASCII" means. For this reason,
we treat strings as just an arbitrary binary string (without NUL)
and leave it to the user to determine the encoding of this string.
(Practically, it seems that UTF-8 is the typical encoding used
in GNU archives seen in the wild).

The implementation of GNU tar seems to confirm this interpretation
of the manual where it permits any arbitrary binary string to exist
within these fields so long as they do not contain the NUL character.

 $ touch `echo -e "not\x80\x81\x82\x83utf8"`
 $ gnutar -H gnu --tar -cvf gnu-not-utf8.tar $(echo -e "not\x80\x81\x82\x83utf8")

The fact that we permit arbitrary binary in GNU strings goes
hand-in-hand with the fact that GNU also permits a "base-256" encoding
of numeric fields, which is effectively two-complement binary.

Change-Id: Ic037ec6bed306d07d1312f0058594bd9b64d9880
Reviewed-on: https://go-review.googlesource.com/55573
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-16 00:39:32 +00:00
Joe Tsai
4c55774304 archive/tar: re-implement USTAR path splitting
The logic for USTAR was disabled because a previous implementation of
Writer had a wrong understanding of the differences between USTAR and GNU,
causing the prefix field is incorrectly be populated in GNU files.

Now that this issue has been fixed, we can re-enable the logic for USTAR
path splitting, which allows Writer to use the USTAR for a wider range
of possible inputs.

Updates #9683
Updates #12594
Updates #17630

Change-Id: I9fe34e5df63f99c6dd56fee3a7e7e4d6ec3995c9
Reviewed-on: https://go-review.googlesource.com/55574
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-15 05:40:22 +00:00