intDataSize ignores unsigned integers, forcing reads/writes to miss the fast path.
Fixes#8956
Change-Id: Ie79b565b037db3c469aa1dc6d0a8a5a9252d5f0a
Reviewed-on: https://go-review.googlesource.com/1777
Reviewed-by: Russ Cox <rsc@golang.org>
Some applications use unpadded base64 format, omitting the trailing
'=' padding characters from the standard base64 format, either to
minimize size or (more justifiably) to avoid use of the '=' character.
Unpadded flavors are standard and documented in section 3.2 of RFC 4648.
To support these unpadded flavors, this change adds two predefined
encoding variables, RawStdEncoding and RawURLEncoding, for unpadded
encodings using the standard and URL character set, respectively.
The change also adds a function WithPadding() to customize the padding
character or disable padding in a custom Encoding.
Finally, I noticed that the existing base64 test-suite was only
exercising the StdEncoding, and not referencing URLEncoding at all.
This change adds test-suite functionality to exercise all four encodings
(the two existing ones and the two new unpadded flavors),
although it still doesn't run *every* test on all four encodings.
Naming: I used the "Raw" prefix because it's more concise than "Unpadded"
and seemed just as expressive, but I have no strong preferences here.
Another short alternative prefix would be "Min" ("minimal" encoding).
Change-Id: Ic0423e02589b39a6b2bb7d0763bd073fd244f469
Reviewed-on: https://go-review.googlesource.com/1511
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
It is unused. It was introduced in the CL that added InputOffset.
I suspect it was an editing mistake.
LGTM=bradfitz
R=bradfitz
CC=golang-codereviews
https://golang.org/cl/182580043
In theory both of these lines encode the same three fields:
a,,c
a,"",c
However, Postgres defines that when importing CSV, the unquoted
version is treated as NULL (missing), while the quoted version is
treated as a string value (empty string). If the middle field is supposed to
be an integer value, the first line can be imported (NULL is okay), but
the second line cannot (empty string is not).
Postgres's import command (COPY FROM) has an option to force
the unquoted empty to be interpreted as a string but it does not
have an option to force the quoted empty to be interpreted as a NULL.
From http://www.postgresql.org/docs/9.0/static/sql-copy.html:
The CSV format has no standard way to distinguish a NULL
value from an empty string. PostgreSQL's COPY handles this
by quoting. A NULL is output as the NULL parameter string
and is not quoted, while a non-NULL value matching the NULL
parameter string is quoted. For example, with the default
settings, a NULL is written as an unquoted empty string,
while an empty string data value is written with double
quotes (""). Reading values follows similar rules. You can
use FORCE_NOT_NULL to prevent NULL input comparisons for
specific columns.
Therefore printing the unquoted empty is more flexible for
imports into Postgres than printing the quoted empty.
In addition to making the output more useful with Postgres, not
quoting empty strings makes the output smaller and easier to read.
It also matches the behavior of Microsoft Excel and Google Drive.
Since we are here and making concessions for Postgres, handle this
case too (again quoting the Postgres docs):
Because backslash is not a special character in the CSV
format, \., the end-of-data marker, could also appear as a
data value. To avoid any misinterpretation, a \. data value
appearing as a lone entry on a line is automatically quoted
on output, and on input, if quoted, is not interpreted as
the end-of-data marker. If you are loading a file created by
another application that has a single unquoted column and
might have a value of \., you might need to quote that value
in the input file.
Fixes#7586.
LGTM=bradfitz
R=bradfitz
CC=golang-codereviews
https://golang.org/cl/164760043
As we did with encoding, provide a trivial byte reader for
faster decoding. We can also reduce some of the copying
by doing the allocation all at once using a slightly different
interface from byte buffers.
benchmark old ns/op new ns/op delta
BenchmarkEndToEndPipe 13368 12902 -3.49%
BenchmarkEndToEndByteBuffer 5969 5642 -5.48%
BenchmarkEndToEndSliceByteBuffer 479485 470798 -1.81%
BenchmarkEncodeComplex128Slice 92367 92201 -0.18%
BenchmarkEncodeFloat64Slice 39990 38960 -2.58%
BenchmarkEncodeInt32Slice 30510 27938 -8.43%
BenchmarkEncodeStringSlice 33753 33365 -1.15%
BenchmarkDecodeComplex128Slice 232278 196704 -15.32%
BenchmarkDecodeFloat64Slice 150258 128191 -14.69%
BenchmarkDecodeInt32Slice 133806 115748 -13.50%
BenchmarkDecodeStringSlice 335117 300534 -10.32%
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/154360049
Bytes buffers have more API and are a little slower. Since appending
is a key part of the path in encode, using a faster implementation
speeds things up measurably.
The couple of positive swings are likely garbage-collection related
since memory allocation looks different in the benchmark now.
I am not concerned by them.
benchmark old ns/op new ns/op delta
BenchmarkEndToEndPipe 6620 6388 -3.50%
BenchmarkEndToEndByteBuffer 3548 3600 +1.47%
BenchmarkEndToEndSliceByteBuffer 336678 367980 +9.30%
BenchmarkEncodeComplex128Slice 78199 71297 -8.83%
BenchmarkEncodeFloat64Slice 37731 32258 -14.51%
BenchmarkEncodeInt32Slice 26780 22977 -14.20%
BenchmarkEncodeStringSlice 35882 26492 -26.17%
BenchmarkDecodeComplex128Slice 194819 185126 -4.98%
BenchmarkDecodeFloat64Slice 120538 120102 -0.36%
BenchmarkDecodeInt32Slice 106442 107275 +0.78%
BenchmarkDecodeStringSlice 272902 269866 -1.11%
LGTM=ruiu
R=golang-codereviews, ruiu
CC=golang-codereviews
https://golang.org/cl/160990043
Use go generate to write better loops for decoding arrays,
just as we did for encoding. It doesn't help as much,
relatively speaking, but it's still noticeable.
benchmark old ns/op new ns/op delta
BenchmarkDecodeComplex128Slice 202348 184529 -8.81%
BenchmarkDecodeFloat64Slice 135800 120979 -10.91%
BenchmarkDecodeInt32Slice 121200 105149 -13.24%
BenchmarkDecodeStringSlice 288129 278214 -3.44%
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/154420044
We borrow a trick from the fmt package and avoid reflection
to walk the elements when possible. We could push further with
unsafe (and we may) but this is a good start.
Decode can benefit similarly; it will be done separately.
Use go generate (engen.go) to produce the helper functions
(enc_helpers.go).
benchmark old ns/op new ns/op delta
BenchmarkEndToEndPipe 6593 6482 -1.68%
BenchmarkEndToEndByteBuffer 3662 3684 +0.60%
BenchmarkEndToEndSliceByteBuffer 350306 351693 +0.40%
BenchmarkComplex128Slice 96347 80045 -16.92%
BenchmarkInt32Slice 42484 26008 -38.78%
BenchmarkFloat64Slice 51143 36265 -29.09%
BenchmarkStringSlice 53402 35077 -34.32%
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/156310043
FieldByIndex never returns an invalid Value, so the validity
test can be avoided if the field is not indirect.
BenchmarkGobEncode 12768642 12424022 -2.70%
BenchmarkGobEncode 60.11 61.78 1.03x
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/158890045
https://golang.org/cl/153770043/ tried to fix the case where a
implicitly tagged Time, that happened to have the same tag as
GENERALIZEDTIME, shouldn't be parsed as a GENERALIZEDTIME.
It did so, mistakenly, by testing whether params.tag != nil. But
explicitly tagged values also have a non-nil tag and there the inner
tag actually does encode the type of the value.
This change instead tests whether the tag class is UNIVERSAL before
assuming that the tag contains type information.
LGTM=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/152380044