The existing implementation for Equal and similar
functions in the bytes package operate on one byte at
at time. This performs poorly on ppc64/ppc64le especially
when the byte buffers are large. This change improves
those functions by loading and comparing double words where
possible. The common code has been moved to a function
that can be shared by the other functions in this
file which perform the same type of comparison.
Further optimizations are done for the case where
>= 32 bytes are being compared. The new function
memeqbody is used by memeq_varlen, Equal, and eqstring.
When running the bytes test with -test.bench=Equal
benchmark old MB/s new MB/s speedup
BenchmarkEqual1 164.83 129.49 0.79x
BenchmarkEqual6 563.51 445.47 0.79x
BenchmarkEqual9 656.15 1099.00 1.67x
BenchmarkEqual15 591.93 1024.30 1.73x
BenchmarkEqual16 613.25 1914.12 3.12x
BenchmarkEqual20 682.37 1687.04 2.47x
BenchmarkEqual32 807.96 3843.29 4.76x
BenchmarkEqual4K 1076.25 23280.51 21.63x
BenchmarkEqual4M 1079.30 13120.14 12.16x
BenchmarkEqual64M 1073.28 10876.92 10.13x
It was determined that the degradation in the smaller byte tests
were due to unfavorable code alignment of the single byte loop.
Fixes#14368
Change-Id: I0dd87382c28887c70f4fbe80877a8ba03c31d7cd
Reviewed-on: https://go-review.googlesource.com/20249
Reviewed-by: Minux Ma <minux@golang.org>
This changes how matching is done in deflate algorithm.
The major change is that we do not look for matches that are only
3 bytes in length, matches must be 4 bytes at least.
Contrary to what you would expect this actually improves the
compresion ratio, since 3 literal bytes will often be shorter
than a match after huffman encoding.
This varies a bit by source, but is most often the case when the
source is "easy" to compress.
Second of all, a "stronger" hash is used. The hash is similar to
the hashing function used by Snappy.
Overall, the speed impact is biggest on higher compression levels.
I intend to replace the "speed" compression level, which can be
seen in CL 21021.
The built-in benchmark using "digits" is slower at level 1.
I see this as an exception, since "digits" is a special type
of data, where you have low entropy (numbers 0->9), but no
significant matches. Again, CL 20021 fixes that case.
NewWriterDict is also made considerably faster, by not running data
through the entire encoder. This is not reflected by the benchmark.
Overall, the speed impact is biggest on higher compression levels.
I intend to replace the "speed" compression level.
COMPARED to tip/master:
name old time/op new time/op delta
EncodeDigitsSpeed1e4-4 401µs ± 1% 345µs ± 2% -13.95%
EncodeDigitsSpeed1e5-4 3.19ms ± 1% 4.27ms ± 3% +33.96%
EncodeDigitsSpeed1e6-4 27.7ms ± 4% 43.8ms ± 3% +58.00%
EncodeDigitsDefault1e4-4 641µs ± 0% 403µs ± 1% -37.15%
EncodeDigitsDefault1e5-4 13.8ms ± 1% 6.4ms ± 3% -53.73%
EncodeDigitsDefault1e6-4 162ms ± 1% 64ms ± 2% -60.51%
EncodeDigitsCompress1e4-4 627µs ± 1% 405µs ± 2% -35.45%
EncodeDigitsCompress1e5-4 13.9ms ± 0% 6.3ms ± 2% -54.46%
EncodeDigitsCompress1e6-4 159ms ± 1% 64ms ± 0% -59.91%
EncodeTwainSpeed1e4-4 433µs ± 4% 331µs ± 1% -23.53%
EncodeTwainSpeed1e5-4 2.82ms ± 1% 3.08ms ± 0% +9.10%
EncodeTwainSpeed1e6-4 28.1ms ± 2% 28.8ms ± 0% +2.82%
EncodeTwainDefault1e4-4 695µs ± 4% 474µs ± 1% -31.78%
EncodeTwainDefault1e5-4 11.8ms ± 0% 7.4ms ± 0% -37.31%
EncodeTwainDefault1e6-4 128ms ± 0% 75ms ± 0% -40.93%
EncodeTwainCompress1e4-4 719µs ± 3% 480µs ± 0% -33.27%
EncodeTwainCompress1e5-4 15.0ms ± 3% 8.2ms ± 2% -45.55%
EncodeTwainCompress1e6-4 170ms ± 0% 85ms ± 1% -49.99%
name old speed new speed delta
EncodeDigitsSpeed1e4-4 25.0MB/s ± 1% 29.0MB/s ± 2% +16.24%
EncodeDigitsSpeed1e5-4 31.4MB/s ± 1% 23.4MB/s ± 3% -25.34%
EncodeDigitsSpeed1e6-4 36.1MB/s ± 4% 22.8MB/s ± 3% -36.74%
EncodeDigitsDefault1e4-4 15.6MB/s ± 0% 24.8MB/s ± 1% +59.11%
EncodeDigitsDefault1e5-4 7.27MB/s ± 1% 15.72MB/s ± 3% +116.23%
EncodeDigitsDefault1e6-4 6.16MB/s ± 0% 15.60MB/s ± 2% +153.25%
EncodeDigitsCompress1e4-4 15.9MB/s ± 1% 24.7MB/s ± 2% +54.97%
EncodeDigitsCompress1e5-4 7.19MB/s ± 0% 15.78MB/s ± 2% +119.62%
EncodeDigitsCompress1e6-4 6.27MB/s ± 1% 15.65MB/s ± 0% +149.52%
EncodeTwainSpeed1e4-4 23.1MB/s ± 4% 30.2MB/s ± 1% +30.68%
EncodeTwainSpeed1e5-4 35.4MB/s ± 1% 32.5MB/s ± 0% -8.34%
EncodeTwainSpeed1e6-4 35.6MB/s ± 2% 34.7MB/s ± 0% -2.77%
EncodeTwainDefault1e4-4 14.4MB/s ± 4% 21.1MB/s ± 1% +46.48%
EncodeTwainDefault1e5-4 8.49MB/s ± 0% 13.55MB/s ± 0% +59.50%
EncodeTwainDefault1e6-4 7.83MB/s ± 0% 13.25MB/s ± 0% +69.19%
EncodeTwainCompress1e4-4 13.9MB/s ± 3% 20.8MB/s ± 0% +49.83%
EncodeTwainCompress1e5-4 6.65MB/s ± 3% 12.20MB/s ± 2% +83.51%
EncodeTwainCompress1e6-4 5.88MB/s ± 0% 11.76MB/s ± 1% +100.06%
Change-Id: I724e33c1dd3e3a6a1b0a68e094baa959352baf32
Reviewed-on: https://go-review.googlesource.com/20929
Run-TryBot: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
The exclusion of string from IsScanValue prevents driver authors from
writing their drivers in such a way that would allow users to
distinguish between strings and byte arrays returned from a database.
Such drivers are possible today, but require their authors to deviate
from the guidance provided by the standard library.
This exclusion has been in place since the birth of this package in
https://github.com/golang/go/commit/357f2cb1a385f4d1418e48856f9abe0cce,
but the fakedb implementation shipped in the same commit violates the
exclusion!
Strictly speaking this is a breaking change, but it increases the set
of permissible Scan types, and should not cause breakage in practice.
No test changes are necessary because fakedb already exercises this.
Fixes#6497.
Change-Id: I69dbd3a59d90464bcae8c852d7ec6c97bfd120f8
Reviewed-on: https://go-review.googlesource.com/19439
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This is to support https://golang.org/cl/18057, which is going to add
Windows support to this directory. Better to write the test in Go then
to have both test.bash and test.bat.
Update #13494.
Change-Id: I4af7004416309e885049ee60b9470926282f210d
Reviewed-on: https://go-review.googlesource.com/20892
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
No need to have both ops when they do the same thing.
Just declare MOVBload to zero extend and we can get rid
of MOVBQZXload. Same for W and L.
Kind of a followon cleanup for https://go-review.googlesource.com/c/19506/
Should enable an easier fix for #14920
Change-Id: I7cfac909a8ba387f433a6ae75c050740ebb34d42
Reviewed-on: https://go-review.googlesource.com/21004
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This makes the rounding bug fix in math/big for issue 14651 available
to the compiler.
- changes to cmd/compile/internal/big fully automatic via script
- added test case for issue
- updated old test case with correct test data
Fixes#14651.
Change-Id: Iea37a2cd8d3a75f8c96193748b66156a987bbe40
Reviewed-on: https://go-review.googlesource.com/20818
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Shows up occassionally, especially after p = p[:8:len(p)]
Updates #14905
Change-Id: Iab35ef2eac57817e6a10c6aaeeb84709e8021641
Reviewed-on: https://go-review.googlesource.com/21025
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The matcher is responsible for sanitizing and uniquing the
test and benchmark names and thus needs to be included before the
API can be exposed.
Matching currently uses the regexp to only match the top-level
tests/benchmarks.
Support for subtest matching is for another CL.
Change-Id: I7c8464068faef7ebc179b03a7fe3d01122cc4f0b
Reviewed-on: https://go-review.googlesource.com/18897
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Escape analysis has a hard time with tree-like
structures (see #13493 and #14858).
This is unlikely to change.
As a result, when invoking a function that accepts
a **Node parameter, we usually allocate a *Node
on the heap. This happens a whole lot.
This CL changes functions from taking a **Node
to acting more like append: It both modifies
the input and returns a replacement for it.
Because of the cascading nature of escape analysis,
in order to get the benefits, I had to modify
almost all such functions. The remaining functions
are in racewalk and the backend. I would be happy
to update them as well in a separate CL.
This CL was created by manually updating the
function signatures and the directly impacted
bits of code. The callsites were then automatically
updated using a bespoke script:
https://gist.github.com/josharian/046b1be7aceae244de39
For ease of reviewing and future understanding,
this CL is also broken down into four CLs,
mailed separately, which show the manual
and the automated changes separately.
They are CLs 20990, 20991, 20992, and 20993.
Passes toolstash -cmp.
name old time/op new time/op delta
Template 335ms ± 5% 324ms ± 5% -3.35% (p=0.000 n=23+24)
Unicode 176ms ± 9% 165ms ± 6% -6.12% (p=0.000 n=23+24)
GoTypes 1.10s ± 4% 1.07s ± 2% -2.77% (p=0.000 n=24+24)
Compiler 5.31s ± 3% 5.15s ± 3% -2.95% (p=0.000 n=24+24)
MakeBash 41.6s ± 1% 41.7s ± 2% ~ (p=0.586 n=23+23)
name old alloc/op new alloc/op delta
Template 63.3MB ± 0% 62.4MB ± 0% -1.36% (p=0.000 n=25+23)
Unicode 42.4MB ± 0% 41.6MB ± 0% -1.99% (p=0.000 n=24+25)
GoTypes 220MB ± 0% 217MB ± 0% -1.11% (p=0.000 n=25+25)
Compiler 994MB ± 0% 973MB ± 0% -2.08% (p=0.000 n=24+25)
name old allocs/op new allocs/op delta
Template 681k ± 0% 574k ± 0% -15.71% (p=0.000 n=24+25)
Unicode 518k ± 0% 413k ± 0% -20.34% (p=0.000 n=25+24)
GoTypes 2.08M ± 0% 1.78M ± 0% -14.62% (p=0.000 n=25+25)
Compiler 9.26M ± 0% 7.64M ± 0% -17.48% (p=0.000 n=25+25)
name old text-bytes new text-bytes delta
HelloSize 578k ± 0% 578k ± 0% ~ (all samples are equal)
CmdGoSize 6.46M ± 0% 6.46M ± 0% ~ (all samples are equal)
name old data-bytes new data-bytes delta
HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal)
CmdGoSize 281k ± 0% 281k ± 0% ~ (all samples are equal)
name old exe-bytes new exe-bytes delta
HelloSize 921k ± 0% 921k ± 0% ~ (all samples are equal)
CmdGoSize 9.86M ± 0% 9.86M ± 0% ~ (all samples are equal)
Change-Id: I277d95bd56d51c166ef7f560647aeaa092f3f475
Reviewed-on: https://go-review.googlesource.com/20959
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
These new methods help find the compilation unit to pass to the
LineReader method in order to find the line information for a PC.
The Ranges method also helps identify the specific function for a PC,
needed to determine the function name.
This uses the .debug.ranges section if necessary, and changes the object
file format packages to pass in the section contents if available.
Change-Id: I5ebc3d27faaf1a126ffb17a1e6027efdf64af836
Reviewed-on: https://go-review.googlesource.com/20769
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Adds a new R_PCRELDBL relocation for 2-byte aligned relative
relocations on s390x. Should be removed once #14218 is
implemented.
Change-Id: I79dd2d8e746ba8cbc26c570faccfdd691e8161e8
Reviewed-on: https://go-review.googlesource.com/20941
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Should let the s390x builder progress a little further.
Change-Id: I5eab5f384b0b039f8e246ba69ecfb24de08625d2
Reviewed-on: https://go-review.googlesource.com/20965
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Allow names to be used for subexpressions of match rules.
For example:
(OpA x:(OpB y)) -> ..use x here to refer to the OpB value..
This gets rid of the .Args[0].Args[0]... way of naming we
used to use.
While we're here, give all subexpression matches names instead
of recomputing them with .Args[i] sequences each time they
are referenced. Makes the generated rule code a bit smaller.
Change-Id: Ie42139f6f208933b75bd2ae8bd34e95419bc0e4e
Reviewed-on: https://go-review.googlesource.com/20997
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
Don't write back parts of a slicing operation if they
are unchanged from the source of the slice. For example:
x.s = x.s[0:5] // don't write back pointer or cap
x.s = x.s[:5] // don't write back pointer or cap
x.s = x.s[:5:7] // don't write back pointer
There is more to be done here, for example:
x.s = x.s[:len(x.s):7] // don't write back ptr or len
This CL can't handle that one yet.
Fixes#14855
Change-Id: Id1e1a4fa7f3076dc1a76924a7f1cd791b81909bb
Reviewed-on: https://go-review.googlesource.com/20954
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
For the following example, but there are a few more in the stdlib:
func histogram(b []byte, h *[256]int32) {
for _, t := range b {
h[t]++
}
}
Change-Id: I56615f341ae52e02ef34025588dc6d1c52122295
Reviewed-on: https://go-review.googlesource.com/20924
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Allow inlining of functions with switch statements as long as they don't
contain a break or type switch.
Fixes#13071
Change-Id: I057be351ea4584def1a744ee87eafa5df47a7f6d
Reviewed-on: https://go-review.googlesource.com/20824
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Converting a big.Float value x to a float32/64 value did not correctly
round x up to the smallest denormal float32/64 if x was smaller than the
smallest denormal float32/64, but larger than 0.5 of a smallest denormal
float32/64.
Handle this case explicitly and simplify some code in the turn.
For #14651.
Change-Id: I025e24bf8f0e671581a7de0abf7c1cd7e6403a6c
Reviewed-on: https://go-review.googlesource.com/20816
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
The inserted early bound checks cause the slice
to expand beyond the original length of the slice.
Change-Id: Ib38891605f4a9a12d3b9e2071a5f77640b083d2d
Reviewed-on: https://go-review.googlesource.com/20981
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
MOVSB is quite a bit faster for unaligned moves.
Possibly we should use MOVSB all of the time, but Intel folks
say it might be a bit faster to use MOVSQ on some processors
(but not any I have access to at the moment).
benchmark old ns/op new ns/op delta
BenchmarkMemmove4096-8 93.9 93.2 -0.75%
BenchmarkMemmoveUnalignedDst4096-8 256 151 -41.02%
BenchmarkMemmoveUnalignedSrc4096-8 175 90.5 -48.29%
Fixes#14630
Change-Id: I568e6d6590eb3615e6a699fb474020596be665ff
Reviewed-on: https://go-review.googlesource.com/20293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Left over from CL 20931.
Change-Id: I3b8dd9ef748bcbf70b5118da28135aaa1e5ba3a8
Reviewed-on: https://go-review.googlesource.com/20955
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Minux Ma <minux@golang.org>
Constant comparisons against 0 are reasonably common.
Special-case and avoid allocating a new zero value each time.
Change-Id: I6c526c8ab30ef7f0fef59110133c764b7b90ba05
Reviewed-on: https://go-review.googlesource.com/20956
Reviewed-by: Alan Donovan <adonovan@google.com>
Also give them more idiomatic Go names. Adding godocs is outside the
scope of this CL. (Besides, the method names almost all directly
parallel an underlying math/big.Int or math/big.Float method.)
CL prepared mechanically with sed (for rewriting mpint.go/mpfloat.go)
and gofmt (for rewriting call sites).
Passes toolstash -cmp.
Change-Id: Id76f4aee476ba740f48db33162463e7978c2083d
Reviewed-on: https://go-review.googlesource.com/20909
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Fixed bug that slipped probably slipped in after rebasing and
explain why it failed on nacl/netbsd/plan9, which set default
maxparallelism to 1.
Change-Id: I4d59682fb2843d138b320334189f53fcdda5b2f6
Reviewed-on: https://go-review.googlesource.com/20980
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
The Linux ABI takes arguments in a different order on s390x.
Change-Id: Ic9cfcc22a5ea3d8ef77d4dd0b915fc266ff3e5f7
Reviewed-on: https://go-review.googlesource.com/20960
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Minimum architecture of z196 required so that GCC can assemble
gcc_s390x.S in runtime/cgo.
Change-Id: I603ed2edd39f826fb8193740ece5bd11d18c3dc5
Reviewed-on: https://go-review.googlesource.com/20876
Reviewed-by: Ian Lance Taylor <iant@golang.org>
It was just a funny way of saying len(Ctxt.Allsym) by now.
Change-Id: Iff75e73c9f7ec4ba26cfef479bbd05d7dcd172f5
Reviewed-on: https://go-review.googlesource.com/20973
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Nothing cares about it.
I did this after looking at the memprof output, but it helps performance a bit:
name old s/op new s/op delta
LinkCmdGo 0.44 ± 3% 0.43 ± 3% -2.20% (p=0.000 n=94+90)
LinkJuju 3.98 ± 5% 3.94 ± 5% -1.19% (p=0.000 n=100+91)
As well as MaxRSS (i.e. what /usr/bin/time -f '%M' prints):
name old MaxRSS new MaxRSS delta
LinkCmdGo 130k ± 0% 120k ± 3% -7.79% (p=0.000 n=79+90)
LinkJuju 862k ± 6% 827k ± 8% -4.01% (p=0.000 n=100+99)
Change-Id: I6306b7b3369576a688659e2ecdb0815b4152ae96
Reviewed-on: https://go-review.googlesource.com/20972
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>