This CL instrinsifies Add64 with arm64 instruction sequence ADDS, ADCS
and ADC, and optimzes the case of carry chains.The CL also changes the
test code so that the intrinsic implementation can be tested.
Benchmarks:
name old time/op new time/op delta
Add-224 2.500000ns +- 0% 2.090000ns +- 4% -16.40% (p=0.000 n=9+10)
Add32-224 2.500000ns +- 0% 2.500000ns +- 0% ~ (all equal)
Add64-224 2.500000ns +- 0% 1.577778ns +- 2% -36.89% (p=0.000 n=10+9)
Add64multiple-224 6.000000ns +- 0% 2.000000ns +- 0% -66.67% (p=0.000 n=10+10)
Change-Id: I6ee91c9a85c16cc72ade5fd94868c579f16c7615
Reviewed-on: https://go-review.googlesource.com/c/go/+/159017
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
'SUBQ $-0x80, r' is shorter to encode than 'ADDQ $0x80, r',
and functionally equivalent. Use it instead.
Shaves off a few bytes here and there:
file before after Δ %
compile 25935856 25927664 -8192 -0.032%
nm 4251840 4247744 -4096 -0.096%
Change-Id: Ia9e02ea38cbded6a52a613b92e3a914f878d931e
Reviewed-on: https://go-review.googlesource.com/c/go/+/168344
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
A few examples (for accessing a slice of length 3):
s[-1] runtime error: index out of range [-1]
s[3] runtime error: index out of range [3] with length 3
s[-1:0] runtime error: slice bounds out of range [-1:]
s[3:0] runtime error: slice bounds out of range [3:0]
s[3:-1] runtime error: slice bounds out of range [:-1]
s[3:4] runtime error: slice bounds out of range [:4] with capacity 3
s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3
Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.
An exhaustive set of examples is in issue30116[u].out in the CL.
The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.
Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.
Fixes#30116
Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
We know that a & 31 is non-negative for all a, signed or not.
We can avoid checking that and needing to write out an
unreachable call to panicshift.
Change-Id: I32f32fb2c950d2b2b35ac5c0e99b7b2dbd47f917
Reviewed-on: https://go-review.googlesource.com/c/go/+/167499
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Two tests (load_le_byte8_uint64_inv and load_be_byte8_uint64)
pass but the generated code isn't actually correct.
The test regexp provides a false negative, as it matches the
MOVQ (SP), BP instruction in the epilogue.
Combined loads never worked for these cases - the test was added in error
as part of a batch and not noticed because of the above false match.
Normalize the amd64/386 tests to always negative match on narrower
loads and OR.
Change-Id: I256861924774d39db0e65723866c81df5ab5076f
Reviewed-on: https://go-review.googlesource.com/c/go/+/166837
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Current compiler reverses operands to work around NaN in
"less than" and "less equal than" comparisons. But if we
want to use "FCMPD/FCMPS $(0.0), Fn" to do some optimization,
the workaround way does not work. Because assembler does
not support instruction "FCMPD/FCMPS Fn, $(0.0)".
This CL sets condition flags for floating-point comparisons
to resolve this problem.
Change-Id: Ia48076a1da95da64596d6e68304018cb301ebe33
Reviewed-on: https://go-review.googlesource.com/c/go/+/164718
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This CL adds two rules to turn patterns like ((x<<8) | (x>>8)) (the type of
x is uint16, "|" can also be "+" or "^") to a REV16 instruction on arm v6+.
This optimization rule can be used for math/bits.ReverseBytes16.
Benchmarks on arm v6:
name old time/op new time/op delta
ReverseBytes-32 2.86ns ± 0% 2.86ns ± 0% ~ (all equal)
ReverseBytes16-32 2.86ns ± 0% 2.86ns ± 0% ~ (all equal)
ReverseBytes32-32 1.29ns ± 0% 1.29ns ± 0% ~ (all equal)
ReverseBytes64-32 1.43ns ± 0% 1.43ns ± 0% ~ (all equal)
Change-Id: I819e633c9a9d308f8e476fb0c82d73fb73dd019f
Reviewed-on: https://go-review.googlesource.com/c/go/+/159019
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Emit &runtime.zerobase instead of a call to newobject for
allocations of zero sized objects in walk.go.
Fixes#29446
Change-Id: I11b67981d55009726a17c2e582c12ce0c258682e
Reviewed-on: https://go-review.googlesource.com/c/155840
Run-TryBot: Iskander Sharipov <quasilyte@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
var a []int = ...
p := &a[0]
_ = *p
We don't need to nil check on the 3rd line. If the bounds check on the 2nd
line passes, we know p is non-nil.
We rely on the fact that any cap>0 slice has a non-nil pointer as its
pointer to the backing array. This is true for all safely-constructed slices,
and I don't see any reason why someone would violate this rule using unsafe.
R=go1.13
Fixes#30366
Change-Id: I3ed764fcb72cfe1fbf963d8c1a82e24e3b6dead7
Reviewed-on: https://go-review.googlesource.com/c/163740
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
If someone takes a pointer to a zero-sized stack variable, it can
be incorrectly interpreted as a pointer to the next object in the
stack frame. To avoid this, add some padding after zero-sized variables.
We only need to pad if the next variable in memory (which is the
previous variable in the order in which we allocate variables to the
stack frame) has pointers. If the next variable has no pointers, it
won't hurt to have a pointer to it.
Because we allocate all pointer-containing variables before all
non-pointer-containing variables, we should only have to pad once per
frame.
Fixes#24993
Change-Id: Ife561cdfdf964fdbf69af03ae6ba97d004e6193c
Reviewed-on: https://go-review.googlesource.com/c/155698
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This CL adds several test cases of arithmetic operations for
386/amd64/arm/arm64.
Change-Id: I362687c06249f31091458a1d8c45fc4d006b616a
Reviewed-on: https://go-review.googlesource.com/c/151897
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Keith Randall <khr@golang.org>
We want to issue loads as soon as possible, especially when they
are going to miss in the cache. Using a conditional move (CMOV) here:
i := ...
if cond {
i++
}
... = a[i]
means that we have to wait for cond to be computed before the load
is issued. Without a CMOV, if the branch is predicted correctly the
load can be issued in parallel with computing cond.
Even if the branch is predicted incorrectly, maybe the speculative
load is close to the real load, and we get a prefetch for free.
In the worst case, when the prediction is wrong and the address is
way off, we only lose by the time difference between the CMOV
latency (~2 cycles) and the mispredict restart latency (~15 cycles).
We only squash CMOVs that affect load addresses. Results of CMOVs
that are used for other things (store addresses, store values) we
use as before.
Fixes#26306
Change-Id: I82ca14b664bf05e1d45e58de8c4d9c775a127ca1
Reviewed-on: https://go-review.googlesource.com/c/145717
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Note that the intrinsic implementation panics separately for overflow and
divide by zero, which matches the behavior of the pure go implementation.
There is a modest performance improvement after intrinsic implementation.
name old time/op new time/op delta
Div-4 53.0ns ± 1% 47.0ns ± 0% -11.28% (p=0.008 n=5+5)
Div32-4 18.4ns ± 0% 18.5ns ± 1% ~ (p=0.444 n=5+5)
Div64-4 53.3ns ± 0% 47.5ns ± 4% -10.77% (p=0.008 n=5+5)
Updates #28273
Change-Id: Ic1688ecc0964acace2e91bf44ef16f5fb6b6bc82
Reviewed-on: https://go-review.googlesource.com/c/144378
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The current support_XXX variables are specific for the
amd64 and 386 platforms.
Prefix processor capability variables by architecture to have a
consistent naming scheme and avoid reuse of the existing
variables for new platforms.
This also aligns naming of runtime variables closer with internal/cpu
processor capability variable names.
Change-Id: I3eabb29a03874678851376185d3a62e73c1aff1d
Reviewed-on: https://go-review.googlesource.com/c/91435
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This change makes use of the cc versions of the AND, OR, XOR
instructions, omitting the need for a CMP instruction.
In many test programs and in the go binary, this reduces the
size of 20-30 functions by at least 1 instruction, many in
runtime.
Testcase added to test/codegen/comparisons.go
Change-Id: I6cc1ca8b80b065d7390749c625bc9784b0039adb
Reviewed-on: https://go-review.googlesource.com/c/143059
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
For moves >8,<16 bytes, do a move using non-overlapping loads/stores
if it would require no more instructions.
This helps a bit with the case when the move is from a static
constant, because then the code to materialize the value being moved
is smaller.
Change-Id: Ie47a5a7c654afeb4973142b0a9922faea13c9b54
Reviewed-on: https://go-review.googlesource.com/c/146019
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This CL fixes several typos and adds two more cases
to arithmetic test.
Change-Id: I086560162ea351e2166866e444e2317da36c1729
Reviewed-on: https://go-review.googlesource.com/c/145210
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Unlike normal load+op opcodes, the load+compare opcode does
not clobber its non-load argument. Allow the load+compare merge
to happen even if the non-load argument is used elsewhere.
Noticed when investigating issue #28417.
Change-Id: Ibc48d1f2e06ae76034c59f453815d263e8ec7288
Reviewed-on: https://go-review.googlesource.com/c/145097
Reviewed-by: Ainar Garipov <gugl.zadolbal@gmail.com>
Reviewed-by: Ben Shi <powerman1st@163.com>
prove is able to find 94 occurrences in std cmd where a divisor
can't have the value -1. The change removes
the extraneous fix-up code for these cases.
Fixes#25239
Change-Id: Ic184de971f47cc57c702eb72805b8e291c14035d
Reviewed-on: https://go-review.googlesource.com/c/130215
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This CL add 3 rules to combine byte-store to word-store on386 and
amd64.
Change-Id: Iffd9cda42f1961680c81def4edc773ad58f211b3
Reviewed-on: https://go-review.googlesource.com/c/143057
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This CL adds more combined load/store test cases for 386/amd64.
Change-Id: I0a483a6ed0212b65c5e84d67ed8c9f50c389ce2d
Reviewed-on: https://go-review.googlesource.com/c/142878
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
ARMv7's MULAF/MULSF/MULAD/MULSD are not fused,
this CL fixes the confusing test cases.
Change-Id: I35022e207e2f0d24a23a7f6f188e41ba8eee9886
Reviewed-on: https://go-review.googlesource.com/c/142439
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Akhil Indurti <aindurti@gmail.com>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
x = map[string(byteslice)] is already optimized by the compiler to avoid a
string allocation. This CL generalizes this optimization to:
x = map[T1{ ... Tn{..., string(byteslice), ...} ... }]
where T1 to Tn is a nesting of struct and array literals.
Found in a hot code path that used a struct of strings made from []byte
slices to make a map lookup.
There are no uses of the more generalized optimization in the standard library.
Passes toolstash -cmp.
MapStringConversion/32/simple 21.9ns ± 2% 21.9ns ± 3% ~ (p=0.995 n=17+20)
MapStringConversion/32/struct 28.8ns ± 3% 22.0ns ± 2% -23.80% (p=0.000 n=20+20)
MapStringConversion/32/array 28.5ns ± 2% 21.9ns ± 2% -23.14% (p=0.000 n=19+16)
MapStringConversion/64/simple 21.0ns ± 2% 21.1ns ± 3% ~ (p=0.072 n=19+18)
MapStringConversion/64/struct 72.4ns ± 3% 21.3ns ± 2% -70.53% (p=0.000 n=20+20)
MapStringConversion/64/array 72.8ns ± 1% 21.0ns ± 2% -71.13% (p=0.000 n=17+19)
name old allocs/op new allocs/op delta
MapStringConversion/32/simple 0.00 0.00 ~ (all equal)
MapStringConversion/32/struct 0.00 0.00 ~ (all equal)
MapStringConversion/32/array 0.00 0.00 ~ (all equal)
MapStringConversion/64/simple 0.00 0.00 ~ (all equal)
MapStringConversion/64/struct 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20)
MapStringConversion/64/array 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20)
Change-Id: I483b4d84d8d74b1025b62c954da9a365e79b7a3a
Reviewed-on: https://go-review.googlesource.com/c/116275
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change adds codegen tests for the intrinsification on ppc64 of
the OnesCount{64,32,16,8}, and TrailingZeros{64,32,16,8} math/bits
functions.
Change-Id: Id3364921fbd18316850e15c8c71330c906187fdb
Reviewed-on: https://go-review.googlesource.com/c/141897
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
This CL adds tests of fused multiplication-accumulation
on arm/arm64.
Change-Id: Ic85d5277c0d6acb7e1e723653372dfaf96824a39
Reviewed-on: https://go-review.googlesource.com/c/141652
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Instead of
MOVB go.string."foo"(SB), AX
do
MOVB $102, AX
When we know the global we're loading from is readonly, we can
do that read at compile time.
I've made this arch-dependent mostly because the cases where this
happens often are memory->memory moves, and those don't get
decomposed until lowering.
Did amd64/386/arm/arm64. Other architectures could follow.
Update #26498
Change-Id: I41b1dc831b2cd0a52dac9b97f4f4457888a46389
Reviewed-on: https://go-review.googlesource.com/c/141118
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Do []byte(string) conversions more efficiently when the string
is a constant. Instead of calling stringtobyteslice, allocate
just the space we need and encode the initialization directly.
[]byte("foo") rewrites to the following pseudocode:
var s [3]byte // on heap or stack, depending on whether b escapes
s = *(*[3]byte)(&"foo"[0]) // initialize s from the string
b = s[:]
which generates this assembly:
0x001d 00029 (tmp1.go:9) LEAQ type.[3]uint8(SB), AX
0x0024 00036 (tmp1.go:9) MOVQ AX, (SP)
0x0028 00040 (tmp1.go:9) CALL runtime.newobject(SB)
0x002d 00045 (tmp1.go:9) MOVQ 8(SP), AX
0x0032 00050 (tmp1.go:9) MOVBLZX go.string."foo"+2(SB), CX
0x0039 00057 (tmp1.go:9) MOVWLZX go.string."foo"(SB), DX
0x0040 00064 (tmp1.go:9) MOVW DX, (AX)
0x0043 00067 (tmp1.go:9) MOVB CL, 2(AX)
// Then the slice is b = {AX, 3, 3}
The generated code is still not optimal, as it still does load/store
from read-only memory instead of constant stores. Next CL...
Update #26498Fixes#10170
Change-Id: I4b990b19f9a308f60c8f4f148934acffefe0a5bd
Reviewed-on: https://go-review.googlesource.com/c/140698
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Teach samesafeexpr to handle arithmetic unary and binary ops.
It makes map lookup optimization possible in
m[k+1] = append(m[k+1], ...)
m[-k] = append(m[-k], ...)
... etc
Does not cover "+" for strings (concatenation).
Change-Id: Ibbb16ac3faf176958da344be1471b06d7cf33a6c
Reviewed-on: https://go-review.googlesource.com/135795
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>