Add rule to PPC64.rules to inline runtime.memmove in more cases, as is
done for other target architectures
Updated tests in codegen/copy.go to verify changes are done on
ppc64/ppc64le
Updates #41662
Change-Id: Id937ce21f9b4f4047b3e66dfa3c960128ee16a2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/352054
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
This CL add runtime.memmove inlining for AMD64 and ARM64.
According to ssa dump from testcases generic rules can't inline
memmomve properly due to one of the arguments is Phi operation. But this
Phi op will be optimized out by later optimization stages. As a result
memmove can be inlined during arch-specific rules.
The commit add new optimization rules to arch-specific rules that can
inline runtime.memmove if it possible during lowering stage.
Optimization fires 5 times in Go source-code using regabi.
Fixes#41662
Change-Id: Iaffaf4c482d068b5f0683d141863892202cc8824
Reviewed-on: https://go-review.googlesource.com/c/go/+/289151
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: David Chase <drchase@google.com>
Memmove can use AVX/prefetches/other optional instructions, so
only do it for small sizes, when call overhead dominates.
Change-Id: Ice5e93deb11462217f7fb5fc350b703109bb4090
Reviewed-on: https://go-review.googlesource.com/112517
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Move ops can be faster than memmove calls because the number of bytes
to be moved is fixed and they don't incur the overhead of a call.
This change allows memmove to be converted into a Move op when the
arguments are disjoint.
The optimization is only enabled on s390x at the moment, however
other architectures may also benefit from it in the future. The
memmove inlining rule triggers an extra 12 times when compiling the
standard library. It will most likely make more of a difference as the
disjoint function is improved over time (to recognize fresh heap
allocations for example).
Change-Id: I9af570dcfff28257b8e59e0ff584a46d8e248310
Reviewed-on: https://go-review.googlesource.com/110064
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
If both inputs are constant offsets from the same pointer then we
can evaluate NeqPtr and EqPtr at compile time. Triggers a few times
during all.bash. Removes a conditional branch in the following
code:
copy(x[1:], x[:])
This branch was recently added as an optimization in CL 94596. We
now skip the memmove if the pointers are equal. However, in the
above code we know at compile time that they are never equal.
Also, when the offset is variable, check if the offset is zero
rather than if the pointers are equal. For example:
copy(x[a:], x[:])
This would now skip the copy if a == 0, rather than if x + a == x.
Finally I've also added a rule to make IsNonNil true for pointers
to values on the stack. The nil check elimination pass will catch
these anyway, but eliminating them here might eliminate branches
earlier.
Change-Id: If72f436fef0a96ad0f4e296d3a1f8b6c3e712085
Reviewed-on: https://go-review.googlesource.com/106635
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>