If we're not using the upper bits, don't bother issuing a
sign/zero extension operation.
For arm64, after CL 520916 which fixed a correctness bug with
extensions but as a side effect leaves many unnecessary ones
still in place.
Change-Id: I5f4fe4efbf2e9f80969ab5b9a6122fb812dc2ec0
Reviewed-on: https://go-review.googlesource.com/c/go/+/521496
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
On Arm64, all 32-bit instructions will ignore the upper 32 bits and
clear them to zero for the result. No need to do an unsign extend before
a 32 bit op.
This CL removes the redundant unsign extension only for the existing
32-bit opcodes, and also omits the sign extension when the upper bit of
the result can be predicted.
Fixes#42162
Change-Id: I61e6670bfb8982572430e67a4fa61134a3ea240a
CustomizedGitHooks: yes
Reviewed-on: https://go-review.googlesource.com/c/go/+/427454
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Eric Fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Use a small python script to consolidate duplicate
ppc64/ppc64le tests into a single ppc64x codegen test.
This makes small assumption that anytime two tests with
for different arch/variant combos exists, those tests
can be combined into a single ppc64x test.
E.x:
// ppc64le: foo
// ppc64le/power9: foo
into
// ppc64x: foo
or
// ppc64: foo
// ppc64le: foo
into
// ppc64x: foo
import glob
import re
files = glob.glob("codegen/*.go")
for file in files:
with open(file) as f:
text = [l for l in f]
i = 0
while i < len(text):
first = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[i])
if first:
j = i+1
while j < len(text):
second = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[j])
if not second:
break
if (not first.group(2) or first.group(2) == second.group(2)) and first.group(3) == second.group(3):
text[i] = re.sub(" ?ppc64(le|x)?"," ppc64x",text[i])
text=text[:j] + (text[j+1:])
else:
j += 1
i+=1
with open(file, 'w') as f:
f.write("".join(text))
Change-Id: Ic6b009b54eacaadc5a23db9c5a3bf7331b595821
Reviewed-on: https://go-review.googlesource.com/c/go/+/463220
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This updates the codegen tests in noextend.go so they are not
dependent on the ABI.
Change-Id: I8433bea9dc78830c143290a7e0cf901b2397d38a
Reviewed-on: https://go-review.googlesource.com/c/go/+/353070
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This adds some improvements to the rules for PPC64 to eliminate
unnecessary zero or sign extends, and fix some rule for truncates
which were not always using the correct sign instruction.
This reduces of size of many functions by 1 or 2 instructions and
can improve performance in cases where the execution time depends
on small loops where at least 1 instruction was removed and where that
loop contributes a significant amount of the total execution time.
Included is a testcase for codegen to verify the sign/zero extend
instructions are omitted.
An example of the improvement (strings):
IndexAnyASCII/256:1-16 392ns ± 0% 369ns ± 0% -5.79% (p=0.000 n=1+10)
IndexAnyASCII/256:2-16 397ns ± 0% 376ns ± 0% -5.23% (p=0.000 n=1+9)
IndexAnyASCII/256:4-16 405ns ± 0% 384ns ± 0% -5.19% (p=1.714 n=1+6)
IndexAnyASCII/256:8-16 427ns ± 0% 403ns ± 0% -5.57% (p=0.000 n=1+10)
IndexAnyASCII/256:16-16 441ns ± 0% 418ns ± 1% -5.33% (p=0.000 n=1+10)
IndexAnyASCII/4096:1-16 5.62µs ± 0% 5.27µs ± 1% -6.31% (p=0.000 n=1+10)
IndexAnyASCII/4096:2-16 5.67µs ± 0% 5.29µs ± 0% -6.67% (p=0.222 n=1+8)
IndexAnyASCII/4096:4-16 5.66µs ± 0% 5.28µs ± 1% -6.66% (p=0.000 n=1+10)
IndexAnyASCII/4096:8-16 5.66µs ± 0% 5.31µs ± 1% -6.10% (p=0.000 n=1+10)
IndexAnyASCII/4096:16-16 5.70µs ± 0% 5.33µs ± 1% -6.43% (p=0.182 n=1+10)
Change-Id: I739a6132b505936d39001aada5a978ff2a5f0500
Reviewed-on: https://go-review.googlesource.com/129875
Reviewed-by: David Chase <drchase@google.com>