1. Some incorrect test cases are disabled.
2. Some wrong test cases are corrected.
3. Some new test cases are added.
Change-Id: Ib5d0473d55159f233ddab79f96967eaec7b08597
Reviewed-on: https://go-review.googlesource.com/113736
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
A new pass run after ssa building (before any other
optimization) identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).
Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).
Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.
Line number html conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".
The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices. This
causes differences in the 386 bootstrap, and also can
sometimes put values into an order that does a worse job
of preserving statement boundaries when values are removed.
Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go. There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.
Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)
This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.
The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.
This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.
Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Initialization of t.UInt is missing from SetTypPtrs in config.go,
preventing rules that use it from matching when they should.
This adds the initialization to allow those rules to work.
Updated test/codegen/rotate.go to test for this case, which
appears in math/bits RotateLeft32 and RotateLeft64. There had been
a testcase for this in go 1.10 but that went away when asm_test.go
was removed.
Change-Id: I82fc825ad8364df6fc36a69a1e448214d2e24ed5
Reviewed-on: https://go-review.googlesource.com/112518
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Move ops can be faster than memmove calls because the number of bytes
to be moved is fixed and they don't incur the overhead of a call.
This change allows memmove to be converted into a Move op when the
arguments are disjoint.
The optimization is only enabled on s390x at the moment, however
other architectures may also benefit from it in the future. The
memmove inlining rule triggers an extra 12 times when compiling the
standard library. It will most likely make more of a difference as the
disjoint function is improved over time (to recognize fresh heap
allocations for example).
Change-Id: I9af570dcfff28257b8e59e0ff584a46d8e248310
Reviewed-on: https://go-review.googlesource.com/110064
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
replace map clears of the form:
for k := range m {
delete(m, k)
}
(where m is map with key type that is reflexive for ==)
with a new runtime function that clears the maps backing
array with a memclr and reinitializes the hmap struct.
Map key types that for example contain floats are not
replaced by this optimization since NaN keys cannot
be deleted from maps using delete.
name old time/op new time/op delta
GoMapClear/Reflexive/1 92.2ns ± 1% 47.1ns ± 2% -48.89% (p=0.000 n=9+9)
GoMapClear/Reflexive/10 108ns ± 1% 48ns ± 2% -55.68% (p=0.000 n=10+10)
GoMapClear/Reflexive/100 303ns ± 2% 110ns ± 3% -63.56% (p=0.000 n=10+10)
GoMapClear/Reflexive/1000 3.58µs ± 3% 1.23µs ± 2% -65.49% (p=0.000 n=9+10)
GoMapClear/Reflexive/10000 28.2µs ± 3% 10.3µs ± 2% -63.55% (p=0.000 n=9+10)
GoMapClear/NonReflexive/1 121ns ± 2% 124ns ± 7% ~ (p=0.097 n=10+10)
GoMapClear/NonReflexive/10 137ns ± 2% 139ns ± 3% +1.53% (p=0.033 n=10+10)
GoMapClear/NonReflexive/100 331ns ± 3% 334ns ± 2% ~ (p=0.342 n=10+10)
GoMapClear/NonReflexive/1000 3.64µs ± 3% 3.64µs ± 2% ~ (p=0.887 n=9+10)
GoMapClear/NonReflexive/10000 28.1µs ± 2% 28.4µs ± 3% ~ (p=0.247 n=10+10)
Fixes#20138
Change-Id: I181332a8ef434a4f0d89659f492d8711db3f3213
Reviewed-on: https://go-review.googlesource.com/110055
Reviewed-by: Keith Randall <khr@golang.org>
Use conditional moves instead of subtractions with borrow to handle
saturation cases. This allows us to delete the SUBE/SUBEW ops and
associated rules from the SSA backend. Using conditional moves also
means we can detect when shift values are masked so I've added some
new rules to constant fold the relevant comparisons and masking ops.
Also use the new shiftIsBounded() function to avoid generating code
to handle saturation cases where possible.
Updates #25167 for s390x.
Change-Id: Ief9991c91267c9151ce4c5ec07642abb4dcc1c0d
Reviewed-on: https://go-review.googlesource.com/110070
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Rename memclrrange to signify that it does not handle
all types of range clears.
Simplify checks to detect the range clear idiom for
arrays and slices.
Add tests to verify the optimization for the slice
range clear idiom is being applied by the compiler.
Change-Id: I5c3b7c9a479699ebdb4c407fde692f30f377860c
Reviewed-on: https://go-review.googlesource.com/110477
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
mips64 softfloat support is based on mips implementation and introduces
new enviroment variable GOMIPS64.
GOMIPS64 is a GOARCH=mips64{,le} specific option, for a choice between
hard-float and soft-float. Valid values are 'hardfloat' (default) and
'softfloat'. It is passed to the assembler as
'GOMIPS64_{hardfloat,softfloat}'.
Change-Id: I7f73078627f7cb37c588a38fb5c997fe09c56134
Reviewed-on: https://go-review.googlesource.com/108475
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
On amd64, Ctz must include special handling of zeros.
But the prove pass has enough information to detect whether the input
is non-zero, allowing a more efficient lowering.
Introduce new CtzNonZero ops to capture and use this information.
Benchmark code:
func BenchmarkVisitBits(b *testing.B) {
b.Run("8", func(b *testing.B) {
for i := 0; i < b.N; i++ {
x := uint8(0xff)
for x != 0 {
sink = bits.TrailingZeros8(x)
x &= x - 1
}
}
})
// and similarly so for 16, 32, 64
}
name old time/op new time/op delta
VisitBits/8-8 7.27ns ± 4% 5.58ns ± 4% -23.35% (p=0.000 n=28+26)
VisitBits/16-8 14.7ns ± 7% 10.5ns ± 4% -28.43% (p=0.000 n=30+28)
VisitBits/32-8 27.6ns ± 8% 19.3ns ± 3% -30.14% (p=0.000 n=30+26)
VisitBits/64-8 44.0ns ±11% 38.0ns ± 5% -13.48% (p=0.000 n=30+30)
Fixes#25077
Change-Id: Ie6e5bd86baf39ee8a4ca7cadcf56d934e047f957
Reviewed-on: https://go-review.googlesource.com/109358
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This change implements math.Round as an intrinsic on ppc64x so it can be
done using a single instruction.
benchmark old ns/op new ns/op delta
BenchmarkRound-16 2.60 0.69 -73.46%
Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8
Reviewed-on: https://go-review.googlesource.com/109395
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
The previous change sped up the pure computation form of LeadingZeros8.
This places it somewhat close to the table lookup form.
Depending on something that varies from toolchain to toolchain
(alignment, perhaps?), the slowdown from ditching the table lookup
is either 20% or 5%.
This benchmark is the best case scenario for the table lookup:
It is in the L1 cache already.
I think we're close enough that we can switch to the computational version,
and trust that the memory effects and binary size savings will be worth it.
Code:
func f8(x uint8) { z = bits.LeadingZeros8(x) }
Before:
"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX
0x0005 00005 (x.go:7) MOVBLZX AL, AX
0x0008 00008 (x.go:7) LEAQ math/bits.len8tab(SB), CX
0x000f 00015 (x.go:7) MOVBLZX (CX)(AX*1), AX
0x0013 00019 (x.go:7) ADDQ $-8, AX
0x0017 00023 (x.go:7) NEGQ AX
0x001a 00026 (x.go:7) MOVQ AX, "".z(SB)
0x0021 00033 (x.go:7) RET
After:
"".f8 STEXT nosplit size=30 args=0x8 locals=0x0
0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8
0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX
0x0005 00005 (x.go:7) MOVBLZX AL, AX
0x0008 00008 (x.go:7) LEAL 1(AX)(AX*1), AX
0x000c 00012 (x.go:7) BSRL AX, AX
0x000f 00015 (x.go:7) ADDQ $-8, AX
0x0013 00019 (x.go:7) NEGQ AX
0x0016 00022 (x.go:7) MOVQ AX, "".z(SB)
0x001d 00029 (x.go:7) RET
Change-Id: Icc7db50a7820fb9a3da8a816d6b6940d7f8e193e
Reviewed-on: https://go-review.googlesource.com/108942
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Rewrite x<<1+c into x+x+c, which can be expressed as a single LEAQ/LEAL.
Bit of a special case, but the single-instruction
LEA is both shorter and faster than SHL then ADD.
Triggers 293 times during make.bash.
Change-Id: I3f09c8e9a8f3859d1eeed336f095fc3ada79c2c1
Reviewed-on: https://go-review.googlesource.com/108938
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
If both inputs are constant offsets from the same pointer then we
can evaluate NeqPtr and EqPtr at compile time. Triggers a few times
during all.bash. Removes a conditional branch in the following
code:
copy(x[1:], x[:])
This branch was recently added as an optimization in CL 94596. We
now skip the memmove if the pointers are equal. However, in the
above code we know at compile time that they are never equal.
Also, when the offset is variable, check if the offset is zero
rather than if the pointers are equal. For example:
copy(x[a:], x[:])
This would now skip the copy if a == 0, rather than if x + a == x.
Finally I've also added a rule to make IsNonNil true for pointers
to values on the stack. The nil check elimination pass will catch
these anyway, but eliminating them here might eliminate branches
earlier.
Change-Id: If72f436fef0a96ad0f4e296d3a1f8b6c3e712085
Reviewed-on: https://go-review.googlesource.com/106635
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This CL makes the codegen testsuite automatically test all
architecture variants for architecture specified in tests. For
instance, if a test file specifies a "arm" test, it will be
automatically run on all GOARM variants (5,6,7), to increase
the coverage.
The CL also introduces a syntax to specify only a specific
variant (eg: "arm/7") in case the test makes sense only there.
The same syntax also allows to specify the operating system
in case it matters (eg: "plan9/386/sse2").
Fixes#24658
Change-Id: I2eba8b918f51bb6a77a8431a309f8b71af07ea22
Reviewed-on: https://go-review.googlesource.com/107315
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
And remove it from asmtest. Next CL will remove the whole
asmtest infrastructure.
Change-Id: I5851bf7c617456d62a3c6cffacf70252df7b056b
Reviewed-on: https://go-review.googlesource.com/107335
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
And delete them from asm_test.
Also delete an arm64 cmov test has been already ported to the new test
harness.
Change-Id: I4458721e1f512bc9ecbbe1c22a2c9c7109ad68fe
Reviewed-on: https://go-review.googlesource.com/106335
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
And delete them from asm_test.
Change-Id: I24f421b87e8cb4770c887a6dfd58eacd0088947d
Reviewed-on: https://go-review.googlesource.com/106056
Reviewed-by: Keith Randall <khr@golang.org>
And delete them from asm_test.
Change-Id: I9a75efe9858ef9d7ac86065f860c2ae3f25b0941
Reviewed-on: https://go-review.googlesource.com/105597
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Multi-byte comparison operations were used on amd64, arm64, i386
and s390x for comparisons with constant arrays, but only amd64 and
i386 for comparisons with string constants. This CL combines the
check for platform capability, since they have the same requirements,
and also enables both on ppc64le which also supports load merging.
Note that these optimizations currently use little endian byte order
which results in byte reversal instructions on s390x. This should
be fixed at some point.
Change-Id: Ie612d13359b50c77f4d7c6e73fea4a59fa11f322
Reviewed-on: https://go-review.googlesource.com/102558
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
And delete them from asm_test.
Change-Id: I0e33d58274951ab5acb67b0117b60ef617ea887a
Reviewed-on: https://go-review.googlesource.com/105735
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
And delete them from asm_test.
Change-Id: Id533130470da9176a401cb94972f626f43a62148
Reviewed-on: https://go-review.googlesource.com/103656
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
And delete them from asm_test.
Change-Id: Idfe1249052d82d15b9c30b292c78656a0bf7b48d
Reviewed-on: https://go-review.googlesource.com/103315
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
And delete them from asm_test.
Change-Id: I34fcf85ae8ce09cd146fe4ce6a0ae7616bd97e2d
Reviewed-on: https://go-review.googlesource.com/102296
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
And remove them from asm_test.
Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454
Reviewed-on: https://go-review.googlesource.com/102115
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
And delete them from asm_test.
Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab
Reviewed-on: https://go-review.googlesource.com/101595
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
And delete them from asm_test.
Change-Id: I3cf0934706a640136cb0f646509174f8c1bf3363
Reviewed-on: https://go-review.googlesource.com/101395
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
And delete them from asm_test.
Change-Id: Ibdaca3496eefc73c731b511ddb9636a1f3dff68c
Reviewed-on: https://go-review.googlesource.com/100915
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
And delete them from asm_test.
Change-Id: I29c8d098a8893e6b669b6272a2f508985ac9d618
Reviewed-on: https://go-review.googlesource.com/100876
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This reverts commit 080187f4f7.
It broke build of golang.org/x/exp/shiny/iconvg
See issue 24395 for details
Change-Id: Ifd6134f6214e6cee40bd3c63c32941d5fc96ae8b
Reviewed-on: https://go-review.googlesource.com/100755
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This change ports all the remaining tests checking that small memmoves
are replaced with MOVs to the new codegen test harness, and deletes
them from the asm_test file.
Change-Id: I01c94b441e27a5d61518035af62d62779dafeb56
Reviewed-on: https://go-review.googlesource.com/100476
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
cmd/asm now supports three-operand form of IMUL,
so instead of using IMUL with resultInArg0, emit IMUL3 instruction.
This results in less redundant MOVs where SSA assigns
different registers to input[0] and dst arguments.
Note: these have exactly the same encoding when reg0=reg1:
IMUL3x $const, reg0, reg1
IMULx $const, reg
Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0].
This is why we don't bother to generate IMULx for the case where
dst is the same as input[0].
Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4
Reviewed-on: https://go-review.googlesource.com/99656
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>