That way we don't need to call into the runtime for every
type assertion (to an interface type).
name old time/op new time/op delta
TypeAssert-24 3.78ns ± 3% 1.00ns ± 1% -73.53% (p=0.000 n=10+8)
Change-Id: I0ba308aaf0f24a5495b4e13c814d35af0c58bfde
Reviewed-on: https://go-review.googlesource.com/c/go/+/529316
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
That way we don't need to call into the runtime when the type being
switched on has been seen many times before.
The cache is just a hash table of a sample of all the concrete types
that have been switched on at that source location. We record the
matching case number and the resulting itab for each concrete input
type.
The caches seldom get large. The only two in a run of all.bash that
get more than 100 entries, even with the sampling rate set to 1, are
test/fixedbugs/issue29264.go, with 101
test/fixedbugs/issue29312.go, with 254
Both happen at the type switch in fmt.(*pp).handleMethods, perhaps
unsurprisingly.
name old time/op new time/op delta
SwitchInterfaceTypePredictable-24 25.8ns ± 2% 2.5ns ± 3% -90.43% (p=0.000 n=10+10)
SwitchInterfaceTypeUnpredictable-24 37.5ns ± 2% 11.2ns ± 1% -70.02% (p=0.000 n=10+10)
Change-Id: I4961ac9547b7f15b03be6f55cdcb972d176955eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/526658
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
For type switches where the targets are interface types,
call into the runtime once instead of doing a sequence
of assert* calls.
name old time/op new time/op delta
SwitchInterfaceTypePredictable-24 26.6ns ± 1% 25.8ns ± 2% -2.86% (p=0.000 n=10+10)
SwitchInterfaceTypeUnpredictable-24 39.3ns ± 1% 37.5ns ± 2% -4.57% (p=0.000 n=10+10)
Not super helpful by itself, but this code organization allows
followon CLs that add caching to the lookups.
Change-Id: I7967f85a99171faa6c2550690e311bea8b54b01c
Reviewed-on: https://go-review.googlesource.com/c/go/+/526657
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Add a new form of RLDC which maps directly to the ISA definition
of rldc: RLDC Rs, $sh, $mb, Ra. This is used to generate mask
constants described below.
Using MOVD $-1, Rx; RLDC Rx, $sh, $mb, Rx, any mask constant
can be generated. A mask is a contiguous series of 1 bits, which
may wrap.
Change-Id: Ifcaae1114080ad58b5fdaa3e5fc9019e2051f282
Reviewed-on: https://go-review.googlesource.com/c/go/+/531120
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Check for shifted 16b constants, and transform them to avoid the load
penalty. This should be much faster than loading, and reduce binary
size by reducing the constant pool size.
Change-Id: I6834e08be7ca88e3b77449d226d08d199db84299
Reviewed-on: https://go-review.googlesource.com/c/go/+/531119
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This sequence can show up in the lowering pass on PPC64. If it
makes it to the latelower pass, it will cause an error because
it looks like it can be turned into RLDICL, but -1 isn't an
accepted mask.
Also, print more debug info if panic is called from
encodePPC64RotateMask.
Fixes#62698
Change-Id: I0f3322e2205357abe7fc28f96e05e3f7ad65567c
Reviewed-on: https://go-review.googlesource.com/c/go/+/529195
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
sparse conditional constant propagation can discover optimization
opportunities that cannot be found by just combining constant folding
and constant propagation and dead code elimination separately.
This is a re-submit of PR#59575, which fix a broken dominance relationship caught by ssacheck
Updates https://github.com/golang/go/issues/59399
Change-Id: I57482dee38f8e80a610aed4f64295e60c38b7a47
GitHub-Last-Rev: 830016f24e
GitHub-Pull-Request: golang/go#60469
Reviewed-on: https://go-review.googlesource.com/c/go/+/498795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Generate RLDIC[LR] instead of MOVD mask, Rx; AND Rx, Ry, Rz.
This helps reduce code size, and reduces the latency caused
by the constant load.
Similarly, for smaller-than-register values, truncate constants
which exceed the range of the value's type to avoid needing to
load a constant.
Change-Id: I6019684795eb8962d4fd6d9585d08b17c15e7d64
Reviewed-on: https://go-review.googlesource.com/c/go/+/515576
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
For large interface -> concrete type switches, we can use a jump
table on some bits of the type hash instead of a binary search on
the type hash.
name old time/op new time/op delta
SwitchTypePredictable-24 1.99ns ± 2% 1.78ns ± 5% -10.87% (p=0.000 n=10+10)
SwitchTypeUnpredictable-24 11.0ns ± 1% 9.1ns ± 2% -17.55% (p=0.000 n=7+9)
Change-Id: Ida4768e5d62c3ce1c2701288b72664aaa9e64259
Reviewed-on: https://go-review.googlesource.com/c/go/+/521497
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
This lets us combine more write barriers, getting rid of some of the
test+branch and gcWriteBarrier* calls.
With the new write barriers, it's easy to add a few non-pointer writes
to the set of values written.
We allow up to 2 non-pointer writes between pointer writes. This is enough
for, for example, adjacent slice fields.
Fixes#62126
Change-Id: I872d0fa9cc4eb855e270ffc0223b39fde1723c4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/521498
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
This CL adds FMADDS,FMSUBS,FNMADDS,FNMSUBS SSA support for riscv
Change-Id: I1e7dd322b46b9e0f4923dbba256303d69ed12066
Reviewed-on: https://go-review.googlesource.com/c/go/+/506616
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: M Zhuo <mzh@golangcn.org>
If we're not using the upper bits, don't bother issuing a
sign/zero extension operation.
For arm64, after CL 520916 which fixed a correctness bug with
extensions but as a side effect leaves many unnecessary ones
still in place.
Change-Id: I5f4fe4efbf2e9f80969ab5b9a6122fb812dc2ec0
Reviewed-on: https://go-review.googlesource.com/c/go/+/521496
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Stop using BTSconst and friends when ORLconst can be used instead.
OR can be issued by more function units than BTS can, so it could
lead to better IPC. OR might take a few more bytes to encode, but
not a lot more.
Still use BTSconst for cases where the constant otherwise wouldn't
fit and would require a separate movabs instruction to materialize
the constant. This happens when setting bits 31-63 of 64-bit targets.
Add BTS-to-memory operations so we don't need to load/bts/store.
Fixes#61694
Change-Id: I00379608df8fb0167cb01466e97d11dec7c1596c
Reviewed-on: https://go-review.googlesource.com/c/go/+/515755
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Fixes#61629
This reduce the pressure on regalloc because then the loop only keep alive
one value (the iterator) instead of the iterator and the upper bound since
the comparison now acts against an immediate, often zero which can be skipped.
This optimize things like:
for i := 0; i < n; i++ {
Or a range over a slice where the index is not used:
for _, v := range someSlice {
Or the new range over int from #61405:
for range n {
It is hit in 975 unique places while doing ./make.bash.
Change-Id: I5facff8b267a0b60ea3c1b9a58c4d74cdb38f03f
Reviewed-on: https://go-review.googlesource.com/c/go/+/512935
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
s==s is always true for strings. This comes up in NaN testing in
generic code, where we want x==x to compile completely away except for
float types.
Fixes#60777
Change-Id: I3ce054b5121354de2f9751b010fb409f148cb637
Reviewed-on: https://go-review.googlesource.com/c/go/+/503795
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
If we load 2 values and then store those 2 loaded values, we can likely
perform that operation with a single wider load and store.
Fixes#60709
Change-Id: Ifc5f92c2f1b174c6ed82a69070f16cec6853c770
Reviewed-on: https://go-review.googlesource.com/c/go/+/502295
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
(ANDCCconst [y] (MOV.*reg x)) should only be merged when zero
extending. Otherwise, sign bits are lost on negative values.
(ANDCCconst [0xFF] (MOVBreg x)) should be simplified to a zero
extension of x. Likewise for the MOVHreg variant.
Fixes#61297
Change-Id: I04e4fd7dc6a826e870681f37506620d48393698b
Reviewed-on: https://go-review.googlesource.com/c/go/+/508775
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
According to RISCV manual 11.6:
FMADD x,y,z computes x*y+z and
FNMADD x,y,z => -x*y-z
FMSUB x,y,z => x*y-z
FNMSUB x,y,z => -x*y+z respectively
However our implement of SSA convert FMADD -x,y,z to FNMADD x,y,z which
is wrong and should be convert to FNMSUB according to manual.
Change-Id: Ib297bc83824e121fd7dda171ed56ea9694a4e575
Reviewed-on: https://go-review.googlesource.com/c/go/+/506575
Run-TryBot: M Zhuo <mzh@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Don't use the line number of the argument itself, as that may be from
arbitrarily earlier in the function.
Fixes#60673
Change-Id: Ifc0a2aaae221a256be3a4b0b2e04849bae4b79d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/502656
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Some testdir tests fail if GOEXPERIMENT=cgocheck2 is set. Fix this by
skipping these tests.
Change-Id: I58d4ef0cceb86bcf93220b4a44de9b9dc4879b16
Reviewed-on: https://go-review.googlesource.com/c/go/+/499675
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
sparse conditional constant propagation can discover optimization opportunities that cannot be found by just combining constant folding and constant propagation and dead code elimination separately.
Updates #59399
Change-Id: Ia954e906480654a6f0cc065d75b5912f96f36b2e
GitHub-Last-Rev: 90fc02db99
GitHub-Pull-Request: golang/go#59575
Reviewed-on: https://go-review.googlesource.com/c/go/+/483875
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
In Go 1.17, cmd/compile gained the ability to inline calls to
functions that contain function literals (aka "closures"). This was
implemented by duplicating the function literal body and emitting a
second LSym, because in general it might be optimized better than the
original function literal.
However, the second LSym was named simply as any other function
literal appearing literally in the enclosing function would be named.
E.g., if f has a closure "f.funcX", and f is inlined into g, we would
create "g.funcY" (N.B., X and Y need not be the same.). Users then
have no idea this function originally came from f.
With this CL, the inlined call stack is incorporated into the clone
LSym's name: instead of "g.funcY", it's named "g.f.funcY".
In the future, it seems desirable to arrange for the clone's name to
appear exactly as the original name, so stack traces remain the same
as when -l or -d=inlfuncswithclosures are used. But it's unclear
whether the linker supports that today, or whether any downstream
tooling would be confused by this.
Updates #60324.
Change-Id: Ifad0ccef7e959e72005beeecdfffd872f63982f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/497137
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Be more liberal about expanding the OR tree. Handle any tree shape
instead of a fully left or right associative tree.
Also remove tail feature, it isn't ever needed.
Change-Id: If16bebef94b952a604d6069e9be3d9129994cb6f
Reviewed-on: https://go-review.googlesource.com/c/go/+/494056
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ryan Berger <ryanbberger@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
This modifies some existing rules to allow more prefixed instructions
to be generated when using GOPPC64=power10. Some rules also check
if PCRel is available, which is currently supported for linux/ppc64le
and linux/ppc64 (internal linking only).
Prior to p10, DS-offset loads and stores had a 16 bit size limit for
the offset field. If the offset of the data for load or store was
beyond this range then an indexed load or store would be selected by
the rules.
In p10 the assembler can generate prefixed instructions in this case,
but does not if an indexed instruction was selected during the lowering
pass.
This allows many more cases to use prefixed loads or stores, reducing
function sizes and improving performance in some cases where the code
change happens in key loops.
For example in strconv BenchmarkAppendQuoteRune before:
12c5e4: 15 00 10 06 pla r10,1425660
12c5e8: fc c0 40 39
12c5ec: 00 00 6a e8 ld r3,0(r10)
12c5f0: 10 00 aa e8 ld r5,16(r10)
After this change:
12a828: 15 00 10 04 pld r3,1433272
12a82c: b8 de 60 e4
12a830: 15 00 10 04 pld r5,1433280
12a834: c0 de a0 e4
Performs better in the second case.
A testcase was added to verify that the rules correctly select a load or
store based on the offset and whether power10 or earlier.
Change-Id: I4335fed0bd9b8aba8a4f84d69b89f819cc464846
Reviewed-on: https://go-review.googlesource.com/c/go/+/477398
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
The effect and motivation is for the test to be selected when doing
'go test cmd' and not when doing 'go test std' since it's primarily
about testing the Go compiler and linker. Other than that, it's run
by all.bash and 'go test std cmd' as before.
For #56844.
Fixes#60059.
Change-Id: I2d499af013f9d9b8761fdf4573f8d27d80c1fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/493876
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Adds rules that rewrites statements such as ~P&~Q as ~(P|Q) and ~P|~Q as ~(P&Q), removing an extraneous instruction.
Change-Id: Icedb97df741680ddf9799df79df78657173aa500
GitHub-Last-Rev: f22e2350c9
GitHub-Pull-Request: golang/go#60018
Reviewed-on: https://go-review.googlesource.com/c/go/+/493175
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Stefan M <st3f4nm4d4@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Thanks to the recent addition of the memcombine pass, the
ppc64 ports now have the memcombine optimizations. Previously
in PPC64.rules, the memcombine rules were only added for
ppc64le targets due to the significant increase in size of
the rewritePPC64.go file when those rules were added. The
ppc64 and ppc64le rules had to be different because of the
byte order due to endianness differences.
This enables the memcombine tests to be run on ppc64 as well
as ppc64le.
Change-Id: I4081e2d94617a1b66541d536c0c2662e266c9c1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/492615
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Lynn Boger <laboger@linux.vnet.ibm.com>
Add more rules to ensure that order doesn't matter.
Add memequal 0 rule.
Try to use a constant argument to memequal when one is available.
Fixes#59684
Change-Id: I36e85ffbd949396ed700ed6e8ec2bc3ae013f5d2
Reviewed-on: https://go-review.googlesource.com/c/go/+/485535
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently, the prove pass can get knowledge from some specific logic
operators only before the CFG is explored, which means that the bounds
information of the branch will be ignored.
This CL updates the facts table by the logic operators in every
branch. Combined with the branch information, this will be helpful for
BCE in some circumstances.
Fixes#57243
Change-Id: I0bd164f1b47804ccfc37879abe9788740b016fd5
Reviewed-on: https://go-review.googlesource.com/c/go/+/419555
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Under the right conditions we can optimize cmp comparisons to cmn
comparisons, such as:
func foo(a, b int) int {
var c int
if a + b < 0 {
c = 1
}
return c
}
Previously it's compiled as:
ADD R1, R0, R1
CMP $0, R1
CSET LT, R0
With this CL it's compiled as:
CMN R1, R0
CSET MI, R0
Here we need to pay attention to the overflow situation of a+b, the MI
flag means N==1, which doesn't honor the overflow flag V, its value
depends only on the sign of the result. So it has the same semantic of
the Go code, so it's correct.
Similarly, this CL also optimizes the case of >= comparison
using the PL conditional flag.
Change-Id: I47179faba5b30cca84ea69bafa2ad5241bf6dfba
Reviewed-on: https://go-review.googlesource.com/c/go/+/476116
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Add the following common local transformations
(t + x) - (t + y) == x - y
(t + x) - (y + t) == x - y
(x + t) - (y + t) == x - y
(x + t) - (t + y) == x - y
(x - t) + (t + y) == x + y
(x - t) + (y + t) == x + y
The compiler itself matches such patterns many times. This also aligns with other popular compilers.
Fixes#59111
Change-Id: Ibdfdb414782f8fcaa20b84ac5d43d0d9ae2c7b60
GitHub-Last-Rev: 1aad82e62e
GitHub-Pull-Request: golang/go#59119
Reviewed-on: https://go-review.googlesource.com/c/go/+/477555
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
This CL add support for instrinsifying the TrialingZeros{8,32,64}
functions for 386 architecture. We need handle the case when the input
is 0, which could lead to undefined output from the BSFL instruction.
Next CL will remove the assembly code in runtime/internal/sys package.
Change-Id: Ic168edf68e81bf69a536102100fdd3f56f0f4a1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/475735
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
On Arm64, all 32-bit instructions will ignore the upper 32 bits and
clear them to zero for the result. No need to do an unsign extend before
a 32 bit op.
This CL removes the redundant unsign extension only for the existing
32-bit opcodes, and also omits the sign extension when the upper bit of
the result can be predicted.
Fixes#42162
Change-Id: I61e6670bfb8982572430e67a4fa61134a3ea240a
CustomizedGitHooks: yes
Reviewed-on: https://go-review.googlesource.com/c/go/+/427454
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Eric Fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
As motivated on the issue, we want to move the functionality of the
run.go program to happen via a normal go test. Each .go test case in
the GOROOT/test directory gets a subtest, and cmd/go's support for
parallel test execution replaces run.go's own implementation thereof.
The goal of this change is to have fairly minimal and readable diff
while making an atomic changeover. The working directory is modified
during the test execution to be GOROOT/test as it was with run.go,
and most of the test struct and its run method are kept unchanged.
The next CL in the stack applies further simplifications and cleanups
that become viable.
There's no noticeable difference in test execution time: it takes around
60-80 seconds both before and after on my machine. Test caching, which
the previous runner lacked, can shorten the time significantly.
For #37486.
Fixes#56844.
Change-Id: I209619dc9d90e7529624e49c01efeadfbeb5c9ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/463276
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Some integer comparisons with 1 and -1 can be rewritten as comparisons
with 0. For example, x < 1 is equivalent to x <= 0. This is an
advantageous transformation on riscv64 because comparisons with zero
do not require a constant to be loaded into a register. Other
architectures will likely benefit too and the transformation is
relatively benign on architectures that do not benefit.
Change-Id: I2ce9821dd7605a660eb71d76e83a61f9bae1bf25
Reviewed-on: https://go-review.googlesource.com/c/go/+/350831
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Gopher Robot <gobot@golang.org>