The previous change moved code around to create slicesym.
This change simplifies slicesym and its callsites
by accepting an int64 for lencap instead of a node,
and by removing all the calls to gdata.
It also stops modifying n,
which avoids the need to make a copy of it.
Passes toolstash-check.
Change-Id: I4d25454d11b4bb8941000244443e3c99eef4bdd0
Reviewed-on: https://go-review.googlesource.com/c/go/+/227550
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This change mostly moves code around to unify it.
A subsequent change will simplify and improve slicesym.
Passes toolstash-check.
Change-Id: I84a877ea747febb2b571d4089ba6d905b51b27ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/227549
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Some runtime calls accept a slice, but only use ptr and len.
This change modifies most such routines to accept only ptr and len.
After this change, the only runtime calls that accept an unnecessary
cap arg are concatstrings and slicerunetostring.
Neither is particularly common, and both are complicated to modify.
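As a rough illustration of the idea (not the runtime's actual code), a routine that only reads elements needs a pointer and a length, not a full slice header. The function names below are hypothetical, and the sketch assumes Go 1.17 or later for unsafe.Slice.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sumSlice takes a full slice header (ptr, len, cap) even though it
// never looks at cap.
func sumSlice(b []byte) int {
	total := 0
	for _, x := range b {
		total += int(x)
	}
	return total
}

// sumPtrLen is the hypothetical narrowed form: the caller passes only
// the two words that are actually used.
func sumPtrLen(p *byte, n int) int {
	if n == 0 {
		return 0
	}
	total := 0
	for _, x := range unsafe.Slice(p, n) {
		total += int(x)
	}
	return total
}

func main() {
	data := []byte{1, 2, 3, 4}
	fmt.Println(sumSlice(data), sumPtrLen(&data[0], len(data)))
}
```

Dropping the unused cap word means one fewer argument to materialize at each call site, which is consistent with the small binary size reductions reported here.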
Negligible compiler performance impact. Shrinks binaries a little.
There are only a few regressions; the one I investigated was
due to register allocation fluctuation.
Passes 'go test -race std cmd', modulo #38265 and #38266.
Wow, does that take a long time to run.
Updates #36890
file      before    after     Δ       %
compile   19655024  19655152  +128    +0.001%
cover     5244840   5236648   -8192   -0.156%
dist      3662376   3658280   -4096   -0.112%
link      6680056   6675960   -4096   -0.061%
pprof     14789844  14777556  -12288  -0.083%
test2json 2824744   2820648   -4096   -0.145%
trace     11647876  11639684  -8192   -0.070%
vet       8260472   8256376   -4096   -0.050%
total     115163736 115118808 -44928  -0.039%
Change-Id: Idb29fa6a81d6a82bfd3b65740b98cf3275ca0a78
Reviewed-on: https://go-review.googlesource.com/c/go/+/227163
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The offset is always a multiple of the pointer size.
Change-Id: I790e087e89a081044a3ec35d99880533a4c929bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/227540
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
I noticed a timeout in TestIgnore in
https://build.golang.org/log/52d83a72f3a5ea9a16eb5d670c729694144f9624,
which suggests that the settle time is currently set too low.
I've also added a check for the same GO_TEST_TIMEOUT_SCALE used in
TestTerminalSignal, so that if this builder remains too slow we can
increase the builder's scale factor rather than the test's baseline
running time.
Updates #33174
Change-Id: I18b10eaa3bb5ae2f604300aedaaf6f79ee7ad567
Reviewed-on: https://go-review.googlesource.com/c/go/+/227649
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Try to get some output even if the subprocess hangs.
For #25628
Change-Id: I4cc0a8f2c52b03a322b8fd0a620cba37b06ff10a
Reviewed-on: https://go-review.googlesource.com/c/go/+/227517
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Returning the URL.String() output without the password is useful when
the URL will be logged and the password should not be shown.
This method reuses URL.String() but replaces the password with "xxxxx"
to make it obvious that a password was present. If the URL has no
password, no "xxxxx" is shown.
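A short usage sketch, assuming the method described here is the one available as URL.Redacted in Go 1.15 and later (the message above does not name it). The redact helper and the example URLs are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// redact parses raw and returns it with any password replaced by "xxxxx".
func redact(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Redacted(), nil
}

func main() {
	for _, raw := range []string{
		"https://alice:s3cret@example.com/path", // password present
		"https://alice@example.com/path",        // no password
	} {
		s, err := redact(raw)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(s)
	}
}
```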
Fixes #34855
Change-Id: I7f17d81aa09a7963d2731d16fe15c6ae8e2285fc
GitHub-Last-Rev: 46d06dbc4f
GitHub-Pull-Request: golang/go#35578
Reviewed-on: https://go-review.googlesource.com/c/go/+/207082
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
The divBasic function computes the quotient of big nats u/v word by word.
It estimates each word qhat by performing a long division (top 2 words of u
divided by top word of v), looks at the next word to correct the estimate,
then performs a full multiplication (qhat*v) to catch any inaccuracy in the
estimate.
In the latter case, "negative" values appear temporarily and carries
must be carefully managed, and the recursive division refactoring
introduced a case where qhat*v has the same length as v, triggering an
out-of-bounds write in the case it happens when computing the top word
of the quotient.
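The estimation step described above can be sketched roughly as follows. This is a simplified model on 64-bit words in the style of schoolbook long division, not the math/big code; it covers only the estimate and its one-word correction, not the full multiply-and-subtract that follows. The function name and argument layout are assumptions.

```go
package main

import (
	"fmt"
	"math/bits"
)

// estimateQhat estimates one quotient word from the three leading
// dividend words (u2, u1, u0) and the two leading divisor words (v1, v0).
// It assumes the divisor is normalized (top bit of v1 set) and u2 <= v1;
// bits.Div64 panics if u2 > v1.
func estimateQhat(u2, u1, u0, v1, v0 uint64) uint64 {
	if u2 == v1 {
		// The two-by-one division would overflow; use the largest word.
		return ^uint64(0)
	}
	// Long division of the top two words of u by the top word of v.
	qhat, rhat := bits.Div64(u2, u1, v1)
	// Look at the next word to correct an overestimate:
	// decrement while qhat*v0 > rhat*2^64 + u0.
	hi, lo := bits.Mul64(qhat, v0)
	for hi > rhat || (hi == rhat && lo > u0) {
		qhat--
		prev := rhat
		rhat += v1
		if rhat < prev {
			break // rhat no longer fits in a word, so the test cannot hold
		}
		hi, lo = bits.Mul64(qhat, v0)
	}
	return qhat
}

func main() {
	// (2^63 * 2^64) / (2^63 * 2^64 + 1): the first estimate is 1 and the
	// correction step lowers it to the true quotient word.
	fmt.Println(estimateQhat(0, 1<<63, 0, 1<<63, 1))
}
```

Even after this correction the estimate can still be off by one, which is why a full qhat*v multiplication and a careful borrow check are needed afterwards.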
Fixes #37499
Change-Id: I15089da4a4027beda43af497bf6de261eb792f94
Reviewed-on: https://go-review.googlesource.com/c/go/+/221980
Reviewed-by: Robert Griesemer <gri@golang.org>
Make typedmemmove, typedmemclr, typedmemclrpartial look more like other
callers of bulkBarrierPreWrite.
Change-Id: Ic47030d88bf07d290f91198b7810ffc16d9769e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/227541
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The cgo build tag is not necessary for root_darwin_arm64.go. We can't
build for darwin/arm64 without cgo, and even if we did 1) this code
would work fine 2) the no-cgo code that shells out to
/usr/bin/security would not work.
(Suggested by Filippo.)
Change-Id: I98cac2ea96ec5ac1ae60b7e32d195d5e86e2bd66
Reviewed-on: https://go-review.googlesource.com/c/go/+/227583
Reviewed-by: Filippo Valsorda <filippo@golang.org>
This test is currently flaky in the builders.
Skip it while we investigate.
For #37404
Change-Id: I53721d383a4cafbe8d031ed25a3b1be2ae8b4285
Reviewed-on: https://go-review.googlesource.com/c/go/+/227587
Reviewed-by: David Chase <drchase@google.com>
This removes all conditions and conditional code (that I could find)
that depended on darwin/386.
Fixes #37610.
Change-Id: I630d9ea13613fb7c0bcdb981e8367facff250ba0
Reviewed-on: https://go-review.googlesource.com/c/go/+/227582
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This removes all files that are only used on darwin/386 and cleans up
build tags in files that are still used on other platforms.
Updates #37610.
Change-Id: If246642476c12d15f59a474e2b91a29c0c02fe75
Reviewed-on: https://go-review.googlesource.com/c/go/+/227581
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This removes all conditions and conditional code (that I could find)
that depended on darwin/arm.
Fixes #35439 (since that only happened on darwin/arm)
Fixes #37611.
Change-Id: Ia4c32a5a4368ed75231075832b0b5bfb1ad11986
Reviewed-on: https://go-review.googlesource.com/c/go/+/227198
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This removes all files that are only used on darwin/arm and cleans up
build tags in files that are still used on other platforms.
Updates #37611.
Change-Id: Ic9490cf0edfc157c6276a7ca950c1768b34a998f
Reviewed-on: https://go-review.googlesource.com/c/go/+/227197
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Adds a GOMODCACHE environment variable that's used by cmd/go to determine the
location of the module cache. The default value of GOMODCACHE will be
GOPATH[0]/pkg/mod, the default location of the module cache before this change.
Replace the cmd/go/internal/modfetch.PkgMod variable which previously held the
location of the module cache with the new cmd/go/internal/cfg.GOMODCACHE
variable, for consistency with many of the other environment variables that
affect the behavior of cmd/go. (Most of the changes in this CL are due to
moving/renaming the variable.)
The value of cfg.GOMODCACHE is now set using a variable initializer. It was
previously set in cmd/go/internal/modload.Init.
The location of GOPATH/pkg/sumdb is unchanged by this CL. While it was
previously determined using the value of PkgMod, it now is determined
independently, directly from the value of GOPATH[0].
Fixes #34527
Change-Id: Id4d31d217b3507d6057c8ef7c52af1a0606603e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/219538
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
TestPingPongHog tests properties of the scheduler.
But the race detector intentionally does randomized scheduling,
so the test is not applicable.
Fixes #38266
Change-Id: Ib06aa317b2776cb1faa641c4e038e2599cf70b2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/227344
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TestTryAdd is particularly brittle because it tests some real cases by
constructing fake sample stack frames. If those frames don't correctly
represent what the runtime would generate then they may fail to catch
regressions.
Instead, call runtime.Callers at the bottom of real function calls to
generate real frames as a base for truncation, etc in tests. Several of
these tests still have to fake parts of the frames to test the right
thing, but this is a bit less fragile.
Change-Id: I62522a9ded5544b06d1bf28550af5400f3af667b
Reviewed-on: https://go-review.googlesource.com/c/go/+/227484
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
This change adds CFA information to the assembly function 'crosscall1'
and reorganizes its code to establish a well-formed prologue and epilogue.
It fixes an infinite callstack issue when debugging cgo programs with
GDB on arm64.
Brief root cause analysis:
GDB's aarch64 unwinder parses prologue to determine current frame's size
and previous PC&SP if CFA information is not available.
The unwinder parses the prologue of 'crosscall1' to determine a frame size
of 0x10, then turns to its next frame trying to compute its previous PC&SP
as they are not saved on current frame's stack as per its 'traditional frame
unwind' rules, which ends up getting an endless frame chain like:
[callee] : pc:<pc0>, sp:<sp0>
crosscall1: pc:<pc1>, sp:<sp0>+0x10
[caller] : pc:<pc1>, sp:<sp0>+0x10+0x10
[caller] : pc:<pc1>, sp:<sp0>+0x10+0x10+0x10
...
GDB fails to detect that the 'caller' frame is the same as 'crosscall1',
and so never terminates unwinding, because SP increases every time.
Fixes #37238
Change-Id: Ia6bd8555828541a3a61f7dc9b94dfa00775ec52a
Reviewed-on: https://go-review.googlesource.com/c/go/+/226999
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.
Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. This can
happen if a deadlock happens (by acquiring a sequence of locks in
different orders), or if just one side of a possible deadlock happens.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.
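The core check can be sketched as below. This is a toy model, not the runtime's implementation: the rank names, the held type, and the single-threaded bookkeeping are all hypothetical. Locks of equal rank are allowed, and locks are released in LIFO order for simplicity.

```go
package main

import "fmt"

// rank is a hypothetical lock rank; lower ranks must be acquired first.
type rank int

const (
	rankSched rank = iota
	rankTimers
	rankHeap
)

// held tracks the ranks of the locks one simulated thread holds.
type held struct {
	ranks []rank
}

// acquire records a lock acquisition and reports an error if it
// violates the total order, i.e. a higher-ranked lock is already held.
func (h *held) acquire(r rank) error {
	if n := len(h.ranks); n > 0 && h.ranks[n-1] > r {
		return fmt.Errorf("lock ordering violation: acquiring rank %d while holding rank %d", r, h.ranks[n-1])
	}
	h.ranks = append(h.ranks, r)
	return nil
}

// release drops the most recently acquired lock.
func (h *held) release() {
	if n := len(h.ranks); n > 0 {
		h.ranks = h.ranks[:n-1]
	}
}

func main() {
	var h held
	fmt.Println(h.acquire(rankSched)) // in order
	fmt.Println(h.acquire(rankHeap))  // in order
	fmt.Println(h.acquire(rankTimers)) // out of order: reported
}
```

Note that the violation is reported even though no other thread is involved, which is how one side of a possible deadlock gets caught.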
Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.
See the constants at the top of runtime/lockrank.go for the static
lock ranking that I ended up with, along with some comments. They
document the current intended lock ordering when acquiring
multiple locks in the runtime.
I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.
I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (i.e. check the ranking when
multiple hchan locks are acquired).
Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.
For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.
Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.
To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.
Fixes #38029
Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619
Reviewed-by: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.
Fixes #37316.
Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
In the commit message of CL 212360, I wrote:
> This new intrinsic ... generates MOVB+TESTB+NE.
> (It is possible that MOVBQZX+TESTQ+NE would be better.)
I should have tested. MOVBQZX+TESTQ+NE does in fact appear to be better.
For the benchmark in #36196, on my machine:
name      old time/op  new time/op  delta
FMA-8     0.86ns ± 6%  0.70ns ± 5%  -18.79%  (p=0.000 n=98+97)
NonFMA-8  0.61ns ± 5%  0.60ns ± 4%   -0.74%  (p=0.001 n=100+97)
Interestingly, these are both considerably faster than
the measurements I took a couple of months ago (1.4ns/2ns).
It appears that CL 219131 (clearing VZEROUPPER in asyncPreempt) helped a lot.
And FMA is now once again slower than NonFMA, although this change
helps it regain some ground.
Updates #15808
Updates #36351
Updates #36196
Change-Id: I8a326289a963b1939aaa7eaa2fab2ec536467c7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/227238
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The test was not preserving temporary directory flags leading to a
failure on windows with:
mkdir C:\WINDOWS\go-build315158903: Access is denied.
Fixes #38251
Change-Id: I6ee31b31e84b7f6e75ea6ee0f3b8c094835bf5d2
Reviewed-on: https://go-review.googlesource.com/c/go/+/227497
Reviewed-by: David Chase <drchase@google.com>
On s390x, we already have MVCIN opcode in asmz.go,
but we did not use it. This CL uses that opcode and adds MVCIN
instruction.
MVCIN instruction can be used to move data from one storage location
to another while reversing the order of bytes within the field. This
could be useful when transforming data from little-endian to big-endian.
Change-Id: Ifa1a911c0d3442f4a62f91f74ed25b196d01636b
Reviewed-on: https://go-review.googlesource.com/c/go/+/227478
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This CL extracts some error handling code into a common method for
presenting errors encountered when loading package data.
Fixes #36087
Fixes #36762
Change-Id: I87c8d41e3cc6e6afa152d9c067bc60923bf19fbe
Reviewed-on: https://go-review.googlesource.com/c/go/+/210938
Run-TryBot: Jay Conrod <jayconrod@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
The darwin/arm port is removed in Go 1.15. Setting GOOS=darwin
GOARCH=arm will fail, therefore "go test cmd/link" on macOS will
fail (in non -short mode). Remove this test point.
Updates #37611.
Change-Id: Ia9531c4b4a6692a0c49153517af9fdddd1f3e0bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/227341
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Complete a long-standing TODO in the code.
Exit blocks are cold code, so we lay them out at the end of the function.
Blocks that are post-dominated by exit blocks are also ipso facto exit blocks.
Treat them as such.
Implement using a simple loop, because there are generally very few exit blocks.
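The expansion can be sketched as a fixed-point loop over a toy control-flow graph. This is a simplified model, not the compiler's layout pass: blocks are plain integers, the graph is a successor map, and the function name is hypothetical. A block joins the exit set once all of its successors are in it.

```go
package main

import (
	"fmt"
	"sort"
)

// expandExits grows the set of exit blocks to include every block whose
// successors are all already in the set. succs maps a block ID to the
// IDs of its successors.
func expandExits(succs map[int][]int, exits []int) []int {
	inExit := make(map[int]bool)
	for _, id := range exits {
		inExit[id] = true
	}
	for changed := true; changed; {
		changed = false
		for id, ss := range succs {
			if inExit[id] || len(ss) == 0 {
				continue
			}
			all := true
			for _, s := range ss {
				if !inExit[s] {
					all = false
					break
				}
			}
			if all {
				inExit[id] = true
				changed = true
			}
		}
	}
	out := make([]int, 0, len(inExit))
	for id := range inExit {
		out = append(out, id)
	}
	sort.Ints(out)
	return out
}

func main() {
	// 0 branches to 1 and 2; 1 leads only to exit block 3; 2 leads to 4.
	succs := map[int][]int{0: {1, 2}, 1: {3}, 2: {4}, 3: nil, 4: nil}
	fmt.Println(expandExits(succs, []int{3}))
}
```

In this example block 1 is treated as an exit block because its only successor is one, while block 0 is not because it can still reach block 4.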
In addition to improved instruction cache, this empirically yields
better register allocation.
Binary size impact:
file  before    after     Δ       %
cgo   4812872   4808776   -4096   -0.085%
fix   3370072   3365976   -4096   -0.122%
vet   8252280   8248184   -4096   -0.050%
total 115052984 115040696 -12288  -0.011%
This also appears to improve compiler performance
(-0.15% geomean time/op, -1.20% geomean user time/op),
but that could just be alignment effects.
Compiler benchmarking hasn't been super reliable recently,
and there's no particular reason to think this should
speed up the compiler that much.
Change-Id: I3d262c4f5cb80626a67a5c17285e2fa09f423c00
Reviewed-on: https://go-review.googlesource.com/c/go/+/227217
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Multiple instances of testDWARF run in parallel, with a shared
backing store of the env input slice. Do modification of the
environment locally, instead of on the shared slice.
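The underlying hazard is that append may write into spare capacity of a shared backing array. A minimal sketch of the copy-first pattern follows; the helper name and environment values are hypothetical.

```go
package main

import "fmt"

// withEnv returns a fresh slice holding base plus kv. It never writes
// into base's backing array, so concurrent callers cannot clobber each
// other's entries.
func withEnv(base []string, kv string) []string {
	env := make([]string, 0, len(base)+1)
	env = append(env, base...)
	return append(env, kv)
}

func main() {
	// Spare capacity is what makes a plain append(shared, ...) alias.
	shared := make([]string, 1, 4)
	shared[0] = "PATH=/bin"

	a := withEnv(shared, "GOOS=linux")
	b := withEnv(shared, "GOOS=darwin")
	fmt.Println(a, b)
}
```

With a plain append(shared, "GOOS=linux") followed by append(shared, "GOOS=darwin"), both results would share index 1 of the same array, and the second write would overwrite the first.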
Fixes #38265.
Change-Id: I22a7194c8cd55ba22c9d6c47ac47bf7e710a7027
Reviewed-on: https://go-review.googlesource.com/c/go/+/227342
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
'go get' will now check absolute paths without wildcards the same way
it checks relative paths. modload.DirImportPath may be used for both
without converting path separators.
Fixes #38038
Change-Id: I453299898ece58f3b5002a5e80021d6bfe815fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/226857
Run-TryBot: Jay Conrod <jayconrod@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Michael Matloob <matloob@golang.org>
The auxint value was being printed in LongString() but not LongHTML().
Fixes #38250.
Change-Id: I28e819feef8710f912bee424d1b900eb07f3abb8
Reviewed-on: https://go-review.googlesource.com/c/go/+/227160
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Previously we stopped the timer and then reset it. With the current
timer implementation that is no longer required.
Change-Id: Ie7aba61ad53ce835f6fcd0b6bce7fe0a15b10e24
Reviewed-on: https://go-review.googlesource.com/c/go/+/227180
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
If the final pass(es) are identical during ssa.html generation,
they are persisted in-memory as "pendingPhases" but never get
written as a column in the html. This change flushes those
in-memory phases.
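The bug pattern is a buffered run that is only flushed when the next differing item arrives. A toy model of the fix is sketched below; the phase type, the columns function, and the " + " separator are hypothetical and do not mirror the real HTML writer.

```go
package main

import (
	"fmt"
	"strings"
)

// phase is a hypothetical record of one compiler pass and its output.
type phase struct {
	name, html string
}

// columns merges runs of consecutive phases with identical output into
// one column, and flushes any run still pending after the last phase.
func columns(phases []phase) []string {
	var cols []string
	var pending []string // names of phases sharing the previous output
	prev := ""
	flush := func() {
		if len(pending) > 0 {
			cols = append(cols, strings.Join(pending, " + "))
			pending = nil
		}
	}
	for _, p := range phases {
		if len(pending) > 0 && p.html != prev {
			flush()
		}
		pending = append(pending, p.name)
		prev = p.html
	}
	flush() // without this, a trailing run of identical phases is lost
	return cols
}

func main() {
	fmt.Println(columns([]phase{
		{"a", "x"}, {"b", "x"}, {"c", "y"}, {"d", "y"},
	}))
}
```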
Fixes #38242
Change-Id: Id13477dcbe7b419a818bb457861b2422ba5ef4bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/227182
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Replace HTMLWriter's Logger field with a *Func. Implement Fatalf method
for HTMLWriter which gets the Frontend() from the Func and calls down
into its Fatalf method, passing the msg and args along. Replace
remaining calls to the old Logger with calls to logging methods on
the Func.
Change-Id: I966342ef9997396f3416fb152fa52d60080ebecb
Reviewed-on: https://go-review.googlesource.com/c/go/+/227277
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
CL 201783 enabled -d=checkptr when -race or -msan is specified
everywhere but windows.
But, now that all unsafe pointer conversions in the standard
library are fixed, enable -d=checkptr even on windows.
Updates #34964
Updates #34972
Change-Id: Id912fa83b0d5b46c6f1c134c742fd94d2d185835
Reviewed-on: https://go-review.googlesource.com/c/go/+/227003
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This only removes the ability to build it, and removes it as a
src/buildall.bash target (which uses go tool dist list).
Now:
$ go tool dist list | grep ^darwin
darwin/amd64
darwin/arm64
After this, what remains is removing leftover port-specific code in the
tree.
Updates #37610
Updates #37611
Change-Id: I00f03b2355c2e152f75e57abd3063be243529d2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/226985
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Before using some CPU instructions, we must check for their presence.
We use global variables in the runtime package to record features.
Prior to this CL, we issued a regular memory load for these features.
The downside to this is that, because it is a regular memory load,
it cannot be hoisted out of loops or otherwise reordered with other loads.
This CL introduces a new intrinsic just for checking cpu features.
It still ends up resulting in a memory load, but that memory load can
now be floated to the entry block and rematerialized as needed.
One downside is that the regular load could be combined with the comparison
into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE.
(It is possible that MOVBQZX+TESTQ+NE would be better.)
This CL does only amd64. It is easy to extend to other architectures.
For the benchmark in #36196, on my machine, this offers a mild speedup.
name      old time/op  new time/op  delta
FMA-8     1.39ns ± 6%  1.29ns ± 9%  -7.19%  (p=0.000 n=97+96)
NonFMA-8  2.03ns ±11%  2.04ns ±12%    ~     (p=0.618 n=99+98)
Updates #15808
Updates #36196
Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/212360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>