The current cas64 definition hard-codes the x86 behavior
of updating *old with the new value when the cas fails.
This is inconsistent with cas32 and casp.
Make it consistent.
This means that the cas64 uses will be epsilon less efficient
than they might be, because they have to do an unnecessary
memory load on x86. But so be it. Code clarity and consistency
is more important.
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/10909045
The problem happens when end=0, then end-1 is very big number.
Observed with the new scheduler.
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/7307073