1
0
mirror of https://github.com/golang/go synced 2024-11-18 04:14:49 -07:00
Commit Graph

123 Commits

Author SHA1 Message Date
Matthew Dempsky
a97ccc8940 sync/atomic: suppress checkptr errors for hammerStoreLoadPointer
This test could be updated to use unsafe.Pointer arithmetic properly
(e.g., see discussion at #34972), but it doesn't seem worthwhile. The
test is just checking that LoadPointer and StorePointer are atomic.

Updates #34972.

Change-Id: I85a8d610c1766cd63136cae686aa8a240a362a18
Reviewed-on: https://go-review.googlesource.com/c/go/+/202597
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-22 18:09:03 +00:00
Brad Fitzpatrick
03ef105dae all: remove nacl (part 3, more amd64p32)
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
        one of my grep patterns when splitting the original CL up into
        two parts.

This one might also have interesting stuff to resurrect for any future
x32 ABI support.

Updates #30439

Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-10 22:38:38 +00:00
Russ Cox
bc593eac2d sync: document implementation of Once.Do
It's not correct to use atomic.CompareAndSwap to implement Once.Do,
and we don't, but why we don't is a question that has come up
twice on golang-dev in the past few months.
Add a comment to help others with the same question.

Change-Id: Ia89ec9715cc5442c6e7f13e57a49c6cfe664d32c
Reviewed-on: https://go-review.googlesource.com/c/go/+/184261
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Ingo Oeser <nightlyone@googlemail.com>
2019-07-01 14:45:49 +00:00
Austin Clements
9caaac2c92 sync: only check for successful PopHeads in long mode
In TestPoolDequeue it's surprisingly common for the queue to stay
nearly empty the whole time and for a racing PopTail to happen in the
window between the producer doing a PushHead and doing a PopHead. In
short mode, there are only 100 PopTail attempts. On linux/amd64, it's
not uncommon for this to fail 50% of the time. On linux/arm64, it's
not uncommon for this to fail 100% of the time, causing the test to
fail.

This CL fixes this by only checking for a successful PopTail in long
mode. Long mode makes 200,000 PopTail attempts, and has never been
observed to fail.

Fixes #31422.

Change-Id: If464d55eb94fcb0b8d78fbc441d35be9f056a290
Reviewed-on: https://go-review.googlesource.com/c/go/+/183981
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2019-06-26 19:48:42 +00:00
Austin Clements
c19b3a60da sync: make TestPoolDequeue termination condition more robust
TestPoolDequeue creates P-1 consumer goroutines and 1 producer
goroutine. Currently, if a consumer goroutine pops the last value from
the dequeue, it sets a flag that stops all consumers, but the producer
also periodically pops from the dequeue and doesn't set this flag.

Hence, if the producer were to pop the last element, the consumers
will continue to run and the test won't terminate. This CL fixes this
by also setting the termination flag in the producer.

I believe it's impossible for this to happen right now because the
producer only pops after pushing an element for which j%10==0 and the
last element is either 999 or 1999999, which means it should never try
to pop after pushing the last element. However, we shouldn't depend on
this reasoning.

Change-Id: Icd2bc8d7cb9a79ebfcec99e367c8a2ba76e027d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/183980
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2019-06-26 19:48:40 +00:00
Austin Clements
93c05627f9 sync: fix pool wrap-around test
TestPoolDequeue in long mode does a little more than 1<<21 pushes.
This was originally because the head and tail indexes were 21 bits and
the intent was to test wrap-around behavior. However, in the final
version they were both 32 bits, so the test no longer tested
wrap-around.

It would take too long to reach 32-bit wrap around in a test, so
instead we initialize the poolDequeue with indexes that are already
nearly at their limit. This keeps the knowledge of the maximum index
in one place, and lets us test wrap-around even in short mode.

Change-Id: Ib9b8d85b6d9b5be235849c2b32e81c809e806579
Reviewed-on: https://go-review.googlesource.com/c/go/+/183979
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2019-06-26 19:48:39 +00:00
Russ Cox
06b0babf31 all: shorten some tests
Shorten some of the longest tests that run during all.bash.
Removes 7r 50u 21s from all.bash.

After this change, all.bash is under 5 minutes again on my laptop.

For #26473.

Change-Id: Ie0460aa935808d65460408feaed210fbaa1d5d79
Reviewed-on: https://go-review.googlesource.com/c/go/+/177559
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-05-22 12:54:00 +00:00
Kai Dong
5ccaf2c6ad sync: update comment
Comment update.

Change-Id: If0d054216f9953f42df04647b85c38008b85b026
GitHub-Last-Rev: 133b4670be
GitHub-Pull-Request: golang/go#31539
Reviewed-on: https://go-review.googlesource.com/c/go/+/172700
Reviewed-by: Austin Clements <austin@google.com>
2019-04-19 16:15:36 +00:00
Austin Clements
2dcbf8b369 sync: smooth out Pool behavior over GC with a victim cache
Currently, every Pool is cleared completely at the start of each GC.
This is a problem for heavy users of Pool because it causes an
allocation spike immediately after Pools are clear, which impacts both
throughput and latency.

This CL fixes this by introducing a victim cache mechanism. Instead of
clearing Pools, the victim cache is dropped and the primary cache is
moved to the victim cache. As a result, in steady-state, there are
(roughly) no new allocations, but if Pool usage drops, objects will
still be collected within two GCs (as opposed to one).

This victim cache approach also improves Pool's impact on GC dynamics.
The current approach causes all objects in Pools to be short lived.
However, if an application is in steady state and is just going to
repopulate its Pools, then these objects impact the live heap size *as
if* they were long lived. Since Pooled objects count as short lived
when computing the GC trigger and goal, but act as long lived objects
in the live heap, this causes GC to trigger too frequently. If Pooled
objects are a non-trivial portion of an application's heap, this
increases the CPU overhead of GC. The victim cache lets Pooled objects
affect the GC trigger and goal as long-lived objects.

This has no impact on Get/Put performance, but substantially reduces
the impact to the Pool user when a GC happens. PoolExpensiveNew
demonstrates this in the substantially reduction in the rate at which
the "New" function is called.

name                 old time/op     new time/op     delta
Pool-12                 2.21ns ±36%     2.00ns ± 0%     ~     (p=0.070 n=19+16)
PoolOverflow-12          587ns ± 1%      583ns ± 1%   -0.77%  (p=0.000 n=18+18)
PoolSTW-12              5.57µs ± 3%     4.52µs ± 4%  -18.82%  (p=0.000 n=20+19)
PoolExpensiveNew-12     3.69ms ± 7%     1.25ms ± 5%  -66.25%  (p=0.000 n=20+19)

name                 old p50-ns/STW  new p50-ns/STW  delta
PoolSTW-12               5.48k ± 2%      4.53k ± 2%  -17.32%  (p=0.000 n=20+20)

name                 old p95-ns/STW  new p95-ns/STW  delta
PoolSTW-12               6.69k ± 4%      5.13k ± 3%  -23.31%  (p=0.000 n=19+18)

name                 old GCs/op      new GCs/op      delta
PoolExpensiveNew-12       0.39 ± 1%       0.32 ± 2%  -17.95%  (p=0.000 n=18+20)

name                 old New/op      new New/op      delta
PoolExpensiveNew-12       40.0 ± 6%       12.4 ± 6%  -68.91%  (p=0.000 n=20+19)

(https://perf.golang.org/search?q=upload:20190311.2)

Fixes #22950.

Change-Id: If2e183d948c650417283076aacc20739682cdd70
Reviewed-on: https://go-review.googlesource.com/c/go/+/166961
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-04-05 18:49:08 +00:00
Austin Clements
d5fd2dd6a1 sync: use lock-free structure for Pool stealing
Currently, Pool stores each per-P shard's overflow in a slice
protected by a Mutex. In order to store to the overflow or steal from
another shard, a P must lock that shard's Mutex. This allows for
simple synchronization between Put and Get, but has unfortunate
consequences for clearing pools.

Pools are cleared during STW sweep termination, and hence rely on
pinning a goroutine to its P to synchronize between Get/Put and
clearing. This makes the Get/Put fast path extremely fast because it
can rely on quiescence-style coordination, which doesn't even require
atomic writes, much less locking.

The catch is that a goroutine cannot acquire a Mutex while pinned to
its P (as this could deadlock). Hence, it must drop the pin on the
slow path. But this means the slow path is not synchronized with
clearing. As a result,

1) It's difficult to reason about races between clearing and the slow
path. Furthermore, this reasoning often depends on unspecified nuances
of where preemption points can occur.

2) Clearing must zero out the pointer to every object in every Pool to
prevent a concurrent slow path from causing all objects to be
retained. Since this happens during STW, this has an O(# objects in
Pools) effect on STW time.

3) We can't implement a victim cache without making clearing even
slower.

This CL solves these problems by replacing the locked overflow slice
with a lock-free structure. This allows Gets and Puts to be pinned the
whole time they're manipulating the shards slice (Pool.local), which
eliminates the races between Get/Put and clearing. This, in turn,
eliminates the need to zero all object pointers, reducing clearing to
O(# of Pools) during STW.

In addition to significantly reducing STW impact, this also happens to
speed up the Get/Put fast-path and the slow path. It somewhat
increases the cost of PoolExpensiveNew, but we'll fix that in the next
CL.

name                 old time/op     new time/op     delta
Pool-12                 3.00ns ± 0%     2.21ns ±36%  -26.32%  (p=0.000 n=18+19)
PoolOverflow-12          600ns ± 1%      587ns ± 1%   -2.21%  (p=0.000 n=16+18)
PoolSTW-12              71.0µs ± 2%      5.6µs ± 3%  -92.15%  (p=0.000 n=20+20)
PoolExpensiveNew-12     3.14ms ± 5%     3.69ms ± 7%  +17.67%  (p=0.000 n=19+20)

name                 old p50-ns/STW  new p50-ns/STW  delta
PoolSTW-12               70.7k ± 1%       5.5k ± 2%  -92.25%  (p=0.000 n=20+20)

name                 old p95-ns/STW  new p95-ns/STW  delta
PoolSTW-12               73.1k ± 2%       6.7k ± 4%  -90.86%  (p=0.000 n=18+19)

name                 old GCs/op      new GCs/op      delta
PoolExpensiveNew-12       0.38 ± 1%       0.39 ± 1%   +2.07%  (p=0.000 n=20+18)

name                 old New/op      new New/op      delta
PoolExpensiveNew-12       33.9 ± 6%       40.0 ± 6%  +17.97%  (p=0.000 n=19+20)

(https://perf.golang.org/search?q=upload:20190311.1)

Fixes #22331.
For #22950.

Change-Id: Ic5cd826e25e218f3f8256dbc4d22835c1fecb391
Reviewed-on: https://go-review.googlesource.com/c/go/+/166960
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-04-05 18:49:07 +00:00
Austin Clements
59f2704dab sync: add Pool benchmarks to stress STW and reuse
This adds two benchmarks that will highlight two problems in Pool that
we're about to address.

The first benchmark measures the impact of large Pools on GC STW time.
Currently, STW time is O(# of items in Pools), and this benchmark
demonstrates 70µs STW times.

The second benchmark measures the impact of fully clearing all Pools
on each GC. Typically this is a problem in heavily-loaded systems
because it causes a spike in allocation. This benchmark stresses this
by simulating an expensive "New" function, so the cost of creating new
objects is reflected in the ns/op of the benchmark.

For #22950, #22331.

Change-Id: I0c8853190d23144026fa11837b6bf42adc461722
Reviewed-on: https://go-review.googlesource.com/c/go/+/166959
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-04-05 18:49:06 +00:00
Austin Clements
57bb7be4b1 sync: internal dynamically sized lock-free queue for sync.Pool
This adds a dynamically sized, lock-free, single-producer,
multi-consumer queue that will be used in the new Pool stealing
implementation. It's built on top of the fixed-size queue added in the
previous CL.

For #22950, #22331.

Change-Id: Ifc0ca3895bec7e7f9289ba9fb7dd0332bf96ba5a
Reviewed-on: https://go-review.googlesource.com/c/go/+/166958
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-04-05 18:49:04 +00:00
Austin Clements
2b60567002 sync: internal fixed size lock-free queue for sync.Pool
This is the first step toward fixing multiple issues with sync.Pool.
This adds a fixed size, lock-free, single-producer, multi-consumer
queue that will be used in the new Pool stealing implementation.

For #22950, #22331.

Change-Id: I50e85e3cb83a2ee71f611ada88e7f55996504bb5
Reviewed-on: https://go-review.googlesource.com/c/go/+/166957
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-04-05 18:49:03 +00:00
Carlo Alberto Ferraris
05051b56a0 sync: allow inlining the RWMutex.RUnlock fast path
RWMutex.RLock is already inlineable, so add a test for it as well.

name                    old time/op  new time/op  delta
RWMutexUncontended      66.5ns ± 0%  60.3ns ± 1%  -9.38%  (p=0.000 n=12+20)
RWMutexUncontended-4    16.7ns ± 0%  15.3ns ± 1%  -8.49%  (p=0.000 n=17+20)
RWMutexUncontended-16   7.86ns ± 0%  7.69ns ± 0%  -2.08%  (p=0.000 n=18+15)
RWMutexWrite100         25.1ns ± 0%  24.0ns ± 1%  -4.28%  (p=0.000 n=20+18)
RWMutexWrite100-4       46.7ns ± 5%  44.1ns ± 4%  -5.53%  (p=0.000 n=20+20)
RWMutexWrite100-16      68.3ns ±11%  65.7ns ± 8%  -3.81%  (p=0.003 n=20+20)
RWMutexWrite10          26.7ns ± 1%  25.7ns ± 0%  -3.75%  (p=0.000 n=17+14)
RWMutexWrite10-4        34.9ns ± 2%  33.8ns ± 2%  -3.15%  (p=0.000 n=20+20)
RWMutexWrite10-16       37.4ns ± 2%  36.1ns ± 2%  -3.51%  (p=0.000 n=18+20)
RWMutexWorkWrite100      163ns ± 0%   162ns ± 0%  -0.89%  (p=0.000 n=18+20)
RWMutexWorkWrite100-4    189ns ± 4%   184ns ± 4%  -2.89%  (p=0.000 n=19+20)
RWMutexWorkWrite100-16   207ns ± 4%   200ns ± 2%  -3.07%  (p=0.000 n=19+20)
RWMutexWorkWrite10       153ns ± 0%   151ns ± 1%  -0.75%  (p=0.000 n=20+20)
RWMutexWorkWrite10-4     177ns ± 1%   176ns ± 2%  -0.63%  (p=0.004 n=17+20)
RWMutexWorkWrite10-16    191ns ± 2%   189ns ± 1%  -0.83%  (p=0.015 n=20+17)

linux/amd64 bin/go 14688201 (previous commit 14675861, +12340/+0.08%)

The cumulative effect of this and the previous 3 commits is:

name                    old time/op  new time/op  delta
MutexUncontended        19.3ns ± 1%  16.4ns ± 1%  -15.13%  (p=0.000 n=20+20)
MutexUncontended-4      5.24ns ± 0%  4.09ns ± 0%  -21.95%  (p=0.000 n=20+18)
MutexUncontended-16     2.10ns ± 0%  2.12ns ± 0%   +0.95%  (p=0.000 n=15+17)
Mutex                   19.6ns ± 0%  16.3ns ± 1%  -17.12%  (p=0.000 n=20+20)
Mutex-4                 54.6ns ± 5%  45.6ns ±10%  -16.51%  (p=0.000 n=20+19)
Mutex-16                 133ns ± 5%   130ns ± 3%   -1.99%  (p=0.002 n=20+20)
MutexSlack              33.4ns ± 2%  16.2ns ± 0%  -51.44%  (p=0.000 n=19+20)
MutexSlack-4             206ns ± 5%   209ns ± 9%     ~     (p=0.154 n=20+20)
MutexSlack-16           89.4ns ± 1%  90.9ns ± 2%   +1.70%  (p=0.000 n=18+17)
MutexWork               60.5ns ± 0%  55.3ns ± 1%   -8.59%  (p=0.000 n=12+20)
MutexWork-4              105ns ± 5%    97ns ±11%   -7.95%  (p=0.000 n=20+20)
MutexWork-16             157ns ± 1%   158ns ± 1%   +0.66%  (p=0.001 n=18+17)
MutexWorkSlack          70.2ns ± 5%  55.3ns ± 0%  -21.30%  (p=0.000 n=19+18)
MutexWorkSlack-4         277ns ±13%   260ns ±15%   -6.35%  (p=0.002 n=20+18)
MutexWorkSlack-16        156ns ± 0%   146ns ± 1%   -6.40%  (p=0.000 n=16+19)
MutexNoSpin              966ns ± 0%   976ns ± 1%   +0.97%  (p=0.000 n=15+17)
MutexNoSpin-4            269ns ± 4%   272ns ± 4%   +1.15%  (p=0.048 n=20+18)
MutexNoSpin-16           122ns ± 0%   119ns ± 1%   -2.63%  (p=0.000 n=19+15)
MutexSpin               3.13µs ± 0%  3.12µs ± 0%   -0.17%  (p=0.000 n=18+18)
MutexSpin-4              826ns ± 1%   833ns ± 1%   +0.84%  (p=0.000 n=19+17)
MutexSpin-16             397ns ± 1%   394ns ± 1%   -0.78%  (p=0.000 n=19+19)
Once                    5.67ns ± 0%  2.07ns ± 2%  -63.43%  (p=0.000 n=20+20)
Once-4                  1.47ns ± 2%  0.54ns ± 3%  -63.49%  (p=0.000 n=19+20)
Once-16                 0.58ns ± 0%  0.17ns ± 5%  -70.49%  (p=0.000 n=17+17)
RWMutexUncontended      71.4ns ± 0%  60.3ns ± 1%  -15.60%  (p=0.000 n=16+20)
RWMutexUncontended-4    18.4ns ± 4%  15.3ns ± 1%  -17.14%  (p=0.000 n=20+20)
RWMutexUncontended-16   8.01ns ± 0%  7.69ns ± 0%   -3.91%  (p=0.000 n=18+15)
RWMutexWrite100         24.9ns ± 0%  24.0ns ± 1%   -3.57%  (p=0.000 n=19+18)
RWMutexWrite100-4       46.5ns ± 3%  44.1ns ± 4%   -5.09%  (p=0.000 n=17+20)
RWMutexWrite100-16      68.9ns ± 3%  65.7ns ± 8%   -4.65%  (p=0.000 n=18+20)
RWMutexWrite10          27.1ns ± 0%  25.7ns ± 0%   -5.25%  (p=0.000 n=17+14)
RWMutexWrite10-4        34.8ns ± 1%  33.8ns ± 2%   -2.96%  (p=0.000 n=20+20)
RWMutexWrite10-16       37.5ns ± 2%  36.1ns ± 2%   -3.72%  (p=0.000 n=20+20)
RWMutexWorkWrite100      164ns ± 0%   162ns ± 0%   -1.49%  (p=0.000 n=12+20)
RWMutexWorkWrite100-4    186ns ± 3%   184ns ± 4%     ~     (p=0.097 n=20+20)
RWMutexWorkWrite100-16   204ns ± 2%   200ns ± 2%   -1.58%  (p=0.000 n=18+20)
RWMutexWorkWrite10       153ns ± 0%   151ns ± 1%   -1.21%  (p=0.000 n=20+20)
RWMutexWorkWrite10-4     179ns ± 1%   176ns ± 2%   -1.25%  (p=0.000 n=19+20)
RWMutexWorkWrite10-16    191ns ± 1%   189ns ± 1%   -0.94%  (p=0.000 n=15+17)

Change-Id: I9269bf2ac42a04c610624f707d3268dcb17390f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/152698
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-03-09 16:34:17 +00:00
Carlo Alberto Ferraris
ca8354843e sync: allow inlining the Once.Do fast path
Using Once.Do is now extremely cheap because the fast path is just an inlined
atomic load of a variable that is written only once and a conditional jump.
This is very beneficial for Once.Do because, due to its nature, the fast path
will be used for every call after the first one.

In a attempt to mimize code size increase, reorder the fields so that the
pointer to Once is also the pointer to Once.done, that is the only field used
in the hot path. This allows to use more compact instruction encodings or less
instructions in the hot path (that is inlined at every callsite).

name     old time/op  new time/op  delta
Once     4.54ns ± 0%  2.06ns ± 0%  -54.59%  (p=0.000 n=19+16)
Once-4   1.18ns ± 0%  0.55ns ± 0%  -53.39%  (p=0.000 n=15+16)
Once-16  0.53ns ± 0%  0.17ns ± 0%  -67.92%  (p=0.000 n=18+17)

linux/amd64 bin/go 14675861 (previous commit 14663387, +12474/+0.09%)

Change-Id: Ie2708103ab473787875d66746d2f20f1d90a6916
Reviewed-on: https://go-review.googlesource.com/c/go/+/152697
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-03-09 05:58:31 +00:00
Carlo Alberto Ferraris
41cb0aedff sync: allow inlining the Mutex.Lock fast path
name                    old time/op  new time/op  delta
MutexUncontended        18.9ns ± 0%  16.2ns ± 0%  -14.29%  (p=0.000 n=19+19)
MutexUncontended-4      4.75ns ± 1%  4.08ns ± 0%  -14.20%  (p=0.000 n=20+19)
MutexUncontended-16     2.05ns ± 0%  2.11ns ± 0%   +2.93%  (p=0.000 n=19+16)
Mutex                   19.3ns ± 1%  16.2ns ± 0%  -15.86%  (p=0.000 n=17+19)
Mutex-4                 52.4ns ± 4%  48.6ns ± 9%   -7.22%  (p=0.000 n=20+20)
Mutex-16                 139ns ± 2%   140ns ± 3%   +1.03%  (p=0.011 n=16+20)
MutexSlack              18.9ns ± 1%  16.2ns ± 1%  -13.96%  (p=0.000 n=20+20)
MutexSlack-4             225ns ± 8%   211ns ±10%   -5.94%  (p=0.000 n=18+19)
MutexSlack-16           98.4ns ± 1%  90.9ns ± 1%   -7.60%  (p=0.000 n=17+18)
MutexWork               58.2ns ± 3%  55.4ns ± 0%   -4.82%  (p=0.000 n=20+17)
MutexWork-4              103ns ± 7%    95ns ±18%   -8.03%  (p=0.000 n=20+20)
MutexWork-16             163ns ± 2%   155ns ± 2%   -4.47%  (p=0.000 n=18+18)
MutexWorkSlack          57.7ns ± 1%  55.4ns ± 0%   -3.99%  (p=0.000 n=20+13)
MutexWorkSlack-4         276ns ±13%   260ns ±10%   -5.64%  (p=0.001 n=19+19)
MutexWorkSlack-16        147ns ± 0%   156ns ± 1%   +5.87%  (p=0.000 n=14+19)
MutexNoSpin              968ns ± 0%   900ns ± 1%   -6.98%  (p=0.000 n=20+18)
MutexNoSpin-4            270ns ± 2%   255ns ± 2%   -5.74%  (p=0.000 n=19+20)
MutexNoSpin-16           120ns ± 4%   112ns ± 0%   -6.99%  (p=0.000 n=19+14)
MutexSpin               3.13µs ± 1%  3.19µs ± 6%     ~     (p=0.401 n=20+20)
MutexSpin-4              832ns ± 2%   831ns ± 1%   -0.17%  (p=0.023 n=16+18)
MutexSpin-16             395ns ± 0%   399ns ± 0%   +0.94%  (p=0.000 n=17+19)
RWMutexUncontended      69.5ns ± 0%  68.4ns ± 0%   -1.59%  (p=0.000 n=20+20)
RWMutexUncontended-4    17.5ns ± 0%  16.7ns ± 0%   -4.30%  (p=0.000 n=18+17)
RWMutexUncontended-16   7.92ns ± 0%  7.87ns ± 0%   -0.61%  (p=0.000 n=18+17)
RWMutexWrite100         24.9ns ± 1%  25.0ns ± 1%   +0.32%  (p=0.000 n=20+20)
RWMutexWrite100-4       46.2ns ± 4%  46.2ns ± 5%     ~     (p=0.840 n=19+20)
RWMutexWrite100-16      69.9ns ± 5%  69.9ns ± 3%     ~     (p=0.545 n=20+19)
RWMutexWrite10          27.0ns ± 2%  26.8ns ± 2%   -0.98%  (p=0.001 n=20+20)
RWMutexWrite10-4        34.7ns ± 2%  35.0ns ± 4%     ~     (p=0.191 n=18+20)
RWMutexWrite10-16       37.2ns ± 4%  37.3ns ± 2%     ~     (p=0.438 n=20+19)
RWMutexWorkWrite100      164ns ± 0%   163ns ± 0%   -0.24%  (p=0.025 n=20+20)
RWMutexWorkWrite100-4    193ns ± 3%   191ns ± 2%   -1.06%  (p=0.027 n=20+20)
RWMutexWorkWrite100-16   210ns ± 3%   207ns ± 3%   -1.22%  (p=0.038 n=20+20)
RWMutexWorkWrite10       153ns ± 0%   153ns ± 0%     ~     (all equal)
RWMutexWorkWrite10-4     178ns ± 2%   179ns ± 2%     ~     (p=0.186 n=20+20)
RWMutexWorkWrite10-16    192ns ± 2%   192ns ± 2%     ~     (p=0.731 n=19+20)

linux/amd64 bin/go 14663387 (previous commit 14630572, +32815/+0.22%)

Change-Id: I98171006dce14069b1a62da07c3d165455a7906b
Reviewed-on: https://go-review.googlesource.com/c/go/+/148959
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-09 05:08:04 +00:00
Carlo Alberto Ferraris
4c3f26076b sync: allow inlining the Mutex.Unlock fast path
Make use of the newly-enabled limited midstack inlining.
Similar changes will be done in followup CLs.

name                    old time/op  new time/op  delta
MutexUncontended        19.3ns ± 1%  18.9ns ± 0%   -1.92%  (p=0.000 n=20+19)
MutexUncontended-4      5.24ns ± 0%  4.75ns ± 1%   -9.25%  (p=0.000 n=20+20)
MutexUncontended-16     2.10ns ± 0%  2.05ns ± 0%   -2.38%  (p=0.000 n=15+19)
Mutex                   19.6ns ± 0%  19.3ns ± 1%   -1.92%  (p=0.000 n=20+17)
Mutex-4                 54.6ns ± 5%  52.4ns ± 4%   -4.09%  (p=0.000 n=20+20)
Mutex-16                 133ns ± 5%   139ns ± 2%   +4.23%  (p=0.000 n=20+16)
MutexSlack              33.4ns ± 2%  18.9ns ± 1%  -43.56%  (p=0.000 n=19+20)
MutexSlack-4             206ns ± 5%   225ns ± 8%   +9.12%  (p=0.000 n=20+18)
MutexSlack-16           89.4ns ± 1%  98.4ns ± 1%  +10.10%  (p=0.000 n=18+17)
MutexWork               60.5ns ± 0%  58.2ns ± 3%   -3.75%  (p=0.000 n=12+20)
MutexWork-4              105ns ± 5%   103ns ± 7%   -1.68%  (p=0.007 n=20+20)
MutexWork-16             157ns ± 1%   163ns ± 2%   +3.90%  (p=0.000 n=18+18)
MutexWorkSlack          70.2ns ± 5%  57.7ns ± 1%  -17.81%  (p=0.000 n=19+20)
MutexWorkSlack-4         277ns ±13%   276ns ±13%     ~     (p=0.682 n=20+19)
MutexWorkSlack-16        156ns ± 0%   147ns ± 0%   -5.62%  (p=0.000 n=16+14)
MutexNoSpin              966ns ± 0%   968ns ± 0%   +0.11%  (p=0.029 n=15+20)
MutexNoSpin-4            269ns ± 4%   270ns ± 2%     ~     (p=0.807 n=20+19)
MutexNoSpin-16           122ns ± 0%   120ns ± 4%   -1.63%  (p=0.000 n=19+19)
MutexSpin               3.13µs ± 0%  3.13µs ± 1%   +0.16%  (p=0.004 n=18+20)
MutexSpin-4              826ns ± 1%   832ns ± 2%   +0.74%  (p=0.000 n=19+16)
MutexSpin-16             397ns ± 1%   395ns ± 0%   -0.50%  (p=0.000 n=19+17)
RWMutexUncontended      71.4ns ± 0%  69.5ns ± 0%   -2.72%  (p=0.000 n=16+20)
RWMutexUncontended-4    18.4ns ± 4%  17.5ns ± 0%   -4.92%  (p=0.000 n=20+18)
RWMutexUncontended-16   8.01ns ± 0%  7.92ns ± 0%   -1.15%  (p=0.000 n=18+18)
RWMutexWrite100         24.9ns ± 0%  24.9ns ± 1%     ~     (p=0.099 n=19+20)
RWMutexWrite100-4       46.5ns ± 3%  46.2ns ± 4%     ~     (p=0.253 n=17+19)
RWMutexWrite100-16      68.9ns ± 3%  69.9ns ± 5%   +1.46%  (p=0.012 n=18+20)
RWMutexWrite10          27.1ns ± 0%  27.0ns ± 2%     ~     (p=0.128 n=17+20)
RWMutexWrite10-4        34.8ns ± 1%  34.7ns ± 2%     ~     (p=0.180 n=20+18)
RWMutexWrite10-16       37.5ns ± 2%  37.2ns ± 4%   -0.89%  (p=0.023 n=20+20)
RWMutexWorkWrite100      164ns ± 0%   164ns ± 0%     ~     (p=0.106 n=12+20)
RWMutexWorkWrite100-4    186ns ± 3%   193ns ± 3%   +3.46%  (p=0.000 n=20+20)
RWMutexWorkWrite100-16   204ns ± 2%   210ns ± 3%   +2.96%  (p=0.000 n=18+20)
RWMutexWorkWrite10       153ns ± 0%   153ns ± 0%   -0.20%  (p=0.017 n=20+19)
RWMutexWorkWrite10-4     179ns ± 1%   178ns ± 2%     ~     (p=0.215 n=19+20)
RWMutexWorkWrite10-16    191ns ± 1%   192ns ± 2%     ~     (p=0.166 n=15+19)

linux/amd64 bin/go 14630572 (previous commit 14605947, +24625/+0.17%)

Change-Id: I3f9d1765801fe0b8deb1bc2728b8bba8a7508e23
Reviewed-on: https://go-review.googlesource.com/c/go/+/148958
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-05 14:59:31 +00:00
Ian Lance Taylor
ca7c12d4c9 sync/atomic: add 32-bit MIPS to the 64-bit alignment requirement
runtime/internal/atomic/atomic_mipsx.go enforces 64-bit alignment.

Change-Id: Ifdc36e1c0322827711425054d10f1c52425a13fa
Reviewed-on: https://go-review.googlesource.com/c/161697
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-02-09 23:07:07 +00:00
Brad Fitzpatrick
3813edf26e all: use "reports whether" consistently in the few places that didn't
Go documentation style for boolean funcs is to say:

    // Foo reports whether ...
    func Foo() bool

(rather than "returns true if")

This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.

(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)

Created with:

$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)

Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-02 22:47:58 +00:00
Roberto
963776e689 sync: fix typo in doc
Change-Id: Ie1f35c7598bd2549a048d64e1b1279bf4acaa103
GitHub-Last-Rev: c8cc7dfef9
GitHub-Pull-Request: golang/go#28051
Reviewed-on: https://go-review.googlesource.com/c/140302
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-10-06 12:04:57 +00:00
Ian Lance Taylor
756c352963 sync: simplify (*entry).tryStore
The only change to the go build -gcflags=-m=2 output was to remove
these two lines:

sync/map.go:178:26: &e.p escapes to heap
sync/map.go:178:26: 	from &e.p (passed to call[argument escapes]) at sync/map.go:178:25

Benchstat report for sync.Map benchmarks:

name                                            old time/op  new time/op  delta
LoadMostlyHits/*sync_test.DeepCopyMap-12        10.6ns ±11%  10.2ns ± 3%    ~     (p=0.299 n=10+8)
LoadMostlyHits/*sync_test.RWMutexMap-12         54.6ns ± 3%  54.6ns ± 2%    ~     (p=0.782 n=10+10)
LoadMostlyHits/*sync.Map-12                     10.1ns ± 1%  10.1ns ± 1%    ~     (p=1.127 n=10+8)
LoadMostlyMisses/*sync_test.DeepCopyMap-12      8.65ns ± 1%  8.77ns ± 5%  +1.39%  (p=0.017 n=9+10)
LoadMostlyMisses/*sync_test.RWMutexMap-12       53.6ns ± 2%  53.8ns ± 2%    ~     (p=0.408 n=10+9)
LoadMostlyMisses/*sync.Map-12                   7.37ns ± 1%  7.46ns ± 1%  +1.19%  (p=0.000 n=9+10)
LoadOrStoreBalanced/*sync_test.RWMutexMap-12     895ns ± 4%   906ns ± 3%    ~     (p=0.203 n=9+10)
LoadOrStoreBalanced/*sync.Map-12                 872ns ±10%   804ns ±12%  -7.75%  (p=0.014 n=10+10)
LoadOrStoreUnique/*sync_test.RWMutexMap-12      1.29µs ± 2%  1.28µs ± 1%    ~     (p=0.586 n=10+9)
LoadOrStoreUnique/*sync.Map-12                  1.30µs ± 7%  1.40µs ± 2%  +6.95%  (p=0.000 n=9+10)
LoadOrStoreCollision/*sync_test.DeepCopyMap-12  6.98ns ± 1%  6.91ns ± 1%  -1.10%  (p=0.000 n=10+10)
LoadOrStoreCollision/*sync_test.RWMutexMap-12    371ns ± 1%   372ns ± 2%    ~     (p=0.679 n=9+9)
LoadOrStoreCollision/*sync.Map-12               5.49ns ± 1%  5.49ns ± 1%    ~     (p=0.732 n=9+10)
Range/*sync_test.DeepCopyMap-12                 2.49µs ± 1%  2.50µs ± 0%    ~     (p=0.148 n=10+10)
Range/*sync_test.RWMutexMap-12                  54.7µs ± 1%  54.6µs ± 3%    ~     (p=0.549 n=9+10)
Range/*sync.Map-12                              2.74µs ± 1%  2.76µs ± 1%  +0.68%  (p=0.011 n=10+8)
AdversarialAlloc/*sync_test.DeepCopyMap-12      2.52µs ± 5%  2.54µs ± 7%    ~     (p=0.225 n=10+10)
AdversarialAlloc/*sync_test.RWMutexMap-12        108ns ± 1%   107ns ± 1%    ~     (p=0.101 n=10+9)
AdversarialAlloc/*sync.Map-12                    712ns ± 2%   714ns ± 3%    ~     (p=0.984 n=8+10)
AdversarialDelete/*sync_test.DeepCopyMap-12      581ns ± 3%   578ns ± 3%    ~     (p=0.781 n=9+9)
AdversarialDelete/*sync_test.RWMutexMap-12       126ns ± 2%   126ns ± 1%    ~     (p=0.883 n=10+10)
AdversarialDelete/*sync.Map-12                   155ns ± 8%   158ns ± 2%    ~     (p=0.158 n=10+9)

Change-Id: I1ed8e3109baca03087d0fad3df769fc7e38f6dbb
Reviewed-on: https://go-review.googlesource.com/137441
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-09-27 21:44:20 +00:00
Dan Kortschak
c2eba53e7f cmd/vet,sync: check lock values more precisely
Fixes #26165

Change-Id: I1f3bd193af9b6f8461c736330952b6e50d3e00d9
Reviewed-on: https://go-review.googlesource.com/121876
Reviewed-by: Alan Donovan <adonovan@google.com>
Run-TryBot: Rob Pike <r@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-07-14 06:48:21 +00:00
Dmitry Vyukov
f451a318bb sync: fix deficiency in RWMutex race annotations
Remove unnecessary race.Release annotation from Unlock.

For RWMutex we want to establish the following happens-before (HB) edges:
1. Between Unlock and the subsequent Lock.
2. Between Unlock and the subsequent RLock.
3. Between batch of RUnlock's and the subsequent Lock.

1 is provided by Release(&rw.readerSem) in Unlock and Acquire(&rw.readerSem) in Lock.
2 is provided by Release(&rw.readerSem) in Unlock and Acquire(&rw.readerSem) in RLock.
3 is provided by ReleaseMerge(&rw.writerSem) in RUnlock in Acquire(&rw.writerSem) in Lock,
since we want to establish HB between a batch of RUnlock's this uses ReleaseMerge instead of Release.

Release(&rw.writerSem) in Unlock is simply not needed.

FWIW this is also how C++ tsan handles mutexes, not a proof but at least something.
Making 2 implementations consistent also simplifies any kind of reasoning against both of them.

Since this only affects performance, a reasonable test is not possible.
Everything should just continue to work but slightly faster.

Credit for discovering this goes to Jamie Liu.

Change-Id: Ice37d29ecb7a5faed3f7781c38dd32c7469b2735
Reviewed-on: https://go-review.googlesource.com/120495
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-06-22 14:43:09 +00:00
Michael Munday
b7f3c178a3 sync: deflake TestWaitGroupMisuse2
We need to yield to the runtime every now and again to avoid
deadlock. This doesn't show up on most machines because the test
only runs when you have 5 or more CPUs.

Fixes #20072.

Change-Id: Ibf5ed370e919943395f3418487188df0b2be160b
Reviewed-on: https://go-review.googlesource.com/112978
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-05-14 04:42:45 +00:00
Cherry Zhang
1b6fec862c sync/atomic: redirect many functions to runtime/internal/atomic
The implementation of atomics are inherently tricky. It would
be good to have them implemented in a single place, instead of
multiple copies.

Mostly a simple redirect.

On 386, some functions in sync/atomic have better implementations,
which are moved to runtime/internal/atomic.

On ARM, some functions in sync/atomic have better implementations.
They are dropped by this CL, but restored with an improved
version in a follow-up CL. On linux/arm, 64-bit CAS kernel helper
is dropped, as we're trying to move away from kernel helpers.

Fixes #23778.

Change-Id: Icb9e1039acc92adbb2a371c34baaf0b79551c3ea
Reviewed-on: https://go-review.googlesource.com/93637
Reviewed-by: Austin Clements <austin@google.com>
2018-05-03 21:35:01 +00:00
Russ Cox
eca7a1343c sync: hide test of misuse of Cond from vet
The test wants to check that copies of Cond are detected at runtime.
Make a copy that isn't detected by vet at compile time.

Change-Id: I933ab1003585f75ba96723563107f1ba8126cb72
Reviewed-on: https://go-review.googlesource.com/108557
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-04-25 02:49:46 +00:00
Kevin Burke
f3962b1849 sync/atomic: use package prefix in examples
Previously these examples declared "var v Value" but any caller would
need to write "var v atomic.Value", so we should use the external
package declaration form to avoid confusion about where Value comes
from.

Change-Id: Ic0b1a05fb6b700da61cfc8efca594c49a9bedb69
Reviewed-on: https://go-review.googlesource.com/107975
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-18 23:58:54 +00:00
Yuval Pavel Zholkover
0a4b962c17 runtime/internal/atomic: don't use Cas in atomic.Load on ARM
Instead issue a memory barrier on ARMv7 after reading the address.

Fixes #23777

Change-Id: I7aff2ab0246af64b437ebe0b31d4b30d351890d8
Reviewed-on: https://go-review.googlesource.com/94275
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-18 15:34:14 +00:00
Yuval Pavel Zholkover
0a5be12f5c cmd/internal/obj/arm: add DMB instruction
Change-Id: Ib67a61d5b37af210ff15d60d72bd5238b9c2d0ca
Reviewed-on: https://go-review.googlesource.com/94815
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-27 19:54:44 +00:00
Carlos Eduardo Seo
6633bb2aa7 cmd/compile/internal/ppc64, runtime internal/atomic, sync/atomic: implement faster atomics for ppc64x
This change implements faster atomics for ppc64x based on the ISA 2.07B,
Appendix B.2 recommendations, replacing SYNC/ISYNC by LWSYNC in some
cases.

Updates #21348

name                                           old time/op new time/op    delta
Cond1-16                                           955ns     856ns      -10.33%
Cond2-16                                          2.38µs    2.03µs      -14.59%
Cond4-16                                          5.90µs    5.44µs       -7.88%
Cond8-16                                          12.1µs    11.1µs       -8.42%
Cond16-16                                         27.0µs    25.1µs       -7.04%
Cond32-16                                         59.1µs    55.5µs       -6.14%
LoadMostlyHits/*sync_test.DeepCopyMap-16          22.1ns    24.1ns       +9.02%
LoadMostlyHits/*sync_test.RWMutexMap-16            252ns     249ns       -1.20%
LoadMostlyHits/*sync.Map-16                       16.2ns    16.3ns         ~
LoadMostlyMisses/*sync_test.DeepCopyMap-16        22.3ns    22.6ns         ~
LoadMostlyMisses/*sync_test.RWMutexMap-16          249ns     247ns       -0.51%
LoadMostlyMisses/*sync.Map-16                     12.7ns    12.7ns         ~
LoadOrStoreBalanced/*sync_test.RWMutexMap-16      1.27µs    1.17µs       -7.54%
LoadOrStoreBalanced/*sync.Map-16                  1.12µs    1.10µs       -2.35%
LoadOrStoreUnique/*sync_test.RWMutexMap-16        1.75µs    1.68µs       -3.84%
LoadOrStoreUnique/*sync.Map-16                    2.07µs    1.97µs       -5.13%
LoadOrStoreCollision/*sync_test.DeepCopyMap-16    15.8ns    15.9ns         ~
LoadOrStoreCollision/*sync_test.RWMutexMap-16      496ns     424ns      -14.48%
LoadOrStoreCollision/*sync.Map-16                 6.07ns    6.07ns         ~
Range/*sync_test.DeepCopyMap-16                   1.65µs    1.64µs         ~
Range/*sync_test.RWMutexMap-16                     278µs     288µs       +3.75%
Range/*sync.Map-16                                2.00µs    2.01µs         ~
AdversarialAlloc/*sync_test.DeepCopyMap-16        3.45µs    3.44µs         ~
AdversarialAlloc/*sync_test.RWMutexMap-16          226ns     227ns         ~
AdversarialAlloc/*sync.Map-16                     1.09µs    1.07µs       -2.36%
AdversarialDelete/*sync_test.DeepCopyMap-16        553ns     550ns       -0.57%
AdversarialDelete/*sync_test.RWMutexMap-16         273ns     274ns         ~
AdversarialDelete/*sync.Map-16                     247ns     249ns         ~
UncontendedSemaphore-16                           79.0ns    65.5ns      -17.11%
ContendedSemaphore-16                              112ns      97ns      -13.77%
MutexUncontended-16                               3.34ns    2.51ns      -24.69%
Mutex-16                                           266ns     191ns      -28.26%
MutexSlack-16                                      226ns     159ns      -29.55%
MutexWork-16                                       377ns     338ns      -10.14%
MutexWorkSlack-16                                  335ns     308ns       -8.20%
MutexNoSpin-16                                     196ns     184ns       -5.91%
MutexSpin-16                                       710ns     666ns       -6.21%
Once-16                                           1.29ns    1.29ns         ~
Pool-16                                           8.64ns    8.71ns         ~
PoolOverflow-16                                   1.60µs    1.44µs      -10.25%
SemaUncontended-16                                5.39ns    4.42ns      -17.96%
SemaSyntNonblock-16                                539ns     483ns      -10.42%
SemaSyntBlock-16                                   413ns     354ns      -14.20%
SemaWorkNonblock-16                                305ns     258ns      -15.36%
SemaWorkBlock-16                                   266ns     229ns      -14.06%
RWMutexUncontended-16                             12.9ns     9.7ns      -24.80%
RWMutexWrite100-16                                 203ns     147ns      -27.47%
RWMutexWrite10-16                                  177ns     119ns      -32.74%
RWMutexWorkWrite100-16                             435ns     403ns       -7.39%
RWMutexWorkWrite10-16                              642ns     611ns       -4.79%
WaitGroupUncontended-16                           4.67ns    3.70ns      -20.92%
WaitGroupAddDone-16                                402ns     355ns      -11.54%
WaitGroupAddDoneWork-16                            208ns     250ns      +20.09%
WaitGroupWait-16                                  1.21ns    1.21ns         ~
WaitGroupWaitWork-16                              5.91ns    5.87ns       -0.81%
WaitGroupActuallyWait-16                          92.2ns    85.8ns       -6.91%

Updates #21348

Change-Id: Ibb9b271d11b308264103829e176c6d9fe8f867d3
Reviewed-on: https://go-review.googlesource.com/95175
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-03-22 14:13:01 +00:00
isharipo
ff5cf43df5 runtime,sync/atomic: replace asm BYTEs with insts for x86
For each replacement, test case is added to new 386enc.s file
with exception of EMMS, SYSENTER, MFENCE and LFENCE as they
are already covered in amd64enc.s (same on amd64 and 386).

The replacement became less obvious after go vet suggested changes
Before:
	BYTE $0x0f; BYTE $0x7f; BYTE $0x44; BYTE $0x24; BYTE $0x08
Changed to MOVQ (this form is being tested):
	MOVQ M0, 8(SP)
Refactored to FP-relative access (go vet advice):
	MOVQ M0, val+4(FP)

Change-Id: I56b87cf3371b6ad81ad0cd9db2033aee407b5818
Reviewed-on: https://go-review.googlesource.com/101475
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-03-21 20:51:04 +00:00
Diogo Pinela
9ff7df003d sync: make WaitGroup more space-efficient
The struct stores its 64-bit state field in a 12-byte array to
ensure that it can be 64-bit-aligned. This leaves 4 spare bytes,
which we can reuse to store the sema field.

(32-bit alignment is still guaranteed because the array type was
changed to [3]uint32.)

Fixes #19149.

Change-Id: I9bc20e69e45e0e07fbf496080f3650e8be0d6e8d
Reviewed-on: https://go-review.googlesource.com/100515
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-15 09:56:25 +00:00
Lorenz Bauer
88ba645827 sync: enable profiling of RWMutex
Include reader / writer interactions of RWMutex in the mutex profile.
Writer contention is already included in the profile, since a plain Mutex
is used to control exclusion.

Fixes #18496

Change-Id: Ib0dc1ffa0fd5e6d964a6f7764d7f09556eb63f00
Reviewed-on: https://go-review.googlesource.com/87095
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Peter Weinberger <pjw@google.com>
2018-02-14 15:38:42 +00:00
Austin Clements
a046caa1e8 runtime, sync/atomic: use NOFRAME on arm
This replaces frame size -4 with the NOFRAME flag in arm assembly.

This was automated with:

sed -i -e 's/\(^TEXT.*[A-Z]\),\( *\)\$-4/\1|NOFRAME,\2$0/' $(find -name '*_arm.s')

Plus three manual comment changes found by:

grep '\$-4' $(find -name '*_arm.s')

The go binary is identical before and after this change.

Change-Id: I0310384d1a584118c41d1cd3a042bb8ea7227ef9
Reviewed-on: https://go-review.googlesource.com/92042
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-12 21:41:30 +00:00
Brad Fitzpatrick
165e7523fb sync: consistently use article "a" for RWMutex
We used a mix of both before.

I've never heard anybody say "an arr-double you mutex" when speaking.

Fixes #23457

Change-Id: I802b5eb2339f885ca9d24607eeda565763165298
Reviewed-on: https://go-review.googlesource.com/87896
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2018-01-16 23:09:57 +00:00
Russ Cox
3f150934e2 sync: document when and when not to use Map
Fixes #21587.

Change-Id: I47eb181d65da67a3b530c7f8acac9c0c619ea474
Reviewed-on: https://go-review.googlesource.com/83796
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-01-04 20:13:20 +00:00
Rhys Hiltner
f72196647e sync: throw, not panic, for unlock of unlocked mutex
This was originally done in https://golang.org/cl/31359 but partially
undone (apparently unintentionally) in https://golang.org/cl/34310

Fix it, and update tests to ensure the error is unrecoverable.

Fixes #23039

Change-Id: I923ebd613a05e67d8acce77f4a68c64c8574faa6
Reviewed-on: https://go-review.googlesource.com/82656
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2017-12-08 13:40:21 +00:00
Russ Cox
0c0c3c186b sync/atomic: remove noCopy from Value
Values must not be copied after the first use.

Using noCopy makes vet complain about copies
even before the first use, which is incorrect
and very frustrating.

Drop it.

Fixes #21504.

Change-Id: Icd3a5ac3fe11e84525b998e848ed18a5d996f45a
Reviewed-on: https://go-review.googlesource.com/80836
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-12-01 16:38:53 +00:00
Leigh McCulloch
8db19a4966 all: change github.com issue links to golang.org
The go repository contains a mix of github.com/golang/go/issues/xxxxx
and golang.org/issues/xxxxx URLs for references to issues in the issue
tracker. We should use one for consistency, and golang.org is preferred
in case the project moves the issue tracker in the future.

This reasoning is taken from a comment Sam Whited left on a CL I
recently opened: https://go-review.googlesource.com/c/go/+/73890.

In that CL I referenced an issue using its github.com URL, because other
tests in the file I was changing contained references to issues using
their github.com URL. Sam Whited left a comment on the CL stating I
should change it to the golang.org URL.

If new code is intended to reference issues via golang.org and not
github.com, existing code should be updated so that precedence exists
for contributors who are looking at the existing code as a guide for the
code they should write.

Change-Id: I3b9053fe38a1c56fc101a8b7fd7b8f310ba29724
Reviewed-on: https://go-review.googlesource.com/75673
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-11-04 04:13:41 +00:00
Elias Naur
9735fcfce7 sync/atomic: add memory barriers to Load/StoreInt32 on darwin/arm
After switching to an iPhone 5 for the darwin/arm builds,
TestStoreLoadRelAcq32 started to timeout on every builder run.
Adding the same memory barriers as armLoadUint64 and armStoreUint64
makes the test complete successfully.

Fixes sync/atomic tests on the darwin/arm builder.

Change-Id: Id73de31679304e259bdbd7f2f94383ae7fd70ee4
Reviewed-on: https://go-review.googlesource.com/67390
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2017-10-02 09:57:23 +00:00
Daniel Martí
fbc8973a6b all: join some chained ifs to unindent code
Found with mvdan.cc/unindent. It skipped the cases where parentheses
would need to be added, where comments would have to be moved elsewhere,
or where actions and simple logic would mix.

One of them was of the form "err != nil && err == io.EOF", so the first
part was removed.

Change-Id: Ie504c2b03a2c87d10ecbca1b9270069be1171b91
Reviewed-on: https://go-review.googlesource.com/57690
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-29 20:57:41 +00:00
Daniel Martí
99da8730b0 all: remove some double spaces from comments
Went mainly for the ones that make no sense, such as the ones
mid-sentence or after commas.

Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0
Reviewed-on: https://go-review.googlesource.com/57293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-26 15:09:09 +00:00
Tom Levy
b0ba0b49a0 sync/atomic: remove references to old atomic pointer hammer tests
The tests were removed in https://golang.org/cl/2311 but some
references to them were missed.

Change-Id: I163e554a0cc99401a012deead8fda813ad74dbfe
Reviewed-on: https://go-review.googlesource.com/58870
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-25 05:58:33 +00:00
Brad Fitzpatrick
5f7a03e148 sync: delete a sentence from the Map docs
From Josh's comments on https://golang.org/cl/50310

Once I removed the "from the Go standard library" bit, the beginning
wasn't worth keeping. It also wasn't clear whether what it meant by
"cache contention". Processor caches, or user-level caches built with
sync.Map? It didn't seem worth clarifying and didn't convey any useful
information, so deleted.

Change-Id: Id1d76105a3081d0855f6a64540700932bb83d98e
Reviewed-on: https://go-review.googlesource.com/50632
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2017-07-21 22:00:47 +00:00
Michael Stapelberg
ace7ce1025 sync: update Map documentation with usage rule of thumb
As per bcmills’s lightning talk at GopherCon 2017:
https://github.com/gophercon/2017-talks/tree/master/lightningtalks/BryanCMills-AnOverviewOfSyncMap

Change-Id: I12dd0daa608af175d110298780f32c6dc5e1e0a0
Reviewed-on: https://go-review.googlesource.com/50310
Reviewed-by: Bryan Mills <bcmills@google.com>
2017-07-21 16:02:43 +00:00
Bryan C. Mills
9311af79d5 sync: release m.mu during (*RWMutexMap).Range callbacks in sync_test
The mainline sync.Map has allowed mutations within Range callbacks
since https://golang.org/cl/37342. The reference implementations need
to do the same.

This change integrates https://go-review.googlesource.com/c/42956/
from x/sync.

Change-Id: I6b58cf874bb31cd4f6fdb8bfa8278888ed617a5a
Reviewed-on: https://go-review.googlesource.com/42957
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-07-20 18:51:09 +00:00
Austin Clements
fea7c43ea2 sync/atomic: clarify 64-bit alignment bug
Local variables can also be relied on the be 64-bit aligned, since
they will be escaped to the heap if used with any atomic operations.

Also, allocated arrays are also aligned, just like structs and slices.

Fixes #18955.

Change-Id: I8a1897f6ff78922c8bfcf20d6eb4bcb17a70ba2d
Reviewed-on: https://go-review.googlesource.com/48112
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-07-11 18:32:38 +00:00
Aliaksandr Valialkin
3ea53cb08f sync: deflake TestPool and TestPoolNew
Prevent possible goroutine rescheduling to another P between
Put and Get calls by locking the goroutine to OS thread.

Inspired by the CL 42770.

Fixes #20198.

Change-Id: I18e24fcad1630658713e6b9d80d90d7941f604be
Reviewed-on: https://go-review.googlesource.com/44310
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-06-28 22:02:07 +00:00
Alberto Donizetti
4b4bb53bf3 sync: make clear that WaitGroup.Done decrements by one
Change-Id: Ief076151739147378f8ca35cd09aabb59c3c9a52
Reviewed-on: https://go-review.googlesource.com/46350
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-06-21 15:49:18 +00:00
Ian Lance Taylor
09ebbf4085 runtime: add read/write mutex type
This is a runtime version of sync.RWMutex that can be used by code in
the runtime package. The type is not quite the same, in that the zero
value is not valid.

For future use by CL 43713.

Updates #19546

Change-Id: I431eb3688add16ce1274dab97285f555b72735bf
Reviewed-on: https://go-review.googlesource.com/45991
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-06-19 17:40:38 +00:00