1
0
mirror of https://github.com/golang/go synced 2024-11-19 14:54:43 -07:00
Commit Graph

51 Commits

Author SHA1 Message Date
Austin Clements
22000f5407 runtime: record swept and reclaimed bytes in sweep trace
This extends the GCSweepDone event with counts of swept and reclaimed
bytes. These are useful for understanding the duration and
effectiveness of sweep events.

Change-Id: I3c97a4f0f3aad3adbd188adb264859775f54e2df
Reviewed-on: https://go-review.googlesource.com/40811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2017-04-19 18:31:14 +00:00
Austin Clements
79c56addb6 runtime: make sweep trace events encompass entire sweep loop
Currently, each individual span sweep emits a span to the trace. But
sweeps are generally done in loops until some condition is satisfied,
so this tracing is lower-level than anyone really wants any hides the
fact that no other work is being accomplished between adjacent sweep
events. This is also high overhead: enabling tracing significantly
impacts sweep latency.

Replace this with instead tracing around the sweep loops used for
allocation. This is slightly tricky because sweep loops don't
generally know if any sweeping will happen in them. Hence, we make the
tracing lazy by recording in the P that we would like to start tracing
the sweep *if* one happens, and then only closing the sweep event if
we started it.

This does mean we don't get tracing on every sweep path, which are
legion. However, we get much more informative tracing on the paths
that block allocation, which are the paths that matter.

Change-Id: I73e14fbb250acb0c9d92e3648bddaa5e7d7e271c
Reviewed-on: https://go-review.googlesource.com/40810
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-19 18:31:11 +00:00
David Lazar
3249cb0ab4 runtime/trace: iterate over frames instead of PCs
Now the runtime/trace tests pass with -l=4.

This also gets rid of the frames cache for multiple reasons:

1) The frames cache was used to avoid repeated calls to funcname and
funcline. Now these calls happen inside the CallersFrames iterator.

2) Maintaining a frames cache is harder: map[uintptr]traceFrame
doesn't work since each PC can map to multiple traceFrames.

3) It's not clear that the cache is important.

Change-Id: I2914ac0b3ba08e39b60149d99a98f9f532b35bbb
Reviewed-on: https://go-review.googlesource.com/40591
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-04-14 12:21:21 +00:00
Daniel Martí
77b09b8b8d runtime: remove unused g parameter
Found by github.com/mvdan/unparam.

Change-Id: I20145440ff1bcd27fcf15a740354c52f313e536c
Reviewed-on: https://go-review.googlesource.com/37894
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-16 14:03:45 +00:00
Austin Clements
0efc8b2188 runtime: avoid repeated findmoduledatap calls
Currently almost every function that deals with a *_func has to first
look up the *moduledata for the module containing the function's entry
point. This means we almost always do at least two identical module
lookups whenever we deal with a *_func (one to get the *_func and
another to get something from its module data) and sometimes several
more.

Fix this by making findfunc return a new funcInfo type that embeds
*_func, but also includes the *moduledata, and making all of the
functions that currently take a *_func instead take a funcInfo and use
the already-found *moduledata.

This transformation is trivial for the most part, since the *_func
type is usually inferred. The annoying part is that we can no longer
use nil to indicate failure, so this introduces a funcInfo.valid()
method and replaces nil checks with calls to valid.

Change-Id: I9b8075ef1c31185c1943596d96dec45c7ab5100f
Reviewed-on: https://go-review.googlesource.com/37331
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2017-03-06 19:17:24 +00:00
Dmitry Vyukov
0556e26273 sync: make Mutex more fair
Add new starvation mode for Mutex.
In starvation mode ownership is directly handed off from
unlocking goroutine to the next waiter. New arriving goroutines
don't compete for ownership.
Unfair wait time is now limited to 1ms.
Also fix a long standing bug that goroutines were requeued
at the tail of the wait queue. That lead to even more unfair
acquisition times with multiple waiters.

Performance of normal mode is not considerably affected.

Fixes #13086

On the provided in the issue lockskew program:

done in 1.207853ms
done in 1.177451ms
done in 1.184168ms
done in 1.198633ms
done in 1.185797ms
done in 1.182502ms
done in 1.316485ms
done in 1.211611ms
done in 1.182418ms

name                    old time/op  new time/op   delta
MutexUncontended-48     0.65ns ± 0%   0.65ns ± 1%     ~           (p=0.087 n=10+10)
Mutex-48                 112ns ± 1%    114ns ± 1%   +1.69%        (p=0.000 n=10+10)
MutexSlack-48            113ns ± 0%     87ns ± 1%  -22.65%         (p=0.000 n=8+10)
MutexWork-48             149ns ± 0%    145ns ± 0%   -2.48%         (p=0.000 n=9+10)
MutexWorkSlack-48        149ns ± 0%    122ns ± 3%  -18.26%         (p=0.000 n=6+10)
MutexNoSpin-48           103ns ± 4%    105ns ± 3%     ~           (p=0.089 n=10+10)
MutexSpin-48             490ns ± 4%    515ns ± 6%   +5.08%        (p=0.006 n=10+10)
Cond32-48               13.4µs ± 6%   13.1µs ± 5%   -2.75%        (p=0.023 n=10+10)
RWMutexWrite100-48      53.2ns ± 3%   41.2ns ± 3%  -22.57%        (p=0.000 n=10+10)
RWMutexWrite10-48       45.9ns ± 2%   43.9ns ± 2%   -4.38%        (p=0.000 n=10+10)
RWMutexWorkWrite100-48   122ns ± 2%    134ns ± 1%   +9.92%        (p=0.000 n=10+10)
RWMutexWorkWrite10-48    206ns ± 1%    188ns ± 1%   -8.52%         (p=0.000 n=8+10)
Cond32-24               12.1µs ± 3%   12.4µs ± 3%   +1.98%         (p=0.043 n=10+9)
MutexUncontended-24     0.74ns ± 1%   0.75ns ± 1%     ~           (p=0.650 n=10+10)
Mutex-24                 122ns ± 2%    124ns ± 1%   +1.31%        (p=0.007 n=10+10)
MutexSlack-24           96.9ns ± 2%  102.8ns ± 2%   +6.11%        (p=0.000 n=10+10)
MutexWork-24             146ns ± 1%    135ns ± 2%   -7.70%         (p=0.000 n=10+9)
MutexWorkSlack-24        135ns ± 1%    128ns ± 2%   -5.01%         (p=0.000 n=10+9)
MutexNoSpin-24           114ns ± 3%    110ns ± 4%   -3.84%        (p=0.000 n=10+10)
MutexSpin-24             482ns ± 4%    475ns ± 8%     ~           (p=0.286 n=10+10)
RWMutexWrite100-24      43.0ns ± 3%   43.1ns ± 2%     ~           (p=0.956 n=10+10)
RWMutexWrite10-24       43.4ns ± 1%   43.2ns ± 1%     ~            (p=0.085 n=10+9)
RWMutexWorkWrite100-24   130ns ± 3%    131ns ± 3%     ~           (p=0.747 n=10+10)
RWMutexWorkWrite10-24    191ns ± 1%    192ns ± 1%     ~           (p=0.210 n=10+10)
Cond32-12               11.5µs ± 2%   11.7µs ± 2%   +1.98%        (p=0.002 n=10+10)
MutexUncontended-12     1.48ns ± 0%   1.50ns ± 1%   +1.08%        (p=0.004 n=10+10)
Mutex-12                 141ns ± 1%    143ns ± 1%   +1.63%        (p=0.000 n=10+10)
MutexSlack-12            121ns ± 0%    119ns ± 0%   -1.65%          (p=0.001 n=8+9)
MutexWork-12             141ns ± 2%    150ns ± 3%   +6.36%         (p=0.000 n=9+10)
MutexWorkSlack-12        131ns ± 0%    138ns ± 0%   +5.73%         (p=0.000 n=9+10)
MutexNoSpin-12          87.0ns ± 1%   83.7ns ± 1%   -3.80%        (p=0.000 n=10+10)
MutexSpin-12             364ns ± 1%    377ns ± 1%   +3.77%        (p=0.000 n=10+10)
RWMutexWrite100-12      42.8ns ± 1%   43.9ns ± 1%   +2.41%         (p=0.000 n=8+10)
RWMutexWrite10-12       39.8ns ± 4%   39.3ns ± 1%     ~            (p=0.433 n=10+9)
RWMutexWorkWrite100-12   131ns ± 1%    131ns ± 0%     ~            (p=0.591 n=10+9)
RWMutexWorkWrite10-12    173ns ± 1%    174ns ± 0%     ~            (p=0.059 n=10+8)
Cond32-6                10.9µs ± 2%   10.9µs ± 2%     ~           (p=0.739 n=10+10)
MutexUncontended-6      2.97ns ± 0%   2.97ns ± 0%     ~     (all samples are equal)
Mutex-6                  122ns ± 6%    122ns ± 2%     ~           (p=0.668 n=10+10)
MutexSlack-6             149ns ± 3%    142ns ± 3%   -4.63%        (p=0.000 n=10+10)
MutexWork-6              136ns ± 3%    140ns ± 5%     ~           (p=0.077 n=10+10)
MutexWorkSlack-6         152ns ± 0%    138ns ± 2%   -9.21%         (p=0.000 n=6+10)
MutexNoSpin-6            150ns ± 1%    152ns ± 0%   +1.50%         (p=0.000 n=8+10)
MutexSpin-6              726ns ± 0%    730ns ± 1%     ~           (p=0.069 n=10+10)
RWMutexWrite100-6       40.6ns ± 1%   40.9ns ± 1%   +0.91%         (p=0.001 n=8+10)
RWMutexWrite10-6        37.1ns ± 0%   37.0ns ± 1%     ~            (p=0.386 n=9+10)
RWMutexWorkWrite100-6    133ns ± 1%    134ns ± 1%   +1.01%         (p=0.005 n=9+10)
RWMutexWorkWrite10-6     152ns ± 0%    152ns ± 0%     ~     (all samples are equal)
Cond32-2                7.86µs ± 2%   7.95µs ± 2%   +1.10%        (p=0.023 n=10+10)
MutexUncontended-2      8.10ns ± 0%   9.11ns ± 4%  +12.44%         (p=0.000 n=9+10)
Mutex-2                 32.9ns ± 9%   38.4ns ± 6%  +16.58%        (p=0.000 n=10+10)
MutexSlack-2            93.4ns ± 1%   98.5ns ± 2%   +5.39%         (p=0.000 n=10+9)
MutexWork-2             40.8ns ± 3%   43.8ns ± 7%   +7.38%         (p=0.000 n=10+9)
MutexWorkSlack-2        98.6ns ± 5%  108.2ns ± 2%   +9.80%         (p=0.000 n=10+8)
MutexNoSpin-2            399ns ± 1%    398ns ± 2%     ~             (p=0.463 n=8+9)
MutexSpin-2             1.99µs ± 3%   1.97µs ± 1%   -0.81%          (p=0.003 n=9+8)
RWMutexWrite100-2       37.6ns ± 5%   46.0ns ± 4%  +22.17%         (p=0.000 n=10+8)
RWMutexWrite10-2        50.1ns ± 6%   36.8ns ±12%  -26.46%         (p=0.000 n=9+10)
RWMutexWorkWrite100-2    136ns ± 0%    134ns ± 2%   -1.80%          (p=0.001 n=7+9)
RWMutexWorkWrite10-2     140ns ± 1%    138ns ± 1%   -1.50%        (p=0.000 n=10+10)
Cond32                  5.93µs ± 1%   5.91µs ± 0%     ~            (p=0.411 n=9+10)
MutexUncontended        15.9ns ± 0%   15.8ns ± 0%   -0.63%          (p=0.000 n=8+8)
Mutex                   15.9ns ± 0%   15.8ns ± 0%   -0.44%        (p=0.003 n=10+10)
MutexSlack              26.9ns ± 3%   26.7ns ± 2%     ~           (p=0.084 n=10+10)
MutexWork               47.8ns ± 0%   47.9ns ± 0%   +0.21%          (p=0.014 n=9+8)
MutexWorkSlack          54.9ns ± 3%   54.5ns ± 3%     ~           (p=0.254 n=10+10)
MutexNoSpin              786ns ± 2%    765ns ± 1%   -2.66%        (p=0.000 n=10+10)
MutexSpin               3.87µs ± 1%   3.83µs ± 0%   -0.85%          (p=0.005 n=9+8)
RWMutexWrite100         21.2ns ± 2%   21.0ns ± 1%   -0.88%         (p=0.018 n=10+9)
RWMutexWrite10          22.6ns ± 1%   22.6ns ± 0%     ~             (p=0.471 n=9+9)
RWMutexWorkWrite100      132ns ± 0%    132ns ± 0%     ~     (all samples are equal)
RWMutexWorkWrite10       124ns ± 0%    123ns ± 0%     ~           (p=0.656 n=10+10)

Change-Id: I66412a3a0980df1233ad7a5a0cd9723b4274528b
Reviewed-on: https://go-review.googlesource.com/34310
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-17 17:24:59 +00:00
Austin Clements
d089a6c718 runtime: remove stack barriers
Now that we don't rescan stacks, stack barriers are unnecessary. This
removes all of the code and structures supporting them as well as
tests that were specifically for stack barriers.

Updates #17503.

Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c
Reviewed-on: https://go-review.googlesource.com/36620
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-14 15:52:54 +00:00
Heschi Kreinick
2a74b9e814 cmd/trace: Record mark assists in execution traces
During the mark phase of garbage collection, goroutines that allocate
may be recruited to assist. This change creates trace events for mark
assists and displays them similarly to sweep assists in the trace
viewer.

Mark assists are different than sweeps in that they can be preempted, so
displaying them in the trace viewer is a little tricky -- we may need to
synthesize multiple slices for one mark assist. This could have been
done in the parser instead, but I thought it might be preferable to keep
the parser as true to the event stream as possible.

Change-Id: I381dcb1027a187a354b1858537851fa68a620ea7
Reviewed-on: https://go-review.googlesource.com/36015
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2017-02-10 18:03:42 +00:00
Austin Clements
6da83c6fc0 runtime, cmd/trace: track goroutines blocked on GC assists
Currently when a goroutine blocks on a GC assist, it emits a generic
EvGoBlock event. Since assist blocking events and, in particular, the
length of the blocked assist queue, are important for diagnosing GC
behavior, this commit adds a new EvGoBlockGC event for blocking on a
GC assist. The trace viewer uses this event to report a "waiting on
GC" count in the "Goroutines" row. This makes sense because, unlike
other blocked goroutines, these goroutines do have work to do, so
being blocked on a GC assist is quite similar to being in the
"runnable" state, which we also report in the trace viewer.

Change-Id: Ic21a326992606b121ea3d3d00110d8d1fdc7a5ef
Reviewed-on: https://go-review.googlesource.com/30704
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-28 14:29:47 +00:00
Austin Clements
6834839427 runtime, cmd/trace: annotate different mark worker types
Currently mark workers are shown in the trace as regular goroutines
labeled "runtime.gcBgMarkWorker". That's somewhat unhelpful to an end
user because of the opaque label and particularly unhelpful to runtime
developers because it doesn't distinguish the different types of mark
workers.

Fix this by introducing a variant of the GoStart event called
GoStartLabel that lets the runtime indicate a label for a goroutine
execution span and using this to label mark worker executions as "GC
(<mode>)" in the trace viewer.

Since this bumps the trace version to 1.8, we also add test data for
1.7 traces.

Change-Id: Id7b9c0536508430c661ffb9e40e436f3901ca121
Reviewed-on: https://go-review.googlesource.com/30702
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-28 14:29:40 +00:00
Peter Weinberger
ca922b6d36 runtime: Profile goroutines holding contended mutexes.
runtime.SetMutexProfileFraction(n int) will capture 1/n-th of stack
traces of goroutines holding contended mutexes if n > 0. From runtime/pprof,
pprot.Lookup("mutex").WriteTo writes the accumulated
stack traces to w (in essentially the same format that blocking
profiling uses).

Change-Id: Ie0b54fa4226853d99aa42c14cb529ae586a8335a
Reviewed-on: https://go-review.googlesource.com/29650
Reviewed-by: Austin Clements <austin@google.com>
2016-10-28 11:47:16 +00:00
Austin Clements
c242517866 runtime: replace *g with guintptr in trace
trace's reader *g is going to cause write barriers in unfortunate
places, so replace it with a guintptr.

Change-Id: Ie8fb13bb89a78238f9d2a77ec77da703e96df8af
Reviewed-on: https://go-review.googlesource.com/31469
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-21 16:00:20 +00:00
Austin Clements
1bc6be6423 runtime: mark several types go:notinheap
This covers basically all sysAlloc'd, persistentalloc'd, and
fixalloc'd types.

Change-Id: I0487c887c2a0ade5e33d4c4c12d837e97468e66b
Reviewed-on: https://go-review.googlesource.com/30941
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-15 17:58:20 +00:00
Austin Clements
94589054d3 cmd/trace: label mark termination spans as such
Currently these are labeled "MARK", which was accurate in the STW
collector, but these really indicate mark termination now, since
marking happens for the full duration of the concurrent GC. Re-label
them as "MARK TERMINATION" to clarify this.

Change-Id: Ie98bd961195acde49598b4fa3f9e7d90d757c0a6
Reviewed-on: https://go-review.googlesource.com/30018
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-07 18:33:23 +00:00
Austin Clements
fa9b57bb1d runtime: make next_gc ^0 when GC is disabled
When GC is disabled, we set gcpercent to -1. However, we still use
gcpercent to compute several values, such as next_gc and gc_trigger.
These calculations are meaningless when gcpercent is -1 and result in
meaningless values. This is okay in a sense because we also never use
these values if gcpercent is -1, but they're confusing when exposed to
the user, for example via MemStats or the execution trace. It's
particularly unfortunate in the execution trace because it attempts to
plot the underflowed value of next_gc, which scales all useful
information in the heap row into oblivion.

Fix this by making next_gc ^0 when gcpercent < 0. This has the
advantage of being true in a way: next_gc is effectively infinite when
gcpercent < 0. We can also detect this special value when updating the
execution trace and report next_gc as 0 so it doesn't blow up the
display of the heap line.

Change-Id: I4f366e4451f8892a4908da7b2b6086bdc67ca9a9
Reviewed-on: https://go-review.googlesource.com/30016
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-07 18:32:51 +00:00
Dmitry Vyukov
cd285f1c6f runtime: fix global buffer reset in StopTrace
We reset global buffer only if its pos != 0.
We ought to do it always, but queue it only if pos != 0.
This is a latent bug. Currently it does not fire because
whenever we create a global buffer, we increment pos.

Change-Id: I01e28ae88ce9a5412497c524391b8b7cb443ffd9
Reviewed-on: https://go-review.googlesource.com/25574
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-09-02 19:14:11 +00:00
Dmitry Vyukov
747a158ef3 runtime: speed up StartTrace with lots of blocked goroutines
In StartTrace we emit EvGoCreate for all existing goroutines.
This includes stack unwind to obtain current stack.
Real Go programs can contain hundreds of thousands of blocked goroutines.
For such programs StartTrace can take up to a second (few ms per goroutine).

Obtain current stack ID once and use it for all EvGoCreate events.

This speeds up StartTrace with 10K blocked goroutines from 20ms to 4 ms
(win for StartTrace called from net/http/pprof hander will be bigger
as stack is deeper).

Change-Id: I9e5ff9468331a840f8fdcdd56c5018c2cfde61fc
Reviewed-on: https://go-review.googlesource.com/25573
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2016-08-22 17:40:10 +00:00
Dmitry Vyukov
a3703618ea runtime: use per-goroutine sequence numbers in tracer
Currently tracer uses global sequencer and it introduces
significant slowdown on parallel machines (up to 10x).
Replace the global sequencer with per-goroutine sequencer.

If we assign per-goroutine sequence numbers to only 3 types
of events (start, unblock and syscall exit), it is enough to
restore consistent partial ordering of all events. Even these
events don't need sequence numbers all the time (if goroutine
starts on the same P where it was unblocked, then start does
not need sequence number).
The burden of restoring the order is put on trace parser.
Details of the algorithm are described in the comments.

On http benchmark with GOMAXPROCS=48:
no tracing: 5026 ns/op
tracing: 27803 ns/op (+453%)
with this change: 6369 ns/op (+26%, mostly for traceback)

Also trace size is reduced by ~22%. Average event size before: 4.63
bytes/event, after: 3.62 bytes/event.

Besides running trace tests, I've also tested with manually broken
cputicks (random skew for each event, per-P skew and episodic random skew).
In all cases broken timestamps were detected and no test failures.

Change-Id: I078bde421ccc386a66f6c2051ab207bcd5613efa
Reviewed-on: https://go-review.googlesource.com/21512
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-23 15:57:05 +00:00
Dmitry Vyukov
2d342fba78 runtime: fix description of trace events
Change-Id: I037101b1921fe151695d32e9874b50dd64982298
Reviewed-on: https://go-review.googlesource.com/22314
Reviewed-by: Austin Clements <austin@google.com>
2016-04-22 21:32:37 +00:00
Dmitry Vyukov
3fafe2e888 internal/trace: support parsing of 1.5 traces
1. Parse out version from trace header.
2. Restore handling of 1.5 traces.
3. Restore optional symbolization of traces.
4. Add some canned 1.5 traces for regression testing
   (http benchmark trace, runtime/trace stress traces,
    plus one with broken timestamps).

Change-Id: Idb18a001d03ded8e13c2730eeeb37c5836e31256
Reviewed-on: https://go-review.googlesource.com/21803
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-04-11 17:56:44 +00:00
Dmitry Vyukov
0fb7b4cccd runtime: emit file:line info into traces
This makes traces self-contained and simplifies trace workflow
in modern cloud environments where it is simpler to reach
a service via HTTP than to obtain the binary.

Change-Id: I6ff3ca694dc698270f1e29da37d5efaf4e843a0d
Reviewed-on: https://go-review.googlesource.com/21732
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2016-04-08 20:52:30 +00:00
Austin Clements
08594ac7c7 runtime: acquire stack lock in traceEvent
traceEvent records system call events after a G has already entered
_Gsyscall, which means the garbage collector could be installing stack
barriers in the G's stack during the traceEvent. If traceEvent
attempts to capture the user stack during this, it may observe a
inconsistent stack barriers and panic. Fix this by acquiring the stack
lock around the stack walk in traceEvent.

Fixes #14101.

Change-Id: I15f0ab0c70c04c6e182221f65a6f761c5a896459
Reviewed-on: https://go-review.googlesource.com/18973
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-27 02:22:09 +00:00
Austin Clements
e9aef43d87 runtime: eliminate traceAllocBlock write barriers
This replaces *traceAllocBlock with traceAllocBlockPtr.

Updates #10600.

Change-Id: I94a20d90f04cca7c457b29062427748e315e4857
Reviewed-on: https://go-review.googlesource.com/17004
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-19 21:17:09 +00:00
Michael Matloob
432cb66f16 runtime: break out system-specific constants into package sys
runtime/internal/sys will hold system-, architecture- and config-
specific constants.

Updates #11647

Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54
Reviewed-on: https://go-review.googlesource.com/16817
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-12 17:04:45 +00:00
Austin Clements
f5c42cf88e runtime: replace traceBuf slice with index
Currently traceBuf keeps track of where it is in the trace buffer by
also maintaining a slice that points in to this buffer with an initial
length of 0 and a cap of the length of the array. All writes to this
buffer are done by appending to the slice (as long as the bounds
checks are right, it will never overflow and the append won't allocate
a new slice).

Each of these appends generates a write barrier. As long as we never
overflow the buffer, this write barrier won't fire, but this wreaks
havoc with eliminating write barriers from the tracing code. If we
were to overflow the buffer, this would both allocate and invoke a
write barrier, both things that are dicey at best to do in many of the
contexts tracing happens. It also wastes space in the traceBuf and
leads to more complex code and more complex generated code.

Replace this slice trick with keeping track of a simple array
position.

Updates #10600.

Change-Id: I0a63eecec1992e195449f414ed47653f66318d0e
Reviewed-on: https://go-review.googlesource.com/16814
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-11 17:37:31 +00:00
Austin Clements
2be1ed80c5 runtime: eliminate traceStack write barriers
This replaces *traceStack with traceStackPtr, much like the preceding
commit.

Updates #10600.

Change-Id: Ifadc35eb37a405ae877f9740151fb31a0ca1d08f
Reviewed-on: https://go-review.googlesource.com/16813
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-11 17:37:26 +00:00
Austin Clements
03227bb55e runtime: eliminate traceBuf write barriers
The tracing code is currently called from contexts such as sysmon and
the scheduler where write barriers are not allowed. Unfortunately,
while the common paths through the tracing code do not have write
barriers, many of the less common paths dealing with buffer overflow
and recycling do.

This change replaces all *traceBufs with traceBufPtrs. In the style of
guintptr, etc., the GC does not trace traceBufPtrs and write barriers
do not apply when these pointers are written. Since traceBufs are
allocated from non-GC'd memory and manually managed, this is always
safe.

Updates #10600.

Change-Id: I52b992d36d1b634ebd855c8cde27947ec14f59ba
Reviewed-on: https://go-review.googlesource.com/16812
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-11-11 17:37:18 +00:00
Michael Matloob
67faca7d9c runtime: break atomics out into package runtime/internal/atomic
This change breaks out most of the atomics functions in the runtime
into package runtime/internal/atomic. It adds some basic support
in the toolchain for runtime packages, and also modifies linux/arm
atomics to remove the dependency on the runtime's mutex. The mutexes
have been replaced with spinlocks.

all trybots are happy!
In addition to the trybots, I've tested on the darwin/arm64 builder,
on the darwin/arm builder, and on a ppc64le machine.

Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
Reviewed-on: https://go-review.googlesource.com/14204
Run-TryBot: Michael Matloob <matloob@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 17:38:04 +00:00
Austin Clements
a51905fa04 runtime: decentralize sweep termination and mark transition
This moves all of GC initialization, sweep termination, and the
transition to concurrent marking in to the off->mark transition
function. This means it's now handled on the goroutine that detected
the state exit condition.

As a result, malloc no longer needs to Gosched() at the beginning of
the GC cycle to prevent over-allocation while the GC is starting up
because it will now *help* the GC to start up. The Gosched hack is
still necessary during GC shutdown (this is easy to test by enabling
gctrace and hitting Ctrl-S to block the gctrace output).

At this point, the GC coordinator still handles later phases. This
requires a small tweak to how we start the GC coordinator. Currently,
starting the GC coordinator is best-effort and may fail if the
coordinator is about to park from the previous cycle but hasn't yet.
We fix this by replacing the park/ready to wake up the coordinator
with a semaphore. This is temporary since the coordinator will be
going away in a few commits.

Updates #11970.

Change-Id: I2c6a11c91e72dfbc59c2d8e7c66146dee9a444fe
Reviewed-on: https://go-review.googlesource.com/16357
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-05 21:23:27 +00:00
Austin Clements
f54bcedce1 runtime: beginning of decentralized off->mark transition
This begins the conversion of the centralized GC coordinator to a
decentralized state machine by introducing the internal API that
triggers the first state transition from _GCoff to _GCmark (or
_GCmarktermination).

This change introduces the transition lock, the off->mark transition
condition (which is very similar to shouldtriggergc()), and the
general structure of a state transition. Since we're doing this
conversion in stages, it then falls back to the GC coordinator to
actually execute the cycle. We'll start moving logic out of the GC
coordinator and in to transition functions next.

This fixes a minor bug in gcstoptheworld debug mode where passing the
heap trigger once could trigger multiple STW GCs.

Updates #11970.

Change-Id: I964087dd190a639eb5766398f8e1bbf8b352902f
Reviewed-on: https://go-review.googlesource.com/16355
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-11-05 21:23:17 +00:00
Shawn Walker-Salas
001a75a74c runtime/trace: fix tracing of blocking system calls
The placement and invocation of traceGoSysCall when using
entersyscallblock() instead of entersyscall() differs enough that the
TestTraceSymbolize test can fail on some platforms.

This change moves the invocation of traceGoSysCall for entersyscall() so
that the same number of "frames to skip" are present in the trace as when
entersyscallblock() is used ensuring system call traces remain identical
regardless of internal implementation choices.

Fixes golang/go#12056

Change-Id: I8361e91aa3708f5053f98263dfe9feb8c5d1d969
Reviewed-on: https://go-review.googlesource.com/13861
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-09-17 09:06:20 +00:00
Austin Clements
739f133837 runtime: fix hashing of trace stacks
The call to hash the trace stack reversed the "seed" and "size"
arguments to memhash and, hence, always called memhash with a 0 size,
which dutifully returned a hash value that depended only on the number
of PCs in the stack and not their values. As a result, all stacks were
put in to a very subset of the 8,192 buckets.

Fix this by passing these arguments in the correct order.

Change-Id: I67cd29312f5615c7ffa23e205008dd72c6b8af62
Reviewed-on: https://go-review.googlesource.com/13613
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-09-14 18:14:14 +00:00
Russ Cox
80c98fa901 runtime/trace: record event sequence numbers explicitly
Nearly all the flaky failures we've seen in trace tests have been
due to the use of time stamps to determine relative event ordering.
This is tricky for many reasons, including:
 - different cores might not have exactly synchronized clocks
 - VMs are worse than real hardware
 - non-x86 chips have different timer resolution than x86 chips
 - on fast systems two events can end up with the same time stamp

Stop trying to make time reliable. It's clearly not going to be for Go 1.5.
Instead, record an explicit event sequence number for ordering.
Using our own counter solves all of the above problems.

The trace still contains time stamps, of course. The sequence number
is just used for ordering.

Should alleviate #10554 somewhat. Then tickDiv can be chosen to
be a useful time unit instead of having to be exact for ordering.

Separating ordering and time stamps lets the trace parser diagnose
systems where the time stamp order and actual order do not match
for one reason or another. This CL adds that check to the end of
trace.Parse, after all other sequence order-based checking.
If that error is found, we skip the test instead of failing it.
Putting the check in trace.Parse means that cmd/trace will pick
up the same check, refusing to display a trace where the time stamps
do not match actual ordering.

Using net/http's BenchmarkClientServerParallel4 on various CPU counts,
not tracing vs tracing:

name                      old time/op    new time/op    delta
ClientServerParallel4       50.4µs ± 4%    80.2µs ± 4%  +59.06%        (p=0.000 n=10+10)
ClientServerParallel4-2     33.1µs ± 7%    57.8µs ± 5%  +74.53%        (p=0.000 n=10+10)
ClientServerParallel4-4     18.5µs ± 4%    32.6µs ± 3%  +75.77%        (p=0.000 n=10+10)
ClientServerParallel4-6     12.9µs ± 5%    24.4µs ± 2%  +89.33%        (p=0.000 n=10+10)
ClientServerParallel4-8     11.4µs ± 6%    21.0µs ± 3%  +83.40%        (p=0.000 n=10+10)
ClientServerParallel4-12    14.4µs ± 4%    23.8µs ± 4%  +65.67%        (p=0.000 n=10+10)

Fixes #10512.

Change-Id: I173eecf8191e86feefd728a5aad25bf1bc094b12
Reviewed-on: https://go-review.googlesource.com/12579
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 22:32:14 +00:00
Austin Clements
58f3a82950 runtime: fix comments referring to trace functions in runtime/pprof
ae1ea2a moved trace-related functions from runtime/pprof to
runtime/trace, but missed a doc comment and a code comment. Update
these to reflect the move.

Change-Id: I6e1e8861e5ede465c08a2e3f80b976145a8b32d8
Reviewed-on: https://go-review.googlesource.com/12525
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-07-22 18:33:38 +00:00
Brad Fitzpatrick
2ae77376f7 all: link to https instead of http
The one in misc/makerelease/makerelease.go is particularly bad and
probably warrants rotating our keys.

I didn't update old weekly notes, and reverted some changes involving
test code for now, since we're late in the Go 1.5 freeze. Otherwise,
the rest are all auto-generated changes, and all manually reviewed.

Change-Id: Ia2753576ab5d64826a167d259f48a2f50508792d
Reviewed-on: https://go-review.googlesource.com/12048
Reviewed-by: Rob Pike <r@golang.org>
2015-07-11 14:36:33 +00:00
Dmitry Vyukov
e72f5f67a1 runtime: fix tracing of syscallexit
There were two issues.
1. Delayed EvGoSysExit could have been emitted during TraceStart,
while it had not yet emitted EvGoInSyscall.
2. Delayed EvGoSysExit could have been emitted during next tracing session.

Fixes #10476
Fixes #11262

Change-Id: Iab68eb31cf38eb6eb6eee427f49c5ca0865a8c64
Reviewed-on: https://go-review.googlesource.com/9132
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-18 13:59:55 +00:00
Austin Clements
a1da255aa0 runtime: factor stoptheworld/starttheworld pattern
There are several steps to stopping and starting the world and
currently they're open-coded in several places. The garbage collector
is the only thing that needs to stop and start the world in a
non-trivial pattern. Replace all other uses with calls to higher-level
functions that implement the entire pattern necessary to stop and
start the world.

This is a pure refectoring and should not change any code semantics.
In the following commits, we'll make changes that are easier to do
with this abstraction in place.

This commit renames the old starttheworld to startTheWorldWithSema.
This is a slight misnomer right now because the callers release
worldsema just before calling this. However, a later commit will swap
these and I don't want to think of another name in the mean time.

Change-Id: I5dc97f87b44fb98963c49c777d7053653974c911
Reviewed-on: https://go-review.googlesource.com/10154
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:25 +00:00
Shenghou Ma
5f69e739d3 runtime: adjust traceTickDiv for non-x86 architectures
Fixes #10554.
Fixes #10623.

Change-Id: I90fbaa34e3d55c8758178f8d2e7fa41ff1194a1b
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9247
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-05-01 07:25:49 +00:00
Russ Cox
181e26b9fa runtime: replace func-based write barrier skipping with type-based
This CL revises CL 7504 to use explicitly uintptr types for the
struct fields that are going to be updated sometimes without
write barriers. The result is that the fields are now updated *always*
without write barriers.

This approach has two important properties:

1) Now the GC never looks at the field, so if the missing reference
could cause a problem, it will do so all the time, not just when the
write barrier is missed at just the right moment.

2) Now a write barrier never happens for the field, avoiding the
(correct) detection of inconsistent write barriers when GODEBUG=wbshadow=1.

Change-Id: Iebd3962c727c0046495cc08914a8dc0808460e0e
Reviewed-on: https://go-review.googlesource.com/9019
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-20 20:20:09 +00:00
Dmitry Vyukov
089d363a91 runtime: fix tracing of syscall exit
Fix tracing of syscall exit after:
https://go-review.googlesource.com/#/c/7504/

Change-Id: Idcde2aa826d2b9a05d0a90a80242b6bfa78846ab
Reviewed-on: https://go-review.googlesource.com/8728
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-10 17:39:06 +00:00
Austin Clements
d7e0ad4b82 runtime: introduce heap_live; replace use of heap_alloc in GC
Currently there are two main consumers of memstats.heap_alloc:
updatememstats (aka ReadMemStats) and shouldtriggergc.

updatememstats recomputes heap_alloc from the ground up, so we don't
need to keep heap_alloc up to date for it. shouldtriggergc wants to
know how many bytes were marked by the previous GC plus how many bytes
have been allocated since then, but this *isn't* what heap_alloc
tracks. heap_alloc also includes objects that are not marked and
haven't yet been swept.

Introduce a new memstat called heap_live that actually tracks what
shouldtriggergc wants to know and stop keeping heap_alloc up to date.

Unlike heap_alloc, heap_live follows a simple sawtooth that drops
during each mark termination and increases monotonically between GCs.
heap_alloc, on the other hand, has much more complicated behavior: it
may drop during sweep termination, slowly decreases from background
sweeping between GCs, is roughly unaffected by allocation as long as
there are unswept spans (because we sweep and allocate at the same
rate), and may go up after background sweeping is done depending on
the GC trigger.

heap_live simplifies computing next_gc and using it to figure out when
to trigger garbage collection. Currently, we guess next_gc at the end
of a cycle and update it as we sweep and get a better idea of how much
heap was marked. Now, since we're directly tracking how much heap is
marked, we can directly compute next_gc.

This also corrects bugs that could cause us to trigger GC early.
Currently, in any case where sweep termination actually finds spans to
sweep, heap_alloc is an overestimation of live heap, so we'll trigger
GC too early. heap_live, on the other hand, is unaffected by sweeping.

Change-Id: I1f96807b6ed60d4156e8173a8e68745ffc742388
Reviewed-on: https://go-review.googlesource.com/8389
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-06 21:28:13 +00:00
Dmitry Vyukov
4396ea96c4 runtime: remove futile wakeups from trace
Channels and sync.Mutex'es allow another goroutine to acquire resource
ahead of an unblocked goroutine. This is good for performance, but
leads to futile wakeups (the unblocked goroutine needs to block again).
Futile wakeups caused user confusion during the very first evaluation
of tracing functionality on a real server (a goroutine as if acquires a mutex
in a loop, while there is no loop in user code).

This change detects futile wakeups on channels and emits a special event
to denote the fact. Later parser finds entire wakeup sequences
(unblock->start->block) and removes them.

sync.Mutex will be supported in a separate change.

Change-Id: Iaaaee9d5c0921afc62b449a97447445030ac19d3
Reviewed-on: https://go-review.googlesource.com/7380
Reviewed-by: Keith Randall <khr@golang.org>
2015-03-17 14:14:55 +00:00
Dmitry Vyukov
9d332a8324 cmd/trace: dump thread id on proc start
Augment ProcStart events with OS thread id.
This helps in scheduler locality analysis.

Change-Id: I93fea75d3072cf68de66110d0b59d07101badcb5
Reviewed-on: https://go-review.googlesource.com/7302
Reviewed-by: Keith Randall <khr@golang.org>
2015-03-11 12:52:41 +00:00
Dmitry Vyukov
919fd24884 runtime: remove runtime frames from stacks in traces
Stip uninteresting bottom and top frames from trace stacks.
This makes both binary and json trace files smaller,
and also makes stacks shorter and more readable in the viewer.

Change-Id: Ib9c80ccc280504f0e235f867f53f1d2652c41583
Reviewed-on: https://go-review.googlesource.com/5523
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
2015-03-10 14:46:15 +00:00
Matthew Dempsky
3c8a89daf3 runtime: simplify CPU profiling code
This makes Go's CPU profiling code somewhat more idiomatic; e.g.,
using := instead of forward declaring variables, using "int" for
element counts instead of "uintptr", and slices instead of C-style
pointer+length.  This makes the code easier to read and eliminates a
lot of type conversion clutter.

Additionally, in sigprof we can collect just maxCPUProfStack stack
frames, as cpuprof won't use more than that anyway.

Change-Id: I0235b5ae552191bcbb453b14add6d8c01381bd06
Reviewed-on: https://go-review.googlesource.com/6072
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-02-26 08:59:24 +00:00
Dmitry Vyukov
edadffa2f3 cmd/trace: add new command
Trace command allows to visualize and analyze traces.
Run as:
$ go tool trace binary trace.file
The commands opens web browser with the main page,
which contains links for trace visualization,
blocking profiler, network IO profiler and per-goroutine
traces.

Also move trace parser from runtime/pprof/trace_parser_test.go
to internal/trace/parser.go, so that it can be shared between
tests and the command.

Change-Id: Ic97ed59ad6e4c7e1dc9eca5e979701a2b4aed7cf
Reviewed-on: https://go-review.googlesource.com/3601
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-02-20 18:31:25 +00:00
Dmitry Vyukov
4a45ac577f runtime: fix false race report during tracing
Currently race detector produces the following reports on pprof tests:

WARNING: DATA RACE
Read by goroutine 4:
  runtime/pprof_test.TestTraceStartStop()
      src/runtime/pprof/trace_test.go:38 +0x1da
  testing.tRunner()
      src/testing/testing.go:448 +0x13a

Previous write by goroutine 5:
  bytes.(*Buffer).grow()
      src/bytes/buffer.go:102 +0x190
  bytes.(*Buffer).Write()
      src/bytes/buffer.go:127 +0x75
  runtime/pprof.func·002()
      src/runtime/pprof/pprof.go:633 +0xae

Trace writer goroutine synchronizes with StopTrace
using trace.shutdownSema runtime semaphore.
But race detector does not see that synchronization
and so produces false reports.
Teach race detector about the synchronization.

Change-Id: I1219817325d4e16b423f29a0cbee94c929793881
Reviewed-on: https://go-review.googlesource.com/3746
Reviewed-by: Russ Cox <rsc@golang.org>
2015-02-03 15:41:41 +00:00
Austin Clements
28b5118415 runtime: rename m.gcing to m.preemptoff and make it a string
m.gcing has become overloaded to mean "don't preempt this g" in
general.  Once the garbage collector is preemptible, the one thing it
*won't* mean is that we're in the garbage collector.

So, rename gcing to "preemptoff" and make it a string giving a reason
that preemption is disabled.  gcing was never set to anything but 0 or
1, so we don't have to worry about there being a stack of reasons.

Change-Id: I4337c29e8e942e7aa4f106fc29597e1b5de4ef46
Reviewed-on: https://go-review.googlesource.com/3660
Reviewed-by: Russ Cox <rsc@golang.org>
2015-02-02 19:34:51 +00:00
Dmitry Vyukov
256116ad25 runtime: fix trace ticks frequency on windows
Change-Id: I8c7fcc7705070bc9979e39d08a4c9b2870087a08
Reviewed-on: https://go-review.googlesource.com/3500
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2015-01-30 08:35:38 +00:00
Dmitry Vyukov
5288fadbdc runtime: add tracing of runtime events
Add actual tracing of interesting runtime events.
Part of a larger tracing functionality:
https://docs.google.com/document/u/1/d/1FP5apqzBgr7ahCCgFO-yoVhk4YZrNIDNf9RybngBc14/pub
Full change:
https://codereview.appspot.com/146920043

Change-Id: Icccf54aea54e09350bb698ba6bf11532f9fbe6d3
Reviewed-on: https://go-review.googlesource.com/1451
Reviewed-by: Russ Cox <rsc@golang.org>
2015-01-28 16:35:24 +00:00