mirror of https://github.com/golang/go synced 2024-11-19 18:44:41 -07:00
Commit Graph

2834 Commits

Author SHA1 Message Date
Austin Clements
c44d031bf0 runtime: eliminate heapBits.hasPointers
This is no longer necessary now that we can more efficiently consult
the span's noscan bit.

This is a cherry-pick of dev.garbage commit 312aa09996.

Change-Id: Id0b00b278533660973f45eb6efa5b00f373d58af
Reviewed-on: https://go-review.googlesource.com/41252
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:34 +00:00
Austin Clements
1a033b1a70 runtime: separate spans of noscan objects
Currently, we mix objects with pointers and objects without pointers
("noscan" objects) together in memory. As a result, for every object
we grey, we have to check that object's heap bits to find out if it's
noscan, which adds to the per-object cost of GC. This also hurts the
TLB footprint of the garbage collector because it decreases the
density of scannable objects at the page level.

This commit improves the situation by using separate spans for noscan
objects. This will allow a much simpler noscan check (in a follow up
CL), eliminate the need to clear the bitmap of noscan objects (in a
follow up CL), and improve TLB footprint by increasing the density of
scannable objects.

This is also a step toward eliminating dead bits, since the current
noscan check depends on checking the dead bit of the first word.

This has no effect on the heap size of the garbage benchmark.

We'll measure the performance change of this after the follow-up
optimizations.

This is a cherry-pick from dev.garbage commit d491e550c3. The only
non-trivial merge conflict was in updatememstats in mstats.go, where
we now have to separate the per-spanclass stats from the per-sizeclass
stats.

Change-Id: I13bdc4869538ece5649a8d2a41c6605371618e40
Reviewed-on: https://go-review.googlesource.com/41251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:31 +00:00
Austin Clements
390fdead0b runtime: document runtime.Frames better
In particular, this says that Frames.Function uniquely identifies a
function within a program. We depend on this in various places that
use runtime.Frames in std, but it wasn't actually written down.

Change-Id: Ie7ede348c17673e11ae513a094862b60c506abc5
Reviewed-on: https://go-review.googlesource.com/41610
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-28 22:43:20 +00:00
Michael Matloob
f105c91757 runtime/pprof: propagate profile labels into profile proto
Profile labels added by the user using pprof.Do, if present, will
be in a *labelMap stored in the unsafe.Pointer 'tag' field of
the profile map entry. This change extracts the labels from the tag
field and writes them to the profile proto.

Change-Id: Ic40fdc58b66e993ca91d5d5effe0e04ffbb5bc46
Reviewed-on: https://go-review.googlesource.com/39613
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-04-28 17:37:58 +00:00
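
A minimal sketch of how user code attaches the labels that the CL above propagates into the profile proto; the file name "cpu.pprof", the "worker" label, and busyWork are illustrative assumptions, not part of the CL:

    package main

    import (
        "context"
        "os"
        "runtime/pprof"
    )

    func busyWork() {
        for i := 0; i < 1e7; i++ {
            _ = i * i
        }
    }

    func main() {
        // Write a CPU profile; error handling is elided for brevity.
        f, _ := os.Create("cpu.pprof")
        pprof.StartCPUProfile(f)
        defer pprof.StopCPUProfile()

        // pprof.Do attaches the labels to every sample taken while the
        // function runs; with the CL above they appear in the proto output.
        pprof.Do(context.Background(), pprof.Labels("worker", "42"), func(ctx context.Context) {
            busyWork()
        })
    }
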
Russ Cox
c82efb1fa3 runtime: fix profile handling of labels for race detector
If g1 sets its labels and then they are copied into a profile buffer
and then g2 reads the profile buffer and inspects the labels,
the race detector must understand that g1's recording of the labels
happens before g2's use of the labels. Make that so.

Fixes race test failure in CL 39613.

Change-Id: Id7cda1c2aac6f8eef49213b5ca414f7154b4acfa
Reviewed-on: https://go-review.googlesource.com/42111
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-04-28 17:37:46 +00:00
Russ Cox
3ddf65015a runtime/pprof: ignore dummy huge page mapping in /proc/self/maps
Change-Id: I72bea1450386100482b4681b20eb9a9af12c7522
Reviewed-on: https://go-review.googlesource.com/41816
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-04-26 19:34:56 +00:00
Russ Cox
d1ac592717 runtime/pprof: add /proc/self/maps parsing test
Delete old TestRuntimeFunctionTrimming, which is testing a dead API
and is now handled in end-to-end tests.

Change-Id: I64fc2991ed4a7690456356b5f6b546f36935bb67
Reviewed-on: https://go-review.googlesource.com/41815
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-04-26 19:34:01 +00:00
Aliaksandr Valialkin
259d60995d runtime: align mcentral by cache line size
This may improve performance during concurrent access
to the mheap.central array from multiple CPU cores.

Change-Id: I8f48dd2e72aa62e9c32de07ae60fe552d8642782
Reviewed-on: https://go-review.googlesource.com/41550
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-26 03:48:23 +00:00
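
A hedged sketch of the padding idea behind the CL above (illustrative only; the 64-byte constant and the counter type are assumptions, and the runtime uses its own cache line size constant):

    package main

    import (
        "fmt"
        "unsafe"
    )

    // cacheLineSize is assumed to be 64 bytes, typical for amd64.
    const cacheLineSize = 64

    type counter struct{ n uint64 }

    // padded rounds each element up to a full cache line so that adjacent
    // array elements are never written through the same line from
    // different CPUs (false sharing), mirroring the mcentral alignment.
    type padded struct {
        counter
        _ [cacheLineSize - unsafe.Sizeof(counter{})%cacheLineSize]byte
    }

    func main() {
        var central [4]padded
        fmt.Println(unsafe.Sizeof(central[0])) // 64
    }
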
Mikio Hara
91c9b0d568 runtime: adjust netpoll panic messages
Change-Id: I34547b057605bb9e1e2227c41867589348560244
Reviewed-on: https://go-review.googlesource.com/41513
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-25 21:39:18 +00:00
Daniel Martí
516e6f6d5d all: remove some unused parameters in test code
Mostly unnecessary *testing.T arguments.

Found with github.com/mvdan/unparam.

Change-Id: Ifb955cb88f2ce8784ee4172f4f94d860fa36ae9a
Reviewed-on: https://go-review.googlesource.com/41691
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-25 14:38:10 +00:00
Brad Fitzpatrick
34ee8ec193 runtime: ignore TestCgoPprofPIE test failures on Alpine (take 2)
s/arm64/amd64/ in previous typo CL 41628

Updates #19938
Updates #18243

Change-Id: I282244ee3c94535f229a87b6246382385ff64428
Reviewed-on: https://go-review.googlesource.com/41675
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-25 05:02:56 +00:00
Martin Möhrmann
b64e817853 runtime: simplify detection of preference to use AVX memmove
Reduces cmd/go by 4464 bytes on amd64.

Removes the duplicate detection of AVX support and
presence of Intel processors.

Change-Id: I4670189951a63760fae217708f68d65e94a30dc5
Reviewed-on: https://go-review.googlesource.com/41570
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-25 04:50:04 +00:00
Brad Fitzpatrick
16271b8b52 runtime: ignore TestCgoPprofPIE test failures on Alpine
Updates #19938
Updates #18243

Change-Id: Ib6e704c0a5d596bdfaa6493902d2528bec55bf16
Reviewed-on: https://go-review.googlesource.com/41628
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-25 04:33:00 +00:00
Evgeniy Polyakov
9f98e49825 runtime: make time correctly update on Wine
Implemented a low-level time system for Windows on hardware (or software)
that does not support memory-mapped _KSYSTEM_TIME page updates.

In particular, this problem exists on Wine, where _KSYSTEM_TIME
only contains the time at startup and is never modified.

At startup we try to detect Wine and, if detected, fall back to
GetSystemTimeAsFileTime() for the current time and a monotonic
timer based on the QueryPerformanceCounter family of syscalls:
https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx

Fixes #18537

Change-Id: I269d22467ed9b0afb62056974d23e731b80c83ed
Reviewed-on: https://go-review.googlesource.com/35710
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-25 04:30:06 +00:00
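
A rough, Windows-only sketch of the QueryPerformanceCounter fallback named in the CL above, using the documented kernel32 call rather than the runtime's internal wiring; error checking is omitted and the helper names are assumptions:

    //go:build windows

    package main

    import (
        "fmt"
        "syscall"
        "unsafe"
    )

    var (
        kernel32 = syscall.NewLazyDLL("kernel32.dll")
        procQPC  = kernel32.NewProc("QueryPerformanceCounter")
    )

    // qpc reads the raw performance counter, the primitive used for
    // monotonic time when the _KSYSTEM_TIME page is never updated
    // (as on Wine).
    func qpc() int64 {
        var ticks int64
        procQPC.Call(uintptr(unsafe.Pointer(&ticks)))
        return ticks
    }

    func main() {
        fmt.Println("performance counter:", qpc())
    }
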
Brad Fitzpatrick
6a48019ea5 runtime/debug: mark TestSetGCPercent as flaky
Updates #20076

Change-Id: I4eb98abbb49174cc6433e5da2c3660893ef88fd1
Reviewed-on: https://go-review.googlesource.com/41615
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-24 22:11:58 +00:00
Mikio Hara
42c5f3993b runtime: gofmt -w -s
Change-Id: I954b0300554786b7026996a21acfec3b6f205e75
Reviewed-on: https://go-review.googlesource.com/41512
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-24 17:01:29 +00:00
Keith Randall
1e72bf6218 cmd/compile: experiment which clobbers all dead pointer fields
The experiment "clobberdead" clobbers all pointer fields that the
compiler thinks are dead, just before and after every safepoint.
Useful for debugging the generation of live pointer bitmaps.

Helped find the following issues:
Update #15936
Update #16026
Update #16095
Update #18860

Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9
Reviewed-on: https://go-review.googlesource.com/23924
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-21 20:19:50 +00:00
Austin Clements
e516227554 runtime/debug: increase threshold on TestSetGCPercent
Currently TestSetGCPercent checks that NextGC is within 10 MB of the
expected value. For some reason it's much noisier on some of the
builders. To get these passing again, raise the threshold to 20 MB.

Change-Id: I14e64025660d782d81ff0421c1eb898f416e11fe
Reviewed-on: https://go-review.googlesource.com/41374
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-04-21 19:55:14 +00:00
Austin Clements
227fff2ea4 runtime/debug: don't trigger a GC on SetGCPercent
Currently SetGCPercent forces a GC in order to recompute GC pacing.
Since we can now recompute pacing on the fly using gcSetTriggerRatio,
change SetGCPercent (really runtime.setGCPercent) to go through
gcSetTriggerRatio and not trigger a GC.

Fixes #19076.

Change-Id: Ib30d7ab1bb3b55219535b9f238108f3d45a1b522
Reviewed-on: https://go-review.googlesource.com/39835
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:42:02 +00:00
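
A small usage sketch for the behavior change above: SetGCPercent now only adjusts pacing, so (under this assumption) no collection should run as a side effect of the call. The value 200 is arbitrary.

    package main

    import (
        "fmt"
        "runtime"
        "runtime/debug"
    )

    func main() {
        var before, after runtime.MemStats
        runtime.ReadMemStats(&before)

        // Double the GC target relative to the default GOGC=100.
        old := debug.SetGCPercent(200)
        defer debug.SetGCPercent(old)

        runtime.ReadMemStats(&after)
        // With the CL above, NumGC should be unchanged by the call itself.
        fmt.Println("GCs before/after:", before.NumGC, after.NumGC)
    }
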
Austin Clements
d9308cbb51 runtime/debug: expand SetGCPercent test
The current SetGCPercent test is, shall we say, minimal.

Expand it to check that the GC target is actually computed and updated
correctly.

For #19076.

Change-Id: I6e9b2ee0ef369f22f72e43b58d89e9f1e1b73b1b
Reviewed-on: https://go-review.googlesource.com/39834
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:42:01 +00:00
Austin Clements
1c4f3c5ea0 runtime: make gcSetTriggerRatio work at any time
This changes gcSetTriggerRatio so it can be called even during
concurrent mark or sweep. In this case, it will adjust the pacing of
the current phase, accounting for progress that has already been made.

To make this work for concurrent sweep, this introduces a "basis" for
the pagesSwept count, much like the basis we just introduced for
heap_live. This lets gcSetTriggerRatio shift the basis to the current
heap_live and pagesSwept and compute a slope from there to completion.
This avoids creating a discontinuity where, if the ratio has
increased, there has to be a flurry of sweep activity to catch up.
Instead, this creates a continuous, piece-wise linear function as
adjustments are made.

For #19076.

Change-Id: Ibcd76aeeb81ff4814b00be7cbd3530b73bbdbba9
Reviewed-on: https://go-review.googlesource.com/39833
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:59 +00:00
Austin Clements
a5eb3dceaf runtime: drive proportional sweep directly off heap_live
Currently, proportional sweep maintains its own count of how many
bytes have been allocated since the beginning of the sweep cycle so it
can compute how many pages need to be swept for a given allocation.

However, this requires a somewhat complex reimbursement scheme since
proportional sweep must be done before a span is allocated, but we
don't know how many bytes to charge until we've allocated a span. This
means that the allocated byte count used by proportional sweep can go
up and down, which has led to underflow bugs in the past (#18043) and
is going to interfere with adjusting sweep pacing on-the-fly (for #19076).

This approach also means we're maintaining a statistic that is very
closely related to heap_live, but has a different 0 value. This is
particularly confusing because the sweep ratio is computed based on
heap_live, so you have to understand that these two statistics are
very closely related.

Replace all of this and compute the sweep debt directly from the
current value of heap_live. To make this work, we simply save the
value of heap_live when the sweep ratio is computed to use as a
"basis" for later computing the sweep debt.

This eliminates the need for reimbursement as well as the code for
maintaining the sweeper's version of the live heap size.

For #19076.

Coincidentally fixes #18043, since this eliminates sweep reimbursement
entirely.

Change-Id: I1f931ddd6e90c901a3972c7506874c899251dc2a
Reviewed-on: https://go-review.googlesource.com/39832
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:57 +00:00
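
A hedged sketch of the "basis" arithmetic described above, not the runtime's actual code; the variable names mirror the CL but the numbers in main are made up:

    package main

    import "fmt"

    var (
        sweepPagesPerByte  float64 // sweep ratio set at the end of the last GC
        sweepHeapLiveBasis uint64  // heap_live recorded when the ratio was set
    )

    // sweepDebt returns how many pages must have been swept by the time
    // the live heap has grown to heapLive: a straight line from the basis,
    // with no reimbursement bookkeeping needed.
    func sweepDebt(heapLive uint64) int64 {
        return int64(sweepPagesPerByte * float64(heapLive-sweepHeapLiveBasis))
    }

    func main() {
        sweepPagesPerByte = 0.5
        sweepHeapLiveBasis = 1 << 20
        fmt.Println(sweepDebt(2 << 20)) // 524288 pages owed in this toy setup
    }
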
Austin Clements
ee175afac2 runtime: consolidate all trigger-derived computations
Currently, the computations that derive controls from the GC trigger
are spread across several parts of the mark termination code.
Consolidate computing the absolute trigger, the heap goal, and sweep
pacing into a single function called at the end of mark termination.

Unlike the code being consolidated, this has to be more careful about
negative gcpercent. Many of the consolidated code paths simply didn't
execute if GC was off.

This is a step toward being able to change the GC trigger ratio in the
middle of concurrent sweeping and marking. For this commit, we try to
stick close to the original structure of the code that's being
consolidated, so it doesn't yet support mid-cycle adjustments.

For #19076.

Change-Id: Ic5335be04b96ad20e70d53d67913a86bd6b31456
Reviewed-on: https://go-review.googlesource.com/39831
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:55 +00:00
Austin Clements
49a412a5b7 runtime: rationalize triggerRatio
gcController.triggerRatio is the only field in gcController that
persists across cycles. As global mutable state, the places where it
written and read are spread out, making it difficult to see that
updates and downstream calculations are done correctly.

Improve this situation by doing two things:

1) Move triggerRatio to memstats so it lives with the other
trigger-related fields and makes gcController entirely transient
state.

2) Commit the new trigger ratio during mark termination when we
compute other next-cycle controls, including the absolute trigger.
This forces us to explicitly thread the new trigger ratio from
gcController.endCycle to mark termination, so we're not just pulling
it out of global state.

Change-Id: I6669932f8039a8c0ef46a3f2a8c537db72e578aa
Reviewed-on: https://go-review.googlesource.com/39830
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:53 +00:00
Austin Clements
9d36163c0b runtime: consistently use atomic loads for heap_live
heap_live is updated atomically without locking, so we should also use
atomic loads to read it. Fix the reads of heap_live that happen
outside of STW to be atomic.

Change-Id: Idca9451c348168c2a792a9499af349833a3c333f
Reviewed-on: https://go-review.googlesource.com/41371
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:51 +00:00
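
The same load/store discipline at user level, as a reminder of why the fix matters; sync/atomic stands in here for the runtime-internal atomics the CL actually uses:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // heapLive stands in for heap_live: it is updated without a lock, so
    // every reader outside a stop-the-world window must use an atomic load.
    var heapLive uint64

    func add(n uint64) { atomic.AddUint64(&heapLive, n) }
    func read() uint64 { return atomic.LoadUint64(&heapLive) }

    func main() {
        add(4096)
        fmt.Println(read())
    }
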
Austin Clements
bb6309cd63 runtime: inform arena placement using sbrk(0)
On 32-bit architectures (or if we fail to map a 64-bit-style arena),
we try to map the heap arena just above the end of the process image.
While we can accept any address, using lower addresses is preferable
because lower addresses cause us to map less of the heap bitmap.

However, if a program is linked against C code that has global
constructors, those constructors may call brk/sbrk to allocate memory
(e.g., many C malloc implementations do this for small allocations).
The brk also starts just above the process image, so this may adjust
the brk past the beginning of where we want to put the heap arena. In
this case, the kernel will pick a different address for the arena and
it will usually be very high (at least, as these things go in a 32-bit
address space).

Fix this by consulting the current value of the brk and using this in
addition to the end of the process image to compute the initial arena
placement.

This is implemented only on Linux currently, since we have no evidence
that it's an issue on any other OSes.

Fixes #19831.

Change-Id: Id64b45d08d8c91e4f50d92d0339146250b04f2f8
Reviewed-on: https://go-review.googlesource.com/39810
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-21 14:34:10 +00:00
David Lazar
da75700a64 runtime: make test independent of inlining
TestBreakpoint expects to see "runtime.Breakpoint()" in the stack trace.
If runtime.Breakpoint() is inlined, then the stack trace prints
"runtime.Breakpoint(...)" since the runtime does not have information
about arguments (or lack thereof) to inlined functions. This change
makes the test independent of inlining by looking for the string
"runtime.Breakpoint(". Now TestBreakpoint passes with -l=4.

Change-Id: Ia044a8e8a4de2337cb2b393d6fa78c73a2f25926
Reviewed-on: https://go-review.googlesource.com/40997
Run-TryBot: David Lazar <lazard@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-04-20 20:41:15 +00:00
Austin Clements
6f2e6f8dd6 runtime/pprof: don't accept "," in profile PCs
TestBlockProfile matches samples against a regexp that accepts "," in
profile PCs. I suspect this was just a syntax mistake. Remove "," from
the character class.

Change-Id: Idcfc20ed6900075abae08597ba71db559e89b37b
Reviewed-on: https://go-review.googlesource.com/41111
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Peter Weinberger <pjw@google.com>
2017-04-20 19:46:38 +00:00
Austin Clements
4a4398825f runtime/pprof: accept fewer PCs
TestBlockProfile currently requires exactly five PCs in each sample.
With more aggressive inlining there may be fewer, so change this test
to use the same pattern as TestMutexProfile, which accepts one or more
PCs. With this change, this test passes when compiled with -l=4.

Change-Id: I1421a6d56c96b77111bdc671d88723a222672fd6
Reviewed-on: https://go-review.googlesource.com/41110
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
2017-04-20 19:46:36 +00:00
Josh Bleecher Snyder
565807566e runtime: improve ExampleFrames
CL 40876 changed ExampleFrames so that the output
was stable with and without mid-stack inlining.

However, that change lost some of the
pedagogical and copy/paste value of the example.
It was unclear why both more and i were being tracked,
and whether the 5 in i < 5 is related to len(pc),
and if so, why and how.

This CL rewrites the example with lots more comments,
and such that the core structure more closely matches
normal usage, and such that it is obvious
which lines of code should be deleted when copying.
As a bonus, it also now illustrates Frame.File.

Change-Id: Iab73541dd096657ddf79c5795337e8b596d89740
Reviewed-on: https://go-review.googlesource.com/41136
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-04-20 19:46:01 +00:00
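
For reference, the canonical Callers/CallersFrames loop that the rewritten example is built around; this is a hedged paraphrase of the usual pattern, not the example's exact text:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // Ask runtime.Callers for up to 10 PCs, skipping Callers itself.
        pc := make([]uintptr, 10)
        n := runtime.Callers(1, pc)
        if n == 0 {
            return // no PCs available
        }
        frames := runtime.CallersFrames(pc[:n]) // pass only valid PCs

        // A fixed number of PCs can expand to more frames when calls are
        // inlined, so iterate until Next reports there are no more.
        for {
            frame, more := frames.Next()
            fmt.Printf("%s\n\t%s:%d\n", frame.Function, frame.File, frame.Line)
            if !more {
                break
            }
        }
    }
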
Austin Clements
0c0c94a9dc runtime/pprof: fix period information
The period recorded in CPU profiles is in nanoseconds, but was being
computed incorrectly as hz * 1000. As a result, many absolute times
displayed by pprof were incorrect.

Fix this by computing the period correctly.

Change-Id: I6fadd6d8ad3e57f31e8cc7a25a24fcaec510d8d4
Reviewed-on: https://go-review.googlesource.com/40995
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-04-20 19:35:08 +00:00
Josh Bleecher Snyder
01b1a34aac cmd/compile: rework handling of udiv on ARM
Instead of populating the aux symbol
of CALLudiv during rewrite rules,
populate it during genssa.

This simplifies the rewrite rules.
It also removes all remaining calls
to ctxt.Lookup from any rewrite rules.
This is a first step towards removing
ctxt from ssa.Cache entirely,
and also a first step towards converting
the obj.LSym.Version field into a boolean.
It should also speed up compilation.

Also, move func udiv into package runtime.
That's where it is anyway,
and it lets udiv look and act like the rest of
the runtime support functions.

Change-Id: I41462a632c14fdc41f61b08049ec13cd80a87bfe
Reviewed-on: https://go-review.googlesource.com/41191
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-20 16:27:38 +00:00
Daniel Martí
ff7994ac10 all: remove redundant returns
Returns at the end of func bodies where the funcs have no return values
are pointless.

Change-Id: I0da5ea78671503e41a9f56dd770df8c919310ce5
Reviewed-on: https://go-review.googlesource.com/41093
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-19 20:03:51 +00:00
Austin Clements
22000f5407 runtime: record swept and reclaimed bytes in sweep trace
This extends the GCSweepDone event with counts of swept and reclaimed
bytes. These are useful for understanding the duration and
effectiveness of sweep events.

Change-Id: I3c97a4f0f3aad3adbd188adb264859775f54e2df
Reviewed-on: https://go-review.googlesource.com/40811
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2017-04-19 18:31:14 +00:00
Austin Clements
79c56addb6 runtime: make sweep trace events encompass entire sweep loop
Currently, each individual span sweep emits a span to the trace. But
sweeps are generally done in loops until some condition is satisfied,
so this tracing is lower-level than anyone really wants and hides the
fact that no other work is being accomplished between adjacent sweep
events. This is also high overhead: enabling tracing significantly
impacts sweep latency.

Replace this with instead tracing around the sweep loops used for
allocation. This is slightly tricky because sweep loops don't
generally know if any sweeping will happen in them. Hence, we make the
tracing lazy by recording in the P that we would like to start tracing
the sweep *if* one happens, and then only closing the sweep event if
we started it.

This does mean we don't get tracing on every sweep path, which are
legion. However, we get much more informative tracing on the paths
that block allocation, which are the paths that matter.

Change-Id: I73e14fbb250acb0c9d92e3648bddaa5e7d7e271c
Reviewed-on: https://go-review.googlesource.com/40810
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-19 18:31:11 +00:00
Michael Munday
fb28f5ba3a runtime: avoid restricting GOARCH values in documentation
Changes the text to match GOOS which appends 'and so on' at the
end to avoid restricting the set of possible values.

Change-Id: I54bcde71334202cf701662cdc2582c974ba8bf53
Reviewed-on: https://go-review.googlesource.com/41074
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-19 18:19:08 +00:00
Josh Bleecher Snyder
94e44a9c8e runtime: preallocate some overflow buckets
When allocating a non-small array of buckets for a map,
also preallocate some overflow buckets.

The estimate of the number of overflow buckets
is based on a simulation of putting mid=(low+high)/2 elements
into a map, where low is the minimum number of elements
needed to reach this value of b (according to overLoadFactor),
and high is the maximum number of elements possible
to put in this value of b (according to overLoadFactor).
This estimate is surprisingly reliable and accurate.

The number of overflow buckets needed is quadratic,
for a fixed value of b.
Using this mid estimate means that we will overallocate a few
too many overflow buckets when the actual number of elements is near low,
and allocate significantly too few overflow buckets
when the actual number of elements is near high.

The mechanism introduced in this CL can be re-used for
other overflow bucket optimizations.

For example, given an initial size hint,
we could estimate quite precisely the number of overflow buckets.
This is #19931.

We could also change from "non-nil means end-of-list"
to "pointer-to-hmap.buckets means end-of-list",
and then create a linked list of reusable overflow buckets
when they are freed by map growth.
That is #19992.

We could also use a similar mechanism to do bulk allocation
of overflow buckets.
All these uses can co-exist with only the one additional pointer
in mapextra, given a little care.

name                  old time/op    new time/op    delta
MapPopulate/1-8         60.1ns ± 2%    60.3ns ± 2%     ~     (p=0.278 n=19+20)
MapPopulate/10-8         577ns ± 1%     578ns ± 1%     ~     (p=0.140 n=20+20)
MapPopulate/100-8       8.06µs ± 1%    8.19µs ± 1%   +1.67%  (p=0.000 n=20+20)
MapPopulate/1000-8       104µs ± 1%     104µs ± 1%     ~     (p=0.317 n=20+20)
MapPopulate/10000-8      891µs ± 1%     888µs ± 1%     ~     (p=0.101 n=19+20)
MapPopulate/100000-8    8.61ms ± 1%    8.58ms ± 0%   -0.34%  (p=0.009 n=20+17)

name                  old alloc/op   new alloc/op   delta
MapPopulate/1-8          0.00B          0.00B          ~     (all equal)
MapPopulate/10-8          179B ± 0%      179B ± 0%     ~     (all equal)
MapPopulate/100-8       3.33kB ± 0%    3.38kB ± 0%   +1.48%  (p=0.000 n=20+16)
MapPopulate/1000-8      55.5kB ± 0%    53.4kB ± 0%   -3.84%  (p=0.000 n=19+20)
MapPopulate/10000-8      432kB ± 0%     428kB ± 0%   -1.06%  (p=0.000 n=19+20)
MapPopulate/100000-8    3.65MB ± 0%    3.62MB ± 0%   -0.70%  (p=0.000 n=20+20)

name                  old allocs/op  new allocs/op  delta
MapPopulate/1-8           0.00           0.00          ~     (all equal)
MapPopulate/10-8          1.00 ± 0%      1.00 ± 0%     ~     (all equal)
MapPopulate/100-8         18.0 ± 0%      17.0 ± 0%   -5.56%  (p=0.000 n=20+20)
MapPopulate/1000-8        96.0 ± 0%      72.6 ± 1%  -24.38%  (p=0.000 n=20+20)
MapPopulate/10000-8        625 ± 0%       319 ± 0%  -48.86%  (p=0.000 n=20+20)
MapPopulate/100000-8     6.23k ± 0%     4.00k ± 0%  -35.79%  (p=0.000 n=20+20)

Change-Id: I01f41cb1374bdb99ccedbc00d04fb9ae43daa204
Reviewed-on: https://go-review.googlesource.com/40979
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-19 13:47:28 +00:00
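
A hedged sketch of the mid=(low+high)/2 element count described above, assuming the map's 6.5 load factor and 8-entry buckets; the helper names are made up and this is not the runtime's simulation code:

    package main

    import "fmt"

    const loadFactor = 6.5

    // maxFor returns the largest element count allowed for 2^b buckets
    // before overLoadFactor would force a larger b.
    func maxFor(b uint) int { return int(loadFactor * float64(uint64(1)<<b)) }

    // midElems is the element count whose simulated insertion sizes the
    // overflow-bucket preallocation for a given b.
    func midElems(b uint) int {
        low := maxFor(b-1) + 1 // smallest count that forces this b
        high := maxFor(b)      // largest count this b can hold
        return (low + high) / 2
    }

    func main() {
        for b := uint(4); b <= 6; b++ {
            fmt.Printf("b=%d mid=%d\n", b, midElems(b))
        }
    }
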
Josh Bleecher Snyder
2abd91e265 runtime: add a map growth benchmark
Updates #19931
Updates #19992

Change-Id: Ib2d4e6b9b89a49caa443310d896dce8d6db06050
Reviewed-on: https://go-review.googlesource.com/40978
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-19 13:47:21 +00:00
Josh Bleecher Snyder
17d497feaa runtime: add bmap.setoverflow
bmap already has a overflow (getter) method.
Add a setoverflow (setter) method, for readability.

Updates #19931
Updates #19992

Change-Id: I00b3d94037c0d75508a7ebd51085c5c3857fb764
Reviewed-on: https://go-review.googlesource.com/40977
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-19 13:43:01 +00:00
Josh Bleecher Snyder
a41b1d5052 runtime: convert hmap.overflow into hmap.extra
Any change to how we allocate overflow buckets
will require some extra hmap storage,
but we don't want hmap to grow,
particularly as small maps usually don't need overflow buckets.

This CL converts the existing hmap overflow field,
which is usually used for pointer-free maps,
into a generic extra field.

This extra field can be used to hold data that is optional.
If it is valuable enough to have special
handling of overflow buckets, which are medium-sized,
it is valuable enough to pay an extra alloc and two extra words for.

Adding fields to extra would entail adding overhead to pointer-free maps;
any mapextra fields added would need to be weighed against that.
This CL is just rearrangement, though.

Updates #19931
Updates #19992

Change-Id: If8537a206905b9d4dc6cd9d886184ece671b3f80
Reviewed-on: https://go-review.googlesource.com/40976
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-19 13:42:04 +00:00
Josh Bleecher Snyder
619af17205 runtime: refactor hmap setoverflow into newoverflow
This simplifies the code, as well as providing
a single place to modify to change the
allocation of new overflow buckets.

Updates #19931
Updates #19992

Change-Id: I77070619f5c8fe449bbc35278278bca5eda780f2
Reviewed-on: https://go-review.googlesource.com/40975
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-19 13:41:44 +00:00
David Lazar
17137fae2e runtime: fix TestCaller with -l=4
Only the noinline pragma on testCallerFoo is needed to pass the test,
but the second pragma makes the test robust to future changes to the
inliner.

Change-Id: I80b384380c598f52e0382f53b59bb47ff196363d
Reviewed-on: https://go-review.googlesource.com/40877
Run-TryBot: David Lazar <lazard@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-18 19:56:48 +00:00
David Lazar
7821be5951 runtime: make example independent of inlining
Otherwise, with -l=4, runtime.Callers gets inlined and the example
prints too many frames. Now the example passes with -l=4.

Change-Id: I9e420af9371724ac3ec89efafd76a658cf82bb4a
Reviewed-on: https://go-review.googlesource.com/40876
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-04-18 19:56:36 +00:00
David Lazar
0ea120a70c runtime: skip logical frames in runtime.Caller
This rewrites runtime.Caller in terms of stackExpander, which already
handles inlined frames and partially skipped frames. This also has the
effect of making runtime.Caller understand cgo frames if there is a cgo
symbolizer.

Updates #19348.

Change-Id: Icdf4df921aab5aa394d4d92e3becc4dd169c9a6e
Reviewed-on: https://go-review.googlesource.com/40270
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-04-18 19:56:30 +00:00
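
A tiny usage sketch for runtime.Caller; the helper name "where" is an assumption. With the CL above, the reported frame is correct even when intervening frames were inlined (and, with a cgo symbolizer, when they are cgo frames):

    package main

    import (
        "fmt"
        "runtime"
    )

    // where reports the function and file:line of its caller.
    func where() string {
        pc, file, line, ok := runtime.Caller(1) // 1 skips "where" itself
        if !ok {
            return "unknown"
        }
        return fmt.Sprintf("%s at %s:%d", runtime.FuncForPC(pc).Name(), file, line)
    }

    func main() {
        fmt.Println(where())
    }
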
Austin Clements
38521004ed runtime: make internal CallersFrames-equivalent that doesn't escape PC slice
The Frames API forces the PC slice to escape to the heap because it
stores it in the Frames object. However, we'd like to use this API for
call stack expansion internally in the runtime in places where it
would be very good to avoid heap allocation.

This commit makes this possible by pulling the bulk of the Frames
implementation into an internal frameExpander API. The key difference
between these APIs is that the frameExpander does not hold the PC
slice; instead, the caller is responsible for threading the PC slice
through the frameExpander API calls. This makes it possible to keep
the PC slice on the stack. The Frames API then becomes a thin shim
around the frameExpander that keeps the PC slice in the Frames object.

Change-Id: If6b2d0b9132a2a905a0cf5deced9feddce76fc0e
Reviewed-on: https://go-review.googlesource.com/40610
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
2017-04-17 22:46:18 +00:00
Lynn Boger
7d4cca07d2 cmd/asm: detect invalid DS form offsets for ppc64x
While debugging a recent regression it was discovered that
the assembler for ppc64x was not always generating the correct
instruction for DS form loads and stores.  When an instruction
is DS form, then the offset must be a multiple of 4; if it
wasn't, bits outside the offset field were being incorrectly
set, resulting in unexpected and incorrect instructions.

This change adds a check to determine when the opcode is DS form
and then verifies that the offset is a multiple of 4 before
generating the instruction; otherwise it logs an error.

This also changes a few asm files that were using unaligned offsets
for DS form loads and stores.  In the runtime package these were
instructions intended to cause a crash so using aligned or unaligned
offsets doesn't change that behavior.

Change-Id: Ie3a7e1e65dcc9933b54de7a46a054da8459cb56f
Reviewed-on: https://go-review.googlesource.com/40476
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2017-04-17 21:24:35 +00:00
David Lazar
3249cb0ab4 runtime/trace: iterate over frames instead of PCs
Now the runtime/trace tests pass with -l=4.

This also gets rid of the frames cache for multiple reasons:

1) The frames cache was used to avoid repeated calls to funcname and
funcline. Now these calls happen inside the CallersFrames iterator.

2) Maintaining a frames cache is harder: map[uintptr]traceFrame
doesn't work since each PC can map to multiple traceFrames.

3) It's not clear that the cache is important.

Change-Id: I2914ac0b3ba08e39b60149d99a98f9f532b35bbb
Reviewed-on: https://go-review.googlesource.com/40591
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-04-14 12:21:21 +00:00
David Lazar
a7276742e6 runtime/trace: better output when test fails
Change-Id: I108d15eb4cd25904bb76de4ed7548c039c69d1a3
Reviewed-on: https://go-review.googlesource.com/40590
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-04-14 12:21:02 +00:00
Austin Clements
051809e352 runtime: free workbufs during sweeping
This extends the sweeper to free workbufs back to the heap between GC
cycles, allowing this memory to be reused for GC'd allocations or
eventually returned to the OS.

This helps for applications that have high peak heap usage relative to
their regular heap usage (for example, a high-memory initialization
phase). Workbuf memory is roughly proportional to heap size and since
we currently never free workbufs, it's proportional to *peak* heap
size. By freeing workbufs, we can release and reuse this memory for
other purposes when the heap shrinks.

This is somewhat complicated because this costs ~1–2 µs per workbuf
span, so for large heaps it's too expensive to just do synchronously
after mark termination between starting the world and dropping the
worldsema. Hence, we do it asynchronously in the sweeper. This adds a
list of "free" workbuf spans that can be returned to the heap. GC
moves all workbuf spans to this list after mark termination and the
background sweeper drains this list back to the heap. If the sweeper
doesn't finish, that's fine, since getempty can directly reuse any
remaining spans to allocate more workbufs.

Performance impact is negligible. On the x/benchmarks, this reduces
GC-bytes-from-system by 6–11%.

Fixes #19325.

Change-Id: Icb92da2196f0c39ee984faf92d52f29fd9ded7a8
Reviewed-on: https://go-review.googlesource.com/38582
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:47 +00:00
Austin Clements
9cc883a466 runtime: allocate GC workbufs from manually-managed spans
Currently the runtime allocates workbufs from persistent memory, which
means they can never be freed.

Switch to allocating them from manually-managed heap spans. This
doesn't free them yet, but it puts us in a position to do so.

For #19325.

Change-Id: I94b2512a2f2bbbb456cd9347761b9412e80d2da9
Reviewed-on: https://go-review.googlesource.com/38581
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:44 +00:00
Austin Clements
42c1214762 runtime: eliminate write barriers from alloc/mark bitmaps
This introduces a new type, *gcBits, to use for alloc/mark bitmap
allocations instead of *uint8. This type is marked go:notinheap, so
uses of it correctly eliminate write barriers. Since we now have a
type, this also extracts some common operations to methods both for
convenience and to avoid (*uint8) casts at most use sites.

For #19325.

Change-Id: Id51f734fb2e96b8b7715caa348c8dcd4aef0696a
Reviewed-on: https://go-review.googlesource.com/38580
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:42 +00:00
Austin Clements
9d1b2f888e runtime: rename gcBits -> gcBitsArena
This clarifies that the gcBits type is actually an arena of gcBits and
will let us introduce a new gcBits type representing a single
mark/alloc bitmap allocated from the arena.

For #19325.

Change-Id: Idedf76d202d9174a17c61bcca9d5539e042e2445
Reviewed-on: https://go-review.googlesource.com/38579
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:40 +00:00
Austin Clements
dc0f0ab70f runtime: don't count manually-managed spans from heap_{inuse,sys}
Currently, manually-managed spans are included in memstats.heap_inuse
and memstats.heap_sys, but when we export these stats to the user, we
subtract out how much has been allocated for stack spans from both.
This works for now because stacks are the only manually-managed spans
we have.

However, we're about to use manually-managed spans for more things
that don't necessarily have obvious stats we can use to adjust the
user-presented numbers. Prepare for this by changing the accounting so
manually-managed spans don't count toward heap_inuse or heap_sys. This
makes these fields align with the fields presented to the user and
means we don't have to track more statistics just so we can adjust
these statistics.

For #19325.

Change-Id: I5cb35527fd65587ff23339276ba2c3969e2ad98f
Reviewed-on: https://go-review.googlesource.com/38577
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:38 +00:00
Austin Clements
407c56ae9f runtime: generalize {alloc,free}Stack to {alloc,free}Manual
We're going to start using manually-managed spans for GC workbufs, so
rename the allocate/free methods and pass in a pointer to the stats to
use instead of using the stack stats directly.

For #19325.

Change-Id: I37df0147ae5a8e1f3cb37d59c8e57a1fcc6f2980
Reviewed-on: https://go-review.googlesource.com/38576
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:35 +00:00
Austin Clements
ab9db51e1c runtime: rename mspan.stackfreelist -> manualFreeList
We're going to use this free list for other types of manually-managed
memory in the heap.

For #19325.

Change-Id: Ib7e682295133eabfddf3a84f44db43d937bfdd9c
Reviewed-on: https://go-review.googlesource.com/38575
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:33 +00:00
Austin Clements
8fbaa4f70b runtime: rename _MSpanStack -> _MSpanManual
We're about to generalize _MSpanStack to be used for other forms of
in-heap manual memory management in the runtime. This is an automated
rename of _MSpanStack to _MSpanManual plus some comment fix-ups.

For #19325.

Change-Id: I1e20a57bb3b87a0d324382f92a3e294ffc767395
Reviewed-on: https://go-review.googlesource.com/38574
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:30 +00:00
Wei Xiao
ab636b899c hash/crc32: optimize arm64 crc32 implementation
ARMv8 defines crc32 instruction.

Comparing to the original crc32 calculation, this patch makes use of
crc32 instructions to do crc32 calculation instead of the multiple
lookup table algorithms.

ARMv8 provides IEEE and Castagnoli polynomials for crc32 calculation
so that the performance of these two types of crc32 is significantly
improved.

name                                        old time/op   new time/op    delta
CRC32/poly=IEEE/size=15/align=0-32            117ns ± 0%      38ns ± 0%   -67.44%
CRC32/poly=IEEE/size=15/align=1-32            117ns ± 0%      38ns ± 0%   -67.52%
CRC32/poly=IEEE/size=40/align=0-32            129ns ± 0%      41ns ± 0%   -68.37%
CRC32/poly=IEEE/size=40/align=1-32            129ns ± 0%      41ns ± 0%   -68.29%
CRC32/poly=IEEE/size=512/align=0-32           828ns ± 0%     246ns ± 0%   -70.29%
CRC32/poly=IEEE/size=512/align=1-32           828ns ± 0%     132ns ± 0%   -84.06%
CRC32/poly=IEEE/size=1kB/align=0-32          1.58µs ± 0%    0.46µs ± 0%   -70.98%
CRC32/poly=IEEE/size=1kB/align=1-32          1.58µs ± 0%    0.46µs ± 0%   -70.92%
CRC32/poly=IEEE/size=4kB/align=0-32          6.06µs ± 0%    1.74µs ± 0%   -71.27%
CRC32/poly=IEEE/size=4kB/align=1-32          6.10µs ± 0%    1.74µs ± 0%   -71.44%
CRC32/poly=IEEE/size=32kB/align=0-32         48.3µs ± 0%    13.7µs ± 0%   -71.61%
CRC32/poly=IEEE/size=32kB/align=1-32         48.3µs ± 0%    13.7µs ± 0%   -71.60%
CRC32/poly=Castagnoli/size=15/align=0-32      116ns ± 0%      38ns ± 0%   -67.07%
CRC32/poly=Castagnoli/size=15/align=1-32      116ns ± 0%      38ns ± 0%   -66.90%
CRC32/poly=Castagnoli/size=40/align=0-32      127ns ± 0%      40ns ± 0%   -68.11%
CRC32/poly=Castagnoli/size=40/align=1-32      127ns ± 0%      40ns ± 0%   -68.11%
CRC32/poly=Castagnoli/size=512/align=0-32     828ns ± 0%     132ns ± 0%   -84.06%
CRC32/poly=Castagnoli/size=512/align=1-32     827ns ± 0%     132ns ± 0%   -84.04%
CRC32/poly=Castagnoli/size=1kB/align=0-32    1.59µs ± 0%    0.22µs ± 0%   -85.89%
CRC32/poly=Castagnoli/size=1kB/align=1-32    1.58µs ± 0%    0.22µs ± 0%   -85.79%
CRC32/poly=Castagnoli/size=4kB/align=0-32    6.14µs ± 0%    0.77µs ± 0%   -87.40%
CRC32/poly=Castagnoli/size=4kB/align=1-32    6.06µs ± 0%    0.77µs ± 0%   -87.25%
CRC32/poly=Castagnoli/size=32kB/align=0-32   48.3µs ± 0%     5.9µs ± 0%   -87.71%
CRC32/poly=Castagnoli/size=32kB/align=1-32   48.4µs ± 0%     6.0µs ± 0%   -87.69%
CRC32/poly=Koopman/size=15/align=0-32         104ns ± 0%     104ns ± 0%    +0.00%
CRC32/poly=Koopman/size=15/align=1-32         104ns ± 0%     104ns ± 0%    +0.00%
CRC32/poly=Koopman/size=40/align=0-32         235ns ± 0%     235ns ± 0%    +0.00%
CRC32/poly=Koopman/size=40/align=1-32         235ns ± 0%     235ns ± 0%    +0.00%
CRC32/poly=Koopman/size=512/align=0-32       2.71µs ± 0%    2.71µs ± 0%    -0.07%
CRC32/poly=Koopman/size=512/align=1-32       2.71µs ± 0%    2.71µs ± 0%    -0.04%
CRC32/poly=Koopman/size=1kB/align=0-32       5.40µs ± 0%    5.39µs ± 0%    -0.06%
CRC32/poly=Koopman/size=1kB/align=1-32       5.40µs ± 0%    5.40µs ± 0%    +0.02%
CRC32/poly=Koopman/size=4kB/align=0-32       21.5µs ± 0%    21.5µs ± 0%    -0.16%
CRC32/poly=Koopman/size=4kB/align=1-32       21.5µs ± 0%    21.5µs ± 0%    -0.05%
CRC32/poly=Koopman/size=32kB/align=0-32       172µs ± 0%     172µs ± 0%    -0.07%
CRC32/poly=Koopman/size=32kB/align=1-32       172µs ± 0%     172µs ± 0%    -0.01%

name                                        old speed     new speed      delta
CRC32/poly=IEEE/size=15/align=0-32          128MB/s ± 0%   394MB/s ± 0%  +207.95%
CRC32/poly=IEEE/size=15/align=1-32          128MB/s ± 0%   394MB/s ± 0%  +208.09%
CRC32/poly=IEEE/size=40/align=0-32          310MB/s ± 0%   979MB/s ± 0%  +216.07%
CRC32/poly=IEEE/size=40/align=1-32          310MB/s ± 0%   979MB/s ± 0%  +216.16%
CRC32/poly=IEEE/size=512/align=0-32         618MB/s ± 0%  2074MB/s ± 0%  +235.72%
CRC32/poly=IEEE/size=512/align=1-32         618MB/s ± 0%  3852MB/s ± 0%  +523.55%
CRC32/poly=IEEE/size=1kB/align=0-32         646MB/s ± 0%  2225MB/s ± 0%  +244.57%
CRC32/poly=IEEE/size=1kB/align=1-32         647MB/s ± 0%  2225MB/s ± 0%  +243.87%
CRC32/poly=IEEE/size=4kB/align=0-32         676MB/s ± 0%  2352MB/s ± 0%  +248.02%
CRC32/poly=IEEE/size=4kB/align=1-32         672MB/s ± 0%  2352MB/s ± 0%  +250.15%
CRC32/poly=IEEE/size=32kB/align=0-32        678MB/s ± 0%  2387MB/s ± 0%  +252.17%
CRC32/poly=IEEE/size=32kB/align=1-32        678MB/s ± 0%  2388MB/s ± 0%  +252.11%
CRC32/poly=Castagnoli/size=15/align=0-32    129MB/s ± 0%   393MB/s ± 0%  +205.51%
CRC32/poly=Castagnoli/size=15/align=1-32    129MB/s ± 0%   390MB/s ± 0%  +203.41%
CRC32/poly=Castagnoli/size=40/align=0-32    314MB/s ± 0%   988MB/s ± 0%  +215.04%
CRC32/poly=Castagnoli/size=40/align=1-32    314MB/s ± 0%   987MB/s ± 0%  +214.68%
CRC32/poly=Castagnoli/size=512/align=0-32   618MB/s ± 0%  3860MB/s ± 0%  +524.32%
CRC32/poly=Castagnoli/size=512/align=1-32   619MB/s ± 0%  3859MB/s ± 0%  +523.66%
CRC32/poly=Castagnoli/size=1kB/align=0-32   645MB/s ± 0%  4568MB/s ± 0%  +608.56%
CRC32/poly=Castagnoli/size=1kB/align=1-32   650MB/s ± 0%  4567MB/s ± 0%  +602.94%
CRC32/poly=Castagnoli/size=4kB/align=0-32   667MB/s ± 0%  5297MB/s ± 0%  +693.81%
CRC32/poly=Castagnoli/size=4kB/align=1-32   676MB/s ± 0%  5297MB/s ± 0%  +684.00%
CRC32/poly=Castagnoli/size=32kB/align=0-32  678MB/s ± 0%  5519MB/s ± 0%  +713.83%
CRC32/poly=Castagnoli/size=32kB/align=1-32  677MB/s ± 0%  5497MB/s ± 0%  +712.04%
CRC32/poly=Koopman/size=15/align=0-32       143MB/s ± 0%   144MB/s ± 0%    +0.27%
CRC32/poly=Koopman/size=15/align=1-32       143MB/s ± 0%   144MB/s ± 0%    +0.33%
CRC32/poly=Koopman/size=40/align=0-32       169MB/s ± 0%   170MB/s ± 0%    +0.12%
CRC32/poly=Koopman/size=40/align=1-32       170MB/s ± 0%   170MB/s ± 0%    +0.08%
CRC32/poly=Koopman/size=512/align=0-32      189MB/s ± 0%   189MB/s ± 0%    +0.07%
CRC32/poly=Koopman/size=512/align=1-32      189MB/s ± 0%   189MB/s ± 0%    +0.04%
CRC32/poly=Koopman/size=1kB/align=0-32      190MB/s ± 0%   190MB/s ± 0%    +0.05%
CRC32/poly=Koopman/size=1kB/align=1-32      190MB/s ± 0%   190MB/s ± 0%    -0.01%
CRC32/poly=Koopman/size=4kB/align=0-32      190MB/s ± 0%   190MB/s ± 0%    +0.15%
CRC32/poly=Koopman/size=4kB/align=1-32      190MB/s ± 0%   191MB/s ± 0%    +0.05%
CRC32/poly=Koopman/size=32kB/align=0-32     191MB/s ± 0%   191MB/s ± 0%    +0.06%
CRC32/poly=Koopman/size=32kB/align=1-32     191MB/s ± 0%   191MB/s ± 0%    +0.02%

Also fix a bug in the arm64 assembler.

The optimization is mainly contributed by Fangming.Fang <fangming.fang@arm.com>

Change-Id: I900678c2e445d7e8ad9e2a9ab3305d649230905f
Reviewed-on: https://go-review.googlesource.com/40074
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-13 12:44:10 +00:00
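
A short usage sketch: the IEEE and Castagnoli polynomials below are the two cases accelerated by the arm64 instructions per the CL above (Koopman keeps the table-driven path); the input string is arbitrary:

    package main

    import (
        "fmt"
        "hash/crc32"
    )

    func main() {
        data := []byte("hello, world")

        ieee := crc32.ChecksumIEEE(data)
        castagnoli := crc32.Checksum(data, crc32.MakeTable(crc32.Castagnoli))

        fmt.Printf("IEEE: %08x  Castagnoli: %08x\n", ieee, castagnoli)
    }
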
Austin Clements
7f32d41e5d runtime: expand inlining iteratively in CallersFrames
Currently CallersFrames expands each PC to a slice of Frames and then
iteratively returns those Frames. However, this makes it very
difficult to avoid heap allocation: either the Frames slice will be
heap allocated, or, if it uses internal scratch space for small slices
(as it currently does), the Frames object itself has to be heap
allocated.

Fix this, at least in the common case, by expanding each PC
iteratively. We introduce a new pcExpander type that's responsible for
expanding a single PC. This maintains state from one Frame to the next
in the same PC. Frames then becomes a wrapper around this responsible
for feeding it the next PC when the pcExpander runs out of frames for
the current PC.

This makes it possible to stack-allocate a Frames object, which will
make it possible to use this API for PC expansion from within the
runtime itself.

Change-Id: I993463945ab574557cf1d6bedbe79ce7e9cbbdcd
Reviewed-on: https://go-review.googlesource.com/40434
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
2017-04-12 19:39:37 +00:00
Todd Neal
e49627d355 plugin: properly handle recursively defined types
Prevent a crash if the same type in two plugins had a recursive
definition, either by referring to a pointer to itself or a map existing
with the type as a value type (which creates a recursive definition
through the overflow bucket type).

Fixes #19258

Change-Id: Iac1cbda4c5b6e8edd5e6859a4d5da3bad539a9c6
Reviewed-on: https://go-review.googlesource.com/40292
Run-TryBot: Todd Neal <todd@tneal.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-04-12 12:46:07 +00:00
Joel Sing
092405a9af runtime/cgo: actually remove gcc_libinit_openbsd.c
This was unintentionally emptied rather than removed in 9417c022.

Change-Id: Ie6fdcf7ef55e58f12e2a2750ab448aa2d9f94d15
Reviewed-on: https://go-review.googlesource.com/40413
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-12 06:56:34 +00:00
Joel Sing
9417c022c6 cmd/link,runtime/cgo: enable PT_TLS generation on OpenBSD
OpenBSD 6.0 and later have support for PT_TLS in ld.so(1). Now that OpenBSD
6.1 has been released, OpenBSD 5.9 is no longer officially supported and Go
can start generating PT_TLS for OpenBSD cgo binaries. This also allows us
to remove the workarounds in the OpenBSD cgo runtime.

This change also removes the environ and progname exports - these are now
provided directly by ld.so(1) itself.

Fixes #19932

Change-Id: I42e75ef9feb5dcd4696add5233497e3cbc48ad52
Reviewed-on: https://go-review.googlesource.com/40331
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-11 16:33:16 +00:00
Ben Shi
69261ecad6 runtime: use hardware divider to improve performance
The hardware divider is an optional component of ARMv7. This patch
detects at runtime whether it is available and uses it if so.

1. The hardware divider is detected at startup and a flag is set/cleared
   according to a particular bit of runtime.hwcap.
2. Each call of runtime.udiv checks this flag and decides whether to
   use the hardware division instruction.

A rough test shows a 40-50% performance improvement for ARMv7, and
compatibility with ARMv5/v6 is not broken.

fixes #19118

Change-Id: Ic586bc9659ebc169553ca2004d2bdb721df823ac
Reviewed-on: https://go-review.googlesource.com/37496
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-11 12:25:55 +00:00
Austin Clements
6c6f455f88 runtime: consolidate changes to arena_used
Changing mheap_.arena_used requires several steps that are currently
repeated multiple times in mheap_.sysAlloc. Consolidate these into a
single function.

In the future, this will also make it easier to add other auxiliary VM
structures.

Change-Id: Ie68837d2612e1f4ba4904acb1b6b832b15431d56
Reviewed-on: https://go-review.googlesource.com/40151
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-11 01:35:47 +00:00
Caleb Spare
221541ec8c testing: consider a test failed after race errors
Fixes #19851.

Change-Id: I5ee9533406542be7d5418df154f6134139e75892
Reviewed-on: https://go-review.googlesource.com/39890
Run-TryBot: Caleb Spare <cespare@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-04-10 14:36:02 +00:00
Austin Clements
7e1832d06c runtime: say where the compiler knows about var writeBarrier
The runtime.writeBarrier variable tries to be helpful by telling you
that the compiler also knows about this variable, which you could
probably guess, but doesn't say how the compiler knows about it. In
fact, the compiler has a complete copy in builtin/runtime.go that
needs to be kept in sync. Say so.

Change-Id: Ia7fb0c591cb6f9b8230decce01008b417dfcec89
Reviewed-on: https://go-review.googlesource.com/40150
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-09 23:05:24 +00:00
Todd Neal
0d33dc3105 runtime: improve output of panic(x) where x is numeric
Fixes #19658

Change-Id: I41e46073b75c7674e2ed9d6a90ece367ce92166b
Reviewed-on: https://go-review.googlesource.com/39650
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-09 22:40:33 +00:00
Cherry Zhang
0020b8a257 runtime: prevent TLS fetching instructions from being assembled on NaCl/ARM
They are dead code already, but the verifier is still not happy.
Don't assemble them at all.

It looks like it has been like that for a long time. I don't know why
it was OK. Maybe the verifier is now more picky?

Fixes #19884.

Change-Id: Ib806fb73ca469789dec56f52d484cf8baf7a245c
Reviewed-on: https://go-review.googlesource.com/40111
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Dave Cheney <dave@cheney.net>
2017-04-08 22:51:18 +00:00
Josselin Costanzi
d206af1e6c strings: optimize Count for amd64
Move optimized Count implementation from bytes to runtime. Use in
both bytes and strings packages.
Add CountByte benchmark to strings.

Strings benchmarks:
name                       old time/op    new time/op    delta
CountHard1-4                 226µs ± 1%      226µs ± 2%      ~     (p=0.247 n=10+10)
CountHard2-4                 316µs ± 1%      315µs ± 0%      ~     (p=0.133 n=9+10)
CountHard3-4                 919µs ± 1%      920µs ± 1%      ~     (p=0.968 n=10+9)
CountTorture-4              15.4µs ± 1%     15.7µs ± 1%    +2.47%  (p=0.000 n=10+9)
CountTortureOverlapping-4   9.60ms ± 0%     9.65ms ± 1%      ~     (p=0.247 n=10+10)
CountByte/10-4              26.3ns ± 1%     10.9ns ± 1%   -58.71%  (p=0.000 n=9+9)
CountByte/32-4              42.7ns ± 0%     14.2ns ± 0%   -66.64%  (p=0.000 n=10+10)
CountByte/4096-4            3.07µs ± 0%     0.31µs ± 2%   -89.99%  (p=0.000 n=9+10)
CountByte/4194304-4         3.48ms ± 1%     0.34ms ± 1%   -90.09%  (p=0.000 n=10+9)
CountByte/67108864-4        55.6ms ± 1%      7.0ms ± 0%   -87.49%  (p=0.000 n=9+8)

name                      old speed      new speed       delta
CountByte/10-4             380MB/s ± 1%    919MB/s ± 1%  +142.21%  (p=0.000 n=9+9)
CountByte/32-4             750MB/s ± 0%   2247MB/s ± 0%  +199.62%  (p=0.000 n=10+10)
CountByte/4096-4          1.33GB/s ± 0%  13.32GB/s ± 2%  +898.13%  (p=0.000 n=9+10)
CountByte/4194304-4       1.21GB/s ± 1%  12.17GB/s ± 1%  +908.87%  (p=0.000 n=10+9)
CountByte/67108864-4      1.21GB/s ± 1%   9.65GB/s ± 0%  +699.29%  (p=0.000 n=9+8)

Fixes #19411

Change-Id: I8d2d409f0fa6df6d03b60790aa86e540b4a4e3b0
Reviewed-on: https://go-review.googlesource.com/38693
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-07 14:25:13 +00:00
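
A small sketch of the case the CL speeds up: counting a single byte, which is what the CountByte benchmarks above measure; the inputs are arbitrary:

    package main

    import (
        "bytes"
        "fmt"
        "strings"
    )

    func main() {
        s := strings.Repeat("abcd", 1024)

        // Single-byte needles hit the optimized count shared by both
        // packages after this CL.
        fmt.Println(strings.Count(s, "a"))               // 1024
        fmt.Println(bytes.Count([]byte(s), []byte("d"))) // 1024
    }
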
Daniel Martí
4e7724b2db runtime: remove unused parameter from bestFitTreap
This code was added recently, and it doesn't seem like the parameter
will be useful in the near future.

Change-Id: I5d64dadb6820c159b588262ab90df2461b5fdf04
Reviewed-on: https://go-review.googlesource.com/39692
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-06 17:20:43 +00:00
Austin Clements
9741f0275c runtime: initialize more fields of stack spans
Stack spans don't internally use many of the fields of the mspan,
which means things like the size class and element size get left over
from whatever last used the mspan. This can lead to confusing crashes
and debugging.

Zero these fields or initialize them to something reasonable. This
also lets us simplify some code that currently has to distinguish
between heap and stack spans.

Change-Id: I9bd114e76c147bb32de497045b932f8bf1988bbf
Reviewed-on: https://go-review.googlesource.com/38573
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-05 19:17:41 +00:00
Austin Clements
ecb7b63820 runtime: fix gcpacertrace printing of sweep ratio
Commit 44ed88a5a7 moved printing of the "sweep done" gcpacertrace
message so that it is printed when the final sweeper finishes.
However, by this point some other thread has often already observed
that there are no more spans to sweep and zeroed sweepPagesPerByte.

Avoid printing a 0 sweep ratio in the trace when this race happens by
getting the value of the sweep ratio upon entry to sweepone and
printing that.

Change-Id: Iac0c48ae899e12f193267cdfb012c921f8b71c85
Reviewed-on: https://go-review.googlesource.com/39492
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-05 18:24:52 +00:00
Filip Gruszczyński
6c5a819a5e reflect: add MakeMapWithSize for creating maps with size hint
Providing a size hint when creating a map avoids re-allocating the
underlying data structure when we know how many elements are going to
be inserted. This can be used, for example, when decoding maps in
gob.

Fixes #19599

Change-Id: I108035fec29391215d2261a73eaed1310b46bab1
Reviewed-on: https://go-review.googlesource.com/38335
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-04 20:01:43 +00:00
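
A minimal usage sketch of the new reflect.MakeMapWithSize API; the map type
and sizes here are arbitrary examples.

    package main

    import (
        "fmt"
        "reflect"
    )

    func main() {
        // Build a map[string]int pre-sized for 1000 entries so the
        // underlying buckets are not re-allocated while inserting.
        t := reflect.MapOf(reflect.TypeOf(""), reflect.TypeOf(0))
        m := reflect.MakeMapWithSize(t, 1000)
        m.SetMapIndex(reflect.ValueOf("answer"), reflect.ValueOf(42))
        fmt.Println(m.Interface().(map[string]int)["answer"]) // 42
    }
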
Eric Lagergren
094498c9a1 all: fix minor misspellings
Change-Id: I1f1cfb161640eb8756fb1a283892d06b30b7a8fa
Reviewed-on: https://go-review.googlesource.com/39356
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-03 23:19:07 +00:00
Josh Bleecher Snyder
654c977b26 runtime/race: print output when TestRace parsing fails
Change-Id: I986f0c106e059455874692f5bfe2b5af25cf470e
Reviewed-on: https://go-review.googlesource.com/39090
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-31 17:07:29 +00:00
Austin Clements
9ffbdabdb0 runtime: make runtime.GC() trigger a concurrent GC
Currently runtime.GC() triggers a STW GC. For common uses in tests and
benchmarks, it doesn't matter whether it's STW or concurrent, but for
uses in servers for things like collecting heap profiles and
controlling memory footprint, this pause can be a real problem for
latency.

This changes runtime.GC() to trigger a concurrent GC. In order to
remain as close as possible to its current meaning, we define it to
always perform a full mark/sweep GC cycle before returning (even if
that means it has to finish up a cycle we're in the middle of first)
and to publish the heap profile as of the triggered mark termination.
While it must perform a full cycle, simultaneous runtime.GC() calls
can be consolidated into a single full cycle.

Fixes #18216.

Change-Id: I9088cc5deef4ab6bcf0245ed1982a852a01c44b5
Reviewed-on: https://go-review.googlesource.com/37520
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:21 +00:00
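
A small sketch of how a server might rely on the new behavior: because
runtime.GC() now completes a full mark/sweep cycle before returning, the
statistics read immediately afterwards reflect that cycle. The allocation
and the fields printed are illustrative.

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        _ = make([]byte, 1<<20) // allocate some garbage

        // runtime.GC does not return until a complete mark/sweep cycle has
        // finished, so the statistics read afterwards reflect that cycle.
        runtime.GC()

        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
        fmt.Printf("GC cycles: %d, heap in use: %d bytes\n", ms.NumGC, ms.HeapInuse)
    }
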
Austin Clements
44ed88a5a7 runtime: track the number of active sweepone calls
sweepone returns ^uintptr(0) when there are no more spans to *start*
sweeping, but there may be spans being swept concurrently at the time
and there's currently no efficient way to tell when the sweeper is
done sweeping all the spans.

We'll need this for concurrent runtime.GC(), so add a count of the
number of active sweepone calls to make it possible to block until
sweeping is truly done.

This is also useful for more accurately printing the gcpacertrace,
since that should be printed after all of the sweeping stats are in
(currently we can print it slightly too early).

For #18216.

Change-Id: I06e6240c9e7b40aca6fd7b788bb6962107c10a0f
Reviewed-on: https://go-review.googlesource.com/37716
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:18 +00:00
Austin Clements
2919132e1b runtime: don't adjust GC trigger on forced GC
Forced GCs don't provide good information about how to adjust the GC
trigger. Currently we avoid adjusting the trigger on forced GC because
forced GC is also STW and we don't adjust the trigger on STW GC.
However, this will become a problem when forced GC is concurrent.

Fix this by skipping trigger adjustment if the GC was user-forced.

For #18216.

Change-Id: I03dfdad12ecd3cfeca4573140a0768abb29aac5e
Reviewed-on: https://go-review.googlesource.com/38951
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:16 +00:00
Austin Clements
29fdbcfea3 runtime: track forced GCs independent of gcMode
Currently gcMode != gcBackgroundMode implies this was a user-forced GC
cycle. This is no longer going to be true when we make runtime.GC()
trigger a concurrent GC, so replace this with an explicit
work.userForced bit.

For #18216.

Change-Id: If7d71bbca78b5f0b35641b070f9d457f5c9a52bd
Reviewed-on: https://go-review.googlesource.com/37519
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:13 +00:00
Austin Clements
786eb5b754 runtime: make debug.FreeOSMemory call runtime.GC()
Currently freeOSMemory calls gcStart directly, but we really just want
it to behave like runtime.GC() and then perform a scavenge, so make it
call runtime.GC() rather than gcStart.

For #18216.

Change-Id: I548ec007afc788e87d383532a443a10d92105937
Reviewed-on: https://go-review.googlesource.com/37518
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:10 +00:00
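
Callers see no API change; a usage sketch of debug.FreeOSMemory, which now
runs a full GC (via runtime.GC) followed by a scavenge. The allocation size
and the HeapReleased read are illustrative.

    package main

    import (
        "fmt"
        "runtime"
        "runtime/debug"
    )

    func main() {
        garbage := make([]byte, 64<<20)
        garbage = nil
        _ = garbage

        // FreeOSMemory now behaves like runtime.GC() followed by a scavenge:
        // it forces a full collection and then returns as much memory to the
        // operating system as possible.
        debug.FreeOSMemory()

        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
        fmt.Printf("heap released to the OS: %d bytes\n", ms.HeapReleased)
    }
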
Austin Clements
3d58498fdb runtime: simplify forced GC triggering
Now that the gcMode is no longer involved in the GC trigger condition,
we can simplify the triggering of forced GCs. By making the trigger
condition for forced GCs true even if gcphase is not _GCoff, we don't
need any special case path in gcStart to ensure that forced GCs don't
get consolidated.

Change-Id: I6067a13d76e40ff2eef8fade6fc14adb0cb58ee5
Reviewed-on: https://go-review.googlesource.com/37517
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:08 +00:00
Austin Clements
29be3f1999 runtime: generalize GC trigger
Currently the GC triggering condition is an awkward combination of the
gcMode (whether or not it's gcBackgroundMode) and a boolean
"forceTrigger" flag.

Replace this with a new gcTrigger type that represents the range of
transition predicates we need. This has several advantages:

1. We can remove the awkward logic that affects the trigger behavior
   based on the gcMode. Now gcMode purely controls whether to run a
   STW GC or not and the gcTrigger controls whether this is a forced
   GC that cannot be consolidated with other GC cycles.

2. We can lift the time-based triggering logic in sysmon to just
   another type of GC trigger and move the logic to the trigger test.

3. This sets us up to have a cycle count-based trigger, which we'll
   use to make runtime.GC trigger concurrent GC with the desired
   consolidation properties.

For #18216.

Change-Id: If9cd49349579a548800f5022ae47b8128004bbfc
Reviewed-on: https://go-review.googlesource.com/37516
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:06 +00:00
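
A rough sketch of the shape of such a trigger type. The names, fields, and
thresholds below are illustrative only and do not mirror the runtime's
private gcTrigger implementation.

    package main

    import (
        "fmt"
        "time"
    )

    // gcTriggerKind selects which transition predicate applies.
    type gcTriggerKind int

    const (
        triggerHeap  gcTriggerKind = iota // heap grew past the trigger size
        triggerTime                       // too long since the last GC (periodic trigger)
        triggerCycle                      // a specific cycle was requested (runtime.GC)
    )

    type gcTrigger struct {
        kind gcTriggerKind
        now  int64  // for triggerTime
        n    uint32 // for triggerCycle
    }

    // gcState is a stand-in for the runtime state the real predicate consults.
    type gcState struct {
        heapLive, heapTrigger uint64
        lastGC                int64
        cyclesDone            uint32
        enabled               bool
    }

    // test reports whether a new GC cycle should start right now.
    func (t gcTrigger) test(s gcState) bool {
        if !s.enabled {
            return false
        }
        switch t.kind {
        case triggerHeap:
            return s.heapLive >= s.heapTrigger
        case triggerTime:
            const forcePeriod = int64(2 * time.Minute)
            return s.lastGC != 0 && t.now-s.lastGC > forcePeriod
        case triggerCycle:
            // Start if the requested cycle hasn't completed yet.
            return int32(t.n-s.cyclesDone) > 0
        }
        return false
    }

    func main() {
        s := gcState{heapLive: 90 << 20, heapTrigger: 64 << 20, enabled: true}
        fmt.Println(gcTrigger{kind: triggerHeap}.test(s)) // true: heap passed the trigger
    }
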
Austin Clements
640cd3b322 runtime: check transition condition before triggering periodic GC
Currently sysmon triggers periodic GC if GC is not currently running
and it's been long enough since the last GC. This misses some
important conditions; for example, whether GC is enabled at all by
GOGC. As a result, if GOGC is off, once we pass the timeout for
periodic GC, sysmon will attempt to trigger a GC every 10ms. This GC
will be a no-op because gcStart will check all of the appropriate
conditions and do nothing, but it still goes through the motions of
waking the forcegc goroutine and printing a gctrace line.

Fix this by making sysmon call gcShouldStart to check *all* of the
appropriate transition conditions before attempting to trigger a
periodic GC.

Fixes #19247.

Change-Id: Icee5521ce175e8419f934723849853d53773af31
Reviewed-on: https://go-review.googlesource.com/37515
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:03 +00:00
Austin Clements
1be3e76e76 runtime: simplify heap profile flushing
Currently the heap profile is flushed by *either* gcSweep in STW mode
or by gcMarkTermination in concurrent mode. Simplify this by making
gcMarkTermination always flush the heap profile and by making gcSweep
do one extra flush (instead of two) in STW mode.

Change-Id: I62147afb2a128e1f3d92ef4bb8144c8a345f53c4
Reviewed-on: https://go-review.googlesource.com/37715
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:01 +00:00
Austin Clements
eee85fc5a1 runtime: snapshot heap profile during mark termination
Currently we snapshot the heap profile just *after* mark termination
starts the world because it's a relatively expensive operation.
However, this means any alloc or free events that happen between
starting the world and snapshotting the heap profile can be accounted
to the wrong cycle. In the worst case, a free can be accounted to the
cycle before the alloc; if the heap is small, this can result
temporarily in a negative "in use" count in the profile.

Fix this without making STW more expensive by using a global heap
profile cycle counter. This lets us split up the operation into two
parts: 1) a super-cheap snapshot operation that simply increments the
global cycle counter during STW, and 2) a more expensive cleanup
operation we can do after starting the world that frees up a slot in
all buckets for use by the next heap profile cycle.

Fixes #19311.

Change-Id: I6bdafabf111c48b3d26fe2d91267f7bef0bd4270
Reviewed-on: https://go-review.googlesource.com/37714
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:14:56 +00:00
Austin Clements
3ebe7d7d11 runtime: pull heap profile cycle into a type
Currently memRecord has the same set of four fields repeated three
times. Pull these into a type and use this type three times. This
cleans up and simplifies the code a bit and will make it easier to
switch to a globally tracked heap profile cycle for #19311.

Change-Id: I414d15673feaa406a8366b48784437c642997cf2
Reviewed-on: https://go-review.googlesource.com/37713
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:14:54 +00:00
Austin Clements
673a8fdfe6 runtime: diagram flow of stats through heap profile
Every time I modify heap profiling, I find myself redrawing this
diagram, so add it to the comments. This shows how allocations and
frees are accounted, how we arrive at consistent profile snapshots,
and when those snapshots are published to the user.

Change-Id: I106aba1200af3c773b46e24e5f50205e808e2c69
Reviewed-on: https://go-review.googlesource.com/37514
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 00:46:18 +00:00
Austin Clements
ef1829d1de runtime: improve TestMemStats checks
Now that we have a nice predicate system, improve the tests performed
by TestMemStats. We add some more non-zero checks (now that we force a
GC, things like NumGC must be non-zero), checks for trivial boolean
fields, and a few more range checks.

Change-Id: I6da46d33fa0ce5738407ee57d587825479413171
Reviewed-on: https://go-review.googlesource.com/37513
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 00:46:16 +00:00
Austin Clements
bda74b0e4a runtime: make TestMemStats failure messages useful
Currently most TestMemStats failures dump the whole MemStats object if
anything is amiss without telling you what is amiss, or even which
field is wrong. This makes it hard to figure out what the actual
problem is.

Replace this with a reflection walk over MemStats and a map of
predicates to check. If one fails, we can construct a detailed and
descriptive error message. The predicates are a direct translation of
the current tests.

Change-Id: I5a7cafb8e6a1eeab653d2e18bb74e2245eaa5444
Reviewed-on: https://go-review.googlesource.com/37512
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 00:46:14 +00:00
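
The approach can be sketched roughly as follows; the test name, the
predicate map, and the chosen fields are illustrative, not the actual
TestMemStats code.

    package runtime_test

    import (
        "reflect"
        "runtime"
        "testing"
    )

    func TestMemStatsPredicates(t *testing.T) {
        runtime.GC() // force a cycle so counters like NumGC are non-zero
        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)

        // Map each field name to the predicate it must satisfy; a failure
        // reports the field and value instead of dumping the whole struct.
        checks := map[string]func(uint64) bool{
            "Alloc":     func(v uint64) bool { return v > 0 },
            "HeapAlloc": func(v uint64) bool { return v > 0 },
            "NumGC":     func(v uint64) bool { return v > 0 },
        }

        rv := reflect.ValueOf(ms)
        for name, pred := range checks {
            v := rv.FieldByName(name).Uint() // all three are unsigned integer fields
            if !pred(v) {
                t.Errorf("MemStats.%s = %d fails its predicate", name, v)
            }
        }
    }
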
Caleb Spare
592037f381 runtime: fix for implementation notes appearing in godoc
Change-Id: I31cfae1e98313b68e3bc8f49079491d2725a662b
Reviewed-on: https://go-review.googlesource.com/38850
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-29 22:32:57 +00:00
David Lazar
7bf0adc6ad runtime: include inlined calls in result of CallersFrames
Change-Id: If1a3396175f2afa607d56efd1444181334a9ae3e
Reviewed-on: https://go-review.googlesource.com/37862
Reviewed-by: Austin Clements <austin@google.com>
2017-03-29 17:27:38 +00:00
David Lazar
ee97216a17 runtime: handle inlined calls in runtime.Callers
The `skip` argument passed to runtime.Caller and runtime.Callers should
be interpreted as the number of logical calls to skip (rather than the
number of physical stack frames to skip). This changes runtime.Callers
to skip inlined calls in addition to physical stack frames.

The result value of runtime.Callers is a slice of program counters
([]uintptr) representing physical stack frames. If the `skip` parameter
to runtime.Callers skips part-way into a physical frame, there is no
convenient way to encode that in the resulting slice. To avoid changing
the API in an incompatible way, our solution is to store the number of
skipped logical calls of the first frame in the _second_ uintptr
returned by runtime.Callers. Since this number is a small integer, we
encode it as a valid PC value into a small symbol called:

    runtime.skipPleaseUseCallersFrames

For example, if f() calls g(), g() calls `runtime.Callers(2, pcs)`, and
g() is inlined into f, then the frame for f will be partially skipped,
resulting in the following slice:

    pcs = []uintptr{pc_in_f, runtime.skipPleaseUseCallersFrames+1, ...}

We store the skip PC in pcs[1] instead of pcs[0] so that `pcs[i:]` will
truncate the captured stack trace rather than grow it for all i.

Updates #19348.

Change-Id: I1c56f89ac48c29e6f52a5d085567c6d77d499cf1
Reviewed-on: https://go-review.googlesource.com/37854
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-29 17:22:08 +00:00
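
The consumer-side pattern, which decodes both physical and inlined frames
(including the skip encoding above) transparently; the function names here
are placeholders.

    package main

    import (
        "fmt"
        "runtime"
    )

    func logCaller() {
        // Skip runtime.Callers and logCaller itself. If logCaller's caller was
        // inlined, the partially skipped frame is encoded in the PC slice and
        // decoded transparently by CallersFrames below.
        pcs := make([]uintptr, 16)
        n := runtime.Callers(2, pcs)
        frames := runtime.CallersFrames(pcs[:n])
        for {
            frame, more := frames.Next()
            fmt.Printf("%s\n\t%s:%d\n", frame.Function, frame.File, frame.Line)
            if !more {
                break
            }
        }
    }

    func caller() { logCaller() } // small enough that it may be inlined

    func main() { caller() }
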
Rick Hudson
6e9ec14186 runtime: redo insert/remove of large spans
Currently, for spans of up to 1 MByte (128 pages), we
maintain an array indexed by the number of pages in the
span. This is efficient both in terms of space and in the
time to insert or remove a span of a particular size.

Unfortunately, spans larger than 1 MByte are currently
placed on a separate linked list. This results in
O(n) behavior. Now that we are seeing heaps approaching
100 GBytes, n is large enough to be noticed in real programs.

This change replaces the linked list now used with a balanced
binary tree structure called a treap. A treap is a
probabilistically balanced tree offering O(logN) behavior for
inserting and removing spans.

To verify that this approach will work, we start by noting
that only spans larger than 1 MByte will be put into the treap.
This means that to support 1 TByte the treap will need at most
1 million nodes, which can ideally be held in a treap with a
depth of 20. Experiments with adding and removing randomly
sized spans from the treap seem to result in treaps with depths
of about twice the ideal, or 40. A petabyte would require a tree
of only twice that depth again, so this algorithm should last
well into the future.

Fixes #19393

Go1 benchmarks indicate this is basically an overall wash.
Tue Mar 28 21:29:21 EDT 2017
name                     old time/op    new time/op    delta
BinaryTree17-4              2.42s ± 1%     2.42s ± 1%    ~     (p=0.980 n=21+21)
Fannkuch11-4                3.00s ± 1%     3.18s ± 4%  +6.10%  (p=0.000 n=22+24)
FmtFprintfEmpty-4          40.5ns ± 1%    40.3ns ± 3%    ~     (p=0.692 n=22+25)
FmtFprintfString-4         65.9ns ± 3%    64.6ns ± 1%  -1.98%  (p=0.000 n=24+23)
FmtFprintfInt-4            69.6ns ± 1%    68.0ns ± 7%  -2.30%  (p=0.001 n=21+22)
FmtFprintfIntInt-4          102ns ± 2%      99ns ± 1%  -3.07%  (p=0.000 n=23+23)
FmtFprintfPrefixedInt-4     126ns ± 0%     125ns ± 0%  -0.79%  (p=0.000 n=19+17)
FmtFprintfFloat-4           206ns ± 2%     205ns ± 1%    ~     (p=0.671 n=23+21)
FmtManyArgs-4               441ns ± 1%     445ns ± 1%  +0.88%  (p=0.000 n=22+23)
GobDecode-4                5.73ms ± 1%    5.86ms ± 1%  +2.37%  (p=0.000 n=23+22)
GobEncode-4                4.51ms ± 1%    4.89ms ± 1%  +8.32%  (p=0.000 n=22+22)
Gzip-4                      197ms ± 0%     202ms ± 1%  +2.75%  (p=0.000 n=23+24)
Gunzip-4                   32.9ms ± 8%    32.7ms ± 2%    ~     (p=0.466 n=23+24)
HTTPClientServer-4         57.3µs ± 1%    56.7µs ± 1%  -0.94%  (p=0.000 n=21+22)
JSONEncode-4               13.8ms ± 1%    13.9ms ± 2%  +1.14%  (p=0.000 n=22+23)
JSONDecode-4               47.4ms ± 1%    48.1ms ± 1%  +1.49%  (p=0.000 n=23+23)
Mandelbrot200-4            3.92ms ± 0%    3.92ms ± 1%  +0.21%  (p=0.000 n=22+22)
GoParse-4                  2.89ms ± 1%    2.87ms ± 1%  -0.68%  (p=0.000 n=21+22)
RegexpMatchEasy0_32-4      73.6ns ± 1%    72.0ns ± 2%  -2.15%  (p=0.000 n=21+22)
RegexpMatchEasy0_1K-4       173ns ± 1%     173ns ± 1%    ~     (p=0.847 n=22+24)
RegexpMatchEasy1_32-4      71.9ns ± 1%    69.8ns ± 1%  -2.99%  (p=0.000 n=23+20)
RegexpMatchEasy1_1K-4       314ns ± 1%     308ns ± 1%  -1.91%  (p=0.000 n=22+23)
RegexpMatchMedium_32-4      106ns ± 0%     105ns ± 1%  -0.58%  (p=0.000 n=19+21)
RegexpMatchMedium_1K-4     34.3µs ± 1%    34.3µs ± 1%    ~     (p=0.871 n=23+22)
RegexpMatchHard_32-4       1.67µs ± 1%    1.67µs ± 7%    ~     (p=0.224 n=22+23)
RegexpMatchHard_1K-4       51.5µs ± 1%    50.4µs ± 1%  -1.99%  (p=0.000 n=22+23)
Revcomp-4                   383ms ± 1%     415ms ± 0%  +8.51%  (p=0.000 n=22+22)
Template-4                 51.5ms ± 1%    51.5ms ± 1%    ~     (p=0.555 n=20+23)
TimeParse-4                 279ns ± 2%     277ns ± 1%  -0.95%  (p=0.000 n=24+22)
TimeFormat-4                294ns ± 1%     296ns ± 1%  +0.58%  (p=0.003 n=24+23)
[Geo mean]                 43.7µs         43.8µs       +0.32%

name                     old speed      new speed      delta
GobDecode-4               134MB/s ± 1%   131MB/s ± 1%  -2.32%  (p=0.000 n=23+22)
GobEncode-4               170MB/s ± 1%   157MB/s ± 1%  -7.68%  (p=0.000 n=22+22)
Gzip-4                   98.7MB/s ± 0%  96.1MB/s ± 1%  -2.68%  (p=0.000 n=23+24)
Gunzip-4                  590MB/s ± 7%   593MB/s ± 2%    ~     (p=0.466 n=23+24)
JSONEncode-4              141MB/s ± 1%   139MB/s ± 2%  -1.13%  (p=0.000 n=22+23)
JSONDecode-4             40.9MB/s ± 1%  40.3MB/s ± 0%  -1.47%  (p=0.000 n=23+23)
GoParse-4                20.1MB/s ± 1%  20.2MB/s ± 1%  +0.69%  (p=0.000 n=21+22)
RegexpMatchEasy0_32-4     435MB/s ± 1%   444MB/s ± 2%  +2.21%  (p=0.000 n=21+22)
RegexpMatchEasy0_1K-4    5.89GB/s ± 1%  5.89GB/s ± 1%    ~     (p=0.439 n=22+24)
RegexpMatchEasy1_32-4     445MB/s ± 1%   459MB/s ± 1%  +3.06%  (p=0.000 n=23+20)
RegexpMatchEasy1_1K-4    3.26GB/s ± 1%  3.32GB/s ± 1%  +1.97%  (p=0.000 n=22+23)
RegexpMatchMedium_32-4   9.40MB/s ± 1%  9.44MB/s ± 1%  +0.43%  (p=0.000 n=23+21)
RegexpMatchMedium_1K-4   29.8MB/s ± 1%  29.8MB/s ± 1%    ~     (p=0.826 n=23+22)
RegexpMatchHard_32-4     19.1MB/s ± 1%  19.1MB/s ± 7%    ~     (p=0.233 n=22+23)
RegexpMatchHard_1K-4     19.9MB/s ± 1%  20.3MB/s ± 1%  +2.03%  (p=0.000 n=22+23)
Revcomp-4                 664MB/s ± 1%   612MB/s ± 0%  -7.85%  (p=0.000 n=22+22)
Template-4               37.6MB/s ± 1%  37.7MB/s ± 1%    ~     (p=0.558 n=20+23)
[Geo mean]                134MB/s        133MB/s       -0.76%
Tue Mar 28 22:16:54 EDT 2017

Change-Id: I4a4f5c2b53d3fb85ef76c98522d3ed5cf8ae5b7e
Reviewed-on: https://go-review.googlesource.com/38732
Reviewed-by: Russ Cox <rsc@golang.org>
2017-03-29 14:18:24 +00:00
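
For readers unfamiliar with treaps, a generic insert with rotations looks
roughly like the sketch below. This is not the runtime's implementation:
the real treap is keyed on span size, stores mspan pointers, and uses the
runtime's own allocator and random source.

    package main

    import "math/rand"

    type node struct {
        key         int // e.g. span size in pages
        prio        int64
        left, right *node
    }

    func rotateRight(n *node) *node { // lift the left child above n
        l := n.left
        n.left, l.right = l.right, n
        return l
    }

    func rotateLeft(n *node) *node { // lift the right child above n
        r := n.right
        n.right, r.left = r.left, n
        return r
    }

    // insert keeps the tree ordered by key (BST property) and heap-ordered by
    // the random priority, which keeps the expected depth at O(log n).
    func insert(n *node, key int) *node {
        if n == nil {
            return &node{key: key, prio: rand.Int63()}
        }
        if key < n.key {
            n.left = insert(n.left, key)
            if n.left.prio > n.prio {
                n = rotateRight(n)
            }
        } else {
            n.right = insert(n.right, key)
            if n.right.prio > n.prio {
                n = rotateLeft(n)
            }
        }
        return n
    }

    func main() {
        var root *node
        for _, pages := range []int{129, 4096, 300, 2048} { // only "large" spans land here
            root = insert(root, pages)
        }
        _ = root
    }
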
Elias Naur
2b4274d667 runtime/cgo: CFRelease result from CFBundleCopyResourceURL
The result from CFBundleCopyResourceURL is owned by the caller. This
CL adds the necessary CFRelease to release it after use.

Fixes #19722

Change-Id: I7afe22ef241d21922a7f5cef6498017e6269a5c3
Reviewed-on: https://go-review.googlesource.com/38639
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2017-03-27 18:12:17 +00:00
Austin Clements
4234d1decd runtime: improve systemstack-on-Go stack message
We reused the old C stack check mechanism for the implementation of
//go:systemstack, so when we execute a //go:systemstack function on a
user stack, the system fails by calling morestackc. However,
morestackc's message still talks about "executing C code".

Fix morestackc's message to reflect its modern usage.

Change-Id: I7e70e7980eab761c0520f675d3ce89486496030f
Reviewed-on: https://go-review.googlesource.com/38572
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-27 14:53:12 +00:00
Elias Naur
0476c7a7b5 runtime/cgo: raise the thread-local storage slot search limit on Android
On Android, the thread local offset is found by looping through memory
starting at the TLS base address. The search is limited to
PTHREAD_KEYS_MAX, but issue 19472 made it clear that in some cases, the
slot is located further from the TLS base.

The limit is merely a sanity check in case our assumptions about the
thread-local storage layout are wrong, so this CL raises it to 384, which
is enough for the test case in issue 19472.

Fixes #19472

Change-Id: I89d1db3e9739d3a7fff5548ae487a7483c0a278a
Reviewed-on: https://go-review.googlesource.com/38636
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-27 08:56:08 +00:00
Elias Naur
aa4c2ca316 runtime/pprof: fix proto tests on NetBSD
The proto_test tests are failing on NetBSD:

https://build.golang.org/log/a3a577144ac48c6ef8e384ce6a700ad30549fb78

the failures seem similar to previous failures on Android:

https://build.golang.org/log/b5786e0cd6d5941dc37b6a50be5172f6b99e22f0

The Android failures were fixed by CL 37896. This CL is an attempt
to fix the NetBSD failures with a similar fix.

Change-Id: I3834afa5b32303ca226e6a31f0f321f66fef9a3f
Reviewed-on: https://go-review.googlesource.com/38637
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-27 08:55:14 +00:00
Keith Randall
e67d881bc3 cmd/compile: simplify efaceeq and ifaceeq
Clean up code that does interface equality. Avoid doing checks
in efaceeq/ifaceeq that we already did before calling those routines.

No noticeable performance changes for existing benchmarks.

name            old time/op  new time/op  delta
EfaceCmpDiff-8   604ns ± 1%   553ns ± 1%  -8.41%  (p=0.000 n=9+10)

Fixes #18618

Change-Id: I3bd46db82b96494873045bc3300c56400bc582eb
Reviewed-on: https://go-review.googlesource.com/38606
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2017-03-24 23:03:09 +00:00
Cherry Zhang
3a1ce1085a runtime: access _cgo_yield indirectly
The darwin linker for ARM does not allow PC-relative relocation
of an external symbol in the text section. Work around this by
accessing the symbol indirectly: put its address in a global variable
(which is not external) and access it through that variable.

Fixes #19684.

Change-Id: I41361bbb281b5dbdda0d100ae49d32c69ed85a81
Reviewed-on: https://go-review.googlesource.com/38596
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
2017-03-24 15:37:56 +00:00
Carlos Eduardo Seo
189053aee2 runtime/internal/atomic: Remove unnecessary checks for GOARCH_ppc64
Starting in go1.9, the minimum processor requirement for ppc64 is POWER8. This
means the checks for GOARCH_ppc64 in asm_ppc64x.s can be removed, since we can
assume LBAR and STBCCC instructions (both from ISA 2.06) will always be
available.

Updates #19074

Change-Id: Ib4418169cd9fc6f871a5ab126b28ee58a2f349e2
Reviewed-on: https://go-review.googlesource.com/38406
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-22 18:14:41 +00:00
Josselin Costanzi
01cd22c687 bytes: add optimized countByte for amd64
Use SSE/AVX2 when counting a single byte.
Inspired by the runtime indexbyte implementation.

Benchmark against previous implementation, where
1 byte in every 8 is the one we are looking for:

* On a machine without AVX2
name               old time/op   new time/op     delta
CountSingle/10-4    61.8ns ±10%     15.6ns ±11%    -74.83%  (p=0.000 n=10+10)
CountSingle/32-4     100ns ± 4%       17ns ±10%    -82.54%  (p=0.000 n=10+9)
CountSingle/4K-4    9.66µs ± 3%     0.37µs ± 6%    -96.21%  (p=0.000 n=10+10)
CountSingle/4M-4    11.0ms ± 6%      0.4ms ± 4%    -96.04%  (p=0.000 n=10+10)
CountSingle/64M-4    194ms ± 8%        8ms ± 2%    -95.64%  (p=0.000 n=10+10)

name               old speed     new speed       delta
CountSingle/10-4   162MB/s ±10%    645MB/s ±10%   +297.00%  (p=0.000 n=10+10)
CountSingle/32-4   321MB/s ± 5%   1844MB/s ± 9%   +474.79%  (p=0.000 n=10+9)
CountSingle/4K-4   424MB/s ± 3%  11169MB/s ± 6%  +2533.10%  (p=0.000 n=10+10)
CountSingle/4M-4   381MB/s ± 7%   9609MB/s ± 4%  +2421.88%  (p=0.000 n=10+10)
CountSingle/64M-4  346MB/s ± 7%   7924MB/s ± 2%  +2188.78%  (p=0.000 n=10+10)

* On a machine with AVX2
name               old time/op   new time/op     delta
CountSingle/10-8    37.1ns ± 3%      8.2ns ± 1%    -77.80%  (p=0.000 n=10+10)
CountSingle/32-8    66.1ns ± 3%      9.8ns ± 2%    -85.23%  (p=0.000 n=10+10)
CountSingle/4K-8    7.36µs ± 3%     0.11µs ± 1%    -98.54%  (p=0.000 n=10+10)
CountSingle/4M-8    7.46ms ± 2%     0.15ms ± 2%    -97.95%  (p=0.000 n=10+9)
CountSingle/64M-8    124ms ± 2%        6ms ± 4%    -95.09%  (p=0.000 n=10+10)

name               old speed     new speed       delta
CountSingle/10-8   269MB/s ± 3%   1213MB/s ± 1%   +350.32%  (p=0.000 n=10+10)
CountSingle/32-8   484MB/s ± 4%   3277MB/s ± 2%   +576.66%  (p=0.000 n=10+10)
CountSingle/4K-8   556MB/s ± 3%  37933MB/s ± 1%  +6718.36%  (p=0.000 n=10+10)
CountSingle/4M-8   562MB/s ± 2%  27444MB/s ± 3%  +4783.43%  (p=0.000 n=10+9)
CountSingle/64M-8  543MB/s ± 2%  11054MB/s ± 3%  +1935.81%  (p=0.000 n=10+10)

Fixes #19411

Change-Id: Ieaf20b1fabccabe767c55c66e242e86f3617f883
Reviewed-on: https://go-review.googlesource.com/38258
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-21 20:25:17 +00:00
Daniel Martí
2e29eb57db runtime: remove unused *chantype parameters
The chanrecv funcs don't use it at all. The chansend ones do, but the
element type is now part of the hchan struct, which is already a
parameter.

hchan can be nil in chansend when sending to a nil channel, so when
instrumenting we must copy to the stack to be able to read the channel
type.

name             old time/op  new time/op  delta
ChanUncontended  6.42µs ± 1%  6.22µs ± 0%  -3.06%  (p=0.000 n=19+18)

Initially found by github.com/mvdan/unparam.

Fixes #19591.

Change-Id: I3a5e8a0082e8445cc3f0074695e3593fd9c88412
Reviewed-on: https://go-review.googlesource.com/38351
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-21 17:10:16 +00:00
Vladimir Stefanovic
24dc8c6cb5 cmd/compile,runtime: fix atomic And8 for mipsle
Remove a stray xori that came from a big-endian copy/paste.
Add an atomicand8 check to runtime.check() that would have revealed
this error.
Might fix #19396.

Change-Id: If8d6f25d3e205496163541eb112548aa66df9c2a
Reviewed-on: https://go-review.googlesource.com/38257
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-21 16:03:12 +00:00
Hugues Bruant
5d6b7fcaa1 runtime: add mapdelete_fast*
Add benchmarks for map delete with int32/int64/string keys

Benchmark results on darwin/amd64

name                 old time/op  new time/op  delta
MapDelete/Int32/1-8   151ns ± 8%    99ns ± 3%  -34.39%  (p=0.008 n=5+5)
MapDelete/Int32/2-8   128ns ± 2%   111ns ±15%  -13.40%  (p=0.040 n=5+5)
MapDelete/Int32/4-8   128ns ± 5%   114ns ± 2%  -10.82%  (p=0.008 n=5+5)
MapDelete/Int64/1-8   144ns ± 0%   104ns ± 3%  -27.53%  (p=0.016 n=4+5)
MapDelete/Int64/2-8   153ns ± 1%   126ns ± 3%  -17.17%  (p=0.008 n=5+5)
MapDelete/Int64/4-8   178ns ± 3%   136ns ± 2%  -23.60%  (p=0.008 n=5+5)
MapDelete/Str/1-8     187ns ± 3%   171ns ± 3%   -8.54%  (p=0.008 n=5+5)
MapDelete/Str/2-8     221ns ± 3%   206ns ± 4%   -7.18%  (p=0.016 n=5+4)
MapDelete/Str/4-8     256ns ± 5%   232ns ± 2%   -9.36%  (p=0.016 n=4+5)

name                     old time/op    new time/op    delta
BinaryTree17-8              2.78s ± 7%     2.70s ± 1%    ~     (p=0.151 n=5+5)
Fannkuch11-8                3.21s ± 2%     3.19s ± 1%    ~     (p=0.310 n=5+5)
FmtFprintfEmpty-8          49.1ns ± 3%    50.2ns ± 2%    ~     (p=0.095 n=5+5)
FmtFprintfString-8         78.6ns ± 4%    80.2ns ± 5%    ~     (p=0.460 n=5+5)
FmtFprintfInt-8            79.7ns ± 1%    81.0ns ± 3%    ~     (p=0.103 n=5+5)
FmtFprintfIntInt-8          117ns ± 2%     119ns ± 0%    ~     (p=0.079 n=5+4)
FmtFprintfPrefixedInt-8     153ns ± 1%     146ns ± 3%  -4.19%  (p=0.024 n=5+5)
FmtFprintfFloat-8           239ns ± 1%     237ns ± 1%    ~     (p=0.246 n=5+5)
FmtManyArgs-8               506ns ± 2%     509ns ± 2%    ~     (p=0.238 n=5+5)
GobDecode-8                7.06ms ± 4%    6.86ms ± 1%    ~     (p=0.222 n=5+5)
GobEncode-8                6.01ms ± 5%    5.87ms ± 2%    ~     (p=0.222 n=5+5)
Gzip-8                      246ms ± 4%     236ms ± 1%  -4.12%  (p=0.008 n=5+5)
Gunzip-8                   37.7ms ± 4%    37.3ms ± 1%    ~     (p=0.841 n=5+5)
HTTPClientServer-8         64.9µs ± 1%    64.4µs ± 0%  -0.80%  (p=0.032 n=5+4)
JSONEncode-8               16.0ms ± 2%    16.2ms ±11%    ~     (p=0.548 n=5+5)
JSONDecode-8               53.2ms ± 2%    53.1ms ± 4%    ~     (p=1.000 n=5+5)
Mandelbrot200-8            4.33ms ± 2%    4.32ms ± 2%    ~     (p=0.841 n=5+5)
GoParse-8                  3.24ms ± 2%    3.27ms ± 4%    ~     (p=0.690 n=5+5)
RegexpMatchEasy0_32-8      86.2ns ± 1%    85.2ns ± 3%    ~     (p=0.286 n=5+5)
RegexpMatchEasy0_1K-8       198ns ± 2%     199ns ± 1%    ~     (p=0.310 n=5+5)
RegexpMatchEasy1_32-8      82.6ns ± 2%    81.8ns ± 1%    ~     (p=0.294 n=5+5)
RegexpMatchEasy1_1K-8       359ns ± 2%     354ns ± 1%  -1.39%  (p=0.048 n=5+5)
RegexpMatchMedium_32-8      123ns ± 2%     123ns ± 1%    ~     (p=0.905 n=5+5)
RegexpMatchMedium_1K-8     38.2µs ± 2%    38.6µs ± 8%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32-8       1.92µs ± 2%    1.91µs ± 5%    ~     (p=0.460 n=5+5)
RegexpMatchHard_1K-8       57.6µs ± 1%    57.0µs ± 2%    ~     (p=0.310 n=5+5)
Revcomp-8                   483ms ± 7%     441ms ± 1%  -8.79%  (p=0.016 n=5+4)
Template-8                 58.0ms ± 1%    58.2ms ± 7%    ~     (p=0.310 n=5+5)
TimeParse-8                 324ns ± 6%     312ns ± 2%    ~     (p=0.087 n=5+5)
TimeFormat-8                330ns ± 1%     329ns ± 1%    ~     (p=0.968 n=5+5)

name                     old speed      new speed      delta
GobDecode-8               109MB/s ± 4%   112MB/s ± 1%    ~     (p=0.222 n=5+5)
GobEncode-8               128MB/s ± 5%   131MB/s ± 2%    ~     (p=0.222 n=5+5)
Gzip-8                   78.9MB/s ± 4%  82.3MB/s ± 1%  +4.25%  (p=0.008 n=5+5)
Gunzip-8                  514MB/s ± 4%   521MB/s ± 1%    ~     (p=0.841 n=5+5)
JSONEncode-8              121MB/s ± 2%   120MB/s ±10%    ~     (p=0.548 n=5+5)
JSONDecode-8             36.5MB/s ± 2%  36.6MB/s ± 4%    ~     (p=1.000 n=5+5)
GoParse-8                17.9MB/s ± 2%  17.7MB/s ± 4%    ~     (p=0.730 n=5+5)
RegexpMatchEasy0_32-8     371MB/s ± 1%   375MB/s ± 3%    ~     (p=0.310 n=5+5)
RegexpMatchEasy0_1K-8    5.15GB/s ± 1%  5.13GB/s ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchEasy1_32-8     387MB/s ± 2%   391MB/s ± 1%    ~     (p=0.310 n=5+5)
RegexpMatchEasy1_1K-8    2.85GB/s ± 2%  2.89GB/s ± 1%    ~     (p=0.056 n=5+5)
RegexpMatchMedium_32-8   8.07MB/s ± 2%  8.06MB/s ± 1%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8   26.8MB/s ± 2%  26.6MB/s ± 7%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32-8     16.7MB/s ± 2%  16.7MB/s ± 5%    ~     (p=0.421 n=5+5)
RegexpMatchHard_1K-8     17.8MB/s ± 1%  18.0MB/s ± 2%    ~     (p=0.310 n=5+5)
Revcomp-8                 527MB/s ± 6%   577MB/s ± 1%  +9.44%  (p=0.016 n=5+4)
Template-8               33.5MB/s ± 1%  33.4MB/s ± 7%    ~     (p=0.310 n=5+5)

Updates #19495

Change-Id: Ib9ece1690813d9b4788455db43d30891e2138df5
Reviewed-on: https://go-review.googlesource.com/38172
Reviewed-by: Hugues Bruant <hugues.bruant@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-21 06:07:24 +00:00
Alex Brainman
f2b79cadfd runtime: import os package in BenchmarkRunningGoProgram
I would like to use BenchmarkRunningGoProgram to measure
changes for issue #15588, so the program in the benchmark
should import the "os" package.

It is also reasonable for a basic Go program to include the
"os" package.

For #15588.

Change-Id: Ida6712eab22c2e79fbe91b6fdd492eaf31756852
Reviewed-on: https://go-review.googlesource.com/37914
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-21 05:59:45 +00:00
Ian Lance Taylor
5dc14af682 runtime: clear signal stack on main thread
This is a workaround for a FreeBSD kernel bug. It can be removed when
we are confident that all people are using the fixed kernel. See #15658.

Updates #15658.

Change-Id: I0ecdccb77ddd0c270bdeac4d3a5c8abaf0449075
Reviewed-on: https://go-review.googlesource.com/38325
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-20 22:59:26 +00:00
Austin Clements
df6025bc0d runtime: disallow malloc or panic in scavenge
Mallocs and panics in the scavenge path are particularly nasty because
they're likely to silently self-deadlock on the mheap.lock. Avoid
sinking lots of time into debugging these issues in the future by
turning these into immediate throws.

Change-Id: Ib36fdda33bc90b21c32432b03561630c1f3c69bc
Reviewed-on: https://go-review.googlesource.com/38293
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-19 22:42:28 +00:00
Austin Clements
13ae271d5d runtime: introduce a type for lfstacks
The lfstack API is still a C-style API: lfstacks all have unhelpful
type uint64 and the APIs are package-level functions. Make the code
more readable and Go-style by creating an lfstack type with methods
for push, pop, and empty.

Change-Id: I64685fa3be0e82ae2d1a782a452a50974440a827
Reviewed-on: https://go-review.googlesource.com/38290
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-19 22:42:24 +00:00
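
The resulting API shape can be sketched with an ordinary Treiber stack;
this is illustrative only. The real lfstack packs a generation counter into
the head word (to avoid ABA problems) and stores runtime-internal nodes.

    package main

    import (
        "sync/atomic"
        "unsafe"
    )

    type lfnode struct {
        next unsafe.Pointer // *lfnode
        val  int
    }

    // lfstack exposes push/pop/empty as methods on a type instead of
    // package-level functions operating on a bare integer head word.
    type lfstack struct {
        head unsafe.Pointer // *lfnode
    }

    func (s *lfstack) push(n *lfnode) {
        for {
            old := atomic.LoadPointer(&s.head)
            n.next = old
            if atomic.CompareAndSwapPointer(&s.head, old, unsafe.Pointer(n)) {
                return
            }
        }
    }

    func (s *lfstack) pop() *lfnode {
        for {
            old := atomic.LoadPointer(&s.head)
            if old == nil {
                return nil
            }
            n := (*lfnode)(old)
            // Note: a bare CAS like this is ABA-prone; the runtime avoids that
            // by packing a counter into the head word.
            if atomic.CompareAndSwapPointer(&s.head, old, atomic.LoadPointer(&n.next)) {
                return n
            }
        }
    }

    func (s *lfstack) empty() bool { return atomic.LoadPointer(&s.head) == nil }

    func main() {
        var s lfstack
        s.push(&lfnode{val: 1})
        s.push(&lfnode{val: 2})
        println(s.pop().val, s.pop().val, s.empty()) // 2 1 true
    }
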
Daniel Martí
77b09b8b8d runtime: remove unused g parameter
Found by github.com/mvdan/unparam.

Change-Id: I20145440ff1bcd27fcf15a740354c52f313e536c
Reviewed-on: https://go-review.googlesource.com/37894
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-16 14:03:45 +00:00
Carlos Eduardo Seo
d60166d5ee runtime: improve IndexByte for ppc64x
This change adds a better implementation of IndexByte for ppc64x.

Improvement for bytes·IndexByte:

benchmark                             old ns/op     new ns/op     delta
BenchmarkIndexByte/10-16              12.5          8.48          -32.16%
BenchmarkIndexByte/32-16              34.4          9.85          -71.37%
BenchmarkIndexByte/4K-16              3089          217           -92.98%
BenchmarkIndexByte/4M-16              3154810       207051        -93.44%
BenchmarkIndexByte/64M-16             50564811      5579093       -88.97%

benchmark                             old MB/s     new MB/s     speedup
BenchmarkIndexByte/10-16              800.41       1179.64      1.47x
BenchmarkIndexByte/32-16              930.60       3249.10      3.49x
BenchmarkIndexByte/4K-16              1325.71      18832.53     14.21x
BenchmarkIndexByte/4M-16              1329.49      20257.29     15.24x
BenchmarkIndexByte/64M-16             1327.19      12028.63     9.06x

Improvement for strings·IndexByte:

benchmark                             old ns/op     new ns/op     delta
BenchmarkIndexByte-16                 25.9          7.69          -70.31%

Fixes #19030

Change-Id: Ifb82bbb3d643ec44b98eaa2d08a07f47e5c2fd11
Reviewed-on: https://go-review.googlesource.com/37670
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-03-16 13:54:20 +00:00
Keith Randall
d5dc490519 cmd/compile: intrinsics for math/bits.TrailingZerosX
Implement math/bits.TrailingZerosX using intrinsics.

Generally reorganize the intrinsic spec a bit.
The intrinsics data structure is now built at init time.
This will make doing the other functions in math/bits easier.

Update sys.CtzX to return int instead of uint{64,32} so it
matches math/bits.TrailingZerosX.

Improve the intrinsics a bit for amd64.  We don't need the CMOV
for <64 bit versions.

Update #18616

Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39
Reviewed-on: https://go-review.googlesource.com/38155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 02:44:16 +00:00
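
User code reaches these intrinsics through math/bits; a small example (the
lowering applies on amd64 as described above).

    package main

    import (
        "fmt"
        "math/bits"
    )

    func main() {
        // These calls are now lowered to a trailing-zero-count intrinsic on
        // amd64 instead of a generic Go implementation.
        fmt.Println(bits.TrailingZeros32(8))       // 3
        fmt.Println(bits.TrailingZeros64(1 << 20)) // 20
        fmt.Println(bits.TrailingZeros(0))         // 32 or 64: the full width of uint
    }
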
Martin Möhrmann
16200c7333 runtime: make complex division c99 compatible
- changes tests to check that the real and imaginary parts of the Go complex
  division result are equal to the result gcc produces for C99
- changes the complex division code to satisfy the new complex division test
- adds float functions isNan, isFinite, isInf, abs and copysign
  in the runtime package

Fixes #14644.

name                   old time/op  new time/op  delta
Complex128DivNormal-4  21.8ns ± 6%  13.9ns ± 6%  -36.37%  (p=0.000 n=20+20)
Complex128DivNisNaN-4  14.1ns ± 1%  15.0ns ± 1%   +5.86%  (p=0.000 n=20+19)
Complex128DivDisNaN-4  12.5ns ± 1%  16.7ns ± 1%  +33.79%  (p=0.000 n=19+20)
Complex128DivNisInf-4  10.1ns ± 1%  13.0ns ± 1%  +28.25%  (p=0.000 n=20+19)
Complex128DivDisInf-4  11.0ns ± 1%  20.9ns ± 1%  +90.69%  (p=0.000 n=16+19)
ComplexAlgMap-4        86.7ns ± 1%  86.8ns ± 2%     ~     (p=0.804 n=20+20)

Change-Id: I261f3b4a81f6cc858bc7ff48f6fd1b39c300abf0
Reviewed-on: https://go-review.googlesource.com/37441
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-15 22:45:17 +00:00
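
A sketch of the kinds of cases the new algorithm is meant to get right; the
expected values in the comments follow the C99 Annex G semantics that the
new tests compare against.

    package main

    import (
        "fmt"
        "math"
        "math/cmplx"
    )

    func main() {
        inf := complex(math.Inf(1), 0)
        nan := complex(math.NaN(), 0)

        // Under C99 Annex G rules (what gcc produces and the new tests compare
        // against): finite / infinite should be zero, infinite / finite should
        // be infinite, and a NaN operand should give a NaN result.
        fmt.Println(complex(1, 1) / inf)              // expected (0+0i)
        fmt.Println(cmplx.IsInf(inf / complex(2, 3))) // expected true
        fmt.Println(cmplx.IsNaN(complex(1, 1) / nan)) // expected true
    }
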
Austin Clements
4b8f41daa6 runtime: print user stack on other threads during GOTRACEBACK=crash
Currently, when printing tracebacks of other threads during
GOTRACEBACK=crash, if the thread is on the system stack we print only
the header for the user goroutine and fail to print its stack. This
happens because we passed the g0 to traceback instead of curg. The g0
never has anything set in its gobuf, so traceback doesn't print
anything.

Fix this by passing _g_.m.curg to traceback instead of the g0.

Fixes #19494.

Change-Id: Idfabf94d6a725e9cdf94a3923dead6455ef3b217
Reviewed-on: https://go-review.googlesource.com/38012
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-15 22:16:12 +00:00
Austin Clements
f2e87158f0 runtime: make GOTRACEBACK=crash crash promptly in cgo binaries
GOTRACEBACK=crash works by bouncing a SIGQUIT around the process
sched.mcount times. However, sched.mcount includes the extra Ms
allocated by oneNewExtraM for cgo callbacks. Hence, if there are any
extra Ms that don't have real OS threads, we'll try to send SIGQUIT
more times than there are threads to catch it. Since nothing will
catch these extra signals, we'll fall back to blocking for five
seconds before aborting the process.

Avoid this five second delay by subtracting out the number of extra Ms
when sending SIGQUITs.

Of course, in a cgo binary, it's still possible for the SIGQUIT to go
to a cgo thread and cause some other failure mode. This does not fix
that.

Change-Id: I4fbf3c52dd721812796c4c1dcb2ab4cb7026d965
Reviewed-on: https://go-review.googlesource.com/38182
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-15 22:16:10 +00:00
Hugues Bruant
ec091b6af2 runtime: add mapassign_fast*
Add benchmarks for map assignment with int32/int64/string keys

Benchmark results on darwin/amd64

name                  old time/op  new time/op  delta
MapAssignInt32_255-8  24.7ns ± 3%  17.4ns ± 2%  -29.75%  (p=0.000 n=10+10)
MapAssignInt32_64k-8  45.5ns ± 4%  37.6ns ± 4%  -17.18%  (p=0.000 n=10+10)
MapAssignInt64_255-8  26.0ns ± 3%  17.9ns ± 4%  -31.03%  (p=0.000 n=10+10)
MapAssignInt64_64k-8  46.9ns ± 5%  38.7ns ± 2%  -17.53%  (p=0.000 n=9+10)
MapAssignStr_255-8    47.8ns ± 3%  24.8ns ± 4%  -48.01%  (p=0.000 n=10+10)
MapAssignStr_64k-8    83.0ns ± 3%  51.9ns ± 3%  -37.45%  (p=0.000 n=10+9)

name                     old time/op    new time/op    delta
BinaryTree17-8              3.11s ±19%     2.78s ± 3%    ~     (p=0.095 n=5+5)
Fannkuch11-8                3.26s ± 1%     3.21s ± 2%    ~     (p=0.056 n=5+5)
FmtFprintfEmpty-8          50.3ns ± 1%    50.8ns ± 2%    ~     (p=0.246 n=5+5)
FmtFprintfString-8         82.7ns ± 4%    80.1ns ± 5%    ~     (p=0.238 n=5+5)
FmtFprintfInt-8            82.6ns ± 2%    81.9ns ± 3%    ~     (p=0.508 n=5+5)
FmtFprintfIntInt-8          124ns ± 4%     121ns ± 3%    ~     (p=0.111 n=5+5)
FmtFprintfPrefixedInt-8     158ns ± 6%     160ns ± 2%    ~     (p=0.341 n=5+5)
FmtFprintfFloat-8           249ns ± 2%     245ns ± 2%    ~     (p=0.095 n=5+5)
FmtManyArgs-8               513ns ± 2%     519ns ± 3%    ~     (p=0.151 n=5+5)
GobDecode-8                7.48ms ±12%    7.11ms ± 2%    ~     (p=0.222 n=5+5)
GobEncode-8                6.25ms ± 1%    6.03ms ± 2%  -3.56%  (p=0.008 n=5+5)
Gzip-8                      252ms ± 4%     252ms ± 4%    ~     (p=1.000 n=5+5)
Gunzip-8                   38.4ms ± 3%    38.6ms ± 2%    ~     (p=0.690 n=5+5)
HTTPClientServer-8         76.9µs ±41%    66.4µs ± 6%    ~     (p=0.310 n=5+5)
JSONEncode-8               16.5ms ± 3%    16.7ms ± 3%    ~     (p=0.421 n=5+5)
JSONDecode-8               54.6ms ± 1%    54.3ms ± 2%    ~     (p=0.548 n=5+5)
Mandelbrot200-8            4.45ms ± 3%    4.47ms ± 1%    ~     (p=0.841 n=5+5)
GoParse-8                  3.43ms ± 1%    3.32ms ± 2%  -3.28%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8      88.2ns ± 3%    89.4ns ± 2%    ~     (p=0.333 n=5+5)
RegexpMatchEasy0_1K-8       205ns ± 1%     206ns ± 1%    ~     (p=0.905 n=5+5)
RegexpMatchEasy1_32-8      85.1ns ± 1%    85.5ns ± 5%    ~     (p=0.690 n=5+5)
RegexpMatchEasy1_1K-8       365ns ± 1%     371ns ± 9%    ~     (p=1.000 n=5+5)
RegexpMatchMedium_32-8      129ns ± 2%     128ns ± 3%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8     39.8µs ± 0%    39.7µs ± 4%    ~     (p=0.730 n=4+5)
RegexpMatchHard_32-8       1.99µs ± 3%    2.05µs ±16%    ~     (p=0.794 n=5+5)
RegexpMatchHard_1K-8       59.3µs ± 1%    60.3µs ± 7%    ~     (p=1.000 n=5+5)
Revcomp-8                   1.36s ±63%     0.52s ± 5%    ~     (p=0.095 n=5+5)
Template-8                 62.6ms ±14%    60.5ms ± 5%    ~     (p=0.690 n=5+5)
TimeParse-8                 330ns ± 2%     324ns ± 2%    ~     (p=0.087 n=5+5)
TimeFormat-8                350ns ± 3%     340ns ± 1%  -2.86%  (p=0.008 n=5+5)

name                     old speed      new speed      delta
GobDecode-8               103MB/s ±11%   108MB/s ± 2%    ~     (p=0.222 n=5+5)
GobEncode-8               123MB/s ± 1%   127MB/s ± 2%  +3.71%  (p=0.008 n=5+5)
Gzip-8                   77.1MB/s ± 4%  76.9MB/s ± 3%    ~     (p=1.000 n=5+5)
Gunzip-8                  505MB/s ± 3%   503MB/s ± 2%    ~     (p=0.690 n=5+5)
JSONEncode-8              118MB/s ± 3%   116MB/s ± 3%    ~     (p=0.421 n=5+5)
JSONDecode-8             35.5MB/s ± 1%  35.8MB/s ± 2%    ~     (p=0.397 n=5+5)
GoParse-8                16.9MB/s ± 1%  17.4MB/s ± 2%  +3.45%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8     363MB/s ± 3%   358MB/s ± 2%    ~     (p=0.421 n=5+5)
RegexpMatchEasy0_1K-8    4.98GB/s ± 1%  4.97GB/s ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchEasy1_32-8     376MB/s ± 1%   375MB/s ± 5%    ~     (p=0.690 n=5+5)
RegexpMatchEasy1_1K-8    2.80GB/s ± 1%  2.76GB/s ± 9%    ~     (p=0.841 n=5+5)
RegexpMatchMedium_32-8   7.73MB/s ± 1%  7.76MB/s ± 3%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8   25.8MB/s ± 0%  25.8MB/s ± 4%    ~     (p=0.651 n=4+5)
RegexpMatchHard_32-8     16.1MB/s ± 3%  15.7MB/s ±14%    ~     (p=0.794 n=5+5)
RegexpMatchHard_1K-8     17.3MB/s ± 1%  17.0MB/s ± 7%    ~     (p=0.984 n=5+5)
Revcomp-8                 273MB/s ±83%   488MB/s ± 5%    ~     (p=0.095 n=5+5)
Template-8               31.1MB/s ±13%  32.1MB/s ± 5%    ~     (p=0.690 n=5+5)

Updates #19495

Change-Id: I116e9a2a4594769318b22d736464de8a98499909
Reviewed-on: https://go-review.googlesource.com/38091
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-13 23:43:16 +00:00
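
Benchmarks of roughly this shape exercise the new fast paths; this is an
illustrative sketch in a _test file, not the exact benchmarks added by the
change.

    package bench

    import "testing"

    // Assignment with an int32 key now reaches the specialized fast path.
    func BenchmarkMapAssignInt32(b *testing.B) {
        m := make(map[int32]int, 256)
        for i := 0; i < b.N; i++ {
            m[int32(i&255)] = i
        }
    }

    // String keys have their own specialized assignment path.
    func BenchmarkMapAssignStr(b *testing.B) {
        keys := make([]string, 256)
        for i := range keys {
            keys[i] = "key" + string(rune('a'+i%26))
        }
        m := make(map[string]int, 256)
        for i := 0; i < b.N; i++ {
            m[keys[i&255]] = i
        }
    }
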
Dave Cheney
7ee43faf78 runtime: remove sizeToClass
CL 32219 added precomputed sizeclass tables.

Remove the unused sizeToClass method which was previously only
called from initSizes.

Change-Id: I907bf9ed78430ecfaabbec7fca77ef2375010081
Reviewed-on: https://go-review.googlesource.com/38113
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-13 01:55:44 +00:00
David NewHamlet
e19f184b8f runtime: use cpuset_getaffinity for runtime.NumCPU() on FreeBSD
On FreeBSD, when a Go process runs under a given sub-list of
processors (e.g. 'cpuset -l 0 ./a.out' on a multi-core system),
runtime.NumCPU() still returns the count of all physical CPUs from
sysctl hw.ncpu instead of the size of the sub-list.

Fix this by using the cpuset_getaffinity syscall to count the CPUs in
the sub-list.

Fixes #15206

Change-Id: If87c4b620e870486efa100685db5debbf1210a5b
Reviewed-on: https://go-review.googlesource.com/29341
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2017-03-10 22:06:24 +00:00
Daniel Martí
da0d23e5cd runtime: remove unused ratep parameter
Found by github.com/mvdan/unparam.

Change-Id: Iabcdfec2ae42c735aa23210b7183080d750682ca
Reviewed-on: https://go-review.googlesource.com/38030
Reviewed-by: Peter Weinberger <pjw@google.com>
Run-TryBot: Peter Weinberger <pjw@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-10 20:46:58 +00:00
Bryan C. Mills
4210930a28 runtime/cgo: return correct sa_flags
A typo in the previous revision ("act" instead of "oldact") caused us
to return the sa_flags from the new (or zeroed) sigaction rather than
the old one.

In the presence of a signal handler registered before
runtime.libpreinit, this caused setsigstack to erroneously zero out
important sa_flags (such as SA_SIGINFO) in its attempt to re-register
the existing handler with SA_ONSTACK.

Change-Id: I3cd5152a38ec0d44ae611f183bc1651d65b8a115
Reviewed-on: https://go-review.googlesource.com/37852
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-09 18:53:35 +00:00
Bryan C. Mills
e57350f4c0 runtime: fix _cgo_yield usage with sysmon and on BSD
There are a few problems from change 35494, discovered during testing
of change 37852.

1. I was confused about the usage of n.key in the sema variant, so we
   were looping on the wrong condition. The error was not caught by
   the TryBots (presumably due to missing TSAN coverage in the BSD and
   darwin builders?).

2. The sysmon goroutine sometimes skips notetsleep entirely, using
   direct usleep syscalls instead. In that case, we were not calling
   _cgo_yield, leading to missed signals under TSAN.

3. Some notetsleep calls have long finite timeouts. They should be
   broken up into smaller chunks with a yield at the end of each
   chunk.

updates #18717

Change-Id: I91175af5dea3857deebc686f51a8a40f9d690bcc
Reviewed-on: https://go-review.googlesource.com/37867
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-09 18:36:49 +00:00
Josh Bleecher Snyder
23be728950 runtime: optimize slicebytetostring
Inline rawstringtmp and simplify.
Use memmove instead of copy.

name                     old time/op  new time/op  delta
SliceByteToString/1-8    19.4ns ± 2%  14.1ns ± 1%  -27.04%  (p=0.000 n=20+17)
SliceByteToString/2-8    20.8ns ± 2%  15.5ns ± 2%  -25.46%  (p=0.000 n=20+20)
SliceByteToString/4-8    20.7ns ± 1%  14.9ns ± 1%  -28.30%  (p=0.000 n=20+20)
SliceByteToString/8-8    23.2ns ± 1%  17.1ns ± 1%  -26.22%  (p=0.000 n=19+19)
SliceByteToString/16-8   29.4ns ± 1%  23.6ns ± 1%  -19.76%  (p=0.000 n=17+20)
SliceByteToString/32-8   31.4ns ± 1%  26.0ns ± 1%  -17.11%  (p=0.000 n=16+19)
SliceByteToString/64-8   36.1ns ± 0%  30.0ns ± 0%  -16.96%  (p=0.000 n=16+16)
SliceByteToString/128-8  46.9ns ± 0%  38.9ns ± 0%  -17.15%  (p=0.000 n=17+19)

Change-Id: I422e688830e4a9bd21897d1f74964625b735f436
Reviewed-on: https://go-review.googlesource.com/37791
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-08 22:05:52 +00:00
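
The operation being optimized is the ordinary []byte-to-string conversion;
a benchmark sketch of the affected path (sizes and names are illustrative).

    package bench

    import "testing"

    var sink string

    func BenchmarkSliceByteToString(b *testing.B) {
        buf := []byte("0123456789abcdef") // 16 bytes, one of the sizes measured above
        b.SetBytes(int64(len(buf)))
        for i := 0; i < b.N; i++ {
            sink = string(buf) // this conversion is the code path being optimized
        }
    }
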
Bryan C. Mills
29edf0f9fe runtime: poll libc to deliver signals under TSAN
Fixes #18717

Change-Id: I7244463d2e7489e0b0fe3b74c4b782e71210beb2
Reviewed-on: https://go-review.googlesource.com/35494
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-08 18:58:30 +00:00
David Chase
d71f36b5aa cmd/compile: check loop rescheduling with stack bound, not counter
After benchmarking with a compiler modified to have better
spill location, it became clear that this method of checking
was actually faster on (at least) two different architectures
(ppc64 and amd64) and it also provides more timely interruption
of loops.

This change adds a modified FOR loop node "FORUNTIL" that
checks after executing the loop body instead of before (i.e.,
always at least once).  This ensures that a pointer past the
end of a slice or array is not made visible to the garbage
collector.

Without the rescheduling checks inserted, the restructured
loop from this change apparently provides a 1% geomean
improvement on PPC64 running the go1 benchmarks; the
improvement on AMD64 is only 0.12%.

Inserting the rescheduling check exposed a peculiar bug
with the SSA test code for s390x; this was updated based on
initial code actually generated for GOARCH=s390x to use
appropriate OpArg, OpAddr, and OpVarDef.

NaCl is disabled in testing.

Change-Id: Ieafaa9a61d2a583ad00968110ef3e7a441abca50
Reviewed-on: https://go-review.googlesource.com/36206
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 18:52:12 +00:00
Russ Cox
c797256a8f runtime/pprof: add GNU build IDs to Mappings recorded from /proc/self/maps
This helps systems that maintain an external database mapping
build ID to symbol information for the given binary, especially
in the case where /proc/self/maps lists many different files
(for example, many shared libraries).

Avoid importing debug/elf to avoid dragging in that whole
package (and its dependencies like debug/dwarf) into the
build of every program that generates a profile.

Fixes #19431.

Change-Id: I6d4362a79fe23e4f1726dffb0661d20bb57f766f
Reviewed-on: https://go-review.googlesource.com/37855
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-08 01:09:18 +00:00
Austin Clements
d50f892abc runtime: join selectgo and selectgoImpl
Currently selectgo is just a wrapper around selectgoImpl. This keeps
the hard-coded frame skip counts for tracing the same between the
channel implementation and the select implementation.

However, this is fragile and confusing, so pass a skip parameter to
send and recv, join selectgo and selectgoImpl into one function, and
decrease all of the skips in selectgo by one.

Change-Id: I11b8cbb7d805b55f5dc6ab4875ac7dde79412ff2
Reviewed-on: https://go-review.googlesource.com/37860
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-07 21:19:38 +00:00
Austin Clements
b992c2649e runtime: print SP/FP on bad pointer crashes
If the bad pointer is on a stack, this makes it possible to find the
frame containing the bad pointer.

Change-Id: Ieda44e054aa9ebf22d15d184457c7610b056dded
Reviewed-on: https://go-review.googlesource.com/37858
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-07 20:46:54 +00:00
Austin Clements
caa7dacfd2 runtime: honor GOTRACEBACK=crash even if _g_.m.traceback != 0
Change-Id: I6de1ef8f67bde044b8706c01e98400e266e1f8f0
Reviewed-on: https://go-review.googlesource.com/37857
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-07 20:46:52 +00:00
Matthew Dempsky
c310c688ff cmd/compile, runtime: simplify multiway select implementation
This commit reworks multiway select statements to use normal control
flow primitives instead of the previous setjmp/longjmp-like behavior.
This simplifies liveness analysis and should prevent issues around
"returns twice" function calls within SSA passes.

test/live.go is updated because liveness analysis's CFG is more
representative of actual control flow. The case bodies are the only
real successors of the selectgo call, but previously the selectsend,
selectrecv, etc. calls were included in the successors list too.

Updates #19331.

Change-Id: I7f879b103a4b85e62fc36a270d812f54c0aa3e83
Reviewed-on: https://go-review.googlesource.com/37661
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-07 20:14:17 +00:00
Daniel Martí
5ed952368e runtime/pprof: actually use tag parameter
It's only ever called with the value it was using, but the code was
counterintuitive. Use the parameter instead, like the other funcs near
it.

Found by github.com/mvdan/unparam.

Change-Id: I45855e11d749380b9b2a28e6dd1d5dedf119a19b
Reviewed-on: https://go-review.googlesource.com/37893
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-07 20:01:05 +00:00
Elias Naur
b91b694b37 runtime/pprof: fix the protobuf tests on Android
Change-Id: I5f85a7980b9a18d3641c4ee8b0992671a8421bb0
Reviewed-on: https://go-review.googlesource.com/37896
Run-TryBot: Elias Naur <elias.naur@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-07 19:18:04 +00:00
Austin Clements
e4f73769bc runtime: strongly encourage CallersFrames with the result of Callers
For historical reasons, it's still commonplace to iterate over the
slice returned by runtime.Callers and call FuncForPC on each PC. This
is broken in gccgo and somewhat broken in gc and will become more
broken in gc with mid-stack inlining.

In Go 1.7, we introduced runtime.CallersFrames to deal with these
problems, but didn't strongly direct people toward using it. Reword
the documentation on runtime.Callers to more strongly encourage people
to use CallersFrames and explicitly discourage them from iterating
over the PCs or using FuncForPC on the results.
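
For reference, the encouraged pattern looks roughly like this (an
illustrative sketch, not code from this CL):

package main

import (
	"fmt"
	"runtime"
)

func printStack() {
	pc := make([]uintptr, 32)
	n := runtime.Callers(1, pc) // 1 skips the frame for runtime.Callers itself
	frames := runtime.CallersFrames(pc[:n])
	for {
		frame, more := frames.Next()
		fmt.Printf("%s\n\t%s:%d\n", frame.Function, frame.File, frame.Line)
		if !more {
			break
		}
	}
}

func main() { printStack() }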

Fixes #19426.

Change-Id: Id0d14cb51a0e9521c8fdde9612610f2c2b9383c4
Reviewed-on: https://go-review.googlesource.com/37726
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-03-06 20:52:00 +00:00
Austin Clements
0efc8b2188 runtime: avoid repeated findmoduledatap calls
Currently almost every function that deals with a *_func has to first
look up the *moduledata for the module containing the function's entry
point. This means we almost always do at least two identical module
lookups whenever we deal with a *_func (one to get the *_func and
another to get something from its module data) and sometimes several
more.

Fix this by making findfunc return a new funcInfo type that embeds
*_func, but also includes the *moduledata, and making all of the
functions that currently take a *_func instead take a funcInfo and use
the already-found *moduledata.

This transformation is trivial for the most part, since the *_func
type is usually inferred. The annoying part is that we can no longer
use nil to indicate failure, so this introduces a funcInfo.valid()
method and replaces nil checks with calls to valid.
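
The shape of the new type is roughly the following (a paraphrased sketch,
with stub stand-ins for the runtime-internal types, not the exact
declarations):

type _func struct{ /* runtime function metadata */ }
type moduledata struct{ /* per-module tables */ }

type funcInfo struct {
	*_func            // the function metadata found by findfunc
	datap *moduledata // the module that contains it
}

// valid reports whether the lookup found a function, replacing the old
// nil-pointer check on *_func.
func (f funcInfo) valid() bool {
	return f._func != nil
}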

Change-Id: I9b8075ef1c31185c1943596d96dec45c7ab5100f
Reviewed-on: https://go-review.googlesource.com/37331
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2017-03-06 19:17:24 +00:00
Austin Clements
2ef88f7fcf runtime: lock-free fast path for mark bits allocation
Currently we acquire a global lock for every newMarkBits call. This is
unfortunate since every span sweep operation calls newMarkBits.

However, most allocations are simply linear allocations from the
current arena. Take advantage of this to add a lock-free fast path for
allocating from the current arena. With this change, the global lock
only protects the lists of arenas, not the free offset in the current
arena.
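
In spirit, the fast path is a CAS-based bump allocation over the current
arena, along these lines (a generic sketch under assumed names, not the
runtime's code):

package gcbits // hypothetical package name for this sketch

import "sync/atomic"

type arena struct {
	free uintptr // next free offset, advanced atomically on the fast path
	end  uintptr // end of the current arena
}

// tryAlloc attempts a lock-free linear allocation of n bytes from a.
// On failure the caller falls back to the locked slow path, which is the
// only place the list of arenas itself is manipulated.
func tryAlloc(a *arena, n uintptr) (uintptr, bool) {
	for {
		old := atomic.LoadUintptr(&a.free)
		if old+n > a.end {
			return 0, false // current arena exhausted; take the slow path
		}
		if atomic.CompareAndSwapUintptr(&a.free, old, old+n) {
			return old, true
		}
	}
}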

Change-Id: I6cf6182af8492c8bfc21276114c77275fe3d7826
Reviewed-on: https://go-review.googlesource.com/34595
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-06 18:40:26 +00:00
Austin Clements
6c4a8d195b runtime: don't hold global gcBitsArenas lock over allocation
Currently, newArena holds the gcBitsArenas lock across allocating
memory from the OS for a new gcBits arena. This is a global lock and
allocating physical memory can be expensive, so this has the potential
to cause high lock contention, especially since every single span
sweep operation calls newArena (via newMarkBits).

Improve the situation by temporarily dropping the lock across
allocation. This means the caller now has to revalidate its
assumptions after the lock is dropped, so this also factors out that
code path and reinvokes it after the lock is acquired.

Change-Id: I1113200a954ab4aad16b5071512583cfac744bdc
Reviewed-on: https://go-review.googlesource.com/34594
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-06 18:40:23 +00:00
Eitan Adler
789c5255a4 all: remove the the duplicate words
Change-Id: I6343c162e27e2e492547c96f1fc504909b1c03c0
Reviewed-on: https://go-review.googlesource.com/37793
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-06 04:39:12 +00:00
Josh Bleecher Snyder
d4451362c0 runtime: add slicebytetostring benchmark
Change-Id: I666d2c6ea8d0b54a71260809d1a2573b122865b2
Reviewed-on: https://go-review.googlesource.com/37790
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-05 05:14:08 +00:00
Austin Clements
4a7cf960c3 runtime: make ReadMemStats STW for < 25µs
Currently ReadMemStats stops the world for ~1.7 ms/GB of heap because
it collects statistics from every single span. For large heaps, this
can be quite costly. This is particularly unfortunate because many
production infrastructures call this function regularly to collect and
report statistics.

Fix this by tracking the necessary cumulative statistics in the
mcaches. ReadMemStats still has to stop the world to stabilize these
statistics, but there are only O(GOMAXPROCS) mcaches to collect
statistics from, so this pause is only 25µs even at GOMAXPROCS=100.
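
The pattern this helps is routine stats collection along these lines (an
illustrative sketch, not code from this CL):

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	for range time.Tick(10 * time.Second) {
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms) // still stops the world, but now only briefly
		fmt.Println("heap_alloc:", ms.HeapAlloc, "num_gc:", ms.NumGC)
	}
}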

Fixes #13613.

Change-Id: I3c0a4e14833f4760dab675efc1916e73b4c0032a
Reviewed-on: https://go-review.googlesource.com/34937
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-04 02:56:37 +00:00
Austin Clements
3399fd254d runtime: remove unused gcstats
The gcstats structure is no longer consumed by anything and no longer
tracks statistics that are particularly relevant to the concurrent
garbage collector. Remove it. (Having statistics is probably a good
idea, but these aren't the stats we need these days and we don't have
a way to get them out of the runtime.)

In preparation for #13613.

Change-Id: Ib63e2f9067850668f9dcbfd4ed89aab4a6622c3f
Reviewed-on: https://go-review.googlesource.com/34936
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-04 02:56:35 +00:00
Elias Naur
7523baed09 misc/ios,cmd/go, runtime/cgo: fix iOS test harness (again)
The iOS test harness was recently changed in response to lldb bugs
to replace breakpoints with the SIGUSR2 signal (CL 34926), and to
pass the current directory in the test binary arguments (CL 35152).
Both the signal sending and working directory setup is done from
the go test driver.

However, the new method doesn't work with tests where a C program is
the test driver instead of go test: the current working directory
will not be changed and SIGUSR2 is not raised.

Instead of copying that logic into any C test program, rework the
test harness (again) to move the setup logic to the early runtime
cgo setup code. That way, the harness will run even in the library
build modes.

Then, use the app Info.plist file to pass the working
directory, removing the need to alter the arguments after running.

Finally, use the SIGINT signal instead of SIGUSR2 to avoid
manipulating the signal masks or handlers.

Fixes the testcarchive tests on iOS.

With this CL, both darwin/arm and darwin/arm64 passes all.bash.

This CL replaces CL 34926, CL 35152 as well as the fixup CL
35123 and CL 35255. They are reverted in CLs earlier in the
relation chain.

Change-Id: I8485c7db1404fbd8daa261efd1ea89e905121a3e
Reviewed-on: https://go-review.googlesource.com/36090
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-03-04 01:43:13 +00:00
David Lazar
781fd3998e runtime: use inlining tables to generate accurate tracebacks
The code in https://play.golang.org/p/aYQPrTtzoK now produces the
following stack trace:

goroutine 1 [running]:
main.(*point).negate(...)
	/tmp/go/main.go:8
main.main()
	/tmp/go/main.go:14 +0x23

Previously the stack trace missed the inlined call:

goroutine 1 [running]:
main.main()
	/tmp/go/main.go:14 +0x23

Fixes #10152.
Updates #19348.

Change-Id: Ib43c67012f53da0ef1a1e69bcafb65b57d9cecb2
Reviewed-on: https://go-review.googlesource.com/37233
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-03 21:29:34 +00:00
David Lazar
699175a11a cmd/compile,link: generate PC-value tables with inlining information
In order to generate accurate tracebacks, the runtime needs to know the
inlined call stack for a given PC. This creates two tables per function
for this purpose. The first table is the inlining tree (stored in the
function's funcdata), which has a node containing the file, line, and
function name for every inlined call. The second table is a PC-value
table that maps each PC to a node in the inlining tree (or -1 if the PC
is not the result of inlining).

To give the appearance that inlining hasn't happened, the runtime also
needs the original source position information of inlined AST nodes.
Previously the compiler plastered over the line numbers of inlined AST
nodes with the line number of the call. This meant that the PC-line
table mapped each PC to line number of the outermost call in its inlined
call stack, with no way to access the innermost line number.

Now the compiler retains line numbers of inlined AST nodes and writes
the innermost source position information to the PC-line and PC-file
tables. Some tools and tests expect to see outermost line numbers, so we
provide the OutermostLine function for displaying line info.

To keep track of the inlined call stack for an AST node, we extend the
src.PosBase type with an index into a global inlining tree. Every time
the compiler inlines a call, it creates a node in the global inlining
tree for the call, and writes its index to the PosBase of every inlined
AST node. The parent of this node is the inlining tree index of the
call. -1 signifies no parent.

For each function, the compiler creates a local inlining tree and a
PC-value table mapping each PC to an index in the local tree.  These are
written to an object file, which is read by the linker.  The linker
re-encodes these tables compactly by deduplicating function names and
file names.

This change increases the size of binaries by 4-5%. For example, this is
how the go1 benchmark binary is impacted by this change:

section             old bytes   new bytes   delta
.text               3.49M ± 0%  3.49M ± 0%   +0.06%
.rodata             1.12M ± 0%  1.21M ± 0%   +8.21%
.gopclntab          1.50M ± 0%  1.68M ± 0%  +11.89%
.debug_line          338k ± 0%   435k ± 0%  +28.78%
Total               9.21M ± 0%  9.58M ± 0%   +4.01%

Updates #19348.

Change-Id: Ic4f180c3b516018138236b0c35e0218270d957d3
Reviewed-on: https://go-review.googlesource.com/37231
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-03 21:29:30 +00:00
Austin Clements
77f64c50db runtime: clarify access to mheap_.busy
There are two accesses to mheap_.busy that are guarded by checks
against len(mheap_.free). This works because both lists are (and must
be) the same length, but it makes the code less clear. Change these to
use len(mheap_.busy) so the access more clearly parallels the check.

Fixes #18944.

Change-Id: I9bacbd3663988df351ed4396ae9018bc71018311
Reviewed-on: https://go-review.googlesource.com/36354
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-03 17:02:18 +00:00
Austin Clements
b50b728587 runtime: simplify sweep allocation counting
Currently sweep counts the number of allocated objects, computes the
number of free objects from that, then re-computes the number of
allocated objects from that. Simplify and clean this up by skipping
these intermediate steps.

Change-Id: I3ed98e371eb54bbcab7c8530466c4ab5fde35f0a
Reviewed-on: https://go-review.googlesource.com/34935
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-03 17:02:16 +00:00
Austin Clements
f1ba75f8c5 runtime: don't rescan finalizers queue during mark termination
Currently we scan the finalizers queue both during concurrent mark and
during mark termination. This costs roughly 20ns per queued finalizer
and about 1ns per unused finalizer queue slot (allocated queue length
never decreases), which can drive up STW time if there are many
finalizers.

However, we only add finalizers to this queue during sweeping, which
means that the second scan will never find anything new. Hence, we can
fix this by simply not scanning the finalizers queue during mark
termination. This brings the STW time under the 100µs goal even with
1,000,000 queued finalizers.

Fixes #18869.

Change-Id: I4ce5620c66fb7f13ebeb39ca313ce57047d1d0fb
Reviewed-on: https://go-review.googlesource.com/36013
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-03 17:02:14 +00:00
Austin Clements
98da2d1f91 runtime: remove wbufptr
Since workbuf is now marked go:notinheap, the write barrier-preventing
wrapper type wbufptr is no longer necessary. Remove it.

Change-Id: I3e5b5803a1547d65de1c1a9c22458a38e08549b7
Reviewed-on: https://go-review.googlesource.com/35971
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-03 17:02:12 +00:00
Josh Bleecher Snyder
9b15c13dc5 runtime/pprof: fix data race between Profile.Add and Profile.WriteTo
p.m is accessed in WriteTo without holding p.mu.
Move the access inside the critical section.

The race detector catches this bug using this program:


package main

import (
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	p := pprof.NewProfile("ABC")
	go func() {
		p.WriteTo(os.Stdout, 1)
		time.Sleep(time.Second)
	}()
	p.Add("abc", 0)
	time.Sleep(time.Second)
}


$ go run -race x.go 
==================
WARNING: DATA RACE
Write at 0x00c42007c240 by main goroutine:
  runtime.mapassign()
      /Users/josh/go/tip/src/runtime/hashmap.go:485 +0x0
  runtime/pprof.(*Profile).Add()
      /Users/josh/go/tip/src/runtime/pprof/pprof.go:281 +0x255
  main.main()
      /Users/josh/go/tip/src/p.go:15 +0x9d

Previous read at 0x00c42007c240 by goroutine 6:
  runtime/pprof.(*Profile).WriteTo()
      /Users/josh/go/tip/src/runtime/pprof/pprof.go:314 +0xc5
  main.main.func1()
      /Users/josh/go/tip/src/x.go:12 +0x69

Goroutine 6 (running) created at:
  main.main()
      /Users/josh/go/tip/src/x.go:11 +0x6e
==================
ABC profile: total 1
1 @ 0x110ccb4 0x111aeee 0x1055053 0x107f031

Found 1 data race(s)
exit status 66


(Exit status 66?)

Change-Id: I49d884dc3af9cce2209057a3448fe6bf50653523
Reviewed-on: https://go-review.googlesource.com/37730
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-02 23:30:07 +00:00
Josh Bleecher Snyder
04fc887761 runtime: delay marking maps as writing until after first alg call
Fixes #19359

Change-Id: I196b47cf0471915b6dc63785e8542aa1876ff695
Reviewed-on: https://go-review.googlesource.com/37665
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-02 17:38:30 +00:00
Lynn Boger
e54bc92a2c runtime, cmd/go: roll back stale message, test detail
Some debugging code was recently added to:
1) provide more detail for the stale reason when it is
determined that a package is stale
2) provide file and package time and date information when
it is determined that runtime.a is stale

This backs out those debugging messages.

Fixes #19116

Change-Id: I8dd0cbe29324820275b481d8bbb78ff2c5fbc362
Reviewed-on: https://go-review.googlesource.com/37382
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-01 18:50:27 +00:00
Josh Bleecher Snyder
064e44f218 runtime: evacuate old map buckets more consistently
During map growth, buckets are evacuated in two ways.
When a value is altered, its containing bucket is evacuated.
Also, an evacuation mark is maintained and advanced every time.
Prior to this CL, the evacuation mark was always incremented,
even if the next bucket to be evacuated had already been evacuated.
This CL changes evacuation mark advancement to skip previously
evacuated buckets. This has the effect of making map evacuation both
more aggressive and more consistent.

Aggressive map evacuation is good. While the map is growing,
map accesses must check two buckets, which may be far apart in memory.
Map growth also delays garbage collection.
And if map evacuation is not aggressive enough, there is a risk that
a populate-once read-many map may be stuck permanently in map growth.
This CL does not eliminate that possibility, but it shrinks the window.

There is minimal impact on map benchmarks:

name                         old time/op    new time/op    delta
MapPop100-8                    12.4µs ±11%    12.4µs ± 7%    ~     (p=0.798 n=15+15)
MapPop1000-8                    240µs ± 8%     235µs ± 8%    ~     (p=0.217 n=15+14)
MapPop10000-8                  4.49ms ±10%    4.51ms ±15%    ~     (p=1.000 n=15+13)
MegMap-8                       11.9ns ± 2%    11.8ns ± 0%  -1.01%  (p=0.000 n=15+11)
MegOneMap-8                    9.30ns ± 1%    9.29ns ± 1%    ~     (p=0.955 n=14+14)
MegEqMap-8                     31.9µs ± 5%    31.9µs ± 3%    ~     (p=0.935 n=15+15)
MegEmptyMap-8                  2.41ns ± 2%    2.41ns ± 0%    ~     (p=0.594 n=12+14)
SmallStrMap-8                  12.8ns ± 1%    12.7ns ± 1%    ~     (p=0.569 n=14+13)
MapStringKeysEight_16-8        13.6ns ± 1%    13.7ns ± 2%    ~     (p=0.100 n=13+15)
MapStringKeysEight_32-8        12.1ns ± 1%    12.1ns ± 2%    ~     (p=0.340 n=15+15)
MapStringKeysEight_64-8        12.1ns ± 1%    12.1ns ± 2%    ~     (p=0.582 n=15+14)
MapStringKeysEight_1M-8        12.0ns ± 1%    12.1ns ± 1%    ~     (p=0.267 n=15+14)
IntMap-8                       7.96ns ± 1%    7.97ns ± 2%    ~     (p=0.991 n=15+13)
RepeatedLookupStrMapKey32-8    15.8ns ± 2%    15.8ns ± 1%    ~     (p=0.393 n=15+14)
RepeatedLookupStrMapKey1M-8    35.3µs ± 2%    35.3µs ± 1%    ~     (p=0.815 n=15+15)
NewEmptyMap-8                  36.0ns ± 4%    36.4ns ± 7%    ~     (p=0.270 n=15+15)
NewSmallMap-8                  85.5ns ± 1%    85.6ns ± 1%    ~     (p=0.674 n=14+15)
MapIter-8                      89.9ns ± 6%    90.8ns ± 6%    ~     (p=0.467 n=15+15)
MapIterEmpty-8                 10.0ns ±22%    10.0ns ±25%    ~     (p=0.846 n=15+15)
SameLengthMap-8                4.18ns ± 1%    4.17ns ± 1%    ~     (p=0.653 n=15+14)
BigKeyMap-8                    20.2ns ± 1%    20.1ns ± 1%  -0.82%  (p=0.002 n=15+15)
BigValMap-8                    22.5ns ± 8%    22.3ns ± 6%    ~     (p=0.615 n=15+15)
SmallKeyMap-8                  15.3ns ± 1%    15.3ns ± 1%    ~     (p=0.754 n=15+14)
ComplexAlgMap-8                58.4ns ± 1%    58.7ns ± 1%  +0.52%  (p=0.000 n=14+15)

There is a tiny but detectable difference in the compiler:

name       old time/op      new time/op      delta
Template        218ms ± 5%       219ms ± 4%    ~     (p=0.094 n=98+98)
Unicode        93.6ms ± 5%      93.6ms ± 4%    ~     (p=0.910 n=94+95)
GoTypes         596ms ± 5%       598ms ± 6%    ~     (p=0.533 n=98+100)
Compiler        2.72s ± 3%       2.72s ± 4%    ~     (p=0.238 n=100+99)
SSA             4.11s ± 3%       4.11s ± 3%    ~     (p=0.864 n=99+98)
Flate           129ms ± 6%       129ms ± 4%    ~     (p=0.522 n=98+96)
GoParser        151ms ± 4%       151ms ± 4%  -0.48%  (p=0.017 n=96+96)
Reflect         379ms ± 3%       376ms ± 4%  -0.57%  (p=0.011 n=99+99)
Tar             112ms ± 5%       112ms ± 6%    ~     (p=0.688 n=93+95)
XML             214ms ± 4%       214ms ± 5%    ~     (p=0.968 n=100+99)
StdCmd          16.2s ± 2%       16.2s ± 2%  -0.26%  (p=0.048 n=99+99)

name       old user-ns/op   new user-ns/op   delta
Template   252user-ms ± 4%  250user-ms ± 4%  -0.63%  (p=0.020 n=98+97)
Unicode    113user-ms ± 7%  114user-ms ± 5%    ~     (p=0.057 n=97+94)
GoTypes    776user-ms ± 5%  777user-ms ± 5%    ~     (p=0.375 n=97+96)
Compiler   3.61user-s ± 3%  3.60user-s ± 3%    ~     (p=0.445 n=98+93)
SSA        5.84user-s ± 6%  5.85user-s ± 5%    ~     (p=0.542 n=100+95)
Flate      154user-ms ± 5%  154user-ms ± 5%    ~     (p=0.699 n=99+99)
GoParser   184user-ms ± 6%  183user-ms ± 4%    ~     (p=0.557 n=98+95)
Reflect    461user-ms ± 5%  462user-ms ± 4%    ~     (p=0.853 n=97+99)
Tar        130user-ms ± 5%  129user-ms ± 6%    ~     (p=0.567 n=93+100)
XML        257user-ms ± 6%  258user-ms ± 6%    ~     (p=0.205 n=99+100)

Change-Id: Id92dd54a152904069aac415e6aaaab5c67f5f476
Reviewed-on: https://go-review.googlesource.com/37011
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-28 20:42:50 +00:00
Josh Bleecher Snyder
504bc3ed24 cmd/compile, runtime: specialize convT2x, don't alloc for zero vals
Prior to this CL, all runtime conversions
from a concrete value to an interface went
through one of two runtime calls: convT2E or convT2I.
However, in practice, basic types are very common.
Specializing convT2x for those basic types allows
for a more efficient implementation for those types.
For basic scalars and strings, allocation and copying
can use the same methods as normal code.
For pointer-free types, allocation can occur without
zeroing, and copying can take place without GC calls.
For slices, copying is cheaper and simpler.

This CL adds twelve runtime routines:

convT2E16, convT2I16
convT2E32, convT2I32
convT2E64, convT2I64
convT2Estring, convT2Istring
convT2Eslice, convT2Islice
convT2Enoptr, convT2Inoptr

While compiling make.bash, 93% of all convT2x calls
are now to one of these specialized convT2x calls.

Within specialized convT2x routines, it is cheap to check
for a zero value, in a way that it is not in general.
When we detect a zero value there, we return a pointer
to zeroVal, rather than allocating.
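
For illustration, these are the kinds of conversions that can now reach the
specialized routines (which routine the compiler picks is its own choice;
this listing is not from the CL):

package main

import "fmt"

func main() {
	var a interface{} = int32(7)        // 32-bit scalar: a convT2E32-style path
	var b interface{} = "hello"         // string: a convT2Estring-style path
	var c interface{} = []byte{1, 2, 3} // slice: a convT2Eslice-style path
	var d interface{} = int64(0)        // zero value: may be served from zeroVal without allocating
	fmt.Println(a, b, c, d)
}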

name                         old time/op  new time/op  delta
ConvT2Ezero/zero/16-8        17.9ns ± 2%   3.0ns ± 3%  -83.20%  (p=0.000 n=56+56)
ConvT2Ezero/zero/32-8        17.8ns ± 2%   3.0ns ± 3%  -83.15%  (p=0.000 n=59+60)
ConvT2Ezero/zero/64-8        20.1ns ± 1%   3.0ns ± 2%  -84.98%  (p=0.000 n=57+57)
ConvT2Ezero/zero/str-8       32.6ns ± 1%   3.0ns ± 4%  -90.70%  (p=0.000 n=59+60)
ConvT2Ezero/zero/slice-8     36.7ns ± 2%   3.0ns ± 2%  -91.78%  (p=0.000 n=59+59)
ConvT2Ezero/zero/big-8       91.9ns ± 2%  85.9ns ± 2%   -6.52%  (p=0.000 n=57+57)
ConvT2Ezero/nonzero/16-8     17.7ns ± 2%  12.7ns ± 3%  -28.38%  (p=0.000 n=55+60)
ConvT2Ezero/nonzero/32-8     17.8ns ± 1%  12.7ns ± 1%  -28.44%  (p=0.000 n=54+57)
ConvT2Ezero/nonzero/64-8     20.0ns ± 1%  15.0ns ± 1%  -24.90%  (p=0.000 n=56+58)
ConvT2Ezero/nonzero/str-8    32.6ns ± 1%  25.7ns ± 1%  -21.17%  (p=0.000 n=58+55)
ConvT2Ezero/nonzero/slice-8  36.8ns ± 2%  30.4ns ± 1%  -17.32%  (p=0.000 n=60+52)
ConvT2Ezero/nonzero/big-8    92.1ns ± 2%  85.9ns ± 2%   -6.70%  (p=0.000 n=57+59)

Benchmarks on a real program (the compiler):

name       old time/op      new time/op      delta
Template        227ms ± 5%       221ms ± 2%  -2.48%  (p=0.000 n=30+26)
Unicode         102ms ± 5%       100ms ± 3%  -1.30%  (p=0.009 n=30+26)
GoTypes         656ms ± 5%       659ms ± 4%    ~     (p=0.208 n=30+30)
Compiler        2.82s ± 2%       2.82s ± 1%    ~     (p=0.614 n=29+27)
Flate           128ms ± 2%       128ms ± 5%    ~     (p=0.783 n=27+28)
GoParser        158ms ± 3%       158ms ± 3%    ~     (p=0.261 n=28+30)
Reflect         408ms ± 7%       401ms ± 3%    ~     (p=0.075 n=30+30)
Tar             123ms ± 6%       121ms ± 8%    ~     (p=0.287 n=29+30)
XML             220ms ± 2%       220ms ± 4%    ~     (p=0.805 n=29+29)

name       old user-ns/op   new user-ns/op   delta
Template   281user-ms ± 4%  279user-ms ± 3%  -0.87%  (p=0.044 n=28+28)
Unicode    142user-ms ± 4%  141user-ms ± 3%  -1.04%  (p=0.015 n=30+27)
GoTypes    884user-ms ± 3%  886user-ms ± 2%    ~     (p=0.532 n=30+30)
Compiler   3.94user-s ± 3%  3.92user-s ± 1%    ~     (p=0.185 n=30+28)
Flate      165user-ms ± 2%  165user-ms ± 4%    ~     (p=0.780 n=27+29)
GoParser   209user-ms ± 2%  208user-ms ± 3%    ~     (p=0.453 n=28+30)
Reflect    533user-ms ± 6%  526user-ms ± 3%    ~     (p=0.057 n=30+30)
Tar        156user-ms ± 6%  154user-ms ± 6%    ~     (p=0.133 n=29+30)
XML        288user-ms ± 4%  288user-ms ± 4%    ~     (p=0.633 n=30+30)

name       old alloc/op     new alloc/op     delta
Template       41.0MB ± 0%      40.9MB ± 0%  -0.11%  (p=0.000 n=29+29)
Unicode        32.6MB ± 0%      32.6MB ± 0%    ~     (p=0.572 n=29+30)
GoTypes         122MB ± 0%       122MB ± 0%  -0.10%  (p=0.000 n=30+30)
Compiler        482MB ± 0%       481MB ± 0%  -0.07%  (p=0.000 n=30+29)
Flate          26.6MB ± 0%      26.6MB ± 0%    ~     (p=0.096 n=30+30)
GoParser       32.7MB ± 0%      32.6MB ± 0%  -0.06%  (p=0.011 n=28+28)
Reflect        84.2MB ± 0%      84.1MB ± 0%  -0.17%  (p=0.000 n=29+30)
Tar            27.7MB ± 0%      27.7MB ± 0%  -0.05%  (p=0.032 n=27+28)
XML            44.7MB ± 0%      44.7MB ± 0%    ~     (p=0.131 n=28+30)

name       old allocs/op    new allocs/op    delta
Template         373k ± 1%        370k ± 1%  -0.76%  (p=0.000 n=30+30)
Unicode          325k ± 1%        325k ± 1%    ~     (p=0.383 n=29+30)
GoTypes         1.16M ± 0%       1.15M ± 0%  -0.75%  (p=0.000 n=29+30)
Compiler        4.15M ± 0%       4.13M ± 0%  -0.59%  (p=0.000 n=30+29)
Flate            238k ± 1%        237k ± 1%  -0.62%  (p=0.000 n=30+30)
GoParser         304k ± 1%        302k ± 1%  -0.64%  (p=0.000 n=30+28)
Reflect         1.00M ± 0%       0.99M ± 0%  -1.10%  (p=0.000 n=29+30)
Tar              245k ± 1%        244k ± 1%  -0.59%  (p=0.000 n=27+29)
XML              391k ± 1%        389k ± 1%  -0.59%  (p=0.000 n=29+30)

Change-Id: Id7f456d690567c2b0a96b0d6d64de8784b6e305f
Reviewed-on: https://go-review.googlesource.com/36476
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-28 19:23:33 +00:00
Austin Clements
bab191042b cmd/internal/obj, runtime: update funcdata comments
The comments in cmd/internal/obj/funcdata.go are identical to the
comments in runtime/funcdata.h, but the majority of the definitions
they refer to don't apply to Go sources and have been stripped out of
funcdata.go.

Remove these stale comments from funcdata.go and clean up the
references to other copies of the PCDATA and FUNCDATA indexes.

Change-Id: I5d6e49a6e586cc9aecd7c3ce1567679f2a605884
Reviewed-on: https://go-review.googlesource.com/37330
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-27 22:29:28 +00:00
Dmitry Vyukov
ba6e5776fd runtime: remove unused RaceSemacquire declaration
These functions are not defined and are not used.

Fixes #19290

Change-Id: I2978147220af83cf319f7439f076c131870fb9ee
Reviewed-on: https://go-review.googlesource.com/37448
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 20:15:51 +00:00
Josh Bleecher Snyder
c7894924c7 runtime/pprof: handle empty stack traces in Profile.Add
If the caller passes a large number to Profile.Add,
the list of pcs is empty, which results in junk
(a nil pc) being recorded. Check for that explicitly,
and replace such stack traces with a lostProfileEvent.

Fixes #18836.

Change-Id: I99c96aa67dd5525cd239ea96452e6e8fcb25ce02
Reviewed-on: https://go-review.googlesource.com/36891
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-27 17:11:07 +00:00
Russ Cox
8c24e52247 runtime: check that pprof accepts but doesn't need executable
The profiles are self-contained now.
Check that they work by themselves in the tests that invoke pprof,
but also keep checking that the old command lines work.

Change-Id: I24c74b5456f0b50473883c3640625c6612f72309
Reviewed-on: https://go-review.googlesource.com/37166
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-02-24 20:46:37 +00:00
Russ Cox
0b8c983ece runtime/pprof/internal/profile: move internal/pprof/profile here
Nothing needs internal/pprof anymore except the runtime/pprof tests.
Move the package here to prevent new dependencies.

Change-Id: Ia119af91cc2b980e0fa03a15f46f69d7f71d2926
Reviewed-on: https://go-review.googlesource.com/37165
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-02-24 20:45:21 +00:00
Russ Cox
cbab65fdfa runtime/pprof: add streaming protobuf encoder
The existing code builds a full profile in memory.
Then it translates that profile into a data structure (in memory).
Then it marshals that data structure into a protocol buffer (in memory).
Then it gzips that marshaled form into the underlying writer.
So there are three copies of the full profile data in memory
at the same time before we're done. This is obviously dumb.

This CL implements a fully streaming conversion from
the original in-memory profile to the underlying writer.
There is now only one copy of the profile in memory.

For the non-CPU profiles, this is optimal, since we have to
have a full copy in memory to start with.

For the CPU profiles, we could still try to bound the profile
size stored in memory and stream fragments out during
the actual profiling, as Go 1.7 did (with a simpler format),
but so far that hasn't been necessary.

Change-Id: Ic36141021857791bf0cd1fce84178fb5e744b989
Reviewed-on: https://go-review.googlesource.com/37164
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-02-24 20:15:56 +00:00
Russ Cox
1564817d8c runtime/pprof: use more efficient hash table for staging profile
The old hash table was a placeholder that allocated memory
during every lookup for key generation, even for keys that hit
in the table.

Change-Id: I4f601bbfd349f0be76d6259a8989c9c17ccfac21
Reviewed-on: https://go-review.googlesource.com/37163
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-02-24 17:05:37 +00:00
Russ Cox
1a680a902a runtime/pprof: use new profile buffers for CPU profiling
This doesn't change the functionality of the current code,
but it sets us up for exporting the profiling labels into the profile.

The old code had a hash table of profile samples maintained
during the signal handler, with evictions going into a log.
The new code just logs every sample directly, leaving the
hash-based deduplication to an ordinary goroutine.

The new code also avoids storing the entire profile in two
forms in memory, an unfortunate regression introduced
when binary profile support was added. After this CL the
entire profile is only stored once in memory. We'd still like
to get back down to storing it zero times (streaming it to
the underlying io.Writer).

Change-Id: I0893a1788267c564aa1af17970d47377b2a43457
Reviewed-on: https://go-review.googlesource.com/36712
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
2017-02-24 17:01:47 +00:00
Russ Cox
a1261b8b0a runtime: do not allocate on every time.Sleep
It's common for some goroutines to loop calling time.Sleep.
Allocate once per goroutine, not every time.
This comes up in runtime/pprof's background reader.

Change-Id: I89d17dc7379dca266d2c9cd3aefc2382f5bdbade
Reviewed-on: https://go-review.googlesource.com/37162
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-02-24 15:34:01 +00:00
Russ Cox
b788fd80e6 runtime: new profile buffer implementation supporting label pointers
The existing CPU profiling buffer is a slice of uintptr, but we want to
start including profiling label data in the profiles, and those labels need
to be pointers in order to let them describe rich information.

This CL implements a new profBuf type that holds both a slice of uint64
for data and a slice of unsafe.Pointer for profiling labels (aka tags).
Making the runtime use these buffers will happen in followup CLs.
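
A hedged sketch of the shape being described (a paraphrase, not the
runtime's actual declaration):

package profbuf // hypothetical package name for this sketch

import "unsafe"

// data holds the numeric part of each record (timestamps, counts, PCs);
// tags holds one label pointer per record, so labels can refer to rich,
// GC-visible data instead of being packed into integers.
type profBuf struct {
	data []uint64
	tags []unsafe.Pointer
}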

Change-Id: I9ff16b532d8edaf4ce0cbba1098229a561834efc
Reviewed-on: https://go-review.googlesource.com/36713
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-02-23 19:47:23 +00:00
Lynn Boger
ea48c9d232 runtime: more detail for crash_test.go
This updates the testcase to display the timestamps for the
runtime.a, its dependent packages atomic.a and sys.a, and
source files.

Change-Id: Id2901b4e8aa8eb9775c4f404ac01cc07b394ba91
Reviewed-on: https://go-review.googlesource.com/37332
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-22 16:34:14 +00:00
Josh Bleecher Snyder
4208fcdcd4 runtime: use standard linux/mipsx clone variable names
Change-Id: I62118e197190af1d11a89921d5769101ce6d2257
Reviewed-on: https://go-review.googlesource.com/37306
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-21 18:42:38 +00:00
Josh Bleecher Snyder
b6e0d4647f runtime: update assembly var names after monotonic time changes
Change-Id: I721045120a4df41462c02252e2e5e8529ae2d694
Reviewed-on: https://go-review.googlesource.com/37303
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-21 18:42:05 +00:00
Brad Fitzpatrick
a37f9d8a17 runtime/pprof: mark TestMutexProfile as flaky for now
Flaky tests hurt productivity. Disable for now.

Updates #19139

Change-Id: I2e3040bdf0e53597a1c4f925b788e3268ea284c1
Reviewed-on: https://go-review.googlesource.com/37291
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Peter Weinberger <pjw@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-20 20:17:16 +00:00
Dmitry Vyukov
0556e26273 sync: make Mutex more fair
Add a new starvation mode for Mutex.
In starvation mode, ownership is directly handed off from the
unlocking goroutine to the next waiter. Newly arriving goroutines
don't compete for ownership.
Unfair wait time is now limited to 1ms.
Also fix a long-standing bug where goroutines were requeued
at the tail of the wait queue. That led to even more unfair
acquisition times with multiple waiters.

Performance of normal mode is not considerably affected.

Fixes #13086

On the lockskew program provided in the issue:

done in 1.207853ms
done in 1.177451ms
done in 1.184168ms
done in 1.198633ms
done in 1.185797ms
done in 1.182502ms
done in 1.316485ms
done in 1.211611ms
done in 1.182418ms

name                    old time/op  new time/op   delta
MutexUncontended-48     0.65ns ± 0%   0.65ns ± 1%     ~           (p=0.087 n=10+10)
Mutex-48                 112ns ± 1%    114ns ± 1%   +1.69%        (p=0.000 n=10+10)
MutexSlack-48            113ns ± 0%     87ns ± 1%  -22.65%         (p=0.000 n=8+10)
MutexWork-48             149ns ± 0%    145ns ± 0%   -2.48%         (p=0.000 n=9+10)
MutexWorkSlack-48        149ns ± 0%    122ns ± 3%  -18.26%         (p=0.000 n=6+10)
MutexNoSpin-48           103ns ± 4%    105ns ± 3%     ~           (p=0.089 n=10+10)
MutexSpin-48             490ns ± 4%    515ns ± 6%   +5.08%        (p=0.006 n=10+10)
Cond32-48               13.4µs ± 6%   13.1µs ± 5%   -2.75%        (p=0.023 n=10+10)
RWMutexWrite100-48      53.2ns ± 3%   41.2ns ± 3%  -22.57%        (p=0.000 n=10+10)
RWMutexWrite10-48       45.9ns ± 2%   43.9ns ± 2%   -4.38%        (p=0.000 n=10+10)
RWMutexWorkWrite100-48   122ns ± 2%    134ns ± 1%   +9.92%        (p=0.000 n=10+10)
RWMutexWorkWrite10-48    206ns ± 1%    188ns ± 1%   -8.52%         (p=0.000 n=8+10)
Cond32-24               12.1µs ± 3%   12.4µs ± 3%   +1.98%         (p=0.043 n=10+9)
MutexUncontended-24     0.74ns ± 1%   0.75ns ± 1%     ~           (p=0.650 n=10+10)
Mutex-24                 122ns ± 2%    124ns ± 1%   +1.31%        (p=0.007 n=10+10)
MutexSlack-24           96.9ns ± 2%  102.8ns ± 2%   +6.11%        (p=0.000 n=10+10)
MutexWork-24             146ns ± 1%    135ns ± 2%   -7.70%         (p=0.000 n=10+9)
MutexWorkSlack-24        135ns ± 1%    128ns ± 2%   -5.01%         (p=0.000 n=10+9)
MutexNoSpin-24           114ns ± 3%    110ns ± 4%   -3.84%        (p=0.000 n=10+10)
MutexSpin-24             482ns ± 4%    475ns ± 8%     ~           (p=0.286 n=10+10)
RWMutexWrite100-24      43.0ns ± 3%   43.1ns ± 2%     ~           (p=0.956 n=10+10)
RWMutexWrite10-24       43.4ns ± 1%   43.2ns ± 1%     ~            (p=0.085 n=10+9)
RWMutexWorkWrite100-24   130ns ± 3%    131ns ± 3%     ~           (p=0.747 n=10+10)
RWMutexWorkWrite10-24    191ns ± 1%    192ns ± 1%     ~           (p=0.210 n=10+10)
Cond32-12               11.5µs ± 2%   11.7µs ± 2%   +1.98%        (p=0.002 n=10+10)
MutexUncontended-12     1.48ns ± 0%   1.50ns ± 1%   +1.08%        (p=0.004 n=10+10)
Mutex-12                 141ns ± 1%    143ns ± 1%   +1.63%        (p=0.000 n=10+10)
MutexSlack-12            121ns ± 0%    119ns ± 0%   -1.65%          (p=0.001 n=8+9)
MutexWork-12             141ns ± 2%    150ns ± 3%   +6.36%         (p=0.000 n=9+10)
MutexWorkSlack-12        131ns ± 0%    138ns ± 0%   +5.73%         (p=0.000 n=9+10)
MutexNoSpin-12          87.0ns ± 1%   83.7ns ± 1%   -3.80%        (p=0.000 n=10+10)
MutexSpin-12             364ns ± 1%    377ns ± 1%   +3.77%        (p=0.000 n=10+10)
RWMutexWrite100-12      42.8ns ± 1%   43.9ns ± 1%   +2.41%         (p=0.000 n=8+10)
RWMutexWrite10-12       39.8ns ± 4%   39.3ns ± 1%     ~            (p=0.433 n=10+9)
RWMutexWorkWrite100-12   131ns ± 1%    131ns ± 0%     ~            (p=0.591 n=10+9)
RWMutexWorkWrite10-12    173ns ± 1%    174ns ± 0%     ~            (p=0.059 n=10+8)
Cond32-6                10.9µs ± 2%   10.9µs ± 2%     ~           (p=0.739 n=10+10)
MutexUncontended-6      2.97ns ± 0%   2.97ns ± 0%     ~     (all samples are equal)
Mutex-6                  122ns ± 6%    122ns ± 2%     ~           (p=0.668 n=10+10)
MutexSlack-6             149ns ± 3%    142ns ± 3%   -4.63%        (p=0.000 n=10+10)
MutexWork-6              136ns ± 3%    140ns ± 5%     ~           (p=0.077 n=10+10)
MutexWorkSlack-6         152ns ± 0%    138ns ± 2%   -9.21%         (p=0.000 n=6+10)
MutexNoSpin-6            150ns ± 1%    152ns ± 0%   +1.50%         (p=0.000 n=8+10)
MutexSpin-6              726ns ± 0%    730ns ± 1%     ~           (p=0.069 n=10+10)
RWMutexWrite100-6       40.6ns ± 1%   40.9ns ± 1%   +0.91%         (p=0.001 n=8+10)
RWMutexWrite10-6        37.1ns ± 0%   37.0ns ± 1%     ~            (p=0.386 n=9+10)
RWMutexWorkWrite100-6    133ns ± 1%    134ns ± 1%   +1.01%         (p=0.005 n=9+10)
RWMutexWorkWrite10-6     152ns ± 0%    152ns ± 0%     ~     (all samples are equal)
Cond32-2                7.86µs ± 2%   7.95µs ± 2%   +1.10%        (p=0.023 n=10+10)
MutexUncontended-2      8.10ns ± 0%   9.11ns ± 4%  +12.44%         (p=0.000 n=9+10)
Mutex-2                 32.9ns ± 9%   38.4ns ± 6%  +16.58%        (p=0.000 n=10+10)
MutexSlack-2            93.4ns ± 1%   98.5ns ± 2%   +5.39%         (p=0.000 n=10+9)
MutexWork-2             40.8ns ± 3%   43.8ns ± 7%   +7.38%         (p=0.000 n=10+9)
MutexWorkSlack-2        98.6ns ± 5%  108.2ns ± 2%   +9.80%         (p=0.000 n=10+8)
MutexNoSpin-2            399ns ± 1%    398ns ± 2%     ~             (p=0.463 n=8+9)
MutexSpin-2             1.99µs ± 3%   1.97µs ± 1%   -0.81%          (p=0.003 n=9+8)
RWMutexWrite100-2       37.6ns ± 5%   46.0ns ± 4%  +22.17%         (p=0.000 n=10+8)
RWMutexWrite10-2        50.1ns ± 6%   36.8ns ±12%  -26.46%         (p=0.000 n=9+10)
RWMutexWorkWrite100-2    136ns ± 0%    134ns ± 2%   -1.80%          (p=0.001 n=7+9)
RWMutexWorkWrite10-2     140ns ± 1%    138ns ± 1%   -1.50%        (p=0.000 n=10+10)
Cond32                  5.93µs ± 1%   5.91µs ± 0%     ~            (p=0.411 n=9+10)
MutexUncontended        15.9ns ± 0%   15.8ns ± 0%   -0.63%          (p=0.000 n=8+8)
Mutex                   15.9ns ± 0%   15.8ns ± 0%   -0.44%        (p=0.003 n=10+10)
MutexSlack              26.9ns ± 3%   26.7ns ± 2%     ~           (p=0.084 n=10+10)
MutexWork               47.8ns ± 0%   47.9ns ± 0%   +0.21%          (p=0.014 n=9+8)
MutexWorkSlack          54.9ns ± 3%   54.5ns ± 3%     ~           (p=0.254 n=10+10)
MutexNoSpin              786ns ± 2%    765ns ± 1%   -2.66%        (p=0.000 n=10+10)
MutexSpin               3.87µs ± 1%   3.83µs ± 0%   -0.85%          (p=0.005 n=9+8)
RWMutexWrite100         21.2ns ± 2%   21.0ns ± 1%   -0.88%         (p=0.018 n=10+9)
RWMutexWrite10          22.6ns ± 1%   22.6ns ± 0%     ~             (p=0.471 n=9+9)
RWMutexWorkWrite100      132ns ± 0%    132ns ± 0%     ~     (all samples are equal)
RWMutexWorkWrite10       124ns ± 0%    123ns ± 0%     ~           (p=0.656 n=10+10)

Change-Id: I66412a3a0980df1233ad7a5a0cd9723b4274528b
Reviewed-on: https://go-review.googlesource.com/34310
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-17 17:24:59 +00:00
Russ Cox
990124da2a runtime: use balanced tree for addr lookup in semaphore implementation
CL 36792 fixed #17953, a linear scan caused by n goroutines piling into
two different locks that hashed to the same bucket in the semaphore table.
In that CL, n goroutines contending for 2 unfortunately chosen locks
went from O(n²) to O(n).

This CL fixes a different linear scan, when n goroutines are contending for
n/2 different locks that all hash to the same bucket in the semaphore table.
In this CL, n goroutines contending for n/2 unfortunately chosen locks
goes from O(n²) to O(n log n). This case is much less likely, but any linear
scan eventually hurts, so we might as well fix it while the problem is fresh
in our minds.

The new test in this CL checks for both linear scans.

The effect of this CL on the sync benchmarks is negligible
(but it fixes the new test).

name                      old time/op    new time/op    delta
Cond1-48                     576ns ±10%     575ns ±13%     ~     (p=0.679 n=71+71)
Cond2-48                    1.59µs ± 8%    1.61µs ± 9%     ~     (p=0.107 n=73+69)
Cond4-48                    4.56µs ± 7%    4.55µs ± 7%     ~     (p=0.670 n=74+72)
Cond8-48                    9.87µs ± 9%    9.90µs ± 7%     ~     (p=0.507 n=69+73)
Cond16-48                   20.4µs ± 7%    20.4µs ±10%     ~     (p=0.588 n=69+71)
Cond32-48                   45.4µs ±10%    45.4µs ±14%     ~     (p=0.944 n=73+73)
UncontendedSemaphore-48     19.7ns ±12%    19.7ns ± 8%     ~     (p=0.589 n=65+63)
ContendedSemaphore-48       55.4ns ±26%    54.9ns ±32%     ~     (p=0.441 n=75+75)
MutexUncontended-48         0.63ns ± 0%    0.63ns ± 0%     ~     (all equal)
Mutex-48                     210ns ± 6%     213ns ±10%   +1.30%  (p=0.035 n=70+74)
MutexSlack-48                210ns ± 7%     211ns ± 9%     ~     (p=0.184 n=71+72)
MutexWork-48                 299ns ± 5%     300ns ± 5%     ~     (p=0.678 n=73+75)
MutexWorkSlack-48            302ns ± 6%     300ns ± 5%     ~     (p=0.149 n=74+72)
MutexNoSpin-48               135ns ± 6%     135ns ±10%     ~     (p=0.788 n=67+75)
MutexSpin-48                 693ns ± 5%     689ns ± 6%     ~     (p=0.092 n=65+74)
Once-48                     0.22ns ±25%    0.22ns ±24%     ~     (p=0.882 n=74+73)
Pool-48                     5.88ns ±36%    5.79ns ±24%     ~     (p=0.655 n=69+69)
PoolOverflow-48             4.79µs ±18%    4.87µs ±20%     ~     (p=0.233 n=75+75)
SemaUncontended-48          0.80ns ± 1%    0.82ns ± 8%   +2.46%  (p=0.000 n=60+74)
SemaSyntNonblock-48          103ns ± 4%     102ns ± 5%   -1.11%  (p=0.003 n=75+75)
SemaSyntBlock-48             104ns ± 4%     104ns ± 5%     ~     (p=0.231 n=71+75)
SemaWorkNonblock-48          128ns ± 4%     129ns ± 6%   +1.51%  (p=0.000 n=63+75)
SemaWorkBlock-48             129ns ± 8%     130ns ± 7%     ~     (p=0.072 n=75+74)
RWMutexUncontended-48       2.35ns ± 1%    2.35ns ± 0%     ~     (p=0.144 n=70+55)
RWMutexWrite100-48           139ns ±18%     141ns ±21%     ~     (p=0.071 n=75+73)
RWMutexWrite10-48            145ns ± 9%     145ns ± 8%     ~     (p=0.553 n=75+75)
RWMutexWorkWrite100-48       297ns ±13%     297ns ±15%     ~     (p=0.519 n=75+74)
RWMutexWorkWrite10-48        588ns ± 7%     585ns ± 5%     ~     (p=0.173 n=73+70)
WaitGroupUncontended-48     0.87ns ± 0%    0.87ns ± 0%     ~     (all equal)
WaitGroupAddDone-48         63.2ns ± 4%    62.7ns ± 4%   -0.82%  (p=0.027 n=72+75)
WaitGroupAddDoneWork-48      109ns ± 5%     109ns ± 4%     ~     (p=0.233 n=75+75)
WaitGroupWait-48            0.17ns ± 0%    0.16ns ±16%   -8.55%  (p=0.000 n=56+75)
WaitGroupWaitWork-48        1.78ns ± 1%    2.08ns ± 5%  +16.92%  (p=0.000 n=74+70)
WaitGroupActuallyWait-48    52.0ns ± 3%    50.6ns ± 5%   -2.70%  (p=0.000 n=71+69)

https://perf.golang.org/search?q=upload:20170215.1

Change-Id: Ia29a8bd006c089e401ec4297c3038cca656bcd0a
Reviewed-on: https://go-review.googlesource.com/37103
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-16 17:52:15 +00:00
Russ Cox
58d762176a runtime: run mutexevent profiling without holding semaRoot lock
Suggested by Dmitry in CL 36792 review.
Clearly safe since there are many different semaRoots
that could all have profiled sudogs calling mutexevent.

Change-Id: I45eed47a5be3e513b2dad63b60afcd94800e16d1
Reviewed-on: https://go-review.googlesource.com/37104
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2017-02-16 17:16:41 +00:00
Russ Cox
1f77db94f8 runtime: do not call wakep from enlistWorker, to avoid possible deadlock
We have seen one instance of a production job suddenly spinning to
100% CPU and becoming unresponsive. In that one instance, a SIGQUIT
was sent after 328 minutes of spinning, and the stacks showed a single
goroutine in "IO wait (scan)" state.

Looking for things that might get stuck if a goroutine got stuck in
scanning a stack, we found that injectglist does:

	lock(&sched.lock)
	var n int
	for n = 0; glist != nil; n++ {
		gp := glist
		glist = gp.schedlink.ptr()
		casgstatus(gp, _Gwaiting, _Grunnable)
		globrunqput(gp)
	}
	unlock(&sched.lock)

and that casgstatus spins on gp.atomicstatus until the _Gscan bit goes
away. Essentially, this code locks sched.lock and then while holding
sched.lock, waits to lock gp.atomicstatus.

The code that is doing the scan is:

	if castogscanstatus(gp, s, s|_Gscan) {
		if !gp.gcscandone {
			scanstack(gp, gcw)
			gp.gcscandone = true
		}
		restartg(gp)
		break loop
	}

More analysis showed that scanstack can, in a rare case, end up
calling back into code that acquires sched.lock. For example:

	runtime.scanstack at proc.go:866
	calls runtime.gentraceback at mgcmark.go:842
	calls runtime.scanstack$1 at traceback.go:378
	calls runtime.scanframeworker at mgcmark.go:819
	calls runtime.scanblock at mgcmark.go:904
	calls runtime.greyobject at mgcmark.go:1221
	calls (*runtime.gcWork).put at mgcmark.go:1412
	calls (*runtime.gcControllerState).enlistWorker at mgcwork.go:127
	calls runtime.wakep at mgc.go:632
	calls runtime.startm at proc.go:1779
	acquires runtime.sched.lock at proc.go:1675

This path was found with an automated deadlock-detecting tool.
There are many such paths but they all go through enlistWorker -> wakep.

The evidence strongly suggests that one of these paths is what caused
the deadlock we observed. We're running those jobs with
GOTRACEBACK=crash now to try to get more information if it happens
again.

Further refinement and analysis shows that if we drop the wakep call
from enlistWorker, the remaining few deadlock cycles found by the tool
are all false positives caused by not understanding the effect of calls
to func variables.

The enlistWorker -> wakep call was intended only as a performance
optimization, it rarely executes, and if it does execute at just the
wrong time it can (and plausibly did) cause the deadlock we saw.

Comment it out, to avoid the potential deadlock.

Fixes #19112.
Unfixes #14179.

Change-Id: I6f7e10b890b991c11e79fab7aeefaf70b5d5a07b
Reviewed-on: https://go-review.googlesource.com/37093
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-02-15 21:22:36 +00:00
Hana Kim
8833af3f4b runtime/pprof: print newly added fields of runtime.MemStats
in heap profile with debug mode

Change-Id: I3a80d03a4aa556614626067a8fd698b3b00f4290
Reviewed-on: https://go-review.googlesource.com/36962
Reviewed-by: Austin Clements <austin@google.com>
2017-02-15 21:14:37 +00:00
Ian Lance Taylor
c05b06a12d os: use poller for file I/O
This changes the os package to use the runtime poller for file I/O
where possible. When a system call blocks on a pollable descriptor,
the goroutine will be blocked on the poller but the thread will be
released to run other goroutines. When using a non-pollable
descriptor, the os package will continue to use thread-blocking system
calls as before.

For example, on GNU/Linux, the runtime poller uses epoll. epoll does
not support ordinary disk files, so they will continue to use blocking
I/O as before. The poller will be used for pipes.

Since this means that the poller is used for many more programs, this
modifies the runtime to only block waiting for the poller if there is
some goroutine that is waiting on the poller. Otherwise, there is no
point, as the poller will never make any goroutine ready. This
preserves the runtime's current simple deadlock detection.

This seems to crash FreeBSD systems, so it is disabled on FreeBSD.
This is issue 19093.

Using the poller on Windows requires opening the file with
FILE_FLAG_OVERLAPPED. We should only do that if we can remove that
flag if the program calls the Fd method. This is issue 19098.

Update #6817.
Update #7903.
Update #15021.
Update #18507.
Update #19093.
Update #19098.

Change-Id: Ia5197dcefa7c6fbcca97d19a6f8621b2abcbb1fe
Reviewed-on: https://go-review.googlesource.com/36800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-15 19:31:55 +00:00
Austin Clements
0993b2fd06 runtime: remove g.stackAlloc
Since we're no longer stealing space for the stack barrier array from
the stack allocation, the stack allocation is simply
g.stack.hi-g.stack.lo.

Updates #17503.

Change-Id: Id9b450ae12c3df9ec59cfc4365481a0a16b7c601
Reviewed-on: https://go-review.googlesource.com/36621
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-02-14 15:52:56 +00:00
Austin Clements
d089a6c718 runtime: remove stack barriers
Now that we don't rescan stacks, stack barriers are unnecessary. This
removes all of the code and structures supporting them as well as
tests that were specifically for stack barriers.

Updates #17503.

Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c
Reviewed-on: https://go-review.googlesource.com/36620
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-14 15:52:54 +00:00
Austin Clements
c5ebcd2c8a runtime: remove rescan list
With the hybrid barrier, rescanning stacks is no longer necessary so
the rescan list is no longer necessary. Remove it.

This leaves the gcrescanstacks GODEBUG variable, since it's useful for
debugging, but changes it to simply walk all of the Gs to rescan
stacks rather than using the rescan list.

We could also remove g.gcscanvalid, which is effectively a distributed
rescan list. However, it's still useful for gcrescanstacks mode and it
adds little complexity, so we'll leave it in.

Fixes #17099.
Updates #17503.

Change-Id: I776d43f0729567335ef1bfd145b75c74de2cc7a9
Reviewed-on: https://go-review.googlesource.com/36619
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-02-14 15:52:51 +00:00
Austin Clements
7aeb915d6b runtime: remove unused debug.wbshadow
The wbshadow implementation was removed a year and a half ago in
1635ab7dfe, but the GODEBUG setting remained. Remove the GODEBUG
setting since it doesn't do anything.

Change-Id: I19cde324a79472aff60acb5cc9f7d4aa86c0c0ed
Reviewed-on: https://go-review.googlesource.com/36618
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-02-14 15:52:49 +00:00
Josh Bleecher Snyder
ef30a1c8aa runtime: fix some assembly offset names
For vet. There are more. This is a start.

Change-Id: Ibbbb2b20b5db60ee3fac4a1b5913d18fab01f6b9
Reviewed-on: https://go-review.googlesource.com/36939
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-14 02:09:48 +00:00
Josh Bleecher Snyder
cc2a52adef all: use keyed composite literals
Makes vet happy.

Change-Id: I7250f283c96e82b9796c5672a0a143ba7568fa63
Reviewed-on: https://go-review.googlesource.com/36937
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-14 02:09:14 +00:00
Josh Bleecher Snyder
46a75870ad runtime: speed up fastrand() % n
This occurs a fair amount in the runtime for non-power-of-two n.
Use an alternative, faster formulation.
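
The faster formulation replaces the modulo with a multiply-and-shift
reduction, roughly as follows (the random value is passed in here to keep
the sketch self-contained; this is not necessarily the exact runtime code):

// reduce maps a uniformly random 32-bit value into [0, n) without a divide:
// multiply by n and keep the high 32 bits of the 64-bit product.
func reduce(rand uint32, n uint32) uint32 {
	return uint32(uint64(rand) * uint64(n) >> 32)
}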

name           old time/op  new time/op  delta
Fastrandn/2-8  4.45ns ± 2%  2.09ns ± 3%  -53.12%  (p=0.000 n=14+14)
Fastrandn/3-8  4.78ns ±11%  2.06ns ± 2%  -56.94%  (p=0.000 n=15+15)
Fastrandn/4-8  4.76ns ± 9%  1.99ns ± 3%  -58.28%  (p=0.000 n=15+13)
Fastrandn/5-8  4.96ns ±13%  2.03ns ± 6%  -59.14%  (p=0.000 n=15+15)

name                    old time/op  new time/op  delta
SelectUncontended-8     33.7ns ± 2%  33.9ns ± 2%  +0.70%  (p=0.000 n=49+50)
SelectSyncContended-8   1.68µs ± 4%  1.65µs ± 4%  -1.54%  (p=0.000 n=50+45)
SelectAsyncContended-8   282ns ± 1%   277ns ± 1%  -1.50%  (p=0.000 n=48+43)
SelectNonblock-8        5.31ns ± 1%  5.32ns ± 1%    ~     (p=0.275 n=45+44)
SelectProdCons-8         585ns ± 3%   577ns ± 2%  -1.35%  (p=0.000 n=50+50)
GoroutineSelect-8       1.59ms ± 2%  1.59ms ± 1%    ~     (p=0.084 n=49+48)

Updates #16213

Change-Id: Ib555a4d7da2042a25c3976f76a436b536487d5b7
Reviewed-on: https://go-review.googlesource.com/36932
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-14 00:01:22 +00:00
Ian Lance Taylor
62237c2c8e runtime: if runtime is stale while testing, show StaleReason
Update #19062.

Change-Id: I7397b573389145b56e73d2150ce0fc9aa75b3caa
Reviewed-on: https://go-review.googlesource.com/36934
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-13 23:46:12 +00:00
Sokolov Yura
663226d8e1 runtime: make fastrand to generate 32bit values
Extend the period of fastrand from (1<<31)-1 to (1<<32)-1 by
choosing a different polynomial and reacting to the high bit before the shift.

The polynomial is taken from https://users.ece.cmu.edu/~koopman/lfsr/index.html
(file 32.dat.gz). It is referred to as F7711115 because that list of
polynomials is for an LFSR that shifts right (and fastrand shifts
left). (The old polynomial is referred to in 31.dat.gz as 7BB88888.)
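
For reference, a left-shifting 32-bit LFSR step has this general shape (the
feedback constant below is a placeholder for illustration, not the
polynomial adopted in this CL):

const feedback = 0x80000057 // placeholder feedback term, illustration only

func lfsrStep(x uint32) uint32 {
	high := x&(1<<31) != 0 // look at the high bit before it is shifted out
	x <<= 1
	if high {
		x ^= feedback
	}
	return x
}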

There were a couple of places that converted fastrand to int, which
led to negative values on 32-bit platforms. They are fixed.

Change-Id: Ibee518a3f9103e0aea220ada494b3aec77babb72
Reviewed-on: https://go-review.googlesource.com/36875
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-13 20:22:02 +00:00
Ian Lance Taylor
40c27ed5bc runtime: if runtime is stale while testing, show cmd/go output
Update #19062.

Change-Id: If6a4c4f8d12e148b162256f13a8ee423f6e30637
Reviewed-on: https://go-review.googlesource.com/36918
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-13 20:00:05 +00:00
Ian Lance Taylor
3792db5183 net: refactor poller into new internal/poll package
This will make it possible to use the poller with the os package.

This is a lot of code movement but the behavior is intended to be
unchanged.

Update #6817.
Update #7903.
Update #15021.
Update #18507.

Change-Id: I1413685928017c32df5654ded73a2643820977ae
Reviewed-on: https://go-review.googlesource.com/36799
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-13 18:36:28 +00:00
Keith Randall
5a75d6a08e cmd/compile: optimize non-empty-interface type conversions
When doing i.(T) for non-empty-interface i and concrete type T,
there's no need to read the type out of the itab. Just compare the
itab to the itab we expect for that interface/type pair.

Also optimize type switches by putting the type hash of the
concrete type in the itab. That way we don't need to load the
type pointer out of the itab.
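
A self-contained analogue of the optimization (illustrative names, not
the compiler-generated code): instead of loading a field out of the
itab and comparing it, compare the itab pointer itself against the
precomputed itab for the (interface, T) pair.

	type typeDesc struct{ name string }
	type itabDesc struct {
		typ  *typeDesc // concrete type
		hash uint32    // copy of the type hash, for type switches
	}

	// Before: a load from the itab plus a compare.
	func assertSlow(tab *itabDesc, want *typeDesc) bool {
		return tab != nil && tab.typ == want
	}

	// After: a single pointer compare against the precomputed itab.
	func assertFast(tab, want *itabDesc) bool {
		return tab == want
	}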

Update #18492

Change-Id: I49e280a21e5687e771db5b8a56b685291ac168ce
Reviewed-on: https://go-review.googlesource.com/34810
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2017-02-13 18:16:31 +00:00
Josh Bleecher Snyder
8da91a6297 runtime: add Frames example
Based on sample code from iant.

Fixes #18788.

Change-Id: I6bb33ed05af2538fbde42ddcac629280ef7c00a6
Reviewed-on: https://go-review.googlesource.com/36892
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-13 06:10:35 +00:00
Russ Cox
45c6f59e1f runtime: use two-level list for semaphore address search in semaRoot
If there are many goroutines contending for two different locks
and both locks hash to the same semaRoot, the scans to find the
goroutines for a particular lock can end up being O(n), making
n lock acquisitions quadratic.

As long as only one actively-used lock hashes to each semaRoot
there's no problem, since the list operations in that case are O(1).
But when the second actively-used lock hits the same semaRoot,
then scans for entries for a given lock have to scan over the
entries for the other lock.

Fix this problem by changing the semaRoot to hold only one sudog
per unique address. In the running example, this drops the length of
that list from O(n) to 2. Then attach other goroutines waiting on the
same address to a separate list headed by the sudog in the semaRoot list.
Those "same address list" operations are still O(1), so now the
example from above works much better.
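
A hedged sketch of the resulting two-level shape (the real sudog and
semaRoot have many more fields; names here are illustrative):

	// import "unsafe"
	type semaRoot struct {
		head *sudog // one sudog per unique semaphore address
	}

	type sudog struct {
		elem     unsafe.Pointer // semaphore address this entry waits on
		next     *sudog         // next *distinct* address in the semaRoot list
		waitlink *sudog         // next goroutine waiting on the same address
	}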

There is still an assumption here that in real programs you don't have
many many goroutines queueing up on many many distinct addresses.
If we end up with that problem, we can replace the top-level list with
a treap.

Fixes #17953.

Change-Id: I78c5b1a5053845275ab31686038aa4f6db5720b2
Reviewed-on: https://go-review.googlesource.com/36792
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-12 15:54:16 +00:00
Josh Bleecher Snyder
2c91bb4c8a cmd/compile: make panicwrap argument-free
When code defines a method on T,
the compiler generates a corresponding wrapper method on *T.
The first thing the wrapper does is check whether
the pointer is nil and if so, call panicwrap.
This is done to provide a useful error message.

The existing implementation gets its information
from arguments set up by the compiler.
However, with some trouble, this information can
be extracted from the name of the wrapper method itself.

Removing the arguments to panicwrap simplifies and
shrinks the wrapper method.
It also means that the call to panicwrap does not
require any stack space.
This enables a further optimization on amd64/x86,
which is to skip the function prologue if nothing
else in the method requires stack space.
This is frequently the case in simple, hot methods,
such as Less and Swap in sort.Interface implementations.
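
Roughly what the generated wrapper looks like after this change, as a
hand-written analogue (the stand-in type and panicwrap are assumptions;
the real wrapper is emitted by the compiler):

	type T []int

	// Stand-in; the real runtime.panicwrap builds the message from its
	// caller's name, e.g. "sort_test.(*intPairs).Less".
	func panicwrap() { panic("value method called using nil pointer") }

	// Hand-written analogue of the generated (*T).Less wrapper.
	func (p *T) Less(i, j int) bool {
		if p == nil {
			panicwrap() // no arguments, so no stack space needed here
		}
		return (*p)[i] < (*p)[j]
	}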

Fixes #19040.

Benchmarks for package sort on amd64:

name                  old time/op  new time/op  delta
SearchWrappers-8       104ns ± 1%   104ns ± 1%    ~     (p=0.286 n=27+27)
SortString1K-8         128µs ± 1%   128µs ± 1%  -0.44%  (p=0.004 n=30+30)
SortString1K_Slice-8   118µs ± 2%   117µs ± 1%    ~     (p=0.106 n=30+30)
StableString1K-8      18.6µs ± 1%  18.6µs ± 1%    ~     (p=0.446 n=28+26)
SortInt1K-8           65.9µs ± 1%  60.7µs ± 1%  -7.96%  (p=0.000 n=28+30)
StableInt1K-8         75.3µs ± 2%  72.8µs ± 1%  -3.41%  (p=0.000 n=30+30)
StableInt1K_Slice-8   57.7µs ± 1%  57.7µs ± 1%    ~     (p=0.515 n=30+30)
SortInt64K-8          6.28ms ± 1%  6.01ms ± 1%  -4.19%  (p=0.000 n=28+28)
SortInt64K_Slice-8    5.04ms ± 1%  5.04ms ± 1%    ~     (p=0.927 n=28+27)
StableInt64K-8        6.65ms ± 1%  6.38ms ± 1%  -3.97%  (p=0.000 n=26+30)
Sort1e2-8             37.9µs ± 1%  37.2µs ± 1%  -1.89%  (p=0.000 n=29+27)
Stable1e2-8           77.0µs ± 1%  74.7µs ± 1%  -3.06%  (p=0.000 n=27+30)
Sort1e4-8             8.21ms ± 2%  7.98ms ± 1%  -2.77%  (p=0.000 n=29+30)
Stable1e4-8           24.8ms ± 1%  24.3ms ± 1%  -2.31%  (p=0.000 n=28+30)
Sort1e6-8              1.27s ± 4%   1.22s ± 1%  -3.42%  (p=0.000 n=30+29)
Stable1e6-8            5.06s ± 1%   4.92s ± 1%  -2.77%  (p=0.000 n=25+29)
[Geo mean]             731µs        714µs       -2.29%

Before/after assembly for sort.(*intPairs).Less follows.
It can be optimized further, but that's for a follow-up CL.

Before:

"".(*intPairs).Less t=1 size=214 args=0x20 locals=0x38
	0x0000 00000 (<autogenerated>:1)	TEXT	"".(*intPairs).Less(SB), $56-32
	0x0000 00000 (<autogenerated>:1)	MOVQ	(TLS), CX
	0x0009 00009 (<autogenerated>:1)	CMPQ	SP, 16(CX)
	0x000d 00013 (<autogenerated>:1)	JLS	204
	0x0013 00019 (<autogenerated>:1)	SUBQ	$56, SP
	0x0017 00023 (<autogenerated>:1)	MOVQ	BP, 48(SP)
	0x001c 00028 (<autogenerated>:1)	LEAQ	48(SP), BP
	0x0021 00033 (<autogenerated>:1)	MOVQ	32(CX), BX
	0x0025 00037 (<autogenerated>:1)	TESTQ	BX, BX
	0x0028 00040 (<autogenerated>:1)	JEQ	55
	0x002a 00042 (<autogenerated>:1)	LEAQ	64(SP), DI
	0x002f 00047 (<autogenerated>:1)	CMPQ	(BX), DI
	0x0032 00050 (<autogenerated>:1)	JNE	55
	0x0034 00052 (<autogenerated>:1)	MOVQ	SP, (BX)
	0x0037 00055 (<autogenerated>:1)	NOP
	0x0037 00055 (<autogenerated>:1)	FUNCDATA	$0, gclocals·4032f753396f2012ad1784f398b170f4(SB)
	0x0037 00055 (<autogenerated>:1)	FUNCDATA	$1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
	0x0037 00055 (<autogenerated>:1)	MOVQ	""..this+64(FP), AX
	0x003c 00060 (<autogenerated>:1)	TESTQ	AX, AX
	0x003f 00063 (<autogenerated>:1)	JEQ	$0, 135
	0x0041 00065 (<autogenerated>:1)	MOVQ	(AX), CX
	0x0044 00068 (<autogenerated>:1)	MOVQ	8(AX), AX
	0x0048 00072 (<autogenerated>:1)	MOVQ	"".i+72(FP), DX
	0x004d 00077 (<autogenerated>:1)	CMPQ	DX, AX
	0x0050 00080 (<autogenerated>:1)	JCC	$0, 128
	0x0052 00082 (<autogenerated>:1)	SHLQ	$4, DX
	0x0056 00086 (<autogenerated>:1)	MOVQ	(CX)(DX*1), DX
	0x005a 00090 (<autogenerated>:1)	MOVQ	"".j+80(FP), BX
	0x005f 00095 (<autogenerated>:1)	CMPQ	BX, AX
	0x0062 00098 (<autogenerated>:1)	JCC	$0, 128
	0x0064 00100 (<autogenerated>:1)	SHLQ	$4, BX
	0x0068 00104 (<autogenerated>:1)	MOVQ	(CX)(BX*1), AX
	0x006c 00108 (<autogenerated>:1)	CMPQ	DX, AX
	0x006f 00111 (<autogenerated>:1)	SETLT	AL
	0x0072 00114 (<autogenerated>:1)	MOVB	AL, "".~r2+88(FP)
	0x0076 00118 (<autogenerated>:1)	MOVQ	48(SP), BP
	0x007b 00123 (<autogenerated>:1)	ADDQ	$56, SP
	0x007f 00127 (<autogenerated>:1)	RET
	0x0080 00128 (<autogenerated>:1)	PCDATA	$0, $1
	0x0080 00128 (<autogenerated>:1)	CALL	runtime.panicindex(SB)
	0x0085 00133 (<autogenerated>:1)	UNDEF
	0x0087 00135 (<autogenerated>:1)	LEAQ	go.string."sort_test"(SB), AX
	0x008e 00142 (<autogenerated>:1)	MOVQ	AX, (SP)
	0x0092 00146 (<autogenerated>:1)	MOVQ	$9, 8(SP)
	0x009b 00155 (<autogenerated>:1)	LEAQ	go.string."intPairs"(SB), AX
	0x00a2 00162 (<autogenerated>:1)	MOVQ	AX, 16(SP)
	0x00a7 00167 (<autogenerated>:1)	MOVQ	$8, 24(SP)
	0x00b0 00176 (<autogenerated>:1)	LEAQ	go.string."Less"(SB), AX
	0x00b7 00183 (<autogenerated>:1)	MOVQ	AX, 32(SP)
	0x00bc 00188 (<autogenerated>:1)	MOVQ	$4, 40(SP)
	0x00c5 00197 (<autogenerated>:1)	PCDATA	$0, $1
	0x00c5 00197 (<autogenerated>:1)	CALL	runtime.panicwrap(SB)
	0x00ca 00202 (<autogenerated>:1)	UNDEF
	0x00cc 00204 (<autogenerated>:1)	NOP
	0x00cc 00204 (<autogenerated>:1)	PCDATA	$0, $-1
	0x00cc 00204 (<autogenerated>:1)	CALL	runtime.morestack_noctxt(SB)
	0x00d1 00209 (<autogenerated>:1)	JMP	0

After:

"".(*intPairs).Swap t=1 size=147 args=0x18 locals=0x8
	0x0000 00000 (<autogenerated>:1)	TEXT	"".(*intPairs).Swap(SB), $8-24
	0x0000 00000 (<autogenerated>:1)	MOVQ	(TLS), CX
	0x0009 00009 (<autogenerated>:1)	SUBQ	$8, SP
	0x000d 00013 (<autogenerated>:1)	MOVQ	BP, (SP)
	0x0011 00017 (<autogenerated>:1)	LEAQ	(SP), BP
	0x0015 00021 (<autogenerated>:1)	MOVQ	32(CX), BX
	0x0019 00025 (<autogenerated>:1)	TESTQ	BX, BX
	0x001c 00028 (<autogenerated>:1)	JEQ	43
	0x001e 00030 (<autogenerated>:1)	LEAQ	16(SP), DI
	0x0023 00035 (<autogenerated>:1)	CMPQ	(BX), DI
	0x0026 00038 (<autogenerated>:1)	JNE	43
	0x0028 00040 (<autogenerated>:1)	MOVQ	SP, (BX)
	0x002b 00043 (<autogenerated>:1)	NOP
	0x002b 00043 (<autogenerated>:1)	FUNCDATA	$0, gclocals·e6397a44f8e1b6e77d0f200b4fba5269(SB)
	0x002b 00043 (<autogenerated>:1)	FUNCDATA	$1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
	0x002b 00043 (<autogenerated>:1)	MOVQ	""..this+16(FP), AX
	0x0030 00048 (<autogenerated>:1)	TESTQ	AX, AX
	0x0033 00051 (<autogenerated>:1)	JEQ	$0, 140
	0x0035 00053 (<autogenerated>:1)	MOVQ	(AX), CX
	0x0038 00056 (<autogenerated>:1)	MOVQ	8(AX), AX
	0x003c 00060 (<autogenerated>:1)	MOVQ	"".i+24(FP), DX
	0x0041 00065 (<autogenerated>:1)	CMPQ	DX, AX
	0x0044 00068 (<autogenerated>:1)	JCC	$0, 133
	0x0046 00070 (<autogenerated>:1)	SHLQ	$4, DX
	0x004a 00074 (<autogenerated>:1)	MOVQ	8(CX)(DX*1), BX
	0x004f 00079 (<autogenerated>:1)	MOVQ	(CX)(DX*1), SI
	0x0053 00083 (<autogenerated>:1)	MOVQ	"".j+32(FP), DI
	0x0058 00088 (<autogenerated>:1)	CMPQ	DI, AX
	0x005b 00091 (<autogenerated>:1)	JCC	$0, 133
	0x005d 00093 (<autogenerated>:1)	SHLQ	$4, DI
	0x0061 00097 (<autogenerated>:1)	MOVQ	8(CX)(DI*1), AX
	0x0066 00102 (<autogenerated>:1)	MOVQ	(CX)(DI*1), R8
	0x006a 00106 (<autogenerated>:1)	MOVQ	R8, (CX)(DX*1)
	0x006e 00110 (<autogenerated>:1)	MOVQ	AX, 8(CX)(DX*1)
	0x0073 00115 (<autogenerated>:1)	MOVQ	SI, (CX)(DI*1)
	0x0077 00119 (<autogenerated>:1)	MOVQ	BX, 8(CX)(DI*1)
	0x007c 00124 (<autogenerated>:1)	MOVQ	(SP), BP
	0x0080 00128 (<autogenerated>:1)	ADDQ	$8, SP
	0x0084 00132 (<autogenerated>:1)	RET
	0x0085 00133 (<autogenerated>:1)	PCDATA	$0, $1
	0x0085 00133 (<autogenerated>:1)	CALL	runtime.panicindex(SB)
	0x008a 00138 (<autogenerated>:1)	UNDEF
	0x008c 00140 (<autogenerated>:1)	PCDATA	$0, $1
	0x008c 00140 (<autogenerated>:1)	CALL	runtime.panicwrap(SB)
	0x0091 00145 (<autogenerated>:1)	UNDEF

Change-Id: I15bb8435f0690badb868799f313ed8817335efd3
Reviewed-on: https://go-review.googlesource.com/36809
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-11 23:27:35 +00:00
Sokolov Yura
d03c124860 runtime: implement fastrand in go
So it could be inlined.

Using bit tricks, it can be implemented without a branch
(improved trick version by Minux Ma).

A simple benchmark shows it is faster on i386 and x86_64, though
I don't know whether it will be faster on other architectures.

benchmark                       old ns/op     new ns/op     delta
BenchmarkFastrand-3             2.79          1.48          -46.95%
BenchmarkFastrandHashiter-3     25.9          24.9          -3.86%

Change-Id: Ie2eb6d0f598c0bb5fac7f6ad0f8b5e3eddaa361b
Reviewed-on: https://go-review.googlesource.com/34782
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-10 19:16:29 +00:00
Brad Fitzpatrick
9f75ecd5e1 runtime/debug: don't run a GC when setting SetGCPercent negative
If the user is calling SetGCPercent(-1), they intend to disable GC.
They probably don't intend to run one. If they do, they can call
runtime.GC themselves.

Change-Id: I40ef40dfc7e15193df9ff26159cd30e56b666f73
Reviewed-on: https://go-review.googlesource.com/34013
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-02-10 18:28:37 +00:00
Heschi Kreinick
2a74b9e814 cmd/trace: Record mark assists in execution traces
During the mark phase of garbage collection, goroutines that allocate
may be recruited to assist. This change creates trace events for mark
assists and displays them similarly to sweep assists in the trace
viewer.

Mark assists are different from sweeps in that they can be preempted, so
displaying them in the trace viewer is a little tricky -- we may need to
synthesize multiple slices for one mark assist. This could have been
done in the parser instead, but I thought it might be preferable to keep
the parser as true to the event stream as possible.

Change-Id: I381dcb1027a187a354b1858537851fa68a620ea7
Reviewed-on: https://go-review.googlesource.com/36015
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2017-02-10 18:03:42 +00:00
Russ Cox
9a7544395a runtime/pprof: merge internal/protopprof into pprof package
These are very tightly coupled, and internal/protopprof is small.
There's no point to having a separate package.

Change-Id: I2c8aa49c9e18a7128657bf2b05323860151b5606
Reviewed-on: https://go-review.googlesource.com/36711
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-10 13:09:19 +00:00
Robert Griesemer
3fd3171c2c cmd/compile/internal/syntax: removed gcCompat code needed to pass orig. tests
The gcCompat mode was introduced to match the new parser's node position
setup exactly with the positions used by the original parser. Some of the
gcCompat adjustments were required to satisfy syntax error test cases,
and the rest were required to make toolstash cmp pass.

This change removes the former gcCompat adjustments and instead adjusts
the respective test cases as necessary. In some cases this makes the error
lines consistent with the ones reported by gccgo.

Where it has changed, the position associated with a given syntactic construct
is the position (line/col number) of the left-most token belonging to the
construct.

Change-Id: I5b60c00c5999a895c4d6d6e9b383c6405ccf725c
Reviewed-on: https://go-review.googlesource.com/36695
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-02-10 01:22:30 +00:00
Ian Lance Taylor
e24228af25 runtime: enable/disable SIGPROF if needed when profiling
This ensures that SIGPROF is handled correctly when using
runtime/pprof in a c-archive or c-shared library.

Separate profiler handling into pre-process changes and per-thread
changes. Simplify the Windows code slightly accordingly.

Fixes #18220.

Change-Id: I5060f7084c91ef0bbe797848978bdc527c312777
Reviewed-on: https://go-review.googlesource.com/34018
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2017-02-09 18:53:34 +00:00
Russ Cox
e4371fb179 time: optimize Now on darwin, windows
Fetch both monotonic and wall time together when possible.
Avoids skew and is cheaper.

Also shave a few ns off in conversion in package time.

Compared to current implementation (after monotonic changes):

name   old time/op  new time/op  delta
Now    19.6ns ± 1%   9.7ns ± 1%  -50.63%  (p=0.000 n=41+49) darwin/amd64
Now    23.5ns ± 4%  10.6ns ± 5%  -54.61%  (p=0.000 n=30+28) windows/amd64
Now    54.5ns ± 5%  29.8ns ± 9%  -45.40%  (p=0.000 n=27+29) windows/386

More importantly, compared to Go 1.8:

name   old time/op  new time/op  delta
Now     9.5ns ± 1%   9.7ns ± 1%   +1.94%  (p=0.000 n=41+49) darwin/amd64
Now    12.9ns ± 5%  10.6ns ± 5%  -17.73%  (p=0.000 n=30+28) windows/amd64
Now    15.3ns ± 5%  29.8ns ± 9%  +94.36%  (p=0.000 n=30+29) windows/386

This brings time.Now back in line with Go 1.8 on darwin/amd64 and windows/amd64.

It's not obvious why windows/386 is still noticeably worse than Go 1.8,
but it's better than before this CL. The windows/386 speed is not too
important; the changes just keep the two architectures similar.

Change-Id: If69b94970c8a1a57910a371ee91e0d4e82e46c5d
Reviewed-on: https://go-review.googlesource.com/36428
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-09 14:45:16 +00:00
Ian Lance Taylor
87ad863f35 runtime: use atomic ops for fwdSig, make sigtable immutable
The fwdSig array is accessed by the signal handler, which may run in
parallel with other threads manipulating it via the os/signal package.
Use atomic accesses to ensure that there are no problems.
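
A sketch of the access pattern, using sync/atomic as a stand-in for the
runtime's internal atomics (sizes and names are illustrative):

	// import "sync/atomic"
	const numSig = 65 // stand-in for the runtime's _NSIG

	var fwdSig [numSig]uintptr // original handlers, read by the signal handler

	func setFwd(sig uint32, handler uintptr) { atomic.StoreUintptr(&fwdSig[sig], handler) }
	func getFwd(sig uint32) uintptr          { return atomic.LoadUintptr(&fwdSig[sig]) }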

Move the _SigHandling flag out of the sigtable array. This makes sigtable
immutable and safe to read from the signal handler.

Change-Id: Icfa407518c4ebe1da38580920ced764898dfc9ad
Reviewed-on: https://go-review.googlesource.com/36321
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-08 04:14:41 +00:00
David Crawshaw
14c2849c3e runtime: update android time_now call
This was broken in https://golang.org/cl/36255

Change-Id: Ib23323a745a650ac51b0ead161076f97efe6d7b7
Reviewed-on: https://go-review.googlesource.com/36543
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-08 02:56:25 +00:00
Sameer Ajmani
38cb9d28a9 runtime/pprof: document that profile names should not contain spaces.
Change-Id: I967d897e812bee63b32bc2a7dcf453861b89b7e3
Reviewed-on: https://go-review.googlesource.com/36533
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-07 22:00:48 +00:00
Jaana Burcu Dogan
6cf7918e73 runtime/pprof: clarify CPU profile's captured during the lifetime of the prog
Fixes #18504.

Change-Id: I3716fc58fc98472eea15ce3617aee3890670c276
Reviewed-on: https://go-review.googlesource.com/36430
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-07 19:46:15 +00:00
Austin Clements
4af6b81d41 runtime: fix confusion between _MaxMem and _MaxArena32
Currently both _MaxMem and _MaxArena32 represent the maximum arena
size on 32-bit hosts (except on MIPS32 where _MaxMem is confusingly
smaller than _MaxArena32).

Clean up sysAlloc so that it always uses _MaxMem, which is the maximum
arena size on both 32- and 64-bit architectures and is the arena size
we allocate auxiliary structures for. This lets us simplify and unify
some code paths and eliminate _MaxArena32.

Fixes #18651. mheap.sysAlloc currently assumes that if the arena is
small, we must be on a 32-bit machine and can therefore grow the arena
to _MaxArena32. This breaks down on darwin/arm64, where _MaxMem is
only 2 GB. As a result, on darwin/arm64, we only reserve spans and
bitmap space for a 2 GB heap, and if the application tries to allocate
beyond that, sysAlloc takes the 32-bit path, tries to grow the arena
beyond 2 GB, and panics when it tries to grow the spans array
allocation past its reserved size. This has probably been a problem
for several releases now, but was only noticed recently because
mapSpans didn't check the bounds on the span reservation until
recently. Most likely it corrupted the bitmap before. By using _MaxMem
consistently, we avoid thinking that we can grow the arena larger than
we have auxiliary structures for.

Change-Id: Ifef28cb746a3ead4b31c1d7348495c2242fef520
Reviewed-on: https://go-review.googlesource.com/35253
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-07 18:39:18 +00:00
Austin Clements
1cc24690b8 runtime: simplify and cleanup mallocinit
mallocinit has evolved organically. Make a pass to clean it up in
various ways:

1. Merge the computation of spansSize and bitmapSize. These were
   computed on every loop iteration of two different loops, but always
   have the same value, which can be derived directly from _MaxMem.
   This also avoids over-reserving these on MIPS, where _MaxArena32 is
   larger than _MaxMem.

2. Remove the ulimit -v logic. It's been disabled for many releases
   and the dead code paths to support it are even more wrong now than
   they were when it was first disabled, since now we *must* reserve
   spans and bitmaps for the full address space.

3. Make it clear that we're using a simple linear allocation to lay
   out the spans, bitmap, and arena spaces. Previously there were a
   lot of redundant pointer computations. Now we just bump p1 up as we
   reserve the spaces (see the sketch below).
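
A hedged sketch of that linear layout, on plain integers rather than
reserved address space (names are illustrative, not the actual
mallocinit code):

	func layout(p, pageSize, spansSize, bitmapSize uintptr) (spansStart, bitmapStart, arenaStart uintptr) {
		p1 := (p + pageSize - 1) &^ (pageSize - 1) // round up to a page (pageSize is a power of two)
		spansStart = p1                            // spans array first
		p1 += spansSize
		bitmapStart = p1 // then the heap bitmap
		p1 += bitmapSize
		arenaStart = p1 // the arena takes the rest
		return
	}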

In preparation for #18651.

Updates #5049 (respect ulimit).

Change-Id: Icbe66570d3a7a17bea227dc54fb3c4978b52a3af
Reviewed-on: https://go-review.googlesource.com/35252
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-07 18:39:15 +00:00
Austin Clements
efb5eae3cf runtime: make _MaxMem an untyped constant
Currently _MaxMem is a uintptr, which is going to complicate some
further changes. Make it untyped so we'll be able to do untyped math
on it before truncating it to a uintptr.

The runtime assembly is identical before and after this change on
{linux,windows}/{amd64,386}.

Updates #18651.

Change-Id: I0f64511faa9e0aa25179a556ab9f185ebf8c9cf8
Reviewed-on: https://go-review.googlesource.com/35251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-02-07 18:39:12 +00:00
Cherry Zhang
bed8129ee6 cmd/internal/obj: remove Follow pass
The Follow pass in the assembler backend reorders and copies
instructions. This even applies to hand-written assembly code,
which in many cases don't want to be reordered. Now that the
SSA compiler does a good job for laying out instructions, the
benefit of this pass is very little:

AMD64: (old = with Follow, new = without Follow)
name                      old time/op    new time/op    delta
BinaryTree17-12              2.78s ± 1%     2.79s ± 1%  +0.44%  (p=0.000 n=20+19)
Fannkuch11-12                3.11s ± 0%     3.31s ± 1%  +6.16%  (p=0.000 n=19+19)
FmtFprintfEmpty-12          50.9ns ± 1%    51.6ns ± 3%  +1.40%  (p=0.000 n=17+20)
FmtFprintfString-12          127ns ± 0%     128ns ± 1%  +0.88%  (p=0.000 n=17+17)
FmtFprintfInt-12             122ns ± 0%     123ns ± 1%  +0.76%  (p=0.000 n=20+19)
FmtFprintfIntInt-12          185ns ± 1%     186ns ± 1%  +0.65%  (p=0.000 n=20+19)
FmtFprintfPrefixedInt-12     192ns ± 1%     202ns ± 1%  +4.99%  (p=0.000 n=20+19)
FmtFprintfFloat-12           284ns ± 0%     288ns ± 0%  +1.33%  (p=0.000 n=15+19)
FmtManyArgs-12               807ns ± 0%     804ns ± 0%  -0.44%  (p=0.000 n=16+18)
GobDecode-12                7.23ms ± 1%    7.21ms ± 1%    ~     (p=0.052 n=20+20)
GobEncode-12                6.09ms ± 1%    6.12ms ± 1%  +0.41%  (p=0.002 n=19+19)
Gzip-12                      253ms ± 1%     255ms ± 1%  +0.95%  (p=0.000 n=18+20)
Gunzip-12                   38.4ms ± 0%    38.5ms ± 0%  +0.34%  (p=0.000 n=17+17)
HTTPClientServer-12         95.4µs ± 2%    96.1µs ± 1%  +0.78%  (p=0.002 n=19+19)
JSONEncode-12               16.5ms ± 1%    16.6ms ± 1%  +1.17%  (p=0.000 n=19+19)
JSONDecode-12               54.6ms ± 1%    55.3ms ± 1%  +1.23%  (p=0.000 n=18+18)
Mandelbrot200-12            4.47ms ± 0%    4.47ms ± 0%  +0.06%  (p=0.000 n=18+18)
GoParse-12                  3.47ms ± 1%    3.47ms ± 1%    ~     (p=0.583 n=20+20)
RegexpMatchEasy0_32-12      84.8ns ± 1%    85.2ns ± 2%  +0.51%  (p=0.022 n=20+20)
RegexpMatchEasy0_1K-12       206ns ± 1%     206ns ± 1%    ~     (p=0.770 n=20+20)
RegexpMatchEasy1_32-12      82.8ns ± 1%    83.4ns ± 1%  +0.64%  (p=0.000 n=20+19)
RegexpMatchEasy1_1K-12       363ns ± 1%     361ns ± 1%  -0.48%  (p=0.007 n=20+20)
RegexpMatchMedium_32-12      126ns ± 1%     126ns ± 0%  +0.72%  (p=0.000 n=20+20)
RegexpMatchMedium_1K-12     39.1µs ± 1%    39.8µs ± 0%  +1.73%  (p=0.000 n=19+19)
RegexpMatchHard_32-12       1.97µs ± 0%    1.98µs ± 1%  +0.29%  (p=0.005 n=18+20)
RegexpMatchHard_1K-12       59.5µs ± 1%    59.8µs ± 1%  +0.36%  (p=0.000 n=18+20)
Revcomp-12                   442ms ± 1%     445ms ± 2%  +0.67%  (p=0.000 n=19+20)
Template-12                 58.0ms ± 1%    57.5ms ± 1%  -0.85%  (p=0.000 n=19+19)
TimeParse-12                 311ns ± 0%     314ns ± 0%  +0.94%  (p=0.000 n=20+18)
TimeFormat-12                350ns ± 3%     346ns ± 0%    ~     (p=0.076 n=20+19)
[Geo mean]                  55.9µs         56.4µs       +0.80%

ARM32:
name                     old time/op    new time/op    delta
BinaryTree17-4              30.4s ± 0%     30.1s ± 0%  -1.14%  (p=0.000 n=10+8)
Fannkuch11-4                13.7s ± 0%     13.6s ± 0%  -0.75%  (p=0.000 n=10+10)
FmtFprintfEmpty-4           664ns ± 1%     651ns ± 1%  -1.96%  (p=0.000 n=7+8)
FmtFprintfString-4         1.83µs ± 2%    1.77µs ± 2%  -3.21%  (p=0.000 n=10+10)
FmtFprintfInt-4            1.57µs ± 2%    1.54µs ± 2%  -2.25%  (p=0.007 n=10+10)
FmtFprintfIntInt-4         2.37µs ± 2%    2.31µs ± 1%  -2.68%  (p=0.000 n=10+10)
FmtFprintfPrefixedInt-4    2.14µs ± 2%    2.10µs ± 1%  -1.83%  (p=0.006 n=10+10)
FmtFprintfFloat-4          3.69µs ± 2%    3.74µs ± 1%  +1.60%  (p=0.000 n=10+10)
FmtManyArgs-4              9.43µs ± 1%    9.17µs ± 1%  -2.70%  (p=0.000 n=10+10)
GobDecode-4                76.3ms ± 1%    75.5ms ± 1%  -1.14%  (p=0.003 n=10+10)
GobEncode-4                70.7ms ± 2%    69.0ms ± 1%  -2.36%  (p=0.000 n=10+10)
Gzip-4                      2.64s ± 1%     2.65s ± 0%  +0.59%  (p=0.002 n=10+10)
Gunzip-4                    402ms ± 0%     398ms ± 0%  -1.11%  (p=0.000 n=10+9)
HTTPClientServer-4          458µs ± 0%     457µs ± 0%    ~     (p=0.247 n=10+10)
JSONEncode-4                171ms ± 0%     172ms ± 0%  +0.56%  (p=0.000 n=10+10)
JSONDecode-4                672ms ± 1%     668ms ± 1%    ~     (p=0.105 n=10+10)
Mandelbrot200-4            33.5ms ± 0%    33.5ms ± 0%    ~     (p=0.156 n=9+10)
GoParse-4                  33.9ms ± 0%    34.0ms ± 0%  +0.36%  (p=0.031 n=9+9)
RegexpMatchEasy0_32-4       823ns ± 1%     835ns ± 1%  +1.49%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-4      3.99µs ± 0%    4.02µs ± 1%  +0.92%  (p=0.000 n=8+10)
RegexpMatchEasy1_32-4       877ns ± 3%     904ns ± 2%  +3.07%  (p=0.012 n=10+10)
RegexpMatchEasy1_1K-4      5.99µs ± 0%    5.97µs ± 1%  -0.38%  (p=0.023 n=8+8)
RegexpMatchMedium_32-4     1.40µs ± 2%    1.40µs ± 2%    ~     (p=0.590 n=10+9)
RegexpMatchMedium_1K-4      357µs ± 0%     355µs ± 1%  -0.72%  (p=0.000 n=7+8)
RegexpMatchHard_32-4       22.3µs ± 0%    22.1µs ± 0%  -0.49%  (p=0.000 n=8+7)
RegexpMatchHard_1K-4        661µs ± 0%     658µs ± 0%  -0.42%  (p=0.000 n=8+7)
Revcomp-4                  46.3ms ± 0%    46.3ms ± 0%    ~     (p=0.393 n=10+10)
Template-4                  753ms ± 1%     750ms ± 0%    ~     (p=0.211 n=10+9)
TimeParse-4                4.28µs ± 1%    4.22µs ± 1%  -1.34%  (p=0.000 n=8+10)
TimeFormat-4               9.00µs ± 0%    9.05µs ± 0%  +0.59%  (p=0.000 n=10+10)
[Geo mean]                  538µs          535µs       -0.55%

ARM64:
name                     old time/op    new time/op    delta
BinaryTree17-8              8.39s ± 0%     8.39s ± 0%    ~     (p=0.684 n=10+10)
Fannkuch11-8                5.95s ± 0%     5.99s ± 0%  +0.63%  (p=0.000 n=10+10)
FmtFprintfEmpty-8           116ns ± 0%     116ns ± 0%    ~     (all equal)
FmtFprintfString-8          361ns ± 0%     360ns ± 0%  -0.31%  (p=0.003 n=8+6)
FmtFprintfInt-8             290ns ± 0%     290ns ± 0%    ~     (p=0.620 n=9+9)
FmtFprintfIntInt-8          476ns ± 1%     469ns ± 0%  -1.47%  (p=0.000 n=10+6)
FmtFprintfPrefixedInt-8     412ns ± 2%     417ns ± 2%  +1.39%  (p=0.006 n=9+10)
FmtFprintfFloat-8           652ns ± 1%     652ns ± 0%    ~     (p=0.161 n=10+8)
FmtManyArgs-8              1.94µs ± 0%    1.94µs ± 2%    ~     (p=0.781 n=10+10)
GobDecode-8                17.7ms ± 1%    17.7ms ± 0%    ~     (p=0.962 n=10+7)
GobEncode-8                15.6ms ± 0%    15.6ms ± 1%    ~     (p=0.063 n=10+10)
Gzip-8                      786ms ± 0%     787ms ± 0%    ~     (p=0.356 n=10+9)
Gunzip-8                    127ms ± 0%     127ms ± 0%  +0.08%  (p=0.028 n=10+9)
HTTPClientServer-8          198µs ± 6%     198µs ± 7%    ~     (p=0.796 n=10+10)
JSONEncode-8               42.5ms ± 0%    42.2ms ± 0%  -0.73%  (p=0.000 n=9+8)
JSONDecode-8                158ms ± 1%     162ms ± 0%  +2.28%  (p=0.000 n=10+9)
Mandelbrot200-8            10.1ms ± 0%    10.1ms ± 0%  -0.01%  (p=0.000 n=10+9)
GoParse-8                  8.54ms ± 1%    8.63ms ± 1%  +1.06%  (p=0.000 n=10+9)
RegexpMatchEasy0_32-8       231ns ± 1%     225ns ± 0%  -2.52%  (p=0.000 n=9+10)
RegexpMatchEasy0_1K-8      1.63µs ± 0%    1.63µs ± 0%    ~     (p=0.170 n=10+10)
RegexpMatchEasy1_32-8       253ns ± 0%     249ns ± 0%  -1.41%  (p=0.000 n=9+10)
RegexpMatchEasy1_1K-8      2.08µs ± 0%    2.08µs ± 0%  -0.32%  (p=0.000 n=9+10)
RegexpMatchMedium_32-8      355ns ± 1%     351ns ± 0%  -1.04%  (p=0.007 n=10+7)
RegexpMatchMedium_1K-8      104µs ± 0%     104µs ± 0%    ~     (p=0.148 n=10+10)
RegexpMatchHard_32-8       5.79µs ± 0%    5.79µs ± 0%    ~     (p=0.578 n=10+10)
RegexpMatchHard_1K-8        176µs ± 0%     176µs ± 0%    ~     (p=0.137 n=10+10)
Revcomp-8                   1.37s ± 1%     1.36s ± 1%  -0.26%  (p=0.023 n=10+10)
Template-8                  151ms ± 1%     154ms ± 1%  +2.14%  (p=0.000 n=9+10)
TimeParse-8                 723ns ± 2%     721ns ± 1%    ~     (p=0.592 n=10+10)
TimeFormat-8                804ns ± 2%     798ns ± 3%    ~     (p=0.344 n=10+10)
[Geo mean]                  154µs          154µs       -0.02%

Therefore remove this pass. This also reduces text size by 0.5~2%.

Comment out some dead code in runtime/sys_nacl_amd64p32.s
which contains undefined symbols.

Change-Id: I1473986fe5b18b3d2554ce96cdc6f0999b8d955d
Reviewed-on: https://go-review.googlesource.com/36205
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-07 15:00:48 +00:00
Michael Matloob
cbef450df7 runtime/pprof: symbolize proto profiles
When generating pprof profiles in proto format, symbolize the profiles.

Change-Id: I2471ed7f919483e5828868306418a63e41aff5c5
Reviewed-on: https://go-review.googlesource.com/34192
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-07 14:03:13 +00:00
Michael Matloob
62956897c1 runtime: add definitions for SetGoroutineLabels and Do
This change defines runtime/pprof.SetGoroutineLabels and runtime/pprof.Do, which
are used to set profiler labels on goroutines. The change defines functions
in the runtime for setting and getting profile labels, and sets and unsets
profile labels when goroutines are created and deleted. The change also adds
the package runtime/internal/proflabel, which defines the structure the runtime
uses to store profile labels.
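
A usage sketch, assuming the shape of the API as it later shipped in
Go 1.9 (pprof.Labels and pprof.Do; names at this point in the tree may
differ):

	// import ("context"; "runtime/pprof")
	func purge(ctx context.Context, doWork func(context.Context)) {
		pprof.Do(ctx, pprof.Labels("worker", "purge"), func(ctx context.Context) {
			doWork(ctx) // samples taken in here carry the worker=purge label
		})
	}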

Change-Id: I747a4400141f89b6e8160dab6aa94ca9f0d4c94d
Reviewed-on: https://go-review.googlesource.com/34198
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-on: https://go-review.googlesource.com/35010
2017-02-06 20:29:37 +00:00
Michael Matloob
91ad2a2194 runtime/pprof: add definitions of profile label types
This change defines WithLabels, Labels, Label, and ForLabels.
This is the first step of the profile labels implemention for go 1.9.

Updates #17280

Change-Id: I2dfc9aae90f7a4aa1ff7080d5747f0a1f0728e75
Reviewed-on: https://go-review.googlesource.com/34198
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-06 15:43:06 +00:00
Ian Lance Taylor
6aee6b895c runtime: remove markBits.clearMarkedNonAtomic
It's not used, it's never been used, and it doesn't do what its doc
comment says it does.

Fixes #18941.

Change-Id: Ia89d97fb87525f5b861d7701f919e0d6b7cbd376
Reviewed-on: https://go-review.googlesource.com/36322
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-06 04:45:55 +00:00
Elias Naur
78074f6850 runtime: handle SIGPIPE in c-archive and c-shared programs
Before this CL, Go programs in c-archive or c-shared buildmodes
would not handle SIGPIPE. That leads to surprising behaviour where
writes on a closed pipe or socket would raise SIGPIPE and terminate
the program. This CL changes the Go runtime to handle
SIGPIPE regardless of buildmode. In addition, SIGPIPE from non-Go
code is forwarded.

This is a refinement of CL 32796 that fixes the case where a non-default
handler for SIGPIPE is installed by the host C program.

Fixes #17393

Change-Id: Ia41186e52c1ac209d0a594bae9904166ae7df7de
Reviewed-on: https://go-review.googlesource.com/35960
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-03 20:07:36 +00:00
Russ Cox
0e3355903d time: record monotonic clock reading in time.Now, for more accurate comparisons
See https://golang.org/design/12914-monotonic for details.
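
A small example of what the monotonic reading buys, per the design doc:

	// import "time"
	func timeIt(f func()) time.Duration {
		start := time.Now() // carries a wall clock and a monotonic reading
		f()
		// Since/Sub use the monotonic reading, so the result is correct
		// even if the wall clock is stepped in between.
		return time.Since(start)
	}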

Fixes #12914.

Change-Id: I80edc2e6c012b4ace7161c84cf067d444381a009
Reviewed-on: https://go-review.googlesource.com/36255
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Caleb Spare <cespare@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-03 19:04:52 +00:00
Keith Randall
69e1634985 runtime: darwin/amd64, don't depend on outarg slots being unmodified
sigtramp was calling sigtrampgo and depending on the fact that
the 3rd argument slot will not be modified on return.  Our calling
convention doesn't guarantee that.  Avoid that assumption.

There's no actual bug here, as sigtrampgo does not in fact modify its
argument slots.  But I found this while working on the dead stack slot
clobbering tool.  https://go-review.googlesource.com/c/23924/

Change-Id: Ia7e791a2b4c1c74fff24cba8169e7840b4b06ffc
Reviewed-on: https://go-review.googlesource.com/36216
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-02-03 05:08:54 +00:00
Cherry Zhang
f69a6defd1 runtime: skip flaky TestGdbPythonCgo on MIPS
It seems the problem is on gdb and the dynamic linker. Skip the
test for now until we figure out what's going on with the system.

Updates #18784.

Change-Id: Ic9320ffd463f6c231b2c4192652263b1cf7f4231
Reviewed-on: https://go-review.googlesource.com/36250
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-02 21:45:42 +00:00
Lars Wiegman
e546b295b8 runtime: use mach_absolute_time for runtime.nanotime
The existing darwin/amd64 implementation of runtime.nanotime returns the
wallclock time, which results in timers not functioning properly when
system time runs backwards. By implementing the algorithm used by the
darwin syscall mach_absolute_time, timers will function as expected.

The algorithm is described at
https://opensource.apple.com/source/xnu/xnu-3248.60.10/libsyscall/wrappers/mach_absolute_time.s

Fixes #17610

Change-Id: I9c8d35240d48249a6837dca1111b1406e2686f67
Reviewed-on: https://go-review.googlesource.com/35292
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-02 21:20:40 +00:00
Josh Bleecher Snyder
0358367576 cmd/compile, runtime: convert byte-sized values to interfaces without allocation
Based in part on khr's CL 2500.

Updates #17725
Updates #18121

Change-Id: I744e1f92fc2104e6c5bd883a898c30b2eea8cc31
Reviewed-on: https://go-review.googlesource.com/35555
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-02 21:04:34 +00:00
Russ Cox
c47df7ae17 all: merge dev.typealias into master
For #18130.

f8b4123613 [dev.typealias] spec: use term 'embedded field' rather than 'anonymous field'
9ecc3ee252 [dev.typealias] cmd/compile: avoid false positive cycles from type aliases
49b7af8a30 [dev.typealias] reflect: add test for type aliases
9bbb07ddec [dev.typealias] cmd/compile, reflect: fix struct field names for embedded byte, rune
43c7094386 [dev.typealias] reflect: fix StructOf use of StructField to match StructField docs
9657e0b077 [dev.typealias] cmd/doc: update for type alias
de2e5459ae [dev.typealias] cmd/compile: declare methods after resolving receiver type
9259f3073a [dev.typealias] test: match gccgo error messages on alias2.go
5d92916770 [dev.typealias] cmd/compile: change Func.Shortname to *Sym
a7c884efc1 [dev.typealias] go/internal/gccgoimporter: support for type aliases
5802cfd900 [dev.typealias] cmd/compile: export/import test cases for type aliases
d7cabd40dd [dev.typealias] go/types: clarified doc string
cc2dcce3d7 [dev.typealias] cmd/compile: a few better comments related to alias types
5c160b28ba [dev.typealias] cmd/compile: improved error message for cyles involving type aliases
b2386dffa1 [dev.typealias] cmd/compile: type-check type alias declarations
ac8421f9a5 [dev.typealias] cmd/compile: various minor cleanups
f011e0c6c3 [dev.typealias] cmd/compile, go/types, go/importer: various alias related fixes
49de5f0351 [dev.typealias] cmd/compile, go/importer: define export format and implement importing of type aliases
5ceec42dc0 [dev.typealias] go/types: export TypeName.IsAlias so clients can use it
aa1f0681bc [dev.typealias] go/types: improved Object printing
c80748e389 [dev.typealias] go/types: remove some more vestiges of prior alias implementation
80d8b69e95 [dev.typealias] go/types: implement type aliases
a917097b5e [dev.typealias] go/build: add go1.9 build tag
3e11940437 [dev.typealias] cmd/compile: recognize type aliases but complain for now (not yet supported)
e0a05c274a [dev.typealias] cmd/gofmt: added test cases for alias type declarations
2e5116bd99 [dev.typealias] go/ast, go/parser, go/printer, go/types: initial type alias support

Change-Id: Ia65f2e011fd7195f18e1dce67d4d49b80a261203
2017-01-31 13:01:31 -05:00
Ian Lance Taylor
0949659952 runtime: add explicit (void) in C to avoid GCC 7 problem
This avoids errors like
    ./traceback.go:80:2: call of non-function C.f1

I filed https://gcc.gnu.org/PR79289 for the GCC problem. I think this
is a bug in GCC, and it may be fixed before the final GCC 7 release.
This CL is correct either way.

Fixes #18855.

Change-Id: I0785a7b7c5b1d0ca87b454b5eca9079f390fcbd4
Reviewed-on: https://go-review.googlesource.com/35919
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-01-30 19:27:49 +00:00
David Crawshaw
b531eb3062 runtime: reorder modules so main.main comes first
Modules appear in the moduledata linked list in the order they are
loaded by the dynamic loader, with one exception: firstmoduledata is
the module that contains the runtime, which is not always the first
module loaded (when using -buildmode=shared, it is typically libstd.so,
the second module).

The order matters for typelinksinit, so we swap the first module
with whatever module contains the main function.

Updates #18729

This fixes the test case extracted with -linkshared, and now

	go test -linkshared encoding/...

passes. However the original issue about a plugin failure is not
yet fixed.

Change-Id: I9f399ecc3518e22e6b0a350358e90b0baa44ac96
Reviewed-on: https://go-review.googlesource.com/35644
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-01-25 22:33:57 +00:00
Russ Cox
9bbb07ddec [dev.typealias] cmd/compile, reflect: fix struct field names for embedded byte, rune
Will also fix type aliases.

Fixes #17766.
For #18130.

Change-Id: I9e1584d47128782152e06abd0a30ef423d5c30d2
Reviewed-on: https://go-review.googlesource.com/35732
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-01-25 18:57:20 +00:00
Ian Lance Taylor
aad06da2b9 cmd/link: mark DWARF function symbols as reachable
Otherwise we don't emit any required ELF relocations when doing an
external link, because elfrelocsect skips unreachable symbols.

Fixes #18745.

Change-Id: Ia3583c41bb6c5ebb7579abd26ed8689370311cd6
Reviewed-on: https://go-review.googlesource.com/35590
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-01-24 03:37:56 +00:00
Keith Randall
a96e117a58 runtime: amd64, use 4-byte ops for memmove of 4 bytes
memmove used to use two 2-byte load/store pairs to move 4 bytes.
When the result was then loaded with a single 4-byte load, it caused
a store-to-load forwarding stall. To avoid the stall, special-case
memmove to use 4-byte ops for the 4-byte copy case.
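
Roughly the pattern from the issue 18740 benchmark (an illustrative
reconstruction, not the benchmark itself):

	// import "encoding/binary"
	func move4(dst, src []byte) uint32 {
		copy(dst[:4], src[:4]) // 4-byte copy: now one 4-byte store on amd64
		// The immediate 4-byte reload below is what stalled on
		// store-to-load forwarding when memmove used two 2-byte stores.
		return binary.LittleEndian.Uint32(dst)
	}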

We already have a special case for 8-byte copies.
386 already specializes 4-byte copies.
I'll do 2-byte copies also, but not for 1.8.

benchmark                 old ns/op     new ns/op     delta
BenchmarkIssue18740-8     7567          4799          -36.58%

3-byte copies get a bit slower.  Other copies are unchanged.
name         old time/op   new time/op   delta
Memmove/3-8   4.76ns ± 5%   5.26ns ± 3%  +10.50%  (p=0.000 n=10+10)

Fixes #18740

Change-Id: Iec82cbac0ecfee80fa3c8fc83828f9a1819c3c74
Reviewed-on: https://go-review.googlesource.com/35567
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-01-23 19:39:22 +00:00
Bryan C. Mills
ea7d9e6a52 runtime: check for nil g and m in msanread
Fixes #18707.

Change-Id: Ibc4efef01197799f66d10bfead22faf8ac00473c
Reviewed-on: https://go-review.googlesource.com/35452
Run-TryBot: Bryan Mills <bcmills@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-01-19 23:06:54 +00:00
Austin Clements
c1730ae424 runtime: force workers out before checking mark roots
Currently we check that all roots are marked as soon as gcMarkDone
decides to transition from mark 1 to mark 2. However, issue #16083
indicates that there may be a race where we try to complete mark 1
while a worker is still scanning a stack, causing the root mark check
to fail.

We don't yet understand this race, but as a simple mitigation, move
the root check to after gcMarkDone performs a ragged barrier, which
will force any remaining workers to finish their current job.

Updates #16083. This may "fix" it, but it would be better to
understand and fix the underlying race.

Change-Id: I1af9ce67bd87ade7bc2a067295d79c28cd11abd2
Reviewed-on: https://go-review.googlesource.com/35353
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-01-18 15:40:33 +00:00
Keith Randall
81a61a96c9 runtime: for plugins, don't add duplicate itabs
We already do this for shared libraries. Do it for plugins also.
Suggestions on how to test this would be welcome.

I'd like to get this in for 1.8.  It could lead to mysterious
hangs when using plugins.

Fixes #18676

Change-Id: I03209b096149090b9ba171c834c5e59087ed0f92
Reviewed-on: https://go-review.googlesource.com/35117
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2017-01-17 22:37:19 +00:00
Bryan C. Mills
fdde7ba2a2 runtime: avoid clobbering C callee-save register in cgoSigtramp
Use R11 (a caller-saved temp register) instead of RBX (a callee-saved
register).

I believe this only affects linux/amd64, since it is the only platform
with a non-trivial cgoSigtramp implementation.

Updates #18328.

Change-Id: I3d35c4512624184d5a8ece653fa09ddf50e079a2
Reviewed-on: https://go-review.googlesource.com/35068
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-12 00:06:32 +00:00
Austin Clements
2817e77024 runtime: debug prints for spanBytesAlloc underflow
Updates #18043.

Change-Id: I24e687fdd5521c48b672987f15f0d5de9f308884
Reviewed-on: https://go-review.googlesource.com/34612
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-10 15:59:39 +00:00
David Chase
7f1ff65c39 cmd/compile: insert scheduling checks on loop backedges
Loop breaking with a counter.  Benchmarked (see comments),
eyeball checked for sanity on popular loops.  This code
ought to handle loops in general, and properly inserts phi
functions in cases where the earlier version might not have.

Includes a test, plus modifications to test/run.go to deal with
timing out and killing a looping test. Tests that would be broken by
the added code (branch frequency and live vars) for the new checks
turn the check insertion off.

If GOEXPERIMENT=preemptibleloops is set, the compiler inserts reschedule
checks on every backedge of every reducible loop. Alternatively,
specifying GO_GCFLAGS=-d=ssa/insert_resched_checks/on will
enable it for a single compilation, but because the core Go
libraries contain some loops that may run long, this is less
likely to have the desired effect.
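
For illustration, a reducible loop whose backedge would get a
counter-based reschedule check when the experiment or flag above is
enabled (ordinary builds are unchanged):

	func sum(a []int) (s int) {
		for _, v := range a { // check inserted on this backedge
			s += v
		}
		return
	}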

This is intended as a tool to help in the study and diagnosis
of GC and other latency problems, now that goal STW GC latency
is on the order of 100 microseconds or less.

Updates #17831.
Updates #10958.

Change-Id: I6206c163a5b0248e3f21eb4fc65f73a179e1f639
Reviewed-on: https://go-review.googlesource.com/33910
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-01-09 21:01:29 +00:00
Austin Clements
ffedff7e50 runtime: add table of size classes in a comment
Change-Id: I52fae67c9aeceaa23e70f2ef0468745b354f8c75
Reviewed-on: https://go-review.googlesource.com/34932
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-01-08 00:01:30 +00:00
shawnps
067bab00a8 all: fix misspellings
Change-Id: I429637ca91f7db4144f17621de851a548dc1ce76
Reviewed-on: https://go-review.googlesource.com/34923
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-07 16:53:25 +00:00
Russ Cox
b902a63ade runtime: fix corruption crash/race between select and stack growth
To implement the blocking of a select, a goroutine builds a list of
offers to communicate (pseudo-g's, aka sudog), one for each case,
queues them on the corresponding channels, and waits for another
goroutine to complete one of those cases and wake it up. Obviously it
is not OK for two other goroutines to complete multiple cases and both
wake the goroutine blocked in select. To make sure that only one
branch of the select is chosen, all the sudogs contain a pointer to a
shared (single) 'done uint32', which is atomically cas'ed by any
interested goroutines. The goroutine that wins the cas race gets to
wake up the select. A complication is that 'done uint32' is stored on
the stack of the goroutine running the select, and that stack can move
during the select due to stack growth or stack shrinking.

The relevant ordering to block and unblock in select is:

	1. Lock all channels.
	2. Create list of sudogs and queue sudogs on all channels.
	3. Switch to system stack, mark goroutine as asleep,
	   unlock all channels.
	4. Sleep until woken.
	5. Wake up on goroutine stack.
	6. Lock all channels.
	7. Dequeue sudogs from all channels.
	8. Free list of sudogs.
	9. Unlock all channels.

There are two kinds of stack moves: stack growth and stack shrinking.
Stack growth happens while the original goroutine is running.
Stack shrinking happens asynchronously, during garbage collection.

While a channel listing a sudog is locked by select in this process,
no other goroutine can attempt to complete communication on that
channel, because that other goroutine doesn't hold the lock and can't
find the sudog. If the stack moves while all the channel locks are
held or when the sudogs are not yet or no longer queued in the
channels, no problem, because no goroutine can get to the sudogs and
therefore to selectdone. We only need to worry about the stack (and
'done uint32') moving with the sudogs queued in unlocked channels.

Stack shrinking can happen any time the goroutine is stopped.
That code already acquires all the channel locks before doing the
stack move, so it avoids this problem.

Stack growth can happen essentially any time the original goroutine is
running on its own stack (not the system stack). In the first half of
the select, all the channels are locked before any sudogs are queued,
and the channels are not unlocked until the goroutine has stopped
executing on its own stack and is asleep, so that part is OK. In the
second half of the select, the goroutine wakes up on its own goroutine
stack and immediately locks all channels. But the actual call to lock
might grow the stack, before acquiring any locks. In that case, the
stack is moving with the sudogs queued in unlocked channels. Not good.
One goroutine has already won a cas on the old stack (that goroutine
woke up the selecting goroutine, moving it out of step 4), and the
fact that done = 1 now should prevent any other goroutines from
completing any other select cases. During the stack move, however,
sudog.selectdone is moved from pointing to the old done variable on
the old stack to a new memory location on the new stack. Another
goroutine might observe the moved pointer before the new memory
location has been initialized. If the new memory word happens to be
zero, that goroutine might win a cas on the new location, thinking it
can now complete the select (again). It will then complete a second
communication (reading from or writing to the goroutine stack
incorrectly) and then attempt to wake up the selecting goroutine,
which is already awake.

The scribbling over the goroutine stack unexpectedly is already bad,
but likely to go unnoticed, at least immediately. As for the second
wakeup, there are a variety of ways it might play out.

* The goroutine might not be asleep.
That will produce a runtime crash (throw) like in #17007:

	runtime: gp: gp=0xc0422dcb60, goid=2299, gp->atomicstatus=8
	runtime:  g:  g=0xa5cfe0, goid=0,  g->atomicstatus=0
	fatal error: bad g->status in ready

Here, atomicstatus=8 is copystack; the second, incorrect wakeup is
observing that the selecting goroutine is in state "Gcopystack"
instead of "Gwaiting".

* The goroutine might be sleeping in a send on a nil chan.
If it wakes up, it will crash with 'fatal error: unreachable'.

* The goroutine might be sleeping in a send on a non-nil chan.
If it wakes up, it will crash with 'fatal error: chansend:
spurious wakeup'.

* The goroutine might be sleeping in a receive on a nil chan.
If it wakes up, it will crash with 'fatal error: unreachable'.

* The goroutine might be sleeping in a receive on a non-nil chan.
If it wakes up, it will silently (incorrectly!) continue as if it
received a zero value from a closed channel, leaving a sudog queued on
the channel pointing at that zero value on the goroutine's stack; that
space will be reused as the goroutine executes, and when some other
goroutine finally completes the receive, it will do a stray write into
the goroutine's stack memory, which may cause problems. Then it will
attempt the real wakeup of the goroutine, leading recursively to any
of the cases in this list.

* The goroutine might have been running a select in a finalizer
(I hope not!) and might now be sleeping waiting for more things to
finalize. If it wakes up, as long as it goes back to sleep quickly
(before the real GC code tries to wake it), the spurious wakeup does
no harm (but the stack was still scribbled on).

* The goroutine might be sleeping in gcParkAssist.
If it wakes up, that will let the goroutine continue executing a bit
earlier than we would have liked. Eventually the GC will attempt the
real wakeup of the goroutine, leading recursively to any of the cases
in this list.

* The goroutine cannot be sleeping in bgsweep, because the background
sweepers never use select.

* The goroutine might be sleeping in netpollblock.
If it wakes up, it will crash with 'fatal error: netpollblock:
corrupted state'.

* The goroutine might be sleeping in main as another thread crashes.
If it wakes up, it will exit(0) instead of letting the other thread
crash with a non-zero exit status.

* The goroutine cannot be sleeping in forcegchelper,
because forcegchelper never uses select.

* The goroutine might be sleeping in an empty select - select {}.
If it wakes up, it will return to the next line in the program!

* The goroutine might be sleeping in a non-empty select (again).
In this case, it will wake up spuriously, with gp.param == nil (no
reason for wakeup), but that was fortuitously overloaded for handling
wakeup due to a closing channel and the way it is handled is to rerun
the select, which (accidentally) handles the spurious wakeup
correctly:

	if cas == nil {
		// This can happen if we were woken up by a close().
		// TODO: figure that out explicitly so we don't need this loop.
		goto loop
	}

Before looping, it will dequeue all the sudogs on all the channels
involved, so that no other goroutine will attempt to wake it.
Since the goroutine was blocked in select before, being blocked in
select again when the spurious wakeup arrives may be quite likely.
In this case, the spurious wakeup does no harm (but the stack was
still scribbled on).

* The goroutine might be sleeping in semacquire (mutex slow path).
If it wakes up, that is taken as a signal to try for the semaphore
again, not a signal that the semaphore is now held, but the next
iteration around the loop will queue the sudog a second time, causing
a cycle in the wakeup list for the given address. If that sudog is the
only one in the list, when it is eventually dequeued, it will
(due to the precise way the code is written) leave the sudog on the
queue inactive with the sudog broken. But the sudog will also be in
the free list, and that will eventually cause confusion.

* The goroutine might be sleeping in notifyListWait, for sync.Cond.
If it wakes up, (*Cond).Wait returns. The docs say "Unlike in other
systems, Wait cannot return unless awoken by Broadcast or Signal,"
so the spurious wakeup is incorrect behavior, but most callers do not
depend on that fact. Eventually the condition will happen, attempting
the real wakeup of the goroutine and leading recursively to any of the
cases in this list.

* The goroutine might be sleeping in timeSleep aka time.Sleep.
If it wakes up, it will continue running, leaving a timer ticking.
When that time bomb goes off, it will try to ready the goroutine
again, leading to any one of the cases in this list.

* The goroutine cannot be sleeping in timerproc,
because timerproc never uses select.

* The goroutine might be sleeping in ReadTrace.
If it wakes up, it will print 'runtime: spurious wakeup of trace
reader' and return nil. All future calls to ReadTrace will print
'runtime: ReadTrace called from multiple goroutines simultaneously'.
Eventually, when trace data is available, a true wakeup will be
attempted, leading to any one of the cases in this list.

None of these fatal errors appear in any of the trybot or dashboard
logs. The 'bad g->status in ready' that happens if the goroutine is
running (the most likely scenario anyway) has happened once on the
dashboard and eight times in trybot logs. Of the eight, five were
atomicstatus=8 during net/http tests, so almost certainly this bug.
The other three were atomicstatus=2, all near code in select,
but in a draft CL by Dmitry that was rewriting select and may or may
not have had its own bugs.

This bug has existed since Go 1.4. Until then the select code was
implemented in C, 'done uint32' was a C stack variable 'uint32 done',
and C stacks never moved. I believe it has become more common recently
because of Brad's work to run more and more tests in net/http in
parallel, which lengthens race windows.

The fix is to run step 6 on the system stack,
avoiding any possibility of stack growth.

Fixes #17007 and possibly other mysterious failures.

Change-Id: I9d6575a51ac96ae9d67ec24da670426a4a45a317
Reviewed-on: https://go-review.googlesource.com/34835
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-01-06 19:19:35 +00:00
Austin Clements
5dd978a283 runtime: expand HACKING.md
This adds high-level descriptions of the scheduler structures, the
user and system stacks, error handling, and synchronization.

Change-Id: I1eed97c6dd4a6e3d351279e967b11c6e64898356
Reviewed-on: https://go-review.googlesource.com/34290
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-01-06 18:30:36 +00:00
Austin Clements
618c291544 runtime: update big mgc.go comment
The comment describing the overall GC algorithm at the top of mgc.go
has gotten woefully out-of-date (and was possibly never
correct/complete). Update it to reflect the current workings of the
GC and the set of phases that we now divide it into.

Change-Id: I02143c0ebefe9d4cd7753349dab8045f0973bf95
Reviewed-on: https://go-review.googlesource.com/34711
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-01-06 18:22:35 +00:00
Austin Clements
7aefdfded0 runtime: use 4K as the boundary of legal pointers
Currently, the check for legal pointers in stack copying uses
_PageSize (8K) as the minimum legal pointer. By default, Linux won't
let you map under 64K, but

1) it's less clear what other OSes allow or will allow in the future;

2) while mapping the first page is a terrible idea, mapping anywhere
above that is arguably more justifiable;

3) the compiler only assumes the first physical page (4K) is never
mapped.

Make the runtime consistent with the compiler and more robust by
changing the bad pointer check to use 4K as the minimum legal pointer.

This came out of discussions on CLs 34663 and 34719.

Change-Id: Idf721a788bd9699fb348f47bdd083cf8fa8bd3e5
Reviewed-on: https://go-review.googlesource.com/34890
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-01-06 16:19:14 +00:00
Lion Yang
a2b615d527 crypto: detect BMI usability on AMD64 for sha1 and sha256
The existing AMD64 implementations only detect AVX2 usability, even
though the same code paths also contain BMI (bit-manipulation)
instructions. Those instructions crash the running program as 'unknown
instructions' on CPUs that support AVX2 but not BMI, e.g. the
i3-4000M.

This change adds detection of BMI1 and BMI2 to the AMD64 runtime,
recording the results in two flags, `support_bmi1` and `support_bmi2`,
in runtime/runtime2.go. It also extends the condition for running the
AVX2 version in the crypto/sha1 and crypto/sha256 packages.
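
For illustration, the same gating idea can be written in user code
with the golang.org/x/sys/cpu package (an external package, not what
this CL uses; the CL reads the runtime-internal flags). A sketch:

	package main

	import (
		"fmt"

		"golang.org/x/sys/cpu"
	)

	func main() {
		// The AVX2 assembly in question also uses BMI instructions,
		// so the fast path must be gated on all of the relevant
		// feature bits, not on AVX2 alone.
		useAVX2 := cpu.X86.HasAVX2 && cpu.X86.HasBMI1 && cpu.X86.HasBMI2
		fmt.Println("AVX2+BMI path enabled:", useAVX2)
	}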

Fixes #18512

Change-Id: I917bf0de365237740999de3e049d2e8f2a4385ad
Reviewed-on: https://go-review.googlesource.com/34850
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-05 15:37:37 +00:00
Michael Marineau
6a1cac2700 runtime: check sched_getaffinity return value
Android on ChromeOS uses a restrictive seccomp filter that blocks
sched_getaffinity, leading this code to index a slice by -errno.
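
A Linux-only sketch of the defensive pattern in user code, using the
raw syscall directly (the schedGetaffinity wrapper below is an
illustration, not the runtime's internal routine):

	package main

	import (
		"fmt"
		"syscall"
		"unsafe"
	)

	// schedGetaffinity returns the number of affinity-mask bytes
	// written to buf on success, or the negated errno on failure,
	// mirroring the convention described above.
	func schedGetaffinity(pid int, buf []byte) int {
		n, _, errno := syscall.RawSyscall(syscall.SYS_SCHED_GETAFFINITY,
			uintptr(pid), uintptr(len(buf)), uintptr(unsafe.Pointer(&buf[0])))
		if errno != 0 {
			return -int(errno)
		}
		return int(n)
	}

	func cpuCount() int {
		var buf [1024]byte
		r := schedGetaffinity(0, buf[:])
		if r < 0 {
			// The syscall is blocked (for example by a seccomp
			// filter, as on Android/ChromeOS): fall back to 1 instead
			// of slicing buf with a negative length.
			return 1
		}
		n := 0
		for _, v := range buf[:r] {
			for v != 0 {
				n += int(v & 1)
				v >>= 1
			}
		}
		if n == 0 {
			n = 1
		}
		return n
	}

	func main() {
		fmt.Println("usable CPUs:", cpuCount())
	}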

Change-Id: Iec09a4f79dfbc17884e24f39bcfdad305de75b37
Reviewed-on: https://go-review.googlesource.com/34794
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2017-01-03 22:35:42 +00:00
Vladimir Stefanovic
155d314e50 runtime: fix SP alignment in mips{,le} sigfwd
Fixes misc/cgo/testsigfwd, enabled for mips{,le} with the next commit
(https://golang.org/cl/34646).

Change-Id: I2bec894b0492fd4d84dd73a4faa19eafca760107
Reviewed-on: https://go-review.googlesource.com/34645
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-03 15:42:06 +00:00
Austin Clements
a3f4cc0669 runtime: document MemStats.BySize fields
Change-Id: Iae8cdcd84e9b5f5d7c698abc6da3fc2af0ef839a
Reviewed-on: https://go-review.googlesource.com/34710
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2016-12-23 23:37:04 +00:00
Elias Naur
f419b56354 runtime: skip floating point hardware check on Android
CL 33652 removed the fake auxv for Android, and replaced it with
a /proc/self/auxv fallback. When /proc/self/auxv is unreadable,
however, hardware capability detection won't work and the runtime
will mistakenly think that floating point hardware is unavailable.

Fix this by always assuming floating point hardware on Android.

Manually tested on a Nexus 5 running Android 6.0.1. I suspect the
android/arm builder has a readable /proc/self/auxv and therefore
does not trigger the failure mode.

Change-Id: I95c3873803f9e17333c6cb8b9ff2016723104085
Reviewed-on: https://go-review.googlesource.com/34641
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-22 19:31:15 +00:00
Austin Clements
f24384f686 runtime: avoid CreateThread panic when exiting process
On Windows, CreateThread occasionally fails with ERROR_ACCESS_DENIED.
We're not sure why this is, but the Wine source code suggests that
this can happen when there's a concurrent CreateThread and ExitProcess
in the same process.

Fix this by setting a flag right before calling ExitProcess and
halting if CreateThread fails and this flag is set.

Updates #18253 (might fix it, but we're not sure this is the issue and
can't reproduce it on demand).

Change-Id: I1945b989e73a16cf28a35bf2613ffab07577ed4e
Reviewed-on: https://go-review.googlesource.com/34616
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-12-21 16:39:01 +00:00
Austin Clements
0c942e8f2c runtime: avoid incorrect panic when a signal arrives during STW
Stop-the-world and freeze-the-world (used for unhandled panics) are
currently not safe to do at the same time. While a regular unhandled
panic can't happen concurrently with STW (if the P hasn't been
stopped, then the panic blocks the STW), a panic from a _SigThrow
signal can happen on an already-stopped P, racing with STW. When this
happens, freezetheworld sets sched.stopwait to 0x7fffffff and
stopTheWorldWithSema panics because sched.stopwait != 0.

Fix this by detecting when freeze-the-world happens before
stop-the-world has completely stopped the world and, in that case,
freezing the STW operation rather than panicking.

Fixes #17442.

Change-Id: I646a7341221dd6d33ea21d818c2f7218e2cb7e20
Reviewed-on: https://go-review.googlesource.com/34611
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-12-20 17:27:47 +00:00
Shenghou Ma
a0667be8ef runtime: use mincore to detect physical page size as last resort on Android
Fixes #18041.

Change-Id: Iad1439b2dd56b113c8829699eda467d1367b0e15
Reviewed-on: https://go-review.googlesource.com/34610
Reviewed-by: Austin Clements <austin@google.com>
2016-12-19 22:00:50 +00:00
Austin Clements
ddd558e7e4 runtime: clean up and improve reflect.methodValue comments
The runtime no longer hard-codes the offset of
reflect.methodValue.stack, so remove these obsolete comments. Also,
reflect.methodValue and runtime.reflectMethodValue must agree
with reflect.makeFuncImpl, so update the comments on all three to
mention this.

This was pointed out by Minux on CL 31138.

Change-Id: Ic5ed1beffb65db76aca2977958da35de902e8e58
Reviewed-on: https://go-review.googlesource.com/34590
Reviewed-by: Keith Randall <khr@golang.org>
2016-12-19 21:02:53 +00:00
Michael Hudson-Doyle
1ec64e9b63 cmd/compile, runtime: a different approach to duplicate itabs
golang.org/issue/17594 was caused by additab being called more than once for
an itab. golang.org/cl/32131 fixed that by making the itabs local symbols,
but that in turn causes golang.org/issue/18252, because now there are
multiple itab symbols in a process for a given (type,interface) pair and
different code paths can end up referring to different itabs, which breaks
a lot of reflection code. So this makes itabs global again and just takes
care to only call additab once for each itab.
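
The fix amounts to the usual 'register at most once per key' pattern.
A self-contained sketch of that shape (the pair type and names below
are illustrative, not the runtime's actual data structures):

	package main

	import (
		"fmt"
		"sync"
	)

	// pair stands in for the (concrete type, interface type) identity
	// of an itab.
	type pair struct {
		concrete, iface string
	}

	var (
		mu    sync.Mutex
		added = make(map[pair]bool)
	)

	// addOnce runs register at most once per pair, no matter how many
	// code paths ask for it.
	func addOnce(p pair, register func()) {
		mu.Lock()
		defer mu.Unlock()
		if added[p] {
			return
		}
		added[p] = true
		register()
	}

	func main() {
		for i := 0; i < 3; i++ {
			addOnce(pair{"*os.File", "io.Reader"}, func() {
				fmt.Println("registered once")
			})
		}
	}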

Fixes #18252

Change-Id: I781a193e2f8dd80af145a3a971f6a25537f633ea
Reviewed-on: https://go-review.googlesource.com/34173
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-12-19 01:31:59 +00:00
Austin Clements
61db2e4efa runtime: cross-reference _func type better
It takes me several minutes every time I want to find where the linker
writes out the _func structures. Add some comments to make this
easier.

Change-Id: Ic75ce2786ca4b25726babe3c4fe9cd30c85c34e2
Reviewed-on: https://go-review.googlesource.com/34390
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-12-16 17:03:25 +00:00
Kevin Burke
1716add3dc runtime/pprof: fix spelling in test
Change-Id: Id10e41fe396156106f63a4b29d673b31bea5358f
Reviewed-on: https://go-review.googlesource.com/34551
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-12-16 16:09:13 +00:00
Ian Lance Taylor
ecc4474341 runtime/pprof: deflake tests for heavily loaded systems
In the sampling tests, let the test pass if we get at least 10 samples.

Fixes #18332.

Change-Id: I8aad083d1a0ba179ad6663ff43f6b6b3ce1e18cd
Reviewed-on: https://go-review.googlesource.com/34507
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-12-16 15:17:36 +00:00
Bryan C. Mills
29cb72154d runtime: preserve callee-saved C registers in sigtramp
This fixes Linux and the *BSD platforms on 386/amd64.

A few OS/arch combinations were already saving registers and/or doing
something that doesn't clearly resemble the SysV C ABI; those have
been left alone.

Fixes #18328.

Change-Id: I6398f6c71020de108fc8b26ca5946f0ba0258667
Reviewed-on: https://go-review.googlesource.com/34501
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-12-15 23:41:06 +00:00
Cherry Zhang
3444e5b355 runtime: fix mips assembly
I meant to say ~7, instead of ^7, in the review.

Fix build.

Change-Id: I5060bbcd98b4ab6f00251fdb68b6b35767e5acf1
Reviewed-on: https://go-review.googlesource.com/34411
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-12-15 02:50:23 +00:00
Vladimir Stefanovic
b909d01152 runtime: add cgo support for GOARCH=mips{,le}
Change-Id: Ib425ead7b340672837d3cb983bd785488706bd6d
Reviewed-on: https://go-review.googlesource.com/34314
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-14 23:52:33 +00:00
David Crawshaw
96414ca39f cmd/link: do not export plugin C symbols
Explicitly filter any C-only cgo functions out of pclntable,
which allows them to be duplicated with the host binary.

Updates #18190.

Change-Id: I50d8706777a6133b3e95f696bc0bc586b84faa9e
Reviewed-on: https://go-review.googlesource.com/34199
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-12-14 19:36:20 +00:00
Euan Kemp
afb350811e runtime: correct writebarrier typos
Change-Id: I7d67c3d64be915f0be5932d2c068606d74f93c29
Reviewed-on: https://go-review.googlesource.com/34378
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-12-14 18:19:10 +00:00
Ian Lance Taylor
fded5dbb2f runtime: don't crash if signal delivered on g0 stack
Also, if we changed the gsignal stack to match the stack we are
executing on, restore it when returning from the signal handler, for
safety.

Fixes #18255.

Change-Id: Ic289b36e4e38a56f8a6d4b5d74f68121c242e81a
Reviewed-on: https://go-review.googlesource.com/34239
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2016-12-12 19:19:59 +00:00
Joel Sing
f91ddaabe6 runtime, syscall: update openbsd for changes to syskill
Change the openbsd runtime to use the current sys_kill and sys_thrkill
system calls.

Prior to OpenBSD 5.9 the sys_kill system call could be used with both
processes and threads. In OpenBSD 5.9 this functionality was split into
a sys_kill system call for processes (with a new syscall number) and a
sys_thrkill system call for threads. The original/legacy system call was
retained in OpenBSD 5.9 and OpenBSD 6.0, but has since been removed and
will not exist in the upcoming OpenBSD 6.1 release.

Note: This change is needed to make Go work on OpenBSD 6.1 (to be
released in May 2017) and should be included in the Go 1.8 release.
This change also drops support for OpenBSD 5.8, which is already an
unsupported OpenBSD release.

Change-Id: I525ed9b57c66c0c6f438dfa32feb29c7eefc72b0
Reviewed-on: https://go-review.googlesource.com/34093
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-12-12 01:30:39 +00:00
Raul Silvera
c6228ef7e2 runtime/pprof: track locations for goroutine profiles
Locations must be added to the profile when generating a profile.proto.
This fixes #18229.
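
A small usage sketch that exercises the path being fixed: writing the
goroutine profile in profile.proto form (the output file name is
arbitrary):

	package main

	import (
		"os"
		"runtime/pprof"
	)

	func main() {
		f, err := os.Create("goroutine.pb.gz")
		if err != nil {
			panic(err)
		}
		defer f.Close()

		// debug=0 selects the gzip-compressed profile.proto encoding;
		// with this fix the profile carries Location records for each
		// stack, so go tool pprof can resolve the frames.
		if err := pprof.Lookup("goroutine").WriteTo(f, 0); err != nil {
			panic(err)
		}
	}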

Change-Id: I49cd63a30759d3fe8960d7b7c8bd5a554907f8d1
Reviewed-on: https://go-review.googlesource.com/34028
Reviewed-by: Michael Matloob <matloob@golang.org>
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-09 19:14:26 +00:00
Brad Fitzpatrick
4c4201f0e2 all: make spelling consistent
Fixes #17938

Change-Id: Iad12155f4976846bd4a9a53869f89e40e5b3deb3
Reviewed-on: https://go-review.googlesource.com/34147
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2016-12-08 23:22:37 +00:00
Austin Clements
01c6a19e04 runtime: add number of forced GCs to MemStats
This adds a counter for the number of times the application forced a
GC by, e.g., calling runtime.GC(). This is useful for detecting
applications that are overusing/abusing runtime.GC() or
debug.FreeOSMemory().
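
A minimal sketch of reading the counter, which is exposed as the
MemStats.NumForcedGC field:

	package main

	import (
		"fmt"
		"runtime"
	)

	func main() {
		runtime.GC() // a collection forced by the application

		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		fmt.Printf("GC cycles: %d, forced by the application: %d\n",
			m.NumGC, m.NumForcedGC)
	}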

Fixes #18217.

Change-Id: I990ab7a313c1b3b7a50a3d44535c460d7c54f47d
Reviewed-on: https://go-review.googlesource.com/34067
Reviewed-by: Russ Cox <rsc@golang.org>
2016-12-07 20:59:16 +00:00