mirror of https://github.com/golang/go synced 2024-10-04 16:11:21 -06:00
Commit Graph

1768 Commits

Author SHA1 Message Date
Dmitriy Vyukov
327e431057 runtime: prepare for 8K pages
Ensure that the heap is PageSize aligned.

LGTM=iant
R=iant, dave, gobot
CC=golang-codereviews
https://golang.org/cl/56630043
2014-01-29 18:18:46 +04:00
Dmitriy Vyukov
d62379eef5 runtime: more chan tests
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/57390043
2014-01-28 22:45:14 +04:00
Dmitriy Vyukov
d176e3d7c5 runtime: prefetch next block in mallocgc
json-1
cputime                  99600000     98600000      -1.00%
time                    100005493     98859693      -1.15%

garbage-1
cputime                  15760000     15440000      -2.03%
time                     15791759     15471701      -2.03%

LGTM=khr
R=golang-codereviews, gobot, khr, dave
CC=bradfitz, golang-codereviews, iant
https://golang.org/cl/57310043
2014-01-28 22:38:39 +04:00
Dmitriy Vyukov
d409e44cfb runtime: fix buffer overflow in make(chan)
On 32-bit systems one can arrange the make(chan) parameters so that
the chan buffer gives access to all of memory.
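
For illustration, a minimal sketch of the missing overflow guard (the names and the allocation cap are ours, not the CL's):

        package main

        import (
                "errors"
                "fmt"
        )

        const maxAlloc = 1 << 30 // illustrative cap on a single allocation

        // makeChanSize mimics the buffer-size computation for make(chan T, n).
        // Without the guard, elemSize*n can wrap around a 32-bit uintptr and
        // allocate a buffer far smaller than the channel believes it owns.
        func makeChanSize(elemSize, n uintptr) (uintptr, error) {
                if elemSize > 0 && n > maxAlloc/elemSize {
                        return 0, errors.New("makechan: size out of range")
                }
                return elemSize * n, nil
        }

        func main() {
                _, err := makeChanSize(1<<16, 1<<16) // would wrap to 0 on 32-bit
                fmt.Println(err)
        }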

LGTM=r
R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/50250045
2014-01-28 22:37:35 +04:00
Dmitriy Vyukov
ce884036d2 runtime: adjust malloc race instrumentation for tiny allocs
A tiny-alloc memory block is shared by different goroutines running on the same thread.
We called racemalloc after enabling preemption in mallocgc;
as a result, another goroutine could act on a tiny block that had not yet been race-cleared.
Call racemalloc before enabling preemption.
Fixes #7224.

LGTM=dave
R=golang-codereviews, dave
CC=golang-codereviews
https://golang.org/cl/57730043
2014-01-28 22:34:32 +04:00
Vincent Vanackere
d7c14655a9 runtime/debug: fix incorrect Stack output if package path contains a dot
Although debug.Stack is deprecated, it should still return the correct result.
Output before this CL (using a trivial library in $GOPATH/test.com/a):
/home/vince/src/test.com/a/lib.go:9 (0x42311e)
        com/a.ShowStack: os.Stdout.Write(debug.Stack())

Output with this CL applied:
/home/vince/src/test.com/a/lib.go:9 (0x42311e)
        ShowStack: os.Stdout.Write(debug.Stack())

LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/57330043
2014-01-27 14:00:00 -08:00
Dmitriy Vyukov
86a3a54284 runtime: fix windows build
Currently windows crashes because early allocs in schedinit
try to allocate tiny memory blocks, but m->p is not yet set up.
I've considered calling procresize(1) earlier in schedinit,
but this refactoring is better and should fix the issue as well.
Fixes #7218.

R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045
2014-01-28 00:26:56 +04:00
Dmitriy Vyukov
179d41fecc runtime: tune P retake logic
When GOMAXPROCS>1 the last P in a syscall is never retaken
(because there are already idle P's -- npidle>0).
This prevents the sysmon thread from sleeping.
On a darwin machine the program from issue 6673 constantly
consumes ~0.2% CPU. With this change it stably consumes 0.0% CPU.
Fixes #6673.

R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/56990045
2014-01-27 23:17:46 +04:00
Brad Fitzpatrick
a18f4ab569 all: use {bytes,strings}.NewReader instead of bytes.Buffers
Use the smaller read-only bytes.NewReader/strings.NewReader instead
of a bytes.Buffer when possible.
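
For illustration, the shape of the rewrite (our example, not from the CL):

        package example

        import (
                "bytes"
                "encoding/json"
        )

        func decode(data []byte, v interface{}) error {
                // Before: a read/write bytes.Buffer allocated just to read:
                //         json.NewDecoder(bytes.NewBuffer(data))
                // After: the smaller, read-only bytes.Reader suffices.
                return json.NewDecoder(bytes.NewReader(data)).Decode(v)
        }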

LGTM=r
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54660045
2014-01-27 11:05:01 -08:00
Dmitriy Vyukov
e1a91c5b89 runtime: fix buffer overflow in stringtoslicerune
On 32-bit systems n*sizeof(r[0]) can overflow.
Or it can become 1<<32-eps, and mallocgc will "successfully"
allocate 0 pages for it; there are no checks downstream
and MHeap_Grow just does:
npage = (npage+15)&~15;
ask = npage<<PageShift;
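
For illustration, a sketch of the kind of guard this CL adds (names and the cap are ours):

        package example

        import "unsafe"

        const maxMem = 1 << 30 // illustrative allocation cap

        // runesize returns the byte size for a []rune of length n, rejecting
        // lengths whose multiplication would wrap a 32-bit uintptr.
        func runesize(n int) (uintptr, bool) {
                const size = unsafe.Sizeof(rune(0)) // 4 bytes
                if n < 0 || uintptr(n) > maxMem/size {
                        return 0, false
                }
                return uintptr(n) * size, true
        }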

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews
https://golang.org/cl/54760045
2014-01-27 20:29:21 +04:00
Dmitriy Vyukov
bace9523ee runtime: smarter slice grow
When growing a slice, take into account the size of the allocated memory block.
Also apply the same optimization to the string->[]byte conversion.
Fixes #6307.
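
For illustration (our example), the effect is visible in how cap grows under append: capacities land on allocator size classes rather than on a pure doubling sequence, so none of the block is wasted.

        package main

        import "fmt"

        func main() {
                var b []byte
                prev := 0
                for i := 0; i < 1<<12; i++ {
                        b = append(b, 0)
                        if cap(b) != prev {
                                prev = cap(b)
                                fmt.Println(len(b), cap(b)) // cap tracks malloc size classes
                        }
                }
        }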

benchmark                    old ns/op    new ns/op    delta
BenchmarkAppendGrowByte        4541036      4434108   -2.35%
BenchmarkAppendGrowString     59885673     44813604  -25.17%

LGTM=khr
R=khr
CC=golang-codereviews, iant, rsc
https://golang.org/cl/53340044
2014-01-27 15:11:12 +04:00
Jeff Sickel
03e4f25849 runtime/pprof: plan9 fails the TestGoroutineSwitch, skip for now.
LGTM=r
R=golang-codereviews, 0intro, r
CC=golang-codereviews
https://golang.org/cl/55430043
2014-01-25 10:09:08 -08:00
Dmitriy Vyukov
1fa7029425 runtime: combine small NoScan allocations
Combine NoScan allocations < 16 bytes into a single memory block.
Reduces number of allocations on json/garbage benchmarks by 10+%.
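
For illustration, a minimal sketch of the combining scheme (names are ours; the real allocator keeps this state in the thread cache):

        package example

        const tinySize = 16 // block size shared by small NoScan allocations

        // tinyAlloc bump-allocates n bytes (n < 16, pointer-free) out of a
        // shared 16-byte block, starting a fresh block once the current one
        // is exhausted. align must be a power of two.
        type tinyAlloc struct {
                buf    []byte
                offset int
        }

        func (t *tinyAlloc) alloc(n, align int) []byte {
                off := (t.offset + align - 1) &^ (align - 1) // round up to alignment
                if t.buf == nil || off+n > tinySize {
                        t.buf, off = make([]byte, tinySize), 0 // start a new block
                }
                t.offset = off + n
                return t.buf[off : off+n : off+n]
        }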

json-1
allocated                 8039872      7949194      -1.13%
allocs                     105774        93776     -11.34%
cputime                 156200000    100700000     -35.53%
gc-pause-one              4908873      3814853     -22.29%
gc-pause-total            2748969      2899288      +5.47%
rss                      52674560     43560960     -17.30%
sys-gc                    3796976      3256304     -14.24%
sys-heap                 43843584     35192832     -19.73%
sys-other                 5589312      5310784      -4.98%
sys-stack                  393216       393216      +0.00%
sys-total                53623088     44153136     -17.66%
time                    156193436    100886714     -35.41%
virtual-mem             256548864    256540672      -0.00%

garbage-1
allocated                 2996885      2932982      -2.13%
allocs                      62904        55200     -12.25%
cputime                  17470000     17400000      -0.40%
gc-pause-one            932757485    925806143      -0.75%
gc-pause-total            4663787      4629030      -0.75%
rss                    1151074304   1133670400      -1.51%
sys-gc                   66068352     65085312      -1.49%
sys-heap               1039728640   1024065536      -1.51%
sys-other                38038208     37485248      -1.45%
sys-stack                 8650752      8781824      +1.52%
sys-total              1152485952   1135417920      -1.48%
time                     17478088     17418005      -0.34%
virtual-mem            1343709184   1324204032      -1.45%

LGTM=iant, bradfitz
R=golang-codereviews, dave, iant, rsc, bradfitz
CC=golang-codereviews, khr
https://golang.org/cl/38750047
2014-01-24 22:35:11 +04:00
Dmitriy Vyukov
f8e0057bb7 sync: scalable Pool
Introduce fixed-size P-local caches.
When a local cache overflows or underflows, a batch of items
is transferred to/from a global mutex-protected cache.
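
For illustration, a sketch of the shape of the design (names are ours; a mutex-guarded slice stands in for the runtime-managed structures, and each shard is assumed to be accessed by one P at a time):

        package example

        import "sync"

        const localCap = 32 // fixed size of each local cache

        type shard struct {
                items []interface{} // owned by one P; no locking needed
        }

        type scalablePool struct {
                shards []shard
                mu     sync.Mutex // protects only the global overflow cache
                global []interface{}
        }

        // put stores x in the caller's shard; on overflow, half the shard is
        // flushed to the global cache in one batch, so the mutex is taken
        // once per batch rather than once per operation.
        func (p *scalablePool) put(id int, x interface{}) {
                s := &p.shards[id]
                if len(s.items) == localCap {
                        p.mu.Lock()
                        p.global = append(p.global, s.items[localCap/2:]...)
                        p.mu.Unlock()
                        s.items = s.items[:localCap/2]
                }
                s.items = append(s.items, x)
        }

        // get pops from the caller's shard, refilling a batch from the
        // global cache on underflow.
        func (p *scalablePool) get(id int) interface{} {
                s := &p.shards[id]
                if len(s.items) == 0 {
                        p.mu.Lock()
                        n := len(p.global)
                        if n > localCap/2 {
                                n = localCap / 2
                        }
                        s.items = append(s.items, p.global[len(p.global)-n:]...)
                        p.global = p.global[:len(p.global)-n]
                        p.mu.Unlock()
                        if len(s.items) == 0 {
                                return nil
                        }
                }
                x := s.items[len(s.items)-1]
                s.items = s.items[:len(s.items)-1]
                return x
        }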

benchmark                    old ns/op    new ns/op    delta
BenchmarkPool                    50554        22423  -55.65%
BenchmarkPool-4                 400359         5904  -98.53%
BenchmarkPool-16                403311         1598  -99.60%
BenchmarkPool-32                367310         1526  -99.58%

BenchmarkPoolOverlflow            5214         3633  -30.32%
BenchmarkPoolOverlflow-4         42663         9539  -77.64%
BenchmarkPoolOverlflow-8         46919        11385  -75.73%
BenchmarkPoolOverlflow-16        39454        13048  -66.93%

BenchmarkSprintfEmpty                    84           63  -25.68%
BenchmarkSprintfEmpty-2                 371           32  -91.13%
BenchmarkSprintfEmpty-4                 465           22  -95.25%
BenchmarkSprintfEmpty-8                 565           12  -97.77%
BenchmarkSprintfEmpty-16                498            5  -98.87%
BenchmarkSprintfEmpty-32                492            4  -99.04%

BenchmarkSprintfString                  259          229  -11.58%
BenchmarkSprintfString-2                574          144  -74.91%
BenchmarkSprintfString-4                651           77  -88.05%
BenchmarkSprintfString-8                868           47  -94.48%
BenchmarkSprintfString-16               825           33  -95.96%
BenchmarkSprintfString-32               825           30  -96.28%

BenchmarkSprintfInt                     213          188  -11.74%
BenchmarkSprintfInt-2                   448          138  -69.20%
BenchmarkSprintfInt-4                   624           52  -91.63%
BenchmarkSprintfInt-8                   691           31  -95.43%
BenchmarkSprintfInt-16                  724           18  -97.46%
BenchmarkSprintfInt-32                  718           16  -97.70%

BenchmarkSprintfIntInt                  311          282   -9.32%
BenchmarkSprintfIntInt-2                333          145  -56.46%
BenchmarkSprintfIntInt-4                642          110  -82.87%
BenchmarkSprintfIntInt-8                832           42  -94.90%
BenchmarkSprintfIntInt-16               817           24  -97.00%
BenchmarkSprintfIntInt-32               805           22  -97.17%

BenchmarkSprintfPrefixedInt             309          269  -12.94%
BenchmarkSprintfPrefixedInt-2           245          168  -31.43%
BenchmarkSprintfPrefixedInt-4           598           99  -83.36%
BenchmarkSprintfPrefixedInt-8           770           67  -91.23%
BenchmarkSprintfPrefixedInt-16          829           54  -93.49%
BenchmarkSprintfPrefixedInt-32          824           50  -93.83%

BenchmarkSprintfFloat                   418          398   -4.78%
BenchmarkSprintfFloat-2                 295          203  -31.19%
BenchmarkSprintfFloat-4                 585          128  -78.12%
BenchmarkSprintfFloat-8                 873           60  -93.13%
BenchmarkSprintfFloat-16                884           33  -96.24%
BenchmarkSprintfFloat-32                881           29  -96.62%

BenchmarkManyArgs                      1097         1069   -2.55%
BenchmarkManyArgs-2                     705          567  -19.57%
BenchmarkManyArgs-4                     792          319  -59.72%
BenchmarkManyArgs-8                     963          172  -82.14%
BenchmarkManyArgs-16                   1115          103  -90.76%
BenchmarkManyArgs-32                   1133           90  -92.03%

LGTM=rsc
R=golang-codereviews, bradfitz, minux.ma, gobot, rsc
CC=golang-codereviews
https://golang.org/cl/46010043
2014-01-24 22:29:53 +04:00
Dmitriy Vyukov
9fa9613e0b runtime: do not zero terminate strings
On top of "tiny allocator" (cl/38750047), reduces number of allocs by 1% on json.
No code must rely on zero termination. So will also make debugging simpler,
by uncovering issues earlier.

json-1
allocated                 7949686      7915766      -0.43%
allocs                      93778        92790      -1.05%
time                    100957795     97250949      -3.67%
The rest of the metrics are too noisy.

LGTM=r
R=golang-codereviews, r, bradfitz, iant
CC=golang-codereviews
https://golang.org/cl/40370061
2014-01-24 22:29:01 +04:00
Russ Cox
a81692e265 cmd/gc: add zeroing to enable precise stack accounting
There is more zeroing than I would like right now -
temporaries used for the new map and channel runtime
calls need to be eliminated - but it will do for now.

This CL only has an effect if you are building with

        GOEXPERIMENT=precisestack ./all.bash

(or make.bash). It costs about 5% in the overall time
spent in all.bash. That number will come down before
we make it on by default, but this should be enough for
Keith to try using the precise maps for copying stacks.

amd64 only (and it's not really great generated code).

TBR=khr, iant
CC=golang-codereviews
https://golang.org/cl/56430043
2014-01-23 23:11:04 -05:00
Russ Cox
b377c9c6a9 liblink, runtime: fix cgo on arm
The addition of TLS to ARM rewrote the MRC instruction
differently depending on whether we were using internal
or external linking mode. That's clearly not okay, since we
don't know that during compilation, which is when we now
generate the code. Also, because the change did not introduce
a real MRC instruction but instead just macro-expanded it
in the assembler, liblink is rewriting a WORD instruction that
may actually be looking for that specific constant, which would
lead to very unexpected results. It was also using one value
that happened to be 8 where a different value that also
happened to be 8 belonged. So the code was correct for those
values but not correct in general, and very confusing.

Throw it all away.

Replace with the following. There is a linker-provided symbol
runtime.tlsgm with a value (address) set to the offset from the
hardware-provided TLS base register to the g and m storage.
Any reference to that name emits an appropriate TLS relocation
to be resolved by either the internal linker or the external linker,
depending on the link mode. The relocation has exactly the
semantics of the R_ARM_TLS_LE32 relocation, which is what
the external linker provides.

This symbol is only used in two routines, runtime.load_gm and
runtime.save_gm. In both cases it is now used like this:

        MRC		15, 0, R0, C13, C0, 3 // fetch TLS base pointer
        MOVW	$runtime·tlsgm(SB), R2
        ADD	R2, R0 // now R0 points at thread-local g+m storage

It is likely that this change breaks the generation of shared libraries
on ARM, because the MOVW needs to be rewritten to use the global
offset table and a different relocation type. But let's get the supported
functionality working again before we worry about unsupported
functionality.

LGTM=dave, iant
R=iant, dave
CC=golang-codereviews
https://golang.org/cl/56120043
2014-01-23 22:51:39 -05:00
Keith Randall
be5d2d4432 runtime: Print elision message if we skipped frames on traceback.
Fixes #7180

R=golang-codereviews, dvyukov
CC=golang-codereviews, gri
https://golang.org/cl/55810044
2014-01-23 12:47:30 -08:00
Dmitriy Vyukov
8371b0142e undo CL 45770044 / d795425bfa18
Breaks darwin and freebsd.

««« original CL description
runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044
»»»

R=golang-codereviews
CC=golang-codereviews
https://golang.org/cl/56060043
2014-01-23 19:56:59 +04:00
Dmitriy Vyukov
6d603af6dc runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044
2014-01-23 18:59:43 +04:00
Russ Cox
f7245c0626 runtime: fix typo in ARM code
The typo was introduced by one of Dmitriy's CLs this morning.
The fix makes the ARM build compile again; it still won't pass
its tests, but one thing at a time.

TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/55770044
2014-01-22 16:39:39 -05:00
Dmitriy Vyukov
b7b93a7154 runtime: fix code formatting
Place && at the end of the line.
Offset the expression continuation.

R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/55380044
2014-01-22 13:30:12 +04:00
Dmitriy Vyukov
9cbd2fb1aa runtime: remove locks from netpoll hotpaths
Introduces a two-phase goroutine parking mechanism -- prepare to park, commit park.
This mechanism does not require a backing mutex to protect the wait predicate.
Use it in netpoll. See the comment in netpoll.goc for details.
This slightly reduces contention between reader, writer and read/write io notifications,
and it eliminates a bunch of mutex operations from hotpaths, thus making them faster.
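
For illustration, a sketch of the two-phase pattern (names are ours; a buffered channel stands in for the runtime's sleep primitive, and callers should loop re-checking their predicate, as spurious wakeups are possible in this simplification):

        package example

        import "sync/atomic"

        type parker struct {
                state int32         // 1 while a waiter intends to park
                wake  chan struct{} // capacity 1: holds one wake token
        }

        func newParker() *parker { return &parker{wake: make(chan struct{}, 1)} }

        // park publishes the intent to sleep (prepare), re-checks the wait
        // predicate, and only then commits to sleeping. The re-check closes
        // the lost-wakeup window that a backing mutex would otherwise cover.
        func (p *parker) park(ready func() bool) {
                atomic.StoreInt32(&p.state, 1) // phase 1: prepare to park
                if ready() {                   // predicate flipped meanwhile?
                        atomic.StoreInt32(&p.state, 0)
                        return
                }
                <-p.wake // phase 2: commit park
                atomic.StoreInt32(&p.state, 0)
        }

        // unpark runs after the predicate has been made true; it wakes a
        // waiter only if one is parked or about to park, and never blocks.
        func (p *parker) unpark() {
                if atomic.LoadInt32(&p.state) == 1 {
                        select {
                        case p.wake <- struct{}{}:
                        default:
                        }
                }
        }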

benchmark                             old ns/op    new ns/op    delta
BenchmarkTCP4ConcurrentReadWrite           2109         1945   -7.78%
BenchmarkTCP4ConcurrentReadWrite-2         1162         1113   -4.22%
BenchmarkTCP4ConcurrentReadWrite-4          798          755   -5.39%
BenchmarkTCP4ConcurrentReadWrite-8          803          748   -6.85%
BenchmarkTCP4Persistent                    9411         9240   -1.82%
BenchmarkTCP4Persistent-2                  5888         5813   -1.27%
BenchmarkTCP4Persistent-4                  4016         3968   -1.20%
BenchmarkTCP4Persistent-8                  3943         3857   -2.18%

R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
CC=golang-codereviews, khr
https://golang.org/cl/45700043
2014-01-22 11:27:16 +04:00
Dmitriy Vyukov
cb86d86786 runtime/race: race instrument reads/writes in select cases
The new select tests currently fail (the race is not detected).

R=khr
CC=golang-codereviews
https://golang.org/cl/54220043
2014-01-22 10:36:17 +04:00
Dmitriy Vyukov
98b50b89a8 runtime: allocate goroutine ids in batches
Helps reduce contention on sched.goidgen.
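
For illustration, a sketch of batched ID allocation (names are ours): each P reserves a whole batch from the shared counter with one atomic op, then hands IDs out locally with no contention at all.

        package main

        import (
                "fmt"
                "sync/atomic"
        )

        var goidgen uint64 // shared counter, hit once per batch

        const batch = 16

        // idCache is per-P state caching an unconsumed run of IDs.
        type idCache struct{ next, end uint64 }

        func (c *idCache) nextID() uint64 {
                if c.next == c.end {
                        c.end = atomic.AddUint64(&goidgen, batch)
                        c.next = c.end - batch
                }
                c.next++
                return c.next
        }

        func main() {
                var c idCache
                for i := 0; i < 3; i++ {
                        fmt.Println(c.nextID()) // 1, 2, 3
                }
        }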

benchmark                               old ns/op    new ns/op    delta
BenchmarkCreateGoroutines-16                  259          237   -8.49%
BenchmarkCreateGoroutinesParallel-16          127           43  -66.06%

R=golang-codereviews, dave, bradfitz, khr
CC=golang-codereviews, rsc
https://golang.org/cl/46970043
2014-01-22 10:34:36 +04:00
Dmitriy Vyukov
8a3c587dc1 runtime: fix and improve CPU profiling
- do not lose profiling signals when we have no mcache (possible for syscalls/cgo)
- do not lose any profiling signals on windows
- fix profiling of cgo programs on windows (they had no m->thread setup)
- properly setup tls in cgo programs on windows
- check _beginthread return value

Fixes #6417.
Fixes #6986.

R=alex.brainman, rsc
CC=golang-codereviews
https://golang.org/cl/44820047
2014-01-22 10:30:10 +04:00
Russ Cox
dab127baf5 liblink: remove use of linkmode on ARM
Now that liblink is compiled into the compilers and assemblers,
it must not refer to the "linkmode", since that is not known until
link time. This CL makes the ARM support no longer use linkmode,
which fixes a bug with cgo binaries that contain their own TLS
variables.

The x86 code must also remove linkmode; that is issue 7164.

Fixes #6992.

R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/55160043
2014-01-21 19:46:34 -05:00
Keith Randall
c8c18614af runtime: if "panic during panic"'s stacktrace fails, don't recurse.
R=golang-codereviews, iant, khr, dvyukov
CC=golang-codereviews
https://golang.org/cl/54160043
2014-01-21 14:34:37 -08:00
Dmitriy Vyukov
cb133c6607 runtime: do not collect GC roots explicitly
Currently we collect (add) all roots into a global array in a single-threaded GC phase.
This hinders parallelism.
With this change we just kick off a parallel for over number_of_goroutines+5 iterations.
The parallel-for callback then decides whether it needs to scan the stack of a goroutine,
scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely.
This requires storing all goroutines in an array instead of a linked list
(to allow direct indexing).
This CL also removes the DebugScan functionality. It is broken because it uses
an unbounded stack, so it can not run on g0. When it was working, I found
it unhelpful for debugging issues because the two algorithms are too different now.
This change would require updating DebugScan, so it's simpler to just delete it.

With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same.

garbage-8
allocated                 2987886      2989221      +0.04%
allocs                      62885        62887      +0.00%
cputime                  21286000     21272000      -0.07%
gc-pause-one             26633247     24885421      -6.56%
gc-pause-total             873570       811264      -7.13%
rss                     242089984    242515968      +0.18%
sys-gc                   13934336     13869056      -0.47%
sys-heap                205062144    205062144      +0.00%
sys-other                12628288     12628288      +0.00%
sys-stack                11534336     11927552      +3.41%
sys-total               243159104    243487040      +0.13%
time                      2809477      2740795      -2.44%

R=golang-codereviews, rsc
CC=cshapiro, golang-codereviews, khr
https://golang.org/cl/46860043
2014-01-21 13:06:57 +04:00
Dmitriy Vyukov
0e027fca42 runtime: delete proc.p
It's entirely outdated today.

R=golang-codereviews, bradfitz, gobot, r
CC=golang-codereviews
https://golang.org/cl/43500045
2014-01-21 12:49:55 +04:00
Dmitriy Vyukov
1ba04c171a runtime: per-P defer pool
Instead of a per-goroutine stack of defers for all sizes,
introduce a per-P defer pool for argument sizes 8, 24, 40, 56, 72 bytes.

For a program that starts 1e6 goroutines and then joins them:
old: rss=6.6g virtmem=10.2g time=4.85s
new: rss=4.5g virtmem= 8.2g time=3.48s
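
For illustration, the size-class mapping can be as simple as (our sketch, not the CL's code):

        // deferclass maps a defer's argument size to one of the five fixed
        // classes above; sizes between classes round up to the next one.
        func deferclass(siz uintptr) int {
                return int((siz + 7) >> 4) // 8→0, 24→1, 40→2, 56→3, 72→4
        }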

R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/42750044
2014-01-21 11:20:23 +04:00
Keith Randall
abd588aa83 runtime: fix race detector by recording read by chansend.
R=golang-codereviews, dvyukov, khr
CC=golang-codereviews
https://golang.org/cl/54060043
2014-01-21 11:17:44 +04:00
Dmitriy Vyukov
d5a36cd6bb runtime: zero 2-word memory blocks in-place
Currently for 2-word blocks we set the flag just to clear the flag, which makes no sense.
In particular, on 32-bit systems we always call memclr.

R=golang-codereviews, dave, iant
CC=golang-codereviews, khr, rsc
https://golang.org/cl/41170044
2014-01-21 10:53:51 +04:00
Dmitriy Vyukov
b039abfc3e runtime: fix specials deadlock
The deadlock is between span->specialLock and proflock:

goroutine 11 [running]:
runtime.MProf_Free(0x7fa272d26508, 0xc210054180, 0xc0)
        src/pkg/runtime/mprof.goc:220 +0x27
runtime.freespecial(0x7fa272d1e088, 0xc210054180, 0xc0)
        src/pkg/runtime/mheap.c:691 +0x6a
runtime.freeallspecials(0x7fa272d1af50, 0xc210054180, 0xc0)
        src/pkg/runtime/mheap.c:717 +0xb5
runtime.free(0xc210054180)
        src/pkg/runtime/malloc.goc:190 +0xfd
selectgo(0x7fa272a5ef58)
        src/pkg/runtime/chan.c:1136 +0x2d8
runtime.selectgo(0xc210054180)
        src/pkg/runtime/chan.c:840 +0x12
runtime_test.func·058()
        src/pkg/runtime/proc_test.go:146 +0xb4
runtime.goexit()
        src/pkg/runtime/proc.c:1405
created by runtime_test.TestTimerFairness
        src/pkg/runtime/proc_test.go:152 +0xd1

goroutine 12 [running]:
addspecial(0xc2100540c0, 0x7fa272d1e0a0)
        src/pkg/runtime/mheap.c:569 +0x88
runtime.setprofilebucket(0xc2100540c0, 0x7fa272d26508)
        src/pkg/runtime/mheap.c:668 +0x73
runtime.MProf_Malloc(0xc2100540c0, 0xc0, 0x0)
        src/pkg/runtime/mprof.goc:212 +0x16b
runtime.mallocgc(0xc0, 0x0, 0xc200000000)
        src/pkg/runtime/malloc.goc:142 +0x239
runtime.mal(0xbc)
        src/pkg/runtime/malloc.goc:703 +0x38
newselect(0x2, 0x7fa272a5cf60)
        src/pkg/runtime/chan.c:632 +0x53
runtime.newselect(0xc200000002, 0xc21005f000)
        src/pkg/runtime/chan.c:615 +0x28
runtime_test.func·058()
        src/pkg/runtime/proc_test.go:146 +0x37
runtime.goexit()
        src/pkg/runtime/proc.c:1405
created by runtime_test.TestTimerFairness
        src/pkg/runtime/proc_test.go:152 +0xd1

Fixes #7099.

R=golang-codereviews, khr
CC=golang-codereviews
https://golang.org/cl/53120043
2014-01-21 10:48:37 +04:00
Dmitriy Vyukov
d76a1e593c runtime: fix test on windows
The test prints an extra \n when /dev/null is not present.

R=golang-codereviews, bradfitz, dave
CC=golang-codereviews
https://golang.org/cl/54890043
2014-01-21 10:44:08 +04:00
Dmitriy Vyukov
90eca36a23 runtime: ensure fair scheduling during frequent GCs
What was happening is as follows:
each writer goroutine always triggers GC during its scheduling quantum.
After GC, goroutines are shuffled so that the timer goroutine is always second in the queue.
This repeats infinitely, causing timer goroutine starvation.
Fixes #7126.

R=golang-codereviews, shanemhansen, khr, khr
CC=golang-codereviews
https://golang.org/cl/53080043
2014-01-21 10:24:42 +04:00
Keith Randall
6c9f198c9a runtime: print stack trace when "panic during panic"
Fixes #7145

R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/53970043
2014-01-17 18:47:40 -08:00
Keith Randall
6f6a9445c9 runtime, cmd/gc: Get rid of vararg channel calls.
Vararg C calls present a problem for the GC because the
argument types are not derivable from the signature.  Remove
them by passing pointers to channel elements instead of the
channel elements directly.

R=golang-codereviews, gobot, rsc, dvyukov
CC=golang-codereviews
https://golang.org/cl/53430043
2014-01-17 14:48:45 -08:00
Russ Cox
98178b345a runtime: fix TestLFStackStress
Fixes #7138.

R=r, bradfitz, dave
CC=dvyukov, golang-codereviews
https://golang.org/cl/53910043
2014-01-17 17:42:24 -05:00
Aram Hăvărneanu
a46b434931 runtime: add support for GOOS=solaris
R=alex.brainman, dave, jsing, gobot, rsc
CC=golang-codereviews
https://golang.org/cl/35990043
2014-01-17 17:58:10 +13:00
Keith Randall
873aaa59b7 reflect: Remove imprecise techniques from channel/select operations.
Reflect used to communicate to the runtime using interface words,
which is bad for precise GC because sometimes iwords hold a pointer
and sometimes they don't.  This change rewrites channel and select
operations to always pass pointers to the runtime.

reflect.Select gets somewhat more expensive, as we now do an allocation
per receive case instead of one allocation whose size is the max of
all the received types.  This seems unavoidable to get preciseness
(unless we move the allocation into selectgo, which is a much bigger
change).

Fixes #6490

R=golang-codereviews, dvyukov, rsc
CC=golang-codereviews
https://golang.org/cl/52900043
2014-01-16 13:35:29 -08:00
Dmitriy Vyukov
c0b9e6218c runtime: output how long goroutines are blocked
Example of output:

goroutine 4 [sleep for 3 min]:
time.Sleep(0x34630b8a000)
        src/pkg/runtime/time.goc:31 +0x31
main.func·002()
        block.go:16 +0x2c
created by main.main
        block.go:17 +0x33

Full program and output are here:
http://play.golang.org/p/NEZdADI3Td

Fixes #6809.

R=golang-codereviews, khr, kamil.kisiel, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/50420043
2014-01-16 12:54:46 +04:00
Dmitriy Vyukov
4722b1cbd3 runtime: use lock-free ring for work queues
Use a lock-free fixed-size ring for work queues
instead of an unbounded mutex-protected array.
The ring has a single producer and multiple consumers.
If the ring overflows, work is put onto the global queue.
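
For illustration, a sketch of the single-producer, multiple-consumer ring (names are ours): the owner alone advances tail, stealers race on head with CAS, and a full ring overflows to the global queue.

        package example

        import "sync/atomic"

        type ring struct {
                head uint32 // next slot to consume; advanced by CAS
                tail uint32 // next slot to fill; written only by the owner
                buf  [256]func()
        }

        // put enqueues w, reporting false when the ring is full so the
        // caller can push the work onto the global queue instead.
        func (r *ring) put(w func()) bool {
                h := atomic.LoadUint32(&r.head)
                t := r.tail
                if t-h == uint32(len(r.buf)) {
                        return false // full: overflow to the global queue
                }
                r.buf[t%uint32(len(r.buf))] = w
                atomic.StoreUint32(&r.tail, t+1) // publish the filled slot
                return true
        }

        // get dequeues one item and may be called by any consumer; losers
        // of the CAS race simply retry.
        func (r *ring) get() func() {
                for {
                        h := atomic.LoadUint32(&r.head)
                        t := atomic.LoadUint32(&r.tail)
                        if h == t {
                                return nil // empty
                        }
                        w := r.buf[h%uint32(len(r.buf))]
                        if atomic.CompareAndSwapUint32(&r.head, h, h+1) {
                                return w
                        }
                }
        }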

benchmark              old ns/op    new ns/op    delta
BenchmarkMatmult               7            5  -18.12%
BenchmarkMatmult-4             2            2  -18.98%
BenchmarkMatmult-16            1            0  -12.84%

BenchmarkCreateGoroutines                     105           88  -16.10%
BenchmarkCreateGoroutines-4                   376          219  -41.76%
BenchmarkCreateGoroutines-16                  241          174  -27.80%
BenchmarkCreateGoroutinesParallel             103           87  -14.66%
BenchmarkCreateGoroutinesParallel-4           169          143  -15.38%
BenchmarkCreateGoroutinesParallel-16          158          151   -4.43%

R=golang-codereviews, rsc
CC=ddetlefs, devon.odell, golang-codereviews
https://golang.org/cl/46170044
2014-01-16 12:17:00 +04:00
Dmitriy Vyukov
b3a3afc9b7 runtime: fix data race in GC
Fixes #5139.
Update #7065.

R=golang-codereviews, bradfitz, minux.ma
CC=golang-codereviews
https://golang.org/cl/52090045
2014-01-15 19:38:08 +04:00
Shenghou Ma
0db71338ed runtime/debug: force GC after setting of GCPercent to make it effective.
See also discussion in CL 51010045.
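
For illustration, the change makes SetGCPercent behave as if the caller had written (our sketch):

        package main

        import (
                "runtime"
                "runtime/debug"
        )

        func main() {
                old := debug.SetGCPercent(50)
                // Without a collection the new percentage would only influence
                // the next GC trigger; forcing one applies it immediately.
                runtime.GC()
                _ = old
        }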

R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/52230043
2014-01-14 19:23:36 -05:00
Keith Randall
8454e2c287 runtime: Change size of map iter offset so 32-bit version compiles cleanly.
R=golang-codereviews, minux.ma
CC=golang-codereviews
https://golang.org/cl/52310043
2014-01-14 13:46:22 -08:00
Josh Bleecher Snyder
3be4d95731 runtime: change map iteration randomization to use intra-bucket offset
Map iteration previously started from a random bucket, but walked each
bucket from the beginning. Now, iteration always starts from the first
bucket and walks each bucket starting at a random offset. For
performance, the random offset is selected at the start of iteration
and reused for each bucket.

Iteration over a map with 8 or fewer elements--a single bucket--will
now be non-deterministic. There will now be only 8 different possible
map iterations.
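
For illustration (our example), the randomization is easy to observe on a single-bucket map:

        package main

        import "fmt"

        func main() {
                m := map[string]int{"a": 1, "b": 2, "c": 3} // fits in one bucket
                for i := 0; i < 3; i++ {
                        for k := range m {
                                fmt.Print(k, " ") // start key varies with the random offset
                        }
                        fmt.Println()
                }
        }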

Significant benchmark changes, on my OS X laptop (rough but consistent):

benchmark                              old ns/op     new ns/op     delta
BenchmarkMapIter                       128           121           -5.47%
BenchmarkMapIterEmpty                  4.26          4.45          +4.46%
BenchmarkNewEmptyMap                   114           111           -2.63%

Fixes #6719.

R=khr, bradfitz
CC=golang-codereviews
https://golang.org/cl/47370043
2014-01-14 12:54:05 -08:00
Russ Cox
3ec60c253d runtime: emit collection stacks in GODEBUG=allocfreetrace mode
R=khr, dvyukov
CC=golang-codereviews
https://golang.org/cl/51830043
2014-01-14 10:39:50 -05:00
Dmitriy Vyukov
e0dcf73d61 runtime: fix comment
Void function can not return false.

R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/52000043
2014-01-14 12:58:13 +04:00
Keith Randall
cefe6ac9a1 runtime/pprof: fix flaky TestCPUProfileMultithreaded test
It's too sensitive.

Fixes #7095

R=golang-codereviews, iant, minux.ma, rsc
CC=golang-codereviews
https://golang.org/cl/50470043
2014-01-13 21:18:47 -08:00
Russ Cox
a2edc469a0 runtime: remove redundant 0x prefix in error print
%x already adds the prefix unconditionally

R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/51550043
2014-01-13 11:39:04 -05:00
Joel Sing
073bd0ba24 runtime/pprof: enable profiling test on openbsd
Profiling of multithreaded applications works correctly on OpenBSD
5.4-current, so enable the profiling test.

R=golang-codereviews, minux.ma
CC=golang-codereviews
https://golang.org/cl/50940043
2014-01-13 11:24:08 +11:00
Joel Sing
471763f657 runtime, syscall: update for openbsd system ABI break
Update Go so that it continues to work past the OpenBSD system ABI
break, with 64-bit time_t:

  http://www.openbsd.org/faq/current.html#20130813

Note: this makes OpenBSD 5.5 (currently 5.4-current) the minimum
supported release for Go.

Fixes #7049.

R=golang-codereviews, mikioh.mikioh
CC=golang-codereviews
https://golang.org/cl/13368046
2014-01-11 19:00:32 +11:00
Ian Lance Taylor
8da8b37674 runtime: fix 32-bit malloc for pointers >= 0x80000000
The spans array is allocated in runtime·mallocinit.  On a
32-bit system the number of entries in the spans array is
MaxArena32 / PageSize, which is (2U << 30) / (1 << 12) == (1 << 19).
So we are allocating an array that can hold 19 bits for an
index that can hold 20 bits.  According to the comment in the
function, this is intentional: we only allocate enough spans
(and bitmaps) for a 2G arena, because allocating more would
probably be wasteful.

But since the span index is simply the upper 20 bits of the
memory address, this scheme only works if memory addresses are
limited to the low 2G of memory.  That would be OK if we were
careful to enforce it, but we're not.  What we are careful to
enforce, in functions like runtime·MHeap_SysAlloc, is that we
always return addresses between the heap's arena_start and
arena_start + MaxArena32.

We generally get away with it because we start allocating just
after the program end, so we only run into trouble with
programs that allocate a lot of memory, enough to get past
address 0x80000000.

This changes the code that computes a span index to subtract
arena_start on 32-bit systems just as we currently do on
64-bit systems.
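
For illustration, a sketch of the corrected computation (names are ours):

        const pageShift = 12 // 4K pages

        // spanIndex indexes by the offset from arena_start instead of by the
        // raw address, keeping the index within the 1<<19 entries even when
        // the arena sits above address 0x80000000.
        func spanIndex(p, arenaStart uintptr) uintptr {
                return (p - arenaStart) >> pageShift
        }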

R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/49460043
2014-01-09 15:00:00 -08:00
Rowan Worth
c4770b991b runtime: co-exist with NPTL's pthread_cancel.
NPTL uses SIGRTMIN (signal 32) to effect thread cancellation.
Go's runtime replaces NPTL's signal handler with its own, and
the program ends up aborting if a C library that calls
pthread_cancel is used.

This patch prevents runtime from replacing NPTL's handler.

Fixes #6997.

R=golang-codereviews, iant, dvyukov
CC=golang-codereviews
https://golang.org/cl/47540043
2014-01-09 09:34:04 -08:00
Ian Lance Taylor
7e639c0229 runtime: change errorCString to a struct
This prevents callers from using reflect to create a new
instance of errorCString with an arbitrary value and calling
the Error method to examine arbitrary memory.

Fixes #7084.

R=golang-codereviews, minux.ma, bradfitz
CC=golang-codereviews
https://golang.org/cl/49600043
2014-01-08 21:40:33 -08:00
Keith Randall
e7d010a6a7 runtime: deallocate specials before deallocating the underlying object.
R=dvyukov
CC=golang-codereviews
https://golang.org/cl/48840043
2014-01-08 12:41:26 -08:00
Ian Lance Taylor
89c5d17878 runtime: handle gdb breakpoint in x86 traceback
This lets stack splits work correctly when running under gdb
when gdb has inserted a breakpoint somewhere on the call
stack.

Fixes #6834.

R=golang-codereviews, minux.ma
CC=golang-codereviews
https://golang.org/cl/48650043
2014-01-08 12:36:31 -08:00
Keith Randall
020b39c3f3 runtime: use special records hung off the MSpan to
record finalizers and heap profile info.  Enables
removing the special bit from the heap bitmap.  Also
provides a generic mechanism for annotating occasional
heap objects.

finalizers
        overhead      per obj
old	680 B	      80 B avg
new	16 B/span     48 B

profile
        overhead      per obj
old	32KB	      24 B + hash tables
new	16 B/span     24 B

R=cshapiro, khr, dvyukov, gobot
CC=golang-codereviews
https://golang.org/cl/13314053
2014-01-07 13:45:50 -08:00
Emil Hessman
aeeda707ff runtime: Fix panic when trying to stop CPU profiling with profiler turned off
Fixes #7063.

R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/47950043
2014-01-06 09:53:55 -08:00
Jeff Sickel
c136197ca8 runtime: plan 9 does have /dev/random
R=golang-codereviews, r, aram
CC=0intro, golang-codereviews, rsc
https://golang.org/cl/43420045
2014-01-04 10:53:22 -08:00
Keith Randall
f59ea4e58b runtime: Fix race detector checks to ignore KindNoPointers bit
when comparing kinds.

R=dvyukov, dave, khr
CC=golang-codereviews
https://golang.org/cl/41660045
2014-01-04 08:43:17 -08:00
David Symonds
8778760a7e runtime: increase attempt count for map iteration order test.
Some builders broke on this test; I'm guessing that was because
this test didn't try hard enough to find a different iteration order.

Update #6719

R=dave
CC=golang-codereviews
https://golang.org/cl/47300043
2014-01-03 10:41:56 +11:00
David Symonds
d4c66a35ba runtime: add a test for randomised map iteration order.
Technically the spec does not guarantee that the iteration order is random,
but it is a property that we have consciously pursued, and so it seems
right to verify that our implementation does indeed randomise.

Update #6719.

R=khr, bradfitz
CC=golang-codereviews
https://golang.org/cl/47010043
2014-01-03 10:10:54 +11:00
Keith Randall
1cc2ff8fc7 runtime: use readrange instead of read to check for races
on map keys and values which are now passed by reference.

R=dvyukov, khr
CC=golang-codereviews
https://golang.org/cl/43490044
2013-12-30 12:03:56 -08:00
Dave Cheney
d2fe44d568 runtime: load runtime.goarm as a byte, not a word
Fixes #6952.

runtime.asminit was incorrectly loading runtime.goarm as a word, not a uint8, which made it subject to alignment issues on arm5 platforms.

Alignment aside, this also meant that the top 3 bytes in R11 would have been garbage, so R11 could not be relied upon to set up the FPU correctly.

R=iant, minux.ma
CC=golang-codereviews
https://golang.org/cl/46240043
2013-12-29 15:25:34 +11:00
Ian Lance Taylor
59583b09f3 runtime/cgo: include <signal.h> to fix build
R=golang-codereviews
TBR=dfc
CC=golang-codereviews
https://golang.org/cl/43120044
2013-12-24 08:24:32 -08:00
Ian Lance Taylor
699dbb60b7 runtime/cgo: always set signal mask before calling pthread_create
This was done correctly for most targets but was missing from
FreeBSD/ARM and Linux/ARM.

R=golang-codereviews, dave
CC=golang-codereviews
https://golang.org/cl/45180043
2013-12-24 08:08:15 -08:00
S.Çağlar Onur
41183d015d cgo/runtime: replace sigprocmask with pthread_sigmask.
sigprocmask use in a multithreaded environment is undefined, so replace it with pthread_sigmask.

Fixes #6811.

R=jsing, iant
CC=golang-codereviews, golang-dev
https://golang.org/cl/30460043
2013-12-22 08:55:29 -08:00
Shenghou Ma
eb7ed0d626 runtime: fix build for OpenBSD
R=golang-dev
CC=golang-dev
https://golang.org/cl/38030045
2013-12-19 21:12:18 -05:00
Shenghou Ma
0097d30c97 runtime: unblock signals when we try to core dump
Fixes #6988.

R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/44070046
2013-12-19 20:45:05 -05:00
Keith Randall
cbc565a801 reflect: rewrite Value to separate out pointer vs. nonpointer info.
Needed for precise gc and copying stacks.

reflect.Value now takes 4 words instead of 3.

Still to do:
 - un-iword-ify channel ops.
 - un-iword-ify method receivers.

R=golang-dev, iant, rsc, khr
CC=golang-dev
https://golang.org/cl/43040043
2013-12-19 15:15:24 -08:00
Rémy Oudompheng
e6b023473e runtime: reduce delays in finalizer test.
The runtime tests are executed 4 times in all.bash
and there is currently a 5-second delay each time.

R=golang-dev, minux.ma, khr, bradfitz
CC=golang-dev
https://golang.org/cl/42450043
2013-12-19 21:37:44 +01:00
Alex Brainman
7f8a5057dd syscall: add NewCallbackCDecl again
Fixes #6338

R=golang-dev, kin.wilson.za, rsc
CC=golang-dev
https://golang.org/cl/36180044
2013-12-19 14:38:50 +11:00
Alex Brainman
f18e2a3271 runtime/pprof: skip tests that fail on windows-amd64-race builder
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/44180043
2013-12-19 14:15:57 +11:00
Keith Randall
2d20c0d625 runtime: mark objects in free lists as allocated and unscannable.
On the plus side, we don't need to change the bits when mallocing
pointerless objects.  On the other hand, we need to mark objects in the
free lists during GC.  But the free lists are small at GC time, so it
should be a net win.

benchmark                    old ns/op    new ns/op    delta
BenchmarkMalloc8                    40           33  -17.65%
BenchmarkMalloc16                   45           38  -15.72%
BenchmarkMallocTypeInfo8            58           59   +0.85%
BenchmarkMallocTypeInfo16           63           64   +1.10%

R=golang-dev, rsc, dvyukov
CC=cshapiro, golang-dev
https://golang.org/cl/41040043
2013-12-18 17:13:59 -08:00
Brad Fitzpatrick
8c6ef061e3 sync: add Pool type
Adds the Pool type and docs, and use it in fmt.
This is a temporary implementation, until Dmitry
makes it fast.

Uses the API proposal from Russ in http://goo.gl/cCKeb2 but
adds an optional New field, as used in fmt and elsewhere.
Almost all callers want that.
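
For illustration, typical usage with New (our example):

        package main

        import (
                "bytes"
                "fmt"
                "sync"
        )

        // bufPool hands out reusable buffers; New supplies a fresh one when
        // the pool is empty, so callers never need a nil check.
        var bufPool = sync.Pool{
                New: func() interface{} { return new(bytes.Buffer) },
        }

        func render(s string) string {
                b := bufPool.Get().(*bytes.Buffer)
                defer bufPool.Put(b)
                b.Reset()
                b.WriteString(s)
                return b.String()
        }

        func main() {
                fmt.Println(render("hello"))
        }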

Update #4720

R=golang-dev, rsc, cshapiro, iant, r, dvyukov, khr
CC=golang-dev
https://golang.org/cl/41860043
2013-12-18 11:08:34 -08:00
Alex Brainman
ae9e4db07c runtime: skip broken TestRuntimeGogoBytes on windows
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/43730043
2013-12-18 14:17:47 +11:00
Keith Randall
deb554934c runtime, gc: call interface conversion routines by reference.
Part of getting rid of vararg C calls.

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/23310043
2013-12-17 16:55:06 -08:00
Keith Randall
9aae6c1a8b runtime: don't store evacuate bit as low bit of hashtable overflow pointer.
Hash tables currently store an evacuated bit in the low bit
of the overflow pointer.  That's probably not sustainable in the
long term as GC wants correctly typed & aligned pointers.  It is
also a pain to move any of this code to Go in the current state.

This change moves the evacuated bit into the tophash entries.

The performance change is negligible.

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/14412043
2013-12-17 15:23:31 -08:00
Brad Fitzpatrick
4b76a31c6d runtime: don't crash in SetFinalizer if sizeof *x is zero
And document it explicitly, even though it already said
it wasn't guaranteed.
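
For illustration (our example), the call that used to crash:

        package main

        import "runtime"

        type empty struct{}

        func main() {
                x := new(empty) // sizeof *x == 0
                // Accepted after this CL, though for a zero-size object the
                // finalizer is not guaranteed to ever run.
                runtime.SetFinalizer(x, func(*empty) {})
        }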

Fixes #6857

R=golang-dev, khr
CC=golang-dev
https://golang.org/cl/43580043
2013-12-17 14:18:58 -08:00
Russ Cox
a392cf4fd3 runtime: fix test
Was supposed to be in the nm CL.

TBR=r
CC=golang-dev
https://golang.org/cl/42870043
2013-12-16 12:59:30 -05:00
Russ Cox
bc135f6492 runtime: fix crash in runtime.GoroutineProfile
This is a possible Go 1.2.1 candidate.

Fixes #6946.

R=iant, r
CC=golang-dev
https://golang.org/cl/41640043
2013-12-13 15:44:57 -05:00
Carl Shapiro
c69402d82b runtime: remove outdated comment and related indentation
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/39810043
2013-12-10 11:17:43 -08:00
Carl Shapiro
f574726f16 runtime: check for signed zero in printfloat
Fixes #6899

R=golang-dev, r, cshapiro, iant, rsc
CC=golang-dev
https://golang.org/cl/38120043
2013-12-09 17:51:30 -08:00
Russ Cox
4230044bb8 runtime: remove non-extern decls of runtime.goarm
The linker is in charge of providing the one true declaration.

R=golang-dev, dave, r
CC=golang-dev
https://golang.org/cl/39560043
2013-12-09 19:35:07 -05:00
Anthony Martin
274a8e3f56 runtime: do not use memmove in the Plan 9 signal handler
Fixes a regression introduced in revision 4cb93e2900d0.

That revision changed runtime·memmove to use SSE MOVOU
instructions for sizes between 17 and 256 bytes. We were
using memmove to save a copy of the note string during
the note handler. The Plan 9 kernel does not allow the
use of floating point in note handlers (which includes
MOVOU since it touches the XMM registers).

Arguably, runtime·memmove should not be using MOVOU when
GO386=387 but that wouldn't help us on amd64. It's very
important that we guard against any future changes so we
use a simple copy loop instead.

This change is extracted from CL 9796043 (since that CL
is still being ironed out).

R=rsc
CC=golang-dev
https://golang.org/cl/34640045
2013-12-09 18:41:48 -05:00
Carl Shapiro
bc9691c465 cmd/gc, runtime: correct a misnomer regarding dead value maps
The funcdata symbol incorrectly named the dead value map the
dead pointer map.  The dead value map identifies all dead
values, including pointers and non-pointers, in a stack frame.
The purpose of this map is to allow the runtime to poison
locations of dead data to catch lost invariants.

R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/38670043
2013-12-09 14:45:10 -08:00
Russ Cox
7c17982f72 runtime: remove cross-function jump in vlop_arm.s
The new linker will disallow this on arm
(it is already disallowed on amd64 and 386)
in order to be able to lay out each function
separately.

The restriction is only for jumps into the middle
of a function; jumps to the beginning of a function
remain fine.

Prereq for linker cleanup (golang.org/s/go13linker).

R=iant, r, minux.ma
CC=golang-dev
https://golang.org/cl/35800043
2013-12-08 22:52:08 -05:00
Carl Shapiro
76c54c1193 runtime: add GODEBUG option for an electric fence like heap mode
When enabled, this new debugging mode allocates objects on
their own page and never recycles memory addresses.  This is an
essential tool for root-causing a broad class of heap corruption bugs.

R=golang-dev, dave, daniel.morsing, dvyukov, rsc, iant, cshapiro
CC=golang-dev
https://golang.org/cl/22060046
2013-12-06 14:40:45 -08:00
Carl Shapiro
f056daf075 cmd/5g, cmd/5l, cmd/6g, cmd/6l, cmd/8g, cmd/8l, cmd/gc, runtime: generate pointer maps by liveness analysis
This change allows the garbage collector to examine stack
slots that are determined as live and containing a pointer
value by the garbage collector.  This results in a mean
reduction of 65% in the number of stack slots scanned during
an invocation of "GOGC=1 all.bash".

Unfortunately, this does not yet allow garbage collection to
be precise for the stack slots computed as live.  Pointers
confound the determination of what definitions reach a given
instruction.  In general, this problem is not solvable without
runtime cost but some advanced cooperation from the compiler
might mitigate common cases.

R=golang-dev, rsc, cshapiro
CC=golang-dev
https://golang.org/cl/14430048
2013-12-05 17:35:22 -08:00
Carl Shapiro
48279bd567 runtime: add an allocation and free tracing for gc debugging
Output for an allocation and free (sweep) follows

MProf_Malloc(p=0xc2100210a0, size=0x50, type=0x0 <single object>)
        #0 0x46ee15 runtime.mallocgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:141
        #1 0x47004f runtime.settype_flush /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:612
        #2 0x45f92c gc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2071
        #3 0x45f89e mgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2050
        #4 0x45258b runtime.mcall /usr/local/google/home/cshapiro/go/src/pkg/runtime/asm_amd64.s:179

MProf_Free(p=0xc2100210a0, size=0x50)
        #0 0x46ee15 runtime.mallocgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:141
        #1 0x47004f runtime.settype_flush /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:612
        #2 0x45f92c gc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2071
        #3 0x45f89e mgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2050
        #4 0x45258b runtime.mcall /usr/local/google/home/cshapiro/go/src/pkg/runtime/asm_amd64.s:179

R=golang-dev, dvyukov, rsc, cshapiro
CC=golang-dev
https://golang.org/cl/21990045
2013-12-03 14:42:38 -08:00
Carl Shapiro
0368a7ceb6 runtime: move stack scanning into the parallel mark phase
This change reduces the cost of scanning stacks by frames.
It moves the stack scanning from the serial root enumeration
phase to the parallel tracing phase.  The output that follows
gives timings for the issue 6482 benchmark

Baseline

BenchmarkGoroutineSelect	      50	 108027405 ns/op
BenchmarkGoroutineBlocking	      50	  89573332 ns/op
BenchmarkGoroutineForRange	      20	  95614116 ns/op
BenchmarkGoroutineIdle		      20	 122809512 ns/op

Stack scan by frames, non-parallel

BenchmarkGoroutineSelect	      20	 297138929 ns/op
BenchmarkGoroutineBlocking	      20	 301137599 ns/op
BenchmarkGoroutineForRange	      10	 312499469 ns/op
BenchmarkGoroutineIdle		      10	 209428876 ns/op

Stack scan by frames, parallel

BenchmarkGoroutineSelect	      20	 183938431 ns/op
BenchmarkGoroutineBlocking	      20	 170109999 ns/op
BenchmarkGoroutineForRange	      20	 179628882 ns/op
BenchmarkGoroutineIdle		      20	 157541498 ns/op

The remaining performance disparity is due to inefficiencies
in gentraceback and its callees.  The effect was isolated by
using a parallel stack scan where scanstack was modified to do
a conservative scan of the stack segments without gentraceback,
followed by a call of gentraceback with a no-op callback.

The output that follows lists the top 10 most frequent stack
tops, as determined by the Linux perf record facility.

Baseline

+  25.19%  gc.test  gc.test            [.] runtime.xchg
+  19.00%  gc.test  gc.test            [.] scanblock
+   8.53%  gc.test  gc.test            [.] scanstack
+   8.46%  gc.test  gc.test            [.] flushptrbuf
+   5.08%  gc.test  gc.test            [.] procresize
+   3.57%  gc.test  gc.test            [.] runtime.chanrecv
+   2.94%  gc.test  gc.test            [.] dequeue
+   2.74%  gc.test  gc.test            [.] addroots
+   2.25%  gc.test  gc.test            [.] runtime.ready
+   1.33%  gc.test  gc.test            [.] runtime.cas64

Gentraceback

+  18.12%  gc.test  gc.test             [.] runtime.xchg
+  14.68%  gc.test  gc.test             [.] scanblock
+   8.20%  gc.test  gc.test             [.] runtime.gentraceback
+   7.38%  gc.test  gc.test             [.] flushptrbuf
+   6.84%  gc.test  gc.test             [.] scanstack
+   5.92%  gc.test  gc.test             [.] runtime.findfunc
+   3.62%  gc.test  gc.test             [.] procresize
+   3.15%  gc.test  gc.test             [.] readvarint
+   1.92%  gc.test  gc.test             [.] addroots
+   1.87%  gc.test  gc.test             [.] runtime.chanrecv

R=golang-dev, dvyukov, rsc
CC=golang-dev
https://golang.org/cl/17410043
2013-12-03 14:12:55 -08:00
Keith Randall
24699fb05c runtime: get rid of concatstring's vararg C argument.
Pass as a slice of strings instead.  For 2-5 strings, implement
dedicated routines so no slices are needed.

static call counts in the go binary:
 2 strings: 342 occurrences
 3 strings:  98
 4 strings:  30
 5 strings:  13
6+ strings:  14

Why?  C varargs are bad for stack scanning and copying.

R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/36380043
2013-12-03 10:39:19 -08:00
Keith Randall
c0f2294577 runtime: fix race detector when map keys/values are passed by pointer.
Now that the map implementation is reading the keys and values from
arbitrary memory (instead of from stack slots), it needs to tell the
race detector when it does so.

Fixes #6875.

R=golang-dev, dave
CC=golang-dev
https://golang.org/cl/36360043
2013-12-02 18:03:25 -08:00
Keith Randall
c792bde9ef runtime: don't use ... formal argument to deferreturn.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/28860043
2013-12-02 13:07:15 -08:00
Keith Randall
3278dc158e runtime: pass key/value to map accessors by reference, not by value.
This change is part of the plan to get rid of all vararg C calls
which are a pain for getting exact stack scanning.

We allocate a chunk of zero memory to return a pointer to when a
map access doesn't find the key.  This is simpler than returning nil
and fixing things up in the caller.  Linker magic allocates a single
zero memory area that is shared by all (non-reflect-generated) map
types.

Passing things by reference gets rid of some copies, so it speeds
up code with big keys/values.

benchmark             old ns/op    new ns/op    delta
BenchmarkBigKeyMap           34           31   -8.48%
BenchmarkBigValMap           37           30  -18.62%
BenchmarkSmallKeyMap         26           23  -11.28%

R=golang-dev, dvyukov, khr, rsc
CC=golang-dev
https://golang.org/cl/14794043
2013-12-02 13:05:04 -08:00
Russ Cox
2c98a3bc2e cmd/5l, runtime: fix divide for profiling tracebacks on ARM
Two bugs:
1. The first iteration of the traceback always uses LR when provided,
which it is (only) during a profiling signal, but in fact LR is correct
only if the stack frame has not been allocated yet. Otherwise an
intervening call may have changed LR, and the saved copy in the stack
frame should be used. Fix in traceback_arm.c.

2. The division runtime call adds 8 bytes to the stack. In order to
keep the traceback routines happy, it must copy the saved LR into
the new 0(SP). Change

        SUB $8, SP

into

        MOVW    0(SP), R11 // r11 is temporary, for use by linker
        MOVW.W  R11, -8(SP)

to update SP and 0(SP) atomically, so that the traceback always
sees a saved LR at 0(SP).

Fixes #6681.

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/19910044
2013-10-31 18:15:55 +00:00
Russ Cox
b88148b9a0 undo CL 19810043 / 352f3b7c9664
The CL causes misc/cgo/test to fail randomly.
I suspect that the problem is the use of a division instruction
in usleep, which can be called while trying to acquire an m
and therefore cannot store the denominator in m.
The solution to that would be to rewrite the code to use a
magic multiply instead of a divide, but now we're getting
pretty far off the original code.

Go back to the original in preparation for a different,
less efficient but simpler fix.

««« original CL description
cmd/5l, runtime: make ARM integer division profiler-friendly

The implementation of division constructed non-standard
stack frames that could not be handled by the traceback
routines.

CL 13239052 left the frames non-standard but fixed them
for the specific case of a divide-by-zero panic.
A profiling signal can arrive at any time, so that fix
is not sufficient.

Change the division to store the extra argument in the M struct
instead of in a new stack slot. That keeps the frames bog standard
at all times.

Also fix a related bug in the traceback code: when starting
a traceback, the LR register should be ignored if the current
function has already allocated its stack frame and saved the
original LR on the stack. The stack copy should be used, as the
LR register may have been modified.

Combined, these make the torture test from issue 6681 pass.

Fixes #6681.

R=golang-dev, r, josharian
CC=golang-dev
https://golang.org/cl/19810043
»»»

TBR=r
CC=golang-dev
https://golang.org/cl/20350043
2013-10-31 17:18:57 +00:00
Russ Cox
b0db472ea2 cmd/5l, runtime: make ARM integer division profiler-friendly
The implementation of division constructed non-standard
stack frames that could not be handled by the traceback
routines.

CL 13239052 left the frames non-standard but fixed them
for the specific case of a divide-by-zero panic.
A profiling signal can arrive at any time, so that fix
is not sufficient.

Change the division to store the extra argument in the M struct
instead of in a new stack slot. That keeps the frames bog standard
at all times.

Also fix a related bug in the traceback code: when starting
a traceback, the LR register should be ignored if the current
function has already allocated its stack frame and saved the
original LR on the stack. The stack copy should be used, as the
LR register may have been modified.

Combined, these make the torture test from issue 6681 pass.

Fixes #6681.

R=golang-dev, r, josharian
CC=golang-dev
https://golang.org/cl/19810043
2013-10-30 18:50:34 +00:00