mirror of https://github.com/golang/go synced 2024-10-04 08:21:22 -06:00
Commit Graph

284 Commits

Author SHA1 Message Date
Dmitriy Vyukov
a12661329b runtime: fix triggering of forced GC
mstats.last_gc is Unix time now, but it is compared with abstract monotonic time.
On my machine GC is forced every 5 minutes regardless of last_gc.
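
For illustration, the shape of the buggy comparison (a sketch with invented names, not the actual runtime code):

        // lastGC holds Unix nanoseconds; monotonicNow comes from the
        // abstract monotonic clock, so the difference below is meaningless
        // and the forced-GC threshold misfires.
        func needForcedGC(lastGC, monotonicNow, period int64) bool {
                return monotonicNow-lastGC > period // comparing two different clocks
        }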

LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, iant, rsc
https://golang.org/cl/91350045
2014-05-13 09:53:03 +04:00
Dmitriy Vyukov
acb03b8028 runtime: optimize markspan
Increases throughput by 2x on a memory hungry program on 8-node NUMA machine.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/100230043
2014-05-07 19:32:34 +04:00
Dmitriy Vyukov
350a8fcde1 runtime: make MemStats.LastGC Unix time again
The monotonic clock patch changed all runtime times
to abstract monotonic time. As a result, the user-visible
MemStats.LastGC became monotonic time as well.
Restore Unix time for LastGC.

This is the simplest way I found to expose time.now to the runtime.
Another option would be to change time.now to a C-called
int64 runtime.unixnanotime() and then express time.now in terms of it.
But this would require introducing two 64-bit divisions into time.now.
Yet another option would be to change time.now to a C-called
void runtime.unixnanotime1(struct {int64 sec, int32 nsec} *now)
and then express both time.now and runtime.unixnanotime in terms of it.
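
For reference, once LastGC is Unix time again it can be consumed directly with the standard library:

        package main

        import (
                "fmt"
                "runtime"
                "time"
        )

        func main() {
                runtime.GC()
                var ms runtime.MemStats
                runtime.ReadMemStats(&ms)
                fmt.Println("last GC:", time.Unix(0, int64(ms.LastGC)))
        }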

Fixes #7852.

LGTM=minux.ma, iant
R=minux.ma, rsc, iant
CC=golang-codereviews
https://golang.org/cl/93720045
2014-05-02 17:32:42 +01:00
Russ Cox
a5b1530557 runtime: adjust GC debug print to include source pointers
Having the pointers means you can grub around in the
binary finding out more about them.

This helped with issue 7748.

LGTM=minux.ma, bradfitz
R=golang-codereviews, minux.ma, bradfitz
CC=golang-codereviews
https://golang.org/cl/88090045
2014-04-16 11:39:43 -04:00
Dmitriy Vyukov
55e0f36fb4 runtime: fix program termination when main goroutine calls Goexit
Do not consider idle finalizer/bgsweep/timer goroutines as doing something useful.
We can't simply set isbackground for the whole lifetime of these goroutines,
because when the finalizer goroutine calls a user function, we do want to consider it
as doing something useful.
This has been broken by timers for quite some time.
With the background sweep it has become even more broken.
Fixes #7784.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/87960044
2014-04-15 19:48:17 +04:00
Dmitriy Vyukov
8fc6ed4c89 sync: less aggressive local caching in Pool
Currently Pool can cache up to 15 elements per P, and these elements are not accessible to other Ps.
If a Pool caches large objects, say 2MB, and GOMAXPROCS is set to a large value, say 32,
then the Pool can waste up to 960MB.
The new caching policy caches at most 1 per-P element; the rest is shared between Ps.
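
The 960MB figure is just the product of the old limits; as a worked check:

        // worst case under the old policy (illustrative arithmetic):
        const (
                elemsPerP = 15      // old per-P cache limit
                elemSize  = 2 << 20 // 2MB objects, as above
                numP      = 32      // large GOMAXPROCS, as above
        )
        const wasted = elemsPerP * elemSize * numP // 960 << 20 bytes = 960MB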

Get/Put performance is unchanged. Nested Get/Put performance is 57% worse.
However, overall scalability of nested Get/Put is significantly improved,
so the new policy starts winning under contention.

benchmark                     old ns/op     new ns/op     delta
BenchmarkPool                 27.4          26.7          -2.55%
BenchmarkPool-4               6.63          6.59          -0.60%
BenchmarkPool-16              1.98          1.87          -5.56%
BenchmarkPool-64              1.93          1.86          -3.63%
BenchmarkPoolOverflow         3970          6235          +57.05%
BenchmarkPoolOverflow-4       10935         1668          -84.75%
BenchmarkPoolOverflow-16      13419         520           -96.12%
BenchmarkPoolOverflow-64      10295         380           -96.31%

LGTM=rsc
R=rsc
CC=golang-codereviews, khr
https://golang.org/cl/86020043
2014-04-14 21:13:32 +04:00
Russ Cox
5539ef02b6 runtime: make times in GODEBUG=gctrace=1 output clearer
TBR=0intro
CC=golang-codereviews
https://golang.org/cl/86620043
2014-04-10 14:34:48 -04:00
Russ Cox
95ee7d6414 runtime: use 3x fewer nanotime calls in garbage collection
Cuts the number of calls from 6 to 2 in the non-debug case.

LGTM=iant
R=golang-codereviews, iant
CC=0intro, aram, golang-codereviews, khr
https://golang.org/cl/86040043
2014-04-09 10:38:12 -04:00
Russ Cox
72c5d5e756 reflect, runtime: fix crash in GC due to reflect.call + precise GC
Given
        type Outer struct {
                *Inner
                ...
        }
the compiler generates the implementation of (*Outer).M dispatching to
the embedded Inner. The implementation is logically:
        func (p *Outer) M() {
                (p.Inner).M()
        }
but since the only change here is the replacement of one pointer
receiver with another, the actual generated code overwrites the
original receiver with the p.Inner pointer and then jumps to the M
method expecting the *Inner receiver.

During reflect.Value.Call, we create an argument frame and the
associated data structures to describe it to the garbage collector,
populate the frame, call reflect.call to run a function call using
that frame, and then copy the results back out of the frame. The
reflect.call function does a memmove of the frame structure onto the
stack (to set up the inputs), runs the call, and then memmoves the
stack back to the frame structure (to preserve the outputs).

Originally reflect.call did not distinguish inputs from outputs: both
memmoves were for the full stack frame. However, in the case where the
called function was one of these wrappers, the rewritten receiver is
almost certainly a different type than the original receiver. This is
not a problem on the stack, where we use the program counter to
determine the type information and understand that during (*Outer).M
the receiver is an *Outer while during (*Inner).M the receiver in the
same memory word is now an *Inner. But in the statically typed
argument frame created by reflect, the receiver is always an *Outer.
Copying the modified receiver pointer off the stack into the frame
will store an *Inner there, and then if a garbage collection happens
to scan that argument frame before it is discarded, it will scan the
*Inner memory as if it were an *Outer. If the two have different
memory layouts, the collection will interpret the memory incorrectly.

Fix by only copying back the results.
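
A minimal program that exercises the wrapper path described above (the types mirror the sketch at the top of this message):

        package main

        import (
                "fmt"
                "reflect"
        )

        type Inner struct{ x int }

        func (p *Inner) M() { fmt.Println(p.x) }

        type Outer struct {
                *Inner
                extra [16]byte // make the two layouts differ
        }

        func main() {
                o := &Outer{Inner: &Inner{x: 7}}
                // reflect builds an argument frame typed as *Outer; the generated
                // wrapper rewrites the receiver word to the embedded *Inner before
                // jumping to (*Inner).M, which is what confused the collector.
                reflect.ValueOf(o).MethodByName("M").Call(nil)
        }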

Fixes #7725.

LGTM=khr
R=khr
CC=dave, golang-codereviews
https://golang.org/cl/85180043
2014-04-08 11:11:35 -04:00
Russ Cox
28f1868fed cmd/gc, runtime: make GODEBUG=gcdead=1 mode work with liveness
Trying to make GODEBUG=gcdead=1 work with liveness
and in particular ambiguously live variables.

1. In the liveness computation, mark all ambiguously live
variables as live for the entire function, except the entry.
They are zeroed directly after entry, and we need them not
to be poisoned thereafter.

2. In the liveness computation, compute liveness (and deadness)
for all parameters, not just pointer-containing parameters.
Otherwise gcdead poisons untracked scalar parameters and results.

3. Fix liveness debugging print for -live=2 to use correct bitmaps.
(Was not updated for compaction during compaction CL.)

4. Correct varkill during map literal initialization.
Was killing the map itself instead of the inserted value temp.

5. Disable aggressive varkill cleanup for call arguments if
the call appears in a defer or go statement.

6. In the garbage collector, avoid a bug when scanning empty
strings. An empty string is two zeros. The multiword
code only looked at the first zero and then interpreted
the next two bits in the bitmap as an ordinary word bitmap.
For a string the bits are 11 00, so if a live string was zero
length with a 0 base pointer, the poisoning code treated
the length as an ordinary word with code 00, meaning it
needed poisoning, turning the string into a poison-length
string with base pointer 0. By the same logic I believe that
a live nil slice (bits 11 01 00) will have its cap poisoned.
Always scan full multiword struct.

7. In the runtime, treat both poison words (PoisonGC and
PoisonStack) as invalid pointers that warrant crashes.

Manual testing as follows:

- Create a script called gcdead on your PATH containing:

        #!/bin/bash
        GODEBUG=gcdead=1 GOGC=10 GOTRACEBACK=2 exec "$@"
- Now you can build a test and then run 'gcdead ./foo.test'.
- More importantly, you can run 'go test -short -exec gcdead std'
   to run all the tests.

Fixes #7676.

While here, enable the precise scanning of slices, since that was
disabled due to bugs like these. That now works, both with and
without gcdead.

Fixes #7549.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83410044
2014-04-03 20:33:25 -04:00
Russ Cox
4676fae525 cmd/gc, cmd/ld, runtime: compact liveness bitmaps
Reduce footprint of liveness bitmaps by about 5x.

1. Mark all liveness bitmap symbols as 4-byte aligned
(they were aligned to a larger size by default).

2. The bitmap data is a bitmap count n followed by n bitmaps.
Each bitmap begins with its own count m giving the number
of bits. All the m's are the same for the n bitmaps.
Emit this bitmap length once instead of n times.

3. Many bitmaps within a function have the same bit values,
but each call site was given a distinct bitmap. Merge duplicate
bitmaps so that no bitmap is written more than once.

4. Many functions end up with the same aggregate bitmap data.
We used to name the bitmap data funcname.gcargs and funcname.gclocals.
Instead, name it gclocals.<md5 of data> and mark it dupok so
that the linker coalesces duplicate sets. This cut the bitmap
data remaining after step 3 by 40%; I was not expecting it to
be quite so dramatic.

Applied to "go build -ldflags -w code.google.com/p/go.tools/cmd/godoc":

                bitmaps           pclntab           binary on disk
before this CL  1326600           1985854           12738268
4-byte align    1154288 (0.87x)   1985854 (1.00x)   12566236 (0.99x)
one bitmap len   782528 (0.54x)   1985854 (1.00x)   12193500 (0.96x)
dedup bitmap     414748 (0.31x)   1948478 (0.98x)   11787996 (0.93x)
dedup bitmap set 245580 (0.19x)   1948478 (0.98x)   11620060 (0.91x)

While here, remove various dead blocks of code from plive.c.

Fixes #6929.
Fixes #7568.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83630044
2014-04-02 16:49:27 -04:00
Russ Cox
1ec4d5e9e7 runtime: adjust GODEBUG=allocfreetrace=1 and GODEBUG=gcdead=1
GODEBUG=allocfreetrace=1:

The allocfreetrace=1 mode prints a stack trace for each block
allocated and freed, and also a stack trace for each garbage collection.

It was implemented by reusing the heap profiling support: if allocfreetrace=1
then the heap profile was effectively running at 1 sample per 1 byte allocated
(always sample). The stack being shown at allocation was the stack gathered
for profiling, meaning it was derived only from the program counters and
did not include information about function arguments or frame pointers.
The stack being shown at free was the allocation stack, not the free stack.
If you are generating this log, you can find the allocation stack yourself, but
it can be useful to see exactly the sequence that led to freeing the block:
was it the garbage collector or an explicit free? Now that the garbage collector
runs on an m0 stack, the stack trace for the garbage collector is never interesting.

Fix all these problems:

1. Decouple allocfreetrace=1 from heap profiling.
2. Print the standard goroutine stack traces instead of a custom format.
3. Print the stack trace at time of allocation for an allocation,
   and print the stack trace at time of free (not the allocation trace again)
   for a free.
4. Print all goroutine stacks at garbage collection. Having all the stacks
   means that you can see the exact point at which each goroutine was
   preempted, which is often useful for identifying liveness-related errors.

GODEBUG=gcdead=1:

This mode overwrites dead pointers with a poison value.
Detect the poison value as an invalid pointer during collection,
the same way that small integers are invalid pointers.
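
A minimal way to exercise both modes (the GODEBUG values are the ones named above; the program itself is illustrative):

        package main

        import "runtime"

        // Run as: GODEBUG=allocfreetrace=1 ./prog  (or GODEBUG=gcdead=1 ./prog)
        func main() {
                for i := 0; i < 4; i++ {
                        _ = make([]byte, 1<<20) // each allocation and free appears in the trace
                        runtime.GC()            // each collection prints all goroutine stacks
                }
        }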

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/81670043
2014-04-01 13:30:10 -04:00
Russ Cox
5a23a7e52c runtime: enable 'bad pointer' check during garbage collection of Go stack frames
This is the same check we use during stack copying.
The check cannot be applied to C stack frames, even
though we do emit pointer bitmaps for the arguments,
because (1) the pointer bitmaps assume all arguments
are always live, which is not true of outputs during the prologue,
and (2) the pointer bitmaps encode interface values as
pointer pairs, which is not true of interfaces holding integers.

For the rest of the frames, however, we should hold ourselves
to the rule that a pointer marked live really is initialized.
The interface scanning already implicitly checks this
because it interprets the type word as a valid type pointer.

This may slow things down a little because of the extra loads.
Or it may speed things up because we don't bother enqueuing
nil pointers anymore. Enough of the rest of the system is slow
right now that we can't measure it meaningfully.
Enable for now, even if it is slow, to shake out bugs in the
liveness bitmaps, and then decide whether to turn it off
for the Go 1.3 release (issue 7650 reminds us to do this).

The new m->traceback field lets us force printing of fp=
values on all goroutine stack traces when we detect a
bad pointer. This makes it easier to understand exactly
where in the frame the bad pointer is, so that we can trace
it back to a specific variable and determine what is wrong.

Update #7650

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/80860044
2014-03-27 14:06:15 -04:00
Dmitriy Vyukov
f8c350873c runtime: fix yet another race in bgsweep
Currently it's possible that bgsweep finishes before all spans
have been swept (we only know that sweeping of all spans has *started*).
In such a case bgsweep may fail to wake up the runfinq goroutine when it needs to.
finq may still be nil at this point, but some finalizers may be queued later.
Make bgsweep wait for sweeping to *complete*; then it can decide for sure
whether it needs to wake up runfinq.
Update #7533

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75960043
2014-03-26 15:11:36 +04:00
Dmitriy Vyukov
40f5e67571 runtime: minor improvement of string scanning
If we set obj, then it will be enqueued for marking at the end of the scanning loop.
This is not necessary, since we've already marked it.
This can wait for 1.4 if you wish.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/80030043
2014-03-26 15:03:58 +04:00
Keith Randall
fff63c2448 runtime: WriteHeapDump dumps the heap to a file.
See http://golang.org/s/go13heapdump for the file format.

LGTM=rsc
R=rsc, bradfitz, dvyukov, khr
CC=golang-codereviews
https://golang.org/cl/37540043
2014-03-25 15:09:49 -07:00
Keith Randall
1b45cc45e3 runtime: redo stack map entries to avoid false retention
Change two-bit stack map entries to encode:
0 = dead
1 = scalar
2 = pointer
3 = multiword

If multiword, the two-bit entry for the following word encodes:
0 = string
1 = slice
2 = iface
3 = eface

That way, during stack scanning we can check if a string
is zero length or a slice has zero capacity.  We can avoid
following the contained pointer in those cases.  It is safe
to do so because it can never be dereferenced, and it is
desirable to do so because it may cause false retention
of the following block in memory.
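
A sketch of decoding these entries (constant names are illustrative; the runtime's own may differ):

        // first two-bit entry per stack word:
        const (
                bitsDead      = 0
                bitsScalar    = 1
                bitsPointer   = 2
                bitsMultiWord = 3
        )

        // if bitsMultiWord, the next two-bit entry encodes the kind:
        const (
                bitsString = 0
                bitsSlice  = 1
                bitsIface  = 2
                bitsEface  = 3
        )

        // entry extracts the i'th two-bit entry from a packed bitmap
        // (four entries per byte):
        func entry(bitmap []byte, i int) byte {
                return bitmap[i/4] >> (uint(i%4) * 2) & 3
        }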

Slice feature turned off until issue 7564 is fixed.

Update #7549

LGTM=rsc
R=golang-codereviews, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/76380043
2014-03-25 14:11:34 -07:00
Ian Lance Taylor
4ebfa83199 runtime: accurately record whether heap memory is reserved
The existing code did not have a clear notion of whether
memory has actually been reserved.  It checked based on
whether it was in 32-bit or 64-bit mode and (on GNU/Linux) on the
requested address, but it confused the requested address with
the returned address.

LGTM=rsc
R=rsc, dvyukov
CC=golang-codereviews, michael.hudson
https://golang.org/cl/79610043
2014-03-25 13:22:19 -07:00
Ian Lance Taylor
8de04c78b7 runtime: change nproc local variable to uint32
The nproc and ndone fields are uint32.  This makes the type
consistent.

LGTM=minux.ma
R=golang-codereviews, minux.ma
CC=golang-codereviews
https://golang.org/cl/79340044
2014-03-25 05:18:08 -07:00
Dmitriy Vyukov
fed5428c4a runtime: fix another race in bgsweep
It's possible that bgsweep constantly does not catch up for some reason;
in this case runfinq was not woken at all.

R=rsc
CC=golang-codereviews
https://golang.org/cl/75940043
2014-03-14 23:32:12 +04:00
Dmitriy Vyukov
8d321625fd runtime: fix spans corruption
The problem was that spans end up in the wrong lists after a split
(e.g. in h->busy instead of h->central->empty).
Also, the span can be unswept before the split;
I don't know what that can cause, but it's safer to operate on swept spans.
Fixes #7544.

R=golang-codereviews, rsc
CC=golang-codereviews, khr
https://golang.org/cl/76160043
2014-03-14 23:25:48 +04:00
Dmitriy Vyukov
0da73b9f07 runtime: fix a race in bgsweep
See the comment for description.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75670044
2014-03-14 21:21:44 +04:00
Russ Cox
54c901cd08 runtime: fix empty string handling in garbage collector
The garbage collector uses type information to guide the
traversal of the heap. If it sees a field that should be a string,
it marks the object pointed at by the string data pointer as
visited but does not bother to look at the data, because
strings contain bytes, not pointers.

If you save s[len(s):] somewhere, though, the string data pointer
actually points just beyond the string data; if the string data
were exactly the size of an allocated block, the string data
pointer would actually point at the next block. It is incorrect
to mark that next block as visited and not bother to look at
the data, because the next block may be some other type
entirely.

The fix is to ignore strings with zero length during collection:
they are empty and can never become non-empty, so the base
pointer will never be used again. The handling of slices already
does this (but using cap instead of len).
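
A small demonstration of why s[len(s):] is dangerous for the collector; the header layout is inspected via unsafe and reflects the implementation at the time, not a language guarantee:

        package main

        import (
                "fmt"
                "unsafe"
        )

        type stringHeader struct { // mirrors the runtime's string layout
                data uintptr
                len  int
        }

        func main() {
                s := "hello"
                tail := s[len(s):] // empty, but its data pointer is one past the end of s
                hs := (*stringHeader)(unsafe.Pointer(&s))
                ht := (*stringHeader)(unsafe.Pointer(&tail))
                // true: tail's base pointer sits just past s's allocation
                fmt.Println(ht.data == hs.data+uintptr(len(s)))
        }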

This was not a bug in Go 1.2, because until January all string
allocations included a trailing NUL byte not included in the
length, so s[len(s):] still pointed inside the string allocation
(at the NUL).

This bug was causing the crashes in test/run.go. Specifically,
the parsing of a regexp in package regexp/syntax allocated a
[]syntax.Inst with rounded size 1152 bytes. In fact it
allocated many such slices, because during the processing of
test/index2.go it creates thousands of regexps that are all
approximately the same complexity. That takes a long time, and
test/run works on other tests in other goroutines. One such
other test is chan/perm.go, which uses a 1152-byte source
file. test/run reads that file into a []byte and then calls
strings.Split(string(src), "\n"). The string(src) creates a
1152-byte string - and there's a very good chance of it
landing next to one of the many many regexp slices already
allocated - and then because the file ends in a \n,
strings.Split records the tail empty string as the final
element in the slice. A garbage collection happens at this
point, the collection finds that string before encountering
the []syntax.Inst data it now inadvertently points to, and the
[]syntax.Inst data is not scanned for the pointers that it
contains. Each syntax.Inst contains a []rune, those are
missed, and the backing rune arrays are freed for reuse. When
the regexp is later executed, the runes being searched for are
no longer runes at all, and there is no match, even on text
that should match.

On 64-bit machines the pointer in the []rune inside the
syntax.Inst is larger (along with a few other pointers),
pushing the []syntax.Inst backing array into a larger size
class, avoiding the collision with chan/perm.go's
inadvertently sized file.

I expect this was more prevalent on OS X than on Linux or
Windows because those managed to run faster or slower and
didn't overlap index2.go with chan/perm.go as often. On the
ARM systems, we only run one errorcheck test at a time, so
index2 and chan/perm would never overlap.

It is possible that this bug is the root cause of other crashes
as well. For now we only know it is the cause of the test/run crash.

Many thanks to Dmitriy for help debugging.

Fixes #7344.
Fixes #7455.

LGTM=r, dvyukov, dave, iant
R=golang-codereviews, dave, r, dvyukov, delpontej, iant
CC=golang-codereviews, khr
https://golang.org/cl/74250043
2014-03-11 23:58:39 -04:00
Dmitriy Vyukov
3877f1d9c8 runtime: remove atomic CAS loop from marknogc
Spans are now private to threads, and the loop
is removed from all other functions.
Remove it from marknogc for consistency.

LGTM=khr, rsc
R=golang-codereviews, bradfitz, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/72520043
2014-03-11 17:35:49 +04:00
Dmitriy Vyukov
38f6c3f59d runtime: wipe out bitSpecial from GC code
LGTM=khr, rsc
R=golang-codereviews, bradfitz, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/72480044
2014-03-11 17:33:03 +04:00
Keith Randall
1306279cd1 runtime: remove unused declarations.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/73720044
2014-03-10 16:02:46 -07:00
Russ Cox
b08156cd87 runtime: fix memory leak in runfinq
One reason the sync.Pool finalizer test can fail is that
this function's ef1 contains uninitialized data that just
happens to point at some of the old pool. I've seen this cause
retention of a single pool cache line (32 elements) on arm.

Really we need liveness information for C functions, but
for now we can be more careful about data in long-lived
C functions that block.

LGTM=bradfitz, dvyukov
R=golang-codereviews, bradfitz, dvyukov
CC=golang-codereviews, iant, khr
https://golang.org/cl/72490043
2014-03-07 11:27:01 -05:00
Russ Cox
da1bea0ef0 runtime: fix malloc page alignment + efence
Two memory allocator bug fixes.

- efence is not maintaining the proper heap metadata
  to make eventual memory reuse safe, so use SysFault.

- now that our heap PageSize is 8k but most hardware
  uses 4k pages, SysAlloc and SysReserve results must be
  explicitly aligned. Do that in a few more call sites and
  document this fact in malloc.h.

Fixes #7448.

LGTM=iant
R=golang-codereviews, josharian, iant
CC=dvyukov, golang-codereviews
https://golang.org/cl/71750048
2014-03-06 18:34:29 -05:00
David du Colombier
fb5e1e1fa1 runtime: fix warnings on Plan 9
warning: pkg/runtime/mgc0.c:2352 format mismatch p UVLONG, arg 2
warning: pkg/runtime/mgc0.c:2352 format mismatch p UVLONG, arg 3

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/71950044
2014-03-06 20:56:22 +01:00
Dmitriy Vyukov
aea99eda0f runtime: fix runaway memory usage
It was caused by mstats.heap_alloc skew.
Fixes #7430.

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/69870055
2014-03-06 21:33:00 +04:00
Keith Randall
e9445547b6 runtime: move stack shrinking until after sweepgen is incremented.
Before GC, we flush all the per-P allocation caches.  Doing
stack shrinking mid-GC causes these caches to fill up.  At the
end of GC, the sweepgen is incremented, which causes all of the
data in these caches to be in a bad state (cached but not yet
swept).

Move the stack shrinking until after sweepgen is incremented,
so any caching that happens as part of shrinking is done with
already-swept data.

Reenable stack copying.

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/69620043
2014-02-27 14:20:15 -08:00
Keith Randall
1665b006a5 runtime: grow stack by copying
On stack overflow, if all frames on the stack are
copyable, we copy the frames to a new stack twice
as large as the old one.  During GC, if a G is using
less than 1/4 of its stack, copy the stack to a stack
half its size.
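
The resize policy above, as a sketch (illustrative, not the actual runtime code):

        func newStackSize(oldSize, used uintptr, overflow bool) uintptr {
                if overflow {
                        return oldSize * 2 // grow: copy frames to a stack twice as large
                }
                if used < oldSize/4 {
                        return oldSize / 2 // shrink: GC halves underused stacks
                }
                return oldSize
        }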

TODO
- Do something about C frames.  When a C frame is in the
  stack segment, it isn't copyable.  We allocate a new segment
  in this case.
  - For idempotent C code, we can abort it, copy the stack,
    then retry.  I'm working on a separate CL for this.
  - For other C code, we can raise the stackguard
    to the lowest Go frame so the next call that Go frame
    makes triggers a copy, which will then succeed.
- Pick a starting stack size?

The plan is that eventually we reach a point where the
stack contains only copyable frames.

LGTM=rsc
R=dvyukov, rsc
CC=golang-codereviews
https://golang.org/cl/54650044
2014-02-26 23:28:44 -08:00
Keith Randall
3b5278fca6 runtime: get rid of the settype buffer and lock.
MCaches now hold an MSpan for each size class, which they have
exclusive access to allocate from, so no lock is needed.

Modifying the heap bitmaps also no longer requires a cas.

runtime.free gets more expensive.  But we don't use it
much any more.

It's not much faster on 1 processor, but it's a lot
faster on multiple processors.

benchmark                 old ns/op    new ns/op    delta
BenchmarkSetTypeNoPtr1           24           23   -0.42%
BenchmarkSetTypeNoPtr2           33           34   +0.89%
BenchmarkSetTypePtr1             51           49   -3.72%
BenchmarkSetTypePtr2             55           54   -1.98%

benchmark                old ns/op    new ns/op    delta
BenchmarkAllocation          52739        50770   -3.73%
BenchmarkAllocation-2        33957        34141   +0.54%
BenchmarkAllocation-3        33326        29015  -12.94%
BenchmarkAllocation-4        38105        25795  -32.31%
BenchmarkAllocation-5        68055        24409  -64.13%
BenchmarkAllocation-6        71544        23488  -67.17%
BenchmarkAllocation-7        68374        23041  -66.30%
BenchmarkAllocation-8        70117        20758  -70.40%

LGTM=rsc, dvyukov
R=dvyukov, bradfitz, khr, rsc
CC=golang-codereviews
https://golang.org/cl/46810043
2014-02-26 15:52:58 -08:00
Dave Cheney
7c8280c9ef all: merge NaCl branch (part 1)
See golang.org/s/go13nacl for design overview.

This CL is the mostly mechanical changes from rsc's Go 1.2 based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support; there are probably two or three more large merges to come.

CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.

The exact change lists included are

15050047: syscall: support for Native Client
15360044: syscall: unzip implementation for Native Client
15370044: syscall: Native Client SRPC implementation
15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
15410048: runtime: support for Native Client
15410049: syscall: file descriptor table for Native Client
15410050: syscall: in-memory file system for Native Client
15440048: all: update +build lines for Native Client port
15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
15570045: os: support for Native Client
15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
15690044: net: support for Native Client
15690048: runtime: support for fake time like on Go Playground
15690051: build: disable various tests on Native Client

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/68150047
2014-02-25 09:47:42 -05:00
Dmitriy Vyukov
ea87501750 runtime: fix heap memory corruption
With concurrent sweeping, finc is modified by runfinq and queuefinalizer concurrently.
Fixes crashes like this one:
http://build.golang.org/log/6ad7b59ef2e93e3c9347eabfb4c4bd66df58fd5a
Fixes #7324.
Update #7396

LGTM=rsc
R=golang-codereviews, minux.ma, rsc
CC=golang-codereviews, khr
https://golang.org/cl/67980043
2014-02-24 20:53:50 +04:00
Dmitriy Vyukov
6e612ae0f5 runtime: fix potential memory corruption
Reinforce the guarantee that MSpan_EnsureSwept actually ensures that the span is swept.
I have not observed crashes related to this, but I do not see why it could not crash as well.

LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, khr, rsc
https://golang.org/cl/67990043
2014-02-24 20:53:20 +04:00
Dmitriy Vyukov
0ef0d6cd7b runtime: fix double symbol definition
runfinqv is already defined the same way on line 271.
There may also be something to fix in compiler/linker wrt diagnostics.
Fixes #7375.

LGTM=bradfitz
R=golang-codereviews, dave, bradfitz
CC=golang-codereviews
https://golang.org/cl/67850044
2014-02-24 20:23:03 +04:00
Russ Cox
67c83db60d runtime: use goc2c as much as possible
Package runtime's C functions written to be called from Go
started out written in C using carefully constructed argument
lists and the FLUSH macro to write a result back to memory.

For some functions, the appropriate parameter list ended up
being architecture-dependent due to differences in alignment,
so we added 'goc2c', which takes a .goc file containing Go func
declarations but C bodies, rewrites the Go func declaration to
equivalent C declarations for the target architecture, adds the
needed FLUSH statements, and writes out an equivalent C file.
That C file is compiled as part of package runtime.

Native Client's x86-64 support introduces the most complex
alignment rules yet, breaking many functions that could until
now be portably written in C. Using goc2c for those avoids the
breakage.

Separately, Keith's work on emitting stack information from
the C compiler would require the hand-written functions
to add #pragmas specifying how many arguments are result
parameters. Using goc2c for those avoids maintaining #pragmas.

For both reasons, use goc2c for as many Go-called C functions
as possible.
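
For context, the rough shape of a .goc file: a Go func header with a C body, which goc2c expands into architecture-correct C (the function here is invented for illustration):

        package runtime
        #include "runtime.h"

        func example(x uint32, y uint32) (sum uint32) {
                sum = x + y;
        }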

This CL is a replay of the bulk of CL 15400047 and CL 15790043,
both of which were reviewed as part of the NaCl port and are
checked in to the NaCl branch. This CL is part of bringing the
NaCl code into the main tree.

No new code here, just reformatting and occasional movement
into .h files.

LGTM=r
R=dave, alex.brainman, r
CC=golang-codereviews
https://golang.org/cl/65220044
2014-02-20 15:58:47 -05:00
Russ Cox
86e3cb8da5 runtime: introduce MSpan.needzero instead of writing to span data
This cleans up the code significantly, and it avoids any
possible problems with madvise zeroing out some but
not all of the data.

Fixes #6400.

LGTM=dave
R=dvyukov, dave
CC=golang-codereviews
https://golang.org/cl/57680046
2014-02-13 11:10:31 -05:00
Dmitriy Vyukov
f8e4a2ef94 runtime: fix concurrent GC sweep
The issue was that one of the MSpan_Sweep callers
was doing sweep with preemption enabled.
Additional checks are added.

LGTM=rsc
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/62990043
2014-02-13 19:36:45 +04:00
Russ Cox
73a304356b runtime: fix non-concurrent sweep
State of the world:

CL 46430043 introduced a new concurrent sweep but is broken.

CL 62360043 made the new sweep non-concurrent
to try to fix the world while we understand what's wrong with
the concurrent version.

This CL fixes the non-concurrent form to run finalizers.
This CL is just a band-aid to get the build green again.

Dmitriy is working on understanding and then fixing what's
wrong with the concurrent sweep.

TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/62370043
2014-02-12 15:54:21 -05:00
Dmitriy Vyukov
3cac829ff4 runtime: temporary disable concurrent GC sweep
We see failures on builders, e.g.:
http://build.golang.org/log/70bb28cd6bcf8c4f49810a011bb4337a61977bf4

LGTM=rsc, dave
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/62360043
2014-02-13 00:03:27 +04:00
Dmitriy Vyukov
3c3be62201 runtime: concurrent GC sweep
Moves the sweep phase out of stoptheworld by adding a
background sweeper goroutine and lazy on-demand sweeping.

It turned out to be somewhat trickier than I expected,
because there is no point in time when we know the size of the live heap
or a consistent number of mallocs and frees.
So everything related to next_gc, mprof, memstats, etc. becomes trickier.

At the end of GC next_gc is conservatively set to heap_alloc*GOGC,
which is much larger than the real value. But after every sweep
next_gc is decremented by freed*GOGC. So when everything is swept,
next_gc becomes what it should be.

For mprof I had to introduce a 3-generation scheme (allocs, recent_allocs, prev_allocs),
because by the end of GC we know the number of frees for the *previous* GC.

Significant caution is required not to cross the yet-unknown real value of next_gc.
This is achieved by two means:
1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are
returned to the heap.
This provides quite strong guarantees that the heap does not grow when it should not.
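
A worked instance of that accounting, in the message's own shorthand:

        heap_alloc at end of GC = 100MB, of which 40MB is unswept garbage
        next_gc = 100MB * GOGC              (conservative: garbage still counts)
        sweeping eventually frees 40MB, so  next_gc -= 40MB * GOGC
        leaving next_gc = 60MB * GOGC, the value the live heap implies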

http-1
allocated                    7036         7033      -0.04%
allocs                         60           60      +0.00%
cputime                     51050        46700      -8.52%
gc-pause-one             34060569      1777993     -94.78%
gc-pause-total               2554          133     -94.79%
latency-50                 178448       170926      -4.22%
latency-95                 284350       198294     -30.26%
latency-99                 345191       220652     -36.08%
rss                     101564416    101007360      -0.55%
sys-gc                    6606832      6541296      -0.99%
sys-heap                 88801280     87752704      -1.18%
sys-other                 7334208      7405928      +0.98%
sys-stack                  524288       524288      +0.00%
sys-total               103266608    102224216      -1.01%
time                        50339        46533      -7.56%
virtual-mem             292990976    293728256      +0.25%

garbage-1
allocated                 2983818      2990889      +0.24%
allocs                      62880        62902      +0.03%
cputime                  16480000     16190000      -1.76%
gc-pause-one            828462467    487875135     -41.11%
gc-pause-total            4142312      2439375     -41.11%
rss                    1151709184   1153712128      +0.17%
sys-gc                   66068352     66068352      +0.00%
sys-heap               1039728640   1039728640      +0.00%
sys-other                37776064     40770176      +7.93%
sys-stack                 8781824      8781824      +0.00%
sys-total              1152354880   1155348992      +0.26%
time                     16496998     16199876      -1.80%
virtual-mem            1409564672   1402281984      -0.52%

LGTM=rsc
R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot
CC=golang-codereviews, khr
https://golang.org/cl/46430043
2014-02-12 22:16:42 +04:00
Dmitriy Vyukov
e48751e217 runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

This is the exact reincarnation of already LGTMed:
https://golang.org/cl/45770044
which must not break darwin/freebsd after:
https://golang.org/cl/56630043
TBR=iant

LGTM=khr, iant
R=iant, khr
CC=golang-codereviews
https://golang.org/cl/58230043
2014-01-30 13:28:19 +04:00
Dmitriy Vyukov
86a3a54284 runtime: fix windows build
Currently Windows crashes because early allocations in schedinit
try to allocate tiny memory blocks, but m->p is not yet set up.
I considered calling procresize(1) earlier in schedinit,
but this refactoring is better and should fix the issue as well.
Fixes #7218.

R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045
2014-01-28 00:26:56 +04:00
Dmitriy Vyukov
1fa7029425 runtime: combine small NoScan allocations
Combine NoScan allocations < 16 bytes into a single memory block.
This reduces the number of allocations on the json/garbage benchmarks by 10+%.
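
A hypothetical sketch of the combining scheme: a bump allocator inside a shared 16-byte noscan block (alignment handling omitted):

        type tinyBlock struct {
                buf [16]byte
                off int
        }

        // alloc returns a chunk of the shared block, or nil when the block
        // is exhausted and a fresh one must be started.
        func (b *tinyBlock) alloc(n int) []byte {
                if b.off+n > len(b.buf) {
                        return nil
                }
                p := b.buf[b.off : b.off+n]
                b.off += n
                return p
        }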

json-1
allocated                 8039872      7949194      -1.13%
allocs                     105774        93776     -11.34%
cputime                 156200000    100700000     -35.53%
gc-pause-one              4908873      3814853     -22.29%
gc-pause-total            2748969      2899288      +5.47%
rss                      52674560     43560960     -17.30%
sys-gc                    3796976      3256304     -14.24%
sys-heap                 43843584     35192832     -19.73%
sys-other                 5589312      5310784      -4.98%
sys-stack                  393216       393216      +0.00%
sys-total                53623088     44153136     -17.66%
time                    156193436    100886714     -35.41%
virtual-mem             256548864    256540672      -0.00%

garbage-1
allocated                 2996885      2932982      -2.13%
allocs                      62904        55200     -12.25%
cputime                  17470000     17400000      -0.40%
gc-pause-one            932757485    925806143      -0.75%
gc-pause-total            4663787      4629030      -0.75%
rss                    1151074304   1133670400      -1.51%
sys-gc                   66068352     65085312      -1.49%
sys-heap               1039728640   1024065536      -1.51%
sys-other                38038208     37485248      -1.45%
sys-stack                 8650752      8781824      +1.52%
sys-total              1152485952   1135417920      -1.48%
time                     17478088     17418005      -0.34%
virtual-mem            1343709184   1324204032      -1.45%

LGTM=iant, bradfitz
R=golang-codereviews, dave, iant, rsc, bradfitz
CC=golang-codereviews, khr
https://golang.org/cl/38750047
2014-01-24 22:35:11 +04:00
Dmitriy Vyukov
f8e0057bb7 sync: scalable Pool
Introduce fixed-size P-local caches.
When local caches overflow or underflow, a batch of items
is transferred to or from a global mutex-protected cache.
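
A rough sketch of the structure (names invented; the real code indexes local state by the current P and keeps the fast path lock-free; sync.Mutex is from package sync):

        const batch = 16 // illustrative local-cache / transfer size

        type pool struct {
                local  [][]interface{} // fixed-size per-P caches
                mu     sync.Mutex      // guards global
                global [][]interface{} // overflow batches shared by all Ps
        }

        func (p *pool) put(pid int, x interface{}) {
                l := p.local[pid]
                if len(l) < batch {
                        p.local[pid] = append(l, x)
                        return
                }
                p.mu.Lock()
                p.global = append(p.global, l) // move the full batch to the shared cache
                p.mu.Unlock()
                p.local[pid] = append(make([]interface{}, 0, batch), x)
        }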

benchmark                    old ns/op    new ns/op    delta
BenchmarkPool                    50554        22423  -55.65%
BenchmarkPool-4                 400359         5904  -98.53%
BenchmarkPool-16                403311         1598  -99.60%
BenchmarkPool-32                367310         1526  -99.58%

BenchmarkPoolOverflow             5214         3633  -30.32%
BenchmarkPoolOverflow-4          42663         9539  -77.64%
BenchmarkPoolOverflow-8          46919        11385  -75.73%
BenchmarkPoolOverflow-16         39454        13048  -66.93%

BenchmarkSprintfEmpty                    84           63  -25.68%
BenchmarkSprintfEmpty-2                 371           32  -91.13%
BenchmarkSprintfEmpty-4                 465           22  -95.25%
BenchmarkSprintfEmpty-8                 565           12  -97.77%
BenchmarkSprintfEmpty-16                498            5  -98.87%
BenchmarkSprintfEmpty-32                492            4  -99.04%

BenchmarkSprintfString                  259          229  -11.58%
BenchmarkSprintfString-2                574          144  -74.91%
BenchmarkSprintfString-4                651           77  -88.05%
BenchmarkSprintfString-8                868           47  -94.48%
BenchmarkSprintfString-16               825           33  -95.96%
BenchmarkSprintfString-32               825           30  -96.28%

BenchmarkSprintfInt                     213          188  -11.74%
BenchmarkSprintfInt-2                   448          138  -69.20%
BenchmarkSprintfInt-4                   624           52  -91.63%
BenchmarkSprintfInt-8                   691           31  -95.43%
BenchmarkSprintfInt-16                  724           18  -97.46%
BenchmarkSprintfInt-32                  718           16  -97.70%

BenchmarkSprintfIntInt                  311          282   -9.32%
BenchmarkSprintfIntInt-2                333          145  -56.46%
BenchmarkSprintfIntInt-4                642          110  -82.87%
BenchmarkSprintfIntInt-8                832           42  -94.90%
BenchmarkSprintfIntInt-16               817           24  -97.00%
BenchmarkSprintfIntInt-32               805           22  -97.17%

BenchmarkSprintfPrefixedInt             309          269  -12.94%
BenchmarkSprintfPrefixedInt-2           245          168  -31.43%
BenchmarkSprintfPrefixedInt-4           598           99  -83.36%
BenchmarkSprintfPrefixedInt-8           770           67  -91.23%
BenchmarkSprintfPrefixedInt-16          829           54  -93.49%
BenchmarkSprintfPrefixedInt-32          824           50  -93.83%

BenchmarkSprintfFloat                   418          398   -4.78%
BenchmarkSprintfFloat-2                 295          203  -31.19%
BenchmarkSprintfFloat-4                 585          128  -78.12%
BenchmarkSprintfFloat-8                 873           60  -93.13%
BenchmarkSprintfFloat-16                884           33  -96.24%
BenchmarkSprintfFloat-32                881           29  -96.62%

BenchmarkManyArgs                      1097         1069   -2.55%
BenchmarkManyArgs-2                     705          567  -19.57%
BenchmarkManyArgs-4                     792          319  -59.72%
BenchmarkManyArgs-8                     963          172  -82.14%
BenchmarkManyArgs-16                   1115          103  -90.76%
BenchmarkManyArgs-32                   1133           90  -92.03%

LGTM=rsc
R=golang-codereviews, bradfitz, minux.ma, gobot, rsc
CC=golang-codereviews
https://golang.org/cl/46010043
2014-01-24 22:29:53 +04:00
Russ Cox
a81692e265 cmd/gc: add zeroing to enable precise stack accounting
There is more zeroing than I would like right now -
temporaries used for the new map and channel runtime
calls need to be eliminated - but it will do for now.

This CL only has an effect if you are building with

        GOEXPERIMENT=precisestack ./all.bash

(or make.bash). It costs about 5% in the overall time
spent in all.bash. That number will come down before
we make it on by default, but this should be enough for
Keith to try using the precise maps for copying stacks.

amd64 only (and it's not really great generated code).

TBR=khr, iant
CC=golang-codereviews
https://golang.org/cl/56430043
2014-01-23 23:11:04 -05:00
Dmitriy Vyukov
8371b0142e undo CL 45770044 / d795425bfa18
Breaks darwin and freebsd.

««« original CL description
runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044
»»»

R=golang-codereviews
CC=golang-codereviews
https://golang.org/cl/56060043
2014-01-23 19:56:59 +04:00
Dmitriy Vyukov
6d603af6dc runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044
2014-01-23 18:59:43 +04:00
Dmitriy Vyukov
9cbd2fb1aa runtime: remove locks from netpoll hotpaths
Introduces a two-phase goroutine parking mechanism -- prepare to park, commit park.
This mechanism does not require a backing mutex to protect the wait predicate.
Use it in netpoll. See the comment in netpoll.goc for details.
This slightly reduces contention between reader, writer and read/write io notifications,
and eliminates a bunch of mutex operations from hotpaths, thus making them faster.
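
A loose user-level analogue of the two-phase idea, assuming a single waiter (a sketch using sync/atomic and a buffered channel, not the runtime's internals):

        var state int32 // 0 = empty, 1 = waiter committed, 2 = notified
        var sema = make(chan struct{}, 1)

        func wait() {
                for {
                        if atomic.CompareAndSwapInt32(&state, 2, 0) {
                                return // notification already pending: no park needed
                        }
                        if atomic.CompareAndSwapInt32(&state, 0, 1) { // prepare to park
                                <-sema // commit the park; no mutex guards the predicate
                        }
                }
        }

        func notify() {
                if atomic.SwapInt32(&state, 2) == 1 {
                        sema <- struct{}{} // a waiter committed; wake it
                }
        }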

benchmark                             old ns/op    new ns/op    delta
BenchmarkTCP4ConcurrentReadWrite           2109         1945   -7.78%
BenchmarkTCP4ConcurrentReadWrite-2         1162         1113   -4.22%
BenchmarkTCP4ConcurrentReadWrite-4          798          755   -5.39%
BenchmarkTCP4ConcurrentReadWrite-8          803          748   -6.85%
BenchmarkTCP4Persistent                    9411         9240   -1.82%
BenchmarkTCP4Persistent-2                  5888         5813   -1.27%
BenchmarkTCP4Persistent-4                  4016         3968   -1.20%
BenchmarkTCP4Persistent-8                  3943         3857   -2.18%

R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
CC=golang-codereviews, khr
https://golang.org/cl/45700043
2014-01-22 11:27:16 +04:00
Dmitriy Vyukov
cb133c6607 runtime: do not collect GC roots explicitly
Currently we collect (add) all roots into a global array in a single-threaded GC phase.
This hinders parallelism.
With this change we just kick off a parallel for over number_of_goroutines+5 iterations.
Then the parallel-for callback decides whether it needs to scan the stack of a goroutine,
scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely.
This requires storing all goroutines in an array instead of a linked list
(to allow direct indexing).
This CL also removes the DebugScan functionality. It is broken because it uses
an unbounded stack, so it cannot run on g0. When it was working, I found
it unhelpful for debugging issues because the two algorithms are too different now.
This change would require updating DebugScan, so it's simpler to just delete it.

With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same.

garbage-8
allocated                 2987886      2989221      +0.04%
allocs                      62885        62887      +0.00%
cputime                  21286000     21272000      -0.07%
gc-pause-one             26633247     24885421      -6.56%
gc-pause-total             873570       811264      -7.13%
rss                     242089984    242515968      +0.18%
sys-gc                   13934336     13869056      -0.47%
sys-heap                205062144    205062144      +0.00%
sys-other                12628288     12628288      +0.00%
sys-stack                11534336     11927552      +3.41%
sys-total               243159104    243487040      +0.13%
time                      2809477      2740795      -2.44%

R=golang-codereviews, rsc
CC=cshapiro, golang-codereviews, khr
https://golang.org/cl/46860043
2014-01-21 13:06:57 +04:00
Dmitriy Vyukov
1ba04c171a runtime: per-P defer pool
Instead of a per-goroutine stack of defers for all sizes,
introduce a per-P defer pool for argument sizes 8, 24, 40, 56, 72 bytes.

For a program that starts 1e6 goroutines and then joins them:
old: rss=6.6g virtmem=10.2g time=4.85s
new: rss=4.5g virtmem= 8.2g time=3.48s
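
A sketch of mapping an argument size to one of the five classes (the rounding here is illustrative, not the exact runtime formula):

        // classes cover argument sizes 8, 24, 40, 56, 72 (8 + 16*i):
        func deferClass(siz uintptr) int {
                return int((siz + 7) / 16) // 8 -> 0, 24 -> 1, ..., 72 -> 4
        }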

R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/42750044
2014-01-21 11:20:23 +04:00
Dmitriy Vyukov
d5a36cd6bb runtime: zero 2-word memory blocks in-place
Currently for 2-word blocks we set the flag just to clear the flag, which makes no sense.
In particular, on 32-bit we always call memclr.

R=golang-codereviews, dave, iant
CC=golang-codereviews, khr, rsc
https://golang.org/cl/41170044
2014-01-21 10:53:51 +04:00
Dmitriy Vyukov
c0b9e6218c runtime: output how long goroutines are blocked
Example of output:

goroutine 4 [sleep for 3 min]:
time.Sleep(0x34630b8a000)
        src/pkg/runtime/time.goc:31 +0x31
main.func·002()
        block.go:16 +0x2c
created by main.main
        block.go:17 +0x33

Full program and output are here:
http://play.golang.org/p/NEZdADI3Td

Fixes #6809.

R=golang-codereviews, khr, kamil.kisiel, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/50420043
2014-01-16 12:54:46 +04:00
Dmitriy Vyukov
b3a3afc9b7 runtime: fix data race in GC
Fixes #5139.
Update #7065.

R=golang-codereviews, bradfitz, minux.ma
CC=golang-codereviews
https://golang.org/cl/52090045
2014-01-15 19:38:08 +04:00
Russ Cox
3ec60c253d runtime: emit collection stacks in GODEBUG=allocfreetrace mode
R=khr, dvyukov
CC=golang-codereviews
https://golang.org/cl/51830043
2014-01-14 10:39:50 -05:00
Ian Lance Taylor
8da8b37674 runtime: fix 32-bit malloc for pointers >= 0x80000000
The spans array is allocated in runtime·mallocinit.  On a
32-bit system the number of entries in the spans array is
MaxArena32 / PageSize, which is (2U << 30) / (1 << 12) == (1 << 19).
So we are allocating an array that can hold 19 bits for an
index that can hold 20 bits.  According to the comment in the
function, this is intentional: we only allocate enough spans
(and bitmaps) for a 2G arena, because allocating more would
probably be wasteful.

But since the span index is simply the upper 20 bits of the
memory address, this scheme only works if memory addresses are
limited to the low 2G of memory.  That would be OK if we were
careful to enforce it, but we're not.  What we are careful to
enforce, in functions like runtime·MHeap_SysAlloc, is that we
always return addresses between the heap's arena_start and
arena_start + MaxArena32.

We generally get away with it because we start allocating just
after the program end, so we only run into trouble with
programs that allocate a lot of memory, enough to get past
address 0x80000000.

This changes the code that computes a span index to subtract
arena_start on 32-bit systems just as we currently do on
64-bit systems.
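
The corrected index computation, as a sketch (names illustrative):

        func spanIndex(addr, arenaStart uintptr) uintptr {
                const pageShift = 12 // 4K pages at the time of this CL
                // was: addr >> pageShift, which overflows the array past 0x80000000
                return (addr - arenaStart) >> pageShift
        }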

R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/49460043
2014-01-09 15:00:00 -08:00
Keith Randall
020b39c3f3 runtime: use special records hung off the MSpan to
record finalizers and heap profile info.  Enables
removing the special bit from the heap bitmap.  Also
provides a generic mechanism for annotating occasional
heap objects.

finalizers
        overhead      per obj
old	680 B	      80 B avg
new	16 B/span     48 B

profile
        overhead      per obj
old	32KB	      24 B + hash tables
new	16 B/span     24 B

R=cshapiro, khr, dvyukov, gobot
CC=golang-codereviews
https://golang.org/cl/13314053
2014-01-07 13:45:50 -08:00
Keith Randall
2d20c0d625 runtime: mark objects in free lists as allocated and unscannable.
On the plus side, we don't need to change the bits when mallocing
pointerless objects.  On the other hand, we need to mark objects in the
free lists during GC.  But the free lists are small at GC time, so it
should be a net win.

benchmark                    old ns/op    new ns/op    delta
BenchmarkMalloc8                    40           33  -17.65%
BenchmarkMalloc16                   45           38  -15.72%
BenchmarkMallocTypeInfo8            58           59   +0.85%
BenchmarkMallocTypeInfo16           63           64   +1.10%

R=golang-dev, rsc, dvyukov
CC=cshapiro, golang-dev
https://golang.org/cl/41040043
2013-12-18 17:13:59 -08:00
Brad Fitzpatrick
8c6ef061e3 sync: add Pool type
Adds the Pool type and docs, and uses it in fmt.
This is a temporary implementation, until Dmitry
makes it fast.

Uses the API proposal from Russ in http://goo.gl/cCKeb2 but
adds an optional New field, as used in fmt and elsewhere.
Almost all callers want that.

Update #4720

R=golang-dev, rsc, cshapiro, iant, r, dvyukov, khr
CC=golang-dev
https://golang.org/cl/41860043
2013-12-18 11:08:34 -08:00
Russ Cox
bc135f6492 runtime: fix crash in runtime.GoroutineProfile
This is a possible Go 1.2.1 candidate.

Fixes #6946.

R=iant, r
CC=golang-dev
https://golang.org/cl/41640043
2013-12-13 15:44:57 -05:00
Carl Shapiro
c69402d82b runtime: remove outdated comment and related indentation
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/39810043
2013-12-10 11:17:43 -08:00
Carl Shapiro
76c54c1193 runtime: add GODEBUG option for an electric fence like heap mode
When enabled, this new debugging mode will allocate objects on
their own page and never recycle memory addresses.  This is an
essential tool for root-causing a broad class of heap corruption.

R=golang-dev, dave, daniel.morsing, dvyukov, rsc, iant, cshapiro
CC=golang-dev
https://golang.org/cl/22060046
2013-12-06 14:40:45 -08:00
Carl Shapiro
f056daf075 cmd/5g, cmd/5l, cmd/6g, cmd/6l, cmd/8g, cmd/8l, cmd/gc, runtime: generate pointer maps by liveness analysis
This change allows the garbage collector to examine stack
slots that liveness analysis determines to be live and to
contain a pointer value.  This results in a mean
reduction of 65% in the number of stack slots scanned during
an invocation of "GOGC=1 all.bash".

Unfortunately, this does not yet allow garbage collection to
be precise for the stack slots computed as live.  Pointers
confound the determination of what definitions reach a given
instruction.  In general, this problem is not solvable without
runtime cost but some advanced cooperation from the compiler
might mitigate common cases.

R=golang-dev, rsc, cshapiro
CC=golang-dev
https://golang.org/cl/14430048
2013-12-05 17:35:22 -08:00
Carl Shapiro
0368a7ceb6 runtime: move stack scanning into the parallel mark phase
This change reduces the cost of scanning the stack by frames.
It moves the stack scanning from the serial root enumeration
phase to the parallel tracing phase.  The output that follows
shows timings for the issue 6482 benchmark

Baseline

BenchmarkGoroutineSelect	      50	 108027405 ns/op
BenchmarkGoroutineBlocking	      50	  89573332 ns/op
BenchmarkGoroutineForRange	      20	  95614116 ns/op
BenchmarkGoroutineIdle		      20	 122809512 ns/op

Stack scan by frames, non-parallel

BenchmarkGoroutineSelect	      20	 297138929 ns/op
BenchmarkGoroutineBlocking	      20	 301137599 ns/op
BenchmarkGoroutineForRange	      10	 312499469 ns/op
BenchmarkGoroutineIdle		      10	 209428876 ns/op

Stack scan by frames, parallel

BenchmarkGoroutineSelect	      20	 183938431 ns/op
BenchmarkGoroutineBlocking	      20	 170109999 ns/op
BenchmarkGoroutineForRange	      20	 179628882 ns/op
BenchmarkGoroutineIdle		      20	 157541498 ns/op

The remaining performance disparity is due to inefficiencies
in gentraceback and its callees.  The effect was isolated by
using a parallel stack scan where scanstack was modified to do
a conservative scan of the stack segments without gentraceback
followed by a call to gentraceback with a no-op callback.

The output that follows lists the top 10 most frequent stack
tops as determined by the Linux perf record facility.

Baseline

+  25.19%  gc.test  gc.test            [.] runtime.xchg
+  19.00%  gc.test  gc.test            [.] scanblock
+   8.53%  gc.test  gc.test            [.] scanstack
+   8.46%  gc.test  gc.test            [.] flushptrbuf
+   5.08%  gc.test  gc.test            [.] procresize
+   3.57%  gc.test  gc.test            [.] runtime.chanrecv
+   2.94%  gc.test  gc.test            [.] dequeue
+   2.74%  gc.test  gc.test            [.] addroots
+   2.25%  gc.test  gc.test            [.] runtime.ready
+   1.33%  gc.test  gc.test            [.] runtime.cas64

Gentraceback

+  18.12%  gc.test  gc.test             [.] runtime.xchg
+  14.68%  gc.test  gc.test             [.] scanblock
+   8.20%  gc.test  gc.test             [.] runtime.gentraceback
+   7.38%  gc.test  gc.test             [.] flushptrbuf
+   6.84%  gc.test  gc.test             [.] scanstack
+   5.92%  gc.test  gc.test             [.] runtime.findfunc
+   3.62%  gc.test  gc.test             [.] procresize
+   3.15%  gc.test  gc.test             [.] readvarint
+   1.92%  gc.test  gc.test             [.] addroots
+   1.87%  gc.test  gc.test             [.] runtime.chanrecv

R=golang-dev, dvyukov, rsc
CC=golang-dev
https://golang.org/cl/17410043
2013-12-03 14:12:55 -08:00
Keith Randall
139cc96a57 runtime: markfreed's error reports should be prefixed with "markfreed", not "markallocated".
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/14441055
2013-10-09 13:28:47 -07:00
Russ Cox
c3dadca977 runtime: do not scan stack by frames during garbage collection
Walking the stack by frames is ~3x more expensive
than not, and since it didn't end up being precise,
there is not enough benefit to outweigh the cost.

This is the conservative choice: this CL makes the
stack scanning behavior the same as it was in Go 1.1.

Add benchmarks to package runtime so that we have
them when we re-enable this feature during the
Go 1.3 development.

benchmark                     old ns/op    new ns/op    delta
BenchmarkGoroutineSelect        3194909      1272092  -60.18%
BenchmarkGoroutineBlocking      3120282       866366  -72.23%
BenchmarkGoroutineForRange      3256179       939902  -71.13%
BenchmarkGoroutineIdle          2005571       482982  -75.92%

The Go 1 benchmarks, just to add more data.
As far as I can tell the changes are mainly noise.

benchmark                         old ns/op    new ns/op    delta
BenchmarkBinaryTree17            4409403046   4414734932   +0.12%
BenchmarkFannkuch11              3407708965   3378306120   -0.86%
BenchmarkFmtFprintfEmpty                100           99   -0.60%
BenchmarkFmtFprintfString               242          239   -1.24%
BenchmarkFmtFprintfInt                  204          206   +0.98%
BenchmarkFmtFprintfIntInt               320          316   -1.25%
BenchmarkFmtFprintfPrefixedInt          295          299   +1.36%
BenchmarkFmtFprintfFloat                442          435   -1.58%
BenchmarkFmtManyArgs                   1246         1216   -2.41%
BenchmarkGobDecode                 10186951     10051210   -1.33%
BenchmarkGobEncode                 16504381     16445650   -0.36%
BenchmarkGzip                     447030885    447056865   +0.01%
BenchmarkGunzip                   111056154    111696305   +0.58%
BenchmarkHTTPClientServer             89973        93040   +3.41%
BenchmarkJSONEncode                28174182     27933893   -0.85%
BenchmarkJSONDecode               106353777    110443817   +3.85%
BenchmarkMandelbrot200              4822289      4806083   -0.34%
BenchmarkGoParse                    6102436      6142734   +0.66%
BenchmarkRegexpMatchEasy0_32            133          132   -0.75%
BenchmarkRegexpMatchEasy0_1K            372          373   +0.27%
BenchmarkRegexpMatchEasy1_32            113          111   -1.77%
BenchmarkRegexpMatchEasy1_1K            964          940   -2.49%
BenchmarkRegexpMatchMedium_32           202          205   +1.49%
BenchmarkRegexpMatchMedium_1K         68862        68858   -0.01%
BenchmarkRegexpMatchHard_32            3480         3407   -2.10%
BenchmarkRegexpMatchHard_1K          108255       112614   +4.03%
BenchmarkRevcomp                  751393035    743929976   -0.99%
BenchmarkTemplate                 139637041    135402220   -3.03%
BenchmarkTimeParse                      479          475   -0.84%
BenchmarkTimeFormat                     460          466   +1.30%

benchmark                          old MB/s     new MB/s  speedup
BenchmarkGobDecode                    75.34        76.36    1.01x
BenchmarkGobEncode                    46.50        46.67    1.00x
BenchmarkGzip                         43.41        43.41    1.00x
BenchmarkGunzip                      174.73       173.73    0.99x
BenchmarkJSONEncode                   68.87        69.47    1.01x
BenchmarkJSONDecode                   18.25        17.57    0.96x
BenchmarkGoParse                       9.49         9.43    0.99x
BenchmarkRegexpMatchEasy0_32         239.58       241.74    1.01x
BenchmarkRegexpMatchEasy0_1K        2749.74      2738.00    1.00x
BenchmarkRegexpMatchEasy1_32         282.49       286.32    1.01x
BenchmarkRegexpMatchEasy1_1K        1062.00      1088.96    1.03x
BenchmarkRegexpMatchMedium_32          4.93         4.86    0.99x
BenchmarkRegexpMatchMedium_1K         14.87        14.87    1.00x
BenchmarkRegexpMatchHard_32            9.19         9.39    1.02x
BenchmarkRegexpMatchHard_1K            9.46         9.09    0.96x
BenchmarkRevcomp                     338.26       341.65    1.01x
BenchmarkTemplate                     13.90        14.33    1.03x

Fixes #6482.

R=golang-dev, dave, r
CC=golang-dev
https://golang.org/cl/14257043
2013-10-02 11:59:53 -04:00
Russ Cox
30ecb4cd05 build: disable precise collection of stack frames
The code for call site-specific pointer bitmaps was not ready in time,
but the zeroing required without it is too expensive to use by default.
We will have to wait for precise collection of stack frames until Go 1.3.

The precise collection can be re-enabled by

        GOEXPERIMENT=precisestack ./all.bash

but that will not be the default for a Go 1.2 build.

Fixes #6087.

R=golang-dev, jeremyjackins, dan.kortschak, r
CC=golang-dev
https://golang.org/cl/13677045
2013-09-16 20:26:10 -04:00
Dmitriy Vyukov
a33ef8d11b runtime: account for all sys memory in MemStats
Currently lots of sys allocations are not accounted in any of the XxxSys
stats, including the GC bitmap, spans table, GC roots blocks, GC finalizer
blocks, iface table, netpoll descriptors and more. Up to ~20% can be
unaccounted for.
This change introduces 2 new stats: GCSys and OtherSys for GC metadata
and all other misc allocations, respectively.
It also ensures that all XxxSys stats indeed sum up to Sys. All sys memory
allocation functions require the stat for accounting, so it is impossible
to miss something.
Also fixes updating of mcache_sys/inuse, which were not updated after
deallocation.

test/bench/garbage/parser before:
Sys		670064344
HeapSys		610271232
StackSys	65536
MSpanSys	14204928
MCacheSys	16384
BuckHashSys	1439992

after:
Sys		670064344
HeapSys		610271232
StackSys	65536
MSpanSys	14188544
MCacheSys	16384
BuckHashSys	3194304
GCSys		39198688
OtherSys	3129656
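
A quick way to observe the new accounting from Go (GCSys and
OtherSys are the fields this change adds to runtime.MemStats):

        package main

        import (
                "fmt"
                "runtime"
        )

        func main() {
                var ms runtime.MemStats
                runtime.ReadMemStats(&ms)
                sum := ms.HeapSys + ms.StackSys + ms.MSpanSys +
                        ms.MCacheSys + ms.BuckHashSys + ms.GCSys + ms.OtherSys
                fmt.Println(ms.Sys == sum) // the XxxSys stats now sum to Sys
        }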

Fixes #5799.

R=rsc, dave, alex.brainman
CC=golang-dev
https://golang.org/cl/12946043
2013-09-06 16:55:40 -04:00
Keith Randall
23f9751e83 runtime: clean up map code. Remove hashmap.h.
Use cnew/cnewarray instead of mallocgc.

R=golang-dev, dvyukov
CC=golang-dev
https://golang.org/cl/13396045
2013-08-31 14:09:34 -07:00
Keith Randall
fb376021be runtime: record type information for hashtable internal structures.
Remove all hashtable-specific GC code.

Fixes #6119.

R=cshapiro, dvyukov, khr
CC=golang-dev
https://golang.org/cl/13078044
2013-08-31 09:09:50 -07:00
Carl Shapiro
c51152f438 runtime: check bitmap word for allocated bit in markonly
When searching for an allocated bit, flushptrbuf would search
backward in the bitmap word containing the bit of the pointer
being looked up before searching the span.  This extra check
was not replicated in markonly which, instead, after not
finding an allocated bit for a pointer would directly look in
the span.

Using statistics generated from godoc, before this change span
lookups were, on average, more common than word lookups.  It
was common for markonly to consult spans for one third of its
pointer lookups.  With this change in place, what were
previously span lookups are overwhelmingly handled by word
lookups, making the total number of span lookups a relatively
small fraction of the whole.

This change also introduces some statistics gathering about
lookups guarded by the CollectStats enum.

R=golang-dev, khr
CC=golang-dev
https://golang.org/cl/13311043
2013-08-29 13:52:38 -07:00
Keith Randall
ed467db6d8 cmd/cc,runtime: change preprocessor to expand macros inside of
#pragma textflag and #pragma dataflag directives.
Update dataflag directives to use symbols instead of integer constants.

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/13310043
2013-08-29 12:36:59 -07:00
Keith Randall
d0dd420a24 runtime: rename FlagNoPointers to FlagNoScan. Better represents
the use of the flag, especially for objects that actually do have
pointers but that we don't want the GC to scan.

R=golang-dev, cshapiro
CC=golang-dev
https://golang.org/cl/13181045
2013-08-23 17:28:47 -07:00
Dmitriy Vyukov
dfdd1ba028 runtime: do not trigger GC on g0
GC acquires worldsema, a goroutine-level semaphore
that parks goroutines. g0 cannot be parked.
Fixes #6193.

R=khr, khr
CC=golang-dev
https://golang.org/cl/12880045
2013-08-22 02:17:45 +04:00
Carl Shapiro
87fdb8fb9a undo CL 13010045 / 04f8101b46dd
Update the original change but do not read interface types in
the arguments area.  Once the arguments area is zeroed, as the
locals area is, we can safely read interface type values there
too.

««« original CL description
undo CL 12785045 / 71ce80dc4195

This has broken the 32-bit builds.

««« original CL description
cmd/gc, runtime: use type information to scan interface values

R=golang-dev, rsc, dvyukov
CC=golang-dev
https://golang.org/cl/12785045
»»»

R=khr, golang-dev, khr
CC=golang-dev
https://golang.org/cl/13010045
»»»

R=khr, khr
CC=golang-dev
https://golang.org/cl/13073045
2013-08-21 13:51:00 -07:00
Carl Shapiro
ca2d32b46d undo CL 12785045 / 71ce80dc4195
This has broken the 32-bit builds.

««« original CL description
cmd/gc, runtime: use type information to scan interface values

R=golang-dev, rsc, dvyukov
CC=golang-dev
https://golang.org/cl/12785045
»»»

R=khr, golang-dev, khr
CC=golang-dev
https://golang.org/cl/13010045
2013-08-19 14:16:55 -07:00
Keith Randall
6401e0f83f runtime: don't run finalizers if we're still on the g0 stack.
R=golang-dev, rsc, dvyukov, khr
CC=golang-dev
https://golang.org/cl/11386044
2013-08-19 12:20:50 -07:00
Carl Shapiro
21ea5103a4 cmd/gc, runtime: use type information to scan interface values
R=golang-dev, rsc, dvyukov
CC=golang-dev
https://golang.org/cl/12785045
2013-08-19 10:19:59 -07:00
Russ Cox
5822e7848a runtime: make SetFinalizer(x, f) accept any f for which f(x) is valid
Originally the requirement was f(x) where f's argument is
exactly x's type.

CL 11858043 relaxed the requirement in a non-standard
way: f's argument must be exactly x's type or interface{}.

If we're going to relax the requirement, it should be done
in a way consistent with the rest of Go. This CL allows f's
argument to have any type to which x is assignable;
that's the same requirement the compiler would impose
if compiling f(x) directly.

Fixes #5368.

R=dvyukov, bradfitz, pieter
CC=golang-dev
https://golang.org/cl/12895043
2013-08-14 14:54:31 -04:00
Carl Shapiro
abc516e420 cmd/cc, cmd/gc, runtime: Uniquely encode iface and eface pointers in the pointer map.
Prior to this change, pointer maps encoded the disposition of
a word using a single bit.  A zero signaled a non-pointer
value and a one signaled a pointer value.  Interface values,
which are a effectively a union type, were conservatively
labeled as a pointer.

This change widens the logical element size of the pointer map
to two bits per word.  As before, zero signals a non-pointer
value and one signals a pointer value.  Additionally, a two
signals an iface pointer and a three signals an eface pointer.

Following other changes to the runtime, values two and three
will allow type information to drive interpretation of the
subsequent word so that only those interface values containing
a pointer value will be scanned.
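
A sketch of the widened encoding (the constant names are
hypothetical; the values are the ones described above):

        // Two bits per word in the pointer map:
        const (
                bitsScalar  = 0 // non-pointer value
                bitsPointer = 1 // pointer value
                bitsIface   = 2 // iface pointer
                bitsEface   = 3 // eface pointer
        )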

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12689046
2013-08-09 16:48:12 -07:00
Dmitriy Vyukov
23e15f7253 net: add special netFD mutex
The mutex, fdMutex, handles locking and lifetime of sysfd,
and serializes Read and Write methods.
This allows stripping 2 sync.Mutex.Lock calls,
2 sync.Mutex.Unlock calls, 1 defer and some amount
of misc overhead from every network operation.

On linux/amd64, Intel E5-2690:
benchmark                             old ns/op    new ns/op    delta
BenchmarkTCP4Persistent                    9595         9454   -1.47%
BenchmarkTCP4Persistent-2                  8978         8772   -2.29%
BenchmarkTCP4ConcurrentReadWrite           4900         4625   -5.61%
BenchmarkTCP4ConcurrentReadWrite-2         2603         2500   -3.96%

In general it strips 70-500 ns from every network operation depending
on the processor model. On my relatively new E5-2690 it amounts to ~5%
of network op cost.

Fixes #6074.

R=golang-dev, bradfitz, alex.brainman, iant, mikioh.mikioh
CC=golang-dev
https://golang.org/cl/12418043
2013-08-09 21:43:00 +04:00
Carl Shapiro
73c93a404c cmd/cc, cmd/gc, runtime: emit bitmaps for scanning locals.
Previously, all word-aligned locations in the local variables
area were scanned as conservative roots.  With this change, a
bitmap is generated describing the locations of pointer values
in local variables.

With this change the argument bitmap information has been
changed to only store information about arguments.  The locals
member has been removed.  In its place, the bitmap data for
local variables is now used to store the size of locals.  If
the size is negative, the magnitude indicates the size of the
local variables area.

R=rsc
CC=golang-dev
https://golang.org/cl/12328044
2013-08-07 12:47:01 -07:00
Dmitriy Vyukov
9c0500b466 runtime: use gcpc/gcsp during traceback of goroutines in syscalls
gcpc/gcsp are used by the GC in a similar situation.
gcpc/gcsp are also more stable than gp->sched,
because gp->sched is mutated by entersyscall/exitsyscall
in morestack and mcall, so it has a higher chance of being inconsistent.
Also, rename gcpc/gcsp to syscallpc/syscallsp.

This is the same as reverted change 12250043
with save marked as textflag 7.
The problem was that if save calls morestack,
then subsequent lessstack spoils g->sched.pc/sp.
And those bad values were remembered in g->syscallpc/sp.
Entersyscallblock had the same problem,
but it was never triggered to date.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12478043
2013-08-06 13:38:44 +04:00
Dmitriy Vyukov
f38ff9e5ea undo CL 12250043 / e911f94c4902
Broke all 386 builders.

««« original CL description
runtime: use gcpc/gcsp during traceback of goroutines in syscalls
gcpc/gcsp are used by the GC in a similar situation.
gcpc/gcsp are also more stable than gp->sched,
because gp->sched is mutated by entersyscall/exitsyscall
in morestack and mcall, so it has a higher chance of being inconsistent.
Also, rename gcpc/gcsp to syscallpc/syscallsp.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12250043
»»»

R=rsc
CC=golang-dev
https://golang.org/cl/12424045
2013-08-05 23:33:50 +04:00
Dmitriy Vyukov
d5ab784611 runtime: remove singleproc var
It was needed for the old scheduler,
because there temporarily could be more threads than gomaxprocs.
In the new scheduler, gomaxprocs is always respected.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12438043
2013-08-05 22:58:02 +04:00
Dmitriy Vyukov
f73972fa33 runtime: use gcpc/gcsp during traceback of goroutines in syscalls
gcpc/gcsp are used by the GC in a similar situation.
gcpc/gcsp are also more stable than gp->sched,
because gp->sched is mutated by entersyscall/exitsyscall
in morestack and mcall, so it has a higher chance of being inconsistent.
Also, rename gcpc/gcsp to syscallpc/syscallsp.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12250043
2013-08-05 22:55:54 +04:00
Dmitriy Vyukov
0a904a3f2e runtime: remove dead code
Remove dead code related to allocation of type metadata with SysAlloc.

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/12311045
2013-08-04 23:32:06 +04:00
Pieter Droogendijk
6350e45892 runtime: allow SetFinalizer with a func(interface{})
Fixes #5368.

R=golang-dev, dvyukov
CC=golang-dev, rsc
https://golang.org/cl/11858043
2013-07-29 19:43:08 +04:00
Dmitriy Vyukov
f8a850b250 runtime: refactor mallocgc
Make it accept a type and combine flags.
Several reasons for the change:
1. mallocgc and settype must be atomic wrt GC
2. settype is called from only one place now
3. it will help performance (eventually settype
functionality must be combined with markallocated)
4. flags are easier to read now (no mallocgc(sz, 0, 1, 0) anymore)

R=golang-dev, iant, nightlyone, rsc, dave, khr, bradfitz, r
CC=golang-dev
https://golang.org/cl/10136043
2013-07-26 21:17:24 +04:00
Russ Cox
48769bf546 runtime: use funcdata to supply garbage collection information
This CL introduces a FUNCDATA number for runtime-specific
garbage collection metadata, changes the C and Go compilers
to emit that metadata, and changes the runtime to expect it.

The old pseudo-instructions that carried this information
are gone, as is the linker code to process them.

R=golang-dev, dvyukov, cshapiro
CC=golang-dev
https://golang.org/cl/11406044
2013-07-19 16:04:09 -04:00
Dmitriy Vyukov
eb04df75cd runtime: prevent GC from seeing the contents of a frame in runfinq
This holds the last finalized object and arguments to its finalizer.
Fixes #5348.

R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/11454044
2013-07-19 18:01:33 +04:00
Russ Cox
c3de91bb15 cmd/ld, runtime: use new contiguous pcln table
R=golang-dev, r, dave
CC=golang-dev
https://golang.org/cl/11494043
2013-07-18 10:43:22 -04:00
Dmitriy Vyukov
5887f142a3 runtime: more reliable preemption
Currently the preemption signal g->stackguard0==StackPreempt
can be lost if it is received when preemption is disabled
(e.g. m->lock!=0). This change duplicates the preemption
signal in g->preempt and restores g->stackguard0
when preemption is enabled.
Update #543.
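
A sketch of the re-arm step in Go-ish pseudocode (the g type and
the stackPreempt sentinel are stand-ins for the runtime's C
structures):

        type g struct {
                preempt     bool    // duplicated preemption request
                stackguard0 uintptr // stack check word, or stackPreempt
        }

        const stackPreempt = ^uintptr(0) // illustrative sentinel value

        // enablePreemption restores a request that may have been
        // dropped from stackguard0 while preemption was disabled.
        func enablePreemption(gp *g) {
                if gp.preempt {
                        gp.stackguard0 = stackPreempt
                }
        }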

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/10792043
2013-07-17 12:52:37 -04:00
Russ Cox
5d363c6357 cmd/ld, runtime: new in-memory symbol table format
Design at http://golang.org/s/go12symtab.

This enables some cleanup of the garbage collector metadata
that will be done in future CLs.

This CL does not move the old symtab and pclntab back into
an unmapped section of the file. That's a bit tricky and will be
done separately.

Fixes #4020.

R=golang-dev, dave, cshapiro, iant, r
CC=golang-dev, nigeltao
https://golang.org/cl/11085043
2013-07-16 09:41:38 -04:00
Dmitriy Vyukov
4b536a550f runtime: introduce GODEBUG env var
Currently it replaces the GOGCTRACE env var (GODEBUG=gctrace=1).
The plan is to extend it with other types of debug tracing,
e.g. GODEBUG=gctrace=1,schedtrace=100.
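
For example (the second form is the planned extension mentioned
above):

        GODEBUG=gctrace=1 ./your-program
        GODEBUG=gctrace=1,schedtrace=100 ./your-program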

R=rsc
CC=bradfitz, daniel.morsing, gobot, golang-dev
https://golang.org/cl/10026045
2013-06-28 18:37:06 +04:00
Russ Cox
6fa3c89b77 runtime: record proper goroutine state during stack split
Until now, the goroutine state has been scattered during the
execution of newstack and oldstack. It's all there, and those routines
know how to get back to a working goroutine, but other pieces of
the system, like stack traces, do not. If something does interrupt
the newstack or oldstack execution, the rest of the system can't
understand the goroutine. For example, if newstack decides there
is an overflow and calls throw, the stack tracer wouldn't dump the
goroutine correctly.

For newstack to save a useful state snapshot, it needs to be able
to rewind the PC in the function that triggered the split back to
the beginning of the function. (The PC is a few instructions in, just
after the call to morestack.) To make that possible, we change the
prologues to insert a jmp back to the beginning of the function
after the call to morestack. That is, the prologue used to be roughly:

        TEXT myfunc
                check for split
                jmpcond nosplit
                call morestack
        nosplit:
                sub $xxx, sp

Now an extra instruction is inserted after the call:

        TEXT myfunc
        start:
                check for split
                jmpcond nosplit
                call morestack
                jmp start
        nosplit:
                sub $xxx, sp

The jmp is not executed directly. It is decoded and simulated by
runtime.rewindmorestack to discover the beginning of the function,
and then the call to morestack returns directly to the start label
instead of to the jump instruction. So logically the jmp is still
executed, just not by the cpu.

The prologue thus repeats in the case of a function that needs a
stack split, but against the cost of the split itself, the extra few
instructions are noise. The repeated prologue has the nice effect of
making a stack split double-check that the new stack is big enough:
if morestack happens to return on a too-small stack, we'll now notice
before corruption happens.

The ability for newstack to rewind to the beginning of the function
should help preemption too. If newstack decides that it was called
for preemption instead of a stack split, it now has the goroutine state
correctly paused if rescheduling is needed, and when the goroutine
can run again, it can return to the start label on its original stack
and re-execute the split check.

Here is an example of a split stack overflow showing the full
trace, without any special cases in the stack printer.
(This one was triggered by making the split check incorrect.)

runtime: newstack framesize=0x0 argsize=0x18 sp=0x6aebd0 stack=[0x6b0000, 0x6b0fa0]
        morebuf={pc:0x69f5b sp:0x6aebd8 lr:0x0}
        sched={pc:0x68880 sp:0x6aebd0 lr:0x0 ctxt:0x34e700}
runtime: split stack overflow: 0x6aebd0 < 0x6b0000
fatal error: runtime: split stack overflow

goroutine 1 [stack split]:
runtime.mallocgc(0x290, 0x100000000, 0x1)
        /Users/rsc/g/go/src/pkg/runtime/zmalloc_darwin_amd64.c:21 fp=0x6aebd8
runtime.new()
        /Users/rsc/g/go/src/pkg/runtime/zmalloc_darwin_amd64.c:682 +0x5b fp=0x6aec08
go/build.(*Context).Import(0x5ae340, 0xc210030c71, 0xa, 0xc2100b4380, 0x1b, ...)
        /Users/rsc/g/go/src/pkg/go/build/build.go:424 +0x3a fp=0x6b00a0
main.loadImport(0xc210030c71, 0xa, 0xc2100b4380, 0x1b, 0xc2100b42c0, ...)
        /Users/rsc/g/go/src/cmd/go/pkg.go:249 +0x371 fp=0x6b01a8
main.(*Package).load(0xc21017c800, 0xc2100b42c0, 0xc2101828c0, 0x0, 0x0, ...)
        /Users/rsc/g/go/src/cmd/go/pkg.go:431 +0x2801 fp=0x6b0c98
main.loadPackage(0x369040, 0x7, 0xc2100b42c0, 0x0)
        /Users/rsc/g/go/src/cmd/go/pkg.go:709 +0x857 fp=0x6b0f80
----- stack segment boundary -----
main.(*builder).action(0xc2100902a0, 0x0, 0x0, 0xc2100e6c00, 0xc2100e5750, ...)
        /Users/rsc/g/go/src/cmd/go/build.go:539 +0x437 fp=0x6b14a0
main.(*builder).action(0xc2100902a0, 0x0, 0x0, 0xc21015b400, 0x2, ...)
        /Users/rsc/g/go/src/cmd/go/build.go:528 +0x1d2 fp=0x6b1658
main.(*builder).test(0xc2100902a0, 0xc210092000, 0x0, 0x0, 0xc21008ff60, ...)
        /Users/rsc/g/go/src/cmd/go/test.go:622 +0x1b53 fp=0x6b1f68
----- stack segment boundary -----
main.runTest(0x5a6b20, 0xc21000a020, 0x2, 0x2)
        /Users/rsc/g/go/src/cmd/go/test.go:366 +0xd09 fp=0x6a5cf0
main.main()
        /Users/rsc/g/go/src/cmd/go/main.go:161 +0x4f9 fp=0x6a5f78
runtime.main()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:183 +0x92 fp=0x6a5fa0
runtime.goexit()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:1266 fp=0x6a5fa8

And here is a seg fault during oldstack:

SIGSEGV: segmentation violation
PC=0x1b2a6

runtime.oldstack()
        /Users/rsc/g/go/src/pkg/runtime/stack.c:159 +0x76
runtime.lessstack()
        /Users/rsc/g/go/src/pkg/runtime/asm_amd64.s:270 +0x22

goroutine 1 [stack unsplit]:
fmt.(*pp).printArg(0x2102e64e0, 0xe5c80, 0x2102c9220, 0x73, 0x0, ...)
        /Users/rsc/g/go/src/pkg/fmt/print.go:818 +0x3d3 fp=0x221031e6f8
fmt.(*pp).doPrintf(0x2102e64e0, 0x12fb20, 0x2, 0x221031eb98, 0x1, ...)
        /Users/rsc/g/go/src/pkg/fmt/print.go:1183 +0x15cb fp=0x221031eaf0
fmt.Sprintf(0x12fb20, 0x2, 0x221031eb98, 0x1, 0x1, ...)
        /Users/rsc/g/go/src/pkg/fmt/print.go:234 +0x67 fp=0x221031eb40
flag.(*stringValue).String(0x2102c9210, 0x1, 0x0)
        /Users/rsc/g/go/src/pkg/flag/flag.go:180 +0xb3 fp=0x221031ebb0
flag.(*FlagSet).Var(0x2102f6000, 0x293d38, 0x2102c9210, 0x143490, 0xa, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:633 +0x40 fp=0x221031eca0
flag.(*FlagSet).StringVar(0x2102f6000, 0x2102c9210, 0x143490, 0xa, 0x12fa60, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:550 +0x91 fp=0x221031ece8
flag.(*FlagSet).String(0x2102f6000, 0x143490, 0xa, 0x12fa60, 0x0, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:563 +0x87 fp=0x221031ed38
flag.String(0x143490, 0xa, 0x12fa60, 0x0, 0x161950, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:570 +0x6b fp=0x221031ed80
testing.init()
        /Users/rsc/g/go/src/pkg/testing/testing.go:-531 +0xbb fp=0x221031edc0
strings_test.init()
        /Users/rsc/g/go/src/pkg/strings/strings_test.go:1115 +0x62 fp=0x221031ef70
main.init()
        strings/_test/_testmain.go:90 +0x3d fp=0x221031ef78
runtime.main()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:180 +0x8a fp=0x221031efa0
runtime.goexit()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:1269 fp=0x221031efa8

goroutine 2 [runnable]:
runtime.MHeap_Scavenger()
        /Users/rsc/g/go/src/pkg/runtime/mheap.c:438
runtime.goexit()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:1269
created by runtime.main
        /Users/rsc/g/go/src/pkg/runtime/proc.c:166

rax     0x23ccc0
rbx     0x23ccc0
rcx     0x0
rdx     0x38
rdi     0x2102c0170
rsi     0x221032cfe0
rbp     0x221032cfa0
rsp     0x7fff5fbff5b0
r8      0x2102c0120
r9      0x221032cfa0
r10     0x221032c000
r11     0x104ce8
r12     0xe5c80
r13     0x1be82baac718
r14     0x13091135f7d69200
r15     0x0
rip     0x1b2a6
rflags  0x10246
cs      0x2b
fs      0x0
gs      0x0

Fixes #5723.

R=r, dvyukov, go.peter.90, dave, iant
CC=golang-dev
https://golang.org/cl/10360048
2013-06-27 11:32:01 -04:00
Dmitriy Vyukov
94dc963b55 runtime: fix race condition between GC and setGCPercent
If the first GC runs concurrently with setGCPercent,
it can overwrite the gcpercent value with the default.
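
The user-facing call involved in the race, for reference (a
minimal sketch):

        package main

        import "runtime/debug"

        func main() {
                // An early setting must not be overwritten when the
                // first GC initializes gcpercent from GOGC.
                old := debug.SetGCPercent(200) // returns the previous value
                defer debug.SetGCPercent(old)
                // ... allocate and run ...
        }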

R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/10242047
2013-06-15 16:07:06 +04:00
Keith Randall
de316388a7 runtime: garbage collector runs on g0 now.
No need to change to Grunnable state.
Add some more checks for Grunning state.

R=golang-dev, rsc, khr, dvyukov
CC=golang-dev
https://golang.org/cl/10186045
2013-06-14 11:42:51 -07:00