Fixes #6874.
Use runtime.GC() as a stronger version of runtime.Gosched(), which tends to bias the running goroutine in an otherwise idle system. This appears to reduce the worst-case number of spins from 600 to 30 on my 2-core system under high load.
LGTM=iant
R=golang-codereviews, lucio.dere, iant, dvyukov
CC=golang-codereviews
https://golang.org/cl/56540046
If a LowerUpper ever happens, maketables will complain.
Fixes #7002.
LGTM=dave
R=golang-codereviews, dave
CC=golang-codereviews
https://golang.org/cl/59210044
Array values are comparable if values of the array element type
are comparable.
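A small illustration of the rule at the language level: arrays of comparable elements can be compared with == and used as map keys. The point type and the sameArrays helper are made up for this example.

```go
package main

import "fmt"

// point is a hypothetical comparable element type.
type point struct{ x, y int }

// sameArrays compares two arrays element by element using ==.
func sameArrays(a, b [2]point) bool {
	return a == b
}

func main() {
	a := [2]point{{1, 2}, {3, 4}}
	b := [2]point{{1, 2}, {3, 4}}
	fmt.Println(sameArrays(a, b)) // prints "true"

	// Comparable arrays can also serve as map keys.
	seen := map[[2]string]bool{}
	seen[[2]string{"a", "b"}] = true
	fmt.Println(seen[[2]string{"a", "b"}]) // prints "true"

	// An array of slices is not comparable; uncommenting the next two
	// lines is a compile-time error.
	// var s [2][]int
	// _ = s == s
}
```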
Fixes #6526.
LGTM=khr
R=rsc, bradfitz, khr
CC=golang-codereviews
https://golang.org/cl/58580043
In external link mode the linker explicitly adds the string
constant "runtime/cgo". It adds the string constant using the
same symbol name as the compiler, but a different format. The
compiler assumes that the string data immediately follows the
string header, but the linker puts the two in different
sections. The result is bad string data when the compiler
sees "runtime/cgo" used as a string constant.
The compiler assumption is in datastring in [568]g/gobj.c.
The linker layout is in addstrdata in ld/data.c. The compiler
assumption is valid for string literals. The linker is not
creating a string literal, so its assumption is also valid.
There are a few ways to avoid this problem. This patch fixes
it by only doing the fake import of runtime/cgo if necessary,
and by only creating the string symbol if necessary.
Fixes #7234.
LGTM=dvyukov
R=golang-codereviews, dvyukov, bradfitz
CC=golang-codereviews
https://golang.org/cl/58410043
The Transport's idle connection cache is keyed by a string,
for pre-Go 1.0 reasons. Ever since Go has been able to use
structs as map keys, there's been a TODO in the code to use
structs instead of allocating strings. This change does that.
Saves 3 allocations and ~100 bytes of garbage per client
request. But because string hashing is so fast these days
(thanks, Keith), the performance is a wash: what we gain
on GC and not allocating, we lose in slower hashing (hashing
a struct of strings is slower than hashing one string).
This usually seems a bit faster, but I've also seen it be a
bit slower. But at least it's how I've wanted it, and
the allocation improvements are consistent.
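A minimal sketch of the difference, not the Transport's actual code: connKey, its fields, and stringKey are hypothetical names chosen for illustration. The struct key is built without concatenating, so no new string is allocated per lookup.

```go
package main

import "fmt"

// connKey is a hypothetical struct key; the real Transport's key type
// and fields are not shown in this message.
type connKey struct {
	proxy, scheme, addr string
}

// stringKey builds the older style of key by concatenation, which
// allocates a new string on every call.
func stringKey(proxy, scheme, addr string) string {
	return proxy + "|" + scheme + "|" + addr
}

func main() {
	byString := map[string]int{}
	byStruct := map[connKey]int{}

	byString[stringKey("", "https", "example.com:443")]++
	byStruct[connKey{"", "https", "example.com:443"}]++

	fmt.Println(byString[stringKey("", "https", "example.com:443")]) // prints "1"
	fmt.Println(byStruct[connKey{"", "https", "example.com:443"}])   // prints "1"
}
```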
LGTM=adg
R=adg
CC=golang-codereviews
https://golang.org/cl/58260043
The code is copied from cmd/6g.
Empirically, all branch targets are nil in this code so
something is still wrong, but at least this stops 8g -S
from crashing.
Update #7178
LGTM=dave, iant
R=iant, dave
CC=golang-codereviews
https://golang.org/cl/58400043
This is the chunked half of https://golang.org/cl/49570044.
We want full reads to return EOF as early as possible, when we
know we're at the end, so http.Transport client connections are eagerly
re-used in the common case, even if no Read or Close follows.
To do this, make the chunkedReader.Read fill up its argument p []byte
buffer as much as possible, as long as that doesn't involve doing
any more blocking reads to read chunk headers. That means if we
have a chunk EOF ("0\r\n") sitting in the incoming bufio.Reader,
we see it and set EOF on our final Read.
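To observe the behavior from the outside, the sketch below feeds a complete chunked body through the exported httputil.NewChunkedReader and counts how many Read calls it takes to reach io.EOF. readChunked is a hypothetical helper; the number of reads depends on the Go release, so no count is asserted here.

```go
package main

import (
	"fmt"
	"io"
	"net/http/httputil"
	"strings"
)

// readChunked (hypothetical helper) decodes a chunked body held in wire
// and reports the decoded data and the number of Read calls made before
// io.EOF was returned.
func readChunked(wire string) (string, int, error) {
	r := httputil.NewChunkedReader(strings.NewReader(wire))
	buf := make([]byte, 64) // larger than the whole body
	var body []byte
	reads := 0
	for {
		n, err := r.Read(buf)
		reads++
		body = append(body, buf[:n]...)
		if err == io.EOF {
			return string(body), reads, nil
		}
		if err != nil {
			return string(body), reads, err
		}
	}
}

func main() {
	// One 5-byte chunk followed by the terminating zero-length chunk.
	body, reads, err := readChunked("5\r\nhello\r\n0\r\n\r\n")
	fmt.Printf("body=%q reads=%d err=%v\n", body, reads, err)
}
```

If the EOF is delivered together with the last data, reads is 1; otherwise an extra Read is needed just to learn that the body has ended.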
LGTM=adg
R=adg
CC=golang-codereviews
https://golang.org/cl/58240043
Set EOF on the final Read of a body with a Content-Length, which
will cause clients to recycle their connection immediately upon
the final Read, rather than waiting for another Read or Close
(neither of which might come). This happens often when client
code is simply something like:
    err := json.NewDecoder(resp.Body).Decode(&dest)
    ...
Then there's usually no subsequent Read. Even if the client
calls Close (which they should): in Go 1.1, the body was
slurped to EOF, but in Go 1.2, that was then treated as a
Close-before-EOF and the underlying connection was closed.
But that's assuming the user even calls Close. Many don't.
Reading to EOF also causes a connection to be reused. Now the EOF
arrives earlier.
This CL only addresses the Content-Length case. A future CL
will address the chunked case.
LGTM=adg
R=adg
CC=golang-codereviews
https://golang.org/cl/49570044
This change also addresses some places where the comments were lacking.
Fixes #7087.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/56700043
On 32-bit systems, one can arrange make(chan) parameters so that
the chan buffer gives access to the whole of memory.
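The underlying hazard is a 32-bit multiplication that wraps. The sketch below is not the runtime's code: bufferBytes and maxMem are hypothetical, and uint32 stands in for a 32-bit uintptr. It shows the wrapped product next to a guard that divides instead of multiplying.

```go
package main

import "fmt"

// maxMem is a hypothetical upper bound on a single allocation.
const maxMem = 1 << 31

// bufferBytes returns elemSize*n, or ok=false when the product would
// exceed maxMem (which also rules out 32-bit wraparound).
func bufferBytes(elemSize, n uint32) (size uint32, ok bool) {
	if elemSize != 0 && n > maxMem/elemSize {
		return 0, false
	}
	return elemSize * n, true
}

func main() {
	var elemSize, n uint32 = 8, 1<<29 + 1

	// Unchecked: 8 * (2^29 + 1) = 2^32 + 8, which wraps to 8 in 32 bits.
	fmt.Println(elemSize * n) // prints "8"

	// Checked: the request is rejected instead of looking like 8 bytes.
	_, ok := bufferBytes(elemSize, n)
	fmt.Println(ok) // prints "false"
}
```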
LGTM=r
R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/50250045
The tiny alloc memory block is shared by different goroutines running on the same thread.
We call racemalloc after enabling preemption in mallocgc;
as a result, another goroutine can act on a tiny block that has not yet been race-cleared.
Call racemalloc before enabling preemption.
Fixes #7224.
LGTM=dave
R=golang-codereviews, dave
CC=golang-codereviews
https://golang.org/cl/57730043
Under some circumstances linking a test binary with gccgo can fail, because
the installed version of the library ends up before the version built for the
test on the linker command line.
This admittedly slightly hackish change fixes the problem by putting the library archives
on the linker command line in the order that a pre-order depth first traversal
of the dependencies gives them, which has the side effect of always putting the
version of the library built for the test first.
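As an illustration of the ordering only (not the go command's code), a pre-order depth-first traversal emits each package before its dependencies. linkOrder and the package names are hypothetical, and skipping already-visited packages is an assumption of this sketch.

```go
package main

import "fmt"

// linkOrder returns the packages reachable from root in pre-order:
// each package is emitted before the packages it depends on.
func linkOrder(root string, deps map[string][]string) []string {
	var order []string
	seen := map[string]bool{}
	var visit func(pkg string)
	visit = func(pkg string) {
		if seen[pkg] {
			return
		}
		seen[pkg] = true
		order = append(order, pkg) // pre-order: emit before descending
		for _, d := range deps[pkg] {
			visit(d)
		}
	}
	visit(root)
	return order
}

func main() {
	// Hypothetical graph: the test binary depends on the test build of
	// package a and on b, and b also depends on a.
	deps := map[string][]string{
		"a.test": {"a", "b"},
		"b":      {"a"},
	}
	fmt.Println(linkOrder("a.test", deps)) // prints "[a.test a b]"
}
```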
Fixes #6768
LGTM=rsc
R=golang-codereviews, minux.ma, gobot, rsc, dave
CC=golang-codereviews
https://golang.org/cl/28050043
Although debug.Stack is deprecated, it should still return the correct result.
Output before this CL (using a trivial library in $GOPATH/test.com/a):
/home/vince/src/test.com/a/lib.go:9 (0x42311e)
    com/a.ShowStack: os.Stdout.Write(debug.Stack())
Output with this CL applied:
/home/vince/src/test.com/a/lib.go:9 (0x42311e)
    ShowStack: os.Stdout.Write(debug.Stack())
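The trivial library itself is not included in this message. A guess at something equivalent is sketched below, with ShowStack returning the trace so it can be inspected; the exact output format depends on the Go release and will not necessarily match the lines above.

```go
package main

import (
	"os"
	"runtime/debug"
)

// ShowStack returns the calling goroutine's stack trace as formatted by
// debug.Stack. The name mirrors the example output; the body is a guess.
func ShowStack() []byte {
	return debug.Stack()
}

func main() {
	os.Stdout.Write(ShowStack())
}
```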
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/57330043
Currently Windows crashes because early allocs in schedinit
try to allocate tiny memory blocks, but m->p is not yet set up.
I've considered calling procresize(1) earlier in schedinit,
but this refactoring is better and should fix the issue as well.
Fixes #7218.
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045
When GOMAXPROCS>1, the last P in a syscall is never retaken
(because there are already idle P's -- npidle>0).
This prevents the sysmon thread from sleeping.
On a darwin machine the program from issue 6673 constantly
consumes ~0.2% CPU. With this change it consistently consumes 0.0% CPU.
Fixes #6673.
R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/56990045
Use the smaller read-only bytes.NewReader/strings.NewReader instead
of a bytes.Buffer when possible.
LGTM=r
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54660045
In DWARF 4 the debug info for large types is put into
.debug_types sections, so that the linker can discard duplicate
info. This change adds support for reading type units.
Another small change included here is that DWARF 3 supports
storing the byte offset of a struct field as a formData rather
than a formDwarfBlock.
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/56300043
On 32-bit systems, n*sizeof(r[0]) can overflow.
Or it can become 1<<32-eps, in which case mallocgc will "successfully"
allocate 0 pages for it. There are no checks downstream,
and MHeap_Grow just does:
    npage = (npage+15)&~15;
    ask = npage<<PageShift;
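The arithmetic can be reproduced with 32-bit unsigned integers. This is an illustration, not the runtime's code: pageShift = 12 is an assumption, pagesFor is a hypothetical round-up-to-pages helper, and growAsk copies the two lines quoted above.

```go
package main

import "fmt"

const (
	pageShift = 12 // assumed page shift, for illustration only
	pageSize  = 1 << pageShift
)

// pagesFor rounds size up to whole pages using 32-bit arithmetic, so
// the addition can wrap for sizes just below 1<<32.
func pagesFor(size uint32) uint32 {
	return (size + pageSize - 1) >> pageShift
}

// growAsk mirrors the two MHeap_Grow lines quoted above.
func growAsk(npage uint32) uint32 {
	npage = (npage + 15) &^ 15
	return npage << pageShift
}

func main() {
	var size uint32 = 1<<32 - 4095 // "1<<32-eps"
	npage := pagesFor(size)
	fmt.Println(npage, growAsk(npage)) // prints "0 0"
}
```

A request of almost 4 GB therefore turns into a request for zero pages, and nothing later notices.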
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews
https://golang.org/cl/54760045
When growing a slice, take into account the size of the allocated memory block.
Also apply the same optimization to the string->[]byte conversion.
Fixes #6307.
benchmark                    old ns/op    new ns/op    delta
BenchmarkAppendGrowByte        4541036      4434108   -2.35%
BenchmarkAppendGrowString     59885673     44813604  -25.17%
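One way to observe the effect is to print the capacities a byte slice passes through while it grows, along with the capacity of a converted string. growthCaps is a hypothetical helper, and the numbers printed depend on the Go release and platform, so none are asserted here.

```go
package main

import "fmt"

// growthCaps appends n bytes one at a time and records each distinct
// capacity the slice passes through.
func growthCaps(n int) []int {
	var b []byte
	var caps []int
	for i := 0; i < n; i++ {
		b = append(b, 'x')
		if len(caps) == 0 || cap(b) != caps[len(caps)-1] {
			caps = append(caps, cap(b))
		}
	}
	return caps
}

func main() {
	fmt.Println(growthCaps(100))

	// string->[]byte conversion: the capacity may exceed the length when
	// the runtime rounds the allocation up to a larger block.
	s := []byte("hello, world")
	fmt.Println(len(s), cap(s))
}
```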
LGTM=khr
R=khr
CC=golang-codereviews, iant, rsc
https://golang.org/cl/53340044
On top of the "tiny allocator" (cl/38750047), this reduces the number of allocs by 1% on json.
No code should rely on zero termination, so this will also make debugging simpler
by uncovering issues earlier.
json-1
allocated      7949686      7915766   -0.43%
allocs           93778        92790   -1.05%
time         100957795     97250949   -3.67%
The rest of the metrics are too noisy.
LGTM=r
R=golang-codereviews, r, bradfitz, iant
CC=golang-codereviews
https://golang.org/cl/40370061
There is more zeroing than I would like right now -
temporaries used for the new map and channel runtime
calls need to be eliminated - but it will do for now.
This CL only has an effect if you are building with
GOEXPERIMENT=precisestack ./all.bash
(or make.bash). It costs about 5% in the overall time
spent in all.bash. That number will come down before
we make it on by default, but this should be enough for
Keith to try using the precise maps for copying stacks.
amd64 only (and it's not really great generated code).
TBR=khr, iant
CC=golang-codereviews
https://golang.org/cl/56430043
The addition of TLS to ARM rewrote the MRC instruction
differently depending on whether we were using internal
or external linking mode. That's clearly not okay, since we
don't know that during compilation, which is when we now
generate the code. Also, because the change did not introduce
a real MRC instruction but instead just macro-expanded it
in the assembler, liblink is rewriting a WORD instruction that
may actually be looking for that specific constant, which would
lead to very unexpected results. It was also using one value
that happened to be 8 where a different value that also
happened to be 8 belonged. So the code was correct for those
values but not correct in general, and very confusing.
Throw it all away.
Replace with the following. There is a linker-provided symbol
runtime.tlsgm with a value (address) set to the offset from the
hardware-provided TLS base register to the g and m storage.
Any reference to that name emits an appropriate TLS relocation
to be resolved by either the internal linker or the external linker,
depending on the link mode. The relocation has exactly the
semantics of the R_ARM_TLS_LE32 relocation, which is what
the external linker provides.
This symbol is only used in two routines, runtime.load_gm and
runtime.save_gm. In both cases it is now used like this:
    MRC  15, 0, R0, C13, C0, 3     // fetch TLS base pointer
    MOVW $runtime·tlsgm(SB), R2
    ADD  R2, R0                    // now R0 points at thread-local g+m storage
It is likely that this change breaks the generation of shared libraries
on ARM, because the MOVW needs to be rewritten to use the global
offset table and a different relocation type. But let's get the supported
functionality working again before we worry about unsupported
functionality.
LGTM=dave, iant
R=iant, dave
CC=golang-codereviews
https://golang.org/cl/56120043
Needs to be an h3, not an h2.
Thanks to Mingjie Xing for pointing it out.
LGTM=dsymonds
R=golang-codereviews, dsymonds
CC=golang-codereviews
https://golang.org/cl/55980046