Better sampling of objects that are close in size to sampling rate.
See the comment for details.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/43830043
Current "System->etext" is not very informative.
Add parent "GC" frame.
Replace un-unwindable syscall/cgo frames with Go stack that leads to the call.
LGTM=rsc
R=rsc, alex.brainman, ality
CC=golang-codereviews
https://golang.org/cl/61270043
(up to 2.8x).
This is a partially unrolled version which performs better for small
hashes and only sacrifices a small amount of ultimate speed to a fully
unrolled version which uses 5k of code.
Code size
Before 1636 bytes
After 1880 bytes
15% larger
Benchmarks on Samsung Exynos 5 ARMv7 Chromebook
benchmark old ns/op new ns/op delta
BenchmarkHash8Bytes 1907 1136 -40.43%
BenchmarkHash1K 20280 7547 -62.79%
BenchmarkHash8K 148469 52576 -64.59%
benchmark old MB/s new MB/s speedup
BenchmarkHash8Bytes 4.19 7.04 1.68x
BenchmarkHash1K 50.49 135.68 2.69x
BenchmarkHash8K 55.18 155.81 2.82x
LGTM=dave, agl
R=dave, bradfitz, agl, adg, nick
CC=golang-codereviews
https://golang.org/cl/56990044
Remove GOOS_solaris ifdef from netpoll code,
instead introduce runtime edge/level triggered IO flag.
Replace armread/armwrite with a single arm(mode) function,
that's how all other interfaces look like and these functions
will need to do roughly the same thing anyway.
LGTM=rsc
R=golang-codereviews, dave, rsc
CC=golang-codereviews
https://golang.org/cl/55500044
1. Make internal chan functions static.
2. Move selgen local variable instead of a member of G struct.
3. Change "bool *pres/selected" parameter of chansend/chanrecv to "bool block",
which is simpler, faster and less code.
-37 lines total.
LGTM=rsc
R=golang-codereviews, dave, gobot, rsc
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/58610043
Moves sweep phase out of stoptheworld by adding
background sweeper goroutine and lazy on-demand sweeping.
It turned out to be somewhat trickier than I expected,
because there is no point in time when we know size of live heap
nor consistent number of mallocs and frees.
So everything related to next_gc, mprof, memstats, etc becomes trickier.
At the end of GC next_gc is conservatively set to heap_alloc*GOGC,
which is much larger than real value. But after every sweep
next_gc is decremented by freed*GOGC. So when everything is swept
next_gc becomes what it should be.
For mprof I had to introduce 3-generation scheme (allocs, revent_allocs, prev_allocs),
because by the end of GC we know number of frees for the *previous* GC.
Significant caution is required to not cross yet-unknown real value of next_gc.
This is achieved by 2 means:
1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are
returned to heap.
This provides quite strong guarantees that heap does not grow when it should now.
http-1
allocated 7036 7033 -0.04%
allocs 60 60 +0.00%
cputime 51050 46700 -8.52%
gc-pause-one 34060569 1777993 -94.78%
gc-pause-total 2554 133 -94.79%
latency-50 178448 170926 -4.22%
latency-95 284350 198294 -30.26%
latency-99 345191 220652 -36.08%
rss 101564416 101007360 -0.55%
sys-gc 6606832 6541296 -0.99%
sys-heap 88801280 87752704 -1.18%
sys-other 7334208 7405928 +0.98%
sys-stack 524288 524288 +0.00%
sys-total 103266608 102224216 -1.01%
time 50339 46533 -7.56%
virtual-mem 292990976 293728256 +0.25%
garbage-1
allocated 2983818 2990889 +0.24%
allocs 62880 62902 +0.03%
cputime 16480000 16190000 -1.76%
gc-pause-one 828462467 487875135 -41.11%
gc-pause-total 4142312 2439375 -41.11%
rss 1151709184 1153712128 +0.17%
sys-gc 66068352 66068352 +0.00%
sys-heap 1039728640 1039728640 +0.00%
sys-other 37776064 40770176 +7.93%
sys-stack 8781824 8781824 +0.00%
sys-total 1152354880 1155348992 +0.26%
time 16496998 16199876 -1.80%
virtual-mem 1409564672 1402281984 -0.52%
LGTM=rsc
R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot
CC=golang-codereviews, khr
https://golang.org/cl/46430043
$ go test -cpu=1,1,1,1,1,1,1,1,1 encoding/json
--- FAIL: TestIndentBig (0.00 seconds)
scanner_test.go:131: Indent(jsonBig) did not get bigger
On 4-th run initBig generates an empty array.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/49930051
ConstantTimeCompare has always been documented to take equal length
slices but perhaps this is too subtle, even for 'subtle'.
Fixes#7304.
LGTM=hanwen, bradfitz
R=golang-codereviews, hanwen, bradfitz
CC=golang-codereviews
https://golang.org/cl/62190043
Allow clients to check for timeouts without relying on error substring
matching.
Fixes#6185.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/55470048
Prevents a ton of garbage. (Noticed this when writing large
Camlistore zip archives to Amazon Glacier)
Note that the Closer part of the io.WriteCloser is never given
to users. It's an internal detail of the package.
benchmark old ns/op new ns/op delta
BenchmarkCompressedZipGarbage 42884123 40732373 -5.02%
benchmark old allocs new allocs delta
BenchmarkCompressedZipGarbage 204 149 -26.96%
benchmark old bytes new bytes delta
BenchmarkCompressedZipGarbage 4397576 66744 -98.48%
LGTM=adg, rsc
R=adg, rsc
CC=golang-codereviews
https://golang.org/cl/54300053
There is frequently a thread hanging on GQCS,
currently it skews profiles towards netpoll,
but it is not bad and is not consuming any resources.
R=alex.brainman
CC=golang-codereviews
https://golang.org/cl/61560043
mp->mcache can be concurrently modified by runtime·helpgc.
In such case sigprof can remember mcache=nil, then helpgc sets it to non-nil,
then sigprof restores it back to nil, GC crashes with nil mcache.
R=rsc
CC=golang-codereviews
https://golang.org/cl/58860044
filepath.Base covers all scenarios
(for example paths like d:hello.txt)
on windows
LGTM=iant, bradfitz
R=golang-codereviews, iant, bradfitz
CC=golang-codereviews
https://golang.org/cl/59740050
This CL is in preparation to make cgo work on freebsd/arm.
The signedness of C char might be a problem when we make bare syscall
APIs, Go structures, using built-in bootstrap scripts with cgo because
they do translate C stuff to Go stuff internally. For now almost all
the C compilers assume that the type of char will be unsigned on arm
by default but it makes a different view from amd64, 386.
This CL just passes -fsigned-char, let the type of char be signed,
option which is supported on both gcc and clang to the underlying C
compilers through cgo for avoiding such inconsistency on syscall API.
LGTM=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/59740051
This CL is in preparation to make cgo work on freebsd/arm.
It's just for fixing build fails on freebsd/arm, we still need to
update z-files later for fixing several package test fails.
How to generate z-files on freebsd/arm in the bootstrapping phase:
1. run freebsd on appropriate arm-eabi platforms
2. both syscall z-files and runtime def-files in the current tree are
broken about EABI padding, fix them by hand
3. run make.bash again to build $GOTOOLDIR/cgo
4. use $GOTOOLDIR/cgo directly
LGTM=iant
R=iant, dave
CC=golang-codereviews
https://golang.org/cl/59490052
This CL is in preparation to make cgo work on freebsd/arm.
How to generate defs-files on freebsd/arm in the bootstrapping phase:
1. run freebsd on appropriate arm-eabi platforms
2. both syscall z-files and runtime def-files in the current tree are
broken about EABI padding, fix them by hand
3. run make.bash again to build $GOTOOLDIR/cgo
4. use $GOTOOLDIR/cgo directly
LGTM=minux.ma, iant
R=iant, minux.ma, dave
CC=golang-codereviews
https://golang.org/cl/59580045
Whitespace characters are allowed in quoted-string according to RFC 5322 without
being "Q"-encoding. Address.String() already always formats the name portion in
quoted string, so whitespace characters should be allowed in there.
Fixes#6641.
LGTM=dave, dsymonds
R=golang-codereviews, gobot, dsymonds, dave
CC=golang-codereviews
https://golang.org/cl/55770043
I just added support for goto statements to my GopherJS project and now I am trying to get rid of my patches. These occurrences of goto however are a bit problematic:
GopherJS has to emulate gotos, so there is some performance drawback when doing so. In this case the drawback is major, since this is a core function of math/big which is called quite often. Additionally I can't see any reason here why the implementation with gotos should be preferred over my proposal.
That's why I would kindly ask to include this patch, even though it is functional equivalent to the existing code.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/55470046
Introduce two new environment variables, CC_FOR_TARGET and CXX_FOR_TARGET.
CC_FOR_TARGET defaults to CC and is used when compiling for GOARCH, while
CC remains for compiling for GOHOSTARCH.
CXX_FOR_TARGET defaults to CXX and is used when compiling C++ code for
GOARCH.
CGO_ENABLED defaults to disabled when cross compiling and has to be
explicitly enabled.
Update #4714
LGTM=minux.ma, iant
R=golang-codereviews, minux.ma, iant, rsc, dominik.honnef
CC=golang-codereviews
https://golang.org/cl/57100043
Add benchmarks for:
1. non-blocking failing receive (polling of "stop" chan)
2. channel-based semaphore (gate pattern)
3. select-based producer/consumer (pass data through a channel, but also wait on "stop" and "timeout" channels)
LGTM=r
R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/59040043
Command was (and is) documented like:
"If name contains no path separators, Command uses LookPath to
resolve the path to a complete name if possible. Otherwise it
uses name directly."
But that wasn't true. It always did LookPath, and then
set a sticky error that the user couldn't unset.
And then if cmd.Dir was changed, Start would still fail
due to the earlier sticky error being set.
This keeps LookPath in the same place as before (so no user
visible changes in cmd.Path after Command), but only does
it when the documentation says it will happen.
Also, clarify the docs about a relative Dir path.
No change in any existing behavior, except using Command
is now possible with relative paths. Previously it only
worked if you built the *Cmd by hand.
Fixes#7228
LGTM=iant
R=iant
CC=adg, golang-codereviews
https://golang.org/cl/59580044
Fixes#6874.
Use runtime.GC() as a stronger version of runtime.Gosched() which tends to bias the running goroutine in an otherwise idle system. This appears to reduce the worst case number of spins from 600 down to 30 on my 2 core system under high load.
LGTM=iant
R=golang-codereviews, lucio.dere, iant, dvyukov
CC=golang-codereviews
https://golang.org/cl/56540046
If a LowerUpper ever happens, maketables will complain.
Fixes#7002.
LGTM=dave
R=golang-codereviews, dave
CC=golang-codereviews
https://golang.org/cl/59210044
The Transport's idle connection cache is keyed by a string,
for pre-Go 1.0 reasons. Ever since Go has been able to use
structs as map keys, there's been a TODO in the code to use
structs instead of allocating strings. This change does that.
Saves 3 allocatins and ~100 bytes of garbage per client
request. But because string hashing is so fast these days
(thanks, Keith), the performance is a wash: what we gain
on GC and not allocating, we lose in slower hashing. (hashing
structs of strings is slower than 1 string)
This seems a bit faster usually, but I've also seen it be a
bit slower. But at least it's how I've wanted it now, and it
the allocation improvements are consistent.
LGTM=adg
R=adg
CC=golang-codereviews
https://golang.org/cl/58260043
This is the chunked half of https://golang.org/cl/49570044 .
We want full reads to return EOF as early as possible, when we
know we're at the end, so http.Transport client connections are eagerly
re-used in the common case, even if no Read or Close follows.
To do this, make the chunkedReader.Read fill up its argument p []byte
buffer as much as possible, as long as that doesn't involve doing
any more blocking reads to read chunk headers. That means if we
have a chunk EOF ("0\r\n") sitting in the incoming bufio.Reader,
we see it and set EOF on our final Read.
LGTM=adg
R=adg
CC=golang-codereviews
https://golang.org/cl/58240043
Set EOF on the final Read of a body with a Content-Length, which
will cause clients to recycle their connection immediately upon
the final Read, rather than waiting for another Read or Close
(neither of which might come). This happens often when client
code is simply something like:
err := json.NewDecoder(resp.Body).Decode(&dest)
...
Then there's usually no subsequent Read. Even if the client
calls Close (which they should): in Go 1.1, the body was
slurped to EOF, but in Go 1.2, that was then treated as a
Close-before-EOF and the underlying connection was closed.
But that's assuming the user even calls Close. Many don't.
Reading to EOF also causes a connection be reused. Now the EOF
arrives earlier.
This CL only addresses the Content-Length case. A future CL
will address the chunked case.
LGTM=adg
R=adg
CC=golang-codereviews
https://golang.org/cl/49570044
This change also addresses some places where the comments were lacking.
Fixes#7087.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/56700043
On 32-bits one can arrange make(chan) params so that
the chan buffer gives you access to whole memory.
LGTM=r
R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/50250045
Tiny alloc memory block is shared by different goroutines running on the same thread.
We call racemalloc after enabling preemption in mallocgc,
as the result another goroutine can act on not yet race-cleared tiny block.
Call racemalloc before enabling preemption.
Fixes#7224.
LGTM=dave
R=golang-codereviews, dave
CC=golang-codereviews
https://golang.org/cl/57730043
Although debug.Stack is deprecated, it should still return the correct result.
Output before this CL (using a trivial library in $GOPATH/test.com/a):
/home/vince/src/test.com/a/lib.go:9 (0x42311e)
com/a.ShowStack: os.Stdout.Write(debug.Stack())
Output with this CL applied:
/home/vince/src/test.com/a/lib.go:9 (0x42311e)
ShowStack: os.Stdout.Write(debug.Stack())
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/57330043
Currently windows crashes because early allocs in schedinit
try to allocate tiny memory blocks, but m->p is not yet setup.
I've considered calling procresize(1) earlier in schedinit,
but this refactoring is better and must fix the issue as well.
Fixes#7218.
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045
When GOMAXPROCS>1 the last P in syscall is never retaken
(because there are already idle P's -- npidle>0).
This prevents sysmon thread from sleeping.
On a darwin machine the program from issue 6673 constantly
consumes ~0.2% CPU. With this change it stably consumes 0.0% CPU.
Fixes#6673.
R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/56990045
Use the smaller read-only bytes.NewReader/strings.NewReader instead
of a bytes.Buffer when possible.
LGTM=r
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54660045
In DWARF 4 the debug info for large types is put into
.debug_type sections, so that the linker can discard duplicate
info. This change adds support for reading type units.
Another small change included here is that DWARF 3 supports
storing the byte offset of a struct field as a formData rather
than a formDwarfBlock.
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/56300043
On 32-bits n*sizeof(r[0]) can overflow.
Or it can become 1<<32-eps, and mallocgc will "successfully"
allocate 0 pages for it, there are no checks downstream
and MHeap_Grow just does:
npage = (npage+15)&~15;
ask = npage<<PageShift;
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews
https://golang.org/cl/54760045
When growing slice take into account size of the allocated memory block.
Also apply the same optimization to string->[]byte conversion.
Fixes#6307.
benchmark old ns/op new ns/op delta
BenchmarkAppendGrowByte 4541036 4434108 -2.35%
BenchmarkAppendGrowString 59885673 44813604 -25.17%
LGTM=khr
R=khr
CC=golang-codereviews, iant, rsc
https://golang.org/cl/53340044
On top of "tiny allocator" (cl/38750047), reduces number of allocs by 1% on json.
No code must rely on zero termination. So will also make debugging simpler,
by uncovering issues earlier.
json-1
allocated 7949686 7915766 -0.43%
allocs 93778 92790 -1.05%
time 100957795 97250949 -3.67%
rest of the metrics are too noisy.
LGTM=r
R=golang-codereviews, r, bradfitz, iant
CC=golang-codereviews
https://golang.org/cl/40370061
There is more zeroing than I would like right now -
temporaries used for the new map and channel runtime
calls need to be eliminated - but it will do for now.
This CL only has an effect if you are building with
GOEXPERIMENT=precisestack ./all.bash
(or make.bash). It costs about 5% in the overall time
spent in all.bash. That number will come down before
we make it on by default, but this should be enough for
Keith to try using the precise maps for copying stacks.
amd64 only (and it's not really great generated code).
TBR=khr, iant
CC=golang-codereviews
https://golang.org/cl/56430043
The addition of TLS to ARM rewrote the MRC instruction
differently depending on whether we were using internal
or external linking mode. That's clearly not okay, since we
don't know that during compilation, which is when we now
generate the code. Also, because the change did not introduce
a real MRC instruction but instead just macro-expanded it
in the assembler, liblink is rewriting a WORD instruction that
may actually be looking for that specific constant, which would
lead to very unexpected results. It was also using one value
that happened to be 8 where a different value that also
happened to be 8 belonged. So the code was correct for those
values but not correct in general, and very confusing.
Throw it all away.
Replace with the following. There is a linker-provided symbol
runtime.tlsgm with a value (address) set to the offset from the
hardware-provided TLS base register to the g and m storage.
Any reference to that name emits an appropriate TLS relocation
to be resolved by either the internal linker or the external linker,
depending on the link mode. The relocation has exactly the
semantics of the R_ARM_TLS_LE32 relocation, which is what
the external linker provides.
This symbol is only used in two routines, runtime.load_gm and
runtime.save_gm. In both cases it is now used like this:
MRC 15, 0, R0, C13, C0, 3 // fetch TLS base pointer
MOVW $runtime·tlsgm(SB), R2
ADD R2, R0 // now R0 points at thread-local g+m storage
It is likely that this change breaks the generation of shared libraries
on ARM, because the MOVW needs to be rewritten to use the global
offset table and a different relocation type. But let's get the supported
functionality working again before we worry about unsupported
functionality.
LGTM=dave, iant
R=iant, dave
CC=golang-codereviews
https://golang.org/cl/56120043
Adam (agl@) had already done an initial review of this CL in a branch.
Added ClientSessionState to Config which now allows clients to keep state
required to resume a TLS session with a server. A client handshake will try
and use the SessionTicket/MasterSecret in this cached state if the server
acknowledged resumption.
We also added support to cache ClientSessionState object in Config that will
be looked up by server remote address during the handshake.
R=golang-codereviews, agl, rsc, agl, agl, bradfitz, mikioh.mikioh
CC=golang-codereviews
https://golang.org/cl/15680043
It implements parsing of the header and symbol table for both
32-bit and 64-bit Plan 9 binaries. The nm tool was updated to
use this package.
R=rsc, aram
CC=golang-codereviews
https://golang.org/cl/49970044
The typo was introduced by one of Dmitriy's CLs this morning.
The fix makes the ARM build compile again; it still won't pass
its tests, but one thing at a time.
TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/55770044
This change include updates to the probeIPv4Stack
and probeIPv6Stack to ensure that one or both
protocols are supported by ip(3).
The addition of fdMutex to netFD fixes the
TestTCPConcurrentAccept failures.
Additional changes add support for keepalive.
R=golang-codereviews, 0intro
CC=golang-codereviews, rsc
https://golang.org/cl/49920048
Doesn't really matter for the most part, since the runtime-integrated
network poller uses its own kevent implementation, but for people using
the syscall directly, we should use an unsafe.Pointer for the precise GC
to retain the pointer arguments.
Also push down unsafe.Pointer a bit further in exec_linux.go, not
that there are any GC preemption points in the middle and sys
is still live anyway.
R=golang-codereviews, dvyukov
CC=golang-codereviews, iant
https://golang.org/cl/55520043
Introduces two-phase goroutine parking mechanism -- prepare to park, commit park.
This mechanism does not require backing mutex to protect wait predicate.
Use it in netpoll. See comment in netpoll.goc for details.
This slightly reduces contention between reader, writer and read/write io notifications;
and just eliminates a bunch of mutex operations from hotpaths, thus making then faster.
benchmark old ns/op new ns/op delta
BenchmarkTCP4ConcurrentReadWrite 2109 1945 -7.78%
BenchmarkTCP4ConcurrentReadWrite-2 1162 1113 -4.22%
BenchmarkTCP4ConcurrentReadWrite-4 798 755 -5.39%
BenchmarkTCP4ConcurrentReadWrite-8 803 748 -6.85%
BenchmarkTCP4Persistent 9411 9240 -1.82%
BenchmarkTCP4Persistent-2 5888 5813 -1.27%
BenchmarkTCP4Persistent-4 4016 3968 -1.20%
BenchmarkTCP4Persistent-8 3943 3857 -2.18%
R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
CC=golang-codereviews, khr
https://golang.org/cl/45700043
- do not lose profiling signals when we have no mcache (possible for syscalls/cgo)
- do not lose any profiling signals on windows
- fix profiling of cgo programs on windows (they had no m->thread setup)
- properly setup tls in cgo programs on windows
- check _beginthread return value
Fixes#6417.
Fixes#6986.
R=alex.brainman, rsc
CC=golang-codereviews
https://golang.org/cl/44820047
In particular: setsockopt, getsockopt, bind, connect.
There are probably more.
All platforms cross-compile with make.bash, and all.bash still
pases on linux/amd64.
Update #7169
R=rsc
CC=golang-codereviews
https://golang.org/cl/55410043