On top of "tiny allocator" (cl/38750047), reduces number of allocs by 1% on json.
No code must rely on zero termination. So will also make debugging simpler,
by uncovering issues earlier.
json-1
allocated 7949686 7915766 -0.43%
allocs 93778 92790 -1.05%
time 100957795 97250949 -3.67%
The rest of the metrics are too noisy.
LGTM=r
R=golang-codereviews, r, bradfitz, iant
CC=golang-codereviews
https://golang.org/cl/40370061
There is more zeroing than I would like right now -
temporaries used for the new map and channel runtime
calls need to be eliminated - but it will do for now.
This CL only has an effect if you are building with
GOEXPERIMENT=precisestack ./all.bash
(or make.bash). It costs about 5% in the overall time
spent in all.bash. That number will come down before
we make it on by default, but this should be enough for
Keith to try using the precise maps for copying stacks.
amd64 only (and the generated code is not really great).
TBR=khr, iant
CC=golang-codereviews
https://golang.org/cl/56430043
The addition of TLS to ARM rewrote the MRC instruction
differently depending on whether we were using internal
or external linking mode. That's clearly not okay, since we
don't know that during compilation, which is when we now
generate the code. Also, because the change did not introduce
a real MRC instruction but instead just macro-expanded it
in the assembler, liblink is rewriting a WORD instruction that
may actually be looking for that specific constant, which would
lead to very unexpected results. It was also using one value
that happened to be 8 where a different value that also
happened to be 8 belonged. So the code was correct for those
values but not correct in general, and very confusing.
Throw it all away.
Replace with the following. There is a linker-provided symbol
runtime.tlsgm with a value (address) set to the offset from the
hardware-provided TLS base register to the g and m storage.
Any reference to that name emits an appropriate TLS relocation
to be resolved by either the internal linker or the external linker,
depending on the link mode. The relocation has exactly the
semantics of the R_ARM_TLS_LE32 relocation, which is what
the external linker provides.
This symbol is only used in two routines, runtime.load_gm and
runtime.save_gm. In both cases it is now used like this:
MRC 15, 0, R0, C13, C0, 3 // fetch TLS base pointer
MOVW $runtime·tlsgm(SB), R2
ADD R2, R0 // now R0 points at thread-local g+m storage
It is likely that this change breaks the generation of shared libraries
on ARM, because the MOVW needs to be rewritten to use the global
offset table and a different relocation type. But let's get the supported
functionality working again before we worry about unsupported
functionality.
LGTM=dave, iant
R=iant, dave
CC=golang-codereviews
https://golang.org/cl/56120043
Needs to be an h3, not an h2.
Thanks to Mingjie Xing for pointing it out.
LGTM=dsymonds
R=golang-codereviews, dsymonds
CC=golang-codereviews
https://golang.org/cl/55980046
The R= is populated by Rietveld, so it's basically
anyone who replied to the CL. The LGTM= is meant
to record who actually signed off on the CL.
LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/55390043
Adam (agl@) had already done an initial review of this CL in a branch.
Added ClientSessionState to Config, which now allows clients to keep the state
required to resume a TLS session with a server. A client handshake will try
to use the SessionTicket/MasterSecret in this cached state if the server
acknowledged resumption.
We also added support for caching the ClientSessionState object in Config; it is
looked up by server remote address during the handshake.
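For reference, a minimal client-side sketch of session resumption; the
ClientSessionCache field and NewLRUClientSessionCache helper used here are the
eventual crypto/tls names and are assumptions on my part, since the text above
only describes caching ClientSessionState in Config.

package main

import (
	"crypto/tls"
	"log"
)

func main() {
	cfg := &tls.Config{
		ServerName: "example.com",
		// Cache ClientSessionState values so later handshakes can offer the
		// stored session ticket and resume instead of doing a full handshake.
		ClientSessionCache: tls.NewLRUClientSessionCache(64),
	}
	for i := 0; i < 2; i++ {
		conn, err := tls.Dial("tcp", "example.com:443", cfg)
		if err != nil {
			log.Fatal(err)
		}
		// The second connection should resume if the server acknowledged the ticket.
		log.Printf("resumed=%v", conn.ConnectionState().DidResume)
		conn.Close()
	}
}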
R=golang-codereviews, agl, rsc, agl, agl, bradfitz, mikioh.mikioh
CC=golang-codereviews
https://golang.org/cl/15680043
It implements parsing of the header and symbol table for both
32-bit and 64-bit Plan 9 binaries. The nm tool was updated to
use this package.
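A rough usage sketch, assuming "it" refers to the debug/plan9obj package (an
assumption; the CL text above does not name it), roughly mirroring what nm does:

package main

import (
	"debug/plan9obj"
	"fmt"
	"log"
	"os"
)

func main() {
	// Open a Plan 9 a.out binary named on the command line.
	f, err := plan9obj.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// The header is parsed for both 32-bit and 64-bit binaries.
	fmt.Printf("magic %#x entry %#x\n", f.Magic, f.Entry)

	// Dump the symbol table, roughly what nm prints.
	syms, err := f.Symbols()
	if err != nil {
		log.Fatal(err)
	}
	for _, s := range syms {
		fmt.Printf("%8x %c %s\n", s.Value, s.Type, s.Name)
	}
}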
R=rsc, aram
CC=golang-codereviews
https://golang.org/cl/49970044
The typo was introduced by one of Dmitriy's CLs this morning.
The fix makes the ARM build compile again; it still won't pass
its tests, but one thing at a time.
TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/55770044
This change includes updates to probeIPv4Stack
and probeIPv6Stack to ensure that one or both
protocols are supported by ip(3).
The addition of fdMutex to netFD fixes the
TestTCPConcurrentAccept failures.
Additional changes add support for keepalive.
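For illustration, a hedged sketch of the user-visible knob this enables on
Plan 9; SetKeepAlive is the standard net API, and the address is only an example:

package main

import (
	"log"
	"net"
)

func main() {
	conn, err := net.Dial("tcp", "example.com:80") // address is illustrative
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// With this change the call works on Plan 9 too instead of failing.
	if tc, ok := conn.(*net.TCPConn); ok {
		if err := tc.SetKeepAlive(true); err != nil {
			log.Fatal(err)
		}
		log.Println("keepalive enabled")
	}
}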
R=golang-codereviews, 0intro
CC=golang-codereviews, rsc
https://golang.org/cl/49920048
Doesn't really matter for the most part, since the runtime-integrated
network poller uses its own kevent implementation, but for people using
the syscall directly, we should use an unsafe.Pointer for the precise GC
to retain the pointer arguments.
Also push unsafe.Pointer down a bit further in exec_linux.go, not that it
matters much: there are no GC preemption points in the middle, and sys is
still live anyway.
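A hypothetical sketch of the distinction, not the actual syscall package code:
a parameter typed uintptr is just an integer to the precise collector, while an
unsafe.Pointer parameter keeps its referent live across the call.

package main

import "unsafe"

// event stands in for a Kevent_t-style struct passed to the kernel.
type event struct{ ident, filter uintptr }

// keventUintptr mimics a wrapper whose pointer argument is declared uintptr:
// the precise collector sees only an integer, so nothing keeps the slice live.
//go:noinline
func keventUintptr(changes uintptr, n int) { _, _ = changes, n }

// keventPointer mimics the fixed wrapper: the unsafe.Pointer parameter is a
// real pointer to the collector and retains the slice across the call.
//go:noinline
func keventPointer(changes unsafe.Pointer, n int) { _, _ = changes, n }

func main() {
	evs := make([]event, 4)

	// Only safe when the conversion happens in the argument list of a real
	// syscall.Syscall call; as an ordinary function argument it is risky.
	keventUintptr(uintptr(unsafe.Pointer(&evs[0])), len(evs))

	// Preferred form for anything that is not literally syscall.Syscall.
	keventPointer(unsafe.Pointer(&evs[0]), len(evs))
}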
R=golang-codereviews, dvyukov
CC=golang-codereviews, iant
https://golang.org/cl/55520043
Introduces a two-phase goroutine parking mechanism: prepare to park, commit park.
This mechanism does not require a backing mutex to protect the wait predicate.
Use it in netpoll. See the comment in netpoll.goc for details.
This slightly reduces contention between reader, writer and read/write io notifications;
and it eliminates a bunch of mutex operations from hot paths, thus making them faster
(a user-level sketch of the pattern follows the benchmark numbers below).
benchmark old ns/op new ns/op delta
BenchmarkTCP4ConcurrentReadWrite 2109 1945 -7.78%
BenchmarkTCP4ConcurrentReadWrite-2 1162 1113 -4.22%
BenchmarkTCP4ConcurrentReadWrite-4 798 755 -5.39%
BenchmarkTCP4ConcurrentReadWrite-8 803 748 -6.85%
BenchmarkTCP4Persistent 9411 9240 -1.82%
BenchmarkTCP4Persistent-2 5888 5813 -1.27%
BenchmarkTCP4Persistent-4 4016 3968 -1.20%
BenchmarkTCP4Persistent-8 3943 3857 -2.18%
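A user-level sketch of the prepare/commit pattern, using an atomic state word
and a channel in place of the runtime's internals; all names are illustrative
and it assumes a single waiter:

package main

import (
	"fmt"
	"sync/atomic"
)

const (
	idle    int32 = iota // nothing pending, nobody parked
	ready                // the event arrived before anyone parked
	waiting              // a waiter has committed to parking
)

type parker struct {
	state int32
	sema  chan struct{} // stands in for the runtime's park/unpark of a goroutine
}

// wait parks until notify has run, with no mutex guarding the predicate.
func (p *parker) wait() {
	for {
		// Phase 1: prepare to park. If the event already happened, consume it.
		if atomic.CompareAndSwapInt32(&p.state, ready, idle) {
			return
		}
		// Phase 2: commit to parking. If notify raced in, the CAS fails and we retry.
		if atomic.CompareAndSwapInt32(&p.state, idle, waiting) {
			<-p.sema // actually sleep
			return
		}
	}
}

// notify delivers one event, waking the waiter if it has already committed.
func (p *parker) notify() {
	for {
		switch atomic.LoadInt32(&p.state) {
		case ready:
			return // an event is already pending
		case waiting:
			if atomic.CompareAndSwapInt32(&p.state, waiting, idle) {
				p.sema <- struct{}{} // wake the committed waiter
				return
			}
		default: // idle
			if atomic.CompareAndSwapInt32(&p.state, idle, ready) {
				return
			}
		}
	}
}

func main() {
	p := &parker{sema: make(chan struct{}, 1)}
	done := make(chan struct{})
	go func() { p.wait(); close(done) }()
	p.notify()
	<-done
	fmt.Println("woken")
}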
R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
CC=golang-codereviews, khr
https://golang.org/cl/45700043
- do not lose profiling signals when we have no mcache (possible for syscalls/cgo)
- do not lose any profiling signals on windows
- fix profiling of cgo programs on windows (they had no m->thread setup)
- properly setup tls in cgo programs on windows
- check _beginthread return value
Fixes #6417.
Fixes #6986.
R=alex.brainman, rsc
CC=golang-codereviews
https://golang.org/cl/44820047
In particular: setsockopt, getsockopt, bind, connect.
There are probably more.
All platforms cross-compile with make.bash, and all.bash still
passes on linux/amd64.
Update #7169
R=rsc
CC=golang-codereviews
https://golang.org/cl/55410043
Now that liblink is compiled into the compilers and assemblers,
it must not refer to the "linkmode", since that is not known until
link time. This CL makes the ARM support no longer use linkmode,
which fixes a bug with cgo binaries that contain their own TLS
variables.
The x86 code must also remove linkmode; that is issue 7164.
Fixes #6992.
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/55160043
The escape analysis works by tracing assignment paths from
variables that start with pointer type, or addresses of variables
(addresses are always pointers). It does allow non-pointers
in the path, so that in this code it sees x's value escape into y:
var x *[10]int
y := (*int)(unsafe.Pointer(uintptr(unsafe.Pointer(x))+32))
It must allow uintptr in order to see through this kind of
"pointer arithmetic".
It also traces such values if they end up as uintptrs passed to
functions. This used to be important because packages like
encoding/gob passed around uintptrs holding real pointers.
The introduction of precise collection of stacks has forced
code to be more honest about which declared stack variables
hold pointers and which do not. In particular, the garbage
collector no longer sees pointers stored in uintptr variables.
Because of this, packages like encoding/gob have been fixed.
There is not much point in the escape analysis accepting
uintptrs as holding pointers at call boundaries if the garbage
collector does not.
Excluding uintptr-valued arguments brings the escape
analysis in line with the garbage collector and has the
useful side effect of making arguments to syscall.Syscall
not appear to escape.
That is, this CL should yield the same benefits as
CL 45930043 (rolled back in CL 53870043), but it does
so by making uintptrs less special, not more.
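A hedged illustration of the effect on a Unix system: because the uintptr
argument is no longer treated as holding a pointer, a buffer whose address
reaches syscall.Syscall only this way need not be moved to the heap (check
with -gcflags=-m; the exact output depends on the compiler version).

package main

import (
	"syscall"
	"unsafe"
)

// writeByte writes one byte to fd. The buffer's address reaches the kernel
// only as a uintptr inside the syscall.Syscall argument list, so with this
// change escape analysis can keep buf on the stack.
func writeByte(fd int, b byte) {
	var buf [1]byte
	buf[0] = b
	syscall.Syscall(syscall.SYS_WRITE, uintptr(fd), uintptr(unsafe.Pointer(&buf[0])), 1)
}

func main() {
	writeByte(1, '\n') // write a newline to stdout
}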
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/53940043
Many calls to symgrow pass a vlong value. Change the function
to not implicitly truncate, and to instead give an error if
the value is too large.
R=golang-codereviews, gobot, rsc
CC=golang-codereviews
https://golang.org/cl/54010043
Currently we collect (add) all roots into a global array in a single-threaded GC phase.
This hinders parallelism.
With this change we just kick off a parallel for over number_of_goroutines+5 iterations.
The parallel-for callback then decides whether it needs to scan the stack of a goroutine,
scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely.
This requires storing all goroutines in an array instead of a linked list
(to allow direct indexing).
This CL also removes the DebugScan functionality. It is broken because it uses
an unbounded stack, so it cannot run on g0. When it was working, I found it
unhelpful for debugging issues because the two algorithms are now too different.
This change would also require updating DebugScan, so it's simpler to just delete it.
With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same
(a sketch of the new dispatch follows the numbers below).
garbage-8
allocated 2987886 2989221 +0.04%
allocs 62885 62887 +0.00%
cputime 21286000 21272000 -0.07%
gc-pause-one 26633247 24885421 -6.56%
gc-pause-total 873570 811264 -7.13%
rss 242089984 242515968 +0.18%
sys-gc 13934336 13869056 -0.47%
sys-heap 205062144 205062144 +0.00%
sys-other 12628288 12628288 +0.00%
sys-stack 11534336 11927552 +3.41%
sys-total 243159104 243487040 +0.13%
time 2809477 2740795 -2.44%
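A hypothetical Go-level sketch of the dispatch, not the runtime's C code: a
parallel for over fixed roots plus goroutines, where the callback interprets
the index; the fixed-root count and all names here are assumptions.

package main

import (
	"fmt"
	"sync"
)

// fixedRoots is an assumed count of non-stack roots (data segment, bss,
// finalizers, and so on); the real set and its size live in the runtime.
const fixedRoots = 5

// allgs stands in for the runtime's array of all goroutines, which this CL
// introduces so that roots can be addressed by index.
var allgs = []string{"g1", "g2", "g3"}

// markRoot is the parallel-for callback: the index alone tells it whether to
// scan a fixed root or the stack of a particular goroutine.
func markRoot(i int) {
	if i < fixedRoots {
		fmt.Println("scan fixed root", i)
		return
	}
	fmt.Println("scan stack of", allgs[i-fixedRoots])
}

// parallelFor fans the iterations out to workers, standing in for the
// runtime's parfor helper.
func parallelFor(n, workers int, body func(int)) {
	work := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range work {
				body(i)
			}
		}()
	}
	for i := 0; i < n; i++ {
		work <- i
	}
	close(work)
	wg.Wait()
}

func main() {
	// One iteration per fixed root plus one per goroutine; no serial
	// root-collection phase is needed.
	parallelFor(fixedRoots+len(allgs), 4, markRoot)
}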
R=golang-codereviews, rsc
CC=cshapiro, golang-codereviews, khr
https://golang.org/cl/46860043
Instead of a per-goroutine stack of defers for all sizes,
introduce per-P defer pools for argument sizes 8, 24, 40, 56, 72 bytes.
For a program that starts 1e6 goroutines and then joins them:
old: rss=6.6g virtmem=10.2g time=4.85s
new: rss=4.5g virtmem= 8.2g time=3.48s
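An illustrative sketch of the size-class idea (not the runtime's implementation),
with sync.Pool standing in for the per-P pools:

package main

import (
	"fmt"
	"sync"
)

// deferClasses lists the argument sizes that get a dedicated pool, matching
// the sizes named above; the pooling itself is only a user-level stand-in.
var deferClasses = [...]int{8, 24, 40, 56, 72}

var deferPools [len(deferClasses)]sync.Pool

// classFor returns the index of the smallest class that fits argSize,
// or -1 if the record is too large to pool.
func classFor(argSize int) int {
	for i, c := range deferClasses {
		if argSize <= c {
			return i
		}
	}
	return -1
}

// newDeferRecord hands out a record from the matching pool, falling back to a
// plain allocation for oversized records.
func newDeferRecord(argSize int) []byte {
	if i := classFor(argSize); i >= 0 {
		if v := deferPools[i].Get(); v != nil {
			return v.([]byte)[:argSize]
		}
		return make([]byte, deferClasses[i])[:argSize]
	}
	return make([]byte, argSize)
}

// freeDeferRecord returns a pooled record so a later defer can reuse it.
func freeDeferRecord(b []byte) {
	if i := classFor(cap(b)); i >= 0 && cap(b) == deferClasses[i] {
		deferPools[i].Put(b[:cap(b)])
	}
}

func main() {
	d := newDeferRecord(16) // lands in the 24-byte class
	fmt.Println(len(d), cap(d))
	freeDeferRecord(d)
}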
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/42750044