Add missing race and msan checks to reflect.typedmmemove and
reflect.typedslicecopy. Missing these checks caused the race detector
to miss races and caused msan to issue false positive errors.
Fixes#16281.
Change-Id: I500b5f92bd68dc99dd5d6f297827fd5d2609e88b
Reviewed-on: https://go-review.googlesource.com/24760
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Fetch the current time in nanoseconds, not microseconds, by using
clock_gettime rather than gettimeofday.
Updates #11222
Change-Id: I1c2c1b88f80ae82002518359436e19099061c6fb
Reviewed-on: https://go-review.googlesource.com/26790
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
Found by vet.
Updates #11041
Change-Id: I5217b3e20c6af435d7500d6bb487b9895efe6605
Reviewed-on: https://go-review.googlesource.com/27493
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
In StartTrace we emit EvGoCreate for all existing goroutines.
This includes stack unwind to obtain current stack.
Real Go programs can contain hundreds of thousands of blocked goroutines.
For such programs StartTrace can take up to a second (few ms per goroutine).
Obtain current stack ID once and use it for all EvGoCreate events.
This speeds up StartTrace with 10K blocked goroutines from 20ms to 4 ms
(win for StartTrace called from net/http/pprof hander will be bigger
as stack is deeper).
Change-Id: I9e5ff9468331a840f8fdcdd56c5018c2cfde61fc
Reviewed-on: https://go-review.googlesource.com/25573
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
They are unused, and vet wants them to have
a function prototype.
Updates #11041
Change-Id: Idedc96ddd3c3cf1b1d2ab6d98796367eab29f032
Reviewed-on: https://go-review.googlesource.com/27492
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The asmdecl check had hand-rolled code that
calculated the size and offset of parameters
based only on the AST.
It included a list of known named types.
This CL changes asmdecl to use go/types instead.
This allows us to easily handle named types.
It also adds support for structs, arrays,
and complex parameters.
It improves the default names given to unnamed
parameters. Previously, all anonymous arguments were
called "unnamed", and the first anonymous return
argument was called "ret".
Anonymous arguments are now called arg, arg1, arg2,
etc., depending on the index in the argument list.
Return arguments are ret, ret1, ret2.
This CL also fixes a bug in the printing of
composite data type sizes.
Updates #11041
Change-Id: I1085116a26fe6199480b680eff659eb9ab31769b
Reviewed-on: https://go-review.googlesource.com/27150
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Go will have already cleared the structs (the original C wouldn't
have).
Change-Id: I4a5a0cfd73953181affc158d188aae2ce281bb33
Reviewed-on: https://go-review.googlesource.com/27435
Run-TryBot: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The previous fix for this, commit 336dad2a, had everything right in
the commit message, but reversed the test in the code. Fix the test in
the code.
This reversal effectively disabled the scavenger on large page systems
*except* in the rare cases where this code was originally wrong, which
is why it didn't obviously show up in testing.
Fixes#16644. Again. :(
Change-Id: I27cce4aea13de217197db4b628f17860f27ce83e
Reviewed-on: https://go-review.googlesource.com/27402
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The transition from mark 1 to mark 2 no longer enqueues new root
marking jobs, but some of the comments still refer to this. Fix these
comments.
Change-Id: I3f98628dba32c5afe30495ab495da42b32291e9e
Reviewed-on: https://go-review.googlesource.com/24965
Reviewed-by: Rick Hudson <rlh@golang.org>
This time with the cherry-pick from the proper patch of
the old CL.
Stack size increased.
Corrected NaN-comparison glitches.
Marked g register as clobbered by calls.
Fixed shared libraries.
live_ssa.go still disabled because of differences.
Presumably turning on more optimization will fix
both the stack size and the live_ssa.go glitches.
Enhanced debugging output for shared libs test.
Rebased onto master.
Updates #16010.
Change-Id: I40864faf1ef32c118fb141b7ef8e854498e6b2c4
Reviewed-on: https://go-review.googlesource.com/27159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
sysUnused (e.g., madvise MADV_FREE) is only sensible to call on
physical page boundaries, so scavengelist rounds in the bounds of the
region being released to the nearest physical page boundaries.
However, if the region is smaller than a physical page and neither the
start nor end fall on a boundary, then rounding the start up to a page
boundary and the end down to a page boundary will result in end < start.
Currently, we only give up on the region if start == end, so if we
encounter end < start, we'll call madvise with a negative length and
the madvise will fail.
Issue #16644 gives a concrete example of this:
start = 0x1285ac000
end = 0x1285ae000 (1 8K page)
This leads to the rounded values
start = 0x1285b0000
end = 0x1285a0000
which leads to len = -65536.
Fix this by giving up on the region if end <= start, not just if
end == start.
Fixes#16644.
Change-Id: I8300db492dbadc82ac1ad878318b36bcb7c39524
Reviewed-on: https://go-review.googlesource.com/27230
Reviewed-by: Keith Randall <khr@golang.org>
We should check whether there is a concurrent writer at the
start of every mapiternext, not just in mapaccessK (which is
only called during certain map growth situations).
Tests turned off by default because they are inherently flaky.
Fixes#16278
Change-Id: I8b72cab1b8c59d1923bec6fa3eabc932e4e91542
Reviewed-on: https://go-review.googlesource.com/24749
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
We already inlined
_, ok = e.(T)
_, ok = i.(E)
_, ok = e.(E)
The only ok-only variants not inlined are now
_, ok = i.(I)
_, ok = e.(I)
These call getitab, so are non-trivial.
Change-Id: Ie45fd8933ee179a679b92ce925079b94cff0ee12
Reviewed-on: https://go-review.googlesource.com/26658
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Merging from dev.ssa back into master.
Contains complete SSA backends for arm, arm64, 386, amd64p32.
Work in progress for PPC64.
Change-Id: Ifd7075e3ec6f88f776e29f8c7fd55830328897fd
Last part of the 386 SSA port.
Modify the x86 backend to simulate SSE registers and
instructions with 387 registers and instructions.
The simulation isn't terribly performant, but it works,
and the old implementation wasn't very performant either.
Leaving to people who care about 387 to optimize if they want.
Turn on SSA backend for 386 by default.
Fixes#16358
Change-Id: I678fb59132620b2c47e993c1c10c4c21135f70c0
Reviewed-on: https://go-review.googlesource.com/25271
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Fixes#16618.
Change-Id: Iffada12e8672bbdbcf2e787782c497e2c45701b1
Reviewed-on: https://go-review.googlesource.com/25550
Run-TryBot: Minux Ma <minux@golang.org>
Reviewed-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Fixes#16570 on iOS.
Thanks Daniel Burhans for reporting the bug and testing the fix.
Change-Id: I43ae7b78c8f85a131ed3d93ea59da9f32a02cd8f
Reviewed-on: https://go-review.googlesource.com/25481
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
When compiling with -buildmode=shared, a map[int32]*_type is created for
each extra module mapping duplicate types back to a canonical object.
This is done in the function typelinksinit, which is called before the
init function that sets up the hash functions for the map
implementation. The result is typemap becomes unusable after
runtime initialization.
The fix in this CL is to move algorithm init before typelinksinit in
the runtime setup process. (For 1.8, we may want to turn typemap into
a sorted slice of types and use binary search.)
Manually tested on GOOS=linux with:
GOHOSTARCH=386 GOARCH=386 ./make.bash && \
go install -buildmode=shared std && \
cd ../test && \
go run run.go -linkshared
Fixes#16590
Change-Id: Idc08c50cc70d20028276fbf564509d2cd5405210
Reviewed-on: https://go-review.googlesource.com/25469
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
macOS Sierra beta4 changed the kernel interface for getting time.
DX now optionally points to an address for additional info.
Set it to zero to avoid corrupting memory.
Fixes#16570
Change-Id: I9f537e552682045325cdbb68b7d0b4ddafade14a
Reviewed-on: https://go-review.googlesource.com/25400
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Quentin Smith <quentin@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Mutator goroutines that allocate memory during the concurrent mark
phase are required to spend some time assisting the garbage
collector. The magnitude of this mandatory assistance is proportional
to the goroutine's allocation debt and subject to the assistance
ratio as calculated by the pacer.
When assisting the garbage collector, a mutator goroutine will go
beyond paying off its allocation debt. It will build up extra credit
to amortize the overhead of the assist.
In fast-allocating applications with high assist ratios, building up
this credit can take the affected goroutine's entire time slice.
Reduce the penalty on each goroutine being selected to assist the GC
in two ways, to spread the responsibility more evenly.
First, do a consistent amount of extra scan work without regard for
the pacer's assistance ratio. Second, reduce the magnitude of the
extra scan work so it can be completed within a few hundred
microseconds.
Commentary on gcOverAssistWork is by Austin Clements, originally in
https://golang.org/cl/24704
Updates #14812Fixes#16432
Change-Id: I436f899e778c20daa314f3e9f0e2a1bbd53b43e1
Reviewed-on: https://go-review.googlesource.com/25155
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Chris Broadfoot <cbro@golang.org>
Currently the pprof package gives almost no guidance for how to use it
and, despite the standard boilerplate used to create CPU and memory
profiles, this boilerplate appears nowhere in the pprof documentation.
Update the pprof package documentation to give the standard
boilerplate in a form people can copy, paste, and tweak. This
boilerplate is based on rsc's 2011 blog post on profiling Go programs
at https://blog.golang.org/profiling-go-programs, which is where I
always go when I need to copy-paste the boilerplate.
Change-Id: I74021e494ea4dcc6b56d6fb5e59829ad4bb7b0be
Reviewed-on: https://go-review.googlesource.com/25182
Reviewed-by: Rick Hudson <rlh@golang.org>
GOARCH=386 SSATEST=1 ./all.bash passes
Caveat: still needs changes to test/ files to use *_ssa.go versions. I
won't check those changes in with this CL because the builders will
complain as they don't have SSATEST=1.
Mostly minor fixes.
Implement float <-> uint32 in assembly. It seems the simplest option
for now.
GO386=387 does not work. That's why I can't make SSA the default for
386 yet.
Change-Id: Ic4d4402104d32bcfb1fd612f5bb6539f9acb8ae0
Reviewed-on: https://go-review.googlesource.com/25119
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The omission of this instruction could confuse the traceback code if a
SIGPROF occurred during a signal handler. The traceback code would
trace up to sigtramp, but would then get confused because it would see a
PC address that did not appear to be in the function.
Fixes#16453.
Change-Id: I2b3d53e0b272fb01d9c2cb8add22bad879d3eebc
Reviewed-on: https://go-review.googlesource.com/25104
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Most operations need an upper bound on the physical page size, which
is what sys.PhysPageSize is for (this is checked at runtime init on
Linux). However, a few operations need a *lower* bound on the physical
page size. Introduce a "minPhysPageSize" constant to act as this lower
bound and use it where it makes sense:
1) In addrspace_free, we have to query each page in the given range.
Currently we increment by the upper bound on the physical page
size, which means we may skip over pages if the true size is
smaller. Worse, we currently pass a result buffer that only has
enough room for one page. If there are actually multiple pages in
the range passed to mincore, the kernel will overflow this buffer.
Fix these problems by incrementing by the lower-bound on the
physical page size and by passing "1" for the length, which the
kernel will round up to the true physical page size.
2) In the write barrier, the bad pointer check tests for pointers to
the first physical page, which are presumably small integers
masquerading as pointers. However, if physical pages are smaller
than we think, we may have legitimate pointers below
sys.PhysPageSize. Hence, use minPhysPageSize for this test since
pointers should never fall below that.
In particular, this applies to ARM64 and MIPS. The runtime is
configured to use 64kB pages on ARM64, but by default Linux uses 4kB
pages. Similarly, the runtime assumes 16kB pages on MIPS, but both 4kB
and 16kB kernel configurations are common. This also applies to ARM on
systems where the runtime is recompiled to deal with a larger page
size. It is also a step toward making the runtime use only a
dynamically-queried page size.
Change-Id: I1fdfd18f6e7cbca170cc100354b9faa22fde8a69
Reviewed-on: https://go-review.googlesource.com/25020
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
When a non-Go thread calls into Go, the runtime needs an M to run the Go
code. The runtime keeps a list of extra M's available. When the last
extra M is allocated, the needextram field is set to tell it to allocate
a new extra M as soon as it is running in Go. This ensures that an extra
M will always be available for the next thread.
However, if many threads need an extra M at the same time, this
serializes them all. One thread will get an extra M with the needextram
field set. All the other threads will see that there is no M available
and will go to sleep. The one thread that succeeded will create a new
extra M. One lucky thread will get it. All the other threads will see
that there is no M available and will go to sleep. The effect is
thundering herd, as all the threads looking for an extra M go through
the process one by one. This seems to have a particularly bad effect on
the FreeBSD scheduler for some reason.
With this change, we track the number of threads waiting for an M, and
create all of them as soon as one thread gets through. This still means
that all the threads will fight for the lock to pick up the next M. But
at least each thread that gets the lock will succeed, instead of going
to sleep only to fight again.
This smooths out the performance greatly on FreeBSD, reducing the
average wall time of `testprogcgo CgoCallbackGC` by 74%. On GNU/Linux
the average wall time goes down by 9%.
Fixes#13926Fixes#16396
Change-Id: I6dc42a4156085a7ed4e5334c60b39db8f8ef8fea
Reviewed-on: https://go-review.googlesource.com/25047
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Add some simplification rules for floating point ops.
cmd/internal/obj/arm supports instructions that compare FP register
to 0, but runtime softfloat simulator does not. This CL adds these
instructions to softfloat simulator as well.
Updates #15365.
Change-Id: I29405b2bfcb4c8cf106cb7a1a811409fec91b170
Reviewed-on: https://go-review.googlesource.com/24790
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This fixes erroneous handling of the more result parameter of
runtime.Frames.Next.
Fixes#16349.
Change-Id: I4f1c0263dafbb883294b31dbb8922b9d3e650200
Reviewed-on: https://go-review.googlesource.com/24911
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The cgocallback function picked up a ctxt parameter in CL 22508.
That CL updated the assembler implementation, but there are a few
mentions in Go code that were not updated. This CL fixes that.
Fixes#16326
Change-Id: I5f68e23565c6a0b11057aff476d13990bff54a66
Reviewed-on: https://go-review.googlesource.com/24848
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
In the beta version of the macOS Sierra (10.12) release, the
gettimeofday system call changed on x86. Previously it always returned
the time in the AX/DX registers. Now, if AX is returned as 0, it means
that the system call has stored the values into the memory pointed to by
the first argument, just as the libc gettimeofday function does. The
libc function handles both cases, and we need to do so as well.
Fixes#16272.
Change-Id: Ibe5ad50a2c5b125e92b5a4e787db4b5179f6b723
Reviewed-on: https://go-review.googlesource.com/24812
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The shrinkstack code locks all the channels a goroutine is waiting for,
but didn't handle the case of the same channel appearing in the list
multiple times. This led to a deadlock. The channels are sorted so it's
easy to avoid locking the same channel twice.
Fixes#16286.
Change-Id: Ie514805d0532f61c942e85af5b7b8ac405e2ff65
Reviewed-on: https://go-review.googlesource.com/24815
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
The assembly is broken: it does `MOVQ g(R12), R14` expecting that
R12 contains tls address, but it does not do get_tls(R12) before.
This magically works on linux: `MOVQ g(R12), R14` is compiled to
`mov %fs:0xfffffffffffffff8,%r14` which does not use R12.
But it crashes on windows.
Add explicit `get_tls(R12)`.
Fixes#16206
Change-Id: Ic1f21a6fef2473bcf9147de6646929781c9c1e98
Reviewed-on: https://go-review.googlesource.com/24590
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When the blocked field was first introduced back in
https://golang.org/cl/61250043 the scheduler trace code incorrectly used
m->blocked instead of mp->blocked. That has carried through the
conversion to Go. This CL fixes it.
Change-Id: Id81907b625221895aa5c85b9853f7c185efd8f4b
Reviewed-on: https://go-review.googlesource.com/24571
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
If creating a new thread fails with EAGAIN, point the user at ulimit.
Fixes#15476.
Change-Id: Ib36519614b5c72776ea7f218a0c62df1dd91a8ea
Reviewed-on: https://go-review.googlesource.com/24570
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Several minor changes that remove a good chunk of the overhead added
to the reflect Name method over the 1.7 cycle, as seen from the
non-SSA architectures.
In particular, there are ~20 fewer instructions in reflect.name.name
on 386, and the method now qualifies for inlining.
The simple JSON decoding benchmark on darwin/386:
name old time/op new time/op delta
CodeDecoder-8 49.2ms ± 0% 48.9ms ± 1% -0.77% (p=0.000 n=10+9)
name old speed new speed delta
CodeDecoder-8 39.4MB/s ± 0% 39.7MB/s ± 1% +0.77% (p=0.000 n=10+9)
On darwin/amd64 the effect is less pronounced:
name old time/op new time/op delta
CodeDecoder-8 38.9ms ± 0% 38.7ms ± 1% -0.38% (p=0.005 n=10+10)
name old speed new speed delta
CodeDecoder-8 49.9MB/s ± 0% 50.1MB/s ± 1% +0.38% (p=0.006 n=10+10)
Counterintuitively, I get much more useful benchmark data out of my
MacBook Pro than a linux workstation with more expensive Intel chips.
While the laptop has fewer cores and an active GUI, the single-threaded
performance is significantly better (nearly 1.5x decoding throughput)
so the differences are more pronounced.
For #16117.
Change-Id: I4e0cc1cc2d271d47d5127b1ee1ca926faf34cabf
Reviewed-on: https://go-review.googlesource.com/24510
Reviewed-by: Ian Lance Taylor <iant@golang.org>