mirror of https://github.com/golang/go synced 2024-11-19 15:44:44 -07:00
Commit Graph

182 Commits

Author SHA1 Message Date
Austin Clements
ec25210564 runtime: support a two-level arena map
Currently, the heap arena map is a single, large array that covers
every possible arena frame in the entire address space. This is
practical up to about 48 bits of address space with 64 MB arenas.

However, there are two problems with this:

1. mips64, ppc64, and s390x support full 64-bit address spaces (though
   on Linux only s390x has kernel support for 64-bit address spaces).
   On these platforms, it would be good to support these larger
   address spaces.

2. On Windows, processes are charged for untouched memory, so for
   processes with small heaps, the mostly-untouched 32 MB arena map
   plus a 64 MB arena are significant overhead. Hence, it would be
   good to reduce both the arena map size and the arena size, but with
   a single-level arena, these are inversely proportional.

This CL adds support for a two-level arena map. Arena frame numbers
are now divided into arenaL1Bits of L1 index and arenaL2Bits of L2
index.
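
For illustration, a minimal standalone sketch of the index split (the
constants here are illustrative; the runtime derives heapArenaBytes and
the L1/L2 bit counts per GOOS/GOARCH):

    package main

    import "fmt"

    const (
        logHeapArenaBytes = 26 // 64 MB arenas (illustrative)
        heapArenaBytes    = 1 << logHeapArenaBytes
        heapAddrBits      = 48 // illustrative address-space size
        arenaL1Bits       = 0  // currently 0, so the L1 level collapses away
        arenaL2Bits       = heapAddrBits - logHeapArenaBytes - arenaL1Bits
    )

    // arenaIndexes splits an address into its L1 and L2 arena map indexes.
    func arenaIndexes(p uintptr) (l1, l2 uintptr) {
        ai := p / heapArenaBytes // arena frame number
        return ai >> arenaL2Bits, ai & (1<<arenaL2Bits - 1)
    }

    func main() {
        l1, l2 := arenaIndexes(0x00007f0012345678)
        fmt.Println(l1, l2) // with arenaL1Bits == 0, l1 is always 0
    }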

At the moment, arenaL1Bits is always 0, so we effectively have a
single level map. We do a few things so that this has no cost beyond
the current single-level map:

1. We embed the L2 array directly in mheap, so if there's a single
   entry in the L2 array, the representation is identical to the
   current representation and there's no extra level of indirection.

2. Hot code that accesses the arena map is structured so that it
   optimizes to nearly the same machine code as it does currently.

3. We make some small tweaks to hot code paths and to the inliner
   itself to keep some important functions inlined despite their
   now-larger ASTs. In particular, this is necessary for
   heapBitsForAddr and heapBits.next.

Possibly as a result of some of the tweaks, this actually slightly
improves the performance of the x/benchmarks garbage benchmark:

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.26ms ± 1%  -1.07%  (p=0.000 n=17+19)

(https://perf.golang.org/search?q=upload:20180223.2)

For #23900.

Change-Id: If5164e0961754f97eb9eca58f837f36d759505ff
Reviewed-on: https://go-review.googlesource.com/96779
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-23 21:59:50 +00:00
Austin Clements
33b76920ec runtime: rename "arena index" to "arena map"
There are too many places where I want to talk about "indexing into
the arena index". Make this less awkward and ambiguous by calling it
the "arena map" instead.

Change-Id: I726b0667bb2139dbc006175a0ec09a871cdf73f9
Reviewed-on: https://go-review.googlesource.com/96777
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-23 21:59:48 +00:00
Austin Clements
ea8d7a370d runtime: clarify address space limit constants and comments
Now that we support the full non-contiguous virtual address space of
amd64 hardware, some of the comments and constants related to this are
out of date.

This renames memLimitBits to heapAddrBits because 1<<memLimitBits is
no longer the limit of the address space and rewrites the comment to
focus first on hardware limits (which span OSes) and then discuss
kernel limits.

Second, this eliminates the memLimit constant because there's no
longer a meaningful "highest possible heap pointer value" on amd64.

Updates #23862.

Change-Id: I44b32033d2deb6b69248fb8dda14fc0e65c47f11
Reviewed-on: https://go-review.googlesource.com/95498
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:36 +00:00
Austin Clements
ed1959c6e6 runtime: offset the heap arena index by 2^47 on amd64
On amd64, the virtual address space, when interpreted as signed
values, is [-2^47, 2^47). Currently, we only support heap addresses in
the "positive" half of this, [0, 2^47). This suffices for linux/amd64
and windows/amd64, but solaris/amd64 can map user addresses in the
negative part of this range. Specifically, addresses
0xFFFF8000'00000000 to 0xFFFFFD80'00000000 are part of user space.
This leads to a "memory allocated by OS not in usable address space"
panic, since we don't map heap arena index space for these addresses.

Fix this by offsetting addresses when computing arena indexes so that
arena entry 0 corresponds to address -2^47 on amd64. We already map
enough arena space for 2^48 heap addresses on 64-bit (because arm64's
virtual address space is [0, 2^48)), so we don't need to grow any
structures to support this.

A different approach would be to simply mask out the top 16 bits.
However, there are two advantages to the offset approach: 1) invalid
heap addresses continue to naturally map to invalid arena indexes so
we don't need extra checks and 2) it perturbs the mapping of addresses
to arena indexes more, which helps check that we don't accidentally
compute incorrect arena indexes somewhere that happen to be right most
of the time.
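
A hedged sketch of the offset approach, reusing the illustrative
heapArenaBytes constant from the sketch above (the exact forms in the
runtime may differ):

    // arenaBaseOffset maps the signed amd64 range [-2^47, 2^47) onto
    // [0, 2^48), so arena entry 0 corresponds to address -2^47 and
    // indexes are never negative.
    const arenaBaseOffset = 1 << 47

    func arenaIndex(p uintptr) uintptr {
        return (p + arenaBaseOffset) / heapArenaBytes // addition wraps for "negative" addresses
    }

    func arenaBase(i uintptr) uintptr {
        return i*heapArenaBytes - arenaBaseOffset
    }

With this mapping, a solaris/amd64 user address such as
0xFFFF800000000000 (== -2^47) wraps around to offset 0 and indexes the
first arena entry instead of falling off the end of the map.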

Several comments and constant names are now somewhat misleading. We'll
fix that in the next CL. This CL is the core change to the arena
indexing.

Fixes #23862.

Change-Id: Idb8e299fded04593a286b01a9582da6ddbac2f9a
Reviewed-on: https://go-review.googlesource.com/95497
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:35 +00:00
Austin Clements
e9db7b9dd1 runtime: abstract indexing of arena index
Accessing the arena index is about to get slightly more complicated.
Abstract this away into a set of functions for going back and forth
between addresses and arena slice indexes.

For #23862.

Change-Id: I0b20e74ef47a07b78ed0cf0a6128afe6f6e40f4b
Reviewed-on: https://go-review.googlesource.com/95496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:34 +00:00
Austin Clements
c823155828 runtime: ensure sysStat for mheap_.arenas is aligned
We don't want to account the memory for mheap_.arenas because most of
it is never touched, so currently we pass the address of a uint64 on
the heap. However, at least on mips, it's possible for this uint64 to
be unaligned, which causes the atomic add in mSysStatInc to crash.

Fix this by instead passing a nil stat pointer.

Fixes #23946.

Change-Id: I091587df1b3066c330b6bb4d834e4596c407910f
Reviewed-on: https://go-review.googlesource.com/95695
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-21 03:27:07 +00:00
Martin Möhrmann
8999b1d6c9 runtime: shorten reflect.unsafe_New call chain
reflect.unsafe_New is an often called function according
to profiling in a large production environment.

Since newobject is not inlined currently there
is call overhead that can be avoided by calling
mallocgc directly.
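
The resulting shape is roughly the following (a sketch using
runtime-internal identifiers, not standalone code):

    //go:linkname reflect_unsafe_New reflect.unsafe_New
    func reflect_unsafe_New(typ *_type) unsafe.Pointer {
        // Call the allocator directly instead of going through newobject,
        // which is not inlined and adds a call frame on every reflect.New.
        return mallocgc(typ.size, typ, true) // needzero = true
    }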

name  old time/op  new time/op  delta
New   32.4ns ± 2%  29.8ns ± 1%  -8.03%  (p=0.000 n=19+20)

Change-Id: I572e4be830ed8e5c0da555dc3a8864c8363112be
Reviewed-on: https://go-review.googlesource.com/95015
Reviewed-by: Austin Clements <austin@google.com>
2018-02-21 00:31:21 +00:00
Austin Clements
d7691d055a runtime: replace _MaxMem with maxAlloc
Now that we have memLimit, also having _MaxMem is a bit confusing.

Replace it with maxAlloc, which better conveys what it limits. We also
define maxAlloc slightly differently: since it's now clear that it
limits allocation size, we can account for a subtle difference between
32-bit and 64-bit.
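
A hedged sketch of the resulting definition (heapAddrBits and _64bit
are the runtime's existing constants; the exact expression may differ):

    // On 64-bit, an allocation may in principle span the whole heap
    // address space. On 32-bit, 1<<32 does not fit in a uintptr, so
    // maxAlloc is one byte short of the full 4 GB address space.
    const maxAlloc = (1 << heapAddrBits) - (1-_64bit)*1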

Change-Id: Iac39048018cc0dae7f0919e25185fee4b3eed529
Reviewed-on: https://go-review.googlesource.com/85890
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:26 +00:00
Austin Clements
90666b8a3d runtime: move comment about address space sizes to malloc.go
Currently there's a detailed comment in lfstack_64bit.go about address
space limitations on various architectures. Since that's now relevant
to malloc, move it to a more prominent place in the documentation for
memLimitBits.

Updates #10460.

Change-Id: If9708291cf3a288057b8b3ba0ba6a59e3602bbd6
Reviewed-on: https://go-review.googlesource.com/85889
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:25 +00:00
Austin Clements
51ae88ee2f runtime: remove non-reserved heap logic
Currently large sysReserve calls on some OSes don't actually reserve
the memory, but just check that it can be reserved. This was important
when we called sysReserve to "reserve" many gigabytes for the heap up
front, but now that we map memory in small increments as we need it,
this complication is no longer necessary.

This has one curious side benefit: currently, on Linux, allocations
that are large enough to be rejected by mmap wind up freezing the
application for a long time before it panics. This happens because
sysReserve doesn't reserve the memory, so sysMap calls mmap_fixed,
which calls mmap, which fails because the mapping is too large.
However, mmap_fixed doesn't inspect *why* mmap fails, so it falls back
to probing every page in the desired region individually with mincore
before performing an (otherwise dangerous) MAP_FIXED mapping, which
will also fail. This takes a long time for a large region. Now this
logic is gone, so the mmap failure leads to an immediate panic.

Updates #10460.

Change-Id: I8efe88c611871cdb14f99fadd09db83e0161ca2e
Reviewed-on: https://go-review.googlesource.com/85888
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:24 +00:00
Austin Clements
2b415549b8 runtime: use sparse mappings for the heap
This replaces the contiguous heap arena mapping with a potentially
sparse mapping that can support heap mappings anywhere in the address
space.

This has several advantages over the current approach:

* There is no longer any limit on the size of the Go heap. (Currently
  it's limited to 512GB.) Hence, this fixes #10460.

* It eliminates many failure modes of heap initialization and
  growing. In particular it eliminates any possibility of panicking
  with an address space conflict. This can happen for many reasons and
  even causes a low but steady rate of TSAN test failures because of
  conflicts with the TSAN runtime. See #16936 and #11993.

* It eliminates the notion of "non-reserved" heap, which was added
  because creating huge address space reservations (particularly on
  64-bit) led to huge process VSIZE. This was at best confusing and at
  worst conflicted badly with ulimit -v. However, the non-reserved
  heap logic is complicated, can race with other mappings in non-pure
  Go binaries (e.g., #18976), and requires that the entire heap be
  either reserved or non-reserved. We currently maintain the latter
  property, but it's quite difficult to convince yourself of that, and
  hence difficult to keep correct. This logic is still present, but
  will be removed in the next CL.

* It fixes problems on 32-bit where skipping over parts of the address
  space leads to mapping huge (and never-to-be-used) metadata
  structures. See #19831.

This also completely rewrites and significantly simplifies
mheap.sysAlloc, which has been a source of many bugs. E.g., #21044,
 #20259, #18651, and #13143 (and maybe #23222).

This change also makes it possible to allocate individual objects
larger than 512GB. As a result, a few tests that expected huge
allocations to fail needed to be changed to make even larger
allocations. However, at the moment attempting to allocate a humongous
object may cause the program to freeze for several minutes on Linux as
we fall back to probing every page with addrspace_free. That logic
(and this failure mode) will be removed in the next CL.

Fixes #10460.
Fixes #22204 (since it rewrites the code involved).

This slightly slows down compilebench and the x/benchmarks garbage
benchmark.

name       old time/op     new time/op     delta
Template       184ms ± 1%      185ms ± 1%    ~     (p=0.065 n=10+9)
Unicode       86.9ms ± 3%     86.3ms ± 1%    ~     (p=0.631 n=10+10)
GoTypes        599ms ± 0%      602ms ± 0%  +0.56%  (p=0.000 n=10+9)
Compiler       2.87s ± 1%      2.89s ± 1%  +0.51%  (p=0.002 n=9+10)
SSA            7.29s ± 1%      7.25s ± 1%    ~     (p=0.182 n=10+9)
Flate          118ms ± 2%      118ms ± 1%    ~     (p=0.113 n=9+9)
GoParser       147ms ± 1%      148ms ± 1%  +1.07%  (p=0.003 n=9+10)
Reflect        401ms ± 1%      404ms ± 1%  +0.71%  (p=0.003 n=10+9)
Tar            175ms ± 1%      175ms ± 1%    ~     (p=0.604 n=9+10)
XML            209ms ± 1%      210ms ± 1%    ~     (p=0.052 n=10+10)

(https://perf.golang.org/search?q=upload:20171231.4)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.23ms ± 1%  2.25ms ± 1%  +0.84%  (p=0.000 n=19+19)

(https://perf.golang.org/search?q=upload:20171231.3)

Relative to the start of the sparse heap changes (starting at and
including "runtime: fix various contiguous bitmap assumptions"),
overall slowdown is roughly 1% on GC-intensive benchmarks:

name        old time/op     new time/op     delta
Template        183ms ± 1%      185ms ± 1%  +1.32%  (p=0.000 n=9+9)
Unicode        84.9ms ± 2%     86.3ms ± 1%  +1.65%  (p=0.000 n=9+10)
GoTypes         595ms ± 1%      602ms ± 0%  +1.19%  (p=0.000 n=9+9)
Compiler        2.86s ± 0%      2.89s ± 1%  +0.91%  (p=0.000 n=9+10)
SSA             7.19s ± 0%      7.25s ± 1%  +0.75%  (p=0.000 n=8+9)
Flate           117ms ± 1%      118ms ± 1%  +1.10%  (p=0.000 n=10+9)
GoParser        146ms ± 2%      148ms ± 1%  +1.48%  (p=0.002 n=10+10)
Reflect         398ms ± 1%      404ms ± 1%  +1.51%  (p=0.000 n=10+9)
Tar             173ms ± 1%      175ms ± 1%  +1.17%  (p=0.000 n=10+10)
XML             208ms ± 1%      210ms ± 1%  +0.62%  (p=0.011 n=10+10)
[Geo mean]      369ms           373ms       +1.17%

(https://perf.golang.org/search?q=upload:20180101.2)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.22ms ± 1%  2.25ms ± 1%  +1.51%  (p=0.000 n=20+19)

(https://perf.golang.org/search?q=upload:20180101.3)

Change-Id: I5daf4cfec24b252e5a57001f0a6c03f22479d0f0
Reviewed-on: https://go-review.googlesource.com/85887
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:23 +00:00
Austin Clements
45ffeab549 runtime: eliminate most uses of mheap_.arena_*
This replaces all uses of the mheap_.arena_* fields outside of
mallocinit and sysAlloc. These fields fundamentally assume a
contiguous heap between two bounds, so eliminating these is necessary
for a sparse heap.

Many of these are replaced with checks for non-nil spans at the test
address (which in turn checks for a non-nil entry in the heap arena
array). Some of them are just for debugging and somewhat meaningless
with a sparse heap, so those we just delete.

Updates #10460.

Change-Id: I8345b95ffc610aed694f08f74633b3c63506a41f
Reviewed-on: https://go-review.googlesource.com/85886
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:22 +00:00
Austin Clements
d6e8218581 runtime: make span map sparse
This splits the span map into separate chunks for every 64MB of the
heap. The span map chunks now live in the same indirect structure as
the bitmap.
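
A sketch of the indirect structure (field shapes illustrative; mspan,
pageSize, and arenaIndex are the runtime's existing names, and the
sizes assume a 64-bit platform):

    // Per-64MB metadata chunk, allocated lazily as the heap grows into
    // new regions of the address space.
    type heapArena struct {
        bitmap [heapArenaBytes / 32]byte         // 2 bits per 8-byte word
        spans  [heapArenaBytes / pageSize]*mspan // one slot per 8 KB page
    }

    // Looking up a span becomes a nil check on the chunk and then on the
    // page slot, rather than indexing one giant contiguous array.
    func spanOfSketch(p uintptr) *mspan {
        ha := mheap_.arenas[arenaIndex(p)]
        if ha == nil {
            return nil // this part of the address space holds no heap memory
        }
        return ha.spans[(p/pageSize)%(heapArenaBytes/pageSize)]
    }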

Updates #10460.

This causes a slight improvement in compilebench and the x/benchmarks
garbage benchmark. I'm not sure why it improves performance.

name       old time/op     new time/op     delta
Template       185ms ± 1%      184ms ± 1%    ~            (p=0.315 n=9+10)
Unicode       86.9ms ± 1%     86.9ms ± 3%    ~            (p=0.356 n=9+10)
GoTypes        602ms ± 1%      599ms ± 0%  -0.59%         (p=0.002 n=9+10)
Compiler       2.89s ± 0%      2.87s ± 1%  -0.50%          (p=0.003 n=9+9)
SSA            7.25s ± 0%      7.29s ± 1%    ~            (p=0.400 n=9+10)
Flate          118ms ± 1%      118ms ± 2%    ~            (p=0.065 n=10+9)
GoParser       147ms ± 2%      147ms ± 1%    ~            (p=0.549 n=10+9)
Reflect        403ms ± 1%      401ms ± 1%  -0.47%         (p=0.035 n=9+10)
Tar            176ms ± 1%      175ms ± 1%  -0.59%         (p=0.013 n=10+9)
XML            211ms ± 1%      209ms ± 1%  -0.83%        (p=0.011 n=10+10)

(https://perf.golang.org/search?q=upload:20171231.1)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.24ms ± 1%  2.23ms ± 1%  -0.36%  (p=0.001 n=20+19)

(https://perf.golang.org/search?q=upload:20171231.2)

Change-Id: I2563f8704ab9812434947faf293c5327f9b0d07a
Reviewed-on: https://go-review.googlesource.com/85885
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:20 +00:00
Austin Clements
c0392d2e7f runtime: make the heap bitmap sparse
This splits the heap bitmap into separate chunks for every 64MB of the
heap and introduces an index mapping from virtual address to metadata.
It modifies the heapBits abstraction to use this two-level structure.
Finally, it modifies heapBitsSetType to unroll the bitmap into the
object itself and then copy it out if the bitmap would span
discontiguous bitmap chunks.

This is a step toward supporting general sparse heaps, which will
eliminate address space conflict failures as well as the limit on the
heap size.

It's also advantageous for 32-bit. 32-bit already supports
discontiguous heaps by always starting the arena at address 0.
However, as a result, with a contiguous bitmap, if the kernel chooses
a high address (near 2GB) for a heap mapping, the runtime is forced to
map up to 128MB of heap bitmap. Now the runtime can map sections of
the bitmap for just the parts of the address space used by the heap.

Updates #10460.

This slightly slows down the x/garbage and compilebench benchmarks.
However, I think the slowdown is acceptably small.

name        old time/op     new time/op     delta
Template        178ms ± 1%      180ms ± 1%  +0.78%    (p=0.029 n=10+10)
Unicode        85.7ms ± 2%     86.5ms ± 2%    ~       (p=0.089 n=10+10)
GoTypes         594ms ± 0%      599ms ± 1%  +0.70%    (p=0.000 n=9+9)
Compiler        2.86s ± 0%      2.87s ± 0%  +0.40%    (p=0.001 n=9+9)
SSA             7.23s ± 2%      7.29s ± 2%  +0.94%    (p=0.029 n=10+10)
Flate           116ms ± 1%      117ms ± 1%  +0.99%    (p=0.000 n=9+9)
GoParser        146ms ± 1%      146ms ± 0%    ~       (p=0.193 n=10+7)
Reflect         399ms ± 0%      403ms ± 1%  +0.89%    (p=0.001 n=10+10)
Tar             173ms ± 1%      174ms ± 1%  +0.91%    (p=0.013 n=10+9)
XML             208ms ± 1%      210ms ± 1%  +0.93%    (p=0.000 n=10+10)
[Geo mean]      368ms           371ms       +0.79%

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.17ms ± 1%  2.21ms ± 1%  +2.15%  (p=0.000 n=20+20)

Change-Id: I037fd283221976f4f61249119d6b97b100bcbc66
Reviewed-on: https://go-review.googlesource.com/85883
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:18 +00:00
Austin Clements
29e9c4d4a4 runtime: lay out heap bitmap forward in memory
Currently the heap bitmap is laid out in reverse order in memory relative
to the heap itself. This was originally done out of "excessive
cleverness" so that computing a bitmap pointer could load only the
arena_start field and so that heaps could be more contiguous by
growing the arena and the bitmap out from a common center point.

However, this appears to have no actual performance benefit, it
complicates nearly every use of the bitmap, and it makes already
confusing code more confusing. Furthermore, it's still possible to use
a single field (the new bitmap_delta) for the bitmap pointer
computation by employing slightly different excessive cleverness.

Hence, this CL puts the bitmap into forward order.

This is a (very) updated version of CL 9404.

Change-Id: I743587cc626c4ecd81e660658bad85b54584108c
Reviewed-on: https://go-review.googlesource.com/85881
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:16 +00:00
Austin Clements
d941b07558 runtime: eliminate write barriers from persistentalloc
We're about to start tracking nowritebarrierrec through systemstack
calls, which will reveal write barriers in persistentalloc prohibited
by various callers.

The pointers manipulated by persistentalloc are always to off-heap
memory, so this removes these write barriers statically by introducing
a new go:notinheap type to represent generic off-heap memory.
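
A sketch of the pattern (this mirrors the notInHeap helper the change
introduces; treat the details as illustrative):

    // A go:notinheap type can never be allocated from the GC'd heap, so
    // the compiler statically omits write barriers for stores of
    // pointers to such memory.
    //
    //go:notinheap
    type notInHeap struct{}

    func (p *notInHeap) add(bytes uintptr) *notInHeap {
        return (*notInHeap)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + bytes))
    }

persistentalloc can then hand out *notInHeap values, and callers convert
them to their own off-heap types without incurring write barriers.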

Updates #22384.
For #22460.

Change-Id: Id449d9ebf145b14d55476a833e7f076b0d261d57
Reviewed-on: https://go-review.googlesource.com/72771
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-29 17:56:18 +00:00
Giovanni Bajo
8e11cb3f3b runtime: improve comments for nextSample
The previous comment on nextSample didn't mention Poisson processes,
which are the reason it needs to create an exponential distribution,
so the reasoning was hard to follow for people not highly familiar
with statistics.

Since we're at it, we also make it clear that we are just creating
a random number with exponential distribution by moving the
bulk of the function into a new fastexprand().

No functional changes.
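
For reference, a standalone sketch of the underlying idea (this is not
the runtime's implementation, which uses a fixed-point log to avoid
floating point):

    package main

    import (
        "fmt"
        "math"
        "math/rand"
    )

    // Gaps between events of a Poisson process are exponentially
    // distributed, so the distance to the next heap profiling sample can
    // be drawn as -mean * ln(U) with U uniform in (0, 1].
    func fastexprandSketch(mean float64) float64 {
        u := rand.Float64()
        for u == 0 {
            u = rand.Float64() // avoid ln(0)
        }
        return -mean * math.Log(u)
    }

    func main() {
        fmt.Println(fastexprandSketch(512 * 1024)) // MemProfileRate-sized mean
    }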

Change-Id: I9c275e87edb3418ee0974257af64c73465028ad7
Reviewed-on: https://go-review.googlesource.com/65657
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-26 16:28:35 +00:00
Ilya Tocar
101fbc2c82 runtime: make nextFreeFast inlinable
https://golang.org/cl/22598 made nextFreeFast inlinable.
But during https://golang.org/cl/63611 it was discovered that it is no longer inlinable.
Reduce the number of statements below the inlining threshold to make it inlinable again.
Also update the tests to prevent regressions.
This doesn't reduce readability.

Change-Id: Ia672784dd48ed3b1ab46e390132f1094fe453de5
Reviewed-on: https://go-review.googlesource.com/65030
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2017-09-20 20:27:13 +00:00
Josh Bleecher Snyder
29e9b89b9a runtime: special case allocation of arrays of size 1
This avoids division and multiplication.
Instrumentation suggests that this is a very common case.
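
A sketch of the special case (the real change is in the runtime's
newarray; the overflow check shown here is illustrative):

    func newarraySketch(typ *_type, n int) unsafe.Pointer {
        if n == 1 {
            // No multiplication and no overflow check needed: the
            // allocation size is just the element size.
            return mallocgc(typ.size, typ, true)
        }
        if n < 0 || (typ.size > 0 && uintptr(n) > maxAlloc/typ.size) {
            panic("runtime: allocation size out of range")
        }
        return mallocgc(typ.size*uintptr(n), typ, true)
    }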

Change-Id: I2d5d5012d4f4df4c4af1f9f85ca9c323c9889c0e
Reviewed-on: https://go-review.googlesource.com/54657
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-14 23:32:03 +00:00
Austin Clements
780249eed4 runtime: fall back to small mmaps if we fail to grow reservation
Right now, if it's possible to grow the arena reservation but
mheap.sysAlloc fails to get 256MB more of memory, it simply fails.
However, on 32-bit we have a fallback path that uses much smaller
mmaps that could take over in this situation, but fails to.

This commit fixes mheap.sysAlloc to use a common failure path in case
it can't grow the reservation. On 32-bit, this path includes the
fallback.

Ideally, mheap.sysAlloc would attempt smaller reservation growths
first, but taking the fallback path is a simple change for Go 1.9.

Updates #21044 (fixes one of two issues).

Change-Id: I1e0035ffba986c3551479d5742809e43da5e7c73
Reviewed-on: https://go-review.googlesource.com/51713
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-07-31 14:05:58 +00:00
Austin Clements
13ae3b3a8d runtime: accept non-monotonic arena allocation on 32-bit
Currently, the heap arena allocator allocates monotonically increasing
addresses. This is fine on 64-bit where we stake out a giant block of
the address space for ourselves and start at the beginning of it, but
on 32-bit the arena starts at address 0 but we start allocating from
wherever the OS feels like giving us memory. We can generally hint the
OS to start us at a low address, but this doesn't always work.

As a result, on 32-bit, if the OS gives us an arena block that's lower
than the current block we're allocating from, we simply say "thanks
but no thanks", return the whole (256MB!) block of memory, and then
take a fallback path that mmaps just the amount of memory we need
(which may be as little as 8K).

We have to do this because mheap_.arena_used is *both* the highest
used address in the arena and the next address we allocate from.

Fix all of this by separating the second role of arena_used out into a
new field called arena_alloc. This lets us accept any arena block the
OS gives us. This also slightly changes the invariants around
arena_end. Previously, we ensured arena_used <= arena_end, but this
was related to arena_used's second role, so the new invariant is
arena_alloc <= arena_end. As a result, we no longer necessarily update
arena_end when we're updating arena_used.

Fixes #20259 properly. (Unlike the original fix, this one should not
be cherry-picked to Go 1.8.)

This is reasonably low risk. I verified several key properties of the
32-bit code path with both 4K and 64K physical pages using a symbolic
model and the change does not materially affect 64-bit (arena_used ==
arena_alloc on 64-bit). The only oddity is that we no longer call
setArenaUsed with racemap == false to indicate that we're creating a
hole in the address space, but this only happened in a 32-bit-only
code path, and the race detector requires 64-bit, so this never
mattered anyway.

Change-Id: Ib1334007933e615166bac4159bf357ae06ec6a25
Reviewed-on: https://go-review.googlesource.com/44010
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-05-25 14:26:19 +00:00
Austin Clements
e5a5c03f5b runtime: don't corrupt arena bounds on low mmap
If mheap.sysAlloc doesn't have room in the heap arena for an
allocation, it will attempt to map more address space with sysReserve.
sysReserve is given a hint, but can return any unused address range.
Currently, mheap.sysAlloc incorrectly assumes the returned region will
never fall between arena_start and arena_used. If it does,
mheap.sysAlloc will blindly accept the new region as the new
arena_used and arena_end, causing these to decrease and make it so any
Go heap above the new arena_used is no longer considered part of the
Go heap. This assumption *used to be* safe because we had all memory
between arena_start and arena_used mapped, but when we switched to an
arena_start of 0 on 32-bit, it became no longer safe.

Most likely, we've only recently seen this bug occur because we
usually start arena_used just above the binary, which is low in the
address space. Hence, the kernel is very unlikely to give us a region
before arena_used.

Since mheap.sysAlloc is a linear allocator, there's not much we can do
to handle this well. Hence, we fix this problem by simply rejecting
the new region if it isn't after arena_end. In this case, we'll take
the fall-back path and mmap a small region at any address just for the
requested memory.

Fixes #20259.

Change-Id: Ib72e8cd621545002d595c7cade1e817cfe3e5b1e
Reviewed-on: https://go-review.googlesource.com/43870
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-05-23 15:23:21 +00:00
Austin Clements
295d160e01 runtime: make _TinySizeClass an int8 to prevent use as spanClass
Currently _TinySizeClass is untyped, which means it can accidentally
be used as a spanClass (not that I would know this from experience or
anything). Make it an int8 to avoid this mix up.

This is a cherry-pick of dev.garbage commit 81b74bf9c5.

Change-Id: I1e69eccee436ea5aa45e9a9828a013e369e03f1a
Reviewed-on: https://go-review.googlesource.com/41254
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:39 +00:00
Austin Clements
8e25d4ccef runtime: eliminate heapBitsSetTypeNoScan
It's no longer necessary to maintain the bitmap of noscan objects
since we now use the span metadata to determine that they're noscan
instead of the bitmap.

The combined effect of segregating noscan spans and the follow-on
optimizations is roughly a 1% improvement in performance across the
go1 benchmarks and the x/benchmarks, with no increase in heap size.

Benchmark details: https://perf.golang.org/search?q=upload:20170420.1

name                       old time/op    new time/op    delta
Garbage/benchmem-MB=64-12    2.27ms ± 0%    2.25ms ± 1%  -0.96% (p=0.000 n=15+18)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.53s ± 2%     2.55s ± 1%  +0.68%        (p=0.001 n=17+16)
Fannkuch11-12                3.02s ± 0%     3.01s ± 0%  -0.15%        (p=0.000 n=16+16)
FmtFprintfEmpty-12          47.1ns ± 7%    47.0ns ± 5%    ~           (p=0.886 n=20+17)
FmtFprintfString-12         73.6ns ± 3%    73.8ns ± 1%  +0.30%        (p=0.026 n=19+17)
FmtFprintfInt-12            80.3ns ± 2%    80.2ns ± 1%    ~           (p=0.994 n=20+18)
FmtFprintfIntInt-12          124ns ± 0%     124ns ± 0%    ~     (all samples are equal)
FmtFprintfPrefixedInt-12     172ns ± 1%     171ns ± 1%  -0.72%        (p=0.003 n=20+18)
FmtFprintfFloat-12           217ns ± 1%     216ns ± 1%  -0.27%        (p=0.019 n=18+19)
FmtManyArgs-12               490ns ± 1%     488ns ± 0%  -0.36%        (p=0.014 n=18+18)
GobDecode-12                6.71ms ± 1%    6.73ms ± 1%  +0.42%        (p=0.000 n=20+20)
GobEncode-12                5.25ms ± 0%    5.24ms ± 0%  -0.20%        (p=0.001 n=18+20)
Gzip-12                      227ms ± 0%     226ms ± 1%    ~           (p=0.107 n=20+19)
Gunzip-12                   38.8ms ± 0%    38.8ms ± 0%    ~           (p=0.221 n=19+18)
HTTPClientServer-12         75.4µs ± 1%    76.3µs ± 1%  +1.26%        (p=0.000 n=20+19)
JSONEncode-12               14.7ms ± 0%    14.7ms ± 1%  -0.14%        (p=0.002 n=18+17)
JSONDecode-12               57.6ms ± 0%    55.2ms ± 0%  -4.13%        (p=0.000 n=19+19)
Mandelbrot200-12            3.73ms ± 0%    3.73ms ± 0%  -0.09%        (p=0.000 n=19+17)
GoParse-12                  3.18ms ± 1%    3.15ms ± 1%  -0.90%        (p=0.000 n=18+20)
RegexpMatchEasy0_32-12      73.3ns ± 2%    73.2ns ± 1%    ~           (p=0.994 n=20+18)
RegexpMatchEasy0_1K-12       236ns ± 2%     234ns ± 1%  -0.70%        (p=0.002 n=19+17)
RegexpMatchEasy1_32-12      69.7ns ± 2%    69.9ns ± 2%    ~           (p=0.416 n=20+20)
RegexpMatchEasy1_1K-12       366ns ± 1%     365ns ± 1%    ~           (p=0.376 n=19+17)
RegexpMatchMedium_32-12      109ns ± 1%     108ns ± 1%    ~           (p=0.461 n=17+18)
RegexpMatchMedium_1K-12     35.2µs ± 1%    35.2µs ± 3%    ~           (p=0.238 n=19+20)
RegexpMatchHard_32-12       1.77µs ± 1%    1.77µs ± 1%  +0.33%        (p=0.007 n=17+16)
RegexpMatchHard_1K-12       53.2µs ± 0%    53.3µs ± 0%  +0.26%        (p=0.001 n=17+17)
Revcomp-12                  1.13s ±117%    0.87s ±184%    ~           (p=0.813 n=20+19)
Template-12                 63.9ms ± 1%    64.6ms ± 1%  +1.18%        (p=0.000 n=19+20)
TimeParse-12                 313ns ± 5%     312ns ± 0%    ~           (p=0.114 n=20+19)
TimeFormat-12                336ns ± 0%     333ns ± 0%  -0.97%        (p=0.000 n=18+16)
[Geo mean]                  50.6µs         50.1µs       -1.04%

This is a cherry-pick of dev.garbage commit edb54c300f, with updated
benchmark results.

Change-Id: Ic77faaa15cdac3bfbbb0032dde5c204e05a0fd8e
Reviewed-on: https://go-review.googlesource.com/41253
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:37 +00:00
Austin Clements
1a033b1a70 runtime: separate spans of noscan objects
Currently, we mix objects with pointers and objects without pointers
("noscan" objects) together in memory. As a result, for every object
we grey, we have to check that object's heap bits to find out if it's
noscan, which adds to the per-object cost of GC. This also hurts the
TLB footprint of the garbage collector because it decreases the
density of scannable objects at the page level.

This commit improves the situation by using separate spans for noscan
objects. This will allow a much simpler noscan check (in a follow up
CL), eliminate the need to clear the bitmap of noscan objects (in a
follow up CL), and improves TLB footprint by increasing the density of
scannable objects.
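
A sketch of the span class encoding this enables (it follows the
scheme described here; the exact details live in mheap.go):

    // A span class packs the size class together with a noscan bit, so
    // the allocator can pick noscan spans with the same machinery it
    // already uses to pick sizes.
    type spanClass uint8

    func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
        sc := spanClass(sizeclass) << 1
        if noscan {
            sc |= 1
        }
        return sc
    }

    func (sc spanClass) sizeclass() int8 { return int8(sc >> 1) }
    func (sc spanClass) noscan() bool    { return sc&1 != 0 }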

This is also a step toward eliminating dead bits, since the current
noscan check depends on checking the dead bit of the first word.

This has no effect on the heap size of the garbage benchmark.

We'll measure the performance change of this after the follow-up
optimizations.

This is a cherry-pick from dev.garbage commit d491e550c3. The only
non-trivial merge conflict was in updatememstats in mstats.go, where
we now have to separate the per-spanclass stats from the per-sizeclass
stats.

Change-Id: I13bdc4869538ece5649a8d2a41c6605371618e40
Reviewed-on: https://go-review.googlesource.com/41251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:31 +00:00
Austin Clements
bb6309cd63 runtime: inform arena placement using sbrk(0)
On 32-bit architectures (or if we fail to map a 64-bit-style arena),
we try to map the heap arena just above the end of the process image.
While we can accept any address, using lower addresses is preferable
because lower addresses cause us to map less of the heap bitmap.

However, if a program is linked against C code that has global
constructors, those constructors may call brk/sbrk to allocate memory
(e.g., many C malloc implementations do this for small allocations).
The brk also starts just above the process image, so this may adjust
the brk past the beginning of where we want to put the heap arena. In
this case, the kernel will pick a different address for the arena and
it will usually be very high (at least, as these things go in a 32-bit
address space).

Fix this by consulting the current value of the brk and using this in
addition to the end of the process image to compute the initial arena
placement.
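
A standalone sketch of how the current break can be read on Linux (the
runtime uses its own raw syscall helper, but the idea is the same):

    //go:build linux

    package main

    import (
        "fmt"
        "syscall"
    )

    // brk(0) cannot lower the break, so the kernel just reports its
    // current value; that address is where C constructors' sbrk
    // allocations will grow from.
    func sbrk0() uintptr {
        addr, _, _ := syscall.Syscall(syscall.SYS_BRK, 0, 0, 0)
        return addr
    }

    func main() {
        fmt.Printf("current program break: %#x\n", sbrk0())
    }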

This is implemented only on Linux currently, since we have no evidence
that it's an issue on any other OSes.

Fixes #19831.

Change-Id: Id64b45d08d8c91e4f50d92d0339146250b04f2f8
Reviewed-on: https://go-review.googlesource.com/39810
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-21 14:34:10 +00:00
Austin Clements
6c6f455f88 runtime: consolidate changes to arena_used
Changing mheap_.arena_used requires several steps that are currently
repeated multiple times in mheap_.sysAlloc. Consolidate these into a
single function.

In the future, this will also make it easier to add other auxiliary VM
structures.

Change-Id: Ie68837d2612e1f4ba4904acb1b6b832b15431d56
Reviewed-on: https://go-review.googlesource.com/40151
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-11 01:35:47 +00:00
Austin Clements
29be3f1999 runtime: generalize GC trigger
Currently the GC triggering condition is an awkward combination of the
gcMode (whether or not it's gcBackgroundMode) and a boolean
"forceTrigger" flag.

Replace this with a new gcTrigger type that represents the range of
transition predicates we need. This has several advantages:

1. We can remove the awkward logic that affects the trigger behavior
   based on the gcMode. Now gcMode purely controls whether to run a
   STW GC or not and the gcTrigger controls whether this is a forced
   GC that cannot be consolidated with other GC cycles.

2. We can lift the time-based triggering logic in sysmon to just
   another type of GC trigger and move the logic to the trigger test.

3. This sets us up to have a cycle count-based trigger, which we'll
   use to make runtime.GC trigger concurrent GC with the desired
   consolidation properties.
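
A sketch of the trigger type described above (kinds and field names as
I recall them; treat them as illustrative):

    // A gcTrigger says when a GC cycle should start. test reports
    // whether the condition currently holds.
    type gcTriggerKind int

    const (
        gcTriggerAlways gcTriggerKind = iota // unconditional (forced STW GC)
        gcTriggerHeap                        // heap grew past the pacer's trigger size
        gcTriggerTime                        // too long since the last cycle (sysmon)
    )

    type gcTrigger struct {
        kind gcTriggerKind
        now  int64 // used by gcTriggerTime
    }

    func (t gcTrigger) test() bool {
        switch t.kind {
        case gcTriggerHeap:
            return memstats.heap_live >= memstats.gc_trigger
        case gcTriggerTime:
            lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
            return lastgc != 0 && t.now-lastgc > forcegcperiod
        }
        return true // gcTriggerAlways
    }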

For #18216.

Change-Id: If9cd49349579a548800f5022ae47b8128004bbfc
Reviewed-on: https://go-review.googlesource.com/37516
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:06 +00:00
Keith Randall
d5dc490519 cmd/compile: intrinsics for math/bits.TrailingZerosX
Implement math/bits.TrailingZerosX using intrinsics.

Generally reorganize the intrinsic spec a bit.
The intrinsics data structure is now built at init time.
This will make doing the other functions in math/bits easier.

Update sys.CtzX to return int instead of uint{64,32} so it
matches math/bits.TrailingZerosX.

Improve the intrinsics a bit for amd64.  We don't need the CMOV
for <64 bit versions.
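
A small usage example; with the intrinsic, calls like these compile to
a single count-trailing-zeros instruction on amd64, while other
architectures use the generic Go fallback:

    package main

    import (
        "fmt"
        "math/bits"
    )

    func main() {
        fmt.Println(bits.TrailingZeros32(0x50))    // 4 (0x50 == 0b0101_0000)
        fmt.Println(bits.TrailingZeros64(1 << 40)) // 40
        fmt.Println(bits.TrailingZeros(0))         // UintSize for a zero input
    }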

Update #18616

Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39
Reviewed-on: https://go-review.googlesource.com/38155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 02:44:16 +00:00
Sokolov Yura
663226d8e1 runtime: make fastrand to generate 32bit values
Extend the period of fastrand from (1<<31)-1 to (1<<32)-1 by choosing
another polynomial and reacting to the high bit before the shift.

The polynomial is taken from 32.dat.gz at
https://users.ece.cmu.edu/~koopman/lfsr/index.html. It is listed there
as F7711115 because that list is for LFSRs that shift right (fastrand
shifts left). (The old polynomial is listed in 31.dat.gz as 7BB88888.)

There were a couple of places that converted fastrand's result to int,
which could produce negative values on 32-bit platforms. They are fixed.
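
For illustration, the shape of a left-shifting Galois LFSR step as
described above (0xA8888EEF is the bit-reversed form of the right-shift
polynomial F7711115; treat the exact constant as an assumption):

    // When the bit about to be shifted out is set, fold the feedback
    // polynomial back in with XOR. int32(x)>>31 is 0 or all ones, so the
    // AND selects the polynomial only when the high bit is set.
    func lfsrStep(x uint32) uint32 {
        return x<<1 ^ (uint32(int32(x)>>31) & 0xA8888EEF)
    }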

Change-Id: Ibee518a3f9103e0aea220ada494b3aec77babb72
Reviewed-on: https://go-review.googlesource.com/36875
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-13 20:22:02 +00:00
Austin Clements
4af6b81d41 runtime: fix confusion between _MaxMem and _MaxArena32
Currently both _MaxMem and _MaxArena32 represent the maximum arena
size on 32-bit hosts (except on MIPS32 where _MaxMem is confusingly
smaller than _MaxArena32).

Clean up sysAlloc so that it always uses _MaxMem, which is the maximum
arena size on both 32- and 64-bit architectures and is the arena size
we allocate auxiliary structures for. This lets us simplify and unify
some code paths and eliminate _MaxArena32.

Fixes #18651. mheap.sysAlloc currently assumes that if the arena is
small, we must be on a 32-bit machine and can therefore grow the arena
to _MaxArena32. This breaks down on darwin/arm64, where _MaxMem is
only 2 GB. As a result, on darwin/arm64, we only reserve spans and
bitmap space for a 2 GB heap, and if the application tries to allocate
beyond that, sysAlloc takes the 32-bit path, tries to grow the arena
beyond 2 GB, and panics when it tries to grow the spans array
allocation past its reserved size. This has probably been a problem
for several releases now, but was only noticed recently because
mapSpans didn't check the bounds on the span reservation until
recently. Most likely it corrupted the bitmap before. By using _MaxMem
consistently, we avoid thinking that we can grow the arena larger than
we have auxiliary structures for.

Change-Id: Ifef28cb746a3ead4b31c1d7348495c2242fef520
Reviewed-on: https://go-review.googlesource.com/35253
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-07 18:39:18 +00:00
Austin Clements
1cc24690b8 runtime: simplify and cleanup mallocinit
mallocinit has evolved organically. Make a pass to clean it up in
various ways:

1. Merge the computation of spansSize and bitmapSize. These were
   computed on every loop iteration of two different loops, but always
   have the same value, which can be derived directly from _MaxMem.
   This also avoids over-reserving these on MIPS, where _MaxArena32 is
   larger than _MaxMem.

2. Remove the ulimit -v logic. It's been disabled for many releases
   and the dead code paths to support it are even more wrong now than
   they were when it was first disabled, since now we *must* reserve
   spans and bitmaps for the full address space.

3. Make it clear that we're using a simple linear allocation to lay
   out the spans, bitmap, and arena spaces. Previously there were a
   lot of redundant pointer computations. Now we just bump p1 up as we
   reserve the spaces.

In preparation for #18651.

Updates #5049 (respect ulimit).

Change-Id: Icbe66570d3a7a17bea227dc54fb3c4978b52a3af
Reviewed-on: https://go-review.googlesource.com/35252
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-07 18:39:15 +00:00
Austin Clements
efb5eae3cf runtime: make _MaxMem an untyped constant
Currently _MaxMem is a uintptr, which is going to complicate some
further changes. Make it untyped so we'll be able to do untyped math
on it before truncating it to a uintptr.

The runtime assembly is identical before and after this change on
{linux,windows}/{amd64,386}.

Updates #18651.

Change-Id: I0f64511faa9e0aa25179a556ab9f185ebf8c9cf8
Reviewed-on: https://go-review.googlesource.com/35251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-02-07 18:39:12 +00:00
Austin Clements
7aefdfded0 runtime: use 4K as the boundary of legal pointers
Currently, the check for legal pointers in stack copying uses
_PageSize (8K) as the minimum legal pointer. By default, Linux won't
let you map under 64K, but

1) it's less clear what other OSes allow or will allow in the future;

2) while mapping the first page is a terrible idea, mapping anywhere
above that is arguably more justifiable;

3) the compiler only assumes the first physical page (4K) is never
mapped.

Make the runtime consistent with the compiler and more robust by
changing the bad pointer check to use 4K as the minimum legal pointer.

This came out of discussions on CLs 34663 and 34719.

Change-Id: Idf721a788bd9699fb348f47bdd083cf8fa8bd3e5
Reviewed-on: https://go-review.googlesource.com/34890
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-01-06 16:19:14 +00:00
Vladimir Stefanovic
272032d0b2 runtime: add support files for linux/mips{,le} port
Only the exe buildmode without cgo is supported.

Change-Id: Id104a79a99d3285c04db00fd98b8affa94ea3c37
Reviewed-on: https://go-review.googlesource.com/31487
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-11-15 21:49:01 +00:00
Keith Randall
7ba36f4adb runtime: compute size classes statically
No point in computing this info on startup.
Compute it at build time.
This lets us spend more time computing & checking the size classes.

Improve the div magic for rounding to the start of an object.
We can now use 32-bit multiplies & shifts, which should help
32-bit platforms.
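
A standalone sketch of the multiply-and-shift trick (constants chosen
for illustration; the generator emits a magic/shift pair per size
class):

    package main

    import "fmt"

    // Dividing by a fixed size class becomes a 32x32->64 multiply plus a
    // shift: magic = ceil(2^shift / size), and (n*magic)>>shift equals
    // n/size for all n below 2^31 with these constants - far beyond any
    // object offset within a span.
    const (
        size  = 48
        shift = 36
        magic = (1<<shift + size - 1) / size
    )

    func divBySize(n uint32) uint32 {
        return uint32((uint64(n) * magic) >> shift)
    }

    func main() {
        for _, n := range []uint32{0, 47, 48, 1000, 32768} {
            fmt.Println(n, divBySize(n), n/size) // the two results always match
        }
    }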

The static data is <1KB.

The actual size classes are not changed by this CL.

Change-Id: I6450cec7d1b2b4ad31fd3f945f504ed2ec6570e7
Reviewed-on: https://go-review.googlesource.com/32219
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-10-30 03:48:49 +00:00
Austin Clements
87e48c5afd runtime, cmd/compile: rename memclr -> memclrNoHeapPointers
Since barrier-less memclr is only safe in very narrow circumstances,
this commit renames memclr to avoid accidentally calling memclr on
typed memory. This can cause subtle, non-deterministic bugs, so it's
worth some effort to prevent. In the near term, this will also prevent
bugs creeping in from any concurrent CLs that add calls to memclr; if
this happens, whichever patch hits master second will fail to compile.

This also adds the other new memclr variants to the compiler's
builtin.go to minimize the churn on that binary blob. We'll use these
in future commits.

Updates #17503.

Change-Id: I00eead049f5bd35ca107ea525966831f3d1ed9ca
Reviewed-on: https://go-review.googlesource.com/31369
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:20:33 +00:00
Austin Clements
ae3bb4a537 runtime: make fixalloc zero allocations on reuse
Currently fixalloc does not zero memory it reuses. This is dangerous
with the hybrid barrier if the type may contain heap pointers, since
it may cause us to observe a dead heap pointer on reuse. It's also
error-prone since it's the only allocator that doesn't zero on
allocation (mallocgc of course zeroes, but so do persistentalloc and
sysAlloc). It's also largely pointless: for mcache, the caller
immediately memclrs the allocation; and the two specials types are
tiny so there's no real cost to zeroing them.

Change fixalloc to zero allocations by default.
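
A sketch of the shape of the change (fixalloc internals heavily
simplified; the zero flag is the new part):

    // fixalloc-style free-list allocator for one fixed-size runtime type.
    type fixalloc struct {
        size uintptr
        list *mlink // free list of previously freed chunks
        zero bool   // zero chunks on (re)allocation; now true by default
    }

    func (f *fixalloc) alloc() unsafe.Pointer {
        if f.list != nil {
            v := unsafe.Pointer(f.list)
            f.list = f.list.next
            if f.zero {
                // Don't let a reused chunk resurrect stale heap pointers.
                memclrNoHeapPointers(v, f.size)
            }
            return v
        }
        // ...otherwise carve a fresh, already-zeroed chunk out of a
        // persistentalloc'd block...
        return nil
    }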

The only type we don't zero by default is mspan. This actually
requires that the span's sweepgen survive across freeing and
reallocating a span. If we were to zero it, the following race would
be possible:

1. The current sweepgen is 2. Span s is on the unswept list.

2. Direct sweeping sweeps span s, finds it's all free, and releases s
   to the fixalloc.

3. Thread 1 allocates s from fixalloc. Suppose this zeros s, including
   s.sweepgen.

4. Thread 1 calls s.init, which sets s.state to _MSpanDead.

5. On thread 2, background sweeping comes across span s in allspans
   and cas's s.sweepgen from 0 (sg-2) to 1 (sg-1). Now it thinks it
   owns it for sweeping.

6. Thread 1 continues initializing s. Everything breaks.

I would like to fix this because it's obviously confusing, but it's a
subtle enough problem that I'm leaving it alone for now. The solution
may be to skip sweepgen 0, but then we have to think about wrap-around
much more carefully.

Updates #17503.

Change-Id: Ie08691feed3abbb06a31381b94beb0a2e36a0613
Reviewed-on: https://go-review.googlesource.com/31368
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:20:23 +00:00
Austin Clements
6b0f668044 runtime: consolidate h_spans and mheap_.spans
Like h_allspans and mheap_.allspans, these were two ways of referring
to the spans array from when the runtime was split between C and Go.
Clean this up by making mheap_.spans a slice and eliminating h_spans.

Change-Id: I3aa7038d53c3a4252050aa33e468c48dfed0b70e
Reviewed-on: https://go-review.googlesource.com/30532
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-25 22:32:48 +00:00
Austin Clements
1bc6be6423 runtime: mark several types go:notinheap
This covers basically all sysAlloc'd, persistentalloc'd, and
fixalloc'd types.

Change-Id: I0487c887c2a0ade5e33d4c4c12d837e97468e66b
Reviewed-on: https://go-review.googlesource.com/30941
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-15 17:58:20 +00:00
Austin Clements
aaf4099a5c runtime: update malloc.go documentation
The big documentation comment at the top of malloc.go has gotten
woefully out of date. Update it.

Change-Id: Ibdb1bdcfdd707a6dc9db79d0633a36a28882301b
Reviewed-on: https://go-review.googlesource.com/29731
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-26 22:00:53 +00:00
Austin Clements
6dda7b2f5f runtime: don't hard-code physical page size
Now that the runtime fetches the true physical page size from the OS,
make the physical page size used by heap growth a variable instead of
a constant. This isn't used in any performance-critical paths, so it
shouldn't be an issue.

sys.PhysPageSize is also renamed to sys.DefaultPhysPageSize to make it
clear that it's not necessarily the true page size. There are no uses
of this constant any more, but we'll keep it around for now.

Updates #12480 and #10180.

Change-Id: I6c23b9df860db309c38c8287a703c53817754f03
Reviewed-on: https://go-review.googlesource.com/25022
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-06 21:05:53 +00:00
Austin Clements
276a52de55 runtime: fetch physical page size from the OS
Currently the physical page size assumed by the runtime is hard-coded.
On Linux the runtime at least fetches the OS page size during init and
sanity checks against the hard-coded value, but they may still differ.
On other OSes we wouldn't even notice.

Add support on all OSes to fetch the actual OS physical page size
during runtime init and lift the sanity check of PhysPageSize from the
Linux init code to general malloc init. Currently this is the only use
of the retrieved page size, but we'll add more shortly.

Updates #12480 and #10180.

Change-Id: I065f2834bc97c71d3208edc17fd990ec9058b6da
Reviewed-on: https://go-review.googlesource.com/25050
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-09-06 21:05:50 +00:00
Josh Bleecher Snyder
2b74de3ed9 runtime: rename fastrand1 to fastrand
Change-Id: I37706ff0a3486827c5b072c95ad890ea87ede847
Reviewed-on: https://go-review.googlesource.com/28210
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-30 23:59:21 +00:00
Cherry Zhang
b2e0e9688a cmd/compile: remove Zero and NilCheck for newobject
Recognize runtime.newobject and don't Zero or NilCheck it.

Fixes #15914 (?)
Updates #15390.

TBD: add test

Change-Id: Ia3bfa5c2ddbe2c27c92d9f68534a713b5ce95934
Reviewed-on: https://go-review.googlesource.com/27930
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-30 23:10:43 +00:00
Dmitry Vyukov
14e5951166 runtime: increase malloc size classes
When we calculate class sizes, in some cases we discard considerable
amounts of memory without an apparent reason. For example, we choose
size 8448 with 6 objects in 7 pages. But we could just as well use object
size 9472, which is also 6 objects in 7 pages but +1024 bytes (+12.12%).
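
The arithmetic behind that example, as a tiny standalone check:

    package main

    import "fmt"

    func main() {
        const pageSize = 8192
        span := 7 * pageSize // both classes use 7 pages per span
        for _, size := range []int{8448, 9472} {
            n := span / size
            fmt.Printf("object size %d: %d objects per span, %d bytes of tail waste\n",
                size, n, span-n*size)
        }
        // object size 8448: 6 objects per span, 6656 bytes of tail waste
        // object size 9472: 6 objects per span, 512 bytes of tail waste
    }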

Increase class sizes to the max value that leads to the same
page count/number of objects. Full list of affected size classes:

class 36: pages: 2 size: 1664->1792 +128 (7.69%)
class 39: pages: 1 size: 2560->2688 +128 (5.0%)
class 40: pages: 3 size: 2816->3072 +256 (9.9%)
class 41: pages: 2 size: 3072->3200 +128 (4.16%)
class 42: pages: 3 size: 3328->3456 +128 (3.84%)
class 44: pages: 3 size: 4608->4864 +256 (5.55%)
class 47: pages: 4 size: 6400->6528 +128 (2.0%)
class 48: pages: 5 size: 6656->6784 +128 (1.92%)
class 51: pages: 7 size: 8448->9472 +1024 (12.12%)
class 52: pages: 6 size: 8704->9728 +1024 (11.76%)
class 53: pages: 5 size: 9472->10240 +768 (8.10%)
class 54: pages: 4 size: 10496->10880 +384 (3.65%)
class 57: pages: 7 size: 14080->14336 +256 (1.81%)
class 59: pages: 9 size: 16640->18432 +1792 (10.76%)
class 60: pages: 7 size: 17664->19072 +1408 (7.97%)
class 62: pages: 8 size: 21248->21760 +512 (2.40%)
class 64: pages: 10 size: 24832->27264 +2432 (9.79%)
class 65: pages: 7 size: 28416->28672 +256 (0.90%)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.59s ± 5%     2.52s ± 4%    ~     (p=0.132 n=6+6)
Fannkuch11-12                2.13s ± 3%     2.17s ± 3%    ~     (p=0.180 n=6+6)
FmtFprintfEmpty-12          47.0ns ± 3%    46.6ns ± 1%    ~     (p=0.355 n=6+5)
FmtFprintfString-12          131ns ± 0%     131ns ± 1%    ~     (p=0.476 n=4+6)
FmtFprintfInt-12             121ns ± 6%     122ns ± 2%    ~     (p=0.511 n=6+6)
FmtFprintfIntInt-12          182ns ± 2%     186ns ± 1%  +2.20%  (p=0.015 n=6+6)
FmtFprintfPrefixedInt-12     184ns ± 5%     181ns ± 2%    ~     (p=0.645 n=6+6)
FmtFprintfFloat-12           272ns ± 7%     265ns ± 1%    ~     (p=1.000 n=6+5)
FmtManyArgs-12               783ns ± 2%     802ns ± 2%  +2.38%  (p=0.017 n=6+6)
GobDecode-12                7.04ms ± 4%    7.00ms ± 2%    ~     (p=1.000 n=6+6)
GobEncode-12                6.36ms ± 6%    6.17ms ± 6%    ~     (p=0.240 n=6+6)
Gzip-12                      242ms ±14%     233ms ± 7%    ~     (p=0.310 n=6+6)
Gunzip-12                   36.6ms ±22%    36.0ms ± 9%    ~     (p=0.841 n=5+5)
HTTPClientServer-12         93.1µs ±29%    88.0µs ±32%    ~     (p=0.240 n=6+6)
JSONEncode-12               27.1ms ±39%    26.2ms ±35%    ~     (p=0.589 n=6+6)
JSONDecode-12               71.7ms ±36%    71.5ms ±36%    ~     (p=0.937 n=6+6)
Mandelbrot200-12            4.78ms ±10%    4.70ms ±16%    ~     (p=0.394 n=6+6)
GoParse-12                  4.86ms ±34%    4.95ms ±36%    ~     (p=1.000 n=6+6)
RegexpMatchEasy0_32-12       110ns ±37%     110ns ±36%    ~     (p=0.660 n=6+6)
RegexpMatchEasy0_1K-12       240ns ±38%     234ns ±47%    ~     (p=0.554 n=6+6)
RegexpMatchEasy1_32-12      77.2ns ± 2%    77.2ns ±10%    ~     (p=0.699 n=6+6)
RegexpMatchEasy1_1K-12       337ns ± 5%     331ns ± 4%    ~     (p=0.552 n=6+6)
RegexpMatchMedium_32-12      125ns ±13%     132ns ±26%    ~     (p=0.561 n=6+6)
RegexpMatchMedium_1K-12     35.9µs ± 3%    36.1µs ± 5%    ~     (p=0.818 n=6+6)
RegexpMatchHard_32-12       1.81µs ± 4%    1.82µs ± 5%    ~     (p=0.452 n=5+5)
RegexpMatchHard_1K-12       52.4µs ± 2%    54.4µs ± 3%  +3.84%  (p=0.002 n=6+6)
Revcomp-12                   401ms ± 2%     390ms ± 1%  -2.82%  (p=0.002 n=6+6)
Template-12                 54.5ms ± 3%    54.6ms ± 1%    ~     (p=0.589 n=6+6)
TimeParse-12                 294ns ± 1%     298ns ± 2%    ~     (p=0.160 n=6+6)
TimeFormat-12                323ns ± 4%     318ns ± 5%    ~     (p=0.297 n=6+6)

name                      old speed      new speed      delta
GobDecode-12               109MB/s ± 4%   110MB/s ± 2%    ~     (p=1.000 n=6+6)
GobEncode-12               121MB/s ± 6%   125MB/s ± 6%    ~     (p=0.240 n=6+6)
Gzip-12                   80.4MB/s ±12%  83.3MB/s ± 7%    ~     (p=0.310 n=6+6)
Gunzip-12                  495MB/s ±41%   541MB/s ± 9%    ~     (p=0.931 n=6+5)
JSONEncode-12             80.7MB/s ±39%  82.8MB/s ±34%    ~     (p=0.589 n=6+6)
JSONDecode-12             30.4MB/s ±40%  31.0MB/s ±37%    ~     (p=0.937 n=6+6)
GoParse-12                13.2MB/s ±33%  13.2MB/s ±35%    ~     (p=1.000 n=6+6)
RegexpMatchEasy0_32-12     321MB/s ±34%   326MB/s ±34%    ~     (p=0.699 n=6+6)
RegexpMatchEasy0_1K-12    4.49GB/s ±31%  4.74GB/s ±37%    ~     (p=0.589 n=6+6)
RegexpMatchEasy1_32-12     414MB/s ± 2%   415MB/s ± 9%    ~     (p=0.699 n=6+6)
RegexpMatchEasy1_1K-12    3.03GB/s ± 5%  3.09GB/s ± 4%    ~     (p=0.699 n=6+6)
RegexpMatchMedium_32-12   7.99MB/s ±12%  7.68MB/s ±22%    ~     (p=0.589 n=6+6)
RegexpMatchMedium_1K-12   28.5MB/s ± 3%  28.4MB/s ± 5%    ~     (p=0.818 n=6+6)
RegexpMatchHard_32-12     17.7MB/s ± 4%  17.0MB/s ±15%    ~     (p=0.351 n=5+6)
RegexpMatchHard_1K-12     19.6MB/s ± 2%  18.8MB/s ± 3%  -3.67%  (p=0.002 n=6+6)
Revcomp-12                 634MB/s ± 2%   653MB/s ± 1%  +2.89%  (p=0.002 n=6+6)
Template-12               35.6MB/s ± 3%  35.5MB/s ± 1%    ~     (p=0.615 n=6+6)

Change-Id: I465a47f74227f316e3abea231444f48c7a30ef85
Reviewed-on: https://go-review.googlesource.com/24493
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-08-19 21:24:28 +00:00
Austin Clements
d8b08c3aa4 runtime: perform publication barrier even for noscan objects
Currently we only execute a publication barrier for scan objects (and
skip it for noscan objects). This used to be okay because GC would
never consult the object itself (so it wouldn't observe uninitialized
memory even if it found a pointer to a noscan object), and the heap
bitmap was pre-initialized to noscan.

However, now we explicitly initialize the heap bitmap for noscan
objects when we allocate them. While the GC will still never consult
the contents of a noscan object, it does need to see the initialized
heap bitmap. Hence, we need to execute a publication barrier to make
the bitmap visible before user code can expose a pointer to the newly
allocated object even for noscan objects.
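
As a rough model of the ordering this describes (not the runtime's actual code; the names bitmap, slot and allocate below are invented for illustration): the metadata write must become visible no later than the pointer that exposes the object, which the runtime gets from a publication barrier and which portable Go gets from an atomic (release) store.

package main

import "sync/atomic"

// Stand-ins for the heap bitmap word and for the location where a
// pointer to the new object is published; both names are illustrative.
var (
	bitmap uint32
	slot   atomic.Pointer[int]
)

func allocate() *int {
	p := new(int)
	bitmap = 1 // initialize metadata describing the new object
	// The runtime issues a publication barrier at this point so the
	// bitmap write is visible before p escapes; the atomic Store in
	// main provides the equivalent release ordering in portable Go.
	return p
}

func main() {
	slot.Store(allocate()) // publish the pointer
	if p := slot.Load(); p != nil {
		_ = *p // any goroutine that observes p this way also observes bitmap == 1
	}
}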

Change-Id: Ie4133c638db0d9055b4f7a8061a634d970627153
Reviewed-on: https://go-review.googlesource.com/23043
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-05-14 13:49:51 +00:00
Elias Naur
e6ec82067a runtime: use entire address space on 32 bit
In issue #13992, Russ mentioned that the heap bitmap footprint was
halved but that the bitmap size calculation hadn't been updated. This
presents the opportunity to either halve the bitmap size or double
the addressable virtual space. This CL doubles the addressable virtual
space. On 32 bit this can be tweaked further to allow the bitmap to
cover the entire 4GB virtual address space, removing a failure mode
if the kernel hands out memory at too low an address.

First, fix the calculation and double _MaxArena32 to cover 4GB virtual
memory space with the same bitmap size (256 MB).
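
A rough sanity check of that 256 MB figure, assuming 4-byte words on 32 bit and 2 heap-bitmap bits per word as described above (the constants below are illustrative, not runtime identifiers):

package main

import "fmt"

const (
	addrSpace   = 1 << 32 // 4 GB of 32-bit virtual address space
	wordSize    = 4       // pointer-sized word on 32 bit
	bitsPerWord = 2       // heap bitmap bits kept per word
)

func main() {
	bitmapBytes := addrSpace / wordSize * bitsPerWord / 8
	fmt.Println(bitmapBytes>>20, "MB") // prints 256 MB
}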

Then, allow the fallback mode for the initial memory reservation
on 32 bit (or 64 bit with too little available virtual memory) to not
include space for the arena. mheap.sysAlloc will automatically reserve
additional space when the existing arena is full.

Finally, set arena_start to 0 in 32 bit mode, so that any address is
acceptable for subsequent (additional) reservations.

Before, the bitmap was always located just before arena_start, so
fix the two places relying on that assumption: Point the otherwise unused
mheap.bitmap to one byte after the end of the bitmap, and use it for
bitmap addressing instead of arena_start.

With arena_start set to 0 on 32 bit, the cgoInRange check is no longer a
sufficient check for Go pointers. Introduce and call inHeapOrStack to
check whether a pointer is to the Go heap or stack.
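
A rough, self-contained sketch of what an inHeapOrStack-style check amounts to (the span type, its fields, and spanOf below are invented for illustration; the real check consults mheap's span data): map the address to its span, if any, and accept it only when that span is currently backing heap objects or a stack.

package main

type spanKind int

const (
	spanFree spanKind = iota
	spanHeapInUse
	spanStack
)

type span struct {
	base, limit uintptr
	kind        spanKind
}

// spanOf is a stand-in for the runtime's address-to-span lookup.
func spanOf(spans []span, p uintptr) *span {
	for i := range spans {
		if p >= spans[i].base && p < spans[i].limit {
			return &spans[i]
		}
	}
	return nil
}

func inHeapOrStack(spans []span, p uintptr) bool {
	s := spanOf(spans, p)
	if s == nil {
		return false
	}
	return s.kind == spanHeapInUse || s.kind == spanStack
}

func main() {
	spans := []span{{base: 0x1000, limit: 0x2000, kind: spanHeapInUse}}
	_ = inHeapOrStack(spans, 0x1800) // true: the address falls in an in-use heap span
}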

While we're here, remove sysReserveHigh which seems to be unused.

Fixes #13992

Change-Id: I592b513148a50b9d3967b5c5d94b86b3ec39acc2
Reviewed-on: https://go-review.googlesource.com/20471
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-07 03:04:39 +00:00
Austin Clements
a20fd1f6ba runtime: reclaim scan/dead bit in first word
With the switch to separate mark bitmaps, the scan/dead bit for the
first word of each object is now unused. Reclaim this bit and use it
as a scan/dead bit, just like words three and on. The second word is
still used for checkmark.

This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
since they no longer need different cases for 1, 2, and 3+ word
objects. They can instead just manipulate the heap bitmap for the
first word and be done with it.

In order to enable this, we change heapBitsSetType and runGCProg to
always set the scan/dead bit to scan for the first word on every code
path. Since these functions only apply to types that have pointers,
there's no need to do this conditionally: it's *always* necessary to
set the scan bit in the first word.

We also change every place that scans an object and checks if there
are more pointers. Rather than only checking morePointers if the word
is >= 2, we now check morePointers if word != 1 (since that's the
checkmark word).
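
An illustrative version of that scan-loop rule, with invented types (the real loop lives in scanobject and reads heapBits): honor the scan/dead bit for every word except word 1, the checkmark word.

package main

const checkmarkWord = 1 // the second word's bit is reserved for checkmark

type objBits struct {
	morePointers []bool // per-word scan/dead bit
	isPointer    []bool // per-word pointer bit
}

// pointerWords reports which words of the object must be scanned.
func pointerWords(b objBits) []int {
	var words []int
	for w := range b.isPointer {
		// Stop when the scan/dead bit says "no more pointers",
		// unless this is the checkmark word, whose bit means something else.
		if w != checkmarkWord && !b.morePointers[w] {
			break
		}
		if b.isPointer[w] {
			words = append(words, w)
		}
	}
	return words
}

func main() {
	b := objBits{
		morePointers: []bool{true, false, true, false},
		isPointer:    []bool{true, false, true, false},
	}
	_ = pointerWords(b) // examines words 0 through 2, stops at word 3
}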

Looking forward, we should probably reclaim the checkmark bit, too,
but that's going to be quite a bit more work.

Tested by setting doubleCheck in heapBitsSetType and running all.bash
on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.

This particularly improves the FmtFprintf* go1 benchmarks, since they
do a large amount of noscan allocation.

name                      old time/op    new time/op    delta
BinaryTree17-12              2.34s ± 1%     2.38s ± 1%  +1.70%  (p=0.000 n=17+19)
Fannkuch11-12                2.09s ± 0%     2.09s ± 1%    ~     (p=0.276 n=17+16)
FmtFprintfEmpty-12          44.9ns ± 2%    44.8ns ± 2%    ~     (p=0.340 n=19+18)
FmtFprintfString-12          127ns ± 0%     125ns ± 0%  -1.57%  (p=0.000 n=16+15)
FmtFprintfInt-12             128ns ± 0%     122ns ± 1%  -4.45%  (p=0.000 n=15+20)
FmtFprintfIntInt-12          207ns ± 1%     193ns ± 0%  -6.55%  (p=0.000 n=19+14)
FmtFprintfPrefixedInt-12     197ns ± 1%     191ns ± 0%  -2.93%  (p=0.000 n=17+18)
FmtFprintfFloat-12           263ns ± 0%     248ns ± 1%  -5.88%  (p=0.000 n=15+19)
FmtManyArgs-12               794ns ± 0%     779ns ± 1%  -1.90%  (p=0.000 n=18+18)
GobDecode-12                7.14ms ± 2%    7.11ms ± 1%    ~     (p=0.072 n=20+20)
GobEncode-12                5.85ms ± 1%    5.82ms ± 1%  -0.49%  (p=0.000 n=20+20)
Gzip-12                      218ms ± 1%     215ms ± 1%  -1.22%  (p=0.000 n=19+19)
Gunzip-12                   36.8ms ± 0%    36.7ms ± 0%  -0.18%  (p=0.006 n=18+20)
HTTPClientServer-12         77.1µs ± 4%    77.1µs ± 3%    ~     (p=0.945 n=19+20)
JSONEncode-12               15.6ms ± 1%    15.9ms ± 1%  +1.68%  (p=0.000 n=18+20)
JSONDecode-12               55.2ms ± 1%    53.6ms ± 1%  -2.93%  (p=0.000 n=17+19)
Mandelbrot200-12            4.05ms ± 1%    4.05ms ± 0%    ~     (p=0.306 n=17+17)
GoParse-12                  3.14ms ± 1%    3.10ms ± 1%  -1.31%  (p=0.000 n=19+18)
RegexpMatchEasy0_32-12      69.3ns ± 1%    70.0ns ± 0%  +0.89%  (p=0.000 n=19+17)
RegexpMatchEasy0_1K-12       237ns ± 1%     236ns ± 0%  -0.62%  (p=0.000 n=19+16)
RegexpMatchEasy1_32-12      69.5ns ± 1%    70.3ns ± 1%  +1.14%  (p=0.000 n=18+17)
RegexpMatchEasy1_1K-12       377ns ± 1%     366ns ± 1%  -3.03%  (p=0.000 n=15+19)
RegexpMatchMedium_32-12      107ns ± 1%     107ns ± 2%    ~     (p=0.318 n=20+19)
RegexpMatchMedium_1K-12     33.8µs ± 3%    33.5µs ± 1%  -1.04%  (p=0.001 n=20+19)
RegexpMatchHard_32-12       1.68µs ± 1%    1.73µs ± 0%  +2.50%  (p=0.000 n=20+18)
RegexpMatchHard_1K-12       50.8µs ± 1%    52.0µs ± 1%  +2.50%  (p=0.000 n=19+18)
Revcomp-12                   381ms ± 1%     385ms ± 1%  +1.00%  (p=0.000 n=17+18)
Template-12                 64.9ms ± 3%    62.6ms ± 1%  -3.55%  (p=0.000 n=19+18)
TimeParse-12                 324ns ± 0%     328ns ± 1%  +1.25%  (p=0.000 n=18+18)
TimeFormat-12                345ns ± 0%     334ns ± 0%  -3.31%  (p=0.000 n=15+17)
[Geo mean]                  52.1µs         51.5µs       -1.00%

Change-Id: I13e74da3193a7f80794c654f944d1f0d60817049
Reviewed-on: https://go-review.googlesource.com/22632
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-30 16:49:54 +00:00
Rick Hudson
e9eaa181fc [dev.garbage] runtime: simplify nextFreeFast so it is inlined
nextFreeFast is currently not inlined by the compiler due
to its size and complexity. This CL simplifies
nextFreeFast by letting the slow path (nextFree)
handle a corner case.
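
For illustration only, a self-contained fast path in the same spirit (the span type and field names below are invented, not the runtime's): consult a cached bitmap of free slots and return 0 for anything unusual, leaving refills and corner cases to the slow path.

package main

import "math/bits"

type span struct {
	base      uintptr
	elemSize  uintptr
	nelems    uintptr
	freeIndex uintptr // next slot to consider
	freeCache uint64  // bit i set => slot freeIndex+i is free
}

// nextFreeFast returns the address of a free slot, or 0 to tell the
// caller to take the slow path.
func nextFreeFast(s *span) uintptr {
	bit := bits.TrailingZeros64(s.freeCache)
	if bit == 64 {
		return 0 // cache exhausted: slow path refills it
	}
	idx := s.freeIndex + uintptr(bit)
	if idx >= s.nelems {
		return 0 // span full: slow path picks a new span
	}
	s.freeCache >>= uint(bit) + 1
	s.freeIndex = idx + 1
	return s.base + idx*s.elemSize
}

func main() {
	s := &span{base: 0x10000, elemSize: 16, nelems: 64, freeCache: ^uint64(0)}
	_ = nextFreeFast(s) // first allocation comes from slot 0
}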

Change-Id: Ia9c5d1a7912bcb4bec072f5fd240f0e0bafb20e4
Reviewed-on: https://go-review.googlesource.com/22598
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2016-04-29 16:47:11 +00:00