From 6383fb615205f1b0d0ce8f8c4329e8825807edd0 Mon Sep 17 00:00:00 2001 From: Austin Clements Date: Fri, 11 Dec 2015 12:33:34 -0500 Subject: [PATCH] runtime: deduct correct sweep credit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit deductSweepCredit expects the size in bytes of the span being allocated, but mCentral_CacheSpan passes the size of a single object in the span. As a result, we don't sweep enough on that call and when mCentral_CacheSpan later calls reimburseSweepCredit, it's very likely to underflow mheap_.spanBytesAlloc, which causes the next call to deductSweepCredit to think it owes a huge number of pages and finish off the whole sweep. In addition to causing the occasional allocation that triggers the full sweep to be potentially extremely expensive relative to other allocations, this can indirectly slow down many other allocations. deductSweepCredit uses sweepone to sweep spans, which returns fully-unused spans to the heap, where these spans are freed and coalesced with neighboring free spans. On the other hand, when mCentral_CacheSpan sweeps a span, it does so with the intent to immediately reuse that span and, as a result, will not return the span to the heap even if it is fully unused. This saves on the cost of locking the heap, finding a span, and initializing that span. For example, before this change, with GOMAXPROCS=1 (or the background sweeper disabled) BinaryTree17 returned roughly 220K spans to the heap and allocated new spans from the heap roughly 232K times. After this change, it returns 1.3K spans to the heap and allocates new spans from the heap 39K times. (With background sweeping these numbers are effectively unchanged because the background sweeper sweeps almost all of the spans with sweepone; however, parallel sweeping saves more than the cost of allocating spans from the heap.) Fixes #13535. Fixes #13589. name old time/op new time/op delta BinaryTree17-12 3.03s ± 1% 2.86s ± 4% -5.61% (p=0.000 n=18+20) Fannkuch11-12 2.48s ± 1% 2.49s ± 1% ~ (p=0.060 n=17+20) FmtFprintfEmpty-12 50.7ns ± 1% 50.9ns ± 1% +0.43% (p=0.025 n=15+16) FmtFprintfString-12 174ns ± 2% 174ns ± 2% ~ (p=0.539 n=19+20) FmtFprintfInt-12 158ns ± 1% 158ns ± 1% ~ (p=0.300 n=18+20) FmtFprintfIntInt-12 269ns ± 2% 269ns ± 2% ~ (p=0.784 n=20+18) FmtFprintfPrefixedInt-12 233ns ± 1% 234ns ± 1% ~ (p=0.389 n=18+18) FmtFprintfFloat-12 309ns ± 1% 310ns ± 1% +0.25% (p=0.048 n=18+18) FmtManyArgs-12 1.10µs ± 1% 1.10µs ± 1% ~ (p=0.259 n=18+19) GobDecode-12 7.81ms ± 1% 7.72ms ± 1% -1.17% (p=0.000 n=19+19) GobEncode-12 6.56ms ± 0% 6.55ms ± 1% ~ (p=0.433 n=17+19) Gzip-12 318ms ± 2% 317ms ± 1% ~ (p=0.578 n=19+18) Gunzip-12 42.1ms ± 2% 42.0ms ± 0% -0.45% (p=0.007 n=18+16) HTTPClientServer-12 63.9µs ± 1% 64.0µs ± 1% ~ (p=0.146 n=17+19) JSONEncode-12 16.4ms ± 1% 16.4ms ± 1% ~ (p=0.271 n=19+19) JSONDecode-12 58.1ms ± 1% 58.0ms ± 1% ~ (p=0.152 n=18+18) Mandelbrot200-12 3.85ms ± 0% 3.85ms ± 0% ~ (p=0.126 n=19+18) GoParse-12 3.71ms ± 1% 3.64ms ± 1% -1.86% (p=0.000 n=20+18) RegexpMatchEasy0_32-12 100ns ± 2% 100ns ± 1% ~ (p=0.588 n=20+20) RegexpMatchEasy0_1K-12 346ns ± 1% 347ns ± 1% +0.27% (p=0.014 n=17+20) RegexpMatchEasy1_32-12 82.9ns ± 3% 83.5ns ± 3% ~ (p=0.096 n=19+20) RegexpMatchEasy1_1K-12 506ns ± 1% 506ns ± 1% ~ (p=0.530 n=19+19) RegexpMatchMedium_32-12 129ns ± 2% 129ns ± 1% ~ (p=0.566 n=20+19) RegexpMatchMedium_1K-12 39.4µs ± 1% 39.4µs ± 1% ~ (p=0.713 n=19+20) RegexpMatchHard_32-12 2.05µs ± 1% 2.06µs ± 1% +0.36% (p=0.008 n=18+20) RegexpMatchHard_1K-12 61.6µs ± 1% 61.7µs ± 1% ~ (p=0.286 n=19+20) Revcomp-12 538ms ± 1% 541ms ± 2% ~ (p=0.081 n=18+19) Template-12 71.5ms ± 2% 71.6ms ± 1% ~ (p=0.513 n=20+19) TimeParse-12 357ns ± 1% 357ns ± 1% ~ (p=0.935 n=19+18) TimeFormat-12 352ns ± 1% 352ns ± 1% ~ (p=0.293 n=19+20) [Geo mean] 62.0µs 61.9µs -0.21% name old time/op new time/op delta XBenchGarbage-12 5.83ms ± 2% 5.86ms ± 3% ~ (p=0.247 n=19+20) Change-Id: I790bb530adace27ccf25d372f24a11954b88443c Reviewed-on: https://go-review.googlesource.com/17745 Run-TryBot: Austin Clements TryBot-Result: Gobot Gobot Reviewed-by: Rick Hudson Reviewed-by: Russ Cox --- src/runtime/mcentral.go | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/runtime/mcentral.go b/src/runtime/mcentral.go index 418a5ff36b..159079b1f0 100644 --- a/src/runtime/mcentral.go +++ b/src/runtime/mcentral.go @@ -32,7 +32,8 @@ func (c *mcentral) init(sizeclass int32) { // Allocate a span to use in an MCache. func (c *mcentral) cacheSpan() *mspan { // Deduct credit for this span allocation and sweep if necessary. - deductSweepCredit(uintptr(class_to_size[c.sizeclass]), 0) + spanBytes := uintptr(class_to_allocnpages[c.sizeclass]) * _PageSize + deductSweepCredit(spanBytes, 0) lock(&c.lock) sg := mheap_.sweepgen