Dmitriy Vyukov
|
f82db7d9e4
|
runtime: less aggressive per-thread stack segment caching
Introduce global stack segment cache and limit per-thread cache size.
This greatly reduces StackSys memory on workloads that create lots of threads.
benchmark old ns/op new ns/op delta
BenchmarkStackGrowth 665 656 -1.35%
BenchmarkStackGrowth-2 333 328 -1.50%
BenchmarkStackGrowth-4 224 172 -23.21%
BenchmarkStackGrowth-8 124 91 -26.13%
BenchmarkStackGrowth-16 82 47 -41.94%
BenchmarkStackGrowth-32 73 40 -44.79%
BenchmarkStackGrowthDeep 97231 94391 -2.92%
BenchmarkStackGrowthDeep-2 47230 58562 +23.99%
BenchmarkStackGrowthDeep-4 24993 49356 +97.48%
BenchmarkStackGrowthDeep-8 15105 30072 +99.09%
BenchmarkStackGrowthDeep-16 10005 15623 +56.15%
BenchmarkStackGrowthDeep-32 12517 13069 +4.41%
TestStackMem#1,MB 310 12 -96.13%
TestStackMem#2,MB 296 14 -95.27%
TestStackMem#3,MB 479 14 -97.08%
TestStackMem#1,sec 3.22 2.26 -29.81%
TestStackMem#2,sec 2.43 2.15 -11.52%
TestStackMem#3,sec 2.50 2.38 -4.80%
R=sougou, no.smile.face, rsc
CC=golang-dev, msolomon
https://golang.org/cl/7029044
|
2013-01-10 09:57:06 +04:00 |
|
Dmitriy Vyukov
|
5a5e698c8f
|
runtime: add goroutine creation benchmark
Current results on 2 core darwin/amd64:
BenchmarkGoroutineChain 351 ns/op
BenchmarkGoroutineChain-2 3840 ns/op
BenchmarkGoroutineChain-4 4040 ns/op
BenchmarkConcGoroutineChain 350 ns/op
BenchmarkConcGoroutineChain-2 875 ns/op
BenchmarkConcGoroutineChain-4 2027 ns/op
R=r, rsc
CC=golang-dev
https://golang.org/cl/6332054
|
2012-06-27 21:57:49 +04:00 |
|
Russ Cox
|
025abd530e
|
runtime: faster entersyscall, exitsyscall
Uses atomic memory accesses to avoid the need to acquire
and release schedlock on fast paths.
benchmark old ns/op new ns/op delta
runtime_test.BenchmarkSyscall 73 31 -56.63%
runtime_test.BenchmarkSyscall-2 538 74 -86.23%
runtime_test.BenchmarkSyscall-3 508 103 -79.72%
runtime_test.BenchmarkSyscall-4 721 97 -86.52%
runtime_test.BenchmarkSyscallWork 920 873 -5.11%
runtime_test.BenchmarkSyscallWork-2 516 481 -6.78%
runtime_test.BenchmarkSyscallWork-3 550 343 -37.64%
runtime_test.BenchmarkSyscallWork-4 632 263 -58.39%
(Intel Core i7 L640 2.13 GHz-based Lenovo X201s)
Reduced a less artificial server benchmark
from 11.5r 12.0u 8.0s to 8.3r 9.1u 1.0s.
R=dvyukov, r, bradfitz, r, iant, iant
CC=golang-dev
https://golang.org/cl/4723042
|
2011-07-19 11:01:17 -04:00 |
|
Dmitriy Vyukov
|
c9152a8568
|
runtime: eliminate contention during stack allocation
Standard-sized stack frames use plain malloc/free
instead of centralized lock-protected FixAlloc.
Benchmark results on HP Z600 (2 x Xeon E5620, 8 HT cores, 2.40GHz)
are as follows:
benchmark old ns/op new ns/op delta
BenchmarkStackGrowth 1045.00 949.00 -9.19%
BenchmarkStackGrowth-2 3450.00 800.00 -76.81%
BenchmarkStackGrowth-4 5076.00 513.00 -89.89%
BenchmarkStackGrowth-8 7805.00 471.00 -93.97%
BenchmarkStackGrowth-16 11751.00 321.00 -97.27%
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/4657091
|
2011-07-12 09:24:32 -07:00 |
|
Dmitriy Vyukov
|
91cc1e6b77
|
runtime: reset GOMAXPROCS during tests
Fix the fact that the test leaves GOMAXPROCS=3
and a running goroutine behind.
R=golang-dev, rsc
CC=dvyukov, golang-dev
https://golang.org/cl/4517121
|
2011-05-31 10:38:51 -04:00 |
|
Russ Cox
|
4f7fd3cb7f
|
runtime: disable long test (fix arm build)
TBR=r
CC=golang-dev
https://golang.org/cl/4449051
|
2011-04-23 10:03:51 -04:00 |
|
Russ Cox
|
781df132f9
|
runtime: stop deadlock test properly (fix arm5 build)
TBR=r
CC=golang-dev
https://golang.org/cl/4446058
|
2011-04-22 15:22:11 -04:00 |
|
Dmitriy Vyukov
|
29d78f1243
|
runtime: fix GOMAXPROCS vs garbage collection bug
Fixes #1715.
R=golang-dev, rsc1, rsc
CC=golang-dev
https://golang.org/cl/4434053
|
2011-04-21 12:09:25 -04:00 |
|