qbit/go - go - Tape:neT

qbit/go

Fork 0

mirror of https://github.com/golang/go synced 2024-10-02 18:18:33 -06:00

Commit Graph

Author	SHA1	Message	Date
Keith Randall	6c7cbf086c	runtime: get rid of most uses of REP for copying/zeroing. REP MOVSQ and REP STOSQ have a really high startup overhead. Use a Duff's device to do the repetition instead. benchmark old ns/op new ns/op delta BenchmarkClearFat32 7.20 1.60 -77.78% BenchmarkCopyFat32 6.88 2.38 -65.41% BenchmarkClearFat64 7.15 3.20 -55.24% BenchmarkCopyFat64 6.88 3.44 -50.00% BenchmarkClearFat128 9.53 5.34 -43.97% BenchmarkCopyFat128 9.27 5.56 -40.02% BenchmarkClearFat256 13.8 9.53 -30.94% BenchmarkCopyFat256 13.5 10.3 -23.70% BenchmarkClearFat512 22.3 18.0 -19.28% BenchmarkCopyFat512 22.0 19.7 -10.45% BenchmarkCopyFat1024 36.5 38.4 +5.21% BenchmarkClearFat1024 35.1 35.0 -0.28% TODO: use for stack frame zeroing TODO: REP prefixes are still used for "reverse" copying when src/dst regions overlap. Might be worth fixing. LGTM=rsc R=golang-codereviews, rsc CC=golang-codereviews, r https://golang.org/cl/81370046	2014-04-01 12:51:02 -07:00
Keith Randall	da7cf0ba5d	runtime: faster memclr on x86. Use explicit SSE writes instead of REP STOSQ. benchmark old ns/op new ns/op delta BenchmarkMemclr5 22 5 -73.62% BenchmarkMemclr16 27 5 -78.49% BenchmarkMemclr64 28 6 -76.43% BenchmarkMemclr256 34 8 -74.94% BenchmarkMemclr4096 112 84 -24.73% BenchmarkMemclr65536 1902 1920 +0.95% LGTM=dvyukov R=golang-codereviews, dvyukov CC=golang-codereviews https://golang.org/cl/60090044	2014-02-06 17:43:22 -08:00
Keith Randall	6021449236	runtime: faster x86 memmove (a.k.a. built-in copy()) REP instructions have a high startup cost, so we handle small sizes with some straightline code. The REP MOVSx instructions are really fast for large sizes. The cutover is approximately 1K. We implement up to 128/256 because that is the maximum SSE register load (loading all data into registers before any stores lets us ignore copy direction). (on a Sandy Bridge E5-1650 @ 3.20GHz) benchmark old ns/op new ns/op delta BenchmarkMemmove0 3 3 +0.86% BenchmarkMemmove1 5 5 +5.40% BenchmarkMemmove2 18 8 -56.84% BenchmarkMemmove3 18 7 -58.45% BenchmarkMemmove4 36 7 -78.63% BenchmarkMemmove5 36 8 -77.91% BenchmarkMemmove6 36 8 -77.76% BenchmarkMemmove7 36 8 -77.82% BenchmarkMemmove8 18 8 -56.33% BenchmarkMemmove9 18 7 -58.34% BenchmarkMemmove10 18 7 -58.34% BenchmarkMemmove11 18 7 -58.45% BenchmarkMemmove12 36 7 -78.51% BenchmarkMemmove13 36 7 -78.48% BenchmarkMemmove14 36 7 -78.56% BenchmarkMemmove15 36 7 -78.56% BenchmarkMemmove16 18 7 -58.24% BenchmarkMemmove32 18 8 -54.33% BenchmarkMemmove64 18 8 -53.37% BenchmarkMemmove128 20 9 -55.93% BenchmarkMemmove256 25 11 -55.16% BenchmarkMemmove512 33 33 -1.19% BenchmarkMemmove1024 43 44 +2.06% BenchmarkMemmove2048 61 61 +0.16% BenchmarkMemmove4096 95 95 +0.00% R=golang-dev, bradfitz, remyoudompheng, khr, iant, dominik.honnef CC=golang-dev https://golang.org/cl/9038048	2013-05-17 12:53:49 -07:00

Author

SHA1

Message

Date

Keith Randall

6c7cbf086c

runtime: get rid of most uses of REP for copying/zeroing.

REP MOVSQ and REP STOSQ have a really high startup overhead.
Use a Duff's device to do the repetition instead.

benchmark                 old ns/op     new ns/op     delta
BenchmarkClearFat32       7.20          1.60          -77.78%
BenchmarkCopyFat32        6.88          2.38          -65.41%
BenchmarkClearFat64       7.15          3.20          -55.24%
BenchmarkCopyFat64        6.88          3.44          -50.00%
BenchmarkClearFat128      9.53          5.34          -43.97%
BenchmarkCopyFat128       9.27          5.56          -40.02%
BenchmarkClearFat256      13.8          9.53          -30.94%
BenchmarkCopyFat256       13.5          10.3          -23.70%
BenchmarkClearFat512      22.3          18.0          -19.28%
BenchmarkCopyFat512       22.0          19.7          -10.45%
BenchmarkCopyFat1024      36.5          38.4          +5.21%
BenchmarkClearFat1024     35.1          35.0          -0.28%

TODO: use for stack frame zeroing
TODO: REP prefixes are still used for "reverse" copying when src/dst
regions overlap.  Might be worth fixing.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, r
https://golang.org/cl/81370046

2014-04-01 12:51:02 -07:00

Keith Randall

da7cf0ba5d

runtime: faster memclr on x86.

Use explicit SSE writes instead of REP STOSQ.

benchmark               old ns/op    new ns/op    delta
BenchmarkMemclr5               22            5  -73.62%
BenchmarkMemclr16              27            5  -78.49%
BenchmarkMemclr64              28            6  -76.43%
BenchmarkMemclr256             34            8  -74.94%
BenchmarkMemclr4096           112           84  -24.73%
BenchmarkMemclr65536         1902         1920   +0.95%

LGTM=dvyukov
R=golang-codereviews, dvyukov
CC=golang-codereviews
https://golang.org/cl/60090044

2014-02-06 17:43:22 -08:00

Keith Randall

6021449236

runtime: faster x86 memmove (a.k.a. built-in copy())

REP instructions have a high startup cost, so we handle small
sizes with some straightline code.  The REP MOVSx instructions
are really fast for large sizes.  The cutover is approximately
1K.  We implement up to 128/256 because that is the maximum
SSE register load (loading all data into registers before any
stores lets us ignore copy direction).

(on a Sandy Bridge E5-1650 @ 3.20GHz)
benchmark               old ns/op    new ns/op    delta
BenchmarkMemmove0               3            3   +0.86%
BenchmarkMemmove1               5            5   +5.40%
BenchmarkMemmove2              18            8  -56.84%
BenchmarkMemmove3              18            7  -58.45%
BenchmarkMemmove4              36            7  -78.63%
BenchmarkMemmove5              36            8  -77.91%
BenchmarkMemmove6              36            8  -77.76%
BenchmarkMemmove7              36            8  -77.82%
BenchmarkMemmove8              18            8  -56.33%
BenchmarkMemmove9              18            7  -58.34%
BenchmarkMemmove10             18            7  -58.34%
BenchmarkMemmove11             18            7  -58.45%
BenchmarkMemmove12             36            7  -78.51%
BenchmarkMemmove13             36            7  -78.48%
BenchmarkMemmove14             36            7  -78.56%
BenchmarkMemmove15             36            7  -78.56%
BenchmarkMemmove16             18            7  -58.24%
BenchmarkMemmove32             18            8  -54.33%
BenchmarkMemmove64             18            8  -53.37%
BenchmarkMemmove128            20            9  -55.93%
BenchmarkMemmove256            25           11  -55.16%
BenchmarkMemmove512            33           33   -1.19%
BenchmarkMemmove1024           43           44   +2.06%
BenchmarkMemmove2048           61           61   +0.16%
BenchmarkMemmove4096           95           95   +0.00%

R=golang-dev, bradfitz, remyoudompheng, khr, iant, dominik.honnef
CC=golang-dev
https://golang.org/cl/9038048

2013-05-17 12:53:49 -07:00

3 Commits