1
0
mirror of https://github.com/golang/go synced 2024-10-02 10:18:33 -06:00
Commit Graph

6 Commits

Author SHA1 Message Date
Michael Hudson-Doyle
45c06b27a4 cmd/internal/obj, runtime: add NOFRAME flag to suppress stack frame set up on ppc64x
Replace the confusing game where a frame size of $-8 would suppress the
implicit setting up of a stack frame with a nice explicit flag.

The code to set up the function prologue is still a little confusing but better
than it was.

Change-Id: I1d49278ff42c6bc734ebfb079998b32bc53f8d9a
Reviewed-on: https://go-review.googlesource.com/15670
Reviewed-by: Minux Ma <minux@golang.org>
2015-10-18 22:13:30 +00:00
Ilya Tocar
5cf281a9b7 runtime: optimize duffcopy on amd64
Use movups to copy 16 bytes at a time.
Results (haswell):

name            old time/op  new time/op  delta
CopyFat8-48     0.62ns ± 3%  0.63ns ± 3%     ~     (p=0.535 n=20+20)
CopyFat12-48    0.92ns ± 2%  0.93ns ± 3%     ~     (p=0.594 n=17+18)
CopyFat16-48    1.23ns ± 2%  1.23ns ± 2%     ~     (p=0.839 n=20+19)
CopyFat24-48    1.85ns ± 2%  1.84ns ± 0%   -0.48%  (p=0.014 n=19+20)
CopyFat32-48    2.45ns ± 0%  2.45ns ± 1%     ~     (p=1.000 n=16+16)
CopyFat64-48    3.30ns ± 2%  2.14ns ± 1%  -35.00%  (p=0.000 n=20+18)
CopyFat128-48   6.05ns ± 0%  3.98ns ± 0%  -34.22%  (p=0.000 n=18+17)
CopyFat256-48   11.9ns ± 3%   7.7ns ± 0%  -35.87%  (p=0.000 n=20+17)
CopyFat512-48   23.0ns ± 2%  15.1ns ± 2%  -34.52%  (p=0.000 n=20+18)
CopyFat1024-48  44.8ns ± 1%  29.8ns ± 2%  -33.48%  (p=0.000 n=17+19)

Change-Id: I8a78773c656d400726a020894461e00c59f896bf
Reviewed-on: https://go-review.googlesource.com/14836
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-22 15:02:37 +00:00
Ilya Tocar
2421c6e3df runtime: optimize duffzero for amd64.
Use MOVUPS to zero 16 bytes at a time.

results (haswell):

name             old time/op  new time/op  delta
ClearFat8-48     0.62ns ± 2%  0.62ns ± 1%     ~     (p=0.085 n=20+15)
ClearFat12-48    0.93ns ± 2%  0.93ns ± 2%     ~     (p=0.757 n=19+19)
ClearFat16-48    1.23ns ± 1%  1.23ns ± 1%     ~     (p=0.896 n=19+17)
ClearFat24-48    1.85ns ± 2%  1.84ns ± 0%   -0.51%  (p=0.023 n=20+15)
ClearFat32-48    2.45ns ± 0%  2.46ns ± 2%     ~     (p=0.053 n=17+18)
ClearFat40-48    1.99ns ± 0%  0.92ns ± 2%  -53.54%  (p=0.000 n=19+20)
ClearFat48-48    2.15ns ± 1%  0.92ns ± 2%  -56.93%  (p=0.000 n=19+20)
ClearFat56-48    2.46ns ± 1%  1.23ns ± 0%  -49.98%  (p=0.000 n=19+14)
ClearFat64-48    2.76ns ± 0%  2.14ns ± 1%  -22.21%  (p=0.000 n=17+17)
ClearFat128-48   5.21ns ± 0%  3.99ns ± 0%  -23.46%  (p=0.000 n=17+19)
ClearFat256-48   10.3ns ± 4%   7.7ns ± 0%  -25.37%  (p=0.000 n=20+17)
ClearFat512-48   20.2ns ± 4%  15.0ns ± 1%  -25.58%  (p=0.000 n=20+17)
ClearFat1024-48  39.7ns ± 2%  29.7ns ± 0%  -25.05%  (p=0.000 n=19+19)

Change-Id: I200401eec971b2dd2450c0651c51e378bd982405
Reviewed-on: https://go-review.googlesource.com/14408
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-09-16 16:07:44 +00:00
Shenghou Ma
e7dd28891e cmd/internal/gc, cmd/[56789]g: rename stackcopy to blockcopy
To avoid confusion with the runtime concept of copying stack.

Change-Id: I33442377b71012c2482c2d0ddd561492c71e70d0
Reviewed-on: https://go-review.googlesource.com/8639
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-29 00:28:01 +00:00
Josh Bleecher Snyder
7e0c11c32f cmd/6g, runtime: improve duffzero throughput
It is faster to execute

	MOVQ AX,(DI)
	MOVQ AX,8(DI)
	MOVQ AX,16(DI)
	MOVQ AX,24(DI)
	ADDQ $32,DI

than

	STOSQ
	STOSQ
	STOSQ
	STOSQ

However, in order to be able to jump into
the middle of a block of MOVQs, the call
site needs to pre-adjust DI.

If we're clearing a small area, the cost
of that DI pre-adjustment isn't repaid.

This CL switches the DUFFZERO implementation
to use a hybrid strategy, in which small
clears use STOSQ as before, but large clears
use mostly MOVQ/ADDQ blocks.

benchmark                 old ns/op     new ns/op     delta
BenchmarkClearFat8        0.55          0.55          +0.00%
BenchmarkClearFat12       0.82          0.83          +1.22%
BenchmarkClearFat16       0.55          0.55          +0.00%
BenchmarkClearFat24       0.82          0.82          +0.00%
BenchmarkClearFat32       2.20          1.94          -11.82%
BenchmarkClearFat40       1.92          1.66          -13.54%
BenchmarkClearFat48       2.21          1.93          -12.67%
BenchmarkClearFat56       3.03          2.20          -27.39%
BenchmarkClearFat64       3.26          2.48          -23.93%
BenchmarkClearFat72       3.57          2.76          -22.69%
BenchmarkClearFat80       3.83          3.05          -20.37%
BenchmarkClearFat88       4.14          3.30          -20.29%
BenchmarkClearFat128      5.54          4.69          -15.34%
BenchmarkClearFat256      9.95          9.09          -8.64%
BenchmarkClearFat512      18.7          17.9          -4.28%
BenchmarkClearFat1024     36.2          35.4          -2.21%

Change-Id: Ic786406d9b3cab68d5a231688f9e66fcd1bd7103
Reviewed-on: https://go-review.googlesource.com/2585
Reviewed-by: Keith Randall <khr@golang.org>
2015-04-15 19:17:07 +00:00
Josh Bleecher Snyder
ad3600945a runtime: auto-generate duff routines
This makes it easier to experiment with alternative implementations.

While we're here, update the comments.

No functional changes. Passes toolstash -cmp.

Change-Id: I428535754908f0fdd7cc36c214ddb6e1e60f376e
Reviewed-on: https://go-review.googlesource.com/8310
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-02 02:37:59 +00:00