mirror of https://github.com/golang/go synced 2024-10-03 10:21:22 -06:00
go/src/pkg
Russ Cox d6b3f37e1e bytes: asm for bytes.IndexByte
PERFORMANCE DIFFERENCE

SUMMARY

Speed of the new assembly IndexByte relative to the portable Go version (IndexBytePortable):

                                                   amd64           386
2.2 GHz AMD Opteron 8214 HE (Linux)             3.0x faster    8.2x faster
3.60 GHz Intel Xeon (Linux)                     2.2x faster    6.3x faster
2.53 GHz Intel Core2 Duo E7200 (Linux)          1.5x faster    4.4x faster
2.66 GHz Intel Xeon 5150 (Mac Pro, OS X)        1.5x SLOWER    3.0x faster
2.33 GHz Intel Xeon E5435 (Linux)               1.5x SLOWER    3.0x faster
2.33 GHz Intel Core2 T7600 (MacBook Pro, OS X)  1.4x SLOWER    3.0x faster
1.83 GHz Intel Core2 T5600 (Mac Mini, OS X)        none*       3.0x faster

* but yesterday I consistently saw 1.4x SLOWER.
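
The factors in the summary table can be reproduced from the ns/op figures listed under DETAILS: divide the portable time by the assembly time. A minimal sketch in Go, assuming that is how the ratios were derived; `speedup` is a hypothetical helper name, not part of the bytes package.

```go
package main

import "fmt"

// speedup reports how many times faster the assembly version is than the
// portable one, given their ns/op figures. A value below 1 means the
// assembly version is slower. Hypothetical helper, for illustration only.
func speedup(portableNs, asmNs float64) float64 {
	return portableNs / asmNs
}

func main() {
	// Opteron 8214 HE, 4K buffer: IndexBytePortable4K vs IndexByte4K.
	fmt.Printf("Opteron amd64:   %.1fx\n", speedup(11161, 3733)) // 3.0x
	fmt.Printf("Opteron 386:     %.1fx\n", speedup(30670, 3734)) // 8.2x
	// Xeon 5150, amd64: the portable version wins here.
	fmt.Printf("Xeon 5150 amd64: %.2fx\n", speedup(6202, 9290)) // 0.67x
}
```

A result of 0.67x corresponds to the "1.5x SLOWER" entry, since 1/0.67 is about 1.5.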

DETAILS

Columns: benchmark, iterations, time per operation, throughput.

2.2 GHz AMD Opteron 8214 HE (Linux)

amd64 (3x faster)

IndexByte4K            500000           3733 ns/op     1097.24 MB/s
IndexByte4M               500        4328042 ns/op      969.10 MB/s
IndexByte64M               50       67866160 ns/op      988.84 MB/s

IndexBytePortable4K    200000          11161 ns/op      366.99 MB/s
IndexBytePortable4M       100       11795880 ns/op      355.57 MB/s
IndexBytePortable64M       10      188675000 ns/op      355.68 MB/s

386 (8.2x faster)

IndexByte4K            500000           3734 ns/op     1096.95 MB/s
IndexByte4M               500        4209954 ns/op      996.28 MB/s
IndexByte64M               50       68031980 ns/op      986.43 MB/s

IndexBytePortable4K     50000          30670 ns/op      133.55 MB/s
IndexBytePortable4M        50       31868220 ns/op      131.61 MB/s
IndexBytePortable64M        2      508851500 ns/op      131.88 MB/s
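
The MB/s column appears to be the buffer size divided by the time per operation, with 1 MB taken as 10^6 bytes. That reading is inferred from the numbers above, which it reproduces. A short sketch, where `mbPerSec` is a hypothetical helper:

```go
package main

import "fmt"

// mbPerSec converts bytes processed per operation and ns/op into MB/s,
// assuming 1 MB = 1e6 bytes. Hypothetical helper, for illustration only.
func mbPerSec(bytesPerOp, nsPerOp int64) float64 {
	// bytes/ns * 1e9 gives bytes/s; dividing by 1e6 leaves a factor of 1e3.
	return float64(bytesPerOp) * 1e3 / float64(nsPerOp)
}

func main() {
	// Opteron amd64 rows: IndexByte4K and IndexByte4M.
	fmt.Printf("%.2f MB/s\n", mbPerSec(4<<10, 3733))    // 1097.24 MB/s
	fmt.Printf("%.2f MB/s\n", mbPerSec(4<<20, 4328042)) // 969.10 MB/s
}
```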

3.60 GHz Intel Xeon (Linux)

amd64 (2.2x faster)

IndexByte4K            500000           4612 ns/op      888.12 MB/s
IndexByte4M               500        4835250 ns/op      867.44 MB/s
IndexByte64M               20       77388450 ns/op      867.17 MB/s

IndexBytePortable4K    200000          10306 ns/op      397.44 MB/s
IndexBytePortable4M       100       11201460 ns/op      374.44 MB/s
IndexBytePortable64M       10      179456800 ns/op      373.96 MB/s

386 (6.3x faster)

IndexByte4K            500000           4631 ns/op      884.47 MB/s
IndexByte4M               500        4846388 ns/op      865.45 MB/s
IndexByte64M               20       78691200 ns/op      852.81 MB/s

IndexBytePortable4K    100000          28989 ns/op      141.29 MB/s
IndexBytePortable4M        50       31183180 ns/op      134.51 MB/s
IndexBytePortable64M        5      498347200 ns/op      134.66 MB/s

2.53 GHz Intel Core2 Duo E7200 (Linux)

amd64 (1.5x faster)

IndexByte4K            500000           6502 ns/op      629.96 MB/s
IndexByte4M               500        6692208 ns/op      626.74 MB/s
IndexByte64M               10      107410400 ns/op      624.79 MB/s

IndexBytePortable4K    200000           9721 ns/op      421.36 MB/s
IndexBytePortable4M       100       10013680 ns/op      418.86 MB/s
IndexBytePortable64M       10      160460800 ns/op      418.23 MB/s

386 (4.4x faster)

IndexByte4K            500000           6505 ns/op      629.67 MB/s
IndexByte4M               500        6694078 ns/op      626.57 MB/s
IndexByte64M               10      107397600 ns/op      624.86 MB/s

IndexBytePortable4K    100000          28835 ns/op      142.05 MB/s
IndexBytePortable4M        50       29562680 ns/op      141.88 MB/s
IndexBytePortable64M        5      473221400 ns/op      141.81 MB/s

2.66 GHz Intel Xeon 5150 (Mac Pro, OS X)

amd64 (1.5x SLOWER)

IndexByte4K            200000           9290 ns/op      440.90 MB/s
IndexByte4M               200        9568925 ns/op      438.33 MB/s
IndexByte64M               10      154473600 ns/op      434.44 MB/s

IndexBytePortable4K    500000           6202 ns/op      660.43 MB/s
IndexBytePortable4M       500        6583614 ns/op      637.08 MB/s
IndexBytePortable64M       20      107166250 ns/op      626.21 MB/s

386 (3x faster)

IndexByte4K            200000           9301 ns/op      440.38 MB/s
IndexByte4M               200        9568025 ns/op      438.37 MB/s
IndexByte64M               10      154391000 ns/op      434.67 MB/s

IndexBytePortable4K    100000          27526 ns/op      148.80 MB/s
IndexBytePortable4M       100       28302490 ns/op      148.20 MB/s
IndexBytePortable64M        5      454170200 ns/op      147.76 MB/s

2.33 GHz Intel Xeon E5435 (Linux)

amd64 (1.5x SLOWER)

IndexByte4K            200000          10601 ns/op      386.38 MB/s
IndexByte4M               100       10827240 ns/op      387.38 MB/s
IndexByte64M               10      173175500 ns/op      387.52 MB/s

IndexBytePortable4K    500000           7082 ns/op      578.37 MB/s
IndexBytePortable4M       500        7391792 ns/op      567.43 MB/s
IndexBytePortable64M       20      122618550 ns/op      547.30 MB/s

386 (3x faster)

IndexByte4K            200000          11074 ns/op      369.88 MB/s
IndexByte4M               100       10902620 ns/op      384.71 MB/s
IndexByte64M               10      181292800 ns/op      370.17 MB/s

IndexBytePortable4K     50000          31725 ns/op      129.11 MB/s
IndexBytePortable4M        50       32564880 ns/op      128.80 MB/s
IndexBytePortable64M        2      545926000 ns/op      122.93 MB/s

2.33 GHz Intel Core2 T7600 (MacBook Pro, OS X)

amd64 (1.4x SLOWER)

IndexByte4K            200000          11120 ns/op      368.35 MB/s
IndexByte4M               100       11531950 ns/op      363.71 MB/s
IndexByte64M               10      184819000 ns/op      363.11 MB/s

IndexBytePortable4K    500000           7419 ns/op      552.10 MB/s
IndexBytePortable4M       200        8018710 ns/op      523.06 MB/s
IndexBytePortable64M       10      127614900 ns/op      525.87 MB/s

386 (3x faster)

IndexByte4K            200000          11114 ns/op      368.54 MB/s
IndexByte4M               100       11443530 ns/op      366.52 MB/s
IndexByte64M               10      185212000 ns/op      362.34 MB/s

IndexBytePortable4K     50000          32891 ns/op      124.53 MB/s
IndexBytePortable4M        50       33930580 ns/op      123.61 MB/s
IndexBytePortable64M        2      545400500 ns/op      123.05 MB/s

1.83 GHz Intel Core2 T5600 (Mac Mini, OS X)

amd64 (no difference)

IndexByte4K            200000          13497 ns/op      303.47 MB/s
IndexByte4M               100       13890650 ns/op      301.95 MB/s
IndexByte64M                5      222358000 ns/op      301.81 MB/s

IndexBytePortable4K    200000          13584 ns/op      301.53 MB/s
IndexBytePortable4M       100       13913280 ns/op      301.46 MB/s
IndexBytePortable64M       10      222572600 ns/op      301.51 MB/s

386 (3x faster)

IndexByte4K            200000          13565 ns/op      301.95 MB/s
IndexByte4M               100       13882640 ns/op      302.13 MB/s
IndexByte64M                5      221411600 ns/op      303.10 MB/s

IndexBytePortable4K     50000          39978 ns/op      102.46 MB/s
IndexBytePortable4M        50       41038160 ns/op      102.20 MB/s
IndexBytePortable64M        2      656362500 ns/op      102.24 MB/s

R=r
CC=golang-dev
https://golang.org/cl/166055
2009-12-04 10:23:43 -08:00
archive/tar move ReadFile, WriteFile, and ReadDir into a separate io/ioutil package. 2009-12-02 22:02:14 -08:00
asn1 go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
big go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
bignum go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
bufio go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
bytes bytes: asm for bytes.IndexByte 2009-12-04 10:23:43 -08:00
compress move ReadFile, WriteFile, and ReadDir into a separate io/ioutil package. 2009-12-02 22:02:14 -08:00
container Replace sort.Sort call with heapify algorithm in Init. 2009-11-24 17:20:13 -08:00
crypto crypto/rsa: fix shadowing error. 2009-12-03 19:33:23 -08:00
debug move ReadFile, WriteFile, and ReadDir into a separate io/ioutil package. 2009-12-02 22:02:14 -08:00
ebnf move ReadFile, WriteFile, and ReadDir into a separate io/ioutil package. 2009-12-02 22:02:14 -08:00
encoding move ReadFile, WriteFile, and ReadDir into a separate io/ioutil package. 2009-12-02 22:02:14 -08:00
exec move ReadFile, WriteFile, and ReadDir into a separate io/ioutil package. 2009-12-02 22:02:14 -08:00
exp make Native Client support build again, 2009-12-04 10:11:32 -08:00
expvar json: Decode into native Go data structures 2009-11-30 13:55:09 -08:00
flag go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
fmt minor improvement to formatting: don't allocate padding strings every time. 2009-12-03 00:04:40 -08:00
go - include type-associated consts and vars when filtering a PackageDoc 2009-12-03 11:25:20 -08:00
gob The String() method requires global state that makes it not work outside of this package, 2009-12-03 17:14:32 -08:00
hash Add benchmarks for commonly used routines. 2009-11-24 00:21:50 -08:00
http move ReadFile, WriteFile, and ReadDir into a separate io/ioutil package. 2009-12-02 22:02:14 -08:00
image go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
io Add ReadFrom and WriteTo methods to bytes.Buffer, to enable i/o without buffer allocation. 2009-12-03 12:56:16 -08:00
json apply gofmt to json files 2009-12-02 11:40:54 -08:00
log go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
malloc runtime: malloc fixes 2009-12-03 17:22:23 -08:00
math test case for large angles in trig functions 2009-11-24 15:42:46 -08:00
net net: turn off empty packet test by default 2009-12-03 22:19:55 -08:00
once go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
os move ReadFile, WriteFile, and ReadDir into a separate io/ioutil package. 2009-12-02 22:02:14 -08:00
patch go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
path move ReadFile, WriteFile, and ReadDir into a separate io/ioutil package. 2009-12-02 22:02:14 -08:00
rand go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
reflect go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
regexp Change to container/vector interface: 2009-11-24 13:43:18 -08:00
rpc fix segfault printing errors. add test case and improve messages. 2009-12-02 10:41:28 -08:00
runtime runtime: fix Caller crash on 386. 2009-12-03 17:24:14 -08:00
sort Add benchmarks for commonly used routines. 2009-11-24 00:21:50 -08:00
strconv Add benchmarks for commonly used routines. 2009-11-24 00:21:50 -08:00
strings Runes: turn string into []int 2009-12-02 20:47:38 -08:00
sync sync.RWMutex: rewritten to add support for concurrent readers. 2009-11-30 12:10:56 -08:00
syscall make Native Client support build again, 2009-12-04 10:11:32 -08:00
tabwriter Add flag -tabindent to gofmt: forces use of 2009-12-02 16:57:15 -08:00
template template: two bug fixes / nits 2009-11-30 10:29:14 -08:00
testing testing: compute MB/s in benchmarks 2009-12-04 09:56:31 -08:00
time move ReadFile, WriteFile, and ReadDir into a separate io/ioutil package. 2009-12-02 22:02:14 -08:00
unicode update package unicode to Unicode 5.2 2009-12-01 16:22:21 -08:00
unsafe unsafe: documentation typo. 2009-11-16 15:39:04 -08:00
utf8 a few utf8 benchmarks. on my mac: 2009-11-25 13:30:30 -08:00
websocket Explicitly return values where it's shadowing the parameter. 2009-12-01 15:54:49 -08:00
xgb A first stab at porting the XCB X11 protocol bindings to go. 2009-11-30 14:25:50 -08:00
xml go: makes it build for the case $GOROOT has whitespaces 2009-11-23 17:32:51 -08:00
deps.bash Build changes to support work on the BSDs. 2009-11-14 15:29:09 -08:00
Makefile make Native Client support build again, 2009-12-04 10:11:32 -08:00