XOR key into data 128 bits at a time instead of 64 bits
and pipeline half of state loads. Rotate loop to allow
single-register indexing for state[i].
On a MacBookPro10,2 (Core i5):
benchmark old ns/op new ns/op delta
BenchmarkRC4_128 412 224 -45.63%
BenchmarkRC4_1K 3179 1613 -49.26%
BenchmarkRC4_8K 25223 12545 -50.26%
benchmark old MB/s new MB/s speedup
BenchmarkRC4_128 310.51 570.42 1.84x
BenchmarkRC4_1K 322.09 634.48 1.97x
BenchmarkRC4_8K 320.97 645.32 2.01x
For comparison, on the same machine, openssl 0.9.8r reports
its rc4 speed as somewhat under 350 MB/s for both 1K and 8K
(it is operating 64 bits at a time).
On an Intel Xeon E5520:
benchmark old ns/op new ns/op delta
BenchmarkRC4_128 418 259 -38.04%
BenchmarkRC4_1K 3200 1884 -41.12%
BenchmarkRC4_8K 25173 14529 -42.28%
benchmark old MB/s new MB/s speedup
BenchmarkRC4_128 306.04 492.48 1.61x
BenchmarkRC4_1K 319.93 543.26 1.70x
BenchmarkRC4_8K 321.61 557.20 1.73x
For comparison, on the same machine, openssl 1.0.1
reports its rc4 speed as 587 MB/s for 1K and 601 MB/s for 8K.
R=agl
CC=golang-dev
https://golang.org/cl/7865046
-- amd64 --
On a MacBookPro10,2 (Core i5):
benchmark old ns/op new ns/op delta
BenchmarkRC4_128 470 421 -10.43%
BenchmarkRC4_1K 3123 3275 +4.87%
BenchmarkRC4_8K 26351 25866 -1.84%
benchmark old MB/s new MB/s speedup
BenchmarkRC4_128 272.22 303.40 1.11x
BenchmarkRC4_1K 327.80 312.58 0.95x
BenchmarkRC4_8K 307.24 313.00 1.02x
For comparison, on the same machine, openssl 0.9.8r reports
its rc4 speed as somewhat under 350 MB/s for both 1K and 8K.
The Core i5 performance can be boosted another 20%, but only
by making the Xeon performance significantly slower.
On an Intel Xeon E5520:
benchmark old ns/op new ns/op delta
BenchmarkRC4_128 774 417 -46.12%
BenchmarkRC4_1K 6121 3200 -47.72%
BenchmarkRC4_8K 48394 25151 -48.03%
benchmark old MB/s new MB/s speedup
BenchmarkRC4_128 165.18 306.84 1.86x
BenchmarkRC4_1K 167.28 319.92 1.91x
BenchmarkRC4_8K 167.29 321.89 1.92x
For comparison, on the same machine, openssl 1.0.1
(which uses a different implementation than 0.9.8r)
reports its rc4 speed as 587 MB/s for 1K and 601 MB/s for 8K.
It is using SIMD instructions to do more in parallel.
So there's still some improvement to be had, but even so,
this is almost 2x faster than what it replaced.
-- 386 --
On a MacBookPro10,2 (Core i5):
benchmark old ns/op new ns/op delta
BenchmarkRC4_128 3491 421 -87.94%
BenchmarkRC4_1K 28063 3205 -88.58%
BenchmarkRC4_8K 220392 25228 -88.55%
benchmark old MB/s new MB/s speedup
BenchmarkRC4_128 36.66 303.81 8.29x
BenchmarkRC4_1K 36.49 319.42 8.75x
BenchmarkRC4_8K 36.73 320.90 8.74x
On an Intel Xeon E5520:
benchmark old ns/op new ns/op delta
BenchmarkRC4_128 2268 524 -76.90%
BenchmarkRC4_1K 18161 4137 -77.22%
BenchmarkRC4_8K 142396 32350 -77.28%
benchmark old MB/s new MB/s speedup
BenchmarkRC4_128 56.42 244.13 4.33x
BenchmarkRC4_1K 56.38 247.46 4.39x
BenchmarkRC4_8K 56.86 250.26 4.40x
R=agl
CC=golang-dev
https://golang.org/cl/7547050
Consequently, remove many package Makefiles,
and shorten the few that remain.
gomake becomes 'go tool make'.
Turn off test phases of run.bash that do not work,
flagged with $BROKEN. Future CLs will restore these,
but this seemed like a big enough CL already.
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/5601057
This is largely based on ality's CL 2747042.
crypto/rc4: API break in order to conform to crypto/cipher's
Stream interface
cipher/cipher: promote to the default build
Since CBC differs between TLS 1.0 and 1.1, we downgrade and
support only 1.0 at the current time. 1.0 is what most of the
world uses.
Given this CL, it would be trival to add support for AES 256,
SHA 256 etc, but I haven't in order to keep the change smaller.
R=rsc
CC=ality, golang-dev
https://golang.org/cl/3659041
parsing and printing to new syntax.
Use -oldparser to parse the old syntax,
use -oldprinter to print the old syntax.
2) Change default gofmt formatting settings
to use tabs for indentation only and to use
spaces for alignment. This will make the code
alignment insensitive to an editor's tabwidth.
Use -spaces=false to use tabs for alignment.
3) Manually changed src/exp/parser/parser_test.go
so that it doesn't try to parse the parser's
source files using the old syntax (they have
new syntax now).
4) gofmt -w src misc test/bench
1st set of files.
R=rsc
CC=agl, golang-dev, iant, ken2, r
https://golang.org/cl/180047
the bash scripts and makefiles for building go didn't take into account
the fact $GOROOT / $GOBIN could both be directories containing whitespaces,
and was not possible to build it in such a situation.
this commit adjusts the various makefiles/scripts to make it aware of that
possibility, and now it builds successfully when using a path with whitespaces
as well.
Fixes#115.
R=rsc, dsymonds1
https://golang.org/cl/157067