Drop expecttaken function in favor of extra argument
to gbranch and bgen. Mark loop condition as likely to
be true, so that loops are generated inline.
The main benefit here is contiguous code when trying
to read the generated assembly. It has only minor effects
on the timing, and they mostly cancel the minor effects
that aligning function entry points had. One exception:
both changes made Fannkuch faster.
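As a purely illustrative sketch (the real change is in the compiler's
C code; the Go names below are invented, not gc's API), the shape of
the change is that the likelihood hint travels with the call that
emits the branch instead of being attached afterwards:

    package main

    import "fmt"

    // Branch hints, mirroring the idea of "likely taken" vs "no hint".
    const (
        hintNone   = 0
        hintLikely = +1
    )

    // genBranch stands in for gbranch/bgen: it "emits" a conditional
    // branch and records how likely it is to be taken, so the code
    // generator can lay out the likely successor as the fall-through
    // path and keep loop bodies contiguous.
    func genBranch(cond string, hint int) {
        fmt.Printf("branch on %q (hint %+d)\n", cond, hint)
    }

    func main() {
        // A loop's condition is marked likely true, so the body
        // follows the test inline in the generated code.
        genBranch("i < n", hintLikely)
    }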
Compared to before CL 6244066 (before aligned functions):
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 4222117400 4201958800 -0.48%
BenchmarkFannkuch11 3462631800 3215908600 -7.13%
BenchmarkGobDecode 20887622 20899164 +0.06%
BenchmarkGobEncode 9548772 9439083 -1.15%
BenchmarkGzip 151687 152060 +0.25%
BenchmarkGunzip 8742 8711 -0.35%
BenchmarkJSONEncode 62730560 62686700 -0.07%
BenchmarkJSONDecode 252569180 252368960 -0.08%
BenchmarkMandelbrot200 5267599 5252531 -0.29%
BenchmarkRevcomp25M 980813500 985248400 +0.45%
BenchmarkTemplate 361259100 357414680 -1.06%
Compared to tip (aligned functions):
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 4140739800 4201958800 +1.48%
BenchmarkFannkuch11 3259914400 3215908600 -1.35%
BenchmarkGobDecode 20620222 20899164 +1.35%
BenchmarkGobEncode 9384886 9439083 +0.58%
BenchmarkGzip 150333 152060 +1.15%
BenchmarkGunzip 8741 8711 -0.34%
BenchmarkJSONEncode 65210990 62686700 -3.87%
BenchmarkJSONDecode 249394860 252368960 +1.19%
BenchmarkMandelbrot200 5273394 5252531 -0.40%
BenchmarkRevcomp25M 996013800 985248400 -1.08%
BenchmarkTemplate 360620840 357414680 -0.89%
R=ken2
CC=golang-dev
https://golang.org/cl/6245069
On 6l and 8l, this is a real instruction, guaranteed to
cause an 'undefined instruction' exception.
On 5l, we simulate it as BL to address 0.
The plan is to use it as a signal to the linker that this
point in the instruction stream cannot be reached
(hence the changes to nofollow). This will help the
compiler explain that panicindex and friends do not
return without having to put a list of these functions
in the linker.
R=ken2
CC=golang-dev
https://golang.org/cl/6255064
16 seems pretty standard on x86 for function entry.
I don't know if ARM would benefit, so I used just 4
(single instruction alignment).
This has a minor absolute effect on the current timings.
The main hope is that it will make them more consistent from
run to run.
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 4222117400 4140739800 -1.93%
BenchmarkFannkuch11 3462631800 3259914400 -5.85%
BenchmarkGobDecode 20887622 20620222 -1.28%
BenchmarkGobEncode 9548772 9384886 -1.72%
BenchmarkGzip 151687 150333 -0.89%
BenchmarkGunzip 8742 8741 -0.01%
BenchmarkJSONEncode 62730560 65210990 +3.95%
BenchmarkJSONDecode 252569180 249394860 -1.26%
BenchmarkMandelbrot200 5267599 5273394 +0.11%
BenchmarkRevcomp25M 980813500 996013800 +1.55%
BenchmarkTemplate 361259100 360620840 -0.18%
R=ken2
CC=golang-dev
https://golang.org/cl/6244066
The code was inconsistent about when it used
brchain(x) and when it used x directly, with the result
that you could end up emitting code for brchain(x) but
leave the jump pointing at an unemitted x.
R=ken2
CC=golang-dev
https://golang.org/cl/6250077
This bug was introduced in the following revision:
changeset: 11404:26dceba5c610
user: Ivan Krasin <krasin@golang.org>
date: Mon Jan 23 09:19:39 2012 -0500
summary: compress/flate: reduce memory pressure at cost of additional arithmetic operation.
This is the review page for that CL: https://golang.org/cl/5555070/
R=rsc, imkrasin
CC=golang-dev
https://golang.org/cl/6249067
Most significant in mandelbrot, from avoiding MOVSD between registers,
but there are others.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/6258063
Surprise! The C code is using floating point values for its counters.
It's off the critical path, but the Go code and C code are supposed to
be as similar as possible to make comparisons meaningful.
It doesn't have a significant effect.
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/6260058
This uses the patch output of gofmt (-d option) and applies each
chunk to the buffer, instead of replacing the whole buffer. The
main advantage is that the undo history is kept across gofmt'ings,
so it can really be used as a before-save-hook.
R=sameer, sameer
CC=golang-dev
https://golang.org/cl/6198047
The correct procid is needed for unparking LWPs on NetBSD - always
initialise procid in minit() so that cgo works correctly. The non-cgo
case already works correctly since procid is initialised via
lwp_create().
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/6257071
On NetBSD a cgo enabled binary has more than 32 sections - bump NSECTS
so that we can actually link them successfully.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/6261052
filterPaeth takes []byte arguments instead of byte arguments,
which avoids some redundant computation of the previous pixel
in the inner loop.
Also eliminate a bounds check in decoding the up filter.
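A rough sketch of the slice-based approach (illustrative only, not
necessarily the exact image/png code): because the left (a) and
upper-left (c) values are carried in locals across the inner loop,
the previous pixel's bytes are never recomputed.

    // abs is a helper for the Paeth predictor.
    func abs(x int) int {
        if x < 0 {
            return -x
        }
        return x
    }

    // filterPaeth un-applies the PNG Paeth filter in place. cdat is
    // the current row's filtered bytes, pdat is the previous, already
    // reconstructed row, and bytesPerPixel is the pixel stride.
    func filterPaeth(cdat, pdat []byte, bytesPerPixel int) {
        var a, c int
        for i := 0; i < bytesPerPixel; i++ {
            a, c = 0, 0
            for j := i; j < len(cdat); j += bytesPerPixel {
                b := int(pdat[j])
                p := a + b - c
                pa, pb, pc := abs(p-a), abs(p-b), abs(p-c)
                // Pick the predictor: left (a), up (b), or upper-left (c).
                if pa <= pb && pa <= pc {
                    // keep a
                } else if pb <= pc {
                    a = b
                } else {
                    a = c
                }
                a = (a + int(cdat[j])) & 0xff
                cdat[j] = byte(a)
                c = b
            }
        }
    }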
benchmark old ns/op new ns/op delta
BenchmarkDecodeGray 3139636 2812531 -10.42%
BenchmarkDecodeNRGBAGradient 12341520 10971680 -11.10%
BenchmarkDecodeNRGBAOpaque 10740780 9612455 -10.51%
BenchmarkDecodePaletted 1819535 1818913 -0.03%
BenchmarkDecodeRGB 8974695 8178070 -8.88%
R=rsc
CC=golang-dev
https://golang.org/cl/6243061
A block with finalizer might also be profiled. The special bit
is needed to unregister the block from the profile. It will be
unset only when the block is freed.
Fixes #3668.
R=golang-dev, rsc
CC=golang-dev, remy
https://golang.org/cl/6249066
The check for Stringer etc. can only fire if the argument is not a builtin type,
so avoid the expensive check when we know there's no chance.
Also put in a fast path for pad, which saves a more modest amount.
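A minimal sketch of the idea, with hypothetical names rather than
fmt's internals: the cheap type switch over the builtin kinds runs
first, and the interface checks are paid only when it falls through.

    package main

    import (
        "fmt"
        "strconv"
    )

    // formatArg handles builtin types on the fast path; only a
    // non-builtin argument reaches the Stringer interface check.
    func formatArg(arg interface{}) string {
        switch v := arg.(type) {
        case int:
            return strconv.Itoa(v)
        case string:
            return v
        case bool:
            return strconv.FormatBool(v)
        }
        // Slow path: the expensive check runs only when there is a
        // chance it can succeed.
        if s, ok := arg.(fmt.Stringer); ok {
            return s.String()
        }
        return fmt.Sprint(arg)
    }

    func main() {
        fmt.Println(formatArg(42), formatArg("hi"))
    }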
benchmark old ns/op new ns/op delta
BenchmarkSprintfEmpty 148 152 +2.70%
BenchmarkSprintfString 585 497 -15.04%
BenchmarkSprintfInt 441 396 -10.20%
BenchmarkSprintfIntInt 718 603 -16.02%
BenchmarkSprintfPrefixedInt 676 621 -8.14%
BenchmarkSprintfFloat 1003 953 -4.99%
BenchmarkManyArgs 2945 2312 -21.49%
BenchmarkScanInts 1704152 1734441 +1.78%
BenchmarkScanRecursiveInt 1837397 1828920 -0.46%
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/6245068
Two fixes for indentation problems:
1. Properly recognize multi-line strings. These start with `, not ".
2. Don't indent a line if the beginning of the line is the end of a
   multi-line string. This happened, for instance, when inserting a
   closing bracket after a multi-line string.
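For reference, the construct in question is Go's raw string literal,
delimited by backquotes; a short example of what the mode must
recognize:

    package main

    // A raw string literal may span multiple lines; its contents must
    // not drive indentation, and a line that merely closes the literal
    // (or a bracket typed right after it) keeps the surrounding
    // indentation.
    var usage = `
    usage: prog [flags]
      -v  verbose output
    `

    func main() {
        println(usage)
    }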
R=sameer
CC=golang-dev
https://golang.org/cl/6157044
This prevents clients from seeing RSTs and missing the response
body.
TCP stacks vary. The included test failed on Darwin before but
passed on Linux.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/6256066
It was only being used for (*Stmt).Exec, not Query, and not for
the same two methods on *DB.
This unifies (*Stmt).Exec's old inline code into the old
subsetArgs function, renaming it in the process (changing the
old word "subset" to "driver", mostly converted earlier)
Fixes#3640
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/6258045
It's sad to introduce a new macro, but rnd shows up consistently
in profiles, and the function call overwhelms the two arithmetic
instructions it performs.
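The arithmetic itself is just an add and a mask; a hedged Go
equivalent (the real rnd is a C macro in the runtime, and this sketch
assumes the alignment is a power of two):

    // roundUp rounds n up to a multiple of align, which must be a
    // power of two: one add, one and-not.
    func roundUp(n, align uintptr) uintptr {
        return (n + align - 1) &^ (align - 1)
    }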
R=r
CC=golang-dev
https://golang.org/cl/6260051
Moving panic out of line speeds up fannkuch by almost a factor of two.
Changes to bitwhacking code affect mandelbrot badly.
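A hedged illustration of what "out of line" means here (not the
compiler's generated code; panicIndex is an invented helper): the cold
panic call sits in its own function, so the hot path through the
bounds check is just a compare, a branch, and the load.

    package main

    // get returns s[i] with the panic kept out of line.
    func get(s []int, i int) int {
        if uint(i) >= uint(len(s)) {
            panicIndex() // cold path, never returns
        }
        return s[i]
    }

    // panicIndex is a hypothetical out-of-line helper.
    func panicIndex() {
        panic("index out of range")
    }

    func main() {
        println(get([]int{1, 2, 3}, 2))
    }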
R=golang-dev, bradfitz, rsc, r
CC=golang-dev
https://golang.org/cl/6258056