The malloc sample trigger was not being set in a
new m, so the first allocation in each new m - the
goroutine structure - was being sampled with
probability 1 instead of probability sizeof(G)/rate,
an oversampling of about 5000x for the default
rate of 1 MB. This bug made pprof graphs show
far more G allocations than there actually were.
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/5224041
Fixes#2337.
Unfortunate sequence of events is:
1. maxcpu=2, mcpu=1, grunning=1
2. starttheworld creates an extra M:
maxcpu=2, mcpu=2, grunning=1
4. the goroutine calls runtime.GOMAXPROCS(1)
maxcpu=1, mcpu=2, grunning=1
5. since it sees mcpu>maxcpu, it calls gosched()
6. schedule() deschedules the goroutine:
maxcpu=1, mcpu=1, grunning=0
7. schedule() call getnextandunlock() which
fails to pick up the goroutine again,
because canaddcpu() fails, because mcpu==maxcpu
8. then it sees that grunning==0,
reports deadlock and terminates
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/5191044
therefore unlikely that there is a good use for its string version
LastBoundaryInString. Yet, the implemenation of this method would complicate
things a bit as it would require the introduction for another interface and
some duplication of code. Removing it seems a better choice.
R=r
CC=golang-dev
https://golang.org/cl/5182044
Major changes between hybi-08 and hybi-13
- hybi-08 uses Sec-WebSocket-Origin, but hybi-13 uses Origin
- hybi-13 introduces new close status codes.
hybi-17 spec (editorial changes of hybi-13) mentions
- if a server doesn't support the requested version, it MUST respond
with Sec-WebSocket-Version headers containing all available versions.
- client MUST close the connection upon receiving a masked frame
- server MUST close the connection upon receiving a non-masked frame
note that hybi-17 still uses "Sec-WebSocket-Version: 13"
see http://code.google.com/p/pywebsocket/wiki/WebSocketProtocolSpec
for changes between spec drafts.
R=golang-dev, adg
CC=golang-dev
https://golang.org/cl/5147043
This can work only if there is no type info required to initialize the decoder,
but it's easy and gains a few percent in the basic benchmarks by avoiding
bufio when it's a bytes.Buffer - a testing-only scenario, I admit.
Add a comment about what Decode expects from the input.
R=rsc
CC=golang-dev
https://golang.org/cl/5165048
When ncpu < 2, work.nproc is always 1 which results in infinite helper
threads being created if gomaxprocs > 1 and MaxGcproc > 1. Avoid this
by using the same limits as imposed helpgc().
R=golang-dev, rsc, dvyukov
CC=golang-dev
https://golang.org/cl/5176044
This change adds the osyield and usleep
functions and code to read the number of
processors from /dev/sysstat.
I also changed SysAlloc to return nil
when brk fails (it was returning -1).
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/5177049
This requires making the .dynamic section writable, as the
dynamic linker will change the value of the DT_DEBUG tag at
runtime. The DT_DEBUG tag is used by gdb to find all loaded
shared libraries.
R=rsc
CC=golang-dev
https://golang.org/cl/5189044
The loop recognizer uses the standard dominance
frontiers but gets confused by dead code, which
has a (not explicitly set) rpo number of 0, meaning it
looks like the head of the function, so it dominates
everything. If the loop recognizer encounters dead
code while tracking backward through the graph
it fails to recognize where it started as a loop, and
then the optimizer does not registerize values loaded
inside that loop. Fix by checking rpo against rpo2r.
Separately, run a quick pass over the generated
code to squash JMPs to JMP instructions, which
are convenient to emit during code generation but
difficult to read when debugging the -S output.
A side effect of this pass is to eliminate dead code,
so the output files may be slightly smaller and the
optimizer may have less work to do.
There is no semantic effect, because the linkers
flatten JMP chains and delete dead instructions
when laying out the final code. Doing it here too
just makes the -S output easier to read and more
like what the final binary will contain.
The "dead code breaks loop finding" bug is thus
fixed twice over. It seemed prudent to fix loopit
separately just in case dead code ever sneaks back
in for one reason or another.
R=ken2
CC=golang-dev
https://golang.org/cl/5190043
The spin-off renames some types. The new names are simply better:
image.Color -> color.Color
image.ColorModel -> color.Model
image.ColorModelFunc -> color.ModelFunc
image.PalettedColorModel -> color.Palette
image.RGBAColor -> color.RGBA
image.RGBAColorModel -> color.RGBAModel
image.RGBA64Color -> color.RGBA64
image.RGBA64ColorModel -> color.RGBA64Model
(similarly for NRGBAColor, GrayColorModel, etc)
The image.ColorImage type stays in the image package, but is renamed:
image.ColorImage -> image.Uniform
The image.Image implementations (image.RGBA, image.RGBA64, image.NRGBA,
image.Alpha, etc) do not change their name, and gain a nice symmetry:
an image.RGBA is an image of color.RGBA, etc.
The image.Black, image.Opaque uniform images remain unchanged (although
their type is renamed from image.ColorImage to image.Uniform). The
corresponding color types (color.Black, color.Opaque, etc) are new.
Nothing in the image/ycbcr is renamed yet. The ycbcr.YCbCrColor and
ycbcr.YCbCrImage types will eventually migrate to color.YCbCr and
image.YCbCr, but that will be a separate CL.
R=r, bsiegert
CC=golang-dev
https://golang.org/cl/5132048
Fix for new regexp library ($ isn't end of line any more).
Don't assume . is in PATH.
R=golang-dev, dsymonds
CC=golang-dev
https://golang.org/cl/5175052
This implements a replacer for when all old strings are single
bytes, but new values are not.
BenchmarkHTMLEscapeNew 1000000 1090 ns/op
BenchmarkHTMLEscapeOld 1000000 2049 ns/op
R=rsc
CC=golang-dev
https://golang.org/cl/5176043
My previous CL:
changeset: 9645:ce2e5f44b310
user: Russ Cox <rsc@golang.org>
date: Tue Sep 06 10:24:21 2011 -0400
summary: gc: unify stack frame layout
introduced a bug wherein no variables were
being registerized, making Go programs 2-3x
slower than they had been before.
This CL fixes that bug (along with some others
it was hiding) and adds a test that optimization
makes at least one test case faster.
R=ken2
CC=golang-dev
https://golang.org/cl/5174045
When all old & new string values are single bytes,
byteReplacer is now used, instead of the generic
algorithm.
BenchmarkGenericMatch 10000 102519 ns/op
BenchmarkByteByteMatch 1000000 2178 ns/op
fast path, when nothing matches:
BenchmarkByteByteNoMatch 1000000 1109 ns/op
comparisons to multiple Replace calls:
BenchmarkByteByteReplaces 100000 16164 ns/op
comparison to strings.Map:
BenchmarkByteByteMap 500000 5454 ns/op
R=rsc
CC=golang-dev
https://golang.org/cl/5175050
The map implementation was using the C idiom of using
a pointer just past the end of its table as a limit pointer.
Unfortunately, the garbage collector sees that pointer as
pointing at the block adjacent to the map table, pinning
in memory a block that would otherwise be freed.
Fix by making limit pointer point at last valid entry, not
just past it.
Reviewed by Mike Burrows.
R=golang-dev, bradfitz, lvd, r
CC=golang-dev
https://golang.org/cl/5158045
Running test/garbage/parser.out.
On a 4-core Lenovo X201s (Linux):
31.12u 0.60s 31.74r 1 cpu, no atomics
32.27u 0.58s 32.86r 1 cpu, atomic instructions
33.04u 0.83s 27.47r 2 cpu
On a 16-core Xeon (Linux):
33.08u 0.65s 33.80r 1 cpu, no atomics
34.87u 1.12s 29.60r 2 cpu
36.00u 1.87s 28.43r 3 cpu
36.46u 2.34s 27.10r 4 cpu
38.28u 3.85s 26.92r 5 cpu
37.72u 5.25s 26.73r 6 cpu
39.63u 7.11s 26.95r 7 cpu
39.67u 8.10s 26.68r 8 cpu
On a 2-core MacBook Pro Core 2 Duo 2.26 (circa 2009, MacBookPro5,5):
39.43u 1.45s 41.27r 1 cpu, no atomics
43.98u 2.95s 38.69r 2 cpu
On a 2-core Mac Mini Core 2 Duo 1.83 (circa 2008; Macmini2,1):
48.81u 2.12s 51.76r 1 cpu, no atomics
57.15u 4.72s 51.54r 2 cpu
The handoff algorithm is really only good for two cores.
Beyond that we will need to so something more sophisticated,
like have each core hand off to the next one, around a circle.
Even so, the code is a good checkpoint; for now we'll limit the
number of gc procs to at most 2.
R=dvyukov
CC=golang-dev
https://golang.org/cl/4641082
This is a possible optimization. I'm not sure the complexity is worth it.
The new benchmark in escape_test is 46us without and 35us with the optimization.
R=nigeltao
CC=golang-dev
https://golang.org/cl/5168041