The original plan was to collect allocation stacks
for all memory blocks. But it was never implemented
and it's not in near plans and it's unclear how to do it at all.
R=golang-dev, dave, bradfitz
CC=golang-dev
https://golang.org/cl/12724044
Add new proginfo function that returns information about a
Prog*. The information includes various instruction
description bits as well as a list of required registers set
and used and indexing registers used.
Convert the large instruction switches to use proginfo.
This information was formerly duplicated in multiple
optimization passes, inconsistently. For example, the
information about which registers an instruction requires
appeared three times for most instructions.
Most of the switches were incomplete or incorrect in some way.
For example, the switch in copyu did not list cases for INCB,
JPS, MOVAPD, MOVBWSX, MOVBWZX, PCDATA, POPQ, PUSHQ, STD,
TESTB, TESTQ, and XCHGL. Those were all falling into the
"unknown instruction" default case and stopping the rewrite,
perhaps unnecessarily. Similarly, the switch in needc only
listed a handful of the instructions that use or set the carry bit.
We still need to decide whether to use proginfo to generalize
a few of the remaining smaller switches in peep.c.
If this goes well, we'll make similar changes in 8g and 5g.
R=ken2
CC=golang-dev
https://golang.org/cl/12637051
On entry to a function, zero the results and zero the pointer
section of the local variables.
This is an intermediate step on the way to precise collection
of Go frames.
This can incur a significant (up to 30%) slowdown, but it also ensures
that the garbage collector never looks at a word in a Go frame
and sees a stale pointer value that could cause a space leak.
(C frames and assembly frames are still possibly problematic.)
This CL is required to start making collection of interface values
as precise as collection of pointer values are today.
Since we have to dereference the interface type to understand
whether the value is a pointer, it is critical that the type field be
initialized.
A future CL by Carl will make the garbage collection pointer
bitmaps context-sensitive. At that point it will be possible to
remove most of the zeroing. The only values that will still need
zeroing are values whose addresses escape the block scoping
of the function but do not escape to the heap.
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 4420289180 4331060459 -2.02%
BenchmarkFannkuch11 3442469663 3277706251 -4.79%
BenchmarkFmtFprintfEmpty 100 142 +42.00%
BenchmarkFmtFprintfString 262 310 +18.32%
BenchmarkFmtFprintfInt 213 281 +31.92%
BenchmarkFmtFprintfIntInt 355 431 +21.41%
BenchmarkFmtFprintfPrefixedInt 321 383 +19.31%
BenchmarkFmtFprintfFloat 444 533 +20.05%
BenchmarkFmtManyArgs 1380 1559 +12.97%
BenchmarkGobDecode 10240054 11794915 +15.18%
BenchmarkGobEncode 17350274 19970478 +15.10%
BenchmarkGzip 455179460 460699139 +1.21%
BenchmarkGunzip 114271814 119291574 +4.39%
BenchmarkHTTPClientServer 89051 89894 +0.95%
BenchmarkJSONEncode 40486799 52691558 +30.15%
BenchmarkJSONDecode 94193361 112428781 +19.36%
BenchmarkMandelbrot200 4747060 4748043 +0.02%
BenchmarkGoParse 6363798 6675098 +4.89%
BenchmarkRegexpMatchEasy0_32 129 171 +32.56%
BenchmarkRegexpMatchEasy0_1K 365 395 +8.22%
BenchmarkRegexpMatchEasy1_32 106 152 +43.40%
BenchmarkRegexpMatchEasy1_1K 952 1245 +30.78%
BenchmarkRegexpMatchMedium_32 198 283 +42.93%
BenchmarkRegexpMatchMedium_1K 79006 101097 +27.96%
BenchmarkRegexpMatchHard_32 3478 5115 +47.07%
BenchmarkRegexpMatchHard_1K 110245 163582 +48.38%
BenchmarkRevcomp 777384355 793270857 +2.04%
BenchmarkTemplate 136713089 157093609 +14.91%
BenchmarkTimeParse 1511 1761 +16.55%
BenchmarkTimeFormat 535 850 +58.88%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 74.95 65.07 0.87x
BenchmarkGobEncode 44.24 38.43 0.87x
BenchmarkGzip 42.63 42.12 0.99x
BenchmarkGunzip 169.81 162.67 0.96x
BenchmarkJSONEncode 47.93 36.83 0.77x
BenchmarkJSONDecode 20.60 17.26 0.84x
BenchmarkGoParse 9.10 8.68 0.95x
BenchmarkRegexpMatchEasy0_32 247.24 186.31 0.75x
BenchmarkRegexpMatchEasy0_1K 2799.20 2591.93 0.93x
BenchmarkRegexpMatchEasy1_32 299.31 210.44 0.70x
BenchmarkRegexpMatchEasy1_1K 1074.71 822.45 0.77x
BenchmarkRegexpMatchMedium_32 5.04 3.53 0.70x
BenchmarkRegexpMatchMedium_1K 12.96 10.13 0.78x
BenchmarkRegexpMatchHard_32 9.20 6.26 0.68x
BenchmarkRegexpMatchHard_1K 9.29 6.26 0.67x
BenchmarkRevcomp 326.95 320.40 0.98x
BenchmarkTemplate 14.19 12.35 0.87x
R=cshapiro
CC=golang-dev
https://golang.org/cl/12616045
Probably we should remove this type before Go 1 contract has settled,
but too late. Instead, keep InvalidAddrError close to package generic
error types.
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/12670044
Prior to this change, pointer maps encoded the disposition of
a word using a single bit. A zero signaled a non-pointer
value and a one signaled a pointer value. Interface values,
which are a effectively a union type, were conservatively
labeled as a pointer.
This change widens the logical element size of the pointer map
to two bits per word. As before, zero signals a non-pointer
value and one signals a pointer value. Additionally, a two
signals an iface pointer and a three signals an eface pointer.
Following other changes to the runtime, values two and three
will allow a type information to drive interpretation of the
subsequent word so only those interface values containing a
pointer value will be scanned.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12689046
On my Mac, cuts the API checks from 15 seconds to 6 seconds.
Also clean up some tag confusion: go run list-of-files ignores tags.
R=bradfitz, gri
CC=golang-dev
https://golang.org/cl/12699048
Again, it still allocates but the code is simple.
benchmark old ns/op new ns/op delta
BenchmarkReadSlice1000Int32s 35580 11465 -67.78%
benchmark old MB/s new MB/s speedup
BenchmarkReadSlice1000Int32s 112.42 348.86 3.10x
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/12694048
AllTags lists all the tags that can affect the decision
about which files to include. Tools scanning packages
can use this to decide how many variants there are
and what they are.
R=bradfitz
CC=golang-dev
https://golang.org/cl/12703044
There are a few different places in the code that escape
possibly-problematic characters like < > and &.
This one was the only one missing &, so add it.
This means that if you Marshal a string, you get the
same answer you do if you Marshal a string and
pass it through the compactor. (Ironically, the
compaction makes the string longer.)
Because html/template invokes json.Marshal to
prepare escaped strings for JavaScript, this changes
the form of some of the escaped strings, but not
their meaning.
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/12708044
Use the same algorithm that go tool cover uses when producing HTML
output to render coverage intensity.
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/12712043
lookup_plan9.go's lookupSRV is using the wrong order for srv results. order should be weight, priority, port, following the response from /net/dns:
chi Aug 9 20:31:13 Rread tag 20 count 61 '_xmpp-client._tcp.offblast.org srv 5 0 5222 iota.offblast.org' 72
R=golang-dev, bradfitz
CC=ality, golang-dev, r, rsc
https://golang.org/cl/12708043
This change makes the way cc constructs pointer maps closer to
what gc does and is being done in preparation for changes to
the internal content of the pointer map such as a change to
distinguish interface pointers from ordinary pointers.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12692043
I've placed net.runtime_Semacquire into netpoll.goc,
but netbsd does not yet use netpoll.goc.
R=golang-dev, bradfitz, iant
CC=golang-dev
https://golang.org/cl/12699045
The mutex, fdMutex, handles locking and lifetime of sysfd,
and serializes Read and Write methods.
This allows to strip 2 sync.Mutex.Lock calls,
2 sync.Mutex.Unlock calls, 1 defer and some amount
of misc overhead from every network operation.
On linux/amd64, Intel E5-2690:
benchmark old ns/op new ns/op delta
BenchmarkTCP4Persistent 9595 9454 -1.47%
BenchmarkTCP4Persistent-2 8978 8772 -2.29%
BenchmarkTCP4ConcurrentReadWrite 4900 4625 -5.61%
BenchmarkTCP4ConcurrentReadWrite-2 2603 2500 -3.96%
In general it strips 70-500 ns from every network operation depending
on processor model. On my relatively new E5-2690 it accounts to ~5%
of network op cost.
Fixes#6074.
R=golang-dev, bradfitz, alex.brainman, iant, mikioh.mikioh
CC=golang-dev
https://golang.org/cl/12418043
The old code was caching per-type struct field info. Instead,
cache type-specific encoding funcs, tailored for that
particular type to avoid unnecessary reflection at runtime.
Once the machine is built once, future encodings of that type
just run the func.
benchmark old ns/op new ns/op delta
BenchmarkCodeEncoder 48424939 36975320 -23.64%
benchmark old MB/s new MB/s speedup
BenchmarkCodeEncoder 40.07 52.48 1.31x
Additionally, the numbers seem stable now at ~52 MB/s, whereas
the numbers for the old code were all over the place: 11 MB/s,
40 MB/s, 13 MB/s, 39 MB/s, etc. In the benchmark above I compared
against the best I saw the old code do.
R=rsc, adg
CC=gobot, golang-dev, r
https://golang.org/cl/9129044
Simple approach. Still generates garbage, but not as much.
benchmark old ns/op new ns/op delta
BenchmarkWriteSlice1000Int32s 40260 18791 -53.33%
benchmark old MB/s new MB/s speedup
BenchmarkWriteSlice1000Int32s 99.35 212.87 2.14x
Fixes#2634.
R=golang-dev, crawshaw
CC=golang-dev
https://golang.org/cl/12680046
Introduce freezetheworld function that is a best-effort attempt to stop any concurrently running goroutines. Call it during crash.
Fixes#5873.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12054044