1
0
mirror of https://github.com/golang/go synced 2024-11-05 17:36:15 -07:00
Commit Graph

55 Commits

Author SHA1 Message Date
Josh Bleecher Snyder
504bc3ed24 cmd/compile, runtime: specialize convT2x, don't alloc for zero vals
Prior to this CL, all runtime conversions
from a concrete value to an interface went
through one of two runtime calls: convT2E or convT2I.
However, in practice, basic types are very common.
Specializing convT2x for those basic types allows
for a more efficient implementation for those types.
For basic scalars and strings, allocation and copying
can use the same methods as normal code.
For pointer-free types, allocation can occur without
zeroing, and copying can take place without GC calls.
For slices, copying is cheaper and simpler.

This CL adds twelve runtime routines:

convT2E16, convT2I16
convT2E32, convT2I32
convT2E64, convT2I64
convT2Estring, convT2Istring
convT2Eslice, convT2Islice
convT2Enoptr, convT2Inoptr

While compiling make.bash, 93% of all convT2x calls
are now to one of these specialized convT2x call.

Within specialized convT2x routines, it is cheap to check
for a zero value, in a way that it is not in general.
When we detect a zero value there, we return a pointer
to zeroVal, rather than allocating.

name                         old time/op  new time/op  delta
ConvT2Ezero/zero/16-8        17.9ns ± 2%   3.0ns ± 3%  -83.20%  (p=0.000 n=56+56)
ConvT2Ezero/zero/32-8        17.8ns ± 2%   3.0ns ± 3%  -83.15%  (p=0.000 n=59+60)
ConvT2Ezero/zero/64-8        20.1ns ± 1%   3.0ns ± 2%  -84.98%  (p=0.000 n=57+57)
ConvT2Ezero/zero/str-8       32.6ns ± 1%   3.0ns ± 4%  -90.70%  (p=0.000 n=59+60)
ConvT2Ezero/zero/slice-8     36.7ns ± 2%   3.0ns ± 2%  -91.78%  (p=0.000 n=59+59)
ConvT2Ezero/zero/big-8       91.9ns ± 2%  85.9ns ± 2%   -6.52%  (p=0.000 n=57+57)
ConvT2Ezero/nonzero/16-8     17.7ns ± 2%  12.7ns ± 3%  -28.38%  (p=0.000 n=55+60)
ConvT2Ezero/nonzero/32-8     17.8ns ± 1%  12.7ns ± 1%  -28.44%  (p=0.000 n=54+57)
ConvT2Ezero/nonzero/64-8     20.0ns ± 1%  15.0ns ± 1%  -24.90%  (p=0.000 n=56+58)
ConvT2Ezero/nonzero/str-8    32.6ns ± 1%  25.7ns ± 1%  -21.17%  (p=0.000 n=58+55)
ConvT2Ezero/nonzero/slice-8  36.8ns ± 2%  30.4ns ± 1%  -17.32%  (p=0.000 n=60+52)
ConvT2Ezero/nonzero/big-8    92.1ns ± 2%  85.9ns ± 2%   -6.70%  (p=0.000 n=57+59)

Benchmarks on a real program (the compiler):

name       old time/op      new time/op      delta
Template        227ms ± 5%       221ms ± 2%  -2.48%  (p=0.000 n=30+26)
Unicode         102ms ± 5%       100ms ± 3%  -1.30%  (p=0.009 n=30+26)
GoTypes         656ms ± 5%       659ms ± 4%    ~     (p=0.208 n=30+30)
Compiler        2.82s ± 2%       2.82s ± 1%    ~     (p=0.614 n=29+27)
Flate           128ms ± 2%       128ms ± 5%    ~     (p=0.783 n=27+28)
GoParser        158ms ± 3%       158ms ± 3%    ~     (p=0.261 n=28+30)
Reflect         408ms ± 7%       401ms ± 3%    ~     (p=0.075 n=30+30)
Tar             123ms ± 6%       121ms ± 8%    ~     (p=0.287 n=29+30)
XML             220ms ± 2%       220ms ± 4%    ~     (p=0.805 n=29+29)

name       old user-ns/op   new user-ns/op   delta
Template   281user-ms ± 4%  279user-ms ± 3%  -0.87%  (p=0.044 n=28+28)
Unicode    142user-ms ± 4%  141user-ms ± 3%  -1.04%  (p=0.015 n=30+27)
GoTypes    884user-ms ± 3%  886user-ms ± 2%    ~     (p=0.532 n=30+30)
Compiler   3.94user-s ± 3%  3.92user-s ± 1%    ~     (p=0.185 n=30+28)
Flate      165user-ms ± 2%  165user-ms ± 4%    ~     (p=0.780 n=27+29)
GoParser   209user-ms ± 2%  208user-ms ± 3%    ~     (p=0.453 n=28+30)
Reflect    533user-ms ± 6%  526user-ms ± 3%    ~     (p=0.057 n=30+30)
Tar        156user-ms ± 6%  154user-ms ± 6%    ~     (p=0.133 n=29+30)
XML        288user-ms ± 4%  288user-ms ± 4%    ~     (p=0.633 n=30+30)

name       old alloc/op     new alloc/op     delta
Template       41.0MB ± 0%      40.9MB ± 0%  -0.11%  (p=0.000 n=29+29)
Unicode        32.6MB ± 0%      32.6MB ± 0%    ~     (p=0.572 n=29+30)
GoTypes         122MB ± 0%       122MB ± 0%  -0.10%  (p=0.000 n=30+30)
Compiler        482MB ± 0%       481MB ± 0%  -0.07%  (p=0.000 n=30+29)
Flate          26.6MB ± 0%      26.6MB ± 0%    ~     (p=0.096 n=30+30)
GoParser       32.7MB ± 0%      32.6MB ± 0%  -0.06%  (p=0.011 n=28+28)
Reflect        84.2MB ± 0%      84.1MB ± 0%  -0.17%  (p=0.000 n=29+30)
Tar            27.7MB ± 0%      27.7MB ± 0%  -0.05%  (p=0.032 n=27+28)
XML            44.7MB ± 0%      44.7MB ± 0%    ~     (p=0.131 n=28+30)

name       old allocs/op    new allocs/op    delta
Template         373k ± 1%        370k ± 1%  -0.76%  (p=0.000 n=30+30)
Unicode          325k ± 1%        325k ± 1%    ~     (p=0.383 n=29+30)
GoTypes         1.16M ± 0%       1.15M ± 0%  -0.75%  (p=0.000 n=29+30)
Compiler        4.15M ± 0%       4.13M ± 0%  -0.59%  (p=0.000 n=30+29)
Flate            238k ± 1%        237k ± 1%  -0.62%  (p=0.000 n=30+30)
GoParser         304k ± 1%        302k ± 1%  -0.64%  (p=0.000 n=30+28)
Reflect         1.00M ± 0%       0.99M ± 0%  -1.10%  (p=0.000 n=29+30)
Tar              245k ± 1%        244k ± 1%  -0.59%  (p=0.000 n=27+29)
XML              391k ± 1%        389k ± 1%  -0.59%  (p=0.000 n=29+30)

Change-Id: Id7f456d690567c2b0a96b0d6d64de8784b6e305f
Reviewed-on: https://go-review.googlesource.com/36476
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-28 19:23:33 +00:00
Robert Griesemer
3fd3171c2c cmd/compile/internal/syntax: removed gcCompat code needed to pass orig. tests
The gcCompat mode was introduced to match the new parser's node position
setup exactly with the positions used by the original parser. Some of the
gcCompat adjustments were required to satisfy syntax error test cases,
and the rest were required to make toolstash cmp pass.

This change removes the former gcCompat adjustments and instead adjusts
the respective test cases as necessary. In some cases this makes the error
lines consistent with the ones reported by gccgo.

Where it has changed, the position associated with a given syntactic construct
is the position (line/col number) of the left-most token belonging to the
construct.

Change-Id: I5b60c00c5999a895c4d6d6e9b383c6405ccf725c
Reviewed-on: https://go-review.googlesource.com/36695
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-02-10 01:22:30 +00:00
Keith Randall
8179b9b462 cmd/compile: make sure output params are live if there is a defer
If there is a defer, and that defer recovers, then the caller
can see all of the output parameters.  That means that we must
mark all the output parameters live at any point which might panic.

If there is no defer then this is not necessary.  This is implemented.

We could also detect whether there is a recover in any of the defers.
If not, we would need to mark only output params that the defer
actually references (and the closure mechanism already does that).
This is not implemented.

Fixes #18860.

Change-Id: If984fe6686eddce9408bf25e725dd17fc16b8578
Reviewed-on: https://go-review.googlesource.com/36030
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-03 15:21:47 +00:00
Josh Bleecher Snyder
c682d3239e cmd/compile: convert constants to interfaces without allocating
The order pass is responsible for ensuring that
values passed to runtime functions, including
convT2E/convT2I, are addressable.

Prior to this CL, this was always accomplished
by creating a temp, which frequently escaped to
the heap, causing allocations, perhaps most
notably in code like:

fmt.Println(1, 2, 3) // allocates three times

None of the runtime routines modify the contents
of the pointers they receive, so in the case of
constants, instead of creating a temp value,
we can create a static value.

(Marking the static value as read-only provides
protection against accidental attempts by the runtime
to modify the constant data.)

This improves code generation for code like:

panic("abc")
c <- 2 // c is a chan int

which can now simply refer to "abc" and 2,
rather than going by way of a temporary.

It also allows us to optimize convT2E/convT2I,
by recognizing static readonly values
and directly constructing the interface.

This CL adds ~0.5% to binary size, despite
decreasing the size of many functions,
because it also adds many static symbols.

This binary size regression could be recovered in
future (but currently unplanned) work.

There is a lot of content-duplication in these
symbols; this statement generates six new symbols,
three containing an int 1 and three containing
a pointer to the string "a":

fmt.Println(1, 1, 1, "a", "a", "a")

These symbols could be made content-addressable.

Furthermore, these symbols are small, so the
alignment and naming overhead is large.
As with the go.strings section, these symbols
could be hidden and have their alignment reduced.

The changes to test/live.go make it impossible
(at least with current optimization techniques)
to place the values being passed to the runtime
in static symbols, preserving autotmp creation.

Fixes #18704

Benchmarks from fmt and go-kit's logging package:

github.com/go-kit/kit/log

name                      old time/op    new time/op    delta
JSONLoggerSimple-8          1.91µs ± 2%    2.11µs ±22%     ~     (p=1.000 n=9+10)
JSONLoggerContextual-8      2.60µs ± 6%    2.43µs ± 2%   -6.29%  (p=0.000 n=9+10)
Discard-8                    101ns ± 2%      34ns ±14%  -66.33%  (p=0.000 n=10+9)
OneWith-8                    161ns ± 1%     102ns ±16%  -36.78%  (p=0.000 n=10+10)
TwoWith-8                    175ns ± 3%     106ns ± 7%  -39.36%  (p=0.000 n=10+9)
TenWith-8                    293ns ± 3%     227ns ±15%  -22.44%  (p=0.000 n=9+10)
LogfmtLoggerSimple-8         704ns ± 2%     608ns ± 2%  -13.65%  (p=0.000 n=10+9)
LogfmtLoggerContextual-8     962ns ± 1%     860ns ±17%  -10.57%  (p=0.003 n=9+10)
NopLoggerSimple-8            188ns ± 1%     120ns ± 1%  -36.39%  (p=0.000 n=9+10)
NopLoggerContextual-8        379ns ± 1%     243ns ± 0%  -35.77%  (p=0.000 n=9+10)
ValueBindingTimestamp-8      577ns ± 1%     499ns ± 1%  -13.51%  (p=0.000 n=10+10)
ValueBindingCaller-8         898ns ± 2%     844ns ± 2%   -6.00%  (p=0.000 n=10+10)

name                      old alloc/op   new alloc/op   delta
JSONLoggerSimple-8            904B ± 0%      872B ± 0%   -3.54%  (p=0.000 n=10+10)
JSONLoggerContextual-8      1.20kB ± 0%    1.14kB ± 0%   -5.33%  (p=0.000 n=10+10)
Discard-8                    64.0B ± 0%     32.0B ± 0%  -50.00%  (p=0.000 n=10+10)
OneWith-8                    96.0B ± 0%     64.0B ± 0%  -33.33%  (p=0.000 n=10+10)
TwoWith-8                     160B ± 0%      128B ± 0%  -20.00%  (p=0.000 n=10+10)
TenWith-8                     672B ± 0%      640B ± 0%   -4.76%  (p=0.000 n=10+10)
LogfmtLoggerSimple-8          128B ± 0%       96B ± 0%  -25.00%  (p=0.000 n=10+10)
LogfmtLoggerContextual-8      304B ± 0%      240B ± 0%  -21.05%  (p=0.000 n=10+10)
NopLoggerSimple-8             128B ± 0%       96B ± 0%  -25.00%  (p=0.000 n=10+10)
NopLoggerContextual-8         304B ± 0%      240B ± 0%  -21.05%  (p=0.000 n=10+10)
ValueBindingTimestamp-8       159B ± 0%      127B ± 0%  -20.13%  (p=0.000 n=10+10)
ValueBindingCaller-8          112B ± 0%       80B ± 0%  -28.57%  (p=0.000 n=10+10)

name                      old allocs/op  new allocs/op  delta
JSONLoggerSimple-8            19.0 ± 0%      17.0 ± 0%  -10.53%  (p=0.000 n=10+10)
JSONLoggerContextual-8        25.0 ± 0%      21.0 ± 0%  -16.00%  (p=0.000 n=10+10)
Discard-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
OneWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
TwoWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
TenWith-8                     3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=10+10)
LogfmtLoggerSimple-8          4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)
LogfmtLoggerContextual-8      7.00 ± 0%      3.00 ± 0%  -57.14%  (p=0.000 n=10+10)
NopLoggerSimple-8             4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)
NopLoggerContextual-8         7.00 ± 0%      3.00 ± 0%  -57.14%  (p=0.000 n=10+10)
ValueBindingTimestamp-8       5.00 ± 0%      3.00 ± 0%  -40.00%  (p=0.000 n=10+10)
ValueBindingCaller-8          4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=10+10)

fmt

name                             old time/op    new time/op    delta
SprintfPadding-8                   88.9ns ± 3%    79.1ns ± 1%   -11.09%  (p=0.000 n=10+7)
SprintfEmpty-8                     12.6ns ± 3%    12.8ns ± 3%      ~     (p=0.136 n=10+10)
SprintfString-8                    38.7ns ± 5%    26.9ns ± 6%   -30.65%  (p=0.000 n=10+10)
SprintfTruncateString-8            56.7ns ± 2%    47.0ns ± 3%   -17.05%  (p=0.000 n=10+10)
SprintfQuoteString-8                164ns ± 2%     153ns ± 2%    -7.01%  (p=0.000 n=10+10)
SprintfInt-8                       38.9ns ±15%    26.5ns ± 2%   -31.93%  (p=0.000 n=10+9)
SprintfIntInt-8                    60.3ns ± 9%    38.2ns ± 1%   -36.67%  (p=0.000 n=10+8)
SprintfPrefixedInt-8               58.6ns ±13%    51.2ns ±11%   -12.66%  (p=0.001 n=10+10)
SprintfFloat-8                     71.4ns ± 3%    64.2ns ± 3%   -10.08%  (p=0.000 n=8+10)
SprintfComplex-8                    175ns ± 3%     159ns ± 2%    -9.03%  (p=0.000 n=10+10)
SprintfBoolean-8                   33.5ns ± 4%    25.7ns ± 5%   -23.28%  (p=0.000 n=10+10)
SprintfHexString-8                 65.3ns ± 3%    51.7ns ± 5%   -20.86%  (p=0.000 n=10+9)
SprintfHexBytes-8                  67.2ns ± 5%    67.9ns ± 4%      ~     (p=0.383 n=10+10)
SprintfBytes-8                      129ns ± 7%     124ns ± 7%      ~     (p=0.074 n=9+10)
SprintfStringer-8                   127ns ± 4%     126ns ± 8%      ~     (p=0.506 n=9+10)
SprintfStructure-8                  357ns ± 3%     359ns ± 3%      ~     (p=0.469 n=10+10)
ManyArgs-8                          203ns ± 6%     126ns ± 3%   -37.94%  (p=0.000 n=10+10)
FprintInt-8                         119ns ±10%      74ns ± 3%   -37.54%  (p=0.000 n=10+10)
FprintfBytes-8                      122ns ± 4%     120ns ± 3%      ~     (p=0.124 n=10+10)
FprintIntNoAlloc-8                 78.2ns ± 5%    74.1ns ± 3%    -5.28%  (p=0.000 n=10+10)
ScanInts-8                          349µs ± 1%     349µs ± 0%      ~     (p=0.606 n=9+8)
ScanRecursiveInt-8                 43.8ms ± 7%    40.1ms ± 2%    -8.42%  (p=0.000 n=10+10)
ScanRecursiveIntReaderWrapper-8    43.5ms ± 4%    40.4ms ± 2%    -7.16%  (p=0.000 n=10+9)

name                             old alloc/op   new alloc/op   delta
SprintfPadding-8                    24.0B ± 0%     16.0B ± 0%   -33.33%  (p=0.000 n=10+10)
SprintfEmpty-8                      0.00B          0.00B           ~     (all equal)
SprintfString-8                     21.0B ± 0%      5.0B ± 0%   -76.19%  (p=0.000 n=10+10)
SprintfTruncateString-8             32.0B ± 0%     16.0B ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfQuoteString-8                48.0B ± 0%     32.0B ± 0%   -33.33%  (p=0.000 n=10+10)
SprintfInt-8                        16.0B ± 0%      1.0B ± 0%   -93.75%  (p=0.000 n=10+10)
SprintfIntInt-8                     24.0B ± 0%      3.0B ± 0%   -87.50%  (p=0.000 n=10+10)
SprintfPrefixedInt-8                72.0B ± 0%     64.0B ± 0%   -11.11%  (p=0.000 n=10+10)
SprintfFloat-8                      16.0B ± 0%      8.0B ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfComplex-8                    48.0B ± 0%     32.0B ± 0%   -33.33%  (p=0.000 n=10+10)
SprintfBoolean-8                    8.00B ± 0%     4.00B ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfHexString-8                  96.0B ± 0%     80.0B ± 0%   -16.67%  (p=0.000 n=10+10)
SprintfHexBytes-8                    112B ± 0%      112B ± 0%      ~     (all equal)
SprintfBytes-8                      96.0B ± 0%     96.0B ± 0%      ~     (all equal)
SprintfStringer-8                   32.0B ± 0%     32.0B ± 0%      ~     (all equal)
SprintfStructure-8                   256B ± 0%      256B ± 0%      ~     (all equal)
ManyArgs-8                          80.0B ± 0%      0.0B       -100.00%  (p=0.000 n=10+10)
FprintInt-8                         8.00B ± 0%     0.00B       -100.00%  (p=0.000 n=10+10)
FprintfBytes-8                      32.0B ± 0%     32.0B ± 0%      ~     (all equal)
FprintIntNoAlloc-8                  0.00B          0.00B           ~     (all equal)
ScanInts-8                         15.2kB ± 0%    15.2kB ± 0%      ~     (p=0.248 n=9+10)
ScanRecursiveInt-8                 21.6kB ± 0%    21.6kB ± 0%      ~     (all equal)
ScanRecursiveIntReaderWrapper-8    21.7kB ± 0%    21.7kB ± 0%      ~     (all equal)

name                             old allocs/op  new allocs/op  delta
SprintfPadding-8                     2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfEmpty-8                       0.00           0.00           ~     (all equal)
SprintfString-8                      2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfTruncateString-8              2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfQuoteString-8                 2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfInt-8                         2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfIntInt-8                      3.00 ± 0%      1.00 ± 0%   -66.67%  (p=0.000 n=10+10)
SprintfPrefixedInt-8                 2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfFloat-8                       2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfComplex-8                     2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfBoolean-8                     2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfHexString-8                   2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
SprintfHexBytes-8                    2.00 ± 0%      2.00 ± 0%      ~     (all equal)
SprintfBytes-8                       2.00 ± 0%      2.00 ± 0%      ~     (all equal)
SprintfStringer-8                    4.00 ± 0%      4.00 ± 0%      ~     (all equal)
SprintfStructure-8                   7.00 ± 0%      7.00 ± 0%      ~     (all equal)
ManyArgs-8                           8.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
FprintInt-8                          1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
FprintfBytes-8                       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
FprintIntNoAlloc-8                   0.00           0.00           ~     (all equal)
ScanInts-8                          1.60k ± 0%     1.60k ± 0%      ~     (all equal)
ScanRecursiveInt-8                  1.71k ± 0%     1.71k ± 0%      ~     (all equal)
ScanRecursiveIntReaderWrapper-8     1.71k ± 0%     1.71k ± 0%      ~     (all equal)

Change-Id: I7ba72a25fea4140a0ba40a9f443103ed87cc69b5
Reviewed-on: https://go-review.googlesource.com/35554
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-02 21:02:23 +00:00
David Chase
7f1ff65c39 cmd/compile: insert scheduling checks on loop backedges
Loop breaking with a counter.  Benchmarked (see comments),
eyeball checked for sanity on popular loops.  This code
ought to handle loops in general, and properly inserts phi
functions in cases where the earlier version might not have.

Includes test, plus modifications to test/run.go to deal with
timeout and killing looping test.  Tests broken by the addition
of extra code (branch frequency and live vars) for added
checks turn the check insertion off.

If GOEXPERIMENT=preemptibleloops, the compiler inserts reschedule
checks on every backedge of every reducible loop.  Alternately,
specifying GO_GCFLAGS=-d=ssa/insert_resched_checks/on will
enable it for a single compilation, but because the core Go
libraries contain some loops that may run long, this is less
likely to have the desired effect.

This is intended as a tool to help in the study and diagnosis
of GC and other latency problems, now that goal STW GC latency
is on the order of 100 microseconds or less.

Updates #17831.
Updates #10958.

Change-Id: I6206c163a5b0248e3f21eb4fc65f73a179e1f639
Reviewed-on: https://go-review.googlesource.com/33910
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-01-09 21:01:29 +00:00
David Chase
9c066bab64 cmd/compile: mark temps with new AutoTemp flag, and use it.
This is an extension of
https://go-review.googlesource.com/c/31662/
to mark all the temporaries, not just the ssa-generated ones.

Before-and-after ls -l `go tool -n compile` shows a 3%
reduction in size (or rather, a prior 3% inflation for
failing to filter temps out properly.)

Replaced name-dependent "is it a temp?" tests with calls to
*Node.IsAutoTmp(), which depends on AutoTemp.  Also replace
calls to istemp(n) with n.IsAutoTmp(), to reduce duplication
and clean up function name space.  Generated temporaries
now come with a "." prefix to avoid (apparently harmless)
clashes with legal Go variable names.

Fixes #17644.
Fixes #17240.

Change-Id: If1417f29c79a7275d7303ddf859b51472890fd43
Reviewed-on: https://go-review.googlesource.com/32255
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-10-31 19:38:50 +00:00
Keith Randall
442de98c14 cmd/compile,runtime: redo how map assignments work
To compile:
  m[k] = v
instead of:
  mapassign(maptype, m, &k, &v), do
do:
  *mapassign(maptype, m, &k) = v

mapassign returns a pointer to the value slot in the map.  It is just
like mapaccess except that it will allocate a new slot if k is not
already present in the map.

This makes map accesses faster but potentially larger (codewise).

It is faster because the write into the map is done when the compiler
knows the concrete type, so it can be done with a few store
instructions instead of calling typedmemmove.  We also potentially
avoid stack temporaries to hold v.

The code can be larger when the map has pointers in its value type,
since there is a write barrier call in addition to the mapassign call.
That makes the code at the callsite a bit bigger (go binary is 0.3%
bigger).

This CL is in preparation for doing operations like m[k] += v with
only a single runtime call.  That will roughly double the speed of
such operations.

Update #17133
Update #5147

Change-Id: Ia435f032090a2ed905dac9234e693972fe8c2dc5
Reviewed-on: https://go-review.googlesource.com/30815
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-12 20:41:23 +00:00
Than McIntosh
6c5e377d23 cmd/compile: relax liveness restrictions on ambiguously live
Update gc liveness to remove special conservative treatment
of ambiguously live vars, since there is no longer a need to
protect against GCDEBUG=gcdead.

Change-Id: Id6e2d03218f7d67911e8436d283005a124e6957f
Reviewed-on: https://go-review.googlesource.com/24896
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2016-10-03 18:07:32 +00:00
Cherry Zhang
d586aae1f4 test: errorcheck auto-generated functions
Add an "errorcheckwithauto" action which performs error check
including lines with auto-generated functions (excluded by
default). Comment "// ERRORAUTO" matches these lines.

Add testcase for CL 29570 (as an example).

Updates #16016, #17186.

Change-Id: Iaba3727336cd602f3dda6b9e5f97dafe0848e632
Reviewed-on: https://go-review.googlesource.com/29652
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-22 20:10:30 +00:00
Keith Randall
ca4089ad62 cmd/compile: args no longer live until end-of-function
We're dropping this behavior in favor of runtime.KeepAlive.
Implement runtime.KeepAlive as an intrinsic.

Update #15843

Change-Id: Ib60225bd30d6770ece1c3c7d1339a06aa25b1cbc
Reviewed-on: https://go-review.googlesource.com/28310
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-19 16:54:35 +00:00
Keith Randall
6129f37367 cmd/compile: inline convT2{I,E} when result doesn't escape
No point in calling a function when we can build the interface
using a known type (or itab) and the address of a local.

Get rid of third arg (preallocated stack space) to convT2{I,E}.

Makes go binary smaller by 0.2%

benchmark                   old ns/op     new ns/op     delta
BenchmarkEfaceInteger-8     16.7          10.1          -39.52%

Update #17118
Update #15375

Change-Id: I9724a1f802bfa1e3957bf1856b55558278e198a2
Reviewed-on: https://go-review.googlesource.com/29373
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-09-19 02:37:08 +00:00
Keith Randall
c199c76cb4 cmd/compile: turn live variable test off for ppc
ppc64 has an extraneous variable live in some situations.
We need a better tighten pass to get rid of this extra variable.
I'm working on it, but fix the test in the meantime.

Fixes build for ppc64.

Change-Id: I1efb9ccb234a64f2a1c228abd2b3195f67fbeb41
Reviewed-on: https://go-review.googlesource.com/29353
Reviewed-by: David Chase <drchase@google.com>
2016-09-16 19:41:42 +00:00
Keith Randall
b265d51789 test,cmd/compile: remove _ssa file suffix
Everything is SSA now.

Update #16357

Change-Id: I436dbe367b863ee81a3695a7d653ba4bfc5b0f6c
Reviewed-on: https://go-review.googlesource.com/29232
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-15 20:47:01 +00:00
Keith Randall
f13701bf2f test: make SSA tests unconditional
Delete legacy backend tests, make SSA tests unconditional.

Next CL will remove _ssa from the file names.

Update #16357

Change-Id: I2a7f5dcbc69455f63b5e6e6b2725df26ab86c8dd
Reviewed-on: https://go-review.googlesource.com/29231
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-15 20:21:26 +00:00
Michael Munday
6ec993adc3 cmd/compile: add SSA backend for s390x and enable by default
The new SSA backend modifies the ABI slightly: R0 is now a usable
general purpose register.

Fixes #16677.

Change-Id: I367435ce921e0c7e79e021c80cf8ef5d1d1466cf
Reviewed-on: https://go-review.googlesource.com/28978
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-13 19:39:38 +00:00
Keith Randall
83c73a85db cmd/compile: ignore contentEscapes for marking nodes as escaping
Redo of CL 28575 with fixed test.
We're in a pre-KeepAlive world for a bit yet, the old tests
were in a client which was in a post-KeepAlive world.

Change-Id: I114fd630339d761ab3306d1d99718d3cb973678d
Reviewed-on: https://go-review.googlesource.com/28582
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-07 06:59:22 +00:00
Brad Fitzpatrick
bdb3b790c6 Revert of cmd/compile: ignore contentEscapes for marking nodes as escaping
Reason for revert: broke the build due to cherrypick;
relies on an unsubmitted parent CL.

Original issue's description:
> cmd/compile: ignore contentEscapes for marking nodes as escaping
> 
> We can still stack allocate and VarKill nodes which don't
> escape but their content does.
> 
> Fixes #16996
> 
> Change-Id: If8aa0fcf2c327b4cb880a3d5af8d213289e6f6bf
> Reviewed-on: https://go-review.googlesource.com/28575
> Run-TryBot: Keith Randall <khr@golang.org>
> TryBot-Result: Gobot Gobot <gobot@golang.org>
> Reviewed-by: David Chase <drchase@google.com>
> 

Change-Id: Ie1a325209de14d70af6acb2d78269b7a0450da7a
Reviewed-on: https://go-review.googlesource.com/28578
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-07 03:23:44 +00:00
Keith Randall
923a74ce77 cmd/compile: ignore contentEscapes for marking nodes as escaping
We can still stack allocate and VarKill nodes which don't
escape but their content does.

Fixes #16996

Change-Id: If8aa0fcf2c327b4cb880a3d5af8d213289e6f6bf
Reviewed-on: https://go-review.googlesource.com/28575
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-07 02:07:03 +00:00
Cherry Zhang
7f27f1dfdd cmd/compile: add MIPS64 optimizations, SSA on by default
Add the following optimizations:
- fold constants
- fold address into load/store
- simplify extensions and conditional branches
- remove nil checks

Turn on SSA on MIPS64 by default, and toggle the tests.

Fixes #16359.

Change-Id: I7f1e38c2509e22e42cd024e712990ebbe47176bd
Reviewed-on: https://go-review.googlesource.com/27870
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-26 19:45:06 +00:00
David Chase
5b9ff11c3d cmd/compile: ppc64le working, not optimized enough
This time with the cherry-pick from the proper patch of
the old CL.

Stack size increased.
Corrected NaN-comparison glitches.
Marked g register as clobbered by calls.
Fixed shared libraries.

live_ssa.go still disabled because of differences.
Presumably turning on more optimization will fix
both the stack size and the live_ssa.go glitches.

Enhanced debugging output for shared libs test.

Rebased onto master.

Updates #16010.

Change-Id: I40864faf1ef32c118fb141b7ef8e854498e6b2c4
Reviewed-on: https://go-review.googlesource.com/27159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-18 16:34:47 +00:00
Cherry Zhang
d99cee79b9 [dev.ssa] cmd/compile, etc.: more ARM64 optimizations, and enable SSA by default
Add more ARM64 optimizations:
- use hardware zero register when it is possible.
- use shifted ops.
  The assembler supports shifted ops but not documented, nor knows
  how to print it. This CL adds them.
- enable fast division.
  This was disabled because it makes the old backend generate slower
  code. But with SSA it generates faster code.

Turn on SSA by default, also adjust tests.

Change-Id: I7794479954c83bb65008dcb457bc1e21d7496da6
Reviewed-on: https://go-review.googlesource.com/26950
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-15 03:37:34 +00:00
Keith Randall
c069bc4996 [dev.ssa] cmd/compile: implement GO386=387
Last part of the 386 SSA port.

Modify the x86 backend to simulate SSE registers and
instructions with 387 registers and instructions.
The simulation isn't terribly performant, but it works,
and the old implementation wasn't very performant either.
Leaving to people who care about 387 to optimize if they want.

Turn on SSA backend for 386 by default.

Fixes #16358

Change-Id: I678fb59132620b2c47e993c1c10c4c21135f70c0
Reviewed-on: https://go-review.googlesource.com/25271
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-10 17:41:01 +00:00
Keith Randall
69a755b602 [dev.ssa] cmd/compile: port SSA backend to amd64p32
It's not a new backend, just a PtrSize==4 modification
of the existing AMD64 backend.

Change-Id: Icc63521a5cf4ebb379f7430ef3f070894c09afda
Reviewed-on: https://go-review.googlesource.com/25586
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-09 15:48:26 +00:00
Cherry Zhang
6b6de15d32 [dev.ssa] cmd/compile: support NaCl in SSA for ARM
NaCl code runs in sandbox and there are restrictions for its
instruction uses
(https://developer.chrome.com/native-client/reference/sandbox_internals/arm-32-bit-sandbox).

Like the legacy backend, on NaCl,
- don't use R9, which is used as NaCl's "thread pointer".
- don't use Duff's device.
- don't use indexed load/stores.
- the assembler rewrites DIV/MOD to runtime calls, which on NaCl
  clobbers R12, so R12 is marked as clobbered for DIV/MOD.
- other restrictions are satisfied by the assembler.

Enable SSA specific tests on nacl/arm, and disable non-SSA ones.

Updates #15365.

Change-Id: I9262693ec6756b89ca29d3ae4e52a96fe5403b02
Reviewed-on: https://go-review.googlesource.com/24859
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-07-16 03:13:45 +00:00
Cherry Zhang
42181ad852 [dev.ssa] cmd/compile: enable SSA on ARM by default
As Josh mentioned in CL 24716, there has been requests for using SSA
for ARM. SSA can still be disabled by setting -ssa=0 for cmd/compile,
or partially enabled with GOSSAFUNC, GOSSAPKG, and GOSSAHASH.

Not enable SSA by default on NaCl, which is not supported yet.

Enable SSA-specific tests on ARM: live_ssa.go and nilptr3_ssa.go;
disable non-SSA tests: live.go, nilptr3.go, and slicepot.go.

Updates #15365.

Change-Id: Ic2ca8d166aeca8517b9d262a55e92f2130683a16
Reviewed-on: https://go-review.googlesource.com/23953
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2016-07-06 15:05:50 +00:00
Emmanuel Odeke
53fd522c0d all: make copyright headers consistent with one space after period
Follows suit with https://go-review.googlesource.com/#/c/20111.

Generated by running
$ grep -R 'Go Authors.  All' * | cut -d":" -f1 | while read F;do perl -pi -e 's/Go
Authors.  All/Go Authors. All/g' $F;done

The code in cmd/internal/unvendor wasn't changed.

Fixes #15213

Change-Id: I4f235cee0a62ec435f9e8540a1ec08ae03b1a75f
Reviewed-on: https://go-review.googlesource.com/21819
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-02 13:43:18 +00:00
David Chase
729abfa35c [dev.ssa] cmd/compile: default compile+test with SSA
Some tests disabled, some bifurcated into _ssa and not,
with appropriate logging added to compiler.

"tests/live.go" in particular needs attention.

SSA-specific testing removed, since it's all SSA now.

Added "-run_skips" option to tests/run.go to simplify
checking whether a test still fails (or how it fails)
on a skipped platform.

The compiler now compiles with SSA by default.
If you don't want SSA, specify GOSSAHASH=n (or N) as
an environment variable.  Function names ending in "_ssa"
are always SSA-compiled.

GOSSAFUNC=fname retains its "SSA for fname, log to ssa.html"
GOSSAPKG=pkg only has an effect when GOSSAHASH=n
GOSSAHASH=10101 etc retains its name-hash-matching behavior
for purposes of debugging.

See #13068

Change-Id: I8217bfeb34173533eaeb391b5f6935483c7d6b43
Reviewed-on: https://go-review.googlesource.com/16299
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
2015-10-30 20:35:20 +00:00
Dmitry Vyukov
7647741246 test: add -update_errors flag to run script
The flag updates error annotations in test files from actual compiler output.
This is useful when doing compiler changes that add/remove/change lots of errors,
or when adding lots of new tests.
Also I noticed at least 2 cases where annotation were sub-optimal:
1. The annotation was "leaking param p" when the actual error is
"leaking param p to result ~r1".
2. The annotation was "leaking param m" when the actual errors
are "leaking param m" and "leaking param mv1".

For now it works only for errorcheck mode.

Also, apply the update to escape and liveness tests.
Some files have gccgo-specific errors of the form "gc error|gccgo error",
so it is risky to run update on all files. Gccgo-specific error
does not necessary contain '|', it can be just truncated.

Change-Id: Iaaae767f859dcb8321a8cb4970b2b70969e8a345
Reviewed-on: https://go-review.googlesource.com/5310
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-10 11:33:42 +00:00
Josh Bleecher Snyder
77a2113925 cmd/gc: evaluate concrete == interface without allocating
Consider an interface value i of type I and concrete value c of type C.

Prior to this CL, i==c was evaluated as
	I(c) == i

Evaluating I(c) can allocate.

This CL changes the evaluation of i==c to
	x, ok := i.(C); ok && x == c

The new generated code is shorter and does not allocate directly.

If C is small, as it is in every instance in the stdlib,
the new code also uses less stack space
and makes one runtime call instead of two.

If C is very large, the original implementation is used.
The cutoff for "very large" is 1<<16,
following the stack vs heap cutoff used elsewhere.

This kind of comparison occurs in 38 places in the stdlib,
mostly in the net and os packages.

benchmark                     old ns/op     new ns/op     delta
BenchmarkEqEfaceConcrete      29.5          7.92          -73.15%
BenchmarkEqIfaceConcrete      32.1          7.90          -75.39%
BenchmarkNeEfaceConcrete      29.9          7.90          -73.58%
BenchmarkNeIfaceConcrete      35.9          7.90          -77.99%

Fixes #9370.

Change-Id: I7c4555950bcd6406ee5c613be1f2128da2c9a2b7
Reviewed-on: https://go-review.googlesource.com/2096
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
2015-02-12 22:23:38 +00:00
Dmitry Vyukov
b3be360f16 cmd/gc: allocate non-escaping maps on stack
Extend escape analysis to make(map[k]v).
If it does not escape, allocate temp buffer for hmap and one bucket on stack.

There are 75 cases of non-escaping maps in std lib.

benchmark                                    old allocs     new allocs     delta
BenchmarkConcurrentStmtQuery                 16161          15161          -6.19%
BenchmarkConcurrentTxQuery                   17658          16658          -5.66%
BenchmarkConcurrentTxStmtQuery               16157          15156          -6.20%
BenchmarkConcurrentRandom                    13637          13114          -3.84%
BenchmarkManyConcurrentQueries               22             20             -9.09%
BenchmarkDecodeComplex128Slice               250            188            -24.80%
BenchmarkDecodeFloat64Slice                  250            188            -24.80%
BenchmarkDecodeInt32Slice                    250            188            -24.80%
BenchmarkDecodeStringSlice                   2250           2188           -2.76%
BenchmarkNewEmptyMap                         1              0              -100.00%
BenchmarkNewSmallMap                         2              0              -100.00%

benchmark                old ns/op     new ns/op     delta
BenchmarkNewEmptyMap     124           55.7          -55.08%
BenchmarkNewSmallMap     317           148           -53.31%

benchmark                old allocs     new allocs     delta
BenchmarkNewEmptyMap     1              0              -100.00%
BenchmarkNewSmallMap     2              0              -100.00%

benchmark                old bytes     new bytes     delta
BenchmarkNewEmptyMap     48            0             -100.00%
BenchmarkNewSmallMap     192           0             -100.00%

Fixes #5449

Change-Id: I24fa66f949d2f138885d9e66a0d160240dc9e8fa
Reviewed-on: https://go-review.googlesource.com/3508
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
2015-02-12 09:53:52 +00:00
Russ Cox
9ef4e56108 [dev.garbage] all: merge dev.power64 (7667e41f3ced) into dev.garbage
Now the only difference between dev.cc and dev.garbage
is the runtime conversion on the one side and the
garbage collection on the other. They both have the
same set of changes from default and dev.power64.

LGTM=austin
R=austin
CC=golang-codereviews
https://golang.org/cl/172570043
2014-11-14 12:09:42 -05:00
Russ Cox
75d3f62b3c [dev.garbage] cmd/gc, runtime: add locks around print statements
Now each C printf, Go print, or Go println is guaranteed
not to be interleaved with other calls of those functions.
This should help when debugging concurrent failures.

LGTM=rlh
R=rlh
CC=golang-codereviews
https://golang.org/cl/169120043
2014-11-05 14:42:54 -05:00
Austin Clements
a5e1e1599c [dev.power64] test: "fix" live.go test on power64x
On power64x, this one line in live.go reports that t is live
because of missing optimization passes.  This isn't what this
test is trying to test, so shuffle bad40 so that it still
accomplishes the intent of the test without also depending on
optimization.

LGTM=rsc
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/167110043
2014-11-03 17:25:36 -05:00
Russ Cox
454d1b0e8b cmd/gc: fix call order in array literal of slice literal of make chan
Fixes #8761.

LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews, r
https://golang.org/cl/144530045
2014-09-30 12:48:47 -04:00
Russ Cox
fcb4cabba4 cmd/gc: emit write barriers
A write *p = x that needs a write barrier (not all do)
now turns into runtime.writebarrierptr(p, x)
or one of the other variants.

The write barrier implementations are trivial.
The goal here is to emit the calls in the correct places
and to incur the cost of those function calls in the Go 1.4 cycle.

Performance on the Go 1 benchmark suite below.
Remember, the goal is to slow things down (and be correct).

We will look into optimizations in separate CLs, as part of
the process of comparing Go 1.3 against tip in order to make
sure Go 1.4 runs at least as fast as Go 1.3.

benchmark                          old ns/op      new ns/op      delta
BenchmarkBinaryTree17              3118336716     3452876110     +10.73%
BenchmarkFannkuch11                3184497677     3211552284     +0.85%
BenchmarkFmtFprintfEmpty           89.9           107            +19.02%
BenchmarkFmtFprintfString          236            287            +21.61%
BenchmarkFmtFprintfInt             246            278            +13.01%
BenchmarkFmtFprintfIntInt          395            458            +15.95%
BenchmarkFmtFprintfPrefixedInt     343            378            +10.20%
BenchmarkFmtFprintfFloat           477            525            +10.06%
BenchmarkFmtManyArgs               1446           1707           +18.05%
BenchmarkGobDecode                 14398047       14685958       +2.00%
BenchmarkGobEncode                 12557718       12947104       +3.10%
BenchmarkGzip                      453462345      472413285      +4.18%
BenchmarkGunzip                    114226016      115127398      +0.79%
BenchmarkHTTPClientServer          114689         112122         -2.24%
BenchmarkJSONEncode                24914536       26135942       +4.90%
BenchmarkJSONDecode                86832877       103620289      +19.33%
BenchmarkMandelbrot200             4833452        4898780        +1.35%
BenchmarkGoParse                   4317976        4835474        +11.98%
BenchmarkRegexpMatchEasy0_32       150            166            +10.67%
BenchmarkRegexpMatchEasy0_1K       393            402            +2.29%
BenchmarkRegexpMatchEasy1_32       125            142            +13.60%
BenchmarkRegexpMatchEasy1_1K       1010           1236           +22.38%
BenchmarkRegexpMatchMedium_32      232            301            +29.74%
BenchmarkRegexpMatchMedium_1K      76963          102721         +33.47%
BenchmarkRegexpMatchHard_32        3833           5463           +42.53%
BenchmarkRegexpMatchHard_1K        119668         161614         +35.05%
BenchmarkRevcomp                   763449047      706768534      -7.42%
BenchmarkTemplate                  124954724      134834549      +7.91%
BenchmarkTimeParse                 517            511            -1.16%
BenchmarkTimeFormat                501            514            +2.59%

benchmark                         old MB/s     new MB/s     speedup
BenchmarkGobDecode                53.31        52.26        0.98x
BenchmarkGobEncode                61.12        59.28        0.97x
BenchmarkGzip                     42.79        41.08        0.96x
BenchmarkGunzip                   169.88       168.55       0.99x
BenchmarkJSONEncode               77.89        74.25        0.95x
BenchmarkJSONDecode               22.35        18.73        0.84x
BenchmarkGoParse                  13.41        11.98        0.89x
BenchmarkRegexpMatchEasy0_32      213.30       191.72       0.90x
BenchmarkRegexpMatchEasy0_1K      2603.92      2542.74      0.98x
BenchmarkRegexpMatchEasy1_32      254.00       224.93       0.89x
BenchmarkRegexpMatchEasy1_1K      1013.53      827.98       0.82x
BenchmarkRegexpMatchMedium_32     4.30         3.31         0.77x
BenchmarkRegexpMatchMedium_1K     13.30        9.97         0.75x
BenchmarkRegexpMatchHard_32       8.35         5.86         0.70x
BenchmarkRegexpMatchHard_1K       8.56         6.34         0.74x
BenchmarkRevcomp                  332.92       359.62       1.08x
BenchmarkTemplate                 15.53        14.39        0.93x

LGTM=rlh
R=rlh
CC=dvyukov, golang-codereviews, iant, khr, r
https://golang.org/cl/136380043
2014-09-11 12:17:45 -04:00
Keith Randall
8217b4a203 runtime: convert panic/recover to Go
created panic1.go just so diffs were available.
After this CL is in, I'd like to move panic.go -> defer.go
and panic1.go -> panic.go.

LGTM=rsc
R=rsc, khr
CC=golang-codereviews
https://golang.org/cl/133530045
2014-09-05 10:04:16 -04:00
Russ Cox
1806a5732b cmd/gc, runtime: refactor interface inlining decision into compiler
We need to change the interface value representation for
concurrent garbage collection, so that there is no ambiguity
about whether the data word holds a pointer or scalar.

This CL does NOT make any representation changes.

Instead, it removes representation assumptions from
various pieces of code throughout the tree.
The isdirectiface function in cmd/gc/subr.c is now
the only place that decides that policy.
The policy propagates out from there in the reflect
metadata, as a new flag in the internal kind value.

A follow-up CL will change the representation by
changing the isdirectiface function. If that CL causes
problems, it will be easy to roll back.

Update #8405.

LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews, r
https://golang.org/cl/129090043
2014-08-18 21:13:11 -04:00
Keith Randall
4aa50434e1 runtime: rewrite malloc in Go.
This change introduces gomallocgc, a Go clone of mallocgc.
Only a few uses have been moved over, so there are still
lots of uses from C. Many of these C uses will be moved
over to Go (e.g. in slice.goc), but probably not all.
What should remain of C's mallocgc is an open question.

LGTM=rsc, dvyukov
R=rsc, khr, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/108840046
2014-07-30 09:01:52 -07:00
Dmitriy Vyukov
40d7d5a656 cmd/gc: allocate select descriptor on stack
benchmark                      old ns/op     new ns/op     delta
BenchmarkSelectUncontended     220           165           -25.00%
BenchmarkSelectContended       209           161           -22.97%
BenchmarkSelectProdCons        1042          904           -13.24%

But more importantly this change will allow
to get rid of free function in runtime.

Fixes #6494.

LGTM=rsc, khr
R=golang-codereviews, rsc, dominik.honnef, khr
CC=golang-codereviews, remyoudompheng
https://golang.org/cl/107670043
2014-07-20 15:07:10 +04:00
Russ Cox
eb54079264 cmd/gc: fix liveness for address-taken variables in inlined functions
The 'address taken' bit in a function variable was not
propagating into the inlined copies, causing incorrect
liveness information.

LGTM=dsymonds, bradfitz
R=golang-codereviews, bradfitz
CC=dsymonds, golang-codereviews, iant, khr, r
https://golang.org/cl/96670046
2014-06-02 21:26:32 -04:00
Russ Cox
89d46fed2c cmd/gc: fix x=x crash
[Same as CL 102820043 except applied changes to 6g/gsubr.c
also to 5g/gsubr.c and 8g/gsubr.c. The problem I had last night
trying to do that was that 8g's copy of nodarg has different
(but equivalent) control flow and I was pasting the new code
into the wrong place.]

Description from CL 102820043:

The 'nodarg' function is used to obtain a Node*
representing a function argument or result.
It returned a brand new Node*, but that violates
the guarantee in most places in the compiler that
two Node*s refer to the same variable if and only if
they are the same Node* pointer. Reestablish that
invariant by making nodarg return a preexisting
named variable if present.

Having fixed that, avoid any copy during x=x in
componentgen, because the VARDEF we emit
before the copy marks the lhs x as dead incorrectly.

The change in walk.c avoids modifying the result
of nodarg. This was the only place in the compiler
that did so.

Fixes #8097.

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, iant, khr, r
https://golang.org/cl/103750043
2014-05-29 13:47:31 -04:00
Russ Cox
9dd062b82e undo CL 102820043 / b0ce6dbafc18
Breaks 386 and arm builds.
The obvious reason is that this CL only edited 6g/gsubr.c
and failed to edit 5g/gsubr.c and 8g/gsubr.c.
However, the obvious CL applying the same edit to those
files (CL 101900043) causes mysterious build failures
in various of the standard package tests, usually involving
reflect. Something deep and subtle is broken but only on
the 32-bit systems.

Undo this CL for now.

««« original CL description
cmd/gc: fix x=x crash

The 'nodarg' function is used to obtain a Node*
representing a function argument or result.
It returned a brand new Node*, but that violates
the guarantee in most places in the compiler that
two Node*s refer to the same variable if and only if
they are the same Node* pointer. Reestablish that
invariant by making nodarg return a preexisting
named variable if present.

Having fixed that, avoid any copy during x=x in
componentgen, because the VARDEF we emit
before the copy marks the lhs x as dead incorrectly.

The change in walk.c avoids modifying the result
of nodarg. This was the only place in the compiler
that did so.

Fixes #8097.

LGTM=r, khr
R=golang-codereviews, r, khr
CC=golang-codereviews, iant
https://golang.org/cl/102820043
»»»

TBR=r
CC=golang-codereviews, khr
https://golang.org/cl/95660043
2014-05-28 21:46:20 -04:00
Russ Cox
948b2c722b cmd/gc: fix x=x crash
The 'nodarg' function is used to obtain a Node*
representing a function argument or result.
It returned a brand new Node*, but that violates
the guarantee in most places in the compiler that
two Node*s refer to the same variable if and only if
they are the same Node* pointer. Reestablish that
invariant by making nodarg return a preexisting
named variable if present.

Having fixed that, avoid any copy during x=x in
componentgen, because the VARDEF we emit
before the copy marks the lhs x as dead incorrectly.

The change in walk.c avoids modifying the result
of nodarg. This was the only place in the compiler
that did so.

Fixes #8097.

LGTM=r, khr
R=golang-codereviews, r, khr
CC=golang-codereviews, iant
https://golang.org/cl/102820043
2014-05-28 19:50:19 -04:00
Russ Cox
28f1868fed cmd/gc, runtime: make GODEBUG=gcdead=1 mode work with liveness
Trying to make GODEBUG=gcdead=1 work with liveness
and in particular ambiguously live variables.

1. In the liveness computation, mark all ambiguously live
variables as live for the entire function, except the entry.
They are zeroed directly after entry, and we need them not
to be poisoned thereafter.

2. In the liveness computation, compute liveness (and deadness)
for all parameters, not just pointer-containing parameters.
Otherwise gcdead poisons untracked scalar parameters and results.

3. Fix liveness debugging print for -live=2 to use correct bitmaps.
(Was not updated for compaction during compaction CL.)

4. Correct varkill during map literal initialization.
Was killing the map itself instead of the inserted value temp.

5. Disable aggressive varkill cleanup for call arguments if
the call appears in a defer or go statement.

6. In the garbage collector, avoid bug scanning empty
strings. An empty string is two zeros. The multiword
code only looked at the first zero and then interpreted
the next two bits in the bitmap as an ordinary word bitmap.
For a string the bits are 11 00, so if a live string was zero
length with a 0 base pointer, the poisoning code treated
the length as an ordinary word with code 00, meaning it
needed poisoning, turning the string into a poison-length
string with base pointer 0. By the same logic I believe that
a live nil slice (bits 11 01 00) will have its cap poisoned.
Always scan full multiword struct.

7. In the runtime, treat both poison words (PoisonGC and
PoisonStack) as invalid pointers that warrant crashes.

Manual testing as follows:

- Create a script called gcdead on your PATH containing:

        #!/bin/bash
        GODEBUG=gcdead=1 GOGC=10 GOTRACEBACK=2 exec "$@"
- Now you can build a test and then run 'gcdead ./foo.test'.
- More importantly, you can run 'go test -short -exec gcdead std'
   to run all the tests.

Fixes #7676.

While here, enable the precise scanning of slices, since that was
disabled due to bugs like these. That now works, both with and
without gcdead.

Fixes #7549.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83410044
2014-04-03 20:33:25 -04:00
Russ Cox
96d90d0981 cmd/gc: shorten even more temporary lifetimes
1. Use n->alloc, not n->left, to hold the allocated temp being
passed from orderstmt/orderexpr to walk.

2. Treat method values the same as closures.

3. Use killed temporary for composite literal passed to
non-escaping function argument.

4. Clean temporaries promptly in if and for statements.

5. Clean temporaries promptly in select statements.
As part of this, move all the temporary-generating logic
out of select.c into order.c, so that the temporaries can
be reclaimed.

With the new temporaries, can re-enable the 1-entry
select optimization. Fixes issue 7672.

While we're here, fix a 1-line bug in select processing
turned up by the new liveness test (but unrelated; select.c:72).
Fixes #7686.

6. Clean temporaries (but not particularly promptly) in switch
and range statements.

7. Clean temporary used during convT2E/convT2I.

8. Clean temporaries promptly during && and || expressions.

---

CL 81940043 reduced the number of ambiguously live temps
in the godoc binary from 860 to 711.

CL 83090046 reduced the number from 711 to 121.

This CL reduces the number from 121 to 23.

15 the 23 that remain are in fact ambiguously live.
The final 8 could be fixed but are not trivial and
not common enough to warrant work at this point
in the release cycle.

These numbers only count ambiguously live temps,
not ambiguously live user-declared variables.
There are 18 such variables in the godoc binary after this CL,
so a total of 41 ambiguously live temps or user-declared
variables.

The net effect is that zeroing anything on entry to a function
should now be a rare event, whereas earlier it was the
common case.

This is good enough for Go 1.3, and probably good
enough for future releases too.

Fixes #7345.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83000048
2014-04-02 14:09:42 -04:00
Russ Cox
daca06f2e3 cmd/gc: shorten more temporary lifetimes
1. In functions with heap-allocated result variables or with
defer statements, the return sequence requires more than
just a single RET instruction. There is an optimization that
arranges for all returns to jump to a single copy of the return
epilogue in this case. Unfortunately, that optimization is
fundamentally incompatible with PC-based liveness information:
it takes PCs at many different points in the function and makes
them all land at one PC, making the combined liveness information
at that target PC a mess. Disable this optimization, so that each
return site gets its own copy of the 'call deferreturn' and the
copying of result variables back from the heap.
This removes quite a few spurious 'ambiguously live' variables.

2. Let orderexpr allocate temporaries that are passed by address
to a function call and then die on return, so that we can arrange
an appropriate VARKILL.

2a. Do this for ... slices.

2b. Do this for closure structs.

2c. Do this for runtime.concatstring, which is the implementation
of large string additions. Change representation of OADDSTR to
an explicit list in typecheck to avoid reconstructing list in both
walk and order.

3. Let orderexpr allocate the temporary variable copies used for
range loops, so that they can be killed when the loop is over.
Similarly, let it allocate the temporary holding the map iterator.

CL 81940043 reduced the number of ambiguously live temps
in the godoc binary from 860 to 711.

This CL reduces the number to 121. Still more to do, but another
good checkpoint.

Update #7345

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83090046
2014-04-01 20:02:54 -04:00
Russ Cox
b700cb4974 cmd/gc: shorten temporary lifetimes when possible
The new channel and map runtime routines take pointers
to values, typically temporaries. Without help, the compiler
cannot tell when those temporaries stop being needed,
because it isn't sure what happened to the pointer.
Arrange to insert explicit VARKILL instructions for these
temporaries so that the liveness analysis can avoid seeing
them as "ambiguously live".

The change is made in order.c, which was already in charge of
introducing temporaries to preserve the order-of-evaluation
guarantees. Now its job has expanded to include introducing
temporaries as needed by runtime routines, and then also
inserting the VARKILL annotations for all these temporaries,
so that their lifetimes can be shortened.

In order to do its job for the map runtime routines, order.c arranges
that all map lookups or map assignments have the form:

        x = m[k]
        x, y = m[k]
        m[k] = x

where x, y, and k are simple variables (often temporaries).
Likewise, receiving from a channel is now always:

        x = <-c

In order to provide the map guarantee, order.c is responsible for
rewriting x op= y into x = x op y, so that m[k] += z becomes

        t = m[k]
        t2 = t + z
        m[k] = t2

While here, fix a few bugs in order.c's traversal: it was failing to
walk into select and switch case bodies, so order of evaluation
guarantees were not preserved in those situations.
Added tests to test/reorder2.go.

Fixes #7671.

In gc/popt's temporary-merging optimization, allow merging
of temporaries with their address taken as long as the liveness
ranges do not intersect. (There is a good chance of that now
that we have VARKILL annotations to limit the liveness range.)

Explicitly killing temporaries cuts the number of ambiguously
live temporaries that must be zeroed in the godoc binary from
860 to 711, or -17%. There is more work to be done, but this
is a good checkpoint.

Update #7345

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/81940043
2014-04-01 13:31:38 -04:00
Russ Cox
6722d45631 cmd/gc: liveness-related bug fixes
1. On entry to a function, only zero the ambiguously live stack variables.
Before, we were zeroing all stack variables containing pointers.
The zeroing is pretty inefficient right now (issue 7624), but there are also
too many stack variables detected as ambiguously live (issue 7345),
and that must be addressed before deciding how to improve the zeroing code.
(Changes in 5g/ggen.c, 6g/ggen.c, 8g/ggen.c, gc/pgen.c)

Fixes #7647.

2. Make the regopt word-based liveness analysis preserve the
whole-variable liveness property expected by the garbage collection
bitmap liveness analysis. That is, if the regopt liveness decides that
one word in a struct needs to be preserved, make sure it preserves
the entire struct. This is particularly important for multiword values
such as strings, slices, and interfaces, in which all the words need
to be present in order to understand the meaning.
(Changes in 5g/reg.c, 6g/reg.c, 8g/reg.c.)

Fixes #7591.

3. Make the regopt word-based liveness analysis treat a variable
as having its address taken - which makes it preserved across
all future calls - whenever n->addrtaken is set, for consistency
with the gc bitmap liveness analysis, even if there is no machine
instruction actually taking the address. In this case n->addrtaken
is incorrect (a nicer way to put it is overconservative), and ideally
there would be no such cases, but they can happen and the two
analyses need to agree.
(Changes in 5g/reg.c, 6g/reg.c, 8g/reg.c; test in bug484.go.)

Fixes crashes found by turning off "zero everything" in step 1.

4. Remove spurious VARDEF annotations. As the comment in
gc/pgen.c explains, the VARDEF must immediately precede
the initialization. It cannot be too early, and it cannot be too late.
In particular, if a function call sits between the VARDEF and the
actual machine instructions doing the initialization, the variable
will be treated as live during that function call even though it is
uninitialized, leading to problems.
(Changes in gc/gen.c; test in live.go.)

Fixes crashes found by turning off "zero everything" in step 1.

5. Do not treat loading the address of a wide value as a signal
that the value must be initialized. Instead depend on the existence
of a VARDEF or the first actual read/write of a word in the value.
If the load is in order to pass the address to a function that does
the actual initialization, treating the load as an implicit VARDEF
causes the same problems as described in step 4.
The alternative is to arrange to zero every such value before
passing it to the real initialization function, but this is a much
easier and more efficient change.
(Changes in gc/plive.c.)

Fixes crashes found by turning off "zero everything" in step 1.

6. Treat wide input parameters with their address taken as
initialized on entry to the function. Otherwise they look
"ambiguously live" and we will try to emit code to zero them.
(Changes in gc/plive.c.)

Fixes crashes found by turning off "zero everything" in step 1.

7. An array of length 0 has no pointers, even if the element type does.
Without this change, the zeroing code complains when asked to
clear a 0-length array.
(Changes in gc/reflect.c.)

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/80160044
2014-03-27 14:05:57 -04:00
Russ Cox
7a7c0ffb47 cmd/gc: correct liveness for fat variables
The VARDEF placement must be before the initialization
but after any final use. If you have something like s = ... using s ...
the rhs must be evaluated, then the VARDEF, then the lhs
assigned.

There is a large comment in pgen.c on gvardef explaining
this in more detail.

This CL also includes Ian's suggestions from earlier CLs,
namely commenting the use of mode in link.h and fixing
the precedence of the ~r check in dcl.c.

This CL enables the check that if liveness analysis decides
a variable is live on entry to the function, that variable must
be a function parameter (not a result, and not a local variable).
If this check fails, it indicates a bug in the liveness analysis or
in the generated code being analyzed.

The race detector generates invalid code for append(x, y...).
The code declares a temporary t and then uses cap(t) before
initializing t. The new liveness check catches this bug and
stops the compiler from writing out the buggy code.
Consequently, this CL disables the race detector tests in
run.bash until the race detector bug can be fixed
(golang.org/issue/7334).

Except for the race detector bug, the liveness analysis check
does not detect any problems (this CL and the previous CLs
fixed all the detected problems).

The net test still fails with GOGC=0 but the rest of the tests
now pass or time out (because GOGC=0 is so slow).

TBR=iant
CC=golang-codereviews
https://golang.org/cl/64170043
2014-02-15 10:58:55 -05:00
Russ Cox
af545660d5 cmd/gc: correct liveness for various non-returning functions
When the liveness code doesn't know a function doesn't return
(but the generated code understands that), the liveness analysis
invents a control flow edge that is not really there, which can cause
variables to seem spuriously live. This is particularly bad when the
variables are uninitialized.

TBR=iant
CC=golang-codereviews
https://golang.org/cl/63720043
2014-02-14 00:38:24 -05:00