1
0
mirror of https://github.com/golang/go synced 2024-11-18 13:14:47 -07:00
Commit Graph

3007 Commits

Author SHA1 Message Date
David Chase
eeff8fa453 cmd/compile: remove now-irrelevant test
This test measures "line churn" which was minimized to help
improve the debugger experience.  With proper is_stmt markers,
this is no longer necessary, and it is more accurate (for
profiling) to allow line numbers to vary willy-nilly.

"Debugger experience" is now better measured by
cmd/compile/internal/ssa/debug_test.go

This CL made the obsoleting change:
https://go-review.googlesource.com/c/go/+/102435

Change-Id: I874ab89f3b243b905aaeba7836118f632225a667
Reviewed-on: https://go-review.googlesource.com/113155
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-14 20:33:00 +00:00
David Chase
c2c1822b12 cmd/compile: assign and preserve statement boundaries.
A new pass run after ssa building (before any other
optimization) identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).

Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).

Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.

Line number html conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".

The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices.  This
causes differences in the 386 bootstrap, and also can
sometimes put values into an order that does a worse job
of preserving statement boundaries when values are removed.

Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go.  There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.

Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)

This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.

The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.

This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.

Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-14 14:09:49 +00:00
Keith Randall
599f56dc05 cmd/compile: fix zero extend after float->int conversion
Don't do direct loads from argument slots if the sizes don't match.
This prevents us from loading from a float32 using a uint64 load
during expressions like uint64(math.float32Bits(f)) where f is a float32 arg.

Fixes #25322

Change-Id: I3887d76f78c844ba546243e7721d811c3d4a9700
Reviewed-on: https://go-review.googlesource.com/112637
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-10 17:32:43 +00:00
Lynn Boger
e4172e5f5e cmd/compile/internal/ssa: initialize t.UInt in SetTypPtrs()
Initialization of t.UInt is missing from SetTypPtrs in config.go,
preventing rules that use it from matching when they should.
This adds the initialization to allow those rules to work.

Updated test/codegen/rotate.go to test for this case, which
appears in math/bits RotateLeft32 and RotateLeft64. There had been
a testcase for this in go 1.10 but that went away when asm_test.go
was removed.

Change-Id: I82fc825ad8364df6fc36a69a1e448214d2e24ed5
Reviewed-on: https://go-review.googlesource.com/112518
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-10 14:50:30 +00:00
Michael Munday
6d00e8c478 cmd/compile: convert memmove call into Move when arguments are disjoint
Move ops can be faster than memmove calls because the number of bytes
to be moved is fixed and they don't incur the overhead of a call.
This change allows memmove to be converted into a Move op when the
arguments are disjoint.

The optimization is only enabled on s390x at the moment, however
other architectures may also benefit from it in the future. The
memmove inlining rule triggers an extra 12 times when compiling the
standard library. It will most likely make more of a difference as the
disjoint function is improved over time (to recognize fresh heap
allocations for example).

Change-Id: I9af570dcfff28257b8e59e0ff584a46d8e248310
Reviewed-on: https://go-review.googlesource.com/110064
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-05-09 11:20:40 +00:00
Matt Juran
e0adc35c47 test: fast GC+concurrency+types verification
This test runs independent goroutines modifying a comprehensive variety
of local vars to look for garbage collector regressions. This test has
been verified to trigger issue 22781 on the go1.9.2 tag. This test
expands on test/fixedbugs/issue22781.go.

Tests #22781

Change-Id: Id32f8dde7ef650aea1b1b4cf518e6d045537bfdc
Reviewed-on: https://go-review.googlesource.com/93715
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-08 21:15:48 +00:00
Martin Möhrmann
aee71dd70b cmd/compile: optimize map-clearing range idiom
replace map clears of the form:

        for k := range m {
                delete(m, k)
        }

(where m is map with key type that is reflexive for ==)
with a new runtime function that clears the maps backing
array with a memclr and reinitializes the hmap struct.

Map key types that for example contain floats are not
replaced by this optimization since NaN keys cannot
be deleted from maps using delete.

name                           old time/op  new time/op  delta
GoMapClear/Reflexive/1         92.2ns ± 1%  47.1ns ± 2%  -48.89%  (p=0.000 n=9+9)
GoMapClear/Reflexive/10         108ns ± 1%    48ns ± 2%  -55.68%  (p=0.000 n=10+10)
GoMapClear/Reflexive/100        303ns ± 2%   110ns ± 3%  -63.56%  (p=0.000 n=10+10)
GoMapClear/Reflexive/1000      3.58µs ± 3%  1.23µs ± 2%  -65.49%  (p=0.000 n=9+10)
GoMapClear/Reflexive/10000     28.2µs ± 3%  10.3µs ± 2%  -63.55%  (p=0.000 n=9+10)
GoMapClear/NonReflexive/1       121ns ± 2%   124ns ± 7%     ~     (p=0.097 n=10+10)
GoMapClear/NonReflexive/10      137ns ± 2%   139ns ± 3%   +1.53%  (p=0.033 n=10+10)
GoMapClear/NonReflexive/100     331ns ± 3%   334ns ± 2%     ~     (p=0.342 n=10+10)
GoMapClear/NonReflexive/1000   3.64µs ± 3%  3.64µs ± 2%     ~     (p=0.887 n=9+10)
GoMapClear/NonReflexive/10000  28.1µs ± 2%  28.4µs ± 3%     ~     (p=0.247 n=10+10)

Fixes #20138

Change-Id: I181332a8ef434a4f0d89659f492d8711db3f3213
Reviewed-on: https://go-review.googlesource.com/110055
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 21:15:16 +00:00
Michael Munday
8af0c77df3 cmd/compile: simplify shift lowering on s390x
Use conditional moves instead of subtractions with borrow to handle
saturation cases. This allows us to delete the SUBE/SUBEW ops and
associated rules from the SSA backend. Using conditional moves also
means we can detect when shift values are masked so I've added some
new rules to constant fold the relevant comparisons and masking ops.

Also use the new shiftIsBounded() function to avoid generating code
to handle saturation cases where possible.

Updates #25167 for s390x.

Change-Id: Ief9991c91267c9151ce4c5ec07642abb4dcc1c0d
Reviewed-on: https://go-review.googlesource.com/110070
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-08 16:19:56 +00:00
Lynn Boger
28edaf4584 cmd/compile,test: combine byte loads and stores on ppc64le
CL 74410 added rules to combine consecutive byte loads and
stores when the byte order was little endian for ppc64le. This
is the corresponding change for bytes that are in big endian order.
These rules are all intended for a little endian target arch.

This adds new testcases in test/codegen/memcombine.go

Fixes #22496
Updates #24242

Benchmark improvement for encoding/binary:
name                      old time/op    new time/op    delta
ReadSlice1000Int32s-16      11.0µs ± 0%     9.0µs ± 0%  -17.47%  (p=0.029 n=4+4)
ReadStruct-16               2.47µs ± 1%    2.48µs ± 0%   +0.67%  (p=0.114 n=4+4)
ReadInts-16                  642ns ± 1%     630ns ± 1%   -2.02%  (p=0.029 n=4+4)
WriteInts-16                 654ns ± 0%     653ns ± 1%   -0.08%  (p=0.629 n=4+4)
WriteSlice1000Int32s-16     8.75µs ± 0%    8.20µs ± 0%   -6.19%  (p=0.029 n=4+4)
PutUint16-16                1.16ns ± 0%    0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint32-16                1.16ns ± 0%    0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint64-16                1.85ns ± 0%    0.93ns ± 0%  -49.73%  (p=0.029 n=4+4)
LittleEndianPutUint16-16    1.03ns ± 0%    0.93ns ± 0%   -9.71%  (p=0.029 n=4+4)
LittleEndianPutUint32-16    0.93ns ± 0%    0.93ns ± 0%     ~     (all equal)
LittleEndianPutUint64-16    0.93ns ± 0%    0.93ns ± 0%     ~     (all equal)
PutUvarint32-16             43.0ns ± 0%    43.1ns ± 0%   +0.12%  (p=0.429 n=4+4)
PutUvarint64-16              174ns ± 0%     175ns ± 0%   +0.29%  (p=0.429 n=4+4)

Updates made to functions in gcm.go to enable their matching. An existing
testcase prevents these functions from being replaced by those in encoding/binary
due to import dependencies.

Change-Id: Idb3bd1e6e7b12d86cd828fb29cb095848a3e485a
Reviewed-on: https://go-review.googlesource.com/98136
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 13:15:39 +00:00
Michael Munday
f31a18ded4 cmd/compile: add some generic composite type optimizations
Propagate values through some wide Zero/Move operations. Among
other things this allows us to optimize some kinds of array
initialization. For example, the following code no longer
requires a temporary be allocated on the stack. Instead it
writes the values directly into the return value.

func f(i uint32) [4]uint32 {
    return [4]uint32{i, i+1, i+2, i+3}
}

The return value is unnecessarily cleared but removing that is
probably a task for dead store analysis (I think it needs to
be able to match multiple Store ops to wide Zero ops).

In order to reliably remove stack variables that are rendered
unnecessary by these new rules I've added a new generic version
of the unread autos elimination pass.

These rules are triggered more than 5000 times when building and
testing the standard library.

Updates #15925 (fixes for arrays of up to 4 elements).
Updates #24386 (fixes for up to 4 kept elements).
Updates #24416.

compilebench results:

name       old time/op       new time/op       delta
Template         353ms ± 5%        359ms ± 3%    ~     (p=0.143 n=10+10)
Unicode          219ms ± 1%        217ms ± 4%    ~     (p=0.740 n=7+10)
GoTypes          1.26s ± 1%        1.26s ± 2%    ~     (p=0.549 n=9+10)
Compiler         6.00s ± 1%        6.08s ± 1%  +1.42%  (p=0.000 n=9+8)
SSA              15.3s ± 2%        15.6s ± 1%  +2.43%  (p=0.000 n=10+10)
Flate            237ms ± 2%        240ms ± 2%  +1.31%  (p=0.015 n=10+10)
GoParser         285ms ± 1%        285ms ± 1%    ~     (p=0.878 n=8+8)
Reflect          797ms ± 3%        807ms ± 2%    ~     (p=0.065 n=9+10)
Tar              334ms ± 0%        335ms ± 4%    ~     (p=0.460 n=8+10)
XML              419ms ± 0%        423ms ± 1%  +0.91%  (p=0.001 n=7+9)
StdCmd           46.0s ± 0%        46.4s ± 0%  +0.85%  (p=0.000 n=9+9)

name       old user-time/op  new user-time/op  delta
Template         337ms ± 3%        346ms ± 5%    ~     (p=0.053 n=9+10)
Unicode          205ms ±10%        205ms ± 8%    ~     (p=1.000 n=10+10)
GoTypes          1.22s ± 2%        1.21s ± 3%    ~     (p=0.436 n=10+10)
Compiler         5.85s ± 1%        5.93s ± 0%  +1.46%  (p=0.000 n=10+8)
SSA              14.9s ± 1%        15.3s ± 1%  +2.62%  (p=0.000 n=10+10)
Flate            229ms ± 4%        228ms ± 6%    ~     (p=0.796 n=10+10)
GoParser         271ms ± 3%        275ms ± 4%    ~     (p=0.165 n=10+10)
Reflect          779ms ± 5%        775ms ± 2%    ~     (p=0.971 n=10+10)
Tar              317ms ± 4%        319ms ± 5%    ~     (p=0.853 n=10+10)
XML              404ms ± 4%        409ms ± 5%    ~     (p=0.436 n=10+10)

name       old alloc/op      new alloc/op      delta
Template        34.9MB ± 0%       35.0MB ± 0%  +0.26%  (p=0.000 n=10+10)
Unicode         29.3MB ± 0%       29.3MB ± 0%  +0.02%  (p=0.000 n=10+10)
GoTypes          115MB ± 0%        115MB ± 0%  +0.30%  (p=0.000 n=10+10)
Compiler         519MB ± 0%        521MB ± 0%  +0.30%  (p=0.000 n=10+10)
SSA             1.55GB ± 0%       1.57GB ± 0%  +1.34%  (p=0.000 n=10+9)
Flate           24.1MB ± 0%       24.2MB ± 0%  +0.10%  (p=0.000 n=10+10)
GoParser        28.1MB ± 0%       28.1MB ± 0%  +0.07%  (p=0.000 n=10+10)
Reflect         78.7MB ± 0%       78.7MB ± 0%  +0.03%  (p=0.000 n=8+10)
Tar             34.4MB ± 0%       34.5MB ± 0%  +0.12%  (p=0.000 n=10+10)
XML             43.2MB ± 0%       43.2MB ± 0%  +0.13%  (p=0.000 n=10+10)

name       old allocs/op     new allocs/op     delta
Template          330k ± 0%         330k ± 0%  -0.01%  (p=0.017 n=10+10)
Unicode           337k ± 0%         337k ± 0%  +0.01%  (p=0.000 n=9+10)
GoTypes          1.15M ± 0%        1.15M ± 0%  +0.03%  (p=0.000 n=10+10)
Compiler         4.77M ± 0%        4.77M ± 0%  +0.03%  (p=0.000 n=9+10)
SSA              12.5M ± 0%        12.6M ± 0%  +1.16%  (p=0.000 n=10+10)
Flate             221k ± 0%         221k ± 0%  +0.05%  (p=0.000 n=9+10)
GoParser          275k ± 0%         275k ± 0%  +0.01%  (p=0.014 n=10+9)
Reflect           944k ± 0%         944k ± 0%  -0.02%  (p=0.000 n=10+10)
Tar               324k ± 0%         323k ± 0%  -0.12%  (p=0.000 n=10+10)
XML               384k ± 0%         384k ± 0%  -0.01%  (p=0.001 n=10+10)

name       old object-bytes  new object-bytes  delta
Template         476kB ± 0%        476kB ± 0%  -0.04%  (p=0.000 n=10+10)
Unicode          218kB ± 0%        218kB ± 0%    ~     (all equal)
GoTypes         1.58MB ± 0%       1.58MB ± 0%  -0.04%  (p=0.000 n=10+10)
Compiler        6.25MB ± 0%       6.24MB ± 0%  -0.09%  (p=0.000 n=10+10)
SSA             15.9MB ± 0%       16.1MB ± 0%  +1.22%  (p=0.000 n=10+10)
Flate            304kB ± 0%        304kB ± 0%  -0.13%  (p=0.000 n=10+10)
GoParser         370kB ± 0%        370kB ± 0%  -0.00%  (p=0.000 n=10+10)
Reflect         1.27MB ± 0%       1.27MB ± 0%  -0.12%  (p=0.000 n=10+10)
Tar              421kB ± 0%        419kB ± 0%  -0.64%  (p=0.000 n=10+10)
XML              518kB ± 0%        517kB ± 0%  -0.12%  (p=0.000 n=10+10)

name       old export-bytes  new export-bytes  delta
Template        16.7kB ± 0%       16.7kB ± 0%    ~     (all equal)
Unicode         6.52kB ± 0%       6.52kB ± 0%    ~     (all equal)
GoTypes         29.2kB ± 0%       29.2kB ± 0%    ~     (all equal)
Compiler        88.0kB ± 0%       88.0kB ± 0%    ~     (all equal)
SSA              109kB ± 0%        109kB ± 0%    ~     (all equal)
Flate           4.49kB ± 0%       4.49kB ± 0%    ~     (all equal)
GoParser        8.10kB ± 0%       8.10kB ± 0%    ~     (all equal)
Reflect         7.71kB ± 0%       7.71kB ± 0%    ~     (all equal)
Tar             9.15kB ± 0%       9.15kB ± 0%    ~     (all equal)
XML             12.3kB ± 0%       12.3kB ± 0%    ~     (all equal)

name       old text-bytes    new text-bytes    delta
HelloSize        676kB ± 0%        672kB ± 0%  -0.59%  (p=0.000 n=10+10)
CmdGoSize       7.26MB ± 0%       7.24MB ± 0%  -0.18%  (p=0.000 n=10+10)

name       old data-bytes    new data-bytes    delta
HelloSize       10.2kB ± 0%       10.2kB ± 0%    ~     (all equal)
CmdGoSize        248kB ± 0%        248kB ± 0%    ~     (all equal)

name       old bss-bytes     new bss-bytes     delta
HelloSize        125kB ± 0%        125kB ± 0%    ~     (all equal)
CmdGoSize        145kB ± 0%        145kB ± 0%    ~     (all equal)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.46MB ± 0%       1.45MB ± 0%  -0.31%  (p=0.000 n=10+10)
CmdGoSize       14.7MB ± 0%       14.7MB ± 0%  -0.17%  (p=0.000 n=10+10)

Change-Id: Ic72b0c189dd542f391e1c9ab88a76e9148dc4285
Reviewed-on: https://go-review.googlesource.com/106495
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 10:31:21 +00:00
Ben Shi
098ca846c7 cmd/compile: emit more compact 386 instructions
ADDL/SUBL/ANDL/ORL/XORL can have a memory operand as destination,
and this CL optimize the compiler to emit such instructions on
386 for more compact binary.

Here is test report:
1. The total size of pkg/linux_386/ and pkg/tool/linux_386/ decreases
about 14KB.
(pkg/linux_386/cmd/compile/ and pkg/tool/linux_386/compile are excluded)

2. The go1 benchmark shows little change, excluding ±2% noise.
name                     old time/op    new time/op    delta
BinaryTree17-4              3.34s ± 2%     3.38s ± 2%  +1.27%  (p=0.000 n=40+39)
Fannkuch11-4                3.55s ± 1%     3.51s ± 1%  -1.33%  (p=0.000 n=40+40)
FmtFprintfEmpty-4          46.3ns ± 3%    46.9ns ± 4%  +1.41%  (p=0.002 n=40+40)
FmtFprintfString-4         80.8ns ± 3%    80.4ns ± 6%  -0.54%  (p=0.044 n=40+40)
FmtFprintfInt-4            93.0ns ± 3%    92.2ns ± 4%  -0.88%  (p=0.007 n=39+40)
FmtFprintfIntInt-4          144ns ± 5%     145ns ± 2%  +0.78%  (p=0.015 n=40+40)
FmtFprintfPrefixedInt-4     184ns ± 2%     182ns ± 2%  -1.06%  (p=0.004 n=40+40)
FmtFprintfFloat-4           415ns ± 4%     419ns ± 4%    ~     (p=0.434 n=40+40)
FmtManyArgs-4               615ns ± 3%     619ns ± 3%    ~     (p=0.100 n=40+40)
GobDecode-4                7.30ms ± 6%    7.36ms ± 6%    ~     (p=0.074 n=40+40)
GobEncode-4                7.10ms ± 6%    7.21ms ± 5%    ~     (p=0.082 n=40+39)
Gzip-4                      364ms ± 3%     362ms ± 6%  -0.71%  (p=0.020 n=40+40)
Gunzip-4                   42.4ms ± 3%    42.2ms ± 3%    ~     (p=0.303 n=40+40)
HTTPClientServer-4         62.9µs ± 1%    62.9µs ± 1%    ~     (p=0.768 n=38+39)
JSONEncode-4               21.4ms ± 4%    21.5ms ± 5%    ~     (p=0.210 n=40+40)
JSONDecode-4               67.7ms ± 3%    67.9ms ± 4%    ~     (p=0.713 n=40+40)
Mandelbrot200-4            5.18ms ± 3%    5.21ms ± 3%  +0.59%  (p=0.021 n=40+40)
GoParse-4                  3.35ms ± 3%    3.34ms ± 2%    ~     (p=0.996 n=40+40)
RegexpMatchEasy0_32-4      98.5ns ± 5%    96.3ns ± 4%  -2.15%  (p=0.001 n=40+40)
RegexpMatchEasy0_1K-4       851ns ± 4%     850ns ± 5%    ~     (p=0.700 n=40+40)
RegexpMatchEasy1_32-4       105ns ± 7%     107ns ± 4%  +1.50%  (p=0.017 n=40+40)
RegexpMatchEasy1_1K-4      1.03µs ± 5%    1.03µs ± 4%    ~     (p=0.992 n=40+40)
RegexpMatchMedium_32-4      130ns ± 6%     128ns ± 4%  -1.66%  (p=0.012 n=40+40)
RegexpMatchMedium_1K-4     44.0µs ± 5%    43.6µs ± 3%    ~     (p=0.704 n=40+40)
RegexpMatchHard_32-4       2.29µs ± 3%    2.23µs ± 4%  -2.38%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4       69.0µs ± 3%    68.1µs ± 3%  -1.28%  (p=0.003 n=40+40)
Revcomp-4                   1.85s ± 2%     1.87s ± 3%  +1.11%  (p=0.000 n=40+40)
Template-4                 69.8ms ± 3%    69.6ms ± 3%    ~     (p=0.125 n=40+40)
TimeParse-4                 442ns ± 5%     440ns ± 3%    ~     (p=0.585 n=40+40)
TimeFormat-4                419ns ± 3%     420ns ± 3%    ~     (p=0.824 n=40+40)
[Geo mean]                 67.3µs         67.2µs       -0.11%

name                     old speed      new speed      delta
GobDecode-4               105MB/s ± 6%   104MB/s ± 6%    ~     (p=0.074 n=40+40)
GobEncode-4               108MB/s ± 7%   107MB/s ± 5%    ~     (p=0.080 n=40+39)
Gzip-4                   53.3MB/s ± 3%  53.7MB/s ± 6%  +0.73%  (p=0.021 n=40+40)
Gunzip-4                  458MB/s ± 3%   460MB/s ± 3%    ~     (p=0.301 n=40+40)
JSONEncode-4             90.8MB/s ± 4%  90.3MB/s ± 4%    ~     (p=0.213 n=40+40)
JSONDecode-4             28.7MB/s ± 3%  28.6MB/s ± 4%    ~     (p=0.679 n=40+40)
GoParse-4                17.3MB/s ± 3%  17.3MB/s ± 2%    ~     (p=1.000 n=40+40)
RegexpMatchEasy0_32-4     325MB/s ± 5%   333MB/s ± 4%  +2.44%  (p=0.000 n=40+38)
RegexpMatchEasy0_1K-4    1.20GB/s ± 4%  1.21GB/s ± 5%    ~     (p=0.684 n=40+40)
RegexpMatchEasy1_32-4     303MB/s ± 7%   298MB/s ± 4%  -1.52%  (p=0.022 n=40+40)
RegexpMatchEasy1_1K-4     995MB/s ± 5%   996MB/s ± 4%    ~     (p=0.996 n=40+40)
RegexpMatchMedium_32-4   7.67MB/s ± 6%  7.80MB/s ± 4%  +1.68%  (p=0.011 n=40+40)
RegexpMatchMedium_1K-4   23.3MB/s ± 5%  23.5MB/s ± 3%    ~     (p=0.697 n=40+40)
RegexpMatchHard_32-4     14.0MB/s ± 3%  14.3MB/s ± 4%  +2.43%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4     14.8MB/s ± 3%  15.0MB/s ± 3%  +1.30%  (p=0.003 n=40+40)
Revcomp-4                 137MB/s ± 2%   136MB/s ± 3%  -1.10%  (p=0.000 n=40+40)
Template-4               27.8MB/s ± 3%  27.9MB/s ± 3%    ~     (p=0.128 n=40+40)
[Geo mean]               79.6MB/s       79.9MB/s       +0.28%

Change-Id: I02a3efc125dc81e18fc8495eb2bf1bba59ab8733
Reviewed-on: https://go-review.googlesource.com/110157
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-05-08 06:44:54 +00:00
Martin Möhrmann
b9a59d9f2e cmd/compile: optimize len([]rune(string))
Adds a new runtime function to count runes in a string.
Modifies the compiler to detect the pattern len([]rune(string))
and replaces it with the new rune counting runtime function.

RuneCount/lenruneslice/ASCII                  27.8ns ± 2%  14.5ns ± 3%  -47.70%  (p=0.000 n=10+10)
RuneCount/lenruneslice/Japanese                126ns ± 2%    60ns ± 2%  -52.03%  (p=0.000 n=10+10)
RuneCount/lenruneslice/MixedLength             104ns ± 2%    50ns ± 1%  -51.71%  (p=0.000 n=10+9)

Fixes #24923

Change-Id: Ie9c7e7391a4e2cca675c5cdcc1e5ce7d523948b9
Reviewed-on: https://go-review.googlesource.com/108985
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-05-06 05:31:01 +00:00
Martin Möhrmann
a8a60ac2a7 cmd/compile: optimize append(x, make([]T, y)...) slice extension
Changes the compiler to recognize the slice extension pattern

  append(x, make([]T, y)...)

and replace it with growslice and an optional memclr to avoid an allocation for make([]T, y).

Memclr is not called in case growslice already allocated a new cleared backing array
when T contains pointers.

amd64:
name                      old time/op    new time/op    delta
ExtendSlice/IntSlice         103ns ± 4%      57ns ± 4%   -44.55%  (p=0.000 n=18+18)
ExtendSlice/PointerSlice     155ns ± 3%      77ns ± 3%   -49.93%  (p=0.000 n=20+20)
ExtendSlice/NoGrow          50.2ns ± 3%     5.2ns ± 2%   -89.67%  (p=0.000 n=18+18)

name                      old alloc/op   new alloc/op   delta
ExtendSlice/IntSlice         64.0B ± 0%     32.0B ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/PointerSlice     64.0B ± 0%     32.0B ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/NoGrow           32.0B ± 0%      0.0B       -100.00%  (p=0.000 n=20+20)

name                      old allocs/op  new allocs/op  delta
ExtendSlice/IntSlice          2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/PointerSlice      2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/NoGrow            1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)

Fixes #21266

Change-Id: Idc3077665f63cbe89762b590c5967a864fd1c07f
Reviewed-on: https://go-review.googlesource.com/109517
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-05-06 04:28:23 +00:00
Martin Möhrmann
500d79c410 cmd/compile: refactor memclrrange for arrays and slices
Rename memclrrange to signify that it does not handle
all types of range clears.

Simplify checks to detect the range clear idiom for
arrays and slices.

Add tests to verify the optimization for the slice
range clear idiom is being applied by the compiler.

Change-Id: I5c3b7c9a479699ebdb4c407fde692f30f377860c
Reviewed-on: https://go-review.googlesource.com/110477
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-02 04:20:25 +00:00
Matthew Dempsky
004260afde cmd/compile: open code select{send,recv,default}
Registration now looks like:

        var cases [4]runtime.scases
        var order [8]uint16
	cases[0].kind = caseSend
	cases[0].c = c1
	cases[0].elem = &v1
	if raceenabled || msanenabled {
		selectsetpc(&cases[0])
	}
	cases[1].kind = caseRecv
	cases[1].c = c2
	cases[1].elem = &v2
	if raceenabled || msanenabled {
		selectsetpc(&cases[1])
	}
	...

Change-Id: Ib9bcf426a4797fe4bfd8152ca9e6e08e39a70b48
Reviewed-on: https://go-review.googlesource.com/37934
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 03:17:44 +00:00
Matthew Dempsky
3aa53b3135 runtime: eliminate runtime.hselect
Now the registration phase looks like:

    var cases [4]runtime.scases
    var order [8]uint16
    selectsend(&cases[0], c1, &v1)
    selectrecv(&cases[1], c2, &v2, nil)
    selectrecv(&cases[2], c3, &v3, &ok)
    selectdefault(&cases[3])
    chosen := selectgo(&cases[0], &order[0], 4)

Primarily, this is just preparation for having the compiler open-code
selectsend, selectrecv, and selectdefault.

As a minor benefit, order can now be layed out separately on the stack
in the pointer-free segment, so it won't take up space in the
function's stack pointer maps.

Change-Id: I5552ba594201efd31fcb40084da20b42ea569a45
Reviewed-on: https://go-review.googlesource.com/37933
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 03:17:31 +00:00
Richard Musiol
e3c684777a all: skip unsupported tests for js/wasm
The general policy for the current state of js/wasm is that it only
has to support tests that are also supported by nacl.

The test nilptr3.go makes assumptions about which nil checks can be
removed. Since WebAssembly does not signal on reading a null pointer,
all nil checks have to be explicit.

Updates #18892

Change-Id: I06a687860b8d22ae26b1c391499c0f5183e4c485
Reviewed-on: https://go-review.googlesource.com/110096
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-30 19:39:18 +00:00
Giovanni Bajo
e0d37a33ab cmd/compile: teach prove to handle expressions like len(s)-delta
When a loop has bound len(s)-delta, findIndVar detected it and
returned len(s) as (conservative) upper bound. This little lie
allowed loopbce to drop bound checks.

It is obviously more generic to teach prove about relations like
x+d<w for non-constant "w"; we already handled the case for
constant "w", so we just want to learn that if d<0, then x+d<w
proves that x<w.

To be able to remove the code from findIndVar, we also need
to teach prove that len() and cap() are always non-negative.

This CL allows to prove 633 more checks in cmd+std. Most
of them are cases where the code was already testing before
accessing a slice but the compiler didn't know it. For instance,
take strings.HasSuffix:

    func HasSuffix(s, suffix string) bool {
        return len(s) >= len(suffix) && s[len(s)-len(suffix):] == suffix
    }

When suffix is a literal string, the compiler now understands
that the explicit check is enough to not emit a slice check.

I also found a loopbce test that was incorrectly
written to detect an overflow but had a off-by-one (on the
conservative side), so it unexpectly passed with this CL; I
changed it to really trigger the overflow as intended.

Change-Id: Ib5abade337db46b8811425afebad4719b6e46c4a
Reviewed-on: https://go-review.googlesource.com/105635
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:38:32 +00:00
Giovanni Bajo
6d379add0f cmd/compile: in prove, detect loops with negative increments
To be effective, this also requires being able to relax constraints
on min/max bound inclusiveness; they are now exposed through a flags,
and prove has been updated to handle it correctly.

Change-Id: I3490e54461b7b9de8bc4ae40d3b5e2fa2d9f0556
Reviewed-on: https://go-review.googlesource.com/104041
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:38:18 +00:00
Giovanni Bajo
980fdb8dd5 cmd/compile: improve testing of induction variables
Test both minimum and maximum bound, and prepare
formatting for more advanced tests (inclusive / esclusive bounds).

Change-Id: Ibe432916d9c938343bc07943798bc9709ad71845
Reviewed-on: https://go-review.googlesource.com/104040
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-29 09:38:09 +00:00
Giovanni Bajo
7ec25d0acf cmd/compile: implement loop BCE in prove
Reuse findIndVar to discover induction variables, and then
register the facts we know about them into the facts table
when entering the loop block.

Moreover, handle "x+delta > w" while updating the facts table,
to be able to prove accesses to slices with constant offsets
such as slice[i-10].

Change-Id: I2a63d050ed58258136d54712ac7015b25c893d71
Reviewed-on: https://go-review.googlesource.com/104038
Run-TryBot: Giovanni Bajo <rasky@develer.com>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:37:35 +00:00
Giovanni Bajo
29162ec9a7 cmd/compile: in prove, infer unsigned relations while branching
When a branch is followed, we apply the relation as described
in the domain relation table. In case the relation is in the
positive domain, we can also infer an unsigned relation if,
by that point, we know that both operands are non-negative.

Fixes #20393

Change-Id: Ieaf0c81558b36d96616abae3eb834c788dd278d5
Reviewed-on: https://go-review.googlesource.com/100278
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:37:15 +00:00
Giovanni Bajo
5c40210987 cmd/compile: in prove, add transitive closure of relations
Implement it through a partial order datastructure, which
keeps the relations between SSA values in a forest of DAGs
and is able to discover contradictions.

In make.bash, this patch is able to prove hundreds of conditions
which were not proved before.

Compilebench:

name       old time/op       new time/op       delta
Template         371ms ± 2%        368ms ± 1%    ~     (p=0.222 n=5+5)
Unicode          203ms ± 6%        199ms ± 3%    ~     (p=0.421 n=5+5)
GoTypes          1.17s ± 4%        1.18s ± 1%    ~     (p=0.151 n=5+5)
Compiler         5.54s ± 2%        5.59s ± 1%    ~     (p=0.548 n=5+5)
SSA              12.9s ± 2%        13.2s ± 1%  +2.96%  (p=0.032 n=5+5)
Flate            245ms ± 2%        247ms ± 3%    ~     (p=0.690 n=5+5)
GoParser         302ms ± 6%        302ms ± 4%    ~     (p=0.548 n=5+5)
Reflect          764ms ± 4%        773ms ± 3%    ~     (p=0.095 n=5+5)
Tar              354ms ± 6%        361ms ± 3%    ~     (p=0.222 n=5+5)
XML              434ms ± 3%        429ms ± 1%    ~     (p=0.421 n=5+5)
StdCmd           22.6s ± 1%        22.9s ± 1%  +1.40%  (p=0.032 n=5+5)

name       old user-time/op  new user-time/op  delta
Template         436ms ± 8%        426ms ± 5%    ~     (p=0.579 n=5+5)
Unicode          219ms ±15%        219ms ±12%    ~     (p=1.000 n=5+5)
GoTypes          1.47s ± 6%        1.53s ± 6%    ~     (p=0.222 n=5+5)
Compiler         7.26s ± 4%        7.40s ± 2%    ~     (p=0.389 n=5+5)
SSA              17.7s ± 4%        18.5s ± 4%  +4.13%  (p=0.032 n=5+5)
Flate            257ms ± 5%        268ms ± 9%    ~     (p=0.333 n=5+5)
GoParser         354ms ± 6%        348ms ± 6%    ~     (p=0.913 n=5+5)
Reflect          904ms ± 2%        944ms ± 4%    ~     (p=0.056 n=5+5)
Tar              398ms ±11%        430ms ± 7%    ~     (p=0.079 n=5+5)
XML              501ms ± 7%        489ms ± 5%    ~     (p=0.444 n=5+5)

name       old text-bytes    new text-bytes    delta
HelloSize        670kB ± 0%        670kB ± 0%  +0.00%  (p=0.008 n=5+5)
CmdGoSize       7.22MB ± 0%       7.21MB ± 0%  -0.07%  (p=0.008 n=5+5)

name       old data-bytes    new data-bytes    delta
HelloSize       9.88kB ± 0%       9.88kB ± 0%    ~     (all equal)
CmdGoSize        248kB ± 0%        248kB ± 0%  -0.06%  (p=0.008 n=5+5)

name       old bss-bytes     new bss-bytes     delta
HelloSize        125kB ± 0%        125kB ± 0%    ~     (all equal)
CmdGoSize        145kB ± 0%        144kB ± 0%  -0.20%  (p=0.008 n=5+5)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.43MB ± 0%       1.43MB ± 0%    ~     (all equal)
CmdGoSize       14.5MB ± 0%       14.5MB ± 0%  -0.06%  (p=0.008 n=5+5)

Fixes #19714
Updates #20393

Change-Id: Ia090f5b5dc1bcd274ba8a39b233c1e1ace1b330e
Reviewed-on: https://go-review.googlesource.com/100277
Run-TryBot: Giovanni Bajo <rasky@develer.com>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:35:39 +00:00
Ben Shi
aaf73c6d1e cmd/compile: optimize ARM64 with shifted register indexed load/store
ARM64 supports efficient instructions which combine shift, addition, load/store
together. Such as "MOVD (R0)(R1<<3), R2" and "MOVWU R6, (R4)(R1<<2)".

This CL optimizes the compiler to emit such efficient instuctions. And below
is some test data.

1. binary size before/after
binary                 size change
pkg/linux_arm64        +80.1KB
pkg/tool/linux_arm64   +121.9KB
go                     -4.3KB
gofmt                  -64KB

2. go1 benchmark
There is big improvement for the test case Fannkuch11, and slight
improvement for sme others, excluding noise.

name                     old time/op    new time/op    delta
BinaryTree17-4              43.9s ± 2%     44.0s ± 2%     ~     (p=0.820 n=30+30)
Fannkuch11-4                30.6s ± 2%     24.5s ± 3%  -19.93%  (p=0.000 n=25+30)
FmtFprintfEmpty-4           500ns ± 0%     499ns ± 0%   -0.11%  (p=0.000 n=23+25)
FmtFprintfString-4         1.03µs ± 0%    1.04µs ± 3%     ~     (p=0.065 n=29+30)
FmtFprintfInt-4            1.15µs ± 3%    1.15µs ± 4%   -0.56%  (p=0.000 n=30+30)
FmtFprintfIntInt-4         1.80µs ± 5%    1.82µs ± 0%     ~     (p=0.094 n=30+24)
FmtFprintfPrefixedInt-4    2.17µs ± 5%    2.20µs ± 0%     ~     (p=0.100 n=30+23)
FmtFprintfFloat-4          3.08µs ± 3%    3.09µs ± 4%     ~     (p=0.123 n=30+30)
FmtManyArgs-4              7.41µs ± 4%    7.17µs ± 1%   -3.26%  (p=0.000 n=30+23)
GobDecode-4                93.7ms ± 0%    94.7ms ± 4%     ~     (p=0.685 n=24+30)
GobEncode-4                78.7ms ± 7%    77.1ms ± 0%     ~     (p=0.729 n=30+23)
Gzip-4                      4.01s ± 0%     3.97s ± 5%   -1.11%  (p=0.037 n=24+30)
Gunzip-4                    389ms ± 4%     384ms ± 0%     ~     (p=0.155 n=30+23)
HTTPClientServer-4          536µs ± 1%     537µs ± 1%     ~     (p=0.236 n=30+30)
JSONEncode-4                179ms ± 1%     182ms ± 6%     ~     (p=0.763 n=24+30)
JSONDecode-4                843ms ± 0%     839ms ± 6%   -0.42%  (p=0.003 n=25+30)
Mandelbrot200-4            46.5ms ± 0%    46.5ms ± 0%   +0.02%  (p=0.000 n=26+26)
GoParse-4                  44.3ms ± 6%    43.3ms ± 0%     ~     (p=0.067 n=30+27)
RegexpMatchEasy0_32-4      1.07µs ± 7%    1.07µs ± 4%     ~     (p=0.835 n=30+30)
RegexpMatchEasy0_1K-4      5.51µs ± 0%    5.49µs ± 0%   -0.35%  (p=0.000 n=23+26)
RegexpMatchEasy1_32-4      1.01µs ± 0%    1.02µs ± 4%   +0.96%  (p=0.014 n=24+30)
RegexpMatchEasy1_1K-4      7.43µs ± 0%    7.18µs ± 0%   -3.41%  (p=0.000 n=23+24)
RegexpMatchMedium_32-4     1.78µs ± 0%    1.81µs ± 4%   +1.47%  (p=0.012 n=23+30)
RegexpMatchMedium_1K-4      547µs ± 1%     542µs ± 3%   -0.90%  (p=0.003 n=24+30)
RegexpMatchHard_32-4       30.4µs ± 0%    29.7µs ± 0%   -2.15%  (p=0.000 n=19+23)
RegexpMatchHard_1K-4        913µs ± 0%     915µs ± 6%   +0.25%  (p=0.012 n=24+30)
Revcomp-4                   6.32s ± 1%     6.42s ± 4%     ~     (p=0.342 n=25+30)
Template-4                  868ms ± 6%     878ms ± 6%   +1.15%  (p=0.000 n=30+30)
TimeParse-4                4.57µs ± 4%    4.59µs ± 3%   +0.65%  (p=0.010 n=29+30)
TimeFormat-4               4.51µs ± 0%    4.50µs ± 0%   -0.27%  (p=0.000 n=27+24)
[Geo mean]                  695µs          689µs        -0.92%

name                     old speed      new speed      delta
GobDecode-4              8.19MB/s ± 0%  8.12MB/s ± 4%     ~     (p=0.680 n=24+30)
GobEncode-4              9.76MB/s ± 7%  9.96MB/s ± 0%     ~     (p=0.616 n=30+23)
Gzip-4                   4.84MB/s ± 0%  4.89MB/s ± 4%   +1.16%  (p=0.030 n=24+30)
Gunzip-4                 49.9MB/s ± 4%  50.6MB/s ± 0%     ~     (p=0.162 n=30+23)
JSONEncode-4             10.9MB/s ± 1%  10.7MB/s ± 6%     ~     (p=0.575 n=24+30)
JSONDecode-4             2.30MB/s ± 0%  2.32MB/s ± 5%   +0.72%  (p=0.003 n=22+30)
GoParse-4                1.31MB/s ± 6%  1.34MB/s ± 0%   +2.26%  (p=0.002 n=30+27)
RegexpMatchEasy0_32-4    30.0MB/s ± 6%  30.0MB/s ± 4%     ~     (p=1.000 n=30+30)
RegexpMatchEasy0_1K-4     186MB/s ± 0%   187MB/s ± 0%   +0.35%  (p=0.000 n=23+26)
RegexpMatchEasy1_32-4    31.8MB/s ± 0%  31.5MB/s ± 4%   -0.92%  (p=0.012 n=25+30)
RegexpMatchEasy1_1K-4     138MB/s ± 0%   143MB/s ± 0%   +3.53%  (p=0.000 n=23+24)
RegexpMatchMedium_32-4    560kB/s ± 0%   553kB/s ± 4%   -1.19%  (p=0.005 n=23+30)
RegexpMatchMedium_1K-4   1.87MB/s ± 0%  1.89MB/s ± 3%   +1.04%  (p=0.002 n=24+30)
RegexpMatchHard_32-4     1.05MB/s ± 0%  1.08MB/s ± 0%   +2.40%  (p=0.000 n=19+23)
RegexpMatchHard_1K-4     1.12MB/s ± 0%  1.12MB/s ± 5%   +0.12%  (p=0.006 n=25+30)
Revcomp-4                40.2MB/s ± 1%  39.6MB/s ± 4%     ~     (p=0.242 n=25+30)
Template-4               2.24MB/s ± 6%  2.21MB/s ± 6%   -1.15%  (p=0.000 n=30+30)
[Geo mean]               7.87MB/s       7.91MB/s        +0.44%

Change-Id: If374cb7abf83537aa0a176f73c0f736f7800db03
Reviewed-on: https://go-review.googlesource.com/108735
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 20:02:05 +00:00
Milan Knezevic
2959128dc5 cmd/compile: add softfloat support to mips64{,le}
mips64 softfloat support is based on mips implementation and introduces
new enviroment variable GOMIPS64.

GOMIPS64 is a GOARCH=mips64{,le} specific option, for a choice between
hard-float and soft-float. Valid values are 'hardfloat' (default) and
'softfloat'. It is passed to the assembler as
'GOMIPS64_{hardfloat,softfloat}'.

Change-Id: I7f73078627f7cb37c588a38fb5c997fe09c56134
Reviewed-on: https://go-review.googlesource.com/108475
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 14:50:17 +00:00
Josh Bleecher Snyder
d9a50a6531 cmd/compile: use prove pass to detect Ctz of non-zero values
On amd64, Ctz must include special handling of zeros.
But the prove pass has enough information to detect whether the input
is non-zero, allowing a more efficient lowering.

Introduce new CtzNonZero ops to capture and use this information.

Benchmark code:

func BenchmarkVisitBits(b *testing.B) {
	b.Run("8", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			x := uint8(0xff)
			for x != 0 {
				sink = bits.TrailingZeros8(x)
				x &= x - 1
			}
		}
	})

    // and similarly so for 16, 32, 64
}

name            old time/op  new time/op  delta
VisitBits/8-8   7.27ns ± 4%  5.58ns ± 4%  -23.35%  (p=0.000 n=28+26)
VisitBits/16-8  14.7ns ± 7%  10.5ns ± 4%  -28.43%  (p=0.000 n=30+28)
VisitBits/32-8  27.6ns ± 8%  19.3ns ± 3%  -30.14%  (p=0.000 n=30+26)
VisitBits/64-8  44.0ns ±11%  38.0ns ± 5%  -13.48%  (p=0.000 n=30+30)

Fixes #25077

Change-Id: Ie6e5bd86baf39ee8a4ca7cadcf56d934e047f957
Reviewed-on: https://go-review.googlesource.com/109358
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-26 18:22:28 +00:00
Carlos Eduardo Seo
ebb67d993a cmd/compile, cmd/internal/obj/ppc64: make math.Round an intrinsic on ppc64x
This change implements math.Round as an intrinsic on ppc64x so it can be
done using a single instruction.

benchmark                   old ns/op     new ns/op     delta
BenchmarkRound-16           2.60          0.69          -73.46%

Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8
Reviewed-on: https://go-review.googlesource.com/109395
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-04-26 14:12:09 +00:00
Josh Bleecher Snyder
c5f0104daf cmd/compile: use intrinsic for LeadingZeros8 on amd64
The previous change sped up the pure computation form of LeadingZeros8.
This places it somewhat close to the table lookup form.
Depending on something that varies from toolchain to toolchain
(alignment, perhaps?), the slowdown from ditching the table lookup
is either 20% or 5%.

This benchmark is the best case scenario for the table lookup:
It is in the L1 cache already.

I think we're close enough that we can switch to the computational version,
and trust that the memory effects and binary size savings will be worth it.

Code:

func f8(x uint8)   { z = bits.LeadingZeros8(x) }

Before:

"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	LEAQ	math/bits.len8tab(SB), CX
	0x000f 00015 (x.go:7)	MOVBLZX	(CX)(AX*1), AX
	0x0013 00019 (x.go:7)	ADDQ	$-8, AX
	0x0017 00023 (x.go:7)	NEGQ	AX
	0x001a 00026 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:7)	RET

After:

"".f8 STEXT nosplit size=30 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	LEAL	1(AX)(AX*1), AX
	0x000c 00012 (x.go:7)	BSRL	AX, AX
	0x000f 00015 (x.go:7)	ADDQ	$-8, AX
	0x0013 00019 (x.go:7)	NEGQ	AX
	0x0016 00022 (x.go:7)	MOVQ	AX, "".z(SB)
	0x001d 00029 (x.go:7)	RET

Change-Id: Icc7db50a7820fb9a3da8a816d6b6940d7f8e193e
Reviewed-on: https://go-review.googlesource.com/108942
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:34:15 +00:00
Josh Bleecher Snyder
1d321ada73 cmd/compile: optimize LeadingZeros(16|32) on amd64
Introduce Len8 and Len16 ops and provide optimized lowerings for them.
amd64 only for this CL, although it wouldn't surprise me
if other architectures also admit of optimized lowerings.

Also use and optimize the Len32 lowering, along the same lines.

Leave Len8 unused for the moment; a subsequent CL will enable it.

For 16 and 32 bits, this leads to a speed-up.

name              old time/op  new time/op  delta
LeadingZeros16-8  1.42ns ± 5%  1.23ns ± 5%  -13.42%  (p=0.000 n=20+20)
LeadingZeros32-8  1.25ns ± 5%  1.03ns ± 5%  -17.63%  (p=0.000 n=20+16)

Code:

func f16(x uint16) { z = bits.LeadingZeros16(x) }
func f32(x uint32) { z = bits.LeadingZeros32(x) }

Before:

"".f16 STEXT nosplit size=38 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	BSRQ	AX, AX
	0x000c 00012 (x.go:8)	MOVQ	$-1, CX
	0x0013 00019 (x.go:8)	CMOVQEQ	CX, AX
	0x0017 00023 (x.go:8)	ADDQ	$-15, AX
	0x001b 00027 (x.go:8)	NEGQ	AX
	0x001e 00030 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0025 00037 (x.go:8)	RET

"".f32 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:9)	TEXT	"".f32(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:9)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:9)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:9)	MOVL	"".x+8(SP), AX
	0x0004 00004 (x.go:9)	BSRQ	AX, AX
	0x0008 00008 (x.go:9)	MOVQ	$-1, CX
	0x000f 00015 (x.go:9)	CMOVQEQ	CX, AX
	0x0013 00019 (x.go:9)	ADDQ	$-31, AX
	0x0017 00023 (x.go:9)	NEGQ	AX
	0x001a 00026 (x.go:9)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:9)	RET

After:

"".f16 STEXT nosplit size=30 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	LEAL	1(AX)(AX*1), AX
	0x000c 00012 (x.go:8)	BSRL	AX, AX
	0x000f 00015 (x.go:8)	ADDQ	$-16, AX
	0x0013 00019 (x.go:8)	NEGQ	AX
	0x0016 00022 (x.go:8)	MOVQ	AX, "".z(SB)
	0x001d 00029 (x.go:8)	RET

"".f32 STEXT nosplit size=28 args=0x8 locals=0x0
	0x0000 00000 (x.go:9)	TEXT	"".f32(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:9)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:9)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:9)	MOVL	"".x+8(SP), AX
	0x0004 00004 (x.go:9)	LEAQ	1(AX)(AX*1), AX
	0x0009 00009 (x.go:9)	BSRQ	AX, AX
	0x000d 00013 (x.go:9)	ADDQ	$-32, AX
	0x0011 00017 (x.go:9)	NEGQ	AX
	0x0014 00020 (x.go:9)	MOVQ	AX, "".z(SB)
	0x001b 00027 (x.go:9)	RET

Change-Id: I6c93c173752a7bfdeab8be30777ae05a736e1f4b
Reviewed-on: https://go-review.googlesource.com/108941
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:34:04 +00:00
Josh Bleecher Snyder
54dbab5221 cmd/compile: optimize TrailingZeros(8|16) on amd64
Introduce Ctz8 and Ctz16 ops and provide optimized lowerings for them.
amd64 only for this CL, although it wouldn't surprise me
if other architectures also admit of optimized lowerings.

name               old time/op  new time/op  delta
TrailingZeros8-8   1.33ns ± 6%  0.84ns ± 3%  -36.90%  (p=0.000 n=20+20)
TrailingZeros16-8  1.26ns ± 5%  0.84ns ± 5%  -33.50%  (p=0.000 n=20+18)

Code:

func f8(x uint8)   { z = bits.TrailingZeros8(x) }
func f16(x uint16) { z = bits.TrailingZeros16(x) }

Before:

"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	BTSQ	$8, AX
	0x000d 00013 (x.go:7)	BSFQ	AX, AX
	0x0011 00017 (x.go:7)	MOVL	$64, CX
	0x0016 00022 (x.go:7)	CMOVQEQ	CX, AX
	0x001a 00026 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:7)	RET

"".f16 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	BTSQ	$16, AX
	0x000d 00013 (x.go:8)	BSFQ	AX, AX
	0x0011 00017 (x.go:8)	MOVL	$64, CX
	0x0016 00022 (x.go:8)	CMOVQEQ	CX, AX
	0x001a 00026 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:8)	RET

After:

"".f8 STEXT nosplit size=20 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	BTSL	$8, AX
	0x0009 00009 (x.go:7)	BSFL	AX, AX
	0x000c 00012 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0013 00019 (x.go:7)	RET

"".f16 STEXT nosplit size=20 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	BTSL	$16, AX
	0x0009 00009 (x.go:8)	BSFL	AX, AX
	0x000c 00012 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0013 00019 (x.go:8)	RET

Change-Id: I0551e357348de2b724737d569afd6ac9f5c3aa11
Reviewed-on: https://go-review.googlesource.com/108940
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:33:52 +00:00
Ilya Tocar
fb017c60bc cmd/compile/internal/ssa: fix endless compile loop on AMD64
We currently rewrite
(TESTQ (MOVQconst [c] x)) into (TESTQconst [c] x)
and (TESTQconst [-1] x) into (TESTQ x x)
if x is a (MOVQconst [-1]) we will be stuck in the endless rewrite loop.
Don't perform the rewrite in such cases.

Fixes #25006

Change-Id: I77f561ba2605fc104f1e5d5c57f32e9d67a2c000
Reviewed-on: https://go-review.googlesource.com/108879
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-24 16:20:41 +00:00
Josh Bleecher Snyder
d292f77e95 cmd/compile: rewrite 2*x+c into LEAx1 on amd64
Rewrite x<<1+c into x+x+c, which can be expressed as a single LEAQ/LEAL.

Bit of a special case, but the single-instruction
LEA is both shorter and faster than SHL then ADD.

Triggers 293 times during make.bash.

Change-Id: I3f09c8e9a8f3859d1eeed336f095fc3ada79c2c1
Reviewed-on: https://go-review.googlesource.com/108938
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-23 22:40:10 +00:00
Josh Bleecher Snyder
566e3e074c cmd/compile: avoid runtime call during switch string(byteslice)
This triggers three times while building std,
once in image/png and twice in go/internal/gccgoimporter.

There are no instances in std in which a more aggressive
optimization would have triggered.

This doesn't necessarily avoid an allocation,
because escape analysis is already able in many cases
to use a temporary backing for the string,
but it does at a minimum avoid the runtime call and copy.

Fixes #24937

Change-Id: I7019e85638ba8cd7e2f03890e672558b858579bc
Reviewed-on: https://go-review.googlesource.com/108035
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-04-21 00:50:50 +00:00
Lynn Boger
30311e8860 cmd/compile: generate load without DS relocation for go.string on ppc64le
Due to some recent optimizations related to the compare
instruction, DS-form load instructions started to be used
to load 8-byte go.strings. This can cause link time errors
if the go.string is not aligned to 4 bytes.

For DS-form instructions, the value in the offset field must
be a multiple of 4. If the offset is known at the time the
rules are processed, a DS-form load will not be chosen. But for
go.strings, the offset is not known at that time, but a
relocation is generated indicating that the linker should fill
in the DS relocation. When the linker tries to fill in the
relocation, if the offset is not aligned properly, a link error
will occur.

To fix this, when loading a go.string using MOVDload, the full
address of the go.string is generated and loaded into the base
register. Then the go.string is loaded with a 0 offset field.

Added a testcase that reproduces this problem.

Fixes #24799

Change-Id: I6a154e8e1cba64eae290be0fbcb608b75884ecdd
Reviewed-on: https://go-review.googlesource.com/107855
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 16:16:47 +00:00
Ben Shi
34f5f8a580 cmd/compile: optimize ARM64 with register indexed load/store
ARM64 supports load/store instructions with a memory operand that
the address is calculated by base register + index register.

In this CL,
1. Some rules are added to the compile's ARM64 backend to emit
such efficient instructions.
2. A wrong rule of load combination is fixed.

The go1 benchmark does show improvement.

name                     old time/op    new time/op    delta
BinaryTree17-4              44.5s ± 2%     44.1s ± 1%   -0.81%  (p=0.000 n=28+29)
Fannkuch11-4                32.7s ± 3%     30.5s ± 0%   -6.79%  (p=0.000 n=30+26)
FmtFprintfEmpty-4           499ns ± 0%     506ns ± 5%   +1.39%  (p=0.003 n=25+30)
FmtFprintfString-4         1.07µs ± 0%    1.04µs ± 4%   -3.17%  (p=0.000 n=23+30)
FmtFprintfInt-4            1.15µs ± 4%    1.13µs ± 0%   -1.55%  (p=0.000 n=30+23)
FmtFprintfIntInt-4         1.77µs ± 4%    1.74µs ± 0%   -1.71%  (p=0.000 n=30+24)
FmtFprintfPrefixedInt-4    2.37µs ± 5%    2.12µs ± 0%  -10.56%  (p=0.000 n=30+23)
FmtFprintfFloat-4          3.03µs ± 1%    3.03µs ± 4%   -0.13%  (p=0.003 n=25+30)
FmtManyArgs-4              7.38µs ± 1%    7.43µs ± 4%   +0.59%  (p=0.003 n=25+30)
GobDecode-4                 101ms ± 6%      95ms ± 5%   -5.55%  (p=0.000 n=30+30)
GobEncode-4                78.0ms ± 4%    78.8ms ± 6%   +1.05%  (p=0.000 n=30+30)
Gzip-4                      4.25s ± 0%     4.27s ± 4%   +0.45%  (p=0.003 n=24+30)
Gunzip-4                    428ms ± 1%     420ms ± 0%   -1.88%  (p=0.000 n=23+23)
HTTPClientServer-4          549µs ± 1%     541µs ± 1%   -1.56%  (p=0.000 n=29+29)
JSONEncode-4                194ms ± 0%     188ms ± 4%     ~     (p=0.417 n=23+30)
JSONDecode-4                890ms ± 5%     831ms ± 0%   -6.55%  (p=0.000 n=30+23)
Mandelbrot200-4            47.3ms ± 2%    46.5ms ± 0%     ~     (p=0.980 n=30+26)
GoParse-4                  43.1ms ± 6%    43.8ms ± 6%   +1.65%  (p=0.000 n=30+30)
RegexpMatchEasy0_32-4      1.06µs ± 0%    1.07µs ± 3%     ~     (p=0.092 n=23+30)
RegexpMatchEasy0_1K-4      5.53µs ± 0%    5.51µs ± 0%   -0.24%  (p=0.000 n=25+25)
RegexpMatchEasy1_32-4      1.02µs ± 3%    1.01µs ± 0%   -1.27%  (p=0.000 n=30+24)
RegexpMatchEasy1_1K-4      7.26µs ± 0%    7.33µs ± 0%   +0.95%  (p=0.000 n=23+26)
RegexpMatchMedium_32-4     1.84µs ± 7%    1.79µs ± 1%     ~     (p=0.333 n=30+23)
RegexpMatchMedium_1K-4      553µs ± 0%     547µs ± 0%   -1.14%  (p=0.000 n=24+22)
RegexpMatchHard_32-4       30.8µs ± 1%    30.3µs ± 0%   -1.40%  (p=0.000 n=24+24)
RegexpMatchHard_1K-4        928µs ± 0%     929µs ± 5%   +0.12%  (p=0.013 n=23+30)
Revcomp-4                   8.13s ± 4%     6.32s ± 1%  -22.23%  (p=0.000 n=30+23)
Template-4                  899ms ± 6%     854ms ± 1%   -5.01%  (p=0.000 n=30+24)
TimeParse-4                4.66µs ± 4%    4.59µs ± 1%   -1.57%  (p=0.000 n=30+23)
TimeFormat-4               4.58µs ± 0%    4.61µs ± 0%   +0.57%  (p=0.000 n=26+24)
[Geo mean]                  717µs          698µs        -2.55%

name                     old speed      new speed      delta
GobDecode-4              7.63MB/s ± 6%  8.08MB/s ± 5%   +5.88%  (p=0.000 n=30+30)
GobEncode-4              9.85MB/s ± 4%  9.75MB/s ± 6%   -1.04%  (p=0.000 n=30+30)
Gzip-4                   4.56MB/s ± 0%  4.55MB/s ± 4%   -0.36%  (p=0.003 n=24+30)
Gunzip-4                 45.3MB/s ± 1%  46.2MB/s ± 0%   +1.92%  (p=0.000 n=23+23)
JSONEncode-4             10.0MB/s ± 0%  10.4MB/s ± 4%     ~     (p=0.403 n=23+30)
JSONDecode-4             2.18MB/s ± 5%  2.33MB/s ± 0%   +6.91%  (p=0.000 n=30+23)
GoParse-4                1.34MB/s ± 5%  1.32MB/s ± 5%   -1.66%  (p=0.000 n=30+30)
RegexpMatchEasy0_32-4    30.2MB/s ± 0%  29.8MB/s ± 3%     ~     (p=0.099 n=23+30)
RegexpMatchEasy0_1K-4     185MB/s ± 0%   186MB/s ± 0%   +0.24%  (p=0.000 n=25+25)
RegexpMatchEasy1_32-4    31.4MB/s ± 3%  31.8MB/s ± 0%   +1.24%  (p=0.000 n=30+24)
RegexpMatchEasy1_1K-4     141MB/s ± 0%   140MB/s ± 0%   -0.94%  (p=0.000 n=23+26)
RegexpMatchMedium_32-4    541kB/s ± 6%   560kB/s ± 0%   +3.45%  (p=0.000 n=30+23)
RegexpMatchMedium_1K-4   1.85MB/s ± 0%  1.87MB/s ± 0%   +1.08%  (p=0.000 n=24+23)
RegexpMatchHard_32-4     1.04MB/s ± 1%  1.06MB/s ± 1%   +1.48%  (p=0.000 n=24+24)
RegexpMatchHard_1K-4     1.10MB/s ± 0%  1.10MB/s ± 5%   +0.15%  (p=0.004 n=23+30)
Revcomp-4                31.3MB/s ± 4%  40.2MB/s ± 1%  +28.52%  (p=0.000 n=30+23)
Template-4               2.16MB/s ± 6%  2.27MB/s ± 1%   +5.18%  (p=0.000 n=30+24)
[Geo mean]               7.57MB/s       7.79MB/s        +2.98%

fixes #24907

Change-Id: I94afd0e3f53d62a1cf5e452f3dd6daf61be21785
Reviewed-on: https://go-review.googlesource.com/107376
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-19 15:08:10 +00:00
Cherry Zhang
3042463d61 cmd/compile: in escape analysis, use element type for OIND of slice
The escape analysis models the flow of "content" of X with a
level of "indirection" (OIND node) of X. This content can be
pointer dereference, or slice/string element. For the latter
case, the type of the OIND node should be the element type of
the slice/string. This CL fixes this. In particular, this
matters when the element type is pointerless, where the data
flow should not cause any escape.

Fixes #15730.

Change-Id: Iba9f92898681625e7e3ddef76ae65d7cd61c41e0
Reviewed-on: https://go-review.googlesource.com/107597
Reviewed-by: David Chase <drchase@google.com>
2018-04-18 02:59:37 +00:00
Michael Munday
58cdecb9c8 cmd/compile: generate constants for NeqPtr, EqPtr and IsNonNil ops
If both inputs are constant offsets from the same pointer then we
can evaluate NeqPtr and EqPtr at compile time. Triggers a few times
during all.bash. Removes a conditional branch in the following
code:

copy(x[1:], x[:])

This branch was recently added as an optimization in CL 94596. We
now skip the memmove if the pointers are equal. However, in the
above code we know at compile time that they are never equal.

Also, when the offset is variable, check if the offset is zero
rather than if the pointers are equal. For example:

copy(x[a:], x[:])

This would now skip the copy if a == 0, rather than if x + a == x.

Finally I've also added a rule to make IsNonNil true for pointers
to values on the stack. The nil check elimination pass will catch
these anyway, but eliminating them here might eliminate branches
earlier.

Change-Id: If72f436fef0a96ad0f4e296d3a1f8b6c3e712085
Reviewed-on: https://go-review.googlesource.com/106635
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-16 20:43:57 +00:00
Ben Shi
cd65bbc01b cmd/compile/internal/ssa: optimize 386's subtraction
The SUBL instruction can take a memory operand, and this CL
implements this optimization.

The go1 benchmark shows a little improvement.

name                     old time/op    new time/op    delta
BinaryTree17-4              3.27s ± 2%     3.29s ± 3%    ~     (p=0.322 n=37+40)
Fannkuch11-4                3.49s ± 0%     3.53s ± 1%  +1.21%  (p=0.000 n=31+40)
FmtFprintfEmpty-4          46.2ns ± 3%    46.3ns ± 2%    ~     (p=0.351 n=40+28)
FmtFprintfString-4         82.0ns ± 3%    81.5ns ± 2%  -0.69%  (p=0.002 n=40+30)
FmtFprintfInt-4            94.6ns ± 3%    94.6ns ± 6%    ~     (p=0.913 n=39+37)
FmtFprintfIntInt-4          147ns ± 3%     150ns ± 2%  +1.72%  (p=0.000 n=40+25)
FmtFprintfPrefixedInt-4     186ns ± 3%     186ns ± 0%  -0.33%  (p=0.006 n=40+25)
FmtFprintfFloat-4           388ns ± 4%     388ns ± 4%    ~     (p=0.162 n=40+40)
FmtManyArgs-4               612ns ± 3%     616ns ± 4%    ~     (p=0.223 n=40+40)
GobDecode-4                7.35ms ± 5%    7.42ms ± 5%    ~     (p=0.095 n=40+40)
GobEncode-4                7.21ms ± 8%    7.23ms ± 4%    ~     (p=0.294 n=40+40)
Gzip-4                      360ms ± 4%     359ms ± 4%    ~     (p=0.097 n=40+40)
Gunzip-4                   46.1ms ± 3%    45.6ms ± 3%  -1.20%  (p=0.000 n=40+40)
HTTPClientServer-4         64.0µs ± 2%    64.1µs ± 2%    ~     (p=0.648 n=39+40)
JSONEncode-4               21.9ms ± 4%    22.1ms ± 5%    ~     (p=0.086 n=40+40)
JSONDecode-4               67.9ms ± 4%    66.7ms ± 4%  -1.63%  (p=0.000 n=40+40)
Mandelbrot200-4            5.19ms ± 3%    5.17ms ± 3%    ~     (p=0.881 n=40+40)
GoParse-4                  3.34ms ± 3%    3.28ms ± 2%  -1.78%  (p=0.000 n=40+40)
RegexpMatchEasy0_32-4       101ns ± 5%      99ns ± 3%  -2.40%  (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4       851ns ± 1%     848ns ± 3%  -0.36%  (p=0.004 n=33+40)
RegexpMatchEasy1_32-4       109ns ± 5%     105ns ± 3%  -3.53%  (p=0.000 n=39+40)
RegexpMatchEasy1_1K-4      1.03µs ± 4%    1.03µs ± 3%    ~     (p=0.638 n=40+38)
RegexpMatchMedium_32-4      131ns ± 5%     127ns ± 4%  -3.36%  (p=0.000 n=38+40)
RegexpMatchMedium_1K-4     43.4µs ± 4%    43.2µs ± 3%  -0.46%  (p=0.008 n=40+40)
RegexpMatchHard_32-4       2.21µs ± 4%    2.23µs ± 1%  +0.77%  (p=0.014 n=40+28)
RegexpMatchHard_1K-4       67.6µs ± 4%    67.7µs ± 3%  +0.11%  (p=0.016 n=40+40)
Revcomp-4                   1.86s ± 3%     1.77s ± 2%  -4.81%  (p=0.000 n=40+40)
Template-4                 71.7ms ± 3%    71.6ms ± 4%    ~     (p=0.200 n=40+40)
TimeParse-4                 436ns ± 4%     433ns ± 3%    ~     (p=0.358 n=40+40)
TimeFormat-4                413ns ± 4%     412ns ± 3%    ~     (p=0.415 n=40+40)
[Geo mean]                 63.9µs         63.6µs       -0.49%

name                     old speed      new speed      delta
GobDecode-4               105MB/s ± 5%   104MB/s ± 5%    ~     (p=0.096 n=40+40)
GobEncode-4               106MB/s ± 7%   106MB/s ± 3%    ~     (p=0.385 n=39+40)
Gzip-4                   54.0MB/s ± 4%  54.0MB/s ± 4%    ~     (p=0.100 n=40+40)
Gunzip-4                  421MB/s ± 3%   426MB/s ± 3%  +1.21%  (p=0.000 n=40+40)
JSONEncode-4             88.5MB/s ± 5%  88.0MB/s ± 5%    ~     (p=0.083 n=40+40)
JSONDecode-4             28.6MB/s ± 4%  29.1MB/s ± 4%  +1.65%  (p=0.000 n=40+40)
GoParse-4                17.3MB/s ± 3%  17.7MB/s ± 2%  +1.82%  (p=0.000 n=40+40)
RegexpMatchEasy0_32-4     316MB/s ± 5%   323MB/s ± 4%  +2.44%  (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4    1.20GB/s ± 1%  1.21GB/s ± 3%  +0.40%  (p=0.004 n=33+40)
RegexpMatchEasy1_32-4     291MB/s ± 7%   302MB/s ± 4%  +3.82%  (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4     993MB/s ± 4%   990MB/s ± 3%    ~     (p=0.623 n=40+38)
RegexpMatchMedium_32-4   7.61MB/s ± 5%  7.87MB/s ± 4%  +3.36%  (p=0.000 n=38+40)
RegexpMatchMedium_1K-4   23.6MB/s ± 4%  23.7MB/s ± 4%  +0.46%  (p=0.007 n=40+40)
RegexpMatchHard_32-4     14.5MB/s ± 4%  14.3MB/s ± 1%  -0.79%  (p=0.017 n=40+28)
RegexpMatchHard_1K-4     15.1MB/s ± 4%  15.1MB/s ± 3%  -0.11%  (p=0.015 n=40+40)
Revcomp-4                 137MB/s ± 3%   144MB/s ± 3%  +5.06%  (p=0.000 n=40+40)
Template-4               27.1MB/s ± 3%  27.1MB/s ± 4%    ~     (p=0.211 n=40+40)
[Geo mean]               78.9MB/s       79.7MB/s       +1.01%

Change-Id: I638fa4fef85833e8605919d693f9570cc3cf7334
Reviewed-on: https://go-review.googlesource.com/107275
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-16 04:41:20 +00:00
Giovanni Bajo
e7b1d0a9cf test: add missing copyright header
Change-Id: Ia64535492515f725fe3c4b59ea300363a0c4ce10
Reviewed-on: https://go-review.googlesource.com/107136
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-15 21:17:54 +00:00
Giovanni Bajo
2954ef20bb test: small cleanup of code and comments in run.go
While writing CL 107315, I went back and forth for the syntax used for
constraints of build environments in which the architecture did not
support varitants ("plan9/amd64" vs "plan9/amd64/"). I eventually
settled for the latter because the code required less heuristics
(think parsing "plan9/386" vs "386/sse2") but there were a few
leftovers in code and comments.

Change-Id: I9d9a008f3814f9a1642609650eb571e7f1a675cf
Reviewed-on: https://go-review.googlesource.com/107338
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-15 21:17:43 +00:00
Giovanni Bajo
284ba47b49 test: run codegen tests on all supported architecture variants
This CL makes the codegen testsuite automatically test all
architecture variants for architecture specified in tests. For
instance, if a test file specifies a "arm" test, it will be
automatically run on all GOARM variants (5,6,7), to increase
the coverage.

The CL also introduces a syntax to specify only a specific
variant (eg: "arm/7") in case the test makes sense only there.
The same syntax also allows to specify the operating system
in case it matters (eg: "plan9/386/sse2").

Fixes #24658

Change-Id: I2eba8b918f51bb6a77a8431a309f8b71af07ea22
Reviewed-on: https://go-review.googlesource.com/107315
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-15 20:02:43 +00:00
Giovanni Bajo
01aa1d7dbe test: migrate plan9 tests to codegen
And remove it from asmtest. Next CL will remove the whole
asmtest infrastructure.

Change-Id: I5851bf7c617456d62a3c6cffacf70252df7b056b
Reviewed-on: https://go-review.googlesource.com/107335
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-15 20:02:30 +00:00
Cherry Zhang
b08a9b7ecc all: use new softfloat on GOARM=5
Use the new softfloat support in the compiler, originally added
for softfloat on MIPS. This support is portable, so we can just
use it for softfloat on ARM.

In the old softfloat support on ARM, the compiler generates
floating point instructions, then the assembler inserts calls
to _sfloat before FP instructions. _sfloat decodes the following
FP instructions and simulates them.

In the new scheme, the compiler generates runtime calls to do FP
operations at a higher level. It doesn't generate FP instructions,
and therefore the assembler won't insert _sfloat calls, i.e. the
old mechanism is automatically suppressed.

The old method may be still be triggered with assembly code
using FP instructions. In the standard library, the only
occurance is math/sqrt_arm.s, which is rewritten to call to the
Go implementation instead.

Some significant speedups for code using floating points:

name                     old time/op    new time/op     delta
BinaryTree17-4              37.1s ± 2%      37.3s ± 1%     ~     (p=0.105 n=10+10)
Fannkuch11-4                13.0s ± 0%      13.1s ± 0%   +0.46%  (p=0.000 n=10+10)
FmtFprintfEmpty-4           700ns ± 4%      734ns ± 6%   +4.84%  (p=0.009 n=10+10)
FmtFprintfString-4         1.22µs ± 3%     1.22µs ± 4%     ~     (p=0.897 n=10+10)
FmtFprintfInt-4            1.27µs ± 2%     1.30µs ± 1%   +1.91%  (p=0.001 n=10+9)
FmtFprintfIntInt-4         1.83µs ± 2%     1.81µs ± 3%     ~     (p=0.149 n=10+10)
FmtFprintfPrefixedInt-4    1.80µs ± 3%     1.81µs ± 2%     ~     (p=0.421 n=10+8)
FmtFprintfFloat-4          6.89µs ± 3%     3.59µs ± 2%  -47.93%  (p=0.000 n=10+10)
FmtManyArgs-4              6.39µs ± 1%     6.09µs ± 1%   -4.61%  (p=0.000 n=10+9)
GobDecode-4                 109ms ± 2%       81ms ± 2%  -25.99%  (p=0.000 n=9+10)
GobEncode-4                 109ms ± 2%       76ms ± 2%  -29.88%  (p=0.000 n=10+9)
Gzip-4                      3.61s ± 1%      3.59s ± 1%     ~     (p=0.247 n=10+10)
Gunzip-4                    449ms ± 4%      450ms ± 1%     ~     (p=0.230 n=10+7)
HTTPClientServer-4         1.55ms ± 3%     1.53ms ± 2%     ~     (p=0.400 n=9+10)
JSONEncode-4                356ms ± 1%      183ms ± 1%  -48.73%  (p=0.000 n=10+10)
JSONDecode-4                1.12s ± 2%      0.87s ± 1%  -21.88%  (p=0.000 n=10+10)
Mandelbrot200-4             5.49s ± 1%      2.55s ± 1%  -53.45%  (p=0.000 n=9+10)
GoParse-4                  49.6ms ± 2%     47.5ms ± 1%   -4.08%  (p=0.000 n=10+9)
RegexpMatchEasy0_32-4      1.13µs ± 4%     1.20µs ± 4%   +6.42%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K-4      4.41µs ± 2%     4.44µs ± 2%     ~     (p=0.128 n=10+10)
RegexpMatchEasy1_32-4      1.15µs ± 5%     1.20µs ± 5%   +4.85%  (p=0.002 n=10+10)
RegexpMatchEasy1_1K-4      6.21µs ± 2%     6.37µs ± 4%   +2.62%  (p=0.001 n=9+10)
RegexpMatchMedium_32-4     1.58µs ± 5%     1.65µs ± 3%   +4.85%  (p=0.000 n=10+10)
RegexpMatchMedium_1K-4      341µs ± 3%      351µs ± 7%     ~     (p=0.573 n=8+10)
RegexpMatchHard_32-4       21.4µs ± 3%     21.5µs ± 5%     ~     (p=0.931 n=9+9)
RegexpMatchHard_1K-4        626µs ± 2%      626µs ± 1%     ~     (p=0.645 n=8+8)
Revcomp-4                  46.4ms ± 2%     47.4ms ± 2%   +2.07%  (p=0.000 n=10+10)
Template-4                  1.31s ± 3%      1.23s ± 4%   -6.13%  (p=0.000 n=10+10)
TimeParse-4                4.49µs ± 1%     4.41µs ± 2%   -1.81%  (p=0.000 n=10+9)
TimeFormat-4               9.31µs ± 1%     9.32µs ± 2%     ~     (p=0.561 n=9+9)

Change-Id: Iaeeff6c9a09c1b2c064d06e09dd88101dc02bfa4
Reviewed-on: https://go-review.googlesource.com/106735
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-13 16:39:39 +00:00
Cherry Zhang
5a91c83ce8 cmd/compile: in escape analysis, propagate loop depth to field
The escape analysis models "loop depth". If the address of an
expression is assigned to something defined at a lower (outer)
loop depth, the escape analysis decides it escapes. However, it
uses the loop depth of the address operator instead of where
the RHS is defined. This causes an unnecessary escape if there is
an assignment inside a loop but the RHS is defined outside the
loop. This CL propagates the loop depth.

Fixes #24730.

Change-Id: I5ff1530688bdfd90561a7b39c8be9bfc009a9dae
Reviewed-on: https://go-review.googlesource.com/105257
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-13 14:48:23 +00:00
Josh Bleecher Snyder
c1ed1f3c80 cmd/compile: fix evaluation of "" < s
Fixes #24817

Change-Id: Ifa79ab3dfe69297eeef85f7193cd5f85e5982bc5
Reviewed-on: https://go-review.googlesource.com/106655
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-04-12 19:38:37 +00:00
Josh Bleecher Snyder
2dfb423e6e cmd/compile: loop to ensure all autogenerated functions are compiled
I was wrong. There was a need to loop here.

Fixes #24761

Change-Id: If13b3ab72febde930bdaebdddd1c05e0d0446020
Reviewed-on: https://go-review.googlesource.com/105615
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-04-11 23:46:30 +00:00
Alberto Donizetti
467eca6076 test/codegen: port last stack and memcombining tests
And delete them from asm_test.

Also delete an arm64 cmov test has been already ported to the new test
harness.

Change-Id: I4458721e1f512bc9ecbbe1c22a2c9c7109ad68fe
Reviewed-on: https://go-review.googlesource.com/106335
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-11 16:08:04 +00:00
Robert Griesemer
3d501df441 cmd/compile: better error message when referring to ambiguous method/field
Fixes #14321.

Change-Id: I9c92c767b01cf7938c4808a8fef9f2936fc667bc
Reviewed-on: https://go-review.googlesource.com/106119
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-04-10 23:39:13 +00:00
Matthew Dempsky
535ad8efb8 cmd/compile: fix check that ensures main.main is a function
The check was previously disallowing package main from even importing
a non-function symbol named "main".

Fixes #24801.

Change-Id: I849b9713890429f0a16860ef16b5dc7e970d04a4
Reviewed-on: https://go-review.googlesource.com/106120
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-04-10 23:34:12 +00:00
Alberto Donizetti
188e2bf897 test/codegen: port arm64 BIC/EON/ORN and masking tests
And delete them from asm_test.

Change-Id: I24f421b87e8cb4770c887a6dfd58eacd0088947d
Reviewed-on: https://go-review.googlesource.com/106056
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-10 10:57:50 +00:00