1
0
mirror of https://github.com/golang/go synced 2024-11-12 08:50:22 -07:00
Commit Graph

11 Commits

Author SHA1 Message Date
Ilya Tocar
e4a500ce14 cmd/compile/internal/gc: improve comparison with constant strings
Currently we expand comparison with small constant strings into len check
and a sequence of byte comparisons. Generate 16/32/64-bit comparisons,
instead of bytewise on 386 and amd64. Also increase limits on what is
considered small constant string.
Shaves ~30kb (0.5%) from go executable.

This also updates test/prove.go to keep test case valid.

Change-Id: I99ae8871a1d00c96363c6d03d0b890782fa7e1d9
Reviewed-on: https://go-review.googlesource.com/38776
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-07 15:40:25 +00:00
Keith Randall
73f92f9b04 cmd/compile: use len(s)<=cap(s) to remove more bounds checks
When we discover a relation x <= len(s), also discover the relation
x <= cap(s).  That way, in situations like:

a := s[x:]  // tests 0 <= x <= len(s)
b := s[:x]  // tests 0 <= x <= cap(s)

the second check can be eliminated.

Fixes #16813

Change-Id: Ifc037920b6955e43bac1a1eaf6bac63a89cfbd44
Reviewed-on: https://go-review.googlesource.com/33633
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
Reviewed-by: David Chase <drchase@google.com>
2017-02-02 17:45:58 +00:00
Keith Randall
6317f92f6e cmd/compile: fix CSE with commutative ops
CSE opportunities were being missed for commutative ops. We used to
order the args of commutative ops (by arg ID) once at the start of CSE.
But that may not be enough.

i1 = (Load ptr mem)
i2 = (Load ptr mem)
x1 = (Add i1 j)
x2 = (Add i2 j)

Equivalent commutative ops x1 and x2 may not get their args ordered in
the same way because because at the start of CSE, we don't know that
the i values will be CSEd. If x1 and x2 get opposite orders we won't
CSE them.

Instead, (re)order the args of commutative operations by their
equivalence class IDs each time we partition an equivalence class.

Change-Id: Ic609fa83b85299782a5e85bf93dc6023fccf4b0c
Reviewed-on: https://go-review.googlesource.com/33632
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2017-02-02 17:45:43 +00:00
David Chase
0f29942489 cmd/compile: Repurpose old sliceopt.go for prove phase.
Adapt old test for prove's bounds check elimination.
Added missing rule to generic rules that lead to differences
between 32 and 64 bit platforms on sliceopt test.
Added debugging to prove.go that was helpful-to-necessary to
discover that missing rule.
Lowered debugging level on prove.go from 3 to 1; no idea
why it was previously 3.

Change-Id: I09de206aeb2fced9f2796efe2bfd4a59927eda0c
Reviewed-on: https://go-review.googlesource.com/23290
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-20 23:50:19 +00:00
Josh Bleecher Snyder
c9fd997524 cmd/compile: unroll comparisons to short constant strings
Unroll s == "ab" to

len(s) == 2 && s[0] == 'a' && s[1] == 'b'

This generates faster and shorter code
by avoiding a runtime call.
Do something similar for !=.

The cutoff length is 6. This was chosen empirically
by examining binary sizes on arm, arm64, 386, and amd64
using the SSA backend.

For all architectures examined, 4, 5, and 6 were
the ideal cutoff, with identical binary sizes.

The distribution of constant string equality sizes
during 'go build -a std' is:

 40.81%   622 len 0
 14.11%   215 len 4
  9.45%   144 len 1
  7.81%   119 len 3
  7.48%   114 len 5
  5.12%    78 len 7
  4.13%    63 len 2
  3.54%    54 len 8
  2.69%    41 len 6
  1.18%    18 len 10
  0.85%    13 len 9
  0.66%    10 len 14
  0.59%     9 len 17
  0.46%     7 len 11
  0.26%     4 len 12
  0.20%     3 len 19
  0.13%     2 len 13
  0.13%     2 len 15
  0.13%     2 len 16
  0.07%     1 len 20
  0.07%     1 len 23
  0.07%     1 len 33
  0.07%     1 len 36

A cutoff of length 6 covers most of the cases.

Benchmarks on amd64 comparing a string to a constant of length 3:

Cmp/1same-8           4.78ns ± 6%  0.94ns ± 9%  -80.26%  (p=0.000 n=20+20)
Cmp/1diffbytes-8      6.43ns ± 6%  0.96ns ±11%  -85.13%  (p=0.000 n=20+20)
Cmp/3same-8           4.71ns ± 5%  1.28ns ± 5%  -72.90%  (p=0.000 n=20+20)
Cmp/3difffirstbyte-8  6.33ns ± 7%  1.27ns ± 7%  -79.90%  (p=0.000 n=20+20)
Cmp/3difflastbyte-8   6.34ns ± 8%  1.26ns ± 9%  -80.13%  (p=0.000 n=20+20)

The change to the prove test preserves the
existing intent of the test. When the string was
short, there was a new "proved in bounds" report
that referred to individual byte comparisons.

Change-Id: I593ac303b0d11f275672090c5c786ea0c6b8da13
Reviewed-on: https://go-review.googlesource.com/26758
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-15 15:37:00 +00:00
Alexandru Moșoi
8b20fd000d cmd/compile: transform some Phis into Or8.
func f(a, b bool) bool {
          return a || b
}

is now a single instructions (excluding loading and unloading the arguments):
      v10 = ORB <bool> v11 v12 : AX

Change-Id: Iff63399410cb46909f4318ea1c3f45a029f4aa5e
Reviewed-on: https://go-review.googlesource.com/21872
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-04-19 22:04:30 +00:00
Alexandru Moșoi
27ebc84716 cmd/compile: handle non-negatives in prove
Handle this case:
if 0 <= i && i < len(a) {
        use a[i]
}

Shaves about 5k from pkg/tools/linux_amd64/*.

Change-Id: I6675ff49aa306b0d241b074c5738e448204cd981
Reviewed-on: https://go-review.googlesource.com/21431
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-02 20:34:38 +00:00
Alexandru Moșoi
b91cc53033 cmd/compile/internal/ssa: BCE for induction variables
There are 5293 loop in the main go repository.
A survey of the top most common for loops:

     18 for __k__ := 0; i < len(sa.Addr); i++ {
     19 for __k__ := 0; ; i++ {
     19 for __k__ := 0; i < 16; i++ {
     25 for __k__ := 0; i < length; i++ {
     30 for __k__ := 0; i < 8; i++ {
     49 for __k__ := 0; i < len(s); i++ {
     67 for __k__ := 0; i < n; i++ {
    376 for __k__ := range __slice__ {
    685 for __k__, __v__ := range __slice__ {
   2074 for __, __v__ := range __slice__ {

The algorithm to find induction variables handles all cases
with an upper limit. It currently doesn't find related induction
variables such as c * ind or c + ind.

842 out of 22954 bound checks are removed for src/make.bash.
1957 out of 42952 bounds checks are removed for src/all.bash.

Things to do in follow-up CLs:
* Find the associated pointer for `for _, v := range a {}`
* Drop the NilChecks on the pointer.
* Replace the implicit induction variable by a loop over the pointer

Generated garbage can be reduced if we share the sdom between passes.

% benchstat old.txt new.txt
name       old time/op     new time/op     delta
Template       337ms ± 3%      333ms ± 3%    ~             (p=0.258 n=9+9)
GoTypes        1.11s ± 2%      1.10s ± 2%    ~           (p=0.912 n=10+10)
Compiler       5.25s ± 1%      5.29s ± 2%    ~             (p=0.077 n=9+9)
MakeBash       33.5s ± 1%      34.1s ± 2%  +1.85%          (p=0.011 n=9+9)

name       old alloc/op    new alloc/op    delta
Template      63.6MB ± 0%     63.9MB ± 0%  +0.52%         (p=0.000 n=10+9)
GoTypes        218MB ± 0%      219MB ± 0%  +0.59%         (p=0.000 n=10+9)
Compiler       978MB ± 0%      985MB ± 0%  +0.69%        (p=0.000 n=10+10)

name       old allocs/op   new allocs/op   delta
Template        582k ± 0%       583k ± 0%  +0.10%        (p=0.000 n=10+10)
GoTypes        1.78M ± 0%      1.78M ± 0%  +0.12%        (p=0.000 n=10+10)
Compiler       7.68M ± 0%      7.69M ± 0%  +0.05%        (p=0.000 n=10+10)

name       old text-bytes  new text-bytes  delta
HelloSize       581k ± 0%       581k ± 0%  -0.08%        (p=0.000 n=10+10)
CmdGoSize      6.40M ± 0%      6.39M ± 0%  -0.08%        (p=0.000 n=10+10)

name       old data-bytes  new data-bytes  delta
HelloSize      3.66k ± 0%      3.66k ± 0%    ~     (all samples are equal)
CmdGoSize       134k ± 0%       134k ± 0%    ~     (all samples are equal)

name       old bss-bytes   new bss-bytes   delta
HelloSize       126k ± 0%       126k ± 0%    ~     (all samples are equal)
CmdGoSize       149k ± 0%       149k ± 0%    ~     (all samples are equal)

name       old exe-bytes   new exe-bytes   delta
HelloSize       947k ± 0%       946k ± 0%  -0.01%        (p=0.000 n=10+10)
CmdGoSize      9.92M ± 0%      9.91M ± 0%  -0.06%        (p=0.000 n=10+10)

Change-Id: Ie74bdff46fd602db41bb457333d3a762a0c3dc4d
Reviewed-on: https://go-review.googlesource.com/20517
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
2016-04-01 09:37:58 +00:00
Keith Randall
47c9e139ae cmd/compile: extend prove pass to handle constant comparisons
Find comparisons to constants and propagate that information
down the dominator tree.  Use it to resolve other constant
comparisons on the same variable.

So if we know x >= 7, then a x > 4 condition must return true.

This change allows us to use "_ = b[7]" hints to eliminate bounds checks.

Fixes #14900

Change-Id: Idbf230bd5b7da43de3ecb48706e21cf01bf812f7
Reviewed-on: https://go-review.googlesource.com/21008
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
2016-03-31 21:16:23 +00:00
Alexandru Moșoi
cd798dcb88 cmd/compile/internal/ssa: generalize prove to all booleans
* Refacts a bit saving and restoring parents restrictions
* Shaves ~100k from pkg/tools/linux_amd64,
but most of the savings come from the rewrite rules.
* Improves on the following artificial test case:
func f1(a4 bool, a6 bool) bool {
  return a6 || (a6 || (a6 || a4)) || (a6 || (a4 || a6 || (false || a6)))
}

Change-Id: I714000f75a37a3a6617c6e6834c75bd23674215f
Reviewed-on: https://go-review.googlesource.com/20306
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-13 12:05:41 +00:00
Alexandru Moșoi
bdea1d58cf [dev.ssa] cmd/compile/internal/ssa: remove proven redundant controls.
* It does very simple bounds checking elimination. E.g.
removes the second check in for i := range a { a[i]++; a[i++]; }
* Improves on the following redundant expression:
return a6 || (a6 || (a6 || a4)) || (a6 || (a4 || a6 || (false || a6)))
* Linear in the number of block edges.

I patched in CL 12960 that does bounds, nil and constant propagation
to make sure this CL is not just redundant. Size of pkg/tool/linux_amd64/*
(excluding compile which is affected by this change):

With IsInBounds and IsSliceInBounds
-this -12960 92285080
+this -12960 91947416
-this +12960 91978976
+this +12960 91923088

Gain is ~110% of 12960.

Without IsInBounds and IsSliceInBounds (older run)
-this -12960 95515512
+this -12960 95492536
-this +12960 95216920
+this +12960 95204440

Shaves 22k on its own.

* Can we handle IsInBounds better with this? In
for i := range a { a[i]++; } the bounds checking at a[i]
is not eliminated.

Change-Id: I98957427399145fb33693173fd4d5a8d71c7cc20
Reviewed-on: https://go-review.googlesource.com/19710
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-28 19:48:20 +00:00