1
0
mirror of https://github.com/golang/go synced 2024-11-13 18:40:22 -07:00
Commit Graph

15 Commits

Author SHA1 Message Date
Keith Randall
deb4177cf0 cmd/compile: use masks instead of branches for slicing
When we do

  var x []byte = ...
  y := x[i:]

We can't just use y.ptr = x.ptr + i, as the new pointer may point to the
next object in memory after the backing array.
We used to fix this by doing:

  y.cap = x.cap - i
  delta := i
  if y.cap == 0 {
    delta = 0
  }
  y.ptr = x.ptr + delta

That generates a branch in what is otherwise straight-line code.

Better to do:

  y.cap = x.cap - i
  mask := (y.cap - 1) >> 63 // -1 if y.cap==0, 0 otherwise
  y.ptr = x.ptr + i &^ mask

It's about the same number of instructions (~4, depending on what
parts are constant, and the target architecture), but it is all
inline. It plays nicely with CSE, and the mask can be computed
in parallel with the index (in cases where a multiply is required).

It is a minor win in both speed and space.

Change-Id: Ied60465a0b8abb683c02208402e5bb7ac0e8370f
Reviewed-on: https://go-review.googlesource.com/32022
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-27 20:22:49 +00:00
David Chase
0f29942489 cmd/compile: Repurpose old sliceopt.go for prove phase.
Adapt old test for prove's bounds check elimination.
Added missing rule to generic rules that lead to differences
between 32 and 64 bit platforms on sliceopt test.
Added debugging to prove.go that was helpful-to-necessary to
discover that missing rule.
Lowered debugging level on prove.go from 3 to 1; no idea
why it was previously 3.

Change-Id: I09de206aeb2fced9f2796efe2bfd4a59927eda0c
Reviewed-on: https://go-review.googlesource.com/23290
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-20 23:50:19 +00:00
David Chase
c79ba22ece test: delete sliceopt.go
It tests the behavior of the old deleted compiler.

Fixes #17362.

Change-Id: Ia2fdec734c5cbe724a9de562ed71598f67244ab3
Reviewed-on: https://go-review.googlesource.com/30593
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-06 18:19:28 +00:00
Michael Munday
6ec993adc3 cmd/compile: add SSA backend for s390x and enable by default
The new SSA backend modifies the ABI slightly: R0 is now a usable
general purpose register.

Fixes #16677.

Change-Id: I367435ce921e0c7e79e021c80cf8ef5d1d1466cf
Reviewed-on: https://go-review.googlesource.com/28978
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-13 19:39:38 +00:00
Cherry Zhang
7f27f1dfdd cmd/compile: add MIPS64 optimizations, SSA on by default
Add the following optimizations:
- fold constants
- fold address into load/store
- simplify extensions and conditional branches
- remove nil checks

Turn on SSA on MIPS64 by default, and toggle the tests.

Fixes #16359.

Change-Id: I7f1e38c2509e22e42cd024e712990ebbe47176bd
Reviewed-on: https://go-review.googlesource.com/27870
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-26 19:45:06 +00:00
David Chase
5b9ff11c3d cmd/compile: ppc64le working, not optimized enough
This time with the cherry-pick from the proper patch of
the old CL.

Stack size increased.
Corrected NaN-comparison glitches.
Marked g register as clobbered by calls.
Fixed shared libraries.

live_ssa.go still disabled because of differences.
Presumably turning on more optimization will fix
both the stack size and the live_ssa.go glitches.

Enhanced debugging output for shared libs test.

Rebased onto master.

Updates #16010.

Change-Id: I40864faf1ef32c118fb141b7ef8e854498e6b2c4
Reviewed-on: https://go-review.googlesource.com/27159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-18 16:34:47 +00:00
Cherry Zhang
d99cee79b9 [dev.ssa] cmd/compile, etc.: more ARM64 optimizations, and enable SSA by default
Add more ARM64 optimizations:
- use hardware zero register when it is possible.
- use shifted ops.
  The assembler supports shifted ops but not documented, nor knows
  how to print it. This CL adds them.
- enable fast division.
  This was disabled because it makes the old backend generate slower
  code. But with SSA it generates faster code.

Turn on SSA by default, also adjust tests.

Change-Id: I7794479954c83bb65008dcb457bc1e21d7496da6
Reviewed-on: https://go-review.googlesource.com/26950
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-15 03:37:34 +00:00
Keith Randall
c069bc4996 [dev.ssa] cmd/compile: implement GO386=387
Last part of the 386 SSA port.

Modify the x86 backend to simulate SSE registers and
instructions with 387 registers and instructions.
The simulation isn't terribly performant, but it works,
and the old implementation wasn't very performant either.
Leaving to people who care about 387 to optimize if they want.

Turn on SSA backend for 386 by default.

Fixes #16358

Change-Id: I678fb59132620b2c47e993c1c10c4c21135f70c0
Reviewed-on: https://go-review.googlesource.com/25271
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-10 17:41:01 +00:00
Keith Randall
69a755b602 [dev.ssa] cmd/compile: port SSA backend to amd64p32
It's not a new backend, just a PtrSize==4 modification
of the existing AMD64 backend.

Change-Id: Icc63521a5cf4ebb379f7430ef3f070894c09afda
Reviewed-on: https://go-review.googlesource.com/25586
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-09 15:48:26 +00:00
Cherry Zhang
6b6de15d32 [dev.ssa] cmd/compile: support NaCl in SSA for ARM
NaCl code runs in sandbox and there are restrictions for its
instruction uses
(https://developer.chrome.com/native-client/reference/sandbox_internals/arm-32-bit-sandbox).

Like the legacy backend, on NaCl,
- don't use R9, which is used as NaCl's "thread pointer".
- don't use Duff's device.
- don't use indexed load/stores.
- the assembler rewrites DIV/MOD to runtime calls, which on NaCl
  clobbers R12, so R12 is marked as clobbered for DIV/MOD.
- other restrictions are satisfied by the assembler.

Enable SSA specific tests on nacl/arm, and disable non-SSA ones.

Updates #15365.

Change-Id: I9262693ec6756b89ca29d3ae4e52a96fe5403b02
Reviewed-on: https://go-review.googlesource.com/24859
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-07-16 03:13:45 +00:00
Cherry Zhang
42181ad852 [dev.ssa] cmd/compile: enable SSA on ARM by default
As Josh mentioned in CL 24716, there has been requests for using SSA
for ARM. SSA can still be disabled by setting -ssa=0 for cmd/compile,
or partially enabled with GOSSAFUNC, GOSSAPKG, and GOSSAHASH.

Not enable SSA by default on NaCl, which is not supported yet.

Enable SSA-specific tests on ARM: live_ssa.go and nilptr3_ssa.go;
disable non-SSA tests: live.go, nilptr3.go, and slicepot.go.

Updates #15365.

Change-Id: Ic2ca8d166aeca8517b9d262a55e92f2130683a16
Reviewed-on: https://go-review.googlesource.com/23953
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2016-07-06 15:05:50 +00:00
Emmanuel Odeke
53fd522c0d all: make copyright headers consistent with one space after period
Follows suit with https://go-review.googlesource.com/#/c/20111.

Generated by running
$ grep -R 'Go Authors.  All' * | cut -d":" -f1 | while read F;do perl -pi -e 's/Go
Authors.  All/Go Authors. All/g' $F;done

The code in cmd/internal/unvendor wasn't changed.

Fixes #15213

Change-Id: I4f235cee0a62ec435f9e8540a1ec08ae03b1a75f
Reviewed-on: https://go-review.googlesource.com/21819
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-02 13:43:18 +00:00
David Chase
729abfa35c [dev.ssa] cmd/compile: default compile+test with SSA
Some tests disabled, some bifurcated into _ssa and not,
with appropriate logging added to compiler.

"tests/live.go" in particular needs attention.

SSA-specific testing removed, since it's all SSA now.

Added "-run_skips" option to tests/run.go to simplify
checking whether a test still fails (or how it fails)
on a skipped platform.

The compiler now compiles with SSA by default.
If you don't want SSA, specify GOSSAHASH=n (or N) as
an environment variable.  Function names ending in "_ssa"
are always SSA-compiled.

GOSSAFUNC=fname retains its "SSA for fname, log to ssa.html"
GOSSAPKG=pkg only has an effect when GOSSAHASH=n
GOSSAHASH=10101 etc retains its name-hash-matching behavior
for purposes of debugging.

See #13068

Change-Id: I8217bfeb34173533eaeb391b5f6935483c7d6b43
Reviewed-on: https://go-review.googlesource.com/16299
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
2015-10-30 20:35:20 +00:00
Russ Cox
d447279927 cmd/internal/gc: optimize slice + write barrier
The code generated for a slice x[i:j] or x[i:j:k] computes the entire
new slice (base, len, cap) and then uses it as the evaluation of the
slice expression.

If the slice is part of an update x = x[i:j] or x = x[i:j:k], there are
opportunities to avoid computing some of these fields.

For x = x[0:i], we know that only the len is changing;
base can be ignored completely, and cap can be left unmodified.

For x = x[0:i:j], we know that only len and cap are changing;
base can be ignored completely.

For x = x[i:i], we know that the resulting cap is zero, and we don't
adjust the base during a slice producing a zero-cap result,
so again base can be ignored completely.

No write to base, no write barrier.

The old slice code was trying to work at a Go syntax level, mainly
because that was how you wrote code just once instead of once
per architecture. Now the compiler is factored a bit better and we
can implement slice during code generation but still have one copy
of the code. So the new code is working at that lower level.
(It must, to update only parts of the result.)

This CL by itself:
name                   old mean              new mean              delta
BinaryTree17            5.81s × (0.98,1.03)   5.71s × (0.96,1.05)     ~    (p=0.101)
Fannkuch11              4.35s × (1.00,1.00)   4.39s × (1.00,1.00)   +0.79% (p=0.000)
FmtFprintfEmpty        86.0ns × (0.94,1.11)  82.6ns × (0.98,1.04)   -3.86% (p=0.048)
FmtFprintfString        276ns × (0.98,1.04)   273ns × (0.98,1.02)     ~    (p=0.235)
FmtFprintfInt           274ns × (0.98,1.06)   270ns × (0.99,1.01)     ~    (p=0.119)
FmtFprintfIntInt        506ns × (0.99,1.01)   475ns × (0.99,1.01)   -6.02% (p=0.000)
FmtFprintfPrefixedInt   391ns × (0.99,1.01)   393ns × (1.00,1.01)     ~    (p=0.139)
FmtFprintfFloat         566ns × (0.99,1.01)   574ns × (1.00,1.01)   +1.33% (p=0.001)
FmtManyArgs            1.91µs × (0.99,1.01)  1.87µs × (0.99,1.02)   -1.83% (p=0.000)
GobDecode              15.3ms × (0.99,1.02)  15.0ms × (0.98,1.05)   -1.84% (p=0.042)
GobEncode              11.5ms × (0.97,1.03)  11.4ms × (0.99,1.03)     ~    (p=0.152)
Gzip                    645ms × (0.99,1.01)   647ms × (0.99,1.01)     ~    (p=0.265)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)   +0.90% (p=0.000)
HTTPClientServer       90.5µs × (0.97,1.04)  88.5µs × (0.99,1.03)   -2.27% (p=0.014)
JSONEncode             32.0ms × (0.98,1.03)  29.6ms × (0.98,1.01)   -7.51% (p=0.000)
JSONDecode              114ms × (0.99,1.01)   104ms × (1.00,1.01)   -8.60% (p=0.000)
Mandelbrot200          6.04ms × (1.00,1.01)  6.02ms × (1.00,1.00)     ~    (p=0.057)
GoParse                6.47ms × (0.97,1.05)  6.37ms × (0.97,1.04)     ~    (p=0.105)
RegexpMatchEasy0_32     171ns × (0.93,1.07)   152ns × (0.99,1.01)  -11.09% (p=0.000)
RegexpMatchEasy0_1K     550ns × (0.98,1.01)   530ns × (1.00,1.00)   -3.78% (p=0.000)
RegexpMatchEasy1_32     135ns × (0.99,1.02)   134ns × (0.99,1.01)   -1.33% (p=0.002)
RegexpMatchEasy1_1K     879ns × (1.00,1.01)   865ns × (1.00,1.00)   -1.58% (p=0.000)
RegexpMatchMedium_32    243ns × (1.00,1.00)   233ns × (1.00,1.00)   -4.30% (p=0.000)
RegexpMatchMedium_1K   70.3µs × (1.00,1.00)  69.5µs × (1.00,1.00)   -1.13% (p=0.000)
RegexpMatchHard_32     3.82µs × (1.00,1.01)  3.74µs × (1.00,1.00)   -1.95% (p=0.000)
RegexpMatchHard_1K      117µs × (1.00,1.00)   115µs × (1.00,1.00)   -1.69% (p=0.000)
Revcomp                 917ms × (0.97,1.04)   920ms × (0.97,1.04)     ~    (p=0.786)
Template                114ms × (0.99,1.01)   117ms × (0.99,1.01)   +2.58% (p=0.000)
TimeParse               622ns × (0.99,1.01)   615ns × (0.99,1.00)   -1.06% (p=0.000)
TimeFormat              665ns × (0.99,1.01)   654ns × (0.99,1.00)   -1.70% (p=0.000)

This CL and previous CL (append) combined:
name                   old mean              new mean              delta
BinaryTree17            5.68s × (0.97,1.04)   5.71s × (0.96,1.05)     ~    (p=0.638)
Fannkuch11              4.41s × (0.98,1.03)   4.39s × (1.00,1.00)     ~    (p=0.474)
FmtFprintfEmpty        92.7ns × (0.91,1.16)  82.6ns × (0.98,1.04)  -10.89% (p=0.004)
FmtFprintfString        281ns × (0.96,1.08)   273ns × (0.98,1.02)     ~    (p=0.078)
FmtFprintfInt           288ns × (0.97,1.06)   270ns × (0.99,1.01)   -6.37% (p=0.000)
FmtFprintfIntInt        493ns × (0.97,1.04)   475ns × (0.99,1.01)   -3.53% (p=0.002)
FmtFprintfPrefixedInt   423ns × (0.97,1.04)   393ns × (1.00,1.01)   -7.07% (p=0.000)
FmtFprintfFloat         598ns × (0.99,1.01)   574ns × (1.00,1.01)   -4.02% (p=0.000)
FmtManyArgs            1.89µs × (0.98,1.05)  1.87µs × (0.99,1.02)     ~    (p=0.305)
GobDecode              14.8ms × (0.98,1.03)  15.0ms × (0.98,1.05)     ~    (p=0.237)
GobEncode              12.3ms × (0.98,1.01)  11.4ms × (0.99,1.03)   -6.95% (p=0.000)
Gzip                    656ms × (0.99,1.05)   647ms × (0.99,1.01)     ~    (p=0.101)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)   +0.58% (p=0.001)
HTTPClientServer       91.2µs × (0.97,1.04)  88.5µs × (0.99,1.03)   -3.02% (p=0.003)
JSONEncode             32.6ms × (0.97,1.08)  29.6ms × (0.98,1.01)   -9.10% (p=0.000)
JSONDecode              114ms × (0.97,1.05)   104ms × (1.00,1.01)   -8.74% (p=0.000)
Mandelbrot200          6.11ms × (0.98,1.04)  6.02ms × (1.00,1.00)     ~    (p=0.090)
GoParse                6.66ms × (0.97,1.04)  6.37ms × (0.97,1.04)   -4.41% (p=0.000)
RegexpMatchEasy0_32     159ns × (0.99,1.00)   152ns × (0.99,1.01)   -4.69% (p=0.000)
RegexpMatchEasy0_1K     538ns × (1.00,1.01)   530ns × (1.00,1.00)   -1.57% (p=0.000)
RegexpMatchEasy1_32     138ns × (1.00,1.00)   134ns × (0.99,1.01)   -2.91% (p=0.000)
RegexpMatchEasy1_1K     869ns × (0.99,1.01)   865ns × (1.00,1.00)   -0.51% (p=0.012)
RegexpMatchMedium_32    252ns × (0.99,1.01)   233ns × (1.00,1.00)   -7.85% (p=0.000)
RegexpMatchMedium_1K   72.7µs × (1.00,1.00)  69.5µs × (1.00,1.00)   -4.43% (p=0.000)
RegexpMatchHard_32     3.85µs × (1.00,1.00)  3.74µs × (1.00,1.00)   -2.74% (p=0.000)
RegexpMatchHard_1K      118µs × (1.00,1.00)   115µs × (1.00,1.00)   -2.24% (p=0.000)
Revcomp                 920ms × (0.97,1.07)   920ms × (0.97,1.04)     ~    (p=0.998)
Template                129ms × (0.98,1.03)   117ms × (0.99,1.01)   -9.79% (p=0.000)
TimeParse               619ns × (0.99,1.01)   615ns × (0.99,1.00)   -0.57% (p=0.011)
TimeFormat              661ns × (0.98,1.04)   654ns × (0.99,1.00)     ~    (p=0.223)

Change-Id: If054d81ab2c71d8d62cf54b5b1fac2af66b387fc
Reviewed-on: https://go-review.googlesource.com/9813
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-05-13 19:20:39 +00:00
Russ Cox
8552047a32 cmd/internal/gc: optimize append + write barrier
The code generated for x = append(x, v) is roughly:

	t := x
	if len(t)+1 > cap(t) {
		t = grow(t)
	}
	t[len(t)] = v
	len(t)++
	x = t

We used to generate this code as Go pseudocode during walk.
Generate it instead as actual instructions during gen.

Doing so lets us apply a few optimizations. The most important
is that when, as in the above example, the source slice and the
destination slice are the same, the code can instead do:

	t := x
	if len(t)+1 > cap(t) {
		t = grow(t)
		x = {base(t), len(t)+1, cap(t)}
	} else {
		len(x)++
	}
	t[len(t)] = v

That is, in the fast path that does not reallocate the array,
only the updated length needs to be written back to x,
not the array pointer and not the capacity. This is more like
what you'd write by hand in C. It's faster in general, since
the fast path elides two of the three stores, but it's especially
faster when the form of x is such that the base pointer write
would turn into a write barrier. No write, no barrier.

name                   old mean              new mean              delta
BinaryTree17            5.68s × (0.97,1.04)   5.81s × (0.98,1.03)   +2.35% (p=0.023)
Fannkuch11              4.41s × (0.98,1.03)   4.35s × (1.00,1.00)     ~    (p=0.090)
FmtFprintfEmpty        92.7ns × (0.91,1.16)  86.0ns × (0.94,1.11)   -7.31% (p=0.038)
FmtFprintfString        281ns × (0.96,1.08)   276ns × (0.98,1.04)     ~    (p=0.219)
FmtFprintfInt           288ns × (0.97,1.06)   274ns × (0.98,1.06)   -4.94% (p=0.002)
FmtFprintfIntInt        493ns × (0.97,1.04)   506ns × (0.99,1.01)   +2.65% (p=0.009)
FmtFprintfPrefixedInt   423ns × (0.97,1.04)   391ns × (0.99,1.01)   -7.52% (p=0.000)
FmtFprintfFloat         598ns × (0.99,1.01)   566ns × (0.99,1.01)   -5.27% (p=0.000)
FmtManyArgs            1.89µs × (0.98,1.05)  1.91µs × (0.99,1.01)     ~    (p=0.231)
GobDecode              14.8ms × (0.98,1.03)  15.3ms × (0.99,1.02)   +3.01% (p=0.000)
GobEncode              12.3ms × (0.98,1.01)  11.5ms × (0.97,1.03)   -5.93% (p=0.000)
Gzip                    656ms × (0.99,1.05)   645ms × (0.99,1.01)     ~    (p=0.055)
Gunzip                  142ms × (1.00,1.00)   142ms × (1.00,1.00)   -0.32% (p=0.034)
HTTPClientServer       91.2µs × (0.97,1.04)  90.5µs × (0.97,1.04)     ~    (p=0.468)
JSONEncode             32.6ms × (0.97,1.08)  32.0ms × (0.98,1.03)     ~    (p=0.190)
JSONDecode              114ms × (0.97,1.05)   114ms × (0.99,1.01)     ~    (p=0.887)
Mandelbrot200          6.11ms × (0.98,1.04)  6.04ms × (1.00,1.01)     ~    (p=0.167)
GoParse                6.66ms × (0.97,1.04)  6.47ms × (0.97,1.05)   -2.81% (p=0.014)
RegexpMatchEasy0_32     159ns × (0.99,1.00)   171ns × (0.93,1.07)   +7.19% (p=0.002)
RegexpMatchEasy0_1K     538ns × (1.00,1.01)   550ns × (0.98,1.01)   +2.30% (p=0.000)
RegexpMatchEasy1_32     138ns × (1.00,1.00)   135ns × (0.99,1.02)   -1.60% (p=0.000)
RegexpMatchEasy1_1K     869ns × (0.99,1.01)   879ns × (1.00,1.01)   +1.08% (p=0.000)
RegexpMatchMedium_32    252ns × (0.99,1.01)   243ns × (1.00,1.00)   -3.71% (p=0.000)
RegexpMatchMedium_1K   72.7µs × (1.00,1.00)  70.3µs × (1.00,1.00)   -3.34% (p=0.000)
RegexpMatchHard_32     3.85µs × (1.00,1.00)  3.82µs × (1.00,1.01)   -0.81% (p=0.000)
RegexpMatchHard_1K      118µs × (1.00,1.00)   117µs × (1.00,1.00)   -0.56% (p=0.000)
Revcomp                 920ms × (0.97,1.07)   917ms × (0.97,1.04)     ~    (p=0.808)
Template                129ms × (0.98,1.03)   114ms × (0.99,1.01)  -12.06% (p=0.000)
TimeParse               619ns × (0.99,1.01)   622ns × (0.99,1.01)     ~    (p=0.062)
TimeFormat              661ns × (0.98,1.04)   665ns × (0.99,1.01)     ~    (p=0.524)

See next CL for combination with a similar optimization for slice.
The benchmarks that are slower in this CL are still faster overall
with the combination of the two.

Change-Id: I2a7421658091b2488c64741b4db15ab6c3b4cb7e
Reviewed-on: https://go-review.googlesource.com/9812
Reviewed-by: David Chase <drchase@google.com>
2015-05-12 17:55:09 +00:00