1
0
mirror of https://github.com/golang/go synced 2024-10-02 10:18:33 -06:00
Commit Graph

17 Commits

Author SHA1 Message Date
Michael Hudson-Doyle
9f0baca505 runtime: fixes for arm64 shared libraries
Building for shared libraries requires that all functions that are declared
have an implementation and vice versa so make that so on arm64.

It would be nicer to not require the stub sigreturn (it will never be called)
but that seems a bit awkward.

Change-Id: I3cec81697161b452af81fa35939f748bd1acf7fd
Reviewed-on: https://go-review.googlesource.com/13995
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-09-03 01:07:40 +00:00
Dave Cheney
686d44d9e0 runtime: check pointer equality in arm64 cmpbody
Updates #11336

Follow the lead of amd64 by doing a pointer equality check
before comparing string/byte contents on arm64.

BenchmarkCompareBytesEqual-8               25.8           26.3           +1.94%
BenchmarkCompareBytesToNil-8               9.59           9.59           +0.00%
BenchmarkCompareBytesEmpty-8               9.59           9.17           -4.38%
BenchmarkCompareBytesIdentical-8           26.3           9.17           -65.13%
BenchmarkCompareBytesSameLength-8          16.3           16.3           +0.00%
BenchmarkCompareBytesDifferentLength-8     16.3           16.3           +0.00%
BenchmarkCompareBytesBigUnaligned-8        1132038        1131409        -0.06%
BenchmarkCompareBytesBig-8                 1126758        1128470        +0.15%
BenchmarkCompareBytesBigIdentical-8        1084366        9.17           -100.00%

Change-Id: Id7125c31957eff1ddb78897d4511bd50e79af3f7
Reviewed-on: https://go-review.googlesource.com/13885
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-25 03:29:47 +00:00
Russ Cox
421220571d runtime, reflect: use correctly aligned stack frame sizes on arm64
arm64 requires either no stack frame or a frame with a size that is 8 mod 16
(adding the saved LR will make it 16-aligned).

The cmd/internal/obj/arm64 has been silently aligning frames, but it led to
a terrible bug when the compiler and obj disagreed on the frame size,
and it's just generally confusing, so we're going to make misaligned frames
an error instead of something that is silently changed.

This CL prepares by updating assembly files.
Note that the changes in this CL are already being done silently by
cmd/internal/obj/arm64, so there is no semantic effect here,
just a clarity effect.

For #9880.

Change-Id: Ibd6928dc5fdcd896c2bacd0291bf26b364591e28
Reviewed-on: https://go-review.googlesource.com/12845
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 21:35:35 +00:00
Alex Brainman
9d968cb47b runtime: rename cgocall_errno and asmcgocall_errno into cgocall and asmcgocall
Change-Id: I5917bea8bb35b0e725dcc56a68f3a70137cfc180
Reviewed-on: https://go-review.googlesource.com/9387
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-19 01:47:11 +00:00
Alex Brainman
2858b73843 runtime: remove cgocall and asmcgocall
In preparation for rename of cgocall_errno into cgocall and
asmcgocall_errno into asmcgocall in the fllowinng CL.
rsc requested CL 9387 to be split into two parts. This is first part.

Change-Id: I7434f0e4b44dd37017540695834bfcb1eebf0b2f
Reviewed-on: https://go-review.googlesource.com/11166
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-18 04:42:53 +00:00
Austin Clements
faa7a7e8ae runtime: implement GC stack barriers
This commit implements stack barriers to minimize the amount of
stack re-scanning that must be done during mark termination.

Currently the GC scans stacks of active goroutines twice during every
GC cycle: once at the beginning during root discovery and once at the
end during mark termination. The second scan happens while the world
is stopped and guarantees that we've seen all of the roots (since
there are no write barriers on writes to local stack
variables). However, this means pause time is proportional to stack
size. In particularly recursive programs, this can drive pause time up
past our 10ms goal (e.g., it takes about 150ms to scan a 50MB heap).

Re-scanning the entire stack is rarely necessary, especially for large
stacks, because usually most of the frames on the stack were not
active between the first and second scans and hence any changes to
these frames (via non-escaping pointers passed down the stack) were
tracked by write barriers.

To efficiently track how far a stack has been unwound since the first
scan (and, hence, how much needs to be re-scanned), this commit
introduces stack barriers. During the first scan, at exponentially
spaced points in each stack, the scan overwrites return PCs with the
PC of the stack barrier function. When "returned" to, the stack
barrier function records how far the stack has unwound and jumps to
the original return PC for that point in the stack. Then the second
scan only needs to proceed as far as the lowest barrier that hasn't
been hit.

For deeply recursive programs, this substantially reduces mark
termination time (and hence pause time). For the goscheme example
linked in issue #10898, prior to this change, mark termination times
were typically between 100 and 500ms; with this change, mark
termination times are typically between 10 and 20ms. As a result of
the reduced stack scanning work, this reduces overall execution time
of the goscheme example by 20%.

Fixes #10898.

The effect of this on programs that are not deeply recursive is
minimal:

name                   old time/op    new time/op    delta
BinaryTree17              3.16s ± 2%     3.26s ± 1%  +3.31%  (p=0.000 n=19+19)
Fannkuch11                2.42s ± 1%     2.48s ± 1%  +2.24%  (p=0.000 n=17+19)
FmtFprintfEmpty          50.0ns ± 3%    49.8ns ± 1%    ~     (p=0.534 n=20+19)
FmtFprintfString          173ns ± 0%     175ns ± 0%  +1.49%  (p=0.000 n=16+19)
FmtFprintfInt             170ns ± 1%     175ns ± 1%  +2.97%  (p=0.000 n=20+19)
FmtFprintfIntInt          288ns ± 0%     295ns ± 0%  +2.73%  (p=0.000 n=16+19)
FmtFprintfPrefixedInt     242ns ± 1%     252ns ± 1%  +4.13%  (p=0.000 n=18+18)
FmtFprintfFloat           324ns ± 0%     323ns ± 0%  -0.36%  (p=0.000 n=20+19)
FmtManyArgs              1.14µs ± 0%    1.12µs ± 1%  -1.01%  (p=0.000 n=18+19)
GobDecode                8.88ms ± 1%    8.87ms ± 0%    ~     (p=0.480 n=19+18)
GobEncode                6.80ms ± 1%    6.85ms ± 0%  +0.82%  (p=0.000 n=20+18)
Gzip                      363ms ± 1%     363ms ± 1%    ~     (p=0.077 n=18+20)
Gunzip                   90.6ms ± 0%    90.0ms ± 1%  -0.71%  (p=0.000 n=17+18)
HTTPClientServer         51.5µs ± 1%    50.8µs ± 1%  -1.32%  (p=0.000 n=18+18)
JSONEncode               17.0ms ± 0%    17.1ms ± 0%  +0.40%  (p=0.000 n=18+17)
JSONDecode               61.8ms ± 0%    63.8ms ± 1%  +3.11%  (p=0.000 n=18+17)
Mandelbrot200            3.84ms ± 0%    3.84ms ± 1%    ~     (p=0.583 n=19+19)
GoParse                  3.71ms ± 1%    3.72ms ± 1%    ~     (p=0.159 n=18+19)
RegexpMatchEasy0_32       100ns ± 0%     100ns ± 1%  -0.19%  (p=0.033 n=17+19)
RegexpMatchEasy0_1K       342ns ± 1%     331ns ± 0%  -3.41%  (p=0.000 n=19+19)
RegexpMatchEasy1_32      82.5ns ± 0%    81.7ns ± 0%  -0.98%  (p=0.000 n=18+18)
RegexpMatchEasy1_1K       505ns ± 0%     494ns ± 1%  -2.16%  (p=0.000 n=18+18)
RegexpMatchMedium_32      137ns ± 1%     137ns ± 1%  -0.24%  (p=0.048 n=20+18)
RegexpMatchMedium_1K     41.6µs ± 0%    41.3µs ± 1%  -0.57%  (p=0.004 n=18+20)
RegexpMatchHard_32       2.11µs ± 0%    2.11µs ± 1%  +0.20%  (p=0.037 n=17+19)
RegexpMatchHard_1K       63.9µs ± 2%    63.3µs ± 0%  -0.99%  (p=0.000 n=20+17)
Revcomp                   560ms ± 1%     522ms ± 0%  -6.87%  (p=0.000 n=18+16)
Template                 75.0ms ± 0%    75.1ms ± 1%  +0.18%  (p=0.013 n=18+19)
TimeParse                 358ns ± 1%     364ns ± 0%  +1.74%  (p=0.000 n=20+15)
TimeFormat                360ns ± 0%     372ns ± 0%  +3.55%  (p=0.000 n=20+18)

Change-Id: If8a9bfae6c128d15a4f405e02bcfa50129df82a2
Reviewed-on: https://go-review.googlesource.com/10314
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-06-02 20:00:57 +00:00
Keith Randall
c526f3ac10 runtime: tail call into memeq/cmp body implementations
There's no need to call/ret to the body implementation.
It can write the result to the right place.  Just jump to
it and have it return to our caller.

Old:
  call body implementation
  compute result
  put result in a register
  return
  write register to result location
  return

New:
  load address of result location into a register
  jump to body implementation
  compute result
  write result to passed-in address
  return

It's a bit tricky on 386 because there is no free register
with which to pass the result location.  Free up a register
by keeping around blen-alen instead of both alen and blen.

Change-Id: If2cf0682a5bf1cc592bdda7c126ed4eee8944fba
Reviewed-on: https://go-review.googlesource.com/9202
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-04-29 04:46:25 +00:00
Josh Bleecher Snyder
a76099f0d9 runtime: fix arm64 asm vet issues
Several naming changes and a real issue in asmcgocall_errno.

Change-Id: Ieb0a328a168819fe233d74e0397358384d7e71b3
Reviewed-on: https://go-review.googlesource.com/9212
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-22 02:30:11 +00:00
David Crawshaw
5b72b8c7a3 runtime: aeshash stubs for arm64
For some reason the absense of an implementation does not stop arm64
binaries being built. However it comes up with -buildmode=c-archive.

Change-Id: Ic0db5fd8fb4fe8252b5aa320818df0c7aec3db8f
Reviewed-on: https://go-review.googlesource.com/8989
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-16 19:49:31 +00:00
Shenghou Ma
4a71b91d29 runtime: darwin/arm64 support
Change-Id: I3b3f80791a1db4c2b7318f81a115972cd2237f03
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/8782
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-16 13:01:19 +00:00
Josh Bleecher Snyder
969f10140c runtime: fix arm64 build
Broken by CL 8541.

Change-Id: Ie2e89a22b91748e82f7bc4723660a24ed4135687
Reviewed-on: https://go-review.googlesource.com/8734
Reviewed-by: Minux Ma <minux@golang.org>
2015-04-10 02:29:01 +00:00
Dave Cheney
ee349b5d77 runtime: add arm64 runtime.cmpstring and bytes.Compare
Add arm64 assembly implementation of runtime.cmpstring and bytes.Compare.

benchmark                                old ns/op     new ns/op     delta
BenchmarkCompareBytesEqual               98.0          27.5          -71.94%
BenchmarkCompareBytesToNil               9.38          10.0          +6.61%
BenchmarkCompareBytesEmpty               13.3          10.0          -24.81%
BenchmarkCompareBytesIdentical           98.0          27.5          -71.94%
BenchmarkCompareBytesSameLength          43.3          16.3          -62.36%
BenchmarkCompareBytesDifferentLength     43.4          16.3          -62.44%
BenchmarkCompareBytesBigUnaligned        6979680       1360979       -80.50%
BenchmarkCompareBytesBig                 6915995       1381979       -80.02%
BenchmarkCompareBytesBigIdentical        6781440       1327304       -80.43%

benchmark                             old MB/s     new MB/s     speedup
BenchmarkCompareBytesBigUnaligned     150.23       770.46       5.13x
BenchmarkCompareBytesBig              151.62       758.76       5.00x
BenchmarkCompareBytesBigIdentical     154.63       790.01       5.11x

* note, the machine we are benchmarking on has some issues. What is clear is
compared to a few days ago the old MB/s value has increased from ~115 to 150.
I'm less certain about the new MB/s number, which used to be close to 1Gb/s.

Change-Id: I4f31b2c7a06296e13912aacc958525632cb0450d
Reviewed-on: https://go-review.googlesource.com/8541
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-09 14:49:31 +00:00
Shenghou Ma
d0b62d8bfa runtime: linux/arm64 cgo support
Change-Id: I309e3df7608b9eef9339196fdc50dedf5f9439f3
Reviewed-on: https://go-review.googlesource.com/8450
Reviewed-by: Aram Hăvărneanu <aram@mgk.ro>
2015-04-08 09:08:27 +00:00
Russ Cox
92c826b1b2 cmd/internal/gc: inline runtime.getg
This more closely restores what the old C runtime did.
(In C, g was an 'extern register' with the same effective
implementation as in this CL.)

On a late 2012 MacBookPro10,2, best of 5 old vs best of 5 new:

benchmark                          old ns/op      new ns/op      delta
BenchmarkBinaryTree17              4981312777     4463426605     -10.40%
BenchmarkFannkuch11                3046495712     3006819428     -1.30%
BenchmarkFmtFprintfEmpty           89.3           79.8           -10.64%
BenchmarkFmtFprintfString          284            262            -7.75%
BenchmarkFmtFprintfInt             282            262            -7.09%
BenchmarkFmtFprintfIntInt          480            448            -6.67%
BenchmarkFmtFprintfPrefixedInt     382            358            -6.28%
BenchmarkFmtFprintfFloat           529            486            -8.13%
BenchmarkFmtManyArgs               1849           1773           -4.11%
BenchmarkGobDecode                 12835963       11794385       -8.11%
BenchmarkGobEncode                 10527170       10288422       -2.27%
BenchmarkGzip                      436109569      438422516      +0.53%
BenchmarkGunzip                    110121663      109843648      -0.25%
BenchmarkHTTPClientServer          81930          85446          +4.29%
BenchmarkJSONEncode                24638574       24280603       -1.45%
BenchmarkJSONDecode                93022423       85753546       -7.81%
BenchmarkMandelbrot200             4703899        4735407        +0.67%
BenchmarkGoParse                   5319853        5086843        -4.38%
BenchmarkRegexpMatchEasy0_32       151            151            +0.00%
BenchmarkRegexpMatchEasy0_1K       452            453            +0.22%
BenchmarkRegexpMatchEasy1_32       131            132            +0.76%
BenchmarkRegexpMatchEasy1_1K       761            722            -5.12%
BenchmarkRegexpMatchMedium_32      228            224            -1.75%
BenchmarkRegexpMatchMedium_1K      63751          64296          +0.85%
BenchmarkRegexpMatchHard_32        3188           3238           +1.57%
BenchmarkRegexpMatchHard_1K        95396          96756          +1.43%
BenchmarkRevcomp                   661587262      687107364      +3.86%
BenchmarkTemplate                  108312598      104008540      -3.97%
BenchmarkTimeParse                 453            459            +1.32%
BenchmarkTimeFormat                475            441            -7.16%

The garbage benchmark from the benchmarks subrepo gets 2.6% faster as well.

Change-Id: I320aeda332db81012688b26ffab23f6581c59cfa
Reviewed-on: https://go-review.googlesource.com/8460
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-04-07 14:26:47 +00:00
Josh Bleecher Snyder
ad3600945a runtime: auto-generate duff routines
This makes it easier to experiment with alternative implementations.

While we're here, update the comments.

No functional changes. Passes toolstash -cmp.

Change-Id: I428535754908f0fdd7cc36c214ddb6e1e60f376e
Reviewed-on: https://go-review.googlesource.com/8310
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-02 02:37:59 +00:00
Michael Hudson-Doyle
f78dc1dac1 runtime: rename ·main·f to ·mainPC to avoid duplicate symbol
runtime·main·f is normalized by the linker to runtime.main.f, as is
the compiler-generated symbol runtime.main·f.  Change the former to
runtime·mainPC instead.

Fixes issue #9934

Change-Id: I656a6fa6422d45385fa2cc55bd036c6affa1abfe
Reviewed-on: https://go-review.googlesource.com/8234
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-30 18:52:14 +00:00
Aram Hăvărneanu
846ee0465b runtime: add support for linux/arm64
Change-Id: Ibda6a5bedaff57fd161d63fc04ad260931d34413
Reviewed-on: https://go-review.googlesource.com/7142
Reviewed-by: Russ Cox <rsc@golang.org>
2015-03-16 18:45:54 +00:00