<!--{
	"Title": "A Quick Guide to Go's Assembler",
	"Path":  "/doc/asm"
}-->

<h2 id="introduction">A Quick Guide to Go's Assembler</h2>

<p>
This document is a quick outline of the unusual form of assembly language used by the <code>gc</code> Go compiler.
The document is not comprehensive.
</p>

<p>
The assembler is based on the input style of the Plan 9 assemblers, which is documented in detail
<a href="http://plan9.bell-labs.com/sys/doc/asm.html">elsewhere</a>.
If you plan to write assembly language, you should read that document, although much of it is Plan 9-specific.
The current document provides a summary of the syntax and of the differences from
what is explained in that document, and
describes the peculiarities that apply when writing assembly code to interact with Go.
</p>

<p>
The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine.
Some of the details map precisely to the machine, but some do not.
This is because the compiler suite (see
<a href="http://plan9.bell-labs.com/sys/doc/compiler.html">this description</a>)
needs no assembler pass in the usual pipeline.
Instead, the compiler operates on a kind of semi-abstract instruction set,
and instruction selection occurs partly after code generation.
The assembler works on the semi-abstract form, so
when you see an instruction like <code>MOV</code>,
what the tool chain actually generates for that operation might
not be a move instruction at all, perhaps a clear or load.
Or it might correspond exactly to the machine instruction with that name.
In general, machine-specific operations tend to appear as themselves, while more general concepts like
memory move and subroutine call and return are more abstract.
The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
</p>

<p>
The assembler program is a way to parse a description of that
semi-abstract instruction set and turn it into instructions to be
input to the linker.
If you want to see what the instructions look like in assembly for a given architecture, say amd64, there
are many examples in the sources of the standard library, in packages such as
<a href="/pkg/runtime/"><code>runtime</code></a> and
<a href="/pkg/math/big/"><code>math/big</code></a>.
You can also examine what the compiler emits as assembly code
(the actual output may differ from what you see here):
</p>

<pre>
$ cat x.go
package main

func main() {
	println(3)
}
$ GOOS=linux GOARCH=amd64 go tool compile -S x.go        # or: go build -gcflags -S x.go

--- prog list "main" ---
0000 (x.go:3) TEXT    main+0(SB),$8-0
0001 (x.go:3) FUNCDATA $0,gcargs·0+0(SB)
0002 (x.go:3) FUNCDATA $1,gclocals·0+0(SB)
0003 (x.go:4) MOVQ    $3,(SP)
0004 (x.go:4) PCDATA  $0,$8
0005 (x.go:4) CALL    ,runtime.printint+0(SB)
0006 (x.go:4) PCDATA  $0,$-1
0007 (x.go:4) PCDATA  $0,$0
0008 (x.go:4) CALL    ,runtime.printnl+0(SB)
0009 (x.go:4) PCDATA  $0,$-1
0010 (x.go:5) RET     ,
...
</pre>

<p>
The <code>FUNCDATA</code> and <code>PCDATA</code> directives contain information
for use by the garbage collector; they are introduced by the compiler.
</p>

<!-- Commenting out because the feature is gone but it's popular and may come back.

<p>
To see what gets put in the binary after linking, add the <code>-a</code> flag to the linker:
</p>

<pre>
$ go tool 6l -a x.6        # or: go build -ldflags -a x.go
codeblk [0x2000,0x1d059) at offset 0x1000
002000 main.main            | (3) TEXT     main.main+0(SB),$8
002000 65488b0c25a0080000   | (3) MOVQ     2208(GS),CX
002009 483b21               | (3) CMPQ     SP,(CX)
00200c 7707                 | (3) JHI      ,2015
00200e e83da20100           | (3) CALL     ,1c250+runtime.morestack00
002013 ebeb                 | (3) JMP      ,2000
002015 4883ec08             | (3) SUBQ     $8,SP
002019                      | (3) FUNCDATA $0,main.gcargs·0+0(SB)
002019                      | (3) FUNCDATA $1,main.gclocals·0+0(SB)
002019 48c7042403000000     | (4) MOVQ     $3,(SP)
002021                      | (4) PCDATA   $0,$8
002021 e8aad20000           | (4) CALL     ,f2d0+runtime.printint
002026                      | (4) PCDATA   $0,$-1
002026                      | (4) PCDATA   $0,$0
002026 e865d40000           | (4) CALL     ,f490+runtime.printnl
00202b                      | (4) PCDATA   $0,$-1
00202b 4883c408             | (5) ADDQ     $8,SP
00202f c3                   | (5) RET      ,
...
</pre>

-->

<h3 id="constants">Constants</h3>

<p>
Although the assembler takes its guidance from the Plan 9 assemblers,
it is a distinct program, so there are some differences.
One is in constant evaluation.
Constant expressions in the assembler are parsed using Go's operator
precedence, not the C-like precedence of the original.
Thus <code>3&1<<2</code> is 4, not 0: it parses as <code>(3&1)<<2</code>,
not <code>3&(1<<2)</code>.
Also, constants are always evaluated as 64-bit unsigned integers.
Thus <code>-2</code> is not the integer value minus two,
but the unsigned 64-bit integer with the same bit pattern.
The distinction rarely matters, but
to avoid ambiguity, any division or right shift whose right operand has its
high bit set is rejected.
</p>

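<p>
As a rough illustration only (this is ordinary Go, not assembler input), the sketch below
evaluates the same expressions with Go's own operators, which follow the precedence described above,
and prints the 64-bit unsigned bit pattern of minus two.
The helper names <code>goPrecedence</code>, <code>cPrecedence</code>, and <code>asUnsigned</code>
are hypothetical and exist only for this example.
</p>

<pre>
package main

import "fmt"

// goPrecedence evaluates 3&1<<2 the way Go groups it: (3&1)<<2.
func goPrecedence() uint64 { return 3 & 1 << 2 }

// cPrecedence spells out the C-style grouping for comparison: 3&(1<<2).
func cPrecedence() uint64 { return 3 & (1 << 2) }

// asUnsigned reinterprets a signed value as an unsigned 64-bit integer
// with the same bit pattern.
func asUnsigned(v int64) uint64 { return uint64(v) }

func main() {
	fmt.Println(goPrecedence())            // 4
	fmt.Println(cPrecedence())             // 0
	fmt.Printf("%#x\n", asUnsigned(-2))    // 0xfffffffffffffffe
}
</pre>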
<h3 id="symbols">Symbols</h3>

<p>
Some symbols, such as <code>R1</code> or <code>LR</code>,
are predefined and refer to registers.
The exact set depends on the architecture.
</p>

<p>
There are four predeclared symbols that refer to pseudo-registers.
These are not real registers, but rather virtual registers maintained by
the tool chain, such as a frame pointer.
The set of pseudo-registers is the same for all architectures:
</p>

<ul>

<li>
<code>FP</code>: Frame pointer: arguments and locals.
</li>

<li>
<code>PC</code>: Program counter:
jumps and branches.
</li>

<li>
<code>SB</code>: Static base pointer: global symbols.
</li>

<li>
<code>SP</code>: Stack pointer: top of stack.
</li>

</ul>

<p>
All user-defined symbols are written as offsets to the pseudo-registers
<code>FP</code> (arguments and locals) and <code>SB</code> (globals).
</p>

<p>
The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code>
is the name <code>foo</code> as an address in memory.
This form is used to name global functions and data.
Adding <code><></code> to the name, as in <span style="white-space: nowrap"><code>foo<>(SB)</code></span>, makes the name
visible only in the current source file, like a top-level <code>static</code> declaration in a C file.
Adding an offset to the name refers to that offset from the symbol's address, so
<code>foo+4(SB)</code> is four bytes past the start of <code>foo</code>.
</p>

<p>
The <code>FP</code> pseudo-register is a virtual frame pointer
used to refer to function arguments.
The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register.
Thus <code>0(FP)</code> is the first argument to the function,
<code>8(FP)</code> is the second (on a 64-bit machine), and so on.
However, when referring to a function argument this way, it is necessary to place a name
at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>.
(The meaning of the offset, an offset from the frame pointer, is distinct
from its use with <code>SB</code>, where it is an offset from the symbol.)
The assembler enforces this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>.
The actual name is semantically irrelevant but should be used to document
the argument's name.
It is worth stressing that <code>FP</code> is always a
pseudo-register, not a hardware
register, even on architectures with a hardware frame pointer.
</p>

<p>
For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the argument names
and offsets match.
On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding
a <code>_lo</code> or <code>_hi</code> suffix to the name, as in <code>arg_lo+0(FP)</code> or <code>arg_hi+4(FP)</code>.
If a Go prototype does not name its result, the expected assembly name is <code>ret</code>.
</p>

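<p>
The naming convention for <code>FP</code> references can be sketched in Go as follows.
This is a simplified illustration that assumes every argument occupies exactly one machine word;
real argument layouts depend on the argument types and their alignment.
The helper <code>fpRef</code> is hypothetical and is not part of any Go tool.
</p>

<pre>
package main

import "fmt"

// fpRef builds the operand text for the index-th argument, assuming
// (for illustration only) that each argument is wordSize bytes wide.
func fpRef(name string, index, wordSize int) string {
	return fmt.Sprintf("%s+%d(FP)", name, index*wordSize)
}

func main() {
	// On a 64-bit machine the word size is 8 bytes.
	fmt.Println(fpRef("first_arg", 0, 8))  // first_arg+0(FP)
	fmt.Println(fpRef("second_arg", 1, 8)) // second_arg+8(FP)
}
</pre>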
<p>
The <code>SP</code> pseudo-register is a virtual stack pointer
used to refer to frame-local variables and the arguments being
prepared for function calls.
It points to the top of the local stack frame, so references should use negative offsets
in the range [−framesize, 0):
<code>x-8(SP)</code>, <code>y-4(SP)</code>, and so on.
</p>

<p>
On architectures with a hardware register named <code>SP</code>,
the name prefix distinguishes
references to the virtual stack pointer from references to the architectural
<code>SP</code> register.
That is, <code>x-8(SP)</code> and <code>-8(SP)</code>
are different memory locations:
the first refers to the virtual stack pointer pseudo-register,
while the second refers to the
hardware's <code>SP</code> register.
</p>

<p>
On machines where <code>SP</code> and <code>PC</code> are
traditionally aliases for a physical, numbered register,
in the Go assembler the names <code>SP</code> and <code>PC</code>
are still treated specially;
for instance, references to <code>SP</code> require a symbol,
much like <code>FP</code>.
To access the actual hardware register use the true <code>R</code> name.
For example, on the ARM architecture the hardware
<code>SP</code> and <code>PC</code> are accessible as
<code>R13</code> and <code>R15</code>.
</p>

<p>
Branches and direct jumps are always written as offsets to the PC, or as
jumps to labels:
</p>

<pre>
label:
	MOVW $0, R1
	JMP label
</pre>

<p>
Each label is visible only within the function in which it is defined.
It is therefore permitted for multiple functions in a file to define
and use the same label names.
Direct jumps and call instructions can target text symbols,
such as <code>name(SB)</code>, but not offsets from symbols,
such as <code>name+4(SB)</code>.
</p>

<p>
Instructions, registers, and assembler directives are always in UPPER CASE to remind you
that assembly programming is a fraught endeavor.
(Exception: the <code>g</code> register renaming on ARM.)
</p>

<p>
In Go object files and binaries, the full name of a symbol is the
package path followed by a period and the symbol name:
<code>fmt.Printf</code> or <code>math/rand.Int</code>.
Because the assembler's parser treats period and slash as punctuation,
those strings cannot be used directly as identifier names.
Instead, the assembler allows the middle dot character U+00B7
and the division slash U+2215 in identifiers and rewrites them to
plain period and slash.
Within an assembler source file, the symbols above are written as
<code>fmt·Printf</code> and <code>math∕rand·Int</code>.
The assembly listings generated by the compilers when using the <code>-S</code> flag
show the period and slash directly instead of the Unicode replacements
required by the assemblers.
</p>

<p>
Most hand-written assembly files do not include the full package path
in symbol names, because the linker inserts the package path of the current
object file at the beginning of any name starting with a period:
in an assembly source file within the math/rand package implementation,
the package's Int function can be referred to as <code>·Int</code>.
This convention avoids the need to hard-code a package's import path in its
own source code, making it easier to move the code from one location to another.
</p>

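<p>
The two rewriting rules just described can be modeled with a few lines of Go.
This is only a sketch of the behavior as stated in the text, not the tool chain's actual code;
<code>linkName</code> and <code>asmToLink</code> are hypothetical names invented for the example.
</p>

<pre>
package main

import (
	"fmt"
	"strings"
)

// asmToLink replaces the middle dot (U+00B7) and division slash (U+2215)
// with a plain period and slash.
var asmToLink = strings.NewReplacer("·", ".", "∕", "/")

// linkName returns the full symbol name for an identifier written in an
// assembly source file that belongs to the package pkgPath.
func linkName(pkgPath, asmName string) string {
	name := asmToLink.Replace(asmName)
	if strings.HasPrefix(name, ".") {
		// A leading period stands for the current package path.
		name = pkgPath + name
	}
	return name
}

func main() {
	fmt.Println(linkName("math/rand", "·Int"))          // math/rand.Int
	fmt.Println(linkName("math/rand", "fmt·Printf"))    // fmt.Printf
	fmt.Println(linkName("main", "math∕rand·Int"))      // math/rand.Int
}
</pre>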
<h3 id="directives">Directives</h3>

<p>
The assembler uses various directives to bind text and data to symbol names.
For example, here is a simple complete function definition. The <code>TEXT</code>
directive declares the symbol <code>runtime·profileloop</code> and the instructions
that follow form the body of the function.
The last instruction in a <code>TEXT</code> block must be some sort of jump, usually a <code>RET</code> (pseudo-)instruction.
(If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in <code>TEXT</code> blocks.)
After the symbol, the arguments are flags (see below)
and the frame size, a constant (but see below):
</p>

<pre>
TEXT runtime·profileloop(SB),NOSPLIT,$8
	MOVQ	$runtime·profileloop1(SB), CX
	MOVQ	CX, 0(SP)
	CALL	runtime·externalthreadhandler(SB)
	RET
</pre>

<p>
In the general case, the frame size is followed by an argument size, separated by a minus sign.
(It's not a subtraction, just idiosyncratic syntax.)
The frame size <code>$24-8</code> states that the function has a 24-byte frame
and is called with 8 bytes of arguments, which live on the caller's frame.
If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>,
the argument size must be provided.
For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the
argument size is correct.
</p>

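<p>
To make the "not a subtraction" point concrete, here is a small Go sketch that splits
a frame annotation such as <code>$24-8</code> into its two independent numbers.
It is an illustration of the notation only and makes no claim about how the assembler parses it;
<code>parseFrame</code> is a hypothetical helper, and the use of -1 for an omitted argument size
is a choice made for this example.
</p>

<pre>
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFrame splits "$frame-args" into the frame size and argument size.
// args is -1 when the annotation has no argument size, as in "$8".
func parseFrame(s string) (frame, args int, err error) {
	s = strings.TrimPrefix(s, "$")
	args = -1
	frameText := s
	if i := strings.Index(s, "-"); i >= 0 {
		// The minus sign is a separator, not an operator.
		frameText = s[:i]
		args, err = strconv.Atoi(s[i+1:])
		if err != nil {
			return 0, 0, err
		}
	}
	frame, err = strconv.Atoi(frameText)
	if err != nil {
		return 0, 0, err
	}
	return frame, args, nil
}

func main() {
	frame, args, err := parseFrame("$24-8")
	fmt.Println(frame, args, err) // 24 8 &lt;nil&gt; would be wrong HTML here; the program prints: 24 8 and a nil error
}
</pre>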
<p>
Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the
static base pseudo-register <code>SB</code>.
This function would be called from Go source for package <code>runtime</code> using the
simple name <code>profileloop</code>.
</p>

<p>
Global data symbols are defined by a sequence of initializing
<code>DATA</code> directives followed by a <code>GLOBL</code> directive.
Each <code>DATA</code> directive initializes a section of the
corresponding memory.
The memory not explicitly initialized is zeroed.
The general form of the <code>DATA</code> directive is
</p>

<pre>
DATA	symbol+offset(SB)/width, value
</pre>

<p>
which initializes the symbol memory at the given offset and width with the given value.
The <code>DATA</code> directives for a given symbol must be written with increasing offsets.
</p>

<p>
The <code>GLOBL</code> directive declares a symbol to be global.
The arguments are optional flags and the size of the data being declared as a global,
which will have initial value all zeros unless a <code>DATA</code> directive
has initialized it.
The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives.
</p>

<p>
For example,
</p>

<pre>
DATA divtab<>+0x00(SB)/4, $0xf4f8fcff
DATA divtab<>+0x04(SB)/4, $0xe6eaedf0
...
DATA divtab<>+0x3c(SB)/4, $0x81828384
GLOBL divtab<>(SB), RODATA, $64

GLOBL runtime·tlsoffset(SB), NOPTR, $4
</pre>

<p>
declares and initializes <code>divtab<></code>, a read-only 64-byte table of 4-byte integer values,
and declares <code>runtime·tlsoffset</code>, a 4-byte, implicitly zeroed variable that
contains no pointers.
</p>

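<p>
Tables like this are tedious to write by hand, so one possible approach is to generate the
directives. The Go sketch below emits <code>DATA</code> lines with increasing offsets followed by
the matching <code>GLOBL</code> line, in the order the text requires.
The function <code>emitData</code> is hypothetical, and the fixed 4-byte width and
<code>RODATA</code> flag are assumptions made to mirror the example above.
</p>

<pre>
package main

import "fmt"

// emitData returns DATA directives for a table of 4-byte words, written
// with increasing offsets, followed by the GLOBL directive that declares
// the whole symbol as read-only data.
func emitData(sym string, words []uint32) []string {
	lines := make([]string, 0, len(words)+1)
	for i, w := range words {
		lines = append(lines, fmt.Sprintf("DATA %s+0x%02x(SB)/4, $0x%08x", sym, 4*i, w))
	}
	// GLOBL must come after the DATA directives for the same symbol.
	lines = append(lines, fmt.Sprintf("GLOBL %s(SB), RODATA, $%d", sym, 4*len(words)))
	return lines
}

func main() {
	for _, line := range emitData("divtab<>", []uint32{0xf4f8fcff, 0xe6eaedf0}) {
		fmt.Println(line)
	}
}
</pre>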
<p>
There may be one or two arguments to the directives.
If there are two, the first is a bit mask of flags,
which can be written as numeric expressions, added or or-ed together,
or can be set symbolically for easier absorption by a human.
Their values, defined in the standard <code>#include</code> file <code>textflag.h</code>, are:
</p>

<ul>
<li>
<code>NOPROF</code> = 1
<br>
(For <code>TEXT</code> items.)
Don't profile the marked function. This flag is deprecated.
</li>
<li>
<code>DUPOK</code> = 2
<br>
It is legal to have multiple instances of this symbol in a single binary.
The linker will choose one of the duplicates to use.
</li>
<li>
<code>NOSPLIT</code> = 4
<br>
(For <code>TEXT</code> items.)
Don't insert the preamble to check if the stack must be split.
The frame for the routine, plus anything it calls, must fit in the
spare space at the top of the stack segment.
Used to protect routines such as the stack splitting code itself.
</li>
<li>
<code>RODATA</code> = 8
<br>
(For <code>DATA</code> and <code>GLOBL</code> items.)
Put this data in a read-only section.
</li>
<li>
<code>NOPTR</code> = 16
<br>
(For <code>DATA</code> and <code>GLOBL</code> items.)
This data contains no pointers and therefore does not need to be
scanned by the garbage collector.
</li>
<li>
<code>WRAPPER</code> = 32
<br>
(For <code>TEXT</code> items.)
This is a wrapper function and should not count as disabling <code>recover</code>.
</li>
<li>
<code>NEEDCTXT</code> = 64
<br>
(For <code>TEXT</code> items.)
This function is a closure so it uses its incoming context register.
</li>
</ul>

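<p>
Because the flags form a bit mask, a numeric value and a symbolic combination describe the same thing.
The Go sketch below copies the values from the list above into constants and decodes a mask back into names.
It is an illustration only: the real definitions are preprocessor macros in <code>textflag.h</code>,
and <code>flagNames</code> and <code>flagTable</code> are hypothetical names used just for this example.
</p>

<pre>
package main

import (
	"fmt"
	"strings"
)

// Flag values as listed above.
const (
	NOPROF   = 1
	DUPOK    = 2
	NOSPLIT  = 4
	RODATA   = 8
	NOPTR    = 16
	WRAPPER  = 32
	NEEDCTXT = 64
)

// flagTable pairs each bit with its symbolic name, in increasing order.
var flagTable = []struct {
	bit  int
	name string
}{
	{NOPROF, "NOPROF"},
	{DUPOK, "DUPOK"},
	{NOSPLIT, "NOSPLIT"},
	{RODATA, "RODATA"},
	{NOPTR, "NOPTR"},
	{WRAPPER, "WRAPPER"},
	{NEEDCTXT, "NEEDCTXT"},
}

// flagNames decodes a mask into its symbolic names joined by "|".
func flagNames(mask int) string {
	var names []string
	for _, f := range flagTable {
		if mask&f.bit != 0 {
			names = append(names, f.name)
		}
	}
	return strings.Join(names, "|")
}

func main() {
	fmt.Println(flagNames(RODATA | NOPTR)) // RODATA|NOPTR
	fmt.Println(flagNames(24))             // RODATA|NOPTR
	fmt.Println(flagNames(NOSPLIT + WRAPPER))
}
</pre>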
<h3 id="runtime">Runtime Coordination</h3>

<p>
For garbage collection to run correctly, the runtime must know the
location of pointers in all global data and in most stack frames.
The Go compiler emits this information when compiling Go source files,
but assembly programs must define it explicitly.
</p>

<p>
A data symbol marked with the <code>NOPTR</code> flag (see above)
is treated as containing no pointers to runtime-allocated data.
A data symbol with the <code>RODATA</code> flag
is allocated in read-only memory and is therefore treated
as implicitly marked <code>NOPTR</code>.
A data symbol with a total size smaller than a pointer
is also treated as implicitly marked <code>NOPTR</code>.
It is not possible to define a symbol containing pointers in an assembly source file;
such a symbol must be defined in a Go source file instead.
Assembly source can still refer to the symbol by name
even without <code>DATA</code> and <code>GLOBL</code> directives.
A good general rule of thumb is to define all non-<code>RODATA</code>
symbols in Go instead of in assembly.
</p>

<p>
Each function also needs annotations giving the location of
live pointers in its arguments, results, and local stack frame.
For an assembly function with no pointer results and
either no local stack frame or no function calls,
the only requirement is to define a Go prototype for the function
in a Go source file in the same package. The name of the assembly
function must not contain the package name component (for example,
function <code>Syscall</code> in package <code>syscall</code> should
use the name <code>·Syscall</code> instead of the equivalent name
<code>syscall·Syscall</code> in its <code>TEXT</code> directive).
For more complex situations, explicit annotation is needed.
These annotations use pseudo-instructions defined in the standard
<code>#include</code> file <code>funcdata.h</code>.
</p>

<p>
If a function has no arguments and no results,
the pointer information can be omitted.
This is indicated by an argument size annotation of <code>$<i>n</i>-0</code>
on the <code>TEXT</code> instruction.
Otherwise, pointer information must be provided by
a Go prototype for the function in a Go source file,
even for assembly functions not called directly from Go.
(The prototype will also let <code>go</code> <code>vet</code> check the argument references.)
At the start of the function, the arguments are assumed
to be initialized but the results are assumed uninitialized.
If the results will hold live pointers during a call instruction,
the function should start by zeroing the results and then
executing the pseudo-instruction <code>GO_RESULTS_INITIALIZED</code>.
This instruction records that the results are now initialized
and should be scanned during stack movement and garbage collection.
It is typically easier to arrange that assembly functions do not
return pointers or do not contain call instructions;
no assembly functions in the standard library use
<code>GO_RESULTS_INITIALIZED</code>.
</p>

<p>
If a function has no local stack frame,
the pointer information can be omitted.
This is indicated by a local frame size annotation of <code>$0-<i>n</i></code>
on the <code>TEXT</code> instruction.
The pointer information can also be omitted if the
function contains no call instructions.
Otherwise, the local stack frame must not contain pointers,
and the assembly must confirm this fact by executing the
pseudo-instruction <code>NO_LOCAL_POINTERS</code>.
Because stack resizing is implemented by moving the stack,
the stack pointer may change during any function call:
even pointers to stack data must not be kept in local variables.
</p>

<h2 id="architectures">Architecture-specific details</h2>

<p>
It is impractical to list all the instructions and other details for each machine.
To see what instructions are defined for a given machine, say ARM,
look in the source for the <code>obj</code> support library for
that architecture, located in the directory <code>src/cmd/internal/obj/arm</code>.
In that directory is a file <code>a.out.go</code>; it contains
a long list of constants starting with <code>A</code>, like this:
</p>

<pre>
const (
	AAND = obj.ABaseARM + obj.A_ARCHSPECIFIC + iota
	AEOR
	ASUB
	ARSB
	AADD
	...
</pre>

<p>
This is the list of instructions and their spellings as known to the assembler and linker for that architecture.
Each instruction begins with an initial capital <code>A</code> in this list, so <code>AAND</code>
represents the bitwise and instruction,
<code>AND</code> (without the leading <code>A</code>),
and is written in assembly source as <code>AND</code>.
The enumeration is mostly in alphabetical order.
(The architecture-independent <code>AXXX</code>, defined in the
<code>cmd/internal/obj</code> package,
represents an invalid instruction.)
The sequence of the <code>A</code> names has nothing to do with the actual
encoding of the machine instructions.
The <code>cmd/internal/obj</code> package takes care of that detail.
</p>

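<p>
The relationship between a constant name and the spelling used in assembly source is purely textual,
as this small Go sketch shows.
The helper <code>mnemonic</code> is hypothetical and exists only to illustrate the naming rule;
it does not consult the real instruction tables.
</p>

<pre>
package main

import (
	"fmt"
	"strings"
)

// mnemonic drops the single leading "A" from an instruction constant name,
// giving the spelling used in assembly source.
func mnemonic(constName string) string {
	return strings.TrimPrefix(constName, "A")
}

func main() {
	for _, name := range []string{"AAND", "AEOR", "ASUB", "ARSB", "AADD"} {
		fmt.Println(name, "is written as", mnemonic(name))
	}
}
</pre>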
<p>
The instructions for both the 386 and AMD64 architectures are listed in
<code>cmd/internal/obj/x86/a.out.go</code>.
</p>

<p>
The architectures share syntax for common addressing modes such as
<code>(R1)</code> (register indirect),
<code>4(R1)</code> (register indirect with offset), and
<code>$foo(SB)</code> (absolute address).
The assembler also supports some (not necessarily all) addressing modes
specific to each architecture.
The sections below list these.
</p>

<p>
One detail evident in the examples from the previous sections is that data in the instructions flows from left to right:
<code>MOVQ</code> <code>$0,</code> <code>CX</code> clears <code>CX</code>.
This rule applies even on architectures where the conventional notation uses the opposite direction.
</p>

<p>
Here follow some descriptions of key Go-specific details for the supported architectures.
</p>

<h3 id="x86">32-bit Intel 386</h3>

<p>
The runtime pointer to the <code>g</code> structure is maintained
|
2013-11-12 21:04:22 -07:00
|
|
|
|
through the value of an otherwise unused (as far as Go is concerned) register in the MMU.
|
|
|
|
|
A OS-dependent macro <code>get_tls</code> is defined for the assembler if the source includes
|
2015-07-12 23:22:35 -06:00
|
|
|
|
a special header, <code>go_asm.h</code>:
|
2013-11-12 21:04:22 -07:00
|
|
|
|
</p>

<pre>
#include "go_asm.h"
</pre>

<p>
Within the runtime, the <code>get_tls</code> macro loads its argument register
with a pointer to the <code>g</code> pointer, and the <code>g</code> struct
contains the <code>m</code> pointer.
The sequence to load <code>g</code> and <code>m</code> using <code>CX</code> looks like this:
</p>

<pre>
get_tls(CX)
MOVL	g(CX), AX     // Move g into AX.
MOVL	g_m(AX), BX   // Move g.m into BX.
</pre>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code>.
</li>

<li>
<code>64(DI)(BX*2)</code>: The location at address <code>DI</code> plus <code>BX*2</code> plus 64.
These modes accept only 1, 2, 4, and 8 as scale factors.
</li>

</ul>
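
<p>
The arithmetic behind these modes can be modeled in a few lines of Go.
This is only a sketch for illustration: the function name
<code>effectiveAddress</code> and its parameters are hypothetical and are not
part of the assembler, and the register values in <code>main</code> are made up.
</p>

```go
package main

import "fmt"

// effectiveAddress models the x86 operand form disp(base)(index*scale).
// It is an illustration only; the name and signature are hypothetical.
func effectiveAddress(disp int32, base, index uint32, scale uint8) (uint32, error) {
	// Only the scale factors 1, 2, 4, and 8 are accepted.
	switch scale {
	case 1, 2, 4, 8:
	default:
		return 0, fmt.Errorf("invalid scale factor %d", scale)
	}
	// Unsigned 32-bit arithmetic wraps, which also handles negative displacements.
	return base + index*uint32(scale) + uint32(disp), nil
}

func main() {
	// Models 64(DI)(BX*2) with the assumed values DI = 0x1000 and BX = 3.
	addr, err := effectiveAddress(64, 0x1000, 3, 2)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%#x\n", addr)
}
```

<p>
Calling the function with a scale such as 3 returns an error, mirroring the
restriction on scale factors stated in the list.
</p>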

<h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3>

<p>
The two architectures behave largely the same at the assembler level.
Assembly code to access the <code>m</code> and <code>g</code>
pointers on the 64-bit version is the same as on the 32-bit 386,
except it uses <code>MOVQ</code> rather than <code>MOVL</code>:
</p>

<pre>
get_tls(CX)
MOVQ	g(CX), AX     // Move g into AX.
MOVQ	g_m(AX), BX   // Move g.m into BX.
</pre>

<h3 id="arm">ARM</h3>

<p>
The registers <code>R10</code> and <code>R11</code>
are reserved by the compiler and linker.
</p>

<p>
<code>R10</code> points to the <code>g</code> (goroutine) structure.
Within assembler source code, this pointer must be referred to as <code>g</code>;
the name <code>R10</code> is not recognized.
</p>

<p>
To make it easier for people and compilers to write assembly, the ARM linker
allows general addressing forms and pseudo-operations like <code>DIV</code> or <code>MOD</code>
that may not be expressible using a single hardware instruction.
It implements these forms as multiple instructions, often using the <code>R11</code> register
to hold temporary values.
Hand-written assembly can use <code>R11</code>, but doing so requires
being sure that the linker is not also using it to implement any of the other
instructions in the function.
</p>

<p>
When defining a <code>TEXT</code>, specifying frame size <code>$-4</code>
tells the linker that this is a leaf function that does not need to save <code>LR</code> on entry.
</p>

<p>
The name <code>SP</code> always refers to the virtual stack pointer described earlier.
For the hardware register, use <code>R13</code>.
</p>

<p>
To specify a condition code, append a period and the one- or two-letter code to the instruction,
as in <code>MOVW.EQ</code>.
Multiple codes may be appended: <code>MOVM.IA.W</code>.
The order of the code modifiers is irrelevant.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>R0-&gt;16</code>
<br>
<code>R0&gt;&gt;16</code>
<br>
<code>R0&lt;&lt;16</code>
<br>
<code>R0@&gt;16</code>:
For <code>&lt;&lt;</code>, left shift <code>R0</code> by 16 bits.
The other codes are <code>-&gt;</code> (arithmetic right shift),
<code>&gt;&gt;</code> (logical right shift), and
<code>@&gt;</code> (rotate right).
</li>

<li>
<code>R0-&gt;R1</code>
<br>
<code>R0&gt;&gt;R1</code>
<br>
<code>R0&lt;&lt;R1</code>
<br>
<code>R0@&gt;R1</code>:
For <code>&lt;&lt;</code>, left shift <code>R0</code> by the count in <code>R1</code>.
The other codes are <code>-&gt;</code> (arithmetic right shift),
<code>&gt;&gt;</code> (logical right shift), and
<code>@&gt;</code> (rotate right).
</li>

<li>
<code>[R0,g,R12-R15]</code>: For multi-register instructions, the set comprising
<code>R0</code>, <code>g</code>, and <code>R12</code> through <code>R15</code> inclusive.
</li>

<li>
<code>(R5, R6)</code>: Destination register pair.
</li>

</ul>
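
<p>
The four shift operators in the list above differ only in how bits are moved in or out.
The following Go sketch models them on a 32-bit value.
The function name <code>shift</code> is hypothetical, the sample value is arbitrary,
and counts are assumed to lie in the range 0 to 31; it is a model of the semantics,
not the assembler's implementation.
</p>

```go
package main

import (
	"fmt"
	"math/bits"
)

// shift applies one of the four ARM shift operators, written as in
// assembler source, to a 32-bit value. Counts are assumed to be 0..31.
// This is a hypothetical helper for illustration only.
func shift(op string, x uint32, n uint) (uint32, error) {
	switch op {
	case "<<": // left shift
		return x << n, nil
	case ">>": // logical right shift: zero bits enter from the left
		return x >> n, nil
	case "->": // arithmetic right shift: the sign bit is replicated
		return uint32(int32(x) >> n), nil
	case "@>": // rotate right: bits leaving on the right re-enter on the left
		return bits.RotateLeft32(x, -int(n)), nil
	}
	return 0, fmt.Errorf("unknown shift operator %q", op)
}

func main() {
	const r0 = 0x80000001 // arbitrary sample value with the sign bit set
	for _, op := range []string{"<<", ">>", "->", "@>"} {
		v, err := shift(op, r0, 16)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("R0%s16 = 0x%08x\n", op, v)
	}
}
```

<p>
A value with the sign bit set is used so that the arithmetic and logical
right shifts produce visibly different results.
</p>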

<h3 id="arm64">ARM64</h3>

<p>
The ARM64 port is in an experimental state.
</p>

<p>
Instruction modifiers are appended to the instruction following a period.
The only modifiers are <code>P</code> (postincrement) and <code>W</code>
(preincrement):
<code>MOVW.P</code>, <code>MOVW.W</code>.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>(R5, R6)</code>: Register pair for <code>LDP</code>/<code>STP</code>.
</li>

</ul>

<h3 id="ppc64">Power 64, a.k.a. ppc64</h3>

<p>
The Power 64 port is in an experimental state.
</p>

<p>
Addressing modes:
</p>

<ul>

<li>
<code>(R5)(R6*1)</code>: The location at <code>R5</code> plus <code>R6</code>. It is a scaled
mode as on the x86, but the only scale allowed is <code>1</code>.
</li>

<li>
<code>(R5+R6)</code>: Alias for <code>(R5)(R6*1)</code>.
</li>

</ul>

<h3 id="unsupported_opcodes">Unsupported opcodes</h3>

<p>
The assemblers are designed to support the compiler, so not all hardware instructions
are defined for all architectures: if the compiler doesn't generate it, it might not be there.
If you need to use a missing instruction, there are two ways to proceed.
One is to update the assembler to support that instruction, which is straightforward
but only worthwhile if it's likely the instruction will be used again.
Alternatively, for simple one-off cases, it's possible to use the <code>BYTE</code>
and <code>WORD</code> directives
to lay down explicit data into the instruction stream within a <code>TEXT</code>.
Here's how the 386 runtime defines the 64-bit atomic load function.
</p>

<pre>
// uint64 atomicload64(uint64 volatile* addr);
// so actually
// void atomicload64(uint64 *res, uint64 volatile *addr);
TEXT runtime·atomicload64(SB), NOSPLIT, $0-12
	MOVL	ptr+0(FP), AX
	TESTL	$7, AX
	JZ	2(PC)
	MOVL	0, AX // crash with nil ptr deref
	LEAL	ret_lo+4(FP), BX
	// MOVQ (%EAX), %MM0
	BYTE $0x0f; BYTE $0x6f; BYTE $0x00
	// MOVQ %MM0, 0(%EBX)
	BYTE $0x0f; BYTE $0x7f; BYTE $0x03
	// EMMS
	BYTE $0x0F; BYTE $0x77
	RET
</pre>
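
<p>
When laying down instructions with <code>BYTE</code>, it helps to check the encoding by hand.
In the two MMX moves above, the third byte is the x86 ModRM byte, which selects the operands.
The Go sketch below splits that byte into its <code>mod</code>, <code>reg</code>, and <code>rm</code> fields;
the helper name <code>modRM</code> is hypothetical and the program is only a checking aid,
not part of the toolchain.
</p>

```go
package main

import "fmt"

// modRM splits an x86 ModRM byte into its three fields:
// mod (bits 7-6), reg (bits 5-3), and rm (bits 2-0).
// The helper is hypothetical and exists only for illustration.
func modRM(b byte) (mod, reg, rm byte) {
	return b >> 6, (b >> 3) & 7, b & 7
}

func main() {
	// Byte sequences copied from the BYTE directives in the listing.
	seqs := [][]byte{
		{0x0f, 0x6f, 0x00}, // MOVQ (%EAX), %MM0
		{0x0f, 0x7f, 0x03}, // MOVQ %MM0, 0(%EBX)
	}
	for _, s := range seqs {
		mod, reg, rm := modRM(s[2])
		fmt.Printf("opcode %02x %02x: mod=%d reg=%d rm=%d\n", s[0], s[1], mod, reg, rm)
	}
}
```

<p>
In both sequences <code>mod</code> is 0 (register-indirect with no displacement) and <code>reg</code> is 0 (<code>MM0</code>);
<code>rm</code> is 0 for <code>EAX</code> in the load and 3 for <code>EBX</code> in the store,
which agrees with the comments in the listing.
</p>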