mirror of
https://github.com/golang/go
synced 2024-11-13 19:00:25 -07:00
cmd/cgo: add implementation comment
R=golang-dev, r, bradfitz, iant CC=golang-dev https://golang.org/cl/7407050
This commit is contained in:
parent
3b69efb010
commit
062a239046
@ -134,3 +134,266 @@ See "C? Go? Cgo!" for an introduction to using cgo:
|
|||||||
http://golang.org/doc/articles/c_go_cgo.html
|
http://golang.org/doc/articles/c_go_cgo.html
|
||||||
*/
|
*/
|
||||||
package main
|
package main
|
||||||
|
|
||||||
|
/*
|
||||||
|
Implementation details.
|
||||||
|
|
||||||
|
Cgo provides a way for Go programs to call C code linked into the same
|
||||||
|
address space. This comment explains the operation of cgo.
|
||||||
|
|
||||||
|
Cgo reads a set of Go source files and looks for statements saying
|
||||||
|
import "C". If the import has a doc comment, that comment is
|
||||||
|
taken as literal C code to be used as a preamble to any C code
|
||||||
|
generated by cgo. A typical preamble #includes necessary definitions:
|
||||||
|
|
||||||
|
// #include <stdio.h>
|
||||||
|
import "C"
|
||||||
|
|
||||||
|
For more details about the usage of cgo, see the documentation
|
||||||
|
comment at the top of this file.
|
||||||
|
|
||||||
|
Understanding C
|
||||||
|
|
||||||
|
Cgo scans the Go source files that import "C" for uses of that
|
||||||
|
package, such as C.puts. It collects all such identifiers. The next
|
||||||
|
step is to determine each kind of name. In C.xxx the xxx might refer
|
||||||
|
to a type, a function, a constant, or a global variable. Cgo must
|
||||||
|
decide which.
|
||||||
|
|
||||||
|
The obvious thing for cgo to do is to process the preamble, expanding
|
||||||
|
#includes and processing the corresponding C code. That would require
|
||||||
|
a full C parser and type checker that was also aware of any extensions
|
||||||
|
known to the system compiler (for example, all the GNU C extensions) as
|
||||||
|
well as the system-specific header locations and system-specific
|
||||||
|
pre-#defined macros. This is certainly possible to do, but it is an
|
||||||
|
enormous amount of work.
|
||||||
|
|
||||||
|
Cgo takes a different approach. It determines the meaning of C
|
||||||
|
identifiers not by parsing C code but by feeding carefully constructed
|
||||||
|
programs into the system C compiler and interpreting the generated
|
||||||
|
error messages, debug information, and object files. In practice,
|
||||||
|
parsing these is significantly less work and more robust than parsing
|
||||||
|
C source.
|
||||||
|
|
||||||
|
Cgo first invokes gcc -E -dM on the preamble, in order to find out
|
||||||
|
about simple #defines for constants and the like. These are recorded
|
||||||
|
for later use.
|
||||||
|
|
||||||
|
Next, cgo needs to identify the kinds for each identifier. For the
|
||||||
|
identifiers C.foo and C.bar, cgo generates this C program:
|
||||||
|
|
||||||
|
<preamble>
|
||||||
|
void __cgo__f__(void) {
|
||||||
|
#line 1 "cgo-test"
|
||||||
|
foo;
|
||||||
|
enum { _cgo_enum_0 = foo };
|
||||||
|
bar;
|
||||||
|
enum { _cgo_enum_1 = bar };
|
||||||
|
}
|
||||||
|
|
||||||
|
This program will not compile, but cgo can look at the error messages
|
||||||
|
to infer the kind of each identifier. The line number given in the
|
||||||
|
error tells cgo which identifier is involved.
|
||||||
|
|
||||||
|
An error like "unexpected type name" or "useless type name in empty
|
||||||
|
declaration" or "declaration does not declare anything" tells cgo that
|
||||||
|
the identifier is a type.
|
||||||
|
|
||||||
|
An error like "statement with no effect" or "expression result unused"
|
||||||
|
tells cgo that the identifier is not a type, but not whether it is a
|
||||||
|
constant, function, or global variable.
|
||||||
|
|
||||||
|
An error like "not an integer constant" tells cgo that the identifier
|
||||||
|
is not a constant. If it is also not a type, it must be a function or
|
||||||
|
global variable. For now, those can be treated the same.
|
||||||
|
|
||||||
|
Next, cgo must learn the details of each type, variable, function, or
|
||||||
|
constant. It can do this by reading object files. If cgo has decided
|
||||||
|
that t1 is a type, v2 and v3 are variables or functions, and c4, c5,
|
||||||
|
and c6 are constants, it generates:
|
||||||
|
|
||||||
|
<preamble>
|
||||||
|
typeof(t1) *__cgo__1;
|
||||||
|
typeof(v2) *__cgo__2;
|
||||||
|
typeof(v3) *__cgo__3;
|
||||||
|
typeof(c4) *__cgo__4;
|
||||||
|
enum { __cgo_enum__4 = c4 };
|
||||||
|
typeof(c5) *__cgo__5;
|
||||||
|
enum { __cgo_enum__5 = c5 };
|
||||||
|
typeof(c6) *__cgo__6;
|
||||||
|
enum { __cgo_enum__6 = c6 };
|
||||||
|
|
||||||
|
long long __cgo_debug_data[] = {
|
||||||
|
0, // t1
|
||||||
|
0, // v2
|
||||||
|
0, // v3
|
||||||
|
c4,
|
||||||
|
c5,
|
||||||
|
c6,
|
||||||
|
1
|
||||||
|
};
|
||||||
|
|
||||||
|
and again invokes the system C compiler, to produce an object file
|
||||||
|
containing debug information. Cgo parses the DWARF debug information
|
||||||
|
for __cgo__N to learn the type of each identifier. (The types also
|
||||||
|
distinguish functions from global variables.) If using a standard gcc,
|
||||||
|
cgo can parse the DWARF debug information for the __cgo_enum__N to
|
||||||
|
learn the identifier's value. The LLVM-based gcc on OS X emits
|
||||||
|
incomplete DWARF information for enums; in that case cgo reads the
|
||||||
|
constant values from the __cgo_debug_data from the object file's data
|
||||||
|
segment.
|
||||||
|
|
||||||
|
At this point cgo knows the meaning of each C.xxx well enough to start
|
||||||
|
the translation process.
|
||||||
|
|
||||||
|
Translating Go
|
||||||
|
|
||||||
|
[The rest of this comment refers to 6g and 6c, the Go and C compilers
|
||||||
|
that are part of the amd64 port of the gc Go toolchain. Everything here
|
||||||
|
applies to another architecture's compilers as well.]
|
||||||
|
|
||||||
|
Given the input Go files x.go and y.go, cgo generates these source
|
||||||
|
files:
|
||||||
|
|
||||||
|
x.cgo1.go # for 6g
|
||||||
|
y.cgo1.go # for 6g
|
||||||
|
_cgo_gotypes.go # for 6g
|
||||||
|
_cgo_defun.c # for 6c
|
||||||
|
x.cgo2.c # for gcc
|
||||||
|
y.cgo2.c # for gcc
|
||||||
|
_cgo_export.c # for gcc
|
||||||
|
_cgo_main.c # for gcc
|
||||||
|
|
||||||
|
The file x.cgo1.go is a copy of x.go with the import "C" removed and
|
||||||
|
references to C.xxx replaced with names like _Cfunc_xxx or _Ctype_xxx.
|
||||||
|
The definitions of those identifiers, written as Go functions, types,
|
||||||
|
or variables, are provided in _cgo_gotypes.go.
|
||||||
|
|
||||||
|
Here is a _cgo_gotypes.go containing definitions for C.flush (provided
|
||||||
|
in the preamble) and C.puts (from stdio):
|
||||||
|
|
||||||
|
type _Ctype_char int8
|
||||||
|
type _Ctype_int int32
|
||||||
|
type _Ctype_void [0]byte
|
||||||
|
|
||||||
|
func _Cfunc_CString(string) *_Ctype_char
|
||||||
|
func _Cfunc_flush() _Ctype_void
|
||||||
|
func _Cfunc_puts(*_Ctype_char) _Ctype_int
|
||||||
|
|
||||||
|
For functions, cgo only writes an external declaration in the Go
|
||||||
|
output. The implementation is in a combination of C for 6c (meaning
|
||||||
|
any gc-toolchain compiler) and C for gcc.
|
||||||
|
|
||||||
|
The 6c file contains the definitions of the functions. They all have
|
||||||
|
similar bodies that invoke runtime·cgocall to make a switch from the
|
||||||
|
Go runtime world to the system C (GCC-based) world.
|
||||||
|
|
||||||
|
For example, here is the definition of _Cfunc_puts:
|
||||||
|
|
||||||
|
void _cgo_be59f0f25121_Cfunc_puts(void*);
|
||||||
|
|
||||||
|
void
|
||||||
|
·_Cfunc_puts(struct{uint8 x[1];}p)
|
||||||
|
{
|
||||||
|
runtime·cgocall(_cgo_be59f0f25121_Cfunc_puts, &p);
|
||||||
|
}
|
||||||
|
|
||||||
|
The hexadecimal number is a hash of cgo's input, chosen to be
|
||||||
|
deterministic yet unlikely to collide with other uses. The actual
|
||||||
|
function _cgo_be59f0f25121_Cfunc_flush is implemented in a C source
|
||||||
|
file compiled by gcc, the file x.cgo2.c:
|
||||||
|
|
||||||
|
void
|
||||||
|
_cgo_be59f0f25121_Cfunc_puts(void *v)
|
||||||
|
{
|
||||||
|
struct {
|
||||||
|
char* p0;
|
||||||
|
int r;
|
||||||
|
char __pad12[4];
|
||||||
|
} __attribute__((__packed__)) *a = v;
|
||||||
|
a->r = puts((void*)a->p0);
|
||||||
|
}
|
||||||
|
|
||||||
|
It extracts the arguments from the pointer to _Cfunc_puts's argument
|
||||||
|
frame, invokes the system C function (in this case, puts), stores the
|
||||||
|
result in the frame, and returns.
|
||||||
|
|
||||||
|
Linking
|
||||||
|
|
||||||
|
Once the _cgo_export.c and *.cgo2.c files have been compiled with gcc,
|
||||||
|
they need to be linked into the final binary, along with the libraries
|
||||||
|
they might depend on (in the case of puts, stdio). 6l has been
|
||||||
|
extended to understand basic ELF files, but it does not understand ELF
|
||||||
|
in the full complexity that modern C libraries embrace, so it cannot
|
||||||
|
in general generate direct references to the system libraries.
|
||||||
|
|
||||||
|
Instead, the build process generates an object file using dynamic
|
||||||
|
linkage to the desired libraries. The main function is provided by
|
||||||
|
_cgo_main.c:
|
||||||
|
|
||||||
|
int main() { return 0; }
|
||||||
|
void crosscall2(void(*fn)(void*, int), void *a, int c) { }
|
||||||
|
void _cgo_allocate(void *a, int c) { }
|
||||||
|
void _cgo_panic(void *a, int c) { }
|
||||||
|
|
||||||
|
The extra functions here are stubs to satisfy the references in the C
|
||||||
|
code generated for gcc. The build process links this stub, along with
|
||||||
|
_cgo_export.c and *.cgo2.c, into a dynamic executable and then lets
|
||||||
|
cgo examine the executable. Cgo records the list of shared library
|
||||||
|
references and resolved names and writes them into a new file
|
||||||
|
_cgo_import.c, which looks like:
|
||||||
|
|
||||||
|
#pragma dynlinker "/lib64/ld-linux-x86-64.so.2"
|
||||||
|
#pragma dynimport puts puts#GLIBC_2.2.5 "libc.so.6"
|
||||||
|
#pragma dynimport __libc_start_main __libc_start_main#GLIBC_2.2.5 "libc.so.6"
|
||||||
|
#pragma dynimport stdout stdout#GLIBC_2.2.5 "libc.so.6"
|
||||||
|
#pragma dynimport fflush fflush#GLIBC_2.2.5 "libc.so.6"
|
||||||
|
#pragma dynimport _ _ "libpthread.so.0"
|
||||||
|
#pragma dynimport _ _ "libc.so.6"
|
||||||
|
|
||||||
|
In the end, the compiled Go package, which will eventually be
|
||||||
|
presented to 6l as part of a larger program, contains:
|
||||||
|
|
||||||
|
_go_.6 # 6g-compiled object for _cgo_gotypes.go *.cgo1.go
|
||||||
|
_cgo_defun.6 # 6c-compiled object for _cgo_defun.c
|
||||||
|
_all.o # gcc-compiled object for _cgo_export.c, *.cgo2.c
|
||||||
|
_cgo_import.6 # 6c-compiled object for _cgo_import.c
|
||||||
|
|
||||||
|
The final program will be a dynamic executable, so that 6l can avoid
|
||||||
|
needing to process arbitrary .o files. It only needs to process the .o
|
||||||
|
files generated from C files that cgo writes, and those are much more
|
||||||
|
limited in the ELF or other features that they use.
|
||||||
|
|
||||||
|
In essence, the _cgo_import.6 file includes the extra linking
|
||||||
|
directives that 6l is not sophisticated enough to derive from _all.o
|
||||||
|
on its own. Similarly, the _all.o uses dynamic references to real
|
||||||
|
system object code because 6l is not sophisticated enough to process
|
||||||
|
the real code.
|
||||||
|
|
||||||
|
The main benefits of this system are that 6l remains relatively simple
|
||||||
|
(it does not need to implement a complete ELF and Mach-O linker) and
|
||||||
|
that gcc is not needed after the package is compiled. For example,
|
||||||
|
package net uses cgo for access to name resolution functions provided
|
||||||
|
by libc. Although gcc is needed to compile package net, gcc is not
|
||||||
|
needed to link programs that import package net.
|
||||||
|
|
||||||
|
Runtime
|
||||||
|
|
||||||
|
When using cgo, Go must not assume that it owns all details of the
|
||||||
|
process. In particular it needs to coordinate with C in the use of
|
||||||
|
threads and thread-local storage. The runtime package, in its own
|
||||||
|
(6c-compiled) C code, declares a few uninitialized (default bss)
|
||||||
|
variables:
|
||||||
|
|
||||||
|
bool runtime·iscgo;
|
||||||
|
void (*libcgo_thread_start)(void*);
|
||||||
|
void (*initcgo)(G*);
|
||||||
|
|
||||||
|
Any package using cgo imports "runtime/cgo", which provides
|
||||||
|
initializations for these variables. It sets iscgo to 1, initcgo to a
|
||||||
|
gcc-compiled function that can be called early during program startup,
|
||||||
|
and libcgo_thread_start to a gcc-compiled function that can be used to
|
||||||
|
create a new thread, in place of the runtime's usual direct system
|
||||||
|
calls.
|
||||||
|
|
||||||
|
*/
|
||||||
|
Loading…
Reference in New Issue
Block a user