Mirror of https://github.com/golang/go, synced 2024-11-22 09:14:40 -07:00
cmd/cgo: add implementation comment
R=golang-dev, r, bradfitz, iant
CC=golang-dev
https://golang.org/cl/7407050
parent 3b69efb010
commit 062a239046
@@ -134,3 +134,266 @@ See "C? Go? Cgo!" for an introduction to using cgo:
http://golang.org/doc/articles/c_go_cgo.html
*/
package main

/*
Implementation details.

Cgo provides a way for Go programs to call C code linked into the same
address space. This comment explains the operation of cgo.

Cgo reads a set of Go source files and looks for statements saying
import "C". If the import has a doc comment, that comment is
taken as literal C code to be used as a preamble to any C code
generated by cgo. A typical preamble #includes necessary definitions:

	// #include <stdio.h>
	import "C"

For more details about the usage of cgo, see the documentation
comment at the top of this file.

Understanding C

Cgo scans the Go source files that import "C" for uses of that
package, such as C.puts. It collects all such identifiers. The next
step is to determine the kind of each name. In C.xxx the xxx might refer
to a type, a function, a constant, or a global variable. Cgo must
decide which.
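
The collection step can be sketched with the standard go/ast package.
This is only an illustration, not cgo's actual code: the helper name
collectCNames and the sample source are hypothetical, and the sketch
assumes that any selector on an identifier named C refers to the
pseudo-package (it does not check for shadowing).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// collectCNames parses src and returns the sorted, de-duplicated set of
// names xxx that appear in selector expressions of the form C.xxx.
// Assumption: every identifier named C denotes the import "C".
func collectCNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", src, 0)
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	ast.Inspect(f, func(n ast.Node) bool {
		sel, ok := n.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		if id, ok := sel.X.(*ast.Ident); ok && id.Name == "C" {
			seen[sel.Sel.Name] = true
		}
		return true
	})
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

// exampleSrc is a hypothetical input file.
const exampleSrc = `package p

// #include <stdio.h>
import "C"

func hello() {
	C.puts(C.CString("hi"))
	C.fflush(C.stdout)
}
`

func main() {
	names, err := collectCNames(exampleSrc)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

At this stage the names are only strings; nothing yet says whether
puts is a function or stdout a variable.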

The obvious thing for cgo to do is to process the preamble, expanding
#includes and processing the corresponding C code. That would require
a full C parser and type checker that was also aware of any extensions
known to the system compiler (for example, all the GNU C extensions) as
well as the system-specific header locations and system-specific
pre-#defined macros. This is certainly possible to do, but it is an
enormous amount of work.

Cgo takes a different approach. It determines the meaning of C
identifiers not by parsing C code but by feeding carefully constructed
programs into the system C compiler and interpreting the generated
error messages, debug information, and object files. In practice,
parsing these is significantly less work and more robust than parsing
C source.

Cgo first invokes gcc -E -dM on the preamble, in order to find out
about simple #defines for constants and the like. These are recorded
for later use.

Next, cgo needs to identify the kind of each identifier. For the
identifiers C.foo and C.bar, cgo generates this C program:

	<preamble>
	void __cgo__f__(void) {
	#line 1 "cgo-test"
		foo;
		enum { _cgo_enum_0 = foo };
		bar;
		enum { _cgo_enum_1 = bar };
	}

This program will not compile, but cgo can look at the error messages
to infer the kind of each identifier. The line number given in the
error tells cgo which identifier is involved.
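
A minimal sketch of how such a probe program could be assembled, and
how a reported line number maps back to an identifier. The function
names probeProgram and nameForLine are hypothetical; the layout follows
the listing above, in which each identifier occupies two consecutive
lines after the #line directive.

```go
package main

import (
	"fmt"
	"strings"
)

// probeProgram builds the kind-detection program for the given names.
func probeProgram(preamble string, names []string) string {
	var b strings.Builder
	b.WriteString(preamble)
	b.WriteString("\nvoid __cgo__f__(void) {\n")
	b.WriteString("#line 1 \"cgo-test\"\n")
	for i, name := range names {
		fmt.Fprintf(&b, "\t%s;\n", name)
		fmt.Fprintf(&b, "\tenum { _cgo_enum_%d = %s };\n", i, name)
	}
	b.WriteString("}\n")
	return b.String()
}

// nameForLine maps a 1-based line number reported against "cgo-test"
// to the index of the identifier it belongs to. Lines 1 and 2 belong
// to names[0], lines 3 and 4 to names[1], and so on.
func nameForLine(line int) int {
	return (line - 1) / 2
}

func main() {
	names := []string{"foo", "bar"}
	fmt.Print(probeProgram("#include <stdio.h>", names))
	fmt.Println(names[nameForLine(3)])
}
```

Resetting the line counter with #line is what makes the mapping
independent of the length of the preamble.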

An error like "unexpected type name" or "useless type name in empty
declaration" or "declaration does not declare anything" tells cgo that
the identifier is a type.

An error like "statement with no effect" or "expression result unused"
tells cgo that the identifier is not a type, but not whether it is a
constant, function, or global variable.

An error like "not an integer constant" tells cgo that the identifier
is not a constant. If it is also not a type, it must be a function or
global variable. For now, those can be treated the same.
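
The three rules above can be sketched as a small classifier over the
messages reported for one identifier. This is an illustration under
stated assumptions, not cgo's implementation: the type and function
names are hypothetical, the sample messages in main are made up, and
the sketch assumes that a non-type with no "not an integer constant"
complaint is a constant, which the text implies but does not state.

```go
package main

import (
	"fmt"
	"strings"
)

type kind int

const (
	kindUnknown kind = iota
	kindType
	kindConst
	kindVarOrFunc
)

func (k kind) String() string {
	return [...]string{"unknown", "type", "const", "var or func"}[k]
}

// classify infers the kind of one identifier from the compiler
// messages attributed to it.
func classify(messages []string) kind {
	var isType, notType, notConst bool
	for _, m := range messages {
		switch {
		case strings.Contains(m, "unexpected type name"),
			strings.Contains(m, "useless type name in empty declaration"),
			strings.Contains(m, "declaration does not declare anything"):
			isType = true
		case strings.Contains(m, "statement with no effect"),
			strings.Contains(m, "expression result unused"):
			notType = true
		case strings.Contains(m, "not an integer constant"):
			notConst = true
		}
	}
	switch {
	case isType:
		return kindType
	case notType && notConst:
		return kindVarOrFunc
	case notType:
		// Assumption: no complaint about the enum means a constant.
		return kindConst
	}
	return kindUnknown
}

func main() {
	// Hypothetical messages for a single identifier.
	fmt.Println(classify([]string{
		"cgo-test:1: error: statement with no effect",
		"cgo-test:2: error: enumerator value for '_cgo_enum_0' is not an integer constant",
	}))
}
```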

Next, cgo must learn the details of each type, variable, function, or
constant. It can do this by reading object files. If cgo has decided
that t1 is a type, v2 and v3 are variables or functions, and c4, c5,
and c6 are constants, it generates:

	<preamble>
	typeof(t1) *__cgo__1;
	typeof(v2) *__cgo__2;
	typeof(v3) *__cgo__3;
	typeof(c4) *__cgo__4;
	enum { __cgo_enum__4 = c4 };
	typeof(c5) *__cgo__5;
	enum { __cgo_enum__5 = c5 };
	typeof(c6) *__cgo__6;
	enum { __cgo_enum__6 = c6 };

	long long __cgo_debug_data[] = {
		0, // t1
		0, // v2
		0, // v3
		c4,
		c5,
		c6,
		1
	};

and again invokes the system C compiler, to produce an object file
containing debug information. Cgo parses the DWARF debug information
for __cgo__N to learn the type of each identifier. (The types also
distinguish functions from global variables.) If using a standard gcc,
cgo can parse the DWARF debug information for the __cgo_enum__N to
learn the identifier's value. The LLVM-based gcc on OS X emits
incomplete DWARF information for enums; in that case cgo reads the
constant values from __cgo_debug_data in the object file's data
segment.
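
As a rough sketch of the fallback path, the bytes of __cgo_debug_data
can be decoded as an array of 64-bit integers. The helper names below
are hypothetical, and the sketch assumes an 8-byte long long stored in
little-endian order, as on amd64; locating the symbol's bytes inside
the object file is omitted.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeDebugData interprets data as consecutive 8-byte little-endian
// signed integers (the assumed layout of a long long array on amd64).
func decodeDebugData(data []byte) []int64 {
	vals := make([]int64, 0, len(data)/8)
	for len(data) >= 8 {
		vals = append(vals, int64(binary.LittleEndian.Uint64(data)))
		data = data[8:]
	}
	return vals
}

// encodeDebugData produces sample bytes for the demonstration; it
// stands in for the data segment of a real object file.
func encodeDebugData(vals []int64) []byte {
	buf := make([]byte, 8*len(vals))
	for i, v := range vals {
		binary.LittleEndian.PutUint64(buf[8*i:], uint64(v))
	}
	return buf
}

func main() {
	// Slots for t1, v2, v3 hold 0; c4, c5, c6 hold made-up values;
	// the final 1 is the trailing element from the listing above.
	data := encodeDebugData([]int64{0, 0, 0, 10, -1, 42, 1})
	fmt.Println(decodeDebugData(data))
}
```

Because the array has one slot per identifier, in order, the value of
the Nth identifier is simply the Nth decoded element.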

At this point cgo knows the meaning of each C.xxx well enough to start
the translation process.

Translating Go

[The rest of this comment refers to 6g and 6c, the Go and C compilers
that are part of the amd64 port of the gc Go toolchain. Everything here
applies to the compilers for other architectures as well.]

Given the input Go files x.go and y.go, cgo generates these source
files:

	x.cgo1.go       # for 6g
	y.cgo1.go       # for 6g
	_cgo_gotypes.go # for 6g
	_cgo_defun.c    # for 6c
	x.cgo2.c        # for gcc
	y.cgo2.c        # for gcc
	_cgo_export.c   # for gcc
	_cgo_main.c     # for gcc

The file x.cgo1.go is a copy of x.go with the import "C" removed and
references to C.xxx replaced with names like _Cfunc_xxx or _Ctype_xxx.
The definitions of those identifiers, written as Go functions, types,
or variables, are provided in _cgo_gotypes.go.

Here is a _cgo_gotypes.go containing definitions for C.flush (provided
in the preamble) and C.puts (from stdio):

	type _Ctype_char int8
	type _Ctype_int int32
	type _Ctype_void [0]byte

	func _Cfunc_CString(string) *_Ctype_char
	func _Cfunc_flush() _Ctype_void
	func _Cfunc_puts(*_Ctype_char) _Ctype_int

For functions, cgo only writes an external declaration in the Go
output. The implementation is in a combination of C for 6c (meaning
any gc-toolchain compiler) and C for gcc.

The 6c file contains the definitions of the functions. They all have
similar bodies that invoke runtime·cgocall to make a switch from the
Go runtime world to the system C (GCC-based) world.

For example, here is the definition of _Cfunc_puts:

	void _cgo_be59f0f25121_Cfunc_puts(void*);

	void
	·_Cfunc_puts(struct{uint8 x[1];}p)
	{
		runtime·cgocall(_cgo_be59f0f25121_Cfunc_puts, &p);
	}

The hexadecimal number is a hash of cgo's input, chosen to be
deterministic yet unlikely to collide with other uses. The actual
function _cgo_be59f0f25121_Cfunc_puts is implemented in a C source
file compiled by gcc, the file x.cgo2.c:

	void
	_cgo_be59f0f25121_Cfunc_puts(void *v)
	{
		struct {
			char* p0;
			int r;
			char __pad12[4];
		} __attribute__((__packed__)) *a = v;
		a->r = puts((void*)a->p0);
	}

It extracts the arguments from the pointer to _Cfunc_puts's argument
frame, invokes the system C function (in this case, puts), stores the
result in the frame, and returns.
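
One way to obtain a deterministic, collision-resistant prefix like the
one in the names above is to hash the input and keep a few bytes of
the digest. The sketch below is an assumption, not a description of
cgo: the text says only that the number is a hash of cgo's input, so
the choice of MD5, the six-byte truncation (matching the twelve hex
digits shown), and the helper names are all hypothetical.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// symbolPrefix derives a prefix from the given inputs. The same inputs
// always yield the same prefix; different inputs almost always differ.
// Assumption: MD5 truncated to 6 bytes (12 hex digits).
func symbolPrefix(inputs ...string) string {
	h := md5.New()
	for _, in := range inputs {
		h.Write([]byte(in))
	}
	return fmt.Sprintf("_cgo_%x", h.Sum(nil)[:6])
}

// wrapperName builds the name of the gcc-compiled wrapper for a C
// function, following the pattern shown in the listings above.
func wrapperName(prefix, cname string) string {
	return prefix + "_Cfunc_" + cname
}

func main() {
	// The input string is a stand-in for the real source files.
	prefix := symbolPrefix("package p\nimport \"C\"\n")
	fmt.Println(wrapperName(prefix, "puts"))
}
```

Determinism matters because the 6c-compiled caller and the
gcc-compiled wrapper are generated into different files and must agree
on the name.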

Linking

Once the _cgo_export.c and *.cgo2.c files have been compiled with gcc,
they need to be linked into the final binary, along with the libraries
they might depend on (in the case of puts, stdio). 6l has been
extended to understand basic ELF files, but it does not understand ELF
in the full complexity that modern C libraries embrace, so it cannot
in general generate direct references to the system libraries.

Instead, the build process generates an object file using dynamic
linkage to the desired libraries. The main function is provided by
_cgo_main.c:

	int main() { return 0; }
	void crosscall2(void(*fn)(void*, int), void *a, int c) { }
	void _cgo_allocate(void *a, int c) { }
	void _cgo_panic(void *a, int c) { }

The extra functions here are stubs to satisfy the references in the C
code generated for gcc. The build process links this stub, along with
_cgo_export.c and *.cgo2.c, into a dynamic executable and then lets
cgo examine the executable. Cgo records the list of shared library
references and resolved names and writes them into a new file
_cgo_import.c, which looks like:

	#pragma dynlinker "/lib64/ld-linux-x86-64.so.2"
	#pragma dynimport puts puts#GLIBC_2.2.5 "libc.so.6"
	#pragma dynimport __libc_start_main __libc_start_main#GLIBC_2.2.5 "libc.so.6"
	#pragma dynimport stdout stdout#GLIBC_2.2.5 "libc.so.6"
	#pragma dynimport fflush fflush#GLIBC_2.2.5 "libc.so.6"
	#pragma dynimport _ _ "libpthread.so.0"
	#pragma dynimport _ _ "libc.so.6"

In the end, the compiled Go package, which will eventually be
presented to 6l as part of a larger program, contains:

	_go_.6        # 6g-compiled object for _cgo_gotypes.go *.cgo1.go
	_cgo_defun.6  # 6c-compiled object for _cgo_defun.c
	_all.o        # gcc-compiled object for _cgo_export.c, *.cgo2.c
	_cgo_import.6 # 6c-compiled object for _cgo_import.c

The final program will be a dynamic executable, so that 6l can avoid
needing to process arbitrary .o files. It only needs to process the .o
files generated from C files that cgo writes, and those are much more
limited in the ELF or other features that they use.

In essence, the _cgo_import.6 file includes the extra linking
directives that 6l is not sophisticated enough to derive from _all.o
on its own. Similarly, the _all.o uses dynamic references to real
system object code because 6l is not sophisticated enough to process
the real code.

The main benefits of this system are that 6l remains relatively simple
(it does not need to implement a complete ELF and Mach-O linker) and
that gcc is not needed after the package is compiled. For example,
package net uses cgo for access to name resolution functions provided
by libc. Although gcc is needed to compile package net, gcc is not
needed to link programs that import package net.

Runtime

When using cgo, Go must not assume that it owns all details of the
process. In particular it needs to coordinate with C in the use of
threads and thread-local storage. The runtime package, in its own
(6c-compiled) C code, declares a few uninitialized (default bss)
variables:

	bool	runtime·iscgo;
	void	(*libcgo_thread_start)(void*);
	void	(*initcgo)(G*);

Any package using cgo imports "runtime/cgo", which provides
initializations for these variables. It sets iscgo to 1, initcgo to a
gcc-compiled function that can be called early during program startup,
and libcgo_thread_start to a gcc-compiled function that can be used to
create a new thread, in place of the runtime's usual direct system
calls.

*/