cmd/cgo: add implementation comment

R=golang-dev, r, bradfitz, iant
CC=golang-dev
https://golang.org/cl/7407050
2024-11-12 09:30:25 -07:00 · 2013-02-27 20:55:01 -08:00 · 2013-02-27 20:55:01 -08:00 · 062a239046
commit 062a239046
parent 3b69efb010
1 changed file with 263 additions and 0 deletions
--- a/src/cmd/cgo/doc.go
+++ b/src/cmd/cgo/doc.go
@@ -134,3 +134,266 @@ See "C? Go? Cgo!" for an introduction to using cgo:
 http://golang.org/doc/articles/c_go_cgo.html
 */
 package main

 /*
 Implementation details.

 Cgo provides a way for Go programs to call C code linked into the same
 address space. This comment explains the operation of cgo.

 Cgo reads a set of Go source files and looks for statements saying
 import "C". If the import has a doc comment, that comment is
 taken as literal C code to be used as a preamble to any C code
 generated by cgo. A typical preamble #includes necessary definitions:

 	// #include <stdio.h>
 	import "C"

 For more details about the usage of cgo, see the documentation
 comment at the top of this file.
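
As a rough illustration of how the preamble can be located, here is a
minimal Go sketch built on the standard go/parser package. The helper
name extractPreamble is hypothetical, and the sketch handles a single
file only; it is not cgo's actual code.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// extractPreamble is a hypothetical helper: it parses one Go source
// file and returns the C preamble attached to import "C", if any.
func extractPreamble(src string) (preamble string, sawC bool, err error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", src, parser.ParseComments)
	if err != nil {
		return "", false, err
	}
	for _, decl := range f.Decls {
		gen, ok := decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.IMPORT {
			continue
		}
		for _, spec := range gen.Specs {
			imp := spec.(*ast.ImportSpec)
			if imp.Path.Value != `"C"` {
				continue
			}
			sawC = true
			// In a grouped import the comment belongs to the spec; for a
			// lone `import "C"` the parser attaches it to the declaration.
			doc := imp.Doc
			if doc == nil && len(gen.Specs) == 1 {
				doc = gen.Doc
			}
			if doc != nil {
				// Text drops the // markers, leaving the C source.
				preamble += doc.Text()
			}
		}
	}
	return preamble, sawC, nil
}

func main() {
	src := "package p\n\n// #include <stdio.h>\nimport \"C\"\n"
	preamble, sawC, err := extractPreamble(src)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%v %q\n", sawC, preamble) // true "#include <stdio.h>\n"
}
```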

 Understanding C

 Cgo scans the Go source files that import "C" for uses of that
 package, such as C.puts. It collects all such identifiers. The next
 step is to determine each kind of name. In C.xxx the xxx might refer
 to a type, a function, a constant, or a global variable. Cgo must
 decide which.
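
The identifier-collection step can be sketched with go/ast. This is an
illustration under simplifying assumptions, not cgo's implementation:
collectCNames is a hypothetical name, and the sketch does not notice a
local variable that happens to be named C.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// collectCNames is a hypothetical helper that walks a parsed Go file
// and records every distinct xxx used as C.xxx, in sorted order.
func collectCNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", src, 0)
	if err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	ast.Inspect(f, func(n ast.Node) bool {
		sel, ok := n.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		// Only selectors whose left side is the bare identifier C count.
		if x, ok := sel.X.(*ast.Ident); ok && x.Name == "C" {
			seen[sel.Sel.Name] = true
		}
		return true
	})
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	src := "package p\n\nimport \"C\"\n\nfunc f() { var n C.int = C.puts(C.CString(\"hi\")); _ = n }\n"
	names, err := collectCNames(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [CString int puts]
}
```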

 The obvious thing for cgo to do is to process the preamble, expanding
 #includes and processing the corresponding C code. That would require
 a full C parser and type checker that was also aware of any extensions
 known to the system compiler (for example, all the GNU C extensions) as
 well as the system-specific header locations and system-specific
 pre-#defined macros. This is certainly possible to do, but it is an
 enormous amount of work.

 Cgo takes a different approach. It determines the meaning of C
 identifiers not by parsing C code but by feeding carefully constructed
 programs into the system C compiler and interpreting the generated
 error messages, debug information, and object files. In practice,
 parsing these is significantly less work and more robust than parsing
 C source.

 Cgo first invokes gcc -E -dM on the preamble, in order to find out
 about simple #defines for constants and the like. These are recorded
 for later use.
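
A sketch of recording those #defines, assuming the gcc -E -dM output
format of one "#define NAME VALUE" per line. The helper parseDefines is
hypothetical; how cgo actually stores and later uses these values is
not shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDefines is a hypothetical helper that reads `gcc -E -dM` output
// and keeps object-like macros as NAME -> VALUE. Function-like macros
// such as "#define getc(fp) ..." are skipped, since the name alone is
// not a value.
func parseDefines(out string) map[string]string {
	defs := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "#define ") {
			continue
		}
		name, value, _ := strings.Cut(strings.TrimPrefix(line, "#define "), " ")
		if name == "" || strings.Contains(name, "(") {
			continue
		}
		defs[name] = strings.TrimSpace(value)
	}
	return defs
}

func main() {
	out := "#define BUFSIZ 8192\n#define EOF (-1)\n#define getc(fp) _IO_getc(fp)\n"
	defs := parseDefines(out)
	fmt.Println(len(defs), defs["BUFSIZ"], defs["EOF"]) // 2 8192 (-1)
}
```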

 Next, cgo needs to identify the kind of each identifier. For the
 identifiers C.foo and C.bar, cgo generates this C program:

 	<preamble>
 	void __cgo__f__(void) {
 	#line 1 "cgo-test"
 		foo;
 		enum { _cgo_enum_0 = foo };
 		bar;
 		enum { _cgo_enum_1 = bar };
 	}

 This program will not compile, but cgo can look at the error messages
 to infer the kind of each identifier. The line number given in the
 error tells cgo which identifier is involved.
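
The probe program above can be generated mechanically. The hypothetical
kindProbe helper below is a sketch of that step. It also shows why the
#line directive matters: each name occupies exactly two lines, so a
diagnostic's line number identifies the name it is about.

```go
package main

import (
	"fmt"
	"strings"
)

// kindProbe is a hypothetical helper that assembles the probe program.
// Because of the #line directive, the statement for names[i] lands on
// line 2i+1 of "cgo-test" and its enum on line 2i+2.
func kindProbe(preamble string, names []string) string {
	var b strings.Builder
	b.WriteString(preamble)
	b.WriteString("void __cgo__f__(void) {\n")
	b.WriteString("#line 1 \"cgo-test\"\n")
	for i, name := range names {
		fmt.Fprintf(&b, "\t%s;\n", name)
		fmt.Fprintf(&b, "\tenum { _cgo_enum_%d = %s };\n", i, name)
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	fmt.Print(kindProbe("#include <stdio.h>\n", []string{"foo", "bar"}))
}
```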

 An error like "unexpected type name" or "useless type name in empty
 declaration" or "declaration does not declare anything" tells cgo that
 the identifier is a type.

 An error like "statement with no effect" or "expression result unused"
 tells cgo that the identifier is not a type, but not whether it is a
 constant, function, or global variable.

 An error like "not an integer constant" tells cgo that the identifier
 is not a constant. If it is also not a type, it must be a function or
 global variable. For now, those can be treated the same.
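
Taken together, these rules amount to a small classifier over compiler
diagnostics. The sketch below assumes gcc-style messages of the form
cgo-test:LINE:COL: message and uses hypothetical kind labels ("type",
"const", "func-or-var"). It also assumes that a name which is not a
type and draws no "not an integer constant" complaint is a constant,
an inference the rules above imply but do not state.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// classifyKinds is a hypothetical helper. It maps each diagnostic's
// LINE back to an identifier (names[i] owns lines 2i+1 and 2i+2) and
// applies the error-message rules to decide each name's kind.
func classifyKinds(names []string, diagnostics string) map[string]string {
	isType := make(map[string]bool)
	notType := make(map[string]bool)
	notConst := make(map[string]bool)
	for _, line := range strings.Split(diagnostics, "\n") {
		parts := strings.SplitN(line, ":", 4)
		if len(parts) < 4 || parts[0] != "cgo-test" {
			continue
		}
		n, err := strconv.Atoi(parts[1])
		if err != nil || n < 1 || (n-1)/2 >= len(names) {
			continue
		}
		name, msg := names[(n-1)/2], parts[3]
		switch {
		case strings.Contains(msg, "unexpected type name"),
			strings.Contains(msg, "useless type name in empty declaration"),
			strings.Contains(msg, "declaration does not declare anything"):
			isType[name] = true
		case strings.Contains(msg, "statement with no effect"),
			strings.Contains(msg, "expression result unused"):
			notType[name] = true
		case strings.Contains(msg, "not an integer constant"):
			notConst[name] = true
		}
	}
	kinds := make(map[string]string)
	for _, name := range names {
		switch {
		case isType[name]:
			kinds[name] = "type"
		case notConst[name]:
			// Neither a type nor a constant: function or global variable.
			kinds[name] = "func-or-var"
		case notType[name]:
			kinds[name] = "const" // assumption: no enum complaint
		default:
			kinds[name] = "unknown"
		}
	}
	return kinds
}

func main() {
	diag := "cgo-test:1:2: warning: useless type name in empty declaration\n" +
		"cgo-test:3:2: warning: statement with no effect\n"
	fmt.Println(classifyKinds([]string{"size_t", "EOF"}, diag)) // map[EOF:const size_t:type]
}
```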

 Next, cgo must learn the details of each type, variable, function, or
 constant. It can do this by reading object files. If cgo has decided
 that t1 is a type, v2 and v3 are variables or functions, and c4, c5,
 and c6 are constants, it generates:

 	<preamble>
 	typeof(t1) *__cgo__1;
 	typeof(v2) *__cgo__2;
 	typeof(v3) *__cgo__3;
 	typeof(c4) *__cgo__4;
 	enum { __cgo_enum__4 = c4 };
 	typeof(c5) *__cgo__5;
 	enum { __cgo_enum__5 = c5 };
 	typeof(c6) *__cgo__6;
 	enum { __cgo_enum__6 = c6 };
 	long long __cgo_debug_data[] = {
 		0, // t1
 		0, // v2
 		0, // v3
 		c4,
 		c5,
 		c6,
 		1
 	};

 and again invokes the system C compiler to produce an object file
 containing debug information. Cgo parses the DWARF debug information
 for __cgo__N to learn the type of each identifier. (The types also
 distinguish functions from global variables.) If using a standard gcc,
 cgo can parse the DWARF debug information for the __cgo_enum__N to
 learn the identifier's value. The LLVM-based gcc on OS X emits
 incomplete DWARF information for enums; in that case cgo reads the
 constant values from __cgo_debug_data in the object file's data
 segment.
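
The fallback path, reading constants from __cgo_debug_data, reduces to
reinterpreting raw bytes once the array has been located in the object
file's data section. Locating the symbol is format-specific and omitted
here; the sketch below, with the hypothetical helper decodeDebugData,
covers only the decoding and assumes a little-endian target with an
8-byte long long.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeDebugData is a hypothetical helper that reinterprets the raw
// bytes of a "long long __cgo_debug_data[]" array as n signed 64-bit
// values, assuming little-endian byte order (as on amd64).
func decodeDebugData(data []byte, n int) ([]int64, error) {
	if n < 0 || len(data) < 8*n {
		return nil, errors.New("data section too short")
	}
	vals := make([]int64, n)
	for i := range vals {
		vals[i] = int64(binary.LittleEndian.Uint64(data[8*i:]))
	}
	return vals, nil
}

func main() {
	// Simulated array in the shape of the example: zeros for t1, v2 and
	// v3, three made-up constant values, and the trailing 1.
	want := []int64{0, 0, 0, 4096, -1, 42, 1}
	raw := make([]byte, 8*len(want))
	for i, v := range want {
		binary.LittleEndian.PutUint64(raw[8*i:], uint64(v))
	}
	vals, err := decodeDebugData(raw, len(want))
	if err != nil {
		panic(err)
	}
	fmt.Println(vals) // [0 0 0 4096 -1 42 1]
}
```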

 At this point cgo knows the meaning of each C.xxx well enough to start
 the translation process.

 Translating Go

 [The rest of this comment refers to 6g and 6c, the Go and C compilers
 that are part of the amd64 port of the gc Go toolchain. Everything here
 applies equally to the compilers for other architectures.]

 Given the input Go files x.go and y.go, cgo generates these source
 files:

 	x.cgo1.go       # for 6g
 	y.cgo1.go       # for 6g
 	_cgo_gotypes.go # for 6g
 	_cgo_defun.c    # for 6c
 	x.cgo2.c        # for gcc
 	y.cgo2.c        # for gcc
 	_cgo_export.c   # for gcc
 	_cgo_main.c     # for gcc
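
The naming scheme in the list above is regular enough to express
directly. This hypothetical cgoOutputs helper is only a sketch of the
pattern, not how cgo itself computes its output names.

```go
package main

import (
	"fmt"
	"strings"
)

// cgoOutputs is a hypothetical helper that reproduces the naming scheme:
// one .cgo1.go and one .cgo2.c per input, plus four shared files.
func cgoOutputs(inputs []string) []string {
	var goFiles, cFiles []string
	for _, in := range inputs {
		base := strings.TrimSuffix(in, ".go")
		goFiles = append(goFiles, base+".cgo1.go") // for 6g
		cFiles = append(cFiles, base+".cgo2.c")    // for gcc
	}
	out := append(goFiles, "_cgo_gotypes.go", "_cgo_defun.c")
	out = append(out, cFiles...)
	return append(out, "_cgo_export.c", "_cgo_main.c")
}

func main() {
	for _, name := range cgoOutputs([]string{"x.go", "y.go"}) {
		fmt.Println(name)
	}
}
```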

 The file x.cgo1.go is a copy of x.go with the import "C" removed and
 references to C.xxx replaced with names like _Cfunc_xxx or _Ctype_xxx.
 The definitions of those identifiers, written as Go functions, types,
 or variables, are provided in _cgo_gotypes.go.
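
A simplified sketch of the rewriting step, assuming the kind of each
name is already known. The helper rewriteCRefs is hypothetical and
edits the source text by byte offset rather than rewriting the syntax
tree; it also leaves the import "C" line in place.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// rewriteCRefs is a hypothetical helper. Given each C name's kind
// ("func" or "type"), it replaces C.xxx with _Cfunc_xxx or _Ctype_xxx.
func rewriteCRefs(src string, kinds map[string]string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", src, 0)
	if err != nil {
		return "", err
	}
	prefixes := map[string]string{"func": "_Cfunc_", "type": "_Ctype_"}
	type edit struct {
		start, end int
		text       string
	}
	var edits []edit
	ast.Inspect(f, func(n ast.Node) bool {
		sel, ok := n.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		x, ok := sel.X.(*ast.Ident)
		if !ok || x.Name != "C" {
			return true
		}
		prefix := prefixes[kinds[sel.Sel.Name]]
		if prefix == "" {
			return true // unknown kind: leave the reference alone
		}
		edits = append(edits, edit{
			start: fset.Position(sel.Pos()).Offset,
			end:   fset.Position(sel.End()).Offset,
			text:  prefix + sel.Sel.Name,
		})
		return true
	})
	// Apply edits from the end of the file so earlier offsets stay valid.
	sort.Slice(edits, func(i, j int) bool { return edits[i].start > edits[j].start })
	out := src
	for _, e := range edits {
		out = out[:e.start] + e.text + out[e.end:]
	}
	return out, nil
}

func main() {
	src := "package p\n\nimport \"C\"\n\nfunc f() C.int { return C.puts(C.CString(\"hi\")) }\n"
	out, err := rewriteCRefs(src, map[string]string{"puts": "func", "CString": "func", "int": "type"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```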

 Here is a _cgo_gotypes.go containing definitions for C.flush (provided
 in the preamble) and C.puts (from stdio):

 	type _Ctype_char int8
 	type _Ctype_int int32
 	type _Ctype_void [0]byte
 	func _Cfunc_CString(string) *_Ctype_char
 	func _Cfunc_flush() _Ctype_void
 	func _Cfunc_puts(*_Ctype_char) _Ctype_int

 For functions, cgo only writes an external declaration in the Go
 output. The implementation is in a combination of C for 6c (meaning
 any gc-toolchain compiler) and C for gcc.

 The 6c file contains the definitions of the functions. They all have
 similar bodies that invoke runtime·cgocall to make a switch from the
 Go runtime world to the system C (GCC-based) world.
 For example, here is the definition of _Cfunc_puts:

 	void _cgo_be59f0f25121_Cfunc_puts(void*);

 	void
 	·_Cfunc_puts(struct{uint8 x[1];}p)
 	{
 		runtime·cgocall(_cgo_be59f0f25121_Cfunc_puts, &p);
 	}

 The hexadecimal number is a hash of cgo's input, chosen to be
 deterministic yet unlikely to collide with other uses. The actual
 function _cgo_be59f0f25121_Cfunc_puts is implemented in a C source
 file compiled by gcc, the file x.cgo2.c:

 	void
 	_cgo_be59f0f25121_Cfunc_puts(void *v)
 	{
 		struct {
 			char* p0;
 			int r;
 			char __pad12[4];
 		} __attribute__((__packed__)) *a = v;
 		a->r = puts((void*)a->p0);
 	}

 It extracts the arguments from the pointer to _Cfunc_puts's argument
 frame, invokes the system C function (in this case, puts), stores the
 result in the frame, and returns.
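
The calling convention can be mimicked in pure Go to show why the
packed struct works: the caller lays out argument and result slots in
one block of memory and passes a single pointer. Everything below is a
hypothetical stand-in (no C code or real runtime·cgocall is involved),
and the layout comments assume a 64-bit target.

```go
package main

import (
	"fmt"
	"unsafe"
)

// putsFrame mirrors, for a 64-bit target, the packed frame declared in
// x.cgo2.c: an 8-byte char* argument, a 4-byte int result, and 4 bytes
// of padding starting at offset 12.
type putsFrame struct {
	p0 *byte
	r  int32
	_  [4]byte
}

// cgocall is a stand-in for runtime·cgocall. The real function moves
// from the Go runtime world to the C world before calling fn; this
// sketch simply forwards the frame pointer.
func cgocall(fn func(unsafe.Pointer), frame unsafe.Pointer) {
	fn(frame)
}

// fakePuts stands in for the C library's puts: instead of printing, it
// returns the length of the NUL-terminated string at s.
func fakePuts(s *byte) int32 {
	n := int32(0)
	for *(*byte)(unsafe.Add(unsafe.Pointer(s), n)) != 0 {
		n++
	}
	return n
}

// cfuncPuts plays the role of _cgo_be59f0f25121_Cfunc_puts: unpack the
// argument, call the function, store the result back in the frame.
func cfuncPuts(v unsafe.Pointer) {
	a := (*putsFrame)(v)
	a.r = fakePuts(a.p0)
}

// callPuts plays the role of the 6c-side _Cfunc_puts: build the frame
// and hand its address to cgocall.
func callPuts(s string) int32 {
	buf := append([]byte(s), 0) // C strings are NUL-terminated
	frame := putsFrame{p0: &buf[0]}
	cgocall(cfuncPuts, unsafe.Pointer(&frame))
	return frame.r
}

// frameLayout reports the result's offset, the frame size, and the
// pointer size, so the 64-bit assumption can be checked.
func frameLayout() (rOffset, size, ptrSize uintptr) {
	var f putsFrame
	return unsafe.Offsetof(f.r), unsafe.Sizeof(f), unsafe.Sizeof(f.p0)
}

func main() {
	fmt.Println(callPuts("hello")) // 5
	fmt.Println(frameLayout())     // 8 16 8 on a 64-bit target
}
```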

 Linking

 Once the _cgo_export.c and *.cgo2.c files have been compiled with gcc,
 they need to be linked into the final binary, along with the libraries
 they might depend on (in the case of puts, the C library). 6l has been
 extended to understand basic ELF files, but it does not understand ELF
 in the full complexity that modern C libraries embrace, so it cannot
 in general generate direct references to the system libraries.

 Instead, the build process generates an object file using dynamic
 linkage to the desired libraries. The main function is provided by
 _cgo_main.c:

 	int main() { return 0; }
 	void crosscall2(void(*fn)(void*, int), void *a, int c) { }
 	void _cgo_allocate(void *a, int c) { }
 	void _cgo_panic(void *a, int c) { }

 The extra functions here are stubs to satisfy the references in the C
 code generated for gcc. The build process links this stub, along with
 _cgo_export.c and *.cgo2.c, into a dynamic executable and then lets
 cgo examine the executable. Cgo records the list of shared library
 references and resolved names and writes them into a new file
 _cgo_import.c, which looks like:

 	#pragma dynlinker "/lib64/ld-linux-x86-64.so.2"
 	#pragma dynimport puts puts#GLIBC_2.2.5 "libc.so.6"
 	#pragma dynimport __libc_start_main __libc_start_main#GLIBC_2.2.5 "libc.so.6"
 	#pragma dynimport stdout stdout#GLIBC_2.2.5 "libc.so.6"
 	#pragma dynimport fflush fflush#GLIBC_2.2.5 "libc.so.6"
 	#pragma dynimport _ _ "libpthread.so.0"
 	#pragma dynimport _ _ "libc.so.6"
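
On ELF systems, the information in _cgo_import.c can be recovered with
Go's standard debug/elf package. The sketch below is an assumption
about how such a tool might format it, not a copy of cgo's code;
formatImport and dynimportLines are hypothetical names, and Mach-O or
PE executables would need debug/macho or debug/pe instead.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// formatImport is a hypothetical helper producing one dynimport line;
// a symbol version, when present, is appended after '#'.
func formatImport(name, version, library string) string {
	target := name
	if version != "" {
		target += "#" + version
	}
	return fmt.Sprintf("#pragma dynimport %s %s %q", name, target, library)
}

// dynimportLines is a hypothetical helper that emits the dynlinker line,
// one line per imported symbol, then one line per imported library.
func dynimportLines(interp string, syms []elf.ImportedSymbol, libs []string) []string {
	var lines []string
	if interp != "" {
		lines = append(lines, fmt.Sprintf("#pragma dynlinker %q", interp))
	}
	for _, s := range syms {
		lines = append(lines, formatImport(s.Name, s.Version, s.Library))
	}
	for _, lib := range libs {
		lines = append(lines, fmt.Sprintf("#pragma dynimport _ _ %q", lib))
	}
	return lines
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: dynimport <elf-executable>")
		return
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	var interp string
	if sec := f.Section(".interp"); sec != nil {
		if data, err := sec.Data(); err == nil && len(data) > 1 {
			interp = string(data[:len(data)-1]) // drop the trailing NUL
		}
	}
	syms, _ := f.ImportedSymbols()
	libs, _ := f.ImportedLibraries()
	for _, line := range dynimportLines(interp, syms, libs) {
		fmt.Println(line)
	}
}
```

Run against a dynamically linked ELF executable, it prints directives
in the same shape as the listing above; the exact symbols depend on
the binary.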

 In the end, the compiled Go package, which will eventually be
 presented to 6l as part of a larger program, contains:

 	_go_.6        # 6g-compiled object for _cgo_gotypes.go, *.cgo1.go
 	_cgo_defun.6  # 6c-compiled object for _cgo_defun.c
 	_all.o        # gcc-compiled object for _cgo_export.c, *.cgo2.c
 	_cgo_import.6 # 6c-compiled object for _cgo_import.c

 The final program will be a dynamic executable, so that 6l can avoid
 needing to process arbitrary .o files. It only needs to process the .o
 files generated from C files that cgo writes, and those are much more
 limited in the ELF or other features that they use.

 In essence, the _cgo_import.6 file includes the extra linking
 directives that 6l is not sophisticated enough to derive from _all.o
 on its own. Similarly, the _all.o uses dynamic references to real
 system object code because 6l is not sophisticated enough to process
 the real code.

 The main benefits of this system are that 6l remains relatively simple
 (it does not need to implement a complete ELF and Mach-O linker) and
 that gcc is not needed after the package is compiled. For example,
 package net uses cgo for access to name resolution functions provided
 by libc. Although gcc is needed to compile package net, gcc is not
 needed to link programs that import package net.

 Runtime

 When using cgo, Go must not assume that it owns all details of the
 process. In particular it needs to coordinate with C in the use of
 threads and thread-local storage. The runtime package, in its own
 (6c-compiled) C code, declares a few uninitialized (default bss)
 variables:

 	bool	runtime·iscgo;
 	void	(*libcgo_thread_start)(void*);
 	void	(*initcgo)(G*);

 Any package using cgo imports "runtime/cgo", which provides
 initializations for these variables. It sets iscgo to 1, initcgo to a
 gcc-compiled function that can be called early during program startup,
 and libcgo_thread_start to a gcc-compiled function that can be used to
 create a new thread, in place of the runtime's usual direct system
 calls.
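
The pattern here, variables declared by one component and left at their
zero values unless another component initializes them, can be sketched
in pure Go. All names below are hypothetical stand-ins for
runtime·iscgo, libcgo_thread_start, and initcgo; no real threads or C
code are involved.

```go
package main

import "fmt"

// Hypothetical stand-ins: the "runtime" declares these but leaves them
// at their zero values; only cgo support fills them in.
var (
	isCgo       bool
	threadStart func(fn func())
	initCgo     func()
)

var earlyInitDone bool

// cgoSupportInit stands in for the initialization that importing
// runtime/cgo provides.
func cgoSupportInit() {
	isCgo = true
	initCgo = func() { earlyInitDone = true }
	threadStart = func(fn func()) {
		// A real hook would create an OS thread through the C library;
		// here a goroutine runs fn and we wait for it to finish.
		done := make(chan struct{})
		go func() {
			fn()
			close(done)
		}()
		<-done
	}
}

// startup calls the early-init hook only if one was installed.
func startup() {
	if initCgo != nil {
		initCgo()
	}
}

// newThread prefers the installed hook over the runtime's own path.
func newThread(fn func()) string {
	if threadStart != nil {
		threadStart(fn)
		return "via C library"
	}
	fn()
	return "via direct system calls"
}

func main() {
	fmt.Println(newThread(func() {})) // via direct system calls
	cgoSupportInit()
	startup()
	fmt.Println(isCgo, earlyInitDone, newThread(func() {})) // true true via C library
}
```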
 */