From b806ba4d885651e0c9fb90fddf894ba69f72d925 Mon Sep 17 00:00:00 2001
From: Rob Pike
Date: Tue, 15 Apr 2008 16:43:06 -0700
Subject: [PATCH] Add description of how compiling and linking handle dependencies.

SVN=115807
---
 doc/candl.txt               | 263 ++++++++++++++++++++++++++++++++++++
 src/lib/container/vector.go |   2 +-
 2 files changed, 264 insertions(+), 1 deletion(-)
 create mode 100644 doc/candl.txt

diff --git a/doc/candl.txt b/doc/candl.txt
new file mode 100644
index 00000000000..38caba10a8d
--- /dev/null
+++ b/doc/candl.txt
@@ -0,0 +1,263 @@
+Compiling and Linking
+----
+
+Assume we have:
+
+	- one or more source files, *.go, perhaps in different directories
+	- a compiler, C; it takes one .go file and generates a .o file.
+	- a linker, L; it takes one or more .o files and generates a go.out (!) file.
+
+There is a question around naming of the files. Let's avoid that
+problem for now and state that if the input is X.go, the output of
+the compiler is X.o, ignoring the package declaration in the file.
+This is not current behavior and probably not correct behavior, but
+it keeps the exposition simpler.
+
+Let's also assume that the linker knows about the run time and we
+don't have to specify bootstrap and runtime linkage explicitly.
+
+
+Basics
+----
+
+Given a single file, main.go, with no dependencies, we do:
+
+	C main.go    # compile
+	L main.o     # link
+	go.out       # run
+
+Now let's say that main.go contains
+
+	import "fmt"
+
+and that fmt.go contains
+
+	import "sys"
+
+Then to build, we must compile in dependency order:
+
+	C sys.go
+	C fmt.go
+	C main.go
+
+and then link
+
+	L main.o fmt.o sys.o
+
+To the linker itself, the order of arguments is unimportant.
+
+When we compile fmt.go, we need to know the details of the functions
+(etc.) exported by sys.go and used by fmt.go. When we run
+
+	C fmt.go
+
+it discovers the import of sys, and must then read sys.o to discover
+the details.
+We must therefore compile the exporting source file before we
+can compile the importing source. Moreover, if there is a mismatch
+between export and import, we can discover it during compilation
+of the importing source.
+
+To be explicit, then, what we say is, in effect
+
+	C sys.go
+	C fmt.go sys.o
+	C main.go fmt.o sys.o
+	L main.o fmt.o sys.o
+
+
+The contents of .o files (I)
+----
+
+It's necessary to include in fmt.o the information for linking
+against the functions etc. in sys.o. It's also possible to identify
+sys.o explicitly inside fmt.o, so we need to say only
+
+	L main.o fmt.o
+
+with sys.o discovered automatically. Iterating again, it's easy
+to reduce the link step to
+
+	L main.o
+
+with L discovering automatically the .o files it needs to process
+to create the final go.out.
+
+
+Automation of dependencies (I)
+----
+
+It should be possible to automate discovery of the dependencies of
+main.go and therefore the order necessary to compile. Since the
+source files contain explicit import statements, it is possible,
+given a source file, to discover the dependency tree automatically.
+(This will require rules and/or conventions about where to find
+things; for now assume everything is in the same directory.)
+
+The program that does this might possibly be a variant of the
+compiler, since it must parse import statements at least, but for
+clarity let's call it D for dependency. It can be a little like
+make, but let's not call it make because that brings along properties
+we don't want. In particular, it reads the sources to discover the
+dependencies; it doesn't need a separate description such as a
+Makefile.
+
+In a directory with the source files above, including main.go, but
+with no .o files, we say:
+
+	D main.go
+
+D reads main.go, finds the import for fmt, and in effect descends,
+automatically running
+
+	D fmt.go
+
+which in turn invokes
+
+	D sys.go
+
+The file sys.go has no dependencies, so it can be compiled; D
+therefore says in effect
+
+	"compile sys.go"
+
+and returns; then we have what we need for fmt.go since the exports
+in sys.go are known (or at least the recipe to discover them is
+known). So the next level says
+
+	"compile fmt.go"
+
+and pops up, whereupon the top D says
+
+	"compile main.go"
+
+The output of D could therefore be described as a script to run to
+compile the source.
+
+We could imagine that instead, D actually runs the compiler.
+(Conversely, we could imagine that C uses D to make sure the
+dependencies are built, but that has the danger of causing unnecessary
+dependency checking and compilation; more on that later.)
+
+To build, therefore, all we need to say is:
+
+	D -c main.go    # -c means 'run the compiler'
+	L main.o
+
+Obviously, D at this stage could just run L. Therefore, we can
+simplify further by having it do so, whereupon
+
+	D -c main.go
+
+can automate the complete compilation and linking process.
+
+Automation of dependencies (II)
+----
+
+Let's say we now edit main.go without changing its imports. To
+recompile, we have two options. First, we could be explicit:
+
+	C main.go
+
+Or we could use D to automate running the compiler, as described
+in the previous section:
+
+	D -c main.go
+
+The D command will discover the import of fmt, but can see that fmt.o
+already exists. Assuming its existence implies its currency, it need
+go no further; it can invoke C to compile main.go and link as usual.
+Whether it should make this assumption might be controlled by a flag.
+For the purpose of discussion, let's say it makes the assumption if
+the -c flag is set.
+
+There are two implications to this scheme.
+First, running D when D
+is going to turn around and run C anyway implies we could just run
+C directly and save one command invocation. (We could decide
+independently whether C should automatically invoke the linker.)
+
+The other implication is more interesting. If we stop traversing
+the dependency hierarchy as soon as we discover a .o file, then we
+may not realize that fmt.o is out of date and link against a stale
+binary. To fix this problem, we need to stat() or checksum the .o
+and .go files to see if they need recompilation. Doing this every
+time is expensive and gets us back into the make-like approach.
+
+The great majority of compilations do not require this full check,
+however; this is especially true when in the compile-debug-edit
+cycle. We therefore propose splitting the model into two scenarios.
+
+Scenario 1: General
+
+In this scenario, we ask D to update the full dependency tree by
+stat()-ing or checksumming files to check currency. The generated
+go.out will always be up to date but incremental compilation will
+be slower. Typically, this will be necessary only after a major
+operation like syncing or checking out code, or if there are known
+changes being made to the dependencies.
+
+Scenario 2: Fast
+
+In this scenario, we explicitly tell D -c what has changed and have
+it compile only what is required. Typically, this will mean compiling
+only the single active file or maybe a few files. If an IDE is
+present or there is some watcher tool, it's easy to avoid the common
+mistake of forgetting to compile a changed file.
+
+If an edit has caused skew between export and import, this will be
+caught by the compiler, so it should be type-safe at least. If D is
+running the compilation, it might be possible to arrange that C tells
+it there is a dependency problem and have D then try to resolve it
+by reevaluation.
+
+
+The contents of .o files (II)
+----
+
+For scenario 2, we can make things even faster if the .o files
+identify not just the files that must be imported to satisfy the
+imports, but details about the imports themselves. Let's say main.go
+uses only one function from fmt.go, called F. If the compiled main.o
+says, in effect
+
+	from package fmt get F
+
+then the linker will not need to read all of fmt.o to link main.o;
+instead it can extract only the necessary function.
+
+Even better, if fmt is a package made of many files, it may be
+possible to store in main.o specific information about the exact
+files needed:
+
+	from file fmtF.o get F
+
+The linker can then not even bother opening the other .o files that
+form package fmt.
+
+The compiler should therefore be explicit and detailed within the .o
+files it generates about what elements of a package are needed by
+the program being compiled.
+
+Earlier, we said that when we run
+
+	C fmt.go
+
+it discovers the import of sys, and must then read sys.o to discover
+the details. Note that if we record the information as specified here,
+when we then do
+
+	C main.go
+
+and it reads fmt.o, it does not in turn need to read sys.o; the necessary
+information has already been pulled up into fmt.o by D.
+
+Thus, once the dependency information is properly constructed, to
+compile a program X.go we must read X.go plus N .o files, where N
+is the number of packages explicitly imported by X.go. The transitive
+closure need not be evaluated to compile a file, only the explicit
+imports. By this result, we hope to dramatically reduce the amount
+of I/O necessary to compile a Go source file.
+
+To put this another way, if a package P imports packages Xi, the
+existence of Xi.o files is all that is needed to compile P because the
+Xi.o files contain the export information. This is what breaks the
+transitive dependency closure.
diff --git a/src/lib/container/vector.go b/src/lib/container/vector.go
index f08b340ab8f..7081bd3a953 100644
--- a/src/lib/container/vector.go
+++ b/src/lib/container/vector.go
@@ -123,7 +123,7 @@ func Test() {
 	for i := 0; i < v.Len(); i++ {
 		var x *I;
 		x = v.At(i);
-		print i, " ", x.val, "\n"; // BUG: can't use I(v.At(i))
+		print i, " ", x.val, "\n";
 	}
 }