From 719a06fd97f69ebea8f17cbae8a6dcfbe33fe26b Mon Sep 17 00:00:00 2001
From: Robert Griesemer
Date: Tue, 4 Mar 2008 22:23:23 -0800
Subject: [PATCH] - renamed todo -> todo.txt - deleted obsolote go_spec

SVN=111358
---
 doc/go_spec | 1291 ---------------------------------------------------
 1 file changed, 1291 deletions(-)
 delete mode 100644 doc/go_spec

diff --git a/doc/go_spec b/doc/go_spec
deleted file mode 100644
index 56b4ca6781b..00000000000
--- a/doc/go_spec
+++ /dev/null
@@ -1,1291 +0,0 @@
-The Go Annotated Specification
-
-This document supersedes all previous Go spec attempts. The intent
-is to make this a reference for syntax and semantics. It is annotated
-with additional information not strictly belonging into a language
-spec.
-
-
-Open questions
-
-- how to delete from a map
-
-- how to test for map membership (we may want an 'atomic install'? m[i] ?= x; )
-
-- compound struct literals?
-StructTypeName { a, b, c }
-
-- array literals should be easy/natural to write
-[ 1, 2, 3 ]
-ArrayTypeName [ 1, 2, 3 ]
-
-- map literals
-[ "a" : 1, "d" : 2, "z" : 3 ]
-MapTypeName [ "a" : 1, "d" : 2, "z" : 3 ]
-
-- are basic types interfaces / do they define interfaces?
-
-- package initialization?
-
-
-
-Design decisions
-
-A list of decisions made but for which we haven't incorporated proper
-language into this spec. Keep this section small and the spec
-up-to-date instead.
-
-- multi-dimensional arrays: implementation restriction for now
-
-- no '->', always '.'
-- (*a)[i] can be sugared into: a[i]
-- '.' to select package elements
-
-- arrays are not automatically pointers, we must always say
-  explicitly: "*array T" if we mean a pointer to that array
-- there is no pointer arithmetic in the language
-- there are no unions
-
-- packages: need to pin it all down
-
-- tuple notation: (a, b) = (b, a);
-  generally: need to make this clear
-
-- for now: no (C) 'static' variables inside functions
-
-- exports: we write: 'export a, b, c;' (with a, b, c, etc. a list of
-  exported names, possibly also: structure.field)
-- the ordering of methods in interfaces is not relevant
-- structs must be identical (same decl) to be the same
-  (Ken has different implementation: equivalent declaration is the
-  same; what about methods?)
-
-- new methods can be added to a struct outside the package where the
-  struct is declared (need to think through all implications)
-- array assignment by value
-- do we need a type switch?
-
-- write down scoping rules for statements
-
-- semicolons: where are they needed and where are they not needed.
-  need a simple and consistent rule
-
-- we have: postfix ++ and -- as statements
-
-
-
-Guiding principles
-
-Go is an attempt at a new systems programming language.
-[gri: this needs to be expanded. some keywords below]
-
-- small, concise, crisp
-- procedural
-- strongly typed
-- few, orthogonal, and general concepts
-- avoid repetition of declarations
-- multi-threading support in the language
-- garbage collected
-- containers w/o templates
-- compiler can be written in Go and so can it's GC
-- very fast compilation possible (1MLOC/s stretch goal)
-- reasonably efficient (C ballpark)
-- compact, predictable code
-  (local program changes generally have local effects)
-- no macros
-
-
-Syntax
-
-The syntax of Go borrows from the C tradition with respect to
-statements and from the Pascal tradition with respect to declarations.
-Go programs are written using a lean notation with a small set of
-keywords, without filler keywords (such as 'of', 'to', etc.) or other
-gratuitous syntax, and with a slight preference for expressive
-keywords (e.g. 'function') over operators or other syntactic
-mechanisms. Generally, "light" language features (variables, simple
-control flow, etc.) are expressed using a light-weight notation (short
-keywords, little syntax), while "heavy" language features use a more
-heavy-weight notation (longer keywords, more syntax).
-
-[gri: should say something about syntactic alternatives: if a
-syntactic form foreseeably will lead to a style recommendation, try to
-make that the syntactic form instead. For instance, Go structured
-statements always require the {} braces even if there is only a single
-sub-statement. Similar ideas apply elsewhere.]
-
-
-Modularity, identifiers and scopes
-
-A Go program consists of one or more files compiled separately, though
-not independently. A single file or compilation unit may make
-individual identifiers visible to other files by marking them as
-exported; there is no "header file". The exported interface of a file
-may be exposed in condensed form (without the corresponding
-implementation) through tools.
-
-A package collects types, constants, functions, and so on into a named
-entity that may be imported to enable its constituents be used in
-another compilation unit. Each source file is part of exactly one
-package; each package is constructed from one source file.
-
-Within a file, all identifiers are declared explicitly (expect for
-general predeclared identifiers such as true and false) and thus for
-each identifier in a file the corresponding declaration can be found
-in that same file (usually before its use, except for the rare case of
-forward declarations). Identifiers may denote program entities that
-are implemented in other files. Nevertheless, such identifiers are
-still declared via an import declaration in the file that is referring
-to them. This explicit declaration requirement ensures that every
-compilation unit can be read by itself.
-
-The scoping of identifiers is uniform: An identifier is visible from
-the point of its declaration to the end of the immediately surrounding
-block, and nested identifiers shadow outer identifiers with the same
-name. All identifiers are in the same namespace; i.e., no two
-identifiers in the same scope may have the same name even if they
-denote different language concepts (for instance, such as variable vs
-a function). Uniform scoping rules make Go programs easier to read
-and to understand.
-
-
-Program structure
-
-A compilation unit consists of a package specifier followed by import
-declarations followed by other declarations. There are no statements
-at the top level of a file. [gri: do we have a main function? or do
-we treat all functions uniformly and instead permit a program to be
-started by providing a package name and a "start" function? I like
-the latter because if gives a lot of flexibility and should be not
-hard to implement]. [r: i suggest that we define a symbol, main or
-Main or start or Start, and begin execution in the single exported
-function of that name in the program. the flexibility of having a
-choice of name is unimportant and the corresponding need to define the
-name in order to link or execute adds complexity. by default it
-should be trivial; we could allow a run-time flag to override the
-default for gri's flexibility.]
-
-
-Typing, polymorphism, and object-orientation
-
-Go programs are strongly typed; i.e., each program entity has a static
-type known at compile time. Variables also have a dynamic type, which
-is the type of the value they hold at run-time. Generally, the
-dynamic and the static type of a variable are identical, except for
-variables of interface type. In that case the dynamic type of the
-variable is a pointer to a structure that implements the variable's
-(static) interface type. There may be many different structures
-implementing an interface and thus the dynamic type of such variables
-is generally not known at compile time. Such variables are called
-polymorphic.
-
-Interface types are the mechanism to support an object-oriented
-programming style. Different interface types are independent of each
-other and no explicit hierarchy is required (such as single or
-multiple inheritance explicitly specified through respective type
-declarations). Interface types only define a set of functions that a
-corresponding implementation must provide. Thus interface and
-implementation are strictly separated.
-
-An interface is implemented by associating functions (methods) with
-structures. If a structure implements all methods of an interface, it
-implements that interface and thus can be used where that interface is
-required. Unless used through a variable of interface type, methods
-can always be statically bound (they are not "virtual"), and incur no
-runtime overhead compared to an ordinary function.
-
-Go has no explicit notion of classes, sub-classes, or inheritance.
-These concepts are trivially modeled in Go through the use of
-functions, structures, associated methods, and interfaces.
-
-Go has no explicit notion of type parameters or templates. Instead,
-containers (such as stacks, lists, etc.) are implemented through the
-use of abstract data types operating on interface types. [gri: there
-is some automatic boxing, semi-automatic unboxing support for basic
-types].
-
-
-Pointers and garbage collection
-
-Variables may be allocated automatically (when entering the scope of
-the variable) or explicitly on the heap. Pointers are used to refer
-to heap-allocated variables. Pointers may also be used to point to
-any other variable; such a pointer is obtained by "getting the
-address" of that variable. In particular, pointers may point "inside"
-other variables, or to automatic variables (which are usually
-allocated on the stack). Variables are automatically reclaimed when
-they are no longer accessible. There is no pointer arithmetic in Go.
-
-
-Functions
-
-Functions contain declarations and statements. They may be invoked
-recursively. Functions may declare nested functions, and nested
-functions have access to the variables in the surrounding functions,
-they are in fact closures. Functions may be anonymous and appear as
-literals in expressions.
-
-
-Multithreading and channels
-
-[Rob: We need something here]
-
-
-
-
-Notation
-
-The syntax is specified in green productions using Extended
-Backus-Naur Form (EBNF). In particular:
-
-''  encloses lexical symbols
-|   separates alternatives
-()  used for grouping
-[]  specifies option (0 or 1 times)
-{}  specifies repetition (0 to n times)
-
-A production may be referred to from various places in this document
-but is usually defined close to its first use. Code examples are
-written in gray. Annotations are in blue, and open issues are in red.
-One goal is to get rid of all red text in this document. [r: done!]
-
-
-Vocabulary and representation
-
-REWRITE THIS: BADLY EXPRESSED
-
-Go program source is a sequence of characters. Each character is a
-Unicode code point encoded in UTF-8.
-
-A Go program is a sequence of symbols satisfying the Go syntax. A
-symbol is a non-empty sequence of characters. Symbols are
-identifiers, numbers, strings, operators, delimiters, and comments.
-White space must not occur within symbols (except in comments, and in
-the case of blanks and tabs in strings). They are ignored unless they
-are essential to separate two consecutive symbols.
-
-White space is composed of blanks, newlines, carriage returns, and
-tabs only.
-
-A character is a Unicode code point. In particular, capital and
-lower-case letters are considered as being distinct. Note that some
-Unicode characters (e.g., the character ä), may be representable in
-two forms, as a single code point, or as two code points. For the
-Unicode standard these two encodings represent the same character, but
-for Go, these two encodings correspond to two different characters).
-
-Source encoding
-
-The input is encoded in UTF-8. In the grammar we use the notation
-
-utf8_char
-
-to refer to an arbitrary Unicode code point encoded in UTF-8.
-
-Digits and Letters
-
-octal_digit = { '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' } .
-decimal_digit = { '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' } .
-hex_digit = { '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | 'a' |
-              'A' | 'b' | 'B' | 'c' | 'C' | 'd' | 'D' | 'e' | 'E' | 'f' | 'F' } .
-letter = 'A' | 'a' | ... 'Z' | 'z' | '_' .
-
-For now, letters and digits are ASCII. We may expand this to allow
-Unicode definitions of letters and digits.
-
-
-Identifiers
-
-An identifier is a name for a program entity such as a variable, a
-type, a function, etc.
-
-identifier = letter { letter | decimal_digit } .
-
-
-- need to explain scopes, visibility (elsewhere)
-- need to say something about predeclared identifiers, and their
-  (universe) scope (elsewhere)
-
-
-Character and string literals
-
-A RawStringLit is a string literal delimited by back quotes ``; the
-first back quote encountered after the opening back quote terminates
-the string.
-
-RawStringLit = '`' { utf8_char } '`' .
-
-`abc`
-`\n`
-
-Character and string literals are very similar to C except:
-  - Octal character escapes are always 3 digits (\077 not \77)
-  - Hexadecimal character escapes are always 2 digits (\x07 not \x7)
-  - Strings are UTF-8 and represent Unicode
-  - `` strings exist; they do not interpret backslashes
-
-CharLit = '\'' ( UnicodeValue | ByteValue ) '\'' .
-StringLit = RawStringLit | InterpretedStringLit .
-InterpretedStringLit = '"' { UnicodeValue | ByteValue } '"' .
-ByteValue = OctalByteValue | HexByteValue .
-OctalByteValue = '\' octal_digit octal_digit octal_digit .
-HexByteValue = '\' 'x' hex_digit hex_digit .
-UnicodeValue = utf8_char | EscapedCharacter | LittleUValue | BigUValue .
-LittleUValue = '\' 'u' hex_digit hex_digit hex_digit hex_digit .
-BigUValue = '\' 'U' hex_digit hex_digit hex_digit hex_digit
-            hex_digit hex_digit hex_digit hex_digit .
-EscapedCharacter = '\' ( 'a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v' ) .
-
-An OctalByteValue contains three octal digits. A HexByteValue
-contains two hexadecimal digits. (Note: This differs from C but is
-simpler.)
-
-It is erroneous for an OctalByteValue to represent a value larger than 255.
-(By construction, a HexByteValue cannot.)
-
-A UnicodeValue takes one of four forms:
-
-  1. The UTF-8 encoding of a Unicode code point. Since Go source
-     text is in UTF-8, this is the obvious translation from input
-     text into Unicode characters.
-  2. The usual list of C backslash escapes: \n \t etc. 3. A
-     `little u' value, such as \u12AB. This represents the Unicode
-     code point with the corresponding hexadecimal value. It always
-     has exactly 4 hexadecimal digits.
-  4. A `big U' value, such as '\U00101234'. This represents the
-     Unicode code point with the corresponding hexadecimal value.
-     It always has exactly 8 hexadecimal digits.
-
-Some values that can be represented this way are illegal because they
-are not valid Unicode code points. These include values above
-0x10FFFF and surrogate halves.
-
-A character literal is a form of unsigned integer constant. Its value
-is that of the Unicode code point represented by the text between the
-quotes.
-
-'a'
-'ä'
-'本'
-'\t'
-'\0'
-'\07'
-'\0377'
-'\x7'
-'\xff'
-'\u12e4'
-'\U00101234'
-
-A string literal has type 'string'. Its value is constructed by
-taking the byte values formed by the successive elements of the
-literal. For ByteValues, these are the literal bytes; for
-UnicodeValues, these are the bytes of the UTF-8 encoding of the
-corresponding Unicode code points. Note that "\u00FF" and "\xFF" are
-different strings: the first contains the two-byte UTF-8 expansion of
-the value 255, while the second contains a single byte of value 255.
-The same rules apply to raw string literals, except the contents are
-uninterpreted UTF-8.
-
-""
-"Hello, world!\n"
-"日本語"
-"\u65e5本\U00008a9e"
-"\xff\u00FF"
-
-These examples all represent the same string:
-
-"日本語"  // UTF-8 input text
-`日本語`  // UTF-8 input text as a raw literal
-"\u65e5\u672c\u8a9e"  // The explicit Unicode code points
-"\U000065e5\U0000672c\U00008a9e"  // The explicit Unicode code points
-"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // The explicit UTF-8 bytes
-
-The language does not canonicalize Unicode text or evaluate combining
-forms. The text of source code is passed uninterpreted.
-
-If the source code represents a character as two code points, such as
-a combining form involving an accent and a letter, the result will be
-an error if placed in a character literal (it is not a single code
-point), and will appear as two code points if placed in a string
-literal. [This simple strategy may be insufficient in the long run
-but is surely fine for now.]
-
-
-Numeric literals
-
-Integer literals take the usual C form, except for the absence of the
-'U', 'L' etc. suffixes, and represent integer constants. (Character
-literals are also integer constants.) Similarly, floating point
-literals are also C-like, without suffixes and decimal only.
-
-An integer constant represents an abstract integer value of arbitrary
-precision. Only when an integer constant (or arithmetic expression
-formed from integer constants) is assigned to a variable (or other
-l-value) is it required to fit into a particular size - that of type
-of the variable. In other words, integer constants and arithmetic
-upon them is not subject to overflow; only assignment of integer
-constants (and constant expressions) to an l-value can cause overflow.
-It is an error if the value of the constant or expression cannot be
-represented correctly in the range of the type of the l-value.
-
-Floating point literals also represent an abstract, ideal floating
-point value that is constrained only upon assignment. [r: what do we
-need to say here? trickier because of truncation of fractions.]
-
-IntLit = [ '+' | '-' ] UnsignedIntLit .
-UnsignedIntLit = DecimalIntLit | OctalIntLit | HexIntLit .
-DecimalIntLit = ( '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' )
-                { decimal_digit } .
-OctalIntLit = '0' { octal_digit } .
-HexIntLit = '0' ( 'x' | 'X' ) hex_digit { hex_digit } .
-FloatLit = [ '+' | '-' ] UnsignedFloatLit .
-UnsignedFloatLit = "the usual decimal-only floating point representation".
-
-
-
-Compound Literals
-
-THIS SECTION IS WRONG
-Compound literals require some fine tuning. I think we did ok in
-Sawzall but there are some loose ends. I don't like that one cannot
-easily distinguish between an array and a struct. We may need to
-specify a type if these literals appear in expressions, but we don't
-want to specify a type if these literals appear as intializer
-expressions where the variable is already typed. And we don't want to
-do any implicit conversions.
-
-CompoundLit = ArrayLit | FunctionLit | StructureLit | MapLit.
-ArrayLit = '{' [ ExpressionList ] ']'.  // all elems must have "the same" type
-StructureLit = '{' [ ExpressionList ] '}'.
-MapLit = '{' [ PairList ] '}'.
-PairList = Pair { ',' Pair }.
-Pair = Expression ':' Expression.
-
-Literals
-
-Literal = BasicLit | CompoundLit .
-BasicLit = CharLit | StringLit | IntLit | FloatLit .
-
-
-Function Literals
-
-The type of a function literal
-
-FunctionLit = FunctionType Block.
-
-A function literal represents a function. A function literal can be invoked
-or assigned to a variable of the corresponding function pointer type.
-
-
-// Function literal
-func (a, b int, z float) bool { return a*b < int(z); }
-
-// Method literal
-func (p *T) . (a, b int, z float) bool { return a*b < int(z) + p.x; }
-
-
-Operators
-
-- incomplete
-
-
-Delimiters
-
-- incomplete
-
-
-Comments
-
-There are two forms of comments.
-
-The first starts '//' and ends at a newline.
-
-The second starts at '/*' and ends at the first '*/'. It may cross
-newlines. It does not nest.
-
-Comments are treated like white space.
-
-
-Common productions
-
-IdentifierList = identifier { ',' identifier }.
-ExpressionList = Expression { ',' Expression }.
-
-QualifiedIdent = [ PackageName '.' ] identifier.
-PackageName = identifier.
-
-
-Types
-
-A type specifies the set of values which variables of that type may
-assume, and the operators that are applicable.
-
-Except for variables of interface types, the static type of a variable
-(i.e. the type the variable is declared with) is the same as the
-dynamic type of the variable (i.e. the type of the variable at
-run-time). Variables of interface types may hold variables of
-different dynamic types, but their dynamic types must be compatible
-with the static interface type. At any given instant during run-time,
-a variable has exactly one dynamic type. A type declaration
-associates an identifier with a type.
-
-Array and struct types are called structured types, all other types
-are called unstructured. A structured type cannot contain itself.
-[gri: this needs to be formulated much more precisely].
-
-Type = TypeName | ArrayType | ChannelType | InterfaceType |
-       FunctionType | MapType | StructType | PointerType .
-TypeName = QualifiedIdent.
-
-
-[gri: To make the types specifications more precise we need to
-introduce some general concepts such as what it means to 'contain'
-another type, to be 'equal' to another type, etc. Furthermore, we are
-imprecise as we sometimes use the word type, sometimes just the type
-name (int), or the structure (array) to denote different things (types
-and variables). We should explain more precisely. Finally, there is
-a difference between equality of types and assignment compatibility -
-or isn't there?]
-
-
-Basic types
-
-Go defines a number of basic types which are referred to by their
-predeclared type names. There are signed and unsigned integer types,
-and floating point types:
-
-  bool     the truth values true and false
-
-  uint8    the set of all unsigned 8bit integers
-  uint16   the set of all unsigned 16bit integers
-  uint32   the set of all unsigned 32bit integers
-  unit64   the set of all unsigned 64bit integers
-
-  byte     same as uint8
-
-  int8     the set of all signed 8bit integers, in 2's complement
-  int16    the set of all signed 16bit integers, in 2's complement
-  int32    the set of all signed 32bit integers, in 2's complement
-  int64    the set of all signed 64bit integers, in 2's complement
-
-  float32  the set of all valid IEEE-754 32bit floating point numbers
-  float64  the set of all valid IEEE-754 64bit floating point numbers
-  float80  the set of all valid IEEE-754 80bit floating point numbers
-
-  double   same as float64
-
-Additionally, Go declares 3 basic types, uint, int, and float, which
-are platform-specific. The bit width of these types corresponds to
-the "natural bit width" for the respective types for the given
-platform (e.g. int is usally the same as int32 on a 32bit
-architecture, or int64 on a 64bit architecture). These types are by
-definition platform-specific and should be used with the appropriate
-caution.
-
-[gri: do we specify minimal sizes for uint, int, float? e.g. int is
-at least int32?] [gri: do we say something about the correspondence of
-sizeof(*T) and sizeof(int)? Are they the same?] [r: do we want
-int128 and uint128?.]
-
-
-Built-in types
-
-Besides the basic types there is a set of built-in types: string, and chan,
-with maybe more to follow.
-
-
-Type string
-
-The string type represents the set of string values (strings).
-A string behaves like an array of bytes, with the following properties:
-
-- They are immutable: after creation, it is not possible to change the
-  contents of a string
-- No internal pointers: it is illegal to create a pointer to an inner
-  element of a string
-- They can be indexed: given string s1, s1[i] is a byte value
-- They can be concatenated: given strings s1 and s2, s1 + s2 is a value
-  combining the elements of s1 and s2 in sequence
-- Known length: the length of a string s1 can be obtained by the function/
-  operator len(s1). [r: is it a bulitin? do we make it a method? etc. this is
-  a placeholder]. The length of a string is the number of bytes within.
-  Unlike in C, there is no terminal NUL byte.
-- Creation 1: a string can be created from an integer value by a conversion
-    string('x') yields "x"
-- Creation 2: a string can by created from an array of integer values (maybe
-  just array of bytes) by a conversion
-    a [3]byte; a[0] = 'a'; a[1] = 'b'; a[2] = 'c'; string(a) == "abc";
-
-The language has string literals as dicussed above. The type of a string
-literal is 'string'.
-
-
-Array types
-
-An array is a structured type consisting of a number of elements which
-are all of the same type, called the element type. The number of
-elements of an array is called its length. The elements of an array
-are designated by indices which are integers between 0 and the length
-- 1.
-
-THIS SECTION NEEDS WORK REGARDING STATIC AND DYNAMIC ARRAYS
-
-An array type specifies a set of arrays with a given element type and
-an optional array length. The array length must be (compile-time)
-constant expression, if present. Arrays without length specification
-are called open arrays. An open array must not contain other open
-arrays, and open arrays can only be used as parameter types or in a
-pointer type (for instance, a struct may not contain an open array
-field, but only a pointer to an open array).
-
-[gri: Need to define when array types are the same! Also need to
-define assignment compatibility] [gri: Need to define a mechanism to
-get to the length of an array at run-time. This could be a
-predeclared function 'length' (which may be problematic due to the
-name). Alternatively, we could define an interface for array types
-and say that there is a 'length()' method. So we would write
-a.length() which I think is pretty clean.]. [r: if array types have
-an interface and a string is an array, some stuff (but not enough)
-falls out nicely.]
-
-ArrayType = 'array' { '[' ArrayLength ']' } ElementType.
-ArrayLength = Expression.
-ElementType = Type.
-
-The notation
-
-  array [n][m] T
-
-is a syntactic shortcut for
-
-  array [n] array [m] T.
-
-(the shortcut may be applied recursively).
-
-array uint8
-array [64] struct { x, y: int32; }
-array [1000][1000] float64
-
-
-Channel types
-
-A channel provides a mechanism for two concurrently executing functions
-to exchange values and synchronize execution. A channel type can be
-'generic', permitting values of any type to be exchanged, or it may be
-'specific', permitting only values of an explicitly specified type.
-
-Upon creation, a channel can be used both to send and to receive; it
-may be restricted only to send or to receive; such a restricted channel
-is called a 'send channel' or a 'receive channel'.
-
-ChannelType = 'chan' [ '<' | '>' ] [ Type ] .
-
-chan  // a generic channel
-chan int  // a channel that can exchange only ints
-chan> float  // a channel that can only be used to send floats
-chan<  // a channel that can receive (only) values of any type
-
-Channel values are created using new(chan) (etc.). Since new()
-returns a pointer, channel variables are always pointers to
-channels:
-
-var c *chan int = new(chan int);
-
-It is an error to attempt to dereference a channel pointer.
-
-
-Pointer types
-
-- TODO: Need some intro here.
-
-Two pointer types are the same if they are pointing to variables of
-the same type.
-
-PointerType = '*' Type.
-
-- We do not allow pointer arithmetic of any kind.
-
-
-Interface types
-
-- TBD: This needs to be much more precise. For now we understand what it means.
-
-An interface type specifies a set of methods, the "method interface"
-of structs. No two methods in one interface can have the same name.
-
-Two interfaces are the same if their set of functions is the same,
-i.e., if all methods exist in both interfaces and if the function
-names and signatures are the same. The order of declaration of
-methods in an interface is irrelevant.
-
-A set of interface types implicitly creates an unconnected, ordered
-lattice of types. An interface type T1 is said to be smaller than or
-equalt to an interface type T2 (T1 <= T2) if the entire interface of
-T1 "is part" of T2. Thus, two interface types T1, T2 are the same if
-T1 <= T2, and T2 <= T1, and thus we can write T1 == T2.
-
-
-InterfaceType = 'interface' '{' { MethodDecl } '}' .
-MethodDecl = identifier Signature ';',
-
-// An empty interface.
-interface {};
-
-// A basic file interface.
-interface {
-    Read(Buffer) bool;
-    Write(Buffer) bool;
-    Close();
-}
-
-
-Interface pointers can be implemented as "fat pointers"; namely a pair
-(ptr, tdesc) where ptr is simply the pointer to a struct instance
-implementing the interface, and tdesc is the structs type descriptor.
-Only when crossing the boundary from statically typed structs to
-interfaces and vice versa, does the type descriptor come into play.
-In those places, the compiler statically knows the value of the type
-descriptor.
-
-
-Function types
-
-FunctionType = 'func' Signature .
-Signature = [ Receiver '.' ] Parameters [ Result ] .
-Receiver = '(' identifier Type ')' .
-Parameters = '(' [ ParameterList ] ')' .
-ParameterList = ParameterSection { ',' ParameterSection } .
-ParameterSection = [ IdentifierList ] Type .
-Result = [ Type ] | '(' ParameterList ')' .
-
-// Function types
-func ()
-func (a, b int, z float) bool
-func (a, b int, z float) (success bool)
-func (a, b int, z float) (success bool, result float)
-
-// Method types
-func (p *T) . ()
-func (p *T) . (a, b int, z float) bool
-func (p *T) . (a, b int, z float) (success bool)
-func (p *T) . (a, b int, z float) (success bool, result float)
-
-A variable can only hold a pointer to a function, but not a function value.
-In particular, v := func() {}; creates a variable of type *func(). To call the
-function referenced by v, one writes v(). It is illegal to dereference a function
-pointer.
-
-
-
-Map types
-
-A map is a structured type consisting of a variable number of entries
-called (key, value) pairs. For a given map,
-the keys and values must each be of a specific type.
-Upon creation, a map is empty and values may be added and removed
-during execution. The number of entries in a map is called its length.
-
-MapType = 'map' '[' KeyType ']' ValueType .
-KeyType = Type .
-ValueType = Type .
-
-map [string] int
-map [struct { pid int; name string }] *chan Buffer
-
-
-Struct types
-
-Struct types are similar to C structs.
-
-NEED TO DEFINE STRUCT EQUIVALENCE Two struct types are the same if and
-only if they are declared by the same struct type; i.e., struct types
-are compared via equivalence, and *not* structurally. For that
-reason, struct types are usually given a type name so that it is
-possible to refer to the same struct in different places in a program.
-What about equivalence of structs w/ respect to methods? What if
-methods can be added in another package? TBD.
-
-Each field of a struct represents a variable within the data
-structure. In particular, a function field represents a function
-variable, not a method.
-
-StructType = 'struct' '{' { FieldDecl } '}' .
-FieldDecl = IdentifierList Type ';' .
-
-// An empty struct.
-struct {}
-
-// A struct with 5 fields.
-struct { - x, y int; - u float; - a []int; - f func(); -} - - - -Note that a program which never uses interface types can be fully -statically typed. That is, the "usual" implementation of structs (or -classes as they are called in other languages) having an extra type -descriptor prepended in front of every single struct is not required. -Only when a pointer to a struct is assigned to an interface variable, -the type descriptor comes into play, and at that point it is -statically known at compile-time! - -Package specifiers - -Every source file is an element of a package, and defines which -package by the first element of every source file, which must be a -package specifier: - -PackageSpecifier = 'package' PackageName . - -package Math - - -Package import declarations - -A program can access exported items from another package. It does so -by in effect declaring a local name providing access to the package, -and then using the local name as a namespace with which to address the -elements of the package. - -ImportDecl = 'import' PackageName FileName . -FileName = DoubleQuotedString . -DoubleQuotedString = '"' TEXT '"' . - -(DoubleQuotedString should be replaced by the correct string literal production!) -Package import declarations must be the first statements in a file -after the package specifier. - -A package import associates an identifier with a package, named by a -file. In effect, it is a declaration: - -import Math "lib/Math"; -import library "my/library"; - -After such an import, one can use the Math (e.g) identifier to access -elements within it - -x float = Math.sin(y); - -Note that this process derives nothing explicit about the type of the -`imported' function (here Math.sin()). The import must execute to -provide this information to the compiler (or the programmer, for that -matter). - -An angled-string refers to official stuff in a public place, in effect -the run-time library. 
A double-quoted-string refers to arbitrary -code; it is probably a local file name that needs to be discovered -using rules outside the scope of the language spec. - -The file name in a package must be complete except for a suffix. -Moreover, the package name must correspond to the (basename of) the -source file name. For instance, the implementation of package Bar -must be in file Bar.go, and if it lives in directory foo we write - -import Bar "foo/Bar"; - -to import it. - -[This is a little redundant but if we allow multiple files per package -it will seem less so, and in any case the redundancy is useful and -protective.] - -We assume Unix syntax for file names: / separators, no suffix for -directories. If the language is ported to other systems, the -environment must simulate these properties to avoid changing the -source code. - - -Declarations - -- This needs to be expanded. -- We need to think about enums (or some alternative mechanism). - -Declaration = (ConstDecl | VarDecl | TypeDecl | FunctionDecl | - ForwardDecl | AliasDecl) . - - -Const declarations - -ConstDecl = 'const' ( ConstSpec | '(' ConstSpecList [ ';' ] ')' ). -ConstSpec = identifier [ Type ] '=' Expression . -ConstSpecList = ConstSpec { ';' ConstSpec }. - -const pi float = 3.14159265 -const e = 2.718281828 -const ( - one int = 1; - two = 2 -) - - -Variable declarations - -VarDecl = 'var' ( VarSpec | '(' VarSpecList [ ';' ] ')' ) | ShortVarDecl . -VarSpec = IdentifierList ( Type [ '=' ExpressionList ] | '=' ExpressionList ) . -VarSpecList = VarSpec { ';' VarSpec } . -ShortVarDecl = identifier ':=' Expression . - -var i int -var u, v, w float -var k = 0 -var x, y float = -1.0, -2.0 -var ( - i int; - u, v = 2.0, 3.0 -) - -If the expression list is present, it must have the same number of elements -as there are variables in the variable specification. - -[ TODO: why is x := 0 not legal at the global level? ] - - -Type declarations - -TypeDecl = 'type' ( TypeSpec | '(' TypeSpecList [ ';' ] ')' ).
-TypeSpec = identifier Type . -TypeSpecList = TypeSpec { ';' TypeSpec }. - - -type IntArray [16] int -type ( - Point struct { x, y float }; - Polar Point -) - - -Function and method declarations - -FunctionDecl = 'func' [ Receiver ] identifier Parameters [ Result ] ( ';' | Block ) . -Block = '{' { Statement } '}' . - - -func min(x int, y int) int { - if x < y { - return x; - } - return y; -} - -func foo (a, b int, z float) bool { - return a*b < int(z); -} - - -A method is a function that also declares a receiver. The receiver is -a struct with which the function is associated. The receiver type -must denote a pointer to a struct. - -func (p *T) foo (a, b int, z float) bool { - return a*b < int(z) + p.x; -} - -func (p *Point) Length() float { - return Math.sqrt(p.x * p.x + p.y * p.y); -} - -func (p *Point) Scale(factor float) { - p.x = p.x * factor; - p.y = p.y * factor; -} - -The last two examples are methods of struct type Point. The variable p is -the receiver; within the body of the method it represents the value of -the receiving struct. - -Note that methods are declared outside the body of the corresponding -struct. - -Functions and methods can be forward declared by omitting the body: - -func foo (a, b int, z float) bool; -func (p *T) foo (a, b int, z float) bool; - - - -Statements - -Statement = - EmptyStat | Assignment | CompoundStat | Declaration | - ExpressionStat | IncDecStat | IfStat | WhileStat | ForStat | - RangeStat | ReturnStat . - - -Empty statements - -EmptyStat = ';' . - - -Assignments - -Assignment = Designator '=' Expression . - -- no automatic conversions -- values can be assigned to variables if they are of the same type, or -if they satisfy the interface type (much more precision needed here!) - - - -Compound statements - -CompoundStat = '{' { Statement } '}' . - - -Expression statements - -ExpressionStat = Expression . - - -IncDec statements - -IncDecStat = Expression ( '++' | '--' ) . 
- - - - -If statements - -IfStat = 'if' ( [ Expression ] '{' { IfCaseList } '}' ) | - ( Expression '{' { Statement } '}' [ 'else' { Statement } ] ). -IfCaseList = ( 'case' ExpressionList | 'default' ) ':' { Statement } . - -if x < y { - return x; -} else { - return y; -} - -if tag { -case 0, 1: s1(); -case 2: s2(); -default: ; -} - -if { -case x < y: f1(); -case x < z: f2(); -} - - -While statements - -WhileStat = 'while' ( [ Expression ] '{' { WhileCaseList } '}' ) | - ( Expression '{' { Statement } '}' ). -WhileCaseList = 'case' ExpressionList ':' { Statement } . - -while { -case i < n: f1(); -case i < m: f2(); -} - - -For statements - -NEEDS TO BE COMPLETED - -ForStat = 'for' ... - - - -Range statements - -Range statements denote iteration over the contents of arrays and maps. - -RangeStat = 'range' IdentifierList ':=' RangeExpression Block . -RangeExpression = Expression . - -A range expression must evaluate to an array, map or string. The identifier list must contain -either one or two identifiers. If the range expression is a map, a single identifier is declared -to range over the keys of the map; two identifiers range over the keys and corresponding -values. For arrays and strings, the behavior is analogous for integer indices (the keys) and array -elements (the values). - -a := [ 1, 2, 3]; -m := [ "fo" : 2, "foo" : 3, "fooo" : 4 ] - -range i := a { - f(a[i]); -} - -range k, v := m { - assert(len(k) == v); -} - - -Return statements - -ReturnStat = 'return' [ ExpressionList ] . - -There are two ways to return values from a function. 
The first is to -explicitly list the return value or values in the return statement: - -func simple_f () int { - return 2; -} - -func complex_f1() (re float, im float) { - return -7.0, -4.0; -} - -The second is to provide names for the return values and assign them -explicitly in the function; the return statement will then provide no -values: - -func complex_f2() (re float, im float) { - re = 7.0; - im = 4.0; - return; -} - -It is legal to name the return values in the declaration even if the -first form of return statement is used: - - -func complex_f2() (re float, im float) { - return 7.0, 4.0; -} - - -Expressions - -Expression = Conjunction { '||' Conjunction }. -Conjunction = Comparison { '&&' Comparison }. -Comparison = SimpleExpr [ relation SimpleExpr ]. -relation = '==' | '!=' | '<' | '<=' | '>' | '>='. -SimpleExpr = Term { add_op Term }. -add_op = '+' | '-' | '|' | '^'. -Term = Factor { mul_op Factor }. -mul_op = '*' | '/' | '%' | '<<' | '>>' | '&'. - -The corresponding precedence hierarchy is as follows: (5 levels of -precedence is about the maximum people can keep comfortably in their -heads. The experience with C and C++ shows that more than that -usually requires explicit manual consultation...). [gri: I still -think we should consider 0 levels of binary precedence: All operators -are on the same level, but parentheses are required when different -operators are mixed. That would make it really easy, and really -clear. It would also open the door for straight-forward introduction -of user-defined operators, which would be rather useful.] - -Precedence Operator - 1 || - 2 && - 3 == != < <= > >= - 4 + - | ^ - 5 * / % << >> & - - -For integer values, / and % satisfy the following relationship: - - (a / b) * b + a % b == a - -and - - (a / b) is "truncated towards zero". - -The shift operators implement arithmetic shifts for signed integers, -and logical shifts for unsigned integers. TBD: is there any range -checking on s in x >> s, or x << s ?
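The division and shift rules stated above can be checked with a small program. This is only a sketch written in present-day Go, which is an assumption: the draft syntax in this file predates it, and the helper names divMod, arithShift and logicalShift are hypothetical, chosen for illustration.

```go
package main

import "fmt"

// divMod returns the quotient and remainder produced by the built-in
// integer operators / and %.
func divMod(a, b int) (q, r int) {
	return a / b, a % b
}

// arithShift shifts a signed value right; the sign bit is replicated.
func arithShift(x int8, s uint) int8 {
	return x >> s
}

// logicalShift shifts an unsigned value right; zeros are shifted in.
func logicalShift(x uint8, s uint) uint8 {
	return x >> s
}

func main() {
	// Check (a / b) * b + a % b == a for every sign combination.
	pairs := [][2]int{{7, 2}, {-7, 2}, {7, -2}, {-7, -2}}
	for _, p := range pairs {
		a, b := p[0], p[1]
		q, r := divMod(a, b)
		fmt.Println(a, b, q, r, q*b+r == a)
	}

	// The same bit pattern (1111 1000) shifted as signed and as unsigned.
	fmt.Println(arithShift(-8, 1), logicalShift(0xF8, 1))
}
```

With truncation towards zero, -7 / 2 is -3 and -7 % 2 is -1, so the remainder takes the sign of the dividend. The open question about range checking on the shift count is not addressed by this sketch.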
- -[gri: We decided on a couple of issues here that we need to write down -more nicely] - -- There are no implicit type conversions except for -constants/literals. In particular, unsigned and signed integers -cannot be mixed in an expression w/o explicit casting. - -- Unary '^' corresponds to C '~' (bitwise negate). - -- Arrays can be subscripted (a[i]) or sliced (a[i : j]). A slice a[i -: j] is a new array of length (j - i), and consisting of the elements -a[i], a[i + 1], ... a[j - 1]. [gri/r: Is the slice array bounds -check hard (leading to an error), or soft (truncating) ?]. -Furthermore: Array slicing is very tricky! Do we get a copy (a new -array) or a new array descriptor? This is open at this point. There -is a simple way out of the mess: Structured types are always passed by -reference, and there is no value assignment for structured types. It -gets very complicated very quickly. - -[gri: Syntax below is incomplete - what about method invocation?] - -Factor = Literal | Designator | '!' Expression | '-' Expression | - '^' Expression | '&' Expression | '(' Expression ')' | Call. -Designator = QualifiedIdent { Selector }. -Selector = '.' identifier | '[' Expression [ ':' Expression ] ']'. -Call = Factor '(' ExpressionList ')'. - -[gri: We need a precise definition of a constant expression] - - - - -Compilation units - -The unit of compilation is a single file. A compilation unit consists -of a package specifier followed by a list of import declarations -followed by a list of global declarations. - -CompilationUnit = PackageSpecifier { ImportDecl } { GlobalDeclaration }. -GlobalDeclaration = Declaration. - - -Exports - -Globally declared identifiers may be exported, thus making the -exported identifier visible outside the package. Another package may -then import the identifier to use it. - -Export directives must only appear at the global level of a -compilation unit (at least for now).
That is, one can export -compilation-unit global identifiers but not, for example, local -variables or structure fields. - -Exporting an identifier makes the identifier visible externally to the -package. If the identifier represents a type, the type structure is -exported as well. The exported identifiers may appear later in the -source than the export directive itself, but it is an error to specify -an identifier not declared anywhere in the source file containing the -export directive. - -ExportDirective = 'export' ExportIdentifier { ',' ExportIdentifier } . -ExportIdentifier = identifier . - -export sin, cos; - -One may export variables and types, but (at least for now), not -aliases. [r: what is needed to make aliases exportable? issue is -transitivity.] - -Exporting a variable does not automatically export the type of the -variable. For illustration, consider the program fragment: - -package P; -export v1, v2, p; -struct S { a int; b int; } -var v1 S; -var v2 S; -var p *S; - -Notice that S is not exported. Another source file may contain: - -import P; -alias v1 P.v1; -alias v2 P.v2; -alias p P.p; - -This program can use v1, v2 and p but not access the fields (a and b) of -structure type S explicitly. For instance, it could legally contain - -if p == nil { } -if v1 == v2 { } - -but not - -if v1.a == 0 { } - - -