This is a reference manual for the Go programming language. For more information and other documents, see the Go home page.
Go is a general-purpose language designed with systems programming in mind. It is strongly typed and garbage-collected, and has explicit support for concurrent programming. Programs are constructed from packages, whose properties allow efficient management of dependencies. The existing implementations use a traditional compile/link model to generate executable binaries.
The grammar is simple and regular, allowing for easy analysis by automatic tools such as integrated development environments.
The syntax is specified using Extended Backus-Naur Form (EBNF):
Production  = production_name "=" Expression .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "..." token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
Productions are expressions constructed from terms and the following operators, in increasing precedence:
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
Lower-case production names are used to identify lexical tokens. Non-terminals are in CamelCase. Lexical symbols are enclosed in double quotes "" (the double quote symbol is written as '"').
The form "a ... b" represents the set of characters from a through b as alternatives.
Where possible, recursive productions are used to express evaluation order and operator precedence syntactically.
A program is constructed from a set of packages. Each package is defined by one or more source files compiled separately. When each file is processed, its source text is divided into a sequence of tokens.
Go source text is a sequence of Unicode code points encoded in UTF-8. The language processor does not canonicalize the input, so it will treat a single accented code point as distinct from the same character constructed from combining an accent and a letter; those are treated as two code points. For simplicity, this document will use the term character to refer to a Unicode code point.
Each code point is distinct; for example, upper and lower case letters are different characters.
There are four classes of tokens: identifiers, keywords, operators and delimiters, and literals. White space, formed from blanks, tabs, and newlines, is ignored except as it separates tokens that would otherwise combine into a single token. Comments, defined below, behave as white space. While breaking the input into tokens, the next token is the longest sequence of characters that form a valid token.
There are two forms of comments. The first starts at the character sequence // and continues through the next newline. The second starts at the character sequence /* and continues through the character sequence */. Comments do not nest.
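As an illustration, assuming a current Go compiler (whose two comment forms behave as described here): the first */ closes a block comment, so an inner /* is just comment text, and // inside a string literal does not start a comment. The variable names are hypothetical.

```go
package main

import "fmt"

// The block comment ends at the FIRST */, so the inner "/*" has no effect
// and the expression is simply 1 + 2.
var sum = 1 /* outer /* inner */ + 2

// Inside a string literal, // is ordinary text, not a comment.
var notComment = "// not a comment"

func main() {
	fmt.Println(sum)             // 3
	fmt.Println(len(notComment)) // 16
}
```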
An identifier is a sequence of one or more letters and digits. The meaning of letter and digit is defined by the Unicode properties for the corresponding characters, with the addition that the underscore character _ (U+005F) is considered a letter. The first character in an identifier must be a letter.
letter     = unicode_letter | "_" .
identifier = letter { letter | unicode_digit } .
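The identifier production can be checked in a few lines of Go. This is a sketch, assuming the standard library's unicode.IsLetter and unicode.IsDigit stand in for unicode_letter and unicode_digit; the helper name isIdentifier is hypothetical. It follows the Unicode rule stated here (so a non-ASCII digit such as U+0663 is accepted after the first character), and it does not exclude keywords.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// isIdentifier (hypothetical helper) reports whether s matches
//
//	identifier = letter { letter | unicode_digit } .
//
// where letter is a Unicode letter or the underscore.
func isIdentifier(s string) bool {
	if s == "" || !utf8.ValidString(s) {
		return false
	}
	for i, r := range s {
		if r == '_' || unicode.IsLetter(r) {
			continue // a letter is allowed anywhere
		}
		if i > 0 && unicode.IsDigit(r) {
			continue // a digit is allowed except in first position
		}
		return false
	}
	return true
}

func main() {
	for _, s := range []string{"x", "_tmp", "αβ", "x1", "1x", "a-b"} {
		fmt.Println(s, isIdentifier(s))
	}
}
```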
The following keywords are reserved and may not be used as identifiers.
break        default      func         interface    select
case         defer        go           map          struct
chan         else         goto         package      switch
const        fallthrough  if           range        type
continue     for          import       return       var
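A scanner usually reads a whole letter-and-digit sequence first and only then decides whether it is a keyword. The sketch below, with hypothetical names keywordList and isKeyword, builds a lookup set from the table above; because the match is exact and case-sensitive, Range or ranger remains an ordinary identifier.

```go
package main

import (
	"fmt"
	"strings"
)

// keywordList holds the reserved words from the table above.
var keywordList = strings.Fields(`
	break     default      func     interface  select
	case      defer        go       map        struct
	chan      else         goto     package    switch
	const     fallthrough  if       range      type
	continue  for          import   return     var
`)

// keywords is a lookup set built once from keywordList.
var keywords = func() map[string]bool {
	m := make(map[string]bool, len(keywordList))
	for _, kw := range keywordList {
		m[kw] = true
	}
	return m
}()

// isKeyword reports whether word is reserved; matching is exact.
func isKeyword(word string) bool { return keywords[word] }

func main() {
	for _, w := range []string{"range", "Range", "ranger", "main"} {
		fmt.Println(w, isKeyword(w))
	}
}
```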
The following character sequences are tokens representing operators, delimiters, and other special lexemes:
+    &     +=    &=     &&    ==    !=    (    )
-    |     -=    |=     ||    <     <=    [    ]
*    ^     *=    ^=     <-    >     >=    {    }
/    <<    /=    <<=    ++    =     :=    ,    ;
%    >>    %=    >>=    --    !     ...   .    :
An integer literal is a sequence of one or more digits in the corresponding base, which may be 8, 10, or 16. An optional prefix sets a non-decimal base: 0 for octal, 0x or 0X for hexadecimal. In hexadecimal literals, letters a-f and A-F represent values 10 through 15.
int_lit       = decimal_lit | octal_lit | hex_lit .
decimal_lit   = ( "1" ... "9" ) { decimal_digit } .
octal_lit     = "0" { octal_digit } .
hex_lit       = "0" ( "x" | "X" ) hex_digit { hex_digit } .
decimal_digit = "0" ... "9" .
octal_digit   = "0" ... "7" .
hex_digit     = "0" ... "9" | "A" ... "F" | "a" ... "f" .
Integer literals represent values of arbitrary precision, or ideal integers; they have no implicit size or type.
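A recognizer for these productions is short. The sketch below is illustrative only: parseIntLit and inBase are hypothetical helpers that check a string against int_lit exactly as written here (so forms such as 0b101 or 1_000, which this grammar does not contain, are rejected) and return the value as a math/big integer, reflecting that literal values have no implicit size.

```go
package main

import (
	"fmt"
	"math/big"
)

// inBase reports whether the ASCII byte c is a valid digit in base 8, 10, or 16.
func inBase(c byte, base int) bool {
	switch base {
	case 8:
		return '0' <= c && c <= '7'
	case 10:
		return '0' <= c && c <= '9'
	default: // 16
		return '0' <= c && c <= '9' || 'a' <= c && c <= 'f' || 'A' <= c && c <= 'F'
	}
}

// parseIntLit (hypothetical helper) accepts exactly decimal_lit, octal_lit
// ("0" followed by octal digits) and hex_lit ("0x" or "0X" followed by at
// least one hex digit), and returns the exact value.
func parseIntLit(s string) (*big.Int, error) {
	if s == "" {
		return nil, fmt.Errorf("empty literal")
	}
	base, digits := 10, s
	switch {
	case len(s) >= 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X'):
		base, digits = 16, s[2:]
		if digits == "" {
			return nil, fmt.Errorf("%q: hex_lit needs at least one hex digit", s)
		}
	case s[0] == '0':
		base, digits = 8, s[1:]
		if digits == "" {
			return big.NewInt(0), nil // the literal "0" is an octal_lit
		}
	}
	for i := 0; i < len(digits); i++ {
		if !inBase(digits[i], base) {
			return nil, fmt.Errorf("%q: %q is not a base-%d digit", s, digits[i], base)
		}
	}
	v, ok := new(big.Int).SetString(digits, base)
	if !ok {
		return nil, fmt.Errorf("%q: malformed literal", s)
	}
	return v, nil
}

func main() {
	for _, s := range []string{"42", "017", "0x1F", "0x10000000000000000", "08", "0b101"} {
		v, err := parseIntLit(s)
		fmt.Println(s, v, err)
	}
}
```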
A floating-point literal is a decimal representation of a floating-point number. It has an integer part, a decimal point, a fractional part, and an exponent part. The integer and fractional part comprise decimal digits; the exponent part is an e or E followed by an optionally signed decimal exponent. One of the integer part or the fractional part may be elided; one of the decimal point or the exponent may be elided.
float_lit = decimals "." [ decimals ] [ exponent ] |
            decimals exponent |
            "." decimals [ exponent ] .
decimals  = decimal_digit { decimal_digit } .
exponent  = ( "e" | "E" ) [ "+" | "-" ] decimals .
As with integers, floating-point literals represent values of arbitrary precision, or ideal floats.
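The float_lit shapes and the exactness of ideal floats can both be seen in constant declarations. This sketch assumes a current Go compiler, which evaluates untyped constant expressions at high precision. The names are hypothetical. The value huge lies outside the float64 range, but it is never converted; only the quotient ten is.

```go
package main

import "fmt"

// One constant per float_lit shape from the grammar above.
const (
	a = 1.    // decimals "." (fractional part and exponent elided)
	b = 1.5e3 // decimals "." decimals exponent
	c = 2e-2  // decimals exponent (decimal point elided)
	d = .25   // "." decimals
)

// Untyped constant arithmetic is not limited to float64, so huge need not
// fit in a float64; only ten is converted below.
const (
	huge = 1e400
	ten  = huge / 1e399
)

func main() {
	fmt.Println(a, b, c, d)   // 1 1500 0.02 0.25
	fmt.Println(float64(ten)) // 10
}
```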
A character literal represents an integer value, typically a Unicode code point, as one or more characters enclosed in single quotes. Within the quotes, any character may appear except single quote and newline; a quoted single character represents itself, while multi-character sequences beginning with a backslash encode values in various formats.
The simplest form represents the exact character within the quotes; since Go source text is Unicode characters encoded in UTF-8, multiple UTF-8-encoded bytes may represent a single integer value. For instance, the literal 'a' holds a single byte representing a literal a, Unicode U+0061, value 0x61, while 'ä' holds two bytes (0xc3 0xa4) representing a literal a-dieresis, U+00E4, value 0xe4.
Several backslash escapes allow arbitrary values to be represented as ASCII text. There are four ways to represent the integer value as a numeric constant: \x followed by exactly two hexadecimal digits; \u followed by exactly four hexadecimal digits; \U followed by exactly eight hexadecimal digits; and a plain backslash \ followed by exactly three octal digits. In each case the value of the literal is the value represented by the digits in the appropriate base.
Although these representations all result in an integer, they have different valid ranges. Octal escapes must represent a value between 0 and 255 inclusive. (Hexadecimal escapes satisfy this condition by construction.) The "Unicode" escapes \u and \U represent Unicode code points, so some values are illegal within them, in particular those above 0x10FFFF and surrogate halves.
After a backslash, certain single-character escapes represent special values:
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\\   U+005C backslash
\'   U+0027 single quote (legal within character literals only)
\"   U+0022 double quote (legal within interpreted string literals only)
All other sequences are illegal inside character literals.
char_lit         = "'" ( unicode_value | byte_value ) "'" .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = "\" octal_digit octal_digit octal_digit .
hex_byte_value   = "\" "x" hex_digit hex_digit .
little_u_value   = "\" "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = "\" "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
escaped_char     = "\" ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "\" | "'" | """ ) .
The value of a character literal is an ideal integer, just as with integer literals.
String literals represent constant values of type string. There are two forms: raw string literals and interpreted string literals.
Raw string literals are character sequences between back quotes ``. Within the quotes, any character is legal except newline and back quote. The value of a raw string literal is the string composed of the uninterpreted bytes between the quotes.
Interpreted string literals are character sequences between double quotes "". The text between the quotes forms the value of the literal, with backslash escapes interpreted as they are in character literals. The three-digit octal (\000) and two-digit hexadecimal (\x00) escapes represent individual bytes of the resulting string; all other escapes represent the (possibly multi-byte) UTF-8 encoding of individual characters. Thus inside a string literal \377 and \xFF represent a single byte of value 0xFF=255, while ÿ, \u00FF, \U000000FF and \xc3\xbf represent the two bytes 0xc3 0xbf of the UTF-8 encoding of character U+00FF.
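These rules can be checked by measuring byte lengths. A sketch, assuming a current Go compiler (the constant and variable names are hypothetical): the raw literal keeps \n as two ordinary bytes, the octal and hexadecimal byte escapes both produce the single byte 0xFF, and the remaining forms all produce the two bytes c3 bf.

```go
package main

import "fmt"

const (
	raw    = `a\nb` // backslash and 'n' stay two ordinary bytes
	interp = "a\nb" // \n becomes one newline byte
)

// byteForms each denote the single byte 0xFF.
var byteForms = []string{"\377", "\xFF"}

// utf8Forms each denote the two-byte UTF-8 encoding of U+00FF.
var utf8Forms = []string{"ÿ", "\u00FF", "\U000000FF", "\xc3\xbf"}

func main() {
	fmt.Println(len(raw), len(interp)) // 4 3
	for _, s := range byteForms {
		fmt.Printf("% x\n", s) // ff
	}
	for _, s := range utf8Forms {
		fmt.Printf("% x\n", s) // c3 bf
	}
}
```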
string_lit             = raw_string_lit | interpreted_string_lit .
raw_string_lit         = "`" { unicode_char } "`" .
interpreted_string_lit = """ { unicode_value | byte_value } """ .
During tokenization, two adjacent string literals separated only by the empty string, white space, or comments are implicitly combined into a single string literal whose value is the concatenated values of the literals.
StringLit = string_lit { string_lit } .
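Current Go compilers no longer combine adjacent string literals implicitly (they require an explicit +), so this rule cannot be shown with a compiled constant. Instead, here is a tokenizer-level sketch built on the standard go/scanner package; the helper concatAdjacent is hypothetical. Because go/scanner implements the current language, which inserts semicolons at line ends, only literals on the same line count as adjacent in this sketch. Comments between literals are skipped, as the rule above requires.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strconv"
)

// concatAdjacent (hypothetical helper) scans src and merges each run of
// adjacent string-literal tokens into one value, mimicking
//
//	StringLit = string_lit { string_lit } .
func concatAdjacent(src string) ([]string, error) {
	fset := token.NewFileSet()
	file := fset.AddFile("input", fset.Base(), len(src))

	var errs scanner.ErrorList
	var s scanner.Scanner
	s.Init(file, []byte(src), func(pos token.Position, msg string) { errs.Add(pos, msg) }, 0)

	var out []string
	prevWasString := false
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok != token.STRING {
			prevWasString = false // any other token ends the run
			continue
		}
		v, err := strconv.Unquote(lit) // handles both raw and interpreted forms
		if err != nil {
			return nil, err
		}
		if prevWasString {
			out[len(out)-1] += v
		} else {
			out = append(out, v)
		}
		prevWasString = true
	}
	if len(errs) > 0 {
		return nil, errs.Err()
	}
	return out, nil
}

func main() {
	vals, err := concatAdjacent(`"ab" /* c */ "cd" x "ef"`)
	fmt.Printf("%q %v\n", vals, err)
}
```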
I don't believe this organization is complete or correct, but it is here to be worked on and thought about.
The current implementation accepts only ASCII digits, whereas this document specifies Unicode digits.