mirror of
https://github.com/golang/go
synced 2024-11-25 05:07:56 -07:00
3d50b1e0e8
DELTA=176 (172 added, 0 deleted, 4 changed) OCL=25182 CL=25222
266 lines
12 KiB
HTML
|
|
<h2>Introduction</h2>
|
|
|
|
<p>
|
|
This is a reference manual for the Go programming language. For more information and other documents, see <a href="/">the Go home page</a>.
|
|
</p>
|
|
|
|
<p>
|
|
Go is a general-purpose language designed with systems programming in mind. It is strongly typed and garbage-collected, and has explicit support for concurrent programming. Programs are constructed from <i>packages</i>, whose properties allow efficient management of dependencies. The existing implementations use a traditional compile/link model to generate executable binaries.
|
|
</p>
|
|
|
|
<p>
|
|
The grammar is simple and regular, allowing for easy analysis by automatic tools such as integrated development environments.
|
|
</p>
|
|
|
|
<h2>Notation</h2>
|
|
|
|
<p>
|
|
The syntax is specified using Extended Backus-Naur Form (EBNF):
|
|
</p>
|
|
|
|
<pre>
|
|
Production = production_name "=" Expression .
|
|
Expression = Alternative { "|" Alternative } .
|
|
Alternative = Term { Term } .
|
|
Term = production_name | token [ "..." token ] | Group | Option | Repetition .
|
|
Group = "(" Expression ")" .
|
|
Option = "[" Expression "]" .
|
|
Repetition = "{" Expression "}" .
|
|
</pre>
|
|
|
|
<p>
|
|
Productions are expressions constructed from terms and the following operators, in increasing precedence:
|
|
</p>
|
|
<pre>
|
|
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
|
|
</pre>
|
|
|
|
<p>
|
|
Lower-case production names are used to identify lexical tokens. Non-terminals are in CamelCase. Lexical symbols are enclosed in double quotes <tt>""</tt> (the
|
|
double quote symbol is written as <tt>'"'</tt>).
|
|
</p>
|
|
|
|
<p>
|
|
The form <tt>"a ... b"</tt> represents the set of characters from <tt>a</tt> through <tt>b</tt> as alternatives.
|
|
</p>
|
|
|
|
<p>
|
|
Where possible, recursive productions are used to express evaluation order
|
|
and operator precedence syntactically.
|
|
</p>
|
|
|
|
|
|
<h2>Lexical properties</h2>
|
|
|
|
<p>
|
|
A program is constructed from a set of <i>packages</i>. Each package is defined by one or more source files compiled separately. In processing the source text in each file, the input is divided into a sequence of <i>tokens</i>.
|
|
</p>
|
|
|
|
<h3>Unicode text</h3>
|
|
|
|
<p>
|
|
Go source text is a sequence of Unicode code points encoded in UTF-8. The language processor does not canonicalize the input, so it will treat a single accented code point as distinct from the same character constructed from combining an accent and a letter; those are treated as two code points. For simplicity, this document will use the term <i>character</i> to refer to a Unicode code point.
|
|
</p>
|
|
<p>
|
|
Each code point is distinct; for example, upper and lower case letters are different characters.
|
|
</p>
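<p>
The distinction can be illustrated with ordinary string values. The following is a minimal sketch, assuming a current Go toolchain; it compares run-time strings rather than exercising the language processor itself, and the names <tt>precomposed</tt>, <tt>combined</tt>, and <tt>runeCount</tt> are chosen here for illustration only.
</p>

```go
package main

import "fmt"

// precomposed is the single code point U+00E9 (e with acute accent).
const precomposed = "\u00e9"

// combined is the letter e (U+0065) followed by the combining
// acute accent (U+0301). It renders the same but is two code points.
const combined = "e\u0301"

// runeCount reports the number of code points in s.
func runeCount(s string) int {
	return len([]rune(s))
}

func main() {
	// No canonicalization takes place, so the two forms differ.
	fmt.Println(precomposed == combined)                     // false
	fmt.Println(runeCount(precomposed), runeCount(combined)) // 1 2

	// Upper and lower case letters are different characters.
	fmt.Println('A' == 'a') // false
}
```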
|
|
|
|
<h3>Tokens</h3>
|
|
|
|
<p>
|
|
There are four classes of tokens: identifiers, keywords, operators and delimiters, and literals. <i>White space</i>, formed from blanks, tabs, and newlines, is ignored except as it separates tokens that would otherwise combine into a single token. Comments, defined below, behave as white space. While breaking the input into tokens, the next token is the longest sequence of characters that form a valid token.
|
|
</p>
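<p>
The longest-match rule can be observed with the standard <tt>go/scanner</tt> package of current Go distributions. This is a sketch under that assumption, not a description of the implementation this document refers to; the helper <tt>tokenize</tt> and the file name <tt>example.go</tt> are hypothetical.
</p>

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenize returns the text of each token in src. Semicolons that the
// scanner inserts automatically (literal "\n") are skipped because they
// do not appear in the source text.
func tokenize(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)

	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.SEMICOLON && lit == "\n" {
			continue
		}
		if lit != "" {
			out = append(out, lit) // identifiers, literals, keywords
		} else {
			out = append(out, tok.String()) // operators and delimiters
		}
	}
	return out
}

func main() {
	// "<<=" is taken as one token: the longest valid sequence.
	fmt.Println(tokenize("x<<=1")) // [x <<= 1]
	// White space separates tokens that would otherwise combine.
	fmt.Println(tokenize("x< <=1")) // [x < <= 1]
}
```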
|
|
|
|
<h3>Comments</h3>
|
|
|
|
<p>
|
|
There are two forms of comments. The first starts at the character sequence <tt>//</tt> and continues through the next newline. The second starts at the character sequence <tt>/*</tt> and continues through the character sequence <tt>*/</tt>. Comments do not nest.
|
|
</p>
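<p>
That comments do not nest means a <tt>/*</tt> comment ends at the first <tt>*/</tt>, whatever it contains. The sketch below shows this using the standard <tt>go/scanner</tt> package in its comment-reporting mode, assuming a current Go toolchain; the helper <tt>firstComment</tt> is a hypothetical name.
</p>

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// firstComment returns the text of the first comment found in src,
// or the empty string if src contains no comment.
func firstComment(src string) string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, scanner.ScanComments)

	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			return ""
		}
		if tok == token.COMMENT {
			return lit
		}
	}
}

func main() {
	// The inner "/*" does not open a second comment; the comment
	// stops at the first "*/", leaving "c */" as ordinary tokens.
	fmt.Println(firstComment("/* a /* b */ c */")) // /* a /* b */

	// A "//" comment runs to the end of the line.
	fmt.Println(firstComment("x // note\ny")) // // note
}
```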
|
|
|
|
<h3>Identifiers</h3>
|
|
|
|
<p>
|
|
An identifier is a sequence of one or more letters and digits. The meaning of <i>letter</i> and <i>digit</i> is defined by the Unicode properties for the corresponding characters, with the addition that the underscore character <tt>_</tt> (U+005F) is considered a letter. The first character in an identifier must be a letter.
|
|
</p>
|
|
|
|
<pre>
|
|
letter = unicode_letter | "_" .
|
|
identifier = letter { letter | unicode_digit } .
|
|
</pre>
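<p>
The two productions above translate almost directly into a checking function. The following is a minimal sketch that follows this document's definition using the standard <tt>unicode</tt> package; the function names are hypothetical, and the check does not exclude reserved keywords.
</p>

```go
package main

import (
	"fmt"
	"unicode"
)

// isLetter implements: letter = unicode_letter | "_" .
func isLetter(r rune) bool {
	return r == '_' || unicode.IsLetter(r)
}

// isIdentifier implements: identifier = letter { letter | unicode_digit } .
func isIdentifier(s string) bool {
	if s == "" {
		return false // at least one character is required
	}
	for i, r := range s {
		if isLetter(r) {
			continue
		}
		if i > 0 && unicode.IsDigit(r) {
			continue // digits are allowed after the first character
		}
		return false
	}
	return true
}

func main() {
	for _, s := range []string{"a", "_x9", "αβ", "9x", "a-b", ""} {
		fmt.Printf("%q: %v\n", s, isIdentifier(s))
	}
}
```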
|
|
|
|
<h3>Keywords</h3>
|
|
|
|
<p>
|
|
The following keywords are reserved and may not be used as identifiers.
|
|
</p>
|
|
<pre>
|
|
break        default      func         interface    select
case         defer        go           map          struct
chan         else         goto         package      switch
const        fallthrough  if           range        type
continue     for          import       return       var
|
|
</pre>
|
|
|
|
<h3>Operators and Delimiters</h3>
|
|
|
|
<p>
|
|
The following character sequences are tokens representing operators, delimiters, and other special lexemes:
|
|
</p>
|
|
<pre>
|
|
+    &     +=    &=     &&    ==    !=    (    )
-    |     -=    |=     ||    <     <=    [    ]
*    ^     *=    ^=     <-    >     >=    {    }
/    <<    /=    <<=    ++    =     :=    ,    ;
%    >>    %=    >>=    --    !     ...   .    :
|
|
</pre>
|
|
|
|
<h3>Literals</h3>
|
|
|
|
<h4>Integer literals</h4>
|
|
|
|
<p>
|
|
An integer literal is a sequence of one or more digits in the corresponding base, which may be 8, 10, or 16. An optional prefix sets a non-decimal base: <tt>0</tt> for octal, <tt>0x</tt> or <tt>0X</tt> for hexadecimal. In hexadecimal literals, letters <tt>a-f</tt> and <tt>A-F</tt> represent values 10 through 15.
|
|
</p>
|
|
<pre>
|
|
int_lit = decimal_lit | octal_lit | hex_lit .
|
|
decimal_lit = ( "1" ... "9" ) { decimal_digit } .
|
|
octal_lit = "0" { octal_digit } .
|
|
hex_lit = "0" ( "x" | "X" ) hex_digit { hex_digit } .
|
|
decimal_digit = "0" ... "9" .
|
|
octal_digit = "0" ... "7" .
|
|
hex_digit = "0" ... "9" | "A" ... "F" | "a" ... "f" .
|
|
</pre>
|
|
|
|
<p>
|
|
Integer literals represent values of arbitrary precision, or <i>ideal integers</i>; they have no implicit size or type.
|
|
</p>
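<p>
As an illustration, the three bases and the absence of an implicit size can be sketched as follows, assuming a current Go toolchain. The constant names are chosen here for illustration.
</p>

```go
package main

import "fmt"

const (
	dec = 255  // decimal: first digit is 1 through 9
	oct = 0377 // octal: leading 0
	hex = 0xFF // hexadecimal: leading 0x or 0X

	// huge does not fit in any fixed-size integer type, but is a
	// legal constant because integer literals have no implicit size.
	huge = 1 << 100
)

func main() {
	fmt.Println(dec == oct, oct == hex) // true true

	// Constant arithmetic is exact; only the final result must fit
	// into the type it is converted to.
	fmt.Println(huge >> 98) // 4
}
```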
|
|
|
|
<h4>Floating-point literals</h4>
|
|
<p>
|
|
A floating-point literal is a decimal representation of a floating-point number. It has an integer part, a decimal point, a fractional part, and an exponent part. The integer and fractional parts comprise decimal digits; the exponent part is an <tt>e</tt> or <tt>E</tt> followed by an optionally signed decimal exponent. One of the integer part or the fractional part may be elided; one of the decimal point or the exponent may be elided.
|
|
</p>
|
|
<pre>
|
|
float_lit = decimals "." [ decimals ] [ exponent ] |
            decimals exponent |
            "." decimals [ exponent ] .
|
|
decimals = decimal_digit { decimal_digit } .
|
|
exponent = ( "e" | "E" ) [ "+" | "-" ] decimals .
|
|
</pre>
|
|
|
|
<p>
|
|
As with integers, floating-point literals represent values of arbitrary precision, or <i>ideal floats</i>.
|
|
</p>
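<p>
The permitted elisions can be sketched with a few constants, one for each alternative of <tt>float_lit</tt>. This assumes a current Go toolchain; the constant names are illustrative only.
</p>

```go
package main

import "fmt"

const (
	noFraction = 1.      // fractional part elided
	noInteger  = .5      // integer part elided
	noPoint    = 1e3     // decimal point elided, exponent present
	full       = 6.25e-2 // integer part, point, fraction, and exponent
)

func main() {
	fmt.Println(noFraction == 1, noInteger == 0.5) // true true
	fmt.Println(noPoint == 1000, full == 0.0625)   // true true

	// The literal has no implicit type: because its value is a whole
	// number, it may initialize an integer variable.
	var n int = 1e3
	fmt.Println(n) // 1000
}
```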
|
|
|
|
<h4>Character literals</h4>
|
|
|
|
<p>
|
|
A character literal represents an integer value, typically a Unicode code point, as one or more characters enclosed in single quotes. Within the quotes, any character may appear except single quote and newline; a quoted single character represents itself, while multi-character sequences beginning with a backslash encode values in various formats.
|
|
</p>
|
|
<p>
|
|
The simplest form represents the exact character within the quotes; since Go source text is Unicode characters encoded in UTF-8, multiple UTF-8-encoded bytes may represent a single integer value. For instance, the literal <tt>'a'</tt> holds a single byte representing a literal <tt>a</tt>, Unicode U+0061, value <tt>0x61</tt>, while <tt>'ä'</tt> holds two bytes (<tt>0xc3</tt> <tt>0xa4</tt>) representing a literal <tt>a</tt>-dieresis, U+00E4, value <tt>0xe4</tt>.
|
|
</p>
|
|
<p>
|
|
Several backslash escapes allow arbitrary values to be represented as ASCII text. There are four ways to represent the integer value as a numeric constant: <tt>\x</tt> followed by exactly two hexadecimal digits; <tt>\u</tt> followed by exactly four hexadecimal digits; <tt>\U</tt> followed by exactly eight hexadecimal digits; and a plain backslash <tt>\</tt> followed by exactly three octal digits. In each case the value of the literal is the value represented by the digits in the appropriate base.
|
|
</p>
|
|
<p>
|
|
Although these representations all result in an integer, they have different valid ranges. Octal escapes must represent a value between 0 and 255 inclusive. (Hexadecimal escapes satisfy this condition by construction.) The Unicode escapes <tt>\u</tt> and <tt>\U</tt> represent Unicode code points, so some values are illegal within them, in particular those above <tt>0x10FFFF</tt> and surrogate halves.
|
|
</p>
|
|
<p>
|
|
After a backslash, certain single-character escapes represent special values:
|
|
</p>
|
|
<pre>
|
|
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\\   U+005C backslash
\'   U+0027 single quote (legal within character literals only)
\"   U+0022 double quote (legal within interpreted string literals only)
|
|
</pre>
|
|
<p>
|
|
All other sequences are illegal inside character literals.
|
|
</p>
|
|
<pre>
|
|
char_lit = "'" ( unicode_value | byte_value ) "'" .
|
|
unicode_value = unicode_char | little_u_value | big_u_value | escaped_char .
|
|
byte_value = octal_byte_value | hex_byte_value .
|
|
octal_byte_value = "\" octal_digit octal_digit octal_digit .
|
|
hex_byte_value = "\" "x" hex_digit hex_digit .
|
|
little_u_value = "\" "u" hex_digit hex_digit hex_digit hex_digit .
|
|
big_u_value = "\" "U" hex_digit hex_digit hex_digit hex_digit
                      hex_digit hex_digit hex_digit hex_digit .
|
|
escaped_char = "\" ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "\" | "'" | '"' ) .
|
|
</pre>
|
|
|
|
<p>
|
|
The value of a character literal is an ideal integer, just as with integer literals.
|
|
</p>
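<p>
The equivalence of the different spellings can be sketched as follows, assuming a current Go toolchain and a UTF-8 encoded source file. The constant names are chosen here for illustration.
</p>

```go
package main

import "fmt"

const (
	plainA = 'a'    // the character itself
	hexA   = '\x61' // exactly two hexadecimal digits
	octalA = '\141' // exactly three octal digits

	aDieresis = 'ä'          // two UTF-8 bytes in the source, one value
	littleU   = '\u00e4'     // exactly four hexadecimal digits
	bigU      = '\U000000e4' // exactly eight hexadecimal digits

	newline = '\n' // single-character escape, U+000A
)

func main() {
	fmt.Println(plainA == 0x61, hexA == plainA, octalA == plainA) // true true true
	fmt.Println(aDieresis == 0xe4, littleU == bigU)               // true true
	fmt.Println(newline == 10)                                    // true
}
```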
|
|
|
|
<h4>String literals</h4>
|
|
|
|
<p>
|
|
String literals represent constant values of type <tt>string</tt>. There are two forms: raw string literals and interpreted string literals.
|
|
</p>
|
|
<p>
|
|
Raw string literals are character sequences between back quotes <tt>``</tt>. Within the quotes, any character is legal except newline and back quote. The value of a raw string literal is the string composed of the uninterpreted bytes between the quotes.
|
|
</p>
|
|
<p>
|
|
Interpreted string literals are character sequences between double quotes <tt>""</tt>. The text between the quotes forms the value of the literal, with backslash escapes interpreted as they are in character literals. The three-digit octal (<tt>\000</tt>) and two-digit hexadecimal (<tt>\x00</tt>) escapes represent individual <i>bytes</i> of the resulting string; all other escapes represent the (possibly multi-byte) UTF-8 encoding of individual <i>characters</i>. Thus inside a string literal <tt>\377</tt> and <tt>\xFF</tt> represent a single byte of value <tt>0xFF</tt>=255, while <tt>ÿ</tt>, <tt>\u00FF</tt>, <tt>\U000000FF</tt> and <tt>\xc3\xbf</tt> represent the two bytes <tt>0xc3 0xbf</tt> of the UTF-8 encoding of character U+00FF.
|
|
</p>
|
|
|
|
<pre>
|
|
string_lit = raw_string_lit | interpreted_string_lit .
|
|
raw_string_lit = "`" { unicode_char } "`" .
|
|
interpreted_string_lit = '"' { unicode_value | byte_value } '"' .
|
|
</pre>
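<p>
The difference between byte escapes and character escapes, and between raw and interpreted literals, can be sketched as follows. This assumes a current Go toolchain and a UTF-8 encoded source file; the constant names are illustrative only.
</p>

```go
package main

import "fmt"

const (
	hexByte   = "\xFF" // one byte of value 0xFF
	octalByte = "\377" // the same single byte

	direct   = "ÿ"          // character U+00FF written directly
	littleU  = "\u00FF"     // the same character by escape
	bigU     = "\U000000FF" // likewise
	rawBytes = "\xc3\xbf"   // its two UTF-8 bytes spelled out

	raw         = `\n` // raw: backslash and n, uninterpreted
	interpreted = "\n" // interpreted: a single newline
)

func main() {
	fmt.Println(len(hexByte), hexByte == octalByte) // 1 true

	fmt.Println(len(direct), direct == littleU, littleU == bigU, bigU == rawBytes) // 2 true true true

	fmt.Println(len(raw), len(interpreted)) // 2 1
}
```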
|
|
|
|
<p>
|
|
During tokenization, two adjacent string literals separated only by the empty string, white space, or comments are implicitly combined into a single string literal whose value is the concatenated values of the literals.
|
|
</p>
|
|
<pre>
|
|
StringLit = string_lit { string_lit } .
|
|
</pre>
|
|
|
|
<h2>Everything else</h2>
|
|
|
|
<p>
|
|
I don't believe this organization is complete or correct but it's here to be worked on and thought about.
|
|
</p>
|
|
|
|
<h2>Types</h2>
|
|
|
|
<h2>Constants</h2>
|
|
|
|
<h2>Expressions</h2>
|
|
|
|
<h2>Declarations</h2>
|
|
|
|
<h2>Control Structures</h2>
|
|
|
|
<h2>Program structure</h2>
|
|
|
|
<h2>Packages</h2>
|
|
|
|
<h2>Differences between this doc and implementation - TODO</h2>
|
|
<p>
|
|
<font color=red>
|
|
The current implementation accepts only ASCII digits as digits; this document specifies Unicode digits.
|
|
<br>
|
|
</font>
|
|
</p>
|
|
</div>
|
|
|
|
<br class="clearboth" />
|
|
<div id="pageFooter">
|
|
<p><span class="conf">Google Confidential:</span> For Internal Use Only.<br />© 2009 Google, Inc. All Rights Reserved.</p>
|
|
</div>
|
|
</body>
|
|
</html>
|