mirror of
https://github.com/golang/go
synced 2024-11-21 22:04:39 -07:00
doc: add Gobs of data article
Originally published on The Go Programming Language Blog, March 24, 2011. http://blog.golang.org/2011/03/gobs-of-data.html R=adg CC=golang-dev https://golang.org/cl/5834043
This commit is contained in:
parent
e9f82e6b68
commit
9e03dcb3fa
@ -10,6 +10,7 @@ RAWHTML=\
|
||||
articles/c_go_cgo.rawhtml\
|
||||
articles/go_concurrency_patterns_timing_out_moving_on.rawhtml\
|
||||
articles/godoc_documenting_go_code.rawhtml\
|
||||
articles/gobs_of_data.rawhtml\
|
||||
articles/image_draw.rawhtml\
|
||||
effective_go.rawhtml\
|
||||
go1.rawhtml\
|
||||
|
315
doc/articles/gobs_of_data.html
Normal file
315
doc/articles/gobs_of_data.html
Normal file
@ -0,0 +1,315 @@
|
||||
<!--{
|
||||
"Title": "Gobs of data",
|
||||
"Template": true
|
||||
}-->
|
||||
|
||||
<p>
|
||||
To transmit a data structure across a network or to store it in a file, it must
|
||||
be encoded and then decoded again. There are many encodings available, of
|
||||
course: <a href="http://www.json.org/">JSON</a>,
|
||||
<a href="http://www.w3.org/XML/">XML</a>, Google's
|
||||
<a href="http://code.google.com/p/protobuf">protocol buffers</a>, and more.
|
||||
And now there's another, provided by Go's <a href="/pkg/encoding/gob/">gob</a>
|
||||
package.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Why define a new encoding? It's a lot of work and redundant at that. Why not
|
||||
just use one of the existing formats? Well, for one thing, we do! Go has
|
||||
<a href="/pkg/">packages</a> supporting all the encodings just mentioned (the
|
||||
<a href="http://code.google.com/p/goprotobuf">protocol buffer package</a> is in
|
||||
a separate repository but it's one of the most frequently downloaded). And for
|
||||
many purposes, including communicating with tools and systems written in other
|
||||
languages, they're the right choice.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
But for a Go-specific environment, such as communicating between two servers
|
||||
written in Go, there's an opportunity to build something much easier to use and
|
||||
possibly more efficient.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Gobs work with the language in a way that an externally-defined,
|
||||
language-independent encoding cannot. At the same time, there are lessons to be
|
||||
learned from the existing systems.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
<b>Goals</b>
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The gob package was designed with a number of goals in mind.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
First, and most obvious, it had to be very easy to use. First, because Go has
|
||||
reflection, there is no need for a separate interface definition language or
|
||||
"protocol compiler". The data structure itself is all the package should need
|
||||
to figure out how to encode and decode it. On the other hand, this approach
|
||||
means that gobs will never work as well with other languages, but that's OK:
|
||||
gobs are unashamedly Go-centric.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Efficiency is also important. Textual representations, exemplified by XML and
|
||||
JSON, are too slow to put at the center of an efficient communications network.
|
||||
A binary encoding is necessary.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Gob streams must be self-describing. Each gob stream, read from the beginning,
|
||||
contains sufficient information that the entire stream can be parsed by an
|
||||
agent that knows nothing a priori about its contents. This property means that
|
||||
you will always be able to decode a gob stream stored in a file, even long
|
||||
after you've forgotten what data it represents.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
There were also some things to learn from our experiences with Google protocol
|
||||
buffers.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
<b>Protocol buffer misfeatures</b>
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Protocol buffers had a major effect on the design of gobs, but have three
|
||||
features that were deliberately avoided. (Leaving aside the property that
|
||||
protocol buffers aren't self-describing: if you don't know the data definition
|
||||
used to encode a protocol buffer, you might not be able to parse it.)
|
||||
</p>
|
||||
|
||||
<p>
|
||||
First, protocol buffers only work on the data type we call a struct in Go. You
|
||||
can't encode an integer or array at the top level, only a struct with fields
|
||||
inside it. That seems a pointless restriction, at least in Go. If all you want
|
||||
to send is an array of integers, why should you have to put put it into a
|
||||
struct first?
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Next, a protocol buffer definition may specify that fields <code>T.x</code> and
|
||||
<code>T.y</code> are required to be present whenever a value of type
|
||||
<code>T</code> is encoded or decoded. Although such required fields may seem
|
||||
like a good idea, they are costly to implement because the codec must maintain a
|
||||
separate data structure while encoding and decoding, to be able to report when
|
||||
required fields are missing. They're also a maintenance problem. Over time, one
|
||||
may want to modify the data definition to remove a required field, but that may
|
||||
cause existing clients of the data to crash. It's better not to have them in the
|
||||
encoding at all. (Protocol buffers also have optional fields. But if we don't
|
||||
have required fields, all fields are optional and that's that. There will be
|
||||
more to say about optional fields a little later.)
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The third protocol buffer misfeature is default values. If a protocol buffer
|
||||
omits the value for a "defaulted" field, then the decoded structure behaves as
|
||||
if the field were set to that value. This idea works nicely when you have
|
||||
getter and setter methods to control access to the field, but is harder to
|
||||
handle cleanly when the container is just a plain idiomatic struct. Required
|
||||
fields are also tricky to implement: where does one define the default values,
|
||||
what types do they have (is text UTF-8? uninterpreted bytes? how many bits in a
|
||||
float?) and despite the apparent simplicity, there were a number of
|
||||
complications in their design and implementation for protocol buffers. We
|
||||
decided to leave them out of gobs and fall back to Go's trivial but effective
|
||||
defaulting rule: unless you set something otherwise, it has the "zero value"
|
||||
for that type - and it doesn't need to be transmitted.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
So gobs end up looking like a sort of generalized, simplified protocol buffer.
|
||||
How do they work?
|
||||
</p>
|
||||
|
||||
<p>
|
||||
<b>Values</b>
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The encoded gob data isn't about <code>int8</code>s and <code>uint16</code>s.
|
||||
Instead, somewhat analogous to constants in Go, its integer values are abstract,
|
||||
sizeless numbers, either signed or unsigned. When you encode an
|
||||
<code>int8</code>, its value is transmitted as an unsized, variable-length
|
||||
integer. When you encode an <code>int64</code>, its value is also transmitted as
|
||||
an unsized, variable-length integer. (Signed and unsigned are treated
|
||||
distinctly, but the same unsized-ness applies to unsigned values too.) If both
|
||||
have the value 7, the bits sent on the wire will be identical. When the receiver
|
||||
decodes that value, it puts it into the receiver's variable, which may be of
|
||||
arbitrary integer type. Thus an encoder may send a 7 that came from an
|
||||
<code>int8</code>, but the receiver may store it in an <code>int64</code>. This
|
||||
is fine: the value is an integer and as a long as it fits, everything works. (If
|
||||
it doesn't fit, an error results.) This decoupling from the size of the variable
|
||||
gives some flexibility to the encoding: we can expand the type of the integer
|
||||
variable as the software evolves, but still be able to decode old data.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
This flexibility also applies to pointers. Before transmission, all pointers are
|
||||
flattened. Values of type <code>int8</code>, <code>*int8</code>,
|
||||
<code>**int8</code>, <code>****int8</code>, etc. are all transmitted as an
|
||||
integer value, which may then be stored in <code>int</code> of any size, or
|
||||
<code>*int</code>, or <code>******int</code>, etc. Again, this allows for
|
||||
flexibility.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Flexibility also happens because, when decoding a struct, only those fields
|
||||
that are sent by the encoder are stored in the destination. Given the value
|
||||
</p>
|
||||
|
||||
{{code "/doc/progs/gobs1.go" `/type T/` `/STOP/`}}
|
||||
|
||||
<p>
|
||||
the encoding of <code>t</code> sends only the 7 and 8. Because it's zero, the
|
||||
value of <code>Y</code> isn't even sent; there's no need to send a zero value.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The receiver could instead decode the value into this structure:
|
||||
</p>
|
||||
|
||||
{{code "/doc/progs/gobs1.go" `/type U/` `/STOP/`}}
|
||||
|
||||
<p>
|
||||
and acquire a value of <code>u</code> with only <code>X</code> set (to the
|
||||
address of an <code>int8</code> variable set to 7); the <code>Z</code> field is
|
||||
ignored - where would you put it? When decoding structs, fields are matched by
|
||||
name and compatible type, and only fields that exist in both are affected. This
|
||||
simple approach finesses the "optional field" problem: as the type
|
||||
<code>T</code> evolves by adding fields, out of date receivers will still
|
||||
function with the part of the type they recognize. Thus gobs provide the
|
||||
important result of optional fields - extensibility - without any additional
|
||||
mechanism or notation.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
From integers we can build all the other types: bytes, strings, arrays, slices,
|
||||
maps, even floats. Floating-point values are represented by their IEEE 754
|
||||
floating-point bit pattern, stored as an integer, which works fine as long as
|
||||
you know their type, which we always do. By the way, that integer is sent in
|
||||
byte-reversed order because common values of floating-point numbers, such as
|
||||
small integers, have a lot of zeros at the low end that we can avoid
|
||||
transmitting.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
One nice feature of gobs that Go makes possible is that they allow you to define
|
||||
your own encoding by having your type satisfy the
|
||||
<a href="/pkg/encoding/gob/#GobEncoder">GobEncoder</a> and
|
||||
<a href="/pkg/encoding/gob/#GobDecoder">GobDecoder</a> interfaces, in a manner
|
||||
analogous to the <a href="/pkg/encoding/json/">JSON</a> package's
|
||||
<a href="/pkg/encoding/json/#Marshaler">Marshaler</a> and
|
||||
<a href="/pkg/encoding/json/#Unmarshaler">Unmarshaler</a> and also to the
|
||||
<a href="/pkg/fmt/#Stringer">Stringer</a> interface from
|
||||
<a href="/pkg/fmt/">package fmt</a>. This facility makes it possible to
|
||||
represent special features, enforce constraints, or hide secrets when you
|
||||
transmit data. See the <a href="/pkg/encoding/gob/">documentation</a> for
|
||||
details.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
<b>Types on the wire</b>
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The first time you send a given type, the gob package includes in the data
|
||||
stream a description of that type. In fact, what happens is that the encoder is
|
||||
used to encode, in the standard gob encoding format, an internal struct that
|
||||
describes the type and gives it a unique number. (Basic types, plus the layout
|
||||
of the type description structure, are predefined by the software for
|
||||
bootstrapping.) After the type is described, it can be referenced by its type
|
||||
number.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Thus when we send our first type <code>T</code>, the gob encoder sends a
|
||||
description of <code>T</code> and tags it with a type number, say 127. All
|
||||
values, including the first, are then prefixed by that number, so a stream of
|
||||
<code>T</code> values looks like:
|
||||
</p>
|
||||
|
||||
<pre>
|
||||
("define type id" 127, definition of type T)(127, T value)(127, T value), ...
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
These type numbers make it possible to describe recursive types and send values
|
||||
of those types. Thus gobs can encode types such as trees:
|
||||
</p>
|
||||
|
||||
{{code "/doc/progs/gobs1.go" `/type Node/` `/STOP/`}}
|
||||
|
||||
<p>
|
||||
(It's an exercise for the reader to discover how the zero-defaulting rule makes
|
||||
this work, even though gobs don't represent pointers.)
|
||||
</p>
|
||||
|
||||
<p>
|
||||
With the type information, a gob stream is fully self-describing except for the
|
||||
set of bootstrap types, which is a well-defined starting point.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
<b>Compiling a machine</b>
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The first time you encode a value of a given type, the gob package builds a
|
||||
little interpreted machine specific to that data type. It uses reflection on
|
||||
the type to construct that machine, but once the machine is built it does not
|
||||
depend on reflection. The machine uses package unsafe and some trickery to
|
||||
convert the data into the encoded bytes at high speed. It could use reflection
|
||||
and avoid unsafe, but would be significantly slower. (A similar high-speed
|
||||
approach is taken by the protocol buffer support for Go, whose design was
|
||||
influenced by the implementation of gobs.) Subsequent values of the same type
|
||||
use the already-compiled machine, so they can be encoded right away.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Decoding is similar but harder. When you decode a value, the gob package holds
|
||||
a byte slice representing a value of a given encoder-defined type to decode,
|
||||
plus a Go value into which to decode it. The gob package builds a machine for
|
||||
that pair: the gob type sent on the wire crossed with the Go type provided for
|
||||
decoding. Once that decoding machine is built, though, it's again a
|
||||
reflectionless engine that uses unsafe methods to get maximum speed.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
<b>Use</b>
|
||||
</p>
|
||||
|
||||
<p>
|
||||
There's a lot going on under the hood, but the result is an efficient,
|
||||
easy-to-use encoding system for transmitting data. Here's a complete example
|
||||
showing differing encoded and decoded types. Note how easy it is to send and
|
||||
receive values; all you need to do is present values and variables to the
|
||||
<a href="/pkg/encoding/gob/">gob package</a> and it does all the work.
|
||||
</p>
|
||||
|
||||
{{code "/doc/progs/gobs2.go" `/package main/` `$`}}
|
||||
|
||||
<p>
|
||||
You can compile and run this example code in the
|
||||
<a href="http://play.golang.org/p/_-OJV-rwMq">Go Playground</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The <a href="/pkg/net/rpc/">rpc package</a> builds on gobs to turn this
|
||||
encode/decode automation into transport for method calls across the network.
|
||||
That's a subject for another article.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
<b>Details</b>
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The <a href="/pkg/encoding/gob/">gob package documentation</a>, especially the
|
||||
file <a href="/src/pkg/encoding/gob/doc.go">doc.go</a>, expands on many of the
|
||||
details described here and includes a full worked example showing how the
|
||||
encoding represents data. If you are interested in the innards of the gob
|
||||
implementation, that's a good place to start.
|
||||
</p>
|
@ -114,7 +114,7 @@ Guided tours of Go programs.
|
||||
<h4>Packages</h4>
|
||||
<ul>
|
||||
<li><a href="http://blog.golang.org/2011/01/json-and-go.html">JSON and Go</a> - using the <a href="/pkg/encoding/json/">json</a> package.</li>
|
||||
<li><a href="http://blog.golang.org/2011/03/gobs-of-data.html">Gobs of data</a> - the design and use of the <a href="/pkg/encoding/gob/">gob</a> package.</li>
|
||||
<li><a href="/doc/articles/gobs_of_data.html">Gobs of data</a> - the design and use of the <a href="/pkg/encoding/gob/">gob</a> package.</li>
|
||||
<li><a href="/doc/articles/laws_of_reflection.html">The Laws of Reflection</a> - the fundamentals of the <a href="/pkg/reflect/">reflect</a> package.</li>
|
||||
<li><a href="http://blog.golang.org/2011/09/go-image-package.html">The Go image package</a> - the fundamentals of the <a href="/pkg/image/">image</a> package.</li>
|
||||
<li><a href="/doc/articles/image_draw.html">The Go image/draw package</a> - the fundamentals of the <a href="/pkg/image/draw/">image/draw</a> package.</li>
|
||||
|
22
doc/progs/gobs1.go
Normal file
22
doc/progs/gobs1.go
Normal file
@ -0,0 +1,22 @@
|
||||
// Copyright 2011 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package gobs1
|
||||
|
||||
type T struct{ X, Y, Z int } // Only exported fields are encoded and decoded.
|
||||
var t = T{X: 7, Y: 0, Z: 8}
|
||||
|
||||
// STOP OMIT
|
||||
|
||||
type U struct{ X, Y *int8 } // Note: pointers to int8s
|
||||
var u U
|
||||
|
||||
// STOP OMIT
|
||||
|
||||
type Node struct {
|
||||
Value int
|
||||
Left, Right *Node
|
||||
}
|
||||
|
||||
// STOP OMIT
|
43
doc/progs/gobs2.go
Normal file
43
doc/progs/gobs2.go
Normal file
@ -0,0 +1,43 @@
|
||||
// Copyright 2011 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"encoding/gob"
|
||||
"fmt"
|
||||
"log"
|
||||
)
|
||||
|
||||
type P struct {
|
||||
X, Y, Z int
|
||||
Name string
|
||||
}
|
||||
|
||||
type Q struct {
|
||||
X, Y *int32
|
||||
Name string
|
||||
}
|
||||
|
||||
func main() {
|
||||
// Initialize the encoder and decoder. Normally enc and dec would be
|
||||
// bound to network connections and the encoder and decoder would
|
||||
// run in different processes.
|
||||
var network bytes.Buffer // Stand-in for a network connection
|
||||
enc := gob.NewEncoder(&network) // Will write to network.
|
||||
dec := gob.NewDecoder(&network) // Will read from network.
|
||||
// Encode (send) the value.
|
||||
err := enc.Encode(P{3, 4, 5, "Pythagoras"})
|
||||
if err != nil {
|
||||
log.Fatal("encode error:", err)
|
||||
}
|
||||
// Decode (receive) the value.
|
||||
var q Q
|
||||
err = dec.Decode(&q)
|
||||
if err != nil {
|
||||
log.Fatal("decode error:", err)
|
||||
}
|
||||
fmt.Printf("%q: {%d,%d}\n", q.Name, *q.X, *q.Y)
|
||||
}
|
@ -46,7 +46,12 @@ timeout="
|
||||
timeout2
|
||||
"
|
||||
|
||||
all=$(echo $defer_panic_recover $effective_go $error_handling $law_of_reflection $c_go_cgo $timeout slices go1)
|
||||
gobs="
|
||||
gobs1
|
||||
gobs2
|
||||
"
|
||||
|
||||
all=$(echo $defer_panic_recover $effective_go $error_handling $law_of_reflection $c_go_cgo $timeout $gobs slices go1)
|
||||
|
||||
for i in $all; do
|
||||
go build $i.go
|
||||
|
@ -226,7 +226,7 @@ where * signifies zero or more repetitions and the type id of a value must
|
||||
be predefined or be defined before the value in the stream.
|
||||
|
||||
See "Gobs of data" for a design discussion of the gob wire format:
|
||||
http://blog.golang.org/2011/03/gobs-of-data.html
|
||||
http://golang.org/doc/articles/gobs_of_data.html
|
||||
*/
|
||||
package gob
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user