diff --git a/doc/articles/race_detector.html b/doc/articles/race_detector.html new file mode 100644 index 00000000000..af348dfeb63 --- /dev/null +++ b/doc/articles/race_detector.html @@ -0,0 +1,353 @@ + + +
+Data races are among the most common and hardest-to-debug types of bugs in concurrent systems. A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write. See The Go Memory Model for details. +
+ ++Here is an example of a data race that can lead to crashes and memory corruption: +
+ ++func main() { + c := make(chan bool) + m := make(map[string]string) + go func() { + m["1"] = "a" // First conflicting access. + c <- true + }() + m["2"] = "b" // Second conflicting access. + <-c + for k, v := range m { + fmt.Println(k, v) + } +} ++ +
+Fortunately, Go includes a built-in data race detector. To use it, add the -race
flag to the go command:
+
+$ go test -race mypkg // to test the package +$ go run -race mysrc.go // to run the source file +$ go build -race mycmd // to build the command +$ go install -race mypkg // to install the package ++ +
+When the race detector finds a data race in the program, it prints a report. The report contains stack traces for conflicting accesses, as well as stacks where the involved goroutines were created. For example: +
+ ++WARNING: DATA RACE +Read by goroutine 185: + net.(*pollServer).AddFD() + src/pkg/net/fd_unix.go:89 +0x398 + net.(*pollServer).WaitWrite() + src/pkg/net/fd_unix.go:247 +0x45 + net.(*netFD).Write() + src/pkg/net/fd_unix.go:540 +0x4d4 + net.(*conn).Write() + src/pkg/net/net.go:129 +0x101 + net.func·060() + src/pkg/net/timeout_test.go:603 +0xaf + +Previous write by goroutine 184: + net.setWriteDeadline() + src/pkg/net/sockopt_posix.go:135 +0xdf + net.setDeadline() + src/pkg/net/sockopt_posix.go:144 +0x9c + net.(*conn).SetDeadline() + src/pkg/net/net.go:161 +0xe3 + net.func·061() + src/pkg/net/timeout_test.go:616 +0x3ed + +Goroutine 185 (running) created at: + net.func·061() + src/pkg/net/timeout_test.go:609 +0x288 + +Goroutine 184 (running) created at: + net.TestProlongTimeout() + src/pkg/net/timeout_test.go:618 +0x298 + testing.tRunner() + src/pkg/testing/testing.go:301 +0xe8 ++ +
+The GORACE
environment variable sets race detector options. The format is:
+
+GORACE="option1=val1 option2=val2" ++ +
+The options are: +
+log_path
(default stderr
): The race detector writes
+its report to a file named log_path.pid. The special names stdout
+and stderr
cause reports to be written to standard output and
+standard error, respectively.exitcode
(default 66
): The exit status to use when
+exiting after a detected race.strip_path_prefix
(default ""
): Strip this prefix
+from all reported file paths, to make reports more concise.history_size
(default 1
): The per-goroutine memory
+access history is 32K * 2**history_size elements
. Increasing this
+value can avoid a "failed to restore the stack" error in reports, but at the
+cost of increased memory usage.+Example: +
+ ++$ GORACE="log_path=/tmp/race/report strip_path_prefix=/my/go/sources/" go test -race ++ +
+When you build with the -race
 flag, the go command defines an additional
+build tag, race
.
+You can use it to exclude some code or tests when running under the race detector. For example:
+
+// +build !race + +package foo + +// The test contains a data race. See issue 123. +func TestFoo(t *testing.T) { + // ... +} + +// The test fails under the race detector due to timeouts. +func TestBar(t *testing.T) { + // ... +} + +// The test takes too long under the race detector. +func TestBaz(t *testing.T) { + // ... +} ++ +
+To start, run your tests using the race detector (go test -race
).
+The race detector only finds races that happen at runtime, so it can't find
+races in code paths that are not executed. If your tests have incomplete coverage,
+you may find more races by running a binary built with -race
under a realistic
+workload.
+
+Here are some typical data races. All of them can be detected with the race detector. +
+ ++func main() { + var wg sync.WaitGroup + wg.Add(5) + for i := 0; i < 5; i++ { + go func() { + fmt.Println(i) // Not the 'i' you are looking for. + wg.Done() + }() + } + wg.Wait() +} ++ +
+The variable i
in the function literal is the same variable used by the loop, so
+the read in the goroutine races with the loop increment. (This program typically
+prints 55555, not 01234.) The program can be fixed by making a copy of the
+variable:
+
+func main() { + var wg sync.WaitGroup + wg.Add(5) + for i := 0; i < 5; i++ { + go func(j int) { + fmt.Println(j) // Good. Read local copy of the loop counter. + wg.Done() + }(i) + } + wg.Wait() +} ++ +
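Another common way to make the copy, sketched here as an alternative to passing the counter as an argument, is to redeclare the variable inside the loop body so that each goroutine captures its own copy. The helper name collect and the mutex-guarded slice are illustrative additions, not part of the original example. In modules that declare go 1.22 or later, each iteration already gets a fresh i, so the redeclaration matters only for older code.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect starts n goroutines and returns the loop counter values
// they observed, sorted in increasing order.
// (Hypothetical helper, used only to make the result checkable.)
func collect(n int) []int {
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex // Guards out.
		out []int
	)
	wg.Add(n)
	for i := 0; i < n; i++ {
		i := i // Per-iteration copy; the goroutine captures this one.
		go func() {
			defer wg.Done()
			mu.Lock()
			out = append(out, i)
			mu.Unlock()
		}()
	}
	wg.Wait()
	sort.Ints(out)
	return out
}

func main() {
	fmt.Println(collect(5)) // prints [0 1 2 3 4]
}
```

Running this sketch with go run -race should produce no report, because no goroutine reads the loop's own counter.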
+// ParallelWrite writes data to file1 and file2, returns the errors. +func ParallelWrite(data []byte) chan error { + res := make(chan error, 2) + f1, err := os.Create("file1") + if err != nil { + res <- err + } else { + go func() { + // This err is shared with the main goroutine, + // so the write races with the write below. + _, err = f1.Write(data) + res <- err + f1.Close() + }() + } + f2, err := os.Create("file2") // The second conflicting write to err. + if err != nil { + res <- err + } else { + go func() { + _, err = f2.Write(data) + res <- err + f2.Close() + }() + } + return res +} ++ +
+The fix is to introduce new variables in the goroutines (note :=
):
+
+ _, err := f1.Write(data) + ... + _, err := f2.Write(data) ++ +
+If the following code is called from several goroutines, it leads to bad races on the service
map.
+Concurrent reads and writes of a map are not safe:
+
+var service map[string]net.Addr + +func RegisterService(name string, addr net.Addr) { + service[name] = addr +} + +func LookupService(name string) net.Addr { + return service[name] +} ++ +
+To make the code safe, protect the accesses with a mutex: +
+ ++var ( + service map[string]net.Addr + serviceMu sync.Mutex +) + +func RegisterService(name string, addr net.Addr) { + serviceMu.Lock() + defer serviceMu.Unlock() + service[name] = addr +} + +func LookupService(name string) net.Addr { + serviceMu.Lock() + defer serviceMu.Unlock() + return service[name] +} ++ +
+Data races can happen on variables of primitive types as well (bool
, int
, int64
), as in the following example:
+
+type Watchdog struct { last int64 } + +func (w *Watchdog) KeepAlive() { + w.last = time.Now().UnixNano() // First conflicting access. +} + +func (w *Watchdog) Start() { + go func() { + for { + time.Sleep(time.Second) + // Second conflicting access. + if w.last < time.Now().Add(-10*time.Second).UnixNano() { + fmt.Println("No keepalives for 10 seconds. Dying.") + os.Exit(1) + } + } + }() +} ++ +
+Even such “innocent” data races can lead to hard-to-debug problems caused by (1) non-atomicity of the memory accesses, (2) interference with compiler optimizations, and (3) processor memory access reordering issues. +
+ +
+A typical fix for this race is to use a channel or a mutex.
+To preserve the lock-free behavior, one can also use the sync/atomic
package.
+
+type Watchdog struct { last int64 } + +func (w *Watchdog) KeepAlive() { + atomic.StoreInt64(&w.last, time.Now().UnixNano()) +} + +func (w *Watchdog) Start() { + go func() { + for { + time.Sleep(time.Second) + if atomic.LoadInt64(&w.last) < time.Now().Add(-10*time.Second).UnixNano() { + fmt.Println("No keepalives for 10 seconds. Dying.") + os.Exit(1) + } + } + }() +} ++ +
+The race detector runs on darwin/amd64
, linux/amd64
, and windows/amd64
.
+
+The cost of race detection varies by program, but for a typical program, memory +usage may increase by 5-10x and execution time by 2-20x. +
diff --git a/doc/docs.html b/doc/docs.html index 9bb012a50ab..256e1b915f8 100644 --- a/doc/docs.html +++ b/doc/docs.html @@ -132,6 +132,7 @@ Guided tours of Go programs.+The implementation now includes a built-in data race detector. +
+