Time to draw.Draw a 200x200 image fell from 18.4ms (and 1 malloc) to
5.6ms (and 0 mallocs). It's still relatively slow since it assumes
nothing about the src or mask images, but it does remove the malloc.
There are existing faster, more specialized paths for copies, fills
and image glyph masks.
Also added a "compare to a slow but obviously correct implementation"
check to draw_test.go.
R=rsc, r
CC=golang-dev
https://golang.org/cl/1223044