1
0
mirror of https://github.com/golang/go synced 2024-11-23 14:30:02 -07:00
The Go programming language
Go to file
Russ Cox 45be3530a3 index/suffixarray: add 32-bit implementation
The original index/suffixarray used 32-bit ints on 64-bit machines,
because that's what 'int' meant in Go at the time. When we changed
the meaning of int, that doubled the space overhead of suffix arrays
for all uses, even though the vast majority of them describe less
than 2 GB of text.

The space overhead of a suffix array compared to the text is not
insignificant: there's a big difference for many uses between 4X and 8X.

This CL adjusts names in qsufsort.go so that a global search and
replace s/32/64/g produces a working 64-bit implementation,
and then it modifies suffixarray.go to choose between the 32-bit
and 64-bit implementation as appropriate depending on the input size.
The 64-bit implementation is generated by 'go generate'.

This CL also restructures the benchmarks, to test different
input sizes, different input texts, and 32-bit vs 64-bit.

The serialized form uses varint-encoded numbers and is unchanged,
so on-disk suffix arrays written by older versions of Go will be
readable by this version, and vice versa.

The 32-bit version runs a up to 17% faster than the 64-bit version
on real inputs, but more importantly it uses 50% less memory.

I have a followup CL that also implements a faster algorithm
on top of these improvements, but these are a good first step.

name                                  64-bit speed   32-bit speed    delta
New/text=opticks/size=100K/bits=*-12  4.44MB/s ± 0%  4.64MB/s ± 0%   +4.41%  (p=0.008 n=5+5)
New/text=opticks/size=500K/bits=*-12  3.70MB/s ± 1%  3.82MB/s ± 0%   +3.30%  (p=0.008 n=5+5)
New/text=go/size=100K/bits=*-12       4.40MB/s ± 0%  4.61MB/s ± 0%   +4.82%  (p=0.008 n=5+5)
New/text=go/size=500K/bits=*-12       3.66MB/s ± 0%  3.77MB/s ± 0%   +3.01%  (p=0.016 n=4+5)
New/text=go/size=1M/bits=*-12         3.29MB/s ± 0%  3.55MB/s ± 0%   +7.90%  (p=0.016 n=5+4)
New/text=go/size=5M/bits=*-12         2.25MB/s ± 1%  2.65MB/s ± 0%  +17.81%  (p=0.008 n=5+5)
New/text=go/size=10M/bits=*-12        1.82MB/s ± 0%  2.09MB/s ± 1%  +14.36%  (p=0.008 n=5+5)
New/text=go/size=50M/bits=*-12        1.35MB/s ± 0%  1.51MB/s ± 1%  +12.33%  (p=0.008 n=5+5)
New/text=zero/size=100K/bits=*-12     3.42MB/s ± 0%  3.32MB/s ± 0%   -2.74%  (p=0.000 n=5+4)
New/text=zero/size=500K/bits=*-12     3.00MB/s ± 1%  2.97MB/s ± 0%   -1.13%  (p=0.016 n=5+4)
New/text=zero/size=1M/bits=*-12       2.81MB/s ± 0%  2.78MB/s ± 2%     ~     (p=0.167 n=5+5)
New/text=zero/size=5M/bits=*-12       2.46MB/s ± 0%  2.53MB/s ± 0%   +3.18%  (p=0.008 n=5+5)
New/text=zero/size=10M/bits=*-12      2.35MB/s ± 0%  2.42MB/s ± 0%   +2.98%  (p=0.016 n=4+5)
New/text=zero/size=50M/bits=*-12      2.12MB/s ± 0%  2.18MB/s ± 0%   +3.02%  (p=0.008 n=5+5)
New/text=rand/size=100K/bits=*-12     6.98MB/s ± 0%  7.22MB/s ± 0%   +3.38%  (p=0.016 n=4+5)
New/text=rand/size=500K/bits=*-12     5.53MB/s ± 0%  5.64MB/s ± 0%   +1.92%  (p=0.008 n=5+5)
New/text=rand/size=1M/bits=*-12       4.62MB/s ± 1%  5.06MB/s ± 0%   +9.61%  (p=0.008 n=5+5)
New/text=rand/size=5M/bits=*-12       3.09MB/s ± 0%  3.43MB/s ± 0%  +10.94%  (p=0.016 n=4+5)
New/text=rand/size=10M/bits=*-12      2.68MB/s ± 0%  2.95MB/s ± 0%  +10.39%  (p=0.008 n=5+5)
New/text=rand/size=50M/bits=*-12      1.92MB/s ± 0%  2.06MB/s ± 1%   +7.41%  (p=0.008 n=5+5)
SaveRestore/bits=*-12                  243MB/s ± 1%   259MB/s ± 0%   +6.68%  (p=0.000 n=9+10)

name                               64-bit alloc/op  32-bit alloc/op  delta
New/text=opticks/size=100K/bits=*-12    1.62MB ± 0%    0.81MB ± 0%  -50.00%  (p=0.000 n=5+4)
New/text=opticks/size=500K/bits=*-12    8.07MB ± 0%    4.04MB ± 0%  -49.89%  (p=0.008 n=5+5)
New/text=go/size=100K/bits=*-12         1.62MB ± 0%    0.81MB ± 0%  -50.00%  (p=0.008 n=5+5)
New/text=go/size=500K/bits=*-12         8.07MB ± 0%    4.04MB ± 0%  -49.89%  (p=0.029 n=4+4)
New/text=go/size=1M/bits=*-12           16.1MB ± 0%     8.1MB ± 0%  -49.95%  (p=0.008 n=5+5)
New/text=go/size=5M/bits=*-12           80.3MB ± 0%    40.2MB ± 0%     ~     (p=0.079 n=4+5)
New/text=go/size=10M/bits=*-12           160MB ± 0%      80MB ± 0%  -50.00%  (p=0.008 n=5+5)
New/text=go/size=50M/bits=*-12           805MB ± 0%     402MB ± 0%  -50.06%  (p=0.029 n=4+4)
New/text=zero/size=100K/bits=*-12       3.02MB ± 0%    1.46MB ± 0%     ~     (p=0.079 n=4+5)
New/text=zero/size=500K/bits=*-12       19.7MB ± 0%     8.7MB ± 0%  -55.98%  (p=0.008 n=5+5)
New/text=zero/size=1M/bits=*-12         39.0MB ± 0%    19.7MB ± 0%  -49.60%  (p=0.000 n=5+4)
New/text=zero/size=5M/bits=*-12          169MB ± 0%      85MB ± 0%  -49.46%  (p=0.029 n=4+4)
New/text=zero/size=10M/bits=*-12         333MB ± 0%     169MB ± 0%  -49.43%  (p=0.000 n=5+4)
New/text=zero/size=50M/bits=*-12        1.63GB ± 0%    0.74GB ± 0%  -54.61%  (p=0.008 n=5+5)
New/text=rand/size=100K/bits=*-12       1.61MB ± 0%    0.81MB ± 0%  -50.00%  (p=0.000 n=5+4)
New/text=rand/size=500K/bits=*-12       8.07MB ± 0%    4.04MB ± 0%  -49.89%  (p=0.000 n=5+4)
New/text=rand/size=1M/bits=*-12         16.1MB ± 0%     8.1MB ± 0%  -49.95%  (p=0.029 n=4+4)
New/text=rand/size=5M/bits=*-12         80.7MB ± 0%    40.3MB ± 0%  -50.06%  (p=0.008 n=5+5)
New/text=rand/size=10M/bits=*-12         161MB ± 0%      81MB ± 0%  -50.03%  (p=0.008 n=5+5)
New/text=rand/size=50M/bits=*-12         806MB ± 0%     403MB ± 0%  -50.00%  (p=0.016 n=4+5)
SaveRestore/bits=*-12                   9.47MB ± 0%    5.28MB ± 0%  -44.29%  (p=0.000 n=9+8)

https://perf.golang.org/search?q=upload:20190126.1+|+bits:64+vs+bits:32

Fixes #6816.

Change-Id: Ied2fbea519a202ecc43719debcd233344ce38847
Reviewed-on: https://go-review.googlesource.com/c/go/+/174097
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-05-01 15:19:07 +00:00
.github .github: don't render author-facing text in ISSUE_TEMPLATE 2018-11-02 04:47:34 +00:00
api api: add recent Go 1.13 work 2019-04-26 05:16:33 +00:00
doc runtime: support all as parameter in gdb goroutine commands. 2019-04-29 18:11:11 +00:00
lib/time time: read 64-bit data if available 2019-02-26 23:10:35 +00:00
misc all: add new GOOS=illumos, split out of GOOS=solaris 2019-04-30 16:59:13 +00:00
src index/suffixarray: add 32-bit implementation 2019-05-01 15:19:07 +00:00
test test: enable more memcombine tests for ppc64le 2019-05-01 14:27:27 +00:00
.gitattributes
.gitignore .gitignore: ignore src/cmd/dist/dist 2017-10-28 21:55:49 +00:00
AUTHORS A: Add Maya Rashish (individual CLA) 2019-04-23 14:40:30 +00:00
CONTRIBUTING.md all: restore changes from faulty merge/revert 2018-02-12 20:13:59 +00:00
CONTRIBUTORS CONTRIBUTORS: second round of updates for Go 1.12 2019-01-30 18:01:19 +00:00
favicon.ico website: recreate 16px and 32px favicon 2016-08-25 15:43:32 +00:00
LICENSE doc: revert copyright date to 2009 2016-06-01 22:40:04 +00:00
PATENTS
README.md README: linkify some paths 2018-06-06 18:07:01 +00:00
robots.txt

The Go Programming Language

Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.

Gopher image Gopher image by Renee French, licensed under Creative Commons 3.0 Attributions license.

Our canonical Git repository is located at https://go.googlesource.com/go. There is a mirror of the repository at https://github.com/golang/go.

Unless otherwise noted, the Go source files are distributed under the BSD-style license found in the LICENSE file.

Download and Install

Binary Distributions

Official binary distributions are available at https://golang.org/dl/.

After downloading a binary release, visit https://golang.org/doc/install or load doc/install.html in your web browser for installation instructions.

Install From Source

If a binary distribution is not available for your combination of operating system and architecture, visit https://golang.org/doc/install/source or load doc/install-source.html in your web browser for source installation instructions.

Contributing

Go is the work of thousands of contributors. We appreciate your help!

To contribute, please read the contribution guidelines: https://golang.org/doc/contribute.html

Note that the Go project uses the issue tracker for bug reports and proposals only. See https://golang.org/wiki/Questions for a list of places to ask questions about the Go language.