1
0
mirror of https://github.com/golang/go synced 2024-10-05 15:51:22 -06:00
Commit Graph

10 Commits

Author SHA1 Message Date
Marcel van Lohuizen
9aa70984a9 exp/locale/collate: include composed characters into the table. This eliminates
the need to decompose characters for the majority of cases.  This considerably
speeds up collation while increasing the table size minimally.

To detect non-normalized strings, rather than relying on exp/norm, the table
now includes CCC information. The inclusion of this information does not
increase table size.

DETAILS
 - Raw collation elements are now a struct that includes the CCC, rather
   than a slice of ints.
 - Builder now ensures that NFD and NFC counterparts are included in the table.
   This also fixes a bug for Korean which is responsible for most of the growth
   of the table size.
 - As there is no more normalization step, code should now handle both strings
   and byte slices as input. Introduced source type to facilitate this.

NOTES
 - This change does not handle normalization correctly entirely for contractions.
   This causes a few failures with the regtest. table_test.go contains a few
   uncommented tests that can be enabled once this is fixed.  The easiest is to
   fix this once we have the new norm.Iter.
 - Removed a test cases in table_test that covers cases that are now guaranteed
   to not exist.

R=rsc, mpvl
CC=golang-dev
https://golang.org/cl/6971044
2012-12-24 16:42:29 +01:00
Mikio Hara
78cee46f3a src: gofmt -w -s
R=golang-dev, dsymonds
CC=golang-dev
https://golang.org/cl/6935059
2012-12-15 14:19:51 +09:00
Marcel van Lohuizen
4c1a6f84f8 exp/locale/collate: removed weights struct to allow for faster and easier
incremental comparisons. Instead, processing is now done directly on colElems.
As a result, the size of the weights array is now reduced by 75%.
Details:
- Primary value of type 1 colElem is shifted by 1 bit so that primaries
  of all types can be compared without shifting.
- Quaternary values are now stored in the colElem itself. This is possible
  as quaternary values other than 0 or maxQuaternary are only needed when other
  values are ignored.
- Simplified processWeights by removing cases that are needed for ICU but not
  for us (our CJK primary values fit in a single value).

R=r
CC=golang-dev
https://golang.org/cl/6817054
2012-10-31 14:28:18 +01:00
Marcel van Lohuizen
b575e3ca99 exp/locale/collate: slightly changed collation elements:
- Allow secondary values below the default value in second form. This is
  to support before tags for secondary values, as used by Chinese.
- Eliminate collation elements that are guaranteed to be immaterial
  after a weight increment.

R=r
CC=golang-dev
https://golang.org/cl/6739051
2012-10-25 13:02:31 +02:00
Marcel van Lohuizen
882b6ef454 exp/locale/collate: This CL includes the following changes:
- Changed the representation of colElem to support a few cases
  for some languages not supported by the current format.
- Changed offsets for implicit primary values. This makes the
  values both easier to read and debug (last 4 nibbles are identical to
  implicit primary value) and also results in better packing.
- Fixed bug in weight conversion code that did not pop up yet by
  sheer luck.
Note that tables.go also includes changes to the contraction trie
from CL 6346092.

R=r, mpvl
CC=golang-dev
https://golang.org/cl/6392060
2012-07-13 11:38:22 +02:00
Russ Cox
ce69666273 exp/locale/collate: avoid 16-bit math
There's no need for the 16-bit arithmetic here,
and it tickles a long-standing compiler bug.
Fix the exp code not to use 16-bit math and
create an explicit test for the compiler bug.

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/6256048
2012-05-24 14:50:36 -04:00
Marcel van Lohuizen
ec099b2bc7 exp/locale/collate: implementation of main collation functionality for
key and simple comparisson. Search is not yet implemented in this CL.
Changed some of the types of table_test.go to allow reuse in the new test.
Also reduced number of primary values for illegal runes to 1 (both map to
the same).

R=r
CC=golang-dev
https://golang.org/cl/6202062
2012-05-17 19:48:56 +02:00
Marcel van Lohuizen
56a76c88f8 exp/locale/collate: from the regression test we derive that the spec
dictates a CJK rune is only part of a certain specified range if it
is explicitly defined in the Unicode Codepoint Database.
Fixed the code and some of the tests accordingly.

R=r
CC=golang-dev
https://golang.org/cl/6160044
2012-05-07 11:51:40 +02:00
Marcel van Lohuizen
10838165d8 exp/locale/collate: fixed two bugs uncovered by regression tests.
The first bug was that tertiary ignorables had the same colElem as
implicit colElems, yielding unexpected results. The current encoding
ensures that a non-implicit colElem is never 0.  This fix uncovered
another bug of the trie that indexed incorrectly into the null block.
This was caused by an unfinished optimization that would avoid the
need to max out the most-significant bits of continuation bytes.
This bug was also present in the trie used in exp/norm and has been
fixed there as well. The appearence of the bug was rare, as the lower
blocks happened to be nearly nil.

R=r
CC=golang-dev
https://golang.org/cl/6127070
2012-05-02 17:01:41 +02:00
Marcel van Lohuizen
bb3f3c9775 exp/locale/collate: added representation for collation elements
(see http://www.unicode.org/reports/tr10/).

R=r, r
CC=golang-dev
https://golang.org/cl/5981048
2012-04-25 13:16:24 +02:00