1
0
mirror of https://github.com/golang/go synced 2024-10-05 08:21:22 -06:00
Commit Graph

56 Commits

Author SHA1 Message Date
Marcel van Lohuizen
cfcc3ebfa4 exp/norm: changed API of Iter.
Motivations:
 - Simpler UI. Previous API proved a bit awkward for practical purposes.
 - Iter is often used in cases where one want to be able to bail out early.
   The old implementaton had too much look-ahead to be efficient.
Disadvantages:
 - ASCII performance is bad. This is unavoidable for tiny iterations.
   Example is included to show how to work around this.

Description:
Iter now iterates per boundary/segment. It returns a slice of bytes that
either points to the input bytes, the internal decomposition strings,
or the small internal buffer that each iterator has. In many cases, copying
bytes is avoided.
The method Seek was added to support jumping around the input without
having to reinitialize.

Details:
 - Table adjustments: some decompositions exist of multiple segments.
   Decompositions that are of this type are now marked so that Iter can
   handle them separately.
 - The old iterator had a different next function for different normal forms
   that was assigned to a function pointer called by Next.
   The new iterator uses this mechanism to switch between different modes
   for handling different type of input as well.  This greatly improves
   performance for Hangul and ASCII. It is also used for multi-segment
   decompositions.
 - input is now a struct of sting and []byte, instead of an interface.
   This simplifies optimizing the ASCII case.

R=rsc
CC=golang-dev
https://golang.org/cl/6873072
2012-12-24 16:53:25 +01:00
Shenghou Ma
d1ef9b56fb all: fix typos
caught by https://github.com/lyda/misspell-check.

R=golang-dev, gri
CC=golang-dev
https://golang.org/cl/6949072
2012-12-19 03:04:09 +08:00
Marcel van Lohuizen
e14cf90a8b unicode: move unicode and related packages to Unicode 6.2.0.
R=r, mpvl
CC=golang-dev
https://golang.org/cl/6818067
2012-10-31 17:32:16 +01:00
Robert Griesemer
465b9c35e5 gofmt: apply gofmt -w src misc
Remove trailing whitespace in comments.
No other changes.

R=r
CC=golang-dev
https://golang.org/cl/6815053
2012-10-30 13:38:01 -07:00
Marcel van Lohuizen
18aded7ab9 exp/norm: It is important that the unicode versions of the various packages align.
Replace hard-coded version strings with unicode.Version.

R=r, r
CC=golang-dev
https://golang.org/cl/6163045
2012-05-07 11:41:40 +02:00
Marcel van Lohuizen
10838165d8 exp/locale/collate: fixed two bugs uncovered by regression tests.
The first bug was that tertiary ignorables had the same colElem as
implicit colElems, yielding unexpected results. The current encoding
ensures that a non-implicit colElem is never 0.  This fix uncovered
another bug of the trie that indexed incorrectly into the null block.
This was caused by an unfinished optimization that would avoid the
need to max out the most-significant bits of continuation bytes.
This bug was also present in the trie used in exp/norm and has been
fixed there as well. The appearence of the bug was rare, as the lower
blocks happened to be nearly nil.

R=r
CC=golang-dev
https://golang.org/cl/6127070
2012-05-02 17:01:41 +02:00
Rob Pike
459837c86e all: fix errors found by go vet
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/6125044
2012-04-25 11:33:27 +10:00
Marcel van Lohuizen
98aa4968b7 exp/norm: exposed runeInfo type in API.
For completeness, we also expose the Canonical Combining Class of a rune.
This does not increase the data size.

R=r
CC=golang-dev
https://golang.org/cl/5931043
2012-04-11 16:47:53 +02:00
Robert Griesemer
f5f80368c4 exp/norm/normalize.go: fix typo
R=golang-dev, r, dsymonds
CC=golang-dev
https://golang.org/cl/5874045
2012-03-21 14:55:05 -07:00
Robert Griesemer
56cae1c230 all: gofmt -w -s src misc
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/5781058
2012-03-08 10:48:51 -08:00
Robert Griesemer
7c6654aa70 all: fixed various typos
(Semi-automatically detected.)

R=golang-dev, remyoudompheng, r
CC=golang-dev
https://golang.org/cl/5715052
2012-03-01 14:56:05 -08:00
Marcel van Lohuizen
ecd24f381e exp/norm: Added Iter type for iterating on segment boundaries. This type is mainly to be used
by other low-level libraries, like collate.  Extra care has been given to optimize the performance
of normalizing to NFD, as this is what will be used by the collator.  The overhead of checking
whether a string is normalized vs simply decomposing a string is neglible.  Assuming that most
strings are in the FCD form, this iterator can be used to decompose strings and normalize with
minimal overhead.

R=r
CC=golang-dev
https://golang.org/cl/5676057
2012-02-21 13:13:21 +01:00
Russ Cox
9f333170bf cmd/go: a raft of fixes
* add -work option to save temporary files (Fixes issue 2980)
* fix go test -i to work with cgo packages (Fixes issue 2936)
* do not overwrite/remove empty directories or non-object
  files during build (Fixes issue 2829)
* remove package main vs package non-main heuristic:
  a directory must contain only one package (Fixes issue 2864)
* to make last item workable, ignore +build tags for files
  named on command line: go build x.go builds x.go even
  if it says // +build ignore.
* add // +build ignore tags to helper programs

R=golang-dev, r, r
CC=golang-dev
https://golang.org/cl/5674043
2012-02-14 16:39:20 -05:00
Shenghou Ma
c11361e253 exp/norm: fix typo
R=golang-dev, gri
CC=golang-dev
https://golang.org/cl/5649086
2012-02-13 11:50:06 -08:00
Marcel van Lohuizen
a52fb458df exp/norm: merged charinfo and decomposition tables. As a result only
one trie lookup per rune is needed. See forminfo.go for a description
of the new format.  Also included leading and trailing canonical
combining class in decomposition information.  This will often avoid
additional trie lookups.

R=r, r
CC=golang-dev
https://golang.org/cl/5616071
2012-02-13 14:54:46 +01:00
Marcel van Lohuizen
8ba20dbdb5 exp/norm: a few minor changes in prepration for a table format change:
- Unified bounary conditions for NFC and NFD and removed some indirections.
   This enforces boundaries at the character level, which is typically what
   the user expects. (NFD allows a boundary between 'a' and '`', for example,
   which may give unexpected results for collation.  The current implementation
   is already stricter than the standard, so nothing much changes.  This change
   just formalizes it.
 - Moved methods of qcflags to runeInfo.
 - Swapped YesC and YesMaybe bits in qcFlags. This is to aid future changes.
 - runeInfo return values use named fields in preperation for struct change.
 - Replaced some left-over uint32s with rune.

R=r
CC=golang-dev
https://golang.org/cl/5607050
2012-02-02 13:55:53 +01:00
Marcel van Lohuizen
d673c95d6c exp/norm: Added some benchmarks for form-specific performance measurements.
R=r
CC=golang-dev
https://golang.org/cl/5605051
2012-02-02 13:19:12 +01:00
Robert Griesemer
f3f5239d1e all packages: fix various typos
Detected semi-automatically. There are probably more.

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/5620046
2012-02-01 16:19:36 -08:00
Russ Cox
2050a9e478 build: remove Make.pkg, Make.tool
Consequently, remove many package Makefiles,
and shorten the few that remain.

gomake becomes 'go tool make'.

Turn off test phases of run.bash that do not work,
flagged with $BROKEN.  Future CLs will restore these,
but this seemed like a big enough CL already.

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/5601057
2012-01-30 23:43:46 -05:00
Marcel van Lohuizen
110964ac81 exp/norm: fixes a subtle bug introduced by change 10087: random offset
for map iteration.  New code makes table output predictable and fixes
bug.

R=rsc, r
CC=golang-dev
https://golang.org/cl/5573044
2012-01-23 19:36:52 +01:00
Olivier Duperray
e5c1f3870b pkg: Add & fix Copyright of "hand generated" files
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/5554064
2012-01-19 10:14:56 -08:00
Marcel van Lohuizen
cadbd3ea49 exp/norm: fixed two unrelated bugs in normalization library.
1) incorrect length given for out buffer in String.
2) patchTail bug that could cause characters to be lost
   when crossing into the out-buffer boundary.

Added tests to expose these bugs.  Also slightly improved
performance of Bytes() and String() by sharing the reorderBuffer
across operations.

Fixes #2567.

R=r
CC=golang-dev
https://golang.org/cl/5502069
2011-12-23 18:21:26 +01:00
Maxim Pimenov
bf6dd2db04 various: use $GCFLAGS and $GCIMPORTS like Make does
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/5489065
2011-12-16 11:31:39 -05:00
David Symonds
3dbecd592b various: a grab-bag of time.Duration cleanups.
R=adg, r, rsc
CC=golang-dev
https://golang.org/cl/5475069
2011-12-13 10:42:56 +11:00
Joel Sing
7e797be7a3 exp/norm: fix rune/int types in test
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/5472067
2011-12-11 09:25:09 -08:00
Russ Cox
2666b815a3 use new strconv API
All but 3 cases (in gcimporter.go and hixie.go)
are automatic conversions using gofix.

No attempt is made to use the new Append functions
even though there are definitely opportunities.

R=golang-dev, gri
CC=golang-dev
https://golang.org/cl/5447069
2011-12-05 15:48:46 -05:00
Rob Pike
30aa701fec renaming_2: gofix -r go1pkgrename src/pkg/[a-l]*
R=rsc
CC=golang-dev
https://golang.org/cl/5358041
2011-11-08 15:40:58 -08:00
Russ Cox
965845a86d all: sort imports
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/5319072
2011-11-02 15:54:16 -04:00
Russ Cox
c2049d2dfe src/pkg/[a-m]*: gofix -r error -force=error
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/5322051
2011-11-01 22:04:37 -04:00
Marcel van Lohuizen
eef7809193 exp/norm: fixed bug that creeped in with moving to the new
regexp, which caused the last line of a test block to be ignored.

R=r, rsc
CC=golang-dev
https://golang.org/cl/5177052
2011-10-31 10:58:04 +01:00
Russ Cox
c945f77f41 exp/norm: use rune
Nothing terribly interesting here. (!)

Since the public APIs are all in terms of UTF-8,
the changes are all internal only.

R=mpvl, gri, r
CC=golang-dev
https://golang.org/cl/5309042
2011-10-25 22:26:12 -07:00
Christopher Wedgwood
707e5acd71 updates: append(y,[]byte(z)...) -> append(y,z...)"
(more are possible but omitted for now as they are part of
specific tests where rather than changing what is there we
should probably expand the tests to cover the new case)

R=rsc, dvyukov
CC=golang-dev
https://golang.org/cl/5247058
2011-10-12 13:42:04 -07:00
Marcel van Lohuizen
9a8da9d499 exp/norm: LastBoundary is used in preparation for an append operation. It seems
therefore unlikely that there is a good use for its string version
LastBoundaryInString. Yet, the implemenation of this method would complicate
things a bit as it would require the introduction for another interface and
some duplication of code. Removing it seems a better choice.

R=r
CC=golang-dev
https://golang.org/cl/5182044
2011-10-05 14:36:02 -07:00
Marcel van Lohuizen
5844fc1b21 exp/norm: introduced input interface to implement string versions
of methods.

R=r, mpvl
CC=golang-dev
https://golang.org/cl/5166045
2011-10-05 10:44:11 -07:00
Robert Griesemer
9c643bb3fa exp/norm: fix benchmark bug
- don't use range over string to copy string bytes
- some code simplification

R=mpvl
CC=golang-dev
https://golang.org/cl/5144044
2011-09-26 18:23:21 -07:00
Russ Cox
6c230fbc67 regexp: move to old/regexp, replace with exp/regexp
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/5127042
2011-09-26 18:33:13 -04:00
Marcel van Lohuizen
46468357a2 exp/norm: Adopt regexp to exp/regexp semantics.
R=rsc
CC=golang-dev
https://golang.org/cl/5046041
2011-09-19 17:30:19 +02:00
Marcel van Lohuizen
a083fd524a exp/norm: reverting to using strings.Repeat, as it doesn't look like exp/regexp
is going to support returning multiple matches for a single repeated group.

R=r, rsc, mpvl
CC=golang-dev
https://golang.org/cl/5014045
2011-09-16 11:28:53 +02:00
Marcel van Lohuizen
1913fdab98 exp/norm: changed trie to produce smaller tables.
Trie now uses sparse block when this makes sense.

R=r, r
CC=golang-dev
https://golang.org/cl/5010043
2011-09-16 11:27:05 +02:00
Marcel van Lohuizen
efea5d0fb9 exp/norm: Added regression test tool for the standard Unicode test set.
R=r
CC=golang-dev
https://golang.org/cl/4973064
2011-09-13 12:51:48 +02:00
Marcel van Lohuizen
3e42de29c9 exp/norm: fixed typo. Bug exposed by gomake testtables. Changes did not affect other tests
as this part of Hangul is handled algorithmically.

R=r
CC=golang-dev
https://golang.org/cl/4951074
2011-09-12 10:21:35 +02:00
Marcel van Lohuizen
d5e24b6975 exp/norm: performance improvements of quickSpan
- fixed performance bug that could lead to O(n^2) behavior
- performance improvement for ASCII case

R=r, r
CC=golang-dev
https://golang.org/cl/4956060
2011-09-05 19:09:20 +02:00
Marcel van Lohuizen
2517143957 exp/norm: added Reader and Writer and bug fixes to support these.
Needed to ensure that finding the last boundary does not result in O(n^2)-like behavior.
Now prevents lookbacks beyond 31 characters across the board (starter + 30 non-starters).
composition.go:
- maxCombiningCharacters now means exactly that.
- Bug fix.
- Small performance improvement/ made code consistent with other code.
forminfo.go:
- Bug fix: ccc needs to be 0 for inert runes.
normalize.go:
- A few bug fixes.
- Limit the amount of combining characters considered in FirstBoundary.
- Ditto for LastBoundary.
- Changed semantics of LastBoundary to not consider trailing illegal runes a boundary
  as long as adding bytes might still make them legal.
trie.go:
- As utf8.UTFMax is 4, we should treat UTF-8 encodings of size 5 or greater as illegal.
  This has no impact on the normalization process, but it prevents buffer overflows
  where we expect at most UTFMax bytes.

R=r
CC=golang-dev
https://golang.org/cl/4963041
2011-09-02 12:39:35 +02:00
Marcel van Lohuizen
4a4fa38d0e exp/norm: Reduced the size of the byte buffer used by reorderBuffer by half by reusing space when combining.
R=r
CC=golang-dev
https://golang.org/cl/4939042
2011-08-24 11:05:45 +02:00
Marcel van Lohuizen
d9c9c48797 exp/norm: added implemenation for []byte versions of methods.
R=r
CC=golang-dev
https://golang.org/cl/4925041
2011-08-22 12:52:04 +02:00
Marcel van Lohuizen
45b7084b92 exp/norm: a few minor fixes to support the implementation of norm.
maketables.go/tables.go
- Properly set combinesForward flag for JamoL and JamoV.
- Fixed Printf bug.
composition.go
- Make insertString use the same control flow as insert.
- Better Hangul and non-Hangul mixing.
forminfo.go
- Fixed bug in compBoundaryBefore that affected a few esoteric cases.
- Buffer overflow now tested in normalize_test.go (other CL).

R=r
CC=golang-dev
https://golang.org/cl/4924041
2011-08-22 12:11:29 +02:00
Marcel van Lohuizen
b40bd5efb7 exp/norm: implementation of decomposition and composing functionality.
forminfo.go:
- Wrappers for table data.
- Per Form dispatch table.
composition.go:
- reorderBuffer type.  Implements decomposition, reordering, and composition.
- Note: decompose and decomposeString fields in formInfo could be replaced by
  a pointer to the trie for the respective form.  The proposed design makes
  testing easier, though.
normalization.go:
- Temporarily added panic("not implemented") methods to make the tests run.
  These will be removed again with the next CL, which will introduce the
  implementation.

R=r, rogpeppe, mpvl, rsc
CC=golang-dev
https://golang.org/cl/4875043
2011-08-17 18:12:39 +10:00
Robert Hencke
8a439334ad exp/norm: fix incorrect prints found by govet.
R=golang-dev, dsymonds
CC=golang-dev
https://golang.org/cl/4895042
2011-08-14 14:02:48 +10:00
Marcel van Lohuizen
4abbdc0399 exp/norm: generate trie struct in triegen.go for better encapsulation.
R=r, r
CC=golang-dev
https://golang.org/cl/4837071
2011-08-12 18:00:31 +02:00
Marcel van Lohuizen
58a92bd1ef exp/norm: generate trie struct in triegen.go for better encapsulation.
R=r, r
CC=golang-dev
https://golang.org/cl/4837071
2011-08-12 17:44:14 +02:00