qbit/go

mirror of https://github.com/golang/go synced 2024-10-05 16:01:22 -06:00

Author	SHA1	Message	Date
Marcel van Lohuizen	cfcc3ebfa4	exp/norm: changed API of Iter. Motivations: - Simpler UI. Previous API proved a bit awkward for practical purposes. - Iter is often used in cases where one wants to be able to bail out early. The old implementation had too much look-ahead to be efficient. Disadvantages: - ASCII performance is bad. This is unavoidable for tiny iterations. Example is included to show how to work around this. Description: Iter now iterates per boundary/segment. It returns a slice of bytes that either points to the input bytes, the internal decomposition strings, or the small internal buffer that each iterator has. In many cases, copying bytes is avoided. The method Seek was added to support jumping around the input without having to reinitialize. Details: - Table adjustments: some decompositions consist of multiple segments. Decompositions that are of this type are now marked so that Iter can handle them separately. - The old iterator had a different next function for different normal forms that was assigned to a function pointer called by Next. The new iterator uses this mechanism to switch between different modes for handling different types of input as well. This greatly improves performance for Hangul and ASCII. It is also used for multi-segment decompositions. - input is now a struct of string and []byte, instead of an interface. This simplifies optimizing the ASCII case. R=rsc CC=golang-dev https://golang.org/cl/6873072	2012-12-24 16:53:25 +01:00
Shenghou Ma	d1ef9b56fb	all: fix typos caught by https://github.com/lyda/misspell-check. R=golang-dev, gri CC=golang-dev https://golang.org/cl/6949072	2012-12-19 03:04:09 +08:00
Marcel van Lohuizen	ecd24f381e	exp/norm: Added Iter type for iterating on segment boundaries. This type is mainly to be used by other low-level libraries, like collate. Extra care has been given to optimize the performance of normalizing to NFD, as this is what will be used by the collator. The overhead of checking whether a string is normalized vs simply decomposing a string is negligible. Assuming that most strings are in the FCD form, this iterator can be used to decompose strings and normalize with minimal overhead. R=r CC=golang-dev https://golang.org/cl/5676057	2012-02-21 13:13:21 +01:00
Marcel van Lohuizen	d673c95d6c	exp/norm: Added some benchmarks for form-specific performance measurements. R=r CC=golang-dev https://golang.org/cl/5605051	2012-02-02 13:19:12 +01:00
Marcel van Lohuizen	cadbd3ea49	exp/norm: fixed two unrelated bugs in normalization library. 1) incorrect length given for out buffer in String. 2) patchTail bug that could cause characters to be lost when crossing into the out-buffer boundary. Added tests to expose these bugs. Also slightly improved performance of Bytes() and String() by sharing the reorderBuffer across operations. Fixes #2567. R=r CC=golang-dev https://golang.org/cl/5502069	2011-12-23 18:21:26 +01:00
Russ Cox	c945f77f41	exp/norm: use rune. Nothing terribly interesting here. (!) Since the public APIs are all in terms of UTF-8, the changes are all internal only. R=mpvl, gri, r CC=golang-dev https://golang.org/cl/5309042	2011-10-25 22:26:12 -07:00
Marcel van Lohuizen	5844fc1b21	exp/norm: introduced input interface to implement string versions of methods. R=r, mpvl CC=golang-dev https://golang.org/cl/5166045	2011-10-05 10:44:11 -07:00
Robert Griesemer	9c643bb3fa	exp/norm: fix benchmark bug - don't use range over string to copy string bytes - some code simplification R=mpvl CC=golang-dev https://golang.org/cl/5144044	2011-09-26 18:23:21 -07:00
Marcel van Lohuizen	d5e24b6975	exp/norm: performance improvements of quickSpan - fixed performance bug that could lead to O(n^2) behavior - performance improvement for ASCII case R=r, r CC=golang-dev https://golang.org/cl/4956060	2011-09-05 19:09:20 +02:00
Marcel van Lohuizen	2517143957	exp/norm: added Reader and Writer and bug fixes to support these. Needed to ensure that finding the last boundary does not result in O(n^2)-like behavior. Now prevents lookbacks beyond 31 characters across the board (starter + 30 non-starters). composition.go: - maxCombiningCharacters now means exactly that. - Bug fix. - Small performance improvement/ made code consistent with other code. forminfo.go: - Bug fix: ccc needs to be 0 for inert runes. normalize.go: - A few bug fixes. - Limit the amount of combining characters considered in FirstBoundary. - Ditto for LastBoundary. - Changed semantics of LastBoundary to not consider trailing illegal runes a boundary as long as adding bytes might still make them legal. trie.go: - As utf8.UTFMax is 4, we should treat UTF-8 encodings of size 5 or greater as illegal. This has no impact on the normalization process, but it prevents buffer overflows where we expect at most UTFMax bytes. R=r CC=golang-dev https://golang.org/cl/4963041	2011-09-02 12:39:35 +02:00
Marcel van Lohuizen	d9c9c48797	exp/norm: added implementation for []byte versions of methods. R=r CC=golang-dev https://golang.org/cl/4925041	2011-08-22 12:52:04 +02:00

11 Commits