1
0
mirror of https://github.com/golang/go synced 2024-10-04 08:21:22 -06:00
Commit Graph

649 Commits

Author SHA1 Message Date
Marcel van Lohuizen
5a78e5ea4c exp/locale/collate: Added functionality to parse and process LDML files
for both locale-specific exemplar characters and tailorings to
the collation table.
Some specifices:
- Moved stringSet to the beginning of the file and added some functionality
  to parse command line files.
- openReader now modifies the input URL for localFiles to guarantee that
  any http source listed in the generated file is indeed this source.
- Note that the implementation of the Tailoring API used by maketables.go
  is not yet checked in. So for now adding tailorings are simply no-ops.
- The generated file of exemplar characters will be used somewhere else.
  Here is a snippet of how the body of the generated file looks like:

type exemplarType int
const (
        exCharacters exemplarType = iota
        exContractions
        exPunctuation
        exAuxiliary
        exCurrency
        exIndex
        exN
)

var exemplarCharacters = map[string][exN]string{
        "af": [exN]string{
                0: "a á â b c d e é è ê ë f g h i î ï j k l m n o ô ö p q r s t u û v w x y z",
                3: "á à â ä ã æ ç é è ê ë í ì î ï ó ò ô ö ú ù û ü ý",
                4: "a b c d e f g h i j k l m n o p q r s t u v w x y z",
        },
        ...
}

R=r
CC=golang-dev
https://golang.org/cl/6501070
2012-09-01 14:15:00 +02:00
Marcel van Lohuizen
18aa55c169 exp/locale/collate: first changes that introduce implementation of tailorings:
- Elements in the array are now sorted as a linked list.  This makes it easier to
  apply tailorings.
- Added code to sort entries by collation elements.
- Added logical reset points.  This is used for tailoring relative to certain
  properties, rather than characters.

NOTE: all code for type entry should now be in order.go.  To keep the diffs for
this CL reasonable, though, the existing code is left in builder.go.  I'll move
this in a separate CL.

R=r
CC=golang-dev
https://golang.org/cl/6493063
2012-09-01 14:13:37 +02:00
Nigel Tao
13cf2473b8 exp/html: change a node's children from a slice to a linked list.
Also rename Node.{Add,Remove} to Node.{AppendChild,RemoveChild} to
be consistent with the DOM.

benchmark                      old ns/op    new ns/op    delta
BenchmarkParser                  4042040      3749618   -7.23%

benchmark                       old MB/s     new MB/s  speedup
BenchmarkParser                    19.34        20.85    1.08x

BenchmarkParser mallocs per iteration is also:
10495 before / 7992 after

R=andybalholm, r, adg
CC=golang-dev
https://golang.org/cl/6495061
2012-08-31 10:00:12 +10:00
Marcel van Lohuizen
c61a185f35 exp/locale/collate: add code to ignore tests with (unpaired) surrogates.
In the regtest data, surrogates are assigned primary weights based on
the surrogate code point value.  Go now converts surrogates to FFFD, however,
meaning that the primary weight is based on this code point instead.
This change drops tests with surrogates and lets the tests pass.

R=r
CC=golang-dev
https://golang.org/cl/6461100
2012-08-24 15:56:07 +02:00
Nigel Tao
8eb05b3843 exp/html: remove unused forTag function.
R=adg
CC=golang-dev
https://golang.org/cl/6480051
2012-08-24 14:15:55 +10:00
Nigel Tao
fa0e9cd279 exp/html: refactor the parser.read method.
R=andybalholm
CC=golang-dev
https://golang.org/cl/6463070
2012-08-21 20:59:02 +10:00
Marcel van Lohuizen
a8357f0160 exp/locale/collate/build: fixed bug that was exposed by experimenting
with table changes.
NOTE: there is no test for this, but 1) the code has now the same
control flow as scan in exp/locale/collate/contract.go, which is
tested and 2) Builder verifies the generated table so bugs in this
code are quickly and easily found (which is how this bug was discovered).

R=r
CC=golang-dev
https://golang.org/cl/6461082
2012-08-20 10:56:41 +02:00
Marcel van Lohuizen
98883c811a exp/locale/collate: let regtest generate its own collation table.
The main table will need to get a slightly different collation table as the one
used by regtest, as the regtest is based on the standard UCA DUCET, while
the locale-specific tables are all based on a CLDR root table.
This change allows changing the table without affecting the regression test.

R=r
CC=golang-dev
https://golang.org/cl/6453089
2012-08-20 10:56:19 +02:00
Marcel van Lohuizen
2845e5881f exp/locale/collate: changed default AlternateHandling to non-ignorable, the same
default as ICU.

R=r
CC=golang-dev
https://golang.org/cl/6445080
2012-08-20 10:56:06 +02:00
Marcel van Lohuizen
6918357031 exp/locale/collate: Added test flag to maketables tool for comparing newly
against previously generated tables.

R=r
CC=golang-dev
https://golang.org/cl/6441098
2012-08-20 10:55:40 +02:00
Nigel Tao
2b14a48d54 exp/html: make the parser manipulate the tokenizer via exported methods
instead of touching the tokenizer's internal state.

R=andybalholm
CC=golang-dev
https://golang.org/cl/6446153
2012-08-20 11:04:36 +10:00
Andrew Balholm
d624f0c922 exp/html: simplify testing code
Now that the parser passes all tests in the test suite,
it is no longer necessary to keep track of which tests
pass and which don't. So remove the testlogs directory
and the code that uses it.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6453124
2012-08-16 09:31:22 +10:00
Andrew Balholm
27cb1cbb2e exp/html: skip render and reparse on more tests that build badly-formed parse trees
All of the remaining tests that had as status of PARSE rather than PASS had
good reasons for not passing the render-and-reparse step: the correct parse tree is
badly formed, so when it is rendered out as HTML, the result doesn't parse into the
same tree. So add them to the list of tests where that step is skipped.

Also, I discovered that it is possible to end up with HTML elements (not just text)
inside a raw text element through reparenting. So change the rendering routines to
handle that situation as sensibly as possible (which still isn't very sensible, but
this is HTML5).

R=nigeltao
CC=golang-dev
https://golang.org/cl/6446137
2012-08-15 11:44:25 +10:00
Adam Langley
7fa3b9f7ea exp/proxy: remove package.
This package has moved to go.net.

R=golang-dev, minux.ma, r, dave
CC=golang-dev
https://golang.org/cl/6461056
2012-08-14 13:04:14 -04:00
Andrew Balholm
3ba25e76a7 exp/html: generate replacement for <isindex> correctly
When generating replacement elements for an <isindex> tag, the old
addSyntheticElement method was producing the wrong nesting. Replace
it with parseImpliedToken.

Pass the one remaining test in the test suite.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6453114
2012-08-14 09:53:10 +10:00
Andrew Balholm
aa9a81b1b0 exp/html: discard tags that are terminated by EOF instead of by '>'
If a tag doesn't have a closing '>', it isn't considered a tag;
it is just ignored and EOF is returned instead.

Pass one additional test in the test suite.

Change tokenizer tests to match correct behavior.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6454131
2012-08-13 12:07:44 +10:00
Andrew Balholm
c5038c8593 exp/html: ignore self-closing flag except in SVG and MathML
In HTML content, having a self-closing tag is a parse error unless
the tag would be self-closing anyway (like <img>). The only place a
self-closing tag actually makes a difference is in XML-based foreign
content.

Pass 1 additional test.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6450109
2012-08-10 09:34:10 +10:00
Robert Griesemer
0f2529317f exp/types: add more import tests
Also simplified parsing of interface
types since they can only contain
methods (and no embedded interfaces)
in the export data.

R=rsc
CC=golang-dev
https://golang.org/cl/6446084
2012-08-09 11:55:00 -07:00
Andrew Balholm
22e918f5d6 exp/html: ignore </html> in afterBodyIM when parsing a fragment
Pass 1 additional test.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6454124
2012-08-09 10:19:25 +10:00
Andrew Balholm
e4a50195c3 exp/html: when ignoring <textarea> tag, switch tokenizer out of raw text mode
Pass 1 additional test.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6459060
2012-08-09 09:43:10 +10:00
Andrew Balholm
fca45719a4 exp/html: foster-parent text correctly
If a table contained whitespace, text nodes would not get foster parented
correctly.

Pass 1 additional test.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6459054
2012-08-08 10:00:57 +10:00
Andrew Balholm
5530a426ef exp/html: correctly handle <title> after </head>
The <title> element was getting removed from the stack of open elements,
when its parent, the <head> element should have been removed instead.

Pass 2 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6449101
2012-08-07 13:36:08 +10:00
Andrew Balholm
2276ab92c1 exp/html: fix foster-parenting when elements are implicitly closed
When an element (like <nobr> or <p>) was implicitly closed by another
start tag, it would keep foster parenting from working because
the check for what was on top of the stack of open elements was
in the wrong place.

Move the check to addChild.

Pass 2 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6460045
2012-08-07 09:35:09 +10:00
Alexey Borzenkov
a108369c83 syscall: return EINVAL when string arguments have NUL characters
Since NUL usually terminates strings in underlying syscalls, allowing
it when converting string arguments is a security risk, especially
when dealing with filenames. For example, a program might reason that
filename like "/root/..\x00/" is a subdirectory or "/root/" and allow
access to it, while underlying syscall will treat "\x00" as an end of
that string and the actual filename will be "/root/..", which might
be unexpected. Returning EINVAL when string arguments have NUL in
them makes sure this attack vector is unusable.

R=golang-dev, r, bradfitz, fullung, rsc, minux.ma
CC=golang-dev
https://golang.org/cl/6458050
2012-08-05 17:24:32 -04:00
Andrew Balholm
74db9d298b exp/html: don't treat SVG <title> like HTML <title>
The content of an HTML <title> element is RCDATA, but the content of an SVG
<title> element is parsed as tags. Now the parser doesn't go into RCDATA
mode in foreign content.

Pass 4 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6448111
2012-08-05 22:32:35 +10:00
Marcel van Lohuizen
89d40b911c exp/locale/collate: changed API of Builder to be more convenient
for dealing with CLDR files:
- Add now taxes a list of indexes of colelems that are variables. Checking and
  handling is now done by the Builder.  VariableTop is now also properly generated
  using the Build method.
- Introduced separate Builder, called Tailoring, for creating tailorings of root
  table.  This clearly separates the functionality for building a table based on
  weights (the allkeys* files) versus tables based on LDML XML files.
- Tailorings are now added by two calls instead of one: SetAnchor and Insert.
  This more closely reflects the structure of LDML side and simplifies the
  implementation of both the client and library side.  It also preserves
  some information that is otherwise hard to recover for the Builder.
- Allow the LDML XML element extend to be passed to Insert.  This simplifies
  both client and library implementation.

R=r
CC=golang-dev
https://golang.org/cl/6454061
2012-08-03 09:01:21 +02:00
Andrew Balholm
2f39a33b6a exp/html: in parse tests, discard only one trailing newline
Pass 2 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6454090
2012-08-03 09:31:45 +10:00
Nigel Tao
1916db786f html: make the low-level tokenizer also skip end-tag attributes.
R=andybalholm
CC=golang-dev
https://golang.org/cl/6453071
2012-08-03 09:29:16 +10:00
Rémy Oudompheng
37d7500f8d exp/types: set non-embedded method type during GcImport.
R=golang-dev, gri
CC=golang-dev
https://golang.org/cl/6445068
2012-08-02 16:24:09 -07:00
Robert Griesemer
a4ac339f43 exp/types: enable cycle checks again
Process a package's object in a reproducible
order (rather then in map order) so that we
get error messages in reproducible order.

R=r
CC=golang-dev
https://golang.org/cl/6449076
2012-08-01 16:37:06 -07:00
Andrew Balholm
dbbfbcc4a1 exp/html: implement escaping and double-escaping in scripts
The text inside <script> tags is not ordinary raw text; there are all sorts
of other complications. This CL implements those complications.

Pass 76 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6443070
2012-08-01 14:45:35 +10:00
Robert Griesemer
152279f203 exp/types: Replace String method with TypeString function
This is more in sync with the rest of the package;
for instance, we have functions (not methods) to
deref or find the underlying type of a Type.

In the process use a single bytes.Buffer to create
the string representation for a type rather than
the (occasional) string concatenation.

R=r
CC=golang-dev
https://golang.org/cl/6458057
2012-07-31 19:30:18 -07:00
Robert Griesemer
dcb6f59811 exp/types: implement Type.String methods for testing/debugging
Also:
- replaced existing test with a more comprehensive test
- fixed bug in map type creation

R=r
CC=golang-dev
https://golang.org/cl/6450072
2012-07-31 17:09:12 -07:00
Andrew Balholm
9f3b00579e exp/html: tokenize attributes of end tags
If an end tag has an attribute that is a quoted string containing '>',
the tokenizer would end the tag prematurely. Now it reads the attributes
on end tags just as it does on start tags, but the high-level interface
still doesn't return them, because their presence is a parse error.

Pass 1 additional test.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6457060
2012-08-01 09:35:02 +10:00
Andrew Balholm
eff32f573b exp/html: replace NUL with U+FFFD in text in foreign content
Pass 5 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6452055
2012-07-29 16:29:49 +10:00
Marcel van Lohuizen
601045e87a exp/locale/collate: changed trie in first step towards support for multiple locales.
- Allow handles into the trie for different locales.  Multiple tables share the same
  try to allow for reuse of blocks.
- Significantly improved memory footprint and reduced allocations of trieNodes.
  This speeds up generation by about 30% and allows keeping trieNodes around
  for multiple locales during generation.
- Renamed print method to fprint.

R=r
CC=golang-dev
https://golang.org/cl/6408052
2012-07-28 18:44:14 +02:00
Andrew Balholm
a1f340fa1a exp/html: parse CDATA sections in foreign content
Also convert NUL to U+FFFD in comments.

Pass 23 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6446055
2012-07-27 16:05:25 +10:00
Andrew Balholm
55f0c8b2cd exp/html: replace NUL bytes in plaintext, raw text, and RCDATA
If NUL bytes occur inside certain elements, convert them to U+FFFD
replacement character.

Pass 1 additional test.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6452047
2012-07-27 09:27:10 +10:00
Andrew Wilkins
d399b681a4 exp/types: process ast.Fun in checkObj; fix variadic function building
Fixed creation of Func's, taking IsVariadic from parameter list rather
than results.

Updated checkObj to process ast.Fun objects.

R=gri
CC=golang-dev
https://golang.org/cl/6402046
2012-07-26 11:47:46 -07:00
Andrew Balholm
899be50991 exp/html: don't insert empty text nodes
Pass 1 additional test.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6443048
2012-07-26 10:32:24 +10:00
Andrew Balholm
4d22519678 exp/html: allow frameset if body contains whitespace
If the body of an HTML document contains text, the <frameset> tag is
ignored. But not if the text is only whitespace.

Pass 4 additional tests.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6442043
2012-07-25 12:09:58 +10:00
Andrew Balholm
f979528ce6 exp/html: special handling for entities in attributes
Don't unescape entities in attributes when they don't end with
a semicolon and they are followed by '=', a letter, or a digit.

Pass 6 more tests from the WebKit test suite, plus one that was
commented out in token_test.go.

R=nigeltao
CC=golang-dev
https://golang.org/cl/6405073
2012-07-23 12:39:58 +10:00
Marcel van Lohuizen
882b6ef454 exp/locale/collate: This CL includes the following changes:
- Changed the representation of colElem to support a few cases
  for some languages not supported by the current format.
- Changed offsets for implicit primary values. This makes the
  values both easier to read and debug (last 4 nibbles are identical to
  implicit primary value) and also results in better packing.
- Fixed bug in weight conversion code that did not pop up yet by
  sheer luck.
Note that tables.go also includes changes to the contraction trie
from CL 6346092.

R=r, mpvl
CC=golang-dev
https://golang.org/cl/6392060
2012-07-13 11:38:22 +02:00
Marcel van Lohuizen
adc19ac5e3 exp/locale/collate: adjusted contraction trie to support Myanmar (Burmese),
which has a rather large contraction table. The value of the next state
offset now starts after the current block, instead of before.  This is
slightly less efficient (on extra addition per state change), but gives
some extra range for the offsets.
Also introduced constants for final (0) and noIndex (0xFF).
tables.go is updated in a separate CL.

R=r
CC=golang-dev
https://golang.org/cl/6346092
2012-07-13 11:38:00 +02:00
Jan Ziak
b3382ec9e9 exp/inotify: prevent data race
Fixes #3713.

R=bradfitz, rsc
CC=golang-dev
https://golang.org/cl/6331055
2012-06-25 14:08:09 -04:00
Jan Ziak
f5f3c3fe09 exp/inotify: prevent data race during testing
Fixes #3714.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/6341047
2012-06-24 19:22:48 -04:00
Marcel van Lohuizen
77b1378c3e exp/locale/collate: added regression test for collate package
based on UCA test files.

R=r
CC=golang-dev
https://golang.org/cl/6216056
2012-06-19 11:34:56 -07:00
Robert Griesemer
ca2ae27dd0 go/ast: multiple "blank" imports are permitted
R=rsc, dsymonds
CC=golang-dev
https://golang.org/cl/6303099
2012-06-18 21:56:41 -07:00
Nigel Tao
834edc4257 exp/html/atom: add some more atoms.
R=r, dsymonds
CC=golang-dev
https://golang.org/cl/6298085
2012-06-15 15:39:25 +10:00
Shenghou Ma
9852614291 exp/types: clean up objects after test
Fixes #3739.

R=bradfitz, rsc
CC=golang-dev
https://golang.org/cl/6295083
2012-06-15 02:52:18 +08:00