mirror of https://github.com/golang/go synced 2024-10-04 18:21:21 -06:00
go/src/pkg
Nigel Tao a49b8b9875 html: rewrite the tokenizer to be more consistent.
Previously, the tokenizer made two passes per token. The first pass
established the token boundary. The second pass picked out the tag name
and attributes inside that boundary. This was problematic when the two
passes disagreed. For example, "<p id=can't><p id=won't>" caused an
infinite loop because the first pass skipped everything inside the
single quotes, and recognized only one token, but the second pass never
got past the first '>'.

This change rewrites the tokenizer to use one pass, accumulating the
boundary points of token text, tag names, attribute keys and attribute
values as it looks for the token endpoint.

It should still be reasonably efficient: text, names, keys and values
are not lower-cased or unescaped (and converted from []byte to string)
until asked for.
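
The deferred conversion might look roughly like the sketch below, which is an illustration and not the code from this CL. The lazyToken type and its fields are hypothetical, and it relies on bytes.ToLower and html.UnescapeString from today's standard library rather than the 2011 package in this tree.

```go
package main

import (
	"bytes"
	"fmt"
	"html"
)

// lazyToken keeps the raw input plus offsets recorded at scan time.
// The type and field names are hypothetical, chosen for illustration only.
type lazyToken struct {
	buf                []byte
	nameStart, nameEnd int
	valStart, valEnd   int
}

// TagName lower-cases the name and converts it to a string only on request.
func (t lazyToken) TagName() string {
	return string(bytes.ToLower(t.buf[t.nameStart:t.nameEnd]))
}

// AttrVal unescapes the value and converts it to a string only on request.
func (t lazyToken) AttrVal() string {
	return html.UnescapeString(string(t.buf[t.valStart:t.valEnd]))
}

func main() {
	buf := []byte(`<DIV title="a&amp;b">`)
	// Offsets as a scanner would have recorded them for this input.
	tok := lazyToken{buf: buf, nameStart: 1, nameEnd: 4, valStart: 12, valEnd: 19}
	fmt.Println(tok.TagName(), tok.AttrVal())
}
```

A caller that only needs token boundaries never calls TagName or AttrVal, so it pays for no lower-casing, unescaping or string allocation.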

One of the token_test test cases was fixed to be consistent with
html5lib. Three more test cases were temporarily disabled, and will be
re-enabled in a follow-up CL. All the parse_test test cases pass.

R=andybalholm, gri
CC=golang-dev
https://golang.org/cl/5244061
2011-10-14 09:58:39 +11:00
archive io: rename Copyn to CopyN. 2011-09-30 13:13:39 -07:00
asn1 time: make Weekday a method. 2011-09-12 11:47:55 -07:00
big build: clear execute bit from Go files 2011-09-05 07:48:42 -04:00
bufio bufio: handle a "\r\n" that straddles the buffer. 2011-08-25 08:44:12 +10:00
builtin builtin: correct description of a closed channel. 2011-08-16 16:03:30 +10:00
bytes updates: append(y,[]byte(z)...) -> append(y,z...)" 2011-10-12 13:42:04 -07:00
cmath math: remove the leading F from Fabs etc. 2011-09-29 09:54:20 -07:00
compress go/printer: changed max. number of newlines from 3 to 2 2011-07-14 14:39:40 -07:00
container container/vector: delete 2011-10-11 16:41:48 -07:00
crypto crypto/tls: more Unix root certificate locations 2011-10-13 16:17:15 -04:00
csv go/printer: changed max. number of newlines from 3 to 2 2011-07-14 14:39:40 -07:00
debug debug/elf: permit another case of SHT_NOBITS section overlap in test 2011-09-14 15:33:37 -07:00
encoding encoding/binary: added benchmarks 2011-10-05 13:04:43 -07:00
exec exec: add Command.ExtraFiles 2011-10-06 11:00:02 -07:00
exp go/types: move to exp/types per Go 1 plan 2011-10-13 15:41:48 -07:00
expvar go/printer: changed max. number of newlines from 3 to 2 2011-07-14 14:39:40 -07:00
flag flag: make zero FlagSet useful 2011-09-15 17:04:51 -04:00
fmt fmt: remove an obsolete reference to os.ErrorString in a comment 2011-10-12 13:50:08 -07:00
go go/types: move to exp/types per Go 1 plan 2011-10-13 15:41:48 -07:00
gob gob: avoid one copy for every message written. 2011-10-10 12:38:49 -07:00
hash hash/crc32: add SSE4.2 support 2011-07-12 09:29:24 -04:00
html html: rewrite the tokenizer to be more consistent. 2011-10-14 09:58:39 +11:00
http crypto/tls: fetch root certificates using Mac OS API 2011-10-13 13:59:13 -04:00
image image/tiff: Implement PackBits decoding. 2011-10-13 13:31:26 +11:00
index/suffixarray index/suffixarray: 4.5x faster index serialization (to memory) 2011-09-30 11:31:28 -07:00
io io: rename Copyn to CopyN. 2011-09-30 13:13:39 -07:00
json updates: append(y,[]byte(z)...) -> append(y,z...)" 2011-10-12 13:42:04 -07:00
log log: more locking 2011-07-17 15:46:00 -07:00
mail time: make Weekday a method. 2011-09-12 11:47:55 -07:00
math build: clear execute bit from source files 2011-10-06 18:33:13 +09:00
mime strings: implement a faster byte->string Replacer 2011-10-03 15:19:04 -07:00
net build: fix for new return restriction 2011-10-13 12:17:18 -04:00
old netchan: move to old/netchan 2011-10-12 11:46:50 -07:00
os updates: append(y,[]byte(z)...) -> append(y,z...)" 2011-10-12 13:42:04 -07:00
patch strings.Split: make the default to split all. 2011-06-28 09:43:14 +10:00
path path/filepath: added Rel as the complement of Abs 2011-10-04 11:27:06 -03:00
rand math: remove the leading F from Fabs etc. 2011-09-29 09:54:20 -07:00
reflect reflect: add comment about the doubled semantics of Value.String. 2011-09-20 13:26:57 -07:00
regexp regexp: speedups 2011-09-28 12:00:31 -04:00
rpc rpc: fix typo in documentation client example 2011-09-25 14:19:08 +10:00
runtime gc: disallow close on receive-only channels 2011-10-13 16:58:04 -04:00
scanner scanner: correct error position for illegal UTF-8 encodings 2011-08-08 13:54:32 -07:00
smtp strings.Split: make the default to split all. 2011-06-28 09:43:14 +10:00
sort go/doc, godoc, gotest: support for reading example documentation 2011-10-06 11:56:17 -07:00
strconv strconv: faster Unquote in common case 2011-09-26 13:59:12 -04:00
strings strings: implement a faster byte->string Replacer 2011-10-03 15:19:04 -07:00
sync sync/atomic: replace MFENCE with LOCK XADD 2011-09-19 11:09:00 -07:00
syscall net: implement ip protocol name to number resolver for windows 2011-10-12 10:29:22 +11:00
syslog os.Error API: don't export os.ErrorString, use os.NewError consistently 2011-06-22 10:52:47 -07:00
tabwriter go/printer: changed max. number of newlines from 3 to 2 2011-07-14 14:39:40 -07:00
template pkg: fix incorrect prints found by govet 2011-10-13 13:34:01 +11:00
testing testing: fix time reported for failing tests. 2011-10-07 14:15:16 -07:00
time time: make month/day name comparisons case insenstive 2011-10-04 12:52:30 -07:00
unicode unicode: fix make tables 2011-09-26 13:10:16 -04:00
unsafe unsafe: update doc 2011-08-31 17:59:35 -04:00
url url: handle ; in ParseQuery 2011-09-06 12:24:24 -04:00
utf8 utf8: add Valid and ValidString 2011-10-06 22:47:24 -07:00
utf16
websocket http: remove Request.RawURL 2011-10-12 11:48:25 -07:00
xml xml: marshal "parent>child" tags correctly 2011-08-26 12:29:52 -03:00
deps.bash catch future accidental dependencies to exp/ or old/ 2011-10-12 10:55:42 -07:00
Makefile go/types: move to exp/types per Go 1 plan 2011-10-13 15:41:48 -07:00