Andrew Balholm
053549ca1b
html: allow whitespace text nodes in <head>
...
Pass tests1.dat, test 50:
<!DOCTYPE html><script> <!-- </script> --> </script> EOF
| <!DOCTYPE html>
| <html>
|   <head>
|     <script>
|       " <!-- "
|     " "
|   <body>
|     "-->  EOF"
Also pass tests through test 54:
<!DOCTYPE html><title>U-test</title><body><div><p>Test<u></p></div></body>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5311066
2011-10-28 09:06:30 +11:00
Andrew Balholm
833fb4198d
html: parse <style> elements inside <head> element.
...
Also correctly handle EOF inside a <style> element.
Pass tests1.dat, test 49:
<!DOCTYPE html><style> EOF
| <!DOCTYPE html>
| <html>
|   <head>
|     <style>
|       " EOF"
|   <body>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5321057
2011-10-27 10:26:11 +11:00
Andrew Balholm
bd07e4f259
html: close <option> element when opening <optgroup>
...
Pass tests1.dat, test 34:
<!DOCTYPE html>A<option>B<optgroup>C<select>D</option>E
| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     "A"
|     <option>
|       "B"
|     <optgroup>
|       "C"
|       <select>
|         "DE"
Also passes tests 35-48. Test 48 is:
</ COM--MENT >
R=nigeltao
CC=golang-dev
https://golang.org/cl/5311063
2011-10-27 09:45:53 +11:00
Andrew Balholm
05ed18f4f6
html: improve parsing of lists
...
Make a <li> tag close the previous <li> element.
Make a </ul> tag close <li> elements.
Pass tests1.dat, test 33:
<!DOCTYPE html><li>hello<li>world<ul>how<li>do</ul>you</body><!--do-->
| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <li>
|       "hello"
|     <li>
|       "world"
|       <ul>
|         "how"
|         <li>
|           "do"
|       "you"
|   <!-- do -->
R=nigeltao
CC=golang-dev
https://golang.org/cl/5321051
2011-10-26 14:02:30 +11:00
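Note (illustrative, not part of the commit): the two list rules above can be sketched with a stack of open elements. This is a simplified toy, not the package's code: the Node and parser types and the startLI, endUL and popUntil helpers are hypothetical names, and the scope check stops only at <ul>, whereas the real algorithm stops at a wider set of elements.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified tree node. Tag is "" for text nodes.
type Node struct {
	Tag      string
	Text     string
	Children []*Node
}

// parser keeps a stack of open elements; the last entry is the current node.
type parser struct {
	stack []*Node
}

func (p *parser) top() *Node { return p.stack[len(p.stack)-1] }

// open appends a new element to the current node and pushes it on the stack.
func (p *parser) open(tag string) {
	n := &Node{Tag: tag}
	p.top().Children = append(p.top().Children, n)
	p.stack = append(p.stack, n)
}

// text appends a text node to the current node.
func (p *parser) text(s string) {
	p.top().Children = append(p.top().Children, &Node{Text: s})
}

// popUntil pops the stack up to and including the nearest open element
// named tag. It gives up, leaving the stack unchanged, if it first meets
// an element listed in stopAt. The root at index 0 is never popped.
func (p *parser) popUntil(tag string, stopAt ...string) {
	for i := len(p.stack) - 1; i > 0; i-- {
		if p.stack[i].Tag == tag {
			p.stack = p.stack[:i]
			return
		}
		for _, s := range stopAt {
			if p.stack[i].Tag == s {
				return
			}
		}
	}
}

// startLI: a <li> tag closes the previous <li> in the same list.
func (p *parser) startLI() {
	p.popUntil("li", "ul")
	p.open("li")
}

// endUL: a </ul> tag closes any <li> elements still open inside the list.
func (p *parser) endUL() { p.popUntil("ul") }

// example replays <li>hello<li>world<ul>how<li>do</ul>you under a <body> root.
func example() *Node {
	body := &Node{Tag: "body"}
	p := &parser{stack: []*Node{body}}
	p.startLI()
	p.text("hello")
	p.startLI()
	p.text("world")
	p.open("ul")
	p.text("how")
	p.startLI()
	p.text("do")
	p.endUL()
	p.text("you")
	return body
}

// render writes the tree with two spaces of indentation per level.
func render(n *Node) string {
	var b strings.Builder
	var walk func(n *Node, depth int)
	walk = func(n *Node, depth int) {
		indent := strings.Repeat("  ", depth)
		if n.Tag == "" {
			fmt.Fprintf(&b, "%s%q\n", indent, n.Text)
		} else {
			fmt.Fprintf(&b, "%s<%s>\n", indent, n.Tag)
		}
		for _, c := range n.Children {
			walk(c, depth+1)
		}
	}
	walk(n, 0)
	return b.String()
}

func main() {
	fmt.Print(render(example()))
}
```

Under these assumptions the rendered <body> subtree has the same shape as the expected output for test 33: "you" ends up inside the second <li>, after the <ul>.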
Andrew Balholm
6e318bda6c
html: improve parsing of tables
...
When foster parenting, merge adjacent text nodes.
Properly close table row at </tr> tag.
Pass tests1.dat, test 32:
<!-----><font><div>hello<table>excite!<b>me!<th><i>please!</tr><!--X-->
| <!-- - -->
| <html>
|   <head>
|   <body>
|     <font>
|       <div>
|         "helloexcite!"
|         <b>
|           "me!"
|         <table>
|           <tbody>
|             <tr>
|               <th>
|                 <i>
|                   "please!"
|             <!-- X -->
R=nigeltao
CC=golang-dev
https://golang.org/cl/5323048
2011-10-26 11:36:46 +11:00
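Note (illustrative, not part of the commit): a minimal sketch of foster-parenting a text node while merging adjacent text nodes, as in the "helloexcite!" line of test 32. The Node type and the fosterText, appendChild and insertBefore helpers are hypothetical names, and the sketch assumes the <table> always has a parent; the full algorithm also covers the case where it does not.

```go
package main

import "fmt"

// Node is a hypothetical, simplified tree node. Tag is "" for text nodes.
type Node struct {
	Tag      string
	Text     string
	Parent   *Node
	Children []*Node
}

// appendChild adds child as the last child of n.
func (n *Node) appendChild(child *Node) {
	child.Parent = n
	n.Children = append(n.Children, child)
}

// insertBefore inserts child immediately before ref among n's children.
// If ref is not a child of n, child is appended instead.
func (n *Node) insertBefore(child, ref *Node) {
	child.Parent = n
	for i, c := range n.Children {
		if c == ref {
			tail := append([]*Node{child}, n.Children[i:]...)
			n.Children = append(n.Children[:i], tail...)
			return
		}
	}
	n.Children = append(n.Children, child)
}

// fosterText places text just before table in table's parent. If the node
// directly before the table is already a text node, the new text is merged
// into it rather than creating a second, adjacent text node.
func fosterText(table *Node, text string) {
	parent := table.Parent // assumption: never nil in this sketch
	for i, c := range parent.Children {
		if c != table {
			continue
		}
		if i > 0 && parent.Children[i-1].Tag == "" {
			parent.Children[i-1].Text += text
			return
		}
		parent.insertBefore(&Node{Text: text}, table)
		return
	}
}

func main() {
	div := &Node{Tag: "div"}
	div.appendChild(&Node{Text: "hello"})
	table := &Node{Tag: "table"}
	div.appendChild(table)

	// "excite!" appears inside <table> but not in a cell, so it is fostered.
	fosterText(table, "excite!")
	fmt.Printf("%d children, first is %q\n", len(div.Children), div.Children[0].Text)
}
```

With the inputs above, the <div> keeps two children: one merged text node followed by the <table>.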
Nigel Tao
18b025d530
html: remove the Tokenizer.ReturnComments option.
...
The original intention was to simplify the parser by letting it skip
all comment tokens. However, checking that the Go html package is
100% compatible with the WebKit HTML test suite requires parsing
comments, so the option no longer offers any real benefit.
R=gri, andybalholm
CC=golang-dev
https://golang.org/cl/5321043
2011-10-25 11:28:07 +11:00
Andrew Balholm
2aa589c843
html: implement foster parenting
...
Implement the foster-parenting algorithm for content that is inside a table
but not in a cell.
Also fix a bug in reconstructing the active formatting elements.
Pass test 30 in tests1.dat:
<a><table><td><a><table></table><a></tr><a></table><b>X</b>C<a>Y
R=nigeltao
CC=golang-dev
https://golang.org/cl/5309052
2011-10-23 18:36:01 +11:00
Nigel Tao
2f352ae48a
html: parse <select> tags.
...
The additional test case in parse_test.go is:
<select><b><option><select><option></b></select>X
R=andybalholm
CC=golang-dev
https://golang.org/cl/5293051
2011-10-22 20:18:12 +11:00
Nigel Tao
64306c9fd0
html: parse and render comment nodes.
...
The first additional test case in parse_test.go is:
<!--><div>--<!-->
The second one is unrelated to the comment change, but also passes:
<p><hr></p>
R=andybalholm
CC=golang-dev
https://golang.org/cl/5299047
2011-10-20 11:45:30 +11:00
Nigel Tao
b1fd528db5
html: parse raw text and RCDATA elements, such as <script> and <title>.
...
Pass tests1.dat, test 26:
#data
<script><div></script></div><title><p></title><p><p>
#document
| <html>
|   <head>
|     <script>
|       "<div>"
|     <title>
|       "<p>"
|   <body>
|     <p>
|     <p>
Thanks to Andy Balholm for driving this change.
R=andybalholm
CC=golang-dev
https://golang.org/cl/5301042
2011-10-19 08:03:30 +11:00
Andrew Balholm
c64e8e327e
html: insert implied <p> and </p> tags
...
Pass tests1.dat, test 25:
#data
<p><b><div></p></b></div>X
#document
| <html>
| <head>
| <body>
| <p>
| <b>
| <div>
| <b>
|
| <p>
| "X"
R=nigeltao
CC=golang-dev
https://golang.org/cl/5254060
2011-10-13 12:40:48 +11:00
Nigel Tao
1d0c141d7d
html: parse doctype tokens; merge adjacent text nodes.
...
The test case input is "<!DOCTYPE html><span><button>foo</span>bar".
The correct parse is:
| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <span>
|       <button>
|         "foobar"
R=gri
CC=golang-dev
https://golang.org/cl/4794063
2011-08-01 10:26:46 +10:00
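Note (illustrative, not part of the commit): merging adjacent text nodes can be sketched as below, using the "foo" and "bar" pieces from the test case above. The Node type and the addText helper are hypothetical names for this sketch, not the package's API.

```go
package main

import "fmt"

// NodeType distinguishes text nodes from element nodes in this toy tree.
type NodeType int

const (
	TextNode NodeType = iota
	ElementNode
)

// Node is a hypothetical, simplified stand-in for a parse-tree node.
type Node struct {
	Type     NodeType
	Data     string
	Children []*Node
}

// addText appends text to parent. If parent's last child is already a text
// node, the text is merged into it instead of creating a new sibling.
func addText(parent *Node, text string) {
	if n := len(parent.Children); n > 0 {
		if last := parent.Children[n-1]; last.Type == TextNode {
			last.Data += text
			return
		}
	}
	parent.Children = append(parent.Children, &Node{Type: TextNode, Data: text})
}

func main() {
	button := &Node{Type: ElementNode, Data: "button"}
	addText(button, "foo") // text before the ignored </span>
	addText(button, "bar") // text after it, merged into the same node
	fmt.Printf("%d child: %q\n", len(button.Children), button.Children[0].Data)
}
```

Here the two calls leave <button> with a single text child, matching the "foobar" line in the expected parse.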
Nigel Tao
5a141064ed
html: parse misnested formatting tags according to the HTML5 spec.
...
This is the "adoption agency" algorithm.
The test case input is "<a><p>X<a>Y</a>Z</p></a>". The correct parse is:
| <html>
|   <head>
|   <body>
|     <a>
|     <p>
|       <a>
|         "X"
|       <a>
|         "Y"
|       "Z"
R=gri
CC=golang-dev
https://golang.org/cl/4771042
2011-07-21 11:20:54 +10:00
Nigel Tao
d360e0213d
html: update section references in comments to the latest HTML5 spec.
...
R=r
CC=golang-dev
https://golang.org/cl/4699048
2011-07-13 16:53:02 +10:00
Yasuhiro Matsumoto
1e6d946594
html: parse start tags that aren't explicitly otherwise dealt with.
...
R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/4626080
2011-07-06 13:08:52 +10:00
Yasuhiro Matsumoto
054cf72b56
html: fix nesting when parsing a close tag.
...
R=nigeltao
CC=golang-dev
https://golang.org/cl/4636067
2011-06-30 23:16:33 +10:00
Nigel Tao
fec6ab9726
html: parse "<h1>foo<h2>bar".
...
R=gri
CC=golang-dev
https://golang.org/cl/3571043
2010-12-15 11:39:56 +11:00
Nigel Tao
71bd053ada
html: parse <table><tr><td> tags.
...
Also, shorten fooInsertionMode to fooIM.
R=gri
CC=golang-dev
https://golang.org/cl/3504042
2010-12-10 12:20:14 +11:00
Nigel Tao
49014c5b12
html: handle unexpected EOF during parsing.
...
This lets us parse HTML like "<html>foo".
R=gri
CC=golang-dev
https://golang.org/cl/3460043
2010-12-08 08:59:20 +11:00
Nigel Tao
08a47d6f60
html: first cut at a parser.
...
R=gri
CC=golang-dev
https://golang.org/cl/3355041
2010-12-07 12:02:36 +11:00