Andrew Balholm
a5d300862b
html: allow whitespace between head and body
...
Also ignore <head> tag after </head>.
Pass tests6.dat, test 0:
<!doctype html></head> <head>
| <!DOCTYPE html>
| <html>
| <head>
| " "
| <body>
Also pass tests through test 6:
<body>
<div>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5447064
2011-12-02 11:46:24 +11:00
Andrew Balholm
ce27b00f48
html: implement fragment parsing algorithm
...
Pass the tests in tests4.dat.
R=nigeltao
CC=golang-dev
https://golang.org/cl/5447055
2011-12-01 12:47:57 +11:00
Andrew Balholm
3b3922771a
html: parse <xmp> tags
...
Pass tests5.dat, test 10:
<p><xmp></xmp>
| <html>
| <head>
| <body>
| <p>
| <xmp>
Also pass the remaining tests in tests5.dat.
R=nigeltao
CC=golang-dev
https://golang.org/cl/5440062
2011-11-30 15:37:41 +11:00
Andrew Balholm
e32f4ba77d
html: parse the contents of <iframe> elements as raw text
...
Pass tests5.dat, test 4:
<iframe> <!---> </iframe>x
| <html>
| <head>
| <body>
| <iframe>
| " <!---> "
| "x"
Also pass tests through test 9:
<style> <!</-- </style>x
R=nigeltao
CC=golang-dev
https://golang.org/cl/5450044
2011-11-30 11:44:54 +11:00
Andrew Balholm
c32b607687
html: detect quirks mode
...
Pass tests3.dat, test 23:
<p><table></table>
| <html>
| <head>
| <body>
| <p>
| <table>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5446043
2011-11-29 11:18:49 +11:00
Andrew Balholm
68e7363b56
html: parse <nobr> elements
...
Pass tests3.dat, test 20:
<!doctype html><nobr><nobr><nobr>
| <!DOCTYPE html>
| <html>
| <head>
| <body>
| <nobr>
| <nobr>
| <nobr>
Also pass tests through test 22:
<!doctype html><html><body><p><table></table></body></html>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5438056
2011-11-28 10:55:31 +11:00
Andrew Balholm
557ba72e69
html: ignore <head> tags in <head> element
...
Pass tests3.dat, test 12:
<!DOCTYPE html><HTML><META><HEAD></HEAD></HTML>
| <!DOCTYPE html>
| <html>
| <head>
| <meta>
| <body>
Also pass tests through test 19:
<!DOCTYPE html><html><head></head><body><ul><li><div><p><li></ul></body></html>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5436069
2011-11-27 14:41:08 +11:00
Andrew Balholm
af081cd43e
html: ignore newline at the start of a <pre> block
...
Pass tests3.dat, test 4:
<!DOCTYPE html><html><head></head><body><pre>\n</pre></body></html>
| <!DOCTYPE html>
| <html>
| <head>
| <body>
| <pre>
Also pass tests through test 11:
<!DOCTYPE html><pre>

A</pre>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5437051
2011-11-24 13:15:09 +11:00
Andrew Balholm
77b0ad1e80
html: parse DOCTYPE into name and public and system identifiers
...
Pass tests2.dat, test 59:
<!DOCTYPE <!DOCTYPE HTML>><!--<!--x-->-->
| <!DOCTYPE <!doctype>
| <html>
| <head>
| <body>
| ">"
| <!-- <!--x -->
| "-->"
Pass all the tests in doctype01.dat.
Also pass tests2.dat, test 60:
<!doctype html><div><form></form><div></div></div>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5437045
2011-11-24 09:28:58 +11:00
Andrew Balholm
57ed39fd3b
html: on EOF in a comment, ignore final dashes (up to 2)
...
Pass tests2.dat, test 57:
<!DOCTYPE html><!--x--
| <!DOCTYPE html>
| <!-- x -->
| <html>
| <head>
| <body>
Also pass test 58:
<!DOCTYPE html><table><tr><td></p></table>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5436048
2011-11-23 09:26:37 +11:00
Andrew Balholm
95e60acb97
html: copy attributes from extra <html> tags to root element
...
Pass tests2.dat, test 50:
<!DOCTYPE html><html><body><html id=x>
| <!DOCTYPE html>
| <html>
| id="x"
| <head>
| <body>
Also pass tests through test 56:
<!DOCTYPE html>X<p/x/y/z>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5432045
2011-11-22 12:08:22 +11:00
Andrew Balholm
750de28d6c
html: ignore whitespace before <head> element
...
Pass tests2.dat, test 47:
" \n "
(That is, two spaces separated by a newline)
| <html>
| <head>
| <body>
Also pass tests through test 49:
<!DOCTYPE html><script>
</script> <title>x</title> </head>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5422043
2011-11-22 09:27:27 +11:00
Andrew Balholm
05d8d112fe
html: refactor parse test infrastructure
...
My excuse for doing this is that test cases with newlines in them didn't
work. But instead of just fixing that, I rearranged everything in
parse_test.go to use fewer channels and pipes, and just call a
straightforward function to read test cases from a file.
R=nigeltao
CC=golang-dev
https://golang.org/cl/5410049
2011-11-20 22:42:28 +11:00
Andrew Balholm
a1dbfa6f09
html: parse <isindex>
...
Pass tests2.dat, test 42:
<isindex test=x name=x>
| <html>
| <head>
| <body>
| <form>
| <hr>
| <label>
| "This is a searchable index. Enter search keywords: "
| <input>
| name="isindex"
| test="x"
| <hr>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5399049
2011-11-17 13:12:13 +11:00
Andrew Balholm
3276afd4d4
html: parse </optgroup> and </option>
...
Pass tests2.dat, test 35:
<!DOCTYPE html><select><optgroup><option></optgroup><option><select><option>
| <!DOCTYPE html>
| <html>
| <head>
| <body>
| <select>
| <optgroup>
| <option>
| <option>
| <option>
Also pass tests through test 41:
<!DOCTYPE html><!-- XXX - XXX - XXX -->
R=nigeltao, rsc
CC=golang-dev
https://golang.org/cl/5395045
2011-11-17 10:25:33 +11:00
Andrew Balholm
3307597069
html: parse <optgroup> tags
...
Pass tests2.dat, test 34:
<!DOCTYPE html><select><option><optgroup>
| <!DOCTYPE html>
| <html>
| <head>
| <body>
| <select>
| <option>
| <optgroup>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5393045
2011-11-16 19:25:55 +11:00
Andrew Balholm
28546ed56a
html: parse <caption> elements
...
Pass tests2.dat, test 33:
<!DOCTYPE html><table><caption>test TEST</caption><td>test
| <!DOCTYPE html>
| <html>
| <head>
| <body>
| <table>
| <caption>
| "test TEST"
| <tbody>
| <tr>
| <td>
| "test"
R=nigeltao
CC=golang-dev
https://golang.org/cl/5371099
2011-11-16 12:18:11 +11:00
Andrew Balholm
b91d82258f
html: auto-close <p> elements when starting <form> element.
...
Pass tests2.dat, test 26:
<!doctypehtml><p><form>
| <!DOCTYPE html>
| <html>
| <head>
| <body>
| <p>
| <form>
Also pass tests through test 32:
<!DOCTYPE html><!-- X
R=nigeltao
CC=golang-dev
https://golang.org/cl/5369114
2011-11-15 15:31:22 +11:00
Andrew Balholm
3bd5082f57
html: parse and render <plaintext> elements
...
Pass tests2.dat, test 10:
<table><plaintext><td>
| <html>
| <head>
| <body>
| <plaintext>
| "<td>"
| <table>
Also pass tests through test 25:
<!doctypehtml><p><dd>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5369109
2011-11-15 11:39:18 +11:00
Andrew Balholm
06ef97e15d
html: auto-close <dd> and <dt> elements
...
Pass tests2.dat, test 8:
<!DOCTYPE html><dt><div><dd>
| <!DOCTYPE html>
| <html>
| <head>
| <body>
| <dt>
| <div>
| <dd>
Also pass tests through test 9:
<script></x
R=nigeltao
CC=golang-dev
https://golang.org/cl/5373083
2011-11-13 23:27:20 +11:00
Andrew Balholm
3df0512469
html: handle end tags in strange places
...
Pass tests1.dat, test 111:
</strong></b></em></i></u></strike></s></blink></tt></pre></big></small></font></select></h1></h2></h3></h4></h5></h6></body></br></a></img></title></span></style></script></table></th></td></tr></frame></area></link></param></hr></input></col></base></meta></basefont></bgsound></embed></spacer></p></dd></dt></caption></colgroup></tbody></tfoot></thead></address></blockquote></center></dir></div></dl></fieldset></listing></menu></ol></ul></li></nobr></wbr></form></button></marquee></object></html></frameset></head></iframe></image></isindex></noembed></noframes></noscript></optgroup></option></plaintext></textarea>
| <html>
| <head>
| <body>
| <br>
| <p>
Also pass all the remaining tests in tests1.dat.
R=nigeltao
CC=golang-dev
https://golang.org/cl/5372066
2011-11-12 12:23:30 +11:00
Andrew Balholm
0a61c846ef
html: ignore <col> tag outside tables
...
Pass tests1.dat, test 109:
<table><col><tbody><col><tr><col><td><col></table><col>
| <html>
| <head>
| <body>
| <table>
| <colgroup>
| <col>
| <tbody>
| <colgroup>
| <col>
| <tbody>
| <tr>
| <colgroup>
| <col>
| <tbody>
| <tr>
| <td>
| <colgroup>
| <col>
Also pass test 110:
<table><colgroup><tbody><colgroup><tr><colgroup><td><colgroup></table><colgroup>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5369069
2011-11-11 21:44:01 +11:00
Andrew Balholm
83f61a27d6
html: parse column groups
...
Pass tests1.dat, test 108:
<table><colgroup><col><colgroup><col><col><col><colgroup><col><col><thead><tr><td></table>
| <html>
| <head>
| <body>
| <table>
| <colgroup>
| <col>
| <colgroup>
| <col>
| <col>
| <col>
| <colgroup>
| <col>
| <col>
| <thead>
| <tr>
| <td>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5369061
2011-11-11 11:41:46 +11:00
Andrew Balholm
e9e874b7fc
html: parse framesets
...
Pass tests1.dat, test 106:
<frameset><frame><frameset><frame></frameset><noframes></noframes></frameset>
| <html>
| <head>
| <frameset>
| <frame>
| <frameset>
| <frame>
| <noframes>
Also pass test 107:
<h1><table><td><h3></table><h3></h1>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5373050
2011-11-10 23:56:13 +11:00
Andrew Balholm
ddc5ec642d
html: don't emit text token for empty raw text elements.
...
Pass tests1.dat, test 99:
<script></script></div><title></title><p><p>
| <html>
| <head>
| <script>
| <title>
| <body>
| <p>
| <p>
Also pass tests through test 105:
<ul><li><ul></li><li>a</li></ul></li></ul>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5373043
2011-11-10 08:09:54 +11:00
Andrew Balholm
820523d091
html: correctly parse </html> in <head> element.
...
Pass tests1.dat, test 92:
<head></html><meta><p>
| <html>
| <head>
| <body>
| <meta>
| <p>
Also pass tests through test 98:
<p><b><div><marquee></p></b></div>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5359054
2011-11-09 19:18:26 +11:00
Andrew Balholm
ce4eec2e0a
html: treat <image> as <img>
...
Pass tests1.dat, test 90:
<p><image></p>
| <html>
| <head>
| <body>
| <p>
| <img>
Also pass test 91:
<a><table><a></table><p><a><div><a>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5339052
2011-11-09 09:43:55 +11:00
Andrew Balholm
f2b602ed42
html: parse <body>, <base>, <link>, <meta>, and <title> tags inside page body
...
Pass tests1.dat, test 87:
<body><body><base><link><meta><title><p></title><body><p></body>
| <html>
| <head>
| <body>
| <base>
| <link>
| <meta>
| <title>
| "<p>"
| <p>
Handling the last <body> tag requires correcting the original insertion mode in useTheRulesFor.
Also pass test 88:
<textarea><p></textarea>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5364047
2011-11-08 17:55:17 +11:00
Nigel Tao
bbd173fc3d
html: be able to test more than one testdata file.
...
R=andybalholm
CC=golang-dev
https://golang.org/cl/5351041
2011-11-07 09:38:40 +11:00
Andrew Balholm
632a2c59b1
html: properly close <tr> element when a new <tr> starts.
...
Pass tests1.dat, test 87:
<table><tr><tr><td><td><span><th><span>X</table>
| <html>
| <head>
| <body>
| <table>
| <tbody>
| <tr>
| <tr>
| <td>
| <td>
| <span>
| <th>
| <span>
| "X"
R=nigeltao
CC=golang-dev
https://golang.org/cl/5343041
2011-11-04 15:48:11 +11:00
Andrew Balholm
46308d7d11
html: move <link> element from after <head> into <head>
...
Pass tests1.dat, test 85:
<head><meta></head><link>
| <html>
| <head>
| <meta>
| <link>
| <body>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5297079
2011-11-04 09:29:06 +11:00
Andrew Balholm
77aabbf217
html: parse <link> elements in <head>
...
Pass tests1.dat, test 83:
<title><meta></title><link><title><meta></title>
| <html>
| <head>
| <title>
| "<meta>"
| <link>
| <title>
| "<meta>"
| <body>
Also pass test 84:
<style><!--</style><meta><script>--><link></script>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5331061
2011-11-03 17:12:13 +11:00
Andrew Balholm
cf6a712162
html: properly close <marquee> elements.
...
Pass tests1.dat, test 80:
<a href=a>aa<marquee>aa<a href=b>bb</marquee>aa
| <html>
| <head>
| <body>
| <a>
| href="a"
| "aa"
| <marquee>
| "aa"
| <a>
| href="b"
| "bb"
| "aa"
Also pass tests through test 82:
<!DOCTYPE html><spacer>foo
R=nigeltao
CC=golang-dev
https://golang.org/cl/5319071
2011-11-03 10:11:06 +11:00
Russ Cox
c2049d2dfe
src/pkg/[a-m]*: gofix -r error -force=error
...
R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/5322051
2011-11-01 22:04:37 -04:00
Andrew Balholm
22ee5ae25a
html: stop at scope marker node when generating implied </a> tags
...
An <a> tag generates implied end tags for any open <a> elements.
But it shouldn't do that when it is inside a table cell and the open <a>
is outside the table.
So stop the search for an open <a> when we reach a scope marker node.
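
The search described above can be sketched roughly as follows. The entry type and the openAnchor function are hypothetical, chosen for illustration, and do not mirror the parser's actual data structures. The list is walked from the most recent entry backwards and the walk gives up at the first scope marker.

```go
package main

import "fmt"

// entry is a hypothetical element of the list of active formatting elements.
// A scope marker is represented by an entry with Marker set.
type entry struct {
	Tag    string
	Marker bool
}

// openAnchor returns the index of the most recent open <a> entry, searching
// backwards from the end of the list. It returns -1 if a scope marker is
// reached first or if the list holds no <a> entry.
func openAnchor(list []entry) int {
	for i := len(list) - 1; i >= 0; i-- {
		if list[i].Marker {
			return -1 // the open <a>, if any, is outside the current scope
		}
		if list[i].Tag == "a" {
			return i
		}
	}
	return -1
}

func main() {
	// An <a> opened outside the table, followed by a marker for the cell.
	inCell := []entry{{Tag: "a"}, {Marker: true}}
	fmt.Println(openAnchor(inCell)) // -1

	// No table involved: the open <a> is found.
	noTable := []entry{{Tag: "b"}, {Tag: "a"}, {Tag: "i"}}
	fmt.Println(openAnchor(noTable)) // 1
}
```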
Pass tests1.dat, test 78:
<a href="blah">aba<table><tr><td><a href="foo">br</td></tr>x</table>aoe
| <html>
| <head>
| <body>
| <a>
| href="blah"
| "abax"
| <table>
| <tbody>
| <tr>
| <td>
| <a>
| href="foo"
| "br"
| "aoe"
Also pass test 79:
<table><a href="blah">aba<tr><td><a href="foo">br</td></tr>x</table>aoe
R=nigeltao
CC=golang-dev
https://golang.org/cl/5320063
2011-11-02 11:47:05 +11:00
Nigel Tao
90b76c0f3e
html: refactor the blacklist for the "render and re-parse" test.
...
R=andybalholm
CC=golang-dev, mikesamuel
https://golang.org/cl/5331056
2011-11-02 09:42:25 +11:00
Andrew Balholm
9db3f78c39
html: process </td> tags; foster parent at most one node per token
...
Correctly close table cell when </td> is read.
Because of reconstructing the active formatting elements, more than one
node may be created when reading a single token.
If both nodes are foster parented, they will be siblings, but the first
node should be the parent of the second.
Pass tests1.dat, test 77:
<a href="blah">aba<table><a href="foo">br<tr><td></td></tr>x</table>aoe
| <html>
| <head>
| <body>
| <a>
| href="blah"
| "aba"
| <a>
| href="foo"
| "br"
| <a>
| href="foo"
| "x"
| <table>
| <tbody>
| <tr>
| <td>
| <a>
| href="foo"
| "aoe"
R=nigeltao
CC=golang-dev
https://golang.org/cl/5305074
2011-11-01 11:42:54 +11:00
Andrew Balholm
604e10c34d
html: adjust bookmark in "adoption agency" algorithm
...
In the adoption agency algorithm, the formatting element is sometimes
removed from the list of active formatting elements and reinserted at a later index.
In that case, the bookmark showing where it is to be reinserted needs to be moved,
so that its position relative to its neighbors remains the same
(and also so that it doesn't become out of bounds).
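
The index arithmetic can be illustrated with a plain slice. This is only a sketch of the adjustment, not the parser's code; reinsert and its parameters are hypothetical names. The bookmark is an index into the list as it was before the removal, so it has to shift left by one when the removed element sat in front of it.

```go
package main

import "fmt"

// reinsert removes list[from] and puts it back at the position named by
// bookmark, where bookmark is an index into the list before the removal.
// It returns a new slice and leaves list untouched.
func reinsert(list []string, from, bookmark int) []string {
	elem := list[from]
	out := make([]string, 0, len(list))
	out = append(out, list[:from]...)
	out = append(out, list[from+1:]...)
	if from < bookmark {
		// Everything after from moved left by one, so the bookmark must
		// follow; otherwise it could point past the end of the slice.
		bookmark--
	}
	out = append(out, "")
	copy(out[bookmark+1:], out[bookmark:])
	out[bookmark] = elem
	return out
}

func main() {
	list := []string{"b", "i", "u", "em"}
	// Move "b" so that it ends up just before "em" (index 3 originally).
	fmt.Println(reinsert(list, 0, 3)) // [i u b em]
}
```

Without the decrement, a bookmark equal to the original length would index one past the end of the shortened list.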
Pass tests1.dat, test 70:
<DIV> abc <B> def <I> ghi <P> jkl </B>
| <html>
| <head>
| <body>
| <div>
| " abc "
| <b>
| " def "
| <i>
| " ghi "
| <i>
| <p>
| <b>
| " jkl "
Also pass tests through test 76:
<test attribute---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
R=nigeltao
CC=golang-dev
https://golang.org/cl/5322052
2011-10-29 10:51:59 +11:00
Andrew Balholm
03f163c7f2
html: don't run "adoption agency" on elements that aren't in scope.
...
Pass tests1.dat, test 55:
<!DOCTYPE html><font><table></font></table></font>
| <!DOCTYPE html>
| <html>
| <head>
| <body>
| <font>
| <table>
Also pass tests through test 69:
<DIV> abc <B> def <I> ghi <P> jkl
R=nigeltao
CC=golang-dev
https://golang.org/cl/5309074
2011-10-28 16:04:58 +11:00
Andrew Balholm
053549ca1b
html: allow whitespace text nodes in <head>
...
Pass tests1.dat, test 50:
<!DOCTYPE html><script> <!-- </script> --> </script> EOF
| <!DOCTYPE html>
| <html>
| <head>
| <script>
| " <!-- "
| " "
| <body>
| "--> EOF"
Also pass tests through test 54:
<!DOCTYPE html><title>U-test</title><body><div><p>Test<u></p></div></body>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5311066
2011-10-28 09:06:30 +11:00
Andrew Balholm
833fb4198d
html: parse <style> elements inside <head> element.
...
Also correctly handle EOF inside a <style> element.
Pass tests1.dat, test 49:
<!DOCTYPE html><style> EOF
| <!DOCTYPE html>
| <html>
| <head>
| <style>
| " EOF"
| <body>
R=nigeltao
CC=golang-dev
https://golang.org/cl/5321057
2011-10-27 10:26:11 +11:00
Andrew Balholm
bd07e4f259
html: close <option> element when opening <optgroup>
...
Pass tests1.dat, test 34:
<!DOCTYPE html>A<option>B<optgroup>C<select>D</option>E
| <!DOCTYPE html>
| <html>
| <head>
| <body>
| "A"
| <option>
| "B"
| <optgroup>
| "C"
| <select>
| "DE"
Also passes tests 35-48. Test 48 is:
</ COM--MENT >
R=nigeltao
CC=golang-dev
https://golang.org/cl/5311063
2011-10-27 09:45:53 +11:00
Andrew Balholm
05ed18f4f6
html: improve parsing of lists
...
Make a <li> tag close the previous <li> element.
Make a </ul> tag close <li> elements.
Pass tests1.dat, test 33:
<!DOCTYPE html><li>hello<li>world<ul>how<li>do</ul>you</body><!--do-->
| <!DOCTYPE html>
| <html>
| <head>
| <body>
| <li>
| "hello"
| <li>
| "world"
| <ul>
| "how"
| <li>
| "do"
| "you"
| <!-- do -->
R=nigeltao
CC=golang-dev
https://golang.org/cl/5321051
2011-10-26 14:02:30 +11:00
Andrew Balholm
6e318bda6c
html: improve parsing of tables
...
When foster parenting, merge adjacent text nodes.
Properly close table row at </tr> tag.
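
A rough sketch of the text-merging step, assuming a simplified node type with a child slice. The names node and fosterText are hypothetical and the real parser's tree differs. Text that must be foster parented is placed just before the table, and if the node already sitting there is a text node, the new text is appended to it instead of creating a sibling.

```go
package main

import "fmt"

// node is a simplified, hypothetical tree node. A node with an empty tag
// is a text node.
type node struct {
	tag      string
	text     string
	children []*node
}

// fosterText places text immediately before table among parent's children.
// If the preceding sibling is already a text node, the text is merged into it.
// If table is not a child of parent, the text is added at the end.
func fosterText(parent, table *node, text string) {
	i := 0
	for i < len(parent.children) && parent.children[i] != table {
		i++
	}
	if i > 0 && parent.children[i-1].tag == "" {
		parent.children[i-1].text += text // merge with adjacent text node
		return
	}
	parent.children = append(parent.children, nil)
	copy(parent.children[i+1:], parent.children[i:])
	parent.children[i] = &node{text: text}
}

func main() {
	table := &node{tag: "table"}
	div := &node{tag: "div", children: []*node{{text: "hello"}, table}}
	fosterText(div, table, "excite!")
	fmt.Println(len(div.children), div.children[0].text) // 2 helloexcite!
}
```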
Pass tests1.dat, test 32:
<!-----><font><div>hello<table>excite!<b>me!<th><i>please!</tr><!--X-->
| <!-- - -->
| <html>
| <head>
| <body>
| <font>
| <div>
| "helloexcite!"
| <b>
| "me!"
| <table>
| <tbody>
| <tr>
| <th>
| <i>
| "please!"
| <!-- X -->
R=nigeltao
CC=golang-dev
https://golang.org/cl/5323048
2011-10-26 11:36:46 +11:00
Andrew Balholm
2f3f3aa2ed
html: dump attributes when running parser tests.
...
The WebKit test data shows attributes as though they were child nodes:
<a X>0<b>1<a Y>2
dumps as:
| <html>
| <head>
| <body>
| <a>
| x=""
| "0"
| <b>
| "1"
| <b>
| <a>
| y=""
| "2"
So we need to do the same when dumping a tree to compare with it.
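
A minimal sketch of such a dump, using simplified stand-in types (Attr, Node, dump and dumpString are hypothetical names, not the package's API). It assumes two spaces of indentation per level after the "| " prefix and attributes sorted by name, as in the html5lib test format. Each attribute is written on its own line, one level deeper than its element and before the element's children.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Attr and Node are simplified, hypothetical stand-ins for the parser's types.
type Attr struct{ Key, Val string }

type Node struct {
	Tag      string // element name; empty for text nodes
	Text     string // content of a text node
	Attrs    []Attr
	Children []*Node
}

// dump writes n and its subtree to b, one line per node or attribute.
func dump(b *strings.Builder, n *Node, depth int) {
	indent := "| " + strings.Repeat("  ", depth)
	if n.Tag == "" {
		fmt.Fprintf(b, "%s\"%s\"\n", indent, n.Text)
		return
	}
	fmt.Fprintf(b, "%s<%s>\n", indent, n.Tag)
	// Attributes are printed like children: one level deeper, sorted by name.
	attrs := append([]Attr(nil), n.Attrs...)
	sort.Slice(attrs, func(i, j int) bool { return attrs[i].Key < attrs[j].Key })
	for _, a := range attrs {
		fmt.Fprintf(b, "%s  %s=\"%s\"\n", indent, a.Key, a.Val)
	}
	for _, c := range n.Children {
		dump(b, c, depth+1)
	}
}

// dumpString returns the dump of the tree rooted at n.
func dumpString(n *Node) string {
	var b strings.Builder
	dump(&b, n, 0)
	return b.String()
}

func main() {
	a := &Node{
		Tag:      "a",
		Attrs:    []Attr{{Key: "x"}},
		Children: []*Node{{Text: "0"}},
	}
	fmt.Print(dumpString(a))
}
```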
R=nigeltao
CC=golang-dev
https://golang.org/cl/5322044
2011-10-25 09:33:15 +11:00
Andrew Balholm
2aa589c843
html: implement foster parenting
...
Implement the foster-parenting algorithm for content that is inside a table
but not in a cell.
Also fix a bug in reconstructing the active formatting elements.
Pass test 30 in tests1.dat:
<a><table><td><a><table></table><a></tr><a></table><b>X</b>C<a>Y
R=nigeltao
CC=golang-dev
https://golang.org/cl/5309052
2011-10-23 18:36:01 +11:00
Nigel Tao
2f352ae48a
html: parse <select> tags.
...
The additional test case in parse_test.go is:
<select><b><option><select><option></b></select>X
R=andybalholm
CC=golang-dev
https://golang.org/cl/5293051
2011-10-22 20:18:12 +11:00
Nigel Tao
64306c9fd0
html: parse and render comment nodes.
...
The first additional test case in parse_test.go is:
<!--><div>--<!-->
The second one is unrelated to the comment change, but also passes:
<p><hr></p>
R=andybalholm
CC=golang-dev
https://golang.org/cl/5299047
2011-10-20 11:45:30 +11:00
Nigel Tao
b1fd528db5
html: parse raw text and RCDATA elements, such as <script> and <title>.
...
Pass tests1.dat, test 26:
#data
<script><div></script></div><title><p></title><p><p>
#document
| <html>
| <head>
| <script>
| "<div>"
| <title>
| "<p>"
| <body>
| <p>
| <p>
Thanks to Andy Balholm for driving this change.
R=andybalholm
CC=golang-dev
https://golang.org/cl/5301042
2011-10-19 08:03:30 +11:00
Andrew Balholm
c64e8e327e
html: insert implied <p> and </p> tags
...
(test # 25 in tests1.dat)
#data
<p><b><div></p></b></div>X
#document
| <html>
| <head>
| <body>
| <p>
| <b>
| <div>
| <b>
|
| <p>
| "X"
R=nigeltao
CC=golang-dev
https://golang.org/cl/5254060
2011-10-13 12:40:48 +11:00