1
0
mirror of https://github.com/golang/go synced 2024-10-05 12:21:22 -06:00
Commit Graph

16 Commits

Author SHA1 Message Date
Andrew Balholm
6e318bda6c html: improve parsing of tables
When foster parenting, merge adjacent text nodes.
Properly close table row at </tr> tag.

Pass tests1.dat, test 32:
<!-----><font><div>hello<table>excite!<b>me!<th><i>please!</tr><!--X-->

| <!-- - -->
| <html>
|   <head>
|   <body>
|     <font>
|       <div>
|         "helloexcite!"
|         <b>
|           "me!"
|         <table>
|           <tbody>
|             <tr>
|               <th>
|                 <i>
|                   "please!"
|             <!-- X -->

R=nigeltao
CC=golang-dev
https://golang.org/cl/5323048
2011-10-26 11:36:46 +11:00
Nigel Tao
18b025d530 html: remove the Tokenizer.ReturnComments option.
The original intention was to simplify the parser, in making it skip
all comment tokens. However, checking that the Go html package is
100% compatible with the WebKit HTML test suite requires parsing the
comments. There is no longer any real benefit for the option.

R=gri, andybalholm
CC=golang-dev
https://golang.org/cl/5321043
2011-10-25 11:28:07 +11:00
Andrew Balholm
2aa589c843 html: implement foster parenting
Implement the foster-parenting algorithm for content that is inside a table
but not in a cell.

Also fix a bug in reconstructing the active formatting elements.

Pass test 30 in tests1.dat:
<a><table><td><a><table></table><a></tr><a></table><b>X</b>C<a>Y

R=nigeltao
CC=golang-dev
https://golang.org/cl/5309052
2011-10-23 18:36:01 +11:00
Nigel Tao
2f352ae48a html: parse <select> tags.
The additional test case in parse_test.go is:
<select><b><option><select><option></b></select>X

R=andybalholm
CC=golang-dev
https://golang.org/cl/5293051
2011-10-22 20:18:12 +11:00
Nigel Tao
64306c9fd0 html: parse and render comment nodes.
The first additional test case in parse_test.go is:
<!--><div>--<!-->

The second one is unrelated to the comment change, but also passes:
<p><hr></p>

R=andybalholm
CC=golang-dev
https://golang.org/cl/5299047
2011-10-20 11:45:30 +11:00
Nigel Tao
b1fd528db5 html: parse raw text and RCDATA elements, such as <script> and <title>.
Pass tests1.dat, test 26:
#data
<script><div></script></div><title><p></title><p><p>
#document
| <html>
|   <head>
|     <script>
|       "<div>"
|     <title>
|       "<p>"
|   <body>
|     <p>
|     <p>

Thanks to Andy Balholm for driving this change.

R=andybalholm
CC=golang-dev
https://golang.org/cl/5301042
2011-10-19 08:03:30 +11:00
Andrew Balholm
c64e8e327e html: insert implied <p> and </p> tags
(test # 25 in tests1.dat)
#data
<p><b><div></p></b></div>X
#document
| <html>
|   <head>
|   <body>
|     <p>
|       <b>
|     <div>
|       <b>
|
|           <p>
|           "X"

R=nigeltao
CC=golang-dev
https://golang.org/cl/5254060
2011-10-13 12:40:48 +11:00
Nigel Tao
1d0c141d7d html: parse doctype tokens; merge adjacent text nodes.
The test case input is "<!DOCTYPE html><span><button>foo</span>bar".
The correct parse is:
| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <span>
|       <button>
|         "foobar"

R=gri
CC=golang-dev
https://golang.org/cl/4794063
2011-08-01 10:26:46 +10:00
Nigel Tao
5a141064ed html: parse misnested formatting tags according to the HTML5 spec.
This is the "adoption agency" algorithm.

The test case input is "<a><p>X<a>Y</a>Z</p></a>". The correct parse is:
| <html>
|   <head>
|   <body>
|     <a>
|     <p>
|       <a>
|         "X"
|       <a>
|         "Y"
|       "Z"

R=gri
CC=golang-dev
https://golang.org/cl/4771042
2011-07-21 11:20:54 +10:00
Nigel Tao
d360e0213d html: update section references in comments to the latest HTML5 spec.
R=r
CC=golang-dev
https://golang.org/cl/4699048
2011-07-13 16:53:02 +10:00
Yasuhiro Matsumoto
1e6d946594 html: parse start tags that aren't explicitly otherwise dealt with.
R=golang-dev, nigeltao
CC=golang-dev
https://golang.org/cl/4626080
2011-07-06 13:08:52 +10:00
Yasuhiro Matsumoto
054cf72b56 html: fix nesting when parsing a close tag.
R=nigeltao
CC=golang-dev
https://golang.org/cl/4636067
2011-06-30 23:16:33 +10:00
Nigel Tao
fec6ab9726 html: parse "<h1>foo<h2>bar".
R=gri
CC=golang-dev
https://golang.org/cl/3571043
2010-12-15 11:39:56 +11:00
Nigel Tao
71bd053ada html: parse <table><tr><td> tags.
Also, shorten fooInsertionMode to fooIM.

R=gri
CC=golang-dev
https://golang.org/cl/3504042
2010-12-10 12:20:14 +11:00
Nigel Tao
49014c5b12 html: handle unexpected EOF during parsing.
This lets us parse HTML like "<html>foo".

R=gri
CC=golang-dev
https://golang.org/cl/3460043
2010-12-08 08:59:20 +11:00
Nigel Tao
08a47d6f60 html: first cut at a parser.
R=gri
CC=golang-dev
https://golang.org/cl/3355041
2010-12-07 12:02:36 +11:00