PHP HTML5 parser

by **Jero** » Sun May 20, 2007 11:24 am

I've been working on an HTML parser in PHP based on the HTML5 specification. I started on it because I wanted a parser that could handle small pieces of HTML, like comments on a weblog or messages on a forum such as this one. Thanks to the specification and html5lib, I was able to get something up and running.

The script, which I dubbed PH5P (the PHP abbreviation was already taken), is written as two classes: a tokenizer class (HTML5) and a tree constructor class (HTML5TreeConstructer). The tokenizer processes the input character by character and sends the resulting tokens to the tree constructor. However, there are a few cases where the specification isn't followed. This is because the parsing algorithm in the specification expects an entire document, whereas this parser is only meant for small pieces of HTML, such as comments. On a side note, the script requires PHP5 to run.
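To give a rough idea of the tokenizer / tree constructor split described above, here is a minimal sketch. It is not PH5P's actual code: the class names (SketchNode, SketchTreeBuilder, SketchTokenizer), the token array layout, and the emit() method are all made up for illustration, and it only understands plain `<name>` and `</name>` tags, with no attributes, comments, or entities.

```php
<?php
// Illustrative only: these classes are hypothetical and are not PH5P's API.

class SketchNode
{
    public $name;
    public $text;
    public $children = array();

    public function __construct($name, $text = '')
    {
        $this->name = $name;
        $this->text = $text;
    }
}

class SketchTreeBuilder
{
    public $root;
    private $open; // stack of currently open elements

    public function __construct()
    {
        $this->root = new SketchNode('#fragment');
        $this->open = array($this->root);
    }

    // Receives one token at a time from the tokenizer.
    public function emit(array $token)
    {
        $current = end($this->open);
        if ($token['type'] === 'start') {
            $node = new SketchNode($token['name']);
            $current->children[] = $node;
            $this->open[] = $node;
        } elseif ($token['type'] === 'end') {
            // Close back to the matching open element; stray end tags are ignored.
            for ($i = count($this->open) - 1; $i > 0; $i--) {
                if ($this->open[$i]->name === $token['name']) {
                    $this->open = array_slice($this->open, 0, $i);
                    break;
                }
            }
        } else {
            $current->children[] = new SketchNode('#text', $token['data']);
        }
    }
}

class SketchTokenizer
{
    private $tree;

    public function __construct(SketchTreeBuilder $tree)
    {
        $this->tree = $tree;
    }

    // Collects text one character at a time; on '<' it reads up to the next '>'.
    public function parse($html)
    {
        $text = '';
        $len = strlen($html);
        for ($i = 0; $i < $len; $i++) {
            $char = $html[$i];
            if ($char !== '<') {
                $text .= $char;
                continue;
            }
            $close = strpos($html, '>', $i);
            if ($close === false) {
                // Unterminated tag: treat the rest of the input as text.
                $text .= substr($html, $i);
                break;
            }
            if ($text !== '') {
                $this->tree->emit(array('type' => 'character', 'data' => $text));
                $text = '';
            }
            $tag = substr($html, $i + 1, $close - $i - 1);
            if ($tag !== '' && $tag[0] === '/') {
                $this->tree->emit(array('type' => 'end', 'name' => strtolower(substr($tag, 1))));
            } else {
                $this->tree->emit(array('type' => 'start', 'name' => strtolower($tag)));
            }
            $i = $close; // continue after the '>'
        }
        if ($text !== '') {
            $this->tree->emit(array('type' => 'character', 'data' => $text));
        }
        return $this->tree->root;
    }
}

// Usage: parse a small fragment, such as a forum comment.
$tokenizer = new SketchTokenizer(new SketchTreeBuilder());
$root = $tokenizer->parse('Hello <b>world</b>!');
```

The point of the split is that the tokenizer never needs to know about nesting, and the tree builder never needs to look at raw characters. A fragment parser can then start from an empty fragment node instead of requiring a full document.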

In the future I'm planning on parsing the character tokens as well. This will, for instance, allow the text to be wrapped in P elements when needed. But for now I'd be happy to have the parser work properly, so I'd appreciate it if you could test the parser and post all feedback in this topic.

http://jero.net/lab/ph5p/

by **zcorpan** » Wed Sep 19, 2012 3:30 pm

Is this still maintained? It might be a good idea to put it on GitHub or similar so that people can contribute and file bugs.
