PHP HTML5 parser

Post by Jero » Sun May 20, 2007 11:24 am

I've been working on an HTML parser in PHP based on the HTML5 specification. I started it because I wanted a parser that could handle small pieces of HTML, such as comments on a weblog or messages on a forum like this one, and that's exactly what it does. Thanks to the specification and html5lib, I was able to get something up and running.

The script, which I dubbed PH5P (the PHP abbreviation was already taken ;)), consists of two classes: a tokenizer class (HTML5) and a tree constructor class (HTML5TreeConstructer). The tokenizer processes the input character by character and sends the resulting tokens to the tree constructor. There are a few cases where the specification isn't followed, because its parsing algorithm expects an entire document, while this parser is meant only for small pieces of HTML, such as comments. On a side note, the script requires PHP5 to run.
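
To make the tokenizer / tree constructor split concrete, here is a rough sketch of the idea. It is written in Python for brevity and is not the PH5P code: the names `tokenize`, `TreeBuilder`, `parse_fragment` and `serialize` are made up for this illustration, and the tokenizer is a heavily simplified regex that ignores attributes, comments, entities and void elements such as `br`, none of which the real HTML5 algorithm ignores.

```python
import re
from dataclasses import dataclass, field

# Hypothetical token shapes: ("start", name), ("end", name), ("chars", text).
# Matches a start/end tag, a run of text, or a stray "<" treated as text.
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)|(<)")


def tokenize(html):
    """Yield simplified tokens from a fragment of HTML."""
    for match in TOKEN_RE.finditer(html):
        slash, name, text, stray = match.groups()
        if name:
            yield ("end" if slash else "start", name.lower())
        else:
            yield ("chars", text if text is not None else stray)


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


class TreeBuilder:
    """Receives tokens one at a time and builds a fragment tree."""

    def __init__(self):
        # A fragment root replaces the html/head/body a full document would get.
        self.root = Node("#fragment")
        self.open_elements = [self.root]

    def feed(self, token):
        kind, value = token
        current = self.open_elements[-1]
        if kind == "chars":
            current.children.append(value)
        elif kind == "start":
            node = Node(value)
            current.children.append(node)
            self.open_elements.append(node)
        elif kind == "end":
            # Close everything up to the nearest matching open element;
            # an end tag with no matching open element is ignored.
            names = [n.name for n in self.open_elements]
            if value in names[1:]:
                index = len(names) - 1 - names[::-1].index(value)
                del self.open_elements[index:]


def parse_fragment(html):
    builder = TreeBuilder()
    for token in tokenize(html):
        builder.feed(token)
    return builder.root


def serialize(node):
    """Turn the tree back into markup, closing any unclosed elements."""
    if isinstance(node, str):
        return node
    inner = "".join(serialize(child) for child in node.children)
    if node.name == "#fragment":
        return inner
    return f"<{node.name}>{inner}</{node.name}>"
```

With this sketch, `serialize(parse_fragment("<b>bold <i>both</b>"))` closes the dangling `i` element before closing `b`, which is the kind of cleanup a comment parser needs. Starting from a fragment root instead of a full document is one example of where a small-input parser has to depart from a whole-document algorithm.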

In the future I'm planning on parsing the character tokens as well. This will, for instance, allow the text to be wrapped in P elements when needed. But for now I'd be happy to have the parser work properly, so I'd appreciate it if you could test the parser and post all feedback in this topic.
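
As a rough idea of what wrapping text in P elements could look like, here is a small Python sketch. It is only an assumption about the behaviour (paragraphs separated by blank lines), it works on plain text with no inline markup, and `wrap_paragraphs` is a hypothetical name, not part of PH5P.

```python
import re
from html import escape


def wrap_paragraphs(text):
    """Wrap blank-line-separated runs of plain text in <p> elements."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return "".join(
        # Escape &, < and > so the text cannot introduce markup of its own.
        f"<p>{escape(block.strip(), quote=False)}</p>"
        for block in blocks
        if block.strip()
    )
```

In a real parser this decision would be made while handling character tokens, so that text already inside a block-level element is not wrapped a second time.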
Posts: 1
Joined: Thu Mar 01, 2007 3:32 pm
Location: Rotterdam, the Netherlands

Re: PHP HTML5 parser

Post by zcorpan » Wed Sep 19, 2012 3:30 pm

Is this still maintained? It might be a good idea to put it on GitHub or a similar site so people can contribute and file bugs.
Posts: 807
Joined: Tue Feb 06, 2007 8:29 pm
Location: Sweden
