A fast implementation of the HTML 5 parsing spec for Python. Parsing
is done in C using a variant of the gumbo parser. The gumbo parse
tree is then transformed into an lxml tree, also in C, yielding
parse times that can be as low as a thirtieth of html5lib's, a
speedup of up to 30x. This differs, for instance, from the gumbo
Python bindings, where the initial parsing is done in C but the
transformation into the final tree is done in Python.
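
As a rough illustration of the description above, the following minimal
sketch parses a small document and reads a value back out of the
resulting lxml tree. It assumes the html5-parser package and lxml are
installed; extract_title is a hypothetical helper name used only for
this example and is not part of the library.

```python
def extract_title(html):
    """Return the text of the first <title> element, or None.

    Hypothetical helper for illustration; not part of html5-parser.
    """
    # Imported lazily so this module can still be loaded on systems
    # where the html5-parser package is not installed.
    from html5_parser import parse

    # parse() returns the root element of an lxml tree by default.
    root = parse(html)

    # With the default settings elements are not namespaced, so a
    # plain XPath expression is enough to find the title text.
    titles = root.xpath("//title/text()")
    return titles[0] if titles else None


if __name__ == "__main__":
    # Deliberately sloppy markup: the <p> element is never closed.
    print(extract_title("<title>Hello</title><p>unclosed paragraph"))
```

Because the result is an ordinary lxml tree, existing lxml-based code
(XPath queries, iteration, serialization) should work on it unchanged.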

WWW: https://html5-parser.readthedocs.io/