Jericho HTML Parser is a simple but powerful Java library that allows
analysis and manipulation of parts of an HTML document, including
some common server-side tags, while reproducing verbatim any
unrecognised or invalid HTML.
It also provides high-level HTML form manipulation functions.
WWW: http://jerichohtml.sourceforge.net/doc/index.html