There's a set of tests that all the parsers should pass: https://github.com/AndreasMadsen/htmlparser-benchmark. This issue is a hub for PRs adding tests.