HTML::Index on CPAN.

This is a set of Perl modules for creating, storing and searching indexes of HTML files, and it looks like a handy starting point for my HTML indexer. It seems I might be able to subclass it to use my own parser, so that it stores the markup and throws out the content. Then I could search for things like: which pages on site.com are still using font tags? Which ones call such-and-such stylesheet or JavaScript library?
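To make the idea concrete, here is a rough sketch in Python rather than Perl. It does not use HTML::Index or mirror its API. The class name `MarkupCollector`, the function `build_index`, and the sample pages are all made up for illustration. It keeps only tag names and linked stylesheet/script URLs, discards the text content, and builds an inverted index from each tag or asset to the pages that use it.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MarkupCollector(HTMLParser):
    """Hypothetical helper: records tag names and linked assets, ignores text."""

    def __init__(self):
        super().__init__()
        self.tags = set()
        self.assets = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        self.tags.add(tag)
        attr = dict(attrs)
        if tag == "script" and attr.get("src"):
            self.assets.add(attr["src"])
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and attr.get("href"):
            self.assets.add(attr["href"])


def build_index(pages):
    """Map each tag name and each asset URL to the set of pages using it."""
    tag_index = defaultdict(set)
    asset_index = defaultdict(set)
    for url, html in pages.items():
        collector = MarkupCollector()
        collector.feed(html)
        collector.close()
        for tag in collector.tags:
            tag_index[tag].add(url)
        for asset in collector.assets:
            asset_index[asset].add(url)
    return tag_index, asset_index


# Made-up sample pages, keyed by path.
pages = {
    "/old.html": '<html><head><link rel="stylesheet" href="/css/main.css">'
                 '</head><body><font color="red">Hi</font></body></html>',
    "/new.html": '<html><head><script src="/js/lib.js"></script>'
                 '</head><body><p>Hi</p></body></html>',
}
tag_index, asset_index = build_index(pages)
print(sorted(tag_index["font"]))         # pages still using font tags
print(sorted(asset_index["/js/lib.js"])) # pages loading a given script
```

With an index shaped like this, "which pages still use font tags?" becomes a single dictionary lookup.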

The real trick is going to be getting useful search results for tag combinations.
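One simple reading of "tag combinations" is co-occurrence: pages that contain all of one set of tags and none of another. Under that assumption, and given an index that maps tag names to sets of pages, the query reduces to set intersection and difference. The sketch below is Python, with a made-up function `pages_matching` and made-up sample data. Queries about nesting (say, a font tag inside a table cell) would need the parser to track its open-tag stack, which this does not attempt.

```python
def pages_matching(tag_index, all_of=(), none_of=()):
    """Return pages containing every tag in all_of and no tag in none_of."""
    required = [tag_index.get(tag, set()) for tag in all_of]
    if required:
        matches = set.intersection(*required)
    else:
        # No required tags: start from every indexed page.
        matches = set().union(*tag_index.values())
    for tag in none_of:
        matches -= tag_index.get(tag, set())
    return matches


# Hypothetical index: tag name -> pages that use it.
sample_index = {
    "font": {"/a.html", "/b.html"},
    "table": {"/a.html", "/c.html"},
    "center": {"/b.html"},
}

both = pages_matching(sample_index, all_of=["font", "table"])
font_no_center = pages_matching(sample_index, all_of=["font"], none_of=["center"])
print(sorted(both))
print(sorted(font_no_center))
```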

And don’t forget: I want to offer the results as a download in CSV or XLS format!
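The CSV half of that is easy to sketch with Python's standard `csv` module. The function name `results_to_csv`, the column names, and the sample rows are assumptions for illustration. XLS output would need a third-party library and is not shown here.

```python
import csv
import io


def results_to_csv(results):
    """Serialize (page, tag) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["page", "tag"])
    for page, tag in sorted(results):
        writer.writerow([page, tag])
    return buffer.getvalue()


# Made-up search results: (page, tag) pairs.
results = [("/old.html", "font"), ("/about.html", "center")]
csv_text = results_to_csv(results)
print(csv_text)
```

The returned string can be sent as the body of a download response; the `csv` module takes care of quoting any values that contain commas or quotes.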