Every time I see a utility like this, I think maybe I could switch to producing some materials in HTML as the primary, or main intermediary, source format. Then I try the utility and realize that that would be silly.
For example, I currently make PDF slides for talks. In theory I'd like to make HTML slides, but would still like the ability to render a PDF for a robust record. However, neither this utility nor PhantomJS (which I just tried) immediately does a good job of converting something like: http://bit-player.org/deck.js/limits-to-growth-Harvard-2012-...
EDIT: also just tried cutycapt, with similar results to wkhtmltopdf (got all slides rather than just the visible one, with bad page breaks, and no TeX maths).
I take some of it back. Getting the latest version of wkhtmltopdf and telling it to wait (probably longer than necessary) for the JavaScript to run works pretty well.
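For anyone who wants to try the same thing, here is a rough sketch of the "tell it to wait" approach, wrapped in Python. It assumes wkhtmltopdf is installed and on the PATH, and uses its `--javascript-delay` option (a delay in milliseconds). The function names, the example URL, the output filename, and the 5000 ms default are all placeholders of mine, not values I actually tuned.

```python
import subprocess


def build_wkhtmltopdf_command(url, output_pdf, delay_ms=5000):
    """Build the argument list for a wkhtmltopdf run that waits for JavaScript.

    delay_ms is an illustrative default; the right value depends on the page.
    """
    return [
        "wkhtmltopdf",
        "--javascript-delay", str(delay_ms),  # wait this long before rendering
        url,
        output_pdf,
    ]


def render(url, output_pdf, delay_ms=5000):
    """Run wkhtmltopdf; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_wkhtmltopdf_command(url, output_pdf, delay_ms), check=True)


if __name__ == "__main__":
    # Hypothetical slide deck URL and output name, for illustration only.
    render("http://example.com/slides.html", "slides.pdf", delay_ms=10000)
```

Whether the delay is long enough for a given deck (MathJax in particular can be slow) is something you would have to find by trial and error.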
and actually work (create a sensible PDF representation of what I can see in a browser). So my feedback wouldn't be useful, as my use case is out of scope for your project: "PyXML2PDF is NOT compatible with any XHTML/HTML/CSS. It uses a small set of tags to quickly allow generation of PDFs."
Would it be sufficient to create PNGs of the web pages and extract the text of the webpage to place in the background of a PDF file (for search, screenreading)?