web2pdf – A tool to export bookmarks to PDF

Here’s a follow-up to my earlier post on archiving. I spent a couple of days putting together a quick Python app to meet my needs.

Here it is: web2pdf.

Once installed, the only configuration it needs is a bookmarks.html file on the filesystem. It reads the file, stores the links in an SQLite DB and starts saving a PDF version of each link therein.
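Here is a rough sketch of that first step, not the actual web2pdf code. It assumes the usual browser export format, where each bookmark is an `<A HREF="...">` tag, and a made-up `bookmarks` table with `url`, `title` and `status` columns. The names `BookmarkParser` and `load_bookmarks` are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (url, title) pairs from the <A HREF=...> tags of a bookmarks file."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def load_bookmarks(html_text, conn):
    """Store every link in the (assumed) bookmarks table; return the link count."""
    parser = BookmarkParser()
    parser.feed(html_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookmarks ("
        "url TEXT PRIMARY KEY, title TEXT, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    # INSERT OR IGNORE keeps rows (and their status) from earlier runs intact.
    conn.executemany(
        "INSERT OR IGNORE INTO bookmarks (url, title) VALUES (?, ?)",
        parser.links,
    )
    conn.commit()
    return len(parser.links)
```

Making the URL the primary key means re-importing the same bookmarks.html does not create duplicate rows or reset links that were already processed.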

You can kill the script and re-run it at a later point in time and it will continue where it stopped. The output looks like this:

(pdf) bash-4.3 ~/code/web2pdf/web2pdf$ ./web2pdf.py 
Found 2599 links in the bookmark file
Found 2599 rows in the bookmark db
..of which 81 links are already saved
..and 2506 are pending
Hit enter to start downloading pending PDFs
Downloading https://www.quantamagazine.org/20170207-bell-test-quantum-loophole/ | experiment-reaffirms-quantum-weirdness
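The stop-and-resume behaviour comes down to recording progress in the DB after every link. A minimal sketch of the idea, assuming a `bookmarks` table with `url` and `status` columns (my illustration here, not necessarily the real schema) and a hypothetical `render` callable that writes the PDF for one URL:

```python
import sqlite3


def summarize(conn):
    """Return (total, saved, pending) counts, like the start-up summary."""
    total = conn.execute("SELECT COUNT(*) FROM bookmarks").fetchone()[0]
    saved = conn.execute(
        "SELECT COUNT(*) FROM bookmarks WHERE status = 'saved'"
    ).fetchone()[0]
    pending = conn.execute(
        "SELECT COUNT(*) FROM bookmarks WHERE status = 'pending'"
    ).fetchone()[0]
    return total, saved, pending


def download_pending(conn, render):
    """Render each pending URL, committing after every success.

    `render` is a hypothetical callable standing in for the PDF export.
    """
    pending = conn.execute(
        "SELECT url FROM bookmarks WHERE status = 'pending'"
    ).fetchall()
    for (url,) in pending:
        render(url)
        conn.execute(
            "UPDATE bookmarks SET status = 'saved' WHERE url = ?", (url,)
        )
        conn.commit()  # progress made so far survives a kill after this point
```

Because each success is committed on its own, an interrupted run loses at most the link that was in flight, and the next run simply selects whatever is still pending.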

There are more details at the GitHub link. I’m not happy with how slow it is, but I seem to be limited by the library I’m using and by the use-case itself: fetching a page is trivial, but the page has to be rendered before it can be exported.

As always, there is more to do, but it works pretty well already. It tags failed bookmarks separately in the DB in case they need retrying later. I’ve tried to speed it up using Python 3’s native async/await, but the performance improvement is not noticeable so far. I’ll try multiple processes instead and commit whichever approach works better.
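To make the failed tagging and the process-based plan concrete, here is one way it could look. This is a sketch under assumptions, not what web2pdf does: the `bookmarks` table with `url` and `status` columns is made up, `render_to_pdf` is a hypothetical placeholder for the slow render-and-export step, and the standard library's `concurrent.futures` is my choice of tool for spreading the work out.

```python
import sqlite3
from concurrent.futures import as_completed


def render_to_pdf(url):
    """Hypothetical placeholder for the slow render-and-export step."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"unsupported URL: {url}")


def save_pending(conn, executor, render=render_to_pdf):
    """Render every pending bookmark on the executor; record saved or failed."""
    urls = [
        row[0]
        for row in conn.execute(
            "SELECT url FROM bookmarks WHERE status = 'pending'"
        )
    ]
    futures = {executor.submit(render, url): url for url in urls}
    for future in as_completed(futures):
        # An exception in the worker marks the bookmark as failed, not saved.
        status = "failed" if future.exception() else "saved"
        conn.execute(
            "UPDATE bookmarks SET status = ? WHERE url = ?",
            (status, futures[future]),
        )
        conn.commit()  # all DB writes happen here, in the calling thread


def retry_failed(conn):
    """Put failed bookmarks back in the pending queue for another attempt."""
    conn.execute(
        "UPDATE bookmarks SET status = 'pending' WHERE status = 'failed'"
    )
    conn.commit()
```

Since `save_pending` accepts any executor, the same code can be run with `concurrent.futures.ThreadPoolExecutor` or `concurrent.futures.ProcessPoolExecutor` to compare the two. With a process pool the `render` function has to be a picklable, module-level function, and on platforms that spawn workers the call should sit under an `if __name__ == "__main__":` guard.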
