I’ve been gathering interesting links on the Coronavirus epidemic. Here are the ones that stand out from the rest:
Coronavirus: Why You Must Act Now – A well-circulated summary with some great graphs. The main takeaway is that the real case count is far higher than what is being reported, so it is better to act now rather than wait. It also argues that travel restrictions will at most delay things by a few days, and that social distancing is the best way to contain the spread.
Seeing the Smoke – A rationalist argues that it makes sense to prep NOW rather than wait for the slow engines of the government to move. This post inspired me to start working from home without waiting for an official mandatory recommendation from my company.
How quickly does this virus spread, compared to other diseases? The R0 (R-naught) number gives this estimate. Ed Yong has a good explainer on what the number is estimated to be, and why it is not very easy to measure.
I’ve seen the usual conspiracy theories doing the rounds about this being a human-engineered bioweapon. This paper, specifically, was getting some traction. If anyone sends that to you, tell them that the paper has already been retracted.
So, about a year back I covered my bookmarking workflow. In short, I was using Evernote and Google Drive to store PDF versions of links that interested me. One, it prevented link rot in case the site went down at a later point. Two, I wanted full text search over the content of the pages, not just the title and tags.
I eventually stopped using Evernote because its web interface is rubbish. I used a tool I wrote to download PDFs for around 2000 bookmarks and dumped them in Google Drive. That folder is now reaching 10GB in size.
I’ve now come to the depressing realization that none of this effort was of any use. When I need to dig through this archive and recollect something, there is so much noise that I don’t immediately get what I’m looking for. Or, as it often turns out, I hadn’t archived that page at all because I didn’t think I’d need it later.
The few PDFs that are actually useful to my reading style are the weekly LWN editions and other magazine-style PDFs like CACM, because I can save them to an ‘Incoming’ folder and read them at my leisure on my commute. But general web bookmarking doesn’t seem to be useful here.
So I’m changing tools again, to another old favourite: Diigo. It has a decent interface, supports full text search, and has a nice outliner tool to organize links and take notes. No idea if this plan will stick for long, as nothing in this area ever does, but let’s see.
Once installed, the configuration simply expects a bookmarks.html file on the filesystem. The script reads it, stores the contents in an SQLite DB and starts saving PDF versions of each link therein.
You can kill the script and re-run it at a later point in time and it will continue where it stopped. The output looks like this:
(pdf) bash-4.3 ~/code/web2pdf/web2pdf$ ./web2pdf.py
Found 2599 links in the bookmark file
Found 2599 rows in the bookmark db
..of which 81 links are already saved
..and 2506 are pending
Hit enter to start downloading pending PDFs
Downloading https://www.quantamagazine.org/20170207-bell-test-quantum-loophole/ | experiment-reaffirms-quantum-weirdness
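The import-and-resume behaviour described above could be sketched roughly like this. This is not the actual web2pdf code: the `bookmarks` table, its `status` column and the names `BookmarkParser` and `sync_bookmarks` are assumptions made up for illustration, and it uses only the standard library.

```python
import sqlite3
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (url, title) pairs from the <A HREF=...> tags of a bookmarks.html export."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (url, title) pairs
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def sync_bookmarks(conn, html_text):
    """Insert links not yet in the DB as 'pending'; return (links_found, pending_count)."""
    parser = BookmarkParser()
    parser.feed(html_text)
    # Hypothetical schema: one row per URL plus a status flag.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookmarks ("
        "url TEXT PRIMARY KEY, title TEXT, status TEXT NOT NULL DEFAULT 'pending')"
    )
    # INSERT OR IGNORE leaves existing rows (and their status) untouched,
    # which is what lets a re-run continue where the last one stopped.
    conn.executemany(
        "INSERT OR IGNORE INTO bookmarks (url, title) VALUES (?, ?)", parser.links
    )
    conn.commit()
    pending = conn.execute(
        "SELECT COUNT(*) FROM bookmarks WHERE status = 'pending'"
    ).fetchone()[0]
    return len(parser.links), pending
```

Calling `sync_bookmarks(sqlite3.connect("bookmarks.db"), open("bookmarks.html").read())` a second time adds nothing new, so killing and restarting the script only leaves the still-pending rows to work through.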
There are more details in the GitHub link. I’m not happy with how slow it is but I seem to be limited by the library I’m using and the use-case itself: fetching a page is trivial but it has to render it before exporting it.
As always there is more to do but it works pretty well already. It tags failed bookmarks separately in the DB in case it needs retrying later. I’ve tried to speed it up using Python 3’s native async/await, but the performance improvements are not noticeable so far. I’ll try with multiprocess instead and commit whichever one works better.
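Tagging failures so they can be retried later might look something like the sketch below. It assumes a hypothetical `bookmarks` table with `url` and `status` columns (not necessarily what web2pdf uses), and `render` stands in for whatever function actually produces the PDF.

```python
import sqlite3


def process_pending(conn, render):
    """Try every pending bookmark once and record 'saved' or 'failed' per row.

    `render` is any callable that takes a URL and raises on failure.
    Returns (saved_count, failed_count).
    """
    urls = [row[0] for row in conn.execute(
        "SELECT url FROM bookmarks WHERE status = 'pending'")]
    saved = failed = 0
    for url in urls:
        try:
            render(url)
            status = "saved"
            saved += 1
        except Exception:
            # Deliberately broad: one bad page should not stop the whole run.
            status = "failed"
            failed += 1
        conn.execute("UPDATE bookmarks SET status = ? WHERE url = ?", (status, url))
        conn.commit()  # commit per link so an interrupted run loses nothing
    return saved, failed


def requeue_failed(conn):
    """Move failed bookmarks back to 'pending' for another attempt."""
    cur = conn.execute(
        "UPDATE bookmarks SET status = 'pending' WHERE status = 'failed'")
    conn.commit()
    return cur.rowcount
```

Keeping 'failed' distinct from 'pending' means a normal re-run skips known-bad links, and `requeue_failed` is the explicit way to give them another go.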
I read/skim a lot of news on the internet and my current major problem is information retrieval. For a while I used a combination of Pinboard with Pocket, and later, Diigo with Pocket.
Eventually I realized that it was painfully hard to recollect something I’d come across a few months back. I don’t use too many tags and what really mattered was the full text search.
The other problem was link rot. For both reasons I wanted an archiving service that would cache the page and let me search through and see the content even if it disappeared down the line. Pinboard has one, but my next problem/solution was PDFs.
Files (PDFs) instead of links
I have a few hundred PDFs I’ve collected along the way. It would be a pain to upload each of them somewhere just to archive it. So the next option I investigated was one that would combine the two approaches. I already use Google Drive and the web interface has decent search functionality. What if I could archive my remaining bookmarks as PDFs and just dump them to my Drive?
And along the way.. Evernote
The last piece is Evernote. I started using it some time back and have already imported my bookmarks and PDFs to it, but at this point it only has the title, tags and URL for the links. It has pretty decent full text search as well. I’ve started storing a lot of other notes in it, with cross-references to other notes and links.
Where do I go from here? Evernote is actually pretty good except for the nagging fear of what would happen if it ever shut down.
So my current hobby project has two pieces to it.
Import a bookmarks.html file and convert every link in it to a PDF. I’m picking Python for this and have found WeasyPrint, which seems to do the main conversion pretty well.
Once I have a few thousand PDFs in my filesystem instead of links, find a nice, cross-platform way to search through them. Evernote still works perfectly fine here so I’m not too keen on doing this immediately.
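For the first piece, the core conversion with WeasyPrint can be quite small. A minimal sketch, assuming WeasyPrint is installed; the helper names `slugify` and `save_pdf` and the `pdfs` output folder are made up for illustration:

```python
import re
from pathlib import Path


def slugify(title):
    """Turn a page title into a filesystem-friendly file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def save_pdf(url, title, out_dir="pdfs"):
    """Render `url` to <out_dir>/<slug>.pdf with WeasyPrint and return the path."""
    # Imported here so slugify() stays usable without WeasyPrint installed.
    from weasyprint import HTML

    out_path = Path(out_dir) / f"{slugify(title)}.pdf"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # WeasyPrint fetches the page and lays it out itself; it does not run
    # JavaScript, so script-heavy pages may come out incomplete.
    HTML(url=url).write_pdf(str(out_path))
    return out_path
```

Something like `save_pdf("https://example.com/", "Example Domain")` would then write `pdfs/example-domain.pdf`.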
In summary: Evernote is cool. The web clipper works great from browsers (not so much from the Android app). The native Windows client is pretty nice too, and it’s lightning fast to take a quick note and organize it later.
PDFs are probably going to be the main thing I’ll base my system on. I don’t have to depend on a closed-source service going down, nor do I have to worry about the source itself disappearing.
This week I stopped using my local bookmarking tool (Insipid), and moved over to Pinboard. I’ve heard extremely good things about it over the years, and their blog is always a fun read. I like their pay-for-use, non-ad-supported model as well.
The pros: the site is functional, and fast. The antisocial, private bookmarking focus is a great plus. My import worked smoothly, except that the tags weren’t showing up initially. They did pop up a day or so later, so no harm done. There seems to be nice integration with other sites like Readability and Twitter, where Pinboard auto-pulls the links in them and tags them unread. Nice. The bookmarklets work as expected. Worth noting is the ‘read later’ one, which silently saves the current page without getting in your way with a popup window.
The cons: Nothing comes to mind really. I’ve not used it enough to have any real complaints, but I’d like to see how good the search is some day when I’m searching for something that’s on the tip of my tongue. Overall the site comes hugely recommended.