Ten gigs of worthless PDFs

So, about a year back I covered my bookmarking workflow. In short, I was using Evernote and Google Drive to store PDF versions of links that interested me. One, it prevented link rot in case the site went down at a later point. Two, I wanted full text search over the content of the pages, not just the title and tags.

I eventually stopped using Evernote because its web interface is rubbish. I used a tool I wrote to download PDFs for around 2000 bookmarks and dumped them in Google Drive. That folder is now reaching 10GB in size.

I’ve now come to the depressing realization that none of this effort was of any use. When I need to dig through this archive and recollect something, there is so much noise that I don’t immediately get what I’m looking for. Or, as it often turns out, I hadn’t archived that page at all because I didn’t think I’d need it later.

The few PDFs that are actually useful to my reading style are the weekly LWN editions and other magazine-style PDFs like CACM, because I can save them to an ‘Incoming’ folder and read them at my leisure on my commute. General web bookmarking just doesn’t seem to fit that pattern.

So I’m changing tools again, to another old favourite: Diigo. It has a decent interface, supports full text search, and has a nice outliner tool to organize links and take notes. No idea if this plan will stick for long, as nothing in this area ever does, but let’s see.

The trials of a link hoarder

Early days – Pinboard and friends

I read/skim a lot of news on the internet and my current major problem is information retrieval. For a while I used a combination of Pinboard with Pocket, and later, Diigo with Pocket.

(Screenshot: 4k bookmarks on Pinboard 🙁)

Eventually I realized that it was painfully hard to recollect something I’d come across a few months back. I don’t use too many tags and what really mattered was the full text search.

The other problem was link rot. For both reasons I wanted an archiving service that would cache the page and let me search through and see the content even if it disappeared down the line. Pinboard has one, but my next problem/solution was PDFs.

 

Files (PDFs) instead of links

I have a few hundred PDFs I’ve collected along the way. It would be a pain to upload each of them somewhere just to archive it. So the next option I investigated was one that would combine the two approaches. I already use Google Drive and the web interface has decent search functionality. What if I could archive my remaining bookmarks as PDFs and just dump them to my Drive?

And along the way... Evernote

(Screenshot: the Evernote native client)

The last piece is Evernote. I started using it some time back and have already imported my bookmarks and PDFs into it, but at this point it only has the title, tags and URL for the links. It has pretty decent full text search as well. I’ve started storing a lot of other notes in it, with cross-references to other notes and links.

Where do I go from here? Evernote is actually pretty good except for the nagging fear of what would happen if it ever shut down.

DIY

So my current hobby project has two pieces to it.

  1. Import a bookmarks.html file and convert every link in it to a PDF. I’m picking Python for this and have found WeasyPrint, which seems to do the main conversion pretty well.
  2. Once I have a few thousand PDFs in my filesystem instead of links, find a nice, cross-platform way to search through them. Evernote still works perfectly fine here so I’m not too keen on doing this immediately.
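The first piece could look roughly like the sketch below. It is only an illustration, not the tool itself: it assumes the export is a Netscape-style bookmarks.html (the format most browsers produce), and the names `BookmarkParser`, `extract_links`, `pdf_name` and `convert_all` are made up for this example. The only WeasyPrint call used is `HTML(url=...).write_pdf(...)`, and it is imported lazily so the parsing half can be tried without WeasyPrint installed.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class BookmarkParser(HTMLParser):
    """Collect (url, title) pairs from a Netscape-style bookmarks.html."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (url, title) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(("http://", "https://")):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip() or self._href
            self.links.append((self._href, title))
            self._href = None


def extract_links(html_text):
    """Return every http(s) bookmark in the export as (url, title)."""
    parser = BookmarkParser()
    parser.feed(html_text)
    return parser.links


def pdf_name(title, max_len=80):
    """Turn a bookmark title into a filesystem-friendly PDF file name."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    return (slug[:max_len] or "untitled") + ".pdf"


def convert_all(bookmarks_file, out_dir):
    """Render each bookmark to a PDF in out_dir, skipping pages that fail."""
    from weasyprint import HTML  # third-party; imported only when converting

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text = Path(bookmarks_file).read_text(encoding="utf-8", errors="replace")
    for url, title in extract_links(text):
        try:
            # Note: two bookmarks with the same title would overwrite each other.
            HTML(url=url).write_pdf(str(out / pdf_name(title)))
        except Exception as exc:  # dead links, timeouts, odd markup
            print(f"skipped {url}: {exc}")
```

Calling `convert_all("bookmarks.html", "pdfs")` would then walk the whole export. Dead links are simply reported and skipped, which seems like the right default for a few thousand bookmarks of mixed age.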

In summary: Evernote is cool. The web clipper works great from browsers (not so much from the Android app). The native Windows client is pretty nice too, and it’s lightning fast to take a quick note and organize it later.

PDFs are probably going to be the main thing I’ll base my system on. I don’t have to depend on a closed-source service going down, nor do I have to worry about the source itself disappearing.

Spring cleaning: cancelling my online subscriptions

Having two kids can be hard on the wallet sometimes. And I’m never happy unless I’m switching browsers, servers, services or distros every few months. So my current effort has been to track all my online expenses and terminate them with extreme prejudice.

Here is a list of all the things I’ve been using, and what I decided to do with them. First, the ones I didn’t cancel:

  • The Browser – Still my favourite source for curated long reads.
  • LWN – And my favourite source of Linux news.
  • Saavn – This is only Rs. 100 a month so I wouldn’t save much by cancelling it. My favourite source for music streaming/download.
  • Google Drive – I pay $2 a month for 100GB, and this is too important to cancel. All my kids’ pics are here.
  • Gandi – They’re my registrar for this website. The site is hosted on Blogger so that part is already free.
  • LastPass – The next service I cancel will probably be this. But it’s not too expensive, and integrates pretty well with my lifestyle (multiple machines and phone), so I’ve become rather tied to it.

And here are the ones I have sadly had to cancel:

  • Fastmail – I moved back to Gandi’s email service. The webmail is not that great but I can always use a dedicated client. I’ve also moved several newsletter subscriptions to my Gmail account to reduce the traffic here.
  • ACM – I initially got this only for the Safari account that came with it. But my best study time is during my commute, when I never have any connectivity. Cancelled.
  • Linux Journal – A decent magazine that I’ve subscribed to for the past couple of years.
  • Marvel Unlimited – A great service. But after the first couple of months, I found myself using it less and less.
  • rsync – I didn’t know I was still paying for this until I came across the recurring payment page in PayPal. Cancelled. Good service, but it didn’t fit in my workflow.
  • Newsblur – Another good service. I switched to a free alternative (Digg)
  • Magzter – Got a year’s worth cheap, but never really got used to it. Magazines don’t look good on the phone screen. So many things to read, so little time!
  • HotStar – I got this only for Game of Thrones. They do have The Wire and a few other good HBO shows.
  • Netflix – No time.
  • FSF – This one felt bad, because this is the only non-profit I donate to. But the $10 a month did add up to a lot more than many of the smaller ones in this list.
  • DigitalOcean – I had a VPS here for playing around with new tools. My web-to-email tools were also hosted here. A toy VPS is the hardest to part with 🙁
  • AWS – My free tier expired so I got rid of this asap.

Not bad for a few days’ worth of digging around and moving stuff, I guess. I think I’ve saved around Rs 30000 annually with this. But knowing my nature, it won’t be long before I slowly start resubscribing to some of these. A low-end VPS is going to be high on that list.

Update: added Netflix and the FSF.

Flask trial run on PythonAnywhere

I felt it was about time I got into the web development side of Python, having learned enough of it to be dangerous. Since I moved my site to a PaaS, I wanted a quick and dirty alternative Python host that was easy to set up and use. Enter PythonAnywhere. This seemed to fit my immediate needs, as the site provides web-based bash, Python and IPython shells for free users. They support both Python 2 and 3, so I went with the latter.

Next up, Flask. Their quickstart tutorial was what I used as a baseline. It seemed easier to start with than heavier alternatives like Django. Although the tutorial covers a minimal blog, I was able to make enough tweaks to it to get what I wanted.

Finally, a purpose. My wife is about a month away from delivering our first baby (oh let it be a girl, please 🙂 ), so I decided to make a simple site where the two of us could enter names that we wanted, for both genders. Each entry is equivalent to a blog post in the Flask tutorial, so the underlying code remained largely the same. It was easy enough to add the rest of the parts I needed. Here then is what it ended up looking like, and here’s a screenshot if I end up taking down the site later:

(Screenshot: the babynames site)

PythonAnywhere turned out to be a great experience. The founders were friendly enough to exchange a couple of mails directly, which was a fresh change from the noreply@website.com welcome mails that other sites favour. Getting the site up from my local test setup was a simple matter of ftp’ing a tar file over and extracting it.

Overall it took me less than half a day to get everything up, and another few hours of tweaking to add cute pictures and stuff.
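For anyone wanting to try something similar, here is a minimal sketch of how such a names site could be put together. It is not the code behind the actual site: the table layout, the `girl`/`boy` values, the route names and every function name below are assumptions made for illustration. The storage half uses only the standard library's sqlite3, and Flask is imported inside `create_app` so the storage functions can be tried on their own.

```python
import sqlite3

# Hypothetical schema: one row per suggested name, much like one row
# per blog entry in the Flask tutorial.
SCHEMA = """
CREATE TABLE IF NOT EXISTS names (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    gender TEXT NOT NULL CHECK (gender IN ('girl', 'boy'))
)
"""

# Inline template to keep the sketch in one file.
PAGE = """
<h1>Baby names</h1>
{% for gender, entries in names.items() %}
  <h2>{{ gender }}</h2>
  <ul>{% for n in entries %}<li>{{ n }}</li>{% endfor %}</ul>
{% endfor %}
<form method="post" action="{{ url_for('add') }}">
  <input name="name">
  <select name="gender"><option>girl</option><option>boy</option></select>
  <button>Add</button>
</form>
"""


def init_db(path=":memory:"):
    """Open the database and make sure the names table exists."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def add_name(db, name, gender):
    """Store one suggestion; the CHECK constraint rejects unknown genders."""
    db.execute(
        "INSERT INTO names (name, gender) VALUES (?, ?)",
        (name.strip(), gender),
    )
    db.commit()


def names_by_gender(db):
    """Return {'girl': [...], 'boy': [...]} in insertion order."""
    grouped = {"girl": [], "boy": []}
    for name, gender in db.execute("SELECT name, gender FROM names ORDER BY id"):
        grouped[gender].append(name)
    return grouped


def create_app(db_path="names.db"):
    """Wire the storage functions into a two-route Flask app."""
    from flask import Flask, redirect, render_template_string, request, url_for

    app = Flask(__name__)

    @app.route("/")
    def index():
        db = init_db(db_path)  # one short-lived connection per request
        try:
            return render_template_string(PAGE, names=names_by_gender(db))
        finally:
            db.close()

    @app.route("/add", methods=["POST"])
    def add():
        db = init_db(db_path)
        try:
            add_name(db, request.form["name"], request.form["gender"])
        finally:
            db.close()
        return redirect(url_for("index"))

    return app
```

Running `create_app().run(debug=True)` locally should be enough to poke at it. Opening a fresh connection per request is wasteful but keeps the sketch free of threading surprises with sqlite3.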

Self-hosted Read-It-Later alternative

I was addicted to Pocket for a few months because of how well it fit into my reading habits. It had an elegant extension that basically allowed you to click-and-forget while all the magic happened in the background. The service filters all the trash from a page (typically a long article curated by the likes of The Browser and Longform) and neatly synchronizes content with other endpoints (my Android tablet and phone in my case). This was great for my evening commute back home because I’d usually have half a dozen or so interesting articles to read in a neat, clean page.

My very addiction to the service is what led me to explore other alternatives. Poche is the one I discovered and eventually settled on. It works just like Pocket as far as the content display goes, the major selling point being its open source nature. The devs have come up with a hosted solution for those who don’t run their own servers. Although saving and displaying work similarly to the alternatives, sync does not. Your articles will still be in your Poche, but the Android app will not cache them offline as of now.

I’ve been running their self-hosted version for more than a month and it has become a part of my daily routine now. I highly recommend supporting the devs for coming up with a great open source alternative in a niche that was largely proprietary until now.