This post is part of my Data For The Ages category, which groups all my attempts at making sure the data I use will be minimally readable in the long term.
Update: Those in a hurry (and with some Python expertise) might want to play around with Projects/MailArchive, which is my first pass at archiving web pages in a standard MIME container (including inline images).
Prompted by another round of enthusiastic posts regarding Yojimbo and finding myself looking for a way to easily keep track of research material, I decided to take another look at it - which included exploring its innards a bit.
Under The Kimono
It turns out that Yojimbo stores all of its stuff in a SQLite database at /Users/you/Library/Application Support/Yojimbo/Database.sqlite, which is a clean, fast and native way to do this kind of thing in Mac OS X. And it ought to be portable across platforms - if you bother to tease out the schema, of course, as well as deal with the internal formats.
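Teasing out the schema does not need a GUI tool. The following is a minimal sketch using Python's standard sqlite3 module; the function name describe_schema is my own, and it makes no assumptions about which tables Yojimbo actually creates. It opens the file read-only and lists every table with its columns and declared types.

```python
import sqlite3
from pathlib import Path


def describe_schema(db_path):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite file."""
    # Open read-only through a file: URI so the application's data is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # Quote the identifier, since PRAGMA statements cannot take bound parameters.
            quoted = '"' + table.replace('"', '""') + '"'
            # Each row is (cid, name, type, notnull, dflt_value, pk).
            info = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[table] = [(col[1], col[2]) for col in info]
        return schema
    finally:
        conn.close()
```

Pointing it at the Database.sqlite path above (ideally at a copy, made while the application is closed) and printing the result should show which columns are declared as BLOB.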
Using SQLiteManager (demo, limited to 20 rows per query), it took me no time at all to figure out that Yojimbo stores stuff like web pages as BLOBs, which has the advantages of being both simple to implement and retaining the original format in its maximum fidelity - but which is, well, as non-portable as you can possibly make it.
Data for the Ages? I don't think so.
You see, although it's perfectly feasible to throw a PDF file into Yojimbo and have some hope of getting it out later - since the binary format is more or less well known, and it should be readable ten years from now - I'm somewhat skeptical of the ability of any non-Mac software to read an Apple "web archive" (which is not RFC 2387 compliant), or any of the other data you stick in it.
To Apple's credit, the .webarchive format seems to be reasonably straightforward - it is essentially the serialization of an NSDictionary, storing the HTML and all inline imagery, etc. as a binary property list (including some of the HTTP headers from the server for each chunk). It is, however, as proprietary as you can get.
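Since a property list is all it is, Python's standard plistlib can at least open one. The sketch below is hedged: the key names (WebMainResource, WebSubresources, WebResourceURL, WebResourceMIMEType, WebResourceData) are the ones commonly seen in Safari web archives, not something Apple documents as a stable format, and the function name is my own.

```python
import plistlib


def list_webarchive_resources(data):
    """Return (url, mime_type, size_in_bytes) for each resource in a web archive.

    `data` is the raw bytes of a .webarchive file (or of a BLOB holding one).
    """
    # plistlib.loads auto-detects binary and XML property lists.
    archive = plistlib.loads(data)
    # ASSUMPTION: these key names are as observed in Safari archives, not guaranteed.
    resources = [archive["WebMainResource"]]
    resources += list(archive.get("WebSubresources", []))
    return [
        (
            res.get("WebResourceURL"),
            res.get("WebResourceMIMEType"),
            len(res.get("WebResourceData", b"")),
        )
        for res in resources
    ]
```

Being able to list the parts is a long way from a portable format, of course, but it does mean the data is recoverable as long as something can still parse property lists.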
So, after having some fun throwing a few things into Yojimbo and dumping the ZBLOB.ZBYTES field to confirm my theory, I started wondering precisely what its added value was.
Which is pretty obvious, really - it's damn simple to use, lets you tag and encrypt items, and lets you find things quickly (as well as slice and dice your data in views of your own choosing).
I didn't look into the encryption used, but I assume it's using the native Mac OS X AES stuff and see no reason to tinker with it, although I'm curious to see how it interacts with search (it shouldn't store cleartext indexes of encrypted data, for one thing).
So Yojimbo is fine if you only use Macs (which I don't, not exclusively), and will probably be more than enough for most people.
For me, however, the allure quickly faded away.
Why Use Another Wheel?
As it turns out, I've got two Mac-native applications that let me:
- File, group and search arbitrary data
- Store it centrally on a server
- Flag items of interest
- Share it in standard formats with other platforms (and people!)
And, with some add-ons, either of them can even do:
- Tagging
- Cross-platform Encryption
...provided, of course, you are willing to jump through some hoops.
And the first application is (you might be surprised to know)... Mail.app - plus an IMAP server.
The Rationale
Having kept all my stuff in an IMAP store for years, I'm used to storing a bunch of stuff in it, as well as accessing and searching it from a number of mail clients - and provided you put stuff in the right way, I've found I can always get it out again regardless of the platform I'm using.
So, let's go through the list above, shall we?
- Quicksilver's "Email To... (Compose)" makes it trivial to create a draft message from anything, which can then be saved, dragged and filed in an IMAP folder.
- You can e-mail a PDF of anything straight from the Print dialog box.
- Safari has a "Mail Contents Of This Page" option.
Regardless of how you create the draft, the act of "filing" itself can be performed and enhanced with an AppleScript custom action that creates the message, saves it to the server (without actually e-mailing it anywhere, of course) and moves it somewhere else (I'm tinkering with that at the moment, and will post the results when I'm happy with them).
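The "save to the server without e-mailing it" step is not specific to AppleScript: it is what the IMAP APPEND command does. As a rough cross-platform equivalent, here is a sketch using Python's standard imaplib; the helper names, the example address and the "Archive" mailbox are all hypothetical.

```python
import imaplib
import time
from email.message import EmailMessage


def make_note(subject, body, sender="archive@example.org"):
    """Build a plain-text message; the address is a placeholder, nothing is sent."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = sender
    msg.set_content(body)
    return msg


def file_message(imap, mailbox, msg):
    """Store msg in mailbox over an already authenticated IMAP connection.

    Uses IMAP APPEND, so the message goes straight into the folder
    and never passes through an SMTP server.
    """
    status, response = imap.append(
        mailbox,
        r"(\Seen)",                              # file it as already read
        imaplib.Time2Internaldate(time.time()),  # internal date: now
        msg.as_bytes(),
    )
    if status != "OK":
        raise RuntimeError(f"APPEND to {mailbox!r} failed: {response!r}")
    return response


# Usage (host, credentials and mailbox name are placeholders):
#   imap = imaplib.IMAP4_SSL("imap.example.org")
#   imap.login("user", "password")
#   file_message(imap, "Archive", make_note("Some page", "Notes go here"))
#   imap.logout()
```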
As to grouping, you can group stuff by folders (obviously) or create search folders. Spotlight indexes everything in my IMAP store (provided you let Mail.app store full copies of your messages locally, of course), and flagging is, well - trivial.
As to sharing stuff with other platforms, the only thing you don't get right off the bat is - you guessed it - web archives.
The Missing Bits
Safari does have a "Mail Contents Of This Page" command, but that creates a mail message containing only the HTML (i.e., it does not include inline images as MIME parts) - it seems complete (and messages created in this way are readable by Thunderbird, for instance), but it's not suitable for long-term storage.
Well, as it happens, I read my RSS/Atom feeds as e-mail - i.e., newspipe has code to create a properly MIME-formatted message containing inline images (which I decode for my mobile news front-end).
It doesn't handle CSS (or Flash, or other embedded media), but then again, those are not usually the sort of things you want to file (and I'm not sure Apple's .webarchive deals with them flawlessly, which means Yojimbo will suffer from the same caveat).
But it shouldn't be much trouble to take that code and create a Python script that grabs the HTML content from a specific URL and e-mails it to my archive account directly (plus the requisite Quicksilver custom action, of course).
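To give an idea of the shape of such a script, here is a minimal sketch built only on the Python standard library. It is not newspipe's code: the function names, the X-Archived-URL header and the naive src rewriting (double-quoted attributes only) are my own assumptions, and like newspipe it ignores CSS and other embedded media.

```python
import mimetypes
import urllib.request
from email.message import EmailMessage
from email.utils import make_msgid
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the distinct src values of <img> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src and src not in self.sources:
                self.sources.append(src)


def fetch_url(url):
    """Download a URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def build_archive_message(url, html, fetch=fetch_url):
    """Build a message whose HTML body carries its images as inline MIME parts.

    `fetch` is any callable that takes an absolute URL and returns bytes.
    """
    collector = ImageCollector()
    collector.feed(html)

    inline = []
    for src in collector.sources:
        absolute = urljoin(url, src)
        try:
            data = fetch(absolute)
        except (OSError, ValueError):
            continue  # keep the original reference if the image cannot be fetched
        cid = make_msgid(domain="archive.local")  # "<unique@archive.local>"
        # Naive textual rewrite: only handles double-quoted src attributes.
        html = html.replace(f'src="{src}"', f'src="cid:{cid[1:-1]}"')
        inline.append((absolute, data, cid))

    msg = EmailMessage()
    msg["Subject"] = url
    msg["X-Archived-URL"] = url  # hypothetical header, for later searching
    msg.set_content(html, subtype="html")
    for absolute, data, cid in inline:
        ctype = mimetypes.guess_type(absolute)[0] or "application/octet-stream"
        maintype, subtype = ctype.split("/", 1)
        # add_related turns the message into multipart/related on first use.
        msg.add_related(data, maintype=maintype, subtype=subtype, cid=cid)
    return msg
```

The result is a multipart/related message, so any mail client that understands MIME should be able to render the page with its images, and the message can be sent or filed like any other.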
As to tagging, I have two obvious solutions:
- Use MailTags (nice, Mac-native and will be able to store its tagging information in IMAP headers, although the base64 encoding makes it impossible to use for simple cross-platform searching...)
- Store them in the Subject: field (easy to use with Spotlight searches, smart folders and just about anything else, can be done at creation time from a Quicksilver custom action)
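For the second option, any convention will do as long as it is applied consistently. The sketch below assumes one hypothetical convention of my own choosing: tags live in a trailing bracketed, space-separated block, as in "Some title [research sqlite]". That keeps them searchable as plain text from any mail client.

```python
import re

# A trailing "[tag1 tag2]" block at the very end of the subject line.
TAG_BLOCK = re.compile(r"\s*\[([^\]]*)\]\s*$")


def split_tags(subject):
    """Return (title, tags) for a subject that may end in a "[...]" tag block."""
    match = TAG_BLOCK.search(subject)
    if not match:
        return subject.strip(), []
    return subject[:match.start()].strip(), match.group(1).split()


def add_tags(subject, tags):
    """Return the subject with tags merged into its trailing tag block.

    Existing tags are kept and duplicates are dropped. Tags must not contain
    spaces or brackets, since those delimit the block.
    """
    title, merged = split_tags(subject)
    for tag in tags:
        if tag and tag not in merged:
            merged.append(tag)
    if not merged:
        return title
    return f"{title} [{' '.join(merged)}]"
```

For example, add_tags("Yojimbo internals", ["sqlite", "mac"]) gives "Yojimbo internals [sqlite mac]", and split_tags reverses it.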
Finally, as regards encryption, both S/MIME and PGP are able to deal with encrypted MIME multipart data - I haven't experimented much yet, but storing encrypted drafts on an IMAP server appears to be entirely possible (searching, obviously, is completely out of the question, unless you rely exclusively on Subject: lines).
Hey! You Mentioned Two Applications!?
Oh, yes, of course. You see, the other application that lets me do pretty much everything Yojimbo does is... the Finder.
You just have to set up a few search folders, really. Takes a bit more patience (and forethought), but Quicksilver has a tagging plugin that works just fine, and that's why Apple gave us Spotlight anyway...