Oct 25 2013: Moved to GitHub (a long overdue change)
Given that Mail.app now produces somewhat sane .mbox-like archives through its "Archive" option, this script is no longer actively maintained. I'm still somewhat happy to see that it has become quite popular over the years and that other people have found it useful enough to fork their own versions. If you're using it at all and you use Gmail, check the text below and upgrade.
As depicted here, I was wrestling with the need to back up my IMAP archive server for a good while, and eventually came up with this script, which has been passed around, enhanced, and even made into a reference of sorts for IMAP access. But, like all good things, it has its caveats:
- Copies every single message from every single folder (or a subset of folders) in your IMAP server to your disk.
- Does incremental copying (i.e., tries very hard to not copy messages twice).
- Tries to do everything as safely as possible (only performs read operations on IMAP).
- Generates mbox formatted files that can be imported into Mail.app (just choose "Other" on the import dialog).
- Is completely and utterly free (distributed under a BSD license).
- I do not support it, nor do I intend to reply to any e-mails regarding problems with it. I will, however, accept fixes and enhancements and publish them here from time to time.
    $ python imapbackup.py -?
    Usage: imapbackup [OPTIONS]
     -z --compress                    create/append to gzip compressed files (EXPERIMENTAL)
     -s HOSTNAME --server=HOSTNAME    connect to HOSTNAME
     -u USERNAME --username=USERNAME  with USERNAME
     -p PASSWORD --password=PASSWORD  with PASSWORD (you will be prompted for one if missing)
     ... # full options depend on version
    Mailbox files will be created IN THE CURRENT WORKING DIRECTORY

    $ imapbackup -s imap.local --username=me
    Password:
    \ IMAP: Scanning INBOX
    ...
    IMAP: Found 2440 messages in Personal/2005/Q3.
    \ MERGE: Copying from Personal/2005/Q3 to Personal.2005.Q3.mbox
    APPEND: Appended 2440 messages to Personal.2005.Q3.mbox (752171854 bytes, of which the largest message was 20211381 bytes)
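The sample run shows the folder `Personal/2005/Q3` ending up in `Personal.2005.Q3.mbox`, and the feature list promises incremental copying. Here is a minimal sketch of both ideas, assuming that Message-Ids are the key used to decide what is already on disk. The function names `mbox_filename` and `messages_to_fetch` are hypothetical and are not taken from the script itself.

```python
def mbox_filename(folder, separator="/"):
    """Map an IMAP folder path to a flat mbox file name.

    Mirrors the sample run: Personal/2005/Q3 -> Personal.2005.Q3.mbox.
    The default separator is an assumption; some servers use '.' instead.
    """
    return ".".join(folder.split(separator)) + ".mbox"


def messages_to_fetch(server_ids, disk_ids):
    """Return the message numbers whose Message-Id is not yet on disk.

    server_ids: dict mapping Message-Id -> IMAP message number
    disk_ids:   set of Message-Ids already present in the mbox file
    """
    return sorted(num for msg_id, num in server_ids.items()
                  if msg_id not in disk_ids)
```

Only the numbers returned by `messages_to_fetch` would need to be downloaded and appended, which is what keeps repeated runs cheap.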
These are missing features, ideas for enhancements, or both. There is a comprehensive TODO list inside the script itself, but my original notes regarding compression still apply:
- EXPERIMENTAL: Outputting gzip compressed files is doable, even for appending, but takes a long time to perform Message-Id scanning due to internal Python mechanics (my guess is that it tries to seek() inside the file a lot). Since bzip2 only lets you create completely new files, I'm considering re-implementing this in a different way:
- If the mailbox file does not exist, there is no need to scan it for existing messages, so I can use Python's native gzip or bzip2 file wrappers and create a new file.
- If the mailbox file already exists and is compressed, it's probably best to have it be completely decompressed first (by invoking gzip or bzip2 as external commands). This would handle gzip and bzip2 in precisely the same way and might be more efficient for large volumes of "new" messages (I have to do some trial runs to figure that out).
- No provision for injecting messages back into an IMAP server (remember, this is a backup utility, not imapsync). But since importing things into Mail.app tends to take a good while (especially when you have a lot of e-mail) and there's no point in importing into Mail.app only to drag things out to the server, I'm strongly considering implementing this (shouldn't be any trouble doing the actual copy, the command-line UI is the real pain).
- May mark messages as read on some IMAP servers (this depends heavily on the server itself and its support of PEEK, and is NOT fixable for servers that don't support it, so try to back up only when you've read all your mail...).
- Does NOT back up IMAP flags (which also includes user-visible flags in Mail.app and Thunderbird), because - guess what - those aren't stored in the message headers. It does, however, back up MailTags data, even though MailTags may not pick up flags from restored backups without a little nudging.
- SSL connections are supported since 1.3, but it doesn't do proper certificate chain validation (which should be fine for most people). This was not a priority for me since I back up my IMAP server over a LAN, but Michael Leonhard did a splendid job of patching it in and it seems to work fine.
- Authentication is secured by whatever Python's imaplib feels like using - i.e., the password is not necessarily encrypted over the wire (again, stuff like CRAM-MD5 is not a priority for me, for the same reasons as SSL).
- Some sort of GUI might be possible by using EasyDialogs or the native Tcl/Tk bindings on the Mac (not a priority, but fun to tinker with).
- There is no detection of IMAP path separators whatsoever - i.e., if your IMAP server doesn't use '/' in nested folders, you'll have to tweak a constant in the script (no intention of fixing this).
- IMAP and disk files are scanned separately from the merge operation, which is nice, re-usable and safe. However, it may be nicer (in terms of reporting overall progress) to interleave IMAP scanning with merging after pre-scanning the disk files (not a priority until there is some sort of GUI).
- FIXED in 1.0.1a: For some reason, the qmail/Google/Groups combination tends to add a second Message-Id header to some messages. Since IMAP seems to prefer the first one and PortableUnixMailbox the last one, repeated runs will result in duplicate messages in the .mbox file (non-critical, but annoying).
- There are occasional instances where Message-Ids cannot be parsed when read from the disk. No data is lost, but this may result in duplicate messages. It happens because I dump the RFC 2822 message as returned by IMAP verbatim into a file (which is the safe, high-fidelity approach) and am not clever enough at parsing Message-Ids from a file (it is easier to parse the IMAP values - which the server takes some care to format correctly - than to try to anticipate all the possible horrors that the originating MUA or MTA perpetrated on the headers).
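The two Message-Id problems in the list above (a second Message-Id header, and headers that are hard to parse from disk) can be illustrated with a short sketch. It uses the standard library `email` package rather than whatever the script does internally, and it assumes a "first header wins" policy, since the notes say that is what IMAP seems to prefer. The function name `first_message_id` is hypothetical.

```python
import email


def first_message_id(raw_headers):
    """Return the first Message-Id found in a raw header block, or None.

    raw_headers is the header section (optionally followed by the body)
    of one message, as text.
    """
    msg = email.message_from_string(raw_headers)
    # Header lookup is case-insensitive and preserves file order.
    for value in msg.get_all("Message-Id") or []:
        # Folded headers keep their line breaks; collapse whitespace runs.
        unfolded = " ".join(value.split())
        if unfolded:
            return unfolded
    return None
```

Applying the same rule to both the IMAP data and the file on disk is one way to keep the two sides in agreement, whichever header ends up being picked.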
Does this FAQ mean you'll support the thing after all?
No, merely that I care about you.
I get some Python errors and then something about AUTH!
Of course you do. That's because the script does not support CRAM-MD5 authentication, and your server wants it to. Patches to do this are welcome, since I have no easy way to test this at the moment.
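For anyone inclined to write that patch, Python's `imaplib` does provide a `login_cram_md5` method. The following is a rough sketch of how a login helper might pick a mechanism. It is not part of the script, it is untested against a real server, and the helper name `login` is hypothetical.

```python
def login(conn, username, password):
    """Log in with CRAM-MD5 when the server advertises it, else plain LOGIN.

    conn is an imaplib.IMAP4 or IMAP4_SSL instance that is connected but
    not yet authenticated. Returns the name of the mechanism that was used.
    """
    # imaplib stores the server's capability list as upper-case strings.
    if "AUTH=CRAM-MD5" in conn.capabilities:
        conn.login_cram_md5(username, password)
        return "CRAM-MD5"
    conn.login(username, password)
    return "LOGIN"
```

A server that insists on CRAM-MD5 should advertise `AUTH=CRAM-MD5`, so the plain `LOGIN` branch only runs where the server does not offer it.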
- Added a timeout option to allow for slow IMAP servers.
- Added TCP_NODELAY to speed up transactions - should improve things for older Python builds.
- Added Brandon's fix for using BODY.PEEK instead of BODY to avoid marking messages as read (untested) and Ronan's deprecation patch for hashlib.
- Giuseppe Scrivano added folder support (see the file for more details)
- Michael Leonhard added SSL support (I never got around to publishing this version here).
- Fixed a minor issue with Microsoft Exchange where Exchange would reply to a query for RFC822.MSGSIZE with more information than expected (it apparently uses FLAGS as markers for Calendar items, and insists on sending them even if not asked to).
- Added bzip2 compression by popular demand. However, the unwashed masses will miss out on being able to append to existing .bz2 files (which is the nice thing about gzip). As a result, the code is now sprinkled with liberal warnings and a few more checks. Compression is likely to be done in a different way (by invoking external commands) if future tests show it to be faster.
- Added experimental gzip compression (at level 9). Works perfectly for new files, seems to work OK for appending to existing files. Makes for rather slow checking of pre-existing messages in local files, but is relatively fast for snapshots and saves a lot of disk space. Obviously, you'll have to decompress the files to import them into Mail.app.
- Test universe now spans over 8GB of e-mail messages (largest around 25MB)
- Made Message-Id collection a bit more resilient by parsing around malformed headers and adding a bit more error handling (there was no data loss, but unintended dupes).
- Fixed trailing newlines in some saved messages.
- Possible fix for multiple Message-Ids (seems to work for the infamous qmail/Google/Groups combo).
- Cleaned up some bits of code.
- Added command-line arguments, general code cleanup. First public version.
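The gzip entries above note that new files work perfectly and that appending seems to work. In current Python versions this can be sketched with the standard `gzip` module, where append mode adds a new compressed member to the end of the file and reading treats all members as one stream. This is an illustration under those assumptions rather than the script's own code, and both function names are hypothetical.

```python
import gzip


def append_to_gzip_mbox(path, raw_messages):
    """Append already mbox-formatted messages (bytes) to a gzip file.

    Mode "ab" creates the file if it is missing and otherwise writes a
    new gzip member after the existing data.
    """
    with gzip.open(path, "ab", compresslevel=9) as out:
        for raw in raw_messages:
            out.write(raw)


def read_gzip_mbox(path):
    """Return the decompressed contents, reading all members in order."""
    with gzip.open(path, "rb") as src:
        return src.read()
```

Scanning an existing file for Message-Ids still means decompressing it from the start, which fits the slow checking of pre-existing messages described above.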
0.8 - 0.9:
- Worked around Bug #1092502 on Mac OS X by patching socket._fileobject.read (with thanks to Bob Ippolito).
- Removed PARTIAL fetches (no longer necessary after the fix)
0.3 - 0.7:
- Implemented PARTIAL message fetches (very large messages are fetched in chunks to avoid wasting RAM).
- Worked around extremely asinine Exchange behavior whereupon it will break Message-Id headers across several lines (probably caused by overlong headers and overzealous wrapping).
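Two of the changelog entries, the switch to BODY.PEEK and the TCP_NODELAY option, are easy to show in isolation. The sketch below uses only standard `socket` and `imaplib` calls. The helper names are hypothetical, and it is an assumption that the script does something along these lines.

```python
import socket


def disable_nagle(sock):
    """Set TCP_NODELAY so small IMAP commands are sent without delay."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def fetch_without_marking_read(conn, num):
    """Fetch a full message with BODY.PEEK[], which leaves it unread.

    conn is an authenticated imaplib.IMAP4 with a mailbox selected;
    num is a message sequence number as a string, e.g. "1".
    """
    return conn.fetch(num, "(BODY.PEEK[])")
```

With a real connection this would be used as `disable_nagle(conn.socket())` right after connecting. Selecting the folder with `conn.select(folder, readonly=True)` is a further precaution that `imaplib` offers against changing message state.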