Data For The Ages, Take Two

A long while ago, I wrote a few notes on how to preserve your data for the ages, and, guess what, now that 14 years have gone by, I decided to revisit the topic–but with data from the 90s.

The reason was simple enough: one of the things on my endless To-Do list is to recover and catalogue some of the data I have on ancient floppy disks and CD-ROMs, most of it from my days in college.

I do have backups that predate 1989, but there are also a lot of fun things from the 90s that are worth preserving, not the least of which are some coursework projects taken just a little too far, like this Autonomous Guided Vehicle simulator, which was configured using a fuzzy logic DSL and tested with a nice little UI I whipped up at the time:

Say what you want about Windows or C++, but this 1994 binary still works just fine today.

This was done in Visual C++ and MFC, and the fuzzy logic DSL was implemented the “hard” way, with a parser generated by lex and yacc. I really should put it up on GitHub sometime…

File Format Archeology

But I digress. Finding a working binary was a happy little accident; what I really wanted to do was convert reports like these to PDF:

We got burned once for not doing a user manual, and our motto became never again...
You can click here for the full PDF.

As with the example above, what I have been going through is mostly coursework and academic material of various kinds that I find worth preserving, given that our course recently celebrated its 30th anniversary and I was, among other things, the editor of the course newspaper and responsible for assorted other odds and ends.

First, the good news: I have loads of .ZIP files on my NAS to go through, and even though the file format has evolved a bit over the years, The Unarchiver can deal with all of them. And, as it turns out, Office formats were almost a good idea for long-term storage.
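Much of that evolution is in compression methods: early PKZIP releases used methods such as Shrink and Implode that predate Deflate, and not every modern tool can extract them. As a rough illustration, here is a minimal Python sketch that lists the entries in an archive that Python's own zipfile module would refuse to extract. The method IDs come from PKWARE's APPNOTE.TXT; audit_zip is a hypothetical helper, not part of any tool mentioned here.

```python
import zipfile

# Pre-Deflate compression methods from PKWARE's APPNOTE.TXT (PKZIP 1.x era).
LEGACY_METHODS = {1: "shrunk", 2: "reduced-1", 3: "reduced-2",
                  4: "reduced-3", 5: "reduced-4", 6: "imploded"}

# Methods CPython's zipfile can decompress (bzip2/LZMA need the bz2/lzma modules).
SUPPORTED = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED,
             zipfile.ZIP_BZIP2, zipfile.ZIP_LZMA}


def audit_zip(path_or_file):
    """List (name, method id, label) for entries zipfile cannot extract."""
    problems = []
    # Opening only reads the central directory; nothing is decompressed here.
    with zipfile.ZipFile(path_or_file) as zf:
        for info in zf.infolist():
            if info.compress_type not in SUPPORTED:
                label = LEGACY_METHODS.get(info.compress_type, "unknown")
                problems.append((info.filename, info.compress_type, label))
    return problems
```

An empty list means Python alone can unpack the archive; anything else (for instance a hypothetical audit_zip("COURSE.ZIP") reporting "imploded" entries) needs a tool with broader format support.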

But unlike most people back then, my colleagues and I wrote our reports in Aldus PageMaker 4, which ran on both the Mac and Windows: we’d draft text as Windows Write or Mac Word files, create diagrams in Corel Draw (or PowerPoint), and then lay it all out on the page in ways that went a little above and beyond what the LaTeX folk turned in.

It helped that I was doing desktop publishing and some print design on the side, and had access to a bucketload of Macs (and later NeXT machines), although a lot of it was also hammered out on my humble 486DX and squeezed out through a pokey DeskJet 500.

So most of what I have from those days is a pile of .PM4, .WRI and .DOC files. I also have some diagrams, but the finished reports are all I really care about.
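Before converting anything, it helps to know what is actually in the pile. Here is a minimal Python sketch that counts files by extension and flags those that start with the OLE2 compound-file signature (D0 CF 11 E0 A1 B1 1A E1). Word 6 .DOC files should use that container, while Windows Write files do not. The catalogue function is hypothetical and trusts file extensions, so misnamed files will be miscounted.

```python
from collections import Counter
from pathlib import Path

# Signature of the OLE2 / Compound File Binary container.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def catalogue(root):
    """Count files by extension and list those that start with the OLE2 signature."""
    root = Path(root)
    by_ext = Counter()
    ole2 = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # DOS-era names are usually upper case, so normalise the extension.
        by_ext[path.suffix.lower() or "(none)"] += 1
        with path.open("rb") as fh:
            if fh.read(8) == OLE2_MAGIC:
                ole2.append(path.relative_to(root))
    return dict(sorted(by_ext.items())), sorted(ole2)
```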

Virtual RetroComputing

As it happens, nothing on my Mac or modern Windows machines can reliably open any of those files (Word has moved on to .docx, and even though it used to be able to open some .doc files, mine are positively primal–we’re talking Word 6 here).

So I took a two-pronged approach:

  • I managed to get DOSBox to install Windows 3.1 from diskette images, to run PageMaker and Corel Draw (since we relied heavily on the fonts that came with the latter).
  • I set up my old favorite, Windows Fundamentals for Legacy PCs, inside Parallels Desktop to run Word 6 and PowerPoint (which refused to run under DOSBox), since getting Windows For Workgroups 3.11 to run inside Parallels proved to be too much of a hassle and there was no easy way to get files in and out.

With both having direct access to my Mac’s filesystem, I started unpacking files, tossing them into the emulators and trying to get usable PDF files out of them.

Why You Should Hate Virtual Printers As Much As Real Ones

The biggest hassle? Actually printing the documents out.

Even when I had the right fonts installed, generating decent PDFs involved a lot of fiddling with the printer settings. Fortunately, I can actually read PostScript, and each emulation environment was helpful in its own way:

  • Parallels has a “Print to PDF (Mac Desktop)” virtual printer option.
  • DOSBox can capture print output to a file (and, strangely enough, it is easier to get Windows 3.1 to write PostScript to a file than it is with later versions).
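The post doesn’t say which tool turned the captured print files into PDF; one common option is Ghostscript’s ps2pdf script. Here is a minimal sketch under that assumption. It first checks that a captured file really is PostScript, skipping any leading Ctrl-D bytes (some drivers send one before a job), and then hands it to ps2pdf. Both helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def looks_like_postscript(path):
    """True if the file starts with a PostScript "%!" header.

    Leading Ctrl-D (0x04) bytes and whitespace are skipped first.
    """
    with open(path, "rb") as fh:
        head = fh.read(64)
    return head.lstrip(b"\x04\r\n\t ").startswith(b"%!")


def capture_to_pdf(ps_path, pdf_path=None):
    """Convert a captured print file to PDF with Ghostscript's ps2pdf script."""
    ps_path = Path(ps_path)
    if not looks_like_postscript(ps_path):
        raise ValueError(f"{ps_path} does not look like PostScript")
    if shutil.which("ps2pdf") is None:
        raise RuntimeError("ps2pdf (Ghostscript) not found on PATH")
    pdf_path = Path(pdf_path) if pdf_path else ps_path.with_suffix(".pdf")
    subprocess.run(["ps2pdf", str(ps_path), str(pdf_path)], check=True)
    return pdf_path
```

With Ghostscript installed, capture_to_pdf("REPORT.PRN") would write REPORT.pdf next to the (hypothetical) captured file. The header check is there because a driver misconfigured for PCL or plain text fails much more obviously at that point than halfway through Ghostscript.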

So I was getting half-baked PDFs in no time, but they were all subtly broken, so I took care to pick the simplest, least fussy, least optimization-prone printer driver inside each emulator: the gloriously ancient Apple LaserWriter II, which matched the epoch and made applications send much simpler PostScript than the convoluted, self-unpacking messes most printer drivers spit out today.

And yet, to make a very long story short, in order to print some of the more convoluted papers I still had to:

  • Make sure there was no font substitution table applied.
  • Make sure TrueType fonts were sent as soft fonts to the “printer”.
  • Find exactly the right fonts. This was crucial for things like equations, and I’m still trawling free font archives since I’m missing a couple of specific symbol variants.
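You can check the first two points after the fact by reading the Document Structuring Conventions (DSC) comments in the captured PostScript. A font listed under %%DocumentNeededResources or %%DocumentFonts but never defined in a %%BeginResource: font or %%BeginFont section was left for the “printer” to supply, and that is where substitution creeps in. This Python sketch assumes the driver emits reasonably well-formed DSC comments (not all do); dsc_fonts is a hypothetical helper.

```python
def dsc_fonts(ps_text):
    """Return (referenced, embedded) sets of font names found in DSC comments."""
    # comment key -> (which set, has a resource-type word, takes a single name)
    rules = {
        "%%DocumentNeededResources": ("ref", True, False),
        "%%DocumentFonts": ("ref", False, False),
        "%%IncludeResource": ("ref", True, True),
        "%%IncludeFont": ("ref", False, True),
        "%%BeginResource": ("emb", True, True),
        "%%BeginFont": ("emb", False, True),
    }
    found = {"ref": set(), "emb": set()}
    rule = None  # rule that a following "%%+" continuation line extends
    for raw in ps_text.splitlines():
        line = raw.strip()
        if line.startswith("%%+"):
            words = line[3:].split()
        else:
            key, _, value = line.partition(":")
            rule = rules.get(key)
            words = value.split()
        if rule is None:
            continue
        bucket, typed, single = rule
        if typed:
            if not words or words[0] != "font":
                continue  # some other resource type, e.g. a procset
            words = words[1:]
        names = [w for w in words if w != "(atend)"]
        # Begin/Include comments may carry extra numbers after the name.
        found[bucket].update(names[:1] if single else names)
    return found["ref"], found["emb"]
```

The difference referenced - embedded is the list of fonts the output silently depends on; for a clean archive copy it should ideally be empty.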

The Readability Equation

The thing that took me the longest time to figure out was why any paper with equations came out completely garbled:

If your formulas look like this, then you need Fences.

After a lot of digging around, I figured out that equation metafiles generated by the older Microsoft Word Equation Editor used a different font (not MTExtra) to render big brackets and maths symbols.

Thankfully, Windows metafiles keep font metadata, so I just had to copy the equations from PageMaker, paste them (as pictures) back into Word and inspect each weird character to figure out the missing font name: it’s called Fences, and appears to consist solely of full or partial brackets (sometimes with one character for each half).
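The same inspection can be scripted if the metafile is saved to disk first. In the WMF format, as described in Microsoft’s [MS-WMF] specification, each font is created by a META_CREATEFONTINDIRECT record (function 0x02FB). Its parameters are 18 bytes of size and style fields followed by a 32-byte, NUL-terminated face name. Here is a minimal Python sketch that walks the records and collects those names, optionally skipping the 22-byte Aldus placeable header. It has not been tested against real Equation Editor output, and wmf_font_names is a hypothetical helper.

```python
import struct

META_EOF = 0x0000
META_CREATEFONTINDIRECT = 0x02FB
PLACEABLE_KEY = 0x9AC6CDD7  # optional 22-byte Aldus placeable header


def wmf_font_names(data):
    """Return face names from META_CREATEFONTINDIRECT records, in file order."""
    offset = 0
    if len(data) >= 4 and struct.unpack_from("<I", data, 0)[0] == PLACEABLE_KEY:
        offset = 22
    # METAHEADER: the second WORD is the header size in 16-bit words (normally 9).
    header_words = struct.unpack_from("<H", data, offset + 2)[0]
    offset += header_words * 2
    names = []
    while offset + 6 <= len(data):
        # Each record: 32-bit size in WORDs, then a 16-bit function number.
        size_words, function = struct.unpack_from("<IH", data, offset)
        if function == META_EOF or size_words < 3:
            break
        if function == META_CREATEFONTINDIRECT:
            # Font object: 5 WORDs + 8 BYTEs (18 bytes), then the face name.
            start = offset + 6 + 18
            face = data[start:start + 32]
            names.append(face.split(b"\0", 1)[0].decode("latin-1"))
        offset += size_words * 2
    return names
```

Any name in the result that isn’t installed in the emulated Windows is a candidate for the garbled glyphs.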

Towards The Next Age, If We’re Lucky

So I’m now (finally) putting together a comprehensive archive of all the stuff I built back in the 90s, and some of it will eventually find its way online. And I’d say these were a few hours well spent, for I now have glorious PDF files that, hopefully, will stand the test of time (assuming Adobe stops stuffing interactive junk into its new format proposals) and that I can share with my college archivists.

Next up, e-mail archives.

I have some ancient PST and Eudora files from those days that are also going to be a challenge, but I’m probably going to leave them for next weekend…