Opening and Converting Corrupted PDF Files

I sometimes come across the sort of format incompatibilities that can only be explained by joint incompetence on the part of Adobe and bigotry from people whose understanding of file formats amounts to “if it opens on my PC, then it’s your Mac’s fault”, and today was one of those days.

In short, I got a nicely formatted corporate PDF file that simply would not render properly on my iPad, in Preview, or in anything else on my Mac: the text came out in visibly broken fonts.

After a short discussion in which I soon realized that the chances of this being fixed at the creator’s side would be markedly less than nil, and considering the “option” of installing Adobe’s bloatware Reader a fate worse than death, I set out to analyze the file. Surprise: it was generated by something that would like me to believe it was PowerPoint 2007 (which it couldn’t be, since I happen to know they got that bit right) and that looks suspiciously like Adobe 9’s engine, since it tries to “optimize” fonts by loading what appears to be an invalid font table (at least to my aging PostScript-trained neurons, from the days when I actually wrote it by hand).

So here’s today’s useful tip, brought to you with untempered hatred for the wanton mangling of standards and overall ignorance of document formats in today’s world - if you come across files that render with broken fonts like this, there are at least two alternatives:

  • Open the file using Google Chrome, whose built-in (proprietary) PDF previewer fortunately ignores the extra font bits (at least for now). It bears noting, however, that it will currently crash if you try to print from it (to PDF or to a real printer).
  • Install Ghostscript and run ps2pdf12 stupidfile.pdf. That will forcibly downgrade a lot of the font “optimizations” and render the file completely readable.
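
For repeat offenders, the Ghostscript route can be wrapped in a small shell function. This is only a sketch: fix_pdf, the DRY_RUN switch and the "-fixed.pdf" output name are my own inventions for illustration, and the only real tool involved is the ps2pdf12 wrapper that ships with Ghostscript. Passing an explicit output name avoids relying on whatever name the wrapper would pick by default.

```shell
#!/bin/sh
# Hypothetical helper: re-distill a PDF through Ghostscript's ps2pdf12 wrapper.
# Set DRY_RUN=1 to print the command instead of running it.
fix_pdf() {
    in="$1"
    # Default output name is an assumption: foo.pdf -> foo-fixed.pdf
    out="${2:-${in%.pdf}-fixed.pdf}"
    if [ -n "$DRY_RUN" ]; then
        echo "ps2pdf12 $in $out"
        return 0
    fi
    # Fail early with a readable message if Ghostscript is not installed.
    if ! command -v ps2pdf12 >/dev/null 2>&1; then
        echo "ps2pdf12 not found; install Ghostscript first" >&2
        return 1
    fi
    ps2pdf12 "$in" "$out"
}
```

Something like fix_pdf stupidfile.pdf should then leave the original alone and write the readable copy next to it.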

And that’s all, folks. If you come across similar files, make sure to (politely) attempt to educate the originators and explain to them that it doesn’t cost much extra time to either e-mail their files to someone else to test first or to figure out the “save as” options in Adobe software.

Of course, now there’s a whole hour of my life I’ll never be getting back. Hope this saves someone time.

Update: This generates better (and smaller) output:

ps2pdf13 -dPDFSETTINGS=/ebook -dOptimize=true stupidfile.pdf

This downsamples images to a decent resolution and, with -dOptimize, is meant to build a little table to optimize on-demand fetching via HTTP. /printer or /prepress are overkill and you don’t want to use /screen, since that overdoes DCT (JPEG) compression and things get all fuzzy.
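
If you would rather see the trade-off than take my word for it, a loop over the four presets makes the size differences obvious. Again a sketch under assumptions: compare_presets, DRY_RUN and the output naming are made up for illustration, and I have left -dOptimize out to keep the comparison about the preset alone.

```shell
#!/bin/sh
# Hypothetical helper: write one output per -dPDFSETTINGS preset
# so the resulting file sizes (and fuzziness) can be compared.
# Set DRY_RUN=1 to print the commands instead of running them.
compare_presets() {
    in="$1"
    for preset in screen ebook printer prepress; do
        # Assumed naming scheme: foo.pdf -> foo-ebook.pdf, and so on.
        out="${in%.pdf}-$preset.pdf"
        if [ -n "$DRY_RUN" ]; then
            echo "ps2pdf13 -dPDFSETTINGS=/$preset $in $out"
        else
            # Options must come before the file names.
            ps2pdf13 -dPDFSETTINGS=/"$preset" "$in" "$out" && ls -l "$out"
        fi
    done
}
```

Running compare_presets stupidfile.pdf and opening the four results side by side is usually enough to settle on a preset.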
