Paging all JPEG Geeks


This is part of my Data For The Ages category, which is in turn an extension of my ancient Quest for Easier Information Management.

Well, I'm now watching my files migrate from the Maxtor disk to the new Toshiba, and although I've already got a way to check for File Integrity on archived files, that approach has one interesting little issue:

It assumes your files are OK at the point of checksumming, which, well, might not be the case.

So, focusing solely on photos for the moment, I was wondering whether there is an easy way to test JPEG file integrity (i.e., whether each file decodes cleanly) other than tedious visual inspection.

The idea is that I'll use something to verify whether a given JPEG file decodes properly, and folders with verified JPEGs then get an MD5SUMS or SHA1SUMS file (or both) added. That file then carries over to a DVD backup, and will also let me verify the integrity of the photos on the DVD later on.
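For the checksum half of that plan, something along these lines should do. It's a minimal sketch assuming GNU coreutils (`md5sum -c --quiet`) and GNU `xargs -r`; `make_sums` and `verify_sums` are hypothetical helper names, not part of my actual script. Swapping `md5sum` for `sha1sum` gives you a SHA1SUMS file instead, since it takes the same `-c` option.

```shell
#!/usr/bin/env bash
# Sketch (assumes GNU coreutils/findutils): write MD5SUMS into one folder of
# already-verified JPEGs, and re-check it later, e.g. on the DVD copy.
# make_sums / verify_sums are hypothetical names used for illustration.

make_sums() {
    # Subshell keeps the cd local; paths in MD5SUMS stay relative to the
    # folder, so the same file can verify the copy burned to the DVD.
    ( cd "$1" &&
      find . -maxdepth 1 -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -print0 |
      xargs -0 -r md5sum > MD5SUMS )
}

verify_sums() {
    # --quiet prints only mismatches; exit status is non-zero if any file fails.
    ( cd "$1" && md5sum -c --quiet MD5SUMS )
}
```

Usage would be `make_sums ~/Photos/2008-06` once a folder has passed the decode check, and `verify_sums /media/dvd/2008-06` against the backup later.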

But, of course, before checksumming, I have to make sure the JPEG file is valid (otherwise it's kind of pointless).

Running djpeg and sending its output to /dev/null might be enough, but I was wondering if there is a more elegant solution than:

$ djpeg -v -outfile /dev/null test.jpg 2>&1 | grep Corrupt

Corrupt JPEG data: 45 extraneous bytes before marker 0xd3

Obviously, I can test the return value rather than grepping the output, but djpeg itself is the issue here: I'd like to know if there's anything else out there.
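For the record, here's roughly what testing the return value looks like. This is a sketch assuming the stock libjpeg/libjpeg-turbo djpeg, which (as far as I can tell from its source) exits 0 on a clean decode, 2 when it only emitted warnings such as the "Corrupt JPEG data" message above, and 1 on a fatal error. `jpeg_status` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: classify one file by djpeg's exit status instead of grepping stderr.
# Assumption: djpeg exits 0 = clean, 2 = warnings only (e.g. extraneous bytes),
# anything else = fatal decode error. jpeg_status is a hypothetical name.

jpeg_status() {
    djpeg -outfile /dev/null "$1" 2>/dev/null
    case $? in
        0) echo "ok" ;;
        2) echo "warning" ;;   # still decodable, but not standards-conformant
        *) echo "error" ;;
    esac
}
```

Distinguishing "warning" from "error" matters here, since a file that merely trips a warning may still be perfectly viewable.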

Anyone out there ever had to batch test, oh... 8000 photos at a UNIX command line?
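A batch run over a whole photo tree could look something like the following. It's a sketch, not the script I actually used: it assumes bash (for `read -d ''` and process substitution), treats any non-zero djpeg status as "bad", and `check_tree` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: decode every *.jpg / *.jpeg under a directory tree with djpeg and
# list the files that did not decode cleanly. check_tree is hypothetical.

check_tree() {
    local root="$1" bad=0 total=0 f
    # -print0 / read -d '' keep filenames with spaces or newlines intact.
    while IFS= read -r -d '' f; do
        total=$((total + 1))
        # Any non-zero status (warning or fatal error) counts as bad.
        if ! djpeg -outfile /dev/null "$f" >/dev/null 2>&1; then
            printf 'BAD: %s\n' "$f"
            bad=$((bad + 1))
        fi
    done < <(find "$root" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -print0)
    printf 'checked %d files, %d bad\n' "$total" "$bad"
    [ "$bad" -eq 0 ]   # function succeeds only if every file decoded cleanly
}
```

Something like `check_tree ~/Photos > jpeg-report.txt` leaves a list of suspects to eyeball by hand, which beats looking at all 8000.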

Update: Well, it took five hours to run the File Integrity script (against two copies of my photo archive simultaneously), and only six photos threw up JPEG decoding errors: five ancient ones taken with a VGA digicam (which may have been corrupted any number of ways before the disk clacking), and a very recent one taken with a Windows Mobile device. It turns out that the device's JPEG encoder was the problem: every photo it saves has a few extra bytes between two header fields (but is still viewable, despite not conforming to the standard...).