File Integrity

A quick stab at using only bash and md5sum to keep track of file integrity inside my photo folders (Tripwire is nice, but a bit over the top, and doesn't carry over to DVD backups):

#!/bin/bash
# Change the separator to allow for filenames containing spaces
# (the default is " \t\n", which confuses the for loop)
IFS=$'\n'
FOLDERS=`find /Volumes/disk\ 1/Pictures/Photos -type d`
for FOLDER in $FOLDERS; do
  # mind you, this will only work with absolute pathnames
  # (no need to backslash-escape the spaces; the IFS above handles them)
  if [ -d "$FOLDER" ]; then
    echo "$0: INFO: Processing $FOLDER"
    cd "$FOLDER"
    for FILE in `ls -1 | grep -i '\.jpg$'`; do
      echo "$0: INFO: Checking $FILE"
      djpeg -outfile /dev/null "$FILE"
      if [ $? -ne 0 ]; then
        echo "$0: ERROR: $FOLDER/$FILE is unreadable as JPEG"
      fi
    done
    if [ -e MD5SUMS ]; then
      md5sum -b -c MD5SUMS > /dev/null 2>&1
      if [ $? -ne 0 ]; then
        echo "$0: ERROR: in $FOLDER:"
        md5sum -c MD5SUMS 2> /dev/null | grep FAILED
      fi
    else
      echo "$0: WARNING: no MD5SUMS in $FOLDER, creating..."
      # *.* conveniently skips the MD5SUMS file itself
      md5sum -b *.* > MD5SUMS
      # The obvious bit, in retrospect
      chown guest:everyone MD5SUMS
    fi
  fi
done
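
One nice side effect versus Tripwire's central database: the MD5SUMS files live right next to the photos, so they end up on the DVD backups as well. Checking a burned disc later then boils down to something like this (the volume name is just a guess):

IFS=$'\n'
for FOLDER in `find /Volumes/Photos_backup -type d`; do
  if [ -e "$FOLDER/MD5SUMS" ]; then
    # grep away the ": OK" lines so only problems get reported
    (cd "$FOLDER" && md5sum -c MD5SUMS | grep -v ': OK$')
  fi
done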

More elaborate tests are forthcoming:

  • Lighter (less CPU-intensive) testing of file validity
  • Check if the directory was updated after the MD5SUMS file (i.e., files were added or deleted)
  • If so, store the current MD5SUMS in an RCS file (might as well keep track of history), compute fresh checksums, see whether any of the older files changed, and tease out the new filenames to add them; a rough sketch follows after this list.
  • E-mail me any changes
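
For the staleness check and the RCS part, I'm thinking along these lines (untested; it would slot into the per-folder loop above, in place of blindly trusting an existing MD5SUMS):

# Has the folder changed (files added or removed) since MD5SUMS was written?
if [ MD5SUMS -ot . ]; then
  # check what we still can against the old list...
  md5sum -c MD5SUMS 2> /dev/null | grep FAILED
  # ...archive the old list in RCS, then rebuild it from scratch
  ci -l -t-"photo checksums" -m"folder contents changed" MD5SUMS
  md5sum -b *.* > MD5SUMS
  chown guest:everyone MD5SUMS
fi

The e-mailing bit could be as simple as running the whole thing from cron, which mails any output to the owner anyway; piping through grep -v INFO first would keep the noise down.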

And using SHA-1 would be better, obviously, but sha1sum isn't available anywhere around here and I can't be bothered to compile it just now.
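
If it ever becomes pressing, openssl (which tends to be installed already) could stand in for sha1sum. Something along these lines would write and later check a SHA1SUMS file in the same spirit, although the format isn't compatible with md5sum -c:

# write the list (SHA1SUMS itself contains no dot, so *.* skips it)
for FILE in *.*; do openssl sha1 "$FILE"; done > SHA1SUMS

# check it later; diff also flags files added or removed in the meantime
for FILE in *.*; do openssl sha1 "$FILE"; done | diff SHA1SUMS - \
  || echo "SHA-1 trouble in `pwd`"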