Maintaining a coherent photo archive has been an issue for me ever since I started using a digital camera, and I’ve been hacking scripts for making sure things are consistent as far back as 2004, which, if you’ve been using Macs long enough, was a full four years before iPhoto supported IPTC metadata.
My needs are actually pretty simple: long ago I decided to just keep a master copy of everything on my NAS using a simple hierarchy (YYYY/MM/YYYYMMDDHHMMSS.foo) and to eschew fancy albums.
There are two main reasons for that:
- Pretty event-oriented albums are always short-lived because you end up having to use copies of your masters in one way or another, and I don’t want to tie my archive to a gazillion different album/social sharing tools[1].
- Apple just can’t seem to maintain a stable photo sharing service over the years: even updating an album on the Apple TV, on the same network as your Mac, is an unsightly mess.
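To make that YYYY/MM/YYYYMMDDHHMMSS.foo layout concrete, here is a minimal sketch of how a capture timestamp maps onto a path in the tree. The archive_path helper is hypothetical (it isn't part of any tool mentioned here), and it assumes extensions should be lowercased.

```python
from datetime import datetime
from pathlib import PurePosixPath


def archive_path(taken: datetime, extension: str) -> PurePosixPath:
    """Build a YYYY/MM/YYYYMMDDHHMMSS.ext path for a capture timestamp (hypothetical helper)."""
    # Accept ".HEIC", "HEIC" or "heic" and always emit a lowercase extension
    ext = extension.lower().lstrip(".")
    stem = taken.strftime("%Y%m%d%H%M%S")
    return PurePosixPath(taken.strftime("%Y"), taken.strftime("%m"), f"{stem}.{ext}")


if __name__ == "__main__":
    print(archive_path(datetime(2022, 7, 14, 15, 30, 12), ".HEIC"))
```

Run as-is, this prints 2022/07/20220714153012.heic. Using PurePosixPath keeps the separators as forward slashes regardless of the machine the script runs on, which matters when the tree lives on a NAS shared between macOS and Linux.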
So the real challenge is ensuring the files are filed properly according to creation date.
Where they come from is largely irrelevant, although my main “inbox” is still iCloud Photos and most things tend to go through there in one way or another for the sake of triage and the odd tweak[2].
But filenames are always a jumble and I just can’t rely on filesystem dates, so I’ve had a number of different strategies over the years, especially as we moved on from taking photos in JPEG and Canon .cr2 format (which use EXIF/XMP metadata) to HEIF/HEIC photos from more modern iPhones.
Plus, of course, there’s video, which even Swiss Army knife-type tools like exiftool have trouble with, and which now takes up at least half of the storage for each new year I add to my archive.
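Part of the mess is that the timestamps themselves come in different shapes: EXIF dates look like "2022:07:14 15:30:12" (with no time zone), while the creation_time that ffprobe reports for video is usually ISO 8601 in UTC, like "2022-07-14T15:30:12.000000Z". A minimal sketch of reducing either to the same 14-digit stem, assuming that keeping the first 14 digits is good enough (the date_stem name is hypothetical), could be:

```python
import re
from typing import Optional


def date_stem(raw: str) -> Optional[str]:
    """Reduce an EXIF or ISO 8601 timestamp to a YYYYMMDDHHMMSS stem (hypothetical helper).

    Keeps the first 14 digits, so "2022:07:14 15:30:12" and
    "2022-07-14T15:30:12.000000Z" both map to "20220714153012".
    Fractions and time zone suffixes are dropped, not applied.
    """
    digits = re.sub(r"\D", "", raw)[:14]
    # Reject short strings and the all-zero placeholder some cameras write
    if len(digits) != 14 or not digits.strip("0"):
        return None
    return digits
```

Note that this deliberately ignores time zones, so a photo and a video shot at the same moment can still end up with stems a few hours apart; reconciling that would need the EXIF offset tags or a known local zone.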
Tooling
On the Mac ecosystem, things really haven’t improved that much in the past few years.
The best I can say is that the Photos app exports literally everything well enough if you have “Download originals to this Mac” set, even down to (quite surprisingly) setting filesystem modification dates correctly, so I can mostly trust it to do the right thing.
But in this new, nerfed scripting era of Shortcuts there just aren’t any decent ways to rename files in bulk based on their metadata, and Photos cannot normalize file names the way I want to.
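Since Photos sets modification dates correctly on export, the file’s mtime is a usable fallback when there is no embedded metadata. A minimal sketch (stem_from_mtime is a hypothetical name) that formats it as a filename stem, with an explicit choice between UTC, which is what time.gmtime gives you, and the machine’s local zone:

```python
from datetime import datetime, timezone
from pathlib import Path


def stem_from_mtime(path: Path, utc: bool = True) -> str:
    """Format a file's modification time as YYYYMMDDHHMMSS (hypothetical helper).

    utc=True matches time.gmtime(); utc=False uses the local time zone,
    which is closer to how EXIF dates are usually recorded.
    """
    mtime = path.stat().st_mtime
    tz = timezone.utc if utc else None
    return datetime.fromtimestamp(mtime, tz=tz).strftime("%Y%m%d%H%M%S")
```

Which of the two to pick is a judgement call: UTC is unambiguous, but local time lines up better with EXIF DateTime values, which carry no zone at all.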
I’ve used a Python script to wrap jhead inside Automator for a long while, but that doesn’t work for HEIF files or video, so recently I decided to cheat and resort to reading Spotlight (mds) metadata via a hacky shell script:
#!/bin/zsh
for FILE in "$@"
do
  if [[ ! -f $FILE ]]; then
    continue
  fi
  DIR="$(dirname "$FILE")"
  EXTENSION="${FILE##*.}"
  # Only handle the kinds of media we want, normalizing extensions to lowercase
  # (exact matches, so partial strings like "JP" don't slip through)
  case "${EXTENSION:l}" in
    cr2|dng|png|heic|mov|mp4) EXTENSION="${EXTENSION:l}" ;;
    jpg|jpeg) EXTENSION="jpg" ;;
    *) continue ;;
  esac
  # Now grab EXIF/IPTC data that macOS has already figured out for us
  # (the trailing space in the pattern prevents matching derived properties)
  ISO_DATE=$(mdls "$FILE" | grep "kMDItemContentCreationDate " | cut -d= -f 2 | sed 's/[^0-9]//g' | cut -c1-14)
  if [[ ${#ISO_DATE} -eq 14 ]]; then
    # Skip files that are already named correctly
    if [[ "${FILE:t}" == "$ISO_DATE.$EXTENSION" ]]; then
      continue
    fi
    # Rename next to the original (not in the current directory), adding a letter on collisions
    for SUFFIX in "" {a..z}; do
      TARGET="$DIR/$ISO_DATE$SUFFIX.$EXTENSION"
      if [[ ! -f "$TARGET" ]]; then
        mv "$FILE" "$TARGET"
        break
      fi
    done
  fi
done
This is a little barbaric, though, and only works on macOS.
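For reference, the grep/cut/sed/cut pipeline in that script boils down to finding the kMDItemContentCreationDate line in mdls output and keeping its first 14 digits. A minimal Python sketch of the same parsing (stem_from_mdls is a hypothetical name), assuming mdls’s usual "2022-07-14 15:30:12 +0000" UTC date format and that derived keys such as kMDItemContentCreationDate_Ranking may also be present:

```python
import re
from typing import Optional

# Require whitespace right after the attribute name, so derived keys
# like kMDItemContentCreationDate_Ranking are skipped
CREATION_LINE = re.compile(r"^\s*kMDItemContentCreationDate\s+=\s*(.+)$", re.MULTILINE)


def stem_from_mdls(output: str) -> Optional[str]:
    """Extract YYYYMMDDHHMMSS from mdls output, mirroring the shell pipeline."""
    match = CREATION_LINE.search(output)
    if not match:
        return None
    digits = re.sub(r"\D", "", match.group(1))[:14]
    return digits if len(digits) == 14 else None
```

On macOS, asking mdls for just that attribute with -name kMDItemContentCreationDate and -raw would sidestep the grep entirely, but the output still needs the same digit extraction.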
Going Cross-Platform
Given that I have been doing more and more stuff in Linux, I wanted something more reliable and future-proof, so I found a HEIF plugin for Pillow, reached for the ffmpeg bindings and wrote this:
#!/usr/bin/env python3
from os import chdir, listdir, rename, stat
from os.path import exists, isdir, splitext
from sys import argv, exit
from time import gmtime, strftime
from typing import Optional

from ffmpeg import probe  # ffmpeg-python bindings
from PIL import Image, UnidentifiedImageError
from PIL.ExifTags import TAGS
from pi_heif import HeifImagePlugin  # noqa: F401, imported to register HEIF support with Pillow

# build a list of alphabetical suffixes, starting with a blank
SUFFIXES = [''] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
PHOTO_EXTENSIONS = [".jpg", ".jpeg", ".heic", ".png", ".cr2", ".dng", ".gif"]
VIDEO_EXTENSIONS = [".mp4", ".m4v", ".mov"]

def parse_exif(image: Image.Image) -> Optional[dict]:
    # Map numeric EXIF tag IDs to names (unknown IDs are kept as numbers)
    exif = image.getexif()
    if not exif:
        return None
    return {TAGS.get(k, k): v for k, v in exif.items()}

def safe_rename(filename: str, date: str, marker: str = "-") -> str:
    # Try date.ext, then datea.ext, dateb.ext, ... until a free name is found
    ext = splitext(filename)[1].lower()
    for s in SUFFIXES:
        new_filename = f"{date}{s}{ext}"
        if not exists(new_filename):
            print(f"{filename} -{marker}-> {new_filename}")
            rename(filename, new_filename)
            return new_filename
    print(f"{filename} -!-> {filename}")
    return filename

def scan_files(path: str) -> int:
    if not isdir(path):
        print(f"invalid path {path}")
        return -1
    chdir(path)
    for filename in listdir():
        ext = splitext(filename)[1].lower()
        # photos
        if ext in PHOTO_EXTENSIONS:
            try:
                with Image.open(filename) as image:
                    tags = parse_exif(image)
            except UnidentifiedImageError:
                tags = None
            # use EXIF data
            if tags and 'DateTime' in tags:
                # We get this as a "YYYY:MM:DD HH:MM:SS" string, so we can use it right away
                date = tags['DateTime'].replace(" ", "").replace(":", "")
                if not filename.startswith(date):
                    safe_rename(filename, date)
            # use modification date instead (Apple Photos sets it correctly on export)
            else:
                date = strftime("%Y%m%d%H%M%S", gmtime(stat(filename).st_mtime))
                if not filename.startswith(date):
                    safe_rename(filename, date, marker="?")
        # video
        elif ext in VIDEO_EXTENSIONS:
            for s in probe(filename)["streams"]:
                # not every stream carries tags, so don't assume the key exists
                creation_time = s.get('tags', {}).get('creation_time')
                if creation_time:
                    # e.g. "2022-07-14T15:30:12.000000Z" -> "20220714153012"
                    date = creation_time.replace("T", "").replace("-", "").replace(":", "")[:14]
                    if not filename.startswith(date):
                        safe_rename(filename, date)
                    break
        else:
            print(f"skipping {filename}")
    return 0

if __name__ == "__main__":
    if len(argv) == 2:
        exit(scan_files(argv[1]))
    else:
        print(f"Usage: {__file__} <path>")
        exit(1)
This is designed to work in almost exactly the same way the old jhead CLI tool used to, but for all the file formats I have (except JPEG-XR .jxr files from the Xbox Series X, which I can file manually).
I just archived all of my photos from 2021 and 2022 with the above, so I would call it “good enough” for the moment. Native apps like ExifRenamer can also do the job (although that one in particular doesn’t seem to have been updated recently), but I prefer my script since I really want a cross-platform solution these days.
It currently sits alongside an imagehash version that I hope to finish some day and use to batch remove duplicates and cropped versions, which is going to be essential once I start archiving the photos my kids take as well…
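As a rough illustration of the idea behind perceptual hashing for duplicate detection, here is a minimal sketch of an average hash ("aHash"), the approach the imagehash library’s average_hash is built around. It assumes the image has already been shrunk to a small grayscale grid (with Pillow, something like image.convert("L").resize((8, 8))), and the 5-bit threshold and the function names are arbitrary, hypothetical choices:

```python
from itertools import combinations


def average_hash(pixels: list[list[int]]) -> int:
    """Average hash of a small grayscale grid (e.g. 8x8, values 0-255)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # Shift in a 1 for pixels brighter than the mean, 0 otherwise
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(hashes: dict[str, int], threshold: int = 5) -> list[tuple[str, str]]:
    """Pairs of filenames whose hashes differ in at most `threshold` bits."""
    return [
        (x, y)
        for x, y in combinations(sorted(hashes), 2)
        if hamming(hashes[x], hashes[y]) <= threshold
    ]
```

This reliably flags re-encodes, resizes and mild brightness tweaks, but crops tend to shift the hash too much, so catching cropped versions would likely need a different technique on top.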
[1] I’ve also mostly given up on Flickr, although I might take up Pixelfed if it becomes more usable and my photographer friends join up as well.
[2] I’ve also never found in myself the faith required to trust Adobe or Lightroom with my photos, although I routinely try alternatives hoping that they aren’t fussy about storage. Guess what: everyone likes to reinvent the photo database wheel, and developers seem unable to just take a read-only filesystem tree from a NAS and work with it as is.