Whilst (mostly) ignoring Black Friday and its assorted temptations (I have better things to do with cash, although I nearly jumped at getting one of these), I decided to ignore the plaintive beeping of my work gear, ignore a bunch of my To-Dos and sit down at my MacBook to do some actual work™, i.e., coding.
$DIVINITY was kind and visited upon us two ordinarily uninspiring rainy days during which I reviewed my (long neglected) personal projects, one of which (unsurprisingly) is this site, which has sported a (non-functional) archives section for a long while now (in fact, ever since the current redesign).
The markup is, in a word, horrific (and breaks every rule in the book, since a lot of the styling is inline), but it is a work in progress (I’ll do media queries and a few other tweaks later), and the priority here was to get the back-end bits right.
Since I’ve always liked to include image previews and have them available when I share content online, two of the things next in line for the archive feature were OpenGraph support (which was easy enough to get to work, but needed to be pushed down into the indexer and surfaced across the site templating, which is a pain) and some form of smart thumbnailing, which was a lot more fun.
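Surfacing that metadata in the templates mostly boils down to emitting a handful of `<meta property="og:…">` tags per page. Here is a minimal sketch, assuming the indexer hands each template a plain dict; the `opengraph_tags` helper and the dict keys are hypothetical, while `og:title`, `og:type`, `og:url`, `og:image` and `og:description` are standard OpenGraph properties.

```python
from html import escape


def opengraph_tags(meta):
    """Render OpenGraph <meta> tags for whichever keys are present in meta.

    meta is a hypothetical dict produced by the indexer, e.g.
    {'title': ..., 'url': ..., 'image': ...}; missing or empty keys are skipped.
    """
    lines = []
    for key in ('title', 'type', 'url', 'image', 'description'):
        value = meta.get(key)
        if value:
            # escape quotes and ampersands so titles cannot break the attribute
            lines.append(f'<meta property="og:{key}" content="{escape(value, quote=True)}">')
    return '\n'.join(lines)
```

Keeping the tag rendering in one helper means every template gets the same output, which is most of what "surfacing it across the site" requires.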
Doing Smart Thumbnails
This is a common enough feature to have zillions of variations, and the best ones use AI (or at least OpenCV’s Haar cascades) to locate faces, identify contrasting features, etc., and from that build a set of crop boxes that best match a given set of criteria (maximize faces, colors, contrast, etc.).
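Whatever produces the per-pixel "interest" score (faces, contrast, saturation), the final step of picking a crop box can be framed as a sliding-window search over that score map. The following is a minimal sketch of that generic technique, not smartcrop’s actual code: it assumes the interest map is already a 2D list of floats and only tries a single window size, whereas real implementations also try several scales.

```python
def integral_image(grid):
    """Summed-area table with an extra zero row/column, so box sums need no edge checks."""
    h, w = len(grid), len(grid[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += grid[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, x, y, w, h):
    """Total score inside the w x h box whose top-left corner is (x, y)."""
    return sat[y + h][x + w] - sat[y][x + w] - sat[y + h][x] + sat[y][x]


def best_crop(grid, crop_w, crop_h, step=1):
    """Return (x, y, w, h) of the highest-scoring window, or None if it doesn't fit."""
    sat = integral_image(grid)
    height, width = len(grid), len(grid[0])
    best, best_score = None, float('-inf')
    for y in range(0, height - crop_h + 1, step):
        for x in range(0, width - crop_w + 1, step):
            score = box_sum(sat, x, y, crop_w, crop_h)
            if score > best_score:
                best, best_score = (x, y, crop_w, crop_h), score
    return best
```

Thanks to the summed-area table each candidate window costs four lookups, so the sampling `step` and the resolution of the interest map dominate the running time rather than the window size.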
After a little research, and given a single day to get this done (amidst all the usual stuff you need to pack into a weekend), I had four choices, really:
- Use Azure Cognitive Services to go through all my images (and store the bounding rects as metadata)
- Turn Håken Lid’s excellent article into a reusable library
- Build my own thing from scratch using TensorFlow (I have enough to go on)
- Modify smartcrop.py to run faster, using evil trickery
The first would give me the best results for the least code, but one of the design goals for my CMS is that it should be able to operate entirely offline, and I would learn exactly zero in the process.
The next two would ordinarily make more sense, but they would require me to add significantly larger dependencies to my project (neither OpenCV nor TensorFlow is a tool you can wield without qualms).
So I naturally went for hacking away at smartcrop, which is a fairly inefficient algorithm in pure Python but can be sped up tremendously by tweaking the sampling and caching the cropping rectangles. It’s by no means instant and bumps CPU requirements up a notch, but Cloudflare happily caches the resulting images, which makes it viable (at least for now).
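The two speedups mentioned above can be sketched roughly as follows: run the expensive analysis on a downsampled copy, then memoize the resulting rectangle per image and target size. This is a minimal sketch under stated assumptions: `find_crop` is a hypothetical stand-in for the actual smartcrop analysis (it is not smartcrop.py’s API), the 256-pixel analysis size is arbitrary, and the cache is a plain in-memory dict.

```python
import hashlib

# Hypothetical in-memory cache; a real deployment might persist it
# alongside the indexer's metadata so it survives restarts.
_crop_cache = {}


def analysis_size(width, height, max_side=256):
    """Shrink dimensions so the longest side is at most max_side (never upscale)."""
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale)), scale


def cached_crop(image_bytes, width, height, target_w, target_h, find_crop):
    """Return a full-resolution (x, y, w, h) crop, calling find_crop at most once per image and size."""
    key = (hashlib.sha1(image_bytes).hexdigest(), target_w, target_h)
    if key not in _crop_cache:
        small_w, small_h, scale = analysis_size(width, height)
        # find_crop (hypothetical) analyses the downsampled image and returns
        # a rectangle in downsampled coordinates for the requested aspect ratio
        x, y, w, h = find_crop(small_w, small_h, target_w / target_h)
        # map the rectangle back onto the original image
        _crop_cache[key] = tuple(int(round(v / scale)) for v in (x, y, w, h))
    return _crop_cache[key]
```

Keying on a content hash rather than a file name means an edited image gets a fresh crop, while repeated thumbnail requests for the same size never rerun the analysis.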
All that’s needed to build a nice archive page, then, is to use the OpenGraph metadata gathered by the indexer, generate a set of signed URLs¹ to request the cropped thumbnails, and we’re good to go.
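The footnote describes inserting an HMAC into image URLs and checking it when they are requested. A minimal sketch of that signing and checking, using Python’s standard `hmac` module, assuming a hypothetical `/thumb/<w>x<h>/<path>?sig=` URL layout, a placeholder secret, and a tag truncated to 16 hex characters; none of these details come from the actual site.

```python
import hashlib
import hmac

SECRET = b'change-me'  # hypothetical placeholder; keep the real key out of source control


def sign(path, secret=SECRET):
    """Short hex HMAC-SHA256 tag for a thumbnail path."""
    return hmac.new(secret, path.encode('utf-8'), hashlib.sha256).hexdigest()[:16]


def thumbnail_url(image_path, width, height, secret=SECRET):
    # hypothetical layout: the size is part of the signed path, so it can't be altered
    path = f'/thumb/{width}x{height}/{image_path.lstrip("/")}'
    return f'{path}?sig={sign(path, secret)}'


def verify(path, sig, secret=SECRET):
    # constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(sign(path, secret), sig)
```

Because the requested dimensions are signed along with the image path, nobody can ask the server to burn CPU on arbitrary crop sizes.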
The next order of business was to figure out what to do with the articles that didn’t have images.
Thumbnails for Text
The image above hints at what I ended up going with and which afforded me an extra bit of fun.
Faced with the prospect of having to lay out and style articles differently depending on whether they had images or not, I decided to generate simple single-letter previews for the ones without.
I was inspired both by Pocket’s new article browser (which, incidentally, works so poorly with Safari that it’s become painful to use of late, to the point where I’m considering cancelling my subscription) and by the drop caps I went with for article leads when I redesigned the site.
The code to generate those single-letter images was the most straightforward and fun bit I wrote all weekend (at least so far), and is worth sharing here:
```python
from hashlib import sha1
from io import BytesIO

from PIL import Image, ImageDraw, ImageFont


def hex_to_rgb(value):
    # 'BDDFDE' -> (189, 223, 222)
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def get_letter_image(width, height, title, font):
    buf = BytesIO()
    initial = title.strip()[0].upper()
    # (background, foreground) pairs
    colors = [['BDDFDE', '50A8A2'], ['F4E5CC', 'ECB354'], ['EDCACF', 'D94C58'], ['98CCF3', '4EA3EF']]
    # ensure color and position are consistent for the same title
    color, off_x, off_y = sha1(title.encode('utf-8')).digest()[-3:]
    bg, fg = map(hex_to_rgb, colors[color % len(colors)])
    # scale out font and letter offsets
    size = int(max(width, height) * 1.2)
    off_x = off_x / 255.0 * width
    off_y = off_y / 255.0 * height
    im = Image.new(mode='RGB', size=(width, height), color=bg)
    fnt = ImageFont.truetype(font, size)
    d = ImageDraw.Draw(im)
    # anchor='mm' centers the glyph on (off_x, off_y); textsize() was removed in Pillow 10
    d.text((off_x, off_y), initial, font=fnt, fill=fg, anchor='mm')
    im.save(buf, format='JPEG', progressive=True, optimize=True, quality=80)
    return buf.getvalue()
```
Next Steps
Besides tweaking the colors for single-letter images and cleaning up the CSS (which I’ll be doing after testing the current layout for a few days, to let it sink in), I’m also planning to tweak the navigation a fair bit.
Right now it’s limited to showing a single month by design (I wanted to limit the number of simultaneous thumbnail requests, among other things) and it will fail spectacularly if you go too far back in time, but I mean to tie that in with search and redo both at once.
It might take a while, though: I have other things in the works, and one of them does include OpenCV and a fair amount of C++.
¹ In order to prevent abuse, I insert an HMAC into all image URLs and do some extra checks when they are requested. Cloudflare helps a bit here, and I will likely move this to CDN workers some day…