Keeping Tabs On Links

I have a long week ahead of me for various reasons (one of which is the need to spend some time in a US timezone to follow an internal event remotely), so I decided to clean up a bunch of things, including following up on last week’s upgrade.

Among other things (including the search buglets I’ve been expecting and an added sepia tinge for pages older than a year), I decided to fundamentally change the way I render tables, for three reasons:

  • Most of the top 10 consistently popular pages on the site (i.e., the ones people keep coming back to) are my “link tables” for various resources (like the Clojure and Python pages).
  • Most of those pages are still written in Textile, which is much better than Markdown for rendering complex tables but is an order of magnitude slower at rendering them, as last week’s profiler diagram hinted at.
  • Maintaining those link pages has grown to be a colossal pain over the years, since I was effectively managing the markup instead of the data in those tables, and cleaning them up was a chore.

So I thought a bit about how I want to manage the data inside all that markup, and decided to move all the table data to YAML, which allows for custom ordering and trivial formatting as well as private comments:

---
head:
  date: Date
  link: Link
  notes: Notes
# I can sort on specific columns, and will 
# eventually add the ability to reverse the sorting order
ordering: [-date, link, notes]
formats:
    link: <a href="{url}">{link}</a>
    date: "{date:%Y}"
types:
    date: date
body:
- date: 2018
  link: some page elsewhere
  url: http://acme.inc/foobar
  notes: |
    I can edit or append to this programmatically with zero 
    hassle, and even add markup down the line
# this will be grouped with the above
- date: 2018 
  notes: No link here, but no problem either, since the formatter will just null the link column

I don’t really like YAML, but it beats JSON for quick editing over SSH and helps me enforce a schema while preserving the ability to have big dollops of text to annotate resources.

It also parses and renders much faster than Textile when using the C extensions to PyYAML, and makes it much easier to switch to something else down the line, consolidate the data across multiple sections of the site (possibly stuffing it into a ‘real’ database as necessary), and automate link checks for older content.
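
As a rough illustration of how a file like the sample above could be turned into HTML, here is a minimal Python sketch. It is not the site's actual renderer: the function names (load_spec, coerce, sort_rows, render_cell, render_table) are made up for this example, and it assumes that a leading "-" in ordering means descending, that a bare year becomes a date on 1 January, that values are HTML-escaped, and that a cell is left empty when any field its template needs is missing. Grouping of repeated values is left out.

```python
import datetime
import html
from string import Formatter


def load_spec(path):
    """Parse a table file, preferring PyYAML's C loader when it is available."""
    import yaml  # PyYAML (third-party), imported lazily

    loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
    with open(path, encoding="utf-8") as handle:
        return yaml.load(handle, Loader=loader)


def coerce(value, kind):
    """Assumption: a bare year such as 2018 stands for 1 January of that year."""
    if kind == "date" and isinstance(value, int):
        return datetime.date(value, 1, 1)
    return value


def sort_rows(rows, ordering):
    """Stable multi-key sort; a leading '-' is assumed to mean descending."""
    for key in reversed(ordering):
        column = key.lstrip("-")
        rows.sort(
            key=lambda row: (row.get(column) is None, str(row.get(column) or "")),
            reverse=key.startswith("-"),
        )
    return rows


def render_cell(column, row, formats):
    """Apply the column's template, or return '' if a needed field is missing."""
    template = formats.get(column, "{" + column + "}")
    fields = [name for _, name, _, _ in Formatter().parse(template) if name]
    if any(row.get(name) is None for name in fields):
        return ""
    escaped = {
        key: html.escape(value.strip()) if isinstance(value, str) else value
        for key, value in row.items()
    }
    return template.format(**escaped)


def render_table(spec):
    """Render a parsed head/ordering/formats/types/body mapping as HTML."""
    head = spec["head"]
    types = spec.get("types", {})
    formats = spec.get("formats", {})
    rows = [
        {key: coerce(value, types.get(key)) for key, value in row.items()}
        for row in spec["body"]
    ]
    sort_rows(rows, spec.get("ordering", []))
    header = "".join(f"<th>{html.escape(label)}</th>" for label in head.values())
    lines = ["<table>", f"<tr>{header}</tr>"]
    for row in rows:
        cells = "".join(
            f"<td>{render_cell(column, row, formats)}</td>" for column in head
        )
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

With PyYAML installed, something like render_table(load_spec("links.yaml")) would produce the markup (the file name is only a placeholder). Keeping the renderer independent of the loader is what would make a later move to a different storage format cheap.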

Data Migration

The painful bit, of course, is converting hundreds of tables across over 7,000 pages on this site, especially since, for aesthetic reasons, I always used rowspan to group together categories of resources in various ways. That makes the markup much harder to maintain, for starters, and makes converting everything by hand utterly unfeasible.

So, being fundamentally lazy, I injected a custom XML parser into the site’s rendering pipeline that dumps every table it comes across to a YAML file, and am letting this VM do first-pass conversions as visitors (and various web crawlers) visit each page.
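
To give an idea of what such a first-pass conversion might involve, here is a small Python sketch that pulls tables out of rendered markup and expands rowspan groups back into plain rows. It is only an approximation under several assumptions: the rendered page is well-formed XML, the first row of each table holds the headers, there are no nested tables or colspans, and the names table_to_spec, extract_tables and dump_spec are invented for this example rather than taken from the real pipeline.

```python
import xml.etree.ElementTree as ET


def cell_text(cell):
    """Flatten a cell's contents into plain text."""
    return "".join(cell.itertext()).strip()


def table_to_spec(table):
    """Convert one <table> element into a head/body mapping, expanding rowspans."""
    rows = table.findall(".//tr")
    labels = [cell_text(cell) for cell in rows[0]]
    keys = [label.lower() for label in labels]
    pending = {}  # column index -> (rows still covered, text to repeat)
    body = []
    for tr in rows[1:]:
        cells = iter(tr)
        record = {}
        for index, key in enumerate(keys):
            if index in pending:
                # This column is still covered by a rowspan from an earlier row.
                remaining, text = pending.pop(index)
                if remaining > 1:
                    pending[index] = (remaining - 1, text)
            else:
                cell = next(cells, None)
                if cell is None:
                    continue
                text = cell_text(cell)
                span = int(cell.get("rowspan", "1"))
                if span > 1:
                    pending[index] = (span - 1, text)
                anchor = cell.find(".//a")
                if anchor is not None and anchor.get("href"):
                    record["url"] = anchor.get("href")
            if text:
                record[key] = text
        body.append(record)
    return {"head": dict(zip(keys, labels)), "body": body}


def extract_tables(markup):
    """Return one spec per <table> found in a well-formed page fragment."""
    root = ET.fromstring(markup)
    return [table_to_spec(table) for table in root.iter("table")]


def dump_spec(spec, path):
    """Write a spec out as YAML, keeping keys in insertion order."""
    import yaml  # PyYAML (third-party), imported lazily

    with open(path, "w", encoding="utf-8") as handle:
        yaml.safe_dump(spec, handle, sort_keys=False, allow_unicode=True)
```

Repeating the spanned value on every row means the grouping can be rebuilt at render time from the data alone, instead of living in the markup.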

This has the added benefit of making it obvious which pages are most likely to benefit from the conversion, and turns each update into a trivial read-through with occasional minor tweaks instead of a long, winding chore…