Keeping Tabs On Links

I have a long week ahead of me for various reasons (one of which is the need to spend some time in a US timezone to follow an internal event remotely), so I decided to clean up a bunch of things, including following up on last week’s upgrade.

Among other things (including the search buglets I’ve been expecting and an added sepia tinge for pages older than a year), I decided to fundamentally change the way I render tables, for three reasons:

  • Most of the top 10 consistently popular pages on the site (i.e., the ones people keep coming back to) are my “link tables” for various resources (like the Clojure and Python pages).
  • Most of those pages are still written in Textile, which is much better than Markdown for rendering complex tables but is an order of magnitude slower at rendering them, as last week’s profiler diagram hinted at.
  • Maintaining those link pages has grown to be a colossal pain over the years, since I was effectively managing the markup instead of the data in those tables, and cleaning them up was a chore.

So I thought a bit about how I want to manage the data inside all that markup, and decided to go with moving all the table data to YAML, allowing for custom ordering and trivial formatting as well as private comments:

# the top-level key names "headers", "formats" and "rows" are assumed here
headers:
  date: Date
  link: link
  notes: Notes
# I can sort on specific columns, and will
# eventually add the ability to reverse the sorting order
ordering: [date, link, notes]
formats:
  link: <a href="%(url)s">%(link)s</a>
rows:
- date: 2018
  link: some page elsewhere
  notes: |
    I can edit or append to this programmatically with zero
    hassle, and even add markup down the line
# this will be grouped with the above
- date: 2018
  notes: No link here, but no problem either, since the formatter will just null the link column
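
To make the "formatter will just null the link column" idea concrete, here is a minimal sketch of how such a table might be rendered once it is parsed into Python. It is not the site's actual code: the key names `formats` and `rows`, the `url` field, the example URL and both helper functions are assumptions for illustration, and `ordering` is taken to mean both the column order and (for its first entry) the sort key.

```python
# Hypothetical in-memory form of a parsed table; the key names "formats"
# and "rows" and the "url" field are assumptions, not the site's schema.
table = {
    "ordering": ["date", "link", "notes"],
    "formats": {"link": '<a href="%(url)s">%(link)s</a>'},
    "rows": [
        {"date": 2018, "link": "some page elsewhere",
         "url": "https://example.com/", "notes": "First entry"},
        {"date": 2018, "notes": "No link here"},
    ],
}


def format_cell(row, column, formats):
    """Render one cell; return an empty string if a needed field is missing."""
    if column not in row:
        return ""
    template = formats.get(column)
    if template is None:
        return str(row[column])
    try:
        # %-formatting with a dict fills in the named placeholders
        return template % row
    except KeyError:
        # e.g. a link without a url: null the cell rather than fail
        return ""


def render_rows(table):
    """Return rendered rows, stably sorted on the first ordering column."""
    columns = table["ordering"]
    formats = table.get("formats", {})
    rows = sorted(table["rows"], key=lambda r: str(r.get(columns[0], "")))
    return [[format_cell(r, c, formats) for c in columns] for r in rows]


rendered = render_rows(table)
```

Because the sort is stable, rows that share a date keep their original order, which is what makes grouping them afterwards straightforward.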

I don’t really like YAML, but it beats JSON for quick editing over SSH and helps me enforce a schema while preserving the ability to have big dollops of text to annotate resources.

It also parses and renders much faster than Textile when using the C extensions to PyYAML, and makes it much easier to switch to something else down the line and consolidate the data across multiple sections of the site, possibly stuffing it into a ‘real’ database as necessary and automating link checks for older content.

Data Migration

The painful bit, of course, is converting hundreds of tables across over 7,000 pages on this site, especially considering that, for aesthetic reasons, I always used rowspan to group together categories of resources in various ways, which makes the markup much harder to maintain, for starters, and makes converting everything manually utterly unfeasible.

So, being fundamentally lazy, I injected a custom XML parser into the site’s rendering pipeline that dumps every table it comes across to a YAML file, and am letting this VM do first-pass conversions as visitors (and various web crawlers) visit each page.
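
As a rough idea of what such a first-pass conversion could look like, the sketch below turns a well-formed table fragment into a list of row dictionaries, repeating rowspan cells so every row is complete. It is not the parser used on this site: `table_to_rows`, the sample markup and the `url` key are hypothetical, and it assumes the first row holds `<th>` headers and the markup parses as XML. Writing the result out (for instance with PyYAML's `yaml.safe_dump`) is left out to keep the example standard-library only.

```python
import xml.etree.ElementTree as ET


def table_to_rows(xhtml):
    """Convert a well-formed <table> fragment into a list of dicts.

    Header cells (<th>) supply the keys; a cell with rowspan="n" is
    repeated on the following n-1 rows. Empty cells are omitted.
    """
    table = ET.fromstring(xhtml.strip())
    trs = table.findall(".//tr")
    headers = ["".join(th.itertext()).strip().lower()
               for th in trs[0].findall("th")]
    pending = {}  # column index -> (rows still to fill, text)
    rows = []
    for tr in trs[1:]:
        cells = iter(tr.findall("td"))
        row = {}
        for index, key in enumerate(headers):
            if index in pending:
                # this column is covered by a rowspan from an earlier row
                remaining, text = pending[index]
                if remaining == 1:
                    del pending[index]
                else:
                    pending[index] = (remaining - 1, text)
            else:
                td = next(cells, None)
                if td is None:
                    continue
                text = "".join(td.itertext()).strip()
                span = int(td.get("rowspan", "1"))
                if span > 1:
                    pending[index] = (span - 1, text)
                anchor = td.find(".//a")
                if anchor is not None and anchor.get("href"):
                    row["url"] = anchor.get("href")  # assumes one link per row
            if text:
                row[key] = text
        rows.append(row)
    return rows


sample = """
<table>
  <tr><th>Date</th><th>Link</th><th>Notes</th></tr>
  <tr><td rowspan="2">2018</td><td><a href="https://example.com/">some page</a></td><td>First</td></tr>
  <tr><td></td><td>Second</td></tr>
</table>
"""
rows = table_to_rows(sample)
```

Flattening rowspan this way trades the visual grouping for complete records, which can be regrouped at render time by sorting on the shared column.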

This has the added benefit of making it obvious which pages are most likely to benefit from the conversion, and turns each update into a trivial read-through with occasional minor tweaks instead of a long, winding chore…

The Third Python

It’s been a harrowing couple of weeks as a bunch of work-related stuff unraveled and needed to be put back in its appointed place, but I managed to find some time to tinker around this weekend and get some personal stuff done, including upgrading this website from the ground up (beginning with the usual seamless Ubuntu upgrade to 18.04, which I’ve been testing for a couple of months).


The Future

a wide plaza between curved buildings with oval openings
The inner plaza at the Champalimaud Foundation.

Trivial Static Websites On Azure Blob Storage

At the risk of never getting around to picking up Kotlin, I’ve been revisiting other topics this weekend, namely updating last week’s Terraform example and figuring out the new features for static hosting on Azure, which I need to set up a little app I’m doing.


Terraforming Azure

As promised, this week I’m going to dive into Azure provisioning using Terraform, which is something I’ve been spending some time on, but which many folk in the Azure universe seem to be unaware of. Well, now there’s a copiously detailed example of how to bring them together, and I’m going to walk you through it.


Keeping Tabs On Azure Usage Using Jupyter

I haven’t written much of anything about Azure over the past year or so, other than assorted notes on infrastructure provisioning (to which I will get back, now that Terraform has an updated provider), nor about machine learning and data science: the former because it’s not a very sexy topic, and the latter because most machine learning in real life boils down to a lot of data cleaning work that is hardly reflected in all the pretty one-off examples you’ll see in most blog posts.


The Maze

a glass walkway connecting two buildings over the road
Where I spent most of my time last month.

Static Considerations

Every now and then I ponder whether or not to move this site to purely static hosting (i.e., off an S3 bucket or an Azure storage account), both because it annoys me somewhat to maintain the VM and because the less code I have running to keep it up the better (and this even after having automated away most chores and pared the code down to the absolute minimum).