How I Manage My Personal Infrastructure in 2026

As regular readers would know, I’ve been on the homelab bandwagon for a while now. The motivation for that was manifold, starting with the pandemic and a need to have a bit more stuff literally under my thumb.

But I also have a few services running in the cloud (in more than one cloud, actually), and I’ve seldom written about that or the overlaps between that and my homelab.

Zero Exposed Endpoints

One of my key tenets is zero exposed endpoints. That means no web servers, no weird port knocking strategies to get to a machine, nothing.

If it’s not exposed to the Internet, then it’s not something I ever need to worry about. But of course there’s a flip side to that: how do I get to my stuff when I need to?

This won’t be popular in many circles, but everything I have exposed to the Internet is behind Cloudflare in one form or another:

  • This site is fronted by Cloudflare (even though it is static HTML in an Azure storage account)
  • I use Cloudflare Tunnels to make a couple of services accessible from outside the house (including some whimsical things like a way to share my screen when doing whiteboarding), and they are shut down automatically overnight (see the sketch after this list).
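As a rough sketch of that overnight shutdown (assuming the tunnel runs as a docker compose service named cloudflared; the path and times are placeholders), a pair of cron entries is all it takes:

# stop the tunnel at 1AM, bring it back at 8AM
0 1 * * * cd /srv/tunnels && docker compose stop cloudflared
0 8 * * * cd /srv/tunnels && docker compose start cloudflared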

To get at anything else, I use Tailscale (extensively, from anywhere to anywhere). The VMs or VPSes I use across providers are only accessible via Tailscale (and, of course, whatever native console the provider exposes), and typically have no exposed ports.
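On a fresh VPS, locking things down to the tailnet boils down to something like this (a sketch, assuming ufw and the default tailscale0 interface name):

# join the tailnet and enable Tailscale SSH
sudo tailscale up --ssh
# drop everything arriving on public interfaces, allow traffic from the tailnet only
sudo ufw default deny incoming
sudo ufw allow in on tailscale0
sudo ufw enable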

Static Is Faster, Lighter and Simpler

I’ve come to the conclusion over the years that there is no real reason for me to run a public web server for 90% of what I do, so things like this site, my RSS feed summarizer, and anything else that needs to publish Web content are designed from scratch to generate static content and push it out to blob storage.

That way, serving HTTP, managing certificates, and handling traffic spikes are literally someone else’s problem, and I never have to worry about it again.
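For the Azure side of things, the publish step is essentially a one-liner (a sketch; the account name and source folder are placeholders):

# push the freshly generated site into the storage account's static website container
az storage blob upload-batch \
  --account-name examplesite \
  --destination '$web' \
  --source ./public \
  --overwrite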

Anything fancy or interactive I typically deploy on my homelab or inside Tailscale.

But, interestingly enough, I also:

  • don’t need to worry about response times
  • need a lot less CPU and memory altogether to get something done
  • can pack a lot more services into cheaper, often single-core VMs

Squeezed Down, Concentrated Compute

Over the years I dabbled with many forms of service deployments, and after a long time building enterprise stuff and then refactoring those as microservices (typically using message queues, since HTTP makes you fall into all sorts of synchronous traps), I gradually came to a point where I started questioning how cost-effective some of those approaches were.

So today I don’t use any form of serverless compute. I have often been tempted by Cloudflare Workers, but since I don’t need that kind of extremely distributed availability and I can pack 12 small services inside a dual-core ARM VPS for a negligible fixed amount of money, I don’t have to worry about spikes.

If something somehow becomes too popular, the VM acts as a containment boundary for both performance and cost, which is much better than an unbound serverless bill.

I have been sticking to that approach for years now, since it’s cheap, predictable, and extremely easy to maintain or back up. And since I deploy most of my services as plain docker compose, I can set CPU and RAM limits if needed.
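For reference, this is roughly what those limits look like in a compose file (a sketch; the service name and numbers are made up):

services:
  feed-summarizer:
    image: feed-summarizer:latest
    deploy:
      resources:
        limits:
          cpus: "0.50"    # half a core
          memory: 256M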

Keep It Dead Simple

Although it’s inescapable in many modern production environments, I don’t use Kubernetes for my own stuff, and the key reason for that is simplicity.

I don’t want to have to worry about managing a cluster, handling volume claims, or dealing with the additional CPU and memory overhead that comes with it.

So I have been deploying most of my stuff using docker compose and kata, my minimalist, ultra-pared-down deployment helper, which has an order of magnitude less complexity.

If I need redundancy or scale-out, it’s much simpler to deploy docker swarm, mount the local provider’s external storage of choice on all the nodes (if needed) and have an ultra-low overhead, redundant deployment. In fact, I have been doing exactly that for years now.
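When I do go that route, the whole thing is a handful of commands (a sketch; the stack name is arbitrary):

# on the first node
docker swarm init
# on each additional node, using the join token printed by the command above
docker swarm join --token <token> <manager-ip>:2377
# deploy the same compose file as a replicated stack
docker stack deploy -c compose.yml mystack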

And, again, it’s easy to set up VM backups with point-in-time restore in the vast majority of providers. If there’s one boring technology that everyone got right, it’s definitely VM backups.

SQLite is awesome

I work a lot with huge data warehouses, data lakes, and various flavors of databases, plus all the madness around data medallions, ETL, data marts, and the various flavors of semantic indexing the agentic revolution needs but nobody really mentions.

But for my own stuff, I always go back to SQLite because it is both much simpler and surprisingly flexible (there’s a quick sketch after this list):

  • I can store timeseries data in it
  • it is stupidly fast (there’s a 10GB SQLite file with all my home automation telemetry for a year, and it’s surprisingly zippy for the hardware it’s running on)
  • I can enrich it with indexable JSON without breaking the whole schema
  • it has baked-in full-text indexing
  • I can use it as a vector store with a couple of extensions
  • I can back it up trivially
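As a quick illustration of the JSON and full-text bits, here’s a sketch using the sqlite3 CLI (table and column names are made up):

sqlite3 telemetry.db <<'SQL'
CREATE TABLE IF NOT EXISTS readings (ts INTEGER, payload TEXT);
-- index straight into the JSON payload without changing the schema
CREATE INDEX IF NOT EXISTS idx_temp ON readings (json_extract(payload, '$.temperature'));
-- baked-in full-text search, no extra services required
CREATE VIRTUAL TABLE IF NOT EXISTS notes USING fts5(body);
INSERT INTO notes VALUES ('boiler temperature spiked at 3AM');
SELECT * FROM notes WHERE notes MATCH 'boiler';
SQL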

So I hardly ever deploy any kind of database server–but when I do, it’s always the same one.

And yet, I only have two instances of it running these days.

Secrets Management

Even with zero exposed endpoints, secrets management is still a thing I need to worry about. To reduce the number of moving parts, I’ve been using docker swarm secrets for most of my apps (or just the provider’s secrets management: Azure Key Vault, AWS Secrets Manager, etc.).
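As a minimal sketch of the swarm side (the secret name is arbitrary), creating and consuming a secret looks like this:

# create the secret once, on a manager node
printf '%s' "$API_TOKEN" | docker secret create feed_api_token -
# services that list it under "secrets:" in the stack file
# can then read it from /run/secrets/feed_api_token at runtime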

On my homelab I’ve been using HashiCorp Vault, but it is far too complex for most of my needs and I’ve been dabbling with a replacement.

Bringing It All Home

My homelab approach is pretty much the same: everything is behind Tailscale, (almost) nothing is directly exposed to the Internet, and I use docker compose for most of my application deployments, except that the hypervisor is Proxmox and I use LXC containers extensively instead of full VMs.

There is a lot of FUD out there around running docker inside LXC, but backing up an entire container and its multiple docker compose applications as a single unit is incredibly convenient and much more efficient than a VM.

The only real issue I’ve had (a couple of times) is that a misbehaving container can bog down the entire host if resource limits are not set properly or if it abuses I/O (which is particularly easy to do in a NAS with HDDs), so those services live inside regular VMs.
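For the containers that do stay on LXC, capping CPU and RAM is a one-liner per container (a sketch; the container ID and numbers are made up, and I/O is much harder to rein in):

# cap container 104 at two cores' worth of CPU time and 2 GB of RAM
pct set 104 --cores 2 --cpulimit 2 --memory 2048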

Incidentally, podman has a bunch of issues running inside LXC containers, largely related to cgroup and UID management.

As to the service definitions themselves (docker compose files and the like), everything is backed up, of course, and I use git to manage all of it, together with a few custom actions.

Observability

This is the bit that I have been sorting out, and I’m converging towards a combination of simple time-series metrics and a custom OpenTelemetry collector I’m working on called Gotel to gather traces and logs from my applications.

I use the cloud provider’s managed backends for those (Azure Application Insights, AWS CloudWatch, etc.), but I want something simpler and more portable for my own stuff–and I’m building it now.

Notes for December 25-31

OK, this was an intense few days, for sure. I ended up going down around a dozen different rabbit holes and staying up until 3AM doing all sorts of debatably fun things, but here are the most notable successes and failures.

Incidentally, this post is a great example of why I think LLM-assisted coding pays off when you have the ability and skills to guide it–without GitHub Copilot, I wouldn’t have even begun to scratch any of the more complex things I put together this week.

And scratching I did, very much so. I went through many of my long-term itches and annoyances regarding a bunch of semi-random things I care about, and tackled as many as I could fit into the week for sheer fun.

The Unreasonable Popularity of PhotosExport

To begin with, I decided to modernize and (re)automate one of my usual year-end chores once and for all: Curating and filing the year’s photos off my iPhone and saving a snapshot to my NAS.

This has become a high-volume affair now that we have the new iCloud Shared Photo Library, to which my wife, our kids, and I save the photos we want to keep and share.

Even without Apple (finally) blessing us with that feature, exporting all of my photos off iCloud and saving them to my NAS with a coherent, reproducible naming convention and all the original formats has been a recurring issue for many years, and I keep revisiting it.

Of course I’ve hacked together various things to help over the years, but guess what, macOS Tahoe and Apple’s utter neglect of automation have made those untenable (there isn’t even a decent Shortcuts action to export photos).

So this year I did it again, but using an approach I (perhaps naïvely) believe Apple won’t break, and the result was PhotosExport–a typically “me” tool in the sense that it does the absolutely bare minimum I need in the simplest possible way with the least amount of dependencies and frills.

It’s not even a proper Mac app, just a highly opinionated CLI tool that you can build and run yourself without any significant tooling beyond the standard command-line developer tools.

And since I was a bit annoyed at Apple’s cloud features (including the fact that I can’t get at any of the interesting extra metadata and tagging), I decided to adorn the README with a suitable picture:

Record scratch: You may be wondering if this is self-referential

I didn’t think anyone else would be interested in it, but somehow it took off and got 160+ GitHub stars and 5 forks in a couple of days, as well as quite a bit of direct feedback–including from people who see it as a way to migrate off iCloud/Apple altogether.

Classic Mac Pedal To The Metal

In parallel, I proceeded to bite off more than I could chew on various other fronts.

For instance (and this is one of the failures), I had a go at building on my drawterm port and trying to get it to work, but I would have to port both it and dis to 64-bit and that was a bit too much–but I did take a stab at it.

Somehow, that led me to something that consumed me way into the wee hours last night–trying to improve Basilisk II performance on low-end ARM hardware.

I could have spent these quiet days installing the new LCD display I got or finishing the design for a new case.1 Instead, I decided to see if I could:

  • Build a “baremetal” version that leveraged the Pi’s opengles2 support (which I sort of already had, since my manual builds used the frame buffer directly)
  • Add an ARM JIT engine.

And then I thought–heck, why not make it two engines? (cue the “five blades” meme)

After all, I have had to run Basilisk II on fairly low-end 32-bit hardware in the past, and the 32-bit builds are actually faster on 64-bit Raspbian for some things, so I took the partial ARM32 JIT engine from ARAnyM and paired it with the Unicorn Engine. It lacks FPU emulation, but seems to be able to JIT chunks of 68k code OK, if disappointingly slowly.

And… 24 hours later, I have experimental releases of both 32 and 64-bit JIT builds.

Only the 32-bit one actually boots for now and seems snappy, but has some SDL and screen corruption issues I’m trying to figure out (this is one of the most extreme cases, it’s a little better now):

Yeah, I know even the case looks rough...

…but I have just spent one of the most fun late nights in years fighting with raw, unfettered low-level code, and even if I have to thank AI for a lot of the guidance and exploratory work, the fun was all mine.

Pro Tip: Git worktree support is a complete lifesaver. Also, consider building specialized, local MCP servers to help you with common tasks.

The only thing that bugs me is that this took away nearly all of the time budget I had allocated to either finishing the 3D model for a new case or opening things up and starting to figure out how to mount the hardware inside (although I have made a start).

Node-RED Redemption

But Basilisk II wasn’t the only thing I decided to reboot. For a couple of weeks now, I have been quietly poking at a completely rebooted version of the original Node-RED dashboard module, rebuilt from the ground up using Preact and Apache ECharts.

I started doing it almost exclusively because of this Dashboard 2.0 issue, which has been open since 2024, and it sort of ballooned from there.

Truth be told, I’m also doing it because I think that the original dashboard is a much better fit for my needs in general, especially given the way it is deeply integrated into Node-RED.

And I had a bunch of ulterior motives:

  • I wanted to do a sizable front-end project, and I might as well do some refactoring work to get started since I can compare it to the original.
  • I have always wanted to get rid of the Angular layer and have a single, unified, modern JavaScript file that didn’t require additional tooling to maintain.
  • I really need a dashboard that works the way the old one did. And I am 400% positive I am not the only one.

I’ve also always been a fan of the utterly no-frills, very lightweight take Preact has on handling components, so I guess things just clicked the moment I started using it heavily.

Right now I have most of the components “working”, although there are a few differences and nearly all the charts are buggy (I am still mostly refactoring and generating tests to ensure UX and behavior compatibility), but it is… usable in a way:

Cue the CSS blinds meme

Once I start daily driving it I’ll likely do the unthinkable and end up maintaining an npm package.

As a nice bonus, I added a bunch of locales that had never been supported, bringing the total up to 12:

  • 🇺🇸 en-US (English)
  • 🇩🇪 de (German)
  • 🇪🇸 es-es (Spanish)
  • 🇫🇷 fr-fr (French)
  • 🇮🇹 it-it (Italian)
  • 🇯🇵 ja (Japanese)
  • 🇵🇹 pt-pt (Portuguese - Portugal)
  • 🇧🇷 pt-br (Portuguese - Brazil)
  • 🇨🇳 zh-cn (Simplified Chinese)
  • 🇹🇼 zh-tw (Traditional Chinese)
  • 🇰🇷 ko (Korean)
  • 🇷🇺 ru (Russian)

Pro Tip: If you’re doing this kind of localization work, build a tool that does random sampling of your locale strings (I’m doing five sets of three - the English base plus two translations), and any LLM (even the simplest local model) will churn through the mistakes in minutes.
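The sampling itself is trivial if your locale files are flat JSON (a sketch; paths and locale choices are made up):

# pick three random keys from the English base and dump them alongside two translations
for key in $(jq -r 'keys[]' locales/en-US.json | shuf -n 3); do
  for loc in en-US pt-pt de; do
    printf '%s\t%s\t%s\n' "$key" "$loc" "$(jq -r --arg k "$key" '.[$k]' "locales/$loc.json")"
  done
done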

Only Fans, Sorta… Not

After almost a year of poking at ways to keep things cooler in my closet, I finally fixed the fan controller on one of my machines (or, rather, found the Linux daemon that could deal with its proprietary PWM controller).

I haven’t pushed any of my scripts or notes to GitHub yet, but they’re going to end up on this repo.

Serendipitously, one of my servers’ CPU fans died sometime this week, and I had to do some open heart surgery on it:

Yes, of course I modded the PSU fan to bring in extra cooling

Fortunately, I have spares of this kind of thing lying around, but it led me down another rabbit hole, which is taking another stab at fixing one of my long-time outstanding issues–having comprehensive metrics and alarms (i.e., observability) for all of my hardware (and home automation, and app metrics).

InfluxDB and Telegraf

Proxmox has built-in monitoring for CPU, storage, etc., and I have long stuck to a very simple home-grown approach for my home automation temperature and power charts.

But I didn’t have a unified view of a lot of things–including baremetal data like CPU temperatures, fan speeds, etc.–and no real alarms other than a ZFS monitoring script that I put together when I repurposed some old HDDs.

After avoiding it for years I decided to look into InfluxDB, quickly steered away from the “big data style” version 3 rewrite and set up InfluxDB 2.0 on my main NAS.
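The setup itself is just the official container (a sketch; the data path is a placeholder):

docker run -d --name influxdb \
  -p 8086:8086 \
  -v /tank/influxdb:/var/lib/influxdb2 \
  influxdb:2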

Pointing Proxmox at it was a very trivial setup from the Datacenter settings–a few clicks and all my nodes, VMs and LXCs were sending metrics to it.

But were they?

Well… not really. The baked-in PVE collector is really picky about settings, and the only way I got it to work was to set up a dedicated proxmox bucket (it refused to send LXC metrics to anything else, which I suspect is a bug).

Pro Tip: Check the Proxmox forums whenever you come across this kind of inconsistency. Sadly, it seems this bug has been around for years and surfaces inconsistently.

Since I absolutely loathe Grafana (it’s overly complicated), I decided to have a go at building dashboards on InfluxDB directly, which… Nope.

So of course I started another project:

I forked steward and am building yet another agentic dashboard creation application that will eventually make it easy for me to create Vega or Giraffe grammars and Flux scripts from just talking to an AI.

Yes, I know this is a trope, but I figured I might as well get on that bandwagon.

This is going to be a slow-burning project, so in the meantime I have already started piping zigbee2mqtt metrics into InfluxDB and I am setting up telegraf on all my physical machines and Docker hosts for both temperature/hardware sensor monitoring and detailed container statistics.
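The relevant bits of the telegraf config are short (a sketch; the URL, org, bucket and token are placeholders, and inputs.sensors relies on lm-sensors being installed):

# cat /etc/telegraf/telegraf.d/hardware.conf
[[outputs.influxdb_v2]]
  urls = ["http://nas.local:8086"]
  token = "${INFLUX_TOKEN}"
  organization = "home"
  bucket = "telegraf"

# CPU temperatures and fan speeds via lm-sensors
[[inputs.sensors]]

# per-container CPU, memory and I/O statistics
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"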

But first, I’m automating the heck out of the roll-out with ground-init to ensure it’s repeatable. Watch that repo for templates over the next few weeks.

And yes, I eventually caved in and installed Grafana to make do. But I intend to “experience” it again at leisure to see how much simpler I can make my own dashboarding experience. I hate it, but at least it works for now:

As anyone who's done Ops will tell you, flat charts are the best charts--they mean there's no trouble

The only thing I’m missing now is proper OpenTelemetry-like application performance metrics. Most of my cloud stuff uses Azure Application Insights for metrics, tracing, and exception logging, and even if I might be able to shoehorn part of it into InfluxDB, any suggestions on something I can do about it (and yes, I have Signoz on my shortlist) are welcome.

Update, the day after: Boy, a new year sure changes your perspective. Anyway, I decided to ditch InfluxDB given the artificial limitations in 3.0 and their deprecation of Flux, and use Graphite instead–I just want simple time-series storage and graphing without all the bloat, and a single container gives me everything I need for initial exploration. More on that later as I fine-tune my telegraf setups and figure out alerting, which is the one missing piece for me now.

We Have Hologram Microcosm At Home

This one is completely off the wall, but worth mentioning since it’s been on my back-burner for years and I have finally gotten around to it.

In short, the Hologram Electronics Microcosm is a very fancy granular effects pedal that took the synth world by storm a few years ago, and that I have always wanted to play with.

But my music hobby only really happens when I’m happy at work, and there is no way I could ever justify spending that much on one, so I have been tinkering with the idea of replicating most of its features in software somehow.

And then I remembered I have a Norns Shield I built a few years ago that can, on paper, do most of it, and I had a wild idea: I got the Microcosm PDF manuals, converted them to Markdown, extracted all the feature descriptions and built a very comprehensive, LLM-ready SPEC.md with absolutely everything I could glean from the documentation.

The result of around 30 minutes of feeding that to GitHub Copilot was nanocosm:

I was completely blown away by the fact that the spec resulted in a working UI

…and, even more mind-blowingly, the first effect just worked.

I plugged an instrument in, piped the audio through, and it did some of the granular synthesis/looping/reverb I expected. I was hooked, and ended up spending most of last Friday fiddling with it.

Now for the reality check: I have no idea if it actually sounds like the Microcosm. After two days “implementing” all the effects I keep running into SuperCollider issues and UI glitches, and the Norns has a pitiful number of physical controls when compared to the Microcosm.

Still, I now have an effects toy that is a lot of fun to tinker with–it’s a great Christmas present for myself.

The icing on the cake is that I also ended up building an MCP server for it that is sophisticated enough for me to “show” Copilot what is in the UI and what sliders I’m tweaking so we can refine things together:

Yes, I hacked the websocket connection from the web UI

I still need to sort out the SuperCollider bugs (right now I’m not releasing all of the filters in some UI interactions) and figure out if this is actually releasable. It is, after all, a clean room re-implementation of the Microcosm, and as any synth nerd will tell you, Behringer is still very much in business… but I need to think about it, and if I reach a positive conclusion I will put it up on GitHub as well.

Other Stuff

Finally, there were a few other minor successes/failures across various smaller projects.

These needed nearly daily fixes, since real life keeps coming up with corner cases–but they were all simple enough to be fixed with toad and the free OpenCode models using nothing but my iPad and a terminal window, so that was great.

But even though I’m quite happy with all of these hacks and this was arguably one of my best grown-up holiday weeks ever, I think I need to go back to dealing with hardware again.

I have been meaning to build my own ZigBee devices, but I also have new single-board computers and mini-PCs to test, so I think I’ll try to switch gears to those for the remaining few days before I go back to work (which is effectively Friday, but I’m looking forward to the weekend).

Thank You

This is likely going to be the last post of the year, so I would very much like to thank:

  • Everyone who’s visited or subscribed to this site (and hence followed a good deal of my hobbies)
  • All the vendors who’ve graciously provided review hardware throughout the year (and hence not just given me unique opportunities to gauge the state of various pieces of hardware, but also indirectly helped my private consulting engagements, since all that testing helps me keep my technical skills sharp)

But, most importantly, if you’ve read this far, I wish everyone a very Happy New Year.

May 2026 be a much better year for everyone all around (regardless of whether my predictions pan out or not).


  1. I’ve been slowly poking at this for a long time, after all, and the inflection point was, I think, when I decided to set up my own automated builds of Basilisk II↩︎

TIL: Restarting systemd services on sustained CPU abuse

I kept finding avahi-daemon pegging the CPU in some of my LXC containers, and I wanted a service policy that behaves like a human would: limit it to 10%, restart immediately if pegged, and restart if it won’t calm down above 5%.

Well, turns out systemd already gives us 90% of this, but the documentation for that is squirrely, and after poking around a bit I found that the remaining 10% is just a tiny watchdog script and a timer.

Setup

First, contain the daemon with CPUQuota:

sudo systemctl edit avahi-daemon
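# ...and add the following to the override file it opens: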
[Service]
CPUAccounting=yes
CPUQuota=10%
Restart=on-failure
RestartSec=10s
KillSignal=SIGTERM
TimeoutStopSec=30s

Then create a generic watchdog script at /usr/local/sbin/cpu-watch.sh:

#!/bin/bash
set -euo pipefail

UNIT="$1"
INTERVAL=30

# Policy thresholds (CPUQuota above is 10% of one core)
PEGGED_NS=$((INTERVAL * 1000000000 * 9 / 100))    # ~90% of the 10% quota over the interval
SUSTAINED_NS=$((INTERVAL * 1000000000 * 5 / 100)) # 5% CPU over the interval

STATE="/run/cpu-watch-${UNIT}.state"

current=$(systemctl show "$UNIT" -p CPUUsageNSec --value)
previous=""
[[ -f "$STATE" ]] && previous=$(cat "$STATE")
echo "$current" > "$STATE"

# First run (or first run after a reboot): just record the baseline
[[ -z "$previous" ]] && exit 0

delta=$((current - previous))
# Counters reset when the unit restarts, so ignore negative deltas
(( delta < 0 )) && delta=0

# Restart if pegged (hitting CPUQuota)
if (( delta >= PEGGED_NS )); then
  logger -t cpu-watch "CPU pegged for $UNIT (${delta}ns), restarting"
  systemctl restart "$UNIT"
  exit 0
fi

# Restart if consistently above 5%
if (( delta >= SUSTAINED_NS )); then
  logger -t cpu-watch "Sustained CPU abuse for $UNIT (${delta}ns), restarting"
  systemctl restart "$UNIT"
fi

…and mark it executable: sudo chmod +x /usr/local/sbin/cpu-watch.sh

It’s not ideal to have hard-coded thresholds or to hit storage frequently, but in most modern systems /run is a tmpfs or similar, so for a simple watchdog this is acceptable.

The next step is to wire it up as systemd template units, so the same watchdog can be pointed at any service:

# cat /etc/systemd/system/cpu-watch@.service
[Unit]
Description=CPU watchdog for %i
After=%i.service

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/cpu-watch.sh %i.service
# cat /etc/systemd/system/cpu-watch@.timer
[Unit]
Description=Periodic CPU watchdog for %i

[Timer]
OnBootSec=2min
OnUnitActiveSec=30s
AccuracySec=5s

[Install]
WantedBy=timers.target

The trick I learned today was how to enable it with the target service name:

sudo systemctl daemon-reload
sudo systemctl enable --now cpu-watch@avahi-daemon.timer

You can check it’s working with:

sudo systemctl list-timers | grep cpu-watch
# this should show the script restart messages, if any:
sudo journalctl -t cpu-watch -f

Why This Works

The magic, according to Internet lore and a bit of LLM spelunking, is in using CPUUsageNSec deltas over a timer interval, which has a few nice properties:

  • Short CPU spikes are ignored, since the timer provides natural hysteresis
  • Sustained abuse (>5%) triggers restart
  • Pegged at quota (90% of 10%) triggers immediate restart
  • Runaway loops are contained by CPUQuota
  • Everything is systemd-native and auditable via journalctl

It’s not perfect, but at least I got a reusable pattern/template out of this experiment, and I can adapt this to other services as needed.

Ovo

Yeah, I don’t know what the grasshoppers want with the egg either
Another great evening spent in the company of Cirque du Soleil

Predictions for 2026

I had a go at doing predictions for 2025. This year I’m going to take another crack at it—but a bit earlier, to get the holiday break started and move on to actually relaxing and building fun stuff.

Read More...

Notes for December 9-24

Work slowed down enough that I was able to unwind a bit more and approach the holiday season with some anticipation–which, for me, invariably means queueing up personal projects. So most of what happened in my free time over the past couple of weeks was coding-related.

Read More...

The Big Blue Room

A lovely mirror
This part of town never disappoints, even in winter.

2025 In Review

Like in previous years, this is my somewhat rushed recollection of the year that was, and as usual it’s a mix of personal and professional reflections, with some thoughts on technology and trends thrown in for good measure.

Read More...

My Favorite Apps of 2025

I seldom write about the apps I use every day, so I thought a short note on the whats and whys might be interesting.

Read More...

The TRMNL (DIY Everything Edition)

I’ve long had a fascination for digital picture frames, and that harks back to the Vodafone 520 Photo Frame, which was, at the time, a pretty wild concept–you could send pictures to it over GPRS and have them show up on a tiny LCD screen, way before social networks and ubiquitous connectivity made that a non-issue.

Read More...

Notes for November 23–December 8

Thanks to local bank holidays we had a couple of consecutive extended weekends, which I spent doing a bunch of long overdue chores, including replacing kitchen lights, organizing my office a bit better, and generally ticking off various small tasks that had been piling up for a while.

Read More...
