As regular readers would know, I’ve been on the homelab bandwagon for a while now. The motivation for that was manifold, starting with the pandemic and a need to have a bit more stuff literally under my thumb.
But I also have a few services running in the cloud (in more than one cloud, actually), and I’ve seldom written about those, or about how they overlap with my homelab.
Zero Exposed Endpoints
One of my key tenets is zero exposed endpoints. That means no web servers, no SSH, no weird port-knocking strategies to get to a machine, nothing.
If it’s not exposed to the Internet, then it’s not something I ever need to worry about. But of course there’s a flip side to that: how do I get to my stuff when I need to?
This won’t be popular in many circles, but everything I have exposed to the Internet is behind Cloudflare in one form or another:
- This site is fronted by Cloudflare (even though it is static HTML in an Azure storage account)
- I use Cloudflare Tunnels to make a couple of services (web and RDP) accessible from outside the house (including some whimsical things, like a way to share the screen of my Supernote Nomad when whiteboarding), and they are shut down automatically overnight (see the sketch below).
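The overnight shutdown can be as simple as a couple of cron entries. This is just an illustrative sketch, assuming the tunnel runs as a Docker container named cloudflared, which is not necessarily how it is actually done here:

```bash
# Hypothetical crontab on the host running the tunnel; "cloudflared" is an
# assumed container name, not a documented part of this setup.
0 23 * * * docker stop cloudflared    # 23:00: the tunnel (and everything behind it) goes dark
0 8  * * * docker start cloudflared   # 08:00: the services become reachable again
```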
To get at anything else, I use Tailscale (extensively, from anywhere to anywhere). The VMs or VPSes I use across providers are only accessible via Tailscale (and, of course, whatever native console the provider exposes), and typically have no exposed ports.
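For a VPS, "no exposed ports" mostly boils down to bringing the node into the tailnet and refusing everything that arrives on the public interface. A rough sketch, assuming Ubuntu with ufw (the distro and firewall are assumptions, not part of the setup described above):

```bash
# Join the tailnet; Tailscale SSH means no public sshd needs to listen at all.
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up --ssh

# Drop everything inbound on the public interface, allow only tailnet traffic.
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow in on tailscale0
sudo ufw enable
```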
Static Is Faster, Lighter and Simpler
I’ve come to the conclusion over the years that there is no real reason for me to run a public web server for 90% of what I do, so things like this site, my RSS feed summarizer, and anything else that needs to publish Web content are designed from scratch to generate static content and push it out to blob storage.
That way, serving HTTP, managing certificates, and handling traffic spikes are literally someone else’s problem, and I never have to worry about it again.
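For Azure blob storage specifically, the publish step is roughly a build followed by a batch upload to the static website container. The generator, account name and paths below are placeholders, not the actual pipeline:

```bash
# Build the static output (any generator works; hugo is just an example).
hugo

# Push it to the storage account's static website container ($web).
az storage blob upload-batch \
  --account-name myblogstorage \
  --destination '$web' \
  --source ./public \
  --overwrite
```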
Anything fancy or interactive I typically deploy on my homelab or inside Tailscale.
But, interestingly enough, I also:
- don’t need to worry about response times
- need a lot less CPU and memory altogether to get something done
- can pack a lot more services into cheaper, often single-core VMs
Squeezed Down, Concentrated Compute
Over the years I’ve dabbled with many forms of service deployment. After a long time building enterprise stuff and then refactoring it as microservices (typically using message queues, since HTTP makes you fall into all sorts of synchronous traps), I gradually came to the point where I started questioning how cost-effective some of those approaches were.
So today I don’t use any form of serverless compute. I have often been tempted by Cloudflare Workers, but since I don’t need that kind of extremely distributed availability and I can pack 12 small services inside a dual-core ARM VPS for a negligible fixed amount of money, I don’t have to worry about spikes.
If something somehow becomes too popular, the VM acts as a containment boundary for both performance and cost, which is much better than an unbounded serverless bill.
I have been sticking to that approach for years now, since it’s cheap, predictable, and extremely easy to maintain or back up. And since I deploy most of my services as plain docker compose, I can set CPU and RAM limits if needed.
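Setting those limits in a compose file is a one-stanza affair. A minimal sketch (the service name, image and values are placeholders):

```bash
# Write a compose file with hard CPU/RAM caps per service; Compose v2 applies
# deploy.resources.limits directly with "docker compose up".
cat > compose.yaml <<'EOF'
services:
  feedsummarizer:            # placeholder service
    image: example/feed-summarizer:latest
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "0.50"       # at most half a core
          memory: 256M       # hard memory cap
EOF
docker compose up -d
```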
Keep It Dead Simple
Although it’s inescapable in many modern production environments, I don’t use Kubernetes for my own stuff, and the key reason for it is simplicity.
I don’t want to have to worry about managing a cluster, handling volume claims, or dealing with the additional CPU and memory overhead that comes with it.
So I have been deploying most of my stuff using git, docker compose and kata, my minimalist, ultra-pared-down deployment helper, which has an order of magnitude less complexity.
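kata’s internals aren’t shown here, but the loop this kind of minimal helper wraps is essentially just git plus compose, something like:

```bash
# Not kata itself: just the bare loop a minimal deployment helper tends to
# automate for each service checkout (the path is a placeholder).
cd /srv/apps/myservice
git pull --ff-only                       # fetch the latest service definition
docker compose pull                      # grab any updated images
docker compose up -d --remove-orphans    # converge to the new definition
```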
If I need redundancy or scale-out, it’s much simpler to deploy docker swarm, mount the provider’s external storage of choice on all the nodes (if needed), and end up with an ultra-low-overhead, redundant deployment. In fact, I have been using that approach for years now.
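The swarm variant is only a few commands on top of the same compose file. A rough sketch (addresses and tokens are placeholders):

```bash
# On the first node: create the swarm (advertising the tailnet address keeps
# cluster traffic off the public interface).
docker swarm init --advertise-addr <manager-tailscale-ip>

# On each additional node: join with the token printed by the init step.
docker swarm join --token <worker-token> <manager-tailscale-ip>:2377

# Deploy the same compose file as a replicated stack and check on it.
docker stack deploy -c compose.yaml myapp
docker service ls
```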
And, again, it’s easy to set up VM backups with point-in-time restore in the vast majority of providers. If there’s one boring technology that everyone got right, it’s definitely VM backups.
SQLite Is Awesome
I work a lot with huge data warehouses, data lakes, and various flavors of Spark, plus all the madness around data medallions, ETL, data marts, and the flavors of semantic indexing the agentic revolution needs but nobody really mentions.
But for my own stuff, I always go back to SQLite because it is both much simpler and surprisingly flexible:
- I can store timeseries data in it
- it is stupidly fast (there’s a 10GB SQLite file with all my home automation telemetry for a year, and it’s surprisingly zippy for the hardware it’s running on)
- I can enrich it with indexable JSON without breaking the whole schema
- it has baked-in full-text indexing (see the sketch after this list)
- I can use it as a vector store with a couple of extensions
- I can back it up trivially
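A few of those points, straight from the sqlite3 CLI; the table names are illustrative, not an actual schema:

```bash
sqlite3 telemetry.db <<'SQL'
-- time-series-ish rows with a JSON payload column
CREATE TABLE readings (ts INTEGER NOT NULL, sensor TEXT NOT NULL, payload TEXT);

-- index straight into the JSON without touching the schema
CREATE INDEX idx_readings_temp ON readings (json_extract(payload, '$.temperature'));

-- baked-in full-text search via FTS5
CREATE VIRTUAL TABLE notes USING fts5(title, body);
INSERT INTO notes VALUES ('homelab', 'tailscale everywhere, zero exposed endpoints');
SELECT title FROM notes WHERE notes MATCH 'tailscale';
SQL
```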
So I hardly ever deploy any kind of database server, but when I do, it’s always Postgres.
And yet, I only have two instances of it running these days.
Secrets Management
Even with zero exposed endpoints, secrets management is still a thing I need to worry about. To reduce the number of moving parts, I’ve been using docker swarm secrets for most of my apps (or just the provider’s secrets management: Azure Key Vault, AWS Secrets Manager, etc.).
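Swarm secrets are about as simple as secrets management gets: the value lives in the swarm’s encrypted raft store and shows up inside the container as a file. A quick sketch (the secret name is a placeholder):

```bash
# Create the secret once, on a manager node.
printf '%s' "$API_TOKEN" | docker secret create api_token -

# Reference it from the stack's compose file; the container reads it back
# from /run/secrets/api_token at runtime.
#   services:
#     app:
#       secrets: [api_token]
#   secrets:
#     api_token:
#       external: true
```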
On my homelab I’ve been using HashiCorp Vault, but it is far too complex for most of my needs, and I’ve been dabbling with a replacement.
Bringing It All Home
My homelab approach is pretty much the same: everything is behind Tailscale, (almost) nothing is directly exposed to the Internet, and I use docker compose for most of my application deployments. The main difference is that the hypervisor is Proxmox and I use LXC containers extensively instead of full VMs.
There is a lot of FUD out there around running docker inside LXC, but backing up an entire LXC and its multiple docker compose applications as a single unit is incredibly convenient and much more efficient than a VM.
The only real issue I’ve had (a couple of times) is that a misbehaving container can bog down the entire host if resource limits are not set properly or if it abuses I/O (which is particularly easy to do on a NAS with HDDs), so those I/O-heavy services live inside regular VMs.
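Clamping CPU and memory on an LXC guest in Proxmox is trivial; disk I/O is the harder one to contain, which is exactly why the heavy hitters end up in full VMs. For illustration (the container ID is a placeholder):

```bash
# Clamp an LXC container to two cores and 2 GB of RAM, then verify.
pct set 105 --cores 2 --memory 2048 --swap 512
pct config 105
```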
Incidentally, podman has a bunch of issues running inside LXC containers, largely related to cgroup and UID management.
As for the service definitions themselves (docker compose files and the like), everything is git-backed, of course, and I use Gitea to manage all of it, together with Portainer and a few custom actions.
Observability
This is the bit that I have been sorting out, and I’m converging towards a combination of Graphite for metrics and a custom OpenTelemetry collector I’m working on called Gotel to gather traces and logs from my applications.
In the cloud I use the providers’ managed backends for those (Azure Application Insights, AWS CloudWatch, etc.), but I want something simpler and more portable for my own stuff, and I’m building it now.
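Part of Graphite’s appeal is that the ingestion side is almost embarrassingly simple: one line per datapoint over TCP. For illustration (the host name and metric path are placeholders):

```bash
# Graphite plaintext protocol: "<metric.path> <value> <unix-timestamp>" on port 2003.
echo "homelab.proxmox.cpu_load 0.42 $(date +%s)" | nc -w 1 graphite.internal 2003
```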