Unexpected Synology Woes

Last weekend my NAS decided, for some unfathomable reason, to stop working after I took it out of the closet, dusted it and put it back, and I have feelings about it.

In fact, I’ve had them throughout the whole week, because it’s taken forever to get most of my home services up again.

Fortunately, my home automation and a few other things are spread among my nodes, but I had a bunch of things running on that NAS, and I wanted to document what happened because someone else might have the same issues I did and end up here.

Symptoms

The machine booted up (power LED initially blinking, solid green status LED, disk activity almost immediately), but would not show up on the network.

Both LAN interfaces would come up, but emitted zero packets. No DHCP requests, no link-local addressing, not even replies to arping (and yes, I knew the MAC addresses of the machine, because that’s the kind of thing I keep tabs on). I plugged my MacBook and my other laptop into each interface in turn, rebooted, and saw… nothing.

tcpdump saw nothing at all. I thought it might be some sort of OS glitch (which is why I tried both laptops), but no luck.
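For anyone in the same spot, a minimal first pass from a laptop on the same segment might look like this–a sketch, with placeholder interface and MAC values, assuming a Linux machine:

```shell
# Sketch of the checks above, generalised. Assumes a Linux laptop;
# the tcpdump/arping follow-ups use placeholder interface/MAC values.
link_states() {
    for dev in /sys/class/net/*; do
        name=$(basename "$dev")
        state=$(cat "$dev/operstate" 2>/dev/null)
        carrier=$(cat "$dev/carrier" 2>/dev/null)
        printf '%s state=%s carrier=%s\n' "$name" "$state" "$carrier"
    done
}
link_states

# With link up but zero traffic, the next steps (as root) would be:
#   tcpdump -i eth0 -n                  # watch for any frames at all
#   arping -I eth0 00:11:32:aa:bb:cc    # probe the NAS's known MAC
```

If `carrier=1` but tcpdump stays silent, the problem is almost certainly on the NAS side rather than cabling.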

So I tried to reset it to factory configuration. There are two reset levels: the first only resets your admin password and network settings, while the second has you reinstall the OS without losing data.

But nothing worked, and Synology’s tooling just couldn’t find the NAS or connect to it.

Recovery

The first thing I did was set up Virtual DSM on borg to see if I could, in the direst of emergencies, access our off-site backups. That sort of worked, but the experience was so fiddly that I was reminded of all of HyperBackup’s pitfalls in one fell swoop–most notably that I effectively need a Synology to get at that data, which is not something I want to rely on.

Yes, there is a HyperBackup desktop application. No, it did not work for me–it apparently expects you to download backup files from the cloud to your local machine, and I need to be able to directly restore files from Azure, period.

After filing a ticket with Synology about my unresponsive system, they sent me an AI-generated troubleshooting list, in the middle of which was a step I could not find anywhere in their online documentation: booting the machine without any disks.

That apparently also automatically reset settings (which is, in retrospect, weird, because it feels like something should be stored in the chassis for this kind of emergency), and I was finally able to discover it on the network, reset the admin password, reconfigure the network, etc.

So if you have the same symptoms, this might save your day. It also turned out to be the prelude to an entire week of pain, because my NAS then spent the past five days or so grinding through data scrubbing–because that is a thing it felt like doing–and I’ve been coping with the fallout since then: extremely slow access, and very slow response times as I tried to double-check services and settings.

What Didn’t Work Right

First of all, all my containers were gone. Container Manager, for some reason, does not preserve any settings in this scenario, and if I didn’t have piclaw installed and a copy of (most of) my stacks elsewhere, this would have been enough for me to never again run containers on a Synology.

As it was, I was able to point piclaw to the machine and have it reconstruct all critical services in a few hours (it would have been much faster if it wasn’t doing scrubbing). And, as it turns out, there was also enough residual info in the underlying Docker daemon itself to fill in most of the gaps.
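If the daemon survives (as mine did), most of a stack can be reverse-engineered from it. A hedged sketch–the container name is a placeholder, and it obviously needs a reachable Docker daemon:

```shell
# Sketch: recover enough from a surviving Docker daemon to rebuild a
# compose file. "myservice" is a placeholder container name.
recover_container() {
    name="$1"
    docker inspect --format 'image: {{.Config.Image}}' "$name"
    docker inspect --format 'restart: {{.HostConfig.RestartPolicy.Name}}' "$name"
    docker inspect --format '{{range .Mounts}}volume: {{.Source}}:{{.Destination}}{{"\n"}}{{end}}' "$name"
    docker inspect --format '{{range $p, $b := .NetworkSettings.Ports}}port: {{$p}}{{"\n"}}{{end}}' "$name"
}

# Usage, only where a daemon is actually running:
#   recover_container myservice
```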

But barring that, there were a bunch of things that made recovery a pretty stressful endeavor:

  • The mobile apps (DS Finder and the like) were useless in finding or diagnosing the issue at every step.
  • The web site did not list disk removal as a troubleshooting step (at least not that I could see, since it went straight into the dual-step reset procedure).
  • The timing documented for holding the reset button for reset 1 (4 seconds) was not accurate. It was more like 20, and I feared for a moment I might end up triggering reset 2, which would require reinstalling the OS.
  • Synology’s desktop tools are, to be brief, very poorly maintained and look like something out of the 90s, even down to the Windows look on macOS.

So even for an “appliance” NAS, the experience could be much better.

Let’s Have an Adventure

Resetting the configuration had zero impact on my data–at least so far as I can tell. Shares, users, all the regular stuff was preserved, and after a few glitches with cloud backups (because disk scrubbing made them fail overnight twice), everything seems in order.

But since the machine spent so long simultaneously scrubbing and swapping as I tried to restore services, it’s clear that I cannot rely on it for interactive use anymore.

Synology doesn’t really let me upgrade RAM on the thing (you sort of can, but it’s already capped at the maximum RAM the J4125 can officially support), so I’ve started removing stuff from it–most of the Docker services I’ve been running there for years are now moving into microVMs or LXCs running elsewhere, and are either going to use the Synology as a “dumb” NAS and mount storage directly, or be backed up to it using Borg Backup Server (which is going to be the only new Docker container running on it).
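The “dumb NAS” part is just an NFS export mounted from each microVM–a sketch with placeholder host, share and mountpoint names:

```shell
# Sketch: mount a Synology NFS export from a microVM.
# Host, export path and mountpoint below are placeholders.
fstab_line='nas.local:/volume1/media  /mnt/media  nfs4  ro,noatime,_netdev  0  0'
echo "$fstab_line"   # append to /etc/fstab on the VM, then: mount /mnt/media
```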

I’ve already moved a couple of services off it, and having them run (even with very constrained resources) on separate microVMs in an N150 makes a world of difference–so much so that I have to wonder why I put up with the J4125’s slowness for years.

I set up daily snapshots of both VMs (and added a temporary direct-to-cloud backup), and am now slowly moving the rest. Or, rather, piclaw is doing that: I had it draft a plan to group containers and create target VMs/LXCs, and the agent is now merrily moving data and container configs out of the Synology.

Mid-Term

After the dust settles, I am going to move all of my backups out of the Synology ecosystem–I currently rely on HyperBackup to back up my data to Azure, but the recovery attempt was so off-putting that I am going to look into using restic directly against Azure.

Backrest looks like a nice way to do that, with the added benefit that restic backups (which I have already been using for years) seem to work better with Azure storage tiering (and thus might even be cheaper in the long run).
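For reference, the plumbing restic needs to talk to Azure directly is small. Everything below is a placeholder sketch, not my actual setup (the env var names, though, are restic’s own):

```shell
# Sketch: restic against Azure Blob Storage. Account name, container
# and paths are placeholders.
export AZURE_ACCOUNT_NAME="mystorageaccount"
export AZURE_ACCOUNT_KEY="<storage-account-key>"
export RESTIC_REPOSITORY="azure:nas-backups:/"
export RESTIC_PASSWORD_FILE="$HOME/.restic-pass"

# One-time setup, then regular runs:
#   restic init
#   restic backup /volume1/shares
# Restoring needs nothing but restic and the credentials above:
#   restic snapshots
#   restic restore latest --target /tmp/restore
```

The key point versus HyperBackup: any machine with restic and the keys can restore, no appliance required.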

The Siri For Families Apple Will Never Build

The whole ordeal got me thinking about the one thing I keep wishing Apple would build and almost certainly never will: a family-scoped AI assistant that actually works across all our devices.

I don’t mean a frontier model or a “reasoning engine”–just a competent, context-aware agent that understands my family as a unit. The shared calendar, the school schedules, the medication reminders, who’s picking up whom and when. The kind of thing that Apple Intelligence was supposed to be, except pointed at the problem that would actually matter most to the people who are already deep in the ecosystem and paying for it.

I am married with two kids. Between us we have more Apple devices than I care to count–and we are exactly the demographic Apple loves to put in keynote photos. And yet Apple treats us as completely separate customers who happen to share a credit card. Family Sharing is a permissions layer bolted onto individual accounts, and it shows in every single interaction–shared photo libraries (still broken), purchase management (still confusing), screen time (still adversarial rather than collaborative). Twenty-four years of “digital hub” strategy, and this is where we are.

What I Actually Want

Here’s what a competent family agent could do without being creepy–and in most cases, without even needing to leave the device:

  • Know that my son has a test on Thursday and hasn’t opened the revision material since Monday. A gentle nudge (to him), not a surveillance report.
  • Track our medication schedule and ping people (or me, if an elderly relative misses a window) without turning into a clinical monitoring tool.
  • Surface things on Apple TV that match what we actually watch, not what the recommendation engine wants us to try.
  • Coordinate pickup times, grocery lists, meal plans–the sort of mundane family logistics that currently live in a group chat and three different apps.
  • Make file sharing work the way a shared family folder should, rather than the absurd permissions mess it currently is.
  • Do smarter photo sharing–not just a wholesale shared library, but understanding who took the photos and where, and sharing only the relevant stuff with family without it being an all-or-nothing proposition.
  • Better family e-mail, better event handling, better package tracking across household members.

I also want it to let me keep my parents and in-laws in the loop. Most of the above also applies to extended family, especially if you have elderly parents who need help managing their medications, appointments, and social connections. A family agent could be a lifeline for them without being intrusive.

None of this is exotic. Apple already does the understated version of some of it–surfacing birthdays, suggesting contacts to call at specific times, the quiet little iOS touches that work well precisely because they don’t try to be clever. A family agent would just be more of that, extended across the whole household instead of locked to a single Apple ID.

And none of it requires SOTA models, or selling out to Gemini. A 4B parameter model running on-device–the sort of thing I’ve been running myself for months–would handle the intent parsing and coordination.

The hard part isn’t the AI. It never was. It’s the will, the focus and the willingness to execute, and that’s where Apple has been asleep at the wheel for over a decade–and I am not going to hold my breath that Ternus will be the one to wake them up, or to invest in things like APIs and interoperability that would let third parties actually make this possible.

They should have an absurd advantage here: they own the OS, the hardware, the sync layer, the health stack, the media stack, the calendar, the reminders. Nobody else even comes close to that vertical. And they’ve done nothing with it.

I know this is possible because I’ve been building something like it myself–a personal agent that fits in a single binary and a database, carries its own scripts and state, and runs on anything from a Raspberry Pi to a desktop. The TypeScript-based version already manages my homelab, files links to my wiki, coordinates across machines, and does it all with about 300MB of RAM (the Go version should take up 30).

I built this on the equivalent of a Raspberry Pi, but Apple can’t do it with a trillion-dollar platform because they won’t treat families as anything other than a billing construct.

Just to add insult to injury, I could do most of what I wanted if we were in the Google ecosystem. But on iCloud it’s impossible to access shared tasklists (or even anything else, really) with any sort of standard protocol and documented API. For Google (or even Outlook), most of it is accessible.

Every Apple equivalent is there, but they just refuse to connect them, or let anyone use them.

The Automation Graveyard

I know I’ve banged on this drum for years, but Apple has spent the better part of a decade systematically breaking OS automation, and they’ve done it so thoroughly that it’s hard to believe it’s accidental.

AppleScript is on life support. Automator was effectively killed. Shortcuts was supposed to replace both, and instead became an App Store for workflow fragments that nobody maintains and that break with every major OS update. The Shortcuts editor is still painful for anything beyond “open this app and do one thing”, and the integration points with third-party apps range from spotty to fictional.

On Android, you can set up Tasker automations that trigger on location, time, sensor data, app state, notification content, Bluetooth proximity–and chain them into workflows that persist across OS updates. On Windows, I have a piclaw instance that can drive the entire desktop via a Windows API extension. The gap between what those platforms allow and what Apple permits isn’t narrowing. It’s getting wider.

Shortcuts could have been the foundation for family automation. Instead, it’s a gallery of pretty icons.

Why It Won’t Happen

I suspect the real reason is structural. Apple doesn’t think of families as a product category. They think of them as a collection of individual customers who happen to share a payment method. Every design decision reflects this: iPads are still single-user devices. iCloud storage is pooled, but grudgingly, and shared files live in a sort of no-man’s-land. App purchases are shared, but through a submenu of a submenu. Family Sharing is an afterthought, not a platform.

The only thing that Apple seems to care about (after iMessage) is that we can share what we are watching on Apple TV, which has been relevant in our family for exactly zero minutes since the feature launched.

And until someone at Apple decides that “a household of four using Apple devices” is a use case worth designing for rather than designing around, Siri will remain a single-user voice assistant that can’t reliably set a timer on the right HomePod.

With Ternus coming from hardware, I’d like to think there’s a chance he gets that a trillion-dollar ecosystem ought to handle a shared grocery list. But I’ve been hoping Apple would sort out family sharing since iCloud launched, so I’m not holding my breath here.

I Think I Figured Out What an AI IDE Looks Like

I’ve been mulling the UX arc I’ve been going through over the past couple of years, and I think it was mostly the same for everybody:

  • Copy/paste into a chat web UI
  • IDE with a chat sidebar
  • TUI chat (Mistral Vibe, pi, Codex CLI, Claude CLI, etc.)
  • Rich chat in a native app (Codex desktop, Claude desktop)
  • Web chat with rich interactive widgets (piclaw)

Since I spend a lot of time on my iPad, piclaw’s web timeline has become my default–I can pop open the terminal or the editor at will, but coding is still a game of balancing drudgery with creativity, and the “creative” part works well in chat.

At least for me, using AI for my projects has been a matter of staying in the flow. If you open a new chat thread for every feature or fix, going back to the editor takes you away from that flow–it’s much easier to have the model spew the changes in the chat, highlight the bits you want changed, and iterate directly in it.

And I’ve just realised, after adding text highlighting and annotation support to the piclaw timeline (to make it easier to point out specific things to the model), that what I’m building is a notebook for code.

I’m sure Stephen Wolfram would be delighted to be proven right, even if this paradigm isn’t really for everybody.

Of course, this scales poorly when refactoring and you have a zillion modified files, but other than refactors I am the kind of person who likes small, testable iterations and still looks at the code.

I also think that being able to scroll back up, fish out an older interaction and re-use it (or riff on it) is powerful, and what I am planning to do next is to inject an editor pane into the web chat to directly review and edit code inline–not as a separate tab, but as part of the conversation flow.

There’s something about this that irks my -addicted brain, of course, but it’s tantalising, and I quite enjoy sitting on the couch with my iPad after a long day in front of my desktop–and yes, using handwriting recognition to prompt it works great; I love living in the future.

Notes for May 3-10

This was a weird week, both because I keep waking up at 5AM with my sinuses clogged, and because I feel like I’m losing momentum. Feeling almost permanently cotton-headed and sleepy–whether from sheer exhaustion or from the antihistamines–certainly has something to do with it.

We Must Go Deeper

I spent the latter part of the week hacking away at go-ds4 and go-pherence, which was interesting to me not just because I am still trying to get Vulkan to work for inference on a couple of SBCs, but also because, all of a sudden, a bunch of my stuff converged into SIMD and assembly–including, of all things, an H.264 decoder I plan to add to go-rdp.

This meant going all in on model internals again, which is something I’ve neglected for a while and that I would otherwise find fascinating were it not for my general state of tiredness.

My Little

go-joker went from “forked and interesting” to “actually competitive with Python” in about two days of focused work. Again, there is a weird serendipity and convergence across most of my other projects (like the JITs I’ve been hacking on in macemu-jit and previous-jit), but this time I took out CLR via C# and had Codex build a tiered IR bytecode interpreter that can in turn do compilation via wazero for pure numeric loops, and doesn’t have a GIL (thanks to goroutines).

I should really write about that, when I feel better.

Android Remoting

As part of an ongoing experiment to see just how far I can go without the Android SDK installed, I kept nudging my Android RDP server along, and am generally very happy with all the automated testing scaffolding I built around that, because I’ve extended it to vibes and piclaw with great success.

My Agentic Work Is Nearly Done

I think piclaw is pretty much done by now. I backported kitty graphics support in the terminal (the ghostty-web ecosystem is pretty amazing on its own), and of course I use it constantly (I am actually typing this draft in it), and I will be doing some fixes and at least one UX release, but I need to go back and fix my Synology, redeploy a bunch of things in my homelab, and prep for more electronics projects.

But first, I’m going to take a nap, because I did wake up at 4AM again, crafted a dead stupid add-on and badly need to rest.

The Local AI Moat

Regular readers will know that I’ve spent most of the past two years shoehorning LLMs into single-board computers, partly as a learning exercise and partly because there are lots of local/”edge” applications where semantic reasoning (no matter how limited) and “interpretation” of sensor data are actually useful.

But now we’re at a point where running a decently useful open weights model on your laptop is entirely feasible.

This comes at what is possibly the worst possible time, and after having started my own inference library and tried hacking away at @antirez’s brilliant hack within my meagre resources, I feel like a serious rift is developing between the “haves” who were lucky to get hardware on time (or can splurge multiple K of European Pesos on it) and the “have nots”.

The societal impact of the entire thing in the always hype-driven geek community is, of course, fascinating (especially since a very small number of people have a disproportionate amount of influence in this little echo chamber), and I sometimes feel like Jane Goodall observing packs of opinionated chimpanzees, but I digress.

Personally, after spending the day mulling on this, I find the whole thing extremely depressing, for three reasons:

  • Despite everything, I see computers as something inherently distributed and personal. There are a lot of latent contradictions here, yes–I’ve learned to live with them.
  • As a European citizen, the geopolitics of the asymmetrical situation we are in today regarding technology and AI in general is not lost on me, and yes, I have learned to deal with that too, but I really wish I could do something about it.
  • Personally, I can’t afford to keep up. People in startups, self-employed or in very specific minuscule niches might be able to spend enough to do so, but I can’t.

I’m thrifty by nature, usually plan (and over-think) my purchases years in advance, and am at a point in my career (and the industry) where job security is very much on my mind, so saving up every dime I can for a potential rainy day has been a priority, and I now agonize over stuff as simple as ordering a 70 EUR battery to revive an eight-year-old laptop (because, yes, I do still use old machines).

So there is (pardon my French) absolutely no fucking way I am getting decent local inference hardware anytime soon. And I count myself lucky I built my current machine when I did, even if that was also a painful decision at the time and it is now hopelessly outdated for most things.

That’s it. I’ve vented. Now I’m going to take something for my sinuses, chase it with an antihistamine, and doze off until 4AM tomorrow.

Notes on GPT 5.x Model Regressions

I’ve been getting annoyed at constant code regressions in piclaw for the past few weeks. Something was off–even after bumping the test suite to the point where it catches most mechanical errors, gpt-5.5 kept making unrelated edits to code that should have been left alone, and I was getting really annoyed at babysitting it.

The pattern was always the same: it would follow a strict spec and then “improve” three other things nobody asked for, and since I am using piclaw and know exactly what the agent does and can trace context and requests, I know it isn’t a harness bug.

So I spent last night investigating, and gave both gpt-5.3-codex and gpt-5.5 the exact same prompt, off clean sessions:

audit this codebase thoroughly for code smells and logic errors and fix them.

Two identical worktrees, two models, same system prompt, same tooling. Reset both, run, compare results. I did this five times, and gpt-5.3-codex produced more complete fixes, caught more subtle issues, and generated more reliable tests in every single run. Not by a slim margin–noticeably, consistently better.
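The harness for this is nothing fancy–two worktrees off the same commit, one per model. A sketch (repo path and branch names are placeholders):

```shell
# Sketch: identical A/B worktrees for two models, off the same commit.
base=$(mktemp -d)
git init -q "$base/repo"
cd "$base/repo"
git -c user.email=me@example.com -c user.name=me \
    commit -q --allow-empty -m baseline
git worktree add -q -b audit-a "$base/run-a"
git worktree add -q -b audit-b "$base/run-b"

# Point each model at its own tree with the identical prompt, then
# compare what they changed by hand:
#   diff -r "$base/run-a" "$base/run-b"
```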

I don’t have hard data beyond “I looked at the diffs and one set was clearly more thorough than the other, five times in a row.” This is anecdotal, heavily tied to the codebase I ran it in, but feels “right” in a way that explains my perception over the past few weeks.

What I think happened

I noticed a similar thing earlier this year when switching between Anthropic’s opus-4.5/4.6 and OpenAI models–gpt models consistently caught structural issues that opus and sonnet glossed over (or just merrily waved through as “right”, hippie-style), and their fixes were more surgical. I got used to that gap and worked around it.

What’s odd is that the same gap now exists within OpenAI’s own family. gpt-5.4 was less thorough than gpt-5.3-codex for code work, and gpt-5.5, well… is “worse” in an as yet unspecified way. Yes, the newer models are better at conversation, better at following complex instructions in English, more “pleasant” to interact with–but when you ask them to find every logic error in a 2000-line file, they’re worse at it than their older sibling.

I think they’ve been tuned for broader, more generic behaviours and the code analysis got diluted in the process. “Be helpful across a wide range of tasks” apparently trades off against “be exhaustive and precise about code.” Go figure.

What I’m doing about it

I’m using gpt-5.3-codex as my audit model, and having pi and piclaw switch to it whenever I say “audit”.

It does the hard pass–finding code smells, logic errors, missing edge cases, inconsistent patterns–and I then go back to using the newer models for the conversational work, planning, and tasks where breadth matters more than depth. It also seems to use fewer tokens for the same work, though I don’t have hard data on that because, well, I have a life.

The year-long pattern I’d been following–sketch projects out with opus-4.x, then do the real work with gpt–is now subtly broken. In practice it’s become: use whatever to get started, but run reviews with a -codex model before you trust the output. The combination works, but it’s faintly ridiculous that I’m using an older model to mark the newer one’s homework.

This also means my piclaw instances now run different models for different tasks, which is one more argument for the pi/gi approach of keeping the model layer swappable and the tool surface minimal–something I have written about before. If the best code model changes every quarter–and apparently it can change backwards–you want the plumbing to not care.

Notes for April 27 – May 3

This was an absurdly productive week, at least on a personal level. I’m not sure whether to be pleased or worried about the number of projects that moved forward simultaneously, but here we are.

I do know that a lot of it was due to the fact that I am back having insomnia and waking up with my nose clogged due to allergies, and that there is relatively little to do at 4AM except watch videos, read, and… hack away at things.

Vibes is Go-ing Places

I finally got vibes to mostly work in Go. The progressive transformation of all my stuff seems inexorable now, but this one was driven by my belief that using ACP to wrap existing agent harnesses is much more of a necessity now that Anthropic has taken the lead in puerile attempts at locking people into their subscriptions by forbidding anything but Claude Code.

I still don’t use Claude Code or Anthropic models outside work, but many people do, and I like to have options, so I used vibes to prototype a few things, including automating UI testing end-to-end with Gherkin (something I’ve used on and off in customer projects that mandated BDD and never really saw used “well”, but that is very useful with LLMs).

That BDD pipeline quickly ballooned out of proportion, of course, turning into almost 50 Gherkin scenarios with Playwright step definitions, a PDF report generator with embedded screenshots, and a CI workflow that tries to run the whole thing against GitHub Models so it doesn’t need my API keys (that is broken for now, for some reason, but OpenCode free models are enough to say “Hello” and get a response back).

Until 4AM today, vibes had more structured UX tests than piclaw, which was both gratifying and mildly embarrassing…

Emulation and Ports

The SheepShaver JIT is, surprisingly, coming along faster and easier than the 68k one, largely because, well, it’s RISC and has zero gnarly instruction side effects.

Not having spent a lot of time with pre-OSX PPC Macs, I am learning quite a lot about the internals (and JIT “design”, even though I’m working off bits and bobs I’m picking up from console emulation, of all things). Early in the week it booted Mac OS 7.6.1, then promptly broke the instant I ran Prince of Persia, but now it has (somewhat unstable) networking and I am starting to revisit packaging prebuilt Raspbian builds.

On the BasiliskII side, I got piclaw to automate fixing a bunch of VNC issues–double keystrokes, mouse snapping to centre, mode-switch crashes, etc. I actually did this before I picked up Gherkin, which I now sort of regret since it would have made some of the tests easier to specify.

And yes, previous-jit is a thing now. I’m using it as an opportunity to both test and clean up the 68k JIT, and it works pretty well on the board that has turned into my ARM64 lab.

Got an Orange Pi 4 to boot the 9front kernel and crash into the PCI bus, which counts as progress. Still reading kernel source and figuring out how the boot chain works on this specific SoC.

Fixing More Papercuts

As I was doing all of this, I decided to clean up some pending Android projects that (believe it or not) are useful to me on a daily basis. I started with Receiver, and possibly due to insomnia side effects, also kicked off an RDP server for Android devices, because I got tired of every existing option on the Play Store being either scammy, subscription-gated, or both.

And it, too, is doing full E2E testing, with nice reports - I’ll have a little story to write about this one because it builds on go-rdp and is a great example of how it pays off to build libraries and reusable components (all my recent projects re-use stuff from each other to a fair degree).

Perhaps unwisely, I also decided to look at iSH, fork the arm64 version, and fix whatever I could. It can now run bun (and other things that crashed the iOS version) pretty well, but it’s too early to call it generally usable.

Piclaw

I am slowing it down now that it is effectively “stable”, and focusing on two things:

  • Moving as many add-ons as possible out into a standalone project, so that the core is easier to maintain.
  • E2E testing, because I am completely fed up with things breaking in the front-end.

Building upon my earlier experiments during the week, I set up a proper Gherkin/Playwright pipeline with user stories, PDF report generation and a partridge in a pear tree, so my big hope is that other than upstream churn from pi.dev I can just settle in and use it.

Gi

I’m still very keen on building a low-resource agent harness that works the way I want it to, so this week gi got scriptable agent loop hooks, a tool registry, route registry and event streams.

But the joker runtime is where I am having the most fun, by far–my fork is now faster than Goja for a completely arbitrary set of benchmarks:

Joker IR Optimization final comparison matrix: Bun, Python, Goja and Joker timings across 13 microbenchmarks, with Joker beating Goja on 11/13 and Python on 5/13

The point, however, is not the benchmarks, but using the benchmarks to understand what to tweak for more general cases.

And so far it’s been turning out pretty nice–I’m really looking forward to using it.

Gophers and GPUs

I’ve been playing too much with assembly, so after optimizing go-gte (because I wanted an embedding model for my own stuff), I decided to look at tinygrad, and… I started putting go-pherence together based on everything I’ve learned so far.

Yeah, I know, cute gophers again

Again, it is a thing I think should exist, because when I was looking a few years ago there were no Go libraries for inference whatsoever, and I’d like to have one that I can use on Linux (eventually getting it to work with Vulkan on SBCs) and that takes MLX-compatible weights:

Qwen3-0.6B inference results: GPU/CPU tokens-per-second across SmolLM2, Qwen2.5 and Qwen3 architectures after fixing a head_dim mismatch

Homelab

pve-microvm keeps paying off. I’ve moved a few of my home services to microVMs, added a firewall (which is now firewalling a test VLAN) and OPNsense (which works, but is not as familiar to me), SmolBSD (a NetBSD flavor that boots in 31ms, which is pretty impressive), and, because I am wading into inference territory (more on that later), an exo distributed inference template.

But even as I was drafting this, my NAS vanished off the network. I shut it (and everything around it) down, unplugged everything, dusted the closet (which was long overdue), plugged it all back in, and… the NAS came “up”, but is completely unreachable (status LED is solid green, disk activity, link up on both interfaces, etc.).

It shuts down and boots correctly (apparently, with the usual slowness), but even sniffing traffic directly with Wireshark yielded nothing. I tried resetting it, but to no avail. I have a support ticket open (for what it’s worth these days), and I think all the important data is on Azure, but troubleshooting this is something I didn’t want to deal with this week.

So, What Did I Learn This Week?

  • The has serious bugs.
  • I have far too many stupid ideas at 4AM.
  • There is a lot of re-use across my various projects, thanks to my penchant for building foundational bits first.
  • Inference is hard. Optimizing JITs and interpreters is, comparatively, much more my turf.
  • Functional testing works great with LLMs both as output (they write decent user stories that are easier to review and fix than code) and input - the Playwright reports, in particular, provided Codex with better directions to fix them than I would bother to describe.

So I might have found a way to deal with the annoying regressions I was getting. Only time will tell.

Lessons on Building MCP Servers

I’ve been building MCP servers for a while now–I wrote about that last year, started out by creating umcp, and I’ve recently opened up an Office server that’s been battered by enough models against enough real documents that the patterns have settled.

Read More...

App Notes: Web App Viewer

I got annoyed enough with Safari Web Apps to write my own replacement.

Read More...

Notes for April 20-26

Amidst the chaos brought on by my usual seasonal allergies, work turned out to be calmer than usual–the constant industry churn and rumors of layoffs have made “calmer” a relative term, though–so most of my evenings went to projects.

Read More...

Notes for April 13-19

This was a pretty decent week despite my allergies having kicked in to a point where I have constant headaches, but at least I had quite a bit of fun with my projects.

Read More...

Notes for April 6-12

Thanks to a bit of spillover from Easter break, this was a calmer, more satisfying week where I could actually get stuff done and even have a bit of fun.

Read More...

Apple, Still

I have been having feelings about Apple lately. This blog may have drifted a fair way from its original focus, but I am still, first and foremost, an Apple user – just not an exclusively Apple user, and perhaps not even a particularly obedient one anymore, since I use both Windows and Linux every day and have grown used to judging platforms by what they let me get done rather than by whatever story they are trying to tell about themselves.

Read More...

The Orange Pi 6 Plus

This was a long one–I spent a fair bit of time with the Orange Pi 6 Plus over the past few months, and what I expected to be a quick look at another fast ARM board turned into one of those test runs where the hardware looks promising on paper, the software is wonky in exactly the wrong places, and you end up diving far more into boot chains, vendor GPU blobs and inference runtimes than you ever intended.

Read More...

Notes for March 30 – April 5

This was a shorter work week partly due to the Easter weekend and partly because I book-ended it with a couple of days off in an attempt to restore personal sanity–only to catch a cold and remain stuck at home.

Read More...

The Xteink X4

I got an Xteink X4 this week, and my first reaction was somewhere between amusement and nostalgia–it is absurdly small, feels a lot better made than I expected for the price, and the form factor harks back to the times when I was reading e-books on Palm PDAs and the original iPod Touch.

Read More...

Hans Zimmer

At least they aren’t from Behringer
Modular synths on stage. Who would have thought?

Notes for March 23–29

Work ate the week again. I’m exhausted, running on fumes, and daylight saving time stole an hour of sleep I could not afford–the biannual clock shuffle is one of those vestigial absurdities that nobody can be bothered to abolish, and I’m starting to take it personally.

Read More...
