Notes for April 27 – May 3

This was an absurdly productive week, at least on a personal level. I’m not sure whether to be pleased or worried about the number of projects that moved forward simultaneously, but here we are.

I do know that a lot of it was because my insomnia is back and I keep waking up with my nose clogged by allergies, and there is relatively little to do at 4AM except watch videos, read, and… hack away at things.

Vibes is Go-ing Places

I finally got vibes to mostly work in . The progressive transformation of all my stuff seems inexorable now, but this one was driven by my conviction that using ACP to wrap existing agent harnesses is much more of a necessity now that Anthropic has taken the lead in puerile attempts to lock people into their subscriptions by forbidding anything but Claude Code.

I still don’t use Claude Code or Anthropic models outside , but many people do, and I like to have options, so I used vibes to prototype a few things, including automating UI testing end-to-end with Gherkin (something I’ve used on and off in customer projects that mandated BDD and never really saw used “well”, but that is very useful with LLMs).

That BDD pipeline quickly ballooned out of proportion, of course, turning into almost 50 Gherkin scenarios with Playwright step definitions, a PDF report generator with embedded screenshots, and a CI workflow that tries to run the whole thing against GitHub Models so it doesn’t need my API keys (that is broken for now, for some reason, but OpenCode free models are enough to say “Hello” and get a response back).

Until 4AM today, vibes had more structured UX tests than piclaw, which was both gratifying and mildly embarrassing…

Emulation and Ports

The SheepShaver JIT is, surprisingly, coming along faster and easier than the 68k one, largely because, well, it’s RISC and has zero gnarly instruction side effects.

Not having spent a lot of time with pre-OSX PPC Macs, I am learning quite a lot about the internals (and JIT “design”, even though I’m working off bits and bobs I’m picking up from console emulation, of all things). Early in the week it booted Mac OS 7.6.1, then promptly broke the instant I ran Prince of Persia, but now it has (somewhat unstable) networking and I am starting to revisit packaging prebuilt Raspbian builds.

On the BasiliskII side, I got piclaw to automate fixing a bunch of VNC issues–double keystrokes, mouse snapping to centre, mode-switch crashes, etc. I actually did this before I picked up Gherkin, which I now sort of regret since it would have made some of the tests easier to specify.

And yes, previous-jit is a thing now. I’m using it as an opportunity to both test and clean up the 68k JIT, and it works pretty well on the , which has turned into my ARM64 lab.

Got an Orange Pi 4 to boot the 9front kernel and crash into the PCI bus, which counts as progress. Still reading kernel source and figuring out how the boot chain works on this specific SoC.

Fixing More Papercuts

As I was doing , I decided to clean up some pending Android projects that (believe it or not) are useful to me on a daily basis. I started with Receiver, and possibly due to insomnia side effects, also kicked off an RDP server for Android devices, because I got tired of every existing option on the Play Store being either scammy, subscription-gated, or both.

And it, too, is doing full E2E testing, with nice reports–I’ll have a little story to write about this one because it builds on go-rdp and is a great example of how it pays off to build libraries and reusable components (all my recent projects re-use stuff from each other to a fair degree).

Perhaps unwisely, I also decided to look at iSH, fork the arm64 version, and fix whatever I could. It can now run bun and pretty well (both crashed the iOS version), but it’s too early to call it generally usable.

Piclaw

I am slowing it down now that it is effectively “stable”, and focusing on two things:

  • Moving as many add-ons as possible out to a standalone project, so that the core is easier to maintain.
  • E2E testing, because I am completely fed up with breaking in the front-end.

Building upon my earlier experiments during the week, I set up a proper Gherkin/Playwright pipeline with user stories, PDF report generation and a partridge in a pear tree, so my big hope is that other than upstream churn from pi.dev I can just settle in and use it.

Gi

I’m still very keen on building a low-resource agent harness that works the way I want it to, so this week gi got scriptable agent loop hooks, a tool registry, route registry and event streams.

But the joker runtime is where I am having the most fun, by far–my fork is now faster than for a completely arbitrary set of benchmarks:

Joker IR Optimization final comparison matrix: Bun, Python, Goja and Joker timings across 13 microbenchmarks, with Joker beating Goja on 11/13 and Python on 5/13

The point, however, is not the benchmarks, but using the benchmarks to understand what to tweak for more general cases.

And so far it’s been turning out pretty nice–I’m really looking forward to using it.

Gophers and GPUs

I’ve been playing too much with assembly, so after optimizing go-gte (because I wanted an embedding model for my own stuff), I decided to look at tinygrad, and… I started putting go-pherence together based on everything I’ve learned so far.

Yeah, I know, cute gophers again

Again, it is a thing I think should exist, because when I was looking a few years ago there were no libraries for inference whatsoever, and I’d like to have one that I can use on Linux (eventually getting it to work with Vulkan on SBCs) and that takes MLX-compatible weights:

Qwen3-0.6B inference results: GPU/CPU tokens-per-second across SmolLM2, Qwen2.5 and Qwen3 architectures after fixing a head_dim mismatch

Homelab

pve-microvm keeps paying off. I’ve moved a few of my home services to microVMs, added (which is now firewalling a test VLAN) and OPNsense (which works, but is not as familiar to me), SmolBSD (a NetBSD flavor that boots in 31ms, which is pretty impressive), and, because I am wading into inference territory (more on that later), an exo distributed inference template.

But even as I was drafting this, my vanished off the network. I shut it and the down, unplugged them, dusted the closet (which was long overdue), plugged them back in, and… the came “up”, but is completely unreachable (status LED is solid green, disk activity, link up on both interfaces, etc.).

It shuts down and boots correctly (apparently with the usual slowness), but even sniffing traffic directly with Wireshark yielded nothing. I tried resetting it, but to no avail. I have a support ticket open (for what it’s worth these days), and I think all the important data is on Azure, but troubleshooting this is something I didn’t want to deal with this week.

So, What Did I Learn This Week?

  • The has serious bugs.
  • I have far too many stupid ideas at 4AM.
  • There is a lot of re-use across my various projects, thanks to my penchant for building foundational bits first.
  • Inference is hard. Optimizing JITs and interpreters is, comparatively, much more my turf.
  • Functional testing works great with LLMs both as output (they write decent user stories that are easier to review and fix than code) and as input–the Playwright reports, in particular, provided Codex with better directions to fix them than I would bother to describe.

So I might have found a way to deal with the annoying regressions I was getting in . Only time will tell.

Lessons on Building MCP Servers

I’ve been building servers for a while now–I wrote about last year, started out by creating umcp, and I’ve recently opened up an Office server that’s been battered by enough models against enough real documents that the patterns have settled.

I’m still not a fan of , but what follows is what I’ve learned about making tool chains actually work, condensed from swearing at logs rather than reading papers.

Disclaimer: This is a condensed version of CHAINING.md, which was itself stapled together from a bunch of notes in my vault. The full version has more code examples and a techniques inventory table that Opus just had to add, and I’ve since beaten that out of it and restored most of the original text (minus typos).

The short version: the MCP servers I design do most of the work, while the model walks breadcrumbs.

Models don’t plan

They look at the conversation, scan the tool list, and grab whatever looks most probable. That’s it. There is no hidden planner. If you want chains that finish somewhere sensible, the server has to make the next call blindingly obvious at every step.

After a year or so, I have pared down my approach into these three things, roughly in order of how much pain they save you:

  • A small named core verb set covering most intents
  • Output that suggests the next call
  • An addressing scheme that survives between calls–anchors, IDs, paths, anything but line numbers.

Core verbs beat surface area

The Office server exposes over 100 tools. Its get_instructions() funnels models toward eight:

…start with office_help, then prefer office_read, office_inspect, office_patch, office_table, office_template, office_audit, and word_insert_at_anchor. Treat specialised tools as fallback, diagnostic, legacy-compatibility, or expert tools when the core flow is insufficient.

That single sentence does an outsized amount of work–it tells the model there is a recommended path, that the path is verb-shaped (help -> read -> inspect -> patch -> audit), and that everything else is opt-in.

Without it, models cheerfully reach for word_parse_sow_template when office_read would do, and you end up with five-call detours for one-call jobs.

So I quickly realized that I needed to be ruthless about which tools to surface and when. The specialised ones still ship–hidden under a “for experts” framing, and a handful of legacy ones filtered out of tools/list entirely.

I also make liberal use of activation sets–the surface the model sees is small; the surface it can reach is large.

Naming is the chain

Again, models chain whatever is most likely (or rhymes), and the most effective tactic, for me, has been taking advantage of that.

All Word tools are word_*, all Excel excel_*, all unified office_*. A model that just called office_inspect will reach for office_patch next, not word_patch_with_track_changes, because the prefix matches.

This particular server also makes liberal use of annotations and a little intent/inferrer hack that reads those prefixes to assign readOnlyHint/destructiveHint automatically, so naming discipline turns into safety metadata for free.
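As a rough illustration of that inference hack (this is a hypothetical sketch, not the server’s actual code; the verb lists are invented, and only the annotation field names follow the MCP spec):

```python
# Hypothetical sketch: derive MCP safety annotations from naming discipline.
# Verb sets are illustrative; readOnlyHint/destructiveHint are the MCP
# annotation fields the real inferrer assigns.

READ_VERBS = {"read", "inspect", "audit", "help", "list", "get"}
DESTRUCTIVE_VERBS = {"patch", "delete", "replace", "write"}

def infer_hints(tool_name: str) -> dict:
    """Split 'office_patch' into prefix and verb, then map the verb to hints."""
    _, _, rest = tool_name.partition("_")
    verb = rest.split("_")[0]  # first verb-ish token after the surface prefix
    return {
        "readOnlyHint": verb in READ_VERBS,
        "destructiveHint": verb in DESTRUCTIVE_VERBS,
    }
```

Anything that follows the naming convention gets sane defaults without per-tool boilerplate, which is exactly why the discipline pays for itself.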

The prefix is the plan. The verb is the step. If you take one thing from this entire post, I’d suggest this notion…

Every response nominates the next call

This was the single change that made things behave on smaller models. The big ones will plan a chain from a tool list and a goal; the wee ones won’t–they grab the first plausible tool and stop.

The fix is stupid simple: every response ends with a breadcrumb dictionary of hints to follow. At minimum next_tools: [...], plus usage: "<exact call>" whenever the current tool produced a value the next one needs.

A model that can’t assemble arguments from a schema can copy the usage string verbatim. In fact, they will copy it, because it is still the most likely outcome as it fills in tokens, and thus those usage hints funnel the path the model takes.
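To make the shape concrete, here is a toy sketch of a response carrying those breadcrumbs (the tool names, document ID, and inspection logic are all invented for illustration):

```python
# Toy sketch of a tool response that nominates the next call.
# next_tools and usage follow the breadcrumb convention described above;
# the inspection itself is a stand-in.

def office_inspect(doc_id: str) -> dict:
    headings = ["Introduction", "Scope"]  # stand-in for real inspection
    return {
        "doc_id": doc_id,
        "headings": headings,
        # Breadcrumbs: what to call next, plus a copy-pasteable invocation
        # so a small model never has to assemble arguments from a schema.
        "next_tools": ["office_patch", "office_audit"],
        "usage": f'office_patch(doc_id="{doc_id}", anchor="{headings[0]}", mode="dry_run")',
    }
```

The usage string deliberately embeds the values the previous call produced, so copying it verbatim is both the laziest and the correct move.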

Discovery as a tool, not documentation

Another thing I hit upon was that signposting needed to be curated.

Borrowing a page from intent mapping, office_help(goal=...) returns a structured record–recommended chain with rationale, fallbacks, diagnostic strings to watch for, one imperative next_step sentence. Not prose. Not a README, not skills. Data the model can act on without reading comprehension.

Called with no arguments, it returns the catalogue. Called with an unknown goal, it returns the supported set rather than an error, turning a potential workflow-stopper into a useful catalogue.
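A minimal sketch of that behaviour, with an invented two-goal catalogue (the real office_help record has more fields, like diagnostics to watch for):

```python
# Sketch of a discovery tool that returns recommendations as data, not prose.
# Goals, chains, and rationales here are illustrative stand-ins.

CATALOGUE = {
    "edit_document": {
        "chain": ["office_read", "office_inspect", "office_patch", "office_audit"],
        "rationale": "read before you write; audit after you patch",
        "next_step": "Call office_read with the document path.",
    },
    "fill_template": {
        "chain": ["office_template", "office_patch", "office_audit"],
        "rationale": "templates pre-resolve anchors",
        "next_step": "Call office_template with the template name.",
    },
}

def office_help(goal=None) -> dict:
    if goal is None or goal not in CATALOGUE:
        # No goal, or an unknown one: return the supported set instead of
        # an error, so the model can pick a valid goal and keep moving.
        return {"supported_goals": sorted(CATALOGUE)}
    return CATALOGUE[goal]
```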

Addressing: anchors, not offsets

The biggest reason simple models can’t follow chains is that they lose the thread between calls. “Insert a paragraph after the introduction” is fine in English but catastrophic if you expect the model to remember a byte offset across three tool calls.

In this particular scenario, I cheated: since most Office documents have headings (or cells, or internal structured paths inside OOXML), I used either verbatim text from the document or immovable coordinates as addresses (which was particularly hard in PowerPoint, by the way).

So besides suggestions and hints, return identifiers your tools will later accept as input. If you find yourself returning data the model has to describe back to you in natural language, you’ve made a chain that will misfire on a Tuesday afternoon when you’re not watching.
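A toy sketch of what that looks like in practice (the "h:" anchor prefix and the heading list are invented conventions, not the server’s real ones):

```python
# Sketch of anchor-based addressing: the server hands out stable IDs derived
# from document structure, and accepts the exact same IDs back later.

DOC = ["Introduction", "Scope", "Budget"]  # stand-in for real headings

def list_anchors() -> list:
    return [f"h:{h}" for h in DOC]

def insert_after(anchor: str, text: str) -> dict:
    heading = anchor.removeprefix("h:")
    if heading not in DOC:
        # Unmatched anchor: return the known set so the model can recover
        # without describing the location back in natural language.
        return {"status": "unmatched", "known_anchors": list_anchors()}
    return {"status": "ok", "inserted_after": heading, "text": text}
```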

Modes turn one tool into four

I started out with individual editing tools per format, which made automated testing easy but was incredibly wasteful of context. At one point I decided to make initial discovery much simpler, and since I needed all outputs to be auditable, I tagged the available sub-operations by risk.

office_patch is the same code path whether you ask for dry_run, best_effort, safe, or strict. One tool, four modes, one entry in tools/list.

Discovery cost scales with tool count, not mode count. And dry_run -> safe -> strict is an escalation chain the model figures out on its own without being told.

If you have N tools that differ only in how cautious they are, collapse them. You’re wasting everyone’s context budget.
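A minimal sketch of what that collapse looks like (the matching and patching logic here is a toy stand-in, not the real office_patch):

```python
# One tool, four modes: a single code path, with mode deciding how much
# of the patch is actually applied. Mode names match the ones above.

MODES = ("dry_run", "best_effort", "safe", "strict")

def office_patch(doc: dict, edits: dict, mode: str = "dry_run") -> dict:
    assert mode in MODES, f"unknown mode {mode!r}"
    matched = {k: v for k, v in edits.items() if k in doc}
    unmatched = [k for k in edits if k not in doc]
    if mode == "strict" and unmatched:
        # strict refuses to touch anything unless every target resolved
        return {"status": "error", "unmatched_targets": unmatched}
    if mode != "dry_run":
        doc.update(matched)  # the non-dry modes apply what matched
    return {"status": "ok", "matched_targets": sorted(matched),
            "unmatched_targets": unmatched}
```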

Diagnostics as the back-edge

Linear chains are easy. Real chains have loops, and loops only happen when the server invites the model back in. Every mutating tool returns a standard envelope with status, matched_targets, unmatched_targets, and next_tools.

The model then branches on a small subset of options “locally”, without needing to go over the entire context, and if you name the diagnostic fields with the exact strings the model will see again in your instructions, every response reinforces them.

In this particular case, again, I cheated. I figured out that the models were starting to call tools at random because they couldn’t introspect the document well enough, and ended up breaking files, so I always gave them at least one read-only tool; the penalty for “I’m confused, let me look again” is then one extra round-trip, not a destructive cock-up.
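Sketching the back-edge itself (envelope field names follow the convention above; the target matching and tool names are invented):

```python
# Toy sketch of the back-edge: a mutating tool returns the standard
# envelope, and unmatched targets route the model back through the
# read-only tool instead of leaving it to flail.

def mutate(targets: list, known: set) -> dict:
    matched = [t for t in targets if t in known]
    unmatched = [t for t in targets if t not in known]
    return {
        "status": "partial" if unmatched else "ok",
        "matched_targets": matched,
        "unmatched_targets": unmatched,
        # The loop edge: confusion costs one read-only round-trip.
        "next_tools": ["office_inspect"] if unmatched else ["office_audit"],
    }
```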

My MCP Design Checklist

  • Pick five to ten core verbs and name them in get_instructions() or your local equivalent
  • Use consistent prefixes by surface
  • Provide a discovery tool that returns recommendations as data, not prose
  • Make the discovery tool browseable–no-arg returns the catalogue, unknown input returns the supported set
  • Embed forward breadcrumbs in every tool response
  • Provide a map/anchors tool so addresses survive between calls
  • Give every mutating tool a mode enum including dry_run
  • Return named diagnostic fields and cite the recovery tools
  • Standardise the mutation envelope. If one tool changes something in a specific way, make sure the others are consistent (arguments, semantics, etc.)
  • Reject unknown arguments strictly (this is much easier in some runtimes than others)
  • Provide an audit tool so the model has somewhere to land
  • Cache anything the recovery loop calls more than once, because, well, it will get called dozens of times even if you carefully curate paths through your tooling with hints.
  • Make repeat calls safe–models retry, and they should be allowed to (idempotence is hard, and often impossible).

Do the boring work in the schema and the descriptions. The model will happily do the clever bit if you stop making it guess.

App Notes: Web App Viewer

I got annoyed enough with Safari Web Apps to write my own replacement.

It took about five minutes to get the core working, and maybe another hour of incremental tweaks spread over a day or so. That ratio–five minutes for the thing, an hour for the polish–tells you something about the state of the problem it solves.

Web App Viewer is a tiny native macOS shell that opens a URL in a WebKit window with no browser chrome. No address bar, no tab strip, no toolbar, no Safari-style fullscreen frame. One web page, one native window, as little visible UI as macOS will reasonably allow once a page is loaded (it hides traffic lights and scrollbars when the mouse is away).

You can drop URLs onto its Dock icon, send them from the Share sheet, open a .webloc file, or use a custom webappviewer:// URL scheme.

This is it. This is the whole app

Why

Safari’s “Add to Dock” Web Apps have been around for a while now, and the idea is sound–pin a website as a standalone app, give it its own icon, get it out of the browser tab pile. The execution, though, is maddening: it has always been broken across the board, and on macOS it is horrendous.

The resulting windows still carry persistent browser chrome I can’t hide, and the whole flow of creating one (find the menu item, wait, hope it picks up the right icon, hope it doesn’t break on the next Safari update) feels like an afterthought rather than a feature anyone at Apple actually uses.

This is one of dozens of papercuts that accumulate into a kind of low-grade daily friction, and I have a growing list of them that I intend to write about at some point. But this one was fixable before dinner, so I fixed it.

How

I fired up Codex with the kind of detailed mini-spec I described in –what the window should look like, how URLs should be accepted, what the drag behaviour should be–and told it to reuse the window styles and approach from Daisy and the USB Video Viewer (another small project I built to test SBCs via USB capture without adding more monitors to an already cluttered desk).

Disclosure: OpenAI provided me with a 6-month trial of Codex for my Open Source work (which has also helped me fully ), but you could probably do this with a brick-brained open-source local model (even if is a mess and under-represented in LLM training sets, which is a problem even with SOTA models).

The core is just WKWebView in a native window with chrome that fades in on hover. The Share Extension, the macOS Service, and the URL scheme were bits I tacked on after, and all the scaffolding (Makefile, signing, etc.) was AI-generated, because there is absolutely no reason to do that by hand in 2026.

There were, however, two things that were a right pain:

  • Adding an invisible drag strip needed a nudge from memory, but Codex was useless there. I knew how I’d have done it in and just guided it through the equivalent until it worked. Everything else was straightforward.
  • Web manifest icon detection in was… oh boy. The fact that still does not have a sane async model (at least not the one I would expect) and would poke at the page and web manifests but fail to wait for the bigger icons to load took me a few tries to get right.

But it was totally worth it. I now have six instances of this running, and I found (and fixed) subtle bugs when trying to create each one of them, so I’m pretty much calling it “done” other than some manual UX tweaks I want to do to the menus and dialogs.

What I Use It For

The original motivation was wrapping Piclaw’s web UI as a frameless native-feeling app, and that works exactly as I wanted. But the nicer surprise has been dropping other self-hosted URLs into it–Grafana dashboards, consoles, internal tools–and getting a clean, chromeless window for each. It turns out that removing the browser frame makes everything feel lighter.

And I am casting one of them to an Android device via AirPlay (more on that later when I get that one stable), and the lack of browser chrome makes it… just great. Zero wasted pixels, no distractions, just the content.

But the way it really improves on what Apple didn’t do for me is usability and practicality. Drop in a URL, check it out, then hit Cmd+I and a new copy is installed to my ~/Applications folder, ready to launch from Spotlight, without cluttering the Dock or trying to figure out where they hid it in the sharing pane.

Bliss.

The Uncomfortable Bit

I was a happy user years ago, and I know there are paid apps that do roughly this. But the uncomfortable truth for Apple indie developers in the age of is that there is zero reason to pay for any of them when I can build a tailored version for my own needs this fast.

That’s not a criticism of those apps. It’s a warning sign about what -assisted development does to the economics of small, focused utilities–and, in the context of Mac apps, which were always a tiny cottage industry, is going to be worrisome for many.

But the real lesson here, I think, should be about what Apple ought to have just built into macOS instead of shipping the half-baked Web App support that provoked all of this in the first place.

I will have more words on that.

Notes for April 20-26

Amidst the chaos brought on by my usual seasonal allergies, work turned out to be calmer than usual–the usual industry churn and constant rumors of layoffs have made “calmer” a relative term, though–so most of my evenings went to projects.

I also re-read Project Hail Mary–partly because I needed something absorbing that wasn’t a screen, and partly because Weir is one of the few authors who makes engineering problem-solving feel like a page-turner. It holds up, and I can’t wait to see the movie.

Mac Retro-Hackery

Rocketing away

The PPC detour is, surprisingly, going much better than the 68k JIT, and it has already paid off: my naïve take on memory layouts meant I hit one of the banes of modern emulation very fast. ASLR on aarch64 Linux was randomising addresses that the JIT needed to stay fixed, and chasing that down explained a lot of the issues I was having with the 68k version.

The fix for now was to have the binary disable its own ASLR at startup via personality(ADDR_NO_RANDOMIZE) and re-exec, which is ugly but works and is the sort of thing nobody documents. And after doing that on the BasiliskII side as well, a lot of issues went away.
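For reference, the trick looks roughly like this in Python (assuming Linux; ADDR_NO_RANDOMIZE comes from <sys/personality.h>, and the emulators do the equivalent in C):

```python
# Sketch of the disable-own-ASLR-and-re-exec trick, Linux only.
# Call this as early as possible in main(), before mapping any memory
# the JIT needs at stable addresses.
import ctypes
import os
import sys

ADDR_NO_RANDOMIZE = 0x0040000  # from <sys/personality.h>

def disable_aslr_and_reexec():
    """Disable ASLR for this process, then re-exec so the new process
    image actually runs with the fixed memory layout."""
    libc = ctypes.CDLL(None, use_errno=True)
    persona = libc.personality(0xFFFFFFFF)  # 0xffffffff reads the persona
    if persona & ADDR_NO_RANDOMIZE:
        return  # already running without ASLR; nothing to do
    libc.personality(persona | ADDR_NO_RANDOMIZE)
    os.execv(sys.executable, [sys.executable] + sys.argv)
```

The re-exec is the non-obvious part: personality() only affects mappings created afterwards, so the process has to restart itself to get a clean, unrandomised layout.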

Both JITs now have proper Makefile workflows with tmux targets, which means I can build, test, run and kill either emulator from a single command–which I’ve been doing with my iPad, from the comfort of my couch.

As to the , it is not assembled, because the resistive touch screens I have are borderline unusable for precise tapping (so good thing I only 3D printed a test fit with old filament). I ordered a couple of larger capacitive ones and a bunch of other ESP32 stuff, so I expect to come back to that next weekend.

PVE microVMs

So tiny

My little hack has been working great–although I had to fix a few things after upgrading one of my nodes (regression testing is the bane of my existence these days), pve-microvm now supports all the operating systems I care about, a few I had never considered using, and other than the fact that I am creatively patching ’s interface, it has been pretty stable, which was unexpected.

I got piclaw to hack in a custom OCI dialog to replace the Create VM wizard, an xterm.js console tab for microVMs (noVNC makes zero sense for serial-only machines), and a bunch of other features.

And of course it broke when shipped a patch release, but since I have a as a sacrificial node I can contain the blast radius of any upgrades. Mostly.

But right now I’m converting most of my LXCs to microVMs, and it’s been a blast–the speed is fantastic, and the fact that I can run in a microVM is just icing on the cake.

The Churning piclaw

Like I wrote above, regressions are the bane of my existence, and I am getting really annoyed at because despite all the nice tooling, it can still pass most linting and “compiling” and fail spectacularly at runtime. And since the upstream packages have been undergoing considerable churn and breaking changes, a lot of piclaw broke in various ways, and experimenting with different models really doesn’t help.

Even as I’m typing this, I am (yet) again waiting for an OpenAI model to audit some UI breakage that Anthropic’s models caused, because they just drop chunks off the code when editing it sometimes, but I am getting really annoyed at fixing things three times in a row…

And yet, the flexibility of and its extension model is pretty amazing–I decided to adopt it wholesale and have started breaking off pieces of piclaw into a piclaw-addons repository, into which I can throw all the mad experiments I want–for instance, yesterday I hacked together a “cheapskate” addon (a cost-conscious model router) that lets you use a bunch of free tiers across various providers, something that would be impossible to do in most harnesses…

Gi

Yes, another cute gopher

And yet, I think it’s time to have a backup. So I created gi, a harness inspired by and designed for extensibility, but where all the extensions are externalized to the point where they can’t (hopefully) break the core, and where I want to try to rewind the clock to the simpler times of LISP machines–take your workspace, copy a state dump to another machine, and just carry on.

So I designed it as a single binary that can pack everything into a single database, and that binary embeds both a dialect (via Joker) and a engine that can hook into the state machine–so extensions can be written in either and live inside the SQLite blob alongside everything else.
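A toy sketch of the single-database idea, with an invented two-table schema (gi’s actual layout is certainly different, and its embedded interpreters would consume the blobs):

```python
# Minimal sketch of "everything in one SQLite file": state and extensions
# live in the same database, so copying the file moves the whole workspace.
import sqlite3

def open_workspace(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS state(key TEXT PRIMARY KEY, value TEXT);
        CREATE TABLE IF NOT EXISTS extensions(name TEXT PRIMARY KEY, source BLOB);
    """)
    return db

def install_extension(db, name, source):
    # Extensions are just rows; the harness hands each source blob to the
    # embedded interpreter at startup.
    db.execute("INSERT OR REPLACE INTO extensions VALUES (?, ?)", (name, source))
    db.commit()

def load_extensions(db):
    return dict(db.execute("SELECT name, source FROM extensions"))
```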

And in true belt and suspenders style, I’m going to pack both a TUI and a web UI in the same binary.

But, most importantly, I’m taking a completely different approach at dependencies and testing–starting with bringing together most of my previous stuff in various forms, and writing a functional test suite and not just a code one. Still missing tool execution, keychain, workspace indexing–but it’s at the point where I can sit down and have a conversation with it.

9front on ARM

9front literally "on" ARM

Yeah, I know. Another project. But I realized that I needed to remind myself of how to bootstrap a kernel on bare metal before I even try to get Haiku running outside QEMU, so I started poking at porting 9front to one of my ARM SBCs.

’s ideas about distributed computing and per-process namespaces have been rattling around in my head since the 90s, but more to the point it is a very simple system, and porting it shifts the bulk of the effort into getting u-boot and hardware bootstrapping to work instead of trying to figure out everything at once.

As a fun detour from that, I ended up creating a simple USB Video viewer to pull up video output from a USB capture card to watch things crash spectacularly.

Keeping an eye on things

Yet Another Website

While I was at it, I finally got around to refreshing rcarmo.github.io–my open source landing page, which had been accumulating a decade of pixel dust while I was off doing other things.

It’s nothing fancy: a single page that groups some of my repositories by topic (AI agents, cloud, hardware, infrastructure, libraries, macOS, terminal stuff) with one-line descriptions for each, and acts as a sane front door for anyone who stumbles onto my GitHub profile and doesn’t fancy scrolling through 380-something repos.

rcarmo.github.io project landing page
The refreshed landing page, sorted by topic and (slightly) opinionated about what's worth highlighting.

The rest of the week’s GitHub activity was the usual scattering: a small go-ai update (the unified LLM client I’m using inside gi), some ground-init and mdnsbridge cleanups, a zmk-config-totem tweak for the split keyboard I’ve been slowly getting used to, and a couple of apfelstrudel commits–because if I’m going to break my brain on emulators all week, I might as well let an AI agent help me make some weird music every now and then.

Site Cleanups

Flint, my “very stable” agent, kept earning its keep on the side: I finally split out and as their own subsections (consolidating entries that had been awkwardly squatting in the language tables) and tucked away a couple of odds and ends–notably and a –into the relevant pages.

None of this is glamorous, but the resource pages have been drifting for a while, and having an agent do the boring sorting (and ask me sensible questions about edge cases) is exactly the right way to deal with chores I’ve been putting off for years.

And yeah, I know it’s too much, and that I’m spreading myself too thin.

Notes for April 13-19

This was a pretty decent week despite my allergies having kicked in to a point where I have constant headaches, but at least I had quite a bit of fun with my projects.

“Now I Have the Full Picture”

Yeah, I find Opus’ sycophancy and its traits obnoxious, but this time it’s right–I was trying to get to work with my particular flavor of Cheap Yellow Display, and having so much trouble matching screen corruption and flipped colors (and bits) to the display code that, after I finally managed to get at least a stable (if broken) boot picture on screen, I thought to myself… why not let piclaw sort this out for me?

So I plugged the CYD and a Logitech Brio 4K into the , and… I got the most surreal ESP32 closed loop debugging setup going:

I ended up moving the camera farther away to get better focus

Five minutes later, I had all the display bugs fixed except for touch input, which was still rotated–a fair bargain.

Proxmox microVMs

I was looking at smolvm and going through my notes on Firecracker and other sandboxing mechanisms, when I realized I had come across microVMs a few months ago when looking at agent sandboxing mechanisms and the old QEMU JIT.

Now, I actually think that microVMs are way overrated, but I was literally in the shower when I realized that, for me (since I have zero interest in running microVMs on my laptop), would be the perfect way to manage them (also since I have zero interest in running another exotic hypervisor).

So I did a little spelunking, and… it worked. Badly, but it worked. I took my terminal session, added a few notes, and asked piclaw to investigate whether it was possible to patch the UI–and guess what, it was a pretty simple patch. I got the agent to flesh out a Debian package and turn my hacks into a CI/CD workflow that builds and packs a suitable kernel into the .deb, and now I have a nice VM template, decent integration of microVMs into the web UI, the works.

pve-microvm patches qemu-server to add the machine type, ships a template workflow that pulls OCI container images and converts them to PVE disk images, and redirects serial to the web console so you get a proper terminal in the UI. There’s also init support and a balloon device (as well as qemu-agent support), but the OCI images are so barebones that I haven’t yet sorted out all of the ergonomics about using them to automatically deploy stuff.

Proxmox microVM integration in action

This looks like a very low impact addition to so far and I would love to upstream it, but I’m not holding my breath since maintainers aren’t trivial to reach and the old-style “join our developer mailing-list” approach is… just too effort-intensive as I have so much stuff to do these days.

We Now Do PowerPC JITs Too

The macemu work took an unexpected turn–I shifted from (68k) to SheepShaver (PowerPC), and things moved a lot faster than I expected. To make a long story short, it was Friday and I idly asked piclaw to do a comparative source analysis between both emulators, hoping for something that I’d missed in the quagmire of ROM patches I’ve been wading through.

It turned out there was no real JIT support, and piclaw did a comparative analysis of opcode coverage instead, ending with “there are, however, far fewer opcodes to translate in the RISC architecture. Do you want me to set up a quick opcode test harness for PPC?”

Uh… yeah? By Friday evening, every opcode family except AltiVec had native ARM64 codegen and the emulator was booting to the Welcome to Macintosh screen (and crashing, but that was still easily 100x faster progress than the 68k work). Yesterday afternoon, after some back and forth about creating a second harness (effectively a headless Mac with no hardware, to skip problematic ROM regions), I got it to do AltiVec via NEON (which my hardware supports–I’ve yet to devise a fallback path for older chips).

The process was straightforward: point piclaw at an opcode group, have it implement the native codegen, run the harness, iterate on whatever broke, and then, once an opcode group was “done”, smoke-test it on the headless Mac harness. The AltiVec work was the most satisfying part–mapping NEON intrinsics to AltiVec semantics is tedious but tractable, exactly the kind of work where AI earns its keep and the harness catches every subtle difference.
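To make the harness loop concrete, here is a toy sketch of the idea in Python–the opcode, its semantics, and all names are invented for illustration; the real harness obviously runs generated ARM64 code, not a second Python function:

```python
# Toy sketch of an opcode test harness: run the interpreter's reference
# implementation of an opcode against the "JIT" version over random
# operands and report the first divergence (or None if they agree).
import random

MASK32 = 0xFFFFFFFF

def ref_addc(a, b):
    # Reference semantics: 32-bit add, returning (result, carry-out),
    # with the carry taken from the widened sum.
    s = a + b
    return s & MASK32, int(s > MASK32)

def jit_addc(a, b):
    # Stand-in for the generated native code: same add, but the carry
    # is derived from observing the wrap-around instead.
    s = (a + b) & MASK32
    return s, int(s < a)

def fuzz_opcode(ref, jit, trials=10_000, seed=42):
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.getrandbits(32), rng.getrandbits(32)
        if ref(a, b) != jit(a, b):
            return (a, b)  # first diverging operand pair, for debugging
    return None  # no divergence found
```

The agent can then be pointed at any non-None result and asked to reconcile the two implementations, which is essentially the iterate-on-whatever-broke step above.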

SheepShaver now boots to a desktop with VNC input working. There’s still a long way to go because I have done zero hardware testing (it’s got no audio, only VNC input and, more importantly, no network or graphics acceleration), but a from-scratch PPC JIT on ARM64 booting to a desktop in around 24h is… not nothing.

I wish I could finish the 68k JIT, though; the register allocation strategy I guided the agent towards and the weird ROM patches BasiliskII does just don’t get along.

Lounge About Agentic Computing

The fun part for me has been that a lot of this has been done on an iPad on my couch, using the Apple Pencil or iOS voice typing to scratch out instructions. After an outing yesterday, I had the idea to just swipe between agents, and… oh boy.

The idea is simple–swipe left or right on the timeline to switch between agents–but making it feel right in an iOS PWA required far too many weird CSS and JS hacks. The one real problem I’m having is that AI, no matter how many times you specify what you want in painful detail and how many actual code samples you give it, is still too prone to breaking intricate UX–I’m getting really tired of weird regressions every time I add another feature.
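The gesture classification at the heart of this is simple enough to sketch in a few lines (shown here in Python rather than the actual JS; all thresholds are guesses, not the real values):

```python
# Stripped-down sketch of swipe detection: given the horizontal and
# vertical deltas of a touch gesture and its duration, decide whether
# it should switch agents or be left to the browser.
def classify_swipe(dx, dy, elapsed_ms,
                   min_distance=60, max_off_axis=40, max_duration_ms=500):
    if elapsed_ms > max_duration_ms:
        return None            # too slow: treat as a drag
    if abs(dy) > max_off_axis:
        return None            # mostly vertical: let the page scroll
    if abs(dx) < min_distance:
        return None            # too short: probably a tap
    return "next-agent" if dx < 0 else "previous-agent"
```

The hard part in a PWA isn’t this logic, of course–it’s keeping the browser from hijacking the gesture for back/forward navigation and scrolling, which is where the CSS and JS hacks come in.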

I’m Not In Thrall To Anthropic, But I Can Help

I’m not an Anthropic customer (besides GitHub Copilot’s model selection, which now also includes the new, lobotomized Opus 4.7, I have a personal Codex subscription for OSS work), but so many people seem to have been caught out by their ban on third-party coding harnesses that I decided to dust off Vibes, start porting it to Go (which was already in my backlog), and turn it into an ACP-only wrapper so that people can use Claude with a nice web UI.

I think it’s the least I can do, and also gives me a decent web UI to drop in for my own work when I absolutely have to use Copilot.

Haiku on ARM64

And, of course, since I have far too many projects already, I decided to see if I could get Haiku to boot on ARM64. I don’t particularly care about doing for salesy startupy business stuff, but I love using it to build things I think should exist, and I have quite a few more I’d like to make happen…

Notes for April 6–12

Thanks to a bit of spillover from Easter break, this was a calmer, more satisfying week where I could actually get stuff done and even have a bit of fun.

My idea of fun, apparently, is to do 3D visualizations in piclaw

Getting Organized

Now that piclaw is in cruise mode, I’ve started focusing on actually using it.

So I created an instance called Flint, which manages not only my vault but also all of my personal pursuits and most of my homelab: I gave it the API tokens for my cluster, and over the past week it’s been busy:

  • It re-tagged most of my notes and drafts (adding reference URLs to ongoing drafts along the way), quizzing me on what to do with specific notes as it went.
  • It rebuilt and redeployed my GPU sandbox (which I broke last week): recreated the VM, mounted the Ubuntu ISO, prompted me to run the installer, and installed the latest NVIDIA drivers, nvidia-docker and a baseline set of utilities.
  • When I then asked it to look at the stacks in my gitea instance, my notes, and what needed to be set up, it installed the agent and brand-new versions of the stacks with tweaked network and volume settings, updated my notes, and upgraded the pinned image versions (troubleshooting as it went).
  • It developed and published an OPDS server and an EPUB read-later service so I can fetch interesting web pages and read them later on the Xteink X4, including monitoring the CI pipeline and redeploying the containers.
  • It audited and set up centralized stats collection, which I had been meaning to do for ages (and I intend to have it set up Telegraf on other machines to collect metrics).

So far, Flint is a resounding success (it’s running GPT-5.4, a fairly sensible and stable model), and it doesn’t just do notetaking and operations.

Site Hackery

Flint has also become quite useful for tidying up my workflow—I was already using a piclaw instance to convert ancient raw-HTML posts into Markdown in batches, but there are a few things that have been nagging at me for years and that I can finally make significant progress on:

  • Adding links to my resource pages
  • Drafting link blog entries
  • Streamlining static site builds

I’ve been doing the first two for ages, but both relied on adding bits of text to Reminders that were then post-processed and added to git using either the CLI or Working Copy. That worked OK for a while, but my iPad mini’s increasing slowness has made them quite frustrating, especially since I tend to do that kind of quick posting over breakfast and it was taking up too much time.

As it happens, GitHub has a REST API for Git Trees, which in practice means I can accumulate these minor changes in a JSON changeset over breakfast and then apply them in batches–or, rather, have Flint do that, with all the guidance and steps in a SKILL.md file.
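The core of that batching step can be sketched very simply–the Git Trees endpoint (POST /repos/{owner}/{repo}/git/trees) is real, but the changeset format and function name here are my own invention:

```python
# Minimal sketch: turn an accumulated changeset into a Git tree payload
# for GitHub's Git Database API, without touching a local clone.
def build_tree_payload(base_tree_sha, changeset):
    """changeset: {repo_path: new_file_content} accumulated over breakfast."""
    return {
        "base_tree": base_tree_sha,  # leave every other file in the repo untouched
        "tree": [
            {"path": path, "mode": "100644", "type": "blob", "content": content}
            for path, content in sorted(changeset.items())
        ],
    }
```

The payload gets POSTed to the trees endpoint, a commit object is created pointing at the new tree, and the branch ref is updated to that commit–three API calls, zero local git operations.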

So my new breakfast workflow is to send links to Flint using the iOS sharing pane or a bookmarklet (still experimenting with both), have it create a JSON changeset for links, and occasionally ask it to screenshot a page and create a blank Markdown document for a linkblog post. That gets pre-filled with a title, likely tags and the appropriate image reference; I just pop open the built-in editor tab in piclaw, finish the post, and ask it to add the files to the changeset and post them via the API.

So far, it’s been going swimmingly: zero git fetches/commits/pushes, all handled server side, and very little friction–and it works on my iPad mini, albeit still slowly.

A New Hope

Another thing I’ve been working on is porting the site builder to a new language for both speed and maintainability—the current codebase has some 20-year-old hangovers that I wanted to get rid of, and some kind of reset has been long overdue, so I have been slowly poking at this for the past few months.

As it happens, the overall indexing and rendering process was pretty trivial—the real challenge has been to make sure that it looks exactly the same, especially given that my engine has some pretty specific Wiki-linking rules and I’ve accumulated a bunch of rendering helpers and custom plugins over the years.

Plus everything related to HTML rendering has changed: parsing, link resolution, templating, the works. And that’s enough to juggle already, so I don’t want to change the front-end design at all (yet).

I decided to be ambitious and aim for full rendering parity. So what did my little army of AI helpers do?

It converged on doing visual diffs of randomly sampled pages: take a locally rendered version, grab the public page, and generate a composite image it can rate as “close” or “broken” simply by counting the ratio of red pixels:

This is both brilliant and scary at the same time

The process is greatly streamlined: sample 100 pages out of the nearly 10,000 we have now, render, batch compare, show me the worst ones, and then discuss and generalize the fixes (which is the only part the LLM is actively involved in). I could probably use autoresearch to automate this, but some of the fixes have to do with legacy rendering logic that no AI could ever figure out.
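The scoring half of that idea fits in a few lines. This sketch works on bare pixel lists rather than screenshots (the real pipeline presumably diffs rendered images); a pixel counts as “red”, i.e. broken, when its channel difference exceeds a threshold:

```python
# Rate two renders of the same page by the fraction of differing pixels.
# pixels_a / pixels_b: equal-length lists of (r, g, b) tuples.
def diff_ratio(pixels_a, pixels_b, threshold=16):
    assert len(pixels_a) == len(pixels_b), "renders must be the same size"
    broken = sum(
        1
        for (r1, g1, b1), (r2, g2, b2) in zip(pixels_a, pixels_b)
        if abs(r1 - r2) + abs(g1 - g2) + abs(b1 - b2) > threshold
    )
    return broken / len(pixels_a)
```

Pages scoring near zero are “close”; anything with a high ratio goes on the worst-offenders list for discussion.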

Still, this has converged very quickly to minor typography and spacing differences, and once I’m happy with the engine I’ll start looking at optimizing the actual blob uploading part–which I aim to standardize on rclone to remove my current dependency on storage accounts, and to greatly speed up with deltas.

Remember, AIs Are Still Dumb

It turns out that if you tell an AI that empty catch blocks are forbidden, it will just… go and add comments inside them, instead of doing something useful like emitting a warning log message…

I’m now doing another code audit pass over the entire piclaw codebase, and this kind of mechanical fix is trivial to set up and do reliably with autoresearch:

An autoresearch session doing a code audit pass
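The detection side of that audit is mechanical enough to sketch. This assumes Java/C#-style syntax and no nested braces inside the catch body–a deliberately naive first pass, not what the audit actually runs:

```python
# Flag catch blocks whose body is empty or contains only comments,
# which is exactly the "fix" the AI kept producing.
import re

CATCH_RE = re.compile(r"catch\s*\([^)]*\)\s*\{([^{}]*)\}")

def suspicious_catches(source):
    offsets = []
    for match in CATCH_RE.finditer(source):
        body = re.sub(r"//[^\n]*", "", match.group(1))         # strip line comments
        body = re.sub(r"/\*.*?\*/", "", body, flags=re.S)      # strip block comments
        if not body.strip():
            offsets.append(match.start())  # flag for a real fix, e.g. a warning log
    return offsets
```

Each flagged offset becomes a work item for the agent: replace the comment with an actual log call instead of merely appeasing the linter.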

Now to see if I can get some reading and 3D printing done as well, since the whole point of using AI in the first place was to have more free time… right?

Apple, Still

I have been having feelings about Apple lately. This blog may have drifted a fair way from its original focus, but I am still, first and foremost, an Apple user – just not an exclusively Apple user, and perhaps not even a particularly obedient one anymore. I use both Windows and Linux every day, and have grown used to judging platforms by what they let me get done rather than by whatever story they are trying to tell about themselves.

That makes the current moment a little awkward. Apple is still extraordinarily good at making hardware I want to pick up and use, and still more coherent than most of the industry in the broad strokes, but it also feels increasingly prone to sanding off the wrong edges, reinventing the UX wheel, and constantly adding paper cuts to their software.

The iPhone

The iPhone is probably the clearest example of that tension. It is still the phone I would rather carry, and the one whose hardware I trust most, but iOS has become steadily more fussy without becoming proportionally more capable.

A lot of it has been the constant UI friction and pointless balkanization of features like screen mirroring, which I would very much like to have – I see zero point in using Messages on my Mac or futzing around with Handoff and AirDrop when I could just, you know, pull up a window into my phone and type stuff in.

And I know Apple could indeed engineer a way to make those features DMA-compliant if it really wanted to – I suppose breaking the user experience across the board with Liquid Glass had enough priority to preempt allocating engineering resources to, you know, proper features.

Sharing things, moving files around, background activity, browser limitations, the endless little inconsistencies in system UI and the ungainly bloat in Settings – that friction accumulates. None of it is fatal on its own, but the aggregate effect is that the platform feels far less light than it used to, even while Apple keeps insisting that everything is becoming more seamless.

Where The Cracks Show

I’m going to say it outright: I found Liquid Glass insulting. Not just visually, but because of what it says about priorities: instead of fixing glaring gaps in things like automation (Shortcuts is definitely not in good health, and AppleScript is pretty much dead) that could actually have put Apple at the forefront of automation and AI (never mind the miserable failures of Siri and Apple Intelligence), someone at Apple decided that breaking visual affordances took priority over stability and over providing consistent application intents and hooks across the board.

Even then, macOS is in a better place than iOS, but mostly because it still retains enough of its older character to be workable. Remember, I can just .

There is still a proper filesystem, there is still a shell (even if Apple seems intent on breaking the userland in very small increments across releases), there are still enough escape hatches to route around bad decisions, and Apple Silicon has papered over a remarkable amount of software bloat simply by being absurdly fast and power-efficient.

But the cracks are visible there too. System Settings remains a mess, cross-platform application quality keeps declining, and the old Mac assumption – that a user might actually want to understand how their machine works – seems to matter less every year. Meanwhile, iPadOS keeps borrowing bits of the Mac’s vocabulary without acquiring the Mac’s actual flexibility, which leaves both platforms feeling oddly misaligned.

The iPad

The iPad remains the device I want to use more than I actually do. I may pick one up every morning to read the news and get drafts started, but that’s mostly where it ends. The hardware is excellent, the battery life is still absurd, the Pencil is useful, and for reading, sketching, note-taking and casual browsing it remains hard to beat. Fine.

But every time I try to push it into being a serious general-purpose computer, it reminds me that Apple still has not decided what it wants the iPad to be. It can approximate a laptop for stretches at a time – sometimes very convincingly – but the moment you need proper peripheral support, predictable file handling or sustained tool switching, the abstraction turns into safety glass – and I’m back to my long-held opinion that the only good iPad is the iPad mini.

That’s what I intend to upgrade this year, even if Apple comes out with a decent foldable (and, by the way, I really like the “leaked” form factor, because phones have become stupidly tall and unwieldy).

Fedora, Oddly Enough

And this is where Fedora comes in, because it has become my most useful point of comparison. Fedora on the desktop is still Linux on the desktop – gloriously inconsistent, occasionally infuriating, and always willing to expose its plumbing at the worst possible moment – but my experience is conclusive: it has reached a point where, for a lot of everyday work, it is simply easier to reason about than either macOS or iOS.

That does not make it better in every respect – it is not. But it does mean that a lot of the breakage in Apple software now has a reference point, and even allowing for the fact that I have always been a UNIX user and deeply technical, the creature comforts Linux now provides give me a lot more confidence than Apple’s software does.

If Qualcomm wasn’t so obtuse about only supporting Windows and ARM laptops were more open, things would be very interesting indeed.

Still an Apple User

I still like the hardware, still prefer the overall ecosystem in a number of places, and still find myself evaluating a lot of the rest of the industry by standards set years ago.

But I also think it is getting harder to ignore how much of the original appeal has been traded away due to sheer mismanagement of software QA and Apple’s refusal to acknowledge the gaps across , core applications, and a consistent user experience.

Come on, Tim, get your people in line.

The Orange Pi 6 Plus

This was a long one–I spent a fair bit of time with the Orange Pi 6 Plus over the past few months, and what I expected to be a quick look at another fast ARM board turned into one of those test runs where the hardware looks promising on paper, the software is wonky in exactly the wrong places, and you end up diving far more into boot chains, vendor GPU blobs and inference runtimes than you ever intended.

Read More...

Notes for March 30 – April 5

This was a shorter work week partly due to the Easter weekend and partly because I book-ended it with a couple of days off in an attempt to restore personal sanity–only to catch a cold and remain stuck at home.

Read More...

The Xteink X4

I got an Xteink X4 this week, and my first reaction was somewhere between amusement and nostalgia–it is absurdly small, feels a lot better made than I expected for the price, and the form factor harks back to the times when I was reading e-books on Palm PDAs and the original iPod Touch.

Read More...

Hans Zimmer

At least they aren’t from Behringer
Modular synths on stage. Who would have thought?

Notes for March 23–29

Work ate the week again. I’m exhausted, running on fumes, and daylight saving time stole an hour of sleep I could not afford–the biannual clock shuffle is one of those vestigial absurdities that nobody can be bothered to abolish, and I’m starting to take it personally.

Read More...

Notes for March 16–22

This week’s update is going to be short, largely because work was hell and I ended up spending my Saturday evening poring through my meeting notes backlog until 2AM today and I have a splitting headache to show for it.

Read More...

Notes for March 9–15

Well, there went another work week. Slightly better (to a degree, although I got some discouraging news regarding a potential change), and another week where piclaw ate most of my evenings–it went from v1.3.0 to v1.3.16 in seven days, which is frankly absurd even by my standards.

Read More...

MacBook Neo Impressions

I went to a local mall yesterday, happened to chance upon a couple of MacBook Neos on display at our local (monopolistic) retailer, and spent half an hour playing with them.

Read More...

So You Want To Do Agentic Development

We’re three months into 2026, and coding agents have been a big part of my time since –things have definitely intensified, and has already panned out: agents are everywhere.

Read More...

Notes for March 2–8

This was a frankly absurd week work-wise, with some pretty long days and a lot of late-night hacking on my projects (which is not exactly a new thing, but at least now I am asking piclaw to do it during the day time, which is a small improvement).

Read More...

Notes for February 23–March 1

Well, going back to work after a week off was rough.

Read More...
