Running microVMs in Proxmox VE, The Easy Way

I’ve been running a mixed cluster for years – four nodes of wildly different capability, from an Atom x5-Z8350 with 2 GB of RAM (currently offline after years of faithful service as a baseline torture device) up to an i7-12700 with 128 GB (borg, my main homelab server).

This year, somewhere between writing agentbox and all the hype around agentic sandboxes, I got tired of the eternal compromise between LXC containers and full virtual machines, and ended up building pve-microvm – a Debian package that adds QEMU’s microvm machine type as a first-class managed guest in Proxmox VE.

This isn’t a quick hack. Well, the first version was, actually, but it’s gone quite a bit farther than that, and certainly farther than I expected.

It now ships a custom kernel, patches the internals to provide web UI integration, and, due to my usual fascination with offbeat operating systems, ended up supporting (as of this writing) 21 guest OS types from Debian to NetBSD to .

Yes, I completely brought it upon myself to run in a microVM, and yes, it works.

Finding the Right Balance

After a few rounds of cluster cleanups and migrations, it’s now my daily driver for running , Caddy reverse proxies, mini-firewalls, and the AI agent that’s helping me clean up this post.

Proxmox VE gives you two main options out of the box:

  • LXC containers start instantly, share the host kernel, and are spectacularly efficient. But they’re not isolated – a kernel exploit in one container compromises everything. You can’t run a different OS. You can’t easily nest Docker inside them without ending up (eventually) wrestling with fuse-overlayfs gymnastics. And certain workloads (anything needing custom kernel modules, or CAP_SYS_ADMIN in anger) simply don’t fit.

  • Full VMs give you hardware isolation via KVM/VT-x, but they boot SeaBIOS or OVMF, sedately walk through GRUB as they yawn their way out of bed, probe a forest of emulated legacy devices (IDE controllers, VGA, USB hubs, PCI bridges), and typically take 5-10 seconds to reach a login prompt. Each one carries the overhead of that entire emulated chipset sitting in memory.

What I wanted was the security boundary of a VM with the startup characteristics of a container. QEMU’s microvm machine type – originally developed for Firecracker-style workloads – strips all of that away. No BIOS, no GRUB, no legacy devices. Direct kernel boot into a minimal virtio-only environment. The result: sub-300ms boot to a fully networked guest with a QEMU agent, running inside its own KVM hardware isolation boundary.

Comparison of Standard VM, microVM, and LXC Container isolation and boot characteristics

Now, let me be clear: I’m not spawning hundreds of these things. I have Azure for that – but I do want to run Actions workers, have a very limited set of hardware resources, and got fed up with the time it took for one particular VM to boot repeatedly…

What It Actually Does

pve-microvm is a single .deb that patches Proxmox’s qemu-server Perl modules at install time. When you set machine: microvm on a VM config, the standard config_to_command function delegates to my MicroVM.pm, which builds an (almost) completely different QEMU command line:

qemu-system-x86_64 -M microvm,x-option-roms=off,pit=off,pic=off,isa-serial=on,rtc=on,acpi=on,pcie=on \
  -kernel /usr/share/pve-microvm/vmlinuz \
  -initrd /usr/share/pve-microvm/initrd \
  -append "console=ttyS0 root=/dev/vda rw quiet" \
  -device virtio-blk-pci-non-transitional,drive=drive-scsi0 \
  -device virtio-net-pci-non-transitional,netdev=net0 \
  ...

No chipset emulation. No PCI bridges. No VGA. The guest gets a single serial console (which PVE’s xterm.js connects to natively), virtio block devices, and a virtio network interface. Everything rides PCIe transport with non-transitional (modern-only) virtio devices rather than the MMIO transport microvm was originally designed around – for reasons I’ll come to in a moment.
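To make the delegation concrete, here is a minimal Python sketch of the kind of translation MicroVM.pm performs, from parsed config values to a QEMU argument list. It is not the package’s Perl code: `build_microvm_argv`, the dict keys and the tap/drive wiring are assumptions for illustration, and it only covers the kernel, one disk and one NIC.

```python
from __future__ import annotations

# Kernel and initrd paths as shipped by pve-microvm; everything else is hypothetical.
KERNEL = "/usr/share/pve-microvm/vmlinuz"
INITRD = "/usr/share/pve-microvm/initrd"

# Machine options from the command line above, joined into a single -M value.
MACHINE_OPTS = ["x-option-roms=off", "pit=off", "pic=off",
                "isa-serial=on", "rtc=on", "acpi=on", "pcie=on"]


def build_microvm_argv(vmid: int, conf: dict) -> list[str]:
    """Build a QEMU argv for a microvm guest from a parsed config dict."""
    argv = [
        "qemu-system-x86_64",
        "-M", "microvm," + ",".join(MACHINE_OPTS),
        "-m", str(conf.get("memory", 512)),
        "-smp", str(conf.get("cores", 1)),
        "-kernel", KERNEL,
        "-initrd", INITRD,
        "-append", conf.get("cmdline", "console=ttyS0 root=/dev/vda rw quiet"),
    ]
    if "disk" in conf:
        # The -drive id must match the drive= property on the virtio device.
        argv += ["-drive", f"file={conf['disk']},if=none,id=drive-scsi0,format=raw",
                 "-device", "virtio-blk-pci-non-transitional,drive=drive-scsi0"]
    if conf.get("net0"):
        # Tap named the way PVE names them; attaching it to a bridge is left out here.
        argv += ["-netdev", f"tap,id=net0,ifname=tap{vmid}i0,script=no,downscript=no",
                 "-device", "virtio-net-pci-non-transitional,netdev=net0"]
    return argv
```

`shlex.join(build_microvm_argv(114, conf))` turns the list into a pasteable command line if you want to eyeball it.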

How pve-microvm integrates with Proxmox VE internals

The package ships:

  • A tiny (12MB) pre-built Linux 6.12.22 kernel compiled from x86_64_defconfig with a minimal overlay – virtio, vsock, virtiofs, 9p, and the modules Docker needs (overlay, veth, bridge, netfilter, BPF), because, well, I’m pragmatic.
  • A 1 MB initrd that probes virtio devices, finds the root filesystem by label or device path, and does a switch_root in ~150ms
  • pve-microvm-template – builds root filesystems from any of 12 supported OCI base images, with optional SSH, Docker, and guest agent
  • pve-oci-import – pulls an OCI image directly into a PVE-managed disk
  • Web UI extensions – a “Create µVM” button, machine type dropdown, conditional panel hiding for irrelevant settings, and an amber bolt icon in the resource tree
  • A systemd service (pve-microvm-early.service) that ensures patches are applied before pvedaemon starts on boot – critical for onboot=1 VMs

The Boot Sequence

Like in aerodynamics, most speed comes from eliminating everything that isn’t strictly necessary. A standard VM spends most of its boot time in firmware and bootloader, so a microVM skips all of that.

Boot timeline comparison between microVM and standard VM

SmolBSD (a NetBSD guest using virtio-mmio transport) boots in 31ms. A full Debian with Docker and the QEMU agent is ready in under 8 seconds – and most of that time is apt package installation during first boot. Subsequent boots hit the 300ms mark consistently, even on my humble hardware.

A fun rabbit hole I went into when someone asked me to add SmolBSD support:

There’s a reason SmolBSD gets to use virtio-mmio and the Linux guests don’t. QEMU’s microvm machine type can carry its virtio devices over two transports: the bare-bones MMIO interface it was originally built for, or PCIe. MMIO is the lighter of the two – no PCIe host bridge, no ACPI – which is how a NetBSD guest shaves itself down to 31 ms.

But on QEMU 10.x the MMIO path has (as far as I can tell) a device-probing bug for Linux guests: only virtio-blk binds, and the network, serial and balloon devices are never claimed by their drivers for some reason. NetBSD probes MMIO correctly and is perfectly happy; Linux (at least the kernel I am using) is not.

For every Linux guest I therefore fall back to PCIe with non-transitional (modern-only) virtio devices, which binds all of them reliably. The cost is about 50 ms of extra bring-up – which, against a 300 ms boot, I’ll take without complaint.
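The transport decision boils down to a lookup by guest type. This is a hypothetical Python sketch (`MMIO_GUESTS` and `pick_transport` are illustrative names, not pve-microvm’s code); the device model names are QEMU’s own virtio-mmio variants and the modern-only PCI variants used above.

```python
from __future__ import annotations

# Guest types assumed to probe virtio-mmio correctly (illustrative set).
MMIO_GUESTS = {"netbsd", "smolbsd"}

DEVICES = {
    # virtio-mmio models: no PCI bus involved.
    "mmio": {"blk": "virtio-blk-device",
             "net": "virtio-net-device",
             "balloon": "virtio-balloon-device"},
    # Modern-only (non-transitional) virtio-pci models on the PCIe bus.
    "pcie": {"blk": "virtio-blk-pci-non-transitional",
             "net": "virtio-net-pci-non-transitional",
             "balloon": "virtio-balloon-pci-non-transitional"},
}


def pick_transport(ostype: str) -> tuple[str, dict[str, str]]:
    """Return (transport, device models) for a guest OS type."""
    transport = "mmio" if ostype.lower() in MMIO_GUESTS else "pcie"
    return transport, DEVICES[transport]
```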

I think the above is actually a bug in my kernel configuration, but haven’t really had time (or maybe even the right hardware) to tackle it – this is something I’d love more people to look at and contribute patches.

One Kernel, Many Guests

There’s a deliberate consequence of this direct kernel boot trick that’s easy to miss: the kernel doesn’t live inside the guest. It sits on the Proxmox host at /usr/share/pve-microvm/vmlinuz, and the guest disk holds nothing but a root filesystem – userland, no /boot, no GRUB, no per-guest kernel package, no initramfs of its own.

That also means there’s no “boot the installer ISO and click through it” path, so instead the rootfs gets built straight from an OCI image with pve-microvm-template (Debian, Alpine, Fedora, Rocky, Amazon Linux and friends). In the weird cases, we import a prepared ext4/raw disk with qm importdisk. You don’t install an OS – you assemble a root filesystem.

Decoupling the kernel from the rootfs is what makes this interesting to run at scale. Every Linux microVM on the node boots the same vmlinuz – one kernel, built once from a stock x86_64_defconfig with a microvm overlay, so you can audit and update it in exactly one place: drop a new vmlinuz on the host, restart the guests, done. No guest ever pulls a broken kernel from an apt upgrade, because no guest has a kernel to upgrade, and the rootfs images stay tiny and completely kernel-agnostic.

Container-style kernel consistency, VM-style isolation.

What I’m Running

On my cluster right now, I have a fair smattering of these already. Four off the top of my head are:

  • gitea (VM 114, on an Intel N5105) – Bare-metal with SQLite, Caddy HTTPS, local actions runner, Avahi discovery. 2 cores, 2 GB RAM, 32 GB disk. Boots in ~3s, mostly because Gitea does a lot of housekeeping.
  • smith (VM 9022, on my i7) – the main piclaw agent, the system that manages the cluster, releases piclaw and generally keeps tabs on everything. 2 cores, 6 GB RAM, 48 GB disk. Runs Docker internally, and has 3 smaller, volatile siblings scattered throughout the cluster that don’t run Docker but have different roles (CI/CD, wipe-and-reinstall agent instance for testing upgrades, etc.)
  • exo (VM 9021, on my i7) – Distributed inference coordinator for running LLMs across multiple machines. CPU-only, 2 GB root.
  • virtualdsm (VM 300) – Synology DSM running inside a microVM with Docker, inside Terramaster NAS hardware. Yeah, I know I’m weird, but I set it up when my Synology went sideways and haven’t nuked it yet. Uses the stock Debian kernel rather than my custom one, because DSM needs specific module paths.

I also have a dormant 9Front one, as well as a little menagerie of OpenWrt, OPNsense, OSv unikernels, gokrazy Go appliances, and various Alpine/Fedora/Rocky/Amazon Linux configurations filed away as standard backups. The 21 guest OS types aren’t theoretical – each one has been booted and validated, and sometimes smith will go and thaw one out to do regression tests.

That ancient Atom x5-Z8350 with 2 GB of RAM was invaluable as a baseline test platform, because if a microVM can boot and run usefully on a fanless 2016-era Atom with 2 GB of total system memory, it’ll work anywhere. And it could run six of them before it started slowing down…

What a Config Looks Like

There’s no magic to a microVM config – it’s an ordinary qm guest with a particular machine type and a kernel command line. Here’s gitea (the VM 114 above) as it sits in /etc/pve/qemu-server/114.conf:

agent: 1
args: -kernel /usr/share/pve-microvm/vmlinuz -append "console=ttyS0 root=/dev/vda rw quiet"
boot: order=scsi0
cores: 2
machine: microvm
memory: 2048
name: gitea
net0: virtio=BC:24:11:00:6E:01,bridge=vmbr0
onboot: 1
scsi0: local-lvm:vm-114-disk-0,size=32G
serial0: socket
tags: microvm
vga: serial0

The only microvm-specific lines are machine: microvm, the args carrying the kernel and its cmdline, and serial0: socket / vga: serial0 wiring the console through to xterm.js. Everything else – cores, memory, the virtio NIC on vmbr0, the scsi0 disk on local-lvm, onboot, the guest agent – is exactly what you’d write for any Proxmox VM.

Note there’s no -initrd in args: MicroVM.pm injects it automatically when it sees the shipped kernel, along with the balloon, vsock and (when configured) virtiofs devices. That’s the whole point – a microVM is a normal guest that happens to boot a host-provided kernel, not a special object you have to learn a new tool to manage.
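Because the config is plain `key: value` lines, auditing which guests on a node boot the shared kernel is easy to script. A minimal Python sketch, assuming configs shaped like the one above; `parse_pve_conf` and `microvm_kernel` are hypothetical helpers, and the parser stops at the first snapshot section.

```python
from __future__ import annotations

import shlex


def parse_pve_conf(text: str) -> dict[str, str]:
    """Parse the current (non-snapshot) section of a PVE guest config."""
    conf: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # blank lines and description comments
            continue
        if line.startswith("["):                # snapshot sections follow; stop here
            break
        key, _, value = line.partition(":")    # split on the first colon only
        conf[key.strip()] = value.strip()
    return conf


def microvm_kernel(conf: dict[str, str]) -> tuple[str | None, str] | None:
    """Return (kernel, cmdline) for a microvm guest, or None for anything else."""
    if conf.get("machine") != "microvm":
        return None
    tokens = shlex.split(conf.get("args", ""))  # honours the quoted -append value
    kernel = tokens[tokens.index("-kernel") + 1] if "-kernel" in tokens else None
    cmdline = tokens[tokens.index("-append") + 1] if "-append" in tokens else ""
    return kernel, cmdline
```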

Networking

A microVM attaches to the network exactly like any other guest. The interface spec is a bit of a mouthful (virtio-net-pci-non-transitional device on the PCIe bus), and it lands on whatever Linux bridge and VLAN you point it at:

qm set 900 --net0 virtio,bridge=vmbr0           # single NIC
qm set 900 --net0 virtio,bridge=vmbr0,tag=100   # tagged onto VLAN 100

Because it’s an ordinary KVM guest, the standard PVE firewall applies – per-VM nftables rules work the way they do for any VM, which is the part that actually matters when you’re running untrusted code.

Inside the guest, networking is handled by systemd-networkd rather than cloud-init: DHCP by default (matched on Type=ether, so it survives cloning with no MAC pinning), or a one-line /etc/microvm-static-net for a static address. Earlier versions leaned on cloud-init for this, and I found it too brittle; moving it to systemd-networkd made cloning reliable and I stopped having to debug templates half the time.
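Matching on `Type=ether` rather than a MAC address is what lets a cloned guest come up on the network unchanged. Here is a hypothetical Python sketch that renders the corresponding systemd-networkd unit; the one-line `ADDRESS [GATEWAY [DNS...]]` layout parsed for /etc/microvm-static-net is an assumption, since the actual format isn’t described here.

```python
from __future__ import annotations


def parse_static_net(text: str) -> dict[str, str] | None:
    """Parse an assumed one-line 'ADDRESS [GATEWAY [DNS...]]' file."""
    fields = text.split()
    if not fields:
        return None                      # empty file: fall back to DHCP
    out = {"address": fields[0]}
    if len(fields) > 1:
        out["gateway"] = fields[1]
    if len(fields) > 2:
        out["dns"] = " ".join(fields[2:])
    return out


def render_network_unit(static: dict[str, str] | None = None) -> str:
    """Render a .network file that matches any Ethernet NIC, not a MAC."""
    lines = ["[Match]", "Type=ether", "", "[Network]"]
    if static is None:
        lines.append("DHCP=yes")
    else:
        lines.append(f"Address={static['address']}")
        if "gateway" in static:
            lines.append(f"Gateway={static['gateway']}")
        for dns in static.get("dns", "").split():
            lines.append(f"DNS={dns}")
    return "\n".join(lines) + "\n"
```

The rendered text would land in something like /etc/systemd/network/10-microvm.network (that filename is also an assumption).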

Network isolation, another mainstay of the current hype around agent isolation, is a solved problem I have no interest in re-solving inside the package, because I think Proxmox’s own SDN already does it properly. A simple VLAN zone with a separate VNet per trust domain keeps untrusted guests on their own segment with no path to the LAN, and if I ever bothered with that on a home LAN (well, I have thought about it… but haven’t had the time), a VXLAN zone with a designated exit node would let me funnel egress through a single firewalled choke point.

The microVM just lands on whatever VNet I point net0 at, so the policy lives in Proxmox, not in a one-off ruleset I’d have to babysit.

There’s also a non-networked path for host/guest plumbing: each guest gets a vsock CID (VMID + 1000), which I use for SSH-agent forwarding (host keys into the guest without ever exposing them on the wire), and there’s virtiofs/9p directory sharing, which I added because… I forget. I know I needed it at one point, even if right now most of my microVM instances just do SMB mounts (which were a pain to do under Docker and LXC).
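The CID scheme is simple arithmetic, and the host side of a vsock connection is a plain socket. A minimal Python sketch, assuming a Linux host and a guest service already listening on the chosen port; `vsock_cid` and `connect_guest` are hypothetical names, and the ports used for SSH-agent forwarding aren’t specified here.

```python
import socket


def vsock_cid(vmid: int) -> int:
    """CID scheme from the post: VMID + 1000."""
    cid = vmid + 1000
    # CIDs 0-2 are reserved (hypervisor, local, host), CIDs are 32-bit,
    # and 0xFFFFFFFF is VMADDR_CID_ANY, so none of those can name a guest.
    if not 3 <= cid < 0xFFFFFFFF:
        raise ValueError(f"CID {cid} out of range")
    return cid


def connect_guest(vmid: int, port: int) -> socket.socket:
    """Open a host-to-guest vsock stream (Linux only; the guest must be listening)."""
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.connect((vsock_cid(vmid), port))  # vsock addresses are (cid, port) tuples
    return s
```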

The NIC count is… moderately sane. I currently allow six virtio interfaces per guest (net0-net5) on the spurious grounds that they’re twice the maximum number of physical interfaces across all of my host machines. It’s just defined in MicroVM.pm, and the very few people who need more than that are welcome to tweak it (and if you think you want a dozen you almost certainly want VLANs instead).

Storage and Migration

Storage was… trivial? Every PVE backend works, because the disk is just a virtio-blk-pci device and Proxmox hands it the same path it hands any VM. LVM and LVM-thin, ZFS, Ceph/RBD, NFS and CIFS, plain directory storage – all fine, with snapshots, linked and full clones, vzdump backups and qm importdisk behaving exactly as they do elsewhere. A microvm is a normal PVE guest that happens to boot differently.

Migration is the only place things are iffy: a) I can’t do live migration on any of my hardware (none of it is enterprise grade), and b) true live migration isn’t viable with the current QEMU microvm machine type anyway – it simply doesn’t implement it.

Offline migration (i.e., reshuffling instances around) works fine, though, and because microVMs boot in well under a second, you can run a quick HA relocate cycle (stop, migrate, start), provided your disks live on shared storage.

I’ve measured it at around two seconds moving a small guest between nodes on shared CIFS. Not seamless VM motion, but perhaps good enough for most uses, and as far as I can figure out, ha-manager drives it the same way it would any guest.
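For guests that aren’t HA-managed, the same stop/migrate/start cycle can be driven by hand. A hypothetical Python sketch that only plans the commands without running them, assuming shared storage and a target node name; for HA resources, `ha-manager relocate` performs all three steps itself.

```python
from __future__ import annotations


def relocate_plan(vmid: int, target: str, ha: bool = False) -> list[list[str]]:
    """Return the commands for an offline stop/migrate/start cycle."""
    if ha:
        # relocate stops the service, moves it and restarts it on the target.
        return [["ha-manager", "relocate", f"vm:{vmid}", target]]
    return [
        ["qm", "shutdown", str(vmid)],          # clean stop (agent or ACPI)
        ["qm", "migrate", str(vmid), target],   # offline; on shared storage only the config moves
        # qm only manages guests on the local node, so start via the cluster API.
        ["pvesh", "create", f"/nodes/{target}/qemu/{vmid}/status/start"],
    ]
```

Each entry can then be passed to `subprocess.run(cmd, check=True)` on the source node.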

Vs. LXC: When To Choose What

Mind you, I still run LXC containers. They’re the right choice when:

  • You trust the workload (or it’s your own code)
  • You need zero boot overhead
  • You want direct filesystem access from the host
  • You want slightly less memory allocation overhead (oh, and there’s shared page cache if you’re really lucky)

MicroVMs win when:

  • You need actual kernel isolation – running untrusted code, different kernel versions, nested Docker, anything with aggressive CAP_ requirements
  • You want a reproducible image you can snapshot, back up (vzdump works), offline-migrate, and clone (linked clones too)
  • The workload does something unusual to the kernel (BPF programs, custom netfilter rules, kernel modules) and you don’t want that leaking into your host (which is why most of my tunnels now land in a microvm)
  • You’re running non-Linux guests – NetBSD, Plan 9, FreeBSD-based firewalls – which simply can’t run as LXC
  • You want a VM that’s up before you’ve finished the command.

The overhead difference between LXC and a microVM on modern hardware is surprisingly small. On borg (that i7-12700), the idle memory footprint of a minimal microvm is ~40 MB, and the CPU overhead of the hypervisor is unmeasurable for most workloads.

The memory figure you set at boot is genuinely just a ceiling: as far as I can tell, KVM still only backs the pages a guest actually touches, and the host’s same-page merging dedupes identical pages (kernels, libc, shared base-image layers) across every microVM on the node, so, like on grown-up cloud hypervisors, the real cost tracks the working set, not the nominal allocation.
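To see how much same-page merging is actually saving, the kernel exposes KSM counters in sysfs. A minimal Python sketch using the documented `run` and `pages_sharing` files (`ksm_saved_bytes` is a hypothetical name). On PVE, KSM is usually managed by ksmtuned and may sit idle until host memory use crosses its threshold, so a zero here isn’t necessarily a misconfiguration.

```python
import os
from pathlib import Path


def ksm_saved_bytes(sysfs: str = "/sys/kernel/mm/ksm") -> int:
    """Approximate memory saved by KSM: pages_sharing times the page size."""
    base = Path(sysfs)
    # run is 0 (stopped), 1 (merging) or 2 (unmerging everything).
    if (base / "run").read_text().strip() != "1":
        return 0
    # pages_sharing counts the extra sites sharing already-merged pages,
    # which is the kernel's own measure of how much is being saved.
    sharing = int((base / "pages_sharing").read_text())
    return sharing * os.sysconf("SC_PAGE_SIZE")
```

On a node, `ksm_saved_bytes() / 2**20` gives the saving in MiB.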

Much to my embarrassment, I didn’t even bother with live resizing for a long while, but I added balloon support to the kernel recently, with free-page-reporting and deflate-on-oom. PVE’s balloon target drives proper auto-ballooning, and a virtio-mem device gives genuine fine-grained hot-add and hot-remove, so, again, this works like a normal VM.

The Awkward Parts

Now for the catches…

First one is pretty obvious: I’m patching someone else’s product, and even though I survived the Perl 4 to Perl 5 transition decades ago and I have Codex to help these days, patching the Perl internals is fragile–every qemu-server upgrade can break my setup.

I mitigate this by a) not blindly running upgrades and b) using dpkg triggers (the package re-patches automatically after upgrades), but it’s still a race condition waiting to happen – I’ve had partial PVE upgrades leave the system in a state where pvedaemon couldn’t even compile the patched module.

And then come the weird failure modes–one upgrade had a nasty little side effect regarding root devices:

  • Root is found at root=/dev/vda because the guest boots under the microvm machine type with virtio-blk on PCIe
  • but if a guest ever comes up under the standard chipset instead (a half-applied patch, or an onboot=1 VM starting before the early-boot service has re-patched after a host reboot), the same disk enumerates as /dev/sda, root isn’t found, and the VM looks like it has lost its filesystem.

The data is always there, just at the wrong path. The dpkg trigger and the early-boot service ordering try to prevent that particular twist, and as a backstop the initrd now falls back to /dev/sda when /dev/vda is missing – but until (if ever) Proxmox supports microVMs natively, expect the occasional papercut.
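The fallback logic itself is small. The real initrd is presumably a shell script, so this is a Python sketch of the same decision under stated assumptions: `resolve_root` is a hypothetical name, it only handles `root=/dev/...` and `root=LABEL=...`, and it doesn’t retry or wait for slow devices the way a real initrd should.

```python
from __future__ import annotations

import os
from typing import Callable


def resolve_root(cmdline: str,
                 exists: Callable[[str], bool] = os.path.exists) -> str | None:
    """Pick the root device from root=..., falling back from vdX to sdX."""
    # key=value parameters only; bare flags like 'rw' and 'quiet' are skipped.
    params = dict(p.split("=", 1) for p in cmdline.split() if "=" in p)
    root = params.get("root", "/dev/vda")
    if root.startswith("LABEL="):
        root = "/dev/disk/by-label/" + root[len("LABEL="):]
    candidates = [root]
    # Same disk, different name: virtio-blk gives vda, the standard chipset gives sda.
    if root.startswith("/dev/vd"):
        candidates.append("/dev/sd" + root[len("/dev/vd"):])
    for dev in candidates:
        if exists(dev):
            return dev
    return None
```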

Package version mismatches can cascade. I learned this the hard way – a partial apt upgrade on one of my nodes left libpve-cluster-perl and libpve-network-perl at incompatible versions, which broke all Perl module loading, which meant pvedaemon couldn’t start, which meant VMs couldn’t be managed.

So… always do a full dist-upgrade, and never do partial upgrades on PVE nodes running this.

Another catch (for some) is that there is no VGA and no graphical console – the serial console is your only interface. If something goes wrong during boot, you’re reading kernel panics on a terminal. This is fine for servers, less fine for desktop-oriented guests, and really doesn’t like it.

The next one is that there’s no USB at all – no controllers, no passthrough, nothing on the bus – so I had to get the web UI to hide those options the moment you flip a guest to microvm. This might be a deal-breaker if you want to, say, run a controller (and is why my home automation stuff is still in an LXC).

The kernel is opinionated. My custom 6.12.22 kernel includes exactly what I need and nothing else. If your workload needs a module I haven’t included, you’ll need to rebuild it or use the stock Debian kernel (which works fine but is 3x larger and boots slower).

Finally, GPU and PCI passthrough are disabled, not impossible. And this is another thing I would love people with more money and hardware to really dive into, since I actually had an RTX 3060 passed through and working in early testing.

In the end, I decided to temporarily disable GPU support for simplicity – the package strips hostpci* today and there’s no vIOMMU plumbing wired up, so it’s off by default.

There’s nothing stopping anyone from adding it back (save some QEMU microvm caveats around the minimal ECAM PCIe bus and IOMMU setup), and I’d genuinely love to be able to do proper PCI and vIOMMU testing – I just don’t have the spare hardware for it, and I chose to focus this on running agents rather than chasing accelerator passthrough.

Under The Hood: The Patch Strategy

The package modifies three files in the PVE Perl stack:

  • Machine.pm – extends the machine type regex to accept microvm as a valid value
  • QemuServer.pm – adds a use PVE::QemuServer::MicroVM import and a delegation check at the top of config_to_command: if the VM has machine: microvm, hand off to my module entirely
  • MicroVM.pm – the complete command builder, installed at /usr/share/perl5/PVE/QemuServer/MicroVM.pm

Since I’m obsessive about recoverability, the patches are reversible (pve-microvm-patch revert), and the original files are backed up. The dpkg trigger system (interest-noawait qemu-server) ensures automatic re-application after PVE updates. The early-boot service ensures patches are live before any VM auto-starts, preventing that sda/vda mixup I mentioned above:

pve-microvm-early.service (Before=pvedaemon.service pve-guests.service)
    └── /usr/share/pve-microvm/pve-microvm-patch apply
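The apply/revert cycle is only safe if re-running it is harmless. Here is a Python sketch of that idempotent pattern, not the package’s actual pve-microvm-patch: the marker comment, backup suffix and function names are all hypothetical. The backup is refreshed every time an unpatched file is found, since a qemu-server upgrade will have replaced the original in the meantime.

```python
import shutil
from pathlib import Path

MARKER = "# pve-microvm: delegate"   # hypothetical marker comment
SUFFIX = ".pve-microvm.orig"          # hypothetical backup suffix


def apply_patch(target: Path, anchor: str, snippet: str) -> bool:
    """Insert snippet after the first line containing anchor.

    Returns False if the marker is already present, so it is safe to call
    repeatedly (e.g. from a dpkg trigger and again from an early-boot unit).
    """
    text = target.read_text()
    if MARKER in text:
        return False
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if anchor in line:
            # The file is unpatched here, so refresh the backup: an upgrade
            # may have replaced the original since the last apply.
            shutil.copy2(target, target.with_name(target.name + SUFFIX))
            lines.insert(i + 1, f"{MARKER}\n{snippet}\n")
            target.write_text("".join(lines))
            return True
    raise RuntimeError(f"anchor {anchor!r} not found; upstream file changed?")


def revert_patch(target: Path) -> bool:
    """Restore the pristine copy saved by apply_patch, if any."""
    backup = target.with_name(target.name + SUFFIX)
    if not backup.exists():
        return False
    shutil.move(str(backup), str(target))
    return True
```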

Getting Started

If you’ve read this far, you’re either a brave person or missed the GitHub link, so I’ll just drop the potted version here:

# On any PVE 9.x node:
wget https://github.com/rcarmo/pve-microvm/releases/latest/download/pve-microvm_0.3.12-1_all.deb
dpkg -i pve-microvm_0.3.12-1_all.deb

# Create a Debian microVM template:
pve-microvm-template --vmid 9000 --storage local-lvm --profile standard

# Clone it into a real VM:
qm clone 9000 100 --name my-microvm --full
qm set 100 --machine microvm --memory 1024 --cores 2
qm start 100

The template takes about 60 seconds to build (pulls the OCI image, installs packages, writes the root filesystem). After that, cloning is near-instant if you use linked clones (i.e., drop --full from the qm clone above).

What’s Next

Besides more testing (that I just don’t have the hardware, time or even focus for, since like most of my projects I just want to use the thing), what’s left is mostly polish: maybe a nicer configuration layer, network-off-by-default with egress allow-lists for untrusted guests for the paranoid, GPU passthrough if I ever wire up vIOMMU properly, and eventually AArch64 support for ARM-based PVE nodes (which I’ve run in the past, and I think might also have microvm support in their QEMU versions).

And, of course, someone should probably upstream this. I reached out informally to a couple of people at Proxmox and was told I should join a mailing list to discuss this, but a) that is awfully 90s and b) I just don’t have the time to do more than maintain this for myself. And yeah, I get that this would have to go through the entire enterprise QA pipeline (I ).

The source is at github.com/rcarmo/pve-microvm, the patches are fairly small, it does not break itself since 99% of the features come from QEMU/KVM, so… I’m just literally giving it away.

But if you’re running a homelab, know your way around Linux and want container-speed VMs with actual isolation, this might be what you’ve been looking for since… well, statistically, maybe never–but you can have it now!

Shoehorning... R-Type into the ESP32

This is a very quick follow-up to from a couple of weeks ago, and worth noting for the fun value and a little bit of .

I love old arcade games (especially some NeoGeo titles), so it was only natural that I gravitated to them while I was trying to get Mac color rendering to work on an ESP32–if there’s a piece of software that was extremely attuned to its hardware, it’s arcade games, often written to map directly into hardware.

And I love R-Type in particular, so even though I originally thought of getting Metal Slug to run on the ESP32-S3 because of its shared 68000 heritage with the Mac, I ended up wondering how fast I could make that run.

Turns out the M72 boards Irem did for R-Type ran an 8086-like CPU (the NEC V30, which has a few extensions) and a Z80 in tandem, and that the emulator wasn’t at all hard to recompile if you stubbed out things like audio (which is done by the Z80).

The Output, So Far

I decided to start with the hardest/smallest target (the plain CYD with a plain ESP32), which can barely run the emulator in one core and has almost no free RAM–to the point where after a few iterations it was rendering something, but clearly wouldn’t make it without rebuilding the whole emulator from scratch.

Getting it to render frames effectively (as in, rendering one frame without any visible stutters inside the frame), is exactly the kind of problem I am having on the Mac emulator because a) you typically need enough RAM to manage the framebuffer and b) all ESP CYD displays have limitations regarding display (typically SPI) bandwidth.
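A quick back-of-the-envelope calculation shows why bandwidth bites. Assuming a 320x240 panel at 16 bits per pixel on a 40 MHz SPI clock (assumed figures; check your board), and ignoring command and DMA overhead entirely:

```python
def spi_max_fps(width: int, height: int, bpp: int, spi_hz: float) -> float:
    """Upper bound on full-frame refreshes per second over a 1-bit SPI link."""
    return spi_hz / (width * height * bpp)


def framebuffer_bytes(width: int, height: int, bpp: int) -> int:
    """RAM needed to hold one full frame."""
    return width * height * bpp // 8


# Assumed: 320x240 RGB565 panel, 40 MHz SPI clock.
print(round(spi_max_fps(320, 240, 16, 40e6), 1))   # 32.6
print(framebuffer_bytes(320, 240, 16))             # 153600
```

A full frame also wants about 150 KB of RAM, which is why rendering in strips or scanlines is one common escape hatch on the smaller chips.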

For a little bit of inside baseball (yeah, I’ve been spending time with US folk again), the real hassle (especially on the smaller ESP32) was handling memory maps, palette RAM, tile/sprite priority, and frame timing. You can finagle things a bit by reassigning one of the cores to “just” do rendering, and there are various DMA modes depending on the chipset, but all of that proved enough of a distraction for me to upgrade to an S3-powered display as soon as I could.
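Palette lookup and sprite priority can be handled one scanline at a time, which also sidesteps the framebuffer problem. This is a generic Python sketch, not the M72 emulator’s logic: the transparency rule (index 0), the single “behind background” priority flag and the big-endian RGB565 output are all assumptions for illustration.

```python
from __future__ import annotations


def rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit channels into a 16-bit RGB565 value."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


def compose_line(bg: list[int], sprites: list[tuple[int, list[int], bool]],
                 palette: list[int]) -> bytes:
    """Compose one scanline of palette indices into big-endian RGB565 bytes.

    sprites are (x, indices, behind) tuples; index 0 is transparent, and
    'behind' sprites only show where the background pixel is transparent.
    """
    line = list(bg)
    # Draw behind-background sprites first so front sprites always win.
    for x0, pixels, behind in sorted(sprites, key=lambda s: not s[2]):
        for dx, idx in enumerate(pixels):
            x = x0 + dx
            if idx == 0 or not 0 <= x < len(line):
                continue
            if behind and bg[x] != 0:
                continue
            line[x] = idx
    out = bytearray()
    for idx in line:
        out += palette[idx].to_bytes(2, "big")  # many SPI panels expect MSB first
    return bytes(out)
```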

So I just focused on clean frame renders, even if the time required to produce them made it feel like a slideshow; in fact, once I figured out the backgrounds were a static texture composited behind the main sprites, I decided to skip them altogether.

It would have been amazing to see it running on the smaller one, though.

Then I got piclaw to port the entire thing to the ESP32-S3, and all of a sudden there was enough horsepower to run and render at around 50fps:

Both boards, starting from the same emulator state but rendering as fast as they can

I’m so happy with the results that I am considering getting this to run on an ESP32-P4 and seeing what we can do about audio and using its USB host port for a controller, but I really should focus on backporting the rendering techniques into a Mac emulator…

Either way, this was a great way to refine my approach to tackling long, grinding, intricate problems, and the code is up on GitHub if anyone cares to check it out.

The Method

However, before handing it over to agents, I had to specify how to do this, and right now, after half a dozen embedded development and hardware porting projects since Christmas, the strategy is pretty well established:

  • Get something to run on a host harness, running VNC, plain SDL or just framebuffer dumps
  • Derive milestones from that (still quite manual) job. Maybe even more harnesses (like target CPU opcode harnesses for JITs, sprite subroutines, etc.)
  • Tackle the first few milestones on a simpler (but also more limited) hardware/software target
  • Build reusable debugging/introspection tools for each milestone that the agents can use later to have a feedback loop
  • Expand out from the above.

That’s why my first hack for these things is just to point a webcam at the display (or generate a frame, or a known good end-to-end output dump) and get them to render a test pattern:
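Framebuffer dumps from the host harness give an even tighter signal than the webcam: an exact pixel diff against a known-good frame. A hypothetical Python sketch (the function names are illustrative) that generates color bars as RGB565 values and reports how many pixels differ and where the first mismatch is:

```python
from __future__ import annotations


def color_bars(width: int, height: int, colors: list[int]) -> list[list[int]]:
    """Vertical color bars, one equal-width band per color."""
    band = max(1, width // len(colors))
    row = [colors[min(x // band, len(colors) - 1)] for x in range(width)]
    return [list(row) for _ in range(height)]


def diff_frames(a: list[list[int]], b: list[list[int]]) -> tuple[int, tuple[int, int] | None]:
    """Count mismatched pixels and report the first one as (x, y)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("frame sizes differ")
    mismatches, first = 0, None
    for y, (ra, rb) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(ra, rb)):
            if pa != pb:
                mismatches += 1
                if first is None:
                    first = (x, y)
    return mismatches, first
```

A count plus a coordinate is something an agent can act on directly, whereas a photo still has to be interpreted.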

The M5Stack Tab 5, the highest-end ESP32 device I have, showing a test pattern

From then on, the agents can use the camera and other test patterns to verify that they are rendering correctly (of course it’s useless for video, but any SOTA model these days can take useful feedback from images), and, as a bonus, I get their snapshots on the piclaw web interface and can verify that they are actually doing what I want them to do.

The Harness

I already knew what I wanted to achieve (in short, to explore and document techniques to render fast graphics on these boards), and I had a camera pointing at the target devices, but one of the things I wanted to explore with this setup was how to mitigate long-context problems:

  • Even if you use things like /goal (which I do, but with bounded horizons), models will inevitably deviate from the actual goal
  • As context piles up, they will also inevitably hyper-focus on tangentially relevant issues (because they see code issues and zero in on those rather than take a broader view of what needs to be achieved)
  • Backing out of dead ends to reassess better approaches becomes nearly impossible

What I did was very simple. piclaw allows me to easily have multiple sessions running, and a few weeks ago I implemented a chat tool, with hilarious results:

Two piclaw agent sessions chatting with each other

…plus “agents” or sessions also have the ability to introspect each other’s state (goals, messages, current activity, compaction status, etc.) and schedule themselves, so setting up an @auditor / overseer that can keep track of other agents is trivial–all I needed was to write a SKILL.md file that told the auditor to:

  • Observe commits, logs, tests, and artifacts; judge progress from concrete evidence towards the set goal, not just sessions being generally “active” but treading water.
  • Enforce strict, reproducible completion gates (no interpreter fallbacks, ROM/global seeding, scanner bypasses, or synthetic shortcuts like skipping steps or faking code).
  • Nudge active sessions once with a concrete, evidence-backed step, a measurable success signal, and any corrections to make.
  • Require commit/push hygiene with a quality bar for commit messages
  • Never edit target-session code or implement fixes, keeping itself to steering only via chat and audit log entries.
  • Escalate from steering to actual interruptions only after repeated ignored guidance
  • Keep a running log with a summary of what was done every cycle (state, output/structural/strategy/steering aspect) and write out a neat Markdown template in the web UI
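The SKILL.md itself is prose for the model, but the escalation rule it describes (judge on evidence, nudge, interrupt only after guidance is repeatedly ignored) is essentially a small state machine. A hypothetical Python sketch of that rule; the class, the thresholds and the evidence signal (a new commit plus passing tests) are illustrative assumptions, not piclaw’s implementation:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SessionAudit:
    """Audit state for one monitored session (thresholds are illustrative)."""
    name: str
    max_ignored: int = 2          # nudges allowed before interrupting
    last_commit: str | None = None
    ignored: int = 0
    log: list[str] = field(default_factory=list)

    def cycle(self, head_commit: str, tests_passed: bool) -> str:
        """Decide one audit cycle's action from concrete evidence, not activity."""
        progressed = head_commit != self.last_commit and tests_passed
        self.last_commit = head_commit
        if progressed:
            self.ignored = 0
            action = "observe"
        elif self.ignored < self.max_ignored:
            self.ignored += 1
            action = "nudge"
        else:
            action = "interrupt"
        status = "ok" if tests_passed else "fail"
        self.log.append(f"{self.name}: {head_commit[:7]} tests={status} -> {action}")
        return action
```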

I gave that file to Opus 4.8 (I definitely still don’t trust Opus to write code, but I did want a different, complementary model steering Codex 5.5), told it which sessions to monitor, and let it go on its merry way.

For this particular case, I did have to intervene once or twice to highlight rendering and palette issues (which I can do in piclaw’s web interface on my iPad), but that was it:

Highlighting rendering and palette issues

And I think this approach has legs–I’m now using it to grind through the porting/testing/quality aspects of other things I’m doing, and will eventually try it with local models (if I ever to run them).

Deliberately Out Of Scope

Note that I did not want to create a fancy multi-agent system where every agent talks to each other: I wanted to have long-term oversight and steering.

And this is not a delegation pattern either (there is also a delegate plug-in that allows each session to delegate chores to simpler models).

I deliberately chose this approach because, in general, I’ve found multi-agent systems with “party lines” and loose couplings to be a complete waste of tokens unless there is a clear hierarchy and very well scoped outcomes–just like in a human team, really…

Notes for June 7-14

Another week, another set of bank holidays that I tried to leverage strategically to do interesting things with my time, and… I ended up throwing out my back and having to sit very still for hours at a time, which made the whole thing feel like a waste of paid vacation with extra ibuprofen.

The upside, if I can call it that, is that sitting still is reasonably compatible with finishing TV shows, staring at logs, profiling traces and dealing with broken model outputs for hours on end. Which sort of explains my notes for this week…

Local Models, Local Pain

I am a bit fed up with . Not in the usual performative sense, but because models are still not that smart, the tooling around them is uneven, and I . The week was a slow burn of getting go-pherence to become more than a bunch of random matmuls, which meant pushing Ideogram 4 far enough to make cat pictures and then immediately running into the limits of my RTX 3060.

A small generated cat image from Ideogram 4 running locally
Of course I used AI to generate a cat picture, this is the Internet!

There is no way I can scale this out to do more than 256x256 low-quality pictures on that card without a lot of pain and slow iteration, and the iteration is the problem. The code can be made to run, but the gap between generating a one-off cat picture and being able to use it routinely (and at acceptable performance) is just not worth it with the gear I have.

I am very seriously considering getting an NVIDIA GB10 or a Ryzen AI device, which seem like the bare minimum hardware to do even half-assed local inference.

I also spent an outrageously unproductive amount of “learning” time on shoehorning DiffusionGemma into go-pherence, on both the and the RTX 3060 via mmap tricks, GPU expert caching, sparse self-conditioning and all the stupid details that decide whether an inference run takes minutes or merely feels like it does. Some of it worked surprisingly well (I got coherent answers), but none of my hardware is good enough for useful answers.

Regardless, the more I revisit AI-assisted projects from a few weeks ago, the more time I spend auditing whether the code matches the written SPEC.md rather than adding anything new. Code quality has been mostly OK in projects where I have my usual vetting and testing pipeline in place, but the common thread in the ones where I don’t is increasingly obvious: they were Anthropic-heavy. Opus keeps being very fluent about what it claims was implemented and very wrong about what is actually there. Go figure.

A Tailscale Rat

I picked up a temp work laptop early in the week (a Snapdragon X Plus machine), and although it is still early days I was impressed enough with the hardware and battery life to hack together womprat so I could get at my personal machines from it without installing anything of consequence.

Since this is a loaner and I mostly live inside AVD anyway, the interesting bit was making something small, disposable and ARM-friendly. Me being me, I used , built it on Linux, and glued together a browser, SSH client and remote-display shell on top of tsnet, WebView2, RDP and VNC bits. It is a pretty great combination, when it works, but I ended up having to wire up a Linux WebKitGTK test shell to have reproducible debugging.

More Agents

piclaw is still the thing I use to fix other things, so I kept poking at it even if I am a bit tired of the constant upstream churn from pi and the associated paper cuts that come with maintaining a TypeScript application of its complexity. gi is now able to bootstrap itself, but it is not yet a replacement I can trust (I used Opus 4.8 on it and am still paying off that technical debt), so I’ve actually been considering shifting to Codex for most things and using pi solely through IPC mode, which would mean going back, full circle, to vibes.

Emulation

I tried (and failed) to enjoy some retro gaming this week (even though I did get a bit of a kick out of further automating my Steam setup), so to compensate I took another pass at my NeXT and Mac JIT emulators, partly because I realised that (you guessed it) Opus lied and failed to implement MMU and I/O emulation correctly across the board.

Hardware and 3D Printing

I have another Radxa board to test, and this time I decided to have a go at photogrammetry to capture enough of its relative dimensions to design a 3D printed case for it. Besides finding the App Store crammed with scammy “3D scanner” apps that do little more than repackage SimpleObjectCapture (which, incidentally, you can now build for yourself in half an hour using Codex), I also confirmed that iOS Object Capture is not really that great for fine detail, at least with the default settings:

The yet untested Q8B

I suspect I will be getting back to CAD and 3D printing pretty intensely over the next few months (or whenever I can actually move around). My back is still complaining, but at least I have an entire work week of… more sitting to… look forward(?) to, starting tomorrow.

Shoehorning Flying Toasters into an ESP32-S3

This is the (very) abridged story of how I got After Dark running on my own flavour of the –specifically, Flying Toasters on an ESP32-S3 board, zooming along at 65 FPS, which is both completely pointless and one of the more satisfying things I’ve done this month.

But Why?

Because.

A Mac… For Ants

When I first got wind of the , I immediately dug out one of my Cheap Yellow Displays and tried to get the software running on it, only to find out two things:

  • My Cheap Yellow Display was (predictably) different (same 240×320, but an ESP32-D0WD)
  • The resistive touch screen mine had, together with the relatively small size, made it unusable in practice

I mean, it ran, but… Here, you be the judge:

Yes, that is a coin cell, and this is a bit contrived of an example

The photo above was me trying to push the envelope a bit.

The truth is that even with proper 1:1 pixel scaling, portrait rendering and some creative interpretations of how to push faster screen updates through the SPI bus, and even considering the Mac Plus emulation and basic Wi-Fi access worked really well (because of the genius trick of exposing ESP32 hardware through to the emulator), it was painfully slow.

Moar Pixels

I did the obvious thing and ordered a couple of larger displays with a slightly more powerful chip and proper capacitive screens:

Size comparison, still debugging display rendering

And the results were glorious. The new boards are labelled ESP32-8048S043C, and they come with an 800×480 capacitive panel, an ESP32-S3 and 8MB of PSRAM, which is more than enough to run the emulator and a more complete version of Mac OS. They run the original Musashi-based umac emulator at a fairly good speed (certainly faster than the original Mac Plus or even the Classic) and at full panel resolution (480×800) in black and white:

This reminded me a lot of my years doing PageMaker/print design

System Shenanigans

However, the original Cydintosh image was ancient. The emulator worked OK, but I needed to do a bunch of upgrades to the firmware on my personal fork:

  • Rewrote the display layer to support multiple board profiles, with the S3 target running a 480×800 framebuffer rotated to landscape
  • Bumped emulated Mac RAM to 1MB (the original used 128KB, barely enough for System 3.2)
  • Fixed portrait/landscape orientation, touch mapping, and the GT911 capacitive driver
  • Added an MQTT layer so the thing can actually do smart-home control (the original project’s raison d’être)
  • Patched umac to suppress Sony eject requests so disk mounting worked properly
  • Built an impromptu web flasher and a serial log capture tool, because flashing ESP32-S3 devices is its own special kind of hell

Getting there took a few weeks of evenings and a bit of creative hacking.
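For the curious, the core of a display layer like that is expanding the emulated Mac’s monochrome framebuffer into panel pixels. This is not the firmware’s actual code, just a hypothetical Go sketch of the idea: it assumes umac hands over a packed 1bpp buffer (most significant bit first, set bit = black, as on the original Mac) and that the panel wants RGB565 rotated 90° clockwise. The function and constant names are mine.

```go
package main

// RGB565 values for the two colours a 1bpp Mac screen can show.
const (
	rgb565Black uint16 = 0x0000
	rgb565White uint16 = 0xFFFF
)

// blitRotateCW expands a w×h 1bpp framebuffer (stride = ceil(w/8) bytes per row,
// MSB first, set bit = black) into an RGB565 buffer rotated 90° clockwise.
// The destination is h pixels wide and w pixels tall, so len(dst) must be w*h.
func blitRotateCW(src []byte, w, h int, dst []uint16) {
	stride := (w + 7) / 8
	for y := 0; y < h; y++ {
		row := src[y*stride : (y+1)*stride]
		for x := 0; x < w; x++ {
			px := rgb565White
			if row[x/8]&(0x80>>(x%8)) != 0 {
				px = rgb565Black
			}
			// Source (x, y) lands at column h-1-y, row x of the rotated image.
			dst[x*h+(h-1-y)] = px
		}
	}
}
```

Touch coordinates need the inverse of the same mapping, so the two have to change together.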

Registration? We Don’t Need No… Oh

Everything went pretty swimmingly until I actually injected After Dark into the System folder:

Guess what, no keyboard. At all.

Without any way to input text (and no, Key Caps doesn’t really work for this), I had to resort to removing bits from the control panel with ResEdit until it mostly worked (removing the dialog was not trivial, and in the end I asked Codex to just disassemble the INIT(?) resource and skip over the dialog).

A few minor firmware tweaks later, I have a “real” Mac that runs After Dark as a screensaver perfectly:

Flying Toasters running on the ESP32-S3

In case you’re wondering, the toasters fly at a solid 65-67 FPS.

That’s faster than they ran on my actual Macintosh SE/30 back in 1991, because that machine was doing it in 1-bit black and white on a 68030, and this is a $15 display board with a 16-bit colour panel and a dual-core 240MHz chip with more RAM than my first three computers combined.

Not too shabby, even if it’s in black and white (which is what the original Mac ROM can handle).

This is Clearly Worth Overdoing

And that is why I have gone down the rabbit hole of trying to port this to an LC ROM, on yet another ESP32 board (a P4). It turns out there’s not enough RAM on the other boards to hold a colour frame buffer, not enough bandwidth to do the delta refresh hacks I did in the original version, and not enough CPU power and storage for the required emulation and system changes…

I’m still grinding through the mechanics of paring down BasiliskII to fit, but in the meantime this board is sitting on my desk doing nothing but rendering endless toasters and, every time I glance at it, reminding me that some things are worth doing for the sheer fun of it.

The MilkV Jupiter 2/SpacemiT K3

This is a fascinating box: after almost three weeks of playing with it, I had amassed so much material that I nearly split my review into two parts. In the end I decided to condense it a bit and post a longer piece than usual, even if that means almost half of it is a fairly wide-ranging exploration of how to get AI workloads running on it.

The MilkV Jupiter 2 in its metal case

Spoiler: We’re tantalizingly close to having usable non-GPU inference on SBCs, and surprisingly enough, RISC-V is more interesting than ARM right now.

I’ve tested a lot of ARM boards, but only a couple of RISC-V machines, and the MilkV Jupiter 2 is quite a substantial system: sixteen cores (with a twist), a refreshingly roomy 32GB of RAM, a 10GbE SFP, Wi-Fi 6, a GPU with actual DRM nodes, all in a Pico ITX form factor.

Disclaimer: my contacts at Radxa supplied me with a Jupiter 2 free of charge, and as usual, this article follows my .

On paper, this is the first RISC-V board that doesn’t feel like a science project.

In person, and unlike most of the SBCs I get, the Jupiter 2 is a finished product, and came in a neat little box, fully assembled and contained in an unassuming metal case with external antennae as the only extra parts. No power brick, but since it has a USB-C PD port, I had zero trouble powering it from one of my monitors.

Hardware

After some careful disassembly, I found the board itself to be pretty dense: 1× DP out, 1× eDP ribbon, 1× USB-C PD power input, 3× USB-A 3.0, 1× GbE RJ-45, 1× 10GbE SFP+ cage, an M.2 slot and what looks like a second M.2 for storage. There are also MIPI/eDP ribbon connectors I haven’t tested.

The board is dwarfed on the top side by the cooler, which I dared not remove

The SoC is SpacemiT’s K3–a big.LITTLE style arrangement with 8×A100 cores at 2GHz and 8×X100 cores at 2.4GHz, which makes it the first RISC-V chip I’ve handled that has asymmetric core clusters. And since there are a few other devices out there with the same reference design, I will henceforth refer to the Jupiter as the K3 for short.

Specs

The machine I’m testing has a nice assortment of features:

  • 16 RISC-V cores (8× Spacemit A100 + 8× Spacemit X100)
  • 32GB RAM
  • 128GB UFS
  • RTL8852BE Wi-Fi 6 + Bluetooth
  • 1 GbE RJ-45 + 10 GbE SFP (RTL8127 10GbE via PCIe)
  • An IMG (PowerVR) GPU
  • NOR flash for bootloader (SPI, 8MB: bootinfo + FSBL + env + eSOS + OpenSBI + U-Boot)
  • PWM fan
  • Pico ITX form factor

The ISA

If you’ve never come across SpacemiT’s stuff before (I had only a bare inkling of the K1), I heartily recommend the public SpacemiT K3 documentation and their GitHub repository, since the architecture is laid out there and it was fairly easy to get a high-level grasp of it. In particular, the K3 SoC datasheet has a pretty good overview:

Block Diagram from the K3 Technical Brief

A key thing to keep in mind is that the A100 cores are fundamentally different from the X100 ones: they have extended vector instruction sets, dedicated tightly coupled memory, and, well… AI.

That documentation also seems to be the original source of the marketing claims that the K3 provides 60 TOPS of AI compute and can run 30B models at over 10 tokens/s. Well, sort of: as another spoiler, I can share that I hit what seemed to be a practical cap at an effective 3B, but we’ll get there…

Hardware Info

The board identifies itself as “SpacemiT K3 Pico ITX” in the device tree, and cores are reported like so:

Architecture:                            riscv64
Byte Order:                              Little Endian
CPU(s):                                  16
Vendor ID:                               0x710
Model name:                              Spacemit(R) A100
  Thread(s) per core:                    1
  Core(s) per socket:                    8
  CPU max MHz:                           2000.0000
  CPU min MHz:                           614.4000
Model name:                              Spacemit(R) X100
  Thread(s) per core:                    1
  Core(s) per socket:                    8
  CPU max MHz:                           2400.0000
  CPU min MHz:                           614.4000
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                10 MiB (4 instances)

One of the nice things about this box is that it comes with a 10GbE Realtek NIC. I wasn’t able to test that at full speed yet since my 10GbE interfaces are all in my server closet, but the 802.11ax controller listed below worked flawlessly with my Wi-Fi 6 setup:

# lspci
0000:00:00.0 PCI bridge: SpacemiT X100 PCIe Root Complex (rev 01)
0002:00:00.0 PCI bridge: SpacemiT X100 PCIe Root Complex (rev 01)
0002:01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8127 10GbE Controller (rev 08)
0004:00:00.0 PCI bridge: SpacemiT X100 PCIe Root Complex (rev 01)
0004:01:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8852BE PCIe 802.11ax Wireless Network Controller

There isn’t a lot to report on the USB front (most of the below is what is plugged into my LG Ultrafine):

# lsusb
Bus 005 Device 002: ID 043e:9a46 LG Electronics USA, Inc. USB2.1 Hub
Bus 005 Device 003: ID 043e:9a48 LG Electronics USA, Inc.
Bus 005 Device 004: ID 043e:9a42 LG Electronics USA, Inc. USB Audio
Bus 005 Device 009: ID 046d:085e Logitech, Inc. BRIO Ultra HD Webcam
Bus 005 Device 010: ID 043e:9a40 LG Electronics USA, Inc. USB Controls
Bus 007 Device 004: ID 04d9:0006 Holtek Semiconductor, Inc. Wired Keyboard
Bus 007 Device 005: ID 093a:2510 Pixart Imaging, Inc. Optical Mouse

The flash storage it ships with is also sensibly organized:

# lsblk
NAME        SIZE TYPE MOUNTPOINT MODEL
sda       119.3G disk            TY7B-128
├─sda1      256M part
├─sda2      256M part /boot
└─sda3    118.8G part /
mtdblock0     8M disk
mtdblock1   128K disk
mtdblock2   512K disk
mtdblock3    64K disk
mtdblock4     1M disk
mtdblock5   384K disk
mtdblock6   5.9M disk

That sda (model TY7B-128) initially fooled me into thinking it was a SATA SSD–but there’s no SATA controller on this board, and the 3.4 GB/s reads I measured later are well past anything SATA III can do (~600 MB/s). It’s actually 128GB of onboard UFS, which rides the kernel’s SCSI layer and so enumerates as sda exactly like a SATA disk would (NVMe would be nvme0n1, eMMC mmcblk*). The mtdblock devices are the 8 MB NOR flash partitions (bootinfo, FSBL, env, eSOS, OpenSBI, U-Boot).

# sensors
pwmfan-isa-0000
Adapter: ISA adapter
pwm1: 60% MANUAL CONTROL

thermal_cluster3-virtual-0
Adapter: Virtual device
temp1: +60.0°C

thermal_cluster1-virtual-0
Adapter: Virtual device
temp1: +60.0°C

thermal_gpu-virtual-0
Adapter: Virtual device
temp1: +63.0°C

thermal_top-virtual-0
Adapter: Virtual device
temp1: +62.0°C

cros_ec-isa-000c
Adapter: ISA adapter
fan1: 3208 RPM

thermal_cluster2-virtual-0
Adapter: Virtual device
temp1: +63.0°C

thermal_cluster0-virtual-0
Adapter: Virtual device
temp1: +64.0°C

thermal_vpu-virtual-0
Adapter: Virtual device
temp1: +60.0°C

The sensors output is a bit weird, but it does cover all the CPU cores (A100 are clusters 0 and 1, X100 are 2 and 3). And I will have a bit more to say about the fan.

But I’m getting ahead of myself here. These were gathered after plugging it in, obviously, so it’s worth rewinding and going over that part:

First Boot

This was a first-class experience, and I wish all SBCs worked this way: I plugged the DP port into my ancient LG Ultrafine, powered on the monitor, and got a Bianbu first-boot wizard in less than 5 seconds after the initial logo.

Clicked through it–language, timezone, user account–and landed on a working accelerated desktop. That’s it. No GRUB patching, no DTB hunting, no resize-filesystem bugs, no serial console required. The smoothest first boot I’ve had with an SBC all year.

The board ships with Bianbu 4.0 (“Resolute Raccoon”)–a Debian-based distribution from SpacemiT, which, unlike most ARM boards I’ve used recently, is actually running a modern 6.18.3 kernel.

MilkV Jupiter 2 LXQt on Wayland - note how only the first 8 cores are active

The desktop runs LXQt on Wayland, with SDDM as the display manager, and the whole thing felt responsive enough that I didn’t immediately reach for the terminal. That is not something I say about SBC desktops often, and even though I then spent most of the past three weeks accessing it via ssh, I would likely have zero issues using it as a desktop.

Standard apt works (repos seem to be at spacemit.com), the Debian toolchain is present, and the kernel command line includes some interesting RISC-V-specific hints: unaligned_scalar_speed=fast and unaligned_vector_speed=fast, which tell the kernel that unaligned scalar and vector (RVV) memory accesses are fast on this chip, so it can skip the unaligned access speed probes it would otherwise run at boot.
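If you want to check what your own board booted with, those flags live in /proc/cmdline. A minimal Go sketch for pulling them out (the helper name is hypothetical, and it deliberately ignores the quoted values the kernel also accepts):

```go
package main

import "strings"

// cmdlineParams splits a kernel command line (the contents of /proc/cmdline)
// into key/value pairs. Bare flags such as "rw" map to "". Values are split
// on the first "=" only, so "root=PARTUUID=..." keeps its full value.
// Quoted values containing spaces are not handled in this sketch.
func cmdlineParams(cmdline string) map[string]string {
	params := make(map[string]string)
	for _, field := range strings.Fields(cmdline) {
		key, value, _ := strings.Cut(field, "=")
		params[key] = value
	}
	return params
}
```

On the board you would feed it the result of os.ReadFile("/proc/cmdline") and look up unaligned_scalar_speed and unaligned_vector_speed.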

I dug around a bit more and found that the boot chain goes through NOR flash (OpenSBI + U-Boot) → UFS, which is cleaner than the SD-card-based setups on most SBCs I’ve tested, and it was able to update itself without any issues:

Setting up spacemit-ec-firmware (1:0.0.22)
spacemit-ec-firmware: payload installed successfully.
Current EC firmware 'SPACEMIT_PICO_ITX-V00.14' is older than packaged firmware 'SPACEMIT_PICO_ITX-V00.16'.
Starting automatic EC firmware update during package installation...

[INFO] Automatic EC firmware update triggered during package installation.
[INFO] Current RW: SPACEMIT_PICO_ITX-V00.14
[INFO] Target FW : SPACEMIT_PICO_ITX-V00.16
[WARN] Do not remove power, reset the system, or interrupt the tool while flashing.
[INFO] 1. Erasing flash region... Erasing 262144 bytes at offset 0... done.
[ OK ] Erase completed.
[INFO] 2. Writing firmware image... Reading 242688 bytes from /lib/firmware/k3-pico-itx/ec.bin... Writing to offset 0... Writing: [########################################] 100% done.
[ OK ] Write completed.
[INFO] 3. Reading back flash contents for MD5 verification... Reading 242688 bytes at offset 0... Reading: [########################################] 100% done.
[ OK ] MD5 verification passed.
[ OK ] Automatic EC firmware update finished successfully.
[INFO] This reboots the EC firmware only. Linux is not rebooted automatically.
[WARN] The power LED may blink and ectool may be unavailable briefly after the command.
[INFO] Sending EC reboot command...
[ OK ] EC reboot command sent.
[INFO] Waiting 10 seconds for EC reboot to settle...
[WARN] Reboot Linux manually now to restore EC communication.
[INFO] After Linux reboots, verify with: ectool version
Automatic EC firmware update completed.

Not UEFI, but compared to the U-Boot-on-SD-card experience that most ARM SBCs inflict on you, having a proper NOR flash boot chain with OpenSBI → U-Boot → onboard UFS is a step up, because it means you can brick the OS partition and still recover without reflashing an SD card on another machine (and yes, Rockchip, I’m looking at you).

And since it all worked out of the box, I did not try adding an NVMe drive (there’s an M.2 M-Key slot for one) or booting from it (yet), although since there is official Ubuntu support I fully intend to try that out in the future.

Toolchains

Developer tooling for RISC-V will be foremost on most of my readers’ minds, so I can tell you right away that I am currently making extensive use of these:

  • GCC 15.2 (riscv64)
  • Go 1.25.7 – works out of the box, which is significant for me
  • Python 3.14.3
  • Make 4.4.1

Sadly (for me), Bun isn’t an option, since there’s no official riscv64 build yet, but node works OK. I focused mostly on , though.

Performance

To get started, I ran a small battery of tests to get a feel for where this sits relative to the (CIX P1, 12 ARM cores) I’ve been .

CPU

| Test | MilkV Jupiter 2 (16× RISC-V) | Orange Pi 6 Plus (12× ARM) | Notes |
|---|---|---|---|
| 7-Zip multi-thread | 17,547 MIPS | 42,346 MIPS | ARM is 2.4× total |
| sysbench CPU (1 thread) | 2,329 ev/s | 2,800 ev/s | ARM 1.2× per core |
| sysbench CPU (all cores) | 16,980 ev/s (8 usable cores) | 25,746 ev/s (12 threads) | 7.3× vs 9.2× scaling |
| fib(42), GCC -O2 | 1.110 s | 0.649 s | ARM 1.7× faster |
| Go 50M trig ops | 2.68 s | 0.483 s | ARM 5.5× (Go arm64 is more mature) |
| Python 10M loop | 4.74 s | 1.07 s | ARM 4.4× |

Note that these benchmarks only ran on the X100 cluster (cores 0–7). The A100 cores (8–15) are kernel-fenced for AI work–htop shows them sitting idle, and sched_setaffinity silently refuses to pin anything there from a normal shell. The reasons for that are various and fascinating, and I’ll get into them below.

The sysbench single-thread number is the interesting one here: 2,329 versus 2,800. That’s only a 1.2× gap per X100 core. The 7-Zip figures (17.5k vs 42.3k MIPS) look damning until you realize that the A100 cores weren’t used at all, so the Jupiter 2 is really running 8 general-purpose threads against the P1’s 12.

The real gap shows up in Go and Python (5.5× and 4.4× respectively), which probably says more about how young the riscv64 runtime backends are than about the hardware itself.
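To put a number on the “eight against twelve” point, the normalisation is just the total score divided by the threads that produced it. A trivial Go helper, assuming the Jupiter 2 results came from its 8 X100 threads and the P1 results from 12 (the function names are mine):

```go
package main

// perThread divides an aggregate benchmark score by the number of threads behind it.
func perThread(total float64, threads int) float64 {
	return total / float64(threads)
}

// perThreadRatio returns how many times higher machine B's per-thread score is
// than machine A's, for "higher is better" metrics such as MIPS or events/s.
func perThreadRatio(totalA float64, threadsA int, totalB float64, threadsB int) float64 {
	return perThread(totalB, threadsB) / perThread(totalA, threadsA)
}
```

With the table’s figures, perThreadRatio(17547, 8, 42346, 12) comes out at roughly 1.6 and the all-core sysbench pair, perThreadRatio(16980, 8, 25746, 12), at roughly 1.01, so a good part of the aggregate gap in those two tests comes down to thread count.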

Memory Bandwidth

| Test | MilkV Jupiter 2 | Orange Pi 6 Plus (A720 best) |
|---|---|---|
| sysbench memory read | 3,051 MiB/s | 15-17 GB/s (libc memcpy) |
| sysbench memory write | 2,694 MiB/s | 35-47 GB/s (memset) |

I went back and ran this in parallel on the CIX P1, and the K3’s memory bandwidth is much lower: roughly a fifth for reads. This is likely the biggest single performance gap, and it puts a hard cap on whatever the CPU can do regardless of how much it packs into each cycle. For inference workloads that are memory-bound, this matters a lot. The K3 has a few workarounds, though, as we’ll see later.

Storage

| Test | MilkV Jupiter 2 |
|---|---|
| Sequential write | 1.2 GB/s |
| Sequential read | 3.4 GB/s |
| 4K random write | 113 MB/s (~28K IOPS) |

The built-in UFS storage is very nice–NVMe-class speeds, better than what I saw on the Orange Pi 6 Plus’s NVMe setup with my own (underused) PCIe 4 SSD. No complaints here.

Thermals under Load

The board stays well-behaved under sustained 8-core stress-ng:

  • Idle: 59-64°C, fan at 45% / 2335 RPM
  • Full load (30s sustained): 62-68°C, fan ramps to 60% / 3194 RPM
  • No throttling observed, which made my usual CPU/thermal charts kind of pointless

Again, stress-ng --cpu 0 ran on the 8 available X100 cores, but even when I ran both CPU and AI loads that used the A100 cores, the fan was audible but not objectionable–noticeably quieter than the Orange Pi 6 Plus’s cix-ec-fan in quiet mode, and the fan controller API is much saner.

Since I had a few tussles with the Orange Pi 6 Plus’s fan controller limitations, I let an LLM loose on /sys/devices, and it found that the Jupiter’s fan is managed by a CrosEC controller over eSPI (/sys/devices/platform/soc/cac8c000.espi/84000000.ec). That exposes a standard hwmon interface with fan1_input and (surprisingly) fan1_fault that standard Linux utilities can read (and the built-in cooler does seem to have the right number of wires to provide fan sensing, which is a nice touch).

There’s also a separate pwm-fan platform device at /sys/devices/platform/pwm-fan/hwmon/hwmon8/pwm1 that accepts values 0-255 for direct duty-cycle control; pwm1_enable reads 1 when thermal management is active, and a pwm-fan cooling device is linked to thermal_zone0. In practice, you never need to touch any of this: the board keeps itself at 60-68°C under sustained load with the fan barely audible, even when using all 16 cores and at an ambient temperature of nearly 28°C in my office.

Power Consumption

I stuck a USB PD power monitor between the PSU and the K3, and the figures were pretty stable: 11W idle, an oddly symmetrical 22W under load. I suspect using an SFP for networking will add significantly to that, but most of my testing was actually done by ssh over Wi-Fi.

GPU

Unlike the , where the GPU required driver rebinding and vendor package archaeology, the Jupiter 2’s PowerVR GPU works out of the box.

No module loading, no blacklisting, no package hunting. I ran vulkaninfo and got a conformant Vulkan 1.3 device on the first try, although I am not sure how far I can go with Vulkan compute on this board yet since I explored other avenues.

The hardware is an IMG PowerVR B-Series BXM-4-64 MC1, and Vulkan reports it cleanly:

  • deviceName = PowerVR B-Series BXM-4-64 MC1
  • driverID = DRIVER_ID_IMAGINATION_PROPRIETARY
  • apiVersion = 1.3.277
  • driverVersion = 1.588.1135 (24.2@6603887)
  • conformanceVersion = 1.3.8.1

Doing the usual barrel-scraping YouTube influencer “testing” of firing up a 4K video in the browser is… absurdly fluid, really, since the K3 has a dedicated video decode unit (/dev/video-dec0, V4L2 “mvx” driver–decode only, no hardware encode that I can find) and that seems to be properly stitched together on the Bianbu packages.

OpenCL 3.0 is also present, with cl_khr_fp16 and cl_khr_integer_dot_product – the latter suggesting hardware support for int8 dot products, which is exactly what you want for basic vision processing. I tried poking at it with my Vulkan tooling, and the Vulkan side exposes shaderFloat16 and shaderInt8, 16KB shared memory, and 2 compute queues.

In short, I had zero issues with desktop acceleration, and I expect the K3 to be well supported going forward. I do intend to explore Vulkan on this a bit more, but as you’ll see below, I got completely sidetracked by the ISA and how it does vector compute…

NPU Vs A100 CPU Cores

The device tree shows an Arm China Linlon V5 (Zhouyi AIPU) at c0500000, status okay.

Okay, then, but… the device-tree lacked the obvious NPU plumbing I am sort of used to from ARM:

  • /proc/device-tree/soc/linlon-v5@c0500000/compatible says arm china,linlon-v5
  • there are no /dev/aipu*, /dev/npu*, /dev/linlon* or /dev/zhouyi* nodes
  • there are no aipu, linlon or zhouyi kernel modules under /lib/modules/6.18.3-generic
  • dmesg is silent for those names
  • web searches for linlon-v5, arm china,linlon-v5, Zhouyi AIPU and SpacemiT K3 NPU drivers turned up no public driver or SDK that matches this node

The Linlon V5 block is effectively opaque–no driver, no SDK, no kernel module. So it’s a dead end for now, although I suspect there are drivers for it somewhere.

What is interesting is what’s hiding in Bianbu’s apt repository: a SpacemiT ONNX Runtime stack (spacemit-onnxruntime, python3-spacemit-ort) and a spacemit-tcm package. The latter ships libspine_tcm.so, spacemit-tcm-smi and a public spine_tcm.h, and it talks to /dev/tcm rather than to a classic /dev/npu device. That’s not an NPU path at all–it’s targeting the A100 RISC-V cores and their tightly-coupled memory directly.

The ISA, Again

After the first evening of poking around, I decided to do what most people would do and read some actual documentation–which wasn’t hard to come by.

The CPU chapter in SpacemiT’s documentation gave me a few hints: the A100 cores run SpacemiT-IME (Inference Matrix Engine), a set of custom RISC-V vector extensions for quantised matrix arithmetic, with a programming model that gave me a bit of a flashback to my FORTRAN and VAX days–matrices in registers, explicit tiling and core synchronisation–but as a crash course in what RISC-V vector extensions can actually do, it made for a fun read.

The short version, if you’re in a hurry, is that this is a “unified memory” RISC-V system where the CPU itself can do some interesting quasi-GPU math:

A page from the docs
A page from the docs

Go-ing Places

The long version is that this is almost tailor-made for go-pherence, my pet inference library. I’ve mostly been doing MLX-like FP16 stuff with it, but my intent is to target non-GPU hardware, and even though AVX2 and NEON are interesting, I was completely nerd-sniped by the idea of using this RISC-V RVV variant to do “proper” inference.

And Codex was able to sort out how to map this to useful steps and identify parts of the instruction set that could do just that:

The custom instructions (vmadotsu.hp, vmadotu.hp, vnpack4.vv, vupack.vv, vpack.vv) perform fused int4×int8 dot products with FP16 accumulation. Each vmadot dispatch processes 128 bytes of activation against 512 bytes of 4-bit weights, producing 32 partial results. The data layout treats VS1 as copies×(M, K) matrices and VS2 as copies×(K, N) matrices, with the result stored across VD(L) and VD(H).
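To make the arithmetic concrete, here is a plain scalar reference for what those instructions accelerate: unpacking 4-bit weights and taking their dot product with int8 activations. It is not a model of vmadot’s actual register layout or its FP16 accumulation; low-nibble-first packing, signed weights, int32 accumulation and a single float32 scale per call are all assumptions of this sketch, and the function names are hypothetical.

```go
package main

// unpackInt4 splits one byte into two signed 4-bit values, low nibble first.
func unpackInt4(b byte) (lo, hi int8) {
	lo = int8(b<<4) >> 4 // move the low nibble up, then sign-extend it back down
	hi = int8(b) >> 4    // arithmetic shift sign-extends the high nibble
	return
}

// dotInt4Int8 computes sum(act[k] * w[k]) where w is packed two weights per byte,
// accumulating exactly in int32 and applying one float32 scale at the end.
func dotInt4Int8(act []int8, packed []byte, scale float32) float32 {
	if len(act) != 2*len(packed) {
		panic("need exactly two activations per packed weight byte")
	}
	var acc int32
	for i, b := range packed {
		lo, hi := unpackInt4(b)
		acc += int32(act[2*i])*int32(lo) + int32(act[2*i+1])*int32(hi)
	}
	return float32(acc) * scale
}

// matVecInt4 multiplies n rows of packed weights (row-major, len(act)/2 bytes
// per row) by the activation vector, returning one result per row.
func matVecInt4(act []int8, packed []byte, n int, scale float32) []float32 {
	rowBytes := len(act) / 2
	out := make([]float32, n)
	for r := 0; r < n; r++ {
		out[r] = dotInt4Int8(act, packed[r*rowBytes:(r+1)*rowBytes], scale)
	}
	return out
}
```

A reference like this is mainly useful as a test oracle: whatever the vectorised kernel does with tiles and registers, its output has to match this loop.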

The “hard” part was to map this to Go assembler, but, again, Codex had no trouble churning out code for vector operations by just lining up the right bits:

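#include "textflag.h"

// func rvvMulVecVec(a *float32, b *float32, out *float32, n int)
// out[i] = a[i] * b[i] for i in [0, n), using raw RVV 1.0 encodings via WORD.
TEXT ·rvvMulVecVec(SB), NOSPLIT, $0-32
    MOV  a+0(FP), X10           // a0 = a
    MOV  b+8(FP), X11           // a1 = b
    MOV  out+16(FP), X12        // a2 = out
    MOV  n+24(FP), X13          // a3 = elements remaining
    WORD $0x012072d7            // vsetvli t0, zero, e32, m4, tu, mu (VLMAX; redundant, the loop resets vl)
loop:
    BEQ  X13, X0, done
    WORD $0x0126f2d7            // vsetvli t0, a3, e32, m4, tu, mu (t0 = vl for this strip)
    WORD $0x02056007            // vle32.v v0, (a0)
    WORD $0x0205e207            // vle32.v v4, (a1)
    WORD $0x92021057            // vfmul.vv v0, v0, v4
    WORD $0x02066027            // vse32.v v0, (a2)
    SLL  $2, X5, X6             // t1 = vl * 4 bytes
    ADD  X6, X10, X10           // advance a
    ADD  X6, X11, X11           // advance b
    ADD  X6, X12, X12           // advance out
    SUB  X5, X13, X13           // remaining -= vl
    JMP  loop
done:
    RET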

All it needed was this page (and a couple of others):

The instruction format page
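The assembly loop above is classic RVV strip-mining: vsetvli hands back how many elements (vl) the hardware will process on this pass, and the loop advances by that many. Here is a scalar Go model of the same control flow, handy as a test oracle for the real kernel; the VLEN values are the X100/A100 figures quoted in the TCM section below, and the function names are mine.

```go
package main

// vlmax returns how many 32-bit elements a single vsetvli with the given LMUL
// can cover for a given VLEN: VLEN / 32 * LMUL.
func vlmax(vlenBits, lmul int) int {
	return vlenBits / 32 * lmul
}

// mulVecVecStrip models the assembly loop in plain Go: each pass takes
// vl = min(remaining, VLMAX) elements, multiplies that strip and advances.
// It returns the number of passes taken.
func mulVecVecStrip(a, b, out []float32, vlenBits int) int {
	maxVL := vlmax(vlenBits, 4) // e32, m4 as in the vsetvli above
	passes := 0
	for i := 0; i < len(out); {
		// min(AVL, VLMAX) is always a legal vl; RVV also allows a more even
		// split when AVL < 2*VLMAX, which is why the assembly advances by
		// whatever t0 says instead of assuming a fixed step.
		vl := min(len(out)-i, maxVL)
		for j := i; j < i+vl; j++ {
			out[j] = a[j] * b[j] // vle32.v x2, vfmul.vv, vse32.v
		}
		i += vl
		passes++
	}
	return passes
}
```

At VLEN=256, a 70-element vector takes three passes (32 + 32 + 6); at VLEN=1024 it fits in one.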

TCM (Tightly Coupled Memory)

I had some trouble figuring out how this mapped to the TCM memory device that I had found, but a few more pages into the ISA doc it became clear:

TCM is 3 MB of on-chip SRAM (8 × 384 KB blocks), meant as a low-latency scratchpad for the IME2 matrix engine. According to the docs, both clusters can access it, but at very different speeds:

  • From the X100 cores (VLEN=256), TCM reads at 1.14 GB/s (uncacheable device memory)
  • From the A100 cores (VLEN=1024), it reads at 5.4 GB/s via a direct SRAM path for wide vector loads

This is a pretty dramatic difference from the RAM bandwidth I measured earlier, especially considering that the A100 cores read it nearly five times faster than the X100 cores (5.4 vs 1.14 GB/s). And there’s more:

  • Cores are organised in pairs sharing TCM blocks, so they can exchange results much faster
  • I later found that SpacemiT’s own reference code uses paired-worker barriers to overlap DMA (weight prefetch from DRAM into TCM) with compute on the partner core

If you’ve ever done double-buffering, well, this is it applied to vector compute.
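Here is the idea as a minimal Go sketch: a goroutine stands in for the partner core (or DMA engine) filling the idle buffer while the current goroutine computes on the other one. This is generic ping-pong buffering, not SpacemiT’s code; load and compute are hypothetical stand-ins for the DRAM-to-TCM copy and the matrix kernel, and on the K3 the two buffers would presumably sit in a TCM block rather than in Go slices.

```go
package main

import "sync"

// processDoubleBuffered walks `tiles` tiles, overlapping the load of tile i+1
// into one buffer with the compute on tile i in the other, and returns the
// sum of all compute results.
func processDoubleBuffered(tiles, tileLen int,
	load func(tile int, dst []float32),
	compute func(src []float32) float32) float32 {

	if tiles == 0 {
		return 0
	}
	bufs := [2][]float32{make([]float32, tileLen), make([]float32, tileLen)}
	load(0, bufs[0]) // prime the first buffer
	var total float32
	for i := 0; i < tiles; i++ {
		cur := bufs[i%2]
		var wg sync.WaitGroup
		if i+1 < tiles {
			wg.Add(1)
			go func(next int, dst []float32) { // prefetch into the idle buffer
				defer wg.Done()
				load(next, dst)
			}(i+1, bufs[(i+1)%2])
		}
		total += compute(cur) // runs while the prefetch is in flight
		wg.Wait()             // the next buffer must be full before we swap
	}
	return total
}
```

Spawning a goroutine per tile keeps the sketch short; a real kernel would keep both workers alive and hand buffers back and forth with a barrier instead.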

Armed with this knowledge, I distilled it into a SPEC and went to town on the K3 with Codex to see if we could port some of the go-pherence SIMD inference kernels, but there was a serious kink: I couldn’t for the life of me figure out how to schedule code on the A100 cores.

Thread Scheduling Weirdness

So I asked Codex to get out Capstone and disassemble the TCM libraries. Turns out getting a thread onto the A100 cores requires a two-step handshake:

  • write the thread’s TID to /proc/set_ai_thread (a kernel interface that unlocks scheduling on cores 8–15 for that specific thread)
  • then call sched_setaffinity to pin it.

Without the registration the kernel silently refuses the affinity change–those cores are fenced off from normal userspace entirely (which explains the oddities in the early benchmarking).
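Here is a rough Go equivalent of that handshake. It is based on the description above, not on SpacemiT’s code. It assumes the proc file accepts a bare decimal TID (the exact format is a guess), and it uses the stdlib syscall package for sched_setaffinity. It calls runtime.LockOSThread first, because otherwise a goroutine is free to migrate away from the OS thread that was registered and pinned. The helper names are hypothetical.

```go
package main

import (
	"os"
	"runtime"
	"strconv"
	"syscall"
	"unsafe"
)

// aiThreadProc is the registration file described above. Writing a bare
// decimal TID to it is an assumption about its format.
const aiThreadProc = "/proc/set_ai_thread"

// cpuMask builds a cpu_set_t-style bitmask (64 CPUs per word) with
// CPUs first..last inclusive set.
func cpuMask(first, last int) []uint64 {
	mask := make([]uint64, last/64+1)
	for cpu := first; cpu <= last; cpu++ {
		mask[cpu/64] |= 1 << uint(cpu%64)
	}
	return mask
}

// pinToAICores (hypothetical helper) locks the calling goroutine to its OS
// thread, registers that thread's TID via procPath, then restricts the
// thread's affinity to CPUs first..last with sched_setaffinity.
func pinToAICores(procPath string, first, last int) error {
	// Without this, the Go runtime could later run this goroutine on a
	// different OS thread that was never registered or pinned.
	runtime.LockOSThread()
	tid := syscall.Gettid()

	// Step 1: register the thread with the vendor kernel interface.
	if err := os.WriteFile(procPath, []byte(strconv.Itoa(tid)), 0o644); err != nil {
		runtime.UnlockOSThread()
		return err
	}

	// Step 2: pin it. The raw syscall takes the mask size in bytes.
	mask := cpuMask(first, last)
	_, _, errno := syscall.RawSyscall(syscall.SYS_SCHED_SETAFFINITY,
		uintptr(tid), uintptr(len(mask)*8), uintptr(unsafe.Pointer(&mask[0])))
	if errno != 0 {
		runtime.UnlockOSThread()
		return errno
	}
	return nil
}
```

On the K3, the idea would be to call pinToAICores(aiThreadProc, 8, 15) as the first thing in each long-lived worker goroutine and keep that goroutine running for the life of the worker.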

SpacemiT’s own llama.cpp fork (PR #22863) uses this pattern: six pthreads permanently pinned to cores 8–13, synchronised with spine_barrier_t (an atomic spinlock barrier), sitting in a persistent work loop that processes matrix tiles from a shared queue.

The workers never return to the OS scheduler between operations–barriers replace dispatch overhead entirely. I later realized that a) this is how the K3 can hit 35–40 tok/s on Qwen3-0.6B Q4_K_M, and b) Go scheduling has a lot more overhead.

Disassembling the ONNX runtime I’d found (SpaceMITExecutionProvider) showed it used the same cores with SPACEMIT_EP_* settings for thread count, profiling, and operator filtering.

The Actual AI Bit

So where does this leave us in terms of usable inference? Well, a lot of people like speed, and if you want speed, you can install llama.cpp-tools-spacemit 0.0.8 and run TinyLlama 1.1B Chat Q2_K (which is just 459MiB) with 8 threads:

Test | Result
Prompt processing (pp128) | 137.47 ± 0.05 t/s
Token generation (tg64) | 36.60 ± 0.01 t/s

This is pretty impressive as SBCs go, and it’s no wonder I’m starting to see YouTube videos demoing it: it fills up a screen very quickly if you do a one-shot prompt, but it’s fundamentally useless.

Running Real Models

The more interesting question is whether the K3 can host a usable local coding endpoint, so I worked through a spread of current models on a fork of the SpacemiT llama.cpp tree, all at Q4_K_M (unless noted otherwise) with f16/f16 KV cache and 8 threads.

I cranked out a Pi session and had it draft a realistic agentic coding turn: a system prompt with tool definitions, a prior read tool call, the file returned as context, and a request to produce an edit tool call - roughly 700-900 prompt tokens in, 700 generated out.

The results were… Interesting. And slow to achieve, not just because of the turn times but also because I had to patch llama.cpp to match minor changes in the Bianbu libraries:

Model | Type / active | RAM | Prefill (t/s) | Decode (t/s) | Overall† (t/s) | Turn
Qwen3.6-28B-REAP-A3B | MoE / A3B | 17.3 GB | 29.1 | 6.5 | 11.5 | 140s
Gemma 4 E4B | dense / 4B | 4.9 GB | 28.9 | 5.7 | 9.5 | 147s
Gemma 4 E2B QAT UD-Q4_K_XL | dense / 2B-ish | 2.5 GB | 99.6 | 12.9 | - | 18s/128 tok
Gemma 4 26B-A4B | MoE / A4B | 16.9 GB | 38.8 | 5.1 | 9.1 | 154s
Qwen 3.5-9B | dense / 9B | 5.6 GB | 22.5 | 4.5 | 8.2 | 195s
Gemma 4 12B | dense / 12B | 7.3 GB | 18.7 | 2.46 | 4.3 | 322s
Gemma 4 12B QAT UD-Q4_K_XL | dense / 12B | 6.3 GB | 25.0 | 3.6 | 4.2 | ~86s/300 tok

†Overall = (prompt + completion tokens) ÷ total compute time - blends prefill and decode for the turn.
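The two rate columns are enough to estimate a turn. Prefill time is prompt tokens divided by the prefill rate, decode time is generated tokens divided by the decode rate, and Overall divides the total token count by their sum. Here is a small Go sketch of that arithmetic. It assumes a hypothetical 800-token prompt (the middle of the 700-900 range above) and ignores sampling and tool-call overhead.

```go
package main

// turnStats returns the estimated wall-clock seconds for one agentic turn and
// the blended "Overall" rate, ignoring sampling and tool-call overhead.
func turnStats(promptTok, genTok int, prefillTPS, decodeTPS float64) (seconds, overallTPS float64) {
	seconds = float64(promptTok)/prefillTPS + float64(genTok)/decodeTPS
	overallTPS = float64(promptTok+genTok) / seconds
	return seconds, overallTPS
}
```

For the Qwen3.6-28B-REAP-A3B row (29.1 prefill, 6.5 decode), turnStats(800, 700, 29.1, 6.5) gives roughly 135 seconds and about 11 t/s. That is in the same ballpark as the measured 140s and 11.5 t/s. Roughly four fifths of that turn is spent in decode, which is why decode speed dominates everything else here.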

So yes, it can run fairly decent models, but at slightly over 2 minutes a turn, not in a usable way. That doesn’t mean it can’t run LLMs, just that it can’t run moderately serious ones at speed (still, I’m pretty sure you can stuff a smaller Qwen variant in there and do simple things like home automation).

Since I happened to be playing with a few of these models on my RTX3060 (where they work at 4-8x the speed, making them quite usable), I copied the weights across and had Codex script out the same run across them with a few variations in settings:

Model | Note | Prefill (t/s) | Decode (t/s)
Qwen3.6-28B-REAP + ngram spec | copy-heavy task, 81% accept | 29 | 15.5 (2× peak)
Qwen3.6-28B-REAP @ 64K ctx | light context | 33.1 | 7.8
Qwen3.6-28B-REAP @ 262K ctx | full native context | 21.5 | 9.8
Qwen3 0.6B | tiny model | 293 | 55
Qwen3.6-28B Q4_0 (requant) | deep 9K ctx | 21.7 | 3.5
Qwen3.6-35B-REAP + MTP | non-viable on this CPU backend | - | - (stalled)

The pattern is somewhat clear: on this memory-bandwidth-bound board, decode rate tracks 1 / active parameters (or something close to it). Sparse mixtures-of-experts and small (sub-4B) dense models “work”, but anything with more than roughly 3–4B active parameters just doesn’t, really. And multi-token prediction (MTP), which I had gotten to work pretty well on my 3060 under go-pherence, stalls completely.
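The ngram speculative row in the second table is the only run that got well past the plain decode rate (15.5 t/s, 2× peak), and only on a copy-heavy task. A sketch of prompt-lookup drafting shows why. The draft comes from the context itself rather than from a second model, so it is nearly free. However, it only gets accepted when the output repeats text already in the prompt, such as an edit tool call echoing the file it was given. This is a generic Go illustration of the idea, not llama.cpp’s implementation. draftFromContext and acceptedPrefix are hypothetical names, and verification is shown as greedy exact matching.

```go
package main

// draftFromContext proposes up to k draft tokens by finding the most recent
// earlier occurrence of the last n context tokens and copying what followed it.
// It returns nil when there is no earlier match.
func draftFromContext(ctx []int, n, k int) []int {
	if n <= 0 || k <= 0 || len(ctx) <= n {
		return nil
	}
	suffix := ctx[len(ctx)-n:]
	for start := len(ctx) - n - 1; start >= 0; start-- {
		match := true
		for j := 0; j < n; j++ {
			if ctx[start+j] != suffix[j] {
				match = false
				break
			}
		}
		if match {
			end := start + n + k
			if end > len(ctx) {
				end = len(ctx)
			}
			return append([]int(nil), ctx[start+n:end]...)
		}
	}
	return nil
}

// acceptedPrefix counts how many draft tokens match what the target model
// itself picked when it scored the whole draft in one batched pass
// (greedy verification). Those tokens are kept; the rest are discarded.
func acceptedPrefix(draft, target []int) int {
	i := 0
	for i < len(draft) && i < len(target) && draft[i] == target[i] {
		i++
	}
	return i
}
```

On free-form prose the lookup rarely finds a useful match, so you pay for verification without getting extra tokens. This fits the general picture of speculative tricks not breaking the decode wall.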

Since Gemma 4 just came out (again, for what, the third time?) with QAT, I also tried both its MTP and QAT variants by patching llama.cpp a bit further (by this time I was really hooked).

And splitting the workload across core types actually “worked”: putting the drafter on the slower X100 cores and the target model on the A100 cores was feasible, but… there’s no fast memory exchange between core types, so it was (verifiably) useless:

Gemma 4 E4B run | Thread placement | Prefill (t/s) | Decode (t/s)
No drafter | target on A100 8-15 | 26.36 | 5.99
Assistant MTP, 4 draft threads | drafter on X100 0-7, target on A100 8-15 | 26.35 | 5.99
Assistant MTP, 8 draft threads | drafter on X100 0-7, target on A100 8-15 | 26.30 | 5.97

QAT

Gemma 4 E2B QAT was, however, “useful”, for a rather slow definition of it (~13 t/s), and it is technically multimodal, but on the K3… not usable either. I tossed a 224×224 test image into it, which took roughly 39–47 seconds just to process through the projector, and even though it could identify a solid red square, an equally simple red square/blue circle image came back as “yellow and white”. Might be my code, but by this time I had already started eating into my vacation days and I decided to call it quits.

The numbers are interesting, though, and make me wonder what a K3-like CPU can do with other kinds of models:

Model / route | Params (M) | RAM / file | Prefill (t/s) | Decode (t/s) | Notes
Qwen 3 0.6B | 596 | 373 MB | 37.5 | 43.5 | Great demo, zero substance
Gemma 4 E2B QAT | 4,630 | 2.5 GB | 99.6 | 12.9 | decent prose/code without toy-model speedups; can use tools, will keep it around
Qwen3.6-28B-REAP-A3B | 28,240 | 17.3 GB | 28.9 | 7.15 | quality anchor; large context and actual coding
Gemma 4 E4B | 7,520 | 4.9 GB | 27.5 | 6.01 | twice as big as E2B, and twice as slow
Gemma 4 12B QAT UD-Q4_K_XL | 11,910 | 6.3 GB | 25.0 | 3.6 | sort of worked, but unusable

In practice, none of the model, quant, or speculative tricks break the ~7 t/s decode wall for genuinely useful generation on the quality models, and we’re stuck shuffling ~3B of active weights per token out of LPDDR.
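The decode wall falls out of simple arithmetic. Each generated token has to stream every active weight out of memory once, so decode speed is capped at memory bandwidth divided by the bytes of active weights. Here is a back-of-the-envelope Go sketch. The bandwidth and bits-per-weight inputs are placeholders to replace with measured values; Q4 K-quants land somewhere around 4.5 to 5 bits per weight once scales are included.

```go
package main

// decodeCeiling is a roofline-style upper bound on decode speed for a
// memory-bound model: every active weight is read once per generated token,
// and KV cache traffic and compute are ignored.
//   bandwidthGBs:  sustained memory bandwidth in GB/s
//   activeParamsB: active parameters per token, in billions
//   bitsPerWeight: average storage cost of the quantized weights
func decodeCeiling(bandwidthGBs, activeParamsB, bitsPerWeight float64) float64 {
	bytesPerToken := activeParamsB * 1e9 * bitsPerWeight / 8
	return bandwidthGBs * 1e9 / bytesPerToken
}
```

Take a purely illustrative 20 GB/s, which is not a figure measured on the K3. With that bandwidth, 3B active parameters at 4.5 bits per weight cap out at around 11.9 t/s. Quadrupling the active parameters divides the ceiling by four, which is the 1 / active parameters pattern in the tables above. Real decode lands below the ceiling because bandwidth is never fully used and the KV cache adds its own traffic.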

I was able to get very close to the C numbers with the E2B QAT, so I will be playing with that a bit more–in fact, I think that the Gemma 4 models are the most interesting thing out there if, like me, you’re stuck with an RTX 3060 and a 36GB RAM M3 Mac as your top inference hardware…

Where This Leaves Me

I am quite taken by the K3, to the point where I just got Whisper going on it using go-pherence and am now trying to shoehorn various other things into it, but the summary is as follows:

  • Software maturity is surprisingly good – Bianbu 4.0 has a modern kernel (6.18), modern tooling, and has had zero papercuts (so far). This is not the “barely boots” RISC-V experience from two years ago, when barely worked out of the box on riscv64 without a full rebuild. And no, I haven’t tried yet, but I will, eventually.
  • Thermals are well-managed – it never exceeded 68°C even under sustained 16-core load, and the fan ramps smoothly and is much quieter than the CIX P1.
  • The shipping storage is great – NVMe-class speeds from the onboard UFS (I didn’t feel the need to add an NVMe), no SD card in sight.
  • The GPU and video decoding story is better than expected – PowerVR Vulkan 1.3 and hardware decoding both work out of the box.
  • The “normal” CPU cores (and memory bandwidth) are middling – per-core IPC is roughly half that of an A720, but the A100 RVV sort of makes up for it.

And even if it can’t do “real” LLMs, I am pretty sure the K3 can handle standard image recognition swimmingly. YOLOv5 has just come out even as I am putting this post together, so I haven’t tested it, but the key thing is that RISC-V is really interesting as a CPU platform now (at least for me). Of course, it has to come with enough RAM (and that 32GB is probably the bare minimum any AI SBC should have for any realistic use), and times are tough, but I look forward to testing its descendant(s).

Right now I’ve embarked on the rather quixotic quest of getting Ideogram 4 on it (and yes, I know it won’t really “work”, but I wanted to have a go and have another “working” implementation besides the RTX back-end), and I expect I will spend a bit of time trying to tweak Qwen or Gemma 4 on it to see if I can have a permanent “house LLM” that doesn’t suck and can do basic automation (even if slowly) – and I’ll update this post (or add a link to it at the bottom) with any positive results.

WWDC26: Early Impressions

This was the weirdest WWDC keynote in a while, and some of the past ones were visibly phoned in. It was rife with oddities and flashbacks.

To my surprise, a few of my items actually made it. Naming the next macOS “Golden Gate” was not on my bingo card, though; a little too trippy and a lot too lofty for what is, by Apple’s own tacit admission, a Snow Leopard year: catching up rather than charging ahead.

The self-deprecating tone ran through the whole thing, from a hippy bus that was equal parts weird and funny to the unmistakable sense of a company that spent the past year watching the industry sprint past it on AI and is now pacing sedately, rather than running, to catch up.

Moderately Likely To Work

Much to my surprise, two of my top annoyances got airtime: they’re tackling Spotlight and Mail search, the exact failures I called out, although whether either works once it ships is anyone’s guess.

They’re also doubling down on automation, at least superficially, with vibecoded and a renewed push for third-party Actions. Vibecoding Safari extensions and Shortcuts is the genuinely interesting part: it points at automation rather than novelty, which is more than I can say for yet another Image Playground. None of it erases the brittleness and legacy gaps that made me want a real platform to begin with, but it’s at least pointing the right way. Tab grouping and change detection in Safari are a fun party trick, no more.

Siri AI

And yes, there’s a new, as-yet-unproven Siri (with a completely pointless AI moniker) you summon by holding the power button (part Spotlight, part walkie-talkie, plus a floating gelatinous orb in Vision Pro), and a Siri app trying to be a catch-all bucket for every interaction.

The new voice struck me as a little cringe and overly American, which is an odd note to land on when you want me talking to my machines all day. The feature set is fuzzy: on paper it can touch far more of my data, and moving photos to the shared library by voice would be neat if it works. But Siri has been stuck at “if it works” for fifteen years, and the one thing I actually want (for it to handle my and calendar properly) wasn’t demoed in any useful detail.

Reheated, Or Absent

I wondered whether the automation push would reach , and the answer is a shrug: the new camera detection is cute, but a YOLO model has done exactly that for a decade, and the automation logic I actually need stays vague. The rest of my list didn’t show at all: no hypervisor on the iPad, no running my own code without the annual toll, nothing on iCloud sync, the Watch, or SwiftUI. Maybe the sessions turn something up (which is why this is an early read), but my expectations haven’t budged.

The framing around Apple Foundation Models was the bigger tell: we already know there’s Gemini underneath, which leaves me wondering how much Apple is adding beyond the wrapper. got the same treatment by being walked back in the most face-saving way imaginable, with the old Accessibility transparency slider re-warmed and trotted out as an improvement. Disingenuous is the word, twice over.

Update: Also much to my surprise, they actually mentioned unifying the corner radii, which I completely missed. I must have tuned it out after the 300 random percentage performance improvements they quoted against… no real baseline, really.

Anyway, Apple heard the parts of everyone’s complaints that a) did not force it to walk back Liquid Glass and b) fit the AI story it needed to tell, and stayed quiet on a lot of the boring structural stuff that’s been broken for years. Yes, they are committing to improving performance and fixing some of the most egregious issues, and that’s not nothing; hearing Spotlight and Mail search admitted out loud is more than I expected. But it is mostly Apple’s technical debt catching up with them and, of course, Apple catching up with everyone else on AI, on its own terms and at its own pace.

Oh, and they deprecated pretty much all of my hardware, too. Kind of expected, much like the usual geographical restrictions, which mean a good chunk of this may not reach Portugal for a year, if at all.

I’m going to give it a couple of days until the dust settles, watch the Platforms State of the Union tomorrow, and then mull things over a bit more. And maybe, somehow, we can chalk up this WWDC as a sort of a win, in the long run.

Notes for June 1–7

I decided to take a couple of days off and generally tune out, thanks to a few strategically placed bank holidays – which meant my usual mix of relaxing and dealing with a few chores.

For starters, I replaced the battery on our A1466 MacBook Air, which just keeps on trucking – it’s now on its third battery (I swapped the factory one some four, or was it five, years ago). For around EUR 80, keeping that rather nice keyboard/screen/trackpad combination in use was a no-brainer, and it too now runs a Niri desktop, having been a few months ago.

Putting Pi on the Desktop

And since I quite like having an AI assistant that can actually do something useful on my desktop, I did a quick hack to wire Pi into Noctalia:

The Pi assistant panel running inside the Noctalia desktop shell
Pi running inside Noctalia -- here, mid-task on the shell plugin itself.

This took around 30 minutes to become useful, and gave me a couple of ideas for improvements to piclaw’s UX – I had forgotten how flexible QML is.

AI Can Be Entertaining Too

I’ve been automating away a fairly large chunk of VM and container management – I have a dedicated agent that knows how to manage my Portainer stacks and version them in Gitea, for instance – but as it turns out, LLMs are also pretty good at a few other things, like setting up emulators under Steam (creating nice icons, fixing controller input mappings, tuning upscaling and shaders, and the rest of it).

But I hadn’t let an LLM loose on my Calibre and music collections yet, and – with the right safeguards – it’s been awesome at tidying up metadata. I had dozens of ancient books with slightly broken Calibre metadata, so I’ve been putting together an server that sits next to my library to fix them – mostly because I don’t want to give a model full filesystem access to my NAS, and this way I can snapshot the database whenever it tries anything more extensive. I may well make something more generic, given time.

My WWDC 26 Wish List

Michael Tsai’s annual roundup of WWDC wish lists went up this week, and the thing that struck me most wasn’t any single request–it was the mood. There seem to be fewer wish lists than last year, several people openly admitted they couldn’t be bothered to write one, and the lists that did get written are pretty much bereft of any “aspirational” wishes.

Field Notes From The AI Battlefield

Since today is a bank holiday for me, I decided to consolidate a few more of my notes into a post. What follows is a set of guiding “principles” that I’ve found useful over the past year or so and that I’ve codified into various bits of scaffolding I reuse across my projects.

Notes for May 24–31

Today I realised that I could just spend the day doing essentially nothing and that nobody would hold it against me (at least in Western nations), so… I might well do just that, with a few caveats:

Mildly Parboiled

Allergy season is finally fading (at least for me), but today was the first time I had to turn on the AC in the office, and it was great to realize that and almost four years of potential HomeKit foibles, my is still working perfectly.

Indoor Wi-Fi Roaming with OpenWRT

A few months after writing up the units and moving the house over to , I ended up revisiting the one bit I had deliberately waved away as “good enough”: roaming.

Notes for May 17-24

My sinuses are still giving me grief, but this week was much more successful at pretending to be enjoyable, at least. For starters, we watched Project Hail Mary, and it was every bit as good as I would expect it to be, which is very rare in movies these days.

Logitech Combo Touch: Four Years Later

I think it’s time for an update on my iPad Pro M1 and, most importantly, the Logitech Combo Touch I got for it. Think of it as a long-term review of sorts.

TIL: Noctalia Shell Lock on Suspend

This is a bit of a follow-up to my – I keep using it routinely (especially when we travel for leisure) and love the little thing to bits, but I’ve been wanting to run it mostly in power-saving mode to reap the most benefit out of the hardware (and battery, of course), so I started looking at desktop environment alternatives.

Yes, I could already get a full afternoon (and then some) out of it, but Apple Silicon has spoiled me as far as battery life expectations go, and has a little bit too much baggage for that kind of extended use.

Since I spend 90% of my time on it writing or coding and still have a penchant for keyboard-driven desktops, I initially switched to Fedora Sway Atomic (gotta love being able to swap environments with a single command…), but later installed Niri and Noctalia Shell because I really like both the idea of a scrolling window environment and the sheer polish of the whole thing–even if there are some rough edges here and there.

I am very happy with it, and writing plugins for it is trivial:

I hacked together a Bing Wallpaper plugin in 30m

The one thing that annoyed me to no end, though, was locking on suspend, which Noctalia Shell should do but apparently doesn’t in , so I had to resort to two hacks:

Locking on Lid Close

The first was adding a switch-events block to the Niri config to trigger the lock screen when the lid closes:

switch-events {
    // Ask Noctalia Shell (via Quickshell IPC) to lock whenever the lid closes
    lid-close {
        spawn "qs" "-c" "noctalia-shell" "ipc" "call" "lockScreen" "lock"
    }
}

Idle Lock via swayidle

The second was setting up a swayidle systemd user service to lock after 5 minutes of inactivity and suspend after 10:

[Unit]
Description=SwayIdle Service
# Stop (and restart) together with the graphical session
PartOf=graphical-session.target
After=graphical-session.target

[Service]
Type=simple
# Lock after 5 minutes of inactivity, then lock and suspend after 10
ExecStart=/usr/sbin/swayidle -w \
    timeout 300 'qs -c noctalia-shell ipc call lockScreen lock' \
    timeout 600 'qs -c noctalia-shell ipc call sessionMenu lockAndSuspend'
Restart=on-failure
TimeoutSec=30

[Install]
WantedBy=graphical-session.target

This last one feels extremely gauche and I hope to find a better way, but I guess this comes with the territory. I don’t really care about having a trendy Wayland desktop (I just want a dead simple one with a bit of polish), but I hope hacks like these won’t be necessary for much longer.

Oh, and of course I ran gsettings set org.gnome.desktop.wm.preferences button-layout 'close,minimize,maximize:appmenu' to match macOS window decorations.

Apple Papercuts

I know this blog has strayed a fair distance from its Mac-centric origins, but I’ve been keeping a mental list of all the things that are broken, missing or inexplicably neglected in ’s software, and it’s gotten long enough that writing it down feels like a public service1.

Notes for May 10-17

The weather has gone a tad cloudy again, which provided me some relief from my allergies–but not enough for proper overnight rest, so yet again I arrived at Friday afternoon totally exhausted.

Announcing ios-linuxkit: Linux on iPad, the Hard Way

I’m done waiting for Apple to fix things. And one of the things I think should exist is a decent way to run Linux binaries on my iPad.

Unexpected Synology Woes

Last weekend my decided, for some unfathomable reason, to stop working after I took it out of the closet, dusted it and put it back, and I have feelings about it.

The Siri For Families Apple Will Never Build

The got me thinking about the one thing I keep wishing would build and almost certainly never will: a family-scoped AI assistant that actually works across all our devices.

I Think I Figured Out What an AI IDE Looks Like

I’ve been mulling the UX arc I’ve been going through over the past couple of years, and I think it was mostly the same for everybody:

Notes for May 3-10

This was a weird week, both because I keep waking up at 5AM with my sinuses clogged, and because I feel like I’m losing momentum. Feeling almost permanently cotton-headed, sleepy due to sheer exhaustion or because of antihistamines certainly has something to do with it, but .

The Local AI Moat

Regular readers will know that I’ve spent most of the past two years shoehorning LLMs into single-board computers, partly as a learning exercise and partly because there are lots of local/“edge” applications where semantic reasoning (no matter how limited) and “interpretation” of sensor data are actually useful.

Notes on GPT 5.x Model Regressions

I’ve been getting annoyed at constant code regressions in piclaw for the past few weeks. Something was off–even after bumping the test suite to the point where it catches most mechanical errors, gpt-5.5 kept making unrelated edits to code that should have been left alone, and I was tired of babysitting it.

Notes for April 27 – May 3

This was an absurdly productive week, at least on a personal level. I’m not sure whether to be pleased or worried about the number of projects that moved forward simultaneously, but here we are.
