This is a very quick follow-up to my Mac emulation hacks from a couple of weeks ago, and worth noting for the fun value and a little bit of AI.
I love old arcade games (especially some NeoGeo titles), so it was only natural that I gravitated to them while I was trying to get Mac color rendering to work on an ESP32–if there’s a piece of software that was extremely attuned to its hardware, it’s arcade games, often written to map directly into hardware.
And I love R-Type in particular, so even though I originally thought of getting Metal Slug to run on the ESP32-S3 because of its shared 68000 heritage with the Mac, I ended up wondering how fast I could make that run.
Turns out the M72 boards Irem did for R-Type ran an 8086-like CPU (the NEC V30, which has a few extensions) and a Z80 in tandem, and that the emulator wasn’t at all hard to recompile if you stubbed out things like audio (which is done by the Z80).
I decided to start with the hardest/smallest target (the plain CYD with a plain ESP32), which can barely run the emulator in one core and has almost no free RAM–to the point where after a few iterations it was rendering something, but clearly wouldn’t make it without rebuilding the whole emulator from scratch.
Getting it to render frames effectively (as in, rendering one frame without any visible stutters inside the frame), is exactly the kind of problem I am having on the Mac emulator because a) you typically need enough RAM to manage the framebuffer and b) all ESP CYD displays have limitations regarding display (typically SPI) bandwidth.
For a little bit of inside baseball (yeah, I’ve been spending time with US folk again) the real hassle (especially on the smaller ESP32) was handling memory maps, palette RAM, tile/sprite priority, and frame timing. You can finagle things a bit by reassigning one of the cores to “just” do rendering, and there are various DMA modes depending on chipset, but all of which proved to be enough distraction for me to upgrade to an S3-powered display as soon as I could.
So I just focused on clean frame renderings, even if the time required to produce them made it feel like a slideshow, so much so that after figuring out the backgrounds were a static texture composited behind the main sprites, I decided to skip that.
It would have been amazing to see running on the smaller one, though.
Then I got piclaw to port the entire thing to the ESP32-S3, and all of a sudden there was enough horsepower to run and render at around 50fps:
Both boards, starting from the same emulator state but rendering as fast as they can
I’m so happy with the results that I am considering getting this to run on an ESP32-P4 and see what we can do about audio and using the USB host port on that for a controller, but I really should focus on backporting the rendering techniques into a Mac emulator…
Either way, this was a great way to refine my approach at getting AI agents to tackle long, grinding, intricate problems, and the code is up on GitHub if anyone cares to check it out.
However, before handing it over to agents, I had to specify how to do this, and right now, after half a dozen embedded development and hardware porting projects since Christmas, the strategy is pretty well established:
Get something to run on a host harness, running VNC, plain SDL or just framebuffer dumps
Derive milestones from that (still quite manual) job. Maybe even more harnesses (like target CPU opcode harnesses for JITs, sprite subroutines, etc.)
Tackle the first few milestones on a simpler (but also more limited) hardware/software target
Build reusable debugging/introspection tools for each milestone that the agents can use later to have a feedback loop
Expand out from the above.
That’s why my first hack for these things is just to point a webcam at the display (or generate a frame, or a known good end-to-end output dump) and get them to render a test pattern:
The M5Stack Tab 5, the highest-end ESP32 device I have, showing a test pattern
From then on, the agents can use the camera and other test patterns to verify that they are rendering correctly (of course it’s useless for video, but any SOTA model these days can take useful feedback from images), and, as a bonus, I get their snapshots on the piclaw web interface and can verify that they are actually doing what I want them to do.
I already knew what I wanted to achieve (in short, to explore and document techniques to render fast graphics on these boards), and I had a camera pointing at the target devices like in previous hacks, but one of the things I wanted to explore with this setup was to mitigate long context problems:
Even if you use things like /goal (which I do, but with bounded horizons) models will inevitably deviate from the actual goal
As context piles up, they will also inevitably hyper focus on tangentially relevant issues (because they see code issues and zero in on those rather than take a broader view of what needs to be achieved)
Dead ends and back-tracking to reassess better approaches becomes nearly impossible
What I did was very simple. piclaw allows me to easily have multiple sessions running, and a few weeks ago I implemented a chat tool, with hilarious results:
Two piclaw agent sessions chatting with each other
…plus “agents” or sessions also have the ability to introspect each other’s state (goals, messages, current activity, compaction status, etc.) and schedule themselves, so setting up an @auditor / overseer that can keep track of other agents is trivial–all I needed was to write a SKILL.md file that told the auditor to:
Observe commits, logs, tests, and artifacts; judge progress from concrete evidence towards the set goal, not just sessions being generally “active” but treading water.
Enforce strict, reproducible completion gates (no interpreter fallbacks, ROM/global seeding, scanner bypasses, or synthetic shortcuts like skipping steps or faking code).
Nudge active sessions once with a concrete, evidence-backed step, a measurable success signal, and any corrections to make.
Require commit/push hygiene with a quality bar for commit messages
Never edit target-session code or implement fixes, keeping itself to steering only via chat and audit log entries.
Escalate from steering to actual interruptions only after repeated ignored guidance
Keep a running log with a summary of what was done every cycle (state, output/structural/strategy/steering aspect) and write out a neat Markdown template in the web UI
I gave that file to Opus 4.8 (I definitely still don’t trust Opus to write code, but I did want a different, complementary model steering Codex 5.5), told it which sessions to monitor, and let it go on its merry way.
For this particular case, I did have to intervene once or twice to highlight rendering and palette issues (which I can do in piclaw’s web interface on my iPad), but that was it:
Highlighting rendering and palette issues
And I think this approach has legs–I’m now using it to grind through the porting/testing/quality aspects of other things I’m doing, and will eventually try it with local models (if I ever get good enough hardware to run them).
Note that I did not want to create a fancy multi-agent system where every agent talks to each other: I wanted to have long-term oversight and steering.
And this is not a delegation pattern either (there is also a delegate plug-in that allows each session to delegate chores to simpler models).
I deliberately chose this approach because, in general, I’ve found multi-agent systems with “party lines” and loose couplings to be a complete waste of tokens unless there is a clear hierarchy and very well scoped outcomes–just like in a human team, really…
Jun 14th 2026 · 4 min read
·
#ai
#go
#hardware
#mac
#notes
#weekly
Another week, another set of bank holidays that I tried to leverage strategically to do interesting things with my time, and… I ended up throwing out my back and having to sit very still for hours at a time, which made the whole thing feel like a waste of paid vacation with extra ibuprofen.
The upside, if I can call it that, is that sitting still is reasonably compatible with finishing TV shows, staring at logs, profiling traces and dealing with broken model outputs for hours on end. Which sort of explains my notes for this week…
I am a bit fed up with AI. Not in the usual performative sense, but because models are still not that smart, the tooling around them is uneven, and I still don’t have the hardware to play with it the way I want. The week was a slow burn of getting go-pherence to become more than a bunch of random matmuls, which meant pushing Ideogram 4 far enough to make cat pictures and then immediately running into the limits of my RTX 3060.
Of course I used AI to generate a cat picture, this is the Internet!
There is no way I can scale this out to do more than 256x256 low-quality pictures on that card without a lot of pain and slow iteration, and the iteration is the problem. The code can be made to run, but the gap between generating a one-off cat picture and being able to use it routinely (and at acceptable performance) is just not worth it with the gear I have.
I am very seriously considering gathering donations to get an NVIDIA GB10 or a Ryzen AI device, which seem like the bare minimum hardware to do barely half-assed local inference.
I also spent an outrageously unproductive amount of “learning” time on shoehorning DiffusionGemma into go-pherence, on both the K3 and the RTX 3060 via mmap tricks, GPU expert caching, sparse self-conditioning and all the stupid details that decide whether an inference run takes minutes or merely feels like it does. Some of it worked surprisingly well (I got coherent answers), but none of my hardware is good enough for useful answers.
Regardless, the more I revisit AI-assisted projects from a few weeks ago, the more time I spend auditing whether the code matches the written SPEC.md rather than adding anything new. Code quality has been mostly OK in projects where I have my usual vetting and testing pipeline in place, but the common thread in the ones where I don’t is increasingly obvious: they were Anthropic-heavy. Opus keeps being very fluent about what it claims was implemented and very wrong about what is actually there. Go figure.
I picked up a temp work laptop early in the week (a Snapdragon X Plus machine), and although it is still early days I was impressed enough with the hardware and battery life to hack together womprat so I could get at my personal machines from it without installing anything of consequence.
Since this is a loaner and I mostly live inside AVD anyway, the interesting bit was making something small, disposable and ARM-friendly. Me being me, I used Go, built it on Linux, and glued together a browser, SSH client and remote-display shell on top of tsnet, WebView2, RDP and VNC bits. It is a pretty great combination, when it works, but I ended up having to wire up a Linux WebKitGTK test shell to have reproducible debugging.
piclaw is still the thing I use to fix other things, so I kept poking at it even if I am a bit tired of the constant upstream churn from pi and associated paper cuts that come with maintaining a TypeScript application of its complexity. gi is now able to bootstrap itself, but not a replacement I can trust (I used Opus 4.8 on it and am still paying that technical debt), so I’ve actually been considering shifting to Codex for most things and use pi solely through IPC mode, which would mean going back, full circle, to vibes.
After last week’s foray into the topic I tried (and failed) to enjoy some retro gaming this week (even though I did get a bit of a kick of further automating my Steam setup), but to compensate I took another pass at my NeXT and Mac JIT emulators, partly because I realised that (you guessed it) Opus lied and failed to implement MMU and I/O emulation correctly across the board.
I have another Radxa board to test, and this time I decided to have a go at doing photogrammetry to capture enough of the relative dimensions to design a 3D printed case for it–and besides the App Store being crammed with scammy “3D scanner” apps that do very little else despite repackaging SimpleObjectCapture (which, incidentally, you can now build for yourself in half an hour using Codex) I also confirmed iOS Object Capture is not really that great for fine detail, at least in the default settings:
The yet untested Q8B
I suspect I will be getting back to CAD and 3D printing pretty intensely over the next few months (or whenever I can actually move around). My back is still complaining, but at least I have an entire work week of… more sitting to… look forward(?) to, starting tomorrow.
Jun 14th 2026 · 5 min read
·
#3d-printing
#after-dark
#cydintosh
#esp32
#hardware
#mac
#retrocomputing
This is the (very) abridged story of how I got After Dark running on my own flavour of the Cydintosh–specifically, Flying Toasters on an ESP32-S3 board, zooming along at 65 FPS, which is both completely pointless and one of the more satisfying things I’ve done this month.
When I first got wind of the Cydintosh, I immediately dug out one of my Cheap Yellow Displays and tried to get the software running on it, only to find out two things:
My Cheap Yellow Display was (predictably) different (same 240×320, but an ESP32-D0WD)
The resistive touch screen mine had, together with the relatively small size, made it unusable in practice
I mean, it ran, but… Here, you be the judge:
Yes, that is a coin cell, and this is a bit contrived of an example
The photo above was me trying to push the envelope a bit.
The truth is that even with proper 1:1 pixel scaling, portrait rendering and some creative interpretations of how to push faster screen updates through the SPI bus, and even considering the Mac Plus emulation and basic Wi-Fi access worked really well (because of the genius trick of exposing ESP32 hardware through to the emulator), it was painfully slow.
I did the obvious thing and ordered a couple more, larger displays with a slightly more powerful chip and proper capacitive screens:
Size comparison, still debugging display rendering
And the results were glorious: the new boards are labelled ESP32-8048S043C, and they come with an 800×480 capacitive panel, an ESP32-S3, and 8MB PSRAM, which is more than enough to run the emulator and a more complete version of Mac OS, running the original Musashi-based umac emulator at a fairly good speed (certainly faster than the original Mac Plus or even the Classic) and at full panel screen resolution (480×800) in black and white:
This reminded me a lot of my years doing PageMaker/print design
Everything went pretty swimmingly until I actually injected After Dark into the System folder:
Guess what, no keyboard. At all.
Without any way to input text (and no, Key Caps doesn’t really work for this), I had to resort to removing bits from the control panel with ResEdit until it mostly worked (removing the dialog was not trivial, and in the end I asked Codex to just disassemble the INIT(?) resource and skip over the dialog).
A few minor firmware tweaks later, I have a “real” Mac that runs After Dark as a screensaver perfectly:
Flying Toasters running on the ESP32-S3
In case you’re wondering, the toasters fly at a solid 65-67 FPS.
Which is faster than they ran on my actual Macintosh SE/30 back in 1991, because that machine was doing it in 1-bit black and white on a 68030 and this is a $15 display board doing it in 16-bit colour on a dual-core 240MHz chip with more RAM than my first three computers combined.
Not too shabby, even if it’s in black and white (which is what the original Mac ROM can handle).
And that is why I have gone down the rabbit hole of trying to port this to an LC ROM, on yet another ESP32 board (a P4). It turns out there’s not enough RAM on the other boards to hold a colour frame buffer, not enough bandwidth to do the delta refresh hacks I did in the original version, and not enough CPU power and storage for the required emulation and system changes…
I’m still grinding through the mechanics of paring down BasiliskII to fit, but in the meantime this board is sitting on my desk doing nothing but rendering endless toasters and reminding me some things are worth the sheer fun involved every time I glance at it.
Jun 11th 2026 · 22 min read
·
#ai
#hardware
#homelab
#linux
#reviews
#riscv
#sbc
#spacemit
This is a fascinating box–so much so that after almost three weeks playing with it, I amassed so much material that I nearly decided to split my review into two parts, but in the end I decided to condense it a bit and post a longer piece than usual, even if that means almost half of it is a fairly wide-ranging exploration of how to get AI workloads on it.
The MilkV Jupiter 2 in its metal case
Spoiler: We’re tantalizingly close to having usable non-GPU inference on SBCs, and surprisingly enough, RISC-V is more interesting than ARM right now.
I’ve tested a lot of ARM boards over the past few years, but only a couple of RISC-V machines–and the MilkV Jupiter 2 is quite a substantial system: Sixteen cores (with a twist), a refreshingly roomy 32GB of RAM, a 10GbE SFP, Wi-Fi 6, a GPU with actual DRM nodes, all in a Pico ITX form factor.
Disclaimer: my contacts at Radxa supplied me with a Jupiter 2 free of charge, and as usual, this article follows my review policy.
On paper, this is the first RISC-V board that doesn’t feel like a science project.
In person, and unlike most of the SBCs I get, the Jupiter 2 is a finished product, and came in a neat little box, fully assembled and contained in an unassuming metal case with external antennae as the only extra parts. No power brick, but since it has a USB-C PD port, I had zero trouble powering it from one of my monitors.
After some careful disassembly, the board itself is pretty dense: 1× DP out, 1× eDP ribbon, 1× USB-C PD power input, 3× USB-A 3.0, 1× GbE RJ-45, 1× 10GbE SFP+ cage, an M.2 slot and what looks like a second M.2 for storage. There are also MIPI/eDP ribbon connectors I haven’t tested.
The board is dwarfed on the top side by the cooler, which I dared not remove
The SoC is SpacemiT’s K3–a big.LITTLE style arrangement with 8×A100 cores at 2GHz and 8×X100 cores at 2.4GHz, which makes it the first RISC-V chip I’ve handled that has asymmetric core clusters. And since there are a few other devices out there with the same reference design, I will henceforth refer to the Jupiter as the K3 for short.
If you’ve never come across SpacemiT’s stuff before (I had only a bare inkling of the K1), I heartily recommend the public SpacemiT K3 documentation and their GitHub repository since the architecture is laid out there, and it was fairly easy to get a high level grasp. In particular, the K3 SoC datasheet has a pretty good overview:
Block Diagram from the K3 Technical Brief
A key thing that needs to be taken into account is that the A100 cores are fundamentally different from the X100 ones. They have extended vector instruction sets, dedicated transactional memory, and, well… AI.
That documentation also seems to be the original source of the marketing claims that the K3 provides 60 TOPS of AI compute and can run 30B models at over 10 tokens/s. Well, sort of– as another spoiler, I can share that I hit a hard cap at an effective 3B (which seemed to be the practical limit), but we’ll get there…
One of the nice things about this box is that it comes with a 10GbE Realtek NIC. I wasn’t able to test that at full speed yet since my 10GbE interfaces are all in my server closet, but the 802.11ax reported below worked flawlessly with my Wi-Fi 6 setup:
That sda (model TY7B-128) initially fooled me into thinking it was a SATA SSD–but there’s no SATA controller on this board, and the 3.4 GB/s reads I measured later are well past anything SATA III can do (~600 MB/s). It’s actually 128GB of onboard UFS, which rides the kernel’s SCSI layer and so enumerates as sda exactly like a SATA disk would (NVMe would be nvme0n1, eMMC mmcblk*). The mtdblock devices are the 8 MB NOR flash partitions (bootinfo, FSBL, env, eSOS, OpenSBI, U-Boot).
The sensors output is a bit weird, but it does cover all the CPU cores (A100 are clusters 0 and 1, X100 are 2 and 3). And I will have a bit more to say about the fan.
But I’m ahead of myself here–these were gathered after plugging it in, obviously, and it’s worth rewinding and going over that part:
This was a first-class experience, and I wish all SBCs worked this way: I plugged the DP port into my ancient LG Ultrafine, powered on the monitor, and got a Bianbu first-boot wizard in less than 5 seconds after the initial logo.
Clicked through it–language, timezone, user account–and landed on a working accelerated desktop. That’s it. No GRUB patching, no DTB hunting, no resize-filesystem bugs, no serial console required. The smoothest first boot I’ve had with an SBC all year.
The board ships with Bianbu 4.0 (“Resolute Raccoon”)–a Debian-based distribution from SpacemiT, which, unlike most ARM boards I’ve used recently, is actually running a modern 6.18.3 kernel.
MilkV Jupiter 2 LXQt on Wayland - note how only the first 8 cores are active
The desktop runs LXQt on Wayland, SDDM as the display manager, and the whole thing felt responsive enough that I didn’t immediately reach for the terminal. That is not something I say about SBC desktops often, and even though I then spent most of the past three weeks accessing it via ssh, I would likely have zero issues using it.
Standard apt works (repos seem to be at spacemit.com), Debian toolchain is present, and the kernel command line includes some interesting RISC-V-specific hints: unaligned_scalar_speed=fast and unaligned_vector_speed=fast, which I think are related to the RVV extended vector instruction set and the way the kernel does thread allocation.
I dug around a bit more and the boot chain goes through NOR flash (OpenSBI + U-Boot) → UFS, which is cleaner than the SD-card-based setups on most SBCs I’ve tested, and it was able to update itself without any issues:
Not UEFI, but compared to the U-Boot-on-SD-card experience that most ARM SBCs inflict on you, having a proper NOR flash boot chain with OpenSBI → U-Boot → onboard UFS is a step up, because it means you can brick the OS partition and still recover without reflashing an SD card on another machine (and yes, Rockchip, I’m looking at you).
And since it all worked out of the box, I did not try adding an NVMe (there’s an M.2 M-Key slot for one) or booting from it (yet), although since there is official Ubuntu support I fully intend to try that out in the future.
Developer tooling for RISC-V will be foremost on most of my readers’ minds, so I can tell you right away that I am currently making extensive use of these:
GCC 15.2 (riscv64)
Go 1.25.7 – works out of the box, which is significant for me
Python 3.14.3
Make 4.4.1
Sadly (for me), Bun isn’t available, since there’s no official riscv64 build available yet, but node works OK. I focused mostly on Go, though.
To get started, I ran a small battery of tests to get a feel for where this sits relative to the Orange Pi 6 Plus (CIX P1, 12 ARM cores) I’ve been living with for months.
Note that these benchmarks only ran on the X100 cluster (cores 0–7). The A100 cores (8–15) are kernel-fenced for AI work–htop shows them sitting idle, and sched_setaffinity silently refuses to pin anything there from a normal shell. The reasons for that are various and fascinating, and I’ll get into them below.
The sysbench single-thread number is the interesting one here: 2,329 versus 2,800. That’s only a 1.2× gap per X100 core. The 7-Zip figures (17.5k vs 42.3k MIPS) look damning until you realize that the A100 cores weren’t used at all, so the Jupiter 2 is really running 8 general-purpose threads against the P1’s 12.
The real gap shows up in Go and Python (4-5×), which probably says more about how young the riscv64 runtime backends are than about the hardware itself.
I went back and ran this in parallel on the CIX P1, and the K3’s memory bandwidth is much lower–roughly a fifth for reads. This is likely the biggest single performance gap and puts an upper cap on whatever the CPU can do regardless of how much it packs into each cycle. For inference workloads that are memory-bound, this matters a lot. The K3 has a few workarounds, though, as we’ll see later.
The built-in UFS storage is very nice–NVMe-class speeds, better than what I saw on the Orange Pi 6 Plus’s NVMe setup with my own (underused) PCIe 4 SSD. No complaints here.
The board stays well-behaved under sustained 8-core stress-ng:
Idle: 59-64°C, fan at 45% / 2335 RPM
Full load (30s sustained): 62-68°C, fan ramps to 60% / 3194 RPM
No throttling observed, which made my usual CPU/thermal charts kind of pointless
Again, stress-ng --cpu 0 ran on the 8 available X100 cores, but even when I ran both CPU and AI loads that used the A100 cores, the fan was audible but not objectionable–noticeably quieter than the Orange Pi 6 Plus’s cix-ec-fan in quiet mode, and the fan controller API is much saner.
Since I had a few tussles with the Orange Pi 6 Plus’s fan controller limitations, I let an LLM loose on /sys/devices, and it found out that the Jupiter’s fan is managed by a CrosEC controller over eSPI (/sys/devices/platform/soc/cac8c000.espi/84000000.ec). That exposes a standard hwmon interface with fan1_input and (surprisingly) fan1_fault that standard Linux utilities can read (and the built-in cooler does seem to have the right number of wires to provide fan sensing, which is a nice touch).
There’s also a separate pwm-fan platform device at /sys/devices/platform/pwm-fan/hwmon/hwmon8/pwm1 that accepts values 0-255 for direct duty-cycle control, with pwm1_enable=1 when thermal management is active, with a pwm-fan cooling device linked to thermal_zone0. In practice, you never need to touch any of this–the board keeps itself at 60-68°C under sustained load with the fan barely audible, even when using all 16 cores and at an ambient temperature of nearly 28°C in my office.
I stuck a USB PD power monitor between the PSU and the K3, and the figures were pretty stable: 11W idle, an oddly symmetrical 22W under load. I suspect using an SFP for networking will add significantly to that, but most of my testing was actually done by ssh over Wi-Fi.
Unlike the Orange Pi 6 Plus, where the GPU required driver rebinding and vendor package archaeology, the Jupiter 2’s PowerVR GPU works out of the box.
No module loading, no blacklisting, no package hunting. I ran vulkaninfo and got a conformant Vulkan 1.3 device on the first try, although I am not sure how far I can go with Vulkan compute on this board yet since I explored other avenues.
The hardware is an IMG PowerVR B-Series BXM-4-64 MC1, and Vulkan reports it cleanly:
deviceName = PowerVR B-Series BXM-4-64 MC1
driverID = DRIVER_ID_IMAGINATION_PROPRIETARY
apiVersion = 1.3.277
driverVersion = 1.588.1135 (24.2@6603887)
conformanceVersion = 1.3.8.1
Doing the usual barrel-scraping YouTube influencer “testing” of firing up a 4K video in the browser is… absurdly fluid, really, since the K3 has a dedicated video decode unit (/dev/video-dec0, V4L2 “mvx” driver–decode only, no hardware encode that I can find) and that seems to be properly stitched together on the Bianbu packages.
OpenCL 3.0 is also present, with cl_khr_fp16 and cl_khr_integer_dot_product – the latter suggesting hardware support for int8 dot products, which is exactly what you want for basic vision processing. I tried poking at it with my Vulkan tooling, and the Vulkan side exposes shaderFloat16 and shaderInt8, 16KB shared memory, and 2 compute queues.
In short, I had zero issues with desktop acceleration, and I expect the K3 to be well supported going forward. I do intend to explore Vulkan on this a bit more, but as you’ll see below, I got completely sidetracked by the ISA and how it does vector compute…
The device tree shows an Arm China Linlon V5 (Zhouyi AIPU) at c0500000, status okay.
Okay, then, but… the device-tree lacked the obvious NPU plumbing I am sort of used to from ARM:
/proc/device-tree/soc/linlon-v5@c0500000/compatible says arm china,linlon-v5
there are no /dev/aipu*, /dev/npu*, /dev/linlon* or /dev/zhouyi* nodes
there are no aipu, linlon or zhouyi kernel modules under /lib/modules/6.18.3-generic
dmesg is silent for those names
web searches for linlon-v5, arm china,linlon-v5, Zhouyi AIPU and SpacemiT K3 NPU drivers turned up no public driver or SDK that matches this node
The Linlon V5 block is effectively opaque–no driver, no SDK, no kernel module. So it’s a dead end for now, although I suspect there are drivers for it somewhere.
What is interesting is what’s hiding in Bianbu’s apt repository: a SpacemiT ONNX Runtime stack (spacemit-onnxruntime, python3-spacemit-ort) and a spacemit-tcm package. The latter ships libspine_tcm.so, spacemit-tcm-smi and a public spine_tcm.h, and it talks to /dev/tcm rather than to a classic /dev/npu device. That’s not an NPU path at all–it’s targeting the A100 RISC-V cores and their tightly-coupled memory directly.
After the first evening of poking around, I decided to do what most people would do and read some actual documentation–which wasn’t hard to come by.
The CPU chapter in SpacemiT’s documentation gave me a few hints: the A100 cores run SpacemiT-IME (Inference Matrix Engine), a set of custom RISC-V vector extensions for quantised matrix arithmetic, with a programming model that gave me a bit of a flashback to my FORTRAN and VAX days–matrices in registers, explicit tiling and core synchronisation–but as a crash course in what RISC-V vector extensions can actually do, it made for a fun read.
The short version, if you’re in a hurry, is that this is a “unified memory” RISCv system where the CPU itself can do some interesting quasi-GPU math:
The long version is that this is almost tailor made for go-pherence, my pet inference library. I’ve been trying to do mostly MLX-like FP16 stuff with it, but my intent is to do non-GPU stuff with it, and even though AVX2 and NEON are interesting, I was completely nerd-swiped by the idea of using this RISC-V RVV variant to do “proper” inference.
And Codex was able to sort out how to map this to useful steps and identify parts of the instruction set that could do just that:
The custom instructions (vmadotsu.hp, vmadotu.hp, vnpack4.vv, vupack.vv, vpack.vv) perform fused int4×int8 dot products with FP16 accumulation. Each vmadot dispatch processes 128 bytes of activation against 512 bytes of 4-bit weights, producing 32 partial results. The data layout treats VS1 as copies×(M, K) matrices and VS2 as copies×(K, N) matrices, with the result stored across VD(L) and VD(H).
The “hard” part was to map this to Go assembler, but, again, Codex had no trouble churning out code for vector operations by just lining up the right bits:
I had some trouble figuring out how this mapped to the TCM memory device that I had found, but a few more pages into the ISA doc it became clear:
TCM is 3 MB of on-chip SRAM (8 × 384 KB blocks), meant as a low-latency scratchpad for the IME2 matrix engine. According to the docs, both sets of cores can access it in pairs:
From the X100 cores (VLEN=256), TCM reads at 1.14 GB/s (uncacheable device memory)
From the A100 cores (VLEN=1024), it reads at 5.4 GB/s via a direct SRAM path for wide vector loads
This is a pretty dramatic difference from the RAM bandwidth I measured earlier, and even more so if you consider that the A100 cores can access it four times faster than X100 cores. And there’s more:
Cores are organised in pairs sharing TCM blocks, so they can exchange results much faster
I later found that SpacemiT’s own reference code uses paired-worker barriers to overlap DMA (weight prefetch from DRAM into TCM) with compute on the partner core
If you’ve ever done double-buffering, well, this is it applied to vector compute.
Armed with this knowledge, I distilled it into a SPEC and went to town on the K3 with Codex to see if we could port some of the go-pherence SIMD inference kernels, but there was a serious kink: I couldn’t for the life of me figure out how to schedule code on the A100 cores.
So I asked Codex to get out Capstone and disassemble the TCM libraries. Turns out getting a thread onto the A100 cores requires a two-step handshake:
write the thread’s TID to /proc/set_ai_thread (a kernel interface that unlocks scheduling on cores 8–15 for that specific thread)
then call sched_setaffinity to pin it.
Without the registration the kernel silently refuses the affinity change–those cores are fenced off from normal userspace entirely (which explains the oddities in the early benchmarking).
SpacemiT’s own llama.cpp fork (PR #22863) uses this pattern: six pthreads permanently pinned to cores 8–13, synchronised with spine_barrier_t (an atomic spinlock barrier), sitting in a persistent work loop that processes matrix tiles from a shared queue.
The workers never return to the OS scheduler between operations–barriers replace dispatch overhead entirely. I later realized that a) this is how the K3 can hit 35–40 tok/s on Qwen3-0.6B Q4_K_M b) Go scheduling has a lot more overhead.
Disassembling the ONNX runtime I’d found (SpaceMITExecutionProvider) showed it used the same cores with SPACEMIT_EP_* settings for thread count, profiling, and operator filtering.
So where does this leave us in terms of usable inference? Well, a lot of people like speed, and if you want speed, you can install llama.cpp-tools-spacemit 0.0.8 and run TinyLlama 1.1B Chat Q2_K (which is just 459MiB) with 8 threads:
Test
Result
Prompt processing pp128
137.47 ± 0.05 t/s
Token generation tg64
36.60 ± 0.01 t/s
This is pretty impressive as SBCs go, and no wonder I am starting to see YouTube videos demoing it—it fills up a screen impressively fast if you do a one-shot prompt, but is fundamentally useless.
The more interesting question is whether the K3 can host a usable local coding endpoint, so I worked through a spread of current models on a fork of the SpacemiT llama.cpp tree, all at Q4_K_M with f16/f16 KV and 8 threads.
I cranked out a Pi session and had it draft a realistic agentic coding turn: a system prompt with tool definitions, a prior read tool call, the file returned as context, and a request to produce an edit tool call - roughly 700-900 prompt tokens in, 700 generated out.
The results were… Interesting. And slow to achieve, not just because of the turn times but also because I had to patch llama.cpp to match minor changes in the Bianbu libraries:
Model
Type / active
RAM
Prefill (t/s)
Decode (t/s)
Overall† (t/s)
Turn
Qwen3.6-28B-REAP-A3B
MoE / A3B
17.3 GB
29.1
6.5
11.5
140s
Gemma 4 E4B
dense / 4B
4.9 GB
28.9
5.7
9.5
147s
Gemma 4 E2B QAT UD-Q4_K_XL
dense / 2B-ish
2.5 GB
99.6
12.9
-
18s/128 tok
Gemma 4 26B-A4B
MoE / A4B
16.9 GB
38.8
5.1
9.1
154s
Qwen 3.5-9B
dense / 9B
5.6 GB
22.5
4.5
8.2
195s
Gemma 4 12B
dense / 12B
7.3 GB
18.7
2.46
4.3
322s
Gemma 4 12B QAT UD-Q4_K_XL
dense / 12B
6.3 GB
25.0
3.6
4.2
~86s/300 tok
†Overall = (prompt + completion tokens) ÷ total compute time - blends prefill and decode for the turn.
So yes, it can run fairly decent models, but at slightly over 2 minutes a turn, not in a usable way. That doesn’t mean it can’t run LLMs, just that it can’t run moderately serious ones at speed (still, I’m pretty sure you can stuff a smaller Qwen variant in there and do simple things like home automation).
Since I happened to be playing with a few of these models on my RTX3060 (where they work at 4-8x the speed, making them quite usable), I copied the weights across and had Codex script out the same run across them with a few variations in settings:
Model
Note
Prefill t/s
Decode t/s
Qwen3.6-28B-REAP + ngram spec
copy-heavy task, 81% accept
29
15.5 (2× peak)
Qwen3.6-28B-REAP @ 64K ctx
light context
33.1
7.8
Qwen3.6-28B-REAP @ 262K ctx
full native context
21.5
9.8
Qwen3 0.6B
tiny model
293
55
Qwen3.6-28B Q4_0 (requant)
deep 9K ctx
21.7
3.5
Qwen3.6-35B-REAP + MTP
non-viable on this CPU backend
-
- (stalled)
The pattern is somewhat clear: on this memory-bandwidth-bound board, decode rate tracks 1 / active parameters – or something. Sparse mixtures-of-experts and sub-4B dense models “work”, but anything above 3B just doesn’t, really. And Multi-token prediction (MTP), which I had gotten to work pretty well on my 3060 under go-pherence, stalls completely.
Since Gemma 4 just came out (again, for what, the third time?) with QAT, I also tried both its MTP and QAT variants by patching llama.cpp a bit further (by this time I was really hooked).
And splitting the workload across core types actually “worked”: sticking drafters on the slower X100 and the rest on A100 was feasible, but… there’s no fast memory exchange between core types, so it was (verifiably) useless:
Gemma 4 E2B QAT was, however, “useful”, for a rather slow definition of it (~13t/s), and it is technically multimodal, but on the K3… not usable either. I tossed a 224×224 test image into it, which took roughly 39–47 seconds just to process through the projector, and even though it could identify a solid red square an equally simple red square/blue circle image came back as “yellow and white”. Might be my code, but by this time I had already started eating into my vacation days and I decided to call it quits.
The numbers are interesting, though, and make me wonder what a K3-like CPU can do with other kinds of models:
Model / route
Params (M)
RAM / file
Prefill t/s
Decode t/s
Notes
Qwen 3 0.6B
596
373 MB
37.5
43.5
Great demo, zero substance
Gemma 4 E2B QAT
4,630
2.5 GB
99.6
12.9
decent prose/code without toy-model speedups; can use tools, will keep it around
Qwen3.6-28B-REAP-A3B
28,240
17.3 GB
28.9
7.15
quality anchor; large context and actual coding
Gemma 4 E4B
7,520
4.9 GB
27.5
6.01
twice as big as E2B, and twice as slow
Gemma 4 12B QAT UD-Q4_K_XL
11,910
6.3 GB
25.0
3.6
sort of worked, but unusable
In practice, none of the model, quant, or speculative tricks break the ~7 t/s decode wall for genuinely useful generation on the quality models, and we’re stuck shuffling ~3B of active weights per token out of LPDDR.
I was able to get very close to the C numbers with the E2B QAT, so I will be playing with that a bit more–in fact, I think that the Gemma 4 models are the most interesting thing out there if, like me, you’re stuck with an RTX 3060 and a 36GB RAM M3 Mac as your top inference hardware…
I am quite taken by the K3, to the point where I just got Whisper going on it using go-pherence and am now trying to shoehorn various other things into it, but the summary is as follows:
Software maturity is surprisingly good – Bianbu 4.0 has a modern kernel (6.18), modern tooling, and has had zero papercuts (so far). This is not the “barely boots” RISC-V experience from two years ago, when Go barely worked out of the box on riscv64 without a full rebuild. And no, I didn’t try Rust yet, but I will, eventually.
Thermals are well-managed – never exceeded 68°C even under sustained 16-core load, fan ramps smoothly and is much quieter than the CIX P1.
The shipping storage is great – NVMe-class speeds from the onboard UFS (didn’t feel the need to add an NVMe), no SD card in sight
GPU and video decoding story is better than expected – PowerVR Vulkan 1.3 and hardware decoding both work out of the box.
The “normal” CPU cores (and memory bandwidth) are middling – per-core IPC is roughly half an A720, but the A100 RVV sort of makes up for it.
And even if it can’t do “real” LLMs, I am pretty sure the K3 can handle standard image recognition swimmingly. YOLOv5 has just come out even as I am putting this post together, so I haven’t tested it, but the key thing is that RISC-V is really interesting as a CPU platform now (at least for me). Of course, it has to come with enough RAM (and those 32GB RAM are probably the bare minimum any AI SBC should have for any realistic use), and times are tough, but I look forward to testing its descendant(s).
Right now I’ve embarked on the rather quixotic quest of getting Ideogram 4 on it (and yes, I know it won’t really “work”, but I wanted to have a go and have another “working” implementation besides the RTX back-end), and I expect I will spend a bit of time trying to tweak Qwen or Gemma 4 on it to see if I can have a permanent “house LLM” that doesn’t suck and can do basic automation (even if slowly) – and I’ll update this post (or add a link to it at the bottom) with any positive results.
Jun 8th 2026 · 3 min read
·
#ai
#apple
#automation
#design
#ios
#macos
#opinion
#siri
#wwdc
This was the weirdest WWDC26 keynote in a while, and some of the past ones were visibly phoned in. It was rife with weirdness and flashbacks.
To my surprise, a few of my wish list items actually made it. Naming the next macOS “Golden Gate” was not on my bingo card, though; a little too trippy and a lot too lofty for what is, by Apple’s own tacit admission, a Snow Leopard year: catching up rather than charging ahead.
The self-deprecating tone ran through the whole thing, from a hippy bus that was equal parts weird and funny to the unmistakable sense of a company that spent the past year watching the industry sprint past it on AI and is now, not running but sedately pacing, to catch up.
Much to my surprise, two of my top annoyances got airtime: they’re tackling Spotlight and Mail search, the exact failures I called out, although whether either works once it ships is anyone’s guess.
They’re also doubling down on automation, at least superficially, with vibecoded Shortcuts and a renewed push for third-party Actions. Vibecoding Safari extensions and Shortcuts is the genuinely interesting part: it points at automation rather than novelty, which is more than I can say for yet another Image Playground. None of it erases the brittleness and legacy gaps that made me want a real platform to begin with, but it’s at least pointing the right way. Tab grouping and change detection in Safari are a fun party trick, no more.
And yes, there’s a new, as-yet-unproven Siri (with a completely pointless AI moniker) you summon by holding the power button (part Spotlight, part walkie-talkie, plus a floating gelatinous orb in Vision Pro), and a Siri app trying to be a catch-all bucket for every interaction.
The new voice struck me as a little cringe and overly American, which is an odd note to land on when you want me talking to my machines all day. The feature set is fuzzy: on paper it can touch far more of my data, and moving photos to the shared library by voice would be neat if it works. But Siri has been stuck at “if it works” for fifteen years, and the one thing I actually want (for it to handle my mail and calendar properly) wasn’t demoed in any useful detail.
I wondered whether the automation push would reach HomeKit, and the answer is a shrug: the new camera detection is cute, but a YOLO model has done exactly that for a decade, and the automation logic I actually need stays vague. The rest of my list didn’t show at all: no hypervisor on the iPad, no running my own code without the annual toll, nothing on iCloud sync, the Watch, or SwiftUI. Maybe the sessions turn something up (which is why this is an early read), but my expectations haven’t budged.
The framing around Apple Foundation Models was the bigger tell: we already know there’s Gemini underneath, which leaves me wondering how much Apple is adding beyond the wrapper. Liquid Glass got the same treatment by being walked back in the most face-saving way imaginable, with the old Accessibility transparency slider re-warmed and trotted out as an improvement. Disingenuous is the word, twice over.
Update: Also much to my surprise, they actually mentioned unifying the corner radii, which I completely missed. I must have tuned it out after the 300 random percentage performance improvements they quoted against… no real baseline, really.
Anyway, Apple heard the parts of everyone’s complaints that a) did not force them to walk back Liquid Glass and b) fit the AI story it needed to tell, and stayed quiet on a lot of the boring structural stuff that’s been broken for years. Yes, they are committing to improving performance and fixing some of the most egregious issues, and that’s not nothing; hearing Spotlight and Mail search admitted out loud is more than I expected, but it is mostly Apple’s technical debt catching up with them, and, of course, Apple catching up with everyone else where it regards AI, but on its own terms and at its own pace.
Oh, and they deprecated pretty much all of my hardware, too. Kind of expected, much like the usual geographical restrictions, which mean a good chunk of this may not reach Portugal for a year, if at all.
I’m going to give it a couple of days until the dust settles, watch the Platforms State of the Union tomorrow, and then mull things over a bit more. And maybe, somehow, we can chalk up this WWDC as a sort of a win, in the long run.
Jun 7th 2026 · 2 min read
·
#ai
#calibre
#mcp
#niri
#noctalia
#notes
#weekly
I decided to take a couple of days off and generally tune out, thanks to a few strategically placed bank holidays – which meant my usual mix of relaxing and dealing with a few chores.
For starters, I replaced the battery on our A1466 MacBook Air, which just keeps on trucking – it’s now on its third battery (I swapped the factory one some four, or was it five, years ago). For around EUR 80, keeping that rather nice keyboard/screen/trackpad combination in use was a no-brainer, and it too now runs a Niri desktop, having been converted to Fedora a few months ago.
I’ve been automating away a fairly large chunk of VM and container management – I have a dedicated agent that knows how to manage my Portainer stacks and version them in Gitea, for instance – but as it turns out, LLMs are also pretty good at a few other things, like setting up emulators under Steam (creating nice icons, fixing controller input mappings, tuning upscaling and shaders, and the rest of it).
But I hadn’t let an LLM loose on my Calibre and music collections yet, and – with the right safeguards – it’s been awesome at tidying up metadata. I had dozens of ancient books with slightly broken Calibre metadata, so I’ve been putting together an MCP server that sits next to my library to fix them – mostly because I don’t want to give a model full filesystem access to my NAS, and this way I can snapshot the database whenever it tries anything more extensive. I may well make something more generic, given time.
Jun 5th 2026 · 4 min read
·
#apple
#automation
#ios
#ipad
#macos
#rant
#wwdc
Michael Tsai’s annual roundup of WWDC wish lists went up this week, and the thing that struck me most wasn’t any single request–it was the mood. There seem to be fewer wish lists than last year, several people openly admitted they couldn’t be bothered to write one, and the ones that did are pretty much bereft of any “aspirational” wishes.
In short, most Apple developers seem resigned to their fate, and echoed the same weary plea for a “Snow Leopard” year where Apple fixes things instead of shipping more, er… “liquid” junk.
One thing that is clearly apparent even to me (even though I am not doing a lot of Mac or iOS development save ios-linuxkit) is that we haven’t even got stability in the 26s yet (John Siracusa has a rather mordant take on that in the latest ATP episode), and in a couple of weeks we’ll get betas of the 27s piling bugs on top of bugs.
I already wrote my catalogue of what’s broken last month, so consider this the constructive inverse–roughly the same list, reframed as things I’d actually like to see fixed next week.
None of these are moonshots. Most have been fixable for years, and a fair few were working better a decade ago.
What’s changed for me is the agentic-era stakes: I now point Codex and Claude at almost every tool I use during the day, and Apple’s software is, conspicuously, the part that fights back hardest (although I can’t really go on about it much, this week’s MS Build is chock full of examples where Microsoft is way ahead of Apple in working AI integration, and it’s… just sad to me personally).
My expectations are effectively rock-bottom by now. Apple has become a hardware company where software seems to have been tacked on as a somewhat under-maintained afterthought. But I can’t help but keep a scorecard, so here’s what I’m hoping for–in rough order of how often it ruins my week.
I want Mail to be automatable again. Not necessarily the full plugin API they killed, but an AppleScript dictionary that isn’t frozen in amber and a MailKit surface that can file, tag and search without ceremony–because the one app I live in all day is the one black box I can’t point an agent at. While they’re at it, smart folders and rules that sync from the Mac should finally arrive on iOS, roughly twenty years late.
Spotlight should simply find things that exist. I’d settle for that alone–no AI, no reinvention–just reliable, complete results and the one-line reindex affordance the Mac has had for years made available on iOS, so a corrupted index doesn’t mean a multi-hour restore that breaks Apple Pay and FaceID along the way.
In the agentic era, automation needs to be a first-class platform, not an afterthought. Like many others, I wish for a way to programmatically create and modify Shortcuts; I also want Shortcuts that don’t break between OS releases, a genuine cross-platform story, and the MCP-style hooks that OpenAI and Anthropic have to keep reinventing to automate anything in macOS. Windows still does COM and Win32 automation so well that I built an agent tool against it in fifteen minutes–Apple should be embarrassed by that comparison.
Give the iPad back a hypervisor. Hypervisor.framework has been on the Mac since Yosemite and Apple Silicon runs Linux VMs beautifully, yet an EUR 1,400 iPad Pro with an M4 can’t run a container or a VM that a EUR 50 ARM board handles without breaking a sweat. The entire local-LLM and coding-agent ecosystem I depend on is locked out of the most powerful tablet I own.
HomeKit needs a scripting layer and real logic. Scene chaining, granular presence, if-this-then-that that actually works, and–for the love of everything–let HomeKit automations call Shortcuts, not just the reverse. I’ve papered over all of it with Node-RED and Home Assistant, but none of that should be necessary for someone who bought into the ecosystem.
Make iCloud sync trustworthy and give us Sync Now buttons across the core apps, the way Messages already has (for now, until they notice and remove it). Stop silently migrating data to CloudKit and leaving the CalDAV and IMAP paths to rot–document third-party access properly instead of letting Reminders and Notes quietly vanish from open protocols. Apple has never exposed any APIs worth using, and that needs to change.
The Watch should be the best time-aware device Apple makes, and instead it’s a widget carousel. I want a Pebble-style chronological timeline, a Smart Stack that’s actually aligned with my calendar, and the Watch independence Imthaz Ahamed asked for–let it pair with more than one phone.
Let me run my own code on my own hardware without an annual EUR 99 toll. I don’t want App Store distribution–I want a “just run this on my phone” mode in Xcode that doesn’t involve certificate chains that expire and silently brick my sideloaded apps.
Stabilise SwiftUI or admit it’s a research project. Views that worked on iOS 17 behave differently on 18 and seem broken on 26, and I lose hours dropping to UIKit to dodge layout bugs reported years ago. Steve Troughton-Smith’s dream of a real cross-platform successor to UIKit and AppKit is the one I’d trade everything else on this list for if I had to write iOS apps for a living.
And no, I’m not going to complain about Liquid Glass again. I don’t think anyone at Apple will ever own up to how much of a failure it was (even down to controls that provide user feedback but don’t register clicks at the very edge of them), and some of it was an improvement (the other 80% of spattering controls atop application content wasn’t).
Every one of these is within Apple’s reach. They have the engineers, the money, and total control of the platform, which is precisely why the pattern grates: this isn’t technical inability, it’s a decade of chosen neglect dressed up as focus, whether you look at it from the pure platform side or if you think about it in terms of the (utterly absent) third-party API integration surface.
This is, unashamedly, a bit of a rant. I’ve been using Macs since System 6 and writing here since the OS X betas, and I’ve watched the company get richer and more capable while the software I use every day gets quietly worse at the boring, essential things, and no wonder I have gradually started using other platforms to the point where most people don’t even consider this a Mac blog.
But I am deeply indebted to Apple for making the platforms that have kept me sane over multiple decades, and I do care about the ecosystem, so… Here we are.
I’d love to be proved wrong next week. I won’t hold my breath–but the scorecard is open, the pen is out, and if all we get is another year of razzle over the dazzle, at least I’ll have a checklist to tick off.
Jun 4th 2026 · 9 min read
·
#agentic
#ai
#anthropic
#codex
#coding
#copilot
#llms
#openai
#workflow
Since today is a bank holiday for me, I decided to consolidate a few more of my notes into a post. What follows is a set of guiding “principles” that I’ve found useful over the past year or so and that I’ve codified into various bits of scaffolding I reuse across my projects.
After years of rumors, NVIDIA is finally shipping an Arm chip for Windows PCs, and the part that interests me isn’t the GPU–it’s the up to 128GB of unified LPDDR5x memory sitting behind it, something that Qualcomm never really went for.
The RTX Spark is essentially a consumer rebrand of the DGX Spark dev box (which I’ve been trying unsuccessfully to get my hands on, by the way), pairing a 20-core Grace CPU (co-designed with MediaTek, all big and “medium” cores, no efficiency cores) with up to 6,144 Blackwell cores, roughly a desktop RTX 5070’s worth of GPU inside an 80W envelope.
Might be a little toasty for a laptop, and will have to be very power efficient if they really want to compete with Apple Silicon… But there are zero actual specs anywhere on the PR, and pricing is sure to be… interesting.
But it’s nice to see them chasing the same unified-memory architecture that makes Apple’s M5 Pro/Max and the Framework Desktop genuinely useful for running local models, since 100GB+ of addressable VRAM is a lot more useful than the insulting 8-12GB you get on a discrete 5070.
And the gaming angle also makes it pretty interesting. Prism translation has finally gotten good enough that productivity work feels indistinguishable, but gaming remains a minefield of anti-cheat kernels that simply refuse to run. Qualcomm never “fixed” that (nor pricing, or efficiency either).
If it didn’t feel like the end times for computer hardware right now, this would be amazing.
May 31st 2026 · 3 min read
·
#go-pherence
#hardware
#networking
#niri
#notes
#weekly
Today I realised that I could just spend the day doing essentially nothing and that nobody would hold it against me (at least in Western nations), so… I might well do just that, with a few caveats:
Allergy season is finally fading (at least for me), but today was the first time I had to turn on the AC in the office, and it was great to realize that despite the recent Wi-Fi changes and almost four years of potential HomeKit foibles, my ESP32 hack is still working perfectly.
A few months after writing up the Cudy AX3000 units and moving the house over to OpenWRT, I ended up revisiting the one bit I had deliberately waved away as “good enough”: roaming.
My sinuses are still giving me grief, but this week was much more successful at pretending to be enjoyable, at least. For starters, we watched Project Hail Mary, and it was every bit as good as I would expect it to be, which is very rare in movies these days.
I think it’s time for an update on my iPad Pro M1 and, most importantly, the Logitech Combo Touch I got for it. Think of it as a long term review of sorts.
This is a little bit of follow-up to my MiniBook X review – I keep using it routinely (especially when we travel for leisure) and love the little thing to bits, but I’ve been wanting to run it mostly on power saving mode to reap the most benefit out of the hardware (and battery, of course), so I started looking at desktop environment alternatives.
Yes, I could already get a full afternoon (and then some) out of it, but Apple Silicon has spoiled me as far as battery life expectations go, and GNOME has a little bit too much baggage for that kind of extended use.
Since I spend 90% of my time on it writing or coding and still have a penchant for keyboard-driven desktops, I initially switched to Fedora Sway Atomic (gotta love being able to swap environments with a single command…), but later installed Niri and Noctalia Shell because I really like both the idea of a scrolling window environment and the sheer polish of the whole thing–even if there are some rough edges here and there.
I am very happy with it, and writing plugins for it is trivial:
I hacked together a Bing Wallpaper plugin in 30m
The one thing that annoyed me to no end, though, was locking on suspend, which Noctalia Shell should do but apparently doesn’t in Fedora, so I had to resort to two hacks:
This last one feels extremely gauche and I hope to find a better way, but I guess this comes with the territory. I don’t really care about having a trendy Wayland desktop (I just want a dead simple one with a bit of polish), but I hope this kind of hacks won’t be necessary for much longer.
Oh, and of course I set gsettings set org.gnome.desktop.wm.preferences button-layout 'close,minimize,maximize:appmenu' to match macOS decorations.
I know this blog has strayed a fair distance from its Mac-centric origins, but I’ve been keeping a mental list of all the things that are broken, missing or inexplicably neglected in Apple’s software, and it’s gotten long enough that writing it down feels like a public service1.
The weather has gone a tad cloudy again, which provided me some relief from my allergies–but not enough for proper overnight rest, so yet again I arrived at Friday afternoon totally exhausted.
Last weekend my DS1019+ decided, for some unfathomable reason, to stop working after I took it out of the closet, dusted it and put it back, and I have feelings about it.
The Ternus announcement got me thinking about the one thing I keep wishing Apple would build and almost certainly never will: a family-scoped AI assistant that actually works across all our devices.
I am very late to this party (it was announced at I/O last week and I’ve been buried in other things), but Google is replacing Chromebooks with “Googlebooks”–Android-based laptops with Gemini baked in, designed to sync with your phone.
On the face of it, this looks like yet another Google rebranding exercise, and considering what happened with the Pixel and Google’s penchant for unveiling “category defining” devices they never actually sell worldwide, My first reaction was “meh”.
But with Android’s recent support for desktop windowing, resizable apps and Linux sandboxing, this is actually very interesting for me–because it means you can have a laptop that runs Android apps natively, has a proper desktop shell, and can spin up a Linux container for development work. All on ARM hardware.
If they get the desktop UX right (which is a big “if” given Google’s track record with consistency–I’ve set up Android 16 on a Pi and it sucks), this could be a genuinely compelling alternative to both Chromebooks and cheap Windows laptops–especially for people who already live in the Android ecosystem and want something that doesn’t fight them the way iPadOS does.
May 12th 2026 · 1 min read
·
#agents
#ai
#ide
#ipad
#opinion
#piclaw
#ux
This was a weird week, both because I keep waking up at 5AM with my sinuses clogged, and because I feel like I’m losing momentum. Feeling almost permanently cotton-headed, sleepy due to sheer exhaustion or because of antihistamines certainly has something to do with it, but I am not exactly enthusiastic this weekend.
Regular readers will know that I’ve spent most of the past two years shoehorning LLMs into single-board computers, partly as a learning exercise and partly because there are lots of local/”edge” applications where semantic reasoning (no matter how limited) and “interpretation” of sensor data are actually useful.
I genuinely did not see this coming. Cloudflare has been building one of the more coherent AI developer platforms out there–Workers AI, AI Gateway, Vectorize, their edge inference stack–all sitting on top of the same network they’ve been quietly expanding for years. They’ve been making real moves in the agentic space, not just slapping an LLM API on top of existing products, and I thought they were doing it in a way that would require more people, not fewer.
And yet: 1,100 jobs gone–roughly 20% of their workforce–with the explanation being that internal AI adoption changed what the company actually needs. I can follow the logic even if I find the timing jarring. I have friends in the Lisbon office–one of their larger European engineering bases, and one of the better things to happen to the local tech scene in recent years–and I’m genuinely hoping they’re alright.
May 7th 2026 · 2 min read
·
#agents
#ai
#anthropic
#codex
#coding
#llm
#openai
#opinion
I’ve been getting annoyed at constant code regressions in piclaw for the past few weeks. Something was off–even after bumping the test suite to the point where it catches most mechanical errors, gpt-5.5 kept making unrelated edits to code that should have been left alone, and I was getting really annoyed at babysitting it.
This was an absurdly productive week, at least on a personal level. I’m not sure whether to be pleased or worried about the number of projects that moved forward simultaneously, but here we are.
Of all the Maclock mods I’ve seen since I wrote up my review, this is probably the best all-round solution. It uses a custom PCB to drive a properly fitted display from a Pi Zero 2W running SheepShaver, and the result is a clean, self-contained build with none of the cable-routing bodges that plague most of these projects–and still uses the battery, which is great.
I’ve been deliberately not finishing my own Maclock mod, and this is serendipitous–sometimes waiting yields the exact solution you’d have spent weeks converging on from a worse starting point. The custom PCB is the key bit: it solves the button re-use and display connection in one go, which is the part I kept stalling on.
I’ll be using my own macemu build on it, now that the ARM64 JIT work has made SheepShaver fast enough to make even a regular Pi Zero feel snappy…
May 3rd 2026 · 1 min read
·
#apps
#ghostty
#ios
#ipad
#mosh
#ssh
#terminal
This has pretty much replaced Blink for me. rootshell is a Metal-accelerated terminal for iPhone, iPad, Vision Pro (ha!) and, surprisingly, the Mac, built on Ghostty’s rendering engine.
It is buttery smooth, and has proper mosh support, which means my sessions survive Wi-Fi handoffs and network changes without dropping.
The Ghostty bit matters because it means the rendering is fast, the font handling is good (it has Fira Code Nerd which has become by default), and the whole thing feels like a proper terminal rather than the usual iOS compromise. There’s also a built-in AI assistant that can execute shell commands locally, which sounds gimmicky but is surprisingly useful for one-off tasks when you can’t be bothered to type out a long find or awk invocation on a phone keyboard (I got it to work with a Gemini API key).
The one thing I’m missing is the ability to install my own commands like I do with A-Shell lets you–local binaries, custom scripts, that sort of thing. A-Shell’s approach of bundling a minimal Unix userland inside the app sandbox is still unmatched for offline tinkering, but it’s nice to have alternatives.