So You Want To Do Agentic Development

We’re three months into 2026, and coding agents have been a big part of my time since then–things have definitely intensified, and it has already panned out as expected: agents are everywhere.

Yes, I love this picture, and I won't apologize for it

My advice for people getting into this remains the same:

Choose Mature Tooling

In the music hobby, there’s a thing called GAS–Gear Acquisition Syndrome–where people get obsessed with buying the latest gear even if they don’t know how to use it. I see a lot of that in this space right now, so I’d rather recommend starting with mature, well-supported tools:

  • GitHub Copilot is still the best entry point–you can compare Claude, GPT and Gemini side by side, and it affords real control over the agent’s environment (plus it’s designed for enterprise use).
  • Mistral Vibe and Gemini CLI both have daily free tiers with enough fail-safes to experiment safely. (I still recommend sandboxing, but it’s less critical than it was a few months ago.)
  • OpenCode is the fully free route, but the models have fewer guardrails and can take unexpected actions–definitely sandbox this one.

I can’t in good conscience recommend spending hundreds on Anthropic or OpenAI subscriptions right now–the market is saturated, and both are shipping desktop tools (Claude Code, Codex) that will likely come with cheaper tiers. The “use boring technology” adage applies here too.

Of course you can also build your own setup by now and keep full control, but that’s a long-term project, and I don’t think it’s the best starting point for most people.

Sandboxing

I never run agent tools on a machine with personal data–that’s why I built agentbox in the first place. You may not need to be as cautious as I am, but dev containers now work on any platform, and both Anthropic and OpenAI are shipping sandboxes with their tools, so there’s really no excuse.

Privacy and Security

There are rather a lot of misconceptions about data privacy, and things like OpenClaw don’t help matters (I’m still gobsmacked people give it direct access to their e-mail). Even with enterprise-hosted models that don’t train on your data, “don’t run untrusted code on a machine with personal data” is a principle worth keeping.

Like I was quipping the other day, AI is the new digital advertising–and yet people are giving it more access to their data than they give ad networks, which is just baffling.

The Local Fallacy

And speaking of OpenClaw–the “local AI” fallacy needs addressing. None of these things are really “local” in any meaningful sense–the gap between local and cloud models is huge, and even tens of thousands of dollars in hardware won’t get you close to frontier capabilities.

The agentic loop is inescapable

“Fast, Good, Cheap: pick two” still applies, and it’s easy to get bitten by technology advances: I’ve bought hardware that is already obsolete.

And there is an almost weekly hype cycle around local models that I think is totally unwarranted in practice. For instance, Qwen is promising, but the local quantizations match last year’s cloud models at best, and the gap keeps widening.

Workflow

I keep coming across people who say AI generates rubbish code, and I think it’s usually one of two things: wrong tools (a good agent harness does much more than provide a UI–it curates context and feeds models your code structure), or wrong approach (they’re quite literally “holding it wrong”). I wrote about this two years ago and the fundamentals haven’t changed.

Part of it is inflated expectations, of course. Frontier models like Opus 4.6 and GPT-5.4 are very capable, but they need skill to use effectively, and they never produce perfect code on the first try. You have to know how to use them, and that takes practice.

I’ve been refining how I work with them since then, and although things have evolved quite a bit, the core principles remain the same.

SPEC.md

Every project starts with a SPEC.md that I flesh out with the model, 20-questions style, until it covers the essentials–goals, functional specs, non-functional specs, technical specs, and acceptance criteria.

I prefer SPEC.md over PRD.md because it emphasises specification over requirements–I want the agent to follow it, not interpret it freely.

This isn’t a prompt–it’s a living document that evolves with the project. And it works–feeding one an actual ECMA-376 spec document got me to 60% compliance in days, with no hallucinated APIs.
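As an illustration only, a hypothetical skeleton built from the five sections listed above might look like this:

```markdown
# SPEC: <project name>

## Goals
- One-paragraph statement of what the project is for and who uses it.

## Functional Specs
- Observable behaviours the system must implement, one per bullet.

## Non-Functional Specs
- Performance, security, and reliability constraints.

## Technical Specs
- Languages, frameworks, data formats, external APIs.

## Acceptance Criteria
- Checkable conditions that define "done" for the agent.
```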

SKILL.md

I complement specs with SKILL.md files–guidelines for coding, tooling, or domain-specific tasks. I have a growing collection in agentbox, and every new project starts with a make init that pulls the relevant ones in.

You can also fold these into .github/copilot-instructions.md (which Copilot picks up automatically), but standalone skills are tool-agnostic.

The properly interesting bit is that agents can now write their own–piclaw built its own hot-reload, backup, and web scraping skills after I’d guided it through the process a few times. Early days, but that’s where this is headed.

Skills aren’t a cure-all–models struggle to chain skills together, whereas MCP narrows context and presents clear next steps. But for teaching how rather than what, skills are still invaluable.

The PLAN.md Loop

After doing the prep work, I go into a loop:

The workflow

From the SPEC.md, I create a PLAN.md–not a flat TODO checklist, but a structured breakdown the agent can reason about (what’s done, what’s blocked, why). It updates the plan as it goes, which also refreshes model context. No reliance on built-in planning tools (which are patchy across models), and the plan is always in the repo for me to review.
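For illustration only (structure hypothetical, in the spirit of the description above), such a PLAN.md might look like:

```markdown
# PLAN

## 1. Scaffolding – done
- [x] Repo layout, Makefile, CI

## 2. Data model – in progress
- [x] Schema draft
- [ ] Migrations (blocked: waiting on final field list in SPEC.md)

## 3. API – not started
- Depends on the data model; see SPEC.md acceptance criteria
```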

The loop itself is pretty simple:

  • I break down work into focused chunks–scaffolding, data model, API, etc.
  • The agent writes code, lints, tests, documents, and updates the PLAN.md.
  • I review and steer–correcting, feeding it more context, or pointing at examples.

Steering

The most important bit of the loop–and the one most people get wrong at first. Effective steering isn’t about reprompting and hoping for the best; it’s about funnelling the agent’s attention to the right context:

  • Tests first–describe expected behaviour, ask the agent to make them pass. My most reliable workflow, especially when porting across languages.
  • Linting and static analysis via Makefile or CI–the agent self-corrects. I aim for 80% coverage as a quality bar.
  • Steering by example–pointing at existing code that demonstrates the right approach.
  • Claude still writes dodgy tests (ok, fine, it forgets all sorts of corner cases), so I use Codex models for test and security audits.
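As a toy illustration of the tests-first bullet: write the expected behaviour as failing tests, hand them to the agent, and steer until they pass. A minimal sketch with a hypothetical `slugify` helper (the implementation stands in for what the agent would eventually produce):

```python
# Hypothetical example: the tests are written first and given to the
# agent; the implementation below stands in for the agent's output.
import re

def test_slugify_lowercases_and_hyphenates():
    assert slugify("Agentic Development") == "agentic-development"

def test_slugify_strips_punctuation():
    assert slugify("SPEC.md, revisited!") == "spec-md-revisited"

def slugify(text: str) -> str:
    """Lowercase, collapse non-alphanumeric runs to hyphens, trim ends."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
```

The point is that the tests encode intent unambiguously, so the agent self-corrects against them instead of guessing.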

Yes, that’s proper work. But it’s no different from managing humans, and it gets easier with practice.

Language Matters

Some languages are inherently better for agents–not in terms of popularity (any agent can write decent code in the popular ones), but because strong types and annotations help models understand intent and self-correct.

In my experience, strongly typed languages work much better than loosely typed ones (too many revisions, too-opinionated frameworks). They help the model navigate through explicit function and interface references, and come with excellent tooling for enforcing good practices–which is why I’ve been using them a lot more recently.

What’s Next

My workflow is working well, and I don’t see myself changing it anytime soon. But I’ve been slowly extending my agents’ reach–piclaw, which started as a weekend hack, is now part of my daily routine, and I’ve been giving it more autonomy as I learn to trust the guardrails.

The next frontier (boy, is this a pompous term, but I guess it’s sneaking in this late in the evening) is getting agents to collaborate–sharing context and skills to work as a group. I have some ideas, but that’s a matter for another post.

Notes for March 2–8

This was a frankly absurd week work-wise, with some pretty long days and a lot of late-night hacking on my projects (not exactly a new thing, but at least now I am asking piclaw to do some of it during the daytime, which is a small improvement).

Speaking of which, it’s pretty much finished UX-wise:

Our spiffy demo animation

And I made it a point of letting piclaw release v1.0.0 by itself, which was kind of fun:

It does everything now

It’s pretty much feature complete mobile-wise, too:

  • I can log in via TOTP or Passkeys (WebAuthn), which is probably the only real use I ever found for them.
  • The UI is very nice to use on both the iPhone and iPad, and I can now edit files on desktop as well (I added a CodeMirror-based workspace editor in a moment of creeping featuritis, and it has turned out very useful).
  • I added lots of little quality-of-life features like a context usage indicator, thinking level display, and history navigation.
  • Most of the internals seem sound, with all the rough corners I found fixed.

I’m still (ironically) struggling with Azure OpenAI integration, especially since GPT-5.3 Codex and 5.4 have a few twists in their API responses that are causing some issues, but I think I have most of the kinks worked out now.

And, since I can’t leave well enough alone, I also backported most of it to vibes and am working on a way for piclaw instances to talk to each other and share information.

Yes, I am building a little agent society. I must have drunk too much of that Kool-Aid… But still, it’s an interesting thing to play with, even if my priority this weekend was getting an ARM SBC up and running with a custom Debian image that piclaw built for itself–which is something I never expected to be doing when I started this project.

I’m sort of expecting Sarah Connor to burst through the door at any moment and tell me I have to stop.

Notes for February 23–March 1

Well, going back to work after a week off was rough.

It’s been a long while since I’ve been this stupendously unwilling to do my day job, and although I quickly got a handle on things, the entire week felt like recurring confirmation that I am not doing the kind of work I want to be doing.

Fortunately I was able to have quite a bit of fun with my own projects in the wee hours.

Daily Vibing

I have been using vibes almost continuously to convert a few thousand posts to the new format and accelerate this site’s migration to its future engine.

Over the past week, this site went from a rough 40/60 split between the two formats to around 75% converted, which is a huge improvement.

The agent is a little bit too enthusiastic about the conversion and sometimes screws up the formatting (especially for tables), but it’s a massive improvement over scripts, one-shot prompts or even my custom server.

Fortunately I had the foresight to make things like my link tables and other structured data use YAML, which made it much easier for the agent to parse and convert things without screwing up the formatting.

And, every morning, I can just pop open vibes on my phone and ask it to file new interesting links, or start a draft that I can finish later.

It is still too eager in marking pages as “new” (which I’ve had to fix a few times), but the fact that it can reclassify URLs on its own and validate inter-page links is a huge help.

The Lobsters Have Landed

This is not even its final form

piclaw, however, has been a completely different ball game. It shares a lot of DNA with vibes (in fact, it started as a fork of vibes that I then refactored and rewrote to handle pi), but the fact that it is an agentic system with a persistent memory that I can access from my phone has made it a much more powerful tool for me.

The main reason why is that I can pull out my phone or iPad (or a webapp on my desktop) and quickly get the agent to follow up on things using text or voice input, which is surprisingly effective with frontier models.

Regardless of the mania, there is a lot to be said for an agentic loop with a dead simple memory (I gave it access to its own sqlite database besides files) and a bunch of SKILL.md files that it can write itself.

Over this past week, piclaw has rebuilt itself into a much more robust and capable system:

  • The first skill it built for itself was a “hot reload” skill that allows it to update its own codebase and restart itself, which is… exactly the thing that is going to bring about the singularity, I’m sure.
  • Learned to back itself up to Azure using restic every night, and to restore itself if something goes wrong.
  • Learned to use playwright to scrape nearby restaurant listings and suggest places to order from at lunchtime.
  • Refined that to do a comparative analysis of pretty much every inference API out there and build a pricing calculator for itself, plus a neat SVG chart with our token usage.
  • Checked a few sites every couple of days and gave me updates on interesting content that doesn’t have a feed (it’s also been able to do quite interesting “news alerts” off my summaries, which is something I will play around with more).
  • Searched ArXiv for new papers in my areas of interest, summarized them, and cross-referenced them with my notes to give me an overview of a couple of topics I needed for work.

But these were “just” new SKILL.md files that it wrote for itself and then loaded, which is impressive but not exactly groundbreaking.

Then it started improving itself:

  • Added the ability to switch models on the fly, which also means that scheduled tasks can now run on cheaper models automatically. Thanks to the brilliant design of pi itself, scheduled runs are appended to a side branch in the session tree and then the cursor is restored. The result is fully logged, but it doesn’t pollute the main model’s conversational context.
  • The web UI gained a workspace explorer with previews, downloads, and file‑reference pills that feed directly into prompts. I had piclaw and vibes steal each other’s CSS and UI tweaks as we implemented new features, and the result is a much more responsive and usable interface.
  • Large messages now have previews and digests, keeping the UI responsive. The timeline does infinite scrolling and full-text search, and the model can search it itself.

But the most impressive thing it did was build me a custom Debian Trixie image for a new ARM board that almost booted on the first try.

And it only failed because of a quirk in the EFI/DTB boot sequence, which is… pretty impressive.

I gave it two things:

  • The URL to the vendor source tree, which I cloned
  • Access to a local VM running Ubuntu

…and it was able to ssh in, review the build scripts, cross-compile the kernel and use debootstrap to build a custom image for the board, which I then flashed onto an SD card.

That got to the “starting kernel” stage but then threw an error because EFI and DTBs are fickle beasts and it skipped a step in the process that I didn’t know about.

The Future Is Not What I Expected

After implementing automatic model switching, piclaw is running Codex 5.1 Mini or GPT 5 Mini most of the time for basic tasks, and it knows to switch to Opus 4.6 when interpreting content or Codex 5.3 Max when fixing code, so costs are… manageable. But I am not leaving it unsupervised with the more expensive models.

Right now I haven’t given piclaw access to any of my actual data except for some project code (which stays under my review, so I am the one pushing commits to GitHub), but I’m pretty sure I will have one or two projects it will have direct access to in the near future.

But then again, piclaw is not OpenClaw–it is a much simpler, much more focused thing. I know every single thing it’s doing (and every single line of code it’s written for itself), so I had no compunction in giving its early iterations access to my vault, and will likely do so on a permanent basis soon.

I get the hype–it is indeed a much nicer way to use agents than, say, webterm.

Passkeys for my pocket

Right now I’m working on a little something that will make it easier to access. The existing tooling is awesome, but I need a brain-dead authentication mechanism, so I’m prototyping bouncer, which is going to be a reverse proxy that only supports passkey authentication and nothing else.

I have zero belief that passkeys are a real solution for consumer authentication, but if you flip the script and use them for temporary mobile authentication tokens, then they actually make sense.

However much that might be the case, it was a great excuse to draw another cute mascot, and I am pretty happy with how it turned out:

The bouncer mascot

Notes for February 16–21

This week I did something different: I took a wellness break from work and generally tried to tune out all the noise and messiness I have been experiencing there. It ate a chunk out of my PTO, but was mostly worth it.

Mood-wise that was quite the challenge with the lousy weather we’ve been having, but the lovely golden hours we get in our living room and today’s sunny morning have gradually bolstered my spirits to a fair degree of optimism that I fear will be wasted on the upcoming work week.

However, it still took me four whole days to completely tune out. No email, no calls (other than personal ones), and, finally, settling into having a decent night’s sleep without my waking up to the prospect of a day’s worth of meaningless meetings where I learn nothing new.

I also tried to go cold turkey on social networks, YouTube, and pretty much any kind of doomscrolling, which took most of those four days to accomplish (i.e., to move from “ooh, dopamine” to “why am I looking at this stuff?” and make a conscious decision to do something else).

The timing was tricky given all the madness (and I did peek at various corners of the internet to keep track of that), but overall I just tried to relax and do stuff that was meaningful to me–unhurriedly, thoughtfully, without trying to cram or rush things.

  • The eternal strife to keep a clean desk at my home office continues. I went there a couple of afternoons and tidied up, but gradually gravitated back to our sunnier living room.
  • I spent a bit of time trying to figure out a few things, with mixed results.
  • I got an agent to clean up my vault’s duplicates. I installed it via Apple’s container runtime (which works very well with agentbox), told it to do an FTS index and evaluate things based on similarity, and it did a pretty wonderful job, including re-tagging most of it. I now have it converting old posts on this site, five at a time (so I can manually review them using lazygit).
  • I built my own OpenClaw, which was much more amusing than I would have thought since when I asked it “what am I watching on my Android TV” the thing installed nmap and adb, found my SHIELD, connected to it (which I allowed) and figured out the YouTube video I was watching by just inspecting the Android runtime, which was kind of mind-blowing. I know what it did and how it “reasoned” its way through it, but no wonder people are going crazy.
  • For an encore, I told it to connect to my broker, and of course it was much better than Siri at figuring out home automation—it even told me “there is someone sitting at the desk” because it detected both presence and small movements in the payloads, and offered to turn the office lights off when I left (which I already have as an automation, but it couldn’t see that).

And of course I kept polishing my various projects. I may have accomplished little, but feel a lot better than I did last week.

macOS Tahoe 26.3 is Broken

I have no idea of what is happening since I can’t even find any decent logs in Console.app, but it seems that the latest update to macOS Tahoe (26.3) has a serious bug.

My desktop (a Mac Mini M2 Pro) has been crashing repeatedly since the update, and the symptom is always the same: it suddenly becomes sluggish (the mouse cursor slows down, then freezes completely), and then after a minute both my displays flash purple and then go black as the machine reboots.

For a machine that used to be on 24/7 with zero issues, this is a definite regression.

I’ve tried keeping Activity Monitor open to see if I can spot any process that is consuming too much CPU or memory (the obvious suspect would be WindowServer), but everything looks normal until the crash happens, and there is only so much attention one can spare while, you know, doing productive stuff.

I have a sneaking suspicion that it might be Apple’s container framework, but I have no way to confirm that other than (I guess) stop using it for a few days.

But, again, the fact that I can’t find any logs or error messages in Console.app and that searching for anything inside it is still a nightmare is a testament to Apple’s general software quality and ability to troubleshoot anything these days…

I filed it as FB21983519 in case anyone cares.

Update, a few hours later: It’s not the container framework. I just came back to my machine after lunch and it had crashed again, so I guess it’s something else. Again, nothing shows up on the console logs other than the BOOT TIME entries for the reboot. There were no hardware changes, no new peripherals, and I’ve removed most of my login items (not that I have that many, and neither of them was recently updated).

Update, a day later: The crashes continue. My MacBook Pro also mysteriously crashed while asleep–the first time this has happened to me in recent memory, and also running 26.3. I have no idea if it’s related, but it seems to be a similar issue. Again, no logs or error messages in Console.app other than the BOOT TIME entries.

Update, two days later: A few people have suggested this might be a buggy interaction between container and Time Machine, which would be serendipitous, and although I did indeed remove ~/Library/Application Support/com.apple.container from my desktop early on and still had a crash, my MacBook Pro never ran container in the first place, so I don’t think that’s the issue. And somehow, Touch ID stopped working on my MacBook Pro, which is just the cherry on top of this whole mess.

Update, a week later: I reinstalled macOS (sadly, Tahoe again) on my desktop and removed an errant sparse disk image from my laptop—apparently Time Machine was indeed the problem on that machine at least, since even though I wasn’t running container on it I did have a personal encrypted volume that 26.3 objected to. I now suspect that both 26.2 and 26.3 broke Time Machine in different ways.

TIL: Claude Opus 4.6 Can Reverse Engineer STL Files

So, I finally got around to opening the kit I’ve had sitting around for a while, and I was looking at the STL files for the screen bracket that are over on Printables.

Since I want to fit a lot more electronics into the case, one of the things I need to do is modify that bracket a bit so I can actually use the brightness knob to drive a potentiometer instead of just a dummy knob.

But you can’t really edit STL files, and I don’t have the original CAD files, so I thought “well, maybe I can just ask an agent to reverse-engineer the STL into an OpenSCAD file, and then I can modify that”.

So I created a SPEC.md file with the following content, spun up a new agentbox container to handle it, and gave it to pi:

# STL to OpenSCAD reconstruction

You are running inside a Debian system with sudo, brew, python3 and whatever you need to accomplish this task.

Your job is to take the STL file in this workspace and reconstruct a copy of it using OpenSCAD.

If the STL is too large or complex, you can try decimating it/using octrees/segmenting it/using marching cubes as needed.

## Requirements

- The final STL file must be within 0.1mm accuracy of the original
- The OpenSCAD code must be modular, readable, and have relevant constants surfaced at the beginning for a customizer
- You should focus on building reusable tools, not one-off scripts
- You should maintain a plan.md file with your progress checklist and tick off items (including failed approaches) as you make progress

Much to my surprise, Claude Opus 4.6 was able to handle this task pretty well, and it produced a stupefyingly accurate file:

// WonderMac Screen Bracket - OpenSCAD Reconstruction
// Reconstructed from STL analysis
// All dimensions in mm

/* [Main Dimensions] */
// Total width of the bracket (X direction)
bracket_width = 68;         // X: -50 to 18
// X offset (left edge position)
bracket_x_offset = -50;

/* [Back Plate] */
// Height of the tall vertical back plate
back_plate_height = 21.5;   // Z: 0 to 21.5
// Thickness of the back plate (Y direction)
back_plate_thickness = 2;   // Y: -16.5 to -14.5
// Y position of back plate inner edge
back_plate_y = -16.5;

/* [Top Shelf] */
// Bridge depth (connects back plate to top lip)
bridge_depth = 5.5;         // Y: -14.5 to -9
// Bridge height
bridge_height = 2;          // Z: 0 to 2
// Top lip depth
top_lip_depth = 2;          // Y: -9 to -7
// Top lip height
top_lip_height = 5;         // Z: 0 to 5

/* [Frame] */
// Frame outer Y bounds
frame_outer_y_min = -66.5;
frame_outer_y_max = -14.5;  // matches back plate
// Frame inner bounds (lower section Z=0-7)
frame_inner_x_min = -47;
frame_inner_x_max_lower = 18;  // open right side
frame_inner_y_min_lower = -64.5;
frame_inner_y_max_lower = -16.5;
// Frame inner bounds (upper section Z=7-9)
frame_inner_x_max_upper = 13;
frame_inner_y_min_upper = -62.5;
frame_inner_y_max_upper = -17.5;
// Frame heights
frame_lower_height = 7;     // Z: 0 to 7
frame_upper_height = 2;     // Z: 7 to 9

/* [Bottom Lip] */
// Lip extension below frame
lip_y_min = -70.5;
lip_y_max = -66.5;
lip_z_min = 3;
lip_z_max = 5;
lip_height = 2;             // lip_z_max - lip_z_min

/* [Extended Plate] */
// L-shaped plate at Z=5-7
plate_z_min = 5;
plate_z_max = 7;
plate_height = 2;
// Bottom part of L (full width)
plate_bottom_y_min = -9;
plate_bottom_y_max = -3;
// Left part of L (extends upward)
plate_left_y_max = 15;
plate_left_x_max = -30;

/* [Screw Post] */
// Cylindrical post with through-hole
post_center_x = -41;
post_center_y = 10.5;
post_outer_radius = 2;
post_inner_radius = 1;
post_z_min = 0.5;
post_z_max = 5;             // meets plate bottom

/* [Screw Holes] */
// Hole radius for mounting holes
mount_hole_radius = 0.75;
// Lip mounting holes
lip_hole_left_x = -47.75;
lip_hole_right_x = 15.75;
lip_hole_y = -69;
// Plate mounting holes
plate_hole_left_x = -46;
plate_hole_right_x = 15;
plate_hole_y = -5;

/* [Resolution] */
$fn = 64;

// ============================================
// Modules
// ============================================

module back_plate() {
    translate([bracket_x_offset, back_plate_y, 0])
        cube([bracket_width, back_plate_thickness, back_plate_height]);
}

module bridge() {
    translate([bracket_x_offset, back_plate_y + back_plate_thickness, 0])
        cube([bracket_width, bridge_depth, bridge_height]);
}

module top_lip() {
    translate([bracket_x_offset, -9, 0])
        cube([bracket_width, top_lip_depth, top_lip_height]);
}

module frame_lower() {
    difference() {
        // Outer rectangle
        translate([bracket_x_offset, frame_outer_y_min, 0])
            cube([bracket_width,
                  frame_outer_y_max - frame_outer_y_min,
                  frame_lower_height]);
        // Inner cutout
        translate([frame_inner_x_min, frame_inner_y_min_lower, -1])
            cube([frame_inner_x_max_lower - frame_inner_x_min,
                  frame_inner_y_max_lower - frame_inner_y_min_lower,
                  frame_lower_height + 2]);
    }
}

module frame_upper() {
    difference() {
        // Outer rectangle (same as lower)
        translate([bracket_x_offset, frame_outer_y_min, frame_lower_height])
            cube([bracket_width,
                  frame_outer_y_max - frame_outer_y_min,
                  frame_upper_height]);
        // Inner cutout (smaller = thicker walls)
        translate([frame_inner_x_min, frame_inner_y_min_upper, frame_lower_height - 1])
            cube([frame_inner_x_max_upper - frame_inner_x_min,
                  frame_inner_y_max_upper - frame_inner_y_min_upper,
                  frame_upper_height + 2]);
    }
}

module bottom_lip() {
    translate([bracket_x_offset, lip_y_min, lip_z_min])
        cube([bracket_width, lip_y_max - lip_y_min, lip_height]);
}

module extended_plate() {
    // Bottom part of L (full width, Y=-9 to -3)
    translate([bracket_x_offset, plate_bottom_y_min, plate_z_min])
        cube([bracket_width,
              plate_bottom_y_max - plate_bottom_y_min,
              plate_height]);
    // Left part of L (Y=-3 to 15, X=-50 to -30)
    translate([bracket_x_offset, plate_bottom_y_max, plate_z_min])
        cube([plate_left_x_max - bracket_x_offset,
              plate_left_y_max - plate_bottom_y_max,
              plate_height]);
}

module screw_post() {
    translate([post_center_x, post_center_y, post_z_min])
        cylinder(r=post_outer_radius, h=post_z_max - post_z_min);
}

module screw_post_hole() {
    translate([post_center_x, post_center_y, -1])
        cylinder(r=post_inner_radius, h=back_plate_height + 2);
}

module lip_holes() {
    // Left lip hole
    translate([lip_hole_left_x, lip_hole_y, lip_z_min - 1])
        cylinder(r=mount_hole_radius, h=lip_height + 2);
    // Right lip hole
    translate([lip_hole_right_x, lip_hole_y, lip_z_min - 1])
        cylinder(r=mount_hole_radius, h=lip_height + 2);
}

module plate_holes() {
    // Left plate hole
    translate([plate_hole_left_x, plate_hole_y, plate_z_min - 1])
        cylinder(r=mount_hole_radius, h=plate_height + 2);
    // Right plate hole
    translate([plate_hole_right_x, plate_hole_y, plate_z_min - 1])
        cylinder(r=mount_hole_radius, h=plate_height + 2);
}

// ============================================
// Assembly
// ============================================

module bracket_assembly() {
    difference() {
        union() {
            back_plate();
            bridge();
            top_lip();
            frame_lower();
            frame_upper();
            bottom_lip();
            extended_plate();
            screw_post();
        }
        // Subtract all holes
        screw_post_hole();
        lip_holes();
        plate_holes();
    }
}

bracket_assembly();

This is what the output looks like

But what is more important is that I was able to capture the entire process in a SKILL.md file, and it makes for an amazing read:

# SKILL: STL to Parametric OpenSCAD Reconstruction

## Goal

Reverse-engineer a binary/ASCII STL mesh file into a clean, parametric OpenSCAD source file that reproduces the original geometry within a specified tolerance (e.g. 0.1mm Hausdorff distance).

## When to Use

- You have an STL file of a mechanical part and need an editable parametric source
- The part is primarily composed of prismatic (box-like) and cylindrical features — not organic/sculpted shapes
- You need the output to be human-readable and customizable, not just a mesh re-export

## Prerequisites

- **Python packages**: `numpy`, `trimesh`, `scipy`, `shapely`, `networkx`, `rtree`, `numpy-stl`
- **System packages**: `openscad`
- Install with: `pip3 install numpy trimesh scipy shapely networkx rtree numpy-stl` and `sudo apt-get install openscad`

## High-Level Approach

### Phase 1: Mesh Triage

Load the STL with `trimesh` and gather key statistics to understand the scope:

- **Vertex/face count**: Determines complexity. Under ~5k faces is likely a machined/printed part with clean geometry.
- **Bounding box and extents**: Gives the overall dimensions.
- **Volume and watertightness**: Confirms the mesh is valid and closed.
- **Euler number**: Computes genus (number of through-holes). Formula: `genus = (2 - euler_number) / 2`. This tells you how many holes to find.

### Phase 2: Identify Z-Level Structure

For prismatic parts (brackets, enclosures, mounts), the geometry is almost always built from features extruded along one principal axis. Identify which axis that is by examining the unique coordinate values of vertices.

1. **Find unique vertex coordinates** along each axis (rounded to ~3 decimal places). The axis with the fewest unique values is the extrusion/stacking axis.
2. **List the discrete levels** on that axis. For this bracket, Z had only 8 unique values: `[0, 0.5, 2, 3, 5, 7, 9, 21.5]`. Each pair of adjacent levels defines a "layer" of constant cross-section.
3. **Count up-facing and down-facing face areas** at each level. Up-facing faces at a Z-level mark the *top* of a feature; down-facing faces mark the *bottom* of a feature starting at that height. The area values serve as checksums for your reconstruction.
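Step 1 above boils down to a few lines (a hypothetical helper for illustration; the actual implementation lives in `tools/stl_zlevel.py`):

```python
def extrusion_axis(vertices, ndigits=3):
    """Pick the axis with the fewest distinct coordinate values.

    vertices: list of (x, y, z) tuples. Returns (axis_index, sorted_levels);
    for prismatic parts the levels are the discrete layer boundaries.
    """
    levels = [sorted({round(v[a], ndigits) for v in vertices})
              for a in range(3)]
    axis = min(range(3), key=lambda a: len(levels[a]))
    return axis, levels[axis]
```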

### Phase 3: Cross-Section Analysis

Take planar cross-sections at the midpoint of each layer using `trimesh.section()`:

1. **Slice the mesh** at each intermediate Z value (e.g. Z=0.25, Z=1, Z=2.5, etc.).
2. **Convert to 2D polygons** via `section.to_planar()` and examine the `polygons_full` property.
3. **Simplify polygons** with Shapely's `.simplify()` to reduce curved arcs to key vertices while preserving corners.
4. **Transform back to world coordinates** using the planar transform matrix to get actual XY positions.
5. **Record each polygon's exterior and interior (hole) boundaries**. Note how many vertices remain after simplification — a 5-point polygon is a rectangle, a 9-point polygon is an L-shape, a 17-point polygon is a circle approximation, etc.

Track how the cross-section *changes* between layers — this reveals where features start, end, merge, or split.
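The simplification step can be approximated without Shapely by dropping vertices that are nearly collinear with their neighbours (a rough sketch only — Shapely's `.simplify()` uses the Douglas–Peucker algorithm, which is more robust):

```python
import math

def simplify(poly, tol=0.05):
    """Drop vertices closer than tol to the chord of their neighbours.

    Collapses sampled arc points while preserving sharp corners;
    poly is a closed polygon as a list of (x, y) tuples.
    """
    out = []
    for i in range(len(poly)):
        a, b, c = poly[i - 1], poly[i], poly[(i + 1) % len(poly)]
        # twice the triangle area over |ac| = perpendicular distance from b to line a-c
        area2 = abs((c[0] - a[0]) * (a[1] - b[1]) - (a[0] - b[0]) * (c[1] - a[1]))
        base = math.hypot(c[0] - a[0], c[1] - a[1])
        if base == 0 or area2 / base > tol:
            out.append(b)
    return out
```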

### Phase 4: Identify Geometric Primitives

From the cross-section data, decompose the shape into CSG primitives:

- **Rectangles** (5 simplified vertices = box cross-section): Record corner coordinates, extrusion height range.
- **L-shapes / U-shapes** (9+ vertices): Decompose into union of rectangles, or model as rectangle-minus-rectangle.
- **Circles / arcs** (17+ vertices after simplification): Compute center as midpoint of extremes, radius as half the span. Verify by checking vertex distances from the computed center — they should all equal the radius.
- **Rings/annuli** (polygon with circular hole): Outer and inner radius from the exterior and interior boundaries.

For each primitive, determine:
- XY bounds or center+radius
- Z range (which layers it spans)
- Whether it's additive (part of the union) or subtractive (a hole to difference out)
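The circle check described above can be sketched like this (names are illustrative, assuming simplified 2D vertices):

```python
import math

def fit_circle(points, tol=0.05):
    """Hypothesize a circle (center = midpoint of extremes, radius = half
    the X span) and verify every vertex sits within tol of that radius.

    Returns (cx, cy, r) on success, None if the points aren't circular.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    r = (max(xs) - min(xs)) / 2
    if all(abs(math.hypot(x - cx, y - cy) - r) <= tol for x, y in points):
        return cx, cy, r
    return None
```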

### Phase 5: Cross-Validate with Vertex Grouping

For extra confidence, directly examine the raw vertices at each Z-level:

- Group vertices by their Z coordinate.
- For levels with few vertices (≤20), print them all — these directly reveal rectangle corners.
- For levels with many vertices, look for clusters. Compute distances from suspected circle centers and verify constant radius.
- Check that circle parameters (center, radius) are consistent across multiple Z-levels.

### Phase 6: Build the OpenSCAD Model

Structure the `.scad` file for readability and customization:

1. **Constants at the top** in OpenSCAD Customizer sections (`/* [Section Name] */`). Every dimension gets a named variable with a comment showing its physical meaning and original coordinate range.
2. **One module per feature**: `back_plate()`, `frame_lower()`, `screw_post()`, `lip_holes()`, etc. Each module is self-contained and uses only the global constants.
3. **Assembly module**: A single `bracket_assembly()` module that `union()`s all additive features, then `difference()`s all holes. This keeps the boolean logic clean and makes it easy to toggle features.
4. **Resolution control**: A single `$fn` parameter controls circle smoothness globally.

Modeling patterns:
- **Rectangular frame**: `difference()` of outer `cube()` minus inner `cube()`.
- **L-shaped plate**: `union()` of two overlapping `cube()` calls.
- **Through-hole**: `cylinder()` with height extending past the material (add 1mm on each side with `-1` offset and `+2` height to ensure clean boolean cuts).
- **Ring/post**: `cylinder()` for the outer, with a through `cylinder()` subtracted.

### Phase 7: Render and Compare

1. **Render** with `openscad -o output.stl model.scad`.
2. **Compare** using a reusable Python comparison tool that computes:
   - **Bidirectional surface distance**: Sample 50k points on each surface, find nearest point on the other surface using `trimesh.nearest.on_surface()`. Report mean, max, 95th/99th percentile.
   - **Volume difference**: Compare `mesh.volume` values.
   - **Bounds match**: Check bounding boxes agree within tolerance.
   - **Topology match**: Compare Euler numbers.
3. **Iterate** if the Hausdorff distance exceeds the tolerance. Common fixes:
   - Wrong dimension by a small amount → re-examine vertex coordinates at that Z-level
   - Missing feature → look at the worst-mismatch sample points to locate the problem area
   - Circle approximation error → increase `$fn`
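Conceptually, the bidirectional distance reduces to a max-of-min over two point samples (a naive O(n·m) sketch for clarity; a real `compare_stl.py` would use trimesh's spatial indexes instead):

```python
import math

def bidirectional_distance(a, b):
    """Symmetric Hausdorff distance between two point samples a and b
    (lists of coordinate tuples): the worst-case nearest-neighbour
    distance, taken in both directions."""
    def one_way(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(one_way(a, b), one_way(b, a))
```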

### Phase 8: Verify the Accuracy Target

The final gate is the bidirectional Hausdorff distance. For this task the target was 0.1mm; the achieved result was 0.004mm (25× better than required). The residual error comes entirely from polygon approximation of circles (`$fn=64` gives a theoretical max error of `r × (1 - cos(π/64))` ≈ 0.0024mm for r=2mm).
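The quoted bound is easy to verify: the maximum radial deviation of a regular n-gon inscribed in a circle of radius r is the sagitta of each chord, r × (1 − cos(π/n)):

```python
import math

def polygon_error(r, fn):
    """Max radial deviation of an fn-sided polygon approximating a circle
    of radius r (the sagitta of each chord)."""
    return r * (1 - math.cos(math.pi / fn))

# polygon_error(2, 64) is roughly 0.0024 mm, matching the residual quoted above
```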

## Key Lessons

1. **Z-level analysis is the critical insight for prismatic parts.** If the mesh has only a handful of unique Z values, the part is a stack of extruded profiles and can be exactly decomposed.
2. **Cross-sections + simplification finds the primitives fast.** Shapely's `simplify()` with a small tolerance (0.05–0.1mm) collapses arcs to their key points while preserving sharp corners.
3. **Euler number tells you how many holes to find.** Don't stop looking for features until you can account for all `(2 - χ) / 2` topological handles.
4. **Face normal grouping separates flat vs. curved surfaces.** Axis-aligned normals (±X, ±Y, ±Z) are planar faces; all others are cylinder walls. The Z-component of non-axis normals reveals whether cylinders are vertical (Z=0) or angled.
5. **Up/down face area sums serve as checksums.** Compute the expected area of each horizontal surface from your model parameters and verify it matches the STL. This catches dimension errors before rendering.
6. **Model in original coordinates, not relocated.** Keeping the STL's native coordinate system avoids translation errors and makes comparison trivial.
7. **Build the comparison tool first.** A reusable `compare_stl.py` with surface sampling and Hausdorff distance makes iteration fast and objective.
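The normal-grouping heuristic from lesson 4 is equally compact (a sketch assuming unit normals as (nx, ny, nz) tuples):

```python
def classify_face(normal, tol=1e-3):
    """Axis-aligned unit normal -> planar face; anything else is assumed
    to be a cylinder wall, vertical when its Z component is ~0."""
    if any(abs(abs(c) - 1.0) <= tol for c in normal):
        return "planar"
    return "vertical-cylinder" if abs(normal[2]) <= tol else "angled-cylinder"
```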

## Reusable Tools

All tools live in `tools/` with CLI interfaces, `--help`, and `--json` output.
See `tools/README.md` for full usage.

| Tool | Phase | Purpose |
|------|-------|---------|
| `tools/stl_info.py` | 1 | Mesh triage: stats, topology, genus, components |
| `tools/stl_zlevel.py` | 2 | Find discrete height levels, face areas, vertex coords |
| `tools/stl_cross_section.py` | 3–4 | Slice mesh, extract & classify 2D polygons |
| `tools/stl_normals.py` | 4 | Face normal grouping, cylinder feature detection |
| `tools/compare_stl.py` | 7 | Bidirectional Hausdorff distance, volume, topology |

### Quick-start workflow

python3 tools/stl_info.py part.stl                          # What am I dealing with?
python3 tools/stl_zlevel.py part.stl --vertices              # Layer structure + corners
python3 tools/stl_cross_section.py part.stl                  # Auto-slice cross-sections
python3 tools/stl_normals.py part.stl                        # Find cylinders and holes
python3 tools/stl_cross_section.py part.stl --axis x --at 0  # Hidden internal features
# ... write OpenSCAD model ...
openscad -o output.stl model.scad
python3 tools/compare_stl.py part.stl output.stl             # Verify accuracy

## Deliverables

| File | Purpose |
|------|---------|
| `tools/` | Reusable analysis toolkit (see `tools/README.md`) |
| `bracket.scad` | Parametric OpenSCAD source with customizer sections |
| `bracket_output.stl` | Rendered STL for comparison |
| `plan.md` | Progress checklist with identified components and results |

…and yes, it also created tools for its own use. It’s not a chimpanzee using a stick to get at termites, but it is pretty close: it’s an LLM creating its own toolchain to get at the underlying geometry of a mesh.

This is far more sophisticated than I expected, and it shows that LLMs can handle scoped reverse-engineering tasks with the right prompting and constraints. But it also shows that you need to understand the problem domain yourself and guide the model with the right structure and checks to get a usable result.

The caveat is that this is a very specific use case (STL to OpenSCAD for prismatic parts), and I wouldn’t expect the same approach to work for more complex shapes or different file formats without significant adjustments. But it’s very much above and beyond what we could do a year ago.

Now excuse me while I go and give it a development board’s dimensions and ask it to design a custom case for it…

Update: Pretty impressive results on the custom case as well–here’s a first draft of it, which is already pretty close to what I need:

An Orange Pi 6 Custom Case, generated after a few web searches and image references

Notes for February 8-15

This was a grueling week at work, but I managed to keep polishing my tools, even if I decided to reinvent a couple of wheels in the process.

go-te and webterm

Since I kept coming across weird edge cases in pyte, I decided to create go-te, a Go equivalent that complies with the esctest2 test suite. This was a rather roundabout way to ensure I could port webterm to Go, but the icon alone was totally worth it:

Gophers masquerading as terminals are doubly cute

After that was done, porting webterm to use go-te and improve responsiveness only took a couple of hours.

Go is great for AI-assisted porting

I am very happy with Go as a target language for most of my new projects–I had dabbled with it a few years back, but the opportunity to have an LLM deal with all the messiness of its error handling and map out the standard library for me has been unparalleled.

Right now, it’s the best language for my use cases, since it has:

  • Native speed (with some runtime overhead, but much better than Python).
  • Full portability (which I’ve found very useful in go-rdp and webterm).
  • A standard library that is still not fully “batteries included”, but backed by a now-huge range of stable libraries that slot directly into most of my needs, and a finally mature ecosystem.
  • Extreme LLM/agent friendliness (Go’s strict interface definitions and formatting conventions mean it’s almost impossible for an LLM to confuse similar symbols, and I’ve had pretty much zero compilation errors across several projects).
  • Great support for testing and fuzzing (even if a tad too basic sometimes)

And yeah, I could probably have done Rust instead, but I prefer code I can read without delving into too many abstraction layers, and I haven’t yet come across a Rust-shaped problem (even if I am eyeing some embedded hardware project ideas).

Test-driven porting is the way to go

As far as AI-assisted porting goes, the best results I’ve had have come from leveraging an existing test suite to steer the LLM towards the right behavior and edge cases. This is especially true when the original codebase is large and complex, and when the target language has a different paradigm or ecosystem.

  • The Preact port of the dashboard didn’t have a lot of tests, but the existing JS code could be coaxed into play. I’ve been poking at it slowly because bun still has broken WebSocket support, but it was one of the first major test-driven ports I did with Codex 5.1, and it showed me how to steer LLMs effectively at medium scale.
  • go-rdp plays off the MSRDP specs to build its test suite (and there are copious references on the Internet), and I could leverage a lot of my knowledge and background into steering the models directly towards testing the weird edge cases. I’m using it daily (although I need to clean up and push some tweaks).
  • go-ooxml is in a bit of a weird spot right now because the ECMA specifications have some gaps and modern Office versions have enriched the schemas in compliant but formerly unforeseen ways, but it is very much usable and will be part of a future project I’m planning out.
  • go-busybox is still churning through the original busybox test suite, and the real sticking point right now is ash and OS signal handling (which I might not even need, but which I would like to have working).
  • go-te, my pyte replacement, used both the pyte and esctest2 test suites to validate terminal behavior and build a completely usable library overnight.
  • webterm was ported to Go in less than two hours, largely because the test suite was translatable across as well.
  • go-textile is my port of the Textile markup parser, which uses a variant of the RedCloth test suite to ensure compatibility with the reference implementation.
  • sashimi, my ongoing port of this site’s Python engine to Go, takes the same approach and is pretty much done as well, except for the changes I want to make going forward. The core was “done” in less than two hours, and I had over 9000 “test cases” of live site content (then I spent two more hours having it optimize a custom wordcloud generator down to ~87ms, which was a fun intellectual puzzle).

None of these things are perfect, and they’re at most 80% production-grade if you intend to make them load-bearing for a business venture, but they are excellent for my use cases because I had zero alternatives that matched my requirements.

And I am focusing on libraries and tooling on purpose–right now, I don’t think I can trust agents to build a full product from scratch without a lot of human oversight, but this way I can trust them to build the building blocks that I can then stitch together and harden as needed.

The Haves and The Have Nots

One thing I’ve noticed over the past few weeks is the huge gap in both the perception and the availability of AI tooling.

William Gibson is, again, right: The future is already here—it’s just not very evenly distributed.

Take a dive into the churning cesspool of Twitter/X, and it’s pretty obvious: on one side we have obvious influencers (and people with what has to be significant disposable income) claiming to run the most expensive Claude Opus subscriptions, and on the other side people who are bargain-hunting inference endpoints or trying to make do with open-source models.

But what amazes me are the people who are paying north of $20,000 for fully kitted out Mac Studios—sometimes in pairs or trios—to run local inference. I will freely admit I wish I had that amount of money lying around (I am still trundling along with a 3060 and a pretty much vanilla MacBook), but the key thing I’m noticing is that people without the means to access frontier models (or even decently performing local ones) are quite likely not going to be able to form any sort of grounded opinion on AI usefulness.

Open-source models are being hyped as one of the things that will change this, but the latest GLM and Kimi releases (or even the Qwen3 variants) all require an eye-watering amount of resources, so they aren’t going to help bridge this gap.

The result is that we have a growing divide between the haves and the have-nots in the space. Those with access to powerful models and resources are able to experiment, innovate, and create new applications, while those without are left behind, or (even worse) unable even to realize what is happening.

I have no solution for this, especially not when personal hardware is getting so expensive, but I hope we’ll sort it out over the next few years—preferably by moving models to the edge and making them more efficient, which has felt overdue for a while now.

Clair Obscur

Le cul de sac
I've actually never played RPGs before, but Expedition 33 has a… certain style to it. Maybe it's the French dialogue.

Notes for February 2-7

Half my working week was spent at an internal company thing, so I decided to make the best of my weekend and start cutting down on coding a bit.

Read More...

Accelerando, But Janky

The past couple of weeks have been sheer madness in the AI hype space, enough that I think it’s worthwhile capturing the moment for posterity.

Read More...

Notes for January 26 - February 1

I’ve had some feedback that my last few weekly notes have been a bit too long and that I should try to keep them shorter and more focused on a single topic.

Read More...

Thoughts on AI-Assisted Software Development in 2026

A few things I jotted down while building out my agentbox and webterm setups and other things.

Read More...

Vibing with the Agent Control Protocol

Although most of my actual work happens elsewhere, I have been wanting an easy way to talk to the newfangled crop of agents from my iPhone.

Read More...

Seizing The Means Of Production (Again)

Since my last update, I’ve been hardening my agentbox and webterm setup through sheer friction. The pattern is still the same:

Read More...

TIL: Apple Broke Time Machine Again On Tahoe

So… Here we are again.

Today, after a minor disaster with my vault, I decided to restore from Time Machine, and… I realized that it had silently broken across both my Tahoe machines. I use a NAS as a Time Machine target, exporting the share over SMB, and that has worked flawlessly for years–this came as a surprise because I could have sworn it was working fine a couple of months ago. But no, it wasn’t.

For clarity: It just stopped doing backups, silently. No error messages, no notifications, nothing. Just no backups for around two months. On my laptop, I only noticed because I was trying to restore a file and the latest backup was from December. On my desktop, I had a Thunderbolt external drive as a secondary backup.

After some research, I found out that the issue stems from Apple’s unilateral decision to change its SMB defaults (apparently without notifying anyone), and came across a few possible fixes.

What Seems To Be Working Now

I found this gist (reproduced here for posterity) with a fix that seems to be working for me, but it entails editing the nsmb.conf file on the Mac itself–not exactly ideal, since I’m pretty sure Apple will break this again in the future.

sudo nano /etc/nsmb.conf # I used vim, of course

…and adding the following lines (the file should be empty):

[default]
signing_required=yes
streams=yes
soft=yes
dir_cache_max_cnt=0
protocol_vers_map=6
mc_prefer_wired=yes

The explanation here is that Tahoe changed the default from signing_required=no to stricter control, and NAS devices with relaxed SMB settings cannot handle this without explicit configuration.

Another common pitfall is name encoding: you should remove non-ASCII characters from the .sparsebundle name (that wasn’t an issue for me, but YMMV).

On the Synology side, the recommendation was to go to Control Panel > File Services > SMB > Advanced and set:

  • Maximum SMB protocol: SMB3
  • Enable Opportunistic Locking: Yes
  • Enable SMB2 Lease: Yes
  • Enable SMB Durable Handles: Yes
  • Server signing: No (or “Auto”)
  • Transport encryption: Disabled

That doesn’t quite match my DSM UI, but it’s close enough, and my settings now look like this:

My SMB settings, as of DSM 7.3.2-86009-1

My Backup Backup Plan

Since I’m tired of Apple breaking Time Machine every few years and of the lack of transparency around it (it’s not Synology’s fault), I have decided to implement a more robust solution that doesn’t depend on Synology’s SMB implementation.

I already have a Proxmox box with an LXC container running Samba for general file sharing, so I decided to look into using that as a possible Time Machine target.

As it happens, mbentley/timemachine is a Docker image specifically designed for this purpose, and it seems to be well-maintained, so I’m testing it like this:

services:
  timemachine:
    image: mbentley/timemachine:smb
    container_name: timemachine
    restart: always
    network_mode: host
    environment:
      - TM_USERNAME=timemachine
      - TM_GROUPNAME=timemachine
      - PASSWORD=timemachine
      - TM_UID=65534 # 'nobody' user
      - TM_GID=65534 # 'nobody' group
      - SET_PERMISSIONS=false
      - VOLUME_SIZE_LIMIT=0
    volumes:
      # this is a pass-through mountpoint to the ZFS volume in Proxmox
      - /mnt/shares/timemachine:/opt/timemachine
    tmpfs:
      - /run/samba

Right now the first option seems to be working, but I will probably switch to the container solution in the near future, since it gives me more control over the implementation and avoids relying on Synology’s software.

But if anyone from Apple is reading this: please, stop breaking Time Machine every few years. It’s a critical piece of infrastructure for many users, and the lack of communication around these changes is frustrating.

The Third Way: Borg Backup

I have been using Borg for some time now, and I am considering using it for my Macs as well. The available macOS tooling seems decent, I just haven’t tried it yet.

A Minor, Yet Annoying, Additional Problem

To top it off, earlier this morning I tried to set up a new device and found that the infamous Restore in Progress: An estimated 100 MB will be downloaded… bug (which has bitten me repeatedly over the last six years) is still there.

The usual fix was hitting Reset Network Settings and a full hardware reboot, plus reconnecting to Wi-Fi… But this time it took three attempts.

Come on, Apple, get your act together. Hire people who care about the OS experience.

Notes for January 19-25

Since my last update, I’ve been heads-down building a coding agent setup that works for me and using it to build a bunch of projects, and I think I’ve finally nailed it. A lot more stuff has happened since then, but I wanted to jot down some notes before I forget everything, and my next weekly post will probably be about the other projects I’ve been working on.

Read More...

The NestDisk

This one took me a while (for all the reasons you’ll be able to read elsewhere in recent posts), but the NestDisk has been quietly running beside my desktop for a month now, and it’s about time I do it justice.

Read More...

Notes for January 1-18

Return to work happened mostly as expected–my personal productivity instantly tanked, but I still managed to finish a few things I’d started during the holiday break–and started entirely new ones, which certainly didn’t help my ever-growing backlog.

Read More...

My Rube Goldberg RSS Pipeline

Like everybody else on the Internet, I routinely feel overwhelmed by the volume of information I “have” to keep track of.

Read More...
