My WWDC 26 Wish List

Michael Tsai’s annual roundup of WWDC wish lists went up this week, and the thing that struck me most wasn’t any single request–it was the mood. There seem to be fewer wish lists than last year, several people openly admitted they couldn’t be bothered to write one, and the ones that did are pretty much bereft of any “aspirational” wishes.

In short, most Apple developers seem resigned to their fate, and echo the same weary plea for a “Snow Leopard” year where Apple fixes things instead of shipping more, er… “liquid” junk.

One thing that is clearly apparent even to me (even though I am not doing a lot of Mac or iOS development save ) is that we haven’t even got stability in the 26s yet (John Siracusa has a rather mordant take on that in the latest ATP episode), and in a couple of weeks we’ll get betas of the 27s piling bugs on top of bugs.

I already wrote my catalogue of last month, so consider this the constructive inverse–roughly the same list, reframed as things I’d actually like to see fixed next week.

None of these are moonshots. Most have been fixable for years, and a fair few were working better a decade ago.

What’s changed for me is the agentic-era stakes: I now point Codex and Claude at almost every tool I use during the day, and Apple’s software is, conspicuously, the part that fights back hardest (although I can’t really , this week’s MS Build is chock full of examples where Microsoft is way ahead of Apple in working AI integration, and it’s… just sad to me personally).

My expectations are effectively rock-bottom by now. Apple has become a hardware company where software seems to have been tacked on as a somewhat under-maintained afterthought. But I can’t help but keep a scorecard, so here’s what I’m hoping for–in rough order of how often it ruins my week.

  • I want to be automatable again. Not necessarily the full plugin API they killed, but an dictionary that isn’t frozen in amber and a MailKit surface that can file, tag and search without ceremony–because the one app I live in all day is the one black box I can’t point an agent at. While they’re at it, smart folders and rules that sync from the Mac should finally arrive on , roughly twenty years late.
  • Spotlight should simply find things that exist. I’d settle for that alone–no AI, no reinvention–just reliable, complete results and the one-line reindex affordance the Mac has had for years made available on , so a corrupted index doesn’t mean a multi-hour restore that breaks Apple Pay and FaceID along the way.
  • In the agentic era, automation needs to be a first-class platform, not an afterthought. Like many others, I wish for a way to programmatically create and modify ; I also want Shortcuts that don’t break between OS releases, a genuine cross-platform story, and the MCP-style hooks that OpenAI and Anthropic have to keep reinventing to automate anything in macOS. Windows still does COM and Win32 automation so well that I built an agent tool against it in fifteen minutes–Apple should be embarrassed by that comparison.
  • Give the iPad back a hypervisor. Hypervisor.framework has been on the Mac since Yosemite and Apple Silicon runs Linux VMs beautifully, yet an EUR 1,400 iPad Pro with an M4 can’t run a container or a VM that a EUR 50 ARM board handles without breaking a sweat. The entire local-LLM and coding-agent ecosystem I depend on is locked out of the most powerful tablet I own.
  • needs a scripting layer and real logic. Scene chaining, granular presence, if-this-then-that that actually works, and–for the love of everything–let HomeKit automations call , not just the reverse. I’ve papered over all of it with Node-RED and Home Assistant, but none of that should be necessary for someone who bought into the ecosystem.
  • Make iCloud sync trustworthy and give us Sync Now buttons across the core apps, the way Messages already has (for now, until they notice and remove it). Stop silently migrating data to CloudKit and leaving the CalDAV and IMAP paths to rot–document third-party access properly instead of letting Reminders and Notes quietly vanish from open protocols. Apple has never exposed any APIs worth using, and that needs to change.
  • The Watch should be the best time-aware device Apple makes, and instead it’s a widget carousel. I want a -style chronological timeline, a Smart Stack that’s actually aligned with my calendar, and the Watch independence Imthaz Ahamed asked for–let it pair with more than one phone.
  • Let me run my own code on my own hardware without an annual EUR 99 toll. I don’t want App Store distribution–I want a “just run this on my phone” mode in that doesn’t involve certificate chains that expire and silently brick my sideloaded apps.
  • Stabilise or admit it’s a research project. Views that worked on iOS 17 behave differently on 18 and seem broken on 26, and I lose hours dropping to UIKit to dodge layout bugs reported years ago. Steve Troughton-Smith’s dream of a real cross-platform successor to UIKit and AppKit is the one I’d trade everything else on this list for if I had to write iOS apps for a living.

And no, I’m not going to complain about again. I don’t think anyone at Apple will ever own up to how much of a failure it was (even down to controls that provide user feedback but don’t register clicks at the very edge of them), and some of it was an improvement (the other 80% of spattering controls atop application content wasn’t).

Every one of these is within Apple’s reach. They have the engineers, the money, and total control of the platform, which is precisely why the pattern grates: this isn’t technical inability, it’s a decade of chosen neglect dressed up as focus, whether you look at it from the pure platform side or if you think about it in terms of the (utterly absent) third-party API integration surface.

This is, unashamedly, a bit of a rant. I’ve been using Macs since System 6 and writing here since the OS X betas, and I’ve watched the company get richer and more capable while the software I use every day gets quietly worse at the boring, essential things, and no wonder I have gradually started using other platforms to the point where most people don’t even consider this a Mac blog.

But I am deeply indebted to Apple for making the platforms that have kept me sane over multiple decades, and I do care about the ecosystem, so… Here we are.

I’d love to be proved wrong next week. I won’t hold my breath–but the scorecard is open, the pen is out, and if all we get is another year of razzle over the dazzle, at least I’ll have a checklist to tick off.

Field Notes From The AI Battlefield

Since today is a bank holiday for me, I decided to consolidate a few more of my notes into a post. What follows is a set of guiding “principles” that I’ve found useful over the past year or so and that I’ve codified into various bits of scaffolding I reuse across my projects.

As usual, I’ve tried to strip away all of the hype and fuzziness and stick to facts, but everyone has their own way of leveraging AI, so your mileage may vary.

However, unlike most of what I read online about AI these days, I am not pitching any specific tooling, although all of this is based on my experience.

Full Disclaimer: I and have a personal Codex account that OpenAI provided for my OSS work, as well as access to random Tier 2 providers that I use to test piclaw.

If you like this, you might be interested in , a minor rant about and my .

Do Not Blindly Trust AI-generated Code

A great example I usually point out is that if you ask an LLM to do extensive error handling on a piece of code, it will almost invariably (at least in ) generate empty catch(){} blocks and call that “error handling”.

Another is when I asked it to optimize a particular tree traversal function for an edge case and it just hard coded the result.

And this applies to nearly everything you ask any LLM to do–but code can be validated, and tested, and measured in various dimensions, and you can turn some of its foibles against it.

In the case of the first example above, a linter will catch that, and you can force the AI to turn those empty catches into something useful (like warning messages in logs).
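As a rough illustration of that kind of gate, here is a minimal shell sketch that flags single-line empty catch blocks. The function names are hypothetical, and a grep like this is much cruder than a real linter rule: it misses empty blocks that span several lines, so treat it as a backstop rather than a replacement.

```shell
#!/bin/sh
# Crude lint gate (hypothetical helper names): list single-line empty
# catch blocks under a directory. A proper linter rule is more thorough.
find_empty_catches() {
    dir="${1:-.}"
    # Matches `catch {}`, `catch (e) {}` and `catch(e){ }` on one line.
    grep -rnE 'catch[[:space:]]*(\([^)]*\))?[[:space:]]*\{[[:space:]]*\}' "$dir"
}

# Return non-zero when anything is found, so a Makefile target or CI step fails.
lint_empty_catches() {
    if find_empty_catches "$1"; then
        echo "error: empty catch blocks found" >&2
        return 1
    fi
    return 0
}
```

Calling `lint_empty_catches src` from a lint target would then fail the build whenever a match turns up, which is what treating these as critical errors amounts to.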

The second one is nastier, but it too can be fixed through proper test fixtures (dynamic but non-repetitive).

Which is why I invariably wrap all my AI-driven projects into several layers of deterministic testing and automation.

Automate Everything Away from the Model

The ground rule I follow is that even SOTA models are inherently unreliable, so when I set up a project or after the first few days of goofing around with a prototype, I try to make sure everything runs on rails.

I typically start with putting together a Makefile because it works/is preinstalled everywhere, is extremely familiar to LLMs, and means I have to do zero thinking myself when running steps manually, but you can use whatever you want.

The important thing is that it must cover the entire development and release cycle, because your agent will inevitably start drifting off and forget how it should do things.

I set it up like this:

  • Makefile targets to do everything (that way there is no “secret sauce” only the model “knows” to do tests, a build, etc.)
    • linting/static analysis (go vet is great, but you should also prepare for typical LLM “lazy” idioms like empty catch blocks, which should be considered critical errors)
    • tests (unit/fuzzing/functional)
    • builds
    • packaging
    • upstream dependency updates (packages and vendored files)
  • One or more SKILL.md file(s) that explain how to use the Makefile and cover the dev/test/debug/release workflows. You should make sure those are referenced from AGENTS.md or use the .github/copilot conventions (insert your flavor of choice here).

The key thing is to always aim for reproducible steps. The model will always go off into the weeds seeking an adventure regardless of how many admonitions you put in AGENTS.md or equivalent, especially when debugging things, but the Makefile (or equivalent) should be your ground truth.

The SKILL.md files are… Well, of dubious value, really. I’ve found to have made them less effective since unlike gpt-5.3-codex newer models often don’t even read the files, but your mileage may vary.

Keep An Eye On Tests

In short, LLM-written tests are generally crap. Anthropic models, in particular, just plain cheat at writing them, so if you ask your LLM to write them, make sure you actually read them.

Unit tests written by LLMs very seldom do anything beyond the obvious, miss edge cases, etc. The only models that write halfway decent tests (as of mid-2026) are the Codex family of GPT models, and even vanilla 5.4/5.5 regressed on that from my standpoint, so my usual tactics are:

  • Build a set of prompts to have different models refactor tests without looking at the internals of your code (i.e., focus on contracts).
  • Treat tests as a black box that outputs a report, so that the session you are coding in does not see the tests and the session that runs and writes the tests does not see the code. You can call these different agents if you want–I call it separation of concerns.
  • Set up CI/CD flows that run all of the tests with zero agent intervention, but have CI/CD generate concise Markdown reports the agents can consume.

The last point is critical, so set it up as soon as you can–it frees up time on your machine and any decent agent can use gh (or equivalent) to fetch CI/CD artifacts, review the results and file issues for itself.
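To make the "concise Markdown report" idea concrete, here is a small sketch. It assumes (and this is purely an assumption) that the test runner can be coaxed into emitting one line per test in the form `PASS <name>` or `FAIL <name> <message>`; the function name is hypothetical as well.

```shell
#!/bin/sh
# Turn "PASS <name>" / "FAIL <name> <message>" lines on stdin into a short
# Markdown summary. The input format is an assumption, not a standard.
results_to_markdown() {
    awk '
        $1 == "PASS" { pass++ }
        $1 == "FAIL" {
            fail++
            # Everything after the test name is treated as the failure message.
            msg = ""
            for (i = 3; i <= NF; i++) msg = msg (i > 3 ? " " : "") $i
            failures[fail] = "- `" $2 "`: " msg
        }
        END {
            print "# Test report"
            print ""
            printf "- total: %d\n", pass + fail
            printf "- passed: %d\n", pass
            printf "- failed: %d\n", fail
            if (fail > 0) {
                print ""
                print "## Failures"
                print ""
                for (i = 1; i <= fail; i++) print failures[i]
            }
        }
    '
}
```

Only failures are listed individually, which keeps the artifact short enough for an agent to read in full without burning context on passing tests.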

Use LLMs to Fast-Track User Stories

This is where SOTA models shine. Even Sonnet, bless its little stupid heart, can take a set of requirements and distill them into user stories and feature files much faster than formal committee-style BDD processes, and the quality and coverage (so far) seem to be better than humans’.

If you work with customers, this last bit is very important–humans will want to describe the user stories that matter to them in exquisitely irrelevant detail while completely skimping on the ones they don’t care about, whereas LLMs won’t care if they are describing boring bits or not, and they won’t quibble at the details–they will just do it.

The resulting user stories need to be reviewed, of course, but piping UX requirements through an LLM and Gherkin typically generates pretty decent scripted tests, especially if the LLM can look at your Preact/Vue/etc. code and build corresponding Playwright scripts.

This will save you weeks of work, and catch dozens of inevitable regressions as LLMs subtly break your front-end code en passant while implementing new features.

Ask me how I know.

Again, Never Let The LLM Run Tests

Mind that I never rely on the LLM to run Playwright for the actual tests directly–it will cheat, get creative about how it inputs things, refresh the page to see if the DOM changes and break test state, etc. It’s fine to use it to explore an app and draft the scripts, but when you run these things in CI/CD, you want them to be extremely deterministic.

And you want evidence of all functional tests, so I have a little toolkit to gather that evidence:

  • Playwright for web testing
  • tmux for TUI testing (rmux is also a thing now, but if you work in regulated industries the paperwork to get it baked into an image will likely outweigh the benefits)
  • A custom VNC harness for my retro emulators (using tesseract for OCR, which is surprisingly capable)
  • And, sometimes, a webcam or a USB video capture adapter (plus a sub-agent that only describes what it sees)

As a bonus, besides a Markdown report, I also generate a PDF report with screenshots and logs for the failing cases–and an override switch to screenshot all the tests for occasional audits.

Again, ask me why.

Do Not Let The Models Edit Freely

LLMs will always mangle long files, regardless of how big the model or context window is. Anthropic models (as of mid-2026) are particularly prone to that for some reason (as well as “drive by shootings” where they mangle tangentially related files).

You need to decrease your exposure to this kind of risk and do some proactive damage control by decreasing the impact of any such errors. It is not a matter of if, it is a matter of when, and it will nearly always manifest as weird regressions a few days down the line.

What I do:

  • If possible in your harness, disable full-file write tooling and force the model to use edit or diff for focused edits. The added friction will typically prevent it from mangling entire files.
  • Set strict caps on file sizes and (depending on the kind of package) guidelines for breaking up functionality.
  • Review changes to see if unexpected files were touched (I have been meaning to create a SKILL.md for doing this automatically, but eyeballing it by listing uncommitted files is just easier).
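
The last check is easy enough to script. The sketch below is one possible shape for it, not an existing tool: it reads `git status --porcelain` output and prints any changed path that falls outside a list of allowed prefixes. The function name and the example paths are hypothetical.

```shell
#!/bin/sh
# Print changed paths from `git status --porcelain` output (stdin) that do
# not start with any of the allowed prefixes passed as arguments.
unexpected_changes() {
    # Porcelain v1 lines are "XY path", so the path starts at column 4.
    cut -c4- | while IFS= read -r path; do
        ok=0
        for prefix in "$@"; do
            case "$path" in
                "$prefix"*) ok=1 ;;
            esac
        done
        [ "$ok" -eq 1 ] || printf '%s\n' "$path"
    done
}
```

Something like `git status --porcelain | unexpected_changes src/ui/ tests/ui/` would then list only the drive-by edits. Renames show up as `old -> new` and quoted paths are not unescaped, so this is good enough for eyeballing but not a strict gate.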

Sometimes I wish I could just make unrelated files read-only before letting the LLM loose on React/Preact code, so I am looking into LSPs and static analysis to see if I can do the coding equivalent of raycasting–projecting out which files would be related to a specific change.

Aggressively Refactor at Every Opportunity

Every few sessions, stop and refactor the code. Most technical debt from AI use comes from letting it literally piss all over your nice module structure.

In particular, I’ve found that LLMs like to define redundant types and duplicate code pretty much at random because they can’t see across your entire code base. If they’re operating in one part of the tree, they’ll be completely oblivious to the rest.

What I do is that once I have implemented one feature (or a sequence of features) and tests pass, I aggressively go in and review every single type, helper and filename.

Models can do baseline audits (the trope about OpenAI models fixing code Anthropic ones wrote is very much true in my experience), and you can trust the outlines of the audits, but with some caveats:

  • They will always cut short the depth to which they analyze code
  • They will often stop at module or dependency boundaries
  • They will only try to merge or remove duplicate code if it is blatantly obvious (and even then it is not a guarantee)

I do use models for audits, but only as a starting point. Then I go in and:

  • Point out where there was feature creep or duplication of code/responsibilities in the module structure
  • Enforce things like centralized logging
  • Manually flag duplicates and give instructions by adding TODO comments to the code

In (which I have sort of gravitated to recently due to the balance of great profiling and refactoring tools and less cognitive overhead than ), gopls can significantly help the model do most file splitting/refactoring automatically and without any chance for the model to mess things up, so every so often I fire up a dedicated session, hand it a prebaked set of guidelines and do a full-on refactoring pass.

Prune Abstractions

Models have a tendency to follow “best practices” to a point where they create untenable messes of nested abstractions, very much like the sort of people who write Python as if they were cosplaying at writing Java–classes, accessors and factories everywhere, etc. You know what I’m talking about.

This is something that initial SPECs and system prompts actually help with, until the context window is so full that those guidelines are “forgotten”.

Weed those out ruthlessly. By all means define reusable contracts and use strong typing ( is a godsend in that regard), but expect your linter and LSP to catch your LLM red-handed.

Learn To Walk Away

There are many ways to work with AI, and none of them work for everyone, but there are some basic tenets I follow:

  • Shorter Sessions = more attrition. One-shotting features will just create more pain and technical debt down the line, and they foster an illusion of progress, not stuff you can actually rely on.
  • Make sure you are willing to put in the design and spec effort. The more you think and plan yourself, the more grounding you can provide to an agent to keep it on track.
  • Leaving the agent to its own devices for an hour or so will give you time to ponder–yes, it might be risky token-wise if you haven’t specced out the work well enough, but that is part of the challenge here.

I think Ralph loops are profoundly stupid and wasteful, but am very much a fan of writing a SPEC, chunking it into a plan.md (or your harness’ equivalent) that includes clear directions for testing and then using things like /goal complete the plan.md file, because that provides the agent with a clear cut set of steps.

Goal seeking of various forms (, performance optimizations, etc.) can be extremely effective and reliable, but only if you’ve stacked up most of the previous tricks written above (and even then I’ve caught LLMs cheating at benchmarks in the most egregious way: “the simplest option is to not execute the query” is a real thing that actually happened).

Aim For Reproducible Everything

Again, do not trust any of the code the agent puts out. And even if it works, keep track of how it works–in a sentence, instrument the crap out of everything:

  • Enforce structured logging as soon as possible, and have automated checks to ensure that errors/exceptions/etc. are logged.
  • Maintain a set of benchmarking/regression tests that output actual metrics (if you don’t use OpenTelemetry, try to at least have a text file with key metrics)
  • Be very thorough about regression testing. Taking the time to rebuild and run last week’s version will often show that you’ve missed either testing for something or measuring something important.
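
For the "text file with key metrics" fallback, a comparison step can be very small. This sketch assumes a hypothetical format of one `<name> <value>` pair per line where higher is worse (latency, allocations and so on), and flags anything that grew by more than a given percentage against a non-empty baseline file.

```shell
#!/bin/sh
# Usage: compare_metrics baseline.txt current.txt [max_increase_percent]
# Both files hold "<name> <value>" lines (assumed format, higher = worse).
compare_metrics() {
    baseline="$1"; current="$2"; limit="${3:-10}"
    awk -v limit="$limit" '
        # First file: remember the baseline value for each metric.
        NR == FNR { base[$1] = $2; next }
        # Second file: compare against the baseline where one exists.
        ($1 in base) && base[$1] > 0 {
            change = ($2 - base[$1]) / base[$1] * 100
            if (change > limit) {
                printf "REGRESSION %s %s -> %s (+%.1f%%)\n", $1, base[$1], $2, change
                bad = 1
            }
        }
        # Non-zero exit status when at least one metric regressed.
        END { exit bad }
    ' "$baseline" "$current"
}
```

Because it exits non-zero on any regression, it can sit in CI next to the tests, with last week's metrics file as the baseline.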

Again, CI/CD is your friend here, and a lot of my time, even on personal projects, has been spent on building test and smoke harnesses of various kinds:

  • Mock up external APIs and write various failure modes into the mocks so that the LLM will have to deal with “errors” from the start.
  • When doing emulation/JIT work, create a test harness for each specific operation that you can gdb through (LLMs can actually do this pretty well), then a smoke harness that you can compare with QEMU, etc.
  • When doing microcontroller work, build and test subroutines separately in the host machine before assuming they will work in the microcontroller.
  • When doing inference optimizations (like in go-pherence), cross-check similar kernels across back-ends and architectures to ensure they all provide the same results

The list goes on, but the key thing is that everything should be automatable and outside the control of the LLM.

Is all the above hard work? Yes. But can you take most of it along with you when you start a new project? Also pretty much yes–and the icing on the cake is that once you’ve gotten the basics down, the principles are all transferable across stacks/environments/runtimes and the thought process will keep your wits sharp.

Not to mention these things will save you a bunch of time.

Notes for May 24–31

Today I realised that I could just spend the day doing essentially nothing and that nobody would hold it against me (at least in Western nations), so… I might well do just that, with a few caveats:

Wi-Fi Fallout

Something very weird happened after I published – it made it to Hacker News (a day or so after I submitted it myself, because, as usual, most of my self-submitted links still appear to be shadow-banned despite 30K+ karma–and no, I don’t understand that either), and it was very popular among the usual band of armchair networking experts.

But then something really weird happened: I got an alert from Cloudflare that the lowercase-rewrite worker I’d deployed as a fallback for incorrect linking was exceeding the free-tier limit (100,000 runs, if I recall correctly), which made me curious enough to dig into the analytics:

Cloudflare page views control chart showing two out-of-control spikes reaching ~70,000 views/hour on 30 May
The control chart doesn't lie. Those orange dots are not normal.

I have CF’s anti-bot crawling settings active, I turned on CAPTCHAs again after the initial peak, and yet… 70,000 views in an hour, twice? Has to be crawlers. And how did CF let them through and count them?

So I went and plotted Clarity’s chart of “human” visitors (always an undercount, since it only captures people with JS enabled and no ad-blocking, but useful as a sanity check):

Microsoft Clarity unique visitors chart showing the genuine HN-driven spike to ~8,000 unique visitors on 29 May, with traffic returning to normal shortly after
The real HN spike was Thursday. Everything after is noise.

Definitely bots after the initial HN flood. I have to wonder why, why now, and whether Cloudflare’s free tier is still even marginally effective at blocking them.

go-pherence

The most interesting work this week was grafting speaker diarization onto go-pherence. Whisper tells you what was said; knowing who said it is a separate problem, and the standard answer is SpeechBrain plus a Python subprocess plus a fairly heavy PyTorch dependency. I did not want any of that. Instead I ported ECAPA-TDNN – the speaker embedding model SpeechBrain uses – to Go, and it all now mostly works with zero Python, even if it still needs a lot of tweaking.

There’s a speakercheck validation harness that runs spot-checks against windowed audio segments, scores against expected speaker labels, and outputs JSON reports, and a diarize-vtt command that accepts an optional ECAPA model and emits speaker-tagged VTT output. I expect to drop this onto one of my current hardware test subjects soon.

In Other News

I’ve been tinkering with more new hardware, but some things just take time and I’m still putting together my notes on those.

On the other hand, I am still very much impressed with the running , and I’m enjoying building little plugins for it as I go:

Niri display layout plugin showing the Kuycon P20 external display and built-in DSI screen arranged in a stacked layout
A Niri plugin to manage display layout, because of course I wrote one.

I will eventually publish these somewhere…

Mildly Parboiled

Allergy season is finally fading (at least for me), but today was the first time I had to turn on the AC in the office, and it was great to realize that and almost four years of potential HomeKit foibles, my is still working perfectly.

Those minor joys aside, I’ve been actively trying to get out of the house to do some exercise at least one hour a day and it is clearly not going to happen at lunchtime anymore–well, not every day, at least, so I’m starting to get cabin fever.

All of this to say that I’m feeling as if I am starting down the slippery slope to both physical and mental burnout again, and this time I’m backing off as early as possible.

For starters, I am currently profoundly annoyed at my current working arrangements, since my days of wall-to-wall meetings with completely random 15 minute breaks are both utterly destroying my health and eroding my ability to focus. Sometimes, and despite being remote for many, many years, I would really prefer to be back working at an office, if only because I miss walking about and using stairs to go and talk to people.

Turns out my closest project team are now in Madrid (plus Belgium, Sweden, Canada, etc.), so that isn’t going to happen. And, truth be told, online meetings are now so stupefyingly more productive (as meetings go) that actual work is still best done remote–as long as you can cut through the tremendous amount of AI-augmented cruft that a meeting now entails.

I, as usual, have been pragmatic about it and crafted my own agent to summarize meetings the way I want them, and to craft terse, minimalist works of corporate obeisance that avoid the walls of text I get by default and focus on the stuff I need to do instead of spouting corporate cheerleading (it has become ).

Anyway, my priority is now, again, my well-being. But I feel like my entire lifestyle is in dire need of an intervention, and the obvious life hacks most people suggest like exercising in the early morning (when I am trying to do my daily reading and research) or at the end of the day (when I am just bog tired) just don’t work for me, so the upshot of all this is that I am currently trying to carve out slots throughout the week to just get out of the house for 30 minutes.

Which is completely stupid.

This has to change (somehow). In the meantime, part of that carve-out is also going to be about mental health–I’m phasing out Twitter/X again, as well as a bunch of other “social” distractions and hypefests like HN.

Indoor Wi-Fi Roaming with OpenWRT

A few months after writing up the units and moving the house over to , I ended up revisiting the one bit I had deliberately waved away as “good enough”: roaming.

A real house, with a mix of phones, tablets, laptops and a few stubborn IoT things that insist on staying in 2016, has… issues. But they’re not always obvious, and given we’d both upgraded the 5GHz band and changed the locations of the access points, it took a while to figure out where the new rough spots were.

If you’re just tuning in, I have a hard split between a legacy 2.4GHz network and the modern 5GHz one. I already had client-managed roaming and basic handoff guidance, but I have now added usteer and static 802.11k neighbour reports (because hostapd was not cooperating), and things are pretty much perfect.

The long version is below, with anonymised data and enough detail for future me to remember why I did this.

Why I Did Not Merge The SSIDs

The obvious advice for roaming is “use one SSID everywhere”, and that is often correct if you’re running Wi-Fi in an office, a public venue, or generally somewhere where you don’t have (or care about) legacy devices. It is also not what I did, because the 2.4GHz side needs to remain friendly to older and slightly terrible IoT devices, which means WPA2 compatibility and a conservative setup.

The 5GHz side is where the more modern clients live, and despite losing 5GHz access for a couple of things, I was happy to move it to WPA3. So this is what things look like from a high level:

  • 2.4GHz: legacy-compatible WPA2-ish network for IoT and old clients.
  • 5GHz: modern client network with WPA3/SAE
  • 2.5GbE backhaul across four “dumb” APs
  • Zero cloud management or vendor-specific software. Nada. Zilch. Non-negotiable.

User Feedback

However, I got a few complaints that when moving about the house, iPhones, iPads and MacBooks would not switch to another AP. Since our flat is wrapped around a couple of elevator shafts and there are a few spots (like the kitchen) where tiling, pipes and tiny RF nuisances like fridges are prevalent, that tended to happen a lot–and Apple devices are notorious for being opinionated about which base station they want to stick to.

The baseline seemed fine. All four APs had 802.11r/k/v-related options enabled, and I had installed wpad-mbedtls for “mesh” support. Fast Transition was also demonstrably happening–the AP logs had auth_alg=ft entries to prove it–but roaming clearly needed to be improved.
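
A quick way to put numbers on that is to tally the auth_alg values in the AP logs. The sketch below assumes log lines that carry an `auth_alg=<value>` token somewhere (the exact line format varies, so the sample in the usage note is only illustrative), and the function name is hypothetical.

```shell
#!/bin/sh
# Read syslog lines on stdin and print "<count> <auth_alg>" per algorithm,
# sorted by algorithm name. Only the auth_alg=<value> token is inspected.
count_auth_algs() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                # "auth_alg=" is 9 characters, so the value starts at 10.
                if (index($i, "auth_alg=") == 1) {
                    counts[substr($i, 10)]++
                }
            }
        }
        END { for (alg in counts) print counts[alg], alg }
    ' | sort -k2
}
```

On an AP this could be fed from `logread`, as in `logread | count_auth_algs`. A healthy share of `ft` entries relative to the rest suggests clients really are using Fast Transition rather than re-authenticating from scratch.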

And my setup meant it had to be improved within each band/SSID, not across bands. Cross-band roaming is the client’s job, and many clients are not especially good at it.

Adding usteer

But two things stood out:

  • There was no steering daemon installed. Clients were making all roaming decisions on their own, which usually means they hang on to a far-away AP until their signal is frankly embarrassing.
  • rrm_nr_list was empty on every radio. In other words, even though 802.11k was enabled, hostapd was not exposing neighbour reports to clients, so… no real way to steer anything.

So I installed usteer and its LuCI companion package on all four APs, enabled it, and left the initial configuration at defaults:

opkg update
opkg install usteer luci-app-usteer
/etc/init.d/usteer enable
/etc/init.d/usteer restart

The default configuration is minimal: LAN gossip, syslog enabled, IPv6 disabled for the daemon (because, for reasons, I don’t trust our current ISP router to do anything reliably except act as an ONT), and a moderate debug level. That was enough for all APs to see one another and exchange client data, which is exactly what I wanted.

However, the 802.11k neighbour list wasn’t being populated. After poking through the OpenWRT forums, I realized the missing piece was static-neighbor-reports, which is one of those tiny OpenWRT packages that does exactly what it says and nothing more.

Each AP can generate its own 802.11k neighbour report element via:

ubus call hostapd.<iface> rrm_nr_get_own

But clients only get useful neighbour lists if each AP is told about the other APs. So I generated per-band lists and installed them per AP:

opkg install static-neighbor-reports
/etc/init.d/static-neighbor-reports enable
/etc/init.d/static-neighbor-reports restart

The important detail is that the reports are band-specific: 2.4GHz radios only advertise 2.4GHz peers, and 5GHz radios only advertise 5GHz peers. No cross-band mixing, because the two networks intentionally have different SSIDs and security settings.
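
The band filtering itself is simple enough to sketch. This is not the script used here, just an illustration of the rule: given a hypothetical inventory of `<ap> <band> <bssid>` lines, it prints the same-band radios on the other APs for each radio, which is the peer list a neighbour report would need.

```shell
#!/bin/sh
# Inventory lines on stdin: "<ap> <band> <bssid>" (hypothetical format).
# Prints "<ap> <band> -> <peer-ap> <peer-bssid>" for every same-band radio
# that lives on a different AP. Cross-band pairs are never emitted.
band_peers() {
    awk '
        { ap[NR] = $1; band[NR] = $2; bssid[NR] = $3 }
        END {
            for (i = 1; i <= NR; i++)
                for (j = 1; j <= NR; j++)
                    if (i != j && band[i] == band[j] && ap[i] != ap[j])
                        print ap[i], band[i], "->", ap[j], bssid[j]
        }
    '
}
```

With four APs and two radios each, every radio ends up with three peers, all on its own band.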

After that, every AP had three neighbours per radio, usteer had AP/client state, and hostapd had explicit 802.11k neighbour data to hand to clients that ask for it.

What Changed

The first comparison is a little boring, but useful. Here is the 2.4GHz SNR before and after the change (this, like the other charts here, was generated from data):

2.4GHz SNR over the week

2.4GHz SNR: pre-rollout vs latest

There is no miracle here. 2.4GHz remains 2.4GHz–crowded, noisy, full of junk devices and congested by all my neighbors. Two of the APs improved or stayed roughly level, two got worse in the sampling window, and I have zero expectations about ever clearing this kind of congestion without moving to the countryside.

The 5GHz side is more encouraging, even if reading the active bitrates requires knowing which AP we were near at what time:

5GHz bitrate over the week

The interesting part, though, is that at least between two APs, there was a noticeable shift in usage–which seems to reflect where clients should be registered in practice:

5GHz bitrate: pre-rollout vs latest

But the best sanity check is the sticky-client view, because that is what started this in the first place:

Sticky-client check

The number of merely weak clients did not disappear–one extra client fell below -75dBm in the later sample–but the very weak clients went away. That is the bit I care about: the previous -90dBm-ish sticky associations were gone in the later check, which seems to indicate clients are not getting hung up on their previous AP and are indeed roaming.
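For what it's worth, the bucketing behind that kind of check is tiny. This is a rough sketch rather than the actual script: the -75dBm threshold is the one quoted above, the -85dBm cut-off for "very weak" is my assumption for illustration, and the sample values are made up.

```python
WEAK_DBM = -75        # threshold mentioned in the text
VERY_WEAK_DBM = -85   # assumed cut-off for the "-90dBm-ish" cases, not from the original


def classify_clients(signals_dbm):
    """Count clients per bucket from a list of signal levels in dBm."""
    counts = {"ok": 0, "weak": 0, "very_weak": 0}
    for signal in signals_dbm:
        if signal < VERY_WEAK_DBM:
            counts["very_weak"] += 1
        elif signal < WEAK_DBM:
            counts["weak"] += 1
        else:
            counts["ok"] += 1
    return counts


if __name__ == "__main__":
    # Made-up samples, purely to show the shape of the comparison.
    before = [-52, -61, -78, -91, -89]
    after = [-50, -63, -77, -79, -66]
    print("before:", classify_clients(before))
    print("after: ", classify_clients(after))
```

Keeping "weak" and "very weak" as separate buckets matters here, because a rise in the former alongside a drop in the latter is exactly the pattern that suggests clients are roaming instead of clinging.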

Caveats

A single sample is not science, and Wi-Fi is a swamp of client decisions, radio noise and domestic entropy. I also saw one new Fast Transition log entry after the rollout:

FT: Missing required pairwise in pull response from a peer AP

That happened once in the latest check. It is not enough to call the setup broken, but it is worth watching–especially because SAE and FT have enough moving parts that I would rather trust logs than assumptions.

Going Forward

I will be keeping an eye on this over the next few weeks… somehow. I got an LLM to do the Graphite queries and chart scripting for me, and ain’t nobody got time to build dashboards only I would look at, but the metrics aren’t going to go away and the stable config lives in my local instance now, so there’s really no excuse not to do a spot check in a few months.

But I really like my Cudy APs. No cloud controller, no meshing, no mobile app and no secret sauce. Just OpenWRT, collectd/Graphite, and the odd ssh session to check configs.

That is still the main thing I like about this setup: when it gets weird, it gets weird in ways I can inspect.

Notes for May 17-24

My sinuses are still giving me grief, but this week was much more successful at pretending to be enjoyable, at least. For starters, we watched Project Hail Mary, and it was every bit as good as I would expect it to be, which is very rare in movies these days.

Meetings Suck More In Summer

Insomnia seems to be fading, but as the weather improves, the time windows for leaving the house and enjoying exercise before the heat kicks in have become narrower and now collide head-on with typical meeting schedules. That has become a major drag on my optimism, since I have to wonder why, as an industry, we haven’t really solved meetings.

The technology is fine–it’s a culture problem. Stand-ups, project syncs, account planning, everything requires far too many unproductive meetings that just accrete overhead because a) people don’t really prepare for them and b) people don’t have time to prepare for the meetings that matter because of all the other meetings.

And, of course, everyone thinks their meetings are the ones that matter.

Couch Time

Either way, I’ve finally started having more enjoyment off-work. A good deal of it stems from the fact that I can now use piclaw as an interactive notebook across all of my projects and just scribble on a tablet screen (including annotating images and text to feed back into the agent).

Using piclaw on the couch

I have already gotten most of the annotation experience to work on my as well (and with a local agent to boot), so I’m starting to wonder when OpenAI or Anthropic will pick up on this (neither of them has a decent tablet UX, and they clearly don’t seem to care about that).

In the meantime, I’m looking for an Android tablet that would be at least as good as a Samsung one, but without any of their UI junk–the TCL NEXPaper ones seem very interesting, but it’s apparently impossible to reach any of their marketing people…

Joking Around

One of the things I’ve been playing with a la longue is Joker, my souped-up version of a runtime for . Well, go-joker now has a proper notebook interface–cells with run states, rich outputs, inline SVG rendering, WASM-backed bitmap demos, and a parallelised Mandelbrot cell that renders fast enough to feel interactive.

This is another step towards the -for-code thing I a few weeks ago, except it’s running in a Clojure interpreter that I developed in another notebook-like interface:

go-joker notebook with Mandelbrot rendering

The irony of constantly working on notebooks within notebooks is not lost on me, but it does look very good right now.

Inference Hardware

I just got a SpacemiT K3 board to test, which is both my and a refreshing take on the ecosystem, because a) it was zero hassle to set up, b) came with 32GB of RAM and c) has a promising (if weird) NPU arrangement that I fully intend to exploit, even if (as usual) source code and documentation are a little sparse.

On the GPU side, I’ve been trying to shoehorn a Qwen model with MTP and KV cache optimizations into my 12GB 3060 in parallel (without any real usable solution yet), so alternative hardware is even if (at least right now) it poses a completely different set of problems to solve.

Emulation Progress

My long-delayed build draws near–after pondering my options I ordered the mini-macintosh PCBs and parts (5 of them, even though I only have 2 Maclocks) and have been poking at the Mac JITed emulators a bit, but I got sidetracked into getting the MMU to work in previous-jit and… I haven’t really paid much attention to any of the other bits.

I did try to get ios-linuxkit to run faster through a variety of strategies, but the truth is that performance work on interpreters is humbling–most ideas that sound good measure worse, and none of it panned out except some iOS fixes–terminal input latency, soft keyboard lag, DNS fallback, and iPhone canvas scaling.

The gap between “works on my iPad Pro” and “works on an iPhone” is always wider than expected, and in this case I am actually considering removing ghostty-web from the iPhone version given the added overhead.

Logitech Combo Touch: Four Years Later

I think it’s time for an update on my iPad Pro M1 and, most importantly, the Logitech Combo Touch I got for it. Think of it as a long-term review of sorts.

In short, I bought another Combo Touch–the old one was falling apart.

Disclaimer: I paid for this with my own money, as I did the first one, but Logitech did offer me a discount. As usual, this article follows my .

The Good Bits

I had originally chosen the “sand” color, which was a sort of calculated bet–I wanted something different from the traditional black, and mentally prepared myself for it to accrue stains or dirt over time.

Guess what, it really didn’t. I guess it will look slightly darker and dingier if put alongside a new one, but I have zero complaints about the fabric-like parts and can only find a very small (sub-5mm) stain if I look really hard. Maybe I was lucky, but those bits still look great.

I have also had zero issues with the keyboard. Yes, it has short travel, but it is effectively full size, the international English layout is excellent for coding, and it has been extremely reliable over the past four years. The only key with a (cosmetic) issue is my S key, which was slightly marred by a stray solder blob.

And the trackpad is simply sublime–it is the best non-Apple trackpad I have across all my hardware, not to mention it is luxuriously large for a tablet trackpad.

The Bits That Fell Apart (Literally)

Over the years, the speaker slots (which are effectively thin strips of rubbery plastic) started deforming. First subtly, then to the point where they are now either broken or completely deformed:

Deformed speaker slots on the old Combo Touch

This does coincide with how I hold it for writing in both landscape and portrait mode (the inner cover edge is also flaking off on the bottom left side in portrait orientation), but… I’m at a bit of a loss as to why this wasn’t factored into the design somehow.

Buying Another One

Unfortunately, Logitech does not offer the possibility to buy only the cover, otherwise I would have kept my current keyboard.

And there were no refurbished ones shippable to Europe either (for whatever reason), so I ended up reaching out to support and then buying an entirely new “Oxford grey” one (which was effectively the only color available).

Oxford grey Combo Touch next to the old sand one

The new one is physically identical as far as I can tell–same connector, same kickstand, same key layout, same excellent trackpad.

Which means everything I wrote the first time around still applies, and I won’t repeat it here. What I’m more interested in this time is whether this one will last longer without deformation.

I have my doubts, of course.

TIL: Noctalia Shell Lock on Suspend

This is a little bit of follow-up to my – I keep using it routinely (especially when we travel for leisure) and love the little thing to bits, but I’ve been wanting to run it mostly on power saving mode to reap the most benefit out of the hardware (and battery, of course), so I started looking at desktop environment alternatives.

Yes, I could already get a full afternoon (and then some) out of it, but Apple Silicon has spoiled me as far as battery life expectations go, and has a little bit too much baggage for that kind of extended use.

Since I spend 90% of my time on it writing or coding and still have a penchant for keyboard-driven desktops, I initially switched to Fedora Sway Atomic (gotta love being able to swap environments with a single command…), but later installed Niri and Noctalia Shell because I really like both the idea of a scrolling window environment and the sheer polish of the whole thing–even if there are some rough edges here and there.

I am very happy with it, and writing plugins for it is trivial:

I hacked together a Bing Wallpaper plugin in 30m

The one thing that annoyed me to no end, though, was locking on suspend, which Noctalia Shell should do but apparently doesn’t in , so I had to resort to two hacks:

Locking on Lid Close

The first was adding a switch-events block to the Niri config to trigger the lock screen when the lid closes:

switch-events {
    lid-close {
        spawn "qs" "-c" "noctalia-shell" "ipc" "call" "lockScreen" "lock"
    }
}

Idle Lock via swayidle

The second was setting up a swayidle systemd user service to lock after 5 minutes of inactivity and suspend after 10:

[Unit]
Description=SwayIdle Service
After=graphical-session.target

[Service]
Type=simple
ExecStart=/usr/sbin/swayidle -w \
    timeout 300 'qs -c noctalia-shell ipc call lockScreen lock' \
    timeout 600 'qs -c noctalia-shell ipc call sessionMenu lockAndSuspend'
Restart=on-failure
TimeoutSec=30

[Install]
WantedBy=graphical-session.target

This last one feels extremely gauche and I hope to find a better way, but I guess this comes with the territory. I don’t really care about having a trendy Wayland desktop (I just want a dead simple one with a bit of polish), but I hope this kind of hack won’t be necessary for much longer.

Oh, and of course I set gsettings set org.gnome.desktop.wm.preferences button-layout 'close,minimize,maximize:appmenu' to match macOS decorations.

Apple Papercuts

I know this blog has strayed a fair distance from its Mac-centric origins, but I’ve been keeping a mental list of all the things that are broken, missing or inexplicably neglected in Apple’s software, and it’s gotten long enough that writing it down feels like a public service.

Read More...

Notes for May 10-17

The weather has gone a tad cloudy again, which provided me some relief from my allergies–but not enough for proper overnight rest, so yet again I arrived at Friday afternoon totally exhausted.

Read More...

Announcing ios-linuxkit: Linux on iPad, the Hard Way

I’m done waiting for Apple to fix things. And one of the things I think should exist is a decent way to run Linux binaries on my iPad.

Read More...

Unexpected Synology Woes

Last weekend my Synology decided, for some unfathomable reason, to stop working after I took it out of the closet, dusted it and put it back, and I have feelings about it.

Read More...

The Siri For Families Apple Will Never Build

The got me thinking about the one thing I keep wishing Apple would build and almost certainly never will: a family-scoped AI assistant that actually works across all our devices.

Read More...

I Think I Figured Out What an AI IDE Looks Like

I’ve been mulling the UX arc I’ve been going through over the past couple of years, and I think it was mostly the same for everybody:

Read More...

Notes for May 3-10

This was a weird week, both because I keep waking up at 5AM with my sinuses clogged, and because I feel like I’m losing momentum. Feeling almost permanently cotton-headed, sleepy due to sheer exhaustion or because of antihistamines certainly has something to do with it, but .

Read More...

The Local AI Moat

Regular readers will know that I’ve spent most of the past two years shoehorning LLMs into single-board computers, partly as a learning exercise and partly because there are lots of local/”edge” applications where semantic reasoning (no matter how limited) and “interpretation” of sensor data are actually useful.

Read More...

Notes on GPT 5.x Model Regressions

I’ve been getting annoyed at constant code regressions in piclaw for the past few weeks. Something was off–even after bumping the test suite to the point where it catches most mechanical errors, gpt-5.5 kept making unrelated edits to code that should have been left alone, and I was getting really annoyed at babysitting it.

Read More...

Notes for April 27 – May 3

This was an absurdly productive week, at least on a personal level. I’m not sure whether to be pleased or worried about the number of projects that moved forward simultaneously, but here we are.

Read More...

Lessons on Building MCP Servers

I’ve been building MCP servers for a while now–I wrote about last year, started out by creating umcp, and I’ve recently opened up an Office server that’s been battered by enough models against enough real documents that the patterns have settled.

Read More...

App Notes: Web App Viewer

I got annoyed enough with Safari Web Apps to write my own replacement.

Read More...

Notes for April 20-26

Amidst the chaos brought on by my usual seasonal allergies, work turned out to be calmer than usual–the usual industry churn and constant rumors of layoffs have made “calmer” a relative term, though–so most of my evenings went to projects.

Read More...
