Vibing with the Agent Client Protocol

Although most of my actual work happens in a terminal, I have been wanting an easy way to talk to the newfangled crop of agents from my iPhone.

So I spent a good chunk of this week building out a Slack-like web interface for chatting with agents via the Agent Client Protocol (ACP) called vibes.

Right now, it looks like this:

Just a web view, right? Well, not quite.

The UX

I blame for planting the seed of a chat-like interface for agents in my mind–this is not replacing my terminal-based workflow, but it’s a nice complement, especially for quick check-ins or when I want to give an agent a task and review the results later from my phone.

The web layer is “just” Preact and SSE, with enough CSS for it to work nicely on small screens and with touch input. The main timeline view shows messages from me and the agent, with support for rich content like code blocks, KaTeX formulae, images, and resource links.
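As a rough sketch of the SSE half of that, this is how a server could frame a timeline message as a Server-Sent Events record. The function name `format_sse` and the `new_message` event name are hypothetical and not taken from vibes; only the framing itself (`event:`, `id:` and `data:` lines followed by a blank line) comes from the SSE format.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Frame a JSON payload as a single Server-Sent Events record."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # json.dumps escapes newlines inside strings, so the payload
    # always fits on a single "data:" line.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the record.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Hypothetical payload; the field names are illustrative only.
    print(format_sse({"role": "agent", "text": "hello"}, event="new_message", event_id="1"), end="")
```

On the browser side, an `EventSource` listener for the same event name would receive the JSON string in `event.data`.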

But the key thing is the tool permission flow: when the agent wants to call a tool, the UI shows a modal with an explanation of what the tool does (fetched from the ACP server), and I can approve or deny it with a tap. That is the part of ACP I wanted to leverage, and so far I’ve only seen it in CLI/TUI clients.
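A minimal sketch of how an approve/deny tap could be turned into a reply to the agent. It assumes the `session/request_permission` JSON-RPC request shape from the ACP spec (a list of `options`, each with an `optionId` and a `kind` such as `allow_once` or `reject_once`); those field names should be checked against the spec version in use. The helper name and the example values are hypothetical, not vibes' actual code.

```python
def build_permission_response(request: dict, approved: bool) -> dict:
    """Map an approve/deny tap onto a JSON-RPC response for the agent."""
    options = request["params"]["options"]
    wanted = "allow_once" if approved else "reject_once"
    chosen = next((o for o in options if o.get("kind") == wanted), None)
    if chosen is None:
        # The agent did not offer a matching option: fall back to "cancelled",
        # which is the safe default for both taps.
        outcome = {"outcome": "cancelled"}
    else:
        outcome = {"outcome": "selected", "optionId": chosen["optionId"]}
    return {"jsonrpc": "2.0", "id": request["id"], "result": {"outcome": outcome}}


# Hypothetical request, shaped the way this sketch assumes an agent sends it.
example_request = {
    "jsonrpc": "2.0",
    "id": 5,
    "method": "session/request_permission",
    "params": {
        "sessionId": "sess_example",
        "toolCall": {"toolCallId": "call_001", "title": "Run shell command"},
        "options": [
            {"optionId": "allow-once", "name": "Allow once", "kind": "allow_once"},
            {"optionId": "reject-once", "name": "Reject", "kind": "reject_once"},
        ],
    },
}

if __name__ == "__main__":
    print(build_permission_response(example_request, approved=True))
```

The modal only needs the option names for its buttons; the response echoes the request `id` so the agent can match it to the pending tool call.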

The Back-End

How things tie together

One thing that isn’t on the diagram above is the database. I love SQLite for small projects like this, all the more so now that I have learned the tricks around using JSON columns for flexible data storage. And, of course, you get full text search support out of the box, which is perfect for searching what I intend to be an infinite timeline.
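To illustrate the JSON-column and full text search combination, here is a small sketch assuming SQLite with its JSON functions and the FTS5 extension compiled in (true of most Python builds, but not guaranteed). The table names, column names and helper functions are invented for this example and are not vibes' actual schema.

```python
import json
import sqlite3


def open_timeline(path: str = ":memory:") -> sqlite3.Connection:
    """Create a messages table with a JSON body plus an FTS5 index."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS messages (
            id   INTEGER PRIMARY KEY,
            body TEXT NOT NULL          -- JSON document, schema-free
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(text);
        """
    )
    return db


def add_message(db: sqlite3.Connection, role: str, text: str, **extra) -> int:
    """Store the message as JSON and index its text under the same rowid."""
    body = json.dumps({"role": role, "text": text, **extra})
    cur = db.execute("INSERT INTO messages (body) VALUES (?)", (body,))
    db.execute(
        "INSERT INTO messages_fts (rowid, text) VALUES (?, ?)",
        (cur.lastrowid, text),
    )
    db.commit()
    return cur.lastrowid


def search(db: sqlite3.Connection, query: str) -> list:
    """Full text search, pulling fields back out of the JSON column."""
    rows = db.execute(
        """
        SELECT m.id,
               json_extract(m.body, '$.role'),
               json_extract(m.body, '$.text')
        FROM messages_fts
        JOIN messages AS m ON m.id = messages_fts.rowid
        WHERE messages_fts MATCH ?
        ORDER BY m.id
        """,
        (query,),
    )
    return rows.fetchall()
```

Extra keyword arguments (a tool name, attachments, and so on) land in the JSON body without any schema migration, which is the flexibility the JSON column buys.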

Hacking In ACP

Like , I am . ACP has many of the same flaws, except that now you also have to deal with the ambiguity of how to surface all of the interactivity you’d have in a TUI in a chat timeline.

Content parsing went through several iterations to handle all the edge cases: tool calls, thinking panels, resource links, embedded resources with annotations, live updates from the agent, etc.
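A simplified sketch of what that kind of parsing can look like: folding a stream of session updates into timeline entries. It assumes update shapes along the lines of the ACP spec (`agent_message_chunk`, `agent_thought_chunk`, `tool_call`, `tool_call_update`, and content blocks typed `text`, `resource_link`, `resource` or `image`). The function names and the timeline entry format are hypothetical, and real servers may differ in the details.

```python
def render_content(block: dict) -> str:
    """Reduce one content block to a string for the timeline."""
    kind = block.get("type")
    if kind == "text":
        return block.get("text", "")
    if kind == "resource_link":
        uri = block.get("uri", "")
        return f"[{block.get('name', uri)}]({uri})"
    if kind == "resource":
        res = block.get("resource", {})
        return res.get("text") or f"<embedded resource {res.get('uri', '')}>"
    if kind == "image":
        return f"<image {block.get('mimeType', 'unknown')}>"
    return f"<unsupported content: {kind}>"


def apply_update(timeline: list, update: dict) -> None:
    """Fold a single session update into the timeline, in place."""
    kind = update.get("sessionUpdate")
    if kind in ("agent_message_chunk", "agent_thought_chunk"):
        entry_type = "message" if kind == "agent_message_chunk" else "thinking"
        text = render_content(update.get("content", {}))
        # Consecutive chunks of the same type extend the last entry.
        if timeline and timeline[-1]["type"] == entry_type:
            timeline[-1]["text"] += text
        else:
            timeline.append({"type": entry_type, "text": text})
    elif kind == "tool_call":
        timeline.append({
            "type": "tool",
            "id": update.get("toolCallId"),
            "title": update.get("title", ""),
            "status": update.get("status", "pending"),
        })
    elif kind == "tool_call_update":
        # Live updates patch the most recent matching tool entry.
        for entry in reversed(timeline):
            if entry["type"] == "tool" and entry.get("id") == update.get("toolCallId"):
                if "status" in update:
                    entry["status"] = update["status"]
                break
    # Unknown update kinds are ignored rather than raising.
```

Ignoring unknown update kinds instead of failing is a deliberate choice in this sketch, since implementations tend to differ in what they send.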

And I had to test it with multiple ACP servers, since each implementation has its own quirks. Right now, vibes works reasonably well with my python-steward, Mistral’s vibe and GitHub Copilot CLI, but all of them have small differences in how they implement the spec.

If I had to do it again, I would have probably built a proper acp client library in or first, but since I was building both the client and server sides at the same time, I just kept iterating on the wire format until everything worked.

But Why?

It’s not just the convenience of having a cute web app on my phone. Having a low-friction review loop is essential when working with agents (which is why I was keen on leveraging ACP in the first place), and I also wanted persistent history and richer rendering than a terminal can provide, because I want to give my agents more complex tasks that involve multiple steps and outputs.

Everyone and their dog seems to be thinking that agents only have to have bash (Armin Ronacher makes some excellent points), but I am trying to strike a balance when designing steward: Give it all the tools it should need for most use cases, a little scripting engine (QuickJS) for extensibility, and extensive SKILL.md support so I can teach it to do new things.

The Sandboxing Endgame

I am pretty sure that my endgame will eventually involve WASM (maybe tinygo in a sandbox or a Cloudflare-like V8 isolate) and I’m actually hedging my bets by looking at porting a subset of busybox to , but for the moment I want to keep things simple and give agents access to higher-level tools that can do complex things without needing to script them from scratch.

Because, well, I don’t want to write coding agents. There’s a special kind of myopia around their incredible success, but I think there should be some balance in the Force.

Crustaceans are cool and all, but sometimes you just want to vibe with your agent about something as prosaic as scheduling a meeting or searching your vault.
