The Vibes

The profusion of hype on the Internet has led me to take a lot of things with a grain of salt, and if you’re a regular reader, you’ll know that generative AI has already added more than a few teaspoons to the broth of LLM-driven coding.

I’ve shared my thoughts on this more than once, and considering it’s been a while since LLMs became mainstream and we’re still trying to sort out UI interaction and tool integration approaches (don’t worry, I won’t go on about MCP again), I thought it was time to share my current workflow and how I think about things.

But I must concede that things have progressed tremendously–any of the “reasoning” models I can run locally can now handily beat the original ChatGPT on its own, tool integrations have become commonplace, and developer tooling has become a billion-dollar industry… in valuations, if not in actual value.

The Editor Bonanza

I’ve tried pretty much every “AI code editor” out there (Cursor, Windsurf, and others that came before them, as well as many of the Claude Code variants and offshoots), and I am reminded of the ancient chestnut about Steve Jobs commenting that Dropbox was “a feature, not a product” and passing on acquiring them.

Which, to be honest, seems to have been lost on modern investors. But I digress.

Sure, there are some nice UI features in Cursor, and Windsurf’s “flows” are a nice approach to steering the LLM, but behind the scenes it’s still all about providing the right tools for the model to explore and manipulate the codebase and, above all, good contextual prompting–and you can do that in any editor.

I spend most of my coding time in VS Code and vim (sometimes even running vim in Code’s terminal window…), and with the recent addition of Agent mode to the GitHub Copilot extension, the notion of using a third-party fork of what is essentially the same experience became pointless–leaving just one alternative GUI editor I still find interesting, and aider as the go-to CLI tool when I am coding inside a VM.

It’s not the editor that makes the difference, and I find the notion that you can “tab tab tab” your way to production inside a second-hand fork of a mainstream editor completely ridiculous, just as I do all the hype around vibe coding.

Like anyone with a music hobby will readily tell you, Gear Acquisition Syndrome (commonly abbreviated as GAS) does not make you a better musician–it only gives you a dopamine hit and the kind of headache that comes with spending all night twiddling the knobs on your fancy new synth instead of actually composing a track.

It’s All About Being Organized

If you’ve done any research on coding with LLMs, you probably came across Harper Reed’s post or Simon Willison’s. Both of them are great reads, and cover different parts of the workflow (brainstorming, documenting as context refinement, and iterating).

My current workflow is quite similar, and I focus a lot on context refinement.

Starting New Projects

For greenfield (new) projects, I usually start with the same few steps:

  • I will write an initial SPEC.md that covers:
    • What the project is about
    • What kind of tools and libraries I want to use (aiohttp, sqlalchemy, etc.)
    • What kind of code style I prefer (for Python, that is usually functional, minimal OOP, and with explicit imports)
    • How the code and tests should be laid out
  • I iterate on it with the LLM, 20-questions style, so that it refines the SPEC.md. This usually yields a list of features that I typically break out into a TODO.md to avoid having it constantly update the SPEC.md and inevitably break something.

I then ask the LLM to generate a basic project structure (usually a poetry or pipenv project) and a few files to get started, like main.py, __init__.py, and a few test files.

Then I give it both files as context and ask it to implement the first few items on the TODO.md and check them off. Claude, Gemini and o3 (which I use on aider) can usually do this without any significant issue as long as you keep the current context focused on only one or two items.
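
In practice, keeping the context focused just means being deliberate about what goes into each request. Here is a minimal sketch of the idea, assuming the SPEC.md and TODO.md layout above (the helper itself is hypothetical; inside an editor, the equivalent is simply attaching both files and naming one or two items):

```python
from pathlib import Path

def build_prompt(max_items: int = 2) -> str:
    """Assemble a focused request: the spec, plus only the first couple
    of unchecked TODO items, so the model isn't tempted to wander."""
    spec = Path("SPEC.md").read_text()
    todos = [
        line
        for line in Path("TODO.md").read_text().splitlines()
        if line.lstrip().startswith("- [ ]")
    ][:max_items]
    return (
        spec
        + "\n\nImplement ONLY the following items, then check them off in TODO.md:\n"
        + "\n".join(todos)
    )

print(build_prompt())
```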

After a few iterations of this, I already know what I need to fix or provide better instructions for, so I usually update the SPEC.md or create separate NOTES.md files to clarify the things that the model will inevitably “forget” about as we progress:

  • What are the key parts of the database schemas (so that it doesn’t need to go digging around in the ORM or SQL schemas I usually draft as reference)
  • What are the key threads/processes/coroutines and what is the division of labor among them
  • How it should handle specific data structures
  • How it should write specific sections of the code (say, event handlers or other things that I want it to follow a strict pattern for; see the sketch after this list)
  • How it should run tests
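
The “strict pattern” notes usually boil down to one canonical example the model is told to mimic. A hypothetical NOTES.md entry for event handlers might pin the shape down like this (all names are illustrative):

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass
class Event:
    kind: str
    payload: dict

def handle_user_created(event: Event) -> None:
    """Canonical handler shape: validate first, do the work, log once,
    and never let an exception escape the handler."""
    if "user_id" not in event.payload:
        logger.warning("user_created missing user_id: %r", event.payload)
        return
    try:
        # ...actual work goes here...
        logger.info("handled user_created for %s", event.payload["user_id"])
    except Exception:
        logger.exception("user_created handler failed")
```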

Handling Existing Projects

For existing projects I am picking up or contributing to, the process is a bit different, but the same principles apply. The first thing I do is to go on a fact-finding mission, which usually involves:

  • Asking the model to assess the codebase and answer these basic questions:
    • How is the code structured
    • What are the key data structures
    • What are the key APIs/interfaces it provides and consumes
    • How are errors handled
    • How are tests run
  • Consolidating the findings into a NOTES.md with its (typically Claude’s, because I like its summary style) understanding of the code and a TODO.md with any salient improvements, which I then review and add my own goals to (I’ve found Gemini a bit erratic here, and o3 to be a bit too enthusiastic or verbose, but that’s just personal taste)

I then go into the same loop as for greenfield projects, but with a lot more emphasis on adding logging, error checking, and more NOTES.md files for each module or process.

Note: I’ve also started writing little MCP servers to either perform routine tasks or to actually retrieve part of the information I need from the codebase (like the database schema or the test structure) and feed it to the model as more refined context, but it’s early days yet and I don’t really know how useful they will be in the long run (aside from being a fun exercise).
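
For a flavor of what these look like, here is a minimal sketch of a server that exposes a SQLite schema as a tool, assuming the official mcp Python SDK (the database path and tool name are illustrative, and the real servers are a bit more involved):

```python
import sqlite3

from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

mcp = FastMCP("codebase-context")

@mcp.tool()
def database_schema() -> str:
    """Return the CREATE TABLE statements for the app database, so the
    model doesn't have to dig through the ORM code to find them."""
    conn = sqlite3.connect("app.db")  # hypothetical path
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    conn.close()
    return "\n\n".join(sql for (sql,) in rows)

if __name__ == "__main__":
    mcp.run()
```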

This approach works very well inside VS Code/GitHub Copilot or with aider, and none of the fancy new AI editors bring anything substantially useful to the table–the ability to do direct code edits and tool use is fast becoming a commodity feature, and the only thing that really matters is how well the model can follow instructions and how skilled and methodical you are at providing them.

The Models

Right now there is no cloud-hosted reasoning model that does this amazingly better than any other, although at the very low end Qwen3 on my RTX 3060 via ollama or directly on mlx can be surprisingly good at it when compared to cloud-hosted ones (the qwen3:8b variant runs very well on my MacBook, although aider sometimes has a tough time getting decent output from it).
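
If you want to kick the tires on the same setup, a quick smoke test with the ollama Python client looks something like this (assuming you have ollama running and the model pulled):

```python
import ollama  # assumes the ollama Python client is installed

# Quick smoke test against a local model (run `ollama pull qwen3:8b` first).
response = ollama.chat(
    model="qwen3:8b",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response["message"]["content"])
```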

I do prefer Claude at the moment, but that is mostly because Gemini tends to just ignore instructions and/or tools and o3 is very long-winded, so since the first two are included in my Copilot plan I tend to spend more time inside VS Code these days.

Blind Coding vs Planning and Experience

But the real challenge goes beyond presentation or the ability to follow immediate instructions.

In particular, I’ve found that there are certain key parts of structural analysis that the model will completely forget about once we get down to code generation (which is due both to running out of context window as we move along and to the inability to retrace steps beyond the current task at hand).

This sort of explains why LLMs are pretty bad at architecting code, but the biggest flaw, in my view, is that they don’t “discuss” outcomes and just plow through implementation without any checkpoints (which is not something I’ve seen anyone implement yet).
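
To make that concrete, what I mean by a checkpoint is a plan/confirm gate between analysis and code generation. A minimal sketch of the idea, with llm() as a stand-in for whatever model call you actually use:

```python
def llm(prompt: str) -> str:
    """Stand-in for an actual model call (API client, aider, etc.)."""
    raise NotImplementedError

def implement_with_checkpoint(task: str) -> str:
    # First pass: make the model commit to a plan in writing, without code.
    plan = llm(f"Outline a step-by-step plan for: {task}. Do not write code yet.")
    print(plan)
    # Checkpoint: a human (or a critic model) signs off before implementation.
    if input("Proceed with this plan? [y/N] ").strip().lower() != "y":
        return ""
    return llm(f"Implement the following plan, one step at a time:\n{plan}")
```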

The key thing is that this is a process. It requires planning, effort, and a lot of writing that is deliberately meant to be re-used and revised throughout the entire project, and it is pretty far off the beaten track of the “tab tab tab” autocomplete approach or the blind one-shot prompting I’ve seen vibe coding advocates go on about.

More importantly, it requires a good understanding of the problem domain and the libraries and patterns you want to use, instead of just blindly accepting what the model puts forward.

In a word, it requires taste.

In two, it requires both taste and experience, which, aside from the discipline that comes with age, is something that eludes all the folk who believe they can replace programmers…