The End of Pour-Over Code
How to Engineer Code You Didn't Write



There's a certain kind of coffee shop where the barista spends five minutes making a single cup.
They weigh the beans. Grind them precisely. Pour hot water in careful spirals over a glass dripper.
It's meticulous. It's artisanal. It's beautiful.
And it absolutely does not scale.
For a long time, software engineering had a similar aesthetic. Good engineers wrote most of the code themselves. They read nearly every line. They understood the system intimately. Hand-crafted code was the ideal. But the environment is changing.
Today it's normal for LLM agents to generate large chunks of a system. Entire modules. Tests. Refactors. Sometimes full features.
Which raises an uncomfortable question:
If you didn't write the code, and you didn't read every line, what does it mean to still be doing good engineering?
The answer is not "trust the AI." The answer is that the work moves up a level. Software used to reward engineers who could hand-craft every line. The next generation will reward engineers who can design systems that safely accept code they didn't write.
Engineer Boundaries, Not Functions
Once agents can generate code cheaply, the game changes. The problem is no longer "can we produce the code?" The problem is "can this code exist in the system without quietly making a mess?" That is why boundaries matter more now.
You can survive a lot of mediocre implementation inside a well-bounded module.
You cannot survive a codebase where every generated helper reaches into shared state, calls three services, mutates something surprising, and returns whatever shape felt convenient in the moment.
That is how you end up with a repository full of code that technically works and operationally behaves like a gas leak.
A weak boundary usually looks innocent at first. That is part of the problem. It takes in loose input, returns a fuzzy result, hides side effects, and knows too much about neighboring systems.
Here is a weak boundary:
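A sketch of what that can look like in Python. Every name here is invented for illustration, and small in-memory stubs stand in for real infrastructure so the example runs:

```python
# Hypothetical stand-ins for real infrastructure (invented for this sketch).
class FakeDB:
    def __init__(self):
        self.users = {"u1": {"plan": "pro"}}
        self.orders = {}
    def load_user(self, user_id):
        return self.users[user_id]
    def save_order(self, order):
        self.orders[order["id"]] = order

class FakeBilling:
    def __init__(self):
        self.charges = []
    def charge(self, user, amount):
        self.charges.append(amount)

class FakeAnalytics:
    def __init__(self):
        self.events = []
    def track(self, name, payload):
        self.events.append(name)

db, billing, analytics = FakeDB(), FakeBilling(), FakeAnalytics()

def process_order(order):
    """Loose dict in, fuzzy value out, side effects everywhere."""
    user = db.load_user(order["user_id"])        # reaches into shared state
    if user["plan"] == "pro":
        order["total"] *= 0.8                    # mutates its own input
    db.save_order(order)                         # hidden database write
    billing.charge(user, order["total"])         # hidden billing side effect
    analytics.track("order_processed", order)    # hidden analytics event
    return order.get("id")                       # whatever shape felt convenient
```

Nothing about `process_order`'s signature tells you it touches the database, billing, and analytics, or that it rewrites the dict you passed in.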
This function does too much, knows too much, and hides too much.
It loads data, applies business logic, writes to the database, triggers billing side effects, and emits analytics. If generated code starts multiplying around this style, your future is just a stack of polite-looking accidents.
Now compare it to a stronger boundary:
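A sketch of the same operation behind a stronger boundary, again with hypothetical names. The core is a pure function: typed input, typed output, and the follow-up action returned as data for the caller to execute at the edge:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderRequest:
    order_id: str
    user_id: str
    total_cents: int
    plan: str                      # the caller supplies what the function needs

@dataclass(frozen=True)
class OrderResult:
    order_id: str
    amount_due_cents: int
    next_action: str               # "charge" or "skip" -- surfaced, not buried

def price_order(req: OrderRequest) -> OrderResult:
    """Pure core: explicit input, explicit output, no hidden writes."""
    discount = 0.2 if req.plan == "pro" else 0.0
    amount = round(req.total_cents * (1 - discount))
    action = "charge" if amount > 0 else "skip"
    return OrderResult(req.order_id, amount, action)
```

The database write, the billing call, and the analytics event still happen, but in a thin caller that reads `next_action` from the result, not inside the pricing logic.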
This version is not better because it is fancier. It is better because it is easier to trust. The input is explicit. The output is explicit. The side effects are narrower. The next action is surfaced in the result instead of being buried inside the function.
That is what a good boundary buys you. It contains damage. Even if some of the implementation is generated, the system is still easier to reason about because the module has a defined job and a predictable surface area.
The standard now is no longer "is every line artisanal?" It is closer to:
- Can I tell what goes in?
- Can I tell what comes out?
- Can I tell what this thing is allowed to touch?
- If it breaks, do I know how far the damage travels?
In the old model, people obsessed over the function body. In the next one, the interface matters more.
Verify Behavior, Not Every Line
A lot of the anxiety around AI-generated code comes from a single instinct:
"If I didn't personally inspect every line, how do I know this thing is safe?"
Fair question.
But "read every line forever" is not a serious operating model once agents are generating large chunks of implementation. It does not scale, and more importantly, it points your attention at the wrong thing.
The real question is not whether you reviewed every line like a Victorian factory foreman. The real question is whether the system has enforceable contracts, and whether review is focused on consequences instead of trivia.
That is where confidence comes from now.
I care less that a generated implementation is elegant on first pass than I care that it satisfies the properties I actually need:
- Schema validity
- Idempotency
- Access control invariants
- Failure behavior
- Data integrity under retries
For example:
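A sketch of contract tests pinning two of those properties, idempotency and data integrity under retries, for a hypothetical `apply_credit` operation (every name here is invented for illustration):

```python
def apply_credit(state: dict, user_id: str, credit_id: str, amount: int) -> dict:
    """Apply a credit exactly once; return new state, never mutate the old."""
    if credit_id in state["applied"]:
        return state                                   # safe under retries
    new_state = {"accounts": dict(state["accounts"]),
                 "applied": set(state["applied"]) | {credit_id}}
    new_state["accounts"][user_id] = new_state["accounts"].get(user_id, 0) + amount
    return new_state

# Contract tests: these survive any rewrite of apply_credit's internals.
def test_idempotent_under_retries():
    start = {"accounts": {"u1": 0}, "applied": set()}
    once = apply_credit(start, "u1", "c1", 500)
    twice = apply_credit(once, "u1", "c1", 500)        # simulated retry
    assert twice["accounts"]["u1"] == 500              # not double-applied

def test_input_not_mutated():
    start = {"accounts": {"u1": 0}, "applied": set()}
    apply_credit(start, "u1", "c1", 500)
    assert start["accounts"]["u1"] == 0                # no hidden mutation
```

Neither test cares how the credit is applied, only that applying it twice is the same as applying it once and that the caller's data is never silently changed.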
In a codebase with heavy agent involvement, that test suite is less like a safety net and more like a fence. It tells the system what counts as valid behavior and what does not. The implementation can change. The contract cannot.
That is the first half of the shift.
The second half is how humans review changes.
There are still places where exact implementation deserves real scrutiny, especially around security, data handling, concurrency, and anything irreversible. But as a default operating model, "review every line of every generated change" is not going to hold.
Human review becomes more valuable when it focuses on consequences:
- Did an interface change?
- Did a schema drift?
- Did a dependency introduce new risk?
- Did a module cross a boundary it should not cross?
- Did the blast radius get larger?
- Did observability get weaker?
Implementation can be delegated. Consequences cannot.
That is the standard I keep coming back to. Engineers should spend less time asking "would I have written this exact loop differently?" and more time asking "does this preserve the architecture, the contracts, and the operating model of the system?"
That is the real mental shift. When code is abundant, tests define the allowed behavior and review protects the system around it.
The old craft trained us to admire the cup. The new craft requires us to inspect the plumbing. Less romantic, maybe. Much more useful.
Documentation Becomes Operational Infrastructure
This is the part people still underestimate. If agents are helping build and maintain your codebase, documentation is no longer a nice artifact for future onboarding. It is operational infrastructure.
Docs are how you make a system legible. Humans need that because nobody can hold a growing codebase in their head forever. Agents need it because they are extremely good at following local cues and extremely capable of making locally reasonable and globally annoying decisions when those cues are missing.
Good documentation should explain:
- Architecture and service boundaries
- Module contracts
- Naming and layout conventions
- Invariants and dangerous areas
- What not to touch casually
The difference between a repository with decent docs and one without them is the difference between "helpful assistant" and "helpful raccoon with bolt cutters."
At minimum, I want a system to have a structure like this:
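One plausible layout (the file names are illustrative, not a prescription):

```
docs/
  architecture.md     # services, boundaries, data ownership
  contracts/
    orders.md         # module contracts: inputs, outputs, invariants
    billing.md
  conventions.md      # naming, layout, error-handling style
  danger-zones.md     # invariants, irreversible operations, do-not-touch
```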
And those docs should answer practical questions:
- What owns this data?
- What events can trigger this flow?
- What assumptions are safe?
- What failures are expected?
- What changes require extra caution?
The goal is not documentation for documentation's sake. The goal is to make the next correct move easier for both humans and agents.
If the codebase is going to accept contributions from humans, agents, or some cursed collaboration between the two, docs need to do more than explain. They need to constrain.
Code Entropy Is the New Enemy
Cheap code has a hidden tax. It is very easy to generate one more helper, one more wrapper, one more adapter, one more service that kind of overlaps with the other one but has a slightly different name because nobody wanted to touch the original.
You wake up one day and the repository is full of things like:
- Duplicate utilities
- Near-identical implementations
- Half-adopted abstractions
- Abandoned modules
- Glue code that exists mainly to apologize for other glue code
This is what abundance does. It accelerates entropy. The limiting factor is no longer "can we produce code?" The limiting factor is "can we keep the system coherent while code production gets easier?" That pushes good engineering toward consolidation, simplification, and deletion.
A mature team in this environment needs regular cleanup loops:
- Merge overlapping utilities
- Remove unused modules
- Collapse shallow abstractions
- Delete dead paths aggressively
- Tighten boundaries that have started to drift
In other words, the mature move is often not generating more code. It is aggressively removing the code that should not exist anymore.
You will notice something funny here: the better code generation gets, the more important taste becomes. Not taste in naming variables. Taste in shaping the repository so it does not become a landfill with type safety.
Use Agents for System Hygiene
One of the more interesting shifts here is that agents are not just builders. They can also be maintainers. That is where this starts to get practical.
A lot of repo hygiene work is important, repetitive, and just annoying enough to get deferred for six months:
- Synchronizing docs with code
- Finding dead modules
- Suggesting consolidations
- Checking architecture rules
- Flagging dependency drift
- Identifying stale tests and unused helpers
Those are exactly the kinds of maintenance tasks agents can assist with, assuming you give them clear boundaries and a review loop.
You can imagine scheduled maintenance jobs that do things like:
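For instance, a duplicate-helper scan can be a few lines of Python. This is a sketch, not a production tool: it only looks at top-level function names, and the path and heuristic are invented for illustration:

```python
import ast
from collections import defaultdict
from pathlib import Path

def duplicate_helpers(root: str) -> dict:
    """Map top-level function names to the files defining them; keep collisions."""
    seen = defaultdict(list)
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in tree.body:                        # top-level defs only
            if isinstance(node, ast.FunctionDef):
                seen[node.name].append(str(path))
    # A name defined in more than one file is a candidate for consolidation.
    return {name: files for name, files in seen.items() if len(files) > 1}
```

A scheduled job could run this weekly and open an issue listing the collisions instead of touching any code.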
Or more realistically:
- Open a PR to sync docs after interface changes
- Flag modules with no callers in the last N days
- Detect duplicate helpers with overlapping signatures
- Report new violations of architecture rules
- Suggest missing contract tests around fragile boundaries
That is a very different posture from "have the agent generate a file and hope for the best." It is not autonomous chaos. It is supervised hygiene.
I expect strong teams to build recurring maintenance loops around this:
- Weekly dead code proposals
- Documentation drift checks
- Dependency review with policy filters
- Architecture rule enforcement
- Coverage gap suggestions around critical seams
That's not replacing engineering. It's increasing how much upkeep a team can sustain without turning every staff engineer into a full-time janitor.
The New Engineering Loop
Put it all together and the loop starts to look different:
- Engineers design the architecture.
- Agents generate implementation inside defined boundaries.
- Tests enforce behavioral guarantees.
- Maintenance agents keep the repository legible.
- Engineers review outcomes, risks, and system-level changes.
That loop matters because it preserves what actually makes engineering valuable: judgment.
The point is not that code no longer matters. The point is that manual code production is no longer the center of gravity. The craft still exists. It just relocates.
Old craft was pouring the cup by hand. New craft is designing the cafe so every cup comes out right, even when you're not the one touching the kettle.
That is a harder job, honestly. It requires taste, systems thinking, discipline, and a stronger stomach for ambiguity. But it is also where the leverage is going to be.
If you're building with agents today, start here:
- Define explicit input/output types for critical modules
- Write tests for invariants, not implementation
- Document system boundaries before adding features
- Schedule weekly cleanup loops, not quarterly
- Review changes for blast radius, not syntax
You don't need a new stack. You need a tighter system.
The future of engineering is not less rigor. It is rigor at the system level.