The End of Pour-Over Code
How to Engineer Code You Didn't Write



There's a certain kind of coffee shop where the barista spends five minutes making a single cup.
They weigh the beans. Grind them precisely. Pour hot water in careful spirals over a glass dripper.
It's meticulous. It's artisanal. It's beautiful.
And it absolutely does not scale.
For a long time, software engineering had a similar aesthetic. Good engineers wrote most of the code themselves. They read nearly every line. They understood the system intimately. Hand-crafted code was the ideal. But the environment is changing.
Today it's normal for LLM agents to generate large chunks of a system. Entire modules. Tests. Refactors. Sometimes full features.
Which raises an uncomfortable question:
If you didn't write the code, and you didn't read every line, what does it mean to still be doing good engineering?
The answer is not "trust the AI." The answer is that the work moves up a level. Software used to reward engineers who could hand-craft every line. The next generation will reward engineers who can design systems that safely accept code they didn't write.
Engineer Boundaries, Not Functions
Once agents can generate code cheaply, the game changes. The problem is no longer "can we produce the code?" The problem is "can this code exist in the system without quietly making a mess?" That is why boundaries matter more now.
You can survive a lot of mediocre implementation inside a well-bounded module.
You cannot survive a codebase where every generated helper reaches into shared state, calls three services, mutates something surprising, and returns whatever shape felt convenient in the moment.
That is how you end up with a repository full of code that technically works and operationally behaves like a gas leak.
A weak boundary usually looks innocent at first. That is part of the problem. It takes in loose input, returns a fuzzy result, hides side effects, and knows too much about neighboring systems.
Here is a weak boundary:
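A sketch of what that can look like in Python. Every name here is invented for illustration, and small in-memory stubs stand in for real infrastructure so the example runs:

```python
# Hypothetical stand-ins for real infrastructure (invented for this sketch).
class FakeDB:
    def __init__(self):
        self.users = {"u1": {"plan": "pro"}}
        self.orders = {}
    def load_user(self, user_id):
        return self.users[user_id]
    def save_order(self, order):
        self.orders[order["id"]] = order

class FakeBilling:
    def __init__(self):
        self.charges = []
    def charge(self, user, amount):
        self.charges.append(amount)

class FakeAnalytics:
    def __init__(self):
        self.events = []
    def track(self, name, payload):
        self.events.append(name)

db, billing, analytics = FakeDB(), FakeBilling(), FakeAnalytics()

def process_order(order):
    """Loose dict in, fuzzy value out, side effects everywhere."""
    user = db.load_user(order["user_id"])        # reaches into shared state
    if user["plan"] == "pro":
        order["total"] *= 0.8                    # mutates its own input
    db.save_order(order)                         # hidden database write
    billing.charge(user, order["total"])         # hidden billing side effect
    analytics.track("order_processed", order)    # hidden analytics event
    return order.get("id")                       # whatever shape felt convenient
```

Nothing about `process_order`'s signature tells you it touches the database, billing, and analytics, or that it rewrites the dict you passed in.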
This function does too much, knows too much, and hides too much.
It loads data, applies business logic, writes to the database, triggers billing side effects, and emits analytics. If generated code starts multiplying around this style, your future is just a stack of polite-looking accidents.
Now compare it to a stronger boundary:
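A sketch of the same operation behind a stronger boundary, again with hypothetical names. The core is a pure function: typed input, typed output, and the follow-up action returned as data for the caller to execute at the edge:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderRequest:
    order_id: str
    user_id: str
    total_cents: int
    plan: str                      # the caller supplies what the function needs

@dataclass(frozen=True)
class OrderResult:
    order_id: str
    amount_due_cents: int
    next_action: str               # "charge" or "skip" -- surfaced, not buried

def price_order(req: OrderRequest) -> OrderResult:
    """Pure core: explicit input, explicit output, no hidden writes."""
    discount = 0.2 if req.plan == "pro" else 0.0
    amount = round(req.total_cents * (1 - discount))
    action = "charge" if amount > 0 else "skip"
    return OrderResult(req.order_id, amount, action)
```

The database write, the billing call, and the analytics event still happen, but in a thin caller that reads `next_action` from the result, not inside the pricing logic.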
This version is not better because it is fancier. It is better because it is easier to trust. The input is explicit. The output is explicit. The side effects are narrower. The next action is surfaced in the result instead of being buried inside the function.
That is what a good boundary buys you. It contains damage. Even if some of the implementation is generated, the system is still easier to reason about because the module has a defined job and a predictable surface area.
The standard now is no longer "is every line artisanal?" It is closer to:
- Can I tell what goes in?
- Can I tell what comes out?
- Can I tell what this thing is allowed to touch?
- If it breaks, do I know how far the damage travels?
In the old model, people obsessed over the function body. In the next one, the interface matters more.
Verify Behavior, Not Every Line
A lot of the anxiety around AI-generated code comes from a single instinct:
"If I didn't personally inspect every line, how do I know this thing is safe?"
Fair question.
But "read every line forever" is not a serious operating model once agents are generating large chunks of implementation. It does not scale, and more importantly, it points your attention at the wrong thing.
The real question is not whether you reviewed every line like a Victorian factory foreman. The real question is whether the system has enforceable contracts, and whether review is focused on consequences instead of trivia.
That is where confidence comes from now.
I care less that a generated implementation is elegant on first pass than I care that it satisfies the properties I actually need:
- Schema validity
- Idempotency
- Access control invariants
- Failure behavior
- Data integrity under retries
For example:
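A sketch of contract tests pinning two of those properties, idempotency and data integrity under retries, for a hypothetical `apply_credit` operation (every name here is invented for illustration):

```python
def apply_credit(state: dict, user_id: str, credit_id: str, amount: int) -> dict:
    """Apply a credit exactly once; return new state, never mutate the old."""
    if credit_id in state["applied"]:
        return state                                   # safe under retries
    new_state = {"accounts": dict(state["accounts"]),
                 "applied": set(state["applied"]) | {credit_id}}
    new_state["accounts"][user_id] = new_state["accounts"].get(user_id, 0) + amount
    return new_state

# Contract tests: these survive any rewrite of apply_credit's internals.
def test_idempotent_under_retries():
    start = {"accounts": {"u1": 0}, "applied": set()}
    once = apply_credit(start, "u1", "c1", 500)
    twice = apply_credit(once, "u1", "c1", 500)        # simulated retry
    assert twice["accounts"]["u1"] == 500              # not double-applied

def test_input_not_mutated():
    start = {"accounts": {"u1": 0}, "applied": set()}
    apply_credit(start, "u1", "c1", 500)
    assert start["accounts"]["u1"] == 0                # no hidden mutation
```

Neither test cares how the credit is applied, only that applying it twice is the same as applying it once and that the caller's data is never silently changed.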
In a codebase with heavy agent involvement, that test suite is less like a safety net and more like a fence. It tells the system what counts as valid behavior and what does not. The implementation can change. The contract cannot.
That is the first half of the shift.
The second half is how humans review changes.
There are still places where exact implementation deserves real scrutiny, especially around security, data handling, concurrency, and anything irreversible. But as a default operating model, "review every line of every generated change" is not going to hold.
Human review becomes more valuable when it focuses on consequences:
- Did an interface change?
- Did a schema drift?
- Did a dependency introduce new risk?
- Did a module cross a boundary it should not cross?
- Did the blast radius get larger?
- Did observability get weaker?
Implementation can be delegated. Consequences cannot.
That is the standard I keep coming back to. Engineers should spend less time asking "would I have written this exact loop differently?" and more time asking "does this preserve the architecture, the contracts, and the operating model of the system?"
That is the real mental shift. When code is abundant, tests define the allowed behavior and review protects the system around it.
The old craft trained us to admire the cup. The new craft requires us to inspect the plumbing. Less romantic, maybe. Much more useful.
Documentation Becomes Operational Infrastructure
This is the part people still underestimate. If agents are helping build and maintain your codebase, documentation is no longer a nice artifact for future onboarding. It is operational infrastructure.
Docs are how you make a system legible. Humans need that because nobody can hold a growing codebase in their head forever. Agents need it because they are extremely good at following local cues and extremely capable of making locally reasonable and globally annoying decisions when those cues are missing.
Good documentation should explain:
- Architecture and service boundaries
- Module contracts
- Naming and layout conventions
- Invariants and dangerous areas
- What not to touch casually
The difference between a repository with decent docs and one without them is the difference between "helpful assistant" and "helpful raccoon with bolt cutters."
At minimum, I want a system to have a structure like this:
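One plausible layout (the file names are illustrative, not a prescription):

```
docs/
  architecture.md     # services, boundaries, data ownership
  contracts/
    orders.md         # module contracts: inputs, outputs, invariants
    billing.md
  conventions.md      # naming, layout, error-handling style
  danger-zones.md     # invariants, irreversible operations, do-not-touch
```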
And those docs should answer practical questions:
- What owns this data?
- What events can trigger this flow?
- What assumptions are safe?
- What failures are expected?
- What changes require extra caution?
The goal is not documentation for documentation's sake. The goal is to make the next correct move easier for both humans and agents.
If the codebase is going to accept contributions from humans, agents, or some cursed collaboration between the two, docs need to do more than explain. They need to constrain.
Code Entropy Is the New Enemy
Cheap code has a hidden tax. It is very easy to generate one more helper, one more wrapper, one more adapter, one more service that kind of overlaps with the other one but has a slightly different name because nobody wanted to touch the original.
You wake up one day and the repository is full of things like:
- Duplicate utilities
- Near-identical implementations
- Half-adopted abstractions
- Abandoned modules
- Glue code that exists mainly to apologize for other glue code
This is what abundance does. It accelerates entropy. The limiting factor is no longer "can we produce code?" The limiting factor is "can we keep the system coherent while code production gets easier?" That pushes good engineering toward consolidation, simplification, and deletion.
A mature team in this environment needs regular cleanup loops:
- Merge overlapping utilities
- Remove unused modules
- Collapse shallow abstractions
- Delete dead paths aggressively
- Tighten boundaries that have started to drift
In other words, the mature move is often not generating more code. It is aggressively removing the code that should not exist anymore.
You will notice something funny here: the better code generation gets, the more important taste becomes. Not taste in naming variables. Taste in shaping the repository so it does not become a landfill with type safety.
Use Agents for System Hygiene
One of the more interesting shifts here is that agents are not just builders. They can also be maintainers. That is where this starts to get practical.
A lot of repo hygiene work is important, repetitive, and just annoying enough to get deferred for six months:
- Synchronizing docs with code
- Finding dead modules
- Suggesting consolidations
- Checking architecture rules
- Flagging dependency drift
- Identifying stale tests and unused helpers
Those are exactly the kinds of maintenance tasks agents can assist with, assuming you give them clear boundaries and a review loop.
You can imagine scheduled maintenance jobs that do things like:
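For instance, a duplicate-helper scan can be a few lines of Python. This is a sketch, not a production tool: it only looks at top-level function names, and the path and heuristic are invented for illustration:

```python
import ast
from collections import defaultdict
from pathlib import Path

def duplicate_helpers(root: str) -> dict:
    """Map top-level function names to the files defining them; keep collisions."""
    seen = defaultdict(list)
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in tree.body:                        # top-level defs only
            if isinstance(node, ast.FunctionDef):
                seen[node.name].append(str(path))
    # A name defined in more than one file is a candidate for consolidation.
    return {name: files for name, files in seen.items() if len(files) > 1}
```

A scheduled job could run this weekly and open an issue listing the collisions instead of touching any code.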
Or more realistically:
- Open a PR to sync docs after interface changes
- Flag modules with no callers in the last N days
- Detect duplicate helpers with overlapping signatures
- Report new violations of architecture rules
- Suggest missing contract tests around fragile boundaries
That is a very different posture from "have the agent generate a file and hope for the best." It is not autonomous chaos. It is supervised hygiene.
I expect strong teams to build recurring maintenance loops around this:
- Weekly dead code proposals
- Documentation drift checks
- Dependency review with policy filters
- Architecture rule enforcement
- Coverage gap suggestions around critical seams
That's not replacing engineering. It's increasing how much upkeep a team can sustain without turning every staff engineer into a full-time janitor.
The New Engineering Loop
Put it all together and the loop starts to look different:
- Engineers design the architecture.
- Agents generate implementation inside defined boundaries.
- Tests enforce behavioral guarantees.
- Maintenance agents keep the repository legible.
- Engineers review outcomes, risks, and system-level changes.
That loop matters because it preserves what actually makes engineering valuable: judgment.
The point is not that code no longer matters. The point is that manual code production is no longer the center of gravity. The craft still exists. It just relocates.
Old craft was pouring the cup by hand. New craft is designing the cafe so every cup comes out right, even when you're not the one touching the kettle.
That is a harder job, honestly. It requires taste, systems thinking, discipline, and a stronger stomach for ambiguity. But it is also where the leverage is going to be.
If you're building with agents today, start here:
- Define explicit input/output types for critical modules
- Write tests for invariants, not implementation
- Document system boundaries before adding features
- Schedule weekly cleanup loops, not quarterly
- Review changes for blast radius, not syntax
You don't need a new stack. You need a tighter system.
The future of engineering is not less rigor. It is rigor at the system level.