Who's Accountable for the Code Nobody Wrote?

The productivity numbers are real. Teams using AI agents are shipping faster. Pull request volume is up. Cycle times are down. Features that used to take weeks are taking days. If your organization isn't taking this seriously, it's probably already falling behind.

But there's a question most of those productivity dashboards aren't measuring: who owns what just got shipped?

The wrong part of the equation

The software industry has long known, even if it hasn't always acted on it, that writing code is the cheap part of owning it. Initial development is a fraction of lifetime cost. Maintenance, operations, incident response, and the slow accumulation of architectural decisions that made sense at the time: these are where the real spend lives. Commonly cited estimates put maintenance at well over half of a system's total lifetime cost.

AI accelerates the cheap part. It does very little to reduce, and is likely to increase, everything that comes after.

Think about what agent-driven development actually produces: large volumes of code, generated quickly, often without the coherent design intent that a human author imposes over time. An engineer who writes a service carries a mental model of it forward. That model is what gets you through the 2 AM incident. It's what lets you look at an error trace and say "oh, I know exactly what that is." AI-generated code ships without that context. The code exists. The understanding does not.

And here's what compounds the problem: as AI-assisted codebases grow, they don't necessarily get easier to work in. They can get harder. Navigation costs go up. The implicit architectural knowledge that makes a codebase coherent becomes thinner, because nobody designed the whole thing, and nobody really knows it.

What total cost of ownership actually means here

When I think about what it costs to own software in an AI-accelerated world, I find it useful to ask four questions. None of them are about the code itself. They're about the team's relationship to it.

  1. Can you see what's happening in production? Observability isn't glamorous, but it's the difference between knowing you have a problem and learning about it from a customer. If your team is shipping more changes more frequently, the signal-to-noise ratio in your production systems has to keep up.
  2. Can you understand why something broke? This is where the accountability gap starts to show. Debugging requires a working model of the system in your head. You need to know what it was supposed to do, what it actually did, and where those two things came apart. When nobody deeply understood the code in the first place, this step gets expensive fast.
  3. Can you fix it fast when it does break? Recovery time is largely a function of understanding. Teams that can't answer the previous question usually struggle here too. Deployment practices that limit blast radius, feature flags that let you roll things back without a full deployment (see the sketch after this list), clear runbooks for the failure modes you've already seen: these are what buy you time when things go wrong.
  4. Does anyone actually know how the system works? This is the deepest question and the hardest to answer honestly. Teams moving fast with AI tooling often have high output and thin institutional knowledge. Those two things can coexist for a while. Eventually, they can't.
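Item 3 deserves a concrete illustration, because "feature flags" can sound abstract. Below is a minimal sketch in Python, assuming a simple in-memory flag store; every function and flag name here is hypothetical, and a real system would back this with a flag service or a config table. The point is structural: when the risky path sits behind a flag, rolling back is a config change, not a deployment.

```python
# Minimal feature-flag kill switch. All names are hypothetical;
# a real setup would read flags from a flag service or config table.

FLAGS = {"new_pricing_engine": True}  # flip to False to roll back, no deploy needed

def flag_enabled(name: str) -> bool:
    """Look up a flag, defaulting to off so unknown flags fail safe."""
    return FLAGS.get(name, False)

def legacy_pricing(order: dict) -> float:
    return order["amount"]  # known-good path

def new_pricing_engine(order: dict) -> float:
    return order["amount"] * 0.95  # new, riskier path

def quote_price(order: dict) -> float:
    # Guarding the new path behind the flag limits blast radius:
    # one config flip restores the known-good behavior.
    if flag_enabled("new_pricing_engine"):
        return new_pricing_engine(order)
    return legacy_pricing(order)

print(quote_price({"amount": 100.0}))  # 95.0 with the flag on, 100.0 once flipped off
```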

The false comfort of test coverage

The reflex response to "we're shipping more code" is "we need more tests." That's not wrong, but it's incomplete in a specific way that matters.

AI-generated tests tend to be written to make the implementation pass. They verify that the code does what it does, not that it does what you wanted. Coverage numbers go up. Confidence can be misplaced. You end up with a green test suite and a system whose behavior isn't well understood.

Good test coverage matters, but what you're actually trying to preserve is intent. Tests that capture intent, the kind a senior engineer writes while thinking about edge cases, failure modes, and the reasoning behind a particular decision, are the ones that save you. Those require human judgment that doesn't automatically come along with AI generation. Right now, most teams aren't drawing that distinction clearly enough.
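To make the distinction concrete, here's a contrived sketch. The `calculate_refund` function and its bug are invented for illustration; the pattern is what matters. The first test mirrors the implementation and passes. The second encodes intent and catches the bug.

```python
# Hypothetical function with a deliberate bug: the "goodwill credit"
# can push a refund above what the customer actually paid.

def calculate_refund(amount_paid: float, days_since_purchase: int) -> float:
    refund = amount_paid
    if days_since_purchase > 30:
        refund = amount_paid * 0.5
    return refund + 5.0  # goodwill credit; this is the bug

# Implementation-mirroring test: asserts whatever the code currently
# returns. It passes, coverage goes up, and the bug ships.
def test_refund_matches_implementation():
    assert calculate_refund(10.0, 5) == 15.0

# Intent-capturing test: encodes the business rule a reviewer actually
# cares about. It fails, which is exactly the point.
def test_refund_never_exceeds_amount_paid():
    assert calculate_refund(10.0, 5) <= 10.0
```

Run that under pytest and the second test fails until someone caps the credit. That failure is the value: it forces a human to decide what the code was supposed to do.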

Observability before acceleration

If I were advising an engineering leader who was about to significantly scale up AI-assisted development, the first thing I'd ask is: what does your observability look like?

Not test coverage. Observability. Can you see what's running? Do your alerts have enough fidelity to catch real signals without drowning on-call engineers in noise? Can you trace a production failure back to its origin quickly, with a team that may never have read the relevant code?
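For teams that can't yet answer that last question, the unglamorous starting point is a correlation ID on every log line. Here's a minimal sketch, assuming a single-process Python service; the service and field names are illustrative, and a production setup would reach for OpenTelemetry or an equivalent rather than hand-rolled plumbing.

```python
# Attach a per-request ID to every log line via contextvars, so a failure
# can be traced end to end even by someone who has never read this code.
import contextvars
import logging
import uuid

request_id = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id.get()
        return True

logging.basicConfig(
    format="%(levelname)s request_id=%(request_id)s %(message)s",
    level=logging.INFO,
)
log = logging.getLogger("checkout")  # hypothetical service name
log.addFilter(RequestIdFilter())

def charge(payload: dict) -> None:
    if "card" not in payload:
        raise ValueError("missing card")
    log.info("charge succeeded")

def handle_request(payload: dict) -> None:
    request_id.set(uuid.uuid4().hex[:8])  # one ID per request
    log.info("received request")
    try:
        charge(payload)
    except Exception:
        log.exception("charge failed")  # carries the same request_id

handle_request({"card": "tok_123"})
handle_request({})  # failure lines share an ID with the request that caused them
```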

The teams I've seen get this right invest in observability before they accelerate. Not as a follow-on initiative, not as a Q3 priority after shipping a bunch of features in Q1 and Q2. Before. The cost of a production incident when nobody understands the code, when blast radius is larger than expected, and when the on-call engineer is staring at a service they've never touched, is much higher than any sprint velocity metric would suggest.

The accountability question

When a human engineer writes code, there's a natural accountability chain. They understand what they built. They're on the hook when it breaks. Understanding and accountability travel together.

AI-generated code severs that connection by default. The code exists. Ownership has to be assigned deliberately rather than assumed. Somebody has to read it closely enough to actually understand it. Somebody has to be willing to put their name on it. Not just approve the PR, but genuinely take responsibility for what it does in production.

This is a leadership problem as much as a technical one. Teams optimizing for velocity without building the ownership structures to match will eventually find that speed made things worse. What shipped fast will take a long time to understand, fix, and explain when it fails.

The right question for any team scaling AI-assisted development isn't "how much faster are we shipping?" It's "do we understand what we're shipping well enough to own it?"

If the answer is yes, go faster. If it isn't, that's the thing to work on first.

Note: Light AI assistance used for editing and idea refinement.