The following article originally appeared on Addy Osmani's blog and is being reposted here with the author's permission.
Comprehension debt is the hidden cost to human intelligence and memory that results from excessive reliance on AI and automation. For engineers, it applies most directly to agentic engineering.
There's a cost that doesn't show up in your velocity metrics when teams go deep on AI coding tools, especially when it's tedious to review all of the code the AI generates. This cost accumulates steadily, and eventually it must be paid, with interest. It's called comprehension debt, or cognitive debt.
Comprehension debt is the growing gap between how much code exists in your system and how much of it any human being genuinely understands.
Unlike technical debt, which announces itself through mounting friction (slow builds, tangled dependencies, the creeping dread every time you touch that one module), comprehension debt breeds false confidence. The codebase looks clean. The tests are green. The reckoning arrives quietly, usually at the worst possible moment.
Margaret-Anne Storey describes a student team that hit this wall in week seven: They could no longer make simple changes without breaking something unexpected. The real problem wasn't messy code. It was that no one on the team could explain why design decisions had been made or how different parts of the system were supposed to work together. The theory of the system had evaporated.
That's comprehension debt compounding in real time.
I've read Hacker News threads that captured engineers genuinely wrestling with the structural version of this problem: not the familiar optimism-versus-skepticism binary, but a field trying to figure out what rigor actually looks like once the bottleneck has moved.

A recent Anthropic study titled "How AI Impacts Skill Formation" highlighted the potential downsides of over-reliance on AI coding assistants. In a randomized controlled trial with 52 software engineers learning a new library, participants who used AI assistance completed the task in roughly the same time as the control group but scored 17 percentage points lower on a follow-up comprehension quiz (50% versus 67%). The biggest declines occurred in debugging, with smaller but still significant drops in conceptual understanding and code reading. The researchers emphasize that passive delegation ("just make it work") impairs skill development far more than active, question-driven use of AI. The full paper is available at arXiv.org.
There's a speed asymmetry problem here
AI generates code far faster than humans can evaluate it. That sounds obvious, but the implications are easy to underestimate.
When a developer on your team writes code, the human review process has always been a bottleneck, but a productive and educational one. Reading their PR forces comprehension. It surfaces hidden assumptions, catches design decisions that conflict with how the system was architected six months ago, and distributes knowledge about what the codebase actually does across the people responsible for maintaining it.
AI-generated code breaks that feedback loop. The volume is too high. The output is syntactically clean, often well formatted, superficially correct: precisely the signals that historically triggered merge confidence. But surface correctness is not systemic correctness. The codebase looks healthy while comprehension quietly hollows out beneath it.
I've read one engineer observe that the bottleneck has always been a competent developer understanding the project. AI doesn't change that constraint. It creates the illusion that you've escaped it.
And the inversion is sharper than it looks. When code was expensive to produce, senior engineers could review faster than junior engineers could write. AI flips this: A junior engineer can now generate code faster than a senior engineer can critically audit it. The rate-limiting factor that kept review meaningful has been removed. What was a quality gate is now a throughput problem.
I like tests, but they aren't a complete answer
The instinct to lean harder on deterministic verification (unit tests, integration tests, static analysis, linters, formatters) is understandable. I do this a lot in projects that lean heavily on AI coding agents. Automate your way out of the review bottleneck. Let machines check machines.
This helps. It has a hard ceiling.
A test suite capable of covering all observable behavior would, in many cases, be more complex than the code it validates, and complexity you can't reason about doesn't provide safety. Beneath that is a more fundamental problem: You can't write a test for behavior you haven't thought to specify.
Nobody writes a test asserting that dragged items shouldn't turn completely transparent. Of course they didn't. That possibility never occurred to them. That's exactly the class of failure that slips through, not because the test suite was poorly written, but because no one thought to look there.
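To make the blind spot concrete, here's a minimal, hypothetical sketch (the function and test names are invented for illustration, not taken from any real codebase): a test that pins down every behavior its author thought of while leaving an unstated assumption entirely unguarded.

```python
def dragged_item_style(x: int, y: int) -> dict:
    """Compute inline styles for an item while it is being dragged."""
    return {
        "left": f"{x}px",
        "top": f"{y}px",
        # An AI refactor could quietly change 0.5 to 0.0 here; nothing
        # in the test below would notice.
        "opacity": 0.5,
    }

def test_dragged_item_follows_cursor():
    style = dragged_item_style(120, 80)
    assert style["left"] == "120px"
    assert style["top"] == "80px"
    # No assertion on opacity: the possibility that dragged items could
    # become fully transparent never occurred to anyone, so this suite
    # stays green even if that behavior silently changes.

test_dragged_item_follows_cursor()
```

The suite isn't badly written; it faithfully covers everything its author thought to specify. The gap is in what was never specified at all.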
There's also a specific failure mode worth naming. When an AI changes implementation behavior and updates hundreds of test cases to match the new behavior, the question shifts from "is this code correct?" to "were all those test changes necessary, and do I have enough coverage to catch what I'm not thinking about?" Tests can't answer that question. Only comprehension can.
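One cheap mitigation (my own sketch, not a practice the article prescribes) is a review heuristic that flags changesets where test modifications vastly outnumber source modifications, a shape that often means the tests were rewritten to match new behavior rather than the behavior being fixed to satisfy the tests:

```python
def flag_test_churn(changed_files: list[str], ratio_threshold: float = 3.0) -> bool:
    """Return True when a changeset deserves extra, comprehension-focused review.

    A file is crudely counted as a test if 'test' appears in its path;
    real tooling would use the project's actual test layout.
    """
    test_changes = sum(1 for f in changed_files if "test" in f.lower())
    source_changes = len(changed_files) - test_changes
    if test_changes == 0:
        return False
    # Flag changes that touch only tests, or far more tests than source.
    return source_changes == 0 or test_changes / source_changes >= ratio_threshold

# One implementation file plus forty rewritten test files is exactly the
# shape of change this heuristic surfaces.
print(flag_test_churn(["src/parser.py"] + [f"tests/test_case_{i}.py" for i in range(40)]))
```

A flag like this doesn't answer whether the test changes were justified; it only routes the changeset to a human who can.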
The data is starting to back this up. Research suggests that developers who use AI to delegate code generation score below 40% on comprehension assessments, while developers who use AI for conceptual inquiry (asking questions, exploring tradeoffs) score above 65%. The tool doesn't destroy understanding. How you use it does.
Tests are necessary. They aren't sufficient.
Lean on specs, but they're also not the full story
A common proposed solution: Write a detailed natural-language spec first. Include it in the PR. Review the spec, not the code. Trust that the AI faithfully translated intent into implementation.
This is appealing in the same way the Waterfall methodology was once appealing. Carefully define the problem first, then execute. Clean separation of concerns.
The problem is that translating a spec into working code involves an enormous number of implicit decisions (edge cases, data structures, error handling, performance tradeoffs, interaction patterns) that no spec ever fully captures. Two engineers implementing the same spec will produce systems with many observable behavioral differences. Neither implementation is wrong. They're just different. And many of those differences will eventually matter to users in ways nobody anticipated.
There's another problem with detailed specs worth calling out: A spec detailed enough to fully describe a program is, in effect, the program itself, just written in a non-executable language. The organizational cost of writing specs thorough enough to substitute for review may well exceed the productivity gains from using AI to execute them. And you still haven't reviewed what was actually produced.
The deeper issue is that there's often no correct spec. Requirements emerge through building. Edge cases reveal themselves through use. The belief that you can fully specify a non-trivial system before building it has been tested repeatedly and found wanting. AI doesn't change this. It just adds a new layer of implicit decisions made without human deliberation.
Learn from history
Decades of managing software quality across distributed teams with varying context and communication bandwidth have produced real, tested practices. Those don't evaporate because the team member is now a model.
What changes with AI is cost (dramatically lower), speed (dramatically higher), and interpersonal management overhead (essentially zero). What doesn't change is the need for someone with deep system context to maintain a coherent understanding of what the codebase is actually doing and why.
This is the uncomfortable redistribution that comprehension debt forces.
As AI volume goes up, the engineer who truly understands the system becomes more valuable, not less. The ability to look at a diff and immediately know which behaviors are load-bearing. To remember why that architectural decision got made under pressure eight months ago. To tell the difference between a refactor that's safe and one that's quietly shifting something users depend on. That skill becomes the scarce resource the whole system depends on.
There's a measurement gap here too
The reason comprehension debt is so dangerous is that nothing in your current measurement system captures it.
Velocity metrics look immaculate. DORA metrics hold steady. PR counts are up. Code coverage is green.
Performance calibration committees see velocity improvements. They cannot see comprehension deficits, because no artifact of how organizations measure output captures that dimension. The incentive structure optimizes correctly for what it measures. What it measures no longer captures what matters.
This is what makes comprehension debt more insidious than technical debt. Technical debt is usually a conscious tradeoff: you chose the shortcut, you know roughly where it lives, you can schedule the paydown. Comprehension debt accumulates invisibly, often without anyone making a deliberate decision to let it. It's the aggregate of hundreds of reviews where the code looked fine and the tests were passing and there was another PR in the queue.
The organizational assumption that reviewed code is understood code no longer holds. Engineers accepted code they didn't fully understand, and that acceptance now carries implicit endorsement. The liability has been distributed without anyone noticing.
The regulation horizon is closer than it looks
Every industry that moved too fast eventually attracted regulation. Tech has been unusually insulated from that dynamic, partly because software failures are often recoverable, and partly because the industry has moved faster than regulators could follow.
That window is closing. When AI-generated code is running in healthcare systems, financial infrastructure, and government services, "the AI wrote it and we didn't fully review it" will not hold up in a post-incident report when lives or critical assets are at stake.
Teams that build comprehension discipline now, treating genuine understanding rather than just passing tests as non-negotiable, will be better positioned when that reckoning arrives than teams that optimized purely for merge velocity.
What comprehension debt actually demands
The right question for now isn't "how do we generate more code?" It's "how do we actually understand more of what we're shipping?" so we can make sure our users get a consistently high-quality experience.
That reframe has practical consequences. It means being ruthlessly explicit about what a change is supposed to do before it's written. It means treating verification not as an afterthought but as a structural constraint. It means maintaining the system-level mental model that lets you catch AI mistakes at architectural scale rather than line by line. And it means being honest about the difference between "the tests passed" and "I understand what this does and why."
Making code cheap to generate doesn't make understanding cheap to skip. The comprehension work is the job.
AI handles the translation, but someone still has to know what was produced, why it was produced that way, and whether those implicit decisions were the right ones. Otherwise you're just deferring a bill that will eventually come due in full.
You'll pay for comprehension eventually. The debt accrues interest quickly.

