A few weeks ago I watched a junior push four commits to the same pull request, each one a fresh Claude paste aimed at the same bug. Each round he was more confident. Each round he was no closer to fixing it, because the bug was in his prompt, not in the code. By round three nobody on the call could explain what the function was supposed to do.
The junior wasn’t the failure here. The pipeline was. We hired juniors into a process that quietly stopped doing the one thing it was supposed to do for them, which is to push back when they ship something they can’t defend. They learned to lean on AI because we let them, and then we trained them to.
In 2026, 76% of developers use an AI coding assistant, up from 44% two years ago. AI now writes roughly 41% of the code shipped. And yet 65% of developers still report burnout, AI-generated code carries 1.7× more issues than human-written code, and incidents per pull request are up 23.5% year-on-year.
We’re moving faster on paper while learning less, shipping worse, and burning out harder. The tools aren’t the problem. The expectation that they replace thinking is.
Yes, We’ve Heard This Before
Every five years a new tool gets blamed for breaking software engineering. IDEs were going to kill our ability to write Makefiles. Stack Overflow was going to produce a generation of copy-paste developers. Offshoring was going to gut institutional knowledge. None of those predictions aged well, and the industry survived each one.
So why is this time different? Because LLM output doesn’t look foreign. A Stack Overflow paste was obviously someone else’s code: different style, different variable names, different formatting. You knew you hadn’t written it. AI-generated code is formatted to your conventions, named after your variables, and commented in your tone. The social cue that used to say "stop, read this carefully" is gone. The output looks like it came from you, even when nobody on the team understood it.
That changes the failure mode. Old tools eroded specific skills. This one erodes the line between writing code and reading code, which is the line that mentorship, review, and learning all sit on.
This piece is about what happens when an organisation crosses that line at scale. If you’re a solo founder or a two-person shop, AI-shipped speed isn’t a failure of discipline, it’s the job. What I’m worried about is what breaks when companies scale those habits without scaling accountability.
The Speed Illusion
Output volume is up. Real velocity isn’t.
In a study of experienced developers using AI tools, participants took 19% longer to complete tasks while believing they were 20% faster. A 39-point gap between perception and reality, one of the largest "expectations gaps" recorded in modern software engineering research. The dashboards show acceleration. The clock shows the opposite.
The pattern repeats at the team level. According to a recent Cortex report, pull requests per author are up 20% year-on-year, incidents per pull request are up 23.5%, and change failure rates have risen by roughly 30%.
We’re measuring the wrong things. Lines of code, pull request count, story points closed: all up with AI. Time-to-resolution, incident counts, rework, weekend on-call hours: also up, faster. If you only look at the top of the funnel, AI looks like a miracle. If you look at the full ledger, the miracle starts to look like a loan.
The Habits That Don’t Get Fixed
In 2018, a junior who copy-pasted a Stack Overflow answer they didn’t understand would get stopped in code review before lunch. Someone would ask "why did you write it this way?" and the gap would surface. In 2026, that same junior gets a thumbs-up emoji and the pull request merges, because the senior who would’ve caught it is reviewing five times more code in the same hour.
Here are the habits I keep watching go uncorrected:
Copy-paste without reading. Developers ship code they couldn’t have written by hand and can’t explain in review. That isn’t laziness. It’s a rational response to a quota that assumes AI velocity but pays for human throughput.
Stack traces become LLM input, not learning. Pasting a trace into Claude isn’t the problem. Pasting it without reading what came back is. When the model is right, you ship. When the model is wrong, you’ve got nothing to fall back on, because you never built the muscle that lets you read the trace yourself.
AI-generated tests no one validates. Green CI doesn’t mean correct code. Tests that mirror the implementation pass even when the implementation is wrong. Coverage numbers go up while real coverage goes down. We’re watching codebases where every line is "tested" and nobody can explain what any of it actually verifies. (A sketch of this failure follows the list.)
Pattern atrophy. Developers are forgetting basic shapes: recursion, loops, common data structures, even the order of arguments to functions they wrote last month. Try it yourself this weekend: write merge sort with no AI, no Google, just a blank file. If you’re surprised by how rusty you feel, the atrophy has already started. The skill rusts the same way any skill does when you stop using it. (A reference version follows the list, for checking your attempt afterwards.)
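To make the mirror-test failure concrete, here’s a minimal sketch. The function and tests are hypothetical, invented for illustration, but the shape is the one I keep seeing: the test re-derives its expected value from the implementation’s own formula, so it can’t fail.

```python
def apply_discount(price: float, percent: float) -> float:
    # Bug: should be percent / 100. As written, a "10% discount"
    # wipes out the entire price.
    return price * (1 - percent / 10)


def test_apply_discount_mirrors_the_bug():
    # The expected value is computed with the implementation's own
    # (wrong) formula, so this passes no matter what the bug does.
    price, percent = 200.0, 10.0
    assert apply_discount(price, percent) == price * (1 - percent / 10)


def test_apply_discount_against_a_known_answer():
    # The test a human who thought about the behaviour would write:
    # a hand-computed value. This one fails and exposes the bug.
    assert apply_discount(200.0, 10.0) == 180.0
```

With only the first test in the suite, CI is green and coverage reads 100%.

And here’s the reference merge sort, the answer key for the blank-file exercise. Write yours first:

```python
def merge_sort(xs: list) -> list:
    # Split in half, sort each half recursively, merge the sorted halves.
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```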
The same dynamic plays out beyond closed teams. Open-source maintainers are now drowning in plausible-looking pull requests that submitters can’t defend in review: the same broken feedback loop, just with no shared employer to escalate to.
None of this would survive a real review. That’s the whole problem. Reviews used to be where bad habits died. They’ve become a throughput gate.
Debugging Is Where the Bill Comes Due
You can ship code you don’t understand. You can’t debug it.
Production incident at 2 a.m. The AI doesn’t know your system’s quirks, your team’s conventions, or which of the seven retry layers is the one masking the real failure. You’re now reading code you wrote but never read. A debug cycle that took thirty minutes in 2020 takes three hours in 2026, not because the bug is harder, but because the mental model the original author should have built was never built. Compound interest on every shortcut taken during the original pull request.
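Here’s what that masking looks like in miniature. This retry wrapper is hypothetical, not lifted from any real incident, but the pattern is everywhere:

```python
import time


def with_retries(fn, attempts: int = 3, delay: float = 0.5):
    # A well-meaning retry wrapper, one of the seven layers.
    def wrapped(*args, **kwargs):
        for _ in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                time.sleep(delay)
        # The original exception is discarded here. The trace that
        # reaches your pager points at this line, not at the real
        # failure. The fix is one line: keep the last exception and
        # re-raise with `raise RuntimeError(...) from last_exc`.
        raise RuntimeError(f"{fn.__name__} failed after {attempts} attempts")
    return wrapped
```

Stack a few of these and the 2 a.m. trace tells you nothing. An author who had actually read the generated code would have caught the swallowed exception in review.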
This is also why the productivity numbers lie. The hours you saved during the pull request get quietly spent during the incident. The accounting is hidden because the two events happen on different days, owned by different on-call rotations, and reported in different dashboards. Speed booked in one quarter, cost paid in the next.
If you think this is just my anecdote, look at GitHub itself.
On 23 April 2026, GitHub’s merge queue silently corrupted code in 2,092 pull requests across 230 repositories. The root cause, in their own words: "existing test coverage primarily exercised single-PR merge queue groups, which did not exhibit the faulty base-reference calculation." A predictable edge case nobody thought to test. That gap wasn’t a memory failure, it was a velocity failure. The tests that would’ve caught a multi-PR squash regression are the kind you write when you’re slowing down to think about how a feature actually behaves under load. They’re the first tests to get cut when the team is told to ship faster.
The company that ships AI dev tools to the rest of us shipped at AI velocity, and an obvious failure mode reached production unguarded. If GitHub is hitting these walls, your team is too. You just haven’t noticed yet, because nobody is paying you to write a public availability report.
"But AI Saves Real Time, Right?"
The wins are real. They’re also narrow.
Boilerplate is genuinely faster. Scaffolding new files, projects, and test fixtures; searching documentation for an unfamiliar library; translating between formats; drafting commit messages; generating regex you’d otherwise spend forty minutes on Stack Overflow for: all real wins. I use these tools every day at work.
The point isn’t that AI is bad. The point is that the wins are concentrated on the easy work, while the cost lands on the hard work. System design, debugging novel failures, writing code that survives a refactor in two years: none of that gets easier with AI. It gets harder, because the developer trying to do the hard work has less practice on the easy work that used to build the muscles.
You can ship a product without ever doing the hard work. You can’t keep one running.
The Quota Got Higher, Not the Pay
AI didn’t free up time. It became the new baseline expectation.
Scientific American reports that developers using AI tools are working longer hours, not shorter ones. TechCrunch’s February 2026 piece put it more bluntly: the first signs of burnout are coming from the developers who embraced AI the most. 65% of developers are reporting burnout in 2026 even though 61% of organisations have rolled AI into their development pipelines.
Most engineering managers didn’t set this quota. They received it. The line manager telling the tech lead that review is a bottleneck is themselves being told their team’s velocity needs to double now that Copilot is paid for. The pressure runs from the top of the org chart down, and the cognitive load of reviewing AI slop lands on whoever is holding the bag at the bottom.
Microsoft is the public version of this story. Satya Nadella reportedly described some of Copilot’s own integrations as "almost unusable". Windows 11 hit a Patch Tuesday so bad that users couldn’t shut down their machines. The company that bet hardest on AI shipped at AI velocity, and the bug reports caught up with them faster than the marketing did.
Same pattern as the games industry collapse I wrote about last year: short-term thinking, growth-at-all-costs, no investment in longevity. New tool, same mistake.
How Can We Turn This Around?
None of the fixes here are revolutionary. They’re the basics a lot of teams have stopped doing. I’ll skip the platitudes; here’s what to actually do on Monday morning.
For developers. Don’t merge code where you can’t delete one line and predict the effect. If you can’t, you don’t understand it. Before pasting a stack trace into the LLM, give yourself ten minutes alone with the trace. Keep the muscle alive. Treat AI output like a draft, never a deliverable.
For seniors and tech leads. Make "can you explain this in plain English?" a required review comment on any pull request that smells AI-generated. It costs the author thirty seconds when they understand the code, and it surfaces the gap immediately when they don’t. That’s mentorship at scale, not theatre.
For managers and engineering leaders. Pair every velocity metric on your dashboard with change failure rate. The upward script is short: "velocity is up, but the share of changes that need a fix or rollback is up faster, and that’s the risk we’re taking." That framing survives the conversation with finance because it reads as risk visibility, not as a veto. Then watch your own measurement. Change failure rate gets sandbagged the moment teams notice it matters: incidents quietly drop in severity, rollbacks get rebranded as "follow-up PRs". If the number isn’t moving, the gaming has probably already started. The metric is only useful when paired with a definition you don’t let drift. (A sketch of one pinned definition follows this list.)
For solo founders, OSS maintainers, and anyone owning the whole stack. Keep a SHORTCUTS.md in the repo root: one line per AI shortcut you knowingly took, with a date and the reason. When you’re debugging at 2 a.m. three weeks later, that file is your map. The accounting trick that hides the bill from corporate teams doesn’t exist for you (you’re the one being paged), but the corners you cut still vanish from memory if you don’t write them down. (An example follows the list.)
For everyone. Slow down to learn. Compound interest works in both directions.
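Two small artefacts to back those last points up. First, a sketch of what a pinned change-failure-rate definition can look like, in code rather than a wiki page that drifts. The record fields here (rolled_back, fixed_by_at, deployed_at) are assumptions about your own deploy log, not any particular tool’s schema:

```python
from datetime import timedelta


def change_failure_rate(deploys: list, fix_window: timedelta = timedelta(days=7)) -> float:
    # Pinned definition: a deploy failed if it was rolled back, or if
    # a change tagged as its fix landed within fix_window. Severity
    # relabelling and "follow-up PRs" don't slip past this definition.
    failed = sum(
        1 for d in deploys
        if d["rolled_back"]
        or (d.get("fixed_by_at") is not None
            and d["fixed_by_at"] - d["deployed_at"] <= fix_window)
    )
    return failed / len(deploys) if deploys else 0.0
```

Second, what SHORTCUTS.md entries can look like. These two are invented, but the format is the point: one dated line per shortcut, with the reason you took it.

```
2026-03-14  Took Claude's pagination query as-is; never checked OFFSET
            behaviour past page 50. Reason: demo deadline.
2026-03-18  Accepted AI-generated backoff logic around the payments
            client; didn't read the jitter maths. Reason: beta deadline.
```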
The 2 a.m. page is the part of this loop you can still avoid. The shortcut is yours, the debugger is yours, and the mental model that gets you back to bed is yours too. None of that is going to be built by the system that’s asking you to ship faster.