June 3, 2026

An Honest Assessment: Where AI Coding Actually Is in 2026


The AI Productivity Boom Is Creating New Risks

The pressure: we see what you’re being asked to do

This page exists because we talk to a lot of engineering leaders and they all describe the same Tuesday afternoon. The pressure to “do something with AI” has collapsed into the pressure to cut headcount and call it AI productivity. Those aren’t the same thing, and pretending they are will cost you the engineers you can least afford to lose.

Your CEO read an article. It said engineers are obsolete. Someone in the next stand-up will ask why headcount isn’t down 30% yet. You know the article was wrong. You also know your political capital isn’t infinite.

The board wants AI productivity gains, this quarter. Not “24-month payoff with measured rollout.” Now. The pitch decks say 10×. The reality, if you’re honest, is that the tools you’ve tried so far have produced ~15% lift on greenfield and ~0% on the gnarly stuff that actually pays the bills.

Your best engineers are watching. They’ve seen the layoffs at peer companies. They’re watching to see whether you’ll equip them to do better work or replace them with a tool that ships slop. Whichever signal you send, the strongest ones will respond to it first.

The math has to actually work. Not on the deck. In production. With your real codebase. With the on-call rotation that has to debug AI-written code at 3am six months after the engineer who “owned” it left.



Where AI coding actually is in 2026

The Twitter demos are real. So is the production gap. Six failure modes you will hit if you ship AI-generated code at scale without specialized guardrails — not theoretical, not extreme cases, the predictable ones:

Hallucinated APIs that look real. The model invents a function signature that exists in some adjacent library but not yours. The type checker catches it — if the model didn’t also invent the type. We see this most often on libraries with a popular sibling (Mongoose-style writes leaking into a Drizzle codebase, RSC primitives showing up in pages-router code).

Confident stubs that look finished. A // TODO: implement validation inside a function the model declared “done.” The diff is green. The PR description sounds confident. The function ships and silently accepts garbage input until the support ticket lands.

Silent rewrites disguised as small edits. You asked for one function changed. The model returned the whole file with three unrelated exports quietly removed (it “cleaned them up”). Reviewer eyes glaze on the long diff and merge it. The breakage surfaces in production when something downstream tries to import what isn’t there anymore.

Plausible code that misreads the goal. The model produces working code for an adjacent problem to the one you actually have. Tests pass. Demo looks great. Two weeks later someone realizes the auth flow it built protects the wrong boundary — the model never asked the question that would have surfaced it.

Drift the longer the chat runs. Early turns of the session reason carefully about your codebase. Twenty turns in, the model has lost the early context and is confidently shipping patterns that contradict decisions made in the first hour. The chat history looks coherent. The codebase starts to contradict itself.

Output that compiles but doesn’t run. Type-correct, lint-clean, import-clean — and broken at runtime. A null where the model assumed a value. A race the model didn’t reason about. A migration that runs locally and dies on the prod data shape. No model catches what only actual execution does.

None of this is an argument against using AI. It’s an argument that the layer between an AI model and your production codebase has to do real work. Without that layer you ship the failures above silently. With it, you get the productivity gain without the long tail of 3am pages.

The failure modes above share one trait: AI-generated code often looks correct. But looking correct is not the same thing as being correct or secure. And as organizations increasingly trust AI-generated software, defects and vulnerabilities can scale faster than engineering teams can detect or remediate them.
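To make "mechanical checks" concrete, here is a minimal sketch of three of them: exports that vanished between the old and new version of a file, placeholder stubs left in "finished" code, and imports of packages the project never declared. It is an illustration under stated assumptions, not any product's implementation: every function name is hypothetical, the regexes stand in for a real parser, and the list of declared dependencies is passed in by hand rather than read from a manifest.

```typescript
// Illustrative sketch only. All names are hypothetical, and the regexes
// stand in for a real parser: they miss re-exports, `export { a, b }`
// lists, default exports, and multi-line declarations.

/** Names declared with `export const|let|var|function|class|type|interface|enum`. */
function exportedNames(source: string): string[] {
  const declaration =
    /^\s*export\s+(?:async\s+)?(?:const|let|var|function|class|type|interface|enum)\s+([A-Za-z_$][\w$]*)/gm;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = declaration.exec(source)) !== null) {
    const name = match[1];
    if (name) names.push(name);
  }
  return names;
}

/** Exports present before the edit and absent after it. */
function droppedExports(before: string, after: string): string[] {
  const remaining = exportedNames(after);
  return exportedNames(before).filter((name) => remaining.indexOf(name) === -1);
}

/** 1-based line numbers that still carry a TODO/FIXME/"not implemented" marker. */
function placeholderStubs(source: string): number[] {
  const marker = /\b(?:TODO|FIXME)\b|not implemented/i;
  const hits: number[] = [];
  source.split("\n").forEach((line, index) => {
    if (marker.test(line)) hits.push(index + 1);
  });
  return hits;
}

/** Map an import specifier to a package name; null for relative paths and `node:` builtins. */
function packageName(specifier: string): string | null {
  if (specifier.startsWith(".") || specifier.startsWith("/") || specifier.startsWith("node:")) {
    return null;
  }
  const segments = specifier.startsWith("@") ? 2 : 1; // scoped packages keep "@scope/name"
  return specifier.split("/").slice(0, segments).join("/");
}

/** Packages imported by the source that are not in the declared dependency list. */
function undeclaredImports(source: string, declared: string[]): string[] {
  const pattern = /\b(?:from|import)\s+["']([^"']+)["']/g;
  const missing: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = packageName(match[1] ?? "");
    if (name && declared.indexOf(name) === -1 && missing.indexOf(name) === -1) {
      missing.push(name);
    }
  }
  return missing;
}

// A made-up "small edit" that quietly drops an export, leaves a stub,
// and pulls in a package the project does not depend on.
const before = [
  'import { eq } from "drizzle-orm";',
  "export const MAX_ITEMS = 50;",
  "export type Order = { id: string };",
  "export function parseOrder(raw: string): Order { return JSON.parse(raw); }",
].join("\n");

const after = [
  'import { eq } from "drizzle-orm";',
  'import mongoose from "mongoose";',
  'import { readFile } from "node:fs/promises";',
  'import { toCents } from "./money";',
  "export type Order = { id: string };",
  "export function parseOrder(raw: string): Order {",
  "  // TODO: implement validation",
  "  return JSON.parse(raw);",
  "}",
].join("\n");

const dropped = droppedExports(before, after);               // ["MAX_ITEMS"]
const stubs = placeholderStubs(after);                       // [7]
const undeclared = undeclaredImports(after, ["drizzle-orm"]); // ["mongoose"]
console.log({ dropped, stubs, undeclared });
```

None of these checks needs a model in the loop, which is the point: they give the same answer every time. A check like this only catches a package that was never declared. A hallucinated function inside a real package still needs the type checker, and a runtime failure still needs execution.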


What humans still do

Senior engineers don’t just write code — if they did, replacing them would be a defensible bet. The work that actually compounds in value is the work models are worst at: judgment, prioritization, knowing what NOT to build, owning a system through pages and postmortems and the slow gravity of tech debt.

Tradeoff judgment under business pressure. “Yes, we could ship this in 3 days with the cleaner architecture, OR in 1 day with the duct tape that lets the sales team close the contract Friday.” Which one is right depends on context the model doesn’t have and isn’t allowed to ask.

Reading what people actually mean. The PM says “make it faster.” Your senior engineer hears “the demo’s tomorrow, the loading state looks broken, so really we need to make it FEEL faster, not be faster.” The model can’t reliably hear that. Yet.

Knowing what NOT to build. Half of a senior engineer’s value is talking the team out of work that shouldn’t exist. AI agents reflexively give you what you asked for. They almost never push back with “this whole subsystem is unnecessary, here’s what you actually need.”

Owning a system through its lifecycle. On-call at 3am. The postmortem two days later. The decision to deprecate a feature gracefully. The migration that touches 47 files and has to ship without downtime. Single-shot AI generates patches. People own systems.

Calibrated honesty about uncertainty. A good engineer will tell you “I think this is right but I’m not sure — let’s add a feature flag.” Models present output with uniform confidence whether they’re right or guessing. That calibration gap is exactly what causes the silent production bugs above.

Carrying tacit knowledge of YOUR codebase. Why is auth in /api/v2 instead of /api/v1? Because of the migration in 2023 the model wasn’t there for. Your senior engineer knows the answer. The model will generate plausible-sounding wrong answers all day.


What firing your best engineers actually costs you

We talk to engineering leaders at companies that did the 2024 layoffs and the 2025 layoffs. The pattern is consistent enough that we’re comfortable describing it as predictable. You don’t pay the bill on the day you sign the package — you pay it 6–12 months later, and the cost shows up in places the spreadsheet didn’t model.

Tacit knowledge walks out the door first. The engineer who knows why the payment retry logic is the way it is gets the package and leaves. Six months later you’re debugging payment retries with people who weren’t there when the decision was made. Documentation never captured the why.

On-call coverage cracks before anyone notices. The on-call rotation that was 8 deep is now 4 deep. The remaining 4 burn out. Your most experienced engineer — the one you didn’t lay off — quits in month 5 because they’re carrying an unsustainable pager load. Now you’re 3 deep.

The rehire bill is bigger than you saved. Twelve months in, you’re trying to backfill the same seniority you let go. The market knows what happened. Candidates ask in the first call “are you going to do that again?” You’re paying 1.3–1.5× the comp on signing bonuses just to staunch the bleed.

AI failures hit a team that can’t unstick them. The model gets confused. The agent loops. The patch produces a runtime crash nobody on the team has the codebase mental model to debug — because the people who had that mental model are at a competitor now. Productivity goes negative for the month.

The alternative isn’t “don’t adopt AI.” The alternative is to use it to make the people you have measurably more effective — and to be able to point at the productivity gain without having to defend a layoff to do it.

Many organizations only discover problems after deployment. By then, the cost of remediation is dramatically higher.


Amplifier, not replacer

Codira is a native macOS engineering environment built around the assumption that you keep your senior engineers and equip them better. Same AI models as everyone else underneath — but wrapped in a 9-agent architecture and five deterministic guards specifically designed to catch the failure modes above, before they ship.

A 9-agent team behind every engineer. Instead of one model trying to be good at everything, each phase hands off to the agent optimized for it: Planner decomposes, Implementer writes, Reviewer approves, Security scans, QA generates tests, Auditor and Verifier find existing bugs, Explainer answers codebase questions, Debugger diagnoses failures. Your senior engineer directs the team. The team doesn’t direct itself.

Five deterministic guards on every patch. These aren’t more AI deciding if AI’s output is OK. They’re TypeScript parsers that catch the mechanical failure modes described above: dropped exports, hallucinated imports, placeholder stubs, mass rewrites disguised as small edits, grounding gaps. Mechanical checks for mechanical failures. Your engineer’s eyes are reserved for the judgment calls.

Onboarding to your code in minutes, not weeks. /explain gives a 5-section guided tour of any codebase: stack, architecture, entry points, conventions, suggested follow-ups — with file:line refs you can click. The senior who carries the mental model still has it. The junior who needs to ramp gets the tour without burning the senior’s time.

Failures get analyzed, not just retried. When tests fail or UAT catches a runtime error, the Debugger agent gives root cause + execution trace + 2–3 candidate fixes with honest tradeoffs. Your junior engineers learn debugging by reading the analyses. Your senior engineers stop being interrupted to triage what someone else broke.

Three days of work in one. We’re honest about the multiplier: not 10×. Heavy users report 2–3× on routine work and ~1.3× on the gnarly stuff that AI struggles with most. The math on a 10-engineer team is enormous; the math on firing 7 of them and keeping 3 with AI is fiction.
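The arithmetic behind that last claim can be sketched in a few lines. The multipliers are the upper end of the ranges quoted above. The 50% routine share is an assumption made here for illustration, not a measured figure, and the function names are hypothetical. The blend assumes the mix of work stays the same, so the slower category dominates in the same way the serial part does in Amdahl's law.

```typescript
// Back-of-envelope sketch. The work mix is an assumption, not measured data.

/**
 * Overall speedup when a fixed mix of work gets faster at two different rates.
 * routineShare is the fraction of baseline time spent on routine work (0..1).
 */
function blendedSpeedup(
  routineShare: number,
  routineMultiplier: number,
  hardMultiplier: number,
): number {
  return 1 / (routineShare / routineMultiplier + (1 - routineShare) / hardMultiplier);
}

/** Team output expressed in "baseline engineer" equivalents. */
function teamCapacity(engineers: number, speedup: number): number {
  return engineers * speedup;
}

const routineShare = 0.5; // assumed: half of the team's time goes to routine work
const speedup = blendedSpeedup(routineShare, 3, 1.3); // about 1.81

const baseline = teamCapacity(10, 1);        // 10 engineers, no tooling
const keepTen = teamCapacity(10, speedup);   // about 18.1
const keepThree = teamCapacity(3, speedup);  // about 5.4

// Even if every task were routine and hit the full 3x, three engineers
// come out at 9 engineer-equivalents, still under the original 10.
const bestCaseThree = teamCapacity(3, blendedSpeedup(1, 3, 1.3));

console.log({ baseline, keepTen, keepThree, bestCaseThree });
```

Under these assumptions, keeping the team nearly doubles output, while cutting to three leaves you with roughly half of what you had before the tooling arrived. Change the routine share to match your own backlog before you take either number to a board.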


What this page is not

Not anti-AI. We ship an AI IDE. We use AI all day. The argument is against AI without guardrails being shipped as if it were AI with guardrails — and against cutting headcount to dramatize a productivity gain that didn’t actually happen.

Not cover for keeping a team you should reshape. Some restructuring is correct. Roles do evolve. The argument is against indiscriminate cuts justified by AI productivity claims that won’t survive a quarter of real measurement.

Not a promise that Codira fixes everything. Codira is good at what it’s good at and we tell you where it isn’t. macOS only today. 2–3× on routine work, ~1.3× on hard problems. Won’t make a junior into a senior.

Not a guarantee your CEO will be persuaded. We can give you defensible KPIs and a stance to bring to the table. We can’t litigate the politics for you. The leaders who win this argument have done the work to back the position with their own pilot data.

Not free. Real AI tooling that actually catches the failure modes above costs money. It’s still ~50× cheaper than the rehire bill for a senior engineer you let go to look productive.


The next 6 months: measure productivity, not headcount

If you’re going to bring an AI productivity narrative back to your board, the version that survives twelve months of scrutiny is one anchored in shipped work per engineer rather than engineers eliminated. Four numbers we’d suggest you commit to publicly inside your org:

Shipped per engineer per quarter. Features merged, weighted by complexity. Should rise 25–60% in the second quarter of the pilot. This is the number to bring to the board.

Time from idea to first PR. How long after a feature is scoped until your engineer opens a draft PR. Codira’s spec expansion + multi-step plan should cut this roughly in half.

Bug escape rate. Bugs your users find ÷ total bugs (caught + escaped). The five guards + UAT auto-run should push escapes down meaningfully.

Senior engineer regrettable attrition. The number you actually care about. If amplification works, the engineers who matter most stay — because they’re doing more interesting work, not babysitting AI slop.
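The four numbers above are simple enough to compute from data most teams already have. Here is one way to sketch them. The record shape, field names, and every figure in the example are made up for illustration; in particular, how you weight a feature by complexity is a choice your team has to make and then hold constant between baseline and pilot.

```typescript
// Hypothetical record shape; adapt the fields to whatever your tracker exports.
interface QuarterStats {
  engineers: number;
  featureWeights: number[];         // one complexity weight per merged feature
  bugsCaughtBeforeRelease: number;
  bugsFoundByUsers: number;
  daysFromScopeToFirstPr: number[]; // one entry per scoped feature
  seniorsAtStart: number;
  regrettableSeniorDepartures: number;
}

/** Complexity-weighted features merged, per engineer. */
function shippedPerEngineer(q: QuarterStats): number {
  const total = q.featureWeights.reduce((sum, weight) => sum + weight, 0);
  return total / q.engineers;
}

/** Bugs your users find divided by all bugs (caught + escaped). */
function bugEscapeRate(q: QuarterStats): number {
  const total = q.bugsCaughtBeforeRelease + q.bugsFoundByUsers;
  return total === 0 ? 0 : q.bugsFoundByUsers / total;
}

/** Median is less sensitive than the mean to one feature that sat for months. */
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

/** Share of senior engineers you wanted to keep who left anyway. */
function regrettableAttrition(q: QuarterStats): number {
  return q.seniorsAtStart === 0 ? 0 : q.regrettableSeniorDepartures / q.seniorsAtStart;
}

function percentChange(from: number, to: number): number {
  return ((to - from) / from) * 100;
}

// Made-up numbers for a 10-engineer team, before and during a pilot.
const baselineQuarter: QuarterStats = {
  engineers: 10,
  featureWeights: [5, 3, 8, 2, 5, 8, 3, 5, 2, 3],
  bugsCaughtBeforeRelease: 80,
  bugsFoundByUsers: 20,
  daysFromScopeToFirstPr: [10, 6, 8, 12, 4],
  seniorsAtStart: 4,
  regrettableSeniorDepartures: 1,
};

const pilotQuarter: QuarterStats = {
  engineers: 10,
  featureWeights: [5, 3, 8, 2, 5, 8, 3, 5, 2, 3, 8, 5, 3],
  bugsCaughtBeforeRelease: 90,
  bugsFoundByUsers: 10,
  daysFromScopeToFirstPr: [4, 3, 5, 6, 2],
  seniorsAtStart: 4,
  regrettableSeniorDepartures: 0,
};

const report = {
  shippedLiftPercent: percentChange(
    shippedPerEngineer(baselineQuarter),
    shippedPerEngineer(pilotQuarter),
  ),
  escapeRate: [bugEscapeRate(baselineQuarter), bugEscapeRate(pilotQuarter)],
  medianDaysToFirstPr: [
    median(baselineQuarter.daysFromScopeToFirstPr),
    median(pilotQuarter.daysFromScopeToFirstPr),
  ],
  seniorAttrition: [regrettableAttrition(baselineQuarter), regrettableAttrition(pilotQuarter)],
};
console.log(report);
```

Whatever definitions you pick, write them down before the pilot starts. A metric redefined halfway through will not survive the scrutiny this section is preparing you for.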

Start a 60-day pilot on one team. Run Codira against your baseline. If the numbers move, you have a real answer for the board that doesn’t require firing the people who built your business. If they don’t, you’ve lost two months and you’ll know. Either way, you stopped guessing.

One engineer. Infinite scale.

The operating system for AI-native software engineering.
