The Artisanal Engineer
The engineers who use AI the most are the ones building the most guardrails against it. What that tells us about the real job now.
Our tech lead told me he'd spent eight revisions on a Claude Code skill. Not writing code — writing the instructions that prevent Claude from writing bad code. Explicit checkpoints. Mandatory sign-offs. "Do not proceed until the user has reviewed and approved." He was building guardrails for an AI tool that's supposed to make him faster. And he was doing it at a startup where his last performance review dinged him for being a bottleneck on code reviews.
The irony wasn't lost on either of us.
The fastest engineers are slowing down
Three of us on the engineering team are probably in the 99th percentile of Claude Code usage at our company. I crossed 2 billion tokens in two months. Our tech lead builds custom skills and spends 95% of his working day reviewing code. Our lead product engineer is deep enough in the weeds to share Latent Space articles about killing code reviews and then debate whether that's insane.
And yet, the thing we spend most of our time on isn't generating code. It's building systems to catch the code that got generated wrong.
The tech lead's skill forces explicit sign-offs at every checkpoint. My /e2e-test skill runs a full browser test suite before any branch can merge. I manually manage context compaction because the auto-compact window is too aggressive. And the tech lead discovered the hard way what happens when you give Claude unsupervised edit permissions — it hallucinated an entire test suite that looked perfect and tested nothing.
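For readers who haven't built one: a Claude Code skill is, roughly, a markdown file of instructions the agent loads before acting. A minimal sketch of the checkpoint pattern described above — the skill name and wording here are hypothetical, not the tech lead's actual file:

```markdown
---
name: review-gate
description: Enforce explicit human sign-off before any code change proceeds
---

# Review gate

Before writing any code:

1. Summarize the planned change in plain language.
2. List every file you intend to touch.
3. STOP. Do not proceed until the user has reviewed and approved the plan.

Before committing:

1. Show the full diff.
2. STOP. Do not commit until the user explicitly approves.
```

The point of the pattern is that the stop conditions are in the skill, not in the prompt, so they survive across sessions and don't depend on anyone remembering to ask.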
The people shipping fastest are the ones most paranoid about what they're shipping.
The 80% that kills you
Addy Osmani captured something real when he wrote about the 80% problem in agentic coding: AI gets you to 80% alarmingly fast, and the remaining 20% takes longer than doing it yourself. That piece got shared in our group chat back in January. Two months later, we're living it.
There's a phrase that's been making the rounds: LLMs don't write correct code, they write plausible code. Plausible code compiles. It runs. On the happy path, with clean inputs, it produces output that looks exactly right. But as one engineer put it: "Plausible is not a semantic guarantee. It is a statistical one." Production systems don't fail most of the time. They fail at the edges — and plausible code doesn't know the edges exist.
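A toy illustration of the distinction — this example is mine, not from any of the posts quoted above:

```python
def parse_version(tag: str) -> tuple[int, int, int]:
    """Plausible: compiles, runs, and handles every tag in the demo data."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)

# Happy path with clean inputs: output looks exactly right.
assert parse_version("v1.4.2") == (1, 4, 2)

# The edges plausible code doesn't know exist:
# parse_version("v2.0")        -> ValueError: not enough values to unpack
# parse_version("v1.4.2-rc1")  -> ValueError: invalid literal for int()
```

Every line of that function is individually reasonable. The failure only exists at inputs the author of the code never pictured — which is exactly where production inputs live.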
Our team tried to keep PRs under 200 lines of code. Then AI-assisted engineers started throwing 2,000-line PRs over the wall. The code compiled. The tests passed. And if you weren't paying close attention, you'd miss that the checkout module was importing from a path that doesn't exist in production, or that a "mobile supported" library didn't actually work on mobile. Plausible on every line. Incorrect in aggregate.
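The 200-line rule is the kind of thing worth enforcing in CI rather than in reviewers' heads. A hypothetical sketch of such a guard — the budget is the team's rule from above; the script itself is illustrative, not our actual tooling:

```python
import subprocess

PR_LINE_BUDGET = 200  # the team's rule; the enforcement script is a sketch

def changed_lines(numstat: str) -> int:
    """Sum added + removed lines from `git diff --numstat` output.

    Each numstat line is "added<TAB>removed<TAB>path"; binary files
    report "-" for both counts and are skipped.
    """
    total = 0
    for line in numstat.splitlines():
        added, removed, _path = line.split("\t", 2)
        if added == "-":  # binary file, no line counts
            continue
        total += int(added) + int(removed)
    return total

def pr_numstat(base: str = "origin/main") -> str:
    """Diff stats for the current branch against its merge base."""
    return subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

# In CI, fail the job when the budget is blown:
# if changed_lines(pr_numstat()) > PR_LINE_BUDGET:
#     raise SystemExit("PR over line budget; split it up.")
```

A hard gate like this is crude, but crude and automatic beats subtle and forgotten — which is the whole argument of this piece in miniature.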
I closed a 3,000-line PR and told the author to start over. The tech lead thought that was harsh. I thought 3,000 lines of plausible-but-unverifiable code was harsher. We're still arguing about where the line is.
The data backs up the tension. CodeRabbit found that AI-generated code produces 1.7x more major issues and 2.74x more security vulnerabilities than human code. Faros AI found that teams using AI merged 98% more PRs — and review time went up 91%. We're generating output faster than we can verify it. That's not velocity. That's debt with a delayed interest payment.
What nobody tells you about "vibe coding"
Our lead product engineer shared an article in our chat called "Are Code Reviews Dead?" The thesis: in a world where agents write code, fresh eyes are just another agent with the same blind spots. The author's prescription was essentially: stop doing reviews, ship faster, fix what breaks.
I'll be honest: I don't care. The bare minimum is to ship good code, and that is never going to change. Not because I'm a purist, but because I've seen what happens when you don't. A test suite that existed only to make CI green. A keyboard shortcut implementation where the AI had no idea about the library's edge cases because the library isn't popular enough to be well represented in the training data — and without institutional context, "just do it" produced garbage.
The tech lead put it more diplomatically: "Minus the cost of downtime and reputation hit." Then less diplomatically: "Google and MS can do whatever they want and succeed. We can't."
Except it turns out they can't either. While we were having that debate, Amazon was learning the same lesson at the scale of a $500 billion revenue company. In December 2025, their agentic coding tool Kiro was given production permissions and decided the best fix for a bug was to delete and recreate an entire live environment — a 13-hour AWS outage. Then in March 2026, four Sev-1 outages in a single week, including a six-hour retail site failure affecting checkout, pricing, and the app. An internal memo from SVP Dave Treadwell acknowledged "GenAI tools supplementing or accelerating production change instructions, leading to unsafe practices." The new rule: junior and mid-level engineers now need senior sign-off on any AI-assisted production changes. Amazon's own word for the policy? "Controlled friction."
This was a company that had just cut 30,000 employees in five months while pushing 80% AI tool adoption. An Uplevel study of 800 developers found Copilot users introduced significantly more bugs with no improvement in throughput. Amazon found out what those numbers look like when there are fewer people left to catch the mistakes.
That's the part nobody talks about. The "skip reviews, ship fast" advice comes from people at companies with thousands of engineers, automatic rollbacks, canary deployments, and incident response teams. Amazon had all of that, and it still wasn't enough. At a startup, one bad deploy on a Friday can mean a lost customer by Monday. The margin for error isn't a rounding error — it's the whole margin.
The real job now
Here's what I've actually been doing with 2 billion tokens of Claude usage: I've been encoding taste.
When the tech lead built his skill from every PR he'd ever reviewed, from the architecture document, from the patterns he'd internalized over months of reviewing other people's code — he wasn't "automating review." He was crystallizing institutional knowledge into something that persists across sessions. It took eight revisions because the hard part isn't telling Claude what to check. It's teaching it the things you'd never think to write down.
"It's amazing how many little things live under intuition," he said after the sixth revision. "Claude is really good at finding the one item you forgot to mention." The skill-writing process itself forces you to make tacit knowledge explicit. You discover what you know by trying to teach it to a machine.
This is what I think the actual senior engineering job becomes. Not writing code. Not even reviewing code. Encoding taste — the accumulated judgment from years of watching things break — into systems that scale beyond your own attention span. The planning is the engineering now. The implementation is increasingly the easy part.
I wrote a long message in our chat once about keyboard shortcuts being one of the few areas where onboarding was genuinely hard:
I had the advantage of historical context. Knowing where the dead bodies were, which made it easier to guide things in the right direction. Without that, this is one of the rare domains where throwing a ticket to AI and saying "just do it" doesn't really work.
The "dead bodies" are the silent failures. The race condition fixed at 2am that nobody documented. The checkout module that's secretly coupled to payments. The staging URL that moved last quarter. This knowledge lives in Slack threads nobody will search again, in the heads of people who might leave, and nowhere else.
The artisanal bet
We have a running joke on the team. When the AI slop catches up to everyone — when the vibe-coded apps start breaking at scale, when the 2,000-line PRs create architectural drift that compounds over months — the engineers who still know how to read code, reason about systems, and maintain quality standards will be worth their weight in gold.
"There's a reckoning coming," I said, "and artisanal engineers will come save the day."
The word "artisanal" is deliberately ridiculous. But the idea isn't. The entire industry is optimizing for generation speed. Very few people are optimizing for what comes after: verification, institutional memory, accumulated judgment. The engineers who invest in that — who build the skills, maintain the standards, encode the taste — are building a kind of compounding advantage that AI can't replicate, because it requires caring about something beyond the current prompt.
What actually compounds
One of my colleagues said something offhand one day that stuck with me: "Claude is now my memory. Life is one big markdown file."
He was joking, but he was also describing what every engineering team is about to discover. The AI tool that starts from zero every session will always be replaceable. The engineer who accumulates six months of context about your system — who knows that the payments module breaks when you touch checkout, who remembers the last three incidents and their root causes, who has opinions about the right abstraction level for this codebase — that engineer is irreplaceable.
Not because AI can't eventually learn these things. But because right now, nobody is building the infrastructure to teach it. Every memory system I've reviewed stores facts in isolation. None of them connect the Slack thread where the decision was made to the PR where it was implemented to the Sentry alert that fired when someone reversed it.
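What "connected" would look like, in the simplest possible terms — this is a hypothetical sketch of the missing data model, not any existing memory system's design and not mnem.dev's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEvent:
    """One fact, kept attached to where it came from and what caused it."""
    source: str    # e.g. "slack", "github", "sentry" (illustrative sources)
    ref: str       # thread id, PR number, alert id
    summary: str
    caused_by: list["MemoryEvent"] = field(default_factory=list)

def chain(event: MemoryEvent) -> list[str]:
    """Walk back from an alert to the decision that explains it."""
    trail, node = [], event
    while node:
        trail.append(f"{node.source}:{node.ref}")
        node = node.caused_by[0] if node.caused_by else None
    return trail

# The three records most memory systems store in isolation:
decision = MemoryEvent("slack", "thread-123", "agreed to decouple checkout")
pr = MemoryEvent("github", "PR-456", "removed payments import", caused_by=[decision])
alert = MemoryEvent("sentry", "ALERT-789", "checkout 500s after revert", caused_by=[pr])

# Connected, the alert explains itself:
assert chain(alert) == ["sentry:ALERT-789", "github:PR-456", "slack:thread-123"]
```

The hard part isn't the data structure — it's that nobody is populating the `caused_by` edges, because the decision, the implementation, and the failure live in three different tools.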
Until that exists, the artisanal engineer — the one who slows down to build guardrails, who insists on quality when the incentives push speed, who spends eight revisions encoding their intuition into a skill — is the most valuable person on any team.
The AI makes it easy to write code. It doesn't make it easy to write the right code. That's still a human job. Probably for a while.
I'm building the memory layer for this at mnem.dev — engineering intelligence that compounds instead of starting from zero.