Viren Mohindra · 10 min read

We Deleted a Feature 4 Days After Shipping It

We built a memory system for AI coding tools, shipped a feature that remembered too much, and deleted it when retrieval quality degraded. Here's what forgetting taught us.

engineering · memory systems · building in public

The Borges problem

Jorge Luis Borges wrote a story in 1942 about a man who acquires perfect memory. He can remember every leaf on every tree, every word of every conversation. His verdict: "My memory, sir, is like a rubbish heap."

We built the AI equivalent. A module that took every PR quality finding — missing descriptions, large diffs without tests, short titles — and persisted them as memories. 152 lines of code, 286 lines of tests. The idea made sense: if the system finds issues, it should remember them for next time.

Four days later, retrieval quality had visibly degraded. Every PR with a moderately large diff was surfacing the same unhelpful memory: "Large change without test updates." Hitting at ~70% confidence — high enough to show up, useless enough to crowd out everything that mattered.

We deleted the entire module. 509 lines across 5 files.

Why it broke

The findings were generic. "Large change without test updates" is true of 30% of PRs in any codebase. When you persist it as a memory and attach it to file paths, it becomes the most prolific piece of "knowledge" in the system — surfacing everywhere, helping nowhere.
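The failure mode can be sketched in a few lines. This is a hypothetical illustration, not our actual ranker: memories are attached to file paths, and retrieval scores each memory by how many of a PR's changed files it touches. A generic finding attached to many paths outranks a specific one almost everywhere.

```python
from collections import Counter

# Illustrative memory store: text -> file paths it is attached to.
# The generic finding accumulates attachments across many PRs.
memories = {
    "Large change without test updates": [
        "src/a.py", "src/b.py", "src/c.py", "src/d.py",
    ],
    "auth.py: token refresh must hold the session lock": ["src/auth.py"],
    "parser.py: keep grammar table sorted": ["src/parser.py"],
}

def retrieve(changed_files, top_k=1):
    """Rank memories by overlap between their paths and the PR's files."""
    scores = Counter()
    for text, paths in memories.items():
        scores[text] = len(set(paths) & set(changed_files))
    return [text for text, score in scores.most_common(top_k) if score > 0]

# A PR touching auth.py plus two other files: the generic memory wins,
# and the specific auth.py knowledge never surfaces.
print(retrieve(["src/auth.py", "src/a.py", "src/b.py"]))
# -> ['Large change without test updates']
```

Path overlap stands in for embedding similarity here, but the dynamic is the same: the more places a memory is attached, the more retrievals it dominates.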

Key takeaway

Neuroscientists Blake Richards and Paul Frankland argued in 2017 that the brain actively forgets — not as a bug, but as a feature. "The goal of memory is to optimize decision-making, not to transmit information through time." A 2025 MIT study reported that dopamine serves as an "all-clear" signal for memory extinction. The brain has dedicated circuitry for forgetting.

The deeper issue was a category error. Endel Tulving drew the distinction in 1972 between episodic memory (specific events) and semantic memory (generalized knowledge). Our PR findings were episodic: "this specific PR had these specific issues." What the memory system needed was semantic: "this team values test coverage."
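The category error is easiest to see in the data shapes. A minimal sketch, with illustrative field names (not our actual schema): an episodic record is bound to one event, while a semantic record is a generalization distilled from many events.

```python
from dataclasses import dataclass

@dataclass
class EpisodicFinding:
    """A record of one specific event — what we were storing."""
    pr_number: int    # bound to a single PR
    file_path: str
    message: str      # e.g. "Large change without test updates"

@dataclass
class SemanticFact:
    """A generalization that outlives any one event — what was needed."""
    scope: str            # e.g. "repo" or "team"
    statement: str        # e.g. "this team values test coverage"
    evidence_count: int   # episodes supporting the generalization

# What we persisted: episodic, per-PR, hypothetical values...
finding = EpisodicFinding(
    pr_number=1234,
    file_path="src/auth.py",
    message="Large change without test updates",
)
# ...versus what a memory system actually needs: semantic, distilled.
fact = SemanticFact(
    scope="repo",
    statement="this team values test coverage",
    evidence_count=17,
)
```

Persisting the first type and retrieving it as if it were the second is exactly how a specific observation turns into noise.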

Cuconasu et al. confirmed this at SIGIR 2024: in RAG systems, documents that are related but not relevant are more harmful than random noise. Near-misses poison retrieval worse than garbage. That was exactly our failure mode — plausible enough to surface, useless for decisions.

The uncomfortable lesson

The hard part of building a memory system isn't remembering. Embeddings cost $0.02 per million tokens. Storage is cheap. You can remember everything.

The hard part is knowing what to forget.

The analyzer still runs. Still shows findings in the PR comment. It just doesn't pretend those findings are institutional knowledge anymore. Some knowledge is meant to be ephemeral — useful in the moment, then gone.
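The resulting behavior can be sketched as follows — function names, thresholds, and fields are all illustrative, not our actual pipeline. Findings are still computed and rendered in the PR comment; the persistence call is simply gone.

```python
def analyze_pr(diff_stats):
    """Compute ephemeral quality findings for one PR (illustrative rules)."""
    findings = []
    if diff_stats.get("lines_changed", 0) > 400 and not diff_stats.get("tests_touched"):
        findings.append("Large change without test updates")
    if not diff_stats.get("description"):
        findings.append("Missing PR description")
    return findings

def handle_pr(diff_stats):
    findings = analyze_pr(diff_stats)
    comment = "\n".join(f"- {f}" for f in findings)  # shown in the moment...
    # memory_store.persist(findings)  # ...the deleted step: no longer stored
    return comment

print(handle_pr({"lines_changed": 900, "tests_touched": False, "description": ""}))
# -> - Large change without test updates
#    - Missing PR description
```

The one commented-out line is the whole change: the findings stay useful in the moment, then disappear instead of becoming permanent "knowledge."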

Our 4-day delete cycle wasn't a failure. It was the data engine working.