Code change accuracy improved 55% in preliminary experiment
We recently ran our first controlled experiment to measure how Devramp improves agent accuracy. Using Claude Code (Sonnet-4) on a 100K-line Go codebase, we tested eight representative pull requests (10–40 files each). For each PR we generated a structured summary: a 300-word explanation of the functional and technical changes (with no file or symbol references), distilled into a 30-word prompt. With Devramp, the agent's output diffs matched the human reference diffs far more closely: accuracy improved by 55% on average, and variability dropped by 18 percentage points, making results both more reliable and more consistent.
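To make the aggregate numbers concrete, here is a minimal sketch of how per-PR scores like these can be computed and summarized. It is not Devramp's actual scoring pipeline; the similarity metric (line-based `difflib` ratio) and all sample values are hypothetical, stand-in choices for illustration.

```python
import difflib
import statistics

def diff_similarity(agent_diff: str, reference_diff: str) -> float:
    """Score in [0, 1]: how closely the agent's diff matches the human reference diff.
    Uses a line-level sequence match as a stand-in metric."""
    return difflib.SequenceMatcher(
        None, agent_diff.splitlines(), reference_diff.splitlines()
    ).ratio()

def summarize(scores: list[float]) -> tuple[float, float]:
    """Aggregate per-PR scores into (mean accuracy, variability as population std dev)."""
    return statistics.mean(scores), statistics.pstdev(scores)

# Hypothetical per-PR scores, one per pull request (not the experiment's real data)
baseline = [0.40, 0.35, 0.50, 0.45]
with_devramp_context = [0.70, 0.65, 0.72, 0.68]

base_mean, base_std = summarize(baseline)
ctx_mean, ctx_std = summarize(with_devramp_context)
print(f"mean accuracy:  {base_mean:.2f} -> {ctx_mean:.2f}")
print(f"variability:    {base_std:.3f} -> {ctx_std:.3f}")
```

A relative accuracy gain is then `(ctx_mean - base_mean) / base_mean`, and a variability drop is the difference of the two spread figures.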
Result

- Date: 23rd Sept 2025
- Codebase: 100k LOC, Go
- Agent/Model: Claude Code (Sonnet-4)
- Accuracy: +55%
- Variability: -18pp