Publications

Code Retrieval in Coding Agents

Why do coding agents work well on small projects but struggle with large codebases? We studied how different coding agents find and use code context. It turns out retrieval is where most agents fall apart.


Experiments

Smaller projects and benchmarks from our research. These are completed explorations, not ongoing work.

Code Context

Semantic code search exposed through an API similar to ripgrep's. An experiment in figuring out the best way to provide code context to coding agents.
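The core idea can be sketched in a few lines. The snippet below is illustrative, not Code Context's actual implementation: it uses token-overlap (Jaccard) scoring as a stand-in for real embedding similarity, and the `Match` type and `semantic_search` function are hypothetical names, with output formatted ripgrep-style as `path:line:text`.

```python
import re
from dataclasses import dataclass

@dataclass
class Match:
    path: str
    line: int
    text: str

def tokenize(s: str) -> set[str]:
    # Crude tokenizer: lowercase alphanumeric runs (splits on underscores too).
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def semantic_search(query: str, files: dict[str, str], top_k: int = 3) -> list[Match]:
    """Rank lines by token overlap with the query (a stand-in for
    embedding similarity) and return the best matches, ripgrep-style."""
    q = tokenize(query)
    scored: list[tuple[float, Match]] = []
    for path, content in files.items():
        for i, line in enumerate(content.splitlines(), start=1):
            toks = tokenize(line)
            if not toks:
                continue
            score = len(q & toks) / len(q | toks)  # Jaccard similarity
            if score > 0:
                scored.append((score, Match(path, i, line.strip())))
    scored.sort(key=lambda t: -t[0])
    return [m for _, m in scored[:top_k]]

files = {
    "auth.py": "def login(user, password):\n    return check_password(user, password)",
    "db.py": "def connect():\n    pass",
}
for m in semantic_search("password check login", files):
    print(f"{m.path}:{m.line}:{m.text}")
```

A real version would swap the scoring function for embedding cosine similarity while keeping the same grep-like result shape, so agents can consume it with the tools they already know.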

EmojiBench

A benchmark designed to detect AI-generated code by measuring emoji usage patterns.
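A minimal sketch of the signal such a benchmark measures, assuming a simple per-line emoji-density heuristic; the regex covers only the common emoji code-point ranges, and both the `threshold` value and the function names are hypothetical, not EmojiBench's actual method.

```python
import re

# Rough emoji ranges; a full detector would use the complete Unicode emoji table.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\u2B00-\u2BFF]")

def emoji_density(source: str) -> float:
    """Count emoji per non-empty line of source code."""
    lines = [l for l in source.splitlines() if l.strip()]
    if not lines:
        return 0.0
    emoji = sum(len(EMOJI_RE.findall(l)) for l in lines)
    return emoji / len(lines)

def looks_ai_generated(source: str, threshold: float = 0.1) -> bool:
    # Hypothetical cutoff: flag files averaging more than one emoji
    # per ten lines, a pattern often seen in LLM-written comments.
    return emoji_density(source) > threshold
```

The interesting benchmark question is how robust this signal stays once models are prompted to avoid emoji, which is why a fixed heuristic like this is only a starting point.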

Sync Engine

A sync engine for building offline-first applications with automatic conflict resolution and optimistic updates.
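To illustrate the two ideas in that description, here is a toy store that applies writes locally before any network round trip (optimistic updates) and merges remote state with a last-write-wins version rule. This is a generic sketch, not the Sync Engine's actual conflict-resolution strategy; `SyncStore` and `Record` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Record:
    value: str
    version: int = 0  # monotonically increasing write counter

class SyncStore:
    """Toy offline-first store: writes apply immediately (optimistic),
    and remote state merges in with a last-write-wins rule."""

    def __init__(self) -> None:
        self.data: dict[str, Record] = {}

    def write(self, key: str, value: str) -> None:
        # Optimistic update: bump the version and apply locally at once,
        # without waiting for the server to acknowledge.
        current = self.data.get(key, Record("", 0))
        self.data[key] = Record(value, current.version + 1)

    def merge(self, remote: dict[str, Record]) -> None:
        # Conflict resolution: the higher version wins; on a tie,
        # the local record is kept.
        for key, rec in remote.items():
            local = self.data.get(key)
            if local is None or rec.version > local.version:
                self.data[key] = rec
```

Real engines typically replace the bare counter with vector clocks or CRDTs to handle concurrent edits without silently dropping one side, which is where most of the actual complexity lives.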

Code Retrieval Eval

Evaluation framework for measuring how well coding agents retrieve relevant code from large codebases. Built for our code retrieval research.

Cloud Debug Eval

Evaluation benchmark for LLMs in cloud infrastructure debugging.


Blog

Longer-form thinking and observations from our work.

Why Senior Engineers Get More Out of AI Than Junior Developers

A counterintuitive pattern: senior engineers consistently extract more value from AI coding tools than juniors. It's not about prompt tricks. It's about judgment.