Building a research agent that actually finishes the job
Most "autonomous" agents stall halfway through anything that matters. Here's the architecture I landed on after a year of long-horizon experiments — planning, memory, and the verification loop that keeps it honest.
Start with the verification loop, not the planner
Everyone reaches for the planner first. But a planner without a way to check its own work just produces confident, expensive nonsense. The component that made the difference for me was a verifier that scores each step against the original goal before the agent is allowed to move on.
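To make the gating idea concrete, here is a minimal Python sketch of a loop that will not advance until a step passes verification. It is an illustration, not the graph from the video: `run_with_verifier`, `StepResult`, the `threshold` and `max_retries` parameters, and the toy `keyword_overlap_score` are all hypothetical names and assumptions of mine. A real verifier would more likely be a model call or an eval harness than a word-overlap check.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    step: str
    output: str
    score: float
    attempts: int


def keyword_overlap_score(goal: str, output: str) -> float:
    """Toy verifier (assumption): fraction of goal words that appear in the output."""
    goal_words = set(goal.lower().split())
    if not goal_words:
        return 0.0
    return len(goal_words & set(output.lower().split())) / len(goal_words)


def run_with_verifier(
    goal: str,
    steps: List[str],
    execute: Callable[[str, int], str],   # (step, attempt index) -> output
    score: Callable[[str, str], float],   # (goal, output) -> score in [0, 1]
    threshold: float = 0.7,               # hypothetical pass mark
    max_retries: int = 2,
) -> List[StepResult]:
    """Run steps in order; each must score >= threshold before the next one starts."""
    accepted: List[StepResult] = []
    for step in steps:
        for attempt in range(max_retries + 1):
            output = execute(step, attempt)
            s = score(goal, output)       # always scored against the ORIGINAL goal
            if s >= threshold:
                accepted.append(StepResult(step, output, s, attempt + 1))
                break
        else:
            # No attempt passed: stop instead of building on unverified work.
            raise RuntimeError(f"step {step!r} failed verification")
    return accepted
```

The point of the sketch is the control flow: only outputs that pass the check end up in `accepted`, so anything downstream (memory, re-planning) sees verified results only.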
Once the agent could tell when it was off-track, everything downstream got simpler. Re-planning became cheap. Memory stopped accumulating garbage. The video above walks through the exact graph — feel free to pause on the architecture diagram around the eight-minute mark.
"An agent that knows when it's wrong is worth ten that are merely fast."
In the full write-up I share the prompt scaffolding, the eval set I used to tune the verifier, and the failure cases that still trip it up. Subscribe below to get the next part — where I put this thing in production.