AI RESEARCHER · WRITER · ADVISOR

Florin

building frontier AI

I research how large models reason, build systems that put them to work, and write — clearly — about where this is all going. No hype, just the signal.

SF · Remote · est. 2015

// ABOUT

Translating frontier research into things people can actually ship.

I'm Florin — an AI researcher and writer focused on large language models, reasoning, and the messy engineering that makes them useful. I've spent the last decade building machine-learning products and turning dense papers into working systems.

This is where I think out loud: deep dives, build logs, and honest takes on what's working, what isn't, and what's coming next.

AREAS OF EXPERTISE

Large Language Models · Reasoning & Agents · Retrieval (RAG) · Model Evaluation · Fine-tuning · AI Strategy · MLOps · Multimodal

// FEATURED

Latest writing

ESSAY

Jun 18, 2026 · 9 min read

Why reasoning models change the economics of software

When inference can think, the cost curve of building products bends in ways most teams haven't priced in yet.

GUIDE

Jun 09, 2026 · 14 min read

A practical guide to evaluating LLM agents

Vibes don't scale. A repeatable harness for measuring whether your agent actually completes the task.

ANALYSIS

May 28, 2026 · 7 min read

The quiet revolution in small models

The frontier gets headlines, but 8B-parameter models running on-device are where the real shift is happening.

BUILD LOG

Jun 22, 2026 · 12 min read

Building a research agent that actually finishes the job

Most "autonomous" agents stall halfway through anything that matters. Here's the architecture I landed on after a year of long-horizon experiments — planning, memory, and the verification loop that keeps it honest.

▸ Watch on YouTube · 18:42

Walkthrough — architecting a long-horizon research agent end to end.

Start with the verification loop, not the planner

Everyone reaches for the planner first. But a planner without a way to check its own work just produces confident, expensive nonsense. The component that made the difference for me was a verifier that scores each step against the original goal before the agent is allowed to move on.

Once the agent could tell when it was off-track, everything downstream got simpler. Re-planning became cheap. Memory stopped accumulating garbage. The video above walks through the exact graph — feel free to pause on the architecture diagram around the eight-minute mark.
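To make the idea concrete, here is a minimal sketch of a verification loop. It is not the graph from the video: the names `run_with_verifier`, `propose`, `score`, and `threshold` are hypothetical, and the word-overlap scorer is only a stand-in for whatever a real verifier (likely a model call) would do. The point it illustrates is that a step enters memory only after it scores well against the original goal, and a rejected step triggers a cheap re-plan instead.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunResult:
    memory: List[str] = field(default_factory=list)    # accepted steps only
    rejected: List[str] = field(default_factory=list)  # steps the verifier blocked


def overlap_score(goal: str, step: str) -> float:
    """Toy verifier (an assumption): fraction of goal words that appear in the step."""
    goal_words = set(goal.lower().split())
    step_words = set(step.lower().split())
    return len(goal_words & step_words) / len(goal_words) if goal_words else 0.0


def run_with_verifier(
    goal: str,
    propose: Callable[[str, List[str], int], Optional[str]],
    score: Callable[[str, str], float] = overlap_score,
    threshold: float = 0.5,
    max_steps: int = 5,
    max_retries: int = 2,
) -> RunResult:
    """Gate every proposed step on the verifier before the agent moves on."""
    result = RunResult()
    for _ in range(max_steps):
        accepted = False
        for attempt in range(max_retries + 1):
            step = propose(goal, result.memory, attempt)
            if step is None:  # the planner has nothing left to do
                return result
            if score(goal, step) >= threshold:
                result.memory.append(step)  # only verified steps reach memory
                accepted = True
                break
            result.rejected.append(step)  # off-track: ask for a new proposal
        if not accepted:
            break  # retries exhausted, stop rather than drift
    return result


def scripted_propose(goal: str, memory: List[str], attempt: int) -> Optional[str]:
    """Hypothetical planner: a fixed script standing in for a model call."""
    script = [
        ["browse trending videos", "search papers on agent evaluation"],
        ["summarize papers on agent evaluation"],
    ]
    if len(memory) >= len(script):
        return None
    options = script[len(memory)]
    return options[min(attempt, len(options) - 1)]


if __name__ == "__main__":
    outcome = run_with_verifier("survey papers on agent evaluation", scripted_propose)
    print("accepted:", outcome.memory)
    print("rejected:", outcome.rejected)
```

In this toy run the first proposal is off-topic, so it is rejected and never written to memory; the retry passes and the agent proceeds. Swapping `overlap_score` for a stronger scorer changes the quality of the gate but not the shape of the loop.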

"An agent that knows when it's wrong is worth ten that are merely fast."

In the full write-up I share the prompt scaffolding, the eval set I used to tune the verifier, and the failure cases that still trip it up. Subscribe below to get the next part — where I put this thing in production.

// ARCHIVE

Everything I've published

Building a research agent that actually finishes the job

Why reasoning models change the economics of software

A practical guide to evaluating LLM agents

The quiet revolution in small models

How I evaluate frontier models in 2026

One sharp email on AI, every week.