I build with coding agents every day, mostly Claude Code, with a bit of Cursor and Codex. I had a vague feeling I was productive, but no real measure of it. So for 31 days I tracked everything: prompts, tool calls, sessions, commits, and PRs. That came to about 2,300 prompts, 6.3B tokens, and roughly $6K in pay-per-token-equivalent spend.
A few things genuinely surprised me.
Most of my prompts do not directly ship code. Only about 5% led to a commit or PR. I would have guessed something closer to 20 to 30%. The rest was research, planning, debugging, or just using the wrong tool. It made me realise that a lot of real work happens before anything gets committed.
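The 5% figure is just a ratio over logged prompts. A minimal sketch of how I think about it, assuming a log where each prompt record notes whether it ended in a git event (the field names here are illustrative, not an actual Claude Code export format):

```python
# Hypothetical log: 20 prompt records, one of which ended in a commit.
# Real agent logs will differ; this only shows the shape of the calculation.
prompts = (
    [{"id": i, "git_event": None} for i in range(19)]
    + [{"id": 19, "git_event": "commit"}]
)

def ship_rate(records):
    """Fraction of prompts that led directly to a commit or PR."""
    shipped = sum(1 for r in records if r["git_event"] in ("commit", "pr"))
    return shipped / len(records)

print(ship_rate(prompts))  # 0.05 — the ~5% I saw over the month
```

The denominator matters: counting only "coding" prompts instead of all prompts would make the number look much better, which is exactly the kind of self-flattery I was trying to avoid.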
Process also matters a lot. Prompts that sat inside some kind of workflow, like planning, TDD, or subagent-driven development, were much more likely to lead to shipped code than direct prompts. Brainstorming (the superpowers:brainstorming skill) was heavily used but rarely shipped anything directly, which makes sense, but I had never measured how much time went into that phase.
My code velocity also ramped up a lot over the month. I do not think I suddenly got smarter. I think I just got better at working with the agent: when to plan first, when to spawn a subagent, when to inspect state, and when to let it run.
I also realised that a single prompt is rarely a single action. On average, a prompt turned into a long multi step run with many tool calls. The real unit of work is not the prompt. It is the session.
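To see this in the data, I group events by session rather than by prompt. A rough sketch, assuming a flat event stream with hypothetical session ids and event kinds:

```python
from collections import Counter

# Hypothetical event stream: one user prompt fans out into many tool calls.
# Session ids and event kinds are illustrative, not a real log schema.
events = [
    {"session": "s1", "kind": "prompt"},
    {"session": "s1", "kind": "tool_call"},
    {"session": "s1", "kind": "tool_call"},
    {"session": "s1", "kind": "tool_call"},
    {"session": "s2", "kind": "prompt"},
    {"session": "s2", "kind": "tool_call"},
]

tool_calls = Counter(e["session"] for e in events if e["kind"] == "tool_call")
user_prompts = Counter(e["session"] for e in events if e["kind"] == "prompt")

# Tool calls per prompt, per session: the fan-out is what makes the
# session, not the prompt, the real unit of work.
for sid in sorted(user_prompts):
    print(sid, tool_calls[sid] / user_prompts[sid])
```

Once you aggregate this way, per-prompt metrics like cost or latency stop being meaningful on their own; a cheap prompt that spawns a long run is not cheap.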
The last surprise was the difference between personal and work projects. On my own repos, prompts often led to commits and PRs. On work repos, token usage was high but commits were rare. Same agent, same person, different mode. At work I mostly use the agent to read, explain, and debug existing systems. At home I use it to build.
The main thing this changed for me is awareness. I now catch myself asking whether I am actually shipping or just exploring. Both are useful, but knowing which mode I am in has already changed how I work.
The obvious limitation is that this is still just me. I do not know how much of this is personal habit versus something broader about AI assisted coding.
What I would love to hear from others:
- Have you ever measured your own AI coding workflow? What surprised you?
- Which of these patterns feels familiar?
- If you use multiple models, what actually helped you compare them in practice?
- If you had a dashboard for your own agent use, what is the one metric you would want that nobody shows yet?
This is super interesting, especially the 5% of prompts leading to commits. I would’ve also guessed way higher.
The “session vs prompt” idea really clicked for me. It feels like most of the value is in the iteration loop, not the individual prompt.
Curious: did you notice certain types of sessions (like debugging vs planning) being more "expensive" in tokens?