Stop blaming your prompts. Blame your token budget.

by Avitech

Ever spend years copying text one word at a time — clicking, dragging, missing, trying again — before someone showed you you could just double-click to select the whole word?
That's exactly how I felt after a year of vibe coding with Claude.
I wasn't prompting wrong. I wasn't using the wrong model. I was just... running out of room.
One thing that took me a while to fully internalize: the context window is everything.
Every conversation has a token limit — code, documents, back-and-forth messages, all of it counts toward the same budget. The longer the conversation, the more the model has to "compress" older context to fit. You're not imagining it when responses start feeling more generic or forgetful mid-session — that's a real degradation, not a vibe.
A few signs I've learned to recognize:

Responses get more generic, less tailored to what we've been building
Claude repeats things it already said
Simple code starts having dumb mistakes
It "forgets" something we explicitly covered 20 messages ago

What actually helps:

New conversation for every new topic — no exceptions
Don't paste long code, describe what it does instead
Heavy code sessions: start fresh after ~30–40 messages
Pure text discussions: you can push further
When something feels "off" — just open a new chat. That instinct is usually right.

Been vibe coding for a while? I'd love to hear what's worked for you — and what hasn't.

Avitech

posted to

AI Tools

on May 14, 2026

Say something nice to Avitech…

Post Comment

1

This is actually great advice. I have swapped over not to long ago using this method.

Cmendi

·
23 days ago
·
Reply
1

Smart approach. I do something similar but keep the summary even leaner, just core architecture decisions and non-obvious constraints, since I've found that even a dense markedown dump can eat into the new session's budget faster than expected.

Avitech

·
a month ago
·
Reply
1

This is exactly why context window degradation is the silent killer of complex builds. You are completely right about needing to start fresh, but just opening a new chat means you lose the global architecture.

My workaround is 'State Summarization'. Around message 25, before the degradation hits, I prompt the model to generate a dense, compressed markdown summary of the current codebase architecture, established rules, and pending tasks. I then use that summary as the system prompt for the new chat. It bypasses the token bloat while keeping the model tightly grounded. How do you handle transferring the necessary context when you spin up those fresh sessions?

adhamkhaled

·
a month ago
·
Reply