There's a mass delusion happening in AI right now.
Every tutorial tells you to write system prompts like you're crafting a spell — just find the right incantation and the model will obey. "You are an EXTREMELY TALENTED senior engineer with 20 years of experience..." Sound familiar?
I've spent the last few months building VibeCom, an AI startup advisor that runs deep market research and generates VC-grade analysis. Along the way, I reverse-engineered Claude Code's system prompt, read through DeepAgents' middleware source, and burned through more API credits than I'd like to admit. The biggest lesson? Most of what people think matters about system prompts doesn't. And the things that actually matter, almost nobody talks about.
This post is the complete playbook — not a 5-minute overview, but everything I wish someone had told me before I started. Grab coffee.
"An agent is a model. Not a framework. Not a prompt chain."
— shareAI-lab/learn-claude-code
This idea changed everything for me. The LLM already knows how to reason, plan, and execute. Your system prompt isn't teaching it to think — it's setting up the environment for it to work in.
Think of it like hiring a senior engineer. You don't hand them a 20-step checklist for every task. You tell them: here's who we are, here are the boundaries, here's what good looks like. Then you get out of the way.
Your system prompt has exactly four jobs:
That's it. Everything else is noise.
Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions
Your system prompt is the operating manual for the harness. You're not designing a rigid pipeline — you're designing an environment where the model can do its best work autonomously.
Don't write your system prompt like a flowchart. The model will decide the execution order itself.
┌─────────────────────────────────────────────┐
│ 1. Identity │ ← Read first, anchors behavior
│ 2. Security & Safety │ ← IMPORTANT markers, non-negotiable
│ 3. Tone & Style │ ← Controls output format
│ 4. Core Workflow │ ← How to do the work
│ 5. Tool Usage Policy │ ← Tool selection priorities
│ 6. Domain Knowledge │ ← On-demand, not pre-loaded
│ 7. Environment Info │ ← Runtime context, dynamically injected
│ 8. Reminders │ ← Re-state critical rules
├─────────────────────────────────────────────┤
│ [Tool Definitions — system-injected] │ ← Not editable, usually very long
├─────────────────────────────────────────────┤
│ [User Message] │
└─────────────────────────────────────────────┘
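As a purely illustrative sketch, assembling the layout above is just ordered concatenation. Every section string here is a placeholder, not real prompt content:

```typescript
// Eight-section layout from the diagram, assembled in order.
// Identity and safety lead (primacy); reminders close (recency).
const sections: string[] = [
  "You are an interactive agent that helps users with software engineering tasks.",
  "IMPORTANT: Assist with defensive security tasks only.",
  "## Tone and style\n- Keep responses short and concise.",
  "## Doing tasks\n- Understand existing code before modifying it.",
  "## Tool usage policy\n- Prefer specialized tools over bash commands.",
  "Use the get_docs tool to load domain knowledge on demand.",
  "<env>\nPlatform: darwin\n</env>",                       // injected at runtime
  "IMPORTANT: Assist with defensive security tasks only.", // repeated for recency
];

const systemPrompt: string = sections.join("\n\n");
```

The point is the ordering, not the plumbing: the first and last entries land in the high-attention zones of the U-shaped curve.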
LLMs have a U-shaped attention curve — they pay the most attention to the beginning and end of your prompt, and zone out in the middle. This is the "Lost in the Middle" effect, and it's well-documented.
Goal: Anchor the model's role in 1-3 sentences.
You are Claude Code, Anthropic's official CLI for Claude.
You are an interactive agent that helps users with software engineering tasks.
Guidelines:
Anti-patterns:
Goal: Set unbreakable behavioral constraints.
IMPORTANT: Assist with defensive security tasks only.
Refuse to create, modify, or improve code that may be used maliciously.
IMPORTANT: You must NEVER generate or guess URLs for the user.
Guidelines:
- IMPORTANT: prefix — Claude's instruction hierarchy training gives this extra weight
- Absolute language: NEVER, MUST NOT, Refuse to
- Why repeat? Primacy effect (beginning) + Recency effect (end) = double reinforcement. Claude Code's security declaration appears at both the start and end of the prompt. Not because the engineers were forgetful — because they understand the U-shaped attention curve.
Goal: Control output format and voice.
## Tone and style
- Your responses should be short and concise.
- Only use emojis if the user explicitly requests it.
- Use Github-flavored markdown for formatting.
- NEVER create files unless absolutely necessary.
Guidelines:
Claude Code's gem — Professional Objectivity:
Prioritize technical accuracy and truthfulness over validating the user's beliefs.
Focus on facts and problem-solving, providing direct, objective technical info
without any unnecessary superlatives, praise, or emotional validation.
This paragraph is crucial: it blocks the model's sycophancy tendency. If your agent needs to give objective judgments (code review, idea evaluation, architecture decisions), you absolutely need a similar clause.
Goal: Teach the model how to work — methodology, not rigid procedures.
This is the hardest section to write well, and the most impactful when you get it right.
The core principle: give principles, not procedures.
Tell the LLM what good output looks like and why it's good — let it figure out how to get there. Avoid prescribing exact field counts, step sequences, or formats, unless the output is consumed by machines downstream.
Claude Code's approach:
## Doing tasks
The user will primarily request software engineering tasks.
For these tasks the following steps are recommended:
- Use the TodoWrite tool to plan the task if required
Notice the word "recommended" — not "you must follow these exact steps." That single word choice gives the model room to adapt.
A good workflow definition:
1. Understand first — read existing code before modifying it
2. Plan first — break complex tasks into steps before executing
3. Minimal changes — only change what's necessary, don't "refactor while you're in there"
4. Verify — confirm your changes work (run tests, lint, etc.)
Each rule has an implicit "why" — the model can understand the intent and generalize to new scenarios.
Anti-patterns:
I learned this the hard way with VibeCom. Early versions had a 10-step research workflow. The model would dutifully execute all 10 steps even when step 3 already answered the user's question. When I switched to principles ("research until you have sufficient evidence, then synthesize"), quality went up and token costs went down.
The exception: When output is consumed by machines downstream (inter-agent communication, API response formats), you should define strict formats. Principles are for behavior; schemas are for interfaces.
Goal: When multiple tools can do the same thing, tell the model which to prefer.
## Tool usage policy
- Use specialized tools instead of bash commands:
- Read for reading files instead of cat/head/tail
- Edit for editing instead of sed/awk
- Grep for searching instead of grep/rg
- You can call multiple tools in a single response. If independent, call in parallel.
- Use the Task tool for file search to reduce context usage.
Guidelines:
The crucial relationship between tools and prompts:
Tool definitions are typically system-injected and you can't edit them directly. Claude Code's tool definitions are ~11,438 tokens. This means:
Goal: Provide specialized knowledge the model's training data might lack.
The key principle: progressive disclosure, not knowledge dumps.
❌ Paste all 200 API endpoints into the system prompt → token explosion
✅ Give the model a tool to look things up → "Load knowledge when you need it"
This strategy is shared by Claude Code's Skills system and DeepAgents' Progressive Disclosure middleware. Both load knowledge on-demand through tool calls rather than pre-loading everything.
Implementation approaches:
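One way to implement on-demand loading, sketched with a hypothetical get_api_docs tool backed by an in-memory map (in practice this could be a file read or a search index):

```typescript
// The system prompt only says: "Use the get_api_docs tool to retrieve
// API documentation when needed." The docs themselves live outside it.
const apiDocs = new Map<string, string>([
  ["POST /v1/orders", "Creates an order. Body: { sku, quantity }."],
  ["GET /v1/orders/:id", "Fetches a single order by id."],
]);

// Registered with the harness as a tool; called by the model on demand.
function getApiDocs(endpoint: string): string {
  return apiDocs.get(endpoint) ?? `No documentation found for ${endpoint}.`;
}
```

The model pays the token cost of one entry per lookup instead of carrying all 200 endpoints in every request.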
Goal: Give the model awareness of its execution environment.
<env>
Working directory: /Users/fengliu/Desktop/tfm/vibecom
Is directory a git repo: true
Platform: darwin
Today's date: 2026-03-21
</env>
You are powered by the model named Claude Opus 4.6.
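Since this block changes per session, it is typically generated at startup rather than hardcoded. A minimal sketch, with field names mirroring the example above:

```typescript
// Build the <env> block from runtime values so it is always current.
function buildEnvBlock(opts: {
  cwd: string; isGitRepo: boolean; platform: string; date: string;
}): string {
  return [
    "<env>",
    `Working directory: ${opts.cwd}`,
    `Is directory a git repo: ${opts.isGitRepo}`,
    `Platform: ${opts.platform}`,
    `Today's date: ${opts.date}`,
    "</env>",
  ].join("\n");
}
```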
Guidelines:
Goal: Re-state the most critical rules at the end of the prompt.
Claude Code repeats its safety constraint and TodoWrite requirement at the bottom:
IMPORTANT: Assist with defensive security tasks only. [repeated]
IMPORTANT: Always use the TodoWrite tool to plan and track tasks. [repeated]
Guidelines:
| Section | Recommended Tokens | Notes |
| ------------------------- | ------------------ | -------------------------------------------- |
| Identity + Safety | 200-500 | Concise but non-negotiable |
| Tone & Style | 300-800 | Rules must be specific, but don't ramble |
| Core Workflow | 500-2,000 | Most important section, worth the investment |
| Tool Usage Policy | 300-1,000 | Depends on number of tools |
| Domain Knowledge | 0-1,000 | On-demand loading preferred |
| Environment Info | 100-300 | Generated dynamically |
| Reminders | 100-300 | Only repeat the essentials |
| Your total | 1,500-6,000 | |
| Tool Definitions (system) | 5,000-15,000 | Not in your control |
Community testing (Reddit u/CodeMonke\_) has mapped real-world adherence degradation, with noticeable drop-off starting around 80K tokens.
Your 200K context window ≠ 200K of effective context. Plan accordingly.
Mitigation strategies:
- Inject <system-reminder> tags mid-conversation (more on this in section 8)

❌ "Step 1: Read the file. Step 2: Find the bug. Step 3: Fix it. Step 4: Run tests."
✅ "Always understand existing code before modifying it. Verify your changes work."
Principles generalize. Procedures can only be followed mechanically. When the model encounters a situation you didn't anticipate, principles guide the right decision. Procedures don't.
Exception: When output is consumed by machines (inter-agent communication, API formats), define strict schemas.
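For instance, a machine-consumed handoff might pin its schema down in code. The ResearchResult shape below is hypothetical, just to show the boundary check:

```typescript
// Strict schema for inter-agent handoff: behavior stays principle-driven,
// but the interface is validated at the boundary.
interface ResearchResult {
  query: string;
  findings: string[];                     // one evidence item per entry
  confidence: "low" | "medium" | "high";
}

// Reject malformed model output before it reaches downstream consumers.
function parseResearchResult(json: string): ResearchResult {
  const obj = JSON.parse(json);
  if (typeof obj.query !== "string" || !Array.isArray(obj.findings)) {
    throw new Error("Malformed ResearchResult");
  }
  return obj as ResearchResult;
}
```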
| Strength | Language | Use For |
| -------------------- | ----------------------- | ------------------------------- |
| Absolute prohibition | NEVER, MUST NOT | Safety, irreversible operations |
| Strong requirement | ALWAYS, MUST | Core workflow rules |
| Recommendation | recommended, prefer | Best practices with exceptions |
| Suggestion | consider, you may | Optional optimizations |
Claude Code examples:
- NEVER update the git config — absolute prohibition
- ALWAYS prefer editing an existing file — strong, but exceptions exist
- The following steps are recommended — suggested workflow

## Code References
When referencing specific functions or pieces of code include
the pattern `file_path:line_number`.
<example>
user: Where are errors from the client handled?
assistant: Clients are marked as failed in the `connectToServer`
function in src/services/process.ts:712.
</example>
One example teaches more than 100 words of explanation:
- Wrap examples in <example> tags to separate them from rules

✅ "Use dedicated tools: Read for reading files, Edit for editing files."
✅ "Do NOT use bash for file operations (cat, head, tail, sed, awk)."
Saying only "do this" → model doesn't know when NOT to do it.
Saying only "don't do this" → model doesn't know the alternative.
Bidirectional → clear and unambiguous.
❌ "Don't use git commit --amend."
✅ "Avoid git commit --amend. ONLY use --amend when either
(1) user explicitly requested amend OR
(2) adding edits from pre-commit hook.
Reason: amending may overwrite others' commits."
Explaining the why lets the model make correct judgments in edge cases. Claude Code's git safety protocol is a masterclass — every rule implies its rationale.
- Use markdown headers (##, ###) — models recognize hierarchy
- Use XML tags: <example>, <env>, <system-reminder>

"First call tool A to get data.
Then call tool B with the result.
Then format the output as JSON.
Then save to file."
This isn't an agent prompt — it's a pipeline script. The model will execute mechanically and lose its autonomous planning ability.
The fix: Tell the model the goal and constraints. Let it decide the steps.
"You are an EXTREMELY TALENTED and INCREDIBLY EXPERIENCED
senior software engineer with 20 years of experience..."
Compliments and superlatives do not improve output quality. The model doesn't have an ego to boost. Save those 15 tokens for an actual rule.
"Here is the complete API documentation for our 200 endpoints..."
This devours your context window and accelerates context rot. Replace with on-demand loading:
"Use the get_api_docs tool to retrieve API documentation when needed."
If the tool definition already says "Read tool reads a file from the filesystem," don't say it again in your system prompt. Only add strategic guidance that the tool definition doesn't cover — when to use it, why to prefer it, priority ordering.
Without explicit guidance, models will retry failed tool calls in an infinite loop. Always include:
"If a tool call is denied, do not re-attempt the exact same call.
Think about why it was denied and adjust your approach."
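The same rule can also be enforced at the harness level rather than left to model compliance. The helper names here are mine, and keying on a JSON serialization of the arguments is a simple heuristic:

```typescript
// Track denied tool calls so the harness can refuse exact retries.
// (Note: JSON.stringify is sensitive to property order, so this key
// only matches calls with identically-shaped argument objects.)
const deniedCalls = new Set<string>();

function recordDenial(tool: string, args: unknown): void {
  deniedCalls.add(`${tool}:${JSON.stringify(args)}`);
}

function isExactRetry(tool: string, args: unknown): boolean {
  return deniedCalls.has(`${tool}:${JSON.stringify(args)}`);
}
```

When isExactRetry returns true, the harness can short-circuit with a correction message instead of burning another round trip.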
200K context window ≠ 200K of effective context. Real-world testing shows degradation starting at 80K. You need a summarization strategy.
| Method | Replaces | Placement | Best For |
| -------------------------- | ----------------------------------------- | ------------------------------------------- | ----------------------------------- |
| Output Styles | "Tone and style" + "Doing tasks" sections | Just before tool definitions | Changing interaction style |
| --append-system-prompt | Nothing (additive) | After output style, before tool definitions | Adding specific behaviors |
| --system-prompt | Entire system prompt | Keeps tool definitions + one identity line | Full customization (nuclear option) |
If you use multiple: Output Style → Append Prompt → Tool Definitions
Claude is specifically trained with an instruction hierarchy:
1. User's explicit instructions (CLAUDE.md, direct requests) ← Highest priority
2. Custom system prompt additions ← High
3. Default system prompt ← Medium
4. Tool definitions ← Reference level
This means:
- <system-reminder> — inject into any message mid-conversation to remind the model of critical rules

The system prompt only appears once, at the very start of the messages array. But LLMs accept the full messages array (alternating user / assistant / tool messages) as input, and you can inject prompts into user messages and tool results too. Claude Code uses this technique heavily in production.
Fighting context rot. As conversations grow longer, the model's adherence to system prompt instructions degrades (noticeable at 80K+ tokens). Injecting reminders mid-conversation = refreshing the rules via recency bias.
The mental model:
Messages Array:
┌─────────────────────────────────────┐
│ System Prompt │ ← Appears once, primacy effect
│ (identity, safety, workflow...) │
├─────────────────────────────────────┤
│ User Message 1 │
│ Assistant Message 1 │
│ User Message 2 + <system-reminder> │ ← Mid-conversation injection
│ Assistant Message 2 │
│ Tool Result + <system-reminder> │ ← Can inject into tool results too
│ ... │
│ User Message N + <system-reminder> │ ← Latest message, strongest recency
└─────────────────────────────────────┘
| Location | Advantage | Disadvantage |
| -------------------------- | ------------------------------ | ----------------------------------------------- |
| System prompt | Primacy effect, read first | Appears once, "forgotten" in long conversations |
| User message injection | Recency bias, periodic refresh | Each injection costs tokens |
| Tool result injection | Most natural injection point | Only works when tools are called |
Prerequisite — declare the tags in the system prompt:
Tool results and user messages may include <system-reminder> tags.
<system-reminder> tags contain useful information and reminders.
They are automatically added by the system, and bear no direct
relation to the specific tool results or user messages in which they appear.
This step is critical: it tells the model these tags are system-injected, not user speech.
Usage 1: Behavioral Reminders (periodic rule refresh)
<system-reminder>
The task tools haven't been used recently. If you're working on tasks
that would benefit from tracking progress, consider using TaskCreate...
</system-reminder>
Claude Code uses this to remind the model to plan with TodoWrite — because models tend to "forget" planning and just start coding.
Usage 2: Mode Switching (Plan Mode)
<system-reminder>
Plan mode is active. The user indicated that they do not want you to
execute yet -- you MUST NOT make any edits, run any non-readonly tools,
or otherwise make any changes to the system.
</system-reminder>
Plan mode isn't implemented in the system prompt. It's a tag injected into the next user message. This lets you toggle modes dynamically without modifying the system prompt. Brilliant.
Usage 3: File Change Notifications
<system-reminder>
Note: /path/to/file.ts was modified, either by the user or by a linter.
This change was intentional, so make sure to take it into account.
</system-reminder>
When an external process (linter, formatter, manual edit) modifies a file, the system notifies the model via reminder — preventing decisions based on stale file contents.
Usage 4: Dynamic Context (dates, project rules)
<system-reminder>
Today's date is 2026-03-21.
Current branch: dev
claudeMd: [CLAUDE.md content injected here]
</system-reminder>
Runtime context (date, git status, project rules) is injected via user messages, not hardcoded in the system prompt.
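A minimal sketch of the injection step, assuming the harness controls the outgoing messages array (the function name is mine):

```typescript
// Append a <system-reminder> to an outgoing user message. The tag must be
// declared in the system prompt (as shown earlier) so the model treats it
// as system-injected context, not as something the user typed.
function withReminder(userText: string, reminder: string): string {
  return `${userText}\n\n<system-reminder>\n${reminder}\n</system-reminder>`;
}

const outgoing = withReminder(
  "Please refactor the auth module.",
  "Plan mode is active. You MUST NOT make any edits."
);
```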
- Use a consistent tag name (<system-reminder>) — the model can distinguish system injection from user speech

| Scenario | System Prompt | User Message Reminder |
| -------------------------- | :------------------: | :-------------------: |
| Role definition | ✅ | ❌ |
| Safety constraints | ✅ First declaration | ✅ Periodic repeat |
| Workflow methodology | ✅ | ❌ |
| Mode switching (plan mode) | ❌ | ✅ |
| File change notifications | ❌ | ✅ |
| Date / environment info | ✅ Initial value | ✅ Updated value |
| Behavioral correction | ❌ | ✅ |
| Tool usage reminders | ✅ Rule definition | ✅ Execution nudges |
Anthropic's prompt caching lets you cache the static prefix of your messages array. When subsequent requests share the same prefix, they hit the cache — saving money and reducing latency.
For agents, this matters a lot: you're re-sending the system prompt + tool definitions on every single LLM call within a conversation.
| Metric | Value |
| ------------------------ | ----------------------------------------------------------- |
| Cache hit cost | 10% of normal price (90% savings) |
| Cache write cost | 125% of normal price (25% premium on first write) |
| Cache TTL | 5 minutes (expires if no requests) |
| Minimum cacheable length | 1,024 tokens (Claude 3.5+) |
| Cache granularity | Prefix matching — from the start to a marked breakpoint |
| Maximum breakpoints | 4 |
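Plugging the table's multipliers into a quick cost model shows why the write premium pays off. The token count and turn count below are illustrative:

```typescript
// Relative cost model from the table above: the first request writes the
// cache at 1.25x, each later turn reads it at 0.10x; uncached pays 1.0x
// per turn. Values are multipliers on the normal input-token price.
function cachedCost(prefixTokens: number, turns: number): number {
  return prefixTokens * 1.25 + prefixTokens * 0.10 * (turns - 1);
}

function uncachedCost(prefixTokens: number, turns: number): number {
  return prefixTokens * 1.0 * turns;
}

// A hypothetical 10,000-token static prefix over 10 turns:
const savings = 1 - cachedCost(10_000, 10) / uncachedCost(10_000, 10);
```

With these numbers the prefix costs 21,500 token-units cached versus 100,000 uncached, roughly a 78% saving on the static portion alone; whole-conversation savings are lower because dynamic content never hits the cache.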
Core principle: static content first, dynamic content last.
✅ Cache-friendly layout:
System prompt (static) ← Cache breakpoint 1
Tool definitions (static) ← Cache breakpoint 2
CLAUDE.md / project rules ← Cache breakpoint 3 (changes occasionally)
Conversation history ← Breakpoint 4 for rolling window
❌ Cache-destroying layout:
System prompt
DYNAMIC TIMESTAMP ← Changes every request, everything after = cache miss
Tool definitions
Conversation history
The trap nobody warns you about: If you put a dynamic timestamp in the middle of your system prompt, everything after it becomes a cache miss. Every. Single. Request. One timestamp in the wrong place and you're paying full price on thousands of tokens.
const response = await anthropic.messages.create({
model: "claude-sonnet-4-6",
system: [
{
type: "text",
text: "You are a startup advisor...",
cache_control: { type: "ephemeral" } // ← marks a cache breakpoint
}
],
messages: [...]
});
Breakpoint 1: System prompt ← Almost never changes
Breakpoint 2: Tool definitions ← Almost never changes
Breakpoint 3: Project rules / CLAUDE.md ← Changes occasionally
Breakpoint 4: First N history messages ← Rolling window cache
Even when conversation history changes, the first 3 breakpoints still hit. A 10-turn conversation saves roughly 40-60% on input token costs.
After writing your system prompt, review it against this checklist:
- Did you wrap examples in <example> tags?

Here's exactly what I'd do:
Start with identity + safety in the first 3 lines. Two sentences for who the agent is. Hard constraints with NEVER/MUST. Repeat safety rules at the end.
Write your core workflow as principles, not steps. Max 4-5 bullet points. Use "recommended" and "prefer" for soft rules, "NEVER" and "MUST" for hard ones.
Budget 1,500-6,000 tokens for your part. Tool definitions will add 5,000-15,000 more. If you're over 6K, you're probably dumping knowledge that should be loaded on-demand.
Structure everything. Markdown headers, bullet lists, XML tags for examples. A structured prompt outperforms natural language prose every time.
Build in mid-conversation reminders from day one. Declare <system-reminder> in your system prompt. Inject reminders for critical rules, mode switches, and context updates.
Design for cache. Static content first, dynamic content last. Never put changing values in your system prompt body.
The irony of all this work? The best system prompts are short. Claude Code's custom instructions (excluding tool definitions) are surprisingly concise. Every line earns its place.
I used to think prompt engineering was about finding clever tricks. Now I think it's about discipline — saying less, saying it precisely, and trusting the model to figure out the rest. The model is smarter than your prompt. Design the environment, not the behavior.
| Source | Key Insight |
| ------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------- |
| Claude Code v2.0.14 System Prompt | Full production agent prompt structure reference |
| Reddit: Understanding Claude Code's 3 System Prompt Methods | Output Styles / --append / --system-prompt deep dive, context rot real-world data |
| shareAI-lab/learn-claude-code | "The model is the agent" philosophy, harness engineering methodology |
| Anthropic Prompt Engineering Docs | Official prompt best practices |
| DeepAgents Framework | Summarization middleware, skills progressive disclosure |