
Why I Put Claude AI in Jail and Let It Code Anyway

How I Put Claude AI in Jail and Got It to Ship Production Code

We just shipped working, secure code to production.

It was written by Claude.

But only after I locked it in a container, stripped its freedoms, and told it exactly what to do.

This isn’t an AI-generated brag post.

This is an explanation of what happens when you stop treating LLMs like co-founders and start treating them like extremely clever interns.

The Problem: Vibe Coding Is Chaos

If you’ve ever prompted AI to “build me a secure backend”, then you’ve experienced:

Hard-coded secrets
No config separation
Auth hacked together
Layers in the wrong places
Database logic in controller methods
Security that is more reminiscent of a first-year student project

It feels impressive. But the output is not shippable.

I once tried building a Monkey-Island-style game with Claude at 2am just for fun. It ended with me screaming at a yellow rectangle on an HTML canvas.

Fun? Yes.

Useful? Not remotely.

The Insight: Claude’s Not the Problem, You Are

Claude is phenomenally good at code generation if you feed it the right prompts, at the right level of granularity, and in the right order.

When I use it personally, it acts as a co-architect. I bounce ideas off it, get help debugging, and sometimes it even surprises me with novel solutions (like using inherited env vars + process scanning for child cleanup across Windows/Linux).

But left to its own devices on a complex problem or wide-open scope?

Chaos.

The gap isn’t capability, it’s orchestration.

So… I put Claude in jail. Here’s what I did:

1. Claude gets containerized
A clean, temporary dev environment. No Git credentials. Limited network access. No escape.
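
As a rough sketch of what that lockdown can look like, here's the kind of thing you can do with the Docker SDK for Python. The image name, task ID, and mount path below are placeholders for illustration, not the real setup:

```python
import docker  # Docker SDK for Python (pip install docker)

client = docker.from_env()

# A throwaway sandbox: no Git credentials mounted, no network by default,
# and only a scratch workspace bind-mounted in. All names here are illustrative.
container = client.containers.run(
    image="claude-sandbox:latest",          # placeholder image with the dev toolchain baked in
    command="sleep infinity",               # keep it alive; the orchestrator execs each step into it
    detach=True,
    network_mode="none",                    # cut off outbound access; loosen per-task only if needed
    environment={"TASK_ID": "story-142"},   # illustrative task identifier, nothing secret
    volumes={"/tmp/claude-workspace": {"bind": "/workspace", "mode": "rw"}},
    working_dir="/workspace",
)
```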

2. Start with a user story
Human developers aren’t expected to work off a one-line mission statement, so why should AI be any different? I feed it a detailed user story that a human developer would be happy with.

3. Chain-of-thought agent breaks down the work
“Build a login system” becomes 20+ sub-tasks: token handling, session state, role config, browser caching, etc.
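
To give a feel for the shape of that breakdown, here's an invented example of what the expanded sub-task list might look like. The field names and granularity are assumptions, not the exact schema:

```python
# Illustrative only: field names and granularity are assumptions. The point is
# that one user story becomes many small, independently verifiable units of work.
user_story = (
    "As a user, I can log in with email and password "
    "and stay signed in for 30 days."
)

subtasks = [
    {"id": 1, "title": "Define user and session models", "depends_on": []},
    {"id": 2, "title": "Implement password hashing and verification", "depends_on": [1]},
    {"id": 3, "title": "Issue and validate session tokens", "depends_on": [1, 2]},
    {"id": 4, "title": "Add role configuration and permission checks", "depends_on": [1]},
    {"id": 5, "title": "Handle browser caching and logout invalidation", "depends_on": [3]},
    # ...20+ sub-tasks for a real story, each small enough to prompt,
    # review, and verify in isolation.
]
```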

4. Claude gets micromanaged step-by-step
Each sub-task is prompted as a mini workflow: analyse → code → fix → verify
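
In code, that per-sub-task loop looks roughly like the sketch below. `ask_claude` and `run_checks` are hypothetical stand-ins for the real prompt call and the lint/test/build step inside the sandbox:

```python
# Sketch of the analyse -> code -> fix -> verify loop for one sub-task.
# ask_claude() and run_checks() are hypothetical stand-ins for the real
# prompt call and the checks (lint, tests, build) run inside the container.
def run_subtask(subtask: dict, max_attempts: int = 3) -> str:
    analysis = ask_claude(
        f"Analyse this sub-task and list the files you will touch:\n{subtask['title']}"
    )
    patch = ask_claude(f"Write the code for this sub-task only, nothing else:\n{analysis}")

    for _ in range(max_attempts):
        result = run_checks(patch)   # lint, tests, build inside the sandbox
        if result.passed:
            return patch             # verified; move on to the next sub-task
        patch = ask_claude(
            f"The checks failed:\n{result.report}\n"
            "Fix the code, changing as little as possible."
        )

    raise RuntimeError(f"Sub-task {subtask['id']} did not verify after {max_attempts} attempts")
```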

5. Final Claude pass reviews everything
It outputs a structured JSON diff with explanations.
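
Here's an invented example of the shape of that output, shown as the Python dict the orchestrator would load it into; the real field names may differ:

```python
# Invented example of the review-pass output; the real schema may differ.
review_output = {
    "story": "story-142",
    "changes": [
        {
            "file": "auth/session.py",
            "diff": "@@ -0,0 +1,42 @@ ...",   # unified diff, truncated here
            "explanation": "Adds token issuance with expiry and rotation on refresh.",
            "risk": "low",
        },
        {
            "file": "config/roles.yaml",
            "diff": "@@ -1,4 +1,9 @@ ...",
            "explanation": "Moves role definitions out of code and into config.",
            "risk": "medium",
        },
    ],
    "summary": "Login story implemented across all sub-tasks; no secrets in source.",
}
```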

6. We convert that to a GitHub PR
A human reviews. If it’s clean, we merge. If not, we loop until we’re happy.
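
A sketch of that conversion, with placeholder branch, title, and body values: the push and `gh pr create` calls run on the host, where the credentials and the authenticated gh CLI live, never inside the Claude container.

```python
import subprocess

# Runs on the host (where Git credentials and the authenticated gh CLI live),
# never inside the Claude container. Branch, title, and body are placeholders.
def open_pr(branch: str, title: str, body: str) -> None:
    subprocess.run(["git", "checkout", "-b", branch], check=True)
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", title], check=True)
    subprocess.run(["git", "push", "-u", "origin", branch], check=True)
    subprocess.run(["gh", "pr", "create", "--title", title, "--body", body], check=True)

open_pr(
    "story-142-login-system",
    "Add login system (story 142)",
    "Automated implementation of story 142; structured review attached for the human reviewer.",
)
```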

Every time the task ends, the Claude container is destroyed.
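
Using the container handle from the earlier Docker sketch, teardown is just:

```python
# Destroy the sandbox once the PR is open: nothing persists between stories.
container.stop()
container.remove()   # or pass auto_remove=True when creating the container
```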

No memory of past sins. No rogue commits.

Clean. Contained. Effective.

The Result?

15–20 minutes per story
PRs that pass internal review
No vibe coding
Shippable code with zero hallucinated libraries or misaligned assumptions

It’s slower per interaction than just “ask it to code” – but way faster overall.
Less rework. Less debugging. More trust in what comes out the other end.

Can You Do This Too?

If you're expecting GPT or Claude to magically build your app from a one-line prompt, you're going to be disappointed.

But if you're willing to:

Break tasks down
Containerize your AI workflows
Build orchestration logic
And treat your LLM like a task-executing machine, not a co-pilot

...then yes, it can code for you. And you can ship it.

The Big Question

Don’t think of AI as a replacement. AI is the intern. Orchestration is the manager. And humans are still the ones deciding what matters.

But here’s what I keep asking myself, and I’d love to hear your thoughts:

Should we be building AI tools that act more like interns who learn under supervision… or should we keep pushing for AI that acts like senior engineers we can trust outright?

What do you think?

Want to See the Whole Architecture?

I wrote up a full 3-part breakdown of the system, including failures, lessons, and technical design:

Why I Put Claude in Jail

Read Part 1 on Substack → https://powellg.substack.com/

It’s funny, raw, and surprisingly useful. Part 3 includes a detailed breakdown of the orchestration model and how we integrated Claude into our platform.

TL;DR

LLMs aren't co-founders. They're interns.

Give them tight specs, step-by-step instructions, and no keys to prod.

We built a jail for Claude. And now it ships production-ready code.

Let me know if you want beta access - we’re opening testing soon and would love to get your feedback.

Posted to Beta Testing on August 28, 2025

    This is a brilliant write-up — I really like the framing of “LLMs aren’t co-founders, they’re interns.” It captures the reality of how powerful these models are, but also how dangerous “vibe coding” becomes without structure and guardrails. The containerized, step-by-step orchestration approach makes a ton of sense, and I think it’s the kind of discipline that will separate useful AI-assisted development from hobby experiments that never scale.

    Your big question is spot on too: should AI evolve toward acting like supervised interns, or aim for senior-engineer-level trust? Personally, I think the “intern + orchestration” model is the right mindset for now — it delivers real value today while still leaving humans firmly in control of architecture and responsibility. As the tooling matures, maybe the trust gap narrows, but I doubt we’ll ever want to skip the orchestration layer entirely. This feels like the sweet spot between speed, safety, and reliability.


      Appreciate that! You nailed exactly the tension I was trying to surface. The intern analogy came straight out of necessity; once I gave Claude full repo access the “wow” moments were quickly followed by “oh no, this thing has no brakes.” That’s when orchestration stopped being optional and became the core of the system.

      I agree with you on the trajectory. Aiming for “senior-engineer-level trust” skips past the fact that a lot of the real leverage is in the scaffolding: slowing the model down, forcing it to answer step-by-step, making it justify changes before touching prod. That’s where consistency and safety live. Maybe the gap narrows as models get sharper, but I don’t see orchestration going away. If anything, it becomes more important as we rely on them for bigger pieces of the workflow.

      Glad this resonated. The hype cycle misses that structure is the difference between a fun demo and something you’d actually merge into main.


    Love the vibe of this, Guy, and completely agree with your idea of 'putting Claude in jail!' I do something similar in Cursor.


      That's fantastic! I'd love to hear your process for doing this in Cursor.


        I essentially have a three-stage process and use three markdown files to provide rules for whichever model I'm using in Cursor (often just auto).

        1. I explain the feature I want to build in one or two paragraphs and reference a markdown file with rules for creating a PRD. As part of that, Cursor is prompted to ask 7-10 follow-up questions to clarify functional and technical aspects of the feature. This gives me a chance to discuss with the model the approach to take and clarify the edges of the work (so it doesn't go off and create a tonne of complexity).
        2. Once I'm happy that the PRD accurately reflects what I want built and how I want it built, I reference another md file which prompts Cursor to create a task list based on the PRD. Again there's a load of stuff in there to define the approach. For example, I ask the model to create the list as if it were for a junior developer. I find this drives more clarity and detail. Now I have 4-6 high-level tasks with maybe 6-8 sub-tasks detailing the work that's going to be done. It even lists the files that will be needed at the top (great for debugging if things go wrong).
        3. Finally, I have another md file with rules for processing the task list. It forces Cursor to stop after each sub-task so I can check the work. It also has rules like 'use as little code as possible' to complete the task. If it's a simple feature I might let Cursor complete a task on its own without stopping at each sub-task.

        The documentation takes no time at all for Cursor to create, but it gives me a huge amount of visibility and allows me to follow along and know exactly what's being built.


          That’s a really smart workflow, you’ve basically turned Cursor into a structured apprentice rather than a chaotic code generator. Breaking the flow into PRD → task list → step-wise execution mirrors how I ended up building orchestration layers with Claude for ScrumBuddy. The magic isn’t in the model writing more code, it’s in forcing it to slow down, ask clarifying questions, and then give you bite-sized, reviewable outputs.

          What I like most about your setup is the visibility. By locking in PRD clarity and treating tasks as if for a junior dev, you’re front-loading understanding rather than back-loading debugging. That’s exactly the kind of discipline that stops AI from drifting into complexity. It’s almost like you’ve given Cursor an internal project manager.

          Have you thought about automating the “stop after each sub-task” part into a little orchestration layer of its own? That’s one thing I built around Claude, wrapping it in a loop so it naturally pauses, validates, and only proceeds when you give the green light. Could save you from juggling those manual checkpoints while keeping the same level of control.


            Yeah nice idea. I've already got Cursor to pause automatically and wait for my green light, but it's the validation that would be great. Almost like an automated code review. Quite often I spot obvious inefficiencies in the code that's produced and have to prompt a refactor. Will check out ScrumBuddy 👍


              I'm glad you said that, because that’s exactly the gap I built ScrumBuddy to close. ScrumBuddy adds a validation layer on top of that: it doesn’t just stop and wait, it actively reviews the code against both best practices and the specific conventions of your repo.

              Here’s how it works in practice: when a feature is broken down into tasks, ScrumBuddy wires Claude into the repo with structured context (file structure, naming patterns, style guides, even past PRs), so it knows how your project is supposed to “sound.” Every time it generates or modifies code, it runs that output back through a validation pass. It’ll flag things like inefficient loops, unused imports, or architectural drift and suggest cleaner alternatives automatically.

              Instead of you having to say “hey, can you refactor this into something more efficient?” ScrumBuddy bakes that instruction into its orchestration. By the time you’re reviewing, you’re already looking at code that’s closer to production-ready. And because it works task-by-task, not just feature-by-feature, it keeps inefficiencies from snowballing into bigger messes later.

              So you still stay in control. You get the green-light moments, but the code review muscle is handled for you, saving you from doing the same “spot-and-refactor” dance over and over.

              If you'd like to give it a try, we'd love for you to test our beta and get your feedback on it. You can sign up here and we'll get in touch with you to set you up: https://scrumbuddy.com/?utm_source=indiehackers&utm_medium=community


                Thanks Guy, I'll check it out


                  Let me know your feedback when you try it out!
