I built an AI prompt engineering platform with a self-correcting generation loop

What it actually does
Four core systems:

Prompt generation engine — takes user input, applies personality style, and returns a structured prompt optimized for AI image generation.
Character Lock — a 41-field character sheet system that ensures visual consistency across unlimited generations. Gender, age, ethnicity, facial features, hair, clothing, accessories — stored and injected into every prompt automatically.
Helios personality engine — 6 AI archetypes that blend different stylistic approaches to prompt output. Users pick a personality, the engine adjusts tone, structure, and emphasis accordingly.
Vertex AI RAG pipeline — instead of relying purely on the model's base knowledge, every generation request queries a Vertex AI Search data store (Discovery Engine) backed by 34 curated prompt engineering documents before the model runs. Outputs reference established prompt engineering principles rather than hallucinated style advice.
The RAG layer runs on Gemini 3.1 Pro. The data store lives on Google Cloud and gets queried on every /generate call.
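
To make the Character Lock idea concrete, here is a minimal sketch of how a stored sheet could be injected into a prompt. It is not the real implementation: the CharacterSheet class, the six fields shown (a stand-in for the 41), and the function names are all assumptions for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CharacterSheet:
    # Hypothetical subset of the 41 fields; the real schema is not shown in this post.
    gender: str = ""
    age: str = ""
    ethnicity: str = ""
    hair: str = ""
    clothing: str = ""
    accessories: str = ""


def character_block(sheet: CharacterSheet) -> str:
    """Render non-empty fields in declaration order so the wording is identical on every call."""
    parts = [
        f"{f.name}: {getattr(sheet, f.name)}"
        for f in fields(sheet)
        if getattr(sheet, f.name)
    ]
    return "; ".join(parts)


def inject_character(prompt: str, sheet: CharacterSheet) -> str:
    """Append the locked character description to a generated prompt."""
    block = character_block(sheet)
    return f"{prompt}\nCharacter: {block}" if block else prompt
```

The point of rendering from a frozen sheet in a fixed field order is that the character description never drifts between requests, which is what keeps the generated images consistent.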

The self-correcting generation loop
I added a critic=true parameter to /generate. When it is set, the output is scored against an internal quality rubric, and if it scores below 80 the system silently regenerates, with a maximum of one retry. The user never sees the failed attempt. With RAG context the first attempt is already stronger than it was without it, and when a retry does run it almost always clears the threshold.
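
A rough sketch of the loop, assuming the generator and the rubric scorer are plain callables. The function name, the parameter names, and the choice to return the higher-scoring attempt when the retry also misses the threshold are my assumptions for this example; the threshold of 80 and the single retry are from the description above.

```python
from typing import Callable


def generate_with_critic(
    generate: Callable[[str], str],
    score: Callable[[str], float],
    user_input: str,
    critic: bool = True,
    threshold: float = 80.0,
    max_retries: int = 1,
) -> str:
    """Return the first output that meets the threshold, else the best one seen."""
    best = generate(user_input)
    if not critic:
        return best  # critic disabled: no scoring, no retry
    best_score = score(best)
    for _ in range(max_retries):
        if best_score >= threshold:
            break
        # Below threshold: regenerate quietly and keep whichever attempt scored higher.
        candidate = generate(user_input)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best
```

Capping max_retries at 1 keeps the worst case at two model calls per request, so latency and cost stay bounded.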

Stack
FastAPI, Next.js, PostgreSQL, SQLAlchemy, Zustand + TanStack Query. Vertex AI Search for RAG. Gemini 3.1 Pro as the generation model.

What the audit fixed today

Replaced python-jose (affected by CVE-2024-33663, an algorithm-confusion issue similar to PyJWT's already-patched CVE-2022-29217) with PyJWT
Stateless refresh tokens with 7-day expiry
SECRET_KEY mandatory in production — app won't boot without it
LimitUploadSizeMiddleware — blocks requests over 2MB before they hit memory
Prompt injection sanitization at the Pydantic layer, with user input isolated in XML tags
api_usage table auto-cleanup — 90-day retention
File lock on Alembic migrations for multi-worker safety
Extracted 40+ endpoints from monolithic main.py into 8 specialized routers
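
For anyone curious about the upload limit, this is roughly what a middleware like LimitUploadSizeMiddleware can look like as pure ASGI. It is a simplified sketch, not the code from the repo: it only checks the declared Content-Length header and assumes 2MB means 2 * 1024 * 1024 bytes. A production version would also need to count bytes for chunked bodies that send no Content-Length.

```python
MAX_BODY_BYTES = 2 * 1024 * 1024  # assumption: "2MB" taken as 2 MiB


class LimitUploadSizeMiddleware:
    """Pure ASGI middleware: reject oversized requests based on Content-Length."""

    def __init__(self, app, max_bytes: int = MAX_BODY_BYTES):
        self.app = app
        self.max_bytes = max_bytes

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            headers = dict(scope.get("headers", []))
            declared = headers.get(b"content-length", b"")
            if declared.isdigit() and int(declared) > self.max_bytes:
                # Respond without ever calling receive(), so the body is never read.
                await send({
                    "type": "http.response.start",
                    "status": 413,
                    "headers": [(b"content-type", b"text/plain")],
                })
                await send({"type": "http.response.body", "body": b"Payload too large"})
                return
        # Within the limit (or not an HTTP request): pass through unchanged.
        await self.app(scope, receive, send)
```

In a FastAPI app this kind of class can be registered with app.add_middleware(LimitUploadSizeMiddleware), which passes the wrapped app as the first constructor argument.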

Before: 57 tests. After: 137 passing.
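
On the prompt injection item in the list above, the isolation step can be sketched with the standard library alone. The function names, the user_input tag name, and the 2000-character cap are assumptions for this example; in the app this logic would be called from a Pydantic validator rather than directly.

```python
from xml.sax.saxutils import escape

MAX_INPUT_CHARS = 2000  # hypothetical cap, not the app's real limit


def sanitize_user_input(text: str) -> str:
    """Drop control characters, trim whitespace, and cap length."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return cleaned.strip()[:MAX_INPUT_CHARS]


def wrap_user_input(text: str) -> str:
    """Escape markup so the input cannot close the tag, then isolate it in <user_input>."""
    return f"<user_input>{escape(sanitize_user_input(text))}</user_input>"
```

Escaping before wrapping is the important part: without it, a user could type a closing tag and have the rest of their text read as instructions outside the isolated block.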

Testing approach
Cross-model QA: Claude wrote the code and Gemini generated the tests independently, to avoid author bias. Gemini caught edge cases I wouldn't have written tests for myself.

Where it is now
Architecture is B2B-ready. Working on the monetization layer next.
Happy to answer questions about the RAG setup, the self-correction loop, or the Helios personality system.