What it actually does
Four core systems:
The self-correcting generation loop
Added a critic=true parameter to /generate. If the output scores below 80 on an internal quality rubric, the system silently regenerates — max 1 retry. The user never sees the failed attempt. Combined with RAG context, the first attempt is already better, and the retry almost always clears the threshold.
Stack
FastAPI, Next.js, PostgreSQL, SQLAlchemy, Zustand + TanStack Query. Vertex AI Search for RAG. Gemini 3.1 Pro as the generation model.
What the audit fixed today
Replaced python-jose (had CVE-2022-29217) with PyJWT
Stateless refresh tokens with 7-day expiry
SECRET_KEY mandatory in production — app won't boot without it
LimitUploadSizeMiddleware — blocks requests over 2MB before they hit memory
Prompt injection sanitization at Pydantic layer, user input isolated in XML tags
api_usage table auto-cleanup — 90-day retention
File lock on Alembic migrations for multi-worker safety
Extracted 40+ endpoints from monolithic main.py into 8 specialized routers
Before: 57 tests. After: 137 passing.
Testing approach
Cross-model QA — Claude wrote the code, Gemini generated the tests independently to avoid author bias. Gemini caught edge cases I wouldn't have written tests for myself.
Where it is now
Architecture is B2B-ready. Working on the monetization layer next.
Happy to answer questions about the RAG setup, the self-correction loop, or the Helios personality system.