How do you evaluate if an AI answer is actually good?
Right now most people rely on:
• "Looks good to me"
• Basic prompt tweaks
• Trial and error
I'm experimenting with a tool that scores AI decisions from 0–100 based on clarity, risk, and reasoning.
Would love to hear how others are handling this.
Scoring the output is hard when the input is a freeform text blob. If the answer is wrong, you can't tell whether the role, the missing constraints, or the output format caused it. Everything is entangled.
The thing that helped me most was structuring prompts into typed semantic blocks before evaluating. Role separate from objective, constraints separate from examples. When you get a bad score, you swap one block, re-run, and see if the score moves. That's a testable input unit, not a wall of text.
I built flompt (https://flompt.dev) around this idea: it decomposes a prompt into 12 typed blocks and compiles them to XML. Evaluation gets much cleaner when the input has structure. Open source: github.com/Nyrok/flompt
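To make the idea concrete, here's a minimal sketch of "typed blocks compiled to XML" plus the swap-one-block test described above. The block names (`role`, `objective`, `constraints`) and the `compile_prompt` helper are illustrative assumptions, not flompt's actual API or block taxonomy.

```python
import xml.etree.ElementTree as ET

def compile_prompt(blocks: dict[str, str]) -> str:
    # Hypothetical compiler: each typed block becomes one XML element.
    # flompt's real block types and output format differ.
    root = ET.Element("prompt")
    for block_type, content in blocks.items():
        child = ET.SubElement(root, block_type)
        child.text = content
    return ET.tostring(root, encoding="unicode")

base = {
    "role": "You are a support agent.",
    "objective": "Resolve the ticket in one reply.",
    "constraints": "Cite the policy section you relied on.",
}

# Swap a single block, re-run the eval, and see if the score moves.
variant = {**base, "constraints": "Keep the reply under 100 words."}

print(compile_prompt(base))
print(compile_prompt(variant))
```

The point isn't the XML itself; it's that each block is now an isolated, testable input unit, so a score change between `base` and `variant` can be attributed to exactly one block.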
If you find it useful, a star on github.com/Nyrok/flompt would mean a lot. It's a solo open-source project, and every star helps with visibility.