Self-rating barely predicts AI skill: what we found building an assessor

I'm building Aisa (https://aisa.to), an AI that assesses how people actually use AI through a 20-40 minute conversation rather than a quiz.

The thing that keeps surprising me: how someone rates their own AI skill barely predicts how they actually score. The people most sure they're "advanced" are often what we call Copy-Pasters: heavy users who take output at face value, with no verification step. The ones who hedge ("I'm probably average") often turn out to be the most systematic.

Takeaway if you're building any kind of assessment: if your input is self-report, you're measuring confidence, not competence. We dropped self-rating entirely and score only observed behaviour, with every score backed by a direct quote from the conversation.
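For anyone wanting to try the "no quote, no score" rule, here's a minimal sketch of one way to enforce it. This is not how Aisa implements it; the `ScoredBehaviour` record, its fields, the 1-5 scale, and `accept_scores` are all hypothetical. The idea is simply that self-ratings are never an input, and any score whose evidence quote can't be found verbatim in the transcript gets dropped instead of trusted.

```python
from dataclasses import dataclass


@dataclass
class ScoredBehaviour:
    """One scored observation (hypothetical schema, not Aisa's)."""
    skill: str
    score: int            # assumed 1-5 scale
    evidence_quote: str   # text the candidate actually said


def accept_scores(transcript: str, scores: list[ScoredBehaviour]) -> list[ScoredBehaviour]:
    """Keep only scores whose evidence quote appears verbatim in the transcript.

    A score with an empty quote, or a quote the candidate never said,
    is discarded rather than trusted.
    """
    accepted = []
    for s in scores:
        quote = s.evidence_quote.strip()
        if quote and quote in transcript:
            accepted.append(s)
    return accepted


transcript = (
    "I paste the output into a second prompt and ask it to find errors "
    "before I use anything."
)
scores = [
    ScoredBehaviour("verification", 4, "ask it to find errors"),
    ScoredBehaviour("verification", 5, "I'm an advanced user"),  # never said: dropped
    ScoredBehaviour("prompting", 3, ""),                          # no evidence: dropped
]
result = accept_scores(transcript, scores)
```

Exact substring matching is deliberately strict. A real system would probably want some normalisation (whitespace, punctuation) so that a model paraphrasing a quote slightly doesn't silently lose a legitimate score.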

Honestly, the hard part was never the model. It was getting employers to trust a score that contradicts a confident candidate. Showing them the actual evidence quotes is what flipped it.

Happy to get into how the calibration pass works if anyone's wrestling with the same problem.