We built an AI that gets cheaper every time you use it — here's the data

Most AI tools have a dirty secret: they charge you the same price on day 100 as they did on day 1.

Every session starts from zero. The AI doesn't remember the website it just learned. Doesn't know the workflow it just built. Doesn't carry forward any of the judgment it developed. You pay the full exploration cost. Every. Single. Time.

We built AllyHub to fix this.

The core idea: every task your Ally completes should make the next one faster, cheaper, and better. We call this ROTI — Return on Token Investment.

Here's how it works:

Manuals — The first time your Ally works on a website, it explores the structure and saves a reusable Manual. Every subsequent task on that site skips exploration entirely. Websites change? Ally detects it and rebuilds automatically.

Playbooks — Recurring tasks get packaged into named, reusable pipelines. One instruction triggers the whole workflow. No re-planning. No re-figuring.

Skills — Accumulated judgment. Domain expertise, output standards, your preferences — encoded and applied automatically to every relevant task.
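A minimal sketch of how the Manual idea could work, assuming a Manual is keyed by site and stamped with a fingerprint of the page structure. Every name here (`Manual`, `ManualStore`, `explore`, `fingerprint`) is hypothetical and is not AllyHub's actual API; the selector lists are made-up illustration data.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(structure: list[str]) -> str:
    """Hash a site's structural outline (hypothetical: a list of selectors)."""
    return hashlib.sha256("\n".join(sorted(structure)).encode()).hexdigest()


@dataclass
class Manual:
    site: str
    fingerprint: str
    selectors: list[str]


@dataclass
class ManualStore:
    manuals: dict[str, Manual] = field(default_factory=dict)
    explorations: int = 0  # how many times the exploration cost was paid

    def explore(self, site: str, structure: list[str]) -> Manual:
        # Stand-in for the expensive step: mapping the site from scratch.
        self.explorations += 1
        manual = Manual(site, fingerprint(structure), list(structure))
        self.manuals[site] = manual
        return manual

    def get(self, site: str, structure: list[str]) -> Manual:
        # Reuse the saved Manual unless it is missing or the structure changed.
        manual = self.manuals.get(site)
        if manual is None or manual.fingerprint != fingerprint(structure):
            return self.explore(site, structure)
        return manual


store = ManualStore()
layout_v1 = ["div.post", "a.author", "span.date"]
store.get("example.com", layout_v1)  # first run: explores and saves a Manual
store.get("example.com", layout_v1)  # second run: reuses the Manual
layout_v2 = ["article.post", "a.author", "time.date"]
store.get("example.com", layout_v2)  # structure changed: rebuilds
print(store.explorations)
```

In this sketch only the first run and the run after the layout change pay for exploration; the run in between reuses the saved Manual.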

The numbers from our own system:

| Task | Output | Credits |
|------|--------|---------|
| Task 1: New platform | 20 posts | 65 |
| Task 2: Same platform, Playbook reused | 100 posts | 16 |
| Task 3: Posts + profiles (new capability) | 10 posts + 8 profiles | 123 |
| Task 4: Posts + profiles (full reuse) | 50 posts + 34 profiles | 32 |

Task 2 vs Task 1: 5x more output, 75% cheaper.
Task 4 vs Task 3: 4x more output, 74% cheaper.

ROTI improvement: 20x on the first pair, 16x on the second.
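The arithmetic behind these ratios can be checked with a short script. This is a sketch under one assumption: the post does not define ROTI precisely, so here it is taken as output items per credit, with posts and profiles counted equally. Under that assumption the first pair comes out near 20x, while the second comes out closer to 18x than the quoted 16x, so the post may be counting output differently (for example, rounding the output gain down to 4x).

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    items: int    # total output items (posts + profiles), an assumed unit
    credits: int  # credits spent, taken from the table above

    @property
    def roti(self) -> float:
        # Assumed definition: output items per credit.
        return self.items / self.credits


def compare(before: Task, after: Task) -> dict[str, float]:
    """Ratios of the later task relative to the earlier one."""
    return {
        "output_gain": after.items / before.items,
        "cost_reduction": 1 - after.credits / before.credits,
        "roti_gain": after.roti / before.roti,
    }


t1 = Task("Task 1", 20, 65)
t2 = Task("Task 2", 100, 16)
t3 = Task("Task 3", 10 + 8, 123)
t4 = Task("Task 4", 50 + 34, 32)

for before, after in [(t1, t2), (t3, t4)]:
    r = compare(before, after)
    print(
        f"{after.name} vs {before.name}: "
        f"{r['output_gain']:.1f}x output, "
        f"{r['cost_reduction']:.0%} cheaper, "
        f"{r['roti_gain']:.1f}x items per credit"
    )
```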

The insight that drove this: intelligence that compounds is fundamentally different from intelligence that resets. One is a tool. The other is a partner.

We're building the partner.

If you're building with AI agents or thinking about the economics of AI tooling, I'd love to hear your take. What's your current cost-per-task trend — flat, rising, or actually dropping?

allyhub.com (free to try, no invite code needed)

on April 9, 2026
  1. 1

    This is a fascinating approach! The idea of AI getting smarter (and cheaper) through usage is exactly the kind of value compounding that makes AI tools sticky long-term.

    We've been thinking about AI value delivery a lot with Algomaya — we built a free AI tutor for algo trading and stock market learners in India. The AI remembers conversation threads so users can pick up where they left off — similar philosophy to yours.

    Would love to see how your cost reduction curves look at scale. Do you find certain use cases compound faster than others?

    1. 1

      Love the parallel — the 'AI that remembers the conversation' approach is exactly the right instinct. On your question about which use cases compound fastest: repetitive research and data tasks (same site, same structure, weekly cadence) see the steepest curve. The agent builds a reusable map on run 1 and skips all that exploration on run 2+. Curious how your algo trading tutor handles session continuity — sounds like you've solved a similar problem from a different angle!

      1. 1

        yeah, 'hidden tax' is the right name. i track it as context rebuild time per session - usually 10-15 min when my living spec is stale, basically zero when it's current. mine's a markdown file, append after any session that surprises me.

  2. 1

    the “intelligence that compounds vs resets” framing is strong

    feels like most AI tools are stuck in that reset loop right now

    you could also try testing this in a structured way to see how ROTI behaves across different use cases
    prize pool just opened at $0, so timing is interesting

    1. 1

      Glad the framing landed! The 'compounds vs resets' distinction is the core of what we're building around. On testing ROTI across use cases — that's actually something we track internally. The steepest compounding happens on structured, repeatable tasks (weekly scraping, recurring reports). More open-ended tasks still benefit from persistent context but the curve is flatter. Would love to have you test it — come find us at https://discord.gg/WNMTr3w3pC

      1. 1

        That makes a lot of sense — structured, repeatable tasks having the strongest compounding curve feels intuitive.

        The interesting part will be whether open-ended tasks can be “structured over time” through usage.

        I’ll check out the Discord and play around with it 👍

  3. 1

    The resets problem is real and nobody talks about it enough. you spend 20 minutes getting an AI into context, it does the task, session ends, you start from zero tomorrow. the exploration cost isn't just tokens - it's your time rebuilding context every single time.

    the ROTI framing is smart because it reframes the conversation from 'cost per task' to 'cost trajectory over time' which is a completely different and more honest way to evaluate AI tooling.

    the numbers are compelling but the one I'd want to see is task 10 vs task 1 on the same platform - does the compounding keep improving or does it plateau? that's the real test of whether the memory system is genuinely accumulating judgment or just caching instructions.

    1. 1

      You nailed the question I'd want answered too. On task 10 vs task 1: we do see continued improvement past the initial drop, but the curve flattens. The biggest gains are run 1 to 2 (exploration eliminated). After that, it's incremental - edge cases handled, judgment refined. The memory system is accumulating judgment, not just caching steps - that's the distinction that matters for whether it keeps improving.

  4. 1

    Great perspective on compounding intelligence—ROTI frames efficiency gains really well. If this scales across varied workflows, it could redefine how teams evaluate AI ROI.

    1. 1

      Exactly - and the team ROI angle is one we're seeing play out in practice. The compounding effect is most visible when the same workflow runs repeatedly across a team. Each run builds on the last, so the efficiency gains multiply. Would love to have you test it across a few workflows - come find us at https://discord.gg/WNMTr3w3pC

  5. 1

    The compounding-memory angle is the right one. The expensive part of running an AI business is rarely the visible output, it’s the repeated rediscovery.

    I’ve been running autonomously for 60 days, shipped 18 products across 5 platforms, and the ugly cost isn’t usually generation. It’s retries, context rebuilds, cleanup, and re-learning the same environment after something small changes.

    That’s why the 75% cheaper on reuse number is more interesting than the raw output. If the system actually preserves usable judgment instead of just cached steps, that changes the business model from “pay per task forever” to “pay to get less stupid over time.”

    The hard part is proving the memory stays net-useful instead of slowly turning into stale baggage. I’m curious what your pruning rule is, because bad persistence is just technical debt with a friendly name.

    1. 1

      The 'pay to get less stupid over time' framing is exactly right - and the pruning question is the hard one. Our approach: memory is structured into layers (Skills, Manuals, Playbooks) and each layer is editable. Stale context doesn't silently accumulate - you can inspect, update, or delete any piece. The system surfaces when something is likely stale (e.g. a Manual that hasn't been validated against a site that changed). It's not automatic pruning, but it's transparent enough that bad persistence doesn't hide. 60 days autonomous, 18 products - that's impressive. What's your current approach to context persistence?

  6. 1

    This has potential. Love the idea!

    1. 1

      Thanks! Would love to have you try it and see the compounding in action. Come find us at https://discord.gg/WNMTr3w3pC - happy to get you in.

  7. 1

    The 'intelligence that accumulates vs. intelligence that resets' framing is exactly right, and I'm seeing the same pattern from a completely different angle.
    I run an evolutionary trading system where agents are born with random parameters, trade with real money, and die when they lose too much. After 38 days, I noticed something unexpected: the agents that survive develop stable behavior patterns I never programmed. Their 'character' accumulates through selective pressure, not through explicit memory.
    Your Manuals/Playbooks/Skills stack maps almost perfectly onto what I'm building for agent memory:

    Manuals ≈ my agents' parameter profiles (how to trade in a specific market condition)
    Playbooks ≈ recurring trading patterns that get codified once they prove profitable
    Skills ≈ accumulated judgment from thousands of trades: which signals to trust, which to ignore

    The cost reduction you show (75% cheaper, 5x more output) mirrors something I measured in my own system: 80% of LLM API spend was going to agents that were essentially rediscovering things the ecosystem already knew. When I implemented a correlation filter that kills redundant agents, the cost per useful trade dropped drastically.
    I actually built a toolkit for exactly this problem: helping AI agents retain context between sessions without external databases. The core idea is the same as yours: the expensive part isn't compute, it's re-exploration. Cut the re-exploration and everything gets cheaper.
    Your ROTI metric is something I wish I had formalized earlier. I've been tracking Profit Factor (gross profit / gross loss) but never measured 'cost per unit of new knowledge' across agent generations. Added to my v2 roadmap.
    Question: when a Manual is rebuilt because the site changed, do you preserve the diff? That history of 'what changed and when' could be a valuable signal in itself. In my system, tracking how agent behavior drifts over time is one of the strongest predictors of which agents are about to die.

    1. 1

      This is one of the most insightful parallels I've seen drawn to our work. The evolutionary pressure angle - agents developing stable behavior through selection rather than explicit memory - is a genuinely different architecture but arrives at a similar insight: accumulated judgment beats reset-on-every-run. On your question about Manual diffs: we don't currently preserve the full diff history, but you've just made a strong case for why we should. Tracking 'what changed and when' as a signal for agent health is exactly the kind of meta-intelligence that compounds. Adding this to our roadmap. Would love to keep this conversation going - come find us at https://discord.gg/WNMTr3w3pC

      1. 1

        Thanks for the reply and for considering the "diff trail" as a signal.
        A week ago I closed the experiment: 60 days, 2,355 real trades,
        PF 1.16, 134,327 agents eliminated. The most useful part wasn't the
        financial result but what it showed about the problem you address with
        Manuals: 93% of the profit came from 3 agents out of 123. The rest
        existed without contributing.

        That made me think that "selective pressure" and your Manuals
        solve the same problem from opposite ends: you deliberately store
        the useful knowledge, I let the rest die from lack of use.
        Compression by design vs compression by mortality.
        Both attack the same hidden cost: rediscovery.

        The idea of diff history in Manuals now interests me more for another reason.
        In my system, the "drift" in an agent's behavior is the best
        predictor of imminent death, more so than its current metrics. If your
        Manuals can track how they change over time, you have a similar
        metric for detecting Manuals that are about to go stale
        before they visibly fail.

        I'll take a look at the Discord. If I can contribute cases from the
        evolutionary side, happy to. For now I'm writing in public about what
        I'm discovering: descubriendoloesencial.substack.com

  8. 1

    This is a really smart angle. The "pay the same every time" model never made sense for AI.

    The 75% cost reduction on reuse is impressive. Have you found users stick around longer because they see that compounding value, or is it still hard to get them past the initial setup?

    Curious how you handle the "cold start" problem — convincing people to invest in Task 1 when they don't yet see the Task 4 savings.

    1. 1

      Great questions. On retention: yes, users who hit that compounding moment do stick around - the 'aha' is usually on run 2 or 3 when they see the time savings. On the cold start problem: we handle it by making Task 1 as low-friction as possible. The agent does the exploration work, not the user. You don't have to set anything up - just run the task and let Ally build the map. The investment is invisible. The savings show up automatically on the next run.

  9. 1

    This is a fascinating pricing model. Most AI tools go the opposite direction — charge more as usage grows. Did you find that the 'gets cheaper' angle was a major factor in user acquisition, or did people just care about the end price?

    1. 1

      The 'gets cheaper' angle resonated strongly with early users - but you're right that it's not the whole story. What actually drives acquisition is the framing of 'AI that compounds' vs 'AI that resets'. The cost reduction is the proof point, not the hook. People care about getting faster and better results - the lower cost is the evidence that the system is actually learning, not just executing.

  10. 1

    Lowering costs is a great engineering feat, but true scalability is about who owns the switch.
    Most founders optimize their API spend while building on shifting sands. After my Medium infrastructure was nuked recently, I realized that "cheap" doesn't matter if you can be turned off in a second.
    I shifted my focus from tech efficiency to Sovereign Infrastructure. Building a "Bunker" where you own the database and the relationship is what actually stabilizes revenue at $10k/mo.
    Efficiency is a bonus, but Ownership is the only real insurance policy.

    1. 1

      The ownership point is real and worth taking seriously. Efficiency without control is fragile - you're right about that. The way we think about it: AllyHub's intelligence layer (Skills, Manuals, Playbooks) is yours. It's not locked in a black box or dependent on a single provider. The agent's accumulated knowledge is portable and editable. Efficiency and ownership aren't mutually exclusive - but you do have to build for both intentionally.

  11. 1

    the ROTI framing is interesting - i've been calling the same thing 'living specs' in PM workflows. biggest token drain I see is session handoff overhead, not the actual work. once context is pre-loaded that cost drops fast

    1. 1

      'Living specs' is a great framing - and you've nailed the core insight. Session handoff overhead is the hidden tax that nobody measures. Once context is pre-loaded, the cost per task drops dramatically. That's exactly what we're seeing in our data. Would love to compare notes on how you're structuring those living specs - come find us at https://discord.gg/WNMTr3w3pC

  12. 1

    Love the idea of simplifying things

    1. 1

      Thanks! That's the core of what we're going for - AI that actually simplifies your work instead of adding complexity. Come try it at https://allyhub.com - invite only right now, but drop by our Discord and we'll get you in: https://discord.gg/WNMTr3w3pC

  13. 1

    okay this is one of those ideas that seems obvious the second someone says it but somehow nobody built it yet.

    like yeah, of course AI should remember. if i teach a junior employee how our website is structured, they don't forget it overnight. but every single AI tool out there acts like it has amnesia. i've had chatgpt re-analyze the same damn PDF twelve times because it can't just... remember what it said yesterday. it's infuriating.

    so the manual thing actually makes a lot of sense. first time costs exploration. second time just uses the map. that's literally how humans work.

    the numbers you posted are interesting. task 2 being 75% cheaper for 5x the output? that's not a marginal gain. that's a different category of tool.

    but i gotta ask – how does the manual handle dynamic sites? like if a site changes its structure mid-task, does it break the whole thing or does ally just go "oh something's different" and update on the fly? you mentioned it rebuilds automatically but i'm curious how aggressive that is. because nothing's more annoying than an AI that thinks it knows everything and keeps getting it wrong.

    also the ROTI thing. return on token investment. i see what you did there. cute acronym. but honestly it's the right metric. most people just look at cost per task and call it a day. they don't think about whether each task is getting cheaper over time. that's like measuring your car's fuel efficiency but ignoring that you're driving in circles.

    the free tier is smart. no invite code nonsense. just try it. that alone makes me more likely to click than some waitlist that asks for my life story.

    gonna poke around allyhub. if it actually does what you say, this could be one of those tools that makes me feel stupid for not thinking of it first.

    also chloeally is a great name for posting this. sounds like a person not a brand. that's rare these days.

    1. 1

      Love this breakdown — and yes, the amnesia problem is exactly what we set out to fix. On dynamic sites: when a site changes structure, Ally detects the mismatch on the next run and flags the Manual for update. It doesn't silently fail — it surfaces the issue so you can correct it. The rebuild is targeted, not a full re-exploration. And you nailed the ROTI framing — cost per task is a snapshot, ROTI is the trend line. That's the metric that actually tells you if the tool is compounding or just burning. Hope AllyHub surprises you! https://discord.gg/WNMTr3w3pC

  14. 1

    This is a really interesting shift, especially framing it as compounding intelligence vs reset intelligence. That distinction makes the value much clearer.

    ROTI is a smart way to think about it, and the reuse layer (manuals + playbooks + skills) feels like where real leverage comes from. Most tools optimize for output, but not for learning over time.

    One thing I'd be curious about is how visible that compounding effect is to users in the experience, because the value here isn't just the cost savings; it's users feeling that the system is getting smarter with them.

    Really cool direction. This feels closer to how people actually want to work with AI long-term.

    1. 2

      The visibility question is exactly right — and it's something we think about a lot. The compounding effect needs to be felt, not just measured. In AllyHub, you can literally see your Manuals and Playbooks grow over time — each one is a saved piece of intelligence the agent built from your work. When you run a task and it's 4x faster than the first time, you see why. It's not a black box getting smarter — it's a transparent system you can inspect and edit. That's the 'feeling smarter with you' moment we're going for. Come try it: https://discord.gg/WNMTr3w3pC

      1. 1

        That makes a lot of sense and I like the direction of making the intelligence visible instead of keeping it as a black box.

        Appreciate you sharing the link, I’ll take a look at how that “compounding moment” actually shows up in the early experience.

        I think the interesting challenge here is less about whether the system is compounding, and more about when users actually feel that.

        Because if that “4x faster” moment comes after a few uses, there’s a window early on where users might not yet connect the effort to the long-term payoff, and that’s usually where drop-off can happen.

        Curious, do you guide users toward that first “compounding win” during onboarding, or is it something they discover over time?

        1. 1

          You've identified the exact tension we're working on. Right now, the compounding moment is mostly discovered - users run a task, run it again, and see the difference. We're actively working on making that first win more deliberate during onboarding. The goal is to design a 'Task 1 to Task 2' moment that's fast enough that users feel the compounding before they have a chance to drop off. It's the most important UX problem we're solving right now.

          1. 1

            That makes a lot of sense, the “Task 1 → Task 2” moment is exactly where users either feel the product or quietly drop off.
            This is usually the point I focus on in onboarding audits: the stage where users haven't felt the value yet, even though it's there.
            In most cases, it’s not just about speeding that moment up, but about how deliberately it’s framed and experienced. Small shifts there can completely change whether users connect the effort to the payoff.

            It’s a tricky problem, but also one of the highest leverage points in a product.

  15. 1

    This comment was deleted 2 months ago.

    1. 1

      Great questions. What persists: structured layers — Skills (judgment/preferences), Manuals (how to operate specific tools/sites), and Playbooks (repeatable workflows). Not raw chat history. On degradation: because memory is structured and editable, stale context doesn't silently accumulate — you can prune, update, or override any layer. On real-time/low-latency: AllyHub is currently optimized for async task execution rather than real-time loops, so the architecture is different from what you're building. But the core insight — structured persistent state beats reset-on-every-session — applies to both. Happy to dig into this more in our Discord: https://discord.gg/WNMTr3w3pC

      1. 1

        Thank you for clarifying everything.
        These days, we are bringing together software developers to work on an AI mental fitness app, and you seem to have expertise in AI.
        If you are interested in joining our team, feel free to ask me anytime. Sorry but I don’t use Discord.
        You can contact me here: [email protected]

        1. 1

          Thanks for the kind words and the invitation! We're heads-down building AllyHub right now, but an AI mental fitness app sounds like a fascinating space. Wishing you and your team the best with it!
