
Lumra – Turning Prompt Engineering into a Real Workflow

Hey Indie Hackers 👋

I recently launched Lumra, a prompt management platform built for developers, indie builders, and prompt engineers who are working seriously with AI.

As AI-powered products grow, prompts stop being simple inputs and start becoming core logic.
They evolve, get refined, reused, broken, fixed — just like code.
But most tools don’t treat prompts that way.

In practice, prompts usually end up:
• Scattered across Notion pages and Google Docs
• Copied between projects with no version history
• Tweaked ad-hoc without clarity on what actually changed or why

That friction was slowing me down in my own projects — so I built Lumra.

What Lumra focuses on

Lumra is designed to make prompt engineering feel closer to software engineering:
• Structured prompt storage instead of loose text
• Versioning so you can iterate safely
• Clear organization across projects and use cases
• Reusability without copy-paste chaos

The goal isn’t to generate prompts for you — it’s to help you manage, refine, and scale the prompts you already care about.

Who it’s for
• Indie hackers building AI-powered products
• Developers shipping LLM features in real apps
• Prompt engineers who want clarity and control
• Anyone tired of losing “that one good prompt”

Indie-built, early-stage

Lumra is fully indie-built and still early.
I’m actively shaping it based on real usage and feedback from builders.

👉 Try it here: https://lumra.orionthcomp.tech

I’d love to hear:
• How you currently manage prompts
• What breaks in your workflow
• What you’d expect from a “GitHub for prompts”

Thanks for reading — and happy building

Posted to Building in Public on December 29, 2025
  1.

    This resonates. Prompts scattered across Notion and docs is exactly where most teams are stuck. Versioning helps, but I think there's a step before that: the prompt itself needs internal structure.

    Most prompts are a single wall of text. The model has to guess where the role ends, where the constraints start, and what format you expect. When you break a prompt into typed sections (role, objective, constraints, output format, examples) each piece gets explicit boundaries.
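The typed-section idea is easy to picture in code. Below is a minimal sketch, assuming hypothetical section names and an XML-style output; it is not flompt's or Lumra's actual schema, just an illustration of giving each piece explicit boundaries:

```python
# Sketch: a prompt as typed sections rather than one wall of text.
# Section names and the XML-style rendering are illustrative only.
from dataclasses import dataclass, field


@dataclass
class StructuredPrompt:
    role: str
    objective: str
    constraints: list[str] = field(default_factory=list)
    output_format: str = ""

    def compile(self) -> str:
        """Render each typed section with an explicit boundary."""
        parts = [
            f"<role>{self.role}</role>",
            f"<objective>{self.objective}</objective>",
        ]
        if self.constraints:
            items = "\n".join(f"- {c}" for c in self.constraints)
            parts.append(f"<constraints>\n{items}\n</constraints>")
        if self.output_format:
            parts.append(f"<output_format>{self.output_format}</output_format>")
        return "\n".join(parts)


prompt = StructuredPrompt(
    role="You are a release-notes editor.",
    objective="Summarize the diff in three bullets.",
    constraints=["No marketing language", "Each bullet under 20 words"],
    output_format="Markdown bullet list",
)
print(prompt.compile())
```

The point of the structure is that diffing and versioning can then operate per section instead of on one undifferentiated blob.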

    I've been building flompt (https://github.com/Nyrok/flompt) around this. It's a visual prompt builder where you decompose into 12 semantic blocks on a canvas, then compile to Claude-optimized XML. Open source, 75+ stars. Different angle than Lumra but same core insight: prompts are code, not prose. Would love to hear how you handle prompt structure inside Lumra's versioning.

  2.

    Turning prompt engineering into a workflow is the right direction — value usually shows up once it stops being “clever prompts” and starts being repeatable outcomes.

    At this stage, what’s the one signal you’re watching to know this is working — successful first run, repeat usage, or something else?

    1.

      Thank you. Over time, I’m continuing to improve Lumra.
      Soon, an output scoring system will be introduced, making it possible to measure whether prompts are actually working and how well they perform. Even now, the integrated usage metrics already provide valuable signals about prompt effectiveness and real-world performance.

      1.

        That makes sense — scoring will help once it’s in place.

        Before the scoring system exists, what’s the current behavior you trust most as a proxy for prompt quality?

        For example: users re-running the same workflow without editing prompts, prompts being reused across different inputs, or users abandoning “experimentation mode” and just running jobs.

        I’ve found those shifts often show up before explicit quality scores do.

        1.

          Right now, before explicit scoring is in place, the strongest proxy we trust is actual usage frequency.
          We track how often prompts (and specific prompt versions) are used in real runs. Over time, that usage naturally concentrates around the prompts that perform better in practice.
          When users consistently choose the same prompt or version—without tweaking it—that’s a strong signal it’s doing its job. In that sense, usage becomes an implicit quality signal long before we attach explicit scores to it.
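That concentration effect is simple to picture as a sketch. The run-log shape and prompt names below are hypothetical, not Lumra's actual data model; the idea is just that counting real runs per (prompt, version) makes usage concentrate into an implicit ranking:

```python
# Sketch: usage-as-quality-signal. Count real runs per (prompt, version)
# and watch where usage concentrates. Event shape is hypothetical.
from collections import Counter

run_log = [
    ("summarize-news", "v3"), ("summarize-news", "v3"),
    ("summarize-news", "v1"), ("classify-topic", "v2"),
    ("summarize-news", "v3"), ("classify-topic", "v2"),
]

usage = Counter(run_log)

# Most-used versions float to the top: revealed preference, not self-report.
for (prompt_name, version), runs in usage.most_common():
    print(f"{prompt_name} {version}: {runs} runs")
```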

          1.

            That makes a lot of sense.
            What I like about usage-as-signal is that it’s revealed preference, not self-reported intent — users don’t need to explain what works, they just keep coming back to it.

            In practice, that seems to separate “prompts that feel clever” from workflows that actually earn trust through repetition. Once people stop tweaking and just run jobs, the product has crossed an important line.

            Looking forward to seeing how explicit scoring complements that implicit signal over time.

  3.

    I'm building an AI-powered news aggregator and the "prompts scattered across Notion" problem resonates. My summarization prompts evolved differently for tutorials vs news vs opinion pieces — now keeping track of which version works best for each content type is becoming its own challenge.

    Curious: how do you think about prompt "context" beyond just versioning? Like tagging prompts by use case or the model they're optimized for?

    1.

      Absolutely — I completely get that. In heavy, multi-prompt use cases like this, organizing prompts is critical for consistency and for keeping development fast and efficient.
      In your situation, being able to separate prompts with proper tags and categories, apply version control when needed, and — if you’re building in the browser — use Lumra’s Chrome extension brings a lot of advantages. You can save prompts automatically with live input sync as you write, create prompts effortlessly, and quickly reuse existing ones from your library. It integrates naturally into the workflow without adding friction.
      We also keep the tagging and category system fully flexible. Instead of choosing from a fixed list, you can use your own tags and categories — whether you want to organize by model, content type, or anything else. You can generate new tags on the fly for future prompts or easily reuse ones you’ve already created.

      1.

        Thanks for the detailed response! The flexible tagging system is exactly what I was hoping to hear — being able to define my own categories like "tutorial-summarization" vs "news-brief" vs "opinion-analysis" makes a lot more sense than choosing from a fixed list.

        The Chrome extension with live sync is interesting too. I've been manually copying prompts between my dev environment and docs — having that happen automatically would cut a lot of friction.

        One follow-up: when you mention "reuse counts per version" — does that help surface which prompt variants are actually being used in production vs which ones are just experiments sitting in the library? That visibility would be useful for cleanup.

        Will give Lumra a proper try this week. Thanks! 🙏

        1.

          Glad that resonated — you’re exactly describing the kind of friction Lumra is meant to remove.

          On the reuse counts per version question: yes, that visibility is very much intentional and it works the way you’re hoping.

          In practice, usage counts increase whenever a specific prompt version is actively pulled into your workflow. For example:
          • Clicking a prompt (or a specific version) in the Chrome extension and inserting it directly into an input field
          • Copying a prompt version from the library to use in ChatGPT, an IDE, or another tool

          Those actions increment the usage count for that exact version, not just the prompt as a whole. Over time, this gives you a very clear signal:
          • Which prompt versions are actually used in day-to-day or production workflows
          • Which ones were experiments, spikes, or one-off tests that never stuck

          That makes cleanup and consolidation much easier — you can confidently identify versions with zero or very low real usage, instead of guessing.

          Think of it as the equivalent of Git commits that are merged and deployed versus commits that only ever lived on a local branch.

          Once you start using the extension regularly, patterns emerge pretty quickly.

          Appreciate you giving Lumra a proper try — if you end up wanting to tune tagging or versioning conventions around your own workflows, that’s where it really starts to shine.

  4.

    This makes a lot of sense. Once prompts become part of product logic, treating them like code feels necessary.

    I’ve seen prompts get messy fast when teams iterate without versioning or context on what changed. Framing this as workflow and organization rather than prompt generation is a smart angle.

    Curious how builders end up using this once projects scale.

    1.

      Exactly. A pure “prompt generator” is a bit surface-level — useful at the beginning, but limited in the bigger picture. When you look at real systems, the real value isn’t generating prompts, it’s managing them.
      Today, prompts sit alongside code as part of a product’s core logic. Even setting aside vibe-coding, AI has quietly become embedded in many critical parts of traditional development workflows. And beyond coding, in any domain where AI is used seriously, poorly managed prompts quickly become a source of fragility.
      That’s why we see prompt management as infrastructure, not a feature — something teams rely on as projects grow and complexity sets in.

  5.

    The "GitHub for prompts" framing is compelling. Prompts really do evolve like code - they have edge cases, they break in unexpected contexts, they need to be tested against different inputs. The versioning angle is smart.

    Curious about a few things:

    1. How do you handle prompt dependencies? A lot of my prompts reference other prompts (like a system prompt that calls a formatting template). Does Lumra support that kind of composition?

    2. What's the collaboration story? If I'm iterating on prompts with a team, can we see who changed what and why - like commit messages for prompts?

    3. How do you think about testing? The hardest part of prompt engineering isn't writing - it's knowing if version B is actually better than version A. Any built-in way to compare outputs across versions?

    The "loose text scattered across Notion" problem is real. I've lost track of good prompts more times than I'd like to admit.

    1.

      From the start, Lumra was designed as a single source of truth for prompts that works across workflows. The Chrome extension plays a key role here: it lets you stay in your flow without switching tools. With live input sync, while you’re writing a prompt in the web app, the extension is already keeping it saved and ready — preserving both speed and focus.
      Versioning is another foundational piece. Being able to store and compare multiple versions of a prompt matters not just today, but for where Lumra is evolving next.
      Lumra is intentionally built around efficiency and simplicity. This is the first public version, but many of the features you’re asking about are already on the near-term roadmap and visible in the “coming beta features” section on the site.
      Team collaboration isn’t live yet, but the groundwork is already there:
      • Prompt versions store author and editor metadata
      • You can compare versions word-by-word with highlighted differences
      • Each version supports its own notes

      Next, we’ll enable teammates to create versions, leave notes, and collaborate directly on the same prompt.
      Output comparison is also planned. One idea we’re exploring is scoring or rating prompt outputs and attaching that evaluation to version metadata. In the meantime, Lumra already shows usage signals like copy and reuse counts per version.
      Prompt-to-prompt references weren’t part of the original plan, but it’s a great idea and something we can definitely add to the roadmap.
      Lumra is currently live with features like live input sync, Chrome extension, prompt versioning, drafts, favorites, tags, categories, and basic analytics — with much more coming.
      Appreciate the thoughtful feedback.
      — Taha, Founder of Lumra
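For anyone curious what word-by-word diffing looks like mechanically, here is a minimal sketch using Python's stdlib difflib. The [-removed-]/{+added+} markers are illustrative, not Lumra's actual UI:

```python
# Sketch: word-level diff of two prompt versions via stdlib difflib.
# Marker style ([-old-] / {+new+}) is illustrative only.
import difflib


def word_diff(old: str, new: str) -> str:
    out = []
    for tok in difflib.ndiff(old.split(), new.split()):
        if tok.startswith("- "):
            out.append(f"[-{tok[2:]}-]")      # word removed
        elif tok.startswith("+ "):
            out.append(f"{{+{tok[2:]}+}}")    # word added
        elif tok.startswith("  "):
            out.append(tok[2:])               # unchanged word
        # "? " guide lines from ndiff are skipped
    return " ".join(out)


old_v = "Summarize the article in three short bullets."
new_v = "Summarize the article in five short bullets."
print(word_diff(old_v, new_v))
```

Because a single changed word can shift model behavior, surfacing exactly which word moved is the whole value of a diff like this.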

      1.

        The "stay in your flow" philosophy really resonates. Context switching is the silent killer of creative work, and a Chrome extension that keeps pace with your thinking without demanding attention is the right design instinct.

        The word-by-word diff for versions is smart. Prompts are dense - a single changed word can completely shift behavior. Making that visible turns debugging from guesswork into archaeology.

        Scoring outputs attached to version metadata could be powerful. The hard part with prompts isn't knowing what changed - it's knowing if the change was better. Even a simple "this worked / this didn't" signal per version would start building intuition over time.

        Glad the prompt-to-prompt reference idea landed. Complex prompt systems end up being compositional whether you plan for it or not. Surfacing those relationships early could prevent a lot of "wait, which version of that template am I using?" confusion.

        1.

          Exactly — just like in architecture or code, every brick, variable, or function detail matters. With prompts, even a single word can completely change behavior. That’s why word-by-word comparison is such a core feature for us.
          Being able to see exactly what changed — and attach version-specific notes — creates a much clearer working environment, especially for advanced or long-lived use cases.
          On the output side, scoring feels like a logical starting point for evaluating prompt versions within an organization and checking alignment with expectations. That said, we’re very open to exploring alternative or complementary ways of comparing outputs, and feedback there is always welcome.
          For complex use cases, we fully agree that a prompt-to-prompt referencing system enables much more advanced and composable prompt chains. It’s a powerful concept, and we’ve already moved it high on our roadmap. In this case, earlier really is better.
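A prompt-to-prompt referencing system can be sketched as placeholder resolution with cycle detection. The {{name}} syntax and library layout below are hypothetical, not a shipped Lumra feature:

```python
# Sketch: resolve {{name}} references against a prompt library,
# tracking the resolution path to catch circular references.
# The {{...}} syntax and prompt names are hypothetical.
import re

library = {
    "format/json-strict": "Respond with valid JSON only. No prose.",
    "system/reviewer": "You are a code reviewer.\n{{format/json-strict}}",
}


def resolve(name: str, lib: dict, stack: tuple = ()) -> str:
    if name in stack:
        raise ValueError(f"circular reference: {' -> '.join(stack + (name,))}")
    # Replace each {{ref}} with the recursively resolved target prompt.
    return re.sub(
        r"\{\{([^{}]+)\}\}",
        lambda m: resolve(m.group(1), lib, stack + (name,)),
        lib[name],
    )


print(resolve("system/reviewer", library))
```

Tracking the path (rather than a global visited set) still allows the same template to be referenced twice from different branches while rejecting true cycles.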

          1.

            The "architecture and code" parallel is apt. What's interesting is that the tolerance for ambiguity is inverted: in code, a vague function name is annoying but survivable. In prompts, vague language is the bug itself.

            The roadmap prioritization on prompt-to-prompt references sounds right. Most prompt workflows start simple - single prompts doing isolated tasks. But the moment you hit real complexity (multi-step agents, role-based contexts, domain-specific formatting), composition becomes non-negotiable. Building that muscle into the tool early means users don't have to re-architect when they scale.

            One pattern I've noticed: the most useful prompt management isn't about personal clarity alone - it's about organizational consistency. When three people on a team each have their own "best" version of the same prompt, you end up debugging prompt drift alongside output drift. Version metadata that shows not just what changed but who uses which version could surface that fragmentation before it causes downstream issues.

            Excited to see where this goes. The "every word matters" reality of prompt engineering really does need tools that take it that seriously.

            1.

              Well said — especially the point about ambiguity being the bug in prompts. That really captures the core challenge.
              Composition, collaboration, and evaluation are exactly the areas we’re leaning into. Prompt-to-prompt references, clear authorship and intent per version, and lightweight signals around “did this actually work?” all aim to reduce prompt drift as systems and teams scale.
              We see prompts as first-class artifacts, not scattered text — once you treat them that way, versioning and consistency become infrastructure, not extras.
              Thanks for the sharp insights. Excited to keep building in this direction and learn from how people like you are actually using these tools.
