I built an open-source PII masking layer for LLM APIs — early traction, looking for design partners

I kept running into the same wall while talking to developers at healthtech and fintech companies: they wanted to use LLMs to automate workflows, but their data had names, emails, Aadhaar numbers, PAN cards, SSNs in it. Sending that to OpenAI or Anthropic felt wrong — legally and ethically.

Most teams were either skipping LLMs entirely or hand-rolling their own scrubbers. Neither felt like the right answer.

So I built Armos.

It wraps the OpenAI and Anthropic Python SDKs. Before your prompt goes out, PII is detected locally (nothing leaves your machine during detection) and replaced with reversible tokens. The LLM sees tokens and responds with tokens, then Armos swaps the real values back in. Your app gets the original text. The model never does.

The entire integration is one line:
client = ArmosOpenAI(OpenAI())
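
To make the mask, call, unmask round trip concrete, here is a minimal sketch in plain Python. It is not Armos's implementation: the email regex, the `mask`/`unmask` helpers, and the `fake_llm` stand-in are all hypothetical, and the only detail borrowed from this thread is the `[PII:TYPE:hex]` token shape.

```python
import re
import secrets

# Assumption: a deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
TOKEN_RE = re.compile(r"\[PII:[A-Z]+:[0-9a-f]{8}\]")


def mask(text, vault):
    """Replace each email with a token and remember the mapping in `vault`."""
    reverse = {value: token for token, value in vault.items()}

    def substitute(match):
        value = match.group()
        if value not in reverse:  # reuse the same token for a repeated value
            token = f"[PII:EMAIL:{secrets.token_hex(4)}]"
            vault[token] = value
            reverse[value] = token
        return reverse[value]

    return EMAIL_RE.sub(substitute, text)


def unmask(text, vault):
    """Swap known tokens back to real values; unknown tokens are left as-is."""
    return TOKEN_RE.sub(lambda m: vault.get(m.group(), m.group()), text)


def fake_llm(prompt):
    """Hypothetical stand-in for the provider call; it only sees masked text."""
    token = TOKEN_RE.search(prompt).group()
    return f"Sure, I will send the invoice to {token}."


vault = {}
masked = mask("Email the invoice to jane.doe@example.com today.", vault)
reply = unmask(fake_llm(masked), vault)
```

Here `masked` is what would cross the network, and `reply` is what the application reads after the tokens are swapped back.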

Where I am:

- Just launched v1.2.1 on PyPI
- Detects 10 entity types including India-specific ones (Aadhaar, PAN)
- Got a warm lead from a tax automation company for a design partnership
- HN post going up tomorrow

What I'm looking for:

- Developers building on sensitive data (health, finance, legal, HR) who want to trial this early
- Feedback on what's missing: entity types, framework integrations, async/streaming support
- Honest criticism of the approach

Still early and rough around the edges. Would love to connect with anyone
hitting this problem.

GitHub: https://github.com/armos-ai/armos-python
Docs: https://armos.dev
pip install armos

Dhroov Gupta, posted to Growth on May 25, 2026


2
Local-detection-first is the right architectural call — once data leaves the user machine, the trust math falls apart, especially in the verticals you're targeting.

Your bigger challenge isn't more entity types, it's that the buyer in healthtech/fintech is the compliance officer, not the developer. Engineers love Armos because it solves their problem; compliance signs the PO.

Two things worth doing before scaling design partners:
- Publish a public evasion test set (50 messy real-world PII shapes, what you catch, what you intentionally don't). Gives compliance something concrete to evaluate against, not vibes.
- Audit logs: what got masked, when, by whom. Compliance needs the trail even if they never see the originals.
Tech wedge is solid. Buyer narrative is where you can pull ahead of hand-rolled scrubbers.
ouuki

·
a month ago
·
Reply
1. 1
  
  Absolutely! I’m currently working on an MVP for an audit trail dashboard, which should be out soon. The goal is to give compliance officers a much clearer view of what data is being masked, detected, and sent through LLM workflows.
  
  On the buyer side, I’ve been trying to connect with compliance officers to better understand their requirements, but haven’t had much luck so far. I’ll keep at it. Thanks for the feedback!
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
  1. 1
    
    Glad the audit trail point landed. One thing I'd flag before you sink much time into dashboard UI: the compliance folks I've talked to care way less about charts than about an append-only log they can export and hand to an auditor, so a boring CSV or SIEM export might close design partners faster than visualizations. Are you speccing the MVP with an actual compliance officer at one of your pilot companies, or working from the dev side and guessing at what they'll ask for? That answer tends to decide whether the first version sticks.
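
As an illustration of the "boring CSV" idea, here is a minimal sketch of an append-only audit log that records what was masked, when, and by whom, without ever storing the original values. The `AuditEvent` fields, the `AuditLog` class, and the `svc-claims` actor name are hypothetical, not part of Armos.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str
    actor: str
    entity_type: str
    token: str  # the placeholder only; the original value is never logged


class AuditLog:
    """Append-only at the API level: events can be added and exported, not edited."""

    def __init__(self):
        self._events = []

    def record(self, actor, entity_type, token):
        now = datetime.now(timezone.utc).isoformat()
        event = AuditEvent(now, actor, entity_type, token)
        self._events.append(event)
        return event

    def to_csv(self):
        buffer = io.StringIO()
        names = [f.name for f in fields(AuditEvent)]
        writer = csv.DictWriter(buffer, fieldnames=names)
        writer.writeheader()
        for event in self._events:
            writer.writerow(asdict(event))
        return buffer.getvalue()


log = AuditLog()
log.record("svc-claims", "NAME", "[PII:NAME:c4587843]")
log.record("svc-claims", "EMAIL", "[PII:EMAIL:9f2a11b0]")
export = log.to_csv()
```

A real deployment would also need tamper evidence (for example, write-once storage), which this in-process list does not provide.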
    
    ouuki
    
    ·
    22 days ago
    ·
    Reply
2

The token substitution approach is clever — the model never sees real PII but the response still maps back cleanly. That's a much more elegant solution than the "just don't send sensitive data" advice most compliance teams give, which usually means not using LLMs at all.

The India-specific entity types are a smart differentiator. Aadhaar and PAN are everywhere in fintech there and most Western PII libraries just ignore them entirely.

Two things I'd want to know before using this in production: how does it handle PII that appears in unexpected formats (partial numbers, typos, regional variations)? And what's the performance overhead on detection — if I'm running this on every prompt in a high-volume pipeline, does it stay fast enough to not matter?

The "one line integration" pitch is exactly right for this audience. Developers don't want to think about it, they just want it to work. Good luck with the HN launch.

ibrh96

·
a month ago
·
Reply
1. 1
  
  Thanks! Overhead depends on prompt size: roughly 20 ms for a prompt of about 150 tokens.
  Still working on unexpected formats. For now, any detection that scores below the 0.34 threshold is ignored.
  
  If you get a chance to try it, would love your feedback — it’ll help me improve it 🙌
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
2

The technical wedge is right, but the actual buyer for this in healthtech and fintech is usually the compliance officer, not the developer. Lean your roadmap into the audit story: what got masked, when, who saw the tokens, and proof the model never saw the originals. Developers ship it. Compliance pays for it. Also worth talking to MSPs and consultancies serving regulated industries. They have the design partners and the deal velocity. Happy to introduce you to a couple if useful.

GregoryScottHenson

·
a month ago
·
Reply
1. 1
  
  Really appreciate the framing — compliance officer as the buyer makes sense, and the audit trail is already on the roadmap for exactly that reason.
  
  On the intros — would love that actually. Even if it's early for MSP partnerships, talking to people close to the problem in regulated industries would help shape the roadmap. If you're open to making a couple of intros, I'm at [email protected].
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
2

Privacy infra for AI is going to become massive as adoption scales. Open-source is a smart move here — builds trust much faster in security-related tooling.

perfectstereotype

·
a month ago
·
Reply
1. 1
  
  Thanks! That's the moat. Would love for you to try it out and share your thoughts! And if you find it useful, starring the repo would mean a lot — it really helps with visibility.
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
2

The local-first part feels like a big deal here.
If a team is already nervous about sending sensitive data to an LLM provider, asking them to send it through another hosted tool would probably be a much harder sell...

ale023

·
a month ago
·
Reply
1. 2
  
  Thanks! That is what I am trying to achieve with Armos! Let's see how it goes.
  
  Would love for you to try it out and share your thoughts! And if you find it useful, starring the repo would mean a lot — it really helps with visibility.
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
2

Awesome idea brother

BenDiesenreiter

·
a month ago
·
Reply
1. 1
  
  Thanks! Would love for you to try it out and share your thoughts! And if you find it useful, starring the repo would mean a lot — it really helps with visibility.
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
  1. 2
    
    Would love to, but it's not necessary for me right now. If we extend, I will come back to you.
    
    BenDiesenreiter
    
    ·
    a month ago
    ·
    Reply
2

I like this idea.

ooh

·
a month ago
·
Reply
1. 1
  
  Would love for you to try it out and share your thoughts! And if you find it useful, starring the repo would mean a lot — it really helps with visibility.
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
  1. 1
    
    Sure thing.
    
    ooh
    
    ·
    a month ago
    ·
    Reply
2

really like the local-detection-first design, that's the part most hand-rolled scrubbers get wrong.

one thing worth being clear with your health/fintech design partners on: masking reduces the exposure but it doesn't take openai/anthropic out of their subprocessor chain. the api call still happens, so in a security review they'll still get asked "is anthropic a subprocessor, and is it disclosed?" the honest framing is risk-reduction, not "you don't have to disclose the llm anymore" - if a partner assumes the latter you'll both get burned in an audit.

also curious: where does the reversible token <-> real value map live? if it's persisted anywhere, that store kind of becomes the new crown jewel (and the new audit target). in-memory only?

boussettah

·
a month ago
·
Reply
1. 1
  
  Both points are exactly right and worth saying clearly.
  
  On the subprocessor chain — yes, masking doesn't remove OpenAI or Anthropic from the picture. The API call still happens, they're still processing data, they still need to be disclosed. What changes is what they process: tokens with no intrinsic meaning instead of raw PII. "John Peter" becomes [PII:NAME:c4587843]. The model reasons over that token fine — but if that data ever leaked from the provider's side, there's nothing recoverable. That's risk reduction, not compliance elimination, and I should be more explicit about that framing.
  
  On the vault — by default it's in-memory only, ephemeral per process. No persistence, nothing to audit. For multi-turn conversations where you need tokens to survive across requests, there's an optional Redis backend — but it's always the customer's own Redis, never ours (there is no ours). You're right that the Redis store becomes the new crown jewel: it holds the token→value map, so it needs to be treated like any other sensitive datastore — private network, auth, TLS, TTL on entries. Happy to document this more explicitly as a security consideration.
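
For readers who want a picture of an ephemeral vault with expiring entries, here is a minimal sketch. It is an assumption-laden illustration, not the Armos vault: the `InMemoryVault` class, its method names, and the injectable clock are all hypothetical.

```python
import time


class InMemoryVault:
    """Token -> value map held in process memory, with a per-entry expiry."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # token -> (value, expires_at)

    def put(self, token, value):
        self._entries[token] = (value, self._clock() + self._ttl)

    def get(self, token):
        entry = self._entries.get(token)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[token]  # drop expired mappings on access
            return None
        return value


# Usage with a fake clock: the mapping is readable, then gone after the TTL.
now = [0.0]
vault = InMemoryVault(ttl_seconds=60, clock=lambda: now[0])
vault.put("[PII:NAME:c4587843]", "John Peter")
fresh = vault.get("[PII:NAME:c4587843]")
now[0] = 61.0
expired = vault.get("[PII:NAME:c4587843]")
```

Nothing here survives a process restart, which is the point of the in-memory default; a shared store would need the network, auth, and TLS controls listed above.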
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
1

One thing I would add before chasing more entity types is a public failure-mode test set. For example: "here are 50 prompts with messy real-world PII shapes, here is what Armos catches, here is what it intentionally does not catch yet." That gives developers something concrete to trust and gives design partners an easy way to say "our data looks like case 17, but with physician IDs."
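
A failure-mode test set like this can start very small. The sketch below is hypothetical throughout: the four cases, the single strict SSN regex standing in for a detector, and the `evaluate` helper are illustrative, not Armos's test suite. It reports, per numbered case, what was missed and what was flagged wrongly.

```python
import re

# Hypothetical detector under test: one strict SSN pattern.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def detect(text):
    return set(SSN_RE.findall(text))


# Each case states the text and the exact spans a detector should catch.
CASES = [
    {"id": 1, "text": "SSN 123-45-6789 on file", "expected": {"123-45-6789"}},
    {"id": 2, "text": "ssn is 123 45 6789", "expected": {"123 45 6789"}},
    {"id": 3, "text": "SSN:123-45-6789.", "expected": {"123-45-6789"}},
    {"id": 4, "text": "Order 2024-01-15 shipped", "expected": set()},
]


def evaluate(cases, detector):
    """Return one row per case listing missed spans and false positives."""
    report = []
    for case in cases:
        found = detector(case["text"])
        report.append({
            "id": case["id"],
            "missed": sorted(case["expected"] - found),
            "false_positives": sorted(found - case["expected"]),
        })
    return report


report = evaluate(CASES, detect)
missed_ids = [row["id"] for row in report if row["missed"]]
```

With this toy detector, the space-separated variant in case 2 is the documented miss, which is exactly the kind of row a design partner can point at.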

For the wedge, I would also separate two ICPs: teams trying to pass a security review for an LLM feature, and teams that already shipped one and are now worried about leakage. The second group will have sharper urgency and better examples.

JohnMadison

·
a month ago
·
Reply
1. 1
  
  Thanks — both points are spot on. The failure-mode test set resonates; we already have internal cases showing what we catch vs. miss, just need to make them public.
  
  Would love for you to try it out and share what breaks. pip install armos — quickstart at https://armos.dev
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
1

Good timing on this. We supply regulatory data (congressional bill tracking, vote alerts, hearing schedules) via goffer.ai webhook — and the compliance teams using it are exactly your target: fintech/legal teams that pipe bill summaries alongside client portfolios through LLMs. The PII bleed problem is real in that workflow — bill impact analysis often has client reference data in context. The reversible tokenization approach is the right call over regex scrubbers. For the data side: goffer.ai covers Congress.gov well; for state-level we layer in OpenStates. Anyone building LLM workflows specifically around FTC or SEC regulatory action feeds?

3vo

·
a month ago
·
Reply
1. 1
  
  Good to know the problem is real in that workflow. The fintech/legal teams actually building those LLM pipelines are exactly who I'm looking for — if any of them are hitting this directly, I'd love an intro. [email protected]
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
1

PII handling in LLM pipelines is trickier than it looks — especially when users mix languages mid-sentence. What's your approach for named entities in non-English text? Curious if you've tested it with Spanish/multilingual inputs.

worvi26

·
a month ago
·
Reply
1. 1
  
  Not planned yet, but if you're hitting this with Armos I'd love to understand the use case — happy to explore it as we expand language support.
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
  1. 1
    
    Not Armos — building Worvi (WhatsApp AI bot template) where PII masking comes up constantly (clinics, restaurants getting health/order data via chat). My use case: mask names/phones/emails in user messages before sending to Claude API, then unmask in the response. Currently regex-based but breaks on edge cases (accents, abbreviations). Would your layer handle Spanish PII out of the box?
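
To illustrate the accent edge case, here is a minimal sketch of why an ASCII-only name pattern misses Spanish names while a Unicode-aware one does not. Both heuristics are hypothetical and far too naive for production (they would flag any two capitalized words); this is neither Worvi's nor Armos's code.

```python
import re

# ASCII-only heuristic: two capitalized words made of A-Z / a-z.
ASCII_NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

# Unicode-aware alternative: [^\W\d_] matches letters, including accented ones.
LETTERS = re.compile(r"[^\W\d_]+")


def capitalized_pairs(text):
    """Return adjacent Title-case word pairs separated by a single space."""
    words = [(m.group(), m.start(), m.end()) for m in LETTERS.finditer(text)]
    pairs = []
    for (w1, s1, e1), (w2, s2, e2) in zip(words, words[1:]):
        if w1.istitle() and w2.istitle() and text[e1:s2] == " ":
            pairs.append(text[s1:e2])
    return pairs


# "Cita para José Núñez mañana", written with escapes to keep the source ASCII.
message = "Cita para Jos\u00e9 N\u00fa\u00f1ez ma\u00f1ana"
ascii_hits = ASCII_NAME.findall(message)
unicode_hits = capitalized_pairs(message)
```

The ASCII pattern finds nothing in this message, while the Unicode-aware pass picks up the accented name.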
    
    worvi26
    
    ·
    a month ago
    ·
    Reply
1

I love this..., if you are looking for high quality leads for your business and want help with scaling, you can send me a message on telegram @caseyimafidon, let's help you make money

Castilnatic

·
a month ago
·
Reply
1

nice that the integration is one line, that's usually where these things die so good call.

honestly the part i'd stress test is recall on detection. tokenizing what you catch is the easy bit, the real risk is the span you miss, a name in a weird format or an id that doesn't match a known pattern, and one miss means real pii goes to the model anyway. is there a way to fail safe on low confidence stuff, or at least surface what it almost flagged?

other thing, the token map is reversible so that mapping kind of becomes the new sensitive asset itself. where does it live and how long does it stick around? that's probably the first thing a security person at a healthtech would poke at.

chalermpon

·
a month ago
·
Reply
1. 1
  
  Thanks for posting this!
  On the fail-safe part, let me come back to you on this. It's a good point and something that needs to be done.
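
One possible shape for a fail-safe, sketched under assumptions: detections above the masking threshold are tokenized, detections in a lower "review" band are surfaced as near misses, and a fail-closed flag blocks the request when any near miss exists. The 0.34 value comes from this thread; the `review_at` band, the `Detection` class, and both function names are hypothetical and not an existing Armos feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    text: str
    entity_type: str
    score: float


def triage(detections, mask_at=0.34, review_at=0.15):
    """Split detections into spans to mask and near misses worth surfacing."""
    to_mask, near_misses = [], []
    for d in detections:
        if d.score >= mask_at:
            to_mask.append(d)
        elif d.score >= review_at:
            near_misses.append(d)
        # anything below review_at is treated as noise and dropped
    return to_mask, near_misses


def should_send(near_misses, fail_closed=True):
    """In fail-closed mode, any near miss blocks the outgoing request."""
    return not (fail_closed and near_misses)


detections = [
    Detection("John Peter", "NAME", 0.92),
    Detection("J. Ptr", "NAME", 0.22),
    Detection("invoice", "NAME", 0.03),
]
to_mask, near_misses = triage(detections)
```

Even in fail-open mode, returning `near_misses` to the caller gives a way to show what was almost flagged.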
  
  On the vault: in-memory by default — lives in process RAM, gone when the process ends, nothing persists. For multi-turn conversations there's an optional Redis backend, but it's always the customer's own Redis instance, never mine (there's no Armos server). Default TTL is 24 hours. Either way, the mapping never leaves your infrastructure.
  
  dhroovgupta
  
  ·
  a month ago
  ·
  Reply
1

This is a strong wedge because you are not selling “LLM security” in a vague way. You are solving a specific blocker that sensitive-data teams already feel: they want LLM automation, but they cannot casually send names, IDs, tax data, health data, or legal records into external models.

The local detection plus reversible token layer is the right trust angle. I would make that the center of the positioning: Armos is not just a wrapper, it is the privacy boundary between regulated workflows and LLM APIs.

One thing I’d pressure-test before the HN post and design partner conversations is the name. Armos is decent, but for healthtech, fintech, legal, and HR developers, the brand has to immediately feel secure, technical, and serious. This is infrastructure sitting between sensitive data and foundation models, so the name carries trust before the docs even do.

Vroth .com would fit that layer better if you want it to feel like hard security infrastructure for LLM workflows, not just an open-source SDK. The product direction is strong enough that naming is not cosmetic here. It affects whether security-conscious developers read it as a real privacy layer or another early wrapper.

aryan_sinh

·
2 months ago
·
Reply
1. 1
  
  Really appreciate this — the "privacy boundary between regulated workflows and LLM APIs" framing is sharper than how I've been positioning it. Stealing that.
  
  On the name: I hear you, and I don't disagree that names carry trust in security infra. But I'd rather not sweat it at this stage.
  
  No paid users, no enterprise contracts, nothing that makes a rebrand painful. If the product earns trust with the right teams, Armos won't have been the thing that stopped them. I'll revisit naming seriously before any real scaling push.
  
  What I'm more focused on right now is getting it in front of sensitive-data teams and letting them pressure-test the actual trust layer — the local detection, the reversible tokens, the zero PII to the model. That's where I want the feedback loop first.
  
  Are you building in any of these spaces? Would love to hear where you'd see this fitting or breaking.
  
  dhroovgupta
  
  ·
  2 months ago
  ·
  Reply
  1. 1
    
    That makes sense. If there are no paid users or enterprise contracts yet, getting the trust layer pressure-tested matters more than renaming today.
    
    I’m not building directly in healthtech/fintech/legal, but the strongest fit I see is sensitive-data workflows where teams already want LLM automation but cannot justify sending raw PII into external models.
    
    Examples:
    
    healthtech admin/support workflows
    legal intake and contract review
    fintech support/compliance notes
    HR records and employee data
    insurance claims
    B2B SaaS tools handling customer records
    
    Where I think it breaks is if Armos is framed as an SDK feature instead of a privacy boundary.
    
    The sharper first-user angle is probably:
    
    “Use LLMs in sensitive workflows without sending raw PII to the model.”
    
    That is much easier for sensitive-data teams to understand than a generic “LLM security” pitch.
    
    If useful, I can put together a quick GTM/outreach pack around this wedge: target profiles, 3 cold emails, 3 LinkedIn DMs, 3 follow-ups, and the cleanest positioning angle for getting design partners.
    
    I’m doing a few quick ones at $49 to move fast. This one is a good fit because the pain is specific and the buyer profile is clear.
    
    LinkedIn: https://www.linkedin.com/in/aryan-y-0163b0278/
    
    aryan_sinh
    
    ·
    2 months ago
    ·
    Reply
    1. 1
      
      Thanks for this — genuinely useful framing.
      
      On the GTM pack — I appreciate the offer, but right now I'm not focused on outreach. The goal at this stage is developer adoption and finding 3–5 design partners who are actually hitting this problem, so I can let the roadmap be shaped by real use cases before I start selling anything. Happy to revisit when that changes.
      
      If you're building something where this friction comes up, I'd love to hear what it looks like from the inside.
      
      dhroovgupta
      
      ·
      a month ago
      ·
      Reply
      1. 1
        
        That makes sense. Design partners are the right step before selling this broadly.
        
        I’d separate broad outreach from design-partner discovery.
        
        For Armos, the useful motion is not “pitching buyers.” It is finding developers or teams already blocked by the exact workflow: they want LLM automation, but raw PII, compliance, or customer-record risk stops them from using external models safely.
        
        So I would not target generic security teams first.
        
        I’d target builders inside healthtech, legaltech, fintech, HR, insurance, and support-heavy B2B SaaS who have already tried to connect LLMs to sensitive internal workflows and hit the data-boundary problem.
        
        That is where the conversation becomes product discovery, not sales.
        
        The sharper ask is probably:
        
        “Are you currently avoiding or limiting LLM automation because sensitive customer data would leave your environment?”
        
        That gets you closer to the 3–5 design partners you actually need.
        
        If you want to move faster on that, I can put together a small design-partner discovery pack: target profile, qualification angle, 3 discovery messages, 3 follow-ups, and the exact problem framing to use.
        
        I’d keep it tight and practical, not broad GTM.
        
        aryan_sinh
        
        ·
        a month ago
        ·
        Reply